A COP Model For Graph-Constrained Coalition Formation
We consider Graph-Constrained Coalition Formation (GCCF), a widely studied subproblem of coalition formation in which the set of valid coalitions is restricted by a graph. We propose COP-GCCF, a novel approach that models GCCF as a constraint optimization problem (COP), and we solve this COP with a highly parallel approach based on Bucket Elimination executed on the GPU, which is able to exploit the high constraint tightness of COP-GCCF. Results show that our approach outperforms state-of-the-art algorithms (i.e., DyCE and IDPG) by at least one order of magnitude, in terms of both runtime and memory, on realistic graphs (i.e., a crawl of the Twitter social graph).
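Assuming the standard GCCF definition, in which a coalition is valid only if its agents induce a connected subgraph of the constraint graph, the brute-force sketch below shows how the graph prunes the space of coalitions. It is not the COP-GCCF encoding or the GPU Bucket Elimination solver; the adjacency representation (a dict of sets) and the function names are illustrative assumptions.

```python
from itertools import combinations


def is_connected(nodes, adj):
    """Return True if `nodes` induces a connected subgraph of `adj`."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:  # depth-first search restricted to `nodes`
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def valid_coalitions(adj):
    """Enumerate every feasible coalition (connected vertex subset) by brute force."""
    agents = sorted(adj)
    result = []
    for k in range(1, len(agents) + 1):
        for subset in combinations(agents, k):
            if is_connected(subset, adj):
                result.append(frozenset(subset))
    return result


# Path graph a - b - c: {a, c} is not a valid coalition because a and c are not linked.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
coalitions = valid_coalitions(adj)
print(len(coalitions))
```

On this path graph only 6 of the 7 non-empty subsets are feasible; on sparse real graphs such as social networks the fraction of feasible coalitions is much smaller, which is the kind of constraint tightness the abstract refers to.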
Enhancing Deep Reinforcement Learning Approaches for Multi-Robot Navigation via Single-Robot Evolutionary Policy Search
Recent Multi-Agent Deep Reinforcement Learning approaches factorize a global action-value function to address non-stationarity and favor cooperation. These methods, however, hinder exploration by introducing constraints (e.g., additive value decomposition) to guarantee the factorization. Our goal is to enhance exploration and improve the sample efficiency of multi-robot mapless navigation by incorporating a periodic Evolutionary Policy Search (EPS). In detail, the multi-agent training "specializes" the robots' policies to learn the collision-avoidance skills that are mandatory for the task. Concurrently, we propose using Evolutionary Algorithms to explore different regions of the policy space in an environment with only a single robot. The idea is that core navigation skills, originating from the multi-robot policies via mutation operators, improve faster in the single-robot EPS. The resulting policy parameters can then be injected into the multi-robot setting using crossovers, leading to improved performance and sample efficiency. Experiments in tasks with up to 12 robots confirm the beneficial transfer of navigation skills from the EPS to the multi-robot setting, improving the performance of prior methods.
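The mutate-then-inject loop described above can be sketched on flat parameter vectors as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: Gaussian mutation, uniform crossover, elitist truncation selection and the quadratic `toy_fitness` (a hypothetical stand-in for a single-robot episode return) are all choices made here for clarity.

```python
import random


def mutate(params, sigma, rng):
    """Gaussian mutation: add N(0, sigma^2) noise to every parameter."""
    return [p + rng.gauss(0.0, sigma) for p in params]


def uniform_crossover(parent_a, parent_b, rng):
    """Uniform crossover: each parameter is copied from one parent at random."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def eps_generation(population, fitness, sigma, n_elite, rng):
    """One generation of a simple elitist evolutionary search.

    The n_elite best individuals survive unchanged; the rest of the next
    population are mutated copies of randomly chosen elites.
    """
    ranked = sorted(population, key=fitness, reverse=True)
    elites = ranked[:n_elite]
    children = [mutate(rng.choice(elites), sigma, rng)
                for _ in range(len(population) - n_elite)]
    return elites + children


def toy_fitness(params):
    """Hypothetical stand-in for the return of a single-robot episode."""
    target = [1.0, -2.0, 0.5]
    return -sum((x - t) ** 2 for x, t in zip(params, target))


rng = random.Random(0)
marl_policy = [0.0, 0.0, 0.0]  # parameters taken from the multi-robot learner
population = [list(marl_policy) for _ in range(20)]  # EPS seeded from MARL
for _ in range(50):
    population = eps_generation(population, toy_fitness, sigma=0.1, n_elite=5, rng=rng)
best = max(population, key=toy_fitness)
# Inject single-robot improvements back into the multi-robot policy.
injected = uniform_crossover(marl_policy, best, rng)
```

Because the elites survive unchanged, the best single-robot fitness never decreases across generations; the crossover step then mixes those improved parameters with the current multi-robot ones.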
Learning queuing strategies in human-multi-robot interaction
We consider multi-robot applications where a team of robots can ask a human operator to intervene in difficult situations. As the number of requests grows, team members must wait for the operator's attention, so the operator becomes a bottleneck for the system. In contrast to previous work, we consider a balking queue model in which robots can decide either to join the queue or to balk (leave the queue). Our aim is to devise an approach that allows the robots to learn cooperative balking strategies that decrease the time spent waiting for the operator. In more detail, we formalize the problem as a Decentralized Markov Decision Process (Dec-MDP) and provide a scalable state representation by adding the state of the queue as an extra feature to each robot's local observation. We then apply multi-agent reinforcement learning to solve the model and evaluate our approach in a simulated scenario.
Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming Approach
Partially Observable Markov Decision Processes (POMDPs) are a powerful framework for planning under uncertainty. They model state uncertainty as a belief, i.e., a probability distribution over states. Approximate solvers based on Monte Carlo sampling have been very successful at reducing the computational demand and enabling online planning. However, scaling to complex, realistic domains with many actions and long planning horizons is still a major challenge, and a key to good performance is guiding the action-selection process with domain-dependent policy heuristics tailored to the specific application domain. We propose to learn high-quality heuristics from POMDP execution traces generated by any solver. We convert the belief-action pairs to a logical semantics and exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications, which are then used as online heuristics. We thoroughly evaluate our methodology on two notoriously challenging POMDP problems involving large action spaces and long planning horizons, namely rocksample and pocman. Considering different state-of-the-art online POMDP solvers, including POMCP, DESPOT and AdaOPS, we show that learned heuristics expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics, within lower computational time. Moreover, they generalize well to more challenging scenarios not experienced in the training phase (e.g., more rocks and a larger grid in rocksample, a larger map and more aggressive ghosts in pocman).
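The belief mentioned above is maintained with the standard discrete Bayes filter, b'(s') ∝ O(o | s', a) · Σ_s T(s' | s, a) · b(s). The sketch below implements that update for small dictionary-based models; the rock-checking example uses a fixed, hypothetical 80% sensor accuracy, whereas the actual rocksample sensor model is distance-dependent.

```python
def belief_update(belief, action, observation, T, O):
    """Bayes filter for a discrete POMDP.

    belief: dict state -> probability
    T[(s, a)]: dict next_state -> probability of reaching it
    O[(s_next, a)]: dict observation -> probability of observing it
    """
    # Prediction step: push the belief through the transition model.
    predicted = {}
    for s, p in belief.items():
        for s_next, p_t in T[(s, action)].items():
            predicted[s_next] = predicted.get(s_next, 0.0) + p * p_t
    # Correction step: weight by the observation likelihood.
    for s_next in predicted:
        predicted[s_next] *= O[(s_next, action)].get(observation, 0.0)
    norm = sum(predicted.values())
    if norm == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return {s: v / norm for s, v in predicted.items()}


# Hypothetical single-rock example: checking does not change the rock's state.
T = {("good", "check"): {"good": 1.0}, ("bad", "check"): {"bad": 1.0}}
O = {("good", "check"): {"obs_good": 0.8, "obs_bad": 0.2},
     ("bad", "check"): {"obs_good": 0.2, "obs_bad": 0.8}}
b0 = {"good": 0.5, "bad": 0.5}
b1 = belief_update(b0, "check", "obs_good", T, O)
```

After one positive reading the belief in a good rock rises from 0.5 to 0.8; belief-action pairs of this kind are what the approach above converts to a logical representation.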
Online Inductive Learning from Answer Sets for Efficient Reinforcement Learning Exploration
This paper presents a novel approach combining inductive logic programming with reinforcement learning to improve training performance and explainability. We exploit inductive learning of answer set programs from noisy examples to learn a set of logical rules representing an explainable approximation of the agent's policy at each batch of experience. We then perform answer set reasoning on the learned rules to guide the exploration of the learning agent at the next batch, without requiring inefficient reward shaping and while preserving optimality through a soft bias. The entire procedure is conducted during the online execution of the reinforcement learning algorithm. We perform a preliminary validation of our approach by integrating it into the Q-learning algorithm for the Pac-Man scenario in two maps of increasing complexity. Our methodology produces a significant boost in the discounted return achieved by the agent, even in the first batches of training. Moreover, inductive learning does not compromise the computational time required by Q-learning, and the learned rules quickly converge to an explanation of the agent's policy.
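One plausible way to realize "guided exploration with a soft bias" in tabular Q-learning is sketched below; the paper's exact mechanism may differ. The `suggest` callback is hypothetical and stands in for answer set reasoning over the learned rules: during exploration it is followed only with probability `bias`, so every action keeps a nonzero probability of being tried, which is what allows Q-learning to remain convergent.

```python
import random


def choose_action(Q, state, actions, suggest, epsilon, bias, rng):
    """Epsilon-greedy action selection with a soft exploration bias.

    With probability epsilon the agent explores; while exploring, with
    probability `bias` it picks among the rule-suggested actions (if any),
    otherwise uniformly among all actions.
    """
    if rng.random() < epsilon:
        suggested = suggest(state)  # hypothetical stand-in for ASP reasoning
        if suggested and rng.random() < bias:
            return rng.choice(sorted(suggested))
        return rng.choice(actions)
    # Exploit: greedy action under the current Q-table (unseen pairs count as 0).
    return max(actions, key=lambda a: Q.get((state, a), 0.0))


def q_update(Q, state, action, reward, next_state, actions, alpha, gamma):
    """Standard tabular Q-learning update."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
```

For example, after `q_update(Q, 0, "right", 1.0, 1, ["left", "right"], alpha=0.5, gamma=0.9)` on an empty table, `Q[(0, "right")]` is 0.5 and a greedy call returns `"right"`.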
Lazy max-sum for allocation of tasks with growing costs
We propose a model for the allocation of agents to tasks when the tasks have a cost that grows over time. Our model accounts for both the natural growth of tasks and the effort of the agents to contain such growth. The objective is to produce solutions that minimize the growth of tasks (potentially stopping such growth) by efficiently coordinating the operations of the agents. This problem has strong spatial and temporal components, as the agents require time not only to work on the tasks but also to move between tasks, and during that time the costs of completing the tasks continue to grow. We propose a novel distributed coordination algorithm, called Lazy max-sum, which works well even when the model of the environment has errors. The algorithm handles homogeneous as well as heterogeneous agents, which can do different amounts of work per time unit and have different travel speeds. We show experimentally that the algorithm outperforms other methods in both a simple simulation and the RoboCup Rescue agent simulation. © 2018 Elsevier B.V. All rights reserved.
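To make the interplay between task growth and agent effort concrete, here is a single-task simulation under loudly stated assumptions: growth is multiplicative at a fixed rate per step, each agent removes a fixed amount of cost per step, and travel time is folded into an arrival step. None of this reproduces the paper's model or the Lazy max-sum algorithm; `simulate_task` and its parameters are hypothetical.

```python
def simulate_task(initial_cost, growth_rate, agents, horizon):
    """Simulate one task whose cost grows while agents work on it.

    growth_rate: assumed multiplicative growth per step (0.5 means +50%).
    agents: list of (arrival_step, work_per_step) pairs; an agent contributes
        work only from its arrival step on, which models travel time.
    Returns the cost history; the cost is clipped at zero (task contained).
    """
    cost = initial_cost
    history = [cost]
    for t in range(horizon):
        cost *= 1.0 + growth_rate  # natural growth of the task
        work = sum(w for arrival, w in agents if arrival <= t)
        cost = max(0.0, cost - work)  # agents' effort to contain the growth
        history.append(cost)
        if cost == 0.0:
            break
    return history


fast = simulate_task(10.0, 0.5, [(0, 8.0)], horizon=10)  # contained in 3 steps
slow = simulate_task(10.0, 0.5, [(0, 4.0)], horizon=5)   # too little work
late = simulate_task(10.0, 0.5, [(2, 8.0)], horizon=5)   # same agent, arrives late
```

The same agent that contains the task when it starts immediately fails to do so if it arrives two steps later, which is why travel time and heterogeneous work rates matter for the allocation.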
Maximising Sensor Network Efficiency Through Agent-Based Coordination of Sense/Sleep Schedules
In this paper we consider the problem of maximising the efficiency of a sensor network deployed for wide-area surveillance by coordinating the sense/sleep schedules of power-constrained, energy-harvesting sensor nodes. We propose a formal model of the wide-area surveillance problem that we face and theoretically analyse the performance of a sensor network (i.e., the probability that an event within the environment is detected) in the case of (i) continuously powered, (ii) randomly coordinated, and (iii) optimally coordinated sensors. We show that coordinating the sense/sleep schedules of the sensors can yield a significant increase in the performance of the network. Hence, we demonstrate that we can appropriately decompose the system-wide goal of maximising the probability that events are detected, so that it can be optimised using generic decentralised agent-based coordination algorithms (specifically, one based on the max-sum algorithm) that use only local communication and computation. We empirically evaluate our approach in a simulated environment and show that this decentralised algorithm is able to successfully coordinate the sense/sleep schedules of the sensors, while attaining results close to the theoretically indicated optimum.
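The gap between random and coordinated schedules can be illustrated with a deliberately simplified model, which is not the paper's formal model: time is split into k slots per period, each energy-harvesting sensor can sense in only one slot per period, all n sensors cover the same point, and an event occupies one uniformly random slot. Random schedules then detect with probability 1 - (1 - 1/k)^n, whereas staggering the sensors over distinct slots achieves min(n, k)/k.

```python
from fractions import Fraction


def p_continuous(n):
    """Continuously powered sensors: any covering sensor always detects."""
    return Fraction(1) if n >= 1 else Fraction(0)


def p_random(n, k):
    """Each sensor independently picks 1 of k slots: P(some sensor is awake)."""
    return 1 - (1 - Fraction(1, k)) ** n


def p_coordinated(n, k):
    """Sensors staggered over distinct slots: min(n, k) of the k slots are covered."""
    return Fraction(min(n, k), k)


for n, k in [(2, 4), (4, 4)]:
    print(n, k, float(p_random(n, k)), float(p_coordinated(n, k)))
```

With two sensors and four slots, random schedules detect 7/16 of events against 1/2 when coordinated; with four sensors the figures are 175/256 (about 0.68) against 1. Exact fractions are used so that these values can be checked without rounding error.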
Situation Assessment and Information Fusion: an experimental framework for performance evaluation
- …
