1,721,049 research outputs found

    Control of large distributed systems using games with pure strategy nash equilibria

    No full text
    Control mechanisms for optimisation in large distributed systems cannot be constructed based on traditional methods of control because they are typically characterised by distributed information and costly and/or noisy communication. Furthermore, noisy observations and dynamism are also inherent to these systems, so their control mechanisms need to be flexible, agile and robust in the face of these characteristics. In such settings, a good control mechanism should satisfy the following four design requirements: (i) it should produce high quality solutions, (ii) it should be robustness and flexibility in the face of additions, removals and failures of components, (iii) it should operate by making limited use of communication, and (iv) its operation should be computational feasible. Against this background, in order to satisfy these requirements, in this thesis we adopt a design approach based on dividing control over the system across a team of self–interested agents. Such multi–agent systems (MAS) are naturally distributed (matching the application domains in question), and by pursing their own private goals, the agents can collectively implement robust, flexible and scalable control mechanisms. In more detail, the design approach we adopt is (i) to use games with pure strategy Nash equilibria as a framework or template for constructing the agents’ utility functions, such that good solutions to the optimisation problem arise at the pure strategy Nash equilibria of the game, and (ii) to derive distributed techniques for solving the games for their Nash equilibria. The specific problems we tackle can be grouped into four main topics. First, we investigate a class of local algorithms for distributed constraint optimisation problems (DCOPs). We introduce a unifying analytical framework for studying such algorithms, and develop a parameterisation of the algorithm design space, which represents a mapping from the algorithms’ components to their performance according to each of our design requirements. Second, we develop a game–theoretic control mechanism for distributed dynamic task allocation and scheduling problems. The model in question is an expansion of DCOPs to encompass dynamic problems, and the control mechanism we derive builds on the insights from our first topic to address our four design requirements. Third, we elaborate a general class of problems including DCOPs with noisy rewards and state observations, which are realistic traits of great concern in real–world problems, and derive control mechanisms for these environments. These control mechanism allow the agents to either learn their reward functions or decide when to make observations of the world’s state and/or communicate their beliefs over the state of the world, in such a manner that they perform well according to our design requirements. Fourth, we derive an optimal algorithm for computing and optimising over pure strategy Nash equilibria in games with sparse interaction structure. By exploiting the structure present in many multi-agent interactions, this distributed algorithm can efficiently compute equilibria that optimise various criteria, thus reducing the computational burden on any one agent and operating using less communication than an equivalent centralised algorithms.For each of these topics, the control mechanisms that we derive are developed such that they perform well according to all four f our design requirements. In sum, by making the above contributions to these specific topics, we demonstrate that the general approach of using games with pure strategy Nash equilibria as a template for designing MAS produces good control mechanisms for large distributed systems

    Filtered Fictitious Play for Perturbed Observation Potential Games and Decentralised POMDPs

    No full text
    Potential games and decentralised partially observable MDPs (Dec–POMDPs) are two commonly used models of multi–agent interaction, for static optimisation and sequential decision-making settings, respectively. In this paper we introduce filtered fictitious play for solving repeated potential games in which each player’s observations of others’ actions are perturbed by random noise, and use this algorithm to construct an online learning method for solving Dec–POMDPs. Specifically, we prove that noise in observations prevents standard fictitious play from converging to Nash equilibrium in potential games, which also makes fictitious play impractical for solving Dec–POMDPs. To combat this, we derive filtered fictitious play, and provide conditions under which it converges to a Nash equilibrium in potential games with noisy observations. We then use filtered fictitious play to construct a solver for Dec–POMDPs, and demonstrate our new algorithm’s performance in a box pushing problem. Our results show that we consistently outperform the state-of-the-art Dec-POMDP solver by an average of 100% across the range of noise in the observation function

    Benchmarking hybrid algorithms for distributed constraint optimisation games

    No full text
    In this paper, we consider algorithms for distributed constraint optimisation problems (DCOPs). Using a potential game characterisation of DCOPs, we decompose eight DCOP algorithms, taken from the game theory and computer science literatures, into their salient components. We then use these components to construct three novel hybrid algorithms. Finally, we empirical evaluate all eleven algorithms, in terms of solution quality, timeliness and communication resources used, in a series of graph colouring experiments. Our experimental results show the existence of several performance trade-offs (such as quick convergence to a solution, but with a cost of high communication needs), which may be exploited by a system designer to tailor a DCOP algorithm to suit their mix of requirements

    A Parameterisation of Algorithms for Distributed Constraint Optimisation via Potential Games

    No full text
    This paper introduces a parameterisation of learning algorithms for distributed constraint optimisation problems (DCOPs). This parameterisation encompasses many algorithms developed in both the computer science and game theory literatures. It is built on our insight that when formulated as noncooperative games, DCOPs form a subset of the class of potential games. This result allows us to prove convergence properties of algorithms developed in the computer science literature using game theoretic methods. Furthermore, our parameterisation can assist system designers by making the pros and cons of, and the synergies between, the various DCOP algorithm components clear

    A Unifying Framework for Iterative Approximate Best–Response Algorithms for Distributed Constraint Optimisation Problems

    No full text
    Distributed constraint optimisation problems (DCOPs) are important in many areas of computer science and optimisation. In a DCOP, each variable is controlled by one of many autonomous agents, who together have the joint goal of maximising a global objective function. A wide variety of techniques have been explored to solve such problems, and here we focus on one of the main families, namely iterative approximate best–response algorithms used as local search algorithms for DCOPs. We define these algorithms as those in which, at each iteration, agents communicate only the states of the variables under their control to their neighbours on the constraint graph, and that reason about their next state based on the messages received from their neighbours. These algorithms include the distributed stochastic algorithm and stochastic coordination algorithms, the maximum–gain messaging algorithms, the families of fictitious play and adaptive play algorithms, and algorithms that use regret–based heuristics. This family of algorithms is commonly employed in real world systems, as they can be used in domains where communication is difficult or costly, where it is appropriate to trade timeliness off against optimality, or where hardware limitations render complete or more computationally intensive algorithms unusable. However, until now, no overarching framework has existed for analysing this broad family of algorithms, resulting in similar and overlapping work being published independently in several different literatures. The main contribution of this paper, then, is the development of a unified analytical framework for studying such algorithms. This framework is built on our insight that when formulated as noncooperative games, DCOPs form a subset of the class of potential games. This result allows us to prove convergence properties of iterative approximate best–response algorithms developed in the computer science literature using game theoretic methods (which also shows that such algorithms can also be applied to the more general problem of finding Nash equilibria in potential games), and, conversely, also allows us to show that many game–theoretic algorithms can be used to solve DCOPs. By so doing, our framework can assist system designers by making the pros and cons of, and the synergies between, the various iterative approximate best–response DCOP algorithm components clear

    Learning in unknown reward games: application to sensor networks

    No full text
    This paper demonstrates a decentralised method for optimisation using game-theoretic multi-agent techniques, applied to a sensor network management problem. Our first major contribution is to show how the marginal contribution utility design is used to construct a unknown-reward potential game formulation of the problem. This formulation exploits the sparse structure of sensor networkproblems, and allows us to apply a bound to the price of anarchy of the Nash equilibria of theinduced game. Furthermore, since the game is a potential game, solutions can be found using multiagentlearning techniques. The techniques we derive use Q-learning to estimate an agent’s rewards,while an action adaptation process responds to an agent’s opponents’ behaviour. However, there aremany different algorithmic configurations that could be used to solve these games. Thus, our secondmajor contribution is an extensive evaluation of several action adaptation processes. Specifically,we compare six algorithms across a variety of parameter settings to ascertain the quality of thesolutions they produce, their speed of convergence, and their robustness to pre-specified parameterchoices. Our results show that they each perform similarly across a wide range of parameters.There is, however, a significant effect from moving to a learning policy with sampling probabilitiesthat go to zero too quickly for rewards to be accurately estimated

    Decentralised Dynamic Task Allocation: A Practical Game–Theoretic Approach

    No full text
    This paper reports on a novel decentralised technique for planning agent schedules in dynamic task allocation problems. Specifically, we use a Markov game formulation of these problems for tasks with varying hard deadlines and processing requirements. We then introduce a new technique for approximating this game using a series of static potential games, before detailing a decentralised solution method for the approximating games that uses the Distributed Stochastic Algorithm. Finally, we discuss an implementation of our approach to a task allocation problem in the RoboCup Rescue disaster management simulator. Our results show that our technique performs comparably to a centralised task scheduler (within 6% on average), and also, unlike its centralised counterpart, it is robust to restrictions on the agents' communication and observation range

    Knapsack based optimal policies for budget-limited multi-armed bandits

    No full text
    In budget-limited multi-armed bandit (MAB) problems, the learner’s actions are costly and constrained by a fixed budget. Consequently, an optimal exploitation policy may not be to pull the optimal arm repeatedly, as is the case in other variants of MAB, but rather to pull the sequence of different arms that maximises the agent’s total reward within the budget. This difference from existing MABs means that new approaches to maximising the total reward are required. Given this, we develop two pulling policies, namely: (i) KUBE; and (ii) fractional KUBE. Whereas the former provides better performance up to 40% in our experimental settings, the latter is computationally less expensive. We also prove logarithmic upper bounds for the regret of both policies, and show that these bounds are asymptotically optimal (i.e. they only differ from the best possible regret by a constant factor)
    corecore