
    Client-master multiagent deep reinforcement learning for task offloading in mobile edge computing

    As mobile applications grow in complexity, there is an increasing need to perform computationally intensive tasks. However, user devices (UDs), such as tablets and smartphones, have limited capacity to carry out the required computations. Task offloading in mobile edge computing (MEC) is a strategy that meets this demand by distributing tasks between UDs and servers. Deep reinforcement learning (DRL) is a promising solution for this strategy because it can adapt to dynamic changes and minimize online computational complexity. However, various types of continuous and discrete resource constraints on UDs and MEC servers pose challenges to the design of an efficient DRL algorithm. Existing DRL-based task-offloading algorithms focus on the constraints of the UDs, assuming the availability of enough resources on the server. Moreover, existing multiagent DRL (MADRL)-based task-offloading algorithms use homogeneous agents and consider homogeneous constraints as a penalty in their reward function. We propose a novel Client-Master MADRL (CMMADRL) algorithm for task offloading in MEC that uses client agents at the UDs to decide on their resource requirements and a master agent at the server to make a combinatorial action selection based on the decisions of the UDs. CMMADRL is shown to achieve up to 59% improvement in performance over existing benchmark and heuristic algorithms.

    Combinatorial client-master multiagent deep reinforcement learning for task offloading in mobile edge computing: extended abstract

    Deep reinforcement learning (DRL) is gaining attention in task-offloading problems because it can adapt to dynamic changes and minimize online computational complexity. However, the various types of continuous and discrete resource constraints on the user devices (UDs) and mobile edge computing (MEC) servers pose challenges to the design of an efficient DRL-based task-offloading strategy. Existing DRL-based task-offloading algorithms focus on the constraints of the UDs, assuming the availability of enough storage resources on the server. Moreover, existing multiagent DRL (MADRL)-based task-offloading algorithms use homogeneous agents and consider homogeneous constraints as a penalty in their reward function. We propose a novel combinatorial client-master MADRL (CCM_MADRL) algorithm for task offloading in mobile edge computing (CCM_MADRL_MEC) that enables UDs to decide their resource requirements and the server to make a combinatorial decision based on the requirements of the UDs. CCM_MADRL_MEC is the first MADRL algorithm in task offloading to consider server storage capacity in addition to the constraints on the UDs. By taking advantage of the combinatorial action selection, CCM_MADRL_MEC shows superior convergence over existing benchmark and heuristic algorithms.
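
    The idea of a server choosing a combination of UD requests under a storage limit can be illustrated with a small sketch. This is not the CCM_MADRL_MEC algorithm: the abstract does not describe how the master agent scores combinations, so the sketch assumes each request already carries a hypothetical value estimate and a storage demand, and it simply enumerates subsets. The function name `select_offloaders` and all numbers are illustrative assumptions.

```python
from itertools import combinations


def select_offloaders(requests, capacity):
    """Pick the subset of client requests with the highest total value
    whose combined storage demand fits within the server capacity.

    requests: dict mapping client id -> (value_estimate, storage_demand).
    Both fields are hypothetical stand-ins for whatever a learned master
    agent would use to score a combination.
    """
    best_subset, best_value = (), 0.0
    clients = sorted(requests)
    for size in range(1, len(clients) + 1):
        for subset in combinations(clients, size):
            demand = sum(requests[c][1] for c in subset)
            if demand > capacity:
                continue  # combination violates the server storage limit
            value = sum(requests[c][0] for c in subset)
            if value > best_value:
                best_subset, best_value = subset, value
    return best_subset, best_value


# Toy example: three UDs request offloading, server storage capacity is 6.
requests = {"ud1": (3.0, 4), "ud2": (2.0, 3), "ud3": (2.5, 3)}
chosen, total = select_offloaders(requests, capacity=6)
```

    In this toy case the single most valuable request (`ud1`) is not selected, because the pair `ud2` and `ud3` fits the capacity and has a higher combined value. Exhaustive enumeration only scales to a handful of clients.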

    Dataset in support of the thesis 'Temporal dynamics in emergent communication'

    Olaf Lipinski PhD Thesis Dataset/Code
    This dataset contains the code bases for every chapter of the thesis, including instructions on how to run them. The code bases present are:
    * Code for the Temporal Progression Games - accompanying our paper "Speaking Your Language: Spatial Relationships in Interpretable Emergent Communication". This code allows for analysis of spatial relationships in emergent communication.
    * Code for the Temporal Referential Games - accompanying our paper "It’s About Time: Temporal References in Emergent Communication". This code provides a new architecture and dataset for Emergent Communication research. We introduce a variant of the well-known referential games, where we include a temporal aspect to the communication. This is done through skewing the target distribution to include target repetitions at random intervals. Through this, we aim to study how and when temporal references can emerge between agents.
    * Code for the Emergent Communication in Werewolf - accompanying our paper "Emergent Password Signalling in the Game of Werewolf". This code analyses the impact of communication time and voting plurality in Emergent Communication in the game of Werewolf.
    * Code for emlangkit - A toolkit that aims to collect all metrics currently used in emergent communication research into one place. The usage should be convenient and the inputs should be standardised, to ease adoption and spread of these metrics.
    Related publications:
    1. Olaf Lipinski, Adam J. Sobey, Federico Cerutti, and Timothy J. Norman. Speaking Your Language: Spatial Relationships in Interpretable Emergent Communication. In Neural Information Processing Systems (NeurIPS), December 2024.
    2. Olaf Lipinski, Adam J. Sobey, Federico Cerutti, and Timothy J. Norman. It’s About Time: Temporal References in Emergent Communication. In arXiv:2310.06555 (Under review), October 2023.
    3. Olaf Lipinski, Adam J. Sobey, Federico Cerutti, and Timothy J. Norman. Emergent Password Signalling in the Game of Werewolf. In Emergent Communication Workshop at ICLR 2022, April 2022.

    HAVA: hybrid approach to value-alignment through reward weighing for reinforcement learning

    Our society is governed by a set of norms which together bring about the values we cherish, such as safety, fairness or trustworthiness. The goal of value alignment is to create agents that not only do their tasks but through their behaviours also promote these values. Many of the norms are written as laws or rules (legal / safety norms) but even more remain unwritten (social norms). Furthermore, the techniques used to represent these norms also differ. Legal / safety norms are often represented explicitly, for example in some logical language, while social norms are typically learned and remain hidden in the parameter space of a neural network. There is a lack of approaches in the literature that could combine these various norm representations into a single algorithm. We propose a novel method that integrates these norms into the reinforcement learning process. Our method monitors the agent's compliance with the given norms and summarizes it in a quantity we call the agent's reputation. This quantity is used to weigh the received rewards to motivate the agent to become value-aligned. We carry out a series of experiments, including a continuous state space traffic problem, to demonstrate the importance of the written and unwritten norms and show how our method can find the value-aligned policies. Furthermore, we carry out ablations to demonstrate why it is better to combine these two groups of norms rather than using either separately.
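
    As a rough illustration of weighing rewards by a reputation value, the sketch below tracks compliance with an exponential moving average and scales each reward by the result. The abstract does not give HAVA's actual update rule, so the moving average, the `decay` parameter, the class name `ReputationTracker` and the assumption of non-negative rewards are all hypothetical choices made for this example.

```python
class ReputationTracker:
    """Summarise norm compliance as a single value in [0, 1].

    Assumption: reputation is an exponential moving average of a binary
    compliance signal. This is a stand-in, not the published method.
    """

    def __init__(self, decay=0.9, initial=1.0):
        self.decay = decay
        self.reputation = initial

    def update(self, compliant):
        # Move reputation towards 1 on compliance and towards 0 on violation.
        target = 1.0 if compliant else 0.0
        self.reputation = self.decay * self.reputation + (1 - self.decay) * target
        return self.reputation


def weighted_reward(reward, reputation):
    # Scale the environment reward by the current reputation
    # (assumes non-negative rewards, so low reputation always hurts).
    return reputation * reward


tracker = ReputationTracker(decay=0.5)
rewards = []
for reward, compliant in [(1.0, True), (1.0, False), (1.0, True)]:
    reputation = tracker.update(compliant)
    rewards.append(weighted_reward(reward, reputation))
```

    With these toy inputs, a single violation halves the reward the agent receives on that step, and the reward recovers only gradually once the agent complies again.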

    Dataset supporting the thesis "Explaining the future context of deep reinforcement learning agents’ decision-making"

    By Mark Towers, supervised by Prof. Timothy Norman, Dr Yali Du, and Prof. Chris Freeman.
    This dataset contains folders for all three research chapters (Chapters 4, 5 and 6):
    * temporal-explanations-4-drl (Chapter 4) was published as "Temporal Explanations for Deep Reinforcement Learning" at AAMAS EXTRAAMAS 2024 (https://link.springer.com/chapter/10.1007/978-3-031-70074-3_6)
    * temporal-reward-decomposition (Chapter 5) was published as "Explaining an Agent's Future Beliefs through Temporally Decomposing Future Reward Estimators" at ECAI 2024 (https://arxiv.org/abs/2408.08230)
    * eval-xrl-goal-identification (Chapter 6) is currently unpublished
    Each folder contains its own readme with more details, and the folders are also available at https://github.com/pseudo-rnd-thoughts/{folder-name}
    DOI: https://doi.org/10.5258/SOTON/D3553

    Speaking your language: spatial relationships in interpretable emergent communication

    Effective communication requires the ability to refer to specific parts of an observation in relation to others. While the emergent communication literature shows success in developing various language properties, no research has shown the emergence of such positional references. This paper demonstrates how agents can communicate about spatial relationships within their observations. The results indicate that agents can develop a language capable of expressing the relationships between parts of their observation, achieving over 90% accuracy when trained in a referential game which requires such communication. Using a collocation measure, we demonstrate how the agents create such references. This analysis suggests that agents use a mixture of non-compositional and compositional messages to convey spatial relationships. We also show that the emergent language is interpretable by humans. The translation accuracy is tested by communicating with the receiver agent, where the receiver achieves over 78% accuracy using parts of this lexicon, confirming that the interpretation of the emergent language was successful.

    Going Beyond Counting First Authors in Author Co-citation Analysis

    The present study examines one of the fundamental aspects of author co-citation analysis (ACA) - the way co-citation counts are defined. Co-citation counting provides the data on which all subsequent statistical analyses and mappings are based, and we compare ACA results based on two different types of co-citation counting: the traditional type, which counts only the first author of a cited work, and a non-traditional type, which takes into account the first five authors of a cited work. Results indicate that the picture produced through this non-traditional author co-citation counting contains more coherent author groups and is therefore considerably clearer. However, this picture represents fewer specialties in the research field being studied than that produced through the traditional first-author co-citation counting when the same number of top-ranked authors is selected and analyzed. Reasons for these effects are discussed.
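
    The difference between the two counting schemes can be made concrete with a small sketch. It assumes each citing paper is given as a list of cited works and each cited work as an ordered author list; the function name `cocitation_counts`, the toy data, and the convention that co-authors of the same cited work also count as co-cited are assumptions for illustration, not details taken from the study.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citing_papers, max_authors=1):
    """Count how often pairs of authors are cited together.

    citing_papers: list of reference lists; each reference is an ordered
    list of author names. max_authors=1 reproduces first-author counting;
    max_authors=5 considers the first five authors of each cited work.
    """
    counts = Counter()
    for references in citing_papers:
        # Collect the authors this paper cites, each at most once per paper.
        cited = set()
        for authors in references:
            cited.update(authors[:max_authors])
        # Each unordered pair of distinct cited authors is co-cited once.
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


# Toy data: two citing papers, each citing two works.
papers = [
    [["White", "Griffith"], ["Small"]],
    [["White", "McCain"], ["Small", "Griffith"]],
]
first_author = cocitation_counts(papers, max_authors=1)
first_five = cocitation_counts(papers, max_authors=5)
```

    On this toy data, first-author counting yields a single co-cited pair (Small and White), while counting up to five authors yields six pairs, which shows how the choice of counting scheme changes the data that later analyses and mappings are built on.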