
    Teacher-apprentices RL (TARL): leveraging complex policy distribution through generative adversarial hypernetwork in reinforcement learning

    Typically, a Reinforcement Learning (RL) algorithm focuses on learning a single deployable policy as the end product. Depending on the initialization method and the random seed, learning a single policy can lead to convergence to different local optima across runs, especially when the algorithm is sensitive to hyper-parameter tuning. Motivated by the capability of Generative Adversarial Networks (GANs) to learn complex data manifolds, an adversarial training procedure can be used to learn a population of good-performing policies instead. We extend the teacher-student methodology of the Knowledge Distillation field, established for typical deep neural network prediction tasks, to the RL paradigm. Instead of learning a single compressed student network, a generative model (a hypernetwork) is trained adversarially to output the network weights of a population of good-performing policy networks, representing a school of apprentices. Our proposed framework, named Teacher-Apprentices RL (TARL), is modular and can be used in conjunction with many existing RL algorithms. We illustrate the performance gain and improved robustness by combining TARL with various types of RL algorithms, including the direct policy search Cross-Entropy Method, Q-learning, Actor-Critic, and policy gradient-based methods.
    Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
    Interactive Intelligence
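The idea of a hypernetwork that outputs a whole population of policies can be sketched in a few lines. This is a minimal pure-Python illustration under our own assumptions, not TARL's implementation: the generator is a single hypothetical linear layer mapping latent noise to the flattened weights of a linear policy, and the adversarial training loop (discriminator and teacher data) is omitted entirely.

```python
import random

OBS_DIM, N_ACTIONS, LATENT_DIM = 4, 2, 3
N_POLICY_PARAMS = OBS_DIM * N_ACTIONS  # weights of a linear policy (no bias)


class LinearHypernetwork:
    """Hypothetical generator: maps latent noise z to policy weights theta = W z + b."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(LATENT_DIM)]
                  for _ in range(N_POLICY_PARAMS)]
        self.b = [rng.gauss(0.0, 0.1) for _ in range(N_POLICY_PARAMS)]

    def generate(self, z):
        # One output per policy parameter: a dot product of a weight row with z, plus bias.
        return [sum(w * zi for w, zi in zip(row, z)) + bi
                for row, bi in zip(self.W, self.b)]


def act(theta, obs):
    """Greedy action of a linear policy whose weights are stored row-major per action."""
    scores = [sum(theta[a * OBS_DIM + i] * obs[i] for i in range(OBS_DIM))
              for a in range(N_ACTIONS)]
    return max(range(N_ACTIONS), key=lambda a: scores[a])


def sample_population(hypernet, size, seed=1):
    """Draw one latent per apprentice and decode it into a full set of policy weights."""
    rng = random.Random(seed)
    return [hypernet.generate([rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)])
            for _ in range(size)]


if __name__ == "__main__":
    population = sample_population(LinearHypernetwork(), size=5)
    obs = [0.1, -0.2, 0.05, 0.3]
    print([act(theta, obs) for theta in population])
```

Because every apprentice is decoded from its own latent sample, the population can be evaluated or deployed as an ensemble; in the approach described above the generator would additionally be trained adversarially, which this sketch leaves out.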

    BADDr: Bayes-Adaptive Deep Dropout RL for POMDPs

    While reinforcement learning (RL) has made great advances in scalability, exploration and partial observability are still active research topics. In contrast, Bayesian RL (BRL) provides a principled answer to both state estimation and the exploration-exploitation trade-off, but struggles to scale. To tackle this challenge, BRL frameworks with various prior assumptions have been proposed, with varied success. This work presents a representation-agnostic formulation of BRL under partial observability, unifying the previous models under one theoretical umbrella. To demonstrate its practical significance, we also propose a novel derivation, Bayes-Adaptive Deep Dropout RL (BADDr), based on dropout networks. Under this parameterization, in contrast to previous work, inferring the belief over the state and dynamics is a more scalable problem. We choose actions through Monte-Carlo tree search and empirically show that our method is competitive with state-of-the-art BRL methods on small domains while being able to solve much larger ones.
    Interactive Intelligence
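The intuition behind using dropout networks for Bayesian RL is that each random dropout mask selects a different plausible dynamics model, so a set of masks behaves like samples from a belief over the dynamics. The sketch below illustrates only that generic Monte-Carlo-dropout idea on a hypothetical one-layer linear transition model; it is not BADDr's parameterization, belief update, or tree search.

```python
import random


def sample_mask(n, p_drop, rng):
    """Inverted dropout mask: keep each unit with prob 1 - p_drop and rescale kept units."""
    keep = 1.0 - p_drop
    return [(1.0 / keep) if rng.random() < keep else 0.0 for _ in range(n)]


def predict_next_state(weights, state, action, mask):
    """Hypothetical linear transition model s' = W [s; a], with dropout applied to the inputs."""
    x = state + [float(action)]
    x = [xi * mi for xi, mi in zip(x, mask)]
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def dynamics_particles(weights, state, action, n_particles, p_drop=0.2, seed=0):
    """One prediction per sampled mask; the spread of predictions reflects model uncertainty."""
    rng = random.Random(seed)
    n_inputs = len(state) + 1
    return [predict_next_state(weights, state, action,
                               sample_mask(n_inputs, p_drop, rng))
            for _ in range(n_particles)]


if __name__ == "__main__":
    W = [[1.0, 0.1, 0.5],   # hypothetical weights for next position
         [0.0, 0.9, 1.0]]   # hypothetical weights for next velocity
    for prediction in dynamics_particles(W, state=[0.0, 1.0], action=1, n_particles=8):
        print(prediction)
```

With `p_drop=0.0` every mask keeps all inputs and all particles collapse to the same prediction; increasing `p_drop` spreads them out, which a planner could exploit for exploration.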

    Refined Risk Management in Safe Reinforcement Learning with a Distributional Safety Critic

    Safety is critical to broadening the real-world use of reinforcement learning (RL). Modeling the safety aspects using a safety-cost signal separate from the reward is becoming standard practice, since it avoids the problem of finding a good balance between safety and performance. However, the distribution of the total safety-cost across different trajectories is still largely unexplored. In this paper, we propose an actor-critic method for safe RL that uses an implicit quantile network to approximate the distribution of accumulated safety-costs. Using an accurate estimate of the distribution of accumulated safety-costs, in particular of its upper tail, greatly improves the performance of risk-averse RL agents. The empirical analysis shows that our method achieves good risk control in complex safety-constrained environments.
    Algorithmics, Intelligent Electrical Power Grid
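Two ingredients mentioned above can be made concrete: the quantile (pinball) loss used to fit a quantile network, and a risk measure that focuses on the upper tail of the accumulated safety-cost distribution. This sketch assumes tail risk is measured by the conditional value-at-risk (CVaR) and uses a plain pinball loss rather than the quantile Huber loss common in implicit quantile network implementations; the quantile values and the budget are hypothetical.

```python
import math


def quantile_loss(tau, predicted, target):
    """Pinball loss: under-estimates weighted by tau, over-estimates by 1 - tau."""
    u = target - predicted
    return u * (tau - (1.0 if u < 0 else 0.0))


def upper_tail_cvar(cost_quantiles, alpha):
    """Mean of the worst alpha-fraction of accumulated safety-cost quantiles."""
    ordered = sorted(cost_quantiles)
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[-k:]) / k


def is_risk_acceptable(cost_quantiles, alpha, budget):
    """Risk-averse check: accept only if the tail cost stays within the (hypothetical) budget."""
    return upper_tail_cvar(cost_quantiles, alpha) <= budget


if __name__ == "__main__":
    # Hypothetical critic output: 10 quantiles of the accumulated safety-cost.
    quantiles = [0.0, 0.0, 0.5, 1.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0]
    print(sum(quantiles) / len(quantiles))         # risk-neutral view: the mean
    print(upper_tail_cvar(quantiles, alpha=0.2))   # risk-averse view: the upper tail
```

With these hypothetical quantiles the mean accumulated cost is 3.35 while the upper-tail CVaR at alpha = 0.2 is 10.5, the kind of gap a critic that only estimates the expected cost would hide.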

    qgym: A Gym for Training and Benchmarking RL-Based Quantum Compilation

    Compiling a quantum circuit for specific quantum hardware is a challenging task. Moreover, current quantum computers have severe hardware limitations. To make the most of the limited resources, the compilation process should be optimized. To improve current methods, Reinforcement Learning (RL), a technique in which an agent interacts with an environment to learn complex policies to attain a specific goal, can be used. In this work, we present qgym, a software framework derived from the OpenAI gym, together with environments that are specifically tailored towards quantum compilation. The goal of qgym is to connect the research fields of Artificial Intelligence (AI) and quantum compilation by abstracting away parts of the process that are irrelevant to either domain. It can be used to train and benchmark RL agents and algorithms in highly customizable environments.
    Quantum Circuit Architectures and Technology
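To give a feel for what a Gym-style environment for one compilation step might look like, here is a toy environment for initial qubit mapping written against the familiar `reset`/`step` convention (as in Gymnasium, `step` returns observation, reward, terminated, truncated, info). It is a hypothetical stand-alone sketch, not qgym's API: the class name, the action encoding, and the reward (number of interacting logical-qubit pairs placed on connected physical qubits) are our own assumptions.

```python
class ToyInitialMappingEnv:
    """Hypothetical env: place each logical qubit on a distinct physical qubit."""

    def __init__(self, connectivity, interactions, n_qubits):
        self.connectivity = {frozenset(e) for e in connectivity}  # physical hardware edges
        self.interactions = [tuple(e) for e in interactions]       # interacting logical pairs
        self.n = n_qubits
        self.mapping = {}

    def reset(self, seed=None):
        # seed is accepted for Gymnasium-style compatibility but unused here.
        self.mapping = {}  # logical qubit -> physical qubit
        return dict(self.mapping), {}

    def step(self, action):
        logical, physical = divmod(action, self.n)  # one action = one (logical, physical) pair
        if logical in self.mapping or physical in self.mapping.values():
            # Illegal placement: small penalty, state unchanged.
            return dict(self.mapping), -1.0, False, False, {"illegal": True}
        self.mapping[logical] = physical
        terminated = len(self.mapping) == self.n
        reward = float(self._score()) if terminated else 0.0
        return dict(self.mapping), reward, terminated, False, {}

    def _score(self):
        """Count logical interactions that land on directly connected physical qubits."""
        return sum(
            frozenset((self.mapping[a], self.mapping[b])) in self.connectivity
            for a, b in self.interactions
        )


if __name__ == "__main__":
    # Physical qubits form a chain 0-1-2; logical qubit 1 interacts with 0 and 2.
    env = ToyInitialMappingEnv([(0, 1), (1, 2)], [(0, 1), (1, 2)], n_qubits=3)
    obs, info = env.reset()
    for logical, physical in [(1, 1), (0, 0), (2, 2)]:
        obs, reward, terminated, truncated, info = env.step(logical * env.n + physical)
    print(obs, reward, terminated)
```

Placing logical qubit 1 in the middle of the chain satisfies both interactions and earns the maximum reward of 2.0; an RL agent would have to discover such placements from the episode reward alone.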

    Influence-Augmented Local Simulators: a Scalable Solution for Fast Deep RL in Large Networked Systems

    Learning effective policies for real-world problems is still an open challenge for the field of reinforcement learning (RL). The main limitations are the amount of data needed and the pace at which that data can be obtained. In this paper, we study how to build lightweight simulators of complicated systems that can run sufficiently fast for deep RL to be applicable. We focus on domains where agents interact with a reduced portion of a larger environment while still being affected by the global dynamics. Our method combines local simulators with learned models that mimic the influence of the global system. The experiments reveal that incorporating this idea into the deep RL workflow can considerably accelerate the training process, and they point to several opportunities for future work.
    Interactive Intelligence, Algorithmics
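The structure of an influence-augmented local simulator can be sketched as a small local model plus a predictor that samples the variables through which the rest of the system affects it. In this hypothetical example the local region is a single traffic queue, the influence source is whether a vehicle arrives from the surrounding network, and the predictor is a frequency table conditioned on the local observation, fitted on data that would come from the full (slow) global simulator. The paper uses learned models; the counting predictor here is only a stand-in.

```python
import random
from collections import defaultdict


class CountingInfluencePredictor:
    """Estimates P(arrival from the global system | local observation) by counting."""

    def __init__(self, prior=0.5):
        self.prior = prior  # used for observations never seen in the data
        self.counts = defaultdict(lambda: [0, 0])  # obs -> [arrivals, visits]

    def fit(self, samples):
        """samples: (local_obs, arrived) pairs recorded from the global simulator."""
        for obs, arrived in samples:
            self.counts[obs][0] += int(arrived)
            self.counts[obs][1] += 1

    def prob(self, obs):
        arrivals, visits = self.counts.get(obs, (0, 0))
        return arrivals / visits if visits else self.prior


class LocalQueueSimulator:
    """Cheap local simulator: only the queue is modelled; arrivals come from the predictor."""

    def __init__(self, predictor, seed=0):
        self.predictor = predictor
        self.rng = random.Random(seed)
        self.queue = 0

    def observation(self):
        return min(self.queue, 3)  # coarse local observation: queue length capped at 3

    def step(self, serve):
        if serve and self.queue > 0:
            self.queue -= 1
        # Sample the influence instead of simulating the whole network.
        if self.rng.random() < self.predictor.prob(self.observation()):
            self.queue += 1
        return self.observation(), -float(self.queue)  # reward penalizes waiting vehicles


if __name__ == "__main__":
    predictor = CountingInfluencePredictor()
    predictor.fit([(0, True), (0, False), (1, True), (1, True), (2, False)])
    sim = LocalQueueSimulator(predictor)
    for t in range(5):
        print(sim.step(serve=(t % 2 == 1)))
```

The design choice is that each step costs one table lookup and one random draw, which is what makes such a simulator fast enough for deep RL; the fidelity then depends entirely on how well the predictor captures the global system's influence.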

    Self-supervision, Data augmentation and Online fine-tuning for Offline RL

    Reinforcement learning (RL) methods learn through interaction with an environment. The RL paradigm is inherently designed to be performed in an online fashion. However, for many real-world applications, learning online is not always feasible due to resource and/or safety constraints. Unlike online RL, offline RL, the main topic of this thesis, allows the agent to learn policies from previously collected datasets. Current RL algorithms have a number of other major limitations, among them data inefficiency. Two promising streams of research that address this limitation are self-supervised methods and data augmentation. These methods were, however, developed for online RL, and it is not yet clear whether their benefits translate to the offline case. Moreover, it is not always ideal to eliminate online environment interaction altogether. Both online RL and offline RL have their individual advantages and disadvantages. Algorithms that combine both approaches, e.g., via offline pre-training and online fine-tuning, can draw on the best of both worlds. Consequently, there is a need for RL agents that can learn both online and offline in a data-efficient way. In this thesis, we improve the learning performance of offline RL algorithms by integrating existing self-supervised methods, data augmentations and online fine-tuning into the learning process. We select three established self-supervised online RL architectures (CURL, SPR, SGI) and five prominent data augmentations and adapt them for the offline setting. We then augment a state-of-the-art offline RL algorithm, Conservative Q-Learning (CQL), with the selected methods and compare them against five established baselines. We empirically evaluate all algorithms on both discrete and continuous control tasks using offline Atari and Gym-MuJoCo datasets, respectively. To this end, we select four Atari games (Pong, Breakout, Seaquest, QBert) and three Gym-MuJoCo tasks (Halfcheetah, Hopper, Walker-2d) for our experiments.
    Our results show that self-supervised methods and data augmentations can outperform the baseline agents and considerably improve the learning performance of offline RL algorithms on Gym-MuJoCo, but are not beneficial on Atari. Furthermore, we investigate how offline pre-training followed by online fine-tuning affects the learning performance of the selected offline RL algorithm. Our results further demonstrate that hybrid algorithms that learn both offline and online can be far superior to learning online or offline alone.
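Conservative Q-Learning, the base algorithm named above, adds a regularizer that pushes down Q-values of actions not supported by the dataset. For discrete actions, a common form of that regularizer is the log-sum-exp of the Q-values minus the Q-value of the logged action. The sketch below computes this penalty for a single transition and pairs it with a simple additive Gaussian-noise state augmentation; the noise augmentation, the weight `alpha`, and the function names are illustrative assumptions, not the thesis's exact augmentations or code.

```python
import math
import random


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_penalty(q_values, data_action):
    """CQL(H)-style regularizer for one transition with discrete actions."""
    return logsumexp(q_values) - q_values[data_action]


def cql_loss(q_values, data_action, td_target, alpha=1.0):
    """Squared TD error on the logged action plus the weighted conservative penalty."""
    td_error = q_values[data_action] - td_target
    return 0.5 * td_error ** 2 + alpha * cql_penalty(q_values, data_action)


def gaussian_noise_augment(state, sigma, rng):
    """Illustrative augmentation for low-dimensional (e.g. MuJoCo-like) state vectors."""
    return [s + rng.gauss(0.0, sigma) for s in state]


if __name__ == "__main__":
    rng = random.Random(0)
    print(gaussian_noise_augment([0.1, -0.3, 0.7], sigma=0.01, rng=rng))
    print(cql_penalty([1.0, 1.0], data_action=0))  # equal Q-values: penalty is log(2)
    print(cql_penalty([5.0, 0.0], data_action=1))  # an out-of-data action dominates: about 5.0
```

The penalty is never negative and grows when an action absent from the data is valued far above the logged one, which is what keeps the learned Q-function conservative on offline data.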

    Subtask-masked curriculum learning for reinforcement learning with application to UAV maneuver decision-making

    Learning Unmanned Aerial Vehicle (UAV) maneuver strategies remains a challenge for Reinforcement Learning (RL) because the task has sparse rewards. In this paper, we propose Subtask-Masked curriculum learning for RL (SUBMAS-RL), an efficient RL paradigm that implements curriculum learning and knowledge transfer for UAV maneuver scenarios involving multiple missiles. First, this study introduces a novel concept, the subtask mask, which creates source tasks from a target task by masking some of its subtasks. Then, a subtask-masked curriculum generation method is proposed to generate a sequenced curriculum by alternately conducting task generation and task sequencing. To establish efficient knowledge transfer and avoid negative transfer, this paper employs two transfer techniques, policy distillation and policy reuse, along with an explicit transfer condition that masks irrelevant knowledge. Experimental results demonstrate that our method achieves a 94.8% success rate in the UAV maneuver scenario, where the direct use of reinforcement learning always fails. The proposed RL framework SUBMAS-RL is expected to learn effective policies in complex tasks with sparse rewards.
    Algorithmics
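The idea of creating source tasks by masking subtasks of a target task can be sketched in a few lines. This is a simplified, hypothetical version: every non-empty subset of subtasks becomes a source task, the curriculum is ordered only by the number of active subtasks (the paper instead alternates task generation and task sequencing), and the transfer condition shown, reusing only sources whose active subtasks all appear in the current task, is our own assumption. The subtask names are invented for illustration.

```python
from itertools import combinations


def masked_source_tasks(subtasks):
    """Every non-empty subset of subtasks, i.e. the target task with the rest masked out."""
    tasks = []
    for k in range(1, len(subtasks) + 1):
        for active in combinations(subtasks, k):
            tasks.append({s: (s in active) for s in subtasks})  # True = subtask kept
    return tasks


def naive_curriculum(subtasks):
    """Order source tasks from fewest to most active subtasks; the full target comes last."""
    return sorted(masked_source_tasks(subtasks), key=lambda m: sum(m.values()))


def relevant_sources(learned_tasks, task):
    """Hypothetical transfer condition: reuse only sources whose active subtasks are in task."""
    active = {s for s, on in task.items() if on}
    return [m for m in learned_tasks
            if {s for s, on in m.items() if on} <= active]


if __name__ == "__main__":
    target = ["evade_missile_1", "evade_missile_2", "reach_goal"]  # hypothetical subtasks
    for stage, mask in enumerate(naive_curriculum(target)):
        print(stage, [s for s, on in mask.items() if on])
```

Filtering sources with a subset test is one simple way to avoid transferring knowledge about subtasks that the current task has masked out, which is the concern the transfer condition above addresses.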

    Reinforcement Learning for Intelligent Healthcare Systems: A Review of Challenges, Applications, and Open Research Issues

    The rise in chronic disease patients and the pandemic pose immediate threats to healthcare expenditure and mortality rates. This calls for transforming healthcare systems away from one-on-one patient treatment into intelligent health systems, leveraging recent advances in the Internet of Things and smart sensors. Meanwhile, reinforcement learning (RL) has achieved major breakthroughs in solving a variety of complex problems for distinct applications and services. Thus, this article presents a comprehensive survey of the recent RL models and techniques that have been developed or used to support Intelligent-healthcare (I-health) systems. It can guide readers toward a deep understanding of the state of the art in the use of RL for I-health. Specifically, we first present an overview of the challenges and architecture of I-health systems and of how RL can benefit them. We then review the background and mathematical modeling of different RL, deep RL (DRL), and multiagent RL models. We highlight important guidelines on how to select the appropriate RL model for a given problem, and provide quantitative comparisons, showing the results of deploying key RL models in two scenarios that can be followed in monitoring applications. After that, we conduct an in-depth literature review of RL's applications in I-health systems, covering edge intelligence, smart core networks, and dynamic treatment regimes. Finally, we highlight emerging challenges and future research directions to enhance RL's success in I-health systems, opening the door to exploring interesting unsolved problems.
    Networked System