2,684 research outputs found

    Teacher-apprentices RL (TARL): leveraging complex policy distribution through generative adversarial hypernetwork in reinforcement learning

    Typically, a Reinforcement Learning (RL) algorithm focuses on learning a single deployable policy as the end product. Depending on the initialization method and seed randomization, learning a single policy can lead to convergence to different local optima across runs, especially when the algorithm is sensitive to hyper-parameter tuning. Motivated by the capability of Generative Adversarial Networks (GANs) to learn complex data manifolds, the adversarial training procedure can instead be used to learn a population of good-performing policies. We extend the teacher-student methodology from Knowledge Distillation, as applied to typical deep neural network prediction tasks, to the RL paradigm. Instead of learning a single compressed student network, an adversarially-trained generative model (hypernetwork) is learned to output the network weights of a population of good-performing policy networks, representing a school of apprentices. Our proposed framework, named Teacher-Apprentices RL (TARL), is modular and can be used in conjunction with many existing RL algorithms. We illustrate the performance gain and improved robustness by combining TARL with various types of RL algorithms, including direct policy search Cross-Entropy Method, Q-learning, Actor-Critic, and policy gradient-based methods. Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public. Interactive Intelligenc

    BADDr: Bayes-Adaptive Deep Dropout RL for POMDPs

    While reinforcement learning (RL) has made great advances in scalability, exploration and partial observability are still active research topics. In contrast, Bayesian RL (BRL) provides a principled answer to both state estimation and the exploration-exploitation trade-off, but struggles to scale. To tackle this challenge, BRL frameworks with various prior assumptions have been proposed, with varied success. This work presents a representation-agnostic formulation of BRL under partial observability, unifying the previous models under one theoretical umbrella. To demonstrate its practical significance we also propose a novel derivation, Bayes-Adaptive Deep Dropout RL (BADDr), based on dropout networks. Under this parameterization, in contrast to previous work, the belief over the state and dynamics is a more scalable inference problem. We choose actions through Monte-Carlo tree search and empirically show that our method is competitive with state-of-the-art BRL methods on small domains while being able to solve much larger ones. Interactive Intelligenc

    Refined Risk Management in Safe Reinforcement Learning with a Distributional Safety Critic

    Safety is critical to broadening the real-world use of reinforcement learning (RL). Modeling the safety aspects using a safety-cost signal separate from the reward is becoming standard practice, since it avoids the problem of finding a good balance between safety and performance. However, the total safety-cost distribution of different trajectories is still largely unexplored. In this paper, we propose an actor-critic method for safe RL that uses an implicit quantile network to approximate the distribution of accumulated safety-costs. Using an accurate estimate of the distribution of accumulated safety-costs, in particular of the upper tail of the distribution, greatly improves the performance of risk-averse RL agents. The empirical analysis shows that our method achieves good risk control in complex safety-constrained environments. Algorithmics; Intelligent Electrical Power Grid
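
    As a rough illustration of why the upper tail of the safety-cost distribution matters, the sketch below compares the mean of a set of sampled accumulated safety-costs with the average of their worst fraction (a conditional value-at-risk style measure). This is not the paper's method: the helper `upper_tail_cvar`, the risk level `alpha`, and the cost samples are assumptions made up for illustration, and a plain sort over samples stands in for the implicit quantile network.

```python
import math


def upper_tail_cvar(costs, alpha):
    """Mean of the worst (1 - alpha) fraction of sampled costs.

    Hypothetical helper: `costs` are sampled accumulated safety-costs and
    `alpha` in [0, 1) is the risk level (alpha = 0 gives the plain mean).
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(costs)
    # Number of samples in the upper tail, always at least one.
    tail_size = max(1, math.ceil((1.0 - alpha) * len(ordered)))
    tail = ordered[-tail_size:]
    return sum(tail) / len(tail)


# Made-up cost samples: mostly cheap episodes plus two costly ones.
sample_costs = [0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 20.0, 30.0]
mean_cost = sum(sample_costs) / len(sample_costs)
tail_cost = upper_tail_cvar(sample_costs, alpha=0.8)
```

    With these made-up samples the mean cost is 6.6 while the average over the worst 20% is 25.0, so an agent judged only on the mean would look far safer than one judged on the tail.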

    qgym: A Gym for Training and Benchmarking RL-Based Quantum Compilation

    Compiling a quantum circuit for specific quantum hardware is a challenging task. Moreover, current quantum computers have severe hardware limitations. To make the most of the limited resources, the compilation process should be optimized. To improve current methods, Reinforcement Learning (RL), a technique in which an agent interacts with an environment to learn complex policies to attain a specific goal, can be used. In this work, we present qgym, a software framework derived from the OpenAI gym, together with environments that are specifically tailored towards quantum compilation. The goal of qgym is to connect the research fields of Artificial Intelligence (AI) with quantum compilation by abstracting parts of the process that are irrelevant to either domain. It can be used to train and benchmark RL agents and algorithms in highly customizable environments. Quantum Circuit Architectures and Technolog
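
    The agent-environment interaction described above can be sketched with a toy environment that follows the classic Gym-style `reset`/`step` convention. Everything here is hypothetical and for illustration only: `ToyEnv`, `run_episode`, the reward values, and the two policies are assumptions, and none of it is the qgym API or a quantum compilation task.

```python
import random


class ToyEnv:
    """Hypothetical toy environment with a classic Gym-style interface.

    The state is an integer position; an episode ends when the agent
    reaches `goal` or runs out of steps. Not part of qgym or OpenAI Gym.
    """

    def __init__(self, goal=3, max_steps=50):
        self.goal = goal
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        """Apply an action: 1 moves right, anything else moves left."""
        self.position += 1 if action == 1 else -1
        self.steps += 1
        reached = self.position == self.goal
        done = reached or self.steps >= self.max_steps
        reward = 1.0 if reached else -0.1  # small penalty per wasted step
        return self.position, reward, done, {}


def run_episode(env, policy):
    """Roll out one episode and return (total reward, steps taken)."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward, env.steps


rng = random.Random(0)  # seeded so the random baseline is repeatable
random_return, random_steps = run_episode(ToyEnv(), lambda obs: rng.choice([0, 1]))
greedy_return, greedy_steps = run_episode(ToyEnv(), lambda obs: 1)
```

    Comparing a fixed policy against a random baseline on the same environment is the kind of benchmarking loop such a framework is meant to support; a learning agent would replace the hand-written `policy` function.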

    Influence-Augmented Local Simulators: a Scalable Solution for Fast Deep RL in Large Networked Systems

    Learning effective policies for real-world problems is still an open challenge for the field of reinforcement learning (RL). The main limitations are the amount of data needed and the pace at which that data can be obtained. In this paper, we study how to build lightweight simulators of complicated systems that can run sufficiently fast for deep RL to be applicable. We focus on domains where agents interact with a reduced portion of a larger environment while still being affected by the global dynamics. Our method combines the use of local simulators with learned models that mimic the influence of the global system. The experiments reveal that incorporating this idea into the deep RL workflow can considerably accelerate the training process and presents several opportunities for the future. Interactive Intelligence; Algorithmic

    ADDITIONAL THERAPEUTIC EFFECTS OF ELECTROACUPUNCTURE IN CONJUNCTION WITH CONVENTIONAL REHABILITATION FOR PATIENTS WITH FIRST-EVER ISCHAEMIC STROKE

    Objective: This study examined the additional therapeutic effects of electroacupuncture for patients with first-ever ischaemic stroke. Design: Randomized controlled study. Subjects: A total of 63 patients with first-ever ischaemic stroke. Methods: The study and control groups underwent a conventional rehabilitation program, with the former receiving an additional 8 courses of electroacupuncture over a period of one month. Therapeutic effects were assessed by the Fugl-Meyer Assessment for motor performance and the Functional Independence Measure (FIM (TM)) for the independence of functional performance at 2 and 4 weeks after treatment, and 3 months and 6 months after stroke. Results: For total Fugl-Meyer Assessment score, improvement was more significant for the study group relative to the control group at 2 weeks (16.2 vs 10.6; p = 0.047) and 4 weeks after treatment (27.4 vs 17.1; p = 0.005), and at 3 months after the stroke (34.7 vs 21.8; p = 0.009). The Fugl-Meyer Assessment scores improved significantly, especially in upper-limb motor function for the study group. There was no statistically significant between-group difference in total FIM (TM) score improvement. Conclusion: Electroacupuncture can improve motor function, especially upper-limb motor function, for patients with first-ever ischaemic stroke.