10,623 research outputs found

    Teacher-apprentices RL (TARL): leveraging complex policy distribution through generative adversarial hypernetwork in reinforcement learning

    No full text
    Typically, a Reinforcement Learning (RL) algorithm focuses on learning a single deployable policy as the end product. Depending on the initialization method and seed randomization, learning a single policy can converge to different local optima across runs, especially when the algorithm is sensitive to hyper-parameter tuning. Motivated by the capability of Generative Adversarial Networks (GANs) to learn complex data manifolds, the adversarial training procedure can instead be used to learn a population of good-performing policies. We extend the teacher-student methodology of Knowledge Distillation, as used in typical deep neural network prediction tasks, to the RL paradigm. Instead of learning a single compressed student network, an adversarially-trained generative model (hypernetwork) is learned to output the network weights of a population of good-performing policy networks, representing a school of apprentices. Our proposed framework, named Teacher-Apprentices RL (TARL), is modular and can be used in conjunction with many existing RL algorithms. We illustrate the performance gain and improved robustness obtained by combining TARL with various types of RL algorithms, including direct policy search (Cross-Entropy Method), Q-learning, Actor-Critic, and policy gradient-based methods. Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public. Interactive Intelligenc

    BADDr: Bayes-Adaptive Deep Dropout RL for POMDPs

    No full text
    While reinforcement learning (RL) has made great advances in scalability, exploration and partial observability are still active research topics. In contrast, Bayesian RL (BRL) provides a principled answer to both state estimation and the exploration-exploitation trade-off, but struggles to scale. To tackle this challenge, BRL frameworks with various prior assumptions have been proposed, with varied success. This work presents a representation-agnostic formulation of BRL under partial observability, unifying the previous models under one theoretical umbrella. To demonstrate its practical significance we also propose a novel derivation, Bayes-Adaptive Deep Dropout RL (BADDr), based on dropout networks. Under this parameterization, in contrast to previous work, the belief over the state and dynamics is a more scalable inference problem. We choose actions through Monte-Carlo tree search and empirically show that our method is competitive with state-of-the-art BRL methods on small domains while being able to solve much larger ones. Interactive Intelligenc

    RL-m155 mice and miR-155 knockout (KO) mice displayed the unaltered β-cell proliferation and hormone profiles in pancreas.

    No full text
    (A) Morphology of the entire pancreas from RL-m155 mice and miR-155 KO mice. (B) Haematoxylin and eosin (H&E), insulin and glucagon stainings of pancreatic islets in RL-m155 mice and miR-155 KO mice. (C) qRT-PCR analysis of Ins1 and Ins2 mRNA expression in pancreas tissue of RL-m155 mice and miR-155 KO mice. (D) Percentage of β-cell mass in RL-m155 mice and miR-155 KO mice. (E) BrdU and Ki67 stainings of pancreatic islets in RL-m155 mice and miR-155 KO mice. (F) Percentage of Ki67-positive cells in pancreatic islets in RL-m155 mice and miR-155 KO mice. In this study, the non-transgenic littermates/wild-type littermates [i.e., control (con) mice] were used as controls of RL-m155 mice, while wild-type C57BL/6J mice (i.e., WT) of the same age and sex were used as controls of miR-155–/– C57BL/6J mice.

    DLR-RM/rl-baselines3-zoo: RL-Zoo3 v2.3.0

    No full text
    Breaking Changes: Updated default hyperparameters for TD3/DDPG to be more consistent with SAC. Upgraded MuJoCo envs hyperparameters to v4 (pre-trained agents need to be updated). Upgraded to SB3 >= 2.3.0. Other: Added test dependencies to setup.py (@power-edge). Simplify dependencies of requirements.txt (remove duplicates from setup.py). Full Changelog: https://github.com/DLR-RM/rl-baselines3-zoo/compare/v2.2.1...v2.3.0

    Refined Risk Management in Safe Reinforcement Learning with a Distributional Safety Critic

    No full text
    Safety is critical to broadening the real-world use of reinforcement learning (RL). Modeling the safety aspects using a safety-cost signal separate from the reward is becoming standard practice, since it avoids the problem of finding a good balance between safety and performance. However, the total safety-cost distribution of different trajectories is still largely unexplored. In this paper, we propose an actor-critic method for safe RL that uses an implicit quantile network to approximate the distribution of accumulated safety-costs. Using an accurate estimate of the distribution of accumulated safety-costs, in particular of the upper tail of the distribution, greatly improves the performance of risk-averse RL agents. The empirical analysis shows that our method achieves good risk control in complex safety-constrained environments. Algorithmics Intelligent Electrical Power Grid
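    To illustrate why the upper tail of an accumulated safety-cost distribution can matter more than its mean, here is a minimal sample-based sketch. It is not the paper's implicit quantile network: the function name `upper_tail_cvar`, the choice of conditional value-at-risk (CVaR) as the tail measure, and the cost samples are all assumptions made for illustration only.

```python
import math


def upper_tail_cvar(costs, alpha):
    """Mean of the worst (1 - alpha) fraction of accumulated safety-costs.

    `costs` is a list of per-trajectory accumulated costs (hypothetical data);
    `alpha` in [0, 1) is the risk level, e.g. 0.8 keeps the worst 20%.
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(costs)
    n = len(ordered)
    # Number of samples in the upper tail, at least one.
    k = max(1, math.ceil((1.0 - alpha) * n))
    tail = ordered[n - k:]
    return sum(tail) / k


# Two made-up policies with the same mean cost but different tails.
steady_policy = [2.0] * 10
risky_policy = [0.0] * 8 + [5.0, 15.0]

mean_steady = sum(steady_policy) / len(steady_policy)
mean_risky = sum(risky_policy) / len(risky_policy)
cvar_steady = upper_tail_cvar(steady_policy, 0.8)
cvar_risky = upper_tail_cvar(risky_policy, 0.8)
```

    Both made-up policies have the same mean cost, but the tail measure separates them. A risk-averse agent that constrains the tail rather than the mean would prefer the first.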

    RL-SAR: A Robotic System for Fine-Grained RFID Localization using RL-based Synthetic Aperture Radar

    No full text
    Efficient localization of RFID-tagged items is crucial in scenarios that require tracking and managing a large inventory. Current systems for fine-grained RFID localization have shown limitations since they only collect measurements on a pre-defined trajectory or optimize measurement locations for a single tag. Thus, there is a need for an RFID localization system that can autonomously optimize for multiple tags and adaptively relocalize tags with lower confidence to achieve a more precise and efficient localization. We introduce RL-SAR, an end-to-end autonomous Synthetic Aperture Radar (SAR) based RFID localization system, utilizing a Reinforcement Learning (RL) algorithm to determine the optimal trajectory for localizing multiple tags. We implemented this system with an antenna moving on a ceiling-mounted 2D track. The core of the system is an RL-based trajectory optimization algorithm for collecting RF measurements. Based on these RF measurements, we developed a data processing pipeline to compute the estimated tag locations along with their confidence metrics, derived from the RF SAR hologram. The RL algorithm leverages confidence metrics associated with the tags and is capable of learning a strategy that minimizes the antenna’s traveled distance while enhancing the localization accuracy. We built and evaluated a proof-of-concept prototype of RL-SAR. Experimental evaluation demonstrates a mean 3D localization accuracy of 0.244 m and the capability to locate 15 tags within an average scanning distance of 19.14 m. We compared our algorithm to naive baselines and show that the baselines require an 86% longer trajectory than RL-SAR. Our results show the potential for achieving robust and efficient localization to enhance the current inventory processes across the manufacturing, retail, and logistics sectors. M.Eng

    qgym: A Gym for Training and Benchmarking RL-Based Quantum Compilation

    No full text
    Compiling a quantum circuit for specific quantum hardware is a challenging task. Moreover, current quantum computers have severe hardware limitations. To make the most use of the limited resources, the compilation process should be optimized. To improve current methods, Reinforcement Learning (RL), a technique in which an agent interacts with an environment to learn complex policies to attain a specific goal, can be used. In this work, we present qgym, a software framework derived from the OpenAI gym, together with environments that are specifically tailored towards quantum compilation. The goal of qgym is to connect the research fields of Artificial Intelligence (AI) with quantum compilation by abstracting parts of the process that are irrelevant to either domain. It can be used to train and benchmark RL agents and algorithms in highly customizable environments. Quantum Circuit Architectures and Technolog

    Macrobrachium dongaoensis Chen & Chen & Guo 2018, sp. nov.

    No full text
    Macrobrachium dongaoensis sp. nov. (Figs. 7–9) Material examined. Holotype. Adult male (FU, 17–06–23–01), tl: 79.0 mm, cl: 19.5 mm, rl: 12.0 mm; Dong’ao Island, Zhuhai City, Guangdong Province (E 113°43'09", N 22°01'08", al. 22.4 m, stn.7), 26 Jun 2017, coll. Z. L. Guo, W. J. Chen. Paratypes. 1 male (FU, 17–06–23–02), tl: 73.7 mm, cl: 19.3 mm, rl: 10.5 mm; 1 ovigerous female (FU, 17–06–23–03), tl: 67.0 mm, cl: 18.0 mm, rl: 12.0 mm, data same as holotype. Diagnosis. Rl is about 0.54–0.67 of cl, nearly straight, falling slightly short of anterior end of antennal scale, dorsal margin with 10–13 teeth, 4 or 5 (usually 4) teeth behind orbit, equally spaced, ventral margin with 1–3 teeth. Cephalothorax and abdomen smooth, without microspinules. Second pereiopods shorter than tl in both sexes, subequal in length in male, the right slightly longer, but equal in female; merus is about 1.1 times as long as ischium; carpus is 4.4–5.4 times as long as width, about 1.1–1.4 times as long as merus and almost same length of palm; palm is not inflated, 4.3–4.9 times as long as wide; the finger 0.69–0.78 times as long as palm, fingers without gape when crossed, the fixed finger with 2 teeth at proximal, moveable finger with 2 proximal teeth; all segments are covered with numerous spines particularly on dorsal and lateral surfaces. Eggs small, 0.33–0.42 X 0.37–0.44 mm in diameter. Description. Rostrum. (Fig. 7a) Rl is about 0.54–0.67 of cl, nearly straight, reaching to or slightly beyond end of scaphocerite, dorsal margin with 10–13 teeth, 4 or 5 (usually 4) teeth behind orbit, equally spaced, ventral margin with 1–3 teeth. Carapace. (Fig. 7a) Glabrous; antennal spine well developed; hepatic spine much smaller than antennal spine, situated backwardly, distinctly below level of antennal spine. Antennule. (Fig. 
7a) with sharp stylocerite, reaching one–third basal segment of antennular peduncle; anterior margin of basal segment distinctly convex; second segment about 0.45 times as long as basal segment, about 0.81 times as long as distal segment. All segments with submarginal plumose setae. Antenna. (Fig. 7a) with scaphocerite large, rectangular, 3.4 times as long as wide, outer margin almost straight, ended with a strong spine, overreached by lamella. Third maxilliped with robust endopod, ischiomerus slightly bow-shaped, with rows of long simple setae on distal inner and outer margins; carpus about 0.75 times length of ischiomerus, with row of long simple setae on inner margin and sparse row of simple setae on outer margin; distal segment about 0.81 times penultimate segment, with long simple setae on inner margin; exopod reaching distal end of ischiomerus, with plumose setae distally; basal with well developed oval lateral plate, two arthrobranchs, one rudimentary, obscured by the larger. First pereiopod. (Fig. 7b) Slender, overreaching antennal scale by carpus, carpus 1.7–1.9 times as long as chela; fingers as long as palm. Second pereiopod. (Fig. 7c, d) Slightly shorter than the tl in both sexes, subequal in length in male, the right slightly larger, extending beyond antennal scale by 1/2 carpus, equal in female; the shape and segment ratios of the left and the right are similar; merus is about 1.1 times as long as ischium; carpus is 4.4–5.4 times as long as width, about 1.1–1.4 times as long as merus and almost same length of palm; palm is not inflated, 4.3–4.9 times as long as wide; the finger 0.69–0.78 times as long as palm, fingers without gape when crossed, the fixed finger with 2 teeth at proximal, moveable finger with 2 proximal teeth; all segments are covered with numerous spines particularly on dorsal and lateral surfaces. Third pereiopod. (Fig. 
7e) Extending to end of third antennular peduncle segment by distal propodus; propodus 2.8–3.0 times as long as dactylus, with 5–7 spines on posterior margin, dactylus about 5.0 times as long as width, terminating in a claw. Fifth pereiopod. (Fig. 7f) Extending to end of third antennular peduncle segment; propodus 3.4–4.3 times as long as dactylus, with 4 spines on posterior margin, dactylus about 5.3 times as long as width, terminating in a small claw. First pleopod of male with endopod of about half of exopod, slightly concave at inner margin, top rounded, without appendix interna. Second pleopod with well developed appendix masculina, reaching middle of endopod, about twice as long as appendix interna with numerous stiff setae. Abdomen glabrous; pleura of first three somites broadly rounded, pleura of somites 4 and 5 also rounded, but with almost rectangular posterolateral angle; sixth somite 1.2–1.4 times as long as fifth somite, about 0.40–0.43 times as long as telson. Telson. (Fig. 7g) Smooth, about 0.62–0.72 times cl, longer than sixth abdominal segment; dorsal surface with 2 pairs of stout movable spines; posterior margin tapers regularly to a sharp point with 2 pairs of posterior spines; numerous setae present between inner spines. Uropodal diaeresis with a spine, shorter than outer angle. Eggs small, 0.33–0.42 X 0.37–0.44 mm in diameter. Live coloration. The live specimens (Fig. 8a, b) are light green and translucent, uropods with numerous small reddish spots; all segments of second pereiopods are brown, with one dark ring on outer posterior surface of merus and two dark rings on carpus; the palm has two longitudinal dark stripes near margins; first, third, fourth and fifth pereiopods transparent; eggs green (Fig. 8b). Etymology. The new species is named after its distribution area, Dong’ao Island. Remarks. Macrobrachium dongaoensis superficially resembles M. 
inflatum Liang & Yan, 1985 in having similar ratios of various segments of the second pereiopods and in breeding females bearing numerous small eggs. However, it can be distinguished from M. inflatum by its shorter rostrum (extending to end of third antennular peduncle segment versus beyond antennal scale; rl<cl versus rl=cl); the male second pereiopods are subequal (versus equal), the right slightly longer, the palm is not inflated (versus inflated) and 4.3–4.9 times (versus 3.5–3.6 times) as long as broad, the merus is distinctly longer than the ischium (versus shorter than the ischium), without a gape present when closed (versus with a distinct gape present). M. dongaoensis morphologically resembles M. heterorhynchos Guo & He, 2008, which was also originally described from Guangdong Province. It can be distinguished from M. heterorhynchos by the shorter (reaching to end of scaphocerite versus distal one-third of rostrum extending beyond scaphocerite) and non-sexually dimorphic rostrum (versus sexually dimorphic); the carpus of male second pereiopods is as long as palm (versus distinctly longer than the palm). M. dongaoensis is also close to M. nipponense (De Haan, 1849). It can be distinguished from the latter by characters of the male second pereiopods. The second pereiopods of M. dongaoensis are distinctly shorter than those of M. nipponense; the carpus is as long as palm (versus distinctly longer than the palm), and the finger is without setae on cutting edge (versus covered with long dense setae). Habitat. The type specimens were collected from Dong’ao Island, Zhuhai City, Guangdong Province (E 113°43'09", N 22°01'08", al. 22.4 m). This stream (Fig. 9) is the largest on Dong’ao Island, about 4 m wide and about 0.5–0.8 m deep, with beds of sand and gravel patches between large boulders, and with abundant bank vegetation and spirogyras. It is a moderately fast stream that flows into the sea. The species is found together with Caridina serrata. Distribution. 
Only known from the type locality in Guangdong Province, southern China. Published as part of Chen, Qing-Hua, Chen, Wen-Jian & Guo, Zhao-Liang, 2018, Caridean prawn (Crustacea, Decapoda) from Dong'ao Island, Guangdong, China, pp. 315-328 in Zootaxa 4399 (3) on pages 323-327, DOI: 10.11646/zootaxa.4399.3.2, http://zenodo.org/record/120666
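    The diagnostic statement that rl is about 0.54–0.67 of cl can be checked against the three type specimens listed under Material examined. The short sketch below only redoes that arithmetic; the dictionary layout and the name `rostrum_ratio` are illustrative choices, not part of the original treatment.

```python
# Carapace length (cl) and rostrum length (rl) in mm, as given for the
# holotype and the two paratypes in the treatment above.
specimens = {
    "holotype male": (19.5, 12.0),
    "paratype male": (19.3, 10.5),
    "paratype ovigerous female": (18.0, 12.0),
}


def rostrum_ratio(cl_mm, rl_mm):
    """Rostrum length as a fraction of carapace length."""
    return rl_mm / cl_mm


ratios = {
    name: round(rostrum_ratio(cl, rl), 2)
    for name, (cl, rl) in specimens.items()
}

for name, ratio in ratios.items():
    print(f"{name}: rl/cl = {ratio:.2f}")
```

    The three rounded ratios are 0.62, 0.54 and 0.67, so the type series spans exactly the range quoted in the diagnosis.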

    Influence-Augmented Local Simulators: a Scalable Solution for Fast Deep RL in Large Networked Systems

    No full text
    Learning effective policies for real-world problems is still an open challenge for the field of reinforcement learning (RL). The main limitation is the amount of data needed and the pace at which that data can be obtained. In this paper, we study how to build lightweight simulators of complicated systems that can run sufficiently fast for deep RL to be applicable. We focus on domains where agents interact with a reduced portion of a larger environment while still being affected by the global dynamics. Our method combines the use of local simulators with learned models that mimic the influence of the global system. The experiments reveal that incorporating this idea into the deep RL workflow can considerably accelerate the training process and presents several opportunities for the future. Interactive Intelligence Algorithmic