Robot attorno a noi: Dove sono, cosa fanno, cosa faranno? (Robots around us: where are they, what do they do, what will they do?)
We are witnessing the silent invasion of robots into everyday life. Originally confined to the factory, robots are now spreading exponentially beyond industrial production. It is therefore important to understand how an autonomous robot carries out its task, which applications are realistically possible today, and which will become possible in the near future. We present a quick review of the current state of robotics and of the developments expected in the near future. Finally, we show some of the social and ethical implications that are emerging as a result.
Augmented Memory Replay in Reinforcement Learning With Continuous Control
Online reinforcement learning agents are currently able to process an increasing amount of data by converting it into higher-order value functions. This expansion of the information collected from the environment increases the agent's state space, enabling it to scale up to more complex problems, but it also increases the risk of forgetting by learning on redundant or conflicting data. To improve the approximation of a large amount of data, a random mini-batch of the past experiences stored in the replay memory buffer is often replayed at each learning step. The proposed work takes inspiration from a biological mechanism that acts as a protective layer of higher cognitive functions in the mammalian brain: active memory consolidation mitigates the effect of forgetting previous memories by dynamically processing the new ones. Similar dynamics are implemented by the proposed augmented memory replay (AMR) algorithm. The architecture of AMR, based on a simple artificial neural network, provides an augmentation policy that modifies each of the agent's experiences by augmenting their relevance prior to storing them in the replay memory. The function approximator of AMR is evolved using a genetic algorithm in order to obtain the specific augmentation policy function that yields the best performance of a learning agent in a specific environment, as measured by its received cumulative reward. Experimental results show that an evolved AMR augmentation function capable of increasing the significance of specific memories is able to further increase the stability and convergence speed of learning algorithms dealing with the complexity of continuous action domains.
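The replay mechanism described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' AMR implementation: the `Experience` fields, the `ReplayMemory` class, and the `scale_reward` hook are hypothetical names, and the hand-written reward scaling merely stands in for the evolved neural augmentation policy.

```python
import random
from collections import deque, namedtuple

# Hypothetical experience record; the field names are assumptions for illustration.
Experience = namedtuple("Experience", "state action reward next_state done")


class ReplayMemory:
    """Fixed-capacity buffer with an optional augmentation hook applied on store."""

    def __init__(self, capacity, augment=None, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest experiences are evicted first
        self.augment = augment                # callable Experience -> Experience, or None
        self.rng = random.Random(seed)

    def store(self, exp):
        # The experience is modified before it is stored, as the abstract describes.
        if self.augment is not None:
            exp = self.augment(exp)
        self.buffer.append(exp)

    def sample(self, batch_size):
        # Uniform random mini-batch, drawn without replacement.
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


def scale_reward(exp, factor=2.0):
    # Placeholder augmentation (an assumption): the paper evolves a network instead.
    return exp._replace(reward=exp.reward * factor)
```

A caller would construct `ReplayMemory(capacity, augment=scale_reward)`, call `store` after each environment step, and call `sample` at each learning step.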
Towards learning agents with personality traits: Modeling Openness to Experience
Recent advances in neuroscience and cognitive science show that the human neocortex is not a slave to the experiences arriving from perception, and that the memories stored in the hippocampus are goal-weighted during the replay of experiences for the purpose of relearning from them. Temporal-difference reinforcement learning systems that use neural networks as function approximators rely on an experience replay memory structure similar to the hippocampus. We bring forward this similarity and present a novel, biologically inspired way of using a goal-weighted prioritization of the memory. Furthermore, we introduce a novel prioritization criterion called the Variety of Experience Index (VEI) for weighting the selection of the experiences that are stored in the replay memory. Weighting the experiences based on two different extremes of VEI can behaviourally modify the agent's learning process, generating different types of learning agents that exhibit different personality traits along the dimension of Openness to Experience. © 2019 Elsevier B.V. All rights reserved.
HAVPTAT: A Human Activity Video Pose Tracking Annotation Tool
We propose a new semi-automatic annotation software, the Human Activity Video Pose Tracking Annotation Tool (HAVPTAT). It automatically detects and tracks multiple people and their poses in a video to improve work efficiency. HAVPTAT also provides dynamic visualization of human poses, bounding boxes, person tracking IDs, and possible prediction results together. The lightweight software launches in a few seconds and is easy to distribute. Its ease of use allows non-professionals to get started quickly. This software will accelerate the development of human activity recognition models and service robots.
Correlation minimizing replay memory in temporal-difference reinforcement learning
Online reinforcement learning agents are now able to process an increasing amount of data, which makes its approximation and compression into value functions a more demanding task. To improve the approximation, and thus the learning process itself, it has been proposed to randomly select a mini-batch of the past experiences stored in the replay memory buffer to be replayed at each learning step. In this work, we present an algorithm that classifies and samples the experiences into separate contextual memory buffers using an unsupervised learning technique. This allows each new experience to be associated with a mini-batch of past experiences that do not come from the same contextual buffer as the current one, thus further reducing the correlation between experiences. Experimental results show that correlation-minimizing sampling improves over Q-learning algorithms with uniform sampling, and that a significant improvement can be observed when it is coupled with sampling methods that prioritize experiences by their temporal-difference error.
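The idea of sampling from contextual buffers other than the current one can be sketched as below. This is an illustration under stated assumptions, not the published algorithm: the abstract does not name its unsupervised technique, so a nearest-centroid assignment with fixed, hand-chosen centroids is swapped in here, and `ContextualReplay` and its methods are hypothetical names.

```python
import random


class ContextualReplay:
    """Separate buffers per context; mini-batches come from the other contexts."""

    def __init__(self, centroids, seed=None):
        # Fixed centroids are an assumption; a real system would learn the contexts.
        self.centroids = list(centroids)
        self.buffers = [[] for _ in self.centroids]
        self.rng = random.Random(seed)

    def context_of(self, state):
        # Index of the nearest centroid by squared Euclidean distance.
        def dist(c):
            return sum((s - ci) ** 2 for s, ci in zip(state, c))
        return min(range(len(self.centroids)), key=lambda i: dist(self.centroids[i]))

    def store(self, state, transition):
        self.buffers[self.context_of(state)].append(transition)

    def sample_other_contexts(self, state, batch_size):
        current = self.context_of(state)
        # Pool every transition that is NOT in the current experience's buffer.
        pool = [t for i, buf in enumerate(self.buffers) if i != current for t in buf]
        if not pool:
            # Fallback (an assumption): use the current buffer if the others are empty.
            pool = list(self.buffers[current])
        return self.rng.sample(pool, min(batch_size, len(pool)))
```

Drawing the mini-batch from different contexts than the incoming experience is what reduces the correlation between the samples used in one update.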
Adaptation of learning agents through artificial perception
The process of online reinforcement learning also creates a stream of experiences that an agent can store and re-learn from. In this work, we introduce a concept of artificial perception that affects the dynamics of experience memory replay and induces a secondary goal-directed drive complementing the main goal defined by the reinforcement function. The different perception dynamics are capable of inducing different "personality" types that govern the agent's behavior, possibly enabling it to exhibit improved performance in an environment with specific characteristics. Experimental results show that different personalities reach different performance levels when facing environment variations, thereby showcasing the influence of artificial perception on the agent's adaptation.
Using Q-learning to Automatically Tune Quadcopter PID Controller Online for Fast Altitude Stabilization
Unmanned Aerial Vehicles (UAVs), and more specifically quadcopters, need to be stable during flight. Altitude stability is usually achieved with a PID controller built into the flight controller software. The PID controller has gains that need to be tuned to reach optimal altitude stabilization during the quadcopter's flight. To do so, control system engineers tune those gains using extensive modeling of the environment, which might change from one environment and condition to another. As quadcopters penetrate more sectors, from military to consumer, they are being put into more complex and challenging environments than ever before. Hence, intelligent self-stabilizing quadcopters are needed to maneuver through those complex environments and situations. Here we show that by using online reinforcement learning with minimal background knowledge, the altitude stability of the quadcopter can be achieved using a model-free approach. We found that by using background knowledge and an activation function such as the sigmoid, altitude stabilization can be achieved faster and with a small memory footprint. In addition, this approach accelerates development by avoiding extensive simulations before applying the PID gains to the real-world quadcopter. Our results demonstrate the possibility of using the trial-and-error approach of reinforcement learning, combined with an activation function and background knowledge, to achieve faster quadcopter altitude stabilization in different environments and conditions.
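The two building blocks named in this abstract, a PID controller and a Q-learning update, can be sketched as follows. This is only a generic illustration: the abstract does not specify the state encoding, the action set, the reward, or how the sigmoid and background knowledge enter, so the `PID` class, the `q_update` helper, and the example action labels are all assumptions.

```python
from collections import defaultdict


class PID:
    """Textbook discrete PID controller; the gains are what a tuner would adjust."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, dt):
        # Accumulate the integral term and estimate the derivative by finite difference.
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update on a dict keyed by (state, action)."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


# Hypothetical usage: actions nudge a gain up or down between control episodes.
q_table = defaultdict(float)
gain_actions = ["raise_kp", "lower_kp"]
```

In such a setup the agent would pick a gain adjustment, run the controller for a short interval, and use the resulting altitude error as a negative reward in `q_update`.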
Graph-Based Design of Hierarchical Reinforcement Learning Agents
There is increasing interest in using Reinforcement Learning to solve new and more challenging problems, such as those emerging in robotics and unmanned autonomous vehicles. To face these complex systems, a hierarchical and multi-scale representation is crucial. This has driven interest in Hierarchical Deep Reinforcement Learning systems. Despite their successful application, Deep Reinforcement Learning systems suffer from a variety of drawbacks: they are data hungry, they lack interpretability, and it is difficult to derive theoretical properties about their behavior. Classical Hierarchical Reinforcement Learning approaches, while not suffering from these drawbacks, are often suited only to finite action and state spaces. Furthermore, most works offer no systematic way to represent domain knowledge, which is often embedded only in the reward function. We present a novel Hierarchical Reinforcement Learning framework based on the hierarchical design approach typical of control theory. We developed our framework by extending the block diagram representation of control systems to fit the needs of a Hierarchical Reinforcement Learning scenario, thus making it possible to integrate domain knowledge into an effective hierarchical architecture.
Uncertainty maximization in partially observable domains: A cognitive perspective
Faced with the ever-increasing complexity of their domains of application, artificial learning agents are now able to scale up their ability to process an overwhelming amount of data. However, this comes at the cost of encoding and processing an increasing amount of redundant information. This work exploits the possibility for learning systems applied in partially observable domains to selectively focus on the specific type of information that is more likely related to the causal interaction among transitioning states. A temporal-difference displacement criterion is defined to implement adaptive masking of the observations. It can enable a significant improvement in the convergence of temporal-difference algorithms applied to partially observable Markov processes, as shown by experiments performed on a variety of machine learning problems, ranging from highly complex visual tasks such as Atari games to simple textbook control problems such as CartPole. The proposed framework can be added to most RL algorithms, since it only affects the observation process, selecting the parts most promising for explaining the dynamics of the environment and reducing the dimension of the observation space.
Informed Sampling of Prioritized Experience Replay
Experience replay plays an essential role as an information-generating mechanism in reinforcement learning systems that use neural networks as function approximators. It enables artificial learning agents to store their past experiences in a sliding-window buffer, effectively recycling them in the continual re-training of a neural network. The intermediary process of experience caching opens the possibility for an agent to optimize the order in which the experiences are sampled from the buffer. This may improve on the default standard, i.e., stochastic prioritization based on the Temporal-Difference error (TD-error), which focuses on experiences that carry more Temporal-Difference surprise for the approximator. We propose a notion of informed prioritization, a method that relies on fast online confidence estimates of the approximator's predictions in order to dynamically exploit the benefits of TD-error prioritization only when the prediction confidence about the selected experiences increases. The presented informed-stochastic prioritization method of replay buffer sampling, implemented as part of the standard Deep Q-learning algorithm, outperformed vanilla stochastic prioritization based on TD-error in 41 out of 54 trialed Atari games.
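The baseline that this abstract compares against, stochastic prioritization by TD-error, is commonly formulated as sampling experience i with probability proportional to (|TD-error_i| + eps) raised to a power alpha. The sketch below shows only that baseline; the confidence-based gating of the informed method is not specified in the abstract and is not implemented here. The function names and the default values of `alpha` and `eps` are assumptions.

```python
import random


def priority_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Sampling probabilities proportional to (|td_error| + eps) ** alpha."""
    # eps keeps zero-error experiences sampleable; alpha=0 recovers uniform sampling.
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_indices(td_errors, batch_size, alpha=0.6, rng=None):
    """Draw buffer indices with replacement according to their priorities."""
    rng = rng or random.Random()
    probs = priority_probabilities(td_errors, alpha)
    return rng.choices(range(len(td_errors)), weights=probs, k=batch_size)
```

An informed variant would presumably decide, per sampling step, whether to use these probabilities or fall back to uniform sampling based on a confidence estimate; how that estimate is computed is left open here.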
