Learning control policies from constrained motion

Author: Howard Matthew
Publication date: 01/01/2009

Many everyday human skills can be framed in terms of performing some task subject to constraints imposed by the task or the environment. Constraints are usually unobservable and frequently change between contexts. In this thesis, we explore the problem of learning control policies from data containing variable, dynamic and non-linear constraints on motion. We show that an effective approach for doing this is to learn the unconstrained policy in a way that is consistent with the constraints. We propose several novel algorithms for extracting these policies from movement data, where observations are recorded under different constraints. Furthermore, we show that, by doing so, we are able to learn representations of movement that generalise over constraints and can predict behaviour under new constraints. In our experiments, we test the algorithms on systems of varying size and complexity, and show that the novel approaches give significant improvements in performance compared with standard policy learning approaches that are naive to the effect of constraints. Finally, we illustrate the utility of the approaches for learning from human motion capture data and transferring behaviour to several robotic platforms.
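To illustrate why observations recorded under different constraints can look inconsistent even when the underlying policy is the same, here is a minimal sketch. The abstract does not state the formulation; the sketch assumes a single linear constraint of the form a·u = 0, under which the observed action is the unconstrained action projected into the null space of a. The function name and the numbers are hypothetical.

```python
def project_through_constraint(policy_action, constraint_row):
    """Project an unconstrained action onto the set {u : a . u = 0}.

    Assumption (not from the abstract): the constraint is one linear
    equality a . u = 0, so the projection is u = p - a (a . p) / (a . a).
    """
    norm_sq = sum(a * a for a in constraint_row)
    if norm_sq == 0.0:
        # A zero row imposes no constraint; the action is observed unchanged.
        return list(policy_action)
    scale = sum(a * p for a, p in zip(constraint_row, policy_action)) / norm_sq
    return [p - scale * a for p, a in zip(policy_action, constraint_row)]


# The same (hypothetical) unconstrained action seen under two constraints.
unconstrained = [1.0, 2.0]
seen_under_first = project_through_constraint(unconstrained, [0.0, 1.0])
seen_under_second = project_through_constraint(unconstrained, [1.0, 0.0])
```

Here the two observations differ although the policy did not change, which is the situation a constraint-naive learner would misread as noise or as two different policies.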

Edinburgh Research Archive

Learning Dynamics for Robot Control under Varying Contexts

Author: Petkos Georgios
Publication date: 01/01/2008

Institute of Perception, Action and Behaviour

High fidelity, compliant robot control requires a sufficiently accurate dynamics model. Often though, it is not possible to obtain a sufficiently accurate dynamics model, or any model at all, using analytical methods. In such cases, an alternative is to learn the dynamics model from movement data. This thesis discusses the problems specific to dynamics learning for control under nonstationarity of the dynamics. We refer to the cause of the nonstationarity as the context of the dynamics. Contexts are, typically, not directly observable. For instance, the dynamics of a robot manipulator changes as the robot manipulates different objects and the physical properties of the load – the context of the dynamics – are not directly known by the controller. Other examples of contexts that affect the dynamics are changing force fields or liquids with different viscosity in which a manipulator has to operate. The learned dynamics model needs to be adapted whenever the context and therefore the dynamics changes. Inevitably, performance drops during the period of adaptation. The goal of this work is to reuse and generalize the experience obtained by learning the dynamics of different contexts in order to adapt quickly to changing contexts. We first examine the case that the dynamics may switch between a discrete, finite set of contexts and use multiple models and switching between them to adapt the controller quickly. A probabilistic formulation of multiple models is used, where a discrete latent variable is used to represent the unobserved context and index the models. In comparison to previous multiple model approaches, the developed method is able to learn multiple models of nonlinear dynamics, using an appropriately modified EM algorithm. We also deal with the case when there exists a continuum of possible contexts that affect the dynamics and hence, it becomes essential to generalize from a set of experienced contexts to novel contexts.
There is very little previous work in this direction and the developed methods are completely novel. We introduce a set of continuous latent variables to represent context and introduce a dynamics model that depends on this set of variables. We first examine learning and inference in such a model when there is strong prior knowledge on the relationship of these continuous latent variables to the modulation of the dynamics, e.g., when the load at the end effector changes. We also develop methods for the case that there is no such knowledge available. Finally, we formulate a dynamics model whose input is augmented with observed variables that convey contextual information indirectly, e.g., the information from tactile sensors at the interface between the load and the arm. This approach also allows generalization to previously unseen contexts and is applicable when the nature of the context is not known. In addition, we show that use of such a model is possible even when special sensory input is not available by using an instance of an autoregressive model. The developed methods are tested on realistic, full physics simulations of robot arm systems including a simplistic 3 degree of freedom (DOF) arm and a simulation of the 7 DOF DLR light weight robot arm. In the experiments, varying contexts are different manipulated objects. Nevertheless, the developed methods (with the exception of the methods that require prior knowledge on the relationship of the context to the modulation of the dynamics) are more generally applicable and could be used to deal with different context variation scenarios.
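The idea of a discrete latent variable that indexes multiple dynamics models can be sketched with a single Bayes update. This is an illustration only, not the thesis's algorithm: it assumes each context's model yields a scalar prediction, that observation noise is Gaussian with one shared standard deviation, and all names and numbers are hypothetical.

```python
import math


def context_posterior(prior, predictions, observed, noise_std):
    """Posterior over discrete contexts after one observation.

    Assumption: Gaussian noise with the same standard deviation for every
    context, so the Gaussian normalising constant cancels in the ratio.
    """
    likelihoods = [
        math.exp(-0.5 * ((observed - predicted) / noise_std) ** 2)
        for predicted in predictions
    ]
    unnormalised = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalised)
    if total == 0.0:
        # Every likelihood underflowed; fall back to the prior.
        return list(prior)
    return [u / total for u in unnormalised]


# Two hypothetical contexts (e.g. light and heavy load) predict different
# torques for the same motion; the observation is close to the first.
posterior = context_posterior(
    prior=[0.5, 0.5], predictions=[1.0, 3.0], observed=1.1, noise_std=0.5
)
```

A controller could then switch to, or weight, the model with the highest posterior probability; how the models themselves are learned is outside this sketch.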

Edinburgh Research Archive

Stochastic optimal control with learned dynamics models

Author: Mitrovic Djordje
Publication date: 01/01/2011

The motor control of anthropomorphic robotic systems is a challenging computational task mainly because of the high levels of redundancies such systems exhibit. Optimality principles provide a general strategy to resolve such redundancies in a task driven fashion. In particular closed loop optimisation, i.e., optimal feedback control (OFC), has served as a successful motor control model as it unifies important concepts such as costs, noise, sensory feedback and internal models into a coherent mathematical framework. Realising OFC on realistic anthropomorphic systems, however, is non-trivial. Firstly, such systems have typically large dimensionality and nonlinear dynamics, in which case the optimisation problem becomes computationally intractable. Approximative methods, like the iterative linear quadratic Gaussian (ILQG), have been proposed to avoid this; however, the transfer of solutions from idealised simulations to real hardware systems has proved to be challenging. Secondly, OFC relies on an accurate description of the system dynamics, which for many realistic control systems may be unknown, difficult to estimate, or subject to frequent systematic changes. Thirdly, many (especially biologically inspired) systems suffer from significant state or control dependent sources of noise, which are difficult to model in a generally valid fashion. This thesis addresses these issues with the aim of realising efficient OFC for anthropomorphic manipulators. First we investigate the implementation of OFC laws on anthropomorphic hardware. Using ILQG we optimally control a high-dimensional anthropomorphic manipulator without having to specify an explicit inverse kinematics, inverse dynamics or feedback control law. We achieve this by introducing a novel cost function that accounts for the physical constraints of the robot and a dynamics formulation that resolves discontinuities in the dynamics.
The experimental hardware results reveal the benefits of OFC over traditional (open loop) optimal controllers in terms of energy efficiency and compliance, properties that are crucial for the control of modern anthropomorphic manipulators. We then propose a new framework of OFC with learned dynamics (OFC-LD) that, unlike classic approaches, does not rely on analytic dynamics functions but rather updates the internal dynamics model continuously from sensorimotor plant feedback. We demonstrate how this approach can compensate for unknown dynamics and for complex dynamic perturbations in an online fashion. A specific advantage of a learned dynamics model is that it contains the stochastic information (i.e., noise) from the plant data, which corresponds to the uncertainty in the system. Consequently one can exploit this information within OFC-LD in order to produce control laws that minimise the uncertainty in the system. In the domain of antagonistically actuated systems this approach leads to improved motor performance, which is achieved by co-contracting antagonistic actuators in order to reduce the negative effects of the noise. Most importantly, the shape and source of the noise are unknown a priori and are learned solely from plant data. The model is successfully tested on an antagonistic series elastic actuator (SEA) that we have built for this purpose. The proposed OFC-LD model is not only applicable to robotic systems but also proves to be very useful in the modelling of biological motor control phenomena, and we show how our model can be used to predict a wide range of human impedance control patterns during both stationary and adaptation tasks.

Edinburgh Research Archive

Bayesian locally weighted online learning

Author: Edakunni Narayanan U.
Publication date: 01/01/2010

Locally weighted regression is a non-parametric technique of regression that is capable of coping with non-stationarity of the input distribution. Online algorithms like Receptive Field Weighted Regression and Locally Weighted Projection Regression use a sparse representation of the locally weighted model to approximate a target function, resulting in an efficient learning algorithm. However, these algorithms are fairly sensitive to parameter initializations and have multiple open learning parameters that are usually set using some insight into the problem and local heuristics. In this thesis, we attempt to alleviate these problems by using a probabilistic formulation of locally weighted regression followed by a principled Bayesian inference of the parameters. In the Randomly Varying Coefficient (RVC) model developed in this thesis, locally weighted regression is set up as an ensemble of regression experts that provide a local linear approximation to the target function. We train the individual experts independently and then combine their predictions using a Product of Experts formalism. Independent training of experts allows us to adapt the complexity of the regression model dynamically while learning in an online fashion. The local experts themselves are modeled using a hierarchical Bayesian probability distribution with Variational Bayesian Expectation Maximization steps to learn the posterior distributions over the parameters. The Bayesian modeling of the local experts leads to an inference procedure that is fairly insensitive to parameter initializations and avoids problems like overfitting. We further exploit the Bayesian inference procedure to derive efficient online update rules for the parameters. Learning in the regression setting is also extended to handle a classification task by making use of a logistic regression to model discrete class labels.
The main contribution of the thesis is a spatially localised online learning algorithm set up in a probabilistic framework with a principled Bayesian inference rule for the parameters of the model that learns local models completely independently of each other, uses only local information and adapts the local model complexity in a data driven fashion. This thesis, for the first time, brings together the computational efficiency and the adaptability of ‘non-competitive’ locally weighted learning schemes and the modelling guarantees of the Bayesian formulation.
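As a rough illustration of combining independently trained experts with a Product of Experts, the sketch below multiplies Gaussian predictive distributions. It assumes each expert reports a scalar Gaussian prediction (mean and variance), which the abstract does not specify; under that assumption the product is again Gaussian, with a precision-weighted mean. Names and values are hypothetical.

```python
def product_of_gaussian_experts(means, variances):
    """Combine scalar Gaussian predictions by multiplying their densities.

    The (renormalised) product of Gaussians is Gaussian: precisions add,
    and the mean is the precision-weighted average of the expert means.
    """
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    mean = sum(m * p for m, p in zip(means, precisions)) / total_precision
    return mean, 1.0 / total_precision


# A confident expert (variance 0.1) dominates an uncertain one (variance 10).
combined_mean, combined_variance = product_of_gaussian_experts(
    means=[1.0, 5.0], variances=[0.1, 10.0]
)
```

One consequence of this form is that an expert far from the query point only needs to report a large variance to be effectively ignored, so no expert has to know about the others.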

Edinburgh Research Archive

Computational models of motor adaptation under multiple classes of sensorimotor disturbance

Author: Haith Adrian
Publication date: 01/01/2009

The human motor system exhibits remarkable adaptability, enabling us to maintain high levels of performance despite ever-changing requirements. There are many potential sources of error during movement to which the motor system may need to adapt: the properties of our bodies or tools may vary over time, either at a dynamic or a kinematic level; our senses may become miscalibrated over time and mislead us as to the state of our bodies or the true location of an intended goal; the relationship between sensory stimuli and movement goals may change. Despite these many varied ways in which our movements may be disturbed, existing models of human motor adaptation have tended to assume just a single adaptive component. In this thesis, I argue that the motor system maintains multiple components of adaptation, corresponding to the multiple potential sources of error to which we are exposed. I outline some of the shortcomings of existing adaptation models in scenarios where multiple kinds of disturbances may be present - in particular examining how different distal learning problems associated with different classes of disturbance can affect adaptation within alternative cerebellar-based learning architectures - and outline the computational challenges associated with extending these existing models. Focusing on the specific problem in which the potential disturbances are miscalibrations of vision and proprioception and changes in arm dynamics during reaching, a unified model of sensory and motor adaptation is derived based on the principle of Bayesian estimation of the disturbances given noisy observations. This model is able to account parsimoniously for previously reported patterns of sensory and motor adaptation during exposure to shifted visual feedback. However, the model additionally makes the novel and surprising prediction that adaptation to a force field will also result in sensory adaptation. These predictions are confirmed experimentally.
The success of the model strongly supports the idea that the motor system maintains multiple components of adaptation, which it updates according to the principles of Bayesian estimation.

Edinburgh Research Archive

Exoskeleton-assisted locomotion: design, control and evaluation of wearable robotic devices

Author: Gordon Daniel
Publication date: 30/11/2021

Assistive robotic devices such as exoskeletons and prosthetic limbs have great potential as tools for both augmentation and rehabilitation. However, due to the complexity of controlling these devices, especially in unstructured environments where factors such as walking speed and incline can vary rapidly, it is uncommon to see exoskeletons outside of a clinical or research setting. Prostheses, whilst more common, are typically passive, which limits their ability to match the push off forces associated with healthy gait. Motivated by modern techniques for controlling legged robots, this thesis makes the case for an optimisation-based approach to the control and design of exoskeletons. We identify a number of open problems within the field, namely (1) how to model the dynamic interaction between a human subject and an attached exoskeleton; (2) identifying the appropriate metric or combination of metrics to optimise for in exoskeleton-assisted locomotion; and (3) how to account for changes in human walking style induced by the presence of external assistive forces. This thesis details attempts to solve each of these problems. We present a methodology for expressing human-exoskeleton system models as a combination of musculoskeletal models, exoskeleton inertial parameters and constraint forces. A specific human-exoskeleton model is detailed, along with a range of methods for modelling the interaction forces which occur at the attachment points between the human and exoskeleton agents. Experimental motion data is analysed using musculoskeletal modelling software (OpenSim) to quantify the effect that each of these interaction models, which represent various degrees of approximation, has on the resulting human-exoskeleton dynamics. Applying exoskeleton assistance is inherently a shared control problem.
The overall goal is not to achieve a prescribed motion at any cost, or to do so while minimising exoskeleton joint torques, but rather to enhance aspects of the assisted human's motion; for example, increasing energy efficiency or stability. Therefore, in order to optimise exoskeleton control patterns we must first consider what it means for the resultant gait patterns to be optimal, or even good. We present a detailed analysis of exoskeleton-assisted walking in healthy subjects, with a particular focus on identifying those metrics which are invariant to changes in walking condition (e.g. walking speed or incline). We posit that such metrics, which exhibit strong invariance properties, are good candidates for the objective function of an optimisation-based controller. Human walking strategies are unique and complex, and the problem of predicting the effect of exoskeleton assistance on a subject's gait pattern is a challenging one. In recent years, success has been had by methods which aim to learn suitable assistance strategies directly from a subject, via a process known as human-in-the-loop optimisation. We present a novel human-in-the-loop framework which utilises musculoskeletal modelling to make the learning process more time-efficient. Our method is evaluated on a number of subjects walking on a treadmill with exoskeleton assistance. In addition, we also explore how human-in-the-loop optimisation can be used to inform the design of exoskeletons to enhance their assistive capabilities. Overall, these contributions represent a step towards enabling the wider usage of exoskeletons and other assistive robotic devices, which could lead to significant improvements to quality of life for many.

Edinburgh Research Archive

Understanding the fundamentals of bipedal locomotion in humans and robots

Author: McGreavy Christopher
Publication date: 25/04/2023

Walking is a robust and efficient method of moving around the world, which would greatly enhance the capabilities of humanoid robots, although they cannot match the performance of their biological counterparts. The highly nonlinear dynamics of locomotion create a vast state-action space, which makes model-based control difficult, yet biological humans are highly proficient and robust in their motion while operating under similar constraints. This disparity in performance naturally leads to the question: what can we learn about locomotion control by observing humans, and how can this be used to develop bio-inspired locomotion control in mechatronic humanoids? This thesis investigates bio-inspired locomotion control, but also explores the limitations of this approach and how we can use robotic platforms to move towards a better understanding of locomotion. We first present a methodology for measuring and analysing human locomotion behaviour, specifically disturbance recovery, and fit models to this complex behaviour to represent it as simply as possible, so that it can be easily translated into a simple controller for reactive motion. A minimum-jerk Model Predictive Control algorithm at the Centre of Mass (CoM) best captured human motion during multiple recovery strategies instead of using one controller for each strategy, which is common in this area. Capturing this simple CoM model of complex human behaviour shows that bio-inspiration can be an important tool for controller development, but behaviour varies between and even within individuals given similar initial conditions, which manifests as stochastic behaviour. Coupled with the ability to only measure expressed behaviours instead of direct control policies, this stochasticity presents a fundamental limit to using bio-inspiration for control purposes, as only indirect inferences can be made about a complex, stochastic system.
To overcome these barriers, we investigate the use of mechatronic humanoid robots as a means to explore invariant aspects of the vast dynamic state-space of locomotion which are described by physical laws, and are therefore not subject to the stochastic behaviour of individual humans, that apply to both biological and mechatronic humanoid forms. We present a pipeline to explore the invariant energetics of humanoid robots during stepping for push recovery, where the most efficient stepping parameters are identified for a given initial CoM velocity and desired step length. Using this to explore the stepping state-space, our analysis finds a region of attraction between disturbance magnitude and optimal step length surrounded by a region of similarly efficient alternatives which corresponds to the stochastic behaviour observed in humans during push recovery, which we would be unable to identify without reproducibility, direct access to internal measurements and known full body dynamics, which are not available in humans. We expand this paradigm further to investigate the invariant energetics of continuous walking using a full-body humanoid by exploring the state-space of step-length and step-timing to identify the most efficient sub-spaces of these parameters, which describe the most efficient way to walk. Through analysis of this state-space, we provide evidence that the humanoid morphology exhibits a passive tendency towards energy-optimal motion and its dynamics follow a region of attraction towards Cost of Transport-optimal motion.
Overall, these findings demonstrate the utility of robotics as a tool with which to explore certain aspects of legged locomotion, and the results gained from our methodology suggest that humans do not need to explore a vast state-action space to learn to walk; they need only internalise simple heuristics for the natural dynamics of stepping that are easy to learn and can produce rapid, reactive and efficient stepping without costly decision-making processes.
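The abstract refers to Cost of Transport without defining it. A common dimensionless definition, assumed here rather than taken from the thesis, is the energy used divided by weight times distance travelled. The sketch below uses that definition to pick the most efficient entry from a set of candidates; the candidate step lengths and energies are made-up numbers for illustration only.

```python
def cost_of_transport(energy_joules, mass_kg, distance_m, gravity=9.81):
    """Dimensionless Cost of Transport: E / (m * g * d).

    Assumption: this standard definition matches the one used in the thesis.
    """
    return energy_joules / (mass_kg * gravity * distance_m)


def most_efficient(candidates, mass_kg):
    """Return the label of the candidate with the lowest Cost of Transport.

    `candidates` maps a label to an (energy in J, distance in m) pair.
    """
    return min(
        candidates,
        key=lambda label: cost_of_transport(
            candidates[label][0], mass_kg, candidates[label][1]
        ),
    )


# Hypothetical measurements for three step lengths (not from the thesis).
trials = {"0.3 m": (40.0, 0.3), "0.5 m": (55.0, 0.5), "0.7 m": (95.0, 0.7)}
best = most_efficient(trials, mass_kg=30.0)
```

Because mass and gravity are the same for every candidate, the ranking reduces to energy per metre; the normalisation matters when comparing across robots or against humans.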

Edinburgh Research Archive

Dyadic collaborative manipulation formalism for optimizing human-robot teaming

Author: Stouraitis Theodoros
Publication date: 31/07/2021

Dyadic collaborative Manipulation (DcM) is a term we use to refer to a team of two individuals, the agent and the partner, jointly manipulating an object. The two individuals partner together to form a distributed system, augmenting their manipulation abilities. Effective collaboration between the two individuals during joint action depends on: (i) the breadth of the agent’s action repertoire, (ii) the level of model acquaintance between the two individuals, (iii) the ability to adapt one’s own actions online to the actions of the partner, and (iv) the ability to estimate the partner’s intentions and goals. Key to the successful completion of co-manipulation tasks with changing goals is the agent’s ability to change grasp-holds, especially in large object co-manipulation scenarios. Hence, in this work we developed a Trajectory Optimization (TO) method to enhance the repertoire of actions of robotic agents, by enabling them to plan and execute hybrid motions, i.e. motions that include discrete contact transitions, continuous trajectories and force profiles. The effectiveness of the TO method is investigated numerically and in simulation, in a number of manipulation scenarios with both a single and a bimanual robot. In addition, it is worth noting that the transition from free motion to contact is a challenging problem in robotics, in part due to its hybrid nature. Additionally, disregarding the effects of impacts at the motion planning level often results in intractable impulsive contact forces. To address this challenge, we introduce an impact-aware multi-mode TO method that combines hybrid dynamics and hybrid control in a coherent fashion. A key concept in our approach is the incorporation of an explicit contact force transmission model into the TO method. This allows the simultaneous optimization of the contact forces, contact timings, continuous motion trajectories and compliance, while satisfying task constraints.
To demonstrate the benefits of our method, we compared our method against standard compliance control and an impact-agnostic TO method in physical simulations. Also, we experimentally validated the proposed method with a robot manipulator on the task of halting a large-momentum object. Further, we propose a principled formalism to address the joint planning problem in DcM scenarios and we solve the joint problem holistically via model-based optimization by representing the human's behavior as task space forces. The task of finding the partner-aware contact points, forces and the respective timing of grasp-hold changes is carried out by a TO method using non-linear programming. Using simulations, the capability of the optimization method is investigated in terms of robot policy changes (trajectories, timings, grasp-holds) to potential changes of the collaborative partner policies. We also realized, in hardware, effective co-manipulation of a large object by the human and the robot, including eminent grasp changes as well as optimal dyadic interactions to realize the joint task. To address the online adaptation challenge of joint motion plans in dyads, we propose an efficient bilevel formulation which combines graph search methods with trajectory optimization, enabling robotic agents to adapt their policy on-the-fly in accordance with changes of the dyadic task. This method is the first to empower agents with the ability to plan online in hybrid spaces; optimizing over discrete contact locations, contact sequence patterns, continuous trajectories, and force profiles for co-manipulation tasks. This is particularly important in large object co-manipulation tasks that require on-the-fly plan adaptation. We demonstrate in simulation and with robot experiments the efficacy of the bilevel optimization by investigating the effect of robot policy changes in response to real-time alterations of the goal.
This thesis provides insight into joint manipulation setups performed by human-robot teams. In particular, it studies computational models of joint action and exploits the uncharted hybrid action space, which is especially relevant in general manipulation and co-manipulation tasks. It contributes towards developing a framework for DcM, capable of planning motions in the contact-force space, realizing these motions while considering impacts and joint action relations, as well as adapting these motion plans on-the-fly with respect to changes of the co-manipulation goals.

Edinburgh Research Archive

Video object segmentation and applications in temporal alignment and aspect learning

Author: Papazoglou Anestis
Publication date: 29/11/2016

Modern computer vision has recently seen significant progress in learning visual concepts from examples. This progress has been fuelled by recent models of visual appearance as well as recently collected large-scale datasets of manually annotated still images. Video is a promising alternative, as it inherently contains much richer information compared to still images. For instance, in video we can observe an object move which allows us to differentiate it from its surroundings, or we can observe a smooth transition between different viewpoints of the same object instance. This richness in information allows us to effectively tackle tasks that would otherwise be very difficult if we only considered still images, or even address tasks that are video-specific. Our first contribution is a computationally efficient technique for video object segmentation. Our method relies solely on motion in order to rapidly create a rough initial estimate of the foreground object. This rough initial estimate is then refined through an energy formulation to be spatio-temporally smooth. The method is able to handle rapidly moving backgrounds and objects, as well as non-rigid deformations and articulations without having prior knowledge about the object's appearance, size or location. In addition to this class-agnostic method, we present a class-specific method that incorporates additional class-specific appearance cues when the class of the foreground object is known in advance (e.g. a video of a car). For our second contribution, we propose a novel model for temporal video alignment with regard to the viewpoint of the foreground object (i.e., a pair of aligned frames shows the same object viewpoint). Our work relies on our video object segmentation technique to automatically localise the foreground objects and extract appearance measurements solely from them instead of the background.
Our model is able to temporally align realistic videos, where events may occur in a different order, or occur only in one of the videos. This is in contrast to previous works that typically assume that the videos show a scripted sequence of events and can simply be aligned by stretching or compressing one of the videos. As a final contribution, we once again use our video object segmentation technique as a basis for automatic visual aspect discovery from videos of an object class. Compared to previous works, we use a broader definition of an aspect that considers four factors of variation: viewpoint, articulated pose, occlusions and cropping by the image border. We pose the aspect discovery task as a clustering problem and provide an extensive experimental exploration of the benefits of object segmentation for this task.

Edinburgh Research Archive

An optimization-based formalism for shared autonomy in dynamic environments

Author: Mower Christopher
Publication date: 15/03/2022

Teleoperation is an integral component of various industrial processes, for example concrete spraying, assisted welding, plastering, inspection, and maintenance. Often these systems implement direct control that maps interface signals onto robot motions. Successful completion of tasks typically requires high levels of manual dexterity and cognitive load. In addition, the operator is often present near dangerous machinery. Consequently, safety is of critical importance and training is expensive and prolonged, in some cases taking several months or even years. An autonomous robot replacement would be an ideal solution since the human could be removed from danger and training costs significantly reduced. However, this is currently not possible due to the complexity and unpredictability of the environments, and the levels of situational and contextual awareness required to successfully complete these tasks. In this thesis, the limitations of direct control are addressed by developing methods for shared autonomy. A shared autonomous approach combines human input with autonomy to generate optimal robot motions. The approach taken in this thesis is to formulate shared autonomy within an optimization framework that finds optimized states and controls by minimizing a cost function, modeling task objectives, given a set of (changing) physical and operational constraints. Online shared autonomy requires the human to be continuously interacting with the system via an interface (akin to direct control). The key challenges addressed in this thesis are: 1) ensuring computational feasibility (such a method should be able to find solutions fast enough to achieve a sampling frequency bounded below by 40 Hz), 2) being reactive to changes in the environment and operator intention, 3) knowing how to appropriately blend operator input and autonomy, and 4) allowing the operator to supply input in an intuitive manner that is conducive to high task performance.
Various operator interfaces are investigated with regard to the control space, called a mode of teleoperation. Extensive evaluations were carried out to determine which modes are most intuitive and lead to highest performance in target acquisition tasks (e.g. spraying or welding). Our performance metrics quantified task difficulty based on Fitts' law, as well as a measure of how well constraints affecting the task performance were met. The experimental evaluations indicate that higher performance is achieved when humans submit commands in low-dimensional task spaces as opposed to joint space manipulations. In addition, our multivariate analysis indicated that those with regular exposure to computer games achieved higher performance. Shared autonomy aims to relieve human operators of the burden of precise motor control, tracking, and localization. An optimization-based representation for shared autonomy in dynamic environments was developed. Real-time tractability is ensured by modulating the human input with information of the changing environment within the same task space, instead of adding it to the optimization cost or constraints. The method was illustrated with two real world applications: grasping objects in cluttered environments and spraying tasks requiring sprayed linings with greater homogeneity. Maintaining motion patterns, referred to as skills, is often an integral part of teleoperation for various industrial processes (e.g. spraying, welding, plastering). We develop a novel model-based shared autonomous framework for incorporating the notion of skill assistance to aid operators to sustain these motion patterns whilst adhering to environment constraints. In order to achieve computational feasibility, we introduce a novel parameterization for state and control that combines skill and underlying trajectory models, leveraging a special type of curve known as Clothoids.
This new parameterization allows for efficient computation of skill-based short term horizon plans, enabling the use of a model predictive control loop. Our hardware realization validates the effectiveness of our method in recognizing a change of intended skill, and shows an improved quality of output motion, even under dynamically changing obstacles. In addition, extensions of the work to supervisory control are described. An exploratory study presents an approach that improves computational feasibility for complex tasks with minimal interactive effort on the part of the human. Adaptations are theorized which might allow such a method to be applicable and beneficial to high degree of freedom systems. Finally, a system developed in our lab is described that implements sliding autonomy and is shown to complete multi-objective tasks in complex environments with minimal interaction from the human.
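The abstract says task difficulty was quantified with Fitts' law but does not say which formulation was used. The sketch below assumes the widely used Shannon formulation, in which the index of difficulty is log2(D / W + 1) bits for target distance D and target width W, and throughput is that index divided by movement time. Function names and the example numbers are hypothetical.

```python
import math


def index_of_difficulty(distance, width):
    """Fitts' index of difficulty in bits (Shannon formulation, assumed)."""
    return math.log2(distance / width + 1.0)


def throughput(distance, width, movement_time_s):
    """Bits per second for one target acquisition."""
    return index_of_difficulty(distance, width) / movement_time_s


# A target 7 units away and 1 unit wide carries 3 bits of difficulty;
# reaching it in 1.5 s gives a throughput of 2 bits per second.
difficulty_bits = index_of_difficulty(7.0, 1.0)
bits_per_second = throughput(7.0, 1.0, 1.5)
```

Normalising by difficulty in this way lets acquisition times for near, large targets and far, small targets be compared on one scale across teleoperation modes.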

Edinburgh Research Archive