Learning control policies from constrained motion
Many everyday human skills can be framed in terms of performing some task subject
to constraints imposed by the task or the environment. Constraints are usually
unobservable and frequently change between contexts.
In this thesis, we explore the problem of learning control policies from data containing
variable, dynamic and non-linear constraints on motion. We show that an effective
approach for doing this is to learn the unconstrained policy in a way that is
consistent with the constraints.
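The code below is a minimal sketch, not the thesis's algorithm. It makes "consistent with the constraints" concrete under three assumptions:

- Constraints take the form A(x)u = 0, so each observed action is the null-space projection u_n = N_n π(x_n) of an unconstrained policy.
- The policy is linear in the state, π(x) = Wx.
- The function names and toy data are hypothetical.

Because N_n is an orthogonal projection, u_nᵀπ(x_n) = ‖u_n‖². Each sample therefore gives one linear equation in W.

```python
import numpy as np

def nullspace_projection(A):
    """Return N = I - pinv(A) @ A, the orthogonal projector onto the null space of A."""
    return np.eye(A.shape[1]) - np.linalg.pinv(A) @ A

def fit_unconstrained_linear_policy(X, U, eps=1e-9):
    """Fit pi(x) = W x from constrained observations u_n = N_n pi(x_n).

    Since N_n is symmetric and idempotent, u_n^T pi(x_n) = ||u_n||^2, i.e.
    u_hat_n^T W x_n = ||u_n||: one linear equation in the entries of W per sample.
    """
    rows, targets = [], []
    for x, u in zip(X, U):
        norm = np.linalg.norm(u)
        if norm < eps:                     # fully constrained sample: no information
            continue
        u_hat = u / norm
        rows.append(np.kron(u_hat, x))     # coefficients of W flattened row-major
        targets.append(norm)
    w, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return w.reshape(U.shape[1], X.shape[1])

# Toy data: a hypothetical linear policy observed under a different random
# one-dimensional constraint A_n (so N_n has rank 1) at every sample.
rng = np.random.default_rng(0)
W_true = np.array([[-1.0, 0.5], [0.0, -2.0]])
X = rng.normal(size=(200, 2))
U = np.array([nullspace_projection(rng.normal(size=(1, 2))) @ W_true @ x for x in X])
W_hat = fit_unconstrained_linear_policy(X, U)
```

Regressing U on X directly would generally not recover W_true. Each u_n only reveals the component of π(x_n) that its own constraint permits.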
We propose several novel algorithms for extracting these policies from movement
data, where observations are recorded under different constraints. Furthermore, we
show that, by doing so, we are able to learn representations of movement that generalise
over constraints and can predict behaviour under new constraints.
In our experiments, we test the algorithms on systems of varying size and complexity,
and show that the novel approaches give significant improvements in performance
compared with standard policy learning approaches that are naive to the effect of constraints.
Finally, we illustrate the utility of the approaches for learning from human
motion capture data and transferring behaviour to several robotic platforms.
Learning Dynamics for Robot Control under Varying Contexts
Institute of Perception, Action and Behaviour
High-fidelity, compliant robot control requires a sufficiently accurate dynamics
model. Often, however, such a model cannot be obtained analytically with sufficient
accuracy, or cannot be obtained at all. In such cases, an alternative is to learn the
dynamics model from movement data. This thesis discusses the problems specific to
dynamics learning for control under nonstationarity of the dynamics.
We refer to the cause of the nonstationarity as the context of the dynamics. Contexts
are, typically, not directly observable. For instance, the dynamics of a robot manipulator
change as the robot manipulates different objects, and the physical properties of
the load (the context of the dynamics) are not directly known by the controller. Other
examples of contexts that affect the dynamics are changing force fields or liquids of
different viscosities in which a manipulator has to operate.
The learned dynamics model needs to be adapted whenever the context and therefore
the dynamics changes. Inevitably, performance drops during the period of adaptation.
The goal of this work is to reuse and generalize the experience obtained by
learning the dynamics of different contexts in order to adapt quickly to changing contexts.
We first examine the case in which the dynamics may switch between a discrete, finite
set of contexts, and use multiple models, switching between them, to adapt the
controller quickly. We use a probabilistic formulation of multiple models in which a discrete
latent variable represents the unobserved context and indexes the models.
In comparison to previous multiple model approaches, the developed method is able
to learn multiple models of nonlinear dynamics, using an appropriately modified EM
algorithm.
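The following sketch illustrates the general idea: several models, indexed by a discrete latent context, learned with EM. It fits a mixture of linear regression models, a deliberately simplified stand-in. It assumes linear-in-features models with Gaussian noise. The thesis instead learns nonlinear dynamics with a modified EM algorithm that is not reproduced here. The function name and the two-context toy data are hypothetical.

```python
import numpy as np

def em_switching_linear_models(x, y, slopes_init, n_iter=200, var_floor=1e-6):
    """EM for K models y = a_k * x + b_k + Gaussian noise, where a discrete
    latent variable k (the unobserved context) indexes the model for each sample."""
    slopes_init = np.asarray(slopes_init, dtype=float)
    K = slopes_init.size
    Phi = np.column_stack([x, np.ones_like(x)])             # features [x, 1]
    theta = np.column_stack([slopes_init, np.zeros(K)])     # row k: (a_k, b_k)
    var = np.ones(K)                                        # per-model noise variance
    prior = np.full(K, 1.0 / K)                             # context probabilities
    for _ in range(n_iter):
        # E-step: posterior responsibility of each context for each sample.
        resid = y[:, None] - Phi @ theta.T                  # shape (N, K)
        log_post = (-0.5 * resid**2 / var - 0.5 * np.log(2 * np.pi * var)
                    + np.log(prior))
        log_post -= log_post.max(axis=1, keepdims=True)     # numerical stability
        R = np.exp(log_post)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted least squares for each model.
        for k in range(K):
            w = R[:, k]
            theta[k] = np.linalg.solve(Phi.T @ (w[:, None] * Phi), Phi.T @ (w * y))
            r = y - Phi @ theta[k]
            var[k] = max((w * r**2).sum() / w.sum(), var_floor)
        prior = R.mean(axis=0)
    return theta, var, prior

# Toy data from two hypothetical contexts with slopes 2 and -1.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 400)
context = rng.integers(0, 2, 400)                           # hidden from the learner
y = np.where(context == 0, 2.0 * x, -1.0 * x) + 0.05 * rng.normal(size=400)
theta, var, prior = em_switching_linear_models(x, y, slopes_init=[1.0, 0.0])
```

With these initial slopes, the learned slopes should approach 2 and -1. As with any EM procedure, a poor initialisation can end in a local optimum.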
We also deal with the case in which there is a continuum of possible contexts that
affect the dynamics; hence, it becomes essential to generalize from a set of experienced
contexts to novel contexts. There is very little previous work in this direction,
and the developed methods are completely novel. We introduce a set of continuous
latent variables to represent context and introduce a dynamics model that depends on
this set of variables. We first examine learning and inference in such a model when
there is strong prior knowledge on the relationship of these continuous latent variables
to the modulation of the dynamics, e.g., when the load at the end effector changes. We
also develop methods for the case where no such knowledge is available.
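When the relationship between context and dynamics is known, inferring the context can reduce to parameter estimation. This is a minimal sketch under stated assumptions, not the thesis's formulation:

- A single rigid link rotates in a vertical plane, with q measured from the downward vertical.
- A point load of unknown mass m sits at the tip.
- A model of the unloaded arm is available.

The load torque is then linear in m, so m follows from least squares. All names and numbers are hypothetical.

```python
import numpy as np

G = 9.81  # gravitational acceleration [m/s^2]

def load_regressor(q, qdd, link_length):
    """Joint torque per unit tip mass: l^2 * qdd + g * l * sin(q)."""
    return link_length**2 * qdd + G * link_length * np.sin(q)

def estimate_load_mass(q, qdd, tau_measured, tau_arm_model, link_length):
    """Least-squares estimate of m in tau = tau_arm + m * r(q, qdd)."""
    r = load_regressor(q, qdd, link_length)
    residual = tau_measured - tau_arm_model      # torque not explained by the bare arm
    return float(r @ residual / (r @ r))

# Toy data: hypothetical arm parameters and a 1.5 kg load.
rng = np.random.default_rng(2)
l, inertia_arm, m_arm, lc = 0.5, 0.05, 1.0, 0.25
q = rng.uniform(-1.0, 1.0, 300)
qdd = rng.normal(0.0, 2.0, 300)
tau_arm = inertia_arm * qdd + m_arm * G * lc * np.sin(q)
tau = tau_arm + 1.5 * load_regressor(q, qdd, l) + 0.01 * rng.normal(size=300)
m_hat = estimate_load_mass(q, qdd, tau, tau_arm, l)
```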
Finally, we formulate a dynamics model whose input is augmented with observed
variables that convey contextual information indirectly, e.g., the information from tactile
sensors at the interface between the load and the arm. This approach also allows
generalization to previously unseen contexts and is applicable when the nature of the
context is not known. In addition, we show that such a model can be used even
when special sensory input is not available, by using an instance of an autoregressive
model.
The developed methods are tested on realistic, full physics simulations of robot
arm systems, including a simple 3-degree-of-freedom (DOF) arm and a simulation
of the 7 DOF DLR light weight robot arm. In the experiments, varying contexts are
different manipulated objects. Nevertheless, the developed methods (with the exception
of the methods that require prior knowledge on the relationship of the context to
the modulation of the dynamics) are more generally applicable and could be used to
deal with different context variation scenarios.
Stochastic optimal control with learned dynamics models
The motor control of anthropomorphic robotic systems is a challenging computational
task, mainly because of the high degree of redundancy such systems exhibit. Optimality
principles provide a general strategy to resolve such redundancies in a task-driven
fashion. In particular, closed-loop optimisation, i.e., optimal feedback control (OFC),
has served as a successful motor control model as it unifies important concepts such
as costs, noise, sensory feedback and internal models into a coherent mathematical
framework.
Realising OFC on realistic anthropomorphic systems, however, is non-trivial. Firstly,
such systems typically have high dimensionality and nonlinear dynamics, in which
case the optimisation problem becomes computationally intractable. Approximate
methods, such as the iterative linear quadratic Gaussian (ILQG), have been proposed to
avoid this; however, the transfer of solutions from idealised simulations to real hardware
systems has proved to be challenging. Secondly, OFC relies on an accurate description
of the system dynamics, which for many realistic control systems may be unknown,
difficult to estimate, or subject to frequent systematic changes. Thirdly, many (especially
biologically inspired) systems suffer from significant state- or control-dependent
sources of noise, which are difficult to model in a generally valid fashion. This thesis
addresses these issues with the aim of realising efficient OFC for anthropomorphic
manipulators.
First we investigate the implementation of OFC laws on anthropomorphic hardware.
Using ILQG we optimally control a high-dimensional anthropomorphic manipulator
without having to specify an explicit inverse kinematics, inverse dynamics
or feedback control law. We achieve this by introducing a novel cost function that
accounts for the physical constraints of the robot and a dynamics formulation that resolves
discontinuities in the dynamics. The experimental hardware results reveal the
benefits of OFC over traditional (open loop) optimal controllers in terms of energy
efficiency and compliance, properties that are crucial for the control of modern anthropomorphic
manipulators.
We then propose a new framework of OFC with learned dynamics (OFC-LD) that,
unlike classic approaches, does not rely on analytic dynamics functions but rather updates
the internal dynamics model continuously from sensorimotor plant feedback. We
demonstrate how this approach can compensate for unknown dynamics and for complex
dynamic perturbations in an online fashion.
A specific advantage of a learned dynamics model is that it contains the stochastic
information (i.e., noise) from the plant data, which corresponds to the uncertainty in
the system. Consequently, one can exploit this information within OFC-LD in order
to produce control laws that minimise the uncertainty in the system. In the domain of
antagonistically actuated systems this approach leads to improved motor performance,
which is achieved by co-contracting antagonistic actuators in order to reduce the negative
effects of the noise. Most importantly, the shape and source of the noise are unknown
a priori and are learned solely from plant data. The model is successfully tested on an
antagonistic series elastic actuator (SEA) that we have built for this purpose.
The proposed OFC-LD model is not only applicable to robotic systems but also
proves to be very useful in the modelling of biological motor control phenomena and
we show how our model can be used to predict a wide range of human impedance
control patterns during both stationary and adaptation tasks.
Bayesian locally weighted online learning
Locally weighted regression is a non-parametric regression technique that is capable
of coping with non-stationarity of the input distribution. Online algorithms like
Receptive Field Weighted Regression and Locally Weighted Projection Regression use
a sparse representation of the locally weighted model to approximate a target function,
resulting in an efficient learning algorithm. However, these algorithms are fairly sensitive
to parameter initializations and have multiple open learning parameters that are
usually set using insight into the problem and local heuristics. In this thesis,
we attempt to alleviate these problems by using a probabilistic formulation of locally
weighted regression followed by a principled Bayesian inference of the parameters.
In the Randomly Varying Coefficient (RVC) model developed in this thesis, locally
weighted regression is set up as an ensemble of regression experts that provide
a local linear approximation to the target function. We train the individual experts independently
and then combine their predictions using a Product of Experts formalism.
Independent training of experts allows us to adapt the complexity of the regression
model dynamically while learning in an online fashion. The local experts themselves
are modeled using a hierarchical Bayesian probability distribution with Variational
Bayesian Expectation Maximization steps to learn the posterior distributions over the
parameters. The Bayesian modeling of the local experts leads to an inference procedure
that is fairly insensitive to parameter initializations and avoids problems like
overfitting. We further exploit the Bayesian inference procedure to derive efficient online
update rules for the parameters. Learning in the regression setting is also extended
to handle a classification task by making use of a logistic regression to model discrete
class labels.
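This sketch illustrates, under simplifying assumptions, the two ingredients described above. First, each local expert is a locality-weighted Bayesian linear regression with a fixed prior precision `alpha` and a fixed noise precision `beta`. The thesis instead learns such quantities with Variational Bayesian EM. Second, predictions are combined as a product of Gaussian experts, in which precisions add. Scaling each expert's precision by its locality weight is an assumption made here so that distant experts contribute little. Names and parameter values are hypothetical.

```python
import numpy as np

def fit_local_expert(x, y, center, width, alpha=1e-3, beta=100.0):
    """Locality-weighted Bayesian linear regression around `center`
    (alpha: prior precision on the weights, beta: assumed noise precision)."""
    w = np.exp(-0.5 * ((x - center) / width) ** 2)              # locality weights
    phi = np.column_stack([np.ones_like(x), x - center])         # local linear features
    precision = alpha * np.eye(2) + beta * phi.T @ (w[:, None] * phi)
    cov = np.linalg.inv(precision)                               # posterior covariance
    mean = beta * cov @ (phi.T @ (w * y))                        # posterior mean
    return {"center": center, "width": width, "mean": mean, "cov": cov, "beta": beta}

def expert_predict(expert, xq):
    """Predictive mean, predictive variance and locality weight at query xq."""
    phi = np.array([1.0, xq - expert["center"]])
    mu = phi @ expert["mean"]
    var = 1.0 / expert["beta"] + phi @ expert["cov"] @ phi
    locality = np.exp(-0.5 * ((xq - expert["center"]) / expert["width"]) ** 2)
    return mu, var, locality

def combine_gaussians(means, precisions):
    """Product of Gaussian experts: precisions add, means are precision-weighted."""
    means, precisions = np.asarray(means), np.asarray(precisions)
    total = precisions.sum()
    return float(precisions @ means / total), float(1.0 / total)

def predict(experts, xq):
    mus, precs = [], []
    for e in experts:
        mu, var, locality = expert_predict(e, xq)
        mus.append(mu)
        precs.append(locality / var)     # assumption: locality-scaled precision
    return combine_gaussians(mus, precs)

# Each expert is fitted independently of the others.
x = np.linspace(-3.0, 3.0, 601)
y = np.sin(x)
experts = [fit_local_expert(x, y, c, width=0.2) for c in np.linspace(-3.0, 3.0, 13)]
mean, var = predict(experts, 0.5)
```

Because the experts never see each other's parameters, one can be added or removed without refitting the rest. This is the property that lets model complexity change during online learning.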
The main contribution of the thesis is a spatially localised online learning algorithm
set up in a probabilistic framework with principled Bayesian inference rules for the
parameters of the model that learns local models completely independent of each other,
uses only local information and adapts the local model complexity in a data driven
fashion. This thesis, for the first time, brings together the computational efficiency
and the adaptability of ‘non-competitive’ locally weighted learning schemes and the
modelling guarantees of the Bayesian formulation.
Computational models of motor adaptation under multiple classes of sensorimotor disturbance
The human motor system exhibits remarkable adaptability, enabling us to maintain
high levels of performance despite ever-changing requirements. There are many potential
sources of error during movement to which the motor system may need to adapt:
the properties of our bodies or tools may vary over time, either at a dynamic or a kinematic
level; our senses may become miscalibrated over time and mislead us as to the
state of our bodies or the true location of an intended goal; the relationship between
sensory stimuli and movement goals may change. Despite these many varied ways in
which our movements may be disturbed, existing models of human motor adaptation
have tended to assume just a single adaptive component.
In this thesis, I argue that the motor system maintains multiple components of
adaptation, corresponding to the multiple potential sources of error to which we are
exposed. I outline some of the shortcomings of existing adaptation models in scenarios
where multiple kinds of disturbances may be present - in particular examining
how different distal learning problems associated with different classes of disturbance
can affect adaptation within alternative cerebellar-based learning architectures - and
outline the computational challenges associated with extending these existing models.
Focusing on the specific problem in which the potential disturbances are miscalibrations
of vision and proprioception and changes in arm dynamics during reaching,
a unified model of sensory and motor adaptation is derived based on the principle
of Bayesian estimation of the disturbances given noisy observations. This model is
able to account parsimoniously for previously reported patterns of sensory and motor
adaptation during exposure to shifted visual feedback. However, the model additionally
makes the novel and surprising prediction that adaptation to a force field will also
result in sensory adaptation. These predictions are confirmed experimentally. The success
of the model strongly supports the idea that the motor system maintains multiple
components of adaptation, which it updates according to the principles of Bayesian
estimation.
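This is a minimal sketch of Bayesian estimation of several disturbance components from noisy observations. It assumes that each component evolves as a Gaussian random walk and that observations are linear in the components, so the estimator is a standard Kalman filter. The observation matrix in the usage lines is hypothetical and is not the model derived in the thesis. Its first channel sees only a sensory offset; its second sees the sensory offset plus a motor error.

```python
import numpy as np

def kalman_disturbance_estimates(observations, H, obs_var, process_var):
    """Kalman filter for disturbances d_t (random walk) observed as y_t = H d_t + noise."""
    n_comp = H.shape[1]
    d = np.zeros(n_comp)                         # prior mean: no disturbance
    P = np.eye(n_comp)                           # prior covariance
    Q = process_var * np.eye(n_comp)             # random-walk process noise
    R = obs_var * np.eye(H.shape[0])             # observation noise
    history = []
    for y in observations:
        P = P + Q                                # predict step
        S = H @ P @ H.T + R                      # innovation covariance
        gain = P @ H.T @ np.linalg.inv(S)        # Kalman gain
        d = d + gain @ (y - H @ d)               # posterior mean
        P = (np.eye(n_comp) - gain @ H) @ P      # posterior covariance
        history.append(d.copy())
    return np.array(history)

# Hypothetical observation model and constant true disturbances.
rng = np.random.default_rng(3)
H = np.array([[1.0, 0.0],    # channel 1: sensory offset only
              [1.0, 1.0]])   # channel 2: sensory offset + motor error
d_true = np.array([0.5, -0.3])
ys = [H @ d_true + 0.01 * rng.normal(size=2) for _ in range(300)]
est = kalman_disturbance_estimates(ys, H, obs_var=1e-4, process_var=1e-6)
```

The components are separable here only because H has full column rank. With a single observation channel that mixes them, the filter could not tell them apart.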
Exoskeleton-assisted locomotion: design, control and evaluation of wearable robotic devices
Assistive robotic devices such as exoskeletons and prosthetic limbs have great
potential as tools for both augmentation and rehabilitation. However, due to
the complexity of controlling these devices, especially in unstructured environments
where factors such as walking speed and incline can vary rapidly, it is
uncommon to see exoskeletons outside of a clinical or research setting. Prostheses,
whilst more common, are typically passive, which limits their ability
to match the push-off forces associated with healthy gait.
Drawing on modern techniques for controlling legged robots, this thesis
pursues an optimisation-based approach to the control and
design of exoskeletons. We identify a number of open problems within the
field, namely (1) how to model the dynamic interaction between a human
subject and an attached exoskeleton; (2) identifying the appropriate metric
or combination of metrics to optimise for in exoskeleton-assisted locomotion;
and (3) how to account for changes in human walking style induced by the
presence of external assistive forces. This thesis details attempts to solve each
of these problems.
We present a methodology for expressing human-exoskeleton system models
as a combination of musculoskeletal models, exoskeleton inertial parameters
and constraint forces. A specific human-exoskeleton model is detailed,
along with a range of methods for modelling the interaction forces which occur
at the attachment points between the human and exoskeleton agents. Experimental
motion data is analysed using musculoskeletal modelling software
(OpenSim) to quantify the effect that each of these interaction models, which
represent various degrees of approximation, has on the resulting human-exoskeleton
dynamics.
Applying exoskeleton assistance is inherently a shared control problem.
The overall goal is not to achieve a prescribed motion at any cost, or to do
so while minimising exoskeleton joint torques, but rather to enhance aspects
of the assisted human's motion; for example, increasing energy efficiency or
stability. Therefore, in order to optimise exoskeleton control patterns we must
first consider what it means for the resultant gait patterns to be optimal, or
even good. We present a detailed analysis of exoskeleton-assisted walking in
healthy subjects, with a particular focus on identifying those metrics which are
invariant to changes in walking condition (e.g. walking speed or incline). We
posit that such metrics, which exhibit strong invariance properties, are good
candidates for the objective function of an optimisation-based controller.
Human walking strategies are unique and complex, and the problem of
predicting the effect of exoskeleton assistance on a subject's gait pattern is a
challenging one. In recent years, methods that aim to learn suitable assistance
strategies directly from a subject, via a process known as human-in-the-loop
optimisation, have proved successful. We present a novel human-in-the-loop
framework which utilises musculoskeletal modelling to make the
learning process more time-efficient. Our method is evaluated on a number of
subjects walking on a treadmill with exoskeleton assistance. In addition, we
also explore how human-in-the-loop optimisation can be used to inform the
design of exoskeletons to enhance their assistive capabilities.
Overall, these contributions represent a step towards enabling the wider
usage of exoskeletons and other assistive robotic devices, which could lead to
significant improvements to quality of life for many.
Understanding the fundamentals of bipedal locomotion in humans and robots
Walking is a robust and efficient way of moving around the world that would greatly enhance the capabilities of humanoid robots, yet humanoids cannot match the performance of their biological counterparts. The highly nonlinear dynamics of locomotion create a vast state-action space, which makes model-based control difficult, yet humans are highly proficient and robust in their motion while operating under similar constraints. This disparity in performance naturally leads to the question: what can we learn about locomotion control by observing humans, and how can this be used to develop bio-inspired locomotion control in mechatronic humanoids? This thesis investigates bio-inspired locomotion control, but also explores the limitations of this approach and how we can use robotic platforms to move towards a better understanding of locomotion.
We first present a methodology for measuring and analysing human locomotion behaviour, specifically disturbance recovery, and fit models that represent this complex behaviour as simply as possible so that it can easily be translated into a simple controller for reactive motion. A minimum-jerk Model Predictive Control algorithm at the Centre of Mass (CoM) best captured human motion across multiple recovery strategies, rather than using one controller for each strategy, as is common in this area. Capturing this simple CoM model of complex human behaviour shows that bio-inspiration can be an important tool for controller development. However, behaviour varies between and even within individuals given similar initial conditions, which manifests as stochastic behaviour. Coupled with the fact that only expressed behaviours, not the underlying control policies, can be measured, this stochasticity presents a fundamental limit to using bio-inspiration for control purposes, as only indirect inferences can be made about a complex, stochastic system.
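The thesis's Model Predictive Control formulation is not reproduced here. As an illustration of the minimum-jerk ingredient only, this sketch evaluates the standard closed-form minimum-jerk profile for moving the CoM between two rest positions. The profile has zero velocity and acceleration at both ends. The 0.3 m shift over 0.5 s in the usage line is an arbitrary example value, and the function name is hypothetical.

```python
import numpy as np

def minimum_jerk(x0, xf, duration, t):
    """Minimum-jerk position and velocity from rest at x0 to rest at xf."""
    tau = np.clip(np.asarray(t, dtype=float) / duration, 0.0, 1.0)  # normalised time
    s = 10 * tau**3 - 15 * tau**4 + 6 * tau**5                       # normalised position
    ds = (30 * tau**2 - 60 * tau**3 + 30 * tau**4) / duration        # d s / d t
    return x0 + (xf - x0) * s, (xf - x0) * ds

# Example: shift the CoM by 0.3 m over 0.5 s.
pos, vel = minimum_jerk(0.0, 0.3, 0.5, np.linspace(0.0, 0.5, 6))
```

In a receding-horizon controller, such a profile would be recomputed from the current CoM state at every step. The closed form above, however, assumes the motion starts at rest.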
To overcome these barriers, we investigate the use of mechatronic humanoid robots as a means to explore invariant aspects of the vast dynamic state space of locomotion: aspects that are described by physical laws applying to both biological and mechatronic humanoid forms, and are therefore not subject to the stochastic behaviour of individual humans. We present a pipeline to explore the invariant energetics of humanoid robots during stepping for push recovery, in which the most efficient stepping parameters are identified for a given initial CoM velocity and desired step length. Using this pipeline to explore the stepping state space, our analysis finds a region of attraction between disturbance magnitude and optimal step length, surrounded by a region of similarly efficient alternatives that corresponds to the stochastic behaviour observed in humans during push recovery. We could not have identified this without reproducibility, direct access to internal measurements and known full-body dynamics, none of which are available in humans.
We expand this paradigm further to investigate the invariant energetics of continuous walking using a full-body humanoid, exploring the state space of step length and step timing to identify the sub-spaces of these parameters that describe the most efficient way to walk. Through analysis of this state space, we provide evidence that the humanoid morphology exhibits a passive tendency towards energy-optimal motion and that its dynamics follow a region of attraction towards Cost of Transport-optimal motion.
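Cost of Transport is commonly defined as energy divided by the product of weight and distance travelled, which makes it dimensionless. This sketch computes it and runs an exhaustive search over a grid of step lengths and step times. `toy_energy` is a purely illustrative surrogate, not a model or result from the thesis. In practice the energy for each parameter pair would come from simulation or measurement. The function names are hypothetical.

```python
import numpy as np

G = 9.81  # gravitational acceleration [m/s^2]

def cost_of_transport(energy_j, mass_kg, distance_m):
    """Dimensionless Cost of Transport: energy / (weight * distance)."""
    return energy_j / (mass_kg * G * distance_m)

def most_efficient_gait(energy_fn, mass_kg, step_lengths, step_times, n_steps=10):
    """Grid search returning (CoT, step_length, step_time) with the lowest CoT.
    energy_fn(step_length, step_time, n_steps) must return energy in joules."""
    best = None
    for length in step_lengths:
        for step_time in step_times:
            energy = energy_fn(length, step_time, n_steps)
            cot = cost_of_transport(energy, mass_kg, length * n_steps)
            if best is None or cot < best[0]:
                best = (cot, length, step_time)
    return best

def toy_energy(step_length, step_time, n_steps):
    # Purely illustrative surrogate, NOT a model from the thesis.
    return n_steps * (50.0 * step_length**2 + 20.0 * (step_time - 0.5)**2 + 10.0)

best = most_efficient_gait(toy_energy, 60.0,
                           np.linspace(0.2, 0.8, 13), np.linspace(0.3, 0.7, 9))
```

Keeping every grid value rather than only the minimum would expose the surrounding region of near-optimal parameters, which is the kind of structure the analysis above looks for.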
Overall, these findings demonstrate the utility of robotics as a tool with which to explore certain aspects of legged locomotion, and the results gained from our methodology suggest that humans do not need to explore a vast state-action space to learn to walk; they need only internalise simple heuristics for the natural dynamics of stepping that are easy to learn and can produce rapid, reactive and efficient stepping without costly decision-making processes.
Dyadic collaborative manipulation formalism for optimizing human-robot teaming
Dyadic collaborative Manipulation (DcM) is a term we use to refer to a team of two individuals, the agent and the partner, jointly manipulating an object. The two individuals partner together to form a distributed system, augmenting their manipulation abilities. Effective collaboration between the two individuals during joint action depends on: (i) the breadth of the agent’s action repertoire, (ii) the level of model acquaintance between the two individuals, (iii) the ability to adapt one’s own actions online to those of the partner, and (iv) the ability to estimate the partner’s intentions and goals.
Key to the successful completion of co-manipulation tasks with changing goals is the agent’s ability to change grasp-holds, especially in large object co-manipulation scenarios. Hence, in this work we developed a Trajectory Optimization (TO) method to enhance the repertoire of actions of robotic agents, by enabling them to plan and execute hybrid motions, i.e. motions that include discrete contact transitions, continuous trajectories and force profiles. The effectiveness of the TO method is investigated numerically and in simulation, in a number of manipulation scenarios with both a single and a bimanual robot.
Transitions from free motion to contact are a challenging problem in robotics, in part due to their hybrid nature. Moreover, disregarding the effects of impacts at the motion planning level often results in intractable impulsive contact forces. To address this challenge, we introduce an impact-aware multi-mode TO method that combines hybrid dynamics and hybrid control in a coherent fashion. A key concept in our approach is the incorporation of an explicit contact force transmission model into the TO method. This allows the simultaneous optimization of the contact forces, contact timings, continuous motion trajectories and compliance, while satisfying task constraints. To demonstrate the benefits of our method, we compared it against standard compliance control and an impact-agnostic TO method in physical simulations. We also experimentally validated the proposed method with a robot manipulator on the task of halting a large-momentum object.
Further, we propose a principled formalism to address the joint planning problem in DcM scenarios, and we solve the joint problem holistically via model-based optimization by representing the human's behavior as task space forces. The task of finding the partner-aware contact points, forces and the respective timing of grasp-hold changes is carried out by a TO method using non-linear programming. Using simulations, the capability of the optimization method is investigated in terms of robot policy changes (trajectories, timings, grasp-holds) in response to potential changes in the collaborative partner's policies. We also realized, in hardware, effective co-manipulation of a large object by the human and the robot, including eminent grasp changes as well as optimal dyadic interactions to realize the joint task.
To address the online adaptation challenge of joint motion plans in dyads, we propose an efficient bilevel formulation which combines graph search methods with trajectory optimization, enabling robotic agents to adapt their policy on-the-fly in accordance with changes of the dyadic task. This method is the first to empower agents with the ability to plan online in hybrid spaces, optimizing over discrete contact locations, contact sequence patterns, continuous trajectories, and force profiles for co-manipulation tasks. This is particularly important in large object co-manipulation tasks that require on-the-fly plan adaptation. We demonstrate in simulation and with robot experiments the efficacy of the bilevel optimization by investigating the effect of robot policy changes in response to real-time alterations of the goal.
This thesis provides insight into joint manipulation setups performed by human-robot teams. In particular, it studies computational models of joint action and exploits the uncharted hybrid action space, which is especially relevant in general manipulation and co-manipulation tasks. It contributes towards developing a framework for DcM, capable of planning motions in the contact-force space, realizing these motions while considering impacts and joint action relations, as well as adapting these motion plans on the fly with respect to changes of the co-manipulation goals.
Video object segmentation and applications in temporal alignment and aspect learning
Modern computer vision has recently seen significant progress in learning visual concepts
from examples. This progress has been fuelled by recent models of visual appearance
as well as recently collected large-scale datasets of manually annotated still
images. Video is a promising alternative, as it inherently contains much richer information
compared to still images. For instance, in video we can observe an object move
which allows us to differentiate it from its surroundings, or we can observe a smooth
transition between different viewpoints of the same object instance. This richness in
information allows us to effectively tackle tasks that would otherwise be very difficult
if we only considered still images, or even address tasks that are video-specific.
Our first contribution is a computationally efficient technique for video object segmentation.
Our method relies solely on motion in order to rapidly create a rough initial
estimate of the foreground object. This rough initial estimate is then refined through
an energy formulation to be spatio-temporally smooth. The method is able to handle
rapidly moving backgrounds and objects, as well as non-rigid deformations and articulations
without having prior knowledge about the object's appearance, size or location.
In addition to this class-agnostic method, we present a class-specific method that incorporates
additional class-specific appearance cues when the class of the foreground
object is known in advance (e.g. a video of a car).
For our second contribution, we propose a novel model for temporal video alignment
with regard to the viewpoint of the foreground object (i.e., a pair of aligned
frames shows the same object viewpoint). Our work relies on our video object segmentation
technique to automatically localise the foreground objects and extract appearance
measurements solely from them instead of the background. Our model is able
to temporally align realistic videos, where events may occur in a different order, or
occur only in one of the videos. This is in contrast to previous works that typically
assume that the videos show a scripted sequence of events and can simply be aligned
by stretching or compressing one of the videos.
As a final contribution, we once again use our video object segmentation technique
as a basis for automatic visual aspect discovery from videos of an object class. Compared
to previous works, we use a broader definition of an aspect that considers four
factors of variation: viewpoint, articulated pose, occlusions and cropping by the image
border. We pose the aspect discovery task as a clustering problem and provide an
extensive experimental exploration on the benefits of object segmentation for this task.
An optimization-based formalism for shared autonomy in dynamic environments
Teleoperation is an integral component of various industrial processes, for
example, concrete spraying, assisted welding, plastering, inspection, and
maintenance. Often these systems implement direct control that maps interface
signals onto robot motions. Successful completion of tasks typically requires
high levels of manual dexterity and imposes a high cognitive load. In addition, the
operator often works near dangerous machinery. Consequently, safety is of critical
importance, and training is expensive and prolonged, in some cases taking
several months or even years.
An autonomous robot replacement would be an ideal solution since the human could
be removed from danger and training costs significantly reduced. However, this
is currently not possible due to the complexity and unpredictability of the
environments, and the levels of situational and contextual awareness required to
successfully complete these tasks.
In this thesis, the limitations of direct control are addressed by developing
methods for shared autonomy. A shared autonomous approach combines
human input with autonomy to generate optimal robot motions. The approach taken
in this thesis is to formulate shared autonomy within an optimization framework
that finds optimized states and controls by minimizing a cost function, modeling
task objectives, given a set of (changing) physical and operational constraints.
Online shared autonomy requires the human to be continuously interacting with
the system via an interface (akin to direct control). The key challenges
addressed in this thesis are: 1) ensuring computational feasibility (such a
method should be able to find solutions fast enough to achieve a sampling
frequency of at least 40 Hz), 2) being reactive to changes in the
environment and operator intention, 3) knowing how to appropriately blend
operator input and autonomy, and 4) allowing the operator to supply input in an
intuitive manner that is conducive to high task performance.
Various operator interfaces are investigated with regard to the control space,
called a mode of teleoperation. Extensive evaluations were carried out
to determine which modes are most intuitive and lead to the highest performance
in target acquisition tasks (e.g. spraying, welding). Our performance metrics
quantified task difficulty based on Fitts' law, as well as a measure of how well
constraints affecting the task performance were met. The experimental
evaluations indicate that higher performance is achieved when humans submit
commands in low-dimensional task spaces as opposed to joint space manipulations.
In addition, our multivariate analysis indicated that those with regular
exposure to computer games achieved higher performance.
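The thesis does not state which formulation of Fitts' law it uses. This sketch assumes the common Shannon formulation. For a target at distance D with width W, the index of difficulty is log2(D/W + 1) bits, and throughput divides it by movement time. The function names are hypothetical.

```python
import math

def index_of_difficulty(distance, width):
    """Shannon formulation of Fitts' index of difficulty, in bits."""
    return math.log2(distance / width + 1.0)

def throughput(distance, width, movement_time_s):
    """Bits per second achieved on one target-acquisition trial."""
    return index_of_difficulty(distance, width) / movement_time_s

# Example: a target 7 units away and 1 unit wide, acquired in 1.5 s.
id_bits = index_of_difficulty(7.0, 1.0)
tp = throughput(7.0, 1.0, 1.5)
```

Normalising by difficulty in this way lets trials with different target sizes and distances be compared across teleoperation modes.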
Shared autonomy aims to relieve human operators of the burden of precise motor
control, tracking, and localization. An optimization-based representation for
shared autonomy in dynamic environments was developed. Real-time tractability is
ensured by modulating the human input with information of the changing
environment within the same task space, instead of adding it to the optimization
cost or constraints. The method was illustrated with two real-world
applications: grasping objects in cluttered environments and spraying tasks
requiring sprayed linings with greater homogeneity.
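The key point above is that environment information shapes the human input directly in task space, rather than entering the optimisation cost or constraints. The sketch below is a hypothetical, deliberately simple instance of such modulation, not the thesis's method:

- The component of a velocity command that points toward a nearby obstacle is scaled down linearly with distance.
- Motion away from, or tangential to, the obstacle passes through unchanged.

The function name and the `safe_dist` parameter are illustrative assumptions.

```python
import numpy as np

def modulate_velocity(v_cmd, position, obstacle, safe_dist=0.5):
    """Attenuate the approach component of a task-space velocity command.
    Hypothetical scheme: full pass-through at safe_dist, zero approach at contact."""
    v = np.asarray(v_cmd, dtype=float)
    offset = np.asarray(obstacle, dtype=float) - np.asarray(position, dtype=float)
    dist = np.linalg.norm(offset)
    if dist >= safe_dist:
        return v                              # obstacle far away: command unchanged
    if dist == 0.0:
        return np.zeros_like(v)               # at contact: stop (conservative choice)
    n = offset / dist                         # unit vector toward the obstacle
    approach = v @ n                          # speed toward the obstacle
    if approach <= 0.0:
        return v                              # moving away or tangentially: unchanged
    scale = dist / safe_dist                  # 1 at the boundary, 0 at contact
    return v - (1.0 - scale) * approach * n
```

Because this is a closed-form map applied to the operator's command, it adds no variables or constraints to the optimisation. That is the property the text credits for real-time tractability.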
Maintaining motion patterns -- referred to as skills -- is often an
integral part of teleoperation for various industrial processes (e.g. spraying,
welding, plastering). We develop a novel model-based shared autonomous framework
for incorporating the notion of skill assistance to aid operators to sustain
these motion patterns whilst adhering to environment constraints. In order to
achieve computational feasibility, we introduce a novel parameterization for
state and control that combines skill and underlying trajectory models,
leveraging a special type of curve known as clothoids. This new parameterization
allows for efficient computation of skill-based short-horizon plans,
enabling the use of a model predictive control loop. Our hardware realization
validates the ability of our method to recognize a change of intended
skill and shows an improved quality of output motion, even with dynamically
changing obstacles.
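A clothoid is a curve whose curvature varies linearly with arc length, κ(s) = κ0 + c·s. Its heading is therefore quadratic in s, and its position requires Fresnel-type integrals. This sketch samples such a curve using trapezoidal integration. It shows only the defining property of the curve, not the thesis's combined skill and trajectory parameterization. The function name is hypothetical.

```python
import numpy as np

def clothoid_points(kappa0, sharpness, length, theta0=0.0, n=2000):
    """Sample a clothoid starting at the origin with heading theta0.
    Curvature kappa(s) = kappa0 + sharpness * s, so heading is
    theta(s) = theta0 + kappa0 * s + 0.5 * sharpness * s**2 (closed form);
    positions integrate (cos theta, sin theta) with the trapezoidal rule."""
    s = np.linspace(0.0, length, n)
    theta = theta0 + kappa0 * s + 0.5 * sharpness * s**2
    ds = s[1] - s[0]
    dx, dy = np.cos(theta), np.sin(theta)
    x = np.concatenate([[0.0], np.cumsum(0.5 * (dx[1:] + dx[:-1]) * ds)])
    y = np.concatenate([[0.0], np.cumsum(0.5 * (dy[1:] + dy[:-1]) * ds)])
    return x, y, theta

# A segment whose curvature grows from 0 to 2 over 1 m of arc length.
x, y, theta = clothoid_points(kappa0=0.0, sharpness=2.0, length=1.0)
```

With `sharpness=0` the curve reduces to a circular arc, or to a straight line when `kappa0=0` as well. Straight lines and circular arcs are therefore special cases of a single parameterization.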
In addition, extensions of the work to supervisory control are described. An
exploratory study presents an approach that improves computational feasibility
for complex tasks with minimal interactive effort on the part of the human.
Adaptations are theorized that might make such a method applicable and
beneficial to high-degree-of-freedom systems. Finally, a system developed in our
lab is described that implements sliding autonomy and is shown to complete
multi-objective tasks in complex environments with minimal interaction from the
human.
