A Consensus Q-Learning Approach for Decentralized Control of Shared Energy Storage
In this letter, we study the decentralized scheduling of an energy storage system shared among residential households. In particular, we consider the households as learning agents and model their interaction as a Markov Game. To address the challenges associated with the non-stationary nature of multi-agent learning, we propose a consensus-based tabular learning method. Additionally, we provide simulation studies utilizing a real-world household dataset and demonstrate the effectiveness of our approach.
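The abstract does not spell out the update rule, so the following is only a generic sketch of a consensus-plus-local-correction tabular Q-learning step, not the authors' algorithm. The function name `consensus_q_step`, the row-stochastic mixing matrix `W`, and the array shapes are all illustrative assumptions.

```python
import numpy as np

def consensus_q_step(Q, W, s, actions, rewards, s_next, alpha, gamma):
    """One generic consensus Q-learning step (hypothetical sketch).

    Q:       array (n_agents, n_states, n_actions), one table per agent
    W:       row-stochastic mixing matrix (n_agents, n_agents), an assumption
    s:       index of the current shared state
    actions: array (n_agents,), the action index each agent took
    rewards: array (n_agents,), the local reward each agent received
    """
    Q = np.asarray(Q, dtype=float)
    actions = np.asarray(actions)
    rewards = np.asarray(rewards, dtype=float)
    agents = np.arange(Q.shape[0])

    # Consensus part: every agent averages the tables of its neighbours.
    Q_mixed = np.einsum("ij,jsa->isa", W, Q)

    # Local part: temporal-difference correction at the visited entries,
    # computed from each agent's own table and own reward.
    td = rewards + gamma * Q[:, s_next, :].max(axis=1) - Q[agents, s, actions]
    Q_mixed[agents, s, actions] += alpha * td
    return Q_mixed
```

Averaging over neighbours is one common way to damp the non-stationarity that arises when each agent learns independently; other mixing schemes are equally plausible.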
Coverage Area Determination for Conical Fields of View Considering an Oblate Earth
This paper introduces a new analytical method for the determination of the coverage area modeling the Earth as an oblate ellipsoid of rotation. Starting from the knowledge of the satellite’s position vector and the direction of the navigation antenna line of sight, the surface generated by the intersection of the oblate ellipsoid and the assumed conical field of view is decomposed into many ellipses, obtained by cutting the Earth’s surface with every plane containing the navigation antenna line of sight. The geometrical parameters of each ellipse can be derived analytically together with the points of intersection of the conical field of view with the ellipse itself by assuming a proper value of the half-aperture angle or the minimum elevation angle from which the satellite can be considered visible from the Earth’s surface. The method can be applied for different types of pointing (geocentric, geodetic, and generic) according to the mission requirements. Finally, numerical simulations compare the classical spherical approach with the new ellipsoidal method in the determination of the coverage area, and also show the dependence of the coverage errors on some relevant orbital parameters.
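For orientation, the classical spherical baseline that the abstract compares against can be sketched as below. This is the textbook spherical-cap formula for a nadir-pointing satellite, not the ellipsoidal method of the paper; the function name and the default mean Earth radius of 6371 km are assumptions made for illustration.

```python
import math

def spherical_coverage_area(altitude_km, min_elevation_deg, earth_radius_km=6371.0):
    """Coverage area (km^2) on a spherical Earth for a minimum elevation angle.

    Textbook spherical approximation only; it ignores Earth oblateness.
    """
    eps = math.radians(min_elevation_deg)
    ratio = earth_radius_km / (earth_radius_km + altitude_km)
    # Earth central angle between the sub-satellite point and the coverage edge.
    central_angle = math.acos(ratio * math.cos(eps)) - eps
    # Area of the spherical cap subtended by that central angle.
    return 2.0 * math.pi * earth_radius_km ** 2 * (1.0 - math.cos(central_angle))
```

Raising the minimum elevation angle shrinks the cap, which is the same trade-off the ellipsoidal method has to capture with more geometry.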
Reinforcement learning in spacecraft control applications: Advances, prospects, and challenges
This paper presents and analyzes Reinforcement Learning (RL) based approaches to solve spacecraft control problems. Different application fields are considered, e.g., guidance, navigation and control systems for spacecraft landing on celestial bodies, constellation orbital control, and maneuver planning in orbit transfers. It is discussed how RL solutions can address the emerging needs of designing spacecraft with highly autonomous on-board capabilities and implementing controllers (i.e., RL agents) robust to system uncertainties and adaptive to changing environments. For each application field, the RL framework core elements (e.g., the reward function, the RL algorithm and the environment model used for the RL agent training) are discussed with the aim of providing some guidelines in the formulation of spacecraft control problems via an RL framework. At the same time, the adoption of RL in real space projects is also analyzed. Different open points are identified and discussed, e.g., the availability of high-fidelity simulators for the RL agent training and the verification of RL-based solutions. This way, recommendations for future work are proposed with the aim of reducing the technological gap between the solutions proposed by the academic community and the needs/requirements of the space industry.
A switching control strategy for policy selection in stochastic Dynamic Programming problems
This paper presents a switching control strategy as a criterion for policy selection in stochastic Dynamic Programming problems over an infinite time horizon. In particular, the Bellman operator, applied iteratively to solve such problems, is generalized to the case of stochastic policies, and formulated as a discrete-time switched affine system. Then, a Lyapunov-based policy selection strategy is designed to ensure the practical convergence of the resulting closed-loop system trajectories towards an appropriately chosen reference value function. This way, it is possible to verify how the chosen reference value function can be approached by using a stabilizing switching signal, the latter defined on a given finite set of stationary stochastic policies. Finally, the presented method is applied to the Value Iteration algorithm, and an illustrative example of a recycling robot is provided to demonstrate its effectiveness in terms of convergence.
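To make the "switched affine system" view concrete: for a fixed policy with transition matrix P and reward vector r, one Bellman step V -> r + gamma * P V is affine in V, and choosing among several policies at each step is a switching signal. The sketch below illustrates that structure only. The greedy min-distance switching rule is a stand-in assumption, not the Lyapunov-based rule of the paper, and all names are hypothetical.

```python
import numpy as np

def bellman_affine_step(V, P, r, gamma):
    """Policy-specific Bellman operator: an affine map of the value vector V."""
    return r + gamma * P @ V

def switched_iteration(V0, modes, V_ref, gamma, steps):
    """Iterate the switched affine system defined by `modes`.

    modes: list of (P, r) pairs, one per stationary policy.
    Switching rule (an assumption for illustration): at each step pick the
    mode whose one-step update lands closest to the reference V_ref.
    """
    V = np.asarray(V0, dtype=float)
    chosen = []
    for _ in range(steps):
        candidates = [bellman_affine_step(V, P, r, gamma) for P, r in modes]
        distances = [np.linalg.norm(c - V_ref) for c in candidates]
        idx = int(np.argmin(distances))
        V = candidates[idx]
        chosen.append(idx)
    return V, chosen
```

With a single mode the iteration reduces to ordinary policy evaluation and converges to that policy's value function, since each mode is a contraction for gamma < 1.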
Design of Exponentially Stabilizers for Distributed Control Systems Subject to Cyber Disconnections
Off-Policy Temporal Difference Learning for Perturbed Markov Decision Processes
Dynamic Programming suffers from the curse of dimensionality due to large state and action spaces, a challenge further compounded by uncertainties in the environment. To mitigate these issues, we explore an off-policy based Temporal Difference Approximate Dynamic Programming approach that preserves contraction mapping when projecting the problem into a subspace of selected features, accounting for the probability distribution of the perturbed transition probability matrix. We further demonstrate how this Approximate Dynamic Programming approach can be implemented as a particular variant of the Temporal Difference learning algorithm, adapted for handling perturbations. To validate our theoretical findings, we provide a numerical example using a Markov Decision Process corresponding to a resource allocation problem.
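As background, a plain off-policy TD(0) update with linear features and a per-step importance-sampling ratio looks like the sketch below. It is the standard textbook form, shown only to fix notation; it does not include the perturbation handling or the contraction-preserving projection described in the abstract, and the function and argument names are assumptions.

```python
import numpy as np

def off_policy_td0_update(theta, phi_s, phi_next, reward, rho, alpha, gamma):
    """One importance-weighted TD(0) step for a linear value estimate.

    theta:    weight vector, so that V(s) is approximated by phi(s) @ theta
    phi_s:    feature vector of the current state
    phi_next: feature vector of the next state
    rho:      importance ratio pi(a|s) / b(a|s) between target and behaviour policy
    """
    theta = np.asarray(theta, dtype=float)
    phi_s = np.asarray(phi_s, dtype=float)
    phi_next = np.asarray(phi_next, dtype=float)
    td_error = reward + gamma * phi_next @ theta - phi_s @ theta
    return theta + alpha * rho * td_error * phi_s
```

A ratio of zero means the behaviour policy took an action the target policy never would, so the sample leaves the weights unchanged.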
Price Management in Resource Allocation Problem with Approximate Dynamic Programming
The problem of managing the price for resource allocation arises in several applications, such as purchasing plane tickets, reserving a parking slot, booking a hotel room or renting SW/HW resources on a cloud. In this paper, we model a price management resource allocation problem with parallel Birth-Death stochastic Processes (BDPs) to account for the fact that the same resource can possibly be purchased by customers at different prices. In addition, customers can hold the resource at the purchase price to the necessary extent. The maximization of the revenue in both the finite and infinite time horizon cases is addressed in this paper with Stochastic Dynamic Programming (DP) approaches. To overcome the difficulty in solving the corresponding optimization problem due to the state space explosion, Approximate Dynamic Programming (ADP) techniques (in particular, the Least-Squares Temporal Difference method along with Monte Carlo simulations) are adopted. Furthermore, a MATLAB Toolbox is developed with the aim of solving stochastic DP/ADP problems and supporting probabilistic analysis. Extensive simulations are performed to show the effectiveness of the proposed model and the optimization approach.
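The Least-Squares Temporal Difference step mentioned above can be sketched in its basic LSTD(0) form, fitted on transitions sampled by Monte Carlo simulation. This is a generic illustration written in Python rather than the MATLAB Toolbox of the paper; the function name, the transition format, and the small ridge term added for numerical safety are assumptions.

```python
import numpy as np

def lstd(transitions, gamma, n_features, ridge=1e-8):
    """Basic LSTD(0): solve A theta = b from sampled transitions.

    transitions: iterable of (phi, reward, phi_next) with phi, phi_next
                 feature vectors of the current and next state.
    ridge:       small regulariser (an assumption) that keeps A invertible.
    """
    A = ridge * np.eye(n_features)
    b = np.zeros(n_features)
    for phi, reward, phi_next in transitions:
        phi = np.asarray(phi, dtype=float)
        phi_next = np.asarray(phi_next, dtype=float)
        A += np.outer(phi, phi - gamma * phi_next)
        b += reward * phi
    return np.linalg.solve(A, b)
```

Because the weights come from one linear solve, there is no step size to tune, which is part of the appeal when the state space is too large for exact DP.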
FDIR development approaches in space systems
This chapter presents technical solutions and industrial processes used by the Space Industry to design, develop, test, and operate health (or failure) management systems, which are needed to devise and implement space missions with the required levels of dependability and safety. The overall chapter is inspired by Failure (or Fault) Detection, Isolation and Recovery (FDIR) systems designed for European Space Agency missions; however, the presentation is maintained at a proper level of detail so that its contents are in line with the FDIR practices adopted by other space agencies.
Gain-Scheduled Control of LPV Systems with Structural Constraints
In large-scale dynamic systems, consisting of the interconnection of several sub-systems, the control law must obey a certain distributed structure defined by the information exchange pattern on a communication network. In case the system exhibits parameters which vary over time, the Linear Parameter Varying (LPV) paradigm provides a control-oriented framework for the design of robust or gain-scheduled control laws. Since each subsystem has access only to partial knowledge of the overall set of parameters, due to the constraint imposed by the communication network, the controller gain must satisfy some structural constraints which increase the complexity of the design problem. This paper proposes a novel approach based on Linear Matrix Inequalities (LMIs) to deal with the distributed control of large-scale systems described by LPV dynamics. LMI conditions are established for the design of a gain-scheduled controller ensuring exponential stability of the overall system with a prescribed decay rate, while satisfying the structural constraints imposed by the communication network. A simulation case study is presented to show the effectiveness of the proposed approach.
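To illustrate what "exponential stability with a prescribed decay rate" means in LMI terms, the sketch below checks the standard condition for the much simpler continuous-time, time-invariant case x' = A x: a matrix P > 0 with A^T P + P A + 2*alpha*P < 0 certifies decay rate alpha. The abstract does not state the paper's exact conditions, so the continuous-time LTI setting, the function name, and the tolerance are assumptions; the LPV and structural constraints of the paper are not represented.

```python
import numpy as np

def certifies_decay_rate(A, P, alpha, tol=1e-9):
    """Check whether P certifies decay rate alpha for x' = A x.

    Returns True when P is positive definite and
    A^T P + P A + 2*alpha*P is negative definite (simplified LTI case).
    """
    A = np.asarray(A, dtype=float)
    P = np.asarray(P, dtype=float)
    P = 0.5 * (P + P.T)  # symmetrise against round-off
    lyap = A.T @ P + P @ A + 2.0 * alpha * P
    lyap = 0.5 * (lyap + lyap.T)
    p_positive = np.linalg.eigvalsh(P).min() > tol
    lyap_negative = np.linalg.eigvalsh(lyap).max() < -tol
    return bool(p_positive and lyap_negative)
```

In a design problem P and the controller gain are unknowns found by an LMI solver; this function only verifies a candidate certificate.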
A stochastic dynamic programming approach for the machine replacement problem
This paper addresses both the modeling and the resolution of the replacement problem for a population of machines. The main objective is the computation of a minimum cost replacement policy, which, based on the status of each machine, determines whether one or more machines have to be replaced over a given finite time horizon.
The replacement problem of a set of machines can be regarded as a sequential decision-making problem under uncertainty. Thanks to this, we propose a novel formulation for such problems consisting of a composition of discrete-time multi-state Markov Decision Processes (MDPs), one for each specific machine. The underlying optimization problem is formulated as a stochastic Dynamic Programming (DP) problem, and then solved by using the principles of the backward DP algorithm. Moreover, to deal with the curse of dimensionality due to the high-cardinality state space of real-world/industrial applications, a new generalized multi-trajectory Least-Squares Temporal Difference (LSTD) based method is introduced. The resulting algorithm computes an approximate optimal cost function by: (i) running Monte Carlo simulations over different trajectories of a given length; (ii) embedding the policy improvement step within the recursive LSTD iterations; (iii) enforcing an off-policy mechanism to improve the LSTD exploration capabilities. A study on the convergence properties of the proposed approach is also provided. Several numerical examples are given to illustrate its effectiveness in terms of parametric sensitivity, computational burden, and performance of the computed policies compared with some heuristics defined in the literature.
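The backward DP principle mentioned above can be sketched for a single machine over a finite horizon. This is a deliberately reduced illustration, not the paper's multi-machine formulation or its LSTD-based method. The cost model (a per-state operating cost plus a fixed replacement cost, with a replaced machine behaving like a new one in state 0) and all names are assumptions.

```python
import numpy as np

def backward_dp_replacement(P, operating_cost, replace_cost, horizon):
    """Finite-horizon backward DP for a single-machine keep/replace problem.

    P:              (n, n) deterioration transition matrix under "keep"
    operating_cost: length-n cost of running the machine in each state
    replace_cost:   fixed cost of replacing (assumed; new machine is state 0)
    Returns the cost-to-go at time 0 and a (horizon, n) policy table
    with 0 = keep and 1 = replace.
    """
    P = np.asarray(P, dtype=float)
    c = np.asarray(operating_cost, dtype=float)
    n = len(c)
    J = np.zeros(n)  # terminal cost-to-go, assumed zero
    policy = np.zeros((horizon, n), dtype=int)
    for t in range(horizon - 1, -1, -1):
        keep = c + P @ J
        # Replacing makes the machine act as a new one for this period.
        replace = np.full(n, replace_cost + c[0] + P[0] @ J)
        policy[t] = (replace < keep).astype(int)
        J = np.minimum(keep, replace)
    return J, policy
```

The table grows with the number of machine states, and a population of machines multiplies those state spaces together, which is the dimensionality problem that motivates the approximate LSTD-based method.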
