
    Applying unweighted least-squares based techniques to stochastic dynamic programming: Theory and application

    Big data and the curse of dimensionality are common concerns across research communities, for example in dynamic programming (DP) within the automatic control community. This study proposes a novel unweighted, sample-based least-squares (LS) projection approach to address the large state space in DP optimisation problems. In particular, the method takes into account both the contraction mapping and the monotonicity properties of the DP algorithm for value function approximation. Specifically, a batch of samples is first drawn from a uniform probability distribution, and an unweighted LS sub-problem is then solved in the subspace. As a case study, a new Markov decision process model for a resource allocation problem is used to illustrate the technique and evaluate its effectiveness. The approach can also be employed in other applications. Moreover, MATLAB-based software was developed to implement and examine the different parts of the proposed method, and simulation examples run with this software support the results. The idea also connects recent advances in big data analysis with approximate DP.
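
    The sampling-then-projection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' MATLAB software: the 20-state chain MDP, the quadratic feature map, and every function name below are assumptions made only for illustration. Plain fitted value iteration of this kind is not guaranteed to converge in general, which is the issue the abstract's contraction and monotonicity considerations are meant to address.

```python
import random

N, GAMMA = 20, 0.9  # hypothetical chain MDP: states 0..N-1, discount factor


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def least_squares(Phi, y):
    """Unweighted LS: minimise sum_i (Phi_i . w - y_i)^2 via the normal equations."""
    k = len(Phi[0])
    gram = [[sum(p[i] * p[j] for p in Phi) for j in range(k)] for i in range(k)]
    rhs = [sum(p[i] * t for p, t in zip(Phi, y)) for i in range(k)]
    return solve_linear(gram, rhs)


def features(s):
    """Assumed feature map: quadratic polynomial in the normalised state."""
    x = s / (N - 1)
    return [1.0, x, x * x]


def value(w, s):
    return sum(wi * fi for wi, fi in zip(w, features(s)))


def backup(w, s):
    """One Bellman backup at state s using the current approximation."""
    best = float("-inf")
    for step in (-1, 1):
        nxt = min(max(s + step, 0), N - 1)
        reward = 1.0 if nxt == N - 1 else 0.0
        # The intended move succeeds with probability 0.8; otherwise the state is unchanged.
        q = 0.8 * (reward + GAMMA * value(w, nxt)) + 0.2 * GAMMA * value(w, s)
        best = max(best, q)
    return best


def fitted_value_iteration(n_samples=200, n_iters=50, seed=0):
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0]
    for _ in range(n_iters):
        # Draw the batch of states uniformly, then project the backed-up values.
        states = [rng.randrange(N) for _ in range(n_samples)]
        Phi = [features(s) for s in states]
        targets = [backup(w, s) for s in states]
        w = least_squares(Phi, targets)
    return w
```

    Calling `fitted_value_iteration()` returns three polynomial coefficients; `value(w, s)` then evaluates the approximate value function at any state `s`.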

    Reinforcement learning in spacecraft control applications: Advances, prospects, and challenges

    This paper presents and analyzes Reinforcement Learning (RL) based approaches to solving spacecraft control problems. Different application fields are considered, e.g., guidance, navigation and control systems for spacecraft landing on celestial bodies, constellation orbital control, and maneuver planning in orbit transfers. The paper discusses how RL solutions can address the emerging need to design spacecraft with highly autonomous on-board capabilities and to implement controllers (i.e., RL agents) that are robust to system uncertainties and adaptive to changing environments. For each application field, the core elements of the RL framework (e.g., the reward function, the RL algorithm and the environment model used for training the RL agent) are discussed with the aim of providing guidelines for formulating spacecraft control problems within an RL framework. The adoption of RL in real space projects is also analyzed. Several open points are identified and discussed, e.g., the availability of high-fidelity simulators for training RL agents and the verification of RL-based solutions. On this basis, recommendations for future work are proposed with the aim of reducing the technological gap between the solutions proposed by the academic community and the needs and requirements of the space industry.

    An actor-critic approach for control of residential photovoltaic-battery systems

    The shift towards green energy, along with the falling cost and increasing capacity of lithium-ion batteries, has motivated end-users to adopt energy storage systems integrated with solar technology. Such systems give end-users greater flexibility, thereby enhancing their role as prosumers in a range of grid-management programs. In this regard, we consider a residential household equipped with a battery and photovoltaic panels, collectively known as the photovoltaic-battery (PV-B) system. We learn off-line a deterministic sub-optimal policy for charging and discharging the residential battery using an actor-critic reinforcement learning method. The proposed approach, named polynomial deterministic policy gradient (PDPG), does not require any model of the system and uses polynomials as function approximators, as opposed to conventional neural networks. The usefulness of the proposed approach is tested on real power data (demand and PV generation) from a residential household in Australia. Numerical simulations indicate that the proposed PDPG algorithm outperforms the OFFON control approach in terms of electricity bill savings and the model-based receding horizon control in terms of computation time.

    A Lyapunov-based version of the value iteration algorithm formulated as a discrete-time switched affine system

    In this paper, we analyse the convergence properties of the Dynamic Programming Value Iteration algorithm by exploiting the stability theory of discrete-time switched affine systems. More specifically, by formulating the Value Iteration algorithm as a switched affine system, a Lyapunov-based optimal policy selection strategy is designed to guarantee the practical stabilisation of the resulting system towards an invariant set of attraction containing a given target value function. The switching control algorithm, referred to as the Lyapunov-based Value Iteration algorithm, can be regarded as a convergence analysis tool and can be used to verify whether and how such a target value function can be approached by choosing, at each time slot, from a subset of suitable stationary policies. The practical use of the proposed algorithm is also discussed. Finally, two different applications are provided to further illustrate and examine the key aspects of the approach presented.
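
    The switched affine view mentioned in this abstract can be sketched as follows: for a fixed stationary policy, one value iteration step is the affine map V <- r + gamma * P V, and choosing a policy at each step selects the active mode. The sketch uses the standard greedy selection rule, not the Lyapunov-based rule of the paper, and the two-state MDP, its numbers, and all names are assumptions for illustration.

```python
GAMMA = 0.9
N_STATES = 2
ACTIONS = (0, 1)

# Hypothetical MDP: P[a][s][s2] is a transition probability, R[a][s] a reward.
P = {0: [[0.9, 0.1], [0.4, 0.6]], 1: [[0.2, 0.8], [0.1, 0.9]]}
R = {0: [1.0, 0.0], 1: [0.0, 2.0]}


def q_value(V, s, a):
    """Expected one-step return of action a in state s under value estimate V."""
    return R[a][s] + GAMMA * sum(p * v for p, v in zip(P[a][s], V))


def affine_step(V, mode):
    """Apply the affine map of one mode, i.e. one stationary policy."""
    return [q_value(V, s, mode[s]) for s in range(N_STATES)]


def greedy_mode(V):
    """Standard value iteration switching rule: the greedy policy for V."""
    return tuple(
        max(ACTIONS, key=lambda a: q_value(V, s, a)) for s in range(N_STATES)
    )


def value_iteration(n_iters=200):
    V = [0.0] * N_STATES
    modes = []
    for _ in range(n_iters):
        mode = greedy_mode(V)       # switching signal at this step
        modes.append(mode)
        V = affine_step(V, mode)    # affine update of the active mode
    return V, modes


V_final, mode_history = value_iteration()
```

    Here `mode_history` records the switching signal, so one can inspect which stationary policy was active at each iteration.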

    A system-level engineering approach for preliminary performance analysis and design of global navigation satellite system constellations

    This paper presents a system-level engineering approach for the preliminary coverage performance analysis and design of a generic Global Navigation Satellite System (GNSS) constellation. The analysis accounts for both the coverage requirements and the robustness of the constellation to transient or catastrophic failures. The European GNSS, Galileo, is used as a reference case to demonstrate the effectiveness of the proposed tool. This software suite, named GNSS Coverage Analysis Tool (G-CAT), takes as input the state vector of each satellite in the constellation and returns the performance of the GNSS constellation in terms of coverage. The tool offers an orbit propagator, an attitude propagator, an algorithm to identify the visibility region on the Earth’s surface for each satellite, and a counter function to compute how many satellites are in view from given locations on the Earth’s surface. Thanks to its low computational burden, the tool can be used to compute the optimal number of satellites per orbital plane by verifying whether the coverage and accuracy requirements are fulfilled under the assumption of uniform in-plane angular spacing between coplanar satellites.
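
    The counter function mentioned in this abstract can be illustrated with a minimal sketch. It is not G-CAT's implementation: it assumes a spherical Earth, positions already expressed in an Earth-fixed frame in kilometres, a 10-degree elevation mask, and hypothetical function names. The nominal Galileo altitude of 23,222 km is used only to place the example satellites.

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical-Earth assumption


def elevation_deg(observer, satellite):
    """Elevation of the satellite above the observer's local horizon, in degrees."""
    los = [s - o for s, o in zip(satellite, observer)]
    los_norm = math.sqrt(sum(c * c for c in los))
    obs_norm = math.sqrt(sum(c * c for c in observer))
    # On a sphere, the local "up" direction is the unit position vector.
    sin_el = sum(l * o for l, o in zip(los, observer)) / (los_norm * obs_norm)
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_el))))


def satellites_in_view(observer, satellites, mask_deg=10.0):
    """Count satellites whose elevation is at or above the mask angle."""
    return sum(1 for sat in satellites if elevation_deg(observer, sat) >= mask_deg)


observer = (EARTH_RADIUS_KM, 0.0, 0.0)
orbit_radius = EARTH_RADIUS_KM + 23222.0
satellites = [
    (orbit_radius, 0.0, 0.0),          # directly overhead
    (-orbit_radius, 0.0, 0.0),         # opposite side of the Earth
    (EARTH_RADIUS_KM, 30000.0, 0.0),   # hypothetical point on the horizon plane
]
in_view = satellites_in_view(observer, satellites)
```

    Evaluating `satellites_in_view` over a grid of ground locations, with satellite positions supplied by an orbit propagator, would give a coverage map of the kind the abstract describes.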

    Analysis of artificial neural network performance based on influencing factors for temperature forecasting applications

    Artificial neural network (ANN)-based methods are among the fastest-growing research fields within the artificial intelligence ecosystem, and many novel contributions have been developed in recent years. They are applied in many contexts, although some 'influencing factors' such as the number of neurons, the number of hidden layers, and the learning rate can affect the performance of the resulting ANN-based applications. This paper provides an in-depth analysis of ANN performance as a function of such factors for real-world temperature forecasting applications. An improved back propagation algorithm for such applications is also presented. Using the results of this paper, researchers and practitioners can analyse the issues encountered when applying ANN-based models to their own specific applications, with the aim of achieving better performance indexes.

    A data-driven approximate dynamic programming approach based on association rule learning: Spacecraft autonomy as a case study

    Dynamic programming (DP) and Markov Decision Processes (MDPs) offer powerful tools for formulating, modeling, and solving decision-making problems under uncertainty. In real-world applications, the applicability of DP is limited by severe scalability issues. These issues can be addressed by Approximate Dynamic Programming (ADP) techniques. ADP methods assume either a proper estimate of the underlying state transition probability distributions or a simulation mechanism capable of generating samples according to such distributions. In this paper, we present a data-driven ADP-based approach, which offers an alternative when such an assumption cannot be guaranteed. In particular, by varying the set-up of the MDP state transition probability matrix, different policies can be calculated through exact DP or ADP methods. These policies are then processed by an Apriori-based algorithm to find frequent association rules within them. A pruning procedure is used to select the most suitable association rules, and finally an Association Classifier infers the optimal policy in all possible circumstances. We show a detailed application of the proposed approach to the calculation of a proper mission operations plan for spacecraft with a high level of on-board autonomy. (C) 2019 Elsevier Inc. All rights reserved.
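
    The rule-mining step in this abstract can be illustrated with a generic Apriori sketch. It does not reproduce the paper's pruning procedure or its Association Classifier. The policy rows, the item names such as `battery=low`, and the thresholds are assumptions for illustration; each row stands for one state-action pair taken from a computed policy.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    n = len(transactions)
    items = {i for t in transactions for i in t}
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) / n for c in current}
        level = {c: s for c, s in counts.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent itemsets that contain exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
    return frequent


def action_rules(frequent, min_confidence):
    """Rules 'conditions -> action' whose consequent is a single action item."""
    rules = []
    for itemset, support in frequent.items():
        for item in itemset:
            if item.startswith("action=") and len(itemset) > 1:
                antecedent = itemset - {item}
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, item, confidence))
    return rules


# Hypothetical policy rows: state features plus the action the policy selects.
policy_rows = [
    frozenset({"battery=low", "sun=yes", "action=charge"}),
    frozenset({"battery=low", "sun=yes", "action=charge"}),
    frozenset({"battery=low", "sun=no", "action=idle"}),
    frozenset({"battery=high", "sun=yes", "action=observe"}),
    frozenset({"battery=high", "sun=no", "action=observe"}),
    frozenset({"battery=high", "sun=yes", "action=observe"}),
]
frequent_sets = apriori(policy_rows, min_support=0.3)
rules = action_rules(frequent_sets, min_confidence=0.9)
```

    Each entry of `rules` is a tuple of conditions, action, and confidence, which a downstream classifier could use to pick an action for a given state.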