1,720,970 research outputs found

    Applying unweighted least-squares based techniques to stochastic dynamic programming: Theory and application

    No full text
    Big data and the curse of dimensionality are common vocabularies that researchers in different communities have recently been dealing with, e.g. dynamic programming (DP) in automatic control system society. A novel unweighted sampled based least square projection approach is proposed in this study to address the issue of the large state space in the DP optimisation problem. The method, in particular, takes into account both contraction mapping and monotonicity properties of the DP algorithm for value function approximation. Specifically, the batch of samples are gathered by uniform probability distribution at first, and an unweighted LS sub-problem in the subspace is solved. As the case study, a new Markov decision process model associated with a resource allocation problem is considered to illustrate the technique and evaluate its effectiveness. It is noted that the approach can be employed for different applications as well. Moreover, a MATLAB based software is developed to implement and examine different parts of the proposed method. Simulation examples are considered to support the results of the approach via developed software. The idea makes a connection between the recent advances in big data analysis and approximate DP as well

    A Lyapunov-based version of the value iteration algorithm formulated as a discrete-time switched affine system

    No full text
    In this paper, we analyse the convergence properties of the Dynamic Programming Value Iteration algorithm by exploiting the stability theory of discrete-time switched affine systems. More specifically, by formulating the Value Iteration algorithm as a switched affine system, a Lyapunov-based optimal policy selection strategy is designed to guarantee the practical stabilisation of the resulting system towards an invariant set of attraction containing a given target value function. The switching control algorithm, referred to as Lyapunov-based Value Iteration algorithm, can be regarded as a convergence analysis tool and can be adopted to verify if and how such target value function can be approached by choosing from a subset of suitable stationary policies, at each time slot. The usage of the proposed algorithm in practice is also discussed. Finally, two different applications are provided to further illustrate and examine the key-aspects of the approach presented

    Approximate dynamic programming for stochastic resource allocation problems

    No full text
    A stochastic resource allocation model, based on the principles of Markov decision processes (MDPs), is proposed in this paper. In particular, a general-purpose framework is developed, which takes into account resource requests for both instant and future needs. The considered framework can handle two types of reservations (i.e., specified and unspecified time interval reservation requests), and implement an overbooking business strategy to further increase business revenues. The resulting dynamic pricing problems can be regarded as sequential decision-making problems under uncertainty, which is solved by means of stochastic dynamic programming (DP) based algorithms. In this regard, Bellman's backward principle of optimality is exploited in order to provide all the implementation mechanisms for the proposed reservation pricing algorithm. The curse of dimensionality, as the inevitable issue of the DP both for instant resource requests and future resource reservations, occurs. In particular, an approximate dynamic programming (ADP) technique based on linear function approximations is applied to solve such scalability issues. Several examples are provided to show the effectiveness of the proposed approach

    Transmission scheduling for multi-process multi-sensor remote estimation via approximate dynamic programming

    No full text
    In this paper, we consider a remote estimation problem where multiple dynamical systems are observed by smart sensors, which transmit their local estimates to a remote estimator over channels prone to packet losses. Unlike previous works, we allow multiple sensors to transmit simultaneously even though they can cause interference, thanks to the multi-packet reception capability at the remote estimator. In this setting, the remote estimator can decode multiple sensor transmissions (successful packet arrivals) as long as their signal-to-interference-and-noise ratios (SINR) are above a certain threshold. In this setting, we address the problem of optimal sensor transmission scheduling by minimizing a finite horizon discounted expected estimation error covariance cost across all systems at the remote estimator, subject to an average transmission cost. While this problem can be posed as a stochastic control problem, the optimal solution requires solving a Bellman equation for a dynamic programming (DP) problem, the complexity of which scales exponentially with the number of systems being measured and their state dimensions. In this paper, we resort to a novel Least Squares Temporal Difference (LSTD) Approximate Dynamic Programming (ADP) based approach to approximating the value function. More specifically, an off-policy based LSTD approach, named in short Enhanced-Exploration Greedy LSTD (EG-LSTD), is proposed. We discuss the convergence analysis of the EG-LSTD algorithm and its implementation. A Python based program is developed to implement and analyse the different aspects of the proposed method. Simulation examples are presented to support the results of the proposed approach both for the exact DP and ADP cases

    A kernel-based approximate dynamic programming approach: Theory and application

    Full text link
    This article proposes a novel kernel-based Dynamic Programming (DP) approximation method to tackle the typical curse of dimensionality of stochastic DP problems over the finite time horizon. Such a method utilizes kernel functions in combination with Support Vector Machine (SVM) regression to determine an approximate cost function for the entire state space of the underlying Markov Decision Process (MDP), by leveraging cost function computed for selected representative states. Kernel functions are used to define the so-called kernel matrix, while the parameter vector of the given kernel-based cost function approximation is computed by moving backwards in time from the terminal condition and by applying SVM regression. This way, the difficulty of selecting a proper set of features is also tackled. The proposed method is then extended to the infinite time horizon case. To show the effectiveness of the proposed approach, the resulting Recursive Residual Approximate Dynamic Programming (RR-ADP) algorithm is applied to the sensor scheduling design in multi-process remote state estimation problems

    Enhanced Exploration Least-Squares Methods for Optimal Stopping Problems

    No full text
    This letter presents an Approximate Dynamic Programming (ADP) least-squares based approach for solving optimal stopping problems with a large state space. By extending some previous work in the area of optimal stopping problems, it provides a framework for their formulation and resolution. The proposed method uses a combined on/off policy exploration mechanism, where states are generated by means of state transition probability distributions different from the ones dictated by the underlying Markov decision processes. The contraction mapping property of the associated projected Bellman operator is analysed as well as the convergence of the resulting algorithm

    Allocating resources via price management systems: a dynamic programming-based approach

    No full text
    In this paper, a novel model for price management systems in resource allocation problems is proposed. Stochastic customer requests for resource allocations and releases are modelled as constrained parallel Birth–Death Processes (BDP). We address both instant (i.e. the customer requires a resource to be allocated immediately) and advance (i.e. the customer books a resource for future use) reservation requests, the latter with both bounded and unbounded time interval options. Algorithms based on Dynamic Programming (DP) principles are proposed for the calculation of suitable price profiles. At the core of such algorithms, there is the resolution of stochastic optimisation problems. In particular, the maximisation of the expected total revenue is formulated via a constrained Stochastic Dynamic Programming (SDP) approach, which becomes time-variant in case of advance reservation requests. Approximate Dynamic Programming (ADP) techniques are adopted in case of large state spaces. Simulations are performed to show the effectiveness of the proposed models and the related algorithms

    Allocating resources via price management systems: a dynamic programming-based approach

    No full text
    In this paper, a novel model for price management systems in resource allocation problems is proposed. Stochastic customer requests for resource allocations and releases are modelled as constrained parallel Birth–Death Processes (BDP). We address both instant (i.e. the customer requires a resource to be allocated immediately) and advance (i.e. the customer books a resource for future use) reservation requests, the latter with both bounded and unbounded time interval options. Algorithms based on Dynamic Programming (DP) principles are proposed for the calculation of suitable price profiles. At the core of such algorithms, there is the resolution of stochastic optimisation problems. In particular, the maximisation of the expected total revenue is formulated via a constrained Stochastic Dynamic Programming (SDP) approach, which becomes time-variant in case of advance reservation requests. Approximate Dynamic Programming (ADP) techniques are adopted in case of large state spaces. Simulations are performed to show the effectiveness of the proposed models and the related algorithms

    Modelling and solving resource allocation problems via a dynamic programming approach

    No full text
    In this paper, resource allocation problems are formulated via a set of parallel birth–death processes (BDP). This way, we can model the fact that resources can be allocated to customers at different prices, and that customers can hold them as long as they like. More specifically, a discretisation approach is applied to model resource allocation problems as a set of discrete-time BDPs, which are then integrated into one Markov decision process. The stochastic dynamics of the resulting system are also investigated. As a result, revenue management becomes a stochastic decision-making problem, where price managers can propose suitable prices to the allocation requests such that the maximum expected total revenue is obtained at the end of a predefined finite time horizon. Stochastic Dynamic Programming is employed to solve the related optimisation problem with the support of an ad-hoc Matlab-based application. Several simulations are performed to prove the effectiveness of the proposed model and the optimisation approach
    corecore