This chapter reviews the development of adaptive dynamic programming (ADP). ADP is a form of passive reinforcement learning that can be used in fully observable environments. This chapter proposes a framework of robust adaptive dynamic programming (robust-ADP for short), which is aimed at computing globally asymptotically stabilizing control laws with robustness to dynamic uncertainties, via off-line/on-line learning. Reinforcement learning has shown great success in helping learning agents accomplish tasks autonomously from their environment. This paper presents an attitude control scheme combined with ADP for reentry vehicles with high nonlinearity and disturbances. In this paper, we propose a novel ADP architecture with three networks, an action network, a critic network, and a reference network, to develop internal goal-representation for online learning and optimization. Although seminal research in this area was performed in the artificial intelligence (AI) community, more recently it has attracted the attention of optimization theorists. Unlike traditional methods, ADP and RL methods are suited to applications in medicine and other relevant fields. In this tutorial, I will apply ADP to train an agent to walk from a starting point to a goal over a frozen lake. To do so, the agent must explore parts of the environment. A Markov decision process (MDP) is the mathematical framework that captures such a fully observable, non-deterministic environment, with a Markovian transition model and additive rewards, in which the agent acts.
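As a rough illustration of ADP as passive reinforcement learning in a fully observable environment, the sketch below estimates a transition model from observed trajectories of a fixed policy and then evaluates that policy by repeated Bellman backups. The function name `passive_adp`, the trajectory format, and the two-state example are assumptions made for illustration, not something taken from the works summarized above.

```python
from collections import defaultdict


def passive_adp(episodes, gamma=0.9, sweeps=200):
    """Estimate state values for a fixed policy from observed transitions.

    episodes: list of trajectories, each a list of
    (state, reward, next_state) tuples; next_state is None at termination.
    This trajectory format is a hypothetical convention for the sketch.
    """
    counts = defaultdict(lambda: defaultdict(int))  # counts[s][s2] = times s -> s2 seen
    reward_sum = defaultdict(float)                 # total reward collected on leaving s
    visits = defaultdict(int)                       # number of times s was left

    # Step 1: learn the model (transition frequencies and mean rewards).
    for episode in episodes:
        for state, reward, next_state in episode:
            counts[state][next_state] += 1
            reward_sum[state] += reward
            visits[state] += 1

    # Step 2: evaluate the fixed policy on the learned model.
    values = {s: 0.0 for s in visits}
    for _ in range(sweeps):
        for s in visits:
            # Terminal (None) and never-left states contribute a value of 0.
            expected_next = sum(
                (n / visits[s]) * values.get(s2, 0.0)
                for s2, n in counts[s].items()
            )
            values[s] = reward_sum[s] / visits[s] + gamma * expected_next
    return values


# Two identical trajectories: A -> B -> terminal, with reward 1 on leaving B.
demo_episodes = [[("A", 0.0, "B"), ("B", 1.0, None)]] * 2
demo_values = passive_adp(demo_episodes)
```

In this deterministic example the learned model is exact, so the estimate for `B` is the immediate reward and the estimate for `A` is that reward discounted once.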
Adaptive dynamic programming (ADP) and reinforcement learning (RL) are two related paradigms for solving decision-making problems where a performance index must be optimized over time. As Poggio and Girosi (1990) stated, the problem of learning a mapping between input and output is fundamental. Small base stations (SBs) of fifth-generation (5G) cellular networks are envisioned to have storage devices to locally serve requests for reusable and popular contents by caching them at the edge of the network, close to the end users. Reinforcement learning is based on the common-sense idea that if an action is followed by a satisfactory state of affairs, or by an improvement in the state of affairs (as determined in some clearly defined way), then the tendency to produce that action is strengthened, i.e., reinforced. First, policy iteration (PI) and value iteration (VI) methods are proposed for the case where the model is known. In the last few years, RL, also called adaptive (or approximate) dynamic programming, has emerged as a powerful tool for solving complex sequential decision-making problems in control theory. RL thus provides a framework for learning to behave optimally in unknown environments. This action-based learning can capture notions of optimal behavior occurring in natural systems. This book describes the latest RL and ADP techniques for decision and control in human-engineered systems, covering both single-player decision and control and multi-player games. It is shown that robust optimal control problems can be solved for higher-dimensional, partially linear composite systems by integrating ADP with modern nonlinear control design tools such as backstepping and ISS small-gain methods. ADP is an emerging advanced control technology developed for nonlinear dynamical systems.
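For the known-model case mentioned above, value iteration can be sketched in a few lines. This is a minimal tabular version under assumed data structures (`transitions[s][a]` as a list of `(probability, next_state)` pairs and `rewards[s][a]` as a number); the two-state example and all names are hypothetical.

```python
def q_value(s, a, transitions, rewards, values, gamma):
    """One-step lookahead: immediate reward plus discounted expected value."""
    return rewards[s][a] + gamma * sum(p * values[s2] for p, s2 in transitions[s][a])


def value_iteration(transitions, rewards, gamma=0.9, tol=1e-8):
    """Tabular value iteration for a known model.

    transitions[s][a] is a list of (probability, next_state) pairs;
    a state with no actions is treated as terminal with value 0.
    """
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:
                continue
            best = max(q_value(s, a, transitions, rewards, values, gamma) for a in actions)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:  # stop once a full sweep changes no value by more than tol
            break
    # Read off the greedy policy from the converged values.
    policy = {
        s: max(actions, key=lambda a: q_value(s, a, transitions, rewards, values, gamma))
        for s, actions in transitions.items()
        if actions
    }
    return values, policy


# Hypothetical example: "safe" ends the episode, "risky" pays more and may loop.
example_transitions = {
    "start": {"safe": [(1.0, "goal")], "risky": [(0.5, "goal"), (0.5, "start")]},
    "goal": {},
}
example_rewards = {"start": {"safe": 1.0, "risky": 2.0}, "goal": {}}
example_values, example_policy = value_iteration(example_transitions, example_rewards)
```

Policy iteration follows the same pattern but alternates a full evaluation of the current policy with a greedy improvement step, instead of taking the maximum inside every sweep.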
Reinforcement learning is a simulation-based technique for solving Markov decision problems. The model-based algorithm back-propagation through time and a simulation of the mathematical model of the vessel are implemented to train a deep neural network to control the surge speed and yaw dynamics. This paper introduces a multiobjective reinforcement learning approach that is suitable for large state and action spaces. Living organisms learn by acting on their environment, observing the resulting reward stimulus, and adjusting their actions accordingly to improve the reward. Stochastic dual dynamic programming (SDDP) and its related methods use Benders cuts, but the theoretical work in this area assumes that random variables have only a finite set of outcomes. Let's consider a problem where an agent can be in various states and can choose an action from a set of actions. Deep reinforcement learning is responsible for the two biggest AI wins over human professionals: AlphaGo and OpenAI Five. It then moves on to the basic forms of ADP and then to the iterative forms. We describe mathematical formulations for reinforcement learning and a practical implementation method known as adaptive dynamic programming. Control problems can be divided into two classes: regulation and tracking. The approach is then tested on the task of investing liquid capital in the German stock market. We show that reinforcement learning techniques provide optimal control solutions for linear or nonlinear systems using adaptive control techniques.
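The setting described above, an agent that occupies a state and chooses from a set of actions, is commonly formalized with the Bellman optimality equation. The notation here is an assumption for illustration: $S$ is the state set, $A(s)$ the actions available in state $s$, $P(s' \mid s, a)$ the transition probability, $R(s, a)$ the expected immediate reward, and $\gamma \in [0, 1)$ a discount factor.

```latex
V^{*}(s) = \max_{a \in A(s)} \Big[ R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^{*}(s') \Big],
\qquad
\pi^{*}(s) \in \arg\max_{a \in A(s)} \Big[ R(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V^{*}(s') \Big].
```

Dynamic programming solves this fixed-point equation when $P$ and $R$ are known; RL and ADP methods approximate its solution from observed transitions.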
For control problems, efficient algorithms for globally optimal trajectories have been developed, alongside high-profile developments in deep reinforcement learning and dynamic programming. Converters play a remarkable role in industrial applications, such as electrical drives and renewable energy systems. The Delft Center for Systems and Control of Delft University of Technology in the Netherlands has developed adaptive dynamic programming and reinforcement learning approaches for control problems. A key feature of RL is that it does not require any a priori knowledge about the environment. This article gives an insight into the design of controllers for man-made engineered systems that learn, adapt to the environment, and exhibit optimal behavior. We are interested in applications from engineering, artificial intelligence, economics, medicine, and other relevant fields, including adaptive cruise control and stop-and-go systems. The approach develops a value function that predicts the future intake of rewards over time. The agent optimizes its behavior by interacting with its environment and learning from the feedback received. This website has been created for the purpose of making RL programming accessible in the engineering community, which widely uses MATLAB. Sequential decision problems can be solved using adaptive dynamic programming.
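To make the idea of a value function that predicts future rewards without a priori knowledge of the environment more concrete, here is a minimal sketch of tabular temporal-difference learning, TD(0). TD(0) is swapped in here as a standard illustrative technique and is not claimed to be the method of any work summarized above; the function name, trajectory format, and example are assumptions.

```python
def td0_evaluate(episodes, alpha=0.1, gamma=0.9, passes=2000):
    """Tabular TD(0): learn state values from experience, with no model.

    episodes: list of trajectories, each a list of
    (state, reward, next_state) tuples; next_state is None at termination.
    """
    values = {}
    for _ in range(passes):
        for episode in episodes:
            for state, reward, next_state in episode:
                v = values.get(state, 0.0)
                # A terminal successor has value 0 by convention.
                v_next = 0.0 if next_state is None else values.get(next_state, 0.0)
                # Move the estimate a small step toward the bootstrapped target.
                values[state] = v + alpha * (reward + gamma * v_next - v)
    return values


# Replaying one recorded trajectory: A -> B -> terminal, reward 1 on leaving B.
td_values = td0_evaluate([[("A", 0.0, "B"), ("B", 1.0, None)]])
```

Unlike a model-based approach, the update never estimates transition probabilities; it only compares the current prediction with the reward actually received plus the prediction for the next state.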
Deep reinforcement learning has benefited enormously from advances in neural networks and function approximation. ADP-based control has been developed for reentry vehicles with high nonlinearity and disturbances, and robust control for uncertain nonlinear systems has been developed using ADP. Reinforcement learning techniques address the adaptive optimal control problem. Deep reinforcement learning is also credited with the two biggest AI wins over human professionals.