# Dynamic programming

Dynamic programming is both a mathematical optimization method and a computer programming method. The method was developed by Richard Bellman in the 1950s and has found applications in numerous fields, from aerospace engineering to economics. In both contexts it refers to simplifying a complicated problem by breaking it down into simpler sub-problems in a recursive manner. While some decision problems cannot be taken apart this way, decisions that span several points in time do often break apart recursively. Likewise, in computer science, if a problem can be solved optimally by breaking it into sub-problems and then recursively finding the optimal solutions to the sub-problems, then it is said to have optimal substructure.

If sub-problems can be nested recursively inside larger problems, so that dynamic programming methods are applicable, then there is a relation between the value of the larger problem and the values of the sub-problems.[1] In the optimization literature this relationship is called the Bellman equation.

## Overview

### Mathematical optimization

In terms of mathematical optimization, dynamic programming usually refers to simplifying a decision by breaking it down into a sequence of decision steps over time. This is done by defining a sequence of value functions V1, V2, ..., Vn taking y as an argument representing the state of the system at times i from 1 to n. The definition of Vn(y) is the value obtained in state y at the last time n. The values Vi at earlier times i = n − 1, n − 2, ..., 2, 1 can be found by working backwards, using a recursive relationship called the Bellman equation. For i = 2, ..., n, Vi−1 at any state y is calculated from Vi by maximizing a simple function (usually the sum) of the gain from a decision at time i − 1 and the function Vi at the new state of the system if this decision is made. Since Vi has already been calculated for the needed states, the above operation yields Vi−1 for those states. Finally, V1 at the initial state of the system is the value of the optimal solution. The optimal values of the decision variables can be recovered, one by one, by tracking back the calculations already performed.
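The backward recursion described above can be sketched for a tiny finite-horizon problem. Everything concrete below — the state space, the two actions, the gain function, the transition rule, and the terminal values — is an illustrative assumption; only the shape of the recursion Vi−1(y) = max over decisions of [gain + Vi(new state)] follows the text.

```python
# Backward induction for a small finite-horizon problem (illustrative sketch).
# States, actions, gain, transition, and terminal values are all assumed.

states = [0, 1, 2, 3]
actions = [0, 1]
n = 4  # decision times 1..n

def gain(y, u):
    # hypothetical per-step gain from taking decision u in state y
    return y * u

def step(y, u):
    # hypothetical deterministic transition, clipped to the state space
    return min(y + u, 3)

# V[i][y]: value of being in state y at time i; terminal values V_n(y) = y (assumed)
V = {n: {y: float(y) for y in states}}
policy = {}
for i in range(n, 1, -1):
    V[i - 1] = {}
    policy[i - 1] = {}
    for y in states:
        best_u = max(actions, key=lambda u: gain(y, u) + V[i][step(y, u)])
        V[i - 1][y] = gain(y, best_u) + V[i][step(y, best_u)]
        policy[i - 1][y] = best_u

# V[1] holds the value of the optimal solution from each possible initial state,
# and the optimal decisions can be read off `policy` by tracking forward.
print(V[1])
```

Note that each Vi−1 is computed only from the already-known Vi, exactly as in the text; the stored `policy` table is what makes the "tracking back" of optimal decisions possible afterwards.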

### Control theory

In control theory, a typical problem is to find an admissible control ${\displaystyle \mathbf {u} ^{\ast }}$ which causes the system ${\displaystyle {\dot {\mathbf {x} }}(t)=\mathbf {g} \left(\mathbf {x} (t),\mathbf {u} (t),t\right)}$ to follow an admissible trajectory ${\displaystyle \mathbf {x} ^{\ast }}$ on a continuous time interval ${\displaystyle t_{0}\leq t\leq t_{1}}$ that minimizes a cost function

${\displaystyle J=b\left(\mathbf {x} (t_{1}),t_{1}\right)+\int _{t_{0}}^{t_{1}}f\left(\mathbf {x} (t),\mathbf {u} (t),t\right)\mathrm {d} t}$

The solution to this problem is an optimal control law or policy ${\displaystyle \mathbf {u} ^{\ast }=h(\mathbf {x} (t),t)}$, which produces an optimal trajectory ${\displaystyle \mathbf {x} ^{\ast }}$ and an optimized cost function ${\displaystyle J^{\ast }}$. The latter obeys the fundamental equation of dynamic programming:

${\displaystyle -J_{t}^{\ast }=\min _{\mathbf {u} }\left\{f\left(\mathbf {x} (t),\mathbf {u} (t),t\right)+J_{x}^{\ast {\mathsf {T}}}\mathbf {g} \left(\mathbf {x} (t),\mathbf {u} (t),t\right)\right\}}$

a partial differential equation known as the Hamilton–Jacobi–Bellman equation, in which ${\displaystyle J_{x}^{\ast }={\frac {\partial J^{\ast }}{\partial \mathbf {x} }}=\left[{\frac {\partial J^{\ast }}{\partial x_{1}}}~~~~{\frac {\partial J^{\ast }}{\partial x_{2}}}~~~~\dots ~~~~{\frac {\partial J^{\ast }}{\partial x_{n}}}\right]^{\mathsf {T}}}$ and ${\displaystyle J_{t}^{\ast }={\frac {\partial J^{\ast }}{\partial t}}}$. One finds the minimizing ${\displaystyle \mathbf {u} }$ in terms of ${\displaystyle t}$, ${\displaystyle \mathbf {x} }$, and the unknown function ${\displaystyle J_{x}^{\ast }}$ and then substitutes the result into the Hamilton–Jacobi–Bellman equation to get the partial differential equation to be solved with boundary condition ${\displaystyle J\left(t_{1}\right)=b\left(\mathbf {x} (t_{1}),t_{1}\right)}$.[2] In practice, this generally requires numerical techniques for some discrete approximation to the exact optimization relationship.

Alternatively, the continuous process can be approximated by a discrete system, which leads to the following recurrence relation, analogous to the Hamilton–Jacobi–Bellman equation:

${\displaystyle J_{k}^{\ast }\left(\mathbf {x} _{n-k}\right)=\min _{\mathbf {u} _{n-k}}\left\{{\hat {f}}\left(\mathbf {x} _{n-k},\mathbf {u} _{n-k}\right)+J_{k-1}^{\ast }\left({\hat {g}}\left(\mathbf {x} _{n-k},\mathbf {u} _{n-k}\right)\right)\right\}}$

at the ${\displaystyle k}$-th stage of ${\displaystyle n}$ equally spaced discrete time intervals, and where ${\displaystyle {\hat {f}}}$ and ${\displaystyle {\hat {g}}}$ denote discrete approximations to ${\displaystyle f}$ and ${\displaystyle \mathbf {g} }$. This functional equation is known as the Bellman equation, which can be solved for an exact solution of the discrete approximation of the optimization equation.[3]
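This discrete Bellman recurrence can be solved numerically by iterating backwards over a state grid. The sketch below assumes a quadratic stage cost for ${\displaystyle {\hat {f}}}$, linear dynamics for ${\displaystyle {\hat {g}}}$, zero terminal cost, and small state/control grids — all illustrative choices, not from the text.

```python
# Solve J*_k(x) = min_u [ f̂(x, u) + J*_{k-1}(ĝ(x, u)) ] on a state grid.
# The stage cost f̂, dynamics ĝ, grids, and zero terminal cost are assumptions.

xs = [i * 0.5 for i in range(-4, 5)]   # state grid: -2.0, -1.5, ..., 2.0
us = [i * 0.5 for i in range(-2, 3)]   # control grid: -1.0, -0.5, ..., 1.0
n = 5                                  # number of stages

def f_hat(x, u):
    return x * x + u * u               # assumed quadratic stage cost

def g_hat(x, u):
    return x + u                       # assumed linear discrete dynamics

def nearest(x):
    # snap a successor state back onto the grid (simple discretization)
    return min(xs, key=lambda s: abs(s - x))

J = {x: 0.0 for x in xs}               # J*_0 = 0: no terminal cost (assumed)
for k in range(1, n + 1):
    # the comprehension reads the old J (stage k-1) while building stage k
    J = {x: min(f_hat(x, u) + J[nearest(g_hat(x, u))] for u in us) for x in xs}

print(J)  # optimal cost-to-go from each initial grid state
```

With this cost and these dynamics, the optimal policy drives the state toward the origin and holds it there, so the cost-to-go is zero at ${\displaystyle x=0}$ and symmetric in ${\displaystyle x}$.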

#### Example from economics: Ramsey's problem of optimal saving

In economics, the objective is generally to maximize (rather than minimize) some dynamic social welfare function. In Ramsey's problem, this function relates amounts of consumption to levels of utility. Loosely speaking, the planner faces the trade-off between contemporaneous consumption and future consumption (via investment in capital stock that is used in production), known as intertemporal choice. Future consumption is discounted at a constant rate ${\displaystyle \beta \in (0,1)}$. A discrete approximation to the transition equation of capital is given by

${\displaystyle k_{t+1}={\hat {g}}\left(k_{t},c_{t}\right)=f(k_{t})-c_{t}}$

where ${\displaystyle c}$ is consumption, ${\displaystyle k}$ is capital, and ${\displaystyle f}$ is a production function satisfying the Inada conditions. An initial capital stock ${\displaystyle k_{0}>0}$ is assumed.

Let ${\displaystyle c_{t}}$ be consumption in period t, and assume consumption yields utility ${\displaystyle u(c_{t})=\ln(c_{t})}$ as long as the consumer lives. Assume the consumer is impatient, so that he discounts future utility by a factor b each period, where ${\displaystyle 0<b<1}$. Let ${\displaystyle k_{t}}$ be capital in period t. Assume initial capital is a given amount ${\displaystyle k_{0}>0}$, and suppose that this period's capital and consumption determine next period's capital as ${\displaystyle k_{t+1}=Ak_{t}^{a}-c_{t}}$, where A is a positive constant and ${\displaystyle 0<a<1}$. Assume capital cannot be negative. Then the consumer's decision problem can be written as follows:

${\displaystyle \max \sum _{t=0}^{T}b^{t}\ln(c_{t})}$ subject to ${\displaystyle k_{t+1}=Ak_{t}^{a}-c_{t}\geq 0}$ for all ${\displaystyle t=0,1,2,\ldots ,T}$

Written this way, the problem looks complicated, because it involves solving for all the choice variables ${\displaystyle c_{0},c_{1},c_{2},\ldots ,c_{T}}$. (The capital ${\displaystyle k_{0}}$ is not a choice variable—the consumer's initial capital is taken as given.)

The dynamic programming approach to solving this problem involves breaking it apart into a sequence of smaller decisions. To do so, we define a sequence of value functions ${\displaystyle V_{t}(k)}$, for ${\displaystyle t=0,1,2,\ldots ,T,T+1}$, which represent the value of having any amount of capital k at each time t. There is (by assumption) no utility from having capital after death, so ${\displaystyle V_{T+1}(k)=0}$.

The value of any quantity of capital at any previous time can be calculated by backward induction using the Bellman equation. In this problem, for each ${\displaystyle t=0,1,2,\ldots ,T}$, the Bellman equation is

${\displaystyle V_{t}(k_{t})\,=\,\max \left(\ln(c_{t})+bV_{t+1}(k_{t+1})\right)}$ subject to ${\displaystyle k_{t+1}=Ak_{t}^{a}-c_{t}\geq 0}$

This problem is much simpler than the one we wrote down before, because it involves only two decision variables, ${\displaystyle c_{t}}$ and ${\displaystyle k_{t+1}}$. Intuitively, instead of choosing his whole lifetime plan at birth, the consumer can take things one step at a time. At time t, his current capital ${\displaystyle k_{t}}$ is given, and he only needs to choose current consumption ${\displaystyle c_{t}}$ and saving ${\displaystyle k_{t+1}}$.

To actually solve this problem, we work backwards. For simplicity, the current level of capital is denoted as k. ${\displaystyle V_{T+1}(k)}$ is already known, so using the Bellman equation once we can calculate ${\displaystyle V_{T}(k)}$, and so on until we get to ${\displaystyle V_{0}(k)}$, which is the value of the initial decision problem for the whole lifetime. In other words, once we know ${\displaystyle V_{T-j+1}(k)}$, we can calculate ${\displaystyle V_{T-j}(k)}$, which is the maximum of ${\displaystyle \ln(c_{T-j})+bV_{T-j+1}(Ak^{a}-c_{T-j})}$, where ${\displaystyle c_{T-j}}$ is the choice variable and ${\displaystyle Ak^{a}-c_{T-j}\geq 0}$.
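This backward pass can also be carried out numerically on a capital grid, which is useful when no closed form is available. The parameter values for A, a, b, and T, the capital grid, and the consumption grid below are illustrative assumptions.

```python
# Numerical backward induction for the Bellman equation
#   V_t(k) = max_c [ ln(c) + b * V_{t+1}(A k^a - c) ],  A k^a - c >= 0.
# Parameter values and the grids are illustrative assumptions.
import math

A, a, b, T = 1.0, 0.5, 0.9, 10
ks = [0.05 * i for i in range(1, 101)]   # capital grid, k > 0

def nearest_value(V, k):
    # evaluate the gridded value function at the grid point closest to k
    return V[min(range(len(ks)), key=lambda i: abs(ks[i] - k))]

V = [0.0] * len(ks)                      # V_{T+1}(k) = 0: no value after death
for t in range(T, -1, -1):
    newV = []
    for k in ks:
        y = A * k ** a                   # output available this period
        # candidate consumption levels: fractions of output in (0, y]
        best = max(math.log(c) + b * nearest_value(V, y - c)
                   for c in (y * 0.1 * j for j in range(1, 11)))
        newV.append(best)
    V = newV                             # V now holds V_t on the grid

print(nearest_value(V, 1.0))             # approximate V_0 at k_0 = 1
```

Each sweep of the loop uses only the previously computed ${\displaystyle V_{t+1}}$, mirroring the text: once ${\displaystyle V_{T-j+1}}$ is known, ${\displaystyle V_{T-j}}$ follows by a one-dimensional maximization at each grid point.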

Working backwards, it can be shown that the value function at time ${\displaystyle t=T-j}$ is

${\displaystyle V_{T-j}(k)\,=\,a\sum _{i=0}^{j}a^{i}b^{i}\ln k+v_{T-j}}$

where each ${\displaystyle v_{T-j}}$ is a constant, and the optimal amount to consume at time ${\displaystyle t=T-j}$ is

${\displaystyle c_{T-j}(k)\,=\,{\frac {1}{\sum _{i=0}^{j}a^{i}b^{i}}}Ak^{a}}$

which, using the geometric series ${\displaystyle \sum _{i=0}^{j}a^{i}b^{i}={\frac {1-(ab)^{j+1}}{1-ab}}}$, can be simplified to

${\displaystyle c_{T-j}(k)\,=\,{\frac {1-ab}{1-(ab)^{j+1}}}Ak^{a}}$
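The consumption rule ${\displaystyle c_{T-j}(k)=Ak^{a}/\sum _{i=0}^{j}a^{i}b^{i}}$ derived above can be checked numerically against a brute-force grid search over consumption; the parameter values below are illustrative assumptions.

```python
# Check the closed-form rule c_{T-j}(k) = A k^a / sum_{i=0}^{j} (ab)^i
# against brute-force maximization of ln(c) + b V_{T-j+1}(A k^a - c).
# Parameter values are illustrative assumptions.
import math

A, a, b = 1.0, 0.5, 0.9

def V(j, k):
    # value function by direct recursion; j < 0 encodes V_{T+1} = 0
    if j < 0:
        return 0.0
    y = A * k ** a
    return max(math.log(c) + b * V(j - 1, y - c)
               for c in (y * m / 200 for m in range(1, 200)))

def c_closed(j, k):
    # closed-form optimal consumption j periods before the end
    return A * k ** a / sum((a * b) ** i for i in range(j + 1))

def c_brute(j, k):
    # grid search over consumption, using the recursively computed V
    y = A * k ** a
    return max((y * m / 200 for m in range(1, 200)),
               key=lambda c: math.log(c) + b * V(j - 1, y - c))

print(c_closed(1, 1.0), c_brute(1, 1.0))
```

At the last period (j = 0) the rule reduces to ${\displaystyle c_{T}=Ak^{a}}$ — the consumer eats all output, consistent with ${\displaystyle V_{T+1}(k)=0}$ — and for j > 0 the brute-force optimum agrees with the closed form up to the grid resolution.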