# Karush–Kuhn–Tucker conditions

In mathematical optimization, the Karush–Kuhn–Tucker (KKT) conditions, also known as the Kuhn–Tucker conditions, are first derivative tests (sometimes called first-order necessary conditions) for a solution in nonlinear programming to be optimal, provided that some regularity conditions are satisfied.

Allowing inequality constraints, the KKT approach to nonlinear programming generalizes the method of Lagrange multipliers, which allows only equality constraints. Similar to the Lagrange approach, the constrained maximization (minimization) problem is rewritten as a Lagrange function whose optimal point is a saddle point, i.e. a global maximum (minimum) over the domain of the choice variables and a global minimum (maximum) over the multipliers, which is why the Karush–Kuhn–Tucker theorem is sometimes referred to as the saddle-point theorem.[1]

The KKT conditions were originally named after Harold W. Kuhn and Albert W. Tucker, who first published the conditions in 1951.[2] Later scholars discovered that the necessary conditions for this problem had been stated by William Karush in his master's thesis in 1939.[3][4]

## Nonlinear optimization problem

Consider the following nonlinear minimization or maximization problem:

Optimize ${\displaystyle f(\mathbf {x} )}$
subject to
${\displaystyle g_{i}(\mathbf {x} )\leq 0,}$
${\displaystyle h_{j}(\mathbf {x} )=0.}$

where ${\displaystyle \mathbf {x} \in \mathbf {X} }$ is the optimization variable chosen from a convex subset of ${\displaystyle \mathbb {R} ^{n}}$, ${\displaystyle f}$ is the objective or utility function, ${\displaystyle g_{i}\ (i=1,\ldots ,m)}$ are the inequality constraint functions and ${\displaystyle h_{j}\ (j=1,\ldots ,\ell )}$ are the equality constraint functions. The numbers of inequalities and equalities are denoted by ${\displaystyle m}$ and ${\displaystyle \ell }$ respectively. Corresponding to the constrained optimization problem one can form the Lagrangian function

${\displaystyle L(\mathbf {x} ,\mathbf {\mu } ,\mathbf {\lambda } )=f(\mathbf {x} )+\mathbf {\mu } ^{\top }\mathbf {g} (\mathbf {x} )+\mathbf {\lambda } ^{\top }\mathbf {h} (\mathbf {x} )}$

where ${\displaystyle \mathbf {g} (\mathbf {x} )=\left(g_{1}(\mathbf {x} ),\ldots ,g_{m}(\mathbf {x} )\right)^{\top }}$, ${\displaystyle \mathbf {h} (\mathbf {x} )=\left(h_{1}(\mathbf {x} ),\ldots ,h_{\ell }(\mathbf {x} )\right)^{\top }}$. The Karush–Kuhn–Tucker theorem then states the following.

Theorem. If ${\displaystyle (\mathbf {x} ^{\ast },\mathbf {\mu } ^{\ast })}$ is a saddle point of ${\displaystyle L(\mathbf {x} ,\mathbf {\mu } )}$ in ${\displaystyle \mathbf {x} \in \mathbf {X} }$, ${\displaystyle \mathbf {\mu } \geq \mathbf {0} }$, then ${\displaystyle \mathbf {x} ^{\ast }}$ is an optimal vector for the above optimization problem in its minimization form. Suppose that ${\displaystyle f(\mathbf {x} )}$ and ${\displaystyle g_{i}(\mathbf {x} )}$, ${\displaystyle i=1,\ldots ,m}$, are convex in ${\displaystyle \mathbf {x} }$ and that there exists ${\displaystyle \mathbf {x} _{0}\in \mathbf {X} }$ such that ${\displaystyle \mathbf {g} (\mathbf {x} _{0})<0}$. Then with an optimal vector ${\displaystyle \mathbf {x} ^{\ast }}$ for the above optimization problem there is associated a non-negative vector ${\displaystyle \mathbf {\mu } ^{\ast }}$ such that ${\displaystyle (\mathbf {x} ^{\ast },\mathbf {\mu } ^{\ast })}$ is a saddle point of ${\displaystyle L(\mathbf {x} ,\mathbf {\mu } )}$.
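
The saddle-point property can be checked numerically on a small example. The sketch below assumes a toy problem that is not taken from the source (minimize x² subject to 1 − x ≤ 0, with solution x* = 1 and multiplier μ* = 2) and tests the two saddle inequalities on coarse grids; the grid ranges and the names `lagrangian`, `xs` and `mus` are illustrative choices.

```python
# Saddle-point check for a toy problem (illustrative, not from the source):
#   minimize f(x) = x^2  subject to  g(x) = 1 - x <= 0.
# Its solution is x* = 1 with multiplier mu* = 2.

def lagrangian(x, mu):
    """L(x, mu) = f(x) + mu * g(x) for the toy problem."""
    return x**2 + mu * (1.0 - x)

x_star, mu_star = 1.0, 2.0
saddle_value = lagrangian(x_star, mu_star)

# Grids of trial points: x in [-3, 5], mu in [0, 10].
xs = [-3.0 + 0.1 * k for k in range(81)]
mus = [0.1 * k for k in range(101)]

# For a minimization problem the saddle inequalities are
#   L(x*, mu) <= L(x*, mu*) <= L(x, mu*).
max_over_mu = max(lagrangian(x_star, mu) for mu in mus)
min_over_x = min(lagrangian(x, mu_star) for x in xs)

tol = 1e-12
is_saddle = max_over_mu <= saddle_value + tol and saddle_value <= min_over_x + tol
print(saddle_value, is_saddle)
```

A grid test of this kind can only refute the saddle property, not prove it; here the inequalities also follow by hand, since L(x, 2) = (x − 1)² + 1.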

Since the idea of this approach is to find a supporting hyperplane on the feasible set ${\displaystyle \mathbf {\Gamma } =\left\{\mathbf {x} \in \mathbf {X} :g_{i}(\mathbf {x} )\leq 0,i=1,\ldots ,m\right\}}$, the proof of the Karush–Kuhn–Tucker theorem makes use of the hyperplane separation theorem.[5]

The system of equations and inequalities corresponding to the KKT conditions is usually not solved directly, except in the few special cases where a closed-form solution can be derived analytically. In general, many optimization algorithms can be interpreted as methods for numerically solving the KKT system of equations and inequalities.[6]
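
One of the special cases in which the KKT system can be solved directly is an equality-constrained quadratic problem, where the conditions are linear. The sketch below assumes an illustrative problem (minimize x₁² + x₂² subject to x₁ + x₂ − 1 = 0) and a small hand-written solver, `solve_linear`, which is a helper introduced here and not part of any library.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

# Unknowns ordered as (x1, x2, lambda).
kkt_matrix = [
    [2.0, 0.0, 1.0],   # stationarity in x1: 2*x1 + lambda = 0
    [0.0, 2.0, 1.0],   # stationarity in x2: 2*x2 + lambda = 0
    [1.0, 1.0, 0.0],   # primal feasibility: x1 + x2 = 1
]
kkt_rhs = [0.0, 0.0, 1.0]

x1, x2, lam = solve_linear(kkt_matrix, kkt_rhs)
print(x1, x2, lam)
```

With inequality constraints the system is no longer linear, because complementary slackness couples multipliers and constraints, which is one reason iterative methods are used in general.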

## Necessary conditions

Suppose that the objective function ${\displaystyle f:\mathbb {R} ^{n}\rightarrow \mathbb {R} }$ and the constraint functions ${\displaystyle g_{i}:\,\!\mathbb {R} ^{n}\rightarrow \mathbb {R} }$ and ${\displaystyle h_{j}:\,\!\mathbb {R} ^{n}\rightarrow \mathbb {R} }$ are continuously differentiable at a point ${\displaystyle x^{*}}$. If ${\displaystyle x^{*}}$ is a local optimum and the optimization problem satisfies some regularity conditions (see below), then there exist constants ${\displaystyle \mu _{i}\ (i=1,\ldots ,m)}$ and ${\displaystyle \lambda _{j}\ (j=1,\ldots ,\ell )}$, called KKT multipliers, such that the following four groups of conditions hold:

Stationarity
For maximizing ${\displaystyle f(x)}$: ${\displaystyle \nabla f(x^{*})-\sum _{i=1}^{m}\mu _{i}\nabla g_{i}(x^{*})-\sum _{j=1}^{\ell }\lambda _{j}\nabla h_{j}(x^{*})=0,}$
For minimizing ${\displaystyle f(x)}$: ${\displaystyle \nabla f(x^{*})+\sum _{i=1}^{m}\mu _{i}\nabla g_{i}(x^{*})+\sum _{j=1}^{\ell }\lambda _{j}\nabla h_{j}(x^{*})=0,}$
Primal feasibility
${\displaystyle g_{i}(x^{*})\leq 0,{\text{ for }}i=1,\ldots ,m}$
${\displaystyle h_{j}(x^{*})=0,{\text{ for }}j=1,\ldots ,\ell \,\!}$
Dual feasibility
${\displaystyle \mu _{i}\geq 0,{\text{ for }}i=1,\ldots ,m}$
Complementary slackness
${\displaystyle \mu _{i}g_{i}(x^{*})=0,{\text{ for }}\;i=1,\ldots ,m.}$
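
The four groups of conditions can be tested mechanically at a candidate point. The sketch below assumes a toy minimization problem that does not appear in the source (minimize x₁² + x₂² subject to g(x) = 1 − x₁ − x₂ ≤ 0); the function name `kkt_check` and the tolerance are illustrative.

```python
def kkt_check(x, mu, tol=1e-9):
    """Return the four KKT tests at (x, mu) for the toy minimization problem."""
    x1, x2 = x
    grad_f = (2.0 * x1, 2.0 * x2)      # gradient of f(x) = x1^2 + x2^2
    g = 1.0 - x1 - x2                  # constraint g(x) = 1 - x1 - x2 <= 0
    grad_g = (-1.0, -1.0)
    # Minimization form of stationarity: grad f + mu * grad g = 0.
    residual = [df + mu * dg for df, dg in zip(grad_f, grad_g)]
    return {
        "stationarity": all(abs(r) <= tol for r in residual),
        "primal_feasibility": g <= tol,
        "dual_feasibility": mu >= -tol,
        "complementary_slackness": abs(mu * g) <= tol,
    }

candidate = kkt_check((0.5, 0.5), 1.0)   # the constrained minimizer
interior = kkt_check((1.0, 1.0), 0.0)    # feasible, but not stationary
print(candidate)
print(interior)
```

At (0.5, 0.5) with μ = 1 all four tests pass; at the interior point (1, 1) the constraint is inactive, so μ must be zero, and stationarity then fails.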

In the particular case ${\displaystyle m=0}$, i.e., when there are no inequality constraints, the KKT conditions turn into the Lagrange conditions, and the KKT multipliers are called Lagrange multipliers.

If some of the functions are non-differentiable, subdifferential versions of Karush–Kuhn–Tucker (KKT) conditions are available.[7]

## Regularity conditions (or constraint qualifications)

In order for a minimum point ${\displaystyle x^{*}}$ to satisfy the above KKT conditions, the problem should satisfy some regularity conditions; some common examples are tabulated here:

| Constraint qualification | Acronym | Statement |
| --- | --- | --- |
| Linearity constraint qualification | LCQ | If ${\displaystyle g_{i}}$ and ${\displaystyle h_{j}}$ are affine functions, then no other condition is needed. |
| Linear independence constraint qualification | LICQ | The gradients of the active inequality constraints and the gradients of the equality constraints are linearly independent at ${\displaystyle x^{*}}$. |
| Mangasarian-Fromovitz constraint qualification | MFCQ | The gradients of the equality constraints are linearly independent at ${\displaystyle x^{*}}$ and there exists a vector ${\displaystyle d\in \mathbb {R} ^{n}}$ such that ${\displaystyle \nabla g_{i}(x^{*})^{\top }d<0}$ for all active inequality constraints and ${\displaystyle \nabla h_{j}(x^{*})^{\top }d=0}$ for all equality constraints.[8] |
| Constant rank constraint qualification | CRCQ | For each subset of the gradients of the active inequality constraints and the gradients of the equality constraints the rank at a vicinity of ${\displaystyle x^{*}}$ is constant. |
| Constant positive linear dependence constraint qualification | CPLD | For each subset of gradients of active inequality constraints and gradients of equality constraints, if the subset of vectors is linearly dependent at ${\displaystyle x^{*}}$ with non-negative scalars associated with the inequality constraints, then it remains linearly dependent in a neighborhood of ${\displaystyle x^{*}}$. |
| Quasi-normality constraint qualification | QNCQ | If the gradients of the active inequality constraints and the gradients of the equality constraints are linearly dependent at ${\displaystyle x^{*}}$ with associated multipliers ${\displaystyle \lambda _{j}}$ for equalities and ${\displaystyle \mu _{i}\geq 0}$ for inequalities, then there is no sequence ${\displaystyle x_{k}\to x^{*}}$ such that ${\displaystyle \lambda _{j}\neq 0\Rightarrow \lambda _{j}h_{j}(x_{k})>0}$ and ${\displaystyle \mu _{i}\neq 0\Rightarrow \mu _{i}g_{i}(x_{k})>0.}$ |
| Slater's condition | SC | For a convex problem (i.e., assuming minimization, ${\displaystyle f,g_{i}}$ are convex and ${\displaystyle h_{j}}$ is affine), there exists a point ${\displaystyle x}$ such that ${\displaystyle h(x)=0}$ and ${\displaystyle g_{i}(x)<0.}$ |

It can be shown that

LICQ ⇒ MFCQ ⇒ CPLD ⇒ QNCQ

and

LICQ ⇒ CRCQ ⇒ CPLD ⇒ QNCQ

(and the converses are not true), although MFCQ is not equivalent to CRCQ.[9] In practice weaker constraint qualifications are preferred since they hold for a broader class of problems, which makes the resulting optimality theorem stronger.
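
A standard one-variable example, added here for illustration and not drawn from the cited references, shows why some constraint qualification is needed. Consider minimizing ${\displaystyle f(x)=x}$ subject to ${\displaystyle g(x)=x^{2}\leq 0}$. The only feasible point is ${\displaystyle x^{*}=0}$, so it is the global minimum, but the left-hand side of the stationarity condition is

${\displaystyle \nabla f(x^{*})+\mu \nabla g(x^{*})=1+\mu \cdot 2x^{*}=1\neq 0}$

for every multiplier ${\displaystyle \mu \geq 0}$, so no KKT multiplier exists. Here the only active constraint has gradient ${\displaystyle \nabla g(x^{*})=0}$, so LICQ fails, MFCQ fails because no ${\displaystyle d}$ gives ${\displaystyle \nabla g(x^{*})d<0}$, and Slater's condition fails because no point satisfies ${\displaystyle x^{2}<0}$.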

## Sufficient conditions

In general, the necessary conditions are not sufficient for optimality and additional information is required, such as the second-order sufficient conditions (SOSC). For smooth functions, the SOSC involve second derivatives, which explains their name. In some cases, however, the necessary conditions are also sufficient for optimality.

The necessary conditions are sufficient for optimality if the objective function ${\displaystyle f}$ of a maximization problem is a concave function, the inequality constraints ${\displaystyle g_{i}}$ are continuously differentiable convex functions and the equality constraints ${\displaystyle h_{j}}$ are affine functions. Similarly, if the objective function ${\displaystyle f}$ of a minimization problem is a convex function and the constraints satisfy the same assumptions, the necessary conditions are also sufficient for optimality.

It was shown by Martin in 1985 that the broader class of functions for which the KKT conditions guarantee global optimality is that of the so-called Type 1 invex functions.[10][11]

### Second-order sufficient conditions

For smooth, non-linear optimization problems, a second order sufficient condition is given as follows.

The solution ${\displaystyle x^{*},\lambda ^{*},\mu ^{*}}$ found in the above section is a strict constrained local minimum if the Lagrangian

${\displaystyle L(x,\lambda ,\mu )=f(x)+\sum _{i=1}^{m}\mu _{i}g_{i}(x)+\sum _{j=1}^{\ell }\lambda _{j}h_{j}(x)}$

satisfies

${\displaystyle s^{T}\nabla _{xx}^{2}L(x^{*},\lambda ^{*},\mu ^{*})s>0}$

for every vector ${\displaystyle s\neq 0}$ satisfying

${\displaystyle \left[\nabla _{x}g_{i}(x^{*}),\nabla _{x}h_{j}(x^{*})\right]^{T}s=0}$

where only those active inequality constraints ${\displaystyle g_{i}(x)}$ corresponding to strict complementarity (i.e. where ${\displaystyle \mu _{i}>0}$) are applied. If only the weak inequality ${\displaystyle s^{T}\nabla _{xx}^{2}L(x^{*},\lambda ^{*},\mu ^{*})s\geq 0}$ holds, the test is inconclusive and higher-order information is needed.
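
The role of the restriction on ${\displaystyle s}$ can be seen on a small example. The sketch below assumes an illustrative problem that is not from the source (minimize −x₁x₂ subject to x₁ + x₂ − 2 = 0, with KKT point x* = (1, 1) and λ* = 1) and compares the curvature of the Lagrangian along a constraint-tangent direction with its curvature along an unrestricted direction; `quad_form` is a helper introduced here.

```python
# Toy problem (illustrative): minimize f(x) = -x1*x2 subject to h(x) = x1 + x2 - 2 = 0.
# KKT point: x* = (1, 1) with lambda* = 1, since grad f = (-1, -1) and grad h = (1, 1).

def quad_form(h, s):
    """Return s^T h s for a square matrix h given as nested lists."""
    n = len(s)
    return sum(s[i] * h[i][j] * s[j] for i in range(n) for j in range(n))

# Hessian of L = f + lambda*h with respect to x; h is affine, so only f contributes.
hess_l = [[0.0, -1.0],
          [-1.0, 0.0]]

grad_h = (1.0, 1.0)
# Directions with grad_h . s = 0 are the multiples of (1, -1).
s = (1.0, -1.0)
assert abs(sum(gh * si for gh, si in zip(grad_h, s))) < 1e-12  # s is tangent

curvature = quad_form(hess_l, s)
# On the whole space the Hessian is indefinite, e.g. along (1, 1).
unconstrained_curvature = quad_form(hess_l, (1.0, 1.0))
print(curvature, unconstrained_curvature)
```

The Hessian is indefinite on the whole space, yet its quadratic form is positive on the tangent directions, which is all the second-order test requires.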

## Economics

Often in mathematical economics the KKT approach is used in theoretical models in order to obtain qualitative results. For example,[12] consider a firm that maximizes its sales revenue subject to a minimum profit constraint. Letting ${\displaystyle Q}$ be the quantity of output produced (to be chosen), ${\displaystyle R(Q)}$ be sales revenue with a positive first derivative and with a zero value at zero output, ${\displaystyle C(Q)}$ be production costs with a positive first derivative and with a non-negative value at zero output, and ${\displaystyle G_{\min }}$ be the positive minimal acceptable level of profit, then the problem is a meaningful one if the revenue function levels off so it eventually is less steep than the cost function. The problem expressed in the previously given minimization form is

Minimize ${\displaystyle -R(Q)}$
subject to
${\displaystyle G_{\min }\leq R(Q)-C(Q)}$
${\displaystyle Q\geq 0,}$

and the KKT conditions are

${\displaystyle {\begin{aligned}&\left({\frac {{\text{d}}R}{{\text{d}}Q}}\right)(1+\mu )-\mu \left({\frac {{\text{d}}C}{{\text{d}}Q}}\right)\leq 0,\\[5pt]&Q\geq 0,\\[5pt]&Q\left[\left({\frac {{\text{d}}R}{{\text{d}}Q}}\right)(1+\mu )-\mu \left({\frac {{\text{d}}C}{{\text{d}}Q}}\right)\right]=0,\\[5pt]&R(Q)-C(Q)-G_{\min }\geq 0,\\[5pt]&\mu \geq 0,\\[5pt]&\mu [R(Q)-C(Q)-G_{\min }]=0.\end{aligned}}}$

Since ${\displaystyle Q=0}$ would violate the minimum profit constraint, we have ${\displaystyle Q>0}$ and hence the third condition implies that the first condition holds with equality. Solving that equality gives

${\displaystyle {\frac {{\text{d}}R}{{\text{d}}Q}}={\frac {\mu }{1+\mu }}\left({\frac {{\text{d}}C}{{\text{d}}Q}}\right).}$

Because it was given that ${\displaystyle {\text{d}}R/{\text{d}}Q}$ and ${\displaystyle {\text{d}}C/{\text{d}}Q}$ are strictly positive, this equality along with the non-negativity condition on ${\displaystyle \mu }$ guarantees that ${\displaystyle \mu }$ is positive, and so the revenue-maximizing firm operates at a level of output at which marginal revenue ${\displaystyle {\text{d}}R/{\text{d}}Q}$ is less than marginal cost ${\displaystyle {\text{d}}C/{\text{d}}Q}$. This result is of interest because it contrasts with the behavior of a profit-maximizing firm, which operates at a level at which they are equal.
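
A numerical instance makes the result concrete. The functional forms below are hypothetical and chosen only so that the algebra is easy (R(Q) = 10Q − Q², C(Q) = 2Q + 5, G_min = 10.75); they are not taken from the cited text. Revenue is increasing only for Q < 5, which covers the relevant range.

```python
# Hypothetical functional forms, chosen only to make the algebra easy:
#   R(Q) = 10*Q - Q^2   (revenue, increasing for Q < 5)
#   C(Q) = 2*Q + 5      (cost, with C(0) = 5 >= 0)
#   G_min = 10.75       (minimum acceptable profit)
import math

def marginal_revenue(q):
    return 10.0 - 2.0 * q

def marginal_cost(q):
    return 2.0

def profit(q):
    return (10.0 * q - q**2) - (2.0 * q + 5.0)

g_min = 10.75

# Unconstrained revenue is maximal at Q = 5, where profit is only 10 < G_min,
# so the profit constraint binds: 8*Q - Q^2 - 5 = G_min.
# Revenue rises with Q on this range, so the firm takes the larger root of
# Q^2 - 8*Q + (5 + G_min) = 0.
disc = 64.0 - 4.0 * (5.0 + g_min)
q_star = (8.0 + math.sqrt(disc)) / 2.0

# From dR/dQ = mu/(1+mu) * dC/dQ it follows that mu = MR / (MC - MR).
mr, mc = marginal_revenue(q_star), marginal_cost(q_star)
mu = mr / (mc - mr)

print(q_star, mr, mc, mu)
```

In this instance the firm produces Q = 4.5, where marginal revenue (1) is below marginal cost (2) and the multiplier on the profit constraint is μ = 1.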

## Value function

If we reconsider the optimization problem as a maximization problem with constant inequality constraints:

${\displaystyle {\text{Maximize }}\;f(x)}$
${\displaystyle {\text{subject to }}\ }$
${\displaystyle g_{i}(x)\leq a_{i},h_{j}(x)=0.}$

The value function is defined as

${\displaystyle V(a_{1},\ldots ,a_{m})=\sup \limits _{x}f(x)}$
${\displaystyle {\text{subject to }}\ }$
${\displaystyle g_{i}(x)\leq a_{i},h_{j}(x)=0}$
${\displaystyle j\in \{1,\ldots ,\ell \},i\in \{1,\ldots ,m\},}$

so the domain of ${\displaystyle V}$ is ${\displaystyle \{a\in \mathbb {R} ^{m}\mid {\text{for some }}x\in X,g_{i}(x)\leq a_{i},i\in \{1,\ldots ,m\}\}.}$

Given this definition, each coefficient ${\displaystyle \mu _{i}}$ is the rate at which the value function increases as ${\displaystyle a_{i}}$ increases. Thus if each ${\displaystyle a_{i}}$ is interpreted as a resource constraint, the coefficients tell you how much increasing a resource will increase the optimum value of our function ${\displaystyle f}$. This interpretation is especially important in economics and is used, for instance, in utility maximization problems.
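
This sensitivity interpretation can be checked with a finite difference. The sketch below assumes a one-variable toy problem that is not from the source (maximize −(x − 2)² subject to x ≤ a), for which the maximizer is known in closed form; the function name `solve` and the step size are illustrative.

```python
# Toy problem (illustrative): maximize f(x) = -(x - 2)^2 subject to g(x) = x <= a.

def solve(a):
    """Return (x*, mu, V(a)) for the toy problem."""
    x_star = min(a, 2.0)                 # the constraint binds only when a < 2
    # Maximization form of stationarity: f'(x*) - mu * g'(x*) = 0 with g'(x) = 1.
    mu = -2.0 * (x_star - 2.0)
    value = -(x_star - 2.0) ** 2
    return x_star, mu, value

a, step = 1.0, 1e-6
_, mu, _ = solve(a)
# Central finite difference of the value function V at a.
dv_da = (solve(a + step)[2] - solve(a - step)[2]) / (2.0 * step)
print(mu, dv_da)
```

At a = 1 the multiplier and the numerical derivative of V agree (both are 2 up to rounding), and for a ≥ 2 the constraint is slack and the multiplier is zero.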

## Generalizations

With an extra multiplier ${\displaystyle \mu _{0}\geq 0}$, which may be zero (as long as ${\displaystyle (\mu _{0},\mu ,\lambda )\neq 0}$), in front of ${\displaystyle \nabla f(x^{*})}$ the KKT stationarity conditions turn into

${\displaystyle {\begin{aligned}&\mu _{0}\,\nabla f(x^{*})+\sum _{i=1}^{m}\mu _{i}\,\nabla g_{i}(x^{*})+\sum _{j=1}^{\ell }\lambda _{j}\,\nabla h_{j}(x^{*})=0,\\[4pt]&\mu _{i}g_{i}(x^{*})=0,\quad i=1,\dots ,m,\end{aligned}}}$

which are called the Fritz John conditions. These optimality conditions hold without constraint qualifications and are equivalent to the optimality condition "KKT or (not-MFCQ)".
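
A classical two-variable example, added here for illustration, shows a minimizer at which the Fritz John conditions hold only with ${\displaystyle \mu _{0}=0}$. Minimize ${\displaystyle f(x)=-x_{1}}$ subject to ${\displaystyle g_{1}(x)=x_{2}-(1-x_{1})^{3}\leq 0}$ and ${\displaystyle g_{2}(x)=-x_{2}\leq 0}$. Feasibility forces ${\displaystyle x_{1}\leq 1}$, so the minimizer is ${\displaystyle x^{*}=(1,0)}$, where both constraints are active and

${\displaystyle \nabla f(x^{*})=(-1,0)^{\top },\quad \nabla g_{1}(x^{*})=(0,1)^{\top },\quad \nabla g_{2}(x^{*})=(0,-1)^{\top }.}$

The first component of ${\displaystyle \mu _{0}\nabla f(x^{*})+\mu _{1}\nabla g_{1}(x^{*})+\mu _{2}\nabla g_{2}(x^{*})=0}$ reads ${\displaystyle -\mu _{0}=0}$, so the KKT conditions (which correspond to ${\displaystyle \mu _{0}=1}$) cannot hold, while ${\displaystyle (\mu _{0},\mu _{1},\mu _{2})=(0,1,1)}$ satisfies the Fritz John conditions. MFCQ fails at ${\displaystyle x^{*}}$ because no ${\displaystyle d}$ can satisfy both ${\displaystyle d_{2}<0}$ and ${\displaystyle -d_{2}<0}$.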

The KKT conditions belong to a wider class of the first-order necessary conditions (FONC), which allow for non-smooth functions using subderivatives.

## References

1. Tabak, Daniel; Kuo, Benjamin C. (1971). Optimal Control by Mathematical Programming. Englewood Cliffs, NJ: Prentice-Hall. pp. 19–20. ISBN 0-13-638106-5.
2. Kuhn, H. W.; Tucker, A. W. (1951). "Nonlinear programming". Proceedings of 2nd Berkeley Symposium. Berkeley: University of California Press. pp. 481–492. MR 0047303.
3. Karush, W. (1939). "Minima of Functions of Several Variables with Inequalities as Side Constraints". M.Sc. Dissertation. Dept. of Mathematics, Univ. of Chicago, Chicago, Illinois.
4. Kjeldsen, Tinne Hoff (2000). "A contextualized historical analysis of the Kuhn-Tucker theorem in nonlinear programming: the impact of World War II". Historia Math. 27 (4): 331–361. doi:10.1006/hmat.2000.2289. MR 1800317.
5. Kemp, Murray C.; Kimura, Yoshio (1978). Introduction to Mathematical Economics. New York: Springer. pp. 38–44. ISBN 0-387-90304-6.
6. Boyd, Stephen; Vandenberghe, Lieven (2004). Convex Optimization. Cambridge: Cambridge University Press. p. 244. ISBN 0-521-83378-7. MR 2061575.
7. Ruszczyński, Andrzej (2006). Nonlinear Optimization. Princeton, NJ: Princeton University Press. ISBN 978-0691119151. MR 2199043.
8. Dimitri Bertsekas (1999). Nonlinear Programming (2 ed.). Athena Scientific. pp. 329–330. ISBN 9781886529007.
9. Rodrigo Eustaquio; Elizabeth Karas; Ademir Ribeiro. Constraint Qualification for Nonlinear Programming (PDF) (Technical report). Federal University of Parana.
10. Martin, D. H. (1985). "The Essence of Invexity". J. Optim. Theory Appl. 47 (1): 65–76. doi:10.1007/BF00941316.
11. Hanson, M. A. (1999). "Invexity and the Kuhn-Tucker Theorem". J. Math. Anal. Appl. 236 (2): 594–604. doi:10.1006/jmaa.1999.6484.
12. Chiang, Alpha C. Fundamental Methods of Mathematical Economics, 3rd edition, 1984, pp. 750–752.

## Further reading

• Andreani, R.; Martínez, J. M.; Schuverdt, M. L. (2005). "On the relation between constant positive linear dependence condition and quasinormality constraint qualification". Journal of Optimization Theory and Applications. 125 (2): 473–485. doi:10.1007/s10957-004-1861-9.
• Avriel, Mordecai (2003). Nonlinear Programming: Analysis and Methods. Dover. ISBN 0-486-43227-0.
• Boltyanski, V.; Martini, H.; Soltan, V. (1998). "The Kuhn–Tucker Theorem". Geometric Methods and Optimization Problems. New York: Springer. pp. 78–92. ISBN 0-7923-5454-0.
• Boyd, S.; Vandenberghe, L. (2004). "Optimality Conditions" (PDF). Convex Optimization. Cambridge University Press. pp. 241–249. ISBN 0-521-83378-7.
• Kemp, Murray C.; Kimura, Yoshio (1978). Introduction to Mathematical Economics. New York: Springer. pp. 38–73. ISBN 0-387-90304-6.
• Rau, Nicholas (1981). "Lagrange Multipliers". Matrices and Mathematical Programming. London: Macmillan. pp. 156–174. ISBN 0-333-27768-6.
• Nocedal, J.; Wright, S. J. (2006). Numerical Optimization. New York: Springer. ISBN 978-0-387-30303-1.
• Sundaram, Rangarajan K. (1996). "Inequality Constraints and the Theorem of Kuhn and Tucker". A First Course in Optimization Theory. New York: Cambridge University Press. pp. 145–171. ISBN 0-521-49770-1.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.