Sequential quadratic programming (SQP) is an iterative method for constrained nonlinear optimization, also known as the Lagrange–Newton method. SQP methods are used on mathematical problems for which the objective function and the constraints are twice continuously differentiable, but not necessarily convex.
SQP methods solve a sequence of optimization subproblems, each of which optimizes a quadratic model of the objective subject to a linearization of the constraints. If the problem is unconstrained, then the method reduces to Newton's method for finding a point where the gradient of the objective vanishes. If the problem has only equality constraints, then the method is equivalent to applying Newton's method to the first-order optimality conditions, or Karush–Kuhn–Tucker conditions, of the problem.
Consider a nonlinear programming problem of the form:

$$\begin{array}{rl} \min\limits_{x} & f(x) \\ \text{s.t.} & h(x) \ge 0 \\ & g(x) = 0, \end{array}$$
where $x \in \mathbb{R}^n$, $f \colon \mathbb{R}^n \to \mathbb{R}$, $h \colon \mathbb{R}^n \to \mathbb{R}^{m_I}$ and $g \colon \mathbb{R}^n \to \mathbb{R}^{m_E}$.
The Lagrangian for this problem is[1]

$$\mathcal{L}(x,\lambda,\sigma) = f(x) + \lambda^T h(x) + \sigma^T g(x),$$

where $\lambda$ and $\sigma$ are Lagrange multipliers.
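For concreteness, the snippets in this article use the following small toy instance; the particular functions and starting values are illustrative choices, not part of any standard implementation. A minimal sketch in Python:

```python
import numpy as np

# Toy instance: minimize f(x) = x1^2 + x2^2
# subject to  h(x) = x1 - 0.5 >= 0   (one inequality, m_I = 1)
#             g(x) = x1 + x2 - 1 = 0 (one equality,   m_E = 1)
f = lambda x: x[0]**2 + x[1]**2
h = lambda x: np.array([x[0] - 0.5])
g = lambda x: np.array([x[0] + x[1] - 1.0])

def lagrangian(x, lam, sig):
    # L(x, lambda, sigma) = f(x) + lambda^T h(x) + sigma^T g(x)
    return f(x) + lam @ h(x) + sig @ g(x)

x, lam, sig = np.array([1.0, 0.0]), np.array([0.0]), np.array([0.0])
print(lagrangian(x, lam, sig))  # equals f(x) = 1.0 here, since both multipliers are zero
```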
If the problem does not have inequality constraints (that is, $m_I = 0$), the first-order optimality conditions (aka KKT conditions) $\nabla \mathcal{L}(x,\sigma) = 0$ are a set of nonlinear equations that may be iteratively solved with Newton's method. Newton's method linearizes the KKT conditions at the current iterate $[x_k, \sigma_k]^T$, which provides the following expression for the Newton step $[d_x, d_\sigma]^T$:
$$\begin{bmatrix} d_x \\ d_\sigma \end{bmatrix} = -\left[\nabla^2 \mathcal{L}(x_k,\sigma_k)\right]^{-1} \nabla \mathcal{L}(x_k,\sigma_k) = -\begin{bmatrix} \nabla^2_{xx} \mathcal{L}(x_k,\sigma_k) & \nabla g(x_k) \\ \nabla g(x_k)^T & 0 \end{bmatrix}^{-1} \begin{bmatrix} \nabla f(x_k) + \nabla g(x_k)\,\sigma_k \\ g(x_k) \end{bmatrix},$$
where $\nabla^2_{xx} \mathcal{L}(x_k,\sigma_k)$ denotes the Hessian of the Lagrangian with respect to $x$, and $d_x$ and $d_\sigma$ are the primal and dual displacements, respectively. Note that the Lagrangian Hessian is not explicitly inverted in practice; a linear system is solved instead.
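The following minimal NumPy sketch illustrates one such Newton step on a toy equality-constrained problem (the instance and function names are illustrative):

```python
import numpy as np

# Equality-constrained toy problem: min x1^2 + x2^2  s.t.  x1 + x2 - 1 = 0.
# One Newton step on the KKT conditions grad L = 0, with L = f + sigma^T g.
def newton_kkt_step(x, sigma):
    grad_f = 2.0 * x                      # gradient of f
    G = np.array([[1.0], [1.0]])          # grad g(x), shape (n, m_E)
    H = 2.0 * np.eye(2)                   # Hessian of L in x (here: of f, since g is linear)
    g_val = np.array([x[0] + x[1] - 1.0])

    # Assemble the KKT matrix and the right-hand side of the Newton system.
    KKT = np.block([[H, G], [G.T, np.zeros((1, 1))]])
    rhs = -np.concatenate([grad_f + G @ sigma, g_val])
    d = np.linalg.solve(KKT, rhs)         # solve a linear system; never invert explicitly
    return x + d[:2], sigma + d[2:]

x, sigma = newton_kkt_step(np.array([2.0, -1.0]), np.array([0.0]))
print(x, sigma)   # one step suffices for this quadratic problem: [0.5 0.5] [-1.]
```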
When the Lagrangian Hessian $\nabla^2_{xx} \mathcal{L}(x_k,\sigma_k)$ is not positive definite, the Newton step may not exist, or it may characterize a stationary point that is not a local minimum (but rather a local maximum or a saddle point). In this case, the Lagrangian Hessian must be regularized, for example by adding a multiple of the identity matrix to it such that the resulting matrix is positive definite.
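A crude sketch of this regularization, assuming an eigenvalue test is affordable (production codes typically detect indefiniteness via a matrix factorization instead):

```python
import numpy as np

def regularize(H, beta=1e-3, scale=10.0):
    # Add tau * I to H, increasing tau until the smallest eigenvalue is
    # positive, so the resulting matrix is positive definite.
    tau = 0.0
    while np.linalg.eigvalsh(H + tau * np.eye(H.shape[0]))[0] <= 0.0:
        tau = max(scale * tau, beta)
    return H + tau * np.eye(H.shape[0])

H = np.array([[1.0, 0.0], [0.0, -2.0]])    # indefinite Hessian
print(np.linalg.eigvalsh(regularize(H)))   # all eigenvalues now positive
```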
An alternative view for obtaining the primal-dual displacements is to construct and solve a local quadratic model of the original problem at the current iterate:
$$\begin{array}{rl} \min\limits_{d_x} & f(x_k) + \nabla f(x_k)^T d_x + \tfrac{1}{2}\, d_x^T \nabla^2_{xx} \mathcal{L}(x_k,\sigma_k)\, d_x \\ \text{s.t.} & g(x_k) + \nabla g(x_k)^T d_x = 0. \end{array}$$
The optimality conditions of this quadratic problem correspond to the linearized KKT conditions of the original problem. Note that the term $f(x_k)$ in the expression above may be left out, since it is constant under the $\min\limits_{d_x}$ operator.
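To make the correspondence explicit, introduce a multiplier $\sigma_{\mathrm{qp}}$ for the linearized constraint (the notation is chosen here for illustration). The KKT conditions of the quadratic subproblem read

$$\nabla^2_{xx} \mathcal{L}(x_k,\sigma_k)\, d_x + \nabla f(x_k) + \nabla g(x_k)\, \sigma_{\mathrm{qp}} = 0, \qquad g(x_k) + \nabla g(x_k)^T d_x = 0,$$

which coincide with the Newton system above under the identification $\sigma_{\mathrm{qp}} = \sigma_k + d_\sigma$.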
In the presence of inequality constraints ($m_I > 0$), we can naturally extend the definition of the local quadratic model introduced in the previous section:
$$\begin{array}{rl} \min\limits_{d} & f(x_k) + \nabla f(x_k)^T d + \tfrac{1}{2}\, d^T \nabla^2_{xx} \mathcal{L}(x_k,\lambda_k,\sigma_k)\, d \\ \text{s.t.} & h(x_k) + \nabla h(x_k)^T d \ge 0 \\ & g(x_k) + \nabla g(x_k)^T d = 0. \end{array}$$
The SQP algorithm starts from the initial iterate $(x_0, \lambda_0, \sigma_0)$. At each iteration, the QP subproblem is built and solved; the resulting Newton step direction $[d_x, d_\lambda, d_\sigma]^T$ is used to update the current iterate:
$$\left[x_{k+1}, \lambda_{k+1}, \sigma_{k+1}\right]^T = \left[x_k, \lambda_k, \sigma_k\right]^T + \left[d_x, d_\lambda, d_\sigma\right]^T.$$
This process is repeated for $k = 0, 1, 2, \ldots$ until some convergence criterion is satisfied.
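Putting the pieces together, here is a minimal sketch of the basic loop for an equality-constrained toy problem (the instance, tolerances, and iteration limit are illustrative, and no globalization safeguards are included):

```python
import numpy as np

# Basic SQP (Lagrange-Newton) loop for an equality-constrained toy problem:
#   min  x1^2 + x2^2   s.t.  x1^2 + x2 - 1 = 0,
# using the convention L = f + sigma^T g from above.
def sqp_equality(x, sigma, tol=1e-10, max_iter=50):
    for _ in range(max_iter):
        grad_f = 2.0 * x
        G = np.array([[2.0 * x[0]], [1.0]])               # grad g(x)
        g_val = np.array([x[0]**2 + x[1] - 1.0])
        H = np.diag([2.0 + 2.0 * sigma[0], 2.0])          # Hessian of L in x

        grad_L = grad_f + G @ sigma
        if max(np.linalg.norm(grad_L), abs(g_val[0])) < tol:
            break                                         # KKT conditions met
        KKT = np.block([[H, G], [G.T, np.zeros((1, 1))]])
        d = np.linalg.solve(KKT, -np.concatenate([grad_L, g_val]))
        x, sigma = x + d[:2], sigma + d[2:]
    return x, sigma

x, sigma = sqp_equality(np.array([1.0, 1.0]), np.array([0.0]))
print(x, sigma)   # approx. [0.7071 0.5] and [-1.]
```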
Practical implementations of the SQP algorithm are significantly more complex than the basic version above. To adapt SQP for real-world applications, the following challenges must be addressed:

- The QP subproblem may be infeasible: the linearized constraints need not have a common solution, even when the original problem is feasible.
- Far from a solution, the quadratic model of the objective and the linearization of the constraints can be poor approximations, so the full Newton step may fail to make progress or may even increase the constraint violation.
- The Lagrangian Hessian can be indefinite, unavailable, or too expensive to compute exactly.
To overcome these challenges, various strategies are typically employed:

- Globalization via a line search on a merit function, which balances objective decrease against constraint violation (as sketched below), or via a filter method.
- Trust-region variants that restrict the step to a region where the quadratic and linear models are trusted to be accurate.
- Feasibility restoration phases, or penalty/elastic reformulations of the subproblem, to cope with infeasible QPs.
- Quasi-Newton (e.g. BFGS-type) approximations of the Lagrangian Hessian, to avoid exact second derivatives and keep the subproblems convex.
These strategies can be combined in numerous ways, resulting in a diverse range of SQP methods.
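As an illustration of the merit-function idea, the following backtracking line-search sketch uses an $\ell_1$ merit function; the penalty weight and the simple decrease test are illustrative simplifications (practical codes use a sufficient-decrease condition tied to the merit function's directional derivative):

```python
import numpy as np

# Backtracking line search on the l1 merit function
#   phi(x; mu) = f(x) + mu * (||g(x)||_1 + ||max(0, -h(x))||_1),
# which trades off objective decrease against constraint violation.
def merit(x, f, g, h, mu):
    return f(x) + mu * (np.abs(g(x)).sum() + np.maximum(0.0, -h(x)).sum())

def line_search(x, d, f, g, h, mu=10.0, shrink=0.5, max_backtracks=20):
    alpha, phi0 = 1.0, merit(x, f, g, h, mu)
    for _ in range(max_backtracks):
        if merit(x + alpha * d, f, g, h, mu) < phi0:   # simple decrease test
            return alpha
        alpha *= shrink                                # halve the step and retry
    return alpha   # fall back to the smallest trial step

f = lambda x: x[0]**2 + x[1]**2
g = lambda x: np.array([x[0]**2 + x[1] - 1.0])
h = lambda x: np.array([x[0] - 0.5])
print(line_search(np.array([1.0, 1.0]), np.array([-0.2, -0.6]), f, g, h))  # 1.0
```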
SQP methods have been implemented in well-known numerical environments such as MATLAB and GNU Octave. There also exist numerous software libraries, including open source (for example, SciPy, whose scipy.optimize.minimize provides the SLSQP method, and NLopt, which includes an SLSQP implementation) and commercial (for example, SNOPT, NPSOL, and KNITRO).
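For example, SciPy's scipy.optimize.minimize exposes SLSQP (Sequential Least Squares Programming), an SQP-type method; a minimal usage sketch on the toy problem above:

```python
import numpy as np
from scipy.optimize import minimize

# Solve  min x1^2 + x2^2  s.t.  x1 - 0.5 >= 0,  x1 + x2 - 1 = 0  with SLSQP.
res = minimize(
    fun=lambda x: x[0]**2 + x[1]**2,
    x0=np.array([1.0, 1.0]),
    method="SLSQP",
    constraints=[
        {"type": "ineq", "fun": lambda x: x[0] - 0.5},        # h(x) >= 0
        {"type": "eq",   "fun": lambda x: x[0] + x[1] - 1.0}, # g(x) = 0
    ],
)
print(res.x)   # approx. [0.5 0.5]
```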