| Optimization Toolbox™ | ![]() |
| On this page… |
|---|
Sequential Quadratic Programming (SQP) |
In constrained optimization, the general aim is to transform the problem into an easier subproblem that can then be solved and used as the basis of an iterative process. A characteristic of a large class of early methods is the translation of the constrained problem to a basic unconstrained problem by using a penalty function for constraints that are near or beyond the constraint boundary. In this way the constrained problem is solved using a sequence of parameterized unconstrained optimizations, which in the limit (of the sequence) converge to the constrained problem. These methods are now considered relatively inefficient and have been replaced by methods that have focused on the solution of the Karush-Kuhn-Tucker (KKT) equations. The KKT equations are necessary conditions for optimality for a constrained optimization problem. If the problem is a so-called convex programming problem, that is, f(x) and Gi(x), i = 1,...,m, are convex functions, then the KKT equations are both necessary and sufficient for a global solution point.
Referring to GP (Equation 5-1), the Kuhn-Tucker equations can be stated as
![]() | (5-23) |
in addition to the original constraints in Equation 5-1.
The first equation describes a canceling of the gradients between the objective function and the active constraints at the solution point. For the gradients to be canceled, Lagrange multipliers (λi, i = 1,...,m) are necessary to balance the deviations in magnitude of the objective function and constraint gradients. Because only active constraints are included in this canceling operation, constraints that are not active must not be included in this operation and so are given Lagrange multipliers equal to 0. This is stated implicitly in the last two Kuhn-Tucker equations.
The solution of the KKT equations forms the basis to many nonlinear programming algorithms. These algorithms attempt to compute the Lagrange multipliers directly. Constrained quasi-Newton methods guarantee superlinear convergence by accumulating second-order information regarding the KKT equations using a quasi-Newton updating procedure. These methods are commonly referred to as Sequential Quadratic Programming (SQP) methods, since a QP subproblem is solved at each major iteration (also known as Iterative Quadratic Programming, Recursive Quadratic Programming, and Constrained Variable Metric methods).
SQP methods represent the state of the art in nonlinear programming methods. Schittkowski [36], for example, has implemented and tested a version that outperforms every other tested method in terms of efficiency, accuracy, and percentage of successful solutions, over a large number of test problems.
Based on the work of Biggs [1], Han [22], and Powell ([32] and [33]), the method allows you to closely mimic Newton's method for constrained optimization just as is done for unconstrained optimization. At each major iteration, an approximation is made of the Hessian of the Lagrangian function using a quasi-Newton updating method. This is then used to generate a QP subproblem whose solution is used to form a search direction for a line search procedure. An overview of SQP is found in Fletcher [13], Gill et. al. [19], Powell [35], and Schittkowski [23]. The general method, however, is stated here.
Given the problem description in GP (Equation 5-1) the principal idea is the formulation of a QP subproblem based on a quadratic approximation of the Lagrangian function.
|
| (5-24) |
Here you simplify Equation 5-1 by assuming that bound constraints have been expressed as inequality constraints. You obtain the QP subproblem by linearizing the nonlinear constraints.
![]() | (5-25) |
This subproblem can be solved using any QP algorithm (see, for instance, Quadratic Programming Solution). The solution is used to form a new iterate
xk + 1 = xk + αkdk.
The step length parameter αk is determined by an appropriate line search procedure so that a sufficient decrease in a merit function is obtained (see Updating the Hessian Matrix). The matrix Hk is a positive definite approximation of the Hessian matrix of the Lagrangian function (Equation 5-24). Hk can be updated by any of the quasi-Newton methods, although the BFGS method (see Updating the Hessian Matrix) appears to be the most popular.
A nonlinearly constrained problem can often be solved in fewer iterations than an unconstrained problem using SQP. One of the reasons for this is that, because of limits on the feasible area, the optimizer can make informed decisions regarding directions of search and step length.
Consider Rosenbrock's function with an additional nonlinear inequality constraint, g(x),
|
| (5-26) |
This was solved by an SQP implementation in 96 iterations compared to 140 for the unconstrained case. SQP Method on Nonlinear Linearly Constrained Rosenbrock's Function (Eq. 3-2) shows the path to the solution point x = [0.9072,0.8228] starting at x = [–1.9,2.0].
Figure 5-6. SQP Method on Nonlinear Linearly Constrained Rosenbrock's Function (Eq. 3-2)

The SQP implementation consists of three main stages, which are discussed briefly in the following subsections:
At each major iteration a positive definite quasi-Newton approximation of the Hessian of the Lagrangian function, H, is calculated using the BFGS method, where λi, i = 1,...,m, is an estimate of the Lagrange multipliers.
|
| (5-27) |
where

Powell [33] recommends
keeping the Hessian positive definite even though it might be positive
indefinite at the solution point. A positive definite Hessian is maintained
providing
is positive at each update and
that H is initialized with a positive definite
matrix. When
is not positive, qk is modified on an element-by-element basis
so that
. The general aim of this modification
is to distort the elements of qk, which contribute to a positive definite update, as little as possible.
Therefore, in the initial phase of the modification, the most negative
element of qk*sk is repeatedly halved. This procedure is continued until
is greater than or equal to
a small negative tolerance. If, after this procedure,
is still not positive, modify qk by adding a vector v multiplied by a constant scalar w, that is,
|
| (5-28) |
where

and increase w systematically until
becomes positive.
The functions fmincon, fminimax, fgoalattain, and fseminf all use SQP. If Display is set to 'iter' in options, then various information is given such as function
values and the maximum constraint violation. When the Hessian has
to be modified using the first phase of the preceding procedure to
keep it positive definite, then Hessian modified is displayed. If the Hessian has to be modified again using the
second phase of the approach described above, then Hessian modified twice is displayed. When the
QP subproblem is infeasible, then infeasible is displayed.
Such displays are usually not a cause for concern but indicate that
the problem is highly nonlinear and that convergence might take longer
than usual. Sometimes the message no update is displayed, indicating that
is
nearly zero. This can be an indication that the problem setup is wrong
or you are trying to minimize a noncontinuous function.
At each major iteration of the SQP method, a QP problem of the following form is solved, where Ai refers to the ith row of the m-by-n matrix A.
![]() | (5-29) |
The method used in Optimization Toolbox™ functions is an active set strategy (also known as a projection method) similar to that of Gill et. al., described in [18] and [17]. It has been modified for both Linear Programming (LP) and Quadratic Programming (QP) problems.
The solution procedure involves two phases. The first phase
involves the calculation of a feasible point (if one exists). The
second phase involves the generation of an iterative sequence of feasible
points that converge to the solution. In this method an active set,
, is maintained that is an estimate
of the active constraints (i.e., those that are on the constraint
boundaries) at the solution point. Virtually all QP algorithms are
active set methods. This point is emphasized because there exist many
different methods that are very similar in structure but that are
described in widely different terms.
is updated at each iteration k, and this is used to form a basis for a search direction
. Equality constraints always
remain in the active set
.
The notation for the variable
is
used here to distinguish it from dk in the major iterations of the SQP method. The search
direction
is calculated and minimizes
the objective function while remaining on any active constraint boundaries.
The feasible subspace for
is
formed from a basis Zk whose
columns are orthogonal to the estimate of the active set
(i.e.,
). Thus a search direction, which
is formed from a linear summation of any combination of the columns
of Zk, is guaranteed to
remain on the boundaries of the active constraints.
The matrix Zk is formed
from the last m – l columns of the QR decomposition of the matrix
, where l is the number of active constraints and l < m. That is, Zk is given by
|
| (5-30) |
where
![]()
Once Zk is found,
a new search direction
is sought that
minimizes q(d) where
is in the null space of the
active constraints. That is,
is
a linear combination of the columns of Zk:
for some vector p.
Then if you view the quadratic as a function of p, by substituting for
, you have
|
| (5-31) |
Differentiating this with respect to p yields
|
| (5-32) |
∇q(p) is referred to as the projected gradient of
the quadratic function because it is the gradient projected in the
subspace defined by Zk.
The term
is called the projected Hessian.
Assuming the Hessian matrix H is positive definite
(which is the case in this implementation of SQP), then the minimum
of the function q(p) in the
subspace defined by Zk occurs
when ∇q(p) = 0, which
is the solution of the system of linear equations
|
| (5-33) |
A step is then taken of the form
|
| (5-34) |
At each iteration, because of the quadratic nature of the objective
function, there are only two choices of step length α. A step of unity along
is
the exact step to the minimum of the function restricted to the null
space of
. If such a step can be taken,
without violation of the constraints, then this is the solution to
QP (Equation 5-30). Otherwise, the step along
to the nearest constraint is
less than unity and a new constraint is included in the active set
at the next iteration. The distance to the constraint boundaries in
any direction
is given by
|
| (5-35) |
which is defined for constraints not in the active set, and
where the direction
is towards the
constraint boundary, i.e.,
.
When n independent constraints are included in the active set, without location of the minimum, Lagrange multipliers, λk, are calculated that satisfy the nonsingular set of linear equations
|
| (5-36) |
If all elements of λk are positive, xk is the optimal solution of QP (Equation 5-30). However, if any component of λk is negative, and the component does not correspond to an equality constraint, then the corresponding element is deleted from the active set and a new iterate is sought.
Initialization. The algorithm requires a feasible point to start. If the current point from the SQP method is not feasible, then you can find a point by solving the linear programming problem
![]() | (5-37) |
The notation Ai indicates the ith row of the matrix A. You can find a feasible point (if one exists) to Equation 5-37 by setting x to a value that satisfies the equality constraints. You can determine this value by solving an under- or overdetermined set of linear equations formed from the set of equality constraints. If there is a solution to this problem, then the slack variable γ is set to the maximum inequality constraint at this point.
You can modify the preceding QP algorithm for LP problems by setting the search direction to the steepest descent direction at each iteration, where gk is the gradient of the objective function (equal to the coefficients of the linear objective function).
|
| (5-38) |
If a feasible point is found using the preceding LP method,
the main QP phase is entered. The search direction
is initialized with a search
direction
found from solving the set of
linear equations
|
| (5-39) |
where gk is the gradient of the objective function at the current iterate xk (i.e., Hxk + c).
If a feasible solution is not found for the QP problem, the
direction of search for the main SQP routine
is taken as one that
minimizes γ.
The solution to the QP subproblem produces a vector dk, which is used to form a new iterate
|
| (5-40) |
The step length parameter αk is determined in order to produce a sufficient decrease in a merit function. The merit function used by Han [22] and Powell [33] of the following form is used in this implementation.
|
| (5-41) |
Powell recommends setting the penalty parameter
|
| (5-42) |
This allows positive contribution from constraints that are inactive in the QP solution but were recently active. In this implementation, the penalty parameter ri is initially set to
|
| (5-43) |
where
represents the Euclidean norm.
This ensures larger contributions to the penalty parameter from constraints with smaller gradients, which would be the case for active constraints at the solution point.
The simplex algorithm, invented by George Dantzig in 1947, is one of the earliest and best known optimization algorithms. The algorithm solves the linear programming problem

The algorithm moves along the edges of the polyhedron defined by the constraints, from one vertex to another, while decreasing the value of the objective function, fTx, at each step. This section describes an improved version of the original simplex algorithm that returns a vertex optimal solution.
This section covers the following topics:
The simplex algorithm has two phases:
Phase 1 — Compute an initial basic feasible point.
Phase 2 — Compute the optimal solution to the original problem.
Note You cannot supply an initial point x0 for linprog with the simplex algorithm. If you pass in x0 as an input argument, linprog ignores x0 and computes its own initial point for the algorithm. |
Phase 1.
In phase 1, the algorithm finds an initial
basic feasible solution (see Basic and Nonbasic Variables for a definition) by solving
an auxiliary piecewise linear programming problem. The objective function
of the auxiliary problem is the linear penalty function
,
where Pj(xj) is defined by

P(x) measures how much a point x violates the lower and upper bound conditions. The auxiliary problem is
![]()
The original problem has a feasible basis point if and only if the auxiliary problem has minimum value 0.
The algorithm finds an initial point for the auxiliary problem by a heuristic method that adds slack and artificial variables as necessary. The algorithm then uses this initial point together with the simplex algorithm to solve the auxiliary problem. The optimal solution is the initial point for phase 2 of the main algorithm.
Phase 2. In phase 2, the algorithm applies the simplex algorithm, starting at the initial point from phase 1, to solve the original problem. At each iteration, the algorithm tests the optimality condition and stops if the current solution is optimal. If the current solution is not optimal, the algorithm
Chooses one variable, called the entering variable, from the nonbasic variables and adds the corresponding column of the nonbasis to the basis (see Basic and Nonbasic Variables for definitions).
Chooses a variable, called the leaving variable, from the basic variables and removes the corresponding column from the basis.
Updates the current solution and the current objective value.
The algorithm chooses the entering and the leaving variables by solving two linear systems while maintaining the feasibility of the solution.
The simplex algorithm uses the same preprocessing steps as the large-scale linear programming solver, which are described in Preprocessing. In addition, the algorithm uses two other steps:
Eliminates columns that have only one nonzero element and eliminates their corresponding rows.
For each constraint equation a·x = b, where a is a row of Aeq, the algorithm computes the lower and upper bounds of the linear combination a·x as rlb and rub if the lower and upper bounds are finite. If either rlb or rub equals b, the constraint is called a forcing constraint. The algorithm sets each variable corresponding to a nonzero coefficient of a·x equal to its upper or lower bound, depending on the forcing constraint. The algorithm then deletes the columns corresponding to these variables and deletes the rows corresponding to the forcing constraints.
To use the simplex method, set 'LargeScale' to 'off' and 'Simplex' to 'on' in options.
options = optimset('LargeScale','off','Simplex','on')
Then call the function linprog with the options input argument. See the reference page for linprog for more information.
linprog returns empty output arguments for x and fval if it detects infeasibility or unboundedness in the preprocessing procedure. linprog returns the current point when it
Exceeds the maximum number of iterations
Detects that the problem is infeasible or unbounded in phases 1 or 2
When the problem is unbounded, linprog returns x and fval in the unbounded direction.
This section defines the terms basis, nonbasis, and basic feasible solutions for a linear programming problem. The definition assumes that the problem is given in the following standard form:
![]()
(Note that A and b are not the matrix and vector defining the inequalities in the original
problem.) Assume that A is an m-by-n matrix, of rank m < n, whose columns are {a1, a2, ..., an}. Suppose that
is
a basis for the column space of A, with index
set B = {i1, i2, ..., im}, and that N = {1, 2, ..., n}\B is the complement of B. The submatrix AB is called a basis and the complementary submatrix AN is called a nonbasis. The vector
of basic variables is xB and the vector of nonbasic variables is xN. At each iteration
in phase 2, the algorithm replaces one column of the current basis
with a column of the nonbasis and updates the variables xB and xN accordingly.
If x is a solution to A·x = b and all the nonbasic variables in xN are equal to either their lower or upper bounds, x is called a basic solution. If, in addition, the basic variables in xB satisfy their lower and upper bounds, so that x is a feasible point, x is called a basic feasible solution.
[1] Chvatal, Vasek, Linear Programming, W. H. Freeman and Company, 1983.
[2] Bixby, Robert E., "Implementing the Simplex Method: The Initial Basis," ORSA Journal on Computing, Vol. 4, No. 3, 1992.
[3] Andersen, Erling D. and Knud D. Andersen, "Presolving in Linear Programming," Mathematical Programming, Vol. 71, pp. 221-245, 1995.
![]() | Nonlinear Systems of Equations | Multiobjective Optimization | ![]() |
| © 1984-2008- The MathWorks, Inc. - Site Help - Patents - Trademarks - Privacy Policy - Preventing Piracy - RSS |