Main Content

Constrained minimization is the problem of finding a vector *x* that
is a local minimum to a scalar function *f*(*x*)
subject to constraints on the allowable *x*:

$$\underset{x}{\mathrm{min}}f(x)$$

such that one or more of the following holds: *c*(*x*) ≤ 0, *ceq*(*x*) = 0, *A·x* ≤ *b*, *Aeq·x* = *beq*, *l* ≤ *x* ≤ *u*. There are even
more constraints used in semi-infinite programming; see fseminf Problem Formulation and Algorithm.

Many of the methods used in Optimization Toolbox™ solvers
are based on *trust regions,* a simple yet powerful
concept in optimization.

To understand the trust-region approach to optimization, consider
the unconstrained minimization problem, minimize *f*(*x*),
where the function takes vector arguments and returns scalars. Suppose
you are at a point *x* in *n*-space
and you want to improve, i.e., move to a point with a lower function
value. The basic idea is to approximate *f* with
a simpler function *q*, which reasonably reflects
the behavior of function *f* in a neighborhood *N* around
the point *x*. This neighborhood is the trust region.
A trial step *s* is computed by minimizing (or approximately
minimizing) over *N*. This is the trust-region subproblem,

$$\underset{s}{\mathrm{min}}\left\{q(s),\text{}s\in N\right\}.$$ | (1) |

The current point is updated to be *x* + *s* if *f*(*x* + *s*) < *f*(*x*);
otherwise, the current point remains unchanged and *N*,
the region of trust, is shrunk and the trial step computation is repeated.

The key questions in defining a specific trust-region approach
to minimizing *f*(*x*) are how to
choose and compute the approximation *q* (defined
at the current point *x*), how to choose and modify
the trust region *N*, and how accurately to solve
the trust-region subproblem. This section focuses on the unconstrained
problem. Later sections discuss additional complications due to the
presence of constraints on the variables.

In the standard trust-region method ([48]), the quadratic approximation *q* is
defined by the first two terms of the Taylor approximation to *F* at *x*;
the neighborhood *N* is usually spherical or ellipsoidal
in shape. Mathematically the trust-region subproblem is typically
stated

$$\mathrm{min}\left\{\frac{1}{2}{s}^{T}Hs+{s}^{T}g\text{suchthat}\Vert Ds\Vert \le \Delta \right\},$$ | (2) |

where *g* is the gradient of *f* at the current point
*x*, *H* is the Hessian matrix (the
symmetric matrix of second derivatives), *D* is a diagonal
scaling matrix, Δ is a positive scalar, and ∥ . ∥ is the 2-norm. Good algorithms
exist for solving Equation 2 (see [48]); such algorithms typically involve the computation of all
eigenvalues of *H* and a Newton process applied to the secular equation

$$\frac{1}{\Delta}-\frac{1}{\Vert s\Vert}=0.$$

Such algorithms provide an accurate solution to Equation 2. However, they
require time proportional to several factorizations of *H*.
Therefore, for large-scale problems a different approach is needed.
Several approximation and heuristic strategies, based on Equation 2, have been proposed
in the literature ([42] and [50]). The approximation approach followed
in Optimization Toolbox solvers is to restrict the trust-region
subproblem to a two-dimensional subspace *S* ([39] and [42]).
Once the subspace *S* has been computed, the work
to solve Equation 2 is
trivial even if full eigenvalue/eigenvector information is needed
(since in the subspace, the problem is only two-dimensional). The
dominant work has now shifted to the determination of the subspace.

The two-dimensional subspace *S* is
determined with the aid of a preconditioned
conjugate gradient process described below. The solver defines *S* as
the linear space spanned by *s*_{1} and *s*_{2},
where *s*_{1} is in the direction
of the gradient *g*, and *s*_{2} is
either an approximate Newton direction, i.e.,
a solution to

$$H\cdot {s}_{2}=-g,$$ | (3) |

or a direction of negative curvature,

$${s}_{2}^{T}\cdot H\cdot {s}_{2}<0.$$ | (4) |

The philosophy behind this choice of *S* is
to force global convergence (via the steepest descent direction or
negative curvature direction) and achieve fast local convergence (via
the Newton step, when it exists).

A sketch of unconstrained minimization using trust-region ideas is now easy to give:

Formulate the two-dimensional trust-region subproblem.

Solve Equation 2 to determine the trial step

*s*.If

*f*(*x*+*s*) <*f*(*x*), then*x*=*x*+*s*.Adjust Δ.

These four steps are repeated until convergence. The trust-region
dimension Δ is adjusted according to standard rules. In particular,
it is decreased if the trial step is not accepted, i.e., *f*(*x* + *s*)
≥ *f*(*x*).
See [46] and [49] for a discussion of this aspect.

Optimization Toolbox solvers treat a few important special
cases of *f* with specialized functions: nonlinear
least-squares, quadratic functions, and linear least-squares. However,
the underlying algorithmic ideas are the same as for the general case.
These special cases are discussed in later sections.

A popular way to solve large, symmetric, positive definite systems of linear equations *Hp* = –*g* is the method of Preconditioned Conjugate Gradients (PCG). This iterative
approach requires the ability to calculate matrix-vector products of the form
*H·v* where *v* is an arbitrary vector. The
symmetric positive definite matrix *M *is a*
preconditioner* for *H*. That is, *M* = *C*^{2}, where *C*^{–1}*HC*^{–1} is a well-conditioned matrix or a matrix with clustered eigenvalues.

In a minimization context, you can assume that the Hessian matrix *H* is
symmetric. However, *H* is guaranteed to be positive definite only in the
neighborhood of a strong minimizer. Algorithm PCG exits when it encounters a direction of
negative (or zero) curvature, that is, *d ^{T}Hd* ≤ 0. The PCG output direction

Linear constraints complicate the situation described for unconstrained minimization. However, the underlying ideas described previously can be carried through in a clean and efficient way. The trust-region methods in Optimization Toolbox solvers generate strictly feasible iterates.

The general linear equality constrained minimization problem can be written

$$\mathrm{min}\left\{f(x)\text{suchthat}Ax=b\right\},$$ | (5) |

where *A* is an *m*-by-*n* matrix
(*m* ≤ *n*). Some Optimization Toolbox solvers
preprocess *A* to remove strict linear dependencies
using a technique based on the LU factorization of *A ^{T}* [46]. Here

The method used to solve Equation 5 differs from
the unconstrained approach in two significant ways. First, an initial
feasible point *x*_{0} is computed,
using a sparse least-squares step, so that *Ax*_{0} = *b*.
Second, Algorithm PCG is replaced with Reduced Preconditioned Conjugate
Gradients (RPCG), see [46], in order
to compute an approximate reduced Newton step (or a direction of negative
curvature in the null space of *A*). The key linear
algebra step involves solving systems of the form

$$\left[\begin{array}{cc}C& {\tilde{A}}^{T}\\ \tilde{A}& 0\end{array}\right]\left[\begin{array}{c}s\\ t\end{array}\right]=\left[\begin{array}{c}r\\ 0\end{array}\right],$$ | (6) |

where $$\tilde{A}$$ approximates *A* (small
nonzeros of *A* are set to zero provided rank is
not lost) and *C* is a sparse symmetric positive-definite
approximation to *H*, i.e., *C* = *H*.
See [46] for more details.

The box constrained problem is of the form

$$\mathrm{min}\left\{f(x)\text{suchthat}l\le x\le u\right\},$$ | (7) |

where *l* is a vector of lower bounds, and *u* is
a vector of upper bounds. Some (or all) of the components of *l* can
be equal to –∞ and some (or all) of the components of *u* can
be equal to ∞. The method generates a sequence of strictly
feasible points. Two techniques are used to maintain feasibility while
achieving robust convergence behavior. First, a scaled modified Newton
step replaces the unconstrained Newton step (to define the two-dimensional
subspace *S*). Second, reflections
are used to increase the step size.

The scaled modified Newton step arises from examining the Kuhn-Tucker necessary conditions for Equation 7,

$${\left(D(x)\right)}^{-2}g=0,$$ | (8) |

where

$$D(x)=\text{diag}\left({\left|{v}_{k}\right|}^{-1/2}\right),$$

and the vector *v*(*x*) is
defined below, for each 1 ≤ *i* ≤ *n*:

If

*g*< 0 and_{i}*u*< ∞ then_{i}*v*=_{i}*x*–_{i}*u*_{i}If

*g*≥ 0 and_{i}*l*> –∞ then_{i}*v*=_{i}*x*–_{i}*l*_{i}If

*g*< 0 and_{i}*u*= ∞ then_{i}*v*= –1_{i}If

*g*≥ 0 and_{i}*l*= –∞ then_{i}*v*= 1_{i}

The nonlinear system Equation 8 is not differentiable
everywhere. Nondifferentiability occurs when *v _{i}* = 0. You can avoid
such points by maintaining strict feasibility, i.e., restricting

The scaled modified Newton step *s _{k}* for
the nonlinear system of equations given by Equation 8 is
defined as the solution to the linear system

$$\widehat{M}D{s}^{N}=-\widehat{g}$$ | (9) |

at the *k*th iteration, where

$$\widehat{g}={D}^{-1}g=\text{diag}\left({\left|v\right|}^{1/2}\right)g,$$ | (10) |

and

$$\widehat{M}={D}^{-1}H{D}^{-1}+\text{diag}(g){J}^{v}.$$ | (11) |

Here *J ^{v}* plays
the role of the Jacobian of |

Second, reflections are used to increase
the step size. A (single) reflection step is defined as follows. Given
a step *p* that intersects a bound constraint, consider
the first bound constraint crossed by *p*; assume
it is the *i*th bound constraint (either the *i*th
upper or *i*th lower bound). Then the reflection
step *p ^{R}* =

In constrained optimization, the general aim is to transform the problem into an easier
subproblem that can then be solved and used as the basis of an iterative
process. A characteristic of a large class of early methods is the translation
of the constrained problem to a basic unconstrained problem by using a penalty
function for constraints that are near or beyond the constraint boundary. In
this way the constrained problem is solved using a sequence of parametrized
unconstrained optimizations, which in the limit (of the sequence) converge to
the constrained problem. These methods are now considered relatively inefficient
and have been replaced by methods that have focused on the solution of the Karush-Kuhn-Tucker (KKT) equations. The KKT equations are necessary
conditions for optimality for a constrained optimization problem. If the problem
is a so-called convex programming problem, that is,
*f*(*x*) and *G _{i}*(

Referring to GP (Equation 1), the Kuhn-Tucker equations can be stated as

$$\begin{array}{c}\nabla f\left({x}^{*}\right)+{\displaystyle \sum _{i=1}^{m}{\lambda}_{i}\cdot \nabla {G}_{i}\left({x}^{*}\right)}=0\\ {\lambda}_{i}\cdot {G}_{i}\left({x}^{*}\right)=0,\text{}i=1,\mathrm{...},{m}_{e}\\ {\lambda}_{i}\ge 0,\text{}i={m}_{e}+1,\mathrm{...},m,\end{array}$$ | (12) |

in addition to the original constraints in Equation 1.

The first equation describes a canceling of the gradients between
the objective function and the active constraints at the solution
point. For the gradients to be canceled, Lagrange multipliers (*λ _{i}*,

The solution of the KKT equations forms the basis to many nonlinear programming algorithms. These algorithms attempt to compute the Lagrange multipliers directly. Constrained quasi-Newton methods guarantee superlinear convergence by accumulating second-order information regarding the KKT equations using a quasi-Newton updating procedure. These methods are commonly referred to as Sequential Quadratic Programming (SQP) methods, since a QP subproblem is solved at each major iteration (also known as Iterative Quadratic Programming, Recursive Quadratic Programming, and Constrained Variable Metric methods).

The `'active-set'`

algorithm is not a large-scale
algorithm; see Large-Scale vs. Medium-Scale Algorithms.

SQP methods represent the state of the art in nonlinear programming methods. Schittkowski [36], for example, has implemented and tested a version that outperforms every other tested method in terms of efficiency, accuracy, and percentage of successful solutions, over a large number of test problems.

Based on the work of Biggs [1], Han [22], and Powell ([32] and [33]), the method allows you to closely mimic Newton's method for constrained optimization just as is done for unconstrained optimization. At each major iteration, an approximation is made of the Hessian of the Lagrangian function using a quasi-Newton updating method. This is then used to generate a QP subproblem whose solution is used to form a search direction for a line search procedure. An overview of SQP is found in Fletcher [13], Gill et al. [19], Powell [35], and Schittkowski [23]. The general method, however, is stated here.

Given the problem description in GP (Equation 1) the principal idea is the formulation of a QP subproblem based on a quadratic approximation of the Lagrangian function.

$$L(x,\lambda )=f(x)+{\displaystyle \sum _{i=1}^{m}{\lambda}_{i}\cdot {g}_{i}(x)}.$$ | (13) |

Here you simplify Equation 1 by assuming that bound constraints have been expressed as inequality constraints. You obtain the QP subproblem by linearizing the nonlinear constraints.

$$\begin{array}{l}\underset{d\in {\Re}^{n}}{\mathrm{min}}\frac{1}{2}{d}^{T}{H}_{k}d+\nabla f{\left({x}_{k}\right)}^{T}d\\ \nabla {g}_{i}{\left({x}_{k}\right)}^{T}d+{g}_{i}\left({x}_{k}\right)=0,\text{}i=1,\mathrm{...},{m}_{e}\\ \nabla {g}_{i}{\left({x}_{k}\right)}^{T}d+{g}_{i}\left({x}_{k}\right)\le 0,\text{}i={m}_{e}+1,\mathrm{...},m.\end{array}$$ | (14) |

This subproblem can be solved using any QP algorithm (see, for instance, Quadratic Programming Solution). The solution is used to form a new iterate

*x*_{k +
1} = *x*_{k} + *α*_{k}*d*_{k}.

The step length parameter *α*_{k} is
determined by an appropriate line search procedure so that a sufficient
decrease in a merit function is obtained (see Updating the Hessian Matrix). The matrix *H*_{k} is
a positive definite approximation of the Hessian matrix of the Lagrangian
function (Equation 13). *H*_{k} can
be updated by any of the quasi-Newton methods, although the BFGS method
(see Updating the Hessian Matrix)
appears to be the most popular.

A nonlinearly constrained problem can often be solved in fewer iterations than an unconstrained problem using SQP. One of the reasons for this is that, because of limits on the feasible area, the optimizer can make informed decisions regarding directions of search and step length.

Consider Rosenbrock's function
with an additional nonlinear inequality constraint, *g*(*x*),

$${x}_{1}^{2}+{x}_{2}^{2}-1.5\le 0.$$ | (15) |

This was solved by an SQP implementation in 96 iterations compared
to 140 for the unconstrained case. Figure 5-3, SQP Method on Nonlinearly Constrained Rosenbrock's Function shows
the path to the solution point *x* = [0.9072,0.8228] starting
at *x* = [–1.9,2.0].

**Figure 5-3, SQP Method on Nonlinearly Constrained Rosenbrock's
Function**

The SQP implementation consists of three main stages, which are discussed briefly in the following subsections:

**Updating the Hessian Matrix. **At each major iteration a positive definite quasi-Newton approximation
of the Hessian of the Lagrangian function, *H*, is
calculated using the BFGS method, where *λ _{i}*,

$${H}_{k+1}={H}_{k}+\frac{{q}_{k}{q}_{k}^{T}}{{q}_{k}^{T}{s}_{k}}-\frac{{H}_{k}{s}_{k}{s}_{k}^{T}{H}_{k}^{T}}{{s}_{k}^{T}{H}_{k}{s}_{k}},$$ | (16) |

where

$$\begin{array}{c}{s}_{k}={x}_{k+1}-{x}_{k}\\ {q}_{k}=\left(\nabla f\left({x}_{k+1}\right)+{\displaystyle \sum _{i=1}^{m}{\lambda}_{i}\cdot \nabla {g}_{i}\left({x}_{k+1}\right)}\right)-\left(\nabla f\left({x}_{k}\right)+{\displaystyle \sum _{i=1}^{m}{\lambda}_{i}\cdot \nabla {g}_{i}\left({x}_{k}\right)}\right).\end{array}$$

Powell [33] recommends
keeping the Hessian positive definite even though it might be positive
indefinite at the solution point. A positive definite Hessian is maintained
providing $${q}_{k}^{T}{s}_{k}$$ is positive at each update and
that *H* is initialized with a positive definite
matrix. When $${q}_{k}^{T}{s}_{k}$$ is not positive, *q _{k}* is
modified on an element-by-element basis so that $${q}_{k}^{T}{s}_{k}>0$$. The general aim of this modification
is to distort the elements of

$${q}_{k}={q}_{k}+wv,$$ | (17) |

where

$$\begin{array}{l}{v}_{i}=\nabla {g}_{i}\left({x}_{k+1}\right)\cdot {g}_{i}\left({x}_{k+1}\right)-\nabla {g}_{i}\left({x}_{k}\right)\cdot {g}_{i}\left({x}_{k}\right)\\ \text{if}{\left({q}_{k}\right)}_{i}\cdot w0\text{and}{\left({q}_{k}\right)}_{i}\cdot {\left({s}_{k}\right)}_{i}0,\text{}i=1,\mathrm{...},m\\ {v}_{i}=0\text{otherwise,}\end{array}$$

and increase *w* systematically until $${q}_{k}^{T}{s}_{k}$$ becomes positive.

The functions `fmincon`

, `fminimax`

, `fgoalattain`

,
and `fseminf`

all use SQP. If `Display`

is
set to `'iter'`

in `options`

, then
various information is given such as function values and the maximum
constraint violation. When the Hessian has to be modified using the
first phase of the preceding procedure to keep it positive definite,
then `Hessian modified`

is displayed. If the Hessian
has to be modified again using the second phase of the approach described
above, then `Hessian modified twice`

is displayed.
When the QP subproblem is infeasible, then `infeasible`

is displayed. Such displays
are usually not a cause for concern but indicate that the problem
is highly nonlinear and that convergence might take longer than usual.
Sometimes the message `no update`

is displayed, indicating that $${q}_{k}^{T}{s}_{k}$$ is nearly zero. This can be
an indication that the problem setup is wrong or you are trying to
minimize a noncontinuous function.

**Quadratic Programming Solution. **At each major iteration of the SQP method,
a QP problem of the following form is solved, where *A _{i}* refers
to the

`i`

th row of the $$\begin{array}{c}\underset{d\in {\Re}^{n}}{\mathrm{min}}q(d)=\frac{1}{2}{d}^{T}Hd+{c}^{T}d,\\ {A}_{i}d={b}_{i},\text{}i=1,\mathrm{...},{m}_{e}\\ {A}_{i}d\le {b}_{i},\text{}i={m}_{e}+1,\mathrm{...},m.\end{array}$$ | (18) |

The method used in Optimization Toolbox functions is an active set strategy (also known as a projection method) similar to that of Gill et al., described in [18] and [17]. It has been modified for both Linear Programming (LP) and Quadratic Programming (QP) problems.

The solution procedure involves two phases. The first phase involves the calculation of a feasible point (if one exists). The second phase involves the generation of an iterative sequence of feasible points that converge to the solution. In this method an active set, $${\overline{A}}_{k}$$, is maintained that is an estimate of the active constraints (i.e., those that are on the constraint boundaries) at the solution point. Virtually all QP algorithms are active set methods. This point is emphasized because there exist many different methods that are very similar in structure but that are described in widely different terms.

$${\overline{A}}_{k}$$ is updated at each iteration *k*,
and this is used to form a basis for a search direction $${\widehat{d}}_{k}$$. Equality constraints always
remain in the active set $${\overline{A}}_{k}$$.
The notation for the variable $${\widehat{d}}_{k}$$ is
used here to distinguish it from *d _{k}* in
the major iterations of the SQP method. The search direction $${\widehat{d}}_{k}$$ is calculated and minimizes
the objective function while remaining on any active constraint boundaries.
The feasible subspace for $${\widehat{d}}_{k}$$ is
formed from a basis

The matrix *Z _{k}* is formed
from the last

$${Z}_{k}=Q\left[:,l+1:m\right],$$ | (19) |

where

$${Q}^{T}{\overline{A}}_{k}^{T}=\left[\begin{array}{c}R\\ 0\end{array}\right].$$

Once *Z _{k}* is found,
a new search direction $${\widehat{d}}_{k}$$ is sought that
minimizes

Then if you view the quadratic as a function of *p*,
by substituting for $${\widehat{d}}_{k}$$, you have

$$q(p)=\frac{1}{2}{p}^{T}{Z}_{k}^{T}H{Z}_{k}p+{c}^{T}{Z}_{k}p.$$ | (20) |

Differentiating this with respect to *p* yields

$$\nabla q(p)={Z}_{k}^{T}H{Z}_{k}p+{Z}_{k}^{T}c.$$ | (21) |

∇*q*(*p*) is
referred to as the projected gradient of the quadratic function because
it is the gradient projected in the subspace defined by *Z _{k}*.
The term $${Z}_{k}^{T}H{Z}_{k}$$ is called the projected Hessian.
Assuming the Hessian matrix

$${Z}_{k}^{T}H{Z}_{k}p=-{Z}_{k}^{T}c.$$ | (22) |

A step is then taken of the form

$${x}_{k+1}={x}_{k}+\alpha {\widehat{d}}_{k},\text{where}{\widehat{d}}_{k}={Z}_{k}p.$$ | (23) |

At each iteration, because of the quadratic nature of the objective
function, there are only two choices of step length *α*.
A step of unity along $${\widehat{d}}_{k}$$ is the exact step
to the minimum of the function restricted to the null space of $${\overline{A}}_{k}$$. If such a step can be taken,
without violation of the constraints, then this is the solution to
QP (Equation 18). Otherwise,
the step along $${\widehat{d}}_{k}$$ to the nearest
constraint is less than unity and a new constraint is included in
the active set at the next iteration. The distance to the constraint
boundaries in any direction $${\widehat{d}}_{k}$$ is
given by

$$\alpha =\underset{i\in \left\{1,\mathrm{...},m\right\}}{\mathrm{min}}\left\{\frac{-\left({A}_{i}{x}_{k}-{b}_{i}\right)}{{A}_{i}{\widehat{d}}_{k}}\right\},$$ | (24) |

which is defined for constraints not in the active set, and where the direction $${\widehat{d}}_{k}$$ is towards the constraint boundary, i.e., $${A}_{i}{\widehat{d}}_{k}>0,\text{}i=1,\mathrm{...},m$$.

When *n* independent constraints are included
in the active set, without location of the minimum, Lagrange multipliers, *λ _{k}*,
are calculated that satisfy the nonsingular set of linear equations

$${\overline{A}}_{k}^{T}{\lambda}_{k}=c+H{x}_{k}.$$ | (25) |

If all elements of *λ _{k}* are
positive,

**Initialization. **The algorithm requires a feasible point to start. If the current
point from the SQP method is not feasible, then you can find a point by
solving the linear programming problem

$$\begin{array}{l}\underset{\gamma \in \Re ,\text{}x\in {\Re}^{n}}{\mathrm{min}}\gamma \text{suchthat}\\ {A}_{i}x={b}_{i},\text{}i=1,\mathrm{...},{m}_{e}\\ {A}_{i}x-\gamma \le {b}_{i},\text{}i={m}_{e}+1,\mathrm{...},m.\end{array}$$ | (26) |

The notation *A _{i}* indicates the

You can modify the preceding QP algorithm for LP problems by
setting the search direction to the steepest descent direction at each
iteration, where *g _{k}* is the
gradient of the objective function (equal to the coefficients of the linear
objective function).

$${\widehat{d}}_{k}=-{Z}_{k}{Z}_{k}^{T}{g}_{k}.$$ | (27) |

If a feasible point is found using the preceding LP method, the main QP phase is entered. The search direction $${\widehat{d}}_{k}$$ is initialized with a search direction $${\widehat{d}}_{1}$$ found from solving the set of linear equations

$$H{\widehat{d}}_{1}=-{g}_{k},$$ | (28) |

where *g _{k}* is the gradient of the
objective function at the current iterate

If a feasible solution is not found for the QP problem, the direction of
search for the main SQP routine $${\widehat{d}}_{k}$$ is taken as one that minimizes *γ*.

**Line Search and Merit Function. **The solution to the QP subproblem produces a vector *d _{k}*,
which is used to form a new iterate

$${x}_{k+1}={x}_{k}+\alpha {d}_{k}.$$ | (29) |

The step length parameter *α _{k}* is
determined in order to produce a sufficient decrease in a merit function. The merit function used by Han [22] and Powell [33] of the following form is used in this
implementation.

$$\Psi (x)=f(x)+{\displaystyle \sum _{i=1}^{{m}_{e}}{r}_{i}\cdot {g}_{i}(x)}+{\displaystyle \sum _{i={m}_{e}+1}^{m}{r}_{i}\cdot \mathrm{max}[0,{g}_{i}(x)]}.$$ | (30) |

Powell recommends setting the penalty parameter

$${r}_{i}={\left({r}_{k+1}\right)}_{i}=\underset{i}{\mathrm{max}}\left\{{\lambda}_{i},\frac{{\left({r}_{k}\right)}_{i}+{\lambda}_{i}}{2}\right\},\text{}i=1,\mathrm{...},m.$$ | (31) |

This allows positive contribution from constraints that are
inactive in the QP solution but were recently active. In this implementation,
the penalty parameter *r _{i}* is
initially set to

$${r}_{i}=\frac{\Vert \nabla f(x)\Vert}{\Vert \nabla {g}_{i}(x)\Vert},$$ | (32) |

where $$\Vert \text{}\Vert $$ represents the Euclidean norm.

This ensures larger contributions to the penalty parameter from constraints with smaller gradients, which would be the case for active constraints at the solution point.

The `sqp`

algorithm (and nearly identical `sqp-legacy`

algorithm)
is similar to the `active-set`

algorithm (for a description,
see fmincon Active Set Algorithm).
The basic `sqp`

algorithm is described in Chapter
18 of Nocedal and Wright [31].

The `sqp`

algorithm is essentially the same
as the `sqp-legacy`

algorithm, but has a different
implementation. Usually, `sqp`

has faster execution
time and less memory usage than `sqp-legacy`

.

The most important differences between the `sqp`

and
the `active-set`

algorithms are:

The `sqp`

algorithm takes every iterative step
in the region constrained by bounds. Furthermore, finite difference
steps also respect bounds. Bounds are not strict; a step can be exactly
on a boundary. This strict feasibility can be beneficial when your
objective function or nonlinear constraint functions are undefined
or are complex outside the region constrained by bounds.

During its iterations, the `sqp`

algorithm
can attempt to take a step that fails. This means an objective function
or nonlinear constraint function you supply returns a value of `Inf`

, `NaN`

,
or a complex value. In this case, the algorithm attempts to take a
smaller step.

The `sqp`

algorithm uses a different set of
linear algebra routines to solve the quadratic programming subproblem, Equation 14. These routines
are more efficient in both memory usage and speed than the `active-set`

routines.

The `sqp`

algorithm has two new approaches
to the solution of Equation 14 when constraints are not satisfied.

The

`sqp`

algorithm combines the objective and constraint functions into a merit function. The algorithm attempts to minimize the merit function subject to relaxed constraints. This modified problem can lead to a feasible solution. However, this approach has more variables than the original problem, so the problem size in Equation 14 increases. The increased size can slow the solution of the subproblem. These routines are based on the articles by Spellucci [60] and Tone [61]. The`sqp`

algorithm sets the penalty parameter for the merit function Equation 30 according to the suggestion in [41].Suppose nonlinear constraints are not satisfied, and an attempted step causes the constraint violation to grow. The

`sqp`

algorithm attempts to obtain feasibility using a second-order approximation to the constraints. The second-order technique can lead to a feasible solution. However, this technique can slow the solution by requiring more evaluations of the nonlinear constraint functions.

The interior-point approach to constrained minimization is to solve a sequence of approximate minimization problems. The original problem is

$$\underset{x}{\mathrm{min}}f(x),\text{subjectto}h(x)=0\text{and}g(x)\le 0.$$ | (33) |

$$\begin{array}{l}\underset{x,s}{\mathrm{min}}{f}_{\mu}(x,s)=\underset{x,s}{\mathrm{min}}f(x)-\mu {\displaystyle \sum _{i}\mathrm{ln}\left({s}_{i}\right)},\text{subjectto}s\ge 0,\text{\hspace{0.17em}}h(x)=0,\text{and}g(x)+s=0.\\ \end{array}$$ | (34) |

The approximate problem Equation 34 is a sequence of equality constrained problems. These are easier to solve than the original inequality-constrained problem Equation 33.

To solve the approximate problem, the algorithm uses one of two main types of steps at each iteration:

A

*direct*step in (*x*,*s*). This step attempts to solve the KKT equations, Equation 2 and Equation 3, for the approximate problem via a linear approximation. This is also called a*Newton step*.

By default, the algorithm first attempts to take a direct step. If it cannot, it attempts a CG step. One case where it does not take a direct step is when the approximate problem is not locally convex near the current iterate.

At each iteration the algorithm decreases a *merit function*, such as

$${f}_{\mu}(x,s)+\nu \Vert \left(h(x),g(x)+s\right)\Vert .$$ | (35) |

If either the objective function or a nonlinear constraint function
returns a complex value, NaN, Inf, or an error at an iterate *x _{j}*,
the algorithm rejects

`try`

-`catch`

:function val = userFcn(x) try val = ... % code that can error catch val = NaN; end

The objective and constraints must yield proper (`double`

)
values at the initial point.

The following variables are used in defining the direct step:

*H*denotes the Hessian of the Lagrangian of*f*:_{μ}$$H={\nabla}^{2}f(x)+{\displaystyle \sum _{i}{\lambda}_{i}{\nabla}^{2}{g}_{i}(x)}+{\displaystyle \sum _{j}{y}_{j}{\nabla}^{2}{h}_{j}(x)}.$$ **(36)***J*denotes the Jacobian of the constraint function_{g}*g*.*J*denotes the Jacobian of the constraint function_{h}*h*.*S*= diag(*s*).*λ*denotes the Lagrange multiplier vector associated with constraints*g**Λ*= diag(*λ*).*y*denotes the Lagrange multiplier vector associated with*h*.*e*denote the vector of ones the same size as*g*.

Equation 38 defines the direct step (Δ*x*, Δ*s*):

$$\left[\begin{array}{cccc}H& 0& {J}_{h}^{T}& {J}_{g}^{T}\\ 0& \Lambda & 0& S\\ {J}_{h}& 0& 0& 0\\ {J}_{g}& I& 0& 0\end{array}\right]\left[\begin{array}{c}\Delta x\\ \Delta s\\ \Delta y\\ \Delta \lambda \end{array}\right]=-\left[\begin{array}{c}\nabla f+{J}_{h}^{T}y+{J}_{g}^{T}\lambda \\ S\lambda -\mu e\\ h\\ g+s\end{array}\right].$$ | (37) |

This equation comes directly from attempting to solve Equation 2 and Equation 3 using a linearized Lagrangian.

You can symmetrize the equation by premultiplying the second variable
*Δs* by
*S*^{–1}:

$$\left[\begin{array}{cccc}H& 0& {J}_{h}^{T}& {J}_{g}^{T}\\ 0& S\Lambda & 0& S\\ {J}_{h}& 0& 0& 0\\ {J}_{g}& S& 0& 0\end{array}\right]\left[\begin{array}{c}\Delta x\\ {S}^{-1}\Delta s\\ \Delta y\\ \Delta \lambda \end{array}\right]=-\left[\begin{array}{c}\nabla f+{J}_{h}^{T}y+{J}_{g}^{T}\lambda \\ S\lambda -\mu e\\ h\\ g+s\end{array}\right].$$ | (38) |

In order to solve this equation for (Δ*x*, Δ*s*), the algorithm makes an LDL factorization of the matrix. (See
Example 3 — The
Structure of D in the MATLAB^{®}
`ldl`

function reference page.) This is the most
computationally expensive step. One result of this factorization is a
determination of whether the projected Hessian is positive definite or not; if
not, the algorithm uses a conjugate gradient step, described in Conjugate Gradient Step.

To have the approximate problem Equation 34 approach the original problem, the barrier
parameter *μ* should decrease toward 0 as the iterations
proceed. The algorithm has two barrier parameter update options that you choose
with the `BarrierParamUpdate`

option:
`'monotone'`

(default) and
`'predictor-corrector'`

.

The `'monotone'`

option decreases the parameter
*μ* by a factor of 1/100 or 1/5 whenever the approximate
problem is solved sufficiently accurately in the previous iteration. It chooses
1/100 when the algorithm takes only one or two iterations to achieve sufficient
accuracy, and 1/5 otherwise. The measure of accuracy is the following test,
which is that the sizes of all the terms of the right side of Equation 38 are less than *μ*:

$$\mathrm{max}\left(\Vert \nabla f(x)-{J}_{h}^{T}y-{J}_{G}^{T}\lambda \Vert ,\Vert S\lambda -\mu e\Vert ,\Vert h\Vert ,\Vert c(x)+s\Vert \right)<\mu .$$

**Note**

`fmincon`

overrides your
`BarrierParamUpdate`

choice to
`'monotone'`

when either of the following hold:

The problem has no inequality constraints, including bound constraints.

The

`SubproblemAlgorithm`

option is`'cg'`

.

The remainder of this section discusses the
`'predictor-corrector'`

algorithm for updating the barrier
parameter *μ*. The mechanism is similar to the linear
programming Predictor-Corrector algorithm.

Predictor-Corrector steps can accelerate the existing Fiacco-McCormack (Monotone) approach by adjusting for the linearization error in the Newton steps. The effects of the Predictor-Corrector mode are twofold: it (frequently) improves step directions and simultaneously updates the barrier parameter adaptively with the centering parameter σ to encourage iterates to follow the “central path”. See Nocedal and Wright’s discussion of Predictor-Corrector Steps for linear programs to understand why the central path allows larger step sizes and consequently faster convergence.

The predictor step uses the linearized step with *μ* = 0,
meaning without a barrier function:

$$\left[\begin{array}{cccc}H& 0& {J}_{h}^{T}& {J}_{g}^{T}\\ 0& \Lambda & 0& S\\ {J}_{h}& 0& 0& 0\\ {J}_{g}& I& 0& 0\end{array}\right]\left[\begin{array}{c}\Delta x\\ \Delta s\\ \Delta y\\ \Delta \lambda \end{array}\right]=-\left[\begin{array}{c}\nabla f+{J}_{h}^{T}y+{J}_{g}^{T}\lambda \\ S\lambda \\ h\\ g+s\end{array}\right].$$

Define *ɑ _{s}* and

$$\begin{array}{l}{\alpha}_{s}=\mathrm{max}\left(\alpha \in (0,1]:s+\alpha \Delta s\ge 0\right)\\ {\alpha}_{\lambda}=\mathrm{max}\left(\alpha \in (0,1]:\lambda +\alpha \Delta \lambda \ge 0\right).\end{array}$$

Now compute the complementarity from the predictor step.

$${\mu}_{P}=\frac{\left(s+{\alpha}_{s}\Delta s\right)\left(\lambda +{\alpha}_{\lambda}\Delta \lambda \right)}{m},$$ | (39) |

where *m* is the number of constraints.

The first corrector step adjusts for the quadratic term neglected in the Newton’s root-finding linearization

$$\left(s+\Delta s\right)\left(\lambda +\text{}\Delta \lambda \right)=\underset{\text{Lineartermsetto0}}{\underbrace{s\lambda +\text{}s\Delta \lambda +\lambda \Delta s}}+\underset{\text{Quadratic}}{\underbrace{\Delta s\Delta \lambda}}.$$

To correct the quadratic error, solve the linear system for the corrector step direction.

$$\left[\begin{array}{cccc}H& 0& {J}_{h}^{T}& {J}_{g}^{T}\\ 0& \Lambda & 0& S\\ {J}_{h}& 0& 0& 0\\ {J}_{g}& I& 0& 0\end{array}\right]\left[\begin{array}{c}\Delta {x}_{\text{cor}}\\ \Delta {s}_{\text{cor}}\\ \Delta {y}_{\text{cor}}\\ \Delta {\lambda}_{\text{cor}}\end{array}\right]=-\left[\begin{array}{c}0\\ \Delta s\Delta \lambda \\ 0\\ 0\end{array}\right].$$

The second corrector step is a centering step. The centering correction is
based on the variable *σ* on the right side of the
equation

$$\left[\begin{array}{cccc}H& 0& {J}_{h}^{T}& {J}_{g}^{T}\\ 0& \Lambda & 0& S\\ {J}_{h}& 0& 0& 0\\ {J}_{g}& I& 0& 0\end{array}\right]\left[\begin{array}{c}\Delta {x}_{\text{cen}}\\ \Delta {s}_{\text{cen}}\\ \Delta {y}_{\text{cen}}\\ \Delta {\lambda}_{\text{cen}}\end{array}\right]=-\left[\begin{array}{c}\nabla f+{J}_{h}^{T}y+{J}_{g}^{T}\lambda \\ S\lambda -\mu e\sigma \\ h\\ g+s\end{array}\right].$$

Here, *σ* is defined as

$$\sigma ={\left(\frac{{\mu}_{P}}{\mu}\right)}^{3},$$

where *μ _{P}* is defined in equation
Equation 39, and $$\mu ={s}^{T}\lambda /m$$.

To keep the barrier parameter from decreasing too quickly, potentially
destabilizing the algorithm, the algorithm keeps the centering parameter
*σ* above 1/100. This causes the barrier parameter
*μ* to decrease by no more than a factor of 1/100.

Algorithmically, the first correction and centering steps are independent of each other, so they are computed together. Furthermore, the matrix on the left for the predictor and both corrector steps is the same. So, algorithmically, the matrix is factorized once, and this factorization is used for all these steps.

The algorithm can reject the proposed predictor-corrector step for because the step increases the merit function value Equation 35, the complementarity by at least a factor of two, or the computed inertia is incorrect (the problem looks nonconvex). In these cases the algorithm attempts to take a different step or a conjugate gradient step.

The conjugate gradient approach to solving the approximate problem Equation 34 is similar to other
conjugate gradient calculations. In this case, the algorithm adjusts
both *x* and *s*, keeping the slacks *s* positive.
The approach is to minimize a quadratic approximation to the approximate
problem in a trust region, subject to linearized constraints.

Specifically, let *R* denote the radius of
the trust region, and let other variables be defined as in Direct Step. The algorithm obtains Lagrange
multipliers by approximately solving the KKT equations

$${\nabla}_{x}L={\nabla}_{x}f(x)+{\displaystyle \sum _{i}{\lambda}_{i}\nabla {g}_{i}(x)}+{\displaystyle \sum _{j}{y}_{j}\nabla {h}_{j}(x)}=0,$$

in the least-squares sense, subject to *λ* being
positive. Then it takes a step (Δ*x*,
Δ*s*) to approximately
solve

$$\underset{\Delta x,\Delta s}{\mathrm{min}}\nabla {f}^{T}\Delta x+\frac{1}{2}\Delta {x}^{T}{\nabla}_{xx}^{2}L\Delta x+\mu {e}^{T}{S}^{-1}\Delta s+\frac{1}{2}\Delta {s}^{T}{S}^{-1}\Lambda \Delta s,$$ | (40) |

$$g(x)+{J}_{g}\Delta x+\Delta s=0,\text{}h(x)+{J}_{h}\Delta x=0.$$ | (41) |

Here are the meanings and effects of several options in the interior-point algorithm.

`HonorBounds`

— When set to`true`

, every iterate satisfies the bound constraints you have set. When set to`false`

, the algorithm may violate bounds during intermediate iterations.`HessianApproximation`

— When set to:`'bfgs'`

,`fmincon`

calculates the Hessian by a dense quasi-Newton approximation.`'lbfgs'`

,`fmincon`

calculates the Hessian by a limited-memory, large-scale quasi-Newton approximation.`'fin-diff-grads'`

,`fmincon`

calculates a Hessian-times-vector product by finite differences of the gradient(s); other options need to be set appropriately.

`HessianFcn`

—`fmincon`

uses the function handle you specify in`HessianFcn`

to compute the Hessian. See Including Hessians.`HessianMultiplyFcn`

— Give a separate function for Hessian-times-vector evaluation. For details, see Including Hessians and Hessian Multiply Function.`SubproblemAlgorithm`

— Determines whether or not to attempt the direct Newton step. The default setting`'factorization'`

allows this type of step to be attempted. The setting`'cg'`

allows only conjugate gradient steps.

For a complete list of options see **Interior-Point
Algorithm** in `fmincon`

`options`

.

`fminbnd`

is a solver available in any MATLAB installation.
It solves for a local minimum in one dimension within a bounded interval.
It is not based on derivatives. Instead, it uses golden-section search
and parabolic interpolation.

`fseminf`

addresses optimization problems
with additional types of constraints compared to those addressed by `fmincon`

.
The formulation of `fmincon`

is

$$\underset{x}{\mathrm{min}}f(x)$$

such that *c*(*x*) ≤ 0, *ceq*(*x*) = 0, *A·x* ≤ *b*, *Aeq·x* = *beq*, and *l* ≤ *x* ≤ *u*.

`fseminf`

adds the following set of semi-infinite
constraints to those already given. For *w _{j}* in
a one- or two-dimensional bounded interval or rectangle

*K _{j}*(

The term “dimension” of an `fseminf`

problem
means the maximal dimension of the constraint set *I*:
1 if all *I _{j}* are intervals,
and 2 if at least one

The reason this is called semi-infinite programming is that
there are a finite number of variables (*x* and *w _{j}*),
but an infinite number of constraints. This is because the constraints
on

You might think a problem with an infinite number of constraints
is impossible to solve. `fseminf`

addresses this
by reformulating the problem to an equivalent one that has two stages:
a maximization and a minimization. The semi-infinite constraints are
reformulated as

$$\underset{{w}_{j}\in {I}_{j}}{\mathrm{max}}{K}_{j}(x,{w}_{j})\le 0\text{forall}j=1,\mathrm{...},\left|K\right|,$$ | (42) |

where |*K*| is the number of components of
the vector *K*; i.e., the number of semi-infinite
constraint functions. For fixed *x*, this is an ordinary
maximization over bounded intervals or rectangles.

`fseminf`

further simplifies the problem
by making piecewise quadratic or cubic approximations *κ _{j}*(

`fseminf`

considers
only the maxima of the interpolation function **Sampling Points. **Your semi-infinite constraint function must provide a set of
sampling points, points used in making the quadratic or cubic approximations.
To accomplish this, it should contain:

The initial spacing

`s`

between sampling points*w*A way of generating the set of sampling points

*w*from`s`

The initial spacing `s`

is a |*K*|-by-2
matrix. The *j*th row of `s`

represents
the spacing for neighboring sampling points for the constraint function *K _{j}*.
If

`s(j,2) = 0`

. `fseminf`

updates
the matrix `s`

in subsequent iterations.`fseminf`

uses the matrix `s`

to
generate the sampling points *w*, which it then uses
to create the approximation *κ _{j}*(

`s`

should
keep the same intervals or rectangles **Example of Creating Sampling Points. **Consider a problem with two semi-infinite constraints, *K*_{1} and *K*_{2}. *K*_{1} has
one-dimensional *w*_{1}, and *K*_{2} has
two-dimensional *w*_{2}. The
following code generates a sampling set from *w*_{1} = 2 to 12:

% Initial sampling interval if isnan(s(1,1)) s(1,1) = .2; s(1,2) = 0; end % Sampling set w1 = 2:s(1,1):12;

`fseminf`

specifies `s`

as `NaN`

when
it first calls your constraint function. Checking for this allows
you to set the initial sampling interval.

The following code generates a sampling set from *w*_{2} in
a square, with each component going from 1 to 100, initially sampled
more often in the first component than the second:

% Initial sampling interval if isnan(s(1,1)) s(2,1) = 0.2; s(2,2) = 0.5; end % Sampling set w2x = 1:s(2,1):100; w2y = 1:s(2,2):100; [wx,wy] = meshgrid(w2x,w2y);

The preceding code snippets can be simplified as follows:

% Initial sampling interval if isnan(s(1,1)) s = [0.2 0;0.2 0.5]; end % Sampling set w1 = 2:s(1,1):12; w2x = 1:s(2,1):100; w2y = 1:s(2,2):100; [wx,wy] = meshgrid(w2x,w2y);

`fseminf`

essentially reduces the problem
of semi-infinite programming to a problem addressed by `fmincon`

. `fseminf`

takes
the following steps to solve semi-infinite programming problems:

At the current value of

*x*,`fseminf`

identifies all the*w*such that the interpolation_{j,i}*κ*(_{j}*x*,*w*) is a local maximum. (The maximum refers to varying_{j,i}*w*for fixed*x*.)`fseminf`

takes one iteration step in the solution of the`fmincon`

problem:$$\underset{x}{\mathrm{min}}f(x)$$

such that

*c*(*x*) ≤ 0,*ceq*(*x*) = 0,*A·x*≤*b*,*Aeq·x*=*beq*, and*l*≤*x*≤*u*, where*c*(*x*) is augmented with all the maxima of*κ*(_{j}*x*,*w*) taken over all_{j}*w*∈_{j}*I*, which is equal to the maxima over_{j}*j*and*i*of*κ*(_{j}*x*,*w*)._{j,i}`fseminf`

checks if any stopping criterion is met at the new point*x*(to halt the iterations); if not, it continues to step 4.`fseminf`

checks if the discretization of the semi-infinite constraints needs updating, and updates the sampling points appropriately. This provides an updated approximation*κ*(_{j}*x*,*w*). Then it continues at step 1._{j}