Least squares, in general, is the problem of finding a vector *x* that
is a local minimizer to a function that is a sum of squares, possibly
subject to some constraints:

$$\underset{x}{\mathrm{min}}{\Vert F(x)\Vert}_{2}^{2}=\underset{x}{\mathrm{min}}{\displaystyle \sum _{i}{F}_{i}^{2}(x)}$$

such that *A·x* ≤ *b*, *Aeq·x* = *beq*, *lb ≤ x ≤ ub*.

There are several Optimization Toolbox™ solvers available
for various types of *F*(*x*) and
various types of constraints:

Solver | F(x) | Constraints |
---|---|---|

`mldivide` | C·x – d | None |

`lsqnonneg` | C·x – d | x ≥ 0 |

`lsqlin` | C·x – d | Bound, linear |

`lsqnonlin` | General F(x) | Bound |

`lsqcurvefit` | F(x, xdata)
– ydata | Bound |

There are five least-squares algorithms in Optimization Toolbox solvers,
in addition to the algorithms used in `mldivide`

:

Trust-region-reflective

Levenberg-Marquardt

`lsqlin`

active-set`lsqlin`

interior-pointThe algorithm used by

`lsqnonneg`

All the algorithms except the `lsqlin`

active-set
algorithm are large-scale; see Large-Scale vs. Medium-Scale Algorithms. For a general survey
of nonlinear least-squares methods, see Dennis [8]. Specific details on the Levenberg-Marquardt
method can be found in Moré [28].

Many of the methods used in Optimization Toolbox solvers
are based on *trust regions,* a simple yet powerful
concept in optimization.

To understand the trust-region approach to optimization, consider
the unconstrained minimization problem, minimize *f*(*x*),
where the function takes vector arguments and returns scalars. Suppose
you are at a point *x* in *n*-space
and you want to improve, i.e., move to a point with a lower function
value. The basic idea is to approximate *f* with
a simpler function *q*, which reasonably reflects
the behavior of function *f* in a neighborhood *N* around
the point *x*. This neighborhood is the trust region.
A trial step *s* is computed by minimizing (or approximately
minimizing) over *N*. This is the trust-region subproblem,

$$\underset{s}{\mathrm{min}}\left\{q(s),\text{}s\in N\right\}.$$ | (10-1) |

The current point is updated to be *x* + *s* if *f*(*x* + *s*) < *f*(*x*);
otherwise, the current point remains unchanged and *N*,
the region of trust, is shrunk and the trial step computation is repeated.

The key questions in defining a specific trust-region approach
to minimizing *f*(*x*) are how to
choose and compute the approximation *q* (defined
at the current point *x*), how to choose and modify
the trust region *N*, and how accurately to solve
the trust-region subproblem. This section focuses on the unconstrained
problem. Later sections discuss additional complications due to the
presence of constraints on the variables.

In the standard trust-region method ([48]), the quadratic approximation *q* is
defined by the first two terms of the Taylor approximation to *F* at *x*;
the neighborhood *N* is usually spherical or ellipsoidal
in shape. Mathematically the trust-region subproblem is typically
stated

$$\mathrm{min}\left\{\frac{1}{2}{s}^{T}Hs+{s}^{T}g\text{suchthat}\Vert Ds\Vert \le \Delta \right\},$$ | (10-2) |

where *g* is the gradient of *f* at
the current point *x*, *H* is the
Hessian matrix (the symmetric matrix of second derivatives), *D* is
a diagonal scaling matrix, Δ is a positive scalar, and ∥
. ∥ is the 2-norm. Good algorithms exist for solving Equation 10-2 (see [48]); such algorithms typically involve the
computation of a full eigensystem and a Newton process applied to
the secular equation

$$\frac{1}{\Delta}-\frac{1}{\Vert s\Vert}=0.$$

Such algorithms provide an accurate solution to Equation 10-2. However, they
require time proportional to several factorizations of *H*.
Therefore, for trust-region problems a different approach is needed.
Several approximation and heuristic strategies, based on Equation 10-2, have been proposed
in the literature ([42] and [50]). The approximation approach followed
in Optimization Toolbox solvers is to restrict the trust-region
subproblem to a two-dimensional subspace *S* ([39] and [42]).
Once the subspace *S* has been computed, the work
to solve Equation 10-2 is
trivial even if full eigenvalue/eigenvector information is needed
(since in the subspace, the problem is only two-dimensional). The
dominant work has now shifted to the determination of the subspace.

The two-dimensional subspace *S* is
determined with the aid of a preconditioned
conjugate gradient process described below. The solver defines *S* as
the linear space spanned by *s*_{1} and *s*_{2},
where *s*_{1} is in the direction
of the gradient *g*, and *s*_{2} is
either an approximate Newton direction, i.e.,
a solution to

$$H\cdot {s}_{2}=-g,$$ | (10-3) |

or a direction of negative curvature,

$${s}_{2}^{T}\cdot H\cdot {s}_{2}<0.$$ | (10-4) |

The philosophy behind this choice of *S* is
to force global convergence (via the steepest descent direction or
negative curvature direction) and achieve fast local convergence (via
the Newton step, when it exists).

A sketch of unconstrained minimization using trust-region ideas is now easy to give:

Formulate the two-dimensional trust-region subproblem.

Solve Equation 10-2 to determine the trial step

*s*.If

*f*(*x*+*s*) <*f*(*x*), then*x*=*x*+*s*.Adjust Δ.

These four steps are repeated until convergence. The trust-region
dimension Δ is adjusted according to standard rules. In particular,
it is decreased if the trial step is not accepted, i.e., *f*(*x* + *s*)
≥ *f*(*x*).
See [46] and [49] for a discussion of this aspect.

Optimization Toolbox solvers treat a few important special
cases of *f* with specialized functions: nonlinear
least-squares, quadratic functions, and linear least-squares. However,
the underlying algorithmic ideas are the same as for the general case.
These special cases are discussed in later sections.

An important special case for *f*(*x*)
is the nonlinear least-squares problem

$$\underset{x}{\mathrm{min}}{\displaystyle \sum _{i}{f}_{i}^{2}(x)}=\underset{x}{\mathrm{min}}{\Vert F(x)\Vert}_{2}^{2},$$ | (10-5) |

where *F*(*x*) is a vector-valued
function with component *i* of *F*(*x*)
equal to *f _{i}*(

$$\mathrm{min}{\Vert Js+F\Vert}_{2}^{2},$$ | (10-6) |

(where *J* is the Jacobian of *F*(*x*))
is used to help define the two-dimensional subspace *S*.
Second derivatives of the component function *f _{i}*(

In each iteration the method of preconditioned conjugate gradients is used to approximately solve the normal equations, i.e.,

$${J}^{T}Js=-{J}^{T}F,$$

although the normal equations are not explicitly formed.

In this case the function *f*(*x*)
to be solved is

$$f(x)={\Vert Cx+d\Vert}_{2}^{2},$$

possibly subject to linear constraints. The algorithm generates
strictly feasible iterates converging, in the limit, to a local solution.
Each iteration involves the approximate solution of a large linear
system (of order *n*, where *n* is
the length of *x*). The iteration matrices have the
structure of the matrix *C*. In particular, the method
of preconditioned conjugate gradients is used to approximately solve
the normal equations, i.e.,

$${C}^{T}Cx=-{C}^{T}d,$$

although the normal equations are not explicitly formed.

The subspace trust-region method is used to determine a search direction. However, instead of restricting the step to (possibly) one reflection step, as in the nonlinear minimization case, a piecewise reflective line search is conducted at each iteration, as in the quadratic case. See [45] for details of the line search. Ultimately, the linear systems represent a Newton approach capturing the first-order optimality conditions at the solution, resulting in strong local convergence rates.

**Jacobian Multiply Function. **`lsqlin`

can solve the linearly-constrained
least-squares problem without using the matrix *C* explicitly.
Instead, it uses a Jacobian multiply function `jmfun`

,

W = jmfun(Jinfo,Y,flag)

that you provide. The function must calculate the following
products for a matrix *Y*:

If

`flag == 0`

then`W = C'*(C*Y)`

.If

`flag > 0`

then`W = C*Y`

.If

`flag < 0`

then`W = C'*Y`

.

This can be useful if *C* is large, but contains
enough structure that you can write `jmfun`

without
forming *C* explicitly. For an example, see Jacobian Multiply Function with Linear Least Squares.

The `lsqlin`

`'interior-point'`

algorithm
uses the `interior-point-convex`

`quadprog`

Algorithm.
The quadprog problem definition is to minimize a quadratic function

$$\underset{x}{\mathrm{min}}\frac{1}{2}{x}^{T}Hx+{c}^{T}x$$

subject to linear constraints and bound constraints. The `lsqlin`

function
minimizes the squared 2-norm of the vector *Cx – d* subject to linear constraints and bound constraints.
In other words, `lsqlin`

minimizes

$$\begin{array}{c}{\Vert Cx-d\Vert}_{2}^{2}={\left(Cx-d\right)}^{T}\left(Cx-d\right)\\ =\left({x}^{T}{C}^{T}-{d}^{T}\right)\left(Cx-d\right)\\ =\left({x}^{T}{C}^{T}Cx\right)-\left({x}^{T}{C}^{T}d-{d}^{T}Cx\right)+{d}^{T}d\\ =\frac{1}{2}{x}^{T}\left(2{C}^{T}C\right)x+{\left(-2{C}^{T}d\right)}^{T}x+{d}^{T}d.\end{array}$$

This fits into the `quadprog`

framework by
setting the *H* matrix to 2*C ^{T}C* and
the

`lsqlin`

problem, the `quadprog`

`'interior-point-convex'`

algorithm
calculates the solution.In the least-squares problem a function *f*(*x*)
is minimized that is a sum of squares.

$$\underset{x}{\mathrm{min}}f(x)={\Vert F(x)\Vert}_{2}^{2}={\displaystyle \sum _{i}{F}_{i}^{2}(x)}.$$ | (10-7) |

Problems of this type occur in a large number of practical applications,
especially when fitting model functions to data, i.e., nonlinear parameter
estimation. They are also prevalent in control where you want the
output, *y*(*x,t*), to follow some
continuous model trajectory, * φ*(*t*),
for vector *x* and scalar *t*.
This problem can be expressed as

$$\underset{x\in {\Re}^{n}}{\mathrm{min}}{\displaystyle \underset{{t}_{1}}{\overset{{t}_{2}}{\int}}{\left(y(x,t)-\phi (t)\right)}^{2}}dt,$$ | (10-8) |

where *y*(*x,t*) and * φ*(*t*)
are scalar functions.

When the integral is discretized using a suitable quadrature formula, the above can be formulated as a least-squares problem:

$$\underset{x\in {\Re}^{n}}{\mathrm{min}}f(x)={\displaystyle \sum _{i=1}^{m}{\left(\overline{y}(x,{t}_{i})-\overline{\phi}({t}_{i})\right)}^{2}},$$ | (10-9) |

where $$\overline{y}$$ and $$\overline{\phi}$$ include the weights of the quadrature
scheme. Note that in this problem the vector *F*(*x*)
is

$$F(x)=\left[\begin{array}{c}\overline{y}(x,{t}_{1})-\overline{\phi}({t}_{1})\\ \overline{y}(x,{t}_{2})-\overline{\phi}({t}_{2})\\ \mathrm{...}\\ \overline{y}(x,{t}_{m})-\overline{\phi}({t}_{m})\end{array}\right].$$

In problems of this kind, the residual ∥*F*(*x*)∥ is
likely to be small at the optimum since it is general practice to
set realistically achievable target trajectories. Although the function
in LS can be minimized using a general unconstrained minimization
technique, as described in Basics of Unconstrained Optimization, certain characteristics
of the problem can often be exploited to improve the iterative efficiency
of the solution procedure. The gradient and Hessian matrix of LS have
a special structure.

Denoting the *m*-by-*n* Jacobian
matrix of *F*(*x*) as *J*(*x*),
the gradient vector of *f*(*x*)
as *G*(*x*), the Hessian matrix
of *f*(*x*) as *H*(*x*),
and the Hessian matrix of each *F _{i}*(

$$\begin{array}{l}G(x)=2J{(}^{x}F(x)\\ H(x)=2J{(}^{x}J(x)+2Q(x),\end{array}$$ | (10-10) |

where

$$Q(x)={\displaystyle \sum _{i=1}^{m}{F}_{i}(x)\cdot {H}_{i}(x)}.$$

The matrix *Q*(*x*) has the
property that when the residual ∥*F*(*x*)∥ tends
to zero as *x _{k}* approaches
the solution, then

In the Gauss-Newton method, a search direction, *d _{k}*,
is obtained at each major iteration,

$$\underset{x\in {\Re}^{n}}{\mathrm{min}}{\Vert J({x}_{k})-F({x}_{k})\Vert}_{2}^{2}.$$ | (10-11) |

The direction derived from this method is equivalent to the
Newton direction when the terms of *Q*(*x*)
can be ignored. The search direction *d _{k}* can
be used as part of a line search strategy to ensure that at each iteration
the function

The Gauss-Newton method often encounters problems when the second-order
term *Q*(*x*) is significant. A
method that overcomes this problem is the Levenberg-Marquardt method.

The Levenberg-Marquardt [25], and [27] method uses a search direction that is a solution of the linear set of equations

$$\left(J{\left({x}_{k}\right)}^{T}J\left({x}_{k}\right)+{\lambda}_{k}I\right){d}_{k}=-J{\left({x}_{k}\right)}^{T}F\left({x}_{k}\right),$$ | (10-12) |

or, optionally, of the equations

$$\left(J{\left({x}_{k}\right)}^{T}J\left({x}_{k}\right)+{\lambda}_{k}diag\left(J{\left({x}_{k}\right)}^{T}J\left({x}_{k}\right)\right)\right){d}_{k}=-J{\left({x}_{k}\right)}^{T}F\left({x}_{k}\right),$$ | (10-13) |

where the scalar *λ _{k}* controls
both the magnitude and direction of

`ScaleProblem`

to `'none'`

to
choose Equation 10-12,
and set `ScaleProblem`

to `'Jacobian'`

to
choose Equation 10-13.You set the initial value
of the parameter *λ*_{0} using
the `InitDamping`

option. Occasionally, the `0.01`

default
value of this option can be unsuitable. If you find that the Levenberg-Marquardt
algorithm makes little initial progress, try setting `InitDamping`

to
a different value than the default, perhaps `1e2`

.

When *λ _{k}* is zero,
the direction

The Levenberg-Marquardt method therefore uses a search direction that is a cross between the Gauss-Newton direction and the steepest descent direction. This is illustrated in Figure 10-1, Levenberg-Marquardt Method on Rosenbrock's Function. The solution for Rosenbrock's function converges after 90 function evaluations compared to 48 for the Gauss-Newton method. The poorer efficiency is partly because the Gauss-Newton method is generally more effective when the residual is zero at the solution. However, such information is not always available beforehand, and the increased robustness of the Levenberg-Marquardt method compensates for its occasional poorer efficiency.

**Figure 10-1. Levenberg-Marquardt Method on Rosenbrock's Function**

For a more complete description of this figure, including scripts that generate the iterative points, see Banana Function Minimization.

`lsqlin`

Active-Set AlgorithmThe `lsqlin`

solver addresses the minimization
problem

$$\underset{x}{\mathrm{min}}\frac{1}{2}{\Vert C\cdot x-d\Vert}_{2}^{2}$$

subject to bounds and linear constraints. A simple calculation turns this into a quadratic programming problem.

Recall the problem `quadprog`

addresses:

$$\underset{x}{\mathrm{min}}\frac{1}{2}{x}^{T}Hx+{c}^{T}x$$ | (10-14) |

subject to bounds and linear constraints. But

$$\begin{array}{c}\frac{1}{2}{\Vert Cx-d\Vert}_{2}^{2}=\frac{1}{2}{\left(Cx-d\right)}^{T}\left(Cx-d\right)\\ =\frac{1}{2}{x}^{T}\left({C}^{T}C\right)x-{d}^{T}Cx+\frac{1}{2}{d}^{T}d.\end{array}$$

So making the definitions *H* = *C ^{T}C* and

`lsqlin`

problem becomes a `quadprog`

problem,
with an additional constant 1/2 Both solvers accept constraints of the form *A·x* ≤ *b*, *Aeq·x* = *beq*, and *l* ≤ *x* ≤ *u*. *m* is
the total number of linear constraints, the sum of number of rows
of *A* and of *Aeq*.

The The `lsqlin`

active-set
algorithm is an active-set strategy (also known as a projection method)
similar to that of Gill et al., described in [18] and [17]. It has been
modified for both Linear Programming (LP) and Quadratic Programming
(QP) problems. The remainder of the algorithm description addresses
the equivalent quadratic programming (QP) problem.

The solution procedure involves two phases. The first phase involves the calculation of a feasible point (if one exists). The second phase involves the generation of an iterative sequence of feasible points that converge to the solution.

In this method an active set matrix, *S _{k}*,
is maintained that is an estimate of the active constraints (i.e.,
those that are on the constraint boundaries) at the solution point.
Specifically, the active set

The matrix *Z _{k}* is formed
from the last

$${Z}_{k}=Q\left[:,l+1:m\right],$$ | (10-15) |

where

$${Q}^{T}{S}_{k}^{T}=\left[\begin{array}{c}R\\ 0\end{array}\right].$$

Once *Z _{k}* is found,
a search direction

Then if you view the quadratic objective function as a function
of *p*, by substituting for *d _{k}*,
the result is

$$q(p)=\frac{1}{2}{p}^{T}{Z}_{k}^{T}H{Z}_{k}p+{c}^{T}{Z}_{k}p.$$ | (10-16) |

Differentiating this with respect to *p* yields

$$\nabla q(p)={Z}_{k}^{T}H{Z}_{k}p+{Z}_{k}^{T}c.$$ | (10-17) |

∇*q*(*p*) is
referred to as the projected gradient of the quadratic function because
it is the gradient projected in the subspace defined by *Z _{k}*.
The term $${Z}_{k}^{T}H{Z}_{k}$$ is called the projected Hessian.
Assuming the Hessian matrix

$${Z}_{k}^{T}H{Z}_{k}p=-{Z}_{k}^{T}c.$$ | (10-18) |

The next step is

$${x}_{k+1}={x}_{k}+\alpha {d}_{k},\text{where}{d}_{k}={Z}_{k}p.$$ | (10-19) |

At each iteration, because of the quadratic nature of the objective
function, there are only two choices of step length *α*.
A step of unity along *d _{k}* is
the exact step to the minimum of the function restricted to the null
space of

$$\alpha =\underset{i\in \left\{1,\mathrm{...},m\right\}}{\mathrm{min}}\left\{\frac{-\left({A}_{i}{x}_{k}-{b}_{i}\right)}{{A}_{i}{d}_{k}}\right\},$$ | (10-20) |

which is defined for constraints not in the active set, and
where the direction *d _{k}* is
towards the constraint boundary, i.e., $${A}_{i}{d}_{k}>0,\text{}i=1,\mathrm{...},m$$.

Lagrange multipliers, *λ _{k}*,
are calculated that satisfy the nonsingular set of linear equations

$${S}_{k}^{T}{\lambda}_{k}=c.$$ | (10-21) |

If all elements of *λ _{k}* are
positive,

The algorithm requires a feasible point to start. If the initial point is not feasible, then you can find a feasible point by solving the linear programming problem

$$\begin{array}{l}\underset{\gamma \in \Re ,\text{}x\in {\Re}^{n}}{\mathrm{min}}\gamma \text{suchthat}\\ {A}_{i}x={b}_{i},\text{}i=1,\mathrm{...},{m}_{e}\text{(therowsof}Aeq)\\ {A}_{i}x-\gamma \le {b}_{i},\text{}i={m}_{e}+1,\mathrm{...},m\text{(therowsof}A).\end{array}$$ | (10-22) |

The notation *A _{i}* indicates
the

You can modify the preceding QP algorithm
for LP problems by setting the search direction *d* to
the steepest descent direction at each iteration, where *g _{k}* is
the gradient of the objective function (equal to the coefficients
of the linear objective function):

$$d=-{Z}_{k}{Z}_{k}^{T}{g}_{k}.$$ | (10-23) |

If a feasible point is found using the preceding LP method,
the main QP phase is entered. The search direction *d _{k}* is
initialized with a search direction

$$H{d}_{1}=-{g}_{k},$$ | (10-24) |

where *g _{k}* is the gradient
of the objective function at the current iterate

Was this topic helpful?