Introduction to Least-Squares Fitting

A regression model relates response data to predictor data with one or more coefficients. A fitting method is an algorithm that calculates the model coefficients given a set of input data. Curve Fitting Toolbox™ uses least-squares fitting methods to estimate the coefficients of a regression model.

Curve Fitting Toolbox supports the following least-squares fitting methods:

Linear least-squares
Weighted least-squares
Robust least-squares
Nonlinear least-squares

The type of regression model and the properties of the input data determine which least-squares method is most appropriate for estimating model coefficients.

Calculating Residuals

A residual for a data point is the difference between the value of the observed response and the response estimate returned by the fitted model. The formula for calculating the vector of estimated responses is

$\hat{y} = f (X, b)$

where

$\hat{y}$ is an n-by-1 vector of response estimates
f is the general form of the regression model.
X is an n-by-m design matrix.
b is an m-by-1 vector of fitted model coefficients.

A least-squares fitting method calculates model coefficients that minimize the sum of squared errors (SSE), which is also called the residual sum of squares. Given a set of n data points, the residual for the ith data point r_i is calculated with the formula

$r_{i} = y_{i} - {\hat{y}}_{i}$

where y_i is the ith observed response value and ŷ_i is the ith fitted response value. The SSE is given by

$S S E = \sum_{i = 1}^{n} r_{i}^{2} = \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}$

Error Assumptions

The difference between the observed and true values for a data point is called the error. Because it cannot be observed directly, the error for a data point is approximated with the data point's residual.

Least-squares fitting methods are most accurate for data sets that do not contain a large number of random errors with extreme values. Statistical results, such as confidence and prediction bounds, assume that errors are normally distributed. Data fitting techniques typically make two important assumptions about the error in data containing random variations:

The error exists only in the response data, and not in the predictor data.
The errors are random and follow a normal distribution with zero mean and constant variance.

Data fitting techniques assume that errors are normally distributed because the normal distribution often provides an adequate approximation to the distribution of many measured quantities. Although the least-squares fitting method does not assume normally distributed errors when calculating parameter estimates, the method works best for data that does not contain a large number of random errors with extreme values. The normal distribution is one of the probability distributions in which extreme random errors are uncommon. However, statistical results, such as confidence and prediction bounds require normally distributed errors for their validity.

If the mean of the residuals is nonzero, check whether the residuals are influenced by the choice of model or predictor variables. For fitting methods other than weighted least squares, Curve Fitting Toolbox additionally assumes that the errors have constant variance across the values of the predictor variables. Residuals that do not have a constant variance indicate that the fit might be influenced by poor quality data.

Linear Least Squares

Curve Fitting Toolbox uses the linear least-squares method to fit a linear model to data. A linear model is defined as an equation that is linear in its coefficients. Use the linear least-squares fitting method when the data contains few extreme values, and the variance of the error is constant across predictor variables.

A linear model of degree m – 1 has the matrix form

$y = X β + ε$

where

y is an n-by-1 vector of response data.
$β$ is an m-by-1 vector of unknown coefficients.
X is an n-by-m design matrix containing m – 1 predictor columns. Each predictor variable corresponds to a column in X. The last column in X is a column of ones representing the model's constant term.
$ε$ is an n-by-1 vector of unknown errors.

For example, a first-degree polynomial of the form

$y = p_{1} x + p_{2}$

is given by

$[\begin{matrix} y_{1} \\ y_{2} \\ y_{3} \\ . \\ . \\ . \\ y_{n} \end{matrix}] = [\begin{matrix} x_{1} 1 \\ x_{2} 1 \\ x_{3} 1 \\ . \\ . \\ . \\ x_{n} 1 \end{matrix}] \times [\begin{array}{l} p_{1} \\ p_{2} \end{array}]$

You cannot calculate $β$ directly because $ε$ is unknown. The linear least-squares fitting method approximates $β$ by calculating a vector of coefficients b that minimizes the SSE. Curve Fitting Toolbox calculates b by solving a system of equations called the normal equations. The normal equations are given by the formula

$(X^{T} X) b = X^{T} y$

where X^T is the transpose of the matrix X. The formula for b is then

$b = {(X^{T} X)}^{- 1} X^{T} y$

To solve the system of simultaneous linear equations for unknown coefficients, use the MATLAB^® backslash operator (mldivide). Because inverting X^TX can lead to unacceptable rounding errors, the backslash operator uses QR decomposition with pivoting, which is a stable algorithm numerically. See Arithmetic Operations for more information about the backslash operator and QR decomposition. To calculate the vector of fitted response values ŷ, substitute b into the model formula.

$\hat{y} = X b$

For an example of fitting a polynomial model using the linear least-squares fitting method, see Fit Polynomial Model to Data.

Weighted Least Squares

If the response data error does not have constant variance across the values of the predictor data, the fit can be influenced by poor quality data. The weighted least-squares fitting method uses scaling factors called weights to influence the effect of a response value on the calculation of model coefficients. Use the weighted least-squares fitting method if the weights are known, or if the weights follow a particular form.

The weighted least-squares fitting method introduces weights in the formula for the SSE, which becomes

$S S E = \sum_{i = 1}^{n} w_{i} {(y_{i} - {\hat{y}}_{i})}^{2}$

where w_i are the weights. The weights you supply should transform the response variances to a constant value. If you know the variances $σ_{i}^{2}$ of the measurement errors in your data, then the weights are given by $w_{i} = \frac{1}{σ_{i}^{2}}$ . Alternatively, you can use the residuals to estimate the error in the calculation of the $σ_{i}^{2}$ .

The weighted formula for the SSE yields the formula for b

$b = {(X^{T} W X)}^{- 1} X^{T} W y$

where W is a diagonal matrix such that $W_{i i} = w_{i}$ .

For an example of fitting a polynomial model using the weighted least-squares fitting method, see Improve Model Fit with Weights.

Robust Least Squares

Extreme values in the response data are called outliers. Linear least-squares fitting is sensitive to outliers because squaring the residuals magnifies the effects of these data points in the SSE calculation. Use the robust least-squares fitting method if your data contains outliers.

Curve Fitting Toolbox provides the following robust least-squares fitting methods:

Least absolute residuals (LAR) — This method finds a curve that minimizes the absolute residuals rather than the squared differences. Therefore, extreme values have less influence on the fit.
Bisquare weights — This method minimizes a weighted sum of squares, where the weight given to each data point depends on how far the point is from a fitted curve. Points near the fitted curve get full weight. Points farther from the curve get reduced weight. Points that are farther from the curve than expected by random chance get zero weight.
The bisquare weights method is often preferred over LAR because it simultaneously seeks to find a curve that fits the bulk of the data using the least-squares approach while minimizing the effect of outliers.

Robust bisquare weights fitting uses the iteratively reweighted least-squares algorithm, which follows these steps:

Fit the model by weighted least squares. For the first iteration, the algorithm uses weights equal to one unless you specify the weights.
Calculate the adjusted residuals and standardize them. The adjusted residuals are given by

$r_{a d j} = \frac{r_{i}}{\sqrt{1 - h_{i}}}$
where h_i are parameters that reduce the weight of data points that are far from the fitted curve. The standardized adjusted residuals are given by

$u = \frac{r_{a d j}}{K s}$
where K=4.685 is a tuning constant, and s is the robust standard deviation given by dividing the median absolute deviation (MAD) of the residuals by 0.6745.
Calculate the robust weights as a function of u. The bisquare weights are given by
$w_{i} = {\begin{matrix} {(1 - {(u_{i})}^{2})}^{2} & | u_{i} | < 1 \\ 0 & | u_{i} | \geq 1 \end{matrix}$
If the fit converges, exit the iteration process. Otherwise, perform the next iteration of the bisquare weights fitting method by returning to step 1.

Instead of minimizing the effects of outliers by using robust least-squares fitting, you can mark data points to be excluded from the fit. See Remove Outliers for more information.

For an example of fitting a polynomial model using the robust least-squares fitting method, see Compare Robust Fitting Methods.

Nonlinear Least Squares

Curve Fitting Toolbox uses the nonlinear least-squares method to fit a nonlinear model to data. A nonlinear model is defined as an equation that is nonlinear in the coefficients, or has a combination of linear and nonlinear coefficients. Exponential, Fourier, and Gaussian models are nonlinear, for example.

A nonlinear model has the matrix form

$y = f (X, β) + ε$

where

y is an n-by-1 vector of response data.
$β$ is an m-by-1 vector of coefficients.
X is the n-by-m design matrix.
f is a nonlinear function of $β$ and X.
$ε$ is an n-by-1 vector of unknown errors.

In a nonlinear model, unlike a linear model, the approximate coefficients b cannot be calculated using matrix techniques. Curve Fitting Toolbox uses the following iterative approach to calculate the coefficients:

Initialize the coefficient values. For some nonlinear models, the toolbox provides a heuristic approach for calculating initial values. For other models, the coefficients are initialized with random values in the interval [0,1].
Calculate the fitted curve for the current set of coefficients. The fitted response value ŷ is given by $\hat{y} = f (X, b)$ and is calculated using the Jacobian of $f (X, β)$ . The Jacobian of $f (X, β)$ is defined as a matrix of partial derivatives taken with respect to the coefficients in $β$ .
Adjust the coefficients using one of these nonlinear least-squares algorithms:
- Trust-region — This algorithm is the default. You must use the trust-region algorithm if you specify coefficient constraints. The trust-region algorithm can solve difficult nonlinear problems more efficiently than other algorithms and is an improvement over the popular Levenberg-Marquardt algorithm.
- Levenberg-Marquardt — If the trust-region algorithm does not produce a reasonable fit, and you do not have coefficient constraints, use the Levenberg-Marquardt algorithm.
If the fit satisfies the specified convergence criteria, exit the iteration. Otherwise, return to step 2.

Curve Fitting Toolbox supports the use of weights and robust fitting to calculate the SSE for nonlinear models.

The accuracy of a nonlinear model's predictions depends on the type of the model, the convergence criteria, the data set, and the initial values assigned to the coefficients. If the default options do not yield a reasonable fit, experiment with different starting values for the model coefficients, nonlinear least-squares algorithms, and convergence criteria. In general, begin by modifying the coefficient starting values, because nonlinear model fits are particularly sensitive to the starting values for the model coefficients. See Specify Fit Options and Optimized Starting Points for more information about modifying the default options.

For an example of fitting an exponential model using the nonlinear least-squares fitting method, see Fit Exponential Model to Data.

References

[1] DuMouchel, W. H., and F. L. O'Brien. “Integrating a Robust Option into a Multiple Regression Computing Environment.” Computer Science and Statistics: Proceedings of the 21st Symposium on the Interface. Alexandria, VA: American Statistical Association, 1989.

[2] Holland, P. W., and R. E. Welsch. “Robust Regression Using Iteratively Reweighted Least-Squares.” Communications in Statistics: Theory and Methods, A6, 1977, pp. 813–827.