| Statistics Toolbox™ | ![]() |
| On this page… |
|---|
In statistics, linear regression models often take the form of something like this:
![]()
Here a response variable y is modeled as a combination of constant, linear, interaction, and quadratic terms formed from two predictor variables x1 and x2. Uncontrolled factors and experimental errors are modeled by ε. Given data on x1, x2, and y, regression estimates the model parameters βj (j = 1, ..., 5).
More general linear regression models represent the relationship between a continuous response y and a continuous or categorical predictor x in the form:
![]()
The response is modeled as a linear combination of (not necessarily linear) functions of the predictor, plus a random error ε. The expressions fj(x) (j = 1, ..., p) are the terms of the model. The βj (j = 1, ..., p) are the coefficients. Errors ε are assumed to be uncorrelated and distributed with mean 0 and constant (but unknown) variance.
Examples of linear regression models with a scalar predictor variable x include:
Linear additive (straight-line) models — Terms are f1(x) = 1 and f2(x) = x.
Polynomial models — Terms are f1(x) = 1, f2(x) = x, …, fp(x) = xp–1.
Chebyshev orthogonal polynomial models — Terms are f1(x) = 1, f2(x) = x, …, fp(x) = 2xfp–1(x) – fp–2(x).
Fourier trigonometric polynomial models — Terms are f1(x) = 1/2 and sines and cosines of different frequencies.
Examples of linear regression models with a vector of predictor variables x = (x1, ..., xN) include:
Linear additive (hyperplane) models — Terms are f1(x) = 1 and fk(x) = xk (k = 1, ..., N).
Pairwise interaction models — Terms are linear additive terms plus gk1k2(x) = xk1xk2 (k1, k2 = 1, ..., N, k1 ≠ k2).
Quadratic models — Terms are pairwise interaction terms plus hk(x) = xk2 (k = 1, ..., N).
Pure quadratic models — Terms are quadratic terms minus the gk1k2(x) terms.
Whether or not the predictor x is a vector of predictor variables, multivariate regression refers to the case where the response y = (y1, ..., yM) is a vector of M response variables. See Multivariate Regression for more on multivariate regression models.
Given n independent observations (x1, y1), …, (xn, yn) of the predictor x and the response y, the linear regression model becomes an n-by-p system of equations:

X is the design matrix of the system. The columns of X are the terms of the model evaluated at the predictors. To fit the model to the data, the system must be solved for the p coefficient values in β = (β1, …, βp)T.
The MATLAB® backslash operator \ (mldivide) solves systems of linear equations. Ignoring the unknown error ε, MATLAB estimates model coefficients in y = Xβ using
betahat = X\y
where X is the design matrix and y is the vector of observed responses. MATLAB returns the least-squares solution to the system; betahat minimizes the norm of the residual vector y-X*beta over all beta. If the system is consistent, the norm is 0 and the solution is exact. In this case, the regression model interpolates the data. In more typical regression cases where n > p and the system is overdetermined, the least-squares solution estimates model coefficients obscured by the error ε.
The least-squares estimator betahat has several important statistical properties. First, it is unbiased, with expected value β. Second, by the Gauss-Markov theorem, it has minimum variance among all unbiased estimators formed from linear combinations of the response data. Under the additional assumption that ε is normally distributed, betahat is a maximum likelihood estimator. The assumption also implies that the estimates themselves are normally distributed, which is useful for computing confidence intervals. Even without the assumption, by the Central Limit theorem, the estimates have an approximate normal distribution if the sample size is large enough.
Visualize the least-squares estimator as follows.

For betahat to minimize norm(y-X*beta), y-X*betahat must be perpendicular to the column space of X, which contains all linear combinations of the model terms. This requirement is summarized in the normal equations, which express vanishing inner products between y-X*betahat and the columns of X:
![]()
or
![]()
If X is n-by-p, the normal equations are a p-by-p square system with solution betahat = inv(X'*X)*X'*y, where inv is the MATLAB inverse operator. The matrix inv(X'*X)*X' is the pseudoinverse of X, computed by the MATLAB function pinv.
The normal equations are often badly conditioned relative to the original system y = Xβ (the coefficient estimates are much more sensitive to the model error ε), so the MATLAB backslash operator avoids solving them directly. Instead, a QR (orthogonal, triangular) decomposition of X is used to create a simpler, more stable triangular system:

Statistics Toolbox™ functions like regress and regstats call the MATLAB backslash operator to perform linear regression. The QR decomposition is also used for efficient computation of confidence intervals.
Once betahat is computed, the model can be evaluated at the predictor data:
yhat = X*betahat
or
yhat = X*inv(X'*X)*X'*y
H = X*inv(X'*X)*X' is the hat matrix. It is a square, symmetric n-by-n matrix determined by the predictor data. The diagonal elements H(i,i) (i = 1, ..., n) give the leverage of the ith observation. Since yhat = H*y, leverage values determine the influence of the observed response y(i) on the predicted response yhat(i). For leverage values near 1, the predicted response approximates the observed response. The Statistics Toolbox function leverage computes leverage values from a QR decomposition of X.
Component residual values in y-yhat are useful for detecting failures in model assumptions. Like the errors in ε, residuals have an expected value of 0. Unlike the errors, however, residuals are correlated, with nonconstant variance. Residuals may be "Studentized" (scaled by an estimate of their standard deviation) for comparison. Studentized residuals are used by Statistics Toolbox functions like regress and robustfit to identify outliers in the data.
The system of linear equations

in Linear Regression Models expresses a response y as a linear combination of model terms fj(x) (j = 1, ..., p) at each of the observations (x1, y1), …, (xn, yn).
If the predictor x is multidimensional, so are the functions fj that form the terms of the model. For example, if the predictor is x = (x1, x2), terms for the model might include f1(x) = x1 (a linear term), f2(x) = x12 (a quadratic term), and f3(x) = x1x2 (a pairwise interaction term). Typically, the function f(x) = 1 is included among the fj, so that the design matrix X contains a column of 1s and the model contains a constant (y-intercept) term.
Multiple linear regression models are useful for:
Understanding which terms fj(x) have greatest effect on the response (coefficients βj with greatest magnitude)
Finding the direction of the effects (signs of the βj)
Predicting unobserved values of the response (y(x) for new x)
The Statistics Toolbox functions regress and regstats are used for multiple linear regression analysis.
For example, the file moore.mat contains the 20-by-6 data matrix moore. The first five columns are measurements of five predictor variables. The final column contains the observed responses. Coefficient estimates betahat for a linear additive model are found with regress as follows:
load moore
X1 = [ones(size(moore,1),1) moore(:,1:5)];
y = moore(:,6);
betahat = regress(y,X1)
betahat =
-2.1561
-0.0000
0.0013
0.0001
0.0079
0.0001Before using regress give the design matrix X1 a first column of 1s to include a constant term in the model, betahat(1).
The MATLAB backslash (mldivide) operator, which regress calls, obtains the same result:
betahat = X1\y
betahat =
-2.1561
-0.0000
0.0013
0.0001
0.0079
0.0001The advantage of working with regress is that it allows for additional inputs and outputs relevant to statistical analysis of the regression. For example:
alpha = 0.05; [betahat,Ibeta,res,Ires,stats] = regress(y,X1,alpha);
returns not only the coefficient estimates in betahat, but also
Ibeta — A p-by-2 matrix of 95% confidence intervals for the coefficient estimates, using a 100*(1-alpha)% confidence level. The first column contains lower confidence bounds for each of the p coefficient estimates; the second column contains upper confidence bounds.
res — An n-by-1 vector of residuals.
Ires — An n-by-2 matrix of intervals that can be used to diagnose outliers. If the interval Ires(i,:) for observation i does not contain zero, the corresponding residual is larger than expected in 100*(1-alpha)% of new observations, suggesting an outlier.
stats — A 1-by-4 vector that contains, in order, the R2 statistic, the F statistic and its p-value, and an estimate of the error variance. The statistics are computed assuming the model contains a constant term, and are incorrect otherwise.
Visualize the residuals, in case (row number) order, with the rcoplot function:
rcoplot(res,Ires)

The interval around the first residual, shown in red when plotted, does not contain zero. This indicates that the residual is larger than expected in 95% of new observations, and suggests the data point is an outlier.
Outliers in regression appear for a variety of reasons:
If there is sufficient data, 5% of the residuals, by the definition of rint, are too big.
If there is a systematic error in the model (that is, if the model is not appropriate for generating the data under model assumptions), the mean of the residuals is not zero.
If the errors in the model are not normally distributed, the distributions of the residuals may be skewed or leptokurtic (with heavy tails and more outliers).
When errors are normally distributed, Ires(i,:) is a confidence interval for the mean of res(i) and checking if the interval contains zero is a test of the null hypothesis that the residual has zero mean.
The function regstats also performs multiple linear regression, but computes more statistics than regress. By default, regstats automatically adds a first column of 1s to the design matrix (necessary for computing the F statistic and its p-value), so a constant term should not be included explicitly as for regress. For example:
X2 = moore(:,1:5); stats = regstats(y,X2);
creates a structure stats with fields containing regression statistics. An optional input argument allows you to specify which statistics are computed.
To interactively specify the computed statistics, call regstats without an output argument. For example:
regstats(y,X2)
opens the following interface.

Select the check boxes corresponding to the statistics you want to compute and click OK. Selected statistics are returned to the MATLAB workspace. Names of container variables for the statistics appear on the right-hand side of the interface, where they can be changed to any valid MATLAB variable name.
The regstats function computes statistics that are typically used in regression diagnostics. Statistics can be formatted into standard tabular displays in a variety of ways. For example, the tstat field of the stats output structure of regstats is itself a structure containing statistics related to the estimated coefficients of the regression. Dataset arrays (see Dataset Arrays) provide a natural tabular format for the information:
t = stats.tstat;
CoeffTable = dataset({t.beta,'Coef'},{t.se,'StdErr'}, ...
{t.t,'tStat'},{t.pval,'pVal'})
CoeffTable =
Coef StdErr tStat pVal
-2.1561 0.91349 -2.3603 0.0333
-9.0116e-006 0.00051835 -0.017385 0.98637
0.0013159 0.0012635 1.0415 0.31531
0.0001278 7.6902e-005 1.6618 0.11876
0.0078989 0.014 0.56421 0.58154
0.00014165 7.3749e-005 1.9208 0.075365The MATLAB function fprintf gives you control over tabular formatting. For example, the fstat field of the stats output structure of regstats is a structure with statistics related to the analysis of variance (ANOVA) of the regression. The following commands produce a standard regression ANOVA table:
f = stats.fstat;
fprintf('\n')
fprintf('Regression ANOVA');
fprintf('\n\n')
fprintf('%6s','Source');
fprintf('%10s','df','SS','MS','F','P');
fprintf('\n')
fprintf('%6s','Regr');
fprintf('%10.4f',f.dfr,f.ssr,f.ssr/f.dfr,f.f,f.pval);
fprintf('\n')
fprintf('%6s','Resid');
fprintf('%10.4f',f.dfe,f.sse,f.sse/f.dfe);
fprintf('\n')
fprintf('%6s','Total');
fprintf('%10.4f',f.dfe+f.dfr,f.sse+f.ssr);
fprintf('\n')
The result looks like this:
Regression ANOVA Source df SS MS F P Regr 5.0000 4.1084 0.8217 11.9886 0.0001 Resid 14.0000 0.9595 0.0685 Total 19.0000 5.0679
The models described in Linear Regression Models are based on certain assumptions, such as a normal distribution of errors in the observed responses. If the distribution of errors is asymmetric or prone to outliers, model assumptions are invalidated, and parameter estimates, confidence intervals, and other computed statistics become unreliable. The Statistics Toolbox function robustfit is useful in these cases. The function implements a robust fitting method that is less sensitive than ordinary least squares to large changes in small parts of the data.
Robust regression works by assigning a weight to each data point. Weighting is done automatically and iteratively using a process called iteratively reweighted least squares. In the first iteration, each point is assigned equal weight and model coefficients are estimated using ordinary least squares. At subsequent iterations, weights are recomputed so that points farther from model predictions in the previous iteration are given lower weight. Model coefficients are then recomputed using weighted least squares. The process continues until the values of the coefficient estimates converge within a specified tolerance.
The example in Multiple Linear Regression shows an outlier when ordinary least squares is used to model the response variable as a linear combination of the five predictor variables. To determine the influence of the outlier, compare the coefficient estimates computed by regress:
load moore
X1 = [ones(size(moore,1),1) moore(:,1:5)];
y = moore(:,6);
betahat = regress(y,X1)
betahat =
-2.1561
-0.0000
0.0013
0.0001
0.0079
0.0001to those computed by robustfit:
X2 = moore(:,1:5);
robustbeta = robustfit(X2,y)
robustbeta =
-1.7516
0.0000
0.0009
0.0002
0.0060
0.0001By default, robustfit automatically adds a first column of 1s to the design matrix, so a constant term does not have to be included explicitly as for regress. In addition, the order of inputs is reversed for robustfit and regress.
To understand the difference in the coefficient estimates, look at the final weights given to the data points in the robust fit:
[robustbeta,stats] = robustfit(X2,y);
stats.w'
ans =
Columns 1 through 5
0.0246 0.9986 0.9763 0.9323 0.9704
Columns 6 through 10
0.8597 0.9180 0.9992 0.9590 0.9649
Columns 11 through 15
0.9769 0.9868 0.9999 0.9976 0.8122
Columns 16 through 20
0.9733 0.9892 0.9988 0.8974 0.6774The first data point has a very low weight compared to the other data points, and so is effectively ignored in the robust regression.
The robustdemo function shows the difference between ordinary least squares and robust fitting for data with a single predictor. You can use data provided with the demo, or you can supply your own data. The following steps show you how to use robustdemo.
Start the demo. To begin using robustdemo with the built-in data, simply type the function name:
robustdemo

The resulting figure shows a scatter plot with two fitted lines. The red line is the fit using ordinary least-squares regression. The green line is the fit using robust regression. At the bottom of the figure are the equations for the fitted lines, together with the estimated root mean squared errors for each fit.
View leverages and robust weights. Right-click on any data point to see its least-squares leverage and robust weight.

In the built-in data, the right-most point has a relatively high leverage of 0.35. The point exerts a large influence on the least-squares fit, but its small robust weight shows that it is effectively excluded from the robust fit.
See how changes in the data affect the fits. With the left mouse button, click and hold on any data point and drag it to a new location. When you release the mouse button, the displays update.

Bringing the right-most data point closer to the least-squares line makes the two fitted lines nearly identical. The adjusted right-most data point has significant weight in the robust fit.
Multiple linear regression models, as described in Multiple Linear Regression, are built from a potentially large number of predictive terms. The number of interaction terms, for example, increases exponentially with the number of predictor variables. If there is no theoretical basis for choosing the form of a model, and no assessment of correlations among terms, it is possible to include redundant terms in a model that confuse the identification of significant effects.
Stepwise regression is a systematic method for adding and removing terms from a multilinear model based on their statistical significance in a regression. The method begins with an initial model and then compares the explanatory power of incrementally larger and smaller models. At each step, the p-value of an F-statistic is computed to test models with and without a potential term. If a term is not currently in the model, the null hypothesis is that the term would have a zero coefficient if added to the model. If there is sufficient evidence to reject the null hypothesis, the term is added to the model. Conversely, if a term is currently in the model, the null hypothesis is that the term has a zero coefficient. If there is insufficient evidence to reject the null hypothesis, the term is removed from the model. The method proceeds as follows:
Fit the initial model.
If any terms not in the model have p-values less than an entrance tolerance (that is, if it is unlikely that they would have zero coefficient if added to the model), add the one with the smallest p-value and repeat this step; otherwise, go to step 3.
If any terms in the model have p-values greater than an exit tolerance (that is, if it is unlikely that the hypothesis of a zero coefficient can be rejected), remove the one with the largest p-value and go to step 2; otherwise, end.
Depending on the terms included in the initial model and the order in which terms are moved in and out, the method may build different models from the same set of potential terms. The method terminates when no single step improves the model. There is no guarantee, however, that a different initial model or a different sequence of steps will not lead to a better fit. In this sense, stepwise models are locally optimal, but may not be globally optimal.
Statistics Toolbox functions for stepwise regression are:
stepwisefit — A function that proceeds automatically from a specified initial model and entrance/exit tolerances
stepwise — An interactive tool that allows you to explore individual steps in the regression
For example, load the data in hald.mat, which contains observations of the heat of reaction of various cement mixtures:
load hald whos Name Size Bytes Class Attributes Description 22x58 2552 char hald 13x5 520 double heat 13x1 104 double ingredients 13x4 416 double
The response (heat) depends on the quantities of the four predictors (the columns of ingredients).
Use stepwisefit to carry out the stepwise regression algorithm, beginning with no terms in the model and using entrance/exit tolerances of 0.05/0.10 on the p-values:
stepwisefit(ingredients,heat,...
'penter',0.05,'premove',0.10);
Initial columns included: none
Step 1, added column 4, p=0.000576232
Step 2, added column 1, p=1.10528e-006
Final columns included: 1 4
'Coeff' 'Std.Err.' 'Status' 'P'
[ 1.4400] [ 0.1384] 'In' [1.1053e-006]
[ 0.4161] [ 0.1856] 'Out' [ 0.0517]
[-0.4100] [ 0.1992] 'Out' [ 0.0697]
[-0.6140] [ 0.0486] 'In' [1.8149e-007]
stepwisefit automatically includes an intercept term in the model, so you do not add it explicitly to ingredients as you would for regress. For terms not in the model, coefficient estimates and their standard errors are those that result if the term is added.
The inmodel parameter is used to specify terms in an initial model:
initialModel = ...
[false true false false]; % Force in 2nd term
stepwisefit(ingredients,heat,...
'inmodel',initialModel,...
'penter',.05,'premove',0.10);
Initial columns included: 2
Step 1, added column 1, p=2.69221e-007
Final columns included: 1 2
'Coeff' 'Std.Err.' 'Status' 'P'
[ 1.4683] [ 0.1213] 'In' [2.6922e-007]
[ 0.6623] [ 0.0459] 'In' [5.0290e-008]
[ 0.2500] [ 0.1847] 'Out' [ 0.2089]
[-0.2365] [ 0.1733] 'Out' [ 0.2054]
The preceding two models, built from different initial models, use different subsets of the predictive terms. Terms 2 and 4, swapped in the two models, are highly correlated:
term2 = ingredients(:,2);
term4 = ingredients(:,4);
R = corrcoef(term2,term4)
R =
1.0000 -0.9730
-0.9730 1.0000To compare the models, use the stats output of stepwisefit:
[betahat1,se1,pval1,inmodel1,stats1] = ...
stepwisefit(ingredients,heat,...
'penter',.05,'premove',0.10,...
'display','off');
[betahat2,se2,pval2,inmodel2,stats2] = ...
stepwisefit(ingredients,heat,...
'inmodel',initialModel,...
'penter',.05,'premove',0.10,...
'display','off');
RMSE1 = stats1.rmse
RMSE1 =
2.7343
RMSE2 = stats2.rmse
RMSE2 =
2.4063The second model has a lower Root Mean Square Error (RMSE).
An added variable plot is used to determine the unique effect of adding a new term to a model. The plot shows the relationship between the part of the response unexplained by terms already in the model and the part of the new term unexplained by terms already in the model. The "unexplained" parts are measured by the residuals of the respective regressions. A scatter of the residuals from the two regressions forms the added variable plot.
For example, suppose you want to add term2 to a model that already contains the single term term1. First, consider the ability of term2 alone to explain the response:
load hald
term2 = ingredients(:,2);
[b2,Ib2,res2] = regress(heat,[ones(size(term2)) term2]);
scatter(term2,heat)
xlabel('Term 2')
ylabel('Heat')
hold on
x2 = 20:80;
y2 = b2(1) + b2(2)*x2;
plot(x2,y2,'r')
title('{\bf Response Explained by Term 2: Ignoring Term 1}')

Next, consider the following regressions involving the model term term1:
term1 = ingredients(:,1); [b1,Ib1,res1] = regress(heat,[ones(size(term1)) term1]); [b21,Ib21,res21] = regress(term2,[ones(size(term1)) term1]); bres = regress(res1,[ones(size(res21)) res21]);
A scatter of the residuals res1 vs. the residuals res12 forms the added variable plot:
figure
scatter(res21,res1)
xlabel('Residuals: Term 2 on Term 1')
ylabel('Residuals: Heat on Term 1')
hold on
xres = -30:30;
yres = bres(1) + bres(2)*xres;
plot(xres,yres,'r')
title('{\bf Response Explained by Term 2: Adjusted for Term 1}')

Since the plot adjusted for term1 shows a stronger relationship (less variation along the fitted line) than the plot ignoring term1, the two terms act jointly to explain extra variation. In this case, adding term2 to a model consisting of term1 would reduce the RMSE.
The Statistics Toolbox function addedvarplot produces added variable plots. The previous plot is essentially the one produced by the following:
figure inmodel = [true false false false]; addedvarplot(ingredients,heat,2,inmodel)

In addition to the scatter of residuals, the plot shows 95% confidence intervals on predictions from the fitted line. The fitted line has intercept zero because, under the assumptions outlined in Linear Regression Models, both of the plotted variables have mean zero. The slope of the fitted line is the coefficient that term2 would have if it was added to the model with term1.
The addevarplot function is useful for considering the unique effect of adding a new term to an existing model with any number of terms.
The stepwise interface provides interactive features that allow you to investigate individual steps in a stepwise regression, and to build models from arbitrary subsets of the predictive terms. To open the interface with data from hald.mat:
load hald stepwise(ingredients,heat)

The upper left of the interface displays estimates of the coefficients for all potential terms, with horizontal bars indicating 90% (colored) and 95% (grey) confidence intervals. The red color indicates that, initially, the terms are not in the model. Values displayed in the table are those that would result if the terms were added to the model.
The middle portion of the interface displays summary statistics for the entire model. These statistics are updated with each step.
The lower portion of the interface, Model History, displays the RMSE for the model. The plot tracks the RMSE from step to step, so you can compare the optimality of different models. Hover over the blue dots in the history to see which terms were in the model at a particular step. Click on a blue dot in the history to open a copy of the interface initialized with the terms in the model at that step.
Initial models, as well as entrance/exit tolerances for the p-values of F-statistics, are specified using additional input arguments to stepwise. Defaults are an initial model with no terms, an entrance tolerance of 0.05, and an exit tolerance of 0.10.
To center and scale the input data (compute z-scores) to improve conditioning of the underlying least-squares problem, select Scale Inputs from the Stepwise menu.
You proceed through a stepwise regression in one of two ways:
Click Next Step to select the recommended next step. The recommended next step either adds the most significant term or removes the least significant term. When the regression reaches a local minimum of RMSE, the recommended next step is "Move no terms." You can perform all of the recommended steps at once by clicking All Steps.
Click a line in the plot or in the table to toggle the state of the corresponding term. Clicking a red line, corresponding to a term not currently in the model, adds the term to the model and changes the line to blue. Clicking a blue line, corresponding to a term currently in the model, removes the term from the model and changes the line to red.
To call addedvarplot and produce an added variable plot from the stepwise interface, select Added Variable Plot from the Stepwise menu. A list of terms is displayed. Select the term you want to add, and then click OK.
Click Export to display a dialog box that allows you to select information from the interface to save to the MATLAB workspace. Check the information you want to export and, optionally, change the names of the workspace variables to be created. Click OK to export the information.
Coefficient estimates for the models described in Multiple Linear Regression rely on the independence of the model terms. When terms are correlated and the columns of the design matrix X have an approximate linear dependence, the matrix (XTX)–1 becomes close to singular. As a result, the least-squares estimate
![]()
becomes highly sensitive to random errors in the observed response y, producing a large variance. This situation of multicollinearity can arise, for example, when data are collected without an experimental design.
Ridge regression addresses the problem by estimating regression coefficients using
![]()
where k is the ridge parameter and I is the identity matrix. Small positive values of k improve the conditioning of the problem and reduce the variance of the estimates. While biased, the reduced variance of ridge estimates often result in a smaller mean square error when compared to least-squares estimates.
The Statistics Toolbox function ridge carries out ridge regression.
For example, load the data in acetylene.mat, with observations of the predictor variables x1, x2, x3, and the response variable y:
load acetylene
Plot the predictor variables against each other:
subplot(1,3,1)
plot(x1,x2,'.')
xlabel('x1'); ylabel('x2'); grid on; axis square
subplot(1,3,2)
plot(x1,x3,'.')
xlabel('x1'); ylabel('x3'); grid on; axis square
subplot(1,3,3)
plot(x2,x3,'.')
xlabel('x2'); ylabel('x3'); grid on; axis square

Note the correlation between x1 and the other two predictor variables.
Use ridge and x2fx to compute coefficient estimates for a multilinear model with interaction terms, for a range of ridge parameters:
X = [x1 x2 x3]; D = x2fx(X,'interaction'); D(:,1) = []; % No constant term k = 0:1e-5:5e-3; betahat = ridge(y,D,k);
Plot the ridge trace:
figure
plot(k,betahat,'LineWidth',2)
ylim([-100 100])
grid on
xlabel('Ridge Parameter')
ylabel('Standardized Coefficient')
title('{\bf Ridge Trace}')
legend('x1','x2','x3','x1x2','x1x3','x2x3')

The estimates stabilize to the right of the plot. Note that the coefficient of the x2x3 interaction term changes sign at a value of the ridge parameter ≈ 5 × 10–4.
Partial least-squares (PLS) regression is a technique used with data that contain correlated predictor variables. This technique constructs new predictor variables, known as components, as linear combinations of the original predictor variables. PLS constructs these components while considering the observed response values, leading to a parsimonious model with reliable predictive power.
The technique is something of a cross between multiple linear regression and principal component analysis:
Multiple linear regression finds a combination of the predictors that best fit a response.
Principal component analysis finds combinations of the predictors with large variance, reducing correlations. The technique makes no use of response values.
PLS finds combinations of the predictors that have a large covariance with the response values.
PLS therefore combines information about the variances of both the predictors and the responses, while also considering the correlations among them.
PLS shares characteristics with other regression and feature transformation techniques. It is similar to ridge regression in that it is used in situations with correlated predictors. It is similar to stepwise regression (or the more general technique of feature selection) in that it can be used to select a smaller set of model terms. PLS differs from these methods, however, by transforming the original predictor space into the new component space.
The Statistics Toolbox function plsregress carries out PLS regression.
For example, consider the data on biochemical oxygen demand in moore.mat, padded with noisy versions of the predictors to introduce correlations:
load moore y = moore(:,6); % Response X0 = moore(:,1:5); % Original predictors X1 = X0+10*randn(size(X0)); % Correlated predictors X = [X0,X1];
Use plsregress to perform PLS regression with the same number of components as predictors, then plot the percentage variance explained in the response as a function of the number of components:
[XL,yl,XS,YS,beta,PCTVAR] = plsregress(X,y,10);
plot(1:10,cumsum(100*PCTVAR(2,:)),'-bo');
xlabel('Number of PLS components');
ylabel('Percent Variance Explained in y');

Choosing the number of components in a PLS model is a critical step. The plot gives a rough indication, showing nearly 80% of the variance in y explained by the first component, with as many as five additional components making significant contributions.
The following computes the six-component model:
[XL,yl,XS,YS,beta,PCTVAR,MSE,stats] = plsregress(X,y,6); yfit = [ones(size(X,1),1) X]*beta; plot(y,yfit,'o')

The scatter shows a reasonable correlation between fitted and observed responses, and this is confirmed by the R2 statistic:
TSS = sum((y-mean(y)).^2);
RSS = sum((y-yfit).^2);
Rsquared = 1 - RSS/TSS
Rsquared =
0.8421A plot of the weights of the ten predictors in each of the six components shows that two of the components (the last two computed) explain the majority of the variance in X:
plot(1:10,stats.W,'o-');
legend({'c1','c2','c3','c4','c5','c6'},'Location','NW')
xlabel('Predictor');
ylabel('Weight');

A plot of the mean-squared errors suggests that as few as two components may provide an adequate model:
[axes,h1,h2] = plotyy(0:6,MSE(1,:),0:6,MSE(2,:));
set(h1,'LineWidth',2)
set(h2,'LineWidth',2)
legend('MSE Predictors','MSE Response')
xlabel('Number of Components')

The calculation of mean-squared errors by plsregress is controlled by optional parameter name/value pairs specifying cross-validation type and the number of Monte Carlo repetitions.
Polynomial models are a special case of the linear models discussed in Linear Regression Models. Polynomial models have the advantages of being simple, familiar in their properties, and reasonably flexible for following data trends. They are also robust with respect to changes in the location and scale of the data (see Conditioning Polynomial Fits). However, polynomial models may be poor predictors of new values. They oscillate between data points, especially as the degree is increased to improve the fit. Asymptotically, they follow power functions, leading to inaccuracies when extrapolating other long-term trends. Choosing a polynomial model is often a trade-off between a simple description of overall data trends and the accuracy of predictions made from the model.
Functions for Polynomial Fitting. To fit polynomials to data, MATLAB and Statistics Toolbox software offer a number of dedicated functions. The MATLAB function polyfit computes least-squares coefficient estimates for polynomials of arbitrary degree. For example:
x = 0:5; % x data y = [2 1 4 4 3 2]; % y data p = polyfit(x,y,3) % Degree 3 fit p = -0.1296 0.6865 -0.1759 1.6746
Polynomial coefficients in p are listed from highest to lowest degree, so p(x) ≈ –0.13 x3 + 0.69 x2 – 0.18 x + 1.67. For convenience, polyfit sets up the Vandermonde design matrix (vander) and calls backslash (mldivide) to perform the fit.
Once the coefficients of a polynomial are collected in a vector p, use the MATLAB function polyval to evaluate the polynomial at arbitrary inputs. For example, the following plots the data and the fit over a range of inputs:
plot(x,y,'ro','LineWidth',2) % Plot data hold on xfit = -1:0.01:6; yfit = polyval(p,xfit); plot(xfit,yfit,'LineWidth',2) % Plot fit ylim([0,5]) grid on

Use the MATLAB function roots to find the roots of p:
r = roots(p) r = 5.4786 -0.0913 + 1.5328i -0.0913 - 1.5328i
The MATLAB function poly solves the inverse problem, finding a polynomial with specified roots. poly is the inverse of roots up to ordering, scaling, and round-off error.
An optional output from polyfit is passed to polyval or to the Statistics Toolbox function polyconf to compute prediction intervals for the fit. For example, the following computes 95% prediction intervals for new observations at each value of the predictor x:
[p,S] = polyfit(x,y,3); [yhat,delta] = polyconf(p,x,S); PI = [yhat-delta;yhat+delta]' PI = -5.3022 8.6514 -4.2068 8.3179 -2.9899 9.0534 -2.1963 9.8471 -2.6036 9.9211 -5.2229 8.7308
Optional input arguments to polyconf allow you to compute prediction intervals for estimated values (yhat) as well as new observations, and to compute the bounds simultaneously for all x instead of nonsimultaneously (the default). The confidence level for the intervals can also be set.
Displaying Polynomial Fits. The documentation example function polydemo combines the functions polyfit, polyval, roots, and polyconf to produce a formatted display of data with a polynomial fit.
Note Statistics Toolbox documentation example files are located in the \help\toolbox\stats\examples subdirectory of your MATLAB root directory (matlabroot). This subdirectory is not on the MATLAB path at installation. To use the M-files in this subdirectory, either add the subdirectory to the MATLAB path (addpath) or make the subdirectory your current working directory (cd). |
For example, the following uses polydemo to produce a display of simulated data with a quadratic trend, a fitted polynomial, and 95% prediction intervals for new observations:
x = -5:5;
y = x.^2 - 5*x - 3 + 5*randn(size(x));
p = polydemo(x,y,2,0.05)
p =
0.8107 -4.5054 -1.1862

polydemo calls the documentation example function polystr to convert the coefficient vector p into a string for the polynomial expression displayed in the figure title.
Conditioning Polynomial Fits. If x and y data are on very different scales, polynomial fits may be badly conditioned, in the sense that coefficient estimates are very sensitive to random errors in the data. For example, using polyfit to estimate coefficients of a cubic fit to the U.S. census data in census.mat produces the following warning:
load census
x = cdate;
y = pop;
p = polyfit(x,y,3);
Warning: Polynomial is badly conditioned.
Add points with distinct X values,
reduce the degree of the polynomial,
or try centering and scaling as
described in HELP POLYFIT.
The following implements the suggested centering and scaling, and demonstrates the robustness of polynomial fits under these transformations:
plot(x,y,'ro') % Plot data hold on z = zscore(x); % Compute z-scores of x data zfit = linspace(z(1),z(end),100); pz = polyfit(z,y,3); % Compute conditioned fit yfit = polyval(pz,zfit); xfit = linspace(x(1),x(end),100); plot(xfit,yfit,'b-') % Plot conditioned fit vs. x data grid on

The functions polyfit, polyval, and polyconf are interactively applied to data using two graphical interfaces for polynomial fitting:
The Basic Fitting Tool. The Basic Fitting Tool is a MATLAB interface, discussed in Interactive Fitting in the MATLAB documentation. The tool allows you to:
Fit interpolants and polynomials of degree ≤ 10
Plot residuals and compute their norm
Interpolate or extrapolate values from the fit
Save results to the MATLAB workspace
The Polynomial Fitting Tool. The Statistics Toolbox function polytool opens the Polynomial Fitting Tool. For example, the following opens the interface using simulated data with a quadratic trend and displays a fitted polynomial with 95% prediction intervals for new observations:
x = -5:5; y = x.^2 - 5*x - 3 + 5*randn(size(x)); polytool(x,y,2,0.05)

The tool allows you to:
Interactively change the degree of the fit. Change the value in the Degree text box at the top of the figure.
Evaluate the fit and the bounds using a movable crosshair. Click, hold, and drag the crosshair to change its position.
Export estimated coefficients, predicted values, prediction intervals, and residuals to the MATLAB workspace. Click Export to a open a dialog box with choices for exporting the data.
Options for the displayed bounds and the fitting method are available through menu options at the top of the figure:
The Bounds menu lets you choose between bounds on new observations (the default) and bounds on estimated values. It also lets you choose between nonsimultaneous (the default) and simultaneous bounds. See polyconf for a description of these options.
The Method menu lets you choose between ordinary least-squares regression and robust regression, as described in Robust Regression.
Polynomial models are generalized to any number of predictor variables xi (i = 1, ..., N) as follows:
![]()
The model includes, from left to right, an intercept, linear terms, quadratic interaction terms, and squared terms. Higher order terms would follow, as necessary.
Response surface models are multivariate polynomial models. They typically arise in the design of experiments (see Design of Experiments), where they are used to determine a set of design variables that optimize a response. Linear terms alone produce models with response surfaces that are hyperplanes. The addition of interaction terms allows for warping of the hyperplane. Squared terms produce the simplest models in which the response surface has a maximum or minimum, and so an optimal response.
Response surface methodology (RSM) is the process of adjusting predictor variables to move the response in a desired direction and, iteratively, to an optimum. The method generally involves a combination of both computation and visualization. The use of quadratic response surface models makes the method much simpler than standard nonlinear techniques for determining optimal designs.
The file reaction.mat contains simulated data on the rate of a chemical reaction:
load reaction
The variables include:
rate — A 13-by-1 vector of observed reaction rates
reactants — A 13-by-3 matrix of reactant concentrations
xn — The names of the three reactants
yn — The name of the response
In Nonlinear Regression, the nonlinear Hougen-Watson model is fit to the data using nlinfit. However, there may be no theoretical basis for choosing a particular model to fit the data. A quadratic response surface model provides a simple way to determine combinations of reactants that lead to high reaction rates.
As described in Multiple Linear Regression, the regress and regstats functions fit linear models—including response surface models—to data using a design matrix of model terms evaluated at predictor data. The x2fx function converts predictor data to design matrices for quadratic models. The regstats function calls x2fx when instructed to do so.
For example, the following fits a quadratic response surface model to the data in reaction.mat:
stats = regstats(rate,reactants,'quadratic','beta'); b = stats.beta; % Model coefficients
The 10-by-1 vector b contains, in order, a constant term and then the coefficients for the model terms x1, x2, x3, x1x2, x1x3, x2x3, x12, x22, and x32, where x1, x2, and x3 are the three columns of reactants. The order of coefficients for quadratic models is described in the reference page for x2fx.
Since the model involves only three predictors, it is possible to visualize the entire response surface using a color dimension for the reaction rate:
x1 = reactants(:,1);
x2 = reactants(:,2);
x3 = reactants(:,3);
xx1 = linspace(min(x1),max(x1),25);
xx2 = linspace(min(x2),max(x2),25);
xx3 = linspace(min(x3),max(x3),25);
[X1,X2,X3] = meshgrid(xx1,xx2,xx3);
RATE = b(1) + b(2)*X1 + b(3)*X2 + b(4)*X3 + ...
b(5)*X1.*X2 + b(6)*X1.*X3 + b(7)*X2.*X3 + ...
b(8)*X1.^2 + b(9)*X2.^2 + b(10)*X3.^2;
hmodel = scatter3(X1(:),X2(:),X3(:),5,RATE(:),'filled');
hold on
hdata = scatter3(x1,x2,x3,'ko','filled');
axis tight
xlabel(xn(1,:))
ylabel(xn(2,:))
zlabel(xn(3,:))
hbar = colorbar;
ylabel(hbar,yn);
title('{\bf Quadratic Response Surface Model}')
legend(hdata,'Data','Location','NE')

The plot show a general increase in model response, within the space of the observed data, as the concentration of n-pentane increases and the concentrations of hydrogen and isopentane decrease.
Before trying to determine optimal values of the predictors, perhaps by collecting more data in the direction of increased reaction rate indicated by the plot, it is helpful to evaluate the geometry of the response surface. If x = (x1, x2, x3)T is the vector of predictors, and H is the matrix such that xTHx gives the quadratic terms of the model, the model has a unique optimum if and only if H is positive definite. For the data in this example, the model does not have a unique optimum:
H = [b(8),b(5)/2,b(6)/2; ...
b(5)/2,b(9),b(7)/2; ...
b(6)/2,b(7)/2,b(10)];
lambda = eig(H)
lambda =
1.0e-003 *
-0.1303
0.0412
0.4292The negative eigenvalue shows a lack of positive definiteness. The saddle in the model is visible if the range of the predictors in the plot (xx1, xx2, and xx3) is expanded:

When the number of predictors makes it impossible to visualize the entire response surface, 3-, 2-, and 1-dimensional slices provide local views. The MATLAB function slice displays 2-dimensional contours of the data at fixed values of the predictors:
delete(hmodel) X2slice = 200; % Fix n-Pentane concentration slice(X1,X2,X3,RATE,[],X2slice,[])

One-dimensional contours are displayed by the Response Surface Tool, rstool, described in the next section.
The Statistics Toolbox function rstool opens a GUI for interactively investigating simultaneous one-dimensional contours of multidimensional response surface models. For example, the following opens the interface with a quadratic response surface fit to the data in reaction.mat:
load reaction alpha = 0.01; % Significance level rstool(reactants,rate,'quadratic',alpha,xn,yn)

A sequence of plots is displayed, each showing a contour of the response surface against a single predictor, with all other predictors held fixed. Confidence intervals for new observations are shown as dashed red curves above and below the response. Predictor values are displayed in the text boxes on the horizontal axis and are marked by vertical dashed blue lines in the plots. Predictor values are changed by editing the text boxes or by dragging the dashed blue lines. When you change the value of a predictor, all plots update to show the new point in predictor space.
Note The Statistics Toolbox demonstration function rsmdemo generates simulated data for experimental settings specified by either the user or by a D-optimal design generated by cordexch. It uses the rstool interface to visualize response surface models fit to the data, and it uses the nlintool interface to visualize a nonlinear model fit to the data. |
Linear regression models describe a linear relationship between a response and one or more predictive terms. Many times, however, a nonlinear relationship exists. Nonlinear Regression describes general nonlinear models. A special class of nonlinear models, known as generalized linear models, makes use of linear methods.
Recall that linear models have the following characteristics:
At each set of values for the predictors, the response has a normal distribution with mean μ.
A coefficient vector b defines a linear combination Xb of the predictors X.
The model is μ = Xb.
In generalized linear models, these characteristics are generalized as follows:
At each set of values for the predictors, the response has a distribution that may be normal, binomial, Poisson, gamma, or inverse Gaussian, with parameters including a mean μ.
A coefficient vector b defines a linear combination Xb of the predictors X.
The following data are derived from carbig.mat, which contains measurements of large cars of various weights. Each weight in w has a corresponding number of cars in total and a corresponding number of poor-mileage cars in poor:
w = [2100 2300 2500 2700 2900 3100 ...
3300 3500 3700 3900 4100 4300];
total = [48 42 31 34 31 21 23 23 21 16 17 21];
poor = [1 2 0 3 8 8 14 17 19 15 17 21];
A plot shows that the proportion of poor-mileage cars follows an S-shaped sigmoid:
plot(w,poor./total,'x','LineWidth',2)
grid on
xlabel('Weight')
ylabel('Proportion of Poor-Mileage Cars')

The logistic model is useful for proportion data. It defines the relationship between the proportion p and the weight w by:
log[p/(1–p)] = b1 + b2w
Some of the proportions in the data are 0 and 1, making the left-hand side of this equation undefined. To keep the proportions within range, add relatively small perturbations to the poor and total values. A semi-log plot then shows a nearly linear relationship, as predicted by the model:
p_adjusted = (poor+.5)./(total+1);
semilogy(w,p_adjusted./(1-p_adjusted),'x','LineWidth',2)
grid on
xlabel('Weight')
ylabel('Adjusted p / (1 - p)')

It is reasonable to assume that the values of poor follow binomial distributions, with the number of trials given by total and the percentage of successes depending on w. This distribution can be accounted for in the context of a logistic model by using a generalized linear model with link function log(µ/(1–µ)) = Xb.
Use the glmfit function to carry out the associated regression:
b = glmfit(w,[poor' total'],'binomial','link','logit')
b =
-13.3801
0.0042
To use the coefficients in b to compute fitted proportions, invert the logistic relationship:
p = 1/(1 + exp(–b1 – b2w))
Use the glmval function to compute the fitted values:
x = 2100:100:4500;
y = glmval(b,x,'logit');
plot(w,poor./total,'x','LineWidth',2)
hold on
plot(x,y,'r-','LineWidth',2)
grid on
xlabel('Weight')
ylabel('Proportion of Poor-Mileage Cars')

The previous is an example of logistic regression. For an example of a kind of stepwise logistic regression, analogous to stepwise regression for linear models, see Sequential Feature Selection.
Whether or not the predictor x is a vector of predictor variables, multivariate regression refers to the case where the response y = (y1, ..., yM) is a vector of M response variables.
The Statistics Toolbox functions mvregress and mvregresslike are used for multivariate regression analysis.
![]() | Introduction | Nonlinear Regression | ![]() |
| © 1984-2008- The MathWorks, Inc. - Site Help - Patents - Trademarks - Privacy Policy - Preventing Piracy - RSS |