Linear regression model class
An object comprising training data, model description, diagnostic information, and fitted coefficients for a linear regression. Predict model responses with the predict or feval methods.
mdl = fitlm(tbl) or mdl = fitlm(X,y) create a linear model of a table or dataset array tbl, or of the responses y to a data matrix X. For details, see fitlm.
mdl = stepwiselm(tbl) or mdl = stepwiselm(X,y) create a linear model of a table or dataset array tbl, or of the responses y to a data matrix X, with unimportant predictors excluded. For details, see stepwiselm.
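As a minimal sketch of both construction routes, the example below fits models to synthetic data. The predictor matrix, the response, and the variable names are assumptions made up for illustration, and the code presumes Statistics and Machine Learning Toolbox is installed.

```matlab
rng(0);                                % fix the seed so the synthetic data is repeatable
X = randn(100,3);                      % hypothetical data: 100 observations, 3 predictors
y = 1 + 2*X(:,1) - 3*X(:,2) + 0.5*randn(100,1);   % the third predictor is pure noise

mdl  = fitlm(X,y);                     % intercept plus one linear term per predictor
mdl2 = stepwiselm(X,y,'Verbose',0);    % stepwise term selection; 'Verbose',0 hides the step log

disp(mdl.Formula)                      % formula of the full linear model
disp(mdl2.Formula)                     % formula chosen by the stepwise search
```

Comparing the two formulas shows which terms the stepwise search kept; the outcome depends on the data and on the entry and removal criteria.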
CoefficientCovariance |
Covariance matrix of coefficient estimates. |
CoefficientNames |
Cell array of strings containing a label for each coefficient. |
Coefficients |
Coefficient values stored as a table. Coefficients has one row for each coefficient and these columns:
To obtain any of these columns as a vector, index into the property using dot notation. For example, in a model mdl, the estimated coefficient vector is beta = mdl.Coefficients.Estimate. Use coefTest to perform other tests on the coefficients. |
DFE |
Degrees of freedom for error (residuals), equal to the number of observations minus the number of estimated coefficients. |
Diagnostics |
Table with the same number of rows as the input data (tbl or X). Diagnostics contains measures that help in finding outliers and influential observations. Many of these measures describe the effect on the fit of deleting a single observation. Diagnostics contains the following fields.
Rows not used in the fit because of missing values (in ObservationInfo.Missing) contain NaN values. Rows not used in the fit because of excluded values (in ObservationInfo.Excluded) contain NaN values, with the following exception: Delete-1 diagnostics refer to the statistic with and without that observation (row) included in the fit. These diagnostics help identify important observations. |
Fitted |
Predicted response to the input data by using the model. Use predict to compute predictions for other predictor values, or to compute confidence bounds on Fitted. |
Formula |
Object containing information about the model. |
LogLikelihood |
Log likelihood of the model distribution at the response values, with mean fitted from the model, and other parameters estimated as part of the model fit. |
ModelCriterion |
AIC and other information criteria for comparing models. A structure with fields:
To obtain any of these values as a scalar, index into the property using dot notation. For example, in a model mdl, the AIC value aic is: aic = mdl.ModelCriterion.AIC |
MSE |
Mean squared error (residuals), SSE/DFE. |
NumCoefficients |
Number of coefficients in the model, a positive integer. NumCoefficients includes coefficients that are set to zero when the model terms are rank deficient. |
NumEstimatedCoefficients |
Number of estimated coefficients in the model, a positive integer. NumEstimatedCoefficients does not include coefficients that are set to zero when the model terms are rank deficient. NumEstimatedCoefficients is the degrees of freedom for regression. |
NumObservations |
Number of observations the fitting function used in fitting. This is the number of observations supplied in the original table, dataset, or matrix, minus any excluded rows (set with the Excluded name-value pair) or rows with missing values. |
NumPredictors |
Number of variables fitlm used as predictors for fitting. |
NumVariables |
Number of variables in the data. NumVariables is the number of variables in the original table or dataset, or the total number of columns in the predictor matrix and response vector when the fit is based on those arrays. It includes variables, if any, that are not used as predictors or as the response. |
ObservationInfo |
Table with the same number of rows as the input data (tbl or X). |
ObservationNames |
Cell array of strings containing the names of the observations used in the fit. |
PredictorNames |
Cell array of strings, the names of the predictors used in fitting the model. |
Residuals |
Table of residuals, with one row for each observation and these variables.
To obtain any of these columns as a vector, index into the property using dot notation. For example, in a model mdl, the ordinary raw residual vector r is r = mdl.Residuals.Raw. Rows not used in the fit because of missing values (in ObservationInfo.Missing) contain NaN values. Rows not used in the fit because of excluded values (in ObservationInfo.Excluded) contain NaN values, with the following exceptions: |
ResponseName |
String naming the response variable. |
RMSE |
Root mean squared error (residuals), sqrt(MSE). |
Robust |
Structure that is empty unless fitlm constructed the model using robust regression. |
Rsquared |
Proportion of total sum of squares explained by the model. The ordinary R-squared value relates to the SSR and SST properties: Rsquared = SSR/SST = 1 - SSE/SST. For a linear or nonlinear model, Rsquared is a structure with two fields:
For a generalized linear model, Rsquared is a structure with five fields:
To obtain any of these values as a scalar, index into the property using dot notation. For example, the adjusted R-squared value in mdl is r2 = mdl.Rsquared.Adjusted |
SSE |
Sum of squared errors (residuals). The Pythagorean theorem implies SST = SSE + SSR. |
SSR |
Regression sum of squares, the sum of squared deviations of the fitted values from their mean. The Pythagorean theorem implies SST = SSE + SSR. |
SST |
Total sum of squares, the sum of squared deviations of y from mean(y). The Pythagorean theorem implies SST = SSE + SSR. |
Steps |
Structure that is empty unless stepwiselm constructed the model.
The History table has one row for each step including the initial fit, and the following variables (columns). |
VariableInfo |
Table containing metadata about Variables. There is one row for each variable, and the following columns. |
VariableNames |
Cell array of strings containing names of the variables in the fit. |
Variables |
Table containing the data, both observations and responses, that the fitting function used to construct the fit. If the fit is based on a table or dataset array, Variables contains all of the data from that table or dataset array. Otherwise, Variables is a table created from the input data matrix X and response vector y. |
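The relationships among the properties listed above can be checked numerically. The sketch below uses synthetic data; the data, the tolerance, and the variable names are assumptions for illustration, and the identity SST = SSE + SSR is checked for a model that includes an intercept.

```matlab
rng(1);                                   % repeatable synthetic data
X = randn(50,2);                          % hypothetical predictors
y = 3 + X*[1.5; -2] + 0.3*randn(50,1);    % hypothetical response
mdl = fitlm(X,y);

% Dot notation pulls individual columns or fields out of table and struct properties
beta = mdl.Coefficients.Estimate;         % coefficient estimates, intercept first
r    = mdl.Residuals.Raw;                 % raw residuals, one per observation
aic  = mdl.ModelCriterion.AIC;            % Akaike information criterion
r2a  = mdl.Rsquared.Adjusted;             % adjusted R-squared

tol = 1e-8;                               % illustrative tolerance for floating-point comparison
dfeOK  = mdl.DFE == mdl.NumObservations - mdl.NumEstimatedCoefficients;
mseOK  = abs(mdl.MSE  - mdl.SSE/mdl.DFE)       < tol*mdl.MSE;
rmseOK = abs(mdl.RMSE - sqrt(mdl.MSE))         < tol*mdl.RMSE;
sstOK  = abs(mdl.SST  - (mdl.SSE + mdl.SSR))   < tol*mdl.SST;
r2OK   = abs(mdl.Rsquared.Ordinary - mdl.SSR/mdl.SST) < tol;

disp([dfeOK mseOK rmseOK sstOK r2OK])     % each entry is 1 when the relationship holds
```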
addTerms | Add terms to linear regression model |
anova | Analysis of variance for linear model |
coefCI | Confidence intervals of coefficient estimates of linear model |
coefTest | Linear hypothesis test on linear regression model coefficients |
disp | Display linear regression model |
dwtest | Durbin-Watson test of linear model |
feval | Evaluate linear regression model prediction |
fit | Create linear regression model |
plot | Scatter plot or added variable plot of linear model |
plotAdded | Added variable plot or leverage plot for linear model |
plotAdjustedResponse | Adjusted response plot for linear regression model |
plotDiagnostics | Plot diagnostics of linear regression model |
plotEffects | Plot main effects of each predictor in linear regression model |
plotInteraction | Plot interaction effects of two predictors in linear regression model |
plotResiduals | Plot residuals of linear regression model |
plotSlice | Plot of slices through fitted linear regression surface |
predict | Predict response of linear regression model |
random | Simulate responses for linear regression model |
removeTerms | Remove terms from linear model |
step | Improve linear regression model by adding or removing terms |
stepwise | Create linear regression model by stepwise regression |
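A short sketch of how a few of the methods listed above might be called on a fitted model. The data and the new predictor rows are hypothetical, and default options (such as the 95% confidence level) are assumed throughout.

```matlab
rng(2);                                   % repeatable synthetic data
X = randn(60,2);                          % hypothetical predictors
y = 1 + X*[2; -1] + 0.4*randn(60,1);      % hypothetical response
mdl = fitlm(X,y);

Xnew = [0 0; 1 -1];                       % two hypothetical new observations
[ypred, yci] = predict(mdl, Xnew);        % predictions with confidence bounds
yfe = feval(mdl, Xnew);                   % feval returns the predictions only
ci  = coefCI(mdl);                        % confidence interval for each coefficient
tbl = anova(mdl);                         % analysis of variance table
p   = coefTest(mdl);                      % F-test that all non-intercept coefficients are zero
```

predict is the choice when confidence bounds are needed; feval is convenient when only the predicted values matter.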
Value. To learn how value classes affect copy operations, see Copying Objects in the MATLAB® documentation.
The hat matrix H is defined in terms of the data matrix X:
$$H=X{\left({X}^{T}X\right)}^{-1}{X}^{T}.$$
The diagonal elements $${h}_{ii}$$ satisfy
$$\begin{array}{l}0\le {h}_{ii}\le 1\\ {\displaystyle \sum _{i=1}^{n}{h}_{ii}}=p,\end{array}$$
where n is the number of observations (rows of X), and p is the number of coefficients in the regression model.
The leverage of observation i is the value of the ith diagonal term, $${h}_{ii}$$, of the hat matrix H. Because the sum of the leverage values is p (the number of coefficients in the regression model), an observation i can be considered to be an outlier if its leverage substantially exceeds p/n, where n is the number of observations.
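The leverage definition can be reproduced by hand and compared with the Leverage column of the Diagnostics table. This is a sketch on synthetic data: it assumes the data matrix in the hat matrix formula is the design matrix including the intercept column, and the 2*p/n cutoff is a common rule of thumb chosen for illustration, not a threshold stated in this page.

```matlab
rng(3);                                   % repeatable synthetic data
n = 40;
X = randn(n,2);                           % hypothetical predictors
y = 2 + X*[1; -1] + 0.5*randn(n,1);       % hypothetical response
mdl = fitlm(X,y);

Xd = [ones(n,1) X];                       % design matrix: intercept column plus predictors
H  = Xd*((Xd'*Xd)\Xd');                   % hat matrix H = X (X'X)^(-1) X'
h  = diag(H);                             % leverage values h_ii
p  = mdl.NumCoefficients;                 % number of coefficients (3 here)

maxDiff = max(abs(h - mdl.Diagnostics.Leverage));   % agreement with the stored leverage
sumH    = sum(h);                         % equals p up to rounding error
flagged = find(h > 2*p/n);                % observations with high leverage (assumed cutoff)
```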
Cook's distance is the scaled change in fitted values. Each element in CooksDistance is the normalized change in the vector of coefficients due to the deletion of an observation. The Cook's distance, $${D}_{i}$$, of observation i is
$${D}_{i}=\frac{{\displaystyle \sum _{j=1}^{n}{\left({\widehat{y}}_{j}-{\widehat{y}}_{j(i)}\right)}^{2}}}{p\text{\hspace{0.17em}}MSE},$$
where
$${\widehat{y}}_{j}$$ is the jth fitted response value.
$${\widehat{y}}_{j(i)}$$ is the jth fitted response value, where the fit does not include observation i.
MSE is the mean squared error.
p is the number of coefficients in the regression model.
Cook's distance is algebraically equivalent to the following expression:
$${D}_{i}=\frac{{r}_{i}^{2}}{p\text{\hspace{0.17em}}MSE}\left(\frac{{h}_{ii}}{{\left(1-{h}_{ii}\right)}^{2}}\right),$$
where $${r}_{i}$$ is the ith residual, and $${h}_{ii}$$ is the ith leverage value.
CooksDistance is an n-by-1 column vector in the Diagnostics table of the LinearModel object.
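Both expressions for Cook's distance can be evaluated directly and compared with the stored CooksDistance values. The sketch below uses synthetic data; the leave-one-out refit is done for a single, arbitrarily chosen observation (i = 1) to keep the example short.

```matlab
rng(4);                                   % repeatable synthetic data
n = 40;
X = randn(n,2);                           % hypothetical predictors
y = 2 + X*[1; -1] + 0.5*randn(n,1);       % hypothetical response
mdl = fitlm(X,y);

p = mdl.NumCoefficients;                  % number of coefficients
r = mdl.Residuals.Raw;                    % raw residuals r_i
h = mdl.Diagnostics.Leverage;             % leverage values h_ii

% Closed form: D_i = r_i^2/(p*MSE) * h_ii/(1 - h_ii)^2, for all observations at once
Dclosed = (r.^2 ./ (p*mdl.MSE)) .* (h ./ (1 - h).^2);

% Definition: refit without observation i and sum the squared changes in all n fitted values
i = 1;                                    % arbitrary observation chosen for the check
keep = true(n,1);
keep(i) = false;
mdlI  = fitlm(X(keep,:), y(keep));        % fit that excludes observation i
yhatI = predict(mdlI, X);                 % fitted values for all n rows from the reduced fit
Ddef  = sum((mdl.Fitted - yhatI).^2) / (p*mdl.MSE);

maxDiff = max(abs(Dclosed - mdl.Diagnostics.CooksDistance));
defDiff = abs(Ddef - mdl.Diagnostics.CooksDistance(i));
```

The closed form avoids n separate refits, which is why it is the practical way to compute the whole vector.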
The main fitting algorithm is QR decomposition. For robust fitting, the algorithm is robustfit.
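To illustrate what a QR-based least-squares solve looks like, the sketch below computes the coefficients from an economy-size QR factorization of the design matrix and compares them with the fitlm estimates. It is an illustration of the technique on synthetic data, not the toolbox's internal code, which may differ in details such as pivoting and rank handling.

```matlab
rng(5);                                   % repeatable synthetic data
n = 30;
X = randn(n,2);                           % hypothetical predictors
y = 1 + X*[0.5; 2] + 0.2*randn(n,1);      % hypothetical response

Xd = [ones(n,1) X];                       % design matrix with intercept column
[Q,R] = qr(Xd,0);                         % economy-size QR: Xd = Q*R, R is 3-by-3 upper triangular
betaQR = R\(Q'*y);                        % solve the triangular system R*beta = Q'*y

mdl = fitlm(X,y);
maxDiff = max(abs(betaQR - mdl.Coefficients.Estimate));   % difference from fitlm estimates
```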
To remove redundant predictors in linear regression using lasso or elastic net, use the lasso function.
To regularize a regression with correlated terms using ridge regression, use the ridge or lasso functions.
To regularize a regression with correlated terms using partial least squares, use the plsregress function.
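A brief sketch of how the three alternatives might be called on data with correlated predictors. The data, the ridge parameter 0.1, the five-fold cross-validation, and the two PLS components are all arbitrary choices made for illustration, not recommendations.

```matlab
rng(6);                                   % repeatable synthetic data
n = 80;
X = randn(n,3);                           % hypothetical predictors
X(:,3) = X(:,1) + 0.05*randn(n,1);        % make the third predictor nearly collinear with the first
y = 1 + 2*X(:,1) - X(:,2) + 0.3*randn(n,1);

% Lasso with 5-fold cross-validation; pick the coefficients at the minimum-MSE lambda
[B, FitInfo] = lasso(X, y, 'CV', 5);
bLasso = B(:, FitInfo.IndexMinMSE);       % 3-by-1 slope estimates (intercept is in FitInfo.Intercept)

% Ridge regression; the final argument 0 returns coefficients on the original data scale,
% with the intercept as the first element
bRidge = ridge(y, X, 0.1, 0);

% Partial least squares with 2 components; the fifth output holds the coefficients,
% with the intercept in the first row
[~, ~, ~, ~, betaPLS] = plsregress(X, y, 2);
```

Note that ridge takes the response as its first argument, unlike lasso and plsregress.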