Linear regression model class

mdl = stepwiselm(tbl) creates a linear model of a table or dataset array tbl, with unimportant predictors excluded. mdl = stepwiselm(X,y) creates a linear model of the responses y to a data matrix X, with unimportant predictors excluded. For details, see stepwiselm.
addTerms                 Add terms to linear regression model
compact                  Compact linear regression model
dwtest                   Durbin-Watson test of linear model
fit                      Create linear regression model
plot                     Scatter plot or added variable plot of linear model
plotAdded                Added variable plot or leverage plot for linear model
plotAdjustedResponse     Adjusted response plot for linear regression model
plotDiagnostics          Plot diagnostics of linear regression model
plotResiduals            Plot residuals of linear regression model
removeTerms              Remove terms from linear model
step                     Improve linear regression model by adding or removing terms
stepwise                 Create linear regression model by stepwise regression
anova                    Analysis of variance for linear model
coefCI                   Confidence intervals of coefficient estimates of linear model
coefTest                 Linear hypothesis test on linear regression model coefficients
disp                     Display linear regression model
feval                    Evaluate linear regression model prediction
plotEffects              Plot main effects of each predictor in linear regression model
plotInteraction          Plot interaction effects of two predictors in linear regression model
plotSlice                Plot of slices through fitted linear regression surface
predict                  Predict response of linear regression model
random                   Simulate responses for linear regression model
Copy Semantics: Value. To learn how value classes affect copy operations, see Copying Objects (MATLAB).
Fit a linear model of the Hald data.
Load the data.
load hald
X = ingredients; % Predictor variables
y = heat;        % Response
Fit a default linear model to the data.
mdl = fitlm(X,y)
mdl = 

Linear regression model:
    y ~ 1 + x1 + x2 + x3 + x4

Estimated Coefficients:
                   Estimate      SE        tStat       pValue
                   ________    _______    ________    ________

    (Intercept)     62.405      70.071      0.8906     0.39913
    x1              1.5511     0.74477      2.0827    0.070822
    x2             0.51017     0.72379     0.70486      0.5009
    x3             0.10191     0.75471     0.13503     0.89592
    x4            -0.14406     0.70905    -0.20317     0.84407

Number of observations: 13, Error degrees of freedom: 8
Root Mean Squared Error: 2.45
R-squared: 0.982, Adjusted R-Squared 0.974
F-statistic vs. constant model: 111, p-value = 4.76e-07
Fit a model of a table that contains a categorical predictor.
Construct a table containing the continuous predictor variable Weight, the nominal predictor variable Year, and the response variable MPG.

load carsmall
tbl = table(MPG,Weight);
tbl.Year = nominal(Model_Year);
Create a fitted model of MPG as a function of Year and Weight^2. (You do not have to include Weight explicitly in your formula, because it is a lower-order term of Weight^2 and is included automatically.)
mdl = fitlm(tbl,'MPG ~ Year + Weight^2')
mdl = 

Linear regression model:
    MPG ~ 1 + Weight + Year + Weight^2

Estimated Coefficients:
                    Estimate         SE        tStat       pValue
                   __________    __________    _______    __________

    (Intercept)        54.206        4.7117     11.505    2.6648e-19
    Weight          -0.016404     0.0031249    -5.2493    1.0283e-06
    Year_76            2.0887       0.71491     2.9215     0.0044137
    Year_82            8.1864       0.81531     10.041    2.6364e-16
    Weight^2       1.5573e-06    4.9454e-07      3.149     0.0022303

Number of observations: 94, Error degrees of freedom: 89
Root Mean Squared Error: 2.78
R-squared: 0.885, Adjusted R-Squared 0.88
F-statistic vs. constant model: 172, p-value = 5.52e-41
fitlm creates two dummy (indicator) variables for the nominal variable Year. The dummy variable Year_76 takes the value 1 if the model year is 1976 and 0 otherwise. The dummy variable Year_82 takes the value 1 if the model year is 1982 and 0 otherwise. The year 1970 is the reference year. The corresponding model is

MPG = 54.206 - 0.016404*Weight + 2.0887*Year_76 + 8.1864*Year_82 + 1.5573e-06*Weight^2.
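As a quick sketch (reusing the carsmall-based table from this example), you can inspect the coefficient names to see the dummy coding that fitlm chose:

```matlab
% Sketch: inspect the dummy coding fitlm uses for the nominal Year variable.
% Assumes the carsmall-based table from the example above.
load carsmall
tbl = table(MPG,Weight);
tbl.Year = nominal(Model_Year);
mdl = fitlm(tbl,'MPG ~ Year + Weight^2');
mdl.CoefficientNames
% Year_70 does not appear among the names: 1970 is the reference level,
% so its effect is absorbed into the intercept.
```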
Fit a linear regression model using a robust fitting method.
Load the sample data.

load hald

The hald data measures the effect of cement composition on its hardening heat. The matrix ingredients contains the percent composition of four chemicals present in the cement. The array heat contains the heat of hardening after 180 days for each cement sample.
Fit a robust linear model to the data.
mdl = fitlm(ingredients,heat,'linear','RobustOpts','on')
mdl = 

Linear regression model (robust fit):
    y ~ 1 + x1 + x2 + x3 + x4

Estimated Coefficients:
                   Estimate      SE        tStat      pValue
                   ________    _______    ________    ________

    (Intercept)      60.09      75.818     0.79256      0.4509
    x1              1.5753     0.80585      1.9548    0.086346
    x2              0.5322     0.78315     0.67957     0.51596
    x3             0.13346      0.8166     0.16343     0.87424
    x4            -0.12052      0.7672    -0.15709     0.87906

Number of observations: 13, Error degrees of freedom: 8
Root Mean Squared Error: 2.65
R-squared: 0.979, Adjusted R-Squared 0.969
F-statistic vs. constant model: 94.6, p-value = 9.03e-07
The hat matrix H is defined in terms of the data matrix X:

H = X(X^T X)^(-1) X^T.

The diagonal elements h_ii satisfy

0 <= h_ii <= 1,    sum_{i=1}^n h_ii = p,

where n is the number of observations (rows of X), and p is the number of coefficients in the regression model.
The leverage of observation i is the value of the ith diagonal term, hii, of the hat matrix H. Because the sum of the leverage values is p (the number of coefficients in the regression model), an observation i can be considered to be an outlier if its leverage substantially exceeds p/n, where n is the number of observations.
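The definitions above can be checked directly. This sketch (assuming the hald data from the earlier examples) computes the leverage values from the design matrix and flags high-leverage points; the 2p/n cutoff used here is one common rule of thumb, not the only choice:

```matlab
% Sketch: compute leverage values directly from the design matrix.
% mdl.Diagnostics.Leverage gives the same values for a fitted model.
load hald
X = [ones(size(ingredients,1),1) ingredients]; % design matrix with intercept
H = X/(X'*X)*X';                               % hat matrix, X*inv(X'*X)*X'
h = diag(H);                                   % leverage h_ii of each observation
[n,p] = size(X);
sum(h)                                         % equals p, the number of coefficients
highLeverage = find(h > 2*p/n)                 % rule of thumb: flag h_ii > 2p/n
```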
Cook’s distance is the scaled change in fitted values. Each element in CooksDistance is the normalized change in the vector of coefficients due to the deletion of an observation.
The Cook’s distance, D_i, of observation i is

D_i = sum_{j=1}^n (yhat_j - yhat_j(i))^2 / (p * MSE),

where

yhat_j is the jth fitted response value.

yhat_j(i) is the jth fitted response value, where the fit does not include observation i.

MSE is the mean squared error.

p is the number of coefficients in the regression model.
Cook’s distance is algebraically equivalent to the following expression:

D_i = (r_i^2 / (p * MSE)) * (h_ii / (1 - h_ii)^2),

where r_i is the ith residual, and h_ii is the ith leverage value.
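As a sketch (again assuming the hald data), the residual-and-leverage form of Cook’s distance can be compared against the values that fitlm stores in the Diagnostics table:

```matlab
% Sketch: verify the algebraic identity for Cook's distance on the hald fit.
load hald
mdl = fitlm(ingredients,heat);
r   = mdl.Residuals.Raw;            % raw residuals r_i
h   = mdl.Diagnostics.Leverage;     % leverage values h_ii
p   = mdl.NumCoefficients;          % number of coefficients
MSE = mdl.MSE;                      % mean squared error
D   = (r.^2 ./ (p*MSE)) .* (h ./ (1 - h).^2);
max(abs(D - mdl.Diagnostics.CooksDistance))   % difference should be near zero
```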
CooksDistance is an n-by-1 column vector in the Diagnostics table of the LinearModel object.
The main fitting algorithm is QR decomposition. For robust fitting, the algorithm is robustfit.

To remove redundant predictors in linear regression using lasso or elastic net, use the lasso function.

To regularize a regression with correlated terms using partial least squares, use the plsregress function.
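A minimal sketch of the lasso alternative, assuming the hald data from the earlier examples (the cross-validation setting here is an illustrative choice, not a recommendation):

```matlab
% Sketch: remove redundant predictors with lasso on the hald data.
load hald
[B,FitInfo] = lasso(ingredients,heat,'CV',13);  % 13 obs -> leave-one-out CV
idx = FitInfo.Index1SE;                         % sparsest fit within 1 SE of min
keptPredictors = find(B(:,idx) ~= 0)            % predictors lasso retains
```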
Usage notes and limitations:
When you fit a model by using stepwiselm, you cannot supply training data in a table that contains at least one categorical predictor, and you cannot use the 'CategoricalVars' name-value pair argument. Code generation does not support categorical predictors. To dummy-code variables that you want treated as categorical, preprocess the categorical data by using dummyvar before fitting the model.
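The dummyvar workaround can be sketched as follows (assuming the carsmall variables used earlier; dropping the first dummy column to make 1970 the reference level is an illustrative choice):

```matlab
% Sketch: dummy-code a categorical predictor before calling stepwiselm,
% as required when generating code.
load carsmall
d = dummyvar(nominal(Model_Year));  % one indicator column per model year
X = [Weight d(:,2:end)];            % drop first column: 1970 is the reference
mdl = stepwiselm(X,MPG);
```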