An object comprising training data, model description, diagnostic information, and fitted coefficients for a linear regression. Predict model responses with the predict or feval methods.
mdl = fitlm(tbl) or mdl = fitlm(X,y) creates a linear model of a table or dataset array tbl, or of the responses y to a data matrix X. For details, see fitlm.
mdl = stepwiselm(tbl) or mdl = stepwiselm(X,y) creates a linear model of a table or dataset array tbl, or of the responses y to a data matrix X, with unimportant predictors excluded. For details, see stepwiselm.
tbl — Input data
Input data, specified as a table or dataset array. When modelspec is a formula, the formula specifies the variables to be used as the predictors and response. Otherwise, if you do not specify the predictor and response variables, the last variable is the response variable and the others are the predictor variables by default.
Predictor variables can be numeric, or any grouping variable type, such as logical or categorical (see Grouping Variables). The response must be numeric or logical.
To set a different column as the response variable, use the ResponseVar name-value pair argument. To use a subset of the columns as predictors, use the PredictorVars name-value pair argument.
Data Types: single | double | logical
X — Predictor variables
Predictor variables, specified as an n-by-p matrix, where n is the number of observations and p is the number of predictor variables. Each column of X represents one variable, and each row represents one observation.
By default, there is a constant term in the model, unless you explicitly remove it, so do not include a column of 1s in X.
Data Types: single | double | logical
y — Response variable
Response variable, specified as an n-by-1 vector, where n is the number of observations. Each entry in y is the response for the corresponding row of X.
Data Types: single | double
CoefficientCovariance — Covariance matrix of coefficient estimates
Covariance matrix of coefficient estimates, stored as a p-by-p matrix of numeric values. p is the number of coefficients in the fitted model.
CoefficientNames — Coefficient names
Coefficient names, stored as a cell array of character vectors containing a label for each coefficient.
Coefficients — Coefficient values
Coefficient values, stored as a table. Coefficients has one row for each coefficient and the following columns:
Estimate — Estimated coefficient value
SE — Standard error of the estimate
tStat — t-statistic for a test that the coefficient is zero
pValue — p-value for the t-statistic
To obtain any of these columns as a vector, index into the property using dot notation. For example, the estimated coefficient vector in mdl is
beta = mdl.Coefficients.Estimate
Use coefTest to perform other tests on the coefficients.
DFE — Degrees of freedom for error
Degrees of freedom for error (residuals), equal to the number of observations minus the number of estimated coefficients, stored as a positive integer value.
Diagnostics — Diagnostic values
Diagnostic values, stored as a table with the same number of rows as the input data (tbl or X). Diagnostics contains diagnostics helpful in finding outliers and influential observations. Many diagnostics describe the effect on the fit of deleting single observations. Diagnostics contains the following fields.
Leverage — Diagonal elements of HatMatrix. Leverage indicates to what extent the predicted value for an observation is determined by the observed value for that observation. A value close to 1 indicates that the prediction is largely determined by that observation, with little contribution from the other observations. A value close to 0 indicates the fit is largely determined by the other observations. For a model with P coefficients and N observations, the average value of Leverage is P/N. An observation with Leverage larger than 2*P/N can be regarded as having high leverage.
CooksDistance — Cook's measure of scaled change in fitted values. An observation with CooksDistance larger than three times the mean Cook's distance can be an outlier.
Dffits — Delete-1 scaled differences in fitted values. Dffits is the scaled change in the fitted values for each observation that would result from excluding that observation from the fit. Values with an absolute value larger than 2*sqrt(P/N) may be considered influential.
S2_i — Delete-1 variance. S2_i is a set of residual variance estimates obtained by deleting each observation in turn. These can be compared with the value of the MSE property.
CovRatio — Delete-1 ratio of determinant of covariance. CovRatio is the ratio of the determinant of the coefficient covariance matrix, with each observation deleted in turn, to the determinant of the covariance matrix for the full model. Values larger than 1+3*P/N or smaller than 1-3*P/N indicate influential points.
Dfbetas — Delete-1 scaled differences in coefficient estimates. Dfbetas is an N-by-P matrix of the scaled change in the coefficient estimates that would result from excluding each observation in turn. Values larger than 3/sqrt(N) in absolute value indicate that the observation has a large influence on the corresponding coefficient.
HatMatrix — Projection matrix to compute fitted from observed responses. HatMatrix is an N-by-N matrix such that Fitted = HatMatrix*Y, where Y is the response vector and Fitted is the vector of fitted response values.
Rows not used in the fit because of missing values (in ObservationInfo.Missing) contain NaN values.
Rows not used in the fit because of excluded values (in ObservationInfo.Excluded) contain NaN values, with the following exception: Delete-1 diagnostics refer to the statistic with and without that observation (row) included in the fit. These diagnostics help identify important observations.
Fitted — Fitted response values based on input data
Fitted (predicted) response values based on input data, stored as an n-by-1 vector of numeric values. n is the number of observations in the input data. Use predict to compute predictions for other predictor values, or to compute confidence bounds on Fitted.
Formula — Model information
Model information, stored as a LinearFormula object or a NonLinearFormula object. If you fit a linear or generalized linear regression model, then Formula is a LinearFormula object. If you fit a nonlinear regression model, then Formula is a NonLinearFormula object.
LogLikelihood — Log likelihood
Log likelihood of the model distribution at the response values, stored as a numeric value. The mean is fitted from the model, and other parameters are estimated as part of the model fit.
ModelCriterion — Criterion for model comparison
Criterion for model comparison, stored as a structure with the following fields:
AIC — Akaike information criterion
AICc — Akaike information criterion corrected for sample size
BIC — Bayesian information criterion
CAIC — Consistent Akaike information criterion
To obtain any of these values as a scalar, index into the property using dot notation. For example, in a model mdl, the AIC value aic is:
aic = mdl.ModelCriterion.AIC
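All four criteria are functions of the maximized log likelihood, the number of estimated parameters, and the number of observations. The following sketch (in Python, purely for illustration) uses the standard textbook definitions; the values MATLAB reports are computed internally from the fitted model, and its exact conventions should be checked against the documentation:

```python
import math

def model_criteria(loglik, p, n):
    """Standard information criteria from a maximized log likelihood.

    loglik -- maximized log likelihood of the model
    p      -- number of estimated parameters
    n      -- number of observations
    """
    aic = -2 * loglik + 2 * p                   # Akaike information criterion
    aicc = aic + 2 * p * (p + 1) / (n - p - 1)  # AIC with small-sample correction
    bic = -2 * loglik + p * math.log(n)         # Bayesian information criterion
    caic = -2 * loglik + p * (math.log(n) + 1)  # consistent AIC
    return {"AIC": aic, "AICc": aicc, "BIC": bic, "CAIC": caic}

# Hypothetical example: log likelihood -10, 3 parameters, 20 observations
crit = model_criteria(-10.0, 3, 20)
print(crit["AIC"])   # 26.0
```

Lower values indicate a better trade-off between fit and model complexity, which is why these criteria are used for model comparison.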
MSE — Mean squared error
Mean squared error (residuals), stored as a numeric value. The mean squared error is calculated as MSE = SSE / DFE, where SSE is the sum of squared errors and DFE is the degrees of freedom for error.
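To make the MSE = SSE / DFE relationship concrete, here is a minimal sketch (in Python with toy data, not MATLAB) that fits a one-predictor least-squares line by hand and computes SSE, DFE, MSE, and the related RMSE directly from their definitions:

```python
import math

# Toy data: one predictor plus an intercept, so p = 2 coefficients
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]
n, p = len(x), 2

# Ordinary least-squares slope and intercept (closed form)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = ybar - slope * xbar

fitted = [intercept + slope * xi for xi in x]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # sum of squared errors
dfe = n - p                                             # degrees of freedom for error
mse = sse / dfe                                         # mean squared error
rmse = math.sqrt(mse)                                   # root mean squared error
print(mse)   # approximately 0.35 for this data
```

Dividing by DFE rather than n makes MSE an unbiased estimate of the error variance.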
NumCoefficients — Number of model coefficients
Number of model coefficients, stored as a positive integer. NumCoefficients includes coefficients that are set to zero when the model terms are rank deficient.
NumEstimatedCoefficients — Number of estimated coefficients
Number of estimated coefficients in the model, stored as a positive integer. NumEstimatedCoefficients does not include coefficients that are set to zero when the model terms are rank deficient. NumEstimatedCoefficients is the degrees of freedom for regression.
NumObservations — Number of observations
Number of observations the fitting function used in fitting, stored as a positive integer. This is the number of observations supplied in the original table, dataset, or matrix, minus any excluded rows (set with the Exclude name-value pair) or rows with missing values.
NumPredictors — Number of predictor variables
Number of predictor variables used to fit the model, stored as a positive integer.
NumVariables — Number of variables
Number of variables in the input data, stored as a positive integer. NumVariables is the number of variables in the original table or dataset, or the total number of columns in the predictor matrix and response vector when the fit is based on those arrays. It includes variables, if any, that are not used as predictors or as the response.
ObservationInfo — Observation information
Observation information, stored as an n-by-4 table, where n is equal to the number of rows of input data. The four columns of ObservationInfo contain the following:
Weights — Observation weights. Default is all 1.
Excluded — Logical value, where 1 indicates an observation that you excluded from the fit with the Exclude name-value pair.
Missing — Logical value, where 1 indicates a missing value in the input. Missing values are not used in the fit.
Subset — Logical value, where 1 indicates that the observation is not excluded or missing, and so is used in the fit.
ObservationNames — Observation names
Observation names, stored as a cell array of character vectors containing the names of the observations used in the fit.
If the fit is based on a table or dataset containing observation names, ObservationNames uses those names. Otherwise, ObservationNames is an empty cell array.
PredictorNames — Names of predictors used to fit the model
Names of predictors used to fit the model, stored as a cell array of character vectors.
Residuals — Residuals for fitted model
Residuals for fitted model, stored as a table that contains one row for each observation and the following columns.
Raw — Observed minus fitted values.
Pearson — Raw residuals divided by RMSE.
Standardized — Raw residuals divided by their estimated standard deviation.
Studentized — Residual divided by an independent estimate of the residual standard deviation. The residual for observation i is divided by an estimate of the error standard deviation based on all observations except for observation i.
To obtain any of these columns as a vector, index into the property using dot notation. For example, in a model mdl, the ordinary raw residual vector r is:
r = mdl.Residuals.Raw
Rows not used in the fit because of missing values (in ObservationInfo.Missing) contain NaN values.
Rows not used in the fit because of excluded values (in ObservationInfo.Excluded) contain NaN values, with the following exceptions:
Raw contains the difference between the observed and predicted values.
Standardized is the residual, standardized in the usual way.
Studentized matches the Standardized values because this residual is not used in the estimate of the residual standard deviation.
ResponseName — Response variable name
Response variable name, stored as a character vector.
RMSE — Root mean squared error
Root mean squared error (residuals), stored as a numeric value. The root mean squared error (RMSE) is equal to RMSE = sqrt(MSE), where MSE is the mean squared error.
Robust — Robust fit information
Robust fit information, stored as a structure with the following fields:
WgtFun — Robust weighting function, such as 'bisquare' (see robustfit)
Tune — Value specified for the tuning parameter (can be [])
Weights — Vector of weights used in the final iteration of the robust fit. This field is empty for compacted CompactLinearModel models.
This structure is empty unless fitlm constructed the model using robust regression.
Rsquared — R-squared value for the model
R-squared value for the model, stored as a structure.
For a linear or nonlinear model, Rsquared is a structure with two fields:
Ordinary — Ordinary (unadjusted) R-squared
Adjusted — R-squared adjusted for the number of coefficients
For a generalized linear model, Rsquared is a structure with five fields:
Ordinary — Ordinary (unadjusted) R-squared
Adjusted — R-squared adjusted for the number of coefficients
LLR — Log-likelihood ratio
Deviance — Deviance
AdjGeneralized — Adjusted generalized R-squared
The R-squared value is the proportion of the total sum of squares explained by the model. The ordinary R-squared value relates to the SSR and SST properties:
Rsquared = SSR/SST = 1 - SSE/SST.
To obtain any of these values as a scalar, index into the property using dot notation. For example, the adjusted R-squared value in mdl is
r2 = mdl.Rsquared.Adjusted
SSE — Sum of squared errors
Sum of squared errors (residuals), stored as a numeric value.
The Pythagorean theorem implies SST = SSE + SSR.
SSR — Regression sum of squares
Regression sum of squares, stored as a numeric value. The regression sum of squares is equal to the sum of squared deviations of the fitted values from their mean.
The Pythagorean theorem implies SST = SSE + SSR.
SST — Total sum of squares
Total sum of squares, stored as a numeric value. The total sum of squares is equal to the sum of squared deviations of y from mean(y).
The Pythagorean theorem implies SST = SSE + SSR.
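The SST = SSE + SSR decomposition holds exactly for ordinary least squares with an intercept, and can be checked numerically. A minimal sketch (Python with toy data, purely illustrative):

```python
# Toy data: one predictor plus an intercept
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]
n = len(x)

# Closed-form ordinary least-squares fit
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = ybar - slope * xbar
fitted = [intercept + slope * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)                  # total sum of squares
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # sum of squared errors
ssr = sum((fi - ybar) ** 2 for fi in fitted)             # regression sum of squares

print(sse + ssr)  # equals sst (up to rounding)
r_squared = ssr / sst  # same as 1 - sse/sst
```

The decomposition is the "Pythagorean theorem" the text refers to: the residual vector is orthogonal to the vector of fitted deviations, so the squared lengths add.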
Steps — Stepwise fitting information
Stepwise fitting information, stored as a structure with the following fields.
Start — Formula representing the starting model
Lower — Formula representing the lower bound model; the terms in Lower must remain in the model
Upper — Formula representing the upper bound model; the model cannot contain more terms than Upper
Criterion — Criterion used for the stepwise algorithm, such as 'sse'
PEnter — Value of the criterion threshold for adding a term, such as 0.05
PRemove — Value of the criterion threshold for removing a term, such as 0.10
History — Table representing the steps taken in the fit
The History table has one row for each step, including the initial fit, and the following variables (columns).
Action — Action taken during the step: 'Start', 'Add', or 'Remove'
TermName — Name of the term added or removed in the step, or the starting model specification for the initial fit
Terms — Terms matrix (see modelspec of fitlm)
DF — Regression degrees of freedom after the step
delDF — Change in regression degrees of freedom from the previous step (negative for steps that remove a term)
Deviance — Deviance (residual sum of squares) at the step
FStat — F statistic that led to the step
PValue — p-value of the F statistic
The structure is empty unless you use stepwiselm or stepwiseglm to fit the model.
VariableInfo — Information about input variables
Information about input variables contained in Variables, stored as a table with one row for each variable and the following columns.
Class — Character vector giving the variable class, such as 'double'
Range — Cell array giving the variable range: the minimum and maximum values for a numeric variable, or the distinct values for a categorical or logical variable
InModel — Logical vector, where true indicates the variable is in the model
IsCategorical — Logical vector, where true indicates a categorical variable
VariableNames — Names of variables used in fit
Names of variables used in fit, stored as a cell array of character vectors.
If the fit is based on a table or dataset, this property provides the names of the variables in that table or dataset.
If the fit is based on a predictor matrix and response vector, VariableNames contains the values specified in the VarNames name-value pair of the fitting method.
Otherwise, the variables have the default fitting names.
Variables — Data used to fit the model
Data used to fit the model, stored as a table. Variables contains both observation and response values. If the fit is based on a table or dataset array, Variables contains all of the data from that table or dataset array. Otherwise, Variables is a table created from the input data matrix X and response vector y.
addTerms  Add terms to linear regression model 
compact  Compact linear regression model 
dwtest  Durbin-Watson test of linear model 
fit  Create linear regression model 
plot  Scatter plot or added variable plot of linear model 
plotAdded  Added variable plot or leverage plot for linear model 
plotAdjustedResponse  Adjusted response plot for linear regression model 
plotDiagnostics  Plot diagnostics of linear regression model 
plotResiduals  Plot residuals of linear regression model 
removeTerms  Remove terms from linear model 
step  Improve linear regression model by adding or removing terms 
stepwise  Create linear regression model by stepwise regression 
anova  Analysis of variance for linear model 
coefCI  Confidence intervals of coefficient estimates of linear model 
coefTest  Linear hypothesis test on linear regression model coefficients 
disp  Display linear regression model 
feval  Evaluate linear regression model prediction 
plotEffects  Plot main effects of each predictor in linear regression model 
plotInteraction  Plot interaction effects of two predictors in linear regression model 
plotSlice  Plot of slices through fitted linear regression surface 
predict  Predict response of linear regression model 
random  Simulate responses for linear regression model 
Value. To learn how value classes affect copy operations, see Copying Objects (MATLAB) in the MATLAB® documentation.
Fit a linear model of the Hald data.
Load the data.
load hald
X = ingredients; % Predictor variables
y = heat;        % Response
Fit a default linear model to the data.
mdl = fitlm(X,y)
mdl =
Linear regression model:
    y ~ 1 + x1 + x2 + x3 + x4

Estimated Coefficients:
                   Estimate      SE        tStat      pValue
    (Intercept)      62.405      70.071     0.8906     0.39913
    x1               1.5511     0.74477     2.0827    0.070822
    x2              0.51017     0.72379    0.70486      0.5009
    x3              0.10191     0.75471    0.13503     0.89592
    x4             -0.14406     0.70905   -0.20317     0.84407

Number of observations: 13, Error degrees of freedom: 8
Root Mean Squared Error: 2.45
R-squared: 0.982,  Adjusted R-Squared 0.974
F-statistic vs. constant model: 111, p-value = 4.76e-07
Fit a model of a table that contains a categorical predictor.
Load the carsmall data.
load carsmall
Construct a table containing the continuous predictor variable Weight, the nominal predictor variable Year, and the response variable MPG.
tbl = table(MPG,Weight);
tbl.Year = nominal(Model_Year);
Create a fitted model of MPG as a function of Year, Weight, and Weight^2. (You don't have to include Weight explicitly in your formula because it is a lower-order term of Weight^2 and is included automatically.)
mdl = fitlm(tbl,'MPG ~ Year + Weight^2')
mdl =
Linear regression model:
    MPG ~ 1 + Weight + Year + Weight^2

Estimated Coefficients:
                    Estimate         SE        tStat       pValue
    (Intercept)        54.206       4.7117     11.505    2.6648e-19
    Weight          -0.016404    0.0031249    -5.2493    1.0283e-06
    Year_76            2.0887      0.71491     2.9215     0.0044137
    Year_82            8.1864      0.81531     10.041    2.6364e-16
    Weight^2       1.5573e-06   4.9454e-07      3.149     0.0022303

Number of observations: 94, Error degrees of freedom: 89
Root Mean Squared Error: 2.78
R-squared: 0.885,  Adjusted R-Squared 0.88
F-statistic vs. constant model: 172, p-value = 5.52e-41
fitlm creates two dummy (indicator) variables for the nominal variate, Year. The dummy variable Year_76 takes the value 1 if the model year is 1976 and the value 0 otherwise. The dummy variable Year_82 takes the value 1 if the model year is 1982 and the value 0 otherwise. The year 1970 is the reference year. The corresponding model is
MPG = β0 + β1*Weight + β2*Year_76 + β3*Year_82 + β4*Weight^2.
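The dummy coding that fitlm applies to Year can be sketched in a few lines. This illustration uses Python rather than MATLAB, with variable names that mirror the example:

```python
# Model years, with 1970 serving as the reference level
model_year = [70, 76, 82, 76, 70, 82]

# One indicator column per non-reference level
year_76 = [1 if y == 76 else 0 for y in model_year]
year_82 = [1 if y == 82 else 0 for y in model_year]

# A 1970 observation is encoded by zeros in both columns
print(year_76)  # [0, 1, 0, 1, 0, 0]
print(year_82)  # [0, 0, 1, 0, 0, 1]
```

Because the reference level is encoded as all zeros, the Year_76 and Year_82 coefficients measure the change in MPG relative to a 1970 car.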
Fit a linear regression model using a robust fitting method.
Load the sample data.
load hald
The hald data measures the effect of cement composition on its hardening heat. The matrix ingredients contains the percent composition of four chemicals present in the cement. The array heat contains the heat of hardening after 180 days for each cement sample.
Fit a robust linear model to the data.
mdl = fitlm(ingredients,heat,'linear','RobustOpts','on')
mdl =
Linear regression model (robust fit):
    y ~ 1 + x1 + x2 + x3 + x4

Estimated Coefficients:
                   Estimate      SE        tStat      pValue
    (Intercept)       60.09      75.818    0.79256      0.4509
    x1               1.5753     0.80585     1.9548    0.086346
    x2               0.5322     0.78315    0.67957     0.51596
    x3              0.13346      0.8166    0.16343     0.87424
    x4             -0.12052      0.7672   -0.15709     0.87906

Number of observations: 13, Error degrees of freedom: 8
Root Mean Squared Error: 2.65
R-squared: 0.979,  Adjusted R-Squared 0.969
F-statistic vs. constant model: 94.6, p-value = 9.03e-07
The hat matrix H is defined in terms of the data matrix X:
H = X(X^{T}X)^{–1}X^{T}.
The diagonal elements h_{ii} satisfy
$$\begin{array}{l}0\le {h}_{ii}\le 1\\ {\displaystyle \sum _{i=1}^{n}{h}_{ii}}=p,\end{array}$$
where n is the number of observations (rows of X), and p is the number of coefficients in the regression model.
The leverage of observation i is the value of the ith diagonal term, h_{ii}, of the hat matrix H. Because the sum of the leverage values is p (the number of coefficients in the regression model), an observation i can be considered to be an outlier if its leverage substantially exceeds p/n, where n is the number of observations.
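For simple regression (one predictor plus an intercept), the leverage has the closed form h_ii = 1/n + (x_i - x̄)² / Σ_j (x_j - x̄)², which makes the properties above easy to check. A minimal sketch in Python with illustrative data (not MATLAB code):

```python
# Predictor values; the last point sits far from the others
x = [1.0, 2.0, 3.0, 4.0, 20.0]
n, p = len(x), 2  # intercept + slope coefficients

xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

# Closed-form leverage for simple regression
leverage = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]

print(sum(leverage))  # sums to p = 2 (up to rounding)

# Rule-of-thumb flag: leverage above 2*p/n is considered high
high = [h > 2 * p / n for h in leverage]
print(high)  # only the outlying last point is flagged
```

The isolated point at x = 20 dominates its own prediction, so its leverage is close to 1 while the clustered points share the remaining leverage.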
Cook's distance is the scaled change in fitted values.
Each element in CooksDistance
is the normalized
change in the vector of coefficients due to the deletion of an observation.
The Cook's distance, D_{i},
of observation i is
$${D}_{i}=\frac{{\displaystyle \sum _{j=1}^{n}{\left({\widehat{y}}_{j}-{\widehat{y}}_{j(i)}\right)}^{2}}}{p\text{\hspace{0.17em}}MSE},$$
where
$${\widehat{y}}_{j}$$ is the jth fitted response value.
$${\widehat{y}}_{j(i)}$$ is the jth fitted response value, where the fit does not include observation i.
MSE is the mean squared error.
p is the number of coefficients in the regression model.
Cook's distance is algebraically equivalent to the following expression:
$${D}_{i}=\frac{{r}_{i}^{2}}{p\text{\hspace{0.17em}}MSE}\left(\frac{{h}_{ii}}{{\left(1-{h}_{ii}\right)}^{2}}\right),$$
where r_{i} is the ith residual, and h_{ii} is the ith leverage value.
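The equivalence of the two expressions can be verified numerically for ordinary least squares. The following Python sketch (toy data, simple regression only) computes D_i both from the delete-1 definition and from the residual/leverage form:

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 3.0, 5.0, 4.0, 6.0]
n, p = len(x), 2  # intercept + slope

def ols(xs, ys):
    """Closed-form simple-regression fit; returns (intercept, slope)."""
    m = len(xs)
    xb, yb = sum(xs) / m, sum(ys) / m
    sxx = sum((xi - xb) ** 2 for xi in xs)
    sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(xs, ys))
    b1 = sxy / sxx
    return yb - b1 * xb, b1

b0, b1 = ols(x, y)
fitted = [b0 + b1 * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]
mse = sum(r * r for r in resid) / (n - p)

xbar = sum(x) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
lev = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]  # leverage h_ii

cooks_def, cooks_alg = [], []
for i in range(n):
    # Definition: refit without observation i, compare fitted values
    c0, c1 = ols(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    d = sum((fitted[j] - (c0 + c1 * x[j])) ** 2 for j in range(n))
    cooks_def.append(d / (p * mse))
    # Equivalent residual/leverage form
    cooks_alg.append(resid[i] ** 2 / (p * mse) * lev[i] / (1 - lev[i]) ** 2)

print(max(abs(a - b) for a, b in zip(cooks_def, cooks_alg)))  # near zero
```

The residual/leverage form is what makes Cook's distance cheap to compute: no refitting is needed, because the delete-1 change is fully determined by the residual and the leverage.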
CooksDistance is an n-by-1 column vector in the Diagnostics table of the LinearModel object.
The main fitting algorithm is QR decomposition. For robust fitting, the algorithm is robustfit.
To remove redundant predictors in linear regression using lasso or elastic net, use the lasso function.
To regularize a regression with correlated terms using ridge regression, use the ridge or lasso functions.
To regularize a regression with correlated terms using partial least squares, use the plsregress function.
Usage notes and limitations:
Only the predict and random methods support code generation.
When fitting the model using fitlm, you cannot supply a table of data containing any categorical variables or specify categorical variables using the CategoricalVars name-value pair argument. That is, the IsCategorical variable of the VariableInfo property cannot contain any 'categorical' entries.
To dummy-code variables that you want treated as categorical, preprocess the categorical data before fitting the model using dummyvar.