This example shows how to display and interpret linear regression output statistics.
Load sample data and define predictor variables.
load carsmall
X = [Weight,Horsepower,Acceleration];
Fit linear regression model.
lm = fitlm(X,MPG,'linear')
lm = 
Linear regression model:
    y ~ 1 + x1 + x2 + x3

Estimated Coefficients:
                    Estimate        SE          tStat        pValue  
    (Intercept)        47.977       3.8785        12.37    4.8957e-21
    x1             -0.0065416    0.0011274      -5.8023    9.8742e-08
    x2              -0.042943     0.024313      -1.7663       0.08078
    x3              -0.011583      0.19333    -0.059913       0.95236

Number of observations: 93, Error degrees of freedom: 89
Root Mean Squared Error: 4.09
R-squared: 0.752,  Adjusted R-Squared 0.744
F-statistic vs. constant model: 90, p-value = 7.38e-27
This linear regression output display shows the following.
y ~ 1 + x1 + x2 + x3 | Linear regression model in the formula form using Wilkinson notation. Here it corresponds to: $$y={\beta}_{0}+{\beta}_{1}{X}_{1}+{\beta}_{2}{X}_{2}+{\beta}_{3}{X}_{3}+\epsilon .$$ |
First column (under Estimated Coefficients ) | Terms included in the model. |
Estimate | Coefficient estimates for each corresponding term in the model. For example, the estimate for the constant term (Intercept) is 47.977. |
SE | Standard error of the coefficients. |
tStat | t-statistic for each coefficient to test the null hypothesis that the corresponding coefficient is zero against the alternative that it is different from zero, given the other predictors in the model. Note that tStat = Estimate/SE. For example, the t-statistic for the intercept is 47.977/3.8785 = 12.37. |
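pValue | p-value for the t-statistic of the two-sided hypothesis test that the corresponding coefficient is equal to zero. For example, the p-value of the t-statistic for x2 is 0.08078, which is greater than 0.05, so this term is not significant at the 5% significance level given the other terms in the model. |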
Number of observations | Number of rows without any NaN values. |
Error degrees of freedom | n – p, where n is the number of observations, and p is the number of coefficients in the model, including the intercept. For example, the model has four coefficients (three predictors plus the intercept), so the error degrees of freedom is 93 – 4 = 89. |
Root mean squared error | Square root of the mean squared error, which estimates the standard deviation of the error distribution. |
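R-squared and Adjusted R-squared | Coefficient of determination and adjusted coefficient of determination, respectively. For example, the R-squared value of 0.752 suggests that the model explains approximately 75% of the variability in the response variable MPG. |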
F-statistic vs. constant model | Test statistic for the F-test on the regression model. It tests for a significant linear regression relationship between the response variable and the predictor variables. |
p-value | p-value for the F-test on the model. For example, the model is significant with a p-value of 7.3816e-27. |
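The relationships described in the table can be checked directly against the fitted model. The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is installed; the variable names coefs, tManual, dfeManual, and pManual are illustrative and not part of the original example.

```matlab
% Refit the model from this example.
load carsmall
X = [Weight,Horsepower,Acceleration];
lm = fitlm(X,MPG,'linear');

coefs = lm.Coefficients;   % table with Estimate, SE, tStat, and pValue

% t-statistic: coefficient estimate divided by its standard error.
tManual = coefs.Estimate ./ coefs.SE;

% Error degrees of freedom: n - p.
dfeManual = lm.NumObservations - lm.NumCoefficients;

% Two-sided p-value from the t distribution with n - p degrees of freedom.
pManual = 2*tcdf(-abs(tManual),dfeManual);

% Compare the manual values (second column) with the displayed ones.
disp([coefs.tStat tManual])
disp([coefs.pValue pManual])
disp([lm.DFE dfeManual])
```

Each pair of columns should agree up to floating-point rounding.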
You can request this display by using disp. For example, if you name your model lm, then you can display the outputs using disp(lm).
Perform analysis of variance (ANOVA) for the model.
anova(lm,'summary')
ans = 
                SumSq     DF    MeanSq      F         pValue  
    Total       6004.8    92    65.269                        
    Model         4516     3    1505.3    89.987    7.3816e-27
    Residual    1488.8    89    16.728                        
This ANOVA display shows the following.
SumSq | Sum of squares for the regression model, Model, the error term, Residual, and the total, Total. |
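DF | Degrees of freedom for each term. Degrees of freedom is n – 1 for the total, p – 1 for the model, and n – p for the error term, where n is the number of observations, and p is the number of coefficients in the model, including the intercept. For example, with 93 observations and four coefficients, the total degrees of freedom is 93 – 1 = 92, the model DF is 4 – 1 = 3, and the DF for the error term is 93 – 4 = 89. |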
MeanSq | Mean square for each term. Note that MeanSq = SumSq/DF. For example, the mean squared error for the error term is 1488.8/89 = 16.728. |
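F | F-statistic value, which is the same as F-statistic vs. constant model in the linear regression display. In this example, it is 89.987, which the linear regression display rounds to 90. |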
pValue | p-value for the F-test on the model. In this example, it is 7.3816e-27. |
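The entries of the summary table can be reproduced from the sums of squares stored in the model. This is a sketch under the assumption that the LinearModel properties SSR, SSE, and MSE hold the model sum of squares, the error sum of squares, and the mean squared error; the variable names are illustrative.

```matlab
% Refit the model from this example.
load carsmall
X = [Weight,Horsepower,Acceleration];
lm = fitlm(X,MPG,'linear');

n = lm.NumObservations;    % 93 rows without NaN values
p = lm.NumCoefficients;    % 4 coefficients, including the intercept

dfModel = p - 1;           % model degrees of freedom
dfError = n - p;           % error degrees of freedom

% Mean squares: sum of squares divided by degrees of freedom.
msModel = lm.SSR/dfModel;
msError = lm.SSE/dfError;

% F-statistic vs. constant model and its upper-tail p-value.
F = msModel/msError;
pValue = fcdf(F,dfModel,dfError,'upper');

fprintf('F = %.3f, p-value = %.4g\n',F,pValue)
```

The printed F-statistic should match the Model row of the summary table, and msError should match the Residual row.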
If there are higher-order terms in the regression model, anova partitions the model SumSq into the part explained by the higher-order terms and the rest. The corresponding F-statistics are for testing the significance of the linear terms and higher-order terms as separate groups.
If the data includes replicates, or multiple measurements at the same predictor values, then anova partitions the error SumSq into the part for the replicates and the rest. The corresponding F-statistic is for testing the lack-of-fit by comparing the model residuals with the model-free variance estimate computed on the replicates.
See the anova method for details.
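To see the partition for higher-order terms, you can fit a model that includes them and request the same summary. The sketch below is not part of the original example: it assumes the 'quadratic' model specification of fitlm, and the names lmQuad and tblQuad are hypothetical.

```matlab
% Fit a model with linear, interaction, and squared terms.
load carsmall
X = [Weight,Horsepower,Acceleration];
lmQuad = fitlm(X,MPG,'quadratic');

% Summary ANOVA for the higher-order model. Compare its rows with the
% Total, Model, and Residual rows of the purely linear fit above.
tblQuad = anova(lmQuad,'summary')
```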
Decompose ANOVA table for model terms.
anova(lm)
ans = 
              SumSq      DF     MeanSq         F           pValue  
    x1         563.18     1      563.18       33.667    9.8742e-08
    x2         52.187     1      52.187       3.1197       0.08078
    x3       0.060046     1    0.060046    0.0035895       0.95236
    Error      1488.8    89      16.728                           
This anova display shows the following:
First column | Terms included in the model. |
SumSq | Sum of squares for each term except for the constant. |
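DF | Degrees of freedom. In this example, DF is 1 for each term in the model and n – p = 93 – 4 = 89 for the error term, where n is the number of observations, and p is the number of coefficients in the model, including the intercept. If a variable in the model is categorical, DF for that variable is the number of indicator variables created for its categories (number of categories – 1). |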
MeanSq | Mean square for each term. Note that MeanSq = SumSq/DF. For example, the mean square for x1 is 563.18/1 = 563.18. |
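F | F-values for each coefficient. The F-value is the ratio of the mean square of each term to the mean squared error, that is, F = MeanSq(x_{i})/MeanSq(Error). Each F-statistic has an F distribution, with the numerator degrees of freedom equal to the DF value for the corresponding term, and the denominator degrees of freedom equal to n – p. In this example, each F-statistic has an F(1, 89) distribution. |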
pValue | p-value for each hypothesis test on the coefficient of the corresponding term in the linear model. For example, the p-value for the F-statistic of x2 is 0.08078, so x2 is not significant at the 5% significance level given the other terms in the model. |
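In this model every term has a single degree of freedom, so each F-value in the component ANOVA table equals the square of the corresponding t-statistic in the coefficient table, and the p-values coincide. The following sketch checks this; the names tbl and tSquared are illustrative, and the relationship is not expected to hold for terms with more than one degree of freedom.

```matlab
% Refit the model from this example.
load carsmall
X = [Weight,Horsepower,Acceleration];
lm = fitlm(X,MPG,'linear');

tbl = anova(lm);                              % rows x1, x2, x3, Error

% Squared t-statistics of the non-intercept coefficients.
tSquared = lm.Coefficients.tStat(2:end).^2;

% The last row of tbl is the Error row, which has no F or pValue entry.
disp([tbl.F(1:end-1) tSquared])
disp([tbl.pValue(1:end-1) lm.Coefficients.pValue(2:end)])
```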
Display coefficient confidence intervals.
coefCI(lm)
ans =
   40.2702   55.6833
   -0.0088   -0.0043
   -0.0913    0.0054
   -0.3957    0.3726
The values in each row are the lower and upper confidence limits, respectively, for the default 95% confidence intervals for the coefficients. For example, the first row shows the lower and upper limits, 40.2702 and 55.6833, for the intercept, β_{0}. Likewise, the second row shows the limits for β_{1} and so on. Confidence intervals provide a measure of precision for linear regression coefficient estimates. A 100(1–α)% confidence interval gives the range the corresponding regression coefficient will be in with 100(1–α)% confidence.
You can also change the confidence level. Find the 99% confidence intervals for the coefficients.
coefCI(lm,0.01)
ans =
   37.7677   58.1858
   -0.0095   -0.0036
   -0.1069    0.0211
   -0.5205    0.4973
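The limits returned by coefCI are consistent with the usual t-based interval, estimate ± t(1–α/2, n – p) × SE. The sketch below recomputes the default 95% limits from the coefficient table; the names alpha, tCrit, and ciManual are illustrative and not part of the original example.

```matlab
% Refit the model from this example.
load carsmall
X = [Weight,Horsepower,Acceleration];
lm = fitlm(X,MPG,'linear');

alpha = 0.05;                           % 95% confidence level
b  = lm.Coefficients.Estimate;
se = lm.Coefficients.SE;

% Critical value of the t distribution with n - p degrees of freedom.
tCrit = tinv(1 - alpha/2,lm.DFE);

% Lower and upper limits, one row per coefficient.
ciManual = [b - tCrit*se, b + tCrit*se];

disp(ciManual)
disp(coefCI(lm,alpha))
```

Setting alpha = 0.01 in the same code gives the 99% limits.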
Perform hypothesis test on coefficients.
Test the null hypothesis that all predictor variable coefficients are equal to zero versus the alternate hypothesis that at least one of them is different from zero.
[p,F,d] = coefTest(lm)
p = 7.3816e-27
F = 89.9874
d = 3
Here, coefTest performs an F-test for the hypothesis that all regression coefficients (except for the intercept) are zero versus at least one differs from zero, which essentially is the hypothesis on the model. It returns p, the p-value, F, the F-statistic, and d, the numerator degrees of freedom. The F-statistic and p-value are the same as the ones in the linear regression display and ANOVA for the model. The degrees of freedom is 4 – 1 = 3 because there are four coefficients (including the intercept) in the model.
Now, perform a hypothesis test on the coefficients of the first and second predictor variables.
H = [0 1 0 0; 0 0 1 0];
[p,F,d] = coefTest(lm,H)
p = 5.1702e-23
F = 96.4873
d = 2
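The numerator degrees of freedom is the number of coefficients tested, which is 2 in this example. Each row of H selects one coefficient in the order (Intercept), x1, x2, x3, so this test covers the coefficients of x1 and x2. The results indicate that at least one of β_{1} and β_{2} differs from zero.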
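One way to see where this F-statistic comes from is the standard Wald form for a linear hypothesis Hβ = 0, F = (Hb)'(H C H')^{-1}(Hb)/q, where b is the vector of estimates, C is the estimated coefficient covariance matrix, and q is the number of rows of H. The sketch below assumes that the CoefficientCovariance property of the model holds C; the names b, C, q, Fmanual, and pManual are illustrative, and this is not necessarily how coefTest computes its result internally.

```matlab
% Refit the model from this example.
load carsmall
X = [Weight,Horsepower,Acceleration];
lm = fitlm(X,MPG,'linear');

H = [0 1 0 0; 0 0 1 0];          % selects the coefficients of x1 and x2
b = lm.Coefficients.Estimate;    % coefficient estimates
C = lm.CoefficientCovariance;    % estimated covariance of the estimates
q = size(H,1);                   % number of restrictions tested

% Wald-type F-statistic and its upper-tail p-value.
Fmanual = (H*b)'*((H*C*H')\(H*b))/q;
pManual = fcdf(Fmanual,q,lm.DFE,'upper');

[p,F,d] = coefTest(lm,H);
fprintf('coefTest: F = %.4f, manual: F = %.4f\n',F,Fmanual)
```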
See also: anova | fitlm | LinearModel | stepwiselm