In linear regression, the F-statistic is the test statistic for the analysis of variance (ANOVA) approach to test the significance of the model or the components in the model.

The F-statistic in the linear model output display is the test statistic for testing the statistical significance of the model. The F-statistic values in the `anova` display are for assessing the significance of the terms or components in the model.

After obtaining a fitted model, say, `mdl`, using `fitlm` or `stepwiselm`, you can:

Find the `F-statistic vs. constant model` in the output display or by using `disp(mdl)`.

Display the ANOVA for the model using `anova(mdl,'summary')`.

Obtain the F-statistic values for the components, except for the constant term, using `anova(mdl)`. For details, see the `anova` method of the `LinearModel` class.

This example shows how to assess the fit of the model and the significance of the regression coefficients using the F-statistic.

Load the sample data.

```
load carbig
tbl = table(Acceleration,Cylinders,Weight,MPG);
tbl.Cylinders = ordinal(Cylinders);
```

Fit a linear regression model.

```
mdl = fitlm(tbl,'MPG~Acceleration*Weight+Cylinders+Weight^2')
```

    mdl =

    Linear regression model:
        MPG ~ 1 + Cylinders + Acceleration*Weight + Weight^2

    Estimated Coefficients:
                                Estimate         SE          tStat        pValue
                               __________    __________    ________    __________
        (Intercept)                50.816        7.5669      6.7156     6.661e-11
        Acceleration             0.023343       0.33931    0.068796       0.94519
        Cylinders_4                 7.167        2.0596      3.4798     0.0005587
        Cylinders_5                10.963        3.1299      3.5028    0.00051396
        Cylinders_6                4.7415        2.1257      2.2306      0.026279
        Cylinders_8                 5.057        2.2981      2.2005      0.028356
        Weight                  -0.017497     0.0034674     -5.0461    6.9371e-07
        Acceleration:Weight    7.0745e-05    0.00011171      0.6333       0.52691
        Weight^2               1.5767e-06    3.6909e-07      4.2719    2.4396e-05

    Number of observations: 398, Error degrees of freedom: 389
    Root Mean Squared Error: 4.02
    R-squared: 0.741,  Adjusted R-Squared 0.736
    F-statistic vs. constant model: 139, p-value = 2.94e-109

The F-statistic of the linear fit versus the constant model is 139, with a *p*-value of 2.94e-109. The model is significant at the 5% significance level. The R-squared value of 0.741 means the model explains about 74% of the variability in the response.
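The reported F-statistic can be reproduced from the sums of squares stored in the fitted model. The following is a minimal sketch, assuming the `SSR`, `MSE`, `DFE`, and `NumEstimatedCoefficients` properties of the `LinearModel` object and the `'upper'` option of `fcdf`. The variable names `dfModel`, `F`, and `p` are illustrative and do not appear in the original example.

```matlab
% Refit the model from the example so the snippet is self-contained.
load carbig
tbl = table(Acceleration,Cylinders,Weight,MPG);
tbl.Cylinders = ordinal(Cylinders);
mdl = fitlm(tbl,'MPG~Acceleration*Weight+Cylinders+Weight^2');

% Numerator degrees of freedom: estimated coefficients minus the intercept.
dfModel = mdl.NumEstimatedCoefficients - 1;

% F = (regression sum of squares / model DF) / mean squared error.
F = (mdl.SSR/dfModel)/mdl.MSE;

% The upper tail of the F distribution gives the p-value. Using 'upper'
% avoids the loss of precision of 1 - fcdf(...) for very small p-values.
p = fcdf(F,dfModel,mdl.DFE,'upper');

fprintf('F = %.2f on %d and %d degrees of freedom, p = %.3g\n', ...
    F,dfModel,mdl.DFE,p)
```

The printed F-statistic should agree with the `F-statistic vs. constant model` line of the display, up to rounding.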

Display the ANOVA table for the fitted model.

```
anova(mdl,'summary')
```

    ans =

                     SumSq     DF     MeanSq      F         pValue
                     ______    ___    ______    ______    ___________
    Total             24253    397     61.09
    Model             17981      8    2247.6    139.41    2.9432e-109
    . Linear          17667      6    2944.4    182.63    7.5446e-110
    . Nonlinear      314.36      2    157.18    9.7492     7.3906e-05
    Residual         6271.6    389    16.122
    . Lack of fit    6267.1    387    16.194    7.1973        0.12968
    . Pure error        4.5      2      2.25

This display separates the variability in the model into linear and nonlinear terms. Since there are two nonlinear terms (`Weight^2` and the interaction between `Weight` and `Acceleration`), the nonlinear degrees of freedom in the `DF` column is 2. There are six linear terms in the model (four `Cylinders` indicator variables, `Weight`, and `Acceleration`). The corresponding F-statistics in the `F` column are for testing the significance of the linear and nonlinear terms as separate groups.

The residual term is also separated into two parts: the first is the error due to the lack of fit, and the second is the pure error, which is independent of the model and is obtained from the replicated observations. The corresponding F-statistic in the `F` column is for testing the lack of fit, that is, whether the proposed model is an adequate fit or not.
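As a check on the summary table, each F-statistic is the ratio of two mean squares from the `MeanSq` column. The model groups are compared with the residual mean square, and the lack of fit is compared with the pure error mean square. The figures below use the rounded values shown in the display, so the last digit can differ slightly from the table:

$$F_{\text{model}} = \frac{2247.6}{16.122} \approx 139.41, \qquad F_{\text{linear}} = \frac{2944.4}{16.122} \approx 182.63, \qquad F_{\text{nonlinear}} = \frac{157.18}{16.122} \approx 9.749,$$

$$F_{\text{lack of fit}} = \frac{16.194}{2.25} \approx 7.197.$$

With a *p*-value of 0.12968, the lack-of-fit test does not reject the adequacy of the model at the 5% significance level.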

Display the ANOVA table for the model terms.

anova(mdl)

    ans =

                           SumSq     DF     MeanSq       F         pValue
                           ______    ___    ______    _______    __________
    Acceleration           104.99      1    104.99     6.5122      0.011095
    Cylinders              408.94      4    102.23     6.3412    5.9573e-05
    Weight                 2187.5      1    2187.5     135.68    4.1974e-27
    Acceleration:Weight    6.4662      1    6.4662    0.40107       0.52691
    Weight^2               294.22      1    294.22     18.249    2.4396e-05
    Error                  6271.6    389    16.122

This display decomposes the ANOVA table into the model terms. The corresponding F-statistics in the `F` column are for assessing the statistical significance of each term. The F-test for `Cylinders` tests whether at least one of the coefficients of the indicator variables for the cylinder categories is different from zero, that is, whether the number of cylinders has a significant effect on `MPG`. The degrees of freedom for each model term is the numerator degrees of freedom for the corresponding F-test. Most of the terms have 1 degree of freedom, but the degrees of freedom for `Cylinders` is 4 because there are four indicator variables for this term.
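The joint test on the `Cylinders` indicators can also be written as a linear hypothesis test on the coefficients. The following is a minimal sketch using `coefTest` with a contrast matrix; the names `isCyl`, `I`, and `H` are illustrative, and the snippet assumes the indicator coefficients are named with the `Cylinders_` prefix, as in the output display.

```matlab
% Refit the model from the example so the snippet is self-contained.
load carbig
tbl = table(Acceleration,Cylinders,Weight,MPG);
tbl.Cylinders = ordinal(Cylinders);
mdl = fitlm(tbl,'MPG~Acceleration*Weight+Cylinders+Weight^2');

% Locate the coefficients that belong to the Cylinders term.
names = mdl.CoefficientNames;
isCyl = strncmp(names,'Cylinders_',numel('Cylinders_'));

% Build a contrast matrix H with one row per Cylinders indicator, so that
% H*beta = 0 states that all of those coefficients are zero.
I = eye(numel(names));
H = I(isCyl,:);

% Joint F-test: p-value, F-statistic, and numerator degrees of freedom.
[p,F,r] = coefTest(mdl,H);
fprintf('Cylinders: F = %.4f, numerator DF = %d, p = %.4g\n',F,r,p)
```

Because `Cylinders` does not appear in any interaction term, this test should agree with the `Cylinders` row of the `anova(mdl)` table, with four numerator degrees of freedom.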

In linear regression, the *t*-statistic is
useful for making inferences about the regression coefficients. The
hypothesis test on coefficient *i* tests the null
hypothesis that it is equal to zero – meaning the corresponding
term is not significant – versus the alternative hypothesis that
the coefficient is different from zero.

For a hypothesis test on coefficient *i*, with

$$H_{0}: \beta_{i} = 0$$

$$H_{1}: \beta_{i} \neq 0,$$

the *t*-statistic is:

$$t=\frac{{b}_{i}}{SE({b}_{i})},$$

where *SE*(*b*_{i})
is the standard error of the estimated coefficient *b*_{i}.

After obtaining a fitted model, say, `mdl`, using `fitlm` or `stepwiselm`, you can:

Find the coefficient estimates, the standard errors of the estimates (`SE`), and the *t*-statistic values of hypothesis tests for the corresponding coefficients (`tStat`) in the output display.

Call for the display using `display(mdl)`.
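The same quantities are also available programmatically. The following is a minimal sketch, assuming the `Coefficients` table and `DFE` property of the fitted model and using the `hald` sample data set; the variable names `coefs`, `t`, and `p` are illustrative. It recomputes each *t*-statistic from the formula above and derives a two-sided *p*-value from the *t* distribution.

```matlab
load hald
mdl = fitlm(ingredients,heat);

% The coefficient table holds the Estimate, SE, tStat, and pValue columns.
coefs = mdl.Coefficients;

% Recompute each t-statistic as the estimate divided by its standard error.
t = coefs.Estimate./coefs.SE;

% Two-sided p-value from the t distribution with the error degrees of freedom.
p = 2*tcdf(-abs(t),mdl.DFE);

% Show the recomputed values next to the coefficient names.
disp(table(t,p,'RowNames',coefs.Properties.RowNames))
```

The recomputed values should match the `tStat` and `pValue` columns of the coefficient table.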

This example shows how to test for the significance of the regression coefficients using the *t*-statistic.

Load the sample data and fit the linear regression model.

```
load hald
mdl = fitlm(ingredients,heat)
```

    mdl =

    Linear regression model:
        y ~ 1 + x1 + x2 + x3 + x4

    Estimated Coefficients:
                       Estimate      SE        tStat       pValue
                       ________    _______    ________    ________
        (Intercept)      62.405     70.071      0.8906     0.39913
        x1               1.5511    0.74477      2.0827    0.070822
        x2              0.51017    0.72379     0.70486      0.5009
        x3              0.10191    0.75471     0.13503     0.89592
        x4             -0.14406    0.70905    -0.20317     0.84407

    Number of observations: 13, Error degrees of freedom: 8
    Root Mean Squared Error: 2.45
    R-squared: 0.982,  Adjusted R-Squared 0.974
    F-statistic vs. constant model: 111, p-value = 4.76e-07

You can see that for each coefficient, `tStat = Estimate/SE`. The *p*-values for the hypothesis tests are in the `pValue` column. Each *t*-statistic tests for the significance of each term given the other terms in the model. According to these results, none of the coefficients seem significant at the 5% significance level, although the R-squared value for the model is really high at 0.98. This often indicates possible multicollinearity among the predictor variables.
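One common way to check for multicollinearity, which is not part of the original example, is to compute variance inflation factors (VIFs) for the predictors. The following is a minimal sketch that takes the VIFs as the diagonal of the inverse of the predictor correlation matrix; the variable names `R` and `vif` are illustrative. As a rough rule of thumb, values well above 10 are often read as a sign of strong collinearity.

```matlab
load hald

% Correlation matrix of the four predictor columns.
R = corrcoef(ingredients);

% Variance inflation factors are the diagonal of the inverse correlation
% matrix; a value of 1 means no linear relation to the other predictors.
vif = diag(inv(R))';

disp(vif)
```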

Use stepwise regression to decide which variables to include in the model.

```
load hald
mdl = stepwiselm(ingredients,heat)
```

    1. Adding x4, FStat = 22.7985, pValue = 0.000576232
    2. Adding x1, FStat = 108.2239, pValue = 1.105281e-06

    mdl =

    Linear regression model:
        y ~ 1 + x1 + x4

    Estimated Coefficients:
                       Estimate       SE        tStat       pValue
                       ________    ________    _______    __________
        (Intercept)       103.1       2.124      48.54    3.3243e-13
        x1                 1.44     0.13842     10.403    1.1053e-06
        x4             -0.61395    0.048645    -12.621    1.8149e-07

    Number of observations: 13, Error degrees of freedom: 10
    Root Mean Squared Error: 2.73
    R-squared: 0.972,  Adjusted R-Squared 0.967
    F-statistic vs. constant model: 177, p-value = 1.58e-08

In this example, `stepwiselm` starts with the constant model (default) and uses forward selection to incrementally add `x4` and `x1`. Each predictor variable in the final model is significant given that the other one is in the model. The algorithm stops when adding any of the remaining predictor variables would not significantly improve the model. For details on stepwise regression, see `stepwiselm`.
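The `FStat` value reported for the first step can be reproduced by comparing two nested fits. The following is a minimal sketch, assuming that the step is judged by an F-test on the change in the sum of squared errors; the names `m0`, `m1`, and `dfAdded` are illustrative.

```matlab
load hald

% Constant-only model and the model that adds the fourth ingredient.
m0 = fitlm(ingredients,heat,'constant');
m1 = fitlm(ingredients(:,4),heat);

% F-statistic for the drop in SSE from adding one term, measured against
% the error mean square of the larger model.
dfAdded = m0.DFE - m1.DFE;
F = ((m0.SSE - m1.SSE)/dfAdded)/(m1.SSE/m1.DFE);
p = fcdf(F,dfAdded,m1.DFE,'upper');

fprintf('Adding x4: FStat = %.4f, pValue = %.6g\n',F,p)
```

The result should agree with the first line of the `stepwiselm` output. The second step can be checked the same way by comparing the model with `x4` alone against the model with both `x1` and `x4`.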

`anova` | `coefCI` | `coefTest` | `fitlm` | `LinearModel` | `stepwiselm`
