## F-statistic and t-statistic

### F-statistic

#### Purpose

In linear regression, the F-statistic is the test statistic used in the analysis of variance (ANOVA) approach to testing the significance of the model or of the components in the model.

#### Definition

The F-statistic in the linear model output display tests the statistical significance of the model as a whole against a constant (intercept-only) model. The F-statistic values in the anova display assess the significance of the terms or components in the model.

#### How To

After obtaining a fitted model, say, mdl, using fitlm or stepwiselm, you can:

• Find the F-statistic vs. constant model in the output display or by using

`disp(mdl)`
• Display the ANOVA for the model using

`anova(mdl,'summary')`
• Obtain the F-statistic values for the components, except for the constant term, using

`anova(mdl)`

For details, see the anova method of the LinearModel class.

### Assess Fit of Model Using F-statistic

This example shows how to assess the fit of the model and the significance of the regression coefficients using the F-statistic.

```
load carbig
tbl = table(Acceleration,Cylinders,Weight,MPG);
tbl.Cylinders = ordinal(Cylinders);   % treat Cylinders as a categorical predictor
```

Fit a linear regression model.

```
% Acceleration*Weight expands to Acceleration + Weight + Acceleration:Weight
mdl = fitlm(tbl,'MPG~Acceleration*Weight+Cylinders+Weight^2')
```
```
mdl =

Linear regression model:
    MPG ~ 1 + Cylinders + Acceleration*Weight + Weight^2

Estimated Coefficients:
                            Estimate         SE         tStat        pValue
                           __________    __________    ________    __________

    (Intercept)                50.816        7.5669      6.7156     6.661e-11
    Acceleration             0.023343       0.33931    0.068796       0.94519
    Cylinders_4                 7.167        2.0596      3.4798     0.0005587
    Cylinders_5                10.963        3.1299      3.5028    0.00051396
    Cylinders_6                4.7415        2.1257      2.2306      0.026279
    Cylinders_8                 5.057        2.2981      2.2005      0.028356
    Weight                  -0.017497     0.0034674     -5.0461    6.9371e-07
    Acceleration:Weight    7.0745e-05    0.00011171      0.6333       0.52691
    Weight^2               1.5767e-06    3.6909e-07      4.2719    2.4396e-05

Number of observations: 398, Error degrees of freedom: 389
Root Mean Squared Error: 4.02
F-statistic vs. constant model: 139, p-value = 2.94e-109
```

The F-statistic of the linear fit versus the constant model is 139, with a p-value of 2.94e-109. Because this p-value is far below 0.05, the model is significant at the 5% significance level. The R-squared value of 0.741 (stored in mdl.Rsquared.Ordinary) means that the model explains about 74% of the variability in the response.
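As an illustrative cross-check, the overall F-statistic, R-squared, and adjusted R-squared can be recomputed from the sums of squares in the anova(mdl,'summary') display below: Model SumSq 17981 on 8 DF and Residual SumSq 6271.6 on 389 DF. This is a Python sketch and not part of the MATLAB workflow. The helper name overall_f_and_r2 is hypothetical. Because the displayed sums of squares are rounded, the results match the MATLAB output only to the displayed precision.

```python
def overall_f_and_r2(ss_model, df_model, ss_resid, df_resid):
    """Hypothetical helper: F vs. constant model, R^2, and adjusted R^2
    from regression and residual sums of squares."""
    ms_model = ss_model / df_model      # model (regression) mean square
    ms_resid = ss_resid / df_resid      # residual mean square (MSE)
    f_stat = ms_model / ms_resid        # F-statistic vs. constant model
    ss_total = ss_model + ss_resid      # total sum of squares about the mean
    df_total = df_model + df_resid      # n - 1
    r2 = ss_model / ss_total
    adj_r2 = 1 - ms_resid / (ss_total / df_total)
    return f_stat, r2, adj_r2

# Rounded values copied from the anova(mdl,'summary') display for carbig
f_stat, r2, adj_r2 = overall_f_and_r2(17981, 8, 6271.6, 389)
print(f"F = {f_stat:.2f}, R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```

R-squared is the share of the total sum of squares explained by the model. The adjusted version compares mean squares instead, so it penalizes the 8 model degrees of freedom.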

Display the ANOVA table for the fitted model.

```
anova(mdl,'summary')
```
```
ans =

                     SumSq     DF     MeanSq      F         pValue
                     ______    ___    ______    ______    ___________

    Total             24253    397     61.09
    Model             17981      8    2247.6    139.41    2.9432e-109
    . Linear          17667      6    2944.4    182.63    7.5446e-110
    . Nonlinear      314.36      2    157.18    9.7492     7.3906e-05
    Residual         6271.6    389    16.122
    . Lack of fit    6267.1    387    16.194    7.1973        0.12968
    . Pure error        4.5      2      2.25

```

This display separates the variability in the model into linear and nonlinear terms. There are two nonlinear terms (Weight^2 and the interaction between Weight and Acceleration), so the Nonlinear row has 2 degrees of freedom in the DF column. There are six linear terms in the model (four Cylinders indicator variables, Weight, and Acceleration), so the Linear row has 6. The F-statistics in the F column test the significance of the linear and nonlinear terms as separate groups.
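Each group F-statistic follows the same pattern: the group's mean square (its SumSq divided by its DF) divided by the residual mean square. The short Python sketch below uses a hypothetical helper, group_f, and the rounded values from the display above to reproduce the Linear and Nonlinear F values.

```python
def group_f(ss_group, df_group, mse):
    """Hypothetical helper: F-statistic for a group of terms,
    the group mean square divided by the residual mean square."""
    return (ss_group / df_group) / mse

mse = 6271.6 / 389                      # Residual MeanSq (about 16.122)
f_linear = group_f(17667, 6, mse)       # six linear terms
f_nonlinear = group_f(314.36, 2, mse)   # Weight^2 and Acceleration:Weight
print(f"Linear F = {f_linear:.2f}, Nonlinear F = {f_nonlinear:.4f}")
```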

The residual term is also separated into two parts: the error due to lack of fit, and the pure error. The pure error is independent of the model and is obtained from replicated observations. The F-statistic in the Lack of fit row is the lack-of-fit mean square divided by the pure-error mean square (16.194/2.25 ≈ 7.20). It tests whether the proposed model is an adequate fit. Its p-value of 0.12968 indicates no significant lack of fit at the 5% significance level.
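The following minimal Python sketch makes this decomposition concrete. It runs a lack-of-fit test for a straight-line model on a small made-up data set with replicated x values. The data and the helper names fit_line and lack_of_fit_test are hypothetical, not the carbig data or a MATLAB function.

The test works as follows:

- Observations that share the same predictor value form a group.
- The pure-error sum of squares is the spread of the responses around each group mean.
- The lack-of-fit sum of squares is whatever remains of the residual sum of squares.

```python
from collections import defaultdict

def fit_line(x, y):
    """Hypothetical helper: ordinary least squares for y = b0 + b1*x;
    returns the fitted values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return [b0 + b1 * xi for xi in x]

def lack_of_fit_test(x, y, fitted, num_coef):
    """Hypothetical helper: split SSE into lack-of-fit and pure-error parts.
    Needs at least one replicated x value and more distinct x values
    than coefficients."""
    groups = defaultdict(list)          # responses grouped by predictor value
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_pe = sum(sum((yi - sum(g) / len(g)) ** 2 for yi in g)
                for g in groups.values())
    df_pe = len(y) - len(groups)        # n - (number of distinct x values)
    df_lof = len(groups) - num_coef     # distinct x values - coefficients
    ss_lof = sse - ss_pe
    f_stat = (ss_lof / df_lof) / (ss_pe / df_pe)
    return {"sse": sse, "ss_lof": ss_lof, "df_lof": df_lof,
            "ss_pe": ss_pe, "df_pe": df_pe, "F": f_stat}

# Made-up data: x = 1 and x = 3 are each observed twice
x = [1, 1, 2, 3, 3, 4]
y = [1.0, 1.2, 2.1, 2.9, 3.3, 4.2]
result = lack_of_fit_test(x, y, fit_line(x, y), num_coef=2)
print(result)
```

With several predictors, the grouping key would be the tuple of predictor values. In the carbig ANOVA table, the DF column is consistent with 396 distinct predictor combinations among the 398 observations: 398 - 396 = 2 pure-error DF and 396 - 9 = 387 lack-of-fit DF.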

Display the ANOVA table for the model terms.

```
anova(mdl)
```
```
ans =

                           SumSq     DF     MeanSq       F         pValue
                           ______    ___    ______    _______    __________

    Acceleration           104.99      1    104.99     6.5122      0.011095
    Cylinders              408.94      4    102.23     6.3412    5.9573e-05
    Weight                 2187.5      1    2187.5     135.68    4.1974e-27
    Acceleration:Weight    6.4662      1    6.4662    0.40107       0.52691
    Weight^2               294.22      1    294.22     18.249    2.4396e-05
    Error                  6271.6    389    16.122

```

This display decomposes the ANOVA table into the model terms. The F-statistics in the F column assess the statistical significance of each term. The F-test for Cylinders tests whether at least one of the coefficients of the indicator variables for the cylinder categories differs from zero, that is, whether the number of cylinders has a significant effect on MPG. The degrees of freedom for each model term is the numerator degrees of freedom for the corresponding F-test. Most of the terms have 1 degree of freedom, but Cylinders has 4 because it is represented by four indicator variables.
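As a Python cross-check, each term's F value is its mean square (SumSq/DF) divided by the Error mean square. For the two highest-order terms, Acceleration:Weight and Weight^2, the 1-DF F value also equals the square of the coefficient's tStat in the fitlm display. For Acceleration and Weight, which also appear in higher-order terms, the two values differ in this output, so the identity should not be assumed for them. All numbers are rounded values copied from the displays above.

```python
MSE = 6271.6 / 389  # Error MeanSq from the anova(mdl) display

# (SumSq, DF) per term, copied from the anova(mdl) display
TERMS = {
    "Acceleration": (104.99, 1),
    "Cylinders": (408.94, 4),
    "Weight": (2187.5, 1),
    "Acceleration:Weight": (6.4662, 1),
    "Weight^2": (294.22, 1),
}

# F for each term: term mean square divided by the error mean square
f_values = {name: (ss / df) / MSE for name, (ss, df) in TERMS.items()}
for name, f in f_values.items():
    print(f"{name:20s} F = {f:.4f}")

# tStat values copied from the fitlm coefficient table; for these two
# highest-order terms, t^2 matches the 1-DF F value from anova(mdl)
t_stats = {"Acceleration:Weight": 0.6333, "Weight^2": 4.2719}
for name, t in t_stats.items():
    print(f"{name:20s} t^2 = {t ** 2:.4f}, F = {f_values[name]:.4f}")
```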

### t-statistic

#### Purpose

In linear regression, the t-statistic is useful for making inferences about the regression coefficients. The hypothesis test on coefficient i tests the null hypothesis that the coefficient equals zero (meaning that the corresponding term is not significant) versus the alternative hypothesis that the coefficient is different from zero.

#### Definition

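For a hypothesis test on coefficient i, with

$H_0: \beta_i = 0$

$H_1: \beta_i \neq 0$,

the t-statistic is:

$t=\frac{b_i}{SE(b_i)},$

where $SE(b_i)$ is the standard error of the estimated coefficient $b_i$. Under $H_0$, t follows a t-distribution with $n - p$ degrees of freedom (the error degrees of freedom). Here n is the number of observations and p is the number of coefficients, including the intercept.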
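The formula can be sketched in Python as follows. Python's standard library has no Student's t distribution, so this sketch integrates the t density numerically with Simpson's rule to get the two-sided p-value. That is a convenience choice for illustration, not a description of how fitlm computes pValue. The helper names are hypothetical. The inputs come from the x1 row of the full hald model in this section: Estimate 1.5511, SE 0.74477, and 8 error degrees of freedom.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=2000):
    """Hypothetical helper: P(|T| >= |t_stat|) under H0, by Simpson's
    rule on [0, |t_stat|]. steps must be even."""
    b = abs(t_stat)
    if b == 0:
        return 1.0
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        weight = 4 if k % 2 == 1 else 2   # Simpson weights 4, 2, 4, ...
        total += weight * t_pdf(k * h, df)
    area = total * h / 3                  # P(0 <= T <= |t_stat|)
    return 1.0 - 2.0 * area               # density is symmetric about 0

def coefficient_t_test(estimate, se, df):
    """Hypothetical helper: (tStat, pValue) for H0: beta_i = 0 vs H1: beta_i != 0."""
    t_stat = estimate / se
    return t_stat, two_sided_p(t_stat, df)

# x1 row of the full hald model: Estimate 1.5511, SE 0.74477, error DF 8
t_x1, p_x1 = coefficient_t_test(1.5511, 0.74477, 8)
print(f"tStat = {t_x1:.4f}, pValue = {p_x1:.6f}")
```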

#### How To

After obtaining a fitted model, say, mdl, using fitlm or stepwiselm, you can:

• Find the coefficient estimates, the standard errors of the estimates (SE), and the t-statistic values of hypothesis tests for the corresponding coefficients (tStat) in the output display.

• Display the fitted model using

`display(mdl)`

### Assess Significance of Regression Coefficients Using t-statistic

This example shows how to test for the significance of the regression coefficients using the t-statistic.

Load the sample data and fit the linear regression model.

```
load hald
mdl = fitlm(ingredients,heat)
```
```
mdl =

Linear regression model:
    y ~ 1 + x1 + x2 + x3 + x4

Estimated Coefficients:
                   Estimate      SE        tStat       pValue
                   ________    _______    ________    ________

    (Intercept)      62.405     70.071      0.8906     0.39913
    x1               1.5511    0.74477      2.0827    0.070822
    x2              0.51017    0.72379     0.70486      0.5009
    x3              0.10191    0.75471     0.13503     0.89592
    x4             -0.14406    0.70905    -0.20317     0.84407

Number of observations: 13, Error degrees of freedom: 8
Root Mean Squared Error: 2.45
F-statistic vs. constant model: 111, p-value = 4.76e-07
```

You can see that for each coefficient, tStat = Estimate/SE. The p-values for the hypothesis tests are in the pValue column. Each t-statistic tests the significance of its term given the other terms in the model. According to these results, none of the coefficients is significant at the 5% significance level. This holds even though the overall F-test is highly significant (p-value = 4.76e-07) and the R-squared value for the model is very high (about 0.98, or about 0.97 adjusted). This pattern often indicates multicollinearity among the predictor variables.
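A quick Python check confirms both observations, using rounded values copied from the hald display.

- **t-statistics:** each tStat is Estimate/SE.
- **R-squared:** solving $F = \frac{R^2/k}{(1-R^2)/(n-k-1)}$ for $R^2$ gives $R^2 = \frac{Fk}{Fk + n - k - 1}$. With k = 4 predictors, n = 13 observations, and the displayed F = 111, this gives the R-squared implied by the overall F-test.

Because F is shown rounded to 111, the R-squared values are approximate.

```python
# (Estimate, SE, tStat) per coefficient, copied from the hald fitlm display
rows = {
    "(Intercept)": (62.405, 70.071, 0.8906),
    "x1": (1.5511, 0.74477, 2.0827),
    "x2": (0.51017, 0.72379, 0.70486),
    "x3": (0.10191, 0.75471, 0.13503),
    "x4": (-0.14406, 0.70905, -0.20317),
}
for name, (est, se, t_shown) in rows.items():
    print(f"{name:12s} Estimate/SE = {est / se:8.4f}  (tStat shown: {t_shown})")

# R-squared implied by the overall F-test (k predictors, n observations)
f_stat, k, n = 111.0, 4, 13
r2 = f_stat * k / (f_stat * k + (n - k - 1))
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(f"R^2 ~ {r2:.3f}, adjusted R^2 ~ {adj_r2:.3f}")
```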

Use stepwise regression to decide which variables to include in the model.

```
load hald
mdl = stepwiselm(ingredients,heat)
```
```
1. Adding x4, FStat = 22.7985, pValue = 0.000576232
2. Adding x1, FStat = 108.2239, pValue = 1.105281e-06

mdl =

Linear regression model:
    y ~ 1 + x1 + x4

Estimated Coefficients:
                   Estimate       SE        tStat       pValue
                   ________    ________    _______    __________

    (Intercept)       103.1       2.124      48.54    3.3243e-13
    x1                 1.44     0.13842     10.403    1.1053e-06
    x4             -0.61395    0.048645    -12.621    1.8149e-07

Number of observations: 13, Error degrees of freedom: 10
Root Mean Squared Error: 2.73