Cook’s distance is useful for identifying outliers in the *X*
values (observations for predictor variables). It also shows the influence of each
observation on the fitted response values. An observation with Cook’s distance
larger than three times the mean Cook’s distance might be an outlier.

Cook’s distance is the scaled change in fitted values. Each element in `CooksDistance` is the normalized change in the vector of coefficients due to the deletion of an observation. The Cook’s distance, *D*_{i}, of observation *i* is

$${D}_{i}=\frac{{\displaystyle \sum _{j=1}^{n}{\left({\widehat{y}}_{j}-{\widehat{y}}_{j(i)}\right)}^{2}}}{p\text{\hspace{0.17em}}MSE},$$

where

- $${\widehat{y}}_{j}$$ is the *j*th fitted response value.
- $${\widehat{y}}_{j(i)}$$ is the *j*th fitted response value, where the fit does not include observation *i*.
- *MSE* is the mean squared error.
- *p* is the number of coefficients in the regression model.

Cook’s distance is algebraically equivalent to the following expression:

$${D}_{i}=\frac{{r}_{i}^{2}}{p\text{\hspace{0.17em}}MSE}\left(\frac{{h}_{ii}}{{\left(1-{h}_{ii}\right)}^{2}}\right),$$

where *r*_{i} is
the *i*th residual, and *h*_{ii} is
the *i*th leverage value.
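The equivalence of the two expressions can be checked numerically. The following is a minimal sketch that uses only base MATLAB on synthetic data (not the data set used later in this topic). All variable names, such as `Ddef` and `Dshort`, are illustrative. It computes Cook’s distance once from the definition, by refitting without each observation, and once from the residual and leverage form.

```matlab
% Synthetic data (hypothetical; chosen only for illustration)
rng(0);                              % reproducible random numbers
n = 30;
X = [ones(n,1) randn(n,2)];          % design matrix with an intercept column
y = X*[1; 2; -1] + 0.5*randn(n,1);

p    = size(X,2);                    % number of coefficients
b    = X\y;                          % least-squares coefficients, full fit
yhat = X*b;                          % fitted values from the full fit
r    = y - yhat;                     % raw residuals
MSE  = sum(r.^2)/(n - p);            % mean squared error
h    = diag(X*((X'*X)\X'));          % leverage values h_ii (hat matrix diagonal)

% Definition: refit without observation i, then sum the squared
% changes in all n fitted values
Ddef = zeros(n,1);
for i = 1:n
    keep = true(n,1);
    keep(i) = false;
    bi = X(keep,:)\y(keep);          % coefficients without observation i
    Ddef(i) = sum((yhat - X*bi).^2)/(p*MSE);
end

% Equivalent form: residuals and leverage from the single full fit
Dshort = (r.^2./(p*MSE)).*(h./(1 - h).^2);

maxDiff = max(abs(Ddef - Dshort));   % should be near machine precision
```

The second form needs only one fit, which is why it is the practical choice when *n* is large.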

`CooksDistance` is an *n*-by-1 column vector in the `Diagnostics` table of the `LinearModel` object.

After obtaining a fitted model, say, `mdl`, using `fitlm` or `stepwiselm`, you can:

- Display the Cook’s distance values by indexing into the property using dot notation, `mdl.Diagnostics.CooksDistance`.
- Plot the Cook’s distance values using `plotDiagnostics(mdl,'cookd')`. For details, see the `plotDiagnostics` method of the `LinearModel` class.

This example shows how to use Cook's distance to identify potential outliers in the data.

Load the sample data and define the independent and response variables.

```
load hospital
X = double(hospital(:,2:5));
y = hospital.BloodPressure(:,1);
```

Fit the linear regression model.

mdl = fitlm(X,y);

Plot the Cook's distance values.

```
plotDiagnostics(mdl,'cookd')
```

The dashed line in the figure corresponds to the recommended threshold value, `3*mean(mdl.Diagnostics.CooksDistance)`. Several observations have Cook's distance values greater than this threshold, which for this example is 3*(0.0108) = 0.0324. In particular, two Cook's distance values are noticeably higher than the others. You might want to find and omit these observations from your data and rebuild your model.

Find the observations with Cook's distance values that exceed the threshold value.

find((mdl.Diagnostics.CooksDistance)>3*mean(mdl.Diagnostics.CooksDistance))

ans = 2 13 28 44 58 70 71 84 93 95

Among these, find the observations whose Cook's distance values stand out from the rest by using a stricter cutoff of five times the mean Cook's distance.

find((mdl.Diagnostics.CooksDistance)>5*mean(mdl.Diagnostics.CooksDistance))

ans = 2 84
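One way to follow the earlier suggestion of omitting influential observations and rebuilding the model is sketched below. It assumes the `'Exclude'` name-value argument of `fitlm`. The names `d`, `idx`, and `mdlReduced` are illustrative. Whether to drop observations at all is a judgment call that depends on your data.

```matlab
% Refit after excluding the observations flagged by the stricter cutoff
load hospital
X = double(hospital(:,2:5));
y = hospital.BloodPressure(:,1);
mdl = fitlm(X,y);

d   = mdl.Diagnostics.CooksDistance;     % Cook's distance for each observation
idx = find(d > 5*mean(d));               % indices of the flagged observations

mdlReduced = fitlm(X,y,'Exclude',idx);   % same model, flagged rows left out of the fit

% Compare the coefficient estimates before and after exclusion
[mdl.Coefficients.Estimate mdlReduced.Coefficients.Estimate]
```

Comparing the two columns of estimates gives a rough sense of how much the flagged observations were driving the fit.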


`LinearModel` | `fitlm` | `plotDiagnostics` | `stepwiselm`
