MATLAB Examples

Determine Outliers Using Cook's Distance

This example shows how to use Cook's distance to identify outliers in the data.

Load the sample data and define the independent and response variables.

load hospital
X = double(hospital(:,2:5));
y = hospital.BloodPressure(:,1);

Fit the linear regression model.

mdl = fitlm(X,y);
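Once the model is fitted, the per-observation diagnostics are available in mdl.Diagnostics. As a rough check on what Cook's distance measures, the following sketch recomputes it from the raw residuals and the leverage values using the standard closed form D_i = (r_i^2/(p*MSE)) * h_ii/(1 - h_ii)^2. It assumes that the design matrix has full rank, so that mdl.NumCoefficients is the right value for p. The variable names r, h, p, mse, and cookdManual are illustrative and not part of the original example.

```matlab
% Setup repeated from the steps above so that the snippet runs on its own.
load hospital
X = double(hospital(:,2:5));
y = hospital.BloodPressure(:,1);
mdl = fitlm(X,y);

r   = mdl.Residuals.Raw;          % raw residuals
h   = mdl.Diagnostics.Leverage;   % leverage (diagonal of the hat matrix)
p   = mdl.NumCoefficients;        % number of coefficients, including the intercept
mse = mdl.MSE;                    % mean squared error of the fit

% Cook's distance from residual size and leverage (full-rank assumption)
cookdManual = (r.^2 ./ (p*mse)) .* (h ./ (1 - h).^2);

% Largest absolute difference from the built-in diagnostic
maxDiff = max(abs(cookdManual - mdl.Diagnostics.CooksDistance))
```

If the assumption holds, maxDiff should be at the level of floating-point rounding. The formula makes clear that an observation scores high when it combines a large residual with high leverage.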

Plot the Cook's distance values.

plotDiagnostics(mdl,'cookd')

The dashed line in the figure marks the recommended threshold value, 3*mean(mdl.Diagnostics.CooksDistance), which for this example is 3*(0.0108) = 0.0324. Several observations have Cook's distance values greater than this threshold. Two of them are noticeably larger than the rest. You might want to find these observations, omit them from your data, and rebuild your model.
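The threshold rule described above can also be evaluated numerically rather than read off the plot. The sketch below stores the threshold in a variable and lists each flagged observation next to its Cook's distance. The names cookd, threshold, isInfluential, nFlagged, and flagged are illustrative and not part of the original example.

```matlab
% Setup repeated from the steps above so that the snippet runs on its own.
load hospital
X = double(hospital(:,2:5));
y = hospital.BloodPressure(:,1);
mdl = fitlm(X,y);

cookd = mdl.Diagnostics.CooksDistance;   % one value per observation
threshold = 3*mean(cookd);               % same rule as the dashed line in the plot
isInfluential = cookd > threshold;       % logical flag per observation
nFlagged = sum(isInfluential);           % number of flagged observations

% Flagged observation numbers alongside their Cook's distance values
flagged = table(find(isInfluential), cookd(isInfluential), ...
    'VariableNames', {'Observation','CooksDistance'})
```

Keeping the logical vector isInfluential around is convenient, because it can index into the data directly or be passed on to later steps.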

Find the observations with Cook's distance values that exceed the threshold value.

find((mdl.Diagnostics.CooksDistance)>3*mean(mdl.Diagnostics.CooksDistance))
ans =

     2
    13
    28
    44
    58
    70
    71
    84
    93
    95

Find the observations whose Cook's distance values are large even compared with the other observations that exceed the threshold. This example uses a stricter cutoff of five times the mean.

find((mdl.Diagnostics.CooksDistance)>5*mean(mdl.Diagnostics.CooksDistance))
ans =

     2
    84
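To follow up on the suggestion of omitting these observations and rebuilding the model, one possible approach is to pass their indices to the 'Exclude' name-value argument of fitlm. This is a minimal sketch, not a recommendation to drop the points. Whether to remove them depends on why they are influential. The names cookd, outliers, mdlReduced, coefCompare, and rmseCompare are illustrative.

```matlab
% Setup repeated from the steps above so that the snippet runs on its own.
load hospital
X = double(hospital(:,2:5));
y = hospital.BloodPressure(:,1);
mdl = fitlm(X,y);

cookd = mdl.Diagnostics.CooksDistance;
outliers = find(cookd > 5*mean(cookd));   % stricter cutoff from the last step

% Refit without the flagged rows; 'Exclude' accepts numeric row indices
mdlReduced = fitlm(X,y,'Exclude',outliers);

% Coefficient estimates and RMSE, original fit versus reduced fit
coefCompare = [mdl.Coefficients.Estimate, mdlReduced.Coefficients.Estimate]
rmseCompare = [mdl.RMSE, mdlReduced.RMSE]
```

Comparing the two columns of coefCompare shows how much the flagged observations were pulling the estimates. Small changes suggest that the original fit was already robust to them.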