
When you examine a data plot, you might find that some points
appear to differ dramatically from the
rest of the data. In some cases, it is reasonable to consider such
points *outliers*, or data values that appear to be inconsistent with the
rest of the data.

The following example illustrates how to remove outliers from three data sets in the 24-by-3 matrix `count`. In this case, an outlier is defined as a value that is more than three standard deviations away from the mean.

Be cautious about changing data unless you are confident that you understand the source of the problem you want to correct. Removing an outlier has a greater effect on the standard deviation than on the mean of the data. Deleting one such point leads to a smaller new standard deviation, which might result in making some remaining points appear to be outliers!
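The effect described above can be seen on a small made-up sample. The following is a minimal sketch, not part of the `count` example: the vector `x` and its values are hypothetical and were chosen only so that the three-standard-deviation rule flags a second point after the first one is removed.

```matlab
% Hypothetical sample: twenty typical values plus two unusual ones (25 and 100)
x = [9*ones(10,1); 11*ones(10,1); 25; 100];

% First pass: flag values more than three standard deviations from the mean
firstPass = abs(x - mean(x)) > 3*std(x);
flaggedFirst = x(firstPass)

% Remove the flagged value and repeat the same test on what remains
xTrimmed = x(~firstPass);
secondPass = abs(xTrimmed - mean(xTrimmed)) > 3*std(xTrimmed);
flaggedSecond = xTrimmed(secondPass)

% Compare the spread before and after the removal
sigmaBefore = std(x)
sigmaAfter = std(xTrimmed)
```

On this sample, the first pass flags only the value 100. Once it is removed, the standard deviation drops from about 19.3 to about 3.4, and the value 25 then falls outside the three-standard-deviation band.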

    % Import the sample data
    load count.dat;
    % Calculate the mean and the standard deviation
    % of each data column in the matrix
    mu = mean(count)
    sigma = std(count)

The Command Window displays

    mu =
       32.0000   46.5417   65.5833
    sigma =
       25.3703   41.4057   68.0281

When an *outlier* is considered to be more than three standard deviations away from the mean, use the following syntax to determine the number of outliers in each column of the `count` matrix:

    [n,p] = size(count);
    % Create a matrix of mean values by
    % replicating the mu vector for n rows
    MeanMat = repmat(mu,n,1);
    % Create a matrix of standard deviation values by
    % replicating the sigma vector for n rows
    SigmaMat = repmat(sigma,n,1);
    % Create a matrix of zeros and ones, where ones indicate
    % the location of outliers
    outliers = abs(count - MeanMat) > 3*SigmaMat;
    % Calculate the number of outliers in each column
    nout = sum(outliers)

The procedure returns the following number of outliers in each column:

    nout =
         1     0     0

There is one outlier in the first data column of `count` and none in the other two columns.
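The same count can be obtained without building the replicated matrices. The sketch below assumes MATLAB R2016b or later, where arithmetic and relational operators expand a row vector against a matrix automatically; the names `outliersImplicit`, `noutImplicit`, `outliersRepmat`, and `sameResult` are introduced here for illustration only.

```matlab
% Assumes MATLAB R2016b or later, where a 24-by-3 matrix and a 1-by-3
% row vector are combined by implicit expansion
load count.dat;
mu = mean(count);
sigma = std(count);

% Same test as the repmat version, without building MeanMat and SigmaMat
outliersImplicit = abs(count - mu) > 3*sigma;
noutImplicit = sum(outliersImplicit)

% Cross-check against the explicit replication approach
n = size(count,1);
outliersRepmat = abs(count - repmat(mu,n,1)) > 3*repmat(sigma,n,1);
sameResult = isequal(outliersImplicit, outliersRepmat)
```

In earlier releases, the `repmat` form remains the portable choice.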

To remove an entire row of data containing the outlier, type

    count(any(outliers,2),:) = [];

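Here, `any(outliers,2)` returns a logical column vector with one element for each row of the `outliers` matrix. An element is `1` when any value in the corresponding row is nonzero. The argument `2` specifies that `any` operates along the second dimension of `outliers`, that is, across the columns within each row.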
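To see how the row test and the deletion fit together, here is a minimal sketch on a small made-up matrix. The variables `data`, `mask`, and `rowsToDrop` are hypothetical and stand in for `count`, `outliers`, and the result of `any(outliers,2)`.

```matlab
% Hypothetical 4-by-3 data and a matching logical mask (not from count.dat)
data = [1 2 3; 4 5 6; 7 8 9; 10 11 12];
mask = logical([0 0 0; 1 0 0; 0 0 0; 0 1 1]);

% One logical value per row: true where the row has at least one flagged entry
rowsToDrop = any(mask,2)

% Delete every flagged row in a single assignment
data(rowsToDrop,:) = []
```

Rows 2 and 4 each contain at least one flagged entry, so `rowsToDrop` is `[0;1;0;1]` and `data` is reduced to its first and third rows. A row with two flagged entries is still removed only once.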
