Inconsistent Data

When you examine a data plot, you might find that some points appear to differ dramatically from the rest of the data. In some cases, it is reasonable to consider such points outliers, or data values that appear to be inconsistent with the rest of the data.

The following example illustrates how to remove outliers from three data sets in the 24-by-3 matrix `count`. In this case, an outlier is defined as a value that is more than three standard deviations away from the mean.

Caution

Be cautious about changing data unless you are confident that you understand the source of the problem you want to correct. Removing an outlier has a greater effect on the standard deviation than on the mean of the data. Deleting one such point leads to a smaller new standard deviation, which might result in making some remaining points appear to be outliers!
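
The effect described in this caution can be seen in a minimal sketch on made-up data. The vector `x` below is hypothetical and is not part of the `count` data set; the variable names are chosen only for illustration. The same three-standard-deviation rule is applied twice, once before and once after the flagged point is removed.

```matlab
% Hypothetical data (not from count.dat): 15 zeros, one 5, and one 100
x = [zeros(15,1); 5; 100];

% Flag points more than three standard deviations from the mean
sigmaBefore = std(x);
outBefore = abs(x - mean(x)) > 3*sigmaBefore;   % flags only the value 100

% Remove the flagged point and apply the same test again
xTrim = x(~outBefore);
sigmaAfter = std(xTrim);
outAfter = abs(xTrim - mean(xTrim)) > 3*sigmaAfter;   % now flags the value 5

fprintf('std before: %.2f, after: %.2f\n', sigmaBefore, sigmaAfter);
fprintf('flagged before: %s, after: %s\n', ...
    mat2str(x(outBefore)), mat2str(xTrim(outAfter)));
```

In this sketch the value 5 is unremarkable while 100 is present, but it crosses the three-standard-deviation threshold once 100 is deleted, because the standard deviation shrinks far more than the mean shifts.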

```
% Import the sample data
load count.dat;
% Calculate the mean and the standard deviation
% of each data column in the matrix
mu = mean(count)
sigma = std(count)
```

The Command Window displays

```
mu =
   32.0000   46.5417   65.5833

sigma =
   25.3703   41.4057   68.0281
```

When an outlier is considered to be more than three standard deviations away from the mean, use the following syntax to determine the number of outliers in each column of the `count` matrix:

```
[n,p] = size(count);
% Create a matrix of mean values by
% replicating the mu vector for n rows
MeanMat = repmat(mu,n,1);
% Create a matrix of standard deviation values by
% replicating the sigma vector for n rows
SigmaMat = repmat(sigma,n,1);
% Create a matrix of zeros and ones, where ones indicate
% the location of outliers
outliers = abs(count - MeanMat) > 3*SigmaMat;
% Calculate the number of outliers in each column
nout = sum(outliers)
```

The procedure returns the following number of outliers in each column:

```
nout =
     1     0     0
```

There is one outlier in the first data column of `count` and none in the other two columns.
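
The replication with `repmat` can be skipped in newer releases. The following is a minimal sketch, assuming MATLAB R2016b or later, where implicit expansion lets the 1-by-3 vectors `mu` and `sigma` be compared directly against the 24-by-3 matrix `count`. The variable name `outliersAlt` is introduced here only for illustration.

```matlab
% Import the sample data and compute the column statistics
load count.dat;
mu = mean(count);
sigma = std(count);

% Assumes R2016b or later: the row vectors mu and sigma are expanded
% implicitly across the rows of count, so repmat is not needed
outliersAlt = abs(count - mu) > 3*sigma;

% Count the flagged values in each column
nout = sum(outliersAlt)
```

This should flag the same elements as the `repmat` version. In R2017a and later, `isoutlier(count,'mean')` is documented to apply the same three-standard-deviation rule, which may be a more readable option where that function is available.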

To remove an entire row of data containing the outlier, type

```
count(any(outliers,2),:) = [];
```

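Here, `any(outliers,2)` returns a column vector with one element per row of the `outliers` matrix. An element is `1` when any value in the corresponding row is nonzero. The argument `2` specifies that `any` operates along the second dimension of `outliers`, that is, across the columns within each row. Assigning `[]` to the selected rows deletes them from `count`.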
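
To make the role of the dimension argument concrete, here is a small sketch on a hand-built logical matrix. The matrices `flags` and `data` are hypothetical stand-ins for `outliers` and `count`. The sketch also keeps the original data intact by indexing the rows to retain rather than deleting rows in place.

```matlab
% Hypothetical 4-by-3 flag matrix: row 2 and row 4 contain flagged values
flags = logical([0 0 0; 1 0 0; 0 0 0; 0 1 1]);

% Dimension 2: one result per row (true if the row has any flag)
rowHasFlag = any(flags,2)

% Dimension 1: one result per column (true if the column has any flag)
colHasFlag = any(flags,1)

% Hypothetical data with the same size as flags
data = (1:4)' * [1 10 100];

% Keep only the rows without flags; data itself is left unchanged
cleaned = data(~rowHasFlag,:)
```

Indexing with `~rowHasFlag` gives the same rows as deleting with `= []`, but the unmodified matrix remains available if the outlier rule needs to be revisited.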
