Handle Outliers

Outliers are data points that lie far outside the range of most of the data. Glitches, data-entry errors, and inaccurate measurements can all produce outliers in real data samples, and outliers can significantly affect the results of an analysis. If you suspect that the data you want to analyze contains outliers, you can discard them or replace them with values typical for that data sample.

Before you discard or replace outliers, try to verify that they are actual errors. Outliers can be a legitimate part of the data sample, and discarding them can lead to incorrect conclusions. If you cannot determine whether the outliers are correct data or errors, the recommended strategy is to analyze the data both with and without them.

To discard outliers, use the stats::cutoff function. For example, discard the values of the list x that are smaller than the 1/10 quantile (10th percentile) or larger than the 9/10 quantile (90th percentile):

x := [1/100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 100]:
stats::cutoff(x, 1/10)

To replace outliers with quantile values instead of discarding them, use the stats::winsorize function. In the list x, replace the values smaller than the 10th percentile with the 10th percentile, and the values larger than the 90th percentile with the 90th percentile:

stats::winsorize(x, 1/10)
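
The recommendation to analyze the data both with and without outliers can be sketched as follows. This is only an illustration: the names y and z are introduced here, and the mean and the median (stats::mean, stats::median) are summary statistics chosen for the comparison, not something the text above prescribes.

```mupad
// Illustrative sketch: compare summary statistics of the raw,
// trimmed, and winsorized versions of the sample.
x := [1/100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 100]:
y := stats::cutoff(x, 1/10):      // assumed name: sample with outliers discarded
z := stats::winsorize(x, 1/10):   // assumed name: sample with outliers replaced

// Mean and median of each version, as floating-point numbers
float(stats::mean(x)), float(stats::median(x));
float(stats::mean(y)), float(stats::median(y));
float(stats::mean(z)), float(stats::median(z))
```

If the mean changes noticeably between the versions while the median stays close, the extreme values are likely driving the result, and conclusions based on the mean should be treated with care.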
