stats::cutoff

Discard outliers

Use only in the MuPAD Notebook Interface.

This functionality does not run in MATLAB.

Syntax

stats::cutoff([x1, x2, …], α)
stats::cutoff([[x11, x12, …], [x21, x22, …], …], α, i)
stats::cutoff(s, α, i)

Description

stats::cutoff([x1, x2, …], α) returns those elements of the list [x1, x2, …] that are larger than the α quantile and smaller than the 1 - α quantile of the list.
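As a small illustration of the list form, the following sketch applies stats::cutoff to a hand-made list containing one obviously extreme value. The list and the choice α = 1/10 are invented for this illustration. No output is shown, since the exact result depends on how the quantiles are computed for such a short list.

```mupad
// Hypothetical data: nine moderate values and one extreme value
data := [2, 3, 3, 4, 4, 4, 5, 5, 6, 250]:

// Keep only the elements between the 1/10 and the 9/10 quantile
stats::cutoff(data, 1/10)
```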

stats::cutoff([[x11, x12, …], [x21, x22, …], …], α, i) and stats::cutoff(stats::sample([[x11, x12, …], [x21, x22, …], …]), α, i) perform the operations described above on the i-th entries of the input rows.
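A minimal sketch of the row-wise forms, assuming a made-up nested list in which each row is [id, measurement]. The rows, the value α = 1/10, and the column index 2 are illustrative choices, not taken from the original page.

```mupad
// Hypothetical rows: [id, measurement]
rows := [[1, 4.1], [2, 3.9], [3, 4.0], [4, 4.2], [5, 97.3],
         [6, 4.0], [7, 3.8], [8, 4.1], [9, 0.2], [10, 4.3]]:

// Filter the nested list on its second column
stats::cutoff(rows, 1/10, 2);

// The same call on a stats::sample object built from the rows
s := stats::sample(rows):
stats::cutoff(s, 1/10, 2)
```

Only the entries in column 2 take part in the quantile comparison; the id column is not used to decide which rows count as outliers.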

Measurement data often contains outliers: sample points that lie far outside the range containing the majority of the points. Although outliers are expected both in theory and in practice, in small or medium-sized samples they tend to distort statistics such as the mean value.

One standard method for dealing with this problem on (real) continuous scales is to discard the outliers. stats::cutoff discards all data points that do not lie between the α quantile and the 1 - α quantile of the data.
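The filtering rule can be sketched by hand. The code below is an illustration only, under the assumption that the quantiles are those of the empirical distribution as returned by stats::empiricalQuantile; the page does not state which quantile definition stats::cutoff uses internally, so the result is not guaranteed to match. The names data, alpha, qLow, and qHigh are hypothetical.

```mupad
// Hypothetical data and cutoff parameter
data := [2, 3, 3, 4, 4, 4, 5, 5, 6, 250]:
alpha := 1/10:

// Quantile function of the empirical distribution of the data
// (assumption: stats::cutoff may use a different quantile definition)
Q := stats::empiricalQuantile(data):
qLow := Q(alpha):
qHigh := Q(1 - alpha):

// Keep the points strictly between the two quantiles
select(data, x -> bool(qLow < x and x < qHigh))
```

Comparing stats::mean(data) with the mean of the filtered list is one way to see how strongly a single extreme value can pull the mean.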

Examples

Example 1

We create a normally distributed sample, slightly contaminated:

r := stats::normalRandom(0, 1, Seed=2):
data := [r() $ i = 1..300, 100*r() $ i = 1..2]:

The two extra points distort the data significantly:

plot(plot::Histogram2d(data, Cells=20))

Using either stats::winsorize or stats::cutoff removes this noise, and the histograms show more detail:

plot(plot::Scene2d(plot::Histogram2d
         (stats::winsorize(data, 1/100), Cells=20)),
     plot::Scene2d(plot::Histogram2d
         (stats::cutoff(data, 1/100), Cells=20)))

With larger values of α, the difference between the two is easier to see:

plot(plot::Scene2d(plot::Histogram2d
         (stats::winsorize(data, 1/20), Cells=20)),
     plot::Scene2d(plot::Histogram2d
         (stats::cutoff(data, 1/20), Cells=20)))

Both stats::winsorize and stats::cutoff reduce the standard deviation of the sample, but the effect is considerably stronger for stats::cutoff. Keeping in mind that the standard deviation of our random number generator is 1, we compute the standard deviation of the data in its various forms:

stats::stdev(data),
stats::stdev(stats::winsorize(data, 1/20)),
stats::stdev(stats::cutoff(data, 1/20))
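One way to see why the two functions behave differently is to compare how many points each of them keeps. The following sketch reuses data from this example and relies on nops, which returns the number of elements of a list. It assumes that stats::winsorize, as its own documentation describes, replaces extreme values rather than removing them, so only stats::cutoff should shorten the list.

```mupad
// Number of points in the original list and in the two processed lists
nops(data),
nops(stats::winsorize(data, 1/20)),
nops(stats::cutoff(data, 1/20))
```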

Parameters

x1, x2, x11, …

The statistical data: arithmetical expressions. The data to filter on must be real-valued.

s

Sample of type stats::sample

α

Cutoff parameter: a real-valued expression.

i

Column index: positive integer. The nested list or the sample is filtered on its i-th column.

Return Values

The input data with the outliers removed.
