# isoutlier

Find outliers in data

## Syntax

``TF = isoutlier(A)``
``TF = isoutlier(A,method)``
``TF = isoutlier(A,movmethod,window)``
``TF = isoutlier(___,dim)``
``TF = isoutlier(___,Name,Value)``
``[TF,L,U,C] = isoutlier(___)``

## Description

`TF = isoutlier(A)` returns a logical array whose elements are `true` when an outlier is detected in the corresponding element of `A`. By default, an outlier is a value that is more than three scaled median absolute deviations (MAD) away from the median. If `A` is a matrix or table, then `isoutlier` operates on each column separately. If `A` is a multidimensional array, then `isoutlier` operates along the first dimension whose size does not equal 1.

`TF = isoutlier(A,method)` specifies a method for determining outliers. For example, `isoutlier(A,'mean')` returns `true` for all elements more than three standard deviations from the mean.

`TF = isoutlier(A,movmethod,window)` specifies a moving method for determining local outliers according to a window length defined by `window`. For example, `isoutlier(A,'movmedian',5)` returns `true` for all elements more than three local scaled MAD from the local median within a sliding window containing five elements.

`TF = isoutlier(___,dim)` operates along dimension `dim` of `A` for any of the previous syntaxes. For example, `isoutlier(A,2)` operates on each row of a matrix `A`.

`TF = isoutlier(___,Name,Value)` specifies additional parameters for detecting outliers using one or more name-value pair arguments. For example, `isoutlier(A,'SamplePoints',t)` detects outliers in `A` relative to the corresponding elements of a time vector `t`.

`[TF,L,U,C] = isoutlier(___)` also returns the lower and upper thresholds and the center value used by the outlier detection method.

## Examples

Find the outliers in a vector of data. A logical 1 in the output indicates the location of an outlier.

```
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];
TF = isoutlier(A)
```
```
TF = 1x15 logical array

   0   0   0   1   0   0   0   0   1   0   0   0   0   0   0
```

Define outliers as points more than three standard deviations from the mean, and find the locations of outliers in a vector.

```
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];
TF = isoutlier(A,'mean')
```
```
TF = 1x15 logical array

   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0
```

Create a vector of data containing a local outlier.

```
x = -2*pi:0.1:2*pi;
A = sin(x);
A(47) = 0;
```

Create a time vector that corresponds to the data in `A`.

```
t = datetime(2017,1,1,0,0,0) + hours(0:length(x)-1);
```

Define outliers as points more than three local scaled MAD away from the local median within a sliding window. Find the locations of the outliers in `A` relative to the points in `t` with a window size of 5 hours. Plot the data and detected outliers.

```
TF = isoutlier(A,'movmedian',hours(5),'SamplePoints',t);
plot(t,A,t(TF),A(TF),'x')
legend('Data','Outlier')
```

Find outliers for each row of a matrix.

Create a matrix of data containing outliers along the diagonal.

```
A = magic(5) + diag(200*ones(1,5))
```
```
A =

   217    24     1     8    15
    23   205     7    14    16
     4     6   213    20    22
    10    12    19   221     3
    11    18    25     2   209
```

Find the locations of outliers based on the data in each row.

```
TF = isoutlier(A,2)
```
```
TF = 5x5 logical array

   1   0   0   0   0
   0   1   0   0   0
   0   0   1   0   0
   0   0   0   1   0
   0   0   0   0   1
```

Create a vector of data containing an outlier. Find and plot the location of the outlier, and the thresholds and center value determined by the outlier method. The center value is the median of the data, and the upper and lower thresholds are three scaled MAD above and below the median.

```
x = 1:10;
A = [60 59 49 49 58 100 61 57 48 58];
[TF,L,U,C] = isoutlier(A);
plot(x,A,x(TF),A(TF),'x',x,L*ones(1,10),x,U*ones(1,10),x,C*ones(1,10))
legend('Original Data','Outlier','Lower Threshold','Upper Threshold','Center Value')
```

## Input Arguments

Input data `A`, specified as a vector, matrix, multidimensional array, table, or timetable.

If `A` is a table, then its variables must be of type `double` or `single`, or you can use the `'DataVariables'` name-value pair to list `double` or `single` variables explicitly. Specifying variables is useful when you are working with a table that contains variables with data types other than `double` or `single`.

If `A` is a timetable, then `isoutlier` operates only on the table elements. Row times must be unique and listed in ascending order.

Data Types: `double` | `single` | `table` | `timetable`

Method for determining outliers (`method`), specified as one of the following:

| Method | Description |
| --- | --- |
| `'median'` | Returns `true` for elements more than three scaled MAD from the median. The scaled MAD is defined as `c*median(abs(A-median(A)))` where `c=-1/(sqrt(2)*erfcinv(3/2))`. |
| `'mean'` | Returns `true` for elements more than three standard deviations from the mean. This method is faster but less robust than `'median'`. |
| `'quartiles'` | Returns `true` for elements more than 1.5 interquartile ranges above the upper quartile or below the lower quartile. This method is useful when the data in `A` is not normally distributed. |
| `'grubbs'` | Applies Grubbs’s test for outliers, which removes one outlier per iteration based on hypothesis testing. This method assumes that the data in `A` is normally distributed. |
| `'gesd'` | Applies the generalized extreme Studentized deviate test for outliers. This iterative method is similar to `'grubbs'`, but can perform better when there are multiple outliers masking each other. |
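To compare the methods listed above side by side, the following minimal sketch runs each one on the sample vector from the Examples section and prints the flagged indices. The loop and the variable name `methods` are illustrative only and are not part of the `isoutlier` interface.

```
% Sample vector taken from the Examples section.
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

% Method names as listed in the table above (illustrative loop variable).
methods = {'median','mean','quartiles','grubbs','gesd'};

for k = 1:numel(methods)
    TF = isoutlier(A,methods{k});
    % Print the method name and the indices flagged as outliers.
    fprintf('%-10s flags indices: %s\n',methods{k},mat2str(find(TF)));
end
```

For this vector, the Examples section shows that `'median'` flags elements 4 and 9, while `'mean'` flags only element 9. The results of the other methods depend on their underlying assumptions about the data.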

Moving method for determining outliers (`movmethod`), specified as one of the following:

| Method | Description |
| --- | --- |
| `'movmedian'` | Returns `true` for elements more than three local scaled MAD from the local median over a window length specified by `window`. |
| `'movmean'` | Returns `true` for elements more than three local standard deviations from the local mean over a window length specified by `window`. |

Window length `window`, specified as a positive integer scalar, a two-element vector of positive integers, a positive duration scalar, or a two-element vector of positive durations.

When `window` is a positive integer scalar, the window is centered about the current element and contains `window-1` neighboring elements. If `window` is even, then the window is centered about the current and previous elements. If `window` is a two-element vector of positive integers `[b f]`, then the window contains the current element, `b` elements backward, and `f` elements forward.

When `A` is a timetable or `'SamplePoints'` is specified as a `datetime` or `duration` vector, then `window` must be of type `duration`, and the windows are computed relative to the sample points.

Data Types: `double` | `single` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64` | `duration`
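The relationship between a scalar window and the `[b f]` form can be checked directly. The sketch below is illustrative only and reuses the sample vector from the Examples section. The odd case follows from the description above. The even case is an expectation derived from the same description (a window of 4 centered about the current and previous elements corresponds to 2 elements backward and 1 forward), not a documented guarantee.

```
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

% Odd scalar window: 5 elements centered on the current element,
% that is, 2 elements backward and 2 elements forward.
TFodd  = isoutlier(A,'movmedian',5);
TFodd2 = isoutlier(A,'movmedian',[2 2]);

% Even scalar window: 4 elements centered about the current and previous
% elements, expected to correspond to 2 backward and 1 forward.
TFeven  = isoutlier(A,'movmedian',4);
TFeven2 = isoutlier(A,'movmedian',[2 1]);

% Display whether each pair of results agrees.
disp(isequal(TFodd,TFodd2))
disp(isequal(TFeven,TFeven2))
```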

Dimension to operate along (`dim`), specified as a positive integer scalar. If no value is specified, then the default is the first array dimension whose size does not equal 1.

Consider a matrix `A`.

`isoutlier(A,1)` detects outliers based on the data in each column of `A`.

`isoutlier(A,2)` detects outliers based on the data in each row of `A`.

When `A` is a table or timetable, `dim` is not supported. `isoutlier` operates along each table or timetable variable separately.

Data Types: `double` | `single` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64`
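As a small illustration of the `dim` argument, operating along rows should give the same result as transposing the matrix, operating along columns, and transposing back. The sketch below uses the matrix from the Examples section. The variable names are illustrative.

```
% Matrix with outliers along the diagonal, as in the Examples section.
A = magic(5) + diag(200*ones(1,5));

% Detect outliers based on the data in each row.
TFrows = isoutlier(A,2);

% Same detection expressed through the column-wise form on the transpose.
TFviaTranspose = isoutlier(A.',1).';

% Display whether the two results agree.
disp(isequal(TFrows,TFviaTranspose))
```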

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside single quotes (`' '`). You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `isoutlier(A,'mean','ThresholdFactor',4)`

Detection threshold factor, specified as the comma-separated pair consisting of `'ThresholdFactor'` and a nonnegative scalar.

For methods `'grubbs'` and `'gesd'`, the detection threshold factor is a scalar ranging from 0 to 1. Values close to 0 result in a smaller number of outliers and values close to 1 result in a larger number of outliers. The default detection threshold factor is 0.05.

For methods `'median'`, `'mean'`, `'movmedian'`, and `'movmean'`, the detection threshold factor replaces the number of scaled MAD or standard deviations from the center value, which is 3 by default.

For the `'quartiles'` method, the detection threshold factor replaces the number of interquartile ranges, which is 1.5 by default.

Data Types: `double` | `single` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64`
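To see how the threshold factor enters the computation, the following sketch compares the thresholds returned for the `'mean'` method with a manual calculation. It assumes that the `'mean'` method uses the sample standard deviation as computed by `std(A)`; the names `k`, `Lmanual`, and `Umanual` are illustrative.

```
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];
k = 4;   % illustrative threshold factor, replacing the default of 3

% Thresholds and center value reported by isoutlier.
[TF,L,U,C] = isoutlier(A,'mean','ThresholdFactor',k);

% Manual thresholds, assuming the sample standard deviation std(A).
Lmanual = mean(A) - k*std(A);
Umanual = mean(A) + k*std(A);

fprintf('isoutlier: [%g, %g]  manual: [%g, %g]\n',L,U,Lmanual,Umanual);
```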

Sample points, specified as the comma-separated pair consisting of `'SamplePoints'` and a vector. The sample points represent the location of the data in `A`. Sample points do not need to be uniformly sampled. By default, the sample points vector is `[1 2 3 ...]`.

Moving windows are defined relative to the sample points, which must be sorted and contain unique elements. For example, if `t` is a vector of times corresponding to the input data, then `isoutlier(rand(1,10),'movmean',3,'SamplePoints',t)` has a window that represents the time interval between `t(i)-1.5` and `t(i)+1.5`.

When the sample points vector has data type `datetime` or `duration`, then the moving window length must have type `duration`.

Data Types: `double` | `single` | `datetime` | `duration`
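The effect of non-uniform sample points can be explored with a short sketch. The data and sample points below are hypothetical and chosen only to include a gap in the sampling. The window length of 5 is measured in the units of `t` when `'SamplePoints'` is given, and in elements otherwise.

```
% Hypothetical non-uniform sample points (sorted, unique) with a gap after 5.
t = [0 1 2 3 4 5 10 11 12 13];

% Hypothetical data with a spike at t = 4.
A = [1 2 1 2 50 1 2 1 2 1];

% Window defined relative to the sample points.
TFsampled = isoutlier(A,'movmedian',5,'SamplePoints',t);

% Window defined relative to the default sample points [1 2 3 ...].
TFdefault = isoutlier(A,'movmedian',5);

% Show the two results next to each other for comparison.
disp([TFsampled; TFdefault])
```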

Table variables, specified as the comma-separated pair consisting of `'DataVariables'` and a variable name, a cell array of variable names, a numeric vector, a logical vector, or a function handle. The `'DataVariables'` value indicates which columns of the input table to detect outliers in, and can be one of the following:

• A character vector specifying a single table variable name

• A cell array of character vectors where each element is a table variable name

• A vector of table variable indices

• A logical vector whose elements each correspond to a table variable, where `true` includes the corresponding variable and `false` excludes it

• A function handle that takes a table variable as input and returns a logical scalar

The data type associated with the indicated variable must be `double` or `single`.

Example: `'Age'`

Example: `{'Height','Weight'}`

Example: `@isnumeric`

Data Types: `char` | `cell` | `double` | `single` | `logical` | `function_handle`
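A minimal sketch of `'DataVariables'` on a table that mixes numeric and non-numeric variables follows. The table contents and variable names (`Name`, `Height`, `Weight`) are hypothetical and serve only to illustrate the two selection styles.

```
% Hypothetical table with one non-numeric and two numeric variables.
Name   = {'Ann';'Bob';'Cy';'Dee';'Eve';'Fay'};
Height = [160;162;158;161;250;159];
Weight = [55;58;54;300;56;57];
T = table(Name,Height,Weight);

% Select the variables to check by name.
TFnames = isoutlier(T,'DataVariables',{'Height','Weight'});

% Select the same variables with a function handle applied to each variable.
TFfun = isoutlier(T,'DataVariables',@isnumeric);

% Display whether both selections give the same result.
disp(isequal(TFnames,TFfun))
```

Listing the numeric variables explicitly keeps the `Name` variable, which is neither `double` nor `single`, out of the detection.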

Maximum outlier count, for the `'gesd'` method only, specified as the comma-separated pair consisting of `'MaxNumOutliers'` and a positive integer. The `'MaxNumOutliers'` value specifies the maximum number of outliers returned by the `'gesd'` method. For example, `isoutlier(A,'gesd','MaxNumOutliers',5)` returns no more than five outliers.

The default value for `'MaxNumOutliers'` is the integer nearest to 10 percent of the number of elements in `A`. Setting a larger value for the maximum number of outliers can ensure that all outliers are detected, but at the cost of reduced computational efficiency.

The `'gesd'` method assumes the non-outlier input data is sampled from an approximate normal distribution. When the data is not sampled in this way, the number of returned outliers may exceed the `'MaxNumOutliers'` value.

Data Types: `double` | `single` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64`
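The following sketch shows one way to compare the default `'gesd'` behavior with an explicit cap on the number of outliers. It reuses the sample vector from the Examples section; the variable names are illustrative, and no particular counts are assumed.

```
A = [57 59 60 100 59 58 57 58 300 61 62 60 62 58 57];

% Default maximum number of outliers.
TFdefault = isoutlier(A,'gesd');

% Explicit cap of one outlier.
TFcapped = isoutlier(A,'gesd','MaxNumOutliers',1);

fprintf('default: %d flagged, capped: %d flagged\n', ...
    nnz(TFdefault),nnz(TFcapped));
```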

## Output Arguments

Outlier indicator `TF`, returned as a vector, matrix, or multidimensional array. An element of `TF` is `true` when the corresponding element of `A` is an outlier and `false` otherwise. `TF` is the same size as `A`.

Data Types: `logical`

Lower threshold `L` used by the outlier detection method, returned as a scalar, vector, matrix, multidimensional array, table, or timetable. For example, the lower threshold of the default outlier detection method is three scaled MAD below the median of the input data. `L` has the same size as `A` in all dimensions except for the operating dimension where the length is 1.

Data Types: `double` | `single` | `table` | `timetable`

Upper threshold `U` used by the outlier detection method, returned as a scalar, vector, matrix, multidimensional array, table, or timetable. For example, the upper threshold of the default outlier detection method is three scaled MAD above the median of the input data. `U` has the same size as `A` in all dimensions except for the operating dimension where the length is 1.

Data Types: `double` | `single` | `table` | `timetable`

Center value `C` used by the outlier detection method, returned as a scalar, vector, matrix, multidimensional array, table, or timetable. For example, the center value of the default outlier detection method is the median of the input data. `C` has the same size as `A` in all dimensions except for the operating dimension where the length is 1.

Data Types: `double` | `single` | `table` | `timetable`
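The sizes of the threshold outputs and their relationship to `TF` can be checked with a short sketch. It uses the matrix from the Examples section and assumes a release with implicit expansion, so that the 5-by-1 thresholds can be compared against the 5-by-5 data directly. The name `TFmanual` is illustrative.

```
% Matrix with outliers along the diagonal, as in the Examples section.
A = magic(5) + diag(200*ones(1,5));

% Operate along rows, so the operating dimension (2) collapses to length 1.
[TF,L,U,C] = isoutlier(A,2);
disp(size(L))

% For the default method, an outlier lies below L or above U in its row.
TFmanual = (A < L) | (A > U);
disp(isequal(TF,TFmanual))
```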

### Median Absolute Deviation

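For a random variable vector $A$ made up of $N$ scalar observations, the median absolute deviation (MAD) is defined as

$$\mathrm{MAD} = \operatorname{median}\left(\left|A_i - \operatorname{median}(A)\right|\right)$$

for $i = 1,2,\ldots,N$.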

The scaled MAD is defined as `c*median(abs(A-median(A)))` where `c=-1/(sqrt(2)*erfcinv(3/2))`.
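The definition above can be checked against the outputs of `isoutlier` with a short sketch. It reuses the vector from the Examples section and rebuilds the default thresholds by hand. The names `sMAD`, `Cmanual`, `Lmanual`, and `Umanual` are illustrative.

```
A = [60 59 49 49 58 100 61 57 48 58];

% Scaling constant from the definition above (approximately 1.4826).
c = -1/(sqrt(2)*erfcinv(3/2));

% Scaled MAD, center value, and thresholds computed by hand.
sMAD    = c*median(abs(A - median(A)));
Cmanual = median(A);
Lmanual = Cmanual - 3*sMAD;
Umanual = Cmanual + 3*sMAD;

% Values reported by the default method.
[TF,L,U,C] = isoutlier(A);

fprintf('manual:    L = %g, U = %g, C = %g\n',Lmanual,Umanual,Cmanual);
fprintf('isoutlier: L = %g, U = %g, C = %g\n',L,U,C);
```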