mahal

Mahalanobis distance to reference samples

Syntax

d2 = mahal(Y,X)

Description

d2 = mahal(Y,X) returns the squared Mahalanobis distance of each observation in Y to the reference samples in X.

Examples

Compare Mahalanobis and Squared Euclidean Distances

Generate a correlated bivariate sample data set.

rng('default') % For reproducibility
X = mvnrnd([0;0],[1 .9;.9 1],1000);

Specify four observations that lie at the same Euclidean distance from the origin, which is the mean of the distribution that X is drawn from.

Y = [1 1;1 -1;-1 1;-1 -1];

Compute the squared Mahalanobis distance of each observation in Y to the reference samples in X.

d2_mahal = mahal(Y,X)

d2_mahal = 4×1

    1.1095
   20.3632
   19.5939
    1.0137

Compute the squared Euclidean distance of each observation in Y from the mean of X.

d2_Euclidean = sum((Y-mean(X)).^2,2)

d2_Euclidean = 4×1

    2.0931
    2.0399
    1.9625
    1.9094

Plot X and Y by using scatter and use marker color to visualize the Mahalanobis distance of Y to the reference samples in X.

scatter(X(:,1),X(:,2),10,'.') % Scatter plot with points of size 10
hold on
scatter(Y(:,1),Y(:,2),100,d2_mahal,'o','filled')
hb = colorbar;
ylabel(hb,'Mahalanobis Distance')
legend('X','Y','Location','best')

Figure contains an axes object. The axes object contains 2 objects of type scatter. These objects represent X, Y.

All observations in Y ([1,1], [-1,-1], [1,-1], and [-1,1]) are approximately equidistant from the sample mean of X in Euclidean distance. However, [1,1] and [-1,-1] are much closer to X than [1,-1] and [-1,1] in Mahalanobis distance. Because Mahalanobis distance accounts for the covariance of the data and the scales of the different variables, it is useful for detecting outliers.
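One possible way to turn these distances into an outlier flag is to compare them with a chi-square quantile. The following is only a sketch and not something mahal does itself. It assumes the reference samples are approximately multivariate normal, so that the squared distance roughly follows a chi-square distribution with m degrees of freedom. The 97.5% cutoff and the variable names cutoff and flagged are illustrative choices.

```matlab
rng('default') % For reproducibility
X = mvnrnd([0;0],[1 .9;.9 1],1000);
Y = [1 1;1 -1;-1 1;-1 -1];

d2 = mahal(Y,X);             % Squared Mahalanobis distances
m = size(X,2);               % Number of variables
cutoff = chi2inv(0.975,m);   % Assumed threshold (illustrative, not a mahal default)
flagged = d2 > cutoff        % Logical flag for each row of Y
```

With the distances shown above, only [1,-1] and [-1,1] exceed this cutoff.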

Input Arguments

`Y` — Data
n-by-m numeric matrix

Data, specified as an n-by-m numeric matrix, where n is the number of observations and m is the number of variables in each observation.

X and Y must have the same number of columns, but can have different numbers of rows.

Data Types: single | double

`X` — Reference samples
p-by-m numeric matrix

Reference samples, specified as a p-by-m numeric matrix, where p is the number of samples and m is the number of variables in each sample.

X and Y must have the same number of columns, but can have different numbers of rows. X must have more rows than columns.

Data Types: single | double
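The size requirements above can be checked before calling mahal. This is a minimal sketch with made-up data. The variable names and error messages are illustrative and are not the messages that mahal itself issues.

```matlab
X = randn(50,3);   % Reference samples: p = 50 rows, m = 3 columns
Y = randn(5,3);    % Data: n = 5 rows, m = 3 columns

[p,m] = size(X);
assert(size(Y,2) == m,'X and Y must have the same number of columns.')
assert(p > m,'X must have more rows than columns.')

d2 = mahal(Y,X);   % One squared distance for each row of Y
size(d2)
```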

Output Arguments

`d2` — Squared Mahalanobis distance
n-by-1 numeric vector

Squared Mahalanobis distance of each observation in Y to the reference samples in X, returned as an n-by-1 numeric vector, where n is the number of observations in Y.

More About

Mahalanobis Distance

The Mahalanobis distance is a measure of the distance between a sample point and a distribution.

The Mahalanobis distance from a vector y to a distribution with mean μ and covariance C is

$d = \sqrt{(y - \mu)\, C^{-1} (y - \mu)'}.$

This distance represents how far y is from the mean in number of standard deviations.

mahal returns the squared Mahalanobis distance d² from an observation in Y to the reference samples in X. In the mahal function, μ and C are the sample mean and covariance of the reference samples, respectively.
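To make the definition concrete, the following sketch evaluates the formula directly from the sample mean and sample covariance of X and compares the result with mahal. It illustrates the definition and is not a description of how mahal is implemented internally. The names mu, C, D, and d2_manual are chosen here for illustration, and Y - mu relies on implicit expansion (R2016b or later).

```matlab
rng('default') % For reproducibility
X = mvnrnd([0;0],[1 .9;.9 1],1000);
Y = [1 1;1 -1;-1 1;-1 -1];

mu = mean(X);                     % Sample mean (1-by-m row vector)
C = cov(X);                       % Sample covariance (m-by-m)
D = Y - mu;                       % Center each observation in Y
d2_manual = sum((D/C).*D,2);      % Row-wise (y - mu)*inv(C)*(y - mu)'

d2_mahal = mahal(Y,X);
max(abs(d2_manual - d2_mahal))    % Difference is at rounding-error level
```

Taking sqrt(d2_manual) gives the unsquared distance d in the formula above.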

Tips

Each time you call the mahal function, it computes the covariance matrix of the reference samples. In cases where you want to compute Mahalanobis distances between multiple sets of data and the same reference samples X, you can save computing time by calculating the covariance matrix of X only once, and supplying it to the pdist2 function. For an example, see Compute Mahalanobis Distance.
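A minimal sketch of this tip, assuming two hypothetical data sets Y1 and Y2 that share the reference samples X: the covariance matrix is computed once and passed to pdist2 together with the mean of X. Note that pdist2 returns the unsquared distance, so the result is squared here to match the output of mahal.

```matlab
rng('default') % For reproducibility
X = mvnrnd([0;0],[1 .9;.9 1],1000);
Y1 = [1 1;1 -1];     % First data set (illustrative)
Y2 = [-1 1;-1 -1];   % Second data set (illustrative)

mu = mean(X);        % Computed once
C = cov(X);          % Computed once and reused for both data sets

% Distance from each row of Y1 and Y2 to the mean of X, then squared
d2_1 = pdist2(Y1,mu,'mahalanobis',C).^2;
d2_2 = pdist2(Y2,mu,'mahalanobis',C).^2;
```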

Version History

Introduced before R2006a
