mahal

Mahalanobis distance

Syntax

d = mahal(Y,X)

Description

d = mahal(Y,X) computes the Mahalanobis distance (in squared units) of each observation in Y from the reference sample in matrix X. If Y is n-by-m, where n is the number of observations and m is the dimension of the data, d is n-by-1. X and Y must have the same number of columns, but can have different numbers of rows. X must have more rows than columns.
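
The requirement that X have more rows than columns comes from the covariance estimate. Centering removes one degree of freedom, so the sample covariance of an n-by-m matrix has rank at most n-1, and for n <= m it is singular and cannot be inverted. The short check below illustrates this with random data. It is only an illustration, not part of mahal, and the variable names (Xfew, Xmany, tol) are hypothetical.

```matlab
% Illustration (not part of mahal): why X needs more rows than columns.
m = 3;                           % number of columns (data dimension)
Xfew  = randn(m, m);             % n = m rows: too few
Xmany = randn(10*m, m);          % n = 30 rows, well above m

% Centering removes one degree of freedom, so rank(cov(X)) <= n-1.
Cfew  = cov(Xfew);
Cmany = cov(Xmany);
tol = 1e-10;                     % relative tolerance for a "zero" singular value
rank(Cfew,  tol*norm(Cfew))      % 2 for random data: Cfew is singular
rank(Cmany, tol*norm(Cmany))     % 3 for random data: Cmany is invertible
```

An explicit tolerance is passed to rank so that the rounding-level singular value left by centering is reliably counted as zero.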

For observation I, the Mahalanobis distance is defined by d(I) = (Y(I,:)-mu)*inv(SIGMA)*(Y(I,:)-mu)', where mu and SIGMA are the sample mean and covariance of the data in X. mahal performs an equivalent, but more efficient, computation.
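
This page does not say how mahal avoids forming inv(SIGMA). One standard approach, assumed here and not necessarily what mahal does internally, uses a thin QR factorization of the centered reference data. If X0 = Q*R, then SIGMA = R'*R/(n-1), so each distance reduces to a triangular solve followed by a sum of squares. The sketch generates correlated data with randn and chol instead of mvnrnd so that it needs only base MATLAB, and the names dQR and dDirect are hypothetical. It compares the result with the defining formula.

```matlab
% Sketch (assumed approach, not the toolbox source): squared Mahalanobis
% distances without forming inv(SIGMA).
X = randn(100, 2) * chol([1 .9; .9 1]);   % correlated sample, like mvnrnd([0;0],[1 .9;.9 1],100)
Y = [1 1; 1 -1; -1 1; -1 -1];             % query observations
n = size(X, 1);

mu = mean(X, 1);
X0 = X - repmat(mu, n, 1);           % centered reference sample
[~, R] = qr(X0, 0);                  % thin QR: X0 = Q*R, so SIGMA = R'*R/(n-1)
Y0 = Y - repmat(mu, size(Y, 1), 1);  % center the queries with the same mean
Z  = R' \ Y0';                       % triangular solve: Z(:,i) = inv(R')*Y0(i,:)'
dQR = (n - 1) * sum(Z.^2, 1)';       % n-by-1 column, one distance per row of Y

% Direct definition for comparison: (Y(i,:)-mu)*inv(SIGMA)*(Y(i,:)-mu)'
SIGMA = cov(X);
dDirect = sum((Y0 / SIGMA) .* Y0, 2);
```

The QR route never inverts SIGMA explicitly, which is generally better conditioned than calling inv, while giving the same values up to rounding.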

Examples

Compare Mahalanobis and Squared Euclidean Distances

Generate correlated bivariate data.

X = mvnrnd([0;0],[1 .9;.9 1],100);

Define the observations to evaluate.

Y = [1 1;1 -1;-1 1;-1 -1];

Compute the Mahalanobis distance of observations in Y from the reference sample in X.

d1 = mahal(Y,X)
d1 =

    0.6288
   19.3520
   21.1384
    0.9404

Compute their squared Euclidean distances from the mean of X.

d2 = sum((Y-repmat(mean(X),4,1)).^2, 2)
d2 =

    1.6170
    1.9334
    2.1094
    2.4258
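
The gap between d1 and d2 can be understood through whitening. If the sample covariance factors as SIGMA = L*L' (Cholesky), then d(I) equals the squared Euclidean distance of inv(L)*(Y(I,:)-mu)', so the Mahalanobis distance is an ordinary squared Euclidean distance in coordinates where the data are uncorrelated with unit variance. The sketch below regenerates comparable data with randn and chol (an assumption, so its numbers differ from d1 above), and the name dWhite is hypothetical.

```matlab
% Sketch: squared Mahalanobis distance = squared Euclidean distance after
% "whitening" with the Cholesky factor of the sample covariance.
X = randn(100, 2) * chol([1 .9; .9 1]);   % stands in for mvnrnd([0;0],[1 .9;.9 1],100)
Y = [1 1; 1 -1; -1 1; -1 -1];

mu = mean(X, 1);
Y0 = Y - repmat(mu, size(Y, 1), 1);   % center the queries
L  = chol(cov(X), 'lower');           % cov(X) = L*L'
W  = (L \ Y0')';                      % whitened, centered queries (one per row)
dWhite = sum(W.^2, 2);                % squared Euclidean distance in whitened space
```

In the whitened coordinates, the points (1,-1) and (-1,1) land far from the origin because they cut across the strong positive correlation, which is why their distances dominate d1.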

Plot X and Y, coloring each observation in Y by its Mahalanobis distance.

scatter(X(:,1),X(:,2))
hold on
scatter(Y(:,1),Y(:,2),100,d1,'*','LineWidth',2)
hb = colorbar;
ylabel(hb,'Mahalanobis Distance')
legend('X','Y','Location','NW')

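The observations in Y with equal coordinate values, (1,1) and (-1,-1), are much closer to X in Mahalanobis distance than the observations with opposite coordinate values, (1,-1) and (-1,1), even though all four are approximately equidistant from the mean of X in Euclidean distance. Because the two variables in X are strongly positively correlated (correlation 0.9), points along the line where both coordinates are equal are typical of the data, while points across that line are not. By accounting for the covariance of the data and the scales of the variables, the Mahalanobis distance is useful for detecting outliers in such cases.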
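
To turn these distances into an outlier rule, a common heuristic (assumed here, not something mahal does) compares each squared distance with a chi-square quantile whose degrees of freedom equal the data dimension. This is only approximate when mu and SIGMA are estimated from X and the data are roughly multivariate normal. For two-dimensional data, the chi-square quantile with 2 degrees of freedom has the closed form -2*log(1-p), which equals chi2inv(p,2). The 0.975 level and the names cutoff and isOutlier are arbitrary illustrative choices, and the data are regenerated with randn and chol.

```matlab
% Sketch: flag observations whose squared Mahalanobis distance exceeds the
% 97.5% quantile of a chi-square distribution with m = 2 degrees of freedom.
% Assumes roughly multivariate normal data; the 0.975 level is arbitrary.
X = randn(100, 2) * chol([1 .9; .9 1]);   % correlated reference sample
Y = [1 1; 1 -1; -1 1; -1 -1];

mu = mean(X, 1);
Y0 = Y - repmat(mu, size(Y, 1), 1);
d  = sum((Y0 / cov(X)) .* Y0, 2);     % same quantity as mahal(Y, X)

p = 0.975;
cutoff = -2 * log(1 - p);             % chi-square(2) quantile, equal to chi2inv(p,2)
isOutlier = d > cutoff                % logical n-by-1 flag per row of Y
```

In practice you would call d = mahal(Y,X) directly. The explicit computation here only keeps the sketch to base MATLAB.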
