d = mahal(Y,X)
d = mahal(Y,X) computes the Mahalanobis distance (in squared units) of each observation in Y from the reference sample in matrix X. If Y is n-by-m, where n is the number of observations and m is the dimension of the data, then d is n-by-1. X and Y must have the same number of columns, but can have different numbers of rows. X must have more rows than columns.
For each observation I, the Mahalanobis distance is defined by
d(I) = (Y(I,:)-mu)*inv(SIGMA)*(Y(I,:)-mu)',
where mu and SIGMA are the sample mean and covariance of the data in X. mahal performs an equivalent, but more efficient, computation.
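The definition above can be sketched outside MATLAB as well. The following NumPy version is illustrative, not part of the toolbox: the function name `mahal_sq` is an assumption, and the Cholesky solve stands in for the "equivalent, but more efficient" computation that avoids forming inv(SIGMA) explicitly.

```python
import numpy as np

def mahal_sq(Y, X):
    """Squared Mahalanobis distance of each row of Y from the sample in X.

    Mirrors d(I) = (Y(I,:)-mu)*inv(SIGMA)*(Y(I,:)-mu)', where mu and SIGMA
    are the sample mean and covariance of X.
    """
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)    # sample covariance (n-1 divisor)
    diff = Y - mu                      # center Y at the sample mean of X
    # More efficient than inverting SIGMA: factor SIGMA = L*L' and solve
    # L*z = diff', so that sum(z.^2) = diff * inv(SIGMA) * diff'.
    L = np.linalg.cholesky(sigma)
    z = np.linalg.solve(L, diff.T)
    return np.sum(z**2, axis=0)
```

Solving against the Cholesky factor rather than inverting SIGMA is both cheaper and numerically better behaved when the covariance is near-singular.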
Generate correlated bivariate data.
X = mvnrnd([0;0],[1 .9;.9 1],100);
Y = [1 1;1 -1;-1 1;-1 -1];
Compute the Mahalanobis distance of observations in Y from the reference sample in X.
d1 = mahal(Y,X)
d1 =
    0.6288
   19.3520
   21.1384
    0.9404
Compute their squared Euclidean distances from the mean of X.
d2 = sum((Y-repmat(mean(X),4,1)).^2, 2)
d2 =
    1.6170
    1.9334
    2.1094
    2.4258
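The same contrast can be reproduced in NumPy. This is a sketch: X is drawn at random, so the exact distances will differ from the values shown above, but the ordering between on-diagonal and off-diagonal points is stable.

```python
import numpy as np

rng = np.random.default_rng(1)
# Correlated bivariate reference sample, as in the MATLAB example.
X = rng.multivariate_normal([0, 0], [[1, 0.9], [0.9, 1]], 100)
Y = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)

mu = X.mean(axis=0)
diff = Y - mu
# Squared Mahalanobis distances (naive quadratic form; fine for 4 points).
d1 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(np.cov(X, rowvar=False)), diff)
# Squared Euclidean distances from the mean of X.
d2 = np.sum(diff**2, axis=1)

# Points along the correlation axis ([1,1] and [-1,-1]) come out far
# closer in Mahalanobis distance than points across it ([1,-1], [-1,1]),
# while the Euclidean distances are all roughly comparable.
print(d1)
print(d2)
```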
Plot the observations with
Y values colored according to the Mahalanobis distance.
scatter(X(:,1),X(:,2))
hold on
scatter(Y(:,1),Y(:,2),100,d1,'*','LineWidth',2)
hb = colorbar;
ylabel(hb,'Mahalanobis Distance')
legend('X','Y','Location','NW')
The observations in
Y with equal coordinate values are much closer to
X in Mahalanobis distance than observations with opposite coordinate values, even though all observations are approximately equidistant from the mean of
X in Euclidean distance. Because it accounts for the covariance of the data and the scales of the different variables, the Mahalanobis distance is useful for detecting outliers in such cases.