Asked by ir on 11 May 2013

Hey,

I tried the mahal function to calculate the Mahalanobis distance between 2 vectors of 27 variables (columns), e.g. mahal(X,Y) where X and Y are the 2 vectors, but it comes up with an error. After a few minutes of research I gathered that I can't use it like this, but I'm still not sure why. Can someone explain to me why?

Also, below is an example of the mahal function:

mahal([1.55 5 32],[5.76 43 34; 6.7 32 5; 3 3 5; 34 12 6;])

ans =

11.1706

Can someone clarify how MATLAB calculated the answer in this case?


Answer by Roger Stafford on 11 May 2013

Accepted answer

There appears to be a misconception here. Mahalanobis distance has no meaning between two multiple-element vectors. Ideally it is a distance between a vector (or distances between a set of vectors) and some given distribution defined by means and covariances. See the Wikipedia website

http://en.wikipedia.org/wiki/Mahalanobis_distance

In Mathworks' 'mahal' function

d = mahal(Y,X) ,

that distribution is approximated from the X array, which must have more rows than columns to be meaningful. In your case you were trying to use only one row in the second argument and that would not give a meaningful distribution. I suggest you carefully read the documentation at:

http://www.mathworks.com/help/stats/mahal.html
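To see why a couple of 27-variable rows cannot serve as the reference sample, here is a minimal sketch. The data are made up for illustration (they are not from the question), and only the base functions cov and rank are used. With two observations the sample covariance has rank at most 1, so the inverse that the Mahalanobis distance needs does not exist.

```matlab
% Hypothetical reference sample: 2 observations (rows) of 27 variables (columns).
% These numbers are invented purely for illustration.
X = [1:27; (1:27).^2];

S = cov(X);            % 27-by-27 sample covariance, rows treated as observations
r = rank(S);           % at most (number of rows - 1)

fprintf('cov(X) is %d-by-%d with rank %d\n', size(S,1), size(S,2), r);

% A covariance matrix is invertible only if its rank equals the number of
% variables, which requires more rows than columns in X.
isUsable = (r == size(X,2));
fprintf('Usable as a reference distribution: %d\n', isUsable);
```

The same check on a sample with many more rows than columns would normally report full rank, which is the situation mahal expects for its second argument.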

Roger Stafford on 12 May 2013

I neglected to answer your question about the example. The equivalent MATLAB code is given below. It gets the same answer as you saw. What you get here is the squared Mahalanobis distance between the vector Y and the distribution with covariance S and mean mu, both of which are estimated from the rows of X.

Note that this is the square of the actual mahalanobis distance. To get the latter, take the square root of this value.

In this case X is a set of four rows of three-element vectors which are supposedly representative of some distribution. In general you would want a great many more than four sample vectors to get a truly representative sample of such a three-dimensional distribution.

Y = [1.55 5 32];
X = [5.76 43 34;6.7 32 5;3 3 5;34 12 6];
S = cov(X);
mu = mean(X,1);
d = (Y-mu)*inv(S)*(Y-mu)'
% d = ((Y-mu)/S)*(Y-mu)'; % <-- Mathworks prefers this way

d =

11.1706
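The same calculation can be extended to several query rows at once, with the square root taken to get the distance itself. This is only a sketch: the variable names Yc, d2 and dist are illustrative, the second query row (the sample mean) is added just as a sanity check, and bsxfun is used so that it also runs on releases without implicit expansion.

```matlab
X  = [5.76 43 34; 6.7 32 5; 3 3 5; 34 12 6];   % reference sample from the question
mu = mean(X,1);                                % sample mean (1-by-3)
S  = cov(X);                                   % sample covariance (3-by-3)

% Query points, one per row. The second row is the sample mean itself,
% included only as a sanity check (its distance should be zero).
Y  = [1.55 5 32; mu];

Yc   = bsxfun(@minus, Y, mu);      % centre each query row on the sample mean
d2   = sum((Yc / S) .* Yc, 2);     % squared Mahalanobis distance, one per row
dist = sqrt(d2);                   % actual Mahalanobis distance

disp([d2 dist]);                   % first row is roughly 11.1706 and 3.3422
```

If the Statistics Toolbox is available, mahal(Y,X) should return the same values as d2, since it reports the squared distance.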

babi psylon on 12 Nov 2013

hi Roger

Can you elaborate on the difference between mahal() and pdist2()? My full question is listed here: http://www.mathworks.com/matlabcentral/answers/105829-mahalanobis-distance-in-matlab-pdist2-vs-mahal-function

Babi
