
# mahalanobis distance for 2 vectors matlab

Asked by ir on 11 May 2013
Latest activity Commented on by babi psylon on 12 Nov 2013

Hey,

I tried to use mahal to calculate the Mahalanobis distance between 2 vectors of 27 variables (columns), e.g. mahal(X,Y) where X and Y are the 2 vectors, but it comes up with an error. After a few minutes of research I gather that I can't use it like this, but I'm still not sure why. Can someone explain why?

Also, below is an example of the mahal function:

```
mahal([1.55 5 32],[5.76 43 34; 6.7 32 5; 3 3 5; 34 12 6;])

ans =

   11.1706
```

Can someone clarify how MATLAB calculated the answer in this case?


Answer by Roger Stafford on 11 May 2013

There appears to be a misconception here. Mahalanobis distance has no meaning between two multiple-element vectors on their own. It is a distance between a vector (or distances between a set of vectors) and some given distribution defined by a mean and a covariance. See the Wikipedia article:

` http://en.wikipedia.org/wiki/Mahalanobis_distance`
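For reference, the definition can be written out as below, using row vectors to match MATLAB's layout. The symbols $y$, $\mu$, $S$, $x_i$ and $n$ are notation assumed here for illustration; they do not appear in the original question.

$$
d^2(y) = (y - \mu)\, S^{-1}\, (y - \mu)^{\mathsf T}, \qquad d(y) = \sqrt{d^2(y)}
$$

Here $\mu$ is the mean and $S$ the covariance of the distribution. When both are estimated from $n$ sample rows $x_1, \dots, x_n$, the usual estimates are

$$
\mu = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad S = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mu)^{\mathsf T} (x_i - \mu)
$$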

In Mathworks' 'mahal' function

` d = mahal(Y,X) ,`

that distribution is approximated from the X array, which must have more rows than columns to be meaningful. In your case you were trying to use only one row in the second argument and that would not give a meaningful distribution. I suggest you carefully read the documentation at:

` http://www.mathworks.com/help/stats/mahal.html`
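To see why too few rows cannot define a usable distribution, one can look at the sample covariance directly. The sketch below uses only base MATLAB functions (`cov`, `rank`); the matrix `Xsmall` is made-up data for illustration. With n rows, the sample covariance has rank at most n-1, so it cannot be inverted unless there are more rows than columns.

```matlab
% Hypothetical data: 2 observations (rows) of 3 variables (columns).
Xsmall = [1 2 3; 4 6 5];

Ssmall = cov(Xsmall);      % 3-by-3 sample covariance estimated from 2 rows
r = rank(Ssmall);          % rank is at most (number of rows - 1)
p = size(Xsmall, 2);       % full rank would require r == p

fprintf('rank(S) = %d, but rank %d is needed to invert S\n', r, p);
```

A singular covariance has no inverse, so the quadratic form that defines the distance cannot be evaluated.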

Roger Stafford on 12 May 2013

I neglected to answer your question about the example. The equivalent MATLAB code is given below, and it gets the same answer as you saw. What you get is the squared Mahalanobis distance between the vector Y and the distribution with mean mu and covariance S, both of which are estimated from the rows of X.

Note that this is the square of the actual mahalanobis distance. To get the latter, take the square root of this value.

In this case X is a set of four rows of three-element vectors which are supposedly representative of some distribution. In general you would want a great many more than four sample vectors to get a truly representative sample of such a three-dimensional distribution.

```
Y = [1.55 5 32];
X = [5.76 43 34; 6.7 32 5; 3 3 5; 34 12 6];
S = cov(X);        % sample covariance of the rows of X
mu = mean(X,1);    % column means of X
d = (Y-mu)*inv(S)*(Y-mu)'
% d = ((Y-mu)/S)*(Y-mu)'; % <-- Mathworks prefers this way

d =

   11.1706
```
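The same calculation can be sketched for several query points at once, using the right-division form from the code comment above and taking the square root at the end. The second row of `Yq` is a made-up point added for illustration, and `bsxfun` is used so that the sketch does not rely on implicit expansion, which only newer MATLAB releases support.

```matlab
X  = [5.76 43 34; 6.7 32 5; 3 3 5; 34 12 6];   % reference sample from the question
Yq = [1.55 5 32; 10 20 10];                    % row 1 from the question, row 2 hypothetical

S  = cov(X);                                   % sample covariance of X
mu = mean(X, 1);                               % column means of X
Yc = bsxfun(@minus, Yq, mu);                   % center each query row

d2 = sum((Yc / S) .* Yc, 2);                   % squared distance, one value per row of Yq
d  = sqrt(d2);                                 % Mahalanobis distance itself

% Cross-check the first row against the explicit-inverse form.
d2_inv = Yc(1,:) * inv(S) * Yc(1,:)';
```

If the Statistics Toolbox is installed, `d2` should agree with `mahal(Yq, X)` up to rounding, since `mahal` returns the squared distance.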
babi psylon on 12 Nov 2013

Hi Roger,

Can you elaborate on the difference between mahal() and pdist2()? My full question is listed here: http://www.mathworks.com/matlabcentral/answers/105829-mahalanobis-distance-in-matlab-pdist2-vs-mahal-function

Babi