
pdist2: Pairwise distance between two sets of observations


D = pdist2(X,Y)
D = pdist2(X,Y,distance)
D = pdist2(X,Y,'minkowski',P)
D = pdist2(X,Y,'mahalanobis',C)
D = pdist2(X,Y,distance,'Smallest',K)
D = pdist2(X,Y,distance,'Largest',K)
[D,I] = pdist2(X,Y,distance,'Smallest',K)
[D,I] = pdist2(X,Y,distance,'Largest',K)


D = pdist2(X,Y) returns a matrix D containing the Euclidean distances between each pair of observations in the mx-by-n data matrix X and the my-by-n data matrix Y. Rows of X and Y correspond to observations; columns correspond to variables. D is an mx-by-my matrix, with the (i,j) entry equal to the distance between observation i in X and observation j in Y. The (i,j) entry is NaN if observation i in X or observation j in Y contains NaNs.
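For illustration, the basic computation can be sketched in pure Python (the helper name `pdist2_euclidean` is hypothetical, not part of any library):

```python
import math

def pdist2_euclidean(X, Y):
    """Pairwise Euclidean distances between the rows of X (mx-by-n)
    and the rows of Y (my-by-n); returns an mx-by-my nested list."""
    return [[math.sqrt(sum((xj - yj) ** 2 for xj, yj in zip(x, y)))
             for y in Y]
            for x in X]

X = [[0.0, 0.0], [3.0, 4.0]]
Y = [[0.0, 0.0], [6.0, 8.0]]
D = pdist2_euclidean(X, Y)  # D[i][j] = distance between X row i and Y row j
```

Note the asymmetric shape: unlike pdist, which works within a single matrix, pdist2 compares every row of X against every row of Y.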

D = pdist2(X,Y,distance) computes D using the metric specified by distance. Choices are:

'euclidean'
Euclidean distance (default).

'squaredeuclidean'
Squared Euclidean distance. (This option is provided for efficiency only. It does not satisfy the triangle inequality.)

'seuclidean'
Standardized Euclidean distance. Each coordinate difference between rows in X and Y is scaled by dividing by the corresponding element of the standard deviation computed from X, S = nanstd(X). To specify another value for S, use D = pdist2(X,Y,'seuclidean',S).

'cityblock'
City block metric.

'minkowski'
Minkowski distance. The default exponent is 2. To compute the distance with a different exponent, use D = pdist2(X,Y,'minkowski',P), where the exponent P is a positive scalar.

'chebychev'
Chebychev distance (maximum coordinate difference).

'mahalanobis'
Mahalanobis distance, using the sample covariance of X as computed by nancov. To compute the distance with a different covariance, use D = pdist2(X,Y,'mahalanobis',C), where the matrix C is symmetric and positive definite.

'cosine'
One minus the cosine of the included angle between points (treated as vectors).

'correlation'
One minus the sample correlation between points (treated as sequences of values).

'spearman'
One minus the sample Spearman's rank correlation between observations (treated as sequences of values).

'hamming'
Hamming distance, the percentage of coordinates that differ.

'jaccard'
One minus the Jaccard coefficient, the percentage of nonzero coordinates that differ.

@distfun
A custom distance function specified using @:
D = pdist2(X,Y,@distfun).

A distance function must be of the form

function D2 = distfun(ZI, ZJ)
taking as arguments a 1-by-n vector ZI containing a single observation from X or Y, and an m2-by-n matrix ZJ containing multiple observations from X or Y. It must return an m2-by-1 vector of distances D2, whose jth element is the distance between the observations ZI and ZJ(j,:).

If your data is not sparse, it is generally faster to use a built-in distance metric than a function handle.
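The function-handle contract above can be sketched in Python (both helper names here are hypothetical, chosen to mirror the distfun signature):

```python
def pdist2_with_handle(X, Y, distfun):
    # Mimics D = pdist2(X,Y,@distfun): distfun receives one observation
    # (a row of X) plus all the observations in Y, and returns one row
    # of the distance matrix D.
    return [distfun(x, Y) for x in X]

def cityblock_distfun(zi, ZJ):
    # Example distance function: city block distance from the single
    # observation zi to each row of ZJ.
    return [sum(abs(a - b) for a, b in zip(zi, zj)) for zj in ZJ]

D = pdist2_with_handle([[0, 0], [1, 1]], [[2, 3], [0, 0]], cityblock_distfun)
```

Because the handle is called once per row of X with the whole of Y, vectorizing the inner loop (as the MATLAB weighted-Euclidean example later on this page does with bsxfun) is what keeps custom metrics reasonably fast.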

D = pdist2(X,Y,distance,'Smallest',K) returns a K-by-my matrix D containing the K smallest pairwise distances to observations in X for each observation in Y. pdist2 sorts the distances in each column of D in ascending order. D = pdist2(X,Y,distance,'Largest',K) returns the K largest pairwise distances sorted in descending order. If K is greater than mx, pdist2 returns an mx-by-my distance matrix. For each observation in Y, pdist2 finds the K smallest or largest distances by computing and comparing the distance values to all the observations in X.

[D,I] = pdist2(X,Y,distance,'Smallest',K) returns a K-by-my matrix I containing indices of the observations in X corresponding to the K smallest pairwise distances in D. [D,I] = pdist2(X,Y,distance,'Largest',K) returns indices corresponding to the K largest pairwise distances.
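Conceptually, 'Smallest' is a per-column selection on the full distance matrix. A minimal Python sketch (the helper name `smallest_k` is hypothetical; it takes a precomputed matrix rather than recomputing distances):

```python
def smallest_k(D, k):
    # For each column j of the full mx-by-my distance matrix D, keep the
    # k smallest distances in ascending order together with the (0-based)
    # row indices in X that produced them. MATLAB's pdist2 returns
    # 1-based indices.
    mx, my = len(D), len(D[0])
    k = min(k, mx)
    Dk = [[0.0] * my for _ in range(k)]
    Ik = [[0] * my for _ in range(k)]
    for j in range(my):
        order = sorted(range(mx), key=lambda i: D[i][j])[:k]
        for r, i in enumerate(order):
            Dk[r][j] = D[i][j]
            Ik[r][j] = i
    return Dk, Ik

D = [[4.0, 1.0], [2.0, 3.0], [0.5, 5.0]]
Dk, Ik = smallest_k(D, 2)
```

'Largest' is the same selection with the sort order reversed.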


Given an mx-by-n data matrix X, treated as mx (1-by-n) row vectors x_1, x_2, ..., x_mx, and an my-by-n data matrix Y, treated as my (1-by-n) row vectors y_1, y_2, ..., y_my, the various distances between the vectors x_s and y_t are defined as follows:

  • Euclidean distance

        d_{st}^2 = (x_s - y_t)(x_s - y_t)'

    Notice that the Euclidean distance is a special case of the Minkowski metric, where p = 2.

  • Standardized Euclidean distance

        d_{st}^2 = (x_s - y_t) V^{-1} (x_s - y_t)'

    where V is the n-by-n diagonal matrix whose jth diagonal element is S(j)^2, where S is the vector of standard deviations.

  • Mahalanobis distance

        d_{st}^2 = (x_s - y_t) C^{-1} (x_s - y_t)'

    where C is the covariance matrix.

  • City block metric

        d_{st} = \sum_{j=1}^{n} |x_{sj} - y_{tj}|

    Notice that the city block distance is a special case of the Minkowski metric, where p = 1.

  • Minkowski metric

        d_{st} = \left( \sum_{j=1}^{n} |x_{sj} - y_{tj}|^p \right)^{1/p}

    Notice that for the special case of p = 1, the Minkowski metric gives the city block metric; for the special case of p = 2, the Minkowski metric gives the Euclidean distance; and for the special case of p = ∞, the Minkowski metric gives the Chebychev distance.

  • Chebychev distance

        d_{st} = \max_j |x_{sj} - y_{tj}|

    Notice that the Chebychev distance is a special case of the Minkowski metric, where p = ∞.

  • Cosine distance

        d_{st} = 1 - \frac{x_s y_t'}{\sqrt{(x_s x_s')(y_t y_t')}}

  • Correlation distance

        d_{st} = 1 - \frac{(x_s - \bar{x}_s)(y_t - \bar{y}_t)'}{\sqrt{(x_s - \bar{x}_s)(x_s - \bar{x}_s)'} \sqrt{(y_t - \bar{y}_t)(y_t - \bar{y}_t)'}}

    where \bar{x}_s = \frac{1}{n} \sum_j x_{sj} and \bar{y}_t = \frac{1}{n} \sum_j y_{tj}

  • Hamming distance

        d_{st} = \#(x_{sj} \neq y_{tj}) / n

  • Jaccard distance

        d_{st} = \frac{\#\left[ (x_{sj} \neq y_{tj}) \wedge ((x_{sj} \neq 0) \vee (y_{tj} \neq 0)) \right]}{\#\left[ (x_{sj} \neq 0) \vee (y_{tj} \neq 0) \right]}

  • Spearman distance

        d_{st} = 1 - \frac{(r_s - \bar{r}_s)(r_t - \bar{r}_t)'}{\sqrt{(r_s - \bar{r}_s)(r_s - \bar{r}_s)'} \sqrt{(r_t - \bar{r}_t)(r_t - \bar{r}_t)'}}

    where

    • r_{sj} is the rank of x_{sj} taken over x_{1j}, x_{2j}, ..., x_{mx,j}, as computed by tiedrank

    • r_{tj} is the rank of y_{tj} taken over y_{1j}, y_{2j}, ..., y_{my,j}, as computed by tiedrank

    • r_s and r_t are the coordinate-wise rank vectors of x_s and y_t, i.e. r_s = (r_{s1}, r_{s2}, ..., r_{sn}) and r_t = (r_{t1}, r_{t2}, ..., r_{tn})

    • \bar{r}_s = \frac{1}{n} \sum_j r_{sj} = \frac{n+1}{2}

    • \bar{r}_t = \frac{1}{n} \sum_j r_{tj} = \frac{n+1}{2}
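The Minkowski special cases noted in the definitions can be checked numerically. A minimal Python sketch (the helper name `minkowski` is hypothetical):

```python
def minkowski(x, y, p):
    # Minkowski distance between vectors x and y; p = float('inf')
    # gives the Chebychev (maximum coordinate difference) limit.
    if p == float('inf'):
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x, y = (0.0, 0.0), (3.0, 4.0)
d1 = minkowski(x, y, 1)               # city block: 3 + 4 = 7
d2 = minkowski(x, y, 2)               # Euclidean: sqrt(9 + 16) = 5
dinf = minkowski(x, y, float('inf'))  # Chebychev: max(3, 4) = 4
```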


Generate random data and find the unweighted Euclidean distance, then find the weighted distance using two different methods:

% Compute the ordinary Euclidean distance
X = randn(100, 5);
Y = randn(25, 5);
D = pdist2(X,Y,'euclidean'); % euclidean distance

% Compute the Euclidean distance with each coordinate
% difference scaled by the standard deviation
Dstd = pdist2(X,Y,'seuclidean');

% Use a function handle to compute a distance that weights
% each coordinate contribution differently.
Wgts = [.1 .3 .3 .2 .1];
weuc = @(XI,XJ,W)(sqrt(bsxfun(@minus,XI,XJ).^2 * W'));
Dwgt = pdist2(X,Y, @(Xi,Xj) weuc(Xi,Xj,Wgts));
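The weighted-Euclidean handle in the MATLAB example above can be mirrored in pure Python for comparison (the helper name `weuc` is reused here only as an illustration):

```python
import math

def weuc(xi, YJ, w):
    # Weighted Euclidean distance from one observation xi to each row of
    # YJ: each squared coordinate difference is scaled by its weight
    # before summing, as in the bsxfun-based MATLAB handle above.
    return [math.sqrt(sum(wj * (a - b) ** 2 for wj, a, b in zip(w, xi, yj)))
            for yj in YJ]

w = [0.1, 0.3, 0.3, 0.2, 0.1]
d = weuc([0.0] * 5, [[1.0] * 5], w)
# each coordinate difference is 1, so the result is sqrt(sum(w)),
# approximately 1.0 since the weights sum to 1
```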

Introduced in R2010a
