
Pairwise distance between pairs of objects


D = pdist(X)
D = pdist(X,distance)


D = pdist(X) computes the Euclidean distance between pairs of objects in the m-by-n data matrix X. Rows of X correspond to observations, and columns correspond to variables. D is a row vector of length m(m–1)/2, with one element for each pair of observations in X. The distances are arranged in the order (2,1), (3,1), ..., (m,1), (3,2), ..., (m,2), ..., (m,m–1). D is commonly used as a dissimilarity matrix in clustering or multidimensional scaling.

To save space and computation time, D is formatted as a vector. However, you can convert this vector into a square matrix using the squareform function so that element i, j in the matrix, where i < j, corresponds to the distance between objects i and j in the original data set.
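As a small illustration of the vector and square forms, the sketch below uses three made-up points whose pairwise Euclidean distances are easy to check by hand. The data values are assumptions chosen for the example; pdist and squareform require Statistics and Machine Learning Toolbox.

```matlab
% Three points in the plane; the pairwise Euclidean distances are 5, 10, and 5.
X = [0 0; 3 4; 6 8];
D = pdist(X);        % row vector ordered (2,1), (3,1), (3,2)
Z = squareform(D);   % 3-by-3 symmetric matrix with a zero diagonal
d13 = Z(1,3);        % distance between observations 1 and 3
```

Because Z is symmetric, Z(1,3) and Z(3,1) hold the same value.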

D = pdist(X,distance) computes the distance between objects in the data matrix, X, using the method specified by distance, which can be any of the following:


'euclidean'

Euclidean distance (default).


'squaredeuclidean'

Squared Euclidean distance. (This option is provided for efficiency only. It does not satisfy the triangle inequality.)


'seuclidean'

Standardized Euclidean distance. Each coordinate difference between rows in X is scaled by dividing by the corresponding element of the standard deviation S=nanstd(X). To specify another value for S, use D = pdist(X,'seuclidean',S).


'cityblock'

City block metric.


'minkowski'

Minkowski distance. The default exponent is 2. To specify a different exponent, use D = pdist(X,'minkowski',P), where P is a positive scalar value of the exponent.


'chebychev'

Chebychev distance (maximum coordinate difference).


'mahalanobis'

Mahalanobis distance, using the sample covariance of X as computed by nancov. To compute the distance with a different covariance, use D = pdist(X,'mahalanobis',C), where the matrix C is symmetric and positive definite.


'cosine'

One minus the cosine of the included angle between points (treated as vectors).


'correlation'

One minus the sample correlation between points (treated as sequences of values).


'spearman'

One minus the sample Spearman's rank correlation between observations (treated as sequences of values).


'hamming'

Hamming distance, which is the percentage of coordinates that differ.


'jaccard'

One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ.

custom distance function

A distance function specified using @:
D = pdist(X,@distfun)

A distance function must be of form

d2 = distfun(XI,XJ)
taking as arguments a 1-by-n vector XI, corresponding to a single row of X, and an m2-by-n matrix XJ, corresponding to multiple rows of X. distfun must accept a matrix XJ with an arbitrary number of rows. distfun must return an m2-by-1 vector of distances d2, whose kth element is the distance between XI and XJ(k,:).
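A minimal sketch of a function that meets this contract is shown below. The handle name cityfun and the data are hypothetical, chosen only for illustration; the handle reimplements the city block metric so that its output can be compared with the built-in option.

```matlab
X = [1 2 3; 4 6 8; 0 1 0; 2 2 2];

% Hypothetical handle: city block distance from the single row XI to every
% row of XJ. bsxfun expands XI across the m2 rows of XJ, and summing along
% dimension 2 returns an m2-by-1 column vector, as pdist requires.
cityfun = @(XI,XJ) sum(abs(bsxfun(@minus,XI,XJ)),2);

Dcustom  = pdist(X,cityfun);
Dbuiltin = pdist(X,'cityblock');
maxDiff  = max(abs(Dcustom - Dbuiltin));   % should be zero up to round-off
```

Writing the function so that it handles all rows of XJ at once lets pdist call it once per observation rather than once per pair.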

The output D is arranged in the order (2,1), (3,1), ..., (m,1), (3,2), ..., (m,2), ..., (m,m–1), that is, the lower left triangle of the full m-by-m distance matrix in column order. To get the distance between the ith and jth observations (i < j), either use the formula D((i-1)*(m-i/2)+j-i), or use the helper function Z = squareform(D), which returns an m-by-m square symmetric matrix whose (i,j) entry equals the distance between observation i and observation j.
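The index arithmetic can be checked against squareform with a short sketch. It reads the formula as (i-1)*(m-i/2)+j-i: the first i-1 columns of the lower triangle contribute (i-1)*(m-i/2) entries, and the pair (i,j) is the (j-i)th entry of column i. The data and the pair (2,5) are arbitrary choices for illustration.

```matlab
X = rand(6,3);
m = size(X,1);
D = pdist(X);
Z = squareform(D);

i = 2; j = 5;                  % any pair with i < j
k = (i-1)*(m-i/2) + j - i;     % position of the pair (i,j) in D
dDirect = D(k);                % same value as Z(i,j)
```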


Given an m-by-n data matrix X, which is treated as m (1-by-n) row vectors x_1, x_2, ..., x_m, the various distances between the vectors x_s and x_t are defined as follows:

  • Euclidean distance

    $d_{st}^2 = (x_s - x_t)(x_s - x_t)'$

    Notice that the Euclidean distance is a special case of the Minkowski metric, where p = 2.

  • Standardized Euclidean distance

    $d_{st}^2 = (x_s - x_t)V^{-1}(x_s - x_t)'$

    where V is the n-by-n diagonal matrix whose jth diagonal element is S(j)^2, where S is the vector of standard deviations.

  • Mahalanobis distance

    $d_{st}^2 = (x_s - x_t)C^{-1}(x_s - x_t)'$

    where C is the covariance matrix.

  • City block metric

    $d_{st} = \sum_{j=1}^{n} |x_{sj} - x_{tj}|$

    Notice that the city block distance is a special case of the Minkowski metric, where p = 1.

  • Minkowski metric

    $d_{st} = \left( \sum_{j=1}^{n} |x_{sj} - x_{tj}|^p \right)^{1/p}$

    Notice that for the special case of p = 1, the Minkowski metric gives the city block metric, for the special case of p = 2, the Minkowski metric gives the Euclidean distance, and for the special case of p = ∞, the Minkowski metric gives the Chebychev distance.

  • Chebychev distance

    $d_{st} = \max_j |x_{sj} - x_{tj}|$

    Notice that the Chebychev distance is a special case of the Minkowski metric, where p = ∞.

  • Cosine distance

    $d_{st} = 1 - \frac{x_s x_t'}{\sqrt{(x_s x_s')(x_t x_t')}}$

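  • Correlation distance

    $d_{st} = 1 - \frac{(x_s - \bar{x}_s)(x_t - \bar{x}_t)'}{\sqrt{(x_s - \bar{x}_s)(x_s - \bar{x}_s)'}\,\sqrt{(x_t - \bar{x}_t)(x_t - \bar{x}_t)'}}$

    where $\bar{x}_s = \frac{1}{n}\sum_j x_{sj}$ and $\bar{x}_t = \frac{1}{n}\sum_j x_{tj}$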

  • Hamming distance

    $d_{st} = \#(x_{sj} \neq x_{tj}) / n$

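  • Jaccard distance

    $d_{st} = \frac{\#[(x_{sj} \neq x_{tj}) \cap ((x_{sj} \neq 0) \cup (x_{tj} \neq 0))]}{\#[(x_{sj} \neq 0) \cup (x_{tj} \neq 0)]}$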

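  • Spearman distance

    $d_{st} = 1 - \frac{(r_s - \bar{r}_s)(r_t - \bar{r}_t)'}{\sqrt{(r_s - \bar{r}_s)(r_s - \bar{r}_s)'}\,\sqrt{(r_t - \bar{r}_t)(r_t - \bar{r}_t)'}}$

    where

    • $r_{sj}$ is the rank of $x_{sj}$ taken over $x_{1j}, x_{2j}, \ldots, x_{mj}$, as computed by tiedrank

    • $r_s$ and $r_t$ are the coordinate-wise rank vectors of $x_s$ and $x_t$, i.e., $r_s = (r_{s1}, r_{s2}, \ldots, r_{sn})$

    • $\bar{r}_s = \frac{1}{n}\sum_j r_{sj} = \frac{n+1}{2}$

    • $\bar{r}_t = \frac{1}{n}\sum_j r_{tj} = \frac{n+1}{2}$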
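Some of these definitions can be checked numerically. The sketch below compares the Minkowski special cases p = 1 and p = 2 with the city block and Euclidean options, and evaluates the standardized Euclidean and Mahalanobis definitions directly for the first pair of rows. The data matrix and variable names are assumptions chosen for illustration, and the data contain no NaN values, so std and cov stand in for nanstd and nancov.

```matlab
X = [1 2 3; 4 6 8; 0 1 0; 2 2 2; 5 1 7];

% Minkowski special cases: p = 1 is city block, p = 2 is Euclidean.
diffP1 = max(abs(pdist(X,'minkowski',1) - pdist(X,'cityblock')));
diffP2 = max(abs(pdist(X,'minkowski',2) - pdist(X,'euclidean')));

% Evaluate two definitions by hand for rows 1 and 2, the pair stored in D(1).
xs = X(1,:);
xt = X(2,:);
V = diag(std(X).^2);                     % diagonal matrix of variances
dSeuc = sqrt((xs - xt) / V * (xs - xt)');
C = cov(X);                              % sample covariance matrix
dMaha = sqrt((xs - xt) / C * (xs - xt)');

Dse = pdist(X,'seuclidean');
Dma = pdist(X,'mahalanobis');
diffSeuc = abs(dSeuc - Dse(1));
diffMaha = abs(dMaha - Dma(1));
```

All four differences should be zero up to floating-point round-off.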


Generate random data and find the unweighted Euclidean distance and then find the weighted distance using two different methods:

% Compute the ordinary Euclidean distance.
X = randn(100, 5);
D = pdist(X,'euclidean');  % euclidean distance
% Compute the Euclidean distance with each coordinate
% difference scaled by the standard deviation.
Dstd = pdist(X,'seuclidean');
% Use a function handle to compute a distance that weights
% each coordinate contribution differently.
Wgts = [.1 .3 .3 .2 .1];     % coordinate weights
weuc = @(XI,XJ,W)(sqrt(bsxfun(@minus,XI,XJ).^2 * W'));
Dwgt = pdist(X, @(Xi,Xj) weuc(Xi,Xj,Wgts));

Introduced before R2006a