D = pdist(X)
D = pdist(X,distance)
D = pdist(X) computes the Euclidean distance between pairs of objects in the m-by-n data matrix X. Rows of X correspond to observations, and columns correspond to variables. D is a row vector of length m(m–1)/2, corresponding to pairs of observations in X. The distances are arranged in the order (2,1), (3,1), ..., (m,1), (3,2), ..., (m,2), ..., (m,m–1). D is commonly used as a dissimilarity matrix in clustering or multidimensional scaling.
To save space and computation time, D is formatted as a vector. However, you can convert this vector into a square matrix using the squareform function so that element i, j in the matrix, where i < j, corresponds to the distance between objects i and j in the original data set.
D = pdist(X,distance) computes the distance between objects in the data matrix, X, using the method specified by distance, which can be any of the following character strings.
| Metric | Description |
|---|---|
| 'euclidean' | Euclidean distance (default). |
| 'seuclidean' | Standardized Euclidean distance. Each coordinate difference between rows in X is scaled by dividing by the corresponding element of the standard deviation S=nanstd(X). To specify another value for S, use D=pdist(X,'seuclidean',S). |
| 'cityblock' | City block metric. |
| 'minkowski' | Minkowski distance. The default exponent is 2. To specify a different exponent, use D = pdist(X,'minkowski',P), where P is a positive scalar giving the exponent. |
| 'chebychev' | Chebychev distance (maximum coordinate difference). |
| 'mahalanobis' | Mahalanobis distance, using the sample covariance of X as computed by nancov. To compute the distance with a different covariance, use D = pdist(X,'mahalanobis',C), where the matrix C is symmetric and positive definite. |
| 'cosine' | One minus the cosine of the included angle between points (treated as vectors). |
| 'correlation' | One minus the sample correlation between points (treated as sequences of values). |
| 'spearman' | One minus the sample Spearman's rank correlation between observations (treated as sequences of values). |
| 'hamming' | Hamming distance, which is the fraction of coordinates that differ. |
| 'jaccard' | One minus the Jaccard coefficient, which is the fraction of nonzero coordinates that differ. |
| custom distance function | A distance function specified using @, for example D = pdist(X,@distfun). The distance function must have the form d2 = distfun(XI,XJ), taking as arguments a 1-by-n vector XI, corresponding to a single row of X, and an m2-by-n matrix XJ, corresponding to multiple rows of X. distfun must accept a matrix XJ with an arbitrary number of rows, and must return an m2-by-1 vector of distances d2 whose kth element is the distance between XI and XJ(k,:). |
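As an illustration of the custom distance function contract described in the last table row, the following minimal sketch defines a city block distance by hand and compares it with the built-in metric. The handle name cityfun and the small data matrix are made up for this example; bsxfun is used so that the 1-by-n row XI is subtracted from every row of XJ.

```matlab
% Hypothetical 3-by-2 data set chosen so the distances are easy to check.
X = [0 0; 3 4; 6 8];

% Custom distance: XI is 1-by-n, XJ is m2-by-n, the result is m2-by-1.
cityfun = @(XI,XJ) sum(abs(bsxfun(@minus, XJ, XI)), 2);

Dcustom  = pdist(X, cityfun);       % distances from the function handle
Dbuiltin = pdist(X, 'cityblock');   % built-in city block metric

% Pairs are ordered (2,1), (3,1), (3,2), so both vectors are [7 14 7].
disp([Dcustom; Dbuiltin]);
```

Summing along dimension 2 is what makes the handle return one distance per row of XJ, which is the m2-by-1 shape that pdist expects.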
The output D is arranged in the order (2,1), (3,1), ..., (m,1), (3,2), ..., (m,2), ..., (m,m–1), that is, the lower left triangle of the full m-by-m distance matrix in column order. To get the distance between the ith and jth observations (i < j), either use the formula D((i–1)*(m–i/2)+j–i), or use the helper function Z = squareform(D), which returns an m-by-m symmetric matrix whose (i,j) entry is the distance between observation i and observation j.
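The two lookup methods can be checked against each other. The sketch below uses a small random data set (the sizes, seed, and the chosen pair i = 2, j = 4 are arbitrary) and compares the indexing formula, the squareform matrix, and a direct norm computation.

```matlab
% Hypothetical data: 5 observations, 3 variables.
rng(0);
X = randn(5,3);
m = size(X,1);

D = pdist(X);         % 1-by-m(m-1)/2 vector of Euclidean distances
Z = squareform(D);    % m-by-m symmetric matrix with zeros on the diagonal

% Distance between observations i and j (i < j) via the indexing formula.
i = 2; j = 4;
k = (i-1)*(m-i/2) + j - i;
dFormula = D(k);

% Same distance from the square matrix and from a direct computation.
dSquare = Z(i,j);
dDirect = norm(X(i,:) - X(j,:));
fprintf('%g  %g  %g\n', dFormula, dSquare, dDirect);
```

For large m the vector form with the index formula avoids allocating the full m-by-m matrix, while squareform is usually more convenient for small problems.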
Given an m-by-n data matrix X, which is treated as m (1-by-n) row vectors $x_1, x_2, \ldots, x_m$, the various distances between the vectors $x_s$ and $x_t$ are defined as follows:
Euclidean distance
$$d_{st}^2 = (x_s - x_t)(x_s - x_t)'$$
Notice that the Euclidean distance is a special case of the Minkowski metric, where p = 2.
Standardized Euclidean distance
$$d_{st}^2 = (x_s - x_t)\,V^{-1}\,(x_s - x_t)'$$
where V is the n-by-n diagonal matrix whose jth diagonal element is $S(j)^2$, where S is the vector of standard deviations.
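Because V is diagonal, the standardized Euclidean distance is the ordinary Euclidean distance computed after dividing each column by its standard deviation. The following sketch checks this on random data without NaN values, where std(X) gives the same result as nanstd(X); the data sizes and seed are arbitrary.

```matlab
% Hypothetical data: 10 observations, 3 variables, no missing values.
rng(3);
X = randn(10,3);

S = std(X);                            % per-column standard deviations
Dstd = pdist(X, 'seuclidean');         % uses the default scaling

% Scale each column by its standard deviation, then use plain Euclidean.
Xs = bsxfun(@rdivide, X, S);
Dscaled = pdist(Xs, 'euclidean');

% Passing S explicitly; with S = ones(1,n) the metric reduces to Euclidean.
Dunit = pdist(X, 'seuclidean', ones(1,3));
Deuc  = pdist(X, 'euclidean');

maxDiffScaled = max(abs(Dstd - Dscaled));
maxDiffUnit   = max(abs(Dunit - Deuc));
```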
Mahalanobis distance
$$d_{st}^2 = (x_s - x_t)\,C^{-1}\,(x_s - x_t)'$$
where C is the covariance matrix.
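As a quick check of the definition, the sketch below computes one Mahalanobis distance by hand and compares it with the first element of the pdist output, which corresponds to the pair (2,1). The data and seed are arbitrary, and cov(X) stands in for nancov(X) because the data contains no NaN values.

```matlab
% Hypothetical data: 12 observations, 3 variables.
rng(4);
X = randn(12,3);

C = cov(X);                       % sample covariance of X
D = pdist(X, 'mahalanobis');      % default covariance

% Manual distance between observations 2 and 1.
delta = X(2,:) - X(1,:);
dManual = sqrt(delta / C * delta');   % delta / C is delta * inv(C)

% Supplying the covariance explicitly should give the same vector.
Dexplicit = pdist(X, 'mahalanobis', C);
```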
City block metric
$$d_{st} = \sum_{j=1}^{n} \left| x_{sj} - x_{tj} \right|$$
Notice that the city block distance is a special case of the Minkowski metric, where p=1.
Minkowski metric
$$d_{st} = \left( \sum_{j=1}^{n} \left| x_{sj} - x_{tj} \right|^{p} \right)^{1/p}$$
Notice that for the special case of p = 1, the Minkowski metric gives the city block metric, for the special case of p = 2, the Minkowski metric gives the Euclidean distance, and for the special case of p = ∞, the Minkowski metric gives the Chebychev distance.
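These special cases can be verified numerically. The sketch below compares 'minkowski' with exponents 1 and 2 against 'cityblock' and 'euclidean', and uses a large finite exponent (50, an arbitrary choice) to show the distance approaching the Chebychev distance rather than passing an infinite exponent.

```matlab
% Hypothetical data: 8 observations, 4 variables.
rng(2);
X = randn(8,4);
n = size(X,2);

D1  = pdist(X, 'minkowski', 1);
D2  = pdist(X, 'minkowski', 2);
D50 = pdist(X, 'minkowski', 50);

Dcity = pdist(X, 'cityblock');
Deuc  = pdist(X, 'euclidean');
Dcheb = pdist(X, 'chebychev');

err1 = max(abs(D1 - Dcity));   % p = 1 matches city block
err2 = max(abs(D2 - Deuc));    % p = 2 matches Euclidean

% For finite p the Minkowski distance is at least the Chebychev distance
% and exceeds it by at most a factor of n^(1/p).
relGap = max((D50 - Dcheb) ./ Dcheb);
```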
Chebychev distance
$$d_{st} = \max_{j} \left| x_{sj} - x_{tj} \right|$$
Notice that the Chebychev distance is a special case of the Minkowski metric, where p = ∞.
Cosine distance
$$d_{st} = 1 - \frac{x_s x_t'}{\sqrt{(x_s x_s')(x_t x_t')}}$$
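Correlation distance
$$d_{st} = 1 - \frac{(x_s - \bar{x}_s)(x_t - \bar{x}_t)'}{\sqrt{(x_s - \bar{x}_s)(x_s - \bar{x}_s)'}\,\sqrt{(x_t - \bar{x}_t)(x_t - \bar{x}_t)'}}$$
where
$$\bar{x}_s = \frac{1}{n} \sum_{j} x_{sj}$$
and
$$\bar{x}_t = \frac{1}{n} \sum_{j} x_{tj}$$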
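Comparing the two definitions, the correlation distance is the cosine distance applied after subtracting each observation's own mean across its coordinates. A minimal sketch of that relationship, on arbitrary random data:

```matlab
% Hypothetical data: 6 observations, 4 variables.
rng(1);
X = randn(6,4);

Dcorr = pdist(X, 'correlation');

% Center each row by its own mean, then take the cosine distance.
Xc = bsxfun(@minus, X, mean(X,2));
Dcos = pdist(Xc, 'cosine');

maxDiff = max(abs(Dcorr - Dcos));
```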
Hamming distance
$$d_{st} = \frac{\#\left(x_{sj} \neq x_{tj}\right)}{n}$$
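A small binary example makes the counting concrete. The sketch below uses a made-up 3-by-4 matrix and computes the Hamming distance together with the 'jaccard' metric from the metric table, which counts differences only over coordinates where at least one of the two observations is nonzero.

```matlab
% Hypothetical binary data: 3 observations, 4 variables.
X = [1 0 1 0;
     1 1 0 0;
     0 1 1 1];

% Pairs are ordered (2,1), (3,1), (3,2).
Dham = pdist(X, 'hamming');   % differing coordinates divided by n = 4
Djac = pdist(X, 'jaccard');   % differing coordinates among those that
                              % are nonzero in either observation

% Pair (2,1): rows differ in columns 2 and 3, so Hamming is 2/4.
% Column 4 is zero in both rows, so Jaccard is 2/3.
disp(Dham);
disp(Djac);
```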
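Jaccard distance
$$d_{st} = \frac{\#\left[\left(x_{sj} \neq x_{tj}\right) \cap \left(\left(x_{sj} \neq 0\right) \cup \left(x_{tj} \neq 0\right)\right)\right]}{\#\left[\left(x_{sj} \neq 0\right) \cup \left(x_{tj} \neq 0\right)\right]}$$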
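Spearman distance
$$d_{st} = 1 - \frac{(r_s - \bar{r}_s)(r_t - \bar{r}_t)'}{\sqrt{(r_s - \bar{r}_s)(r_s - \bar{r}_s)'}\,\sqrt{(r_t - \bar{r}_t)(r_t - \bar{r}_t)'}}$$
where
$r_{sj}$ is the rank of $x_{sj}$ taken over $x_{1j}, x_{2j}, \ldots, x_{mj}$, as computed by tiedrank
$r_s$ and $r_t$ are the coordinate-wise rank vectors of $x_s$ and $x_t$, i.e., $r_s = (r_{s1}, r_{s2}, \ldots, r_{sn})$
$$\bar{r}_s = \frac{1}{n} \sum_{j} r_{sj} = \frac{n+1}{2}$$
$$\bar{r}_t = \frac{1}{n} \sum_{j} r_{tj} = \frac{n+1}{2}$$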
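One way to sanity-check the Spearman distance is to compare it with one minus the Spearman rank correlation between observations, computed here with corr on the transposed data so that each observation becomes a column. This is a sketch on arbitrary random data without ties, and it assumes the corr function with the 'type' option is available alongside pdist.

```matlab
% Hypothetical data: 6 observations, 5 variables, no tied values.
rng(5);
X = randn(6,5);

Dsp = pdist(X, 'spearman');

% Spearman correlation between observations (columns of X').
R = corr(X', 'type', 'Spearman');
Dref = 1 - R;

% Compare the square form of the pdist output with the reference matrix.
maxDiff = max(max(abs(squareform(Dsp) - Dref)));
```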
Generate random data and find the unweighted Euclidean distance and then find the weighted distance using two different methods:
    % Compute the ordinary Euclidean distance.
    X = randn(100, 5);
    D = pdist(X,'euclidean');  % euclidean distance

    % Compute the Euclidean distance with each coordinate
    % difference scaled by the standard deviation.
    Dstd = pdist(X,'seuclidean');

    % Use a function handle to compute a distance that weights
    % each coordinate contribution differently.
    Wgts = [.1 .3 .3 .2 .1];  % coordinate weights
    weuc = @(XI,XJ,W)(sqrt(bsxfun(@minus,XI,XJ).^2 * W'));
    Dwgt = pdist(X, @(Xi,Xj) weuc(Xi,Xj,Wgts));
cluster | clusterdata | cmdscale | cophenet | dendrogram | inconsistent | linkage | pdist2 | silhouette | squareform
© 1984-2012 The MathWorks, Inc.