Statistics Toolbox™
y = pdist(X)
y = pdist(X,metric)
y = pdist(X,distfun)
y = pdist(X,'minkowski',p)
y = pdist(X) computes the Euclidean distance between pairs of objects in the n-by-p data matrix X. Rows of X correspond to observations; columns correspond to variables. y is a row vector of length n(n–1)/2, corresponding to pairs of observations in X. The distances are arranged in the order (2,1), (3,1), ..., (n,1), (3,2), ..., (n,2), ..., (n,n–1). y is commonly used as a dissimilarity matrix in clustering or multidimensional scaling.
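The pair ordering can be reproduced with a short sketch. It assumes n = 4 as an illustrative value and uses only the base functions tril, ones, and find; the variable names are hypothetical.

```matlab
% Enumerate the (row, column) pairs in the order pdist reports distances.
n = 4;                              % illustrative number of observations
[I, J] = find(tril(ones(n), -1));   % strictly lower triangle, column by column
pairs = [I J]                       % (2,1), (3,1), (4,1), (3,2), (4,2), (4,3)
numel(I) == n*(n-1)/2               % matches the stated vector length
```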
To save space and computation time, y is formatted as a vector. However, you can convert this vector into a square matrix using the squareform function so that element i, j in the matrix, where i < j, corresponds to the distance between objects i and j in the original data set.
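As a sketch of the conversion, the following uses the same 4-by-2 matrix as the example at the end of this page. The closed-form index k is not stated in the text above; it follows from the pair ordering and is offered here only as an aid, with i, j, and k as illustrative names.

```matlab
X = [1 2; 1 3; 2 2; 3 1];
n = size(X, 1);
y = pdist(X);            % 1-by-6 vector of Euclidean distances
Z = squareform(y);       % 4-by-4 symmetric matrix with zero diagonal

% For i < j, the distance between objects i and j sits at position k in y.
i = 2; j = 4;
k = (i-1)*(n - i/2) + j - i;
[Z(i,j) y(k)]            % both entries hold the same distance

% squareform also converts the square matrix back to the vector form.
isequal(squareform(Z), y)
```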
y = pdist(X,metric) computes the distance between objects in the data matrix, X, using the method specified by metric, which can be any of the following character strings.
| Metric | Description |
|---|---|
| 'euclidean' | Euclidean distance (default) |
| 'seuclidean' | Standardized Euclidean distance. Each coordinate in the sum of squares is inverse weighted by the sample variance of that coordinate. |
| 'mahalanobis' | Mahalanobis distance |
| 'cityblock' | City block metric |
| 'minkowski' | Minkowski metric |
| 'cosine' | One minus the cosine of the included angle between points (treated as vectors) |
| 'correlation' | One minus the sample correlation between points (treated as sequences of values). |
| 'spearman' | One minus the sample Spearman's rank correlation between observations, treated as sequences of values |
| 'hamming' | Hamming distance, the proportion of coordinates that differ |
| 'jaccard' | One minus the Jaccard coefficient, the proportion of nonzero coordinates that differ |
| 'chebychev' | Chebychev distance (maximum coordinate difference) |
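A brief sketch of passing a metric name, using a small made-up matrix. The final lines check the table's description of 'seuclidean' by rescaling each column by its sample standard deviation; that equivalent formulation is an assumption made here for illustration, not something taken from the text.

```matlab
X = [1 2; 1 3; 2 2; 3 1];

yc = pdist(X, 'cityblock')    % sum of absolute coordinate differences
ym = pdist(X, 'chebychev')    % largest absolute coordinate difference

% 'seuclidean' divides each squared difference by that column's sample
% variance, so it should match plain Euclidean distance on rescaled data.
Xs = bsxfun(@rdivide, X, std(X));
max(abs(pdist(X, 'seuclidean') - pdist(Xs)))   % expected to be near zero
```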
y = pdist(X,distfun) accepts a function handle distfun to a metric of the form
d = distfun(u,V)
which takes as arguments a 1-by-p vector u, corresponding to a single row of X, and an m-by-p matrix V, corresponding to multiple rows of X. distfun must accept a matrix V with an arbitrary number of rows. distfun must return an m-by-1 vector of distances d, whose kth element is the distance between u and V(k,:).
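One way to write such a function is sketched below. The weighted Euclidean form and the weight vector w are illustrative choices, not something the toolbox prescribes; bsxfun is used to subtract u from every row of V.

```matlab
X = [1 2; 1 3; 2 2; 3 1];
w = [1 0.5];    % hypothetical per-variable weights (1-by-p)

% u is 1-by-p, V is m-by-p; the result is an m-by-1 column of distances.
weuc = @(u, V) sqrt(bsxfun(@minus, V, u).^2 * w');

y = pdist(X, weuc)
```

With all weights equal to 1, this handle should reproduce the default Euclidean result of pdist(X).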
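y = pdist(X,'minkowski',p) computes the distance between objects in the data matrix, X, using the Minkowski metric. p is the exponent used in the Minkowski computation; its default value is 2. This p is a positive scalar and is unrelated to the number of columns in the n-by-p description of X above.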
Given an m-by-n data matrix X (m observations and n variables; the description above writes the same size as n-by-p), which is treated as m (1-by-n) row vectors x1, x2, ..., xm, the various distances between the vectors xr and xs are defined as follows:
Euclidean distance
$$d_{rs}^2 = (x_r - x_s)(x_r - x_s)'$$
Standardized Euclidean distance
$$d_{rs}^2 = (x_r - x_s)\,D^{-1}\,(x_r - x_s)'$$
where D is the diagonal matrix with diagonal elements given by $v_j^2$, which denotes the variance of the variable $X_j$ over the m objects.
Mahalanobis distance
$$d_{rs}^2 = (x_r - x_s)\,V^{-1}\,(x_r - x_s)'$$
where V is the sample covariance matrix.
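The Mahalanobis distance can be checked by hand for a single pair. This sketch assumes the same 4-by-2 matrix as the example at the end of this page and uses cov for the sample covariance; dManual is an illustrative name.

```matlab
X = [1 2; 1 3; 2 2; 3 1];
V = cov(X);                    % sample covariance matrix of the columns
d = X(1,:) - X(2,:);           % difference between observations 1 and 2
dManual = sqrt(d / V * d');    % d / V computes d * inv(V)

y = pdist(X, 'mahalanobis');
abs(dManual - y(1))            % expected to be near zero
```

For this data the squared distance works out to 5.5, so both values are about 2.3452.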
City block metric
$$d_{rs} = \sum_{j=1}^{n} \left| x_{rj} - x_{sj} \right|$$
Minkowski metric
$$d_{rs} = \left\{ \sum_{j=1}^{n} \left| x_{rj} - x_{sj} \right|^{p} \right\}^{1/p}$$
Notice that for the special case of p = 1, the Minkowski metric gives the City Block metric, and for the special case of p = 2, the Minkowski metric gives the Euclidean distance.
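These two special cases can be confirmed numerically. The sketch below also tries p = 3 and compares one entry against a hand computation; the data and variable names are illustrative.

```matlab
X = [1 2; 1 3; 2 2; 3 1];

max(abs(pdist(X,'minkowski',1) - pdist(X,'cityblock')))   % near zero
max(abs(pdist(X,'minkowski',2) - pdist(X,'euclidean')))   % near zero

% A different exponent, checked by hand for observations 1 and 4.
p = 3;
y3 = pdist(X, 'minkowski', p);
manual = sum(abs(X(1,:) - X(4,:)).^p)^(1/p);   % (2^3 + 1^3)^(1/3)
abs(manual - y3(3))                            % y3(3) is the (1,4) pair
```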
Cosine distance
$$d_{rs} = 1 - \frac{x_r x_s'}{\left(x_r x_r'\right)^{1/2} \left(x_s x_s'\right)^{1/2}}$$
Correlation distance
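$$d_{rs} = 1 - \frac{(x_r - \bar{x}_r)(x_s - \bar{x}_s)'}{\left[(x_r - \bar{x}_r)(x_r - \bar{x}_r)'\right]^{1/2} \left[(x_s - \bar{x}_s)(x_s - \bar{x}_s)'\right]^{1/2}}$$
where
$$\bar{x}_r = \frac{1}{n} \sum_{j=1}^{n} x_{rj}$$
and
$$\bar{x}_s = \frac{1}{n} \sum_{j=1}^{n} x_{sj}$$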
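The cosine and correlation distances differ only in that the correlation distance subtracts each row's mean first. A minimal sketch of both, on made-up data with illustrative variable names:

```matlab
X = [1 2 4; 2 1 3; 5 3 1];     % made-up data: 3 observations, 3 variables
xr = X(1,:);  xs = X(2,:);

% Cosine distance: one minus the normalized inner product.
dCos = 1 - (xr*xs') / (sqrt(xr*xr') * sqrt(xs*xs'));

% Correlation distance: the same expression after removing each row's mean.
cr = xr - mean(xr);  cs = xs - mean(xs);
dCorr = 1 - (cr*cs') / (sqrt(cr*cr') * sqrt(cs*cs'));

yCos  = pdist(X, 'cosine');
yCorr = pdist(X, 'correlation');
[abs(dCos - yCos(1)), abs(dCorr - yCorr(1))]   % both expected near zero
```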
Hamming distance
$$d_{rs} = \frac{\#\left(x_{rj} \neq x_{sj}\right)}{n}$$
Jaccard distance
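$$d_{rs} = \frac{\#\left[\left(x_{rj} \neq x_{sj}\right) \wedge \left(\left(x_{rj} \neq 0\right) \vee \left(x_{sj} \neq 0\right)\right)\right]}{\#\left[\left(x_{rj} \neq 0\right) \vee \left(x_{sj} \neq 0\right)\right]}$$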
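For binary data, the Hamming and Jaccard distances can be computed by counting. The sketch below uses a made-up matrix B and illustrative variable names, and compares the hand counts for the first pair of rows with the pdist output.

```matlab
B = [1 0 1 0; 1 1 0 0; 0 0 1 1];   % made-up binary data
xr = B(1,:);  xs = B(2,:);

dHam = mean(xr ~= xs);                   % 2 of 4 coordinates differ
nz   = (xr ~= 0) | (xs ~= 0);            % coordinates where either is nonzero
dJac = sum((xr ~= xs) & nz) / sum(nz);   % 2 of those 3 coordinates differ

yHam = pdist(B, 'hamming');
yJac = pdist(B, 'jaccard');
[abs(dHam - yHam(1)), abs(dJac - yJac(1))]   % both expected near zero
```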
X = [1 2; 1 3; 2 2; 3 1]
X =
1 2
1 3
2 2
3 1
Y = pdist(X,'mahalanobis')
Y =
2.3452 2.0000 2.3452 1.2247 2.4495 1.2247
Y = pdist(X)
Y =
1.0000 1.0000 2.2361 1.4142 2.8284 1.4142
squareform(Y)
ans =
0 1.0000 1.0000 2.2361
1.0000 0 1.4142 2.8284
1.0000 1.4142 0 1.4142
2.2361 2.8284 1.4142 0

See also: cluster, clusterdata, cmdscale, cophenet, dendrogram, inconsistent, linkage, silhouette, squareform
© 1984-2008 The MathWorks, Inc.