pdist - Pairwise distance

Syntax

y = pdist(X)
y = pdist(X,metric)
y = pdist(X,distfun)
y = pdist(X,'minkowski',p)

Description

y = pdist(X) computes the Euclidean distance between pairs of objects in the n-by-p data matrix X. Rows of X correspond to observations; columns correspond to variables. y is a row vector of length n(n-1)/2, with one element for each pair of observations in X. The distances are arranged in the order (2,1), (3,1), ..., (n,1), (3,2), ..., (n,2), ..., (n,n-1). y is commonly used as a dissimilarity matrix in clustering or multidimensional scaling.

To save space and computation time, y is formatted as a vector. However, you can convert this vector into a square matrix using the squareform function, so that element (i,j) of the matrix corresponds to the distance between objects i and j in the original data set. The resulting matrix is symmetric, with zeros on its diagonal.
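The ordering described above implies a closed-form position in y for each pair of observations. The following is a minimal sketch, assuming the small example matrix used in the Examples section; the helper name idx is hypothetical and not part of the toolbox. It looks up one distance directly in y and compares it with the corresponding element of the square matrix.

```matlab
X = [1 2; 1 3; 2 2; 3 1];        % 4 observations, 2 variables
n = size(X,1);
y = pdist(X);                    % 1-by-6 vector with n*(n-1)/2 entries

% Hypothetical helper: position in y of the pair (i,j) with i < j,
% derived from the ordering (2,1), (3,1), ..., (n,n-1).
idx = @(i,j) (i-1)*(n - i/2) + j - i;

d24 = y(idx(2,4));               % distance between observations 2 and 4
S = squareform(y);               % full symmetric matrix, zeros on the diagonal
same = abs(d24 - S(2,4)) < 1e-12;
```

Indexing into y this way avoids building the n-by-n matrix when only a few distances are needed; squareform is the simpler choice when the whole matrix is required.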

y = pdist(X,metric) computes the distance between objects in the data matrix, X, using the method specified by metric, which can be any of the following character strings.

Metric           Description

'euclidean'      Euclidean distance (default)
'seuclidean'     Standardized Euclidean distance. Each coordinate in the sum of squares is inverse weighted by the sample variance of that coordinate.
'mahalanobis'    Mahalanobis distance
'cityblock'      City block metric
'minkowski'      Minkowski metric
'cosine'         One minus the cosine of the included angle between points (treated as vectors)
'correlation'    One minus the sample correlation between points (treated as sequences of values)
'spearman'       One minus the sample Spearman's rank correlation between observations (treated as sequences of values)
'hamming'        Hamming distance, the percentage of coordinates that differ
'jaccard'        One minus the Jaccard coefficient, the percentage of nonzero coordinates that differ
'chebychev'      Chebychev distance (maximum coordinate difference)
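As an illustration of how the choice of metric changes the result, the sketch below applies three of the listed metrics to the matrix from the Examples section. The variable names are illustrative, and the expected values in the comments were worked out by hand for this particular X.

```matlab
X = [1 2; 1 3; 2 2; 3 1];

yCity = pdist(X,'cityblock');    % sum of absolute coordinate differences
yCheb = pdist(X,'chebychev');    % largest absolute coordinate difference
yHamm = pdist(X,'hamming');      % share of coordinates that differ

% Expected for this X, in the order (2,1), (3,1), (4,1), (3,2), (4,2), (4,3):
%   yCity = [1   1   3 2 4 2]
%   yCheb = [1   1   2 1 2 1]
%   yHamm = [0.5 0.5 1 1 1 1]
```

Note that the 'hamming' values here are expressed as fractions between 0 and 1, not scaled to 100.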

y = pdist(X,distfun) accepts a function handle distfun to a metric of the form

d = distfun(u,V)

which takes as arguments a 1-by-p vector u, corresponding to a single row of X, and an m-by-p matrix V, corresponding to multiple rows of X. distfun must accept a matrix V with an arbitrary number of rows. distfun must return an m-by-1 vector of distances d, whose kth element is the distance between u and V(k,:).
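A minimal sketch of such a function handle follows. It reimplements the city block metric only so that the result can be compared with the built-in metric; the name cityfun is illustrative. bsxfun is used to subtract u from every row of V.

```matlab
X = [1 2; 1 3; 2 2; 3 1];

% Illustrative distance function: city block distance from the 1-by-p
% row vector u to every row of the m-by-p matrix V, returned as an
% m-by-1 column vector.
cityfun = @(u,V) sum(abs(bsxfun(@minus, V, u)), 2);

yCustom  = pdist(X, cityfun);
yBuiltin = pdist(X, 'cityblock');
maxDiff  = max(abs(yCustom - yBuiltin));   % expected to be 0 up to rounding
```

Summing along dimension 2 is what guarantees the m-by-1 shape for any number of rows in V, including a single row.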

y = pdist(X,'minkowski',p) computes the distance between objects in the data matrix, X, using the Minkowski metric. Here p is the exponent used in the Minkowski computation, not the number of columns of X. The default exponent is 2, which gives the Euclidean distance.
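The effect of the exponent can be checked numerically. The sketch below, using the matrix from the Examples section and illustrative variable names, compares the Minkowski metric at exponents 1 and 2 with the corresponding named metrics.

```matlab
X = [1 2; 1 3; 2 2; 3 1];

y1 = pdist(X,'minkowski',1);     % same values as 'cityblock'
y2 = pdist(X,'minkowski',2);     % same values as the default 'euclidean'
y3 = pdist(X,'minkowski',3);     % between the Chebychev and Euclidean values

diff1 = max(abs(y1 - pdist(X,'cityblock')));   % expected to be 0 up to rounding
diff2 = max(abs(y2 - pdist(X)));               % expected to be 0 up to rounding
```

Larger exponents give more weight to the largest coordinate difference, so the distances shrink toward the Chebychev values as the exponent grows.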

Metrics

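Given an m-by-n data matrix X, which is treated as m (1-by-n) row vectors x_1, x_2, ..., x_m, the various distances between the vectors x_r and x_s are defined as follows:

Euclidean distance

$$d_{rs}^2 = (x_r - x_s)(x_r - x_s)'$$

Standardized Euclidean distance

$$d_{rs}^2 = (x_r - x_s)\, D^{-1} (x_r - x_s)'$$

where D is the n-by-n diagonal matrix whose jth diagonal element is the sample variance of the jth column of X.

Mahalanobis distance

$$d_{rs}^2 = (x_r - x_s)\, V^{-1} (x_r - x_s)'$$

where V is the sample covariance matrix of X.

City block metric

$$d_{rs} = \sum_{j=1}^{n} |x_{rj} - x_{sj}|$$

Minkowski metric

$$d_{rs} = \left( \sum_{j=1}^{n} |x_{rj} - x_{sj}|^{p} \right)^{1/p}$$

For p = 1 this is the city block metric, for p = 2 the Euclidean distance, and as p tends to infinity the Chebychev distance.

Cosine distance

$$d_{rs} = 1 - \frac{x_r x_s'}{\sqrt{(x_r x_r')(x_s x_s')}}$$

Correlation distance

$$d_{rs} = 1 - \frac{(x_r - \bar{x}_r)(x_s - \bar{x}_s)'}{\sqrt{(x_r - \bar{x}_r)(x_r - \bar{x}_r)'}\,\sqrt{(x_s - \bar{x}_s)(x_s - \bar{x}_s)'}}$$

where

$$\bar{x}_r = \frac{1}{n}\sum_{j=1}^{n} x_{rj}$$

and

$$\bar{x}_s = \frac{1}{n}\sum_{j=1}^{n} x_{sj}$$

Spearman distance

$$d_{rs} = 1 - \frac{(r_r - \bar{r})(r_s - \bar{r})'}{\sqrt{(r_r - \bar{r})(r_r - \bar{r})'}\,\sqrt{(r_s - \bar{r})(r_s - \bar{r})'}}$$

where r_r and r_s are the vectors of ranks of the coordinates of x_r and x_s, and $\bar{r} = (n+1)/2$ is the mean rank.

Hamming distance

$$d_{rs} = \frac{\#\{\, j : x_{rj} \neq x_{sj} \,\}}{n}$$

Jaccard distance

$$d_{rs} = \frac{\#\{\, j : x_{rj} \neq x_{sj} \text{ and } (x_{rj} \neq 0 \text{ or } x_{sj} \neq 0) \,\}}{\#\{\, j : x_{rj} \neq 0 \text{ or } x_{sj} \neq 0 \,\}}$$

Chebychev distance

$$d_{rs} = \max_{j} |x_{rj} - x_{sj}|$$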

Examples

X = [1 2; 1 3; 2 2; 3 1]
X =
   1   2
   1   3
   2   2
   3   1

Y = pdist(X,'mahalanobis')
Y =
  2.3452  2.0000  2.3452  1.2247  2.4495  1.2247

Y = pdist(X)
Y =
  1.0000  1.0000  2.2361  1.4142  2.8284  1.4142

squareform(Y)
ans =
       0  1.0000  1.0000  2.2361
  1.0000       0  1.4142  2.8284
  1.0000  1.4142       0  1.4142
  2.2361  2.8284  1.4142       0
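
The first Mahalanobis value above can be reproduced by hand. The sketch below assumes the usual Mahalanobis form, in which the squared distance is (x_r - x_s) V^-1 (x_r - x_s)' with V the sample covariance matrix of X. The variable names are illustrative.

```matlab
X = [1 2; 1 3; 2 2; 3 1];
V = cov(X);                      % sample covariance matrix of the columns of X

d   = X(1,:) - X(2,:);           % difference between observations 1 and 2
d12 = sqrt(d / V * d');          % d / V equals d * inv(V) without forming inv(V)

y = pdist(X,'mahalanobis');
% d12 agrees with y(1), the value 2.3452 shown above (sqrt(5.5))
```

Using the right division d / V instead of an explicit inverse is the usual numerically safer way to apply V^-1.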

See Also

cluster, clusterdata, cmdscale, cophenet, dendrogram, inconsistent, linkage, silhouette, squareform

  


© 1984-2008 The MathWorks, Inc.