Pairwise distance between pairs of objects

`D = pdist(X)`

`D = pdist(X,distance)`

`D = pdist(X)` computes the Euclidean distance between pairs of objects in the *m*-by-*n* data matrix `X`. Rows of `X` correspond to observations, and columns correspond to variables. `D` is a row vector of length *m*(*m*-1)/2, corresponding to pairs of observations in `X`. The distances are arranged in the order (2,1), (3,1), ..., (*m*,1), (3,2), ..., (*m*,2), ..., (*m*,*m*-1). `D` is commonly used as a dissimilarity matrix in clustering or multidimensional scaling.

To save space and computation time, `D` is formatted as a vector. However, you can convert this vector into a square matrix using the `squareform` function so that element *i*,*j* in the matrix corresponds to the distance between objects *i* and *j* in the original data set.
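As a rough illustration of the vector layout described above, the following sketch builds a tiny data set and reads one distance both from the vector `D` and from the square matrix returned by `squareform`. The helper `idx` is a hypothetical name, not part of the toolbox; its formula is derived here from the stated ordering (2,1), (3,1), ..., (*m*,*m*-1) and assumes *i* < *j*.

```matlab
% Small data set: m = 4 observations, n = 2 variables.
X = [0 0; 3 4; 6 8; 0 1];
m = size(X,1);

D = pdist(X);          % row vector with m*(m-1)/2 entries
Z = squareform(D);     % m-by-m symmetric matrix with a zero diagonal

% Hypothetical helper (an assumption, derived from the ordering):
% position in D of the pair (i,j) with i < j.
idx = @(i,j) (i-1)*(m - i/2) + j - i;

i = 2; j = 4;
dFromVector = D(idx(i,j));   % distance between observations 2 and 4
dFromMatrix = Z(i,j);        % the same distance read from the square form
```

Working with `D` directly avoids storing each distance twice; `squareform` is the simpler choice when readability matters more than memory.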

`D = pdist(X,distance)` computes the distance between objects in the data matrix, `X`, using the method specified by `distance`, which can be any of the following.

Metric | Description |
---|---|
`'euclidean'` | Euclidean distance (default). |
`'squaredeuclidean'` | Squared Euclidean distance. (This option is provided for efficiency only. It does not satisfy the triangle inequality.) |
`'seuclidean'` | Standardized Euclidean distance. Each coordinate difference between rows in `X` is scaled by dividing by the corresponding element of the standard deviation vector `S`. |
`'cityblock'` | City block metric. |
`'minkowski'` | Minkowski distance. The default exponent is 2. To specify a different exponent, use `D = pdist(X,'minkowski',P)`, where `P` is a positive scalar exponent. |
`'chebychev'` | Chebychev distance (maximum coordinate difference). |
`'mahalanobis'` | Mahalanobis distance, using the sample covariance of `X`. |
`'cosine'` | One minus the cosine of the included angle between points (treated as vectors). |
`'correlation'` | One minus the sample correlation between points (treated as sequences of values). |
`'spearman'` | One minus the sample Spearman's rank correlation between observations (treated as sequences of values). |
`'hamming'` | Hamming distance, which is the percentage of coordinates that differ. |
`'jaccard'` | One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ. |
custom distance function | A distance function specified using `@`. A distance function must be of the form `d2 = distfun(XI,XJ)`, taking as arguments a 1-by-*n* vector `XI`, corresponding to a single row of `X`, and an *m*2-by-*n* matrix `XJ`, corresponding to multiple rows of `X`. `distfun` must accept a matrix `XJ` with an arbitrary number of rows. `distfun` must return an *m*2-by-1 vector of distances `d2`, whose *k*th element is the distance between `XI` and `XJ(k,:)`. |
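The relationships between some of the table entries can be checked numerically. The sketch below is only an illustration on made-up data: it compares `'minkowski'` with exponent 1 against `'cityblock'`, compares `'squaredeuclidean'` against the squared default metric, and writes a custom distance function that follows the `d2 = distfun(XI,XJ)` contract. The name `chebdist` is hypothetical and simply reimplements `'chebychev'`.

```matlab
X = [1 2 3; 4 0 1; 2 2 2; 0 5 1];

% Minkowski with exponent 1 should match the city block metric.
Dmink1 = pdist(X,'minkowski',1);
Dcity  = pdist(X,'cityblock');

% Squared Euclidean should equal the square of the default metric.
Dsq  = pdist(X,'squaredeuclidean');
Deuc = pdist(X);                      % 'euclidean' is the default

% Custom distance function (hypothetical name chebdist):
% XI is 1-by-n, XJ is m2-by-n, and the result is m2-by-1.
chebdist = @(XI,XJ) max(abs(bsxfun(@minus,XI,XJ)),[],2);
Dcustom  = pdist(X,chebdist);
Dcheb    = pdist(X,'chebychev');

% Largest disagreement in each comparison (expected to be near zero).
gaps = [max(abs(Dmink1 - Dcity)), ...
        max(abs(Dsq - Deuc.^2)), ...
        max(abs(Dcustom - Dcheb))];
```

A built-in metric name is generally preferable when one exists; a function handle is mainly useful for distances that the table does not cover.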

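The output `D` is arranged in the order (2,1), (3,1), ..., (*m*,1), (3,2), ..., (*m*,2), ..., (*m*,*m*-1), i.e., the lower-left triangle of the full *m*-by-*m* distance matrix, taken column by column. To recover the full matrix, use `Z = squareform(D)`, which returns an *m*-by-*m* symmetric matrix whose (*i*,*j*) entry is the distance between observations *i* and *j*.

Given an *m*-by-*n* data matrix `X`, which is treated as *m* (1-by-*n*) row vectors *x*_{1}, *x*_{2}, ..., *x*_{m}, the various distances between the vectors *x*_{s} and *x*_{t} are defined as follows: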

Euclidean distance

$${d}_{st}^{2}=({x}_{s}-{x}_{t})({x}_{s}-{x}_{t}{)}^{\prime}$$

Notice that the Euclidean distance is a special case of the Minkowski metric, where `p` = 2.

Standardized Euclidean distance

$${d}_{st}^{2}=({x}_{s}-{x}_{t}){V}^{-1}({x}_{s}-{x}_{t}{)}^{\prime}$$

where `V` is the *n*-by-*n* diagonal matrix whose *j*th diagonal element is `S(j)^2`, where `S` is the vector of standard deviations.

Mahalanobis distance

$${d}_{st}^{2}=({x}_{s}-{x}_{t}){C}^{-1}({x}_{s}-{x}_{t}{)}^{\prime}$$

where `C` is the covariance matrix.

City block metric

$${d}_{st}={\displaystyle \sum _{j=1}^{n}\left|{x}_{sj}-{x}_{tj}\right|}$$

Notice that the city block distance is a special case of the Minkowski metric, where `p` = 1.

Minkowski metric

$${d}_{st}=\sqrt[p]{{\displaystyle \sum _{j=1}^{n}{\left|{x}_{sj}-{x}_{tj}\right|}^{p}}}$$

Notice that for the special case of `p` = 1, the Minkowski metric gives the city block metric; for `p` = 2, it gives the Euclidean distance; and for `p` = ∞, it gives the Chebychev distance.

Chebychev distance

$${d}_{st}={\mathrm{max}}_{j}\left\{\left|{x}_{sj}-{x}_{tj}\right|\right\}$$

Notice that the Chebychev distance is a special case of the Minkowski metric, where `p` = ∞.

Cosine distance

$${d}_{st}=1-\frac{{x}_{s}{{x}^{\prime}}_{t}}{\sqrt{\left({x}_{s}{{x}^{\prime}}_{s}\right)\left({x}_{t}{{x}^{\prime}}_{t}\right)}}$$

Correlation distance

$${d}_{st}=1-\frac{\left({x}_{s}-{\overline{x}}_{s}\right){\left({x}_{t}-{\overline{x}}_{t}\right)}^{\prime}}{\sqrt{\left({x}_{s}-{\overline{x}}_{s}\right){\left({x}_{s}-{\overline{x}}_{s}\right)}^{\prime}}\sqrt{\left({x}_{t}-{\overline{x}}_{t}\right){\left({x}_{t}-{\overline{x}}_{t}\right)}^{\prime}}}$$

where

$${\overline{x}}_{s}=\frac{1}{n}{\displaystyle \sum _{j}{x}_{sj}}$$ and $${\overline{x}}_{t}=\frac{1}{n}{\displaystyle \sum _{j}{x}_{tj}}$$

Hamming distance

$${d}_{st}=(\#({x}_{sj}\ne {x}_{tj})/n)$$

Jaccard distance

$${d}_{st}=\frac{\#\left[\left({x}_{sj}\ne {x}_{tj}\right)\cap \left(\left({x}_{sj}\ne 0\right)\cup \left({x}_{tj}\ne 0\right)\right)\right]}{\#\left[\left({x}_{sj}\ne 0\right)\cup \left({x}_{tj}\ne 0\right)\right]}$$

Spearman distance

$${d}_{st}=1-\frac{\left({r}_{s}-{\overline{r}}_{s}\right){\left({r}_{t}-{\overline{r}}_{t}\right)}^{\prime}}{\sqrt{\left({r}_{s}-{\overline{r}}_{s}\right){\left({r}_{s}-{\overline{r}}_{s}\right)}^{\prime}}\sqrt{\left({r}_{t}-{\overline{r}}_{t}\right){\left({r}_{t}-{\overline{r}}_{t}\right)}^{\prime}}}$$

where

*r*_{sj} is the rank of *x*_{sj} taken over *x*_{1j}, *x*_{2j}, ..., *x*_{mj}, as computed by `tiedrank`.

*r*_{s} and *r*_{t} are the coordinate-wise rank vectors of *x*_{s} and *x*_{t}, i.e., *r*_{s} = (*r*_{s1}, *r*_{s2}, ..., *r*_{sn}).

$${\overline{r}}_{s}=\frac{1}{n}{\displaystyle \sum _{j}{r}_{sj}}=\frac{\left(n+1\right)}{2}$$

$${\overline{r}}_{t}=\frac{1}{n}{\displaystyle \sum _{j}{r}_{tj}}=\frac{\left(n+1\right)}{2}$$
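Some of the definitions above can be checked by evaluating the formulas directly for a single pair of rows. The sketch below does this for the cosine, Hamming, and Jaccard distances and compares each value with the corresponding `pdist` call. The data values are made up for illustration, and the variable names are not part of the toolbox.

```matlab
% Two observations (rows) with n = 5 variables; the fourth coordinate
% is zero in both rows, which separates Hamming from Jaccard.
xs = [1 0 2 0 3];
xt = [1 2 0 0 4];
X  = [xs; xt];
n  = numel(xs);

% Cosine distance: one minus the cosine of the angle between xs and xt.
dCos = 1 - (xs*xt') / sqrt((xs*xs')*(xt*xt'));

% Hamming distance: fraction of all coordinates that differ.
dHam = sum(xs ~= xt) / n;

% Jaccard distance: among coordinates where at least one value is
% nonzero, the fraction that differ.
nz   = (xs ~= 0) | (xt ~= 0);
dJac = sum((xs ~= xt) & nz) / sum(nz);

% Absolute differences from pdist (each call returns one distance here).
err = abs([dCos dHam dJac] - ...
          [pdist(X,'cosine') pdist(X,'hamming') pdist(X,'jaccard')]);
```

With these rows the Hamming distance counts all five coordinates in its denominator, while the Jaccard distance ignores the coordinate that is zero in both rows, so the two values differ.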

Generate random data and find the unweighted Euclidean distance and then find the weighted distance using two different methods:

    % Compute the ordinary Euclidean distance.
    X = randn(100, 5);
    D = pdist(X,'euclidean');  % euclidean distance

    % Compute the Euclidean distance with each coordinate
    % difference scaled by the standard deviation.
    Dstd = pdist(X,'seuclidean');

    % Use a function handle to compute a distance that weights
    % each coordinate contribution differently.
    Wgts = [.1 .3 .3 .2 .1];  % coordinate weights
    weuc = @(XI,XJ,W)(sqrt(bsxfun(@minus,XI,XJ).^2 * W'));
    Dwgt = pdist(X, @(Xi,Xj) weuc(Xi,Xj,Wgts));
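One way to sanity-check the two weighted variants in this example is to rescale the columns of `X` first and then call `pdist` with the default metric. This is a sketch and not part of the original example: the names `DstdAlt` and `DwgtAlt` are hypothetical, and the first comparison assumes that `X` contains no `NaN` values, so that `std(X)` matches the standard deviations used by `'seuclidean'`.

```matlab
X = randn(100,5);
Wgts = [.1 .3 .3 .2 .1];          % coordinate weights

% 'seuclidean' divides each coordinate difference by that column's
% standard deviation, which is equivalent to rescaling the columns.
Dstd    = pdist(X,'seuclidean');
DstdAlt = pdist(bsxfun(@rdivide,X,std(X)));

% The weighted distance sqrt(sum_j W(j)*(xs(j)-xt(j))^2) equals the
% plain Euclidean distance after scaling column j by sqrt(W(j)).
weuc    = @(XI,XJ,W) sqrt(bsxfun(@minus,XI,XJ).^2 * W');
Dwgt    = pdist(X,@(Xi,Xj) weuc(Xi,Xj,Wgts));
DwgtAlt = pdist(bsxfun(@times,X,sqrt(Wgts)));

% Largest disagreement in each comparison (expected to be near zero).
maxErrStd = max(abs(Dstd - DstdAlt));
maxErrWgt = max(abs(Dwgt - DwgtAlt));
```

Rescaling the data and using the default metric is likely to run faster than a function handle, since the custom function is called repeatedly from within `pdist`.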

See also: `cluster` | `clusterdata` | `cmdscale` | `cophenet` | `dendrogram` | `inconsistent` | `linkage` | `pdist2` | `silhouette` | `squareform`
