
Pairwise distance between two sets of observations

`D = pdist2(X,Y)`

`D = pdist2(X,Y,distance)`

`D = pdist2(X,Y,'minkowski',P)`

`D = pdist2(X,Y,'mahalanobis',C)`

`D = pdist2(X,Y,distance,'Smallest',K)`

`D = pdist2(X,Y,distance,'Largest',K)`

`[D,I] = pdist2(X,Y,distance,'Smallest',K)`

`[D,I] = pdist2(X,Y,distance,'Largest',K)`

`D = pdist2(X,Y)` returns a matrix `D` containing the Euclidean distances between each pair of observations in the *mx*-by-*n* data matrix `X` and the *my*-by-*n* data matrix `Y`. Rows of `X` and `Y` correspond to observations; columns correspond to variables. `D` is an *mx*-by-*my* matrix, with the (*i*,*j*) entry equal to the distance between observation *i* in `X` and observation *j* in `Y`. The (*i*,*j*) entry is `NaN` if observation *i* in `X` or observation *j* in `Y` contains `NaN`s.
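As an illustrative sketch of the same computation in NumPy (not MATLAB's implementation; the data values here are made up), the Euclidean distance matrix can be formed by broadcasting, and a `NaN` coordinate propagates exactly as described:

```python
import numpy as np

# Hypothetical data: rows are observations, columns are variables,
# mirroring pdist2's convention (mx = 2, my = 2, n = 2).
X = np.array([[0.0, 0.0], [3.0, 4.0]])
Y = np.array([[0.0, 0.0], [6.0, 8.0]])

# D[i, j] = Euclidean distance between X[i, :] and Y[j, :].
diff = X[:, None, :] - Y[None, :, :]   # mx-by-my-by-n differences
D = np.sqrt((diff ** 2).sum(axis=-1))  # mx-by-my distance matrix

# A NaN in any coordinate of observation i in X makes row i of D all NaN.
Xn = X.copy(); Xn[0, 0] = np.nan
Dn = np.sqrt(((Xn[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))
```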

`D = pdist2(X,Y,distance)` computes `D` using the metric specified by `distance`. Choices are:

Metric | Description
---|---
`'euclidean'` | Euclidean distance (default).
`'squaredeuclidean'` | Squared Euclidean distance. (This option is provided for efficiency only. It does not satisfy the triangle inequality.)
`'seuclidean'` | Standardized Euclidean distance. Each coordinate difference between rows in `X` and `Y` is scaled by dividing by the corresponding element of the vector of standard deviations computed from `X`.
`'cityblock'` | City block metric.
`'minkowski'` | Minkowski distance. The default exponent is 2. To compute the distance with a different exponent `P`, use `D = pdist2(X,Y,'minkowski',P)`.
`'chebychev'` | Chebychev distance (maximum coordinate difference).
`'mahalanobis'` | Mahalanobis distance, using the sample covariance of `X`. To compute the distance with a different covariance matrix `C`, use `D = pdist2(X,Y,'mahalanobis',C)`.
`'cosine'` | One minus the cosine of the included angle between points (treated as vectors).
`'correlation'` | One minus the sample correlation between points (treated as sequences of values).
`'spearman'` | One minus the sample Spearman's rank correlation between observations, treated as sequences of values.
`'hamming'` | Hamming distance, the percentage of coordinates that differ.
`'jaccard'` | One minus the Jaccard coefficient, the percentage of nonzero coordinates that differ.
function handle | A distance function specified using `@`. A distance function must be of the form `function D2 = distfun(ZI, ZJ)`, taking as arguments a 1-by-*n* vector `ZI` containing a single observation from `X` or `Y`, and an *m2*-by-*n* matrix `ZJ` containing multiple observations from `X` or `Y`, and returning an *m2*-by-1 vector of distances `D2`, whose `J`th element is the distance between the observations `ZI` and `ZJ(J,:)`. If your data is not sparse, it is generally faster to use a built-in distance metric than a function handle.
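To make the relationships between a few of these metrics concrete, here is a hedged NumPy sketch (a toy dispatcher; the `pairwise` name and the sample arrays are invented for illustration, and this is not MATLAB's implementation):

```python
import numpy as np

def pairwise(X, Y, metric="euclidean", p=2):
    """Toy sketch of pdist2-style pairwise distances for a few metrics.

    Rows of X and Y are observations; columns are variables.
    """
    diff = X[:, None, :] - Y[None, :, :]            # mx-by-my-by-n
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "cityblock":
        return np.abs(diff).sum(axis=-1)
    if metric == "chebychev":
        return np.abs(diff).max(axis=-1)
    if metric == "minkowski":
        return (np.abs(diff) ** p).sum(axis=-1) ** (1.0 / p)
    raise ValueError("unsupported metric in this sketch")

X = np.array([[0.0, 0.0], [1.0, 1.0]])
Y = np.array([[3.0, 4.0]])
```

Note that Minkowski with `p = 1` reproduces the city block metric, as the definitions below make explicit.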

`D = pdist2(X,Y,distance,'Smallest',K)` returns a `K`-by-*my* matrix `D` containing the `K` smallest pairwise distances to observations in `X` for each observation in `Y`. `pdist2` sorts the distances in each column of `D` in ascending order. `D = pdist2(X,Y,distance,'Largest',K)` returns the `K` largest pairwise distances sorted in descending order. If `K` is greater than *mx*, `pdist2` returns an *mx*-by-*my* distance matrix. For each observation in `Y`, `pdist2` finds the `K` smallest or largest distances by computing and comparing the distance values to all the observations in `X`.

`[D,I] = pdist2(X,Y,distance,'Smallest',K)` returns a `K`-by-*my* matrix `I` containing indices of the observations in `X` corresponding to the `K` smallest pairwise distances in `D`. `[D,I] = pdist2(X,Y,distance,'Largest',K)` returns indices corresponding to the `K` largest pairwise distances.
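The `'Smallest'` behavior can be sketched in NumPy by sorting each column of the full distance matrix; the arrays below are hypothetical, and note that NumPy indices are 0-based while MATLAB's `I` is 1-based:

```python
import numpy as np

# Hypothetical data: mx = 4 observations in X, my = 2 in Y, n = 1 variable.
X = np.array([[0.0], [2.0], [5.0], [9.0]])
Y = np.array([[1.0], [8.0]])

D_full = np.abs(X - Y.T)   # mx-by-my Euclidean distances (trivial for n = 1)

K = 2
# Per column: indices of the K nearest rows of X, then the distances
# themselves in ascending order, mimicking [D,I] with 'Smallest'.
I = np.argsort(D_full, axis=0)[:K]
D = np.take_along_axis(D_full, I, axis=0)
```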

Given an *mx*-by-*n* data matrix `X`, which is treated as *mx* (1-by-*n*) row vectors `x`_{1}, `x`_{2}, ..., `x`_{mx}, and an *my*-by-*n* data matrix `Y`, which is treated as *my* (1-by-*n*) row vectors `y`_{1}, `y`_{2}, ..., `y`_{my}, the various distances between the vectors `x`_{s} and `y`_{t} are defined as follows:

Euclidean distance

$${d}_{st}^{2}=({x}_{s}-{y}_{t})({x}_{s}-{y}_{t}{)}^{\prime}$$

Notice that the Euclidean distance is a special case of the Minkowski metric, where `p` = 2.

Standardized Euclidean distance

$${d}_{st}^{2}=({x}_{s}-{y}_{t}){V}^{-1}({x}_{s}-{y}_{t}{)}^{\prime}$$

where `V` is the *n*-by-*n* diagonal matrix whose *j*th diagonal element is `S`(*j*)^{2}, where `S` is the vector of standard deviations.

Mahalanobis distance

$${d}_{st}^{2}=({x}_{s}-{y}_{t}){C}^{-1}({x}_{s}-{y}_{t}{)}^{\prime}$$

where `C` is the covariance matrix.

City block metric

$${d}_{st}={\displaystyle \sum _{j=1}^{n}\left|{x}_{sj}-{y}_{tj}\right|}$$

Notice that the city block distance is a special case of the Minkowski metric, where `p` = 1.

Minkowski metric

$${d}_{st}=\sqrt[p]{{\displaystyle \sum _{j=1}^{n}{\left|{x}_{sj}-{y}_{tj}\right|}^{p}}}$$

Notice that for the special case of `p` = 1, the Minkowski metric gives the city block metric; for the special case of `p` = 2, the Minkowski metric gives the Euclidean distance; and for the special case of `p` = ∞, the Minkowski metric gives the Chebychev distance.

Chebychev distance

$${d}_{st}={\mathrm{max}}_{j}\left\{\left|{x}_{sj}-{y}_{tj}\right|\right\}$$

Notice that the Chebychev distance is a special case of the Minkowski metric, where `p` = ∞.

Cosine distance

$${d}_{st}=\left(1-\frac{{x}_{s}{{y}^{\prime}}_{t}}{\sqrt{\left({x}_{s}{{x}^{\prime}}_{s}\right)\left({y}_{t}{{y}^{\prime}}_{t}\right)}}\right)$$

Correlation distance

$${d}_{st}=1-\frac{\left({x}_{s}-{\overline{x}}_{s}\right){\left({y}_{t}-{\overline{y}}_{t}\right)}^{\prime}}{\sqrt{\left({x}_{s}-{\overline{x}}_{s}\right){\left({x}_{s}-{\overline{x}}_{s}\right)}^{\prime}}\sqrt{\left({y}_{t}-{\overline{y}}_{t}\right){\left({y}_{t}-{\overline{y}}_{t}\right)}^{\prime}}}$$

where

$${\overline{x}}_{s}=\frac{1}{n}{\displaystyle \sum _{j}{x}_{sj}}$$ and

$${\overline{y}}_{t}=\frac{1}{n}{\displaystyle \sum _{j}{y}_{tj}}$$

Hamming distance

$${d}_{st}=(\#({x}_{sj}\ne {y}_{tj})/n)$$

Jaccard distance

$${d}_{st}=\frac{\#\left[\left({x}_{sj}\ne {y}_{tj}\right)\cap \left(\left({x}_{sj}\ne 0\right)\cup \left({y}_{tj}\ne 0\right)\right)\right]}{\#\left[\left({x}_{sj}\ne 0\right)\cup \left({y}_{tj}\ne 0\right)\right]}$$
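The difference between the Hamming and Jaccard definitions is which coordinates enter the count: Hamming divides by all *n* coordinates, while Jaccard restricts to coordinates that are nonzero in either vector. A small NumPy sketch with made-up vectors (not MATLAB code):

```python
import numpy as np

# Hypothetical vectors: coordinates 1 and 2 differ; coordinate 3 is a
# shared zero that Jaccard ignores but Hamming counts as a match.
x = np.array([1.0, 0.0, 2.0, 0.0])
y = np.array([1.0, 3.0, 0.0, 0.0])

neq = x != y                # coordinates that differ
nz = (x != 0) | (y != 0)    # coordinates nonzero in x or y

hamming = neq.mean()                    # fraction of all coordinates that differ
jaccard = (neq & nz).sum() / nz.sum()   # differing fraction among nonzero coords
```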

Spearman distance

$${d}_{st}=1-\frac{\left({r}_{s}-{\overline{r}}_{s}\right){\left({r}_{t}-{\overline{r}}_{t}\right)}^{\prime}}{\sqrt{\left({r}_{s}-{\overline{r}}_{s}\right){\left({r}_{s}-{\overline{r}}_{s}\right)}^{\prime}}\sqrt{\left({r}_{t}-{\overline{r}}_{t}\right){\left({r}_{t}-{\overline{r}}_{t}\right)}^{\prime}}}$$

where

- *r*_{sj} is the rank of *x*_{sj} taken over *x*_{1j}, *x*_{2j}, ..., *x*_{mx,j}, as computed by `tiedrank`
- *r*_{tj} is the rank of *y*_{tj} taken over *y*_{1j}, *y*_{2j}, ..., *y*_{my,j}, as computed by `tiedrank`
- *r*_{s} and *r*_{t} are the coordinate-wise rank vectors of *x*_{s} and *y*_{t}, i.e., *r*_{s} = (*r*_{s1}, *r*_{s2}, ..., *r*_{sn}) and *r*_{t} = (*r*_{t1}, *r*_{t2}, ..., *r*_{tn})
- $${\overline{r}}_{s}=\frac{1}{n}{\displaystyle \sum _{j}{r}_{sj}}=\frac{\left(n+1\right)}{2}$$
- $${\overline{r}}_{t}=\frac{1}{n}{\displaystyle \sum _{j}{r}_{tj}}=\frac{\left(n+1\right)}{2}$$
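As a numeric check of the Minkowski special cases noted in the definitions above (city block at `p` = 1, Euclidean at `p` = 2, Chebychev as `p` → ∞), here is a NumPy sketch with hypothetical vectors; a large finite exponent stands in for the `p` → ∞ limit:

```python
import numpy as np

# Hypothetical 1-by-n vectors x_s and y_t; coordinate differences: 3, 4, 0.
x = np.array([1.0, -2.0, 3.0])
y = np.array([4.0, 2.0, 3.0])

def minkowski(x, y, p):
    # d_st = (sum_j |x_j - y_j|^p)^(1/p)
    return (np.abs(x - y) ** p).sum() ** (1.0 / p)

cityblock = np.abs(x - y).sum()            # p = 1 special case
euclidean = np.sqrt(((x - y) ** 2).sum())  # p = 2 special case
chebychev = np.abs(x - y).max()            # p -> infinity limit
```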

Generate random data and find the unweighted Euclidean distance; then find the weighted distance using two different methods:

```matlab
% Compute the ordinary Euclidean distance
X = randn(100, 5);
Y = randn(25, 5);
D = pdist2(X,Y,'euclidean');    % Euclidean distance

% Compute the Euclidean distance with each coordinate
% difference scaled by the standard deviation
Dstd = pdist2(X,Y,'seuclidean');

% Use a function handle to compute a distance that weights
% each coordinate contribution differently.
Wgts = [.1 .3 .3 .2 .1];
weuc = @(XI,XJ,W)(sqrt(bsxfun(@minus,XI,XJ).^2 * W'));
Dwgt = pdist2(X,Y, @(Xi,Xj) weuc(Xi,Xj,Wgts));
```

`createns` | `ExhaustiveSearcher` | `KDTreeSearcher` | `knnsearch` | `pdist`
