## pdist2

Pairwise distance between two sets of observations

```
D = pdist2(X,Y)
D = pdist2(X,Y,distance)
D = pdist2(X,Y,'minkowski',P)
D = pdist2(X,Y,'mahalanobis',C)
D = pdist2(X,Y,distance,'Smallest',K)
D = pdist2(X,Y,distance,'Largest',K)
[D,I] = pdist2(X,Y,distance,'Smallest',K)
[D,I] = pdist2(X,Y,distance,'Largest',K)
```

`D = pdist2(X,Y)` returns a matrix `D` containing
the Euclidean distances between each pair of observations in the *mx*-by-*n* data
matrix `X` and the *my*-by-*n* data
matrix `Y`. Rows of `X` and `Y` correspond
to observations, and columns correspond to variables. `D` is
an *mx*-by-*my* matrix, with the
(*i*,*j*) entry equal to the distance
between observation *i* in `X` and
observation *j* in `Y`. The (*i*,*j*)
entry is `NaN` if observation *i* in `X` or
observation *j* in `Y` contains `NaN`s.
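As an informal cross-check of this definition, here is a NumPy sketch with made-up data (not part of the original MATLAB documentation) that builds the same *mx*-by-*my* Euclidean distance matrix and shows the `NaN` propagation:

```python
import numpy as np

# Hypothetical data: mx = 3 observations of n = 2 variables in X,
# my = 2 observations in Y. The third row of X contains a NaN.
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, np.nan]])
Y = np.array([[0.0, 0.0],
              [3.0, 4.0]])

# D[i, j] = Euclidean distance between row i of X and row j of Y.
diff = X[:, None, :] - Y[None, :, :]   # mx-by-my-by-n differences
D = np.sqrt((diff ** 2).sum(axis=2))   # mx-by-my distance matrix

print(D.shape)   # (3, 2)
print(D[0, 1])   # distance from (0, 0) to (3, 4): 5.0
print(D[2, :])   # both entries are NaN
```

Any row of `X` or `Y` containing a `NaN` propagates into the corresponding entries of `D`, matching the behavior described above.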

`D = pdist2(X,Y,distance)` computes `D` using the metric specified by `distance`.
Choices are:

Metric | Description
---|---
`'euclidean'` | Euclidean distance (default).
`'seuclidean'` | Standardized Euclidean distance. Each coordinate difference between rows in `X` and `Y` is scaled by dividing by the corresponding element of the standard deviation computed from `X`, `S = nanstd(X)`. To specify another value for `S`, use `D = pdist2(X,Y,'seuclidean',S)`.
`'cityblock'` | City block metric.
`'minkowski'` | Minkowski distance. The default exponent is 2. To compute the distance with a different exponent, use `D = pdist2(X,Y,'minkowski',P)`, where the exponent `P` is a positive scalar.
`'chebychev'` | Chebychev distance (maximum coordinate difference).
`'mahalanobis'` | Mahalanobis distance, using the sample covariance of `X`, `C = nancov(X)`. To compute the distance with another covariance, use `D = pdist2(X,Y,'mahalanobis',C)`, where the matrix `C` is symmetric and positive definite.
`'cosine'` | One minus the cosine of the included angle between points (treated as vectors).
`'correlation'` | One minus the sample correlation between points (treated as sequences of values).
`'spearman'` | One minus the sample Spearman's rank correlation between observations, treated as sequences of values.
`'hamming'` | Hamming distance, the percentage of coordinates that differ.
`'jaccard'` | One minus the Jaccard coefficient, the percentage of nonzero coordinates that differ.
function | A distance function specified using `@`, for example `D = pdist2(X,Y,@distfun)`. A distance function must be of the form `function D2 = distfun(ZI,ZJ)`, taking as arguments a 1-by-*n* vector `ZI` containing a single observation from `X` or `Y`, and an *m2*-by-*n* matrix `ZJ` containing multiple observations from `X` or `Y`, and returning an *m2*-by-1 vector of distances `D2`, whose *j*th element is the distance between the observations `ZI` and `ZJ(j,:)`. If your data is not sparse, it is generally faster to use a built-in distance than a function handle.
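The relationships among the Minkowski special cases noted in the table can be verified numerically. This is an informal NumPy sketch with made-up vectors (not MATLAB code from the original page); a large exponent stands in for p = ∞:

```python
import numpy as np

# Hypothetical single observations x (from X) and y (from Y), n = 5.
x = np.array([0.3, -1.2,  2.0, 0.5, -0.7])
y = np.array([1.0,  0.4, -0.5, 0.5,  0.2])

d = np.abs(x - y)
minkowski = lambda p: (d ** p).sum() ** (1.0 / p)

cityblock = d.sum()                   # Minkowski with p = 1
euclidean = np.sqrt((d ** 2).sum())   # Minkowski with p = 2
chebychev = d.max()                   # Minkowski limit as p -> infinity

assert np.isclose(minkowski(1), cityblock)
assert np.isclose(minkowski(2), euclidean)
assert np.isclose(minkowski(200), chebychev)   # large p approximates p = inf
```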

`D = pdist2(X,Y,distance,'Smallest',K)` returns
a `K`-by-*my* matrix `D` containing
the `K` smallest pairwise distances to observations
in `X` for each observation in `Y`. `pdist2` sorts
the distances in each column of `D` in ascending
order. `D = pdist2(X,Y,distance,'Largest',K)` returns
the `K` largest pairwise distances sorted in descending
order. If `K` is greater than *mx*, `pdist2` returns
an *mx*-by-*my* distance matrix.
For each observation in `Y`, `pdist2` finds
the `K` smallest or largest distances by computing
and comparing the distance values to all the observations in `X`.

`[D,I] = pdist2(X,Y,distance,'Smallest',K)` returns
a `K`-by-*my* matrix `I` containing
indices of the observations in `X` corresponding
to the `K` smallest pairwise distances in `D`. `[D,I]
= pdist2(X,Y,distance,'Largest',K)` returns indices corresponding
to the `K` largest pairwise distances.
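The `'Smallest'` behavior, including the returned indices, can be mimicked by sorting each column of the full distance matrix. A NumPy sketch with made-up data (not from the original page):

```python
import numpy as np

# Hypothetical data: mx = 5 observations in X, my = 2 in Y, n = 2 variables.
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 2.0],
              [3.0, 3.0],
              [0.0, 1.5]])
Y = np.array([[0.0, 0.0],
              [3.0, 4.0]])

# Full mx-by-my Euclidean distance matrix.
D_full = np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2))

K = 2
# For each column (observation in Y), take the K smallest distances,
# sorted ascending, together with the row indices into X.
order = np.argsort(D_full, axis=0, kind="stable")  # mx-by-my
I = order[:K, :]                                   # K-by-my indices into X
D = np.take_along_axis(D_full, I, axis=0)          # K-by-my distances

print(D.shape, I.shape)   # (2, 2) (2, 2)
print(I[:, 1])            # rows of X nearest the second observation in Y
```

Each column of `D` is sorted ascending, and `I` holds the rows of `X` those distances came from, mirroring the `[D,I]` outputs described above.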

Given an *mx*-by-*n* data
matrix `X`, which is treated as *mx* (1-by-*n*)
row vectors `x`_{1}, `x`_{2},
..., `x`_{mx},
and *my*-by-*n* data matrix `Y`,
which is treated as *my* (1-by-*n*)
row vectors `y`_{1}, `y`_{2},
...,`y`_{my},
the various distances between the vectors `x`_{s} and `y`_{t} are
defined as follows:

Euclidean distance

$$d_{st}^2 = (x_s - y_t)(x_s - y_t)'$$

Notice that the Euclidean distance is a special case of the Minkowski metric, where `p` = 2.

Standardized Euclidean distance

$$d_{st}^2 = (x_s - y_t)\,V^{-1}(x_s - y_t)'$$

where `V` is the *n*-by-*n* diagonal matrix whose *j*th diagonal element is `S`(*j*)^{2}, where `S` is the vector of standard deviations.

Mahalanobis distance

$$d_{st}^2 = (x_s - y_t)\,C^{-1}(x_s - y_t)'$$

where `C` is the covariance matrix.

City block metric

$$d_{st} = \sum_{j=1}^{n} \left| x_{sj} - y_{tj} \right|$$

Notice that the city block distance is a special case of the Minkowski metric, where `p` = 1.

Minkowski metric

$$d_{st} = \sqrt[p]{\sum_{j=1}^{n} \left| x_{sj} - y_{tj} \right|^{p}}$$

Notice that for the special case of `p` = 1, the Minkowski metric gives the city block metric; for the special case of `p` = 2, the Minkowski metric gives the Euclidean distance; and for the special case of `p` = ∞, the Minkowski metric gives the Chebychev distance.

Chebychev distance

$$d_{st} = \max_{j} \left\{ \left| x_{sj} - y_{tj} \right| \right\}$$

Notice that the Chebychev distance is a special case of the Minkowski metric, where `p` = ∞.

Cosine distance

$$d_{st} = 1 - \frac{x_s y_t'}{\sqrt{(x_s x_s')(y_t y_t')}}$$

Correlation distance

$$d_{st} = 1 - \frac{(x_s - \bar{x}_s)(y_t - \bar{y}_t)'}{\sqrt{(x_s - \bar{x}_s)(x_s - \bar{x}_s)'}\,\sqrt{(y_t - \bar{y}_t)(y_t - \bar{y}_t)'}}$$

where

$$\bar{x}_s = \frac{1}{n}\sum_{j} x_{sj}$$

and

$$\bar{y}_t = \frac{1}{n}\sum_{j} y_{tj}$$

Hamming distance

$$d_{st} = \#\left(x_{sj} \neq y_{tj}\right)/n$$

Jaccard distance

$$d_{st} = \frac{\#\left[\left(x_{sj} \neq y_{tj}\right) \wedge \left(\left(x_{sj} \neq 0\right) \vee \left(y_{tj} \neq 0\right)\right)\right]}{\#\left[\left(x_{sj} \neq 0\right) \vee \left(y_{tj} \neq 0\right)\right]}$$

Spearman distance

$$d_{st} = 1 - \frac{(r_s - \bar{r}_s)(r_t - \bar{r}_t)'}{\sqrt{(r_s - \bar{r}_s)(r_s - \bar{r}_s)'}\,\sqrt{(r_t - \bar{r}_t)(r_t - \bar{r}_t)'}}$$

where

- *r*_{sj} is the rank of *x*_{sj} taken over *x*_{1j}, *x*_{2j}, ..., *x*_{mx,j}, as computed by `tiedrank`
- *r*_{tj} is the rank of *y*_{tj} taken over *y*_{1j}, *y*_{2j}, ..., *y*_{my,j}, as computed by `tiedrank`
- *r*_{s} and *r*_{t} are the coordinate-wise rank vectors of *x*_{s} and *y*_{t}, i.e., *r*_{s} = (*r*_{s1}, *r*_{s2}, ..., *r*_{sn}) and *r*_{t} = (*r*_{t1}, *r*_{t2}, ..., *r*_{tn})
- $\bar{r}_s = \frac{1}{n}\sum_{j} r_{sj} = \frac{n+1}{2}$ and $\bar{r}_t = \frac{1}{n}\sum_{j} r_{tj} = \frac{n+1}{2}$
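As an informal numeric check of several of the definitions above, here is a NumPy sketch with made-up 1-by-*n* observations (not part of the original page):

```python
import numpy as np

# Hypothetical observations x_s and y_t, n = 4 variables.
xs = np.array([1.0, 0.0, 2.0, 0.0])
yt = np.array([1.0, 1.0, 0.0, 0.0])

# Cosine distance: one minus the cosine of the included angle.
cosine = 1.0 - xs @ yt / np.sqrt((xs @ xs) * (yt @ yt))

# Correlation distance: one minus the sample correlation of the two rows.
xc, yc = xs - xs.mean(), yt - yt.mean()
correlation = 1.0 - xc @ yc / np.sqrt((xc @ xc) * (yc @ yc))

# Hamming distance: fraction of coordinates that differ.
hamming = np.mean(xs != yt)   # 2 of 4 coordinates differ -> 0.5

# Jaccard distance: among coordinates where either row is nonzero,
# the fraction that differ.
nz = (xs != 0) | (yt != 0)
jaccard = np.mean(xs[nz] != yt[nz])   # 2 of 3 nonzero coordinates -> 2/3

print(cosine, correlation, hamming, jaccard)
```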

Generate random data and find the unweighted Euclidean distance, then find the weighted distance using two different methods:

```
% Compute the ordinary Euclidean distance
X = randn(100, 5);
Y = randn(25, 5);
D = pdist2(X,Y,'euclidean');     % euclidean distance

% Compute the Euclidean distance with each coordinate
% difference scaled by the standard deviation
Dstd = pdist2(X,Y,'seuclidean');

% Use a function handle to compute a distance that weights
% each coordinate contribution differently.
Wgts = [.1 .3 .3 .2 .1];
weuc = @(XI,XJ,W)(sqrt(bsxfun(@minus,XI,XJ).^2 * W'));
Dwgt = pdist2(X,Y, @(Xi,Xj) weuc(Xi,Xj,Wgts));
```

See also: `createns` | `ExhaustiveSearcher` | `KDTreeSearcher` | `knnsearch` | `pdist`
