Documentation

# ExhaustiveSearcher

Create exhaustive nearest neighbor searcher

## Description

`ExhaustiveSearcher` model objects store the training data, distance metric, and parameter values of the distance metric for an exhaustive nearest neighbor search. The exhaustive search algorithm finds the distance from each query observation to all n observations in the training data, which is an n-by-K numeric matrix.

Once you create an `ExhaustiveSearcher` model object, find neighboring points in the training data to the query data by performing a nearest neighbor search using `knnsearch` or a radius search using `rangesearch`. The exhaustive search algorithm is more efficient than the Kd-tree algorithm when K is large (that is, K > 10), and it is more flexible than the Kd-tree algorithm with respect to distance metric choices. The `ExhaustiveSearcher` model object also supports sparse data.

## Creation

Use either the `createns` function or the `ExhaustiveSearcher` function (described here) to create an `ExhaustiveSearcher` object. Both functions use the same syntax except that the `createns` function has the `'NSMethod'` name-value pair argument, which you use to choose the nearest neighbor search method. The `createns` function also creates a `KDTreeSearcher` object. Specify `'NSMethod','exhaustive'` to create an `ExhaustiveSearcher` object. The default is `'exhaustive'` if K > 10, the training data is sparse, or the distance metric is not the Euclidean, city block, Chebychev, or Minkowski.

### Syntax

``Mdl = ExhaustiveSearcher(X)``
``Mdl = ExhaustiveSearcher(X,Name,Value)``

### Description

example

````Mdl = ExhaustiveSearcher(X)` creates an exhaustive nearest neighbor searcher object (`Mdl`) using the n-by-K numeric matrix of training data (`X`).```

example

````Mdl = ExhaustiveSearcher(X,Name,Value)` specifies additional options using one or more name-value pair arguments. You can specify the distance metric and set the distance metric parameter (`DistParameter`) property. For example, `ExhaustiveSearcher(X,'Distance','chebychev')` creates an exhaustive nearest neighbor searcher object that uses the Chebychev distance. To specify `DistParameter`, use the `Cov`, `P`, or `Scale` name-value pair argument.```

### Input Arguments

expand all

Training data that prepares the exhaustive searcher algorithm, specified as a numeric matrix. `X` has n rows, each corresponding to an observation (that is, an instance or example), and K columns, each corresponding to a predictor (that is, a feature).

Data Types: `single` | `double`

#### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `'Distance','mahalanobis','Cov',eye(3)` specifies to use the Mahalanobis distance when searching for nearest neighbors and a 3-by-3 identity matrix for the covariance matrix in the Mahalanobis distance metric.

Distance metric used when you call `knnsearch` or `rangesearch` to find nearest neighbors for future query points, specified as the comma-separated pair consisting of `'Distance'` and a character vector, string scalar, or function handle.

This table describes the supported distance metrics specified as character vectors or string scalars.

ValueDescription
`'chebychev'`Chebychev distance (maximum coordinate difference).
`'cityblock'`City block distance.
`'correlation'`One minus the sample linear correlation between observations (treated as sequences of values).
`'cosine'`One minus the cosine of the included angle between observations (treated as row vectors).
`'euclidean'`Euclidean distance.
`'hamming'`Hamming distance, which is the percentage of coordinates that differ.
`'jaccard'`One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ.
`'minkowski'`Minkowski distance. The default exponent is 2. To specify a different exponent, use the `'P'` name-value pair argument.
`'mahalanobis'`Mahalanobis distance, computed using a positive definite covariance matrix. To change the value of the covariance matrix, use the `'Cov'` name-value pair argument.
`'seuclidean'`Standardized Euclidean distance. Each coordinate difference between rows in `Mdl.X` and the query matrix is scaled by dividing by the corresponding element of the standard deviation computed from `Mdl.X`. To specify another scaling, use the `'Scale'` name-value pair argument.
`'spearman'`One minus the sample Spearman's rank correlation between observations (treated as sequences of values).

For more details, see Distance Metrics.

You can specify a function handle for a custom distance metric by using `@` (for example, `@distfun`). A custom distance function must:

• Have the form ```function D2 = distfun(ZI, ZJ)```.

• Take as arguments:

• A 1-by-K vector `ZI` containing a single row from `X` or from the query points `Y`, where K is the number of columns in `X`.

• An m-by-K matrix `ZJ` containing multiple rows of `X` or `Y`, where m is a positive integer.

• Return an m-by-1 vector of distance `D2`, where `D2(j)` is the distance between the observations `ZI` and `ZJ(j,:)`.

The software does not use the distance metric for creating an `ExhaustiveSearcher` model object, so you can alter the distance metric by using dot notation after creating the object.

Example: `'Distance','mahalanobis'`

Covariance matrix for the Mahalanobis distance metric, specified as the comma-separated pair consisting of `'Cov'` and a K-by-K positive definite matrix, where K is the number of columns in `X`. This argument is valid only if `'Distance'` is `'mahalanobis'`.

Example: `'Cov',eye(3)`

Data Types: `single` | `double`

Exponent for the Minkowski distance metric, specified as the comma-separated pair consisting of `'P'` and a positive scalar. This argument is valid only if `'Distance'` is `'minkowski'`.

Example: `'P',3`

Data Types: `single` | `double`

Scale parameter value for the standardized Euclidean distance metric, specified as the comma-separated pair consisting of `'Scale'` and a nonnegative numeric vector of length K, where K is the number of columns in `X`. The software scales each difference between the training and query data using the corresponding element of `Scale`. This argument is valid only if `'Distance'` is `'seuclidean'`.

Example: `'Scale',quantile(X,0.75) - quantile(X,0.25)`

Data Types: `single` | `double`

## Properties

expand all

This property is read-only.

Training data that prepares the exhaustive searcher algorithm, specified as a numeric matrix. `X` has n rows, each corresponding to an observation (that is, an instance or example), and K columns, each corresponding to a predictor (that is, a feature).

The input argument `X` of `createns` or `ExhaustiveSearcher` sets this property.

Data Types: `single` | `double`

Distance metric used when you call `knnsearch` or `rangesearch` to find nearest neighbors for future query points, specified as a character vector or string scalar (`'chebychev'`, `'cityblock'`, `'correlation'`, `'cosine'`, `'euclidean'`, `'hamming'`, `'jaccard'`, `'minkowski'`, `'mahalanobis'`, `'seuclidean'`, or `'spearman'`), or a function handle.

The `'Distance'` name-value pair argument of `createns` or `ExhaustiveSearcher` sets this property.

The software does not use the distance metric for creating an `ExhaustiveSearcher` model object, so you can alter it by using dot notation.

Distance metric parameter values, specified as empty (`[]`) or a positive scalar.

This table describes the distance parameters of the supported distance metrics.

Distance MetricParameter Description
`'mahalanobis'`

A positive definite matrix representing the covariance matrix used for computing the Mahalanobis distance. By default, the software sets the covariance using `nancov(Mdl.X)`.

The `'Cov'` name-value pair argument of `createns` or `ExhaustiveSearcher` sets this property.

You can alter `DistParameter` by using dot notation, for example, ```Mdl.DistParameter = CovNew```, where `CovNew` is a K-by-K positive definite numeric matrix.

`'minkowski'`

A positive scalar indicating the exponent of the Minkowski distance. By default, the exponent is `2`.

The `'P'` name-value pair argument of `createns` or `ExhaustiveSearcher` sets this property.

You can alter `DistParameter` by using dot notation, for example, ```Mdl.DistParameter = PNew```, where `PNew` is a positive scalar.

`'seuclidean'`

A positive numeric vector indicating the values used by the software to scale the predictors when computing the standardized Euclidean distance. By default, the software:

1. Estimates the standard deviation of each predictor (column) of `X` using `scale = nanstd(Mdl.X)`

2. Scales each coordinate difference between the rows in `X` and the query matrix by dividing by the corresponding element of `scale`

The `'Scale'` name-value pair argument of `createns` or `ExhaustiveSearcher` sets this property.

You can alter `DistParameter` by using dot notation, for example, ```Mdl.DistParameter = sNew```, where `sNew` is a K-dimensional positive numeric vector.

If `Mdl.Distance` is not one of the parameters listed in this table, then `Mdl.DistParameter` is `[]`, which means that the specified distance metric formula has no parameters.

Data Types: `single` | `double`

## Object Functions

 `knnsearch` Find k-nearest neighbors using searcher object `rangesearch` Find all neighbors within specified distance using searcher object

## Examples

collapse all

Load Fisher's iris data set.

```load fisheriris X = meas; [n,k] = size(X)```
```n = 150 ```
```k = 4 ```

`X` has 150 observations and 4 predictors.

Prepare an exhaustive nearest neighbor searcher using the entire data set as training data.

`Mdl1 = ExhaustiveSearcher(X)`
```Mdl1 = ExhaustiveSearcher with properties: Distance: 'euclidean' DistParameter: [] X: [150x4 double] ```

`Mdl1` is an `ExhaustiveSearcher` model object, and its properties appear in the Command Window. The object contains information about the trained algorithm, such as the distance metric. You can alter property values using dot notation.

Alternatively, you can prepare an exhaustive nearest neighbor searcher by using `createns` and specifying `'exhaustive'` as the search method.

`Mdl2 = createns(X,'NSMethod','exhaustive')`
```Mdl2 = ExhaustiveSearcher with properties: Distance: 'euclidean' DistParameter: [] X: [150x4 double] ```

`Mdl2` is also an `ExhaustiveSearcher` model object, and it is equivalent to `Mdl1`.

To search `X` for the nearest neighbors to a batch of query data, pass the `ExhaustiveSearcher` model object and the query data to `knnsearch` or `rangesearch`.

Load Fisher's iris data set. Focus on the petal dimensions.

```load fisheriris X = meas(:,[3 4]); % Predictors```

Prepare an exhaustive nearest neighbor searcher. Specify the Mahalanobis distance metric.

`Mdl = createns(X,'Distance','mahalanobis')`
```Mdl = ExhaustiveSearcher with properties: Distance: 'mahalanobis' DistParameter: [2x2 double] X: [150x2 double] ```

Because the distance metric is Mahalanobis, `createns` creates an `ExhaustiveSearcher` model object by default.

Access properties of `Mdl` by using dot notation. For example, use `Mdl.DistParameter` to access the Mahalanobis covariance parameter.

`Mdl.DistParameter`
```ans = 2×2 3.1163 1.2956 1.2956 0.5810 ```

You can pass query data and `Mdl` to:

Create an `ExhaustiveSearcher` model object and alter the `Distance` property by using dot notation.

Load Fisher's iris data set.

```load fisheriris X = meas;```

Train a default exhaustive searcher algorithm using the entire data set as training data.

`Mdl = ExhaustiveSearcher(X)`
```Mdl = ExhaustiveSearcher with properties: Distance: 'euclidean' DistParameter: [] X: [150x4 double] ```

Specify that the neighbor searcher use the Mahalanobis metric to compute the distances between the training and query data.

`Mdl.Distance = 'mahalanobis'`
```Mdl = ExhaustiveSearcher with properties: Distance: 'mahalanobis' DistParameter: [4x4 double] X: [150x4 double] ```

You can pass `Mdl` and the query data to either `knnsearch` or `rangesearch` to find the nearest neighbors to the points in the query data based on the Mahalanobis distance.

Create an exhaustive searcher object by using the `createns` function. Pass the object and query data to the `knnsearch` function to find k-nearest neighbors.

Load Fisher's iris data set.

`load fisheriris`

Remove five irises randomly from the predictor data to use as a query set.

```rng('default'); % For reproducibility n = size(meas,1); % Sample size qIdx = randsample(n,5); % Indices of query data X = meas(~ismember(1:n,qIdx),:); Y = meas(qIdx,:);```

Prepare an exhaustive nearest neighbor searcher using the training data. Specify the Mahalanobis distance for finding nearest neighbors.

`Mdl = createns(X,'Distance','mahalanobis')`
```Mdl = ExhaustiveSearcher with properties: Distance: 'mahalanobis' DistParameter: [4x4 double] X: [145x4 double] ```

Because the distance metric is Mahalanobis, `createns` creates an `ExhaustiveSearcher` model object by default.

The software uses the covariance matrix of the predictors (columns) in the training data for computing the Mahalanobis distance. To display this value, use `Mdl.DistParameter`.

`Mdl.DistParameter`
```ans = 4×4 0.6547 -0.0368 1.2320 0.5026 -0.0368 0.1914 -0.3227 -0.1193 1.2320 -0.3227 3.0671 1.2842 0.5026 -0.1193 1.2842 0.5800 ```

Find the indices of the training data (`Mdl.X`) that are the two nearest neighbors of each point in the query data (`Y`).

`IdxNN = knnsearch(Mdl,Y,'K',2)`
```IdxNN = 5×2 5 6 98 95 104 128 135 65 102 115 ```

Each row of `IdxNN` corresponds to a query data observation. The column order corresponds to the order of the nearest neighbors with respect to ascending distance. For example, based on the Mahalanobis metric, the second nearest neighbor of `Y(3,:)` is `X(128,:)`.