t = templateKNN() returns
a k-nearest neighbor (KNN) learner template suitable
for training ensembles or error-correcting output code (ECOC) multiclass
models.

If you specify a default template, then the software uses default
values for all input arguments during training.

t = templateKNN(Name,Value) creates
a template with additional options specified by one or more name-value
pair arguments.

For example, you can specify the nearest neighbor search method,
the number of nearest neighbors to find, or the distance metric.

If you display t in the Command Window, then
all options appear empty ([]), except those that
you specify using name-value pair arguments. During training, the
software uses default values for empty options.
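
As a minimal sketch of this display behavior, the following creates a default template and a template with two options set, and displays both. The variable names tDefault and tCustom are illustrative only.

```matlab
% Default template: no options are set, so all of them display as empty ([]).
tDefault = templateKNN()

% Template with two options set; the remaining options still display as empty.
tCustom = templateKNN('NumNeighbors',5,'Standardize',true)
```

Omitting the trailing semicolon prints each template in the Command Window, which makes it easy to confirm which options you set.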

All properties of the template object are empty except for NumNeighbors, Method, StandardizeData, and Type. When you specify t as a learner, the software fills in the empty properties with their respective default values.

Specify t as a weak learner for a classification ensemble.

Mdl = fitensemble(meas,species,'Subspace',100,t);

Display the in-sample (resubstitution) misclassification error.
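
The steps of this example can be sketched end to end as follows. The sketch assumes that meas and species come from the fisheriris sample data set, and that the template uses 5 neighbors with standardized predictors; neither the data-loading step nor the template settings appear in the text above.

```matlab
% Assumption: meas and species come from the fisheriris sample data set.
load fisheriris

% Assumed template settings: 5 nearest neighbors, standardized predictors.
t = templateKNN('NumNeighbors',5,'Standardize',1);

% Train a random subspace ensemble of 100 KNN weak learners.
Mdl = fitensemble(meas,species,'Subspace',100,t);

% In-sample (resubstitution) misclassification error.
L = resubLoss(Mdl)
```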

All properties of the template object are empty except for NumNeighbors, Method, StandardizeData, and Type. When you specify t as a learner, the software fills in the empty properties with their respective default values.

Specify t as a binary learner for an ECOC multiclass model.

Mdl = fitcecoc(meas,species,'Learners',t);

By default, the software trains Mdl using the one-versus-one coding design.

Display the in-sample (resubstitution) misclassification error.
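
The ECOC example can be sketched the same way. As before, loading fisheriris and the specific template settings are assumptions, not steps stated in the text above.

```matlab
% Assumption: meas and species come from the fisheriris sample data set.
load fisheriris

% Assumed template settings: 5 nearest neighbors, standardized predictors.
t = templateKNN('NumNeighbors',5,'Standardize',1);

% Train an ECOC multiclass model that uses KNN binary learners.
Mdl = fitcecoc(meas,species,'Learners',t);

% In-sample (resubstitution) misclassification error.
L = resubLoss(Mdl)
```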

Specify optional comma-separated pairs of Name,Value arguments.
Name is the argument
name and Value is the corresponding
value. Name must appear
inside single quotes (' ').
You can specify several name and value pair
arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'NumNeighbors',4,'Distance','minkowski' specifies
a 4-nearest neighbor classifier template using the Minkowski distance
measure.

Tie-breaking algorithm used by the predict method
if multiple classes have the same smallest cost, specified as the
comma-separated pair consisting of 'BreakTies' and
one of the following:

'smallest' — Use the smallest
index among tied groups.

'nearest' — Use the class
with the nearest neighbor among tied groups.

'random' — Use a random
tiebreaker among tied groups.

By default, ties occur when multiple classes have the same number
of nearest points among the K nearest neighbors.
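
For instance, an even number of neighbors makes a tied vote possible in a binary learner. The following sketch, which assumes the fisheriris sample data, resolves such ties in favor of the class of the nearest neighbor.

```matlab
% Assumption: meas and species come from the fisheriris sample data set.
load fisheriris

% With 4 neighbors, a 2-versus-2 vote can occur in a binary learner.
% 'nearest' breaks the tie by using the class of the nearest neighbor.
t = templateKNN('NumNeighbors',4,'BreakTies','nearest');

Mdl = fitcecoc(meas,species,'Learners',t);
L = resubLoss(Mdl);
```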

Maximum number of data points in the leaf node of the kd-tree,
specified as the comma-separated pair consisting of 'BucketSize' and
a positive integer value. This argument is meaningful only when NSMethod is 'kdtree'.

Covariance matrix, specified as the comma-separated pair consisting
of 'Cov' and a positive definite matrix of scalar
values. The software uses this matrix as the covariance matrix when computing
the Mahalanobis distance. This argument is valid only when 'Distance' is 'mahalanobis'.

You cannot simultaneously specify 'Standardize' and
either of 'Scale' or 'Cov'.
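
The following sketch shows one way to supply a covariance matrix explicitly. It assumes the fisheriris sample data and uses the sample covariance of all predictors, which is positive definite for this data set.

```matlab
% Assumption: meas and species come from the fisheriris sample data set.
load fisheriris

% 4-by-4 sample covariance of the predictors.
C = cov(meas);

% 'Cov' is valid only together with the Mahalanobis distance.
% 'Standardize' is deliberately not set, because it conflicts with 'Cov'.
t = templateKNN('Distance','mahalanobis','Cov',C);

Mdl = fitcecoc(meas,species,'Learners',t);
L = resubLoss(Mdl);
```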

Distance metric, specified as the comma-separated pair consisting
of 'Distance' and a valid distance metric string
or function handle. The allowable strings depend on the NSMethod parameter,
which you set in fitcknn, and which exists as
a field in ModelParameters. If you specify CategoricalPredictors as 'all',
then the default distance metric is 'hamming'.
Otherwise, the default distance metric is 'euclidean'.

'correlation'

One minus the sample linear correlation between observations
(treated as sequences of values).

'cosine'

One minus the cosine of the included angle between observations
(treated as vectors).

'euclidean'

Euclidean distance.

'hamming'

Hamming distance, which is the percentage of coordinates that differ.

'jaccard'

One minus the Jaccard coefficient, the percentage of nonzero
coordinates that differ.

'mahalanobis'

Mahalanobis distance, computed using a positive definite covariance
matrix C. The default value of C is
the sample covariance matrix of X, as computed
by nancov(X). To specify a different value for C,
use the 'Cov' name-value pair argument.

'minkowski'

Minkowski distance. The default exponent is 2.
To specify a different exponent, use the 'Exponent' name-value
pair argument.

'seuclidean'

Standardized Euclidean distance. Each coordinate difference
between X and a query point is divided by the corresponding element of the
scale vector S. The default value of S is
the standard deviation computed from X, S = nanstd(X). To specify another
value for S, use the 'Scale' name-value
pair argument.

'spearman'

One minus the sample Spearman's rank correlation between observations
(treated as sequences of values).

@distfun

Distance function handle. distfun has
the form

function D2 = DISTFUN(ZI,ZJ)
% calculation of distance
...

where

ZI is a 1-by-N vector
containing one row of X or y.

ZJ is an M2-by-N matrix
containing multiple rows of X or y.

D2 is an M2-by-1 vector
of distances, and D2(k) is the distance between
observations ZI and ZJ(k,:).
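
A custom distance function of this form can be sketched as an anonymous function. The name cityBlockDist is hypothetical, and the city block metric is chosen only because it is easy to verify by hand.

```matlab
% Hypothetical custom distance: city block (L1) distance between the single
% row ZI (1-by-N) and every row of ZJ (M2-by-N). Returns an M2-by-1 vector.
cityBlockDist = @(ZI,ZJ) sum(abs(bsxfun(@minus,ZJ,ZI)),2);

% Quick check on a small input: distances from [0 0] to [1 1] and to [3 4].
D2 = cityBlockDist([0 0],[1 1; 3 4]);   % D2 is [2; 7]

% Pass the function handle to the template.
t = templateKNN('Distance',cityBlockDist);
```

Checking the handle on a small input before training is a cheap way to confirm that the output has one row per row of ZJ.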

Distance weighting function, specified as the comma-separated
pair consisting of 'DistanceWeight' and either
a function handle or one of the following strings specifying the distance
weighting function.

DistanceWeight

Meaning

'equal'

No weighting

'inverse'

Weight is 1/distance

'squaredinverse'

Weight is 1/distance^2

@fcn

fcn is a function that accepts a
matrix of nonnegative distances, and returns a matrix the same size
containing nonnegative distance weights. For example, 'squaredinverse' is
equivalent to @(d)d.^(-2).
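
The following sketch illustrates the function handle form. The handle names sqInv and expWeight are hypothetical, and the exponential weight is only an example of a function that maps nonnegative distances to nonnegative weights.

```matlab
% Handle equivalent to 'squaredinverse'.
sqInv = @(d) d.^(-2);

% Hypothetical custom weight that decays exponentially with distance.
expWeight = @(d) exp(-d);

d = [1 2; 4 0.5];        % matrix of nonnegative distances
wSq  = sqInv(d);         % same size as d
wExp = expWeight(d);     % same size as d, all entries nonnegative

% Pass the custom weighting function to the template.
t = templateKNN('NumNeighbors',5,'DistanceWeight',expWeight);
```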

Minkowski distance exponent, specified as the comma-separated
pair consisting of 'Exponent' and a positive scalar
value. This argument is only valid when 'Distance' is 'minkowski'.
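
For reference, the Minkowski distance with exponent p between two points x and y is (sum(abs(x - y).^p))^(1/p). The sketch below evaluates that formula by hand for p = 3 and then creates a template with the same exponent. The variable names are illustrative only.

```matlab
p = 3;                            % Minkowski exponent (positive scalar)
x = [0 0];
y = [3 4];

% Minkowski distance computed directly from the formula: (27 + 64)^(1/3).
d = sum(abs(x - y).^p)^(1/p);

% Template that uses the same exponent during the neighbor search.
t = templateKNN('Distance','minkowski','Exponent',p);
```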

Tie inclusion flag, specified as the comma-separated pair consisting
of 'IncludeTies' and a logical value indicating
whether predict includes all the neighbors whose
distance values are equal to the Kth smallest distance.
If IncludeTies is true, predict includes
all these neighbors. Otherwise, predict uses exactly K neighbors.

Nearest neighbor search method, specified as the comma-separated
pair consisting of 'NSMethod' and 'kdtree' or 'exhaustive'.

'kdtree' — Create and use
a kd-tree to find nearest neighbors. 'kdtree' is
valid when the distance metric is one of the following:

'euclidean'

'cityblock'

'minkowski'

'chebychev'

'exhaustive' — Use the exhaustive
search algorithm. The distance values from all points in X to
each point in y are computed to find nearest
neighbors.

The default is 'kdtree' when X has 10 or
fewer columns, X is not sparse, and the distance
metric is a 'kdtree' type; otherwise, 'exhaustive'.
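
The two search methods can be requested explicitly, as in the following sketch. It assumes the fisheriris sample data, which has four predictor columns. The bucket size of 30 and the cosine metric are arbitrary choices for illustration.

```matlab
% Assumption: meas and species come from the fisheriris sample data set.
load fisheriris

% kd-tree search with the default Euclidean metric and a custom leaf size.
tTree = templateKNN('NSMethod','kdtree','BucketSize',30);

% Exhaustive search, required here because 'cosine' is not a kd-tree metric.
tExh = templateKNN('NSMethod','exhaustive','Distance','cosine');

MdlTree = fitcecoc(meas,species,'Learners',tTree);
MdlExh  = fitcecoc(meas,species,'Learners',tExh);

LTree = resubLoss(MdlTree);
LExh  = resubLoss(MdlExh);
```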

Number of nearest neighbors in X to find
for classifying each point when predicting, specified as the comma-separated
pair consisting of 'NumNeighbors' and a positive
integer value.

Distance scale, specified as the comma-separated pair consisting
of 'Scale' and a vector containing nonnegative
scalar values with length equal to the number of columns in X.
Each coordinate difference between X and a query
point is scaled by the corresponding element of Scale.
This argument is only valid when 'Distance' is 'seuclidean'.

You cannot simultaneously specify 'Standardize' and
either of 'Scale' or 'Cov'.
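
As a sketch of how 'Scale' is used, the following supplies one scale value per predictor column. It assumes the fisheriris sample data and uses the column standard deviations as the scale values.

```matlab
% Assumption: meas and species come from the fisheriris sample data set.
load fisheriris

% One nonnegative scale value per predictor column (1-by-4 here).
S = std(meas);

% 'Scale' is valid only together with the standardized Euclidean distance.
% 'Standardize' is deliberately not set, because it conflicts with 'Scale'.
t = templateKNN('Distance','seuclidean','Scale',S);

Mdl = fitcecoc(meas,species,'Learners',t);
L = resubLoss(Mdl);
```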

Flag to standardize the predictors, specified as the comma-separated
pair consisting of 'Standardize' and true (1)
or false (0).

If you set 'Standardize',true, then the software
centers and scales each column of the predictor data (X)
by the column mean and standard deviation, respectively.

The software does not standardize categorical predictors, and
throws an error if all predictors are categorical.

You cannot simultaneously specify 'Standardize',1 and
either of 'Scale' or 'Cov'.

It is good practice to standardize the predictor data.

KNN classification template suitable for
training ensembles or error-correcting output code (ECOC) multiclass
models, returned as a template object. Pass t to fitensemble or fitcecoc to
specify how to create the KNN classifier for the ensemble or ECOC
model, respectively.

If you display t in the Command Window, then
all unspecified options appear empty ([]). However,
the software replaces empty options with their corresponding default
values during training.