Classify data using nearest neighbor method
knnclassify will be removed in a future release. Instead, use fitcknn to fit a k-nearest neighbor (KNN) classification model, and classify data using the predict function of the ClassificationKNN object.
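The following is a minimal sketch of that migration. It reuses the data from the Classifying Rows example. It assumes the Statistics and Machine Learning Toolbox is installed. The name-value options are our choice, meant to mirror the knnclassify defaults of one neighbor, Euclidean distance, and a nearest-point tiebreak (fitcknn's own BreakTies default is 'smallest'). Check the fitcknn reference for your release.

```matlab
% Data from the "Classifying Rows" example
training = [0 0; .5 .5; 1 1];   % one training observation per row
group    = [1; 2; 3];           % group label for each training row
sample   = [.9 .8; .1 .3; .2 .6];

% Fit a ClassificationKNN model. These options are an assumption chosen to
% mirror knnclassify's defaults (k = 1, Euclidean, nearest-point tiebreak).
mdl = fitcknn(training, group, ...
    'NumNeighbors', 1, ...
    'Distance', 'euclidean', ...
    'BreakTies', 'nearest');

% predict takes the place of knnclassify(sample, training, group)
predicted = predict(mdl, sample);
disp(predicted')
```

The variable is named predicted rather than class so that it does not shadow MATLAB's built-in class function.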
Class = knnclassify(Sample, Training, Group)
Class = knnclassify(Sample, Training, Group, k)
Class = knnclassify(Sample, Training, Group, k, distance)
Class = knnclassify(Sample, Training, Group, k, distance, rule)
Sample: Matrix whose rows will be classified into groups. Sample must have the same number of columns as Training.
Training: Matrix used to group the rows in the matrix Sample. Training must have the same number of columns as Sample. Each row of Training belongs to the group whose value is the corresponding entry of Group.
Group: Vector whose distinct values define the grouping of the rows in Training.
k: Number of nearest neighbors used in the classification. Default is 1.
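distance: Character vector specifying the distance metric. Choices are 'euclidean' (default), 'cityblock', 'cosine', 'correlation', and 'hamming'.
rule: Character vector specifying the rule used to decide how to classify the sample. Choices are 'nearest' (default), 'random', and 'consensus'.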
Class = knnclassify(Sample, Training, Group) classifies the rows of the data matrix Sample into groups, based on the grouping of the rows of Training. Sample and Training must be matrices with the same number of columns. Group is a vector whose distinct values define the grouping of the rows in Training. Each row of Training belongs to the group whose value is the corresponding entry of Group.

knnclassify assigns each row of Sample to the group of the closest row of Training. Group can be a numeric vector, a character vector, or a cell array of character vectors. Training and Group must have the same number of rows. knnclassify treats NaNs or empty character vectors in Group as missing values, and ignores the corresponding rows of Training. Class indicates which group each row of Sample has been assigned to, and is of the same type as Group.
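The nearest-neighbor rule described above can be sketched in core MATLAB without any toolbox functions. This illustrates the technique and is not knnclassify's actual implementation. The sketch assumes numeric labels in Group and Euclidean distance. It does not handle missing (NaN) groups.

```matlab
% Data from the "Classifying Rows" example
training = [0 0; .5 .5; 1 1];   % training rows
group    = [1; 2; 3];           % group label of each training row
sample   = [.9 .8; .1 .3; .2 .6];

nSample  = size(sample, 1);
assigned = zeros(nSample, 1);
for i = 1:nSample
    % Squared Euclidean distance from sample row i to every training row;
    % squaring does not change which training row is closest
    diffs = bsxfun(@minus, training, sample(i, :));
    d = sum(diffs .^ 2, 2);
    [~, idx] = min(d);            % index of the closest training row
    assigned(i) = group(idx);     % copy that row's group label
end
disp(assigned')
```

Here assigned is [3; 1; 2], the same result that knnclassify returns in the Classifying Rows example.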
Class = knnclassify(Sample, Training, Group, k) enables you to specify k, the number of nearest neighbors used in the classification. Default is 1.
Class = knnclassify(Sample, Training, Group, k, distance) enables you to specify the distance metric. Choices for distance are:
'euclidean': Euclidean distance (default)
'cityblock': Sum of absolute differences
'cosine': One minus the cosine of the included angle between points (treated as vectors)
'correlation': One minus the sample correlation between points (treated as sequences of values)
'hamming': Percentage of bits that differ (suitable only for binary data)
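As a rough illustration, the five metrics can be written out for a single pair of row vectors. The vectors x and y are made up. The formulas follow the one-line descriptions above and are not taken from knnclassify's source. The Hamming distance is computed as a fraction of differing coordinates, so 0.5 corresponds to 50%.

```matlab
% Two example points (arbitrary values chosen for illustration)
x = [1 0 1 1];
y = [0 0 1 0];

euclideanDist = sqrt(sum((x - y) .^ 2));               % straight-line distance
cityblockDist = sum(abs(x - y));                       % sum of absolute differences
cosineDist    = 1 - dot(x, y) / (norm(x) * norm(y));   % one minus cosine of the angle

xc = x - mean(x);                                      % center each point
yc = y - mean(y);
correlationDist = 1 - dot(xc, yc) / (norm(xc) * norm(yc));  % one minus sample correlation

hammingDist = mean(x ~= y);                            % fraction of coordinates that differ

fprintf('euclidean %.4f, cityblock %g, cosine %.4f, correlation %.4f, hamming %g\n', ...
    euclideanDist, cityblockDist, cosineDist, correlationDist, hammingDist);
```

For these vectors the Euclidean distance is sqrt(2), the city block distance is 2, the correlation distance is 2/3, and half of the coordinates differ.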
Class = knnclassify(Sample, Training, Group, k, distance, rule) enables you to specify the rule used to decide how to classify the sample. Choices for rule are:
'nearest': Majority rule with nearest point tiebreak (default)
'random': Majority rule with random point tiebreak
'consensus': Consensus rule
The default behavior is to use majority rule. That is, a sample point is assigned to the class to which the majority of its k nearest neighbors belong. Use 'consensus' to require a consensus instead of a majority. With the 'consensus' option, points whose k nearest neighbors do not all belong to the same class are not assigned to any class. Instead, the output Class for these points is NaN for numerical groups, '' for string-named groups, or undefined for categorical groups. When classifying into more than two groups, or when using an even value for k, it might be necessary to break a tie in the number of nearest neighbors. The options are 'random', which selects a random tiebreaker, and 'nearest', which uses the nearest neighbor among the tied groups to break the tie. The default behavior is majority rule with the nearest tiebreak.
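The following sketch applies majority rule with the 'nearest' tiebreak, and then the 'consensus' rule, to the neighbor labels of one sample point. The labels in neighborGroups are hypothetical and are assumed to be sorted from nearest to farthest. This illustrates the rules as described above. It is not knnclassify's internal code.

```matlab
% Hypothetical labels of the k = 4 nearest neighbors of one sample point,
% sorted from nearest to farthest
neighborGroups = [2; 1; 1; 2];

% Majority rule: count how often each group occurs among the neighbors
groups = unique(neighborGroups);
counts = arrayfun(@(g) sum(neighborGroups == g), groups);
tied   = groups(counts == max(counts));   % groups sharing the top count

if numel(tied) == 1
    majorityClass = tied;
else
    % 'nearest' tiebreak: take the tied group of the closest neighbor,
    % i.e. the first tied label in the sorted list
    firstTied = find(ismember(neighborGroups, tied), 1);
    majorityClass = neighborGroups(firstTied);
end

% 'consensus' rule: assign a group only if all k neighbors agree
if all(neighborGroups == neighborGroups(1))
    consensusClass = neighborGroups(1);
else
    consensusClass = NaN;   % NaN marks "not assigned" for numeric groups
end
```

With these labels, groups 1 and 2 each receive two votes. The tie goes to group 2 because the single closest neighbor belongs to it. The four neighbors do not agree, so consensusClass is NaN.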
Classifying Rows
The following example classifies the rows of the matrix sample:
sample = [.9 .8;.1 .3;.2 .6]
sample =
    0.9000    0.8000
    0.1000    0.3000
    0.2000    0.6000
training = [0 0;.5 .5;1 1]
training =
         0         0
    0.5000    0.5000
    1.0000    1.0000
group = [1;2;3]
group =
     1
     2
     3
class = knnclassify(sample, training, group)
class =
     3
     1
     2
Row 1 of sample is closest to row 3 of training, so class(1) = 3. Row 2 of sample is closest to row 1 of training, so class(2) = 1. Row 3 of sample is closest to row 2 of training, so class(3) = 2.
Classifying Rows into One of Two Groups
The following example classifies each row of the data in sample into one of the two groups in training. The following commands create the matrix training and the grouping variable group, and plot the rows of training in two groups.
training = [mvnrnd([ 1  1], eye(2), 100); ...
            mvnrnd([-1 -1], 2*eye(2), 100)];
group = [repmat(1,100,1); repmat(2,100,1)];
gscatter(training(:,1),training(:,2),group,'rb','+x');
legend('Training group 1', 'Training group 2');
hold on;
The following commands create the matrix sample, classify its rows into two groups, and plot the result.
sample = unifrnd(-5, 5, 100, 2);
% Classify the sample using the nearest neighbor classification
c = knnclassify(sample, training, group);
gscatter(sample(:,1),sample(:,2),c,'mc');
hold on;
legend('Training group 1','Training group 2', ...
       'Data in group 1','Data in group 2');
hold off;
Classifying Rows Using the Three Nearest Neighbors
The following example uses the same data as in Classifying Rows into One of Two Groups, but classifies the rows of sample using three nearest neighbors instead of one.
gscatter(training(:,1),training(:,2),group,'rb','+x');
hold on;
c3 = knnclassify(sample, training, group, 3);
gscatter(sample(:,1),sample(:,2),c3,'mc','o');
legend('Training group 1','Training group 2','Data in group 1','Data in group 2');
If you compare this plot with the one in Classifying Rows into One of Two Groups, you see that some of the data points are classified differently using three nearest neighbors.
See Also: classify, classperf, crossvalind, fitcknn, knnimpute, svmclassify, svmtrain