Generate cross-validation indices
Indices = crossvalind('Kfold', N, K)
[Train, Test] = crossvalind('HoldOut', N, P)
[Train, Test] = crossvalind('LeaveMOut', N, M)
[Train, Test] = crossvalind('Resubstitution', N, [P,Q])
[...] = crossvalind(Method, Group, ...)
[...] = crossvalind(Method, Group, ..., 'Classes', C)
[...] = crossvalind(Method, Group, ..., 'Min', MinValue)
Indices = crossvalind('Kfold', N, K) returns randomly generated indices for a K-fold cross-validation of N observations. Indices contains equal (or approximately equal) proportions of the integers 1 through K that define a partition of the N observations into K disjoint subsets. Repeated calls return different randomly generated partitions. K defaults to 5 when omitted. In K-fold cross-validation, K-1 folds are used for training and the remaining fold is used for evaluation. This process is repeated K times, leaving a different fold out for evaluation each time.
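The partition described above can be sketched in base MATLAB without the toolbox. This is a minimal illustration of the idea, not the implementation used by crossvalind; the variable names labels and foldSizes are chosen here for illustration only.

```matlab
N = 10;                               % number of observations
K = 3;                                % number of folds

% Assumption for this sketch: build the labels 1..K in near-equal
% proportions, then shuffle them to assign observations to folds.
labels  = mod(0:N-1, K)' + 1;         % 1,2,...,K repeated down a column
indices = labels(randperm(N));        % random fold label per observation

foldSizes = accumarray(indices, 1);   % number of observations in each fold

for k = 1:K
    test  = (indices == k);           % fold k is held out for evaluation
    train = ~test;                    % the other K-1 folds are used for training
    fprintf('fold %d: %d train, %d test\n', k, sum(train), sum(test));
end
```

Every observation appears in exactly one evaluation fold, which is what makes the K evaluation sets disjoint.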
[Train, Test] = crossvalind('HoldOut', N, P) returns logical index vectors for cross-validation of N observations by randomly selecting approximately P*N observations to hold out for the evaluation set. P must be a scalar between 0 and 1. P defaults to 0.5 when omitted, corresponding to holding out 50% of the observations. Calling holdout cross-validation inside a loop is similar to calling K-fold cross-validation once outside the loop, except that the evaluation sets drawn on different iterations are not guaranteed to be disjoint.
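The holdout split can be pictured as a pair of complementary logical masks. The following base MATLAB sketch is an illustration only, not the toolbox implementation; in particular, rounding P*N to the nearest integer is an assumption, since the description above only says "approximately".

```matlab
N = 20;                          % number of observations
P = 0.25;                        % fraction held out for evaluation

nTest = round(P * N);            % assumption: round P*N to the nearest integer
perm  = randperm(N);             % random ordering of the observations

test = false(N, 1);
test(perm(1:nTest)) = true;      % randomly chosen evaluation (hold-out) set
train = ~test;                   % all remaining observations are used for training
```

Repeating this draw in a loop produces evaluation sets that can overlap from one iteration to the next, which is the difference from a K-fold partition.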
[Train, Test] = crossvalind('LeaveMOut', N, M), where M is an integer, returns logical index vectors for cross-validation of N observations by randomly selecting M of the observations to hold out for the evaluation set. M defaults to 1 when omitted. Using 'LeaveMOut' cross-validation within a loop does not guarantee disjoint evaluation sets. To guarantee disjoint evaluation sets, use 'Kfold' instead.
[Train, Test] = crossvalind('Resubstitution', N, [P,Q]) returns logical index vectors for cross-validation of N observations by randomly selecting P*N observations for the evaluation set and Q*N observations for training. The sets are selected to minimize the number of observations that are used in both. P and Q are scalars between 0 and 1. Q=1-P corresponds to holding out (100*P)%, while P=Q=1 corresponds to full resubstitution. [P,Q] defaults to [1,1] when omitted.
[...] = crossvalind(Method, Group, ...) takes the group structure of the data into account. Group is a grouping vector that defines the class for each observation. Group can be a numeric vector, a character vector, or a cell array of character vectors. The partition of the groups depends on the type of cross-validation: For K-fold, each group is divided into K subsets, approximately equal in size. For all others, approximately equal numbers of observations from each group are selected for the evaluation set. In both cases the training set contains at least one observation from each group.
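For the K-fold case, dividing each group into K subsets can be sketched as follows. This is a base MATLAB illustration under stated assumptions, not the toolbox implementation: the grouping vector is numeric with positive integer class labels (so it can be used directly as a subscript in accumarray), and the per-class shuffling scheme is chosen here for simplicity.

```matlab
group = [1 1 1 1 1 1 2 2 2 2 2 2 2 2 2]';    % 6 observations of class 1, 9 of class 2
K = 3;                                        % number of folds

indices = zeros(size(group));
classes = unique(group);
for c = 1:numel(classes)
    members = find(group == classes(c));      % observations belonging to this class
    n = numel(members);
    labels = mod(0:n-1, K)' + 1;              % near-equal fold labels for this class
    indices(members) = labels(randperm(n));   % shuffle the labels within the class
end

% Rows correspond to classes, columns to folds.
counts = accumarray([group, indices], 1);
```

Because each class is split separately, every fold receives a near-equal share of each class, and any K-1 folds used for training still contain observations from every class.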
[...] = crossvalind(Method, Group, ..., 'Classes', C) restricts the observations to only those values specified in C. C can be a numeric vector, a character vector, or a cell array of character vectors, but it must be of the same form as Group. If one output argument is specified, it contains the value 0 for observations belonging to excluded classes. If two output arguments are specified, both contain the logical value false for observations belonging to excluded classes.
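The effect of restricting the classes on the two-output form can be sketched in base MATLAB. This is only an illustration of the masking behavior described above, not the toolbox implementation; the 50% holdout fraction and its rounding are assumptions made for the sketch.

```matlab
group = {'Cancer';'Benign';'Control';'Cancer'; ...
         'Benign';'Control';'Cancer';'Control'};
C = {'Control','Cancer'};                  % classes to keep

included = ismember(group, C);             % true for observations in the kept classes

idx   = find(included);                    % candidates for the train/test split
nTest = round(0.5 * numel(idx));           % assumption: hold out half, rounded
pick  = idx(randperm(numel(idx), nTest));  % random subset of the candidates

test = false(size(group));
test(pick) = true;                         % evaluation set, kept classes only
train = included & ~test;                  % excluded observations are false in both
```

Observations labeled 'Benign' end up false in both train and test, so they take no part in training or evaluation.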
[...] = crossvalind(Method, Group, ..., 'Min', MinValue) sets the minimum number of observations that each group has in the training set. Min defaults to 1. Setting a large value for Min can help to balance the training groups, but adds partial resubstitution when there are not enough observations. You cannot set Min when using K-fold cross-validation.
Create a 10-fold cross-validation to compute classification error.
    load fisheriris
    indices = crossvalind('Kfold',species,10);
    cp = classperf(species);
    for i = 1:10
        test = (indices == i); train = ~test;
        class = classify(meas(test,:),meas(train,:),species(train,:));
        classperf(cp,class,test)
    end
    cp.ErrorRate

    ans =
        0.0200
Approximate a leave-one-out prediction error estimate.
    load carbig
    x = Displacement; y = Acceleration;
    N = length(x);
    sse = 0;
    for i = 1:100
        [train,test] = crossvalind('LeaveMOut',N,1);
        yhat = polyval(polyfit(x(train),y(train),2),x(test));
        sse = sse + sum((yhat - y(test)).^2);
    end
    CVerr = sse / 100

    CVerr =
        4.9750
Divide cancer data 60/40 without using the 'Benign' observations. Assume groups are the true labels of the observations.
    labels = {'Cancer','Benign','Control'};
    groups = labels(ceil(rand(100,1)*3));
    [train,test] = crossvalind('holdout',groups,0.6,'classes',...
        {'Control','Cancer'});
    sum(test) % Total groups allocated for testing

    ans =
        35

    sum(train) % Total groups allocated for training

    ans =
        26