Loss estimate using cross-validation
vals = crossval(fun,X)
vals = crossval(fun,X,Y,...)
mse = crossval('mse',X,y,'Predfun',predfun)
mcr = crossval('mcr',X,y,'Predfun',predfun)
val = crossval(criterion,X1,X2,...,y,'Predfun',predfun)
vals = crossval(...,'name',value)
vals = crossval(fun,X) performs 10-fold cross-validation for the function fun, applied to the data in X.
fun is a function handle to a function with two inputs, the training subset of X, XTRAIN, and the test subset of X, XTEST, as follows:
testval = fun(XTRAIN,XTEST)
Each time it is called, fun should use XTRAIN to fit a model, then return some criterion testval computed on XTEST using that fitted model.
X can be a column vector or a matrix. Rows of X correspond to observations; columns correspond to variables or features. Each row of vals contains the result of applying fun to one test set. If testval is a nonscalar value, crossval converts it to a row vector using linear indexing and stores it in one row of vals.
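The one-input form can be sketched as follows. The handles devfun and pairfun and the simulated data are hypothetical and exist only for illustration; the "model" is just the mean of the training rows.

```matlab
% Hypothetical example: the "model" is simply the training-set mean.
rng(0);                          % fix the seed so the random folds are reproducible
X = randn(100,1);                % 100 observations of one variable

% fun receives the training subset first and the test subset second.
devfun = @(XTRAIN,XTEST) mean((XTEST - mean(XTRAIN)).^2);
vals = crossval(devfun,X);       % default 10-fold: one row of vals per test set

% A nonscalar result (here a 2-by-1 column) is stored as one row of vals.
pairfun = @(XTRAIN,XTEST) [mean(XTRAIN); mean(XTEST)];
vals2 = crossval(pairfun,X);     % 10 rows, 2 columns
```

Averaging vals afterward (for example with mean(vals)) gives a single summary value if one is needed.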
vals = crossval(fun,X,Y,...) is used when data are stored in separate variables X, Y, ... . All variables (column vectors, matrices, or arrays) must have the same number of rows. fun is called with the training subsets of X, Y, ..., followed by the test subsets of X, Y, ..., as follows:
testvals = fun(XTRAIN,YTRAIN,...,XTEST,YTEST,...)
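A minimal sketch of the two-variable form, assuming simulated data and a hypothetical handle maefun that fits a least-squares line with the backslash operator and reports the mean absolute error on each test set:

```matlab
% Hypothetical data: an intercept column plus one predictor, noisy linear response.
rng(0);
X = [ones(50,1), (1:50)'];
Y = 2 + 0.5*X(:,2) + 0.1*randn(50,1);

% Training subsets come first (XTRAIN, YTRAIN), then the test subsets.
% XTRAIN\YTRAIN returns the least-squares coefficients for the training data.
maefun = @(XTRAIN,YTRAIN,XTEST,YTEST) ...
    mean(abs(YTEST - XTEST*(XTRAIN\YTRAIN)));

vals = crossval(maefun,X,Y);     % one mean absolute error per test set
avgMae = mean(vals);             % summary across the 10 test sets
```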
mse = crossval('mse',X,y,'Predfun',predfun) returns mse, a scalar containing a 10-fold cross-validation estimate of mean-squared error for the function predfun. X can be a column vector, matrix, or array of predictors. y is a column vector of response values. X and y must have the same number of rows.
predfun is a function handle called with the training subset of X, the training subset of y, and the test subset of X as follows:
yfit = predfun(XTRAIN,ytrain,XTEST)
Each time it is called, predfun should use XTRAIN and ytrain to fit a regression model and then return fitted values in a column vector yfit. Each row of yfit contains the predicted values for the corresponding row of XTEST. crossval computes the squared errors between yfit and the corresponding response test set, and returns the overall mean across all test sets.
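The computation described above can be reproduced by hand for comparison. This is a sketch, not the function's actual source: it assumes the squared errors are pooled over all test sets as the text states, and it passes the same cvpartition object to crossval so that both computations use identical folds.

```matlab
load('fisheriris');
y = meas(:,1);
X = [ones(size(y,1),1), meas(:,2:4)];
predfun = @(XTRAIN,ytrain,XTEST) XTEST*(XTRAIN\ytrain);   % least-squares fit

cp = cvpartition(size(X,1),'KFold',10);   % 10 folds over the 150 rows
sse = 0;
for k = 1:cp.NumTestSets
    tr = training(cp,k);                  % logical index of training rows
    te = test(cp,k);                      % logical index of test rows
    yfit = predfun(X(tr,:),y(tr),X(te,:));
    sse = sse + sum((y(te) - yfit).^2);   % accumulate squared errors
end
manualMse = sse/numel(y);                 % overall mean across all test sets

% Same folds, computed by crossval itself.
cvMse = crossval('mse',X,y,'Predfun',predfun,'partition',cp);
```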
mcr = crossval('mcr',X,y,'Predfun',predfun) returns mcr, a scalar containing a 10-fold cross-validation estimate of misclassification rate (the proportion of misclassified samples) for the function predfun. The matrix X contains predictor values and the vector y contains class labels.
predfun should use XTRAIN and ytrain to fit a classification model and return yfit as the predicted class labels for XTEST.
crossval computes the number of misclassifications between yfit and the corresponding response test set, and returns the overall misclassification rate across all test sets.
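The same by-hand check can be sketched for the misclassification rate. It assumes that classify returns labels in the same cell-array form as the input labels, and it shares one cvpartition object between the manual loop and crossval so the folds match.

```matlab
load('fisheriris');
X = meas;
y = species;                               % cell array of class names
predfun = @(XTRAIN,ytrain,XTEST) classify(XTEST,XTRAIN,ytrain);

cp = cvpartition(y,'KFold',10);            % partition built from the labels
nWrong = 0;
for k = 1:cp.NumTestSets
    tr = training(cp,k);
    te = test(cp,k);
    yfit = predfun(X(tr,:),y(tr),X(te,:));
    nWrong = nWrong + sum(~strcmp(yfit,y(te)));   % count misclassified rows
end
manualMcr = nWrong/numel(y);               % proportion misclassified overall

cvMcr = crossval('mcr',X,y,'Predfun',predfun,'partition',cp);
```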
val = crossval(criterion,X1,X2,...,y,'Predfun',predfun), where criterion is 'mse' or 'mcr', returns a cross-validation estimate of mean-squared error (for a regression model) or misclassification rate (for a classification model) with predictor values in X1, X2, ... and, respectively, response values or class labels in y. X1, X2, ... and y must have the same number of rows. predfun is a function handle called with the training subsets of X1, X2, ..., the training subset of y, and the test subsets of X1, X2, ..., as follows:
yfit=predfun(X1TRAIN,X2TRAIN,...,ytrain,X1TEST,X2TEST,...)
yfit should be a column vector containing the fitted values.
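A short sketch of the multiple-predictor form, assuming the iris measurements are split into two hypothetical predictor blocks X1 and X2 and that the model is an ordinary least-squares fit with an intercept:

```matlab
load('fisheriris');
y  = meas(:,1);
X1 = meas(:,2);                  % first predictor block (one column)
X2 = meas(:,3:4);                % second predictor block (two columns)

% predfun receives X1TRAIN, X2TRAIN, ytrain, X1TEST, X2TEST in that order.
predfun = @(X1TRAIN,X2TRAIN,ytrain,X1TEST,X2TEST) ...
    [ones(size(X1TEST,1),1), X1TEST, X2TEST] * ...
    ([ones(size(X1TRAIN,1),1), X1TRAIN, X2TRAIN] \ ytrain);

val = crossval('mse',X1,X2,y,'Predfun',predfun);   % scalar MSE estimate
```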
vals = crossval(...,'name',value) specifies one or more optional parameter name/value pairs from the following table. Specify name inside single quotes.
Name  Value
holdout  A scalar specifying the ratio or the number of observations
kfold  A positive integer that is greater than 1 specifying the number of folds
leaveout  Specifies leave-one-out cross-validation. The value must be
mcreps  A positive integer specifying the number of Monte-Carlo repetitions for validation. If the first input of
partition  An object
stratify  A column vector
options  A structure that specifies whether to run in parallel, and specifies the random stream or streams. Create the
Only one of kfold, holdout, leaveout, or partition can be specified, and partition cannot be specified with stratify. If both partition and mcreps are specified, the first Monte-Carlo repetition uses the partition information in the cvpartition object, and the repartition method is called to generate new partitions for each of the remaining repetitions. If no cross-validation type is specified, the default is 10-fold cross-validation.
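The name/value pairs can be exercised one at a time, as sketched below. Two values are assumptions, because the corresponding table entries are cut off: a holdout value between 0 and 1 is taken to be a ratio, and 1 is used as the leaveout value. The handle regf is the same kind of least-squares predictor used elsewhere on this page.

```matlab
load('fisheriris');
y = meas(:,1);
X = [ones(size(y,1),1), meas(:,2:4)];
regf = @(XTRAIN,ytrain,XTEST) XTEST*(XTRAIN\ytrain);

% Only one of kfold, holdout, leaveout, or partition per call.
mse5   = crossval('mse',X,y,'Predfun',regf,'kfold',5);      % 5 folds
mseHo  = crossval('mse',X,y,'Predfun',regf,'holdout',0.3);  % assumed: 0.3 read as a ratio
mseLoo = crossval('mse',X,y,'Predfun',regf,'leaveout',1);   % assumed: value 1

% partition combined with mcreps: the first repetition uses cp as given,
% and the remaining repetitions use new partitions.
cp = cvpartition(size(X,1),'KFold',10);
mseRep = crossval('mse',X,y,'Predfun',regf,'partition',cp,'mcreps',3);
```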
When using cross-validation with classification algorithms, stratification is preferred. Otherwise, some test sets may not include observations from all classes.
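Stratification can be requested directly through the stratify parameter, as in this sketch. It assumes that a numeric column vector of group indices (built here with grp2idx) is an acceptable stratify value; the variable names are illustrative.

```matlab
load('fisheriris');
g = grp2idx(species);            % numeric group index, one entry per row
classf = @(XTRAIN,ytrain,XTEST) classify(XTEST,XTRAIN,ytrain);

% Ask crossval to balance the classes within each of the default 10 folds.
mcrStrat = crossval('mcr',meas,species,'Predfun',classf,'stratify',g);

% A cvpartition built from the labels is stratified as well; count how many
% observations of each class land in the first test set.
cp = cvpartition(species,'KFold',10);
counts = accumarray(g(test(cp,1)),1,[3 1]);
```

Inspecting counts for each test set is a quick way to confirm that no class is missing from a fold.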
Compute mean-squared error for regression using 10-fold cross-validation:
load('fisheriris');
y = meas(:,1);
X = [ones(size(y,1),1),meas(:,2:4)];
regf=@(XTRAIN,ytrain,XTEST)(XTEST*regress(ytrain,XTRAIN));
cvMse = crossval('mse',X,y,'predfun',regf)
cvMse =
    0.1015
Compute misclassification rate using stratified 10-fold cross-validation:
load('fisheriris');
y = species;
X = meas;
cp = cvpartition(y,'k',10); % Stratified cross-validation
classf = @(XTRAIN, ytrain,XTEST)(classify(XTEST,XTRAIN,...
    ytrain));
cvMCR = crossval('mcr',X,y,'predfun',classf,'partition',cp)
cvMCR =
    0.0200
Compute the confusion matrix using stratified 10-fold cross-validation:
load('fisheriris');
y = species;
X = meas;
order = unique(y); % Order of the group labels
cp = cvpartition(y,'k',10); % Stratified cross-validation
f = @(xtr,ytr,xte,yte)confusionmat(yte,...
    classify(xte,xtr,ytr),'order',order);
cfMat = crossval(f,X,y,'partition',cp);
cfMat = reshape(sum(cfMat),3,3)
cfMat =
    50     0     0
     0    48     2
     0     1    49
cfMat is the summation of 10 confusion matrices from 10 test sets.