crossval - Loss estimate using cross-validation

Syntax

loss = crossval(fun,X)
loss = crossval(fun,X,Y,...)
loss = crossval(...,param1,val1,param2,val2,...)

Description

loss = crossval(fun,X) computes 10-fold cross-validation loss for the function fun applied to the data in X.

fun is a function handle to a function with two inputs: the training subset of X (XTRAIN) and the test subset of X (XTEST), as follows:

testloss = fun(XTRAIN,XTEST)

fun returns testloss, the distance, or loss, computed on the test subset using the model fit to the training subset.
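As a minimal sketch of such a function, the handle below treats the training mean as the "model" and reports the mean squared deviation of the test observations from it. The data X and the choice of loss are assumptions made for illustration only.

```matlab
% Hypothetical data: 100 observations of a single variable.
X = randn(100,1);

% "Fit" on the training subset (its mean), then score the test subset.
fun = @(XTRAIN,XTEST) mean((XTEST - mean(XTRAIN)).^2);

loss = crossval(fun,X);   % default is 10-fold cross-validation
size(loss)                % one row per test set: 10-by-1
cvLoss = mean(loss);      % average loss over the 10 test sets
```

Averaging the rows of loss is one common way to summarize the folds; other summaries may suit other loss definitions.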

X can be a column vector or a matrix. Rows of X correspond to observations; columns correspond to variables or features. Each row of loss contains the loss value for one test set. If testloss is a matrix or array, it is converted to a row vector in linear indexing order for storage in one row of loss.
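The row-vector conversion can be illustrated with a function that returns a small matrix. In this sketch, which uses made-up data and a hypothetical helper named resid, testloss is 2-by-2 (rows are loss measures, columns are variables), so each row of loss should hold four values in column-major order.

```matlab
% Hypothetical data: 60 observations of two variables.
X = randn(60,2);

% Residuals of the test rows about the training column means.
resid = @(XTRAIN,XTEST) bsxfun(@minus,XTEST,mean(XTRAIN,1));

% testloss is 2-by-2: row 1 = mean absolute error, row 2 = mean squared error.
fun = @(XTRAIN,XTEST) [mean(abs(resid(XTRAIN,XTEST)),1); ...
                       mean(resid(XTRAIN,XTEST).^2,1)];

loss = crossval(fun,X);
size(loss)   % 10-by-4: columns are [MAE1 MSE1 MAE2 MSE2]
```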

Typical loss measures include the mean-squared error for regression and misclassification rate for classification.

loss = crossval(fun,X,Y,...) is used when the data are stored in separate variables X, Y, and so on. All variables (column vectors, matrices, or arrays) must have the same number of rows. fun is called with the training subsets of X, Y, ..., followed by the test subsets of X, Y, ..., as follows:

testloss = fun(XTRAIN,YTRAIN,...,XTEST,YTEST,...)
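A sketch of the two-variable form, assuming simulated regression data and an ordinary least-squares fit via the backslash operator (the coefficients and noise level are invented for illustration):

```matlab
% Hypothetical regression data: intercept plus two predictors.
n = 80;
X = [ones(n,1), randn(n,2)];
Y = X*[1; 2; -1] + 0.1*randn(n,1);

% Training subsets come first, then the test subsets, in the same order.
fun = @(XTRAIN,YTRAIN,XTEST,YTEST) ...
    sum((YTEST - XTEST*(XTRAIN\YTRAIN)).^2);

sse = crossval(fun,X,Y);   % sum of squared errors for each test set
mse = sum(sse)/n;          % pooled mean-squared error
```

Returning the sum of squared errors per test set and dividing by n afterward weights every observation equally, even when the test sets differ slightly in size.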

loss = crossval(...,param1,val1,param2,val2,...) specifies optional parameter name/value pairs from the following table:

Name          Value

'holdout'     A scalar p specifying the fraction or the number of observations held out for testing in holdout cross-validation. When 0 < p < 1, approximately p*n observations are randomly selected for the test set, where n is the number of observations. When p is an integer, p observations are randomly selected for the test set.

'kfold'       A scalar specifying the number of folds k for k-fold cross-validation.

'leaveout'    Specifies leave-one-out cross-validation. The value must be 1.

'mcreps'      A positive integer specifying the number of Monte-Carlo repetitions. The values of loss from all repetitions are concatenated along the first dimension.

'partition'   An object c of the @cvpartition class, specifying the cross-validation type and partition.

'stratify'    A column vector group specifying groups for stratification. Both training and test sets have roughly the same class proportions as in group. NaNs or empty strings in group are treated as missing values, and the corresponding rows of X, Y, ... are ignored.

Only one of 'kfold', 'holdout', 'leaveout', or 'partition' can be specified, and 'partition' cannot be specified with 'stratify'. If both 'partition' and 'mcreps' are specified, the first Monte-Carlo repetition uses the partition information in the cvpartition object, and the repartition method is called to generate new partitions for each of the remaining repetitions. If no cross-validation type is specified, the default is 10-fold cross-validation.
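The following sketch shows how the cross-validation type should determine the number of rows in loss. The data and loss function are assumptions chosen only to keep the example short.

```matlab
% Hypothetical data and a simple mean-based loss.
X = randn(50,1);
fun = @(XTRAIN,XTEST) mean((XTEST - mean(XTRAIN)).^2);

lossK  = crossval(fun,X,'kfold',5);             % 5 test sets  -> 5 rows
lossH  = crossval(fun,X,'holdout',0.2);         % 1 test set   -> 1 row
lossL  = crossval(fun,X,'leaveout',1);          % 50 test sets -> 50 rows
lossMC = crossval(fun,X,'kfold',5,'mcreps',3);  % 3 x 5 = 15 rows

% A cvpartition object combined with 'mcreps': the first repetition uses c,
% and the second uses a new random partition of the same type.
c = cvpartition(50,'kfold',5);
lossP = crossval(fun,X,'partition',c,'mcreps',2);   % 2 x 5 = 10 rows
```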

Examples

Example 1

Use 10-fold cross-validation to compute mean-squared error for regression:

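load('fisheriris');
y = meas(:,1);                          % response: first measurement
x = [ones(size(y,1),1),meas(:,2:4)];    % intercept plus remaining measurements

% Fit on the training subset; return the test-set sum of squared errors.
fun = @(xT,yT,xt,yt)(norm(yt-xt*regress(yT,xT)).^2);
SSE = crossval(fun,x,y);

% Pool the per-fold sums of squared errors into one mean-squared error.
MSE = sum(SSE)/length(y)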
MSE =
    0.1015

Example 2

Use stratified 10-fold cross-validation to compute misclassification rate:

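load fisheriris;
y = species;
c = cvpartition(y,'kfold',10);   % partition stratified by the groups in y

% Count the misclassified test observations in each fold.
fun = @(xT,yT,xt,yt)(sum(~strcmp(yt,classify(xt,xT,yT))));

rate = sum(crossval(fun,meas,y,'partition',c))...
           /sum(c.TestSize)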
rate =
    0.0200
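As a hedged alternative sketch, a similar stratified estimate can be requested with the 'stratify' parameter instead of a cvpartition object. Because the partition is drawn at random, the resulting rate may differ slightly from the value shown above.

```matlab
load fisheriris;

% Count the misclassified test observations in each fold.
fun = @(xT,yT,xt,yt)(sum(~strcmp(yt,classify(xt,xT,yT))));

% Default 10-fold cross-validation, stratified by species.
errCount = crossval(fun,meas,species,'stratify',species);
rate = sum(errCount)/numel(species);
```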

Reference

[1] Hastie, T., Tibshirani, R., and Friedman, J. The Elements of Statistical Learning. Springer, 2001.

See Also

cvpartition

  


© 1984-2008 The MathWorks, Inc.