
# crossval

Loss estimate using cross-validation

## Syntax

```
vals = crossval(fun,X)
vals = crossval(fun,X,Y,...)
mse = crossval('mse',X,y,'Predfun',predfun)
mcr = crossval('mcr',X,y,'Predfun',predfun)
val = crossval(criterion,X1,X2,...,y,'Predfun',predfun)
vals = crossval(...,'name',value)
```

## Description

`vals = crossval(fun,X)` performs 10-fold cross-validation for the function `fun`, applied to the data in `X`.

`fun` is a function handle to a function with two inputs, the training subset of `X`, `XTRAIN`, and the test subset of `X`, `XTEST`, as follows:

`testval = fun(XTRAIN,XTEST)`

Each time it is called, `fun` should use `XTRAIN` to fit a model, then return some criterion `testval` computed on `XTEST` using that fitted model.

`X` can be a column vector or a matrix. Rows of `X` correspond to observations; columns correspond to variables or features. Each row of `vals` contains the result of applying `fun` to one test set. If `testval` is a non-scalar value, `crossval` converts it to a row vector using linear indexing and stores it in one row of `vals`.
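
As a minimal sketch of the single-input form, the following uses made-up random data and a deliberately trivial "model" (the training-fold mean). The variable names and the data are assumptions chosen for illustration, not part of the original page.

```matlab
rng(0);                       % fix the seed so the random folds are repeatable
X = randn(100,1);             % hypothetical data: 100 observations, 1 variable

% "Fit" = mean of the training fold; criterion = mean squared deviation
% of the test fold from that mean.
fun = @(XTRAIN,XTEST) mean((XTEST - mean(XTRAIN)).^2);

vals = crossval(fun,X);       % default 10-fold: one scalar per test fold
avgCriterion = mean(vals);    % summarize across folds if a single number is wanted
```

Because `fun` returns a scalar here, `vals` has one row per fold and a single column.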

`vals = crossval(fun,X,Y,...)` is used when data are stored in separate variables `X`, `Y`, ... . All variables (column vectors, matrices, or arrays) must have the same number of rows. `fun` is called with the training subsets of `X`, `Y`, ... , followed by the test subsets of `X`, `Y`, ... , as follows:

`testvals = fun(XTRAIN,YTRAIN,...,XTEST,YTEST,...)`
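
The argument order (all training subsets first, then all test subsets) can be sketched as follows. This example assumes the `fisheriris` sample data and uses a plain least-squares fit via the backslash operator; the handle name `ssefun` is hypothetical.

```matlab
load fisheriris
y = meas(:,1);                        % response: first measurement column
X = [ones(size(y)) meas(:,2:4)];      % predictors with an intercept column

% Training subsets come first, test subsets last.
% Returns the sum of squared errors on the test fold.
ssefun = @(XTRAIN,YTRAIN,XTEST,YTEST) ...
    sum((YTEST - XTEST*(XTRAIN\YTRAIN)).^2);

vals  = crossval(ssefun,X,y);         % one SSE per fold (10 folds by default)
cvMse = sum(vals)/numel(y);           % pooled mean-squared error over all folds
```

Returning a per-fold sum rather than a per-fold mean makes it straightforward to pool the folds into one overall error afterwards.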

`mse = crossval('mse',X,y,'Predfun',predfun)` returns `mse`, a scalar containing a 10-fold cross-validation estimate of mean-squared error for the function `predfun`. `X` can be a column vector, matrix, or array of predictors. `y` is a column vector of response values. `X` and `y` must have the same number of rows.

`predfun` is a function handle called with the training subset of `X`, the training subset of `y`, and the test subset of `X` as follows:

```yfit = predfun(XTRAIN,ytrain,XTEST) ```

Each time it is called, `predfun` should use `XTRAIN` and `ytrain` to fit a regression model and then return fitted values in a column vector `yfit`. Each row of `yfit` contains the predicted values for the corresponding row of `XTEST`. `crossval` computes the squared errors between `yfit` and the corresponding response test set, and returns the overall mean across all test sets.
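
A short sketch of a `predfun` with this signature, assuming synthetic one-dimensional data and a straight-line fit with `polyfit`/`polyval` (the data and noise level are made up for illustration):

```matlab
rng(1);                                   % repeatable synthetic data and folds
x = linspace(0,1,60)';                    % hypothetical predictor (column vector)
y = 2*x + 0.1*randn(60,1);                % hypothetical noisy linear response

% Fit a degree-1 polynomial on the training fold, predict on the test fold.
% polyval returns a column vector because XTEST is a column vector.
predfun = @(XTRAIN,ytrain,XTEST) polyval(polyfit(XTRAIN,ytrain,1),XTEST);

mse = crossval('mse',x,y,'Predfun',predfun);   % scalar 10-fold estimate
```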

`mcr = crossval('mcr',X,y,'Predfun',predfun)` returns `mcr`, a scalar containing a 10-fold cross-validation estimate of misclassification rate (the proportion of misclassified samples) for the function `predfun`. The matrix `X` contains predictor values and the vector `y` contains class labels. `predfun` should use `XTRAIN` and `ytrain` to fit a classification model and return `yfit` as the predicted class labels for `XTEST`. `crossval` computes the number of misclassifications between `yfit` and the corresponding response test set, and returns the overall misclassification rate across all test sets.

`val = crossval(criterion,X1,X2,...,y,'Predfun',predfun)`, where `criterion` is `'mse'` or `'mcr'`, returns a cross-validation estimate of mean-squared error (for a regression model) or misclassification rate (for a classification model) with predictor values in `X1`, `X2`, ... and, respectively, response values or class labels in `y`. `X1`, `X2`, ... and `y` must have the same number of rows. `predfun` is a function handle called with the training subsets of `X1`, `X2`, ..., the training subset of `y`, and the test subsets of `X1`, `X2`, ..., as follows:

```yfit=predfun(X1TRAIN,X2TRAIN,...,ytrain,X1TEST,X2TEST,...) ```

`yfit` should be a column vector containing the fitted values.
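
The multi-predictor form might look like the sketch below. Splitting the `fisheriris` measurements into `X1` and `X2` is an arbitrary choice made only to show the argument order; the least-squares fit is likewise an assumption.

```matlab
load fisheriris
y  = meas(:,1);          % response
X1 = meas(:,2);          % first predictor array (one column)
X2 = meas(:,3:4);        % second predictor array (two columns)

% Argument order: training predictors, training response, test predictors.
predfun = @(X1TRAIN,X2TRAIN,ytrain,X1TEST,X2TEST) ...
    [ones(size(X1TEST,1),1) X1TEST X2TEST] * ...
    ([ones(size(X1TRAIN,1),1) X1TRAIN X2TRAIN] \ ytrain);

val = crossval('mse',X1,X2,y,'Predfun',predfun);   % scalar estimate
```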

`vals = crossval(...,'name',value)` specifies one or more optional parameter name/value pairs from the following table. Specify `name` inside single quotes.

| Name | Value |
| --- | --- |
| `holdout` | A scalar specifying the ratio or the number of observations `p` for holdout cross-validation. When `0` < `p` < `1`, approximately `p*n` observations for the test set are randomly selected. When `p` is an integer, `p` observations for the test set are randomly selected. |
| `kfold` | A positive integer that is greater than 1 specifying the number of folds `k` for `k`-fold cross-validation. |
| `leaveout` | Specifies leave-one-out cross-validation. The value must be `1`. |
| `mcreps` | A positive integer specifying the number of Monte-Carlo repetitions for validation. If the first input of `crossval` is `'mse'` or `'mcr'`, `crossval` returns the mean of mean-squared error or misclassification rate across all of the Monte-Carlo repetitions. Otherwise, `crossval` concatenates the values `vals` from all of the Monte-Carlo repetitions along the first dimension. |
| `partition` | An object `c` of the `cvpartition` class, specifying the cross-validation type and partition. |
| `stratify` | A column vector `group` specifying groups for stratification. Both training and test sets have roughly the same class proportions as in `group`. `NaN`s, empty character vectors, empty strings, `<missing>` values, and `<undefined>` values in `group` are treated as missing data values, and the corresponding rows of the data are ignored. |
| `options` | A structure that specifies whether to run in parallel, and specifies the random stream or streams. Create the `options` structure with `statset`. Option fields:<br>• `UseParallel`: Set to `true` to compute in parallel. Default is `false`. You need Parallel Computing Toolbox™ for parallel computation.<br>• `UseSubstreams`: Set to `true` to compute in parallel in a reproducible fashion. Default is `false`. To compute reproducibly, set `Streams` to a type allowing substreams: `'mlfg6331_64'` or `'mrg32k3a'`.<br>• `Streams`: A `RandStream` object or cell array consisting of one such object. If you do not specify `Streams`, `crossval` uses the default stream. |
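
As a hedged sketch of how the `options` structure could be assembled with `statset`, the following runs serially so that it does not assume Parallel Computing Toolbox is installed. The data setup and the handle name `regf` are illustrative assumptions.

```matlab
load fisheriris
y = meas(:,1);
X = [ones(size(y)) meas(:,2:4)];
regf = @(XTRAIN,ytrain,XTEST) XTEST*(XTRAIN\ytrain);   % least-squares predictor

s = RandStream('mlfg6331_64');        % a stream type that supports substreams
opts = statset('UseSubstreams',true,'Streams',s);
% If Parallel Computing Toolbox is available, 'UseParallel',true can be
% added to the statset call above.

cvMse = crossval('mse',X,y,'Predfun',regf,'Options',opts);
```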

Only one of `kfold`, `holdout`, `leaveout`, or `partition` can be specified, and `partition` cannot be specified with `stratify`. If both `partition` and `mcreps` are specified, the first Monte-Carlo repetition uses the partition information in the `cvpartition` object, and the `repartition` method is called to generate new partitions for each of the remaining repetitions. If no cross-validation type is specified, the default is 10-fold cross-validation.
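
The cross-validation types above can be compared side by side in a short sketch. The regression setup is an assumption for illustration; only one of `kfold`, `holdout`, or `leaveout` is passed per call, as required.

```matlab
load fisheriris
y = meas(:,1);
X = [ones(size(y)) meas(:,2:4)];
regf = @(XTRAIN,ytrain,XTEST) XTEST*(XTRAIN\ytrain);   % least-squares predictor

rng(0);                                                 % repeatable partitions
mse5    = crossval('mse',X,y,'Predfun',regf,'kfold',5);       % 5-fold
mseHold = crossval('mse',X,y,'Predfun',regf,'holdout',0.3);   % ~30% held out
mseLoo  = crossval('mse',X,y,'Predfun',regf,'leaveout',1);    % leave-one-out
% 5-fold repeated 20 times; the result is the mean over the repetitions.
mseRep  = crossval('mse',X,y,'Predfun',regf,'kfold',5,'mcreps',20);
```

Leave-one-out fits one model per observation, so it is the slowest of the four calls on larger data sets.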

### Note

When using cross-validation with classification algorithms, stratification is preferred. Otherwise, some test sets may not include observations from all classes.
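
One way to request stratification directly, sketched here under the assumption that the class labels themselves serve as the grouping variable, is to pass them through the `stratify` parameter:

```matlab
load fisheriris
classf = @(XTRAIN,ytrain,XTEST) classify(XTEST,XTRAIN,ytrain);   % linear discriminant

rng(0);                                    % repeatable folds
% Stratify on the class labels so that each fold keeps roughly the
% same class proportions as the full data set.
cvMCR = crossval('mcr',meas,species,'Predfun',classf,'stratify',species);
```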

## Examples

### Example 1

Compute mean-squared error for regression using 10-fold cross-validation:

```
load('fisheriris');
y = meas(:,1);
X = [ones(size(y,1),1),meas(:,2:4)];
regf=@(XTRAIN,ytrain,XTEST)(XTEST*regress(ytrain,XTRAIN));
cvMse = crossval('mse',X,y,'predfun',regf)

cvMse =
    0.1015
```

### Example 2

Compute misclassification rate using stratified 10-fold cross-validation:

```
load('fisheriris');
y = species;
X = meas;
cp = cvpartition(y,'k',10); % Stratified cross-validation
classf = @(XTRAIN, ytrain,XTEST)(classify(XTEST,XTRAIN,...
ytrain));
cvMCR = crossval('mcr',X,y,'predfun',classf,'partition',cp)

cvMCR =
    0.0200
```

### Example 3

Compute the confusion matrix using stratified 10-fold cross-validation:

```
load('fisheriris');
y = species;
X = meas;
order = unique(y); % Order of the group labels
cp = cvpartition(y,'k',10); % Stratified cross-validation
f = @(xtr,ytr,xte,yte)confusionmat(yte,...
classify(xte,xtr,ytr),'order',order);
cfMat = crossval(f,X,y,'partition',cp);
cfMat = reshape(sum(cfMat),3,3)

cfMat =
    50     0     0
     0    48     2
     0     1    49
```

`cfMat` is the summation of 10 confusion matrices from 10 test sets.
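
As a small follow-on sketch, the overall misclassification rate can be recovered from a summed confusion matrix as one minus the fraction of counts on the diagonal. The matrix below is copied from the output shown in this example; the variable name `mcrFromCf` is hypothetical.

```matlab
cfMat = [50  0  0;
          0 48  2;
          0  1 49];                      % summed confusion matrix from above

% Correct predictions lie on the diagonal; everything else is an error.
mcrFromCf = 1 - trace(cfMat)/sum(cfMat(:));   % 1 - 147/150 = 0.02
```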

## References

Hastie, T., R. Tibshirani, and J. Friedman. *The Elements of Statistical Learning*. New York: Springer, 2001.
