# kfoldLoss

Classification loss for observations not used in training

## Description

`L = kfoldLoss(CVMdl)` returns the cross-validated classification losses obtained by the cross-validated, binary, linear classification model `CVMdl`. That is, for every fold, `kfoldLoss` estimates the classification loss for observations that it holds out when it trains using all other observations.

`L` contains a classification loss for each regularization strength in the linear classification models that compose `CVMdl`.

`L = kfoldLoss(CVMdl,Name,Value)` uses additional options specified by one or more `Name,Value` pair arguments. For example, indicate which folds to use for the loss calculation or specify the classification-loss function.

## Input Arguments

`CVMdl`

— Cross-validated, binary, linear classification model

`ClassificationPartitionedLinear` model object

Cross-validated, binary, linear classification model, specified as a `ClassificationPartitionedLinear` model object. You can create a `ClassificationPartitionedLinear` model using `fitclinear` and specifying any one of the cross-validation name-value pair arguments, for example, `CrossVal`.

To obtain estimates, `kfoldLoss` applies the same data used to cross-validate the linear classification model (`X` and `Y`).

### Name-Value Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

`Folds`

— Fold indices to use for classification-score prediction

`1:CVMdl.KFold` (default) | numeric vector of positive integers

Fold indices to use for classification-score prediction, specified as the comma-separated pair consisting of `'Folds'` and a numeric vector of positive integers. The elements of `Folds` must range from `1` through `CVMdl.KFold`.

**Example:** `'Folds',[1 4 10]`

**Data Types:** `single` | `double`
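As a rough illustration of the `'Folds'` argument, the sketch below restricts the loss calculation to two of five folds. The synthetic data (`X`, `Y`) and the variable names `Lall` and `Lsubset` are assumptions made for this sketch and are not part of the NLP example used elsewhere on this page.

```matlab
% Synthetic two-class data (assumed for illustration only)
rng(0);                                        % for reproducibility
X = randn(300,4);                              % 300 observations, 4 predictors
Y = X(:,1) - X(:,2) + 0.3*randn(300,1) > 0;    % logical class labels

CVMdl = fitclinear(X,Y,'KFold',5);             % 5-fold cross-validated model

Lall    = kfoldLoss(CVMdl);                    % default: folds 1:CVMdl.KFold
Lsubset = kfoldLoss(CVMdl,'Folds',[1 3]);      % only folds 1 and 3

fprintf('All folds: %.4f, folds 1 and 3: %.4f\n',Lall,Lsubset);
```

Because the model uses a single regularization strength, both results are scalars. The two values generally differ, since the second one ignores three of the held-out sets.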

`LossFun`

— Loss function

`'classiferror'` (default) | `'binodeviance'` | `'exponential'` | `'hinge'` | `'logit'` | `'mincost'` | `'quadratic'` | function handle

Loss function, specified as the comma-separated pair consisting of `'LossFun'` and a built-in loss-function name or function handle.

The following table lists the available loss functions. Specify one using its corresponding character vector or string scalar.

Value | Description |
---|---|
`'binodeviance'` | Binomial deviance |
`'classiferror'` | Misclassified rate in decimal |
`'exponential'` | Exponential loss |
`'hinge'` | Hinge loss |
`'logit'` | Logistic loss |
`'mincost'` | Minimal expected misclassification cost (for classification scores that are posterior probabilities) |
`'quadratic'` | Quadratic loss |

`'mincost'` is appropriate for classification scores that are posterior probabilities. For linear classification models, logistic regression learners return posterior probabilities as classification scores by default, but SVM learners do not (see `predict`).

Specify your own function using function handle notation.

Let `n` be the number of observations in `X` and `K` be the number of distinct classes (`numel(Mdl.ClassNames)`, where `Mdl` is the input model). Your function must have this signature:

`lossvalue = lossfun(C,S,W,Cost)`

where:

- The output argument `lossvalue` is a scalar.
- You choose the function name (`lossfun`).
- `C` is an `n`-by-`K` logical matrix with rows indicating the class to which the corresponding observation belongs. The column order corresponds to the class order in `Mdl.ClassNames`. Construct `C` by setting `C(p,q) = 1` if observation `p` is in class `q`, for each row. Set all other elements of row `p` to `0`.
- `S` is an `n`-by-`K` numeric matrix of classification scores, similar to the output of `predict`. The column order corresponds to the class order in `Mdl.ClassNames`.
- `W` is an `n`-by-1 numeric vector of observation weights. If you pass `W`, the software normalizes them to sum to `1`.
- `Cost` is a `K`-by-`K` numeric matrix of misclassification costs. For example, `Cost = ones(K) - eye(K)` specifies a cost of `0` for correct classification and `1` for misclassification.

Specify your function using `'LossFun',@lossfun`.

**Data Types:** `char` | `string` | `function_handle`
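To make the `(C,S,W,Cost)` layout concrete, the following sketch builds small toy inputs by hand and evaluates a custom loss on them. The toy values and the name `weightedError` are hypothetical; the handle computes a weighted misclassification rate and is only one possible function that satisfies the required signature. The comparison `S < max(S,[],2)` relies on implicit expansion, so this sketch assumes R2016b or later.

```matlab
% Hypothetical toy inputs in the layout described above
labels = [1; 2; 2; 1];                   % true class index of each observation
K = 2;                                   % number of classes
n = numel(labels);

C = false(n,K);                          % n-by-K logical class-membership matrix
C(sub2ind([n K],(1:n)',labels)) = true;  % C(p,q) = 1 if observation p is in class q

S = [ 0.8 -0.8;                          % n-by-K classification scores
     -0.3  0.3;
      0.5 -0.5;                          % observation 3 (class 2) is misclassified
      1.2 -1.2];
W = ones(n,1)/n;                         % observation weights summing to 1
Cost = ones(K) - eye(K);                 % 0 for correct, 1 for incorrect

% An observation is misclassified when the score of its true class is
% strictly below the largest score in its row. Cost is unused, hence "~".
weightedError = @(C,S,W,~) sum(W .* any(C & (S < max(S,[],2)),2)) / sum(W);

lossvalue = weightedError(C,S,W,Cost);
fprintf('Weighted error: %.2f\n',lossvalue);
```

A handle written this way could then be passed as `kfoldLoss(CVMdl,'LossFun',weightedError)`, where `CVMdl` is a cross-validated model you have already trained.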

`Mode`

— Loss aggregation level

`'average'` (default) | `'individual'`

Loss aggregation level, specified as the comma-separated pair consisting of `'Mode'` and `'average'` or `'individual'`.

Value | Description |
---|---|
`'average'` | Returns losses averaged over all folds |
`'individual'` | Returns losses for each fold |

**Example:** `'Mode','individual'`

## Output Arguments

`L`

— Cross-validated classification losses

numeric scalar | numeric vector | numeric matrix

Cross-validated classification losses, returned as a numeric scalar, vector, or matrix. The interpretation of `L` depends on `LossFun`.

Let *R* be the number of regularization strengths in the cross-validated models (stored in `numel(CVMdl.Trained{1}.Lambda)`) and *F* be the number of folds (stored in `CVMdl.KFold`).

- If `Mode` is `'average'`, then `L` is a 1-by-*R* vector. `L(j)` is the average classification loss over all folds of the cross-validated model that uses regularization strength *j*.
- Otherwise, `L` is an *F*-by-*R* matrix. `L(i,j)` is the classification loss for fold *i* of the cross-validated model that uses regularization strength *j*.

To estimate `L`, `kfoldLoss` uses the data that created `CVMdl` (see `X` and `Y`).
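The dimensions described above can be checked on a small model. This is a minimal sketch that assumes synthetic data and three arbitrary regularization strengths; the names `Lavg` and `Lind` are introduced here for illustration only.

```matlab
% Synthetic two-class data (assumed for illustration only)
rng(0);                                   % for reproducibility
X = randn(200,5);                         % 200 observations, 5 predictors
Y = X(:,1) + 0.5*randn(200,1) > 0;        % logical class labels

Lambda = [1e-4 1e-3 1e-2];                % R = 3 regularization strengths
CVMdl = fitclinear(X,Y,'KFold',4,'Lambda',Lambda);   % F = 4 folds

Lavg = kfoldLoss(CVMdl);                        % 'Mode','average' is the default
Lind = kfoldLoss(CVMdl,'Mode','individual');    % one row per fold

disp(size(Lavg))                          % 1-by-R
disp(size(Lind))                          % F-by-R
```

Each column of either result corresponds to one entry of `Lambda`, in the same order.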

## Examples

### Estimate *k*-Fold Cross-Validation Classification Error

Load the NLP data set.

    load nlpdata

`X` is a sparse matrix of predictor data, and `Y` is a categorical vector of class labels. There are more than two classes in the data.

The models should identify whether the word counts in a web page are from the Statistics and Machine Learning Toolbox™ documentation. So, identify the labels that correspond to the Statistics and Machine Learning Toolbox™ documentation web pages.

    Ystats = Y == 'stats';

Cross-validate a binary, linear classification model that can identify whether the word counts in a documentation web page are from the Statistics and Machine Learning Toolbox™ documentation.

    rng(1); % For reproducibility
    CVMdl = fitclinear(X,Ystats,'CrossVal','on');

`CVMdl` is a `ClassificationPartitionedLinear` model. By default, the software implements 10-fold cross-validation. You can alter the number of folds using the `'KFold'` name-value pair argument.

Estimate the average of the out-of-fold classification error rates.

    ce = kfoldLoss(CVMdl)

    ce = 7.6017e-04

Alternatively, you can obtain the per-fold classification error rates by specifying the name-value pair `'Mode','individual'` in `kfoldLoss`.

### Specify Custom Classification Loss

Load the NLP data set. Preprocess the data as in Estimate k-Fold Cross-Validation Classification Error, and transpose the predictor data.

    load nlpdata
    Ystats = Y == 'stats';
    X = X';

Cross-validate a binary, linear classification model using 5-fold cross-validation. Optimize the objective function using SpaRSA. Specify that the predictor observations correspond to columns.

    rng(1); % For reproducibility
    CVMdl = fitclinear(X,Ystats,'Solver','sparsa','KFold',5,...
        'ObservationsIn','columns');
    CMdl = CVMdl.Trained{1};

`CVMdl` is a `ClassificationPartitionedLinear` model. It contains the property `Trained`, which is a 5-by-1 cell array holding the `ClassificationLinear` models that the software trained using the training set of each fold.

Create an anonymous function that measures linear loss, that is,

$$L=\frac{\sum _{j}-{w}_{j}{y}_{j}{f}_{j}}{\sum _{j}{w}_{j}}.$$

$${w}_{j}$$ is the weight for observation *j*, $${y}_{j}$$ is response *j* (-1 for the negative class, and 1 otherwise), and $${f}_{j}$$ is the raw classification score of observation *j*. Custom loss functions must be written in a particular form. For rules on writing a custom loss function, see the `LossFun` name-value pair argument. Because the function does not use classification cost, use `~` to have `kfoldLoss` ignore its position.

    linearloss = @(C,S,W,~)sum(-W.*sum(S.*C,2))/sum(W);
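The handle never sees the ±1 labels directly; it recovers the product of label and score by masking the score matrix with `C`. The sketch below checks this on hand-made values. It assumes that the negative-class score of a linear model is the negative of the positive-class score, and every number in it is hypothetical.

```matlab
linearloss = @(C,S,W,~) sum(-W.*sum(S.*C,2))/sum(W);

% Hypothetical binary example: column 1 = negative class, column 2 = positive class
f = [2.0; -1.5; 0.5];            % positive-class raw scores f_j
S = [-f f];                      % assumed layout: negative-class score is -f_j
y = [1; -1; -1];                 % labels coded as -1 (negative) or 1 (positive)
C = [y == -1, y == 1];           % logical class-membership matrix
W = [0.5; 0.25; 0.25];           % observation weights

viaHandle  = linearloss(C,S,W,[]);        % what kfoldLoss would evaluate
viaFormula = sum(-W.*y.*f)/sum(W);        % the equation above, term by term

fprintf('Handle: %.4f, formula: %.4f\n',viaHandle,viaFormula);
```

Both expressions give the same value because `sum(S.*C,2)` selects, for each row, the score of the observed class, which equals the label times the positive-class score under the stated assumption.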

Estimate the average cross-validated classification loss using the linear loss function. Also, obtain the loss for each fold.

    ce = kfoldLoss(CVMdl,'LossFun',linearloss)

    ce = -8.0982

    ceFold = kfoldLoss(CVMdl,'LossFun',linearloss,'Mode','individual')

    ceFold = 5×1

       -8.3165
       -8.7633
       -7.4342
       -8.0423
       -7.9347

### Find Good Lasso Penalty Using *k*-fold Classification Loss

To determine a good lasso-penalty strength for a linear classification model that uses a logistic regression learner, compare test-sample classification error rates.

Load the NLP data set. Preprocess the data as in Specify Custom Classification Loss.

    load nlpdata
    Ystats = Y == 'stats';
    X = X';

Create a set of 11 logarithmically spaced regularization strengths from $$1{0}^{-6}$$ through $$1{0}^{-0.5}$$.

    Lambda = logspace(-6,-0.5,11);
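As a quick check of the grid, `logspace(a,b,n)` returns `n` points between `10^a` and `10^b` that are equally spaced in the exponent. The variable name `exponents` below is introduced only for this sketch.

```matlab
Lambda = logspace(-6,-0.5,11);     % same call as in the example
exponents = log10(Lambda);         % recover the base-10 exponents

fprintf('Number of strengths: %d\n',numel(Lambda));
fprintf('Smallest exponent: %.2f, largest exponent: %.2f\n', ...
    exponents(1),exponents(end));
fprintf('Spacing between exponents: %.2f\n',exponents(2) - exponents(1));
```

The spacing of 0.55 in the exponent matches the second value shown later in the `Lambda` property of the trained model (about `3.5481e-06`).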

Cross-validate binary, linear classification models that use each of the regularization strengths, using 5-fold cross-validation. Optimize the objective function using SpaRSA. Lower the tolerance on the gradient of the objective function to `1e-8`.

    rng(10); % For reproducibility
    CVMdl = fitclinear(X,Ystats,'ObservationsIn','columns',...
        'KFold',5,'Learner','logistic','Solver','sparsa',...
        'Regularization','lasso','Lambda',Lambda,'GradientTolerance',1e-8)

    CVMdl = 
      ClassificationPartitionedLinear
        CrossValidatedModel: 'Linear'
               ResponseName: 'Y'
            NumObservations: 31572
                      KFold: 5
                  Partition: [1x1 cvpartition]
                 ClassNames: [0 1]
             ScoreTransform: 'none'

      Properties, Methods

Extract a trained linear classification model.

    Mdl1 = CVMdl.Trained{1}

    Mdl1 = 
      ClassificationLinear
          ResponseName: 'Y'
            ClassNames: [0 1]
        ScoreTransform: 'logit'
                  Beta: [34023x11 double]
                  Bias: [-13.3808 -13.3808 -13.3808 -13.3808 -13.3808 ... ]
                Lambda: [1.0000e-06 3.5481e-06 1.2589e-05 4.4668e-05 ... ]
               Learner: 'logistic'

      Properties, Methods

`Mdl1` is a `ClassificationLinear` model object. Because `Lambda` is a sequence of regularization strengths, you can think of `Mdl1` as 11 models, one for each regularization strength in `Lambda`.

Estimate the cross-validated classification error.

    ce = kfoldLoss(CVMdl);

Because there are 11 regularization strengths, `ce` is a 1-by-11 vector of classification error rates.

Higher values of `Lambda` lead to predictor variable sparsity, which is a good quality of a classifier. For each regularization strength, train a linear classification model using the entire data set and the same options as when you cross-validated the models. Determine the number of nonzero coefficients per model.

    Mdl = fitclinear(X,Ystats,'ObservationsIn','columns',...
        'Learner','logistic','Solver','sparsa','Regularization','lasso',...
        'Lambda',Lambda,'GradientTolerance',1e-8);
    numNZCoeff = sum(Mdl.Beta~=0);

In the same figure, plot the cross-validated classification error rates and frequency of nonzero coefficients for each regularization strength. Plot all variables on the log scale.

    figure;
    [h,hL1,hL2] = plotyy(log10(Lambda),log10(ce),...
        log10(Lambda),log10(numNZCoeff));
    hL1.Marker = 'o';
    hL2.Marker = 'o';
    ylabel(h(1),'log_{10} classification error')
    ylabel(h(2),'log_{10} nonzero-coefficient frequency')
    xlabel('log_{10} Lambda')
    title('Test-Sample Statistics')
    hold off

Choose the index of the regularization strength that balances predictor variable sparsity and low classification error. In this case, a value between $$1{0}^{-4}$$ and $$1{0}^{-1}$$ should suffice.

    idxFinal = 7;

Select the model from `Mdl` with the chosen regularization strength.

    MdlFinal = selectModels(Mdl,idxFinal);

`MdlFinal` is a `ClassificationLinear` model containing one regularization strength. To estimate labels for new observations, pass `MdlFinal` and the new data to `predict`.

## More About

### Classification Loss

*Classification loss* functions measure the predictive
inaccuracy of classification models. When you compare the same type of loss among many
models, a lower loss indicates a better predictive model.

Consider the following scenario.

- *L* is the weighted average classification loss.
- *n* is the sample size.
- For binary classification:
  - $${y}_{j}$$ is the observed class label. The software codes it as -1 or 1, indicating the negative or positive class (or the first or second class in the `ClassNames` property), respectively.
  - $$f({X}_{j})$$ is the positive-class classification score for observation (row) *j* of the predictor data *X*.
  - $${m}_{j}={y}_{j}f({X}_{j})$$ is the classification score for classifying observation *j* into the class corresponding to $${y}_{j}$$. Positive values of $${m}_{j}$$ indicate correct classification and do not contribute much to the average loss. Negative values of $${m}_{j}$$ indicate incorrect classification and contribute significantly to the average loss.
- For algorithms that support multiclass classification (that is, *K* ≥ 3):
  - $${y}_{j}^{*}$$ is a vector of *K* - 1 zeros, with 1 in the position corresponding to the true, observed class $${y}_{j}$$. For example, if the true class of the second observation is the third class and *K* = 4, then $${y}_{2}^{*}={[0\ 0\ 1\ 0]}^{\prime}$$. The order of the classes corresponds to the order in the `ClassNames` property of the input model.
  - $$f({X}_{j})$$ is the length *K* vector of class scores for observation *j* of the predictor data *X*. The order of the scores corresponds to the order of the classes in the `ClassNames` property of the input model.
  - $${m}_{j}={y}_{j}^{*\prime}f({X}_{j})$$. Therefore, $${m}_{j}$$ is the scalar classification score that the model predicts for the true, observed class.
- The weight for observation *j* is $${w}_{j}$$. The software normalizes the observation weights so that they sum to the corresponding prior class probability. The software also normalizes the prior probabilities so they sum to 1. Therefore, $$\sum _{j=1}^{n}{w}_{j}=1.$$

Given this scenario, the following table describes the supported loss functions that you can specify by using the `'LossFun'` name-value pair argument.

Loss Function | Value of `LossFun` | Equation |
---|---|---|
Binomial deviance | `'binodeviance'` | $$L=\sum _{j=1}^{n}{w}_{j}\log \left\{1+\exp \left[-2{m}_{j}\right]\right\}.$$ |
Misclassified rate in decimal | `'classiferror'` | $$L=\sum _{j=1}^{n}{w}_{j}I\left\{{\widehat{y}}_{j}\ne {y}_{j}\right\}.$$ $${\widehat{y}}_{j}$$ is the class label corresponding to the class with the maximal score. |
Cross-entropy loss | `'crossentropy'` | The weighted cross-entropy loss is $$L=-\sum _{j=1}^{n}\frac{{\tilde{w}}_{j}\log ({m}_{j})}{Kn},$$ where the weights $${\tilde{w}}_{j}$$ are normalized to sum to *n* instead of 1. |
Exponential loss | `'exponential'` | $$L=\sum _{j=1}^{n}{w}_{j}\exp \left(-{m}_{j}\right).$$ |
Hinge loss | `'hinge'` | $$L=\sum _{j=1}^{n}{w}_{j}\max \left\{0,1-{m}_{j}\right\}.$$ |
Logit loss | `'logit'` | $$L=\sum _{j=1}^{n}{w}_{j}\log \left(1+\exp \left(-{m}_{j}\right)\right).$$ |
Minimal expected misclassification cost | `'mincost'` | $$L=\sum _{j=1}^{n}{w}_{j}{c}_{j},$$ where $${c}_{j}$$ is the cost incurred by the prediction for observation *j*, obtained with the procedure that follows this table. |
Quadratic loss | `'quadratic'` | $$L=\sum _{j=1}^{n}{w}_{j}{\left(1-{m}_{j}\right)}^{2}.$$ |

For `'mincost'`, the software computes the weighted minimal expected classification cost using this procedure for each observation *j*:

1. Estimate the expected misclassification cost of classifying the observation $${X}_{j}$$ into the class *k*: $${\gamma}_{jk}={\left(f{\left({X}_{j}\right)}^{\prime}C\right)}_{k}.$$ $$f({X}_{j})$$ is the column vector of class posterior probabilities for binary and multiclass classification for the observation $${X}_{j}$$. *C* is the cost matrix stored in the `Cost` property of the model.
2. For observation *j*, predict the class label corresponding to the minimal expected misclassification cost: $${\widehat{y}}_{j}=\underset{k=1,\mathrm{...},K}{\text{argmin}}{\gamma}_{jk}.$$
3. Using *C*, identify the cost incurred ($${c}_{j}$$) for making the prediction.

The weighted average of the minimal expected misclassification cost loss is $$L=\sum _{j=1}^{n}{w}_{j}{c}_{j}.$$ If you use the default cost matrix (whose element value is 0 for correct classification and 1 for incorrect classification), then the `'mincost'` loss is equivalent to the `'classiferror'` loss.
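The `'mincost'` computation can be traced by hand on a few observations. The following is a minimal sketch under assumed inputs: the posterior probabilities `P`, true class indices `ytrue`, weights `w`, and cost matrix `Cost` are hypothetical values chosen for illustration, and the code mirrors the description in this section rather than the software's internal implementation.

```matlab
% Hypothetical posterior probabilities for n = 3 observations and K = 2 classes
P = [0.9 0.1;
     0.4 0.6;
     0.7 0.3];
ytrue = [1; 2; 2];                 % true class index of each observation
w = ones(3,1)/3;                   % observation weights summing to 1
Cost = [0 1; 1 0];                 % Cost(i,k): cost of predicting k when truth is i

gamma = P*Cost;                    % gamma(j,k): expected cost of predicting class k
[~,yhat] = min(gamma,[],2);        % class with the minimal expected cost
c = Cost(sub2ind(size(Cost),ytrue,yhat));   % cost actually incurred, c_j
L = sum(w.*c);                     % weighted average cost

disp(yhat')
fprintf('mincost loss: %.4f\n',L);
```

Only the third observation is assigned to the wrong class, so with this 0-1 cost matrix the loss equals that observation's weight, which is also the weighted misclassification rate.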

This figure compares the loss functions (except `'crossentropy'` and `'mincost'`) over the score *m* for one observation. Some functions are normalized to pass through the point (0,1).
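The margin-based losses can be evaluated directly from a vector of scores $${m}_{j}$$ and weights $${w}_{j}$$. The sketch below uses hypothetical values for `m` and `w` and applies the equations as written in the table, without any of the normalization mentioned for the figure.

```matlab
m = [2; 0.5; -1];              % hypothetical margins m_j (last one is misclassified)
w = [0.5; 0.25; 0.25];         % hypothetical weights summing to 1

losses = struct( ...
    'binodeviance', sum(w.*log(1 + exp(-2*m))), ...
    'exponential',  sum(w.*exp(-m)), ...
    'hinge',        sum(w.*max(0,1 - m)), ...
    'logit',        sum(w.*log(1 + exp(-m))), ...
    'quadratic',    sum(w.*(1 - m).^2));

disp(losses)
```

The negative margin dominates every total, which reflects the statement above that incorrect classifications contribute significantly to the average loss.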

## See Also

`ClassificationPartitionedLinear` | `ClassificationLinear` | `kfoldPredict` | `loss`

**Introduced in R2016a**
