# kfoldMargin

Classification margins for observations not used for training

## Syntax

`M = kfoldMargin(obj)`

## Description

`M = kfoldMargin(obj)` returns the classification margins obtained by the cross-validated classification model `obj`. For every fold, this method computes the margins of the observations held out in that fold, using a model trained on the observations in all the other folds.
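The per-fold procedure can be made explicit with a manual loop. This is a conceptual sketch, not the toolbox's internal implementation. It assumes the Statistics and Machine Learning Toolbox, trains a single decision tree with `fitctree` instead of an ensemble, and builds its own stratified partition with `cvpartition`, so its margins will generally differ from those of a model cross-validated with `crossval`.

```matlab
% Conceptual sketch of k-fold margins (assumes Statistics and Machine Learning Toolbox)
load ionosphere                      % X: predictors, Y: cell array of class labels
rng(1)                               % reproducible partition
c = cvpartition(Y, 'KFold', 10);     % stratified 10-fold partition
m = NaN(numel(Y), 1);                % one margin per observation

for i = 1:c.NumTestSets
    trainIdx = training(c, i);       % observations used to train this fold's model
    testIdx  = test(c, i);           % observations held out in this fold
    tree = fitctree(X(trainIdx,:), Y(trainIdx));
    % Margins for the held-out observations only, from a model that never saw them
    m(testIdx) = margin(tree, X(testIdx,:), Y(testIdx));
end
```

Every observation is held out exactly once, so every element of `m` is filled by a model that was not trained on that observation.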

## Input Arguments

`obj`: A partitioned classification model of type `ClassificationPartitionedModel` or `ClassificationPartitionedEnsemble`.

## Output Arguments

`M`: The classification margins, returned as a numeric column vector with one element per observation in the training data.

## Definitions

### Margin

The classification margin is the difference between the classification score for the true class and the maximal classification score for the false classes.

The classification margin is a column vector with the same number of rows as the training data `X`. A higher margin indicates a more reliable prediction than a lower one.
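The margin definition can be computed directly from a score matrix. This minimal sketch uses only base MATLAB; the `score` matrix and the `trueClass` indices are hypothetical values chosen for illustration.

```matlab
% Hypothetical scores: 3 observations (rows) by 3 classes (columns)
score = [0.7 0.2 0.1;
         0.3 0.5 0.2;
         0.1 0.1 0.8];
trueClass = [1; 1; 3];               % hypothetical column index of each true class

n = size(score, 1);
idx = sub2ind(size(score), (1:n)', trueClass);
trueScore = score(idx);              % score of the true class

others = score;
others(idx) = -Inf;                  % exclude the true class
maxFalse = max(others, [], 2);       % maximal score among the false classes

m = trueScore - maxFalse;            % one margin per observation
disp(m)
```

The second observation has a negative margin because a false class outscores the true class, which means that observation is misclassified.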

### Score

For discriminant analysis, the score of a classification is the posterior probability of the classification. For the definition of posterior probability in discriminant analysis, see Posterior Probability.

For ensembles, a classification score represents the confidence of a classification into a class. The higher the score, the higher the confidence.

Different ensemble algorithms have different definitions for their scores. Furthermore, the range of scores depends on ensemble type. For example:

• `AdaBoostM1` scores range from –∞ to ∞.

• `Bag` scores range from `0` to `1`.

For trees, the score of a classification of a leaf node is the posterior probability of the classification at that node. The posterior probability of the classification at a node is the number of training sequences that reach that node and have that classification, divided by the number of training sequences that reach that node.
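As a small illustration of this ratio, the sketch below computes the posterior probabilities at a single leaf from the labels of the training observations that reach it. The labels are hypothetical, and the comparison relies on implicit expansion (MATLAB R2016b or later).

```matlab
% Hypothetical labels of the five training observations that reach one leaf
leafLabels = [false; false; true; false; false];
classes = [false true];              % class order, false first

% leafLabels == classes expands to a 5-by-2 logical matrix;
% the column means are the fractions of leaf observations in each class
posterior = mean(leafLabels == classes, 1)   % 0.8000  0.2000
```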

For example, consider classifying a predictor `X` as `true` when `X` < `0.15` or `X` > `0.95`, and as `false` otherwise.

Generate 100 random points and classify them:

```matlab
rng(0,'twister') % for reproducibility
X = rand(100,1);
Y = (abs(X - .55) > .4);
tree = fitctree(X,Y);
view(tree,'Mode','Graph')
```

Prune the tree:

```matlab
tree1 = prune(tree,'Level',1);
view(tree1,'Mode','Graph')
```

The pruned tree correctly classifies observations that are less than 0.15 as `true`. It also correctly classifies observations from 0.15 to 0.95 as `false`. However, it incorrectly classifies observations that are greater than 0.95 as `false`. Because `X` is uniform on (0,1), about 85% of the observations are greater than 0.15, and about 5% are greater than 0.95. Therefore, the scores for observations that are greater than 0.15 should be about 0.05/0.85 ≈ 0.06 for `true`, and about 0.80/0.85 ≈ 0.94 for `false`.
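The expected leaf scores follow from the widths of the intervals. This sketch assumes `X` is uniform on (0,1) and that the split falls exactly at 0.15; the split point that `fitctree` actually chooses differs slightly.

```matlab
% Expected posteriors at the leaf for X > 0.15 (idealized split at 0.15)
leafWidth = 1 - 0.15;                % fraction of observations routed to the leaf
pTrue  = (1 - 0.95) / leafWidth      % about 0.0588
pFalse = (0.95 - 0.15) / leafWidth   % about 0.9412
```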

Compute the prediction scores for the first 10 rows of `X`:

```matlab
[~,score] = predict(tree1,X(1:10));
[score X(1:10,:)]
```
```
ans =

    0.9059    0.0941    0.8147
    0.9059    0.0941    0.9058
         0    1.0000    0.1270
    0.9059    0.0941    0.9134
    0.9059    0.0941    0.6324
         0    1.0000    0.0975
    0.9059    0.0941    0.2785
    0.9059    0.0941    0.5469
    0.9059    0.0941    0.9575
    0.9059    0.0941    0.9649
```

Indeed, every value of `X` (the right-most column) that is less than 0.15 has associated scores (the left and center columns, for `false` and `true` respectively) of `0` and `1`, while the other values of `X` have associated scores of `0.91` and `0.09`. The difference (a score of `0.09` instead of the expected `0.06`) is due to a statistical fluctuation: there are `8` observations in `X` in the range `(0.95,1)` instead of the expected `5`.

## Examples


### Estimate the k-fold Margins of a Classifier

Find the k-fold margins for an ensemble that classifies the `ionosphere` data.

Load the `ionosphere` data set.

```matlab
load ionosphere
```

Train a classification ensemble of decision trees.

```matlab
Mdl = fitensemble(X,Y,'AdaBoostM1',100,'Tree');
```

Cross-validate the classifier using 10-fold cross-validation.

```matlab
cvens = crossval(Mdl);
```

Compute the k-fold margins. Display summary statistics for the margins.

```matlab
m = kfoldMargin(cvens);
marginStats = table(min(m),mean(m),max(m),...
    'VariableNames',{'Min','Mean','Max'})
```
```
marginStats =

      Min       Mean      Max  
    _______    ______    ______

    -11.312    7.3236    23.517
```