# resubPredict

Class: ClassificationTree

Predict resubstitution response of tree

## Syntax

```
label = resubPredict(tree)
[label,posterior] = resubPredict(tree)
[label,posterior,node] = resubPredict(tree)
[label,posterior,node,cnum] = resubPredict(tree)
[label,...] = resubPredict(tree,Name,Value)
```

## Description

`label = resubPredict(tree)` returns the labels `tree` predicts for the data `tree.X`. `label` is the predictions of `tree` on the data that `fitctree` used to create `tree`.

`[label,posterior] = resubPredict(tree)` returns the posterior class probabilities for the predictions.

`[label,posterior,node] = resubPredict(tree)` returns the node numbers of `tree` for the resubstituted data.

`[label,posterior,node,cnum] = resubPredict(tree)` returns the predicted class numbers for the predictions.

`[label,...] = resubPredict(tree,Name,Value)` returns resubstitution predictions with additional options specified by one or more `Name,Value` pair arguments.
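The calling forms above can be sketched as follows; this is an illustrative example (not from the original page) using the Fisher iris data that ships with the toolbox:

```
load fisheriris
tree = fitctree(meas,species);

% All four outputs for the training data:
[label,posterior,node,cnum] = resubPredict(tree);

% Name-value form: request predictions for pruning levels 0 and 1.
% label2 then has one column of labels per requested subtree.
[label2,posterior2] = resubPredict(tree,'Subtrees',[0 1]);
```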

## Input Arguments

 `tree` A classification tree constructed by `fitctree`.

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside single quotes (`' '`). You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

 `'Subtrees'` A vector of nonnegative integers in ascending order, or `'all'`. If you specify a vector, then all elements must be at least `0` and at most `max(tree.PruneList)`. `0` indicates the full, unpruned tree, and `max(tree.PruneList)` indicates a completely pruned tree (i.e., just the root node).

If you specify `'all'`, then `ClassificationTree.resubPredict` operates on all subtrees (i.e., the entire pruning sequence). This specification is equivalent to using `0:max(tree.PruneList)`.

`ClassificationTree.resubPredict` prunes `tree` to each level indicated in `Subtrees`, and then estimates the corresponding output arguments. The size of `Subtrees` determines the size of some output arguments.

To invoke `Subtrees`, the properties `PruneList` and `PruneAlpha` of `tree` must be nonempty. In other words, grow `tree` by setting `'Prune','on'`, or prune `tree` using `prune`.

Default: `0`
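As a sketch of the two equivalent `'Subtrees'` specifications described above (an illustrative example, assuming the Fisher iris data set):

```
load fisheriris
tree = fitctree(meas,species,'Prune','on'); % populates PruneList and PruneAlpha
maxLevel = max(tree.PruneList);             % deepest pruning level (root only)

% These two calls cover the same pruning sequence:
labelsAll = resubPredict(tree,'Subtrees','all');
labelsSeq = resubPredict(tree,'Subtrees',0:maxLevel);
isequal(labelsAll,labelsSeq)
```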

## Output Arguments

 `label` The response `tree` predicts for the training data. `label` is the same data type as the training response data `tree.Y`. If the `Subtrees` name-value argument contains `m > 1` entries, `label` has `m` columns, each of which represents the predictions of the corresponding subtree. Otherwise, `label` is a vector.

 `posterior` Matrix or array of posterior probabilities for the classes `tree` predicts. If the `Subtrees` name-value argument is a scalar or is missing, `posterior` is an `n`-by-`k` matrix, where `n` is the number of rows in the training data `tree.X`, and `k` is the number of classes. If `Subtrees` contains `m > 1` entries, `posterior` is an `n`-by-`k`-by-`m` array, where the matrix for each `m` gives posterior probabilities for the corresponding subtree.

 `node` The node numbers of `tree` where each data row resolves. If the `Subtrees` name-value argument is a scalar or is missing, `node` is a numeric column vector with `n` rows, the same number of rows as `tree.X`. If `Subtrees` contains `m > 1` entries, `node` is an `n`-by-`m` matrix. Each column represents the node predictions of the corresponding subtree.

 `cnum` The class numbers that `tree` predicts for the resubstituted data. If the `Subtrees` name-value argument is a scalar or is missing, `cnum` is a numeric column vector with `n` rows, the same number of rows as `tree.X`. If `Subtrees` contains `m > 1` entries, `cnum` is an `n`-by-`m` matrix. Each column represents the class predictions of the corresponding subtree.

## Definitions

### Posterior Probability

The posterior probability of the classification at a node is the number of training sequences that lead to that node with this classification, divided by the number of training sequences that lead to that node.

For example, consider classifying a predictor `X` as `true` when `X < 0.15` or `X > 0.95`, and as `false` otherwise.

1. Generate 100 random points and classify them:

```
rng(0) % For reproducibility
X = rand(100,1);
Y = (abs(X - .55) > .4);
tree = fitctree(X,Y);
view(tree,'Mode','graph')
```

2. Prune the tree:

```
tree1 = prune(tree,'Level',1);
view(tree1,'Mode','graph')
```

The pruned tree correctly classifies observations that are less than 0.15 as `true`. It also correctly classifies observations between .15 and .94 as `false`. However, it incorrectly classifies observations that are greater than .94 as `false`. Therefore, the score for observations that are greater than .15 should be about .05/.85 = .06 for `true`, and about .8/.85 = .94 for `false`.

3. Compute the prediction scores for the first 10 rows of `X`:

```
[~,score] = predict(tree1,X(1:10));
[score X(1:10,:)]
```
```
ans =

    0.9059    0.0941    0.8147
    0.9059    0.0941    0.9058
         0    1.0000    0.1270
    0.9059    0.0941    0.9134
    0.9059    0.0941    0.6324
         0    1.0000    0.0975
    0.9059    0.0941    0.2785
    0.9059    0.0941    0.5469
    0.9059    0.0941    0.9575
    0.9059    0.0941    0.9649
```

Indeed, every value of `X` (the rightmost column) that is less than 0.15 has associated scores (the left and center columns) of `0` and `1`, while the other values of `X` have associated scores of about `0.91` and `0.09`.
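Resubstitution predictions are simply predictions on the stored training data. Assuming the `tree` grown in step 1 above, the following sketch shows the equivalence:

```
% resubPredict operates on the data stored in the model object.
[labelResub,postResub] = resubPredict(tree);
[labelPred,postPred]   = predict(tree,tree.X);
isequal(labelResub,labelPred) % identical labels
isequal(postResub,postPred)   % identical posterior probabilities
```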

## Examples


### Compute Number of Misclassified Observations

Find the total number of misclassifications of the Fisher iris data for a classification tree.

```
load fisheriris
tree = fitctree(meas,species);
Ypredict = resubPredict(tree);    % The predictions
Ysame = strcmp(Ypredict,species); % True when ==
sum(~Ysame)                       % How many are different?
```
```ans = 3 ```

### Compare In-Sample Posterior Probabilities for Each Subtree

Load Fisher's iris data set.

```load fisheriris ```

Grow a classification tree using all of the petal measurements.

```
Mdl = fitctree(meas(:,3:4),species);
n = size(meas,1);          % Sample size
K = numel(Mdl.ClassNames); % Number of classes
```

View the classification tree.

```view(Mdl,'Mode','graph'); ```

The classification tree has four pruning levels. Level 0 is the full, unpruned tree (as displayed). Level 4 is just the root node (i.e., no splits).

Estimate the posterior probabilities for each class using the subtrees pruned to levels 1 and 3.

```
[~,Posterior] = resubPredict(Mdl,'Subtrees',[1 3]);
```

`Posterior` is an `n`-by-`K`-by-`2` array of posterior probabilities. Rows of `Posterior` correspond to observations, columns correspond to the classes in the order of `Mdl.ClassNames`, and pages correspond to pruning levels.

Display the class posterior probabilities for iris 125 using each subtree.

```
Posterior(125,:,:)
```
```
ans(:,:,1) =

         0    0.0217    0.9783


ans(:,:,2) =

         0    0.5000    0.5000
```

The decision stump (page 2 of `Posterior`) has trouble predicting whether iris 125 is versicolor or virginica.