# predict

Predict classification

## Syntax

```
label = predict(tree,X)
[label,score] = predict(tree,X)
[label,score,node] = predict(tree,X)
[label,score,node,cnum] = predict(tree,X)
[label,...] = predict(tree,X,Name,Value)
```

## Description

`label = predict(tree,X)` returns a vector of predicted class labels for a matrix `X`, based on `tree`, a trained full or compact classification tree.

`[label,score] = predict(tree,X)` also returns a matrix of scores, indicating the likelihood that a label comes from a particular class.

`[label,score,node] = predict(tree,X)` also returns a vector of predicted node numbers for the classification, based on `tree`.

`[label,score,node,cnum] = predict(tree,X)` also returns a vector of predicted class numbers for the classification, based on `tree`.

`[label,...] = predict(tree,X,Name,Value)` returns labels with additional options specified by one or more `Name,Value` pair arguments.

## Input Arguments

`tree`: A classification tree created by `fitctree`, or a compact classification tree created by `compact`.

`X`: A matrix where each row represents an observation, and each column represents a predictor. The number of columns in `X` must equal the number of predictors in `tree`.

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside single quotes (`' '`). You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

`'Subtrees'`: A vector of nonnegative integers in ascending order, or `'all'`.

If you specify a vector, then all elements must be at least `0` and at most `max(tree.PruneList)`. `0` indicates the full, unpruned tree and `max(tree.PruneList)` indicates the completely pruned tree (i.e., just the root node).

If you specify `'all'`, then `CompactClassificationTree.predict` operates on all subtrees (i.e., the entire pruning sequence). This specification is equivalent to using `0:max(tree.PruneList)`.

`CompactClassificationTree.predict` prunes `tree` to each level indicated in `Subtrees`, and then estimates the corresponding output arguments. The size of `Subtrees` determines the size of some output arguments.

To use `Subtrees`, the properties `PruneList` and `PruneAlpha` of `tree` must be nonempty. In other words, grow `tree` by setting `'Prune','on'`, or prune `tree` using `prune`.

Default: `0`

## Output Arguments

`label`: Vector of class labels of the same type as the response data used in training `tree`. Each entry of `label` corresponds to the class with minimal expected cost for the corresponding row of `X`. See Predicted Class Label.

If `Subtrees` has `T` elements, and `X` has `N` rows, then `label` is an `N`-by-`T` matrix. The `i`th column of `label` contains the labels produced by the `Subtrees(i)` subtree.

`score`: Numeric matrix of size `N`-by-`K`, where `N` is the number of observations (rows) in `X`, and `K` is the number of classes (in `tree.ClassNames`). `score(i,j)` is the posterior probability that row `i` of `X` is of class `j`.

If `Subtrees` has `T` elements, and `X` has `N` rows, then `score` is an `N`-by-`K`-by-`T` array, and `node` and `cnum` are `N`-by-`T` matrices.

`node`: Numeric vector of node numbers for the predicted classes. Each entry corresponds to the predicted node in `tree` for the corresponding row of `X`.

`cnum`: Numeric vector of class numbers corresponding to the predicted labels in `label`. Each entry of `cnum` corresponds to a predicted class number for the corresponding row of `X`.
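The output sizes described above can be checked with a short sketch. It assumes the Statistics and Machine Learning Toolbox and its bundled `fisheriris` data set; the variable names `N`, `K`, and `T` are chosen here only to mirror the text.

```matlab
load fisheriris                      % meas (predictors) and species (labels)
tree = fitctree(meas,species);       % 'Prune' is 'on' by default, so PruneList is nonempty

% Request all four outputs for every subtree in the pruning sequence
[label,score,node,cnum] = predict(tree,meas,'Subtrees','all');

N = size(meas,1);                    % number of observations
K = numel(tree.ClassNames);          % number of classes
T = max(tree.PruneList) + 1;         % number of subtrees in 0:max(tree.PruneList)

size(label)                          % expected to be N-by-T
size(score)                          % expected to be N-by-K-by-T
size(node)                           % expected to be N-by-T
size(cnum)                           % expected to be N-by-T
```

Calling `predict` without `'Subtrees'` uses the default of `0`, so each output then has a single column (or, for `score`, a single page).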

## Definitions

### Predicted Class Label

`predict` classifies so as to minimize the expected classification cost:

$\hat{y}=\underset{y=1,\dots,K}{\arg\min}\sum_{k=1}^{K}\hat{P}(k|x)\,C(y|k),$

where

• $\hat{y}$ is the predicted classification.

• $K$ is the number of classes.

• $\hat{P}(k|x)$ is the posterior probability of class $k$ for observation $x$.

• $C(y|k)$ is the cost of classifying an observation as $y$ when its true class is $k$.
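As a minimal numeric sketch of this rule, the following uses made-up posterior probabilities and a made-up cost matrix for a two-class problem (neither comes from a trained tree). It follows the `Cost(i,j)` convention of rows as true classes and columns as predicted classes.

```matlab
P = [0.6 0.4];             % hypothetical posteriors for classes 1 and 2
Cost = [0 1; 5 0];         % Cost(k,y): cost of predicting y when the true class is k

expectedCost = P*Cost;     % 1-by-K: expected cost of predicting each class
[~,yhat] = min(expectedCost);

expectedCost               % [2.0 0.6]
yhat                       % 2
```

Class 1 has the larger posterior, but misclassifying a true class 2 observation is expensive in this example, so the minimal expected cost is attained by predicting class 2.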

### Score (tree)

For trees, the score of a classification of a leaf node is the posterior probability of the classification at that node. The posterior probability of the classification at a node is the number of training observations that lead to that node with the classification, divided by the total number of training observations that lead to that node.

For example, consider classifying a predictor `X` as `true` when `X` < `0.15` or `X` > `0.95`, and as `false` otherwise.

Generate 100 random points and classify them:

```
rng(0,'twister') % for reproducibility
X = rand(100,1);
Y = (abs(X - .55) > .4);
tree = fitctree(X,Y);
view(tree,'Mode','Graph')
```

Prune the tree:

```
tree1 = prune(tree,'Level',1);
view(tree1,'Mode','Graph')
```

The pruned tree correctly classifies observations that are less than 0.15 as `true`. It also correctly classifies observations from 0.15 to 0.95 as `false`. However, it incorrectly classifies observations that are greater than 0.95 as `false`. Therefore, the score for observations that are greater than 0.15 should be about 0.05/0.85 = 0.06 for `true`, and about 0.8/0.85 = 0.94 for `false`.

Compute the prediction scores for the first 10 rows of `X`:

```
[~,score] = predict(tree1,X(1:10));
[score X(1:10,:)]
```
```
ans =

    0.9059    0.0941    0.8147
    0.9059    0.0941    0.9058
         0    1.0000    0.1270
    0.9059    0.0941    0.9134
    0.9059    0.0941    0.6324
         0    1.0000    0.0975
    0.9059    0.0941    0.2785
    0.9059    0.0941    0.5469
    0.9059    0.0941    0.9575
    0.9059    0.0941    0.9649
```

Indeed, every value of `X` (the right-most column) that is less than 0.15 has associated scores (the left and center columns) of `0` and `1`, while the other values of `X` have associated scores of `0.91` and `0.09`. The difference (score `0.09` instead of the expected `.06`) is due to a statistical fluctuation: there are `8` observations in `X` in the range `(.95,1)` instead of the expected `5` observations.
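The arithmetic behind these scores can be reproduced by hand. The sketch below assumes the leaf for `X` > 0.15 holds 85 training observations, 8 of them `true`; these counts are inferred from the reported score of 0.0941 and the 8 observations mentioned above, not read from the tree object.

```matlab
counts = [77 8];                 % assumed leaf counts for [false true]
score = counts/sum(counts)       % observed scores, about [0.9059 0.0941]

expected = [0.80 0.05]/0.85      % scores expected for uniform X, about [0.9412 0.0588]
```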

### True Misclassification Cost

There are two costs associated with classification: the true misclassification cost per class, and the expected misclassification cost per observation.

You can set the true misclassification cost per class in the `Cost` name-value pair when you create the classifier using the `fitctree` method. `Cost(i,j)` is the cost of classifying an observation into class `j` if its true class is `i`. By default, `Cost(i,j)=1` if `i~=j`, and `Cost(i,j)=0` if `i=j`. In other words, the cost is `0` for correct classification, and `1` for incorrect classification.
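A sketch of setting a nondefault cost follows. The cost values are hypothetical, and the example assumes the `fisheriris` data set; `'ClassNames'` is passed explicitly so that the row and column order of the cost matrix is unambiguous.

```matlab
load fisheriris
classes = {'setosa','versicolor','virginica'};

% Rows are true classes, columns are predicted classes.
% Hypothetical choice: predicting versicolor for a true virginica costs 3.
C = [0 1 1
     1 0 1
     1 3 0];

Mdl = fitctree(meas,species,'ClassNames',classes,'Cost',C);
Mdl.Cost        % cost matrix stored with the trained tree
```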

### Expected Cost


Suppose you have `Nobs` observations that you want to classify with a trained classifier. Suppose you have `K` classes. You place the observations into a matrix `Xnew` with one observation per row.

The expected cost matrix `CE` has size `Nobs`-by-`K`. Each row of `CE` contains the expected (average) cost of classifying the observation into each of the `K` classes. `CE(n,k)` is

$\sum_{i=1}^{K}\hat{P}(i|Xnew(n))\,C(k|i),$

where

• $K$ is the number of classes.

• $\hat{P}(i|Xnew(n))$ is the posterior probability of class $i$ for observation `Xnew(n)`.

• $C(k|i)$ is the true misclassification cost of classifying an observation as $k$ when its true class is $i$.
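Because each row of `CE` is a posterior-weighted sum of cost rows, the whole matrix is a single matrix product. The sketch below uses made-up posteriors for two observations and three classes, with the default 0-1 cost; the name `Posterior` is chosen here for illustration.

```matlab
Posterior = [0.7 0.2 0.1      % hypothetical posteriors, one row per observation
             0.1 0.3 0.6];
Cost = ones(3) - eye(3);      % default cost: 0 on the diagonal, 1 elsewhere

CE = Posterior*Cost           % Nobs-by-K expected cost matrix
[~,cnum] = min(CE,[],2)       % class number with minimal expected cost per row
```

With the default cost, `CE` equals `1 - Posterior`, so choosing the minimal expected cost is the same as choosing the class with the largest posterior probability.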

### Predictive Measure of Association

The predictive measure of association between the optimal split on variable i and a surrogate split on variable j is:

${\lambda }_{i,j}=\frac{\text{min}\left({P}_{L},{P}_{R}\right)-\left(1-{P}_{{L}_{i}{L}_{j}}-{P}_{{R}_{i}{R}_{j}}\right)}{\text{min}\left({P}_{L},{P}_{R}\right)}.$

Here

• $P_L$ and $P_R$ are the node probabilities for the optimal split of node i into Left and Right nodes respectively.

• ${P}_{{L}_{i}{L}_{j}}$ is the probability that both (optimal) node i and (surrogate) node j send an observation to the Left.

• ${P}_{{R}_{i}{R}_{j}}$ is the probability that both (optimal) node i and (surrogate) node j send an observation to the Right.

The value of $\lambda_{i,j}$ ranges from $-\infty$ to 1. Variable j is a worthwhile surrogate split for variable i if $\lambda_{i,j} > 0$.
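A worked instance of the formula, using made-up probabilities (the variable names are hypothetical and only mirror the symbols above):

```matlab
PL   = 0.40;    % probability that the optimal split sends an observation left
PR   = 0.60;    % probability that the optimal split sends an observation right
PLL  = 0.35;    % probability that both splits send an observation left
PRR  = 0.50;    % probability that both splits send an observation right

disagree = 1 - PLL - PRR;                      % probability that the two splits disagree
lambda = (min(PL,PR) - disagree)/min(PL,PR)    % about 0.625
```

Here the surrogate disagrees with the optimal split on 15% of observations, which is less than the 40% error of always sending observations to the larger child, so `lambda` is positive and the surrogate is worthwhile.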

## Examples


### Predict Labels Using a Classification Tree

Examine predictions for a few rows in a data set left out of training.

```load fisheriris ```

Partition the data into training (50%) and validation (50%) sets.

```
n = size(meas,1);
rng(1) % For reproducibility
idxTrn = false(n,1);
idxTrn(randsample(n,round(0.5*n))) = true; % Training set logical indices
idxVal = idxTrn == false;                  % Validation set logical indices
```

Grow a classification tree using the training set.

```Mdl = fitctree(meas(idxTrn,:),species(idxTrn)); ```

Predict labels for the validation data. Count the number of misclassified observations.

```
label = predict(Mdl,meas(idxVal,:));
label(randsample(numel(label),5)) % Display several predicted labels
numMisclass = sum(~strcmp(label,species(idxVal)))
```
```
ans =

    'setosa'
    'setosa'
    'setosa'
    'virginica'
    'versicolor'

numMisclass =

     3
```

The software misclassifies three out-of-sample observations.

### Estimate Class Posterior Probabilities Using a Classification Tree

```load fisheriris ```

Partition the data into training (50%) and validation (50%) sets.

```
n = size(meas,1);
rng(1) % For reproducibility
idxTrn = false(n,1);
idxTrn(randsample(n,round(0.5*n))) = true; % Training set logical indices
idxVal = idxTrn == false;                  % Validation set logical indices
```

Grow a classification tree using the training set, and then view it.

```
Mdl = fitctree(meas(idxTrn,:),species(idxTrn));
view(Mdl,'Mode','graph')
```

The resulting tree has four levels.

Estimate posterior probabilities for the validation set using subtrees pruned to levels 1 and 3.

```
[~,Posterior] = predict(Mdl,meas(idxVal,:),'Subtrees',[1 3]);
Mdl.ClassNames
Posterior(randsample(size(Posterior,1),5),:,:) % Display several posterior probabilities
```
```
ans =

    'setosa'
    'versicolor'
    'virginica'

ans(:,:,1) =

    1.0000         0         0
    1.0000         0         0
    1.0000         0         0
         0         0    1.0000
         0    0.8571    0.1429

ans(:,:,2) =

    0.3733    0.3200    0.3067
    0.3733    0.3200    0.3067
    0.3733    0.3200    0.3067
    0.3733    0.3200    0.3067
    0.3733    0.3200    0.3067
```

The elements of `Posterior` are class posterior probabilities:

• Rows correspond to observations in the validation set.

• Columns correspond to the classes as listed in `Mdl.ClassNames`.

• Pages correspond to the subtrees.

The subtree pruned to level 1 is more sure of its predictions than the subtree pruned to level 3 (i.e., the root node).

## Algorithms

`predict` generates predictions by following the branches of `tree` until it reaches a leaf node or a missing value. If `predict` reaches a leaf node, it returns the classification of that node.

If `predict` reaches a node with a missing value for a predictor, its behavior depends on the setting of the `Surrogate` name-value pair when `fitctree` constructs `tree`.

• `Surrogate` = `'off'` (default): `predict` returns the label with the largest number of training samples that reach the node.

• `Surrogate` = `'on'`: `predict` uses the best surrogate split at the node. If all surrogate split variables with positive predictive measure of association are missing, `predict` returns the label with the largest number of training samples that reach the node. For a definition, see Predictive Measure of Association.
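The traversal for the `'off'` case can be sketched on a toy, hand-built tree. The arrays below are a hypothetical stand-in for a tree's internal structure, not the actual implementation of `predict`, and the sketch assumes that an observation goes left when its value is less than the cut point.

```matlab
% Hypothetical three-node tree: node 1 splits on predictor 1 at 0.5,
% nodes 2 and 3 are leaves.
cutVar    = [1 0 0];            % predictor index used at each node (0 marks a leaf)
cutPoint  = [0.5 NaN NaN];      % cut point at each branch node
children  = [2 3; 0 0; 0 0];    % [left right] child of each node
nodeClass = {'a','a','b'};      % majority training class at each node (assumed)

x = [NaN 2.0];                  % observation with predictor 1 missing
node = 1;
while cutVar(node) ~= 0
    v = x(cutVar(node));
    if isnan(v)
        break                   % missing value: stop at this node
    end
    if v < cutPoint(node)
        node = children(node,1);
    else
        node = children(node,2);
    end
end
label = nodeClass{node}         % majority class at the node where traversal stopped
```

For this `x`, traversal stops at the root because predictor 1 is missing, so the returned label is the root's majority class, `'a'`. Replacing `x` with `[0.7 2.0]` would follow the right branch to node 3 and return `'b'`.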