
# predict

Predict labels using classification tree

## Syntax

``label = predict(Mdl,X)``
``label = predict(Mdl,X,Name,Value)``
``[label,score,node,cnum] = predict(___)``

## Description

`label = predict(Mdl,X)` returns a vector of predicted class labels for the predictor data in the table or matrix `X`, based on the trained, full or compact classification tree `Mdl`.

`label = predict(Mdl,X,Name,Value)` uses additional options specified by one or more `Name,Value` pair arguments. For example, you can specify to prune `Mdl` to a particular level before predicting labels.

`[label,score,node,cnum] = predict(___)` uses any of the input arguments in the previous syntaxes and additionally returns:

• A matrix of classification scores (`score`) indicating the likelihood that a label comes from a particular class. For classification trees, scores are posterior probabilities. For each observation in `X`, the predicted class label corresponds to the minimum expected misclassification cost among all classes.

• A vector of predicted node numbers for the classification (`node`).

• A vector of predicted class numbers for the classification (`cnum`).
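As an illustration of the four-output syntax, the following minimal sketch trains a tree on Fisher's iris data and requests every output for a few observations. The variable name `Xnew` and the choice of rows are assumptions made for this example only.

```matlab
% Minimal sketch: request all four outputs from predict.
load fisheriris                      % provides meas (150-by-4) and species
Mdl = fitctree(meas,species);        % train on the full data set
Xnew = meas([1 51 101],:);           % three observations, chosen arbitrarily

[label,score,node,cnum] = predict(Mdl,Xnew);

size(score)   % one row per observation, one column per class in Mdl.ClassNames
```

Because no `Subtrees` value is given here, `node` and `cnum` come back as column vectors with one entry per row of `Xnew`.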

## Input Arguments

`Mdl`: Trained classification tree, specified as a `ClassificationTree` or `CompactClassificationTree` model object. That is, `Mdl` is a trained classification model returned by `fitctree` or `compact`.

`X`: Predictor data to be classified, specified as a numeric matrix or table.

Each row of `X` corresponds to one observation, and each column corresponds to one variable.

• For a numeric matrix:

• The variables making up the columns of `X` must have the same order as the predictor variables that trained `Mdl`.

• If you trained `Mdl` using a table (for example, `Tbl`), then `X` can be a numeric matrix if `Tbl` contains all numeric predictor variables. To treat numeric predictors in `Tbl` as categorical during training, identify categorical predictors using the `CategoricalPredictors` name-value pair argument of `fitctree`. If `Tbl` contains heterogeneous predictor variables (for example, numeric and categorical data types) and `X` is a numeric matrix, then `predict` throws an error.

• For a table:

• `predict` does not support multi-column variables and cell arrays other than cell arrays of character vectors.

• If you trained `Mdl` using a table (for example, `Tbl`), then all predictor variables in `X` must have the same variable names and data types as those that trained `Mdl` (stored in `Mdl.PredictorNames`). However, the column order of `X` does not need to correspond to the column order of `Tbl`. `Tbl` and `X` can contain additional variables (response variables, observation weights, etc.), but `predict` ignores them.

• If you trained `Mdl` using a numeric matrix, then the predictor names in `Mdl.PredictorNames` and corresponding predictor variable names in `X` must be the same. To specify predictor names during training, see the `PredictorNames` name-value pair argument of `fitctree`. All predictor variables in `X` must be numeric vectors. `X` can contain additional variables (response variables, observation weights, etc.), but `predict` ignores them.

Data Types: `table` | `double` | `single`
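The table rules above can be checked with a short sketch. It assumes a model trained on a table whose variable names (`SL`, `SW`, `PL`, `PW`, `Species`) are invented for this example, and then predicts from a table with the predictor columns reordered and the response omitted.

```matlab
% Sketch: predict from a table whose columns are in a different order
% than the training table. Variable names are hypothetical.
load fisheriris
Tbl = array2table(meas,'VariableNames',{'SL','SW','PL','PW'});
Tbl.Species = species;                   % response stored in the same table
Mdl = fitctree(Tbl,'Species');           % train using the table

Xnew = Tbl(1:5,{'PW','PL','SW','SL'});   % reordered predictors, no response
labelReordered = predict(Mdl,Xnew);
labelOriginal  = predict(Mdl,Tbl(1:5,:)); % extra Species column is ignored

isequal(labelReordered,labelOriginal)
```

Matching is done by variable name rather than by position, which is why the two calls should agree.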

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside single quotes (`' '`). You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

`'Subtrees'`: Pruning level, specified as the comma-separated pair consisting of `'Subtrees'` and a vector of nonnegative integers in ascending order or `'all'`.

If you specify a vector, then all elements must be at least `0` and at most `max(Mdl.PruneList)`. `0` indicates the full, unpruned tree and `max(Mdl.PruneList)` indicates the completely pruned tree (i.e., just the root node).

If you specify `'all'`, then `CompactClassificationTree.predict` operates on all subtrees (i.e., the entire pruning sequence). This specification is equivalent to using `0:max(Mdl.PruneList)`.

`CompactClassificationTree.predict` prunes `Mdl` to each level indicated in `Subtrees`, and then estimates the corresponding output arguments. The size of `Subtrees` determines the size of some output arguments.

To invoke `Subtrees`, the properties `PruneList` and `PruneAlpha` of `Mdl` must be nonempty. In other words, grow `Mdl` by setting `'Prune','on'`, or by pruning `Mdl` using `prune`.

Example: `'Subtrees','all'`
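The equivalence between `'all'` and `0:max(Mdl.PruneList)` described above can be illustrated with a small sketch. It sets `'Prune','on'` explicitly so that `PruneList` is nonempty; the variable names are assumptions for this example.

```matlab
% Sketch: 'Subtrees','all' versus an explicit vector of pruning levels.
load fisheriris
Mdl = fitctree(meas,species,'Prune','on');  % ensure the pruning sequence exists
maxLevel = max(Mdl.PruneList);              % level of the root-only tree

labelAll = predict(Mdl,meas,'Subtrees','all');
labelVec = predict(Mdl,meas,'Subtrees',0:maxLevel);

isequal(labelAll,labelVec)   % both calls cover the entire pruning sequence
size(labelAll)               % one column per pruning level
```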

## Output Arguments

`label`: Predicted class labels, returned as a vector or array. Each entry of `label` corresponds to the class with minimal expected cost for the corresponding row of `X`.

Suppose `Subtrees` is a numeric vector containing `T` elements (for `'all'`, see `Subtrees`), and `X` has `N` rows.

• If the response data type is `char` and:

• `T` = 1, then `label` is a character matrix containing `N` rows. Each row contains the predicted label produced by subtree `Subtrees`.

• `T` > 1, then `label` is an `N`-by-`T` cell array.

• Otherwise, `label` is an `N`-by-`T` array having the same data type as the response.

In the latter two cases, column `j` of `label` contains the vector of predicted labels produced by subtree `Subtrees(j)`.
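The minimal-expected-cost rule for `label` can be sketched by hand from the posterior probabilities and the cost matrix stored in `Mdl.Cost`. This is an illustrative reconstruction, not the internal implementation: it assumes that `Mdl.Cost(i,j)` is the cost of predicting class `j` when the true class is `i`, and it does not attempt to reproduce how `predict` breaks ties.

```matlab
% Sketch: recompute labels as the class with minimum expected cost.
load fisheriris
Mdl = fitctree(meas,species);
[label,score] = predict(Mdl,meas);

expCost = score*Mdl.Cost;            % N-by-K expected cost of each prediction
[~,idx] = min(expCost,[],2);         % cheapest class per observation
labelManual = Mdl.ClassNames(idx);   % map column indices back to class names

agreement = mean(strcmp(label,labelManual))  % fraction matching predict
```

With the default cost matrix (0 on the diagonal, 1 elsewhere), minimizing expected cost is the same as choosing the class with the largest posterior probability.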

`score`: Posterior probabilities, returned as a numeric matrix of size `N`-by-`K`, where `N` is the number of observations (rows) in `X`, and `K` is the number of classes (in `Mdl.ClassNames`). `score(i,j)` is the posterior probability that row `i` of `X` is of class `j`.

If `Subtrees` has `T` elements, and `X` has `N` rows, then `score` is an `N`-by-`K`-by-`T` array, and `node` and `cnum` are `N`-by-`T` matrices.

`node`: Node numbers for the predicted classes, returned as a numeric vector. Each entry corresponds to the predicted node in `Mdl` for the corresponding row of `X`.

`cnum`: Class numbers corresponding to the predicted labels in `label`, returned as a numeric vector. Each entry of `cnum` corresponds to a predicted class number for the corresponding row of `X`.
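To see how the output sizes depend on `Subtrees`, the sketch below requests two pruning levels (the full tree and the root-only tree) and inspects the dimensions. The variable names `levels`, `N`, `K`, and `T` are assumptions for this example.

```matlab
% Sketch: output dimensions when Subtrees has T elements.
load fisheriris
Mdl = fitctree(meas,species,'Prune','on');
levels = [0 max(Mdl.PruneList)];     % full tree and root-only tree

[label,score,node,cnum] = predict(Mdl,meas,'Subtrees',levels);

N = size(meas,1);                    % number of observations
K = numel(Mdl.ClassNames);           % number of classes
T = numel(levels);                   % number of subtrees

size(label)   % N-by-T
size(score)   % N-by-K-by-T
size(node)    % N-by-T
size(cnum)    % N-by-T
```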

## Examples

Examine predictions for a few rows in a data set left out of training.

```
load fisheriris
```

Partition the data into training (50%) and validation (50%) sets.

```
n = size(meas,1);
rng(1) % For reproducibility
idxTrn = false(n,1);
idxTrn(randsample(n,round(0.5*n))) = true; % Training set logical indices
idxVal = idxTrn == false;                  % Validation set logical indices
```

Grow a classification tree using the training set.

```
Mdl = fitctree(meas(idxTrn,:),species(idxTrn));
```

Predict labels for the validation data. Count the number of misclassified observations.

```
label = predict(Mdl,meas(idxVal,:));
label(randsample(numel(label),5)) % Display several predicted labels
numMisclass = sum(~strcmp(label,species(idxVal)))
```

```
ans = 5×1 cell array
    'setosa'
    'setosa'
    'setosa'
    'virginica'
    'versicolor'

numMisclass = 3
```

The software misclassifies three out-of-sample observations.

```
load fisheriris
```

Partition the data into training (50%) and validation (50%) sets.

```
n = size(meas,1);
rng(1) % For reproducibility
idxTrn = false(n,1);
idxTrn(randsample(n,round(0.5*n))) = true; % Training set logical indices
idxVal = idxTrn == false;                  % Validation set logical indices
```

Grow a classification tree using the training set, and then view it.

```
Mdl = fitctree(meas(idxTrn,:),species(idxTrn));
view(Mdl,'Mode','graph')
```

The resulting tree has four levels.

Estimate posterior probabilities for the validation set using subtrees pruned to levels 1 and 3.

```
[~,Posterior] = predict(Mdl,meas(idxVal,:),'SubTrees',[1 3]);
Mdl.ClassNames
Posterior(randsample(size(Posterior,1),5),:,:) % Display several posterior probabilities
```

```
ans = 3×1 cell array
    'setosa'
    'versicolor'
    'virginica'

ans(:,:,1) =

    1.0000         0         0
    1.0000         0         0
    1.0000         0         0
         0         0    1.0000
         0    0.8571    0.1429

ans(:,:,2) =

    0.3733    0.3200    0.3067
    0.3733    0.3200    0.3067
    0.3733    0.3200    0.3067
    0.3733    0.3200    0.3067
    0.3733    0.3200    0.3067
```

The elements of `Posterior` are class posterior probabilities:

• Rows correspond to observations in the validation set.

• Columns correspond to the classes as listed in `Mdl.ClassNames`.

• Pages correspond to the subtrees.

The subtree pruned to level 1 is more sure of its predictions than the subtree pruned to level 3 (i.e., the root node).

## Algorithms

`predict` generates predictions by following the branches of `Mdl` until it reaches a leaf node or a missing value. If `predict` reaches a leaf node, it returns the classification of that node.

If `predict` reaches a node with a missing value for a predictor, its behavior depends on the setting of the `Surrogate` name-value pair when `fitctree` constructs `Mdl`.

• `Surrogate` = `'off'` (default): `predict` returns the label with the largest number of training samples that reach the node.

• `Surrogate` = `'on'`: `predict` uses the best surrogate split at the node. If all surrogate split variables with positive predictive measure of association are missing, `predict` returns the label with the largest number of training samples that reach the node. For a definition, see Predictive Measure of Association.
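The two settings can be compared on an observation with a missing predictor value. The sketch below is illustrative only: the model names `MdlOff` and `MdlOn` and the choice of which predictor to blank out are assumptions, and whether the two labels differ depends on the data and on which splits the missing value affects.

```matlab
% Sketch: predicting an observation with a missing value, with and
% without surrogate splits.
load fisheriris
MdlOff = fitctree(meas,species);                   % 'Surrogate','off' (default)
MdlOn  = fitctree(meas,species,'Surrogate','on');  % store surrogate splits

x = meas(1,:);
x(3) = NaN;                       % make the third predictor missing

labelOff = predict(MdlOff,x)      % falls back to the majority label at the node
labelOn  = predict(MdlOn,x)       % tries the best available surrogate split
```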