`Yfit = predict(B,X)`

`[Yfit,stdevs] = predict(B,X)`

`[Yfit,scores] = predict(B,X)`

`[Yfit,scores,stdevs] = predict(B,X)`

`Yfit = predict(B,X,'param1',val1,'param2',val2,...)`

`Yfit = predict(B,X)` returns a vector of predicted responses for the predictor data in the table or matrix `X`, based on the ensemble of bagged decision trees `B`. `Yfit` is a cell array of character vectors for classification and a numeric array for regression. By default, `predict` takes a democratic (nonweighted) average vote from all trees in the ensemble.
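A minimal sketch of the basic call, assuming the Statistics and Machine Learning Toolbox and its bundled `fisheriris` sample data set. The ensemble size of 50 and the rows chosen for prediction are arbitrary illustration choices.

```matlab
% Train a bagged ensemble of 50 classification trees on the Fisher iris data
load fisheriris                     % provides meas (150x4 double) and species (150x1 cell)
rng(1)                              % fix the bootstrap samples for reproducibility
B = TreeBagger(50, meas, species, 'Method', 'classification');

% Predict the class of three observations (rows 1, 51, and 101 of the data)
X = meas([1 51 101], :);
Yfit = predict(B, X);               % 3x1 cell array of character vectors

% Each element of Yfit is one of the class names stored in B.ClassNames
disp(Yfit)
```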

`B` is a trained `TreeBagger` model object, that is, a model returned by `TreeBagger`.

`X` is a table or matrix of predictor data used to generate responses. Rows represent observations and columns represent variables.

If `X` is a numeric matrix:

- The variables making up the columns of `X` must have the same order as the predictor variables that trained `B`.
- If you trained `B` using a table (for example, `Tbl`), then `X` can be a numeric matrix if `Tbl` contains all numeric predictor variables. To treat numeric predictors in `Tbl` as categorical during training, identify categorical predictors using the `CategoricalPredictors` name-value pair argument of `TreeBagger`. If `Tbl` contains heterogeneous predictor variables (for example, numeric and categorical data types) and `X` is a numeric matrix, then `predict` throws an error.
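The following sketch, again assuming the `fisheriris` sample data, trains a regression ensemble from a table whose predictors are all numeric and then passes a plain numeric matrix to `predict`. The table variable names are chosen for illustration only; the matrix columns follow the same order as the predictors in the training table.

```matlab
load fisheriris
% Build a table of numeric predictors plus a numeric response (petal width)
Tbl = array2table(meas(:, 1:3), ...
    'VariableNames', {'SepalLength', 'SepalWidth', 'PetalLength'});
Tbl.PetalWidth = meas(:, 4);

rng(1)
B = TreeBagger(50, Tbl, 'PetalWidth', 'Method', 'regression');

% Every predictor in Tbl is numeric, so a numeric matrix is accepted,
% provided its columns appear in the training order (see B.PredictorNames)
Xmat = meas(1:5, 1:3);
YfitMat = predict(B, Xmat);

% The same rows supplied as a table (response column included) for comparison
YfitTbl = predict(B, Tbl(1:5, :));
disp([YfitMat YfitTbl])
```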

If `X` is a table:

- `predict` does not support multi-column variables and cell arrays other than cell arrays of character vectors.
- If you trained `B` using a table (for example, `Tbl`), then all predictor variables in `X` must have the same variable names and be of the same data types as those that trained `B` (stored in `B.PredictorNames`). However, the column order of `X` does not need to correspond to the column order of `Tbl`. `Tbl` and `X` can contain additional variables (response variables, observation weights, etc.), but `predict` ignores them.
- If you trained `B` using a numeric matrix, then the predictor names in `B.PredictorNames` and corresponding predictor variable names in `X` must be the same. To specify predictor names during training, see the `PredictorNames` name-value pair argument of `TreeBagger`. All predictor variables in `X` must be numeric vectors. `X` can contain additional variables (response variables, observation weights, etc.), but `predict` ignores them.
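A sketch of table input, assuming a `fisheriris`-based regression setup purely for illustration: the prediction table lists the predictors in a different order and still carries the response variable. Because `predict` matches table variables by name and ignores extra ones, both calls are expected to agree.

```matlab
load fisheriris
Tbl = array2table(meas, 'VariableNames', ...
    {'SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth'});

rng(1)
B = TreeBagger(50, Tbl, 'PetalWidth', 'Method', 'regression');

% Reorder the columns and keep the response variable in the table
Xreordered = Tbl(1:5, {'PetalWidth', 'PetalLength', 'SepalLength', 'SepalWidth'});
Yreordered = predict(B, Xreordered);

% Predictions from the original column order, for comparison
Yoriginal = predict(B, Tbl(1:5, :));
disp([Yreordered Yoriginal])
```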

For regression, `[Yfit,stdevs] = predict(B,X)` also returns standard deviations of the computed responses over the ensemble of the grown trees.
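A sketch of the regression form, assuming the `fisheriris` data with petal width as an illustrative response. The two-standard-deviation band computed here is only a descriptive summary of how much the trees disagree, not a calibrated prediction interval.

```matlab
load fisheriris
X = meas(:, 1:3);                   % sepal length, sepal width, petal length
Y = meas(:, 4);                     % petal width as the response
rng(1)
B = TreeBagger(100, X, Y, 'Method', 'regression');

% stdevs(i) is the spread of the per-tree predictions for observation i
[Yfit, stdevs] = predict(B, X(1:5, :));

% A rough band of two standard deviations around each ensemble prediction
band = [Yfit - 2*stdevs, Yfit + 2*stdevs];
disp(table(Yfit, stdevs, band))
```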

For classification, `[Yfit,scores] = predict(B,X)` also returns scores for all classes. `scores` is a matrix with one row per observation and one column per class. For each observation and each class, the score generated by each tree is the probability of this observation originating from this class, computed as the fraction of observations of this class in a tree leaf. `predict` averages these scores over all trees in the ensemble.
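A sketch of retrieving class scores, assuming the `fisheriris` data. The column order is assumed to follow `B.ClassNames`. Because each tree's leaf class fractions sum to 1, their average over the trees should also sum to 1 for every observation.

```matlab
load fisheriris
rng(1)
B = TreeBagger(50, meas, species, 'Method', 'classification');

[Yfit, scores] = predict(B, meas([1 51 101], :));

% One row per observation, one column per class (assumed order: B.ClassNames)
disp(B.ClassNames')
disp(scores)

% Row sums of the averaged per-tree class probabilities
disp(sum(scores, 2))
```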

`[Yfit,scores,stdevs] = predict(B,X)` also returns standard deviations of the computed scores for classification. `stdevs` is a matrix with one row per observation and one column per class, with standard deviations taken over the ensemble of the grown trees.
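A sketch of the three-output form, assuming the `fisheriris` data as before. It only shows that `stdevs` has the same layout as `scores`; how large a spread counts as "uncertain" is left to the reader.

```matlab
load fisheriris
rng(1)
B = TreeBagger(50, meas, species, 'Method', 'classification');

[Yfit, scores, stdevs] = predict(B, meas([1 51 101], :));

% Entry (i,k) of stdevs measures how much the trees disagree about the
% probability of class k for observation i; the layout matches scores
disp(scores)
disp(stdevs)
```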

`Yfit = predict(B,X,'param1',val1,'param2',val2,...)` specifies optional parameter name/value pairs:

| Parameter | Description |
|---|---|
| `'Trees'` | Array of tree indices to use for computation of responses. Default is `'all'`. |
| `'TreeWeights'` | Array of `NTrees` weights for weighting votes from the specified trees. |
| `'UseInstanceForTree'` | Logical matrix of size `Nobs`-by-`NTrees` indicating which trees to use to make predictions for each observation. By default, all trees are used for all observations. |
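A sketch of the three name/value pairs, assuming the `fisheriris` data. The linear weight ramp is an arbitrary illustration, and passing `B.OOBIndices` (true where an observation was out of bag for a tree) as the `'UseInstanceForTree'` mask is one way to restrict each observation to the trees that did not see it during training.

```matlab
load fisheriris
rng(1)
B = TreeBagger(50, meas, species, 'Method', 'classification', ...
    'OOBPrediction', 'on');         % keep out-of-bag information
nTrees = numel(B.Trees);

% 'Trees': vote with the first 10 trees only
YfitFirst10 = predict(B, meas, 'Trees', 1:10);

% 'TreeWeights': one weight per tree (an arbitrary linear ramp here)
w = linspace(1, 2, nTrees);
YfitWeighted = predict(B, meas, 'TreeWeights', w);

% 'UseInstanceForTree': Nobs-by-NTrees logical mask; B.OOBIndices marks the
% trees for which each observation was out of bag
YfitOOB = predict(B, meas, 'UseInstanceForTree', B.OOBIndices);

% Fraction of observations whose out-of-bag prediction is wrong
oobErr = mean(~strcmp(YfitOOB, species));
disp(oobErr)
```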

For regression problems, the predicted response for an observation is the weighted average of the predictions using selected trees only. That is,

$${\widehat{y}}_{\text{bag}}=\frac{1}{{\displaystyle \sum _{t=1}^{T}{\alpha}_{t}I(t\in S)}}{\displaystyle \sum _{t=1}^{T}{\alpha}_{t}{\widehat{y}}_{t}I(t\in S)}.$$

$\widehat{y}_t$ is the prediction from tree $t$ in the ensemble. $S$ is the set of indices of selected trees that comprise the prediction (see `'Trees'` and `'UseInstanceForTree'`). $I(t\in S)$ is 1 if $t$ is in the set $S$, and 0 otherwise. $\alpha_t$ is the weight of tree $t$ (see `'TreeWeights'`).
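To make the formula concrete, the following plain MATLAB arithmetic (no toolbox needed) evaluates it for one observation. The per-tree predictions, weights, and selection are made-up values, not output from a trained ensemble.

```matlab
% Hypothetical per-tree predictions yhat_t for one observation (T = 4 trees)
yhat  = [2.0; 3.0; 10.0; 4.0];
% Hypothetical tree weights alpha_t
alpha = [1; 1; 1; 2];
% Indicator I(t in S): tree 3 is not selected
inS   = logical([1; 1; 0; 1]);

% Weighted average over the selected trees only
ybag = sum(alpha .* yhat .* inS) / sum(alpha .* inS)
```

Here $\widehat{y}_{\text{bag}} = (2 + 3 + 8)/(1 + 1 + 2) = 3.25$. With all four trees selected, the same expression gives $(2 + 3 + 10 + 8)/5 = 4.6$, so excluding tree 3 removes the pull of its outlying prediction.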

For classification problems, the predicted class for an observation is the class that yields the largest weighted average of the class posterior probabilities (i.e., classification scores) computed using selected trees only. That is,

1. For each class $c \in C$ and each tree $t = 1,\ldots,T$, `predict` computes $\widehat{P}_t(c|x)$, which is the estimated posterior probability of class $c$ given observation $x$ using tree $t$. $C$ is the set of all distinct classes in the training data. For more details on classification tree posterior probabilities, see `fitctree` and `predict`.
2. `predict` computes the weighted average of the class posterior probabilities over the selected trees:

   $${\widehat{P}}_{\text{bag}}\left(c|x\right)=\frac{1}{{\displaystyle \sum _{t=1}^{T}{\alpha}_{t}I(t\in S)}}{\displaystyle \sum _{t=1}^{T}{\alpha}_{t}{\widehat{P}}_{t}\left(c|x\right)I(t\in S)}.$$

3. The predicted class is the class that yields the largest weighted average:

   $${\widehat{y}}_{\text{bag}}=\underset{c\in C}{\mathrm{arg}\mathrm{max}}\left\{{\widehat{P}}_{\text{bag}}\left(c|x\right)\right\}.$$
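The classification steps can be traced with plain MATLAB arithmetic for one observation. The class names, per-tree posteriors, and weights below are hypothetical values chosen to illustrate the weighted average and the arg max, not output from a real ensemble.

```matlab
classNames = {'setosa', 'versicolor', 'virginica'};   % the set C
% Hypothetical posteriors P_t(c|x): one row per tree t, one column per class c
P = [0.9 0.1 0.0;
     0.2 0.7 0.1;
     0.1 0.3 0.6];
alpha = [1; 2; 1];                  % hypothetical tree weights alpha_t
inS   = logical([1; 1; 1]);         % I(t in S): all three trees selected

w    = alpha .* inS;                % alpha_t * I(t in S)
Pbag = (w' * P) / sum(w);           % weighted average posterior for each class
[~, k] = max(Pbag);                 % arg max over the classes
ybag = classNames{k}
```

The weighted sums are $[1.4, 1.8, 0.8]$ and the total weight is 4, so $\widehat{P}_{\text{bag}} = [0.35, 0.45, 0.20]$ and the predicted class is `versicolor`, even though trees 1 and 3 each favor a different class.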
