`err = error(B,X,Y)`

`err = error(B,X,Y,'param1',val1,'param2',val2,...)`

`err = error(B,X,Y)` computes the misclassification probability for classification trees, or the mean squared error (MSE) for regression trees, for each tree, for predictors `X` given the true response `Y`. For classification, `Y` can be a numeric vector, character matrix, cell array of strings, categorical vector, or logical vector. For regression, `Y` must be a numeric vector. `err` is a vector with one error measure for each of the `NTrees` trees in the ensemble `B`. By default (`'Mode','cumulative'`), element *t* of `err` is the error of the first *t* trees.

`err = error(B,X,Y,'param1',val1,'param2',val2,...)` specifies optional parameter name/value pairs:

| Parameter | Value |
| --- | --- |
| `'Mode'` | String indicating how the method computes errors. If set to `'cumulative'` (default), `error` computes cumulative errors and `err` is a vector of length `NTrees`, where the first element gives the error from `trees(1)`, the second element gives the error from `trees(1:2)`, and so on, up to `trees(1:NTrees)`. If set to `'individual'`, `err` is a vector of length `NTrees`, where each element is the error from one tree in the ensemble. If set to `'ensemble'`, `err` is a scalar giving the cumulative error for the entire ensemble. |
| `'Weights'` | Vector of observation weights to use for error averaging. By default, the weight of every observation is 1. The length of this vector must equal the number of rows in `X`. |
| `'Trees'` | Vector of indices indicating which trees to include in this calculation. By default, this argument is set to `'all'` and the method uses all trees. If `'Trees'` is a numeric vector, the method returns a vector of length `NTrees` for the `'cumulative'` and `'individual'` modes, where `NTrees` is the number of elements in the input vector, and a scalar for the `'ensemble'` mode. For example, in the `'cumulative'` mode, the first element gives the error from `trees(1)`, the second element gives the error from `trees(1:2)`, and so on. |
| `'TreeWeights'` | Vector of tree weights. This vector must have the same length as the `'Trees'` vector. The method uses these weights to combine output from the specified trees by taking a weighted average instead of the simple, unweighted majority vote. You cannot use this argument in the `'individual'` mode. |
| `'UseInstanceForTree'` | Logical matrix of size `Nobs`-by-`NTrees` indicating which trees to use to make predictions for each observation. By default, the method uses all trees for all observations. |
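To make the interaction between `'Trees'` and `'Mode'` concrete, here is a minimal Python sketch (an illustration, not the MATLAB implementation) that lists which tree indices each element of `err` is computed from. The helper name `tree_subsets` is hypothetical; its `trees` argument stands for the vector passed as `'Trees'`.

```python
from typing import List, Sequence


def tree_subsets(trees: Sequence[int], mode: str = "cumulative") -> List[List[int]]:
    """List the tree indices behind each element of err (hypothetical helper)."""
    trees = list(trees)
    mode = mode.lower()  # this page spells modes both as 'cumulative' and 'Cumulative'
    if mode == "cumulative":
        # element k uses trees(1:k), a growing prefix of the selection
        return [trees[: k + 1] for k in range(len(trees))]
    if mode == "individual":
        # element k uses tree trees(k) alone
        return [[t] for t in trees]
    if mode == "ensemble":
        # a single value computed from every selected tree
        return [trees]
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, `tree_subsets([3, 5, 7])` returns `[[3], [3, 5], [3, 5, 7]]`, so the output has one element per selected tree, while `'ensemble'` collapses to the last of those groups.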

When estimating the ensemble error, you can request:

- The ensemble error in one of three ways: the error for individual trees in the ensemble, the cumulative error over all trees, or the error for the entire ensemble (see the `'Mode'` name-value pair argument).
- Which trees to use in the ensemble error calculations (see the `'Trees'` name-value pair argument). Suppose that *T* trees compose the ensemble and that $$T^{*} \le T$$ of them are selected.
- For each selected tree, which observations in the input data (`X` and `Y`) to use in the ensemble error calculation (see the `'UseInstanceForTree'` name-value pair argument).
- A weight for each observation (see the `'Weights'` name-value pair argument). In the formulas that follow, $$w_{j}$$ is the weight of observation *j*.
- A weight for each tree (see the `'TreeWeights'` name-value pair argument).
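Every error formula in this section normalizes a per-observation loss by the total observation weight, $$\frac{1}{\sum_{j} w_{j}} \sum_{j} w_{j}\,\ell_{j}$$. A minimal Python sketch of that shared step, assuming the losses $$\ell_{j}$$ are already computed (squared error for regression, a 0/1 mismatch indicator for classification); `weighted_average_loss` is a hypothetical name:

```python
from typing import Optional, Sequence


def weighted_average_loss(losses: Sequence[float],
                          weights: Optional[Sequence[float]] = None) -> float:
    """Return (1 / sum_j w_j) * sum_j w_j * loss_j (hypothetical helper)."""
    if weights is None:
        # default 'Weights': every observation has weight 1
        weights = [1.0] * len(losses)
    if len(weights) != len(losses):
        # 'Weights' must have one entry per row of X
        raise ValueError("need one weight per observation")
    total = sum(weights)
    return sum(w * loss for w, loss in zip(weights, losses)) / total
```

With unit weights this reduces to the plain mean loss; a zero weight removes an observation from both the numerator and the normalizer.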

For regression problems, `error` estimates the weighted MSE of the ensemble of bagged regression trees for predicting `Y` given `X` using the selected trees and observations.

1. `error` predicts responses for the selected observations in `X` using the selected regression trees in the ensemble.
2. If you specify `'Mode','Individual'`, then the weighted MSE for tree *t* is

   $${\text{MSE}}_{t}=\frac{1}{{\displaystyle \sum _{j=1}^{n}{w}_{j}}}{\displaystyle \sum _{j=1}^{n}{w}_{j}{\left({y}_{j}-{\widehat{y}}_{tj}\right)}^{2}}.$$

   $${\widehat{y}}_{tj}$$ is the predicted response of observation *j* from selected regression tree *t*. `error` sets the prediction for any observation that is unselected within a selected tree to the weighted sample average of the observed training responses.
3. If you specify `'Mode','Cumulative'`, then the weighted MSE is a vector of size $$T^{*}$$ containing cumulative, weighted MSEs over the selected trees. `error` follows these steps to estimate $${\text{MSE}}_{t}^{\ast}$$, the cumulative, weighted MSE using the first *t* selected trees.
   1. For selected observation *j*, *j* = 1,...,*n*, `error` estimates $${\widehat{y}}_{\text{bag},tj}$$, the weighted average of the predictions among the first *t* selected trees (for details, see `predict`). For this computation, `error` uses the tree weights.
   2. `error` estimates the cumulative, weighted MSE through tree *t*:

      $${\text{MSE}}_{t}^{\ast}=\frac{1}{{\displaystyle \sum _{j=1}^{n}{w}_{j}}}{\displaystyle \sum _{j=1}^{n}{w}_{j}{\left({y}_{j}-{\widehat{y}}_{\text{bag},tj}\right)}^{2}}.$$

   `error` sets the prediction for observations that are unselected for all selected trees to the weighted sample average of the observed training responses.
4. If you specify `'Mode','Ensemble'`, then the weighted MSE is the last element of the cumulative, weighted MSE vector.
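The three regression modes can be sketched in Python as follows. This illustrates the formulas above under stated assumptions and is not the `TreeBagger` implementation: `preds[t][j]` holds the prediction of the *t*-th selected tree for observation *j*, `use[j][t]` plays the role of the `'UseInstanceForTree'` mask restricted to the selected trees, and `fallback` stands in for the weighted sample average of the training responses, which the sketch takes as an input instead of computing it. It also assumes that, in cumulative mode, an observation falls back when none of the first *t* selected trees use it. All names are hypothetical.

```python
from typing import Dict, List, Sequence


def weighted_mse_modes(
    y: Sequence[float],
    preds: Sequence[Sequence[float]],  # preds[t][j]: prediction of selected tree t for obs j
    obs_weights: Sequence[float],
    tree_weights: Sequence[float],
    use: Sequence[Sequence[bool]],     # use[j][t]: True if selected tree t is used for obs j
    fallback: float,                   # assumed: weighted mean of training responses
) -> Dict[str, object]:
    n, n_trees = len(y), len(preds)
    wsum = sum(obs_weights)

    def mse(yhat: List[float]) -> float:
        # (1 / sum_j w_j) * sum_j w_j * (y_j - yhat_j)^2
        return sum(w * (yj - yh) ** 2 for w, yj, yh in zip(obs_weights, y, yhat)) / wsum

    def bagged(j: int, t: int) -> float:
        # Tree-weighted average over the first t+1 selected trees that use observation j.
        num = sum(tree_weights[k] * preds[k][j] for k in range(t + 1) if use[j][k])
        den = sum(tree_weights[k] for k in range(t + 1) if use[j][k])
        return num / den if den > 0 else fallback

    individual = [
        mse([preds[t][j] if use[j][t] else fallback for j in range(n)])
        for t in range(n_trees)
    ]
    cumulative = [mse([bagged(j, t) for j in range(n)]) for t in range(n_trees)]
    # 'ensemble' is the last element of the cumulative vector
    return {"individual": individual, "cumulative": cumulative, "ensemble": cumulative[-1]}
```

Averaging two trees whose errors partly cancel can give a cumulative MSE below either individual MSE, which is the effect the cumulative curve is meant to show.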

For classification problems, `error` estimates the weighted misclassification rate of the ensemble of bagged classification trees for predicting `Y` given `X` using the selected trees and observations.

1. If you specify `'Mode','Individual'`, then the weighted misclassification rate for tree *t* is

   $${e}_{t}=\frac{1}{{\displaystyle \sum _{j=1}^{n}{w}_{j}}}{\displaystyle \sum _{j=1}^{n}{w}_{j}I\left({y}_{j}\ne {\widehat{y}}_{tj}\right)}.$$

   $${\widehat{y}}_{tj}$$ is the predicted class for selected observation *j* from selected classification tree *t*. `error` sets the prediction for any observation that is unselected within a selected tree to the predicted, weighted, most popular class over all training responses. If there are multiple most popular classes, `error` considers the one listed first in the `ClassNames` property of the `TreeBagger` model the most popular.
2. If you specify `'Mode','Cumulative'`, then the weighted misclassification rate is a vector of size $$T^{*}$$ containing cumulative, weighted misclassification rates over the selected trees. `error` follows these steps to estimate $${e}_{t}^{\ast}$$, the cumulative, weighted misclassification rate using the first *t* selected trees.
   1. For selected observation *j*, *j* = 1,...,*n*, `error` estimates $${\widehat{y}}_{\text{bag},tj}$$, the weighted, most popular class among the first *t* selected trees (for details, see `predict`). For this computation, `error` uses the tree weights.
   2. `error` estimates the cumulative, weighted misclassification rate through tree *t*:

      $${e}_{t}^{\ast}=\frac{1}{{\displaystyle \sum _{j=1}^{n}{w}_{j}}}{\displaystyle \sum _{j=1}^{n}{w}_{j}I\left({y}_{j}\ne {\widehat{y}}_{\text{bag},tj}\right)}.$$

   `error` sets any observations that are unselected for all selected trees to the predicted, weighted, most popular class over all training responses. If there are multiple most popular classes, `error` considers the one listed first in the `ClassNames` property of the `TreeBagger` model the most popular.
3. If you specify `'Mode','Ensemble'`, then the weighted misclassification rate is the last element of the cumulative, weighted misclassification rate vector.
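A matching Python sketch for classification, under the same hypothetical names and assumptions as a regression version would use (`preds`, `use`, and a `fallback` class supplied by the caller instead of computed from training data). It aggregates with a tree-weighted hard vote, a simplification that follows the "most popular class" wording above; `predict` may combine class scores differently. The text states the `ClassNames` tie-break only for the fallback class; applying the same rule to ties in the vote is an assumption of this sketch, with `class_names` standing in for that ordering.

```python
from typing import Dict, Hashable, List, Sequence


def weighted_misclass_modes(
    y: Sequence[Hashable],
    preds: Sequence[Sequence[Hashable]],  # preds[t][j]: class predicted by selected tree t for obs j
    obs_weights: Sequence[float],
    tree_weights: Sequence[float],
    use: Sequence[Sequence[bool]],        # use[j][t]: True if selected tree t is used for obs j
    class_names: Sequence[Hashable],      # stands in for the ClassNames order
    fallback: Hashable,                   # assumed: most popular training class
) -> Dict[str, object]:
    n, n_trees = len(y), len(preds)
    wsum = sum(obs_weights)

    def rate(yhat: List[Hashable]) -> float:
        # (1 / sum_j w_j) * sum_j w_j * I(y_j != yhat_j)
        return sum(w for w, yj, yh in zip(obs_weights, y, yhat) if yj != yh) / wsum

    def vote(j: int, t: int) -> Hashable:
        # Tree-weighted vote among the first t+1 selected trees that use observation j.
        scores = {c: 0.0 for c in class_names}
        used = False
        for k in range(t + 1):
            if use[j][k]:
                scores[preds[k][j]] += tree_weights[k]
                used = True
        if not used:
            return fallback
        best = max(scores.values())
        # Assumed tie-break: the first class in class_names with the top score.
        return next(c for c in class_names if scores[c] == best)

    individual = [
        rate([preds[t][j] if use[j][t] else fallback for j in range(n)])
        for t in range(n_trees)
    ]
    cumulative = [rate([vote(j, t) for j in range(n)]) for t in range(n_trees)]
    # 'ensemble' is the last element of the cumulative vector
    return {"individual": individual, "cumulative": cumulative, "ensemble": cumulative[-1]}
```

Changing only `tree_weights` can flip tied votes, so the cumulative rate depends on `'TreeWeights'` even when every individual rate stays the same.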
