## Documentation

`cost = test(t,'resubstitution')`

`cost = test(t,'test',X,y)`

`cost = test(t,'crossvalidate',X,y)`

`[cost,secost,ntnodes,bestlevel] = test(...)`

`[...] = test(...,param1,val1,param2,val2,...)`

`cost = test(t,'resubstitution')` computes
the cost of the tree `t` using a resubstitution method. `t` is
a decision tree as created by `classregtree`.
The cost of the tree is the sum over all terminal nodes of the estimated
probability of a node times the cost of a node. If `t` is
a classification tree, the cost of a node is the sum of the misclassification
costs of the observations in that node. If `t` is
a regression tree, the cost of a node is the average squared error
over the observations in that node. `cost` is a vector
of cost values for each subtree in the optimal pruning sequence for `t`.
The resubstitution cost is based on the same sample that was used
to create the original tree, so it underestimates the likely cost
of applying the tree to new data.
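As a minimal sketch of the resubstitution call, assuming the `fisheriris` sample data set (variables `meas` and `species`) is available, the cost vector could be obtained as follows. The data set and variable names are illustrative choices, not part of the description above.

```matlab
% Fit a classification tree to the Fisher iris sample data
% (illustrative data; any predictor matrix and response would do).
load fisheriris                  % provides meas (predictors) and species (labels)
t = classregtree(meas, species);

% One cost value per subtree in the optimal pruning sequence.
% The first element corresponds to the unpruned tree (pruning level 0).
cost = test(t, 'resubstitution');
disp(cost)
```

Because these values come from the data used to grow the tree, they are best read as an optimistic lower bound rather than an estimate of performance on new data.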

`cost = test(t,'test',X,y)` uses the matrix
of predictors `X` and the response vector `y` as
a test sample, applies the decision tree `t` to that
sample, and returns a vector `cost` of cost values
computed for the test sample. `X` and `y` should
not be the same as the learning sample, that is, the sample that was
used to fit the tree `t`.
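A hold-out evaluation along these lines might look like the sketch below. The 70/30 split, the random seed, and the `fisheriris` data are assumptions made for illustration only.

```matlab
load fisheriris                  % provides meas and species
rng(1);                          % fix the seed so the split is repeatable

% Randomly assign about 70% of the rows to the learning sample.
n        = size(meas, 1);
idx      = randperm(n);
nTrain   = round(0.7 * n);
trainIdx = idx(1:nTrain);
testIdx  = idx(nTrain+1:end);

% Fit the tree on the learning sample only.
t = classregtree(meas(trainIdx,:), species(trainIdx));

% Evaluate every subtree in the pruning sequence on the held-out rows.
cost = test(t, 'test', meas(testIdx,:), species(testIdx));
disp(cost)
```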

`cost = test(t,'crossvalidate',X,y)` uses
10-fold cross-validation to compute the cost vector. `X` and `y` should
be the learning sample, that is, the sample that was used to fit
the tree `t`. The function partitions the sample
into 10 subsamples, chosen randomly but with roughly equal size. For
classification trees, the subsamples also have roughly the same class
proportions. For each subsample, `test` fits a
tree to the remaining data and uses it to predict the subsample. It
pools the information from all subsamples to compute the cost for
the whole sample.
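The following sketch compares the cross-validated cost with the resubstitution cost for the same tree, again assuming the `fisheriris` sample data. Setting the seed with `rng` is intended to make the random partition repeatable, on the assumption that the partition draws from the global random stream.

```matlab
load fisheriris                  % provides meas and species
t = classregtree(meas, species);

rng(1);                          % assumed to control the random partition
cvCost    = test(t, 'crossvalidate', meas, species);
resubCost = test(t, 'resubstitution');

% Both vectors have one entry per subtree in the pruning sequence of t,
% so they can be shown side by side (column 1: resubstitution, column 2: CV).
disp([resubCost(:) cvCost(:)])
```

The cross-validated values are typically larger than the resubstitution values for the less-pruned subtrees, which is the optimism the resubstitution method is known for.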

`[cost,secost,ntnodes,bestlevel] = test(...)` also
returns the vector `secost` containing the standard
error of each `cost` value, the vector `ntnodes` containing
the number of terminal nodes for each subtree, and the scalar `bestlevel` containing
the estimated best level of pruning. A `bestlevel` of `0` means
no pruning. The best level is the one that produces the smallest tree
that is within one standard error of the minimum-cost subtree.
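The one-standard-error rule described above can be written out by hand, as in this sketch. The manual computation is an interpretation of the rule as stated here, not the toolbox's own code, and it may differ from `bestlevel` in edge cases such as ties. The `fisheriris` data and the variable names are illustrative.

```matlab
load fisheriris                  % provides meas and species
t = classregtree(meas, species);
[cost, secost, ntnodes, bestlevel] = test(t, 'crossvalidate', meas, species);

% Threshold: the minimum cost plus the standard error at that minimum.
[mincost, minloc] = min(cost);
cutoff = mincost + secost(minloc);

% Element k of cost corresponds to pruning level k-1, and higher levels
% mean smaller trees, so take the last element at or below the cutoff.
manualLevel = find(cost <= cutoff, 1, 'last') - 1;

fprintf('bestlevel = %d, manual = %d, terminal nodes = %d\n', ...
        bestlevel, manualLevel, ntnodes(bestlevel + 1));

% Prune the tree to the suggested level.
tpruned = prune(t, 'level', bestlevel);
```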

`[...] = test(...,param1,val1,param2,val2,...)` specifies
optional parameter name/value pairs for methods other than

- `'weights'`: Observation weights.
- `'nsamples'`: The number of cross-validation samples (default is `10`).
- `'treesize'`: Either `'se'` (default) to choose the smallest tree whose cost is within one standard error of the minimum cost, or `'min'` to choose the minimal cost tree.
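As an illustration of the name/value pairs, the sketch below requests 5-fold cross-validation and the minimum-cost tree. The specific values and the `fisheriris` data are assumptions chosen for the example.

```matlab
load fisheriris                  % provides meas and species
t = classregtree(meas, species);

% Use 5 cross-validation samples instead of the default 10, and pick the
% minimum-cost subtree rather than applying the one-standard-error rule.
[cost, secost, ntnodes, bestlevel] = test(t, 'crossvalidate', meas, species, ...
    'nsamples', 5, 'treesize', 'min');

fprintf('Minimum-cost level: %d (cost %.4f)\n', bestlevel, cost(bestlevel + 1));
```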


`classregtree` | `eval` | `prune` | `view`
