cost = treetest(t,'resubstitution')
cost = treetest(t,'test',X,y)
cost = treetest(t,'crossvalidate',X,y)
[cost,secost,ntnodes,bestlevel] = treetest(...)
[...] = treetest(...,param1,val1,...)
treetest will be removed in a future release. Use fitctree or
fitrtree to grow a tree. Then compute costs with the resulting
tree's resubLoss, loss, or kfoldLoss method.
cost = treetest(t,'resubstitution') computes
the cost of the tree
t using a resubstitution method.
t is a decision tree as created by the
treefit function.
The cost of the tree is the sum over all terminal nodes of the estimated
probability of that node times the node's cost. If
t is a classification tree, the cost of a node is the sum of the misclassification
costs of the observations in that node. If
t is a regression tree, the cost of a node is the average squared error
over the observations in that node.
cost is a vector
of cost values, one for each subtree in the optimal pruning sequence for
t.
The resubstitution cost is based on the same sample that was used
to create the original tree, so it underestimates the likely cost
of applying the tree to new data.
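As a sketch of the resubstitution form (assuming Fisher's iris data and the treefit function, as in the example later on this page):

```matlab
% Sketch: resubstitution cost for each pruning level.
load fisheriris
t = treefit(meas,species);
cost = treetest(t,'resubstitution');
% cost(1) is the cost of the full tree (pruning level 0); later
% entries correspond to progressively smaller pruned subtrees.
```

Because these costs come from the training data, they typically decrease toward the full tree even when a smaller tree would generalize better.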
cost = treetest(t,'test',X,y) uses
the predictor matrix
X and response vector
y as a test sample, applies the decision tree
t to that
sample, and returns a vector
cost of cost values
computed for the test sample. The test sample should
not be the same as the learning sample, which is the sample that was
used to fit the tree
t.
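A minimal sketch of the test-sample form, using a hypothetical odd/even split of the iris data into disjoint learning and test samples (any disjoint split would do):

```matlab
% Hypothetical split: odd rows form the learning sample,
% even rows form the test sample.
load fisheriris
learn = 1:2:150;
test  = 2:2:150;
t = treefit(meas(learn,:),species(learn));
cost = treetest(t,'test',meas(test,:),species(test));
% cost(k) estimates the cost of the subtree at pruning level k-1
% on data the tree has never seen.
```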
cost = treetest(t,'crossvalidate',X,y) uses
10-fold cross-validation to compute the cost vector.
X and y should
be the learning sample, which is the sample that was used to fit the
tree
t. The function partitions the sample into
10 subsamples, chosen randomly but with roughly equal size. For classification
trees, the subsamples also have roughly the same class proportions.
For each subsample,
treetest fits a tree to the
remaining data and uses it to predict the subsample. It pools the
information from all subsamples to compute the cost for the whole
sample.
[cost,secost,ntnodes,bestlevel] = treetest(...) also
returns the vector
secost containing the standard
error of each
cost value, the vector
ntnodes containing the
number of terminal nodes for each subtree, and the scalar
bestlevel containing
the estimated best level of pruning.
bestlevel = 0 means
no pruning, i.e., the full unpruned tree. The best level is the one
that produces the smallest tree that is within one standard error
of the minimum-cost subtree.
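The best-level rule can be reproduced by hand from the returned vectors. A sketch, again assuming the iris data (best_by_hand is an illustrative name, not part of the API; ties or a nonmonotone cost vector could make the result differ from bestlevel):

```matlab
load fisheriris
t = treefit(meas,species);
[c,s,n,best] = treetest(t,'crossvalidate',meas,species);
[mincost,minloc] = min(c);        % minimum-cost subtree
cutoff = mincost + s(minloc);     % one standard error above the minimum
% Later entries in the pruning sequence are smaller subtrees, so the
% smallest tree under the cutoff is the last entry with c <= cutoff;
% its pruning level is that index minus 1.
best_by_hand = find(c <= cutoff, 1, 'last') - 1;
```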
[...] = treetest(...,param1,val1,...) specifies
optional parameter name-value pairs chosen from the following table.

The number of cross-validation samples (default is 10).
Find the best tree for Fisher's iris data using cross-validation. The solid line shows the estimated cost for each tree size, the dashed line marks one standard error above the minimum, and the square marks the smallest tree under the dashed line.
```matlab
% Start with a large tree.
load fisheriris;
t = treefit(meas,species','splitmin',5);

% Find the minimum-cost tree.
[c,s,n,best] = treetest(t,'cross',meas,species);
tmin = treeprune(t,'level',best);

% Plot smallest tree within 1 std of minimum cost tree.
[mincost,minloc] = min(c);
plot(n,c,'b-o',...
     n(best+1),c(best+1),'bs',...
     n,(mincost+s(minloc))*ones(size(n)),'k--');
xlabel('Tree size (number of terminal nodes)')
ylabel('Cost')
```
Breiman, L., J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Boca Raton, FL: CRC Press, 1984.