cvloss

Class: RegressionTree

Regression error by cross validation

Syntax

E = cvloss(tree)
[E,SE] = cvloss(tree)
[E,SE,Nleaf] = cvloss(tree)
[E,SE,Nleaf,BestLevel] = cvloss(tree)
[E,...] = cvloss(tree,Name,Value)

Description

E = cvloss(tree) returns the cross-validated regression error (loss) for tree, a regression tree.

[E,SE] = cvloss(tree) returns the standard error of E.

[E,SE,Nleaf] = cvloss(tree) returns the number of leaves (terminal nodes) in tree.

[E,SE,Nleaf,BestLevel] = cvloss(tree) returns the optimal pruning level for tree.

[E,...] = cvloss(tree,Name,Value) cross validates with additional options specified by one or more Name,Value pair arguments. You can specify several name-value pair arguments in any order as Name1,Value1,…,NameN,ValueN.

Input Arguments

tree

A regression tree produced by fitrtree.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'Subtrees'

A vector of nonnegative integers in ascending order or 'all'.

If you specify a vector, then all elements must be at least 0 and at most max(tree.PruneList). 0 indicates the full, unpruned tree and max(tree.PruneList) indicates a completely pruned tree (i.e., just the root node).

If you specify 'all', then RegressionTree.cvloss operates on all subtrees (i.e., the entire pruning sequence). This specification is equivalent to using 0:max(tree.PruneList).

RegressionTree.cvloss prunes tree to each level indicated in Subtrees, and then estimates the corresponding output arguments. The size of Subtrees determines the size of some output arguments.

To use Subtrees, the PruneList and PruneAlpha properties of tree must be nonempty. In other words, grow tree with 'Prune' set to 'on', or prune tree using prune.

Default: 0
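
For example, evaluating the entire pruning sequence produces one loss value per pruning level. The following is a sketch using the carsmall example data set, assuming it is available as shipped with Statistics and Machine Learning Toolbox:

```matlab
% Sketch: cross-validated loss over the full pruning sequence.
load carsmall
X = [Displacement Horsepower Weight];
tree = fitrtree(X,MPG);   % 'Prune' is 'on' by default, so PruneList is populated

rng(1) % For reproducibility
[E,SE,Nleaf] = cvloss(tree,'Subtrees','all');

% E, SE, and Nleaf are vectors with one element per pruning level,
% i.e., numel(E) equals max(tree.PruneList) + 1.
plot(Nleaf,E)
xlabel('Number of leaves')
ylabel('Cross-validated MSE')
```

Plotting E against Nleaf in this way shows how the cross-validated error changes as the tree is pruned, which is useful when choosing a tree size by eye rather than through the TreeSize option.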

'TreeSize'

One of the following strings:

  • 'se': cvloss uses the smallest tree whose cost is within one standard error of the minimum cost.

  • 'min': cvloss uses the minimal cost tree.

Default: 'se'
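
The two settings can select different pruning levels. A sketch of the comparison, reseeding the random number generator so that both calls use the same cross-validation folds:

```matlab
% Sketch: compare the pruning level chosen under each TreeSize rule.
load carsmall
X = [Displacement Horsepower Weight];
tree = fitrtree(X,MPG);

rng(1) % Same folds for both calls
[~,~,~,levelSE] = cvloss(tree,'Subtrees','all','TreeSize','se');
rng(1)
[~,~,~,levelMin] = cvloss(tree,'Subtrees','all','TreeSize','min');

% 'se' tolerates up to one standard error above the minimum cost and so
% favors a smaller (more heavily pruned) tree; typically levelSE >= levelMin.
```

The 'se' rule trades a small amount of cross-validated accuracy for a simpler tree, which often generalizes better.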

'KFold'

Number of cross-validation folds, a positive integer.

Default: 10

Output Arguments

E

The cross-validation mean squared error (loss). A vector or scalar depending on the setting of the Subtrees name-value pair.

SE

The standard error of E. A vector or scalar depending on the setting of the Subtrees name-value pair.

Nleaf

Number of leaf nodes in tree. Leaf nodes are terminal nodes, which give responses, not splits. A vector or scalar depending on the setting of the Subtrees name-value pair.

BestLevel

By default, a scalar representing the largest pruning level that achieves a value of E within SE of the minimum error. If you set TreeSize to 'min', BestLevel is the smallest value in Subtrees.

Examples


Compute the Cross-Validation Error

Compute the cross-validation error for a default regression tree.

Load the carsmall data set. Consider Displacement, Horsepower, and Weight as predictors of the response MPG.

load carsmall
X = [Displacement Horsepower Weight];

Grow a regression tree using the entire data set.

Mdl = fitrtree(X,MPG);

Compute the cross-validation error.

rng(1); % For reproducibility
E = cvloss(Mdl)
E =

   25.7383

E is the 10-fold weighted average MSE (weighted by the number of test observations in each fold).

Find the Best Pruning Level Using Cross Validation

Apply k-fold cross validation to find the best level to prune a regression tree for all of its subtrees.

Load the carsmall data set. Consider Displacement, Horsepower, and Weight as predictors of the response MPG.

load carsmall
X = [Displacement Horsepower Weight];

Grow a regression tree using the entire data set. View the resulting tree.

Mdl = fitrtree(X,MPG);
view(Mdl,'Mode','graph')

Compute the 5-fold cross-validation error for each subtree, excluding the two lowest pruning levels (0 and 1) and the highest pruning level. Specify to return the best pruning level over the subtrees considered.

rng(1); % For reproducibility
m = max(Mdl.PruneList) - 1
[~,~,~,bestLevel] = cvloss(Mdl,'SubTrees',2:m,'KFold',5)
m =

    15


bestLevel =

    14

Of the pruning levels considered (2 through 15), the best pruning level is 14.

Prune the tree to the best level. View the resulting tree.

MdlPrune = prune(Mdl,'Level',bestLevel);
view(MdlPrune,'Mode','graph')

Alternatives

You can construct a cross-validated tree model with crossval, and call kfoldLoss instead of cvloss. If you are going to examine the cross-validated tree more than once, then the alternative can save time.

However, unlike cvloss, kfoldLoss does not return SE, Nleaf, or BestLevel.
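
A sketch of this alternative, reusing the carsmall predictors from the examples above:

```matlab
% Sketch: the crossval/kfoldLoss alternative to cvloss.
load carsmall
X = [Displacement Horsepower Weight];
tree = fitrtree(X,MPG);

cvtree = crossval(tree,'KFold',10); % RegressionPartitionedModel
E = kfoldLoss(cvtree);              % cross-validated MSE, comparable to cvloss(tree)

% cvtree stores the models trained on each fold, so repeated calls to
% kfoldLoss do not retrain the folds.
```

Because the fold models are retained in cvtree, this approach amortizes the training cost if you examine the cross-validated tree more than once.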
