
Cross Validate a Regression Tree

This example shows how to examine the resubstitution and cross-validation accuracy of a regression tree for predicting mileage based on the carsmall data.

Load the carsmall data set. Consider acceleration, displacement, horsepower, and weight as predictors of MPG.

load carsmall
X = [Acceleration Displacement Horsepower Weight];

Grow a regression tree using all of the observations.

rtree = fitrtree(X,MPG);

Compute the in-sample error.

resuberror = resubLoss(rtree)
resuberror =


The resubstitution loss for a regression tree is the mean squared error (MSE). The resulting value, about 4.7, indicates that a typical prediction error for the tree on the training data is about the square root of 4.7, or a bit over 2 MPG.
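As a minimal sketch of the square-root step described above, the following repeats the fit and converts the MSE to a root-mean-squared error in the units of MPG. It assumes the Statistics and Machine Learning Toolbox is installed; the variable names resubMSE and resubRMSE are illustrative and do not appear in the original example.

```matlab
% Assumes Statistics and Machine Learning Toolbox (carsmall, fitrtree, resubLoss).
load carsmall
X = [Acceleration Displacement Horsepower Weight];
rtree = fitrtree(X,MPG);

% resubLoss returns the mean squared error by default, so its
% square root is a root-mean-squared error in the units of MPG.
resubMSE  = resubLoss(rtree);   % illustrative variable name
resubRMSE = sqrt(resubMSE);     % illustrative variable name
fprintf('Resubstitution MSE: %.4f, RMSE: %.4f\n', resubMSE, resubRMSE);
```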

Estimate the cross-validation MSE. By default, crossval uses 10-fold cross-validation.

rng('default'); % For reproducibility of the random fold assignment
cvrtree = crossval(rtree);
cvloss = kfoldLoss(cvrtree)
cvloss =


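The cross-validated loss is almost 25, meaning a typical prediction error for the tree on new data is about 5 MPG. This illustrates that the cross-validated loss is usually higher than the resubstitution loss: the resubstitution loss is computed on the same observations used to grow the tree, so it gives an optimistic estimate of the error on new data.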
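To see how much the estimate varies from fold to fold, one possible follow-up is to request the per-fold losses and compare both error estimates on the RMSE scale. This is a sketch rather than part of the original example: it assumes the Statistics and Machine Learning Toolbox, and the variable names foldMSE, cvMSE, and resubMSE are illustrative.

```matlab
% Assumes Statistics and Machine Learning Toolbox.
load carsmall
X = [Acceleration Displacement Horsepower Weight];
rtree = fitrtree(X,MPG);

rng('default');             % fix the random fold assignment
cvrtree = crossval(rtree);  % 10-fold cross-validation by default

% 'Mode','individual' returns one MSE per fold instead of the average.
foldMSE  = kfoldLoss(cvrtree,'Mode','individual');
cvMSE    = kfoldLoss(cvrtree);   % loss averaged over the folds
resubMSE = resubLoss(rtree);     % in-sample loss for comparison

fprintf('Resubstitution RMSE:  %.2f\n', sqrt(resubMSE));
fprintf('Cross-validated RMSE: %.2f\n', sqrt(cvMSE));
fprintf('Per-fold MSE range:   %.2f to %.2f\n', min(foldMSE), max(foldMSE));
```

A wide spread in the per-fold values suggests that the cross-validated loss itself is a noisy estimate, which is common for a data set as small as carsmall.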