Cross Validate a Regression Tree
This example shows how to examine the resubstitution and cross-validation accuracy of a regression tree for predicting mileage based on the carsmall data.
Load the carsmall data set. Consider acceleration, displacement, horsepower, and weight as predictors of MPG.
load carsmall
X = [Acceleration Displacement Horsepower Weight];
Grow a regression tree using all of the observations.
rtree = fitrtree(X,MPG);
Compute the in-sample error.
resuberror = resubLoss(rtree)
resuberror = 4.7188
The resubstitution loss for a regression tree is the mean squared error (MSE). The resulting value indicates that a typical prediction error for the tree on the training data is about the square root of 4.7, or a bit over 2 MPG.
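The arithmetic above can be checked directly by taking the square root of the resubstitution loss, which puts the error in the same units as MPG. This is a minimal sketch: the variable name resubRMSE is introduced here only for illustration, and the tree is refit so that the snippet runs on its own.

```matlab
% Refit the tree so that this snippet is self-contained.
load carsmall
X = [Acceleration Displacement Horsepower Weight];
rtree = fitrtree(X,MPG);

% resubLoss returns the mean squared error on the training data.
resuberror = resubLoss(rtree);

% The square root of the MSE is a root-mean-squared error in MPG units.
% resubRMSE is a hypothetical name, not part of the original example.
resubRMSE = sqrt(resuberror)
```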
Estimate the cross-validation MSE.
rng 'default';
cvrtree = crossval(rtree);
cvloss = kfoldLoss(cvrtree)
cvloss = 23.8065
The cross-validated loss is almost 24, meaning a typical prediction error for the tree on new data is about 5 MPG. This illustrates that cross-validated loss is usually higher than resubstitution loss, because resubstitution evaluates the tree on the same observations used to grow it.
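To see how much the estimate varies from fold to fold, the loss can also be requested per fold. The sketch below assumes 10 folds, stated explicitly through the 'KFold' option, and uses the 'Mode','individual' option of kfoldLoss. The names foldLoss and cvRMSE are introduced here for illustration, and the tree is refit so that the snippet is self-contained.

```matlab
% Refit the tree so that this snippet is self-contained.
load carsmall
X = [Acceleration Displacement Horsepower Weight];
rtree = fitrtree(X,MPG);

% Fix the random seed so the fold assignment is reproducible.
rng('default')

% Cross-validate with the number of folds stated explicitly (assumed: 10).
cvrtree = crossval(rtree,'KFold',10);

% One mean squared error per fold (hypothetical variable name).
foldLoss = kfoldLoss(cvrtree,'Mode','individual')

% Overall cross-validated error, expressed in MPG units.
cvRMSE = sqrt(kfoldLoss(cvrtree))
```

A wide spread among the per-fold values suggests that the single averaged loss should be read as a rough estimate rather than a precise one.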