predictorImportance

Class: CompactRegressionTree

Estimates of predictor importance

Syntax

imp = predictorImportance(tree)

Description

imp = predictorImportance(tree) computes estimates of predictor importance for tree by summing changes in the mean squared error due to splits on every predictor and dividing the sum by the number of branch nodes.

Input Arguments

tree

A regression tree created by fitrtree, or by the compact method.

Output Arguments

imp

A row vector with the same number of elements as the number of predictors (columns) in tree.X. The entries are the estimates of predictor importance, with 0 representing the smallest possible importance.

Definitions

Predictor Importance

predictorImportance computes estimates of predictor importance for tree by summing changes in the mean squared error (MSE) due to splits on every predictor and dividing the sum by the number of branch nodes. If the tree is grown without surrogate splits, the sum is taken over the best splits found at each branch node. If the tree is grown with surrogate splits, the sum is taken over all splits at each branch node, including surrogate splits. imp has one element for each input predictor in the data used to train the tree. At each node, MSE is estimated as the node error weighted by the node probability. The importance associated with a split is the difference between the MSE for the parent node and the total MSE for its two children.
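The definition can be written compactly for the case without surrogate splits. The notation below is introduced here for illustration and is not taken from the product documentation: P(t) is the node probability, MSE(t) is the node error, t_L and t_R are the children of branch node t, B_j is the set of branch nodes whose best split uses predictor j, and N_branch is the total number of branch nodes.

```latex
\begin{aligned}
R(t) &= P(t)\,\mathrm{MSE}(t) \\
\Delta R(t) &= R(t) - R(t_L) - R(t_R) \\
\mathrm{imp}_j &= \frac{1}{N_{\mathrm{branch}}} \sum_{t \in B_j} \Delta R(t)
\end{aligned}
```

As a made-up numeric illustration, take a tree with a single branch node: the root has P = 1 and MSE = 10, and its children have P = 0.4, MSE = 3 and P = 0.6, MSE = 5. Then ΔR = 10 − (0.4)(3) − (0.6)(5) = 5.8, and with one branch node the split predictor receives importance 5.8 while every other predictor receives 0.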

Estimates of predictor importance do not depend on the order of predictors if you use surrogate splits, but do depend on the order if you do not use surrogate splits.

If you use surrogate splits, predictorImportance computes estimates before the tree is reduced by pruning or merging leaves. If you do not use surrogate splits, predictorImportance computes estimates after the tree is reduced by pruning or merging leaves. Therefore, reducing the tree by pruning affects the predictor importance for a tree grown without surrogate splits, and does not affect the predictor importance for a tree grown with surrogate splits.
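One way to see the effect of pruning is to compare the importance estimates for a tree grown without surrogate splits before and after pruning. This is a minimal sketch, not a definitive recipe: the pruning level of 2 is an arbitrary choice made for illustration, and it is capped at the largest level in tree.PruneList so that the call remains valid.

```matlab
load carsmall
X = [Acceleration Cylinders Displacement ...
    Horsepower Model_Year Weight];

% Tree grown without surrogate splits (the default)
tree = fitrtree(X,MPG);

% Arbitrary illustrative pruning level, capped at the deepest available level
lvl = min(2,max(tree.PruneList));
prunedTree = prune(tree,'Level',lvl);

% Per the description above, these estimates are expected to differ
impFull = predictorImportance(tree)
impPruned = predictorImportance(prunedTree)
```

Both outputs are row vectors with one element per column of X. According to the description above, repeating the comparison with 'Surrogate','on' should leave the estimates unchanged by pruning.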

Examples

Find predictor importance for the carsmall data. Use just the numeric predictors:

load carsmall
X = [Acceleration Cylinders Displacement ...
    Horsepower Model_Year Weight];
tree = fitrtree(X,MPG);
imp = predictorImportance(tree)

imp =
    0.0315         0    0.1082    0.0686    0.1629    1.2924

Weight (the last predictor) has the most impact on mileage (MPG). The second predictor has importance 0, which means that the number of cylinders has no impact on predictions made with tree.
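To read the result as a ranking, the importance vector can be sorted in descending order. The sketch below reuses the values printed above; the names cell array is a hypothetical set of labels introduced here to match the column order of X.

```matlab
% Importance values copied from the output above
imp = [0.0315 0 0.1082 0.0686 0.1629 1.2924];

% Hypothetical labels, in the same order as the columns of X
names = {'Acceleration','Cylinders','Displacement', ...
    'Horsepower','Model_Year','Weight'};

% Sort from most to least important and print one predictor per line
[sortedImp,idx] = sort(imp,'descend');
for k = 1:numel(idx)
    fprintf('%-12s %.4f\n',names{idx(k)},sortedImp(k));
end
```

The resulting order is Weight, Model_Year, Displacement, Horsepower, Acceleration, Cylinders.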

Estimate the predictor importance for the same carsmall predictors using a tree grown with surrogate splits:

load carsmall
X = [Acceleration Cylinders Displacement ...
    Horsepower Model_Year Weight];
tree2 = fitrtree(X,MPG,...
    'Surrogate','on');
imp2 = predictorImportance(tree2)

imp2 =
    0.5287    1.1977    1.2400    0.7059    1.0677    1.4106

While weight (last predictor) still has the most impact on mileage (MPG), this estimate has the second predictor (number of cylinders) as the third most important predictor.
