# predictorImportance

Estimates of predictor importance for regression tree

## Description

`imp = predictorImportance(tree)` computes estimates of predictor importance for `tree` by summing changes in the node risk due to splits on every predictor and dividing the sum by the number of branch nodes.

## Examples

### Estimate Predictor Importance

Estimate the predictor importance for all predictor variables in the data.

Load the `carsmall` data set.

```
load carsmall
```

Grow a regression tree for `MPG` using `Acceleration`, `Cylinders`, `Displacement`, `Horsepower`, `Model_Year`, and `Weight` as predictors.

```
X = [Acceleration Cylinders Displacement Horsepower Model_Year Weight];
tree = fitrtree(X,MPG);
```

Estimate the predictor importance for all predictor variables.

```
imp = predictorImportance(tree)
```

```
imp = 1×6

    0.0647    0.1068    0.1155    0.1411    0.3348    2.6565
```

`Weight`, the last predictor, has the most impact on mileage. The predictor with the least impact on the predictions is the first variable, `Acceleration`.

### Predictor Importance and Surrogate Splits

Estimate the predictor importance for all predictor variables in the data when the regression tree contains surrogate splits.

Load the `carsmall` data set.

```
load carsmall
```

Grow a regression tree for `MPG` using `Acceleration`, `Cylinders`, `Displacement`, `Horsepower`, `Model_Year`, and `Weight` as predictors. Specify the identification of surrogate splits.

```
X = [Acceleration Cylinders Displacement Horsepower Model_Year Weight];
tree = fitrtree(X,MPG,Surrogate="on");
```

Estimate the predictor importance for all predictor variables.

```
imp = predictorImportance(tree)
```

```
imp = 1×6

    1.0449    2.4560    2.5570    2.5788    2.0832    2.8938
```

Comparing `imp` to the results in Estimate Predictor Importance, `Weight` still has the most impact on mileage, but `Cylinders` is the fourth most important predictor.

### Unbiased Predictor Importance Estimates

Load the `carsmall` data set. Consider a model that predicts the mean fuel economy of a car given its acceleration, number of cylinders, engine displacement, horsepower, manufacturer, model year, and weight. Consider `Cylinders`, `Mfg`, and `Model_Year` as categorical variables.

```
load carsmall
Cylinders = categorical(Cylinders);
Mfg = categorical(cellstr(Mfg));
Model_Year = categorical(Model_Year);
X = table(Acceleration,Cylinders,Displacement,Horsepower,Mfg, ...
    Model_Year,Weight,MPG);
```

Display the number of categories represented in the categorical variables.

```
numCylinders = numel(categories(Cylinders))
```

```
numCylinders = 3
```

```
numMfg = numel(categories(Mfg))
```

```
numMfg = 28
```

```
numModelYear = numel(categories(Model_Year))
```

```
numModelYear = 3
```

Because there are only 3 categories in `Cylinders` and `Model_Year`, the standard CART predictor-splitting algorithm prefers splitting a continuous predictor over these two variables.
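This preference arises because a continuous predictor offers far more candidate cut points than a predictor with only a few categories, so it gets more chances to yield the best split. The Python sketch below (a hypothetical illustration with made-up values, not MathWorks code) counts the candidate splits available to each kind of predictor:

```python
# Sketch: why standard CART tends to favor continuous predictors.
# An exhaustive split search considers one candidate cut point between each
# pair of adjacent distinct values, so a predictor with more distinct values
# is examined at many more candidate splits.

def num_candidate_splits(values):
    """Number of candidate cut points for a predictor's observed values."""
    return len(set(values)) - 1

# Hypothetical observations: 3-category Cylinders vs. near-continuous Weight.
cylinders = [4, 4, 6, 8, 6, 4, 8]
weight = [2130, 2515, 3190, 4340, 3440, 2220, 4080]

print(num_candidate_splits(cylinders))  # 2
print(num_candidate_splits(weight))     # 6
```

With only 2 candidate splits versus 6, a spurious but flexible continuous predictor can easily win the split search, which is the bias the curvature test avoids.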

Train a regression tree using the entire data set. To grow unbiased trees, specify usage of the curvature test for splitting predictors. Because there are missing values in the data, specify usage of surrogate splits.

```
Mdl = fitrtree(X,"MPG",PredictorSelection="curvature",Surrogate="on");
```

Estimate predictor importance values by summing changes in the risk due to splits on every predictor and dividing the sum by the number of branch nodes. Compare the estimates using a bar graph.

```
imp = predictorImportance(Mdl);
figure
bar(imp)
title("Predictor Importance Estimates")
ylabel("Estimates")
xlabel("Predictors")
h = gca;
h.XTickLabel = Mdl.PredictorNames;
h.XTickLabelRotation = 45;
h.TickLabelInterpreter = "none";
```

In this case, `Displacement` is the most important predictor, followed by `Horsepower`.

## Input Arguments

`tree` — Regression tree
`RegressionTree` object | `CompactRegressionTree` object

Regression tree, specified as a `RegressionTree` object created by the `fitrtree` function or a `CompactRegressionTree` object created by the `compact` function.

## Output Arguments

`imp` — Predictor importance
numeric row vector

Predictor importance, returned as a numeric row vector with the same number of elements as the number of predictors (columns) in `tree.X`. The entries are the estimates of predictor importance, with `0` representing the smallest possible importance.

## More About

### Predictor Importance

`predictorImportance` computes importance measures of the predictors in a tree by summing changes in the node risk due to splits on every predictor, and then dividing the sum by the total number of branch nodes. The change in the node risk is the difference between the risk for the parent node and the total risk for the two children. For example, if a tree splits a parent node (for example, node 1) into two child nodes (for example, nodes 2 and 3), then `predictorImportance` increases the importance of the split predictor by

(*R*_1 – *R*_2 – *R*_3)/*N*_branch,

where *R*_*i* is the node risk of node *i*, and *N*_branch is the total number of branch nodes. A *node risk* is defined as a node error weighted by the node probability:

*R*_*i* = *P*_*i* *E*_*i*,

where *P*_*i* is the node probability of node *i*, and *E*_*i* is the mean squared error of node *i*.
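The computation above can be sketched in a few lines of Python (a hypothetical illustration, not MathWorks code), describing a toy tree only by its branch nodes:

```python
# Sketch of the predictor importance formula: each branch node contributes
# R_parent - (R_left + R_right) to the importance of the predictor it splits
# on, and the per-predictor totals are divided by the number of branch nodes.
# Node risk R_i = P_i * E_i (node probability times node mean squared error).

def predictor_importance(branch_nodes, num_predictors):
    """branch_nodes: list of (predictor_index, parent_risk,
    left_child_risk, right_child_risk) tuples, one per branch node."""
    imp = [0.0] * num_predictors
    for j, r_parent, r_left, r_right in branch_nodes:
        imp[j] += r_parent - (r_left + r_right)
    n_branch = len(branch_nodes)
    return [total / n_branch for total in imp]

# Toy tree with two branch nodes: the root splits on predictor 0,
# its left child splits on predictor 1 (risks are made-up values).
branches = [
    (0, 1.0, 0.25, 0.25),   # risk reduction 0.5 credited to predictor 0
    (1, 0.25, 0.125, 0.0),  # risk reduction 0.125 credited to predictor 1
]
print(predictor_importance(branches, 2))  # [0.25, 0.0625]
```

A predictor that is never split on keeps an importance of `0`, the smallest possible value.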

The estimates of predictor importance depend on whether you use surrogate splits for training.

If you use surrogate splits, `predictorImportance` sums the changes in the node risk over all splits at each branch node, including surrogate splits. If you do not use surrogate splits, then the function takes the sum over the best splits found at each branch node. Estimates of predictor importance do not depend on the order of predictors if you use surrogate splits, but do depend on the order if you do not use surrogate splits.

If you use surrogate splits, `predictorImportance` computes estimates before the tree is reduced by pruning (or merging leaves). If you do not use surrogate splits, `predictorImportance` computes estimates after the tree is reduced by pruning. Therefore, pruning affects the predictor importance for a tree grown without surrogate splits, and does not affect the predictor importance for a tree grown with surrogate splits.
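The surrogate-split accounting can be sketched as follows (a hypothetical Python illustration, not MathWorks code): every predictor that supplies a surrogate at a branch node also accumulates a risk reduction there, so correlated predictors share credit instead of one predictor absorbing it all.

```python
# Sketch: importance accumulation when surrogate splits are used.
# Each branch node is described by a dict mapping predictor index -> the risk
# reduction achieved by that predictor's split at the node (the best split
# plus any surrogate splits). Without surrogates, each dict would contain
# only the single best-split predictor.

def importance_with_surrogates(branch_nodes, num_predictors):
    imp = [0.0] * num_predictors
    for reductions in branch_nodes:
        for j, delta in reductions.items():
            imp[j] += delta
    n_branch = len(branch_nodes)
    return [total / n_branch for total in imp]

# One branch node: predictor 0 holds the best split (risk reduction 0.5) and
# predictor 1 provides a surrogate split there (risk reduction 0.25).
print(importance_with_surrogates([{0: 0.5, 1: 0.25}], 2))  # [0.5, 0.25]
```

This is why, in the Predictor Importance and Surrogate Splits example above, all six estimates rise and become more similar once `Surrogate="on"` is specified.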

## Extended Capabilities

### GPU Arrays

Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

## Version History

**Introduced in R2011a**
