What is a 'categorical predictor' in a decision tree for regression?

Hi, I'd like to use Matlab's own example for the question. Please refer to https://uk.mathworks.com/help/stats/fitrtree.html for the original example.
>> load carsmall
>> whos
  Name              Size      Bytes  Class     Attributes

  Acceleration      100x1       800  double
  Cylinders         100x1       800  double
  Displacement      100x1       800  double
  Horsepower        100x1       800  double
  MPG               100x1       800  double
  Mfg               100x13     2600  char
  Model             100x33     6600  char
  Model_Year        100x1       800  double
  Origin            100x7      1400  char
  Weight            100x1       800  double
>> tree = fitrtree([Weight, Cylinders],MPG,...
'categoricalpredictors',2,'MinParentSize',20,...
'PredictorNames',{'W','C'})
tree =
  RegressionTree
           PredictorNames: {'W'  'C'}
             ResponseName: 'Y'
    CategoricalPredictors: 2
        ResponseTransform: 'none'
          NumObservations: 94
  Properties, Methods
What exactly is CategoricalPredictors in this case, and why is it 2?

 Accepted Answer

Adam Danz on 10 Jan 2019
Edited: Adam Danz on 10 Jan 2019
MATLAB's fitrtree() function returns a RegressionTree object. You can read more about this object and its properties in the RegressionTree documentation.
As that documentation explains, the CategoricalPredictors property contains index values corresponding to the columns of the predictor data that are categorical (if none of the predictors are categorical, it is empty, []).
So, why is CategoricalPredictors equal to 2?
Now read about the function you're using, fitrtree().
One of its name-value pair arguments is 'CategoricalPredictors', which your call to fitrtree() sets to 2. The 2 is a column index, not a count: it tells fitrtree() to treat the second column of the predictor matrix [Weight, Cylinders], namely Cylinders, as categorical. Weight is still treated as a continuous predictor.
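To see how the index lines up with the predictor columns, here is a minimal sketch that repeats the call from the question and then looks up the name of the flagged column. The variable names X and idx are only for illustration.

```matlab
load carsmall                          % sample data set used in the question
X = [Weight, Cylinders];               % column 1 = Weight, column 2 = Cylinders

tree = fitrtree(X, MPG, ...
    'CategoricalPredictors', 2, ...    % a column index into X, not a count
    'MinParentSize', 20, ...
    'PredictorNames', {'W','C'});

idx = tree.CategoricalPredictors       % indices of the columns treated as categorical
tree.PredictorNames(idx)               % the corresponding predictor names
```

Mapping the stored index back through PredictorNames is a quick way to confirm which variable the tree is treating as categorical.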

3 Comments

This is also covered in the documentation I linked to. In general, it's a good idea to take a few minutes to read through the documentation when you're using a new function; it is usually short and to the point.
'PredictorNames' is just a cell array of character vectors that name your predictors. If you add a 3rd predictor, you need to add a 3rd predictor name.
tree = fitrtree([Weight, Cylinders, Acceleration],MPG,...
'categoricalpredictors',2,'MinParentSize',20,...
'PredictorNames',{'W','C','A'}) % <- added 'A' for acceleration; column 2 (Cylinders) stays categorical
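If keeping track of column positions becomes error-prone as predictors are added, 'CategoricalPredictors' can also be given as a cell array of names that match 'PredictorNames'. The following is only a sketch under that assumption; tree3 is a name chosen here for illustration.

```matlab
load carsmall

tree3 = fitrtree([Weight, Cylinders, Acceleration], MPG, ...
    'PredictorNames', {'W','C','A'}, ...
    'CategoricalPredictors', {'C'}, ...   % refer to Cylinders by name rather than by column index
    'MinParentSize', 20);

tree3.CategoricalPredictors               % the fitted object reports categorical predictors as column indices
```

Naming the predictor means the call does not need to change if the column order of the predictor matrix changes.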
Actually, after reading it a few times: 'the "CategoricalPredictors" contains index values corresponding to the columns of the categorical predictor data'. An 'index value' (or 'entry') of 1 refers to the first column of the predictor data, which in this case is 'Weight'; an index value of 2 refers to the second column, which in this case is 'Cylinders'.
So
tree = fitrtree([Weight, Cylinders],MPG,...
'categoricalpredictors',2,'MinParentSize',20,...
'PredictorNames',{'W','C'})
indicates that 'Cylinders' (the 2nd column in the predictor data) is a categorical predictor. In fact, the cars have only 4, 6, or 8 cylinders, so the number is not continuous.
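A quick way to check this is to list the distinct values of Cylinders and then print the fitted tree as text, where splits on a categorical predictor appear as set membership rather than as a numeric threshold. This is a sketch that reuses the call from the question; cylValues is a name chosen here for illustration.

```matlab
load carsmall

cylValues = unique(Cylinders)'         % distinct cylinder counts present in the data
numel(unique(Weight))                  % Weight, by contrast, takes many distinct values

tree = fitrtree([Weight, Cylinders], MPG, ...
    'CategoricalPredictors', 2, 'MinParentSize', 20, ...
    'PredictorNames', {'W','C'});

view(tree, 'Mode', 'text')             % print the split rules at each node
```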
How could I combine the tree with a boxplot?


More Answers (0)

Asked: 10 Jan 2019

Commented: 23 Nov 2020
