Fit binary regression decision tree
tree = fitrtree(Tbl,ResponseVarName) returns a regression tree based on the input variables (also known as predictors, features, or attributes) in the table Tbl and the output (response) contained in Tbl.ResponseVarName. tree is a binary tree where each branching node is split based on the values of a column of Tbl.
tree = fitrtree(___,Name,Value) fits a tree with additional options specified by one or more Name,Value pair arguments. For example, you can specify observation weights or train a cross-validated model.
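As a minimal sketch of the table-based syntax described above, the following assumes the carsmall sample data and picks Weight and Horsepower as predictors purely for illustration. The variable names Tbl, treeCV, and cvLoss are choices made for this sketch, not part of the function's interface.

```matlab
load carsmall                                  % sample data set used throughout this page
Tbl = table(Weight,Horsepower,MPG);            % two predictors plus the response column
tree = fitrtree(Tbl,'MPG');                    % 'MPG' names the response variable in Tbl
treeCV = fitrtree(Tbl,'MPG','CrossVal','on');  % same data, cross-validated with 10 folds
cvLoss = kfoldLoss(treeCV)                     % cross-validated mean squared error
```

The first call returns a model that can be passed to predict. The second returns a partitioned model intended for estimating generalization error.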
Load the sample data.
load carsmall;
Construct a regression tree using the sample data.
tree = fitrtree([Weight, Cylinders],MPG,...
    'categoricalpredictors',2,'MinParentSize',20,...
    'PredictorNames',{'W','C'})
tree = 
  RegressionTree
           PredictorNames: {'W'  'C'}
             ResponseName: 'Y'
    CategoricalPredictors: 2
        ResponseTransform: 'none'
          NumObservations: 94
Predict the mileage of 4,000-pound cars with 4, 6, and 8 cylinders.
mileage4K = predict(tree,[4000 4; 4000 6; 4000 8])
mileage4K =
   19.2778
   19.2778
   14.3889
You can control the depth of trees using the MaxNumSplits, MinLeafSize, or MinParentSize name-value pair parameters. fitrtree grows deep decision trees by default. You can grow shallower trees to reduce model complexity or computation time.
Load the carsmall data set. Consider Displacement, Horsepower, and Weight as predictors of the response MPG.
load carsmall
X = [Displacement Horsepower Weight];
The default values of the tree-depth controllers for growing regression trees are:
n - 1 for MaxNumSplits. n is the training sample size.
1 for MinLeafSize.
10 for MinParentSize.
These default values tend to grow deep trees for large training sample sizes.
Train a regression tree using the default values for tree-depth control. Cross-validate the model using 10-fold cross-validation.
rng(1); % For reproducibility
MdlDefault = fitrtree(X,MPG,'CrossVal','on');
Draw a histogram of the number of imposed splits on the trees. The number of imposed splits is one less than the number of leaves. Also, view one of the trees.
numBranches = @(x)sum(x.IsBranch);
mdlDefaultNumSplits = cellfun(numBranches, MdlDefault.Trained);
figure;
histogram(mdlDefaultNumSplits)
view(MdlDefault.Trained{1},'Mode','graph')
The average number of splits is between 14 and 15.
Suppose that you want a regression tree that is not as complex (deep) as the ones trained using the default number of splits. Train another regression tree, but set the maximum number of splits at 7, which is about half the mean number of splits from the default regression tree. Cross-validate the model using 10-fold cross-validation.
Mdl7 = fitrtree(X,MPG,'MaxNumSplits',7,'CrossVal','on');
view(Mdl7.Trained{1},'Mode','graph')
Compare the cross-validation MSEs of the models.
mseDefault = kfoldLoss(MdlDefault)
mse7 = kfoldLoss(Mdl7)
mseDefault = 27.7277
mse7 = 28.3833
Mdl7 is much less complex and performs only slightly worse than MdlDefault.
This example shows how to optimize hyperparameters automatically using fitrtree. The example uses the carsmall data.
Load the carsmall data.
load carsmall
Use Weight and Horsepower as predictors for MPG. Find hyperparameters that minimize five-fold cross-validation loss by using automatic hyperparameter optimization.
For reproducibility, set the random seed and use the 'expected-improvement-plus' acquisition function.
X = [Weight,Horsepower];
Y = MPG;
rng default
Mdl = fitrtree(X,Y,'OptimizeHyperparameters','auto',...
    'HyperparameterOptimizationOptions',struct('AcquisitionFunctionName',...
    'expected-improvement-plus'))
|================================================================================|
| Iter | Eval   | Objective | Objective | BestSoFar  | BestSoFar | MinLeafSize |
|      | result |           | runtime   | (observed) | (estim.)  |             |
|================================================================================|
|    1 | Best   |    3.5834 |    15.581 |     3.5834 |    3.5834 |          34 |
|    2 | Best   |     3.086 |    1.5383 |      3.086 |    3.1163 |           4 |
|    3 | Accept |    3.2898 |   0.48642 |      3.086 |     3.086 |           1 |
|    4 | Accept |    3.1239 |   0.51081 |      3.086 |    3.1141 |          13 |
|    5 | Accept |     3.086 |    1.3447 |      3.086 |     3.086 |           4 |
|    6 | Accept |    3.4534 |   0.43333 |      3.086 |     3.086 |          27 |
|    7 | Best   |    3.0522 |   0.35231 |     3.0522 |    3.0521 |           7 |
|    8 | Accept |    3.0885 |   0.52855 |     3.0522 |    3.0524 |           8 |
|    9 | Accept |    3.0771 |    0.2674 |     3.0522 |    3.0672 |           5 |
|   10 | Accept |    3.0558 |   0.50171 |     3.0522 |    3.0631 |           6 |
|   11 | Accept |    3.0558 |   0.23902 |     3.0522 |    3.0606 |           6 |
|   12 | Accept |    3.0558 |   0.41868 |     3.0522 |    3.0527 |           6 |
|   13 | Accept |    3.1598 |   0.38257 |     3.0522 |    3.0606 |           2 |
|   14 | Accept |    3.0818 |   0.38265 |     3.0522 |    3.0528 |           3 |
|   15 | Accept |    3.0522 |   0.22666 |     3.0522 |    3.0525 |           7 |
|   16 | Accept |    3.0522 |    0.3301 |     3.0522 |    3.0524 |           7 |
|   17 | Accept |    3.0522 |   0.33122 |     3.0522 |    3.0523 |           7 |
|   18 | Accept |    4.1753 |   0.53547 |     3.0522 |    3.0524 |          50 |
|   19 | Accept |    3.1956 |     0.294 |     3.0522 |    3.0524 |          18 |
|   20 | Best   |    3.0518 |   0.42776 |     3.0518 |    3.0524 |          10 |
|   21 | Accept |    3.0873 |    0.4277 |     3.0518 |    3.0523 |          11 |
|   22 | Accept |    3.0518 |   0.22773 |     3.0518 |    3.0523 |          10 |
|   23 | Accept |    3.0518 |   0.31952 |     3.0518 |    3.0523 |          10 |
|   24 | Accept |    3.0518 |   0.37479 |     3.0518 |     3.052 |          10 |
|   25 | Accept |    3.3432 |   0.28657 |     3.0518 |     3.052 |          22 |
|   26 | Accept |    3.1959 |   0.26452 |     3.0518 |    3.0519 |          15 |
|   27 | Best   |    3.0457 |   0.25744 |     3.0457 |    3.0462 |           9 |
|   28 | Accept |    3.0457 |   0.54074 |     3.0457 |    3.0459 |           9 |
|   29 | Accept |    3.0457 |    0.4317 |     3.0457 |    3.0458 |           9 |
|   30 | Accept |    3.0457 |   0.44484 |     3.0457 |    3.0458 |           9 |
__________________________________________________________
Optimization completed.
MaxObjectiveEvaluations of 30 reached.
Total function evaluations: 30
Total elapsed time: 350.2363 seconds.
Total objective function evaluation time: 28.6887

Best observed feasible point:
    MinLeafSize
    ___________
    9

Observed objective function value = 3.0457
Estimated objective function value = 3.0458
Function evaluation time = 0.25744

Best estimated feasible point (according to models):
    MinLeafSize
    ___________
    9

Estimated objective function value = 3.0458
Estimated function evaluation time = 0.45359

Mdl = 
  RegressionTree
                         ResponseName: 'Y'
                CategoricalPredictors: []
                    ResponseTransform: 'none'
                      NumObservations: 94
    HyperparameterOptimizationResults: [1×1 BayesianOptimization]
Load the carsmall data set. Consider a model that predicts the mean fuel economy of a car given its acceleration, number of cylinders, engine displacement, horsepower, manufacturer, model year, and weight. Consider Cylinders, Mfg, and Model_Year as categorical variables.
load carsmall
Cylinders = categorical(Cylinders);
Mfg = categorical(cellstr(Mfg));
Model_Year = categorical(Model_Year);
X = table(Acceleration,Cylinders,Displacement,Horsepower,Mfg,...
    Model_Year,Weight,MPG);
Display the number of categories represented in the categorical variables.
numCylinders = numel(categories(Cylinders))
numMfg = numel(categories(Mfg))
numModelYear = numel(categories(Model_Year))
numCylinders = 3
numMfg = 28
numModelYear = 3
Because there are only 3 categories in Cylinders and Model_Year, the standard CART predictor-splitting algorithm prefers splitting a continuous predictor over these two variables.
Train a regression tree using the entire data set. To grow unbiased trees, specify usage of the curvature test for splitting predictors. Because there are missing values in the data, specify usage of surrogate splits.
Mdl = fitrtree(X,'MPG','PredictorSelection','curvature','Surrogate','on');
Estimate predictor importance values by summing changes in the risk due to splits on every predictor and dividing the sum by the number of branch nodes. Compare the estimates using a bar graph.
imp = predictorImportance(Mdl);
figure;
bar(imp);
title('Predictor Importance Estimates');
ylabel('Estimates');
xlabel('Predictors');
h = gca;
h.XTickLabel = Mdl.PredictorNames;
h.XTickLabelRotation = 45;
h.TickLabelInterpreter = 'none';
In this case, Displacement is the most important predictor, followed by Horsepower.
Tbl - Sample data
Sample data used to train the model, specified as a table. Each row of Tbl corresponds to one observation, and each column corresponds to one predictor variable. Optionally, Tbl can contain one additional column for the response variable. Multicolumn variables and cell arrays other than cell arrays of character vectors are not allowed.
If Tbl contains the response variable, and you want to use all remaining variables in Tbl as predictors, then specify the response variable using ResponseVarName.
If Tbl contains the response variable, and you want to use only a subset of the remaining variables in Tbl as predictors, then specify a formula using formula.
If Tbl does not contain the response variable, then specify a response variable using Y. The length of the response variable and the number of rows of Tbl must be equal.
Data Types: table
ResponseVarName - Response variable name
Response variable name, specified as the name of a variable in Tbl. The response variable must be a numeric vector.
You must specify ResponseVarName as a character vector. For example, if Tbl stores the response variable Y as Tbl.Y, then specify it as 'Y'. Otherwise, the software treats all columns of Tbl, including Y, as predictors when training the model.
formula - Explanatory model of response and subset of predictor variables
Explanatory model of the response and a subset of the predictor variables, specified as a character vector in the form of 'Y~X1+X2+X3'. In this form, Y represents the response variable, and X1, X2, and X3 represent the predictor variables. The variables must be variable names in Tbl (Tbl.Properties.VariableNames).
To specify a subset of variables in Tbl as predictors for training the model, use a formula. If you specify a formula, then the software does not use any variables in Tbl that do not appear in formula.
Data Types: char
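The formula syntax can be sketched as follows, assuming the carsmall data. The table deliberately holds more variables than the formula names, so only Weight and Horsepower should be used as predictors. The choice of variables is illustrative.

```matlab
load carsmall
Tbl = table(Acceleration,Displacement,Horsepower,Weight,MPG);
% Use only Weight and Horsepower as predictors; the other columns are ignored.
tree = fitrtree(Tbl,'MPG~Weight+Horsepower');
usedPredictors = tree.PredictorNames   % names of the predictors the tree was trained on
```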
Y - Response data
Response data, specified as a numeric column vector with the same number of rows as X. Each entry in Y is the response to the data in the corresponding row of X.
The software considers NaN values in Y to be missing values. fitrtree does not use observations with missing values for Y in the fit.
Data Types: single | double
X - Predictor data
Predictor data, specified as a numeric matrix. Each column of X represents one variable, and each row represents one observation.
fitrtree considers NaN values in X as missing values. fitrtree does not use observations with all missing values for X in the fit. fitrtree uses observations with some missing values for X to find splits on variables for which these observations have valid values.
Data Types: single | double
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'CrossVal','on','MinParentSize',30 specifies a cross-validated regression tree with a minimum of 30 observations per branch node.
Note: You cannot use any cross-validation name-value pair along with
'CategoricalPredictors' - Categorical predictors list
Categorical predictors list, specified as the comma-separated pair consisting of 'CategoricalPredictors' and one of the following:
A numeric vector with indices from 1 through p, where p is the number of columns of X.
A logical vector of length p, where a true entry means that the corresponding column of X is a categorical variable.
A cell array of character vectors, where each element in the array is the name of a predictor variable. The names must match entries in PredictorNames values.
A character matrix, where each row of the matrix is a name of a predictor variable. The names must match entries in PredictorNames values. Pad the names with extra blanks so each row of the character matrix has the same length.
'all', meaning all predictors are categorical.
By default, if the predictor data is in a matrix (X), the software assumes that none of the predictors are categorical. If the predictor data is in a table (Tbl), the software assumes that a variable is categorical if it contains logical values, values of the unordered data type categorical, or a cell array of character vectors.
Example: 'CategoricalPredictors','all'
Data Types: single | double | char | logical | cell
'MergeLeaves' - Leaf merge flag
'on' (default) | 'off'
Leaf merge flag, specified as the comma-separated pair consisting of 'MergeLeaves' and 'on' or 'off'.
If MergeLeaves is 'on', then fitrtree:
Merges leaves that originate from the same parent node and that yield a sum of risk values greater than or equal to the risk associated with the parent node
Estimates the optimal sequence of pruned subtrees, but does not prune the regression tree
Otherwise, fitrtree does not merge leaves.
Example: 'MergeLeaves','off'
'MinParentSize' - Minimum number of branch node observations
10 (default) | positive integer value
Minimum number of branch node observations, specified as the comma-separated pair consisting of 'MinParentSize' and a positive integer value. Each branch node in the tree has at least MinParentSize observations. If you supply both MinParentSize and MinLeafSize, fitrtree uses the setting that gives larger leaves: MinParentSize = max(MinParentSize,2*MinLeafSize).
Example: 'MinParentSize',8
Data Types: single | double
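The interaction between MinParentSize and MinLeafSize is plain arithmetic. The sketch below evaluates the stated rule for one hypothetical pair of settings (10 and 8 are arbitrary illustration values).

```matlab
minParentSize = 10;   % requested 'MinParentSize' (hypothetical value)
minLeafSize   = 8;    % requested 'MinLeafSize' (hypothetical value)
% Rule stated above: the effective parent size must allow two leaves of the minimum size.
effectiveMinParent = max(minParentSize, 2*minLeafSize)   % 16
```

Here the leaf-size constraint dominates: a node needs at least 16 observations before it can be split into two leaves of 8 or more.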
'PredictorNames' - Predictor variable names
Predictor variable names, specified as the comma-separated pair consisting of 'PredictorNames' and a cell array of unique character vectors. The functionality of 'PredictorNames' depends on the way you supply the training data.
If you supply X and Y, then you can use 'PredictorNames' to give the predictor variables in X names.
The order of the names in PredictorNames must correspond to the column order of X. That is, PredictorNames{1} is the name of X(:,1), PredictorNames{2} is the name of X(:,2), and so on. Also, size(X,2) and numel(PredictorNames) must be equal.
By default, PredictorNames is {x1,x2,...}.
If you supply Tbl, then you can use 'PredictorNames' to choose which predictor variables to use in training. That is, fitrtree uses only the predictor variables in PredictorNames and the response in training.
PredictorNames must be a subset of Tbl.Properties.VariableNames and cannot include the name of the response variable.
By default, PredictorNames contains the names of all predictor variables.
It is good practice to specify the predictors for training using only one of 'PredictorNames' or formula.
Example: 'PredictorNames',{'SepalLength','SepalWidth','PetalLength','PetalWidth'}
Data Types: cell
'PredictorSelection' - Algorithm used to select the best split predictor
'allsplits' (default) | 'curvature' | 'interaction-curvature'
Algorithm used to select the best split predictor at each node, specified as the comma-separated pair consisting of 'PredictorSelection' and a value in this table.
Value | Description
'allsplits' | Standard CART. Selects the split predictor that maximizes the split-criterion gain over all possible splits of all predictors [1].
'curvature' | Curvature test. Selects the split predictor that minimizes the p-value of chi-square tests of independence between each predictor and the response [2]. Training speed is similar to standard CART.
'interaction-curvature' | Interaction test. Chooses the split predictor that minimizes the p-value of chi-square tests of independence between each predictor and the response (that is, conducts curvature tests), and that minimizes the p-value of a chi-square test of independence between each pair of predictors and the response [2]. Training speed can be slower than standard CART.
For 'curvature' and 'interaction-curvature', if all tests yield p-values greater than 0.05, then fitrtree stops splitting nodes.
Tip
For details on how fitrtree selects split predictors, see Node Splitting Rules.
Example: 'PredictorSelection','curvature'
Data Types: char
'Prune' - Flag to estimate optimal sequence of pruned subtrees
'on' (default) | 'off'
Flag to estimate the optimal sequence of pruned subtrees, specified as the comma-separated pair consisting of 'Prune' and 'on' or 'off'.
If Prune is 'on', then fitrtree grows the regression tree and estimates the optimal sequence of pruned subtrees, but does not prune the regression tree. Otherwise, fitrtree grows the regression tree without estimating the optimal sequence of pruned subtrees.
To prune a trained regression tree, pass the regression tree to prune.
Example: 'Prune','off'
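A short sketch of the two-step workflow described above: grow the tree with the default 'Prune','on' so that the pruning sequence is estimated, then prune explicitly. The data, predictors, and pruning level 1 are assumptions chosen for illustration.

```matlab
load carsmall
tree = fitrtree([Weight,Horsepower],MPG);    % grows the tree; pruning sequence is estimated
maxLevel = max(tree.PruneList);              % deepest pruning level available for this tree
treePruned = prune(tree,'Level',1);          % prune one level of the estimated sequence
nBranches = [sum(tree.IsBranch) sum(treePruned.IsBranch)]   % branch nodes before and after
```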
'PruneCriterion' - Pruning criterion
'mse' (default)
Pruning criterion, specified as the comma-separated pair consisting of 'PruneCriterion' and 'mse'.
'QuadraticErrorTolerance' - Quadratic error tolerance
1e-6 (default) | positive scalar value
Quadratic error tolerance per node, specified as the comma-separated pair consisting of 'QuadraticErrorTolerance' and a positive scalar value. Splitting nodes stops when quadratic error per node drops below QuadraticErrorTolerance*QED, where QED is the quadratic error for the entire data computed before the decision tree is grown.
Example: 'QuadraticErrorTolerance',1e-4
'ResponseName' - Response variable name
'Y' (default) | character vector
Response variable name, specified as the comma-separated pair consisting of 'ResponseName' and a character vector.
If you supply Y, then you can use 'ResponseName' to specify a name for the response variable.
If you supply ResponseVarName or formula, then you cannot use 'ResponseName'.
Example: 'ResponseName','response'
Data Types: char
'ResponseTransform' - Response transform function
'none' (default) | function handle
Response transform function for transforming the raw response values, specified as the comma-separated pair consisting of 'ResponseTransform' and either a function handle or 'none'. The function handle must accept a matrix of response values and return a matrix of the same size. The default is 'none', which means @(x)x, or no transformation.
Add or change a ResponseTransform function using dot notation:
tree.ResponseTransform = @function
Data Types: function_handle
'SplitCriterion' - Split criterion
'MSE' (default)
Split criterion, specified as the comma-separated pair consisting of 'SplitCriterion' and 'MSE', meaning mean squared error.
Example: 'SplitCriterion','MSE'
'Surrogate' - Surrogate decision splits flag
'off' (default) | 'on' | 'all' | positive integer
Surrogate decision splits flag, specified as the comma-separated pair consisting of 'Surrogate' and 'on', 'off', 'all', or a positive integer.
When set to 'on', fitrtree finds at most 10 surrogate splits at each branch node.
When set to a positive integer, fitrtree finds at most the specified number of surrogate splits at each branch node.
When set to 'all', fitrtree finds all surrogate splits at each branch node. The 'all' setting can use much time and memory.
Use surrogate splits to improve the accuracy of predictions for data with missing values. The setting also enables you to compute measures of predictive association between predictors.
Example: 'Surrogate','on'
Data Types: single | double | char
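The two uses of surrogate splits mentioned above can be sketched as follows, assuming the carsmall data. The query point with a missing Displacement value is made up for illustration.

```matlab
load carsmall
X = [Displacement Horsepower Weight];
tree = fitrtree(X,MPG,'Surrogate','on');    % at most 10 surrogate splits per branch node
ma = surrogateAssociation(tree);            % pairwise predictive measures of association
yMissing = predict(tree,[NaN 100 3000])     % Displacement is missing at prediction time
```

ma has one row and one column per predictor, so larger off-diagonal entries point to predictors that can stand in for one another when a value is missing.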
'Weights' - Observation weights
ones(size(X,1),1) (default) | vector of scalar values
Observation weights, specified as the comma-separated pair consisting of 'Weights' and a vector of scalar values. The software weights the observations in each row of X or Tbl with the corresponding value in Weights. The size of Weights must equal the number of rows in X or Tbl.
If you specify the input data as a table Tbl, then Weights can be the name of a variable in Tbl that contains a numeric vector. In this case, you must specify Weights as a character vector. For example, if the weights vector W is stored as Tbl.W, then specify it as 'W'. Otherwise, the software treats all columns of Tbl, including W, as predictors when training the model.
fitrtree normalizes the weights to add up to 1.
Data Types: single | double
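A minimal sketch of passing observation weights, assuming the carsmall data. Doubling the weight of 1982 cars is an arbitrary illustration, and reading the normalized weights back from the W property is an assumption about the fitted object rather than something stated on this page.

```matlab
load carsmall
X = [Weight,Horsepower];
w = ones(size(X,1),1);          % start from equal weights, one per row of X
w(Model_Year == 82) = 2;        % hypothetical choice: emphasize the 1982 cars
treeW = fitrtree(X,MPG,'Weights',w);
totalWeight = sum(treeW.W)      % weights as stored by the model after normalization
```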
'CrossVal' - Cross-validation flag
'off' (default) | 'on'
Cross-validation flag, specified as the comma-separated pair consisting of 'CrossVal' and either 'on' or 'off'.
If 'on', fitrtree grows a cross-validated decision tree with 10 folds. You can override this cross-validation setting using one of the 'KFold', 'Holdout', 'Leaveout', or 'CVPartition' name-value pair arguments. You can only use one of these four options ('KFold', 'Holdout', 'Leaveout', or 'CVPartition') at a time when creating a cross-validated tree.
Alternatively, cross-validate tree later using the crossval method.
Example: 'CrossVal','on'
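The alternative of cross-validating an already trained tree can be sketched like this, assuming the carsmall data. The five folds are an arbitrary choice for the example.

```matlab
load carsmall
tree = fitrtree([Weight,Horsepower],MPG);   % ordinary, non-partitioned tree
rng(1)                                      % fold assignment is random
cvTree = crossval(tree,'KFold',5);          % cross-validate after training
cvMSE = kfoldLoss(cvTree)                   % cross-validated mean squared error
```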
'CVPartition' - Partition for cross-validated tree
cvpartition object
Partition for cross-validated tree, specified as the comma-separated pair consisting of 'CVPartition' and an object created using cvpartition.
If you use 'CVPartition', you cannot use any of the 'KFold', 'Holdout', or 'Leaveout' name-value pair arguments.
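A sketch of supplying a partition explicitly, assuming the carsmall data. Rows with a missing response are dropped first so that the partition size matches the data actually used for training; that precaution is a choice made for this example, not a documented requirement.

```matlab
load carsmall
ok = ~isnan(MPG);                        % keep rows with an observed response
X = [Weight(ok),Horsepower(ok)];
Y = MPG(ok);
rng(1)                                   % fold assignment is random
c = cvpartition(numel(Y),'KFold',5);     % reusable 5-fold partition
cvTree = fitrtree(X,Y,'CVPartition',c);
nFolds = numel(cvTree.Trained)           % one trained tree per fold
```

Reusing the same partition object c across several models keeps their cross-validation losses comparable.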
'Holdout' - Fraction of data for holdout validation
0 (default) | scalar value in the range [0,1]
Fraction of data used for holdout validation, specified as the comma-separated pair consisting of 'Holdout' and a scalar value in the range [0,1]. Holdout validation tests the specified fraction of the data, and uses the rest of the data for training.
If you use 'Holdout', you cannot use any of the 'CVPartition', 'KFold', or 'Leaveout' name-value pair arguments.
Example: 'Holdout',0.1
Data Types: single | double
'KFold' - Number of folds
10 (default) | positive integer greater than 1
Number of folds to use in a cross-validated tree, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1.
If you use 'KFold', you cannot use any of the 'CVPartition', 'Holdout', or 'Leaveout' name-value pair arguments.
Example: 'KFold',8
Data Types: single | double
'Leaveout' - Leave-one-out cross-validation flag
'off' (default) | 'on'
Leave-one-out cross-validation flag, specified as the comma-separated pair consisting of 'Leaveout' and either 'on' or 'off'. Use leave-one-out cross-validation by setting to 'on'.
If you use 'Leaveout', you cannot use any of the 'CVPartition', 'Holdout', or 'KFold' name-value pair arguments.
Example: 'Leaveout','on'
'MaxNumSplits' - Maximal number of decision splits
size(X,1) - 1 (default) | positive integer
Maximal number of decision splits (or branch nodes), specified as the comma-separated pair consisting of 'MaxNumSplits' and a positive integer. fitrtree splits MaxNumSplits or fewer branch nodes. For more details on splitting behavior, see Tree Depth Control.
Example: 'MaxNumSplits',5
Data Types: single | double
'MinLeafSize' - Minimum number of leaf node observations
1 (default) | positive integer value
Minimum number of leaf node observations, specified as the comma-separated pair consisting of 'MinLeafSize' and a positive integer value. Each leaf has at least MinLeafSize observations. If you supply both MinParentSize and MinLeafSize, fitrtree uses the setting that gives larger leaves: MinParentSize = max(MinParentSize,2*MinLeafSize).
Example: 'MinLeafSize',3
Data Types: single | double
'NumVariablesToSample' - Number of predictors for split
'all' (default) | positive integer value
Number of predictors to select at random for each split, specified as the comma-separated pair consisting of 'NumVariablesToSample' and a positive integer value. You can also specify 'all' to use all available predictors.
Example: 'NumVariablesToSample',3
Data Types: single | double
'OptimizeHyperparameters' - Parameters to optimize
'none' (default) | 'auto' | 'all' | cell array of eligible parameter names | vector of optimizableVariable objects
Parameters to optimize, specified as:
'none' - Do not optimize.
'auto' - Use {'MinLeafSize'}.
'all' - Optimize all eligible parameters.
Cell array of eligible parameter names
Vector of optimizableVariable objects, typically the output of hyperparameters
The optimization attempts to minimize the cross-validation loss (error) for fitrtree by varying the parameters. To control the cross-validation type and other aspects of the optimization, use the HyperparameterOptimizationOptions name-value pair.
The eligible parameters for fitrtree are:
MaxNumSplits - fitrtree searches among integers, by default log-scaled in the range [1,max(2,NumObservations-1)].
MinLeafSize - fitrtree searches among integers, by default log-scaled in the range [1,max(2,floor(NumObservations/2))].
NumVariablesToSample - fitrtree does not optimize over this hyperparameter. If you pass NumVariablesToSample as a parameter name, fitrtree simply uses the full number of predictors. However, fitrensemble does optimize over this hyperparameter.
Set nondefault parameters by passing a vector of optimizableVariable objects that have nondefault values. For example,
load carsmall
params = hyperparameters('fitrtree',[Horsepower,Weight],MPG);
params(1).Range = [1,30];
Pass params as the value of OptimizeHyperparameters.
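Putting the pieces together, a sketch of the full call might look like this. It assumes that params(1) is the MinLeafSize variable (check params(1).Name to confirm), and it turns off plots and command-line display and reduces the number of evaluations only to keep the example quick. Those option values are illustrative choices.

```matlab
load carsmall
X = [Horsepower,Weight];
params = hyperparameters('fitrtree',X,MPG);   % default optimizableVariable objects
params(1).Range = [1,30];                     % narrow the search range of the first variable
rng default                                   % for reproducibility
opts = struct('ShowPlots',false,'Verbose',0,'MaxObjectiveEvaluations',10);
Mdl = fitrtree(X,MPG,'OptimizeHyperparameters',params, ...
    'HyperparameterOptimizationOptions',opts);
results = Mdl.HyperparameterOptimizationResults;   % optimization record kept with the model
```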
By default, iterative display appears at the command line, and plots appear according to the number of hyperparameters in the optimization. For the optimization and plots, the objective function is log(1 + cross-validation loss) for regression, and the misclassification rate for classification. To control the iterative display, set the Verbose field of the HyperparameterOptimizationOptions name-value pair. To control the plots, set the ShowPlots field of the HyperparameterOptimizationOptions name-value pair.
For an example, see Optimize Regression Tree.
Example: 'auto'
Data Types: char | cell
'HyperparameterOptimizationOptions' - Options for optimization
Options for optimization, specified as a structure. Modifies the effect of the OptimizeHyperparameters name-value pair. All fields in the structure are optional.
Field Name | Values | Default
Optimizer |  | 'bayesopt'
AcquisitionFunctionName | See the bayesopt AcquisitionFunctionName name-value pair, or Acquisition Function Types. | 'expected-improvement-per-second-plus'
MaxObjectiveEvaluations | Maximum number of objective function evaluations. | 30 for 'bayesopt' or 'randomsearch', and the entire grid for 'gridsearch'
NumGridDivisions | For 'gridsearch', the number of values in each dimension. Can be a vector of positive integers giving the number of values for each dimension, or a scalar that applies to all dimensions. Ignored for categorical variables. | 10
ShowPlots | Logical value indicating whether to show plots. If true, plots the best objective function value against iteration number. If there are one or two optimization parameters, and if Optimizer is 'bayesopt', then ShowPlots also plots a model of the objective function against the parameters. | true
SaveIntermediateResults | Logical value indicating whether to save results when Optimizer is 'bayesopt'. If true, overwrites a workspace variable named 'BayesoptResults' at each iteration. The variable is a BayesianOptimization object. | false
Verbose | Display to the command line. See the bayesopt Verbose name-value pair. | 1
Repartition | Logical value indicating whether to repartition the cross-validation at every iteration. | false
Use no more than one of the following three field names. The default for the group is Kfold = 5.
CVPartition | A cvpartition object, as created by cvpartition |
Holdout | A scalar in the range (0,1) representing the holdout fraction. |
Kfold | An integer greater than 1. |
Example: struct('MaxObjectiveEvaluations',60)
Data Types: struct
tree - Regression tree
Regression tree, returned as a regression tree object. Using the 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition' options results in a tree of class RegressionPartitionedModel. You cannot use a partitioned tree for prediction, so this kind of tree does not have a predict method.
Otherwise, tree is of class RegressionTree, and you can use the predict method to make predictions.
The curvature test is a statistical test assessing the null hypothesis that two variables are unassociated.
The curvature test between predictor variable x and y is conducted using this process.
If x is continuous, then partition it into its quartiles. Create a nominal variable that bins observations according to which section of the partition they occupy. If there are missing values, then create an extra bin for them.
For each level in the partitioned predictor j = 1,...,J and class in the response k = 1,...,K, compute the weighted proportion of observations in class k
$$\widehat{\pi}_{jk}=\sum_{i=1}^{n}I\{y_{i}=k\}w_{i}.$$
w_{i} is the weight of observation i, $$\sum w_{i}=1$$, I is the indicator function, and n is the sample size. If all observations have the same weight, then $$\widehat{\pi}_{jk}=\frac{n_{jk}}{n}$$, where n_{jk} is the number of observations in level j of the predictor that are in class k.
Compute the test statistic
$$t=n\sum_{k=1}^{K}\sum_{j=1}^{J}\frac{\left(\widehat{\pi}_{jk}-\widehat{\pi}_{j+}\widehat{\pi}_{+k}\right)^{2}}{\widehat{\pi}_{j+}\widehat{\pi}_{+k}}$$
$$\widehat{\pi}_{j+}=\sum_{k}\widehat{\pi}_{jk}$$, that is, the marginal probability of observing the predictor at level j. $$\widehat{\pi}_{+k}=\sum_{j}\widehat{\pi}_{jk}$$, that is, the marginal probability of observing class k. If n is large enough, then t is distributed as a χ^{2} with (K - 1)(J - 1) degrees of freedom.
If the p-value for the test is less than 0.05, then reject the null hypothesis that there is no association between x and y.
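The steps above can be sketched in base MATLAB. This is an illustration under stated assumptions, not the internal implementation of fitrtree: the data are synthetic, the response has two classes, all weights are equal, the quartile cut points use a simple order-statistic rule, and the proportions are accumulated per level j and class k (matching the equal-weight case n_{jk}/n). The chi-square upper tail is obtained from gammainc so that no toolbox function is needed.

```matlab
rng(0)                                     % reproducible synthetic data
n = 200;
x = randn(n,1);                            % continuous predictor
y = 1 + (x + 0.5*randn(n,1) > 0);          % response classes 1 and 2, associated with x

% Step 1: bin x into J = 4 levels using empirical quartile cut points.
xs = sort(x);
edges = xs(round([0.25 0.5 0.75]*n));
level = 1 + sum(bsxfun(@gt, x, edges(:)'), 2);   % level index 1..4 per observation
J = 4;  K = 2;

% Step 2: weighted proportion of observations in level j and class k.
w = ones(n,1)/n;                           % equal weights that sum to 1
piHat = accumarray([level y], w, [J K]);

% Step 3: test statistic against the independence model.
piJ = sum(piHat,2);                        % marginal over classes (J-by-1)
piK = sum(piHat,1);                        % marginal over levels (1-by-K)
expected = piJ*piK;                        % J-by-K product of the marginals
t = n*sum((piHat(:) - expected(:)).^2 ./ expected(:));

% Step 4: p-value from the chi-square upper tail with (K-1)(J-1) degrees of freedom.
dof = (K-1)*(J-1);
p = gammainc(t/2, dof/2, 'upper');
rejectNull = p < 0.05
```

Because y was generated from x, the test should reject the null hypothesis of no association for these data.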
When determining the best split predictor at each node, the standard CART algorithm prefers to select continuous predictors that have many levels. Sometimes, such a selection can be spurious and can also mask more important predictors that have fewer levels, such as categorical predictors.
The curvature test can be applied instead of standard CART to determine the best split predictor at each node. In that case, the best split predictor variable is the one that minimizes the significant p-values (those less than 0.05) of curvature tests between each predictor and the response variable. Such a selection is robust to the number of levels in individual predictors.
For more details on how the curvature test applies to growing regression trees, see Node Splitting Rules and [3].
The interaction test is a statistical test that assesses the null hypothesis that there is no interaction between a pair of predictor variables and the response variable.
The interaction test assessing the association between predictor variables x_{1} and x_{2} with respect to y is conducted using this process.
If x_{1} or x_{2} is continuous, then partition that variable into its quartiles. Create a nominal variable that bins observations according to which section of the partition they occupy. If there are missing values, then create an extra bin for them.
Create the nominal variable z with J = J_{1}J_{2} levels that assigns an index to observation i according to which levels of x_{1} and x_{2} it belongs. Remove any levels of z that do not correspond to any observations.
Conduct a curvature test between z and y.
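The construction of the combined variable z can be sketched as follows. The level vectors are made-up inputs that stand for x_{1} and x_{2} after binning, and the index formula is one convenient way to number the level pairs. Other numberings work equally well.

```matlab
level1 = [1 1 2 3 3 3]';     % levels of x1 for six observations (J1 = 3)
level2 = [1 2 2 1 1 2]';     % levels of x2 for the same observations (J2 = 2)
J2 = 2;
zRaw = (level1 - 1)*J2 + level2;    % one index per (level1, level2) pair, in 1..J1*J2
[~,~,z] = unique(zRaw);             % renumber so that unobserved pairs are removed
numLevels = max(z)                  % 5: the pair (2,1) never occurs in this sample
```

z can then be passed to a curvature test against y in place of a single binned predictor.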
When growing decision trees, if there are important interactions between pairs of predictors, but there are also many other less important predictors in the data, then standard CART tends to miss the important interactions. However, conducting curvature and interaction tests for predictor selection instead can improve detection of important interactions, which can yield more accurate decision trees.
For more details on how the interaction test applies to growing decision trees, see Curvature Test, Node Splitting Rules and [2].
The predictive measure of association is a value that indicates the similarity between decision rules that split observations. Among all possible decision splits that are compared to the optimal split (found by growing the tree), the best surrogate decision split yields the maximum predictive measure of association. The second-best surrogate split has the second-largest predictive measure of association.
Suppose x_{j} and x_{k} are predictor variables j and k, respectively, and j ≠ k. At node t, the predictive measure of association between the optimal split x_{j} < u and a surrogate split x_{k} < v is
$${\lambda}_{jk}=\frac{\text{min}\left({P}_{L},{P}_{R}\right)-\left(1-{P}_{{L}_{j}{L}_{k}}-{P}_{{R}_{j}{R}_{k}}\right)}{\text{min}\left({P}_{L},{P}_{R}\right)}.$$
P_{L} is the proportion of observations in node t, such that x_{j} < u. The subscript L stands for the left child of node t.
P_{R} is the proportion of observations in node t, such that x_{j} ≥ u. The subscript R stands for the right child of node t.
$${P}_{{L}_{j}{L}_{k}}$$ is the proportion of observations at node t, such that x_{j} < u and x_{k} < v.
$${P}_{{R}_{j}{R}_{k}}$$ is the proportion of observations at node t, such that x_{j} ≥ u and x_{k} ≥ v.
Observations with missing values for x_{j} or x_{k} do not contribute to the proportion calculations.
λ_{jk} is a value in (–∞,1]. If λ_{jk} > 0, then x_{k} < v is a worthwhile surrogate split for x_{j} < u.
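As a worked illustration of this definition, the following base MATLAB sketch computes λ_{jk} for one node. The data and cut points are made up, and the proportions are unweighted, which is a simplifying assumption.

```matlab
% Sketch: predictive measure of association between an optimal split
% xj < u and a candidate surrogate split xk < v at one node.
xj = [1 2 3 4 5 6 7 8 9 10]';
xk = [2 1 4 3 6 5 9 7 8 NaN]';   % correlated with xj, one missing value
u  = 5.5;                        % optimal cut point (assumed)
v  = 5.5;                        % surrogate cut point (assumed)

% Observations with a missing xj or xk do not contribute.
ok = ~isnan(xj) & ~isnan(xk);
xj = xj(ok);
xk = xk(ok);

PL  = mean(xj <  u);               % proportion sent left by the optimal split
PR  = mean(xj >= u);               % proportion sent right by the optimal split
PLL = mean(xj <  u & xk <  v);     % both splits send the observation left
PRR = mean(xj >= u & xk >= v);     % both splits send the observation right

lambda = (min(PL,PR) - (1 - PLL - PRR)) / min(PL,PR)
```

With these values, P_{L} = 5/9, P_{R} = 4/9, and the two splits disagree on 2 of the 9 observations, so λ_{jk} = (4/9 − 2/9)/(4/9) = 0.5. Because λ_{jk} > 0, x_{k} < v would count as a worthwhile surrogate.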
A surrogate decision split is an alternative to the optimal decision split at a given node in a decision tree. The optimal split is found by growing the tree; the surrogate split uses a similar or correlated predictor variable and split criterion.
When the value of the optimal split predictor for an observation is missing, the observation is sent to the left or right child node using the best surrogate predictor. When the value of the best surrogate split predictor for the observation is also missing, the observation is sent to the left or right child node using the second-best surrogate predictor, and so on. Candidate splits are sorted in descending order by their predictive measure of association.
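A minimal usage sketch, assuming the carsmall sample data set is available: train a tree with surrogate splits, inspect the predictive measures of association, and predict for an observation whose first predictor is missing. The predictor names and the query values are illustrative.

```matlab
% Sketch: surrogate splits for observations with missing predictor values.
load carsmall
X = [Displacement Horsepower Weight];

tree = fitrtree(X,MPG,'Surrogate','on', ...
    'PredictorNames',{'D','H','W'});

% Matrix of predictive measures of association between predictors,
% averaged over the surrogate splits in the tree.
assoc = surrogateAssociation(tree)

% Predict for a car whose displacement is unknown.
mpgHat = predict(tree,[NaN 150 3500])
```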
fitrtree uses these processes to determine how to split node t.
For standard CART (that is, if PredictorSelection is 'allsplits') and for all predictors x_{i}, i = 1,...,p:
fitrtree computes the weighted mean squared error (MSE) of the responses in node t using
$${\epsilon}_{t}={\displaystyle \sum _{j\in T}{w}_{j}}{\left({y}_{j}-{\overline{y}}_{t}\right)}^{2}.$$
w_{j} is the weight of observation j, and T is the set of all observation indices in node t. If you do not specify Weights, then w_{j} = 1/n, where n is the sample size.
fitrtree estimates the probability that an observation is in node t using
$$P\left(T\right)={\displaystyle \sum _{j\in T}{w}_{j}}.$$
fitrtree sorts x_{i} in ascending order. Each element of the sorted predictor is a splitting candidate or cut point. fitrtree records any indices corresponding to missing values in the set T_{U}, which is the unsplit set.
fitrtree determines the best way to split node t using x_{i} by maximizing the reduction in MSE (ΔI) over all splitting candidates. That is, for all splitting candidates in x_{i}:
fitrtree splits the observations in node t into left and right child nodes (t_{L} and t_{R}, respectively).
fitrtree computes ΔI. Suppose that for a particular splitting candidate, t_{L} and t_{R} contain observation indices in the sets T_{L} and T_{R}, respectively.
If x_{i} does not contain any missing values, then the reduction in MSE for the current splitting candidate is
$$\Delta I=P\left(T\right){\epsilon}_{t}-P\left({T}_{L}\right){\epsilon}_{{t}_{L}}-P\left({T}_{R}\right){\epsilon}_{{t}_{R}}.$$
If x_{i} contains missing values, then, assuming that the observations are missing at random, the reduction in MSE is
$$\Delta {I}_{U}=P\left(T-{T}_{U}\right){\epsilon}_{t}-P\left({T}_{L}\right){\epsilon}_{{t}_{L}}-P\left({T}_{R}\right){\epsilon}_{{t}_{R}}.$$
T – T_{U} is the set of all observation indices in node t that are not missing.
If you use surrogate decision splits, then:
fitrtree computes the predictive measures of association between the decision split x_{j} < u and all possible decision splits x_{k} < v, j ≠ k.
fitrtree sorts the possible alternative decision splits in descending order by their predictive measure of association with the optimal split. The surrogate split is the decision split yielding the largest measure.
fitrtree decides the child node assignments for observations with a missing value for x_{i} using the surrogate split. If the surrogate predictor also contains a missing value, then fitrtree uses the decision split with the second largest measure, and so on, until there are no other surrogates. It is possible for fitrtree to split two different observations at node t using two different surrogate splits. For example, suppose the predictors x_{1} and x_{2} are the best and second best surrogates, respectively, for the predictor x_{i}, i ∉ {1,2}, at node t. If observation m of predictor x_{i} is missing (i.e., x_{mi} is missing), but x_{m1} is not missing, then x_{1} is the surrogate predictor for observation x_{mi}. If observations x_{(m + 1),i} and x_{(m + 1),1} are missing, but x_{(m + 1),2} is not missing, then x_{2} is the surrogate predictor for observation m + 1.
fitrtree uses the appropriate MSE reduction formula. That is, if fitrtree fails to assign all missing observations in node t to child nodes using surrogate splits, then the MSE reduction is ΔI_{U}. Otherwise, fitrtree uses ΔI for the MSE reduction.
fitrtree chooses the candidate that yields the largest MSE reduction.
fitrtree splits the predictor variable at the cut point that maximizes the MSE reduction.
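The standard CART search above can be sketched in base MATLAB for a single predictor at a single node. This is an illustration under simplifying assumptions, not the fitrtree implementation: the data are made up, the weights are equal, there are no missing values (so ΔI applies rather than ΔI_{U}), and the cut point is placed midway between adjacent sorted values.

```matlab
% Sketch: scan all splitting candidates of one predictor and keep the
% one with the largest MSE reduction, following the formulas as written.
x = [1 2 3 4 5 6]';
y = [1.0 1.2 0.9 3.1 2.9 3.0]';
n = numel(y);
w = ones(n,1)/n;                          % default weights w_j = 1/n

% Weighted sum of squared deviations from the weighted node mean.
nodeErr = @(wn,yn) sum(wn .* (yn - sum(wn.*yn)/sum(wn)).^2);

[xs,order] = sort(x);                     % sort the predictor
ys = y(order);
ws = w(order);

PT   = sum(ws);                           % P(T)
epsT = nodeErr(ws,ys);                    % epsilon_t
best = struct('deltaI',-Inf,'cut',NaN);

for c = 1:n-1
    if xs(c) == xs(c+1), continue; end    % no cut between equal values
    L = 1:c;                              % indices sent to the left child
    R = c+1:n;                            % indices sent to the right child
    deltaI = PT*epsT ...
           - sum(ws(L))*nodeErr(ws(L),ys(L)) ...
           - sum(ws(R))*nodeErr(ws(R),ys(R));
    if deltaI > best.deltaI
        best.deltaI = deltaI;
        best.cut    = (xs(c) + xs(c+1))/2;   % midpoint (an assumption)
    end
end
best
```

For these data the responses jump between x = 3 and x = 4, so the largest reduction occurs at the cut point 3.5.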
For the curvature test (that is, if PredictorSelection is 'curvature'):
fitrtree computes the residuals $${r}_{ti}={y}_{ti}-{\overline{y}}_{t}$$ for all observations in node t. $${\overline{y}}_{t}=\frac{1}{{\displaystyle {\sum}_{i}{w}_{i}}}{\displaystyle {\sum}_{i}{w}_{i}{y}_{ti}}$$, which is the weighted average of the responses in node t. The weights are the observation weights in Weights.
fitrtree assigns observations to one of two bins according to the sign of the corresponding residuals. Let z_{t} be a nominal variable that contains the bin assignments for the observations in node t.
fitrtree conducts curvature tests between each predictor and z_{t}. For regression trees, K = 2.
If all p-values are at least 0.05, then fitrtree does not split node t.
If there is a minimal p-value, then fitrtree chooses the corresponding predictor to split node t.
If more than one p-value is zero due to underflow, then fitrtree applies standard CART to the corresponding predictors to choose the split predictor.
If fitrtree chooses a split predictor, then it uses standard CART to choose the cut point (see step 4 in the standard CART process).
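The curvature-test procedure above can be sketched for one predictor at one node in base MATLAB. The sketch is unweighted, uses made-up data and simple empirical quartiles, and computes a chi-square test of independence between the binned predictor and z_{t}. These are simplifying assumptions, so the result need not match what fitrtree computes internally.

```matlab
% Sketch: curvature test between one continuous predictor and the
% residual-sign variable zt at a single node (unweighted).
x = (1:12)';
y = [1.1 0.9 1.0 1.2 0.8 1.0 3.0 3.2 2.9 3.1 3.0 2.8]';

% Residuals about the node mean, binned by sign (K = 2).
r  = y - mean(y);
zt = (r > 0) + 1;                          % 1 = nonpositive, 2 = positive

% Bin the predictor at its quartiles (J = 4, no missing values here).
xs   = sort(x);
q    = xs(ceil([0.25 0.50 0.75]*numel(xs)));
xbin = discretize(x,[-Inf; q(:); Inf]);

% Chi-square test of independence between xbin and zt.
obs      = accumarray([xbin zt],1,[4 2]);  % J-by-K contingency table
expected = sum(obs,2)*sum(obs,1)/sum(obs(:));
keep     = expected > 0;
chi2     = sum((obs(keep) - expected(keep)).^2 ./ expected(keep));
df       = (size(obs,1) - 1)*(size(obs,2) - 1);
p        = gammainc(chi2/2,df/2,'upper');  % upper-tail chi-square probability

splitNode = p < 0.05                       % split only if the test is significant
```

The upper tail of a chi-square distribution is evaluated with gammainc so that the sketch needs no toolbox functions. In a full implementation this test would be repeated for every predictor and the smallest p-value would select the split predictor.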
For the interaction test (that is, if PredictorSelection is 'interaction-curvature'):
For observations in node t, fitrtree conducts curvature tests between each predictor and the response and interaction tests between each pair of predictors and the response.
If all p-values are at least 0.05, then fitrtree does not split node t.
If there is a minimal p-value and it is the result of a curvature test, then fitrtree chooses the corresponding predictor to split node t.
If there is a minimal p-value and it is the result of an interaction test, then fitrtree chooses the split predictor using standard CART on the corresponding pair of predictors.
If more than one p-value is zero due to underflow, then fitrtree applies standard CART to the corresponding predictors to choose the split predictor.
If fitrtree chooses a split predictor, then it uses standard CART to choose the cut point (see step 4 in the standard CART process).
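A minimal usage sketch, assuming the carsmall sample data set is available: grow a tree that selects split predictors with curvature and interaction tests, and then estimate predictor importance. The predictor names are illustrative.

```matlab
% Sketch: predictor selection with curvature and interaction tests.
load carsmall
X = [Displacement Horsepower Weight Cylinders];

tree = fitrtree(X,MPG, ...
    'PredictorSelection','interaction-curvature', ...
    'PredictorNames',{'D','H','W','C'});

% One importance estimate per predictor, in the order of PredictorNames.
imp = predictorImportance(tree)
```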
If MergeLeaves is 'on' and PruneCriterion is 'mse' (which are the default values for these name-value pair arguments), then the software applies pruning only to the leaves and by using MSE. This specification amounts to merging leaves coming from the same parent node whose MSE is at most the sum of the MSE of its two leaves.
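To see the effect of leaf merging, one option is to compare node counts with merging on and off. This is a sketch assuming the carsmall sample data set is available. The two counts can be equal if no leaves qualify for merging.

```matlab
% Sketch: compare tree sizes with and without leaf merging.
load carsmall
X = [Displacement Horsepower Weight];

treeMerged   = fitrtree(X,MPG);                      % MergeLeaves 'on' (default)
treeUnmerged = fitrtree(X,MPG,'MergeLeaves','off');  % keep all leaves

numNodes = [treeMerged.NumNodes treeUnmerged.NumNodes]
```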
To accommodate MaxNumSplits, fitrtree splits all nodes in the current layer, and then counts the number of branch nodes. A layer is the set of nodes that are equidistant from the root node. If the number of branch nodes exceeds MaxNumSplits, fitrtree follows this procedure:
Determine how many branch nodes in the current layer must be unsplit so that there are at most MaxNumSplits branch nodes.
Sort the branch nodes by their impurity gains.
Unsplit that number of the least successful branches.
Return the decision tree grown so far.
This procedure produces maximally balanced trees.
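A minimal usage sketch, assuming the carsmall sample data set is available: cap the number of splits and confirm the number of branch nodes in the resulting tree. The value 7 is arbitrary.

```matlab
% Sketch: limit tree depth through the maximum number of splits.
load carsmall
X = [Displacement Horsepower Weight];

tree = fitrtree(X,MPG,'MaxNumSplits',7);

% Each split creates one branch node, so this count is at most 7.
numBranches = sum(tree.IsBranchNode)
```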
The software splits branch nodes layer by layer until at least one of these events occurs:
There are MaxNumSplits branch nodes.
A proposed split causes the number of observations in at least one branch node to be fewer than MinParentSize.
A proposed split causes the number of observations in at least one leaf node to be fewer than MinLeafSize.
The algorithm cannot find a good split within a layer (i.e., the pruning criterion (see PruneCriterion) does not improve for all proposed splits in a layer). A special case is when all nodes are pure (i.e., all observations in the node have the same class).
For values 'curvature' or 'interaction-curvature' of PredictorSelection, all tests yield p-values greater than 0.05.
MaxNumSplits and MinLeafSize do not affect splitting at their default values. Therefore, if you set 'MaxNumSplits', splitting might stop due to the value of MinParentSize before MaxNumSplits splits occur.
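The interaction between these parameters can be illustrated with a sketch, assuming the carsmall sample data set is available. The values 50 and 40 are arbitrary. With a large MinParentSize, growth is expected to stop well before the MaxNumSplits limit is reached.

```matlab
% Sketch: MinParentSize can stop growth before MaxNumSplits is reached.
load carsmall
X = [Displacement Horsepower Weight];

tree = fitrtree(X,MPG,'MaxNumSplits',50,'MinParentSize',40);

numBranches   = sum(tree.IsBranchNode)                 % at most 50
smallestSplit = min(tree.NodeSize(tree.IsBranchNode))  % at least 40
```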
For dual-core systems and above, fitrtree parallelizes training decision trees using Intel^{®} Threading Building Blocks (TBB). For details on Intel TBB, see https://software.intel.com/en-us/intel-tbb.
[1] Breiman, L., J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Boca Raton, FL: CRC Press, 1984.
[2] Loh, W.-Y. "Regression Trees with Unbiased Variable Selection and Interaction Detection." Statistica Sinica, Vol. 12, 2002, pp. 361–386.
[3] Loh, W.-Y. and Y.-S. Shih. "Split Selection Methods for Classification Trees." Statistica Sinica, Vol. 7, 1997, pp. 815–840.
predict
 prune
 RegressionPartitionedModel
 RegressionTree
 surrogateAssociation