CompactRegressionTree class

Compact regression tree

Description

Compact version of a regression tree (of class RegressionTree). The compact version does not include the data for training the regression tree. Therefore, you cannot perform some tasks with a compact regression tree, such as cross validation. Use a compact regression tree for making predictions (regressions) of new data.

Construction

ctree = compact(tree) constructs a compact regression tree from a full regression tree.

Input Arguments

tree

A regression tree (of class RegressionTree) constructed by fitrtree.

Properties

CategoricalPredictors

List of categorical predictors. CategoricalPredictors is a numeric vector with indices from 1 to p, where p is the number of columns of X.

CategoricalSplits

An n-by-2 cell array, where n is the number of categorical splits in tree. Each row in CategoricalSplits gives left and right values for a categorical split. For each branch node with categorical split j based on a categorical predictor variable z, the left child is chosen if z is in CategoricalSplits(j,1) and the right child is chosen if z is in CategoricalSplits(j,2). The splits are in the same order as the nodes of the tree. To find the nodes for these splits, inspect CutType and select the 'categorical' cuts from top to bottom.

Children

An n-by-2 array containing the numbers of the child nodes for each node in tree, where n is the number of nodes. Leaf nodes have child node 0.

CutCategories

An n-by-2 cell array of the categories used at branches in tree, where n is the number of nodes. For each branch node i based on a categorical predictor variable x, the left child is chosen if x is among the categories listed in CutCategories{i,1}, and the right child is chosen if x is among those listed in CutCategories{i,2}. Both columns of CutCategories are empty for branch nodes based on continuous predictors and for leaf nodes.

CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories.

CutPoint

An n-element vector of the values used as cut points in tree, where n is the number of nodes. For each branch node i based on a continuous predictor variable x, the left child is chosen if x<CutPoint(i) and the right child is chosen if x>=CutPoint(i). CutPoint is NaN for branch nodes based on categorical predictors and for leaf nodes.

CutType

An n-element cell array indicating the type of cut at each node in tree, where n is the number of nodes. For each node i, CutType{i} is:

  • 'continuous' — If the cut is defined in the form x < v for a variable x and cut point v.

  • 'categorical' — If the cut is defined by whether a variable x takes a value in a set of categories.

  • '' — If i is a leaf node.

CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories.
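As a rough illustration of how CutType, CutPoint, and CutCategories fit together, the following sketch fits a tree on the carsmall data with Cylinders declared categorical, then lists the categories sent left and right at each categorical branch node. The data set, the choice of predictors, and the variable names (catNodes, leafTypes) are assumptions made for this example only.

```matlab
load carsmall
% Treat the second predictor (Cylinders) as categorical; an illustrative choice.
tree = fitrtree([Weight, Cylinders],MPG,...
    'CategoricalPredictors',2,...
    'PredictorNames',{'W','C'});
ctree = compact(tree);

% Indices of branch nodes that split on a categorical predictor.
catNodes = find(strcmp(ctree.CutType,'categorical'));
for i = catNodes(:)'
    fprintf('Node %d splits on %s\n',i,ctree.CutPredictor{i});
    disp(ctree.CutCategories(i,:))   % {left categories, right categories}
end

% Leaf nodes have an empty CutType, so they never appear in catNodes.
leafTypes = ctree.CutType(~ctree.IsBranchNode);
```

If the fitted tree happens to contain no categorical splits, catNodes is empty and the loop prints nothing.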

CutPredictor

An n-element cell array of the names of the variables used for branching in each node in tree, where n is the number of nodes. These variables are sometimes known as cut variables. For leaf nodes, CutPredictor contains an empty string.

IsBranchNode

An n-element logical vector ib that is true for each branch node and false for each leaf node of tree.

NodeError

An n-element vector e of the errors of the nodes in tree, where n is the number of nodes. e(i) is the mean squared error of the responses in node i, measured about the node mean.

NodeMean

An n-element numeric array with mean values in each node of tree, where n is the number of nodes in the tree. Every element in NodeMean is the average of the true Y values over all observations in the node.
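The node properties described so far are enough to trace a prediction by hand. The following is a minimal sketch, assuming that every predictor is continuous and that the observation has no missing values; the observation x and the variable names (node, col, yhat) are hypothetical. It walks from the root (node 1) to a leaf using IsBranchNode, CutPredictor, CutPoint, and Children, and then reads the prediction from NodeMean.

```matlab
load carsmall
tree = fitrtree([Weight, Cylinders],MPG,...
    'MinParentSize',20,...
    'PredictorNames',{'W','C'});
ctree = compact(tree);

x = [3000 4];   % hypothetical new observation: [Weight, Cylinders]
node = 1;       % start at the root
while ctree.IsBranchNode(node)
    % Map the name of the cut variable to a column of x.
    col = find(strcmp(ctree.CutPredictor{node},ctree.PredictorNames));
    if x(col) < ctree.CutPoint(node)
        node = ctree.Children(node,1);   % left child
    else
        node = ctree.Children(node,2);   % right child
    end
end
yhat = ctree.NodeMean(node);   % mean response of the leaf that x falls into
```

With the default ResponseTransform, yhat should agree with predict(ctree,x). Categorical splits and missing values need the extra logic described under CutCategories and the surrogate split properties.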

NodeProbability

An n-element vector p of the probabilities of the nodes in tree, where n is the number of nodes. The probability of a node is computed as the proportion of observations from the original data that satisfy the conditions for the node. This proportion is adjusted for any observation weights.

NodeRisk

An n-element vector of the risk of the nodes in the tree, where n is the number of nodes. The risk for each node is the node error weighted by the node probability.
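The relationship between NodeRisk, NodeError, and NodeProbability can be checked numerically. The following sketch assumes the carsmall data and default fitting options; the variable names risk and maxDiff are hypothetical.

```matlab
load carsmall
tree = fitrtree([Weight, Cylinders],MPG,'PredictorNames',{'W','C'});
ctree = compact(tree);

% Node error weighted by node probability, computed by hand.
risk = ctree.NodeError(:) .* ctree.NodeProbability(:);

% Largest absolute difference from the stored risk.
maxDiff = max(abs(risk - ctree.NodeRisk(:)));
```

If the description above holds, maxDiff is zero up to floating-point rounding.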

NodeSize

An n-element vector sizes of the sizes of the nodes in tree, where n is the number of nodes. The size of a node is defined as the number of observations from the data used to create the tree that satisfy the conditions for the node.

NumNodes

The number of nodes n in tree.

Parent

An n-element vector p containing the number of the parent node for each node in tree, where n is the number of nodes. The parent of the root node is 0.
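Parent and Children describe the same structure from two directions, so one can be checked against the other. The following is a small sketch under the assumption that node 1 is the root; the variable names (branch, kids, par, consistent) are hypothetical.

```matlab
load carsmall
tree = fitrtree([Weight, Cylinders],MPG,'PredictorNames',{'W','C'});
ctree = compact(tree);

branch = find(ctree.IsBranchNode(:));   % node numbers of all branch nodes
kids = ctree.Children(branch,:);        % one row [left right] per branch node
par = ctree.Parent(:);

% Both children of branch node i should list i as their parent.
consistent = all(par(kids(:,1)) == branch) && all(par(kids(:,2)) == branch);

% The root has no parent, which is recorded as 0.
rootParent = par(1);
```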

PredictorNames

A cell array of names for the predictor variables, in the order in which they appear in X.

PruneAlpha

Numeric vector with one element per pruning level. If the pruning level ranges from 0 to M, then PruneAlpha has M + 1 elements sorted in ascending order. PruneAlpha(1) is for pruning level 0 (no pruning), PruneAlpha(2) is for pruning level 1, and so on.

PruneList

An n-element numeric vector with the pruning levels in each node of tree, where n is the number of nodes. The pruning levels range from 0 (no pruning) to M, where M is the distance between the deepest leaf and the root node.

ResponseName

Name of the response variable Y, a string.

ResponseTransform

Function handle for transforming the raw response values. The function handle must accept a matrix of response values and return a matrix of the same size. The default, 'none', means @(x)x, or no transformation.

Add or change a ResponseTransform function using dot notation:

ctree.ResponseTransform = @function
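For example, the following sketch converts predictions from miles per gallon to kilometers per liter. The conversion factor, the new observations Xnew, and the variable names are assumptions introduced for illustration.

```matlab
load carsmall
tree = fitrtree([Weight, Cylinders],MPG,'PredictorNames',{'W','C'});
ctree = compact(tree);

Xnew = [3000 4; 4000 8];       % hypothetical new observations
mpg = predict(ctree,Xnew);     % default: no transformation

% Apply a unit conversion to every predicted response (1 mpg is about 0.425144 km/L).
ctree.ResponseTransform = @(y) 0.425144*y;
kmPerL = predict(ctree,Xnew);
```

Because ctree is a value object, the assignment changes only ctree and leaves tree untouched.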

SurrogateCutCategories

An n-element cell array of the categories used for surrogate splits in tree, where n is the number of nodes in tree. For each node k, SurrogateCutCategories{k} is a cell array. The length of SurrogateCutCategories{k} is equal to the number of surrogate predictors found at this node. Every element of SurrogateCutCategories{k} is either an empty string for a continuous surrogate predictor, or is a two-element cell array with categories for a categorical surrogate predictor. The first element of this two-element cell array lists categories assigned to the left child by this surrogate split, and the second element of this two-element cell array lists categories assigned to the right child by this surrogate split. The order of the surrogate split variables at each node is matched to the order of variables in SurrogateCutPredictor. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogateCutCategories contains an empty cell.

SurrogateCutFlip

An n-element cell array of the numeric cut assignments used for surrogate splits in tree, where n is the number of nodes in tree. For each node k, SurrogateCutFlip{k} is a numeric vector. The length of SurrogateCutFlip{k} is equal to the number of surrogate predictors found at this node. Every element of SurrogateCutFlip{k} is either zero for a categorical surrogate predictor, or a numeric cut assignment for a continuous surrogate predictor. The numeric cut assignment can be either -1 or +1. For every surrogate split with a numeric cut C based on a continuous predictor variable Z, the left child is chosen if Z<C and the cut assignment for this surrogate split is +1, or if Z>=C and the cut assignment for this surrogate split is -1. Similarly, the right child is chosen if Z>=C and the cut assignment for this surrogate split is +1, or if Z<C and the cut assignment for this surrogate split is -1. The order of the surrogate split variables at each node is matched to the order of variables in SurrogateCutPredictor. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogateCutFlip contains an empty array.

SurrogateCutPoint

An n-element cell array of the numeric values used for surrogate splits in tree, where n is the number of nodes in tree. For each node k, SurrogateCutPoint{k} is a numeric vector. The length of SurrogateCutPoint{k} is equal to the number of surrogate predictors found at this node. Every element of SurrogateCutPoint{k} is either NaN for a categorical surrogate predictor, or a numeric cut for a continuous surrogate predictor. For every surrogate split with a numeric cut C based on a continuous predictor variable Z, the left child is chosen if Z<C and SurrogateCutFlip for this surrogate split is +1, or if Z>=C and SurrogateCutFlip for this surrogate split is -1. Similarly, the right child is chosen if Z>=C and SurrogateCutFlip for this surrogate split is +1, or if Z<C and SurrogateCutFlip for this surrogate split is -1. The order of the surrogate split variables at each node is matched to the order of variables in SurrogateCutPredictor. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogateCutPoint contains an empty cell.

SurrogateCutType

An n-element cell array indicating types of surrogate splits at each node in tree, where n is the number of nodes in tree. For each node k, SurrogateCutType{k} is a cell array with the types of the surrogate split variables at this node. The variables are sorted by the predictive measure of association with the optimal predictor in the descending order, and only variables with the positive predictive measure are included. The order of the surrogate split variables at each node is matched to the order of variables in SurrogateCutPredictor. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogateCutType contains an empty cell. A surrogate split type can be either 'continuous' if the cut is defined in the form Z<V for a variable Z and cut point V or 'categorical' if the cut is defined by whether Z takes a value in a set of categories.

SurrogateCutPredictor

An n-element cell array of the names of the variables used for surrogate splits in each node in tree, where n is the number of nodes in tree. Every element of SurrogateCutPredictor is a cell array with the names of the surrogate split variables at this node. The variables are sorted by the predictive measure of association with the optimal predictor in the descending order, and only variables with the positive predictive measure are included. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogateCutPredictor contains an empty cell.

SurrogatePredictorAssociation

An n-element cell array of the predictive measures of association for surrogate splits in tree, where n is the number of nodes in tree. For each node k, SurrogatePredictorAssociation{k} is a numeric vector. The length of SurrogatePredictorAssociation{k} is equal to the number of surrogate predictors found at this node. Every element of SurrogatePredictorAssociation{k} gives the predictive measure of association between the optimal split and this surrogate split. The order of the surrogate split variables at each node is the order of variables in SurrogateCutPredictor. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, SurrogatePredictorAssociation contains an empty cell.
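The surrogate split properties are parallel arrays: for a given node k, element s of each one describes the same surrogate split. The following sketch, which assumes the carsmall data with three predictors and surrogate splits turned on at fitting time, prints the rule for each surrogate split at the root node. The predictor choice and the variable names (names, types, cuts, flips, assoc, rule) are hypothetical.

```matlab
load carsmall
tree = fitrtree([Weight, Cylinders, Displacement],MPG,...
    'Surrogate','on',...
    'PredictorNames',{'W','C','D'});
ctree = compact(tree);

k = 1;   % inspect the root node
names = ctree.SurrogateCutPredictor{k};
types = ctree.SurrogateCutType{k};
cuts  = ctree.SurrogateCutPoint{k};
flips = ctree.SurrogateCutFlip{k};
assoc = ctree.SurrogatePredictorAssociation{k};

for s = 1:numel(names)
    if strcmp(types{s},'continuous')
        if flips(s) == 1
            rule = sprintf('%s < %g goes left',names{s},cuts(s));
        else   % a flip of -1 reverses the direction of the cut
            rule = sprintf('%s >= %g goes left',names{s},cuts(s));
        end
    else
        rule = sprintf('%s is categorical (cut point is NaN)',names{s});
    end
    fprintf('Surrogate %d: %s, association %g\n',s,rule,assoc(s));
end
```

Surrogate information is stored only if the full tree was fit with surrogate splits enabled, which is why this example passes 'Surrogate','on' to fitrtree.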

Methods

loss                    Regression error
predict                 Predict response of regression tree
predictorImportance     Estimates of predictor importance
surrogateAssociation    Mean predictive measure of association for surrogate splits in decision tree
view                    View tree
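A brief sketch of how several of these methods might be called on a compact tree follows. It assumes the carsmall data; rows with a missing response are dropped before calling loss as a precaution, and the variable names (X, ok, L, yhat, imp) are hypothetical.

```matlab
load carsmall
X = [Weight, Cylinders];
tree = fitrtree(X,MPG,...
    'MinParentSize',20,...
    'PredictorNames',{'W','C'});
ctree = compact(tree);

ok = ~isnan(MPG);                   % keep rows with an observed response
L = loss(ctree,X(ok,:),MPG(ok));    % regression error on the supplied data
yhat = predict(ctree,X(ok,:));      % predicted responses
imp = predictorImportance(ctree);   % one importance estimate per predictor
view(ctree)                         % text description of the tree
```

Because the compact tree does not store the training data, loss and predict always need the data passed in explicitly.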

Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects in the MATLAB® documentation.

Examples

Construct and Compact a Regression Tree

Load the sample data.

load carsmall

Construct a regression tree for the sample data.

tree = fitrtree([Weight, Cylinders],MPG,...
    'MinParentSize',20,...
    'PredictorNames',{'W','C'});

Make a compact version of the tree.

ctree = compact(tree);

Compare the size of the compact tree to that of the full tree.

t = whos('tree'); % t.bytes = size of tree in bytes
c = whos('ctree'); % c.bytes = size of ctree in bytes
[c.bytes t.bytes]
ans =
        4972        8173

The compact tree is smaller than the full tree.
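Dropping the training data should not change predictions. As a quick check, the following sketch compares the full and compact trees on a few new observations; Xnew and the variable names yFull and yCompact are hypothetical.

```matlab
load carsmall
tree = fitrtree([Weight, Cylinders],MPG,...
    'MinParentSize',20,...
    'PredictorNames',{'W','C'});
ctree = compact(tree);

Xnew = [2500 4; 3500 6; 4500 8];   % hypothetical [Weight, Cylinders] rows
yFull = predict(tree,Xnew);
yCompact = predict(ctree,Xnew);
samePredictions = isequal(yFull,yCompact);
```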
