When growing a classification tree, finding an optimal binary split for a categorical predictor with many levels is significantly more computationally challenging than finding a split for a continuous predictor. For a continuous predictor, a tree can split halfway between any two adjacent unique values of that predictor.

In contrast, to find an exact optimal binary split for a categorical
predictor with *L* levels, a classification tree
needs to consider 2^(L–1) – 1
splits. To obtain this formula, observe that you can assign the *L* distinct
levels to the left and right nodes in 2^L ways.
Two of these 2^L configurations
leave either the left or the right node empty, and therefore should be
discarded. Then divide by 2, because the left and right nodes are
interchangeable, which gives (2^L – 2)/2 = 2^(L–1) – 1 distinct splits.

For regression problems and for binary classification problems (*K* =
2 response classes), there is a computational shortcut [1]. The tree can order the categories by mean response (for
regression) or by the probability of one of the two classes (for classification).
Then, the optimal split is one of the *L* –
1 splits of the ordered list. When *K* = 2, `fitctree` always uses an exact search.
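
As a minimal illustration of the ordering idea for regression, with toy data and illustrative variable names:

```matlab
% Order the categories of a toy predictor by mean response. The
% optimal split then cuts this ordered list in one of L-1 places.
x = categorical({'a';'b';'a';'c';'b';'c';'a'});   % toy categorical predictor
y = [1.0; 3.2; 1.4; 2.1; 3.0; 2.3; 0.9];          % toy continuous response
cats = categories(x);
meanResp = zeros(numel(cats), 1);
for k = 1:numel(cats)
    meanResp(k) = mean(y(x == cats{k}));          % mean response per level
end
[~, order] = sort(meanResp);
orderedCats = cats(order)   % display the ordered levels
```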

Therefore, computational challenges arise only when
growing classification trees for data with *K* ≥
3 classes. To reduce computation, `fitctree` offers several heuristic algorithms
for finding a good split. When using `fitctree` to
grow a classification tree, you can choose an algorithm for splitting
categorical predictors by using the `AlgorithmForCategorical` name-value pair
argument. You can also set this algorithm when creating a classification `template`.

If you do not specify an algorithm, `fitctree` splits
categorical predictors using the exact search algorithm, provided
that the predictor has at most `MaxNumCategories` levels
(the default is 10 levels, and, depending on your platform, you cannot
perform an exact search on categorical predictors with more than 32
or 64 levels). Otherwise, `fitctree` chooses
a good inexact search algorithm based on the number of classes and
levels.
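
For example, assuming a table `tbl` whose response variable `'Y'` has three or more classes (placeholder names), you can raise the exact-search threshold or request a specific heuristic:

```matlab
% Raise the exact-search threshold from the default of 10 levels
% (still subject to the 32- or 64-level platform limit).
mdl = fitctree(tbl, 'Y', 'MaxNumCategories', 20);

% Or request a specific heuristic instead of the automatic choice.
mdl = fitctree(tbl, 'Y', 'AlgorithmForCategorical', 'PCA');
```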

The available heuristic algorithms are pull left by purity, principal component-based partitioning, and one versus all by class. The following paragraphs describe each algorithm and how to select it.

Pull left by purity starts with all *L* categorical
levels on the right branch. For each of the *K* classes,
inspect the category on the right branch that has the largest probability
of that class. Among these candidates, move the
category with the maximum value of the split criterion to the left
branch. Continue moving categories from right to left, recording the
split criterion at each move, until the right child has only one category
remaining. Out of this sequence, the chosen split is the one that
maximizes the split criterion.
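
The following is a minimal sketch of this procedure, assuming an *L*-by-*K* matrix `C` of class counts per category (with every category observed at least once) and using the Gini impurity decrease as the split criterion; neither `C` nor the helper functions belong to the `fitctree` interface:

```matlab
% Pull left by purity, sketched with Gini impurity decrease.
gini = @(p) 1 - sum(p.^2);
gain = @(cl, cr) gini((cl + cr)/sum(cl + cr)) ...
    - sum(cl)/sum(cl + cr)*gini(cl/sum(cl)) ...
    - sum(cr)/sum(cl + cr)*gini(cr/sum(cr));
[L, K] = size(C);
P = C ./ sum(C, 2);                        % per-category class probabilities
leftMask = false(L, 1);
bestGain = -inf;
for step = 1:L-1
    stepBest = -inf;
    % Candidates: for each class, the right-branch category with the
    % largest probability of that class.
    for k = 1:K
        pk = P(:, k);
        pk(leftMask) = -inf;               % consider right-branch categories only
        [~, j] = max(pk);
        trial = leftMask;
        trial(j) = true;                   % tentatively move category j left
        g = gain(sum(C(trial, :), 1), sum(C(~trial, :), 1));
        if g > stepBest
            stepBest = g;
            stepIdx = j;
        end
    end
    leftMask(stepIdx) = true;              % pull the winning category left
    if stepBest > bestGain
        bestGain = stepBest;
        bestLeft = leftMask;               % best split found so far
    end
end
```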

Select the pull left by purity algorithm by using the `'AlgorithmForCategorical','PullLeft'` name-value
pair argument of `fitctree`.

Principal component-based partitioning was developed by Coppersmith, Hong, and Hosking [2].
It finds a close-to-optimal binary partition of the *L* predictor
levels by searching for a separating hyperplane that is perpendicular
to the first principal component of the weighted covariance matrix
of the centered class probability matrix.

The algorithm assigns a score to each of the *L* categories,
computed as the inner product between the found principal component
and the vector of class probabilities for that category. Then, the
chosen split is the one of the *L* – 1 splits
of the ordered scores that maximizes the split criterion.
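
A minimal sketch of the scoring step, assuming an *L*-by-*K* class probability matrix `P` and an *L*-by-1 vector of category weights `w` (both illustrative, not a `fitctree` interface):

```matlab
% Score each category by projecting its class probability vector
% onto the first principal component of the weighted covariance of
% the centered probability matrix.
Pc = P - (w' * P) / sum(w);        % center by the weighted mean profile
S  = Pc' * (w .* Pc);              % K-by-K weighted covariance matrix
[V, D] = eig(S);
[~, imax] = max(diag(D));          % eigenvector with the largest eigenvalue
v1 = V(:, imax);                   % first principal component
score = P * v1;                    % one score per category
[~, order] = sort(score);          % only L-1 splits of this order remain
```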

Select principal component-based partitioning by using
the `'AlgorithmForCategorical','PCA'` name-value
pair argument of `fitctree`.

One versus all by class starts with all *L* categorical
levels on the right branch. For each of the *K* classes,
order the categories based on their probability for that class.
For the first class, move each category to the left branch in order, recording the split criterion at each move. Repeat for the remaining classes. Out of this sequence, the chosen split is the one that maximizes the split criterion.
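
A minimal sketch of this search, under the same assumptions as the pull-left sketch (an *L*-by-*K* count matrix `C` with every category observed, Gini impurity decrease as the split criterion):

```matlab
% One versus all by class, sketched with Gini impurity decrease.
gini = @(p) 1 - sum(p.^2);
gain = @(cl, cr) gini((cl + cr)/sum(cl + cr)) ...
    - sum(cl)/sum(cl + cr)*gini(cl/sum(cl)) ...
    - sum(cr)/sum(cl + cr)*gini(cr/sum(cr));
[L, K] = size(C);
P = C ./ sum(C, 2);
bestGain = -inf;
for k = 1:K
    [~, order] = sort(P(:, k), 'descend'); % order by class-k probability
    leftMask = false(L, 1);
    for m = 1:L-1
        leftMask(order(m)) = true;         % move the next category left
        g = gain(sum(C(leftMask, :), 1), sum(C(~leftMask, :), 1));
        if g > bestGain
            bestGain = g;
            bestLeft = leftMask;           % best split found so far
        end
    end
end
```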

Select the one versus all by class algorithm by using the `'AlgorithmForCategorical','OVAbyClass'` name-value
pair argument of `fitctree`.

[1] Breiman, L., J. H. Friedman, R. A. Olshen,
and C. J. Stone. *Classification and Regression Trees*.
Chapman & Hall, Boca Raton, 1993.

[2] Coppersmith, D., S. J. Hong, and J. R.
M. Hosking. "Partitioning Nominal Attributes in Decision Trees." *Data
Mining and Knowledge Discovery*, Vol. 3, 1999, pp. 197–217.

`fitctree` | `fitrtree` | `template`
