When growing a classification tree, finding an optimal binary split for a categorical predictor with many levels is significantly more computationally challenging than finding a split for a continuous predictor. For a continuous predictor, a tree can split halfway between any two adjacent unique values of this predictor.
In contrast, to find an exact optimal binary split for a categorical predictor with L levels, a classification tree needs to consider 2L–1–1 splits. To obtain this formula, observe that you can assign L distinct values to the left and right nodes in 2L ways. Two out of these 2L configurations leave either the left or right node empty, and therefore should be discarded. Now, divide by 2 because left and right can be swapped.
For regression and binary classification problems, with K =
2 response classes, there is a computational shortcut . The tree can order the categories by mean response (for
regression) or class probability for one of the classes (for classification).
Then, the optimal split is one of the L –
1 splits for the ordered list. When K = 2,
fitctree always uses an exact search.
Therefore, computational challenges really only arise when
growing classification trees for data with K ≥
3 classes. To reduce computation, there are several heuristic algorithms
for finding a good split. When using
grow a classification tree, you can choose an algorithm for splitting
categorical predictors using the
AlgorithmForCategorical name-value pair
argument. You can also set this algorithm when creating a classification
If you do not specify an algorithm,
categorical predictors using the exact search algorithm, provided
the predictor has at most
(the default is 10 levels, and, depending on your platform, you cannot
perform an exact search on categorical predictors with more than 32
or 64 levels). Otherwise,
a good inexact search algorithm based on the number of classes and
The available heuristic algorithms are: pull left by purity, a principal component-based partitioning, and one versus all by class.
This algorithm starts with all L categorical levels on the right branch. Inspect the K categories that have the largest class probabilities for each class. Move the category with the maximum value of the split criterion to the left branch. Continue moving categories from right to left, recording the split criterion at each move, until the right child has only one category remaining. Out of this sequence, the chosen split is the one that maximizes the split criterion.
Select this pull left by purity algorithm by using the
This algorithm was developed by Coppersmith, Hong, and Hosking . It finds a close-to-optimal binary partition of the L predictor levels by searching for a separating hyperplane that is perpendicular to the first principal component of the weighted covariance matrix of the centered class probability matrix.
The algorithm assigns a score to each of the L categories, computed as the inner product between the found principal component and the vector of class probabilities for that category. Then, the chosen split is the one of the L – 1 splits of the scores that maximizes the split criterion.
Select this principal component-based partitioning by using
This algorithm starts with all L categorical levels on the right branch. For each of the K classes, order the categories based on their probability for that class.
For the first class, move each category to the left branch in order, recording the split criterion at each move. Repeat for the remaining classes. Out of this sequence, the chosen split is the one that maximizes the split criterion.
Select this one versus all by class algorithm by using the
 Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Chapman & Hall, Boca Raton, 1993.
 Coppersmith, D., S. J. Hong, and J. R. M. Hosking. "Partitioning Nominal Attributes in Decision Trees." Data Mining and Knowledge Discovery, Vol. 3, 1999, pp. 197–217.