|On this page…|
When growing a classification tree, finding an optimal binary split for a categorical predictor with many levels is significantly more computationally challenging than finding a split for a continuous predictor. For a continuous predictor, a tree can split halfway between any two adjacent unique values of this predictor.
In contrast, to find an exact optimal binary split for a categorical predictor with L levels, a classification tree needs to consider 2L–1–1 splits. To obtain this formula, observe that you can assign L distinct values to the left and right nodes in 2L ways. Two out of these 2L configurations leave either the left or right node empty, and therefore should be discarded. Now, divide by 2 because left and right can be swapped.
For regression and binary classification problems, with K = 2 response classes, there is a computational shortcut . The tree can order the categories by mean response (for regression) or class probability for one of the classes (for classification). Then, the optimal split is one of the L – 1 splits for the ordered list. When K = 2, fitctree always uses an exact search.
Therefore, computational challenges really only arise when growing classification trees for data with K ≥ 3 classes. To reduce computation, there are several heuristic algorithms for finding a good split. When using fitctree to grow a classification tree, you can choose an algorithm for splitting categorical predictors using the AlgorithmForCategorical name-value pair argument. You can also set this algorithm when creating a classification template.
If you do not specify an algorithm, fitctree splits categorical predictors using the exact search algorithm, provided the predictor has at most MaxCat levels (the default is 10 levels, and, depending on your platform, you cannot perform an exact search on categorical predictors with more than 32 or 64 levels). Otherwise, fitctree chooses a good inexact search algorithm based on the number of classes and levels.
The available heuristic algorithms are: pull left by purity, a principal component-based partitioning, and one versus all by class.
This algorithm starts with all L categorical levels on the right branch. Inspect the K categories that have the largest class probabilities for each class. Move the category with the maximum value of the split criterion to the left branch. Continue moving categories from right to left, recording the split criterion at each move, until the right child has only one category remaining. Out of this sequence, the chosen split is the one that maximizes the split criterion.
Select this pull left by purity algorithm by using the 'AlgorithmForCategorial','PullLeft' name-value pair in fitctree.
This algorithm was developed by Coppersmith, Hong, and Hosking . It finds a close-to-optimal binary partition of the L predictor levels by searching for a separating hyperplane that is perpendicular to the first principal component of the weighted covariance matrix of the centered class probability matrix.
The algorithm assigns a score to each of the L categories, computed as the inner product between the found principal component and the vector of class probabilities for that category. Then, the chosen split is the one of the L – 1 splits of the scores that maximizes the split criterion.
Select this principal component-based partitioning by using the 'AlgorithmForCategorical','PCA' name-value pair in fitctree.
This algorithm starts with all L categorical levels on the right branch. For each of the K classes, order the categories based on their probability for that class.
For the first class, move each category to the left branch in order, recording the split criterion at each move. Repeat for the remaining classes. Out of this sequence, the chosen split is the one that maximizes the split criterion.
Select this one versus all by class algorithm by using the 'AlgorithmForCategorial','OVAbyClass' name-value pair in fitctree.
 Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Chapman & Hall, Boca Raton, 1993.
 Coppersmith, D., S. J. Hong, and J. R. M. Hosking. "Partitioning Nominal Attributes in Decision Trees." Data Mining and Knowledge Discovery, Vol. 3, 1999, pp. 197–217.