predict

Class: CompactClassificationTree

Predict classification

Syntax

  • label = predict(tree,TBL)
  • label = predict(tree,X)
  • label = predict(___,Name,Value)
  • [label,score,node,cnum] = predict(___)

Description

label = predict(tree,TBL) returns a vector of predicted class labels for a table TBL, based on tree, a trained full or compact classification tree.

label = predict(tree,X) returns a vector of predicted class labels for a matrix X, based on tree, a trained full or compact classification tree.

label = predict(___,Name,Value) returns labels with additional options specified by one or more Name,Value pair arguments, using any of the previous syntaxes. For example, you can specify subtrees.

[label,score,node,cnum] = predict(___) also returns a matrix of scores indicating the likelihood that a label comes from a particular class (score), a vector of predicted node numbers for the classification (node), and a vector of predicted class numbers for the classification (cnum), using any of the previous syntaxes.

Input Arguments


tree — Trained classification tree
ClassificationTree model object | CompactClassificationTree model object

Trained classification tree, specified as a ClassificationTree or CompactClassificationTree model object. That is, tree is a trained classification model returned by fitctree or compact.

TBL — Sample data
table

Sample data, specified as a table. Each row of TBL corresponds to one observation, and each column corresponds to one predictor variable. Optionally, TBL can contain additional columns for the response variable and observation weights. TBL must contain all the predictors used to train tree. Multi-column variables and cell arrays other than cell arrays of character vectors are not allowed.

If TBL contains the response variable used to train tree, predict ignores it; you do not need to remove it from the table.

If you train tree using sample data contained in a table, then the input data for this method must also be in a table.

Data Types: table
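
For example, here is a minimal sketch of table-based training and prediction using the fisheriris data set (the predictor variable names SL, SW, PL, and PW are illustrative):

load fisheriris
Tbl = array2table(meas,'VariableNames',{'SL','SW','PL','PW'});
Tbl.Species = species;            % the response column can remain in the table
tree = fitctree(Tbl,'Species');   % train on a table ...
label = predict(tree,Tbl(1:5,:)); % ... so predict must also receive a table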

X — Data to classify
numeric matrix

Data to classify, specified as a numeric matrix. Each row of X represents one observation, and each column represents one predictor. X must have the same number of columns as the data used to train tree.

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'Subtrees' — Pruning level
0 (default) | vector of nonnegative integers | 'all'

Pruning level, specified as the comma-separated pair consisting of 'Subtrees' and a vector of nonnegative integers in ascending order or 'all'.

If you specify a vector, then all elements must be at least 0 and at most max(tree.PruneList). 0 indicates the full, unpruned tree and max(tree.PruneList) indicates the completely pruned tree (i.e., just the root node).

If you specify 'all', then CompactClassificationTree.predict operates on all subtrees (i.e., the entire pruning sequence). This specification is equivalent to using 0:max(tree.PruneList).

CompactClassificationTree.predict prunes tree to each level indicated in Subtrees, and then estimates the corresponding output arguments. The size of Subtrees determines the size of some output arguments.

To invoke Subtrees, the properties PruneList and PruneAlpha of tree must be nonempty. In other words, grow tree by setting 'Prune','on', or by pruning tree using prune.

Example: 'Subtrees','all'
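
As a sketch of how Subtrees affects the output sizes (tree and X stand for any suitably grown classification tree and predictor matrix):

% Assumes tree was grown with 'Prune','on' (the fitctree default),
% so tree.PruneList and tree.PruneAlpha are nonempty.
[label,score] = predict(tree,X,'Subtrees','all');
% label is N-by-T and score is N-by-K-by-T, where N = size(X,1),
% T = numel(0:max(tree.PruneList)), and K = numel(tree.ClassNames).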

Output Arguments


label — Class labels
vector

Class labels, returned as a vector of the same type as the response data used in training tree. Each entry of label corresponds to the class with minimal expected cost for the corresponding row of X. See Predicted Class Label.

If Subtrees has T elements, and X has N rows, then label is an N-by-T matrix. The ith column of label contains the fitted values produced by the Subtrees(i) subtree.

score — Posterior probabilities
numeric matrix

Posterior probabilities, returned as a numeric matrix of size N-by-K, where N is the number of observations (rows) in X, and K is the number of classes (in tree.ClassNames). score(i,j) is the posterior probability that row i of X is of class j.

If Subtrees has T elements, and X has N rows, then score is an N-by-K-by-T array, and node and cnum are N-by-T matrices.

node — Node numbers
numeric vector

Node numbers for the predicted classes, returned as a numeric vector. Each entry corresponds to the predicted node in tree for the corresponding row of X.

cnum — Class numbers
numeric vector

Class numbers corresponding to the predicted labels, returned as a numeric vector. Each entry of cnum corresponds to a predicted class number for the corresponding row of X.
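
For example, assuming that cnum indexes the classes in the order of tree.ClassNames, you can recover the predicted labels from the class numbers (a sketch; tree and X are placeholders):

[label,~,~,cnum] = predict(tree,X);
labelFromCnum = tree.ClassNames(cnum); % map class numbers back to class names
isequal(label,labelFromCnum)           % expected to be true (logical 1)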

Definitions

Predicted Class Label

predict classifies so as to minimize the expected classification cost:

$$\hat{y} = \underset{y=1,\ldots,K}{\arg\min}\;\sum_{k=1}^{K} \hat{P}(k|x)\,C(y|k),$$

where

  • $\hat{y}$ is the predicted classification.

  • K is the number of classes.

  • $\hat{P}(k|x)$ is the posterior probability of class k for observation x.

  • $C(y|k)$ is the cost of classifying an observation as y when its true class is k.
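
As a sketch, you can reproduce this rule from the posterior probabilities and the misclassification cost matrix stored in the tree (tree and X are placeholders; ties may be broken differently than in predict):

[label,score] = predict(tree,X);
expCost = score*tree.Cost;          % expCost(i,y) = sum over k of score(i,k)*C(y|k)
[~,idx] = min(expCost,[],2);        % class index with minimal expected cost
labelByHand = tree.ClassNames(idx); % should match label, up to tie-breaking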

Score (tree)

For trees, the score of a classification of a leaf node is the posterior probability of the classification at that node. The posterior probability of the classification at a node is the number of training sequences that lead to that node with the classification, divided by the number of training sequences that lead to that node.

For example, consider classifying a predictor X as true when X < 0.15 or X > 0.95, and as false otherwise.

Generate 100 random points and classify them:

rng(0,'twister') % for reproducibility
X = rand(100,1);
Y = (abs(X - .55) > .4);
tree = fitctree(X,Y);
view(tree,'Mode','Graph')

Prune the tree:

tree1 = prune(tree,'Level',1);
view(tree1,'Mode','Graph')

The pruned tree correctly classifies observations that are less than 0.15 as true. It also correctly classifies observations from .15 to .94 as false. However, it incorrectly classifies observations that are greater than .94 as false. Therefore, the score for observations that are greater than .15 should be about .05/.85=.06 for true, and about .8/.85=.94 for false.

Compute the prediction scores for the first 10 rows of X:

[~,score] = predict(tree1,X(1:10));
[score X(1:10,:)]
ans =

    0.9059    0.0941    0.8147
    0.9059    0.0941    0.9058
         0    1.0000    0.1270
    0.9059    0.0941    0.9134
    0.9059    0.0941    0.6324
         0    1.0000    0.0975
    0.9059    0.0941    0.2785
    0.9059    0.0941    0.5469
    0.9059    0.0941    0.9575
    0.9059    0.0941    0.9649

Indeed, every value of X (the right-most column) that is less than 0.15 has associated scores (the left and center columns) of 0 and 1, while the other values of X have associated scores of 0.91 and 0.09. The difference (score 0.09 instead of the expected .06) is due to a statistical fluctuation: there are 8 observations in X in the range (.95,1) instead of the expected 5 observations.

True Misclassification Cost

There are two costs associated with classification: the true misclassification cost per class, and the expected misclassification cost per observation.

You can set the true misclassification cost per class by using the 'Cost' name-value pair argument when you create the classifier with fitctree. Cost(i,j) is the cost of classifying an observation into class j if its true class is i. By default, Cost(i,j) = 1 if i ~= j, and Cost(i,j) = 0 if i = j. In other words, the cost is 0 for correct classification and 1 for incorrect classification.
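
For example, a sketch of an asymmetric cost matrix for a two-class problem (X and Y are placeholder training data, and the cost values are illustrative):

C = [0 5; 1 0];                % C(i,j): cost of predicting class j when the truth is class i
tree = fitctree(X,Y,'Cost',C); % misclassifying class 1 as class 2 is five times as costly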

Expected Cost

There are two costs associated with classification: the true misclassification cost per class, and the expected misclassification cost per observation.

Suppose you have Nobs observations that you want to classify with a trained classifier. Suppose you have K classes. You place the observations into a matrix Xnew with one observation per row.

The expected cost matrix CE has size Nobs-by-K. Each row of CE contains the expected (average) cost of classifying the observation into each of the K classes. CE(n,k) is

$$\sum_{i=1}^{K} \hat{P}\bigl(i \mid X_{new}(n)\bigr)\,C(k|i),$$

where

  • K is the number of classes.

  • $\hat{P}(i \mid X_{new}(n))$ is the posterior probability of class i for observation Xnew(n).

  • $C(k|i)$ is the true misclassification cost of classifying an observation as k when its true class is i.

Predictive Measure of Association

The predictive measure of association is a value that indicates the similarity between decision rules that split observations. Among all possible decision splits that are compared to the optimal split (found by growing the tree), the best surrogate decision split yields the maximum predictive measure of association. The second-best surrogate split has the second-largest predictive measure of association.

Suppose xj and xk are predictor variables j and k, respectively, and j ≠ k. At node t, the predictive measure of association between the optimal split xj < u and a surrogate split xk < v is

$$\lambda_{jk} = \frac{\min(P_L,P_R) - \bigl(1 - P_{L_jL_k} - P_{R_jR_k}\bigr)}{\min(P_L,P_R)}.$$

  • $P_L$ is the proportion of observations in node t such that xj < u. The subscript L stands for the left child of node t.

  • $P_R$ is the proportion of observations in node t such that xj ≥ u. The subscript R stands for the right child of node t.

  • $P_{L_jL_k}$ is the proportion of observations at node t such that xj < u and xk < v.

  • $P_{R_jR_k}$ is the proportion of observations at node t such that xj ≥ u and xk ≥ v.

  • Observations with missing values for xj or xk do not contribute to the proportion calculations.

$\lambda_{jk}$ is a value in (–∞,1]. If $\lambda_{jk} > 0$, then xk < v is a worthwhile surrogate split for xj < u.
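
As a sketch, these proportions can be computed directly from the observations at node t (xj, xk, u, and v are placeholders; the proportions here are taken over the rows without missing values):

ok  = ~isnan(xj) & ~isnan(xk);              % drop rows missing xj or xk
PL  = mean(xj(ok) < u);                     % sent left by the optimal split
PR  = mean(xj(ok) >= u);                    % sent right by the optimal split
PLL = mean(xj(ok) < u  & xk(ok) < v);       % both splits send the observation left
PRR = mean(xj(ok) >= u & xk(ok) >= v);      % both splits send the observation right
lambda = (min(PL,PR) - (1 - PLL - PRR))/min(PL,PR);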

Examples


Predict Labels Using a Classification Tree

Examine predictions for a few rows in a data set left out of training.

Load Fisher's iris data set.

load fisheriris

Partition the data into training (50%) and validation (50%) sets.

n = size(meas,1);
rng(1) % For reproducibility
idxTrn = false(n,1);
idxTrn(randsample(n,round(0.5*n))) = true; % Training set logical indices
idxVal = idxTrn == false;                  % Validation set logical indices

Grow a classification tree using the training set.

Mdl = fitctree(meas(idxTrn,:),species(idxTrn));

Predict labels for the validation data. Count the number of misclassified observations.

label = predict(Mdl,meas(idxVal,:));
label(randsample(numel(label),5)) % Display several predicted labels
numMisclass = sum(~strcmp(label,species(idxVal)))
ans = 

    'setosa'
    'setosa'
    'setosa'
    'virginica'
    'versicolor'


numMisclass =

     3

The software misclassifies three out-of-sample observations.

Estimate Class Posterior Probabilities Using a Classification Tree

Load Fisher's iris data set.

load fisheriris

Partition the data into training (50%) and validation (50%) sets.

n = size(meas,1);
rng(1) % For reproducibility
idxTrn = false(n,1);
idxTrn(randsample(n,round(0.5*n))) = true; % Training set logical indices
idxVal = idxTrn == false;                  % Validation set logical indices

Grow a classification tree using the training set, and then view it.

Mdl = fitctree(meas(idxTrn,:),species(idxTrn));
view(Mdl,'Mode','graph')

The resulting tree has four levels.

Estimate posterior probabilities for the test set using subtrees pruned to levels 1 and 3.

[~,Posterior] = predict(Mdl,meas(idxVal,:),'Subtrees',[1 3]);
Mdl.ClassNames
Posterior(randsample(size(Posterior,1),5),:,:) % Display several posterior probabilities
ans = 

    'setosa'
    'versicolor'
    'virginica'


ans(:,:,1) =

    1.0000         0         0
    1.0000         0         0
    1.0000         0         0
         0         0    1.0000
         0    0.8571    0.1429


ans(:,:,2) =

    0.3733    0.3200    0.3067
    0.3733    0.3200    0.3067
    0.3733    0.3200    0.3067
    0.3733    0.3200    0.3067
    0.3733    0.3200    0.3067

The elements of Posterior are class posterior probabilities:

  • Rows correspond to observations in the validation set.

  • Columns correspond to the classes as listed in Mdl.ClassNames.

  • Pages correspond to the subtrees.

The subtree pruned to level 1 is more sure of its predictions than the subtree pruned to level 3 (i.e., the root node).

Algorithms

predict generates predictions by following the branches of tree until it reaches a leaf node or a missing value. If predict reaches a leaf node, it returns the classification of that node.

If predict reaches a node with a missing value for a predictor, its behavior depends on the setting of the Surrogate name-value pair when fitctree constructs tree.

  • Surrogate = 'off' (default) — predict returns the label with the largest number of training samples that reach the node.

  • Surrogate = 'on' — predict uses the best surrogate split at the node (see the sketch following this list). If all surrogate split variables with positive predictive measure of association are missing, predict returns the label with the largest number of training samples that reach the node. For a definition, see Predictive Measure of Association.
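
For example, a sketch of growing a tree with surrogate splits so that predict can route observations with missing predictor values (the NaN entry is illustrative):

load fisheriris
tree = fitctree(meas,species,'Surrogate','on');
Xmiss = meas(1:5,:);
Xmiss(2,3) = NaN;            % illustrative missing value
label = predict(tree,Xmiss); % uses the best surrogate split where the split variable is NaN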
