resubPredict

Class: ClassificationTree

Predict resubstitution response of tree

Syntax

label = resubPredict(tree)
[label,posterior] = resubPredict(tree)
[label,posterior,node] = resubPredict(tree)
[label,posterior,node,cnum] = resubPredict(tree)
[label,...] = resubPredict(tree,Name,Value)

Description

label = resubPredict(tree) returns the labels tree predicts for the data tree.X. label contains the predictions of tree for the data that fitctree used to create tree.

[label,posterior] = resubPredict(tree) returns the posterior class probabilities for the predictions.

[label,posterior,node] = resubPredict(tree) returns the node numbers of tree for the resubstituted data.

[label,posterior,node,cnum] = resubPredict(tree) returns the predicted class numbers for the predictions.

[label,...] = resubPredict(tree,Name,Value) returns resubstitution predictions with additional options specified by one or more Name,Value pair arguments.

Input Arguments

tree

A classification tree constructed by fitctree.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'subtrees'

A vector with integer values from 0 (full unpruned tree) to the maximal pruning level max(tree.PruneList). subtrees must be in ascending order.

Default: 0
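
For example, a minimal sketch (using the Fisher iris data purely for illustration) of requesting predictions at every pruning level of a tree:

    load fisheriris
    tree = fitctree(meas,species);
    levels = 0:max(tree.PruneList);               % all pruning levels, ascending
    labels = resubPredict(tree,'subtrees',levels);
    % labels has one column of predictions per entry in levels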

Output Arguments

label

The response tree predicts for the training data. label is the same data type as the training response data tree.Y.

If the subtrees name-value argument contains m>1 entries, label has m columns, each of which represents the predictions of the corresponding subtree. Otherwise, label is a vector.

posterior

Matrix or array of posterior probabilities for classes tree predicts.

If the subtrees name-value argument is a scalar or is missing, posterior is an n-by-k matrix, where n is the number of rows in the training data tree.X, and k is the number of classes.

If subtrees contains m>1 entries, posterior is an n-by-k-by-m array, where the matrix for each m gives posterior probabilities for the corresponding subtree.

node

The node numbers of tree where each data row resolves.

If the subtrees name-value argument is a scalar or is missing, node is a numeric column vector with n rows, the same number of rows as tree.X.

If subtrees contains m>1 entries, node is an n-by-m matrix. Each column represents the node predictions of the corresponding subtree.

cnum

The class numbers that tree predicts for the resubstituted data.

If the subtrees name-value argument is a scalar or is missing, cnum is a numeric column vector with n rows, the same number of rows as tree.X.

If subtrees contains m>1 entries, cnum is an n-by-m matrix. Each column represents the class predictions of the corresponding subtree.
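
As a sketch of retrieving all four outputs at once (again using the Fisher iris data purely for illustration):

    load fisheriris
    tree = fitctree(meas,species);
    [label,posterior,node,cnum] = resubPredict(tree);
    % posterior has one row per observation and one column per class;
    % node and cnum are column vectors with one entry per row of meas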

Definitions

Posterior Probability

The posterior probability of the classification at a node is the number of training sequences that lead to that node with this classification, divided by the number of training sequences that lead to that node.

For example, consider classifying a predictor X as true when X<0.15 or X>0.95, and as false otherwise.

  1. Generate 100 random points and classify them:

    rng(0,'twister') % for reproducibility
    X = rand(100,1);
    Y = (abs(X - .55) > .4);
    tree = fitctree(X,Y);
    view(tree,'mode','graph')

  2. Prune the tree:

    tree1 = prune(tree,'level',1);
    view(tree1,'mode','graph')

    The pruned tree correctly classifies observations that are less than 0.15 as true. It also correctly classifies observations between .15 and .94 as false. However, it incorrectly classifies observations that are greater than .94 as false. Therefore the score for observations that are greater than .15 should be about .05/.85=.06 for true, and about .8/.85=.94 for false.

  3. Compute the prediction scores for the first 10 rows of X:

    [~,score] = predict(tree1,X(1:10));
    [score X(1:10,:)]
    
    ans =
        0.9405    0.0595    0.6555
        0.9405    0.0595    0.1712
        0.9405    0.0595    0.7060
             0    1.0000    0.0318
        0.9405    0.0595    0.2769
             0    1.0000    0.0462
             0    1.0000    0.0971
        0.9405    0.0595    0.8235
        0.9405    0.0595    0.6948
        0.9405    0.0595    0.3171

    Indeed, every value of X (the rightmost column) that is less than 0.15 has associated scores (the left and center columns) of 0 and 1, while the other values of X have associated scores of 0.94 and 0.06.

Examples

Find the total number of misclassifications of the Fisher iris data for a classification tree:

load fisheriris
tree = fitctree(meas,species);
Ypredict = resubPredict(tree);    % the predictions
Ysame = strcmp(Ypredict,species); % true for rows where the prediction matches
sum(~Ysame)                       % count the misclassifications

ans =
     3
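
Building on this example, a sketch of examining the posterior class probabilities at the misclassified rows:

    load fisheriris
    tree = fitctree(meas,species);
    [Ypredict,posterior] = resubPredict(tree);
    wrong = ~strcmp(Ypredict,species);   % the misclassified observations
    posterior(wrong,:)                   % posterior probabilities at those rows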
