predict

Class: CompactClassificationTree

Predict labels using classification tree

Syntax

label = predict(Mdl,X)
label = predict(Mdl,X,Name,Value)
[label,score,node,cnum] = predict(___)

Description

label = predict(Mdl,X) returns a vector of predicted class labels for the predictor data in the table or matrix X, based on the trained, full or compact classification tree Mdl.

label = predict(Mdl,X,Name,Value) uses additional options specified by one or more Name,Value pair arguments. For example, you can specify to prune Mdl to a particular level before predicting labels.

[label,score,node,cnum] = predict(___) uses any of the input arguments in the previous syntaxes and additionally returns:

  • A matrix of classification scores (score) indicating the likelihood that a label comes from a particular class. For classification trees, scores are posterior probabilities. For each observation in X, the predicted class label corresponds to the minimum expected misclassification cost among all classes.

  • A vector of predicted node numbers for the classification (node).

  • A vector of predicted class numbers for the classification (cnum).
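As a minimal sketch of the four-output syntax, the following trains a tree on Fisher's iris data and requests every output for three observations. The choice of rows and the variable name Xnew are illustrative assumptions, not part of the original page.

```matlab
% Sketch: request all four outputs of predict for a few observations.
load fisheriris                      % provides meas (150-by-4) and species (150-by-1 cell)
Mdl = fitctree(meas,species);        % full classification tree
Xnew = meas([1 51 101],:);           % three rows picked for illustration only
[label,score,node,cnum] = predict(Mdl,Xnew);

% score has one row per observation and one column per class in Mdl.ClassNames
rowSums = sum(score,2);              % posterior probabilities in each row sum to 1
```

Here label holds the predicted class names, while node and cnum hold the corresponding node and class numbers, one per row of Xnew.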

Input Arguments

Mdl

Trained classification tree, specified as a ClassificationTree or CompactClassificationTree model object. That is, Mdl is a trained classification model returned by fitctree or compact.

X

Predictor data to be classified, specified as a numeric matrix or table.

Each row of X corresponds to one observation, and each column corresponds to one variable.

  • For a numeric matrix:

    • The variables making up the columns of X must have the same order as the predictor variables that trained Mdl.

    • If you trained Mdl using a table (for example, Tbl), then X can be a numeric matrix if Tbl contains all numeric predictor variables. To treat numeric predictors in Tbl as categorical during training, identify categorical predictors using the CategoricalPredictors name-value pair argument of fitctree. If Tbl contains heterogeneous predictor variables (for example, numeric and categorical data types) and X is a numeric matrix, then predict throws an error.

  • For a table:

    • predict does not support multi-column variables and cell arrays other than cell arrays of character vectors.

    • If you trained Mdl using a table (for example, Tbl), then all predictor variables in X must have the same variable names and data types as those that trained Mdl (stored in Mdl.PredictorNames). However, the column order of X does not need to correspond to the column order of Tbl. Tbl and X can contain additional variables (response variables, observation weights, etc.), but predict ignores them.

    • If you trained Mdl using a numeric matrix, then the predictor names in Mdl.PredictorNames and corresponding predictor variable names in X must be the same. To specify predictor names during training, see the PredictorNames name-value pair argument of fitctree. All predictor variables in X must be numeric vectors. X can contain additional variables (response variables, observation weights, etc.), but predict ignores them.

Data Types: table | double | single
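The table rules above can be illustrated with a short sketch. The variable names (SepalLength and so on) are hypothetical labels chosen for this example. The code trains on a table and then predicts from a table whose predictor columns are in a different order.

```matlab
% Sketch: when Mdl is trained on a table, predict matches predictors by name,
% so the column order of X need not match the training table.
load fisheriris
Tbl = array2table(meas,'VariableNames', ...
    {'SepalLength','SepalWidth','PetalLength','PetalWidth'}); % hypothetical names
Tbl.Species = species;               % response variable stored in the same table
Mdl = fitctree(Tbl,'Species');

% Same predictors in reversed order; the response column is omitted here.
Xreordered = Tbl(1:5,{'PetalWidth','PetalLength','SepalWidth','SepalLength'});
labelReordered = predict(Mdl,Xreordered);

% Original order, with the extra Species column that predict ignores.
labelOriginal = predict(Mdl,Tbl(1:5,:));
```

Under the rules stated above, labelReordered and labelOriginal should contain the same predictions.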

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.


Pruning level, specified as the comma-separated pair consisting of 'Subtrees' and a vector of nonnegative integers in ascending order or 'all'.

If you specify a vector, then all elements must be at least 0 and at most max(Mdl.PruneList). 0 indicates the full, unpruned tree and max(Mdl.PruneList) indicates the completely pruned tree (i.e., just the root node).

If you specify 'all', then CompactClassificationTree.predict operates on all subtrees (i.e., the entire pruning sequence). This specification is equivalent to using 0:max(Mdl.PruneList).

CompactClassificationTree.predict prunes Mdl to each level indicated in Subtrees, and then estimates the corresponding output arguments. The size of Subtrees determines the size of some output arguments.

To invoke Subtrees, the properties PruneList and PruneAlpha of Mdl must be nonempty. In other words, grow Mdl by setting 'Prune','on', or by pruning Mdl using prune.

Example: 'Subtrees','all'
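A brief sketch of the two ways to specify Subtrees, using the iris data as an assumed example. It checks that 'all' and the explicit vector 0:max(Mdl.PruneList) give the same labels, and predicts with the completely pruned tree.

```matlab
% Sketch: 'Subtrees','all' versus an explicit vector of pruning levels.
load fisheriris
Mdl = fitctree(meas,species,'Prune','on');   % ensures PruneList is nonempty
levels = 0:max(Mdl.PruneList);               % full tree (0) up to root only

labelAll = predict(Mdl,meas,'Subtrees','all');
labelVec = predict(Mdl,meas,'Subtrees',levels);
sameResult = isequal(labelAll,labelVec);     % the two specifications are equivalent

% The completely pruned tree is just the root node, so every observation
% receives the same label.
labelRoot = predict(Mdl,meas,'Subtrees',max(Mdl.PruneList));
numDistinctRootLabels = numel(unique(labelRoot));
```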

Output Arguments

label

Predicted class labels, returned as a vector or array. Each entry of label corresponds to the class with minimal expected cost for the corresponding row of X.

Suppose Subtrees is a numeric vector containing T elements (for 'all', see Subtrees), and X has N rows.

  • If the response data type is char and:

    • T = 1, then label is a character matrix containing N rows. Each row contains the predicted label produced by subtree Subtrees.

    • T > 1, then label is an N-by-T cell array.

  • Otherwise, label is an N-by-T array having the same data type as the response.

In the latter two cases, column j of label contains the vector of predicted labels produced by subtree Subtrees(j).
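The minimal-expected-cost rule for label can be checked with a short sketch. It assumes the default misclassification costs, that Mdl.Cost(i,j) is the cost of predicting class j when the true class is i, and that cnum indexes the columns of score. These conventions are assumptions made for this example.

```matlab
% Sketch: compare the expected cost of the predicted class with the smallest
% expected cost over all classes, for every observation.
load fisheriris
Mdl = fitctree(meas,species);
[label,score,~,cnum] = predict(Mdl,meas);

expCost = score*Mdl.Cost;                    % N-by-K expected misclassification costs
minCost = min(expCost,[],2);                 % smallest expected cost in each row

% Expected cost of the class that predict actually chose for each row
rowIdx = (1:size(expCost,1))';
costOfPrediction = expCost(sub2ind(size(expCost),rowIdx,cnum));

maxGap = max(abs(costOfPrediction - minCost)); % zero if predict picks a minimizer
```

With default costs, minimizing the expected cost amounts to choosing the class with the largest posterior probability.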

score

Posterior probabilities, returned as a numeric matrix of size N-by-K, where N is the number of observations (rows) in X, and K is the number of classes (in Mdl.ClassNames). score(i,j) is the posterior probability that row i of X is of class j.

If Subtrees has T elements, and X has N rows, then score is an N-by-K-by-T array, and node and cnum are N-by-T matrices.
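The output dimensions described above can be verified with a small sketch. The pruning levels [0 1] are an arbitrary choice for illustration and assume the tree has at least one pruning level beyond the full tree.

```matlab
% Sketch: output sizes when Subtrees has T elements.
load fisheriris
Mdl = fitctree(meas,species,'Prune','on');
subtrees = [0 1];                            % illustrative pruning levels
[label,score,node,cnum] = predict(Mdl,meas,'Subtrees',subtrees);

N = size(meas,1);                            % number of observations
K = numel(Mdl.ClassNames);                   % number of classes
T = numel(subtrees);                         % number of subtrees

sizesMatch = isequal(size(label),[N T]) && ...
             isequal(size(score),[N K T]) && ...
             isequal(size(node),[N T]) && ...
             isequal(size(cnum),[N T]);
```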

node

Node numbers for the predicted classes, returned as a numeric vector. Each entry corresponds to the predicted node in Mdl for the corresponding row of X.

cnum

Class numbers corresponding to the predicted labels, returned as a numeric vector. Each entry of cnum corresponds to a predicted class number for the corresponding row of X.
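As a hedged illustration of how cnum relates to label, the sketch below assumes that each class number is an index into Mdl.ClassNames. That interpretation is an assumption of this example.

```matlab
% Sketch: map class numbers back to class names and compare with label.
load fisheriris
Mdl = fitctree(meas,species);
[label,~,node,cnum] = predict(Mdl,meas(1:10,:));

namesFromCnum = Mdl.ClassNames(cnum);        % look up names by class number
namesMatch = isequal(namesFromCnum(:),label(:));
```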

Examples


Examine predictions for a few rows in a data set left out of training.

Load Fisher's iris data set.

load fisheriris

Partition the data into training (50%) and validation (50%) sets.

n = size(meas,1);
rng(1) % For reproducibility
idxTrn = false(n,1);
idxTrn(randsample(n,round(0.5*n))) = true; % Training set logical indices
idxVal = idxTrn == false;                  % Validation set logical indices

Grow a classification tree using the training set.

Mdl = fitctree(meas(idxTrn,:),species(idxTrn));

Predict labels for the validation data. Count the number of misclassified observations.

label = predict(Mdl,meas(idxVal,:));
label(randsample(numel(label),5)) % Display several predicted labels
numMisclass = sum(~strcmp(label,species(idxVal)))
ans =

  5×1 cell array

    'setosa'
    'setosa'
    'setosa'
    'virginica'
    'versicolor'


numMisclass =

     3

The software misclassifies three out-of-sample observations.

Load Fisher's iris data set.

load fisheriris

Partition the data into training (50%) and validation (50%) sets.

n = size(meas,1);
rng(1) % For reproducibility
idxTrn = false(n,1);
idxTrn(randsample(n,round(0.5*n))) = true; % Training set logical indices
idxVal = idxTrn == false;                  % Validation set logical indices

Grow a classification tree using the training set, and then view it.

Mdl = fitctree(meas(idxTrn,:),species(idxTrn));
view(Mdl,'Mode','graph')

The resulting tree has four levels.

Estimate posterior probabilities for the validation set using subtrees pruned to levels 1 and 3.

[~,Posterior] = predict(Mdl,meas(idxVal,:),'Subtrees',[1 3]);
Mdl.ClassNames
% Display several posterior probabilities
Posterior(randsample(size(Posterior,1),5),:,:)
ans =

  3×1 cell array

    'setosa'
    'versicolor'
    'virginica'


ans(:,:,1) =

    1.0000         0         0
    1.0000         0         0
    1.0000         0         0
         0         0    1.0000
         0    0.8571    0.1429


ans(:,:,2) =

    0.3733    0.3200    0.3067
    0.3733    0.3200    0.3067
    0.3733    0.3200    0.3067
    0.3733    0.3200    0.3067
    0.3733    0.3200    0.3067

The elements of Posterior are class posterior probabilities:

  • Rows correspond to observations in the validation set.

  • Columns correspond to the classes as listed in Mdl.ClassNames.

  • Pages correspond to the subtrees.

The subtree pruned to level 1 is more confident in its predictions than the subtree pruned to level 3 (i.e., the root node), which assigns the same posterior probabilities to every observation.

Definitions


Algorithms

predict generates predictions by following the branches of Mdl until it reaches a leaf node or a missing value. If predict reaches a leaf node, it returns the classification of that node.

If predict reaches a node with a missing value for a predictor, its behavior depends on the setting of the Surrogate name-value pair when fitctree constructs Mdl.

  • Surrogate = 'off' (default): predict returns the label with the largest number of training samples that reach the node.

  • Surrogate = 'on': predict uses the best surrogate split at the node. If all surrogate split variables with positive predictive measure of association are missing, predict returns the label with the largest number of training samples that reach the node. For a definition, see Predictive Measure of Association.
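The two missing-value behaviors can be compared with a small sketch. It assumes the iris data and hides the two petal measurements of one observation. Whether the trees actually split on those predictors depends on the fitted model, so no particular output is claimed.

```matlab
% Sketch: predict on an observation with missing predictor values, using
% trees grown with and without surrogate splits.
load fisheriris
MdlOff = fitctree(meas,species);                     % Surrogate is 'off' by default
MdlOn  = fitctree(meas,species,'Surrogate','on');    % store surrogate splits

x = meas(100,:);                                     % one observation
x([3 4]) = NaN;                                      % hide petal length and width

labelOff = predict(MdlOff,x);   % falls back to the majority label at the node reached
labelOn  = predict(MdlOn,x);    % tries surrogate splits on the remaining predictors
```

Comparing labelOff and labelOn with species(100) shows whether the surrogate splits recover the label in this case.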

Extended Capabilities

Introduced in R2011a
