Class: ClassificationPartitionedModel
Predict response for observations not used for training
label = kfoldPredict(obj)
[label,score] = kfoldPredict(obj)
[label,score,cost] = kfoldPredict(obj)
label = kfoldPredict(obj) returns class labels predicted by obj, a cross-validated classification. For every fold, kfoldPredict predicts class labels for in-fold observations using a model trained on out-of-fold observations.

[label,score] = kfoldPredict(obj) returns the predicted classification scores for in-fold observations using a model trained on out-of-fold observations.

[label,score,cost] = kfoldPredict(obj) returns misclassification costs.
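The three calling forms can be tried on a small data set. The following is a minimal sketch, assuming the fisheriris sample data that ships with Statistics and Machine Learning Toolbox and a classification tree as the underlying model; the variable names cvmodel and errRate are illustrative and do not come from this page.

```matlab
% Sketch only: fisheriris and a tree learner are assumptions for illustration.
load fisheriris                     % meas (150-by-4), species (150-by-1 cell)
rng(0,'twister')                    % make the fold assignment reproducible

tree = fitctree(meas,species);      % train a classification tree
cvmodel = crossval(tree);           % 10-fold ClassificationPartitionedModel

% Each observation is predicted by the fold model that did not train on it.
[label,score,cost] = kfoldPredict(cvmodel);

% Fraction of observations whose cross-validated label is wrong.
errRate = mean(~strcmp(label,species));
```

Here label has one entry per observation, and score and cost each have one row per observation and one column per class.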

Object of class ClassificationPartitionedModel.

Vector of class labels of the same type as the response data used in training.

Numeric matrix of size 

Numeric matrix of misclassification costs of size 
The average misclassification cost is the mean misclassification cost for predictions made by the cross-validated classifiers trained on out-of-fold observations. The matrix of expected costs per observation is defined in Cost.
For discriminant analysis, the score of a classification is the posterior probability of the classification. For the definition of posterior probability in discriminant analysis, see Posterior Probability.
For ensembles, a classification score represents the confidence of a classification into a class. The higher the score, the higher the confidence.
Different ensemble algorithms have different definitions for their scores. Furthermore, the range of scores depends on ensemble type. For example:
AdaBoostM1 scores range from –∞ to ∞.
Bag scores range from 0 to 1.
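The score range for bagged ensembles can be checked empirically. This is a rough sketch, assuming the fisheriris sample data and fitcensemble (available in recent toolbox releases); note that cross-validating an ensemble yields a partitioned ensemble object rather than a ClassificationPartitionedModel, and the choice of 20 learning cycles is arbitrary.

```matlab
% Sketch only: data set, learner count, and variable names are assumptions.
load fisheriris
rng(0,'twister')                                   % for reproducibility

ens = fitcensemble(meas,species,'Method','Bag', ...
    'NumLearningCycles',20);                       % bagged trees
cvens = crossval(ens);                             % 10-fold cross-validation
[~,score] = kfoldPredict(cvens);                   % cross-validated scores

% For Bag, every score is expected to lie between 0 and 1.
scoreRange = [min(score(:)) max(score(:))];
```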
For trees, the score of a classification of a leaf node is the posterior probability of the classification at that node. The posterior probability of the classification at a node is the number of training sequences that lead to that node with the classification, divided by the number of training sequences that lead to that node.
For example, consider classifying a predictor X as true when X < 0.15 or X > 0.95, and as false otherwise.
Generate 100 random points and classify them:
rng(0,'twister') % for reproducibility
X = rand(100,1);
Y = (abs(X - .55) > .4);
tree = fitctree(X,Y);
view(tree,'Mode','Graph')
Prune the tree:
tree1 = prune(tree,'Level',1);
view(tree1,'Mode','Graph')
The pruned tree correctly classifies observations that are less than 0.15 as true. It also correctly classifies observations from 0.15 to 0.95 as false. However, it incorrectly classifies observations that are greater than 0.95 as false. Therefore, the score for observations that are greater than 0.15 should be about .05/.85 = .06 for true, and about .8/.85 = .94 for false.
Compute the prediction scores for the first 10 rows of X:
[~,score] = predict(tree1,X(1:10));
[score X(1:10,:)]
ans =
    0.9059    0.0941    0.8147
    0.9059    0.0941    0.9058
         0    1.0000    0.1270
    0.9059    0.0941    0.9134
    0.9059    0.0941    0.6324
         0    1.0000    0.0975
    0.9059    0.0941    0.2785
    0.9059    0.0941    0.5469
    0.9059    0.0941    0.9575
    0.9059    0.0941    0.9649
Indeed, every value of X (the rightmost column) that is less than 0.15 has associated scores (the left and center columns) of 0 and 1, while the other values of X have associated scores of 0.91 and 0.09. The difference (score 0.09 instead of the expected .06) is due to a statistical fluctuation: there are 8 observations in X in the range (.95,1) instead of the expected 5 observations.
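The leaf score can also be recomputed by hand from the node-count definition of posterior probability. The following is a minimal sketch that regenerates the same random data and counts class members among observations above 0.15; it assumes the pruned tree splits near 0.15, and the names inLeaf, pTrue, and pFalse are illustrative.

```matlab
% Sketch only: assumes the pruned tree's single split falls near X = 0.15.
rng(0,'twister')                 % same seed as the example
X = rand(100,1);
Y = (abs(X - .55) > .4);         % true when X < 0.15 or X > 0.95

inLeaf = X > .15;                % observations routed to the large leaf
nLeaf  = sum(inLeaf);            % number of training points in that leaf

% Posterior probability = class count in the leaf / total count in the leaf.
pTrue  = sum(Y(inLeaf)) / nLeaf;
pFalse = sum(~Y(inLeaf)) / nLeaf;
```

If the assumption about the split holds, pTrue and pFalse should be close to the two nonzero scores reported by predict for observations above 0.15.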
ClassificationPartitionedModel
 crossval
 kfoldEdge
 kfoldfun
 kfoldLoss
 kfoldMargin