Out-of-bag classification error
L = oobLoss(ens)
L = oobLoss(ens,Name,Value)
L = oobLoss(ens) returns the out-of-bag classification error for the ensemble ens.
L = oobLoss(ens,Name,Value) computes the error with additional options specified by one or more Name,Value pair arguments. You can specify several name-value pair arguments in any order as Name1,Value1,…,NameN,ValueN.
ens — A classification bagged ensemble, constructed with fitensemble.
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,…,NameN,ValueN.
learners — Indices of weak learners in the ensemble ranging from 1 to ens.NumTrained. oobLoss uses only these learners for calculating loss. Default: 1:NumTrained
lossfun — Function handle or string representing a loss function. Built-in loss functions: 'binodeviance', 'classiferror', 'exponential', 'hinge', and 'mincost'. You can write your own loss function in the syntax described in Loss Functions. Default: 'classiferror'
mode — String representing the meaning of the output L:
'ensemble' — L is a scalar value, the loss for the entire ensemble.
'individual' — L is a vector with one element per trained learner.
'cumulative' — L is a vector in which element J is obtained by using learners 1:J from the input list of learners.
Default: 'ensemble'
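For instance, the following sketch shows each name-value pair in use. It assumes ens is a classification bagged ensemble, such as the one constructed in the example at the end of this page:

L1 = oobLoss(ens,'lossfun','exponential')   % built-in exponential loss
L2 = oobLoss(ens,'learners',1:50)           % use only the first 50 weak learners
L3 = oobLoss(ens,'mode','individual')       % vector with one loss per learner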
L — Classification error of the out-of-bag observations, a scalar.
Bagging, which stands for "bootstrap aggregation", is a type of ensemble learning. To bag a weak learner such as a decision tree on a dataset, fitensemble generates many bootstrap replicas of the dataset and grows decision trees on these replicas. fitensemble obtains each bootstrap replica by randomly selecting N observations out of N with replacement, where N is the dataset size. To find the predicted response of a trained ensemble, predict takes an average over predictions from the individual trees.

Drawing N out of N observations with replacement omits on average 37% (1/e) of the observations for each decision tree. These are the "out-of-bag" observations.
For each observation,
oobLoss estimates the out-of-bag
prediction by averaging over predictions from all trees in the ensemble
for which this observation is out of bag. It then compares the computed
prediction against the true response for this observation. It calculates
the out-of-bag error by comparing the out-of-bag predicted responses
against the true responses for all observations used for training.
This out-of-bag average is an unbiased estimator of the true ensemble error.
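As a quick empirical check of the 37% figure, consider this minimal sketch (the variable names are made up for illustration):

N = 1e5;                                 % dataset size
idx = randi(N,N,1);                      % draw N out of N with replacement
fracOmitted = 1 - numel(unique(idx))/N   % approximately 1/e, about 0.37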
The built-in loss functions are:
'binodeviance' — For binary classification, assume the classes yn are -1 and 1. With weight vector w normalized to have sum 1, and predictions of row n of the data X as f(Xn), the binomial deviance is
$$\sum_{n} w_n \log\bigl(1 + \exp(-2\,y_n f(X_n))\bigr).$$
'classiferror' — Fraction
of misclassified data, weighted by w.
'exponential' — With the same definitions as for 'binodeviance', the exponential loss is
$$\sum_{n} w_n \exp\bigl(-y_n f(X_n)\bigr).$$
'hinge' — Classification error measure that has the form
$$L = \frac{\sum_{j} w_j \max\{0,\ 1 - y_j' f(X_j)\}}{\sum_{j} w_j},$$
where:
wj is weight j.
For binary classification, yj = 1 for the positive class and -1 for the negative class. For problems where the number of classes K ≥ 3, yj is a vector of 0s, but with a 1 in the position corresponding to the true class. For example, if the second observation is in the third class and K = 4, then y2 = [0 0 1 0]′.
f(Xj) is, for binary classification, the posterior probability or, for K ≥ 3, a vector of posterior probabilities for each class, given observation j. (A short numeric sketch of the binary-classification losses follows this list.)
'mincost' — Predict the label with the smallest expected misclassification cost, with expectation taken over the posterior probability, and cost as given by the Cost property of the classifier (a matrix). The loss is then the true misclassification cost averaged over the observations.
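The following minimal sketch evaluates the three binary-classification losses above on made-up values; y, f, and w are assumptions for illustration, not part of the API:

y = [1; -1; 1; 1];                            % true classes, coded as -1 and 1
f = [0.8; 0.3; -0.2; 1.5];                    % ensemble predictions f(Xn)
w = ones(4,1)/4;                              % weights normalized to sum to 1
binodev = sum(w .* log(1 + exp(-2*y.*f)))     % binomial deviance
expLoss = sum(w .* exp(-y.*f))                % exponential loss
hinge   = sum(w .* max(0, 1 - y.*f))/sum(w)   % hinge loss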
To write your own loss function, create a function file of the form
function loss = lossfun(C,S,W,COST)
N is the number of rows of ens.X.
K is the number of classes in ens, represented in ens.ClassNames.
C is an N-by-K logical matrix, with one true per row for the true class. The index for each class is its position in ens.ClassNames.
S is an N-by-K numeric matrix of posterior probabilities for the classes, with one row per observation, similar to the score output from predict.
W is a numeric vector with N elements, the observation weights.
COST is a K-by-K numeric matrix of misclassification costs. The default 'classiferror' loss function uses a cost of 0 for correct classification, and 1 for misclassification. In other words, the default COST is ones(K) - eye(K).
The output loss should be a scalar.
Pass the function handle @lossfun as the value of the lossfun name-value pair.
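For example, here is a sketch of a custom loss function (the name exampleLossFun is hypothetical) that reproduces the behavior of the built-in 'classiferror' loss:

function loss = exampleLossFun(C,S,W,COST) %#ok<INUSD>
% Weighted fraction of misclassified observations (COST is unused here).
[~,trueClass] = max(C,[],2);    % column index of the true class per row
[~,predClass] = max(S,[],2);    % predicted class = largest posterior
W = W/sum(W);                   % normalize the observation weights
loss = sum(W(trueClass ~= predClass));
end

Then call, for instance, L = oobLoss(ens,'lossfun',@exampleLossFun).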
Find the out-of-bag error for a bagged ensemble from the Fisher iris data:
load fisheriris
ens = fitensemble(meas,species,'Bag',100,...
    'Tree','type','classification');
L = oobLoss(ens)

L =
    0.0467
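To see how the out-of-bag error evolves as trees are added, one possible follow-up (a sketch using the 'mode' name-value pair described above) is:

plot(oobLoss(ens,'mode','cumulative'))
xlabel('Number of grown trees')
ylabel('Out-of-bag classification error')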