m = margin(tree,TBL,ResponseVarName)
m = margin(tree,TBL,Y)
m = margin(tree,X,Y)
tree— Trained classification tree
ClassificationTree model object | CompactClassificationTree model object
TBL— Sample data
Sample data, specified as a table. Each row of TBL corresponds
to one observation, and each column corresponds to one predictor variable.
TBL can contain additional columns
for the response variable and observation weights. TBL must
contain all the predictors used to train tree.
Multi-column variables and cell arrays other than cell arrays of character
vectors are not allowed.
If you train tree using sample data contained in a
table, then the input data for this method
must also be in a table.
X— Data to classify
ResponseVarName— Response variable name
If you specify ResponseVarName, then you
must do so as a character vector. For example, if the response variable
is stored as TBL.Response, then specify it as 'Response'.
Otherwise, the software treats all columns of TBL,
including TBL.ResponseVarName, as predictors.
The response variable must be a categorical or character array, logical or numeric vector, or cell array of character vectors. If the response variable is a character array, then each element must correspond to one row of the array.
Y— Class labels
Class labels, specified as a categorical or character array,
a logical or numeric vector, or a cell array of character vectors. Y must
be of the same type as the classification used to train tree,
and its number of elements must equal the number of rows of TBL or X.
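The table-based syntax described above can be sketched as follows. This is a minimal illustration, not part of the original page: the variable names SL, SW, PL, PW, and Species are hypothetical labels chosen here for the Fisher iris measurements.

```matlab
% Sketch: compute margins with the table syntax margin(tree,TBL,ResponseVarName).
% The column names below are assumptions made for this illustration.
load fisheriris
TBL = array2table(meas,'VariableNames',{'SL','SW','PL','PW'});
TBL.Species = species;               % response stored in the same table

tree = fitctree(TBL,'Species');      % train on the table
m = margin(tree,TBL,'Species');      % response named as a character vector
size(m)                              % one margin per row of TBL
```

Because the tree is trained on a table, the data passed to margin is also a table, as the TBL description requires.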
Compute the classification margin for the Fisher iris data, trained on its first two columns of data, and view the last 11 entries.
    load fisheriris
    X = meas(:,1:2);
    tree = fitctree(X,species);
    M = margin(tree,X,species);
    M(end-10:end)

    ans =
        0.1111
        0.1111
        0.1111
       -0.2857
        0.6364
        0.6364
        0.1111
        0.7500
        1.0000
        0.6364
        0.2000
The classification tree trained on all four columns of data gives generally larger margins for the same observations.
    tree = fitctree(meas,species);
    M = margin(tree,meas,species);
    M(end-10:end)

    ans =
        0.9565
        0.9565
        0.9565
        0.9565
        0.9565
        0.9565
        0.9565
        0.9565
        0.9565
        0.9565
        0.9565
The classification margin is the difference
between the classification score for the true
class and the maximal classification score for the false classes. Margin
is a column vector with the same number of rows as in the matrix X.
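To make the definition concrete, the following sketch recomputes the margins by hand from the scores that predict returns and compares them with the output of margin. It assumes that the columns of the score matrix follow the order of tree.ClassNames; the helper variable names (trueIdx, lin, mManual) are introduced here for illustration only.

```matlab
% Sketch: margin = score of the true class minus the largest score
% among the other classes, computed directly from predict.
load fisheriris
X = meas(:,1:2);
tree = fitctree(X,species);

[~,score] = predict(tree,X);                      % n-by-K class scores
[~,trueIdx] = ismember(species,tree.ClassNames);  % column of the true class
n = numel(species);
lin = sub2ind(size(score),(1:n)',trueIdx);        % linear index of true-class entries

trueScore = score(lin);                           % score for the true class
score(lin) = -Inf;                                % exclude the true class
mManual = trueScore - max(score,[],2);            % best score among false classes

maxDiff = max(abs(mManual - margin(tree,X,species)))
```

If the manual computation matches the definition, maxDiff should be zero up to rounding error.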
For trees, the score of a classification at a leaf node is the posterior probability of that classification at the node. The posterior probability of a classification at a node is the number of training observations that lead to that node with that classification, divided by the total number of training observations that lead to that node.
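In symbols, the two definitions above can be written as follows. The notation is introduced here for illustration and does not appear in the original page: t is the leaf node that an observation x reaches, N(t) is the number of training observations that lead to t, N_k(t) is the number of those with class k, and y is the true class of x.

```latex
\[
s_k(x) = \hat{P}(k \mid t) = \frac{N_k(t)}{N(t)},
\qquad
m(x, y) = s_y(x) - \max_{k \neq y} s_k(x)
\]
```

Since each score lies between 0 and 1, the margin lies between -1 and 1, and a negative margin means that some false class scored higher than the true class.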
For example, consider classifying a predictor X as true when X < 0.15 or X > 0.95, and as false otherwise.
Generate 100 random points and classify them:
    rng(0,'twister') % for reproducibility
    X = rand(100,1);
    Y = (abs(X - .55) > .4);
    tree = fitctree(X,Y);
    view(tree,'Mode','Graph')
Prune the tree:
    tree1 = prune(tree,'Level',1);
    view(tree1,'Mode','Graph')
The pruned tree correctly classifies observations that are less
than 0.15 as true. It also correctly classifies
observations from .15 to .94 as false. However,
it incorrectly classifies observations that are greater than .94 as false.
Therefore, the score for observations that are greater than .15 should
be about .05/.85=.06 for true, and about .8/.85=.94 for false.
Compute the prediction scores for the first 10 rows of X:
    [~,score] = predict(tree1,X(1:10));
    [score X(1:10,:)]

    ans =
        0.9059    0.0941    0.8147
        0.9059    0.0941    0.9058
             0    1.0000    0.1270
        0.9059    0.0941    0.9134
        0.9059    0.0941    0.6324
             0    1.0000    0.0975
        0.9059    0.0941    0.2785
        0.9059    0.0941    0.5469
        0.9059    0.0941    0.9575
        0.9059    0.0941    0.9649
Indeed, every value of X (the right-most
column) that is less than 0.15 has associated scores (the left and
center columns) of 0 and 1,
while the other values of X have associated scores
of 0.91 and 0.09. The difference
(a score of 0.09 instead of the expected .06)
is due to a statistical fluctuation: there are more observations
of X in the range (.95,1) than
the expected 5 out of 100.
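The fluctuation can be checked directly by counting observations. The sketch below regenerates the same sample and assumes that the right-hand leaf of the pruned tree holds exactly the observations with X > 0.15. The names inLeaf, nLeaf, nTrue, and fracTrue are introduced here for illustration.

```matlab
% Sketch: empirical posterior for true among observations above 0.15.
rng(0,'twister')            % same seed as in the example
X = rand(100,1);
Y = (abs(X - .55) > .4);

inLeaf = X > 0.15;          % assumption: these observations share one leaf
nLeaf = sum(inLeaf);        % observations that reach that leaf
nTrue = sum(Y(inLeaf));     % of those, labeled true (X > 0.95)
fracTrue = nTrue/nLeaf      % compare with the center column of the scores
```

Under that assumption, fracTrue should agree with the score reported for true in the rows where X exceeds 0.15.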
This function fully supports tall arrays. For more information, see Tall Arrays (MATLAB).