E = edge(tree,X,Y)
E = edge(tree,X,Y,Name,Value)
A matrix where each row represents an observation, and each column represents a predictor. The number of columns in X must equal the number of predictors in tree.
Class labels, with the same data type as the class labels in tree. The number of elements of Y must equal the number of rows of X.
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Observation weights, a numeric vector of length size(X,1). If you supply weights, edge computes the weighted classification edge.
The classification margin is the difference between the classification score for the true class and maximal classification score for the false classes. Margin is a column vector with the same number of rows as the matrix X.
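The definition above can be written compactly. The symbols here are introduced only for illustration and do not appear in the original text: s_i(k) is the score of observation i for class k, and y_i is its true class.

```latex
m_i = s_i(y_i) - \max_{k \neq y_i} s_i(k)
```

Under this reading, a positive margin means the true class has the highest score, and a negative margin means some false class outscored it.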
For trees, the score of a classification at a leaf node is the posterior probability of the classification at that node. The posterior probability of the classification at a node is the number of training observations that lead to that node with the classification, divided by the total number of training observations that lead to that node.
For example, consider classifying a predictor X as true when X < 0.15 or X > 0.95, and as false otherwise.
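As a minimal sketch of that ratio, the following uses a made-up set of labels for the observations reaching one leaf. The variable names and the label values are hypothetical; only base MATLAB is used.

```matlab
% Hypothetical class labels of the training observations that reach one leaf.
leafLabels = logical([0 0 0 1 0 0 1 0 0 0]);   % 2 true, 8 false

% Posterior probability of a class at the leaf = class count / leaf size.
nLeaf     = numel(leafLabels);
postFalse = sum(~leafLabels) / nLeaf;
postTrue  = sum(leafLabels)  / nLeaf;

% Scores in class order [false true].
leafScore = [postFalse postTrue];
disp(leafScore)
```

Every observation that lands in this leaf would receive the same pair of scores, which always sums to 1.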
Generate 100 random points and classify them:
rng(0,'twister') % for reproducibility
X = rand(100,1);
Y = (abs(X - .55) > .4);
tree = fitctree(X,Y);
view(tree,'Mode','Graph')
Prune the tree:
tree1 = prune(tree,'Level',1);
view(tree1,'Mode','Graph')
The pruned tree correctly classifies observations that are less than 0.15 as true. It also correctly classifies observations from .15 to .94 as false. However, it incorrectly classifies observations that are greater than .94 as false. Therefore, the score for observations that are greater than .15 should be about .05/.85=.06 for true, and about .8/.85=.94 for false.
Compute the prediction scores for the first 10 rows of X:
[~,score] = predict(tree1,X(1:10));
[score X(1:10,:)]

ans =
    0.9059    0.0941    0.8147
    0.9059    0.0941    0.9058
         0    1.0000    0.1270
    0.9059    0.0941    0.9134
    0.9059    0.0941    0.6324
         0    1.0000    0.0975
    0.9059    0.0941    0.2785
    0.9059    0.0941    0.5469
    0.9059    0.0941    0.9575
    0.9059    0.0941    0.9649
Indeed, every value of X (the right-most column) that is less than 0.15 has associated scores (the left and center columns) of 0 and 1, while the other values of X have associated scores of 0.91 and 0.09. The difference (score 0.09 instead of the expected .06) is due to a statistical fluctuation: there are 8 observations in X in the range (.95,1) instead of the expected 5 observations.
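To connect these scores to the margin definition, the sketch below recomputes margins by hand for the ten rows shown. The scores and X values are copied from the output; the true labels are regenerated with the same rule used to create Y. It assumes the score columns are ordered [false true], which matches the description above, and it uses only base MATLAB.

```matlab
% Scores copied from the displayed output; columns assumed to be [P(false) P(true)].
score = [0.9059 0.0941
         0.9059 0.0941
         0      1.0000
         0.9059 0.0941
         0.9059 0.0941
         0      1.0000
         0.9059 0.0941
         0.9059 0.0941
         0.9059 0.0941
         0.9059 0.0941];
x = [0.8147; 0.9058; 0.1270; 0.9134; 0.6324; 0.0975; 0.2785; 0.5469; 0.9575; 0.9649];
y = abs(x - 0.55) > 0.4;            % true labels, same rule as in the example

% Column of the true class: 1 for false, 2 for true.
trueCol  = double(y) + 1;
n        = numel(y);
idxTrue  = sub2ind(size(score), (1:n)', trueCol);
idxFalse = sub2ind(size(score), (1:n)', 3 - trueCol);

% With two classes, the maximal false-class score is simply the other column.
m = score(idxTrue) - score(idxFalse);
disp([x y m])
```

The two observations above 0.95 get a negative margin because the pruned tree assigns them to the wrong class, while the observations below 0.15 get the maximum margin of 1.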
The edge is the weighted mean value of the classification margin. The weights are the class probabilities in tree.Prior. If you supply weights in the weights name-value pair, those weights are normalized to sum to the prior probabilities in the respective classes, and are then used to compute the weighted average.
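The normalization described above can be sketched in base MATLAB. This is one reading of the description, not the toolbox's internal code; the margins, class indices, priors, and raw weights below are hypothetical values chosen for illustration.

```matlab
% Hypothetical margins, class indices, class priors, and raw observation weights.
m     = [1; 0.5; -0.2; 0.8];
c     = [1; 1; 2; 2];
prior = [0.5 0.5];
w     = [1; 3; 1; 1];

% Rescale the weights within each class so that they sum to that class's prior.
for k = 1:numel(prior)
    inClass = (c == k);
    w(inClass) = w(inClass) * prior(k) / sum(w(inClass));
end

% Edge = weighted mean of the margins.
E = sum(w .* m) / sum(w);
disp(E)
```

After rescaling, the weights are 0.125, 0.375, 0.25, and 0.25, so the weighted mean of the margins is 0.4625.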
Compute the classification margin and edge for the Fisher iris data, trained on its first two columns of data, and view the last 11 entries of the margin:
load fisheriris
X = meas(:,1:2);
tree = fitctree(X,species);
E = edge(tree,X,species)

E = 0.6299

M = margin(tree,X,species);
M(end-10:end)

ans =
    0.1111
    0.1111
    0.1111
   -0.2857
    0.6364
    0.6364
    0.1111
    0.7500
    1.0000
    0.6364
    0.2000
The classification tree trained on all four columns of the data is better: its edge is higher, and the last 11 margins are all close to 1.
tree = fitctree(meas,species);
E = edge(tree,meas,species)

E = 0.9384

M = margin(tree,meas,species);
M(end-10:end)

ans =
    0.9565
    0.9565
    0.9565
    0.9565
    0.9565
    0.9565
    0.9565
    0.9565
    0.9565
    0.9565
    0.9565
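As a quick consistency check between the two quantities, the sketch below compares the edge with the plain mean of the margins. It assumes the default empirical priors and no observation weights, in which case every observation should carry equal weight; it requires Statistics and Machine Learning Toolbox.

```matlab
load fisheriris
tree = fitctree(meas,species);

M = margin(tree,meas,species);   % one margin per observation
E = edge(tree,meas,species);     % weighted mean of the margins

% With equal weights, the edge is expected to match the unweighted mean.
disp([E mean(M)])
```

If priors or observation weights are changed from their defaults, the two numbers are no longer expected to agree.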