edge

Class: CompactClassificationTree

Classification edge

Syntax

  • E = edge(tree,TBL,ResponseVarName)
  • E = edge(tree,X,Y)
  • E = edge(___,Name,Value)

Description

E = edge(tree,TBL,ResponseVarName) returns the classification edge for tree, computed on the predictor data in TBL and the true class labels in TBL.ResponseVarName.

E = edge(tree,X,Y) returns the classification edge for tree, computed on the predictor data in X and the true class labels in Y.

E = edge(___,Name,Value) computes the edge with additional options specified by one or more Name,Value pair arguments, using any of the previous syntaxes. For example, you can specify observation weights.

Input Arguments

tree — Trained classification tree
ClassificationTree model object | CompactClassificationTree model object

Trained classification tree, specified as a ClassificationTree or CompactClassificationTree model object. That is, tree is a trained classification model returned by fitctree or compact.

TBL — Sample data
table

Sample data, specified as a table. Each row of TBL corresponds to one observation, and each column corresponds to one predictor variable. Optionally, TBL can contain additional columns for the response variable and observation weights. TBL must contain all the predictors used to train tree. Multi-column variables and cell arrays other than cell arrays of strings are not allowed.

If TBL contains the response variable used to train tree, then you do not need to specify ResponseVarName or Y.

If you train tree using sample data contained in a table, then the input data for this method must also be in a table.

Data Types: table

X — Data to classify
numeric matrix

Data to classify, specified as a numeric matrix. Each row of X represents one observation, and each column represents one predictor. X must have the same number of columns as the data used to train tree. X must have the same number of rows as the number of elements in Y.

Data Types: single | double

ResponseVarName — Response variable name
name of a variable in TBL

Response variable name, specified as the name of a variable in TBL. If TBL contains the response variable used to train tree, then you do not need to specify ResponseVarName.

If you specify ResponseVarName, then you must do so as a string. For example, if the response variable is stored as TBL.Response, then specify it as 'Response'. Otherwise, the software treats all columns of TBL, including TBL.ResponseVarName, as predictors.

The response variable must be a categorical or character array, a logical or numeric vector, or a cell array of strings. If the response variable is a character array, then each row of the array must correspond to one class label.
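The table syntax can be sketched as follows. The variable names SL, SW, PL, PW, and Species are hypothetical labels chosen for this illustration and are not part of the fisheriris data set; the sketch also assumes a release in which fitctree accepts a table and a response variable name.

```matlab
load fisheriris

% Build a table of predictors; the variable names are illustrative only
TBL = array2table(meas,'VariableNames',{'SL','SW','PL','PW'});
TBL.Species = species;            % add the response as a table variable

% Train on the table, then pass the same table and response name to edge
tree = fitctree(TBL,'Species');
E = edge(tree,TBL,'Species')
```

Because tree is trained on a table, the data passed to edge must also be a table that contains all four predictor variables.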

Y — Class labels
categorical array | character array | logical vector | vector of numeric values | cell array of strings

Class labels, specified as a categorical or character array, a logical or numeric vector, or a cell array of strings. Y must be of the same type as the classification used to train tree, and its number of elements must equal the number of rows of X.

Data Types: single | double | categorical | char | logical | cell

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'Weights' — Observation weights
ones(size(X,1),1) (default) | name of a variable in TBL | numeric vector

Observation weights, specified as the comma-separated pair consisting of 'Weights' and a numeric vector or the name of a variable in TBL.

If you specify Weights as a numeric vector, then the number of elements in Weights must equal the number of rows in X or TBL.

If you specify Weights as the name of a variable in TBL, you must do so as a string. For example, if the weights are stored as TBL.W, then specify it as 'W'. Otherwise, the software treats all columns of TBL, including TBL.W, as predictors.

If you supply weights, edge computes the weighted classification edge. The software weights the observations in each row of X or TBL with the corresponding weight in Weights.
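As a minimal sketch of the 'Weights' argument, the following computes a weighted edge on the Fisher iris data. The weight vectors w and w2 are hypothetical and chosen only for illustration.

```matlab
load fisheriris
X = meas(:,1:2);
tree = fitctree(X,species);

% Hypothetical weights: count every other observation three times
w = ones(size(X,1),1);
w(1:2:end) = 3;
Ew = edge(tree,X,species,'Weights',w)

% Rescale the weights of one whole class by a constant factor
w2 = w;
isVirginica = strcmp(species,'virginica');
w2(isVirginica) = 5*w2(isVirginica);
Ew2 = edge(tree,X,species,'Weights',w2)
```

According to the Edge definition on this page, weights are normalized to sum to the prior probability within each class, so rescaling all weights of one class by a constant (as in w2) is expected to leave the edge unchanged. Only the relative weights within a class should matter.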

Output Arguments

E — Classification edge
scalar value

Classification edge, returned as a scalar representing the weighted average value of the margin.

Definitions

Margin

The classification margin is the difference between the classification score for the true class and maximal classification score for the false classes. Margin is a column vector with the same number of rows as the matrix X.
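In symbols, the margin of observation i can be written as shown here. The notation is an assumption made for this illustration and does not appear elsewhere on this page: s_k(x_i) is the score of class k for observation x_i, and y_i is the true class of that observation.

```latex
m_i = s_{y_i}(x_i) - \max_{k \neq y_i} s_k(x_i)
```

A positive margin means that the true class receives the highest score. A negative margin means that at least one false class outscores the true class.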

Score (tree)

For trees, the score of a classification of a leaf node is the posterior probability of the classification at that node. The posterior probability of the classification at a node is the number of training observations that lead to that node with the classification, divided by the total number of training observations that lead to that node.

For example, consider classifying a predictor X as true when X < 0.15 or X > 0.95, and as false otherwise.

Generate 100 random points and classify them:

rng(0,'twister') % for reproducibility
X = rand(100,1);
Y = (abs(X - .55) > .4);
tree = fitctree(X,Y);
view(tree,'Mode','Graph')

Prune the tree:

tree1 = prune(tree,'Level',1);
view(tree1,'Mode','Graph')

The pruned tree correctly classifies observations that are less than 0.15 as true. It also correctly classifies observations from 0.15 to 0.95 as false. However, it incorrectly classifies observations that are greater than 0.95 as false. Therefore, the score for observations that are greater than 0.15 should be about 0.05/0.85 = 0.06 for true, and about 0.8/0.85 = 0.94 for false.

Compute the prediction scores for the first 10 rows of X:

[~,score] = predict(tree1,X(1:10));
[score X(1:10,:)]
ans =

    0.9059    0.0941    0.8147
    0.9059    0.0941    0.9058
         0    1.0000    0.1270
    0.9059    0.0941    0.9134
    0.9059    0.0941    0.6324
         0    1.0000    0.0975
    0.9059    0.0941    0.2785
    0.9059    0.0941    0.5469
    0.9059    0.0941    0.9575
    0.9059    0.0941    0.9649

Indeed, every value of X (the right-most column) that is less than 0.15 has associated scores (the left and center columns) of 0 and 1, while the other values of X have associated scores of 0.91 and 0.09. The difference (score 0.09 instead of the expected .06) is due to a statistical fluctuation: there are 8 observations in X in the range (.95,1) instead of the expected 5 observations.

Edge

The edge is the weighted mean value of the classification margin. By default, the weights are determined by the class probabilities in tree.Prior. If you supply weights in the 'Weights' name-value pair, those weights are normalized to sum to the prior probabilities in the respective classes, and are then used to compute the weighted average.
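The definitions of margin and edge can be traced by hand. The following is a sketch under stated assumptions, not the toolbox implementation: it assumes that the columns of score follow the order of tree.ClassNames, and the helper variables (trueIdx, lin, sTrue, sFalse, w) are hypothetical names introduced for this illustration.

```matlab
load fisheriris
X = meas(:,1:2);
tree = fitctree(X,species);

% Scores are posterior probabilities; columns assumed ordered as tree.ClassNames
[~,score] = predict(tree,X);

% Column index of the true class for each observation
[~,trueIdx] = ismember(species,tree.ClassNames);
n = size(X,1);
lin = sub2ind(size(score),(1:n)',trueIdx);

% Margin: true-class score minus the largest false-class score
sTrue = score(lin);
sFalse = score;
sFalse(lin) = -Inf;               % exclude the true class from the maximum
M = sTrue - max(sFalse,[],2);

% Unit weights, rescaled so that each class sums to its prior probability
w = ones(n,1);
for k = 1:numel(tree.ClassNames)
    inClass = (trueIdx == k);
    w(inClass) = tree.Prior(k)*w(inClass)/sum(w(inClass));
end
E = sum(w.*M)/sum(w);             % weighted mean of the margins

% Compare with the built-in methods
maxMarginDiff = max(abs(M - margin(tree,X,species)))
edgeDiff = abs(E - edge(tree,X,species))
```

If the sketch matches the definitions above, both differences should be zero up to floating-point rounding. For the iris data, the three classes have equal size and the default priors are empirical, so the weighted mean reduces to the plain mean of M.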

Examples

Compute the classification margin and edge for a tree trained on the first two columns of the Fisher iris data, and view the last 11 margin values:

load fisheriris
X = meas(:,1:2);
tree = fitctree(X,species);
E = edge(tree,X,species)

E =
    0.6299

M = margin(tree,X,species);
M(end-10:end)
ans =
    0.1111
    0.1111
    0.1111
   -0.2857
    0.6364
    0.6364
    0.1111
    0.7500
    1.0000
    0.6364
    0.2000

A classification tree trained on all four predictors has a higher edge and larger margins on the same observations.

tree = fitctree(meas,species);
E = edge(tree,meas,species)

E =
    0.9384

M = margin(tree,meas,species);
M(end-10:end)
ans =
    0.9565
    0.9565
    0.9565
    0.9565
    0.9565
    0.9565
    0.9565
    0.9565
    0.9565
    0.9565
    0.9565
