surrogateAssociation

Mean predictive measure of association for surrogate splits in regression tree

Syntax

ma = surrogateAssociation(tree)

ma = surrogateAssociation(tree,N)

Description

ma = surrogateAssociation(tree) returns a numeric matrix of size P-by-P, where P is the number of predictors in tree. The element ma(i,j) is the predictive measure of association between the optimal split on variable i and a surrogate split on variable j. For more details, see Algorithms.

ma = surrogateAssociation(tree,N) returns a matrix of predictive measures of association averaged over the nodes in vector N.

Examples

Estimate Predictive Measures of Association for Surrogate Splits

Load the carsmall data set. Specify Displacement, Horsepower, and Weight as predictor variables.

load carsmall
X = [Displacement Horsepower Weight];

Grow a regression tree using MPG as the response. Use surrogate splits to handle missing values.

tree = fitrtree(X,MPG,surrogate="on");

Find the mean predictive measure of association between the predictor variables.

ma = surrogateAssociation(tree)

ma = 3×3

    1.0000    0.2167    0.5083
    0.4521    1.0000    0.3769
    0.2540    0.2659    1.0000
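One way to read this matrix is to find, for each predictor, the other predictor whose surrogate splits have the highest mean association with it. The following is a minimal sketch that copies the values displayed above into a literal matrix so that it runs without the toolbox. The names maAll, predictorNames, offDiag, bestAssoc, and bestIdx are illustrative and are not part of surrogateAssociation.

```matlab
% Values copied from the displayed output above (rows: optimal split
% predictor, columns: surrogate split predictor).
maAll = [1.0000 0.2167 0.5083
         0.4521 1.0000 0.3769
         0.2540 0.2659 1.0000];
predictorNames = ["Displacement" "Horsepower" "Weight"];

% Mask the diagonal so that a predictor is never reported as its own surrogate.
offDiag = maAll;
offDiag(logical(eye(size(maAll)))) = -Inf;

% For each row, find the column with the largest mean association.
[bestAssoc, bestIdx] = max(offDiag, [], 2);

for i = 1:numel(predictorNames)
    fprintf("%s: strongest surrogate is %s (%.4f)\n", ...
        predictorNames(i), predictorNames(bestIdx(i)), bestAssoc(i));
end
```

The matrix is not symmetric, so the strongest surrogate for predictor i can differ from the predictor for which i is the strongest surrogate.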

Find the predictive measures of association averaged over the odd-numbered nodes in tree.

N = 1:2:tree.NumNodes;
ma = surrogateAssociation(tree,N)

ma = 3×3

    1.0000    0.1250    0.6875
    0.5632    1.0000    0.5861
    0.3333    0.3148    1.0000

Input Arguments

`tree` — Trained regression tree
`RegressionTree` model object | `CompactRegressionTree` model object

Trained regression tree, specified as a RegressionTree model object trained with fitrtree, or a CompactRegressionTree model object created with compact.

`N` — Node numbers
`1:tree.NumNodes` (default) | numeric vector

Node numbers in tree, specified as a numeric vector. N contains node numbers from 1 to tree.NumNodes.

More About

Predictive Measure of Association

The predictive measure of association is a value that indicates the similarity between decision rules that split observations. Among all possible decision splits that are compared to the optimal split (found by growing the tree), the best surrogate decision split yields the maximum predictive measure of association. The second-best surrogate split has the second-largest predictive measure of association.

Suppose x_j and x_k are predictor variables j and k, respectively, and j ≠ k. At node t, the predictive measure of association between the optimal split x_j < u and a surrogate split x_k < v is

$\lambda_{jk} = \frac{\min(P_{L}, P_{R}) - (1 - P_{L_{j} L_{k}} - P_{R_{j} R_{k}})}{\min(P_{L}, P_{R})}.$

P_L is the proportion of observations in node t such that x_j < u. The subscript L stands for the left child of node t.
P_R is the proportion of observations in node t such that x_j ≥ u. The subscript R stands for the right child of node t.
$P_{L_{j} L_{k}}$ is the proportion of observations in node t such that x_j < u and x_k < v.
$P_{R_{j} R_{k}}$ is the proportion of observations in node t such that x_j ≥ u and x_k ≥ v.
Observations with missing values for x_j or x_k do not contribute to the proportion calculations.

λ_jk is a value in (–∞,1]. If λ_jk > 0, then x_k < v is a worthwhile surrogate split for x_j < u.
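The formula can be checked by hand on a small example. The following is a minimal sketch for a single node, using made-up data and cut points rather than values from any trained tree. All variable names are illustrative. It assumes that an observation is dropped from every proportion when either predictor value is missing, which is one reading of the rule stated above.

```matlab
% Hypothetical observations that reach node t.
xj = [1 2 3 4 5 6 7 8 NaN 10]';   % predictor used by the optimal split
xk = [2 1 4 6 3 5 9 NaN 7 8]';    % predictor used by the candidate surrogate
u = 4.5;                          % optimal split:   xj < u
v = 4.5;                          % surrogate split: xk < v

% Drop observations with a missing value for either predictor (assumption:
% the same subset is used for all four proportions).
keep = ~isnan(xj) & ~isnan(xk);
xj = xj(keep);
xk = xk(keep);

PL  = mean(xj < u);               % P_L
PR  = mean(xj >= u);              % P_R
PLL = mean(xj < u  & xk < v);     % P_{L_j L_k}: both splits send left
PRR = mean(xj >= u & xk >= v);    % P_{R_j R_k}: both splits send right

% Predictive measure of association between the two splits.
lambda = (min(PL,PR) - (1 - PLL - PRR)) / min(PL,PR);
```

Here the two splits agree on six of the eight retained observations, so 1 - PLL - PRR is 0.25 and lambda is (0.5 - 0.25)/0.5 = 0.5. The term min(PL,PR) is the error rate of always sending observations to the larger child, so a positive lambda means the surrogate agrees with the optimal split more often than that default rule does.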

Surrogate Decision Splits

A surrogate decision split is an alternative to the optimal decision split at a given node in a decision tree. The optimal split is found by growing the tree; the surrogate split uses a similar or correlated predictor variable and split criterion.

When the value of the optimal split predictor for an observation is missing, the observation is sent to the left or right child node using the best surrogate predictor. When the value of the best surrogate split predictor for the observation is also missing, the observation is sent to the left or right child node using the second-best surrogate predictor, and so on. Candidate splits are sorted in descending order by their predictive measure of association.
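The fallback order described above can be sketched as a loop over the rules at a node. This is an illustration only, not the prediction code used by the software. The rule list, cut points, and observation are made up, and the sketch assumes that every rule sends values below its cut point to the left child. It does not model what happens when every relevant predictor value is missing.

```matlab
% Hypothetical rules at one node: the optimal split first, then the
% surrogates in descending order of predictive measure of association.
rulePredictor = [1 3 2];          % predictor index used by each rule
ruleCut       = [150 3000 100];   % x(predictor) < cut is sent left (assumed)

% Observation with missing values for predictors 1 and 3.
x = [NaN NaN 95];

direction = "unresolved";         % stays unresolved if all values are missing
for r = 1:numel(rulePredictor)
    value = x(rulePredictor(r));
    if ~isnan(value)
        % The first rule whose predictor value is present decides the child.
        if value < ruleCut(r)
            direction = "left";
        else
            direction = "right";
        end
        break
    end
end
```

In this example the optimal split and the best surrogate are both skipped, and the second-best surrogate on predictor 2 sends the observation to the left child.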

Algorithms

Element ma(i,j) is the predictive measure of association averaged over surrogate splits on predictor j for which predictor i is the optimal split predictor. To compute this average, the function sums the positive values of the predictive measure of association over optimal splits on predictor i and surrogate splits on predictor j, and then divides by the total number of optimal splits on predictor i. This total includes splits for which the predictive measure of association between predictors i and j is negative.
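The averaging rule can be illustrated with a small hand-built table of per-node association values. The following is a minimal sketch, not the function's implementation. The inputs cutPred and lambda are made up, with NaN marking the optimal predictor itself or a node with no surrogate on that predictor. Setting the diagonal to 1 follows the example output above, and leaving the row of a predictor that is never split on at zero off the diagonal is an assumption of the sketch.

```matlab
P = 3;                            % number of predictors

% Hypothetical branch nodes: optimal split predictor at each node, and the
% association between that split and a surrogate on each predictor.
cutPred = [1; 2; 1; 1];
lambda  = [NaN  0.6  0.2
           0.5  NaN -0.1
           NaN -0.2  0.4
           NaN  0.3  0.0];

ma = eye(P);                      % diagonal of 1, as in the example output
for i = 1:P
    rows = (cutPred == i);        % nodes whose optimal split uses predictor i
    nSplits = nnz(rows);
    if nSplits == 0
        continue                  % predictor i is never the optimal predictor
    end
    for j = [1:i-1, i+1:P]
        vals = lambda(rows, j);
        % Sum only the positive values, but divide by all optimal splits on i.
        ma(i,j) = sum(vals(vals > 0)) / nSplits;
    end
end
```

For predictor 1 and surrogate predictor 2, the values at the three relevant nodes are 0.6, -0.2, and 0.3. Only 0.6 and 0.3 enter the sum, but the divisor is still 3, so ma(1,2) is 0.3.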

Extended Capabilities

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

Version History

Introduced in R2011a
