Proximity Matrix of Random Forest

I want to know how to get the proximity matrix of a random forest in MATLAB. I build the random forest model with fitcensemble using the bag method. The concept of a proximity matrix is not complex: it is an N-by-N matrix for N data points, where element (i, j) is the number of trees in which x_i and x_j end up in the same leaf. The proximity is usually scaled by the total number of trees. How can I extract this information in MATLAB?

Answers (1)

Aneela on 26 May 2024
Hi Keyan Li,
MATLAB has a built-in function, “proximity”, that computes the proximity matrix, but it is available only for “CompactTreeBagger” objects. You can refer to the following link: https://www.mathworks.com/help/stats/compacttreebagger.proximity.html
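If retraining the forest with “TreeBagger” is an option, the built-in route could look roughly like the sketch below. The fisheriris data set, the 50 trees, and the variable names are placeholder assumptions for illustration, not part of the original question.

```matlab
% Sketch: built-in proximity via TreeBagger / CompactTreeBagger.
% Data set and number of trees are illustrative assumptions.
load fisheriris                      % meas (150x4), species (150x1 cell)
rng(1);                              % make the bagging reproducible

B = TreeBagger(50, meas, species, 'Method', 'classification');

% compact(B) returns a CompactTreeBagger, which has the proximity method
proxBuiltin = proximity(compact(B), meas);   % N-by-N matrix
```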
However, there isn’t direct built-in support for obtaining the proximity matrix from a random forest model built with “fitcensemble” using the “Bag” method.
The proximity matrix for such a model can still be calculated manually. Here’s a workaround:
  • For each tree in the ensemble, predict the leaf indices for each data point.
  • For each tree, if two data points end up in the same leaf, increment their corresponding entry in the proximity matrix.
  • Optionally, scale the proximity matrix by the total number of trees to get the average proximity.
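The steps above can be sketched as follows. This is a minimal illustration, not an official method: it assumes the ensemble uses the default tree learners, so that each element of Mdl.Trained is a compact classification tree whose predict method returns the node number as its third output. The fisheriris data, the 50 learning cycles, and the variable names are assumptions chosen for the example.

```matlab
% Sketch: manual proximity matrix for a bagged ensemble from fitcensemble.
% Assumes default tree learners; data set and tree count are illustrative.
load fisheriris
rng(1);                                      % reproducibility
X = meas;
Y = species;
Mdl = fitcensemble(X, Y, 'Method', 'Bag', 'NumLearningCycles', 50);

N = size(X, 1);
T = Mdl.NumTrained;                          % number of trained trees
prox = zeros(N);

for t = 1:T
    tree = Mdl.Trained{t};                   % compact tree for learner t
    [~, ~, node] = predict(tree, X);         % node number reached by each row
    % Pairs that land in the same leaf get +1 (implicit expansion, R2016b+)
    prox = prox + (node == node');
end

prox = prox / T;                             % scale by the number of trees
```

Note that this counts every tree for every pair of points, including trees that saw those points during training. For large N, the N-by-N matrix can become memory-intensive, so it may be worth computing it on a subset first.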

Release: R2021b

Asked: 3 Mar 2022

Answered: 26 May 2024
