version 1.7.0.0 (16.1 KB) by
Leo

Creates an ensemble of CART trees (a random forest), similar to MATLAB's TreeBagger class.

An alternative to the MATLAB TreeBagger class, written in C++ and MATLAB.

The code includes an implementation of CART trees that is considerably faster to train than MATLAB's classregtree.

Compiled and tested on 64-bit Ubuntu.

Leo (2021). Random Forest (https://www.mathworks.com/matlabcentral/fileexchange/31036-random-forest), MATLAB Central File Exchange. Retrieved .

Created with
R2009b

Compatible with any release

**Inspired by:**
getargs.m


Subhadip Pramanik: Hi Leo, what is varargin in Stochastic_Bosque? I mean, what value should I pass as 'varargin'?

Khadouj BKZ: Could anyone help me understand the code, please?

Emir Ceyani: Hello,

In Stochastic_Bosque.m, what is the purpose of

TDindx = round(numel(Labels)*rand(n,1)+.5)

It seems that TDindx can also contain indices exceeding the dataset. Thanks.

smoothie tn: Would you please share the Random Forest code for regression?

masina mounika: Could anyone please help me understand the code below line by line, if possible?

Shiang Qi: @sunit Adhikary Could you please send me the documentation for cartree.m? Thanks a lot!

Shiang Qi: Hello everyone,

Can you please tell me how to test on data and output the accuracy? I am a bit confused.

Also, I have one dataset and need to split it into training and testing data, right? What is a good way to split it?

Thanks.

Quoc Pham: Hello everyone,

I have a question about evaluation on training and test data.

I ran a random forest classification; the accuracy on test data is 78%, but the accuracy on training data is always 100%. Does that make sense?

Matthew Boring: @Yogesh and any other users experiencing the error "Undefined function or variable 'best_cut_node'.":

Make sure you compile the code first with "Stochastic_Bosque/cartree/mx_files/mx_compile_cartree".

sunit Adhikary: I appreciate the choice of names in your code (bosque in Stochastic_Bosque).

sunit Adhikary: @Austin Jordan

nodeCutVar: for a given node, the feature (variable) on which the node cuts.

nodeCutVal: the value of that feature at which the cut is made.

childnode: contains the index of the child nodes of the given parent node. If you need documentation for cartree.m, you can reach me at sunit140995@gmail.com.

sunit Adhikary: Hi Leo, suppose the data is of the form

Data   Labels
1 1    1
1 2    1
1 3    2
1 4    2

then should bcvar and bcval not be 2 and 2.5, respectively? I feel you have not incorporated this in your program.

Yogesh Aggarwal: Hello, when I ran the Random Forest function file with a data set, I got an error:

Undefined function or variable 'best_cut_node'.

Error in cartree (line 84)
[bestCutVar bestCutValue] = ...

Error in Stochastic_Bosque (line 48)
Random_ForestT = cartree(Data(TDindx,:),Labels(TDindx), ...

Please help with this error.

Austin Jordan: For those trying to visualize this data: if forest = Stochastic_Bosque(data,labels), let tree = forest(1), then...

labels = tree.nodelabel;
children = tree.childnode;
nodes = zeros(1,size(children,1));
for i = 1:size(children,1)
    if children(i) ~= 0
        nodes(children(i)) = i;
        nodes(children(i)+1) = i;
    end
end
treeplot(nodes)
hold on
[x,y] = treelayout(nodes);
for i = 1:length(x)
    if tree.nodelabel(i) ~= 0
        text(x(i),y(i),num2str(labels(i)),'HorizontalAlignment','center')
    end
end
hold off

Austin Jordan: Can someone explain what the variables nodeCutVar, nodeCutVal, and childnode are? How might I visualize this data?

Chris Lug

jwolf: try

Justin Igwe: Hello, I just downloaded this code. Can someone please guide me on how to use it? I cannot find the description.

Volker Osterholt: @Akshay

This should help:

http://stackoverflow.com/questions/758001/log2-not-found-in-my-math-h

(insert a log2 function in GBCC.cpp)

Akshay Ravindran: Hi, when I tried compiling the required mex files, the following errors came up. Can someone help me address this issue?

mx_compile_cartree

Building with 'Microsoft Windows SDK 7.1 (C++)'.

Error using mex

GBCC.cpp

C:\Users\Preejith\Downloads\dtw\Stochastic_Bosque\cartree\mx_files\GBCC.cpp(24) : error C3861: 'log2': identifier not found

C:\Users\Preejith\Downloads\dtw\Stochastic_Bosque\cartree\mx_files\GBCC.cpp(25) : warning C4244: '=' : conversion from 'double' to 'int', possible loss of data

C:\Users\Preejith\Downloads\dtw\Stochastic_Bosque\cartree\mx_files\GBCC.cpp(43) : warning C4244: '=' : conversion from 'double' to 'int', possible loss of data

C:\Users\Preejith\Downloads\dtw\Stochastic_Bosque\cartree\mx_files\GBCC.cpp(108) : warning C4244: '=' : conversion from 'double' to 'int', possible loss of data

C:\Users\Preejith\Downloads\dtw\Stochastic_Bosque\cartree\mx_files\GBCC.cpp(115) : error C3861: 'log2': identifier not found

C:\Users\Preejith\Downloads\dtw\Stochastic_Bosque\cartree\mx_files\GBCC.cpp(115) : error C3861: 'log2': identifier not found

C:\Users\Preejith\Downloads\dtw\Stochastic_Bosque\cartree\mx_files\GBCC.cpp(127) : warning C4244: '=' : conversion from 'double' to 'int', possible loss of data

C:\Users\Preejith\Downloads\dtw\Stochastic_Bosque\cartree\mx_files\GBCC.cpp(151) : error C3861: 'log2': identifier not found

C:\Users\Preejith\Downloads\dtw\Stochastic_Bosque\cartree\mx_files\GBCC.cpp(151) : error C3861: 'log2': identifier not found

C:\Users\Preejith\Downloads\dtw\Stochastic_Bosque\cartree\mx_files\GBCC.cpp(152) : error C3861: 'log2': identifier not found

C:\Users\Preejith\Downloads\dtw\Stochastic_Bosque\cartree\mx_files\GBCC.cpp(152) : error C3861: 'log2': identifier not found

Error in mx_compile_cartree (line 8)

mex -O best_cut_node.cpp GBCR.cpp GBCP.cpp GBCC.cpp

Hassan mohamed: Hello, I tried the code, but it gives me an error using the varargin in Random_Forest = (..,..,varargin), as follows:

Attempt to execute SCRIPT varargin as a function:

C:\Program Files\MATLAB\R2016a\toolbox\matlab\lang\varargin.m

When I remove the varargin argument, it gives the same results as outputs.

Please advise.

EEElearner: I have successfully created the mex files. What next? How can I make use of the other programs shared in this package?

mk: What algorithm is TreeBagger based on? And is random forest the algorithm behind TreeBagger?

Alean: Undefined variable "internal" or class "internal.stats.getargs".

Error in eval_Stochastic_Bosque (line 12)

[eid,emsg,oobe_flag] = internal.stats.getargs(okargs,defaults,varargin{:});

anne_frank: I am new to MATLAB and am not able to run this. Could someone please help me with the function call?

ashok kn: mx_compile_cartree

Error: Could not detect a compiler on local system which can compile the specified input file(s)

Jaroslaw Tuszynski: Works great, although I would like greater control over the creation of the decision trees; for example, I would like to preset the maximum depth.

I also ran into trouble with internal.stats.getargs and had to change it to just "getargs".

biao qi: ??? Error: File: cartree.m Line: 50 Column: 26
Expression or statement is incorrect--possibly unbalanced (, {, or [.

Error in ==> Stochastic_Bosque at 48
Random_ForestT = cartree(Data(TDindx,:),Labels(TDindx), ...

My MATLAB reports the above error; can you help me?

KALYAN KUMAR: I am getting this error; can you please sort out this issue?

Undefined variable "internal" or class "internal.stats.getargs".

Error in eval_Stochastic_Bosque (line 12)

[eid,emsg,oobe_flag] = internal.stats.getargs(okargs,defaults,varargin{:});

ET-Tahir Zemouri: Hi everybody,

I would like to thank Leo for this code.

Does anyone know how to use this code for one-class RF?

Thanks

Olivier: Hi Leo,

Thank you very much for your great .m code.

I have a question concerning the 'weights' input parameter of Stochastic_Bosque.m.

Suppose my 'Data' input consists of three examples with label '1' and three examples with label '2', and I want
- the weight of the label '1' instances to be 6, and
- the weight of the label '2' instances to be 3.

Data = [....;....;....;....;....;....]
labels = [1;1;1;2;2;2];

So should weights = [6;6;6;3;3;3]?

Regards

Olivier

vimal: Hi Leo,

When I use your code, I get the following error:

Undefined function 'best_cut_node' for input arguments of type 'char'.

Error in cartree (line 84)
[bestCutVar bestCutValue] = ...

Error in Stochastic_Bosque (line 53)
Random_ForestT = cartree(Data(TDindx,:),Labels(TDindx), ....

Please tell me how to debug this.

Hira Imtiaz: Hi Leo,

I am a student and have to implement the random forest algorithm on ECG signal feature vectors. I am extracting features, in the form of different peaks in the signal, by this method: http://www.codeproject.com/Articles/309938/ECG-Feature-Extraction-with-Wavelet-Transform-and

Can you please tell me how I can apply your Random Forest code to the above results?

This algorithm is very new to me; could you please help?

Regards

Zhiming: Hi Leo,

My problem is already solved: the code cannot run with data of type 'uint8'.

Thanks.

Zhiming: Hi Leo,

I compiled successfully with Visual C++, but I cannot run the code. Some of the error information is listed below:

----------------------------------------------

Segmentation violation detected at Wed Jan 07 09:43:26 2015

----------------------------------------------

Fault Count: 1

Register State:

EAX = 00007d23 EBX = 00001736

ECX = 0bc71998 EDX = 0bc60000

ESI = ffffdccd EDI = 0bc7d348

EBP = 00c2c0cc ESP = 00c2c0c0

EIP = 0ba623f8 FLG = 00010202

Stack Trace:

[0] best_cut_node.mexw32:0x0ba623f8(0x0bc71998, 0x0bc60048, 5942, 5999)

[1] best_cut_node.mexw32:0x0ba6278c(6000, 28, 0x45e97450, 0x45f99450)

[2] best_cut_node.mexw32:0x0ba61182(2, 0x04dacdd0, 6, 1)

......

Could you help me? Thanks!

joy barbosa: Thanks, Leo, for sharing this code!

Following the solutions provided in the comments here, I was able to run your code. But could you help me figure out how to obtain probability estimates with it?

Hussein: Could anyone give an example of how to use this function, I mean the input parameters? I built it successfully, so if anyone could advise, please do. Thanks.

Fatemeh Saki: Hi everybody,

Does anyone know how I can visualize the built tree after training?

Thanks

Fatemeh

Gary Tsui: try

In GBCC.cpp, line 3:

#define log2(x) ( (1.0/log(2.0)) * log( (double)(x) ) ) // use double constants

That's what I did; can anyone else help verify?

Fatemeh Saki: Hi Leo,

I cannot run the code; errors occur while compiling the mx_compile file.

Would you please help me with that?

fairy: I have found the reason: the Data matrix is integer, not double.

fairy: Thanks Leo!

With your help, it compiled successfully!

By the way, line 24 should change to saved_logs[j] = log((double)(j+1))/log(2.0);

Thanks again, Leo.

Leo: Hi fairy,

lcc is not a C++ compiler. Using the Visual Studio compiler, I think the following should do the trick.

in GBCC.cpp change line 24 to

saved_logs[j] = log(j+1)/log(2);

line 115 to

if (diff_labels[nl]>0) bh-=diff_labels[nl]*(log(diff_labels[nl])/log(2)-log(sum_W)/log(2));

line 151 to

if(diff_labels_l[nl]>0) ch-=(diff_labels_l[nl])*(log(diff_labels_l[nl])/log(2)-log(sum_l)/log(2));

and line 152 to

if(diff_labels_r[nl]>0) ch-=(diff_labels_r[nl])*(log(diff_labels_r[nl])/log(2)-log(sum_W-sum_l)/log(2));

Hope this solves it.

Leo

fairy: Hi Leo,

These are the errors and warnings when I compile 'mx_compile_cartree':

mx_compile_cartree

GBCC.cpp

GBCC.cpp(24) : error C2563: mismatch in formal parameter list

GBCC.cpp(24) : error C2568: '=' : unable to resolve function overload

C:\Program Files\Microsoft Visual Studio 9.0\VC\INCLUDE\math.h(567): could be 'long double log(long double)'

C:\Program Files\Microsoft Visual Studio 9.0\VC\INCLUDE\math.h(519): or 'float log(float)'

C:\Program Files\Microsoft Visual Studio 9.0\VC\INCLUDE\math.h(121): or 'double log(double)'

GBCC.cpp(24) : error C2143: syntax error : missing ';' before 'constant'

GBCC.cpp(24) : error C2064: term does not evaluate to a function taking 1 arguments

GBCC.cpp(25) : warning C4244: '=' : conversion from 'double' to 'int', possible loss of data

GBCC.cpp(43) : warning C4244: '=' : conversion from 'double' to 'int', possible loss of data

GBCC.cpp(108) : warning C4244: '=' : conversion from 'double' to 'int', possible loss of data

GBCC.cpp(115) : error C3861: 'log2': identifier not found

GBCC.cpp(115) : error C3861: 'log2': identifier not found

GBCC.cpp(127) : warning C4244: '=' : conversion from 'double' to 'int', possible loss of data

GBCC.cpp(151) : error C3861: 'log2': identifier not found

GBCC.cpp(151) : error C3861: 'log2': identifier not found

GBCC.cpp(152) : error C3861: 'log2': identifier not found

GBCC.cpp(152) : error C3861: 'log2': identifier not found

C:\PROGRA~1\MATLAB\R2012B\BIN\MEX.PL: Error: Compile of 'GBCC.cpp' failed.

Error using mex (line 206)

Unable to complete successfully.

Error in mx_compile_cartree (line 8)

mex -O best_cut_node.cpp GBCR.cpp GBCP.cpp GBCC.cpp

I am using MATLAB R2012b and VC++ 2008.

fairy: Hi Leo,

These are the errors and warnings when I compile 'mx_compile_cartree':

lcc preprocessor warning: .\node_cuts.h:8 best_cut_node.cpp:2 No newline at end of file

lcc preprocessor warning: best_cut_node.cpp:60 No newline at end of file

Error best_cut_node.cpp: .\node_cuts.h: 2 redeclaration of `GBCC' previously declared at .\node_cuts.h 1

Error best_cut_node.cpp: .\node_cuts.h: 5 redeclaration of `GBCR' previously declared at .\node_cuts.h 4

Error best_cut_node.cpp: .\node_cuts.h: 8 redeclaration of `GBCP' previously declared at .\node_cuts.h 7

Error best_cut_node.cpp: 35 type error in argument 5 to `GBCC'; found `int' expected `pointer to double'

Error best_cut_node.cpp: 35 type error in argument 7 to `GBCC'; found `pointer to double' expected `int'

Error best_cut_node.cpp: 35 insufficient number of arguments to `GBCC'

Error best_cut_node.cpp: 38 type error in argument 5 to `GBCP'; found `int' expected `pointer to double'

Error best_cut_node.cpp: 38 type error in argument 7 to `GBCP'; found `pointer to double' expected `int'

Error best_cut_node.cpp: 38 insufficient number of arguments to `GBCP'

Error best_cut_node.cpp: 41 type error in argument 5 to `GBCR'; found `int' expected `pointer to double'

Error best_cut_node.cpp: 41 type error in argument 6 to `GBCR'; found `pointer to double' expected `int'

Error best_cut_node.cpp: 41 insufficient number of arguments to `GBCR'

Error best_cut_node.cpp: 59 undeclared identifier `delete'

Error best_cut_node.cpp: 59 illegal expression

Error best_cut_node.cpp: 59 syntax error; found `method' expecting `]'

Error best_cut_node.cpp: 59 type error: pointer expected

Warning best_cut_node.cpp: 59 Statement has no effect

Error best_cut_node.cpp: 59 syntax error; found `method' expecting `;'

Warning best_cut_node.cpp: 59 Statement has no effect

Warning best_cut_node.cpp: 59 possible usage of delete before definition

17 errors, 5 warnings

C:\PROGRA~1\MATLAB\R2012B\BIN\MEX.PL: Error: Compile of 'best_cut_node.cpp' failed.

Error using mex (line 206)

Unable to complete successfully.

Error in mx_compile_cartree (line 8)

mex -O best_cut_node.cpp GBCR.cpp GBCP.cpp GBCC.cpp

I am using MATLAB R2012b and VC++ 2008.

Leo: Hi fairy,

Could you copy-paste the exact error message you get when running mx_compile_cartree.m?

Leo

fairy: Hi Leo, thanks for your help, but now I have another error!

??? Undefined function or method 'best_cut_node' for input arguments of type 'char'.

Error in ==> cartree at 84
[bestCutVar bestCutValue] = ...

Error in ==> Stochastic_Bosque at 48
Random_ForestT = cartree(Data(TDindx,:),Labels(TDindx), ...

It seems mx_compile_cartree.m failed to compile; specifically, this command failed: mex -O best_cut_node.cpp GBCR.cpp GBCP.cpp GBCC.cpp. Why? Thanks again.

zeel: How do I use this function? I mean, what parameters do I have to give it?

Leo: Hi fairy,

It would seem that the function is not on MATLAB's search path. You can run

addpath(genpath(cd))

Leo

fairy: Leo,

I paste the code here so you can advise me. Thank you.

load diabetes

Data = diabetes.x;

Labels = diabetes.y;

Random_Forest = Stochastic_Bosque(Data,Labels);

I get this error:

??? Undefined function or method 'cartree' for input arguments of type 'double'.

Error in ==> Stochastic_Bosque at 48

Random_ForestT = cartree(Data(TDindx,:),Labels(TDindx), ...

Could you kindly tell me why? Thanks very much.

LE: Hi, Leo. I am afraid this package cannot handle categorical features. How could I update the code to handle datasets with categorical features?

Kindly guide me.

Thanks.

Leo: Hi qing,

Yes, the elements of the vector "nodeCutVar" are feature indexes. You can retrieve the tree structure from the field RETree.childnode. For a node i, the indexes of the child nodes are RETree.childnode(i) and RETree.childnode(i) + 1, for the left and right child.

Hope this helps.

Leo

qing: Hi Leo,

Are the elements of the vector "nodeCutVar" feature indexes? And how can I see the tree structure, I mean the relationships between features? Thanks!

Linh Dang: Could anybody give an example of how to run these files? Really appreciated.

Marios Michael: Quick, clean, and easy to use.

A useful submission.

mai: Hi Leo,

I'm new to MATLAB and would like to explore random forests more, but I don't understand most of this.

function Random_Forest = Stochastic_Bosque(Data,Labels,varargin)

Data refers to my data. What are Labels and varargin for?

Michael: Excellent work; the code is well documented and clear, and runtime is reasonable.

Adding a Readme file with a description of the data format, and a demo.m, would be very helpful.

Thanks for sharing.

Leo: Hi Matteo,

Sorry for the late reply; I did not receive a notification email.

Anyway, you are correct: that is a bug in the code. It was pointed out by c. a few comments up.

The code has now been updated to remove that line.

Thanks for the feedback and rating.

Matteo: Very good software, thank you for your effort!

I was wondering whether sampling with replacement is really implemented in this method, as the documentation says.

At lines 43-44 you do:

TDindx = round(numel(Labels)*rand(n,1)+.5);

(NOTE: why not use the 'randi' function?)

This gives 'n' indexes, and THEN you do:

TDindx = unique(TDindx);

removing all the duplicates (or more)!

Is that correct?

Jeff: Hi Leo, based on your experience, if this program were converted into pure C/C++, would that help improve the processing speed on a PC?

Enric Junque de Fortuny: I noticed that the f_output vector sometimes swaps dimensions in eval_Stochastic_Bosque(). Quick fix: add this at line 34 in eval_Stochastic_Bosque():

if (size(Data,1) ~= size(f_output,1))
    f_output = f_output';
    f_votes = f_votes';
end

Leo: Hi c.,

Thanks for pointing that out. I believe you are correct; that line should be commented out.

c.: Hi,

Why do you call

TDindx = unique(TDindx);

when creating the forest?

I was under the impression that bagging would improve the generalization ability of the model, but through the call to unique we get rid of all multiple instances. Why did you choose not to use bagging, and instead use subsets of the original data?

Ming: Impressive.

Hi all,

Some have said above that the package fails to compile on Windows. I found that this is probably because log2() in GBCC.cpp is not a standard C function. A feasible solution is to replace log2(n) with log((double)n)/log((double)2).

Thanks again for sharing the code.

Afzan: Thank you.

Leo: Hi Afzan,

It is in:

/Stochastic_Bosque/cartree/mx_files

It's a C++ file.

Leo

Afzan: Leo, where is the best_cut_node function (cartree line 45)? I can't find it; or is it a compatibility problem again?

Leo: Hi Kourosh,

Sorry for the late reply. Unfortunately I don't have a Windows machine to try this out. I am surprised it won't compile under Windows though; on Ubuntu it compiles with gcc.

Anyway, if you paste the errors here, maybe I can help out a bit more.

Kourosh Khoshelham: Hi Leo, thanks for sharing this code. I have difficulty mexing the cpp files. Do I need a special compiler? When I get to this line:

mex -O best_cut_node.cpp GBCR.cpp GBCP.cpp GBCC.cpp

I receive many errors in node_cuts.h, and it finally says: Compile of 'best_cut_node.cpp' failed.

I'm using R2007b on win32.

Could you help?

Thanks,

Kourosh

Leo: Hey,

Sorry, the update was pending approval. It should be OK now.

AMB: Hi, the zip file that I can download now has the same creation date as the old one, requires all the changes I made in order to run, and performs the same as well. Please advise.

Leo: Hey,

I have already updated the code. You can re-download it.

AMB: Hi, will you be posting the new code? I would like to try it out on regression. Right now, when I use the old code on the classic Boston Housing data set, I get all NaNs. I would like to see whether this problem disappears in the fixed code.

Leo: Hi Shujjat,

You would have to show me exactly what line 99 is in your code. In my code I do not get any errors; I suspect you have altered the code and inadvertently added or omitted a parenthesis.

Leo

Shujjat: Hi Leo,

I am following your and Mohammad's threads, and I am getting the following errors after making the amendments you indicated:

>> Random_Forest = Stochastic_Bosque(data,label);
??? Error: File: cartree.m Line: 99 Column: 27
Expression or statement is incorrect--possibly unbalanced (, {, or [.

Error in ==> Stochastic_Bosque at 45
Random_ForestT = cartree(Data(TDindx,:),Labels(TDindx), ...

Can you please help me?

Cheers

Leo: Hey AMB,

Thanks a lot for all your help. I found the bug that was causing the difference in performance (accuracy-wise): a part of the code (erroneously) implicitly assumed that feature values were distinct.

It is now fixed, and results on the Glass dataset are equivalent to the results you quote for the google code.

Regarding speed, the code now runs considerably faster on my PC, but nowhere near as fast as the google code, which is to be expected, as the google code is written almost entirely in C/C++.

I have also removed the dependencies on the Statistics Toolbox using your suggestions (thanks!).

The dependence on internal.stats.getargs has also been removed.

AMB: Hi, I have followed your suggestion to compare the results of your code against the "google code". The google code is at

http://code.google.com/p/randomforest-matlab/

This is a MATLAB (and standalone application) port of the excellent machine learning algorithm 'Random Forests' by Leo Breiman et al., from the R source by Andy Liaw et al. (http://cran.r-project.org/web/packages/randomForest/index.html; Fortran original by Leo Breiman and Adele Cutler, R port by Andy Liaw and Matthew Wiener). The current code version is based on 4.5-29 of the randomForest package source, by Abhishek Jaiantilal.

Against the "glass" data set, here are the statistics for 10 and 100 trees, withholding 35% of the data as you had done.

For Stochastic_Bosque, the results were:

Elapsed time for 1000 runs: 1648.743 seconds
Average fraction correct with 35% of samples held out: 0.636 for 10 trees, 0.684 for 100 trees
Standard deviation: 0.076 for 10 trees, 0.070 for 100 trees

For classRF_train and classRF_predict, the results were:

Elapsed time for 1000 runs: 88.021 seconds
Average fraction correct with 35% of samples held out: 0.722 for 10 trees, 0.758 for 100 trees
Standard deviation: 0.051 for 10 trees, 0.048 for 100 trees

I use a MacBook Air with MATLAB 2011a and OS X 10.6.7.

I was surprised at the runtime differences and the differences in the statistics.

My calls to Stochastic_Bosque look as follows:

tic;
correct = zeros(1000,2);
for i = 1:length(correct)
    M = length(Labels);
    m = round(.65*M);
    intraining = randperm(M);
    intraining = sort(intraining(1:m));
    notintraining = setdiff([1:M],intraining);
    Random_Forest = Stochastic_Bosque(Data(intraining,:),Labels(:,intraining),'ntrees',10);
    [f_output f_votes] = eval_Stochastic_Bosque(Data(notintraining,:),Random_Forest);
    error = Labels(:,notintraining)'-f_output;
    correctlyclassified = numel(find(error == 0))/numel(error);
    correct(i,1) = correctlyclassified;
    Random_Forest = Stochastic_Bosque(Data(intraining,:),Labels(:,intraining),'ntrees',100);
    [f_output f_votes] = eval_Stochastic_Bosque(Data(notintraining,:),Random_Forest);
    error = Labels(:,notintraining)'-f_output;
    correctlyclassified = numel(find(error == 0))/numel(error);
    correct(i,2) = correctlyclassified;
    if rem(i,25) == 1
        fprintf('Iteration: %3.0f\n',i);
    end
end
toc;
fprintf('Elapsed time for %3.0f runs: %5.3f seconds\n',length(correct),toc)
fprintf('Average number correct with 35%% samples held out: %5.3f for 10 trees %5.3f for 100 trees \n',mean(correct));
fprintf('Standard deviation correct with 35%% samples held out: %5.3f for 10 trees %5.3f for 100 trees\n',std(correct));

Leo: Hi Waleed,

Unfortunately, it is quite hard to figure out what the problem is without more specific feedback. On top of that, the getargs function is not my code, so I am not that familiar with how it works (or how it can fail).

Perhaps what you could do is remove that line of code altogether and hard-code the parameters. For example, in the case of the cartree function, make it:

function RETree = cartree(Data,Labels)

and then replace the call to getargs by:

minparent=2;
minleaf=1;
m=size(Data,2);
method= 'c';
W= [];

Alternatively, you could make the call to cartree:

function RETree = cartree(Data,Labels,minparent,minleaf,m,method,W)

remove the call to getargs, and just make sure you pass values for all the parameters whenever you call cartree.

If you want to look into fancier options for passing parameters, you might find this thread useful:

http://stackoverflow.com/questions/2775263/how-to-deal-with-name-value-pairs-of-function-arguments-in-matlab

Leo: Hi AMB,

Unfortunately I don't have the google code installed to compare, but I ran comparisons with MATLAB's TreeBagger (glass data, 140/74 split, 10 trees) and got similar results for the two methods (my code seems to give slightly better results, though I am not sure why).

Leo: Hi AMB,

Thanks for all the feedback. You make some very good suggestions, which I will try to incorporate soon, especially concerning the randsample dependency (which hadn't crossed my mind).

For the datasets you are testing on: I tested on Glass with 10 trees and got ~72% accuracy on a 140/74 split. Could you report the splits and numbers of trees you are using, and the accuracies you get with this code and the "google" code?

Thanks!

AMB: I have been running my modified code and comparing the results with the version at

http://code.google.com/p/randomforest-matlab/

The results of the present package, modified as above, do not agree well with the google code. I am using classical datasets such as glass (classification) and Boston housing (regression), and the google code has much higher accuracy. I would be grateful if anyone could share their experience using these classical data sets, to see whether they observe the same in their implementations. The Boston data set is at http://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html and the glass data set is at http://archive.ics.uci.edu/ml/datasets/Glass+Identification

AMB: This package was extremely useful. I should say to all that I am just a new student in this field, and my comments reflect my interest in learning more, in having a toolbox that is accessible, and in one that actually works without days of effort. I have managed to get the other random forest implementations (the google code, etc.) up and running, but only with considerable difficulty owing to mex compilation issues. I did not have that particular difficulty with this package, and as a result was delighted.

This package could be improved if it were accompanied by a demonstration file and some instructions on how to build the package and link the paths, and if the dependency on the randsample Statistics Toolbox routine, which some users do not have, were eliminated.

After modifying a few lines, the calls to randsample can be replaced, I believe. For instance, the call in Stochastic_Bosque:

TDindx = randsample(numel(Labels),n,true);

could, I think, be replaced with

TDindx = round(numel(Labels)*rand(n,1)+.5);
TDindx = unique(TDindx);

and the call in cartree

node_var = sort(randsample(M,m,0));

could be replaced with

node_var = randperm(M);
node_var = sort(node_var(1:m));

There may be limitations to these substitutions when M is large, but I was very pleased with the speed of the entire package.

The author's suggestions to replace internal.stats.getargs with calls to getargs were entirely successful.

On a Mac, the cpp programs mex'd without difficulty. I found it expedient to simply move mx_eval_cartree.mexmaci64, best_cut_node.mexmaci64, and the weighted_hist.m file to the folder containing Stochastic_Bosque.m, rather than adjust paths.

I used the iris data as a demonstration. It is short and uncomplicated, and is available from http://en.wikipedia.org/wiki/Iris_flower_data_set

Just copy the data out and place it into an m-file. I put the data into a matrix called Data. To try out the Stochastic_Bosque routines, I then wrote

Labels = Data(:,5)';
Data = Data(:,1:4);

and then invoked the package with the calls:

Random_Forest = Stochastic_Bosque(Data,Labels,'ntrees',50);
[f_output f_votes] = eval_Stochastic_Bosque(Data,Random_Forest);
error = Labels'-f_output;
correctlyclassified = numel(find(error == 0))/numel(error)

As I am a beginner, operating without a license on the author's source code(!), I thought it useful to subsample the iris data set so that I would have a test set against which to examine the performance of the Random_Forest. While this was unnecessary from a theoretical standpoint, it was a worthwhile check that my modifications to the source were not ruinous.

The resulting test code looks like:

M = length(Labels);
m = round(.5*M);
intraining = randperm(M);
intraining = sort(intraining(1:m));
notintraining = setdiff([1:M],intraining);
Random_Forest = Stochastic_Bosque(Data(intraining,:),Labels(:,intraining),'ntrees',10);
[f_output f_votes] = eval_Stochastic_Bosque(Data(notintraining,:),Random_Forest);
error = Labels(:,notintraining)'-f_output;
correctlyclassified = numel(find(error == 0))/numel(error)

and I was pleased to see that the correctly-classified measure compared favorably with the original.

I should also have liked to see some proximity measures and permutation importance measures; I speculate that perhaps these were eliminated to produce a package that runs swiftly. At any rate, I shall try to build these myself, since it seems I can write a wrapper that calls Stochastic_Bosque and make my own calculations. If the author would care to offer any further suggestions or caveats, I would like to hear them, because I think his work is useful and can be extended.

Waleed Yousef: Thanks, but what about the error message?

Leo: Hi,

The line 45 you refer to has to do with subsampling of the data samples, not the features: each tree is trained using a different subset of the training data.

Waleed Yousef: I received the same errors as Mohammed above and corrected them as you advised. I now receive this error:

Error in ==> getargs at 48
emsg = '';

??? Output argument "varargout{7}" (and maybe others) not assigned during call to "C:\MyDocuments\MATLAB\tmp\getargs.m>getargs".

Waleed Yousef: So what about line 45 in Stochastic_Bosque, where you write cartree(Data(TDindx,:), ...? Doesn't this mean that you enforce a subset of the features on the whole tree?

Leo: Hi,

Random feature selection for the cartrees is done in line 74:

node_var=sort(randsample(M,m,0));

which is inside the tree-construction loop, so it is done separately for each node. Is this what you were referring to, or was it another line of code?

Waleed Yousef: Leo, I just skimmed your code. I think you do random selection of features per tree, not for each node in the tree as it should be. Am I right?

Leo: Hi Mohammad,

It seems to be another version incompatibility.

Replace :

[unique_labels,~,Labels]= unique(Labels);

with

[unique_labels,dummy,Labels]= unique(Labels);

and it should work.

Leo

Mohammad Ali Bagheri: Thanks Leo.

I got another error using your code:

??? Error: File: cartree.m Line: 50 Column: 25
Expression or statement is incorrect--possibly unbalanced (, {, or [.

Error in ==> Stochastic_Bosque at 46
Random_ForestT = cartree(Data(TDindx,:),Labels(TDindx), ...

The 50th line is:

[unique_labels,~,Labels]= unique(Labels);

It seems odd, at least to me.

Besides, I would like to know: is your code based on the random subspace method? If so, what percentage of features is used to create the feature subsets?

Leo: The default number of features sampled at each node is

round(sqrt(size(Data,2)))

where size(Data,2) is the dimensionality of the data. You can set this via the 'nvartosample' parameter.


Leo: Hi Mohammed,

internal.stats.getargs is an internal MATLAB command which I assume is not available in your version. You can download the following:

http://www.mathworks.com/matlabcentral/fileexchange/24082-getargs-m

and simply replace that line with:

[eid,emsg,minparent,minleaf,m,nTrees,n,method,oobe,W] = getargs(okargs,defaults,varargin{:});

(and similarly in the cartree function:

[eid,emsg,minparent,minleaf,m,method,W] = getargs(okargs,defaults,varargin{:});

)

Mohammad Ali Bagheri: Hi Leo,

When I run this command:

Random_Forest = Stochastic_Bosque(Patterns,Targets);

I get this error:

??? Undefined variable "internal" or class "internal.stats.getargs".

Error in ==> Stochastic_Bosque at 39
[eid,emsg,minparent,minleaf,m,nTrees,n,method,oobe,W] = internal.stats.getargs(okargs,defaults,varargin{:});

Why?