Code covered by the BSD License  

4.7 | 20 ratings | 198 downloads (last 30 days) | File size: 16.1 KB | File ID: #31036 | Version: 1.7

Random Forest

by Leo

13 Apr 2011 (Updated)

Creates an ensemble of CART trees, similar to MATLAB's TreeBagger class.


File Information
Description

An alternative to MATLAB's TreeBagger class, written in C++ and MATLAB.

Creates an ensemble of CART trees (a random forest). The code includes an implementation of CART trees that is considerably faster to train than MATLAB's classregtree.

Compiled and tested on 64-bit Ubuntu.

Acknowledgements

getargs.m inspired this file.

MATLAB release MATLAB 7.9 (R2009b)
Comments and Ratings (69)
16 May 2015 ET-Tahir Zemouri

Hi everybody,
I would like to thank Leo for this code.
Does anyone know how to use this code for one-class RF?

Thanks

Comment only
17 Apr 2015 Olivier Olivier

Hi Leo
Thank you very much for your great *.m code.
I have a question concerning the 'weights' input parameter of Stochastic_Bosque.m.
Suppose my 'Data' input consists of three examples with label '1' and three examples with label '2', and I want
- the weight of the label-'1' instances to be 6, and
- the weight of the label-'2' instances to be 3.
Should weights=[6;6;6;3;3;3]?

Data=[....;....;....;....;....;....]
labels=[1;1;1;2;2;2];
so weights=[6;6;6;3;3;3]?

Regards
Olivier

27 Mar 2015 vimal

Hi Leo,
when I use your code, I get the following error:
Undefined function 'best_cut_node' for input arguments of type 'char'.

Error in cartree (line 84)
[bestCutVar bestCutValue] = ...

Error in Stochastic_Bosque (line 53)
Random_ForestT = cartree(Data(TDindx,:),Labels(TDindx), ....

Please tell me how to debug this.

Comment only
25 Jan 2015 Hira Imtiaz

Hi Leo,
I am a student and have to implement the random forest algorithm on ECG signal feature vectors. I am extracting features, in the form of different peaks in the signal, by this method: http://www.codeproject.com/Articles/309938/ECG-Feature-Extraction-with-Wavelet-Transform-and
Can you please tell me how I can apply your Random Forest code to the above results?
This algorithm is very new to me; could you please help?
Regards

Comment only
07 Jan 2015 Zhiming

Hi Leo,
My problem is already solved. The code cannot run with data of type 'uint8'.
Thanks.

07 Jan 2015 Zhiming

Hi Leo,
I compiled successfully with Visual C++, but I cannot run the code. Some of the error information is listed below:
----------------------------------------------
Segmentation violation detected at Wed Jan 07 09:43:26 2015
----------------------------------------------
Fault Count: 1

Register State:
EAX = 00007d23 EBX = 00001736
ECX = 0bc71998 EDX = 0bc60000
ESI = ffffdccd EDI = 0bc7d348
EBP = 00c2c0cc ESP = 00c2c0c0
EIP = 0ba623f8 FLG = 00010202

Stack Trace:
[0] best_cut_node.mexw32:0x0ba623f8(0x0bc71998, 0x0bc60048, 5942, 5999)
[1] best_cut_node.mexw32:0x0ba6278c(6000, 28, 0x45e97450, 0x45f99450)
[2] best_cut_node.mexw32:0x0ba61182(2, 0x04dacdd0, 6, 1)
......

Could you help me? Thanks!

07 Dec 2014 joy barbosa

Thanks Leo for sharing this code!

Following the solutions provided in the comments below, I was able to run your code. But could you help me figure out how to obtain probability estimates using this code?

31 Jan 2014 Hussein

Could anyone give an example of how to use this function, I mean the input parameters? I built it successfully, so if anyone could please advise. Thanks.

Comment only
18 Dec 2013 fairy
15 Oct 2013 Fatemeh Saki

Hi everybody,
Does anyone know how I can visualize the built tree after training?
Thanks

Fatemeh

Comment only
16 Aug 2013 Gary Tsui

Try, in GBCC.cpp line 3:
#define log2(x) ( (1.0/log(2.0)) * log( (double)(x) ) ) // use double constants

That's what I did; can anyone else help to verify?

Comment only
15 Jun 2013 Fatemeh Saki

Hi Leo,
I cannot run the code. An error occurs while compiling the mx_compile file.
Would you please help me with that?

Comment only
29 Dec 2012 fairy


I have found the reason: the Data matrix is integer-typed, not double.

Comment only
23 Dec 2012 fairy


Thanks Leo!

With your help, it compiled successfully!

By the way, line 24 should be changed to saved_logs[j] = log((double)(j+1))/log(2.0);
.....

Thanks again, Leo.

22 Dec 2012 Leo


Hi fairy,

lcc is not a C++ compiler. Using the Visual Studio compiler, I think the following should do the trick.

in GBCC.cpp change line 24 to

saved_logs[j] = log(j+1)/log(2);

line 115 to

if (diff_labels[nl]>0) bh-=diff_labels[nl]*(log(diff_labels[nl])/log(2)-log(sum_W)/log(2));

line 151 to

if(diff_labels_l[nl]>0) ch-=(diff_labels_l[nl])*(log(diff_labels_l[nl])/log(2)-log(sum_l)/log(2));

and line 152 to

if(diff_labels_r[nl]>0) ch-=(diff_labels_r[nl])*(log(diff_labels_r[nl])/log(2)-log(sum_W-sum_l)/log(2));

Hope this solves it.

Leo

Comment only
22 Dec 2012 fairy


Hi Leo

These are the errors and warnings I get when I compile 'mx_compile_cartree':
mx_compile_cartree
GBCC.cpp
GBCC.cpp(24) : error C2563: mismatch in formal parameter list
GBCC.cpp(24) : error C2568: '=' : unable to resolve function overload
C:\Program Files\Microsoft Visual Studio 9.0\VC\INCLUDE\math.h(567): could be 'long double log(long double)'
C:\Program Files\Microsoft Visual Studio 9.0\VC\INCLUDE\math.h(519): or 'float log(float)'
C:\Program Files\Microsoft Visual Studio 9.0\VC\INCLUDE\math.h(121): or 'double log(double)'
GBCC.cpp(24) : error C2143: syntax error : missing ';' before 'constant'
GBCC.cpp(24) : error C2064: term does not evaluate to a function taking 1 arguments
GBCC.cpp(25) : warning C4244: '=' : conversion from 'double' to 'int', possible loss of data
GBCC.cpp(43) : warning C4244: '=' : conversion from 'double' to 'int', possible loss of data
GBCC.cpp(108) : warning C4244: '=' : conversion from 'double' to 'int', possible loss of data
GBCC.cpp(115) : error C3861: 'log2': identifier not found
GBCC.cpp(115) : error C3861: 'log2': identifier not found
GBCC.cpp(127) : warning C4244: '=' : conversion from 'double' to 'int', possible loss of data
GBCC.cpp(151) : error C3861: 'log2': identifier not found
GBCC.cpp(151) : error C3861: 'log2': identifier not found
GBCC.cpp(152) : error C3861: 'log2': identifier not found
GBCC.cpp(152) : error C3861: 'log2': identifier not found

C:\PROGRA~1\MATLAB\R2012B\BIN\MEX.PL: Error: Compile of 'GBCC.cpp' failed.

Error using mex (line 206)
Unable to complete successfully.

Error in mx_compile_cartree (line 8)
mex -O best_cut_node.cpp GBCR.cpp GBCP.cpp GBCC.cpp
I use MATLAB R2012b and VC++ 2008.

Comment only
22 Dec 2012 fairy


Hi Leo

These are the errors and warnings I get when I compile 'mx_compile_cartree':

lcc preprocessor warning: .\node_cuts.h:8 best_cut_node.cpp:2 No newline at end of file
lcc preprocessor warning: best_cut_node.cpp:60 No newline at end of file
Error best_cut_node.cpp: .\node_cuts.h: 2 redeclaration of `GBCC' previously declared at .\node_cuts.h 1
Error best_cut_node.cpp: .\node_cuts.h: 5 redeclaration of `GBCR' previously declared at .\node_cuts.h 4
Error best_cut_node.cpp: .\node_cuts.h: 8 redeclaration of `GBCP' previously declared at .\node_cuts.h 7
Error best_cut_node.cpp: 35 type error in argument 5 to `GBCC'; found `int' expected `pointer to double'
Error best_cut_node.cpp: 35 type error in argument 7 to `GBCC'; found `pointer to double' expected `int'
Error best_cut_node.cpp: 35 insufficient number of arguments to `GBCC'
Error best_cut_node.cpp: 38 type error in argument 5 to `GBCP'; found `int' expected `pointer to double'
Error best_cut_node.cpp: 38 type error in argument 7 to `GBCP'; found `pointer to double' expected `int'
Error best_cut_node.cpp: 38 insufficient number of arguments to `GBCP'
Error best_cut_node.cpp: 41 type error in argument 5 to `GBCR'; found `int' expected `pointer to double'
Error best_cut_node.cpp: 41 type error in argument 6 to `GBCR'; found `pointer to double' expected `int'
Error best_cut_node.cpp: 41 insufficient number of arguments to `GBCR'
Error best_cut_node.cpp: 59 undeclared identifier `delete'
Error best_cut_node.cpp: 59 illegal expression
Error best_cut_node.cpp: 59 syntax error; found `method' expecting `]'
Error best_cut_node.cpp: 59 type error: pointer expected
Warning best_cut_node.cpp: 59 Statement has no effect
Error best_cut_node.cpp: 59 syntax error; found `method' expecting `;'
Warning best_cut_node.cpp: 59 Statement has no effect
Warning best_cut_node.cpp: 59 possible usage of delete before definition
17 errors, 5 warnings

C:\PROGRA~1\MATLAB\R2012B\BIN\MEX.PL: Error: Compile of 'best_cut_node.cpp' failed.

Error using mex (line 206)
Unable to complete successfully.

Error in mx_compile_cartree (line 8)
mex -O best_cut_node.cpp GBCR.cpp GBCP.cpp GBCC.cpp

I use MATLAB R2012b and VC++ 2008.

Comment only
22 Dec 2012 Leo


Hi fairy,

Could you copy paste the exact error message you get when running mx_compile_cartree.m ?

Leo

Comment only
21 Dec 2012 fairy


Hi Leo, thanks for your help, but now I have another error!
??? Undefined function or method 'best_cut_node' for input arguments of type 'char'.

Error in ==> cartree at 84
[bestCutVar bestCutValue] = ...

Error in ==> Stochastic_Bosque at 48
Random_ForestT = cartree(Data(TDindx,:),Labels(TDindx), ...

It seems that mx_compile_cartree.m failed to compile. Specifically, this command failed: mex -O best_cut_node.cpp GBCR.cpp GBCP.cpp GBCC.cpp. Why? Thanks again.

20 Dec 2012 zeel


How do I use this function? I mean, what are the parameters that I have to give to it?

Comment only
18 Dec 2012 Leo


Hi fairy,

It would seem that the function is not on MATLAB's search path. You can run

addpath(genpath(cd))

Leo

Comment only
18 Dec 2012 fairy


Leo,
I paste the code so you can give me advice. Thank you.
load diabetes
Data = diabetes.x;
Labels = diabetes.y;
Random_Forest = Stochastic_Bosque(Data,Labels);

I get this error:
??? Undefined function or method 'cartree' for input arguments of type 'double'.

Error in ==> Stochastic_Bosque at 48
Random_ForestT = cartree(Data(TDindx,:),Labels(TDindx), ...
Could you kindly tell me why? Thanks very much.

Comment only
13 Dec 2012 LE


Hi Leo, I am afraid this package cannot handle categorical features. How could I update the code to handle datasets with categorical features?
Kindly guide me.
Thanks.

11 Dec 2012 Leo


Hi qing,

Yes the elements of the vector "nodeCutVar" are feature indexes. You can retrieve the tree structure from the field RETree.childnode. For a node i the indexes of the child nodes are RETree.childnode(i) and RETree.childnode(i) + 1, for the left and right child.

Hope this helps.

Leo

Comment only
11 Dec 2012 qing


Hi leo,

Are the elements of the vector "nodeCutVar" feature indexes? But, how can I see the tree structure? I mean the relationship between features. Thanks!

Comment only
30 Oct 2012 Linh Dang

Could anybody give an example of how to run these files? I'd really appreciate it.

03 Oct 2012 Marios
26 Jul 2012 Michael

Quick, clean and easy to use.
A useful submission.

24 May 2012 mai


Hi Leo,
I'm new to MATLAB and would like to explore random forests further, but I don't understand most of this.
function Random_Forest = Stochastic_Bosque(Data,Labels,varargin)
Data refers to my data.
What are Labels and varargin for?

Comment only
21 May 2012 Michael

Excellent work, code is well documented and clear, plus runtime is reasonable.

Adding a Readme file with description of the data format, and a demo.m would be very helpful.

Thanks for sharing.

11 Apr 2012 Leo


Hi Matteo,

Sorry for the late reply, did not receive a notification email.

Anyway you are correct, that is a bug in the code. It was pointed out by c. a few comments up.

The code has now been updated to remove that line.

Thanks for the feedback and rating.

Comment only
23 Mar 2012 Matteo


Very good software, thank you for your effort!
I was wondering whether sampling with replacement is really implemented in this method, as the documentation says.
At lines 43-44 you do:
TDindx = round(numel(Labels)*rand(n,1)+.5);
(NOTE: why not use the 'randi' function?)
which draws 'n' indexes, and THEN you do:
TDindx = unique(TDindx);
removing all the duplicates!
Is that correct?

22 Mar 2012 Jeff


Hi Leo, based on your experience, if this program were converted into pure C/C++, would that help improve the processing speed on a PC?

06 Mar 2012 Enric Junque de Fortuny

I noticed that the f_output vector sometimes swaps dimensions in eval_Stochastic_Bosque(). Quickfix:

Add this at line 34 in eval_Stochastic_Bosque():
if (size(Data,1) ~= size(f_output,1))
f_output = f_output';
f_votes = f_votes';
end

23 Feb 2012 Leo


Hi C.

Thanks for pointing that out. I believe you are correct, that line should be commented out.

Comment only
23 Feb 2012 c.


Hi,
why do you call
TDindx = unique(TDindx);
when creating the forest?
I was under the impression that the use of bagging would improve the generalization ability of the model, but through the call to unique we are getting rid of all multiple instances. Why did you choose not to use bagging, but rather subsets of the original data?

Comment only
15 Nov 2011 Ming


Impressive.
Hi all,
Some people above said that the package fails to compile on Windows. I found that this is probably because log2() in GBCC.cpp is not a standard C function. A feasible solution is to replace log2(n) with log((double)n)/log((double)2).
Thanks again for the code sharing.

15 Nov 2011 Ming
03 Nov 2011 Afzan

Thank you.

Comment only
02 Nov 2011 Leo


Hi Afzan,

It is in:

/Stochastic_Bosque/cartree/mx_files

It's a C++ file.

Leo

Comment only
02 Nov 2011 Afzan

Leo, where is the best_cut_node function (cartree line 45)? I can't find it; or is it a compatibility problem again?

14 Sep 2011 Leo


Hi Kourosh,

Sorry for the late reply. Unfortunately I don't have a Windows machine to try this out. I am surprised it won't compile under Windows though. On Ubuntu it compiles using gcc.

Anyway if you paste the errors here maybe I can help out a bit more.

Comment only
24 Aug 2011 Kourosh Khoshelham

Hi Leo, thanks for sharing this code. I have difficulty mexing the cpp files. Do I need a special compiler? When I get to this line:
mex -O best_cut_node.cpp GBCR.cpp GBCP.cpp GBCC.cpp
I receive many errors in node_cuts.h, and it finally says: Compile of 'best_cut_node.cpp' failed.
I'm using R2007b on win32.
Could you help?
Thanks,
Kourosh

Comment only
16 Jun 2011 Leo


Hey,

sorry, the update was pending approval. It should be OK now.

Comment only
12 Jun 2011 AMB


Hi, the zip file that I can download now has the same creation date as the old one: it requires all the changes I made in order to run, and performs the same as well. Please advise.

Comment only
06 Jun 2011 Leo


Hey,

I have already updated the code. You can re-download it.

Comment only
03 Jun 2011 AMB


Hi, will you be posting the new code? I would like to try it out on regression. Right now, when I use the old code on the classic Boston Housing data set, I get all NaNs. I would like to see whether this problem disappears in the code you fixed.

Comment only
03 Jun 2011 Leo


Hi Shujjat,

You would have to show me exactly what line 99 is in your code. In my code I do not get any errors. I suspect you have altered the code and inadvertently added or omitted a parenthesis.

Leo

Comment only
03 Jun 2011 Shujjat

Hi Leo,
I am following your and Mohammad's threads, and I am getting the following errors after making your indicated amendments.
>> Random_Forest = Stochastic_Bosque(data,label);
??? Error: File: cartree.m Line: 99 Column: 27
Expression or statement is incorrect--possibly unbalanced (, {, or [.

Error in ==> Stochastic_Bosque at 45
Random_ForestT = cartree(Data(TDindx,:),Labels(TDindx), ...

Can you please help me?
cheers

Comment only
31 May 2011 Leo


Hey AMB,

Thanks a lot for all your help. I found the bug that was causing the difference in performance (accuracy-wise): a part of the code erroneously and implicitly assumed that feature values were distinct.

It is now fixed, and results on the Glass dataset are equivalent to the results you quote for the google code.

Regarding speed, the code seems to run considerably faster on my PC, but nowhere near as fast as the google code, which is to be expected, as the google code is written almost entirely in C/C++.

I have also removed the dependencies from the statistics toolbox using your suggestions (thanks!).

Dependence on internal.stats.getargs has also been removed

Comment only
31 May 2011 AMB


Hi - I have followed your suggestion to compare the results of your code versus the "google code". This google code is at

http://code.google.com/p/randomforest-matlab/

This is a Matlab (and Standalone application) port for the excellent machine learning algorithm `Random Forests' - By Leo Breiman et al. from the R-source by Andy Liaw et al. http://cran.r-project.org/web/packages/randomForest/index.html ( Fortran original by Leo Breiman and Adele Cutler, R port by Andy Liaw and Matthew Wiener.) Current code version is based on 4.5-29 from source of randomForest package by Abhishek Jaiantilal.

Against the "glass" data set, here are the statistics for 10 and 100 trees, withholding 35% of the data as you had done.

For RandomBosque, the results were:

Elapsed time for 1000 runs: 1648.743 seconds
Average number correct with 35% samples held out: 0.636 for 10 trees 0.684 for 100 trees
Standard deviation correct with 35% samples held out: 0.076 for 10 trees 0.070 for 100 trees

For class_RFtrain and classRFpredict, the results were:

Elapsed time for 1000 runs: 88.021 seconds
Average number correct with 35% samples held out: 0.722 for 10 trees 0.758 for 100 trees
Standard deviation correct with 35% samples held out: 0.051 for 10 trees 0.048 for 100 trees

I use a MACAIR with MATLAB 2011a and OS 10.6.7.
I was surprised at the differences in both runtime and the statistics.

My calls to Randombosque look as follows:

tic;
correct = zeros(1000,2);
for i = 1:length(correct);

M = length(Labels);
m = round(.65*M);
intraining = randperm(M);
intraining = sort(intraining(1:m));
notintraining = setdiff([1:M],intraining);

Random_Forest = Stochastic_Bosque(Data(intraining,:),Labels(:,intraining),'ntrees',10);
[f_output f_votes] = eval_Stochastic_Bosque(Data(notintraining,:),Random_Forest);
error = Labels(:,notintraining)'-f_output;
correctlyclassified = numel(find(error == 0))/numel(error);
correct(i,1) = correctlyclassified;

Random_Forest = Stochastic_Bosque(Data(intraining,:),Labels(:,intraining),'ntrees',100);
[f_output f_votes] = eval_Stochastic_Bosque(Data(notintraining,:),Random_Forest);
error = Labels(:,notintraining)'-f_output;
correctlyclassified = numel(find(error == 0))/numel(error);
correct(i,2) = correctlyclassified;

if rem(i,25) == 1
fprintf('Iteration: %3.0f\n',i);
end
end
toc;

fprintf('Elapsed time for %3.0f runs: %5.3f seconds\n',length(correct),toc)
fprintf('Average number correct with 35%% samples held out: %5.3f for 10 trees %5.3f for 100 trees \n',mean(correct));
fprintf('Standard deviation correct with 35%% samples held out: %5.3f for 10 trees %5.3f for 100 trees\n',std(correct));

Comment only
22 May 2011 Leo


Waleed Hi,

Unfortunately it is quite hard to figure out what the problem is without more specific feedback. On top of this, the getargs function is not my code so I am not that familiar with how it works (or how it can fail).

Perhaps what you could do is remove that line of code altogether and hard-code the parameters. For example, in the case of the cartree function, make it:

function RETree = cartree(Data,Labels)

and then replace the call to getargs by :

minparent=2;
minleaf=1;
m=size(Data,2);
method= 'c';
W= [];

Alternatively you could make the call to cartree :

function RETree = cartree(Data,Labels,minparent,minleaf,m,method,W)

remove the call to getargs and just make sure you pass values for all the parameters whenever you call cartree.

If you want to look into more fancy options for passing parameters, you might find this thread useful :

http://stackoverflow.com/questions/2775263/how-to-deal-with-name-value-pairs-of-function-arguments-in-matlab

Comment only
22 May 2011 Leo


Hi AMB,

Unfortunately I don't have the google code installed to compare against, but I ran comparisons with MATLAB's TreeBagger (glass data, 140/74 split, 10 trees) and got similar results for the two methods (my code seems to give better results, though I am not sure why).

Comment only
22 May 2011 Leo


Hi AMB,

Thanks for all the feedback. You make some very good suggestions which I will try to incorporate soon, especially concerning the randsample dependency (which hadn't crossed my mind).

For the datasets you are testing on: I tested on Glass with 10 trees and got ~72% accuracy on a 140/74 split. Could you report what splits and numbers of trees you are using, and what accuracies you are getting with this code and the "google" code?

Thanks!

Comment only
22 May 2011 AMB


I have been running my modified code and comparing the results with the version on

http://code.google.com/p/randomforest-matlab/

The results of the present package that I modified as above, against the google code do not agree well. I am using classical datasets such as glass (classification) and boston housing (regression), and the google code has a much higher degree of accuracy. I would be grateful if anyone could share their experience on using these classical data sets to see whether they see the same result in their implementations. The boston data set is at http://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html and the glass data set is at http://archive.ics.uci.edu/ml/datasets/Glass+Identification

Comment only
22 May 2011 AMB


This package was extremely useful. I should say to all that I am just a new student in this field and my comments reflect my interest in learning more, having a toolbox that is accessible, and one that actually works without days of effort. I must say that I have managed to get the other RandomForest implementations (Google code etc...) up and running but only with considerable difficulty owing to mex compilation issues. I did not have this particular difficulty with this package and as a result was delighted.

This package could be improved if it were accompanied by a demonstration file, some instruction on how to build the package and link the paths, and had eliminated the dependency on the randsample statistic toolbox routine, which some users do not have.

After modification of a few lines, the calls to randsample can be replaced, I believe. For instance the call in Random_Bosque:

TDindx = randsample(numel(Labels),n,true);

could I think be replaced with

TDindx = round(numel(Labels)*rand(n,1)+.5);
TDindx = unique(TDindx);

and the call in cartree

node_var = sort(randsample(M,m,0));

could be replaced with

node_var = randperm(M);
node_var = sort(node_var(1:m));

There may be limitations to using these substitutions when M is large, but I was very pleased with the speed of the entire package.

The author's suggestions to replace the internal.stats.getargs with calls to getargs were entirely successful.

On a Mac, the cpp programs mex'd without difficulty. I found it expedient to simply move the mx_eval_cartree.mexmaci64, best_cut_node.mexmaci64, and weighted_hist.m files to the folder containing Stochastic_Bosque.m, rather than to adjust paths.

I used the irisdata as a demonstration. It is short and uncomplicated. It is available from http://en.wikipedia.org/wiki/Iris_flower_data_set

Just copy the data out and place it into an mfile. I put the data into a matrix called Data. To try out the Stochastic Bosque routines, I then wrote

Labels = Data(:,5)';
Data = Data(:,1:4);

and then invoked the package by the calls:

Random_Forest = Stochastic_Bosque(Data,Labels,'ntrees',50);
[f_output f_votes]= eval_Stochastic_Bosque(Data,Random_Forest);
error = Labels'-f_output;
correctlyclassified = numel(find(error == 0))/numel(error)

As I am a beginner (and was operating without a license on the author's source code!), I thought it useful to subsample the iris data set so that I would have a test set against which to examine the performance of the Random_Forest. While this was unnecessary from a theoretical standpoint, I thought it was worthwhile from the standpoint of checking that my modifications to the source were not ruinous.

The resulting test code looks like

M = length(Labels);
m = round(.5*M);
intraining = randperm(M);
intraining = sort(intraining(1:m));
notintraining = setdiff([1:M],intraining);
Random_Forest = Stochastic_Bosque(Data(intraining,:),Labels(:,intraining),'ntrees',10);
[f_output f_votes] = eval_Stochastic_Bosque(Data(notintraining,:),Random_Forest);
error = Labels(:,notintraining)'-f_output;
correctlyclassified = numel(find(error == 0))/numel(error)

and I was pleased to see that the correctlyclassified measure compared favorably with the original.

I should have also liked to see some proximity measures and permutation importance measures present; I speculate that perhaps these were eliminated to produce a package that runs swiftly. At any rate, I shall try to make these myself, because it seems to me that I can write a wrapper and call Stochastic_Bosque to make my own calculations. If the author would care to offer any further suggestions or caveats, I would like to hear them, because I think that his work is useful and can be extended.

21 May 2011 Waleed Yousef

Thanks, but what about the error message?

Comment only
21 May 2011 Leo


Hi,

The line 45 you refer to has to do with the subsampling of data samples, not of features. Each tree is trained using a different subset of the training data.

Comment only
21 May 2011 Waleed Yousef

I received the same errors as Mohammed above. I corrected them as you advised. Now I receive this error:

Error in ==> getargs at 48
emsg = '';

??? Output argument "varargout{7}" (and maybe others) not assigned during call to "C:\MyDocuments\MATLAB\tmp\getargs.m>getargs".

Comment only
20 May 2011 Waleed Yousef

So, what about line 45 in Stochastic_Bosque, where you say: cartree(Data(TDindx,:), ...

Doesn't this mean that you enforce a subset of the features on the whole tree?

Comment only
20 May 2011 Leo


Hi,

Random feature selection for the cartrees is done at line 74:

node_var=sort(randsample(M,m,0));

which is inside the tree construction loop, so it is done separately for each node. Is this what you were referring to, or was it another line of code?

Comment only
20 May 2011 Waleed Yousef

Leo, I just skimmed your code. I think you do random selection of features per tree, not per node in the tree as it should be. Am I right?

Comment only
09 May 2011 Leo


Hi Mohammad,

It seems to be another version incompatibility.

Replace :

[unique_labels,~,Labels]= unique(Labels);

with

[unique_labels,dummy,Labels]= unique(Labels);

and it should work.

Leo

Comment only
09 May 2011 Mohammad Ali Bagheri

Thanks Leo

I got another error using your codes:

??? Error: File: cartree.m Line: 50 Column: 25
Expression or statement is incorrect--possibly unbalanced (, {, or [.

Error in ==> Stochastic_Bosque at 46
Random_ForestT = cartree(Data(TDindx,:),Labels(TDindx), ...

The 50th line is:
[unique_labels,~,Labels]= unique(Labels);
It seems odd; at least for me.

Besides, I would like to know whether your code is based on the random subspace method. If so, what percentage of features is used to create the feature subsets?

Comment only
09 May 2011 Leo


Default number of features sampled at each node is

round(sqrt(size(Data,2)))

where size(Data,2) is the dimensionality of the data.

You can set this parameter via the
'nvartosample' parameter.

Comment only
03 May 2011 Leo


Hi Mohammed,

internal.stats.getargs is an internal MATLAB command which I assume is not available in your version. You can download the following:

http://www.mathworks.com/matlabcentral/fileexchange/24082-getargs-m

and simply replace that line by :

[eid,emsg,minparent,minleaf,m,nTrees,n,method,oobe,W] =
getargs(okargs,defaults,varargin{:});

(and similarly in the cartree function :

[eid,emsg,minparent,minleaf,m,method,W] = getargs(okargs,defaults,varargin{:});

)

Comment only
28 Apr 2011 Mohammad Ali Bagheri

Hi Leo

When I run this command:
Random_Forest = Stochastic_Bosque(Patterns,Targets);

I get this error:

??? Undefined variable "internal" or class "internal.stats.getargs".

Error in ==> Stochastic_Bosque at 39
[eid,emsg,minparent,minleaf,m,nTrees,n,method,oobe,W] =
internal.stats.getargs(okargs,defaults,varargin{:});

Why?!!

Comment only
Updates
14 Jun 2011 1.6

Removed implicit assumption of distinct feature values, removed statistical toolbox dependency, removed internal command dependency

11 Apr 2012 1.7

Fixed bug (see comment by c.)

Contact us