Thread Subject:
Replicating results of TreeBagger run in a loop?

Subject: Replicating results of TreeBagger run in a loop?

From: Evan Ruzanski

Date: 5 Jun, 2012 19:03:06

Message: 1 of 3

Hello,

I'm trying to replicate one result of TreeBagger (a random forest of regression trees) that was generated inside a nested for-loop. Because TreeBagger is randomized, it is not obvious how to do this. Let me elaborate...

I'm looping over two parameters in search of the "best" settings for a particular application. I specify and initialize the random number stream before the loop so it looks like this:

ntrees = 10:10:300;
thresh = 50:10:150;

%% LOAD X AND Y HERE %%

ntrain = round(size(X,1)/2);

RandStream.setGlobalStream(RandStream('mlfg6331_64','seed',29));
options = statset('UseParallel','never', 'Streams',...
    RandStream.getGlobalStream,'UseSubStreams','never');

for i = 1:length(ntrees)
    for j = 1:length(thresh)

        tb = TreeBagger(ntrees(i),X(1:ntrain,:),Y(1:ntrain),'method','regression',...
                        'Options',options);
        ts = predict(tb,X(ntrain+1:end,:));

        %% EVALUATE TS VS. Y(NTRAIN+1:END) USING THRESH(J) AS A PARAMETER %%

    end
end

%% SAVE EVALUATION RESULTS TO DISK %%

The problem is how to replicate a single run, say ntrees = 100 and thresh = 100, outside the loop. If I run through the whole loop and pick out the result for ntrees = 100 and thresh = 100, it comes out very different from the result of running one standalone instance of TreeBagger with ntrees = 100 and thresh = 100.

The question is: How can I get these to be the same without going through the entire loop?

Thank you kindly...

Subject: Replicating results of TreeBagger run in a loop?

From: Ilya Narsky

Date: 5 Jun, 2012 19:48:06

Message: 2 of 3

The easiest thing to do would be to set the RNG seed at the beginning of
every iteration by executing, for instance,

rng(j+(i-1)*length(thresh))

Then you could execute rng with the same seed before running TreeBagger
outside the loop.
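
The per-iteration seeding described above could look roughly like the sketch below. It is only an illustration under stated assumptions: X and Y are synthetic stand-ins for the real data, the grids are shortened, the pred cell array and the indices i0 and j0 are made-up names, and TreeBagger is left to draw from the global stream (no 'Streams' option), so that rng controls it.

```matlab
% Synthetic stand-in data (assumption: the real X and Y are loaded elsewhere).
rng(0);
X = randn(200, 4);
Y = X(:,1) + 0.5*X(:,2).^2 + 0.1*randn(200, 1);
ntrain = round(size(X,1)/2);

ntrees = 10:10:30;     % shortened grids, for illustration only
thresh = 50:50:150;

pred = cell(length(ntrees), length(thresh));   % hypothetical storage
for i = 1:length(ntrees)
    for j = 1:length(thresh)
        % Unique seed for every (i,j) pair, as suggested in the reply.
        rng(j + (i-1)*length(thresh));
        tb = TreeBagger(ntrees(i), X(1:ntrain,:), Y(1:ntrain), ...
                        'Method', 'regression');
        pred{i,j} = predict(tb, X(ntrain+1:end,:));
    end
end

% Replicate one run outside the loop by reusing that iteration's seed.
i0 = 2; j0 = 3;        % hypothetical indices of the run to reproduce
rng(j0 + (i0-1)*length(thresh));
tb0 = TreeBagger(ntrees(i0), X(1:ntrain,:), Y(1:ntrain), ...
                 'Method', 'regression');
pred0 = predict(tb0, X(ntrain+1:end,:));

% Compare the standalone run with the stored in-loop result.
same = isequal(pred0, pred{i0,j0});
```

The seed depends only on i and j, so the standalone run starts from the same generator state as the matching iteration without replaying the earlier ones.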

Assigning to the Substream property of the RandStream object at the
beginning of each iteration would be a better option in terms of randomness,
because separate substreams are designed to be statistically independent. I
doubt this would matter in practice, since TreeBagger with a few hundred
trees generates a fairly small number of random numbers overall.
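
A minimal sketch of the Substream variant follows, again under assumptions: the data are synthetic, the grids are shortened, and the names s, opts, pred, i0 and j0 are illustrative. It relies on RandStream being a handle object, so the stream stored in the statset options is the same object whose Substream property is set in the loop.

```matlab
% Synthetic stand-in data (assumption: the real X and Y are loaded elsewhere).
rng(0);
X = randn(200, 4);
Y = X(:,1) + 0.5*X(:,2).^2 + 0.1*randn(200, 1);
ntrain = round(size(X,1)/2);

ntrees = 10:10:30;     % shortened grids, for illustration only
thresh = 50:50:150;

% One stream of a type that supports substreams, handed to TreeBagger.
s = RandStream('mlfg6331_64', 'Seed', 29);
opts = statset('Streams', s);

pred = cell(length(ntrees), length(thresh));   % hypothetical storage
for i = 1:length(ntrees)
    for j = 1:length(thresh)
        % Jump to the start of a substream reserved for this (i,j) pair.
        s.Substream = j + (i-1)*length(thresh);
        tb = TreeBagger(ntrees(i), X(1:ntrain,:), Y(1:ntrain), ...
                        'Method', 'regression', 'Options', opts);
        pred{i,j} = predict(tb, X(ntrain+1:end,:));
    end
end

% Replicate one run outside the loop by selecting the same substream.
i0 = 2; j0 = 3;        % hypothetical indices of the run to reproduce
s.Substream = j0 + (i0-1)*length(thresh);
tb0 = TreeBagger(ntrees(i0), X(1:ntrain,:), Y(1:ntrain), ...
                 'Method', 'regression', 'Options', opts);
pred0 = predict(tb0, X(ntrain+1:end,:));

% Compare the standalone run with the stored in-loop result.
same = isequal(pred0, pred{i0,j0});
```

Setting Substream repositions the stream to the beginning of that substream, which is what makes the standalone run start from the same state as the in-loop one.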

-Ilya

Subject: Replicating results of TreeBagger run in a loop?

From: Evan Ruzanski

Date: 5 Jun, 2012 21:50:07

Message: 3 of 3

Good reply Ilya, thank you kindly...