Thread Subject:
Splitting Criterion in Decision Forests for Matlab

Subject: Splitting Criterion in Decision Forests for Matlab

From: Jay

Date: 1 Aug, 2013 12:00:14

Message: 1 of 4

Hi!

There is a thorough description of decision forests in this report
http://research.microsoft.com/apps/pubs/default.aspx?id=155552
and I am looking to use a framework like this in MATLAB.

MATLAB's TreeBagger class in combination with classregtree seems the way to go, but I have two problems with this solution:
1. The split criterion (called the weak learner model in the report above) seems to be fixed: classregtree always minimizes a least-squares error at every node. It would be nice if I could write the weak learner model myself and decide how to split.

2. What exactly is the predictor model used in classregtree, i.e., what function does a leaf node use to make a prediction, and how does TreeBagger combine the different outputs of its individual trees? It would also be nice if I could change the predictor model used.


So, is there a way to do what I want in MATLAB (either with TreeBagger or with another implementation)? I would highly appreciate not having to modify any C++ files or the like :)

Subject: Splitting Criterion in Decision Forests for Matlab

From: Alan_Weiss

Date: 1 Aug, 2013 12:44:10

Message: 2 of 4

On 8/1/2013 8:00 AM, Jay wrote:
> [quoted message snipped]

In fact, as of R2011a, Statistics Toolbox offers an ensemble learning
framework for both boosting and bagging that in most respects supersedes
classregtree and TreeBagger. In particular, it allows you to choose the
split criterion:
http://www.mathworks.com/help/stats/classificationtreeclass.html
See the 'SplitCriterion' name-value pair and the definitions of impurity and node error.
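
For example (a minimal sketch using the built-in fisheriris demo data; the criterion choice here is illustrative):

    load fisheriris
    tree = ClassificationTree.fit(meas, species, ...
        'SplitCriterion', 'deviance');   % alternatives: 'gdi' (default), 'twoing'
    view(tree)                           % inspect the fitted splits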

The definition of prediction in this new framework is here:
http://www.mathworks.com/help/stats/compactclassificationtree.predict.html
http://www.mathworks.com/help/stats/compactclassificationensemble.predict.html
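
And a minimal bagging sketch in the same framework (fisheriris again; predict aggregates the individual trees' outputs into the returned scores):

    load fisheriris
    ens = fitensemble(meas, species, 'Bag', 50, 'Tree', ...
        'Type', 'classification');
    [label, score] = predict(ens, meas(1,:));   % score combines the trees' class votes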

Alan Weiss
MATLAB mathematical toolbox documentation

Subject: Splitting Criterion in Decision Forests for Matlab

From: Ilya Narsky

Date: 1 Aug, 2013 13:55:14

Message: 3 of 4

"Jay " <aro-_-mail@web.de> wrote in message
news:ktdike$j23$1@newscl01ah.mathworks.com...
> [quoted message snipped]

It's unlikely that anyone in this forum will want to read the ~140-page document you are referring to. Without reading it, I'd like to clarify a few things.

"Weak learner" in the ensemble literature refers not to the split criterion
imposed by a decision tree, but to any data-fitting model model used
repeatedly for growing an ensemble of such models. The fitensemble function
provides several ensemble-learning algorithms. Some of them use decision
tree as the weak learner, and some of them use k-NN and discriminant.
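
For example (an illustrative sketch with the fisheriris demo data; random subspace ensembles accept k-NN and discriminant weak learners):

    load fisheriris
    ens1 = fitensemble(meas, species, 'Subspace', 50, 'KNN');          % k-NN weak learners
    ens2 = fitensemble(meas, species, 'Subspace', 50, 'Discriminant'); % discriminant weak learners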

The classregtree class, as well as the decision tree classes introduced later
(see Alan's response), supports several split criteria for classification and
one (MSE) for regression. Even if you focus on regression, the decision tree
is not fixed. An important (perhaps the most important) tuning knob in
ensemble learning by decision trees is the tree size. You can control it by
passing the 'minleaf' and 'minparent' parameters to TreeBagger or to
fitensemble (via a learner template), as in the sketch below.
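
For instance (a sketch on toy data; X and Y are placeholders for your own predictors and response):

    X = rand(200, 3);                          % toy predictor matrix
    Y = sin(2*pi*X(:,1)) + 0.1*randn(200, 1);  % toy response
    b = TreeBagger(100, X, Y, 'Method', 'regression', 'MinLeaf', 10);  % larger leaves -> smaller trees
    t   = RegressionTree.template('MinLeaf', 10);                  % learner template
    ens = fitensemble(X, Y, 'Bag', 100, t, 'Type', 'regression');  % same control via fitensemble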

At present, we do not support arbitrary weak learners for ensembles out of
the box. But if you want to code a weak learner model yourself, feel free to
get in touch with me, and I'll see what can be arranged.

Ilya

Subject: Splitting Criterion in Decision Forests for Matlab

From: Jay

Date: 11 Aug, 2013 20:29:22

Message: 4 of 4

Hello, and thanks for your answers.
I don't want to change the weak learners themselves (as you call them, e.g., the trees in a random forest), but the method used to decide how to split a node, i.e., the MSE in a regression tree.
If that were somehow possible, it would be great.

I can give you further insight into why I believe I need this:
I want to do regression on angles. If I feed plain angles (0..360 degrees) into the tree, it has no idea that 2° and 355° are actually very close to each other rather than far apart. I think I could improve the regression if I could somehow alter the splitting criterion so that it reflects this property of angles.
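
A quick numeric check of the wraparound (toy illustration only):

    a = 2; b = 355;                                 % two angles in degrees
    straight = abs(a - b)                           % 353: what a squared-error split sees
    wrapped  = min(abs(a - b), 360 - abs(a - b))    % 7: the true angular separation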
