
Thread Subject: Re: Give me a Regression Problem

Subject: Re: Give me a Regression Problem

From: Greg Heath

Date: 17 Jul, 2008 11:52:37

Message: 1 of 1

On Jul 16, 6:54 pm, baldrick <philbrier...@hotmail.com> wrote:
> On Jul 17, 5:12 am, Greg Heath <he...@alumni.brown.edu> wrote:
> > On Jul 15, 7:14 am, baldrick <philbrier...@hotmail.com> wrote:
> > -----SNIP
>
> The randomizations were repeated 100 times for each variable being
> tested. It is no good doing it just once - the more the merrier. If
> you do it only a few times you will not get consistent results.
>
> The scrambled correlation is the new model r^2 when the variable in
> question is messed around with, or 'scrambled', averaged over the
> number of times the 'scrambling' is repeated.
>
> The relative importance is calculated by simple linear transformation
> based on the scrambled correlations, such that the variable whose
> scrambled correlation is lowest gets an importance of 1 and any
> variable whose scrambled correlation is the same as the normal model
> gets an importance of 0 (which means that it does not matter what
> value that variable has). It is possible to get negative importance
> which would mean randomizing that variable is actually improving the
> model!
>
> You have to drop statistical thinking to understand what this method
> is telling you. It is saying, 'if I use this current model, what will
> happen to the performance if one of my variables goes belly up'.

I don't consider randomizing inputs an abandonment of statistical
thinking.
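
For reference, the scrambling procedure quoted above can be sketched
roughly as follows. This is only an illustration under stated
assumptions, not baldrick's actual code: the function names
(r_squared, scrambled_r2, relative_importance), the toy data, and the
fixed linear "model" are all made up for the example. The model is held
fixed and only the inputs are shuffled, as the quoted text describes.

```python
import random

def r_squared(y_true, y_pred):
    """Coefficient of determination of predictions against targets."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def scrambled_r2(predict, X, y, col, n_repeats, rng):
    """Mean r^2 of the unchanged model when column `col` is shuffled."""
    total = 0.0
    for _ in range(n_repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        X_scr = [row[:col] + [v] + row[col + 1:]
                 for row, v in zip(X, shuffled)]
        total += r_squared(y, [predict(row) for row in X_scr])
    return total / n_repeats

def relative_importance(predict, X, y, n_repeats=100, seed=0):
    """Linear rescaling: lowest scrambled r^2 -> 1, unchanged r^2 -> 0."""
    rng = random.Random(seed)
    base = r_squared(y, [predict(row) for row in X])
    scr = [scrambled_r2(predict, X, y, j, n_repeats, rng)
           for j in range(len(X[0]))]
    lowest = min(scr)
    if base == lowest:
        raise ValueError("scrambling changed nothing; importance undefined")
    # Values below 0 mean scrambling that variable improved the model.
    return [(base - s) / (base - lowest) for s in scr]

# Toy data (assumed): x0 matters a lot, x1 a little, x2 is pure noise.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [3.0 * r[0] + 0.5 * r[1] + data_rng.gauss(0, 0.1) for r in X]

# Stand-in for an already fitted model (assumed coefficients).
def predict(row):
    return 3.0 * row[0] + 0.5 * row[1] + 0.0 * row[2]

importance = relative_importance(predict, X, y, n_repeats=100)
print(importance)
```

In this toy setup the noise column x2 should come out with an importance
of essentially 0, and x0 with an importance of 1 by construction.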

> This is particularly important in areas such as credit risk. If you
> are using a field such as FICO score in your model, and FICO suddenly
> decide they are going to calculate it in a different way without
> telling you, or you lose the data feed, then you need to know what
> will happen to your model. Another example is personal income, which
> gradually increases over time - if your model is heavily reliant on
> income, then it will start to deteriorate quite quickly.
>
> There is also no reason why strongly correlated variables should not be
> used together. Historically this is to do with the maths behind
> finding the coefficients using certain techniques - inverting matrices
> and so on.

Inverting matrices for regression is passé.
Although you may have to resort to pseudoinversion,
keeping highly correlated pairs is acceptable.

> Logically, I would rather have say, both income and bank
> balance in my model even if they were highly correlated. How do I know
> which one is the real driver of whatever is being predicted, and there
> is no reason why they should stay correlated (banks know your balance,
> but you could start lying about your income). Having both in the
> model is kind of hedging your bets against things going wrong (you
> would want them to have similar importance though).

Devil's Advocate:

If you try to hedge your bet you may not notice, as quickly,
when one of the quantities becomes unreliable.

> Personally I use these importance calculations for initially trimming
> out the rubbish (as in the random numbers I put in the concrete data)
> and getting down to the variables of interest.

I use the Quadratic backward elimination the same way.

> It does save a lot of
> time. I have come across model builders who have thousands of
> candidate variables and spend months inspecting each one in turn -
> only to end up getting rid of 95% of them.

Sometimes it is tough, especially when the number of observations
is not high and predictor correlations are significant. The goal then
is not optimality, just achieving some prediction measure threshold.

Hope this helps.

Greg
