Thread Subject: Give me a Regression Problem

Subject: Give me a Regression Problem

From: Greg Heath

Date: 18 Jul, 2008 01:29:28

Message: 1 of 3

On Jul 17, 12:59 pm, paulvbi...@gmail.com wrote:
> On Jul 17, 7:11 am, Greg Heath <he...@alumni.brown.edu> wrote:
>
> > On Jul 16, 5:37 pm, paulvbi...@gmail.com wrote:
>
> > > > Although our methods are not optimal, neither Baldrick nor I have
> > > > found grounds for eliminating x3.
>
> > > **********************************************************
>
> > > Dear Greg
>
> > > why not drop x3 ONLY and see if you get an improvement with your NN
>
> > > in a way I do not like this "choosing" with linear methods
>
> > Linear? ...
>
> > Linear in coefficients so the canned MATLAB stagewise selection
> > function can be used. However, NONLINEAR in variables by
> > including cross-products and higher powers ... very quick and
> > useful. Furthermore, since the simpler models have fewer degrees
> > of freedom, they tend to pinpoint the best variables for
> > consideration.
>
> > For the concrete data set R^2 ~ 0.6, 0.8 and 0.9 were obtained
> > from linear, quadratic and NN models.
>
> ***********************
> the WGL is nonlinear in the coefficients, and the SPSS run with 3 and 6
> out resulted in
> R2=0.805, compared to the ~0.6 you found for a model purely linear in
> the coefficients

Clarification:

Linear-in-coefficients/linear-in-variables    ==> R^2 ~ 0.6
Linear-in-coefficients/quadratic-in-variables ==> R^2 ~ 0.8
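
To make the distinction concrete, here is a minimal Python sketch (not the MATLAB code behind the numbers above; the function name is hypothetical). It expands one input row into linear, squared and cross-product columns. A model fitted on these columns is still a plain weighted sum, so it stays linear in the coefficients while being quadratic in the variables.

```python
from itertools import combinations

def quadratic_features(x):
    """Expand one input row into linear, squared, and cross-product terms."""
    linear = list(x)
    squares = [v * v for v in x]
    cross = [a * b for a, b in combinations(x, 2)]
    return linear + squares + cross

# Three inputs give 3 linear + 3 squared + 3 cross-product columns.
row = [1.0, 2.0, 3.0]
print(quadratic_features(row))
# prints [1.0, 2.0, 3.0, 1.0, 4.0, 9.0, 2.0, 3.0, 6.0]
```

Any ordinary least-squares or stepwise routine can then be run on the expanded columns without modification.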

> so you can see the transformation is not bad
> again, the actual WGL run produced R2=0.848
> ***********************

What does the acronym WGL stand for?


> > That's why I always compare Linear and Quadratic polynomial
> > models before designing NNs. It is very quick and is part of my
> > pretraining data familiarization ritual that considers ranks of input
> > and output data matrices, x-x and y-x correlations, plotting, PCA
> > and clustering.
>
> > > and I want to look a bit more closely at Phil's suggestion of using
> > > the NN for picking and choosing the variables' importance
>
> > Certainly...the NN always has the final say.
>
> *****************
> good I think we all agree on that
> ********************
>
> > However, searching through the different combinations from scratch
> > can get expensive with backward search algorithms; even when they
> > are stepwise greedy instead of stagewise.
>
> > This has led to quicker algorithms based on path-weight products,
> > sensitivity, and other concepts as well as the longer algorithms based
> > on GA.
>
> > I have never used any of these so I cannot summarize their pros
> > and cons.
>
> > However, regardless of which NN subset selection method is used,
> > the Linear/Quadratic warmup can be used as a consistency check
> > and may be useful for reducing the number of variable combinations
> > considered.
>
> **********************************
> I think the Quadratic is good because you get some interaction effects;
> is that right, Greg?
> **********************

Yep. Linear is useful but still leaves questions that are usually
answered by quadratic. I've thought about cubic but never had remaining
questions that plagued me enough. Besides, the number of variables
begins to get up there:

Linear: n0 = 8
Quadratic: nq = 2*n0 + n0*(n0-1)/2 = 16+28 = 44
Cubic: nc = 3*n0 + 2*n0*(n0-1)/2 = 24+56 = 80
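
The counts above can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the last function (the number of non-constant monomials in a full polynomial of a given degree) is added for comparison and is not part of the formulas as posted.

```python
from math import comb

def n_quadratic(n0):
    # linear terms + squares + pairwise cross-products
    return 2 * n0 + n0 * (n0 - 1) // 2

def n_cubic_as_posted(n0):
    # the cubic count exactly as written in the post
    return 3 * n0 + 2 * n0 * (n0 - 1) // 2

def n_full_polynomial(n0, degree):
    # every monomial up to `degree`, excluding the constant term
    return comb(n0 + degree, degree) - 1

print(n_quadratic(8))           # prints 44
print(n_cubic_as_posted(8))     # prints 80
print(n_full_polynomial(8, 2))  # prints 44
print(n_full_polynomial(8, 3))  # prints 164
```

The quadratic figure agrees with the full second-degree polynomial. A full third-degree polynomial in 8 variables has 164 non-constant terms, so the posted figure of 80 presumably counts only a subset of the cubic terms.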

I'd rather use NN techniques at this stage. I would
probably start with the path-weight-product approximation
to exact sensitivity analysis, as I discussed in

nn input sensitivity analysis, comp.ai.neural-nets, 19-20Feb04

Sensitivity of inputs, comp.ai.neural-nets, 12Jan05

Note the averaging over a calibration set rather than
evaluating at a midpoint of the training set. The calibration
set could be an independent validation set, a subset
of the training set, a set of cluster centers or ...
you name it.
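
A minimal Python sketch of the averaging idea follows. It is not the procedure from the posts cited above: the function names, the central-difference approximation and the use of absolute values are all assumptions made for illustration.

```python
def input_sensitivity(model, calibration_set, step=1e-4):
    """Average |d(output)/d(input_i)| over every point in calibration_set."""
    n_inputs = len(calibration_set[0])
    totals = [0.0] * n_inputs
    for x in calibration_set:
        for i in range(n_inputs):
            hi = list(x)
            lo = list(x)
            hi[i] += step
            lo[i] -= step
            # central finite difference for the i-th partial derivative
            totals[i] += abs(model(hi) - model(lo)) / (2 * step)
    return [t / len(calibration_set) for t in totals]

def toy_model(x):
    # hypothetical stand-in for a trained net: quadratic in x[0], linear in x[1]
    return x[0] ** 2 + 0.5 * x[1]

calibration = [[-1.0, 0.0], [1.0, 0.0]]
midpoint = [[0.0, 0.0]]

print(input_sensitivity(toy_model, calibration))  # roughly [2.0, 0.5]
print(input_sensitivity(toy_model, midpoint))     # roughly [0.0, 0.5]
```

In this toy case the midpoint evaluation reports almost no sensitivity to the first input, even though the output depends strongly on it everywhere else. Averaging over the calibration set avoids that.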

The experts in sci.stat.consult recommend that if backward
elimination on a linear variable model rejects x6 and x7
and if adding interactions also leads to the rejection of
x6 and x7 BUT, e.g., x6*x1, x6*x4, x7*x4 and x7^2 are retained,
then x6 and x7 should be kept for the sake of interpretation.
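
That rule can be sketched in Python as below. The term syntax ("*" for products, "^" for powers) and the function name are assumptions made for illustration.

```python
def main_effects_to_keep(retained_terms):
    """Return every variable that appears in any retained term."""
    keep = set()
    for term in retained_terms:
        for factor in term.split("*"):
            # drop any exponent, e.g. "x7^2" -> "x7"
            keep.add(factor.split("^")[0].strip())
    return sorted(keep)

print(main_effects_to_keep(["x6*x1", "x6*x4", "x7*x4", "x7^2"]))
# prints ['x1', 'x4', 'x6', 'x7']
```

Here x6 and x7 come back in the list because they appear in retained interaction and power terms, even though they were rejected as main effects.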

However, for the nonlinear NN model, x6 and x7 must be kept
for prediction accuracy.

I am still curious about techniques that recommend rejecting
x3. Since all of these techniques are suboptimal, it is useful
to know that two reliable techniques can make significantly
different (p-value talk) selections with insignificant differences
in prediction accuracy.

Hope this helps.

Greg

Subject: Give me a Regression Problem

From: Roger Stafford

Date: 18 Jul, 2008 02:05:26

Message: 2 of 3

Greg Heath <heath@alumni.brown.edu> wrote in message
<1fc5e119-fb89-49bd-a966-8383bc8c1d14@b1g2000hsg.googlegroups.com>...
> On Jul 17, 12:59 pm, paulvbi...@gmail.com wrote:
> > On Jul 17, 7:11 am, Greg Heath <he...@alumni.brown.edu> wrote:
> > ..........

  Greg, I count at least two dozen individual threads that you and
paulvbi...@gmail.com have created, all about the same subject. These should all
have been in a single thread. Please desist! Matt Fig has already complained. I
add mine to his.

Roger Stafford


Subject: Give me a Regression Problem

From: Roger Stafford

Date: 18 Jul, 2008 04:13:02

Message: 3 of 3

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in
message <g5otp6$b9f$1@fred.mathworks.com>...
> Greg Heath <heath@alumni.brown.edu> wrote in message
> <1fc5e119-fb89-49bd-a966-8383bc8c1d14@b1g2000hsg.googlegroups.com>...
> > On Jul 17, 12:59 pm, paulvbi...@gmail.com wrote:
> > > On Jul 17, 7:11 am, Greg Heath <he...@alumni.brown.edu> wrote:
> > > ..........
>
> Greg, I count at least two dozen individual threads that you and
> paulvbi...@gmail.com have created, all about the same subject. These
> should all have been in a single thread. Please desist! Matt Fig has
> already complained. I add mine to his.
>
> Roger Stafford

  Greg, I may owe you an apology. I just discovered that in Google Groups all
the articles entitled "Give me a Regression Problem", all 47 of them, lie in only
ONE thread there and have two newsgroups, "comp.soft-sys.matlab" and
"comp.ai.neural-nets", listed. The fact that they come from two newsgroups
may account for why they are appearing individually on the Matlab Central
Newsreader. I would therefore guess there is some incompatibility in the
Newsreader with outside Usenet servers with respect to multiple newsgroups.
Perhaps you ought to inform them of this.

Roger Stafford

