Thread Subject: Give me a Regression Problem

From: Greg Heath

Date: 18 Jul, 2008 07:15:16

Message: 1 of 1

On Jul 17, 9:52 pm, Greg Heath <he...@alumni.brown.edu> wrote:
> On Jul 17, 5:40 am, Greg Heath <he...@alumni.brown.edu> wrote:
> > On Jul 15, 3:32 am, Greg Heath <he...@alumni.brown.edu> wrote:
> > > On Jun 25, 12:04 pm, paulvbi...@gmail.com wrote:
> > > > On Jun 24, 11:20 pm, idea_fo...@yahoo.com wrote:
>
> > > > > I recently acquired a copy of some very powerful GP software, but I am
> > > > > new to machine learning and I am not sure where to start. The software
> > > > > allows for classification and regression problems. I am most
> > > > > interested in regression problems (for forecasting), but I'm still
> > > > > learning how to find inputs/outputs.
>
> > > > > My question is, can anyone out there provide me with some inputs and
> > > > > outputs for a fairly simple regression problem that I can solve? The
> > > > > nature of the data can be anything (sun spots, stock market, weather,
> > > > > etc). I simply want to test the software so that I can get a better
> > > > > understanding of how it works. Obviously, the more data the better.
>
> > > > > I would prefer there to be at least 2 inputs and 1 output.
>
> > > > > Any help would be greatly appreciated.
>
> > > > *********************************************
>
> > > > for a round robin test with a colleague in Germany I recently
> > > > investigated the compressive strength of concrete
>
> > > > 1030 data points with 8 variables
>
> > > > all continuous
>
> > > > if you run this I would be interested in what you obtained via
> > > > 10-fold cross-validation (10fcv)
>
> > > > best
> > > > Paul
>
> > > > > http://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength
>
> > > Lazy me has found that stagewise input variable
> > > subset selection on Linear and Quadratic
> > > Polynomial models is a quick and dirty way to
> > > choose inputs.
>
> > > Stagewise is preferable to stepwise (one-way
> > > greedy forward or backward search) because it
>
> > > 1. combines forward (p-to-enter) and
> > > backward (p-to-remove) search
> > > 2. allows the specification of an
> > > initial subset which is neither
> > > full nor empty
> > > 3. allows the further specification
> > > of initial variables which are not
> > > allowed to be removed.
>
> > > The stagewise MATLAB functions are misnamed
> > > STEPWISEFIT and STEPWISE (interactive GUI version).
>
> > > Since N/(p+1) = 1030/9 ~ 114 >> 10 , lazy me used
> > > all of the data for both training and validation
> > > with penter = 0.05 and premove = 0.1. Although
> > > the R^2 values were adjusted for design bias by
> > > using the reduced degrees of freedom, they aren't
> > > as unbiased as 10-fold XVAL. However, they should
> > > be sufficient for input variable subset selection.
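For readers who want to see what a degrees-of-freedom adjustment looks like, here is a minimal Python sketch. The post does not say which formula was used, so the standard adjusted R^2 for a linear model with an intercept is an assumption, and the function name is hypothetical.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Degrees-of-freedom adjusted R^2 for a linear model with an intercept.

    Assumed form: 1 - (1 - R^2) * (N - 1) / (N - p - 1).
    """
    dof_residual = n_samples - n_predictors - 1
    if dof_residual <= 0:
        raise ValueError("need more samples than estimated coefficients")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof_residual


# Sizes from the post: N = 1030 points, p = 8 inputs.
N, p = 1030, 8
print(N / (p + 1))  # samples per estimated coefficient, about 114

# The raw R^2 below is a placeholder for illustration, not a result from the post.
print(adjusted_r2(0.60, N, p))
```

With roughly 114 samples per coefficient the correction factor (N - 1)/(N - p - 1) is close to 1, which is consistent with the remark that the adjusted values should be good enough for subset selection.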
>
> > > For Linear Regression STEPWISEFIT removed no
> > > variables in the Backward Elimination mode.
> > > In contrast, variables x6 and x7 were not chosen
> > > in the Forward Selection mode. However, none of
> > > the Quadratic Regression models indicated that
> > > x6 or x7 had insignificant prediction capability;
> > > merely that the capability was second order via
> > > cross products and squares.
>
> > > Also, none of the results indicated that x3 had
> > > insignificant prediction capability.
>
> > > Therefore, I used all 8 original variables for
> > > a MLP NN design.
>
> > > The variables were standardized to zero mean and
> > > unit standard deviation. Although 10 pts had
> > > x5 > 3.6 and 18 pts had x8 > 4.9, I had no
> > > convincing reason to remove data points just
> > > because the distributions were skewed.
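A minimal Python sketch of the standardization step and the tail count, for illustration only. It assumes the sample standard deviation (N-1 denominator) and reads the thresholds 3.6 and 4.9 as standardized units, which the post implies but does not state. The function names and the toy column are hypothetical.

```python
import statistics


def standardize(values):
    """Rescale a column to zero mean and unit (sample) standard deviation."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # N-1 denominator (an assumption)
    if stdev == 0:
        raise ValueError("constant column cannot be standardized")
    return [(v - mean) / stdev for v in values]


def count_above(z_scores, threshold):
    """Count standardized values that lie above the given threshold."""
    return sum(1 for z in z_scores if z > threshold)


# Toy column, NOT taken from the concrete data set.
toy_column = [1.0, 2.0, 3.0, 4.0, 5.0]
z = standardize(toy_column)
print(count_above(z, 1.0))
```

Applied to the real x5 and x8 columns, count_above(z, 3.6) and count_above(z, 4.9) would be the corresponding checks for the long upper tails.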
>
> > > In contrast, Paul removed x3, x6 and 10 outliers.
>
> > > For 10-fold XVAL with I-H-O = 8-H-1,
>
> > > Ntrn = 0.9*N = 927
> > > Neq = Ntrn*O = 927
> > > Nw = (I+1)*H+(H+1)*O = 10*H + 1
> > > Neq > 10*Nw ==> H < 9.17
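The sizing rule above can be checked with a few lines of Python. This is only a sketch of the arithmetic in the post; the function names are hypothetical, and the factor of 10 is the rule of thumb used there.

```python
def n_weights(n_in, n_hidden, n_out):
    """Weight count of an I-H-O MLP with biases: Nw = (I+1)*H + (H+1)*O."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out


def max_hidden(n_train, n_in, n_out, ratio=10.0):
    """Upper bound on H implied by Neq > ratio * Nw, with Neq = n_train * n_out."""
    n_eq = n_train * n_out
    # Nw = (I + 1 + O) * H + O, so solve n_eq / ratio > (I + 1 + O) * H + O for H.
    return (n_eq / ratio - n_out) / (n_in + 1 + n_out)


n_train = 927  # 0.9 * 1030 training points per fold
print(n_weights(8, 9, 1))         # weights in an 8-9-1 network
print(max_hidden(n_train, 8, 1))  # about 9.17 for the 8-H-1 network
```

Changing n_in shows how dropping inputs loosens the bound on H for the same amount of training data.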
>
> > > Using MATLAB's TRAINBR for regularized training
> > > with weight-decay, the R^2 summary statistics are
>
> > > H min median mean stdv max
> > > 1 0.6027 0.6790 0.6831 0.0419 0.7461
> > > 2 0.7640 0.8198 0.8160 0.0295 0.8498
> > > 3 0.8314 0.8647 0.8640 0.0190 0.8910
> > > 4 0.8337 0.8733 0.8702 0.0211 0.8980
> > > 5 0.8435 0.8777 0.8765 0.0182 0.9023
> > > 6 0.8500 0.8905 0.8859 0.0179 0.9129
> > > 7 0.8635 0.8870 0.8892 0.0199 0.9215
> > > 8 0.8799 0.9000 0.8964 0.0128 0.9148
> > > 9 0.8645 0.8918 0.8975 0.0215 0.9361
>
> > > Resulting in the quote
>
> > > R^2 = 0.90 +/- 0.02 for H = 9.
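A minimal Python sketch of how one table row and the quoted figure could be produced. It assumes each row summarizes the R^2 values of the individual cross-validation folds and that the quote is mean +/- stdv rounded to two decimals. The post does not spell this out, so treat both as assumptions. The function names are hypothetical.

```python
import statistics


def summarize(r2_values):
    """Return the min, median, mean, sample stdv and max of a list of R^2 values."""
    return {
        "min": min(r2_values),
        "median": statistics.median(r2_values),
        "mean": statistics.mean(r2_values),
        "stdv": statistics.stdev(r2_values),  # N-1 denominator (an assumption)
        "max": max(r2_values),
    }


def quote(summary):
    """Format a summary as 'R^2 = mean +/- stdv' with two decimals."""
    return "R^2 = {:.2f} +/- {:.2f}".format(summary["mean"], summary["stdv"])
```

Calling quote(summarize(fold_r2)) on a list fold_r2 holding one R^2 value per fold would give a string in the same form as the figure quoted above.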
>
> > > The program ran for 183 sec on a 3.2GHz DELL with
> > > Windows XP.
>
> > > However, since regularization is used, there is no
> > > compelling reason to limit H to <= 9. Therefore,
> > > H = 20 was run with the result
>
> > > H min median mean stdv max
> > > 20 0.7803 0.9216 0.9050 0.0524 0.9540
>
> > > or
>
> > > R^2 = 0.91 +/- 0.05 for H = 20.
>
> > Removing x3 and x6 yields
>
> > For 10-fold XVAL with I-H-O = 6-H-1,
>
> > Ntrn = 0.9*N = 927
> > Neq = Ntrn*O = 927
> > Nw = (I+1)*H+(H+1)*O = 8*H + 1
> > Neq > 10*Nw ==> H < 11.4625
>
> > Using MATLAB's TRAINBR for regularized training
> > with weight-decay, the R^2 summary statistics are
>
> > H min median mean stdv max
> > 1 0.6115 0.6799 0.6769 0.0384 0.7260
> > 2 0.7579 0.8177 0.8101 0.0293 0.8433
> > 3 0.7896 0.8388 0.8327 0.0255 0.8620
> > 4 0.8117 0.8536 0.8475 0.0233 0.8753
> > 5 0.8252 0.8602 0.8578 0.0187 0.8821
> > 6 0.8347 0.8649 0.8697 0.0239 0.8992
> > 7 0.8419 0.8753 0.8739 0.0164 0.8952
> > 8 0.8330 0.8896 0.8833 0.0220 0.9058
> > 9 0.8543 0.8876 0.8800 0.0193 0.9009
> > 10 0.8395 0.8986 0.8881 0.0248 0.9109
> > 11 0.8462 0.8953 0.8901 0.0233 0.9141
>
> > Resulting in the quotes
>
> > R^2 = 0.88 +/- 0.02 for H = 9.
> > R^2 = 0.89 +/- 0.02 for H = 11.
>
> > The program ran for 219 sec on a 3.2GHz DELL with
> > Windows XP.
>
> > However, since regularization is used, there is no
> > compelling reason to limit H to <= 11. Therefore,
> > H = 20 was run with the result
>
> > H min median mean stdv max
> > 20 0.8377 0.9212 0.9057 0.0373 0.9505
>
> > or
>
> > R^2 = 0.91 +/- 0.04 for H = 20.
>
> > So, ... excluding x3 and x6 doesn't appear to
> > significantly degrade performance.
>
> Moving on ...
>
> Removing x3, x6 and x7 yields
>
> For 10-fold XVAL with I-H-O = 5-H-1,
>
> Ntrn = 0.9*N = 927
> Neq = Ntrn*O = 927
> Nw = (I+1)*H+(H+1)*O = 7*H + 1
> Neq > 10*Nw ==> H < 13.1
>
> H min median mean stdv max
> 9 0.8413 0.8922 0.8837 0.0243 0.9167
> 11 0.8508 0.8887 0.8865 0.0244 0.9208
> 20 0.8552 0.9092 0.8970 0.0259 0.9269
>
> 9 R^2 = 0.88 +/- 0.02
> 11 R^2 = 0.89 +/- 0.02
> 20 R^2 = 0.90 +/- 0.03

Why stop here?

    H min median mean stdv max
   10 0.8540 0.8856 0.8835 0.0201 0.9152
   20 0.8372 0.8902 0.8944 0.0298 0.9379
   30 0.8350 0.9096 0.9003 0.0304 0.9381
   40 0.8414 0.9150 0.9064 0.0282 0.9314
   50 0.8575 0.9079 0.9050 0.0273 0.9494
   60 0.8550 0.9009 0.9026 0.0230 0.9387
   70 0.8138 0.9069 0.9030 0.0353 0.9414

Hope this helps.

Greg

