|
On Jul 17, 10:02 pm, paulvbi...@gmail.com wrote:
> On Jul 17, 9:52 pm, Greg Heath <he...@alumni.brown.edu> wrote:
>
> > On Jul 17, 5:40 am, Greg Heath <he...@alumni.brown.edu> wrote:
>
> > > On Jul 15, 3:32 am, Greg Heath <he...@alumni.brown.edu> wrote:
>
> > > > On Jun 25, 12:04 pm, paulvbi...@gmail.com wrote:
> > > > > On Jun 24, 11:20 pm, idea_fo...@yahoo.com wrote:
>
> > > > > > I recently acquired a copy of some very powerful GP software, but I am
> > > > > > new to machine learning and I am not sure where to start. The software
> > > > > > allows for classification and regression problems. I am most
> > > > > > interested in regression problems (for forecasting), but I'm still
> > > > > > learning how to find inputs/outputs.
>
> > > > > > My question is, can anyone out there provide me with some inputs and
> > > > > > outputs for a fairly simple regression problem that I can solve? The
> > > > > > nature of the data can be anything (sun spots, stock market, weather,
> > > > > > etc). I simply want to test the software so that I can get a better
> > > > > > understanding of how it works. Obviously, the more data the better.
>
> > > > > > I would prefer there to be at least 2 inputs and 1 output.
>
> > > > > > Any help would be greatly appreciated.
>
> > > > > *********************************************
>
> > > > > for a round robin test with a colleague in Germany I recently
> > > > > investigated the compressive strength of concrete
>
> > > > > 1030 data points with 8 variables
>
> > > > > all continuous
>
> > > > > > if you run this I would be interested in what you obtained via 10fcv
>
> > > > > best
> > > > > Paul
>
> > > > > > http://archive.ics.uci.edu/ml/datasets/Concrete+Compressive+Strength
>
> > > > Lazy me has found that stagewise input variable
> > > > subset selection on Linear and Quadratic
> > > > Polynomial models is a quick and dirty way to
> > > > choose inputs.
>
> > > > Stagewise is preferable to stepwise (one-way
> > > > greedy forward or backward search) because it
>
> > > > 1. combines forward (p-to-enter) and
> > > > backward (p-to-remove) search
> > > > 2. allows the specification of an
> > > > initial subset which is neither
> > > > full nor empty
> > > > 3. allows the further specification
> > > > of initial variables which are not
> > > > allowed to be removed.
>
> > > > The stagewise MATLAB functions are misnamed
> > > > STEPWISEFIT and STEPWISE(Interactive GUI version).
>
> > > > Since N/(p+1) = 1030/9 ~ 114 >> 10, lazy me used
> > > > all of the data for both training and validation
> > > > with penter = 0.05 and premove = 0.1. Although
> > > > the R^2 values were adjusted for design bias by
> > > > using the reduced degrees of freedom, they aren't
> > > > as unbiased as 10-fold XVAL. However, they should
> > > > be sufficient for input variable subset selection.
>
> > > > For Linear Regression STEPWISEFIT removed no
> > > > variables in the Backward Elimination mode.
> > > > In contrast, variables x6 and x7 were not chosen
> > > > in the Forward Selection mode. However, none of
> > > > the Quadratic Regression models indicated that
> > > > x6 or x7 had insignificant prediction capability;
> > > > merely that the capability was second order via
> > > > cross products and squares.
>
> > > > Also, none of the results indicated that x3 had
> > > > insignificant prediction capability.
>
> > > > Therefore, I used all 8 original variables for
> > > > a MLP NN design.
>
> > > > The variables were standardized to zero mean and
> > > > unit standard deviation. Although 10 pts had
> > > > x5 > 3.6 and 18 pts had x8 > 4.9, I had no
> > > > convincing reason to remove data points just
> > > > because the distributions were skewed.
>
> > > > In contrast, Paul removed x3, x6 and 10 outliers.
>
> > > > For 10-fold XVAL with I-H-O = 8-H-1,
>
> > > > Ntrn = 0.9*N  = 927
> > > > Neq  = Ntrn*O = 927
> > > > Nw   = (I+1)*H+(H+1)*O = 10*H + 1
> > > > Neq > 10*Nw  ==> H < 9.17
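
For anyone who wants to re-derive the H bound for other input counts, here is a minimal Python sketch of the arithmetic above. The function name and its arguments are mine, not from any toolbox, and the factor of 10 is just the rule of thumb used in this thread (Neq > 10*Nw).

```python
import math

def max_hidden_nodes(n_train, n_inputs, n_outputs=1, ratio=10):
    """Return (bound, H) where H is the largest integer with Neq > ratio*Nw.

    Neq = n_train * n_outputs              (number of training equations)
    Nw  = (I+1)*H + (H+1)*O                (weights in an I-H-O MLP)
    """
    n_eq = n_train * n_outputs
    # Nw = (I + 1 + O)*H + O, so Neq > ratio*Nw rearranges to
    # H < (Neq/ratio - O) / (I + 1 + O)
    bound = (n_eq / ratio - n_outputs) / (n_inputs + 1 + n_outputs)
    h_max = math.ceil(bound) - 1  # largest integer strictly below the bound
    return bound, h_max

if __name__ == "__main__":
    # 927 training points per fold, single output, as in the thread
    for n_inputs in (8, 6, 5):
        bound, h_max = max_hidden_nodes(927, n_inputs)
        print(f"I = {n_inputs}: H < {bound:.4f}, so H <= {h_max}")
```

With 927 training points this reproduces the bounds quoted in the thread for 8, 6 and 5 inputs.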
>
> > > > Using MATLAB's TRAINBR for regularized training
> > > > with weight-decay, the R^2 summary statistics are
>
> > > > H     min       median    mean      stdv      max
> > > > 1     0.6027    0.6790    0.6831    0.0419    0.7461
> > > > 2     0.7640    0.8198    0.8160    0.0295    0.8498
> > > > 3     0.8314    0.8647    0.8640    0.0190    0.8910
> > > > 4     0.8337    0.8733    0.8702    0.0211    0.8980
> > > > 5     0.8435    0.8777    0.8765    0.0182    0.9023
> > > > 6     0.8500    0.8905    0.8859    0.0179    0.9129
> > > > 7     0.8635    0.8870    0.8892    0.0199    0.9215
> > > > 8     0.8799    0.9000    0.8964    0.0128    0.9148
> > > > 9     0.8645    0.8918    0.8975    0.0215    0.9361
>
> > > > Resulting in the quote
>
> > > > R^2 = 0.90 +/- 0.02 for H = 9.
>
> > > > The program ran for 183 sec on a 3.2GHz DELL with
> > > > Windows XP.
>
> > > > However, since regularization is used, there is no
> > > > compelling reason to limit H to <= 9. Therefore,
> > > > H = 20 was run with the result
>
> > > > H     min       median    mean      stdv      max
> > > > 20    0.7803    0.9216    0.9050    0.0524    0.9540
>
> > > > or
>
> > > > R^2 = 0.91 +/- 0.05 for H = 20.
>
> > > Removing x3 and x6 yields
>
> > > For 10-fold XVAL with I-H-O = 6-H-1,
>
> > > Ntrn = 0.9*N  = 927
> > > Neq  = Ntrn*O = 927
> > > Nw   = (I+1)*H+(H+1)*O = 8*H + 1
> > > Neq > 10*Nw  ==> H < 11.4625
>
> > > Using MATLAB's TRAINBR for regularized training
> > > with weight-decay, the R^2 summary statistics are
>
> > > H     min       median    mean      stdv      max
> > > 1     0.6115    0.6799    0.6769    0.0384    0.7260
> > > 2     0.7579    0.8177    0.8101    0.0293    0.8433
> > > 3     0.7896    0.8388    0.8327    0.0255    0.8620
> > > 4     0.8117    0.8536    0.8475    0.0233    0.8753
> > > 5     0.8252    0.8602    0.8578    0.0187    0.8821
> > > 6     0.8347    0.8649    0.8697    0.0239    0.8992
> > > 7     0.8419    0.8753    0.8739    0.0164    0.8952
> > > 8     0.8330    0.8896    0.8833    0.0220    0.9058
> > > 9     0.8543    0.8876    0.8800    0.0193    0.9009
> > > 10    0.8395    0.8986    0.8881    0.0248    0.9109
> > > 11    0.8462    0.8953    0.8901    0.0233    0.9141
>
> > > Resulting in the quotes
>
> > > R^2 = 0.88 +/- 0.02 for H = 9.
> > > R^2 = 0.89 +/- 0.02 for H = 11.
>
> > > The program ran for 219 sec on a 3.2GHz DELL with
> > > Windows XP.
>
> > > However, since regularization is used, there is no
> > > compelling reason to limit H to <= 11. Therefore,
> > > H = 20 was run with the result
>
> > > H     min       median    mean      stdv      max
> > > 20    0.8377    0.9212    0.9057    0.0373    0.9505
>
> > > or
>
> > > R^2 = 0.91 +/- 0.04 for H = 20.
>
> > > So, ... excluding x3 and x6 doesn't appear to
> > > significantly degrade performance.
>
> > Moving on ...
>
> > Removing x3, x6 and x7 yields
>
> > For 10-fold XVAL with I-H-O = 5-H-1,
>
> > Ntrn = 0.9*N  = 927
> > Neq  = Ntrn*O = 927
> > Nw   = (I+1)*H+(H+1)*O = 7*H + 1
> > Neq > 10*Nw  ==> H < 13.1
>
> > H     min       median    mean      stdv      max
> > 9     0.8413    0.8922    0.8837    0.0243    0.9167
> > 11    0.8508    0.8887    0.8865    0.0244    0.9208
> > 20    0.8552    0.9092    0.8970    0.0259    0.9269
>
> > 9     R^2 = 0.88 +/- 0.02
> > 11    R^2 = 0.89 +/- 0.02
> > 20    R^2 = 0.90 +/- 0.03
>
> > Hope this helps.
>
> > Greg
>
> *****************************************************
>
> so in some sense we do get a better answer with just x3 and x6 removed
> and not x7 (which is what I initially wanted to do when we (my German
> colleague and I) looked at this back in May and June)
>
> most interesting Greg
>
> you gave the variation as R2 = 0.91 +/- 0.04 so it does look like you
> should retain x7, just, though
I don't agree.
Although I haven't looked up the correct formulas, I don't believe
that the null hypothesis
"All of the above input variable subsets yield the same MSE"
can be rejected when MSE is assumed to be CHISQ
distributed.
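
As a rough sanity check only, and not the chi-square test on MSE mentioned above, here is a Python sketch that runs Welch's t-test on the H = 20 summary statistics quoted in this thread. It assumes each mean and stdv comes from n = 10 fold-level R^2 values (one per fold of the 10-fold XVAL), which is my assumption. Cross-validation folds also share training data, so the fold scores are not truly independent and the result should be read as indicative at best. The function and dictionary names are made up for illustration.

```python
import math
from itertools import combinations

def welch_t(mean_a, sd_a, mean_b, sd_b, n_a=10, n_b=10):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom,
    computed from summary statistics (n = 10 per group is assumed)."""
    var_a = sd_a ** 2 / n_a
    var_b = sd_b ** 2 / n_b
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t, df

# (mean R^2, stdv) for H = 20, copied from the tables in this thread
h20_results = {
    "all 8 inputs": (0.9050, 0.0524),
    "x3, x6 removed": (0.9057, 0.0373),
    "x3, x6, x7 removed": (0.8970, 0.0259),
}

# t statistic for every pair of input subsets
pairwise = {
    (a, b): welch_t(*h20_results[a], *h20_results[b])
    for a, b in combinations(h20_results, 2)
}

if __name__ == "__main__":
    for (a, b), (t, df) in pairwise.items():
        print(f"{a} vs {b}: t = {t:+.2f}, df = {df:.1f}")
```

Under those assumptions every pairwise |t| comes out below 1, well short of the two-sided 5% critical value of roughly 2.1 to 2.2 for these degrees of freedom, which is at least consistent with not rejecting the null hypothesis.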
Greg
|