On Jul 17, 12:59 pm, paulvbi...@gmail.com wrote:
> On Jul 17, 7:11 am, Greg Heath <he...@alumni.brown.edu> wrote:
>
> > On Jul 16, 5:37 pm, paulvbi...@gmail.com wrote:
>
> > > > Although our methods are not optimal, neither Baldrick nor I have
> > > > found grounds for eliminating x3.
>
> > > **********************************************************
>
> > > Dear Greg
>
> > > why not drop x3 ONLY and see if you get an improvement with your NN
>
> > > in a way I do not like this "choosing" with linear methods
>
> > Linear? ...
>
> > Linear in coefficients so the canned MATLAB stagewise selection
> > function can be used. However, NONLINEAR in variables by
> > including cross-products and higher powers ... very quick and
> > useful. Furthermore, since the simpler models have fewer degrees
> > of freedom, they tend to pinpoint the best variables for
> > consideration.
>
> > For the concrete data set R^2 ~ 0.6, 0.8 and 0.9 were obtained
> > from linear, quadratic and NN models.
>
> ***********************
> the WGL is nonlinear in the coefficients, and the SPSS run with 3 and 6
> left out resulted in
> R2=0.805, compared to the ~0.6 you found for a model purely linear in
> the coefficients
Clarification:
   Linear-in-coefficients/linear-in-variables    ==> R^2 ~ 0.6
   Linear-in-coefficients/quadratic-in-variables ==> R^2 ~ 0.8
> so you can see the transformation is not bad
> again the actual WGL run produced the R2=0.848
> ***********************
What does the acronym WGL stand for?
> > That's why I always compare Linear and Quadratic polynomial
> > models before designing NNs. It is very quick and is part of my
> > pretraining data familiarization ritual that considers ranks of input
> > and output data matrices, x-x and y-x correlations, plotting, PCA
> > and clustering.
>
> > > and I want to look a bit more closely at Phil's suggestion of using
> > > the NN for picking and choosing the variable's importance
>
> > Certainly...the NN always has the final say.
>
> *****************
> good I think we all agree on that
> ********************
>
> > However, searching through the different combinations from scratch
> > can get expensive with backward search algorithms; even when they
> > are stepwise greedy instead of stagewise.
>
> > This has led to quicker algorithms based on path-weight products,
> > sensitivity, and other concepts as well as the longer algorithms based
> > on GA.
>
> > I have never used any of these so I cannot summarize their pros
> > and cons.
>
> > However, regardless of which NN subset selection method is used,
> > the Linear/Quadratic warmup can be used as a consistency check
> > and may be useful for reducing the number of variable combinations
> > considered.
>
> **********************************
> I think the Quadratic is good because you get some interaction effects;
> is that right, Greg?
> **********************
Yep. Linear is useful but still leaves questions that are usually
answered by quadratic. I've thought about cubic but never had remaining
questions that plagued me enough. Besides, the number of variables
begins to get up there:
   Linear:    n0 = 8
   Quadratic: nq = 2*n0 + n0*(n0-1)/2 = 16+28 = 44
   Cubic:     nc = nq + n0 + n0*(n0-1) + n0*(n0-1)*(n0-2)/6
                 = 44+8+56+56 = 164
              (cubes x_i^3, mixed terms x_i^2*x_j, triple products x_i*x_j*x_k)
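The growth in term counts can be checked by enumeration. The following is a minimal sketch, assuming a full polynomial of each degree (every monomial of total degree 1 up to the stated degree, constant excluded). The function names are hypothetical. The degree-3 count includes every x_i^2*x_j and x_i*x_j*x_k term, so it comes out larger than any count restricted to pure powers and pairwise products.

```python
from itertools import combinations_with_replacement
from math import comb

def polynomial_terms(n_vars, degree):
    """List every monomial of total degree 1..degree as a tuple of variable indices.

    For example, (1, 1, 4) stands for x1^2 * x4.
    """
    terms = []
    for d in range(1, degree + 1):
        terms.extend(combinations_with_replacement(range(1, n_vars + 1), d))
    return terms

def count_terms(n_vars, degree):
    """Closed-form count of the same monomials: C(n + d, d) minus the constant."""
    return comb(n_vars + degree, degree) - 1

n0 = 8
for degree, label in [(1, "Linear"), (2, "Quadratic"), (3, "Cubic")]:
    terms = polynomial_terms(n0, degree)
    # The enumeration and the closed form should agree.
    assert len(terms) == count_terms(n0, degree)
    print(f"{label}: {len(terms)} terms")
```

With only a modest number of training cases, the cubic expansion leaves few degrees of freedom per coefficient, which is one reason to stop at quadratic.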
I'd rather use NN techniques at this stage. I would
probably start with the path-weight-product approximation
to exact sensitivity analysis, as I discussed in

   nn input sensitivity analysis, c.a.n-n, 19-20Feb04
   Sensitivity of inputs, comp.ai.neural-nets, 12Jan05

Note the averaging over a calibration set rather than
evaluating at a midpoint of the training set. The calibration
set could be an independent validation set, a subset
of the training set, a set of cluster centers or ...
you name it.
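To make the calibration-set idea concrete, here is a rough sketch, not a reproduction of the method in the posts cited above. It assumes a single-hidden-layer tanh network with made-up weights, takes "exact sensitivity" to mean the mean absolute partial derivative dy/dx_i over the calibration set, and takes "path-weight product" to mean the sum over hidden units of |input-to-hidden weight * hidden-to-output weight|. All names and numbers are hypothetical.

```python
import math

# Hypothetical weights for a 3-input, 2-hidden-unit tanh network (illustration only).
W1 = [[0.8, -0.1, 0.3],
      [-0.4, 0.05, 0.9]]   # W1[h][i]: input i -> hidden unit h
b1 = [0.1, -0.2]
w2 = [1.5, -0.7]           # hidden unit h -> output
b2 = 0.0

def predict(x):
    """Network output y = b2 + sum_h w2[h] * tanh(b1[h] + W1[h] . x)."""
    return b2 + sum(
        w2[h] * math.tanh(b1[h] + sum(w * xi for w, xi in zip(row, x)))
        for h, row in enumerate(W1)
    )

def gradient(x):
    """Exact partial derivatives dy/dx_i at the point x."""
    grads = [0.0] * len(x)
    for h, row in enumerate(W1):
        a = b1[h] + sum(w * xi for w, xi in zip(row, x))
        slope = 1.0 - math.tanh(a) ** 2   # derivative of tanh at a
        for i, w in enumerate(row):
            grads[i] += w2[h] * slope * w
    return grads

def mean_abs_sensitivity(calibration_set):
    """Average |dy/dx_i| over every point in the calibration set."""
    totals = [0.0] * len(calibration_set[0])
    for x in calibration_set:
        for i, g in enumerate(gradient(x)):
            totals[i] += abs(g)
    return [t / len(calibration_set) for t in totals]

def path_weight_products():
    """Data-free approximation: sum over hidden units of |W1[h][i] * w2[h]|."""
    return [sum(abs(row[i] * w2[h]) for h, row in enumerate(W1))
            for i in range(len(W1[0]))]

# Hypothetical calibration points (could be validation cases or cluster centers).
calibration = [[0.0, 0.0, 0.0], [1.0, -1.0, 0.5], [-0.5, 2.0, -1.0], [2.0, 0.5, 1.5]]
midpoint = [sum(col) / len(calibration) for col in zip(*calibration)]

print("averaged over calibration set:", mean_abs_sensitivity(calibration))
print("at the midpoint only:         ", [abs(g) for g in gradient(midpoint)])
print("path-weight products:         ", path_weight_products())
```

Because the tanh slope never exceeds 1, the path-weight product is an upper bound on each averaged sensitivity in this setup. It needs no data at all, which is what makes it quick.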
The experts in sci.stat.consult recommend that if backward
elimination on a linear variable model rejects x6 and x7
and if adding interactions also leads to the rejection of
x6 and x7 BUT, e.g., x6*x1, x6*x4, x7*x4 and x7^2 are retained,
then x6 and x7 should be kept for the sake of interpretation.
However, for the nonlinear NN model, x6 and x7 must be kept
for prediction accuracy.
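The bookkeeping behind that rule is simple: any variable that appears in a retained term stays in the input set, whether or not its own main effect survived. A minimal sketch follows, with a hypothetical helper name and a retained-term list made up to match the x6/x7 example (the x1 and x4 main effects are assumed retained purely for illustration).

```python
def variables_to_keep(retained_terms):
    """Return every variable that appears in at least one retained term."""
    keep = set()
    for term in retained_terms:
        keep.update(term)
    return sorted(keep)

# Each tuple is one retained model term; ("x7", "x7") stands for x7^2.
# The main effects ("x6",) and ("x7",) were rejected, so they are absent.
retained = [
    ("x1",), ("x4",),               # hypothetical surviving main effects
    ("x6", "x1"), ("x6", "x4"),     # interactions involving x6
    ("x7", "x4"), ("x7", "x7"),     # interaction and square involving x7
]

print(variables_to_keep(retained))  # ['x1', 'x4', 'x6', 'x7']
```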
I am still curious about techniques that recommend rejecting
x3. Since all of these techniques are suboptimal, it is useful
to know that two reliable techniques can make significantly
different (p-value talk) selections with insignificant differences
in prediction accuracy.
Hope this helps.
Greg