On Feb 7, 12:52 pm, "Carlos " <cardo...@hotmail.com> wrote:
> Hi all,
>
> I'm just starting with the Matlab neural network toolbox and I have a couple of
> questions that I couldn't answer reading the user help.
>
> The topic behind my research is that I have to fit a large ensemble of data to match
> an output that depends on 7 parameters (assumption). 3 of these parameters are
> related, but are independent of each other.
?
Please give a simple example of variables that are related but
independent.
> Thus, I created a custom network (feedforward cascade) using 3 input vectors with
>1 dimension and another input vector with 3 dimensions
Above you implied that you had 7 input variables (not 6) and 1 output
variable.
Please clarify.
Have you considered using either of the single hidden layer universal
approximators (i.e., multilayer perceptron or radial basis function)?
> and I trained the network with Levenberg-Marquardt. The problem comes with the
>large amount of data that it is available.
You would be shocked at the number of designers that would salivate if
they were burdened with your problem instead of not having enough
data.
The basic problem is to find a subset of the data that adequately
characterizes all of the data.
One rule of thumb for models that are linear in estimated parameters
(e.g., polynomials) is that Neq >= Nw is required but Neq >> Nw is
desired to mitigate noise and measurement error (Neq = number of
training equations and Nw is the number of weights to be estimated).
I have found that the rule tends to be valid for nonlinear neural
nets provided Neq/Nw is sufficiently large. For a single hidden layer
with H nodes, the node topology is I-H-O with Nw =(I+1)*H+(H+1)*O
weights to be estimated via Neq = Ntrn*O training equations. The rule
of thumb can then be interpreted as H <= Hub is required but
H << Hub is desired where the upper bound value is
Hub = (Neq-O)/(I+O+1).
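A minimal sketch of the arithmetic above, written in Python rather than
MATLAB purely for illustration. The function names and the example
sizes (7 inputs, 1 output, 10,000 training cases, H = 20) are made up
for the example and do not come from the original question.

```python
def num_weights(I, H, O):
    """Weights in an I-H-O single-hidden-layer net, biases included."""
    return (I + 1) * H + (H + 1) * O


def hidden_upper_bound(Ntrn, I, O):
    """Hub = (Neq - O) / (I + O + 1), with Neq = Ntrn * O."""
    Neq = Ntrn * O
    return (Neq - O) / (I + O + 1)


# Hypothetical sizes: 7 inputs, 1 output, 10,000 training cases.
I, O, Ntrn = 7, 1, 10_000
Hub = hidden_upper_bound(Ntrn, I, O)
H = 20  # a candidate hidden-layer size, chosen so that H << Hub
print(f"Hub = {Hub:.1f}, Nw(H={H}) = {num_weights(I, H, O)}, Neq = {Ntrn * O}")
```

The bound follows from rewriting Nw as H*(I+O+1) + O and requiring
Nw <= Neq.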
> I firstly tried to stratify the inputs and compute the means in order to generalize my
>inputs. However the results were not satisfactory. Thus, I would like to try now to
>force the training to ingest as much raw data as possible.
A better idea is to try to understand the data better. In particular,
to find sufficiently large random subsets of data that adequately
characterize the 7==>1 input/output relationship.
I would begin by investigating the dependence of y on each of the
xi(i=1:7).
1. Standardize all 8 variables to zero mean and unit variance.
2. Reorder the I/O pairs so that y is nondecreasing (this is very
helpful).
3. Plot all 8 variables vs. index.
4. Check the rank and condition number of X and Z = [ X ; y ].
5. Calculate the 8x8 correlation coefficient matrix of Z and note
unusually high or low linear correlations.
6. Plot y vs. each xi (labelled with linear correlation coefficients).
7. Consider input variable subset reduction (including PCA).
8. Obtain the R^2 of a linear model as a reference.
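Steps 1, 2 and 5 of the list above could be sketched as follows. This
is only an illustration in plain Python (no toolbox): the data are
synthetic and made up for the example, only 3 inputs are used instead
of 7, and the helper names standardize and corrcoef are hypothetical.

```python
import math
import random


def standardize(v):
    """Shift to zero mean and scale to unit (population) variance."""
    n = len(v)
    mean = sum(v) / n
    std = math.sqrt(sum((a - mean) ** 2 for a in v) / n)
    return [(a - mean) / std for a in v]


def corrcoef(a, b):
    """Pearson linear correlation coefficient of two equal-length lists."""
    za, zb = standardize(a), standardize(b)
    return sum(p * q for p, q in zip(za, zb)) / len(a)


# Made-up data: y depends strongly on x1, weakly on x2, not at all on x3.
random.seed(0)
N = 500
x1 = [random.gauss(0, 1) for _ in range(N)]
x2 = [random.gauss(0, 1) for _ in range(N)]
x3 = [random.gauss(0, 1) for _ in range(N)]
y = [2.0 * a + 0.3 * b + random.gauss(0, 0.5) for a, b in zip(x1, x2)]

# Step 1: standardize every variable.
cols = {"x1": standardize(x1), "x2": standardize(x2), "x3": standardize(x3)}
yz = standardize(y)

# Step 2: reorder the cases so that y is nondecreasing.
order = sorted(range(N), key=lambda i: yz[i])
yz = [yz[i] for i in order]
cols = {name: [c[i] for i in order] for name, c in cols.items()}

# Step 5 (partial): linear correlation of each input with y.
r = {name: corrcoef(c, yz) for name, c in cols.items()}
for name, value in r.items():
    print(f"corr({name}, y) = {value:+.3f}")
```

A constant input column would have zero variance and break the
standardization, so such columns should be dropped first.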
> Could I divide my database (with tens of millions of data) into several training sets
>and then train the network sequentially?
It might be better to extract a sufficiently large number of random
subsets for design (i.e., training and validation). Then test each
net on all of the subsets. The nets can then be ranked, and a "best"
net or a combination of good nets can be considered.
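The subset-and-rank idea can be sketched roughly as below. This is a
simplified illustration under loud assumptions: the data are made up, a
one-variable least-squares line stands in for a trained net, and each
subset is used whole for design instead of being split into training
and validation parts. The names fit_line and mse are hypothetical.

```python
import random


def fit_line(pairs):
    """Least-squares slope and intercept for (x, y) pairs (stand-in model)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    slope = sxy / sxx
    return slope, my - slope * mx


def mse(model, pairs):
    """Mean squared error of a (slope, intercept) model on (x, y) pairs."""
    slope, intercept = model
    return sum((y - (slope * x + intercept)) ** 2 for x, y in pairs) / len(pairs)


# Made-up "large" data set: y = 3x + 1 plus noise.
random.seed(1)
data = []
for _ in range(20_000):
    x = random.uniform(-1, 1)
    data.append((x, 3 * x + 1 + random.gauss(0, 0.2)))

# Draw several random subsets (sampled independently, so they may overlap).
n_subsets, subset_size = 5, 200
subsets = [random.sample(data, subset_size) for _ in range(n_subsets)]

# Design one model per subset, then score each model on every subset.
models = [fit_line(s) for s in subsets]
scores = [sum(mse(m, s) for s in subsets) / n_subsets for m in models]

# Rank models from best (lowest mean error) to worst.
ranking = sorted(range(n_subsets), key=lambda k: scores[k])
for k in ranking:
    print(f"model {k}: mean MSE over all subsets = {scores[k]:.4f}")
```

If the scores are all close, the subsets are probably large enough to
characterize the data; a wide spread suggests larger subsets are
needed.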
> I mean, is it possible to re-train the network once it has been already trained?
Yes, it is possible. However, the modified weights may not be
appropriate for the original data. Therefore, you may have to include
a calibration subset which adequately represents the significant
characteristics of the original data.
> Aren't the biases and weights initialized every time?
It depends on which algorithm you use. For NEWFF you can simply
replace the initialization weights with the former trained weights.
However, for NEWRB, you would have to modify the algorithm to use
specified initial weights.
> Thanks for your help!
Hope this helps.
Greg