Neural network input ranges & normalization

Hello.
I know that when a network is first created (for example with ' fitnet ') it has unconfigured inputs/outputs, and that the first call of ' train ' configures them according to the selected input and target data sets. Also, by default the data are normalized to [-1 1], right? My question is: once I have a trained network and I want results for ANOTHER data set, with different values and different ranges (they may be larger, smaller, or not overlap the original ones), is it correct to just call ' sim ' without any reconfiguring (calling ' configure ') or resetting of the input/output ranges for the new data set?
  1. Train with DataSet1 ;
  2. Get NET, trained on DataSet1 ;
  3. Simulate NET on DataSet2 (DataSet2 has different input ranges, because it consists of different samples, not from DataSet1);
  4. Get a correct result for DataSet2 ?
I expect that the result on DataSet2 should come from the network's generalization ability. I mean: if I use a NN to interpolate some function on domain 1 (DataSet1), can I then find its values on other domains? Since the network has learned to fit the data like the real function that generated the original data (my DataSet consists not of empirical data but of values generated by some mathematical function), if the error on the training domain is low, can I even extrapolate the function I am examining with this NN?
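In code, the workflow I have in mind looks roughly like this (a sketch only; the data and network size here are made-up placeholders):

```matlab
x1 = rand(2, 100);          % DataSet1 inputs, range roughly [0 1]
t1 = sum(x1, 1);            % targets generated by some known function
net = fitnet(10);           % 10 hidden neurons, chosen arbitrarily
net = train(net, x1, t1);   % first call of train configures the net
x2 = 2 + rand(2, 50);       % DataSet2: a different, non-overlapping range
y2 = sim(net, x2);          % is this call alone enough, without configure?
```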
This may be confusing, sorry for my bad English, so ask me questions and I will answer.
Please, help me.

 Accepted Answer

A basic assumption of most statistical regression and classification models is that training, validation and testing data can all be assumed to be random draws from the same probability distribution. For time-series the data is assumed to have stationary statistics.
Accordingly, validation and test data are normalized using summary statistics of the training data.
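In MATLAB terms this is what mapminmax does: the settings are computed once from the training inputs and then reapplied, unchanged, to any later data. A minimal illustration with made-up values:

```matlab
xTrain = [0 1 2 3 4];         % made-up training inputs, range [0 4]
[~, ps] = mapminmax(xTrain);  % ps stores the training min and max
mapminmax('apply', 2, ps)     % inside the training range: 2 -> 0
mapminmax('apply', 6, ps)     % outside the training range: 6 -> 2, beyond [-1 1]
```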
Only in unusually special cases (I can't think of any offhand) could you make that assumption about self-normalized data.
Hope this helps.
Thank you for formally accepting my answer
Greg

3 Comments

So, will this be correct?
For example, I examine a dependence that can possibly be described by F(vector) = U , focusing on Domain1 (as input data), which is part of the domain of the function F , because I have the U values (e.g. obtained empirically) that F takes in Domain1 . Then
TrainData = [some train samples];     % vectors, one column per sample
ValData = [some validation samples];  % vectors, one column per sample
% no test set
TargetData = [some target values];    % one scalar per sample
DataSet1 = [TrainData ValData];       % represents Domain1 of the analyzed F
Net = fitnet([some params]);          % 1-2 hidden layers
% set Net.divideFcn = 'divideind' with the train/val indices, and the training stop conditions
Net = train(Net, DataSet1, TargetData);  % training on the available data
Net_outputs = sim(Net, DataSet1);
Net_errors = TargetData - Net_outputs;
% analyze the errors; if they are acceptable, continue
% now we try to get the new data in which we are interested
DataSet2 = [some inputs];  % vectors; represents Domain2, which does not belong to Domain1
Values = sim(Net, DataSet2);
So, these Values will be our hypothetical values that F takes in Domain2 ? And no scaling/normalization/input min-max manipulations are needed?
A well designed regression net should be a good interpolator.
However, there is no a priori reason to expect it to be a good extrapolator. Both the design data and the nondesign data to which the net will be applied should have similar summary statistics, so that given an arbitrary sample you cannot tell which set it came from.
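One simple check, as a sketch (variable names follow the code above): flag the DataSet2 samples that fall outside the per-feature range seen in DataSet1, since predictions for those samples are extrapolations.

```matlab
lo = min(DataSet1, [], 2);    % per-feature minimum of the design data
hi = max(DataSet1, [], 2);    % per-feature maximum of the design data
outside = any(DataSet2 < lo | DataSet2 > hi, 1);  % columns needing extrapolation
fprintf('%d of %d samples lie outside the training range\n', ...
    sum(outside), size(DataSet2, 2));
```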
Thank you for your help. Can you advise something for obtaining a satisfactory NN extrapolator? Literature, or maybe specific architectures, algorithms, or transfer functions?
