stochastic gradient descent neural network updating net in matlab

Is it possible to train a network (net) with stochastic gradient descent in MATLAB? If so, how?
I observe that retraining completely ignores the information from previously trained data and rebuilds everything from scratch. Incremental training would be helpful for large-scale problems; training on the complete data set takes a very long time.
For example, I would like to train iteratively on 100 parts of the data.
TF1 = 'tansig'; TF2 = 'tansig'; TF3 = 'tansig'; % transfer functions of the layers; TF3 is for the output layer
net = newff(trainSamples.P, trainSamples.T, [NodeNum1, NodeNum2, NodeOutput], {TF1 TF2 TF3}, 'traingdx'); % network created
net.trainFcn = 'traingdm';
net.trainParam.epochs = 1000;
net.trainParam.min_grad = 0;
net.trainParam.max_fail = 2000; % large value to approximate infinity
while(1) % iteratively takes 10 data points at a time
    p % => gets updated with the next 10 data points
    t % => gets updated with the next 10 data points
    [net, tr] = train(net, p, t);
end

Accepted Answer

Greg Heath on 17 Jan 2014
2. Use the largest nndataset in the NNTBX for an example
help nndataset
doc nndataset
3. It is worthwhile to look at static correlation coefficients (help/doc corrcoef) and plots to help find
a. inputs that are so weakly correlated with all of the targets that those inputs can be omitted.
b. inputs that are so highly correlated with other inputs that they can be omitted.
4. It may be useful to look at the input dimensionality reduction obtained with linear models (help regress)
5. Try to use as many defaults as possible when starting a NN design. Defaults that should be overridden should become evident during design trials.
6. What are the dimensions of your input and target matrices?
7. How many hidden nodes?
8. It is not necessary to use more than one hidden layer.
9. I used the largest nndataset
[ x,t] = building_dataset;
with size(x) = [14 4208], size(t) = [3 4208] and H = 70 hidden nodes. This yields about 10 times more training equations, 3*4208 = 12,624, than there are unknown weights, (14+1)*70 + (70+1)*3 = 1,263.
Since the net was not close to being overfit, I used only a training set and obtained an adjusted R-squared of 0.99 in 72 seconds with a straightforward FITNET design.
However, a design by looping over 10 randomly chosen subsets took 109 seconds. The syntax, after random shuffling using randperm(4208), was
M = 420;   % floor(4208/10)
imax = 10;
for i = 1:imax
    k = 1 + M*(i-1) : M*i;
    [net, tr, y(:,k)] = train(net, x(:,k), t(:,k));
end
This probably doesn't show a savings because 14*4208 is not too large for the default trainlm.
I think all you have to do is use a larger data set (large enough to choke trainlm) and a more appropriate training function, e.g., trainscg or trainrp.
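For example, a chunked design with trainscg might look like this sketch (same building_dataset sizes as above; the chunk count of 10 is arbitrary):

```matlab
% Sketch: chunked training with a gradient method that scales better than trainlm
[x, t] = building_dataset;               % 14x4208 inputs, 3x4208 targets
net = fitnet(70, 'trainscg');            % scaled conjugate gradient
net.divideFcn = 'dividetrain';           % all presented samples used for training
M = floor(size(x, 2) / 10);              % chunk length
for i = 1:10
    k = 1 + M*(i-1) : M*i;
    net = train(net, x(:, k), t(:, k));  % weights carry over between chunks
end
```

The weights are not reinitialized between calls to train, so each chunk refines the result of the previous one.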
Hope this helps.
Thank you for formally accepting my answer
Greg
  3 Comments
Avatar@ Wonderland on 19 Jan 2014
Edited: Avatar@ Wonderland on 19 Jan 2014
> Could I use adapt() instead of train()?
> Do you have a similar example of how I can achieve this with a recurrent neural network?
Dimensions: P(100, 100,000), T(1, 100,000)
Input_layer_size = 100; NodeNum1 = 70; % how can I find an optimal node number?
net = newelm(threshold, [NodeNum1, prediction_ahead], {'tansig', 'purelin'});
% Training the network
net.trainParam.epochs = 5000;
net = init(net); % initialization of the weights by function init
net = train(net, P, T);
while(1)
    net = adapt(net, P, T); % could I use train?
end
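Concretely, the incremental loop I have in mind would be something like this sketch (getNextBatch is a hypothetical helper standing in for however the next batch of samples arrives):

```matlab
% Sketch: incremental updates with adapt; weights persist across calls
% getNextBatch is a placeholder returning 100 x 10 inputs and 1 x 10 targets
for batch = 1:nBatches
    [Pb, Tb] = getNextBatch(batch);
    [net, y, e] = adapt(net, Pb, Tb);  % one incremental pass over this batch
end
```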
Thanks.
Greg Heath on 19 Jan 2014
1. Radically reduce the input dimensionality
2. It may not be necessary to use most of the data for training.
3. Consider combining multiple nets that are designed on different parts of the data
4. Use narxnet
5. Do not use 'dividerand'; preserve the data order
6. Determine the significant input and feedback correlation lags
7. Use trainscg or trainrp for large training sets
8. Use 1 hidden layer
9. Practice on the two longest MATLAB timeseries example data sets
help nndatasets
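A minimal narxnet sketch along the lines of points 4, 5, and 7 (the delay values here are placeholders; they should come from the correlation analysis in point 6):

```matlab
% Sketch: open-loop narxnet design; ID and FD are assumed, not derived
ID = 1:2;  FD = 1:2;  H = 10;    % placeholder delays and hidden nodes
net = narxnet(ID, FD, H);
net.divideFcn = 'divideblock';   % preserves data order (cf. point 5)
net.trainFcn  = 'trainscg';      % better suited to large N (point 7)
[X, T] = simpleseries_dataset;   % practice set; substitute your own series
[Xs, Xi, Ai, Ts] = preparets(net, X, {}, T);
net = train(net, Xs, Ts, Xi, Ai);
```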


More Answers (2)

Greg Heath on 19 Jan 2014
Quoting the answer by Alper Alimoglu:
> My data set is formed by 1,000,000 data points. I wasn't able to, for example, train 100 data points iteratively, training only 100 data points each step and combining them with the previously trained portion of the data.
>I am using neural network to do prediction.
1. You should have said, first, that you wanted a net for prediction. That changes the approach from a regression/curvefitting design
help/doc fitnet % calls feedforwardnet
help/doc newfit(obsolete) % calls newff (obsolete)
to a time-series design
help/doc timedelaynet (input delays, no output feedback)
help/doc narxnet (input delays and delayed output feedback)
> My data set is formed by 1,000,000 data points.
2. You should have also given the dimensions of the input and output. I will assume SISO.
3. N = 10^6 ==> You should first practice on the longest MATLAB timeseries nndatasets
help/doc nndatasets
% exchanger_dataset - Heat exchanger dataset.
[ X, T ] = exchanger_dataset;
whos
% Name Size Bytes Class Attributes
% T 1x4000 272000 cell
% X 1x4000 272000 cell
% maglev_dataset - Magnetic levitation dataset.
[ X, T ] = maglev_dataset;
whos
% Name Size Bytes Class Attributes
% T 1x4001 272068 cell
% X 1x4001 272068 cell
4. You can only predict as far ahead as the data will let you. This is determined by the significant lags of the input/output crosscorrelation function and/or the significant lags of the output autocorrelation function
5. Since N is large, use the fft to calculate the correlation functions instead of the BUGGY NNCORR or the correlation functions from other toolboxes.
6. It is worthwhile to divide the data into many subsections to determine correlation statistics that are consistent across all of the data. Plots should help.
7. See some of my recent posts on how to determine the significant thresholds and lags.
greg nncorr
8. The number of hidden nodes is chosen by trial and error if the default H = 10 is unsatisfactory.
9. Since N is huge and the default net.divideFcn = 'dividerand' destroys correlations, use 'dividetrain' in the first set of trials to determine good values for H and the delays (ID,FD).
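A rough FFT-based crosscorrelation sketch for point 5 (x and t here are assumed to be numeric row vectors, e.g., cell2mat of the series):

```matlab
% Sketch: crosscorrelation via FFT; normalization is approximate
x = x - mean(x);  t = t - mean(t);   % remove the means first
N = numel(x);
nfft = 2^nextpow2(2*N - 1);          % zero-pad to avoid circular wraparound
c = ifft(fft(x, nfft) .* conj(fft(t, nfft)));
c = real(c) / (N * std(x) * std(t)); % rough scaling toward [-1, 1]
```

The significant lags are then the indices of c whose magnitudes exceed the significance threshold.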
> I wasn't able to come up with a solution to train an individual 100-sample (small) portion of data and combine it with the already-trained portion. I also implemented a recurrent neural network for this approach, but again I face the same problem.
10. Just because you have N ~ 10^6, there is no rule that says you have to use all of it to train one net at the given data rate. Consider combining smaller nets designed over different intervals of time AND multiple parallel nets designed over the same time span but using interleaved samples. One hidden layer per net is sufficient.
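A sketch of the parallel-interleaved idea in point 10 (x, t, and the number of nets are assumptions):

```matlab
% Sketch: a committee of nets trained on interleaved samples, outputs averaged
nNets = 4;                         % assumed; tune to your data and hardware
nets  = cell(1, nNets);
for j = 1:nNets
    k = j : nNets : size(x, 2);    % every nNets-th sample, offset j
    nets{j} = fitnet(10, 'trainscg');
    nets{j}.divideFcn = 'dividetrain';
    nets{j} = train(nets{j}, x(:, k), t(:, k));
end
y = 0;
for j = 1:nNets
    y = y + nets{j}(x);            % sum the committee outputs
end
y = y / nNets;                     % simple average
```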

Muthu Kumar on 14 Jul 2021
Edited: Muthu Kumar on 14 Jul 2021
How can I use the same algorithm for DC-DC converter control operation?
