stochastic gradient descent neural network updating net in matlab

Is it possible to train a network (net) with stochastic gradient descent in MATLAB? If so, how?
I observe that retraining completely ignores the information from previously trained data and rebuilds everything from scratch. Incremental training would be helpful for large-scale problems; training on the complete data set takes a very long time.
For example, I would like to train iteratively on 100 parts of the data.
TF1 = 'tansig'; TF2 = 'tansig'; TF3 = 'tansig'; % transfer functions of the layers; TF3 is for the output layer
net = newff(trainSamples.P, trainSamples.T, [NodeNum1, NodeNum2, NodeOutput], {TF1 TF2 TF3}, 'traingdx'); % network created
net.trainFcn = 'traingdm';
net.trainParam.epochs = 1000;
net.trainParam.min_grad = 0;
net.trainParam.max_fail = 2000; % large value to approximate infinity
while(1) % iteratively takes 10 data points at a time
    p % => gets updated with the next 10 data points
    t % => gets updated with the next 10 data points
    [net, tr] = train(net, p, t);
end

Accepted Answer

Greg Heath on 17 Jan 2014
2. Use the largest nndataset in the NNTBX for an example
help nndataset
doc nndataset
3. It is worthwhile to look at static correlation coefficients (help/doc corrcoef) and plots to help find
a. inputs that are so weakly correlated with all of the targets that those inputs can be omitted.
b. inputs that are so highly correlated with other inputs that they can be omitted.
4. It may be useful to look at the input dimensionality reduction obtained with linear models (help regress)
5. Try to use as many defaults as possible when starting a NN design. Defaults that should be overridden should become evident during design trials.
6. What are the dimensions of your input and target matrices?
7. How many hidden nodes?
8. It is not necessary to use more than one hidden layer.
9. I used the largest nndataset
[ x,t] = building_dataset;
with size(x) = [14 4208], size(t) = [3 4208] and H = 70 hidden nodes. This yields about 10 times more training equations, 3*4208 = 12,624, than there are unknown weights, (14+1)*70 + (70+1)*3 = 1,263.
Since the net was not close to being overfit, I used only a training set and obtained an adjusted R-squared of 0.99 in 72 seconds with a straightforward FITNET design.
However, a design by looping over 10 randomly chosen subsets took 109 seconds. The syntax, after random shuffling using randperm(4208), was
M = 420;   % floor(4208/10)
imax = 10;
for i = 1:imax
    k = 1 + M*(i-1) : M*i;
    [net, tr, y(:,k)] = train(net, x(:,k), t(:,k));
end
This probably doesn't show a savings because 14*4208 is not too large for the default trainlm.
I think all you have to do is use a larger data set (large enough to choke trainlm) and a more appropriate training function, e.g., trainscg or trainrp.
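For example, a chunked design with trainscg might look like this sketch (same building_dataset sizes as above; the chunk count of 10 is arbitrary):

```matlab
% Sketch: chunked training with a gradient method that scales better than trainlm
[x, t] = building_dataset;               % 14x4208 inputs, 3x4208 targets
net = fitnet(70, 'trainscg');            % scaled conjugate gradient
net.divideFcn = 'dividetrain';           % all presented samples used for training
M = floor(size(x, 2) / 10);              % chunk length
for i = 1:10
    k = 1 + M*(i-1) : M*i;
    net = train(net, x(:, k), t(:, k));  % weights carry over between chunks
end
```

The weights are not reinitialized between calls to train, so each chunk refines the result of the previous one.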
Hope this helps.
Thank you for formally accepting my answer
Greg
  3 Comments
Avatar@ Wonderland on 19 Jan 2014
Edited: Avatar@ Wonderland on 19 Jan 2014
> Could I use adapt() instead of train()?
> Do you have a similar example of how I can achieve this with a recurrent neural network?
Dimensions: P(100, 100,000), T(1, 100,000)
Input_layer_size = 100; NodeNum1 = 70; % how can I find an optimal node number?
net = newelm(threshold, [NodeNum1, prediction_ahead], {'tansig', 'purelin'});
% Training the network
net.trainParam.epochs = 5000;
net = init(net); % initialization of the weights by function init
net = train(net, P, T);
while(1)
    net = adapt(net, P, T); % could I use train?
end
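Concretely, the incremental loop I have in mind would be something like this sketch (getNextBatch is a hypothetical helper standing in for however the next batch of samples arrives):

```matlab
% Sketch: incremental updates with adapt; weights persist across calls
% getNextBatch is a placeholder returning 100 x 10 inputs and 1 x 10 targets
for batch = 1:nBatches
    [Pb, Tb] = getNextBatch(batch);
    [net, y, e] = adapt(net, Pb, Tb);  % one incremental pass over this batch
end
```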
Thanks.
Greg Heath on 19 Jan 2014
1. Radically reduce the input dimensionality
2. It may not be necessary to use most of the data for training.
3. Consider combining multiple nets that are designed on different parts of the data
4. Use narxnet
5. Do not use 'dividerand'; preserve the data order
6. Determine the significant input and feedback correlation lags
7. Use trainscg or trainrp for large training sets
8. Use 1 hidden layer
9. Practice on the two longest MATLAB timeseries example data sets
help nndatasets
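A minimal narxnet sketch along the lines of points 4, 5, and 7 (the delay values here are placeholders; they should come from the correlation analysis in point 6):

```matlab
% Sketch: open-loop narxnet design; ID and FD are assumed, not derived
ID = 1:2;  FD = 1:2;  H = 10;    % placeholder delays and hidden nodes
net = narxnet(ID, FD, H);
net.divideFcn = 'divideblock';   % preserves data order (cf. point 5)
net.trainFcn  = 'trainscg';      % better suited to large N (point 7)
[X, T] = simpleseries_dataset;   % practice set; substitute your own series
[Xs, Xi, Ai, Ts] = preparets(net, X, {}, T);
net = train(net, Xs, Ts, Xi, Ai);
```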


More Answers (2)

Greg Heath on 19 Jan 2014
Quoting the answer by Alper Alimoglu:
> My data set is formed by 1,000,000 data points. I wasn't able to, for example, train 100 data points iteratively, training only 100 data points each step and combining them with the previously trained portion of the data.
>I am using neural network to do prediction.
1. You should have said, first, that you wanted a net for prediction. That changes the approach from a regression/curvefitting design
help/doc fitnet % calls feedforwardnet
help/doc newfit(obsolete) % calls newff (obsolete)
to a time-series design
help/doc timedelaynet (input delays, no output feedback)
help/doc narxnet (input delays and delayed output feedback)
> My data set is formed by 1,000,000 data points.
2. You should have also given the dimensions of the input and output. I will assume SISO.
3. N = 10^6 ==> You should first practice on the longest MATLAB timeseries nndatasets
help/doc nndatasets
% exchanger_dataset - Heat exchanger dataset.
[ X, T ] = exchanger_dataset;
whos
% Name Size Bytes Class Attributes
% T 1x4000 272000 cell
% X 1x4000 272000 cell
% maglev_dataset - Magnetic levitation dataset.
[ X, T ] = maglev_dataset;
whos
% Name Size Bytes Class Attributes
% T 1x4001 272068 cell
% X 1x4001 272068 cell
4. You can only predict as far ahead as the data will let you. This is determined by the significant lags of the input/output crosscorrelation function and/or the significant lags of the output autocorrelation function
5. Since N is large, use the fft to calculate the correlation functions instead of the BUGGY NNCORR or the correlation functions from other toolboxes.
6. It is worthwhile to divide the data into many subsections to determine correlation statistics that are consistent across all of the data. Plots should help.
7. See some of my recent posts on how to determine the significant thresholds and lags.
greg nncorr
8. The number of hidden nodes is chosen by trial and error if the default H = 10 is unsatisfactory.
9. Since N is huge and the default net.divideFcn = 'dividerand' destroys correlations, use 'dividetrain' in the first set of trials to determine good values for H and the delays (ID,FD).
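A rough FFT-based crosscorrelation sketch for point 5 (x and t here are assumed to be numeric row vectors, e.g., cell2mat of the series):

```matlab
% Sketch: crosscorrelation via FFT; normalization is approximate
x = x - mean(x);  t = t - mean(t);   % remove the means first
N = numel(x);
nfft = 2^nextpow2(2*N - 1);          % zero-pad to avoid circular wraparound
c = ifft(fft(x, nfft) .* conj(fft(t, nfft)));
c = real(c) / (N * std(x) * std(t)); % rough scaling toward [-1, 1]
```

The significant lags are then the indices of c whose magnitudes exceed the significance threshold.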
> I wasn't able to come up with a solution to train an individual 100-sample (small) portion of data and combine it with the already-trained portion. I also implemented a recurrent neural network for this approach, but again I face the same problem.
10. Just because you have N ~ 10^6, there is no rule that says you have to use all of it to train one net at the given data rate. Consider combining smaller nets designed over different intervals of time AND multiple parallel nets designed over the same time span but using interleaved samples. One hidden layer per net is sufficient.
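A sketch of the parallel-interleaved idea in point 10 (x, t, and the number of nets are assumptions):

```matlab
% Sketch: a committee of nets trained on interleaved samples, outputs averaged
nNets = 4;                         % assumed; tune to your data and hardware
nets  = cell(1, nNets);
for j = 1:nNets
    k = j : nNets : size(x, 2);    % every nNets-th sample, offset j
    nets{j} = fitnet(10, 'trainscg');
    nets{j}.divideFcn = 'dividetrain';
    nets{j} = train(nets{j}, x(:, k), t(:, k));
end
y = 0;
for j = 1:nNets
    y = y + nets{j}(x);            % sum the committee outputs
end
y = y / nNets;                     % simple average
```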

Muthu Kumar on 14 Jul 2021
Edited: Muthu Kumar on 14 Jul 2021
How can I use the same algorithm for DC-DC converter control operation?
