Switching an existing batch-trained NAR network to continuous incremental training with adapt?

I have a NAR network that has been batch-trained on a large set of historical data with Levenberg-Marquardt training. I have tweaked many of the settings, so it performs reasonably well when tested on old data, and now I would like to deploy the net on a stream of continuously arriving new data and have it adjust itself to the new data with incremental training.
From what I understand, the adapt function in MATLAB is what I should be using for incremental training, but there appear to be very few examples of how to use it, so I have a few questions. First of all: is it even possible to take an existing batch-trained net, with all of its weights, feedback delays, and overall architecture, and build on it with incremental training on new data, or does the net have to be built from scratch with incremental training? And if it is possible, how? The adapt documentation only shows a tiny example with a NARX network; are there any other examples or tutorials on the subject?
The whole area of incremental training appears to be quite hidden in the Neural Network Toolbox; it is not shown as a training option in nnstart, for instance. From what I understand, incremental training is considerably slower than batch training, which is why I would prefer the best of both worlds: batch training on historical data and incremental training on newly arriving data. One concern I have is that there will always be a gap in the time series when switching from historical to current data. If, for example, there is a one-week gap between the historical data and the current data the net is exposed to, will the net perceive that as a violent change in the time series and potentially "learn" false behavior that could ruin it?
What I ultimately would like to achieve is a live prediction scenario in which my NAR net predicts a few time steps ahead of a time series to the best of its ability and, when the "correct" new data arrives, adapts to it and takes it into account in the next prediction. To achieve that, do I need to constantly close the net, retrain it, make the prediction, create an open net again, account for the new data, and repeat forever? That approach seems like a lot of hassle; is there a better way to do this?
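For concreteness, here is a rough sketch of the loop I have in mind; netc, buf, and getNewValue are hypothetical placeholders (a closed-loop copy of the trained net, a cell array holding the series so far, and a source of arriving data), not working code:
% Sketch of the predict/adapt cycle described above (hypothetical names).
while true
    [Xc,Xci,Aci] = preparets(netc,{},{},buf);   % align delay states with the history
    [~,Xcf,Acf]  = netc(Xc,Xci,Aci);            % run the closed net up to the present
    Ypred = netc(cell(1,5),Xcf,Acf);            % predict 5 steps ahead with no targets
    buf(end+1) = {getNewValue()};               % the "correct" new value arrives
    neto = openloop(netc);                      % reopen the loop to adapt on the truth
    [Xs,Xsi,Asi,Ts] = preparets(neto,{},{},buf);
    neto = adapt(neto,Xs,Ts,Xsi,Asi);           % incremental update
    netc = closeloop(neto);                     % close again for the next prediction
end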
Thanks for any advice.

Accepted Answer

Greg Heath on 27 May 2015
It doesn't make any difference whether the additional training is batch or adaptive:
If the new data does not contain the dominant characteristics of the original data, the net will forget the characteristics of the original data.
This is the aptly named phenomenon called "FORGETTING".
Therefore, if additional training is planned, you must include data that exhibits the dominant characteristics of the original data.
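As a minimal sketch of this idea (assuming netBatch is the batch-trained net, Told the historical target series, and Tnew the newly arrived series, all hypothetical names):
% Replay a window of old data alongside the new data so the net keeps
% seeing the dominant characteristics of the original series.
Tmix = [ Told(end-499:end), Tnew ];           % rehearsal window + new data (cell arrays)
[Xs,Xsi,Asi,Ts] = preparets(netBatch,{},{},Tmix);
netBatch = adapt(netBatch,Xs,Ts,Xsi,Asi);     % weights updated, not re-initialized
The 500-point window here is arbitrary; the point is only that the old characteristics stay represented in whatever the net adapts on.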
Hope this helps.
Thank you for formally accepting my answer
Greg
  1 Comment
Peta on 27 May 2015
OK, since both the historical and current time series I want to train on represent the same phenomenon at different points in time, they should have roughly the same characteristics.
But do you also mean that if a time series changes too much over time, it becomes unnecessary to train on historical data going back too far, since the net will forget about it anyway? I always thought that exposing the net to the bigger picture and as much of the historical data as possible would be a good idea, but maybe creating an adapt net and training it on only the last few thousand newly arriving data points would give the same results?
And if my batch-trained NAR net is stored, for example, in a variable called net, would the additional training simply be a matter of writing something like the call shown in the documentation:
[net,Y,E,Pf,Af,tr] = adapt(net,P,T,Pi,Ai)
This command, I believe, is for a NARX situation, but maybe it's simply a matter of removing T, the network targets, so it only operates on inputs? Or would a call like that completely reset the net and make it forget any old training? Is there a specific command that should be used to indicate that the training is additional to the old training?
Moreover, my current net is set up with the divideblock data division function. In an incremental training situation I imagine there is no data division; should settings like that, and potentially others, be changed before switching to incremental training?
Thanks.


More Answers (1)

Greg Heath on 28 May 2015
If the new data is just a time extension of the old data, I recommend either:
1. Use batch training on all of the data, with the weights initialized to those found in the initial batch training.
2. Use adaptive training on all of the data, with the weights initialized to those found in the initial batch training.
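A minimal sketch of option 1, assuming net is the already-trained NAR net and Tall (a hypothetical name) is the old target series extended with the new values:
% No configure or init call here: reusing net as-is means train starts
% from the previously learned weights instead of a random initialization.
[Xs,Xsi,Asi,Ts] = preparets(net,{},{},Tall);  % prepare old + new data together
[net,tr] = train(net,Xs,Ts,Xsi,Asi);          % warm-start batch training
For option 2, the same two lines apply with adapt in place of train.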
Hope this helps.
Thank you for formally accepting my answer
Greg
  2 Comments
Peta
Peta on 30 May 2015
Is it always necessary to perform the adaptive training on all of the original data? That could take a really long time once the time series gets big; I was hoping it would be possible to just extend the batch-trained net with new values using adapt…
Anyway, I have tried to implement your second recommendation and recreate an adapt net with the settings generated from the batch training, but the results have so far been truly terrible. This is what I have:
%% Before I start I have:
% 5001 target time series values in T
% Significant lags in Siglags
% Best feedback delay setting from batch training in Bdelays
% Best hidden node setting from batch training in Bhidden
% Index of the best random weight trial from batch training in Bweights
% States of the random number generator that yielded the best initial weights during batch training, stored in s
N = length(T);                                    % number of time steps
d = max(Siglags(1:Bdelays));                      % largest feedback delay sets the output offset
net = narnet( Siglags(1:Bdelays)', Bhidden );     % narnet with the best delay and hidden node settings from batch training
[ Xs, Xsi, Asi, Ts ] = preparets( net, {}, {}, T ); % prepare the series for the delays
rng('default');                                   % reset the random number generator
rng(s(Bhidden,Bdelays,Bweights));                 % restore the state that generated the best weights during batch training
net = configure(net,T);                           % initialize weights
[net,Ys,Es,Xf,Af,tr] = adapt(net,Xs,Ts,Xsi,Asi);  % retrain with adapt instead of train
%% A closed net is now created to predict ahead. This code is completely
%% based on Greg's tutorial "NARNET TUTORIAL ON MULTISTEP AHEAD PREDICTIONS"
ts = cell2mat( Ts );
ys = cell2mat( Ys );
plt = 0;
plt = plt+1; figure(plt), hold on
plot( d+1:N, ts, 'LineWidth', 2 )
plot( d+1:N, ys, 'ro', 'LineWidth', 2 )
legend( 'TARGET', 'OUTPUT' )
title( 'OPENLOOP NARNET RESULTS' )
[ netc, Xci, Aci ] = closeloop(net,Xsi,Asi);
[ Xc, Xci, Aci, Tc ] = preparets(netc,{},{},T);
[ Yc, Xcf, Acf ] = netc(Xc,Xci,Aci);
Ec = gsubtract(Tc,Yc);
yc = cell2mat(Yc);
tc = ts;
NMSEc = mse(Ec)/var(tc,1)                         % 10.35! Horrible!?
Xc2 = cell(1,N);                                  % N empty inputs for targetless prediction
[ Yc2, Xcf2, Acf2 ] = netc( Xc2, Xcf, Acf );
yc2 = cell2mat(Yc2);
plt = plt+1; figure(plt), hold on
plot( d+1:N, tc, 'LineWidth', 2 )
plot( d+1:N, yc, 'ro', 'LineWidth', 2 )
plot( N+1:2*N, yc2, 'o', 'LineWidth', 2 )
plot( N+1:2*N, yc2, 'r', 'LineWidth', 2 )
axis( [ 0 2*N+2 0 1.3 ] )
legend( 'TARGET', 'OUTPUT', 'TARGETLESS PREDICTION' )
title( 'CLOSED LOOP NARNET RESULTS' )
ylim([min(tc) max(tc)])
The results I get from this are extraordinarily bad; the normalized mean squared error is over 10, which I'm guessing is quite a bad sign. Am I doing something wrong in the code?
And in an online prediction scenario, if I want to present a single new value to the net, do I just put that single value in "Ts" and run the [net,Ys,Es,Xf,Af,tr] = adapt(net,Xs,Ts,Xsi,Asi); command again, or how does that work? Surely I don't have to retrain on the entire dataset every time a new value arrives?
Thanks.
Greg Heath on 30 Aug 2015
After you close the loop you must test netc. If performance is bad, then train netc initialized with the weights obtained from open-loop training.
I don't know about trying to adapt to just one point. Don't forget you have lags to deal with. You don't want the computer to grind away changing weights to fit one point while neglecting the others.
I don't know how accurate this is, but my gut might churn significantly if I tried to adapt to new data that wasn't at least twice as long as the longest delay. Sounds like something to investigate.
Hope this helps. Greg
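A minimal sketch of the first point, assuming net is the trained open-loop narnet and T the target series; closeloop copies the open-loop weights, so training netc refines them rather than starting from random values:
netc = closeloop(net);                        % closed-loop net inherits the open-loop weights
[Xc,Xci,Aci,Tc] = preparets(netc,{},{},T);
netc = train(netc,Xc,Tc,Xci,Aci);             % retrain in the closed-loop configuration
Yc = netc(Xc,Xci,Aci);
NMSEc = mse(gsubtract(Tc,Yc))/var(cell2mat(Tc),1) % recheck the normalized error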
