Why do my validation RMSE and loss increase after some epochs while my training RMSE and loss decrease?

Hello everyone,
I am trying to predict future traffic flow from previously collected data, so I am using an LSTM.
However, my validation loss and RMSE increase while my training loss and RMSE decrease. Because I am new to LSTMs, I don't know which parameters I should check to improve the model and its predictions.
The training-progress plot is attached.
I have also tried different lag times for my predictions; in the code below I use a 4-step lag.
% Standardize the training data and offset input/target by a 4-step lag
XTrain_ZaMir = (XTrain_ZaMir - mu_ZaMir)/sig_ZaMir;
YTrain_ZaMir = (YTrain_ZaMir - mu_ZaMir)/sig_ZaMir;
XTrain_ZaMir = XTrain_ZaMir(:,1:end-4);
YTrain_ZaMir = YTrain_ZaMir(:,5:end);
% Build the held-out sequence and split it 70/30 into validation and test
Test_ZaMir = [flowTe_ZaMir flowTeOther_ZaMir]';
nt = floor(0.7*length(Test_ZaMir));
YTest_ZaMir = Test_ZaMir(1,1:end);
XTest_ZaMir = Test_ZaMir(1,1:end);   % one input feature
% XTest_ZaMir = Test_ZaMir(:,1:end); % more than one input feature
XTest_ZaMir = (XTest_ZaMir - mu_ZaMir)/sig_ZaMir;
YTest_ZaMir = (YTest_ZaMir - mu_ZaMir)/sig_ZaMir;
% Apply the same 4-step lag to the validation and test splits
XVal_ZaMir = XTest_ZaMir(:,1:nt-4);
YVal_ZaMir = YTest_ZaMir(:,5:nt);
XTest_ZaMir = XTest_ZaMir(:,nt+4:end-1);
YTest_ZaMir = YTest_ZaMir(:,nt+5:end);
%% Layers and Options
numResponses = 1;
featureDimension = 1;
numHiddenUnits = 200;
layers = [ ...
    sequenceInputLayer(featureDimension)
    lstmLayer(numHiddenUnits)
    % dropoutLayer(0.002)
    fullyConnectedLayer(numResponses)
    regressionLayer];
maxepochs = 250;
minibatchsize = 128;
options = trainingOptions('adam', ...
    'MaxEpochs',maxepochs, ...
    'GradientThreshold',1, ...
    'InitialLearnRate',0.005, ...
    'ValidationData',{XVal_ZaMir,YVal_ZaMir}, ...
    'ValidationFrequency',20, ...
    'Shuffle','every-epoch', ...
    'MiniBatchSize',minibatchsize, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropPeriod',150, ...
    'LearnRateDropFactor',0.005, ...
    'Verbose',1, ...
    'Plots','training-progress');
%% Train the Network
[net,info] = trainNetwork(XTrain_ZaMir,YTrain_ZaMir,layers,options);
% Prime the network state with the test inputs, then forecast one step
% at a time using the observed input from the previous step
[net,YPred_ZaMir] = predictAndUpdateState(net,XTest_ZaMir);
numTimeStepsTest = floor(0.5*length(XTest_ZaMir));
for i = 2:numTimeStepsTest
    [net,YPred_ZaMir(:,i)] = predictAndUpdateState(net,XTest_ZaMir(:,i-1),'ExecutionEnvironment','cpu');
    % net = resetState(net);
end
% Undo the standardization so predictions are on the original scale
YTest_ZaMir = sig_ZaMir*YTest_ZaMir + mu_ZaMir;
YPred_ZaMir = sig_ZaMir*YPred_ZaMir + mu_ZaMir;
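For reference, the test RMSE can be computed on the de-standardized values. This is a minimal evaluation sketch; it assumes the prediction columns line up with the first columns of the test targets:
% Minimal evaluation sketch (assumption: YPred_ZaMir aligns with the
% first n columns of YTest_ZaMir)
n = size(YPred_ZaMir,2);
rmseTest = sqrt(mean((YPred_ZaMir - YTest_ZaMir(:,1:n)).^2,'all'));
fprintf('Test RMSE: %.4f\n',rmseTest);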

Answers (1)

Aneela on 10 Sep 2024
Edited: Aneela on 10 Sep 2024
Hi Arash,
You are experiencing overfitting: the LSTM fits the training data too closely, so the training loss keeps decreasing while the validation loss increases. A few things to try:
  • Add a dropoutLayer after the lstmLayer to regularize the network, for example:
dropoutLayer(0.2)
  • The initial learning rate (0.005) is high and might overshoot the optimal weights. Reduce it to 0.001 or even lower and see if convergence improves (see the combined sketch after this list).
  • Add L2 regularization to the fullyConnectedLayer weights; the penalty discourages the model from fitting overly complex patterns:
fullyConnectedLayer(numResponses,'WeightL2Factor',0.001)
  • Implement early stopping by monitoring the validation loss, for example with the 'ValidationPatience' training option; this stops training when the validation loss stops improving (see the sketch below).
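Putting these suggestions together, a revised layer stack and training options could look like the sketch below. The dropout probability, learning rate, L2 factor, and patience value are illustrative starting points rather than tuned values, and the piecewise learn-rate schedule from the question is omitted here for simplicity:
% Sketch of a regularized setup (illustrative hyperparameter values)
layers = [ ...
    sequenceInputLayer(featureDimension)
    lstmLayer(numHiddenUnits)
    dropoutLayer(0.2)                                        % regularize the LSTM output
    fullyConnectedLayer(numResponses,'WeightL2Factor',0.001) % L2 penalty on the weights
    regressionLayer];
options = trainingOptions('adam', ...
    'MaxEpochs',maxepochs, ...
    'GradientThreshold',1, ...
    'InitialLearnRate',0.001, ...                 % lower initial learning rate
    'ValidationData',{XVal_ZaMir,YVal_ZaMir}, ...
    'ValidationFrequency',20, ...
    'ValidationPatience',5, ...                   % stop when validation loss stalls
    'Shuffle','every-epoch', ...
    'MiniBatchSize',minibatchsize, ...
    'Verbose',1, ...
    'Plots','training-progress');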
Refer to the following MathWorks documentation for more information on LSTMs: https://www.mathworks.com/discovery/lstm.html
