How to Predict Time Series Data Using a Neural Network in MATLAB

Muhammad Usman Saleem on 28 May 2022
Edited: Muhammad Usman Saleem on 25 Jun 2022 at 7:07
I have a time series dataset covering the past 31 years with 41 independent variables. I want to predict three dependent variables (which depend on the other 41 variables) using a neural network in MATLAB. I want to train my ANN model on 80% of the time series and use the remaining 20% to test it. Then I want to use my best trained model to forecast the three dependent variables for two years (the NaN values in the Excel file). I searched online, but I could not understand the code I found, because most examples work with images rather than Excel datasets. How can I do this in MATLAB?
I have attached a sample dataset in Excel that illustrates my problem. A timely reply will be highly appreciated.
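The split described in the question can be sketched as follows. This is only a minimal illustration on made-up data: the matrix raw, its size (12 rows, 3 response columns followed by 41 predictor columns) and all variable names are assumptions, not the attached file. It keeps the rows in time order, sets aside the rows whose responses are NaN for forecasting, and divides the remaining rows 80/20.

```matlab
% Hypothetical stand-in for the spreadsheet (sizes and names are assumptions):
% 12 time steps, columns 1-3 are responses, columns 4-44 are predictors.
rng(0);
numRows = 12;
raw = rand(numRows, 44);
raw(end-1:end, 1:3) = NaN;            % last two rows: responses still to be forecast

responses  = raw(:, 1:3);
predictors = raw(:, 4:end);

% Rows with known responses can be used for training and testing.
known       = ~any(isnan(responses), 2);
idxKnown    = find(known);
idxForecast = find(~known);

% Chronological split: first 80% of the known rows for training, the rest for testing.
numTrain = floor(0.8 * numel(idxKnown));
idxTrain = idxKnown(1:numTrain);
idxTest  = idxKnown(numTrain+1:end);

XTrain = predictors(idxTrain, :);   TTrain = responses(idxTrain, :);
XTest  = predictors(idxTest, :);    TTest  = responses(idxTest, :);
XForecast = predictors(idxForecast, :);   % predictors for the rows to forecast
```

Splitting by row index rather than at random keeps the test period strictly after the training period, which is usually what is wanted for a forecasting problem.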

Answers (1)

Muhammad Usman Saleem on 28 May 2022
Edited: Muhammad Usman Saleem on 30 May 2022
I was able to put together the code below for predicting the variables, based on what I found through Google. The code runs fine:
file=csvread('data.csv'); % csvread reads numeric values only, so delete the first (date) column beforehand
data=cell(size(file,2),1); % one cell per column (variable)
for i=1:size(data,1)
data{i}=file(:,i).'; % each cell holds one variable's time series as a 1-by-T row vector
end
%Partition the data into training and test sets. Use 90% of the observations for training and the remainder for testing.
numObservations = numel(data);
idxTrain = 1:floor(0.9*numObservations);
idxTest = floor(0.9*numObservations)+1:numObservations;
dataTrain = data(idxTrain);
dataTest = data(idxTest);
%%Prepare Data for Training
for n = 1:numel(dataTrain)
X = dataTrain{n};
XTrain{n} = X(:,1:end-1);
TTrain{n} = X(:,2:end);
end
%% Normalize the data
muX = mean(cat(2,XTrain{:}),2); % mean over all training sequences concatenated together; why does the code calculate the mean of the whole population?
sigmaX = std(cat(2,XTrain{:}),0,2);
muT = mean(cat(2,TTrain{:}),2);
sigmaT = std(cat(2,TTrain{:}),0,2);
for n = 1:numel(XTrain)
XTrain{n} = (XTrain{n} - muX) ./ sigmaX;
TTrain{n} = (TTrain{n} - muT) ./ sigmaT;
end
%Define LSTM Network Architecture
numChannels = size(data{1},1)
layers = [
sequenceInputLayer(numChannels)
lstmLayer(128)
fullyConnectedLayer(numChannels)
regressionLayer];
%%Specify Training Options
maxEpochs = 100; % I am not sure what this value means; I added it after searching on Google
miniBatchSize = 27; % I am not sure what the mini-batch size is; what value would suit my dataset?
%setting option
options = trainingOptions('adam', ...
'ExecutionEnvironment','cpu', ...
'MaxEpochs',maxEpochs, ...
'MiniBatchSize',miniBatchSize, ...
'GradientThreshold',1, ...
'Verbose',false, ...
'Plots','training-progress');
%%Train Neural Network
net = trainNetwork(XTrain,TTrain,layers,options); % does this use the mean of all the training data? I do not see why that is needed
%Test Network
for n = 1:size(dataTest,1)
X = dataTest{n};
XTest{n} = (X(:,1:end-1) - muX) ./ sigmaX;
TTest{n} = (X(:,2:end) - muT) ./ sigmaT;
end
%%Make predictions using the test data.
YTest = predict(net,XTest)%,SequencePaddingDirection="left");
%%To evaluate the accuracy, for each test sequence, calculate the root mean squared error (RMSE) between the predictions and the target.
for i = 1:size(YTest,1)
rmse(i) = sqrt(mean((YTest{i} - TTest{i}).^2,"all"));
end
figure
histogram(rmse)
xlabel("RMSE")
ylabel("Frequency")
mean(rmse)
%%Forecast Future Time Steps
idx = 2;
X = XTest{idx};
T = TTest{idx};
figure
%stackedplot(X',DisplayLabels="Channel " + (1:numChannels))
stackedplot(X')
xlabel("Time Step")
title("Test Observation " + idx)
%%Open Loop Forecasting
net = resetState(net);
offset = 1;
[net,~] = predictAndUpdateState(net,X(:,1:offset));
%%To forecast further predictions, loop over time steps and update the network state using the predictAndUpdateState function
numTimeSteps = size(X,2); % why does numTimeSteps use 2 here? My dataset is monthly
numPredictionTimeSteps = numTimeSteps - offset;
Y = zeros(numChannels,numPredictionTimeSteps);
for t = 1:numPredictionTimeSteps
Xt = X(:,offset+t);
[net,Y(:,t)] = predictAndUpdateState(net,Xt);
end
%%Compare the predictions with the target values.
figure
%t = tiledlayout(numChannels,1); % tiledlayout in 2019b use subplot
for i = 1:numChannels
subplot(numChannels,1,i) % one axes per channel, instead of drawing every channel into the first axes
plot(T(i,:))
hold on
plot(offset:numTimeSteps,[T(i,offset) Y(i,:)],'--')
hold off
ylabel("Channel " + i)
if i == 1
title("Open Loop Forecasting")
end
end
xlabel("Time Step")
legend(["Input" "Forecasted"])
Problems with the above code:
(1) I want to predict my first three columns, which depend on the remaining 41 columns. I want to predict them one at a time and plot actual versus predicted values. I am not sure whether the above code does this.
(2) In the above code, why are the mean and standard deviation calculated over all of the training and testing data? The training and testing datasets consist of my 41 independent variables. Each variable has a different meaning, so a pooled mean or standard deviation does not seem sensible to me. As far as I understand, would a mean and standard deviation for each single variable be more suitable for an ANN?
(3) This is my intended workflow, step by step: split the data into 90% training and 10% testing, train the ANN, validate and select the best neural network, and then use the optimal ANN to predict the last two values of the first three columns in data.csv.
(4) Finally, I would like the experts on this platform to verify whether my code performs the tasks above correctly.
(5) After training on the dataset of 45 variables, can I predict one of the three variables, and then variables 2 and 3? Will the results from the trained model be reliable for predicting an individual variable? This is very confusing to me. Please clarify.
I will be very thankful to the experts for a timely response.
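On point (2), per-variable normalization can be sketched as below. This is a minimal illustration on made-up numbers, assuming a layout with time steps in rows and variables in columns (the names XTrain, XTest, mu and sigma here are not tied to the posted script). The statistics are computed column by column from the training rows only and then reused for the test rows.

```matlab
% Hypothetical data: rows are time steps, columns are variables on very different scales.
XTrain = [1 100; 2 200; 3 300; 4 400];
XTest  = [5 500; 6 600];

% One mean and one standard deviation per column (per variable), training rows only.
mu    = mean(XTrain, 1);            % 1-by-numVariables
sigma = std(XTrain, 0, 1);          % 1-by-numVariables
sigma(sigma == 0) = 1;              % guard against constant columns

% Implicit expansion applies each column's statistics to that column.
XTrainNorm = (XTrain - mu) ./ sigma;
XTestNorm  = (XTest  - mu) ./ sigma;   % reuse the training statistics on the test rows
```

In the posted code every cell is a 1-by-T row vector, so cat(2,XTrain{:}) joins all variables into one long row and mean(...,2) returns a single pooled value, which is why the statistics there are not per variable. If the data were instead arranged as one 41-by-T predictor sequence with a 3-by-T response sequence, the layer sizes would presumably become sequenceInputLayer(41) and fullyConnectedLayer(3); that arrangement is a suggestion, not something confirmed in this thread.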
  2 Comments
Muhammad Usman Saleem on 25 Jun 2022 at 7:07
Many thanks for your detailed explanation of my problem. I am confused about why my number of channels would be 7. I have 41 variables, but there is this line: "Each sequence is of length 6 with time steps corresponding to each month used, and each time step is a vector of length 7, where 7 is the number of variables for each month. In the above example, numChannels = 7".
Dear Sir, I have 41 variables, so why is my numChannels 7? This is confusing me.
Would you please correct my code to fit my problem? If I use the wrong code, the errors will multiply in my research.
Many thanks for your kind support.
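The channel count follows from how each sequence is laid out, which a small sketch may make clearer. The 6-by-7 matrix below is made up to mirror the quoted example (6 months, 7 variables); with 41 predictor columns the same computation would give 41. The variable names are assumptions for illustration only.

```matlab
% Hypothetical table: 6 monthly rows (time steps), 7 columns (variables).
file = reshape(1:42, 6, 7);

% Layout used in the posted code: one cell per variable, each a 1-by-T row vector.
perVariable = cell(size(file, 2), 1);
for i = 1:numel(perVariable)
    perVariable{i} = file(:, i).';
end
numChannelsPosted = size(perVariable{1}, 1);   % 1: each sequence is univariate

% Alternative layout: a single multivariate sequence, variables-by-time.
sequence = file.';
numChannelsMulti = size(sequence, 1);          % 7: one channel per variable
```

Because the posted code uses the first layout, its numChannels is 1, and its 90/10 partition of data divides the variables into training and test groups rather than dividing the time steps.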
