I come from a TensorFlow background and want to use MATLAB for time-series prediction because my colleagues use MATLAB. In TensorFlow, the input to an LSTM for each batch has dimensions (batch_size, lookback, input_features). The term "lookback" is taken from Francois Chollet's book; similar terms such as sequence length and num steps are also used. It represents how long a sequence is fed to the LSTM to predict the next value. I do not see an option to set this lookback in the function lstmLayer. How can we set this sequence length/lookback? Is it possible to view the data being fed to the LSTM at each step or for each batch?

My colleague has split the input data into train and test sets by randomly choosing points from the input data as test points and using the remaining points for training. I have difficulty understanding how this works, because removing certain points from the input data will break the "sequence" nature of the data. The code to split the input data into train and test sets is as follows:

```MATLAB
% Read the spreadsheet: columns 1-3 are inputs, column 4 is the output.
[data] = xlsread('example.xlsx','Sheet1');
X = (data(:,1:3))';   % 3-by-N
Y = (data(:,4))';     % 1-by-N

% One cell per column (time step).
input = num2cell(X,1);
output = num2cell(Y,1);
data_size = 1:size(input,2);

%% Separation of training data and validation data
% validation ratio: 0.3
ratio = 0.3;
tst_num = floor(size(input,2)*ratio);

% randomly separate
idx_te = randperm(size(input,2), tst_num);
idx_te2 = sort(idx_te);
idx_tr = setdiff([data_size],idx_te2);

% training/test
XTrain = input(:,idx_tr);
YTrain = output(:,idx_tr);
```

The file "example.xlsx" contains input and output data in different columns.

Asvin Kumar
on 10 Jun 2020

For the first part of your question, on the number of steps in an LSTM, I am going to redirect you to an earlier answer of mine. Essentially, the LSTM unit unrolls to fit the entire length of the sequence.

- It covers the case where you have multiple sequences of varying length that share a common feature size. JVC: Load Sequence Data
- You also have control over the ‘SequenceLength’ in each mini-batch. JVC: Prepare Data for Padding
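
As a rough illustration of the second point, the sketch below shows where the 'SequenceLength' option is set. It assumes Deep Learning Toolbox is installed; the layer sizes and mini-batch size are hypothetical values chosen only for the example, not taken from the question.

```MATLAB
% Hypothetical sizes, chosen only for illustration.
numFeatures = 3;       % matches the three input columns in the question
numHiddenUnits = 50;   % assumption, not from the original post

% A small sequence-to-sequence regression network.
layers = [
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits,'OutputMode','sequence')
    fullyConnectedLayer(1)
    regressionLayer];

% 'SequenceLength' controls padding/truncation within each mini-batch:
% 'longest' pads to the longest sequence, 'shortest' truncates to the
% shortest, and a positive integer splits sequences into chunks of that length.
options = trainingOptions('adam', ...
    'MiniBatchSize',16, ...
    'SequenceLength','longest', ...
    'Shuffle','never');
```

Note that this option governs how sequences are padded or split per mini-batch; it is not a direct equivalent of the lookback dimension in the TensorFlow input shape.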

If these don’t meet your requirements, you can always construct your dataset manually with a forward-moving (sliding) window: take each run of the required number of steps as one input sequence and associate it with the value to be predicted. This would be a data preparation step.
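
A minimal sketch of such a windowing step is below. The toy data stands in for the spreadsheet, and the names `lookback`, `XWin`, and `YWin` are hypothetical; the window length of 4 is an arbitrary choice for illustration.

```MATLAB
% Toy data standing in for the spreadsheet: 3 features and 1 target over T steps.
T = 10;
X = [1:T; (1:T)*10; (1:T)*100];   % numFeatures-by-T
Y = (1:T) + 0.5;                  % 1-by-T

lookback = 4;                     % hypothetical window length
numWindows = T - lookback;        % each window needs a "next" value to predict

XWin = cell(numWindows,1);        % each cell: numFeatures-by-lookback
YWin = zeros(numWindows,1);       % one target per window
for k = 1:numWindows
    XWin{k} = X(:, k:k+lookback-1);   % time steps k .. k+lookback-1
    YWin(k) = Y(k+lookback);          % target at the step right after the window
end
```

Each cell of `XWin` is then a fixed-length sequence of `lookback` steps, which plays roughly the role of the lookback dimension in the TensorFlow layout. A cell array of sequences paired with a numeric response vector like this would typically go with an `lstmLayer` whose 'OutputMode' is 'last'.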

For the second part of the question, it is hard for me to comment on the approach to splitting the dataset since I am not aware of the problem setup. If the data points along the second dimension of the ‘input’ variable belong to the same time sequence, I would agree with you that randomly removing data points / time steps would interfere with the temporal information.
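
If the columns do form a single time series, one common alternative is a chronological split that keeps both parts contiguous. The sketch below assumes that setup and uses toy data in place of example.xlsx; `idxTrain` and `idxTest` are hypothetical names, and the 0.3 ratio is reused from the question's code.

```MATLAB
% Toy data standing in for the spreadsheet: 3 features and 1 target over T steps.
T = 10;
X = [1:T; (1:T)*10; (1:T)*100];   % numFeatures-by-T
Y = (1:T) + 0.5;                  % 1-by-T

ratio = 0.3;                      % fraction held out, as in the question
numTest = floor(T*ratio);
numTrain = T - numTest;

% Keep time order: the earliest steps train, the latest steps test.
idxTrain = 1:numTrain;
idxTest = numTrain+1:T;

XTrain = X(:,idxTrain);   YTrain = Y(:,idxTrain);
XTest  = X(:,idxTest);    YTest  = Y(:,idxTest);
```

Because no time step is removed from the middle of the series, the training portion remains an unbroken sequence.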