Deep Learning Toolbox - Structuring the Training Data from Imported Data

Hi,
I am attempting to create a ROM of a gas exchange process by training an LSTM network. I am using the LSTM ROM example as a starting point. My data is captured in CSV format (e.g., dataraw_03), where I have run 4 simulations in ANSYS, varying the equivalence ratio (0.3 to 0.6) in each simulation to capture the dynamics.
I have transposed the 4 data sets (e.g., dataT_03) and removed the simulation time from the data (e.g., data_03). As I understand the LSTM ROM example workflow, I now have to format and combine the 4 data sets into a single 4x1 cell array?
I am stuck on how to get the 4 data sets into a single cell array (i.e., data) so that I can then prepare the data for training, which involves downsampling the ANSYS data and partitioning it into training and test sets. Another preprocessing issue is that ANSYS uses a variable-step solver, so the 4 data sets vary in length (number of time steps).
Any help on how best to structure and prepare the 4 data sets for training the LSTM network would be great. I have included the code up to the point of creating the cell array for training.
Thanks in advance,
Patrick
%Import the ANSYS Raw Data
dataraw_03 = xlsread('export_Ethanol_20%_440t_0.3.csv');
dataraw_04 = xlsread('export_Ethanol_20%_440t_0.4.csv');
dataraw_05 = xlsread('export_Ethanol_20%_440t_0.5.csv');
dataraw_06 = xlsread('export_Ethanol_20%_440t_0.6.csv');
%Transpose the data
dataT_03 = dataraw_03';
dataT_04 = dataraw_04';
dataT_05 = dataraw_05';
dataT_06 = dataraw_06';
%Remove Time from the ANSYS data
data_03 = dataT_03(2:end, 1:end); % Remove the first row, time
data_04 = dataT_04(2:end, 1:end); % Remove the first row, time
data_05 = dataT_05(2:end, 1:end); % Remove the first row, time
data_06 = dataT_06(2:end, 1:end); % Remove the first row, time
%Create a single Cell Array from the 4 Separate ANSYS Simulations (.csv) for LSTM Training
numObservations = 4; % 4 ANSYS Simulations Conducted
EquivRatio = linspace(0.3,0.6,numObservations); % Equivalence ratio swept from 0.3 to 0.6
data = cell(numObservations,1);
% Stuck at this stage???
%for i = 1:numObservations
%EquivRatio = EquivRatio(i);
%data{i} = data_03...;
%end

 Accepted Answer

Hello PB75,
If I understand it correctly, you would like to arrange data_03, data_04, data_05 and data_06 into a cell array that can then be prepared as input data for a network with a sequenceInputLayer.
If that's the case, you can concatenate them into a 4x1 cell like this:
data = {data_03; data_04; data_05; data_06};
You can then follow the remaining data processing steps from the example that you linked. With the default options, the software will automatically pad the sequences so that all observations in a mini-batch have the same length, meaning that it can handle sequences of differing lengths.
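As a minimal sketch of the concatenation step, the snippet below builds the 4x1 cell array from synthetic stand-ins and checks the layout that a sequenceInputLayer expects (features along the rows, time steps along the columns). The array sizes and random values are assumptions for illustration only, not real ANSYS output.

```matlab
% Synthetic stand-ins for the four simulations (features x time steps).
% Sizes and values are illustrative assumptions, not real ANSYS data.
data_03 = rand(6, 574);
data_04 = rand(6, 594);
data_05 = rand(6, 595);
data_06 = rand(6, 596);

% One cell per observation, stacked vertically into a 4x1 cell array
data = {data_03; data_04; data_05; data_06};
numObservations = numel(data);

% Every observation must have the same number of features (rows);
% the number of time steps (columns) is allowed to differ.
numFeatures = cellfun(@(x) size(x, 1), data);
seqLengths  = cellfun(@(x) size(x, 2), data);
assert(all(numFeatures == numFeatures(1)), ...
    'All observations must have the same number of features.');

disp(seqLengths')   % sequence length of each observation
```

Checking the row counts up front is cheap and catches a data set that was accidentally left untransposed before it reaches training.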
However, if the observations have all been taken over the same time interval, you may wish to interpolate the data using a function such as interp1 before training so that each time step represents the same period.
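One possible way to do this, sketched under assumptions, is to resample every observation onto a single uniform time grid that spans the shared interval. The time vectors, signals, and the choice of 500 grid points below are hypothetical placeholders; only interp1 and linspace are standard MATLAB functions.

```matlab
% Hypothetical example: two runs over the same interval [0, tEnd],
% each with its own non-uniform (but strictly increasing) time steps.
tEnd = 0.0309;
t1 = linspace(0, 1, 574).^1.2 * tEnd;
t2 = linspace(0, 1, 594).^0.9 * tEnd;

% Illustrative signals stored as features x time steps
x1 = [sin(200*t1); cos(200*t1)];
x2 = [sin(200*t2); cos(200*t2)];

times = {t1; t2};
data  = {x1; x2};

% Common uniform grid (the number of points is an arbitrary choice here)
numSteps = 500;
tUniform = linspace(0, tEnd, numSteps);

dataUniform = cell(size(data));
for i = 1:numel(data)
    % interp1 interpolates down the columns of its second argument,
    % so transpose to time steps x features and transpose the result back.
    dataUniform{i} = interp1(times{i}, data{i}', tUniform)';
end
```

After this step every cell holds an array with the same number of columns, and each column corresponds to the same instant in every observation.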

6 Comments

Hi David,
Many thanks for taking the time to answer my query. You're right, the data is being prepared for an LSTM network with a sequenceInputLayer.
With regard to the differing lengths of the input data (as ANSYS uses a variable-step solver), I tried to plot one of the observations (pressure) just to check the data in the concatenated cell array. As the lengths are different, it would, as expected, only plot the observation whose length matches the time vector. Being able to plot the data before splitting it would be an ideal way to check the data itself.
The simulations in ANSYS all have the same total simulation time of 0.0309 sec, with a maximum step size defined as 5E-05 sec in ANSYS. However, all 4 data sets have varying step sizes and lengths (average step sizes ranging from 5.2E-05 to 5.2E-05 sec with lengths of 574 to 596). Can the interp1 function or resample function be used on the cell array or the raw data?
An example would be great, I have had a look through the help documents, but not sure exactly what I am looking for to be honest.
Thanks,
Patrick
%Create a single Cell Array from the 4 Separate ANSYS Simulations (.csv) for LSTM Training
numObservations = 4; % 4 ANSYS Simulations Conducted
EquivRatio = linspace(0.3,0.6,numObservations); % Equivalence ratio swept from 0.3 to 0.6
data = {data_03; data_04; data_05; data_06}; % Concatenate all 4 data sets to a single cell array
%Extract Time from First ANSYS Data Set for plotting and validating data
times = dataT_03(1,:); % Row 1 all Columns from first data set
numTimeSteps = length(times);
%Plot Pressures From Cell to validate data
figure % Plot the pressures for all 4 simulations, to check data. Only plots 1 pressure observation!!
for i = 1:4
pressure = data{i}(4,:);
plot(times,pressure);
hold on
end
title("In-Cylinder Pressure")
legend("Observation " + (1:4))
xlabel("Time")
ylabel("Pressure (bar)")
hold off
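The plot above fails for three of the four observations because every pressure trace is plotted against the time vector of the first data set. A minimal sketch of one workaround is to keep the transposed arrays (time still in row 1) in a cell array and plot each observation against its own time row. The synthetic arrays below are placeholders standing in for dataT_03 to dataT_06.

```matlab
% Synthetic stand-ins for dataT_03..dataT_06: row 1 is time, rows 2-7 are signals.
% Lengths and values are illustrative assumptions only.
tEnd = 0.0309;
lengths = [574 594 595 596];
dataT = cell(numel(lengths), 1);
for i = 1:numel(lengths)
    t = linspace(0, tEnd, lengths(i));
    dataT{i} = [t; rand(6, lengths(i))];
end

figure
hold on
h = gobjects(numel(dataT), 1);
for i = 1:numel(dataT)
    t = dataT{i}(1, :);          % this observation's own time vector
    pressure = dataT{i}(5, :);   % row 5 here = row 4 once the time row is removed
    h(i) = plot(t, pressure);
end
hold off
title("In-Cylinder Pressure")
legend("Observation " + (1:numel(dataT)))
xlabel("Time")
ylabel("Pressure (bar)")
```

Because each trace carries its own time axis, this check works before any interpolation or splitting is applied.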
To linearly interpolate data_04 to the time steps of data_03, you can do the following:
queryTimes = dataT_03(1,:);
originalTimes = dataT_04(1,:);
data_04_interp = interp1(originalTimes, data_04', queryTimes)';
(note the transposes required because of the syntax of interp1).
You can do something similar with the other data points. Interpolating will lose some accuracy of course, so this assumes the time steps are appropriately fine-grained.
Thanks David,
The interp1 code works fine for the first data set, data_04. However, when I attempt to do the same on the remaining data sets, data_05 and data_06, using the time steps from data_03, it raises the following error:
Sample points must be unique.
Error in interp1 (line 185)
VqLite = matlab.internal.math.interp2(Xext,V,method,method,...
%Import Chemkin Data from Folder Location
dataraw_03 = xlsread('Data_Ethanol_Raw\export_Ethanol_20%_440t_0.3.csv'); % 574x7 double
dataraw_04 = xlsread('Data_Ethanol_Raw\export_Ethanol_20%_440t_0.4.csv'); % 594x7 double
dataraw_05 = xlsread('Data_Ethanol_Raw\export_Ethanol_20%_440t_0.5.csv'); % 595x7 double
dataraw_06 = xlsread('Data_Ethanol_Raw\export_Ethanol_20%_440t_0.6.csv'); % 596x7 double
%Transpose the data
dataT_03 = dataraw_03'; % 7x574 double
dataT_04 = dataraw_04'; % 7x594 double
dataT_05 = dataraw_05'; % 7x595 double
dataT_06 = dataraw_06'; % 7x596 double
%Remove Time from the ANSYS data
data_03 = dataT_03(2:end, 1:end); % 6x574 double by Removing the first row, time
data_04 = dataT_04(2:end, 1:end); % 6x594 double by Removing the first row, time
data_05 = dataT_05(2:end, 1:end); % 6x595 double by Removing the first row, time
data_06 = dataT_06(2:end, 1:end); % 6x596 double by Removing the first row, time
%Interpolate Data set 4 with time dimensions from Data set 3
queryTimes = dataT_03(1,:); % Reference Time from data set 3
originalTimes = dataT_04(1,:); % Time from data set 4
data_04_interp = interp1(originalTimes,data_04',queryTimes)'; % 6x574 double
data_04=data_04_interp; % Re-save interp data for training
%Interpolate Data set 5 with time dimensions from Data set 3
queryTimes = dataT_03(1,:); % Reference Time from data set 3
originalTimes = dataT_05(1,:); % Time from data set 5
data_05_interp = interp1(originalTimes,data_05',queryTimes)'; % 6x574 double??
data_05=data_05_interp; % Re-save interp data for training
%Interpolate Data set 6 with time dimensions from Data set 3
queryTimes = dataT_03(1,:); % Reference Time from data set 3
originalTimes = dataT_06(1,:); % Time from data set 6
data_06_interp = interp1(originalTimes,data_06',queryTimes)'; % 6x574 double??
data_06=data_06_interp; % Re-save interp data for training
The error message suggests that you have some duplicate values in the time steps for data_05 and data_06. If the simulated values corresponding to those time steps are the same, you can probably just remove those time steps from the data arrays.
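A minimal sketch of that clean-up, assuming the repeated time steps carry identical values: unique with the 'stable' option returns the index of the first occurrence of each time value, and those indices can be used to drop the repeated columns before calling interp1. The small arrays below are made-up illustrations, not the ANSYS data.

```matlab
% Hypothetical time vector with one repeated sample (illustrative values)
t = [0 1 2 2 3 4] * 1e-5;
x = [10 11 12 12 13 14;     % features x time steps
     20 21 22 22 23 24];

% Keep the first occurrence of each time value, preserving the order
[tUnique, keepIdx] = unique(t, 'stable');
xUnique = x(:, keepIdx);

% Optional sanity check: the dropped column matches the one that was kept
assert(isequal(x(:, 3), x(:, 4)), 'Duplicate time steps hold different values.');

% interp1 now accepts the sample points because they are distinct
tQuery = linspace(tUnique(1), tUnique(end), 9);
xInterp = interp1(tUnique, xUnique', tQuery)';
```

If the duplicated time steps hold different values, dropping one of them silently discards information, so it is worth inspecting those columns before deciding which to keep.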
Hi David, thanks for your answer. Yes, it looks like the data captured in ANSYS has duplicate entries in the time column.
Hi David,
Thanks for your answer which also helps me.
Furthermore, I would like to know how to prepare the data for prediction when I have many samples. For example, suppose we have data_07, data_08, ..., data_04000 (i.e., a large number of data sets to be processed with this model), all with dimensions 6x545. How can I input this data and use predictAndUpdateState to update the network state?
Many thanks!

Release: R2022a

Asked: 24 Aug 2022

Commented: 29 Oct 2022