Partitioning data for Time Series TCN model Training, Validation, and Testing

Hello there, I am trying to build a TCN model to predict a continuous variable. I have time series data in which I am using 3 input features (accelerometer measurements in the x, y, and z directions) to estimate/predict a continuous variable. I have accelerometer data from 10 different trials stored in a 10x1 cell array, and each cell holds the three accelerometer measurements over time for that trial in a 500x3 table. The target continuous variable I am trying to predict is similarly stored in a 10x1 cell array, with each cell containing a 500x1 table named "Taget" that holds the true value of the predicted variable over time. If I am trying to build a TCN model with this data, what is the best way to partition the data for training, testing (10%), and validation (10%)? I think I need to use the tspartition function but am not sure how to use it for this type of data. Do I need to combine the data from all 10 trials into one large table and then partition? Or should I partition each trial separately, train the model on a single trial, then retrain the model on the next trial, and so on? Any help would be greatly appreciated!

Answers (1)

Krishna on 6 Jun 2024
Hello Isabelle,
Based on your description, I think you're looking for the right way to divide your time series data into training, testing, and validation sets. Here is an approach that has worked for me.
  1. You mention having 10 observations (trials), each with both input and output data: the input is a time series of 500 steps with 3 features, and the output is a sequence of 500 steps for a single variable. So your data can be organized as a 1x10 cell array of sequences, where each sequence is a 500x4 array holding the 3 inputs and the 1 output.
  2. To partition this data into training, testing, and validation sets, you can use the cvpartition function. Note that cvpartition produces only two sets at a time, so you need to call it twice. First, divide the data into a training set and a combined testing/validation set. Then split the latter into separate testing and validation sets. With 10 sequences, trainData ends up with 8 sequences (80 percent), and the validation and test sets get 1 sequence each (10 percent each).
  3. Once partitioned, organize the training data into Xtrain, which holds the 500x3 input sequences, and Ytrain, which holds the 500x1 output sequences.
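The steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the variable names accelData and targetData, the table column names, and the random placeholder data are assumptions made so the example runs on its own, and it keeps inputs and targets in separate cell arrays rather than a combined 500x4 array. It assumes Statistics and Machine Learning Toolbox is available for cvpartition.

```matlab
% Placeholder data standing in for the 10 trials described in the question.
% accelData, targetData, and the column names are hypothetical.
numTrials  = 10;
accelData  = cell(numTrials, 1);
targetData = cell(numTrials, 1);
for k = 1:numTrials
    accelData{k}  = array2table(randn(500, 3), 'VariableNames', {'ax', 'ay', 'az'});
    targetData{k} = array2table(randn(500, 1), 'VariableNames', {'Target'});
end

rng(0);  % fixed seed so the random split is reproducible

% First split: hold out 2 of the 10 trials (20%) for validation + testing.
cHold    = cvpartition(numTrials, 'HoldOut', 2);
idxTrain = find(training(cHold));   % 8 trial indices
idxHold  = find(test(cHold));       % 2 trial indices

% Second split: divide the 2 held-out trials into 1 validation and 1 test trial.
cVal    = cvpartition(numel(idxHold), 'HoldOut', 1);
idxVal  = idxHold(training(cVal));
idxTest = idxHold(test(cVal));

% Convert each table to a numeric array, keeping one sequence per cell.
toArray = @(c) cellfun(@table2array, c, 'UniformOutput', false);

XTrain = toArray(accelData(idxTrain));  YTrain = toArray(targetData(idxTrain));
XVal   = toArray(accelData(idxVal));    YVal   = toArray(targetData(idxVal));
XTest  = toArray(accelData(idxTest));   YTest  = toArray(targetData(idxTest));
```

Because the split is made over trial indices, every 500-step sequence stays whole and lands in exactly one set. Each cell here is 500x3 (time steps by channels); check the layout your training function expects and transpose the sequences if it wants channels by time steps instead.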
Please go through the following documentation to learn more,
Hope this helps.
