It's my first time dealing with the Deep Learning Toolbox and with large datasets in Matlab, so honestly, any help or direction you can give me will be extremely helpful. I'm finding it really hard to start since most of the documentation is about how to work with images, and I'm working on a different problem.
I have run simulations in Simulink and have over 2000 files with over 50,000 timesteps each (about 100 GB of data). Each of these files has about 60 recorded parameters, of which I want to use 39 as inputs and 2 as targets for training (the remaining parameters would not be used). I do not want to train the network on time series, so each timestep is a separate, independent data point in my training.
So far I know that:
- 100 GB of data is too much to load into memory at once for training
- There are some "datastore" functions that let me read the data on demand instead of holding it all in memory. I'm not entirely sure this is correct, and the documentation linking datastores to the Deep Learning Toolbox only covers images, which makes it harder to understand.
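To show where I've got to: this is my best guess at how a datastore would be set up, pieced together from the `fileDatastore` documentation. The folder name `results` and the use of `@load` as a reader are just placeholders; please correct me if I've misunderstood how the lazy reading works.

```matlab
% Sketch, assuming the Simulink results are saved as .mat files in a
% folder called "results" (folder name is an assumption on my part).
% fileDatastore only stores the list of files; each file is read lazily,
% one at a time, so the full 100 GB never sits in memory at once.
fds = fileDatastore("results", ...
    "IncludeSubfolders", true, ...
    "FileExtensions", ".mat", ...
    "ReadFcn", @load);   % placeholder; a custom reader would go here

oneFile = read(fds);     % reads only the first file into memory
reset(fds);              % rewinds the datastore to the first file
```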
And I have divided my task into some steps I think I need to follow:
- Build a custom datastore reading function that reads my files and extracts the 41 variables I need for training (39 inputs + 2 targets). a) I have no idea what this reading function even looks like or how it should be written. b) I don't know what format these training variables need to be in; some are inputs and some are targets, so how does the network deal with this, and how do you specify which is which?
- Find a way to feed the data into the training function. The examples, again, are focused on images, and it's hard to extrapolate them to other problems. In my case, each file contains over 50,000 training samples (each with the 41 variables involved); if these were images, each file would correspond to a single image. How does the training function understand this? How do I tell it to treat each row as a separate sample?
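To make these two questions concrete, here is my best guess at what the pieces might look like, based on the "Datastores for Deep Learning" page. The file layout (a matrix named `signals`), the column indices, and the function name `readSimFile` are all assumptions; I'd be grateful if someone could confirm whether the table format and the `trainNetwork` call are right.

```matlab
% Guess at the full pipeline: datastore + network + training call.
fds = fileDatastore("results", ...       % "results" folder is assumed
    "FileExtensions", ".mat", ...
    "ReadFcn", @readSimFile);

layers = [
    featureInputLayer(39)                % one observation = 39 features
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(2)               % 2 regression targets
    regressionLayer];

opts = trainingOptions("adam", ...
    "MiniBatchSize", 512, ...
    "Shuffle", "every-epoch");

net = trainNetwork(fds, layers, opts);

% Guess at the custom read function, assuming each .mat file stores a
% 50000-by-60 matrix called "signals" (one row per timestep, one column
% per parameter). Variable name and column indices are placeholders.
function t = readSimFile(filename)
    s = load(filename);
    data = s.signals;                    % [timesteps x 60] (assumed name)
    X = data(:, 1:39);                   % 39 input parameters
    Y = data(:, [40 41]);                % 2 target parameters (adjust)
    % As I read the docs, the read function can return a table with
    % predictors in the first column and responses in the last; each row
    % is then treated as one independent observation (one timestep here).
    t = table(num2cell(X, 2), num2cell(Y, 2), ...
              'VariableNames', {'Predictors', 'Responses'});
end
```

If that per-row table format is how `trainNetwork` tells the 50,000 samples in one file apart, that would answer most of my second question.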
I will emphasise that I have already read the documentation on the Deep Learning Toolbox, training, and datastores: half of it I don't understand, and the rest I'm not able to extrapolate to my problem. I would appreciate it if anyone who has worked with this for longer could give me a hand by explaining some of these things or pointing me in the right direction.