Custom datastore - why can't I just have a datastore with doubles?
Hello!
I'm trying to create a machine learning model based on a multilayer perceptron. The model takes four inputs and has two outputs. All inputs and outputs are doubles between 0 and 1. There is a series of shared hidden layers, then the model splits into a separate series of hidden layers for each output. I have created the model using dlnetwork as shown here:
layers1 = [
featureInputLayer(4, "Name", "input")
fullyConnectedLayer(24, "Name", "fc1")
reluLayer('name', 'relu1')
fullyConnectedLayer(24, "Name", "fc2")
reluLayer('name', 'relu2')
fullyConnectedLayer(24, "Name", "fc3")
reluLayer('name', 'relu3')
];
layersA = [ fullyConnectedLayer(16, "Name", "fc4a")
reluLayer('name', 'relu4a')
fullyConnectedLayer(16, "Name", "fc5a")
reluLayer('name', 'relu5a')
fullyConnectedLayer(8, "Name", "fc6a")
reluLayer('name', 'relu6a')
fullyConnectedLayer(8, "Name", "fc7a")
reluLayer('name', 'relu7a')
fullyConnectedLayer(4, "Name", "fc8a")
reluLayer('name', 'relu8a')
fullyConnectedLayer(4, "Name", "fc9a")
reluLayer('name', 'relu9a')
fullyConnectedLayer(4, "Name", "fc10a")
reluLayer('name', 'relu10a')
fullyConnectedLayer(4, "Name", "fc11a")
reluLayer('name', 'relu11a')
softmaxLayer
];
layersB = [ fullyConnectedLayer(16, "Name", "fc4b")
reluLayer('name', 'relu4b')
fullyConnectedLayer(16, "Name", "fc5b")
reluLayer('name', 'relu5b')
fullyConnectedLayer(8, "Name", "fc6b")
reluLayer('name', 'relu6b')
fullyConnectedLayer(8, "Name", "fc7b")
reluLayer('name', 'relu7b')
fullyConnectedLayer(4, "Name", "fc8b")
reluLayer('name', 'relu8b')
fullyConnectedLayer(4, "name", "fc9b")
reluLayer('name', 'relu9b')
fullyConnectedLayer(4, "Name", "fc10b")
reluLayer('name', 'relu10b')
fullyConnectedLayer(4, "Name", "fc11b")
reluLayer('name', 'relu11b')
softmaxLayer
];
net = dlnetwork;
net = addLayers(net, layers1);
net = addLayers(net, layersA);
net = connectLayers(net, "relu3", "fc4a");
net = addLayers(net, layersB);
net = connectLayers(net, "relu3", "fc4b");
[trainedNet, info] = trainnet(xTrain', yTrain', net, "mse", opts);
However, when running trainnet, I get the following error message:
Error using trainnet (line 46)
For networks with multiple inputs or outputs, data must be a datastore.
Error in trainRutileModel_matlabEdits (line 172)
[trainedNet, info] = trainnet(xTrain', yTrain', net, "mse", opts);
I followed the instructions for creating a custom datastore, since Matlab (for whatever reason) doesn't have a datastore for doubles. I took my input and output data and combined them into a cell array formatted as follows:
>> rutileDatastore{2}
ans =
1×6 cell array
{'0.080157'} {'0.38585'} {'0.21999'} {'0.61851'} {'0'} {'0.010101'}
Which I believe is in line with the "Datastores for deep learning" support page. The first four entries are inputs, the last two are outputs. I then found out that Matlab doesn't appear to natively handle datastores of just plain old doubles - for some reason, the most basic data type is neglected. So I had to go to the "Develop Custom Datastore" support page, followed the instructions there, made the myDatastore class, and then went on to try to validate the datastore using the "Testing Guidelines for Custom Datastores" support page. Unfortunately, I can't get this to work with any of the datastore types on the custom datastore page - the datastore of cells that reads out correctly isn't one of the 'Type' values listed on the "Datastores" support page.
I managed to get it working as a tabular text datastore, but read(ds) on a tabular text datastore returns tables, whereas according to the "Datastores for deep learning" support page, read(ds) needs to return cell arrays.
What gives? Does Matlab support datastores that are just numbers? How can I go about a) fulfilling the requirement of trainnet that my training data is in datastore format, and b) having the datastore store just plain old numbers that are output in 1D cell array format?
Answers (2)
Garmit Pant
on 7 Aug 2024
Hello Matthew,
You have followed the correct workflow to create a datastore for a model with multiple inputs and outputs. Such a datastore should store each input and output as a separate cell, so the output of the “read” method is an N x M cell array, where N is the total number of data items and M is the total number of inputs and outputs (6, in your case).
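As a minimal sketch of the layout described above, the snippet below builds one such row from the values shown in the question. The quoted entries in the question's display (for example {'0.080157'}) are how MATLAB prints character vectors, so this sketch assumes they need converting with str2double first. The variable names raw, vals and row are illustrative only.

```matlab
% Values copied from the question; the quotes mean they are stored as text
raw = {'0.080157', '0.38585', '0.21999', '0.61851', '0', '0.010101'};

% Convert the text entries to doubles (returns a 1-by-6 numeric row)
vals = str2double(raw);

% Put each double in its own cell: one row of the N-by-6 layout
row = num2cell(vals);

disp(size(row))        % dimensions of the cell array
disp(class(row{1}))    % class of the first element
```

If the values are already numeric, the str2double step can be skipped and num2cell applied to the matrix directly.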
For numerical ‘double’ data, you can create a “FileDatastore” by specifying a read function to read the file contents. To achieve this, you need to follow these steps:
- Convert Training Data into Individual Cells: Each row in your data matrix will be split into six separate cells.
- Save the Converted Data: Save the cell arrays into a MAT-file.
- Load and Transform the Data: Use 'datastore' and 'transform' to load and reformat the data.
You can use the following code snippet to create the datastore:
% Assuming your data is in a matrix called `data`
% Each row of `data` is [input1, input2, input3, input4, output1, output2]
data = [
0.080157, 0.38585, 0.21999, 0.61851, 0, 0.010101;
0.123456, 0.654321, 0.111111, 0.222222, 0, 0.333333;
% Add more rows as needed
];
% Convert the matrix to an N-by-6 cell array, one scalar per cell
combinedCells = num2cell(data);
% Save the combined data
save('trainingData.mat', 'combinedCells');
% Create a FileDatastore
filedatastore = datastore('trainingData.mat', 'Type', 'file', 'ReadFcn', @load);
% Transform the datastore to extract combinedCells
trainingDatastore = transform(filedatastore, @rearrangeData);
% Test reading from the datastore
dataOut = read(trainingDatastore)
function out = rearrangeData(loaded)
% loaded is the struct returned by load; return the saved cell array
out = loaded.combinedCells;
end
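For data that already sits in memory, a possible alternative is to skip the MAT-file and build the datastore with arrayDatastore and combine. This is only a sketch under assumptions: it assumes a release that includes arrayDatastore, and it groups the four predictors into a single cell because the network in the question has one featureInputLayer with four features, which differs from the six-column layout above. The names X, Y, dsX, dsYa, dsYb and dsTrain are illustrative, and the sketch has not been run through trainnet with the network or the "mse" loss from the question.

```matlab
% Illustrative data: rows are observations
X = [0.080157 0.38585  0.21999  0.61851;
     0.123456 0.654321 0.111111 0.222222];   % four predictors per row
Y = [0 0.010101;
     0 0.333333];                            % two targets per row

% One datastore per network input or output; each read returns one row in a cell
dsX  = arrayDatastore(X);
dsYa = arrayDatastore(Y(:, 1));
dsYb = arrayDatastore(Y(:, 2));

% Combine horizontally so each read returns {predictors, targetA, targetB}
dsTrain = combine(dsX, dsYa, dsYb);

sample = read(dsTrain);
disp(size(sample))     % dimensions of one read
disp(sample{1})        % predictors of the first observation
reset(dsTrain);        % rewind before handing the datastore to training
```

Whether three columns or six is the right layout depends on how many input and output layers the network actually has, so it is worth checking the result of read against the network before training.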
For further understanding, kindly refer to the following MathWorks documentation:
- Refer to the ‘Input Arguments’ section to understand how to correctly create different types of datastores: MathWorks Documentation.
I hope you find the above explanation and suggestions useful!