Multimodal deep learning: resizing a layer output

Hello everyone, here is my problem:
I am currently studying the combination of numeric features and images as inputs for a neural-network-based classifier. So far, I have based my code on the following example: https://nl.mathworks.com/help/deeplearning/ug/train-network-on-image-and-feature-data.html
My data is a mix of RGB images and radar data (position, speed & radar cross section).
The twist is that I wish to use the pretrained networks that MATLAB offers (GoogLeNet, AlexNet, ...) for the image-processing part. This gives me a network that looks like the following (here with the pretrained GoogLeNet):
As you can see in the picture, I have to concatenate along dimension 3, since the output of the layer "pool5-drop_7x7_s1" has size 1x1x1024.
When I then want to train the network (still following the MathWorks example), I get an error message saying that all data must have the same dimension labels, which can be SSCB or CB (see below).
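For reference, here is a minimal illustration of the mismatch (hypothetical values, not from my actual pipeline):
% The image branch carries spatial dimensions while the feature inputs do not,
% so the concatenation layer sees two incompatible formats.
imgAct  = dlarray(rand(1, 1, 1024, 8), 'SSCB'); % pool5-drop_7x7_s1 output for a batch of 8
featAct = dlarray(rand(1, 8), 'CB');            % one scalar radar feature per observation
% cat(3, imgAct, featAct) errors here, because the labels SSCB and CB differ.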
Because of that, I tried to concatenate along the first dimension instead, which means resizing the previous output to size 1024, as in the MathWorks example. To do so, I added a fully connected layer between the layer "pool5-drop_7x7_s1" and the concatenation layer. This worked, but when I trained the whole network, I ended up with NaN values in my loss array, making the training useless. I suspect this comes from the extra layer mentioned above.
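As a side note, here is a simple guard that can be dropped into the training loop shown below, to stop at the first bad iteration (a sketch only):
% Abort as soon as the loss degenerates so the offending mini-batch can be
% inspected; unnormalised features or a too-aggressive learn rate factor on
% the new layer are typical suspects.
if isnan(extractdata(loss))
    warning("Loss became NaN at iteration %d.", iteration);
    break
end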
Has anyone experienced this kind of problem and found a solution?
Please find my code below:
close all; clear; clc;
addpath("pretrained networks", "pretrained networks\networks");
%% Importing the data
path = 'M:\00_dataset\data5';
radarData = getRadarData(path);
radarData.Properties.VariableNames = ["ID", "x", "y", "z", "vx", "vy", "vz", "rcs"];
eoData = imageDatastore(path, 'IncludeSubfolders', true, 'LabelSource', 'foldernames');
nbrClasses = numel(categories(unique(eoData.Labels)));
%% Pretrained network importation
pretrainedNet = googlenet();
analyzeNetwork(pretrainedNet);
layer1 = pretrainedNet.Layers(141); % "pool5-drop_7x7_s1"
layer2 = pretrainedNet.Layers(142); % "loss3-classifier"
%% Preprocessing the data
% Shuffling the data
eoData = shuffle(eoData);
% Splitting the EO Data (train, test & validation sets)
[trainEOData, testEOData] = splitEachLabel(eoData, 0.85);
[trainEOData, valEOData] = splitEachLabel(trainEOData, 0.7/0.85);
yTrain = trainEOData.Labels; yTest = testEOData.Labels; yVal = valEOData.Labels;
classes = categories(yTrain);
% Preprocessing the radar data
[m1, ~] = size(radarData); % number of radar records
% Recover the class label by stripping the trailing 3-character index from each ID
labels = strings(m1, 1);
for k = 1:m1
    label = char(radarData.ID(k));
    labels(k) = string(label(1:end-3));
end
radarData.Label = labels;
radarData = convertvars(radarData, "Label", 'categorical'); % Making labels categorical
% Building the radar data sets (train, test & validation sets)
trainRadarData = []; testRadarData = []; valRadarData = [];
for idx = 1:length(eoData.Files)
    file = eoData.Files{idx};
    tmp = erase(file, path);
    tmp = split(tmp, "\");
    id = tmp{3, 1};
    l = length(char(findLabelInID(id)));
    id = id(1:l+3);
    lineIdx = find(strcmp(radarData.ID, id));
    if any(strcmp(trainEOData.Files, file))
        trainRadarData = [trainRadarData; radarData(lineIdx, 2:end-1)];
    elseif any(strcmp(testEOData.Files, file))
        testRadarData = [testRadarData; radarData(lineIdx, 2:end-1)];
    else
        valRadarData = [valRadarData; radarData(lineIdx, 2:end-1)];
    end
end
% Preprocessing the EO data
trainEOData = transform(trainEOData, @(x) imresize(x, pretrainedNet.Layers(1).InputSize(1:2)));
testEOData = transform(testEOData, @(x) imresize(x, pretrainedNet.Layers(1).InputSize(1:2)));
valEOData = transform(valEOData, @(x) imresize(x, pretrainedNet.Layers(1).InputSize(1:2)));
xTrain = readall(trainEOData); xTest = readall(testEOData); xVal = readall(valEOData);
% readall stacks the resized images vertically, so split the resulting
% (n*numImages)-by-n-by-p arrays back into single images and stack those
% along dimension 4 instead.
[m1, n, p] = size(xTrain); [m2, ~] = size(xTest); [m3, ~] = size(xVal);
splitsTrain = {n*ones(1, m1/n), n, p};
splitsTest = {n*ones(1, m2/n), n, p};
splitsVal = {n*ones(1, m3/n), n, p};
splitCellTrain = mat2cell(xTrain, splitsTrain{:});
splitCellTest = mat2cell(xTest, splitsTest{:});
splitCellVal = mat2cell(xVal, splitsVal{:});
xTrain = cat(4, splitCellTrain{:}); xTest = cat(4, splitCellTest{:});
xVal = cat(4, splitCellVal{:});
% Combining into one datastore
dsTrain = arrayDatastore(xTrain, 'IterationDimension', 4);
dsTest = arrayDatastore(xTest, 'IterationDimension', 4);
dsVal = arrayDatastore(xVal, 'IterationDimension', 4);
[~, nbrFeatures] = size(trainRadarData);
for k = 1:nbrFeatures
    trainFeat = table2array(trainRadarData(:, k)); valFeat = table2array(valRadarData(:, k));
    testFeat = table2array(testRadarData(:, k));
    dsTrain = combine(dsTrain, arrayDatastore(trainFeat));
    dsVal = combine(dsVal, arrayDatastore(valFeat));
    dsTest = combine(dsTest, arrayDatastore(testFeat));
end
dsTrain = combine(dsTrain, arrayDatastore(yTrain));
dsVal = combine(dsVal, arrayDatastore(yVal));
dsTest = combine(dsTest, arrayDatastore(yTest));
%% Network adaptation
layers = layerGraph(pretrainedNet);
layers = disconnectLayers(layers, layer1.Name, layer2.Name);
fullyConnectedLayerOld = layers.Layers(142);
outputClassifierOld = layers.Layers(144);
% Uncomment to add extra layer
% fc15 = fullyConnectedLayer(fullyConnectedLayerOld.InputSize, 'Name', 'fc1.5', ...
%     'WeightLearnRateFactor', 10, 'BiasLearnRateFactor', 10);
% layers = addLayers(layers, fc15);
% layers = connectLayers(layers, layer1.Name, fc15.Name);
dim = 3; % dim = 1 if extra layer is added
concatLayer = concatenationLayer(dim, nbrFeatures+1, 'Name', 'concat');
layers = addLayers(layers, concatLayer);
% layers = connectLayers(layers, fc15.Name, strcat(concatLayer.Name,"/in1")); % uncomment to add extra layer
layers = connectLayers(layers, layer1.Name, strcat(concatLayer.Name,"/in1")); % comment out if extra layer is added
for k = 1:nbrFeatures
    featInputLayer = featureInputLayer(1,'Name',strcat('featuresInput', int2str(k+1)));
    layers = addLayers(layers, featInputLayer);
    layers = connectLayers(layers, featInputLayer.Name, strcat(concatLayer.Name,"/in", int2str(k+1)));
end
layers = connectLayers(layers, concatLayer.Name, layer2.Name);
fullyConnectedLayerNew = fullyConnectedLayer(nbrClasses, ...
    'Name', 'Fully Connected New', ...
    'WeightLearnRateFactor', 10, ...
    'BiasLearnRateFactor', 10);
layers = replaceLayer(layers,fullyConnectedLayerOld.Name, fullyConnectedLayerNew);
layers = removeLayers(layers,outputClassifierOld.Name);
net = dlnetwork(layers);
analyzeNetwork(net);
%% Training options
numEpochs = 15;
miniBatchSize = 128;
learnRate = 1e-5;
decay = 0.01;
momentum = 0.9;
plots = "training-progress";
%% Model training
velocity = [];
mbq = minibatchqueue(dsTrain, ...
    'MiniBatchSize', miniBatchSize, ...
    'MiniBatchFcn', @preprocessMiniBatch, ...
    'MiniBatchFormat', {'SSCB','CB','CB', 'CB', 'CB', 'CB', 'CB', 'CB', ''});
if plots == "training-progress"
    figure
    lineLossTrain = animatedline('Color',[0.85 0.325 0.098]);
    ylim([0 inf])
    xlabel("Iteration")
    ylabel("Loss")
    grid on
end
iteration = 0;
start = tic;
% Loop over epochs.
for epoch = 1:numEpochs
    % Shuffle data.
    shuffle(mbq)
    % Loop over mini-batches.
    while hasdata(mbq)
        iteration = iteration + 1;
        % Read mini-batch of data.
        [dlX1, dlX2, dlX3, dlX4, dlX5, dlX6, dlX7, dlX8, dlY] = next(mbq);
        % Evaluate the model gradients, state, and loss using dlfeval and the
        % modelGradients function and update the network state.
        [gradients, state, loss] = ...
            dlfeval(@modelGradients, net, dlX1, dlX2, dlX3, dlX4, dlX5, dlX6, dlX7, dlX8, dlY);
        net.State = state;
        % Update the network parameters using the SGDM optimizer.
        [net, velocity] = sgdmupdate(net, gradients, velocity, learnRate, momentum);
        if plots == "training-progress"
            % Display the training progress.
            D = duration(0,0,toc(start),'Format','hh:mm:ss');
            % completionPercentage = round(iteration/numIterations*100,0);
            title("Epoch: " + epoch + ", Elapsed: " + string(D));
            % extractdata(loss) % debug output
            addpoints(lineLossTrain,iteration,double(gather(extractdata(loss))))
            % drawnow
        end
    end
end
%% Model testing
mbqTest = minibatchqueue(dsTest, ...
    'MiniBatchSize', miniBatchSize, ...
    'MiniBatchFcn', @preprocessMiniBatch, ...
    'MiniBatchFormat', ["SSCB", "CB", "CB", "CB", "CB", "CB", "CB", "CB", ""]);
[predictions, predCorr] = modelPredictions(net, mbqTest, categories(eoData.Labels));
accuracy = mean(predCorr);
idx = randperm(size(xTest,4), 9);
figure
for i = 1:9
    subplot(3,3,i)
    I = xTest(:,:,:,idx(i));
    imshow(I)
    label = string(predictions(idx(i)));
    title("Predicted Label: " + label)
end
%% Utilities
function data = getRadarData(address)
% Retrieves and vertically concatenates all radar data tables found under the given address.
data = [];
folder = dir(fullfile(address,'*'));
subfolders = setdiff({folder([folder.isdir]).name},{'.','..'}); % list of subfolders
for ii = 1:numel(subfolders)
    txtFiles = dir(fullfile(address,subfolders{ii},'*.txt')); % only the .txt radar files
    files = {txtFiles(~[txtFiles.isdir]).name}; % files in subfolder
    for jj = 1:numel(files)
        file = fullfile(address,subfolders{ii},files{jj});
        tbl = readtable(file, 'TextType', 'string');
        data = [data; tbl];
    end
end
end
function s2 = findLabelInID(s1)
% Returns the first known class label contained in the identifier s1
% (assumes every ID contains exactly one of the labels below).
labels = ["aircraftCarrier", "tanker", "fighter", "liner", "smallPlane", "tank", "car", "truck"];
idx = 1;
while ~contains(s1, labels(idx))
    idx = idx + 1;
end
s2 = labels(idx);
end
function [gradients,state,loss] = modelGradients(dlnet, dlX1, dlX2, dlX3, dlX4, dlX5, dlX6, dlX7, dlX8, Y)
[dlYPred,state] = forward(dlnet, dlX1, dlX2, dlX3, dlX4, dlX5, dlX6, dlX7, dlX8);
loss = crossentropy(dlYPred, Y);
gradients = dlgradient(loss, dlnet.Learnables);
end
function [classesPredictions,classCorr] = modelPredictions(dlnet, mbq, classes)
classesPredictions = [];
classCorr = [];
% Loop over mini-batches.
while hasdata(mbq)
    [dlX1, dlX2, dlX3, dlX4, dlX5, dlX6, dlX7, dlX8, dlY] = next(mbq);
    % Make prediction.
    dlYPred = predict(dlnet, dlX1, dlX2, dlX3, dlX4, dlX5, dlX6, dlX7, dlX8);
    % dlYPred(1:4) % debug output
    % Determine predicted classes.
    YPredBatch = onehotdecode(dlYPred, classes, 1);
    classesPredictions = [classesPredictions, YPredBatch];
    % Compare predicted and true classes.
    Y = onehotdecode(dlY, classes, 1);
    classCorr = [classCorr YPredBatch == Y];
end
end
function [X1, X2, X3, X4, X5, X6, X7, X8, T] = ...
    preprocessMiniBatch(dataX1, dataX2, dataX3, dataX4, dataX5, dataX6, dataX7, dataX8, dataT)
% Preprocess predictors: stack images along dimension 4 and the scalar
% radar features along dimension 2 (one column per observation).
X1 = cat(4, dataX1{:});
X2 = cat(2, dataX2{:});
X3 = cat(2, dataX3{:});
X4 = cat(2, dataX4{:});
X5 = cat(2, dataX5{:});
X6 = cat(2, dataX6{:});
X7 = cat(2, dataX7{:});
X8 = cat(2, dataX8{:});
% Extract label data from cell and concatenate.
T = cat(2,dataT{1:end});
% One-hot encode labels.
T = onehotencode(T,1);
% whos X1 X2 X3 X4 X5 X6 X7 X8 T
end
Thank you very much for your help!

Answers (1)

Milan Bansal on 5 Oct 2023
Hi Arthur CASSOU,
As per my understanding, you are facing an error while training a neural network that feeds multiple inputs into a pretrained model through a concatenation layer.
The output of the layer "pool5-drop_7x7_s1" of GoogLeNet has size 1x1x1024 with dimension labels "SSC", while the new feature input layers have size 1 with dimension label "C", so the two cannot be concatenated.
To resolve this issue, kindly add a "flatten" layer after the "pool5-drop_7x7_s1" layer. The output of the flatten layer will be a 1024-element vector with dimension label "C", the same as the new feature input layers, so the branches can then be concatenated along dimension 1.
Please refer to the following code to add the "flatten" layer and the "concatenationLayer" to the network.
net = googlenet();
layers = layerGraph(net); % disconnectLayers and addLayers expect a layerGraph, not a DAGNetwork
layer1 = layers.Layers(141); % "pool5-drop_7x7_s1"
layer2 = layers.Layers(142); % "loss3-classifier"
layers = disconnectLayers(layers, layer1.Name, layer2.Name);
% Connecting the flatten layer
flatLayer = flattenLayer('Name','flatten1');
layers = addLayers(layers, flatLayer);
layers = connectLayers(layers, layer1.Name, flatLayer.Name);
nbrFeatures = 7; % x, y, z, vx, vy, vz and rcs
concatLayer = concatenationLayer(1, nbrFeatures+1, 'Name', 'concat');
layers = addLayers(layers, concatLayer);
layers = connectLayers(layers, flatLayer.Name, strcat(concatLayer.Name,"/in1"));
for k = 1:nbrFeatures
    featInputLayer = featureInputLayer(1,'Name',strcat('featuresInput', int2str(k+1)));
    layers = addLayers(layers, featInputLayer);
    layers = connectLayers(layers, featInputLayer.Name, strcat(concatLayer.Name,"/in", int2str(k+1)));
end
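After rebuilding the graph, a quick sanity check (a sketch, reusing the variables above) is to reconnect the concatenation output and inspect the network:
% Reconnect the concatenated features to "loss3-classifier" and inspect the
% graph; analyzeNetwork should now report the flatten1 output as 1024 (C),
% matching the "C" format of the new feature inputs.
layers = connectLayers(layers, concatLayer.Name, layer2.Name);
analyzeNetwork(layers);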
Please refer to the documentation of "flattenLayer" to learn more about the "flatten" layer.
Hope it helps!
