Why is this autoencoder only predicting a single output regardless of input when using min-max scaling?

Question

Joseph Conroy on 11 Jul 2024

1
Link

Direct link to this question

https://www.mathworks.com/matlabcentral/answers/2136633-why-is-this-autoencoder-only-predicting-a-single-output-regardless-of-input-when-using-min-max-scali

Answered: Joseph Conroy on 16 Jul 2024

Accepted Answer: Joseph Conroy

Open in MATLAB Online

Key questions:

Why does a network predict a specific value for the output regardless of input as if the input data had no information relevant to prediction?
Why does replacing min-max scaling with standard scaling fix this, at least occassionally?

The problem background: I am trying to train a simple image autoencoder, but I keep getting networks that only output a single image regardless of the input. Taking the difference between each output image reveals they are all exactly the same. Googling this issue, I saw a stack overflow post that this often arises with improperly dimensioned loss functions. I also saw folks mentioning issues with using the sigmoid loss function for autoencoders, but the explanations as to why never surpass guesswork. I changed the scaling from min-max scaling to standard scaling and was able to obtain a network that breaks out of the single-prediction behavior, but without understanding why, I will have no recourse but trial-and-error if it breaks again.

Notes on dimensioning loss functions: When calculating the loss between a batch of images of shape [imgDim, imgDim, 1, batchSize] the mse loss function outputs a loss of dimension [1,1,1,batchSize], but this loss function has produced defective results under min-max scaling, such as the aforementioned degeneration to a single output, as well as an initial loss three orders of magnitude above the inputs and outputs scaled to the range [0,1]. To be clear, I don't mean the learning is unstable, I mean that the absolute values of the loss are absurd.

I tried to write my own loss function that reports a scalar value, but I encountered the same degeneration to a single prediction independent of input. I then wrote a version that reports an error tensor of the same shape as @mse, but this threw an error listed below, after the custom loss function in question.

% Version that reports a scalar
function meanAbsErr = myMae(prediction, target)
    meanAbsErr = mean(abs(flatten(prediction) - flatten(target)), 'all');
end
% Version that reports [1,1,1,batchSize]
function meanAbsErr = myMae(prediction, target)
    inDims = size(prediction);
    meanAbsErr = mean(abs(flatten(prediction) - flatten(target)), 1);
    outDims = ones(1,length(inDims)); outDims(end) = inDims(end);
    meanAbsErr = reshape(meanAbsErr, outDims);
end

Value to differentiate is non-scalar. It must be a traced real dlarray scalar.

Error in mathworksDebug>modelLoss (line 213)

[gradientsE,gradientsD] = dlgradient(loss,netE.Learnables,netD.Learnables);

Error in deep.internal.dlfeval (line 17)

[varargout{1:nargout}] = fun(x{:});

Error in deep.internal.dlfevalWithNestingCheck (line 19)

[varargout{1:nargout}] = deep.internal.dlfeval(fun,varargin{:});

Error in dlfeval (line 31)

[varargout{1:nargout}] = deep.internal.dlfevalWithNestingCheck(fun,varargin{:});

Error in mathworksDebug (line 134)

[loss,gradientsE,gradientsD] = dlfeval(@modelLoss,netE,netD,X,Ztarget);

Notes on scaling

I wrote a custom scaling function that executes the same behavior as rescale except that it reports the obtained extrema to use in scaling and de-scaling unseen data.

% Min-max scaling between [lb, ub]
function [scaled,smin,smax] = myRescale(varargin)
    datastruct = varargin{1}; lb = varargin{2}; ub = varargin{3};
    if length(varargin) <= 3
        smin = min(datastruct(:)); smax = max(datastruct(:));
    else
        smin = varargin{4}; smax = varargin{5};
    end
    scaled = (datastruct - smin) / (smax - smin) * (ub - lb) + lb;
end
% Invert scaling
function unscaled = myDescale(scaled, lb, ub, smin, smax)
    unscaled = (scaled + lb ) * (smax - smin) ./ (ub - lb) + smin;
end
% Converts the data to z-scores
function [standard, center, stddev] = myStandardize(varargin)
    datastruct = varargin{1};
    if length(varargin) == 1
        center = mean(datastruct(:)); stddev = std(datastruct(:));
    else
        center = varargin{2}; stddev = varargin{3};
    end
    standard = (datastruct - center) / stddev;
end
% Converts z-scores back to the data's scale
function destandard = myDestandardize(datastruct, center, stddev)
    destandard = datastruct * stddev + center;
end

In the following code, I have removed the validation set to reduce bloat.

% % I intend to regularize the latent space of this autoencoder to be a
% classifier once it can accomplish basic reconstruction. Made this note so
% it's clear what's going on with the custom losses and so forth.
training = digitTrain4DArrayData;
test = digitTest4DArrayData;
%% Scaling that does not work
% Min-max scaling
xlb = 0; xub=1;
[xTrain, xTrainMin, xTrainMax] = myRescale(training, xlb, xub);
xTest = myRescale(test, xTrainMin, xTrainMax);
%% Scaling that does work. Why?
% [xTrain, xTrainCenter, sTrainStd] = myStandardize(training);
% xTest = myStandardize(test, xTrainCenter, xTrainStd);
ntrain = size(xTrain,4); IMG_DIM = size(xTrain, 1);
N_CHANNELS=size(xTrain, 3);
numLatentChannels = 64;
imageSize = [28 28 1];
%% Layer definitions
% Encoder layer
layersE = [
    imageInputLayer(imageSize,Normalization="none")
    convolution2dLayer(3,32,Padding="same",Stride=2)
    reluLayer
    convolution2dLayer(3,64,Padding="same",Stride=2)
    reluLayer
    fullyConnectedLayer(numLatentChannels)
    tanhLayer(Name='latent')];
% Latent projection
projectionSize = [7 7 64]; enc_dim = projectionSize(1);
numInputChannels = imageSize(3);
% Decoder
layersD = [
    featureInputLayer(numLatentChannels)
    projectAndReshapeLayer(projectionSize)
    transposedConv2dLayer(3,64,Cropping="same",Stride=2)
    reluLayer
    transposedConv2dLayer(3,32,Cropping="same",Stride=2)
    reluLayer
    transposedConv2dLayer(3,numInputChannels,Cropping="same")
    sigmoidLayer(Name='Output')
    ];
netE = dlnetwork(layersE);
netD = dlnetwork(layersD);
%% Training Parameters
numEpochs = 150;
miniBatchSize = 20;
learnRate = 1e-3;
dsTrain = arrayDatastore(xTrain,IterationDimension=4);
numOutputs = 1;
mbq = minibatchqueue(dsTrain,numOutputs, ...
    MiniBatchSize = miniBatchSize, ...
    MiniBatchFormat="SSCB", ...
    MiniBatchFcn=@preprocessMiniBatch,...
    PartialMiniBatch="return");
%Initialize the parameters for the Adam solver.
trailingAvgE = [];
trailingAvgSqE = [];
trailingAvgD = [];
trailingAvgSqD = [];
%Calculate the total number of iterations for the training progress monitor
numIterationsPerEpoch = ceil(ntrain / miniBatchSize);
numIterations = numEpochs * numIterationsPerEpoch;
epoch = 0;
iteration = 0;
%Initialize the training progress monitor.
monitor = trainingProgressMonitor( ...
    Metrics="TrainingLoss", ...
    Info=["Epoch", "LearningRate"], ...
    XLabel="Iteration");
%% Training
while epoch < numEpochs && ~monitor.Stop
    epoch = epoch + 1;
    % Shuffle data.
    shuffle(mbq);
    % Loop over mini-batches.
    while hasdata(mbq) && ~monitor.Stop
        % Assess validation criterion
        iteration = iteration + 1;
    
        % Read mini-batch of data.
        X = next(mbq);
        % Evaluate loss and gradients.
        [loss,gradientsE,gradientsD] = dlfeval(@modelLoss,netE,netD,X);
        % Update learnable parameters.
        [netE,trailingAvgE,trailingAvgSqE] = adamupdate(netE, ...
            gradientsE,trailingAvgE,trailingAvgSqE,iteration,learnRate);
        [netD, trailingAvgD, trailingAvgSqD] = adamupdate(netD, ...
            gradientsD,trailingAvgD,trailingAvgSqD,iteration,learnRate);
        updateInfo(monitor, ...
            LearningRate=learnRate, ...
            Epoch=string(epoch) + " of " + string(numEpochs));
       recordMetrics(monitor,iteration, ...
            TrainingLoss=loss);
        monitor.Progress = 100*iteration/numIterations;
    end
end
%% Testing
dsTest = arrayDatastore(xTest,IterationDimension=4);
numOutputs = 1;
mbqTest = minibatchqueue(dsTest,numOutputs, ...
    MiniBatchSize = miniBatchSize, ...
    MiniBatchFcn=@preprocessMiniBatch, ...
    MiniBatchFormat="SSCB");
YTest = modelPredictions(netE,netD,mbqTest);
reconerr = mean(flatten(xTest-YTest),1);
figure
    histogram(reconerr)
    xlabel("Reconstruction Error")
    ylabel("Frequency")
    title("Test Data")
numImages = 64;
ndisplay = 10;
figure
    I = imtile(YTest(:,:,:,1:numImages));
    imshow(I)
    title("Reconstructed Images")
%% Functions
function [loss,gradientsE,gradientsD] = modelLoss(netE,netD,X)
% Forward through encoder.
Z = forward(netE,X);
% Forward through decoder.
Xrecon = forward(netD,Z);
% Calculate loss and gradients.
loss = regularizedLoss(Xrecon,X);
[gradientsE,gradientsD] = dlgradient(loss,netE.Learnables,netD.Learnables);
end
function loss = regularizedLoss(Xrecon,X)
% Image Reconstruction loss.
reconstructionLoss = mse(Xrecon, X);
% Combined loss.
loss = reconstructionLoss;
end
function Xrecon = modelPredictions(netE,netD,mbq)
Xrecon = [];
    % Loop over mini-batches.
    while hasdata(mbq)
        X = next(mbq);
    
        % Pass through encoder
        Z = predict(netE,X);
    
        % Pass through decoder to get reconstructed images
        XGenerated = predict(netD,Z);
    
        % Extract and concatenate predictions.
        Xrecon = cat(4,Xrecon,extractdata(XGenerated));
    end
end
function X = preprocessMiniBatch(Xcell)
% Concatenate.
X = cat(4,Xcell{:});
end
end

6 Comments
Show 4 older commentsHide 4 older comments

Matt J on 12 Jul 2024

Open in MATLAB Online

Your code does not run here in the online environment, see below.

clear, close all, clc
% % I intend to regularize the latent space of this autoencoder to be a
% classifier once it can accomplish basic reconstruction. Made this note so
% it's clear what's going on with the custom losses and so forth.
training = digitTrain4DArrayData;
test = digitTest4DArrayData;
%% Scaling that does not work
% Min-max scaling
xlb = 0; xub=1;
[xTrain, xTrainMin, xTrainMax] = myRescale(training, xlb, xub);
xTest = myRescale(test, xTrainMin, xTrainMax);
%% Scaling that does work. Why?
% [xTrain, xTrainCenter, sTrainStd] = myStandardize(training);
% xTest = myStandardize(test, xTrainCenter, xTrainStd);
ntrain = size(xTrain,4); IMG_DIM = size(xTrain, 1);
N_CHANNELS=size(xTrain, 3);
numLatentChannels = 64;
imageSize = [28 28 1];
%% Layer definitions
% Encoder layer
layersE = [
    imageInputLayer(imageSize,Normalization="none")
    convolution2dLayer(3,32,Padding="same",Stride=2)
    reluLayer
    convolution2dLayer(3,64,Padding="same",Stride=2)
    reluLayer
    fullyConnectedLayer(numLatentChannels)
    tanhLayer(Name='latent')];
% Latent projection
projectionSize = [7 7 64]; enc_dim = projectionSize(1);
numInputChannels = imageSize(3);
% Decoder
layersD = [
    featureInputLayer(numLatentChannels)
    projectAndReshapeLayer(projectionSize)
    transposedConv2dLayer(3,64,Cropping="same",Stride=2)
    reluLayer
    transposedConv2dLayer(3,32,Cropping="same",Stride=2)
    reluLayer
    transposedConv2dLayer(3,numInputChannels,Cropping="same")
    sigmoidLayer(Name='Output')
    ];
'projectAndReshapeLayer' is used in the following examples:
  Generate Synthetic Signals Using Conditional GAN
  Train Variational Autoencoder (VAE) to Generate Images
  Include Custom Layer in Network
  Train Generative Adversarial Network (GAN)
  Train Wasserstein GAN with Gradient Penalty (WGAN-GP)
  Generate Novel Radar Waveforms Using GAN 
netE = dlnetwork(layersE);
netD = dlnetwork(layersD);
%% Training Parameters
numEpochs = 150;
miniBatchSize = 20;
learnRate = 1e-3;
dsTrain = arrayDatastore(xTrain,IterationDimension=4);
numOutputs = 1;
mbq = minibatchqueue(dsTrain,numOutputs, ...
    MiniBatchSize = miniBatchSize, ...
    MiniBatchFormat="SSCB", ...
    MiniBatchFcn=@preprocessMiniBatch,...
    PartialMiniBatch="return");
%Initialize the parameters for the Adam solver.
trailingAvgE = [];
trailingAvgSqE = [];
trailingAvgD = [];
trailingAvgSqD = [];
%Calculate the total number of iterations for the training progress monitor
numIterationsPerEpoch = ceil(ntrain / miniBatchSize);
numIterations = numEpochs * numIterationsPerEpoch;
epoch = 0;
iteration = 0;
%Initialize the training progress monitor.
monitor = trainingProgressMonitor( ...
    Metrics="TrainingLoss", ...
    Info=["Epoch", "LearningRate"], ...
    XLabel="Iteration");
%% Training
while epoch < numEpochs && ~monitor.Stop
    epoch = epoch + 1;
    % Shuffle data.
    shuffle(mbq);
    % Loop over mini-batches.
    while hasdata(mbq) && ~monitor.Stop
        % Assess validation criterion
        iteration = iteration + 1;
        
        % Read mini-batch of data.
        X = next(mbq);
        % Evaluate loss and gradients.
        [loss,gradientsE,gradientsD] = dlfeval(@modelLoss,netE,netD,X);
        % Update learnable parameters.
        [netE,trailingAvgE,trailingAvgSqE] = adamupdate(netE, ...
            gradientsE,trailingAvgE,trailingAvgSqE,iteration,learnRate);
        [netD, trailingAvgD, trailingAvgSqD] = adamupdate(netD, ...
            gradientsD,trailingAvgD,trailingAvgSqD,iteration,learnRate);
        updateInfo(monitor, ...
            LearningRate=learnRate, ...
            Epoch=string(epoch) + " of " + string(numEpochs));
        recordMetrics(monitor,iteration, ...
            TrainingLoss=loss);
        monitor.Progress = 100*iteration/numIterations;
    end
end
%% Testing
dsTest = arrayDatastore(xTest,IterationDimension=4);
numOutputs = 1;
mbqTest = minibatchqueue(dsTest,numOutputs, ...
    MiniBatchSize = miniBatchSize, ...
    MiniBatchFcn=@preprocessMiniBatch, ...
    MiniBatchFormat="SSCB");
YTest = modelPredictions(netE,netD,mbqTest);
reconerr = mean(flatten(xTest-YTest),1);
figure
histogram(reconerr)
xlabel("Reconstruction Error")
ylabel("Frequency")
title("Test Data")
numImages = 64;
ndisplay = 10;
figure
I = imtile(YTest(:,:,:,1:numImages));
imshow(I)
title("Reconstructed Images")
%% Functions
function [loss,gradientsE,gradientsD] = modelLoss(netE,netD,X)
% Forward through encoder.
Z = forward(netE,X);
% Forward through decoder.
Xrecon = forward(netD,Z);
% Calculate loss and gradients.
loss = regularizedLoss(Xrecon,X);
[gradientsE,gradientsD] = dlgradient(loss,netE.Learnables,netD.Learnables);
end
% Min-max scaling between [lb, ub]
function [scaled,smin,smax] = myRescale(varargin)
datastruct = varargin{1}; lb = varargin{2}; ub = varargin{3};
if length(varargin) <= 3
    smin = min(datastruct(:)); smax = max(datastruct(:));
else
    smin = varargin{4}; smax = varargin{5};
end
scaled = (datastruct - smin) / (smax - smin) * (ub - lb) + lb;
end
% Invert scaling
function unscaled = myDescale(scaled, lb, ub, smin, smax)
unscaled = (scaled + lb ) * (smax - smin) ./ (ub - lb) + smin;
end
% Converts the data to z-scores
function [standard, center, stddev] = myStandardize(varargin)
datastruct = varargin{1};
if length(varargin) == 1
    center = mean(datastruct(:)); stddev = std(datastruct(:));
else
    center = varargin{2}; stddev = varargin{3};
end
standard = (datastruct - center) / stddev;
end
% Converts z-scores back to the data's scale
function destandard = myDestandardize(datastruct, center, stddev)
destandard = datastruct * stddev + center;
end
function loss = regularizedLoss(Xrecon,X)
% Image Reconstruction loss.
reconstructionLoss = mse(Xrecon, X);
% Combined loss.
loss = reconstructionLoss;
end
function Xrecon = modelPredictions(netE,netD,mbq)
Xrecon = [];
% Loop over mini-batches.
while hasdata(mbq)
    X = next(mbq);
    
    % Pass through encoder
    Z = predict(netE,X);
    
    % Pass through decoder to get reconstructed images
    XGenerated = predict(netD,Z);
    
    % Extract and concatenate predictions.
    Xrecon = cat(4,Xrecon,extractdata(XGenerated));
end
end
function X = preprocessMiniBatch(Xcell)
% Concatenate.
X = cat(4,Xcell{:});
end

Joseph Conroy on 12 Jul 2024

Ah nuts. I adapted the code from the Variational Autoencoder example. and forgot their supporting scripts. I attached the files as scripts. Again, I apologize.

Sign in to comment.

Sign in to answer this question.

Answer 1

Joseph Conroy on 16 Jul 2024

0
Link

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/2136633-why-is-this-autoencoder-only-predicting-a-single-output-regardless-of-input-when-using-min-max-scali#answer_1486337

Loss Scaling: The @mse function does not automatically normalize when handling multi-dimensional inputs and outputs. If you want the loss to be on a smaller scale regardless of the data's dimension, use l2loss(Y, targets, 'NormalizationFactor', 'all-elements').

Single Predictions: The custom autoencoder training loop I posted does not allow gradients to backpropagate from the decoder to the encoder. I am still unclear as to what the appropriate syntax is to accomplish this in a custom training loop. It is simple to do using the trainnet function, provided that you do not need to access the latent variables for any predictions or loss functions. If you are in the same situation as I am and would like to incorporate the latent space into the loss function, keep an eye on this updated question.

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Why is this autoencoder only predicting a single output regardless of input when using min-max scaling?

6 Comments
Show 4 older commentsHide 4 older comments

Accepted Answer

0 Comments
Show -2 older commentsHide -2 older comments

More Answers (0)

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

Why is this autoencoder only predicting a single output regardless of input when using min-max scaling?

6 Comments Show 4 older commentsHide 4 older comments

Accepted Answer

0 Comments Show -2 older commentsHide -2 older comments

More Answers (0)

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

6 Comments
Show 4 older commentsHide 4 older comments

0 Comments
Show -2 older commentsHide -2 older comments