Different classification results for varying MiniBatchSize?

Hello,
I am training an LSTM network and then using the classify function to predict classes. However, when I change the MiniBatchSize in the classify function, the output changes as well, which should not be the case as far as I understand. The documentation for the MiniBatchSize option of the classify function only states that predictions are computed faster with a larger MiniBatchSize. Is this a bug, or am I missing something?

Answers (1)

Viren Gupta on 28 Sep 2018
A larger mini-batch size speeds up prediction, but it can also change the results. An LSTM requires all sequences in a single mini-batch to have the same length, so shorter sequences are padded to match. The amount of padding a test sequence receives therefore depends on which other sequences share its mini-batch, and hence on the mini-batch size. Because the network also processes the padded time steps, different amounts of padding can produce different classification results.
The same happens at training time, so as general advice it is good to use the same mini-batch size for training and testing where possible. For more information on how padding works and how to minimize its effect, see: padding in lstm.
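The dependence of padding on mini-batch size described in this answer can be illustrated with a small calculation. This is only a sketch under stated assumptions: the sequence lengths are made up, the sequences are assumed to be grouped into mini-batches in the order given, and each mini-batch is assumed to be padded to the length of its longest sequence. It does not call classify or need a trained network.

```matlab
% Hypothetical lengths (in time steps) of six test sequences.
seqLengths = [5 12 7 30 9 11];

% Mini-batch sizes to compare.
batchSizes = [1 2 3 6];
totalPadding = zeros(size(batchSizes));

for k = 1:numel(batchSizes)
    mbs = batchSizes(k);
    padding = zeros(size(seqLengths));

    % Assumption: sequences are taken in order, and each mini-batch is
    % padded to the length of its longest sequence.
    for first = 1:mbs:numel(seqLengths)
        idx = first:min(first + mbs - 1, numel(seqLengths));
        padding(idx) = max(seqLengths(idx)) - seqLengths(idx);
    end

    totalPadding(k) = sum(padding);
    fprintf('MiniBatchSize = %d: padding per sequence = [%s], total = %d\n', ...
        mbs, num2str(padding), totalPadding(k));
end
```

Under these assumptions, a mini-batch size of 1 adds no padding at all, while larger mini-batches pad the shorter sequences by varying amounts. The same sequence can therefore be presented to the network with different inputs depending on the mini-batch size.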
  1 Comment
Abolfazl Nejatian on 10 Jun 2023
Dear Viren,
I hope this email finds you well. I am currently working on a complex neural network architecture that combines a hybrid GoogleNet with an LSTM layer. My goal is to train this model using a large dataset consisting of over 4 million images. During the training phase, I have found that utilizing a larger mini-batch size significantly improves the speed and coverage of the training process.
However, I have encountered an issue during the testing and real-time classification phase. In these stages, I aim to classify individual samples that represent the latest state of the FOREX markets. To achieve this, I need to classify each sample separately rather than using a mini-batch. Surprisingly, I have observed substantial differences in the classification results compared to the training phase.
Upon investigating this matter, I learned that the varying mini-batch size during prediction can lead to differences in classification outcomes. This can be attributed to the fact that LSTM requires uniform sequence lengths within a mini-batch, resulting in the application of padding to adjust the sequence sizes. Consequently, the amount of padding can differ depending on the mini-batch size, leading to discrepancies in the classification results.
While I understand that maintaining a consistent mini-batch size for both training and testing is generally recommended, my specific requirements necessitate the classification of individual samples in real-time. I would greatly appreciate your expert guidance on how to address this situation effectively, considering the unique characteristics of my network architecture and dataset.
Thank you for your time and support. I look forward to your valuable insights.


Release

R2018b
