Is there GPU support for the MATLAB GitHub BERT model?

MATLAB does not seem to natively support models like BERT, but there is a GitHub repository from which pre-trained BERT models can be loaded.
However, to me this seems like a workaround, completely side-stepping the standard architecture and workflow that the Deep Learning Toolbox brings to MATLAB. As painful as this is (for now I can live with it), my main problem is the following:
I was not able to figure out how to use that code -- for instance, with the pretrained BERT or FinBERT -- on my GPU (the GPU works, MATLAB finds it, etc.). Inference on a relatively small dataset takes ages (>25 minutes), compared to ~3 minutes on the GPU with a similar model and an identical dataset in TensorFlow.
Help would be much appreciated. Thanks.
  2 Comments
Walter Roberson on 21 Feb 2022
In FineTuneBERT.m, did you experiment with changing
mbqTrain = minibatchqueue(cdsTrain,2,...
"MiniBatchSize",miniBatchSize, ...
"MiniBatchFcn",@(X,Y) preprocessMiniBatch(X,Y,paddingValue,maxSequenceLength), ...
"PartialMiniBatch","discard");
to include 'OutputEnvironment',{'gpu','cpu'} ?
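That is, something along these lines (an untested sketch; everything except the added name-value pair is from the original script):
mbqTrain = minibatchqueue(cdsTrain,2,...
    "MiniBatchSize",miniBatchSize, ...
    "MiniBatchFcn",@(X,Y) preprocessMiniBatch(X,Y,paddingValue,maxSequenceLength), ...
    "PartialMiniBatch","discard", ...
    "OutputEnvironment",{'gpu','cpu'}); % first output (X) on the GPU, second (Y) on the CPU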
Bijan Sahamie on 25 Feb 2022
Thank you for your reply. The matter seems to be resolved from my perspective. What confused me is that when using the FinBERT script for sentiment analysis (SentimentAnalysisWithFinBERT.m), the inference call used there, namely
[sentiment,scores] = finbert.sentimentModel(X,mdl.Parameters)
runs on the CPU (for whatever reason). From that I assumed that there might be no GPU support (since with models of this size you would use a GPU by default unless you have a very good reason not to). I tried the fine-tuning scripts, and the function calls seem to be done differently there; they indeed utilize the GPU (checked with nvtop and htop to see where the load goes).
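For anyone hitting the same issue, one thing worth trying is moving the weights and the input to the GPU by hand before calling the model (an untested sketch; the exact field layout of mdl.Parameters is an assumption based on the transformer-models repository):
% Apply gpuArray to every weight array in the (nested) parameter struct.
mdl.Parameters.Weights = dlupdate(@gpuArray, mdl.Parameters.Weights);
% Move the encoded input tokens to the GPU as well.
X = gpuArray(X);
[sentiment,scores] = finbert.sentimentModel(X,mdl.Parameters);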


Accepted Answer

David Willingham on 24 Feb 2022
Hi Bijan,
Did Walter's comment help speed up your training? What version of MATLAB are you using?
On your comments regarding the implementation of Transformer models: we currently don't have the built-in layers to support transformers. However, the flexibility of the framework allows users to create their own model functions when built-in layers don't exist. For more information, see this page: Train Deep Learning in MATLAB. As you've pointed out, though, this implementation does require more work to achieve the same benefits as full layer support. For reference, we are actively looking at supporting more layers for Transformers in a future release.
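In that model-function style, a single training step looks roughly like this (a minimal sketch; model, parameters, and the loss choice are placeholders, not the BERT implementation):
function [loss,gradients] = modelLoss(parameters,X,T)
    % Forward pass through a user-defined model function (placeholder).
    Y = model(parameters,X);
    loss = crossentropy(Y,T);
    % Automatic differentiation with respect to the learnable parameters.
    gradients = dlgradient(loss,parameters);
end

% Inside the custom training loop:
[loss,gradients] = dlfeval(@modelLoss,parameters,X,T);
[parameters,trailingAvg,trailingAvgSq] = adamupdate(parameters,gradients, ...
    trailingAvg,trailingAvgSq,iteration,learnRate);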
  2 Comments
Bijan Sahamie on 25 Feb 2022
Hi David,
I just replied to Walter's comment. Thank you for the help. Since you bring up the point of supporting additional layers and future extensions, here are some thoughts:
  • I think it would be great if the set of natively supported, pre-trained models were extended to include more NLP models. So far, you seem to be focused mostly on computer vision. Although I appreciate the standard BERT, there are important and more recent variants that might be interesting to MATLAB users.
  • The importer functions for TensorFlow models seem to lack important functionality, at least for the NLP space. I was not able to import a pre-trained and fine-tuned BERT with importTensorFlowNetwork; I got an error message that subclassed models are not supported yet (see the sketch after this list).
  • I would highly appreciate it if you could support additional layers natively, especially transformer layers. These have been standard in the NLP space for a couple of years now.
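For context, the failing import attempt looked roughly like this (the SavedModel folder name is a placeholder):
% Import a fine-tuned BERT exported from TensorFlow as a SavedModel folder.
net = importTensorFlowNetwork("finetuned_bert_savedmodel", ...
    "OutputLayerType","classification");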
What I noticed is that training in MATLAB with the provided custom training loop is much slower than training the same pretrained model on the same dataset in TensorFlow (with identical settings for batch sizes, etc.). For the time difference: the same configuration and setup takes ~6-7 minutes in TensorFlow and ~45 minutes with the custom loop design you provided. Watching the GPU load during training, I noticed a sawtooth-shaped curve that struck me as a little odd (compared to TensorFlow). I haven't taken the time to compare this behaviour with native MATLAB constructions. But my suspicion is that with this custom loop the machine constantly switches between the kernel functions called for training (I guess they are implemented in C) and the MATLAB scripts. Could that be the cause?
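For anyone who wants to reproduce the measurement: since GPU calls are asynchronous, a fair per-step timing needs a synchronization before toc (a sketch; modelLoss stands in for whatever the script's loss function is):
gpu = gpuDevice;
tic
[loss,gradients] = dlfeval(@modelLoss,parameters,X,T);
wait(gpu)   % block until all queued GPU work has finished
stepTime = toc;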
Why am I writing all this: if my observation/suspicion is correct, then this would be a big motivation for native support in MATLAB. Training that is several times slower might make training in MATLAB infeasible, and if model transfers from other frameworks don't work either, then MATLAB as a platform for NLP applications would not be viable at all.
I really appreciate the work MathWorks does with MATLAB and the quality you provide. With recent additions like Experiment Manager, I really think you have a competitive edge for AI in applications.
David Willingham on 27 Feb 2022
Hi Bijan,
Thanks for taking the time to give such comprehensive feedback. While I can't offer an immediate solution to all of the areas you have listed, what I can state is that our development team is actively working on adding more layer support, improving training performance, and enabling our importers to import more networks.


More Answers (0)
