Train a deep learning model (ResNet-50 network) on a remote HPC cluster

I am trying to run code that uses a pre-trained ResNet-50 network on a remote HPC cluster by submitting batch GPU jobs. I get the following error at this line:
net = resnet50
Error using resnet50
resnet50 requires the Deep Learning Toolbox Model for ResNet-50 Network support
package for the pretrained weights. To install this support package, use the
Add-On Explorer. To obtain the untrained layers, use
resnet50('Weights','none'), which does not require the support package.
It seems the Deep Learning Toolbox Model for ResNet-50 Network add-on is not installed on the cluster. How can I install it there?
Thanks

 Accepted Answer

Just to confirm, you're sending batch jobs to an HPC cluster that has MATLAB Parallel Server installed?
If so, one option to try would be:
  1. Save resnet50 as a MAT file.
  2. Attach the MAT file when submitting the job.
  3. Have a command that loads the MAT file in the function you're submitting.
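
The three steps above could look roughly like the following. This is only a sketch under some assumptions: the default cluster profile points at the HPC cluster, the local machine has the ResNet-50 support package installed, and the names basenet.mat and runOnCluster are made up for illustration.

```matlab
% Run on the local machine, where the ResNet-50 support package is installed.
basenet = resnet50;                  % pretrained network loaded locally
save('basenet.mat', 'basenet');      % step 1: store the network in a MAT file

c = parcluster;                      % assumes the default profile is the HPC cluster
job = batch(c, @runOnCluster, 1, {}, ...
    'AttachedFiles', {'basenet.mat'});   % step 2: send the MAT file with the job
wait(job);
out = fetchOutputs(job);             % cell array holding the function's output
```

```matlab
% runOnCluster.m (hypothetical function submitted as the batch job)
function inputSize = runOnCluster()
    s = load('basenet.mat');         % step 3: load the attached MAT file on the worker
    net = s.basenet;                 % the network, without calling resnet50 on the cluster
    inputSize = net.Layers(1).InputSize;   % placeholder for the real training code
end
```

Attached files are copied to the workers and added to their path, so the load call should not need a full path. The cluster still needs Deep Learning Toolbox itself to work with the loaded network.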

1 Comment

Brilliant! Thank you for your answer. It solved my problem.
Yes, the HPC cluster has MATLAB Parallel Server installed.
In point 1, you said "save resnet50 as a MAT file". I was not sure what you meant by "save resnet50". What I did was simply call it in MATLAB on my local machine:
basenet = resnet50;
then saved it as
save('basenet.mat','basenet');
and then transferred this MAT file to the remote cluster and loaded it there.
Thanks

Release

R2022a

Asked:

on 14 Oct 2022

Commented:

on 14 Oct 2022
