Parallel calculations for Deep Learning Toolbox
I have a problem with parallel training of a YOLO detector based on a ResNet-50 network.
For training, I use a virtual machine with 32 cores and no GPU attached. In the Parallel preferences, I set the number of workers to 8.
After running the code, I get the following error:
Error using nnet.internal.cnn.DistributedDispatcher (line 79)
'nnet.internal.cnn.GeneralDatastoreDispatcher' does not support order-preserving distribution.
Error in nnet.internal.cnn.DataDispatcherFactory>iCreateDistributedDispatcherIfRequired (line 204)
dispatcher = nnet.internal.cnn.DistributedDispatcher( dispatcher, executionSettings.workerLoad, retainDataOrder );
Error in nnet.internal.cnn.DataDispatcherFactory.createDataDispatcherMIMO (line 176)
dispatcher = iCreateDistributedDispatcherIfRequired(...
Error in vision.internal.cnn.trainNetwork>iCreateTrainingDataDispatcher (line 180)
dispatcher = nnet.internal.cnn.DataDispatcherFactory.createDataDispatcherMIMO( ...
Error in vision.internal.cnn.trainNetwork (line 34)
trainingDispatcher = iCreateTrainingDataDispatcher(ds, mapping, trainedNet,...
Error in trainYOLOv2ObjectDetector>iTrainYOLOv2 (line 391)
[yolov2Net, info] = vision.internal.cnn.trainNetwork(...
Error in trainYOLOv2ObjectDetector (line 187)
[net, info] = iTrainYOLOv2(ds, lgraph, params, mapping, options, checkpointSaver);
Error in YOLO_Multi_ver (line 83)
[detector,info] = trainYOLOv2ObjectDetector(preprocessedTrainingData,lgraph,options);
My training options are:
options = trainingOptions('sgdm','MiniBatchSize', 16, 'InitialLearnRate',1e-3, 'MaxEpochs',20, 'CheckpointPath', tempdir, 'Shuffle','never','ExecutionEnvironment', 'parallel');
Joss Knight on 19 Apr 2020
Sorry about the unhelpful error message, which should be fixed in the current release. It means that 'Shuffle', 'never' is not supported for your input data when training in parallel: when the data is distributed across your workers, there is no way to guarantee that it is divided so that exactly the same sequence of observations is read. To fix it, change to 'Shuffle', 'once'.
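As a minimal sketch of that change, here are the options from the question with only the 'Shuffle' value altered. The variable name options is taken from the call on line 83 of the question's script; everything else is copied from the question unchanged, and this has not been run against the original data.

```matlab
% Options from the question, with 'Shuffle' changed from 'never' to 'once'
% so the data can be distributed across the parallel workers.
options = trainingOptions('sgdm', ...
    'MiniBatchSize', 16, ...
    'InitialLearnRate', 1e-3, ...
    'MaxEpochs', 20, ...
    'CheckpointPath', tempdir, ...
    'Shuffle', 'once', ...
    'ExecutionEnvironment', 'parallel');

% The training call itself stays as in the question, for example:
% [detector, info] = trainYOLOv2ObjectDetector(preprocessedTrainingData, lgraph, options);
```

With 'once', the data is shuffled a single time before training starts, so the order of observations will differ from the order in the datastore.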