It’s the performance and use of the resources installed on the Computer (Amazon Cloud EC2 instance in our case).
I am using a p3.8xlarge instance in EC2 on awamzon web server – basically it means I am using 4 GPUs V100,
I am training a neural network.
mdl(i).Net = trainNetwork(trainData(:, :, :, 1: itStep: end), trainLabels(1: itStep: end, :), layers, options);
in options I define 'multi-gpu'
I also defined 'parallel' and tried to play with number of workers but all I see is just more processes waiting in the GPU queue on nvidia-smi.
For some reason I see that all GPU are working (see GPU.png) but for limited amount of time (very high usage for 3 seconds and then drops to 0% for 10 seconds at least.
I looked at the htop information(htop.jpg), I see that not all threads of the CPU are in use so that is the bottleneck (I think?)
I have a xeon processor on this instance with 32 cores (16 physical, 32 logical)
When I try to utilize all threads through local profile (profile local pool.png) it seems like it still doesn’t respond .
I get more workers because of it (CPU ?), but the GPUs still doesn't seem to improve
Tried to increase batch size - but at some point the GPU is out of memory, so that's not the problem.
How do i utilize all cores of the CPU to transfer data to the GPUs?
I read somewhere that you can also load the data to the pool itself? will that help?
I scanned these already:
Would appreciate your help.