MATLAB Answers


Parallel Computing in Neural Networks is not using all the workers in 2018b?

Asked by Eric Klinefelter on 8 Jan 2019
Latest activity Commented on by Walter Roberson
on 13 Jan 2019
There was a similar question here, but I'm unable to get the parallel pool to use my CPU cores when using a GPU. My command is:
my_net = train(my_net,Xs,Ts,Xi,Ai,'useParallel','yes','useGPU','yes','showResources','yes');
Yet when starting the pool the response is:
NOTICE: Jacobian training not supported on GPU. Training function set to TRAINSCG.
Computing Resources:
Parallel Workers:
Worker 1 on w541, GPU device #1, Quadro K1100M
Worker 2 on w541, Unused
Worker 3 on w541, Unused
Worker 4 on w541, Unused
Worker 5 on w541, Unused


Release

R2018b

1 Answer

Answer by Joss Knight
on 9 Jan 2019

I believe this is the designed behaviour. If multiple workers were to share the same GPU, you would get a performance reduction, not an improvement.
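Given that behaviour, the practical choice is between driving the single GPU alone or spreading the work across CPU workers without the GPU. A minimal sketch, reusing the questioner's variable names (my_net, Xs, Ts, Xi, Ai) and the documented 'useGPU'/'useParallel' options of train:

```matlab
% Option A: GPU only -- one worker drives the GPU; no idle CPU workers
% are allocated alongside it.
my_net = train(my_net, Xs, Ts, Xi, Ai, ...
    'useGPU', 'only', 'showResources', 'yes');

% Option B: CPU workers only -- the data is split across the parallel
% pool, with no GPU involved.
my_net = train(my_net, Xs, Ts, Xi, Ai, ...
    'useParallel', 'yes', 'showResources', 'yes');
```

Benchmarking both on your own data is the only reliable way to see which is faster for a given network size.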

  4 Comments

No, the algorithms do not treat the GPU as just an additional core that happens to run quickly. They try to submit as much of the work as can be done effectively to the GPU instead of the CPU.
The first message is telling you that you configured (or defaulted) net.trainFcn to a Jacobian-based training function, which is not compatible with 'useGPU', so it switched you to trainscg instead.
I am not familiar with the implementation for shallow networks, but for deep learning, even if you filled the GPU memory and gave each CPU the minimum amount of work, the GPU would end up waiting for the CPUs to finish to synchronize each iteration, so the CPUs would just slow things down.
I notice that no second GPU is being allocated, which leads me to suspect the Quadro K1100M is the only GPU in the system. I wonder if it is driving a display? If so, it would be in WDDM mode and subject to short kernel timeouts, forcing it to synchronize with the CPUs often relative to the likely total training time. If it is not driving a display and is in TCC mode, that factor is reduced... but of course any time it spends dedicated to work from one CPU is time it is not processing work from a different CPU.
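You can check both points from MATLAB itself. A short sketch using the Parallel Computing Toolbox's gpuDevice and gpuDeviceCount; KernelExecutionTimeout reports true when the driver enforces a run-time limit on kernels, which is typical of WDDM-mode display GPUs on Windows:

```matlab
% How many GPUs MATLAB can see (1 here would match the output above).
fprintf('GPUs visible to MATLAB: %d\n', gpuDeviceCount);

% Inspect the selected device; a true KernelExecutionTimeout suggests
% the GPU is driving a display (WDDM mode) rather than running in TCC mode.
d = gpuDevice;
fprintf('GPU: %s, kernel timeout enabled: %d\n', ...
    d.Name, d.KernelExecutionTimeout);
```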
