CUDA_ERROR_LAUNCH_FAILED when training large networks
I have been training networks with trainNetwork() on my GPU in MATLAB R2018b for over a year without any issues.
Since I upgraded to MATLAB R2020b, I have only been able to train small networks. The same script that runs flawlessly in R2018b with an arbitrarily large number of units (e.g., n = 2000) works in R2020b up to n = 50, but crashes for n > 100.
The reported error is typically:
Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_LAUNCH_FAILED
Error using trainNetwork (line 183)
Unexpected error calling cuDNN: CUDNN_STATUS_EXECUTION_FAILED.
Error in RNNprediction (line 170)
net = trainNetwork({traind.x}, {traind.y}, layers, options);
The crash happens between the 2nd and 5th training iterations. After that, I have to restart MATLAB before I can train anything at all, because reset(gpuDevice) also fails with:
Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_LAUNCH_FAILED
Error using parallel.gpu.CUDADevice/reset
An unexpected error occurred during CUDA execution. The CUDA error was:
all CUDA-capable devices are busy or unavailable
Training of the same network runs smoothly on CPU (although very slowly).
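The threshold behaviour described above can be narrowed down with a small, self-contained script. The sketch below is hypothetical and is not the original RNNprediction code: it assumes a sequence-to-sequence LSTM regression network, random training data, and placeholder sizes and training options. It trains with an increasing number of hidden units on the chosen execution environment and stops at the first failure, since after a CUDA launch failure further GPU calls are likely to fail until MATLAB is restarted.

```matlab
% Hypothetical reproduction sketch (NOT the original RNNprediction script).
% Assumptions: LSTM sequence-to-sequence regression, random data,
% placeholder sizes and options. Requires Deep Learning Toolbox;
% GPU training also requires Parallel Computing Toolbox.

executionEnv = 'gpu';          % set to 'cpu' to compare
unitCounts   = [50 100 200 500 2000];

numFeatures  = 10;             % placeholder input dimension
numResponses = 1;              % placeholder output dimension
numObs       = 64;             % placeholder number of sequences
seqLength    = 200;            % placeholder time steps per sequence

% Random training data: N-by-1 cell arrays of features-by-time matrices
XTrain = cell(numObs, 1);
YTrain = cell(numObs, 1);
for i = 1:numObs
    XTrain{i} = randn(numFeatures, seqLength);
    YTrain{i} = randn(numResponses, seqLength);
end

% 64 sequences / 16 per mini-batch = 4 iterations per epoch, 8 in total
options = trainingOptions('adam', ...
    'ExecutionEnvironment', executionEnv, ...
    'MaxEpochs', 2, ...
    'MiniBatchSize', 16, ...
    'Verbose', false);

for n = unitCounts
    layers = [
        sequenceInputLayer(numFeatures)
        lstmLayer(n, 'OutputMode', 'sequence')
        fullyConnectedLayer(numResponses)
        regressionLayer];
    try
        trainNetwork(XTrain, YTrain, layers, options);
        fprintf('n = %d: training finished\n', n);
    catch ME
        fprintf('n = %d: training failed: %s\n', n, ME.message);
        break   % later GPU calls are likely to fail after a launch failure
    end
end
```

Running it once with executionEnv = 'gpu' and once with 'cpu' separates a GPU/driver problem from a problem in the network definition itself.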
NOTE: I have already increased the WDDM TDR Delay to 60 seconds, but nothing changed. I have also tried disabling TDR altogether, without success.
Here are some CUDA properties:
>> gpuDevice
ans =
CUDADevice with properties:
Name: 'GeForce RTX 2070'
Index: 1
ComputeCapability: '7.5'
SupportsDouble: 1
DriverVersion: 10.2000
ToolkitVersion: 10.2000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 8.5899e+09
MultiprocessorCount: 36
ClockRateKHz: 1620000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
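The KernelExecutionTimeout: 1 entry above means the operating system can time out and abort long-running kernels on this GPU, so it may be worth re-checking the device after the TDR registry change has taken effect (such changes generally require a reboot). A minimal diagnostic sketch, assuming Parallel Computing Toolbox and that the GPU of interest is the currently selected device; it only reads properties that already appear in the listing above:

```matlab
% Minimal diagnostic sketch (assumes Parallel Computing Toolbox and that
% the GPU of interest is the currently selected device).
g = gpuDevice;

fprintf('Device:                 %s\n', g.Name);
fprintf('Driver / toolkit:       %.1f / %.1f\n', g.DriverVersion, g.ToolkitVersion);
% 1 (true) means the OS can time out and abort long-running kernels
fprintf('KernelExecutionTimeout: %d\n', g.KernelExecutionTimeout);
% TotalMemory is reported in bytes
fprintf('Total memory:           %.2f GB\n', g.TotalMemory / 1e9);
```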