Help with train on GPU with large data sets

Question

Harley Edwards on 19 Aug 2018

https://www.mathworks.com/matlabcentral/answers/415423-help-with-train-on-gpu-with-large-data-sets

Commented: Joss Knight on 20 Aug 2018

I'm trying to train a function-fitting neural network for regression. My inputs are 200x844000 and my targets are 6x844000. The full data set trains on the CPU, but not on the GPU. I have a GTX 1080 with the following properties:

                      Name: 'GeForce GTX 1080'
                     Index: 1
         ComputeCapability: '6.1'
            SupportsDouble: 1
             DriverVersion: 9.2000
            ToolkitVersion: 9
        MaxThreadsPerBlock: 1024
          MaxShmemPerBlock: 49152
        MaxThreadBlockSize: [1024 1024 64]
               MaxGridSize: [2.1475e+09 65535 65535]
                 SIMDWidth: 32
               TotalMemory: 8.5899e+09
           AvailableMemory: 7.0088e+09
       MultiprocessorCount: 20
              ClockRateKHz: 1835000
               ComputeMode: 'Default'
      GPUOverlapsTransfers: 1
    KernelExecutionTimeout: 0
          CanMapHostMemory: 1
           DeviceSupported: 1
            DeviceSelected: 1
The CPU training code that works is:
load('GenTog_Inputs_Zeros_Filter.mat', 'Inputs')
load('GenTog_Targets_Zeros_Filter.mat', 'Targets')
% Create a Fitting Network
hiddenLayerSize = 10;
net = fitnet(hiddenLayerSize);
% Set Training Function
net.trainFcn = 'trainscg' ;
% Train the Network
net = configure(net, Inputs, Targets);
[net,tr] = train(net,Inputs, Targets);

Works. Easy. Now the GPU code.

clear all
clc
load('GenTog_Inputs_Zeros_Filter.mat', 'Inputs');
load('GenTog_Targets_Zeros_Filter.mat', 'Targets');
Inputs = single(Inputs);
Targets = single(Targets);
%% Make variably sized training data
variableSize = 325000;
TestInputs1 = Inputs(1:200, 1:variableSize);
TestTargets1 = Targets(1:6, 1:variableSize);

%% Make data GPU compatible
NewInputs1 = removeconstantrows(TestInputs1);
GPUTargets1 = nndata2gpu(TestTargets1);
GPUInputs1 = nndata2gpu(NewInputs1);
%% Train
neurons = 10;
net = fitnet(neurons);
net2 = configure(net, NewInputs1, TestTargets1);
[net2, tr] = train(net2, GPUInputs1, GPUTargets1, 'UseGPU', 'yes')

The GPU code above works for a variableSize of 349504 and below, over any range of my data. Since I'm well below the maximum memory for this card, and converting the data from double to single didn't allow any more samples to be trained on, I don't think it is a memory problem. The kernel execution timeout is also turned off, so that's not the problem either.
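
As a rough sanity check on the memory claim, the raw data size at the largest working sample count can be estimated directly. This is only a sketch: it counts the input and target matrices in single precision and ignores any padding that nndata2gpu applies as well as all intermediate buffers that train allocates, so treat the result as a lower bound. The AvailableMemory value is copied from the device listing above.

```matlab
% Lower-bound estimate of the raw training data size in single precision.
% Ignores nndata2gpu padding and any buffers allocated during training.
numInputs   = 200;
numTargets  = 6;
numSamples  = 349504;          % largest sample count reported to work
bytesPerVal = 4;               % single precision

inputBytes  = numInputs  * numSamples * bytesPerVal;
targetBytes = numTargets * numSamples * bytesPerVal;
availableBytes = 7.0088e9;     % AvailableMemory from the device listing

fprintf('Inputs:  %.1f MB\n', inputBytes  / 1e6);
fprintf('Targets: %.1f MB\n', targetBytes / 1e6);
fprintf('Share of available GPU memory: %.1f%%\n', ...
    100 * (inputBytes + targetBytes) / availableBytes);
```

The raw data comes to about 288 MB, roughly 4% of the available memory. That is consistent with this not being a simple out-of-memory condition, although it does not rule out larger temporary allocations during training.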

The exact error is
Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS 
> In network/train (line 204)
  In GPUTest (line 20) 
Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS 
> In network/train (line 204)
  In GPUTest (line 20) 
Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS 
> In network/train (line 204)
  In GPUTest (line 20) 
Error using gpuArray/gather
An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS
Error in nnGPU.perfsGrad (line 23)
Perfs_and_N = gather(hints.Perfs_and_N);
Error in nnCalcLib/perfsGrad (line 294)
                lib.calcMode.perfsGrad(calcNet,lib.calcData,lib.calcHints);
Error in trainscg>initializeTraining (line 147)
[worker.perf,worker.vperf,worker.tperf,worker.gWB,worker.gradient] = calcLib.perfsGrad(calcNet);
Error in nnet.train.trainNetwork>trainNetworkInMainThread (line 28)
worker = localFcns.initializeTraining(archNet,calcLib,calcNet,tr);
Error in nnet.train.trainNetwork (line 16)
    [archNet,tr] = trainNetworkInMainThread(archNet,rawData,calcLib,calcNet,tr,feedback,localFcns);
Error in trainscg>train_network (line 141)
[archNet,tr] = nnet.train.trainNetwork(archNet,rawData,calcLib,calcNet,tr,localfunctions);
Error in trainscg (line 51)
            [out1,out2] = train_network(varargin{2:end});
Error in network/train (line 369)
    [net,tr] = feval(trainFcn,'apply',net,data,calcLib,calcNet,tr);
Error in GPUTest (line 20)
[net2, tr] = train(net2, GPUInputs1, GPUTargets1, 'UseGPU', 'yes')

The GPU hangs indefinitely after that, and trying to do anything else or to close the program results in:

Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS

To the best of my knowledge, this looks similar to the bug reported at the following link: https://www.mathworks.com/matlabcentral/answers/398236-cuda-unexpected-error-for-nndata2gpu

I'm posting in the hope of helping to debug this issue, and in the hope that someone will point out a mistake of my own. I would really love to use this card to its maximum. I know some PCA would help reduce the input size, but the code above can be modified to try 100x349505 and that doesn't help either. It is definitely about sample size.

Thanks for any and all help in this matter

1 Comment

Joss Knight on 20 Aug 2018

If this is the same as that other issue, then rest assured that the bug has been fixed and the fix will appear in a future version. Until then, the only workaround is to use less GPU memory, by reducing the batch size, model size, or other such parameters.

If you can rework your problem using the newer Deep Learning tools you are less likely to run into these problems:

https://uk.mathworks.com/solutions/deep-learning.html
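
For illustration, a reworked version using trainNetwork might look like the sketch below. It is not the fitnet model from the question: the hidden activation (reluLayer), the solver ('sgdm'), and the MiniBatchSize and MaxEpochs values are assumptions chosen for the example, no input normalization is applied, and the random Inputs and Targets are stand-ins for the real data. The relevant part is MiniBatchSize, which limits how many samples are processed per iteration.

```matlab
% Sketch only: small regression network trained in mini-batches.
% Stand-in data; replace with the load calls from the question.
numSamples = 5000;
Inputs  = rand(200, numSamples, 'single');   % 200-by-N predictors
Targets = rand(6,   numSamples, 'single');   % 6-by-N responses

numFeatures  = size(Inputs, 1);
numResponses = size(Targets, 1);

% trainNetwork takes image-style predictors as H-by-W-by-C-by-N and
% regression responses as N-by-R.
X = reshape(Inputs, [numFeatures 1 1 numSamples]);
Y = Targets';

layers = [
    imageInputLayer([numFeatures 1 1], 'Normalization', 'none')
    fullyConnectedLayer(10)            % 10 hidden units, as in fitnet(10)
    reluLayer                          % assumption: not fitnet's default
    fullyConnectedLayer(numResponses)
    regressionLayer];

options = trainingOptions('sgdm', ...
    'MiniBatchSize', 1024, ...         % hypothetical value; tune as needed
    'MaxEpochs', 5, ...                % hypothetical value
    'Shuffle', 'every-epoch', ...
    'ExecutionEnvironment', 'auto');   % uses a supported GPU if available

net = trainNetwork(X, Y, layers, options);
```

Setting 'ExecutionEnvironment' to 'gpu' instead of 'auto' makes training fail with an error rather than fall back to the CPU when no supported GPU is found.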
