Help with training on GPU with large data sets

I'm trying to train a function-fitting neural network for regression. My inputs are 200x844000 and my targets are 6x844000. The full data set trains on the CPU, but not on the GPU. I have a GTX 1080 with the following properties:
Name: 'GeForce GTX 1080'
Index: 1
ComputeCapability: '6.1'
SupportsDouble: 1
DriverVersion: 9.2000
ToolkitVersion: 9
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 8.5899e+09
AvailableMemory: 7.0088e+09
MultiprocessorCount: 20
ClockRateKHz: 1835000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 0
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
The CPU training code that works is:
load('GenTog_Inputs_Zeros_Filter.mat', 'Inputs')
load('GenTog_Targets_Zeros_Filter.mat', 'Targets')
% Create a Fitting Network
hiddenLayerSize = 10;
net = fitnet(hiddenLayerSize);
% Set Training Function
net.trainFcn = 'trainscg' ;
% Train the Network
net = configure(net, Inputs, Targets);
[net,tr] = train(net,Inputs, Targets);
Works. Easy. Now the GPU code.
clear all
clc
load('GenTog_Inputs_Zeros_Filter.mat', 'Inputs');
load('GenTog_Targets_Zeros_Filter.mat', 'Targets');
Inputs = single(Inputs);
Targets = single(Targets);
%%Make variably sized training data
variableSize = 325000
TestInputs1 = Inputs(1:200, 1:variableSize);
TestTargets1 = Targets(1:6, 1:variableSize);
%%Make Data GPU Compatible
[NewInputs1] = removeconstantrows(TestInputs1);
GPUTargets1 = nndata2gpu(TestTargets1);
GPUInputs1 = nndata2gpu(NewInputs1);
%%Train
neurons = 10
net = fitnet(neurons)
net2 = configure(net, NewInputs1, TestTargets1);
[net2, tr] = train(net2, GPUInputs1, GPUTargets1, 'UseGPU', 'yes')
The above GPU code works for a variableSize of 349504 and below, and for any range of my data. Since I'm well below the maximum memory available on this card, and changing the data from double to single didn't allow any more samples to be trained on, I don't think it is a memory problem. I also have the kernel execution timeout turned off, so that's not the problem either.
The exact error is:
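As a rough check of the memory argument above, the raw footprint of the data arrays can be estimated from the sizes quoted in this post. This is only a sketch: it assumes all 200 input rows survive removeconstantrows, ignores any padding added by nndata2gpu, and does not count the workspace that training itself allocates on the GPU.

```matlab
% Raw single-precision footprint of the largest subset that trained.
% Counts only the input and target arrays, not the training workspace.
bytesPerSingle = 4;
nSamples = 349504;                  % largest working variableSize from the post
inputBytes  = 200 * nSamples * bytesPerSingle;
targetBytes = 6 * nSamples * bytesPerSingle;
totalGB = (inputBytes + targetBytes) / 1e9;
availableGB = 7.0088e+09 / 1e9;     % AvailableMemory reported for the card
fprintf('Data: %.3f GB of %.3f GB available\n', totalGB, availableGB);
```

Under those assumptions the data arrays account for roughly 0.29 GB against about 7 GB of available memory, which is consistent with the failure not being a simple out-of-memory condition on the raw data.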
Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS
> In network/train (line 204)
In GPUTest (line 20)
Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS
> In network/train (line 204)
In GPUTest (line 20)
Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS
> In network/train (line 204)
In GPUTest (line 20)
Error using gpuArray/gather
An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS
Error in nnGPU.perfsGrad (line 23)
Perfs_and_N = gather(hints.Perfs_and_N);
Error in nnCalcLib/perfsGrad (line 294)
lib.calcMode.perfsGrad(calcNet,lib.calcData,lib.calcHints);
Error in trainscg>initializeTraining (line 147)
[worker.perf,worker.vperf,worker.tperf,worker.gWB,worker.gradient] = calcLib.perfsGrad(calcNet);
Error in nnet.train.trainNetwork>trainNetworkInMainThread (line 28)
worker = localFcns.initializeTraining(archNet,calcLib,calcNet,tr);
Error in nnet.train.trainNetwork (line 16)
[archNet,tr] = trainNetworkInMainThread(archNet,rawData,calcLib,calcNet,tr,feedback,localFcns);
Error in trainscg>train_network (line 141)
[archNet,tr] = nnet.train.trainNetwork(archNet,rawData,calcLib,calcNet,tr,localfunctions);
Error in trainscg (line 51)
[out1,out2] = train_network(varargin{2:end});
Error in network/train (line 369)
[net,tr] = feval(trainFcn,'apply',net,data,calcLib,calcNet,tr);
Error in GPUTest (line 20)
[net2, tr] = train(net2, GPUInputs1, GPUTargets1, 'UseGPU', 'yes')
The GPU hangs indefinitely after that, and trying to do anything else or to close the program results in:
Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS
To the best of my knowledge, this seems similar to the bug reported at the following link: https://www.mathworks.com/matlabcentral/answers/398236-cuda-unexpected-error-for-nndata2gpu I'm posting in the hope of helping to debug this issue, and in the hope that someone will point out a mistake of my own. I would really love to use this card to its maximum. I know some PCA would help reduce the input size, but the above code can be modified to try 100x349505 and that doesn't help either. It is definitely about sample size.
Thanks for any and all help in this matter.
  1 Comment
Joss Knight on 20 Aug 2018
If this is the same as that other issue, then rest assured that bug has been fixed and the fix will appear in a future version. The only work-around is to use less GPU memory, by reducing the batch size, model size, or other such parameters.
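One way to act on the "use less GPU memory" suggestion is to cap the number of samples passed to train at the largest size the original poster reported as working (349504). The sketch below is an assumption-laden illustration, not a confirmed fix: capSamples is a hypothetical helper, not a toolbox function, and it picks a random subset of columns rather than the first ones.

```matlab
% Hypothetical helper (not part of any toolbox): keep at most maxSamples
% randomly chosen columns of the inputs X and the matching targets T.
function [Xs, Ts] = capSamples(X, T, maxSamples)
    n = size(X, 2);                          % samples are stored column-wise
    idx = randperm(n, min(maxSamples, n));   % random subset without repeats
    Xs = single(X(:, idx));                  % single precision, as in the post
    Ts = single(T(:, idx));                  % same columns, so pairs stay aligned
end
```

With Inputs and Targets loaded as in the question, usage would look like [X, T] = capSamples(Inputs, Targets, 349504); followed by net = fitnet(10, 'trainscg'); and [net, tr] = train(net, X, T, 'useGPU', 'yes');. Training on a subset discards data, so the result should be validated against the held-out samples.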
If you can rework your problem using the newer Deep Learning tools you are less likely to run into these problems:


Answers (0)
