GPU matrix computations: gpuDevice is slow, Matlab halts. Linux Ubuntu 16.04, Matlab 2016b

I am running MATLAB R2016b on Ubuntu 16.04. Executing gpuDevice takes more than 200 s. After that, I compute eig of matrices of roughly 2400x2400 in a loop, and after some time the computation halts. This can happen after 10 iterations or after about 100. Any help gratefully received.
>> tic, gpuDevice, toc
ans =
CUDADevice with properties:
Name: 'GeForce GTX 1080'
Index: 1
ComputeCapability: '6.1'
SupportsDouble: 1
DriverVersion: 8
ToolkitVersion: 7.5000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 8.5049e+09
AvailableMemory: 7.7663e+09
MultiprocessorCount: 20
ClockRateKHz: 1733500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
Elapsed time is 203.871746 seconds.
====================================
In the code below, N_2 = 800.
-----------
Q_ = gpuArray.linspace(1.0,1.8, 1601);
Omega_cell=cell(size(Q_));
N_3 = N_2-1;
I=1:N_3;
J=2:N_2;
M = zeros(3*N_3);
MA = zeros(3*N_3);
M(I,I) = m*(OM_(J,J));
M(I,I+N_3) = 2i*(OM_(J,J));
M(I+N_3,I) = -1i*(KP2OM_(J,J));
M(I+N_3,I+N_3) = m*(OM_(J,J));
M(I+2*N_3,I) = -1i*(RSR_(J,J) + S0s_(J,J));
M(I+2*N_3,I+N_3) = m*(S0R_(J,J));
M(I+2*N_3,I+2*N_3) = m*(OM_(J,J));
MA(I,I)=A_(J,J);
MA(I+N_3,I+N_3)=A_(J,J);
MA(I+2*N_3,I+2*N_3)=A_(J,J);
MA = gpuArray(MA);
V2Ss_ = gpuArray(V2Ss_(J,J));
V2SR_ = gpuArray(V2SR_(J,J));
B_ = gpuArray(B_(J,J));
Bs_ = gpuArray(Bs_(J,J));
M = gpuArray(M);
for jm = 1:numel(Q_)
    Q = Q_(jm);
    disp(num2str([Q], 'Q = %.4f'));
    % Matrix XX
    M(I,I+2*N_3) = -1i*(Bs_ + V2Ss_*Q^2);
    M(I+N_3,I+2*N_3) = m*(B_ + V2SR_*Q^2);
    [FF,W] = eig(MA\M);
    FF = gather(FF);
    W = gather(W);
    % [FF, W] = eig(M, MA);
    omega = diag(W);
    % Plot
    plot(real(omega)/m, imag(omega), mkr)
    title(num2str([N Q_(jm)], 'N=%d, Q=%.3f'));
    axis([.1 0.3 0 .01]); grid on; hold on;
    drawnow;
    %
    Imax = find(imag(omega)>-1e-4 & real(omega)/m>-1 & real(omega)/m<2);
    DD = [real(omega(Imax)) imag(omega(Imax))];
    % FFs = FF(:,Imax);
    Omega_cell{jm} = DD;
    %
    if floor(Q*200)==Q*200
        save('Om_cell_m2_800_R20_g8_beta10_gpu', 'Omega_cell');
    end
end

3 Comments

I tried without success:
export CUDA_CACHE_MAXSIZE=2147483647
export CUDA_CACHE_DISABLE=0
Can I just check - you ran this command, and then ran MATLAB from the same terminal? Can you double-check by showing us the results of getenv CUDA_CACHE_MAXSIZE when run in MATLAB?
Also, make sure you do this at least twice, once to actually cache the code, and then again to make sure the cache is being used. It's only the second time that you will see the improvement in library load time.
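The check described above could be sketched like this, run from inside MATLAB. It is a minimal sketch, not a confirmed fix: it assumes the variables were exported in the shell that launched MATLAB, and the guard around the timing is an addition so the script also runs on a machine without a usable GPU. Run it in two separate fresh MATLAB sessions and compare the elapsed times; within one session a second gpuDevice call is typically quick regardless of the cache, because the libraries are already loaded.

```matlab
% Show the CUDA JIT cache settings as seen by the MATLAB process itself.
names  = {'CUDA_CACHE_MAXSIZE', 'CUDA_CACHE_DISABLE'};
values = cell(size(names));
for k = 1:numel(names)
    values{k} = getenv(names{k});   % getenv returns '' when the variable is unset
    if isempty(values{k})
        fprintf('%s is not set in this session\n', names{k});
    else
        fprintf('%s = %s\n', names{k}, values{k});
    end
end

% Time the first gpuDevice call of the session (needs Parallel Computing Toolbox).
if exist('gpuDeviceCount', 'file') && gpuDeviceCount > 0
    tic;
    d = gpuDevice;
    fprintf('gpuDevice took %.1f s (%s)\n', toc, d.Name);
end
```

If the first block reports that the variables are not set, MATLAB was probably not started from the terminal where the export commands were run.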
Can you repost your example code in a runnable form? At the moment it cannot be run because some variables are undefined.
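One possible shape for a runnable version, as a minimal sketch: it replaces the undefined matrices (OM_, A_, B_ and so on) with random stand-ins of the same size, so it exercises only the eig loop on the GPU and does not reproduce the original problem's values. The names n, nIter, A and B are hypothetical and do not appear in the original code.

```matlab
% Stand-in for the original loop: random matrices of size 3*(N_2-1).
N_2   = 800;
n     = 3*(N_2 - 1);                  % 2397, the size of M and MA in the question
nIter = 20;                           % hypothetical iteration count

A = gpuArray(rand(n) + 1i*rand(n));   % stand-in for M (complex, non-symmetric)
B = gpuArray(eye(n) + 0.01*rand(n));  % stand-in for MA (assumed invertible)

for k = 1:nIter
    tic;
    [FF, W] = eig(B\A);               % same call pattern as in the question
    omega = gather(diag(W));          % copy the eigenvalues back to host memory
    fprintf('iteration %d: %.2f s\n', k, toc);
end
```

If this loop also stalls after some iterations, the problem is likely independent of the specific matrices; if it does not, the original data or the plotting and saving steps become the more likely suspects.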

Answers (0)

Asked: 20 Apr 2017

Commented: 26 Apr 2017
