Accuracy problems using mldivide on GPU
Show older comments
Hello all,
I want to solve a system of complex linear equations A*x=B on the GPU. However, when I compare the solution with the results of the same computation on the CPU, there seems to be a relatively large error.
Here is a minimal working example:
rng('default');
%% CPU
A = randn(60,2,60001) + 1i*randn(60,2,60001);
B = randn(60,2,60001) + 1i*randn(60,2,60001);
for k = length(A):-1:1
x_CPU(:,:,k) = A(:,:,k) \ B(:,:,k);
end
%% GPU
g_A = gpuArray(A);
g_B = gpuArray(B);
g_x = pagefun(@mldivide, g_A, g_B);
x_GPU = gather(g_x);
%% comparison
max_error = max(abs((x_GPU - x_CPU) ./ x_CPU), [], 'all'); % relative error
disp(max_error);
On my machine, I get
max_error = 1.0197e+03
which means that the results are unusable.
In a last-ditch attempt I tried using complex() both in gpuArray() and in pagefun(), as described here, but it changes nothing.
I am running Matlab R2021a with the latest updates. GPU driver is also the latest version (527.27).
ans =
CUDADevice with properties:
Name: 'Quadro P2200'
Index: 1
ComputeCapability: '6.1'
SupportsDouble: 1
DriverVersion: 11.6000
ToolkitVersion: 11
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 5.3685e+09
AvailableMemory: 4.3222e+09
MultiprocessorCount: 10
ClockRateKHz: 1493000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
I would be very grateful for ideas and suggestions on how to solve the problem. Thanks in advance!
1 Comment
Bruno Luong
on 9 Jan 2023
Again the issue only occurs with A complex and not square.
Accepted Answer
More Answers (2)
Joss Knight
on 7 Jan 2023
Edited: Joss Knight
on 7 Jan 2023
The answer is not less accurate, it is just as inaccurate. What you have are 60001 massively overdetermined systems generated by completely random data. The least squares solution to this is going to be all over the place, in other words, there will be no good answer. If you look at the relative residuals you'll find there's little difference in the (lack of) success of either CPU or GPU.
%%
rng('default');
%% CPU
A = randn(60,2,60001) + 1i*randn(60,2,60001);
B = randn(60,2,60001) + 1i*randn(60,2,60001);
x_CPU = pagemldivide(A,B);
%% GPU
g_A = gpuArray(A);
g_B = gpuArray(B);
g_x = pagemldivide(g_A, g_B);
x_GPU = gather(g_x);
%% comparison
max_error = max(abs((x_GPU - x_CPU) ./ x_CPU), [], 'all'); % relative error
disp(max_error);
res = abs(sqrt(sum((pagemtimes(A, x_CPU)-B).^2,1)));
g_res = abs(sqrt(sum((pagemtimes(g_A, g_x)-g_B).^2,1)));
rhsNorm = abs(sqrt(sum(B.^2,1)));
relres = reshape(res./rhsNorm, 2, []);
g_relres = reshape(g_res./rhsNorm, 2, []);
average_cpu_relative_residual = mean(relres, 'all')
average_gpu_relative_residual = mean(g_relres, 'all')
std_cpu = std(relres(:))
std_gpu = std(g_relres(:))
To which I got the answers
1.0197e+03
average_cpu_relative_residual =
0.9979
average_gpu_relative_residual =
1.0543
std_cpu =
0.1935
std_gpu =
0.2664
It seems the GPU solution is marginally worse, but given that the input was nonsense anyway I wouldn't read too much into it.
It's possible that you intended for the system matrix A to be square and only B to be 60x2 (i.e. 2 right-hand-sides), in which case with random data you'd get something much better behaved. I got:
1.3056e-11
average_cpu_relative_residual =
1.8911e-14
average_gpu_relative_residual =
1.6029e-14
std_cpu =
3.6059e-14
std_gpu =
3.1088e-14
3 Comments
Damian
on 9 Jan 2023
Joss Knight
on 9 Jan 2023
Well, I'll admit that numerical analysis is not my speciality (at doctorate level, anyway) but I'm fairly sure that 17% residual indicates that the data is still very poorly conditioned. Since it's 2-D data why not plot some of the data along with the best fit lines to get an idea of how poorly the GPU is doing. It may be that you can fix the problem with some basic outlier rejection, which would be sensible anyway if you have some bad outliers.
cublas's batch QR routines may have some issues, of course, with correct handling of poorly conditioned complex systems, so we'll take a look at that. In the meantime you could try using a pseudo-inverse as Matt J shows.
Damian
on 9 Jan 2023
Walter Roberson
on 9 Jan 2023
Edited: Matt J
on 9 Jan 2023
1 vote
It is a known bug that should be fixed soon that affects page left and right divisions for complex numbers
2 Comments
Bruno Luong
on 9 Jan 2023
I agree with Walter it looks pretty much similar cause (wrong detection of matrix type and decomposition) for GPU and CPU.
Categories
Find more on Mathematics in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!