Inquiry Regarding Minor Variations in MATLAB GPU Computation

I am running an algorithm in MATLAB utilizing my system's GPU. For the same input, the results are generally identical. However, in some cases, I notice minor variations in the decimal values of the output. Can anyone please help me understand why this is happening?

Answers (1)

Mike Croucher
Mike Croucher on 29 Jan 2025
It is difficult to comment without seeing the code but the most general thing I can think of saying goes as follows:
  • A calculation running on a GPU is usually a parallel calculation. If it isn't, don't use a GPU!
  • In any parallel calculation you cannot usually guarantee the order in which the calculations happen.
  • Thanks to how floating-point arithmetic works, the order of calculations matters to the final result, even in cases where you might not imagine it should.
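The order-sensitivity in that final bullet is easy to demonstrate without a GPU at all. Here is a minimal sketch (my own illustrative code, not from the original question) that sums the same set of numbers in two different orders:
rng(0);                    % fix the seed so the vector is reproducible
v = rand(1, 1e6);          % a million random numbers
s1 = sum(v);               % sum in one order
s2 = sum(fliplr(v));       % sum the same numbers in reverse order
s1 - s2                    % usually a tiny, non-zero difference
Neither answer is 'wrong'; they are just two equally valid roundings of the same mathematical sum.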
As an example to illustrate the final point, imagine you have to add up a lot of numbers. To show how quickly 'interesting' things can happen, let's consider adding up just three numbers: 0.1, 0.2 and 0.3.
We have two possible ways of proceeding:
x = 0.1 + (0.2 + 0.3); % Do 0.2 + 0.3 first
y = (0.1 + 0.2) + 0.3; % Do 0.1 + 0.2 first
% are they equal?
x==y
ans = logical
0
They are not equal and yet any mathematician will tell you that the order should not matter. What's going on?
The issue is that all floating-point numbers are represented in binary, and 0.1 cannot be represented exactly in binary. You end up with small round-off errors that accumulate. Here is another example of the same effect: the 64-bit binary value that's closest to 0.1 is actually a little above 0.1, so repeatedly adding it drifts away from the 'true' answer.
% You might expect this to be zero
res = 1 - (0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1) % That's 10 0.1's added up
res = 1.1102e-16
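You can inspect the stored value directly by asking MATLAB to print more digits than it shows by default (this is just standard fprintf, nothing GPU-specific):
fprintf('%.20f\n', 0.1)   % prints more digits; the stored value is slightly above 0.1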
To be clear, this is not a MATLAB thing; it is a floating-point-arithmetic thing.
So, even adding up three numbers is sensitive to the order you do it in. In your GPU algorithm you are probably doing millions or even billions of computations in parallel. The order of operations changes from one run to another and so you'll get tiny differences in the output.
Sometimes, the differences will not be tiny!
When round-off errors 'blow up'
In pathological cases these tiny differences can 'blow up'. As a trivial example, suppose you do a complex computation and (ill-advisedly) base the next step of that calculation on whether or not the answer falls at or below a hard threshold of 0.6:
function out = mikeIsCrazy(input)
    if input <= 0.6
        fprintf("Missile has launched\n")
        out = 1000000;
    else
        fprintf("No threat detected\n")
        out = 0;
    end
end
We'll use our addition of three numbers as a proxy for the complex calculation.
mikeIsCrazy(0.1 + (0.2 + 0.3))
Missile has launched
ans = 1000000
mikeIsCrazy((0.1 + 0.2) + 0.3)
No threat detected
ans = 0
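One common defence against this kind of brittleness is to compare against a tolerance rather than a hard threshold. A minimal sketch (the tolerance value here is purely illustrative; choose one appropriate to your algorithm):
x = 0.1 + (0.2 + 0.3);
y = (0.1 + 0.2) + 0.3;
tol = 1e-12;              % illustrative tolerance, not a universal choice
abs(x - y) < tol          % logical 1: the two results agree to within tol
MATLAB's eps function, which gives the spacing between adjacent floating-point numbers, can help when reasoning about a sensible tolerance.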
Fun fact: logic like this was the basis for my first-ever scientific computing troubleshooting session, back when I started working in academic computing support.
  4 Comments
Mike Croucher
Mike Croucher on 31 Jan 2025
What we are talking about here is a feature of floating point arithmetic, not hardware. So, yes.
How large are the differences? Can you post the code?
Doli Hazarika
Doli Hazarika on 3 Feb 2025
I have identified the error. It is inherent due to certain parameters in the algorithm and not related to the GPU or CPU.
Thank you for your time and help.
