huge differences in single vs double precision math

Question

Jonathan on 7 Aug 2014


Direct link to this question

https://www.mathworks.com/matlabcentral/answers/146603-huge-differences-in-single-vs-double-precision-math

Answered: John D'Errico on 7 Aug 2014

I am calculating a sum of squares in 32-bit FP precision (for comparison with a GPU algorithm, which isn't relevant here).

Here is the code:

Y=single((0:499).^2);
sum(Y)
ans =
   41541684
sum(double(Y))
ans = 
   41541750

The single-precision result differs from the (correct) double result by 66! The largest value, 499^2 = 249001, is nowhere near any FP limits.

This is R2013a on OS X 10.9.


Answer 1

John D'Errico on 7 Aug 2014


Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/146603-huge-differences-in-single-vs-double-precision-math#answer_147687

What you are missing is that single precision stores a 23-bit mantissa, which gives 24 bits of precision once the implicit leading bit is counted. Of the 32 bits stored in a single, one is the sign bit and 8 hold the exponent in biased form. So you cannot count on storing an INTEGER larger than 2^24 = 16777216 in a single without error: above that value, only some integers are exactly representable.
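The limit can be checked directly at the command line. This is a minimal sketch, not part of the original answer; the variable names are illustrative, and it assumes a release that provides flintmax (R2013a or later).

```matlab
% Probe the integer limit of single precision.
a = single(2^24);          % 16777216
b = single(2^24 + 1);      % 16777217 is not representable; it rounds to a neighbor
same = (a == b);           % true: both are stored as 16777216
gap  = eps(a);             % 2: spacing between adjacent singles at 2^24
fmax = flintmax('single'); % 16777216
fprintf('same = %d, gap = %g, flintmax = %d\n', same, gap, fmax);
```

Beyond this point consecutive integers can no longer be told apart, and the spacing doubles with every further power of two.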

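The sum you formed (41541750) is larger than that limit, so you should expect an error. Every individual square is stored exactly, but once the running total passes 2^24 each partial sum has to be rounded to a nearby representable single, and those rounding errors accumulate.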
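One way to see this, and to sidestep it, is to keep the data in single but accumulate in double. The sketch below is an illustration rather than part of the original answer. It uses the 'double' output-type option of sum and compares against the closed form n(n+1)(2n+1)/6; the variable names are made up for the example. The value of s_single can depend on the release and on the summation order, so no specific number is claimed for it here.

```matlab
Y = single((0:499).^2);        % each element is below 2^24, so Y is exact
s_single = sum(Y);             % accumulated in single; partial sums are rounded
s_double = sum(Y, 'double');   % same data, accumulated in double
s_exact  = 499*500*999/6;      % closed form n(n+1)(2n+1)/6 with n = 499
gap = eps(single(s_exact));    % 4: spacing between adjacent singles near the sum
fprintf('single: %d  double: %d  exact: %d  gap: %g\n', ...
        s_single, s_double, s_exact, gap);
```

Since 41541750 is not a multiple of 4, even the exact answer cannot be stored in a single, so a single-precision result can never match the double one here.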