possible Bug? or checking if OverFlow happened?

2 views (last 30 days)
I have a matrix of size (6472908 x 67) all single values. Different columns have different max/min (there are different variables.
So I calculate mean of each column using
avgData=mean(Data);
I am expecting the first value in avgData to be the mean of the first column. However, when I issue
avgData(1) - mean(Data(:,1))
ans =
-100.9785
as you can see the output is not zero. So what is changed? The same thing happened if I convert everything to double.
If I do this for sum() the difference is even more. So, I was wondering if overflow is happening and how should I check if overflow has happened? lastwarn() returns nothing.
I am afraid the X is about 800MB and can't upload it here.

Accepted Answer

John D'Errico
John D'Errico on 31 Mar 2015
No. It is not a bug, but an artifact of operations that may be done in a different order due to the BLAS, or whatever scheme is used internally. NEVER assume that two distinct operations will do a given computation in the same order. It might conceivably reflect an issue of whether a double precision accumulator might be employed for vector input to sum (again, a choice probably made in the BLAS), but not for array input.
If the min and max values vary by such a large amount, that difference is trivial, essentially down in the least significant bits of the result, especially when you are summing millions of such elements.
You have not yet said what the total mean was either, so we cannot know how significant is the difference.
As for this being an overflow, that is not at all reasonable to assume. The numbers you have described are simply not large enough to cause overflow, AND if they did overflow, overflows in floating point result in inf, NOT a loss of precision.
realmax('single')
ans =
3.4028e+38
realmax('single')*2
ans =
Inf
The problem here is clearly an issue of bits lost at the low end, due to variation in the sequence of adds in these numbers.
You can test that claim by computing the mean in different sequences of your vector. For example, try this test several times:
mean(data(randperm(6472908),1))
then look at the differences.
As well, compare those differences to the size of the actual mean. How does that difference compare to eps for that same number?
  2 Comments
per isakson
per isakson on 31 Mar 2015
Edited: per isakson on 31 Mar 2015
R2013b,64bit,Win7
With 'single' I get identical mean values; i.e. order doesn't affect the value
>> R = randn([6472908,67],'single') + 12054;
>> format hex
>> mean(R(randperm(6472908),1))
ans = 463f2105
>> mean(R(randperm(6472908),1))
ans = 463f2105
>> mean(R(randperm(6472908),1))
ans = 463f2105
>> mean(R(randperm(6472908),1))
ans = 463f2105
and with 'double' there is a "rounding error" in the two last hex-positions, i.e. order does affect the value
>> R = randn([6472908,67],'double') + 12054;
>> mean(R(randperm(6472908),1))
ans = 40c78afff205ede5
>> mean(R(randperm(6472908),1))
ans = 40c78afff205edd3
>> mean(R(randperm(6472908),1))
ans = 40c78afff205eda7
Mohammad Abouali
Mohammad Abouali on 1 Apr 2015
Edited: Mohammad Abouali on 1 Apr 2015
regarding the over flow you are right. It is not that.
I did give the average/mean of the first column it is 12054.53 or 2155.51 using the two different method in single precision and it is something else (but close to these two) when using double precision.
I tried to generate the same thing using randomly generated numbers, but couldn't reproduce it.
What is for sure is that mean(Data) and mean(Data(:,1)) don't follow the same path or order in summing and division and it is causing too much difference in this case.
The funny thing is that if I do
avgData=mean(Data(:,1:10));
avgData(1)-mean(Data(:,1));
this time I get 0 as the difference.
Well, after thinking more about it, I kinda think if there was any bug in the mean() function they would have figure it out by now. If it was some new package or function, well then may be, but come on?! mean() function?! I don't know what I was thinking :D. So whatever the reason, it could't be a bug. This idea of the order make perfect sense and thanks for showing it too.

Sign in to comment.

More Answers (2)

James Tursa
James Tursa on 31 Mar 2015
Edited: James Tursa on 31 Mar 2015
How large is the average value? Is 100 close to eps of this average value? E.g., is the average value near 1e17? If so, this could just be rounding differences in the methods used. I.e., maybe the problem is split up differently for multi-threading if there are several columns of a matrix involved vs just one column.
"...The same thing happened if I convert everything to double..."
You get the exact same result? Or you get a similar result (i.e., something not "close" to zero.). It would not surprise me that in the background MATLAB uses double accumulators even if the input is single, which might explain the same result in single vs double.
EDIT:
Probably the better comparison would be eps of the sum, not eps of the average.
EDIT:
You might also consider using this FEX contribution from Jan Simon:
  1 Comment
Mohammad Abouali
Mohammad Abouali on 31 Mar 2015
Edited: Mohammad Abouali on 31 Mar 2015
The average value is 12054.53 (or 2155.51), using single precision for both case.
Actually now that double checking the same process again in Double precision I am getting almost the same results (well the difference is -8.0345e-09, which is still not explainable), so definitely that big difference is due to single precision VS double precision.
Although the difference for double precision is not much, though the question remains: if mean(X) and mean(X(:,1)) perform the same procedure for getting the average of first column, (regardless of the precision) they should end up with the same results. mean(X(:,1))-mean(X(:,1)) is returning 0 for both case (double or single), which obviously it should (otherwise something was really wrong).
So, now that we get different results for mean(X) and mean(X(:,1)) for the first column (and remember both are done either as single or double), definitely there are some other thing happening here and they don't perform the same procedure for the first column. So, I still think this is a bug.

Sign in to comment.


Image Analyst
Image Analyst on 31 Mar 2015
Yes, this is a known issue. We've converted arrays from double to single to save on memory yet when calling mean(), the means may not be correct. Basically it's ignoring the later elements as you add them because the sum is so huge and a tiny value added onto a gigantic value basically does not get added because it's so small. Basically underflow . We contacted the Mathworks to ask them. The Mathworks knows about this and does not consider it a bug , but just a normal precision issue comparable to this issue.
  3 Comments
Roger Stafford
Roger Stafford on 1 Apr 2015
"I expect the same results" <-- This is an assumption you should not make. Strictly speaking, even the associative and distributive laws of arithmetic are violated when computation is subject to round-off errors. When a series of numbers is added, the results can depend on how they are grouped together in the addition process. I like to give the following as an example which will usually give differing results even on a decimal calculator:
3/14+(3/14+15/14)
(3/14+3/14)+15/14
In computing mean(Data(:,1)) one cannot assume that the algorithm used is identical to that used for the first column of mean(Data). It all depends on just how the programmers who did the coding decided to handle the two different situations. Perhaps it depends on what they ate for breakfast? The basic assumption is that if different results agree within round-off error with perfect results, then either is acceptable.
Mohammad Abouali
Mohammad Abouali on 1 Apr 2015
You are right. The only thing that could change the roundoff error (here) is if the order of adding the numbers is changed. I was thinking that it would stay the same, but obviously in this two cases it is not.
My impression was that in both case it is summing 3/14+(3/14+15/14); so the order is not changed for the first column between the two commands, i.e. sum(X) and sum(X(:,1)). In that case the results should have been the same. So, now that it is not the same, the most possible explanation so far is exactly this order thing. So one does 3/14+(3/14+15/14) and another does (3/14+3/14)+15/14.

Sign in to comment.

Categories

Find more on Creating and Concatenating Matrices in Help Center and File Exchange

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!