
Thread Subject:
Very weird resolution issue. Bug ????? Seriously !!

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Sung Soo Kim

Date: 3 Mar, 2009 18:50:19

Message: 1 of 50

% Run the following code. It must produce '1' as the answer.
% In R14SP3 it works fine. (It spits out '1')
% In R2008b, it doesn't work. (It spits out '0')
% Do you know why? I'm testing one of my libraries, and I found a lot of errors
% that I didn't get before. I struggled to isolate the error, and
% it boils down to the following problem.

% I'm using a school-licensed version of MATLAB R2007b.
% I also tested in R2008b on someone else's computer, and this bug is still there.

% Am I the only person who gets this nonsensical result?

%%
x = [
    1 1 1 1 1 0 0
    0 0 0 0 0 1 1
    0 0 0 0 0 0 0
    0 0 0 0 0 0 0
    0 0 0 0 0 0 0
    1 0 0 0 0 1 0
    0 1 0 0 0 0 1 ];

mx = mean(x);

a = x - repmat(mx,size(x,1),1);
%a = x; % If you use this instead, it works fine!!!


b=a'*a; % very simple matrix multiplication.

c=a(:,1)'*a(:,1); % super simple vector multiplication

isequal(c, b(1,1)) % This must be '1' because c should be equal to b(1,1)
                          % no matter what matrix 'a' is !!!! But it depends on 'a' !!!

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Sung Soo Kim

Date: 3 Mar, 2009 19:32:02

Message: 2 of 50

% I played around more and found that it happens when the number of
% rows of x is a multiple of 7. And even then it doesn't always happen, but only when
% a certain combination of nonzeros is in the matrix. For example, the following spits out '0'.

%%
x = [
    1 1
    0 0
    0 0
    0 0
    0 0
    0 0
    0 0];

a = bsxfun(@minus,x,mean(x));

b=a'*a; c=a(:,1)'*a(:,1);
isequal(c,b(1,1))

%% But the following spits out '1'.
x = [
    0 0
    0 0
    0 0
    0 0
    0 0
    0 0
    1 1];

a = bsxfun(@minus,x,mean(x));

b=a'*a; c=a(:,1)'*a(:,1);
isequal(c,b(1,1))

%% Unreliable.... :(

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Duncan Po

Date: 3 Mar, 2009 19:55:36

Message: 3 of 50

What you are experiencing is round-off error in floating-point arithmetic. When dealing with floating-point numbers, different operations may yield slightly different answers even when they should be the same in theory.
For details, refer to this document:

http://www.mathworks.com/support/tech-notes/1100/1108.html
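
For example (a minimal illustration in the same spirit, not taken from the tech note):

0.1 + 0.2 == 0.3               % returns 0: none of these values is exactly representable in binary
fprintf('%.20g\n', 0.1 + 0.2)  % prints something like 0.30000000000000004441
fprintf('%.20g\n', 0.3)        % prints something like 0.29999999999999998890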

Duncan

"Sung Soo Kim" <sskimbox@aol.com> wrote in message <gok0ji$d96$1@fred.mathworks.com>...
> % I played around more and found that it happens when the number of
> % rows of x is a multiple of '7'. And still it doesn't always happen but only when
> % a certain combination of nonzeros are in the matrix. For example, the following spit out '0'.
>
> %%
> x = [
> 1 1
> 0 0
> 0 0
> 0 0
> 0 0
> 0 0
> 0 0];
>
> a = bsxfun(@minus,x,mean(x));
>
> b=a'*a; c=a(:,1)'*a(:,1);
> isequal(c,b(1,1))
>
> %% But the following spit out '1'.
> x = [
> 0 0
> 0 0
> 0 0
> 0 0
> 0 0
> 0 0
> 1 1];
>
> a = bsxfun(@minus,x,mean(x));
>
> b=a'*a; c=a(:,1)'*a(:,1);
> isequal(c,b(1,1))
>
> %% Unreliable.... :(

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Matt Fig

Date: 3 Mar, 2009 20:22:01

Message: 4 of 50

fprintf('\n %18.18g\n\n',c)
fprintf('\n %18.18g\n\n',b(1,1))

Standard floating point problem.





Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Roger Stafford

Date: 3 Mar, 2009 20:27:02

Message: 5 of 50

"Sung Soo Kim" <sskimbox@aol.com> wrote in message <goju5b$nv4$1@fred.mathworks.com>...
> ........
> isequal(c, b(1,1)) % This must be '1' because c should be equal to b(1,1)
> ........

  Sung Soo Kim, in expecting that c and b(1,1) will be exactly equal you are not taking into account that differing round-off errors in digital computers can make even the associative law of addition or multiplication fail. That is, (a+b)+c and a+(b+c) can be different down at the least bit level. That being the case, you have no reason to assume that c and b(1,1) will be exactly equal if their operations happen to have been performed in a different order. In general, when dealing with floating point numbers on digital computers one should not expect results that are theoretically equal to actually be equal when arrived at by different computational procedures.

  Try this on your computer:

 3/14+(3/14+15/14) == (3/14+3/14)+15/14

If you use IEEE 754 double precision binary floating point numbers, as does Matlab, these results must be unequal if the rules of rounding-to-nearest are faithfully carried out. Try doing some binary arithmetic like this by hand and you will soon see why.
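
To see the difference at full precision, one can print both groupings (a quick check along the same lines):

fprintf('%.20g\n', 3/14 + (3/14 + 15/14))
fprintf('%.20g\n', (3/14 + 3/14) + 15/14)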

Roger Stafford

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Sung Soo Kim

Date: 3 Mar, 2009 20:40:24

Message: 6 of 50

Duncan, I know what you mean (my background is EE, and I know what's going on in the CPU).
But you missed my point. Please read my code more carefully.

As far as I remember, addition or subtraction of numbers with the same precision but significantly different magnitudes does not give inconsistent results just because of the order of operations.

Even when that is true, the kind of resolution problem you mentioned MUST NOT happen in my example, especially for the case I showed you in my first reply. I understand that several combinations of operations, or orders of operations, can cause resolution problems. BUT at the very least, they must be predictable. That's why there is an IEEE standard.

In my example, I'm doing exactly the same two operations, at least in a theoretical sense. That's why I'm applying 'isequal' to them.

I'm not sure whether there is an IEEE standard on "matrix multiplication", though I doubt it, because this was not an issue at all in my previous MATLAB version (R14). If there is, please show me a reference. I'd appreciate it.

But I think MATLAB is violating the IEEE standard. And if it is, this is a serious problem for scientists and engineers using MATLAB.

Best,


"Duncan Po" <Duncan.Po@mathworks.com> wrote in message <gok1vo$ka5$1@fred.mathworks.com>...
> What you are experiencing is round-off errors in floating point arithmetics. When dealing with floating point numbers, different operations may yield slightly different answers even when they should be the same in theory.
> For details, refer to this document:
>
> http://www.mathworks.com/support/tech-notes/1100/1108.html
>
> Duncan
>
> "Sung Soo Kim" <sskimbox@aol.com> wrote in message <gok0ji$d96$1@fred.mathworks.com>...
> > % I played around more and found that it happens when the number of
> > % rows of x is a multiple of '7'. And still it doesn't always happen but only when
> > % a certain combination of nonzeros are in the matrix. For example, the following spit out '0'.
> >
> > %%
> > x = [
> > 1 1
> > 0 0
> > 0 0
> > 0 0
> > 0 0
> > 0 0
> > 0 0];
> >
> > a = bsxfun(@minus,x,mean(x));
> >
> > b=a'*a; c=a(:,1)'*a(:,1);
> > isequal(c,b(1,1))
> >
> > %% But the following spit out '1'.
> > x = [
> > 0 0
> > 0 0
> > 0 0
> > 0 0
> > 0 0
> > 0 0
> > 1 1];
> >
> > a = bsxfun(@minus,x,mean(x));
> >
> > b=a'*a; c=a(:,1)'*a(:,1);
> > isequal(c,b(1,1))
> >
> > %% Unreliable.... :(

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Sung Soo Kim

Date: 3 Mar, 2009 20:54:03

Message: 7 of 50

Thanks Roger,
Your example is cool.
So, the order of addition and subtraction does matter.

But still... why did MATLAB change the order of calculation from what it is supposed to be? I didn't mean to change the order of calculation in my example. My example should give the same result not only in a theoretical sense but also in an engineering sense.

What algorithm should I depend on then??? Is MATLAB's matrix multiplication algorithm standard???

As I said, it was not a problem in an earlier version of MATLAB (R14), which I can depend on. (Please don't tell me to go back to that old version. I've already changed a lot of my code and it would be painful to go back, though I'm already in pain testing my previously perfect library.)

Anyway, is this algorithm documented??

I have to rephrase my question as: how can I predict my result based on an obvious standard??

Best,


"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <gok3qm$2gn$1@fred.mathworks.com>...
> "Sung Soo Kim" <sskimbox@aol.com> wrote in message <goju5b$nv4$1@fred.mathworks.com>...
> > ........
> > isequal(c, b(1,1)) % This must be '1' because c should be equal to b(1,1)
> > ........
>
> Sung Soo Kim, in expecting that c and b(1,1) will be exactly equal you are not taking into account that differing round-off errors in digital computers can make even the associative law of addition or multiplication fail. That is, (a+b)+c and a+(b+c) can be different down at the least bit level. That being the case, you have no reason to assume that c and b(1,1) will be exactly equal if their operations happen to have been performed in a different order. In general, when dealing with floating point numbers on digital computers one should not expect results that are theoretically equal to actually be equal when arrived at by different computational procedures.
>
> Try this on your computer:
>
> 3/14+(3/14+15/14) == (3/14+3/14)+15/14
>
> If you use IEEE 754 double precision binary floating point numbers, as does Matlab, these results must be unequal if the rules of rounding-to-nearest are faithfully carried out. Try doing some binary arithmetic like this by hand and you will soon see why.
>
> Roger Stafford

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Matt Fig

Date: 3 Mar, 2009 21:10:29

Message: 8 of 50


> What algorithm should I depend on then???

I almost always use a tolerance instead of comparing floating point numbers for exact equality. Using the a from your example:

s1 = 0; for ii = 1:length(a(:,1)),s1 = s1 + a(ii,1)^2;end % gives same as your c
s2 = 0; for ii = length(a(:,1)):-1:1,s2 = s2 + a(ii,1)^2;end % gives same as b(1,1)
fprintf('\n %18.18g\n\n',s2)
fprintf('\n %18.18g\n\n',s1)

isequal(s1,s2) % No
abs(s1-s2)<20*eps % Yes
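
A relative tolerance also works when the magnitudes of the values vary (a sketch; the tolerance value here is arbitrary):

tol = 1e-12;                                % arbitrary relative tolerance
abs(s1-s2) <= tol*max(abs(s1),abs(s2))      % compare relative to the magnitude of the values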






Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Sung Soo Kim

Date: 3 Mar, 2009 21:47:06

Message: 9 of 50

OK, Matt.
I've already been using your method when I can predict the resolution problem.
I know there are a lot of cases where I have to use a tolerance.

Fair enough.

But you know my point. The examples are supposed to be the same. I mean, the order of addition must be the same.

Now, based on the replies to my post, there may not be any standard on matrix multiplication per se. So it is OK for MathWorks to change the order of floating-point operations as they wish.

Is this right? (If so, I'm surprised...)

I might have ignored this problem if I didn't have to debug and rewrite much of my testing code. In several cases this problem gives me a slightly different inverse matrix, and it has been really difficult to find the cause. That's what made me write this post. :)

Anyway, if no one considers this a significant problem... maybe that's the right way.

Best,



"Matt Fig" <spamanon@yahoo.com> wrote in message <gok6c5$19g$1@fred.mathworks.com>...
>
> > What algorithm should I depend on then???
>
> I almost always use a tolerance instead of comparing floating point numbers for exact equality. Using the a from your example:
>
> s1 = 0; for ii = 1:length(a(:,1)),s1 = s1 + a(ii,1)^2;end % gives same as your c
> s2 = 0; for ii = length(a(:,1)):-1:1,s2 = s2 + a(ii,1)^2;end % gives same as b(1,1)
> fprintf('\n %18.18g\n\n',s2)
> fprintf('\n %18.18g\n\n',s1)
>
> isequal(s1,s2) % No
> abs(s1-s2)<20*eps % Yes
>

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Matt Fig

Date: 3 Mar, 2009 21:58:02

Message: 10 of 50

"Sung Soo Kim" <sskimbox@aol.com> wrote in message
> But you know my point. The examples are supposed to be the same. I mean, the order of addition must be the same.


I do see your point, and honestly, this one surprised me too.


>
> Is this right? (If so, I'm surprised...)
>



That is really a technical question for TMW; this newsgroup isn't *necessarily* monitored by anyone from TMW in an official way. If you ask, could you please let us know what they say?

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: John D'Errico

Date: 3 Mar, 2009 22:02:04

Message: 11 of 50

"Sung Soo Kim" <sskimbox@aol.com> wrote in message <gok8gq$2na$1@fred.mathworks.com>...
> OK, Matt.
> I've been using your methods already when I can predict the resolution problem.
> I know there are a lot of cases I have to use tolerance.
>
> Fair enough.
>
> But you know my point. The examples are supposed to be the same. I mean, the order of addition must be the same.
>
> Now, based on replies to my post, there may not exist any standard on matrix multiplication per se. So it is OK for Mathworks to change the order of floating point operation as they wish.
>

Compilers do this sort of thing routinely, in their
attempts to optimize your code.


> Is this right? (If so, I'm surprised...)

Don't be. Never trust the least significant bits in a
floating point computation to be exact.


> I might have ignored this problem if I didn't have to debug and rewrite much of my testing codes. This problem gives me a little different inverse matrix

Don't compute the inverse matrix anyway. Yet
another thing to avoid.


> on several cases which have been really difficult to find the cause. That made me write this post. :)
>

This is not a "problem" to be ignored. This is a fact of
life when working with any piece of code that uses
floating point arithmetic. This is just something to be
understood.

John

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Sung Soo Kim

Date: 3 Mar, 2009 23:18:01

Message: 12 of 50

Thank you John,

> Compilers do this sort of thing routinely, in their
> attempts to optimize your code.

Your point about the 'compiler' makes me inclined to just accept this bug. Yes, compilers do many kinds of weird optimizations. But I still think they should have prevented this, because it changes what the code is supposed to do. See this:
a=3/14+3/14+15/14
b=(3/14+3/14)+15/14
c=3/14+(3/14+15/14)
This is from Roger's example. Which of 'b' and 'c' do you think is supposed to be equal to 'a'? Can you predict MATLAB's result without running the code? The answer is supposed to be 'b'. But are you sure? (Of course this may be documented, and it is fortunately 'b'.)

> Don't compute the inverse matrix anyway. Yet
> another thing to avoid.

Cool solution. But I'm sorry, my m-file is an implementation of a mathematical algorithm. It needs an inverse operation. But the problem I mentioned happens before the inverse. Anyway, if the same problem exists in the inverse as well... Well... OK... I think I've got to live with it. :( I just hope MathWorks documents which inverse algorithm is used when I use a certain inverse operation. Optimization is fine, but not in a way that changes what it is supposed to do.

> This is not a "problem" to be ignored. This is a fact of
> life when working with any piece of code that uses
> floating point arithmetic. This is just something to be
> understood.

Good point. I know that it is not avoidable. But that's why we need a standard, to minimize complications.

Maybe this is an exaggeration, but who knows what will happen to the following expression: '1==1.0'? This is '1', but who knows? A compiler may use a different method to initialize each side to optimize for speed, and it may become false in the future... Then the '==' operator isn't useful anymore... I know this is an exaggeration.

So, is there really no standard on matrix multiplication? If there isn't, it is really surprising... :(

Anyway, this bug is still counter-intuitive to me. But for now, I have no choice but to fix all of my testing code to use a tolerance. Boring job... :( But OK. I can live with it.

Best,

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: John D'Errico

Date: 4 Mar, 2009 00:06:02

Message: 13 of 50

"Sung Soo Kim" <sskimbox@aol.com> wrote in message <gokdr9$heh$1@fred.mathworks.com>...
> Thank you John,
>
> > Compilers do this sort of thing routinely, in their
> > attempts to optimize your code.
>
> Your point on 'compiler' makes me try to just admit this bug. Yes, the compiler do many kinds of weird optimization. But still I think they had to make this not happen, because it changes what it is supposed to do. See this:
> a=3/14+3/14+15/14
> b=(3/14+3/14)+15/14
> c=3/14+(3/14+15/14)
> This is from Roger's example. Among 'b' and 'c', which is supposed to be equal to 'a' do you think? Can you predict the result of MATLAB without running the code? The answer is supposed to be 'b'. But are you sure? (of course this may be documented and it is fortunately 'b')
>

Never assume that such an operation gives
an exact result down to the last bit, since it
cannot possibly do so. Even a simple decimal
number like 0.1 has no exact representation
in a binary floating point system.


> > Don't compute the inverse matrix anyway. Yet
> > another thing to avoid.
>
> Cool solution. But I'm sorry. My m-file is an implementation of a mathematical algorithm. It needs inverse operation.

No, it probably does not "need" the inverse.
In fact, use of the inverse is in most cases a
poor way to implement things like this:

  inv(A)*b

Yes, the formula probably was written as

  A^(-1) *b

Better for several reasons is to write it this way
in Matlab:

   A\b

You will find some discussion on the topic here:

http://blogs.mathworks.com/loren/2007/05/16/purpose-of-inv/
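
A rough sketch of the comparison (the test data here is made up; for a reasonably conditioned A the two answers differ only in the last bits):

A = rand(100); b = rand(100,1);   % arbitrary test data
x1 = inv(A)*b;                    % explicit inverse
x2 = A\b;                         % backslash, the preferred form
max(abs(x1 - x2))                 % tiny, but typically not exactly zero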

> But the problem I mentioned happens before inverse. Anyway, if there is the same problem in inverse as well... Well... OK... I think I got to live with it. :( I just hope Mathworks documented what inverse algorithm is used when I use a certain inverse operation.

Not down to the last bit. You will find the very
same things happening in a call to inv as
happened to you in matrix multiplication.


> Optimization is fine, but not in a way to change what it is supposed to do.
 
I'll still suggest that the problem lies as much in
your expectations of the result as in the
software.

MATLAB is not entirely to blame here. Your
CPU makes many subtle decisions about your
computations too.

John

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Matt Fig

Date: 4 Mar, 2009 00:10:19

Message: 14 of 50

"Sung Soo Kim" <sskimbox@aol.com> wrote in message
> Maybe this is an exaggeration, but who knows what will happen to the following equation? : '1==1.0' This is '1', but who knows? Compiler may use different method to initialize each side to optimize the speed and it may become false, in the future... Then '==' operator is not useful anymore... I know this is exaggeration.
>
> So, isn't really there any standard on matrix multiplication? If it is, it is really surprising... :(
>
> Anyway, this bug is still counter-intuitive to me. But for now, I don't have any choice but fixing all of my testing codes to use tolerance. Boring job... :( But OK. I can live with it.


I think there is no doubt that your example is an exaggeration. We are talking about results that agree to within eps here. Now if any compiler optimized in such a way as to disagree in the 3rd decimal, then that compiler would be garbage.
The way I look at it, we have two choices when dealing with floating-point equality comparisons:

1. Rely on them and be surprised when they don't work.
2. Don't rely on them, work around it, move on.

I still think calling this a 'bug' is inaccurate, but really (and again) this is something you should bring up with TMW directly.





Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Roger Stafford

Date: 4 Mar, 2009 03:16:02

Message: 15 of 50

"Sung Soo Kim" <sskimbox@aol.com> wrote in message <gokdr9$heh$1@fred.mathworks.com>...
> .......
> So, isn't really there any standard on matrix multiplication? If it is, it is really surprising... :(
> .......

  Sung Soo Kim, it is a serious mistake in my opinion to constrain oneself to always performing arithmetic operations in precisely the same order or even with the same algorithm in accomplishing some goal. To give a rather extreme example of this latter, consider a problem that was posed recently on this newsgroup. The task (in slightly simplified form) was to compute the sum of the squares of every possible difference in pairings of the n elements of a vector of values. To do this directly requires n^2 subtractions, n^2 squarings, and n^2-1 additions. However, it can readily be shown that the same answer (mathematically speaking) can be obtained using a quite different expression that uses only 2*n-2 additions, n squarings, n subtractions, one division, and one crucial multiplication, which amounts to vastly fewer operations for large n. Obviously the rounding errors in the two
methods would be different, but it would be a shame not to be able to take advantage of such savings as this.
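
A rough sketch of the two approaches just described (the data and variable names here are made up for illustration):

x = randn(1000,1);  n = numel(x);
d1 = sum(sum(bsxfun(@minus, x, x').^2));    % direct: all n^2 pairwise differences, squared and summed
d2 = 2*n*sum((x - sum(x)/n).^2);            % shortcut: algebraically identical, only O(n) operations
abs(d1-d2) <= 1e-8*abs(d1)                  % they agree to a relative tolerance, but usually not bit-for-bit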

  I think the same principle should apply even to such seemingly routine tasks as computing the product of matrices. There are some important shortcuts that can be used in such calculations but they require processing the data in different arrangements than one would expect. I seriously doubt that Mathworks would be happy living under the restriction that they always had to use a consistent order of operations or even a consistent algorithm in matrix multiplication from one Matlab version to the next. In your case it was a mistake to assume that the b(1,1) element was arrived at in precisely the same manner as was c. It is far better that users create their applications with sufficient robustness so as not to be vulnerable to variations in rounding errors. Rounding errors are a fact of life that must be faced in any use of floating point numbers.

Roger Stafford

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Sung Soo Kim

Date: 4 Mar, 2009 03:18:02

Message: 16 of 50

> Never assume that such an operation gives
> an exact result down to the last bit, since it
> cannot possibly so so. Even a simple decimal
> number like 0.1 has no exact representation
> in a binary floating point system.

After a long discussion, I'm now convinced that you guys are right. I cannot rely on what is supposed to be right. But the last point you made, I cannot agree with. Of course a CPU architect is free to implement floating point however s/he wishes. But there is a clear IEEE standard that was developed by smart engineers. I believe they had very good reasons to make a standard on this issue. One of the most important reasons is of course to minimize calculation error for scientific purposes. For single or double precision, there is only one IEEE representation for each that represents 0.1 (let's exclude the signed/unsigned number issue).

> Better for several reasons is to write it this way
> in Matlab:
>
> A\b

Fortunately, I already know about this issue, and I've been using it. What I meant by 'inverse' is exactly this operator. But what I meant by 'the same thing can happen with the inverse' is that this operator can change its behavior in the future. Then how can I rely on that? Maybe I'll have to change my testing code again.

Operators are the most important and basic things in scientific calculation. If one changes its behavior without an approved modification of a standard, the designer is evil. :(
Unfortunately, the '\' operator is not standardized at all, and I cannot blame MathWorks for that. And in fact, they improved the performance a lot. So it boils down to my decision on whether I can live with that. Bad for me... but now I can live with it. In the same sense, it looks like no one knows of a standard for matrix multiplication. So maybe there isn't one. That said, I cannot blame MathWorks on this issue, though I'll report it.

Anyway, thanks to the people in this thread, I have decided not to rely on MATLAB's floating-point calculations for anything. Something that looks intuitive and is supposed to be obviously right (even in a technical sense) may not be supported in the future.

> It is not entirely matlab to blame here. Your
> cpu makes many subtle decisions about your
> computations too.

Well, I know that might be true, though I doubt it. (Do you remember the Pentium floating-point error? It was a big issue, and it looks like no one cares about this kind of thing now. Maybe no one can.) Except for some acceleration-related situations like 3D game engines, the kind of optimization mentioned above may not happen, especially in scientific software. Even if such optimizations exist, I think scientific software must turn them off by default, though it can offer users the choice. As you know, CPU optimization is mainly about pipelining, not about changing the order of mutually dependent assembly instructions. Of course, I admit that there is a gray area that the IEEE standard cannot capture. :(

FYI, I'm testing all of this on a single machine. Because I'm migrating from the old 2005 version to the 2007 version, I can compare the two versions line by line. So everything is the same except MATLAB.

Anyway, thank you very much, John and everyone. You convinced me that I must not rely on MATLAB for floating-point calculations, maybe even for cases where the IEEE standard could be an issue...

Best,

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Sung Soo Kim

Date: 4 Mar, 2009 03:25:03

Message: 17 of 50

> The way I look at it, we have two choices when dealing with floating point number equality computations:
>
> 1. Rely on them and be surprised when they don't work.
> 2. Don't rely on them, work around it, move on.
>
> I still think calling this a 'bug' is inaccurate, but really (and again) this is something you should bring up with TMW directly.

Well, I didn't rely on it. My test cases are composed of tests that use '==' or a tolerance. All of them were fine. I decided to use '==' when it actually gave an equal result. But it suddenly changed in later versions. It may be related to MATLAB's code optimization algorithm. It is annoying to see a change to something that was not a bug at all and was actually right.

But, Matt, I think your view is very balanced, and I admit that I'd better follow your advice. I'll report it, but I won't rely on it.

Thanks a lot.

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Jos

Date: 4 Mar, 2009 11:48:01

Message: 18 of 50

"Sung Soo Kim" <sskimbox@aol.com> wrote in message <goksaf$nfd$1@fred.mathworks.com>...
> > The way I look at it, we have two choices when dealing with floating point number equality computations:
> >
> > 1. Rely on them and be surprised when they don't work.
> > 2. Don't rely on them, work around it, move on.
> >
> > I still think calling this a 'bug' is inaccurate, but really (and again) this is something you should bring up with TMW directly.
>
> Well, I didn't rely on it. My test cases are composed of tests that uses '==' or tolerance. All of them were fine. I decided to use '==' when it actually showed equal result. But it suddenly changed in later versions. It may be related to MATLAB's code optimization algorithm. It is annoying to see the change that was not a bug at all and was something right.
>
> But, Matt, I think your view is very balanced, and I admit that I'd better follow your advice. I'll report it but I'll not rely on that.
>
> Thanks a lot.

To add another example of floating point issues:

% Although we know that "cos(pi/2) is 0", MATLAB makes a round-off error
  cos(pi/2)
% ans = 6.1232e-017, which is of course expected, given that pi is not
% known exactly. The error, though, is small enough:
  cos(pi/2) < eps
% So, one should not be surprised by the results of:
  sin(pi/4).^2 == 1/2
  sin(pi/4) == sqrt(1/2)
  sin(pi/4) == 1/sqrt(2)

Jos

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Steve Amphlett

Date: 4 Mar, 2009 12:17:01

Message: 19 of 50

"Sung Soo Kim" <sskimbox@aol.com> wrote in message <gokrta$q9s$1@fred.mathworks.com>...
>
> FYI, I'm testing all of these on a single machine. Because I'm migrating from old 2005 version to 2007 version, I can compare two versions line by line. So, everything is the same except MATLAB.
>
> Anyway, thank you very much John and everyone. You convinced me that I must not rely on MATLAB for floating point calculation, maybe even for cases that IEEE standard can be an issue...

If you run your calcs on a different machine, you'll get different answers again. A good reason not to rely on being on one or the other side of a knife edge in your (mathematical) decision making. Relying on sitting exactly ON the knife edge is worse still.

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Peter Boettcher

Date: 4 Mar, 2009 17:58:38

Message: 20 of 50

"Sung Soo Kim" <sskimbox@aol.com> writes:

> Thank you John,
>
>> Compilers do this sort of thing routinely, in their
>> attempts to optimize your code.
>
> Your point on 'compiler' makes me try to just admit this bug. Yes, the
> compiler do many kinds of weird optimization. But still I think they
> had to make this not happen, because it changes what it is supposed to
> do. See this:
>
> a=3/14+3/14+15/14
> b=(3/14+3/14)+15/14
> c=3/14+(3/14+15/14)
>
> This is from Roger's example. Among 'b' and 'c', which is supposed to
> be equal to 'a' do you think? Can you predict the result of MATLAB
> without running the code? The answer is supposed to be 'b'. But are
> you sure? (of course this may be documented and it is fortunately 'b')

Why do you say b? Why is that obviously the "correct" answer? In math,
the answers are the same, so it shouldn't matter. It is only your
expectation which says we should add the numbers in left-to-right order.

I presume, given your matrix multiplication example, that you think the
"correct" order is to accumulate, in order, the products of pairs of
numbers from the first row and first column of the matrices. Why? Why
is that the correct order? The math doesn't care. In fact the "best"
ordering would be to accumulate the numbers in ascending order of
magnitude, to minimize the round-off error. So the top-to-bottom
ordering is just as wrong as any other order.
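
A small sketch of that point, with arbitrarily chosen values (on a typical IEEE double system):

v = [1e16, 1, 1, -1e16];                                    % values of wildly different magnitude
s_fwd = 0; for k = 1:numel(v), s_fwd = s_fwd + v(k); end    % natural left-to-right order
[ignore, p] = sort(abs(v));
s_mag = 0; for k = p, s_mag = s_mag + v(k); end             % ascending order of magnitude
[s_fwd s_mag]                                               % here s_fwd comes out 0, while s_mag gives 2, the exact answer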

You can get different answers on some CPUs and compilers that use 80-bit
double precision inside the FPU, but 64-bit in memory. Current Intel
CPUs have a choice: use the FPU, or use SSE2 instructions for
double-precision computations. SSE2 doesn't use 80-bit extended precision.


In your experience, computations come out the same way all the time.
Your mistake is in assuming that that is somehow part of a standard, or
that they are guaranteed to do so. Take the 32-bit to 64-bit
migration. Many programmers assumed that a pointer was the same size as
an integer. It always worked before, so why not? The problem was, no
standard said it must, so the change to the new architecture breaks a
lot of code.

It's virtually impossible to write code that adheres perfectly to the
guarantees of the standards and documentation. Whatever. We do the
best we can given the constraints and needs of a given project. But
when such a situation is pointed out to you, and you understand the
mistaken assumption involved, you fix the code and move on. You don't
protest the advances in architecture, numerical methods, compilers, or
whatever.

-Peter

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: dpb

Date: 4 Mar, 2009 18:34:59

Message: 21 of 50

Sung Soo Kim wrote:
> % Run the following code. It must produce '1' as an answer.
> % In R14SP3 it works fine. (It spit out '1')
> % In R2008b, it doesn't work. (It spit out '0')
> ...floating point computation precision/optimization/etc. example
> elided for brevity...

In addition to the comments of Roger and Peter, I'd point you to

<http://docs.sun.com/source/806-3568/ncg_goldberg.html>

Widely-recommended reading, "What Every Computer Scientist Should Know
About Floating-Point Arithmetic"

--

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Steven Lord

Date: 4 Mar, 2009 19:15:25

Message: 22 of 50


"Sung Soo Kim" <sskimbox@aol.com> wrote in message
news:gokrta$q9s$1@fred.mathworks.com...
>> Never assume that such an operation gives
>> an exact result down to the last bit, since it
>> cannot possibly so so. Even a simple decimal
>> number like 0.1 has no exact representation
>> in a binary floating point system.
>
> After a long series of discussion, now I'm convinced that you guys are
> right. I cannot rely on what it is supposed to be right. But, the last
> point you said, I cannot agree. Of course it is freedom of a CPU architect
> to implement any floating point as s/he wish. But there is clear IEEE
> standard that is developed by smart engineers. I believe they had very
> good reason to make standard on this issue. One of the most important
> reason is of course to minimize calculation error for scientific purpose.
> For single, or double, there is only one IEEE standard for each that
> represents 0.1 (Let's exclude signed or unsigned number issue.)
>
>> Better for several reasons is to write it this way
>> in Matlab:
>>
>> A\b
>
> Fortunately, I already know this issue, and I've been using it. What I
> meant by 'inverse' is right this operator. But, what I meant by 'same
> thing can happen on the inverse' is that this operator can change its
> behavior in the future. Then how can I rely on that? Maybe I have to
> change my testing codes again.

If you're expecting A\b to always return the same answer, bit-for-bit,
across all operating systems and all processors and all versions of MATLAB,
then no, you cannot rely on this. Some of the factors that prevent this
from occurring:


* Different operating systems have different mathematical libraries. The
engineers at Microsoft who wrote Windows XP may not have written the
libraries precisely the same way as the engineers at Sun that wrote Solaris.
* Different processors have different instruction sets.
* Different versions of MATLAB do not always use the same code or same
libraries. We may choose to enhance backslash to perform more robustly,
more accurately, or more quickly on a certain problem and modify slightly
the result it returns on that and other problems. We may fix a bug in
backslash that affects the results it returns. We may switch to a different
library [LINPACK -> LAPACK, for instance] or a different version of a
library [for instance, last release we upgraded to version 10.0.3 of the
Intel MKL, as documented in the Release Notes.
http://www.mathworks.com/access/helpdesk/help/techdoc/rn/brqyzsl-1.html] for
various reasons, and that library may have made an enhancement or bug fix
that changes the results it returns.
* Other applications on the machine may have altered settings like the FPU
rounding mode, thus affecting the calculations the libraries underlying
MATLAB are performing. [Yes, I'm serious.]
* Differences in the compiler versions used to compile MATLAB and/or the
underlying libraries and the code those compilers generate.


> Operator is the most important and a basic thing in scientific
> calculation. If it changes its behavior, without approved modification of
> standard, the designer is evil. :(

What if the standard doesn't specify the exact behavior the operator should
use? Is the standard author then evil? :)

> Unfortunately, '\' operator is not standard at all, and I cannot blame
> Mathworks for that. And in fact, they improved the performance a lot. So
> it boils down to my decision on if I can live with that. Bad for me... But
> now I can live with it. In the same sense, it looks like no one knows
> about standard of matrix multiplication. So maybe there is not. That said,
> I cannot blame Mathworks on this issue, though I'll report this.
>
> Anyway, thanks to people in this thread, I decided not to rely on MATLAB's
> floating point calculation on any subject. Something looks intuitive and
> is supposed to be obviously right (even in technical sense) may not be
> supported in the future.

I know this next sentence is going to sound inflammatory, but it's not
intended to be -- wait until the end of the experiment before you start
yelling, please. If you can't rely on a calculation that doesn't give the
intuitive result, then you can't rely on yourself to perform calculations,
either. Let's do a little experiment (this involves fixed-point instead of
floating-point, but go with it.)

1) Choose some number of decimal places.
2) Using pencil and paper, compute the quantity x = 1/3 to that number of
decimal places.
3) Now, using pencil and paper, compute y = 3*x. Do NOT use 1/3 in this
computation; use the value of x you calculated in step 2.

Intuitively, computing 3*(1/3) should result in exactly 1, yes? Is y
exactly 1? No, it's not. Your pencil and paper calculations gave an
unintuitive result.
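
The same experiment can be mimicked in MATLAB (a sketch; four decimal places picked arbitrarily):

x = round((1/3)*1e4)/1e4;   % 1/3 kept to four decimal places: 0.3333
y = 3*x;                    % 0.9999, not 1
y == 1                      % returns 0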

>> It is not entirely matlab to blame here. Your
>> cpu makes many subtle decisions about your
>> computations too.
>
> Well, I know that might be true, though I doubt it. (Do you remember
> Pentium floating point error? It was a big issue, and it looks like now no
> one cares about this kind of thing. Maybe no on can.)
> Except for some acceleration-related situation like 3D game engine, a kind
> of optimization mentioned above may not happen, especially in scientific
> software. Even if they exist, I think scientific software must turn off
> that by default, though they can offer users to opt out. As you know, CPU
> optimization is mainly about pipelining, not about changing the order of
> mutually dependent assembly codes. Of course, I admit that there is gray
> area that IEEE standard cannot capture. :(
>
> FYI, I'm testing all of these on a single machine. Because I'm migrating
> from old 2005 version to 2007 version, I can compare two versions line by
> line. So, everything is the same except MATLAB.
>
> Anyway, thank you very much John and everyone. You convinced me that I
> must not rely on MATLAB for floating point calculation, maybe even for
> cases that IEEE standard can be an issue...

The impression I get is that you believe that the IEEE specification exactly
locks down how each and every mathematical operation MUST behave on each and
every number. It's not that long a specification :) You might want to read
through the documents linked in the External Links section of the Wikipedia
page for IEEE 754-1985:

http://en.wikipedia.org/wiki/IEEE_754-1985

I'd phrase your last comment differently -- you cannot always depend on the
result of a calculation involving floating-point arithmetic to agree with
the result your intuition expects from the calculation.

--
Steve Lord
slord@mathworks.com

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Sung Soo Kim

Date: 4 Mar, 2009 19:55:04

Message: 23 of 50

Hi Peter,

Thank you for your long opinion. I totally understand what you mean.

> Why do you say b? Why is that obviously the "correct" answer? In math,
> the answers are the same, so it shouldn't matter. It is only your
> expectation which says we should add the numbers in left-to-right order.

OK. I finally searched the Help files for a moment, and was very surprised that the spec is so loose that the order of addition is not defined. I simply cannot find it. As you know, what I described is the technical spec of the C/C++ language, in which 'b' is the correct answer for the compiler. It might have been my fault from the start to assume that MATLAB is similar to an ordinary programming language. Now I know how loosely it is designed, or at least defined, in a technical sense. (Wait a second... Isn't MATLAB designed for technical calculation supporting engineers and scientists? :) Joking...)

> I presume, given your matrix multiplication example, that you think the
> "correct" order is to accumulate, in order, the products of pairs of
> numbers from the first row and first column of the matrices. Why? Why
> is that the correct order? The math doesn't care. In fact the "best"
> ordering would be to accumulate the numbers in ascending order of
> magnitude, to minimize the round-off error. So the top-to-bottom
> ordering is just as wrong as any other order.

Again, I got your main point, and I won't rely on MATLAB's matrix manipulation algorithm at all. But if you bring up the 'best' ordering, why do you think 'c' and 'b(1,1)' can be different if they're really optimized in the 'best' way to minimize the error? They should be the same in your sense. MATLAB does not use the 'best' algorithm to minimize the error; it may be using a 'speed-optimizing' algorithm that sacrifices accuracy. (Hm... this sounds very important... another issue?)

> You can get different answers on some CPUs and compilers that use 80-bit
> double precision inside the FPU, but 64-bit in memory. Current Intel
> CPUs have a choice- use the FPU, or use SSE2 instructions for
> double-precision computations. SSE2 doesn't use 80 bit extended.

Cool. That's a good example of a gray area that the IEEE standard cannot step into. But floating-point accuracy was not my issue. My issue was the 'inconsistency' of the matrix manipulation algorithm. Anyway...

> In your experience, computations come out the same way all the time.
> Your mistake is in assuming that that is somehow part of a standard, or
> that they are guaranteed to do so. Take the 32-bit to 64-bit
> migration. Many programmers assumed that a pointer was the same size as
> an integer. It always worked before, so why not? The problem was, no
> standard said it must, so the change to the new architecture breaks a
> lot of code.

Assuming consistency in 'scientific' software was my mistake? I don't think so. Consistency must be at the heart of scientific software, though my assumption might have gone too far.

Sure, no standard says 'you must'. As I said, the purpose of a standard is to minimize scientific error and complications between people. Therefore, as technology advances, standards also advance, for sure.

But you know my point. They quietly changed the way matrix manipulation is done, and it's not in the compatibility documents (see the 'math' sections in http://www.mathworks.com/access/helpdesk/help/techdoc/rn/bqsrae0.html after R14SP3, or is it documented elsewhere???). So I think I can at least raise an issue, though no one really seems to care about it (and I won't). Your pointer example (32-bit to 64-bit) is a super-big issue in technology, and it is well documented. I don't have any issue with such a big change whose details I can easily look up.

And again, for a huge piece of 'scientific' software with 'millions' of users, any change of 'technical behavior' should be documented. Don't you think so? (It might have been documented and I just couldn't find it... In that case... well... shame on me...)

> It's virtually impossible to write code that adheres perfectly to the
> guarantees of the standards and documentation. Whatever. We do the
> best we can given the constraints and needs of a given project. But
> when such a situation is pointed out to you, and you understand the
> mistaken assumption involved, you fix the code and move on. You don't
> protest the advances in architecture, numerical methods, compilers, or
> whatever.
>
> -Peter

Hm... I got your point, but you went too far. There is nothing that cannot be protested. You can think of many examples of that. :) And I have already moved on. One thing I can say is that you're free to be tolerant, but raising an issue is one's freedom, and that's the real engine of advances in anything, though I don't think my issue is that big.

Peter, thank you for your opinion, and I totally got your point. And (I'm saying this over and over again) I'll live with MATLAB, accepting its far-from-perfect aspects. MATLAB has the fastest engine on the market, a lovely interface, cool toolboxes, well-designed grammar, and all the other good things I cannot give up. The matrix multiplication algorithm is a small problem to me compared to those things, though I don't like the way it is optimized in MATLAB.

Best,

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Sung Soo Kim

Date: 4 Mar, 2009 20:30:05

Message: 24 of 50

OMG Steven, it is too long... :)

> If you're expecting A\b to always return the same answer, bit-for-bit,
> across all operating systems and all processors and all versions of MATLAB,
> then no, you cannot rely on this. Some of the factors that prevent this
> from occurring:

Let me clear this up first. I don't expect that at all. My original point was about the inconsistency of matrix manipulation in MATLAB (I don't know why this thread became about floating-point issues...). The way a matrix is manipulated seems to depend on what the matrix is. And it doesn't look like there is a 'scientific' or 'theoretical' reason for that. All I can 'guess' is that it is just optimization for 'speed' at the cost of accuracy. Because of the simple example I gave you, I'm pretty sure '\' will manipulate two different matrices in two different ways.

See my examples. It is (I now think) a small thing. But it is really counter-intuitive, even considering optimization. One is an nx1 vector, and the other is an nx2 matrix. And squaring gives a different answer. Why? Optimization? OK. I'll live with that.

Another issue was that something that was consistent is not consistent anymore, in a piece of 'scientific' software with 'millions' of users. And I think this is worth raising as an issue, though no one seems to care about it seriously. You mentioned different versions of libraries. OK, fair enough. There may be tons of reasons to change libraries. But my point was not about that. The library changed something from good to worse.

> What if the standard doesn't specify the exact behavior the operator should
> use? Is the standard author then evil? :)

Of course not. That's why I talked about the 'gray area' in a standard. BTW, I was actually very surprised that the behavior of '+' is not well defined in 'scientific software' like MATLAB. There is no document on how exactly '+' is performed in MATLAB. I can complain about this, but I cannot blame them. :)

> Intuitively, computing 3*(1/3) should result in exactly 1, yes? Is y
> exactly 1? No, it's not. Your pencil and paper calculations gave an
> unintuitive result.

Again, I don't know why this thread drifted so far from the original issue.
What you said is true, but it at least gives me a 'consistent' result over and over again. If I do 1*3 first and then divide by 3, it gives a different result, and it is more intuitive. But it gives a consistent result.

See my example. I expected the same result for 'c' and 'b(1,1)' because 'intuitively' (maybe I went too far there) they should give me a 'consistent' result, and they did in R14SP3, but not in the recent version.

> The impression I get is that you believe that the IEEE specification exactly
> locks down how each and every mathematical operation MUST behave on each and
> every number. It's not that long a specification :) You might want to read
> through the documents linked in the External Links section of the Wikipedia
> page for IEEE 754-1985:
>
> http://en.wikipedia.org/wiki/IEEE_754-1985
>
> I'd phrase your last comment differently -- you cannot always depend on the
> result of a calculation involving floating-point arithmetic to agree with
> the result your intuition expects from the calculation.

Thank you for the reference. I remember studying that during my college years. But it was not my issue. I'm sorry for the confusion. What I actually meant was the inconsistency of matrix manipulation. I think I myself also got confused about my real issue at that moment, after a long discussion about floating-point calculation. Anyway, I can live with this inconsistency, believing that MathWorks probably did the right thing, with a bunch of pros.

Best,

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Peter Boettcher

Date: 5 Mar, 2009 00:03:48

Message: 25 of 50

"Sung Soo Kim" <sskimbox@aol.com> writes:

> They must be the same in your sense. Matlab does not use the 'best'
> algorithm to minimize the error, but it may be using 'speed
> optimizing' algorithm that can sacrifice the accuracy. (Hm... this
> sounds very important... another issue?)

Sorting, then adding, is the most accurate order. But almost no one
does that. MATLAB does not, Numerical Recipes in C does not, ATLAS
LAPACK/BLAS do not. So in some ways, all of these packages/libraries
sacrifice accuracy for speed... or simplicity... or whatever. They
simply don't address the issue.

>> You can get different answers on some CPUs and compilers that use 80-bit
>> double precision inside the FPU, but 64-bit in memory. Current Intel
>> CPUs have a choice- use the FPU, or use SSE2 instructions for
>> double-precision computations. SSE2 doesn't use 80 bit extended.
>
> Cool. That's a good example of gray area that IEEE standard cannot
> step in. But floating point accuracy was not my issue. My issue was
> about 'inconsistency' of the matrix manipulation algorithm. Anyway...

This is exactly about that. Even in C, you can't ensure perfect
consistency, because a different compiler, or a different compiler
RELEASE, or a different architecture, may generate code that still
follows the C standard, but produces slightly different results, because
of the above.

This does not make C programs useless for science. You just have to
understand the constraints and limitations when you draw your
conclusions, just like with any other scientific tool.

>> In your experience, computations come out the same way all the time.
>> Your mistake is in assuming that that is somehow part of a standard, or
>> that they are guaranteed to do so. Take the 32-bit to 64-bit
>> migration. Many programmers assumed that a pointer was the same size as
>> an integer. It always worked before, so why not? The problem was, no
>> standard said it must, so the change to the new architecture breaks a
>> lot of code.
>
> Assumption of consistency on 'scientific' software was my mistake? I
> don't think so. Consistency must be the heart of scientific software,
> though my assumption might have gone too far.

Sorry, I think it was your mistake. I don't mean to offend. I don't
think consistency in your sense *is* the heart of scientific software.
You can use Monte-Carlo simulations to do some very good science. Even
if the runs come out differently *on every run*, the statistical results
are consistent from a scientific point of view. Part of science is
tracking the sources of error in measurement and accounting for them.
Least-significant bits in floating point math are sources of error, and
failing to account for them is a mistake. In my opinion, of course.
And one that I've made many times.

> But you know my point. They quietly changed the way of matrix
> manipulation, and it's not in the compatibility documents (See 'math'
> sections in
> http://www.mathworks.com/access/helpdesk/help/techdoc/rn/bqsrae0.html
> after R14sp3 or is it documented elsewhere???). Then I think I can at
> least raise an issue, though no one really seems to care about that
> (and I won't). Your example of pointer (32bit to 64bit) is super-big
> issue in technology and it is well documented. I don't have any issue
> with such a big change that I can easily access information of.

But my point is, running the same code in the same version of MATLAB on
a different platform might also produce different results. There's no
"quiet changing" going on. If you'd like documentation on what not to
expect from floating point, well, read the papers that have been
suggested to you.

> And again, as a huge 'scientific' software with 'millions' of users,
> any change of 'technical behavior' should be documented. Don't you
> think so? (It might have been documented and I just couldn't find
> it... In that case... Well... Shame on me...)

What we're all trying to say is that there is no change of technical
behavior. Floating point computations still come with round-off error.
You still should not rely on exact answers from floating point values.
That's not a change.

The way you are arguing, it sounds like next year you will have the same
argument when some other MATLAB function produces subtly different
output. Well, it may, because that's how floating point works. If you
are still relying on exact equality from floating point expressions,
DON'T!

>> It's virtually impossible to write code that adheres perfectly to the
>> guarantees of the standards and documentation. Whatever. We do the
>> best we can given the constraints and needs of a given project. But
>> when such a situation is pointed out to you, and you understand the
>> mistaken assumption involved, you fix the code and move on. You don't
>> protest the advances in architecture, numerical methods, compilers, or
>> whatever.
>>
>> -Peter
>
> Hm... I got your point but you went too far. Nothing cannot be
> protested. You can think of many examples on that. :) And I moved on
> already. One thing I can say is that you're free to show tolerance but
> raising an issue is one's freedom and that's the real engine of
> advances in anything, though I don't think my issue is so big.

My apologies if that came across as a rebuke. My sentence above uses
"you" in the stupid colloquial English sense of "one". Of course you
are free to protest whatever you want, and post it here. I do not mean
that there are things one should keep quiet about.

Thanks for keeping this civil in spite of my blunt language. I'm sorry
for that and thanks for your opinions.

-Peter

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Sung Soo Kim

Date: 5 Mar, 2009 05:29:02

Message: 26 of 50

Hi Peter,
I'm a little overwhelmed by the length of your response. Thanks anyway.

> This is exactly about that. Even in C, you can't ensure perfect
> consistency, because a different compiler, or a different compiler
> RELEASE, or a different architecture, may generate code that still
> follows the C standard, but produces slightly different results, because
> of the above.

I know that. I don't expect consistency across compiler versions or across machines or even across different settings. What I meant by consistency is that if I use the same compiler, same machine, and same language, then I should get the same result. That's why we use computers. I said I was using one machine. Unfortunately, my example shows that my expectation of MATLAB was technically wrong. I ADMIT IT. Even on one machine, with one MATLAB version, a super-simple matrix that 'might' be assumed to produce the same result in many senses can produce different results. That means my definition of consistency cannot be generalized to MATLAB. That may be my fault.

Your Monte Carlo simulation example is, I think, a bit off topic here (from my point of view). You know that if we start from the same seed on the same machine, using the same MATLAB version with the same random number generator, then the simulation result is always the same, and it must be. If not, something is wrong. Randomness is not really random in a computer. A computer is expected to do what is expected of it. Even when there is a hardware malfunction, that malfunction should repeat, unless something physically weird happens like overheating. If not, we're not dealing with a deterministic machine, and we cannot rely on it. We cannot believe the result of a Monte Carlo simulation on that machine either. This expectation is fair, period, and this kind of consistency really is the heart of computing and scientific software, though the physical world is not perfect and any weird thing can happen. (This is another issue, but I think that, given the same 32-bit seed, the random number generator is supposed to generate the same 32-bit random number sequence across platforms and machines, as long as the version of the random number generator is the same in MATLAB. If not, correct me.)
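
For what it's worth, a minimal sketch of that kind of reproducibility, using the seeding syntax of that era (the seed value is arbitrary):

rand('twister', 12345);  a = rand(1,5);   % seed the Mersenne Twister generator
rand('twister', 12345);  b = rand(1,5);   % reseed with the same value
isequal(a, b)                             % returns 1: same seed, same machine, same sequence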

> This does not make C programs useless for science. You just have to
> understand the constraints and limitations when you draw your
> conclusions, just like with any other scientific tool.

Following the spec is very important, even if it is not an issue in most cases. In C/C++, 'a+b' is defined to add b to a, not a to b. So if the compiler follows this spec, a+b+c must always equal (a+b)+c, not a+(b+c), on any machine, no matter what the precision is (I'm not talking about crossing platforms, compiler versions, or vendors; I'm talking about a single executable). If it does not, we call it a bug, or it must be documented for a future fix. If it is not documented either, we say it is not a standard-conforming compiler. Is this unfair? Surely the standard or spec may change in the future, and I understand that there are many kinds of constraints and limitations. But that's a different problem from the consistency I'm talking about. If a compiler gives you a+(b+c) occasionally, do you want to use it? (Hm... maybe I would if it gave me much faster executables than other standard-conforming compilers... but I would expect documentation. ;) )
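
For instance, here is a tiny sketch of why the grouping matters for floating point (the particular constants are just an arbitrary illustration, nothing specific to any compiler or machine):

% Floating-point addition is not associative: the grouping changes the
% rounding, so (a+b)+c and a+(b+c) can differ in the last bit.
a = 0.1; b = 0.2; c = 0.3;
left  = (a + b) + c;
right = a + (b + c);
isequal(left, right)      % returns 0 in IEEE double precision
left - right              % difference on the order of eps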

In MATLAB, I found today that 'a+b' IS documented and defined simply as 'addition of a and b'... Sigh... Well... I didn't know that MATLAB's definition is less precise than C/C++'s in a technical sense. Shame on me... Sigh again... I think this was the start of my surprise at the examples I gave. In MATLAB, no precise way of performing the addition is defined. That being so, my complaint was 'technically' unfair. Any floating-point rounding error can happen. There was nothing wrong.

> But my point is, running the same code in the same version of MATLAB on
> a different platform might also produce different results.

Well, I know this very well. I said so several times in previous replies... I'm talking about the same machine, the same version, and 'supposedly the same code', though that last bit is not guaranteed.

But, whatever... It doesn't matter anymore... MATLAB's spec of '+' is not what I guessed.

> There's no "quiet changing" going on. If you'd like documentation on what not to
> expect from floating point, well, read the papers that have been
> suggested to you.
> What we're all trying to say is that there is no change of technical
> behavior. Floating point computations still come with round-off error.
> You still should not rely on exact answers from floating point values.
> That's not a change.

Right. That's not a change. They've not changed their spec of '+'. So any floating-point rounding error can happen anytime, anywhere. MathWorks has every right to change the order of addition as they want. That said, a defined order for matrix multiplication is out of the question.

Hm... I think I'm pretty much well convinced by all of you guys.

Best,

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Tim Davis

Date: 5 Mar, 2009 12:14:02

Message: 27 of 50

"Sung Soo Kim" <sskimbox@aol.com> wrote in message
...
> Following the spec is very important though it may not be an issue on most cases. In C/C++, 'a+b' is defined to add b to a, not add a to b. So if the compiler follows this spec, a+b+c must be always equal to (a+b)+c, not a+(b+c) in any machine no matter what the precision is (I'm not saying cross platforms or cross compilers versions or cross vendors ...

This is not correct. The C/C++ specification states that a+b+c without parentheses can be computed by the compiler in any order. Only if you add the parentheses does C/C++ do it in the order you stated.

However, a compiler on an Intel CPU may (or may not) choose to use the 80-bit internal floating-point register, depending on register usage. So the summation may be done in 80-bits or in 64-bits. You have no control over this issue, and the C/C++ specification doesn't require that it be done in one way or the other.

If your code relies on the difference, then you should be doing your work in infinite precision (via a symbolic method), not using floating-point. Try using the symbolic toolbox for this problem.

Regarding matrix multiplication ... the IEEE specification does not state that it is done in the "textbook" way. Matrix multiplication in MATLAB uses the BLAS, which is optimized (by Intel, for example, or AMD, or Sun, or ...) to be fast on that vendor's particular machines. A matrix multiplication is done in a blocked fashion, to exploit cache, and is able to achieve over 90% of the theoretical peak performance of your CPU. If you wrote your own in C/C++, then matrix multiply would run about 10 times slower, or worse. And even then, it would not get a "standard" result, since no such "standard" is defined.

The IEEE standard, even for scalar multiplication of two double precision numbers, does not specify the exact bit pattern of the result. The same 64-bit unoptimized a=b*c on two different computers can return a different result, even when both are strictly following the IEEE standard. This is not a flaw in the standard, but an intentional decision. Different floating-point hardware techniques can lead to O(eps) different rounding in the last bit, and if the IEEE specification were to specify the exact bit-level result it would over-constrain the hardware design.
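
One practical consequence: compare floating-point results with a small relative tolerance rather than with isequal. A minimal sketch, reusing the b and c from the example earlier in this thread (the factor of 4 is an arbitrary safety margin):

% Compare b(1,1) and c up to a few units in the last place instead of bit-for-bit.
tol = 4 * eps(max(abs(c), abs(b(1,1))));
abs(c - b(1,1)) <= tol    % should return 1 here, since the discrepancy is only last-bit rounding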

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Sung Soo Kim

Date: 5 Mar, 2009 12:49:01

Message: 28 of 50

Thank you Dr. Tim,

> This is not correct. The C/C++ specification states that a+b+c without parentheses can be computed by the compiler in any order. Only if you add the parentheses does C/C++ do it in the order you stated.

I may have been out of the engineering field for too long, and I probably remember something wrong (I even took a compiler class). My memory is too bad. Sorry about that. I actually strongly believed it was true until I read your reply. But as you said, it is not.

The rest of your points I'm already aware of, and I have no issue with them (I mentioned this many times in the thread). Floating-point rounding error itself was not my issue. I don't care what different machines, different software, or different versions do. My question was 'why does seemingly the same code produce different results'.

My rationale was based on the idea that the order of calculation is actually part of the technical spec of the language. (As you pointed out, it is not, and that explains all of my confusion.)

My confusion started from my faulty memory that C/C++ would compute a+b+c as (a+b)+c. This wrong memory was so strong that I expected the same kind of calculation-order spec in MATLAB, and I generalized it to matrix multiplication as well.

Best,

SS

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Sung Soo Kim

Date: 5 Mar, 2009 13:18:02

Message: 29 of 50

Now I'm just curious. (not offending, not raising an issue, not making any claim, not protesting)

In my original example shown below, the result is '0'. What kind of algorithm do you guess is used in each multiplication (one a vector multiplication, the other a matrix multiplication)? I've been out of the EE field for a long time (almost a decade). I'm just curious what the optimization algorithm is and what its advantage is (besides speed).

x = [
    1 1
    0 0
    0 0
    0 0
    0 0
    0 0
    0 0];

a = bsxfun(@minus,x,mean(x));

b=a'*a; c=a(:,1)'*a(:,1);
isequal(c,b(1,1))

% As a reminder, if x=[0 0;0 0;0 0;0 0;0 0;0 0;1 1]; is used, then the result is '1'.

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Peter Boettcher

Date: 5 Mar, 2009 15:28:24

Message: 30 of 50

"Sung Soo Kim" <sskimbox@aol.com> writes:

> Now I'm just curious. (not offending, not raising an issue, not making any claim, not protesting)
>
> In my original example shown below, the result is '0'. What kind of algorithm do you guess is used in each multiplication (one in vector, and the other in matrix)? I'm out of EE field for a long time (like almost a decade). I'm just curious of what the optimization algorithm is and what the advantage is (besides speed).
>
> x = [
> 1 1
> 0 0
> 0 0
> 0 0
> 0 0
> 0 0
> 0 0];
>
> a = bsxfun(@minus,x,mean(x));
>
> b=a'*a; c=a(:,1)'*a(:,1);
> isequal(c,b(1,1))
>
> % As a reminder, if x=[0 0;0 0;0 0;0 0;0 0;0 0;1 1]; is used, then the result is '1'.


Just as a note... I also find this behavior completely unexpected, and
worse than the previous behavior. I'm also curious how the MATLAB
internals treat the two cases differently to get a different answer.

That the opposite order comes out differently is the order-of-operations
thing we've been talking about.

a=1/7;
b=1-a;
(a+a+a+a+a+a+b) - (b+a+a+a+a+a+a)

ans =
   2.2204e-16


So all I can figure is that MATLAB can tell that a(:,1)'*a(:,1) is a
dot-product, and thus calls a dot-product function, which is a different
low-level function than the matrix multiplication function. Hmm. I
wonder if this is threading related? If the dot-product is not
multi-threaded, but the multiply is, then the parallel nature of the
computation would compute things in different orders. Try fooling with
maxNumCompThreads(1) and see if things change.
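
Something along these lines (a sketch, reusing the a from the example
quoted above; maxNumCompThreads sets the computational thread count and
returns the previous value):

% Re-run the comparison single-threaded to see whether the discrepancy
% between the dot product and the matrix multiply is threading-related.
nOld = maxNumCompThreads(1);        % force one computational thread
b1 = a'*a;
c1 = a(:,1)'*a(:,1);
singleThreaded = isequal(c1, b1(1,1))
maxNumCompThreads(nOld);            % restore the previous thread count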


Still, just a reminder to respect issues with FP math.


-Peter

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Steven Lord

Date: 5 Mar, 2009 15:30:44

Message: 31 of 50


"Sung Soo Kim" <sskimbox@aol.com> wrote in message
news:goojea$d0a$1@fred.mathworks.com...
> Now I'm just curious. (not offending, not raising an issue, not making any
> claim, not protesting)
>
> In my original example shown below, the result is '0'. What kind of
> algorithm do you guess is used in each multiplication (one in vector, and
> the other in matrix)? I'm out of EE field for a long time (like almost a
> decade). I'm just curious of what the optimization algorithm is and what
> the advantage is (besides speed).

Take a look at the Algorithm section of the reference page for MTIMES:

http://www.mathworks.com/access/helpdesk/help/techdoc/ref/mtimes.html

By the way, optimizations were mentioned a lot during the course of this
thread. In case you're curious, MATLAB does ship a BLAS library that was
compiled with fewer (or maybe none, I don't remember offhand) optimizations,
if you want to try it out. I'd try it out on a small problem before using
it for anything really large, though. These two documents talk about that
BLAS library and how to tell MATLAB to use a specific BLAS.

http://www.mathworks.com/support/solutions/data/1-195RP.html

http://www.mathworks.com/support/solutions/data/1-18QUC.html

--
Steve Lord
slord@mathworks.com

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Peter Boettcher

Date: 5 Mar, 2009 15:36:28

Message: 32 of 50

"Tim Davis" <davis@cise.ufl.edu> writes:

> "Sung Soo Kim" <sskimbox@aol.com> wrote in message
> ...
>> Following the spec is very important though it may not be an issue on
>> most cases. In C/C++, 'a+b' is defined to add b to a, not add a to
>> b. So if the compiler follows this spec, a+b+c must be always equal
>> to (a+b)+c, not a+(b+c) in any machine no matter what the precision
>> is (I'm not saying cross platforms or cross compilers versions or
>> cross vendors ...
>
> This is not correct. The C/C++ specification states that a+b+c
> without parentheses can be computed by the compiler in any order.
> Only if you add the parentheses does C/C++ do it in the order you
> stated.

Getting far afield, but I think it does:

From ISO/IEC 9899:TC2
--------------------------------------------------------
6.5 Expressions
...
4 The grouping of operators and operands is indicated by the
syntax. (72) ...

[72 ...
Within each major subclause, the operators have the same
precedence. Left- or right-associativity is indicated in each subclause
by the syntax for the expressions discussed therein.]


6.5.6 Additive operators

Syntax

additive-expression:
        multiplicative-expression
        additive-expression + multiplicative-expression
        additive-expression - multiplicative-expression
--------------------------------------------------------


A scalar is a multiplicative expression. The only way to break down
1+2+3 to be valid in the above syntax is: (1 + 2) + 3, to make

additive-expression + multiplicative-expression


Pedantic, yes, but fully specified.


-Peter

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Walter Roberson

Date: 5 Mar, 2009 15:54:40

Message: 33 of 50

Sung Soo Kim wrote:
 
> In C/C++, 'a+b' is defined to add b to a, not add a to b.

There is no language named 'C/C++'. I do not know the specifications of C++ very
well, but I am quite sure that C does not define '+' in the way you indicate.

In C, a+b+c being (a+b)+c versus a+(b+c) is controlled by the rules for
associativity, which exist in theory but have holes in them large enough to
drive a freight train through. To be really sure of getting what you expect in
C, you have to study the C "abstract machine" and study C "sequence points"...
and then you have to go back and study them again and again: they are
subtle enough that people who are on the ISO C committee are not always able to
agree on what the meaning is of particular expressions when it comes to matters
of sequence points.

If you study the C standards in detail, you will find that if you code
A/B then the C standard is not able to tell you whether that must be done
as a division, or whether it is permissible for the machine to translate
it as A * (1/B). Who cares? Well, it turns out that some architectures
have a recip instruction that can calculate 1/B, and, with pipelining,
A * (1/B) can often be done faster than A / B, especially in cases where
A is changing and B isn't, so pre-calculating T = 1/B and coding
A * T can result in quite measurable speed improvements. Is it legal in C
or not? Does your answer take into account the precision requirements
that C places upon A/B, and does your answer take into account the
possibility that B might be a signaling NaN, and does your answer take into
account the possibility that B might be 0? (Do you know the exact specifications
in C for what happens when you attempt to divide by 0? Do you know the
differences between that specification and the specification for what happens
in C++ when you attempt to divide by 0?)


> What I said in consistency is if I use the same compiler, same machine, same
> language, then I have to get the same result.

That suggests to me that you have never studied C in detail. Read in detail, the
C standard is not about what happens under all circumstances: read in detail,
the C standard says "If you use this operator under these -narrow- circumstances,
this is the result, and if you use this operator under any other circumstances
Here Be Dragons Who Play Russian Roulette with Your Program". If you
violate a compile-time constraint in a C program, it is completely allowed
for the C compiler to compile your entire program into nonsense -- as is
also the case if the compiler detects that you would have a run-time constraint
violation in a section of code that will be executed in all execution paths.

If you are expecting consistency with a C program just because you are using the
same compiler, same version, same compiler switches, same execution target, then
you are insufficiently paranoid.

Let me phrase that more strongly: if you write a C program that involves
floating point operations, and the C program conforms to every last
constraint clause in C, and you run the same executable twice on the same
machine and you have no random number generation happening, then as far as
C is concerned, it is absolutely and completely legal for the same executable
to produce different answers on different runs! In C, floating point calculations
are not required to be deterministic -- only to be accurate within certain limits.

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Sung Soo Kim

Date: 5 Mar, 2009 19:50:02

Message: 34 of 50

Hi Walter,

First, thank you for your serious opinion and perspectives in C.

I think I've stated my opinion enough in previous posts, but I'm eating lunch and have time to say it again.

First of all, you're right. There is no language called 'C/C++'. I'll talk about 'C' in this post.

About the '+' operator: actually, Peter pointed out that my memory was right, that a+b+c is (a+b)+c, not a+(b+c). But it might change in certain circumstances (like a gray area the standard cannot step into, as someone mentioned above, like 80-bit FP?). If you have an example complex enough that the ISO committee can disagree on the behavior of a+b+c, please let me know. I'm curious.

As for the other operators: I do remember that dividing by '0' (or the other examples you mentioned) is not well defined in C or C++. It is mainly machine dependent, as far as I remember. So, surely, a standard or spec cannot cover every single possible case in the real world.

> To be really sure of getting what you expect in
> C, you have to study the C "abstract machine" and study C "sequence points"...
> and then you have to go back and study them again and again: they are
> subtle enough that people who are on the ISO C committee are not always able to
> agree on what the meaning is of particular expressions when it comes to matters
> of sequence points.

Well, you're evil (joking), because you remind me of my college-year nightmare of reading the abstract machine over and over again to finish my term project of writing a proper C compiler. Anyway, you are talking about a gray area on which the ISO committee cannot always agree. I was talking about the white area. :)

If something is clearly defined, and the result does not match it, then we call it a bug. What do you call it?

> the C standard says "If you use this operator under these -narrow- circumstances,
> this is the result, and if you use this operator under any other circumstances
> Here Be Dragons Who Play Russian Roulette with Your Program".

That's a good description of all those holes in the spec. You're right. I know that.

> Let me phrase that more strongly: if you write a C program that involves
> floating point operations, and the C program conforms to every last
> constraint clause in C, and you run the same executable twice on the same
> machine and you have no random number generation happening, then as far as
> C is concerned, it is absolutely and completely legal for the same executable
> to produce different answers on different runs! In C, floating point calculations
> are not required to be deterministic -- only to be accurate within certain limits.

Probably, and hopefully, you're being misleading and exaggerating the main issue.

Anyway, in a realistic sense, do you really think that I would allow the compiler to behave like that if I were writing important arithmetic software? I would probably set a compiler flag so that the executable selects a proper FP unit for my arithmetic calculations, and I would not let the executable select the FP unit on the fly as it wishes. I think your assumption about my behavior went too far.

And physically, if the code is the same, all environments are the same, the compiler is the same, and no random number generator is used, then how can the result ever vary? The situation you assume is so deterministic that I can't imagine anything like what you described. Where are the dice? That behavior may be legal, but it's unrealistic.

Even on the question of what the language legally allows, the spec says the FP result depends on the platform (and/or precision), and that is the only accepted variation in the FP '+' operation. That is, the representation of FP is not the C language's responsibility. Right? Well, you are free to interpret that as 'the compiler can change what the machine spits out, as it wishes, within a certain amount of error', but I would rather interpret it as 'the C compiler must not touch what the machine spits out'. And if the machine spits out different results for simple expressions like 'a+b' across runs, then it is a useless computer.

So... I don't think I have to change my opinion: the heart of computation and scientific software is 'consistency' (and of course accuracy as well).

Best,

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Sung Soo Kim

Date: 5 Mar, 2009 20:15:03

Message: 35 of 50

Thank you for your reply. I learned a lot in this conversation with you. I'll keep the FP issue in mind in my future MATLAB programming. Have a nice day.

Best,

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Sung Soo Kim

Date: 5 Mar, 2009 20:24:01

Message: 36 of 50

Thank you Steven,

I think I can try a different version of the BLAS for my examples. Those were helpful links. I'll definitely dig into them.

Have a nice day.
Best,

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Walter Roberson

Date: 6 Mar, 2009 23:27:49

Message: 37 of 50

Sung Soo Kim wrote:
> About '+' operator. Actually, Peter pointed out that my memory was actually right, that
> a+b+c is (a+b)+c, not a+(b+c). But it might be changed in certain circumstances
> (like a gray area that standard cannot step in that someone mentioned above, like 80bit FP?).
> If you have an complex enough example that ISO committee can disagree on the behavior of
> a+b+c, please let me know. I'm curious.

Just introduce parallelism at the machine instruction level. ISO C90 and ISO C99
are written assuming that (short of interrupts and operations on volatiles)
the abstract machine does one operation at a time; the standards contain no
information on how to map that onto the reality of modern processors, -except-
through the guarantees provided at sequence points. Guarantees which are routinely
broken if any kind of optimization is used.
 

> If something is clearly defined, and if the result does not match with that, then we call it
> a bug. What do you call it?

Me? I tend to call it one of two things:

A) Something that the person only *thought* was clearly defined, but which falls through
one of the many holes or constraints in the C standards; or

B) "Realpolitik"
http://en.wikipedia.org/wiki/Realpolitik#Example_of_Realpolitik_in_Politics_in_the_United_States

"Realpolitik" is why the Intel x86 CISC architecture still has the greatest share of
the non-embedded market, long after Intel and AMD internally recoded their
architectures to be RISC (or other non-CISC) underneath the hood. "Realpolitik" is
living with what you get, providing that the results are not provably wrong.


>> Let me phrase that more strongly: if you write a C program that involves
>> floating point operations, and the C program conforms to every last
>> constraint clause in C, and you run the same executable twice on the same
>> machine and you have no random number generation happening, then as far as
>> C is concerned, it is absolutely and completely legal for the same executable
>> to produce different answers on different runs! In C, floating point calculations
>> are not required to be deterministic -- only to be accurate within certain limits.
 

> And physically, if the codes are the same and all environments are the same and the
> compiler is the same, and even no random number generator is used, then how can the
> result ever vary? Your assumption on the situation is so deterministic that I can't
> imagine anything like what you mentioned. Where is the dice? That behavior may be legal
> that but that's unrealistic.

Interrupts have been known, in production architectures, to change the FP state in
subtle ways.
 
> Even for the language's legality issue, the spec says FP result depends on platform
> (and/or precision) and that's the only accepted variation on FP '+' operation.

No, no, you over-estimate the specification! I have the C89 spec right here (well
worn!) and what it says about FP result accuracy for the arithmetic operations and
for the <math.h> library, is... NOTHING. Except, that is, that it places some
restrictions on the legal values that can result if a floating point constant appears
in the source and is in the representable range but is not exactly representable; and
it places similar (but not identical) restrictions when a value is being converted
to or from a different floating (or integer) type. But if you want a guarantee
that (say) 1.0 + 1.0 is within 1 ULP of 2.0, you will not find it in C89!

> That said, representation of FP is not C language's responsibility. Right?

C89 2.2.4.2.2 defines a *model* of how floating point values are represented in
C, and various functions such as frexp() are allowed to assume that arithmetic
in C happens *as if* that model is used. C99 devotes an appendix to the IEEE 754
floating point model, and specifically allows implementations to #define a particular
constant if the implementation's floating point meets all clauses of the model
(which isn't always the case: DSPs have a tendency to use something that is *almost*
IEEE 754 but not to support all of the rounding modes that the 754 model requires.)
C99 says that if that particular constant is #define'd by the implementation, then
there are a series of library functions that shall be available that have certain
relatively well-behaved properties. But support for that floating point model
is strictly optional.

> Well, it is your freedom to interpret that as 'the compiler can change what the machine
> spit out as it wish within a certain amount of error', but I'll rather interpret that
> 'C compiler must not touch what the machine spit out'.

You would be wrong. The C language does not define how computer architectures must
work: it defines how certain operations must behave -regardless of how complex
the machine code is that is needed to achieve that behaviour-. There *are* real
C libraries that "fix up" the result of some machine operations, especially when it
comes to the edge-case behaviours that the C language defines as resulting in
particular return values.

> And if the machine spit out different results on simple equations like 'a+b' across runs,
> then it is a useless computer.

Or the computer is implementing "round alternately" (perhaps not the official term).
When you have a sequence of additions or subtractions and always round 1/2 ULP "up" or
always round 1/2 ULP "down", the consistency of the rounding leads to a cumulative
error that would not be present if you rounded 1/2 ULP randomly; even rounding
1/2 ULP up one time and down the next time has better statistical properties.
"Round to even" is the usual modern tactic taken to control this issue, but it isn't
the only possibility. The point is that when you examine the matter from a longer viewpoint,
having a+b always come out the same way is not necessarily best!
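
A toy decimal illustration of that cumulative error (only a sketch of the
statistical point, rounding running sums to two decimal places in MATLAB; it
is not what any particular FP unit does):

% Keep a running sum rounded to 2 decimal places after every addition,
% once with a directed (always-up) rounding and once rounding to nearest.
vals = 0.005 + 0.01*rand(1, 1000);       % arbitrary positive values
sUp = 0; sNear = 0;
for v = vals
    sUp   = ceil((sUp   + v)*100)/100;   % directed rounding: always up
    sNear = round((sNear + v)*100)/100;  % round to nearest
end
exact = sum(vals);
biasUp   = sUp   - exact    % grows steadily with the number of terms
biasNear = sNear - exact    % stays far smaller, since the errors roughly cancel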

> So... I don't think I have to change my opinion: The heart of computation and scientific
> software is 'consistency' (of course accuracy as well).

For floating point calculations, 'consistency' lost out to 'speed' at least 15 years ago.
And no, I am not exaggerating just to make a point.

I work in a non-University research organization. Our sub-division has about 200 people.
Number of official mathematicians on staff: one (a statistician). Number of other staff
with math degrees: as far as I know, one (me, a programmer). Number of times in the
last 15 years I have been asked to do a numeric analysis or to otherwise ensure that
the program was getting the most accurate answer it could: zero (but I insisted on doing
some anyhow). Number of the incoming "young turks" with snazzy modern object-oriented
computer science degrees that I have ever heard even *mention* undertaking a numeric
analysis for their projects: zero. Number of researchers who want to use the clusters
or want multi-core processors on their desktops... ur, was that carry three heads and
four knees, or carry four heads and three knees? Beyond any easy counting, whatever it is :(

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Sung Soo Kim

Date: 7 Mar, 2009 16:06:02

Message: 38 of 50

Walter, it is amazing. Thank you for your seriousness on reply.

I'm well aware of most of your points, like exploiting distributed units or the existence of C libraries that touch the FP result. But those cases are pretty well documented, as you saw on your desk: it is known in what situations they happen and what we have (or don't have) to expect, though I didn't know or remember that a C library touching the FP result is actually part of the C spec (hm... the standard covers more ground than I remembered), and I didn't know that Intel is using RISC internally. In fact, I assumed that such deterministic corrections by a library, or asynchronous distributed computing (which is not at all deterministic in the sense of timing or order), were outside our scope. My apologies for that. What I meant, in other words, was that at least the core CPU must give the same result over and over again for the same calculation. If not, we cannot build any complex architecture (like a supercomputer, or scientific software) upon it. The real world is already probabilistic. If the tool or the building block is also probabilistic at its foundation, to what extent can we depend on it? The reason no one in your institute asks about accurate numerical answers is maybe that the computer already does its job very well, 'deterministically'. It gives us a firm foundation for scientific computation, even with such complex distributed computing. Such reliability is possible only because we have firm deterministic cores, so that jitters or miscommunication between cores can be ignored or corrected (using error correction, or by checking against the spec). Furthermore, the development of technology has never given up accuracy or consistency for the sake of speed; instead, both have advanced together a great deal. Some rules may have been dropped, or the behavior of instructions modified, for speed. But the examples you mentioned are rather the result of complex interactions between units, or of collisions of opinion between CPU architects, or of standard versions conflicting as technology advanced over time, not the result of breaking the consistency of a given core unit (CPU, FP unit, whatever). I apologize for my vague expressions and points, and thank you for your detailed technical review of this issue. It is now clear that I was not clear in delivering my opinion. Well, I'm not sure I'm delivering it clearly now either... :( I'm not protesting against you. Your comments and opinions are all right; I totally agree. Those complications have been big problems to solve for the computing world in a probabilistic real world.

That said, I'm very seriously curious about which CPUs I can actually expect to give different results for exactly the same FP calculation, if the random FP correction strategy is implemented in a real CPU. At least on my PC, I don't see it with a moderate-size matrix. For example,

clearvars;
a=rand(1000);
b(1000)=0;
k(1000)=0;
for i=1:numel(b)
    b(i)=sum(a(:));
    x=a'*a;
    k(i)=sum(x(:));
end
c = sum(b(:)~=b(1))
d = sum(k(:)~=k(1))

'c' and 'd' come out as zeros, which means that at least '+' and '*' do not trigger the random correction, or my Core 2 Duo doesn't do it, or MATLAB adjusts the randomly corrected machine result. I know that with different MATLAB versions the result can differ from what I got here, but I'm pretty sure they will also produce zeros. (I did this because I don't want such random correction on my PC. I'm serious.) Maybe it's because my PC is not cutting-edge technology. But then, on what CPU, for what operation, or in what situation can I get genuinely different results? Here I'm excluding distributed or multi-threaded computing that can disturb the order of calculation, though I think it is better to get consistent results even in that situation, because that's why we have synchronization technology. I also exclude real-world probabilistic disturbances like interrupts. I'm talking about a single CPU, on which our deterministic computing world relies. And I assume you're not talking about something like quantum computing, which I have no idea about and which may be very probabilistic.

Thanks a lot.

Best,

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Walter Roberson

Date: 9 Mar, 2009 18:52:40

Message: 39 of 50

Sung Soo Kim wrote:
> But there is one thing that I didn't really expect in your post. It was what you mentioned
> about 'random' FP correction in a single CPU (or a single FP unit).

Also known as "random-rounding" or "stochastic rounding"

> That said, I'm very seriously curious about on what CPU actually I can expect different
> results from exactly the same FP calculation, if the random FP correction strategy is
> implemented in a real CPU.

Actual computer hardware:

http://www.patentstorm.us/patents/4219877/claims.html


Commodity trading in which stochastic rounding gives results within
0.7% of the optimum, in a fraction of the time:

http://www.cs.ait.ac.th/~haddawy/abstract/icec04-abstract.htm


Statistics Canada, where stochastic rounding is sometimes needed in order
to prevent confidentiality leakage from census cells:

http://www.census.gov/srd/sdc/Massell%20StatCan%20Meth%20Symp%20english.pdf

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: dpb

Date: 9 Mar, 2009 19:23:58

Message: 40 of 50

Sung Soo Kim wrote:
...
> ...not sure why this kind of probabilistic correction is necessary
> because even in statistical sense, consistent biased result is (in
> most case) easier to correct than unbiased noisy result ...

The point is that it is an _unbiased_ estimator that is wanted in
virtually all instances.

That a system may use either a deterministic or probabilistic correction
to minimize bias is pretty much open to the implementation on all
computing language standards that I'm aware of--it's what would
typically be called a "quality of implementation" issue outside the
scope of the Standard. Whether it were to be implemented in the
hardware (more likely firmware) or software is immaterial.

It seems to me that the most important thing to take away from
this thread is that such minutiae as prescribing floating-point
operations to be reproducible to an lsb are simply beyond the range to
which any current language standard specifies the behavior of
"conforming" implementations.

Hence, the job falls back on the programmer not to make unwarranted
assumptions about numerical results. If one wanted to become truly famous,
developing an automated numerical analysis tool that could either fix or
recast ill-conditioned or ill-posed problems into numerically stable ones
for a wide variety of nontrivial cases would be a surefire way to do so.
:)

--

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: James Tursa

Date: 9 Mar, 2009 20:24:01

Message: 41 of 50

dpb <none@non.net> wrote in message <gp3qf0$2pf$1@aioe.org>...
>
> It is, it seems to me, that the most important thing to take away from
> this thread is that such minutiae as to prescribe floating point
> operations to be reproducible to a lsb is simply beyond the range to
> which any current language standard specifies the behavior of
> "conforming" implementations.
>

Except, of course, Java. Sun is apparently trying to kill Java (for any serious numerical applications, anyway) with their insistence on this very point. E.g.,

http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf

James Tursa

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Sung Soo Kim

Date: 9 Mar, 2009 22:35:04

Message: 42 of 50

@ Walter, thank you for your links. It was interesting to read those studies, though I think they're designed for special purposes. If that were implemented in my PC, I would hate my PC a lot. Thank you for your post.


@ James, thank you for your link. I enjoyed it very much. I cannot agree more with the author's points, and I learned a lot. I didn't know anything about Java's FP issues, and I didn't know about MS's bad policy that we cannot access the extra-precision registers. I didn't know that 10-byte FP is actually an IEEE standard, which makes me happier. :) I also read that matrix computation is one of the few examples that may sacrifice associativity (which is related to accuracy and the programmer's intention) for speed, which is the conclusion I reached in this long thread. Anyway, it was a really good read, a reminder of the basics. I like it. Thanks a lot.


"James Tursa" <aclassyguywithaknotac@hotmail.com> wrote in message <gp3tt1$kl7$1@fred.mathworks.com>...
> dpb <none@non.net> wrote in message <gp3qf0$2pf$1@aioe.org>...
> >
> > It is, it seems to me, that the most important thing to take away from
> > this thread is that such minutiae as to prescribe floating point
> > operations to be reproducible to a lsb is simply beyond the range to
> > which any current language standard specifies the behavior of
> > "conforming" implementations.
> >
>
> Except, of course, Java. Sun is apparently trying to kill Java (for any serious numerical applications, anyway) with their insistence on this very point. E.g.,
>
> http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf
>
> James Tursa

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Walter Roberson

Date: 10 Mar, 2009 17:58:22

Message: 43 of 50

Sung Soo Kim wrote:
> I didn't know that 10byte FP is actually IEEE standard, which makes me happier. :)

Note that it is optional within that standard: the standard says that if
you implement it at all (and want to claim conformance to the standard), do
it such and such a way, but it doesn't require that hardware implement it.

Another part of the IEEE 754 standard that was added in the second half of 2008
was decimal arithmetic. My understanding is that IBM has a working implementation,
though I do not recall which model line.


So... what is specified by a programming language standard might be considerably
more lax than is specified by a particular arithmetic standard. If you are lucky,
the programming language standard does not actively prohibit the arithmetic
standard from being used to its full extent, but general purpose programming
languages intended to be used on multiple architectures are usually deliberately
a bit vague on arithmetic details, not wanting to lock themselves out of
being implemented for a reasonable market, and not wanting to lock themselves
out of being implemented in some future technology where some constraints that
seem like good ideas now "for reproducibility" might later prove to be a
technical burden.


And then we have the side issue that there *is* no programming language
standard for Matlab. Matlab is defined by how it actually operates, rather
than by some written specification. If there is a difference between how
Matlab actually operates and how it is -documented- to operate,
sometimes the operation is changed to match the documentation and sometimes
the documentation is changed to match the operation. Which, in the longer
view is probably how it should be, in that studies (over several programming
languages) have shown that automatic generation of code from specifications
does NOT produce bug-free code because it turns out that specifications
(having been written by humans) have bugs. (If you have a specification
notation that is sufficiently powerful to express large complex programs,
then the specification notation is -itself- a computer language.) But
it's still a pain for Matlab users, in that it is often not clear to
the outside how TMW decides which side to fix -- and they *have* been
known to entrench what to many of us would appear to be obvious bugs.

With no offense intended to The Mathworks: if *I* were building software
in which the uttermost practical correctness and reproducibility of the
calculations was important, then I would not use Matlab. For anyone
concerned about numeric analysis, the clue should be glaringly obvious:
NONE of Matlab's library routines (that I have seen) document the precision
to be expected from the operation (other than perhaps the Fixed Point toolbox.)
And more subtly but still obviously important: Matlab provides no basic utility
routines such as "sum vector, optimized for highest accuracy".

>> a = randn(1,10000);
>> sum(a)

ans =

           10.574403301529

>> sum(sort(a))

ans =

            10.57440330154
>> [b,c]=sort(a);
>> sum(fliplr(b(b<0))) + sum(b(b>0))

ans =

          10.5744033015403

But no "sum_accurately" routine to do it right :(

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: dpb

Date: 10 Mar, 2009 21:57:02

Message: 44 of 50

Walter Roberson wrote:
...
> And more subtly but still obviously important: Matlab provides no basic utility
> routines such as "sum vector, optimized for highest accuracy".
>
>>> a = randn(1,10000);
>>> sum(a)
> ans = 10.574403301529
>>> sum(sort(a))
> ans = 10.57440330154
>>> [b,c]=sort(a);
>>> sum(fliplr(b(b<0))) + sum(b(b>0))
> ans = 10.5744033015403
>
> But no "sum_accurately" routine to do it right :(

And for another demonstration...

C:\Temp> *type walt.f90
program walt
   implicit none
   integer, parameter :: n = 10000
   integer :: idx(n)
   real*8 :: a(n), b(n)

   call drnnor(n, a)
   write(*,*) 'Sum intrinsic = ', sum(a)

   call dsortqx('a', n, a, 1, idx)
   b = a(idx)
   write(*,*) 'Sorted sum c = ', sum(b)

   !sum(fliplr(b(b<0))) + sum(b(b>0))
end program

C:\Temp> walt
  Sum intrinsic = -109.962223340005
  Sorted sum c = -109.962223340014

The fliplr took a little thought in Fortran so I didn't bother to take
the time.

Interestingly, it appears the IMSL RNG is somewhat more biased than the
ML version as the average mean is -0.01 instead of closer to 0.001 in
your particular case. I didn't do any multiple trials to see how much
variability there might be there or anything else.

--

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Tim Davis

Date: 16 Mar, 2009 15:54:13

Message: 45 of 50

"Sung Soo Kim" <sskimbox@aol.com> wrote in message <gopada$bj8$1@fred.mathworks.com>...

> About '+' operator. Actually, Peter pointed out that my memory was actually right, that a+b+c is (a+b)+c, not a+(b+c). But it might be changed in certain circumstances (like a gray area that standard cannot step in that someone mentioned above, like 80bit FP?). If you have an complex enough example that ISO committee can disagree on the behavior of a+b+c, please let me know. I'm curious.
>

No, your understanding is incorrect. The Backus-Naur form of the grammar may specify that a + b + c is parsed as expression + c, where the expression is a+b. However, that does not at all specify the actual order of operations. The only thing that is guaranteed is that the operations are completed when reaching a sequence point (such as the semicolon at the end of a statement).

Try computing a = f() + g() + h(), and put a printf statement in the functions f and g and h. Then try compiling with optimizations turned on and turned off, and on many different compilers. Don't bother to use floating point. See how many variations you can find. Exercise left to the reader ... but even if you find no variations, be assured that any C or C++ compiler is free to evaluate the functions in any order. See for example http://en.wikipedia.org/wiki/Sequence_point .

If your computation is so sensitive that it changes when a single bit is changed in the least significant digit, then you're trying to compute something uncomputable, at least at that precision, and you should expect to get nonsensical and irreproducible results. Recast your computation; don't try to bash floating-point (or MATLAB) for not doing what it's not designed to do.

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Tim Davis

Date: 16 Mar, 2009 17:25:15

Message: 46 of 50

Walter Roberson <roberson@hushmail.com> wrote in message ...

> With no offense intended to The Mathworks: if *I* were building software
> in which the uttermost practical correctness and reproducibility of the
> calculations was important, then I would not use Matlab. For anyone
> concerned about numeric analysis, the clue should be glaringly obvious:
> NONE of Matlab's library routines (that I have seen) document the precision
> to be expected from the operation (other than perhaps the Fixed Point toolbox.)

Would you avoid using LAPACK, too?

> And more subtly but still obviously important: Matlab provides no basic utility
> routines such as "sum vector, optimized for highest accuracy".
>
> >> a = randn(1,10000);
> >> sum(a)
>
> ans =
>
> 10.574403301529
>
> >> sum(sort(a))
>
> ans =
>
> 10.57440330154
> >> [b,c]=sort(a);
> >> sum(fliplr(b(b<0))) + sum(b(b>0))
>
> ans =
>
> 10.5744033015403
>
> But no "sum_accurately" routine to do it right :(

A "sum_accurately" routine for a vector of length 10000 would require that the intermediate calcuations be done in rougly 10000-bit arithmetic, to ensure perfect accuracy. By "perfect" I mean "compute the result in exact arithmetic, and then round the result to a 64-bit floating-point result". That's not a job for floating-point, or at least for standard floating-point. It would be very slow. The symbolic toolbox can do it:

>
>> n=1000;
>> 1000/3.3

ans =

  303.0303

>> digits(304)
>> a=rand(n,1);
>> tic;s=sum(a);toc
Elapsed time is 0.000230 seconds.
>>
>> tic;t=sum(vpa(a));toc
Elapsed time is 9.849623 seconds.
>> whos
  Name        Size            Bytes  Class     Attributes

  a        1000x1              8000  double
  ans         1x1                 8  double
  n           1x1                 8  double
  s           1x1                 8  double
  t           1x1               184  sym


>>
>> fprintf ('%32.16g\n', s)
               488.8326128652642
>>
>> t
 
t =
 
488.832612865264450885405267399619333446025848388671875
 
>> y=double(t)

y =

  488.8326

>> s-t
 
ans =
 
-0.000000000000255351295663786004297435283660888671875
 
>> (s-y)/s

ans =

  -4.6514e-16

So the sum in plain floating-point was off by about 2*eps relative error. Pretty good, I think, considering it was also 42,000 times faster than the symbolic toolbox.

For this n=1000 example, I could have gotten away with about 56 decimal digit arithmetic, though.

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Christopher Creutzig

Date: 17 Mar, 2009 13:16:59

Message: 47 of 50

Tim Davis wrote:

> A "sum_accurately" routine for a vector of length 10000 would require
> that the intermediate calcuations be done in rougly 10000-bit
> arithmetic, to ensure perfect accuracy. By "perfect" I mean "compute
> the result in exact arithmetic, and then round the result to a 64-bit
> floating-point result". That's not a job for floating-point, or
> at least for standard floating-point. It would be very slow. The

 Not really. See, e.g.,
http://www.ti3.tu-harburg.de/paper/rump/RuOgOi06.pdf for an algorithm
that performs sum_accurately using nothing but floating-point operations.
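
 For a flavor of that family of techniques, here is plain Kahan compensated
summation, much simpler than the error-free transformations in the paper but
likewise using nothing but ordinary floating-point operations:

% Compensated (Kahan) summation: carry the rounding error of each addition
% along in a correction term instead of discarding it.
x = randn(1, 10000);      % any data vector
s = 0;                    % running sum
c = 0;                    % compensation for lost low-order bits
for k = 1:numel(x)
    y = x(k) - c;         % apply the correction to the next term
    t = s + y;            % big + small: low-order bits of y may be lost here
    c = (t - s) - y;      % algebraically zero; in floating point it recovers the lost part
    s = t;
end
s - sum(x)                % compensated sum vs. plain left-to-right sum

 On data like this, the compensated sum typically lands much closer to the
exactly rounded result than a plain left-to-right sum does.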



Christopher

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Bruno Luong

Date: 17 Mar, 2009 19:29:01

Message: 48 of 50

Christopher Creutzig <Christopher.Creutzig@mathworks.com> wrote in message <49BFA2CB.1070908@mathworks.com>...
>
> Not really. See, e.g.,
> http://www.ti3.tu-harburg.de/paper/rump/RuOgOi06.pdf for an algorithm
> that performs sum_accurately using nothing but floating-point operations.
>

Thanks, Christopher, for the link to the paper. That's interesting.

Bruno

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: alain barraud

Date: 20 Apr, 2009 10:16:01

Message: 49 of 50

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <gpotlt$6r1$1@fred.mathworks.com>...
> Christopher Creutzig <Christopher.Creutzig@mathworks.com> wrote in message <49BFA2CB.1070908@mathworks.com>...
> >
> > Not really. See, e.g.,
> > http://www.ti3.tu-harburg.de/paper/rump/RuOgOi06.pdf for an algorithm
> > that performs sum_accurately using nothing but floating-point operations.
> >
>
> Thanks Christopher for link to the the paper. That's interesting.
>
> Bruno
Hi,
I have written C MEX files implementing ksum and accsum. If you are interested, send me an email. Dot-product computations based upon these codes are also available.
Sincerely yours.

Subject: Very weird resolution issue. Bug ????? Seriously !!

From: Bruno Luong

Date: 20 Apr, 2009 11:16:02

Message: 50 of 50

"alain barraud" <abc.consultant@wanadoo.fr> wrote in message <gshi11$q42$1@fred.mathworks.com>...

> hi
> I have written C mex files implementing ksum and accsum. If you are interested in send me an email. Dot computations based upun these codes are also available.
> sincerly yours.

Alain,

Can you post your code on FEX?

Thanks,

Bruno
