"Michael" <michael@circularlogic.com> wrote in message <gr15tq$f9b$1@fred.mathworks.com>...
> Here's something interesting.
>
> >> p = [1:.1:100];
> >> sum(p .* p)  sum(p * p')
> ans =
> 4.656612873077393e10
> >> eps(sum(p*p'))
> ans =
> 4.656612873077393e10
>
> The sum of elementbyelement multiplication is different than matrix multiplication. It's only off here by eps, but in my tests for other kinds of values I get larger deviations.
>
> Seems it's probably because of the algorithms working in different order and thereby having different float rounding error accumulation:
>
> >> sum(p .* p)  sum(p(991:1:1) * p(991:1:1)')
> ans =
> 0
>
> Interesting. Anyone know for sure if this is the reason?
>
> Cheers,
> Michael
I will add my 2 cents to this.
If you are on a PC, then there is a numeric coprocessor involved that does calculations in 80bit arithmetic. So the sum(p .* p) formulation does each individual elementbyelement calculation by converting each operand to 80bit, then doing the multiply, then converting each individual result back to 64bit for storage in the intermediate vector result. So an intermediate storage vector the size of p is needed. Then the sum is done, again likely by converting to 80bit, doing the sum, then converting back to 64bit.
For the sum(p * p') formulation (you don't need the sum() function here, btw), the p' is formed as a simple reshape of p with no data copying. You can see this as follows:
format debug
a = rand(1,5)
b = a'
You can see that the pr of both variables is the same.
Then the p * p' calculation is done by calling a BLAS routine, which again probably converts to 80bit in the background, but in this case the actual summing may be done in 80bit as well. i.e., the individual elementbyelement multiplies are kept in 80bit as they are added to the sum. This gives you a chance for lower cancellation errors in the sum calculation (whether you actually get much benefit here would depend on the actual data).
In short, p * p' is the better formulation for this particular calculation because it 1) doesn't require a large intermediate vector result in the calculation, 2) it probably gives you a better chance for smaller cancellation errors in the result, at least on a PC, and 3) It is faster, largely because of 1). e.g.,
>> a = rand(1,10000000);
>> tic;sum(a.*a);toc
Elapsed time is 0.273004 seconds.
>> tic;a*a';toc
Elapsed time is 0.029888 seconds.
James Tursa
