I took a fresh look at largefactorial. With its carefully written code to remove a level of looping, its a fair amount faster. The author has also provided an additional boost for multiple inputs. Some downloaders may find the tricks used here to be of interest on their own merits.
I did feel it necessary to do a comparison to an alternative code for factorial. Is one method more accurate than the other? How do they compare for speed?
function [M,X] = mylargefact(N)
% No comments provided here due to sheer laziness.
F = gammaln(N+1)/log(10);
X = floor(F);
M = 10.^(F-X);
>> tic,[M,X] = mylargefact(5000);toc,M,X
Elapsed time is 0.000366 seconds.
>> tic,[M,X] = largefactorial(5000);toc,M,X
Elapsed time is 0.004599 seconds.
So the gamma call is about 10x faster than the cumsum version. Which is more accurate? The first few digits of this factorial, computed symbolically, are:
We see that largefactorial is not quite as accurate as the gamma code, missing a digit or so towards the end.
A little larger problem shows that typically largefactorial still loses a digit or so compared to the gamma based code.
[M,X] = largefactorial(100000)
[M,X] = mylargefact(100000)
The first few digits of factorial(100000), computed symbolically, are:
Next, I tried a larger problem. Largefactorial took 24 seconds for N = 1000000 when I first tested it. This version is much faster, although still not as fast as the gamma based code.
tic,[M,X] = largefactorial(1000000);toc,M,X
Elapsed time is 0.706434 seconds.
tic,[M,X] = mylargefact(1000000);toc,M,X
Elapsed time is 0.000605 seconds.
So, I'd still suggest the use of a gammaln based algorithm for largefactorial, both for accuracy and time. However, I will admit that the directly summed version may be acceptable. I even have a sneaking admiration for the care that was put into the code.
Thank you for your comments on my submission which was (I agree) less than perfect. I have revised the code quite heavily now.
The first improvement I have implemented materializes if an array of arguments is provided. In that case, I now sort the array and compute the smallest factorials first. For the larger factorials, only the increment is computed. This results in a very significant gain in terms of speed in some situations, e.g. largefactorial(1E7,1E7+1) is almost twice as fast as before.
Secondly, I take advantage of the suggested vectorization. I should have seen this myself, so thanks for the hint.
just see the fatherly scolding by FEX elder JD [readers, please note, a ;-) emoticon IS bad news when used by JD]: ok, ok, john, i felt slightly feverish today (even had to repost due to formatting and terrible grammatical errors!) - but at least, i got the looping-bit-needs-to-be-improved...
us with a :-)
Interesting review by us. ;-)
This code uses a loop to do its work. Note that if anybody really wants to compute the factorial of 2147483647, they had better settle in with a VERY large cup of coffee. A quick test for a number a small fraction of that size gives:
Elapsed time is 24.271485 seconds.
Now just imagine how much time the loop over 2:2147483647 will require.
Had the author just used a call to gammaln to do almost all of the work, they would see a very nice gain.
Elapsed time is 0.000146 seconds.
It might also be interesting to do some analysis on how the LSB errors will add up in a sum with 2147483646 terms. You may not wish to trust those lower bits in the mantissa.
Finally, gammaln is vectorized. This code is not. The author has not even pre-allocated the arrays to be returned. So if you did call this code with a vector input, you may be waiting even longer than you needed.
Since this code is promoted as something that computes LARGE factorials, I'd consider these major speed problems a serious flaw, which is why I rated it a 2.
For those who DO choose to download this code anyway, you will find there is readable help in this function. There is an H1 line. There is an error check on the only argument. There are internal comments. I'd raise my rating given a reason to do so.
good help and a clean engine with comments make this submission certainly an asset to this community.
one small change would make it slightly better:
change the for-loop to
and you see a considerable gain in speed (vectorization of LOG10).
eg, on a wintel p5/1.4g/512m/win2k.sp4/r2007a one gets these numers (loop time/cumsum time)
log(N) 2 LOOP 0.0066 CUMSUM 0.0002
log(N) 3 LOOP 0.0262 CUMSUM 0.0012
log(N) 4 LOOP 0.2753 CUMSUM 0.0058
log(N) 5 LOOP 2.6702 CUMSUM 0.0581
log(N) 6 LOOP 26.3543 CUMSUM 0.6014
just a thought
(reposted because of formatting issues)