Matlab updated their LSQR function around 2008. The old version was buggy, as you probably know (since you recommended a different version of LSQR), but anyone with a newer release of Matlab (2009 or later) can use the built-in LSQR function.
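For anyone who wants to try the built-in solver, here is a minimal sketch of a call to lsqr. The test matrix, tolerance, and iteration limit are made-up values for illustration only, and rng needs a reasonably recent release.

```matlab
% Made-up sparse least-squares problem with a known solution.
rng(0);                                  % reproducible random data
A = sprandn(200, 50, 0.1);               % 200-by-50 sparse matrix
xTrue = randn(50, 1);
b = A * xTrue;                           % consistent right-hand side

tol = 1e-10;                             % illustrative tolerance
maxit = 500;                             % illustrative iteration limit
[x, flag, relres, iter] = lsqr(A, b, tol, maxit);

% flag == 0 means lsqr reports convergence within maxit iterations.
relErr = norm(x - xTrue) / norm(xTrue);
fprintf('flag = %d, iterations = %d, relative error = %.2e\n', flag, iter, relErr);
```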
-Stephen
As of R2008b, Matlab's Signal Processing toolbox has the functions "fwht" and "ifwht" for the Fast Walsh-Hadamard (aka Hadamard) Transform, and you can choose among three different orderings. The built-in code doesn't work on very large vectors, though. I know it is possible to operate on such vectors, because a friend gave me some mex code that does just that. I haven't compared it with the file posted here.
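A small sketch of the three orderings, assuming the Signal Processing toolbox is installed. The input vector is arbitrary; its length is a power of two so that no zero padding takes place.

```matlab
% Arbitrary test vector of length 8 (a power of two).
x = [1 0 1 0 0 1 1 0].';
N = numel(x);

ySeq = fwht(x, N, 'sequency');   % the default ordering
yHad = fwht(x, N, 'hadamard');
yDya = fwht(x, N, 'dyadic');

% The orderings only permute the coefficients, so the sorted values agree.
permErr = max(abs(sort(ySeq) - sort(yHad)));

% Round trip: the inverse must be given the same ordering as the forward call.
xBack = ifwht(yHad, N, 'hadamard');
roundTripErr = max(abs(xBack - x));
```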
There's actually a very important point here. Matlab stores a sparse matrix in compressed-column form (see Tim Davis' book, or read the help on, say, mxGetIr and mxGetJc), and a major bottleneck for these computations is loading data into the CPU's cache. Because of this, matrix-vector multiplies are MUCH faster when the matrix is stored by row instead of by column.
In Matlab, this is easy to remedy: simply take the transpose.
So, this cool trick can save you time. If you want to do this repeatedly:
>> y = A*b
instead, do this:
>> At = A.'; % a one-time cost
>> y = At.'*b
But, this trick only works on newer versions of Matlab. On, say, R2006b, a call like
>> y = At.'*b
will actually calculate the transpose of At, so this is very slow!
On older versions of Matlab, it's likely that this SMVP code will really help, but it might not be as helpful on R2008, for example.
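To check whether the transpose trick pays off on a particular machine and release, a rough timing sketch along these lines can help. The matrix size, density, and repetition count are arbitrary choices, and the timings depend heavily on the Matlab version and on multithreading, so either variant may come out ahead.

```matlab
% Arbitrary test problem: a large sparse matrix and a dense vector.
n = 20000;
A = sprandn(n, n, 1e-3);
b = randn(n, 1);

At = A.';            % one-time cost: store the transpose
nReps = 50;          % repeat to get a measurable time

tic;
for k = 1:nReps
    y1 = A * b;      % ordinary multiply, A stored by column
end
tDirect = toc;

tic;
for k = 1:nReps
    y2 = At.' * b;   % same product, using the stored transpose
end
tTrans = toc;

fprintf('A*b: %.3f s,  At.''*b: %.3f s\n', tDirect, tTrans);

% Both variants should agree up to rounding.
relDiff = norm(y1 - y2) / norm(y1);
```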
On Linux, with non-multithreaded R2006a, I found that SMVP took about 60% of the time of Matlab's own multiply. So, good work!
Duane -- good question. I have seen a lot of poorly written FFT code that is also MUCH slower than Matlab's fft, so this is an improvement over those codes. It also demonstrates two ideas: the radix-2 FFT algorithm, and why persistent variables are useful.
For example, using persistent variables in this fashion on the discrete cosine transform can yield code much faster than Matlab's dct() function. I hope to post such a code once I get it into polished form.
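To illustrate the two ideas, here is a minimal sketch of a recursive radix-2 FFT that keeps its twiddle factors in persistent variables. This is not the code from the submission; the names fft_radix2_sketch and recurse are hypothetical, and the input length is assumed to be a power of two.

```matlab
function X = fft_radix2_sketch(x)
% Hypothetical example: radix-2 decimation-in-time FFT of a vector.
% The twiddle factors are cached between calls in persistent variables.
persistent cachedN cachedW
x = x(:);                                % work with a column vector
N = numel(x);
assert(N >= 1 && bitand(N, N - 1) == 0, 'Length must be a power of two.');
if isempty(cachedN) || cachedN ~= N
    % Recompute only when the length changes.
    cachedN = N;
    cachedW = exp(-2i * pi * (0:N/2-1).' / N);
end
X = recurse(x, cachedW);
end

function X = recurse(x, W)
% W holds exp(-2i*pi*k/N) for k = 0 .. N/2-1, where N = numel(x).
N = numel(x);
if N == 1
    X = x;
    return
end
% Half-length sub-transforms use every other twiddle factor.
E = recurse(x(1:2:end), W(1:2:end));     % even-indexed samples
O = recurse(x(2:2:end), W(1:2:end));     % odd-indexed samples
T = W .* O;
X = [E + T; E - T];                      % butterfly step
end
```

Saved as fft_radix2_sketch.m, repeated calls with the same length reuse the cached table instead of recomputing the exponentials, which is the point of the persistent variables. It will still be slower than the built-in fft.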
If as you say "There is no advantage to using this code over the builtin fft." then why post this code? Is it simply an example of code not worth posting?