|
On 08/03/2011 19:29, Albert van der Horst wrote:
> In article<0df27dd3-9223-4fc5-ba5a-e46a3bca05cb@b13g2000prf.googlegroups.com>,
> Felipe G. Nievinski<fgnievinski@gmail.com> wrote:
>> The documentation for Matlab's fftw interface states that:
>>
>> fftw('planner', method) sets the method by which the tuning
>> algorithm searches for a good FFT algorithm when the dimension of the
>> FFT is _not_ a power of 2.
>>
>> So when the input dimension _is_ a power of 2 there is no potential
>> speed up in tuning the FFTW library? I seem to be getting different
>> performance for different tuning even with power-2 input.
I'd expect there to be some, but not a great gain over the initial
assumption of using the largest radix that is suitable for the length.
>
> No. The speed attainable depends on the dimension of the FFT,
> and the algorithms differ considerably accordingly.
> FFT's for sizes with many small factors (144, 180) can have
> a speed advantage over comparably sized powers of 2.
Actually it begs an interesting practical question to list all the FFT
lengths on a given platform for which there is a strict advantage in
choosing a non-power of two transform when you do have a choice.
I reckon there are two interesting cases worth searching for.
Larger non-power of two transforms that are faster in absolute timing
than the next smallest power of two transform - and these are rare.
On my FFT code for instance only 36 as 6^2 beats 32 as 2^5.
Non-power of two transforms that are faster than the next highest power
of two transform - these are far more common and useful in practice.
Listing only the ones that are MIPS wise competitive against optimised
powers of two culls the list down to manageable proportions. In my case:
81, 216, 729, 1296, 2560, 7776, 8000
My 8192 transform is a bit weak, or rather 4096 is particularly strong
being 16^3, 8^4, 4^6, 2^12 - though usually the largest radix wins out.
I expect FFTW will behave better since their optimisations are smarter
(and it has many more permutations of kernel available to try).
I'd be a little bit surprised if the winner for powers of two was not in
most cases predictable by the usual heuristic of taking the largest
radix available to get a direct hit or within a factor of 2 or 4 fixup.
OTOH funny things sometimes happen with cache management boundaries.
If you do benchmark FFTW for all transforms 2^n3^p5^q upto 10000 complex
real length and list the ones that meet these two criteria after tuning
I'd be interested to know what the pattern looks like.
Regards,
Martin Brown
|