Fast Fourier transform—optimized for HDL code generation

Library: Transforms (`dspxfrm3`)

The FFT HDL Optimized block implements a pipelined Radix 2^2 FFT algorithm that provides hardware speed and area optimization for streaming data applications. The block accepts scalar or vector input of real or complex data, provides hardware-friendly control signals, and has optional output frame control signals. You can achieve giga-sample-per-second (GSPS) throughput using vector input.

The FFT HDL Optimized block replaces the HDL Streaming FFT block.

This FFT HDL Optimized block icon shows all optional ports available.

This table describes the port signals.

Port | Direction | Description | Data Type |
---|---|---|---|
`dataIn` | Input | Scalar or column vector of real or complex input data. The vector size must be a power of 2 from 1 to 64 and must not be greater than the FFT length. | `fixdt()`, `int64/32/16/8`, `uint64/32/16/8`. `double` and `single` are allowed for simulation but not for HDL code generation. |
`validIn` | Input | Indicates that the input data is valid. When `validIn` is `true`, the block captures the value on `dataIn`. | `boolean` |
`reset` | Input | Optional. Resets internal state. When `reset` is `true`, the block stops the current calculation and clears all internal state. The block begins fresh calculations when `reset` is `false` and `validIn` starts a new frame. | `boolean` |
`dataOut` | Output | Frequency channel output data. By default, the output order is bit reversed. | Same as `dataIn`. If scaling is disabled, the word length grows by 1 bit per butterfly stage (2 bits per Radix 2^2 stage). |
`validOut` | Output | Indicates that the output data is valid. The block sets `validOut` to `true` with each valid sample on `dataOut`. | `boolean` |
`startOut` | Output | Optional. When this port is enabled, the block sets `startOut` to `true` during the first valid cycle of a frame of output data. | `boolean` |
`endOut` | Output | Optional. When this port is enabled, the block sets `endOut` to `true` during the last valid cycle of a frame of output data. | `boolean` |
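The `dataIn` rules in the table, together with the HDL code generation limit on **FFT length**, can be collected into a quick pre-check. The helper below is a hypothetical Python sketch (`check_fft_hdl_config` is not part of any MathWorks product), and it takes the data type as a plain string for simplicity.

```python
def check_fft_hdl_config(fft_length: int, vector_size: int, dtype: str,
                         for_hdl: bool = True) -> list[str]:
    """Return a list of rule violations (empty if the configuration is acceptable)."""
    def is_pow2(n: int) -> bool:
        return n > 0 and (n & (n - 1)) == 0

    problems = []
    # dataIn: vector size is a power of 2 from 1 to 64 ...
    if not is_pow2(vector_size) or vector_size > 64:
        problems.append("vector size must be a power of 2 from 1 to 64")
    # ... and not greater than the FFT length.
    if vector_size > fft_length:
        problems.append("vector size must not exceed the FFT length")
    # Floating-point types are for simulation only.
    if for_hdl and dtype in ("double", "single"):
        problems.append("double/single are not supported for HDL code generation")
    # FFT length limit for HDL code generation: power of 2 from 2^3 to 2^16.
    if for_hdl and not (is_pow2(fft_length) and 2**3 <= fft_length <= 2**16):
        problems.append("FFT length must be a power of 2 from 2^3 to 2^16")
    return problems
```

For example, a 1024-point FFT with two `int16` samples per cycle passes, while a 4-point `double` FFT with 8-sample vectors violates three rules at once.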

**Architecture**

`Streaming Radix 2^2` (default): An efficient architecture that has lower latency and uses fewer resources than the previous version.

`Streaming Radix 2`: The architecture used in versions before R2016a. Use it only for backward compatibility.

**Complex Multiplication**

Select the HDL implementation of complex multipliers. Each multiplication is implemented with either 3 multipliers and 5 adders, or 4 multipliers and 2 adders. Which option is faster or smaller depends on your synthesis tool and target device. This option applies only when you set **Architecture** to `Streaming Radix 2^2`.

**Output in bit-reversed order**

When you select this check box, the output elements are bit reversed relative to the input order. Clear the check box to output elements in linear order. By default, the check box is selected. The FFT algorithm naturally produces output in the opposite order to the input, and performs an extra reordering step only when the output must be in the same order as the input. For vector data, the input and output must be in opposite orders, so select exactly one of **Output in bit-reversed order** and **Input in bit-reversed order**. For more information, see Linear and Bit-Reversed Output Order.

**Input in bit-reversed order**

When you select this check box, the block expects input data in bit-reversed order. By default, the check box is clear and the block expects input in linear order. The same ordering rule applies as for the output: for vector data, select exactly one of **Output in bit-reversed order** and **Input in bit-reversed order**. For more information, see Linear and Bit-Reversed Output Order.

**Divide butterfly outputs by two**

When you select this check box, the block implements an overall 1/*N* scale factor by dividing the result of each butterfly stage by 2. This adjustment keeps the output of the FFT in the same amplitude range as its input. If scaling is disabled, the block avoids overflow by increasing the word length by 1 bit at each butterfly stage. By default, the check box is not selected.

**FFT length**

Specify the number of data points used for one FFT calculation. The default value is 1024. For HDL code generation, the FFT length must be a power of 2 from 2^3 to 2^16 (8 to 65536).
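To see what the **Complex Multiplication** option trades off, here is a minimal Python sketch of both ways to form the product (a + jb)(c + jd). The 3-multiplier version is the well-known rearrangement often attributed to Gauss; how the block's generated HDL actually arranges its multipliers and adders is not described on this page, so treat this as an illustration of the arithmetic only. The function names are hypothetical.

```python
def cmul_4mult(a, b, c, d):
    """(a + jb)(c + jd) using 4 real multiplies and 2 add/subtracts."""
    return a * c - b * d, a * d + b * c


def cmul_3mult(a, b, c, d):
    """The same product using 3 real multiplies and 5 add/subtracts."""
    k1 = c * (a + b)   # 1 multiply, 1 add
    k2 = a * (d - c)   # 1 multiply, 1 subtract
    k3 = b * (c + d)   # 1 multiply, 1 add
    # Real part: ac + bc - bc - bd; imaginary part: ac + bc + ad - ac.
    return k1 - k3, k1 + k2  # 2 more add/subtracts
```

Both functions return the same result. The 3-multiplier form saves one multiplier at the cost of extra additions, which is one reason the better choice depends on whether multipliers or general logic are scarcer on your target.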

**Rounding Method**

The default rounding method for internal fixed-point calculations is `Floor`. The block uses fixed-point arithmetic for internal calculations when the input is any integer or fixed-point data type. This option does not apply when the input is `single` or `double`. Each stage rounds after the twiddle factor multiplication but before the butterflies.
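`Floor` rounding discards the bits below the target precision, which in two's complement means rounding toward minus infinity. The following minimal Python sketch assumes a plain quantizer with `frac_bits` fractional bits. The `quantize` helper and its `"nearest"` option are illustrative only and are not block settings.

```python
import math


def quantize(x: float, frac_bits: int, method: str = "floor") -> float:
    """Quantize x to a fixed-point grid with frac_bits fractional bits."""
    scaled = x * (1 << frac_bits)
    if method == "floor":
        q = math.floor(scaled)        # round toward minus infinity (drop LSBs)
    elif method == "nearest":
        q = math.floor(scaled + 0.5)  # round half up, for comparison
    else:
        raise ValueError(f"unknown method: {method}")
    return q / (1 << frac_bits)
```

With 2 fractional bits, `quantize(0.3, 2)` gives 0.25 and `quantize(-0.3, 2)` gives -0.5. Because `Floor` never rounds up, its error is always zero or negative. In exchange for that small negative bias, it needs no rounding adder.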

**Enable reset input port**

When you select this check box, the `reset` port is present on the block icon. When `reset` is `true`, the block stops the current calculation and clears all internal state. The block begins fresh calculations when `reset` is `false` and `validIn` starts a new frame. By default, the check box is not selected.

**Enable start output port**

When you select this check box, the `startOut` port is present on the block icon, and the block asserts this signal (`true`) for the first cycle of an output frame. By default, the check box is not selected.

**Enable end output port**

When you select this check box, the `endOut` port is present on the block icon, and the block asserts this signal (`true`) for the last cycle of an output frame. By default, the check box is not selected.

The Radix 2^2 architecture saves resources by factoring and
grouping the FFT equation. The architecture has $\log_4 N$
stages. For example, an FFT length of 1024 uses 5 stages. Each stage
contains two single-path delay feedback (SDF) butterflies with memory
controllers. If you use vector input, the block shares the results of
twiddle factor multiplication across multiple data points, without
memory accesses.
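The stage count and the unscaled word-length growth follow from the description in this section. The following Python sketch shows that arithmetic and assumes 1 bit of growth per butterfly when **Divide butterfly outputs by two** is cleared. `fft_pipeline_summary` is a hypothetical helper. When N is an odd power of 2, $\log_4 N$ is not an integer, and this page does not say how the block handles the leftover radix-2 butterfly, so the sketch only reports the fractional value.

```python
def fft_pipeline_summary(fft_length: int, input_wl: int, scaled: bool):
    """Return (butterfly stages, Radix 2^2 stages, output word length)."""
    if fft_length < 1 or fft_length & (fft_length - 1):
        raise ValueError("FFT length must be a power of 2")
    n_bits = fft_length.bit_length() - 1   # log2(N)
    butterflies = n_bits                   # one radix-2 butterfly stage per factor of 2
    radix22_stages = n_bits / 2            # log4(N); fractional for odd powers of 2
    # Assumption: without scaling, each butterfly stage adds 1 bit.
    output_wl = input_wl if scaled else input_wl + butterflies
    return butterflies, radix22_stages, output_wl
```

For a 1024-point FFT with 16-bit input, this gives 10 butterfly stages in 5 Radix 2^2 stages. The output word length is 26 bits without scaling and stays at 16 bits with scaling.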

The first SDF butterfly in each stage is a regular butterfly. The
second multiplies by *–j* without a multiplier: it swaps the real and
imaginary parts of its input, and then swaps the imaginary parts of
its two outputs. Each stage rounds the result of the twiddle factor
multiplication to the input word length and stores it in memory for
use by later stages. The twiddle factors have the same word length as
the input data, with 2 integer bits; the remaining bits are fractional.
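The multiplier-free *–j* step can be checked with Python complex numbers. This sketch follows the swap sequence described above for one butterfly pair. Real hardware operates on separate fixed-point real and imaginary signals, and `butterfly_minus_j` is a hypothetical name.

```python
def butterfly_minus_j(x0: complex, x1: complex) -> tuple[complex, complex]:
    """Compute x0 + (-j)x1 and x0 - (-j)x1 using only swaps, adds, and subtracts."""
    # Swap the real and imaginary parts of the input x1.
    s = complex(x1.imag, x1.real)
    # Ordinary add/subtract butterfly on the swapped input.
    u = x0 + s
    v = x0 - s
    # Swap the imaginary parts of the two outputs.
    return complex(u.real, v.imag), complex(v.real, u.imag)
```

For x0 = 1 + 2j and x1 = 3 - 4j, the function returns (-3 - 1j, 5 + 5j). This matches x0 + (-j)x1 and x0 - (-j)x1 computed directly.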

If you select **Divide butterfly outputs by two**,
the block divides the result of each butterfly stage by 2. Scaling
at each stage avoids overflow, keeps the word length the same as the
input, and results in an overall scale factor of 1/*N*.
If scaling is disabled, the block avoids overflow by increasing the
word length by 1 bit at each stage. The diagram shows the butterflies
and internal word lengths of each stage, not including the memory.
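The effect of per-stage scaling and the natural bit-reversed output order can be seen in a short floating-point model. This is a plain radix-2 decimation-in-frequency FFT, a simpler relative of the block's pipelined Radix 2^2 SDF architecture, and not a model of it. `fft_dif` and `bit_reverse` are hypothetical helpers.

```python
import cmath


def bit_reverse(i: int, n_bits: int) -> int:
    """Reverse the lowest n_bits bits of index i."""
    r = 0
    for _ in range(n_bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_dif(x, scale: bool = True):
    """Radix-2 decimation-in-frequency FFT; output is in bit-reversed order."""
    a = [complex(v) for v in x]
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    span = n // 2
    while span >= 1:                      # one butterfly stage per factor of 2
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))  # twiddle factor
                top, bot = a[start + k], a[start + k + span]
                s, d = top + bot, (top - bot) * w
                if scale:                 # divide each butterfly output by two
                    s, d = s / 2, d / 2
                a[start + k], a[start + k + span] = s, d
        span //= 2
    return a
```

With `scale=True`, each of the log2(N) butterfly stages halves its outputs. So `fft_dif(x)[i]` equals X[`bit_reverse(i, log2(N))`] / N, where X is the unscaled DFT. For N = 8, the bit-reversed index order is 0, 4, 2, 6, 1, 5, 3, 7.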

Input data is processed only when `validIn` is high. Output data is
valid only when `validOut` is high.

The block provides an optional reset port. When `reset` is high, the
block stops the current calculation and clears all internal state. The
block begins fresh calculations when `reset` is low and `validIn`
starts a new frame.
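The `validIn` and `reset` rules above amount to a simple capture process. This Python class is a hypothetical behavioral sketch of the input side only. It ignores latency, the FFT computation, output timing, and vector input, and it assumes that a sample arriving in the same cycle as `reset` is discarded.

```python
class FrameCollector:
    """Behavioral sketch of the input side: gather N valid samples per frame."""

    def __init__(self, fft_length: int):
        self.n = fft_length
        self.buffer = []

    def step(self, data, valid_in: bool, reset: bool = False):
        """Advance one clock cycle; return a full frame when one completes."""
        if reset:                       # abandon the frame and clear internal state
            self.buffer.clear()
            return None
        if valid_in:                    # samples are captured only when validIn is true
            self.buffer.append(data)
            if len(self.buffer) == self.n:
                frame, self.buffer = self.buffer, []
                return frame
        return None
```

Samples on cycles where `validIn` is false are ignored, so a noncontiguous input stream still fills a frame. After `reset`, the next valid sample starts a new frame.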

This diagram illustrates the `validIn` and `validOut` signals for
contiguous scalar input data and an FFT length of 1024.

The diagram also shows the optional `startOut` and `endOut` signals
that indicate frame boundaries. If you enable `startOut`, it pulses
for one cycle with the first `validOut` of the frame. If you enable
`endOut`, it pulses for one cycle with the last `validOut` of the
frame.
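Given a `validOut` stream, the `startOut` and `endOut` pulses described above can be derived by counting valid samples. This is a minimal sketch that assumes scalar output (one sample per valid cycle), a known FFT length, and no reset. `frame_markers` is hypothetical.

```python
def frame_markers(valid_out: list[bool], fft_length: int):
    """Derive startOut/endOut flags from a validOut stream, N samples per frame."""
    start, end, count = [], [], 0
    for v in valid_out:
        start.append(v and count == 0)                # first valid sample of a frame
        end.append(v and count == fft_length - 1)     # last valid sample of a frame
        if v:
            count = (count + 1) % fft_length
    return start, end
```

For `validOut = [False, True, True, True, True, False]` and N = 4, `startOut` is true only on cycle 1 and `endOut` only on cycle 4.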

The `validIn` signal can be noncontiguous. Data accompanied by an
asserted `validIn` is stored until a frame is filled, and is then
output in a contiguous frame of *N* (FFT length) cycles. This diagram
illustrates noncontiguous scalar input and contiguous scalar output
for an FFT length of 1024.

The latency varies with the FFT length and input vector size. After you update the model, the latency is displayed on the block icon. The displayed latency is the number of cycles between the first valid input and the first valid output, assuming the input is contiguous.

This block supports HDL code generation using HDL Coder™. HDL Coder provides additional configuration options that affect HDL implementation and synthesized logic. For more information on implementations, properties, and restrictions for HDL code generation, see FFT HDL Optimized in the HDL Coder documentation.

These resource and performance data are the synthesis results
from the generated HDL targeted to a Xilinx® Virtex®-6 (XC6VLX75T-1FF484)
FPGA. The three examples in the tables have this configuration:

Setting | Value |
---|---|
FFT length | 1024 (default) |
Complex multiplication | 3 multipliers, 5 adders (default) |
Output scaling | Enabled |
Input data | 16-bit complex |
Minimize clock enables (HDL Coder parameter) | Enabled |

Performance of the synthesized HDL code varies with your target and synthesis options. For instance, natural order output uses more RAM than bit-reversed output, and real input uses less RAM than complex input.

For a scalar-input Radix 2^2 configuration, the design achieves a clock frequency of 326 MHz. The latency is 1116 cycles. It uses these resources.

Resource | Uses |
---|---|
LUTs | 4597 |
FFs | 5353 |
Xilinx LogiCORE DSP48 | 12 |
Block RAM (16K) | 6 |

When you vectorize the same Radix 2^2 implementation to process two 16-bit input samples in parallel, the design achieves a clock frequency of 316 MHz. The latency is 600 cycles. It uses these resources.

Resource | Uses |
---|---|
LUTs | 7653 |
FFs | 9322 |
Xilinx LogiCORE DSP48 | 24 |
Block RAM (16K) | 8 |

The Radix 2 implementation is supported with scalar input data only. The Radix 2 design achieves a clock frequency of 295 MHz. The latency is 1148 cycles. It uses these resources.

Resource | Uses |
---|---|
LUTs | 4060 |
FFs | 5160 |
Xilinx LogiCORE DSP48 | 16 |
Block RAM (16K) | 6 |
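The table figures can be converted into peak throughput and wall-clock latency. This Python sketch assumes one new input sample (or vector) every clock cycle at the reported frequency. That is an idealization, and it does not account for gaps in `validIn`. The helper names are hypothetical.

```python
# Figures copied from the three synthesis results above.
configs = {
    "Radix 2^2, scalar": {"f_mhz": 326, "samples_per_cycle": 1, "latency_cycles": 1116},
    "Radix 2^2, vector of 2": {"f_mhz": 316, "samples_per_cycle": 2, "latency_cycles": 600},
    "Radix 2, scalar": {"f_mhz": 295, "samples_per_cycle": 1, "latency_cycles": 1148},
}


def peak_throughput_msps(f_mhz, samples_per_cycle):
    """Peak sample rate in Msamples/s, assuming a new input every clock cycle."""
    return f_mhz * samples_per_cycle


def latency_us(latency_cycles, f_mhz):
    """Latency in microseconds at the reported clock frequency."""
    return latency_cycles / f_mhz


for name, c in configs.items():
    print(f"{name}: {peak_throughput_msps(c['f_mhz'], c['samples_per_cycle'])} MS/s, "
          f"{latency_us(c['latency_cycles'], c['f_mhz']):.2f} us")
```

The scalar Radix 2^2 design works out to about 326 MS/s with 3.42 µs latency. The two-sample vector design reaches 632 MS/s with 1.90 µs, and the Radix 2 design 295 MS/s with 3.89 µs. Vectorizing roughly doubles throughput for a small drop in clock frequency.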

