Distributed Arithmetic for HDL Filters
Distributed Arithmetic (DA) is a widely used technique for implementing sum-of-products computations without the use of multipliers. Designers frequently use DA to build efficient Multiply-Accumulate Circuitry (MAC) for filters and other DSP applications. The main advantage of DA is its high computational efficiency. DA distributes multiply and accumulate operations across shifters, lookup tables (LUTs) and adders in such a way that conventional multipliers are not required.
In a DA realization of a FIR filter structure, a sequence of input data words of width W is fed through a parallel-to-serial shift register, producing a serialized stream of bits. The serialized data is then fed to a bit-wise shift register. This shift register serves as a delay line, storing the bit-serial data samples. The delay line is tapped at intervals of W bits (the input word size), so that one bit from each stored sample forms an address that indexes into a lookup table (LUT). The LUT stores all possible sums of partial products over the filter coefficient space. The LUT is followed by a shift and adder (scaling accumulator) that adds the values obtained from the LUT sequentially.
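As a rough illustration of what the LUT holds, the following Python sketch enumerates every address and stores the sum of the coefficients selected by its set bits. It assumes one address bit per filter tap and small integer coefficients standing in for fixed-point values. The function name build_da_lut and the coefficient values are hypothetical and are not part of the generated HDL.

```python
def build_da_lut(coeffs):
    """Return a table where entry `addr` is the sum of the coefficients
    whose bit is set in `addr` (bit k of addr selects coeffs[k])."""
    n = len(coeffs)
    lut = []
    for addr in range(1 << n):
        lut.append(sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1))
    return lut


coeffs = [3, -1, 4, 2]      # hypothetical integer (fixed-point) coefficients
lut = build_da_lut(coeffs)  # 2**4 = 16 entries

print(len(lut))     # 16
print(lut[0b0101])  # coeffs[0] + coeffs[2] = 7
```

The table size doubles with every additional tap, which is why the LUT size matters for longer filters.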
A table lookup is performed sequentially for each bit (in order of significance starting from the LSB). On each clock cycle, the LUT result is added to the accumulated and shifted result from the previous cycle. For the last bit (MSB), the table lookup result is subtracted, accounting for the sign of the operand.
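The sequence described above can be modeled in software. The following Python sketch is a behavioral model only, under stated assumptions: integer coefficients, signed two's-complement samples that fit in `width` bits, and one address bit per tap. The name da_fir_output is hypothetical. The model weights each LUT value by its bit position rather than shifting the accumulator, which gives the same arithmetic result.

```python
def da_fir_output(coeffs, samples, width):
    """Compute sum(c * x) one bit position per step, LSB first.

    `samples` are signed integers representable in `width` bits.
    """
    n = len(coeffs)
    # LUT entry `addr` = sum of coefficients selected by the set bits of addr.
    lut = [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
           for addr in range(1 << n)]
    mask = (1 << width) - 1
    words = [x & mask for x in samples]  # two's-complement bit patterns

    acc = 0
    for bit in range(width):             # one table lookup per clock cycle
        addr = 0
        for k, word in enumerate(words):
            addr |= ((word >> bit) & 1) << k
        # Hardware would shift the accumulator instead; weighting the LUT
        # value by 2**bit is the arithmetic equivalent.
        term = lut[addr] << bit
        if bit == width - 1:
            acc -= term                  # MSB has weight -2**(width-1)
        else:
            acc += term
    return acc


coeffs = [3, -1, 4, 2]    # hypothetical coefficients
samples = [5, -3, 7, -8]  # hypothetical 4-bit signed samples
print(da_fir_output(coeffs, samples, 4))           # 30
print(sum(c * x for c, x in zip(coeffs, samples)))  # 30
```

The subtraction on the final step is what makes negative samples come out correctly, because the MSB of a two's-complement word carries a negative weight.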
This basic form of DA is fully serial, operating on one bit at a time. If the input data is W bits wide, then a FIR structure takes W clock cycles to compute the output. Symmetric and asymmetric FIR structures are an exception, requiring W+1 cycles, because one additional clock cycle is needed to process the carry bit of the preadders.
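To see where the extra cycle comes from, consider a bit-serial adder that sums two W-bit samples one bit per cycle. The sketch below is an illustrative model, not the logic that the coder generates. It assumes signed two's-complement inputs and repeats the sign bit on the final cycle. The name serial_preadd is hypothetical.

```python
def serial_preadd(a, b, width):
    """Add two signed `width`-bit values one bit per cycle, LSB first.

    Returns (sum, cycles). The sum needs width + 1 bits, so the adder
    runs for width + 1 cycles.
    """
    mask = (1 << width) - 1
    ua, ub = a & mask, b & mask
    carry = 0
    result_bits = 0
    cycles = width + 1
    for cycle in range(cycles):
        pos = min(cycle, width - 1)  # repeat the sign bit on the extra cycle
        bit_a = (ua >> pos) & 1
        bit_b = (ub >> pos) & 1
        sum_bit = bit_a ^ bit_b ^ carry
        carry = (bit_a & bit_b) | (carry & (bit_a ^ bit_b))
        result_bits |= sum_bit << cycle
    # Interpret the (width + 1)-bit pattern as two's complement.
    if (result_bits >> width) & 1:
        result_bits -= 1 << (width + 1)
    return result_bits, cycles


print(serial_preadd(7, 7, 4))    # (14, 5)
print(serial_preadd(-8, -8, 4))  # (-16, 5)
```

Neither 14 nor -16 fits in four signed bits, so the fifth cycle is needed to emit the final sum bit.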
You can control how DA code is generated by using the DALUTPartition and DARadix implementation parameters. The DALUTPartition and DARadix parameters have certain requirements and restrictions that are specific to different filter types. These requirements are included in the discussions of each parameter.
Reduce LUT Size: DALUTPartition
Improve Performance with Parallelism: DARadix
For information on the theoretical foundations of DA, see Further References.
Requirements and Considerations for Generating Distributed Arithmetic Code
Fixed-Point Quantization Required
Generation of DA code is supported only for fixed-point filter designs.
Specifying Filter Precision
The data path in HDL code generated for the DA architecture is carefully optimized for full precision computations. The filter result is cast to the output data size only at the final stage when it is presented to the output.
Distributed arithmetic merges the product and accumulator operations and does computations at full precision. This approach ignores the Product output and Accumulator properties of the Digital Filter block and sets these properties to full precision.
Coefficients with Zero Values
DA ignores taps that have zero-valued coefficients and reduces the size of the DA LUT accordingly.
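The effect on table size can be estimated with a short calculation. This sketch assumes a single, unpartitioned LUT with one address bit per nonzero tap. The helper name da_lut_entries and the coefficient values are hypothetical.

```python
def da_lut_entries(coeffs):
    """Number of LUT entries when zero-valued taps are dropped."""
    active = [c for c in coeffs if c != 0]
    return 1 << len(active)


halfband_like = [3, 0, 4, 0, 2, 0]      # hypothetical coefficients
print(da_lut_entries(halfband_like))    # 8
print(1 << len(halfband_like))          # 64 if zero taps were kept
```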
Considerations for Symmetrical and Asymmetrical Filters
For symmetrical and asymmetrical filters:
A bit-level preadder or presubtractor is required to add tap data values that have coefficients of equal value and/or opposite sign. One extra clock cycle is required to compute the result because of the additional carry bit.
HDL Coder™ takes advantage of filter symmetry where possible. This reduces the DA LUT size substantially, because the effective filter length for these filter types is halved.
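The folding that symmetry makes possible can be sketched as follows. This is a behavioral illustration under stated assumptions (integer coefficients, and "asymmetric" taken to mean mirrored coefficients of opposite sign), not the generated architecture. The name fold_taps is hypothetical.

```python
def fold_taps(coeffs, samples, antisymmetric=False):
    """Pair mirrored taps so that only half the coefficients remain.

    Symmetric filters add the mirrored samples (preadder);
    antisymmetric filters subtract them (presubtractor).
    """
    n = len(coeffs)
    half = (n + 1) // 2
    folded_samples = []
    for k in range(half):
        mirror = n - 1 - k
        if mirror == k:
            folded_samples.append(samples[k])  # center tap, odd length
        elif antisymmetric:
            folded_samples.append(samples[k] - samples[mirror])
        else:
            folded_samples.append(samples[k] + samples[mirror])
    return coeffs[:half], folded_samples


coeffs = [1, 2, 3, 3, 2, 1]     # hypothetical symmetric coefficients
samples = [4, -2, 7, 1, 0, 5]   # hypothetical samples
fc, fs = fold_taps(coeffs, samples)
print(sum(c * x for c, x in zip(fc, fs)))            # 29
print(sum(c * x for c, x in zip(coeffs, samples)))   # 29
print(1 << len(fc), "entries instead of", 1 << len(coeffs))
```

With three folded taps instead of six, the address shrinks from 6 bits to 3 bits, so the table in this example drops from 64 entries to 8.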
Further References
Detailed discussions of the theoretical foundations of DA appear in the following publications:
Meyer-Baese, U., Digital Signal Processing with Field Programmable Gate Arrays, Second Edition, Springer, pp. 88–94, 128–143.
White, S.A., Applications of Distributed Arithmetic to Digital Signal Processing: A Tutorial Review, IEEE ASSP Magazine, Vol. 6, No. 3.