| Filter Design HDL Coder™ | ![]() |
| On this page… |
|---|
Distributed Arithmetic Overview Requirements and Considerations for Generating Distributed Arithmetic Code Distributed Arithmetic Options in the Generate HDL Dialog Box |
Distributed Arithmetic (DA) is a widely-used technique for implementing sum-of-products computations without the use of multipliers. Designers frequently use DA to build efficient Multiply-Accumulate Circuitry (MAC) for filters and other DSP applications.
The main advantage of DA is its high computational efficiency. DA distributes multiply and accumulate operations across shifters, lookup tables (LUTs) and adders in such a way that conventional multipliers are not required.
The coder supports DA in HDL code generated for several single-rate and multirate FIR filter structures (see Requirements and Considerations for Generating Distributed Arithmetic Code ). Only fixed-point filter designs are supported.
This section briefly summarizes of the operation of DA. Detailed discussions of the theoretical foundations of DA appear in the following publications:
Meyer-Baese, U., Digital Signal Processing with Field Programmable Gate Arrays, Second Edition, Springer, pp 88–94, 128–143
White, S.A., Applications of Distributed Arithmetic to Digital Signal Processing: A Tutorial Review. IEEE ASSP Magazine, Vol. 6, No. 3
In a DA realization of a FIR filter structure, a sequence of input data words of width W is fed through a parallel to serial shift register, producing a serialized stream of bits. The serialized data is then fed to a bit-wide shift register. This shift register serves as a delay line, storing the bit serial data samples.
The delay line is tapped (based on the input word size W), to form a W-bit address that indexes into a lookup table (LUT). The LUT stores all possible sums of partial products over the filter coefficients space. The LUT is followed by a shift and adder (scaling accumulator) that adds the values obtained from the LUT sequentially.
A table lookup is performed sequentially for each bit (in order of significance starting from the LSB). On each clock cycle, the LUT result is added to the accumulated and shifted result from the previous cycle. For the last bit (MSB), the table lookup result is subtracted, accounting for the sign of the operand.
This basic form of DA is fully serial, operating on one bit at a time. If the input data sequence is W bits wide, then a FIR structure takes W clock cycles to compute the output. Symmetric and asymmetric FIR structures are an exception, requiring W+1 cycles, because one additional clock cycle is needed to process the carry bit of the pre-adders.
The inherently bit serial nature of DA can limit throughput. To improve throughput, the basic DA algorithm can be modified to compute more than one bit sum at a time. The number of simultaneously computed bit sums is expressed as a power of two called the DA radix. For example, a DA radix of 2 (2^1) indicates that one bit sum is computed at a time; a DA radix of 4 (2^2) indicates that two bit sums are computed at a time, and so on.
To compute more than one bit sum at a time, the LUT is replicated. For example, to perform DA on 2 bits at a time (radix 4), the odd bits are fed to one LUT and the even bits are simultaneously fed to an identical LUT. The LUT results corresponding to odd bits are left-shifted before they are added to the LUT results corresponding to even bits. This result is then fed into a scaling accumulator that shifts its feedback value by 2 places.
Processing more than one bit at a time introduces a degree of parallelism into the operation, improving performance at the expense of area. The DARadix property lets you specify the number of bits processed simultaneously in DA (see DARadix Property).
The size of the LUT grows exponentially with the order of the filter. For a filter with N coefficients, the LUT must have 2^N values. For higher order filters, LUT size must be reduced to reasonable levels. To reduce the size, you can subdivide the LUT into a number of LUTs, called LUT partitions. Each LUT partition operates on a different set of taps. The results obtained from the partitions are summed.
For example, for a 160 tap filter, the LUT size is (2^160)*W bits, where W is the word size of the LUT data. Dividing this into 16 LUT partitions, each taking 10 inputs (taps), the total LUT size is reduced to 16*(2^10)*W bits, a significant reduction.
Although LUT partitioning reduces LUT size, more adders are required to sum the LUT data.
The DALUTPartition property lets you specify how the LUT is partitioned in DA (see DALUTPartition Property ).
The coder lets you control how DA code is generated using the DALUTPartition and DARadix properties (or equivalent Generate HDL dialog box options). Before using these properties, review the following general requirements, restrictions, and other considerations for generation of DA code.
The coder supports DA in HDL code generated for the following single-rate and multirate FIR filter structures:
dfilt.dffir
dfilt.dfsymfir
dfilt.dfasymfir
mfilt.firdecim
mfilt.firinterp
The DALUTPartition and DARadix properties have certain requirements and restrictions that are specific to different filter types. These requirements are included in the discussions of each property:
Generation of DA code is supported only for fixed-point filter designs.
The data path in HDL code generated for the DA architecture is carefully optimized for full precision computations. The filter result is cast to the output data size only at the final stage when it is presented to the output. If the FilterInternals property is set to the default (FullPrecision), numeric results obtained from simulation of the generated HDL code are bit-true to filter results produced by the original filter object.
If the FilterInternals property is set to SpecifyPrecision and you change filter word or fraction lengths, generated DA code may produce numeric results that are different than the filter results produced by the original filter object.
Syntax: 'DALUTPartition', [p1 p2... pN]
DALUTPartition enables DA code generation and specifies the number and size of LUT partitions used for DA.
Specify LUT partitions as a vector of integers [p1 p2...pN] where
N is the number of partitions.
Each vector element specifies the size of a partition. The maximum size for an individual partition is 12.
The sum of all vector elements equals the filter length FL. FL is calculated differently depending on the filter type (see Specifying DALUTPartition for Single-Rate Filters and Specifying DALUTPartition for Multirate Filters).
To enable generation of DA code for your filter design without LUT partitioning, specify a vector of one element, whose value is equal to the filter length, as in the following example:
filtdes = fdesign.lowpass('N,Fc,Ap,Ast',4,0.4,0.05,0.03,'linear');
Hd = design(filtdes);
Hd.arithmetic = 'fixed';
generatehdl (Hd, 'DALUTPartition', 5);
To determine the LUT partition for one of the supported single-rate filter types, calculate FL as shown in the following table. Then, specify the partition as a vector whose elements sum to FL.
| Filter Type | Filter Length (FL) Calculation |
|---|---|
| dfilt.dffir | FL = length(find(Hd.numerator~= 0)) |
| dfilt.dfsymfir dfilt.dfasymfir | FL = ceil(length(find(Hd.numerator~= 0))/2) |
The following example shows the FL calculation and one possible partitioning for a direct form FIR filter:
filtdes = fdesign.lowpass('N,Fc,Ap,Ast',30,0.4,0.05,0.03,'linear');
Hd = design(filtdes,'filterstructure','dffir');
Hd.arithmetic = 'fixed';
FL = length(find(Hd.numerator~= 0))
FL =
31
generatehdl(Hd, 'DALUTPartition',[8 8 8 7]);
The following example shows the FL calculation and one possible partitioning for a direct-form symmetric FIR filter:
filtdes = fdesign.lowpass('N,Fc,Ap,Ast',30,0.4,0.05,0.03,'linear');
Hd = design(filtdes,'filterstructure','dfsymfir');
Hd.arithmetic = 'fixed';
FL = ceil(length(find(Hd.numerator~= 0))/2)
FL =
16
generatehdl(Hd, 'DALUTPartition',[8 8]);
For supported multirate filters (mfilt.firdecim and mfilt.firinterp) , you can specify the LUT partition as
A vector defining a partition for LUTs for all polyphase subfilters.
A matrix of LUT partitions, where each row vector specifies a LUT partition for a corresponding polyphase subfilter. In this case, the FL is uniform for all subfilters. This approach provides a fine control for partitioning each subfilter.
The following table shows the FL calculations for each type of LUT partition.
| LUT Partition Specified As... | Filter Length (FL) Calculation |
|---|---|
| Vector: determine FL as shown in the Filter Length (FL) Calculation column to the right. Specify the LUT partition as a vector of integers whose elements sum to FL. | FL = size(polyphase(Hm), 2) |
| Matrix: determine the subfilter length FLi based on the polyphase decomposition of the filter, as shown in the Filter Length (FL) Calculation column to the right. Specify the LUT partition for each subfilter as a row vector whose elements sum to FLi. | p = polyphase(Hm); FLi = length(find(p(i,:)));where i is the index to the ith row of the polyphase matrix of the multirate filter. The ith row of the matrix p represents theith subfilter. |
The following example shows the FL calculation for a direct-form FIR polyphase decimator, with the LUT partition specified as a vector:
d = fdesign.decimator(4);
Hm = design(d);
Hm.arithmetic = 'fixed';
FL = size(polyphase(Hm),2)
FL =
27
generatehdl(Hm, 'DALUTPartition',[8 8 8 3]); The following example shows the LUT partition specified as a matrix for the same direct-form FIR polyphase decimator. The length of the first subfilter is 1, and all other subfilters have length 26.
d = fdesign.decimator(4); Hm = design(d); Hm.arithmetic = 'fixed'; generatehdl(Hm, 'DALUTPartition',[1 0 0 0; 8 8 8 2; 8 8 6 4; 8 8 8 2]);
Syntax: 'DARadix', N
DARadix specifies the number of bits processed simultaneously in DA. The number of bits is expressed as N, which must be
A nonzero positive integer that is a power of two
Such that mod(W, log2(N)) = 0 where W is the input word size of the filter.
The default value for N is 2, specifying processing of one bit at a time, or fully serial DA, which is slow but low in area. The maximum value for N is 2^W, where W is the input word size of the filter. This maximum specifies fully parallel DA, which is fast but high in area. Values of N between these extrema specify partly serial DA.
Note When setting a DARadix value for symmetrical (dfilt.dfsymfir) and asymmetrical (dfilt.dfasymfir) filters, see Considerations for Symmetrical and Asymmetrical Filters. |
DA ignores taps that have zero-valued coefficients and reduces the size of the DA LUT accordingly.
For symmetrical (dfilt.dfsymfir) and asymmetrical (dfilt.dfasymfir) filters:
A bit-level preadder or presubtractor is required to add tap data values that have coefficients of equal value and/or opposite sign. One extra clock cycle is required to compute the result because of the additional carry bit.
The coder takes advantage of filter symmetry where possible. This reduces the DA LUT size substantially , because the effective filter length for these filter types is halved.
In filters with a DA architecture, data can be delivered to the outputs N cycles (N >= 2) later than the inputs. You can use the HoldInputDataBetweenSamples property to determine how long (in terms of clock cycles) input data values are held in a valid state, as follows:
When HoldInputDataBetweenSamples is set 'on' (the default), input data values are held in a valid state across N clock cycles.
When HoldInputDataBetweenSamples is set 'off' , data values are held in a valid state for only one clock cycle. For the nextN-1 cycles, data is in an unknown state (expressed as 'X') until the next input sample is clocked in.
This section describes Generate HDL dialog box options related to DA code generation. The following figure shows these options.

The DA related options are:
The Architecture pop-up menu, which lets you enable DA code generation and displays related options
The LUT Partition field, which displays and lets you change the value of the DALUTPartition property
The DA Radix field, which displays and lets you change the value of the DARadix property
The Generate HDL dialog box initially displays default DA related option values that are appropriate for the current filter design. In other respects, the requirements for setting these options are identical to those described in DALUTPartition Property and DARadix Property.
To specify DA code generation using the Generate HDL dialog box, follow these steps:
Design a FIR filter (using FDATool, filterbuilder, or MATLAB commands) that meets the requirements described in Requirements and Considerations for Generating Distributed Arithmetic Code.
Open the Generate HDL dialog box.
Select Distributed Arithmetic (DA) from the Architecture pop-up menu.
When you select this option, the related LUT Partition and DA Radix options are displayed to the right of the Architecture menu. The following figure shows the default DA options for a Direct Form FIR filter.

The default value for LUT Partition is a vector of integers such that each partition has a maximum of 8 inputs. The figure illustrates a 51-tap filter, with 7 partitions. All partitions have 8 inputs except for the last, which has 3 inputs.
If desired, set the LUT Partition field to a nondefault value. See DALUTPartition Property for detailed information.
The default DA Radix value is 2, specifying processing of one bit at a time, or fully serial DA. If desired, set the DA Radix field to a nondefault value. See DARadix Property for detailed information.
If you are setting the DA Radix value for a dfilt.dfsymfir and dfilt.dfasymfir filter, see Considerations for Symmetrical and Asymmetrical Filters.
Set other HDL options as required, and generate code. Incorrect or illegal values for LUT Partition or DA Radix are reported at code generation time.
When Distributed Arithmetic (DA) is selected in the Architecture menu, some other HDL options change automatically to settings that are appropriate for DA code generation:
Coefficient multipliers is set to Multiplier and disabled.
FIR adder style is set to Tree and disabled.
Add input register in the Ports pane of the More HDL Settings dialog box is selected and disabled. (An input register, used as part of a shift register, is always used in DA code.)
Add output register in the Ports pane of the More HDL Settings dialog box is selected and disabled.
![]() | Speed vs. Area Optimizations for FIR Filters | Optimizing the Clock Rate with Pipeline Registers | ![]() |
| © 1984-2008- The MathWorks, Inc. - Site Help - Patents - Trademarks - Privacy Policy - Preventing Piracy - RSS |