Distributed Arithmetic for FIR Filters

Distributed Arithmetic Overview

Distributed Arithmetic (DA) is a widely-used technique for implementing sum-of-products computations without the use of multipliers. Designers frequently use DA to build efficient Multiply-Accumulate Circuitry (MAC) for filters and other DSP applications.

The main advantage of DA is its high computational efficiency. DA distributes multiply and accumulate operations across shifters, lookup tables (LUTs) and adders in such a way that conventional multipliers are not required.

The coder supports DA in HDL code generated for several single-rate and multirate FIR filter structures (see Requirements and Considerations for Generating Distributed Arithmetic Code ). Only fixed-point filter designs are supported.

This section briefly summarizes of the operation of DA. Detailed discussions of the theoretical foundations of DA appear in the following publications:

In a DA realization of a FIR filter structure, a sequence of input data words of width W is fed through a parallel to serial shift register, producing a serialized stream of bits. The serialized data is then fed to a bit-wide shift register. This shift register serves as a delay line, storing the bit serial data samples.

The delay line is tapped (based on the input word size W), to form a W-bit address that indexes into a lookup table (LUT). The LUT stores all possible sums of partial products over the filter coefficients space. The LUT is followed by a shift and adder (scaling accumulator) that adds the values obtained from the LUT sequentially.

A table lookup is performed sequentially for each bit (in order of significance starting from the LSB). On each clock cycle, the LUT result is added to the accumulated and shifted result from the previous cycle. For the last bit (MSB), the table lookup result is subtracted, accounting for the sign of the operand.

This basic form of DA is fully serial, operating on one bit at a time. If the input data sequence is W bits wide, then a FIR structure takes W clock cycles to compute the output. Symmetric and asymmetric FIR structures are an exception, requiring W+1 cycles, because one additional clock cycle is needed to process the carry bit of the pre-adders.

Improving Performance with Parallelism

The inherently bit serial nature of DA can limit throughput. To improve throughput, the basic DA algorithm can be modified to compute more than one bit sum at a time. The number of simultaneously computed bit sums is expressed as a power of two called the DA radix. For example, a DA radix of 2 (2^1) indicates that one bit sum is computed at a time; a DA radix of 4 (2^2) indicates that two bit sums are computed at a time, and so on.

To compute more than one bit sum at a time, the LUT is replicated. For example, to perform DA on 2 bits at a time (radix 4), the odd bits are fed to one LUT and the even bits are simultaneously fed to an identical LUT. The LUT results corresponding to odd bits are left-shifted before they are added to the LUT results corresponding to even bits. This result is then fed into a scaling accumulator that shifts its feedback value by 2 places.

Processing more than one bit at a time introduces a degree of parallelism into the operation, improving performance at the expense of area. The DARadix property lets you specify the number of bits processed simultaneously in DA (see DARadix Property).

Reducing LUT Size

The size of the LUT grows exponentially with the order of the filter. For a filter with N coefficients, the LUT must have 2^N values. For higher order filters, LUT size must be reduced to reasonable levels. To reduce the size, you can subdivide the LUT into a number of LUTs, called LUT partitions. Each LUT partition operates on a different set of taps. The results obtained from the partitions are summed.

For example, for a 160 tap filter, the LUT size is (2^160)*W bits, where W is the word size of the LUT data. Dividing this into 16 LUT partitions, each taking 10 inputs (taps), the total LUT size is reduced to 16*(2^10)*W bits, a significant reduction.

Although LUT partitioning reduces LUT size, more adders are required to sum the LUT data.

The DALUTPartition property lets you specify how the LUT is partitioned in DA (see DALUTPartition Property ).

Requirements and Considerations for Generating Distributed Arithmetic Code

The coder lets you control how DA code is generated using the DALUTPartition and DARadix properties (or equivalent Generate HDL dialog box options). Before using these properties, review the following general requirements, restrictions, and other considerations for generation of DA code.

Supported Filter Types

The coder supports DA in HDL code generated for the following single-rate and multirate FIR filter structures:

Requirements Specific to Filter Type

The DALUTPartition and DARadix properties have certain requirements and restrictions that are specific to different filter types. These requirements are included in the discussions of each property:

Fixed Point Quantization Required

Generation of DA code is supported only for fixed-point filter designs.

Specifying Filter Precision

The data path in HDL code generated for the DA architecture is carefully optimized for full precision computations. The filter result is cast to the output data size only at the final stage when it is presented to the output. If the FilterInternals property is set to the default (FullPrecision), numeric results obtained from simulation of the generated HDL code are bit-true to filter results produced by the original filter object.

If the FilterInternals property is set to SpecifyPrecision and you change filter word or fraction lengths, generated DA code may produce numeric results that are different than the filter results produced by the original filter object.

DALUTPartition Property

Syntax: 'DALUTPartition', [p1 p2... pN]

DALUTPartition enables DA code generation and specifies the number and size of LUT partitions used for DA.

Specify LUT partitions as a vector of integers [p1 p2...pN] where

To enable generation of DA code for your filter design without LUT partitioning, specify a vector of one element, whose value is equal to the filter length, as in the following example:

filtdes = fdesign.lowpass('N,Fc,Ap,Ast',4,0.4,0.05,0.03,'linear');
Hd = design(filtdes);
Hd.arithmetic = 'fixed';
generatehdl (Hd, 'DALUTPartition', 5);

Specifying DALUTPartition for Single-Rate Filters

To determine the LUT partition for one of the supported single-rate filter types, calculate FL as shown in the following table. Then, specify the partition as a vector whose elements sum to FL.

Filter TypeFilter Length (FL) Calculation
dfilt.dffir
FL = length(find(Hd.numerator~= 0))
dfilt.dfsymfir
dfilt.dfasymfir
FL = ceil(length(find(Hd.numerator~= 0))/2)

The following example shows the FL calculation and one possible partitioning for a direct form FIR filter:

filtdes = fdesign.lowpass('N,Fc,Ap,Ast',30,0.4,0.05,0.03,'linear');
Hd = design(filtdes,'filterstructure','dffir');
Hd.arithmetic = 'fixed';
FL = length(find(Hd.numerator~= 0))

FL =

    31
generatehdl(Hd, 'DALUTPartition',[8 8 8 7]);

The following example shows the FL calculation and one possible partitioning for a direct-form symmetric FIR filter:

filtdes = fdesign.lowpass('N,Fc,Ap,Ast',30,0.4,0.05,0.03,'linear');
Hd = design(filtdes,'filterstructure','dfsymfir');
Hd.arithmetic = 'fixed';
FL = ceil(length(find(Hd.numerator~= 0))/2)

FL =

    16
generatehdl(Hd, 'DALUTPartition',[8 8]);

Specifying DALUTPartition for Multirate Filters

For supported multirate filters (mfilt.firdecim and mfilt.firinterp) , you can specify the LUT partition as

The following table shows the FL calculations for each type of LUT partition.

LUT Partition Specified As...Filter Length (FL) Calculation
Vector: determine FL as shown in the Filter Length (FL) Calculation column to the right. Specify the LUT partition as a vector of integers whose elements sum to FL.
FL = size(polyphase(Hm), 2)
Matrix: determine the subfilter length FLi based on the polyphase decomposition of the filter, as shown in the Filter Length (FL) Calculation column to the right. Specify the LUT partition for each subfilter as a row vector whose elements sum to FLi.
p = polyphase(Hm);
FLi = length(find(p(i,:)));
 
where i is the index to the ith row of the polyphase matrix of the multirate filter. The ith row of the matrix p represents theith subfilter.

The following example shows the FL calculation for a direct-form FIR polyphase decimator, with the LUT partition specified as a vector:

d = fdesign.decimator(4);
Hm = design(d);
Hm.arithmetic = 'fixed';
FL = size(polyphase(Hm),2)

FL =

    27

generatehdl(Hm, 'DALUTPartition',[8 8 8 3]); 

The following example shows the LUT partition specified as a matrix for the same direct-form FIR polyphase decimator. The length of the first subfilter is 1, and all other subfilters have length 26.

d = fdesign.decimator(4);
Hm = design(d);
Hm.arithmetic = 'fixed';
generatehdl(Hm, 'DALUTPartition',[1 0 0 0; 8 8 8 2; 8 8 6 4; 8 8 8 2]); 

DARadix Property

Syntax: 'DARadix', N

DARadix specifies the number of bits processed simultaneously in DA. The number of bits is expressed as N, which must be

The default value for N is 2, specifying processing of one bit at a time, or fully serial DA, which is slow but low in area. The maximum value for N is 2^W, where W is the input word size of the filter. This maximum specifies fully parallel DA, which is fast but high in area. Values of N between these extrema specify partly serial DA.

Special Cases

Coefficients with Zero Values

DA ignores taps that have zero-valued coefficients and reduces the size of the DA LUT accordingly.

Considerations for Symmetrical and Asymmetrical Filters

For symmetrical (dfilt.dfsymfir) and asymmetrical (dfilt.dfasymfir) filters:

Holding Input Data in a Valid State

In filters with a DA architecture, data can be delivered to the outputs N cycles (N >= 2) later than the inputs. You can use the HoldInputDataBetweenSamples property to determine how long (in terms of clock cycles) input data values are held in a valid state, as follows:

Distributed Arithmetic Options in the Generate HDL Dialog Box

This section describes Generate HDL dialog box options related to DA code generation. The following figure shows these options.

The DA related options are:

The Generate HDL dialog box initially displays default DA related option values that are appropriate for the current filter design. In other respects, the requirements for setting these options are identical to those described in DALUTPartition Property and DARadix Property.

To specify DA code generation using the Generate HDL dialog box, follow these steps:

  1. Design a FIR filter (using FDATool, filterbuilder, or MATLAB commands) that meets the requirements described in Requirements and Considerations for Generating Distributed Arithmetic Code.

  2. Open the Generate HDL dialog box.

  3. Select Distributed Arithmetic (DA) from the Architecture pop-up menu.

    When you select this option, the related LUT Partition and DA Radix options are displayed to the right of the Architecture menu. The following figure shows the default DA options for a Direct Form FIR filter.

    The default value for LUT Partition is a vector of integers such that each partition has a maximum of 8 inputs. The figure illustrates a 51-tap filter, with 7 partitions. All partitions have 8 inputs except for the last, which has 3 inputs.

  4. If desired, set the LUT Partition field to a nondefault value. See DALUTPartition Property for detailed information.

  5. The default DA Radix value is 2, specifying processing of one bit at a time, or fully serial DA. If desired, set the DA Radix field to a nondefault value. See DARadix Property for detailed information.

    If you are setting the DA Radix value for a dfilt.dfsymfir and dfilt.dfasymfir filter, see Considerations for Symmetrical and Asymmetrical Filters.

  6. Set other HDL options as required, and generate code. Incorrect or illegal values for LUT Partition or DA Radix are reported at code generation time.

DA Interactions with Other HDL Options

When Distributed Arithmetic (DA) is selected in the Architecture menu, some other HDL options change automatically to settings that are appropriate for DA code generation:

  


 © 1984-2008- The MathWorks, Inc.    -   Site Help   -   Patents   -   Trademarks   -   Privacy Policy   -   Preventing Piracy   -   RSS