Block implementation parameters let you control details of the code generated for specific block implementations. You pass block implementation parameters to forEach or forAll calls (see forEach) as cell arrays of property/value pairs of the form
{'PropertyName', value}
Property names are strings. The data type of a property value is specific to the property. This section describes the syntax of each block implementation parameter and how the parameter affects generated code.
The CoeffMultipliers implementation parameter lets you specify use of canonic signed digit (CSD) or factored CSD optimizations for processing coefficient multiplier operations in code generated for certain filter blocks. Specify the CoeffMultipliers parameter in a control file using the following syntax:
{'CoeffMultipliers', 'csd'}: Use CSD techniques to replace multiplier operations with shift and add operations. CSD techniques minimize the number of addition operations required for constant multiplication by representing binary numbers with a minimum count of nonzero digits. This decreases the area used by the filter while maintaining or increasing clock speed.
{'CoeffMultipliers', 'factored-csd'}: Use factored CSD techniques, which replace multiplier operations with shift and add operations on prime factors of the coefficients. This option lets you achieve a greater filter area reduction than CSD, at the cost of decreasing clock speed.
{'CoeffMultipliers', 'multipliers'} (default): Retain multiplier operations.
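The following Python sketch illustrates the CSD recoding described above. It is not the coder's implementation. It assumes integer coefficients and uses the non-adjacent form, one standard way to obtain a CSD representation. The function names `to_csd` and `csd_multiply` are hypothetical, and factored CSD is not shown.

```python
def to_csd(c: int) -> list[int]:
    """Return the canonic signed digit (non-adjacent form) digits of c, LSB first.

    Every digit is -1, 0, or +1, and no two adjacent digits are both nonzero,
    which minimizes the number of nonzero digits.
    """
    digits = []
    n = c
    while n != 0:
        if n % 2:              # odd: choose +1 or -1 so the remainder is divisible by 4
            d = 2 - (n % 4)    # n % 4 == 1 gives +1, n % 4 == 3 gives -1
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits


def csd_multiply(x: int, c: int) -> int:
    """Multiply x by the constant c using only shifts and additions/subtractions."""
    acc = 0
    for shift, d in enumerate(to_csd(c)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


if __name__ == "__main__":
    c = 119  # binary 1110111
    print(to_csd(c))                                          # [-1, 0, 0, -1, 0, 0, 0, 1]
    print(bin(c).count("1"), sum(d != 0 for d in to_csd(c)))  # 6 3
    print(csd_multiply(5, c))                                 # 595
```

For the constant 119, the plain binary form has six nonzero bits (five adders in a shift-and-add multiplier), while the CSD form 128 - 8 - 1 has three nonzero digits (two adders/subtractors).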
The coder supports CoeffMultipliers for the filter block implementations shown in the following table:
| Block | Implementation |
|---|---|
| dsparch4/Digital Filter | hdldefaults.DigitalFilterHDLInstantiation |
| dspmlti4/FIR Decimation | hdldefaults.FIRDecimationHDLInstantiation |
| dspmlti4/FIR Interpolation | hdldefaults.FIRInterpolationHDLInstantiation |
| dsparch4/Biquad Filter | hdldefaults.BiquadFilterHDLInstantiation |
| simulink/Discrete/Discrete FIR Filter | hdldefaults.DiscreteFIRFilterHDLInstantiation |
The following forEach call specifies that code generated for all FIR Decimation blocks in the model will use the CSD optimization:
config.forEach('*',...
'dspmlti4/FIR Decimation', {},...
'hdldefaults.FIRDecimationHDLInstantiation',...
{'CoeffMultipliers', 'csd'});
Distributed Arithmetic (DA) is a widely used technique for implementing sum-of-products computations without the use of multipliers. Designers frequently use DA to build efficient Multiply-Accumulate Circuitry (MAC) for filters and other DSP applications.
The main advantage of DA is its high computational efficiency. DA distributes multiply and accumulate operations across shifters, lookup tables (LUTs) and adders in such a way that conventional multipliers are not required.
The coder supports distributed arithmetic (DA) implementations for single-rate FIR structures of the Digital Filter and Discrete FIR Filter blocks, as given in the following table.
| Block | Implementation | FIR Structures That Support DA |
|---|---|---|
| dsparch4/Digital Filter | hdldefaults.DigitalFilterHDLInstantiation | |
| simulink/Discrete/Discrete FIR Filter | hdldefaults.DiscreteFIRFilterHDLInstantiation | |
This section briefly summarizes the operation of DA. Detailed discussions of the theoretical foundations of DA appear in the following publications:
Meyer-Baese, U., Digital Signal Processing with Field Programmable Gate Arrays, Second Edition, Springer, pp. 88–94, 128–143.
White, S.A., "Applications of Distributed Arithmetic to Digital Signal Processing: A Tutorial Review," IEEE ASSP Magazine, Vol. 6, No. 3.
In a DA realization of a FIR filter structure, a sequence of input data words of width W is fed through a parallel-to-serial shift register, producing a serialized stream of bits. The serialized data is then fed to a bit-wide shift register. This shift register serves as a delay line, storing the bit-serial data samples.
The delay line is tapped at intervals of the input word size W, one tap per filter coefficient, so that the tapped bits (one bit per tap) form an address that indexes into a lookup table (LUT). The LUT stores all possible sums of partial products over the filter coefficients, one entry for each combination of address bits. The LUT is followed by a shift-and-add unit (scaling accumulator) that adds the values obtained from the LUT sequentially.
A table lookup is performed sequentially for each bit, in order of significance starting from the LSB. On each clock cycle, the LUT result is added to the accumulated and shifted result from the previous cycle. For the last bit (MSB), the table lookup result is subtracted instead, because the MSB of a two's complement operand carries negative weight.
This basic form of DA is fully serial, operating on one bit at a time. If the input data sequence is W bits wide, then a FIR structure takes W clock cycles to compute the output. Symmetric and asymmetric FIR structures are an exception, requiring W+1 cycles, because one additional clock cycle is needed to process the carry bit of the pre-adders.
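As a rough illustration of the fully serial algorithm described above, the following Python sketch computes one filter output from the samples currently held at the taps. It is a behavioral model under stated assumptions, not generated HDL: coefficients are integers, each sample is a W-bit two's complement integer, and the function names are hypothetical. Instead of shifting the accumulator feedback as the hardware does, the model weights each LUT output by 2**b, which yields the same sum.

```python
def build_da_lut(coeffs: list[int]) -> list[int]:
    """Entry a holds the sum of coeffs[k] for every k whose bit k of a is set."""
    return [sum(c for k, c in enumerate(coeffs) if (a >> k) & 1)
            for a in range(1 << len(coeffs))]


def da_fir_output(coeffs: list[int], taps: list[int], width: int) -> tuple[int, int]:
    """Compute sum(coeffs[k] * taps[k]) one bit per clock cycle.

    taps[k] is the sample at tap k of the delay line, a width-bit two's
    complement integer. Returns (result, clock_cycles).
    """
    lut = build_da_lut(coeffs)          # 2**N entries for N taps
    acc = 0
    cycles = 0
    for b in range(width):              # LSB first, one bit per cycle
        # Bit b of every tap forms the LUT address (one address bit per tap).
        addr = sum(((x >> b) & 1) << k for k, x in enumerate(taps))
        partial = lut[addr] << b        # weight the LUT output by 2**b
        # The MSB of a two's complement sample has negative weight: subtract.
        acc += -partial if b == width - 1 else partial
        cycles += 1
    return acc, cycles


print(da_fir_output([3, -5, 7, 2], [-8, 7, 0, -1], 4))  # (-61, 4)
```

The result matches the direct sum of products 3*(-8) + (-5)*7 + 7*0 + 2*(-1) = -61, and it takes W = 4 cycles.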
The inherently bit serial nature of DA can limit throughput. To improve throughput, the basic DA algorithm can be modified to compute more than one bit sum at a time. The number of simultaneously computed bit sums is expressed as a power of two called the DA radix. For example, a DA radix of 2 (2^1) indicates that one bit sum is computed at a time; a DA radix of 4 (2^2) indicates that two bit sums are computed at a time, and so on.
To compute more than one bit sum at a time, the LUT is replicated. For example, to perform DA on 2 bits at a time (radix 4), the odd bits are fed to one LUT and the even bits are simultaneously fed to an identical LUT. The LUT results corresponding to odd bits are left-shifted before they are added to the LUT results corresponding to even bits. This result is then fed into a scaling accumulator that shifts its feedback value by 2 places.
Processing more than one bit at a time introduces a degree of parallelism into the operation, improving performance at the expense of area. You can control the degree of parallelism by specifying the DARadix implementation parameter in a control file. DARadix lets you specify the number of bits processed simultaneously in DA (see DARadix Implementation Parameter).
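The radix idea can be sketched the same way. In this hypothetical Python model, log2(radix) copies of one LUT are addressed in parallel by consecutive bits, their outputs are shifted and combined, and the accumulator advances by log2(radix) bit positions per cycle. The validity check mirrors the DARadix rules given under DARadix Implementation Parameter; the names and the integer data are illustrative assumptions.

```python
def da_fir_radix(coeffs: list[int], taps: list[int], width: int, radix: int) -> tuple[int, int]:
    """Compute sum(coeffs[k] * taps[k]) with DA, processing log2(radix) bits per cycle.

    taps[k] is a width-bit two's complement integer. Returns (result, clock_cycles).
    """
    bits = radix.bit_length() - 1                   # log2(radix) for a power of two
    if radix < 2 or (1 << bits) != radix or width % bits != 0:
        raise ValueError("radix must be a power of two >= 2 and log2(radix) must divide width")
    # One LUT; the hardware replicates it 'bits' times.
    lut = [sum(c for k, c in enumerate(coeffs) if (a >> k) & 1)
           for a in range(1 << len(coeffs))]

    def lut_copy(b: int) -> int:
        """Output of the LUT copy addressed by bit b of every tap."""
        value = lut[sum(((x >> b) & 1) << k for k, x in enumerate(taps))]
        return -value if b == width - 1 else value  # MSB carries negative weight

    acc = 0
    cycles = 0
    for base in range(0, width, bits):              # one iteration per clock cycle
        # The copy handling bit base + j is shifted left by j before the add.
        combined = sum(lut_copy(base + j) << j for j in range(bits))
        acc += combined << base                     # accumulator advances by 'bits' places
        cycles += 1
    return acc, cycles


for radix in (2, 4, 16):
    print(radix, da_fir_radix([3, -5, 7, 2], [-8, 7, 0, -1], 4, radix))
# 2 (-61, 4)
# 4 (-61, 2)
# 16 (-61, 1)
```

The result is the same for every legal radix; only the number of cycles (and, in hardware, the number of LUT copies) changes.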
The size of the LUT grows exponentially with the order of the filter. For a filter with N coefficients, the LUT must have 2^N values. For higher order filters, LUT size must be reduced to reasonable levels. To reduce the size, you can subdivide the LUT into a number of LUTs, called LUT partitions. Each LUT partition operates on a different set of taps. The results obtained from the partitions are summed.
For example, for a 160-tap filter, the LUT size is (2^160)*W bits, where W is the word size of the LUT data. Dividing this into 16 LUT partitions, each taking 10 inputs (taps), the total LUT size is reduced to 16*(2^10)*W bits. The reduction is significant.
Although LUT partitioning reduces LUT size, more adders are required to sum the LUT data.
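The following hypothetical Python sketch shows why partitioning preserves the result: each partition's LUT is addressed only by its own taps, and the partition outputs are added. It also counts LUT entries for the 160-tap example above. The helper names are assumptions, and consecutive taps are assigned to partitions for simplicity.

```python
def build_partitioned_luts(coeffs: list[int], partition: list[int]) -> list[list[int]]:
    """Build one DA LUT per partition; partition[i] is the number of taps in LUT i."""
    if sum(partition) != len(coeffs):
        raise ValueError("partition sizes must add up to the number of taps")
    luts, start = [], 0
    for size in partition:
        group = coeffs[start:start + size]
        luts.append([sum(c for k, c in enumerate(group) if (a >> k) & 1)
                     for a in range(1 << size)])
        start += size
    return luts


def partitioned_lookup(luts: list[list[int]], partition: list[int], tap_bits: list[int]) -> int:
    """Address each partition's LUT with its own taps' bits and add the outputs.

    With P partitions, P - 1 extra adders are needed to combine the LUT outputs.
    """
    total, start = 0, 0
    for lut, size in zip(luts, partition):
        addr = sum(tap_bits[start + k] << k for k in range(size))
        total += lut[addr]
        start += size
    return total


def lut_entries(partition: list[int]) -> int:
    """Total number of LUT entries; multiply by the LUT data width W for bits."""
    return sum(1 << p for p in partition)


print(lut_entries([160]) == 2**160, lut_entries([10] * 16))  # True 16384
```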
You control how the LUT is partitioned in DA by specifying the DALUTPartition implementation parameter in a control file (see DALUTPartition Implementation Parameter).
You can control how DA code is generated by using the DALUTPartition and DARadix implementation parameters in a control file. Before using these parameters, review the following general requirements, restrictions, and other considerations for generation of DA code.
Requirements Specific to Filter Type. The DALUTPartition and DARadix parameters have certain requirements and restrictions that are specific to different filter types. These requirements are included in the discussion of each parameter (see DALUTPartition Implementation Parameter and DARadix Implementation Parameter).
Fixed-Point Quantization Required. Generation of DA code is supported only for fixed-point filter designs.
Specifying Filter Precision. The data path in HDL code generated for the DA architecture is carefully optimized for full precision computations. The filter result is cast to the output data size only at the final stage when it is presented to the output.
In distributed arithmetic, the product and accumulator operations are merged, and computations are done at full precision. The coder ignores the Product output and Accumulator properties of the Digital Filter block and uses full precision for them.
Syntax: 'DALUTPartition', [p1 p2 ... pN]
DALUTPartition enables DA code generation and specifies the number and size of LUT partitions used for DA.
Specify LUT partitions as a vector of integers [p1 p2 ... pN] where:
N is the number of partitions.
Each vector element specifies the size of a partition. The maximum size for an individual partition is 12.
The sum of all vector elements equals the filter length FL. FL is calculated differently depending on the filter type (see Specifying DALUTPartition for Single-Rate Filters).
Specifying DALUTPartition for Single-Rate Filters. To determine the LUT partition for one of the supported single-rate filter types, calculate FL as shown in the following table. Then, specify the partition as a vector whose elements sum to FL.
| Filter Type | Filter Length (FL) Calculation |
|---|---|
| dfilt.dffir | FL = length(find(Hd.numerator~= 0)) |
| dfilt.dfsymfir dfilt.dfasymfir | FL = ceil(length(find(Hd.numerator~= 0))/2) |
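To check a partition vector before putting it in a control file, you could compute FL as in the table above and test the DALUTPartition rules. The Python sketch below is a hypothetical helper, not part of the coder: a plain list stands in for Hd.numerator, the structure names are passed as strings, and the requirement that every partition size be at least 1 is an assumption.

```python
import math


def filter_length(numerator: list[float], structure: str) -> int:
    """FL for DALUTPartition per the FL table; zero-valued coefficients are not counted."""
    nonzero = sum(1 for c in numerator if c != 0)
    if structure == "dfilt.dffir":
        return nonzero
    if structure in ("dfilt.dfsymfir", "dfilt.dfasymfir"):
        return math.ceil(nonzero / 2)
    raise ValueError(f"structure {structure!r} is not covered by this sketch")


def check_da_lut_partition(partition: list[int], fl: int, max_size: int = 12) -> None:
    """Raise ValueError if the partition vector breaks the DALUTPartition rules."""
    if not partition or any(p < 1 for p in partition):
        raise ValueError("partition sizes must be positive integers")
    if max(partition) > max_size:
        raise ValueError(f"an individual partition may not exceed {max_size}")
    if sum(partition) != fl:
        raise ValueError(f"partition sizes sum to {sum(partition)}, expected FL = {fl}")


fl = filter_length([0.1] * 11, "dfilt.dffir")
check_da_lut_partition([4, 4, 3], fl)   # passes silently for FL = 11
```

For a direct-form filter with 11 nonzero coefficients, [4 4 3] passes; a single partition of 13 would fail because it exceeds the maximum partition size.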
The following figure shows a Digital Filter configured for a direct form FIR filter of length 11.

The following control file defines one possible LUT partitioning for this filter:
function c = filter_da_config1
c = hdlnewcontrol(mfilename);
c.forEach('*',...
'dsparch4/Digital Filter', {},...
'hdldefaults.DigitalFilterHDLInstantiation', {'DALUTPartition',[4 4 3]});
The following figure shows a Digital Filter configured for a direct-form symmetric FIR filter of length 6:

The following control file defines a possible LUT partitioning for this filter.
function c = filter_da_config1
c = hdlnewcontrol(mfilename);
c.forEach('*',...
'dsparch4/Digital Filter', {},...
'hdldefaults.DigitalFilterHDLInstantiation', {'DALUTPartition',[3 3]});
You can also generate DA code for your filter design without LUT partitioning. To do so, specify a vector of one element whose value is equal to the filter length. For example, the following figure shows a Digital Filter configured for a direct-form FIR filter of length 5.

The following control file specifies a partition that is equal to the filter length:
function c = filter_da_config1
c = hdlnewcontrol(mfilename);
c.forEach('*',...
'dsparch4/Digital Filter', {},...
'hdldefaults.DigitalFilterHDLInstantiation', {'DALUTPartition',5});
Syntax: 'DARadix', N
DARadix specifies the number of bits processed simultaneously in DA. The value is expressed as a radix N; each clock cycle processes log2(N) bits. N must be:
A positive integer that is a power of two
Such that mod(W, log2(N)) = 0, where W is the input word size of the filter
The default value for N is 2, specifying processing of one bit at a time, or fully serial DA, which is slow but low in area. The maximum value for N is 2^W, where W is the input word size of the filter. This maximum specifies fully parallel DA, which is fast but high in area. Values of N between these extrema specify partly serial DA.
Note: When setting a DARadix value for symmetrical (dfilt.dfsymfir) and asymmetrical (dfilt.dfasymfir) filters, see Considerations for Symmetrical and Asymmetrical Filters.
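The DARadix rules can be turned into a small table of legal values. This hypothetical Python sketch lists every DARadix value allowed for a given input word size W and the clock cycles per output, assuming log2(N) bits are handled per cycle in the plain direct-form case. Symmetric and asymmetric structures need one extra cycle for the pre-adder carry bit in the fully serial case, so the cycle count here does not apply to them as-is.

```python
def valid_da_radices(width: int) -> list[int]:
    """All DARadix values N allowed for input word size W: N = 2**k with W % k == 0."""
    return [1 << k for k in range(1, width + 1) if width % k == 0]


def cycles_per_output(width: int, radix: int) -> int:
    """Clock cycles per output for a direct-form FIR, assuming log2(N) bits per cycle."""
    if radix not in valid_da_radices(width):
        raise ValueError(f"DARadix {radix} is not valid for W = {width}")
    return width // (radix.bit_length() - 1)


for n in valid_da_radices(12):
    print(n, cycles_per_output(12, n))
# 2 12
# 4 6
# 8 4
# 16 3
# 64 2
# 4096 1
```

The first row is the default, fully serial DA (N = 2); the last row is fully parallel DA (N = 2^W).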
Coefficients with Zero Values. DA ignores taps that have zero-valued coefficients and reduces the size of the DA LUT accordingly.
Considerations for Symmetrical and Asymmetrical Filters. For symmetrical (dfilt.dfsymfir) and asymmetrical (dfilt.dfasymfir) filters:
A bit-level preadder or presubtractor is required to add tap data values that have coefficients of equal value and/or opposite sign. One extra clock cycle is required to compute the result because of the additional carry bit.
The coder takes advantage of filter symmetry where possible. This reduces the DA LUT size substantially, because the effective filter length for these filter types is halved.
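A hypothetical Python sketch of the folding described above: tap pairs that share a coefficient (or, for asymmetrical filters, a coefficient of opposite sign) are pre-added or pre-subtracted, so the DA LUT is addressed by about half as many taps. The sum or difference of two W-bit values needs W + 1 bits, which is the extra carry bit that costs one more clock cycle. Integer coefficients and samples are assumptions.

```python
def fold_symmetric(coeffs: list[int], taps: list[int],
                   antisymmetric: bool = False) -> tuple[list[int], list[int]]:
    """Pre-add (or pre-subtract) mirrored taps that share a coefficient magnitude.

    Returns (folded_coeffs, folded_taps). A DA LUT addressed by the folded taps
    needs 2**len(folded_coeffs) entries instead of 2**len(coeffs).
    """
    n = len(coeffs)
    sign = -1 if antisymmetric else 1
    folded_coeffs, folded_taps = [], []
    for k in range((n + 1) // 2):
        j = n - 1 - k                        # mirror tap
        if j == k:                           # middle tap of an odd-length filter
            folded_coeffs.append(coeffs[k])
            folded_taps.append(taps[k])
            continue
        if coeffs[j] != sign * coeffs[k]:
            raise ValueError("coefficients do not have the requested symmetry")
        folded_coeffs.append(coeffs[k])
        folded_taps.append(taps[k] + sign * taps[j])   # bit-level preadder/presubtractor
    return folded_coeffs, folded_taps


fc, ft = fold_symmetric([2, -3, 5, -3, 2], [10, -7, 4, 100, -128])
print(fc, ft)                                # [2, -3, 5] [-118, 93, 4]
print(sum(c * x for c, x in zip(fc, ft)))    # -495, same as the unfolded sum
```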
Holding Input Data in a Valid State. In filters with a DA architecture, data can be delivered to the outputs N cycles (N >= 2) later than the inputs. You can use the HoldInputDataBetweenSamples property to determine how long (in terms of clock cycles) input data values are held in a valid state, as follows:
When HoldInputDataBetweenSamples is set to 'on' (the default), input data values are held in a valid state across N clock cycles.
When HoldInputDataBetweenSamples is set to 'off', data values are held in a valid state for only one clock cycle. For the next N-1 cycles, data is in an unknown state (expressed as 'X') until the next input sample is clocked in.
InputPipeline lets you specify an implementation with input pipelining for selected blocks. The parameter value specifies the number of input pipeline stages (pipeline depth) in the generated code.
Syntax:
{'InputPipeline', nStages}
where nStages >= 0.
The following forEach call specifies an input pipeline depth of two stages for all Sum blocks in the model:
config.forEach('*',...
'built-in/Sum', {},...
'hdldefaults.SumRTW', {'InputPipeline', 2});
When generating code for pipeline registers, the coder appends a postfix string to names of input or output pipeline registers. The default postfix string is _pipe. To customize the postfix string, use the Pipeline postfix option in the Global Settings / General pane in the HDL Coder pane of the Configuration Parameters dialog box. Alternatively, you can pass the desired postfix string in the makehdl property PipelinePostfix. See PipelinePostfix for an example.
OutputPipeline lets you specify an implementation with output pipelining for selected blocks. The parameter value specifies the number of output pipeline stages (pipeline depth) in the generated code.
Syntax:
{'OutputPipeline', nStages}
where nStages >= 0.
The following forEach call specifies an output pipeline depth of two stages for all Sum blocks in the model:
config.forEach('*',...
'built-in/Sum', {},...
'hdldefaults.SumRTW', {'OutputPipeline', 2});
See also Distributed Pipeline Insertion.
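To show what pipeline depth means for timing, the following Python sketch models a chain of nStages registers at the cycle level. It is a hypothetical behavioral model, not the coder's generated HDL; the zero initial register value and the placement of one register chain on each input of a two-input Sum block are assumptions.

```python
from collections import deque


class PipelineRegisters:
    """nStages registers in series that all update on the same clock edge."""

    def __init__(self, n_stages: int, initial: int = 0) -> None:
        if n_stages < 0:
            raise ValueError("nStages must be >= 0")
        self.regs = deque([initial] * n_stages)

    def clock(self, value: int) -> int:
        """Apply one clock edge and return the value leaving the last register."""
        if not self.regs:
            return value              # nStages = 0: no registers, no added latency
        self.regs.append(value)
        return self.regs.popleft()


def pipelined_sum(a_seq: list[int], b_seq: list[int], n_stages: int) -> list[int]:
    """A two-input adder with n_stages pipeline registers on each input."""
    pa, pb = PipelineRegisters(n_stages), PipelineRegisters(n_stages)
    return [pa.clock(a) + pb.clock(b) for a, b in zip(a_seq, b_seq)]


print(pipelined_sum([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], 2))  # [0, 0, 11, 22, 33]
```

With nStages = 2, each result appears two clock cycles after its inputs; deeper pipelines trade added latency for shorter combinational paths.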
The ResetType implementation parameter lets you suppress generation of reset logic for the following block types:
dspsigops/Delay
simulink/Additional Math & Discrete/Additional Discrete/Unit Delay Enabled
simulink/Commonly Used Blocks/Unit Delay
simulink/Discrete/Integer Delay
simulink/Discrete/Tapped Delay
sflib/Chart
sflib/Truth Table
Syntax:
{'ResetType', 'default'}
{'ResetType', 'none'}
When you specify {'ResetType', 'none'} for a selection of one or more blocks, the coder overrides the Global Settings/Advanced Reset type option for the specified blocks only. Reset signals and synchronous or asynchronous reset logic (as specified by Reset type) are still generated as required for other blocks.
The default specification is {'ResetType', 'default'}. In this case, the coder follows the Global Settings/Advanced Reset type option for the specified blocks.
The following control file specifies suppression of reset logic for a specific unit delay block within a subsystem.
function c = resetnone_examp
% Control file for resetnone_examp
c = hdlnewcontrol(mfilename);
c.generateHDLFor('resetnone_examp/HDLSubsystem');
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Suppress reset logic for Unit Delay block
c.forEach('resetnone_examp/HDLSubsystem/Unit Delay',...
'built-in/UnitDelay', {},...
'hdldefaults.UnitDelayRTW', {'ResetType','none'});
Some block implementation parameters let you customize features of an interface generated for the following block types:
simulink/Ports & Subsystems/Model
built-in/Subsystem
lfilinklib/HDL Cosimulation
modelsimlib/HDL Cosimulation
discoverylib/HDL Cosimulation
For example, you can specify generation of a black box interface for a subsystem, and pass parameters that specify the generation and naming of clock, reset, and other ports in HDL code. For more information about interface generation parameters, see Customizing the Generated Interface.
© 1984-2009 The MathWorks, Inc.