
Block Implementation Parameters

Overview

Block implementation parameters let you control details of the code generated for specific block implementations. Block implementation parameters are passed to forEach or forAll calls (see forEach) as cell arrays of property/value pairs of the form

{'PropertyName', value}

Property names are strings. The data type of a property value is specific to the property. This section describes the syntax of each block implementation parameter, and how the parameter affects generated code.

CoeffMultipliers

The CoeffMultipliers implementation parameter lets you specify use of canonic signed digit (CSD) or factored CSD optimizations for processing coefficient multiplier operations in code generated for certain filter blocks. Specify the CoeffMultipliers parameter in a control file as a property/value pair, for example {'CoeffMultipliers', 'csd'}.

The coder supports CoeffMultipliers for the filter block implementations shown in the following table:

Block                                   Implementation
dsparch4/Digital Filter                 hdldefaults.DigitalFilterHDLInstantiation
dspmlti4/FIR Decimation                 hdldefaults.FIRDecimationHDLInstantiation
dspmlti4/FIR Interpolation              hdldefaults.FIRInterpolationHDLInstantiation
dsparch4/Biquad Filter                  hdldefaults.BiquadFilterHDLInstantiation
simulink/Discrete/Discrete FIR Filter   hdldefaults.DiscreteFIRFilterHDLInstantiation

The following forEach call specifies that code generated for all FIR Decimation blocks in the model will use the CSD optimization:

config.forEach('*',...
 'dspmlti4/FIR Decimation', {},...
 'hdldefaults.FIRDecimationHDLInstantiation',...
 {'CoeffMultipliers', 'csd'});
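
For intuition about what the CSD option changes, the following sketch recodes one integer coefficient into canonic signed digit form, in which every digit is -1, 0, or +1 and no two adjacent digits are nonzero, so that a constant multiplication can be built from a few shifts and adds or subtracts. This is a plain MATLAB illustration and not the coder's internal algorithm; the coefficient value and all variable names are made up for the example.

```matlab
% Recode a positive integer coefficient into canonic signed digit (CSD) form.
% Illustrative only: the value 119 and all names here are hypothetical.
coeff = 119;             % binary 1110111 has six nonzero bits
k     = coeff;
csd   = [];              % CSD digits, least significant first
while k ~= 0
    if mod(k, 2) == 1
        d = 2 - mod(k, 4);   % +1 if k mod 4 == 1, -1 if k mod 4 == 3
    else
        d = 0;
    end
    csd(end+1) = d;      %#ok<SAGROW>
    k = (k - d) / 2;
end

% Rebuild the value from the digits to confirm the recoding is exact.
rebuilt = sum(csd .* 2.^(0:numel(csd)-1));
fprintf('CSD digits (LSB first): %s\n', mat2str(csd));
fprintf('nonzero digits: %d, rebuilt value: %d\n', nnz(csd), rebuilt);
```

Here 119 is recoded as 128 - 8 - 1, so three nonzero digits replace the six nonzero bits of the plain binary form.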

Distributed Arithmetic Implementation Parameters for Digital Filter Blocks

Distributed Arithmetic (DA) is a widely used technique for implementing sum-of-products computations without the use of multipliers. Designers frequently use DA to build efficient Multiply-Accumulate Circuitry (MAC) for filters and other DSP applications.

The main advantage of DA is its high computational efficiency. DA distributes multiply and accumulate operations across shifters, lookup tables (LUTs) and adders in such a way that conventional multipliers are not required.

The coder supports distributed arithmetic (DA) implementations for single-rate FIR structures of the Digital Filter and Discrete FIR Filter blocks, as given in the following table.

Block                                   Implementation                                  FIR Structures That Support DA
dsparch4/Digital Filter                 hdldefaults.DigitalFilterHDLInstantiation       dfilt.dffir, dfilt.dfsymfir, dfilt.dfasymfir
simulink/Discrete/Discrete FIR Filter   hdldefaults.DiscreteFIRFilterHDLInstantiation   dfilt.dffir, dfilt.dfsymfir, dfilt.dfasymfir

This section briefly summarizes the operation of DA. Detailed discussions of the theoretical foundations of DA appear in the following publications:

In a DA realization of a FIR filter structure, a sequence of input data words of width W is fed through a parallel to serial shift register, producing a serialized stream of bits. The serialized data is then fed to a bit-wide shift register. This shift register serves as a delay line, storing the bit serial data samples.

The delay line is tapped at intervals of the input word size W, one tap per filter coefficient, to form an N-bit address (where N is the number of coefficients) that indexes into a lookup table (LUT). The LUT stores all possible sums of partial products over the filter coefficients space. The LUT is followed by a shift and adder (scaling accumulator) that adds the values obtained from the LUT sequentially.

A table lookup is performed sequentially for each bit (in order of significance starting from the LSB). On each clock cycle, the LUT result is added to the accumulated and shifted result from the previous cycle. For the last bit (MSB), the table lookup result is subtracted, accounting for the sign of the operand.

This basic form of DA is fully serial, operating on one bit at a time. If the input data sequence is W bits wide, then a FIR structure takes W clock cycles to compute the output. Symmetric and asymmetric FIR structures are an exception, requiring W+1 cycles, because one additional clock cycle is needed to process the carry bit of the pre-adders.
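
The bit-serial procedure described above can be sketched numerically as follows. The sketch computes one output sample of a 4-tap FIR filter using a lookup table addressed by one bit from each tap, and compares the result with the ordinary sum of products. It models the arithmetic only, not the generated hardware: each LUT value is weighted by its bit significance directly instead of shifting an accumulator, which is mathematically equivalent. The coefficients, input values, and variable names are all hypothetical.

```matlab
% Fully serial distributed arithmetic for one FIR output sample.
% All values and names are hypothetical, chosen only for illustration.
c = [3 -5 7 2];          % filter coefficients (N = 4 taps)
x = [-6 4 -1 7];         % current contents of the tap delay line
W = 4;                   % input word size in bits (two's complement)
N = numel(c);

% Build the LUT: entry a+1 holds the sum of the coefficients whose
% address bit is set, for every N-bit address a.
lut = zeros(1, 2^N);
for a = 0:2^N-1
    lut(a+1) = sum(c(bitget(a, 1:N) == 1));
end

% Two's complement bit patterns of the inputs as unsigned integers.
u = mod(x, 2^W);

acc = 0;
for b = 1:W                                    % one bit position per clock cycle
    addr = sum(bitget(u, b) .* 2.^(0:N-1));    % bit b of every tap forms the address
    term = lut(addr+1) * 2^(b-1);              % weight by bit significance
    if b == W
        acc = acc - term;                      % MSB (sign bit) has negative weight
    else
        acc = acc + term;
    end
end

fprintf('DA result: %d, direct sum of products: %d\n', acc, sum(c .* x));
```

The loop runs W times, matching the W clock cycles of the fully serial form, and no multiplication by a coefficient appears anywhere in it.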

Improving Performance with Parallelism

The inherently bit serial nature of DA can limit throughput. To improve throughput, the basic DA algorithm can be modified to compute more than one bit sum at a time. The number of simultaneously computed bit sums is expressed as a power of two called the DA radix. For example, a DA radix of 2 (2^1) indicates that one bit sum is computed at a time; a DA radix of 4 (2^2) indicates that two bit sums are computed at a time, and so on.

To compute more than one bit sum at a time, the LUT is replicated. For example, to perform DA on 2 bits at a time (radix 4), the odd bits are fed to one LUT and the even bits are simultaneously fed to an identical LUT. The LUT results corresponding to odd bits are left-shifted before they are added to the LUT results corresponding to even bits. This result is then fed into a scaling accumulator that shifts its feedback value by 2 places.

Processing more than one bit at a time introduces a degree of parallelism into the operation, improving performance at the expense of area. You can control the degree of parallelism by specifying the DARadix implementation parameter in a control file. DARadix lets you specify the number of bits processed simultaneously in DA (see DARadix Implementation Parameter).
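
As a rough illustration of the radix-4 case, the sketch below performs two table lookups per cycle, one for each bit of a pair, left-shifts the result for the upper bit by one place, and accumulates the combined value. It is an arithmetic model under assumed values (the coefficients, inputs, and names are hypothetical, and W is assumed to be even), not a description of the generated HDL.

```matlab
% Radix-4 distributed arithmetic: two bit positions per clock cycle.
% Hypothetical values; W must be even for this two-bit sketch.
c = [3 -5 7 2];
x = [-6 4 -1 7];
W = 4;
N = numel(c);

lut = zeros(1, 2^N);                  % same LUT contents serve both lookups
for a = 0:2^N-1
    lut(a+1) = sum(c(bitget(a, 1:N) == 1));
end

u       = mod(x, 2^W);                % two's complement bit patterns
weights = 2.^(0:N-1);
acc     = 0;
cycles  = 0;
for b = 1:2:W                         % bits b and b+1 share one cycle
    lo = lut(sum(bitget(u, b)   .* weights) + 1);   % lower bit of the pair
    hi = lut(sum(bitget(u, b+1) .* weights) + 1);   % upper bit of the pair
    if b + 1 == W
        hi = -hi;                     % sign bit has negative weight
    end
    acc    = acc + (lo + 2*hi) * 2^(b-1);   % upper-bit result is left-shifted by one
    cycles = cycles + 1;
end

fprintf('result %d in %d cycles (fully serial DA needs %d)\n', acc, cycles, W);
```

The result is unchanged, but the loop runs half as many times, at the cost of a second lookup and an extra adder in every cycle.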

Reducing LUT Size

The size of the LUT grows exponentially with the order of the filter. For a filter with N coefficients, the LUT must have 2^N values. For higher order filters, LUT size must be reduced to reasonable levels. To reduce the size, you can subdivide the LUT into a number of LUTs, called LUT partitions. Each LUT partition operates on a different set of taps. The results obtained from the partitions are summed.

For example, for a 160-tap filter, the LUT size is (2^160)*W bits, where W is the word size of the LUT data. Dividing this into 16 LUT partitions, each taking 10 inputs (taps), reduces the total LUT size to 16*(2^10)*W = (2^14)*W bits, a reduction by a factor of 2^146.

Although LUT partitioning reduces LUT size, more adders are required to sum the LUT data.

You control how the LUT is partitioned in DA by specifying the DALUTPartition implementation parameter in a control file (see DALUTPartition Implementation Parameter).
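
The trade-off in the 160-tap example can be checked with a few lines of arithmetic. The sketch below compares entry counts as log2 values to keep the numbers readable; each entry is W bits wide in both cases. The adder count assumes that partition outputs are combined with two-input adders, which is an assumption of this sketch and not a statement about the generated code.

```matlab
% Compare LUT storage for a 160-tap filter with and without partitioning.
% Entry counts are compared as log2 values to keep the numbers readable.
taps      = 160;
partition = 10 * ones(1, 16);          % 16 partitions of 10 taps each

assert(sum(partition) == taps);        % partitions must cover every tap

log2Single  = taps;                            % one LUT: 2^160 entries
log2Split   = log2(sum(2 .^ partition));       % 16 LUTs of 2^10 entries each
extraAdders = numel(partition) - 1;            % assumed two-input adders

fprintf('single LUT:  2^%d entries\n', log2Single);
fprintf('partitioned: 2^%d entries, %d extra adders\n', log2Split, extraAdders);
```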

Requirements and Considerations for Generating Distributed Arithmetic Code

You can control how DA code is generated by using the DALUTPartition and DARadix implementation parameters in a control file. Before using these parameters, review the following general requirements, restrictions, and other considerations for generation of DA code.

Requirements Specific to Filter Type.   The DALUTPartition and DARadix parameters have certain requirements and restrictions that are specific to different filter types. These requirements are included in the discussions of each parameter (see DALUTPartition Implementation Parameter and DARadix Implementation Parameter).

Fixed-Point Quantization Required.   Generation of DA code is supported only for fixed-point filter designs.

Specifying Filter Precision.   The data path in HDL code generated for the DA architecture is carefully optimized for full precision computations. The filter result is cast to the output data size only at the final stage when it is presented to the output.

In distributed arithmetic the product and accumulator operations are merged, and computations are done at full precision. The Product output and Accumulator properties of the Digital Filter block are ignored and set to full precision.

DALUTPartition Implementation Parameter

Syntax: 'DALUTPartition', [p1 p2... pN]

DALUTPartition enables DA code generation and specifies the number and size of LUT partitions used for DA.

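Specify LUT partitions as a vector of integers [p1 p2...pN], where N is the number of partitions, each element specifies the size of one partition (the number of taps it addresses), and the elements sum to the filter length FL.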

Specifying DALUTPartition for Single-Rate Filters.   To determine the LUT partition for one of the supported single-rate filter types, calculate FL as shown in the following table. Then, specify the partition as a vector whose elements sum to FL.

Filter Type                       Filter Length (FL) Calculation
dfilt.dffir                       FL = length(find(Hd.numerator ~= 0))
dfilt.dfsymfir, dfilt.dfasymfir   FL = ceil(length(find(Hd.numerator ~= 0))/2)
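
The FL calculations in the table can be tried on a plain coefficient vector before writing a control file. The sketch below uses a hypothetical numerator (in a real design it would come from the filter object, Hd.numerator in the table above) and a small helper, isValid, that is not part of the coder and only checks that a candidate partition sums to FL.

```matlab
% Compute the filter length FL used for DALUTPartition and check candidate partitions.
% The coefficient vector and the isValid helper are hypothetical.
numerator = [1 0 -3 5 8 5 -3 0 1];     % symmetric, two zero-valued taps

nonzeroTaps = length(find(numerator ~= 0));
FL_dffir    = nonzeroTaps;             % dfilt.dffir
FL_dfsymfir = ceil(nonzeroTaps / 2);   % dfilt.dfsymfir and dfilt.dfasymfir

% A partition vector is consistent when its positive integer elements sum to FL.
isValid = @(p, FL) all(p > 0) && all(p == fix(p)) && sum(p) == FL;

fprintf('FL (dffir) = %d, FL (dfsymfir) = %d\n', FL_dffir, FL_dfsymfir);
fprintf('[4 3] valid for dffir:    %d\n', double(isValid([4 3], FL_dffir)));
fprintf('[2 2] valid for dfsymfir: %d\n', double(isValid([2 2], FL_dfsymfir)));
fprintf('[4 4] valid for dffir:    %d\n', double(isValid([4 4], FL_dffir)));
```

Zero-valued coefficients do not count toward FL, so this nine-coefficient filter has FL = 7 as a direct form structure and FL = 4 as a symmetric structure.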

The following figure shows a Digital Filter configured for a direct form FIR filter of length 11.

The following control file defines one possible LUT partitioning for this filter:

function c = filter_da_config1
c = hdlnewcontrol(mfilename);

c.forEach('*',...
 'dsparch4/Digital Filter', {},...
 'hdldefaults.DigitalFilterHDLInstantiation', {'DALUTPartition',[4 4 3]});

The following figure shows a Digital Filter configured for a direct-form symmetric FIR filter of length 6:

The following control file defines a possible LUT partitioning for this filter.

function c = filter_da_config1
c = hdlnewcontrol(mfilename);

c.forEach('*',...
 'dsparch4/Digital Filter', {},...
 'hdldefaults.DigitalFilterHDLInstantiation', {'DALUTPartition',[3 3]});

You can also specify generation of DA code for your filter design without LUT partitioning. To do so, specify a vector of one element, whose value is equal to the filter length. For example, the following figure shows a Digital Filter configuration for a direct form FIR filter of length 5.

The following control file specifies a partition that is equal to the filter length:

function c = filter_da_config1
c = hdlnewcontrol(mfilename);

c.forEach('*',...
 'dsparch4/Digital Filter', {},...
 'hdldefaults.DigitalFilterHDLInstantiation', {'DALUTPartition',5});

DARadix Implementation Parameter

Syntax: 'DARadix', N

DARadix specifies the number of bits processed simultaneously in DA. The number of bits is expressed as N, which must be a power of two (see Improving Performance with Parallelism).

The default value for N is 2, specifying processing of one bit at a time, or fully serial DA, which is slow but low in area. The maximum value for N is 2^W, where W is the input word size of the filter. This maximum specifies fully parallel DA, which is fast but high in area. Values of N between these extrema specify partly serial DA.
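
A control file that sets DARadix might look like the following sketch. The function name and the partition vector are placeholders, and the block path and implementation name are taken from the DALUTPartition examples in this section. The sketch assumes that several property/value pairs can share one cell array, as the Overview's description of property/value pairs suggests; verify this against your release before relying on it.

```matlab
function c = filter_da_radix_config
% Hypothetical control file: the function name and the partition are
% placeholders chosen for illustration.
c = hdlnewcontrol(mfilename);

% DALUTPartition enables DA; DARadix 4 then requests two bits per clock cycle.
% Passing both pairs in one cell array is an assumption of this sketch.
c.forEach('*',...
 'dsparch4/Digital Filter', {},...
 'hdldefaults.DigitalFilterHDLInstantiation',...
 {'DALUTPartition', [4 4 3], 'DARadix', 4});
```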

Special Cases

Coefficients with Zero Values.   DA ignores taps that have zero-valued coefficients and reduces the size of the DA LUT accordingly.

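Considerations for Symmetrical and Asymmetrical Filters.   For symmetrical (dfilt.dfsymfir) and asymmetrical (dfilt.dfasymfir) filters, one additional clock cycle is needed to process the carry bit of the pre-adders, and the filter length FL used for LUT partitioning is halved (see DALUTPartition Implementation Parameter).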

Holding Input Data in a Valid State.   In filters with a DA architecture, data can be delivered to the outputs N cycles (N >= 2) later than the inputs. You can use the HoldInputDataBetweenSamples property to determine how long (in terms of clock cycles) input data values are held in a valid state.

InputPipeline

InputPipeline lets you specify an implementation with input pipelining for selected blocks. The parameter value specifies the number of input pipeline stages (pipeline depth) in the generated code.

Syntax:

 {'InputPipeline', nStages}

where nStages >= 0.

The following forEach call specifies an input pipeline depth of two stages for all Sum blocks in the model:

config.forEach('*',...
 'built-in/Sum', {},...
 'hdldefaults.SumRTW', {'InputPipeline', 2});

When generating code for pipeline registers, the coder appends a postfix string to names of input or output pipeline registers. The default postfix string is _pipe. To customize the postfix string, use the Pipeline postfix option in the Global Settings / General pane in the HDL Coder pane of the Configuration Parameters dialog box. Alternatively, you can pass the desired postfix string in the makehdl property PipelinePostfix. See PipelinePostfix for an example.

OutputPipeline

OutputPipeline lets you specify an implementation with output pipelining for selected blocks. The parameter value specifies the number of output pipeline stages (pipeline depth) in the generated code.

Syntax:

 {'OutputPipeline', nStages}

where nStages >= 0.

The following forEach call specifies an output pipeline depth of two stages for all Sum blocks in the model:

config.forEach('*',...
 'built-in/Sum', {},...
 'hdldefaults.SumRTW', {'OutputPipeline', 2});

When generating code for pipeline registers, the coder appends a postfix string to names of input or output pipeline registers. The default postfix string is _pipe. To customize the postfix string, use the Pipeline postfix option in the Global Settings / General pane in the HDL Coder pane of the Configuration Parameters dialog box. Alternatively, you can pass the desired postfix string in the makehdl property PipelinePostfix. See PipelinePostfix for an example.

See also Distributed Pipeline Insertion.

ResetType

The ResetType implementation parameter lets you suppress generation of reset logic for selected blocks of supported types, such as the Unit Delay block.

Syntax:

 {'ResetType', 'default'}
 {'ResetType', 'none'}

When you specify {'ResetType', 'none'} for a selection of one or more blocks, the coder overrides the Global Settings/Advanced Reset type option for the specified blocks only. Reset signals and synchronous or asynchronous reset logic (as specified by Reset type) are still generated as required for other blocks.

The default specification is {'ResetType', 'default'}. In this case, the coder follows the Global Settings/Advanced Reset type option for the specified blocks.

The following control file specifies suppression of reset logic for a specific unit delay block within a subsystem.

function c = resetnone_examp

% Control file for resetnone_examp
c = hdlnewcontrol(mfilename);
c.generateHDLFor('resetnone_examp/HDLSubsystem');

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Suppress reset logic for Unit Delay block

c.forEach('resetnone_examp/HDLSubsystem/Unit Delay',...
 'built-in/UnitDelay', {},...
 'hdldefaults.UnitDelayRTW', {'ResetType','none'});

Interface Generation Parameters

Some block implementation parameters let you customize features of the interface generated for certain block types, such as subsystems.

For example, you can specify generation of a black box interface for a subsystem, and pass parameters that specify the generation and naming of clock, reset, and other ports in HDL code. For more information about interface generation parameters, see Customizing the Generated Interface.

  



© 1984-2009 The MathWorks, Inc.