This property applies to frame-based filters. It specifies how many pipeline registers the architecture includes between levels of the adder tree. These pipeline stages increase filter throughput while adding latency. The default value is 0. To improve the speed of this architecture, the recommended setting is 2.
Pipeline stages introduce delays along the path in the model that contains the affected filter. When you enable this pipeline option, the coder automatically adds balancing delays on parallel data paths.
For more information on the frame-based filter architecture, see Frame-Based Architecture.
This property applies to scalar-input filters. When you enable this property, the default linear adder of the filter is implemented as a pipelined tree adder instead. This architecture increases filter throughput while adding latency. The default value is off.
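To make the difference between the two adder topologies concrete, the following Python sketch sums the same operands with a linear chain of two-input adders and with a pairwise tree, and reports how many adder levels each one needs. It is a behavioral model only, not HDL Coder output, and the function names are hypothetical. A linear chain of N operands has N-1 adder levels in series, while a tree has ceil(log2(N)) levels. Assuming one pipeline register per tree level, each level adds a cycle of latency while the critical path shrinks to a single adder.

```python
def linear_adder(values):
    """Sum with a chain of two-input adders; returns (sum, adder levels)."""
    total = values[0]
    levels = 0
    for v in values[1:]:
        total += v   # each adder waits on the previous one: N-1 levels in series
        levels += 1
    return total, levels


def tree_adder(values):
    """Sum with a pairwise tree of two-input adders; returns (sum, adder levels).

    In a pipelined tree, a register after each level would add one cycle of
    latency per level while limiting the critical path to a single adder.
    """
    level = list(values)
    levels = 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:   # an unpaired operand passes through to the next level
            nxt.append(level[-1])
        level = nxt
        levels += 1
    return level[0], levels


taps = list(range(1, 9))      # eight products to be summed
print(linear_adder(taps))     # (36, 7)
print(tree_adder(taps))       # (36, 3)
```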
The following limitations apply to AddPipelineRegisters:
If you use AddPipelineRegisters, the code generator forces full precision in the HDL and the generated filter model. This option implements a pipelined adder tree structure in the HDL code for which only full precision is supported. If you generate a validation model, you must use full precision in the original model to avoid validation mismatches.
Pipeline stages introduce delays along the path in the model that contains the affected filter. When you enable this pipeline option, the coder automatically adds balancing delays on parallel data paths.
Note
When you use this property with the CIC Interpolation (DSP System Toolbox) block, delays in parallel paths are not automatically balanced. Manually add delays where needed by your design.
For filter architecture diagrams that indicate where the pipeline stages are added, see HDL Filter Architectures.
You can use the ChannelSharing implementation parameter with a multichannel filter to share a single filter implementation among channels for a more area-efficient design. This parameter is either 'on' or 'off'. The default is 'off', in which case a separate filter is implemented for each channel.
See Multichannel FIR Filter for FPGA (DSP System Toolbox).
The CoeffMultipliers implementation parameter lets you specify the use of canonical signed digit (CSD) or factored CSD optimizations for processing coefficient multiplier operations in code generated for certain filter blocks. Specify the CoeffMultipliers parameter using one of the following options:
'csd': Use CSD techniques to replace multiplier operations with shift-and-add operations. CSD techniques minimize the number of addition operations required for constant multiplication by representing binary numbers with a minimum count of nonzero digits. This representation decreases the area used by the filter while maintaining or increasing clock speed.
'factoredcsd': Use factored CSD techniques, which replace multiplier operations with shift-and-add operations on prime factors of the coefficients. This option lets you achieve a greater filter area reduction than CSD, at the cost of decreasing clock speed.
'multipliers' (default): Retain multiplier operations.
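The following Python sketch illustrates the arithmetic behind the 'csd' option. It converts an integer coefficient to canonical signed digit form (digits -1, 0, and +1, with no two adjacent nonzero digits) and then multiplies by shifting the input and adding or subtracting it. It assumes integer coefficients, uses hypothetical helper names, and is not the code HDL Coder generates. Factored CSD is not shown.

```python
def to_csd(n):
    """Canonical signed digit form of integer n, least significant digit first.

    Each digit is -1, 0, or +1, and no two adjacent digits are nonzero, which
    gives the minimum number of nonzero digits (and therefore of add/subtract
    operations) for a constant multiplier.
    """
    digits = []
    while n != 0:
        if n % 2 == 0:
            digits.append(0)
        else:
            d = 2 - (n % 4)   # +1 if n mod 4 == 1, -1 if n mod 4 == 3
            digits.append(d)
            n -= d            # clears the low bit, leaving n even
        n //= 2
    return digits


def csd_multiply(x, coeff):
    """Multiply x by a constant coefficient using only shifts and adds/subtracts."""
    result = 0
    for shift, d in enumerate(to_csd(coeff)):
        if d == 1:
            result += x << shift
        elif d == -1:
            result -= x << shift
    return result


coeff = 119                                   # 0b1110111: six 1 bits in binary
print(to_csd(coeff))                          # [-1, 0, 0, -1, 0, 0, 0, 1]
print(sum(d != 0 for d in to_csd(coeff)))     # 3  (119 = 128 - 8 - 1)
print(csd_multiply(5, coeff))                 # 595
```

Multiplying by 119 therefore takes two add/subtract operations in CSD form instead of five with plain binary shift-and-add.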
HDL Coder™ supports CoeffMultipliers for fully parallel filter implementations. It is not supported for fully serial and partly serial architectures.
The size of the LUT grows exponentially with the order of the filter. For a filter with N coefficients, the LUT must have 2^N values. For higher-order filters, LUT size must be reduced to reasonable levels. To reduce the size, you can subdivide the LUT into a number of LUTs, called LUT partitions. Each LUT partition operates on a different set of taps. The results obtained from the partitions are summed.
For example, for a 160-tap filter, the LUT size is (2^160)*W bits, where W is the word size of the LUT data. Dividing this LUT into 16 LUT partitions, each taking 10 inputs (taps), reduces the total LUT size to 16*(2^10)*W bits.
Although LUT partitioning reduces LUT size, more adders are required to sum the LUT data.
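The following Python sketch models LUT partitioning. It builds one DA LUT per partition, looks up each partition with its own slice of the address bits, and sums the outputs, which gives the same value as a single unpartitioned LUT. It also reproduces the storage arithmetic from the 160-tap example. The function names and the assignment of consecutive taps to each partition are assumptions for illustration, not HDL Coder's implementation.

```python
def build_lut(coeffs):
    """DA LUT: entry a holds the sum of coeffs[k] for every bit k set in a."""
    return [sum(c for k, c in enumerate(coeffs) if (a >> k) & 1)
            for a in range(1 << len(coeffs))]


def partitioned_luts(coeffs, partition):
    """Assign consecutive taps to each partition and build one LUT per partition."""
    luts, start = [], 0
    for size in partition:
        luts.append(build_lut(coeffs[start:start + size]))
        start += size
    return luts


def lookup_partitioned(luts, partition, address):
    """Split a full-width address across the partitions and sum the LUT outputs."""
    total, shift = 0, 0
    for lut, size in zip(luts, partition):
        total += lut[(address >> shift) & ((1 << size) - 1)]  # one extra adder per partition
        shift += size
    return total


def lut_bits(partition, word_size):
    """Total LUT storage in bits: the sum of 2^p * W over all partitions."""
    return sum((1 << p) * word_size for p in partition)
```

For the 160-tap example, lut_bits([160], W) equals (2^160)*W, while lut_bits([10] * 16, W) equals 16*(2^10)*W.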
You can use DALUTPartition to enable DA code generation and specify the number and size of LUT partitions.
Specify LUT partitions as a vector of integers [p1 p2...pN] where:
N is the number of partitions.
Each vector element specifies the size of a partition. The maximum size for an individual partition is 12.
The sum of all vector elements equals the filter length FL. FL is calculated differently depending on the filter type. The next section shows how FL is calculated for each filter type.
See Distributed Arithmetic for HDL Filters.
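As an illustration of these rules, the following Python sketch checks a candidate partition vector and builds one valid partition automatically by splitting FL into the fewest near-equal parts of at most 12. The helper names and the near-equal splitting strategy are assumptions for illustration; HDL Coder does not necessarily choose partitions this way.

```python
MAX_PARTITION_SIZE = 12  # maximum individual partition size stated for DALUTPartition


def check_lut_partition(partition, filter_length):
    """Return a list of problems with a [p1 p2 ... pN] partition (empty if valid)."""
    if not partition:
        return ["the partition vector is empty"]
    problems = []
    for i, p in enumerate(partition):
        if p < 1:
            problems.append(f"p{i + 1} = {p} is not a positive integer")
        elif p > MAX_PARTITION_SIZE:
            problems.append(f"p{i + 1} = {p} exceeds {MAX_PARTITION_SIZE}")
    if sum(partition) != filter_length:
        problems.append(f"elements sum to {sum(partition)}, but FL = {filter_length}")
    return problems


def even_partition(filter_length, max_size=MAX_PARTITION_SIZE):
    """Split FL into the fewest near-equal partitions no larger than max_size."""
    count = -(-filter_length // max_size)  # ceil(FL / max_size)
    base, extra = divmod(filter_length, count)
    return [base + 1] * extra + [base] * (count - extra)
```

For FL = 160, even_partition returns six partitions of 12 and eight of 11; the 16 partitions of 10 from the earlier example are equally valid.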
To determine the LUT partition for one of the supported single-rate filter types, calculate FL as shown in the following table. Then, specify the partition as a vector whose elements sum to FL.
Filter Type | Filter Length (FL) Calculation
Direct-form FIR | FL = length(find(Hd.numerator ~= 0))
Direct-form asymmetrical FIR, direct-form symmetrical FIR | FL = ceil(length(find(Hd.numerator ~= 0))/2)
You can also specify generation of DA code for your filter design without LUT partitioning. To do so, specify a vector of one element whose value is equal to the filter length.
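The two table formulas can be mirrored in Python if a plain list of coefficients stands in for Hd.numerator. This is a sketch with hypothetical function names that only reproduces the MATLAB expressions in the table.

```python
def nonzero_taps(numerator):
    """Equivalent of length(find(numerator ~= 0))."""
    return sum(1 for c in numerator if c != 0)


def fl_direct_form(numerator):
    """Direct-form FIR: FL = length(find(Hd.numerator ~= 0))."""
    return nonzero_taps(numerator)


def fl_symmetric(numerator):
    """Direct-form (a)symmetrical FIR: FL = ceil(length(find(Hd.numerator ~= 0))/2)."""
    return -(-nonzero_taps(numerator) // 2)  # integer ceiling division


h = [1, 0, 2, 3, 2, 0, 1]
print(fl_direct_form(h))   # 5
print(fl_symmetric(h))     # 3
```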
For supported multirate filters (FIR Decimation and FIR Interpolation), you can specify the LUT partition as:
A vector defining a partition for LUTs for all polyphase subfilters.
A matrix of LUT partitions, where each row vector specifies a LUT partition for a corresponding polyphase subfilter. In this case, the FL is uniform for all subfilters. This approach provides fine control for partitioning each subfilter.
The following table shows the FL calculations for each type of LUT partition.
LUT Partition | Filter Length (FL) Calculation
Vector: Determine FL as shown in the Filter Length (FL) Calculation column to the right. Specify the LUT partition as a vector of integers whose elements sum to FL. | FL = size(polyphase(Hm), 2)
Matrix: Determine the subfilter length FLi based on the polyphase decomposition of the filter, as shown in the Filter Length (FL) Calculation column to the right. Specify the LUT partition for each subfilter as a row vector whose elements sum to FLi. | p = polyphase(Hm); FLi = length(find(p(i,:))); where p(i,:) represents the ith subfilter.
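The following Python sketch mirrors these calculations. It assumes that polyphase(Hm) returns a matrix whose row i holds every M-th coefficient starting at tap i, zero-padded to a common length, where M is the rate-change factor. The helper names are hypothetical, and Python rows are indexed from 0 rather than 1.

```python
def polyphase_matrix(h, m):
    """Row i holds h[i], h[i+m], h[i+2m], ..., zero-padded to a common length."""
    ncols = -(-len(h) // m)   # ceil(len(h) / m)
    rows = []
    for i in range(m):
        row = list(h[i::m])
        rows.append(row + [0] * (ncols - len(row)))
    return rows


def fl_vector(p):
    """Vector partition: FL = size(polyphase(Hm), 2), the number of columns."""
    return len(p[0])


def fl_per_subfilter(p):
    """Matrix partition: FLi = length(find(p(i,:))), the nonzeros in row i."""
    return [sum(1 for c in row if c != 0) for row in p]


p = polyphase_matrix([1, 2, 3, 4, 5], 2)
print(p)                     # [[1, 3, 5], [2, 4, 0]]
print(fl_vector(p))          # 3
print(fl_per_subfilter(p))   # [3, 2]
```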
The inherently bit-serial nature of DA can limit throughput. To improve throughput, the basic DA algorithm can be modified to compute more than one bit sum at a time. The number of simultaneously computed bit sums is expressed as a power of two called the DA radix. For example, a DA radix of 2 (2^1) indicates that one bit sum is computed at a time. A DA radix of 4 (2^2) indicates that two bit sums are computed at a time, and so on.
To compute more than one bit sum at a time, the LUT is replicated. For example, to perform DA on 2 bits at a time (radix 4), the odd bits are fed to one LUT and the even bits are simultaneously fed to an identical LUT. The LUT results corresponding to odd bits are left-shifted by one place before they are added to the LUT results corresponding to even bits. This result is then fed into a scaling accumulator that shifts its feedback value by 2 places.
Processing more than one bit at a time introduces a degree of parallelism into the operation, improving speed at the expense of area.
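The following Python sketch models DA for a dot product of constant coefficients with W-bit two's complement inputs, processing one bit (radix 2) or several bits (radix 4 and higher) per step. Each step does one lookup per bit into copies of the same LUT, left-shifts the higher-bit results within the group, and feeds the group sum into a scaling accumulator. The sketch walks the bit groups from most significant to least significant, so its accumulator shifts left; a least-significant-first formulation with a right-shifting accumulator computes the same sum. The function names are hypothetical, and the model ignores the fixed-point scaling and LUT word size that real hardware needs.

```python
def build_lut(coeffs):
    """LUT entry a holds the sum of coeffs[k] for every bit k that is set in a."""
    return [sum(c for k, c in enumerate(coeffs) if (a >> k) & 1)
            for a in range(1 << len(coeffs))]


def bit_address(xs, j, w):
    """Form a LUT address from bit j of each W-bit two's complement input."""
    mask = (1 << w) - 1
    return sum((((x & mask) >> j) & 1) << k for k, x in enumerate(xs))


def da_dot(coeffs, xs, w, bits_per_cycle=1):
    """Compute sum(c * x) with DA, consuming bits_per_cycle input bits per step.

    bits_per_cycle = log2(radix): 1 -> radix 2, 2 -> radix 4, and so on.
    """
    if w % bits_per_cycle:
        raise ValueError("word size must be a multiple of bits_per_cycle")
    lut = build_lut(coeffs)
    acc = 0
    # Walk bit groups from the most significant group down to bit 0.
    for start in range(w - bits_per_cycle, -1, -bits_per_cycle):
        partial = 0
        for j in range(start, start + bits_per_cycle):   # one LUT copy per bit
            value = lut[bit_address(xs, j, w)]
            if j == w - 1:
                value = -value   # the MSB of a two's complement input has negative weight
            partial += value << (j - start)   # higher bits in the group are left-shifted
        acc = (acc << bits_per_cycle) + partial   # scaling accumulator
    return acc


coeffs, xs = [3, -2, 5], [7, -4, 1]
print([da_dot(coeffs, xs, 4, r) for r in (1, 2, 4)])   # [34, 34, 34]
```

In this model every radix returns the same value; only the number of steps and the number of LUT copies change.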
You can use DARadix to specify the number of bits processed simultaneously in DA. The value is expressed as the radix N, where log2(N) bits are processed at a time. N must be:
A positive integer that is a power of two
Such that mod(W, log2(N)) = 0, where W is the input word size of the filter
The default value for N is 2, specifying processing of one bit at a time, or fully serial DA, which is slow but low in area. The maximum value for N is 2^W, where W is the input word size of the filter. This maximum specifies fully parallel DA, which is fast but high in area. Values of N between these extremes specify partly serial DA.
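The constraints on N can be enumerated directly. The following Python sketch, with hypothetical helper names, lists the legal DARadix values for a given input word size W and the number of bit groups, W/log2(N), that a DA loop processes per output sample in a simple model without pipelining.

```python
def valid_da_radix(word_size):
    """All radix values N = 2^b (b >= 1) with mod(W, log2(N)) == 0."""
    return [1 << b for b in range(1, word_size + 1) if word_size % b == 0]


def da_steps_per_output(word_size, radix):
    """Number of bit groups (W / log2(N)) processed for each output sample."""
    bits = radix.bit_length() - 1   # log2(N) when N is a power of two
    if radix < 2 or radix & (radix - 1) or word_size % bits:
        raise ValueError(f"radix {radix} is not valid for W = {word_size}")
    return word_size // bits


print(valid_da_radix(16))                                         # [2, 4, 16, 256, 65536]
print([da_steps_per_output(16, n) for n in valid_da_radix(16)])   # [16, 8, 4, 2, 1]
```

Radix 2 (fully serial) needs W steps per output, and radix 2^W (fully parallel) needs one.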
Note
When setting a DARadix
value for symmetrical
and asymmetrical filters, see Considerations for Symmetrical and Asymmetrical Filters.
FoldingFactor specifies the total number of clock cycles taken for the computation of filter output in an IIR SOS filter with serial architecture. It is complementary to NumMultipliers. You must select one property or the other; you cannot use both. If you do not specify either FoldingFactor or NumMultipliers, HDL code for the filter is generated with a fully parallel architecture.
You can use this parameter to generate a specified number of pipeline stages at multiplier inputs for FIR filter structures. The default value is 0.
The following limitation applies to MultiplierInputPipeline:
Pipeline stages introduce delays along the path in the model that contains the affected filter. When you enable this pipeline option, the coder automatically adds balancing delays on parallel data paths.
For diagrams of where these pipeline stages occur in the filter architecture, see HDL Filter Architectures.
You can use this parameter to generate a specified number of pipeline stages at multiplier outputs for FIR filter structures. The default value is 0.
The following limitation applies to MultiplierOutputPipeline:
Pipeline stages introduce delays along the path in the model that contains the affected filter. When you enable this pipeline option, the coder automatically adds balancing delays on parallel data paths.
For diagrams of where these pipeline stages occur in the filter architecture, see HDL Filter Architectures.
NumMultipliers specifies the total number of multipliers used for the filter implementation in an IIR SOS filter with serial architecture. It is complementary to the FoldingFactor property. You must select one property or the other; you cannot use both. If you do not specify either FoldingFactor or NumMultipliers, HDL code for the filter is generated with a fully parallel architecture.
You can use this parameter to enable or disable accumulator reuse in a serial HDL architecture. The default is a fully parallel architecture.
To Generate This Architecture... | Set ReuseAccum to...
Fully parallel | Omit this property
Fully serial | Not specified, or 'off'
Partly serial | 'off'
Cascade-serial with explicitly specified partitioning | 'on'
Cascade-serial with automatically optimized partitioning | 'on'
For more information on parallel and serial filter architectures, see HDL Filter Architectures.
Use this parameter to specify partitions for a serial filter architecture. The default is a fully parallel architecture.
To Generate This Architecture... | Set SerialPartition to...
Fully parallel | Omit this property
Fully serial | N, where N is the length of the filter
Partly serial | [p1 p2 p3...pN]: A vector of integers having N elements, where N is the number of serial partitions. Each element of the vector specifies the length of the corresponding partition. The sum of the vector elements must be equal to the length of the filter. When you define the partitioning for a partly serial architecture, consider the following:
Cascade-serial with explicitly specified partitioning | [p1 p2 p3...pN]: A vector of N integers, where N is the number of serial partitions. Each element of the vector specifies the length of the corresponding partition. The sum of the vector elements must be equal to the length of the filter. The values of the vector elements must be in descending order, except the last two elements, which can be equal. For example, for a filter length of 8, partitions [5 3] or [4 2 2] are valid, but the partitions [2 2 2 2] and [3 2 3] raise an error at code generation time.
Cascade-serial with automatically optimized partitioning | Omit this property.
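The following Python sketch checks a partition vector against the rules in this table, including the descending-order rule for cascade-serial partitions. It is an illustrative check with a hypothetical function name, not the validation HDL Coder performs at code generation time.

```python
def check_serial_partition(partition, filter_length, cascade=False):
    """Check a [p1 p2 ... pN] serial partition; return a list of problems (empty if valid)."""
    problems = []
    if any(p < 1 for p in partition):
        problems.append("every partition length must be a positive integer")
    if sum(partition) != filter_length:
        problems.append(f"elements sum to {sum(partition)}, expected {filter_length}")
    if cascade:
        # Cascade-serial: strictly descending, except the last two may be equal.
        last = len(partition) - 2
        for i, (a, b) in enumerate(zip(partition, partition[1:])):
            if a < b or (a == b and i != last):
                problems.append(f"p{i + 1} = {a} and p{i + 2} = {b} break descending order")
    return problems


print(check_serial_partition([4, 2, 2], 8, cascade=True))   # []
print(check_serial_partition([3, 2, 3], 8, cascade=True))   # ['p2 = 2 and p3 = 3 break descending order']
```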
For more information on parallel and serial filter architectures, see HDL Filter Architectures.
This property is also used for Min/Max blocks with cascade-serial architectures. For details on configuring Min/Max cascades, see SerialPartition.