Optimize speed or area of generated HDL code

Optimization properties for filter generation.

These properties configure the HDL architecture of your filter to improve speed or reduce area. You can customize a serial filter architecture, select alternative multiplier implementations, and add pipeline registers. Specify these properties as `Name,Value` arguments to the `generatehdl` function, or set the corresponding options on the **Filter Architecture** tab of the Generate HDL dialog box.
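As a minimal sketch of the `Name,Value` syntax, the following call reuses the lowpass design and input data type from the `DALUTPartition` example on this page and sets two of the properties described here. The property values are chosen only for illustration.

```matlab
% Design a 4th-order lowpass FIR filter as a System object.
fdes = fdesign.lowpass('N,Fc,Ap,Ast',4,0.4,0.05,0.03,'linear');
filt = design(fdes,'SystemObject',true);

% Pass optimization properties as Name,Value pairs.
generatehdl(filt,'InputDataType',numerictype(1,16,15), ...
    'FIRAdderStyle','tree', ...
    'AddInputRegister','off')
```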

`AddPipelineRegisters`: Optimize clock rate of generated filter code by adding pipeline registers

`'off'` (default) | `'on'`

When you set this property to `'on'`, the coder adds a pipeline register between stages of computation in a filter. For example, for a sixth-order IIR filter, the coder adds two pipeline registers, one between the first and second sections and one between the second and third sections. Although the registers add to the overall filter latency, they provide significant improvements to the clock rate. For FIR filters, the use of pipeline registers optimizes the final summation. The coder forces a tree structure for non-transposed FIR filters. This setting overrides the setting of the `FIRAdderStyle` property.

Filter Type | Location of Added Pipeline Register |
---|---|
FIR transposed | Between coefficient multipliers and adders |
FIR, antisymmetric FIR, and symmetric FIR | Between levels of a tree-based final adder |
IIR | Between sections |

For details, see Optimizing Final Summation for FIR Filters.

Pipeline registers in FIR, antisymmetric FIR, and symmetric FIR filters can produce numeric results that differ from the results produced by the original filter object. The difference occurs because adding pipeline registers forces the tree mode of final summation. In such cases, consider adjusting the generated test bench error margin with the `ErrorMargin` property.

You cannot use this property with a fully serial or cascade serial filter implementation.
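The register count in the sixth-order IIR example above follows from simple arithmetic. This sketch assumes the filter is built from second-order sections with one register between each pair of adjacent sections; the variable names are illustrative.

```matlab
filterOrder = 6;
numSections = ceil(filterOrder/2);      % second-order sections: 3
numPipelineRegs = numSections - 1;      % one register between adjacent sections: 2
fprintf('%d sections, %d pipeline registers\n',numSections,numPipelineRegs);
```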

`FIRAdderStyle`: Final summation technique used for FIR filters

`'linear'` (default) | `'tree'`

By default, the coder generates linear adder summation logic. Set this property to `'tree'` to increase clock speed while using the same area. The tree architecture computes products in parallel, rather than sequentially, and it creates a final adder that performs pairwise addition on successive products.

Another option for FIR filter sum implementation is to set the `AddPipelineRegisters` property to `'on'`. The pipelined implementation produces results similar to tree mode, with the addition of a stage of pipeline registers after processing each level of the tree.

Consider the following tradeoffs when selecting the final summation technique for your filter:

- The number of adder operations for linear and tree mode is the same, but the timing for tree mode can be significantly better due to parallel execution of sums.
- Pipeline mode optimizes the clock rate, but increases the filter latency by the base 2 logarithm of the number of products to be added, rounded up to the nearest integer.
- Linear mode can help maintain numeric accuracy in comparison to the original filter function. Tree and pipeline modes can produce numeric results that differ from the results produced by the original filter function.

See Optimizing Final Summation for FIR Filters.

You cannot use this property with a fully serial or cascade serial filter implementation.
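To make these tradeoffs concrete, the following sketch compares the adder depth of linear and tree summation and computes the added latency of pipeline mode from the rule stated above. It assumes `n` products and a balanced pairwise tree; `n` and the variable names are illustrative.

```matlab
n = 11;                                 % number of products to add (illustrative)
numAdders    = n - 1;                   % same for linear and tree summation
linearDepth  = n - 1;                   % adders chained one after another
treeDepth    = ceil(log2(n));           % levels of pairwise addition
addedLatency = ceil(log2(n));           % extra latency in pipeline mode
fprintf('adders: %d, linear depth: %d, tree depth: %d, added latency: %d\n', ...
    numAdders,linearDepth,treeDepth,addedLatency);
```

For 11 products, the critical path drops from 10 chained adders to 4 tree levels while the adder count stays at 10.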

`AddInputRegister`: Generate extra register on filter input in HDL code

`'on'` (default) | `'off'`

By default, the coder adds an extra input register to the generated HDL code for the filter. The code declares a signal named `input_register` and includes a `PROCESS` statement that controls the register. You can set other properties to control the names of the clock, clock enable, and reset signals, the polarity of the reset signal, and the coding style that checks for clock events. See Ports and Identifiers Properties.

```vhdl
Input_Register_Process : PROCESS (clk, reset)
BEGIN
  IF reset = '1' THEN
    input_register <= (OTHERS => '0');
  ELSIF clk'event AND clk = '1' THEN
    IF clk_enable = '1' THEN
      input_register <= input_typeconvert;
    END IF;
  END IF;
END PROCESS Input_Register_Process;
```

When you set this property to `'off'`, the coder omits the extra input register from the generated HDL code for the filter. Consider omitting the extra register if you are incorporating the filter into HDL code that has an existing register to drive the filter input. Also, omit the extra register if the latency it introduces to the filter is not tolerable.

`AddOutputRegister`: Generate extra register for filter output in HDL code

`'on'` (default) | `'off'`

By default, the coder adds an extra output register to the generated HDL code for the filter. The code declares a signal named `output_register` and includes a `PROCESS` statement that controls the register. You can set other properties to control the names of the clock, clock enable, and reset signals, the polarity of the reset signal, and the coding style that checks for clock events. See Ports and Identifiers Properties.

```vhdl
Output_Register_Process : PROCESS (clk, reset)
BEGIN
  IF reset = '1' THEN
    output_register <= (OTHERS => '0');
  ELSIF clk'event AND clk = '1' THEN
    IF clk_enable = '1' THEN
      output_register <= output_typeconvert;
    END IF;
  END IF;
END PROCESS Output_Register_Process;
```

When you set this property to `'off'`, the coder omits the extra output register from the generated HDL code for the filter. Consider omitting the extra register if you are incorporating the filter into HDL code that has an existing output register. Also, omit the extra register if the latency it introduces to the filter is not tolerable.

`MultiplierInputPipeline`: Number of pipeline stages at multiplier inputs for FIR filters

`0` (default) | nonnegative integer

For FIR filters, the coder generates this number of pipeline stages on each multiplier input. The number of pipeline stages must be an integer greater than or equal to zero. Multiplier pipelining can help you achieve significantly higher clock rates. The coder ignores this property if `CoeffMultipliers` is not set to `'multiplier'`.

`MultiplierOutputPipeline`: Number of pipeline stages at multiplier outputs for FIR filters

`0` (default) | nonnegative integer

For FIR filters, the coder generates this number of pipeline stages on each multiplier output. The number of pipeline stages must be an integer greater than or equal to zero. Multiplier pipelining can help you achieve significantly higher clock rates. The coder ignores this property if `CoeffMultipliers` is not set to `'multiplier'`.
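As a hedged sketch, the call below pipelines both sides of each multiplier. It reuses the lowpass design from the `DALUTPartition` example on this page and sets `CoeffMultipliers` explicitly to its default, so the dependency is visible. The stage counts are illustrative.

```matlab
fdes = fdesign.lowpass('N,Fc,Ap,Ast',4,0.4,0.05,0.03,'linear');
filt = design(fdes,'SystemObject',true);

% Multiplier pipelining applies only when multiplier logic is retained.
generatehdl(filt,'InputDataType',numerictype(1,16,15), ...
    'CoeffMultipliers','multiplier', ...
    'MultiplierInputPipeline',1, ...
    'MultiplierOutputPipeline',2)
```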

`OptimizeForHDL`: Basic optimization of data types, quantization, and filter structure

`'off'` (default) | `'on'`

By default, the coder generates a fully parallel architecture with numerics that match the filter object exactly. However, the data types and quantization used in the software implementation are not necessarily optimal for HDL implementation. When you set this property to `'on'`, the coder generates HDL code that reduces area of the hardware implementation and optimizes data types and quantization effects. As a result of these optimizations, the coder can:

- Implement an adder-tree structure
- Make tradeoffs concerning data types
- Avoid excessive quantization
- Generate code that produces numeric results that differ from results produced by the original filter function

You can combine this option with the serial architecture and multiplier optimization properties.

`CoeffMultipliers`: Implementation of coefficient multiplications in generated HDL code

`'multiplier'` (default) | `'csd'` | `'factored-csd'`

By default, the coder retains multiplier logic in the generated HDL code. To reduce the area of the filter implementation, you can choose to implement multiplication in either canonical signed digit (CSD) or factored CSD logic. The CSD technique replaces multipliers with shift and add logic.

A CSD architecture minimizes the number of adders used for constant multiplication by representing binary numbers with a minimum count of nonzero digits. This optimization decreases the area used by the filter while maintaining or increasing clock speed.

Factored CSD replaces multiplier operations with shift and add operations on prime factors of the coefficients. This option achieves a greater area reduction than CSD, at the cost of decreasing clock speed.

This option is not supported for multirate or serial architecture filters.
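To illustrate why a CSD representation reduces the adder count, this sketch recodes a positive integer constant into signed digits (-1, 0, +1) with no two adjacent nonzero digits, then compares the nonzero-digit count with plain binary. It is a textbook recoding written for illustration, not the coder's internal algorithm, and all variable names are assumptions.

```matlab
c = 119;                         % example constant: binary 1110111
n = c;
digits = [];                     % signed digits, least significant first
while n ~= 0
    if mod(n,2) == 1
        d = 2 - mod(n,4);        % +1 if n mod 4 == 1, -1 if n mod 4 == 3
        n = n - d;
    else
        d = 0;
    end
    digits(end+1) = d;           %#ok<SAGROW>
    n = n/2;
end
binaryOnes = nnz(dec2bin(c) == '1');                 % nonzero bits in binary
csdNonzero = nnz(digits);                            % nonzero signed digits
value = sum(digits .* 2.^(0:numel(digits)-1));       % reconstructs c
fprintf('binary nonzeros: %d, CSD nonzeros: %d, value: %d\n', ...
    binaryOnes,csdNonzero,value);
```

Here 119 is recoded as 128 - 8 - 1, so a shift-and-add implementation needs two adders or subtractors instead of the five that the six binary ones would require.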

`SerialPartition`: Number and size of partitions generated for serial filter architectures

`[p1 p2 ... pN]`

By default, the coder generates a fully parallel architecture, which is equivalent to a vector of `FL` ones, where `FL` is the length of the filter.

To generate a fully serial architecture, set this property to the length of the filter, `FL`.

To generate a partly serial architecture, set this property to a vector of integers, `[p1 p2 p3...pN]`. This vector specifies the length of each of `N` partitions. The sum of the vector elements must be equal to the length of the filter, `FL`.

For a cascade of filters, set this property to `{[p1 p2 ... pNa],[p1 p2 ... pNb],...}`, where each vector in the cell array represents a serial partitioning of an individual filter within the cascade.

For further savings in area, you can optionally enable the `ReuseAccum` property to generate a cascade-serial architecture using the partitions you specified.

For a complete description of parallel and serial architectures and a list of filter types supported for each architecture, see Speed vs. Area Tradeoffs. For an example, see Compare Serial Architectures for FIR Filter.

You can specify different `SerialPartition` values for each stage of a cascaded filter. See Serial Partitions for Cascaded Filter.
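The partition rules above can be checked before generating code. This sketch uses a hypothetical coefficient vector `b` in place of `filt.Numerator`. It assumes a direct form FIR structure, for which the filter length is the number of nonzero coefficients, as in the direct form calculation given under `DALUTPartition`. It then builds the three kinds of partition vector.

```matlab
b = [0.02 0.11 0 0.37 0.5 0.37 0 0.11 0.02];   % hypothetical coefficients
FL = length(find(b ~= 0));              % filter length for direct form: 7

fullyParallel = ones(1,FL);             % default: a vector of FL ones
fullySerial   = FL;                     % a single partition of length FL
partlySerial  = [4 3];                  % two partitions (illustrative split)

% Every partition vector must sum to the filter length.
assert(sum(fullyParallel) == FL);
assert(sum(fullySerial)   == FL);
assert(sum(partlySerial)  == FL);

% With a real filter object, the chosen vector would be passed as, for example:
% generatehdl(filt,'InputDataType',numerictype(1,16,15),'SerialPartition',partlySerial)
```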

`ReuseAccum`: Enable accumulator reuse when generating cascade-serial architecture for FIR filters

`'off'` (default) | `'on'`

In a cascade-serial architecture, the coder groups filter taps into several serial partitions. The accumulated output of each partition is cascaded to the accumulator of the previous partition. The output of the partitions is therefore computed at the accumulator of the first partition. This technique, called *accumulator reuse*, saves chip area.

Set this property to `'on'` to enable accumulator reuse and generate a cascade-serial architecture. If the number and size of serial partitions are not specified in the `SerialPartition` property, the coder generates an optimal partition.

For a complete description of parallel and serial architectures and a list of filter types supported for each architecture, see Speed vs. Area Tradeoffs. For an example, see Compare Serial Architectures for FIR Filter.

`DALUTPartition`: Number and size of lookup table (LUT) partitions for distributed arithmetic (DA) implementation

`[p1 p2 ... pN]`

Distributed arithmetic uses a lookup table to store the sums of partial products. The size of the LUT grows exponentially with the order of the filter. You can divide the LUT into several partitions, where each LUT partition operates on a different set of filter taps. This division reduces the total size of the LUT logic.

To divide the LUT into `N` partitions, set this property to a vector of `N` integers that specify the size of each partition. The maximum size for an individual partition is 12. The sum of the vector elements must be equal to the filter length.

To generate DA code for your filter design without LUT partitioning, specify a scalar whose value is equal to the filter length.

```matlab
% A 4th-order FIR design has a filter length of 5.
fdes = fdesign.lowpass('N,Fc,Ap,Ast',4,0.4,0.05,0.03,'linear');
filt = design(fdes,'SystemObject',true);
% A scalar equal to the filter length requests DA without LUT partitioning.
generatehdl(filt,'InputDataType',numerictype(1,16,15),'DALUTPartition',5)
```

Filter Type | Filter Length (FL) Calculation |
---|---|
Direct form | `FL = length(find(filt.Numerator ~= 0))` |
Direct form symmetric, direct form antisymmetric | `FL = ceil(length(find(filt.Numerator ~= 0))/2)` |

For supported multirate filters, you can specify the LUT partition as:

- A vector defining a partition for LUTs for the polyphase subfilters.
- A matrix of LUT partitions, where each row vector specifies a LUT partition for a corresponding polyphase subfilter. In this case, the `FL` is uniform for the subfilters. This approach provides fine control for partitioning each subfilter.

LUT Partition Specification | Filter Length (FL) Calculation |
---|---|
Vector whose elements sum to the overall filter length, `FL`. | `FL = size(polyphase(filt),2)` |
Matrix, where each row specifies the partitions for one subfilter. The vector elements in each row must sum to the associated subfilter length, `FLi`. | `p = polyphase(filt)`, where `i` is the index to the `i`th row of the polyphase matrix of the filter. The `i`th row of the matrix `p` represents the `i`th subfilter. |

For more information about distributed arithmetic, see Distributed Arithmetic for FIR Filters.

For examples, see Distributed Arithmetic for Single Rate Filters and Distributed Arithmetic for Multirate Filters.

You can specify different `DALUTPartition` values for each stage of a cascaded filter. See Distributed Arithmetic for Cascaded Filters.
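To see why partitioning shrinks the LUT, the following sketch compares table sizes under the common assumption that a DA lookup table addressed by `k` taps holds `2^k` entries. The entry-count model and the chosen partition are illustrative assumptions, not figures reported by the coder.

```matlab
FL = 12;                                 % filter length (illustrative)
partition = [6 6];                       % two LUT partitions

assert(sum(partition) == FL);            % elements must sum to the filter length
assert(all(partition <= 12));            % maximum size of an individual partition

unpartitionedEntries = 2^FL;             % one LUT addressed by all taps
partitionedEntries   = sum(2.^partition);% one smaller LUT per partition
fprintf('unpartitioned: %d entries, partitioned: %d entries\n', ...
    unpartitionedEntries,partitionedEntries);
```

Under this model, splitting 12 taps into two partitions of 6 reduces the table from 4096 entries to 128.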

`DARadix`: Number of bits processed simultaneously in distributed arithmetic (DA) implementation

`2^1` (default) | positive power of two

This property specifies a degree of parallelism in the DA architecture, which can improve clock speed at the expense of area. By default, the coder implements a fully serial DA architecture that processes one bit at a time (`DARadix` = `2^1`). The value of this property, `N`, must be:

- A nonzero positive integer that is a power of two.
- Such that `mod(W,log2(N)) = 0`, where `W` is the input word size of the filter.
- Less than or equal to `2^W`, where `W` is the input word size of the filter. This maximum specifies a fully parallel DA architecture.

Values of `N` between `2^1` and `2^W` specify partly serial DA. For more information on distributed arithmetic, see Distributed Arithmetic for FIR Filters.

When setting a `DARadix` value for symmetrical (`dfilt.dfsymfir`) and asymmetrical (`dfilt.dfasymfir`) filters, see Considerations for Symmetric and Asymmetric Filters.

You can specify different `DARadix` values for each stage of a cascaded filter. See Distributed Arithmetic for Cascaded Filters.
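The constraints on `N` can be enumerated for a given input word size. This sketch lists every power of two whose exponent divides `W` evenly. It treats `2^W` as the largest allowed value, which corresponds to the fully parallel case. The word size and variable names are illustrative.

```matlab
W = 16;                                  % input word size in bits (illustrative)
k = 1:W;                                 % candidate values of log2(N)
validRadix = 2.^k(mod(W,k) == 0);        % N = 2^k with mod(W,log2(N)) == 0
bitsPerStep = log2(validRadix);          % bits processed simultaneously
disp([validRadix; bitsPerStep])
```

For a 16-bit input, the admissible values are 2, 4, 16, 256, and 65536, which process 1, 2, 4, 8, and 16 bits at a time.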

`FoldingFactor`: Folding factor of a serial architecture for IIR SOS filter

filter length (default) | integer

Use this property to define a serial architecture for direct-form I or direct-form II SOS filters. Specify the number of clock cycles, `N`, taken for the computation of filter output. The generated HDL code shares multipliers to reduce area at the cost of latency. You can specify either `NumMultipliers` or `FoldingFactor`, but not both. If you do not specify either `NumMultipliers` or `FoldingFactor`, the coder generates HDL code for the filter with a fully parallel architecture. For a command-line example, see Serial Architecture for IIR Filter. For a UI example, see Specifying Serial Architectures for IIR SOS Filters.

`NumMultipliers`: Number of multipliers in a serial architecture for IIR SOS filter

integer greater than 1

Use this property to define a serial architecture for direct-form I or direct-form II SOS filters. You can specify either `NumMultipliers` or `FoldingFactor`, but not both. If you do not specify either `NumMultipliers` or `FoldingFactor`, the coder generates HDL code for the filter with a fully parallel architecture. For a command-line example, see Serial Architecture for IIR Filter. For a UI example, see Specifying Serial Architectures for IIR SOS Filters.
