Optimization Properties

Optimize speed or area of generated HDL code

Optimization properties for filter generation.

These properties configure the HDL architecture of your filter to improve speed or reduce area. You can customize a serial filter architecture, select alternative multiplier implementations, and add pipeline registers. Specify these properties as Name,Value arguments to the generatehdl function, or set the corresponding options on the Filter Architecture tab of the Generate HDL dialog box.

Speed Optimization

When you set the AddPipelineRegisters property to 'on', the coder adds a pipeline register between stages of computation in a filter. For example, for a sixth-order IIR filter, the coder adds two pipeline registers: one between the first and second sections and one between the second and third sections. Although the registers add to the overall filter latency, they can significantly improve the clock rate. For FIR filters, pipeline registers optimize the final summation of the filter. The coder forces a tree structure for non-transposed FIR filters, so this setting overrides the setting of the FIRAdderStyle property.

Filter Type                               Location of Added Pipeline Register
FIR transposed                            Between coefficient multipliers and adders
FIR, asymmetric FIR, and symmetric FIR    Between levels of a tree-based final adder
IIR                                       Between sections

For details, see Optimizing Final Summation for FIR Filters.

Note

Pipeline registers in FIR, antisymmetric FIR, and symmetric FIR filters can produce numeric results that differ from the results produced by the original filter object. The difference occurs because adding pipeline registers forces the tree mode of final summation. In such cases, consider adjusting the generated test bench error margin with the ErrorMargin property.

You cannot use this property with a fully serial or cascade serial filter implementation.

By default, the coder generates linear adder summation logic. Set the FIRAdderStyle property to 'tree' to increase clock speed while using the same area. The tree architecture computes products in parallel, rather than sequentially, and it creates a final adder that performs pairwise addition on successive products.

Another option for FIR filter sum implementation is to set the AddPipelineRegisters property to 'on'. The pipelined implementation produces results similar to tree mode, with the addition of a stage of pipeline registers after processing each level of the tree.

Consider the following tradeoffs when selecting the final summation technique for your filter:

  • The number of adder operations is the same for linear and tree modes, but the timing for tree mode can be significantly better because the sums execute in parallel.

  • Pipeline mode optimizes the clock rate, but increases the filter latency by the base 2 logarithm of the number of products to be added, rounded up to the nearest integer.

  • Linear mode can help maintain numeric accuracy in comparison to the original filter function. Tree and pipeline modes can produce numeric results that differ from the results produced by the original filter function.
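The latency and adder-count tradeoffs listed above can be checked with a few lines of arithmetic. The following is a minimal sketch, not coder output: the product counts are hypothetical, and the variable names are chosen for illustration only. It assumes a linear adder chains every adder in series, while a tree adder needs one level per halving of the operand count.

```matlab
% Hypothetical numbers of products feeding the final adder.
numProducts = [2 5 8 27];

% Linear and tree modes both need one adder fewer than the number of products.
adderCount = numProducts - 1;

% Adders in series on the longest path (assumption: linear mode chains them all).
linearDepth = numProducts - 1;

% Tree levels, which is also the latency that pipeline mode adds:
% base 2 logarithm of the number of products, rounded up.
treeLevels = ceil(log2(numProducts));

disp([numProducts; adderCount; linearDepth; treeLevels])
```

For 27 products, for example, the tree has 5 levels instead of a chain of 26 adders, which is where the timing improvement comes from.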

See Optimizing Final Summation for FIR Filters.

You cannot use this property with a fully serial or cascade serial filter implementation.
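As a sketch of how the two final-summation options might be requested at the command line, the following reuses the low-pass FIR design that appears later on this page. The entity names passed through the Name property ('fir_tree' and 'fir_pipelined') are hypothetical and serve only to keep the two sets of generated files apart. Running it requires the filter design and HDL code generation products to be installed.

```matlab
% Design a small FIR filter (same specification as the DALUTPartition example).
fdes = fdesign.lowpass('N,Fc,Ap,Ast',4,0.4,0.05,0.03,'linear');
filt = design(fdes,'SystemObject',true);
inType = numerictype(1,16,15);   % 16-bit signed input with 15 fraction bits

% Tree-mode final adder without extra pipeline registers.
generatehdl(filt,'InputDataType',inType, ...
    'Name','fir_tree', ...            % hypothetical entity name
    'FIRAdderStyle','tree')

% Pipelined final adder. This setting forces the tree structure and
% overrides FIRAdderStyle.
generatehdl(filt,'InputDataType',inType, ...
    'Name','fir_pipelined', ...       % hypothetical entity name
    'AddPipelineRegisters','on')
```

Because both tree and pipeline modes can change the numeric results, compare the generated test bench results against the original filter before settling on either option.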

By default, the coder adds an extra input register to the generated HDL code for the filter. The code declares a signal named input_register and includes a PROCESS statement that controls the register. You can set other properties to control the names of the clock, clock enable, and reset signals, the polarity of the reset signal, and the coding style that checks for clock events. See Ports and Identifiers Properties.

Input_Register_Process : PROCESS (clk, reset)
BEGIN
  IF reset = '1' THEN
    input_register <= (OTHERS => '0');
  ELSIF clk'event AND clk = '1' THEN
    IF clk_enable = '1' THEN
      input_register <= input_typeconvert;
    END IF;
  END IF;
END PROCESS Input_Register_Process ;

When you set the AddInputRegister property to 'off', the coder omits the extra input register from the generated HDL code for the filter. Consider omitting the extra register if you are incorporating the filter into HDL code that has an existing register to drive the filter input. Also, omit the extra register if the latency it introduces to the filter is not tolerable.

By default, the coder adds an extra output register to the generated HDL code for the filter. The code declares a signal named output_register and includes a PROCESS statement that controls the register. You can set other properties to control the names of the clock, clock enable, and reset signals, the polarity of the reset signal, and the coding style that checks for clock events. See Ports and Identifiers Properties.

Output_Register_Process : PROCESS (clk, reset)
BEGIN
  IF reset = '1' THEN
    output_register <= (OTHERS => '0');
  ELSIF clk'event AND clk = '1' THEN
    IF clk_enable = '1' THEN
      output_register <= output_typeconvert;
    END IF;
  END IF;
END PROCESS Output_Register_Process ;

When you set the AddOutputRegister property to 'off', the coder omits the extra output register from the generated HDL code for the filter. Consider omitting the extra register if you are incorporating the filter into HDL code that has an existing output register. Also, omit the extra register if the latency it introduces to the filter is not tolerable.

For FIR filters, the coder generates this number of pipeline stages on each multiplier input. The number of pipeline stages must be an integer greater than or equal to zero. Multiplier pipelining can help you achieve significantly higher clock rates. The coder ignores this property if CoeffMultipliers is not set to 'multipliers'.

For FIR filters, the coder generates this number of pipeline stages on each multiplier output. The number of pipeline stages must be an integer greater than or equal to zero. Multiplier pipelining can help you achieve significantly higher clock rates. The coder ignores this property if CoeffMultipliers is not set to 'multipliers'.

Area Optimization

By default, the coder generates a fully parallel architecture with numerics that match the filter object exactly. However, the data types and quantization used in the software implementation are not necessarily optimal for HDL implementation. When you set the OptimizeForHDL property to 'on', the coder generates HDL code that reduces the area of the hardware implementation and optimizes data types and quantization effects. As a result of these optimizations, the coder can:

  • Implement an adder-tree structure

  • Make tradeoffs concerning data types

  • Avoid excessive quantization

  • Generate code that produces numeric results that differ from results produced by the original filter function

You can combine this option with the serial architecture and multiplier optimization properties.

By default, the coder retains multiplier logic in the generated HDL code. To reduce the area of the filter implementation, you can set the CoeffMultipliers property to implement multiplication in either canonical signed digit (CSD) or factored CSD logic. The CSD technique replaces multipliers with shift and add logic.

A CSD architecture minimizes the number of adders used for constant multiplication by representing binary numbers with a minimum count of nonzero digits. This optimization decreases the area used by the filter while maintaining or increasing clock speed.
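To make the "minimum count of nonzero digits" idea concrete, the following sketch converts one integer coefficient to CSD form and compares the nonzero digit counts. It is a textbook conversion written for illustration, not the encoding routine the coder uses, and the coefficient value 231 is hypothetical.

```matlab
% Convert a positive integer coefficient to canonical signed digit (CSD) form.
c = 231;            % hypothetical coefficient, binary 11100111
n = c;
csdDigits = [];     % digits in {-1,0,1}, least significant first

while n ~= 0
    if mod(n,2) == 1
        d = 2 - mod(n,4);   % +1 when n mod 4 is 1, -1 when n mod 4 is 3
        n = n - d;          % remainder is now even
    else
        d = 0;
    end
    csdDigits(end+1) = d;   %#ok<SAGROW>
    n = n/2;
end

binaryNonzeros = nnz(dec2bin(c) - '0');   % nonzero bits in plain binary
csdNonzeros = nnz(csdDigits);             % nonzero digits in CSD form

% Each nonzero digit is one shifted copy of the input, so a shift-and-add
% multiplier needs one adder or subtractor fewer than the nonzero count.
fprintf('binary: %d adders, CSD: %d adders\n', binaryNonzeros-1, csdNonzeros-1);
```

For this coefficient, the CSD form is 256 - 32 + 8 - 1, which needs three adders or subtractors instead of the five that the plain binary form needs.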

Factored CSD replaces multiplier operations with shift and add operations on prime factors of the coefficients. This option achieves a greater area reduction than CSD, at the cost of decreasing clock speed.

This option is not supported for multirate or serial architecture filters.

By default, the coder generates a fully parallel architecture, which is equivalent to setting the SerialPartition property to a vector of FL ones, where FL is the length of the filter.

To generate a fully serial architecture, set this property to the length of the filter, FL.

To generate a partly serial architecture, set this property to a vector of integers, [p1 p2 p3...pN]. This vector specifies the length of each of N partitions. The sum of the vector elements must be equal to the length of the filter, FL.

For a cascade of filters, set this property to {[p1 p2 ... pNa],[p1 p2 ... pNb],...}, where each vector in the cell array represents a serial partitioning of an individual filter within the cascade.

For further savings in area, you can optionally enable the ReuseAccum property to generate a cascade-serial architecture using the partitions you specified.
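The partitioning rules above might translate into calls like the following. This is a sketch under stated assumptions: it reuses the fourth-order low-pass design from the DALUTPartition example on this page and assumes the designed filter is a direct-form FIR with five nonzero taps, so FL = 5. The entity names passed through the Name property are hypothetical. Running it requires the filter design and HDL code generation products.

```matlab
fdes = fdesign.lowpass('N,Fc,Ap,Ast',4,0.4,0.05,0.03,'linear');
filt = design(fdes,'SystemObject',true);
inType = numerictype(1,16,15);

FL = 5;                 % assumed filter length (five nonzero taps)
partition = [3 2];      % two serial partitions
assert(sum(partition) == FL, 'Partition sizes must sum to the filter length.')

% Fully serial: one partition that covers every tap.
generatehdl(filt,'InputDataType',inType, ...
    'Name','fir_serial', ...          % hypothetical entity name
    'SerialPartition',FL)

% Partly serial: one partition of three taps and one of two taps.
generatehdl(filt,'InputDataType',inType, ...
    'Name','fir_partly_serial', ...   % hypothetical entity name
    'SerialPartition',partition)

% Cascade-serial: the same partitions with accumulator reuse.
generatehdl(filt,'InputDataType',inType, ...
    'Name','fir_cascade_serial', ...  % hypothetical entity name
    'SerialPartition',partition,'ReuseAccum','on')
```

Checking the partition sum before calling generatehdl catches the most common mistake, a vector that does not add up to FL, before code generation starts.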

For a complete description of parallel and serial architectures and a list of filter types supported for each architecture, see Speed vs. Area Tradeoffs. For an example, see Compare Serial Architectures for FIR Filter.

You can specify different SerialPartition values for each stage of a cascaded filter. See Serial Partitions for Cascaded Filter.

In a cascade-serial architecture, the coder groups filter taps into several serial partitions. The accumulated output of each partition is cascaded to the accumulator of the previous partition. The output of the partitions is therefore computed at the accumulator of the first partition. This technique, called accumulator reuse, saves chip area.

Set the ReuseAccum property to 'on' to enable accumulator reuse and generate a cascade-serial architecture. If the number and size of the serial partitions are not specified in the SerialPartition property, the coder generates an optimal partition.

For a complete description of parallel and serial architectures and a list of filter types supported for each architecture, see Speed vs. Area Tradeoffs. For an example, see Compare Serial Architectures for FIR Filter.

Distributed arithmetic uses a lookup table to store the sums of partial products. The size of the LUT grows exponentially with the order of the filter. You can divide the LUT into several partitions, where each LUT partition operates on a different set of filter taps. This division reduces the total size of the LUT logic.

To divide the LUT into N partitions, set this property to a vector of N integers that specify the size of each partition. The maximum size for an individual partition is 12. The sum of the vector elements must be equal to the filter length.
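The effect of partitioning on LUT size can be estimated with simple arithmetic. The sketch below assumes that a LUT addressed by p taps holds 2^p entries and ignores the word width of each entry. The filter length and the candidate partitions are hypothetical.

```matlab
FL = 16;                                    % hypothetical filter length
candidates = {[8 8], [4 4 4 4], [12 4]};    % candidate DALUTPartition vectors

unpartitionedEntries = 2^FL;                % a single LUT over all 16 taps
partitionedEntries = zeros(1,numel(candidates));

for k = 1:numel(candidates)
    p = candidates{k};
    % Rules from the text: sizes sum to FL, and no partition exceeds 12.
    assert(sum(p) == FL && all(p <= 12))
    partitionedEntries(k) = sum(2.^p);      % one LUT per partition
end

disp(unpartitionedEntries)
disp(partitionedEntries)
```

Under these assumptions, splitting 16 taps into [8 8] needs 512 entries instead of 65536. Smaller partitions shrink the tables further.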

To generate DA code for your filter design without LUT partitioning, set the DALUTPartition property to a scalar whose value is equal to the filter length.

fdes = fdesign.lowpass('N,Fc,Ap,Ast',4,0.4,0.05,0.03,'linear');
filt = design(fdes,'SystemObject',true);
generatehdl(filt,'InputDataType',numerictype(1,16,15),'DALUTPartition',5)

The filter length is calculated differently depending on the filter type.

Filter Type                                      Filter Length (FL) Calculation
Direct form                                      FL = length(find(filt.Numerator ~= 0))
Direct form symmetric, direct form asymmetric    FL = ceil(length(find(filt.Numerator ~= 0))/2)
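The two filter length formulas can be tried on a plain coefficient vector. In this sketch, the vector num is a hypothetical stand-in for filt.Numerator, and the partition vectors are examples that satisfy the sum rule.

```matlab
% Hypothetical symmetric numerator with some zero-valued coefficients.
num = [0.02 0 -0.11 0 0.30 0.5 0.30 0 -0.11 0 0.02];

nonzeroTaps = length(find(num ~= 0));

FLdirect = nonzeroTaps;                 % direct form
FLsymmetric = ceil(nonzeroTaps/2);      % direct form symmetric or asymmetric

% Example partitions whose elements sum to the matching filter length.
directPartition = [4 3];
symmetricPartition = [2 2];
assert(sum(directPartition) == FLdirect)
assert(sum(symmetricPartition) == FLsymmetric)

fprintf('FL (direct form) = %d, FL (symmetric) = %d\n', FLdirect, FLsymmetric);
```

Zero-valued coefficients do not count toward FL, so a partition sized from length(num) instead of the nonzero count would not sum to the filter length.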

For supported multirate filters, you can specify the LUT partition as:

  • A vector defining a partition for LUTs for the polyphase subfilters.

  • A matrix of LUT partitions, where each row vector specifies a LUT partition for a corresponding polyphase subfilter. In this case, the FL is uniform for the subfilters. This approach provides fine control for partitioning each subfilter.

LUT Partition Specification: Vector, whose elements sum to the overall filter length, FL.
Filter Length (FL) Calculation: FL = size(polyphase(filt),2)

LUT Partition Specification: Matrix, where each row specifies the partitions for one subfilter. The vector elements in each row must sum to the associated subfilter length, FLi.
Filter Length (FL) Calculation: p = polyphase(filt); FLi = length(find(p(i,:))), where i is the index to the ith row of the polyphase matrix of the filter. The ith row of the matrix p represents the ith subfilter.
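The multirate length calculations can be sketched without a filter object by building a polyphase matrix by hand. The reshape call below is only a stand-in for polyphase(filt): it assumes that row i of the polyphase matrix holds every third coefficient starting at coefficient i. That assumption about the row layout, the coefficient values, and the example partitions are all hypothetical.

```matlab
% Hypothetical numerator of a multirate filter with three polyphase subfilters.
num = [0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.5 0.2 0];
numPhases = 3;

% Stand-in for p = polyphase(filt): row i is num(i:numPhases:end).
p = reshape(num,numPhases,[]);

FL = size(p,2);                        % length used for a vector partition

FLi = zeros(1,numPhases);              % nonzero taps of each subfilter
for i = 1:numPhases
    FLi(i) = length(find(p(i,:)));
end

vectorPartition = [2 2];               % elements sum to FL
matrixPartition = [2 2; 2 2; 2 1];     % row i sums to FLi(i)
assert(sum(vectorPartition) == FL)
assert(isequal(sum(matrixPartition,2).', FLi))
```

Here the third subfilter ends in a zero coefficient, so its length is 3 while the other two subfilters have length 4.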

For more information about distributed arithmetic, see Distributed Arithmetic for FIR Filters.

For examples, see Distributed Arithmetic for Single Rate Filters and Distributed Arithmetic for Multirate Filters.

You can specify different DALUTPartition values for each stage of a cascaded filter. See Distributed Arithmetic for Cascaded Filters.

This property specifies a degree of parallelism in the DA architecture, which can improve clock speed at the expense of area. By default, the coder implements a fully serial DA architecture that processes one bit at a time (DARadix = 2^1). The value of this property, N, must be:

  • A nonzero positive integer that is a power of two.

  • Such that mod(W,log2(N)) = 0, where W is the input word size of the filter.

  • Less than 2^W, where W is the input word size of the filter. This maximum specifies a fully parallel DA architecture.

Values of N between 2^1 and 2^W specify partly serial DA. For more information on distributed arithmetic, see Distributed Arithmetic for FIR Filters.
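The constraints on N can be enumerated for a given input word size. This sketch assumes W = 16 and treats 2^W as an inclusive upper bound, which is one reading of the statement that this maximum specifies a fully parallel architecture. The variable names are illustrative.

```matlab
W = 16;                              % assumed input word size in bits

k = 1:W;                             % candidate values of log2(N)
validLog2 = k(mod(W,k) == 0);        % keep k where mod(W,log2(N)) is 0
validRadix = 2.^validLog2;           % candidate DARadix values

% Radix 2^1 processes one bit at a time, so log2(N) is taken here as the
% number of input bits handled per clock cycle (an inference, not a quote).
bitsPerCycle = validLog2;

disp([validRadix; bitsPerCycle])
```

For a 16-bit input, the candidates are 2, 4, 16, 256, and 65536. A value such as 8 fails the check because log2(8) = 3 does not divide 16.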

When setting a DARadix value for symmetrical (dfilt.dfsymfir) and asymmetrical (dfilt.dfasymfir) filters, see Considerations for Symmetric and Asymmetric Filters.

You can specify different DARadix values for each stage of a cascaded filter. See Distributed Arithmetic for Cascaded Filters.

Use the FoldingFactor property to define a serial architecture for direct-form I or direct-form II SOS filters. Specify the number of clock cycles, N, taken to compute the filter output. The generated HDL code shares multipliers to reduce area at the cost of latency. You can specify either NumMultipliers or FoldingFactor, but not both. If you do not specify either NumMultipliers or FoldingFactor, the coder generates HDL code for the filter with a fully parallel architecture. For a command-line example, see Serial Architecture for IIR Filter. For a UI example, see Specifying Serial Architectures for IIR SOS Filters.

Use the NumMultipliers property to define a serial architecture for direct-form I or direct-form II SOS filters. You can specify either NumMultipliers or FoldingFactor, but not both. If you do not specify either NumMultipliers or FoldingFactor, the coder generates HDL code for the filter with a fully parallel architecture. For a command-line example, see Serial Architecture for IIR Filter. For a UI example, see Specifying Serial Architectures for IIR SOS Filters.
