Filter Design Toolbox 4.6
Floating-Point to Fixed-Point Conversion of IIR Filters
We will demonstrate the main aspects of converting an IIR filter from a floating-point to a fixed-point implementation. Second-order sections (also referred as biquadratic) structures work better when using fixed-point arithmetic than do structures that implement the transfer function. We will show that the surest path to a successful "float-to-fixed" conversion consists of 2 steps:
- Select a Second-Order Structure
- Perform Dynamic Range Analysis for Each Node of the Filter
See also Floating-Point to Fixed-Point Conversion of FIR Filters
Contents
Design a Lowpass Chebyshev Type I Filter
We will use a minimum-order lowpass Chebyshev Type I design for the purpose of the demo. The design specifications of the filter are:
- Passband frequency edge: 0.4*pi
- Stopband frequency edge: 0.45*pi
- Passband ripple: 0.5 dB
- Stopband attenuation: 80 dB
f = fdesign.lowpass('Fp,Fst,Ap,Ast',0.4,0.45,0.5,80); Hdf1sos = design(f, 'cheby1', 'FilterStructure', 'df1sos');
Step 1: Select a Second-Order Structure
We will illustrate why converting to transfer function is a bad idea when fixed-point arithmetic will be used (and even when floating-point is to be used). First we create the filter.
[b,a] = tf(Hdf1sos); Hdf1 = dfilt.df1(b,a);
Now we set arithmetic to fixed. We use 16 bits to represent the coefficients and autoscale the fraction length. Note how the frequency response of the filter with quantized coefficients is nowhere near the reference double-precision filter's frequency response.
Hdf1.Arithmetic = 'fixed'; hfvt = fvtool(Hdf1,'Color', 'white'); legend(hfvt,'Direct-Form I')
So converting to transfer function is not advisable. Instead, if we quantize the second-order sections (SOS) we notice that the frequency response of the quantized and reference filters are indistinguishable unless we zoom in quite a bit.
Hdf1sos.Arithmetic = 'fixed'; set(hfvt, 'Filters', Hdf1sos); axis([0 1 -120 5]) legend(hfvt,'Direct-Form I SOS', 'Location','NorthEast')
Obtaining a good frequency response for the quantized filter as we have done here is a necessary step but, alone, not sufficient in order to filter data adequately. We will use some random data in the [-1, 1) range to filter and compare against.
rand('state',5); q = quantizer([10,9],'RoundMode','round'); xq = randquant(q,1000,1); x = fi(xq,true,10,9);
Furthermore assume that the hardware we target has an accumulator with no guard bits:
Hdf1sos.AccumWordLength = Hdf1sos.ProductWordLength;
Next, we turn on the logging of min/max and overflows and filter the test data we created. The filtering is done using the quantized coefficients, but the operations are performed with double-precision, floating-point arithmetic. The allowable ranges for the fixed-point settings are retained so that we can observe how many values are out of range.
fipref('LoggingMode', 'on', 'DataTypeOverride', 'ScaledDoubles'); y = filter(Hdf1sos,x); fipref('LoggingMode', 'off', 'DataTypeOverride', 'ForceOff'); R = qreport(Hdf1sos)
R =
Double-Precision Floating-Point
Report
-----------------------------------------------------------
----------------------------------
Min Max | Fixed-Point Range
| Out of Range
-----------------------------------------------------------
----------------------------------
Input: -1 0.99609375 | -1 0.99
996948 | 0/1000 (0%)
Output: -0.96015412 1.2707835 | -16 15.
999512 | 0/1000 (0%)
Num States: -298.93939 322.9527 | -1 0.99
996948 | 12374/20000 (62%)
Den States: -1923.3548 2077.8547 | -1 0.99
996948 | 8299/20000 (41%)
Num Prod: -597.87877 645.9054 | -8
8 | 14689/30000 (49%)
Den Prod: -2133.4028 1974.7726 | -4
4 | 13826/20000 (69%)
Num Acc: -805.97944 873.84231 | -32
32 | 18152/20000 (91%)
Den Acc: -2133.4028 2077.8547 | -16
16 | 20000/20000 (100%)
Notice the large quantity of overflow values in every stage of the filtering operation. A measure toward reducing the overflow problem is to scale the second-order section filters using one of the available scaling criteria. The filtering operation will be least prone to overflow if we chose the most stringent scaling, 'l1', however, this stringent normalization will also cause the worst signal-to-noise ratio (SNR). Linf-scaling is the most commonly used scaling in practice since it offers a good compromise between overflow prevention, and SNR. For this reason we will use it for the example at hand.
So let us set the second-order section scaling norm to 'Linf'. This can be done by either calling the scale method for Hdf1sos
scale(Hdf1sos,'Linf');
or by calling the design method with the appropriate parameters
Hdf1sos = design(f, 'cheby1', 'FilterStructure', 'df1sos','SOSScaleNorm','Li nf');
Now that we have set the second-order section scaling to 'Linf' we may repeat the overflow analysis:
Hdf1sos.Arithmetic = 'fixed'; Hdf1sos.AccumWordLength = Hdf1sos.ProductWordLength; fipref('LoggingMode', 'on', 'DataTypeOverride', 'ScaledDoubles'); y = filter(Hdf1sos,x); fipref('LoggingMode', 'off', 'DataTypeOverride', 'ForceOff'); R = qreport(Hdf1sos)
R =
Double-Precision Floating-Point
Report
-----------------------------------------------------------
----------------------------------
Min Max | Fixed-Point Range
| Out of Range
-----------------------------------------------------------
----------------------------------
Input: -1 0.99609375 | -1 0.99
996948 | 0/1000 (0%)
Output: -0.96051591 1.2712625 | -16 15.
999512 | 0/1000 (0%)
Num States: -1 0.99609375 | -1 0.99
996948 | 0/20000 (0%)
Den States: -0.96051591 1.2712625 | -1 0.99
996948 | 5/20000 (0%)
Num Prod: -0.47138325 0.61167269 | -2
2 | 0/30000 (0%)
Den Prod: -1.3631337 1.2159397 | -4
4 | 0/20000 (0%)
Num Acc: -0.90354824 1.1380514 | -8
8 | 0/20000 (0%)
Den Acc: -1.4771734 1.9656445 | -16
16 | 0/20000 (0%)
The report clearly indicates a considerable reduction in the number of overflow cases. Actually, out-of-range values only occur at the denominator states. Event though they have been considerably reduced by the second-order section scaling, these potential overflows could still create problems in the filtering operation. Note that although the use of 'Linf' norm did not completely avoid overflows (the reason being that the filter has nonlinear phase frequency components that can still add up to a point that causes overflow), it greatly reduced their occurrence. As mentioned earlier, to completely remove overflows, a more stringent 'l1' scaling is necessary - but such scaling may considerably reduce the SNR and is typically avoided. A better way to completely remove overflow cases is described in Step 2.
To summarize, Step 1 consists of selecting an appropriate second-order section filter structure, and scaling criteria.
Step 2: Perform Dynamic Range Analysis
The second step towards a fixed point conversion of IIR filters consists of applying dynamic range analysis to the filter to fine-tune the scaling for each node. The maxima and minima obtained from a floating-point simulation are used to set fraction lengths such that the simulation range is covered and the precision is maximized. Word lengths are not changed.
Hdf1sosf = autoscale(Hdf1sos,x);
We can verify that the settings are correct by running the filter in fixed-point:
fipref('LoggingMode', 'on', 'DataTypeOverride', 'ForceOff'); y = filter(Hdf1sosf,x); fipref('LoggingMode', 'off'); R = qreport(Hdf1sosf)
R =
Fixed-Point Report
-----------------------------------------------------------
----------------------------------
Min Max | Range
| Number of Overflows
-----------------------------------------------------------
----------------------------------
Input: -1 0.99609375 | -1 0.99
996948 | 0/1000 (0%)
Output: -0.96026611 1.2714844 | -2 1.
999939 | 0/1000 (0%)
Num States: -1 0.99609375 | -1 0.99
996948 | 0/20000 (0%)
Den States: -0.96026611 1.2714844 | -2 1.
999939 | 0/20000 (0%)
Num Prod: -0.47122891 0.61166532 | -2
2 | 0/30000 (0%)
Den Prod: -1.3627193 1.2161518 | -8
8 | 0/20000 (0%)
Num Acc: -0.90327533 1.1379978 | -2
2 | 0/20000 (0%)
Den Acc: -1.4768949 1.9657354 | -8
8 | 0/20000 (0%)
All overflows are now removed for this input signal. Furthermore, a close look at the fixed-point report suggests that the two most significant bits are not used for the denominator product and accumulator.
The magnitude response estimate, along with the power spectral density of filter output due to roundoff noise, further confirm that the fixed-point filter gives results very close to the double precision floating-point reference.
set(hfvt, 'Filters', Hdf1sosf,'Analysis', 'magestimate'); legend(hfvt, ' ') hfvt2 = fvtool(Hdf1sosf,'Analysis', 'noisepower', 'Color', 'white'); legend(hfvt2, ' ')
Summary
We have outlined a simple two-step procedure to convert a floating-point IIR filter to a fixed-point implementation. The filter objects of the Filter Design Toolbox™ are equipped with an 'autoscale' function that automatically and dynamically scale internal signals. In addition, functions such as 'qreport' and various analyses of 'fvtool' give tools to the user to perform verifications at each step of the process.
Store