Filter Design Toolbox 4.6
Floating-Point to Fixed-Point Conversion of FIR Filters
We illustrate the main aspects of converting an FIR filter from a floating-point implementation to a fixed-point one. The conversion is a two-step process:
- Quantizing the Coefficients
- Performing Dynamic Range Analysis
See also Floating-Point to Fixed-Point Conversion of IIR Filters
Designing the Filter
We design an equiripple bandpass filter for this task. The passband covers the [.45 .55] range of normalized frequencies, and the acceptable passband ripple is 1 dB. The first stopband covers the [0 .35] range of normalized frequencies and the second covers the [.65 1] range. Both stopbands must provide 60 dB of attenuation.
f = fdesign.bandpass(.35,.45,.55,.65,60,1,60);
Hd = design(f, 'equiripple');
Step 1: Quantizing the Coefficients
First, we verify that 12 bits are sufficient to represent the coefficients:
Hd.Arithmetic = 'fixed';
set(Hd, 'CoeffWordLength', 12);
hfvt = fvtool(Hd, 'Color', 'white');
legend(hfvt,' ', 'Location', 'Best');
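The effect of a 12-bit coefficient word length can be sketched in base MATLAB, without the toolbox. The coefficient vector b below is a hypothetical stand-in, not the equiripple design above. Round-to-nearest and a fraction length derived from the largest coefficient are assumptions about the quantization rule, not a description of what the filter object does internally.

```matlab
% Hypothetical symmetric FIR coefficients (illustrative only).
b = [0.0123 -0.0871 0.2514 0.6468 0.2514 -0.0871 0.0123];

wl = 12;                                   % coefficient word length in bits (signed)
intBits = floor(log2(max(abs(b)))) + 1;    % bits needed left of the binary point
fl = wl - 1 - intBits;                     % remaining bits go to the fraction

bq = round(b * 2^fl) / 2^fl;               % round to the nearest multiple of 2^-fl
maxErr = max(abs(b - bq));                 % worst-case coefficient error

fprintf('fraction length = %d, max error = %.3g (bound %.3g)\n', ...
        fl, maxErr, 2^-(fl+1));
```

With round-to-nearest, each coefficient moves by at most half a least significant bit, which is the bound printed at the end.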
Unlike IIR filters, FIR filters are good candidates for full-precision fixed-point implementation. This makes the process of converting a floating-point implementation to a fixed-point implementation easier for FIRs than for IIRs.
Notice, however, that the input range is of crucial importance for the static range analysis on which the full-precision implementation is based. Therefore, if we do not want to resort to dynamic range scaling, we must verify that the data we send to the filter is within the input range defined by the InputWordLength and InputFracLength properties. Let's assume the input lies in the [-2,2) range. This translates to an InputFracLength of 14 when the InputWordLength is set to 16 bits (the default).
Hd.InputFracLength = 14;
rand('state',5);
q = quantizer([Hd.InputWordLength,Hd.InputFracLength],'RoundMode','round');
xq = randquant(q,1000,1);
x = fi(xq,true,Hd.InputWordLength,Hd.InputFracLength);
yfullprec = filter(Hd,x);
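The link between the input range, the word length, and the fraction length can be checked with a few lines of base MATLAB. This is a minimal sketch for signed words and a range of the form [-bound, bound). The helper name fracLengthFor is hypothetical and not a toolbox function.

```matlab
% Fraction length of a signed word that covers the range [-bound, bound).
% fracLengthFor is a hypothetical helper, not a toolbox function.
fracLengthFor = @(wl, bound) wl - 1 - ceil(log2(bound));

fl16 = fracLengthFor(16, 2);   % 16-bit word, inputs in [-2, 2)
fl12 = fracLengthFor(12, 2);   % 12-bit word, same range

% Representable range of the 16-bit format:
% [-2^(wl-1-fl), 2^(wl-1-fl) - 2^-fl]
lo = -2^(16 - 1 - fl16);
hi =  2^(16 - 1 - fl16) - 2^(-fl16);

fprintf('16-bit: fl = %d, range [%g, %.10g]\n', fl16, lo, hi);
fprintf('12-bit: fl = %d\n', fl12);
```

One bit is reserved for the sign and one for the integer part, which leaves 14 fraction bits in a 16-bit word.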
We can verify that this full-precision output is really the best we can hope to achieve by comparing it to a reference computed with the quantized coefficients and double-precision floating-point arithmetic.
hdouble = double(Hd);
yref = filter(hdouble,x);
norm(double(yfullprec)-yref) % total error
ans =
0
The error is exactly zero, showing that no quantization error is introduced in the accumulator. The products are set to full precision by default, so we know that no errors occur there. Finally, the output has the same specifications as the accumulator, which eliminates quantization error at the output completely.
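One way to see why the error can be exactly zero: with full precision, every product and partial sum is an integer multiple of a fixed power of two and fits well within the 53-bit significand of a double. The sketch below uses hypothetical stored integers and assumed fraction lengths (11 for the coefficients, 14 for the input). It imitates the comparison above in base MATLAB and does not reproduce the filter object's internals.

```matlab
% Hypothetical stored integers: 12-bit coefficients and 16-bit inputs.
bInt = [25 -178 515 1325 515 -178 25];       % coefficients, fraction length 11
xInt = mod((0:49) * 7919, 2^16) - 2^15;      % deterministic inputs, fraction length 14

% Full-precision fixed-point filtering is exact integer arithmetic.
yInt = filter(bInt, 1, xInt);                % result is scaled by 2^(11+14)

% Reference: the same quantized values filtered as real-valued doubles.
yRef = filter(bInt / 2^11, 1, xInt / 2^14);

totalError = norm(yInt / 2^25 - yRef);       % scaling by powers of two is exact
disp(totalError)
```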
Specifying Data Width Constraints
Performing full-precision fixed-point arithmetic is a convenient starting point, but it may not lead to word lengths that are actually available on a given piece of hardware. In our example, 31 bits are necessary to represent the full-precision output:
yfullprec.wordLength
ans =
31
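Full-precision word lengths grow because the accumulator must hold the largest possible sum of products. The sketch below computes a conservative bound from hypothetical 12-bit coefficient integers. Because the coefficients are not those of the designed filter, the result is not expected to equal 31, and the rule the toolbox applies internally is not stated here and may differ.

```matlab
% Hypothetical 12-bit coefficient integers (not the designed filter).
bInt = [25 -178 515 1325 515 -178 25];
inputWL = 16;                                  % signed 16-bit input

% Largest possible magnitude of the integer-valued output: every input
% at full scale (-2^15) with the sign that maximizes the sum.
maxMag = sum(abs(bInt)) * 2^(inputWL - 1);

% Smallest signed word length that holds any value in [-maxMag, maxMag].
outWL = floor(log2(maxMag)) + 2;

fprintf('worst-case magnitude = %d, output word length = %d bits\n', ...
        maxMag, outWL);
```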
We must therefore take data width constraints into account. We simulate a piece of hardware with 16-bit data buses, a 24-bit multiplier, and an accumulator with 4 guard bits. We also assume that the input data comes from a 12-bit ADC.
set(Hd, 'InputWordLength', 12, ...
    'FilterInternals', 'SpecifyPrecision', ...
    'ProductWordLength', 24, 'AccumWordLength', 28, ...
    'OutputWordLength', 16)
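Once the word lengths are constrained, intermediate values no longer fit exactly and must be rounded, and possibly saturated. The sketch below shows one way to model this in base MATLAB. The helper toFixed is hypothetical, the fraction length of 24 is chosen for illustration, and round-to-nearest with saturation is an assumption. The filter object's own rounding and overflow settings may differ.

```matlab
% Quantize v to a signed fixed-point format with word length wl and
% fraction length fl: round to nearest, then saturate at the format limits.
% toFixed is a hypothetical helper, not a toolbox function.
toFixed = @(v, wl, fl) ...
    min(max(round(v * 2^fl), -2^(wl-1)), 2^(wl-1) - 1) / 2^fl;

p = 0.123456789012345;           % an illustrative product value
p24 = toFixed(p, 24, 24);        % 24-bit product register, range [-0.5, 0.5)
roundErr = abs(p - p24);         % at most half an LSB, that is 2^-25

big = toFixed(0.75, 24, 24);     % outside the range, so it saturates

fprintf('rounding error = %.3g, saturated value = %.8f\n', roundErr, big);
```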
Step 2: Performing Dynamic Range Analysis
The second step of a "float-to-fixed" conversion applies dynamic range analysis to the filter to fine-tune the scaling of each node. The maxima and minima obtained from a floating-point simulation are used to set the fraction lengths so that the simulation range is covered and the precision is maximized. There are no constraints on the range of the input stimulus. We could, for example, use uniformly distributed white noise in the [-2,2) range. Alternatively, we can generate the stimulus that exercises the largest dynamic range in the filter. Scaling based on this stimulus is more conservative because it ensures that no overflow occurs for any input signal of the same amplitude. This "worst-case" input signal is a scaled version of the sign of the flipped impulse response.
x = 1.9*sign(fliplr(impz(Hd)));
Hd = autoscale(Hd,x);
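The reason this stimulus is the worst case can be checked numerically: when each input sample has the sign of the coefficient it meets, every product is positive and the output reaches the amplitude times the sum of the absolute coefficient values. The sketch below uses a hypothetical impulse response h and base MATLAB's filter function rather than the filter object.

```matlab
% Hypothetical impulse response (row vector), illustrative only.
h = [0.0123 -0.0871 0.2514 0.6468 0.2514 -0.0871 0.0123];
A = 1.9;                               % input amplitude, inside [-2, 2)

x = A * sign(fliplr(h));               % worst-case stimulus
y = filter(h, 1, x);                   % filter response

peak  = max(abs(y));                   % largest output magnitude observed
bound = A * sum(abs(h));               % theoretical maximum for |x| <= A

fprintf('peak = %.6f, bound = %.6f\n', peak, bound);
```

The peak occurs at the last output sample, where the flipped stimulus lines up with the full impulse response.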
We can verify that the filter is properly scaled by running the filter in fixed-point:
fipref('LoggingMode', 'on', 'DataTypeOverride', 'ForceOff');
y = filter(Hd,x);
fipref('LoggingMode', 'off');
R = qreport(Hd)
R =
Fixed-Point Report
---------------------------------------------------------------------------------------------
                       Min            Max  |     Range                |  Number of Overflows
---------------------------------------------------------------------------------------------
       Input:   -1.9003906      1.9003906  |     -2        1.9990234  |       0/48 (0%)
      Output:   -3.2658691      3.3671875  |     -4        3.9998779  |       0/48 (0%)
     Product:  -0.23522902     0.23522902  |   -0.5       0.49999994  |     0/2304 (0%)
 Accumulator:   -3.2658324      3.3672082  |     -8        7.9999999  |     0/2256 (0%)
We verify that there is no overflow, i.e., all the signals are within the available dynamic range. The magnitude response estimate shows that the fixed-point implementation stays within the spectral mask, so the filter has been properly scaled.
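The ranges in the report follow directly from each node's word length and fraction length. The sketch below recomputes them in base MATLAB and checks the logged minima and maxima against them. The word lengths come from the hardware constraints above. The fraction lengths (10, 13, 24, 24) are inferred from the printed ranges rather than read from the filter object, so treat them as assumptions.

```matlab
% Columns: word length, fraction length, logged min, logged max.
% Fraction lengths are inferred from the report ranges (an assumption).
nodes = [12 10 -1.9003906  1.9003906;    % input
         16 13 -3.2658691  3.3671875;    % output
         24 24 -0.23522902 0.23522902;   % product
         28 24 -3.2658324  3.3672082];   % accumulator

wl = nodes(:,1);
fl = nodes(:,2);
lo = -2.^(wl - 1 - fl);                  % most negative representable value
hi =  2.^(wl - 1 - fl) - 2.^(-fl);       % most positive representable value

% True where the logged values stay inside the representable range.
inRange = nodes(:,3) >= lo & nodes(:,4) <= hi;
disp([lo hi inRange])
```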
set(hfvt,'Filters',Hd,'Analysis', 'magestimate', 'Color', 'white');
legend(hfvt,' ', 'Location', 'NorthEast');
Summary
We presented a simple two-step procedure for converting a floating-point FIR filter to a fixed-point implementation. The FIR filter objects of the Filter Design Toolbox™ have a full-precision fixed-point mode that provides a convenient starting point when only the range of the input data is known. In addition, functions such as 'autoscale' (for dynamic range scaling of the internal signals) and 'qreport' (for verification) automate the scaling for any set of data width constraints.