Filter Design Toolbox 4.6
Working with Fixed-Point Direct-Form FIR Filters
This demonstration illustrates various aspects of working with FIR filters implemented with the direct-form structure using fixed-point arithmetic.
See also Getting Started with Fixed-Point Filters
Contents
- Designing the Filter
- Comparing Quantized Coefficients to Non-Quantized Coefficients
- Determining the Number of Bits being Used
- Determining the Proper Coefficient Word Length
- Fixed-Point Filtering
- Generating Training Input Data
- Generating a Baseline Output to Compare Against
- Computing the Fixed-Point Output
- The Advantages of Having Guard Bits
- Avoiding Overflow with No Guard Bits
Designing the Filter
The FIR filter to use is not critical. Since we will use the direct-form structure, it doesn't even need to have linear phase. For this demonstration we will use a simple least-squares design.
f=fdesign.lowpass('N,Fp,Fst',80,.11,.19); % Specifications
A filter object results from the design method. It associates coefficients with a particular filter structure, here a direct-form FIR structure.
h = design(f, 'firls', 'Wpass', 1, 'WStop', 100, ...
    'FilterStructure', 'dffir');
set(h,'Arithmetic','fixed');
h
h =
FilterStructure: 'Direct-Form FIR'
Arithmetic: 'fixed'
Numerator: [1x81 double]
PersistentMemory: false
CoeffWordLength: 16
CoeffAutoScale: true
Signed: true
InputWordLength: 16
InputFracLength: 15
FilterInternals: 'FullPrecision'
Comparing Quantized Coefficients to Non-Quantized Coefficients
There are several parameters for a fixed-point direct-form FIR filter. To start with, it is best to concentrate on the coefficient wordlength and fractionlength (scaling). First we use the Filter Visualization Tool to compare the quantized coefficients to the nonquantized (reference) coefficients.
hfvt = fvtool(h, 'legend', 'on', 'Color', 'white');
Determining the Number of Bits being Used
To determine the number of bits being used for the coefficients of the fixed-point filter, we simply look at the CoeffWordLength property. To determine how they are being scaled, we can look at the NumFracLength property.
get(h,'CoeffWordLength')
ans =
16
get(h,'NumFracLength')
ans =
17
This tells us that 16 bits are being used to represent the coefficients, and the least-significant bit (LSB) is weighted by 2^(-17). 16 bits is just the default number used for coefficients, but the 2^(-17) weight has been computed automatically to represent the coefficients with the best precision possible. This is controlled through the 'CoeffAutoScale' property. This property can be set to false if manual control of the coefficient scaling is desired. We simply verify that auto scaling is enabled here:
get(h,'CoeffAutoScale') % Returns a logical true
ans =
1
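The automatic choice of a 2^(-17) weight can be reasoned about with a short sketch. The Python below is an illustration only and is not the toolbox's algorithm: it assumes that auto scaling picks the largest fraction length for which the biggest coefficient magnitude still fits in a signed word. The function name best_fraction_length and the sample coefficients are hypothetical.

```python
def best_fraction_length(values, word_length):
    """Largest fraction length whose signed range still holds every value.

    Assumption: values are rounded to the nearest code, and the positive
    limit 2**(word_length - 1) - 1 is the binding one.
    """
    peak = max(abs(v) for v in values)
    if peak == 0:
        raise ValueError("all-zero coefficients have no preferred scaling")
    max_code = 2 ** (word_length - 1) - 1
    frac = 0
    # Back off if the peak does not fit even with this fraction length.
    while round(peak * 2.0 ** frac) > max_code:
        frac -= 1
    # Otherwise add fractional bits for as long as the peak still fits.
    while round(peak * 2.0 ** (frac + 1)) <= max_code:
        frac += 1
    return frac


# Hypothetical coefficients whose largest magnitude lies in [0.125, 0.25).
coeffs = [0.2, -0.05, 0.01]
print(best_fraction_length(coeffs, 16))  # 17, i.e. a range of [-0.25, 0.25)
```

Under this assumption, any coefficient set whose peak magnitude lies between 0.125 and 0.25 gets 17 fractional bits in a 16-bit word, which matches the value reported for this filter.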
Determining the Proper Coefficient Word Length
We can make several copies of the filter to try different word lengths, allowing the coefficient auto scaling to determine the best precision in each case.
h1 = copy(h); set(h1,'CoeffWordLength',12); % Use 12 bits
h2 = copy(h); set(h2,'CoeffWordLength',24); % Use 24 bits
href = reffilter(h);
set(hfvt, 'Filters', [href, h1, h, h2]);
set(hfvt,'ShowReference','off'); % Reference already displayed once
legend(hfvt,'Reference filter','12 bits','16 bits','24 bits');
12 bits are clearly not enough to faithfully represent this filter. 16 bits may be enough for most applications, so we will continue to use 16 bits in this demonstration. As a rule-of-thumb, one should expect an attainable attenuation of about 5 dB per bit.
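The rule of thumb translates directly into numbers for the three word lengths tried here. The Python helper below is a hypothetical illustration of that arithmetic only. It says nothing about the attenuation a particular design will actually reach.

```python
def attainable_attenuation_db(coeff_word_length, db_per_bit=5.0):
    """Rule-of-thumb stopband attenuation for a coefficient word length."""
    return db_per_bit * coeff_word_length


for bits in (12, 16, 24):
    print(bits, "bits ->", attainable_attenuation_db(bits), "dB")
```

This gives roughly 60 dB for 12 bits, 80 dB for 16 bits and 120 dB for 24 bits.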
Fixed-Point Filtering
Our main purpose is to evaluate the accuracy of the fixed-point filter when compared to a double-precision floating-point version. We will see that it is not sufficient to have a faithful representation of the coefficients that keeps the magnitude response approximately the same.
Generating Training Input Data
Since we just want to evaluate accuracy, we will use some random data to filter and compare against. We will create a quantizer with a range of [-1,1) to generate uniformly distributed white-noise data using a 16-bit word length.
rand('state',0); % Make results reproducible by initializing the random generator
q = quantizer([16,15],'RoundMode','round');
xq = randquant(q,1000,1); % 1000 data points in the range [-1,1)
xin = fi(xq,true,16,15);
Generating a Baseline Output to Compare Against
When evaluating accuracy of fixed-point filtering, there are three quantities to consider:
1. The "ideal" output: what we would like to compute. It is computed using the reference coefficients and double-precision floating-point arithmetic.
2. The best we can hope for: the most accurate result achievable once the coefficients are quantized. It is computed using the quantized coefficients and double-precision floating-point arithmetic.
3. What we can actually compute: the output computed using the quantized coefficients and fixed-point arithmetic.
Clearly we want to compare what we can actually compute to the best we can hope for. This last quantity can be computed by casting the fixed-point filter to double and filtering with double-precision floating-point arithmetic.
xdouble = double(xin);
hdouble = double(h);
ydouble = filter(hdouble,xdouble);
For completeness, we show how to compute the "ideal" output and how much quantizing the coefficients alone affects the output of the filter.
yideal = filter(href,xdouble);
norm(yideal-ydouble) % total error
ans = 3.4886e-004
norm(yideal-ydouble,inf) % max deviation
ans = 3.7219e-005
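The gap between the "ideal" output and the best we can hope for comes only from rounding the coefficients. The Python sketch below reproduces that comparison on a small hypothetical filter (the coefficients b_ref and the helpers quantize and fir are made up for illustration and are not the 81-tap design used here). It also checks a simple worst-case bound of half a coefficient LSB per tap, which holds when the input magnitude does not exceed 1.

```python
import random


def quantize(values, frac_length):
    """Round each value to the nearest multiple of 2**-frac_length."""
    scale = 2 ** frac_length
    return [round(v * scale) / scale for v in values]


def fir(b, x):
    """Direct-form FIR filter with zero initial state, in double precision."""
    return [sum(b[k] * x[n - k] for k in range(min(n + 1, len(b))))
            for n in range(len(x))]


random.seed(0)
b_ref = [0.05, 0.12, 0.2, 0.12, 0.05]   # hypothetical reference coefficients
b_q = quantize(b_ref, 17)               # LSB of 2**-17, as reported above
x = quantize([random.uniform(-1, 1) for _ in range(1000)], 15)

y_ideal = fir(b_ref, x)    # reference coefficients, double arithmetic
y_double = fir(b_q, x)     # quantized coefficients, double arithmetic

max_dev = max(abs(a - c) for a, c in zip(y_ideal, y_double))
bound = len(b_ref) * 2 ** -18   # half an LSB per coefficient, |x| <= 1
print(max_dev, max_dev <= bound)
```

The bound grows with the number of taps, so a longer filter such as the one in this demonstration can show a larger deviation for the same coefficient word length.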
Computing the Fixed-Point Output
Next we will perform the actual fixed-point filtering. Once again, the best we can hope to achieve is to have an output identical to ydouble.
y = filter(h,xin);
norm(double(y)-ydouble) % total error
ans =
0
norm(double(y)-ydouble,inf) % max deviation
ans =
0
The errors are exactly zero, showing that no quantization is being introduced in the accumulator. The products are set by default to full precision, so we know that no errors are occurring there. Finally, the output has the same specifications as the accumulator, which eliminates quantization error at the output completely.
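A zero error is what one would expect whenever every product and partial sum is exactly representable in both number systems. The Python sketch below illustrates this for a single output sample. It assumes s16,17 coefficients and an s16,15 input as in this demonstration; the coefficient and sample values themselves are hypothetical.

```python
def to_code(value, frac_length):
    """Integer code of a value on a 2**-frac_length grid (round to nearest)."""
    return round(value * 2 ** frac_length)


coeffs = [0.05, 0.12, 0.2, 0.12, 0.05]     # hypothetical coefficients
samples = [0.5, -0.25, 0.999, -1.0, 0.3]   # hypothetical input samples

b_codes = [to_code(b, 17) for b in coeffs]     # s16,17 coefficients
x_codes = [to_code(x, 15) for x in samples]    # s16,15 input

# Fixed-point path: integer products (LSB 2**-32) summed without rounding.
acc_code = sum(b * x for b, x in zip(b_codes, x_codes))
y_fixed = acc_code / 2 ** 32

# Double-precision path on the same quantized values.
y_double = sum((b / 2 ** 17) * (x / 2 ** 15)
               for b, x in zip(b_codes, x_codes))

print(y_fixed == y_double)
```

Each product needs at most 31 significant bits and the running sum only a few more, well within the 53-bit significand of a double, so the two paths agree bit for bit.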
The Advantages of Having Guard Bits
If we compare the product settings with the accumulator settings:
info(h)
Discrete-Time FIR Filter (real)
-------------------------------
Filter Structure  : Direct-Form FIR
Filter Length     : 81
Stable            : Yes
Linear Phase      : Yes (Type 1)
Arithmetic        : fixed
Numerator         : s16,17 -> [-2.500000e-001 2.500000e-001)
Input             : s16,15 -> [-1 1)
Filter Internals  : Full Precision
  Output          : s34,32 -> [-2 2)  (auto determined)
  Product         : s31,32 -> [-2.500000e-001 2.500000e-001)  (auto determined)
  Accumulator     : s34,32 -> [-2 2)  (auto determined)
  Round Mode      : No rounding
  Overflow Mode   : No overflow
We notice that the accumulator has 3 extra bits available. This is typical of most fixed-point DSP processors. These bits are usually referred to as guard bits. They provide a safety net for intermediate overflows. The easiest way of appreciating their value is to remove them and see what happens (we adjust the output setting accordingly):
set(h,'FilterInternals','SpecifyPrecision');
set(h,'AccumWordLength',get(h,'ProductWordLength'));
set(h,'OutputWordLength',get(h,'AccumWordLength'));
We now enable quantization reports. The logging capability is integrated into the 'filter' method. It is triggered when the 'LoggingMode' FI preference is 'on'. The stored report corresponds to the last simulation. It is overwritten each time the filter command is executed.
fipref('LoggingMode', 'on');
y = filter(h,xin);
R = qreport(h)
R =
Fixed-Point Report
---------------------------------------------------------------------------------------------
                     Min           Max    |       Range              |  Number of Overflows
---------------------------------------------------------------------------------------------
       Input:  -0.99954224    0.99902344  |     -1      0.99996948  |       0/1000 (0%)
      Output:  -0.24871957    0.24981417  |  -0.25            0.25  |       0/1000 (0%)
     Product:  -0.14461077    0.14477883  |  -0.25            0.25  |      0/81000 (0%)
 Accumulator:   -0.2499943    0.24997962  |  -0.25            0.25  |    902/80000 (1%)
The quantization report contains the minimum and maximum values that were recorded during the last simulation (values are logged before quantization), the range and the number of overflows of different internal signals. As expected, we can see that overflows are occurring in the accumulator.
norm(double(y)-ydouble) % total error
ans =
8.0623
norm(double(y)-ydouble,inf) % max deviation
ans =
0.5000
plot([ydouble,double(y)])
xlabel('Samples'); ylabel('Amplitude')
legend('ydouble','y')
set(gcf, 'Color', [1 1 1])
The error is now large because overflow occurred, as can be seen in the plot.
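The maximum deviation of exactly 0.5 is what one would expect if the accumulator wraps around on overflow: a wrapped sum is off by a whole multiple of the accumulator span, which is 0.5 for an s31,32 word. The Python sketch below assumes two's-complement wrap-around. The helper wrap and the two operand values are hypothetical.

```python
def wrap(code, word_length):
    """Two's-complement wrap of an integer code into a signed word."""
    span = 2 ** word_length
    half = span // 2
    return (code + half) % span - half


WORD, FRAC = 31, 32            # accumulator without guard bits: [-0.25, 0.25)

a = round(0.2 * 2 ** FRAC)     # hypothetical partial sum
b = round(0.1 * 2 ** FRAC)     # hypothetical next product

exact = (a + b) / 2 ** FRAC             # about 0.3, outside the range
wrapped = wrap(a + b, WORD) / 2 ** FRAC

print(wrapped - exact)         # -0.5: one full span of the accumulator range
```

Because wrap-around arithmetic is modular, an intermediate overflow corrupts the output only if the final sum itself falls outside the representable range.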
Avoiding Overflow with No Guard Bits
It is possible to not have overflow even if guard bits are not available. From the plots of y and ydouble, it was clear that one bit for the integer part was all that was required in this specific case to avoid overflow. We can improve the results slightly with this setting, but this is specific to the current filter coefficients and input signal.
set(h,'AccumFracLength',get(h,'AccumWordLength')-1);
set(h,'OutputFracLength',get(h,'AccumFracLength'));
y = filter(h,xin);
R = qreport(h)
R =
Fixed-Point Report
---------------------------------------------------------------------------------------------
                     Min           Max    |       Range              |  Number of Overflows
---------------------------------------------------------------------------------------------
       Input:  -0.99954224    0.99902344  |     -1      0.99996948  |       0/1000 (0%)
      Output:   -0.5227344    0.64321456  |     -1               1  |       0/1000 (0%)
     Product:  -0.14461077    0.14477883  |  -0.25            0.25  |      0/81000 (0%)
 Accumulator:   -0.5276654    0.66335143  |     -1               1  |      0/80000 (0%)
The quantization report lets us verify that the overflows are eliminated and that the signals occupy the full range; that is, the scaling is optimal for this particular training data.
norm(double(y)-ydouble) % total error
ans = 7.7178e-008
norm(double(y)-ydouble,inf) % max deviation
ans = 9.0804e-009
The error seems small because there is no output quantization error in this case. If we use 16 bits for the output, the error is much larger.
set(h,'OutputWordLength',16);
set(h,'OutputFracLength',15);
y = filter(h,xin);
norm(double(y)-ydouble) % total error
ans = 2.7623e-004
norm(double(y)-ydouble,inf) % max deviation
ans = 1.5251e-005
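The maximum deviation of 1.5251e-005 sits just below 2^(-16), which is about 1.5259e-005, or half of the 2^(-15) LSB of a 16-bit output with 15 fractional bits. That is the bound one would expect if the output is rounded to the nearest representable value. The Python sketch below checks this bound under that rounding assumption; the helper quantize and the sample values are hypothetical.

```python
def quantize(value, frac_length):
    """Round a value to the nearest multiple of 2**-frac_length."""
    scale = 2 ** frac_length
    return round(value * scale) / scale


half_lsb = 2 ** -16    # half of the 2**-15 output LSB, about 1.5259e-05

# Hypothetical high-precision output samples inside [-1, 1).
samples = [0.64321456, -0.5227344, 0.123456789, -0.000012345]
worst = max(abs(quantize(v, 15) - v) for v in samples)

print(worst <= half_lsb)
```

With a rounding mode that truncates instead, the bound would be a full LSB, so the observed error depends on the filter's rounding setting as well as on the output word length.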