Filter Design Toolbox 4.6

Working with Fixed-Point Direct-Form FIR Filters

This demonstration illustrates various aspects of working with FIR filters implemented with the direct-form structure using fixed-point arithmetic.

See also Getting Started with Fixed-Point Filters

Designing the Filter

The FIR filter to use is not critical. Since we will use the direct-form structure, it doesn't even need to have linear phase. For this demonstration we will use a simple least-squares design.

f=fdesign.lowpass('N,Fp,Fst',80,.11,.19); % Specifications

A filter object results from the design method. It associates coefficients with a particular filter structure, here a direct-form FIR structure.

h = design(f, 'firls', 'Wpass', 1, 'WStop', 100, ...
    'FilterStructure', 'dffir');
set(h,'Arithmetic','fixed');
h
h =

     FilterStructure: 'Direct-Form FIR'
          Arithmetic: 'fixed'
           Numerator: [1x81 double]
    PersistentMemory: false

     CoeffWordLength: 16
      CoeffAutoScale: true
              Signed: true

     InputWordLength: 16
     InputFracLength: 15

     FilterInternals: 'FullPrecision'

Comparing Quantized Coefficients to Non-Quantized Coefficients

There are several parameters for a fixed-point direct-form FIR filter. To start with, it is best to concentrate on the coefficient word length and fraction length (scaling). First we use the Filter Visualization Tool to compare the quantized coefficients to the non-quantized (reference) coefficients.

hfvt = fvtool(h, 'legend', 'on', 'Color', 'white');

Determining the Number of Bits being Used

To determine the number of bits being used in the fixed-point filter, we look at the CoeffWordLength property. To determine how the coefficients are being scaled, we look at the NumFracLength property.

get(h,'CoeffWordLength')
ans =

    16

get(h,'NumFracLength')
ans =

    17

This tells us that 16 bits are being used to represent the coefficients, and the least-significant bit (LSB) is weighted by 2^(-17). 16 bits is just the default word length for coefficients, but the 2^(-17) weight has been computed automatically to represent the coefficients with the best precision possible. This is controlled through the 'CoeffAutoScale' property, which can be set to false if manual control of the coefficient scaling is desired. We simply verify that auto scaling is enabled here:

get(h,'CoeffAutoScale') % Returns a logical true
ans =

     1
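
The automatic scaling can be illustrated outside the toolbox. The following is a minimal sketch in plain MATLAB, not the toolbox's actual algorithm. The coefficient vector b is hypothetical, and the rule used (give the largest coefficient magnitude just enough integer bits, then spend the remaining bits on the fraction) is an assumption. For a 16-bit word and a largest magnitude between 0.125 and 0.25, it yields a fraction length of 17.

```matlab
% Hypothetical coefficients (not the filter designed above)
b  = [0.012 -0.05 0.2 -0.05 0.012];
WL = 16;                                   % signed word length in bits

% Bits needed to the left of the binary point (negative when max|b| < 0.5)
intBits = floor(log2(max(abs(b)))) + 1;
FL = WL - 1 - intBits;                     % fraction length; LSB weight is 2^(-FL)

% Round to the nearest representable value and saturate to the signed range
q  = round(b * 2^FL);
q  = min(max(q, -2^(WL-1)), 2^(WL-1) - 1);
bq = q / 2^FL;                             % quantized coefficients as doubles

maxCoeffErr = max(abs(bq - b));            % at most half an LSB, 2^(-(FL+1))
```

With rounding to nearest, no coefficient moves by more than half an LSB, which is why a larger fraction length gives a more faithful representation.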

Determining the Proper Coefficient Word Length

We can make several copies of the filter to try different word lengths, allowing the coefficient auto scaling to determine the best precision in each case.

h1 = copy(h);
set(h1,'CoeffWordLength',12); % Use 12 bits
h2 = copy(h);
set(h2,'CoeffWordLength',24); % Use 24 bits
href = reffilter(h);
set(hfvt, 'Filters', [href, h1, h, h2]);
set(hfvt,'ShowReference','off'); % Reference already displayed once
legend(hfvt,'Reference filter','12 bits','16 bits','24 bits');

12 bits are clearly not enough to faithfully represent this filter. 16 bits may be enough for most applications, so we will continue to use 16 bits in this demonstration. As a rule-of-thumb, one should expect an attainable attenuation of about 5 dB per bit.
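
As a quick check of the rule of thumb, the arithmetic for the three word lengths tried above can be written out. This is only a rough estimate based on the 5 dB per bit figure, not a measurement of the filters in the plot.

```matlab
bits    = [12 16 24];      % coefficient word lengths compared above
attn_dB = 5 * bits;        % rule-of-thumb attainable attenuation: [60 80 120] dB
```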

Fixed-Point Filtering

Our main purpose is to evaluate the accuracy of the fixed-point filter when compared to a double-precision floating-point version. We will see that it is not sufficient to have a faithful representation of the coefficients that keeps the magnitude response approximately the same.

Generating Training Input Data

Since we just want to evaluate accuracy, we will use some random data to filter and compare against. We will create a quantizer, with a range of [-1,1) to generate random uniformly distributed white-noise data using 16 bits of wordlength.

rand('state',0); % Make results reproducible by initializing the random generator
q = quantizer([16,15],'RoundMode','round');
xq = randquant(q,1000,1); % 1000 Data points in the range [-1,1)
xin = fi(xq,true,16,15);
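
For readers without the fixed-point objects at hand, comparable test data can be sketched with base MATLAB only. This is an illustration under stated assumptions: it does not reproduce the numbers that randquant returns, and the variable names are hypothetical.

```matlab
rand('state',0);                              % same seeding call as used above
WL = 16; FL = 15;                             % signed 16-bit word, 15 fraction bits

x  = 2*rand(1000,1) - 1;                      % uniform white noise in (-1,1)
xi = round(x * 2^FL);                         % stored integers, round to nearest
xi = min(max(xi, -2^(WL-1)), 2^(WL-1) - 1);   % saturate to the signed 16-bit range
xq = xi / 2^FL;                               % quantized data in [-1,1)
```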

Generating a Baseline Output to Compare Against

When evaluating accuracy of fixed-point filtering, there are three quantities to consider:

1. The "ideal" output: this is what we would like to compute. It is computed using the reference coefficients and double-precision floating-point arithmetic.

2. The best we can hope for: this is the most accurate result we can hope to achieve. It is computed using the quantized coefficients and double-precision floating-point arithmetic.

3. What we can actually compute: this is the output computed using the quantized coefficients and fixed-point arithmetic.

Clearly, we want to compare what we can actually compute with the best we can hope for. The latter can be computed by casting the fixed-point filter to double and filtering with double-precision floating-point arithmetic.

xdouble = double(xin);
hdouble = double(h);
ydouble = filter(hdouble,xdouble);

For completeness, we show how to compute the "ideal" output and how much quantizing the coefficients alone affects the output of the filter.

yideal = filter(href,xdouble);
norm(yideal-ydouble)     % total error
ans =

  3.4886e-004

norm(yideal-ydouble,inf) % max deviation
ans =

  3.7219e-005
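
The effect of coefficient quantization alone can also be reproduced with base MATLAB. The sketch below uses hypothetical coefficients and an assumed fraction length of 17. It compares the "ideal" output with the best we can hope for, and checks the result against a simple worst-case bound.

```matlab
b  = [0.012 -0.05 0.2 -0.05 0.012];    % hypothetical reference coefficients
FL = 17;                                % assumed coefficient fraction length
bq = round(b * 2^FL) / 2^FL;            % quantized coefficients

x  = round((2*rand(1000,1) - 1) * 2^15) / 2^15;   % input on a 2^(-15) grid, |x| <= 1

yideal = filter(b,  1, x);              % reference coefficients, double arithmetic
ybest  = filter(bq, 1, x);              % quantized coefficients, double arithmetic

coeffErr = norm(yideal - ybest, inf);   % effect of coefficient quantization alone
bound    = sum(abs(b - bq)) * max(abs(x));   % worst-case bound on that effect
```

The bound grows with the number of taps and shrinks by half for every extra fraction bit in the coefficients.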

Computing the Fixed-Point Output

Next we will perform the actual fixed-point filtering. Once again, the best we can hope to achieve is to have an output identical to ydouble.

y = filter(h,xin);
norm(double(y)-ydouble)     % total error
ans =

     0

norm(double(y)-ydouble,inf) % max deviation
ans =

     0

The errors are exactly zero, showing that no quantization is being introduced in the accumulator. The products are set to full precision by default, so no errors occur there either. Finally, the output has the same format as the accumulator, which eliminates quantization error at the output completely.
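
Why the error is exactly zero, rather than merely small, can be seen by emulating full-precision fixed-point arithmetic with integers held in doubles. This is an illustrative sketch, not the toolbox's implementation; the stored integers below are hypothetical.

```matlab
% Hypothetical stored integers: coefficients scaled by 2^(-17), input by 2^(-15)
bInt = [1573 -6554 26214 -6554 1573];
xInt = round((2*rand(500,1) - 1) * (2^15 - 1));   % integers within the s16 range

% Integer multiply-accumulate: every product and partial sum is an exact integer
yInt = filter(bInt, 1, xInt);
yFix = yInt / 2^32;                     % product/accumulator fraction length 17 + 15 = 32

% Same quantized coefficients and input, evaluated as ordinary doubles
yDbl = filter(bInt / 2^17, 1, xInt / 2^15);

err = norm(yFix - yDbl);                % both paths are exact, so they agree
```

Every intermediate value fits comfortably within the 53-bit mantissa of a double, so neither path rounds and the two outputs are identical.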

The Advantages of Having Guard Bits

If we compare the product settings with the accumulator settings:

info(h)
Discrete-Time FIR Filter (real)
-------------------------------
Filter Structure  : Direct-Form FIR
Filter Length     : 81
Stable            : Yes
Linear Phase      : Yes (Type 1)
Arithmetic        : fixed
Numerator         : s16,17 -> [-2.500000e-001 2.500000e-001)
Input             : s16,15 -> [-1 1)
Filter Internals  : Full Precision
  Output          : s34,32 -> [-2 2)  (auto determined)
  Product         : s31,32 -> [-2.500000e-001 2.500000e-001)  (auto determined)
  Accumulator     : s34,32 -> [-2 2)  (auto determined)
  Round Mode      : No rounding
  Overflow Mode   : No overflow

We notice that the accumulator has 3 extra bits available. This is typical of most fixed-point DSP processors. These bits are usually referred to as guard bits; they provide a safety net for intermediate overflows. The easiest way to appreciate their value is to remove them and see what happens (we adjust the output setting accordingly):

set(h,'FilterInternals','SpecifyPrecision');
set(h,'AccumWordLength',get(h,'ProductWordLength'));
set(h,'OutputWordLength',get(h,'AccumWordLength'));
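
How many guard bits a given filter needs can be estimated from a worst-case bound on the output. The sketch below is an assumption-laden illustration (hypothetical coefficients, and not necessarily how the toolbox determines the accumulator format): it finds the number of extra integer bits required so that the largest possible sum still fits.

```matlab
bq        = [0.012 -0.05 0.2 -0.05 0.012];   % hypothetical quantized coefficients
xmax      = 1;                                % input magnitude bound for s16,15 data
prodRange = 0.25;                             % assumed product range [-0.25, 0.25)

worst = sum(abs(bq)) * xmax;                  % largest possible |output|

% Extra integer bits so that worst < prodRange * 2^guardBits
guardBits = max(0, floor(log2(worst / prodRange)) + 1);
```

For these coefficients a single guard bit is enough. In the filter above, the 3 extra bits widen the range from [-0.25, 0.25) to [-2, 2), a factor of 2^3.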

We now enable quantization reports. The logging capability is integrated into the 'filter' method and is triggered when the 'LoggingMode' FI preference is 'on'. The stored report corresponds to the last simulation and is overwritten each time the filter command is executed.

fipref('LoggingMode', 'on');
y = filter(h,xin);
R = qreport(h)
R =


                                 Fixed-Point Report
--------------------------------------------------------------------------------------
                    Min            Max     |          Range           | Number of Overflows
--------------------------------------------------------------------------------------
       Input:   -0.99954224    0.99902344  |     -1       0.99996948  |      0/1000 (0%)
      Output:   -0.24871957    0.24981417  |  -0.25             0.25  |      0/1000 (0%)
     Product:   -0.14461077    0.14477883  |  -0.25             0.25  |     0/81000 (0%)
 Accumulator:    -0.2499943    0.24997962  |  -0.25             0.25  |   902/80000 (1%)

The quantization report contains the minimum and maximum values that were recorded during the last simulation (values are logged before quantization), the range and the number of overflows of different internal signals. As expected, we can see that overflows are occurring in the accumulator.

norm(double(y)-ydouble)     % total error
ans =

    8.0623

norm(double(y)-ydouble,inf) % max deviation
ans =

    0.5000

plot([ydouble,double(y)])
xlabel('Samples'); ylabel('Amplitude')
legend('ydouble','y')
set(gcf, 'Color', [1 1 1])

The error is now large because overflow occurred, as can be seen in the plot.
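
The size of the maximum deviation can be understood from how overflow behaves. The sketch below assumes two's-complement wraparound in the accumulator (an assumption, since the overflow mode is not listed above) and uses hypothetical partial sums. Under that assumption each overflow shifts the value by a multiple of the full range, 0.5, which is consistent with the maximum deviation of 0.5000 reported above.

```matlab
R    = 0.25;                            % accumulator range is [-R, R)
wrap = @(v) mod(v + R, 2*R) - R;        % two's-complement style wraparound

v     = [0.10 0.30 -0.40];              % hypothetical exact partial sums
w     = wrap(v);                        % values after overflow
shift = v - w;                          % each entry is a multiple of 2*R = 0.5
```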

Avoiding Overflow with No Guard Bits

It is possible to avoid overflow even when guard bits are not available. From the plots of y and ydouble, it was clear that one bit for the integer part was all that was required in this specific case to avoid overflow. We can improve the results with this setting, but it is specific to the current filter coefficients and input signal.

set(h,'AccumFracLength',get(h,'AccumWordLength')-1);
set(h,'OutputFracLength',get(h,'AccumFracLength'));
y = filter(h,xin);
R = qreport(h)
R =


                                 Fixed-Point Report
--------------------------------------------------------------------------------------
                    Min            Max     |          Range           | Number of Overflows
--------------------------------------------------------------------------------------
       Input:   -0.99954224    0.99902344  |     -1       0.99996948  |      0/1000 (0%)
      Output:    -0.5227344    0.64321456  |     -1                1  |      0/1000 (0%)
     Product:   -0.14461077    0.14477883  |  -0.25             0.25  |     0/81000 (0%)
 Accumulator:    -0.5276654    0.66335143  |     -1                1  |     0/80000 (0%)

The quantization report lets us verify that the overflows are eliminated and that the signals occupy the full range, i.e., the scaling is optimal for this particular training data.
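
The trade-off made by changing the fraction length can be written out directly. The sketch below computes the range and resolution of a signed format from its word length and fraction length. The word length of 31 is taken from the product format s31,32 listed by info(h); the helper names are hypothetical.

```matlab
% Range and resolution of a signed format with word length WL and fraction length FL
lo  = @(WL,FL) -2^(WL-1-FL);                % most negative value
hi  = @(WL,FL)  2^(WL-1-FL) - 2^(-FL);      % most positive value
lsb = @(FL)     2^(-FL);                    % weight of the least-significant bit

rangeBefore = [lo(31,32) hi(31,32)];        % s31,32: about [-0.25, 0.25)
rangeAfter  = [lo(31,30) hi(31,30)];        % s31,30: about [-1, 1)
lsbRatio    = lsb(30) / lsb(32);            % resolution is 4 times coarser
```

Moving two bits from the fraction to the integer part widens the range by a factor of 4 at the cost of a 4 times coarser LSB, which is why a small but nonzero error appears once the products are accumulated in this format.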

norm(double(y)-ydouble)     % total error
ans =

  7.7178e-008

norm(double(y)-ydouble,inf) % max deviation
ans =

  9.0804e-009

The error seems small because there is no output quantization error in this case. If we use 16 bits for the output, the error is much larger.

set(h,'OutputWordLength',16);
set(h,'OutputFracLength',15);
y = filter(h,xin);
norm(double(y)-ydouble)     % total error
ans =

  2.7623e-004

norm(double(y)-ydouble,inf) % max deviation
ans =

  1.5251e-005
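
These figures can be compared against a simple model of output quantization. The sketch below assumes round-to-nearest and rounding errors that are independent and uniformly distributed over one LSB; both are modeling assumptions, not facts about the toolbox's rounding mode.

```matlab
FL  = 15;                               % output fraction length
N   = 1000;                             % number of output samples
lsb = 2^(-FL);                          % output resolution

maxErr  = lsb / 2;                      % worst-case rounding error per sample
normEst = sqrt(N) * lsb / sqrt(12);     % expected 2-norm of N uniform errors
```

Under these assumptions, maxErr is about 1.526e-005 and normEst is about 2.79e-004, close to the max deviation and total error measured above.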
