
Review of Fixed-Point Numbers

Terminology of Fixed-Point Numbers

Filter Design Toolbox assumes fixed-point quantities are represented in two's complement format and are described using the WordLength and FracLength parameters. It is common to represent fractional quantities with a WordLength of 16, with the leftmost bit representing the sign and the remaining bits representing the fraction to the right of the binary point. Often the FracLength is thought of as the number of bits to the right of the binary point. However, this interpretation breaks down when the FracLength is larger than the WordLength or when the FracLength is negative, because the binary point then falls outside the stored bits.

To work around these cases, you can use the following interpretation of a fixed-point quantity:

The register has a WordLength of B, or in other words it has B bits. The bits are numbered from right to left from 0 to B-1. The most significant bit (MSB) is the leftmost bit, $b_{B-1}$. The least significant bit (LSB) is the rightmost bit, $b_0$. You can think of the FracLength as a quantity specifying how to interpret the stored bits and resolve the value they represent. The value represented by the bits is determined by assigning a weight to each bit: bit $b_k$ carries the weight $2^{k-L}$ for $k = 0, \dots, B-2$, and the MSB $b_{B-1}$ carries the negative weight $-2^{B-1-L}$ because the format is two's complement.

Here, L is the integer FracLength. It can assume any value, depending on the quantization step size. L is necessary to interpret the value that the bits represent. This value is given by the equation

$$\text{value} = -b_{B-1}\,2^{B-1-L} + \sum_{k=0}^{B-2} b_k\,2^{k-L}.$$

The value $2^{-L}$ is the smallest possible difference between two numbers represented in this format, otherwise known as the quantization step. For this reason, it is preferable to think of the FracLength as the negative of the exponent used to weight the rightmost, or least significant, bit of the fixed-point number.
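The bit-weight interpretation can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not the toolbox's API: the function name `fixed_point_value` and the choice of passing the bits as an MSB-first string are assumptions made for the example.

```python
from fractions import Fraction


def fixed_point_value(bits, frac_length):
    """Value of a two's complement bit pattern for a given FracLength L.

    `bits` is an MSB-first string such as "0110" (hypothetical input format).
    Bit k, counted from the right starting at 0, has weight 2**(k - L);
    the MSB has the negative weight -2**(B - 1 - L).
    """
    B = len(bits)
    b = [int(c) for c in reversed(bits)]  # b[0] is the LSB, b[B-1] the MSB
    value = -b[B - 1] * Fraction(2) ** (B - 1 - frac_length)
    for k in range(B - 1):
        value += b[k] * Fraction(2) ** (k - frac_length)
    return value


print(fixed_point_value("0110", 0))   # 6     (L = 0, plain integer)
print(fixed_point_value("0110", 2))   # 3/2   (step 2**-2)
print(fixed_point_value("0110", 6))   # 3/32  (L larger than the WordLength)
print(fixed_point_value("0110", -2))  # 24    (negative L, step 2**2)
```

The same four bits give different values for different L, including the cases L > B and L < 0 where the "bits to the right of the binary point" picture no longer applies.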

To reduce the number of bits used to represent a given quantity, you can discard the least significant bits. This method minimizes the quantization error since the bits you are removing carry the least weight. For instance, reducing the number of bits from 4 to 2 discards $b_1$ and $b_0$ and keeps $b_3$ and $b_2$, whose weights are unchanged.

This means that the FracLength has changed from L to L - 2.
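As a rough sketch of this bit reduction, the stored two's complement integer can be shifted right while the FracLength is decreased by the same amount. The helper names `drop_lsbs` and `real_value` are hypothetical, and the shift simply truncates (rounds toward negative infinity) rather than rounding to nearest.

```python
def drop_lsbs(stored, frac_length, n):
    """Discard the n least significant bits of a stored integer.

    Returns the new stored integer and the new FracLength, L - n.
    Python's >> on ints is an arithmetic shift, so negative values
    are floored, as in two's complement truncation.
    """
    return stored >> n, frac_length - n


def real_value(stored, frac_length):
    """Real-world value: stored integer times the step 2**-L."""
    return stored * 2.0 ** (-frac_length)


stored, L = 0b0111, 3                 # 4-bit word, value 7 * 2**-3
print(real_value(stored, L))          # 0.875
stored2, L2 = drop_lsbs(stored, L, 2)
print(stored2, L2)                    # 1 1  (2-bit word, FracLength L - 2)
print(real_value(stored2, L2))        # 0.5
```

The error introduced here, 0.375, is smaller than the new quantization step of 0.5, which is the most that truncating the low bits can cost.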

You can think of integers as being represented with a FracLength of L = 0, so that the quantization step becomes $2^{-L} = 2^{0} = 1$.

Suppose B = 16 and L = 0. Then the numbers that can be represented are the integers $\{-32768, -32767, \dots, -1, 0, 1, \dots, 32766, 32767\}$.
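The range and step for any WordLength and FracLength follow directly from the bit weights. The sketch below, with the hypothetical helper `representable_range`, computes the smallest value, the largest value, and the quantization step using exact fractions.

```python
from fractions import Fraction


def representable_range(word_length, frac_length):
    """Return (minimum, maximum, step) for a two's complement format.

    The stored integer spans -2**(B-1) .. 2**(B-1) - 1 and each
    stored unit is worth the quantization step 2**-L.
    """
    step = Fraction(2) ** (-frac_length)
    lowest = -(1 << (word_length - 1)) * step
    highest = ((1 << (word_length - 1)) - 1) * step
    return lowest, highest, step


lo, hi, step = representable_range(16, 0)
print(lo, hi, step)   # -32768 32767 1

lo, hi, step = representable_range(16, 15)
print(lo, hi, step)   # -1 32767/32768 1/32768
```

The second call corresponds to the common fractional format with a WordLength of 16, one sign bit, and 15 fraction bits.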

If you need to quantize these numbers to use only 8 bits to represent them, you will want to discard the LSBs as mentioned above, so that B = 8 and L = 0 - 8 = -8. The increment, or quantization step, then becomes $2^{-(-8)} = 2^{8} = 256$. So you will still have essentially the same range of values, but with less precision, and the numbers that can be represented become $\{-32768, -32512, \dots, -256, 0, 256, \dots, 32256, 32512\}$.

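With this quantization, the largest possible error becomes about 256/2 = 128 when rounding to the nearest representable value, with a special case for 32767, whose nearest multiple of 256 (32768) lies outside the representable range.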
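The error bound and the special case can be checked numerically. The sketch below makes two assumptions that the text does not state: ties are rounded toward positive infinity, and results that overflow the 8-bit stored range are saturated to the largest representable value. The function name `quantize_to_8_bits` is hypothetical.

```python
def quantize_to_8_bits(x):
    """Round a 16-bit integer to the nearest value with B = 8, L = -8.

    Assumes round-to-nearest with ties toward +infinity, and
    saturation when the rounded result does not fit in 8 bits.
    """
    step = 256                              # quantization step 2**8
    stored = (x + step // 2) >> 8           # nearest multiple of 256, as a stored integer
    stored = max(-128, min(127, stored))    # saturate to the 8-bit stored range
    return stored * step


print(quantize_to_8_bits(100))      # 0
print(quantize_to_8_bits(128))      # 256
print(quantize_to_8_bits(-32768))   # -32768
print(quantize_to_8_bits(32767))    # 32512

# Worst-case error over inputs up to 32640 is half a step.
print(max(abs(x - quantize_to_8_bits(x)) for x in range(-32768, 32641)))  # 128
print(32767 - quantize_to_8_bits(32767))                                  # 255
```

Under these assumptions, inputs above 32640 would round to 32768, which cannot be stored, so they saturate to 32512 and the error grows to 255 at 32767.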

  

