Fixed-point numbers use integers and integer arithmetic to approximate real numbers. They are an efficient means for performing computations involving real numbers without requiring floating-point support in underlying system hardware.

Fixed-point numbers represent real numbers with integers using the following encoding scheme:

$$V\approx\stackrel{\sim}{V}=SQ+B$$

where

- *V* is a precise real-world value that you want to approximate with a fixed-point number.
- $\stackrel{\sim}{V}$ is the approximate real-world value that results from fixed-point representation.
- *Q* is an integer that encodes $\stackrel{\sim}{V}$. This value is the *quantized integer*. *Q* is the actual stored integer value used in representing the fixed-point number. If a fixed-point number changes, its quantized integer, *Q*, changes but *S* and *B* remain unchanged.
- *S* is a coefficient of *Q*, or the *slope*.
- *B* is an additive correction, or the *bias*.
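The roles of *S*, *B*, and *Q* can be sketched as a small Python type in which the slope and bias describe the type and only the stored integer differs between values. The class name `FixedPoint` and its fields are hypothetical, chosen for illustration only.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FixedPoint:
    """Hypothetical illustration type: approximate value is S*Q + B."""
    q: int        # quantized (stored) integer Q
    slope: float  # S, fixed for the type
    bias: float   # B, fixed for the type

    @property
    def approx(self) -> float:
        # Real-world value represented by the stored integer.
        return self.slope * self.q + self.bias


x = FixedPoint(q=10, slope=0.5, bias=0.1)
# A different number of the same type: only Q changes, S and B stay the same.
y = replace(x, q=11)
print(x.approx, y.approx)
```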

Fixed-point numbers encode real quantities (for example, 15.375)
using the stored integer *Q*. You set the value of *Q* by
solving the equation

$$\stackrel{\sim}{V}=SQ+B$$

for *Q* and rounding the result to an integer
value as follows:

*Q* = round((*V* – *B*)/*S*)
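The quantization formula and its inverse can be sketched in Python as follows. The function names are hypothetical, and the tie rule is an assumption: Python's built-in `round()` sends exact .5 ties to the even integer, while the text does not say how ties are handled.

```python
def quantize(v: float, slope: float, bias: float) -> int:
    """Stored integer Q = round((V - B) / S)."""
    # round() uses round-half-to-even for exact ties (assumed tie rule).
    return round((v - bias) / slope)


def dequantize(q: int, slope: float, bias: float) -> float:
    """Approximate real-world value S*Q + B."""
    return slope * q + bias


q = quantize(15.375, slope=0.5, bias=0.1)
print(q, dequantize(q, slope=0.5, bias=0.1))
```

For 15.375 with *S* = 0.5 and *B* = 0.1, `quantize` returns 31, because (15.375 – 0.1)/0.5 = 30.55.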

For example, suppose you want to represent the number 15.375
in a fixed-point type with the slope *S* = 0.5 and
the bias *B* = 0.1. This means that

*Q* = round((15.375 – 0.1)/0.5)
= round(30.55)
= 31

However, because *Q* is rounded to an integer,
you lose some precision in representing the number 15.375. If you
calculate the number that *Q* actually represents,
you get a slightly different answer:

$$V\approx\stackrel{\sim}{V}=SQ+B=0.5\times 31+0.1=15.6$$

Using fixed-point numbers to represent real numbers with integers
involves the loss of some precision. However, if you choose *S* and *B* correctly,
you can minimize this loss to acceptable levels.
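To see how the choice of *S* and *B* affects the loss, the following Python sketch quantizes 15.375 under the example's scaling and under an alternative scaling, *S* = 2^-3 = 0.125 with *B* = 0, which is assumed here for illustration. Because 15.375 = 123 × 0.125, the second scaling represents the value exactly. The helper name `quantization_error` is hypothetical.

```python
def quantization_error(v: float, slope: float, bias: float) -> tuple[int, float]:
    """Return the stored integer Q and the absolute error |S*Q + B - V|."""
    q = round((v - bias) / slope)
    return q, abs(slope * q + bias - v)


for slope, bias in [(0.5, 0.1), (0.125, 0.0)]:
    q, err = quantization_error(15.375, slope, bias)
    print(f"S = {slope}, B = {bias}: Q = {q}, error = {err:.3f}")
```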

Now that you can express fixed-point numbers as $\stackrel{\sim}{V}=SQ+B$, you can define operations between two fixed-point numbers.

The general equation for an operation between fixed-point operands is as follows:

`c = a <op> b`

where `a`, `b`, and `c` are all fixed-point numbers, and `<op>` refers to a binary operation: addition, subtraction, multiplication, or division.

The general form for a fixed-point number `x` is $S_xQ_x+B_x$ (see Fixed-Point Numbers).
Substituting this form for the result and operands in the preceding
equation yields this expression:

$$\left(S_cQ_c+B_c\right)=\left(S_aQ_a+B_a\right)\;\langle\mathrm{op}\rangle\;\left(S_bQ_b+B_b\right)$$

> (The values for $S_c$ and $B_c$ … when you use the `:=` assignment operator (that is, `c := a <op> b`). See Assignment (=, :=) Operations.)

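Using the values for $S_a$, $S_b$, $S_c$, $B_a$, $B_b$, and $B_c$, you can solve the preceding equation for $Q_c$ for each binary operation as follows: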

The operation `c = a + b` implies that

$$Q_c=\frac{S_a}{S_c}Q_a+\frac{S_b}{S_c}Q_b+\frac{B_a+B_b-B_c}{S_c}$$

The operation `c = a - b` implies that

$$Q_c=\frac{S_a}{S_c}Q_a-\frac{S_b}{S_c}Q_b+\frac{B_a-B_b-B_c}{S_c}$$

The operation `c = a * b` implies that

$$Q_c=\frac{S_aS_b}{S_c}Q_aQ_b+\frac{B_aS_b}{S_c}Q_b+\frac{B_bS_a}{S_c}Q_a+\frac{B_aB_b-B_c}{S_c}$$

The operation `c = a / b` implies that

$$Q_c=\frac{S_aQ_a+B_a}{S_c\left(S_bQ_b+B_b\right)}-\frac{B_c}{S_c}$$
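The four $Q_c$ solutions for addition, subtraction, multiplication, and division can be checked with a short Python sketch. It evaluates each formula directly from the stored integers and the operand and result scalings, then rounds to the nearest integer. The helper names (`Scaling`, `add_q`, and so on) and the example scalings are hypothetical, and floating-point ratios such as $S_a/S_c$ are used for readability; a real implementation would typically precompute these constants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scaling:
    """Slope S and bias B of one fixed-point type (hypothetical helper)."""
    slope: float
    bias: float = 0.0


def decode(q: int, t: Scaling) -> float:
    """Real-world value represented by stored integer q: S*Q + B."""
    return t.slope * q + t.bias


# Each function returns Qc for c = a <op> b, rounded with Python's round()
# (ties to even; the tie rule is an assumption, not from the text).

def add_q(qa: int, a: Scaling, qb: int, b: Scaling, c: Scaling) -> int:
    # Qc = (Sa/Sc)Qa + (Sb/Sc)Qb + (Ba + Bb - Bc)/Sc
    return round(a.slope / c.slope * qa + b.slope / c.slope * qb
                 + (a.bias + b.bias - c.bias) / c.slope)


def sub_q(qa: int, a: Scaling, qb: int, b: Scaling, c: Scaling) -> int:
    # Qc = (Sa/Sc)Qa - (Sb/Sc)Qb + (Ba - Bb - Bc)/Sc
    return round(a.slope / c.slope * qa - b.slope / c.slope * qb
                 + (a.bias - b.bias - c.bias) / c.slope)


def mul_q(qa: int, a: Scaling, qb: int, b: Scaling, c: Scaling) -> int:
    # Qc = (SaSb/Sc)QaQb + (BaSb/Sc)Qb + (BbSa/Sc)Qa + (BaBb - Bc)/Sc
    return round(a.slope * b.slope / c.slope * qa * qb
                 + a.bias * b.slope / c.slope * qb
                 + b.bias * a.slope / c.slope * qa
                 + (a.bias * b.bias - c.bias) / c.slope)


def div_q(qa: int, a: Scaling, qb: int, b: Scaling, c: Scaling) -> int:
    # Qc = (SaQa + Ba) / (Sc(SbQb + Bb)) - Bc/Sc
    return round((a.slope * qa + a.bias) / (c.slope * (b.slope * qb + b.bias))
                 - c.bias / c.slope)


a = Scaling(0.5, 0.1)    # assumed example types
b = Scaling(0.25, -0.2)
c = Scaling(0.125, 0.05)
qa, qb = 31, 10          # a is about 15.6, b is about 2.3

for name, fn in [("a + b", add_q), ("a - b", sub_q), ("a * b", mul_q), ("a / b", div_q)]:
    qc = fn(qa, a, qb, b, c)
    print(f"{name}: Qc = {qc}, value ~ {decode(qc, c)}")
```

Each result matches re-quantizing the real-valued result of `decode(qa, a) <op> decode(qb, b)` into the scaling of `c`, which is a convenient way to test such formulas.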

The fixed-point approximations of the real-number result of
the operation `c = a <op> b` are given by the
preceding solutions for the value $Q_c$.
In this way, all fixed-point operations are performed using only the
stored integers of the operands.
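In the common special case of zero bias and a shared power-of-two slope (an assumption for this sketch; the text does not require it), the $Q_c$ formulas for addition and multiplication reduce to pure integer operations: $Q_c = Q_a + Q_b$ and $Q_c = Q_aQ_b \cdot 2^{-f}$, where the scaling by $2^{-f}$ becomes a right shift. The constant `FRAC_BITS` and the helper names are hypothetical.

```python
FRAC_BITS = 4  # assumed scaling for a, b, and c: S = 2**-4, B = 0


def to_q(v: float) -> int:
    # Q = round(V / S) with S = 2**-FRAC_BITS and B = 0
    return round(v * (1 << FRAC_BITS))


def add_q(qa: int, qb: int) -> int:
    # Sa = Sb = Sc and all biases are zero, so Qc = Qa + Qb
    return qa + qb


def mul_q(qa: int, qb: int) -> int:
    # Sa*Sb/Sc = 2**-FRAC_BITS and all biases are zero, so Qc = Qa*Qb*2**-FRAC_BITS.
    # The right shift floors (rounds toward minus infinity) rather than rounding to nearest.
    return (qa * qb) >> FRAC_BITS


qa, qb = to_q(2.5), to_q(1.25)            # 40 and 20
print(add_q(qa, qb) / (1 << FRAC_BITS))   # 3.75
print(mul_q(qa, qb) / (1 << FRAC_BITS))   # 3.125
```

Choosing a shift instead of a division keeps the multiply path integer-only, at the cost of floor rounding for results that are not exactly representable.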
