
The sections that follow describe the relationship between arithmetic operations and fixed-point scaling, and offer some basic recommendations that may be appropriate for your fixed-point design. For each arithmetic operation,

- The general [Slope Bias] encoding scheme described in Scaling is used.
- The scaling of the result is automatically selected based on the scaling of the two inputs. In other words, the scaling is *inherited*.
- Scaling choices are based on
  - Minimizing the number of arithmetic operations of the result
  - Maximizing the precision of the result

Additionally, binary-point-only scaling is presented as a special case of the general encoding scheme.

In embedded systems, the scaling of variables at the hardware interface (the ADC or DAC) is fixed. However, for most other variables, the scaling is something you can choose to give the best design. When scaling fixed-point variables, it is important to remember that

- Your scaling choices depend on the particular design you are simulating.
- There is no best scaling approach. All choices have associated advantages and disadvantages. It is the goal of this section to expose these advantages and disadvantages to you.

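Consider the addition of two real-world values:

$$V_a = V_b + V_c.$$

These values are represented by the general [Slope Bias] encoding scheme described in Scaling:

$$V_i = F_i 2^{E_i} Q_i + B_i,$$

where $F_i$ is the fractional slope, $E_i$ is the fixed exponent, $B_i$ is the bias, and $Q_i$ is the stored integer.

In a fixed-point system, the addition of values results in finding the variable $Q_a$:

$$Q_a = \frac{F_b}{F_a} 2^{E_b - E_a} Q_b + \frac{F_c}{F_a} 2^{E_c - E_a} Q_c + \frac{B_b + B_c - B_a}{F_a} 2^{-E_a}.$$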

This formula shows

- In general, $Q_a$ is not computed through a simple addition of $Q_b$ and $Q_c$.
- In general, there are two multiplications of a constant and a variable, two additions, and some additional bit shifting.
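As a rough illustration of that operation count, the sketch below evaluates the general addition in Python with exact rational arithmetic, assuming the encoding $V_i = F_i 2^{E_i} Q_i + B_i$. The helper names (`decode`, `add_slope_bias`) and the numeric scalings are hypothetical, and a real implementation would still have to round $Q_a$ to an integer of the chosen word size.

```python
from fractions import Fraction

def decode(q, F, E, B):
    """Real-world value V = F * 2**E * Q + B."""
    return Fraction(F) * Fraction(2) ** E * q + B

def add_slope_bias(qb, qc, a, b, c):
    """Ideal (unrounded) Q_a for V_a = V_b + V_c; a, b, c are (F, E, B) tuples."""
    (Fa, Ea, Ba), (Fb, Eb, Bb), (Fc, Ec, Bc) = a, b, c
    two = Fraction(2)
    # These constants depend only on the scalings, so they can be computed offline.
    kb = Fraction(Fb) / Fa * two ** (Eb - Ea)
    kc = Fraction(Fc) / Fa * two ** (Ec - Ea)
    k0 = Fraction(Bb + Bc - Ba) / Fa * two ** (-Ea)
    # Run time: two constant-times-variable multiplications and two additions.
    return kb * qb + kc * qc + k0

# Hypothetical scalings (F, E, B) for the output and the two inputs.
a = (Fraction(7, 4), -2, Fraction(1, 2))
b = (Fraction(3, 2), -4, 1)
c = (Fraction(5, 4), -3, -2)
qa = add_slope_bias(40, 24, a, b, c)
```

Decoding `qa` with the output scaling gives the same real-world value as adding the two decoded inputs, which is the property the formula is meant to preserve.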

In the process of finding the scaling of the sum, one reasonable goal is to simplify the calculations. Simplifying the calculations should reduce the number of operations, thereby increasing execution speed. The following choices can help to minimize the number of arithmetic operations:

- Set $B_a = B_b + B_c$. This eliminates one addition.
- Set $F_a = F_b$ or $F_a = F_c$. Either choice eliminates one of the two constant times variable multiplications.
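The effect of these choices can be sketched as follows, assuming $B_a = B_b + B_c$ and $F_a = F_b$, so that only the $Q_c$ term still needs a constant multiplication. The names and scalings are illustrative only, and the result is left unrounded.

```python
from fractions import Fraction

def decode(q, F, E, B):
    """Real-world value V = F * 2**E * Q + B."""
    return Fraction(F) * Fraction(2) ** E * q + B

b = (Fraction(3, 2), -4, 1)        # (F_b, E_b, B_b), hypothetical values
c = (Fraction(5, 4), -3, -2)       # (F_c, E_c, B_c), hypothetical values
a = (b[0], -5, b[2] + c[2])        # F_a = F_b, B_a = B_b + B_c, E_a chosen freely

# Offline constant for the remaining multiplication: (F_c / F_a) * 2**(E_c - E_a).
k = c[0] / a[0] * Fraction(2) ** (c[1] - a[1])

def add_simplified(qb, qc):
    """Q_a = 2**(E_b - E_a) * Q_b + k * Q_c: one shift, one multiply, one add."""
    return (qb << (b[1] - a[1])) + k * qc   # E_b - E_a = 1 here, so a left shift

qa = add_simplified(40, 24)
```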

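The resulting formula is

$$Q_a = 2^{E_b - E_a} Q_b + \frac{F_c}{F_a} 2^{E_c - E_a} Q_c$$

or

$$Q_a = \frac{F_b}{F_a} 2^{E_b - E_a} Q_b + 2^{E_c - E_a} Q_c.$$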

These equations appear to be equivalent. However, your choice of rounding and precision may make one choice stand out over the other. To further simplify matters, you could choose $E_a = E_c$ or $E_a = E_b$, which eliminates some bit shifting.

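In the process of finding the scaling of the sum, one reasonable goal is maximum precision. You can determine the maximum-precision scaling if the range of the variable is known. Maximize Precision shows that you can determine the range of a fixed-point operation from max($V_a$) and min($V_a$). For addition, the range follows from the ranges of the two inputs:

$$\max(V_a) = \max(V_b) + \max(V_c), \qquad \min(V_a) = \min(V_b) + \min(V_c).$$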

You can now derive the maximum-precision slope:

In most cases the input and output word sizes are much greater than one, and the slope becomes

which depends only on the size of the input and output words. The corresponding bias is

The value of the bias depends on whether the inputs and output are signed or unsigned numbers.

If the inputs and output are all unsigned, then the minimum values for these variables are all zero and the bias reduces to a particularly simple form:

If the inputs and the output are all signed, then the bias becomes
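The slope and bias results discussed here can be sketched as the derivation below. It is a reconstruction under stated assumptions rather than a quotation: $ws_i$ is notation assumed for the word size in bits of variable $i$, each input is taken to span its full representable range, and the range of the sum is $\max(V_a) = \max(V_b) + \max(V_c)$, $\min(V_a) = \min(V_b) + \min(V_c)$.

```latex
\begin{aligned}
F_a 2^{E_a} &= \frac{\max(V_a) - \min(V_a)}{2^{ws_a} - 1}
             = \frac{F_b 2^{E_b}\,(2^{ws_b} - 1) + F_c 2^{E_c}\,(2^{ws_c} - 1)}{2^{ws_a} - 1} \\
            &\approx F_b 2^{E_b + ws_b - ws_a} + F_c 2^{E_c + ws_c - ws_a}
             \quad \text{(word sizes much greater than one)}, \\
B_a &= \min(V_a) - F_a 2^{E_a} \min(Q_a), \\
B_a &= B_b + B_c
             \quad \text{(all unsigned, } \min(Q_i) = 0\text{)}, \\
B_a &\approx B_b + B_c
             \quad \text{(all signed, } \min(Q_i) = -2^{ws_i - 1}\text{, using the approximate slope)}.
\end{aligned}
```

In the signed case the terms $-F_b 2^{E_b + ws_b - 1}$ and $-F_c 2^{E_c + ws_c - 1}$ from $\min(V_a)$ cancel against $F_a 2^{E_a} \cdot 2^{ws_a - 1}$ once the approximate slope is substituted, which is why the bias again reduces to $B_b + B_c$.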

For binary-point-only scaling, finding $Q_a$ results in this simple expression:

$$Q_a = 2^{E_b - E_a} Q_b + 2^{E_c - E_a} Q_c.$$

This scaling choice results in only one addition and some bit shifting. The avoidance of any multiplications is a big advantage of binary-point-only scaling.
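A minimal integer-only sketch of binary-point-only addition, assuming $Q_a = 2^{E_b - E_a} Q_b + 2^{E_c - E_a} Q_c$, is shown below. The function names are hypothetical, right shifts floor the result (one possible rounding choice), and overflow handling is omitted.

```python
def shift(q, n):
    """Multiply q by 2**n with shifts; right shifts round toward negative infinity."""
    return q << n if n >= 0 else q >> -n

def add_binary_point(qb, Eb, qc, Ec, Ea):
    """Q_a = 2**(E_b - E_a) * Q_b + 2**(E_c - E_a) * Q_c, using integers only."""
    return shift(qb, Eb - Ea) + shift(qc, Ec - Ea)

# 40 * 2**-4 = 2.5 and 24 * 2**-3 = 3.0; the output uses E_a = -4.
qa = add_binary_point(40, -4, 24, -3, -4)
```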

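The accumulation of values is closely associated with addition:

$$V_{a\_new} = V_{a\_old} + V_b.$$

Finding $Q_{a\_new}$ involves one multiplication of a constant and a variable, two additions, and some bit shifting:

$$Q_{a\_new} = Q_{a\_old} + \frac{F_b}{F_a} 2^{E_b - E_a} Q_b + \frac{B_b}{F_a} 2^{-E_a}.$$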

The important difference for fixed-point implementations is that the scaling of the output is identical to the scaling of the first input.

For binary-point-only scaling, finding $Q_{a\_new}$ results in this simple expression:

$$Q_{a\_new} = Q_{a\_old} + 2^{E_b - E_a} Q_b.$$

This scaling option only involves one addition and some bit shifting.
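A short sketch of a binary-point-only accumulator, assuming $Q_{a\_new} = Q_{a\_old} + 2^{E_b - E_a} Q_b$, might look like the following. The function name and sample values are hypothetical, and saturation of the accumulator is not modeled.

```python
def accumulate(qa_old, qb, Eb, Ea):
    """Q_a_new = Q_a_old + 2**(E_b - E_a) * Q_b; the output keeps the scaling of Q_a."""
    n = Eb - Ea
    return qa_old + (qb << n if n >= 0 else qb >> -n)

acc = 0
for sample in (12, 20, 8):            # stored integers with E_b = -3 (1.5, 2.5, 1.0)
    acc = accumulate(acc, sample, Eb=-3, Ea=-5)
```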

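Consider the multiplication of two real-world values:

$$V_a = V_b \times V_c.$$

These values are represented by the general [Slope Bias] encoding scheme described in Scaling:

$$V_i = F_i 2^{E_i} Q_i + B_i.$$

In a fixed-point system, the multiplication of values results in finding the variable $Q_a$:

$$Q_a = \frac{F_b F_c}{F_a} 2^{E_b + E_c - E_a} Q_b Q_c + \frac{F_b B_c}{F_a} 2^{E_b - E_a} Q_b + \frac{F_c B_b}{F_a} 2^{E_c - E_a} Q_c + \frac{B_b B_c - B_a}{F_a} 2^{-E_a}.$$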

This formula shows

- In general, $Q_a$ is not computed through a simple multiplication of $Q_b$ and $Q_c$.
- In general, there is one multiplication of a constant and two variables, two multiplications of a constant and a variable, three additions, and some additional bit shifting.
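The operation count can be made concrete with a sketch of the general multiplication, obtained by expanding $(F_b 2^{E_b} Q_b + B_b)(F_c 2^{E_c} Q_c + B_c)$ and solving for $Q_a$. Exact rationals are used so that rounding does not obscure the structure; the names and scalings are hypothetical.

```python
from fractions import Fraction

def decode(q, F, E, B):
    """Real-world value V = F * 2**E * Q + B."""
    return Fraction(F) * Fraction(2) ** E * q + B

def mul_slope_bias(qb, qc, a, b, c):
    """Ideal (unrounded) Q_a for V_a = V_b * V_c; a, b, c are (F, E, B) tuples."""
    (Fa, Ea, Ba), (Fb, Eb, Bb), (Fc, Ec, Bc) = a, b, c
    two = Fraction(2)
    # Four constants that depend only on the scalings (computable offline).
    k_bc = Fraction(Fb) * Fc / Fa * two ** (Eb + Ec - Ea)
    k_b = Fraction(Fb) * Bc / Fa * two ** (Eb - Ea)
    k_c = Fraction(Fc) * Bb / Fa * two ** (Ec - Ea)
    k_0 = Fraction(Bb * Bc - Ba) / Fa * two ** (-Ea)
    # Run time: one constant * variable * variable, two constant * variable, three adds.
    return k_bc * qb * qc + k_b * qb + k_c * qc + k_0

a = (Fraction(7, 4), -2, Fraction(1, 2))
b = (Fraction(3, 2), -4, 1)
c = (Fraction(5, 4), -3, -2)
qa = mul_slope_bias(40, 24, a, b, c)
```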

The number of arithmetic operations can be reduced with these choices:

- Set $B_a = B_b B_c$. This eliminates one addition operation.
- Set $F_a = F_b F_c$. This simplifies the triple multiplication, certainly the most difficult part of the equation to implement.
- Set $E_a = E_b + E_c$. This eliminates some of the bit shifting.

The resulting formula is

$$Q_a = Q_b Q_c + \frac{B_c}{F_c} 2^{-E_c} Q_b + \frac{B_b}{F_b} 2^{-E_b} Q_c.$$

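You can determine the maximum-precision scaling if the range of the variable is known. Maximize Precision shows that you can determine the range of a fixed-point operation from max($V_a$) and min($V_a$). For multiplication, you can determine the range from

$$\max(V_a) = \max(V_{LL}, V_{LH}, V_{HL}, V_{HH}), \qquad \min(V_a) = \min(V_{LL}, V_{LH}, V_{HL}, V_{HH}),$$

where

$$V_{LL} = \min(V_b)\min(V_c), \quad V_{LH} = \min(V_b)\max(V_c), \quad V_{HL} = \max(V_b)\min(V_c), \quad V_{HH} = \max(V_b)\max(V_c).$$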
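The range of a product can be found by checking the four corner products of the input ranges, since the extremes of $V_b V_c$ over a rectangle occur at its corners. A small sketch follows; the function name is hypothetical.

```python
def product_range(b_min, b_max, c_min, c_max):
    """Return (min, max) of V_b * V_c given the ranges of V_b and V_c."""
    corners = (b_min * c_min, b_min * c_max, b_max * c_min, b_max * c_max)
    return min(corners), max(corners)

lo, hi = product_range(-2.0, 3.0, -5.0, 4.0)
```

With mixed-sign inputs like these, neither `min * min` nor `max * max` alone gives the extremes, which is why all four corners are compared.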

For binary-point-only scaling, finding $Q_a$ results in this simple expression:

$$Q_a = 2^{E_b + E_c - E_a} Q_b Q_c.$$
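For illustration, binary-point-only multiplication can be sketched with one integer multiply and one shift, assuming $Q_a = 2^{E_b + E_c - E_a} Q_b Q_c$. The function name is hypothetical, the right shift floors the result, and the full-width product is assumed to fit without overflow.

```python
def mul_binary_point(qb, Eb, qc, Ec, Ea):
    """Q_a = 2**(E_b + E_c - E_a) * Q_b * Q_c, using integers only."""
    n = Eb + Ec - Ea
    p = qb * qc                       # full-precision product
    return p << n if n >= 0 else p >> -n

# 40 * 2**-4 = 2.5 and 24 * 2**-3 = 3.0; the output uses E_a = -5.
qa = mul_binary_point(40, -4, 24, -3, -5)
```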

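Consider the multiplication of a constant and a variable

$$V_a = K \cdot V_b,$$

where *K* is a constant called the gain. Since $V_a$ results from the multiplication of a constant and a variable, finding $Q_a$ is a simplified version of the general fixed-point multiply formula:

$$Q_a = \left(\frac{K F_b 2^{E_b}}{F_a 2^{E_a}}\right) Q_b + \left(\frac{K B_b - B_a}{F_a 2^{E_a}}\right).$$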

Note that the terms in the parentheses can be calculated offline. Therefore, there is only one multiplication of a constant and a variable and one addition.
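The split between offline and run-time work can be sketched as below, assuming the gain formula $Q_a = \frac{K F_b 2^{E_b}}{F_a 2^{E_a}} Q_b + \frac{K B_b - B_a}{F_a 2^{E_a}}$. The gain, scalings, and names are hypothetical, and the constants are kept as exact rationals rather than being encoded in fixed point.

```python
from fractions import Fraction

two = Fraction(2)
K = Fraction(3, 4)                                   # hypothetical gain
Fa, Ea, Ba = Fraction(1), -6, Fraction(1, 2)         # output scaling
Fb, Eb, Bb = Fraction(3, 2), -4, Fraction(1)         # input scaling

# Offline: both parenthesized terms depend only on K and the scalings.
gain_const = K * Fb * two ** Eb / (Fa * two ** Ea)
offset_const = (K * Bb - Ba) / (Fa * two ** Ea)

def apply_gain(qb):
    """Run time: one constant-times-variable multiplication and one addition."""
    return gain_const * qb + offset_const

qa = apply_gain(40)
```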

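To implement the above equation without changing it to a more complicated form, the constants need to be encoded using a binary-point-only format. For each of these constants, the range is the trivial case of only one value. Despite the trivial range, the binary point formulas for maximum precision are still valid. The maximum-precision representations are the most useful choices unless there is an overriding need to avoid any shifting. The encoding of the constants is

$$\frac{K F_b 2^{E_b}}{F_a 2^{E_a}} = 2^{E_X} Q_X, \qquad \frac{K B_b - B_a}{F_a 2^{E_a}} = 2^{E_Y} Q_Y,$$

resulting in the formula

$$Q_a = 2^{E_X} Q_X Q_b + 2^{E_Y} Q_Y,$$

where $Q_X$, $E_X$ and $Q_Y$, $E_Y$ are the stored integers and exponents of the two encoded constants.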

The number of arithmetic operations can be reduced with these choices:

- Set $B_a = K B_b$. This eliminates one constant term.
- Set $F_a = K F_b$ and $E_a = E_b$. This sets the other constant term to unity.

The resulting formula is simply

$$Q_a = Q_b.$$

If the input and output have different numbers of bits, then the only operation involved is either handling potential overflows or performing sign extension.

The scaling for maximum precision does not need to be different from the scaling for speed unless the output has fewer bits than the input. If this is the case, then saturation should be avoided by dividing the slope by 2 for each lost bit. This prevents saturation but causes rounding to occur.

Division of values is an operation that should be avoided in fixed-point embedded systems, but it can occur in places. Therefore, consider the division of two real-world values:

$$V_a = V_b / V_c.$$

These values are represented by the general [Slope Bias] encoding scheme described in Scaling:

$$V_i = F_i 2^{E_i} Q_i + B_i.$$

In a fixed-point system, the division of values results in finding the variable $Q_a$:

$$Q_a = \frac{F_b 2^{E_b} Q_b + B_b}{\left(F_c 2^{E_c} Q_c + B_c\right) F_a 2^{E_a}} - \frac{B_a}{F_a} 2^{-E_a}.$$

This formula shows

- In general, $Q_a$ is not computed through a simple division of $Q_b$ by $Q_c$.
- In general, there are two multiplications of a constant and a variable, two additions, one division of a variable by a variable, one division of a constant by a variable, and some additional bit shifting.
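A sketch of the general division, assuming $Q_a = \frac{V_b / V_c - B_a}{F_a 2^{E_a}}$ with each $V_i = F_i 2^{E_i} Q_i + B_i$, is given below. The names and scalings are hypothetical, exact rationals stand in for a real rounding scheme, and a zero denominator is simply rejected.

```python
from fractions import Fraction

def decode(q, F, E, B):
    """Real-world value V = F * 2**E * Q + B."""
    return Fraction(F) * Fraction(2) ** E * q + B

def div_slope_bias(qb, qc, a, b, c):
    """Ideal (unrounded) Q_a for V_a = V_b / V_c; a, b, c are (F, E, B) tuples."""
    Fa, Ea, Ba = a
    num = decode(qb, *b)              # constant * variable + constant
    den = decode(qc, *c)              # constant * variable + constant
    if den == 0:
        raise ZeroDivisionError("V_c is zero for this stored integer")
    slope_a = Fraction(Fa) * Fraction(2) ** Ea
    return num / den / slope_a - Ba / slope_a

a = (Fraction(7, 4), -2, Fraction(1, 2))
b = (Fraction(3, 2), -4, 1)
c = (Fraction(5, 4), -3, -2)
qa = div_slope_bias(40, 24, a, b, c)
```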

The number of arithmetic operations can be reduced with these choices:

- Set $B_a = 0$. This eliminates one addition operation.
- If $B_c = 0$, then set the fractional slope $F_a = F_b / F_c$. This eliminates one constant times variable multiplication.

The resulting formula is

$$Q_a = \frac{Q_b}{Q_c} 2^{E_b - E_c - E_a} + \frac{B_b / F_b}{Q_c} 2^{-E_c - E_a}.$$

If $B_c \neq 0$, then no clear recommendation can be made.

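You can determine the maximum-precision scaling if the range of the variable is known. Maximize Precision shows that you can determine the range of a fixed-point operation from max($V_a$) and min($V_a$). For division, you can determine the range from

$$\max(V_a) = \max(V_{LL}, V_{LH}, V_{HL}, V_{HH}), \qquad \min(V_a) = \min(V_{LL}, V_{LH}, V_{HL}, V_{HH}),$$

where for nonzero denominators

$$V_{LL} = \frac{\min(V_b)}{\min(V_c)}, \quad V_{LH} = \frac{\min(V_b)}{\max(V_c)}, \quad V_{HL} = \frac{\max(V_b)}{\min(V_c)}, \quad V_{HH} = \frac{\max(V_b)}{\max(V_c)}.$$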

For binary-point-only scaling, finding $Q_a$ results in this simple expression:

$$Q_a = \frac{Q_b}{Q_c} 2^{E_b - E_c - E_a}.$$
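Binary-point-only division, assuming $Q_a = (Q_b / Q_c)\, 2^{E_b - E_c - E_a}$, can be sketched with one shift and one integer division. The function name is hypothetical, and applying the shift before the divide is a design choice made here to keep as many result bits as possible; floor division is only one possible rounding scheme.

```python
def div_binary_point(qb, Eb, qc, Ec, Ea):
    """Q_a = (Q_b / Q_c) * 2**(E_b - E_c - E_a), using integer floor division."""
    n = Eb - Ec - Ea
    if n >= 0:
        return (qb << n) // qc        # scale the numerator up before dividing
    return qb // (qc << -n)           # otherwise scale the denominator up

# 40 * 2**-4 = 2.5 divided by 20 * 2**-3 = 2.5; the output uses E_a = -6.
qa = div_binary_point(40, -4, 20, -3, -6)
```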

From the previous analysis of fixed-point variables scaled within the general [Slope Bias] encoding scheme, you can conclude

- Addition, subtraction, multiplication, and division can be very involved unless certain choices are made for the biases and slopes.
- Binary-point-only scaling guarantees simpler math, but generally sacrifices some precision.

Note that the previous formulas don't show the following:

- Constants and variables are represented with a finite number of bits.
- Variables are either signed or unsigned.
- Rounding and overflow handling schemes. You must make these decisions before an actual fixed-point realization is achieved.
