Fixed-point numbers and their data types are characterized by their word size in bits, binary point, and whether they are signed or unsigned. The Fixed-Point Designer™ software supports integers and fixed-point numbers. The main difference among these data types is their binary point.
A common representation of a binary fixed-point number, either signed or unsigned, is shown in the following figure.
Computer hardware typically represents the negation of a binary fixed-point number in one of three ways: sign/magnitude, one's complement, or two's complement. Two's complement is the preferred representation of signed fixed-point numbers and is supported by the Fixed-Point Designer software.
Negation using two's complement consists of a bit inversion (translation into one's complement) followed by the addition of a one. For example, the two's complement of 000101 is 111011.
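The invert-and-add-one rule can be sketched in a few lines of plain Python. This is an illustration only, not Fixed-Point Designer code; the function name `twos_complement_negate` is a hypothetical helper, and the word is assumed to be given as a bit string with the most significant bit first.

```python
def twos_complement_negate(bits: str) -> str:
    """Negate a fixed-width two's complement word given as a bit string."""
    width = len(bits)
    mask = (1 << width) - 1                # all ones across the word
    inverted = int(bits, 2) ^ mask         # bit inversion (one's complement)
    negated = (inverted + 1) & mask        # add one, drop any carry out of the word
    return format(negated, f"0{width}b")

print(twos_complement_negate("000101"))    # 111011
```

Applying the function twice returns the original word, which is a quick way to check the rule.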
Whether a fixed-point value is signed or unsigned is usually not encoded explicitly within the binary word; that is, there is no sign bit. Instead, the sign information is implicitly defined within the computer architecture.
The binary point is the means by which fixed-point numbers are scaled. It is usually the software that determines the binary point. When performing basic math functions such as addition or subtraction, the hardware uses the same logic circuits regardless of the value of the scale factor. In essence, the logic circuits have no knowledge of a scale factor. They are performing signed or unsigned fixed-point binary algebra as if the binary point is to the right of $${b}_{0}$$.
Fixed-Point Designer supports the general binary point scaling $$V=Q*2^E$$. V is the real-world value, Q is the stored integer value, and E is equal to -FractionLength. In other words, RealWorldValue = StoredInteger * 2 ^ -FractionLength.
FractionLength defines the scaling of the stored integer value. The word length limits the values that the stored integer can take, but it does not limit the values FractionLength can take. The software does not restrict the value of exponent E based on the word length of the stored integer Q. Because E is equal to -FractionLength, restricting the binary point to being contiguous with the fraction is unnecessary; the fraction length can be negative or greater than the word length.
For example, a word consisting of three unsigned bits is usually represented in scientific notation in one of the following ways.
$$\begin{array}{l}bbb.=bbb.\times {2}^{0}\\ bb.b=bbb.\times {2}^{-1}\\ b.bb=bbb.\times {2}^{-2}\\ .bbb=bbb.\times {2}^{-3}\end{array}$$
If the exponent were greater than 0 or less than -3, then the representation would involve lots of zeros.
$$\begin{array}{c}bbb00000.=bbb.\times {2}^{5}\\ bbb00.=bbb.\times {2}^{2}\\ .00bbb=bbb.\times {2}^{-5}\\ .00000bbb=bbb.\times {2}^{-8}\end{array}$$
These extra zeros never change to ones, however, so they don't show up in the hardware. Furthermore, unlike floating-point exponents, a fixed-point exponent never shows up in the hardware, so fixed-point exponents are not limited by a finite number of bits.
Consider a signed value with a word length of 8, a fraction length of 10, and a stored integer value of 5 (binary value 00000101). The real-world value is calculated using the formula RealWorldValue = StoredInteger * 2 ^ -FractionLength. In this case, RealWorldValue = 5 * 2 ^ -10 = 0.0048828125. Because the fraction length is 2 bits longer than the word length, the binary value of the stored integer is x.xx00000101, where x is a placeholder for implicit zeros. 0.0000000101 (binary) is equivalent to 0.0048828125 (decimal). For an example using a fi object, see Create a fi Object With Fraction Length Greater Than Word Length.
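The calculation above can be reproduced with a minimal plain-Python sketch. It is not the fi object workflow referenced in the text; `real_world_value` is a hypothetical helper that applies RealWorldValue = StoredInteger * 2 ^ -FractionLength directly.

```python
def real_world_value(stored_integer: int, fraction_length: int) -> float:
    """Apply RealWorldValue = StoredInteger * 2 ^ -FractionLength."""
    # The fraction length may be negative or exceed the word length;
    # nothing in the formula depends on the word length.
    return stored_integer * 2.0 ** -fraction_length

print(real_world_value(5, 10))   # 0.0048828125
print(real_world_value(5, -2))   # 20.0
```

The second call uses a negative fraction length, which scales the stored integer up instead of down.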
The dynamic range of fixed-point numbers is much smaller than that of floating-point numbers with equivalent word sizes. To avoid overflow conditions and minimize quantization errors, fixed-point numbers must be scaled.
With the Fixed-Point Designer software, you can select a fixed-point data type whose scaling is defined by its binary point, or you can select an arbitrary linear scaling that suits your needs. This section presents the scaling choices available for fixed-point data types.
You can represent a fixed-point number by a general slope and bias encoding scheme
$$V\approx \stackrel{~}{V}=SQ+B,$$
where
V is an arbitrarily precise real-world value.
$$\stackrel{~}{V}$$ is the approximate real-world value.
Q, the stored value, is an integer that encodes V.
$$S=F{2}^{E}$$ is the slope.
B is the bias.
The slope is partitioned into two components:
$${2}^{E}$$ specifies the binary point. E is the fixed power-of-two exponent.
F is the slope adjustment factor. It is normalized such that $$1\le F<2$$.
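As an illustration of the encoding scheme, the following plain-Python sketch decodes a stored integer with a slope and bias, and splits a slope into its adjustment factor F and exponent E. The helper names `decode` and `split_slope` are assumptions made for this example, and `math.frexp` is just one convenient way to obtain the normalization $$1\le F<2$$.

```python
import math
from typing import Tuple

def decode(stored_integer: int, slope: float, bias: float) -> float:
    """Approximate real-world value: V ~= S * Q + B."""
    return slope * stored_integer + bias

def split_slope(slope: float) -> Tuple[float, int]:
    """Split a positive slope S into (F, E) with S = F * 2**E and 1 <= F < 2."""
    if slope <= 0:
        raise ValueError("slope must be positive")
    # frexp returns (m, e) with slope == m * 2**e and 0.5 <= m < 1.
    mantissa, exponent = math.frexp(slope)
    return mantissa * 2.0, exponent - 1

print(split_slope(0.625))        # (1.25, -1)
print(decode(127, 1.25, 1.0))    # 159.75
```

A slope that is already a power of two, such as 0.0625, splits into F = 1 and E = -4, which is the binary-point-only case.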
The scaling modes available to you within this encoding scheme are described in the sections that follow. For detailed information about how the supported scaling modes affect fixed-point operations, refer to Recommendations for Arithmetic and Scaling.
Binary-point-only or power-of-two scaling involves moving the binary point within the fixed-point word. The advantage of this scaling mode is that it minimizes the number of processor arithmetic operations.
With binary-point-only scaling, the components of the general slope and bias formula have the following values:
F = 1
$$S=F{2}^{E}={2}^{E}$$
$$B=0$$
The scaling of a quantized real-world number is defined by the slope S, which is restricted to a power of two. The negative of the power-of-two exponent is called the fraction length. The fraction length is the number of bits to the right of the binary point. For Binary-Point-Only scaling, specify fixed-point data types as
signed types — fixdt(1, WordLength, FractionLength)
unsigned types — fixdt(0, WordLength, FractionLength)
Integers are a special case of fixed-point data types. Integers have a trivial scaling with slope 1 and bias 0, or equivalently with fraction length 0. Specify integers as
signed integer — fixdt(1, WordLength, 0)
unsigned integer — fixdt(0, WordLength, 0)
When you scale by slope and bias, the bias B of the quantized real-world number can take on any value, and the slope S can take on any positive value. Using slope and bias, specify fixed-point data types as
fixdt(Signed, WordLength, Slope, Bias)
Specify fixed-point data types with an unspecified scaling as
fixdt(Signed, WordLength)
Simulink® signals, parameters, and states must never have unspecified scaling. When scaling is unspecified, you must use some other mechanism such as automatic best precision scaling to determine the scaling that the Simulink software uses.
The quantization Q of a real-world value V is represented by a weighted sum of bits. Within the context of the general slope and bias encoding scheme, the value of an unsigned fixed-point quantity is given by
$$\stackrel{~}{V}=S\cdot \left[{\displaystyle \sum _{i=0}^{ws-1}{b}_{i}{2}^{i}}\right]+B,$$
while the value of a signed fixed-point quantity is given by
$$\stackrel{~}{V}=S\cdot \left[-{b}_{ws-1}{2}^{ws-1}+{\displaystyle \sum _{i=0}^{ws-2}{b}_{i}{2}^{i}}\right]+B,$$
where
$${b}_{i}$$ are binary digits, with $${b}_{i}=1,0$$, for $$i=0,1,\mathrm{...},ws-1$$
The word size in bits is given by ws, with ws = 1, 2, 3,..., 128.
S is given by $$S=F{2}^{E}$$, where the scaling is unrestricted because the binary point does not have to be contiguous with the word.
$${b}_{i}$$ are called bit multipliers and $${2}^{i}$$ are called the weights.
Formats for 8-bit signed and unsigned fixed-point values are shown in the following figure.
Note that you cannot discern whether these numbers are signed or unsigned data types merely by inspection since this information is not explicitly encoded within the word.
The binary number 0011.0101 yields the same value for the unsigned and two's complement representation because the MSB = 0. Setting B = 0 and using the appropriate weights, bit multipliers, and scaling, the value is
$$\begin{array}{c}\stackrel{~}{V}=\left(F{2}^{E}\right)Q={2}^{E}\left[{\displaystyle \sum _{i=0}^{ws-1}{b}_{i}{2}^{i}}\right]\\ ={2}^{-4}\left(0\times {2}^{7}+0\times {2}^{6}+1\times {2}^{5}+1\times {2}^{4}+0\times {2}^{3}+1\times {2}^{2}+0\times {2}^{1}+1\times {2}^{0}\right)\\ =\mathrm{3.3125.}\end{array}$$
Conversely, the binary number 1011.0101 yields different values for the unsigned and two's complement representation since the MSB = 1.
Setting B = 0 and using the appropriate weights, bit multipliers, and scaling, the unsigned value is
$$\begin{array}{c}\stackrel{~}{V}=\left(F{2}^{E}\right)Q={2}^{E}\left[{\displaystyle \sum _{i=0}^{ws-1}{b}_{i}{2}^{i}}\right]\\ ={2}^{-4}\left(1\times {2}^{7}+0\times {2}^{6}+1\times {2}^{5}+1\times {2}^{4}+0\times {2}^{3}+1\times {2}^{2}+0\times {2}^{1}+1\times {2}^{0}\right)\\ =11.3125,\end{array}$$
while the two's complement value is
$$\begin{array}{c}\stackrel{~}{V}=\left(F{2}^{E}\right)Q={2}^{E}\left[-{b}_{ws-1}{2}^{ws-1}+{\displaystyle \sum _{i=0}^{ws-2}{b}_{i}{2}^{i}}\right]\\ ={2}^{-4}\left(-1\times {2}^{7}+0\times {2}^{6}+1\times {2}^{5}+1\times {2}^{4}+0\times {2}^{3}+1\times {2}^{2}+0\times {2}^{1}+1\times {2}^{0}\right)\\ =-\mathrm{4.6875.}\end{array}$$
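The three worked values can be checked with a short plain-Python sketch of the weighted-sum formulas. The function name `fixed_point_value` is hypothetical, and the word is assumed to be written as a bit string with the most significant bit first and the binary point removed.

```python
def fixed_point_value(bits: str, signed: bool,
                      slope: float = 1.0, bias: float = 0.0) -> float:
    """Evaluate S * (weighted sum of bits) + B for a bit string, MSB first."""
    ws = len(bits)
    # b[i] is the bit multiplier for the weight 2**i.
    b = [int(ch) for ch in reversed(bits)]
    if signed:
        # Two's complement: the MSB carries the negative weight -2**(ws-1).
        q = -b[ws - 1] * 2 ** (ws - 1) + sum(b[i] * 2 ** i for i in range(ws - 1))
    else:
        q = sum(b[i] * 2 ** i for i in range(ws))
    return slope * q + bias

print(fixed_point_value("00110101", signed=False, slope=2 ** -4))  # 3.3125
print(fixed_point_value("10110101", signed=False, slope=2 ** -4))  # 11.3125
print(fixed_point_value("10110101", signed=True, slope=2 ** -4))   # -4.6875
```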
The range of a number gives the limits of the representation, while the precision gives the distance between successive numbers in the representation. The range and precision of a fixed-point number depend on the length of the word and the scaling.
The following figure illustrates the range of representable numbers for an unsigned fixed-point number of size ws, scaling S, and bias B.
The following figure illustrates the range of representable numbers for a two's complement fixed-point number of size ws, scaling S, and bias B where the values of ws, scaling S, and bias B allow for both negative and positive numbers.
For both the signed and unsigned fixed-point numbers of any data type, the number of different bit patterns is $${2}^{ws}$$.
For example, if the fixed-point data type is an integer with scaling defined as $$S=1$$ and B = 0, then the maximum unsigned value is $${2}^{ws}-1$$, because zero must be represented. In two's complement, negative numbers must be represented as well as zero, so the maximum value is $${2}^{ws-1}-1$$. Additionally, since there is only one representation for zero, there must be an unequal number of positive and negative numbers. This means there is a representation for $$-{2}^{ws-1}$$ but not for $${2}^{ws-1}$$.
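The counting argument can be illustrated by enumerating every bit pattern of a small word. The sketch below is plain Python with a hypothetical helper `stored_integers`; a 4-bit word is chosen only to keep the output short.

```python
def stored_integers(ws: int, signed: bool) -> list:
    """Integer value of every bit pattern of a ws-bit word (S = 1, B = 0)."""
    values = []
    for pattern in range(2 ** ws):                 # all 2**ws bit patterns
        if signed and pattern >= 2 ** (ws - 1):
            values.append(pattern - 2 ** ws)       # MSB set: negative in two's complement
        else:
            values.append(pattern)
    return values

unsigned = stored_integers(4, signed=False)
signed = stored_integers(4, signed=True)
print(len(unsigned), min(unsigned), max(unsigned))  # 16 0 15
print(len(signed), min(signed), max(signed))        # 16 -8 7
```

Both interpretations use the same 16 patterns. The signed case reaches -8 but stops at 7, because one of the non-negative patterns is taken by zero.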
The precision of a data type is given by the slope. In this usage, precision means the difference between neighboring representable values.
The low limit, high limit, and default binary-point-only scaling for the supported fixed-point data types discussed in Binary Point Interpretation are given in the following table. See Precision and Range for more information.
Fixed-Point Data Type Range and Default Scaling
Name | Data Type | Low Limit | High Limit | Default Scaling (~Precision) |
---|---|---|---|---|
Unsigned Integer | fixdt(0,ws,0) | 0 | $${2}^{ws}-1$$ | 1 |
Signed Integer | fixdt(1,ws,0) | $$-{2}^{ws-1}$$ | $${2}^{ws-1}-1$$ | 1 |
Unsigned Binary Point | fixdt(0,ws,fl) | 0 | $$({2}^{ws}-1){2}^{-fl}$$ | $${2}^{-fl}$$ |
Signed Binary Point | fixdt(1,ws,fl) | $$-{2}^{ws-1-fl}$$ | $$({2}^{ws-1}-1){2}^{-fl}$$ | $${2}^{-fl}$$ |
Unsigned Slope Bias | fixdt(0,ws,s,b) | b | $$s({2}^{ws}-1)+b$$ | s |
Signed Slope Bias | fixdt(1,ws,s,b) | $$-s({2}^{ws-1})+b$$ | $$s({2}^{ws-1}-1)+b$$ | s |
s = Slope, b = Bias, ws = WordLength, fl = FractionLength
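The limits in the table follow from applying the slope and bias to the smallest and largest stored integers. A minimal plain-Python sketch, assuming a hypothetical helper `fixed_point_range` (binary-point-only types correspond to a slope of 2^-fl and a bias of 0):

```python
def fixed_point_range(signed: bool, ws: int,
                      slope: float = 1.0, bias: float = 0.0) -> tuple:
    """Return (low limit, high limit) for a ws-bit word with the given scaling."""
    if signed:
        q_min, q_max = -(2 ** (ws - 1)), 2 ** (ws - 1) - 1
    else:
        q_min, q_max = 0, 2 ** ws - 1
    return slope * q_min + bias, slope * q_max + bias

print(fixed_point_range(True, 8))                  # (-128.0, 127.0)
print(fixed_point_range(True, 8, slope=2 ** -4))   # (-8.0, 7.9375)
print(fixed_point_range(False, 8, 1.25, 1.0))      # (1.0, 319.75)
```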
The precisions, range of signed values, and range of unsigned values for an 8-bit generalized fixed-point data type with binary-point-only scaling are listed in the following table. Note that the first scaling value (2^{1}) represents a binary point that is not contiguous with the word.
Scaling | Precision | Range of Signed Values (Low, High) | Range of Unsigned Values (Low, High) |
---|---|---|---|
2^{1} | 2.0 | -256, 254 | 0, 510 |
2^{0} | 1.0 | -128, 127 | 0, 255 |
2^{-1} | 0.5 | -64, 63.5 | 0, 127.5 |
2^{-2} | 0.25 | -32, 31.75 | 0, 63.75 |
2^{-3} | 0.125 | -16, 15.875 | 0, 31.875 |
2^{-4} | 0.0625 | -8, 7.9375 | 0, 15.9375 |
2^{-5} | 0.03125 | -4, 3.96875 | 0, 7.96875 |
2^{-6} | 0.015625 | -2, 1.984375 | 0, 3.984375 |
2^{-7} | 0.0078125 | -1, 0.9921875 | 0, 1.9921875 |
2^{-8} | 0.00390625 | -0.5, 0.49609375 | 0, 0.99609375 |
The precision and ranges of signed and unsigned values for an 8-bit fixed-point data type using slope and bias scaling are listed in the following table. The slope starts at 1.25 and is halved in each successive row; the bias is 1.0 in every row. Note that the slope is the same as the precision.
Bias | Slope/Precision | Range of Signed Values (low, high) | Range of Unsigned Values (low, high) |
---|---|---|---|
1 | 1.25 | -159, 159.75 | 1, 319.75 |
1 | 0.625 | -79, 80.375 | 1, 160.375 |
1 | 0.3125 | -39, 40.6875 | 1, 80.6875 |
1 | 0.15625 | -19, 20.84375 | 1, 40.84375 |
1 | 0.078125 | -9, 10.921875 | 1, 20.921875 |
1 | 0.0390625 | -4, 5.9609375 | 1, 10.9609375 |
1 | 0.01953125 | -1.5, 3.48046875 | 1, 5.98046875 |
1 | 0.009765625 | -0.25, 2.240234375 | 1, 3.490234375 |
1 | 0.0048828125 | 0.375, 1.6201171875 | 1, 2.2451171875 |
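The rows of the slope and bias table can be regenerated with a short plain-Python loop. This is a sketch for checking the arithmetic, not Fixed-Point Designer code; `slope_bias_rows` and its default arguments are assumptions chosen to match the table (8-bit word, first slope 1.25, bias 1.0, slope halved on each row).

```python
def slope_bias_rows(ws: int = 8, first_slope: float = 1.25,
                    bias: float = 1.0, rows: int = 9) -> list:
    """Return (slope, signed range, unsigned range) for successively halved slopes."""
    table = []
    slope = first_slope
    for _ in range(rows):
        signed = (slope * -(2 ** (ws - 1)) + bias,
                  slope * (2 ** (ws - 1) - 1) + bias)
        unsigned = (bias, slope * (2 ** ws - 1) + bias)
        table.append((slope, signed, unsigned))
        slope /= 2                      # next row uses half the slope
    return table

for slope, signed, unsigned in slope_bias_rows():
    print(slope, signed, unsigned)
```

Because every slope here is an exact binary fraction, the floating-point results match the table entries exactly.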