Fixed-point numbers and their data types are characterized by their word size in bits, binary point, and whether they are signed or unsigned. The Simulink Fixed Point software supports integers and fixed-point numbers. The main difference among these data types is their binary point.
A common representation of a binary fixed-point number, either signed or unsigned, is shown in the following figure.
[Figure: binary fixed-point word with bits $b_{ws-1}$ through $b_0$ and the binary point]
where
ws is the word length in bits
The most significant bit (MSB) is the leftmost bit, and is represented by location $b_{ws-1}$
The least significant bit (LSB) is the rightmost bit, and is represented by location $b_0$
The binary point is shown four places to the left of the LSB
Computer hardware typically represents the negation of a binary fixed-point number in one of three ways: sign/magnitude, one's complement, or two's complement. Two's complement is the preferred representation of signed fixed-point numbers and is the representation supported by the Simulink Fixed Point software.
Negation using two's complement consists of a bit inversion (translation into one's complement) followed by the addition of a one. For example, the two's complement of 000101 is 111011.
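The invert-then-add-one rule can be checked with a few lines of Python. This is only an illustrative sketch; the helper name `twos_complement_negate` is hypothetical and is not part of any Simulink or MATLAB API.

```python
def twos_complement_negate(bits):
    """Negate a fixed-width two's complement bit string (hypothetical helper)."""
    width = len(bits)
    mask = (1 << width) - 1
    inverted = int(bits, 2) ^ mask        # bit inversion (one's complement)
    negated = (inverted + 1) & mask       # add one and discard any carry out
    return format(negated, "0{}b".format(width))

print(twos_complement_negate("000101"))   # 111011
```

Applying the function twice returns the original bit pattern, which is a quick sanity check that the operation really is a negation.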
Whether a fixed-point value is signed or unsigned is usually not encoded explicitly within the binary word; that is, there is no sign bit. Instead, the sign information is implicitly defined within the computer architecture.
The binary point is the means by which fixed-point numbers are scaled. It is usually the software that determines the binary point. When performing basic math functions such as addition or subtraction, the hardware uses the same logic circuits regardless of the value of the scale factor. In essence, the logic circuits have no knowledge of a scale factor. They are performing signed or unsigned fixed-point binary algebra as if the binary point is to the right of $b_0$.
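A small Python sketch may make this concrete: the same integer addition serves any binary point, and only the interpretation of the result changes. The helpers `to_stored` and `to_real` are hypothetical names introduced for illustration.

```python
def to_stored(value, fraction_length):
    """Stored integer for a real-world value (assumes round to nearest)."""
    return round(value * 2 ** fraction_length)

def to_real(stored, fraction_length):
    """Real-world value of a stored integer for a given binary point."""
    return stored / 2 ** fraction_length

a = to_stored(1.5, 4)     # 24
b = to_stored(2.25, 4)    # 36
total = a + b             # plain integer addition, no knowledge of scaling

print(to_real(total, 4))  # 3.75  (binary point four places left of the LSB)
print(to_real(total, 2))  # 15.0  (same bits, binary point two places left)
```

With a fraction length of 2, the operands themselves read as 6.0 and 9.0, so the sum is still consistent; the adder never needed to know which scaling was in use.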
The dynamic range of fixed-point numbers is much smaller than that of floating-point numbers with equivalent word sizes. To avoid overflow conditions and minimize quantization errors, fixed-point numbers must be scaled.
With the Simulink Fixed Point software, you can select a fixed-point data type whose scaling is defined by its binary point, or you can select an arbitrary linear scaling that suits your needs. This section presents the scaling choices available for fixed-point data types.
You can represent a fixed-point number by a general slope and bias encoding scheme

$$V \approx \tilde{V} = SQ + B$$

where
$V$ is an arbitrarily precise real-world value
$\tilde{V}$ is the approximate real-world value
$Q$, the stored value, is an integer that encodes $V$
$S = F2^E$ is the slope
$B$ is the bias
The slope is partitioned into two components:
$2^E$ specifies the binary point. $E$ is the fixed power-of-two exponent.
$F$ is the slope adjustment factor.
It is normalized such that

$$1 \le F < 2$$
The scaling modes available to you within this encoding scheme are described in the sections that follow. For detailed information about how the supported scaling modes affect fixed-point operations, refer to Recommendations for Arithmetic and Scaling.
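The slope and bias encoding can be sketched in Python as follows. The function names are hypothetical, round-to-nearest is an assumed rounding mode, and the slope is split under the assumption that the slope adjustment factor is normalized to lie between 1 (inclusive) and 2 (exclusive).

```python
import math

def split_slope(slope):
    """Return (F, E) such that slope == F * 2**E with 1 <= F < 2."""
    mantissa, exponent = math.frexp(slope)   # slope == mantissa * 2**exponent, 0.5 <= mantissa < 1
    return mantissa * 2.0, exponent - 1

def encode(value, slope, bias):
    """Stored integer Q approximating a real-world value (assumed round to nearest)."""
    return math.floor((value - bias) / slope + 0.5)

def decode(stored, slope, bias):
    """Approximate real-world value S*Q + B."""
    return slope * stored + bias

F, E = split_slope(0.2)           # F = 1.6, E = -3
q = encode(25.33, 0.2, 10.0)      # stored integer for slope 0.2, bias 10
approx = decode(q, 0.2, 10.0)     # close to, but not exactly, 25.33
```

The gap between `approx` and the original value is the quantization error, which is bounded by half the slope under round-to-nearest.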
Binary-point-only or power-of-two scaling involves moving the binary point within the fixed-point word. The advantage of this scaling mode is that it minimizes the number of processor arithmetic operations.
With binary-point-only scaling, the components of the general slope and bias formula have the following values:
$F = 1$
$S = F2^E = 2^E$
$B = 0$
The scaling of a quantized real-world number is defined by the slope S, which is restricted to a power of two. The negative of the power-of-two exponent is called the fraction length. The fraction length is the number of bits to the right of the binary point. For Binary-Point-Only scaling, specify fixed-point data types as
signed types — fixdt(1, WordLength, FractionLength)
unsigned types — fixdt(0, WordLength, FractionLength)
Integers are a special case of fixed-point data types. Integers have a trivial scaling with slope 1 and bias 0, or equivalently with fraction length 0. Specify integers as
signed integer — fixdt(1, WordLength, 0)
unsigned integer — fixdt(0, WordLength, 0)
When you scale by slope and bias, the bias B of the quantized real-world number can take on any value, and the slope S can take on any positive value. Using slope and bias, specify fixed-point data types as
fixdt(Signed, WordLength, Slope, Bias)
Specify fixed-point data types with an unspecified scaling as
fixdt(Signed, WordLength)
Simulink signals, parameters, and states must never have unspecified scaling. When scaling is unspecified, you must use some other mechanism such as automatic best precision scaling to determine the scaling that the Simulink software uses.
The quantization Q of a real-world value V is represented by a weighted sum of bits. Within the context of the general slope and bias encoding scheme, the value of an unsigned fixed-point quantity is given by

$$\tilde{V} = S\left[\sum_{i=0}^{ws-1} b_i 2^i\right] + B$$

while the value of a signed fixed-point quantity is given by

$$\tilde{V} = S\left[-b_{ws-1}2^{ws-1} + \sum_{i=0}^{ws-2} b_i 2^i\right] + B$$

where
$b_i$ are binary digits, with $b_i = 1, 0$.
The word size in bits is given by ws, with ws = 1, 2, 3,..., 128.
$S$ is given by $F2^E$, where the scaling is unrestricted because the binary point does not have to be contiguous with the word.
$b_i$ are called bit multipliers and $2^i$ are called the weights.
Formats for 8-bit signed and unsigned fixed-point values are shown in the following figure.
[Figure: formats for 8-bit signed and unsigned fixed-point values]
Note that you cannot discern whether these numbers are signed or unsigned data types merely by inspection since this information is not explicitly encoded within the word.
The binary number 0011.0101 yields the same value for the unsigned and two's complement representation because the MSB = 0. Setting B = 0 and using the appropriate weights, bit multipliers, and scaling, the value is

$$\tilde{V} = 2^{-4}\left(2^5 + 2^4 + 2^2 + 2^0\right) = 3.3125$$

Conversely, the binary number 1011.0101 yields different values for the unsigned and two's complement representation since the MSB = 1.
Setting B = 0 and using the appropriate weights, bit multipliers, and scaling, the unsigned value is

$$\tilde{V} = 2^{-4}\left(2^7 + 2^5 + 2^4 + 2^2 + 2^0\right) = 11.3125$$

while the two's complement value is

$$\tilde{V} = 2^{-4}\left(-2^7 + 2^5 + 2^4 + 2^2 + 2^0\right) = -4.6875$$
The range of a number gives the limits of the representation, while the precision gives the distance between successive numbers in the representation. The range and precision of a fixed-point number depend on the length of the word and the scaling.
The range of representable numbers for an unsigned and two's complement fixed-point number of size ws, scaling S, and bias B is illustrated in the following figure.
[Figure: range of representable numbers for unsigned and two's complement fixed-point numbers]
For both the signed and unsigned fixed-point numbers of any data type, the number of different bit patterns is $2^{ws}$.
For example, if the fixed-point data type is an integer with scaling defined as S = 1 and B = 0, then the maximum unsigned value is $2^{ws} - 1$, because zero must be represented. In two's complement, negative numbers must be represented as well as zero, so the maximum value is $2^{ws-1} - 1$. Additionally, since there is only one representation for zero, there must be an unequal number of positive and negative numbers. This means there is a representation for $-2^{ws-1}$ but not for $2^{ws-1}$.
The precision of a data type is given by the slope. In this usage, precision means the difference between neighboring representable values.
The low limit, high limit, and default binary-point-only scaling for the supported fixed-point data types discussed in Binary Point Interpretation are given in the following table. See Limitations on Precision and Limitations on Range for more information.
Fixed-Point Data Type Range and Default Scaling
| Name | Data Type | Low Limit | High Limit | Default Scaling (~Precision) |
|---|---|---|---|---|
| Unsigned Integer | fixdt(0,ws,0) | 0 | $2^{ws} - 1$ | 1 |
| Signed Integer | fixdt(1,ws,0) | $-2^{ws-1}$ | $2^{ws-1} - 1$ | 1 |
| Unsigned Binary Point | fixdt(0,ws,fl) | 0 | $(2^{ws} - 1)2^{-fl}$ | $2^{-fl}$ |
| Signed Binary Point | fixdt(1,ws,fl) | $-2^{ws-1-fl}$ | $(2^{ws-1} - 1)2^{-fl}$ | $2^{-fl}$ |
| Unsigned Slope Bias | fixdt(0,ws,s,b) | $b$ | $s(2^{ws} - 1) + b$ | $s$ |
| Signed Slope Bias | fixdt(1,ws,s,b) | $-s(2^{ws-1}) + b$ | $s(2^{ws-1} - 1) + b$ | $s$ |

s = Slope, b = Bias, ws = WordLength, fl = FractionLength
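The limits in this table all follow from applying the slope and bias to the smallest and largest stored integers. A minimal Python sketch, using a hypothetical helper `fixed_point_range`, shows the calculation; binary-point-only scaling is the special case of slope $2^{-fl}$ and bias 0.

```python
def fixed_point_range(signed, word_length, slope=1.0, bias=0.0):
    """Return (low, high) real-world limits for a fixed-point data type."""
    if signed:
        q_min = -(2 ** (word_length - 1))
        q_max = 2 ** (word_length - 1) - 1
    else:
        q_min = 0
        q_max = 2 ** word_length - 1
    # Real-world value = slope * stored integer + bias
    return slope * q_min + bias, slope * q_max + bias

print(fixed_point_range(True, 8, slope=2.0 ** -4))           # (-8.0, 7.9375)
print(fixed_point_range(False, 8, slope=1.25, bias=1.0))     # (1.0, 319.75)
```

Note that with a nonzero bias the unsigned low limit is the bias itself, because the smallest stored integer is 0.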
The precisions, range of signed values, and range of unsigned values for an 8-bit generalized fixed-point data type with binary-point-only scaling are listed in the following table. Note that the first scaling value ($2^1$) represents a binary point that is not contiguous with the word.
| Scaling | Precision | Range of Signed Values (Low, High) | Range of Unsigned Values (Low, High) |
|---|---|---|---|
| $2^{1}$ | 2.0 | -256, 254 | 0, 510 |
| $2^{0}$ | 1.0 | -128, 127 | 0, 255 |
| $2^{-1}$ | 0.5 | -64, 63.5 | 0, 127.5 |
| $2^{-2}$ | 0.25 | -32, 31.75 | 0, 63.75 |
| $2^{-3}$ | 0.125 | -16, 15.875 | 0, 31.875 |
| $2^{-4}$ | 0.0625 | -8, 7.9375 | 0, 15.9375 |
| $2^{-5}$ | 0.03125 | -4, 3.96875 | 0, 7.96875 |
| $2^{-6}$ | 0.015625 | -2, 1.984375 | 0, 3.984375 |
| $2^{-7}$ | 0.0078125 | -1, 0.9921875 | 0, 1.9921875 |
| $2^{-8}$ | 0.00390625 | -0.5, 0.49609375 | 0, 0.99609375 |
The precision and ranges of signed and unsigned values for an 8-bit fixed-point data type using slope and bias scaling are listed in the following table. The slope starts at 1.25 and is halved in each successive row, and the bias is 1.0 for all slopes. Note that the slope is the same as the precision.
Bias | Slope/Precision | Range of Signed Values (low, high) | Range of Unsigned Values (low, high) |
|---|---|---|---|
1 | 1.25 | -159, 159.75 | 1, 319.75 |
1 | 0.625 | -79, 80.375 | 1, 160.375 |
1 | 0.3125 | -39, 40.6875 | 1, 80.6875 |
1 | 0.15625 | -19, 20.84375 | 1, 40.84375 |
1 | 0.078125 | -9, 10.921875 | 1, 20.921875 |
1 | 0.0390625 | -4, 5.9609375 | 1, 10.9609375 |
1 | 0.01953125 | -1.5, 3.48046875 | 1, 5.98046875 |
1 | 0.009765625 | -0.25, 2.240234375 | 1, 3.490234375 |
1 | 0.0048828125 | 0.375, 1.6201171875 | 1, 2.2451171875 |
Some fixed-point Simulink blocks provide a mode for scaling parameters whose values are constant vectors or matrices.
This scaling mode is based on binary-point-only scaling. Using this mode, you can scale a constant vector or matrix such that a common binary point is found based on the best precision for the largest value in the vector or matrix.
Constant scaling for best precision is available only for fixed-point data types with unspecified scaling. All other fixed-point data types use their specified scaling. You can use the Data Type Assistant (see Using the Data Type Assistant) on a block dialog box to enable the best precision scaling mode.
1. On a block dialog box, click the Show data type assistant button. The Data Type Assistant appears.
2. In the Data Type Assistant, from the Mode list, select Fixed point. The Data Type Assistant displays additional options associated with fixed-point data types.
3. From the Scaling list, select Best precision.
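To understand how you might use this scaling mode, consider a 3-by-3 matrix of doubles, M, defined as

$$M = \begin{bmatrix} 3.3333 \times 10^{-3} & 3.3333 \times 10^{-4} & 3.3333 \times 10^{-5} \\ 3.3333 \times 10^{-2} & 3.3333 \times 10^{-3} & 3.3333 \times 10^{-4} \\ 3.3333 \times 10^{-1} & 3.3333 \times 10^{-2} & 3.3333 \times 10^{-3} \end{bmatrix}$$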
Now suppose you specify M as the value of the Gain parameter for a Gain block. The results of specifying your own scaling versus using the constant scaling mode are described here:
Specified Scaling
Suppose the matrix elements are converted to a signed, 10-bit generalized fixed-point data type with binary-point-only scaling of $2^{-7}$ (that is, the binary point is located seven places to the left of the rightmost bit). With this data format, M becomes

$$\begin{bmatrix} 0 & 0 & 0 \\ 3.1250 \times 10^{-2} & 0 & 0 \\ 3.3594 \times 10^{-1} & 3.1250 \times 10^{-2} & 0 \end{bmatrix}$$
Note that many of the matrix elements are zero, and for the nonzero entries, the scaled values differ from the original values. This is because a double is converted to a binary word of fixed size and limited precision for each element. The larger and more precise the conversion data type, the more closely the scaled values match the original values.
Constant Scaling for Best Precision
If M is scaled based on its largest matrix value, you obtain

$$\begin{bmatrix} 2.9297 \times 10^{-3} & 0 & 0 \\ 3.3203 \times 10^{-2} & 2.9297 \times 10^{-3} & 0 \\ 3.3301 \times 10^{-1} & 3.3203 \times 10^{-2} & 2.9297 \times 10^{-3} \end{bmatrix}$$

Best precision automatically selects the fraction length that gives the finest precision while still representing the largest value. Even though precision is maximized for the given word length, quantization errors can still occur. In this example, a few elements still quantize to zero.
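The comparison between a specified scaling and best precision scaling can be reproduced with a short Python sketch. It assumes that best precision means the largest fraction length for which every element still fits in the signed 10-bit stored-integer range, and that conversion rounds to nearest; the function names are hypothetical and this is not the algorithm the Simulink software is documented to use internally.

```python
import math

def stored_limits(word_length, signed=True):
    """Smallest and largest stored integers for the given word length."""
    if signed:
        return -(2 ** (word_length - 1)), 2 ** (word_length - 1) - 1
    return 0, 2 ** word_length - 1

def round_half_up(x):
    return math.floor(x + 0.5)

def best_precision_fraction_length(values, word_length, signed=True):
    """Largest fraction length for which all values fit without overflow (assumed rule)."""
    q_min, q_max = stored_limits(word_length, signed)
    for fl in range(64, -65, -1):            # search from finest to coarsest scaling
        if all(q_min <= round_half_up(v * 2.0 ** fl) <= q_max for v in values):
            return fl
    raise ValueError("no fraction length in the searched range fits the values")

def quantize(value, fraction_length):
    """Real-world value after rounding to the nearest multiple of 2**-fraction_length."""
    return round_half_up(value * 2.0 ** fraction_length) * 2.0 ** -fraction_length

M = [3.3333e-3, 3.3333e-4, 3.3333e-5,
     3.3333e-2, 3.3333e-3, 3.3333e-4,
     3.3333e-1, 3.3333e-2, 3.3333e-3]

fl = best_precision_fraction_length(M, word_length=10)   # 10
specified = [quantize(v, 7) for v in M]                  # user-specified scaling 2**-7
best = [quantize(v, fl) for v in M]                      # best precision scaling
```

Under these assumptions the sketch selects a fraction length of 10, and `specified` and `best` match the two quantized matrices shown in this section.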
Simulink data type names must be valid MATLAB identifiers with fewer than 128 characters. The data type name provides information about container type, number encoding, and scaling.
You can represent a fixed-point number using the fixed-point scaling equation

$$V = SQ + B$$

where
$V$ is the real-world value
$S = F2^E$ is the slope
$F$ is the slope adjustment factor
$E$ is the fixed power-of-two exponent
$Q$ is the stored integer
$B$ is the bias
For more information, see Scaling.
The following table provides a key for various symbols that appear in Simulink products to indicate the data type and scaling of a fixed-point value.
| Symbol | Description | Example |
|---|---|---|
| Container Type | | |
| ufix | Unsigned fixed-point data type | ufix8 is an 8-bit unsigned fixed-point data type |
| sfix | Signed fixed-point data type | sfix128 is a 128-bit signed fixed-point data type |
| fltu | Scaled Doubles override of an unsigned fixed-point data type (ufix) | fltu32 is a scaled doubles override of ufix32 |
| flts | Scaled Doubles override of a signed fixed-point data type (sfix) | flts64 is a scaled doubles override of sfix64 |
| Number Encoding | | |
| e | 10^ | 125e8 equals 125*(10^(8)) |
| n | Negative | n31 equals -31 |
| p | Decimal point | 1p5 equals 1.5; p2 equals 0.2 |
| Scaling Encoding | | |
| S | Slope | ufix16_S5_B7 is a 16-bit unsigned fixed-point data type with Slope of 5 and Bias of 7 |
| B | Bias | ufix16_S5_B7 is a 16-bit unsigned fixed-point data type with Slope of 5 and Bias of 7 |
| E | Fixed exponent (2^). A negative fixed exponent describes the fraction length | sfix32_En31 is a 32-bit signed fixed-point data type with a fraction length of 31 |
| F | Slope adjustment factor | ufix16_F1p5_En50 is a 16-bit unsigned fixed-point data type with a SlopeAdjustmentFactor of 1.5 and a FixedExponent of -50 |
| C, c, D, or d | Compressed encoding for Bias | No example available. For backwards compatibility only. |
| T or t | Compressed encoding for Slope | No example available. For backwards compatibility only. |
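As a rough illustration of how these symbols combine, the following Python sketch decodes a data type name into its sign, word length, slope, and bias. The parser `parse_type_name` is hypothetical, is not part of any MathWorks product, and deliberately ignores the compressed encodings kept for backwards compatibility.

```python
import re

# Container type, word length, then zero or more _<letter><number> scaling fields.
_NAME = re.compile(r"^(ufix|sfix|fltu|flts)(\d+)((?:_[SBEF][0-9npe]+)*)$")

def _decode_number(text):
    """Apply the number encoding: n is a minus sign, p is a decimal point, e is 10^."""
    return float(text.replace("n", "-").replace("p", "."))

def parse_type_name(name):
    match = _NAME.match(name)
    if match is None:
        raise ValueError("unrecognized data type name: " + name)
    container, bits, scaling = match.groups()
    fields = {tok[0]: _decode_number(tok[1:]) for tok in scaling.split("_") if tok}
    # Slope is given directly by S, or built from F * 2**E (defaults F = 1, E = 0).
    slope = fields.get("S", fields.get("F", 1.0) * 2.0 ** fields.get("E", 0.0))
    return {
        "signed": container in ("sfix", "flts"),
        "scaled_double": container in ("fltu", "flts"),
        "word_length": int(bits),
        "slope": slope,
        "bias": fields.get("B", 0.0),
    }

print(parse_type_name("ufix16_S5_B7"))
print(parse_type_name("sfix32_En31")["slope"] == 2.0 ** -31)   # True
```

The regular expression accepts only the symbols listed in the table, so any other name raises an error instead of being silently misread.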
Scaled doubles are a hybrid between floating-point and fixed-point numbers. The Simulink Fixed Point software stores them as doubles with the scaling, sign, and word length information retained. For example, the storage container for a fixed-point data type sfix16_En14 is int16. The storage container of the equivalent scaled doubles data type, flts16_En14, is floating-point double. (For details of the fixed-point scaling notation, see Fixed-Point Data Type and Scaling Notation.) The Simulink Fixed Point software applies the scaling information to the stored floating-point double to obtain the real-world value. Storing the value in a double almost always eliminates overflow and precision issues.
What is the Difference between Scaled Doubles and True Doubles?
The storage container for both the scaled doubles and true doubles data types is floating-point double. Therefore both data type override settings, True doubles and Scaled doubles, provide the range and precision advantages of floating-point doubles. Scaled doubles retain the information about the specified data type and scaling, but true doubles do not retain this information.
Consider an example where you are storing 0.75001 degrees Celsius in a data type sfix16_En13. For this data type, the slope, S, is 2^-13 and the bias, B, is 0. Using the scaling equation

$$V = SQ + B$$

where V is the real-world value and Q is the stored value, V = 2^-13 * Q. Therefore the real-world value, V, is 0.75001 and the ideal stored value, Q, is V/S = 0.75001/2^-13 = 6144.08192. The data type sfix16_En13 can only store integers, so the ideal value is quantized to 6144, causing precision loss.
If you override the data type sfix16_En13 with true doubles, the data type changes to double and you lose the information about the scaling. The stored value equals the real-world value 0.75001.
If you override the data type sfix16_En13 with scaled doubles, the data type changes to flts16_En13. The scaling is still given by _En13 and is identical to that of the original data type. The only difference is the storage container used to hold the stored value, which is now double, so the stored value is 6144.08192. This example demonstrates one advantage of using scaled doubles: the virtual elimination of quantization errors.
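The three cases can be compared numerically with a few lines of Python. This is only a sketch of the arithmetic in the example, with Python floats standing in for the double storage container and round-to-nearest assumed for the fixed-point conversion; the variable names are illustrative.

```python
slope = 2.0 ** -13          # scaling of sfix16_En13
value = 0.75001             # real-world value in degrees Celsius

ideal_stored = value / slope                 # about 6144.08192

# sfix16_En13: the stored value must be an integer, so precision is lost.
fixed_stored = round(ideal_stored)           # 6144
fixed_value = fixed_stored * slope           # 0.75

# flts16_En13 (scaled double): same scaling, stored value kept in a double.
scaled_double_stored = ideal_stored
scaled_double_value = scaled_double_stored * slope   # 0.75001

# True double: scaling information is dropped, stored value is the real-world value.
true_double_stored = value

print(fixed_value, scaled_double_value, true_double_stored)
```

Because the slope is a power of two, dividing and multiplying by it is exact in double precision, so the scaled double recovers 0.75001 without error while the fixed-point type returns 0.75.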
The Fixed-Point Tool enables you to perform various data type overrides on fixed-point signals in your simulations. Use scaled doubles to override the fixed-point data types and scaling using double-precision numbers, thus avoiding quantization effects. Overriding the fixed-point data types provides a floating-point benchmark that represents the ideal output. Scaled doubles are useful for:
Testing and debugging
Applying data type override to individual subsystems.
If you apply data type override to subsystems in your model rather than to the whole model, Scaled doubles provide the information that the fixed-point portions of the model need for consistent data type propagation.
To display the data types for the ports in your model, from the Simulink Format menu, point to Port/Signal Displays, and then click Port Data Types.
The port display for fixed-point signals consists of three parts: the data type, the number of bits, and the scaling. These three parts reflect the block Output data type parameter value or the data type and scaling that is inherited from the driving block or through back propagation.
The following model displays its port data types.
[Figure: Simulink model with port data types displayed]
In the model, the data type displayed with the In1 block indicates that the output data type name is sfix16_Sp2_B10. This corresponds to fixdt(1, 16, 0.2, 10), which is a signed 16-bit fixed-point number with slope 0.2 and bias 10.0. The data type displayed with the In2 block indicates that the output data type name is sfix16_En6. This corresponds to fixdt(1, 16, 6), which is a signed 16-bit fixed-point number with a fraction length of 6.
© 1984-2009 The MathWorks, Inc.