Whenever you add two fixed-point numbers, you may need a carry bit to correctly represent the result. For this reason, when adding two B-bit numbers (with the same scaling), the resulting value has an extra bit compared to the two operands used.

a = fi(0.234375,0,4,6); c = a+a

c =
    0.4688

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Unsigned
            WordLength: 5
        FractionLength: 6

a.bin

ans = 1111

c.bin

ans = 11110

If you add or subtract two numbers with different precision, the radix points must first be aligned before the operation is performed. As a result, the word length of the result can exceed the word lengths of the operands by more than one bit.

a = fi(pi,1,16,13); b = fi(0.1,1,12,14); c = a + b

c =
    3.2416

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 18
        FractionLength: 14
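The alignment rule can be checked by hand. The following sketch uses only plain MATLAB arithmetic and assumes both operands are signed. The variable names (`w1`, `f1`, `w2`, `f2`, `fSum`, `iSum`, `wSum`) are hypothetical and stand for the word and fraction lengths of the operands in the example above.

```matlab
% Word length (w) and fraction length (f) of the operands a and b above.
w1 = 16; f1 = 13;   % a = fi(pi,1,16,13)
w2 = 12; f2 = 14;   % b = fi(0.1,1,12,14)

% After aligning the radix points, the sum keeps the longer fraction part.
fSum = max(f1, f2);

% Bits to the left of the radix point (sign bit included), plus one carry bit.
iSum = max(w1 - f1, w2 - f2) + 1;

wSum = iSum + fSum;
fprintf('Sum word length: %d, fraction length: %d\n', wSum, fSum);
```

With these values the sketch reports a word length of 18 and a fraction length of 14, which matches the display of `c`. Using `w1 = w2 = 4` and `f1 = f2 = 6` instead gives a word length of 5, as in the first example.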

In general, a full precision product requires a word length equal to the sum of the word lengths of the operands. In the following example, note that the word length of the product `c` is equal to the word length of `a` plus the word length of `b`. The fraction length of `c` is also equal to the fraction length of `a` plus the fraction length of `b`.

a = fi(pi,1,20), b = fi(exp(1),1,16)

a =
    3.1416

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 20
        FractionLength: 17

b =
    2.7183

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 16
        FractionLength: 13

c = a*b

c =
    8.5397

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 36
        FractionLength: 30
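As a quick check of the product rule, the word length and fraction length can be read from the `WordLength` and `FractionLength` properties of each `fi` object. This is a minimal sketch that assumes the default `fimath` settings (full precision products) are in effect.

```matlab
a = fi(pi,1,20);
b = fi(exp(1),1,16);
c = a*b;

% Under full precision, both comparisons are expected to return logical 1 (true).
c.WordLength == a.WordLength + b.WordLength
c.FractionLength == a.FractionLength + b.FractionLength
```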

Note that in C, the result of an operation between an integer data type and a double data type promotes to a double. However, in MATLAB®, the result of an operation between a built-in integer data type and a double data type is an integer. In this respect, the `fi` object behaves like the built-in integer data types in MATLAB.

When doing addition between `fi` and `double`, the double is cast to a `fi` with the same numerictype as the `fi` input. The result of the operation is a `fi`. When doing multiplication between `fi` and `double`, the double is cast to a `fi` with the same word length and signedness as the `fi`, and a best precision fraction length. The result of the operation is a `fi`.

a = fi(pi)

a =
    3.1416

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 16
        FractionLength: 13

b = 0.5 * a

b =
    1.5708

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 32
        FractionLength: 28
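The casting rules for `double` operands can be made explicit by performing the cast yourself and comparing the result types. The sketch below assumes the default `fimath` settings. The variable names `s1`, `s2`, `p1`, and `p2` are illustrative only.

```matlab
a = fi(pi);                         % signed, word length 16, fraction length 13

% Addition: the double takes on the numerictype of a.
s1 = a + 0.5;
s2 = a + fi(0.5, numerictype(a));   % explicit version of the same cast

% Multiplication: the double keeps the word length and signedness of a,
% with a best precision fraction length.
p1 = 0.5 * a;
p2 = fi(0.5, 1, 16) * a;            % explicit version of the same cast

% Both comparisons are expected to return logical 1 (true).
isequal(numerictype(s1), numerictype(s2))
isequal(numerictype(p1), numerictype(p2))
```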

When doing arithmetic between a `fi` and one of the built-in integer data types, `[u]int[8, 16, 32]`, the word length and signedness of the integer are preserved. The result of the operation is a `fi`.

a = fi(pi); b = int8(2) * a

b =
    6.2832

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 24
        FractionLength: 13

When doing arithmetic between a `fi` and a logical data type, the logical is treated as an unsigned `fi` object with a value of 0 or 1, and word length 1. The result of the operation is a `fi` object.

a = fi(pi); b = logical(1); c = a*b

c =
    3.1416

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 17
        FractionLength: 13

`fimath` properties define the rules for performing arithmetic operations on `fi` objects, including math, rounding, and overflow properties. A `fi` object can have a local `fimath` object, or it can use the default `fimath` properties. You can attach a `fimath` object to a `fi` object by using `setfimath`. Alternatively, you can specify `fimath` properties in the `fi` constructor at creation. When a `fi` object has a local `fimath`, rather than using the default properties, the display of the `fi` object shows the `fimath` properties. In this example, `a` has the `ProductMode` property specified in the constructor.

a = fi(5,1,16,4,'ProductMode','KeepMSB')

a =
     5

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 16
        FractionLength: 4

        RoundingMethod: Nearest
        OverflowAction: Saturate
           ProductMode: KeepMSB
     ProductWordLength: 32
               SumMode: FullPrecision

The `ProductMode` property of `a` is set to `KeepMSB` while the remaining `fimath` properties use the default values. For more information on the `fimath` object, its properties, and their default values, see fimath Object Properties.
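Attaching a `fimath` object after creation can be sketched as follows. The `fimath` object `F` here is a hypothetical example. `isfimathlocal` and `removefimath` are used only to inspect and undo the attachment.

```matlab
% Hypothetical fimath object for illustration.
F = fimath('ProductMode','KeepMSB','ProductWordLength',32);

a = fi(5,1,16,4);          % no local fimath; uses the default properties
a = setfimath(a, F);       % attach F as the local fimath of a
isfimathlocal(a)           % expected to return logical 1 (true)

a = removefimath(a);       % detach the local fimath again
isfimathlocal(a)           % expected to return logical 0 (false)
```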

The following table shows the bit growth of `fi` objects, `A` and `B`, when their `SumMode` and `ProductMode` properties use the default `fimath` value, `FullPrecision`.

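Property | A | B | Sum = A+B | Prod = A*B |
---|---|---|---|---|
Format | `fi(vA, s1, w1, f1)` | `fi(vB, s2, w2, f2)` | - | - |
Sign | `s1` | `s2` | `S_sum = (s1 \|\| s2)` | `S_prod = (s1 \|\| s2)` |
Integer bits | `I1 = w1 - f1 - s1` | `I2 = w2 - f2 - s2` | `I_sum = max(w1 - f1, w2 - f2) + 1 - S_sum` | `I_prod = (w1 + w2) - (f1 + f2) - S_prod` |
Fraction bits | `f1` | `f2` | `F_sum = max(f1, f2)` | `F_prod = f1 + f2` |
Total bits | `w1` | `w2` | `S_sum + I_sum + F_sum` | `w1 + w2` |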

This example shows how bit growth can occur in a `for`-loop.

T.acc = fi([],1,32,0);
T.x = fi([],1,16,0);
x = cast(1:3,'like',T.x);
acc = zeros(1,1,'like',T.acc);
for n = 1:length(x)
    acc = acc + x(n)
end

acc = 1    s33,0
acc = 3    s34,0
acc = 6    s35,0

The word length of `acc` increases with each iteration of the loop. This increase causes two problems. One is that code generation does not allow changing data types in a loop. The other is that, if the loop is long enough, you run out of memory in MATLAB. See Controlling Bit Growth for some strategies to avoid this problem.

By specifying the `fimath` properties of a `fi` object, you can control the bit growth as operations are performed on the object.

F = fimath('SumMode', 'SpecifyPrecision', 'SumWordLength', 8,...
    'SumFractionLength', 0);
a = fi(8,1,8,0, F);
b = fi(3, 1, 8, 0);
c = a+b

c =
    11

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 8
        FractionLength: 0

        RoundingMethod: Nearest
        OverflowAction: Saturate
           ProductMode: FullPrecision
               SumMode: SpecifyPrecision
         SumWordLength: 8
     SumFractionLength: 0
         CastBeforeSum: true

The `fi` object `a` has a local `fimath` object `F`. `F` specifies the word length and fraction length of the sum. Under the default `fimath` settings, the output `c` would have word length 9 and fraction length 0. However, because `a` has a local `fimath` object, the resulting `fi` object has word length 8 and fraction length 0.

You can also use `fimath` properties to control bit growth in a `for`-loop.

F = fimath('SumMode', 'SpecifyPrecision','SumWordLength',32,...
    'SumFractionLength',0);
T.acc = fi([],1,32,0,F);
T.x = fi([],1,16,0);
x = cast(1:3,'like',T.x);
acc = zeros(1,1,'like',T.acc);
for n = 1:length(x)
    acc = acc + x(n)
end

acc = 1    s32,0
acc = 3    s32,0
acc = 6    s32,0

Unlike when `T.acc` was using the default `fimath` properties, the bit growth of `acc` is now restricted. Thus, the word length of `acc` stays at 32.

Another way to control bit growth is by using subscripted assignment. `a(I) = b` assigns the values of `b` into the elements of `a` specified by the subscript vector, `I`, while retaining the `numerictype` of `a`.

T.acc = fi([],1,32,0);
T.x = fi([],1,16,0);
x = cast(1:3,'like',T.x);
acc = zeros(1,1,'like',T.acc);
% Assign into acc without changing its type
for n = 1:length(x)
    acc(:) = acc + x(n)
end

`acc(:) = acc + x(n)` dictates that the values at subscript vector, `(:)`, change. However, the `numerictype` of output `acc` remains the same. Because `acc` is a scalar, you also receive the same output if you use `(1)` as the subscript vector.

acc = zeros(1,1,'like',T.acc); % reset the accumulator before repeating the loop
for n = 1:numel(x)
    acc(1) = acc + x(n)
end

acc = 1    s32,0
acc = 3    s32,0
acc = 6    s32,0

The `numerictype` of `acc` remains the same at each iteration of the `for`-loop.

Subscripted assignment can also help you control bit growth in a function. In the function `cumulative_sum`, the `numerictype` of `y` does not change, but the values in the elements specified by *n* do.

function y = cumulative_sum(x)
% CUMULATIVE_SUM Cumulative sum of elements of a vector.
%
% For vectors, Y = cumulative_sum(X) is a vector containing the
% cumulative sum of the elements of X. The type of Y is the type of X.
y = zeros(size(x),'like',x);
y(1) = x(1);
for n = 2:length(x)
    y(n) = y(n-1) + x(n);
end
end

y = cumulative_sum(fi([1:10],1,8,0))

y =
     1     3     6    10    15    21    28    36    45    55

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 8
        FractionLength: 0

For more information on subscripted assignment, see the `subsasgn` function.

Another way you can control bit growth is by using the `accumpos` and `accumneg` functions to perform addition and subtraction operations. Similar to using subscripted assignment, `accumpos` and `accumneg` preserve the data type of one of their input `fi` objects while allowing you to specify a rounding method and overflow action as input arguments.

For more information on how to implement `accumpos` and `accumneg`, see Avoid Multiword Operations in Generated Code.
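As a rough illustration, the accumulator loop from the earlier examples could be written with `accumpos`. This is a sketch only. The choice of `'Floor'` rounding and `'Wrap'` overflow is an assumption made for the example, not a recommendation.

```matlab
acc = fi(0,1,32,0);            % accumulator whose data type is preserved
x   = fi(1:3,1,16,0);

for n = 1:numel(x)
    % Add x(n) into acc, rounding toward negative infinity and
    % wrapping on overflow.
    acc = accumpos(acc, x(n), 'Floor', 'Wrap');
end

acc.WordLength                 % expected to remain 32
```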

When performing fixed-point arithmetic, consider the possibility and consequences of overflow. The `fimath` object specifies the overflow and rounding modes used when performing arithmetic operations.

Overflows can occur when the result of an operation exceeds the maximum or minimum representable value. The `fimath` object has an `OverflowAction` property, which offers two ways of dealing with overflows: saturation and wrap. If you set `OverflowAction` to `Saturate`, overflows are saturated to the maximum or minimum value in the range. If you set `OverflowAction` to `Wrap`, any overflows wrap using modulo arithmetic, if unsigned, or two’s complement wrap, if signed.
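The difference between the two overflow actions can be seen by quantizing an out-of-range value. The sketch below uses hypothetical `fimath` objects `Fsat` and `Fwrap` and a signed 8-bit integer type, whose range is -128 to 127.

```matlab
Fsat  = fimath('OverflowAction','Saturate');
Fwrap = fimath('OverflowAction','Wrap');

% 200 lies outside the signed 8-bit integer range [-128, 127].
a = fi(200,1,8,0,Fsat);
b = fi(200,1,8,0,Fwrap);

double(a)   % saturates to the maximum value, 127
double(b)   % wraps in two's complement: 200 - 256 = -56
```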

For more information on how to detect overflow, see Underflow and Overflow Logging Using fipref.

There are several factors to consider when choosing a rounding method, including cost, bias, and whether or not there is a possibility of overflow. Fixed-Point Designer™ software offers several different rounding functions to meet the requirements of your design.

Rounding Method | Description | Cost | Bias | Possibility of Overflow |
---|---|---|---|---|
`ceil` | Rounds to the closest representable number in the direction of positive infinity. | Low | Large positive | Yes |
`convergent` | Rounds to the closest representable number. In the case of a tie, `convergent` rounds to the nearest even number. This approach is the least-biased rounding method provided by the toolbox. | High | Unbiased | Yes |
`floor` | Rounds to the closest representable number in the direction of negative infinity, equivalent to two’s complement truncation. | Low | Large negative | No |
`nearest` | Rounds to the closest representable number. In the case of a tie, `nearest` rounds to the closest representable number in the direction of positive infinity. This rounding method is the default for `fi` object creation and `fi` arithmetic. | Moderate | Small positive | Yes |
`round` | Rounds to the closest representable number. In the case of a tie, the `round` method rounds positive numbers to the closest representable number in the direction of positive infinity, and negative numbers to the closest representable number in the direction of negative infinity. | High | Small negative for negative samples; unbiased for samples with evenly distributed positive and negative values; small positive for positive samples | Yes |
`fix` | Rounds to the closest representable number in the direction of zero. | Low | Large positive for negative samples; unbiased for samples with evenly distributed positive and negative values; large negative for positive samples | No |
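To see how the tie-breaking rules in the table differ, the rounding functions can be applied to values that lie exactly halfway between integers. This is a minimal sketch. The input values and the signed 8-bit type with 2 fraction bits are chosen only for illustration.

```matlab
% Tie values chosen to expose the differences between the methods.
x = fi([-2.5 -1.5 1.5 2.5],1,8,2);

double(ceil(x))         % -2 -1  2  3
double(convergent(x))   % -2 -2  2  2
double(floor(x))        % -3 -2  1  2
double(nearest(x))      % -2 -1  2  3
double(round(x))        % -3 -2  2  3
double(fix(x))          % -2 -1  1  2
```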