## Performing Fixed-Point Arithmetic

### Fixed-Point Arithmetic

#### Addition and subtraction

Whenever you add two fixed-point numbers, you may need a carry bit to correctly represent the result. For this reason, when adding two B-bit numbers (with the same scaling), the resulting value has an extra bit compared to the two operands used.

    a = fi(0.234375,0,4,6);
    c = a+a

    c =
        0.4688
              DataTypeMode: Fixed-point: binary point scaling
                Signedness: Unsigned
                WordLength: 5
            FractionLength: 6

    a.bin

    ans =
    1111

    c.bin

    ans =
    11110
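To see why the extra bit matters, the following sketch forces the same sum back into the original 4-bit type. It assumes the default `fi` constructor settings (nearest rounding, saturation on overflow); the variable names `growth` and `d` are illustrative only.

```matlab
a = fi(0.234375, 0, 4, 6);      % unsigned, 4-bit word, 6 fraction bits (binary 1111)
c = a + a;                      % full-precision sum, 5-bit word

growth = c.WordLength - a.WordLength;   % number of bits added by the sum

% Cast the sum back into the 4-bit type of a. The true result, 0.46875,
% does not fit, so the value saturates at the largest 4-bit value.
d = fi(c, 0, 4, 6);

disp(growth)       % 1
disp(double(c))    % 0.4688 (exactly 0.46875)
disp(double(d))    % 0.2344 (exactly 0.234375)
```

Without the carry bit, the sum is clipped to the value of `a` itself.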

If you add or subtract two numbers with different precision, the radix points must first be aligned before the operation can be performed. As a result, the word length of the result can exceed the word lengths of the operands by more than one bit.

    a = fi(pi,1,16,13);
    b = fi(0.1,1,12,14);
    c = a + b

    c =
        3.2416
              DataTypeMode: Fixed-point: binary point scaling
                Signedness: Signed
                WordLength: 18
            FractionLength: 14
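The 18-bit result can be worked out by hand. The sketch below reproduces the arithmetic for this example, where both operands are signed; the `left*` and `expected*` variables are illustrative names, and the rule is stated here only for this signed case.

```matlab
a = fi(pi,  1, 16, 13);
b = fi(0.1, 1, 12, 14);
c = a + b;

% Bits to the left of the binary point (sign bit included) for each operand.
leftA = a.WordLength - a.FractionLength;   % 3
leftB = b.WordLength - b.FractionLength;   % -2

% After alignment the sum keeps the longer fraction, the wider integer
% part, and one extra carry bit.
expectedFractionLength = max(a.FractionLength, b.FractionLength);      % 14
expectedWordLength = max(leftA, leftB) + 1 + expectedFractionLength;   % 18

disp([expectedWordLength, c.WordLength])           % 18 18
disp([expectedFractionLength, c.FractionLength])   % 14 14
```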

#### Multiplication

In general, a full-precision product requires a word length equal to the sum of the word lengths of the operands. In the following example, note that the word length of the product `c` is equal to the word length of `a` plus the word length of `b`. The fraction length of `c` is also equal to the fraction length of `a` plus the fraction length of `b`.

    a = fi(pi,1,20), b = fi(exp(1),1,16)

    a =
        3.1416
              DataTypeMode: Fixed-point: binary point scaling
                Signedness: Signed
                WordLength: 20
            FractionLength: 17

    b =
        2.7183
              DataTypeMode: Fixed-point: binary point scaling
                Signedness: Signed
                WordLength: 16
            FractionLength: 13

    c = a*b

    c =
        8.5397
              DataTypeMode: Fixed-point: binary point scaling
                Signedness: Signed
                WordLength: 36
            FractionLength: 30
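The same rule can be checked with operands whose values are exactly representable, so the product can be verified by hand. The operand values and types below are chosen for illustration and do not come from the example above.

```matlab
a = fi(1.5,   1, 8,  4);   % signed, 8-bit word, 4 fraction bits
b = fi(-2.25, 1, 10, 6);   % signed, 10-bit word, 6 fraction bits
c = a * b;                 % full-precision product

disp(c.WordLength)       % 18 (8 + 10)
disp(c.FractionLength)   % 10 (4 + 6)
disp(double(c))          % -3.3750
```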

#### Math with other built-in data types

Note that in C, the result of an operation between an integer data type and a double data type promotes to a double. However, in MATLAB®, the result of an operation between a built-in integer data type and a double data type is an integer. In this respect, the `fi` object behaves like the built-in integer data types in MATLAB.

When doing addition between `fi` and `double`, the double is cast to a `fi` with the same numerictype as the `fi` input. The result of the operation is a `fi`. When doing multiplication between `fi` and `double`, the double is cast to a `fi` with the same word length and signedness as the `fi`, and best-precision fraction length. The result of the operation is a `fi`.

    a = fi(pi)

    a =
        3.1416
              DataTypeMode: Fixed-point: binary point scaling
                Signedness: Signed
                WordLength: 16
            FractionLength: 13

    b = 0.5 * a

    b =
        1.5708
              DataTypeMode: Fixed-point: binary point scaling
                Signedness: Signed
                WordLength: 32
            FractionLength: 28
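The difference between the addition and multiplication rules shows up in the result types. The sketch below applies both operations to the same `fi` value; the variable names `s` and `p` are illustrative.

```matlab
a = fi(pi);          % signed, 16-bit word, 13 fraction bits

s = a + 0.5;         % 0.5 is cast to the numerictype of a before the sum
p = 0.5 * a;         % 0.5 is cast to a signed 16-bit fi with best-precision scaling

% The sum of two values with 16-bit words and 13 fraction bits gains one carry bit.
disp([s.WordLength, s.FractionLength])   % 17 13

% The product word length is 16 + 16, and the fraction lengths add.
disp([p.WordLength, p.FractionLength])   % 32 28
```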

When doing arithmetic between a `fi` and one of the built-in integer data types, `[u]int[8, 16, 32]`, the word length and signedness of the integer are preserved. The result of the operation is a `fi`.

    a = fi(pi);
    b = int8(2) * a

    b =
        6.2832
              DataTypeMode: Fixed-point: binary point scaling
                Signedness: Signed
                WordLength: 24
            FractionLength: 13
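One way to read this result is that the `int8` operand behaves like a signed 8-bit `fi` with no fraction bits. The comparison below is a sketch of that interpretation; `b1`, `b2`, and `sameType` are illustrative names.

```matlab
a  = fi(pi);                  % signed, 16-bit word, 13 fraction bits
b1 = int8(2) * a;             % built-in integer operand
b2 = fi(2, 1, 8, 0) * a;      % int8 modeled as a signed 8-bit fi with no fraction bits

% Both products have a 24-bit word (16 + 8) and 13 fraction bits (13 + 0).
sameType  = isequal(numerictype(b1), numerictype(b2));
sameValue = (double(b1) == double(b2));

disp(sameType)    % 1
disp(sameValue)   % 1
```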

When doing arithmetic between a `fi` and a logical data type, the logical is treated as an unsigned `fi` object with a value of 0 or 1, and word length 1. The result of the operation is a `fi` object.

    a = fi(pi);
    b = logical(1);
    c = a*b

    c =
        3.1416
              DataTypeMode: Fixed-point: binary point scaling
                Signedness: Signed
                WordLength: 17
            FractionLength: 13

### The fimath Object

`fimath` properties define the rules for performing arithmetic operations on `fi` objects, including math, rounding, and overflow properties. A `fi` object can have a local `fimath` object, or it can use the default `fimath` properties. You can attach a `fimath` object to a `fi` object by using `setfimath`. Alternatively, you can specify `fimath` properties in the `fi` constructor at creation. When a `fi` object has a local `fimath`, rather than using the default properties, the display of the `fi` object shows the `fimath` properties. In this example, `a` has the `ProductMode` property specified in the constructor.

    a = fi(5,1,16,4,'ProductMode','KeepMSB')

    a =
         5
              DataTypeMode: Fixed-point: binary point scaling
                Signedness: Signed
                WordLength: 16
            FractionLength: 4

            RoundingMethod: Nearest
            OverflowAction: Saturate
               ProductMode: KeepMSB
         ProductWordLength: 32
                   SumMode: FullPrecision

The `ProductMode` property of `a` is set to `KeepMSB` while the remaining `fimath` properties use the default values.

**Note**

For more information on the `fimath` object, its properties, and their default values, see fimath Object Properties.
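As a brief sketch of the other route mentioned above, the following code attaches a `fimath` object with `setfimath` and then detaches it again. The `fimath` settings and the flag variable names are chosen for illustration.

```matlab
F = fimath('ProductMode', 'KeepMSB', 'ProductWordLength', 32);

a = fi(5, 1, 16, 4);
hadLocal = isfimathlocal(a);       % false: a uses the default fimath

a = setfimath(a, F);               % attach F as the local fimath of a
hasLocal = isfimathlocal(a);       % true

a = removefimath(a);               % return to the default fimath
clearedLocal = isfimathlocal(a);   % false

disp([hadLocal, hasLocal, clearedLocal])   % 0 1 0
```

Neither call changes the value or the numerictype of `a`; only the arithmetic rules attached to it change.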

### Bit Growth

The following table shows the bit growth of `fi` objects, `A` and `B`, when their `SumMode` and `ProductMode` properties use the default `fimath` value, `FullPrecision`.

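Property | A | B | Sum = A+B | Prod = A*B |
---|---|---|---|---|
Format | `fi(vA,s1,w1,f1)` | `fi(vB,s2,w2,f2)` | - | - |
Sign | `s1` | `s2` | `Ssum` = (`s1` \|\| `s2`) | `Sproduct` = (`s1` \|\| `s2`) |
Integer bits | `I1 = w1 - f1 - s1` | `I2 = w2 - f2 - s2` | `Isum = max(w1 - f1, w2 - f2) + 1 - Ssum` | `Iproduct = (w1 + w2) - (f1 + f2) - Sproduct` |
Fraction bits | `f1` | `f2` | `Fsum = max(f1, f2)` | `Fproduct = f1 + f2` |
Total bits | `w1` | `w2` | `Ssum + Isum + Fsum` | `w1 + w2` |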

This example shows how bit growth can occur in a `for`-loop.

    T.acc = fi([],1,32,0);
    T.x = fi([],1,16,0);

    x = cast(1:3,'like',T.x);
    acc = zeros(1,1,'like',T.acc);

    for n = 1:length(x)
        acc = acc + x(n)
    end

    acc =
         1
          s33,0

    acc =
         3
          s34,0

    acc =
         6
          s35,0
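The growth visible in this output can also be captured programmatically. The sketch below runs the same loop and records the word length of `acc` after each iteration; the `wordLengths` array is an illustrative addition.

```matlab
T.acc = fi([], 1, 32, 0);
T.x   = fi([], 1, 16, 0);

x   = cast(1:3, 'like', T.x);
acc = zeros(1, 1, 'like', T.acc);

wordLengths = zeros(1, length(x));
for n = 1:length(x)
    acc = acc + x(n);                  % full-precision sum creates a wider type
    wordLengths(n) = acc.WordLength;   % record the word length after this step
end

disp(wordLengths)   % 33 34 35
```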

The word length of `acc` increases with each iteration of the loop. This increase causes two problems: One is that code generation does not allow changing data types in a loop. The other is that, if the loop is long enough, you run out of memory in MATLAB. See Controlling Bit Growth for some strategies to avoid this problem.

### Controlling Bit Growth

#### Using fimath

By specifying the `fimath` properties of a `fi` object, you can control the bit growth as operations are performed on the object.

    F = fimath('SumMode', 'SpecifyPrecision', 'SumWordLength', 8,...
        'SumFractionLength', 0);
    a = fi(8,1,8,0, F);
    b = fi(3, 1, 8, 0);
    c = a+b

    c =
        11
              DataTypeMode: Fixed-point: binary point scaling
                Signedness: Signed
                WordLength: 8
            FractionLength: 0

            RoundingMethod: Nearest
            OverflowAction: Saturate
               ProductMode: FullPrecision
                   SumMode: SpecifyPrecision
             SumWordLength: 8
         SumFractionLength: 0
             CastBeforeSum: true

The `fi` object `a` has a local `fimath` object `F`. `F` specifies the word length and fraction length of the sum. Under the default `fimath` settings, the output, `c`, normally has word length 9 and fraction length 0. However, because `a` has a local `fimath` object, the resulting `fi` object has word length 8 and fraction length 0.
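Restricting the sum to 8 bits trades bit growth for the possibility of overflow. The sketch below uses the same `fimath` settings with larger operand values, which are chosen for illustration; it relies on the `OverflowAction` of `F` being the default `Saturate`, as in the display above.

```matlab
F = fimath('SumMode', 'SpecifyPrecision', ...
           'SumWordLength', 8, 'SumFractionLength', 0);

a = fi(100, 1, 8, 0, F);   % local fimath F
b = fi(100, 1, 8, 0);      % default fimath
c = a + b;                 % true sum is 200, which does not fit in a signed 8-bit word

disp(c.WordLength)   % 8
disp(double(c))      % 127
```

The sum saturates at 127, the largest value a signed 8-bit word with no fraction bits can hold.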

You can also use `fimath` properties to control bit growth in a `for`-loop.

    F = fimath('SumMode', 'SpecifyPrecision','SumWordLength',32,...
        'SumFractionLength',0);
    T.acc = fi([],1,32,0,F);
    T.x = fi([],1,16,0);

    x = cast(1:3,'like',T.x);
    acc = zeros(1,1,'like',T.acc);

    for n = 1:length(x)
        acc = acc + x(n)
    end

    acc =
         1
          s32,0

    acc =
         3
          s32,0

    acc =
         6
          s32,0

Unlike when `T.acc` was using the default `fimath` properties, the bit growth of `acc` is now restricted. Thus, the word length of `acc` stays at 32.

#### Subscripted Assignment

Another way to control bit growth is by using subscripted assignment. `a(I) = b` assigns the values of `b` into the elements of `a` specified by the subscript vector, `I`, while retaining the `numerictype` of `a`.

    T.acc = fi([],1,32,0);
    T.x = fi([],1,16,0);

    x = cast(1:3,'like',T.x);
    acc = zeros(1,1,'like',T.acc);

    % Assign in to acc without changing its type
    for n = 1:length(x)
        acc(:) = acc + x(n)
    end

`acc(:) = acc + x(n)` dictates that the values at subscript vector, `(:)`, change. However, the `numerictype` of output `acc` remains the same. Because `acc` is a scalar, you also receive the same output if you use `(1)` as the subscript vector.

    for n = 1:numel(x)
        acc(1) = acc + x(n)
    end

    acc =
         1
          s32,0

    acc =
         3
          s32,0

    acc =
         6
          s32,0

The `numerictype` of `acc` remains the same at each iteration of the `for`-loop.
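Because the type of the left-hand side is fixed, a result that does not fit is rounded or overflows when it is assigned. The sketch below uses an 8-bit accumulator and operand values chosen for illustration, and assumes the default `fimath` (saturate on overflow) for `acc`.

```matlab
acc = fi(100, 1, 8, 0);    % signed 8-bit word, range -128 to 127
x   = fi(100, 1, 16, 0);

% The full-precision sum, 200, is computed in a wider type and then
% stored back into the 8-bit type of acc.
acc(:) = acc + x;

disp(acc.WordLength)   % 8
disp(double(acc))      % 127
```

Choose the accumulator type with enough range for the largest expected sum.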

Subscripted assignment can also help you control bit growth in a function. In the function, `cumulative_sum`, the `numerictype` of `y` does not change, but the values in the elements specified by *n* do.

    function y = cumulative_sum(x)
    % CUMULATIVE_SUM Cumulative sum of elements of a vector.
    %
    %   For vectors, Y = cumulative_sum(X) is a vector containing the
    %   cumulative sum of the elements of X.  The type of Y is the type of X.
        y = zeros(size(x),'like',x);
        y(1) = x(1);
        for n = 2:length(x)
            y(n) = y(n-1) + x(n);
        end
    end

    y = cumulative_sum(fi([1:10],1,8,0))

    y =
         1     3     6    10    15    21    28    36    45    55
              DataTypeMode: Fixed-point: binary point scaling
                Signedness: Signed
                WordLength: 8
            FractionLength: 0

**Note**

For more information on subscripted assignment, see the `subsasgn` function.

#### `accumpos` and `accumneg`

Another way you can control bit growth is by using the `accumpos` and `accumneg` functions to perform addition and subtraction operations. Similar to using subscripted assignment, `accumpos` and `accumneg` preserve the data type of one of their input `fi` objects while allowing you to specify a rounding method and overflow action in the input values.

For more information on how to implement `accumpos` and `accumneg`, see Avoid Multiword Operations in Generated Code.
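A minimal sketch of an accumulator loop written with these functions follows. The operand values, the `'Floor'` rounding method, and the `'Saturate'` overflow action are illustrative choices passed as the optional third and fourth arguments.

```matlab
x   = fi(1:3, 1, 16, 0);
acc = fi(0, 1, 32, 0);

for n = 1:length(x)
    % Add x(n) into acc; the result keeps the 32-bit type of acc.
    acc = accumpos(acc, x(n), 'Floor', 'Saturate');
end
total = double(acc);        % 6

% Subtract x(1) from acc, again keeping the type of acc.
acc = accumneg(acc, x(1), 'Floor', 'Saturate');
remaining = double(acc);    % 5

disp([total, remaining, acc.WordLength])   % 6 5 32
```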

### Overflows and Rounding

When performing fixed-point arithmetic, consider the possibility and consequences of overflow. The `fimath` object specifies the overflow and rounding modes used when performing arithmetic operations.

#### Overflows

Overflows can occur when the result of an operation exceeds the maximum or minimum representable value. The `fimath` object has an `OverflowAction` property which offers two ways of dealing with overflows: saturation and wrap. If you set `OverflowAction` to `Saturate`, overflows are saturated to the maximum or minimum value in the range. If you set `OverflowAction` to `Wrap`, any overflows wrap using modulo arithmetic, if unsigned, or two’s complement wrap, if signed.
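The two overflow actions can be compared by quantizing an out-of-range value. The values and 8-bit types in this sketch are chosen for illustration.

```matlab
% 200 is outside the range of a signed 8-bit word with no fraction bits (-128 to 127).
s = fi(200, 1, 8, 0, 'OverflowAction', 'Saturate');
w = fi(200, 1, 8, 0, 'OverflowAction', 'Wrap');

% 300 is outside the range of an unsigned 8-bit word (0 to 255).
u = fi(300, 0, 8, 0, 'OverflowAction', 'Wrap');

disp(double(s))   % 127  (clipped to the maximum)
disp(double(w))   % -56  (200 - 256, two's complement wrap)
disp(double(u))   % 44   (300 modulo 256)
```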

For more information on how to detect overflow, see Underflow and Overflow Logging Using fipref.

#### Rounding

There are several factors to consider when choosing a rounding method, including cost, bias, and whether or not there is a possibility of overflow. Fixed-Point Designer™ software offers several different rounding functions to meet the requirements of your design.

Rounding Method | Description | Cost | Bias | Possibility of Overflow |
---|---|---|---|---|
`ceil` | Rounds to the closest representable number in the direction of positive infinity. | Low | Large positive | Yes |
`convergent` | Rounds to the closest representable number. In the case of a tie, `convergent` rounds to the nearest even number. This approach is the least-biased rounding method provided by the toolbox. | High | Unbiased | Yes |
`floor` | Rounds to the closest representable number in the direction of negative infinity, equivalent to two’s complement truncation. | Low | Large negative | No |
`nearest` | Rounds to the closest representable number. In the case of a tie, `nearest` rounds to the closest representable number in the direction of positive infinity. This rounding method is the default for `fi` object creation and `fi` arithmetic. | Moderate | Small positive | Yes |
`round` | Rounds to the closest representable number. In the case of a tie, the `round` method rounds positive numbers to the closest representable number in the direction of positive infinity, and negative numbers to the closest representable number in the direction of negative infinity. | High | Small negative for negative samples; unbiased for samples with evenly distributed positive and negative values; small positive for positive samples | Yes |
`fix` | Rounds to the closest representable number in the direction of zero. | Low | Large positive for negative samples; unbiased for samples with evenly distributed positive and negative values; large negative for positive samples | No |
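The tie-breaking behavior in the table can be seen by quantizing 2.5 and -2.5 to integers with each method. This sketch uses the `RoundingMethod` property values of `fimath`, where `'Ceiling'` corresponds to `ceil` and `'Zero'` corresponds to `fix`; the variable names are illustrative.

```matlab
roundingMethods = {'Ceiling', 'Convergent', 'Floor', 'Nearest', 'Round', 'Zero'};
values = [2.5, -2.5];     % both values are exact ties between two integers

results = zeros(numel(roundingMethods), numel(values));
for k = 1:numel(roundingMethods)
    % Quantize to a signed 8-bit word with no fraction bits.
    q = fi(values, 1, 8, 0, 'RoundingMethod', roundingMethods{k});
    results(k, :) = double(q);
end

disp(results)
%  3    -2    Ceiling
%  2    -2    Convergent
%  2    -3    Floor
%  3    -2    Nearest
%  3    -3    Round
%  2    -2    Zero
```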