# Calculate Fixed-Point Arctangent

This example shows how to use the CORDIC algorithm, polynomial approximation, and lookup table approaches to calculate the fixed-point, four quadrant inverse tangent. These implementations are approximations to the MATLAB® built-in function `atan2`. An efficient fixed-point arctangent algorithm to estimate an angle is critical to many applications, including control of robotics, frequency tracking in wireless communications, and many more.

## Contents

- Calculating
`atan2(y,x)`Using the CORDIC Algorithm - Understanding the
`CORDICATAN2`Code - Visualizing the Vectoring Mode CORDIC Iterations
- Performing Overall Error Analysis of the CORDIC Algorithm
- Accelerating the Fixed-Point
`CORDICATAN2`Algorithm Using`FIACCEL` - Calculating
`atan2(y,x)`Using Chebyshev Polynomial Approximation - Comparing the Algorithmic Error of the CORDIC and Polynomial Approximation Algorithms
- Converting the Floating-Point Chebyshev Polynomial Approximation Algorithm to Fixed Point
- Performing the Overall Error Analysis of the Polynomial Approximation Algorithm
- Calculating
`atan2(y,x)`Using Lookup Tables - Single Lookup Table Based Approach
- Understanding the Lookup Table Based
`ATAN2`Implementation - Computing Fixed-point Argtangent Using
`ATAN2` - Comparison of Overall Error Between the Fixed-Point Implementations
- Comparing the Costs of the Fixed-Point Approximation Algorithms
- References

## Calculating `atan2(y,x)` Using the CORDIC Algorithm

**Introduction**

The `cordicatan2` function approximates the MATLAB® `atan2` function, using a CORDIC-based algorithm. CORDIC is an acronym for COordinate Rotation DIgital Computer. The Givens rotation-based CORDIC algorithm (see [1,2]) is one of the most hardware efficient algorithms because it only requires iterative shift-add operations. The CORDIC algorithm eliminates the need for explicit multipliers, and is suitable for calculating a variety of functions, such as sine, cosine, arcsine, arccosine, arctangent, vector magnitude, divide, square root, hyperbolic and logarithmic functions.

**CORDIC Vectoring Computation Mode**

The CORDIC vectoring mode equations are widely used to calculate `atan(y/x)`. In vectoring mode, the CORDIC rotator rotates the input vector towards the positive X-axis to minimize the component of the residual vector. For each iteration, if the coordinate of the residual vector is positive, the CORDIC rotator rotates clockwise (using a negative angle); otherwise, it rotates counter-clockwise (using a positive angle). If the angle accumulator is initialized to 0, at the end of the iterations, the accumulated rotation angle is the angle of the original input vector.

In vectoring mode, the CORDIC equations are:

is the angle accumulator

where if , and otherwise;

, and is the total number of iterations.

As approaches :

As explained above, the arctangent can be directly computed using the vectoring mode CORDIC rotator with the angle accumulator initialized to zero, i.e., and .

## Understanding the `CORDICATAN2` Code

**Introduction**

The `cordicatan2` function computes the four quadrant arctangent of the elements of x and y, where . `cordicatan2` calculates the arctangent using the vectoring mode CORDIC algorithm, according to the above CORDIC equations.

**Initialization**

The `cordicatan2` function performs the following initialization steps:

- is set to the initial X input value.
- is set to the initial Y input value.
- is set to zero.

After iterations, these initial values lead to

**Shared Fixed-Point and Floating-Point CORDIC Kernel Code**

The MATLAB code for the CORDIC algorithm (vectoring mode) kernel portion is as follows (for the case of scalar `x`, `y`, and `z`). This same code is used for both fixed-point and floating-point operations:

function [x, y, z] = cordic_vectoring_kernel(x, y, z, inpLUT, n) % Perform CORDIC vectoring kernel algorithm for N kernel iterations. xtmp = x; ytmp = y; for idx = 1:n if y < 0 x(:) = x - ytmp; y(:) = y + xtmp; z(:) = z - inpLUT(idx); else x(:) = x + ytmp; y(:) = y - xtmp; z(:) = z + inpLUT(idx); end xtmp = bitsra(x, idx); % bit-shift-right for multiply by 2^(-idx) ytmp = bitsra(y, idx); % bit-shift-right for multiply by 2^(-idx) end

## Visualizing the Vectoring Mode CORDIC Iterations

The CORDIC algorithm is usually run through a specified (constant) number of iterations since ending the CORDIC iterations early would break pipelined code, and the CORDIC gain would not be constant because would vary.

For very large values of , the CORDIC algorithm is guaranteed to converge, but not always monotonically. As will be shown in the following example, intermediate iterations occasionally rotate the vector closer to the positive X-axis than the following iteration does. You can typically achieve greater accuracy by increasing the total number of iterations.

**Example**

In the following example, iteration 5 provides a better estimate of the angle than iteration 6, and the CORDIC algorithm converges in later iterations.

Initialize the input vector with angle degrees, magnitude = 1

origFormat = get(0, 'format'); % store original format setting; % restore this setting at the end. format short % theta = 43*pi/180; % input angle in radians Niter = 10; % number of iterations inX = cos(theta); % x coordinate of the input vector inY = sin(theta); % y coordinate of the input vector % % pre-allocate memories zf = zeros(1, Niter); xf = [inX, zeros(1, Niter)]; yf = [inY, zeros(1, Niter)]; angleLUT = atan(2.^-(0:Niter-1)); % pre-calculate the angle lookup table % % Call CORDIC vectoring kernel algorithm for k = 1:Niter [xf(k+1), yf(k+1), zf(k)] = fixed.internal.cordic_vectoring_kernel_private(inX, inY, 0, angleLUT, k); end

The following output shows the CORDIC angle accumulation (in degrees) through 10 iterations. Note that the 5th iteration produced less error than the 6th iteration, and that the calculated angle quickly converges to the actual input angle afterward.

angleAccumulator = zf*180/pi; angleError = angleAccumulator - theta*180/pi; fprintf('Iteration: %2d, Calculated angle: %7.3f, Error in degrees: %10g, Error in bits: %g\n',... [(1:Niter); angleAccumulator(:)'; angleError(:)';log2(abs(zf(:)'-theta))]);

Iteration: 1, Calculated angle: 45.000, Error in degrees: 2, Error in bits: -4.84036 Iteration: 2, Calculated angle: 18.435, Error in degrees: -24.5651, Error in bits: -1.22182 Iteration: 3, Calculated angle: 32.471, Error in degrees: -10.5288, Error in bits: -2.44409 Iteration: 4, Calculated angle: 39.596, Error in degrees: -3.40379, Error in bits: -4.07321 Iteration: 5, Calculated angle: 43.173, Error in degrees: 0.172543, Error in bits: -8.37533 Iteration: 6, Calculated angle: 41.383, Error in degrees: -1.61737, Error in bits: -5.14671 Iteration: 7, Calculated angle: 42.278, Error in degrees: -0.722194, Error in bits: -6.3099 Iteration: 8, Calculated angle: 42.725, Error in degrees: -0.27458, Error in bits: -7.70506 Iteration: 9, Calculated angle: 42.949, Error in degrees: -0.0507692, Error in bits: -10.1403 Iteration: 10, Calculated angle: 43.061, Error in degrees: 0.0611365, Error in bits: -9.87218

As N approaches , the CORDIC rotator gain approaches 1.64676. In this example, the input was on the unit circle, so the initial rotator magnitude is 1. The following output shows the rotator magnitude through 10 iterations:

rotatorMagnitude = sqrt(xf.^2+yf.^2); % CORDIC rotator gain through iterations fprintf('Iteration: %2d, Rotator magnitude: %g\n',... [(0:Niter); rotatorMagnitude(:)']);

Iteration: 0, Rotator magnitude: 1 Iteration: 1, Rotator magnitude: 1.41421 Iteration: 2, Rotator magnitude: 1.58114 Iteration: 3, Rotator magnitude: 1.6298 Iteration: 4, Rotator magnitude: 1.64248 Iteration: 5, Rotator magnitude: 1.64569 Iteration: 6, Rotator magnitude: 1.64649 Iteration: 7, Rotator magnitude: 1.64669 Iteration: 8, Rotator magnitude: 1.64674 Iteration: 9, Rotator magnitude: 1.64676 Iteration: 10, Rotator magnitude: 1.64676

Note that approaches 0, and approaches because .

y_n = yf(end)

y_n = -0.0018

x_n = xf(end)

x_n = 1.6468

```
figno = 1;
fidemo.fixpt_atan2_demo_plot(figno, xf, yf) %Vectoring Mode CORDIC Iterations
```

```
figno = figno + 1; %Cumulative Angle and Rotator Magnitude Through Iterations
fidemo.fixpt_atan2_demo_plot(figno,Niter, theta, angleAccumulator, rotatorMagnitude)
```

## Performing Overall Error Analysis of the CORDIC Algorithm

The overall error consists of two parts:

- The algorithmic error that results from the CORDIC rotation angle being represented by a finite number of basic angles.
- The quantization or rounding error that results from the finite precision representation of the angle lookup table, and from the finite precision arithmetic used in fixed-point operations.

**Calculate the CORDIC Algorithmic Error**

theta = (-178:2:180)*pi/180; % angle in radians inXflt = cos(theta); % generates input vector inYflt = sin(theta); Niter = 12; % total number of iterations zflt = cordicatan2(inYflt, inXflt, Niter); % floating-point results

Calculate the maximum magnitude of the CORDIC algorithmic error by comparing the CORDIC computation to the builtin `atan2` function.

```
format long
cordic_algErr_real_world_value = max(abs((atan2(inYflt, inXflt) - zflt)))
```

cordic_algErr_real_world_value = 4.753112306290497e-04

The log base 2 error is related to the number of iterations. In this example, we use 12 iterations (i.e., accurate to 11 binary digits), so the magnitude of the error is less than

cordic_algErr_bits = log2(cordic_algErr_real_world_value)

cordic_algErr_bits = -11.038839889583048

*Relationship Between Number of Iterations and Precision*

Once the quantization error dominates the overall error, i.e., the quantization error is greater than the algorithmic error, increasing the total number of iterations won't significantly decrease the overall error of the fixed-point CORDIC algorithm. You should pick your fraction lengths and total number of iterations to ensure that the quantization error is smaller than the algorithmic error. In the CORDIC algorithm, the precision increases by one bit every iteration. Thus, there is no reason to pick a number of iterations greater than the precision of the input data.

Another way to look at the relationship between the number of iterations and the precision is in the right-shift step of the algorithm. For example, on the counter-clockwise rotation

x(:) = x0 - bitsra(y,i); y(:) = y + bitsra(x0,i);

if i is equal to the word length of y and x0, then `bitsra(y,i)` and `bitsra(x0,i)` shift all the way to zero and do not contribute anything to the next step.

To measure the error from the fixed-point algorithm, and not the differences in input values, compute the floating-point reference with the same inputs as the fixed-point CORDIC algorithm.

inXfix = sfi(inXflt, 16, 14); inYfix = sfi(inYflt, 16, 14); zref = atan2(double(inYfix), double(inXfix)); zfix8 = cordicatan2(inYfix, inXfix, 8); zfix10 = cordicatan2(inYfix, inXfix, 10); zfix12 = cordicatan2(inYfix, inXfix, 12); zfix14 = cordicatan2(inYfix, inXfix, 14); zfix15 = cordicatan2(inYfix, inXfix, 15); cordic_err = bsxfun(@minus,zref,double([zfix8;zfix10;zfix12;zfix14;zfix15]));

The error depends on the number of iterations and the precision of the input data. In the above example, the input data is in the range [-1, +1], and the fraction length is 14. From the following tables showing the maximum error at each iteration, and the figure showing the overall error of the CORDIC algorithm, you can see that the error decreases by about 1 bit per iteration until the precision of the data is reached.

iterations = [8, 10, 12, 14, 15]; max_cordicErr_real_world_value = max(abs(cordic_err')); fprintf('Iterations: %2d, Max error in real-world-value: %g\n',... [iterations; max_cordicErr_real_world_value]);

Iterations: 8, Max error in real-world-value: 0.00773633 Iterations: 10, Max error in real-world-value: 0.00187695 Iterations: 12, Max error in real-world-value: 0.000501175 Iterations: 14, Max error in real-world-value: 0.000244621 Iterations: 15, Max error in real-world-value: 0.000244621

```
max_cordicErr_bits = log2(max_cordicErr_real_world_value);
fprintf('Iterations: %2d, Max error in bits: %g\n',[iterations; max_cordicErr_bits]);
```

Iterations: 8, Max error in bits: -7.01414 Iterations: 10, Max error in bits: -9.05739 Iterations: 12, Max error in bits: -10.9624 Iterations: 14, Max error in bits: -11.9972 Iterations: 15, Max error in bits: -11.9972

figno = figno + 1; fidemo.fixpt_atan2_demo_plot(figno, theta, cordic_err)

## Accelerating the Fixed-Point `CORDICATAN2` Algorithm Using `FIACCEL`

You can generate a MEX function from MATLAB code using the MATLAB® fiaccel command. Typically, running a generated MEX function can improve the simulation speed, although the actual speed improvement depends on the simulation platform being used. The following example shows how to accelerate the fixed-point `cordicatan2` algorithm using `fiaccel`.

The `fiaccel` function compiles the MATLAB code into a MEX function. This step requires the creation of a temporary directory and write permissions in that directory.

```
tempdirObj = fidemo.fiTempdir('fixpt_atan2_demo');
```

When you declare the number of iterations to be a constant (e.g., `12`) using `coder.newtype('constant',12)`, the compiled angle lookup table will also be constant, and thus won't be computed at each iteration. Also, when you call the compiled MEX file `cordicatan2_mex`, you will not need to give it the input argument for the number of iterations. If you pass in the number of iterations, the MEX function will error.

The data type of the input parameters determines whether the `cordicatan2` function performs fixed-point or floating-point calculations. When MATLAB generates code for this file, code is only generated for the specific data type. For example, if the inputs are fixed point, only fixed-point code is generated.

inp = {inYfix, inXfix, coder.newtype('constant',12)}; % example inputs for the function fiaccel('cordicatan2', '-o', 'cordicatan2_mex', '-args', inp)

First, calculate a vector of 4 quadrant `atan2` by calling `cordicatan2`.

tstart = tic; cordicatan2(inYfix,inXfix,Niter); telapsed_Mcordicatan2 = toc(tstart);

Next, calculate a vector of 4 quadrant `atan2` by calling the MEX-function `cordicatan2_mex`

```
cordicatan2_mex(inYfix,inXfix); % load the MEX file
tstart = tic;
cordicatan2_mex(inYfix,inXfix);
telapsed_MEXcordicatan2 = toc(tstart);
```

Now, compare the speed. Type the following in the MATLAB command window to see the speed improvement on your specific platform:

fiaccel_speedup = telapsed_Mcordicatan2/telapsed_MEXcordicatan2;

To clean up the temporary directory, run the following commands:

```
clear cordicatan2_mex;
status = tempdirObj.cleanUp;
```

## Calculating `atan2(y,x)` Using Chebyshev Polynomial Approximation

Polynomial approximation is a multiply-accumulate (MAC) centric algorithm. It can be a good choice for DSP implementations of non-linear functions like `atan(x)`.

For a given degree of polynomial, and a given function `f(x) = atan(x)` evaluated over the interval of [-1, +1], the polynomial approximation theory tries to find the polynomial that minimizes the maximum value of , where `P(x)` is the approximating polynomial. In general, you can obtain polynomials very close to the optimal one by approximating the given function in terms of Chebyshev polynomials and cutting off the polynomial at the desired degree.

The approximation of arctangent over the interval of [-1, +1] using the Chebyshev polynomial of the first kind is summarized in the following formula:

where

Therefore, the 3rd order Chebyshev polynomial approximation is

The 5th order Chebyshev polynomial approximation is

The 7th order Chebyshev polynomial approximation is

You can obtain four quadrant output through angle correction based on the properties of the arctangent function.

## Comparing the Algorithmic Error of the CORDIC and Polynomial Approximation Algorithms

In general, higher degrees of polynomial approximation produce more accurate final results. However, higher degrees of polynomial approximation also increase the complexity of the algorithm and require more MAC operations and more memory. To be consistent with the CORDIC algorithm and the MATLAB `atan2` function, the input arguments consist of both `x` and `y` coordinates instead of the ratio `y/x`.

To eliminate quantization error, floating-point implementations of the CORDIC and Chebyshev polynomial approximation algorithms are used in the comparison below. An algorithmic error comparison reveals that increasing the number of CORDIC iterations results in less error. It also reveals that the CORDIC algorithm with 12 iterations provides a slightly better angle estimation than the 5th order Chebyshev polynomial approximation. The approximation error of the 3rd order Chebyshev Polynomial is about 8 times larger than that of the 5th order Chebyshev polynomial. You should choose the order or degree of the polynomial based on the required accuracy of the angle estimation and the hardware constraints.

The coefficients of the Chebyshev polynomial approximation for `atan(x)` are shown in ascending order of `x`.

constA3 = [0.970562748477141, -0.189514164974601]; % 3rd order constA5 = [0.994949366116654,-0.287060635532652,0.078037176446441]; % 5th order constA7 = [0.999133448222780 -0.320533292381664 0.144982490144465... -0.038254464970299]; % 7th order theta = (-90:1:90)*pi/180; % angle in radians inXflt = cos(theta); inYflt = sin(theta); zfltRef = atan2(inYflt, inXflt); % Ideal output from ATAN2 function zfltp3 = fidemo.poly_atan2(inYflt,inXflt,3,constA3); % 3rd order polynomial zfltp5 = fidemo.poly_atan2(inYflt,inXflt,5,constA5); % 5th order polynomial zfltp7 = fidemo.poly_atan2(inYflt,inXflt,7,constA7); % 7th order polynomial zflt8 = cordicatan2(inYflt, inXflt, 8); % CORDIC alg with 8 iterations zflt12 = cordicatan2(inYflt, inXflt, 12); % CORDIC alg with 12 iterations

The maximum algorithmic error magnitude (or infinity norm of the algorithmic error) for the CORDIC algorithm with 8 and 12 iterations is shown below:

cordic_algErr = [zfltRef;zfltRef] - [zflt8;zflt12]; max_cordicAlgErr = max(abs(cordic_algErr')); fprintf('Iterations: %2d, CORDIC algorithmic error in real-world-value: %g\n',... [[8,12]; max_cordicAlgErr(:)']);

Iterations: 8, CORDIC algorithmic error in real-world-value: 0.00772146 Iterations: 12, CORDIC algorithmic error in real-world-value: 0.000483258

The log base 2 error shows the number of binary digits of accuracy. The 12th iteration of the CORDIC algorithm has an estimated angle accuracy of :

max_cordicAlgErr_bits = log2(max_cordicAlgErr); fprintf('Iterations: %2d, CORDIC algorithmic error in bits: %g\n',... [[8,12]; max_cordicAlgErr_bits(:)']);

Iterations: 8, CORDIC algorithmic error in bits: -7.01691 Iterations: 12, CORDIC algorithmic error in bits: -11.0149

The following code shows the magnitude of the maximum algorithmic error of the polynomial approximation for orders 3, 5, and 7:

poly_algErr = [zfltRef;zfltRef;zfltRef] - [zfltp3;zfltp5;zfltp7]; max_polyAlgErr = max(abs(poly_algErr')); fprintf('Order: %d, Polynomial approximation algorithmic error in real-world-value: %g\n',... [3:2:7; max_polyAlgErr(:)']);

Order: 3, Polynomial approximation algorithmic error in real-world-value: 0.00541647 Order: 5, Polynomial approximation algorithmic error in real-world-value: 0.000679384 Order: 7, Polynomial approximation algorithmic error in real-world-value: 9.16204e-05

The log base 2 error shows the number of binary digits of accuracy.

max_polyAlgErr_bits = log2(max_polyAlgErr); fprintf('Order: %d, Polynomial approximation algorithmic error in bits: %g\n',... [3:2:7; max_polyAlgErr_bits(:)']);

Order: 3, Polynomial approximation algorithmic error in bits: -7.52843 Order: 5, Polynomial approximation algorithmic error in bits: -10.5235 Order: 7, Polynomial approximation algorithmic error in bits: -13.414

figno = figno + 1; fidemo.fixpt_atan2_demo_plot(figno, theta, cordic_algErr, poly_algErr)

## Converting the Floating-Point Chebyshev Polynomial Approximation Algorithm to Fixed Point

Assume the input and output word lengths are constrained to 16 bits by the hardware, and the 5th order Chebyshev polynomial is used in the approximation. Because the dynamic range of inputs `x`, `y` and `y/x` are all within [-1, +1], you can avoid overflow by picking a signed fixed-point input data type with a word length of 16 bits and a fraction length of 14 bits. The coefficients of the polynomial are purely fractional and within (-1, +1), so we can pick their data types as signed fixed point with a word length of 16 bits and a fraction length of 15 bits (best precision). The algorithm is robust because is within [-1, +1], and the multiplication of the coefficients and is within (-1, +1). Thus, the dynamic range will not grow, and due to the pre-determined fixed-point data types, overflow is not expected.

Similar to the CORDIC algorithm, the four quadrant polynomial approximation-based `atan2` algorithm outputs estimated angles within . Therefore, we can pick an output fraction length of 13 bits to avoid overflow and provide a dynamic range of [-4, +3.9998779296875].

The basic floating-point Chebyshev polynomial approximation of arctangent over the interval [-1, +1] is implemented as the `chebyPoly_atan_fltpt` local function in the `poly_atan2.m` file.

function z = chebyPoly_atan_fltpt(y,x,N,constA,Tz,RoundingMethodStr)

tmp = y/x; switch N case 3 z = constA(1)*tmp + constA(2)*tmp^3; case 5 z = constA(1)*tmp + constA(2)*tmp^3 + constA(3)*tmp^5; case 7 z = constA(1)*tmp + constA(2)*tmp^3 + constA(3)*tmp^5 + constA(4)*tmp^7; otherwise disp('Supported order of Chebyshev polynomials are 3, 5 and 7'); end

The basic fixed-point Chebyshev polynomial approximation of arctangent over the interval [-1, +1] is implemented as the `chebyPoly_atan_fixpt` local function in the `poly_atan2.m` file.

function z = chebyPoly_atan_fixpt(y,x,N,constA,Tz,RoundingMethodStr)

z = fi(0,'numerictype', Tz, 'RoundingMethod', RoundingMethodStr); Tx = numerictype(x); tmp = fi(0, 'numerictype',Tx, 'RoundingMethod', RoundingMethodStr); tmp(:) = Tx.divide(y, x); % y/x;

tmp2 = fi(0, 'numerictype',Tx, 'RoundingMethod', RoundingMethodStr); tmp3 = fi(0, 'numerictype',Tx, 'RoundingMethod', RoundingMethodStr); tmp2(:) = tmp*tmp; % (y/x)^2 tmp3(:) = tmp2*tmp; % (y/x)^3

z(:) = constA(1)*tmp + constA(2)*tmp3; % for order N = 3

if (N == 5) || (N == 7) tmp5 = fi(0, 'numerictype',Tx, 'RoundingMethod', RoundingMethodStr); tmp5(:) = tmp3 * tmp2; % (y/x)^5 z(:) = z + constA(3)*tmp5; % for order N = 5

if N == 7 tmp7 = fi(0, 'numerictype',Tx, 'RoundingMethod', RoundingMethodStr); tmp7(:) = tmp5 * tmp2; % (y/x)^7 z(:) = z + constA(4)*tmp7; %for order N = 7 end end

The universal four quadrant `atan2` calculation using Chebyshev polynomial approximation is implemented in the `poly_atan2.m` file.

function z = poly_atan2(y,x,N,constA,Tz,RoundingMethodStr)

if nargin < 5 % floating-point algorithm fhandle = @chebyPoly_atan_fltpt; Tz = []; RoundingMethodStr = []; z = zeros(size(y)); else % fixed-point algorithm fhandle = @chebyPoly_atan_fixpt; %pre-allocate output z = fi(zeros(size(y)), 'numerictype', Tz, 'RoundingMethod', RoundingMethodStr); end

% Apply angle correction to obtain four quadrant output for idx = 1:length(y) % first quadrant if abs(x(idx)) >= abs(y(idx)) % (0, pi/4] z(idx) = feval(fhandle, abs(y(idx)), abs(x(idx)), N, constA, Tz, RoundingMethodStr); else % (pi/4, pi/2) z(idx) = pi/2 - feval(fhandle, abs(x(idx)), abs(y(idx)), N, constA, Tz, RoundingMethodStr); end

if x(idx) < 0 % second and third quadrant if y(idx) < 0 z(idx) = -pi + z(idx); else z(idx) = pi - z(idx); end else % fourth quadrant if y(idx) < 0 z(idx) = -z(idx); end end end

## Performing the Overall Error Analysis of the Polynomial Approximation Algorithm

Similar to the CORDIC algorithm, the overall error of the polynomial approximation algorithm consists of two parts - the algorithmic error and the quantization error. The algorithmic error of the polynomial approximation algorithm was analyzed and compared to the algorithmic error of the CORDIC algorithm in a previous section.

**Calculate the Quantization Error**

Compute the quantization error by comparing the fixed-point polynomial approximation to the floating-point polynomial approximation.

Quantize the inputs and coefficients with convergent rounding:

inXfix = fi(fi(inXflt, 1, 16, 14,'RoundingMethod','Convergent'),'fimath',[]); inYfix = fi(fi(inYflt, 1, 16, 14,'RoundingMethod','Convergent'),'fimath',[]); constAfix3 = fi(fi(constA3, 1, 16,'RoundingMethod','Convergent'),'fimath',[]); constAfix5 = fi(fi(constA5, 1, 16,'RoundingMethod','Convergent'),'fimath',[]); constAfix7 = fi(fi(constA7, 1, 16,'RoundingMethod','Convergent'),'fimath',[]);

Calculate the maximum magnitude of the quantization error using `Floor` rounding:

ord = 3:2:7; % using 3rd, 5th, 7th order polynomials Tz = numerictype(1, 16, 13); % output data type zfix3p = fidemo.poly_atan2(inYfix,inXfix,ord(1),constAfix3,Tz,'Floor'); % 3rd order zfix5p = fidemo.poly_atan2(inYfix,inXfix,ord(2),constAfix5,Tz,'Floor'); % 5th order zfix7p = fidemo.poly_atan2(inYfix,inXfix,ord(3),constAfix7,Tz,'Floor'); % 7th order poly_quantErr = bsxfun(@minus, [zfltp3;zfltp5;zfltp7], double([zfix3p;zfix5p;zfix7p])); max_polyQuantErr_real_world_value = max(abs(poly_quantErr')); max_polyQuantErr_bits = log2(max_polyQuantErr_real_world_value); fprintf('PolyOrder: %2d, Quant error in bits: %g\n',... [ord; max_polyQuantErr_bits]);

PolyOrder: 3, Quant error in bits: -12.7101 PolyOrder: 5, Quant error in bits: -12.325 PolyOrder: 7, Quant error in bits: -11.8416

**Calculate the Overall Error**

Compute the overall error by comparing the fixed-point polynomial approximation to the builtin `atan2` function. The ideal reference output is `zfltRef`. The overall error of the 7th order polynomial approximation is dominated by the quantization error, which is due to the finite precision of the input data, coefficients and the rounding effects from the fixed-point arithmetic operations.

poly_err = bsxfun(@minus, zfltRef, double([zfix3p;zfix5p;zfix7p])); max_polyErr_real_world_value = max(abs(poly_err')); max_polyErr_bits = log2(max_polyErr_real_world_value); fprintf('PolyOrder: %2d, Overall error in bits: %g\n',... [ord; max_polyErr_bits]);

PolyOrder: 3, Overall error in bits: -7.51907 PolyOrder: 5, Overall error in bits: -10.2497 PolyOrder: 7, Overall error in bits: -11.5883

figno = figno + 1; fidemo.fixpt_atan2_demo_plot(figno, theta, poly_err)

*The Effect of Rounding Modes in Polynomial Approximation*

Compared to the CORDIC algorithm with 12 iterations and a 13-bit fraction length in the angle accumulator, the fifth order Chebyshev polynomial approximation gives a similar order of quantization error. In the following example, `Nearest`, `Round` and `Convergent` rounding modes give smaller quantization errors than the `Floor` rounding mode.

Maximum magnitude of the quantization error using `Floor` rounding

poly5_quantErrFloor = max(abs(poly_quantErr(2,:))); poly5_quantErrFloor_bits = log2(poly5_quantErrFloor)

poly5_quantErrFloor_bits = -12.324996933210334

For comparison, calculate the maximum magnitude of the quantization error using `Nearest` rounding:

zfixp5n = fidemo.poly_atan2(inYfix,inXfix,5,constAfix5,Tz,'Nearest'); poly5_quantErrNearest = max(abs(zfltp5 - double(zfixp5n))); poly5_quantErrNearest_bits = log2(poly5_quantErrNearest) set(0, 'format', origFormat); % reset MATLAB output format

poly5_quantErrNearest_bits = -13.175966487895451

## Calculating `atan2(y,x)` Using Lookup Tables

There are many lookup table based approaches that may be used to implement fixed-point argtangent approximations. The following is a low-cost approach based on a single real-valued lookup table and simple nearest-neighbor linear interpolation.

## Single Lookup Table Based Approach

The `atan2` method of the `fi` object in the Fixed-Point Designer™ approximates the MATLAB® builtin floating-point `atan2` function, using a single lookup table based approach with simple nearest-neighbor linear interpolation between values. This approach allows for a small real-valued lookup table and uses simple arithmetic.

Using a single real-valued lookup table simplifies the index computation and the overall arithmetic required to achieve very good accuracy of the results. These simplifications yield a relatively high speed performance as well as relatively low memory requirements.

## Understanding the Lookup Table Based `ATAN2` Implementation

**Lookup Table Size and Accuracy**

Two important design considerations of a lookup table are its size and its accuracy. It is not possible to create a table for every possible input value. It is also not possible to be perfectly accurate due to the quantization of the lookup table values.

As a compromise, the `atan2` method of the Fixed-Point Designer `fi` object uses an 8-bit lookup table as part of its implementation. An 8-bit table is only 256 elements long, so it is small and efficient. Eight bits also corresponds to the size of a byte or a word on many platforms. Used in conjunction with linear interpolation, and 16-bit output (lookup table value) precision, an 8-bit-addressable lookup table provides very good accuracy as well as performance.

**Overview of Algorithm Implementation**

To better understand the Fixed-Point Designer implementation, first consider the symmetry of the four-quadrant `atan2(y,x)` function. If you always compute the arctangent in the first-octant of the x-y space (i.e., between angles 0 and pi/4 radians), then you can perform octant correction on the resulting angle for any y and x values.

As part of the pre-processing portion, the signs and relative magnitudes of y and x are considered, and a division is performed. Based on the signs and magnitudes of y and x, only one of the following values is computed: y/x, x/y, -y/x, -x/y, -y/-x, -x/-y. The unsigned result that is guaranteed to be non-negative and purely fractional is computed, based on the a priori knowledge of the signs and magnitudes of y and x. An unsigned 16-bit fractional fixed-point type is used for this value.

The 8 most significant bits (MSBs) of the stored unsigned integer representation of the purely-fractional unsigned fixed-point result is then used to directly index an 8-bit (length-256) lookup table value containing angle values between 0 and pi/4 radians. Two table lookups are performed, one at the computed table index location `lutValBelow`, and one at the next index location `lutValAbove`:

idxUint8MSBs = bitsliceget(idxUFIX16, 16, 9); zeroBasedIdx = int16(idxUint8MSBs); lutValBelow = FI_ATAN_LUT(zeroBasedIdx + 1); lutValAbove = FI_ATAN_LUT(zeroBasedIdx + 2);

The remaining 8 least significant bits (LSBs) of idxUFIX16 are used to interpolate between these two table values. The LSB values are treated as a normalized scaling factor with 8-bit fractional data type `rFracNT`:

```
rFracNT = numerictype(0,8,8); % fractional remainder data type
idxFrac8LSBs = reinterpretcast(bitsliceget(idxUFIX16,8,1), rFracNT);
rFraction = idxFrac8LSBs;
```

The two lookup table values, with the remainder (rFraction) value, are used to perform a simple nearest-neighbor linear interpolation. A real multiply is used to determine the weighted difference between the two points. This results in a simple calculation (equivalent to one product and two sums) to obtain the interpolated fixed-point result:

temp = rFraction * (lutValAbove - lutValBelow); rslt = lutValBelow + temp;

Finally, based on the original signs and relative magnitudes of y and x, the output result is formed using simple octant-correction logic and arithmetic. The first-octant [0, pi/4] angle value results are added or subtracted with constants to form the octant-corrected angle outputs.

## Computing Fixed-point Argtangent Using `ATAN2`

You can call the `atan2` function directly using fixed-point or floating-point inputs. The lookup table based algorithm is used for the fixed-point `atan2` implementation:

zFxpLUT = atan2(inYfix,inXfix);

**Calculate the Overall Error**

You can compute the overall error by comparing the fixed-point lookup table based approximation to the builtin `atan2` function. The ideal reference output is `zfltRef`.

```
lut_err = bsxfun(@minus, zfltRef, double(zFxpLUT));
max_lutErr_real_world_value = max(abs(lut_err'));
max_lutErr_bits = log2(max_lutErr_real_world_value);
fprintf('Overall error in bits: %g\n', max_lutErr_bits);
```

Overall error in bits: -12.6743

figno = figno + 1; fidemo.fixpt_atan2_demo_plot(figno, theta, lut_err)

## Comparison of Overall Error Between the Fixed-Point Implementations

As was done previously, you can compute the overall error by comparing the fixed-point approximation(s) to the builtin `atan2` function. The ideal reference output is `zfltRef`.

zfixCDC15 = cordicatan2(inYfix, inXfix, 15); cordic_15I_err = bsxfun(@minus, zfltRef, double(zfixCDC15)); poly_7p_err = bsxfun(@minus, zfltRef, double(zfix7p)); figno = figno + 1; fidemo.fixpt_atan2_demo_plot(figno, theta, cordic_15I_err, poly_7p_err, lut_err)

## Comparing the Costs of the Fixed-Point Approximation Algorithms

The fixed-point CORDIC algorithm requires the following operations:

- 1 table lookup
**per iteration** - 2 shifts
**per iteration** - 3 additions
**per iteration**

The N-th order fixed-point Chebyshev polynomial approximation algorithm requires the following operations:

- 1 division
- (N+1) multiplications
- (N-1)/2 additions

The simplified single lookup table algorithm with nearest-neighbor linear interpolation requires the following operations:

- 1 division
- 2 table lookups
- 1 multiplication
- 2 additions

In real world applications, selecting an algorithm for the fixed-point arctangent calculation typically depends on the required accuracy, cost and hardware constraints.

close all; % close all figure windows

## References

- Jack E. Volder, The CORDIC Trigonometric Computing Technique, IRE Transactions on Electronic Computers, Volume EC-8, September 1959, pp330-334.
- Ray Andraka, A survey of CORDIC algorithm for FPGA based computers, Proceedings of the 1998 ACM/SIGDA sixth international symposium on Field programmable gate arrays, Feb. 22-24, 1998, pp191-200