## Fixed-Point Arctangent Calculation

An efficient fixed-point arctangent algorithm to estimate an angle is critical to many applications, including control of robotics, frequency tracking in wireless communications, and many more. This demo shows how to use the CORDIC algorithm, polynomial approximation, and lookup table approaches to calculate the fixed-point, four quadrant inverse tangent. These implementations are approximations to the MATLAB® built-in function atan2.

### Calculating atan2(y,x) Using the CORDIC Algorithm

Introduction

The cordicatan2 function approximates the MATLAB® atan2 function, using a CORDIC-based algorithm. CORDIC is an acronym for COordinate Rotation DIgital Computer. The Givens rotation-based CORDIC algorithm (see [1,2]) is one of the most hardware efficient algorithms because it only requires iterative shift-add operations. The CORDIC algorithm eliminates the need for explicit multipliers, and is suitable for calculating a variety of functions, such as sine, cosine, arcsine, arccosine, arctangent, vector magnitude, divide, square root, hyperbolic and logarithmic functions.

CORDIC Vectoring Computation Mode

The CORDIC vectoring mode equations are widely used to calculate atan(y/x). In vectoring mode, the CORDIC rotator rotates the input vector towards the positive X-axis to minimize the component of the residual vector. For each iteration, if the coordinate of the residual vector is positive, the CORDIC rotator rotates clockwise (using a negative angle); otherwise, it rotates counter-clockwise (using a positive angle). If the angle accumulator is initialized to 0, at the end of the iterations, the accumulated rotation angle is the angle of the original input vector.

In vectoring mode, the CORDIC equations are:

is the angle accumulator

where if , and otherwise;

, and is the total number of iterations.

As approaches :

As explained above, the arctangent can be directly computed using the vectoring mode CORDIC rotator with the angle accumulator initialized to zero, i.e., and .

### Understanding the CORDICATAN2 Code

Introduction

The cordicatan2 function computes the four quadrant arctangent of the elements of x and y, where . cordicatan2 calculates the arctangent using the vectoring mode CORDIC algorithm, according to the above CORDIC equations.

Initialization

The cordicatan2 function performs the following initialization steps:

• is set to the initial X input value.
• is set to the initial Y input value.
• is set to zero.

After iterations, these initial values lead to

Shared Fixed-Point and Floating-Point CORDIC Kernel Code

The MATLAB code for the CORDIC algorithm (vectoring mode) kernel portion is as follows (for the case of scalar , , and ):

function [x, y, z] = cordic_vectoring_kernel_private(x, y, z, inpLUT, niters)
% CORDIC_VECTORING_KERNEL_PRIVATE Perform CORDIC iterations.
xtmp = x;
ytmp = y;
for idx = 1:niters
if y < 0
x(:) = x - ytmp;
y(:) = y + xtmp;
z(:) = z - inpLUT(idx);
else
x(:) = x + ytmp;
y(:) = y - xtmp;
z(:) = z + inpLUT(idx);
end
    xtmp = bitsra(x, idx);
ytmp = bitsra(y, idx);
end

This same code is used for both fixed-point and floating-point operation. Note that the bitsra function supports double, single, integer, and fixed-point numeric types. Thus, using bitsra as part of the cordicatan2 function allows for a single, shared, data type independent CORDIC vectoring kernel algorithm implementation.

### Visualizing the Vectoring Mode CORDIC Iterations

The CORDIC algorithm is usually run through a specified (constant) number of iterations since ending the CORDIC iterations early would break pipelined code, and the CORDIC gain would not be constant because would vary.

For very large values of , the CORDIC algorithm is guaranteed to converge, but not always monotonically. As will be shown in the following example, intermediate iterations occasionally rotate the vector closer to the positive X-axis than the following iteration does. You can typically achieve greater accuracy by increasing the total number of iterations.

Example

In the following example, iteration 5 provides a better estimate of the angle than iteration 6, and the CORDIC algorithm converges in later iterations.

Initialize the input vector with angle degrees, magnitude = 1

origFormat = get(0, 'format'); % store original format setting;
% restore this at the end of the demo.
format short
%
theta = 43*pi/180;  % input angle in radians
Niter = 10;         % number of iterations
inX   = cos(theta); % x coordinate of the input vector
inY   = sin(theta); % y coordinate of the input vector
%
% pre-allocate memories
zf = zeros(1, Niter);
xf = [inX, zeros(1, Niter)];
yf = [inY, zeros(1, Niter)];
angleLUT = atan(2.^-(0:Niter-1)); % pre-calculate the angle lookup table
%
% Call CORDIC vectoring kernel algorithm
for k = 1:Niter
[xf(k+1), yf(k+1), zf(k)] = fixed.internal.cordic_vectoring_kernel_private(in
X, inY, 0, angleLUT, k);
end


The following output shows the CORDIC angle accumulation (in degrees) through 10 iterations. Note that the 5th iteration produced less error than the 6th iteration, and that the calculated angle quickly converges to the actual input angle afterward.

angleAccumulator = zf*180/pi; angleError = angleAccumulator - theta*180/pi;
fprintf('Iteration: %2d, Calculated angle: %7.3f, Error in degrees: %10g, Error
in bits: %g\n',...
[(1:Niter); angleAccumulator(:)'; angleError(:)';log2(abs(zf(:)'-theta))
]);

Iteration:  1, Calculated angle:  45.000, Error in degrees:          2, Error in
bits: -4.84036
Iteration:  2, Calculated angle:  18.435, Error in degrees:   -24.5651, Error in
bits: -1.22182
Iteration:  3, Calculated angle:  32.471, Error in degrees:   -10.5288, Error in
bits: -2.44409
Iteration:  4, Calculated angle:  39.596, Error in degrees:   -3.40379, Error in
bits: -4.07321
Iteration:  5, Calculated angle:  43.173, Error in degrees:   0.172543, Error in
bits: -8.37533
Iteration:  6, Calculated angle:  41.383, Error in degrees:   -1.61737, Error in
bits: -5.14671
Iteration:  7, Calculated angle:  42.278, Error in degrees:  -0.722194, Error in
bits: -6.3099
Iteration:  8, Calculated angle:  42.725, Error in degrees:   -0.27458, Error in
bits: -7.70506
Iteration:  9, Calculated angle:  42.949, Error in degrees: -0.0507692, Error in
bits: -10.1403
Iteration: 10, Calculated angle:  43.061, Error in degrees:  0.0611365, Error in
bits: -9.87218


As N approaches , the CORDIC rotator gain approaches 1.64676. In this example, the input was on the unit circle, so the initial rotator magnitude is 1. The following output shows the rotator magnitude through 10 iterations:

rotatorMagnitude = sqrt(xf.^2+yf.^2); % CORDIC rotator gain through iterations
fprintf('Iteration: %2d, Rotator magnitude: %g\n',...
[(0:Niter); rotatorMagnitude(:)']);

Iteration:  0, Rotator magnitude: 1
Iteration:  1, Rotator magnitude: 1.41421
Iteration:  2, Rotator magnitude: 1.58114
Iteration:  3, Rotator magnitude: 1.6298
Iteration:  4, Rotator magnitude: 1.64248
Iteration:  5, Rotator magnitude: 1.64569
Iteration:  6, Rotator magnitude: 1.64649
Iteration:  7, Rotator magnitude: 1.64669
Iteration:  8, Rotator magnitude: 1.64674
Iteration:  9, Rotator magnitude: 1.64676
Iteration: 10, Rotator magnitude: 1.64676


Note that approaches 0, and approaches because .

y_n = yf(end)

y_n =

-0.0018


x_n = xf(end)

x_n =

1.6468


figno = 1;
fidemo.fixpt_atan2_demo_plot(figno, xf, yf) %Vectoring Mode CORDIC Iterations

figno = figno + 1; %Cumulative Angle and Rotator Magnitude Through Iterations
fidemo.fixpt_atan2_demo_plot(figno,Niter, theta, angleAccumulator, rotatorMagnit
ude)


### Performing Overall Error Analysis of the CORDIC Algorithm

The overall error consists of two parts:

1. The algorithmic error that results from the CORDIC rotation angle being represented by a finite number of basic angles.
2. The quantization or rounding error that results from the finite precision representation of the angle lookup table, and from the finite precision arithmetic used in fixed-point operations.

Calculate the CORDIC Algorithmic Error

theta  = (-178:2:180)*pi/180; % angle in radians
inXflt = cos(theta); % generates input vector
inYflt = sin(theta);
Niter  = 12; % total number of iterations
zflt   = cordicatan2(inYflt, inXflt, Niter); % floating-point results


Calculate the maximum magnitude of the CORDIC algorithmic error by comparing the CORDIC computation to the builtin atan2 function.

format long
cordic_algErr_real_world_value = max(abs((atan2(inYflt, inXflt) - zflt)))

cordic_algErr_real_world_value =

4.753112306290497e-04



The log base 2 error is related to the number of iterations. In this example, we use 12 iterations (i.e., accurate to 11 binary digits), so the magnitude of the error is less than

cordic_algErr_bits = log2(cordic_algErr_real_world_value)

cordic_algErr_bits =

-11.038839889583048



Relationship Between Number of Iterations and Precision

Once the quantization error dominates the overall error, i.e., the quantization error is greater than the algorithmic error, increasing the total number of iterations won't significantly decrease the overall error of the fixed-point CORDIC algorithm. You should pick your fraction lengths and total number of iterations to ensure that the quantization error is smaller than the algorithmic error. In the CORDIC algorithm, the precision increases by one bit every iteration. Thus, there is no reason to pick a number of iterations greater than the precision of the input data.

Another way to look at the relationship between the number of iterations and the precision is in the right-shift step of the algorithm. For example, on the counter-clockwise rotation

x(:) = x0 - bitsra(y,i);
y(:) = y + bitsra(x0,i);

if i is equal to the word length of y and x0, then bitsra(y,i) and bitsra(x0,i) shift all the way to zero and do not contribute anything to the next step.

To measure the error from the fixed-point algorithm, and not the differences in input values, compute the floating-point reference with the same inputs as the fixed-point CORDIC algorithm.

inXfix = sfi(inXflt, 16, 14);
inYfix = sfi(inYflt, 16, 14);
zref   = atan2(double(inYfix), double(inXfix));
zfix8  = cordicatan2(inYfix, inXfix, 8);
zfix10 = cordicatan2(inYfix, inXfix, 10);
zfix12 = cordicatan2(inYfix, inXfix, 12);
zfix14 = cordicatan2(inYfix, inXfix, 14);
zfix15 = cordicatan2(inYfix, inXfix, 15);
cordic_err = bsxfun(@minus,zref,double([zfix8;zfix10;zfix12;zfix14;zfix15]));


The error depends on the number of iterations and the precision of the input data. In the above example, the input data is in the range [-1, +1], and the fraction length is 14. From the following tables showing the maximum error at each iteration, and the figure showing the overall error of the CORDIC algorithm, you can see that the error decreases by about 1 bit per iteration until the precision of the data is reached.

iterations = [8, 10, 12, 14, 15];
max_cordicErr_real_world_value = max(abs(cordic_err'));
fprintf('Iterations: %2d, Max error in real-world-value: %g\n',...
[iterations; max_cordicErr_real_world_value]);

Iterations:  8, Max error in real-world-value: 0.00773633
Iterations: 10, Max error in real-world-value: 0.00187695
Iterations: 12, Max error in real-world-value: 0.000501175
Iterations: 14, Max error in real-world-value: 0.000244621
Iterations: 15, Max error in real-world-value: 0.000244621

max_cordicErr_bits = log2(max_cordicErr_real_world_value);
fprintf('Iterations: %2d, Max error in bits: %g\n',[iterations; max_cordicErr_bi
ts]);

Iterations:  8, Max error in bits: -7.01414
Iterations: 10, Max error in bits: -9.05739
Iterations: 12, Max error in bits: -10.9624
Iterations: 14, Max error in bits: -11.9972
Iterations: 15, Max error in bits: -11.9972

figno = figno + 1;
fidemo.fixpt_atan2_demo_plot(figno, theta, cordic_err)


### Accelerating the Fixed-Point CORDICATAN2 Algorithm Using FIACCEL

You can generate a MEX function from MATLAB code using the MATLAB® fiaccel command. Typically, running a generated MEX function can improve the simulation speed, although the actual speed improvement depends on the simulation platform being used. The following example shows how to accelerate the fixed-point cordicatan2 algorithm using fiaccel.

The fiaccel function compiles the MATLAB code into a MEX function. This step requires the creation of a temporary directory and write permissions in that directory.

tempdirObj = fidemo.fiTempdir('fixpt_atan2_demo');


When you declare the number of iterations to be a constant (e.g., 12) using coder.newtype('constant',12), the compiled angle lookup table will also be constant, and thus won't be computed at each iteration. Also, when you call the compiled MEX file cordicatan2_mex, you will not need to give it the input argument for the number of iterations. If you pass in the number of iterations, the MEX function will error.

The data type of the input parameters determines whether the cordicatan2 function performs fixed-point or floating-point calculations. When MATLAB generates code for this file, code is only generated for the specific data type. For example, if the inputs are fixed point, only fixed-point code is generated.

inp = {inYfix, inXfix, coder.newtype('constant',12)}; % example inputs for
the function
fiaccel('cordicatan2', '-o', 'cordicatan2_mex',  '-args', inp)


First, calculate a vector of 4 quadrant atan2 by calling cordicatan2.

tstart = tic;
cordicatan2(inYfix,inXfix,Niter);
telapsed_Mcordicatan2 = toc(tstart);


Next, calculate a vector of 4 quadrant atan2 by calling the MEX-function cordicatan2_mex

cordicatan2_mex(inYfix,inXfix); % load the MEX file
tstart = tic;
cordicatan2_mex(inYfix,inXfix);
telapsed_MEXcordicatan2 = toc(tstart);


Now, compare the speed. Type the following in the MATLAB command window to see the speed improvement on your specific platform:

fiaccel_speedup = telapsed_Mcordicatan2/telapsed_MEXcordicatan2;


To clean up the temporary directory, run the following commands:

clear cordicatan2_mex;
status = tempdirObj.cleanUp;


### Calculating atan2(y,x) Using Chebyshev Polynomial Approximation

Polynomial approximation is a multiply-accumulate (MAC) centric algorithm. It can be a good choice for DSP implementations of non-linear functions like atan(x).

For a given degree of polynomial, and a given function f(x) = atan(x) evaluated over the interval of [-1, +1], the polynomial approximation theory tries to find the polynomial that minimizes the maximum value of , where P(x) is the approximating polynomial. In general, you can obtain polynomials very close to the optimal one by approximating the given function in terms of Chebyshev polynomials and cutting off the polynomial at the desired degree.

The approximation of arctangent over the interval of [-1, +1] using the Chebyshev polynomial of the first kind is summarized in the following formula:

where

Therefore, the 3rd order Chebyshev polynomial approximation is

The 5th order Chebyshev polynomial approximation is

The 7th order Chebyshev polynomial approximation is

You can obtain four quadrant output through angle correction based on the properties of the arctangent function.

### Comparing the Algorithmic Error of the CORDIC and Polynomial Approximation Algorithms

In general, higher degrees of polynomial approximation produce more accurate final results. However, higher degrees of polynomial approximation also increase the complexity of the algorithm and require more MAC operations and more memory. To be consistent with the CORDIC algorithm and the MATLAB atan2 function, the input arguments consist of both x and y coordinates instead of the ratio y/x.

To eliminate quantization error, floating-point implementations of the CORDIC and Chebyshev polynomial approximation algorithms are used in the comparison below. An algorithmic error comparison reveals that increasing the number of CORDIC iterations results in less error. It also reveals that the CORDIC algorithm with 12 iterations provides a slightly better angle estimation than the 5th order Chebyshev polynomial approximation. The approximation error of the 3rd order Chebyshev Polynomial is about 8 times larger than that of the 5th order Chebyshev polynomial. You should choose the order or degree of the polynomial based on the required accuracy of the angle estimation and the hardware constraints.

The coefficients of the Chebyshev polynomial approximation for atan(x) are shown in ascending order of x.

constA3 = [0.970562748477141, -0.189514164974601]; % 3rd order
constA5 = [0.994949366116654,-0.287060635532652,0.078037176446441]; % 5th order
constA7 = [0.999133448222780 -0.320533292381664 0.144982490144465...
-0.038254464970299]; % 7th order

theta   = (-90:1:90)*pi/180; % angle in radians
inXflt  = cos(theta);
inYflt  = sin(theta);
zfltRef = atan2(inYflt, inXflt); % Ideal output from ATAN2 function
zfltp3  = fidemo.poly_atan2(inYflt,inXflt,3,constA3); % 3rd order polynomial
zfltp5  = fidemo.poly_atan2(inYflt,inXflt,5,constA5); % 5th order polynomial
zfltp7  = fidemo.poly_atan2(inYflt,inXflt,7,constA7); % 7th order polynomial
zflt8   = cordicatan2(inYflt, inXflt,  8); % CORDIC alg with 8 iterations
zflt12  = cordicatan2(inYflt, inXflt, 12); % CORDIC alg with 12 iterations


The maximum algorithmic error magnitude (or infinity norm of the algorithmic error) for the CORDIC algorithm with 8 and 12 iterations is shown below:

cordic_algErr    = [zfltRef;zfltRef] - [zflt8;zflt12];
max_cordicAlgErr = max(abs(cordic_algErr'));
fprintf('Iterations: %2d, CORDIC algorithmic error in real-world-value: %g\n',..
.
[[8,12]; max_cordicAlgErr(:)']);

Iterations:  8, CORDIC algorithmic error in real-world-value: 0.00772146
Iterations: 12, CORDIC algorithmic error in real-world-value: 0.000483258


The log base 2 error shows the number of binary digits of accuracy. The 12th iteration of the CORDIC algorithm has an estimated angle accuracy of :

max_cordicAlgErr_bits = log2(max_cordicAlgErr);
fprintf('Iterations: %2d, CORDIC algorithmic error in bits: %g\n',...
[[8,12]; max_cordicAlgErr_bits(:)']);

Iterations:  8, CORDIC algorithmic error in bits: -7.01691
Iterations: 12, CORDIC algorithmic error in bits: -11.0149


The following code shows the magnitude of the maximum algorithmic error of the polynomial approximation for orders 3, 5, and 7:

poly_algErr    = [zfltRef;zfltRef;zfltRef] - [zfltp3;zfltp5;zfltp7];
max_polyAlgErr = max(abs(poly_algErr'));
fprintf('Order: %d, Polynomial approximation algorithmic error in real-world-val
ue: %g\n',...
[3:2:7; max_polyAlgErr(:)']);

Order: 3, Polynomial approximation algorithmic error in real-world-value: 0.0054
1647
Order: 5, Polynomial approximation algorithmic error in real-world-value: 0.0006
79384
Order: 7, Polynomial approximation algorithmic error in real-world-value: 9.1620
4e-05


The log base 2 error shows the number of binary digits of accuracy.

max_polyAlgErr_bits = log2(max_polyAlgErr);
fprintf('Order: %d, Polynomial approximation algorithmic error in bits: %g\n',..
.
[3:2:7; max_polyAlgErr_bits(:)']);

Order: 3, Polynomial approximation algorithmic error in bits: -7.52843
Order: 5, Polynomial approximation algorithmic error in bits: -10.5235
Order: 7, Polynomial approximation algorithmic error in bits: -13.414

figno = figno + 1;
fidemo.fixpt_atan2_demo_plot(figno, theta, cordic_algErr, poly_algErr)


### Converting the Floating-Point Chebyshev Polynomial Approximation Algorithm to Fixed Point

Assume the input and output word lengths are constrained to 16 bits by the hardware, and the 5th order Chebyshev polynomial is used in the approximation. Because the dynamic range of inputs x, y and y/x are all within [-1, +1], you can avoid overflow by picking a signed fixed-point input data type with a word length of 16 bits and a fraction length of 14 bits. The coefficients of the polynomial are purely fractional and within (-1, +1), so we can pick their data types as signed fixed point with a word length of 16 bits and a fraction length of 15 bits (best precision). The algorithm is robust because is within [-1, +1], and the multiplication of the coefficients and is within (-1, +1). Thus, the dynamic range will not grow, and due to the pre-determined fixed-point data types, overflow is not expected.

Similar to the CORDIC algorithm, the four quadrant polynomial approximation-based atan2 algorithm outputs estimated angles within . Therefore, we can pick an output fraction length of 13 bits to avoid overflow and provide a dynamic range of [-4, +3.9998779296875].

The basic floating-point Chebyshev polynomial approximation of arctangent over the interval [-1, +1] is implemented as the chebyPoly_atan_fltpt subfunction in the poly_atan2.m file.

   function z = chebyPoly_atan_fltpt(y,x,N,constA,Tz,RoundModeStr)
   tmp = y/x;
switch N
case 3
z = constA(1)*tmp + constA(2)*tmp^3;
case 5
z = constA(1)*tmp + constA(2)*tmp^3 + constA(3)*tmp^5;
case 7
z = constA(1)*tmp + constA(2)*tmp^3 + constA(3)*tmp^5 + constA(4)*tmp
^7;
otherwise
disp('Supported order of Chebyshev polynomials are 3, 5 and 7');
end

The basic fixed-point Chebyshev polynomial approximation of arctangent over the interval [-1, +1] is implemented as the chebyPoly_atan_fixpt subfunction in the poly_atan2.m file.

   function z = chebyPoly_atan_fixpt(y,x,N,constA,Tz,RoundModeStr)
   z = fi(0,'NumericType', Tz, 'RoundMode', RoundModeStr);
Tx = numerictype(x);
tmp = fi(0, 'NumericType',Tx, 'RoundMode', RoundModeStr);
tmp(:) = Tx.divide(y, x); % y/x;
   tmp2 = fi(0, 'NumericType',Tx, 'RoundMode', RoundModeStr);
tmp3 = fi(0, 'NumericType',Tx, 'RoundMode', RoundModeStr);
tmp2(:) = tmp*tmp;  % (y/x)^2
tmp3(:) = tmp2*tmp; % (y/x)^3
   z(:) = constA(1)*tmp + constA(2)*tmp3; % for order N = 3
   if (N == 5) || (N == 7)
tmp5 = fi(0, 'NumericType',Tx, 'RoundMode', RoundModeStr);
tmp5(:) = tmp3 * tmp2; % (y/x)^5
z(:) = z + constA(3)*tmp5; % for order N = 5
       if N == 7
tmp7 = fi(0, 'NumericType',Tx, 'RoundMode', RoundModeStr);
tmp7(:) = tmp5 * tmp2; % (y/x)^7
z(:) = z + constA(4)*tmp7; %for order N = 7
end
end

The universal four quadrant atan2 calculation using Chebyshev polynomial approximation is implemented in the poly_atan2.m file.

   function z = poly_atan2(y,x,N,constA,Tz,RoundModeStr)
   if nargin < 5
% floating-point algorithm
fhandle = @chebyPoly_atan_fltpt;
Tz = [];
RoundModeStr = [];
z = zeros(size(y));
else
% fixed-point algorithm
fhandle = @chebyPoly_atan_fixpt;
%pre-allocate output
z = fi(zeros(size(y)), 'NumericType', Tz, 'RoundMode', RoundModeStr);
end
   % Apply angle correction to obtain four quadrant output
for idx = 1:length(y)
if abs(x(idx)) >= abs(y(idx))
% (0, pi/4]
z(idx) = feval(fhandle, abs(y(idx)), abs(x(idx)), N, constA, Tz, Round
ModeStr);
else
% (pi/4, pi/2)
z(idx) = pi/2 - feval(fhandle, abs(x(idx)), abs(y(idx)), N, constA, Tz
, RoundModeStr);
end
      if x(idx) < 0
if y(idx) < 0
z(idx) = -pi + z(idx);
else
z(idx) = pi - z(idx);
end
if y(idx) < 0
z(idx) = -z(idx);
end
end
end

### Performing the Overall Error Analysis of the Polynomial Approximation Algorithm

Similar to the CORDIC algorithm, the overall error of the polynomial approximation algorithm consists of two parts - the algorithmic error and the quantization error. The algorithmic error of the polynomial approximation algorithm was analyzed and compared to the algorithmic error of the CORDIC algorithm in a previous section.

Calculate the Quantization Error

Compute the quantization error by comparing the fixed-point polynomial approximation to the floating-point polynomial approximation.

Quantize the inputs and coefficients with convergent rounding:

inXfix = fi(fi(inXflt,  1, 16, 14,'RoundMode','Convergent'),'fimath',[]);
inYfix = fi(fi(inYflt,  1, 16, 14,'RoundMode','Convergent'),'fimath',[]);
constAfix3 = fi(fi(constA3, 1, 16,'RoundMode','Convergent'),'fimath',[]);
constAfix5 = fi(fi(constA5, 1, 16,'RoundMode','Convergent'),'fimath',[]);
constAfix7 = fi(fi(constA7, 1, 16,'RoundMode','Convergent'),'fimath',[]);


Calculate the maximum magnitude of the quantization error using Floor rounding:

ord    = 3:2:7; % using 3rd, 5th, 7th order polynomials
Tz     = numerictype(1, 16, 13); % output data type
zfix3p = fidemo.poly_atan2(inYfix,inXfix,ord(1),constAfix3,Tz,'Floor'); % 3rd or
der
zfix5p = fidemo.poly_atan2(inYfix,inXfix,ord(2),constAfix5,Tz,'Floor'); % 5th or
der
zfix7p = fidemo.poly_atan2(inYfix,inXfix,ord(3),constAfix7,Tz,'Floor'); % 7th or
der
poly_quantErr = bsxfun(@minus, [zfltp3;zfltp5;zfltp7], double([zfix3p;zfix5p;zfi
x7p]));
max_polyQuantErr_real_world_value = max(abs(poly_quantErr'));
max_polyQuantErr_bits = log2(max_polyQuantErr_real_world_value);
fprintf('PolyOrder: %2d, Quant error in bits: %g\n',...
[ord; max_polyQuantErr_bits]);

PolyOrder:  3, Quant error in bits: -12.7101
PolyOrder:  5, Quant error in bits: -12.325
PolyOrder:  7, Quant error in bits: -11.8416


Calculate the Overall Error

Compute the overall error by comparing the fixed-point polynomial approximation to the builtin atan2 function. The ideal reference output is zfltRef. The overall error of the 7th order polynomial approximation is dominated by the quantization error, which is due to the finite precision of the input data, coefficients and the rounding effects from the fixed-point arithmetic operations.

poly_err = bsxfun(@minus, zfltRef, double([zfix3p;zfix5p;zfix7p]));
max_polyErr_real_world_value = max(abs(poly_err'));
max_polyErr_bits = log2(max_polyErr_real_world_value);
fprintf('PolyOrder: %2d, Overall error in bits: %g\n',...
[ord; max_polyErr_bits]);

PolyOrder:  3, Overall error in bits: -7.51907
PolyOrder:  5, Overall error in bits: -10.2497
PolyOrder:  7, Overall error in bits: -11.5883

figno = figno + 1;
fidemo.fixpt_atan2_demo_plot(figno, theta, poly_err)


The Effect of Rounding Modes in Polynomial Approximation

Compared to the CORDIC algorithm with 12 iterations and a 13-bit fraction length in the angle accumulator, the fifth order Chebyshev polynomial approximation gives a similar order of quantization error. In the following example, Nearest, Round and Convergent rounding modes give smaller quantization errors than the Floor rounding mode.

Maximum magnitude of the quantization error using Floor rounding

poly5_quantErrFloor = max(abs(poly_quantErr(2,:)));
poly5_quantErrFloor_bits = log2(poly5_quantErrFloor)

poly5_quantErrFloor_bits =

-12.324996933210334



For comparison, calculate the maximum magnitude of the quantization error using Nearest rounding:

zfixp5n = fidemo.poly_atan2(inYfix,inXfix,5,constAfix5,Tz,'Nearest');
poly5_quantErrNearest = max(abs(zfltp5 - double(zfixp5n)));
poly5_quantErrNearest_bits = log2(poly5_quantErrNearest)
set(0, 'format', origFormat); % reset MATLAB output format

poly5_quantErrNearest_bits =

-13.175966487895451



### Calculating atan2(y,x) Using Lookup Tables

There are many lookup table based approaches that may be used to implement fixed-point argtangent approximations. The following is a low-cost approach based on a single real-valued lookup table and simple nearest-neighbor linear interpolation.

### Single Lookup Table Based Approach

The atan2 method of the fi object in the Fixed-Point Toolbox approximates the MATLAB® builtin floating-point atan2 function, using a single lookup table based approach with simple nearest-neighbor linear interpolation between values. This approach allows for a small real-valued lookup table and uses simple arithmetic.

Using a single real-valued lookup table simplifies the index computation and the overall arithmetic required to achieve very good accuracy of the results. These simplifications yield a relatively high speed performance as well as relatively low memory requirements.

### Understanding the Lookup Table Based ATAN2 Implementation

Lookup Table Size and Accuracy

Two important design considerations of a lookup table are its size and its accuracy. It is not possible to create a table for every possible input value. It is also not possible to be perfectly accurate due to the quantization of the lookup table values.

As a compromise, the atan2 method of the Fixed-Point Toolbox fi object uses an 8-bit lookup table as part of its implementation. An 8-bit table is only 256 elements long, so it is small and efficient. Eight bits also corresponds to the size of a byte or a word on many platforms. Used in conjunction with linear interpolation, and 16-bit output (lookup table value) precision, an 8-bit-addressable lookup table provides very good accuracy as well as performance.

Overview of Algorithm Implementation

To better understand the Fixed-Point Toolbox implementation, first consider the symmetry of the four-quadrant atan2(y,x) function. If you always compute the arctangent in the first-octant of the x-y space (i.e., between angles 0 and pi/4 radians), then you can perform octant correction on the resulting angle for any y and x values.

As part of the pre-processing portion, the signs and relative magnitudes of y and x are considered, and a division is performed. Based on the signs and magnitudes of y and x, only one of the following values is computed: y/x, x/y, -y/x, -x/y, -y/-x, -x/-y. The unsigned result that is guaranteed to be non-negative and purely fractional is computed, based on the a priori knowledge of the signs and magnitudes of y and x. An unsigned 16-bit fractional fixed-point type is used for this value.

The 8 most significant bits (MSBs) of the stored unsigned integer representation of the purely-fractional unsigned fixed-point result is then used to directly index an 8-bit (length-256) lookup table value containing angle values between 0 and pi/4 radians. Two table lookups are performed, one at the computed table index location lutValBelow, and one at the next index location lutValAbove:

idxUint8MSBs = uint8(bitsliceget(idxUFIX16, 16, 9));
zeroBasedIdx = int16(idxUint8MSBs);
lutValBelow  = FI_ATAN_LUT(zeroBasedIdx + 1);
lutValAbove  = FI_ATAN_LUT(zeroBasedIdx + 2);


The remaining 8 least significant bits (LSBs) of idxUFIX16 are used to interpolate between these two table values. The LSB values are treated as a normalized scaling factor with 8-bit fractional data type rFracNT:

rFracNT      = numerictype(0,8,8); % fractional remainder data type
idxFrac8LSBs = reinterpretcast(bitsliceget(idxUFIX16,8,1), rFracNT);
rFraction    = idxFrac8LSBs;


The two lookup table values, with the remainder (rFraction) value, are used to perform a simple nearest-neighbor linear interpolation. A real multiply is used to determine the weighted difference between the two points. This results in a simple calculation (equivalent to one product and two sums) to obtain the interpolated fixed-point result:

temp = rFraction * (lutValAbove - lutValBelow);
rslt = lutValBelow + temp;


Finally, based on the original signs and relative magnitudes of y and x, the output result is formed using simple octant-correction logic and arithmetic. The first-octant [0, pi/4] angle value results are added or subtracted with constants to form the octant-corrected angle outputs.

### Computing Fixed-point Argtangent Using ATAN2

You can call the atan2 function directly using fixed-point or floating-point inputs. The lookup table based algorithm is used for the fixed-point atan2 implementation:

zFxpLUT = atan2(inYfix,inXfix);


Calculate the Overall Error

You can compute the overall error by comparing the fixed-point lookup table based approximation to the builtin atan2 function. The ideal reference output is zfltRef.

lut_err = bsxfun(@minus, zfltRef, double(zFxpLUT));
max_lutErr_real_world_value = max(abs(lut_err'));
max_lutErr_bits = log2(max_lutErr_real_world_value);
fprintf('Overall error in bits: %g\n', max_lutErr_bits);

Overall error in bits: -12.6743

figno = figno + 1;
fidemo.fixpt_atan2_demo_plot(figno, theta, lut_err)


### Comparison of Overall Error Between the Fixed-Point Implementations

As was done previously, you can compute the overall error by comparing the fixed-point approximation(s) to the builtin atan2 function. The ideal reference output is zfltRef.

zfixCDC15      = cordicatan2(inYfix, inXfix, 15);
cordic_15I_err = bsxfun(@minus, zfltRef, double(zfixCDC15));
poly_7p_err    = bsxfun(@minus, zfltRef, double(zfix7p));
figno = figno + 1;
fidemo.fixpt_atan2_demo_plot(figno, theta, cordic_15I_err, poly_7p_err, lut_err)


### Comparing the Costs of the Fixed-Point Approximation Algorithms

The fixed-point CORDIC algorithm requires the following operations:

• 1 table lookup per iteration
• 2 shifts per iteration

The N-th order fixed-point Chebyshev polynomial approximation algorithm requires the following operations:

• 1 division
• (N+1) multiplications

The simplified single lookup table algorithm with nearest-neighbor linear interpolatiom requires the following operations:

• 1 division
• 2 table lookups
• 1 multiplication
close all; % close all figure windows