Documentation |
The sections that follow compare implementations of lookup tables that use breakpoints whose spacing is uneven, even, and power of two. The comparison focuses on:
Execution speed of commands
Rounding error during interpolation
The amount of read-only memory (ROM) for data
The amount of ROM for commands
This comparison is valid only when the breakpoints are not tunable. If the breakpoints are tunable in the generated code, all three cases generate the same code. For a summary of the effects of breakpoint spacing on execution speed, error, and memory usage, see Summary of the Effects of Breakpoint Spacing.
This comparison uses the model fxpdemo_approx_sin. Three fixed-point lookup tables appear in this model. All three tables approximate the function sin(2*pi*u) over the first quadrant and achieve a worst-case error of less than 2^-8. However, they have different restrictions on their breakpoint spacing.
You can use the model fxpdemo_approx, which fxpdemo_approx_sin opens, to generate Simulink^{®} Coder™ code (Simulink Coder software license required). The sections that follow present several segments of generated code to emphasize key differences.
To open the model, type at the MATLAB^{®} prompt:
fxpdemo_approx_sin
This section looks at the data ROM required by each of the three spacing options.
Uneven spacing requires both Y data points and breakpoints:
int16_T yuneven[8]; uint16_T xuneven[8];
The total bytes used is 32.
Even spacing requires only Y data points:
int16_T yeven[10];
The total bytes used is 20. The breakpoints are not explicitly required. The code uses the spacing between the breakpoints, and might use the smallest and largest breakpoints. At most, three values related to the breakpoints are necessary.
Power of two spacing requires only Y data points:
int16_T ypow2[17];
The total bytes used is 34. The breakpoints are not explicitly required. The code uses the spacing between the breakpoints, and might use the smallest and largest breakpoints. At most, three values related to the breakpoints are necessary.
In all three cases, you must guard against the chance that the input is less than the smallest breakpoint or greater than the biggest breakpoint. There can be differences in how occurrences of these possibilities are handled. However, the differences are generally minor and are normally not a key factor in deciding to use one spacing method over another. The subsequent sections assume that out-of-range inputs are impossible or have already been handled.
This section describes how the three fixed-point lookup tables determine where the current input is relative to the breakpoints.
Unevenly-spaced breakpoints require a general-purpose algorithm such as a binary search to determine where the input lies in relation to the breakpoints. The following code provides an example:
iLeft = 0; iRght = 7; /* number of breakpoints minus 1 */ while ( ( iRght - iLeft ) > 1 ) { i = ( iLeft + iRght ) >> 1; if ( uAngle < xuneven[i] ) { iRght = i; } else { iLeft = i; } }
The while loop executes up to log2(N) times, where N is the number of breakpoints.
Evenly-spaced breakpoints require only one step to determine where the input lies in relation to the breakpoints:
iLeft = uAngle / 455U;
The divisor 455U represents the spacing between breakpoints. In general, the dividend would be (uAngle - SmallestBreakPoint). In this example, the smallest breakpoint is zero, so the code optimizes out the subtraction.
Power of two spaced breakpoints require only one step to determine where the input lies in relation to the breakpoints:
iLeft = uAngle >> 8;
The number of shifts is 8 because the breakpoints have spacing 2^8. The smallest breakpoint is zero, so uAngle replaces the general case of (uAngle - SmallestBreakPoint).
To determine where the input lies with respect to the breakpoints, the unevenly-spaced case requires much more code than the other two cases. This code requires additional command ROM. If many lookup tables share the binary search algorithm as a function, you can reduce this ROM penalty. Even if the code is shared, the number of clock cycles required to determine the location of the input is much higher for the unevenly-spaced cases than the other two cases. If the code is shared, function-call overhead decreases the speed of execution a little more.
In the evenly-spaced case and the power of two spaced case, you can determine the location of the input with a single line of code. The evenly-spaced case uses a general integer division. The power of two case uses a shift instead of general division because the divisor is an exact power of two. Without knowing the specific processor, you cannot be certain that a shift is better than division.
Many processors can implement division with a single assembly language instruction, so the code will be small. However, this instruction often takes many clock cycles to complete. Many processors do not provide a division instruction. Division on these processors occurs through repeated subtractions. This process is slow and requires a lot of machine code, but this code can be shared.
Most processors provide a way to do logical and arithmetic shifts left and right. A key difference is whether the processor can do N shifts in one instruction (barrel shift) or requires N instructions that shift one bit at a time. The barrel shift requires less code. Whether the barrel shift also increases speed depends on the hardware that supports the operation.
The compiler can also complicate the comparison. In the previous example, the command uAngle >> 8 essentially takes the upper 8 bits in a 16-bit word. The compiler can detect this situation and replace the bit shifts with an instruction that takes the bits directly. If the number of shifts is some other value, such as 7, this optimization would not occur.
In theory, you can calculate the interpolation with the following code:
y = ( yData[iRght] - yData[iLeft] ) * ( u - xData[iLeft] ) ... / ( xData[iRght] - xData[iLeft] ) + yData[iLeft]
The term (xData[iRght] - xData[iLeft]) is the spacing between neighboring breakpoints. If this value is constant, due to even spacing, some simplification is possible. If spacing is not just even but also a power of two, significant simplifications are possible for fixed-point implementations.
For the uneven case, one possible implementation of the ideal interpolation in fixed point is as follows:
xNum = uAngle - xuneven[iLeft]; xDen = xuneven[iRght] - xuneven[iLeft]; yDiff = yuneven[iRght] - yuneven[iLeft]; MUL_S32_S16_U16( bigProd, yDiff, xNum ); DIV_NZP_S16_S32_U16_FLOOR( yDiff, bigProd, xDen ); yUneven = yuneven[iLeft] + yDiff;
The multiplication and division routines are not shown here. These routines can be complex and depend on the target processor. For example, these routines look different for a 16-bit processor than for a 32-bit processor.
Evenly-spaced breakpoints implement interpolation using slightly different calculations than the uneven case. The key difference is that the calculations do not directly use the breakpoints. When the breakpoints are not required in ROM, you can save a lot of memory:
xNum = uAngle - ( iLeft * 455U ); yDiff = yeven[iLeft+1] - yeven[iLeft]; MUL_S32_S16_U16( bigProd, yDiff, xNum ); DIV_NZP_S16_S32_U16_FLOOR( yDiff, bigProd, 455U ); yEven = yeven[iLeft] + yDiff;
Power of two spaced breakpoints implement interpolation using very different calculations than the other two cases. As in the even case, breakpoints are not used in the generated code and therefore not required in ROM:
lambda = uAngle & 0x00FFU; yPow2 = ypow2[iLeft)+1] - ypow2[iLeft]; MUL_S16_U16_S16_SR8(yPow2,lambda,yPow2); yPow2 += ypow2[iLeft];
This implementation has significant advantages over the uneven and even implementations:
A bitwise AND combined with a shift right at the end of the multiplication replaces a subtraction and a division.
The term (u - xData[iLeft] ) / ( xData[iRght] - xData[iLeft]) results in no loss of precision, because the spacing is a power of two.
In contrast, the uneven and even cases usually introduce rounding error in this calculation.
The following table summarizes the effects of breakpoint spacing on execution speed, error, and memory usage.
Parameter | Even Power of 2 Spaced Data | Evenly Spaced Data | Unevenly Spaced Data |
---|---|---|---|
Execution speed | The execution speed is the fastest. The position search and interpolation are the same as for evenly-spaced data. However, to increase the speed more, a bit shift replaces the position search, and a bit mask replaces the interpolation. | The execution speed is faster than that for unevenly-spaced data, because the position search is faster and the interpolation requires a simple division. | The execution speed is the slowest of the different spacings because the position search is slower, and the interpolation requires more operations. |
Error | The error can be larger than that for unevenly-spaced data because approximating a function with nonuniform curvature requires more points to achieve the same accuracy. | The error can be larger than that for unevenly-spaced data because approximating a function with nonuniform curvature requires more points to achieve the same accuracy. | The error can be smaller because approximating a function with nonuniform curvature requires fewer points to achieve the same accuracy. |
ROM usage | Uses less command ROM, but more data ROM. | Uses less command ROM, but more data ROM. | Uses more command ROM, but less data ROM. |
RAM usage | Not significant. | Not significant. | Not significant. |
The number of Y data points follows the expected pattern. For the same worst-case error, unrestricted spacing (uneven) requires the fewest data points, and power-of-two-spaced breakpoints require the most. However, the implementation for the evenly-spaced and the power of two cases does not need the breakpoints in the generated code. This reduces their data ROM requirements by half. As a result, the evenly-spaced case actually uses less data ROM than the unevenly-spaced case. Also, the power of two case requires only slightly more ROM than the uneven case. Changing the worst-case error can change these rankings. Nonetheless, when you compare data ROM usage, you should always take into account the fact that the evenly-spaced and power of two spaced cases do not require their breakpoints in ROM.
The effort of determining where the current input is relative to the breakpoints strongly favors the evenly-spaced and power of two spaced cases. With uneven spacing, you use a binary search method that loops up to log2(N) times. With even and power of two spacing, you can determine the location with the execution of one line of C code. But you cannot decide the relative advantages of power of two versus evenly spaced without detailed knowledge of the hardware and the C compiler.
The effort of calculating the interpolation favors the power of two case, which uses a bitwise AND operation and a shift to replace a subtraction and a division. The advantage of this behavior depends on the specific hardware, but you can expect an advantage in code size, speed, and also in accuracy. The evenly-spaced case calculates the interpolation with a minor improvement in efficiency over the unevenly-spaced case.