This example shows how to optimize the QPSK receiver modeled in the QPSK Transmitter and Receiver example for HDL code generation and hardware implementation. The HDL-optimized model shows a QPSK receiver that addresses real-world communications issues like carrier recovery and timing recovery in a hardware-friendly manner.
The HDL Optimized QPSK Receiver with Captured Data example provides a hardware-friendly solution that performs baseband processing to handle a time-varying frequency offset and a time-varying symbol delay. Specifically, this example provides an HDL-optimized reference design of a practical digital receiver to mitigate the above-mentioned impairments, and includes coarse frequency compensation, PLL-based fine frequency compensation, timing recovery with fixed-rate resampling, bit stuffing/skipping, frame synchronization, and phase ambiguity resolution.
Compared with the implementation of the receiver in the QPSK Transmitter and Receiver example, three major modifications have been made for efficient HDL code generation:
Streaming Input and Output: The HDL optimized QPSK receiver processes data one sample at a time. The captured real-world signal is streamed into the receiver front-end. The streaming output of the HDL optimized receiver is buffered and passed to the text message decoder.
Fixed-point: The QPSK receiver logic operates on fixed-point data.
HDL optimized architecture: Several blocks have been redesigned to use hardware efficient algorithms and architectures.
The top-level structure of the QPSK receiver model is shown in the following figure. The HDLRx subsystem has been optimized for HDL code generation.
The input data is captured using two USRP® devices running the transmitter model and the receiver model respectively. The captured data represents the baseband received signal with a sampling rate of 200 kHz. The data is sample-based and has a length of 200001, which corresponds to a period of 1 s.
The following diagram shows the detailed structure of the HDLRx subsystem.
The subsystems within are further described in the following sections.
1. Automatic Gain Control (AGC) - Adjusts the received signal amplitude to a desired level
2. Root Raised Cosine Receive Filter - Uses a rolloff factor of 0.5, and decimates the input signal by two
3. Coarse Frequency Compensation (CFC) - Estimates an approximate frequency offset of the received signal and corrects it
4. Fine Frequency Compensation (FFC) - Compensates for the residual frequency and phase offset
5. Timing Recovery - Resamples the input signal according to a recovered timing strobe so that symbol decisions are made at the optimum sampling instants
6. Data Decoding - Aligns the frame boundaries, resolves the carrier phase ambiguity caused by the Fine Frequency Compensation subsystem, and demodulates the signal
The structure of the Text Message Decoding subsystem is shown below.
This subsystem is expected to run in software, so it processes frame-based signals to speed up the computation. The HDLRx subsystem outputs three sample-based Boolean signals: bit1, bit2, and dValid. Because the downstream processing requires a frame-based signal, the dataframer block converts the sample-based signals to frame-based counterparts. The demodulated bit pair, bit1 and bit2, is valid only when dValid is high. The dataframer block uses the dValid signal to fill a delay line with bit1 and bit2. The Descramble and Print subsystem processes the received data only when its enable signal goes high. This occurs when the delay line has accumulated exactly 200 valid demodulated bits and the RxGo signal is high. While the simulation is running, the Descramble and Print subsystem outputs the string "Hello world ###" to the MATLAB® command window, where '###' is a repeating sequence of '000', '001', '002', ..., '099'.
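The gating described above can be illustrated with a short Python sketch. It is a simplified software model, not the dataframer block's actual delay-line implementation, and the function name `frame_bits` and its arguments are hypothetical.

```python
def frame_bits(bit1, bit2, d_valid, frame_len=200):
    """Collect valid bit pairs into fixed-length frames (illustrative only)."""
    frames, buf = [], []
    for b1, b2, valid in zip(bit1, bit2, d_valid):
        if valid:
            # A bit pair counts only while dValid is high.
            buf.extend((int(b1), int(b2)))
        if len(buf) >= frame_len:
            # A full frame is ready for the frame-based decoder.
            frames.append(buf[:frame_len])
            buf = buf[frame_len:]
    return frames
```

With an alternating dValid strobe, 200 input samples carry 100 valid bit pairs, which is exactly one 200-bit frame.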
The Reference Frequency Offset Estimation subsystem provides an accurate estimation of the frequency offset for diagnostic purposes.
1. Automatic Gain Control
The AGC ensures a stable input to the frequency and timing recovery subsystems. It sets the amplitude of the Coarse Frequency Compensation subsystem input to 1/Upsampling Factor, so that the equivalent gains of the phase and timing error detectors stay constant over time. The AGC is placed before the Root Raised Cosine Receive Filter so that the signal amplitude can be measured with an oversampling factor of four, thus improving the accuracy of the estimate.
The AGC structure is shown in the following diagram, and pipeline registers are shown in green throughout the model.
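The idea of a feedback AGC can be sketched in floating-point Python as follows. The loop structure, the step size, and the reference level of 0.25 (that is, 1/Upsampling Factor under the assumption that the factor is four) are illustrative choices, not the structure used in the model.

```python
def agc(samples, reference=0.25, step=0.05, gain=1.0):
    """Simple feedback AGC: drive the output amplitude toward `reference`."""
    out = []
    for x in samples:
        y = gain * x
        out.append(y)
        # Raise the gain when the output is too small, lower it when too large.
        gain += step * (reference - abs(y))
        gain = max(gain, 0.0)  # keep the gain non-negative
    return out, gain
```

For a constant-envelope input of amplitude 2, the gain in this sketch settles at 0.125, so the output amplitude settles at the reference level.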
2. Root Raised Cosine Receive Filter
The Root Raised Cosine Receive Filter decimates the input signal by a factor of two, with a rolloff factor of 0.5. It provides matched filtering for the transmitted waveform to boost the signal to noise ratio and facilitate the downstream signal processing.
The Root Raised Cosine Receive Filter is implemented using a fully parallel architecture.
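A floating-point sketch of a root raised cosine filter with a rolloff of 0.5, followed by decimation by two, is given below. The filter span of eight symbols and the input rate of four samples per symbol are assumptions made for illustration; the model uses fixed-point arithmetic and its own filter length.

```python
import math

def rrc_taps(beta=0.5, sps=4, span=8):
    """Unit-energy root raised cosine taps; `span` is in symbols (assumed)."""
    taps = []
    half = span * sps // 2
    for k in range(-half, half + 1):
        t = k / sps  # time in symbol periods
        if k == 0:
            h = 1.0 - beta + 4.0 * beta / math.pi
        elif abs(abs(4.0 * beta * t) - 1.0) < 1e-12:
            # Limit value at the removable singularity t = +/- 1/(4*beta).
            h = (beta / math.sqrt(2.0)) * (
                (1.0 + 2.0 / math.pi) * math.sin(math.pi / (4.0 * beta))
                + (1.0 - 2.0 / math.pi) * math.cos(math.pi / (4.0 * beta)))
        else:
            num = (math.sin(math.pi * t * (1.0 - beta))
                   + 4.0 * beta * t * math.cos(math.pi * t * (1.0 + beta)))
            den = math.pi * t * (1.0 - (4.0 * beta * t) ** 2)
            h = num / den
        taps.append(h)
    norm = math.sqrt(sum(h * h for h in taps))
    return [h / norm for h in taps]

def filter_and_decimate(x, taps, factor=2):
    """Direct-form FIR that computes only every `factor`-th output sample."""
    y = []
    for n in range(0, len(x), factor):
        acc = 0.0
        # In a fully parallel architecture each tap has its own multiplier,
        # so this inner loop corresponds to one clock cycle of hardware.
        for k, h in enumerate(taps):
            if 0 <= n - k < len(x):
                acc += h * x[n - k]
        y.append(acc)
    return y
```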
3. Coarse Frequency Compensation
The Coarse Frequency Compensation subsystem corrects the input signal with a rough estimate of the frequency offset. The following diagram shows the Coarse Frequency Compensation subsystem.
This subsystem estimates the frequency and phase offsets of the baseband QPSK signal. First, the subsystem raises the input signal to the power of four. This is implemented by cascading two product blocks. Then, from the modulation-independent signal, it estimates the tone at four times the frequency offset. After dividing the estimate by four, the so-obtained frequency offset is corrected in the original signal. There is usually a residual frequency offset even after the CFC, which would cause a slow rotation of the constellation. The Fine Frequency Compensation subsystem compensates for this residual frequency.
The model implements a correlation-based algorithm, also known as the Luise algorithm [ 1 ], for frequency estimation. This algorithm saves hardware resources compared with an FFT algorithm. Pipeline registers are used in the data path of the Luise algorithm to maintain the circuit speed. To learn more about the CFC algorithm, refer to the Communications System Toolbox documentation.
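The fourth-power step and a correlation-based frequency estimate can be sketched in floating-point Python. The estimator below uses the commonly cited autocorrelation form associated with [ 1 ]; the number of lags (`max_lag`) and the function name are assumptions, and the model's fixed-point data path differs in detail.

```python
import cmath
import math

def coarse_freq_estimate(x, fs, max_lag=16):
    """Estimate the frequency offset (Hz) of a QPSK signal sampled at fs."""
    # Raise to the fourth power with two cascaded products to strip the
    # QPSK modulation, leaving a tone at four times the frequency offset.
    z = []
    for s in x:
        sq = s * s
        z.append(sq * sq)
    n = len(z)
    acc = 0j
    for m in range(1, max_lag + 1):
        # Sample autocorrelation of the modulation-free signal at lag m.
        r = sum(z[k] * z[k - m].conjugate() for k in range(m, n)) / (n - m)
        acc += r
    f4 = cmath.phase(acc) * fs / (math.pi * (max_lag + 1))
    return f4 / 4.0  # undo the factor of four introduced by x**4
```

The estimate is unambiguous only while four times the offset stays below fs/max_lag in magnitude, so a larger `max_lag` trades estimation range for noise averaging.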
The angle (argument) function, which constitutes a key component in the Luise algorithm, is computed using the Complex to Magnitude-Angle HDL Optimized block. This block computes the phase using the hardware-friendly CORDIC algorithm. To learn more about the Complex to Magnitude-Angle HDL Optimized block, refer to the DSP System Toolbox documentation.
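To illustrate why CORDIC is hardware friendly, the following Python sketch computes the magnitude and angle of a complex value in vectoring mode. Each iteration needs only a shift, an add, and a small arctangent table. The iteration count and the quadrant pre-rotation are illustrative choices and are not taken from the block.

```python
import math

def cordic_angle(re, im, iterations=16):
    """Return (magnitude, angle) of re + j*im using CORDIC vectoring."""
    x, y, angle = float(re), float(im), 0.0
    # Pre-rotate by +/-90 degrees so the vector lies in the right half-plane.
    if x < 0:
        if y >= 0:
            x, y, angle = y, -x, math.pi / 2
        else:
            x, y, angle = -y, x, -math.pi / 2
    gain = 1.0
    for i in range(iterations):
        step = 2.0 ** -i  # a right shift by i bits in fixed-point hardware
        if y > 0:
            # Rotate clockwise to drive y toward zero.
            x, y = x + y * step, y - x * step
            angle += math.atan(step)
        else:
            # Rotate counter-clockwise to drive y toward zero.
            x, y = x - y * step, y + x * step
            angle -= math.atan(step)
        gain *= math.sqrt(1.0 + step * step)
    # The accumulated CORDIC gain is a known constant for a fixed count.
    return x / gain, angle
```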
The detected phase offset is sent to an NCO to generate a complex exponential signal that is used to correct the phase offset in the original signal. The NCO HDL Optimized block maps the lookup table into a ROM, and provides a lookup table compression option to significantly reduce the lookup table size. To learn more about the NCO HDL Optimized block, refer to the DSP System Toolbox documentation.
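The following Python sketch shows one common way an NCO can combine a phase accumulator with a reduced lookup table. It stores only a quarter of a sine wave and rebuilds the rest by symmetry. The word lengths are hypothetical, and quarter-wave symmetry is used here purely as an illustration; it is not necessarily the compression method of the NCO HDL Optimized block.

```python
import math

ACC_BITS = 16          # phase accumulator width (assumed)
LUT_BITS = 8           # address width of the quarter-wave table (assumed)
L = 1 << LUT_BITS
# Quarter-wave sine table with a half-step offset, which makes mirroring exact.
QUARTER = [math.sin(2.0 * math.pi * (k + 0.5) / (4 * L)) for k in range(L)]

def sin_lookup(p):
    """Sine of quantized phase index p (one period = 4*L steps)."""
    quadrant, idx = divmod(p % (4 * L), L)
    value = QUARTER[idx] if quadrant % 2 == 0 else QUARTER[L - 1 - idx]
    return value if quadrant < 2 else -value

def nco(phase_inc, n):
    """Generate n samples of exp(j*phase) from an integer phase increment."""
    acc, out = 0, []
    for _ in range(n):
        p = acc >> (ACC_BITS - LUT_BITS - 2)   # keep the top phase bits
        # cos(phase) is sin(phase + 90 degrees), i.e. an offset of L steps.
        out.append(complex(sin_lookup(p + L), sin_lookup(p)))
        acc = (acc + phase_inc) % (1 << ACC_BITS)
    return out
```

In this sketch, a negative `phase_inc` produces a negative frequency, which is what a frequency corrector needs in order to rotate the signal back by the estimated offset.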
4. Fine Frequency Compensation
The Fine Frequency Compensation subsystem, shown in the following figure, implements a phase-locked loop (PLL), described in Chapter 7 of [ 2 ], to track the residual frequency offset and the phase offset in the input signal.
A maximum likelihood Phase Error Detector (PED), described in Chapter 7.2.2 of [ 2 ], generates the phase error. A tunable proportional-plus-integral Loop Filter, described in Appendix C.2 of [ 2 ], filters the error signal and then feeds it into the Phase Calculation block. The Phase Calculation block generates a complex exponential signal that is used to correct the residual frequency and phase offsets in the output of the CFC. The Loop Filter allows tuning of Loop Bandwidth (normalized by the sample rate) and Loop Damping Factor. The default normalized loop bandwidth is set to 0.13 and the default damping factor is set to 2.5 (overdamped), so that the PLL quickly locks to the intended phase while introducing little phase noise. To learn more about the FFC algorithm, refer to the Communications System Toolbox documentation.
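The loop described above can be sketched in floating-point Python. The gain formulas are the widely used expressions for a proportional-plus-integral loop filter in terms of normalized bandwidth and damping factor; the detector is a decision-directed QPSK phase detector normalized so that its gain is 1 for unit-amplitude symbols. The detector gain, the unit phase-accumulator gain, and all function names are assumptions of this sketch rather than values taken from the model.

```python
import cmath
import math

def loop_gains(bn_t, zeta, kp=1.0, k0=1.0):
    """Proportional (k1) and integral (k2) gains of a PI loop filter."""
    theta = bn_t / (zeta + 1.0 / (4.0 * zeta))
    d = 1.0 + 2.0 * zeta * theta + theta * theta
    k1 = 4.0 * zeta * theta / d / (kp * k0)
    k2 = 4.0 * theta * theta / d / (kp * k0)
    return k1, k2

def qpsk_phase_error(y):
    """Decision-directed phase error: Im{y * conj(decision)}."""
    d = complex(math.copysign(1.0, y.real),
                math.copysign(1.0, y.imag)) / math.sqrt(2.0)
    return (y * d.conjugate()).imag

def fine_freq_comp(x, bn_t=0.13, zeta=2.5):
    """Track and remove residual frequency and phase offset from x."""
    k1, k2 = loop_gains(bn_t, zeta)
    phase, integ, out = 0.0, 0.0, []
    for s in x:
        y = s * cmath.exp(-1j * phase)   # apply the current phase estimate
        out.append(y)
        e = qpsk_phase_error(y)
        integ += k2 * e                  # integral path tracks frequency
        phase += k1 * e + integ          # phase accumulator
    return out
```

With the default settings quoted above, this sketch gives k1 of about 0.399 and k2 of about 0.008. Because the detector is blind to rotations by multiples of 90 degrees, the loop may settle on any of the four rotated constellations.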
5. Timing Recovery
The Timing Recovery subsystem is shown in the following diagram.
The Timing Recovery subsystem implements a PLL, described in Chapter 8 of [ 2 ], to correct the timing error in the received signal. On average, the Timing Recovery subsystem generates one output sample for every two input samples.
The Interpolation Control subsystem implements a decrementing modulo-1 counter, described in Chapter 8.4.3 of [ 2 ], to generate the control signal to facilitate the Data Decoding subsystem to properly select the interpolants of the Interpolation Filter. This control signal also enables the Timing Error Detector (TED), so that it calculates the timing errors at the correct timing instants. The Interpolation Control subsystem updates the timing difference for the Interpolation Filter, generating interpolants at optimum sampling instants.
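A decrementing modulo-1 counter of this kind can be sketched in Python as follows. The counter decrements by roughly 1/2 per input sample (for two samples per symbol), adjusted by the loop filter output; an underflow marks an interpolation instant and yields the fractional interval for the interpolator. The variable names and the initial counter value are hypothetical.

```python
def interpolation_control(loop_outputs, sps=2, eta=0.75):
    """Return a list of (strobe, mu) pairs, one per input sample."""
    results = []
    for v in loop_outputs:
        w = 1.0 / sps + v          # nominal decrement plus loop correction
        strobe = eta < w           # the counter underflows on this sample
        mu = eta / w if strobe else None   # fractional interval at underflow
        results.append((strobe, mu))
        eta = (eta - w) % 1.0      # decrement modulo 1
    return results
```

With a zero loop output the strobe alternates between low and high. A persistent nonzero loop output slowly shifts the counter until two strobes, or two gaps, occur back to back, which is the situation that bit stuffing or skipping has to handle.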
The Interpolation Filter is a Farrow parabolic filter with a coefficient α, as described in Chapter 8.4.2 of [ 2 ]. The filter uses an α of 0.5 so that all the filter coefficients become 1, -1/2 and 3/2, which significantly simplifies the interpolator structure.
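The piecewise parabolic interpolator in Farrow form can be sketched in Python as shown below, with the free parameter (called `alpha` here) set to 0.5. The argument names are hypothetical, and the sketch ignores the pipeline delay that a causal hardware implementation has.

```python
def farrow_parabolic(x_m1, x_0, x_p1, x_p2, mu, alpha=0.5):
    """Interpolate between x_0 (mu = 0) and x_p1 (mu = 1).

    x_m1, x_0, x_p1, x_p2 are four consecutive input samples.
    """
    # Each branch is a fixed FIR filter; with alpha = 0.5 the tap values
    # reduce to +/-1/2, 3/2, and 1, which map to shifts and adds.
    v2 = alpha * (x_p2 - x_p1 - x_0 + x_m1)
    v1 = -alpha * x_p2 + (1 + alpha) * x_p1 + (alpha - 1) * x_0 - alpha * x_m1
    v0 = x_0
    # Horner evaluation of v2*mu**2 + v1*mu + v0.
    return (v2 * mu + v1) * mu + v0
```

For example, `farrow_parabolic(9, 10, 11, 12, 0.25)` returns 10.25, because the interpolator reproduces a straight line exactly.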
Based on the interpolants, timing errors are generated by a zero-crossing Timing Error Detector as described in Chapter 8.4.1 of [ 2 ], filtered by a tunable proportional-plus-integral Loop Filter as described in Appendix C.2 of [ 2 ], and fed into the Interpolation Control for a timing difference update. The Loop Filter allows tuning of Loop Bandwidth (normalized by the sample rate) and Loop Damping Factor. The default normalized loop bandwidth is set to 0.01 and the default damping factor is set to unity so that the PLL quickly locks to the correct timing while introducing little phase noise.
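A zero-crossing timing error detector for QPSK can be sketched in Python as follows. It uses the interpolant midway between two symbol instants together with the signs of the two neighboring symbol decisions, for the in-phase and quadrature rails separately. The function names are hypothetical, and the sign convention of the error depends on how the loop is wired.

```python
def sign(v):
    """Hard decision on one rail."""
    return 1.0 if v >= 0 else -1.0

def zc_ted(prev_symbol, midpoint, curr_symbol):
    """Timing error from two symbol-spaced interpolants and their midpoint."""
    # The midpoint sample contributes only when the data changes sign.
    # With perfect timing it sits on the zero crossing and the error is 0.
    e_i = midpoint.real * (sign(prev_symbol.real) - sign(curr_symbol.real))
    e_q = midpoint.imag * (sign(prev_symbol.imag) - sign(curr_symbol.imag))
    return e_i + e_q
```

When neither rail has a transition the detector outputs zero regardless of the midpoint value, so the loop simply holds its state through runs of identical symbols.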
When the timing error (delay) reaches symbol boundaries, there is one extra or missing interpolant in the output. The TED implements bit stuffing or skipping to handle the extra or missing interpolants. You can refer to Chapter 8.4.4 of [ 2 ] for details of bit stuffing/skipping.
The timing recovery loop normally generates one output symbol for every two input samples. It also outputs a timing strobe (dValid signal) that runs at the input sample rate. Under normal circumstances, the strobe value is simply a sequence of alternating ones and zeros. However, this occurs only when the relative delay between transmitter and receiver contains some fractional part of one symbol period and the integer part of the delay (in symbols) remains constant. If the integer part of the relative delay changes, the strobe value can have two consecutive zeros or two consecutive ones.
6. Data Decoding
The Data Decoding subsystem performs frame synchronization, carrier phase ambiguity resolution, and QPSK demodulation. Its structure is shown in the diagram below:
Frame synchronization: The Matched Filter subsystem uses a QPSK-modulated Barker code as a reference to correlate against the received symbols. The modulus of the matched filter output is calculated in the Modulus subsystem and then compared with a threshold. Frame synchronization is declared if the modulus output exceeds the threshold. The threshold for frame synchronization is tunable: a large value increases the miss probability whereas a small value increases the probability of false alarm. In this example, the threshold value is set to 16.
Phase ambiguity resolution: The carrier phase PLL of the Fine Frequency Compensation subsystem may lock to the unmodulated carrier with a phase shift of 0, 90, 180, or 270 degrees, which can cause a phase ambiguity. For details of phase ambiguity and its resolution, refer to Chapter 7.2.2 and 7.7 in [ 2 ]. The angle of the matched filter output determines the extra phase shift. The Matched Filter output is fed into the conjugate block to negate the extra phase shift. Once frame synchronization is achieved, the conjugated version of the matched filter output is frozen and multiplied with all the symbols in a frame to effectively resolve the phase ambiguity issue.
QPSK demodulation: Each corrected symbol is demodulated and mapped to a pair of bits based on the symbol mapping of QPSK constellation.
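The three steps above can be sketched together in floating-point Python. The sketch assumes a 13-chip Barker code carried on both rails as the reference, a bit mapping in which a negative rail value means bit 1, and a threshold of 10 suited to unit-amplitude symbols. None of these values is taken from the model, whose threshold of 16 applies to its own fixed-point scaling.

```python
import math

# Assumed reference: length-13 Barker code on both the I and Q rails.
BARKER13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
REF = [b * complex(1, 1) / math.sqrt(2) for b in BARKER13]

def find_frame(symbols, threshold=10.0):
    """Return (payload_start, phase_correction) or None if no sync."""
    n_ref = len(REF)
    for n in range(n_ref - 1, len(symbols)):
        # Matched filter: correlate the last n_ref symbols with the reference.
        mf = sum(symbols[n - n_ref + 1 + k] * REF[k].conjugate()
                 for k in range(n_ref))
        if abs(mf) > threshold:
            # The angle of mf is the extra phase shift; its conjugate is
            # frozen and applied to the symbols of the frame.
            return n + 1, mf.conjugate()
    return None

def demodulate(symbols, correction):
    """Remove the phase ambiguity and map each symbol to a bit pair."""
    bits = []
    for s in symbols:
        y = s * correction
        bits.append((int(y.real < 0), int(y.imag < 0)))  # assumed mapping
    return bits
```

Because only the signs of the corrected symbols matter for the decision, the correction does not need to be normalized to unit magnitude.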
The following figure shows the Bit Error Rate (BER) for this example. The captured data in this example has a small frequency offset ranging from -120 to -90 Hz. Extra offset is added using the Frequency Offset block in the model.
The BER plot shows that using CFC followed by FFC ensures a low BER for a wide frequency offset range, while using the FFC (without CFC) can only correct small offsets (smaller than 1200 Hz with the parameter settings in this example). Using both CFC and FFC is recommended, especially to avoid poor BER performance when the frequency offset drifts out of the range the FFC can track.
The CFC may introduce a small offset in the system, which could lead to a slight performance degradation when the actual offset is close to 0. In this case, using just the FFC may be a better choice. The Reference Frequency Offset Estimation subsystem uses the FFT-based Coarse Frequency Compensation block from the Communications System Toolbox to provide an accurate estimation of the frequency offset. This information can help the user make design decisions.
When the frequency compensation subsystems cannot estimate and correct the frequency and phase offset, it is difficult for the Timing Recovery subsystem to correct timing errors. A BER of 1 means that the enable signal of the Data Decoding subsystem is always low and no data is decoded.
When running the simulation, the model displays two scatter plots to show the constellation of the FFC output and the Timing Recovery output respectively.
The following diagram shows the constellation plot of the FFC output. The cluster is scattered around, mainly due to two reasons:
The timing error between the clocks at the transmitter and receiver
The signals are oversampled by a factor of two. Therefore, half of the symbols are in the transition state between QPSK symbols.
The following diagram shows the constellation of the Timing Recovery output. It shows four concentrated clusters around the true four-point constellation for QPSK modulation, which verifies the effectiveness of the Timing Recovery subsystem. However, as mentioned before, the Fine Frequency Compensation subsystem may lock the signal with a phase shift of 0, 90, 180, or 270 degrees. The phase ambiguity issue is fixed in the Data Decoding subsystem.
Pipeline registers (shown in green) have been added throughout the model to make sure the HDLRx subsystem does not have a long critical path. The HDL code generated from the HDLRx subsystem was synthesized using Xilinx® ISE on a Virtex-6 (XC6VLX240T-1FFG1156) FPGA, and the circuit ran at about 97 MHz.
To check and generate the HDL code referenced in this example, you must have an HDL Coder™ license.
You can use the commands makehdl and makehdltb to generate HDL code and a test bench for subsystems in HDLRx. To generate the HDL code, use the following command:
To generate a test bench, use the following command:
1. M. Luise and R. Reggiannini, "Carrier frequency recovery in all-digital modems for burst-mode transmissions," IEEE Trans. Communications, pp. 1169-1178, 1995.
2. Michael Rice, "Digital Communications - A Discrete-Time Approach", Prentice Hall, April 2008.
USRP® is a trademark of National Instruments Corp.