FPGA Design Using DSP Builder Advanced Blockset
Agenda

- Introduction
  - Model based design flow for Altera FPGAs
  - DSP Builder Features

- DSP Builder Advanced Blockset
  - Introduction to constraint driven model based design

- Advanced Blockset design examples
  - Radar front end processor: Interfacing to high speed A/Ds
  - Direct RF upconversion: Interfacing to multi-gigabit DACs
  - 8x8 Beam Former
DSP Builder System Level Design Flow

**Development**
- System Level Simulation of Algorithm Model
  - Algorithm-level Modeling
  - MATLAB/Simulink

**Implementation**
- RTL Implementation
  - RTL Simulation
  - Synthesis, Place & Route, RTL Simulation
  - Precision, Synplify
  - Quartus II, ModelSim

**Verification**
- System Level Verification of Hardware Implementation
  - System-level Verification
  - Altera FPGA
  - Altera Development Kits

© 2007 Altera Corporation - Confidential
Altera, Stratix, Cyclone, MAX, HardCopy, Nios, Quartus, and MegaCore are trademarks of Altera Corporation
DSP Builder Overview

DSP Builder

- Creates HDL Code
- Creates Simulation Test Bench
- Creates Processor Plug-In
- Download Design to Development Board
- Hardware In the Loop
- Verify in Hardware

Tools shown in the flow diagram:

- HDL Synthesis
- ModelSim (Model Technology)
- Quartus® II
- SOPC Builder
- SignalTap® II

DSP Builder – Standard and Advanced Blocksets
DSP Builder Features

- Automatic Generation of VHDL design from a MATLAB/Simulink model
- Automatic Generation of Testbenches
  - Captures Stimulus From Simulink, Writes Testbench
- HDL Import
  - Reads in a design as HDL (Verilog or VHDL) or as a Quartus project
  - Creates Simulink Simulation Model
- SignalTap II: Embedded Logic Analyzer
  - Captures Internal Data And Brings It Into MATLAB
- Hardware in Loop (HIL) Testing
  - Pass Vectors To / From Board
- Waveform Viewer
  - Visualize Waveforms as Digital Busses Using ModelSim
DSP Builder Features

- SOPC Bus Support
  - Avalon Masters & Slaves, Custom Instruction Set (Nios)

- IP Support
  - FIR, NCO, FFT, Reed Solomon, Viterbi, CIC
  - Other IP through HDL Import

- Data Width Propagation
  - Automatically Propagates Bus Widths Through Signal Path

- Multi Data Rate Support
  - PLLs or Clock Enables
  - Multi-rate FIFO

- Integration of DSP Boards
  - Customer board easily integrated

- Complex Data
DSP Builder
Advanced Blockset
What’s New in Advanced Blockset?

- Constraint-Driven Design
  - Automated pipelining
    - Meet desired clock rate
    - Enable timing closure at high clock rates of 400-500 MHz

- Automatic TDM Support for ModelIP
  - ModelIP reuses the resources efficiently

- “Textbook” Design with ModelPrim

- Multi-Channel Designs Made Easy

- Memory-map Register Generation
  - Allow easy configuration of coefficients and run-time parameters

**Increased Productivity by Closing Timing Faster**
Constraint Driven Design: (1) Create Model

- Use ModelIP or ModelPrim Libraries
Constraint Driven Design: (2) Select Device

- Modeling is device-independent up to this step
Constraint Driven Design: (3) Set Frequency

- Automatic Pipelining / Time Sharing (ModelIP)
Constraint Driven Design: (4) Compile

- **Compilation Report**: TimeQuest Timing Analyzer, Slow 900mV 85C Model, Fmax Summary

<table>
<thead>
<tr>
<th>Fmax</th>
<th>Restricted Fmax</th>
<th>Clock Name</th>
<th>Note</th>
</tr>
</thead>
<tbody>
<tr>
<td>307.22 MHz</td>
<td>307.22 MHz</td>
<td>bus_clk</td>
<td></td>
</tr>
<tr>
<td>408.66 MHz</td>
<td>408.66 MHz</td>
<td>clk</td>
<td></td>
</tr>
</tbody>
</table>

- **Fitter Status**: Successful - Fri Apr 11 11:35:26 2008
- **Quartus II Version**: 8.0 Internal Build 185 03/20/2008 SJ Full Version
- **Revision Name**: FilterSystem
- **Top-level Entity Name**: demo_fir_FilterSystem
- **Family**: Stratix IV
- **Device**: EP4SGX70DF29C2
- **Timing Models**: Preliminary
- **Logic utilization**: 7 %
  - **Combinational ALUTs**: 1,448 / 56,320 (3 %)
  - **Memory ALUTs**: 770 / 28,160 (3 %)
  - **Dedicated logic registers**: 3,825 / 56,320 (7 %)
- **Total registers**: 3025
- **Total pins**: 121 / 412 (29 %)
- **Total virtual pins**: 0
- **Total block memory bits**: 12,805 / 6,617,088 (< 1 %)
- **DSP block 18-bit elements**: 14 / 394 (4 %)
- **Total GXB Receiver Channels**: 0 / 16 (0 %)
- **Total GXB Transmitter Channels**: 0 / 16 (0 %)
- **Total PLLs**: 0 / 3 (0 %)
- **Total DLLs**: 0 / 4 (0 %)
Higher Level Synthesis

1. Convert the MDL schematic into an intermediate DFG representation

2. Apply transforms and analysis:
   - Break apart carry chains
   - DSP Block & Memory Timing
   - Share multipliers
   - Pipeline for:
     - required FMax performance
     - Balanced/matched delays
   - ...

3. Generate RTL

```
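library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;  -- signed arithmetic on vectors

entity DSPA is port (
  SEL: in STD_LOGIC_VECTOR(15 downto 0);   -- kept from the original port list, unused here
  C, D, B: in STD_LOGIC_VECTOR(15 downto 0);
  A : out STD_LOGIC_VECTOR(32 downto 0));  -- 32-bit product plus one growth bit
end;

architecture BEHAVIOUR of DSPA is
begin
  -- Multiply-add A = B * C + D; signal assignment uses "<=", not ":="
  A <= STD_LOGIC_VECTOR(resize(signed(B) * signed(C), 33) + resize(signed(D), 33));
end;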
```
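The "balanced/matched delays" transform in step 2 can be illustrated with a minimal Python sketch. It is not the tool's algorithm: the graph encoding, the function name `balance_delays` and the block latencies (3 cycles for the multiplier, 1 for the adder) are assumptions chosen for illustration. It walks a data-flow graph for A = B * C + D and reports how many delay registers each edge needs so that all inputs of a block arrive in the same cycle.

```python
def balance_delays(graph, order):
    """Return (arrival, padding) for a DFG.

    graph maps node -> (latency_in_cycles, [input nodes]).
    order is a topological ordering of the nodes.
    padding[(src, dst)] is the number of delay registers to insert
    on that edge so every input of dst arrives in the same cycle.
    """
    arrival = {}
    padding = {}
    for node in order:
        latency, inputs = graph[node]
        latest = max((arrival[i] for i in inputs), default=0)
        for i in inputs:
            padding[(i, node)] = latest - arrival[i]
        arrival[node] = latest + latency
    return arrival, padding


# Hypothetical latencies: pipelined multiplier = 3 cycles, adder = 1 cycle.
graph = {
    "B": (0, []),
    "C": (0, []),
    "D": (0, []),
    "mul": (3, ["B", "C"]),
    "add": (1, ["mul", "D"]),
}
order = ["B", "C", "D", "mul", "add"]
arrival, padding = balance_delays(graph, order)
```

With these assumed latencies, the D input to the adder needs 3 matching delay registers and the result appears 4 cycles after the inputs.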
ModelPrim: Zero Latency Blocks

- Blocks are behavioural in nature
  - *What* to do, not *When* to do it
  - Focus on signal flow representation

- Much easier to debug and modify without pipelining
- Design once and retarget to different speed grades and device families

Behavioural input enables Optimizations
Performance Through Pipelining

- Simply enter the desired system clock frequency
- No need to change model
- Simple 50-bit 4-input adder tree
  - 100 MHz Target => 118 LUT4s, 121 MHz, No pipeline
  - 200 MHz Target => 175 LUT4s, 286 MHz, 1 stage pipeline
  - 400 MHz Target => 350 LUT4s, 581 MHz, 5 stage pipeline

Timing-driven synthesis produces small or fast RTL from the same model
Multi-channel Designs

- IIR example uses ‘textbook’ lumped delays
- Replace each single-sample register with a delay equal to the number of channels
- Delays are distributed around logic to meet fmax goal
- Processes multiple channels simultaneously

Easy-to-enter models produce high-quality hardware
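The register-to-channel-delay substitution can be checked with a short Python sketch. It assumes a first-order IIR, y[n] = x[n] + a·y[n-1], and sample-interleaved channels; the function names and the coefficient are illustrative and are not taken from the example design. Replacing the single feedback register with an N-deep delay line lets one filter process N interleaved channels and gives the same result as N separate filters.

```python
def iir_single(x, a):
    """Textbook first-order IIR with one feedback register."""
    y, prev = [], 0.0
    for sample in x:
        prev = sample + a * prev
        y.append(prev)
    return y


def iir_multichannel(interleaved, a, channels):
    """Same filter with the register replaced by a `channels`-deep delay."""
    delay = [0.0] * channels          # z^-N instead of z^-1
    y = []
    for n, sample in enumerate(interleaved):
        slot = n % channels           # which channel this sample belongs to
        out = sample + a * delay[slot]
        delay[slot] = out
        y.append(out)
    return y


# Three hypothetical channels, interleaved sample by sample.
streams = [[1.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5], [2.0, -1.0, 3.0, 0.0]]
interleaved = [s[n] for n in range(4) for s in streams]
shared = iir_multichannel(interleaved, 0.5, channels=3)
separate = [iir_single(s, 0.5) for s in streams]
```

Channel c of the shared filter is `shared[c::3]`, which matches `separate[c]`.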
DSP Builder
Advanced Blockset
Design Examples
Design Examples

- **Radar front end processor**
  - Interfacing to high speed A/Ds
  - 2.8 Gsps A/D, efficient downconvert to 350 Msps

- **Direct RF upconversion to multi-gigabit DACs**
  - Interfacing to high speed DACs
  - 4.096 Gsps Digital Up Converter

- **8x8 Beam Former**
  - Folding based upon clock rate
Traditional Digital Downconversion

- **ADC** → **Real IF Data** → mixer driven by the **Complex NCO** output:

\[ \exp(j\theta_k n + \text{PhsAdj}) \]

- Mixer I output → **M:1 Decimating Low Pass FIR**, \( H(Z) \) → **I Baseband Data**
- Mixer Q output → **M:1 Decimating Low Pass FIR**, \( H(Z) \) → **Q Baseband Data**
- Spectrum: the **Real IF Signal** centered on **Fc = Carrier Signal** is translated to a **Complex Baseband Signal** at **Baseband**

Polyphase Digital Downconversion

- ADC: 2.8 GSPS
- 1:2 Demux stages split the ADC output into parallel paths
- One NCO per path, each with its own phase offset; the offsets shown are 0, 3π/4, π, 5π/4, 3π/2 and 7π/4, for example:

\[ \exp (j \omega_{\text{carrier}} + 3\pi/4 + \text{PhsAdj}) \]

- 16 × 8:1 Decimating FIR
- Output: baseband I data and baseband Q data at 350 MSPS
Aliased Polyphase DDC

- ADC → Real IF Data → 8 polyphase branches
- Branch $p$ ($p = 0 \ldots 7$): Low Pass FIR $\Phi_p$ followed by a spinner, exp (j p k 2\pi/8)
- Branch outputs are summed → Optional Complex NCO → baseband I data and baseband Q data at 350 MSPS
- Spinner selects Nyquist Zone to downconvert
  - k = 2 for second Nyquist zone
- Spectrum: Real IF Signal, Baseband, Fc = Carrier Signal, FS/2, FS, filter response H(Z)
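The equivalence behind this structure can be sketched in Python. The sketch assumes a decimation factor of 8, a downconversion mixer exp(-j 2π k n / 8), and the branch indexing shown in the comments; the filter taps and test tone are arbitrary illustrations, not the design's values. It compares "mix, filter, decimate by 8" against eight polyphase FIR branches followed by spinners exp(j p k 2π/8) and a sum.

```python
import cmath
import math

M = 8  # number of polyphase branches and decimation factor


def direct_ddc(x, h, k):
    """Mix with exp(-j*2*pi*k*n/M), low-pass filter with h, keep every M-th output."""
    v = [xn * cmath.exp(-2j * math.pi * k * n / M) for n, xn in enumerate(x)]
    out = []
    for m in range(len(x) // M):
        acc = 0j
        for i, hi in enumerate(h):
            idx = m * M - i
            if idx >= 0:
                acc += hi * v[idx]
        out.append(acc)
    return out


def polyphase_ddc(x, h, k):
    """Branch p filters x[mM - p] with taps h[rM + p], then applies its spinner."""
    out = []
    for m in range(len(x) // M):
        total = 0j
        for p in range(M):
            branch = 0.0
            taps = (len(h) - p + M - 1) // M      # number of taps h[rM + p]
            for r in range(taps):
                idx = (m - r) * M - p
                if idx >= 0:
                    branch += h[r * M + p] * x[idx]
            total += cmath.exp(2j * math.pi * k * p / M) * branch
        out.append(total)
    return out


# Arbitrary real test tone and a 32-tap moving-average low-pass filter.
x = [math.cos(2 * math.pi * 0.26 * n) for n in range(128)]
h = [1.0 / 32] * 32
k = 2  # select the band centered on 2*Fs/8
max_error = max(abs(a - b) for a, b in zip(direct_ddc(x, h, k), polyphase_ddc(x, h, k)))
```

Under these assumptions the two outputs agree to within floating-point rounding, while the polyphase form runs every multiplier at the 1/8 output rate using real-valued filter arithmetic.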
Radar Front End

- ADC: 8 bit @ 2.8 GHz, National ADC083000 or equivalent
- 4 LVDS buses @ 700 MSPS, 8 bits each
  - Each bus feeds a 1:2 Demux, giving 8 paths at 350 MSPS
- 8-path polyphase (real only) band-pass down-sampler
  - In: 8 x 350 MSPS
  - Out: 1 x 350 MSPS (complex)
  - 8-path polyphase filter
  - Band selection complex mixer/spinner
  - Complex adder
- 1,024 point radix 4 complex FFT at 350 MSPS
- Data widths shown: 18 bits, 22 bits
- SERDES backhaul
- Implemented in DSPB-AB

DirectRF: Data rate > Clock rate designs

<table>
<thead>
<tr>
<th>Data Rate (Msps)</th>
<th>Current Design Methodology</th>
</tr>
</thead>
<tbody>
<tr>
<td>256</td>
<td>H ↦ 2</td>
</tr>
<tr>
<td>512</td>
<td>D ↦ 2</td>
</tr>
<tr>
<td>1024</td>
<td>G ↦ 2</td>
</tr>
<tr>
<td>2048</td>
<td>D ↦ 2</td>
</tr>
</tbody>
</table>

256 MHz Clock

**Advanced Blockset Methodology**

<table>
<thead>
<tr>
<th>Data Rate (Msps)</th>
<th>Advanced Blockset Methodology</th>
</tr>
</thead>
<tbody>
<tr>
<td>256</td>
<td>H ↦ 2</td>
</tr>
<tr>
<td>512</td>
<td>D ↦ 2 (2)</td>
</tr>
<tr>
<td>1024</td>
<td>G ↦ 2 (8)</td>
</tr>
<tr>
<td>2048</td>
<td>G ↦ 2 (8)</td>
</tr>
</tbody>
</table>

256 MHz Clock

- Current design methodology
  - Complex control required
  - User has to manage reordering of odd / even phases of half-band filters
- Advanced Blockset methodology
  - Significantly simplifies high speed design
  - The tool automatically duplicates necessary hardware to generate parallel outputs
  - Handles polyphase reordering
  - Design flow is simplified

Direct RF Design: Overview

**Detail 1:**
- 8 Channels of I&Q at 16 Msps
- Time Shared on the same bus at 256 MHz

**Detail 2:**
- FIR 2
- 32 Msps
- 64 Msps
- 128 Msps
- 256 Msps

**Detail 3:**
- Complex Mixer
- 8 NCOs with 6 MHz channel spacing = 48 MHz band

**Detail 4:**
- FIR 2
- 512 Msps
- 1024 Msps
- 2048 Msps
- 4096 Msps

**Detail 5:**
- Polyphase NCO: 16 Phases in parallel

**Detail 6:**
- Real channel at 4 Gsps

Up converted 256 MHz IF (Separate I/Q)
Eight 6 MHz Signals in 48 MHz Band
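The rate arithmetic in this overview can be reproduced with a small Python sketch. The function names are illustrative, and the rule "parallel phases = data rate / clock rate, rounded up" is an assumption about how data rates above the 256 MHz clock map to parallel hardware.

```python
CLOCK_MHZ = 256


def interpolation_chain(start_msps, stages):
    """Sample rate after each interpolate-by-2 stage, starting at start_msps."""
    rates = [start_msps]
    for _ in range(stages):
        rates.append(rates[-1] * 2)
    return rates


def parallel_phases(rate_msps, clock_mhz=CLOCK_MHZ):
    """Samples that must be produced per clock cycle (at least one)."""
    return max(1, (rate_msps + clock_mhz - 1) // clock_mhz)


rates = interpolation_chain(16, 8)      # 16, 32, ..., 4096 Msps
phases = {rate: parallel_phases(rate) for rate in rates}
input_bus_msps = 8 * 2 * 16             # 8 channels x (I, Q) x 16 Msps
```

Under that assumption the time-shared input bus is fully occupied at 256 Msps, the stages at 512, 1024 and 2048 Msps need 2, 4 and 8 parallel phases, and the 4096 Msps output needs 16, which agrees with the 16-phase polyphase NCO in Detail 5.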
Folding: 8x8 Beam Former

- RADAR beam former
  - Multiply 8x1 incoming data vector by 8x8 matrix of weights

- Input data
  - 16-bit Complex data in rectangular form
  - Rate: 16-bits per sample @ 80MHz => 1280Mb/s
  - Format: I1,Q1,I2,Q2,…I8,Q8,I1,Q1,…

- Weights
  - 16-bit Complex weights
  - 8x8 matrix

- Output data
  - 16-bits per sample @ 80MHz
  - Format: I1,Q1,I2,Q2,…I8,Q8,I1,Q1,…
Real & Imaginary interleaved inputs/outputs on single bus

- 1 interleaved complex (I&Q) channel in and out
Advanced Blockset Folding

- Note that 64 Complex Multipliers are needed (or 256 multipliers)
- If clock rate = data rate then circuit requires 256 multipliers
- If clock rate > data rate then tool automatically time shares resources by putting samples into the memory and scheduling them accordingly
  - Example 1: Data Rate = 10 MHz, Clock = 200 MHz
    - Multipliers required = 256 multipliers / (200/10) or 12.8 multipliers (13 multipliers)
  - Example 2: Data Rate = 4 MHz, Clock = 256 MHz
    - Multipliers required = 256 multipliers / (256/4) or 4 multipliers
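The folding arithmetic can be written as a one-line rule: divide the fully parallel multiplier count by the clock-to-data-rate ratio and round up. The sketch below is only that arithmetic; the function name is hypothetical and it does not model memory or scheduling overhead.

```python
def folded_multipliers(full_count, clock_mhz, data_mhz):
    """Multipliers needed when each one is time-shared clock/data times."""
    if clock_mhz < data_mhz:
        raise ValueError("folding needs clock rate >= data rate")
    # Integer ceiling of full_count / (clock_mhz / data_mhz)
    return (full_count * data_mhz + clock_mhz - 1) // clock_mhz


no_folding = folded_multipliers(256, 80, 80)     # clock rate = data rate
example_1 = folded_multipliers(256, 200, 10)     # folding factor 20
example_2 = folded_multipliers(256, 256, 4)      # folding factor 64
```

For a 10 MHz data rate and a 200 MHz clock the folding factor is 20, so 256 / 20 = 12.8 rounds up to 13 multipliers; for 4 MHz data and a 256 MHz clock the factor is 64 and 4 multipliers suffice.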
Summary: DSP Builder Advanced Blockset

- Automatic pipelining to meet required Fmax
- Performance similar to optimized HDL
- Easy timing closure
- Fewer compile iterations

Effortless FPGA Implementation

- Fast multi-channel design implementation
- Automatic generation of control plane logic
- Efficient pipelining for multi-channel data paths
- Ability to update design by editing system level parameters
- Effortless FPGA device family retargeting
