Title: The Issues
1The Issues
- How much flexibility is needed and how best to include it
- A single system description including interaction between the analog and digital domains
- Realtime SOC prototyping
- Automated ASIC design flow
2An SOC Design Flow with Prototyping
3Simulation Framework using Simulink/Stateflow
(from Mathworks, Inc.)
- Techniques used to decrease simulation time
- Baseband-equivalent modeling of RF blocks
- Compile design using MATLAB Real-Time Workshop
4Blocks map to implementation libraries
Black Box
RTL Code or Synopsys Module Compiler or Custom Module
Stateflow-VHDL translator
Time-Multiplexed FIR Filter
- Implementation choices embedded in description
- Libraries of blocks are pre-verified and re-used
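A minimal sketch of what "time-multiplexed" means for the FIR block above, assuming a single multiply/accumulate unit that is reused once per tap. The function names and the cycle counter are illustrative only and are not part of the block library described in the slides.

```python
def fir_direct(coeffs, samples):
    """Reference model: fully parallel FIR, conceptually one multiplier per tap."""
    out = []
    history = [0] * len(coeffs)
    for x in samples:
        history = [x] + history[:-1]          # shift the delay line
        out.append(sum(c * h for c, h in zip(coeffs, history)))
    return out


def fir_time_multiplexed(coeffs, samples):
    """Same filter, but one MAC unit is reused: len(coeffs) cycles per sample."""
    out = []
    history = [0] * len(coeffs)
    cycles = 0
    for x in samples:
        history = [x] + history[:-1]          # shift the delay line
        acc = 0
        for tap in range(len(coeffs)):        # one multiply/accumulate per cycle
            acc += coeffs[tap] * history[tap]
            cycles += 1
        out.append(acc)
    return out, cycles


if __name__ == "__main__":
    y, n_cycles = fir_time_multiplexed([1, 2, 3], [1, 0, 0, 2])
    print(y, n_cycles)
```

Both models produce the same output sequence; the time-multiplexed one trades multipliers for clock cycles, which is the kind of implementation choice the block description embeds.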
5Timed Dataflow Graph Specification
- Simulink (from Mathworks)
- Discrete-Time (cycle accurate)
- Fixed-Point Types (bit true)
- No need for RTL simulation
- Embedded implementation choices
Multiply / Accumulate
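To illustrate "bit true" fixed-point behaviour for a multiply/accumulate, here is a small sketch. The word length (16 bits), the number of fractional bits (8), saturation on conversion, and truncation of the product are all assumptions chosen for the example; the slides do not specify them.

```python
def to_fixed(value, frac_bits=8, total_bits=16):
    """Quantize a real value to a signed two's-complement integer, saturating."""
    scaled = int(round(value * (1 << frac_bits)))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def mac_fixed(acc, a, b, frac_bits=8):
    """acc + a*b, with the product truncated back to frac_bits fractional bits.

    The right shift on a negative product floors toward minus infinity,
    which matches plain truncation of a two's-complement value.
    """
    return acc + ((a * b) >> frac_bits)


def to_real(value, frac_bits=8):
    """Convert a fixed-point integer back to a float for inspection."""
    return value / (1 << frac_bits)


if __name__ == "__main__":
    a = to_fixed(0.5)
    b = to_fixed(-0.25)
    print(to_real(mac_fixed(0, a, b)))
```

Because every operation is on integers, a software model like this and a hardware datapath with the same widths produce identical bit patterns, which is what removes the need for a separate RTL simulation of the datapath.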
6Control
- Stateflow
- Extended Finite State Machine
- Subset of Syntax
- Converted to VHDL
- Synthesized
- VHDL
- Synthesized directly
VHDL Stateflow Macros map to a netlist of
Standard Cells using standard synthesis
7Simulink Model of Direct-Conversion Receiver
8Bit true, cycle accurate digital baseband
algorithms
9Directly map diagram into hardware since there is a one-for-one relationship for each of the blocks
- Result: a fully parallel architecture that can be implemented rapidly
10Then do a simulation: Zero-IF Receiver
- 10 users (equal power)
- 13.5dB receiver NF
- PLL -80dBc/Hz @ 100kHz
- 2.5° I/Q phase mismatch
- 82dB gain
- 4% gain mismatch
- IIP2 -11dBm
- IIP3 -18dBm
- 500kHz DC notch filter
- 20MHz Butterworth LPF
- 10-bit, 200MHz Σ-Δ ADC
Output SNR: 15dB
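One of the listed impairments, I/Q gain and phase mismatch, can be related to image rejection with the standard textbook expression IRR = (1 + 2g cos φ + g²) / (1 − 2g cos φ + g²). The sketch below is not taken from the slides. It assumes the gain mismatch is 4 percent and the phase mismatch is 2.5 degrees; the units did not survive in the slide text, so both are assumptions.

```python
import math


def image_rejection_ratio_db(gain_mismatch, phase_mismatch_deg):
    """Image rejection ratio in dB for a quadrature receiver.

    gain_mismatch      -- fractional amplitude error between I and Q (0.04 = 4%)
    phase_mismatch_deg -- quadrature phase error in degrees
    """
    g = 1.0 + gain_mismatch
    phi = math.radians(phase_mismatch_deg)
    num = 1.0 + 2.0 * g * math.cos(phi) + g * g
    den = 1.0 - 2.0 * g * math.cos(phi) + g * g
    if den == 0.0:
        return math.inf                      # perfectly matched I/Q paths
    return 10.0 * math.log10(num / den)


if __name__ == "__main__":
    # Assumed units: 4 percent gain mismatch, 2.5 degree phase mismatch.
    print(round(image_rejection_ratio_db(0.04, 2.5), 1), "dB")
```

Under these assumptions the image rejection comes out at roughly 30 dB, which gives a feel for why such mismatches matter when ten equal-power users share the band.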
11With Analog Impairments
- ideal receiver
- real receiver
- 10 users (equal power)
- 20MHz Butterworth LPF
- 500kHz DC notch filter
- 13.5dB receiver NF
- 82dB gain
- 4% gain mismatch
- 2.5° I/Q phase mismatch
- IIP2 -11dBm
- IIP3 -18dBm
- PLL -80dBc/Hz @ 100kHz
- 10-bit, 200MHz Σ-Δ ADC
12Now to implement that description
13Berkeley Emulation Engine: Complete Design Prototype Environment for Communication Systems
- Berkeley Wireless Research Center
- Chen Chang, Kimmo Kuusilinna, Brian Richards,
Kevin Camera, Nathan Chan, Allen Chan, Robert W.
Brodersen
14What's BEE?
- A real-time FPGA-based hardware emulator, with speed up to 60 MHz
- Emulation capacity of 10 million ASIC gate-equivalents per module, corresponding to 600 Gops (16-bit adds)
- 2400 external parallel I/O providing 192 Gbps raw bandwidth
- Automated design flow from Simulink to FPGA emulation, integrated with INSECTA ASIC design flow
15BEE Applications
- Real-time hardware emulation
- Novel Communication Systems with analog front-end hardware (MCMA, UWB, 60GHz)
- Digital signal processing systems
- Real-time control systems
- Neuron-like network processing
- Hardware acceleration
- Large communication/signal processing system simulation
- Hardware-in-the-loop cosimulation with software system
- Complex parallel computing algorithms
16The BEE Design Environment
Analog Front-end
Servers
BEE Processing Unit
Client PC
Network
Ethernet
LVDS/LVTTL
BEE/Insecta Design Flow
FPGA Bit Stream Conf File
Simulink MDL
ASIC Layout
17BEE System Assembly
20 Virtex-E 2000, 16 ZBT-SRAM (1MByte each), 8 Riser I/O Cards
Riser I/O Card
MPB
StrongARM Module Linux OS
18Main Processing Board
48 bit buses
19Hardware Performance
- Board-level Main Clock Rate: 160MHz
- On Board connection speed
- FPGA to FPGA: 100MHz
- XBAR to XBAR: 70MHz
- Off board connection speed (3 ft SCSI cable loop back through riser card)
- LVTTL: 40MHz
- LVDS: 160MHz to 220MHz
20Hardware Capacity
- Reference Design
- 10240 tap FIR filter
- 512 taps per FPGA
- Slice utilization: 99% of 19200 slices
- Max Clock Rate: 28.5MHz
- ASIC Gate: 401K per FPGA, 8M total
- MOPS: 583,680 total (16-bit add, 12-bit cmult)
- Power: 2.5W per FPGA, 50W total
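The totals on this slide follow from the per-FPGA figures. The short check below reproduces them; the only assumption is that each tap counts as two operations per clock cycle (one add and one constant multiply), which is what makes the MOPS figure come out exactly.

```python
def fir_reference_design(total_taps=10240, taps_per_fpga=512, clock_mhz=28.5,
                         gates_per_fpga=401_000, watts_per_fpga=2.5,
                         ops_per_tap=2):
    """Recompute the slide's totals from its per-FPGA numbers.

    ops_per_tap=2 (one add + one constant multiply per cycle) is an assumption.
    """
    fpgas = total_taps // taps_per_fpga
    return {
        "fpgas": fpgas,                                    # devices used
        "asic_gates": fpgas * gates_per_fpga,              # gate-equivalents
        "mops": total_taps * ops_per_tap * clock_mhz,      # million ops/second
        "watts": fpgas * watts_per_fpga,                   # total power
    }


if __name__ == "__main__":
    for name, value in fir_reference_design().items():
        print(name, value)
```

The design fills all 20 FPGAs, and 10240 taps × 2 operations × 28.5 MHz gives the quoted 583,680 MOPS; 20 × 401K gates is about 8M and 20 × 2.5 W is 50 W.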
21Design Flow Goals
- Fully automatic generation of FPGA and ASIC implementations from Simulink system level design
- Cycle accurate, bit-true functional level equivalency between ASIC and BEE implementations
- Fast design turn-around time
- Chip-in-a-Day
- BEE-in-an-Hour
22Design Flow Global Perspective
Virtual Components
VHDL Netlist
23Design Flow Detailed View
24Virtual Component Library
- Parameterized system level blocks
- Bit-width
- Pipeline stages (latency)
- Output bits truncation
- Customizable block set library
- Different Architecture
- Different Technology Target
25Basic Blocks
FIFO
DPRAM
Shifter
VHDL
Concat
Enable
Const
ROM
RAM
Counter
Delay
Mux
Down
P to S
Convert
ReInt
S to P
Sync
Slice
Up Smp
Register
FPGA and ASIC Support
FPGA Support Only
Scale
Sin Cos
Shift
Thresh
26Communication DSP Blocks
Puncture
Conv. Encoder
Depuncture
DDS
CIC
FIR
Shift
FPGA and ASIC Support
FPGA Support Only
27Control Logic Design
- Simulink level StateFlow diagram, encapsulated in a subsystem with Xilinx gateways
- RTL VHDL automatically generated by SF2VHD
- Fully integrated with the BEE_ISE tools
VHDL
SF2VHD
Generate VHDL for Black Box from StateFlow
StateFlow Controller
28Run-time Data I/O Interface
Matlab Control GUI
- New and improved infrastructure for transferring data to and from the BEE
- Control all data transfers from within a local Matlab GUI
- Accepts standard Simulink data structures for intrinsic reuse of existing test vectors
- Library macro contains the entire hardware interface in one fully parameterized block
Ethernet
BEE
Linux/StrongARM Daemon
Embedded Controller
RAM
RAM
User Design
29Data I/O Interface Hardware
Pin Gateways
Source RAM
Bus Protocol Controller
Sink RAM
30Data I/O Interface Software
- Specify input source, BEE hostname, and data bus parameters in Matlab GUI
- Utilizes a custom MEX socket library for network connectivity
- Uses a simple packet header to distinguish control frames and byte streams
root ./daemon
Listening on port 2108
ok
Waiting for connection...
- StrongARM (running embedded Linux) starts a persistent, lightweight server
- Matlab clients connect via TCP and either send a data stream or read request
- Incoming data is translated into the hardware protocol and broadcast to FPGA
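The slides do not give the packet format, so the following is only a sketch of how a simple header can separate control frames from byte streams on a TCP connection. The 5-byte layout (1-byte frame type plus 4-byte big-endian payload length), the constants, and the function names are all hypothetical; only the general idea comes from the slide.

```python
import struct

# Hypothetical header layout (not from the slides):
# 1-byte frame type followed by a 4-byte big-endian payload length.
HEADER = struct.Struct("!BI")
FRAME_CONTROL = 0   # e.g. a read request
FRAME_DATA = 1      # raw byte stream destined for the hardware


def encode_frame(frame_type, payload):
    """Prefix a payload with the type/length header."""
    return HEADER.pack(frame_type, len(payload)) + payload


def decode_frames(buffer):
    """Split received bytes into (frame_type, payload) tuples.

    Returns the complete frames plus any trailing bytes that do not yet
    form a whole frame, since TCP delivers a stream, not messages.
    """
    frames = []
    offset = 0
    while offset + HEADER.size <= len(buffer):
        frame_type, length = HEADER.unpack_from(buffer, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(buffer):
            break                      # incomplete frame, wait for more data
        frames.append((frame_type, buffer[start:end]))
        offset = end
    return frames, buffer[offset:]


if __name__ == "__main__":
    wire = encode_frame(FRAME_CONTROL, b"read") + encode_frame(FRAME_DATA, b"\x01\x02\x03")
    print(decode_frames(wire))
```

A length-prefixed header keeps the server side lightweight: the daemon can read a fixed number of bytes, decide whether the frame is a control request or data, and then forward exactly `length` bytes to the hardware interface.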
31ASIC Flow: INSECTA
- Tcl/Tk code drives the flow
- Same scripting language used by several EDA tools: First Encounter, Nanoroute, ModelSim, Synopsys
- GUI controls technology selection, parameter selection, flow sequencing
- A real push-button flow
- Users can refine flow-generated scripts
32ASIC Tool Flow: Placement
- Internally developed ASIC flow
- First Encounter (FE)
- Nanoroute
- Physical Compiler
- Timing Driven!
- FE provides accurate wire parasitic estimates
- Placement by FE or Physical Compiler
33ASIC Flow: Routing in 130nm
- Nanoroute: ready for 130nm, 90nm designs
- Stepped metal pitches
- Minimum area rules
- Complex VIA rules
- Avoids antenna rule violations
- Cross-talk avoidance to be evaluated
34ASIC Flow: Back-end
- Using Unicad backend directly for DRC, LVS, Antenna rule checking
- Easier to track technology updates from ST
- Critical for evaluating internally developed technology files for FE, Nanoroute
35BCJR MAP Decoder
- E2PR4 Channel Encoder - Decoder
- Fully enclosed design
- Uniform RNG input vector
- Channel encoder
- AWGN filter
- Channel decoder
- BER collection mechanism
- Part of Full 3G Turbo Decoder
36BCJR As Case Study
- 13.2 MHz system clock
- SNR: 14dB to -1dB
- 10^9 samples
- 20 minute run-time
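The 20 minute figure is consistent with the other numbers on the slide under two assumptions that the slide does not state: the emulated design processes one sample per clock cycle, and the sweep takes 10^9 samples at each SNR point in 1 dB steps from 14 dB down to -1 dB.

```python
def ber_sweep_minutes(samples_per_point, clock_hz, snr_start_db, snr_stop_db,
                      step_db=1):
    """Estimate the run time of a BER sweep on real-time hardware.

    Assumes one sample per clock cycle and the same number of samples at
    every SNR point; both are assumptions, not statements from the slides.
    """
    points = int(abs(snr_start_db - snr_stop_db) / step_db) + 1
    seconds = points * samples_per_point / clock_hz
    return points, seconds / 60.0


if __name__ == "__main__":
    points, minutes = ber_sweep_minutes(10**9, 13.2e6, 14, -1)
    print(points, "SNR points,", round(minutes, 1), "minutes")
```

With 16 SNR points, each taking about 76 seconds at 13.2 MHz, the sweep lasts a little over 20 minutes.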
37FPGA Implementation of a Narrow-Band Transmission
System
Transmitter
Transmission System
Receiver
38FPGA Implementation of a Narrow-Band Transmission
System
39 2.4GHz Base-band Transmitter
- CPU time: 57 min
- Core Utilization: 0.344418 (Pad limited)
- Size (from SoC Encounter)
- Core Height: 565.8 um
- Core Width: 489.54 um
- Die Height: 1322.66 um
- Die Width: 1242.3 um
- Synopsys estimates
- Total Dynamic Power: 610.5163 uW (100%)
- Cell Leakage Power: 15.9364 uW
- Critical path: 9.21ns
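A few derived figures help put these numbers in context. The sketch below only does arithmetic on the values reported on the slide; treating the reciprocal of the critical path as the maximum clock rate ignores clock skew and margins, so it is an upper-bound estimate.

```python
def transmitter_summary(core_h_um=565.8, core_w_um=489.54,
                        die_h_um=1322.66, die_w_um=1242.3,
                        critical_path_ns=9.21):
    """Derive areas and an upper-bound clock rate from the reported figures."""
    core_mm2 = core_h_um * core_w_um / 1e6      # um^2 -> mm^2
    die_mm2 = die_h_um * die_w_um / 1e6
    return {
        "core_mm2": core_mm2,
        "die_mm2": die_mm2,
        "core_to_die": core_mm2 / die_mm2,      # small ratio = pad limited
        "fmax_mhz": 1e3 / critical_path_ns,     # 1 / (critical path)
    }


if __name__ == "__main__":
    for name, value in transmitter_summary().items():
        print(name, round(value, 3))
```

The core occupies only about 17 percent of the die, which fits the "pad limited" remark, and a 9.21 ns critical path corresponds to a clock of roughly 108 MHz at most.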
40How to get started?
- Documentation web site
- http://bwrc.eecs.berkeley.edu/Research/BEE
- Tutorials
- Lesson 1: Flow Basics
- Lesson 2: Runtime Debug on BEE
- Lesson 3: Control Logic Design
- Lesson 4: Run-time Data I/O on BEE
41BEE Compiler Framework
- Increase Design Scalability
- High-level blocks
- Vector Signals
- Reduce design time
- Faster run time
- Efficient/partial synthesis
- Modular design reuse
- Feature additions
- Tri-state pads/signal support
- Global pad assignment
- Automatic design partition
- Script based hardware generator