Title: An Integrated Debugging Environment for Reprogrammable Hardware Systems
- Kevin Camera, Hayden So, Bob Brodersen
- Berkeley Wireless Research Center, University of California, Berkeley
AADEBUG 2005
Outline
- Motivation
- Existing platform
- Existing design/verification flow
- Proposed solution
- Environment features
- Walkthrough
- Implementation strategy
Application Domain
- Direct-mapped, reprogrammable hardware systems
- FPGA-based signal processing and supercomputing arrays
FPGA Computing Benefits
- Superior power, computation, and cost efficiency compared with any processor-based solution, due to the direct mapping of algorithms onto hardware
(Chang, Wawrzynek, Brodersen, ISCA '05)
BEE2: 2nd Berkeley Emulation Engine
- (5) Xilinx V2P100 FPGAs per board
- 100K logic cells
- 2 PowerPC405 cores
- 444 dedicated multipliers
- 1MB on-chip SRAM
- 3.125Gb/s duplex links
- (4) DDR2 banks per FPGA
- 72 bits per bank with ECC
- Up to 12.8 GB/s (DDR2-400) or 17 GB/s (DDR2-533) bandwidth
- Up to 4GB capacity
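The peak-bandwidth figures follow from the bank count and the transfer rate. A minimal sketch of that arithmetic, assuming 64 of the 72 bits per bank carry data (the remaining 8 being ECC check bits) and counting 1 GB/s as 10^9 bytes/s:

```python
def peak_bandwidth_gbps(banks: int, data_bits: int, mega_transfers: int) -> float:
    """Peak DRAM bandwidth in GB/s (10^9 bytes/s) summed over all banks."""
    bytes_per_transfer = data_bits // 8
    return banks * bytes_per_transfer * mega_transfers * 1e6 / 1e9

# 4 banks per FPGA; assumed 64 data bits out of each 72-bit (ECC) bank
ddr2_400 = peak_bandwidth_gbps(banks=4, data_bits=64, mega_transfers=400)
ddr2_533 = peak_bandwidth_gbps(banks=4, data_bits=64, mega_transfers=533)
print(f"DDR2-400: {ddr2_400:.1f} GB/s, DDR2-533: {ddr2_533:.1f} GB/s")
```

The DDR2-533 case works out to about 17.1 GB/s, which the slide rounds to 17 GB/s.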
BEE Design Flow
- Design entry is in the Matlab/Simulink environment
- Graphical and library-based; also allows custom HDL
- Typical FPGA path to physical implementation
- HDL synthesis and place and route
- Hierarchy is flattened in each pass (non-modular flow)
Design Verification Methods
- High-level functional simulation
- HDL/RTL simulation
- Native FPGA execution
(Listed in order of increasing complexity and accuracy)
High-level Functional Simulation
- Design execution in Matlab/Simulink
- Intended to be correct by construction
- Fastest software-based simulation
- Powerful and convenient algorithm exploration
Drawbacks of High-level Simulation
- Even with a high level of abstraction, vastly slower than hardware
- Trend is worsening with increased FPGA capacity
- Doesn't cover any side effects or requirements of the back-end tool chain
HDL/RTL Simulation
- Varying levels of accuracy
- Access to arbitrary internal signals
- But simulation speed is even slower
- Parameterization/iteration is much harder
Native FPGA Execution
- Runs at full speed of hardware
- Three tools for on-FPGA testing
- Xilinx ChipScope Pro
- System Generator HW-in-the-loop
- Good old-fashioned signal probing
Xilinx ChipScope Pro
- Inserts BRAM-based capture cores into the design and binds them to JTAG
- Captures selected signals and provides trigger conditions
- Signals of interest must be chosen in advance
- Captured state is limited by available BRAM
- Any change requires another pass through the tool flow
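The BRAM limit is easiest to see in a simplified behavioral model of trigger-based capture: a fixed-depth buffer keeps only the newest samples, and once the trigger condition fires, capture stops after a set number of post-trigger samples. This is a hypothetical Python sketch of the general technique, not ChipScope's implementation or interface; `capture`, `depth`, and `post_trigger` are illustrative names.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Optional

def capture(samples: Iterable[int],
            trigger: Callable[[int], bool],
            depth: int,
            post_trigger: int) -> Optional[List[int]]:
    """Return the captured window, or None if capture never completed.

    The buffer keeps only the newest `depth` samples (modeling a fixed BRAM
    allocation), so older history is overwritten while waiting for the trigger.
    """
    buffer: Deque[int] = deque(maxlen=depth)
    remaining: Optional[int] = None       # post-trigger samples still to record
    for s in samples:
        buffer.append(s)
        if remaining is None:
            if trigger(s):
                remaining = post_trigger  # trigger fired on this sample
        else:
            remaining -= 1
        if remaining == 0:
            return list(buffer)
    return None

# Trigger when the signal equals 50; keep 3 samples after the trigger.
print(capture(range(100), lambda s: s == 50, depth=8, post_trigger=3))
# -> [46, 47, 48, 49, 50, 51, 52, 53]
```

With `depth=8` and `post_trigger=3`, only four pre-trigger samples survive; seeing further back means allocating a deeper buffer, which on the FPGA means another implementation run.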
System Generator HW-in-the-loop
- Allows the hardware itself to accept and process data from Simulink via JTAG
- An arbitrary number of data elements can be accessed as ports
- Very powerful tool, but offers only limited process control
Hands-on Hardware Debugging
- Most accurate method for finding timing-related bugs in a production system
- Tradeoffs are all too well known
- Complex equipment
- Limited probing pins
- Output signals must be chosen a priori
- Limited input options
Drawback of On-FPGA Execution
- Place and route time is a major bottleneck
- A complete run is needed for every design change
- Increasingly problematic due to larger FPGA capacity
Proposed Solution
- Enable extensive debugging and design-exploration functionality directly on the hardware platform
- Vastly superior execution time for today's large-scale computing challenges
- Exploit the spatial resources of the hardware to assist in debugging
- Essentially a -g switch for the hardware design flow
- Minimize or eliminate iterations through the implementation flow
Caveats
- Final timing of the design will not be preserved
- Critical path will definitely increase, but 10^6 is a lot of headroom
- Timing-driven implementation is still needed once verification is complete
- Significantly more FPGA capacity and memory will be needed
- Acceptable for scalable BEE-like platforms and for modular, tiled algorithms
Essential Features of the Environment
- Robustly parameterized library components with soft configuration
- Design exploration without tool iterations
- Readily accessible variable contents
- Reading and writing of any value by the user
- Complete user-driven control over process execution
- Single-step, bursts, breakpoints, assertions
1. Parameterized Library
- Number of bits
- Saturate / wrap
- Binary point position
- Microarchitecture
- Library components provide configuration parameters as inputs, which can be set by variables
- Allows runtime modification of function properties, including precision, range, and latency
- Enables design-space exploration at hardware speed, plus correction of configuration errors without re-implementation
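As a behavioral illustration of soft configuration, the sketch below models a quantizer whose word length, binary-point position, and overflow mode are ordinary runtime arguments rather than values fixed at synthesis. It is a hypothetical Python model of the idea; `quantize`, `n_bits`, `bin_pt`, and `saturate` are assumed names, not part of the BEE2 library, and rounding is modeled as truncation toward negative infinity, one common hardware choice.

```python
import math

def quantize(x: float, n_bits: int, bin_pt: int, saturate: bool) -> float:
    """Quantize x to signed fixed point: n_bits total, bin_pt fractional bits.

    On overflow the result either saturates or wraps (two's complement).
    """
    scale = 1 << bin_pt
    raw = math.floor(x * scale)                  # truncate toward -inf
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    if saturate:
        raw = max(lo, min(hi, raw))              # clamp to representable range
    else:
        raw = (raw - lo) % (1 << n_bits) + lo    # two's-complement wraparound
    return raw / scale

# The same inputs under different runtime configurations:
print(quantize(3.7, n_bits=4, bin_pt=1, saturate=True))   # 3.5
print(quantize(5.0, n_bits=4, bin_pt=1, saturate=True))   # 3.5
print(quantize(5.0, n_bits=4, bin_pt=1, saturate=False))  # -3.0
```

In hardware, switching between these behaviors would mean changing a configuration input rather than re-running place and route.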
2. Data Management
- Ability to dynamically observe any variable's value at the user's request
- Ability to overwrite a variable's value at runtime and continue operation
- Ability to rewind system state within the bounds of buffer capacity
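A software model makes the three operations concrete: each variable keeps a bounded history of its per-cycle values, so it can be read at any time, overwritten in place, and rewound only as far back as the history buffer allows. This is a hypothetical sketch; `TracedVariable` and its methods are illustrative names, not the environment's API.

```python
from collections import deque

class TracedVariable:
    """Bounded per-cycle history of one design variable (hypothetical model)."""

    def __init__(self, name: str, capacity: int, initial: int = 0):
        self.name = name
        self.history = deque([initial], maxlen=capacity)  # newest value last

    def record(self, value: int) -> None:
        """Called once per clock cycle with the variable's new value."""
        self.history.append(value)

    def observe(self) -> int:
        return self.history[-1]

    def overwrite(self, value: int) -> None:
        """Replace the current value; execution would continue from it."""
        self.history[-1] = value

    def rewind(self, cycles: int) -> int:
        """Drop the newest `cycles` values; fail if beyond buffered history."""
        if cycles >= len(self.history):
            raise ValueError(f"{self.name}: only {len(self.history) - 1} "
                             "cycles of history are buffered")
        for _ in range(cycles):
            self.history.pop()
        return self.history[-1]

acc = TracedVariable("acc", capacity=4)
for value in [1, 2, 3, 4, 5]:
    acc.record(value)    # history now holds 2, 3, 4, 5
acc.overwrite(50)        # current value becomes 50
print(acc.rewind(2))     # -> 3
```

In hardware, a rewind would have to restore every variable to the same cycle consistently, which this per-variable model does not capture.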
2. Data Management Requirements
- Too expensive to re-implement the hardware to expose new data
- All variables are streamed into local and off-chip storage, such as DRAM and disks
- Unlike software, hardware is highly parallel and often deeply pipelined
- Memory requirements could be extreme
- Can be offset by a hierarchical memory architecture and/or periodic sampling
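A back-of-the-envelope estimate shows why streaming every variable is demanding and how periodic sampling offsets it. The design size (100,000 bits of state), the 10 MHz debug clock, and the sampling interval below are hypothetical values chosen for illustration, not measurements of any BEE2 design.

```python
def trace_rate_bytes_per_s(state_bits: int, clock_hz: float,
                           sample_every: int = 1) -> float:
    """Storage bandwidth needed to record `state_bits` of state
    once every `sample_every` cycles at `clock_hz`."""
    return state_bits / 8 * clock_hz / sample_every

# Hypothetical design: 100,000 bits of state at a 10 MHz debug clock.
full = trace_rate_bytes_per_s(100_000, 10e6)
sampled = trace_rate_bytes_per_s(100_000, 10e6, sample_every=100)
print(f"every cycle: {full / 1e9:.1f} GB/s; every 100th cycle: {sampled / 1e9:.2f} GB/s")
# -> every cycle: 125.0 GB/s; every 100th cycle: 1.25 GB/s
```

Sampling cuts the sustained rate linearly, but at the cost of cycles that can no longer be inspected or rewound to exactly.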
3. Process Control
- Inherit the most useful features of software debuggers like GDB
- Cycle-by-cycle (single-step) execution
- Breakpoints (either state-dependent or at a fixed cycle count)
- Implemented using multiple clock domains and clock buffer control
- Already available for use on BEE2
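These process-control semantics can be modeled as a controller that gates the user clock: it advances the design one cycle at a time (single-step) or in bursts, and halts when a fixed-cycle-count or state-dependent breakpoint is hit. This is a hypothetical behavioral model; it does not model the actual clock-buffer mechanism on BEE2, and `ClockController` and `add_breakpoint` are illustrative names.

```python
from typing import Callable, Dict, List, Optional, Set

State = Dict[str, int]

class ClockController:
    """Gates a user design's clock (hypothetical behavioral model)."""

    def __init__(self, step_fn: Callable[[State], State], state: State):
        self.step_fn = step_fn          # next-state function of the user design
        self.state = state
        self.cycle = 0
        self.cycle_breaks: Set[int] = set()
        self.state_breaks: List[Callable[[State], bool]] = []

    def add_breakpoint(self, at_cycle: Optional[int] = None,
                       when: Optional[Callable[[State], bool]] = None) -> None:
        """Fixed-cycle-count breakpoint, state-dependent breakpoint, or both."""
        if at_cycle is not None:
            self.cycle_breaks.add(at_cycle)
        if when is not None:
            self.state_breaks.append(when)

    def step(self) -> None:
        """Single-step: enable exactly one user clock edge."""
        self.state = self.step_fn(self.state)
        self.cycle += 1

    def run(self, max_cycles: int) -> bool:
        """Burst of up to max_cycles; True if it stopped on a breakpoint."""
        for _ in range(max_cycles):
            self.step()
            if self.cycle in self.cycle_breaks or any(
                    cond(self.state) for cond in self.state_breaks):
                return True
        return False

# A counter that adds 3 per cycle, halted once it reaches 10.
ctl = ClockController(lambda s: {"count": s["count"] + 3}, {"count": 0})
ctl.add_breakpoint(when=lambda s: s["count"] >= 10)
print(ctl.run(100), ctl.cycle, ctl.state)   # -> True 4 {'count': 12}
```

Breakpoints are checked after every enabled clock edge, which is what lets a state-dependent condition stop the design on the exact cycle it becomes true.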
Walkthrough: Design
- Use specialized libraries to provide soft configuration
- Integrates directly into the existing BEE2 tool flow
Walkthrough: Tagging
- User tags signals of interest with debugging testpoints
- Defines a variable name
- Defines other parameters of interest for data observation
- Also includes breakpoints and assertions
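A testpoint tag essentially bundles a variable name with its observation parameters and any attached breakpoint or assertion conditions. One hypothetical way to represent such a tag as plain data; the field names (`width`, `sample_every`, `break_when`, `assertions`) are assumptions for illustration, not the environment's actual format.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Testpoint:
    """A debugging testpoint attached to one signal (hypothetical format)."""
    variable: str                    # user-visible variable name
    width: int                       # bits observed on the tagged signal
    sample_every: int = 1            # observation parameter: sampling interval
    break_when: Optional[Callable[[int], bool]] = None
    assertions: List[Callable[[int], bool]] = field(default_factory=list)

    def check(self, value: int) -> List[str]:
        """Return the debug events raised by one observed value."""
        events = []
        if self.break_when is not None and self.break_when(value):
            events.append(f"break:{self.variable}")
        events += [f"assert_failed:{self.variable}"
                   for ok in self.assertions if not ok(value)]
        return events

tp = Testpoint("fir_out", width=18,
               break_when=lambda v: v > 1000, assertions=[lambda v: v >= 0])
print(tp.check(2000))   # -> ['break:fir_out']
```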
Walkthrough: Stitching
- Stitcher updates the design before it enters the back-end tool flow
- Inserts logic as needed for debug functions
- Instantiates the PowerPC core and master controller
- Adds underlying connections to route data
Walkthrough: Runtime
- User can monitor variables and control process execution from a remote client
- Embedded PowerPC software provides a thin service layer
- Client is fully integrated with the Matlab and Simulink input description
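A thin service layer of this kind can be as simple as a line-oriented command dispatcher sitting between the network connection and the debug hardware. The command names (`read`, `write`, `step`) and the dictionary standing in for hardware variable units are hypothetical; the slides do not specify the actual protocol.

```python
from typing import Dict

class DebugService:
    """Hypothetical text-command layer between a remote client
    and the debug hardware."""

    def __init__(self, variables: Dict[str, int]):
        self.variables = variables  # stands in for hardware variable units
        self.cycle = 0              # stands in for the gated user clock

    def handle(self, line: str) -> str:
        try:
            cmd, *args = line.split()
            if cmd == "read" and len(args) == 1:
                return str(self.variables[args[0]])
            if cmd == "write" and len(args) == 2:
                if args[0] not in self.variables:
                    raise KeyError(args[0])
                self.variables[args[0]] = int(args[1])
                return "ok"
            if cmd == "step" and len(args) <= 1:
                # A real service would pulse the gated clock here.
                self.cycle += int(args[0]) if args else 1
                return f"cycle {self.cycle}"
        except (KeyError, ValueError) as exc:
            return f"error: {exc!r}"
        return f"error: unknown command {line!r}"

svc = DebugService({"acc": 7})
for command in ["read acc", "write acc 42", "step 5", "read acc"]:
    print(command, "->", svc.handle(command))
```

Keeping the embedded side this small leaves the Matlab/Simulink integration entirely to the client.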
Control Architecture on BEE2
[Figure: Control FPGA (PowerPC, network, clock buffer logic, 100 MHz) and User FPGA (user design plus inserted logic), linked by user-defined clock domains (1-10 MHz), single-step and control signals, a breakpoint interrupt, and DRAM]
Stitching
- Stitcher traverses the design hierarchy and:
- Replaces debugging-component placeholders with the necessary logic
- Creates a simple route from all variables to off-chip storage devices
- While it runs, the stitcher records:
- A mapping between variable names and their physical variable units in hardware
- The latency within the variable routing network
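The stitching pass is essentially a recursive walk over the design hierarchy. The sketch below assumes a simple `Block` tree in which debug placeholders have kind `"testpoint"`; it replaces each placeholder with a variable unit, assigns it an index, and records the name-to-unit mapping together with a routing latency. The latency model (one register stage per hierarchy level) is a made-up stand-in for whatever the real routing network uses, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Block:
    name: str
    kind: str                                  # e.g. "subsystem", "testpoint"
    children: List["Block"] = field(default_factory=list)

def stitch(root: Block) -> Dict[str, Tuple[int, int]]:
    """Replace testpoint placeholders in place; return
    {hierarchical variable name: (unit index, routing latency in cycles)}."""
    table: Dict[str, Tuple[int, int]] = {}

    def walk(block: Block, path: str, depth: int) -> None:
        for i, child in enumerate(block.children):
            child_path = f"{path}/{child.name}"
            if child.kind == "testpoint":
                unit = len(table)
                # Assumed routing model: one register stage per level.
                table[child_path] = (unit, depth + 1)
                block.children[i] = Block(f"vcu{unit}", "vcu")
            else:
                walk(child, child_path, depth + 1)

    walk(root, root.name, 0)
    return table

design = Block("top", "subsystem", [
    Block("x", "testpoint"),
    Block("fir", "subsystem", [Block("acc", "testpoint"), Block("add", "adder")]),
])
print(stitch(design))   # -> {'top/x': (0, 1), 'top/fir/acc': (1, 2)}
```

The returned table plays the role of the recorded mapping: the runtime can translate a user's variable name into a physical unit and compensate for its routing latency.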
Variable Control Unit (VCU)
- Inserted in place of each variable block in the design
- Automatically implied for every state variable in a state machine
- Combination of local buffers and off-chip DRAM
- Exact memory allocation is subject to experimentation
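One way to picture the local-buffer/DRAM split: each VCU records into a small on-chip buffer and flushes it to off-chip memory in a burst whenever it fills. The buffer depth is an arbitrary assumption (the slide notes that exact allocation is still open), the Python lists only stand in for BRAM and DRAM, and `VariableUnit` is a hypothetical name.

```python
from typing import List

class VariableUnit:
    """Two-level trace storage for one variable (hypothetical model)."""

    def __init__(self, local_depth: int):
        self.local_depth = local_depth
        self.local: List[int] = []    # models an on-chip BRAM buffer
        self.dram: List[int] = []     # models off-chip DRAM
        self.flushes = 0

    def record(self, value: int) -> None:
        self.local.append(value)
        if len(self.local) == self.local_depth:
            self.dram.extend(self.local)   # burst write to DRAM
            self.local.clear()
            self.flushes += 1

    def trace(self) -> List[int]:
        """Full recorded history, oldest first."""
        return self.dram + self.local

unit = VariableUnit(local_depth=4)
for v in range(10):
    unit.record(v)
print(unit.flushes, unit.local)   # -> 2 [8, 9]
```

Burst flushing is the usual reason for a local buffer: DRAM is far more efficient with long sequential writes than with one small write per cycle.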
Debug Controller (DC)
- Interface between all variable and assertion instances, the runtime user shell, and process-control services
- Regulates the system clock both for exceptions and to prevent variable-storage overflows
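The overflow-prevention role amounts to backpressure: the controller enables the user clock only while variable storage has room for another cycle's writes, and otherwise stalls until draining (for example, to DRAM) frees space. A hypothetical cycle-level model, with buffer capacity and drain rate chosen arbitrarily; it covers only the overflow case, not exceptions.

```python
from typing import Tuple

def simulate(cycles: int, buffer_cap: int, drain_per_cycle: int,
             writes_per_cycle: int) -> Tuple[int, int]:
    """Return (user cycles executed, controller cycles stalled).

    Each enabled user cycle writes `writes_per_cycle` entries; storage
    drains `drain_per_cycle` entries every controller cycle.
    """
    occupancy = executed = stalled = 0
    for _ in range(cycles):
        if occupancy + writes_per_cycle <= buffer_cap:
            occupancy += writes_per_cycle   # clock enabled: design advances
            executed += 1
        else:
            stalled += 1                    # clock gated to avoid overflow
        occupancy = max(0, occupancy - drain_per_cycle)
    return executed, stalled

print(simulate(100, buffer_cap=10, drain_per_cycle=1, writes_per_cycle=2))
# -> (54, 46)
```

Once the buffer fills, throughput settles at the drain rate divided by the write rate (here, one user cycle in two), which is the time-for-space tradeoff the clock regulation buys.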
Runtime Shell Examples
Future Work
- Complete infrastructure for BEE2
- Extensive experiments with variable memory
- Efficient methods for variable routing
- Storage requirements and hierarchy
- Time/space tradeoffs for periodic sampling
- Generalize framework to define concepts such as variable priorities, multiple debug levels, and extensions to text-based languages
Questions?