Title: Design Framework for Partial Run-Time FPGA Reconfiguration
Slide 1: Design Framework for Partial Run-Time FPGA Reconfiguration
- Chris Conger, Ann Gordon-Ross, and Alan D. George
- Presented by Abelardo Jara-Berrocal
- HCS Research Laboratory
- College of Engineering
- University of Florida
Slide 2: Outline
- Introduction
- Partial Reconfiguration (PR) Overview
- Proposed Design Methodologies
- Framework analysis
- Conclusions
Slide 3: Introduction: Fully reconfigurable systems
[Figure: a fully reconfigurable system. An external system controller holds full bitstreams (Config 1, Config 2, Config 3) in bitstream storage and loads the requested one (Config 1 request, Config 2 request) into the FPGA over the configuration lines. The FPGA connects to shared memory, general-purpose I/O, and external I/O. A battery and a design station are also shown.]
1. The device may be too small for complex designs
2. Large full bitstreams (long reconfiguration time)
3. Complete system operation must be halted before reconfiguration
Slide 4: Introduction: The Virtex-4 PR architecture
- Newer Xilinx FPGA families offer a partial reconfiguration feature
- A rectangular region of the FPGA can be reconfigured without affecting the remaining FPGA area
- The system can continue operating without interruption
[Figure: FPGA fabric with two rectangular regions, reconfigurable region 1 and reconfigurable region 2]
Slide 5: Introduction: A sample PR architecture
[Figure: sample PR architecture. The FPGA is split into a static area holding the base system configuration and a reconfigurable area. A Module A request, bitstream storage, JTAG, external I/O, and a battery are also shown.]
1. The system controller does not need to be placed in an external device
2. Access to the fast Internal Configuration Access Port (ICAP: 32 bits, 100 MHz)
3. Smaller partial bitstreams
4. No need to halt the complete system when reconfiguring a module
5. Time multiplexing of FPGA resources: HW modules are loaded and unloaded on demand
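Item 2 gives the ICAP as 32 bits wide at 100 MHz. The sketch below turns that into a lower bound on partial reconfiguration time. It assumes the port accepts one 32-bit word every cycle with no controller or memory overhead, and it takes 1 kB = 1024 bytes. The function names are hypothetical.

```python
"""Lower-bound partial reconfiguration time over the ICAP (illustrative sketch)."""

ICAP_WIDTH_BITS = 32   # ICAP data width quoted on the slide
ICAP_CLOCK_HZ = 100e6  # ICAP clock quoted on the slide (100 MHz)


def icap_peak_bytes_per_s(width_bits: int = ICAP_WIDTH_BITS,
                          clock_hz: float = ICAP_CLOCK_HZ) -> float:
    """Peak throughput, assuming one full-width word is written every cycle."""
    return (width_bits / 8) * clock_hz


def reconfig_time_ms(bitstream_kb: float,
                     width_bits: int = ICAP_WIDTH_BITS,
                     clock_hz: float = ICAP_CLOCK_HZ) -> float:
    """Lower bound (ms) on streaming a bitstream through the ICAP.

    Assumes 1 kB = 1024 bytes and ignores controller, memory, and
    bitstream-fetch overhead, so a real system will be slower.
    """
    size_bytes = bitstream_kb * 1024
    return size_bytes / icap_peak_bytes_per_s(width_bits, clock_hz) * 1e3


if __name__ == "__main__":
    print(f"Peak ICAP throughput: {icap_peak_bytes_per_s() / 1e6:.0f} MB/s")
    for kb in (100, 500, 1000):
        print(f"{kb:5d} kB partial bitstream -> at least {reconfig_time_ms(kb):.3f} ms")
```

The bound scales linearly with bitstream size, which is why smaller partial bitstreams (item 3) translate directly into shorter reconfiguration time.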
Slide 6: Introduction: Current PR design flow
- Steps
- Partition the system into modules
- Define static modules and reconfigurable modules
- Decide the number of PR regions (PRRs)
- Decide PRR sizes, shapes and locations
- Map modules to PRRs
- Define PRR interfaces and instantiate slice macros for them
- Optimization problems
- Design partitioning
- Number of PRRs
- PRR sizes, shapes, and locations
- Mapping PRMs to PRRs
- Type and placement of PRR interfaces
[Figure: design partitioning splits the design into static modules and reconfigurable modules (PRMs). Design floorplanning and budgeting then places a static region and numbered PRRs (1, 2) on the FPGA, raising the question of how many PRRs to use.]
Slide 7: Introduction: Early Access PR design flow
- Introduced by Xilinx at FPL06
- Major improvements
- Automatic implementation scripts
- Rectangular regions (not full-column reconfiguration)
- Static nets can cross reconfigurable regions
- Slice macros replace bus macros
- Partitioning and floorplanning steps are executed manually
- Design guidelines for these steps are not provided
[Figure: Early Access PR design flow. Reconfigurable design specifications pass through design partitioning and design floorplanning and budgeting (manual), which produce placement and PRR constraints. The Xilinx PR implementation flow (automatic) then generates the full initial bitstream and the PRM bitstreams.]
- Potential for development of automatic CAD tools
Slide 8: Introduction: Current PR design tool limitations
- PR design is a very specialized task
- Only physical-level support is provided
- Architectural knowledge of the target device is a must
- Not very flexible; many design constraints
- Partitioning and floorplanning steps are executed manually
- No performance-sensitive design guidelines are provided
- No automatic, heuristics-based design flow is available either
- Lack of abstraction from low-level details discourages designers from using PR
- Difficult for many end users
In this work, we propose a taxonomy of PR system design flows and an efficient methodology for each type.
Slide 9: PR Overview: Taxonomy of PR system design flows
PR system design flows are either special-purpose or multipurpose.
Special-purpose
- Highly specialized system design
- All PRMs that will exist in the system are known at design time
- Each PRR is independently optimized (size, shape, location, interface) based on the PRMs that will be mapped to it
- Output:
- A floorplan defining a static region and a set of optimized PRRs
- The set of PRMs that can be placed in each PRR (PRM-to-PRR mapping)
Multipurpose
- Not optimized for a specific application
- The PRMs required by the application are not known when the base system is designed
- The goal is a flexible, reusable base design that can be used for several different PR systems
- The base system designer defines a set of PRRs with fixed shapes, sizes, locations, and interfaces
- The generated floorplan is used as an input template for PRM implementation
Slide 10: Proposed Design Methodology: Special-Purpose
- Partition the system into several hardware modules
- Synthesize the hardware modules
- Use a control flow graph (CFG) and a states table to represent:
- Application states and the transitions between them (execution path coverage)
- The set of modules required in each application state
Let's see an example.
Slide 11: Proposed Design Methodology: Special-Purpose
- Define region partitioning constraints
STATE   MODULES
S1      A, B, C
S2      A, B, C, F
S3      A, B, C, G
S4      A, B, D
S5      A, B, E
[Figure: CFG over states S1 to S5, with nodes labeled by their reconfigurable modules (C, F, G, D, E), and modules marked as static or reconfigurable]
Establishing constraints:
1. A and B are present in all states (static modules)
2. C, D, E, F, and G are reconfigurable modules (PRMs)
3. F and G are each active at the same time as C, so neither can be placed in the same PRR as C
4. F, G, D, and E can be placed in the same PRR
5. C, D, and E can be placed in the same PRR
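These constraints follow mechanically from the states table. A module active in every state is static, and two PRMs that are active in the same state cannot share a PRR. The sketch below shows that derivation. The function names are hypothetical and not part of any existing tool.

```python
"""Derive static modules, PRMs, and PRR-sharing conflicts from a states table (sketch)."""
from itertools import combinations

# States table from the example: state -> modules active in that state.
STATES: dict[str, set[str]] = {
    "S1": {"A", "B", "C"},
    "S2": {"A", "B", "C", "F"},
    "S3": {"A", "B", "C", "G"},
    "S4": {"A", "B", "D"},
    "S5": {"A", "B", "E"},
}


def static_modules(states: dict[str, set[str]]) -> set[str]:
    """Modules present in every state stay resident in the static region."""
    return set.intersection(*states.values())


def reconfigurable_modules(states: dict[str, set[str]]) -> set[str]:
    """Every other module is a candidate PRM."""
    return set.union(*states.values()) - static_modules(states)


def conflicts(states: dict[str, set[str]]) -> set[frozenset[str]]:
    """PRM pairs that are active in the same state and so cannot share a PRR."""
    prms = reconfigurable_modules(states)
    pairs: set[frozenset[str]] = set()
    for modules in states.values():
        active = sorted(modules & prms)
        for a, b in combinations(active, 2):
            pairs.add(frozenset((a, b)))
    return pairs


if __name__ == "__main__":
    print("static:", sorted(static_modules(STATES)))
    print("PRMs:  ", sorted(reconfigurable_modules(STATES)))
    print("conflicts:", sorted(tuple(sorted(p)) for p in conflicts(STATES)))
```

Run on the example, this reproduces constraints 1 to 5. A and B are static, and C, D, E, F, and G are PRMs. The only forbidden pairings are C with F and C with G, so every other grouping of PRMs is allowed.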
Slide 12: Proposed Design Methodology: Special-Purpose
- Define the number of PRRs to be used
- Optimization variable
- The number is computed from the CFG and the states table
[Figure: 1 PRR? 4 PRRs?]
- Define a PRM-to-PRR mapping
- Optimization problem
- Combinatorial design space
- The design space is reduced using the design constraints
Possible solution (not necessarily the optimal one):
Static region   PRR 1      PRR 2
A, B            C, D, E    F, G
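The sketch below makes "the design space is reduced using the design constraints" concrete. It enumerates every way of grouping the PRMs into PRRs and keeps only groupings in which no state needs two PRMs from the same PRR. Among the survivors, it reports the fewest PRRs. Treating "fewest PRRs" as the objective is an assumption made for illustration, since the slides list several possible cost functions. The helper names are hypothetical.

```python
"""Brute-force search for minimal conflict-free PRM-to-PRR mappings (sketch)."""
from typing import Iterator

# States table from the example: state -> modules active in that state.
STATES: dict[str, set[str]] = {
    "S1": {"A", "B", "C"},
    "S2": {"A", "B", "C", "F"},
    "S3": {"A", "B", "C", "G"},
    "S4": {"A", "B", "D"},
    "S5": {"A", "B", "E"},
}

Mapping = list[list[str]]  # one inner list of PRMs per PRR


def set_partitions(items: list[str]) -> Iterator[Mapping]:
    """Yield every way of splitting `items` into non-empty groups (candidate PRRs)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Add `first` to each existing PRR in turn...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ...or give it a PRR of its own.
        yield [[first]] + partition


def is_valid(mapping: Mapping, states: dict[str, set[str]]) -> bool:
    """Valid if no state needs two PRMs that share a PRR at the same time."""
    return all(
        len(active & set(prr)) <= 1
        for active in states.values()
        for prr in mapping
    )


def minimal_mappings(states: dict[str, set[str]]) -> tuple[int, list[Mapping]]:
    """Return the fewest PRRs that work and every valid mapping using that many."""
    static = set.intersection(*states.values())
    prms = sorted(set.union(*states.values()) - static)
    valid = [m for m in set_partitions(prms) if is_valid(m, states)]
    fewest = min(len(m) for m in valid)
    return fewest, [m for m in valid if len(m) == fewest]


if __name__ == "__main__":
    n, best = minimal_mappings(STATES)
    print(f"minimum number of PRRs: {n}")
    for mapping in best:
        print(" | ".join(", ".join(sorted(prr)) for prr in mapping))
```

For the example, two PRRs are the minimum, because S2 needs C and F at once. Four two-PRR mappings survive, and the solution above (C, D, E | F, G) is one of them. Five PRMs give only 52 groupings (the Bell number B5), so brute force is fine here. Larger designs would need the heuristics these slides call for.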
Slide 13: Proposed Design Methodology: Special-Purpose
- And when do we size our PRRs?
- Don't worry, that is our next step
[Figure: module resource profiles (slices, BRAMs, DSP48s) for modules A to G. Required static region resources: the resources of A and B are added. Required PRR 1 resources: the maximum of each resource type over C, D, and E. Required PRR 2 resources: the maximum of each resource type over F and G.]
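The sizing rule on this slide works as follows. The static region needs the sum of its modules' resources, because they are all resident at once. A PRR needs, for each resource type, the maximum over the PRMs mapped to it, because only one PRM is resident at a time. The sketch below applies this rule to the mapping from the previous slide. The module profiles are hypothetical placeholders, not synthesis results.

```python
"""Resource budgeting for the static region and each PRR (illustrative sketch)."""

RESOURCE_TYPES = ("slices", "brams", "dsp48s")

# HYPOTHETICAL module profiles: placeholders, not real synthesis results.
PROFILES: dict[str, dict[str, int]] = {
    "A": {"slices": 1200, "brams": 4, "dsp48s": 0},
    "B": {"slices": 800, "brams": 2, "dsp48s": 4},
    "C": {"slices": 1500, "brams": 6, "dsp48s": 8},
    "D": {"slices": 1100, "brams": 10, "dsp48s": 2},
    "E": {"slices": 900, "brams": 4, "dsp48s": 12},
    "F": {"slices": 700, "brams": 2, "dsp48s": 0},
    "G": {"slices": 1000, "brams": 0, "dsp48s": 6},
}


def static_budget(modules: list[str],
                  profiles: dict[str, dict[str, int]] = PROFILES) -> dict[str, int]:
    """Static modules are all resident at once, so their resources add up."""
    return {r: sum(profiles[m][r] for m in modules) for r in RESOURCE_TYPES}


def prr_budget(prms: list[str],
               profiles: dict[str, dict[str, int]] = PROFILES) -> dict[str, int]:
    """Only one PRM occupies a PRR at a time: take the maximum per resource type."""
    return {r: max(profiles[m][r] for m in prms) for r in RESOURCE_TYPES}


if __name__ == "__main__":
    print("static region:", static_budget(["A", "B"]))
    print("PRR 1:", prr_budget(["C", "D", "E"]))
    print("PRR 2:", prr_budget(["F", "G"]))
```

With these placeholder numbers, PRR 1 takes its slice count from C, its BRAM count from D, and its DSP48 count from E. The per-type maximum can therefore make a PRR larger than any single PRM it hosts.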
Slide 14: Proposed Design Methodology: Special-Purpose
- Define the PRR sizes, shapes, and locations inside the FPGA fabric
- Floorplanning optimization problem
- Proper metrics for PRR performance analysis are required
- Design guidelines for efficient PRR floorplanning are also needed
- Define PRR interfaces
- Place slice macros
[Figure: final optimized custom base system floorplan. A reconfigurable region with enough resources for PRR 1 is placed on the FPGA next to the static region, and the same is done for PRR 2.]
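One simple way to place "a reconfigurable region with enough resources" is to model the fabric as columns of CLBs, block RAMs, and DSP48s. You can then search for the smallest rectangle whose column mix covers a PRR budget. The sketch below does this exhaustively. The column layout, the per-tile capacities, and the minimum-area objective are all hypothetical assumptions. It also ignores overlap with the static region, other PRRs, and I/O.

```python
"""Minimum-area PRR search on a simplified columnar FPGA model (illustrative sketch)."""
from typing import NamedTuple, Optional

# ASSUMED per-tile capacities. A tile is one column by one configuration row;
# the numbers loosely follow Virtex-4 (16-CLB-tall rows, 4 slices per CLB) but
# are placeholders to be replaced with real device data.
TILE_CAPACITY: dict[str, dict[str, int]] = {
    "C": {"slices": 64, "brams": 0, "dsp48s": 0},  # CLB column
    "B": {"slices": 0, "brams": 4, "dsp48s": 0},   # block RAM column
    "D": {"slices": 0, "brams": 0, "dsp48s": 8},   # DSP48 column
}


class Placement(NamedTuple):
    left_col: int  # index of the leftmost column in the layout string
    width: int     # number of columns
    height: int    # number of configuration rows

    @property
    def area(self) -> int:
        return self.width * self.height


def fits(cols: str, height: int, need: dict[str, int]) -> bool:
    """True if `height` rows of the column window `cols` cover every requirement."""
    return all(
        height * sum(TILE_CAPACITY[c][res] for c in cols) >= amount
        for res, amount in need.items()
    )


def smallest_prr(layout: str, rows: int, need: dict[str, int]) -> Optional[Placement]:
    """Exhaustively find the minimum-area rectangle (in tiles) that meets `need`.

    Rows are identical in this model, so only the column window and the
    height matter; overlap with the static region or other PRRs is ignored.
    """
    best: Optional[Placement] = None
    for height in range(1, rows + 1):
        for left in range(len(layout)):
            for right in range(left + 1, len(layout) + 1):
                if fits(layout[left:right], height, need):
                    cand = Placement(left, right - left, height)
                    if best is None or cand.area < best.area:
                        best = cand
                    break  # any wider window from this `left` only adds area
    return best


if __name__ == "__main__":
    # HYPOTHETICAL device: column types left to right, 8 rows tall.
    layout, rows = "CCCCBCCCCDCCCCBCCCC", 8
    print(smallest_prr(layout, rows, {"slices": 1500, "brams": 10, "dsp48s": 12}))
```

Because the BRAM and DSP48 columns are sparse, the cheapest rectangle is forced to span them, and its shape is not chosen freely. This is the columnar effect that the framework analysis later examines through aspect ratio.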
Slide 15: Proposed Design Methodology: Special-Purpose
- The custom base system floorplan and the PRM-to-PRR mapping are used as input files for the automatic Xilinx PR design flow
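To show what "used as input files" can look like in practice, the sketch below turns a floorplan and a PRM-to-PRR mapping into UCF-style area constraints. It is modeled on the `AREA_GROUP` / `RANGE` constraints used in Xilinx UCF files, with `MODE = RECONFIG` marking a reconfigurable group as in the Early Access PR flow. The exact syntax should be checked against the Xilinx Early Access PR documentation. The instance names, the `AG_` prefix, and the coordinates are hypothetical.

```python
"""Emit UCF-style area constraints for a PR floorplan (illustrative sketch)."""
from typing import NamedTuple


class SliceRange(NamedTuple):
    """Inclusive slice-coordinate corners of a rectangular PRR."""
    x0: int
    y0: int
    x1: int
    y1: int


def prr_constraints(inst: str, rect: SliceRange, prms: list[str]) -> list[str]:
    """Constraints for one PRR instance, with its PRM mapping recorded as a comment."""
    group = f"AG_{inst}"  # hypothetical naming convention
    return [
        f"# PRR {inst} hosts PRMs: {', '.join(prms)}",
        f'INST "{inst}" AREA_GROUP = "{group}";',
        f'AREA_GROUP "{group}" RANGE = SLICE_X{rect.x0}Y{rect.y0}:SLICE_X{rect.x1}Y{rect.y1};',
        f'AREA_GROUP "{group}" MODE = RECONFIG;',
    ]


def floorplan_ucf(prrs: dict[str, SliceRange], mapping: dict[str, list[str]]) -> str:
    """Concatenate the constraints for every PRR in a stable (sorted) order."""
    lines: list[str] = []
    for inst in sorted(prrs):
        lines.extend(prr_constraints(inst, prrs[inst], mapping[inst]))
    return "\n".join(lines)


if __name__ == "__main__":
    # HYPOTHETICAL instance names and slice coordinates.
    prrs = {"prr1": SliceRange(0, 0, 27, 63), "prr2": SliceRange(40, 64, 55, 127)}
    mapping = {"prr1": ["C", "D", "E"], "prr2": ["F", "G"]}
    print(floorplan_ucf(prrs, mapping))
```

A real flow would also constrain RAMB16 and DSP48 sites and place the slice macros at each PRR boundary. Both are omitted here.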
Slide 16: Proposed Design Methodology: Special-Purpose
- Opportunity to automate this flow through design tools
- Optimization variables
- Number of PRRs
- PRR sizes, shapes, and locations
- PRM-to-PRR mapping
- Additional optimization variables can be defined
- Several possible cost functions
- Area wastage
- Power usage
- Application latency
- Throughput
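The choice of cost function can change which mapping wins. The sketch below scores the four conflict-free two-PRR mappings of the example states table under two area metrics. The first is "reserved slices", where each PRR is sized for its largest PRM. The second is "idle slices", where PRR capacity is left unused by each PRM it can host. Both metrics and the slice counts are hypothetical stand-ins for the "area wastage" cost named above.

```python
"""Compare candidate PRM-to-PRR mappings under two area cost functions (sketch)."""

# HYPOTHETICAL slice counts per PRM (placeholders, not synthesis results).
SLICES: dict[str, int] = {"C": 1500, "D": 1100, "E": 900, "F": 700, "G": 1000}

# The conflict-free two-PRR mappings for the example states table.
CANDIDATES: dict[str, list[list[str]]] = {
    "C,D,E | F,G": [["C", "D", "E"], ["F", "G"]],
    "C | D,E,F,G": [["C"], ["D", "E", "F", "G"]],
    "C,D | E,F,G": [["C", "D"], ["E", "F", "G"]],
    "C,E | D,F,G": [["C", "E"], ["D", "F", "G"]],
}


def reserved_slices(mapping: list[list[str]]) -> int:
    """Slices set aside for PRRs: each PRR is sized for its largest PRM."""
    return sum(max(SLICES[m] for m in prr) for prr in mapping)


def idle_slices(mapping: list[list[str]]) -> int:
    """Assumed wastage metric: PRR slices left unused, summed over every PRM it hosts."""
    total = 0
    for prr in mapping:
        capacity = max(SLICES[m] for m in prr)
        total += sum(capacity - SLICES[m] for m in prr)
    return total


if __name__ == "__main__":
    for name, mapping in CANDIDATES.items():
        print(f"{name:12s} reserved={reserved_slices(mapping):5d} idle={idle_slices(mapping):5d}")
    print("min reserved:", min(CANDIDATES, key=lambda k: reserved_slices(CANDIDATES[k])))
    print("min idle:    ", min(CANDIDATES, key=lambda k: idle_slices(CANDIDATES[k])))
```

With these numbers the two metrics disagree. Reserved slices favor C, D, E | F, G, which ties with C, D | E, F, G. Idle slices favor giving C a PRR of its own. Picking the cost function is therefore part of the optimization problem, not an afterthought.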
Slide 17: Framework analysis: PRR geometries
- PR system design flows require
- Proper metrics for PRR performance analysis
- Design guidelines for efficient PRR floorplanning
- Study of the effects of varying PRR shape on
- Maximum clock frequency
- Partial bitstream size
- Five separate test cores
- Beamforming (DSP/slice)
- CFAR (slice/memory)
- AES (register)
- ARM7 softcore (hybrid)
- Sine/Cosine LUT (memory)
- Performed on the V4SX55 so far
Aspect ratio = PRR height / PRR width
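A shape sweep like this study's can be generated by fixing the logic a core needs and varying the PRR height. The sketch below assumes height and width are counted in CLBs and that the XC4VSX55 CLB array is 128 rows by 48 columns (verify against the Virtex-4 data sheet). It also assumes 4 slices per Virtex-4 CLB. The 16-row height step is a hypothetical choice, and BRAM/DSP48 columns are ignored.

```python
"""Enumerate candidate PRR shapes and their aspect ratios (illustrative sketch)."""

# ASSUMED XC4VSX55 CLB array (rows x columns); verify against the data sheet.
DEVICE_ROWS, DEVICE_COLS = 128, 48
SLICES_PER_CLB = 4  # Virtex-4 CLB contains 4 slices


def aspect_ratio(height: int, width: int) -> float:
    """Aspect ratio as defined above: PRR height / PRR width."""
    return height / width


def clbs_for(slices: int) -> int:
    """CLBs needed to hold `slices` slices (ceiling division)."""
    return -(-slices // SLICES_PER_CLB)


def shape_sweep(slices: int, height_step: int = 16,
                rows: int = DEVICE_ROWS,
                cols: int = DEVICE_COLS) -> list[tuple[int, int, float]]:
    """(height, width, aspect ratio) of the narrowest PRR at each candidate height."""
    need = clbs_for(slices)
    shapes = []
    for h in range(height_step, rows + 1, height_step):
        w = -(-need // h)  # narrowest width that still holds `need` CLBs
        if w <= cols:      # skip shapes wider than the device
            shapes.append((h, w, aspect_ratio(h, w)))
    return shapes


if __name__ == "__main__":
    print(f"whole-device aspect ratio: {aspect_ratio(DEVICE_ROWS, DEVICE_COLS):.2f}")
    for h, w, ar in shape_sweep(5022):  # slice count of the Beamforming core
        print(f"{h:4d} x {w:3d}  aspect ratio {ar:5.2f}")
```

Under these assumptions, the whole device has an aspect ratio of about 2.67. That figure is useful to keep in mind when reading the guideline results that follow.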
Slide 18: Framework analysis: Beamforming (125 MHz, 40)
- 5022 slices
- 16 DSP48s
- 17 RAMB16s
- Baseline, non-PR performance: 1614 kB, 127.845 MHz
[Figure: clock frequency (MHz) and bitstream size (kB) versus aspect ratio]
Slide 19: Framework analysis: CFAR (100 MHz, 16)
- 2610 slices
- 2 DSP48s
- 34 RAMB16s
- Baseline, non-PR performance: 1001 kB, 103.616 MHz
[Figure: clock frequency (MHz) and bitstream size (kB) versus aspect ratio]
Slide 20: Framework analysis: AES (80 MHz, 13.75)
- 3634 slices
- 3943 registers
- 4 RAMB16s
- Baseline, non-PR performance: 1393 kB, 80.483 MHz
[Figure: bitstream size (kB) and clock frequency (MHz) versus aspect ratio]
Slide 21: Framework analysis: ARM7 (40 MHz, 6.8)
- 1826 slices
- 16 DSP48s
- 10 RAMB16s
- Baseline, non-PR performance: 872 kB, 40.985 MHz
[Figure: bitstream size (kB) and clock frequency (MHz) versus aspect ratio]
Slide 22: Framework analysis: Sine/Cosine LUT
- 107 slices
- 27 RAMB16s
- Baseline, non-PR performance: 571 kB, 204.918 MHz
[Figure: bitstream size (kB) and clock frequency (MHz) versus aspect ratio]
Slide 23: Framework analysis: PRR geometries
- Slice-intensive designs show the best bitstream size and clock frequency with an aspect ratio of around 2 to 4
- Roughly equivalent to the aspect ratio of the FPGA as a whole
- Non-slice-intensive designs show the best bitstream size with an aspect ratio >> 4
- Due to the columnar distribution of RAMB16/DSP48 resources on the chip
- Clock frequency is relatively insensitive to aspect ratio
- Not shown in the graphs: resource wastage also improves
- Results are more pronounced for high-frequency designs
- However, aspect ratio is not the only design consideration
- Placement on the chip relative to other regions, pins, or resources may affect (restrict) the choice of PRR shape
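The guideline above can be turned into a first-cut rule. Classify a core by the resource type it uses the largest fraction of, then suggest an aspect-ratio range. The sketch below uses the core resource counts from the framework-analysis slides, and treats DSP48 counts that those slides do not list as 0. The "largest device fraction" classification is an assumed heuristic, and the device totals (24,576 slices, 320 RAMB16s, 512 DSP48s for the XC4VSX55) should be checked against the Virtex-4 family overview. CFAR sits almost exactly on the boundary (about 10.6% of both slices and RAMB16s), which shows how crude a single-number rule is.

```python
"""Suggest an aspect-ratio range from a core's dominant resource (illustrative sketch)."""
import math

# ASSUMED XC4VSX55 totals; verify against the Virtex-4 family overview.
DEVICE_TOTALS = {"slices": 24576, "brams": 320, "dsp48s": 512}

# Resource counts from the framework-analysis slides. DSP48 counts not listed
# on a slide (AES, Sine/Cosine LUT) are assumed to be 0.
CORES: dict[str, dict[str, int]] = {
    "Beamforming": {"slices": 5022, "brams": 17, "dsp48s": 16},
    "CFAR": {"slices": 2610, "brams": 34, "dsp48s": 2},
    "AES": {"slices": 3634, "brams": 4, "dsp48s": 0},
    "ARM7": {"slices": 1826, "brams": 10, "dsp48s": 16},
    "Sine/Cosine LUT": {"slices": 107, "brams": 27, "dsp48s": 0},
}


def dominant_resource(usage: dict[str, int]) -> str:
    """Resource type with the largest fraction of the device used (assumed heuristic)."""
    return max(DEVICE_TOTALS, key=lambda r: usage.get(r, 0) / DEVICE_TOTALS[r])


def suggested_aspect_ratio(usage: dict[str, int]) -> tuple[float, float]:
    """(low, high) aspect-ratio range following the observations above."""
    if dominant_resource(usage) == "slices":
        return (2.0, 4.0)       # slice-intensive: around 2 to 4
    return (4.0, math.inf)      # RAMB16/DSP48-intensive: well above 4


if __name__ == "__main__":
    for name, usage in CORES.items():
        lo, hi = suggested_aspect_ratio(usage)
        print(f"{name:16s} {dominant_resource(usage):7s} -> aspect ratio {lo:g} to {hi:g}")
```

The suggested range is only a starting point. As noted above, placement relative to other regions, pins, and resources may still rule out the preferred shape.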
Slide 24: Conclusions
- Contributions of this work
- A taxonomy of PR system design flows and a design methodology for efficient development of each type
- Identification of relevant optimization variables and constraints
- Number of PRRs, optimal mapping of PRMs to PRRs, system floorplanning
- Proposal to incorporate them into a future automatic design tool
- Study of the effects of varying PRR shape on
- Maximum clock frequency
- Partial bitstream size
- Multiple classes of cores/designs
- Memory-intensive
- DSP-intensive
- Combinational logic-intensive
- Register-intensive
- Etc.
- Definition and delivery of PRR floorplanning guidelines
Slide 25: Questions