Design Framework for Partial Run-Time FPGA Reconfiguration

Slides: 26
Provided by: dral60

1
Design Framework for Partial Run-Time FPGA
Reconfiguration
  • Chris Conger, Ann Gordon-Ross,
  • and Alan D. George
  • Presented by Abelardo Jara-Berrocal
  • HCS Research Laboratory
  • College of Engineering
  • University of Florida

2
Outline
  • Introduction
  • Partial Reconfiguration (PR) Overview
  • Proposed Design Methodologies
  • Framework analysis
  • Conclusions

3
Introduction Fully reconfigurable systems
[Figure: fully reconfigurable system. Labels: battery, FPGA, configuration lines, system controller, general purpose I/O, bitstreams storage (Config 1, Config 2, Config 3), required design, shared memory, external I/O, Config 1 request, Config 2 request, design station]
1. Device too small for complex designs
2. Big full bitstreams (long reconfiguration time)
3. Complete system operation is halted prior to
reconfiguration

4
Introduction The Virtex 4 PR architecture
  • Newer Xilinx FPGA families offer a partial
    reconfiguration feature
  • A rectangular region of the FPGA can be
    reconfigured without affecting the remaining FPGA
    area
  • System can continue operating without
    interruption

[Figure: FPGA with Reconfigurable region 1 and Reconfigurable region 2]

5
Introduction A sample PR architecture
[Figure: sample PR architecture. Labels: battery, FPGA (static area, reconfigurable area), JTAG, base system configuration, bitstreams storage, external I/O, Module A request]
1. System controller does not need to be placed
in an external device
2. Access to fast Internal Configuration Access
Port (ICAP: 32 bits, 100 MHz)
3. Smaller partial bitstreams
4. No need to halt complete system when
reconfiguring a module
5. Time multiplexing of FPGA resources, load and
unload HW modules on demand
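
For a rough sense of scale, the peak ICAP bandwidth follows from the 32-bit width and 100 MHz clock quoted above. The sketch below assumes one full word is written per clock cycle with no protocol or memory-fetch overhead, which is an idealization; the function names and the 400,000-byte example size are hypothetical and do not come from the slides.

```python
def icap_peak_bytes_per_second(width_bits=32, clock_hz=100_000_000):
    """Theoretical peak ICAP throughput, assuming one word per clock cycle."""
    return width_bits // 8 * clock_hz


def reconfiguration_time_ms(bitstream_bytes, width_bits=32, clock_hz=100_000_000):
    """Lower bound on the time to write a bitstream through the ICAP, in ms."""
    return 1000 * bitstream_bytes / icap_peak_bytes_per_second(width_bits, clock_hz)


# Hypothetical partial bitstream of 400,000 bytes (size chosen for illustration).
print(icap_peak_bytes_per_second())      # 400000000 bytes/s
print(reconfiguration_time_ms(400_000))  # 1.0 ms
```

Real reconfiguration times are longer than this bound, since the bitstream must also be fetched from storage before it reaches the ICAP.
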
6
Introduction Current PR Design Flow
  • Steps
      • Partition the system into modules
      • Define static modules and reconfigurable modules
        (PRMs)
      • Decide the number of PR regions (PRRs)
      • Decide PRR sizes, shapes and locations
      • Map modules to PRRs
      • Define PRR interfaces, instantiate slice macros
        for PRR interfaces
  • Optimization problems
      • Design partitioning
      • Number of PRRs
      • PRR sizes, shapes and locations
      • Mapping PRMs to PRRs
      • Type and placement of PRR interfaces

[Figure: design partitioning (static modules, reconfigurable modules (PRMs)) followed by design floorplanning and budgeting on the FPGA (static region, regions 1 and 2, number of PRRs?)]

7
Introduction Early Access PR Design Flow
  • Introduced by Xilinx at FPL'06
  • Major improvements
      • Automatic implementation scripts
      • Rectangular regions (not full column
        reconfiguration)
      • Static nets can cross reconfigurable regions
      • Slice macros replace bus macros
  • Partitioning and floorplanning steps are manually
    executed
  • Design guidelines for these steps are not
    provided

[Figure: reconfigurable design specifications pass through design partitioning and design floorplanning and budgeting (manual), yielding placement and PRR constraints for the Xilinx PR implementation flow (automatic), which produces the full initial bitstream and the PRM bitstreams]
Potential for development of automatic CAD tools

8
Introduction Current PR design tools limitations
  • PR design is a very specialized task
  • Only a physical level of support is provided
  • Architectural knowledge of the target device is a
    must
  • Not very flexible, many design constraints
  • Partitioning and floorplanning steps are manually
    executed
  • No performance-sensitive design guidelines are
    provided
  • No automatic, heuristics-based design flow is
    available either
  • Lack of abstraction from low level details
    discourages designers from using PR
  • Difficult for many end users

In this work, we propose a taxonomy of PR system
design flows and an efficient methodology for each
type.

9
PR Overview Taxonomy of PR systems design flows
PR System Design Flow: special purpose or multipurpose
Special purpose
  • Highly specialized systems design
  • All PRMs that will exist on the system are known
    at design time
  • Each PRR is independently optimized (size, shape,
    location, interface) based on the PRMs that will
    be mapped to it
  • Output is
      • Floorplan defining a static region and a set of
        optimized PRRs
      • The set of PRMs that can be placed in each PRR
        (PRMs to PRRs mapping)
Multipurpose
  • Not optimized for a specific application
  • PRMs required by the application are not known
    when designing the base system
  • Goal is to design a flexible and reusable base
    design that can be used for several different PR
    systems
  • Base system designer defines a set of PRRs with
    fixed shapes, sizes, locations and interfaces
  • Generated floorplan is used as input template for
    the PRMs implementation

10
Proposed Design Methodology Special-Purpose
  • Partition the system into several hardware
    modules
  • Synthesize the hardware modules
  • Use a control flow graph (CFG) and a states table
    to represent
      • Application states and the transitions between
        them (execution path coverage)
      • Set of modules required in each application state

Let's see an example
11
Proposed Design Methodology Special-Purpose
  • Define region partitioning constraints

STATE   MODULES
S1      A, B, C
S2      A, B, C, F
S3      A, B, C, G
S4      A, B, D
S5      A, B, E
[Figure: control flow graph over states S1 to S5, annotated with modules C, D, E, F and G; legend: reconfigurable, static]
Establishing constraints
1. A and B are present in all states (static
modules)
2. C, D, E, F and G are reconfigurable modules
(PRMs)
3. F and G each coexist with C in some state (S2
and S3), so neither can be placed in the same PRR
as C
4. F, G, D and E can be placed in the same PRR
5. C, D and E can be placed in the same PRR
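
The constraints listed on this slide can be derived mechanically from the states table. The following is a minimal sketch under the assumption that only the states table is needed (the CFG transitions are ignored here); the function name `derive_constraints` is hypothetical and is not part of the presented framework.

```python
from itertools import combinations

# States table from the slide: the modules required in each application state.
STATES = {
    "S1": {"A", "B", "C"},
    "S2": {"A", "B", "C", "F"},
    "S3": {"A", "B", "C", "G"},
    "S4": {"A", "B", "D"},
    "S5": {"A", "B", "E"},
}


def derive_constraints(states):
    """Split modules into static and reconfigurable, and find PRMs that coexist."""
    all_modules = set().union(*states.values())
    static = set.intersection(*states.values())  # present in every state
    prms = all_modules - static                  # needed only in some states
    conflicts = set()
    for required in states.values():
        # Two PRMs required in the same state cannot share a PRR.
        for a, b in combinations(sorted(required & prms), 2):
            conflicts.add((a, b))
    return static, prms, conflicts


static, prms, conflicts = derive_constraints(STATES)
print(sorted(static))     # ['A', 'B']
print(sorted(prms))       # ['C', 'D', 'E', 'F', 'G']
print(sorted(conflicts))  # [('C', 'F'), ('C', 'G')]
```

Any two PRMs that do not appear as a conflicting pair never run in the same state, so they are candidates for sharing a PRR.
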
12
Proposed Design Methodology Special-Purpose
  • Define the number of PRRs to be used
      • Optimization variable
      • Number is computed based on CFG and states table

[Figure: how many PRRs? 1? 4?]
  • Define a PRMs to PRRs mapping
      • Optimization problem
      • Combinatorial design space
      • Design space is reduced using design constraints

Static region   PRR 1     PRR 2
A, B            C, D, E   F, G
Possible solution (not necessarily the optimal)
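
To illustrate how the design constraints shrink the combinatorial mapping space, here is a brute-force sketch. It assumes the only constraint is that C may not share a PRR with F or G (taken from the example constraints), treats PRRs as labeled, and uses hypothetical function names; a real tool would also weigh resource and floorplanning costs.

```python
from itertools import product

PRMS = ["C", "D", "E", "F", "G"]
# Pairs that may not share a PRR (from the example constraints).
CONFLICTS = {("C", "F"), ("C", "G")}


def is_valid(mapping):
    """A mapping {PRM: PRR index} is valid if no conflicting pair shares a PRR."""
    return all(mapping[a] != mapping[b] for a, b in CONFLICTS)


def valid_mappings(num_prrs):
    """Enumerate every valid assignment of PRMs to num_prrs labeled PRRs."""
    for choice in product(range(num_prrs), repeat=len(PRMS)):
        mapping = dict(zip(PRMS, choice))
        if is_valid(mapping):
            yield mapping


def min_prrs():
    """Smallest number of PRRs for which at least one valid mapping exists."""
    for k in range(1, len(PRMS) + 1):
        if next(valid_mappings(k), None) is not None:
            return k


print(min_prrs())                         # 2
print(sum(1 for _ in valid_mappings(2)))  # 8 of the 32 unconstrained mappings
```

The mapping shown on the slide (C, D, E in PRR 1; F, G in PRR 2) is one of the eight valid two-PRR assignments.
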
13
Proposed Design Methodology Special-Purpose
  • And when do we size our PRRs?
  • Don't worry, it is our next step

Modules profile: slices, BRAMs and DSP48s required by each module
  • Required static region resources (Modules A and B):
    resources are added
  • Required PRR 1 resources (Modules C, D and E):
    maximum of each resource type
  • Required PRR 2 resources (Modules F and G):
    maximum of each resource type

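
The sizing rule on this slide (static region resources are added; each PRR needs the maximum of each resource type over its PRMs) can be sketched as follows. The per-module numbers are invented for illustration and the function names are hypothetical; only the sum and maximum rules and the example mapping come from the slides.

```python
RESOURCES = ("slices", "brams", "dsp48s")

# Hypothetical per-module resource profiles (values invented for illustration).
PROFILES = {
    "A": {"slices": 900, "brams": 4, "dsp48s": 0},
    "B": {"slices": 600, "brams": 2, "dsp48s": 2},
    "C": {"slices": 500, "brams": 8, "dsp48s": 0},
    "D": {"slices": 700, "brams": 2, "dsp48s": 4},
    "E": {"slices": 300, "brams": 6, "dsp48s": 8},
    "F": {"slices": 400, "brams": 0, "dsp48s": 6},
    "G": {"slices": 450, "brams": 3, "dsp48s": 2},
}


def static_requirements(modules):
    """Static modules are all present at once, so their resources add up."""
    return {r: sum(PROFILES[m][r] for m in modules) for r in RESOURCES}


def prr_requirements(modules):
    """Only one PRM occupies a PRR at a time, so take the per-type maximum."""
    return {r: max(PROFILES[m][r] for m in modules) for r in RESOURCES}


print(static_requirements(["A", "B"]))   # {'slices': 1500, 'brams': 6, 'dsp48s': 2}
print(prr_requirements(["C", "D", "E"])) # {'slices': 700, 'brams': 8, 'dsp48s': 8}
print(prr_requirements(["F", "G"]))      # {'slices': 450, 'brams': 3, 'dsp48s': 6}
```
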
14
Proposed Design Methodology Special-Purpose
  • Define the PRR sizes, shapes, locations inside
    the FPGA fabric
  • Floorplanning optimization problem
  • Proper metrics for PRR performance analysis are
    required
  • Design guidelines for efficient PRR floorplanning
    are also a necessity
  • Define PRR interfaces
      • Place slice macros

[Figure: final optimized custom base system floorplan. The FPGA holds the static region, PRR1 and PRR2; PRR1 is a reconfigurable region with enough resources for the PRR 1 resources, and the same is done for PRR2]

15
Proposed Design Methodology Special-Purpose
  • Methodology outputs
      • Custom base system
      • PRMs to PRRs mapping
  • They are used as input files for the automatic
    Xilinx PR Design Flow

16
Proposed Design Methodology Special-Purpose
  • Opportunity to automate this flow through design
    tools
  • Optimization variables
      • Number of PRRs
      • PRR sizes, shapes, and locations
      • PRMs to PRRs mapping
      • Other additional optimization variables can be
        defined
  • Several possible cost functions
      • Area wastage
      • Power usage
      • Application latency
      • Throughput

17
Framework analysis PRR Geometries
  • PR system design flows require
      • Proper metrics for PRR performance analysis
      • Design guidelines for efficient PRR floorplanning
  • Study of the effects of varying PRR shape over
      • Maximum Clock Frequency
      • Partial Bitstream Size
  • Five separate test cores
      • Beamforming (DSP/slice)
      • CFAR (slice/memory)
      • AES (register)
      • ARM7 softcore (hybrid)
      • Sine/Cosine LUT (memory)
  • Performed on V4SX55 thus far

Aspect ratio = PRR height / PRR width
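
As an illustration of how the aspect ratio (PRR height divided by PRR width) could drive a shape search, the sketch below enumerates rectangles that provide a required area within an aspect-ratio window. It assumes an idealized uniform grid of cells and ignores the columnar placement of BRAM and DSP48 resources; the function names and the numbers in the example are hypothetical.

```python
def aspect_ratio(height, width):
    """Aspect ratio as defined on the slide: PRR height / PRR width."""
    return height / width


def candidate_shapes(required_area, min_ratio, max_ratio, max_height, max_width):
    """Rectangles (height, width) that cover the area within the ratio window."""
    shapes = []
    for width in range(1, max_width + 1):
        # Smallest height that provides the required area at this width.
        height = -(-required_area // width)  # ceiling division
        ratio = aspect_ratio(height, width)
        if height <= max_height and min_ratio <= ratio <= max_ratio:
            shapes.append((height, width))
    return shapes


# Hypothetical example: 96 grid cells, aspect ratio between 2 and 4,
# on a grid of at most 64 rows by 32 columns.
print(candidate_shapes(96, 2, 4, max_height=64, max_width=32))
# [(20, 5), (16, 6), (14, 7)]
```
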
18
Framework analysis Beamforming (125 MHz, 40)
  • 5022 slices
  • 16 DSP48s
  • 17 RAMB16s
  • Baseline, non-PR performance: 1614 kB, 127.845
    MHz

[Plots: clock frequency (MHz) and bitstream size (kB) versus aspect ratio]

19
Framework analysis CFAR (100 MHz, 16)
  • 2610 slices
  • 2 DSP48s
  • 34 RAMB16s
  • Baseline, non-PR performance: 1001 kB, 103.616
    MHz

[Plots: clock frequency (MHz) and bitstream size (kB) versus aspect ratio]

20
Framework analysis AES (80 MHz, 13.75)
  • 3634 slices
  • 3943 registers
  • 4 RAMB16s
  • Baseline, non-PR performance: 1393 kB, 80.483
    MHz

[Plots: bitstream size (kB) and clock frequency (MHz) versus aspect ratio]

21
Framework analysis ARM7 (40 MHz, 6.8)
  • 1826 slices
  • 16 DSP48s
  • 10 RAMB16s
  • Baseline, non-PR performance: 872 kB, 40.985 MHz

[Plots: bitstream size (kB) and clock frequency (MHz) versus aspect ratio]

22
Framework analysis Sine/Cosine LUT
  • 107 slices
  • 27 RAMB16s
  • Baseline, non-PR performance: 571 kB, 204.918
    MHz

[Plots: bitstream size (kB) and clock frequency (MHz) versus aspect ratio]

23
Framework analysis PRR Geometries
  • Slice-intensive designs show best bitstream
    size/clock frequency performance with aspect
    ratio around 2-4
  • Roughly equivalent to aspect ratio of the FPGA as
    a whole
  • Non-slice-intensive designs show best bitstream
    performance with aspect ratio >> 4
  • Due to columnar distribution of RAMB16/DSP48
    resources on chip
  • Clock frequency relatively insensitive to aspect
    ratio
  • Not shown in graph: resource wastage also
    improved
  • Results are more pronounced for high frequency
    designs
  • However, aspect ratio not the only design
    consideration
  • Placement on a chip relative to other regions,
    pins, or resources may affect (restrict) choice
    of PRR shape

24
Conclusions - Contributions of this work
  • Taxonomy for PR systems design flows and a design
    methodology for efficient development of each
    type
  • Identification of relevant optimization variables
    and constraints
  • Number of PRRs, optimal mapping of PRMs to PRRs,
    system floorplanning
  • Propose their incorporation in a future automatic
    design tool
  • Study of the effects of varying PRR shape
  • Maximum Clock Frequency
  • Partial Bitstream Size
  • Multiple classes of cores/designs
  • Memory-intensive
  • DSP-intensive
  • Combinational Logic-intensive
  • Register-intensive
  • Etc.
  • PRR floorplanning guidelines definitions and
    delivery

25
Questions