Title: Physically-aware HW-SW Partitioning for Reconfigurable Architectures with Partial Dynamic Reconfiguration
1 Physically-aware HW-SW Partitioning for Reconfigurable Architectures with Partial Dynamic Reconfiguration
- Sudarshan Banerjee, Elaheh Bozorgzadeh, Nikil Dutt
- Center for Embedded Computer Systems (CECS)
- Donald Bren School of Information and Computer Sciences, University of California, Irvine
- http://www.cecs.uci.edu/aces
Work partially supported by NSF grants CCR-0203813, CCR-0205712
2 Outline
- Introduction
- Problem overview
- Dynamically reconfigurable architecture
- Placement considerations in scheduling
- Detailed problem description
- Related work
- Approach
- Experiments
- Conclusion
3 Introduction
[Figure: co-design flow. A design specification is partitioned into SW, HW, and interface (I/f) parts; platform blocks are labeled P, M, and HW, linked by communication.]
- Partial Dynamic Reconfiguration (RTR): reuse hardware → more performance
4 Problem Overview
[Figure: task dependency graph and HW-SW platform]
- Objective: minimize application execution time (schedule length)
- Map each task to hardware or software
5 Dynamically Reconfigurable Architecture
[Figure: off-chip memory, on-chip shared memory, and a CLB array of given height and width, with a task Tj marked on the array]
- Single context
- Column-based partial RTR
6 Criticality of linear placement: simple example
[Figure: task graph with tasks T1 to T4, and a schedule of those tasks over columns C1 to C4 (width) against execution time, with time t2 marked]
Simple example of infeasibility
7 Criticality of linear placement: simple example
[Figure: two schedules of tasks T1 to T4 over columns C1 to C4 (width) against execution time, with time t2 marked; one schedule is labeled infeasible, the other feasible]
8 Criticality of linear placement: detailed example
[Figure: detailed example with tasks 1 to 10 and columns C1 to C9]
Simultaneous scheduling and placement to guarantee feasibility
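The point of these examples, that enough free area does not by itself guarantee a feasible placement, can be illustrated with a minimal Python sketch. The function name and the boolean column-occupancy encoding are illustrative assumptions and do not come from the slides.

```python
def first_fit_column(occupied, width):
    """Return the leftmost start column of `width` contiguous free columns.

    `occupied` holds one boolean per FPGA column (True = in use).
    Returns None when no contiguous run of free columns is wide enough.
    """
    run = 0  # length of the current run of free columns
    for col, busy in enumerate(occupied):
        run = 0 if busy else run + 1
        if run == width:
            return col - width + 1
    return None


# Hypothetical snapshot of four columns C1..C4: C2 and C4 are occupied.
occupied = [False, True, False, True]
free_area = occupied.count(False)

print(free_area >= 2)                 # area-only check: True
print(first_fit_column(occupied, 2))  # contiguity check: None
```

An area-only model would accept a two-column task here, while a columnar placement model rejects it, which is why scheduling and placement are handled together.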
10 Detailed Problem Description
Key considerations for HW-SW partitioning
- Consider physical constraints (columnar task placement) as an integral part of the problem
- Consider configuration prefetch to hide reconfiguration overhead
- Consider multiple task implementation points (compiler optimizations; heterogeneity, i.e. dedicated resources such as embedded multipliers)
Feasible, high-quality solutions (short schedule length)
- For each task, a suitable implementation point is determined
- For each task, the start of execution is determined
- For each task implemented on HW, the task location is determined
11 Related work
- Large body of work in HW-SW partitioning: Gupta et al. (93), Vahid et al. (97), Eles et al. (97), Chatha et al. (00) → no partial RTR considerations
- Work on joint scheduling and placement for dependency graphs: Fekete et al. (DATE 01), Yuh et al. (ICCAD 04) → theoretical treatment (closer to rectangle-packing), no configuration prefetch considerations
- Other work on hiding reconfiguration latency: configuration reuse, configuration caching (Li et al., FCCM 00), etc. → no joint scheduling and placement considerations
12 Related Work in HW-SW partitioning for partial RTR
- Mei et al., ProRISC 2000
  - Genetic algorithm for HW-SW partitioning with partial RTR
  - Columnar architecture
  - No configuration prefetch considerations
- Jeong et al., ASPDAC 2000
  - ILP, heuristic for HW-SW partitioning with partial RTR
  - Detailed configuration prefetch considerations
  - Bottleneck of reconfiguration mechanism
  - No placement considerations
14 Approach
- Exact formulation (ILP: integer linear programming)
- Key variables
  - x_{i,j,k}, r_{i,j,k}: task i scheduled (reconfigured) at time-step j, placed on the FPGA starting from column k
- Constraints
  - Traditional HW-SW partitioning, contiguous linear placement, configuration prefetch
- Implementation in CPLEX: extremely slow
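To make the role of the two variable families concrete, here is a hedged sketch of what a few constraints over x_{i,j,k} and r_{i,j,k} might look like. It is not the formulation from the talk: the symbols s_i (task i mapped to software) and R_i (reconfiguration time of task i in time-steps) are assumptions introduced for illustration, and the column non-overlap constraints are omitted.

```latex
% Illustrative only. s_i and R_i are assumed symbols, not from the slides.
\begin{align}
  % every task is implemented exactly once: in SW, or at one (time, column) in HW
  s_i + \sum_{j}\sum_{k} x_{i,j,k} &= 1 && \forall i \\
  % a HW task is reconfigured exactly once, starting at the column where it executes
  \sum_{j} r_{i,j,k} &= \sum_{j} x_{i,j,k} && \forall i,\ \forall k \\
  % configuration prefetch: reconfiguration must finish before execution starts
  x_{i,j,k} &\le \sum_{j' \le j - R_i} r_{i,j',k} && \forall i,\ \forall j,\ \forall k \\
  % a single reconfiguration controller: at most one reconfiguration in progress
  \sum_{i}\sum_{k}\ \sum_{j' = j - R_i + 1}^{j} r_{i,j',k} &\le 1 && \forall j
\end{align}
```

Indexing both variables by time-step and start column is what lets one model express scheduling, prefetch, and placement together; it also suggests why the variable count grows quickly with graph size and schedule length.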
15 KLFM quick overview
- KLFM (Kernighan-Lin/Fiduccia-Mattheyses)-based partitioning
- Move-based approach
- Widely used in circuit partitioning
- Integration with scheduling for HW-SW partitioning: Chatha et al. (00), Vallejo et al. (03), etc.
Kernel of KLFM-based approaches:
  while (more unlocked nodes)
    bestNode = SELECT NODE
    MOVE AND LOCK (bestNode)
    UPDATE NEIGHBOURS (bestNode)
  endwhile
16 Modified KLFM for HW-SW partitioning with partial RTR
- Additional considerations
  - Linear placement
  - Multiple implementation points
Modified KLFM kernel for partial RTR:
  while (more unlocked nodes)
    for each unlocked node
      for each non-current implementation point of node
        calculate makespan by physically-aware list-scheduling
    bestNode = SELECT NODE
    MOVE AND LOCK (bestNode, implementation point)
    UPDATE NEIGHBOURS (bestNode)
  endwhile
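The modified kernel can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `makespan` callable stands in for the physically-aware list scheduler, the function and parameter names are hypothetical, and every move is re-evaluated in each iteration, so no separate neighbour-update step is needed.

```python
def klfm_pass(bindings, options, makespan):
    """One KLFM-style pass over all tasks.

    bindings: dict task -> current implementation point
    options:  dict task -> list of candidate implementation points
    makespan: callable(bindings) -> schedule length (assumed stand-in
              for the physically-aware list scheduler)
    Returns the best bindings seen during the pass and their makespan.
    """
    current = dict(bindings)
    best, best_len = dict(current), makespan(current)
    unlocked = set(current)
    while unlocked:
        # Evaluate moving each unlocked task to each non-current point.
        candidates = []
        for task in sorted(unlocked):
            for point in options[task]:
                if point != current[task]:
                    trial = {**current, task: point}
                    candidates.append((makespan(trial), task, point))
        if not candidates:
            break
        length, task, point = min(candidates)
        # Move and lock the best node, even if the move is uphill.
        current[task] = point
        unlocked.remove(task)
        if length < best_len:
            best, best_len = dict(current), length
    return best, best_len


# Toy data: execution time per task and implementation point (made up).
times = {"T1": {"SW": 10, "HW": 4}, "T2": {"SW": 8, "HW": 3}}


def toy_makespan(b):
    # Toy cost: tasks run back to back; not a real scheduler.
    return sum(times[t][p] for t, p in b.items())


start = {"T1": "SW", "T2": "SW"}
options = {t: ["SW", "HW"] for t in times}
result = klfm_pass(start, options, toy_makespan)
print(result)
```

Accepting uphill moves while remembering the best configuration seen is what lets a move-based pass escape local minima; the cost is one scheduler call per candidate move.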
17 Physically-aware List-scheduling
- Each task bound to implementation point
- Priority of HW tasks based on physical considerations
List-scheduling kernel for next task selection:
  for each schedulable task  // all parent dependencies satisfied
    compute EST (earliest start time of computation)
    compute EFT (earliest finish time): EFT = EST + task execution time
  choose task that maximizes f(EST, path length, width, EFT)
- EST computation embeds physical and architectural constraints
f = -A·width - B·EST + C·(path length) - D·EFT
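A small Python sketch of the task-selection step may help. It assumes the priority function has the form f = -A·width - B·EST + C·(path length) - D·EFT, so that narrow, early, critical-path tasks are preferred; that sign convention, the unit weights, and the `Candidate` class are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    est: float          # earliest start time of computation
    exec_time: float    # execution time at the bound implementation point
    width: int          # columns needed (assumed 0 for a SW task)
    path_length: float  # longest path from this task to a sink

    @property
    def eft(self):
        # Earliest finish time = EST + task execution time.
        return self.est + self.exec_time


def priority(c, A=1.0, B=1.0, C=1.0, D=1.0):
    # Weighted priority; the weights A..D are tuning parameters.
    return -A * c.width - B * c.est + C * c.path_length - D * c.eft


def select_next(schedulable, **weights):
    # Choose the schedulable task that maximizes the priority function.
    return max(schedulable, key=lambda c: priority(c, **weights))


ready = [
    Candidate("T2", est=5, exec_time=3, width=2, path_length=12),
    Candidate("T3", est=5, exec_time=4, width=1, path_length=6),
]
print(select_next(ready).name)
```

With unit weights, T2 scores -3 and T3 scores -9, so T2 is selected even though it is wider, because it lies on a longer path.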
18 EST computation: example
[Figure: timeline showing Task 6 on SW, EXECUTE Task 1, PREFETCH gap, RECONFIG Task 5, and HW-SW communication (6,5)]
19 EST computation approach
[Figure: tasks placed on columns C1 to C6 over time steps 1 to 9, showing execution slots (E1 to E4), reconfiguration slots (R3, R4), and a gap]
First-fit placement
- Find earliest slot where the task can be placed
- Find earliest slot where the reconfiguration controller is free (and space is available)
- If (reconfig. finish time < parent finish time): EST = parent finish time (overhead hidden, possible gap)
- Else: EST = reconfig. finish time
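The EST rule above can be sketched in Python. This is a simplified model under stated assumptions: each column is described only by the time it becomes free, there is a single reconfiguration controller, ties are broken toward the leftmost column, and all names are hypothetical.

```python
def compute_est(col_free, ctrl_free, width, reconfig_time, parent_finish):
    """Return (est, start_column, reconfig_start) for a HW task.

    col_free:      time at which each column becomes free
    ctrl_free:     time at which the reconfiguration controller is free
    parent_finish: latest finish time over the task's parents
    Returns None if the task is wider than the device.
    """
    best = None
    for k in range(len(col_free) - width + 1):
        # All `width` contiguous columns must be free before reconfiguring.
        space_ready = max(col_free[k:k + width])
        reconfig_start = max(space_ready, ctrl_free)
        # Strict '<' keeps the leftmost column among equally early slots.
        if best is None or reconfig_start < best[1]:
            best = (k, reconfig_start)
    if best is None:
        return None
    k, reconfig_start = best
    reconfig_finish = reconfig_start + reconfig_time
    # If reconfiguration ends before the parents finish, its overhead is
    # hidden (possibly leaving a gap); otherwise it delays the task.
    est = max(reconfig_finish, parent_finish)
    return est, k, reconfig_start


cols = [4, 4, 0, 0, 6]
print(compute_est(cols, ctrl_free=2, width=2, reconfig_time=3, parent_finish=7))
print(compute_est(cols, ctrl_free=2, width=2, reconfig_time=3, parent_finish=3))
```

In the first call the reconfiguration finishes at time 5, before the parents finish at time 7, so the overhead is hidden and EST is 7. In the second call the parents finish at time 3, so EST is the reconfiguration finish time, 5.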
20 Heterogeneous Architecture
[Figure: off-chip memory, on-chip shared memory, and a heterogeneous CLB array of given height and width]
- Single context
- Column-based partial RTR
21 Heterogeneity considerations
- Heterogeneous implementations often more efficient
  - Synthesis of 2-dimensional DCT on Virtex-II chip, XC2V2000, under columnar placement and routing constraints
  - Homogeneous implementation: 64 MHz
  - Heterogeneous implementation: 88 MHz, using embedded multipliers (MULTX18) and embedded memory (BRAM)
- Limited resources, only at fixed columnar locations
- Extending approach for heterogeneity
  - Primary modification: task placement
  - Simple approach: add type descriptor for each column
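A per-column type descriptor could be used in first-fit placement roughly as sketched below. The type labels ("CLB", "MULT"), the left-to-right list of required column types, and the function name are assumptions for illustration only.

```python
def first_fit_typed(column_types, occupied, required):
    """Leftmost start column where `required` matches free columns.

    column_types: type descriptor of each device column
    occupied:     one boolean per column (True = in use)
    required:     column types the task needs, left to right
    Returns None if no matching free window exists.
    """
    n, w = len(column_types), len(required)
    for k in range(n - w + 1):
        if all(not occupied[k + i] and column_types[k + i] == required[i]
               for i in range(w)):
            return k
    return None


# Hypothetical device: multiplier columns only at fixed positions 2 and 5.
column_types = ["CLB", "CLB", "MULT", "CLB", "CLB", "MULT", "CLB"]
occupied = [False] * 7
occupied[2] = True  # the first multiplier column is already in use

print(first_fit_typed(column_types, occupied, ["CLB", "MULT"]))
```

Because dedicated resources sit only at fixed columns, the task that needs a multiplier column has to be placed at column 4, next to the remaining free multiplier column, although plain CLB columns further left are free.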
23 Experimental Setup
- Application graph structures and a large set of problem instances generated with TGFF (Dick et al., CODES 98)
- Varying graph size
- Varying in-degree/out-degree
- Varying area constraints
- Multiple implementation points
- Heterogeneity
24 Experiments on feasibility
[Table: placement-unaware versus placement-aware schedule lengths (Topt, Theu); data not recoverable]
- Exact schedule length with area consideration
- Exact schedule length with columnar placement (new ILP)
25 Heuristic quality (sample results)
LPF: nodes on Longest Path First
Sample experiments for graphs with 60 tasks
26 Heuristic quality (aggregate results)
Quality = (Tlpf - Theu)/Theu × 100
- Gain increases as area constraint increases
- Gain increases as graph size increases
27 Case study on JPEG encoding
- Basis for numerical data
- Synthesis under placement and routing constraints on XC2V2000
- Execution times corresponding to a 256 x 256 colour image
Aggregate unconstrained task area: 11 columns (homogeneous). Area constraint: 8 columns
28 Execution time
Execution time: O(minutes) for graphs with 100 nodes, 20 columns
[Figure: execution time (s) versus graph size (number of vertices)]
Run-time measurements: SunOS 5.8 with 502 MHz sparcv9 processor
29 Conclusion
Contribution
- Comprehensive HW-SW partitioning for partial RTR
- Partitioning, scheduling, linear placement
- Multiple implementation points, heterogeneity
- Reasonable run-time: O(minutes) for graphs with 100s of tasks
- Current limitation of simple approach
- More sophisticated placement considerations
- Future work
- More investigations on multiple implementations, heterogeneity
30 Thank You!
- Questions/Comments?
- E-mail: banerjee_at_uci.edu