Accelerators for FPGA Placement - PowerPoint PPT Presentation

Description:

4 outputs, 2 registered 2 non-registered. Courtesy: Richard Sevcik, Xilinx ... crossover volume moves higher. FPGA vs. ASIC Cost ...

Provided by: ACMU

Transcript and Presenter's Notes

Title: Accelerators for FPGA Placement


1
Accelerators for FPGA Placement
  • Pritha Banerjee
  • Advanced Computing Microelectronics Unit
  • Indian Statistical Institute, Kolkata

2
Outline of this talk
  • Introduction to FPGAs
  • Problem Formulation
  • Cone based FPGA Placement
    • Initial placement
    • Low temperature SA
  • Placement by Space Filling Curve
    • Generation of linear placement
    • Initial placement by SFC
    • Refinement by Low temperature SA
  • Concluding Remarks

3
Island-style FPGA Architecture
Figure: array-based FPGA model, a grid of logic blocks (L, LUT-based logic block/slice), connection blocks (C) and switch blocks (S). Callouts: programmable connection switch, programmable routing switch, short wire segment, long wire segment.
4
Simplified FPGA Logic Block/Slice
  • An FPGA slice has
    • 2 LUTs with 4, 5 or 6 inputs
    • 2 registers
    • Carry logic for fast adders
    • 4 outputs: 2 registered, 2 non-registered

Figure: Slice 0, with two registers, each having D, Q, CE, PRE and CLR pins.
Courtesy Richard Sevcik, Xilinx
5
A Decade of Progress
  • 200x More Logic
    • Plus memory, µP etc.
  • 40x Faster
  • 50x Lower Power
  • 500x Lower Cost

Figure: CLB capacity, speed, power per MHz and price plotted against year ('91 to '04) on a 1x to 1000x scale, with device families XC4000, Spartan, Spartan-2, Spartan-3, Virtex, Virtex-E, Virtex-II, Virtex-II Pro and Virtex-4 marked.
Courtesy Richard Sevcik, Xilinx
6
FPGA vs. ASIC Cost
  • ASIC: high volumes needed to recover design cost
  • ASIC cost/part is lower
  • ASIC design cost is much higher (and increasing)!
Figure: total cost versus volume.
Courtesy Richard Sevcik, Xilinx
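
The cost trade-off on this slide can be illustrated with a small calculation. The sketch below assumes a simple linear model (total cost = one-time design cost + volume × cost per part); the model, the function names and all dollar figures are hypothetical and are not taken from the slides.

```python
def total_cost(design_cost: float, cost_per_part: float, volume: int) -> float:
    """Linear cost model (an assumption): fixed design cost plus per-part cost."""
    return design_cost + cost_per_part * volume


def crossover_volume(fpga_design: float, fpga_part: float,
                     asic_design: float, asic_part: float) -> float:
    """Volume at which FPGA and ASIC total costs are equal.

    Solves fpga_design + fpga_part * v == asic_design + asic_part * v for v.
    Requires fpga_part > asic_part (ASIC cost/part is lower).
    """
    return (asic_design - fpga_design) / (fpga_part - asic_part)


# Hypothetical figures, for illustration only.
base = crossover_volume(fpga_design=100_000, fpga_part=50,
                        asic_design=2_000_000, asic_part=10)
# Doubling the (hypothetical) ASIC design cost pushes the crossover higher.
higher = crossover_volume(fpga_design=100_000, fpga_part=50,
                          asic_design=4_000_000, asic_part=10)
```

Under this model, a rising ASIC design cost raises the volume needed before the ASIC becomes the cheaper option.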
7
FPGA Design Flow
Circuit description (VHDL, schematic)
Synthesize (technology map) to logic blocks
Place logic blocks in FPGA
Route connections between logic blocks
FPGA programming file
8
FPGA Placement Problem
  • Input: A technology-mapped netlist of
    Configurable Logic Blocks (CLBs) realizing a
    given circuit.
  • Output: The CLB netlist placed in a two-dimensional
    array of slots such that total
    wirelength is minimized.

Figure: a CLB netlist (inputs i1 to i4, blocks 1 to 10, outputs f1 and f2) and its placement on the FPGA array.
9
Problem Formulation
  • Given
    • Set of modules M = {m1, m2, ..., mn}
    • Set of signals S = {s1, s2, ..., sq}
    • Set of locations L = {l1, l2, ..., lp}, p ≥ |M|
    • ∀ mi ∈ M, there is a set of signals S_mi ⊆ S
    • ∀ si ∈ S, there is a set of modules M_si = {mj | si ∈ S_mj}
    • M_si is said to be a signal net

Goal: To assign each module mi ∈ M to a location
lj ∈ L such that the chosen objective function
is optimized.
10
Existing Approaches for FPGA Placement
  • VPR (1997)
    • Uses Simulated Annealing (SA)
    • Adaptive Annealing Schedule
  • Tabu Search Based Method (1999)
  • TCO (2004)
    • Temperature schedule and probability of
      acceptance derived from laws of thermodynamics
  • Force Directed Placement
  • Genetic Algorithm Based Placement
  • Partitioning and Clustering Based Techniques

11
Accelerators for FPGA Placement
Initial placement quality does affect the speed
of convergence in iterative refinement methods!
  • Cone based initial placement with iterative
    refinement by simulated annealing
  • Cost metrics
  • Low temperature Simulated Annealing
  • Placement by Space Filling Curves followed by low
    temp. SA

12
Part I: A Cone based Initial Placement for FPGAs
13
Motivation of our work
  • Salient features of previous approaches
    • initial placement done at random
    • improvement through iteration is time-consuming
  • Our motivation: a fast placement method to
    accelerate the iterative phase without
    sacrificing quality
  • Approach
    • better initial placement using a constructive
      method
    • quicker convergence of the iterative phase

14
Our workflow
Technology mapped netlist
Cone based Initial Placement (Accelerator)
Initial Placement
Low temperature simulated annealing
Final Placement
15
Preliminaries
  • For a given CLB netlist, a graph G(V, E) is
    defined where
    • V = {v | v is a CLB / primary input (I) / primary
      output (O)}
    • E = {<vi, vj> | vi ∈ fanin(vj) and vj ∈
      fanout(vi)}
  • Cone(Oi) = {u | ∃ a simple directed path
    from u to Oi in G}
  • Bounding Box
    • A rectangular region containing all bj ∈
      fanout(bi)
  • Cost: BBcost (à la VPR), defined in terms of
    • Nnets: number of nets
    • bbx(i), bby(i): horizontal and vertical span of the
      bounding box of net i
    • q(i): 1 for nets with 3 or fewer terminals,
      increasing to 2.79 for nets with 50 terminals
    • Cav,x(i), Cav,y(i): average channel capacities in the
      x and y directions
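
A bounding-box cost built from these quantities can be sketched as follows. The VPR linear congestion cost is usually written as the sum over all nets of q(i) × [bbx(i) / Cav,x(i)^β + bby(i) / Cav,y(i)^β]; the sketch takes β = 1 and a single average channel capacity per direction. The linear interpolation used for q(i) between 3 and 50 terminals, the clamp above 50 terminals, and counting spans in slots (max − min + 1) are simplifying assumptions; VPR itself uses a lookup table for q(i).

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


def q_factor(num_terminals: int) -> float:
    """Stand-in for q(i): 1.0 up to 3 terminals, 2.79 at 50 terminals.

    Values in between are linearly interpolated (an assumption).
    """
    if num_terminals <= 3:
        return 1.0
    if num_terminals >= 50:
        return 2.79
    return 1.0 + (2.79 - 1.0) * (num_terminals - 3) / (50 - 3)


def bb_cost(nets: List[List[str]], placement: Dict[str, Coord],
            cav_x: float = 1.0, cav_y: float = 1.0) -> float:
    """Sum over nets of q(i) * (bbx(i) / cav_x + bby(i) / cav_y)."""
    total = 0.0
    for net in nets:
        xs = [placement[block][0] for block in net]
        ys = [placement[block][1] for block in net]
        bbx = max(xs) - min(xs) + 1  # horizontal span in slots
        bby = max(ys) - min(ys) + 1  # vertical span in slots
        total += q_factor(len(net)) * (bbx / cav_x + bby / cav_y)
    return total
```

With both capacities left at 1.0, the cost reduces to a q-weighted half-perimeter wirelength, which is convenient for comparing two placements of the same netlist.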

16
Algorithm Overview
  • Initial Placement
    • Generate an n × n array, n = sqrt(number of CLBs)
    • Place each Oi at the boundary of the array at
      random
    • Trace cone(Oi) till all blocks bi are placed
  • Trace Cone
    • Find a bi ∈ fanin(bj) such that bi is not yet placed and bj
      is already placed
    • Place block bi
  • Place Block
    • Find the smallest rectangle enclosing bb_fanin(bi) ∪
      bb_fanout(bi)
    • Find an optimal position (empty slot) within the
      bounding box
    • If there is no empty slot, extend the bounding box
    • Repeat the process until an empty slot is found
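
The three steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: every block (CLB, primary input, primary output) shares one grid, n is rounded up and sized from the total block count, the "optimal" slot is taken to be the free slot with the smallest total Manhattan distance to already placed neighbours, and the box grows by one slot on each side when full. The function name and data layout are hypothetical.

```python
import math
import random
from typing import Dict, List, Set, Tuple

Coord = Tuple[int, int]


def cone_initial_placement(fanin: Dict[str, List[str]], outputs: List[str],
                           seed: int = 0) -> Tuple[Dict[str, Coord], int]:
    """Place every block reachable backwards from the outputs; returns (placement, n)."""
    rng = random.Random(seed)
    blocks = set(fanin) | {u for us in fanin.values() for u in us} | set(outputs)
    fanout: Dict[str, List[str]] = {b: [] for b in blocks}
    for v, us in fanin.items():
        for u in us:
            fanout[u].append(v)

    def boundary(size: int) -> List[Coord]:
        edge = (0, size - 1)
        return [(x, y) for x in range(size) for y in range(size)
                if x in edge or y in edge]

    n = math.ceil(math.sqrt(len(blocks)))
    while len(boundary(n)) < len(outputs):  # need room for all outputs on the rim
        n += 1

    placed: Dict[str, Coord] = {}
    occupied: Set[Coord] = set()
    for out, slot in zip(outputs, rng.sample(boundary(n), len(outputs))):
        placed[out] = slot
        occupied.add(slot)

    def place_block(b: str) -> None:
        # Bounding box of the already placed fan-in and fan-out neighbours of b.
        nbrs = [placed[v] for v in fanin.get(b, []) + fanout[b] if v in placed]
        x0, x1 = min(x for x, _ in nbrs), max(x for x, _ in nbrs)
        y0, y1 = min(y for _, y in nbrs), max(y for _, y in nbrs)
        while True:
            free = [(x, y)
                    for x in range(max(0, x0), min(n - 1, x1) + 1)
                    for y in range(max(0, y0), min(n - 1, y1) + 1)
                    if (x, y) not in occupied]
            if free:
                best = min(free, key=lambda s: sum(abs(s[0] - x) + abs(s[1] - y)
                                                   for x, y in nbrs))
                placed[b] = best
                occupied.add(best)
                return
            x0, y0, x1, y1 = x0 - 1, y0 - 1, x1 + 1, y1 + 1  # extend the box

    for out in outputs:  # trace cone(out) backwards through fan-in edges
        stack = [out]
        while stack:
            bj = stack.pop()
            for bi in fanin.get(bj, []):
                if bi not in placed:
                    place_block(bi)
                    stack.append(bi)
    return placed, n
```

Blocks that lie in no output cone are left unplaced by this sketch; a full flow would need a separate pass for them.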

17
An Example
Figure: a cone of the example netlist and the placement of that cone on the array, with the bounding box of net 9 marked.
18
A Running Example (contd.)
19
A Running Example (contd.)
Fan-in and fan-out of 13: {0, 14, 15} and {6}
20
Benchmark Circuit Details
Q (Quality) = 100 × ..., T (Time) = 100 × ..., Algo ∈ {VPR, TCO}
21
Experimental Results: Initial cost
22
Remarks on Initial Temperature
  • Existing Approach (VPR)
    • High initial temperature when a random placement
      is given
    • Initially all moves are accepted
    • Tinit = 20σ, where σ = std. dev. of cost
      over Nblocks random moves; coefficient derived
      empirically
    • Range limit is set to the maximum span of the 2D array
  • Our Approach: Low initial temperature
    • Need to generate a low enough initial temperature to
      match the better quality initial solution on the
      annealing curve
    • Not all moves are accepted
    • Tinit = 0.025σ, where σ = std. dev. of cost over
      Nblocks random moves; coefficient derived
      empirically by us
    • Range limit is set to 1
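
The effect of the two coefficients, read here as Tinit = 20σ and Tinit = 0.025σ, can be seen with a small calculation. The sketch assumes the usual Metropolis acceptance rule exp(−Δcost / T) for uphill moves and uses the population standard deviation; the sample costs are hypothetical and the function names are illustrative.

```python
import math
import statistics
from typing import Sequence


def initial_temperature(costs: Sequence[float], coeff: float) -> float:
    """Tinit = coeff * sigma, sigma = std. dev. of the costs seen over random moves."""
    return coeff * statistics.pstdev(costs)


def acceptance_probability(delta_cost: float, temperature: float) -> float:
    """Metropolis rule (assumed): downhill always accepted, uphill with exp(-d/T)."""
    if delta_cost <= 0:
        return 1.0
    if temperature <= 0:
        return 0.0
    return math.exp(-delta_cost / temperature)


# Hypothetical costs recorded over a handful of random moves.
sample_costs = [100.0, 110.0, 90.0, 105.0, 95.0]
t_high = initial_temperature(sample_costs, 20.0)    # VPR-style start
t_low = initial_temperature(sample_costs, 0.025)    # low-temperature start

p_high = acceptance_probability(10.0, t_high)  # uphill move of +10 at high T
p_low = acceptance_probability(10.0, t_low)    # same move at low T
```

At the high temperature almost every uphill move is accepted, which would undo a good constructive placement; at the low temperature such moves are almost never accepted, so annealing only refines the initial solution.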

23
Experimental Results: Accl LTSA vs. VPR
With initial temperature Tinit and the adaptive
SA schedule of VPR, our initial placement
converges very fast while maintaining quality.
24
Experimental Results: Accl LTSA vs. TCO
25
Summary of Results for MCNC Benchmarks
26
Accelerators for FPGA Placement
  • Cone based initial placement with iterative
    refinement by simulated annealing
  • Cost metrics
  • Low temperature Simulated Annealing
  • Initial placement by Space Filling Curves
    followed by Low Temp SA

27
Part II: FPGA Placement by Space Filling Curves
28
Motivation
  • Observations from previous work
    • Stochastic methods like Simulated Annealing and
      Genetic Algorithms yield good quality solutions
    • SA based techniques take an enormous amount of time
  • Our motivation
    • Development of a much faster initial placement
      method to accelerate FPGA placement
    • Quality of the placement should be comparable to
      that of the SA based FPGA place and route tool VPR

29
Our Workflow
Technology mapped netlist
Find a linear order of netlist blocks
Linear order of blocks
Accelerator
Place the linearly ordered list using Space
Filling Curves (Snake, Hilbert or Z Curve)
Low temperature simulated annealing
Final Placement
30
Step 1: Linear ordering of a netlist
  • 2D placement problem mapped to a 1D placement
    problem
  • Requirement: the CLBs of the netlist are to be
    assigned to equally spaced slots on a line such
    that the total wirelength is minimized.
  • Our Approach
    • Min-cut based partitions of at most two CLBs per
      partition.
    • Netlist hypergraph is bi-partitioned recursively
      to obtain a nearly linear order
    • A popular tool, hMETIS, is used for hypergraph
      bi-partitioning.

Problem with current method: the linear order
obtained here needs further refinement.
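
The recursive bi-partitioning idea can be sketched as below. hMETIS is not used here: as a stand-in, the sketch finds a balanced minimum-cut bisection by brute force, which is only practical for very small netlists. The function names and the net representation (a list of frozensets of block names) are assumptions made for illustration.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

Net = FrozenSet[str]


def min_cut_bisection(blocks: Sequence[str],
                      nets: Sequence[Net]) -> Tuple[List[str], List[str]]:
    """Balanced bisection with the fewest cut nets (exhaustive stand-in for hMETIS)."""
    all_blocks = set(blocks)
    best_left, best_cut = None, None
    for combo in combinations(blocks, len(blocks) // 2):
        left = set(combo)
        right = all_blocks - left
        # A net is cut if it touches both halves of the current sub-problem.
        cut = sum(1 for net in nets if left & net and right & net)
        if best_cut is None or cut < best_cut:
            best_left, best_cut = left, cut
    return ([b for b in blocks if b in best_left],
            [b for b in blocks if b not in best_left])


def linear_order(blocks: Sequence[str], nets: Sequence[Net],
                 max_leaf: int = 2) -> List[str]:
    """Recursively bisect until each partition holds at most max_leaf blocks."""
    if len(blocks) <= max_leaf:
        return list(blocks)
    left, right = min_cut_bisection(blocks, nets)
    return linear_order(left, nets, max_leaf) + linear_order(right, nets, max_leaf)
```

The sketch does not decide which of the two halves should come first, nor the order inside a leaf, which is one reason the result is only nearly linear and benefits from later refinement.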
31
Step 2: Space-filling Curves (SFC)
  • Preliminaries
    • Provide a linear traversal or indexing of a
      multidimensional grid space
    • Commonly used to reduce a multi-dimensional
      problem to a one-dimensional problem.
  • Our objective
    • Reverse mapping the linear order (1D) onto a 2D
      grid
    • Exploit the locality preserving property of SFC

32
Step 2: Space-filling Curves (Contd.)
  • A sequence of 2D SFCs of successive orders follows
    a recursive framework.

Figure: Hilbert curve and Z curve of successive orders.
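
The reverse mapping from a 1D index to a 2D slot can be sketched for the three curves named in the workflow. These are the standard index-to-coordinate conversions, not code from the presentation; the Hilbert and Z mappings assume a grid side that is a power of two, the snake is taken row by row across the full grid width, and the function names are illustrative.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]


def snake_d2xy(n: int, d: int) -> Coord:
    """Row-by-row snake: even rows run left to right, odd rows right to left."""
    row, col = divmod(d, n)
    if row % 2 == 1:
        col = n - 1 - col
    return col, row


def z_d2xy(d: int) -> Coord:
    """Z (Morton) curve: x takes the even bits of d, y the odd bits."""
    x = y = bit = 0
    while d:
        x |= (d & 1) << bit
        y |= ((d >> 1) & 1) << bit
        d >>= 2
        bit += 1
    return x, y


def hilbert_d2xy(n: int, d: int) -> Coord:
    """Hilbert curve on an n x n grid (n a power of two), standard d-to-(x, y)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x  # rotate the quadrant
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def place_along_curve(order: List[str], n: int, curve: str) -> Dict[str, Coord]:
    """Assign the k-th block of the linear order to the k-th slot on the curve."""
    mapping = {
        "snake": lambda d: snake_d2xy(n, d),
        "z": z_d2xy,
        "hilbert": lambda d: hilbert_d2xy(n, d),
    }[curve]
    return {block: mapping(d) for d, block in enumerate(order)}
```

For example, `place_along_curve([str(i) for i in range(1, 17)], 4, "hilbert")` lays a 16-block linear order onto a 4 × 4 array. Consecutive slots on the snake and Hilbert curves are always grid neighbours, while the Z curve makes occasional jumps.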
33
Our Method by An Example
Step 1: Generation of nearly linear ordered CLB
netlist
Linear order: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Step 2: Placement of linear list using Space
Filling Curve
Figure: the linear order placed on the array along the Hilbert, Z and Snake curves.
34
Step 3: Low Temp. SA
  • With initial temperature Tinit and the adaptive
    SA schedule of VPR,
    our initial placement converges very fast while
    maintaining quality.
  • Our Approach: Low initial temperature
    • Need to generate a low enough initial temperature to
      match the better quality initial solution on the
      annealing curve
    • Not all moves are accepted
    • Tinit = coeff × σ, where σ = std. dev. of cost
      over Nblocks random moves; coefficient derived
      empirically by us
    • Range limit is set to 1

35
Experimental Result: Initial Cost
  • K: height of snake curve
  • Time to place ordered blocks using space-filling
    curve is negligible
  • This placement can be refined by low temp. SA to
    obtain near-optimal quality

36
Experimental Result: Final Cost after LTSA
37
Experimental Result: Speed-up by our method
38
Concluding Remarks
  • Extending our placement methods to
    advanced FPGA architectures (Xilinx Virtex)
    having preplaced blocks like RAM,
    microprocessors etc.
  • Development of a placement method for
    3D FPGA architectures

39
Thank you