Title: Physical Synthesis Comes of Age
1 Physical Synthesis Comes of Age
Chuck Alpert, IBM Corp.
Chris Chu, Iowa State University
Paul Villarrubia, IBM Corp.
2 Physical Synthesis Family Tree
- Roles of layout as a parent
- Clean up the mess created by physical synthesis (implement the netlist generated by physical synthesis)
- Provide guidance to physical synthesis so that it will do things right
- Is layout mature enough to serve the role?
- Is there still room for layout to grow?
[Figure: family tree with Synthesis, Layout, and Physical Synthesis]
3 New Requirements of Placement
- Super fast
- 4 to 8 million objects now
- Provide quick feedback to physical synthesis to refine the netlist
- Stable in handling incremental placement
- Physical synthesis constantly makes changes to the netlist
- Flexible objective function
- Timing, Power, Routability
- Handle mixed-size modules
- Hierarchical design and use of IP blocks are common
4 Placement As a Baby
- Simulated annealing based placement
- Popularized by TimberWolf [DAC-86]
- Greedy algorithm:
- You only have 1 chance.
- If you get stuck, I will terminate you!
- Simulated annealing:
- OK to make mistakes. Keep trying!
- Evaluation/Feedback is important.
- Strength
- Good quality for small designs
- Easy to consider different objective functions
- Handle incremental changes well
- Weakness
- Very slow (crawling)
- Non-trivial to handle modules of different sizes
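The greedy versus annealing contrast above can be sketched in a few lines. This is a minimal illustration, not TimberWolf: the swap move, the geometric cooling schedule, and every parameter value are assumptions chosen for brevity, and `hpwl` and `anneal` are hypothetical names.

```python
import math
import random


def hpwl(nets, pos):
    """Half-perimeter wirelength: bounding-box width plus height, summed over nets."""
    total = 0
    for net in nets:
        xs = [pos[cell][0] for cell in net]
        ys = [pos[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal(nets, pos, t_start=5.0, t_end=0.01, cooling=0.95,
           moves_per_temp=200, seed=0):
    """Swap-based annealing. pos maps cell -> (x, y) slot.

    Returns (best placement seen, its cost). All schedule values are
    illustrative assumptions.
    """
    rng = random.Random(seed)
    pos = dict(pos)
    cells = sorted(pos)
    cost = hpwl(nets, pos)
    best, best_cost = dict(pos), cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            a, b = rng.sample(cells, 2)
            pos[a], pos[b] = pos[b], pos[a]          # trial move: swap two cells
            delta = hpwl(nets, pos) - cost
            # A greedy placer would accept only delta <= 0. Annealing also
            # accepts an uphill move with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost += delta
                if cost < best_cost:
                    best, best_cost = dict(pos), cost
            else:
                pos[a], pos[b] = pos[b], pos[a]      # reject: undo the swap
        t *= cooling                                  # cool down
    return best, best_cost
```

Changing the objective only means replacing `hpwl`, which matches the "easy to consider different objective functions" strength. Recomputing the full cost on every move is also one reason this naive version is slow.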
5 Placement As a Kid
- Min-cut placement (or partitioning-based placement)
- An old idea [Breuer, DAC-77]
- Capo [DAC-00] leverages the breakthrough in partitioning using multi-level techniques (e.g., hMetis [DAC-97], MLFM [DAC-97])
- Dragon [ICCAD-00] combines hierarchical partitioning with annealing
- Strength
- Efficient and scalable
- Very good wirelength, but can we do better?
- Weakness
- More difficult to handle other objectives
- Not stable in handling incremental changes
- Not good at white space management
[Figure: Circuit and Placement Region]
6 White Space in Min-Cut Placement
Capo (Min-Cut), adaptec2: HPWL 9955
APlace (Analytical), adaptec2: HPWL 8715
Courtesy IBM
7 Placement Maturing
- Analytical placement
- Used by 4 of the top 5 placers in the ISPD-05 Placement Contest and the top 5 placers in the ISPD-06 Placement Contest
- Strength
- Fastest and scalable
- Best wirelength
- Robust framework to incorporate different objectives and constraints
- Stable in handling incremental changes
- Good at white space management
- Why would analytical placement work so well?
- Can see the big picture
- Why was it not popular in the past?
- Hard to spread modules evenly in the placement region
8 Attempt: Still Relying on Partitioning
- GORDIAN: Global Optimization and Rectangle Dissection [TCAD-91]
- Artificial center-of-mass constraints disturb the globally optimal solution too drastically
[Figure: centers of mass]
9 Another Partitioning-based Spreading
- Quadratic optimization with quadrisection [Vygen, DAC-97]
Courtesy IBM
10 Spreading by Density-based Force
- Kraftwerk [DAC-98]
- Quadratic wirelength minimization
- Spread cells by additional forces
- Density-based force to push cells away from dense to sparse regions
- Great idea
- Spread cells smoothly
- Very good wirelength
- But not very fast
- Constant force, hard to control convergence
- Density-based force expensive to compute
11 Dramatic Speedup
- FastPlace [ISPD-04]
- repeat
- Solve quadratic program to minimize wirelength
- Spread the cells
- until cell distribution is roughly even
- Reduce wirelength by iterative heuristic
- Hybrid Net Model
- Speed up solving of QP
- Cell Shifting
- Simple technique to compute spreading force
- Fast convergence due to the use of pseudo-net [Hu et al., ISPD-02] instead of constant force
- Iterative Local Refinement
- More efficient than using QP to refine the solution
- Minimize wirelength based on linear objective
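The "solve quadratic program" step can be illustrated in one dimension. This is a minimal sketch, not the FastPlace implementation: it assumes two-pin connections only, a dense matrix, and hypothetical names (`solve_quadratic_1d`, `edges`, `fixed_pos`). Setting the gradient of the sum of w * (x_i - x_j)^2 to zero gives a linear system A x = b.

```python
import numpy as np


def solve_quadratic_1d(n_movable, edges, fixed_pos):
    """Minimize the sum of w * (x_i - x_j)**2 over the movable coordinates.

    edges     : list of (i, j, w) two-pin connections
    indices   : 0 .. n_movable-1 are movable cells; any other index must be
                a key of fixed_pos (pads, or pseudo-net anchor points)
    fixed_pos : dict mapping fixed index -> coordinate
    """
    A = np.zeros((n_movable, n_movable))
    b = np.zeros(n_movable)
    for i, j, w in edges:
        for u, v in ((i, j), (j, i)):
            if u < n_movable:
                A[u, u] += w                      # pull on u from this connection
                if v < n_movable:
                    A[u, v] -= w                  # movable neighbor
                else:
                    b[u] += w * fixed_pos[v]      # fixed neighbor goes to the right-hand side
    # A is singular if some movable cell has no path to a fixed point.
    return np.linalg.solve(A, b)
```

In this formulation a pseudo-net is nothing special: it is one more edge from a cell to a fixed point, so it only adds to the diagonal of A and to b. A production placer would use a sparse matrix and an iterative solver instead of a dense solve.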
12 Linearization of Quadratic Wirelength
- New Kraftwerk [ICCAD-06]
- BoundingBox net model for multi-pin nets
- Need to know the outermost pins of a net
- Accurately models HPWL
- Faster and uses less memory than the clique model
- Two fundamental components of spreading force
- Hold force: constant force
- Move force: enforced by pseudo-net to fixed point
[Figure: BoundingBox vs. clique net model]
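Why a bounding-box style model can reproduce HPWL with a quadratic cost is easier to see in code. The sketch below uses one common weighting, w = 1 / ((P - 1) * |x_i - x_j|), with every pin connected to the two outermost pins. The exact weights are an assumption here, since the slide does not state them, and the function names are hypothetical.

```python
def hpwl_1d(xs):
    """Half-perimeter wirelength of one net in one dimension."""
    return max(xs) - min(xs)


def bounding_box_edges(xs):
    """Two-pin connections (i, j, w) for one net in one dimension.

    Every pin is connected to the two outermost pins, and the two outermost
    pins to each other. Weights are evaluated at the current positions xs.
    """
    p = len(xs)
    lo = min(range(p), key=lambda k: xs[k])
    hi = max(range(p), key=lambda k: xs[k])
    pairs = {(lo, hi)}
    for k in range(p):
        if k not in (lo, hi):
            pairs.add((lo, k))
            pairs.add((k, hi))
    edges = []
    for i, j in pairs:
        d = abs(xs[i] - xs[j])
        if d > 0:                                  # coincident pins add no length
            edges.append((i, j, 1.0 / ((p - 1) * d)))
    return edges


def quadratic_cost(xs, edges):
    """Sum of w * (x_i - x_j)**2 over the given connections."""
    return sum(w * (xs[i] - xs[j]) ** 2 for i, j, w in edges)
```

At the positions where the weights were computed, each term w * d^2 collapses to |d| / (P - 1), and the connection lengths sum to (P - 1) times the net's span, so the quadratic cost equals the HPWL. The model needs 2P - 3 connections per net, against P(P - 1)/2 for a clique.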
13 Relaxation Rather than Linearization
- RQL [DAC-07]
- Adds Force Vector Modulation to the FastPlace framework
- Currently fastest and best wirelength
- Rank modules based on the spreading force magnitude
- Nullify the spreading force for the top 5-10% of modules
[Figure: spreading force magnitude vs. module index]
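The rank-and-nullify step can be sketched as follows. This is an illustration of the idea only, not RQL's code: it reads the slide's "top 5-10" as a percentage of modules (an assumption), and `modulate_forces` and `top_fraction` are hypothetical names.

```python
import numpy as np


def modulate_forces(forces, top_fraction=0.05):
    """Zero out the spreading force of the modules with the largest magnitudes.

    forces       : (n, 2) array-like of per-module spreading force vectors
    top_fraction : fraction of modules whose force is nullified (assumed 5-10%)
    """
    forces = np.asarray(forces, dtype=float)
    magnitude = np.linalg.norm(forces, axis=1)       # rank key: force magnitude
    n_nullify = int(np.ceil(top_fraction * len(forces)))
    out = forces.copy()
    if n_nullify > 0:
        largest = np.argsort(magnitude)[-n_nullify:]  # indices of the top modules
        out[largest] = 0.0
    return out
```

Modules with the largest spreading forces are the ones that would be moved farthest from their wirelength-optimal positions, so relaxing exactly those forces trades a little spreading for wirelength.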
14 An Alternative Analytical Approach
- APlace [ISPD-04], mPL5 [ISPD-05], NTUPlace3 [ICCAD-06]
- Log-sum-exponential function to approximate HPWL
- Naylor et al., US Patent 2001
- Density constraint is directly formulated into the objective function
- Very competitive wirelength and runtime
                   | APlace                  | NTUP3                   | mPL6                    | RQL
Wirelength Model   | Log-sum-exponential     | Log-sum-exponential     | Log-sum-exponential     | Quadratic
Spreading Force    | Density potential based | Density potential based | Density potential based | Fixed-point based
Spreading Force    | Bell-shaped             | Bell-shaped             | Poisson smoothed        | Fixed-point based
Objective Function | Non-linear, non-convex  | Non-linear, non-convex  | Non-linear, non-convex  | Quadratic
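The log-sum-exponential model replaces the max and min in HPWL with smooth approximations. A minimal one-dimensional sketch follows; `gamma` is the usual smoothing parameter, and the function name and the max-shift for numerical stability are choices made here rather than details from the slides.

```python
import numpy as np


def lse_wirelength_1d(xs, gamma):
    """Smooth approximation of max(xs) - min(xs).

    Uses gamma * log(sum(exp(x / gamma))) for the max and the same
    expression on -x for the negated min. Smaller gamma is closer to HPWL.
    """
    xs = np.asarray(xs, dtype=float)

    def smooth_max(v):
        m = v.max()                    # shift by the max to avoid overflow in exp
        return m + gamma * np.log(np.sum(np.exp((v - m) / gamma)))

    return smooth_max(xs) + smooth_max(-xs)
```

The result always lies between the true span and the span plus 2 * gamma * ln(n) for an n-pin net. Because it is differentiable everywhere, it can be combined with a smoothed density term into a single non-linear objective, as in the placers in the table above.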
15 Placement Getting Old or Still Young?
- Better approach than quadratic / analytical approach?
- Massive parallelism to speed up placement
- Better clustering technique
- Macro placement / floorplanning
- True timing-driven placement
16 Sufficient Parental Guidance?
- All physical synthesis gets from placement is distance info
- Physical synthesis has a distorted world view!
- Wirelength estimation is inaccurate (especially for nets with high pin count)
- Congestion estimation is inaccurate
- Area estimation is inaccurate
- Without buffering and gate sizing
- Timing estimation is very inaccurate
17 Routing-Driven Physical Synthesis
- Need a more integrated approach
- Past: Placement-Driven Physical Synthesis
- Future: Routing-Driven Physical Synthesis
- Main obstacle
- Runtime
- Two possibilities
- 1. Construct Steiner trees to guide synthesis and placement
- 2. Perform global routing to guide synthesis and placement
18 Fast Steiner Tree Construction
- FLUTE (Fast LookUp Table Estimation) [ICCAD-04, ISPD-05]
- An extremely fast and accurate rectilinear Steiner tree algorithm
- Very suitable for VLSI applications
- Optimal up to degree 9, very accurate up to degree 100
- Over all 1.57 million nets in 18 IBM circuits [ISPD-98]
[Figure: comparison of RMST, RSTT, SPAN, BGA, BI1S, and FLUTE]
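To see what a Steiner tree buys over a spanning tree, the sketch below compares a rectilinear minimum spanning tree (Prim's algorithm) with the optimal rectilinear Steiner tree for the three-pin case, where a single Steiner point at the median coordinates is optimal. This is not FLUTE and uses no lookup tables; the function names are hypothetical.

```python
def manhattan(a, b):
    """Rectilinear distance between two points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def rmst_length(pins):
    """Length of a rectilinear minimum spanning tree (Prim's algorithm, O(n^2))."""
    in_tree = {0}
    dist = [manhattan(pins[0], p) for p in pins]   # distance from each pin to the tree
    total = 0
    while len(in_tree) < len(pins):
        k = min((i for i in range(len(pins)) if i not in in_tree),
                key=lambda i: dist[i])
        total += dist[k]
        in_tree.add(k)
        for i in range(len(pins)):
            dist[i] = min(dist[i], manhattan(pins[k], pins[i]))
    return total


def rsmt_length_3pin(pins):
    """Optimal rectilinear Steiner tree length for exactly three pins.

    All three pins connect to one Steiner point at the median x and median y;
    the resulting length equals the half-perimeter of the bounding box.
    """
    assert len(pins) == 3
    xs = sorted(p[0] for p in pins)
    ys = sorted(p[1] for p in pins)
    steiner = (xs[1], ys[1])
    return sum(manhattan(p, steiner) for p in pins)
```

For pins at (0, 0), (4, 0), and (2, 3), the spanning tree has length 9 while the Steiner tree has length 7. For higher-degree nets no such closed form exists, which is where a fast Steiner tree algorithm becomes valuable.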
19 Is Steiner Tree Sufficient?
- Steiner trees do not consider detours due to routing congestion or buffering congestion
- Can we predict the impact of congestion on routing?
- There is no way for generic estimators to accurately estimate the congestion of arbitrary global routers!
Circuit | Labyrinth(70) cong | Labyrinth(50) cong | Labyrinth(50) match | Chi Dispersion cong | Chi Dispersion match
ibm01   | 238                | 268                | 54                  | 122                 | 44
ibm02   | 368                | 390                | 89                  | 46                  | 7
ibm03   | 247                | 214                | 47                  | 1                   | 0
ibm04   | 588                | 596                | 261                 | 273                 | 161
ibm06   | 367                | 391                | 81                  | 9                   | 1
ibm07   | 568                | 643                | 162                 | 122                 | 55
ibm08   | 486                | 655                | 138                 | 30                  | 18
ibm09   | 377                | 399                | 69                  | 12                  | 3
ibm10   | 501                | 376                | 93                  | 27                  | 16
[Figure labels: Congestion by router 1, Congestion by router 2, match]
20 Traditional Global Routing
- Simultaneous approach (e.g., ILP)
- Very slow
- Sequential approach
- Net-by-net routing, rip-up and reroute
- Maze routing for a net (Lee's, Dijkstra's, A*-search algorithms)
- Reasonably fast
- Reasonably good quality
- Is it good enough to handle the demand of physical synthesis?
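Maze routing for a single two-pin connection can be sketched with Lee's algorithm, which is a breadth-first wave expansion on the routing grid. The grid encoding (0 = free, 1 = blocked) and the name `lee_route` are assumptions for this example.

```python
from collections import deque


def lee_route(grid, src, dst):
    """Shortest path from src to dst on a grid, or None if dst is unreachable.

    grid : list of rows, where grid[r][c] == 1 marks a blocked cell
    src, dst : (row, col) tuples on free cells
    """
    rows, cols = len(grid), len(grid[0])
    prev = {src: None}                 # visited set plus back-pointers
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:                 # retrace from the target to the source
            path = []
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        r, c = cur
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in prev):
                prev[(nr, nc)] = cur
                queue.append((nr, nc))
    return None
```

The wave may visit every grid cell for each connection, which is why maze routing dominates runtime. Dijkstra's algorithm and A* search follow the same structure with a priority queue, which lets congestion enter as a cell or edge cost.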
21 Progress in Global Routing
- Pattern routing [Kastner et al., ICCAD-00]
- L-shaped, Z-shaped routes
- Faster
- Better cost functions for maze routing [Hadsell & Madden, DAC-03; Pan & Chu, ICCAD-06]
- Reduce overflow significantly
- Congestion-driven Steiner tree construction [Pan & Chu, ICCAD-06]
- Much faster because of much less reliance on maze routing
- Negotiated congestion by PathFinder [FPGA-95]
- Used by BoxRouter [ICCAD-07], FGR [ICCAD-07], Archer [ICCAD-07]
- Excellent routing ability
- Very slow because it takes a long time to build congestion history
- Wanted: techniques that are both fast and high quality
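Pattern routing avoids the maze search by trying only a few fixed shapes. The sketch below evaluates the two L-shaped candidates for a two-pin connection and keeps the cheaper one. It assumes a per-cell congestion cost for simplicity, whereas global routers typically track capacity on grid edges, and the function names are hypothetical.

```python
def l_route(src, dst, corner):
    """Cells on the path src -> corner -> dst, where both legs are axis-aligned."""
    def segment(a, b):
        (r0, c0), (r1, c1) = a, b
        if r0 == r1:                                   # horizontal leg
            step = 1 if c1 >= c0 else -1
            return [(r0, c) for c in range(c0, c1 + step, step)]
        step = 1 if r1 >= r0 else -1                   # vertical leg
        return [(r, c0) for r in range(r0, r1 + step, step)]
    return segment(src, corner) + segment(corner, dst)[1:]


def best_l_route(cost, src, dst):
    """Pick the cheaper of the two L-shaped routes under a per-cell cost grid."""
    candidates = [
        l_route(src, dst, (src[0], dst[1])),           # horizontal first, then vertical
        l_route(src, dst, (dst[0], src[1])),           # vertical first, then horizontal
    ]
    return min(candidates, key=lambda path: sum(cost[r][c] for r, c in path))
```

Only two candidates are evaluated per connection, each in time proportional to its length, compared with a wave expansion over the whole region for maze routing. Z-shaped routes extend the same idea with one more bend position to enumerate.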
22 What Should We Do Next?
- Integration of global routing into placement
- An initial attempt: IPR [DAC-07]
- Integration of FastPlace, FastDP, FLUTE and FastRoute
- Significantly improves routability and wirelength in good runtime
- Incorporate buffering and gate sizing into integrated placement and routing
- Much more accurate timing information
- Should also help congestion and placement density control
- Integration with logic synthesis
- In other words, we need
- Better basic algorithms: placement, Steiner tree, global routing, buffering, gate sizing, etc.
- Clever ways of integration
- It is an (EDA) family problem. Let's work together!
23 Thank You