Title: DAC 2005 Session 35-2: Timing-Driven Placement by Grid-Warping
1 DAC 2005 Session 35-2: Timing-Driven Placement by Grid-Warping
- Zhong Xiu, Rob A. Rutenbar
- Department of Electrical and Computer Engineering
- Carnegie Mellon University
2 Timing-Driven Placement
[Figure: design flow from RTL/Logic Synthesis through Physical Synthesis. Courtesy Juergen Koehl, IBM]
- Despite 30 years of progress, still an important problem
- Why? Placement determines
- Your overall chip area
- Most of your max clock speed
- Timing is very critical to target
- If you miss timing specs, you've failed
3 Placement by Grid-Warping
- In Zhong et al., DAC04, we showed the first grid-warping placer
- Fundamentally new idea for placement improvement
- Imagine we place the gates on the surface of a flexible elastic sheet
- We stretch the sheet to improve the placement
[Figure: quadratic initial placement; warp placement surface; improved warped result; recurse/descend to continue]
4 Grid Warping: Attractive Features
- Novel paradigm for placement: optimize the grid, not the gates
- Think gravity: we reshape the curvature of space to move the mass
- Flexibly nonlinear
- Free to warp any way we like; not driven primarily by linear solves
- Low-dimensional optimization problem
- We only need to control the sheet; we don't move gates individually
- Early prototype WARP1 performs well
- Competitive on wirelength with other published placers
- As fast as or faster than many other analytical placers
5 Organization of this Talk
- What's missing? A timing-driven formulation of grid-warping
- Wirelength optimization is necessary but not sufficient
- We must be able to insert a warping placer into a standard timing flow
- First, we review the basic mechanics of grid-warping
- Second, we show some new wirelength optimizations
- Useful to combat the inevitable degradation of performance when we must optimize both wirelength and timing concurrently
- Finally, we show how to extend grid-warping for timing
- By adding slack-based net weighting to the warping flow
6 Review: Mechanics of Grid-Warping
- It's conceptually useful to think of warping as distorting a regular mesh placed on the elastic placement surface
- ...but this is not actually how we implement warping
[Figure: quadratic initial placement; warp placement surface; improved warped result; recurse/descend to continue]
7 We Formulate Warping in an Inverse Way
- We warp to acquire a new set of gates in each unit grid area
- ...then pull the gates back to the undistorted grid to move them
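The pull-back step can be illustrated with a minimal sketch. It assumes (this is not stated in the talk) that each warped cell is a convex quadrilateral mapped bilinearly onto an axis-aligned cell of the undistorted grid; the gate's position inside the warped cell is inverted with Newton's method to local coordinates (u, v), which are then reused inside the undistorted cell. The names `bilinear`, `inverse_bilinear` and `pull_back` are hypothetical.

```python
# Sketch only: bilinear cell mapping is an assumption, not the WARP implementation.
Point = tuple[float, float]


def bilinear(quad: list[Point], u: float, v: float) -> Point:
    """Map (u, v) in [0,1]^2 into quad = [p00, p10, p11, p01]."""
    (x00, y00), (x10, y10), (x11, y11), (x01, y01) = quad
    x = (1 - u) * (1 - v) * x00 + u * (1 - v) * x10 + u * v * x11 + (1 - u) * v * x01
    y = (1 - u) * (1 - v) * y00 + u * (1 - v) * y10 + u * v * y11 + (1 - u) * v * y01
    return x, y


def inverse_bilinear(quad: list[Point], p: Point, iters: int = 30) -> Point:
    """Solve bilinear(quad, u, v) == p for (u, v) using Newton's method."""
    (x00, y00), (x10, y10), (x11, y11), (x01, y01) = quad
    u, v = 0.5, 0.5  # start at the centre of the cell
    for _ in range(iters):
        x, y = bilinear(quad, u, v)
        rx, ry = x - p[0], y - p[1]  # residual
        # Jacobian of the bilinear map at (u, v).
        dxdu = (1 - v) * (x10 - x00) + v * (x11 - x01)
        dydu = (1 - v) * (y10 - y00) + v * (y11 - y01)
        dxdv = (1 - u) * (x01 - x00) + u * (x11 - x10)
        dydv = (1 - u) * (y01 - y00) + u * (y11 - y10)
        det = dxdu * dydv - dxdv * dydu
        # Newton step: solve J [du, dv]^T = [rx, ry]^T and subtract.
        u -= (rx * dydv - dxdv * ry) / det
        v -= (dxdu * ry - dydu * rx) / det
    return u, v


def pull_back(quad: list[Point], p: Point, cell_origin: Point, cell_size: Point) -> Point:
    """Move a gate at p (inside the warped quad) into the undistorted cell."""
    u, v = inverse_bilinear(quad, p)
    return cell_origin[0] + u * cell_size[0], cell_origin[1] + v * cell_size[1]
```

For a cell stretched to twice its width, `pull_back([(0, 0), (2, 0), (2, 1), (0, 1)], (1.0, 0.5), (0, 0), (1, 1))` returns the cell centre `(0.5, 0.5)`: gates that the stretch captured are compressed back into the unit area.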
8 And We Do Not Use a Regular Warping Grid
[Figure: 2x2 and 4x4 warping grids]
- Instead, we use a grid defined by a set of slicing cuts
- It turns out this allows a greater range of motion for the gates
- Yes, a lot like quadrisection or partitioning, but more general
- The cuts need not be axis-parallel
- Because gates are fully placed in each region, we get real wirelength
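A 2x2 slicing grid with non-axis-parallel cuts can be sketched as one cut that splits the sheet in two, followed by a separate cut inside each half. This is an illustrative assumption about how such a grid could be represented (each cut as a directed line through two points), not the WARP data structure; `side` and `assign_regions` are hypothetical names.

```python
# Sketch: assign gates to the 4 regions of a slicing grid with slanted cuts.
Point = tuple[float, float]
Line = tuple[Point, Point]  # two points on a directed cut line


def side(line: Line, p: Point) -> int:
    """Return 0 if p lies left of (or on) the directed line, else 1."""
    (ax, ay), (bx, by) = line
    cross = (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)
    return 0 if cross >= 0 else 1


def assign_regions(gates: list[Point], first_cut: Line,
                   second_cuts: tuple[Line, Line]) -> dict[int, list[Point]]:
    """Slice with first_cut, then slice each half with its own second cut.

    Region index = 2 * half + quarter, so each region is one cell of the grid.
    """
    regions: dict[int, list[Point]] = {r: [] for r in range(4)}
    for g in gates:
        half = side(first_cut, g)
        quarter = side(second_cuts[half], g)
        regions[2 * half + quarter].append(g)
    return regions
```

Because each half gets its own cut, the two "horizontal" cuts can sit at different heights and angles, which is one way to see why slicing cuts give gates more freedom of motion than a regular mesh.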
9 Complete Grid Warping Flow
- Complete flow has several steps
- We review them briefly here
10 Complete Grid Warping Flow
- Quadratic placement onto the elastic sheet
- Note: pure quadratic wirelength
- No reweighting steps
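Pure quadratic wirelength means minimizing the sum of w·(x_i − x_j)² over 2-pin connections, with I/O pads fixed; setting the gradient to zero gives one linear system per dimension. The one-dimensional sketch below assumes multi-pin nets have already been expanded into weighted 2-pin edges; the pad names and the `quadratic_place_1d` function are hypothetical, and a real placer would use a sparse iterative solver rather than dense elimination.

```python
def quadratic_place_1d(n_movable, edges, fixed):
    """Minimise sum w * (x_i - x_j)**2 in one dimension.

    n_movable: movable gates are indexed 0..n_movable-1.
    edges: list of (i, j, w); an endpoint is a movable index (int) or a
           pad name (str) looked up in `fixed`.
    fixed: dict mapping pad name -> fixed coordinate.
    Zero gradient gives A x = b (A: weighted Laplacian of movable gates).
    """
    A = [[0.0] * n_movable for _ in range(n_movable)]
    b = [0.0] * n_movable
    for i, j, w in edges:
        for a, c in ((i, j), (j, i)):
            if isinstance(a, str):
                continue  # pads are fixed, so they get no equation
            A[a][a] += w
            if isinstance(c, str):
                b[a] += w * fixed[c]  # pull toward the fixed pad
            else:
                A[a][c] -= w
    # Gaussian elimination. A is symmetric positive definite when every
    # connected group of gates reaches a pad, so no pivoting is needed.
    n = n_movable
    for k in range(n):
        for r in range(k + 1, n):
            f = A[r][k] / A[k][k]
            for c in range(k, n):
                A[r][c] -= f * A[k][c]
            b[r] -= f * b[k]
    x = [0.0] * n
    for k in reversed(range(n)):
        s = b[k] - sum(A[k][c] * x[c] for c in range(k + 1, n))
        x[k] = s / A[k][k]
    return x
```

A chain pad L (x = 0), gate 0, gate 1, pad R (x = 3) with unit weights places the gates at x = 1 and x = 2, the evenly spaced minimum of the quadratic objective. The y coordinates are solved independently in the same way.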
11 Complete Grid Warping Flow
- Geometric pre-conditioning step
- Spreads gates out quickly and uniformly, to improve final wirelength
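The talk does not describe how the pre-conditioning spreads gates. As a stand-in, one simple order-preserving way to spread a clumped quadratic placement uniformly is rank-based: sort gates by coordinate and move the k-th gate to the centre of the k-th of n equal slots. This is an assumed technique for illustration, applied separately to x and y; `spread_uniform` is a hypothetical name.

```python
def spread_uniform(coords, lo, hi):
    """Spread gates evenly over [lo, hi] while preserving their relative order.

    Stand-in for geometric pre-conditioning (not WARP's actual method):
    the gate of rank k in sorted order moves to the centre of slot k.
    """
    n = len(coords)
    order = sorted(range(n), key=lambda i: coords[i])
    width = (hi - lo) / n
    out = [0.0] * n
    for rank, i in enumerate(order):
        out[i] = lo + (rank + 0.5) * width
    return out
```

For example, `spread_uniform([1.1, 0.9, 5.0, 1.0], 0.0, 8.0)` turns three gates clumped near x = 1 into gates at 1, 3 and 5, keeping their left-to-right order, while the outlier moves to 7.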
12 Complete Grid Warping Flow
- Nonlinear optimizer iteratively perturbs the warping grid on the sheet
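Because the warping grid is described by only a few parameters (for example, cut positions and angles), a derivative-free optimizer can perturb them directly. The sketch uses compass (pattern) search as a plain stand-in for whatever nonlinear optimizer WARP actually uses; `cost` is a caller-supplied function that re-maps the gates for a given parameter vector and scores the result. The tolerance `tol` plays the role of the global convergence tolerance: a looser value stops the warp sooner.

```python
def compass_search(cost, params, step=0.25, tol=1e-3, max_iter=1000):
    """Minimise cost(params) by perturbing one warp parameter at a time.

    Each trial vector is one candidate warping; accept it if it lowers the
    cost. When no single +/- step helps, halve the step; stop once the step
    falls below tol.
    """
    best = list(params)
    best_cost = cost(best)
    it = 0
    while step > tol and it < max_iter:
        it += 1
        improved = False
        for k in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[k] += delta
                c = cost(trial)
                if c < best_cost:
                    best, best_cost, improved = trial, c, True
                    break  # keep the improvement, move to next parameter
        if not improved:
            step /= 2
    return best, best_cost
```

With a toy cost such as `(p[0] - 1)**2 + (p[1] + 0.5)**2`, the search walks from `[0, 0]` to `[1, -0.5]`; in a placer, each cost call would be the stretch-back-and-evaluate step.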
13 Complete Grid Warping Flow
- Nonlinear optimizer iteratively perturbs the warping grid on the sheet
- ...each new warping is quickly stretched back to a full placement
- Use this to evaluate the cost function, which tracks rectilinear wirelength and capacity
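A cost that tracks rectilinear wirelength and capacity could be sketched as half-perimeter wirelength (HPWL) plus a penalty for gates above each region's capacity. The linear overflow penalty, its weight `penalty`, and the uniform per-region `capacity` are assumptions for illustration; the talk does not give the exact form.

```python
from collections import Counter


def hpwl(nets, pos):
    """Half-perimeter (rectilinear bounding-box) wirelength of all nets."""
    total = 0.0
    for net in nets:
        xs = [pos[g][0] for g in net]
        ys = [pos[g][1] for g in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def warp_cost(nets, pos, region_of, capacity, penalty=10.0):
    """Wirelength plus `penalty` per gate above a region's capacity.

    pos: gate -> (x, y) after stretching the warp back to a full placement.
    region_of: gate -> region index in the warping grid.
    """
    counts = Counter(region_of[g] for g in pos)
    overflow = sum(max(0, n - capacity) for n in counts.values())
    return hpwl(nets, pos) + penalty * overflow
```

For gates a(0, 0), b(2, 1), c(1, 3) and nets {a, b} and {a, b, c}, HPWL is 3 + 5 = 8; if a and b share a region of capacity 1, one gate overflows and the cost becomes 18.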
14 Complete Grid Warping Flow
- Nonlinear optimizer delivers a final warped placement
- Standard improvement step runs hMetis to optimize the location of gates placed near partition cuts
15 Complete Grid Warping Flow
- Recurse: in this case, 4 new placements inside 4 regions
- Continue until there are few gates per region
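The recursion can be sketched as quadrisection repeated until each region holds at most a few gates. Here plain median cuts (x first, then y in each half) stand in for the warping step that the real flow runs inside every region; `recursive_quadrisect` and `max_gates` are hypothetical names.

```python
def recursive_quadrisect(gates, max_gates=4):
    """Split gates into 4 regions and recurse until each has <= max_gates.

    gates: list of (x, y). Median cuts stand in for per-region warping.
    """
    assert max_gates >= 1, "max_gates must be positive or recursion never ends"
    if len(gates) <= max_gates:
        return [gates]
    by_x = sorted(gates, key=lambda g: g[0])
    half = len(by_x) // 2
    regions = []
    for side in (by_x[:half], by_x[half:]):
        by_y = sorted(side, key=lambda g: g[1])
        q = len(by_y) // 2
        for quad in (by_y[:q], by_y[q:]):
            if quad:  # skip empty sub-regions
                regions.extend(recursive_quadrisect(quad, max_gates))
    return regions
```

On a 4x4 array of gates, `max_gates=4` stops after one level with four regions of four gates, while `max_gates=1` descends one more level to sixteen single-gate regions.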
16 Complete Grid Warping Flow
- Warping flow delivers a final, but still slightly illegal, placement
- Use Domino (T.U. Munich) to legalize to the final detailed placement
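Domino's algorithm is not described in the talk. To show what "legalize" means here, the sketch uses a much simpler, well-known greedy "Tetris" legalizer instead: cells are visited left to right and each is packed into the row where it moves least, abutting that row's current right end. Cell names, `row_ys` and `row_width` are hypothetical inputs.

```python
def tetris_legalize(cells, row_ys, row_width):
    """Greedy Tetris-style legalisation (a simple stand-in, not Domino).

    cells: dict name -> (x, y, width) from the slightly illegal placement.
    row_ys: y coordinate of each standard-cell row.
    Returns name -> (x, y) with no overlaps inside any row.
    """
    row_end = [0.0] * len(row_ys)  # first free x position in each row
    placed = {}
    for name, (x, y, w) in sorted(cells.items(), key=lambda kv: kv[1][0]):
        best = None
        for r, ry in enumerate(row_ys):
            nx = max(x, row_end[r])
            if nx + w > row_width:
                continue  # cell does not fit in this row
            cost = abs(nx - x) + abs(ry - y)  # Manhattan displacement
            if best is None or cost < best[0]:
                best = (cost, r, nx)
        if best is None:
            raise ValueError(f"no room for cell {name}")
        _, r, nx = best
        placed[name] = (nx, row_ys[r])
        row_end[r] = nx + w
    return placed
```

Greedy packing like this can add displacement that hurts timing on critical nets, which is one reason a more careful detailed-placement legalizer is worth using at the end of the flow.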
17 Enhancements to Core Warping Flow
- Concern
- Adding timing usually degrades both the optimal wirelength and the overall placer runtime
- Can we do anything to mitigate this?
- Two efficiency improvements for grid-warping
- Improved QP step
- Re-warping stage
18 Speed Improvement: Improved QP
- We adopt the hybrid net model from FastPlace
- From Viswanathan et al., ISPD04
- Simple, elegant speedup heuristic for our QP steps
- We use the idea in two places
- Initial QP step
- New re-warping step, to be described next
19 Using the FastPlace Hybrid Net Model
- Simple, elegant idea
- Use the clique model for low-fanout nets
- Use the star model for higher-fanout nets
- Star model: bigger matrix, but sparser, so faster to solve
- Can speed up QP by about 2X
- ...but QP is a relatively minor part of warping, only 25% of total CPU
[Figure: clique model vs. star model of a multi-pin net]
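The hybrid model can be sketched as an edge-expansion step that runs before the QP. For a k-pin net, a clique needs k(k−1)/2 edges, while a star adds one extra movable node but only k edges: for a 5-pin net, 10 edges versus 5. The fanout threshold (clique for nets of up to 3 pins), the clique scaling 1/(k−1), and the star weight are illustrative assumptions; FastPlace derives its own weighting, which is not reproduced here.

```python
def hybrid_net_edges(nets, clique_max_pins=3, star_weight=1.0):
    """Expand multi-pin nets into weighted 2-pin edges for the QP.

    Returns (edges, n_star): edges as (pin_a, pin_b, weight) and the number
    of extra star nodes added (each is one more unknown in the matrix).
    """
    edges = []
    n_star = 0
    for net in nets:
        k = len(net)
        if k < 2:
            continue  # single-pin nets contribute no wirelength
        if k <= clique_max_pins:
            # Clique: connect every pair of pins (1/(k-1) scaling assumed).
            w = 1.0 / (k - 1)
            for a in range(k):
                for b in range(a + 1, k):
                    edges.append((net[a], net[b], w))
        else:
            # Star: one new movable star node tied to each pin.
            star = f"star{n_star}"
            n_star += 1
            edges.extend((pin, star, star_weight) for pin in net)
    return edges, n_star
```

The trade-off matches the slide: star nodes grow the matrix, but for high-fanout nets they remove many more off-diagonal entries than they add, so the system is sparser and faster to solve.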
20 Wirelength Improvement: Re-Warping
- New iterative local improvement step that targets wirelength
- After each new partition, for each 2x2 subgrid, re-place all the gates
- Inspired by Vygen, DAC97
- We call this re-warping
- We actually do a new, local warping
21 Mechanics of Re-Warping
- After each partition, walk a 2x2 grid across the lowest-level partitions
- Remove the local partitioning; propagate outside points to the boundary
- Formulate a local warping problem to re-place the gates better (we hope)
- Accept the re-warped solution only if local wirelength improves
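The accept-only-if-better rule can be sketched as a sweep of a 2x2 window over the grid of lowest-level regions. The window stride of one region and the two callbacks, `local_rewarp` (the local warping problem) and `local_wirelength`, are hypothetical hooks assumed for illustration.

```python
def rewarp_pass(grid, local_wirelength, local_rewarp):
    """One re-warping sweep over a 2-D list of lowest-level regions.

    For each 2x2 window, ask local_rewarp for a re-placed window and keep
    it only if local_wirelength improves. Returns the number of accepts.
    """
    accepted = 0
    for i in range(len(grid) - 1):
        for j in range(len(grid[0]) - 1):
            window = [[grid[i][j], grid[i][j + 1]],
                      [grid[i + 1][j], grid[i + 1][j + 1]]]
            candidate = local_rewarp(window)
            if local_wirelength(candidate) < local_wirelength(window):
                for di in range(2):
                    for dj in range(2):
                        grid[i + di][j + dj] = candidate[di][dj]
                accepted += 1
    return accepted
```

Because a rejected candidate leaves the grid untouched, a pass can never increase local wirelength, which is what makes it safe to stop the global warp earlier.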
22 Re-Warping and Nonlinear Convergence
- How do we minimize the runtime cost of re-warping?
- Warping is intrinsically a nonlinear optimization loop
- Re-warps are small, but could still be costly
- Solution: shorter global warping runs
- Loosen the global convergence tolerance to shorten global runtime
- Rely on local re-warping stages to buy back the local wirelength
[Figure: cost vs. time, comparing the prior warping convergence tolerance with stopping the global warp sooner, after which a local re-warp completes]
23 Improved QP + Re-Warp: WARP2
- Re-warping alone gives 2-3% shorter wirelength and a 4% speed loss
- Adding the hybrid net model preserves the wirelength gain, with a 10% overall speedup
- Benchmark is across the full standard ISPD98 set of 18 IBM netlists
24 WARP2 Comparisons: Experimental Results
- Versus analytical engines Gordian (TU Munich) and mPL4 (UCLA)
- WARP2 has competitive wirelength, and is 20-40% faster
- Versus partition/anneal placers Capo (UCLA/UM) and Dragon (NWU, UCLA)
- WARP2 has much better wirelength than Capo, but is 2.5X slower
- WARP2 has competitive wirelength with Dragon, but is 3.5X faster
25 Timing-Driven Grid Warping
- Our goal
- Extend the warping algorithm to accommodate timing optimizations
- Support a standard net-based, static-timing-driven flow
- Approach
- Static timing and delay budgeting via iterative net re-weighting
- Use the recent slack sensitivity model from IBM (Ren et al., ISPD04)
- Goal: minimize the worst negative slack (WNS)
- Technical questions
- Where exactly in the warping flow do these net weights appear?
- How do we transform them across the various internal steps of WARP2?
26 Using Net-Weights in WARP2 Flow
27 Overall Flow: Timing-Driven Warping
- 1. Run WARP2 with uniform net weights
- 2. Run static timing to obtain timing information
- 3. Compute a new weight for each net, using the slack sensitivity from Ren et al., ISPD04
- 4. Run a second placement (WARP2) with net weights to shrink the critical paths
- 5. Use Domino to legalize
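The five steps map onto a short driver. In this sketch, `place`, `static_timing`, `new_weights` and `legalize` are hypothetical callbacks standing in for WARP2, the static timer, the slack-sensitivity weighting and Domino; only the ordering of the steps comes from the slide.

```python
def timing_driven_flow(netlist, place, static_timing, new_weights, legalize):
    """Two-pass timing-driven placement; all four callbacks are hypothetical.

    netlist: iterable of net names.
    """
    weights = {net: 1.0 for net in netlist}     # step 1: uniform net weights
    placement = place(netlist, weights)
    timing = static_timing(netlist, placement)  # step 2: static timing
    weights = new_weights(weights, timing)      # step 3: slack-based weights
    placement = place(netlist, weights)         # step 4: second placement
    return legalize(placement)                  # step 5: legalization
```

Passing the stages in as functions keeps the flow independent of any particular placer or timer, so the same driver could be reused if the timer or legalizer is swapped.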
28 Timing Model
[Figure: critical path(s) with slack S; a net n and its bounding box]
- Basic model
- We model delay as proportional to the bounding-box length of a net
- Each net n has a weight $W_n$; we approximate $\partial \mathrm{Slack} / \partial W_n$
- Given a placement and its worst negative slack (WNS), we calculate the best $\Delta W$ to optimally improve WNS
- Flow infrastructure: OpenAccess Gear project
- Open-source static timer, database, technology library, benchmarks, etc.
- See Zhong et al., ISPD05 for more details
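A toy version of this model: net delay is proportional to bounding-box length, path slack is the required time minus the summed net delays, and each net inherits the worst slack of any path through it. The re-weighting rule below, which boosts each net in proportion to its negative slack relative to WNS, is a simple heuristic stand-in. It is not the slack-sensitivity ($\partial \mathrm{Slack} / \partial W_n$) computation of Ren et al. The constant `k`, the rate `alpha` and all function names are hypothetical.

```python
def net_delays(bbox_len, k=1.0):
    """Delay model from the slide: delay proportional to bounding-box length."""
    return {n: k * length for n, length in bbox_len.items()}


def net_slacks(paths, delays, t_required):
    """Each net's slack is the worst slack over the paths through it."""
    slack = {n: float("inf") for n in delays}
    for path in paths:
        s = t_required - sum(delays[n] for n in path)
        for n in path:
            slack[n] = min(slack[n], s)
    return slack


def reweight(weights, slack, alpha=1.0):
    """Heuristic stand-in: scale up nets in proportion to negative slack."""
    wns = min(slack.values())
    if wns >= 0:
        return dict(weights)  # timing already met
    return {n: w * (1.0 + alpha * max(0.0, -slack[n]) / -wns)
            for n, w in weights.items()}
```

With bounding-box lengths a = 3, b = 2, c = 1, paths [a, b] and [c], and a required time of 4, path [a, b] has slack −1 (the WNS), so nets a and b double their weight while c, on a path with positive slack, keeps weight 1.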
29 Critical Path
30 Timing-Driven WARP2: Preliminary Results
- OA Gear benchmarks
- Technology: hypothetical library with 250nm parameters
- Benchmarks: 10 ISCAS89 sequential logic benchmarks, up to 12K
- Results: timing-driven WARP2 vs. wirelength-only WARP2
- Timing-driven WARP2 improves WNS by 36.5%, with a 1% wirelength increase on average
- The cost in increased runtime is about 47%
- (Domino is improving our wirelength, but degrading our timing)
31 Conclusions
- Grid warping: a new model for placement
- Optimize the grid itself, not the gates individually
- New idea for placement improvement, with an evolving formulation
- Timing-driven placement: WARP2 is promising
- Integrated with the OpenAccess database and the OA Gear timer
- Improves WNS dramatically, with modest increases in wirelength and CPU
- First formulation is rather simple, but works well
- What's next?
- A new backend tool and the new benchmarks from ISPD05
- Extend the formulation to handle macroblock placements