DAC 2005 Session 352 Timing Driven Placement by GridWarping - PowerPoint PPT Presentation


1
DAC 2005 Session 35-2: Timing-Driven Placement by
Grid-Warping
  • Zhong Xiu, Rob A. Rutenbar
  • Department of Electrical and Computer Engineering
  • Carnegie Mellon University

2
Timing-Driven Placement
  • Despite 30 years of progress, still an important
    problem
  • Why? Placement determines:
  • Your overall chip area
  • Most of your max clock speed
  • Timing is very critical to target
  • If you miss timing specs, you've failed

[Figure: RTL/Logic Synthesis, Physical Synthesis. Courtesy Juergen Koehl, IBM]
3
Placement by Grid-Warping
  • In Zhong et al., DAC04, we showed the first
    grid-warping placer
  • Fundamentally new idea for placement improvement
  • Imagine we place the gates on the surface of a
    flexible elastic sheet
  • We stretch the sheet to improve the placement

[Figure: quadratic initial placement; warp placement surface; improved warped result; recurse/descend to continue]
4
Grid Warping: Attractive Features
  • Novel paradigm for placement: optimize the grid,
    not the gates
  • Think gravity: we reshape the curvature of space
    to move the mass
  • Flexibly nonlinear
  • Free to warp any way we like; not driven primarily
    by linear solves
  • Low-dimensional optimization problem
  • We only need to control the sheet; we don't move
    gates individually
  • Early prototype WARP1 performs well
  • Competitive on wirelength with other published placers
  • As fast or faster than many other analytical
    placers

5
Organization of this Talk
  • What's missing? A timing-driven formulation of
    grid-warping
  • Wirelength optimization is necessary but not
    sufficient
  • We must be able to insert a warping placer in a
    standard timing flow
  • First, we review basic mechanics of grid-warping
  • Second, we show some new wirelength optimizations
  • Useful to combat inevitable degradation of
    performance when we must optimize both wirelength
    and timing concurrently
  • Finally, we show how to extend grid-warping for
    timing
  • By adding slack-based net-weighting to warping
    flow

6
Review: Mechanics of Grid-Warping
  • It's conceptually useful to think of warping as
    distorting a regular mesh placed on the elastic
    placement surface
  • ...but this is not actually how we implement
    warping

[Figure: quadratic initial placement; warp placement surface; improved warped result; recurse/descend to continue]
7
We Formulate Warping in an Inverse Way
  • We warp to acquire a new set of gates in each
    unit grid area
  • ...then pull gates back to the undistorted grid,
    to move them
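
The inverse formulation above can be illustrated in one dimension. This is a minimal sketch, not the authors' implementation: it assumes a single axis-parallel cut on the interval [lo, hi] and a piecewise-linear pull-back, and the function name `pull_back_1d` is hypothetical. Moving the cut changes which gates fall on each side; mapping each side back onto its half of the undistorted interval is what moves the gates.

```python
def pull_back_1d(xs, cut, lo=0.0, hi=1.0):
    """Map gate coordinates from a warped 1-D grid back to the regular grid.

    Assumption for illustration: the warped grid is one cut at `cut`; the
    regular grid has its cut at the midpoint. Each side is rescaled linearly.
    """
    if not lo < cut < hi:
        raise ValueError("cut must lie strictly inside (lo, hi)")
    mid = (lo + hi) / 2.0
    moved = []
    for x in xs:
        if x < cut:
            # Gates captured by the left warped cell fill the left half.
            moved.append(lo + (x - lo) * (mid - lo) / (cut - lo))
        else:
            # Gates captured by the right warped cell fill the right half.
            moved.append(mid + (x - cut) * (hi - mid) / (hi - cut))
    return moved


# Three gates clustered on the left, one on the right; cut placed at 0.4.
warped = pull_back_1d([0.1, 0.2, 0.3, 0.9], cut=0.4)
```

With the cut at 0.4, the three clustered gates are spread across the whole left half, which is the spreading effect the elastic-sheet picture describes.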

8
And We Do Not Use a Regular Warping Grid
[Figure: 2x2 warping grid and 4x4 warping grid]
  • Instead, we use a grid defined by a set of
    slicing cuts
  • It turns out this allows a greater range of
    motion for the gates
  • Yes, a lot like quadrisection or partitioning, but
    more general
  • The cuts need not be axis parallel
  • Because gates are fully placed in each region, we
    get real wirelength

9
Complete Grid Warping Flow
  • Complete flow has several steps
  • We review them briefly here

10
Complete Grid Warping Flow
  • Quadratic place onto elastic sheet
  • Note: pure quadratic wirelength
  • No reweighting steps
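
A pure quadratic placement step can be sketched in one dimension as follows. This is only an illustration under stated assumptions: nets are already reduced to weighted two-pin connections, fixed pads are given by name, and the linear system is solved by simple Gauss-Seidel sweeps rather than whatever solver the placer really uses. The name `quadratic_place_1d` is hypothetical.

```python
def quadratic_place_1d(num_cells, nets, fixed, sweeps=500):
    """Minimize sum of w * (x_i - x_j)^2 over two-pin connections.

    num_cells: movable cells are the integers 0 .. num_cells-1
    nets:      list of (a, b, w); a and b are cell indices or fixed pad names
    fixed:     dict mapping pad name to its fixed coordinate
    """
    x = [0.0] * num_cells
    neighbours = [[] for _ in range(num_cells)]
    for a, b, w in nets:
        if a not in fixed:
            neighbours[a].append((b, w))
        if b not in fixed:
            neighbours[b].append((a, w))

    def position(node):
        return fixed[node] if node in fixed else x[node]

    # At the optimum each movable cell sits at the weighted average of its
    # neighbours; Gauss-Seidel repeats that update until it settles.
    for _ in range(sweeps):
        for i in range(num_cells):
            total = sum(w for _, w in neighbours[i])
            if total > 0:
                x[i] = sum(w * position(n) for n, w in neighbours[i]) / total
    return x


# Chain: pad L at 0, cell 0, cell 1, pad R at 3, all unit weights.
placement = quadratic_place_1d(
    2, [("L", 0, 1.0), (0, 1, 1.0), (1, "R", 1.0)], {"L": 0.0, "R": 3.0}
)
```

For this chain the cells settle at evenly spaced positions between the pads, which is the expected behavior of an unweighted quadratic objective.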

11
Complete Grid Warping Flow
  • Geometric pre-conditioning step
  • Spreads gates out quickly, uniformly, to improve
    final wirelen

12
Complete Grid Warping Flow
  • Nonlinear optimizer iteratively perturbs warping
    grid on sheet

13
Complete Grid Warping Flow
  • Nonlinear optimizer iteratively perturbs warping
    grid on sheet
  • ...each new warping is quickly stretched back to
    a full placement
  • Use this to eval cost function, which tracks
    rectilinear wirelen and capacity
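
A cost function of this kind might look like the sketch below. The slides only say that the cost tracks rectilinear wirelength and capacity; the half-perimeter wirelength measure, the uniform bin grid over the unit square, the per-bin gate limit, and the penalty weight are all assumptions made here for illustration, as are the function names.

```python
def hpwl(nets, pos):
    """Half-perimeter (rectilinear bounding-box) wirelength over all nets."""
    total = 0.0
    for pins in nets:
        xs = [pos[p][0] for p in pins]
        ys = [pos[p][1] for p in pins]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def capacity_overflow(pos, nx, ny, capacity):
    """Count gates in excess of `capacity` per bin on an nx-by-ny grid.

    Assumes coordinates lie in the unit square [0, 1] x [0, 1].
    """
    counts = {}
    for x, y in pos.values():
        ix = min(int(x * nx), nx - 1)
        iy = min(int(y * ny), ny - 1)
        counts[(ix, iy)] = counts.get((ix, iy), 0) + 1
    return sum(max(0, c - capacity) for c in counts.values())


def warp_cost(nets, pos, nx=2, ny=2, capacity=1, penalty=10.0):
    """Wirelength plus a (hypothetical) linear penalty on bin overflow."""
    return hpwl(nets, pos) + penalty * capacity_overflow(pos, nx, ny, capacity)


positions = {"a": (0.1, 0.1), "b": (0.2, 0.2), "c": (0.9, 0.9)}
cost = warp_cost([["a", "b", "c"]], positions)
```

A nonlinear optimizer would call a function like `warp_cost` once per candidate warping, after the warped grid has been stretched back to a full placement.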

14
Complete Grid Warping Flow
  • Nonlinear optimizer delivers a final warped
    placement
  • Standard improvement step runs hMetis to optimize
    location of gates placed near partition cuts

15
Complete Grid Warping Flow
  • Recurse: in this case, 4 new placements inside 4
    regions
  • Continue until few gates/region

16
Complete Grid Warping Flow
  • Warping flow delivers a final, but still slightly
    illegal, placement
  • Use Domino (T.U. Munich) to legalize to final
    detailed placement

17
Enhancements to Core Warping Flow
  • Concern
  • Addition of timing usually degrades both optimal
    wirelength and the overall placer runtime
  • Can we do anything to mitigate this?
  • Two efficiency improvements for grid-warping
  • Improved QP step
  • Re-warping stage

18
Speed Improvement: Improved QP
  • We adopt the hybrid net model from FastPlace
  • From Viswanathan et al., ISPD04
  • Simple, elegant speedup heuristic for our QP
    steps
  • Use idea in two places
  • Initial QP step
  • New re-warping step, to be described next

19
Using the FastPlace Hybrid Net Model
  • Simple, elegant idea
  • Use clique model for low-fanout nets
  • Use star model for higher-fanout nets
  • Star model → bigger matrix, but more sparse →
    faster to solve
  • Can speed up QP by about 2X
  • ...but QP is a relatively minor part of warping, only
    25% of total CPU
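
The hybrid idea can be sketched as a function that turns multi-pin nets into weighted two-pin connections for the quadratic solve. The pin-count threshold (`clique_max_pins=3`) and the edge weights 1/(k-1) for clique edges and k/(k-1) for star edges are assumptions chosen for illustration; the slides do not state them. The function name is hypothetical.

```python
from itertools import combinations


def hybrid_net_edges(nets, clique_max_pins=3):
    """Expand each net into weighted two-pin edges.

    Small nets become cliques (every pin pair connected). Larger nets get one
    extra movable star node connected to every pin, which adds a variable to
    the matrix but needs only k edges instead of k*(k-1)/2.
    """
    edges = []
    for index, pins in enumerate(nets):
        k = len(pins)
        if k < 2:
            continue  # a single-pin net contributes no wirelength
        if k <= clique_max_pins:
            w = 1.0 / (k - 1)
            edges.extend((a, b, w) for a, b in combinations(pins, 2))
        else:
            star = ("star", index)  # hypothetical name for the extra node
            w = k / (k - 1)
            edges.extend((p, star, w) for p in pins)
    return edges


edges = hybrid_net_edges([["a", "b"], ["a", "b", "c"], ["a", "b", "c", "d", "e"]])
```

In this example the 5-pin net produces 5 star edges where a clique would need 10, which is the sparsity gain the slide refers to.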

[Figure: Clique Model vs. Star Model]
20
Wirelength Improvement: Re-Warping
  • New iterative local improvement step that targets
    wirelength
  • After each new partition, for each 2x2 subgrid,
    re-place all the gates
  • Inspired by Vygen, DAC97
  • We call this re-warping
  • We actually do a new, local warping

21
Mechanics of Re-Warping
  • After each partition, walk 2x2 grid across
    lowest-level partitions
  • Remove local partitioning, propagate outside
    points to boundary
  • Formulate a local warping problem to re-place
    gates better, we hope
  • Accept re-warped solution only if local
    wirelength improves
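
The walk-and-accept structure of re-warping can be sketched as below. The local warping itself is abstracted behind a caller-supplied `local_place` function (a hypothetical hook, not part of WARP2), local wirelength is measured as half-perimeter wirelength of the nets touching the window, and region membership is not updated after a move; all of these are simplifying assumptions.

```python
def hpwl(nets, pos):
    """Half-perimeter wirelength of the given nets."""
    total = 0.0
    for pins in nets:
        xs = [pos[p][0] for p in pins]
        ys = [pos[p][1] for p in pins]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def rewarp_pass(rows, cols, gates_in_region, pos, nets, local_place):
    """Slide a 2x2 window over the region grid; keep only improving re-places.

    gates_in_region: dict (row, col) -> list of gate names
    local_place:     function(window_gates, pos) -> dict of new positions
    """
    accepted = 0
    for r in range(rows - 1):
        for c in range(cols - 1):
            window = [g for dr in (0, 1) for dc in (0, 1)
                      for g in gates_in_region.get((r + dr, c + dc), [])]
            if not window:
                continue
            inside = set(window)
            local_nets = [n for n in nets if inside.intersection(n)]
            before = hpwl(local_nets, pos)
            trial = dict(pos)
            trial.update(local_place(window, pos))
            # Accept the re-warped solution only if local wirelength improves.
            if hpwl(local_nets, trial) < before:
                pos = trial
                accepted += 1
    return pos, accepted


start = {"a": (1.0, 1.0), "pad": (0.0, 0.0)}
regions = {(0, 0): ["a"]}
better, n_ok = rewarp_pass(2, 2, regions, start, [["a", "pad"]],
                           lambda gates, p: {"a": (0.5, 0.5)})
worse, n_bad = rewarp_pass(2, 2, regions, start, [["a", "pad"]],
                           lambda gates, p: {"a": (2.0, 2.0)})
```

The second call shows the rejection path: a re-place that lengthens the local nets leaves the placement unchanged.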

22
Re-Warping and Nonlinear Convergence
  • How to minimize the runtime cost of re-warping?
  • Warping is intrinsically a nonlinear optimization
    loop
  • Re-warps are small, but could still be costly
  • Solution: shorter global warping runs
  • Loosen global convergence tolerance, shorten
    global runtime
  • Rely on local re-warping stages to buy us back
    the local wirelength

[Figure: cost vs. time plots. Annotations: prior warping convergence tolerance; stop global warp sooner; local re-warp completes]
23
Improved QP + Re-Warp = WARP2
  • Re-warping alone gives 2-3% shorter wirelen and
    4% speed loss
  • ...adding hybrid net model preserves wirelen, with
    10% overall speedup
  • Benchmark is across full standard ISPD98 set of
    18 IBM netlists

24
WARP2 Comparisons: Experimental Results
  • Versus analytical engines: Gordian (TU Munich),
    mPL4 (UCLA)
  • WARP2 has competitive wirelength, and is 20-40%
    faster
  • Versus partition/anneal: Capo (UCLA/UM), Dragon
    (NWU, UCLA)
  • WARP2 has much better wirelength than Capo, but
    is 2.5X slower
  • WARP2 has competitive wirelength with Dragon, but
    is 3.5X faster

25
Timing-Driven Grid Warping
  • Our goal
  • Extend warping algorithm to accommodate timing
    optimizations
  • Support a standard net-based static-timing-driven
    flow
  • Approach
  • Static timing, delay budgeting, iterative net
    re-weighting
  • Use recent slack sensitivity model from IBM (Ren
    et al., ISPD04)
  • Goal: minimize the worst negative slack (WNS)
  • Technical questions
  • Where exactly in the warping flow do these net
    weights appear?
  • How to transform them across various internal
    steps of WARP2?

26
Using Net-Weights in WARP2 Flow
27
Overall Flow: Timing-Driven Warping
  • 1. Run WARP2 with uniform net weights
  • 2. Run static timing to obtain timing info
  • 3. Compute new weight for each net, using slack
    sensitivity from Ren, ISPD04
  • 4. Run a second placement (WARP2) with net
    weights to shrink the critical paths
  • 5. Use Domino to legalize
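
Step 3 of the flow above can be illustrated with a generic slack-based re-weighting rule. To be clear, this is not the slack sensitivity formula from Ren, ISPD04, which the slides cite but do not reproduce; it is a simpler stand-in that raises the weight of each net in proportion to how negative its slack is relative to the WNS. The function name and the `alpha` parameter are hypothetical.

```python
def reweight_nets(weights, net_slack, alpha=1.0):
    """Return new net weights; nets with negative slack get heavier.

    weights:   dict net -> current weight (uniform after the first WARP2 run)
    net_slack: dict net -> worst slack of any path through the net
    alpha:     hypothetical gain; the most critical net is scaled by 1 + alpha
    """
    wns = min(net_slack.values())
    if wns >= 0:
        return dict(weights)  # timing already met, nothing to emphasize
    new_weights = {}
    for net, w in weights.items():
        # Criticality runs from 0 (non-negative slack) to 1 (slack == WNS).
        criticality = max(0.0, -net_slack[net]) / -wns
        new_weights[net] = w * (1.0 + alpha * criticality)
    return new_weights


updated = reweight_nets({"n1": 1.0, "n2": 1.0, "n3": 1.0},
                        {"n1": -2.0, "n2": -1.0, "n3": 0.5})
```

The second placement run would then use weights like `updated` in its wirelength objective, so that nets on critical paths are pulled shorter.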

28
Timing Model
[Figure: critical path(s), slack S, net n, bounding box]
  • Basic model
  • We model delay proportional to bounding box
    length of a net
  • Each net n has a weight Wn; we approximate
    ∂Slack/∂Wn
  • Given a placement and a worst case negative slack
    (WNS), we calculate a best ΔW to optimally
    improve WNS
  • Flow infrastructure: OpenAccess Gear project
  • Open source static timer, database, technology
    lib, benchmarks, etc.
  • See Zhong et al., ISPD05 for more details
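
The basic delay model can be sketched as follows. The assumptions here are that "bounding box length" means the half-perimeter of the net's bounding box, that a path's delay is the sum of its net delays (gate delays ignored), and that every path shares one required arrival time; the proportionality constant `k` and all function names are hypothetical.

```python
def net_delay(pins, pos, k=1.0):
    """Delay modeled as k times the half-perimeter of the net bounding box."""
    xs = [pos[p][0] for p in pins]
    ys = [pos[p][1] for p in pins]
    return k * ((max(xs) - min(xs)) + (max(ys) - min(ys)))


def path_slacks(paths, nets, pos, required, k=1.0):
    """Slack of each path: required time minus the sum of its net delays."""
    return [required - sum(net_delay(nets[n], pos, k) for n in path)
            for path in paths]


def net_slacks(paths, slacks):
    """Worst slack seen on any path through each net."""
    worst = {}
    for path, s in zip(paths, slacks):
        for n in path:
            worst[n] = min(s, worst.get(n, s))
    return worst


nets = {"n1": ["a", "b"], "n2": ["b", "c"], "n3": ["a", "d"]}
pos = {"a": (0, 0), "b": (2, 1), "c": (5, 1), "d": (1, 0)}
paths = [["n1", "n2"], ["n3"]]
slacks = path_slacks(paths, nets, pos, required=5)
wns = min(slacks)
per_net = net_slacks(paths, slacks)
```

Here the two-net path misses its required time by one unit, so it defines the WNS, and both of its nets inherit that slack.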

29
Critical Path
  • Before and after

30
Timing-Driven WARP2: Preliminary Results
  • OA Gear benchmarks
  • Technology: hypothetical lib with 250nm
    parameters
  • Benchmarks: 10 ISCAS89 sequential logic
    benchmarks, up to 12K
  • Results: timing-driven WARP2 vs wirelength-only
    WARP2
  • WARP2 improves WNS by 36.5%, with 1% wirelen
    increase on avg
  • The cost in increased runtime is about 47%
  • (Domino is improving our wirelen, but degrading
    our timing)

31
Conclusions
  • Grid warping: a new model for placement
  • Optimize the grid itself, not the gates
    individually
  • New idea for placement improvement, with an
    evolving formulation
  • Timing-driven placement with WARP2 is promising
  • Integrated with OpenAccess database and OA Gear
    Timer
  • Improves WNS dramatically, with modest increases
    in wirelength and CPU
  • First formulation is rather simple, but works
    well
  • What's next?
  • A new backend tool and the new benchmarks from
    ISPD05
  • Extend formulation to handle macroblock placements