APLACE: A General and Extensible Large-Scale Placer

Transcript and Presenter's Notes
1
APLACE: A General and Extensible Large-Scale Placer
  • Andrew B. Kahng Sherief Reda Qinke Wang
  • VLSI CAD Lab
  • UCSD CSE and ECE Departments
  • http://vlsicad.ucsd.edu
  • Currently on leave of absence at Blaze DFM, Inc.

2
Goals and Plan
  • Goals
  • Build a new placer to win the competition
  • Scalable, robust, high-quality implementation
  • Leave no stone unturned / no QoR left on the table
  • Plan and Schedule
  • Work within the most promising framework: APlace
  • 30 days for coding, 30 days for tuning

3
Philosophy
  • Respect the competition
  • Well-funded groups with decades of experience
  • ABKGroup's Capo, MLPart, and APlace are all
    unfunded side projects
  • No placement-related industry interactions
  • QoR target: 24-26% better than Capo v9r6 on all
    known benchmarks
  • Nearly pulled out 10 days before the competition
  • Work smart
  • Solve scalability and speed basics first
  • Slimmed-down data structure, -msse compiler
    options, etc.
  • Ordered list of 15 QOR ideas to implement
  • Daily regressions on all known benchmarks
  • Synthetic testcases to predict bb3, bb4, etc.

4
Implementation Framework
New APlace Flow
  • APlace weaknesses
  • Weak clustering
  • Poor legalization / detailed placement
  • New APlace
  • New clustering
  • Adaptive parameter setting for scalability
  • New legalization + iterative detailed placement

[Flow diagram. Global Phase: Clustering → Adaptive
APlace engine → Unclustering. Detailed Phase:
Legalization → WS (whitespace) arrangement → Cell
order polishing → Global moving.]
5
Clustering/Unclustering
  • A multi-level paradigm with clustering ratio ≈ 10
  • Top-level clusters ≈ 2000
  • Similar in spirit to [HuM04] and [AlpertKNRV05]

Algorithm Sketch
  • For each clustering level:
  • Calculate the clustering score of each node to
    its neighbors based on the number of connections
  • Sort all scores and process nodes in order, as
    long as cluster size upper bounds are not violated
  • If a node's score needs updating, then update the
    score and re-insert it in order
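
A minimal Python sketch of this greedy pass. Everything
concrete below is an assumption for illustration: the score
formula (connections normalized by merged cluster size), the
union-find bookkeeping, and the heap-based lazy re-insertion.
The slides fix only connection-based scoring, in-order
processing under a size bound, and re-insertion of stale scores.

    import heapq
    from collections import defaultdict

    def cluster_level(nodes, edges, max_cluster_size):
        """One clustering level: greedily merge connected node
        pairs in score order, honoring a cluster-size bound."""
        conn = defaultdict(int)            # connection counts per pair
        for u, v in edges:
            conn[(u, v)] += 1
            conn[(v, u)] += 1

        size = {n: 1 for n in nodes}       # current cluster sizes
        parent = {n: n for n in nodes}     # node -> cluster representative

        def find(n):                       # union-find lookup
            while parent[n] != n:
                parent[n] = parent[parent[n]]
                n = parent[n]
            return n

        def score(u, v):
            # Assumed rule: connection count normalized by merged
            # size, so small, tightly connected pairs merge first.
            return conn[(u, v)] / (size[find(u)] + size[find(v)])

        heap = []                          # max-heap via negated scores
        for (u, v) in conn:
            if u < v:                      # each pair once (orderable ids)
                heapq.heappush(heap, (-score(u, v), u, v))

        while heap:
            neg_s, u, v = heapq.heappop(heap)
            ru, rv = find(u), find(v)
            if ru == rv:
                continue                   # already in the same cluster
            if size[ru] + size[rv] > max_cluster_size:
                continue                   # size bound would be violated
            if -neg_s != score(u, v):      # stale score: re-insert in order
                heapq.heappush(heap, (-score(u, v), u, v))
                continue
            parent[rv] = ru                # merge the two clusters
            size[ru] += size[rv]

        return {n: find(n) for n in nodes}  # node -> final cluster id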

6
Adaptive Tuning / Legalization
Adaptive Parameterization
  • Automatically decide the initial weight for the
    wirelength objective according to the gradients
  • Decrease the wirelength weight based on the
    progress of the current placement
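
A hedged sketch of one way to realize this. The slides give no
formulas; balancing the norms of the two gradient terms for the
initial weight, and geometrically decaying the weight at each
outer stage, are assumptions here, not necessarily APlace's
exact rules.

    import numpy as np

    def initial_wl_weight(wl_grad, density_grad):
        """Assumed rule: start with the weight that makes the
        wirelength and density gradients comparable in size."""
        return np.linalg.norm(density_grad) / (np.linalg.norm(wl_grad) + 1e-12)

    def next_wl_weight(weight, decay=0.5):
        """Assumed rule: geometrically reduce the wirelength weight
        per stage, shifting emphasis toward cell spreading."""
        return weight * decay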

Legalization
  • (1) Sort all cells from left to right; move each
    cell (or group of cells) in order to the closest
    legal position(s)
  • (2) Sort all cells from right to left; move each
    cell (or group of cells) in order to the closest
    legal position(s)
  • Pick the better of (1) and (2)
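
A minimal single-row sketch of the two sweeps in Python.
Assumptions: one row, cells as (x, width) pairs, total width
fitting the row, and "best" judged by total displacement (the
slide does not name the metric). The real legalizer also handles
multiple rows, cell groups, and site alignment.

    def sweep_ltr(cells, row_left):
        """Left-to-right pass: each cell keeps its x if legal,
        else lands at the current frontier."""
        out, frontier = [], row_left
        for x, w in sorted(cells):
            legal_x = max(x, frontier)     # closest non-overlapping slot
            out.append((legal_x, w))
            frontier = legal_x + w
        return out

    def sweep_rtl(cells, row_right):
        """Mirror pass: sweep from the right edge of the row."""
        out, frontier = [], row_right
        for x, w in sorted(cells, reverse=True):
            legal_x = min(x, frontier - w)  # right edge must fit frontier
            out.append((legal_x, w))
            frontier = legal_x
        return list(reversed(out))

    def total_displacement(orig, legal):
        # Both sweeps preserve left-to-right order, so matching
        # sorted positions pairs each cell with its moved self.
        return sum(abs(a[0] - b[0]) for a, b in zip(sorted(orig), sorted(legal)))

    def legalize_row(cells, row_left, row_right):
        """Run both sweeps; keep whichever perturbs cells less."""
        candidates = [sweep_ltr(cells, row_left), sweep_rtl(cells, row_right)]
        return min(candidates, key=lambda sol: total_displacement(cells, sol))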

7
Detailed Placement
Whitespace Compaction
  • For each layout row
  • Optimally arrange whitespace to minimize
    wirelength while maintaining relative cell order
    [KahngTZ99, KahngRM04]

Cell Order Polishing
  • For a window of neighboring cells
  • Optimally arrange cell orders and whitespace to
    minimize wirelength
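
A brute-force sketch of window polishing. Assumptions: cells are
packed left to right with no whitespace redistribution, and
hpwl_of is a hypothetical caller-supplied evaluator over the
nets touching the window. The slide does not state the search
method; windows this small are commonly solved exactly by
enumeration.

    from itertools import permutations

    def polish_window(window, x_start, hpwl_of):
        """window: list of (cell_id, width). Try every ordering,
        packing cells from x_start, and keep the best HPWL."""
        best_order, best_cost = None, float("inf")
        for order in permutations(window):
            x, placement = x_start, {}
            for cell, width in order:
                placement[cell] = x        # abutted packing, no gaps
                x += width
            cost = hpwl_of(placement)      # HPWL of nets touching window
            if cost < best_cost:
                best_order, best_cost = order, cost
        return best_order, best_cost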

Global Moving
  • Optimally move a cell to a better available
    position to minimize wirelength
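
A minimal sketch of global moving, with an explicit HPWL helper
since wirelength is the metric throughout these slides. The net
model (each net as a list of cell ids with point locations) and
the free-slot enumeration are simplifying assumptions.

    def hpwl(nets, pos):
        """Half-perimeter wirelength: sum over nets of the width
        plus height of the bounding box of the net's positions."""
        total = 0.0
        for net in nets:
            xs = [pos[c][0] for c in net]
            ys = [pos[c][1] for c in net]
            total += (max(xs) - min(xs)) + (max(ys) - min(ys))
        return total

    def global_move(cell, free_slots, nets, pos):
        """Return the free slot that lowers HPWL the most for
        `cell`, or its current position if none improves it."""
        touched = [net for net in nets if cell in net]  # only these change
        best_slot, best_cost = pos[cell], hpwl(touched, pos)
        for slot in free_slots:
            trial = dict(pos)
            trial[cell] = slot
            cost = hpwl(touched, trial)
            if cost < best_cost:
                best_slot, best_cost = slot, cost
        return best_slot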

8
Parameterization and Parallelization
Tuning Knobs
  • Clustering ratio, top-level clusters, cluster
    area constraints
  • Initial wirelength weight, wirelength weight
    reduction ratio
  • Max CG (conjugate gradient) iterations for each
    wirelength weight
  • Target placement discrepancy
  • Detailed placement parameters, etc.
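
The knobs above could be gathered into one configuration object.
This grouping is purely illustrative, as is every default below
except the clustering ratio (≈10) and top-level cluster count
(≈2000) taken from slide 5.

    from dataclasses import dataclass

    @dataclass
    class PlacerConfig:
        clustering_ratio: float = 10.0     # ~10, per slide 5
        top_level_clusters: int = 2000     # ~2000, per slide 5
        max_cluster_area: float = 4.0      # illustrative bound
        init_wl_weight: float = 1.0        # overridden adaptively
        wl_weight_decay: float = 0.5       # reduction ratio per stage
        max_cg_iters: int = 100            # CG steps per weight value
        target_discrepancy: float = 0.1    # placement density target
        # ... detailed-placement parameters would follow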

Resources
  • SDSC ROCKS cluster: 8 Xeon CPUs at 2.8 GHz
  • Michigan (Prof. Sylvester's group): 8 assorted CPUs
  • UCSD FWGrid: 60 Opteron CPUs at 1.6 GHz
  • UCSD VLSICAD group: 8 Xeon CPUs at 2.4 GHz

Wirelength improvement after tuning: 2-3%
9
Artificial Benchmark Synthesis
  • Synthetic benchmarks to test code scalability
    and performance
  • Rapid response to broadcast of s00-nam.pdf
  • Created synthetic versions of bigblue3 and
    bigblue4 within 48 hours
  • Mimicked fixed-block layout diagrams in the
    artificial benchmark creation
  • This process was useful: we identified (and
    solved) a problem with clustering in the presence
    of many small fixed blocks

10
Results
11
Bigblue4 Placement
HPWL 833.21
12
Conclusions
  • ISPD05: an exercise in process and philosophy
  • At the end, we were still 4% short of where we
    wanted to be
  • Not happy with how we handled the 5-day time frame
  • Auto-tuning: the first results were the best results
  • During the competition, we wrote, but then left
    out, annealing-based detailed-placement improvements
    that gained another 0.5%
  • Students and IBM ARL did a really, really great
    job
  • Currently restoring capabilities (congestion,
    timing-driven, etc.) and cleaning up (antecedents
    in the Naylor patent)