Large Scale Circuit Placement: Gap and Promise

Provided by: cadlab
1
Large Scale Circuit Placement: Gap and Promise
  • Jason Cong
  • UCLA VLSI CAD Lab
  • Joint work with Chin-Chih Chang, Tim Kong,
    Michail Romesis, Joseph R. Shinnerl, Min Xie and
    Xin Yuan

2
Outline
  • Introduction
  • Gap Analysis of Existing Placement Algorithms
  • Scalable Paradigm: Multilevel Placement

3
Why Still the Placement Problem?
  • True, it has been studied for over 30 years, but
  • We need good solutions more than ever
  • One of the most important steps in the IC
    implementation flow
  • Directly defines interconnects
  • Difficult
  • Problem size grows 2X every 18-24 months
  • Moore's Law
  • Cannot place hierarchically without quality
    degradation

4
Example of Logic Hierarchy in Final Layout
By courtesy of IBM (Tony Drumm)
5
Why Still Placement?
  • True, it has been studied for over 30 years, but
  • We need good solutions more than ever
  • One of the most important steps in the IC
    implementation flow
  • Directly defines interconnects
  • Difficult
  • Problem size grows 2X every 18-24 months
  • Moore's Law
  • Cannot place hierarchically without quality
    degradation
  • We are not very good at it

6
Outline
  • Introduction
  • Gap Analysis of Existing Placement Algorithms
  • Scalable Paradigm: Multilevel Placement

7
Motivation
  • Lack of significant progress in wirelength
    reduction
  • Rate of reduction is about 5-10% every 2-3 years
  • Latest developments in placement differ mainly in
    runtime
  • Most work compares only with known heuristics
  • Use real design based benchmarks
  • Use synthetic benchmarks
  • Little understanding about the divergence from
    the optimal

8
Placement Examples with Known Optimal Wirelength
(PEKO) (Chang et al., 2003)
  • Given a (real) netlist N
  • Construct a netlist N' with known optimal
    wirelength that matches the net distribution of N
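
The construction idea can be sketched in a few lines. This is a minimal illustration, not the authors' generator: it assumes unit-size cells on a square grid with one cell per slot, takes the net degree counts as given (instead of extracting them from a real netlist), and the helper names `min_hpwl`, `best_block`, `build_peko_like` and `hpwl` are hypothetical. Each net's pins fill a compact block, so every net attains the smallest half-perimeter possible for its pin count and the grid placement is optimal by construction.

```python
import math
import random

def min_hpwl(k):
    """Smallest half-perimeter of a w x h block of unit slots holding k cells."""
    return min((w - 1) + (math.ceil(k / w) - 1) for w in range(1, k + 1))

def best_block(k):
    """Return (w, h) with w * h >= k that attains min_hpwl(k)."""
    shapes = ((w, math.ceil(k / w)) for w in range(1, k + 1))
    return min(shapes, key=lambda wh: wh[0] + wh[1])

def build_peko_like(side, degree_counts, seed=0):
    """Build nets on a side x side grid; degree_counts maps pins -> net count.

    Assumption: a net is a list of slot coordinates, and each slot holds
    exactly one cell, so a coordinate identifies a cell.
    """
    rng = random.Random(seed)
    nets = []
    for k, count in degree_counts.items():
        w, h = best_block(k)
        for _ in range(count):
            x0 = rng.randrange(side - w + 1)
            y0 = rng.randrange(side - h + 1)
            block = [(x0 + dx, y0 + dy) for dy in range(h) for dx in range(w)]
            nets.append(block[:k])  # first k slots still span the whole block
    return nets

def hpwl(pins):
    """Half-perimeter wirelength of one net given its pin coordinates."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

degrees = {2: 28, 3: 16, 4: 6, 5: 4, 6: 2, 7: 1}
nets = build_peko_like(8, degrees)
optimal_wl = sum(hpwl(net) for net in nets)
```

Because no placement can give a k-pin net a smaller half-perimeter than `min_hpwl(k)`, `optimal_wl` is a true lower bound for this instance, which is what lets a placer's result be compared against the optimum.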

9
Placement Examples with Known Upper Bounds (PEKU)
(Cong et al., 2003)
  • Limitations of PEKO
  • All the nets are local
  • Wirelength contribution by global connections in
    real designs can be significant

10
Illustration: PEKU Example Construction
Input: t = 64, D = {d2 = 35, d3 = 21, d4 = 7, d5 = 4, d6 = 2,
d7 = 1}, ? = 0.2
W w1w30, w43, w53, w6 0,w7 2,w8 2,w91,
w100, w111, w121
Generate 28 2-pin optimally
Generate 16 3-pin optimally
Generate 5 3-pin randomly
Generate 6 4-pin optimally
Generate 1 4-pin randomly
Generate 4 5-pin optimally
Generate 2 6-pin optimally
Generate 1 7-pin optimally
Total WL 184
11
Studied Five State-of-the-Art Placers
  • Capo (Caldwell et al., 2000)
  • Based on multilevel partitioner
  • Aims to enhance the routability
  • Dragon (Wang et al., 2000)
  • Uses hMetis for initial partition
  • SA with bin-based swapping
  • mPL (Chan et al., 2000)
  • Nonlinear programming on the coarsest level
  • Discrete relaxation at finer levels
  • mPG (Chang et al., 2002)
  • Uses FC clustering and hierarchical density
    control
  • Incremental A-tree for routability
  • Qplace (Cadence Inc.)
  • Leading edge industrial placer
  • Component of Silicon Ensemble

12
Experimental Results on PEKO
  • Existing algorithms can be 59% to 140% away from
    the optimal on PEKO
  • On examples with pads
  • mPG and Qplace show improvement of 12% and 10%,
    respectively
  • Dragon, mPL, and Capo do not benefit much from
    the additional information
  • There is significant room for improvement in
    placement algorithms

13
Experimental Results on PEKO
  • Capo, QPlace and mPL scale well in runtime
  • Average solution quality of each tool shows
    deterioration by an additional 9% to 17% when the
    problem size increases by a factor of 10

14
Experimental Results on PEKU
  • The effectiveness of existing placers can vary
    significantly for circuits of similar size but
    different characteristics
  • Comparing QRs helps to identify the technique
    that works best under each scenario

QR (placed wirelength vs. upper bound): the upper
bound may not be tight
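
As a small sketch of how such a quality ratio can be computed, the snippet below measures half-perimeter wirelength (HPWL) and divides the placed wirelength by a reference wirelength (the known optimum for PEKO, or the known upper bound for PEKU). The function names and the tiny three-cell example are assumptions made for illustration only.

```python
def hpwl(net, pos):
    """Half-perimeter wirelength of one net; pos maps cell -> (x, y)."""
    xs = [pos[c][0] for c in net]
    ys = [pos[c][1] for c in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def total_wl(nets, pos):
    """Total HPWL over all nets."""
    return sum(hpwl(net, pos) for net in nets)

def quality_ratio(nets, placed, reference):
    """Placed wirelength divided by the reference (optimal or upper-bound) wirelength."""
    return total_wl(nets, placed) / total_wl(nets, reference)

# Hypothetical example: three cells in a row, two 2-pin nets.
nets = [("a", "b"), ("b", "c")]
reference = {"a": (0, 0), "b": (1, 0), "c": (2, 0)}  # wirelength 2
placed = {"a": (0, 0), "b": (2, 0), "c": (1, 0)}     # wirelength 3
qr = quality_ratio(nets, placed, reference)
```

A ratio of 1.0 means the placer matched the reference; when the reference is only an upper bound, a ratio above 1.0 understates the true gap to the optimum.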
15
High Interest in the Community
16
Timing-driven Placement Examples with Known
Optimal (TPEKO)
  • Obtain a placement for the circuit from any
    available tool
  • Perform timing analysis on the circuit
  • Create an artificial combinational path with
    equal or larger delay than the longest path
  • Guarantee the cells in the path are adjacent to
    each other
  • Make necessary modifications

17
Evaluating Timing-Driven Placement Algorithms
Using TPEKO
  • Evaluating two state-of-the-art FPGA placement
    algorithms
  • VPR (Marquardt et al., 2000)
  • PATH (Kong, 2002)
  • Can be far away from the optimal for difficult
    examples
  • 35% on average
  • 54% in the worst case

18
Observations from Gap Analysis
  • Significant opportunity in placement
  • Existing algorithms may produce solutions far
    away from the optimal
  • The result quality of the same placer varies for
    circuits of similar size but different
    characteristics
  • Scalability problem in runtime and solution
    quality
  • Significant ROI
  • Benefit equal to one to two generations of
    process scaling
  • But without requiring multi-billion dollar
    investment (hopefully!)

19
Outline
  • Introduction
  • Gap Analysis of Existing Placement Algorithms
  • Scalable Paradigm
  • Timing Optimization
  • Routability Optimization
  • Concluding Remarks
  • Application
  • Multi-Million Gate FPGA Placement

20
Paradigm 2: Multilevel Placement
  • Coarsening: build the hierarchy by recursive
    aggregation (generalized clustering)
  • Relaxation: improve the placement at each level
    by localized optimization
  • Interpolation: transfer coarse-level solution to
    adjacent, finer level (generalized declustering)
  • Multilevel Flow: multiple traversals over
    multiple hierarchies (V-cycle variations)
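
To make the interplay of the components concrete, here is a toy single V-cycle in one dimension. It is only a sketch under strong assumptions: cells are points on a line connected by 2-pin nets, coarsening simply pairs consecutive cells, relaxation moves each cell to the mean of its neighbors, and interpolation copies each cluster's position to its members. None of these choices is claimed to match mPL or mPG; the function names are hypothetical.

```python
def coarsen(edges, movable):
    """Pair up movable cells in order; pads (not in movable) map to themselves."""
    parent = {}
    for i in range(0, len(movable), 2):
        group = tuple(movable[i:i + 2])
        for cell in group:
            parent[cell] = group

    def lift(v):
        return parent.get(v, v)

    coarse_edges = [(lift(u), lift(v)) for u, v in edges if lift(u) != lift(v)]
    coarse_cells = list(dict.fromkeys(parent.values()))
    return coarse_edges, coarse_cells, parent

def relax(edges, movable, x, sweeps=5):
    """Localized improvement: move each cell to the mean of its neighbors."""
    nbrs = {c: [] for c in movable}
    for u, v in edges:
        if u in nbrs:
            nbrs[u].append(v)
        if v in nbrs:
            nbrs[v].append(u)
    for _ in range(sweeps):
        for c in movable:
            if nbrs[c]:
                x[c] = sum(x[n] for n in nbrs[c]) / len(nbrs[c])
    return x

def multilevel_place(edges, movable, pads, min_cells=2):
    """One V-cycle: recurse to the coarsest level, then interpolate and relax."""
    if len(movable) <= min_cells:
        x = dict(pads)
        center = sum(pads.values()) / len(pads)
        x.update({c: center for c in movable})
        return relax(edges, movable, x, sweeps=50)
    c_edges, c_cells, parent = coarsen(edges, movable)
    coarse_x = multilevel_place(c_edges, c_cells, pads, min_cells)
    x = dict(pads)
    x.update({c: coarse_x[parent[c]] for c in movable})  # interpolation
    return relax(edges, movable, x)

cells = [f"c{i}" for i in range(8)]
pads = {"L": 0.0, "R": 9.0}
chain = ["L"] + cells + ["R"]
positions = multilevel_place(list(zip(chain, chain[1:])), cells, pads)
```

Only a few cheap sweeps run at each finer level; the global spreading comes from the solution inherited from the coarser level.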

21
Multi-Level Optimization Framework
  • Multilevel coarsening generates smaller problem
    sizes at coarser levels, hence faster optimization
    at coarser levels
  • May explore different aspects of the solution
    space at different levels
  • Gradual refinement on good solutions from coarser
    levels is very efficient
  • Successful in many applications
  • Originally developed for PDEs
  • Recent success in VLSI CAD: partitioning,
    placement, routing

22
Multilevel Coarse Placement
23
Multilevel Methods: Coarsening by Recursive
Aggregation
  • Recursive aggregation defines the hierarchy.
  • Different aggregation algorithms can be used on
    different levels and/or in different V-cycles.
  • Clustering methods
  • First-Choice Clustering (hMetis, Karypis 1999).
  • AMG based aggregation
  • An aggregate need not be a cluster. A cell can
    be fractionally associated to more than one
    aggregate
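
A single first-choice style clustering pass might look like the sketch below. The affinity used here (each shared net contributes 1/(pins - 1), divided by the combined area of the two cells) is an assumption chosen for illustration, as are the fixed visiting order and the absence of a cluster size limit; the function name `first_choice_pass` is hypothetical.

```python
from collections import defaultdict

def first_choice_pass(nets, area):
    """Return a map cell -> cluster representative after one pass.

    nets: list of lists of cell names; area: dict cell -> area.
    Each unclustered cell joins its highest-affinity neighbor, even if that
    neighbor already belongs to a cluster.
    """
    affinity = defaultdict(lambda: defaultdict(float))
    for net in nets:
        if len(net) < 2:
            continue
        weight = 1.0 / (len(net) - 1)  # assumed net weighting
        for u in net:
            for v in net:
                if u != v:
                    affinity[u][v] += weight

    cluster = {}
    for u in sorted(area):  # deterministic order for this sketch
        if u in cluster or not affinity[u]:
            cluster.setdefault(u, u)
            continue
        best = max(
            affinity[u],
            key=lambda v: (affinity[u][v] / (area[u] + area[v]), v),
        )
        cluster[u] = cluster.setdefault(best, best)
    return cluster

nets = [["a", "b"], ["a", "b"], ["b", "c"], ["c", "d"], ["c", "d"]]
area = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
clusters = first_choice_pass(nets, area)
```

In the example, a and b share two nets and so do c and d, so the pass groups them into two clusters; applying the pass again to the clustered netlist would produce the next level of the hierarchy.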

24
Multilevel Methods: Relaxation (Intralevel
Optimization)
  • Iterative improvement at each level by fast,
    localized computation
  • Discrete: permutation enumeration, swapping
  • Unconstrained quadratic wirelength minimization
    on subsets
  • Network-flow based improvement on subsets (RDFL)
  • Local relaxation is sufficient. Global
    improvement comes from the multilevel hierarchy.
  • Relaxations at finer levels may be quite
    different, e.g., more discrete, than relaxations
    at coarser levels.

25
Relaxation on Local Subsets
Move the red cells to their optimal positions,
holding all other cells fixed and (perhaps)
ignoring overlap
Original Subnetlist with Subproblem
26
Example: Goto-based Discrete Relaxation
  • Each cell's optimal location is readily
    calculated when all other cells are held fixed.
  • Compute a chain A, B, C, D, E, where B is a
    randomly selected neighbor of A's optimal
    location, etc.
  • Examine all permutations of the chain and take
    the best one.
  • Problem: the chain is not closed (A is not
    necessarily near any other cell's optimal
    location).
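
The chain-and-permute step can be sketched as follows, assuming cells sit on integer grid slots (one cell per slot), wirelength is HPWL, and a cell's "optimal location" is approximated by the median coordinates of the cells it connects to. That median model, the neighborhood of five slots, and all function names are assumptions for this sketch rather than details taken from the slides.

```python
import itertools
import random
import statistics

def total_wl(nets, pos):
    """Sum of half-perimeter wirelengths over all nets."""
    wl = 0
    for net in nets:
        xs = [pos[c][0] for c in net]
        ys = [pos[c][1] for c in net]
        wl += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return wl

def optimal_slot(cell, nets, pos):
    """Median of connected cells' coordinates, all other cells held fixed."""
    xs, ys = [], []
    for net in nets:
        if cell in net:
            xs += [pos[c][0] for c in net if c != cell]
            ys += [pos[c][1] for c in net if c != cell]
    if not xs:
        return pos[cell]
    return (statistics.median_low(xs), statistics.median_low(ys))

def chain_relax(start, nets, pos, length=4, rng=None):
    """Build a chain from start, then keep the best permutation of its slots."""
    rng = rng or random.Random(0)
    slot_of = {p: c for c, p in pos.items()}
    chain = [start]
    while len(chain) < length:
        tx, ty = optimal_slot(chain[-1], nets, pos)
        near = [(tx, ty), (tx + 1, ty), (tx - 1, ty), (tx, ty + 1), (tx, ty - 1)]
        candidates = [slot_of[s] for s in near
                      if s in slot_of and slot_of[s] not in chain]
        if not candidates:
            break
        chain.append(rng.choice(candidates))
    slots = [pos[c] for c in chain]

    def cost(perm):
        trial = dict(pos)
        trial.update(zip(chain, perm))
        return total_wl(nets, trial)

    best = min(itertools.permutations(slots), key=cost)
    pos.update(zip(chain, best))  # in-place update of the placement
    return chain
```

Since the identity permutation is among the candidates, a call never increases total wirelength, and because cells only exchange slots the placement stays overlap-free. Keeping the chain short matters: a chain of length k requires k! evaluations.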

27
Example: Quadratic Relaxation on Noncontiguous
Subsets (QRS)
  • Select a subset M of cells to move
  • Identify other cells and pads, F, connected to M
    by nets in
  • Decouple the horizontal and vertical problems.
  • M is obtained as segments of length k along a DFS
    vertex traversal of the netlist

28
Solving the QRS subproblem
  • Problem formulation (horizontal case)
  • Iteratively solve the weighted quadratic
    minimization problem, using the current solution
    to determine the weight (as in Gordian-L)
  • May result in cell overlap!
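
One way to picture the iteratively reweighted quadratic solve is the sketch below for the horizontal coordinate. It assumes 2-pin connections only, a Gauss-Seidel inner solver, and weights of 1/|x_i - x_j| recomputed from the current solution (the linearization idea attributed to Gordian-L above); the formulation on the original slide is not reproduced in this transcript, so the details and the name `solve_qrs_1d` are assumptions.

```python
def solve_qrs_1d(edges, x, movable, outer=10, inner=50, eps=1e-3):
    """Minimize sum of w_uv * (x_u - x_v)^2 over the movable cells.

    edges: list of (u, v) pairs; x: dict cell -> coordinate (fixed cells
    and pads included); movable: cells whose coordinates may change.
    Weights are refreshed from the current solution in each outer pass.
    """
    x = dict(x)
    for _ in range(outer):
        w = {(u, v): 1.0 / max(abs(x[u] - x[v]), eps) for u, v in edges}
        for _ in range(inner):
            for m in movable:
                num = den = 0.0
                for u, v in edges:
                    if m == u:
                        other = v
                    elif m == v:
                        other = u
                    else:
                        continue
                    num += w[(u, v)] * x[other]
                    den += w[(u, v)]
                if den:
                    x[m] = num / den  # weighted mean of connected cells
    return x

# Hypothetical subproblem: one movable cell m tied to three fixed cells.
edges = [("m", "f1"), ("m", "f2"), ("m", "f3")]
start = {"m": 4.0, "f1": 0.0, "f2": 2.0, "f3": 10.0}
solution = solve_qrs_1d(edges, start, movable=["m"])
```

With constant weights the cell would settle at the mean of its neighbors (4.0 here); the reweighting pulls it toward the median (2.0), which is the minimizer of linear wirelength. Nothing in this solve prevents two movable cells from landing on the same coordinate, which is the overlap issue noted above.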

29
Ripple-move legalization (Hur and Lillis, 2000)
Because many forms of subset relaxation ignore
overlap, post-relaxation cell swaps may be needed
to remove overlap.
30
Multilevel Methods: Interpolation (Generalized
Declustering)
  • Goal: transfer a partial solution from a coarser
    level to its adjacent finer level
  • Simplest approach: place all components of a
    cluster at its center
  • Better approach: place each component of an
    aggregate at the weighted average of the
    aggregates to which it is strongly connected.
  • Optionally impose constraints, e.g., the average
    location of the components can be held fixed.
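
The weighted-average rule can be written down directly. In this sketch the connection weights between a fine-level cell and the aggregates are simply given as input; how they are derived (and the name `interpolate`) is an assumption, not something stated on the slide.

```python
def interpolate(weights, coarse_pos):
    """Place each fine-level cell at the weighted average of aggregates.

    weights: cell -> {aggregate: connection weight}
    coarse_pos: aggregate -> (x, y)
    """
    fine_pos = {}
    for cell, ws in weights.items():
        total = sum(ws.values())
        fine_pos[cell] = (
            sum(w * coarse_pos[a][0] for a, w in ws.items()) / total,
            sum(w * coarse_pos[a][1] for a, w in ws.items()) / total,
        )
    return fine_pos

coarse_pos = {"A": (0.0, 0.0), "B": (4.0, 0.0)}
weights = {
    "c1": {"A": 1.0},            # simplest case: sits at its cluster center
    "c2": {"A": 1.0, "B": 3.0},  # pulled three quarters of the way toward B
}
fine_pos = interpolate(weights, coarse_pos)
```

A cell with a single aggregate reproduces the simplest approach (placement at the cluster center), so the same routine covers both bullets above.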

31
Interpolation (Declustering)
  • Use the same grid structure at each level
  • Variable cluster size (may be bigger than a bin)
    handled by hierarchical area density control
  • Multilevel SA engine: the SA engine starts with a
    low temperature at each level except the coarsest
    level

32
AMG-style Linear Interpolation
33
AMG-based Linear Interpolation (A. Brandt, 1986)
34
Iterated Multilevel Flow
Make use of placement solution from 1st V-cycle
First Choice (FC) clustering
35
Iterated Multilevel Flow
Iterated V-Cycles
F-Cycle
Backtracking V-Cycle
36
Sample Impact of the Multilevel Components on
mPL's Overall Quality
  • First-Choice Clustering 34 reduced WL
  • QRS Relaxation 56 reduced WL
  • AMG Interpolation 23 reduced WL
  • Iterated V-cycles 28 reduced WL

37
mPL 3.0 vs. mPL1.0 and Gordian-L
Uniform-Cell IBM/ISPD 98 Circuits

           mPL1.0 Dom.             Gordian-L Dom.
Circuit    Wirelength   CPU time   Wirelength   CPU time
Ibm04      1.18         0.31       1.05         1.90
Ibm07      1.14         0.34       1.05         3.77
Ibm09      1.14         0.33       1.04         4.90
Ibm10      1.11         0.31       0.99         6.54
Ibm14      1.11         0.41       1.04         8.28
Ibm16      1.16         0.46       1.00         11.76
Ibm17      1.07         0.44       0.98         10.41
Ibm18      1.18         0.42       1.03         13.43
Average    1.14         0.38       1.02         7.62
(12% better than mPL1.0 with 2x longer runtime;
2% better than Gordian-L and 7x faster)
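
The summary figures appear to follow from the average row if the wirelength columns are read as ratios of the other placer's wirelength to mPL 3.0's; that reading is an assumption based on the summary line, and the helper name below is hypothetical.

```python
def improvement_from_ratio(ratio):
    """If other / ours == ratio, ours is shorter by 1 - 1/ratio of the other."""
    return 1.0 - 1.0 / ratio

vs_mpl1 = improvement_from_ratio(1.14)     # average wirelength ratio, mPL1.0
vs_gordian = improvement_from_ratio(1.02)  # average wirelength ratio, Gordian-L
```

Under that assumption, a ratio of 1.14 corresponds to roughly 12% shorter wirelength and 1.02 to roughly 2%, consistent with the two summary figures.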
38
mPL 3.0 vs. Capo 8.5 and Dragon
Uniform-Cell IBM/ISPD 98 Circuits

           Capo 8.5 / mPL3.0       Dragon / mPL3.0
Circuit    Wirelength   CPU time   Wirelength   CPU time
Ibm04      1.12         0.53       0.97         3.03
Ibm07      1.12         0.60       0.95         3.33
Ibm09      1.12         0.67       1.01         5.40
Ibm10      1.10         0.55       0.99         4.70
Ibm14      1.08         0.57       0.95         3.02
Ibm16      1.06         0.54       0.90         6.83
Ibm17      1.10         0.43       0.98         6.82
Ibm18      1.10         0.43       0.96         6.10
Average    1.10         0.54       0.96         4.91
(10% better than Capo with 2x longer runtime;
4% worse than Dragon but 4x faster)
39
mPL3.0 vs. mPL1.0, Capo8.5, Dragon and Gordian-L
40
Extension: Multilevel Mixed-size Placement
  • Simultaneously place big and small objects
  • Gradually fix the locations of big objects and
    generate overlap-free placement for big objects
    during multilevel placement

41
Example: Final Placement of ibm02 by mPG-ms
42
Concluding Remarks
  • There is significant opportunity to improve the
    placement technologies
  • Multilevel placement is a promising scalable
    solution