Large Scale Circuit Placement: Gap and Promise - PowerPoint PPT Presentation


Slides: 42

Provided by: cadlab

Learn more at: http://cadlab.cs.ucla.edu

Transcript and Presenter's Notes

1
Large Scale Circuit Placement Gap and Promise

Jason Cong
UCLA VLSI CAD LAB
Joint work with Chin-Chih Chang, Tim Kong, Michail Romesis, Joseph R. Shinnerl, Min Xie, and Xin Yuan

2
Outline

Introduction
Gap Analysis of Existing Placement Algorithms
Scalable Paradigm: Multilevel Placement

3
Why Still the Placement Problem?

True, it has been studied for over 30 years, but
We need good solutions more than ever
One of the most important steps in the IC implementation flow
Directly defines interconnects
Difficult
Problem size grows 2X every 18-24 months (Moore's Law)
Cannot place hierarchically without quality degradation

4
Example of Logic Hierarchy in Final Layout
By courtesy of IBM (Tony Drumm)
5
Why Still Placement?

True, it has been studied for over 30 years, but
We need good solutions more than ever
One of the most important steps in the IC implementation flow
Directly defines interconnects
Difficult
Problem size grows 2X every 18-24 months (Moore's Law)
Cannot place hierarchically without quality degradation
We are not very good at it

6
Outline

Introduction
Gap Analysis of Existing Placement Algorithms
Scalable Paradigm: Multilevel Placement

7
Motivation

Lack of significant progress in wirelength reduction
Rate of reduction is about 5-10% every 2-3 years
Latest developments in placement differ mainly in runtime
Most work compares only with known heuristics
Using real-design-based benchmarks
Using synthetic benchmarks
Little understanding of how far solutions diverge from the optimal

8
Placement Examples with Known Optimal Wirelength (PEKO) [Chang et al., 2003]

Given a (real) netlist N
Construct a netlist N' with known optimal WL that matches the net-degree distribution of N
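Why a known optimum is possible can be sketched in a few lines. Assuming unit-size cells on a unit-pitch grid (one cell per slot) and HPWL as the metric, a k-pin net spans some w x h block with w*h >= k, so its HPWL is at least the minimum over w of (w - 1) + (ceil(k/w) - 1), which equals ceil(2*sqrt(k)) - 2. A net whose pins are packed into such a block meets the bound exactly. The helper names and the degree counts in the usage line are hypothetical, and the sketch ignores how PEKO also matches per-cell pin counts, so it is not the authors' generator.

```python
import math


def hpwl(pins):
    """Half-perimeter wirelength of one net given its (x, y) pin positions."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def best_block_width(k):
    """Width w of the w x ceil(k / w) block with the smallest half-perimeter."""
    return min(range(1, k + 1), key=lambda w: w + math.ceil(k / w))


def lower_bound(k):
    """HPWL lower bound for a k-pin net when each grid slot holds one cell:
    the pins span some w x h block with w * h >= k, so HPWL >= (w-1) + (h-1)."""
    if k < 2:
        return 0
    w = best_block_width(k)
    return w + math.ceil(k / w) - 2


def compact_net(k, x0=0, y0=0):
    """Pins of a k-pin net packed row by row into the best block at (x0, y0)."""
    w = best_block_width(k)
    return [(x0 + i % w, y0 + i // w) for i in range(k)]


def peko_optimal_wl(degree_counts):
    """Total optimal HPWL when every net is generated as a compact block.
    degree_counts maps net degree k -> number of k-pin nets."""
    return sum(n * lower_bound(k) for k, n in degree_counts.items())


# A compactly packed net always meets its bound, so the constructed
# placement is optimal and its total wirelength is known exactly.
for k in range(2, 12):
    assert hpwl(compact_net(k)) == lower_bound(k)

print(peko_optimal_wl({2: 10, 3: 4, 4: 2}))  # 10*1 + 4*2 + 2*2 = 22
```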

9
Placement Examples with Known Upper Bounds (PEKU) [Cong et al., 2003]

Limitations of PEKO:
All the nets are local
The wirelength contributed by global connections in real designs can be significant

10
Illustration: PEKU Example Construction
Input: t = 64, D = {d2 = 35, d3 = 21, d4 = 7, d5 = 4, d6 = 2, d7 = 1}, ? = 0.2
W = {w1–w3 = 0, w4 = 3, w5 = 3, w6 = 0, w7 = 2, w8 = 2, w9 = 1, w10 = 0, w11 = 1, w12 = 1}
Generate 28 2-pin nets optimally
Generate 16 3-pin nets optimally
Generate 5 3-pin nets randomly
Generate 6 4-pin nets optimally
Generate 1 4-pin net randomly
Generate 4 5-pin nets optimally
Generate 2 6-pin nets optimally
Generate 1 7-pin net optimally
Total WL: 184
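The PEKU recipe can be sketched under the same grid model: locally generated nets are packed into compact blocks (each optimal), global nets join randomly chosen cells, and because the result is a real placement, its total HPWL bounds the optimum from above. The function name and seed are hypothetical. The usage lines take only the nets listed in the example (no randomly generated 2-pin nets are listed there), so the totals are not meant to reproduce the figure of 184.

```python
import math
import random


def hpwl(pins):
    """Half-perimeter wirelength of one net given its (x, y) pin positions."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def best_block_width(k):
    """Width w of the w x ceil(k / w) block with the smallest half-perimeter."""
    return min(range(1, k + 1), key=lambda w: w + math.ceil(k / w))


def peku_instance(grid, local_counts, global_counts, seed=0):
    """Hypothetical PEKU-style generator on a grid x grid array of unit cells.

    Local nets are packed into compact blocks at random offsets, so each one
    attains its HPWL lower bound. Global nets connect k distinct cells chosen
    uniformly at random. The constructed placement is feasible, so its total
    HPWL is an upper bound on the optimal wirelength of the netlist."""
    rng = random.Random(seed)
    nets, local_wl = [], 0
    for k, n in local_counts.items():
        w = best_block_width(k)
        h = math.ceil(k / w)
        for _ in range(n):
            x0, y0 = rng.randrange(grid - w + 1), rng.randrange(grid - h + 1)
            net = [(x0 + i % w, y0 + i // w) for i in range(k)]
            nets.append(net)
            local_wl += hpwl(net)
    slots = [(x, y) for x in range(grid) for y in range(grid)]
    for k, n in global_counts.items():
        for _ in range(n):
            nets.append(rng.sample(slots, k))
    upper_bound = sum(hpwl(net) for net in nets)
    return nets, local_wl, upper_bound


# Net counts from the listed steps: t = 64 cells on an 8 x 8 grid.
local = {2: 28, 3: 16, 4: 6, 5: 4, 6: 2, 7: 1}
global_ = {3: 5, 4: 1}
nets, local_wl, ub = peku_instance(8, local, global_, seed=1)
print(len(nets), local_wl, ub)
```

The local part is deterministic (every compact block meets its bound); only the global part, and hence the upper bound, depends on the seed.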
11
Studied Five State-of-the-Art Placers

Capo [Caldwell et al., 2000]
Based on a multilevel partitioner
Aims to enhance routability
Dragon [Wang et al., 2000]
Uses hMetis for the initial partition
Simulated annealing (SA) with bin-based swapping
mPL [Chan et al., 2000]
Nonlinear programming at the coarsest level
Discrete relaxation at finer levels
mPG [Chang et al., 2002]
Uses First-Choice (FC) clustering and hierarchical density control
Incremental A-tree for routability
QPlace [Cadence Inc.]
Leading-edge industrial placer
Component of Silicon Ensemble

12
Experimental Results on PEKO

Existing algorithms can be 59% to 140% away from the optimal on PEKO
On examples with pads:
mPG and QPlace show improvements of 12% and 10%, respectively
Dragon, mPL, and Capo do not benefit much from the additional information
There is significant room for improvement in placement algorithms

13
Experimental Results on PEKO

Capo, QPlace, and mPL scale well in runtime
The average solution quality of each tool deteriorates by an additional 9% to 17% when the problem size increases by a factor of 10

14
Experimental Results on PEKU

The effectiveness of existing placers can vary significantly for circuits of similar size but different characteristics
Comparing QRs helps to identify the technique that works best in each scenario

QR (placed wirelength / upper bound) may not be tight: the upper bound can exceed the optimum, so QR understates the true gap
15
High Interest in the Community
16
Timing-Driven Placement Examples with Known Optimal (TPEKO)

Obtain a placement for the circuit from any available tool
Perform timing analysis on the circuit
Create an artificial combinational path with a delay equal to or larger than that of the longest path
Guarantee that the cells on the path are adjacent to each other
Make the necessary modifications
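The TPEKO steps rest on an argument that a toy model makes concrete: if the artificial path is at least as slow as every existing path and its cells sit on adjacent slots, no placement can make it faster, so the critical delay of the constructed placement is optimal. The sketch assumes a hypothetical delay model (a fixed delay per cell plus a delay proportional to Manhattan wire length) and Python 3.9+ for graphlib; it omits whatever "necessary modifications" the original flow performs.

```python
from graphlib import TopologicalSorter

GATE_DELAY = 1.0  # hypothetical intrinsic delay of every cell
WIRE_DELAY = 0.5  # hypothetical delay per unit of Manhattan distance


def longest_path_delay(edges, pos):
    """Static timing on a combinational DAG.

    edges: (driver, sink) pairs; pos: cell -> (x, y) grid location.
    arrival(v) = GATE_DELAY + max over fan-ins u of arrival(u) + wire(u, v)."""
    preds = {v: [] for v in pos}
    for u, v in edges:
        preds[v].append(u)
    arrival = {}
    for v in TopologicalSorter(preds).static_order():
        latest = 0.0
        for u in preds[v]:
            (x1, y1), (x2, y2) = pos[u], pos[v]
            wire = WIRE_DELAY * (abs(x1 - x2) + abs(y1 - y2))
            latest = max(latest, arrival[u] + wire)
        arrival[v] = latest + GATE_DELAY
    return max(arrival.values())


def min_path_delay(m):
    """Delay of an m-cell chain on adjacent slots (unit distance apart);
    with one cell per slot, no placement can make this chain faster."""
    return m * GATE_DELAY + (m - 1) * WIRE_DELAY


# Steps 1-2: a placement "from any available tool" and its timing analysis.
pos = {"a": (0, 0), "b": (3, 0), "c": (3, 2)}
edges = [("a", "b"), ("b", "c")]
target = longest_path_delay(edges, pos)

# Steps 3-4: the shortest artificial chain whose best-case delay >= target,
# laid out on adjacent slots of an unused row.
m = 1
while min_path_delay(m) < target:
    m += 1
chain = [f"p{i}" for i in range(m)]
pos.update({c: (i, 5) for i, c in enumerate(chain)})
edges += list(zip(chain, chain[1:]))
print(m, longest_path_delay(edges, pos))  # 4 5.5
```

The chain's delay (5.5) is a lower bound for any placement, and this placement attains it, so under the toy model the optimal critical delay is known.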

17
Evaluating Timing-Driven Placement Algorithms
Using TPEKO

Evaluated two state-of-the-art FPGA placement algorithms:
VPR [Marquardt et al., 2000]
PATH [Kong, 2002]
They can be far from the optimal on difficult examples:
35% on average
54% in the worst case

18
Observations from Gap Analysis

Significant opportunity in placement
Existing algorithms may produce solutions far from the optimal
The solution quality of the same placer varies for circuits of similar size but different characteristics
Scalability problems in both runtime and solution quality
Significant ROI
Benefit equal to one to two generations of process scaling
But without requiring a multi-billion-dollar investment (hopefully!)

19
Outline

Introduction
Gap Analysis of Existing Placement Algorithms
Scalable Paradigm
Timing Optimization
Routability Optimization
Concluding Remarks
Application
Multi-Million Gate FPGA Placement

20
Paradigm 2: Multilevel Placement

Coarsening: build the hierarchy by recursive aggregation (generalized clustering)
Relaxation: improve the placement at each level by localized optimization
Interpolation: transfer the coarse-level solution to the adjacent, finer level (generalized declustering)
Multilevel flow: multiple traversals over multiple hierarchies (V-cycle variations)
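The four ingredients fit together as in the sketch below, a minimal 1-D quadratic placer that assumes two-pin weighted connections and at least one fixed pad. Greedy heavy-edge matching stands in for generalized aggregation, interpolation simply places each cell at its cluster's position, and relaxation is Gauss-Seidel. It shows a single top-down V-cycle only and is not mPL's implementation.

```python
from collections import defaultdict


def relax(x, nbrs, fixed, sweeps):
    """Relaxation: Gauss-Seidel sweeps moving each free node to the weighted
    mean of its neighbors (its 1-D quadratic-wirelength optimum)."""
    for _ in range(sweeps):
        for i, adj in nbrs.items():
            if i in fixed or not adj:
                continue
            x[i] = sum(w * x[j] for j, w in adj.items()) / sum(adj.values())
    return x


def coarsen(nbrs, fixed):
    """Coarsening: greedy heavy-edge matching of free nodes (a simple stand-in
    for generalized aggregation). Fixed pads always stay singletons."""
    cmap, next_id = {}, 0
    for i, adj in nbrs.items():
        if i in cmap:
            continue
        cmap[i] = next_id
        if i not in fixed:
            cands = [(w, j) for j, w in adj.items() if j not in cmap and j not in fixed]
            if cands:
                cmap[max(cands, key=lambda c: c[0])[1]] = next_id
        next_id += 1
    cnbrs = defaultdict(lambda: defaultdict(float))
    for i, adj in nbrs.items():
        _ = cnbrs[cmap[i]]  # keep clusters that end up with no outside edges
        for j, w in adj.items():
            if cmap[i] != cmap[j]:
                cnbrs[cmap[i]][cmap[j]] += w
    cfixed = {cmap[i]: p for i, p in fixed.items()}
    return cmap, {c: dict(a) for c, a in cnbrs.items()}, cfixed


def multilevel_place(nbrs, fixed, coarsest=4, sweeps=50):
    """One top-down V-cycle: coarsen recursively, solve the coarsest level,
    then interpolate and relax on the way back to the finest level."""
    cmap, cnbrs, cfixed = coarsen(nbrs, fixed)
    n_free = sum(1 for i in nbrs if i not in fixed)
    if n_free <= coarsest or len(cnbrs) == len(nbrs):
        start = sum(fixed.values()) / len(fixed)
        x = {i: fixed.get(i, start) for i in nbrs}
        return relax(x, nbrs, fixed, 10 * sweeps)   # coarsest-level solve
    cx = multilevel_place(cnbrs, cfixed, coarsest, sweeps)
    x = {i: cx[cmap[i]] for i in nbrs}              # interpolation
    return relax(x, nbrs, fixed, sweeps)            # relaxation


# A chain of 10 movable cells between pads fixed at x = 0 and x = 11.
names = ["p0"] + [f"c{i}" for i in range(10)] + ["p1"]
nbrs = {n: {} for n in names}
for a, b in zip(names, names[1:]):
    nbrs[a][b] = nbrs[b][a] = 1.0
x = multilevel_place(nbrs, {"p0": 0.0, "p1": 11.0})
print([round(x[f"c{i}"], 2) for i in range(10)])  # close to 1.0, 2.0, ..., 10.0
```

The coarse levels only have to get the cells roughly in place; the short relaxation at each finer level then removes the remaining local error.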

21
Multi-Level Optimization Framework

Multilevel coarsening generates smaller problems at coarser levels → faster optimization at coarser levels
Different aspects of the solution space may be explored at different levels
Gradual refinement of good solutions from coarser levels is very efficient
Successful in many applications
Originally developed for PDEs
Recent success in VLSI CAD: partitioning, placement, routing

22
Multilevel Coarse Placement
23
Multilevel Methods: Coarsening by Recursive Aggregation

Recursive aggregation defines the hierarchy.
Different aggregation algorithms can be used on different levels and/or in different V-cycles.
Clustering methods:
First-Choice Clustering (hMetis [Karypis, 1999])
AMG-based aggregation
An aggregate need not be a cluster: a cell can be fractionally associated with more than one aggregate.
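First-Choice clustering, as described for hMetis, differs from plain matching in that a cell may join a neighbor that is already clustered. The sketch below assumes the common 1/(|e| - 1) hyperedge weight and a hypothetical max_size cap standing in for an area limit; real implementations also weigh cell areas and net weights.

```python
import random
from collections import defaultdict


def first_choice(nets, n_cells, max_size=3, seed=0):
    """Sketch of First-Choice (FC) clustering on a hypergraph.

    nets: lists of cell ids in range(n_cells). Cells are visited in random
    order; each still-unclustered cell joins the neighbor it is most strongly
    connected to, whether or not that neighbor is already in a cluster.
    Connectivity is the sum of 1 / (|e| - 1) over shared nets (an assumed
    weighting); max_size caps the number of cells per cluster."""
    rng = random.Random(seed)
    conn = [defaultdict(float) for _ in range(n_cells)]
    for net in nets:
        if len(net) < 2:
            continue
        w = 1.0 / (len(net) - 1)
        for u in net:
            for v in net:
                if u != v:
                    conn[u][v] += w
    cluster = list(range(n_cells))            # cluster id of each cell
    members = {c: {c} for c in range(n_cells)}
    order = list(range(n_cells))
    rng.shuffle(order)
    for v in order:
        if len(members[cluster[v]]) > 1:
            continue                          # already absorbed by another cell
        best, best_w = None, 0.0
        for u, w in conn[v].items():
            if len(members[cluster[u]]) < max_size and w > best_w:
                best, best_w = u, w
        if best is not None:
            target = cluster[best]
            members[target].add(v)
            del members[cluster[v]]           # v's old singleton cluster
            cluster[v] = target
    ids = {c: i for i, c in enumerate(sorted(members))}
    return [ids[c] for c in cluster]          # dense labels 0 .. m-1


nets = [[0, 1], [0, 1, 2], [2, 3], [3, 4, 5], [4, 5]]
labels = first_choice(nets, 6)
print(labels)
```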

24
Multilevel Methods: Relaxation (Intralevel Optimization)

Iterative improvement at each level by fast, localized computation
Discrete permutation enumeration (swapping)
Unconstrained quadratic wirelength minimization on subsets
Network-flow-based improvement on subsets (RDFL)
Local relaxation is sufficient; global improvement comes from the multilevel hierarchy.
Relaxations at finer levels may be quite different from (e.g., more discrete than) relaxations at coarser levels.

25
Relaxation on Local Subsets
Move the red cells to their optimal positions, holding all other cells fixed and (perhaps) ignoring overlap
[Figure panels: Original; Subnetlist with Subproblem]
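For a quadratic objective, moving the selected cells to their optimal positions amounts to one small linear system per coordinate. This sketch assumes two-pin weighted connections (a net model such as clique or star would first be expanded into them) and uses a hand-written dense solver; the function names are hypothetical.

```python
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (fine for small subsets)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def relax_subset(pos, edges, subset):
    """Move the cells in `subset` to the positions that minimize
    sum w * ((xi - xj)^2 + (yi - yj)^2), holding every other cell fixed and
    ignoring overlap. edges: (i, j, w) two-pin connections. Every group of
    moved cells must connect, directly or indirectly, to some fixed cell."""
    idx = {c: k for k, c in enumerate(subset)}
    n = len(subset)
    A = [[0.0] * n for _ in range(n)]
    bx, by = [0.0] * n, [0.0] * n
    for i, j, w in edges:
        for a, other in ((i, j), (j, i)):
            if a not in idx:
                continue
            A[idx[a]][idx[a]] += w
            if other in idx:
                A[idx[a]][idx[other]] -= w
            else:  # a fixed neighbor contributes to the right-hand side
                bx[idx[a]] += w * pos[other][0]
                by[idx[a]] += w * pos[other][1]
    xs, ys = solve(A, bx), solve(A, by)  # x and y decouple
    new_pos = dict(pos)
    for c, k in idx.items():
        new_pos[c] = (xs[k], ys[k])
    return new_pos


pos = {"p": (0.0, 0.0), "a": (5.0, 1.0), "b": (1.0, 5.0), "q": (4.0, 4.0)}
edges = [("p", "a", 1.0), ("a", "b", 1.0), ("b", "q", 1.0)]
new_pos = relax_subset(pos, edges, ["a", "b"])
print(new_pos["a"], new_pos["b"])  # about (1.33, 1.33) and (2.67, 2.67)
```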
26
Example: Goto-based Discrete Relaxation

Each cell's optimal location is readily calculated when all other cells are held fixed.
Compute a chain A, B, C, D, E, where B is a randomly selected neighbor of A's optimal location, C is a randomly selected neighbor of B's optimal location, and so on.
Examine all permutations of the chain and take the best one.
Problem: the chain is not closed (A is not necessarily near any other cell's optimal location).
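A Goto-style pass can be sketched on a grid with one cell per slot. With all other cells fixed, a cell's HPWL is minimized at any median of the low and high ends of its nets' bounding boxes; the chain and permutation steps follow the description above, while the distance-1 neighborhood, chain length, and seed are assumptions. Because the identity permutation is among those tried, the pass never increases wirelength.

```python
import itertools
import random
import statistics


def total_hpwl(nets, pos):
    """Sum of half-perimeter wirelengths; pos maps cell -> (x, y) slot."""
    total = 0
    for net in nets:
        xs = [pos[c][0] for c in net]
        ys = [pos[c][1] for c in net]
        total += max(xs) - min(xs) + max(ys) - min(ys)
    return total


def optimal_location(cell, nets, pos):
    """With every other cell fixed, any median of the low and high ends of the
    cell's nets' bounding boxes (computed without the cell) minimizes HPWL."""
    xs, ys = [], []
    for net in nets:
        others = [pos[c] for c in net if c != cell]
        if cell in net and others:
            xs += [min(p[0] for p in others), max(p[0] for p in others)]
            ys += [min(p[1] for p in others), max(p[1] for p in others)]
    if not xs:
        return pos[cell]
    return statistics.median_low(xs), statistics.median_low(ys)


def goto_chain_move(start, nets, pos, length=5, seed=0):
    """Build a chain start -> B -> C ..., each next cell chosen at random among
    cells within distance 1 of the previous cell's optimal location, then
    try every permutation of the chain's slots and keep the best."""
    rng = random.Random(seed)
    chain = [start]
    while len(chain) < length:
        tx, ty = optimal_location(chain[-1], nets, pos)
        near = [c for c, (x, y) in pos.items()
                if c not in chain and abs(x - tx) + abs(y - ty) <= 1]
        if not near:
            break
        chain.append(rng.choice(near))
    slots = [pos[c] for c in chain]
    best, best_cost = slots, total_hpwl(nets, pos)  # identity is a candidate
    for perm in itertools.permutations(slots):
        trial = dict(pos)
        trial.update(zip(chain, perm))
        cost = total_hpwl(nets, trial)
        if cost < best_cost:
            best, best_cost = list(perm), cost
    new_pos = dict(pos)
    new_pos.update(zip(chain, best))
    return new_pos, best_cost


pos = {i: (i % 3, i // 3) for i in range(9)}  # 3 x 3 grid, one cell per slot
nets = [[0, 8], [1, 7], [2, 6], [3, 5], [0, 4, 8]]
before = total_hpwl(nets, pos)
new_pos, after = goto_chain_move(0, nets, pos)
print(before, after)
```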

27
Example: Quadratic Relaxation on Noncontiguous Subsets (QRS)

Select a subset M of cells to move
Identify the other cells and pads, F, connected to M by nets
Decouple the horizontal and vertical problems
M is obtained as segments of length k along a DFS vertex traversal of the netlist
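Subset selection for QRS can be sketched as a DFS over the cell-adjacency graph whose visiting order is cut into segments of length k; F is then every other cell or pad sharing a net with M. Traversing pads but never including them in M is an assumption of this sketch, as are the function names.

```python
def dfs_order(adj, fixed=()):
    """Iterative DFS over the cell-adjacency graph, restarted from each
    unvisited vertex so disconnected parts are covered; fixed pads are
    traversed but left out of the returned order."""
    seen, order = set(), []
    for root in adj:
        stack = [root]
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            if v not in fixed:
                order.append(v)
            # key=str gives a deterministic order for mixed int/str ids
            stack.extend(sorted(adj[v] - seen, key=str, reverse=True))
    return order


def qrs_subsets(nets, k, fixed=()):
    """Split a DFS traversal into segments M of length k and, for each M,
    collect F: the other cells and pads sharing a net with M."""
    adj = {}
    for net in nets:
        for c in net:
            adj.setdefault(c, set()).update(x for x in net if x != c)
    order = dfs_order(adj, set(fixed))
    result = []
    for s in range(0, len(order), k):
        m = set(order[s:s + k])
        f = {c for net in nets if m & set(net) for c in net} - m
        result.append((order[s:s + k], f))
    return result


nets = [["pad0", 0], [0, 1], [1, 2], [2, 3], [3, 4], [4, "pad1"]]
for m, f in qrs_subsets(nets, 2, fixed={"pad0", "pad1"}):
    print(m, sorted(f, key=str))
```

Each segment is connected in the netlist but can be scattered across the layout, which is what makes the subsets noncontiguous.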

28
Solving the QRS Subproblem

Problem formulation (horizontal case)
Iteratively solve the weighted quadratic minimization problem, using the current solution to determine the weights (as in Gordian-L)
May result in cell overlap!
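The exact formulation is not reproduced here. As an assumption, the sketch below takes two-pin connections with base weights c and the linear objective sum c*|xi - xj| over horizontal coordinates; each outer iteration freezes Gordian-L-style weights 1/|xi - xj| from the current solution and solves the resulting quadratic problem by Gauss-Seidel. Nothing prevents two movable cells from converging to the same x, which is the overlap noted above.

```python
def qrs_horizontal(x, edges, movable, iters=30, inner=100, eps=1e-6):
    """Approximately minimize sum c * |xi - xj| over the movable cells by
    repeatedly minimizing sum c * w * (xi - xj)^2, where the weight
    w = 1 / max(|xi - xj|, eps) comes from the previous solution.
    edges: (i, j, c) two-pin connections; cells not in `movable` stay fixed."""
    x = dict(x)
    for _ in range(iters):
        weighted = [(i, j, c / max(abs(x[i] - x[j]), eps)) for i, j, c in edges]
        nbrs = {m: [] for m in movable}
        for i, j, w in weighted:
            if i in nbrs:
                nbrs[i].append((j, w))
            if j in nbrs:
                nbrs[j].append((i, w))
        for _ in range(inner):  # Gauss-Seidel on the weighted quadratic
            for m, adj in nbrs.items():
                if adj:
                    x[m] = sum(w * x[n] for n, w in adj) / sum(w for _, w in adj)
    return x


# One movable cell tied to fixed pins at 0, 1 and 10: the unweighted quadratic
# optimum is their mean (11/3); the linear-wirelength optimum is the median (1).
x0 = {"m": 11 / 3, "a": 0.0, "b": 1.0, "c": 10.0}
edges = [("m", "a", 1.0), ("m", "b", 1.0), ("m", "c", 1.0)]
x = qrs_horizontal(x0, edges, movable={"m"})
print(round(x["m"], 3))
```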

29
Ripple-Move Legalization [Hur and Lillis, 2000]
Because many forms of subset relaxation ignore overlap, post-relaxation cell swaps may be needed to remove it.
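Ripple move itself is more involved than fits here. As a much simpler stand-in, and explicitly not Hur and Lillis's algorithm, the sketch below removes overlap in a single row by shifting cells right in order of their desired positions; it performs no swaps and does not check row capacity.

```python
def legalize_row(cells, row_start=0.0):
    """Greedy single-row overlap removal: visit cells by desired left edge and
    push each one right just far enough to clear the previous cell.
    cells: (name, desired_x, width). Keeps the left-to-right order."""
    placed, edge = {}, row_start
    for name, x, width in sorted(cells, key=lambda c: c[1]):
        left = max(x, edge)
        placed[name] = left
        edge = left + width
    return placed


cells = [("a", 0.0, 2.0), ("b", 1.0, 2.0), ("c", 5.0, 1.0)]
print(legalize_row(cells))  # {'a': 0.0, 'b': 2.0, 'c': 5.0}
```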
30
Multilevel Methods: Interpolation (Generalized Declustering)

Goal: transfer a partial solution from a coarser level to the adjacent finer level
Simplest approach: place all components of a cluster at its center
Better approach: place each component of an aggregate at the weighted average of the aggregates to which it is strongly connected
Optionally impose constraints, e.g., hold the average location of the components fixed.
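The "better approach" can be sketched as follows, in the spirit of AMG-style interpolation. Assumptions: connectivity is summed over two-pin edges, "strongly connected" means at least theta times a cell's strongest link (theta = 0.5 is arbitrary), and the optional constraint is enforced by shifting each aggregate's members so their mean equals the aggregate's coarse position.

```python
from collections import defaultdict


def interpolate(fine_edges, cmap, coarse_pos, theta=0.5, keep_mean=True):
    """Place each fine cell at the connectivity-weighted average of the
    aggregates it is strongly connected to, then optionally shift each
    aggregate's members so their average location equals its coarse position.
    fine_edges: (i, j, w); cmap: cell -> aggregate; coarse_pos: aggregate -> (x, y)."""
    conn = defaultdict(lambda: defaultdict(float))
    for i, j, w in fine_edges:
        conn[i][cmap[j]] += w
        conn[j][cmap[i]] += w
    pos = {}
    for cell, agg in cmap.items():
        links = conn[cell] or {agg: 1.0}  # isolated cell: use its own aggregate
        top = max(links.values())
        strong = {a: w for a, w in links.items() if w >= theta * top}
        total = sum(strong.values())
        pos[cell] = (sum(w * coarse_pos[a][0] for a, w in strong.items()) / total,
                     sum(w * coarse_pos[a][1] for a, w in strong.items()) / total)
    if keep_mean:
        groups = defaultdict(list)
        for cell, agg in cmap.items():
            groups[agg].append(cell)
        for agg, cells in groups.items():
            mx = sum(pos[c][0] for c in cells) / len(cells)
            my = sum(pos[c][1] for c in cells) / len(cells)
            dx, dy = coarse_pos[agg][0] - mx, coarse_pos[agg][1] - my
            for c in cells:
                pos[c] = (pos[c][0] + dx, pos[c][1] + dy)
    return pos


cmap = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
coarse_pos = {"A": (0.0, 0.0), "B": (10.0, 0.0)}
edges = [("a1", "a2", 1.0), ("a2", "b1", 1.0), ("b1", "b2", 2.0)]
pos = interpolate(edges, cmap, coarse_pos)
print({c: round(p[0], 2) for c, p in pos.items()})
# {'a1': -2.5, 'a2': 2.5, 'b1': 8.33, 'b2': 11.67}
```

Cells with links into a neighboring aggregate are pulled toward it, so components no longer collapse onto a single point as they would with the simplest approach.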

31
Interpolation (Declustering)

Use the same grid structure at each level
Variable cluster size (may be bigger than a bin), handled by hierarchical area density control
Multilevel SA engine: SA starts with a low temperature at each level except the coarsest
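The multilevel SA schedule can be sketched with a swap-based annealer over a 1-D slot assignment. The start temperatures, cooling rate, and move counts below are hypothetical values, and the annealer is a generic Metropolis scheme rather than mPG's engine; the point is only that finer levels start cold so they refine the interpolated solution instead of scrambling it.

```python
import math
import random


def wirelength(slot, nets):
    """Total 1-D wirelength of two-pin nets; slot maps cell -> integer slot."""
    return sum(abs(slot[a] - slot[b]) for a, b in nets)


def anneal(slot, nets, t0, t_min=0.01, alpha=0.9, moves=100, seed=0):
    """Swap-based simulated annealing (Metropolis acceptance, geometric
    cooling from t0 down to t_min). Returns the best assignment seen."""
    rng = random.Random(seed)
    slot = dict(slot)
    cells = list(slot)
    cost = wirelength(slot, nets)
    best, best_cost = dict(slot), cost
    t = t0
    while t > t_min:
        for _ in range(moves):
            a, b = rng.sample(cells, 2)
            slot[a], slot[b] = slot[b], slot[a]
            new = wirelength(slot, nets)
            if new <= cost or rng.random() < math.exp((cost - new) / t):
                cost = new
                if cost < best_cost:
                    best, best_cost = dict(slot), cost
            else:
                slot[a], slot[b] = slot[b], slot[a]  # reject: undo the swap
        t *= alpha
    return best, best_cost


def start_temperature(level, coarsest, t_high=10.0, t_low=0.5):
    """Hypothetical schedule: only the coarsest level starts hot."""
    return t_high if level == coarsest else t_low


nets = [(i, i + 1) for i in range(7)]               # an 8-cell chain
scrambled = dict(zip(range(8), [5, 2, 7, 0, 3, 6, 1, 4]))
placed, cost = anneal(scrambled, nets, start_temperature(level=2, coarsest=2))
print(wirelength(scrambled, nets), cost)
```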

32
AMG-style Linear Interpolation
33
AMG-based Linear Interpolation [A. Brandt, 1986]
34
Iterated Multilevel Flow
Make use of the placement solution from the first V-cycle
First Choice (FC) clustering
35
Iterated Multilevel Flow
Iterated V-Cycles
F-Cycle
Backtracking V-Cycle
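The level sequences of these flows can be written down recursively, numbering levels from 0 (finest) to the coarsest. The F-cycle below follows the textbook multigrid definition (an F-cycle on the next coarser level followed by a V-cycle there); mPL's variants, including the backtracking V-cycle, may differ and are not sketched.

```python
def v_cycle(level, coarsest):
    """Levels visited by one V-cycle: straight down to the coarsest, then back up."""
    if level == coarsest:
        return [level]
    return [level] + v_cycle(level + 1, coarsest) + [level]


def f_cycle(level, coarsest):
    """Textbook multigrid F-cycle: an F-cycle on the next coarser level,
    followed by a V-cycle there, before returning to this level."""
    if level == coarsest:
        return [level]
    return [level] + f_cycle(level + 1, coarsest) + v_cycle(level + 1, coarsest) + [level]


def iterated_v_cycles(n, coarsest):
    """n V-cycles back to back; each may rebuild its hierarchy from the
    placement produced by the previous one."""
    seq = []
    for _ in range(n):
        cycle = v_cycle(0, coarsest)
        seq += cycle if not seq else cycle[1:]
    return seq


print(v_cycle(0, 2))            # [0, 1, 2, 1, 0]
print(f_cycle(0, 2))            # [0, 1, 2, 2, 1, 1, 2, 1, 0]
print(iterated_v_cycles(2, 2))  # [0, 1, 2, 1, 0, 1, 2, 1, 0]
```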
36
Sample Impact of the Multilevel Components on mPL's Overall Quality

First-Choice Clustering 34 reduced WL
QRS Relaxation 56 reduced WL
AMG Interpolation 23 reduced WL
Iterated V-cycles 28 reduced WL

37
mPL 3.0 vs. mPL1.0 and Gordian-L
Uniform-Cell IBM/ISPD 98 Circuits (ratios relative to mPL3.0)

Circuit   mPL1.0/mPL3.0   mPL1.0/mPL3.0   Gordian-L/mPL3.0   Gordian-L/mPL3.0
          Wirelength      CPU time        Wirelength         CPU time
Ibm04     1.18            0.31            1.05               1.90
Ibm07     1.14            0.34            1.05               3.77
Ibm09     1.14            0.33            1.04               4.90
Ibm10     1.11            0.31            0.99               6.54
Ibm14     1.11            0.41            1.04               8.28
Ibm16     1.16            0.46            1.00               11.76
Ibm17     1.07            0.44            0.98               10.41
Ibm18     1.18            0.42            1.03               13.43
Average   1.14            0.38            1.02               7.62
(12% better than mPL1.0 with 2x longer runtime; 2% better than Gordian-L and 7x faster)
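The summary line follows from the Average row. Assuming each entry is the competitor's value divided by mPL3.0's, a wirelength ratio r means mPL3.0 uses 1/r of the competitor's wirelength, and a CPU ratio above 1 means mPL3.0 is faster. The arithmetic:

```python
# Per-circuit ratios from the table (competitor / mPL3.0), ibm04 ... ibm18.
mpl1_wl = [1.18, 1.14, 1.14, 1.11, 1.11, 1.16, 1.07, 1.18]
mpl1_cpu = [0.31, 0.34, 0.33, 0.31, 0.41, 0.46, 0.44, 0.42]
gl_wl = [1.05, 1.05, 1.04, 0.99, 1.04, 1.00, 0.98, 1.03]
gl_cpu = [1.90, 3.77, 4.90, 6.54, 8.28, 11.76, 10.41, 13.43]


def mean(values):
    """Arithmetic mean, as used for the Average row."""
    return sum(values) / len(values)


for name, wl, cpu in (("mPL1.0", mpl1_wl, mpl1_cpu), ("Gordian-L", gl_wl, gl_cpu)):
    r_wl, r_cpu = mean(wl), mean(cpu)
    # mPL3.0 wirelength is 1/r_wl of the competitor's, i.e. 1 - 1/r_wl shorter.
    print(f"{name}: mean WL ratio {r_wl:.2f} ({1 - 1 / r_wl:.0%} shorter with mPL3.0), "
          f"mean CPU ratio {r_cpu:.2f} (mPL3.0 speed-up; below 1 means slower)")
```

This reproduces the Average row (1.14, 0.38, 1.02, 7.62) and the 12% and 2% wirelength figures; the runtime factor versus mPL1.0 is 1/0.38, about 2.6, which the summary line rounds to "2x longer".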
38
mPL 3.0 vs. Capo 8.5 and Dragon
Uniform-Cell IBM/ISPD 98 Circuits

Circuit   Capo 8.5/mPL3.0   Capo 8.5/mPL3.0   Dragon/mPL3.0   Dragon/mPL3.0
          Wirelength        CPU time          Wirelength      CPU time
Ibm04     1.12              0.53              0.97            3.03
Ibm07     1.12              0.60              0.95            3.33
Ibm09     1.12              0.67              1.01            5.40
Ibm10     1.10              0.55              0.99            4.70
Ibm14     1.08              0.57              0.95            3.02
Ibm16     1.06              0.54              0.90            6.83
Ibm17     1.10              0.43              0.98            6.82
Ibm18     1.10              0.43              0.96            6.10
Average   1.10              0.54              0.96            4.91
(10% better than Capo with 2x longer runtime; 4% worse than Dragon but 4x faster)
39
mPL3.0 vs. mPL1.0, Capo8.5, Dragon and Gordian-L
40
Extension: Multilevel Mixed-Size Placement

Simultaneously place big and small objects
Gradually fix the locations of big objects and generate an overlap-free placement for them during multilevel placement

41
Example: Final Placement of ibm02 by mPG-ms
42
Concluding Remarks