Interconnect Planning

1
Interconnect Planning

Prof. David Z. Pan
dpan_at_ece.utexas.edu
Office: ACES 5.434

2
Outline

Introduction
Two representative works on interconnect planning
  Interconnect performance estimation modeling (IPEM) [Cong-Pan, TCAD'01]
  Buffer planning [Cong-Kong-Pan, TVLSI'01]
Rent's rule and the SLIP workshop
  SLIP stands for System Level Interconnect Prediction
  SLIP Workshop web page: http://www.sliponline.org

3
Introduction

So far we have shown many algorithms for interconnect optimization
  They work effectively, but
  They need some kind of planning for design closure
We have shown interconnect width planning (architecture planning) [Cong-Pan, DAC'99]
Interconnect performance estimation modeling [Cong-Pan, TCAD'01]
Buffer planning [Cong-Kong-Pan, TVLSI'01]

4
Interconnect Optimization Not Enough

Need high-level tools to cooperate
Interconnect synthesis capability is limited without planning
What are their limits during early design planning?
  Enough routing resources?
  Locations for buffers on long interconnects?
  ......

5
Needs for Efficient Interconnect Performance Models

Efficiency
Abstraction to hide detailed design information
  granularity of wire segmentation
  number of wire widths, buffer sizes, ...
Explicit relation to enable optimal design decisions at high levels
Ease of interaction with logic/high-level synthesis tools

6
Problem Formulation
[Figure: driver G driving an interconnect of length l into load CL]

Rd0: driver effective resistance of G0
Rd: driver effective resistance of G
l: interconnect length
CL: loading capacitance

7
Problem Formulation
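[Figure: input stage G0 and driver G driving an interconnect of length l into load CL]

Optimizations considered: OWS (optimal wire sizing), SDWS (simultaneous driver and wire sizing), BIWS (buffer insertion and wire sizing), BISWS (buffer insertion/sizing and wire sizing)
What is the optimized delay? Do not run the optimization algorithm!
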
8
Example: Delay/Area Estimation under Wire Sizing (WS)
9
Delay/Area Estimation under 1-WS

Closed-form delay formula

Closed-form area formula

10
Delay/Area Estimation under OWS

Closed-form delay estimation formula

where W(x) is Lambert's W function, defined as the value $w$ that satisfies $w e^{w} = x$

Closed-form area estimation formula
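
As a hedged aside on evaluating Lambert's W function in practice: for x >= 0 the principal branch can be found with a few Newton steps on w*e^w = x. The helper name `lambert_w`, the starting guess, and the tolerance are assumptions made for this sketch; it does not reproduce the slide's closed-form delay or area formulas.

```python
import math

def lambert_w(x, tol=1e-12, max_iter=100):
    """Principal branch of Lambert's W for x >= 0, i.e. the w with w*exp(w) == x."""
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    w = math.log1p(x)  # starting guess at or above the root; exact at x = 0
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))  # Newton step on f(w) = w*e^w - x
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w
```

Because the function is only a handful of arithmetic operations, a model built on it stays cheap enough to call inside a higher-level planning loop.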

11
Delay Comparison of Various WS Solutions

OWS model consistently matches TRIO
1-WS and 2-WS work well for lengths < 8 mm in Tier 1
All work well in Tier 4 up to chip size

12
Average Width (Area) Comparison

Very close for the model

13
Average Width (Area) Comparison

Very close for 1-WS, 2-WS and OWS!

14
Property of DEM-OWS

Theorem: Tows is a sub-quadratic, convex function of length l
Note: without wire sizing, wiring delay ∝ l^2, as used by most layout-driven logic synthesis systems, e.g.
  [Ramachandran et al., ICCAD-92]
  [Chen-Tsai-Kurdahi, IEE Proc.-Circuits, Devices and Systems '95]
Closed-form DEM-OWS will serve as a basis for deriving SDWS, BIWS and BISWS
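
The quadratic growth of unsized wire delay can be illustrated with the standard Elmore delay of a driver, a uniform wire, and a load. This is a minimal sketch: the function name and the parameter names (`r`, `c` as per-unit-length wire resistance and capacitance) are assumptions, and it models the unsized case only, not DEM-OWS.

```python
def elmore_delay(Rd, r, c, l, CL):
    """Elmore delay of a driver (resistance Rd) driving a uniform wire of length l into load CL.

    r and c are wire resistance and capacitance per unit length (uniform width assumed).
    """
    return Rd * (c * l + CL) + r * l * (0.5 * c * l + CL)

def wire_only_ratio(l):
    """Delay at length 2*l divided by delay at length l for an ideal driver and no load."""
    return elmore_delay(0.0, 1.0, 1.0, 2.0 * l, 0.0) / elmore_delay(0.0, 1.0, 1.0, l, 0.0)
```

With an ideal driver and no load the only remaining term is r*c*l^2/2, so doubling the length multiplies the delay by four; the Rd and CL terms add contributions that are linear in l.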

15
Comparison of DEM-OWS vs. TRIO

0.18 um, Rd = rg/100, CL = cg x 100
For the experiments, max. wire width is 20x min. width; the wire is segmented every 10 um

16
Critical Length for BI under OWS
Solve for l => critical length lcrit(b, Rd, CL)
- Computed by bisection method
- Constant time in practice
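
The bisection step can be sketched as follows. The equation being solved is not preserved in this transcript, so the sketch makes its own assumptions: a plain Elmore model with a uniform wire instead of Tows, driver and load both equal to the buffer (Rb, Cb), and the critical length taken as the break-even length at which one mid-point buffer with intrinsic delay Tb starts to help. All function and parameter names are hypothetical.

```python
import math

def unbuffered_delay(R, r, c, l, C):
    # Elmore delay: driver resistance R, uniform wire (r, c per unit length), load C
    return R * (c * l + C) + r * l * (0.5 * c * l + C)

def buffer_gain(l, r, c, Rb, Cb, Tb):
    # Delay saved by inserting one buffer at the midpoint of a buffer-to-buffer wire
    one_stage = unbuffered_delay(Rb, r, c, l, Cb)
    two_stages = 2.0 * unbuffered_delay(Rb, r, c, l / 2.0, Cb) + Tb
    return one_stage - two_stages

def critical_length(r, c, Rb, Cb, Tb, iterations=100):
    lo, hi = 0.0, 1.0
    while buffer_gain(hi, r, c, Rb, Cb, Tb) <= 0.0:  # grow the bracket until insertion helps
        hi *= 2.0
    for _ in range(iterations):  # fixed iteration count: constant time
        mid = 0.5 * (lo + hi)
        if buffer_gain(mid, r, c, Rb, Cb, Tb) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

Under these assumptions the gain is r*c*l^2/4 - Rb*Cb - Tb, which is increasing in l, so the bracket always contains exactly one root and the result can be checked against 2*sqrt((Rb*Cb + Tb)/(r*c)).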
17
Critical Length Meaning
18
Critical Lengths lcrit(b, Rb, Cb)
[Table: critical lengths, unit: mm]
- Denote lc = lcrit(b, Rb, Cb)
19
Logic Volume within lc
- Defined as the number of minimum-size 2-input NAND gates that can be packed within an area of lc/2 x lc/2
[Table: logic volume, unit: million gates]
20
Property of BIWS

Theorem: For BIWS, the distances between adjacent buffers are the same, and equal to lc, the critical length lcrit(b, Rb, Cb).
Proof is based on the convexity of Tows

21
Linear DEM for BIWS

The original long interconnect is divided into ⌈l/lc⌉ stages
The number of stages is proportional to l
Each stage of length lc has delay Tows(Rb, lc, Cb)
=> linear DEM for BIWS
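
A minimal sketch of this linear estimate, assuming the stage count is rounded up and that the per-stage delay Tows(Rb, lc, Cb) has already been computed elsewhere and is passed in as `stage_delay` (a hypothetical parameter). It ignores the fact that the first stage is driven by Rd and the last stage sees CL.

```python
import math

def biws_delay_estimate(l, lc, stage_delay):
    """Linear delay estimate: number of stages of length lc times the per-stage delay."""
    if l <= 0 or lc <= 0:
        raise ValueError("lengths must be positive")
    stages = math.ceil(l / lc)  # assumption: a partial last stage counts as a full stage
    return stages * stage_delay
```

The estimate is a step function of l that tracks the straight line (stage_delay / lc) * l, which is what makes the model linear in length.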

22
Comparison of DEM-BIWS vs. TRIO

TRIO is an interconnect synthesis engine
Rd0 = rg/10, CL = cg x 10, buffer type is 100x min.
For the experiments, max. wire width is 20x min. width; the wire is segmented every 100 um.

23
Comparison of DEM-BIWS vs. TRIO

0.18 um, Rd0 = rg/10, CL = cg x 10, buffer type is 100x min.
For the experiments, max. wire width is 20x min. width; the wire is segmented every 100 um.

24
DEM under BISWS

Observations from extensive experiments
  Linear delay versus length
  Internal buffers are about the same size
Therefore, we estimate BISWS by the best BIWS from the available buffer types

Complexity: O(|B|). Since the buffer set B normally has fewer than 20 types, this is constant time in practice.
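
The "best BIWS over the buffer library" idea can be sketched as a single pass over the buffer types. The dictionary layout, the names, and the use of a rounded-up stage count are assumptions for illustration; each entry is expected to carry a critical length and a per-stage delay computed beforehand.

```python
import math

def bisws_delay_estimate(l, buffer_types):
    """Pick the buffer type whose linear BIWS estimate is smallest.

    buffer_types maps a buffer name to (lc, stage_delay); returns (best_name, best_delay).
    """
    best_name, best_delay = None, math.inf
    for name, (lc, stage_delay) in buffer_types.items():  # one pass: O(|B|)
        delay = math.ceil(l / lc) * stage_delay
        if delay < best_delay:
            best_name, best_delay = name, delay
    return best_name, best_delay
```

For example, with two hypothetical types `{"b10": (1.0, 1.0), "b100": (2.5, 2.0)}` and l = 10, the larger buffer needs 4 stages instead of 10 and wins despite its higher per-stage delay.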

25
Comparison of DEM-BISWS vs. TRIO

0.18 um, Rd0 = rg/10, CL = cg x 10
For the experiments, max. allowable buffer/driver size is 400x min. device; max. wire width is 20x min. width; the wire is segmented every 100 um.

26
Multiple-Pin Nets (TAU99)
[Figure: driver G driving sinks S1, S2, ..., Sn with load capacitances Cs1, Cs2, ..., Csn]

Estimation with different optimization objectives
  Minimize the delay to a single critical sink (SCS)
  Minimize the maximum delay (defined as the tree delay) for multiple critical sinks (MCS)
  Minimize weighted delay ...

27
Key Idea

Estimation for Single Critical Sink
  We first formulate the original problem as a single-line-multiple-load (SLML) problem
  Then transform SLML into a single-line-single-load (SLSL) problem
  Use the previous 2-pin results to estimate delay and area on the critical path
Estimation for Multiple Critical Sinks
  We obtain a lower-bound delay estimate for the optimal tree delay
  We show that, in practice, this lower bound is tight and close to the optimal tree delay

28
Outline

Introduction
Two representative works on interconnect planning
  Interconnect performance estimation modeling (IPEM) [Cong-Pan, TCAD'01]
  Buffer block planning during floorplanning [Cong-Kong-Pan, TVLSI'01]
Rent's rule and the SLIP workshop
  SLIP stands for System Level Interconnect Prediction
  SLIP Workshop web page: http://www.sliponline.org

29
Why do we need Buffer Planning?
[Figure: floorplan with soft blocks and hard (IP) blocks]

Many buffers in modern designs
  easily 10-20% of cells now,
  projected to be 70% in future uP according to an Intel paper [ISPD'03]
Restrictions from hard IP blocks
Impact on floorplan and placement
=> need to plan ahead to ensure timing/design convergence.

30
Limitation of Previous Works

Buffer insertion
  mostly done net by net after detailed placement
  mostly no obstacles (hard IP blocks, etc.) considered
  no global buffer planning (only manual or semi-manual planning)
  buffers are distributed almost randomly across the entire chip

31
Buffer Block Planning with Floorplanning

Given: an initial floorplan and a performance constraint for each net

Output: optimal locations/dimensions of buffer blocks such that the overall chip area and the number of buffer blocks are minimized

32
Feasible Region for BI

The feasible region (FR) is the maximal region in which a buffer can be placed while still meeting the given delay constraint

[Figure: feasible regions on a driver-to-load (CL) wire, for 1 buffer and for k buffers]

33
Feasible Region for One Buffer
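[Figure: driver with resistance Rd, wire of length l, load CL, and one buffer with parameters Rb, Cb, Tb at position x]
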
34
Feasible Region for One Buffer

We obtain a closed-form formula for the FR of inserting one buffer to meet the delay constraint
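
One way to see why a closed form exists: under a plain Elmore model with a uniform-width wire (an assumption of this sketch), the delay as a function of the buffer position x is a quadratic, so the feasible region is the interval between its two roots, clipped to the wire. The function name, the per-unit-length wire parameters `r` and `c`, and the delay budget `T_req` are hypothetical names introduced here.

```python
import math

def feasible_region_one_buffer(Rd, Rb, Cb, Tb, r, c, l, CL, T_req):
    """Return (x_min, x_max) of buffer positions on [0, l] that meet T_req, or None.

    Elmore delay with the buffer at x (requires r > 0 and c > 0):
      Rd*(c*x + Cb) + r*x*(c*x/2 + Cb) + Tb + Rb*(c*(l-x) + CL) + r*(l-x)*(c*(l-x)/2 + CL)
    which expands to k1*x^2 + k2*x + k3 + T_req.
    """
    k1 = r * c
    k2 = (Rd - Rb) * c + r * (Cb - CL) - r * c * l
    k3 = 0.5 * r * c * l * l + (r * CL + Rb * c) * l + Rd * Cb + Rb * CL + Tb - T_req
    disc = k2 * k2 - 4.0 * k1 * k3
    if disc < 0.0:
        return None  # even the best position misses the budget
    root = math.sqrt(disc)
    x_min = max(0.0, (-k2 - root) / (2.0 * k1))
    x_max = min(l, (-k2 + root) / (2.0 * k1))
    return (x_min, x_max) if x_min <= x_max else None
```

The interval width grows with the square root of the slack above the minimum achievable delay, so a small amount of slack already opens up a sizeable interval.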

35
KEY Observation for FR

Even under a tight delay constraint, the FR for BI can still be very large!

=> FR provides a lot of flexibility to plan buffer locations

36
Extension I: FR for Multiple Buffers
[Figure: wire from a driver (Rd) to load CL with k buffers (Rb, Cb, Tb); buffer i is at position xi]

More complicated, but there is still a closed-form solution for the FR
We also obtain the minimum number of buffers kmin needed to meet the delay constraint

37
Extension II: 2D Feasible Region

FR extended to two dimensions with obstacles

[Figure: 2D feasible region between source and sink]

38
Overall Picture: BBP for Interconnect-Driven Floorplanning

For each floorplan (FL) configuration
  Apply BBP on the given FL
  Evaluate the resulting FL in terms of timing, area, BB trade-off, etc.
Return the best FL solution
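
The outer loop can be sketched as below. The two callables are hypothetical placeholders: `plan_buffer_blocks` stands in for the BBP step and `cost` for whatever scalar combination of timing, area, and buffer-block count is used to rank floorplans.

```python
def best_floorplan(floorplans, plan_buffer_blocks, cost):
    """Apply buffer block planning to every candidate floorplan and keep the cheapest result."""
    best, best_cost = None, float("inf")
    for fl in floorplans:
        planned = plan_buffer_blocks(fl)  # hypothetical BBP step
        value = cost(planned)             # hypothetical scalar evaluation
        if value < best_cost:
            best, best_cost = planned, value
    return best, best_cost
```

Keeping planning and evaluation as separate callables mirrors the two inner steps of the loop and lets the same driver be reused with different cost trade-offs.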

39
Experimental Setting

Two algorithms
  RDM: no buffer planning, i.e., a buffer is randomly placed at any feasible location
  BBP: buffer block planning
Two scenarios
  RES: restricted to the delay-minimal BI position(s)
  FR: feasible region
6 MCNC and 5 randomly generated circuits (0.18 um technology)
Delay budget randomly assigned between 1.0 and 1.2 x Topt

40
Nets That Meet Delay Target
FR provides a lot more flexibility than RES to better meet the delay target (e.g., to avoid obstacles during BI)
41
Comparison of BB
BBP reduces BB from RDM by a factor of up to 3x
BBP/FR further reduces BB from BBP/RES by up to 34%
42
Normalized Total Chip Area after BI
BBP/FR can effectively cluster more individual buffers and put them into dead area (up to 7% area saving)
43
Summary of Experimental Results
BBP/FR provides the best solution.
44
Summary and Recent Trends

A key concept here is the feasible region (FR)
  FR provides a lot more flexibility to better meet the delay constraint and plan buffer locations
Many follow-up and related works
  You can do an advanced search for "buffer" and "planning" and "2003" in IEEE Xplore
    18 papers in 2003
    16 papers in 2002
  Independent feasible region [Koh's group, ISPD 2000]
  Buffer sites, even inside macros [Alpert et al., DAC 2001]
  With noise consideration [Li et al., ASPDAC'03]
  With congestion/routability consideration [Ma et al., DAC'03] [Sham and Young, TCAD'03]
Buffer planning should become more important with the growing number of buffers

45
A Priori System-Level Interconnect Prediction: Rent's Rule and Wire Length Distribution Models
Thanks to Dirk Stroobandt, Ghent University
46
Why A Priori Interconnect Prediction?

Interconnect: the importance of wires increases (they do not scale as components do).
A priori:
  For future designs, very little is known.
  The sooner information is available, the better.
A priori interconnect prediction = estimating interconnect properties and their consequences before any layout step is performed.
  Extrapolation to future systems: roadmaps.
  To improve CAD tools for design layout generation.
  To evaluate new computer architectures.

47
The Three Basic Models
Circuit model
[Figure: a circuit of logic blocks connected by nets through terminals/pins]
48
Rent's Rule
Rent's rule was first described by Landman and Russo in 1971. For the average number of terminals T and blocks B per module in a partitioned design:
$T = t B^{p}$
p = Rent exponent
t ≈ average number of terminals per block
Measure for the complexity of the interconnection topology: intrinsic Rent exponent p
(simple) 0 ≤ p ≤ 1 (complex)
Normal values: 0.5 ≤ p ≤ 0.75

B. S. Landman and R. L. Russo. On a pin versus block relationship for partitions of logic graphs. IEEE Trans. on Comput., C-20, pp. 1469-1479, 1971.

49
Rent's Rule (cont.)
Rent's rule is a result of the self-similarity within circuits
Assumption: the complexity of the interconnection topology is equal at all levels.
50
Rent's Rule (other definition)
[Figure: a (dense) region with B cells and T terminals, to which ΔB cells with ΔT terminals are added]
If ΔB cells are added, what is the increase ΔT? In the absence of any other information we guess
$\Delta T = \frac{T}{B}\,\Delta B$
This is an overestimate: many of the ΔT terminals connect to the T terminals and so do not contribute to the total. We introduce a factor p (p < 1) which indicates how self-connected the netlist is (and how well the placement is optimized):
$\Delta T = p\,\frac{T}{B}\,\Delta B$
This holds for a statistically homogeneous system or, if ΔB and ΔT are small compared to B and T:
$\frac{dT}{T} = p\,\frac{dB}{B}$

P. Christie and D. Stroobandt. The Interpretation and Application of Rent's Rule. IEEE Trans. on VLSI Systems, Special Issue on SLIP, vol. 8 (no. 6), pp. 639-648, Dec. 2000.
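
For reference, the incremental argument on this slide leads to the power-law form of Rent's rule once the increments are taken to be small, so that $dT/T = p\,dB/B$. The short derivation below is a sketch of that step; the integration constant is written as $t$ here, an assumption chosen to match the usual $T = t B^{p}$ notation.

```latex
\begin{align*}
\frac{dT}{T} &= p\,\frac{dB}{B} \\
\ln T &= p \ln B + \ln t \\
T &= t\,B^{p}
\end{align*}
```

Setting $B = 1$ gives $T = t$, which is why $t$ is read as the average number of terminals of a single block.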

51
Rent's Rule (summary)
$T = t B^{p}$
Rent's rule is experimentally validated for a lot of benchmarks.

Distinguish between the intrinsic Rent exponent, the placement Rent exponent, and the partitioning Rent exponent

[Figure: log-log plot of T versus B (averages per module size) following Rent's rule]
Deviation for high B and T: Rent's region II
Also deviation for low B and T: Rent's region III

B. S. Landman and R. L. Russo. On a pin versus block relationship for partitions of logic graphs. IEEE Trans. on Comput., C-20, pp. 1469-1479, 1971.

D. Stroobandt. On an efficient method for estimating the interconnection complexity of designs and on the existence of region III in Rent's rule. Proc. GLSVLSI, pp. 330-331, 1999.
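
Since Rent's rule is a straight line on a log-log plot, t and p can be estimated from measured (B, T) pairs by ordinary least squares on the logarithms. This is a minimal sketch with a hypothetical function name; it assumes the samples have already been restricted to the range where the rule holds, because points from regions II and III would bias the fit.

```python
import math

def fit_rent(samples):
    """Least-squares fit of log T = log t + p * log B; samples is a list of (B, T) pairs."""
    xs = [math.log(B) for B, _ in samples]
    ys = [math.log(T) for _, T in samples]
    n = len(samples)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    p = sxy / sxx                 # slope of the log-log line: Rent exponent
    t = math.exp(my - p * mx)     # intercept: terminals per block
    return t, p
```

At least two distinct module sizes are needed; otherwise the slope is undefined and the division by `sxx` fails.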

52
Wirelength Estimation
1. Partition the circuit into 4 modules of equal size such that Rent's rule applies (minimal number of pins).
2. Partition the Manhattan grid into 4 subgrids of equal size in a symmetrical way.

W. E. Donath. Placement and Average Interconnection Lengths of Computer Logic. IEEE Trans. on Circuits Syst., vol. CAS-26, pp. 272-277, 1979.

53
Donath's Hierarchical Placement Model
3. Each subcircuit (module) is mapped to a subgrid.
4. Repeat recursively until all logic blocks are assigned to exactly one grid cell in the Manhattan grid.
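
The recursive structure of steps 1 to 4 can be sketched as follows. The partitioning here is only a stand-in: blocks are split into four equal groups in list order, whereas the model assumes a partition that minimizes the number of pins. The function name and the dictionary output are assumptions for illustration.

```python
def place(blocks, x0=0, y0=0, size=None, out=None):
    """Recursively map blocks onto a size x size Manhattan grid; returns {block: (x, y)}."""
    if size is None:
        size = round(len(blocks) ** 0.5)
    if out is None:
        out = {}
    if size < 1 or len(blocks) != size * size:
        raise ValueError("need exactly size*size blocks")
    if size == 1:
        out[blocks[0]] = (x0, y0)  # one logic block per grid cell
        return out
    if size % 2:
        raise ValueError("grid side must be a power of two")
    half, q = size // 2, len(blocks) // 4
    # Stand-in for a real min-cut partitioner: split in list order (assumption)
    quadrants = [(0, 0), (half, 0), (0, half), (half, half)]
    for i, (dx, dy) in enumerate(quadrants):
        place(blocks[i * q:(i + 1) * q], x0 + dx, y0 + dy, half, out)
    return out
```

Each recursion level corresponds to one hierarchy level of the model, so a grid of 4^k cells is filled after k levels.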
54
Donath's Length Estimation Model

At each level, Rent's rule gives the number of connections
  number of terminals per module directly from Rent's rule (partitioning-based Rent exponent p)
  number of nets cut at level k (Nk) equals
  where α depends on the total number of nets in the circuit and is bounded by 0.5 and 1.

55
Donath's Length Estimation Model
Length of the connections at level k?
  Adjacent (A-) combination
  Diagonal (D-) combination
Donath assumes all connection source and destination cells are uniformly distributed over the grid.
56
Results: Donath
Scaling of the average length L as a function of the number of logic blocks G
Similar to measurements on placed designs.
57
Results: Donath
Theoretical average wire length is too high by a factor of 2
58
Occupation Probability Function
Same result found by using a terminal conservation technique

$T_{A \to C} = T_{AB} + T_{BC} - T_{B} - T_{ABC}$

Assumption: a net cannot connect A, B, and C

J. A. Davis et al. A Stochastic Wire-length Distribution for Gigascale Integration (GSI) - Part I: Derivation and Validation. IEEE Trans. on Electron Dev., 45 (3), pp. 580-589, 1998.

59
Occupation Probability Function
For cells placed in an infinite 2D plane
60
Occupation Probability Results