Title: Interconnect Planning
1 Interconnect Planning
- Prof. David Z. Pan
- dpan_at_ece.utexas.edu
- Office ACES 5.434
2 Outline
- Introduction
- Two representative works on interconnect planning
  - Interconnect performance estimation modeling (IPEM) [Cong-Pan, TCAD'01]
  - Buffer planning [Cong-Kong-Pan, TVLSI'01]
- Rent's rule and SLIP workshop
  - SLIP stands for System-Level Interconnect Prediction
  - SLIP Workshop web page at http://www.sliponline.org
3 Introduction
- So far we have shown many algorithms for interconnect optimization
- They work effectively, but they need some kind of planning for design closure
- We have shown
  - Interconnect width planning (architecture planning) [Cong-Pan, DAC'99]
  - Interconnect performance estimation modeling [Cong-Pan, TCAD'01]
  - Buffer planning [Cong-Kong-Pan, TVLSI'01]
4 Interconnect Optimization Is Not Enough
- Needs high-level tools to cooperate
- Interconnect synthesis capability is limited without planning
  - What are its limits during early design planning?
  - Enough routing resources?
  - Locations for buffers on long interconnects?
  - ...
5 Needs for Efficient Interconnect Performance Models
- Efficiency
- Abstraction to hide detailed design information
  - Granularity of wire segmentation
  - Number of wire widths, buffer sizes, ...
- Explicit relation to enable optimal design decisions at high levels
- Ease of interaction with logic/high-level synthesis tools
6 Problem Formulation
[Figure: gate G driving an interconnect of length l with loading capacitance CL]
- Rd0: driver effective resistance of G0
- Rd: driver effective resistance of G
- l: interconnect length
- CL: loading capacitance
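7 Problem Formulation
[Figure: gates G0 and G, an interconnect of length l, and loading capacitance CL]
- Input: the parameters above and the optimization scheme: OWS (optimal wire sizing), SDWS (simultaneous driver and wire sizing), BIWS (buffer insertion with wire sizing), or BISWS (buffer insertion, buffer sizing, and wire sizing)
- What is the optimized delay? Do not run the optimization algorithm!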
8 Example Delay/Area Estimation under WS
9 Delay/Area Estimation under 1-WS
- Closed-form delay formula
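For intuition about why delay grows quadratically on an unsized wire (noted later for DEM-OWS), here is a minimal sketch of the standard Elmore delay of a driver with effective resistance Rd driving a fixed-width wire of length l into load CL. The unit-length parasitics r and c, the function name, and all numbers are illustrative assumptions; the slide's 1-WS formula may include further terms (e.g., fringing capacitance or an optimized uniform width).

```python
def elmore_uniform_wire_delay(rd: float, r: float, c: float,
                              length: float, cl: float) -> float:
    """Elmore delay of a driver (resistance rd) driving a uniform RC wire plus load cl.

    Assumed units: rd in ohm, r in ohm/um, c in fF/um, length in um, cl in fF.
    The result is in ohm*fF, i.e., femtoseconds.
    """
    wire_cap = c * length
    # The driver resistance charges the whole wire capacitance plus the load.
    driver_term = rd * (wire_cap + cl)
    # Distributed wire resistance sees half the wire capacitance plus the full load.
    wire_term = r * length * (wire_cap / 2 + cl)
    return driver_term + wire_term
```

With hypothetical values rd = 100 Ω, r = 0.1 Ω/um, c = 0.2 fF/um, CL = 10 fF, a 1000 um wire gives 32,000 fs and a 2000 um wire gives 83,000 fs: doubling l more than doubles the delay because of the r·c·l²/2 term.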
10 Delay/Area Estimation under OWS
- Closed-form delay estimation formula
  - where W(x) is Lambert's W function, defined by W(x)·e^W(x) = x
- Closed-form area estimation formula
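The OWS formula depends on Lambert's W function, which has no elementary closed form but is cheap to evaluate numerically. Here is a minimal sketch of the principal branch for nonnegative arguments via Newton's method on f(w) = w·e^w − x. The restriction to x ≥ 0 is an assumption about how the formula's argument behaves for positive physical parameters; in practice `scipy.special.lambertw` is an alternative.

```python
import math


def lambert_w(x: float, tol: float = 1e-12, max_iter: int = 100) -> float:
    """Principal branch W0(x) for x >= 0, i.e., the w >= 0 solving w * exp(w) = x."""
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    if x == 0:
        return 0.0
    # log1p(x) >= W(x) for x >= 0, and f is convex and increasing there,
    # so Newton's method converges monotonically from this starting point.
    w = math.log1p(x)
    for _ in range(max_iter):
        ew = math.exp(w)
        f = w * ew - x
        step = f / (ew * (w + 1))  # f'(w) = e^w * (w + 1)
        w -= step
        if abs(step) < tol * (1 + abs(w)):
            break
    return w
```

For example, W(e) = 1 and W(1) ≈ 0.567143.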
11 Delay Comparison of Various WS Solutions
- The OWS model consistently matches TRIO
- 1-WS and 2-WS work well for lengths < 8 mm in Tier1
- All work well in Tier4 up to chip size
12 Average Width (Area) Comparison
13 Average Width (Area) Comparison
- Very close for 1-WS, 2-WS and OWS!
14 Property of DEM-OWS
- Theorem: Tows is a sub-quadratic, convex function of length l
- Note: without wire sizing, wiring delay ∝ l², as used by most layout-driven logic synthesis systems, e.g.,
  - Ramachandran et al., ICCAD-92
  - Chen-Tsai-Kurdahi, IEE Proc.-Circuits, Devices and Systems, 1995
- Closed-form DEM-OWS will serve as a basis for deriving SDWS, BIWS and BISWS
15 Comparison of DEM-OWS vs. TRIO
- 0.18um, Rd = rg/100, CL = cg × 100
- For experiments, max. wire width is 20× min., and the wire is segmented every 10um
16 Critical Length for BI under OWS
- Solve for l ⇒ critical length lcrit(b, Rd, CL)
- Computed by the bisection method
- Constant time in practice
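As an illustration of the bisection step only, here is a minimal sketch that swaps in the plain Elmore delay of an unsized wire for Tows. It assumes, for illustration, that the critical length is where one buffer b inserted at the midpoint starts to reduce delay. All function names, unit parasitics r and c, and the break-even definition are hypothetical stand-ins, not the slide's OWS-based equation.

```python
def delay_gain(l, rd, cl, rb, cb, tb, r, c):
    """Unbuffered minus buffered Elmore delay with one buffer (rb, cb, tb) at l/2.

    Illustrative stand-in: uniform wire, no wire sizing. Positive => buffer helps.
    """
    def stage(r_drv, seg, c_load):
        return r_drv * (c * seg + c_load) + r * seg * (c * seg / 2 + c_load)

    unbuffered = stage(rd, l, cl)
    buffered = stage(rd, l / 2, cb) + tb + stage(rb, l / 2, cl)
    return unbuffered - buffered


def critical_length(rd, cl, rb, cb, tb, r, c, tol=1e-6):
    """Length at which delay_gain crosses zero, found by bisection."""
    g = lambda l: delay_gain(l, rd, cl, rb, cb, tb, r, c)
    if g(0.0) >= 0:
        return 0.0  # in this model the buffer never hurts
    lo, hi = 0.0, 1.0
    while g(hi) < 0:  # grow the bracket until the sign changes
        hi *= 2
    # Invariant: g(lo) < 0 <= g(hi); the gain is a convex quadratic in l,
    # so there is exactly one positive root.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For Rd = Rb and CL = Cb this toy model has the closed form l = 2·√((Tb + Rb·Cb)/(r·c)), which gives a convenient check on the bisection.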
17 Critical Length Meaning
18 Critical Lengths lcrit(b, Rb, Cb)
[Table: critical lengths lcrit(b, Rb, Cb), in mm]
- Denote lc = lcrit(b, Rb, Cb)
19 Logic Volume within lc
- Defined as the number of minimum-size 2-input NAND gates that can be packed within an lc/2 × lc/2 area
[Table: logic volume within lc, in millions of gates]
20 Property of BIWS
- Theorem: for BIWS, the distances between adjacent buffers are the same, and equal to lc, the critical length lcrit(b, Rb, Cb)
- The proof is based on the convexity of Tows in l
21 Linear DEM for BIWS
- The original long interconnect is divided into stages of length lc (about l/lc stages)
- The stage number is proportional to l
- Each stage of length lc has delay Tows(Rb, lc, Cb)
- ⇒ linear DEM for BIWS
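The linear DEM above amounts to multiplying a per-stage delay by the number of stages. A minimal sketch, assuming the stage delay Tows(Rb, lc, Cb) has already been computed and ignoring driver/receiver end effects and the rounding of the stage count (both simplifications; the function name is hypothetical):

```python
def linear_dem_biws(length: float, lc: float, stage_delay: float) -> float:
    """Linear BIWS delay estimate: (length / lc) stages, each costing stage_delay.

    stage_delay stands in for Tows(Rb, lc, Cb); end effects at the original
    driver and load, and integer rounding of the stage count, are ignored.
    """
    if lc <= 0:
        raise ValueError("lc must be positive")
    return (length / lc) * stage_delay
```

With a hypothetical lc = 2 mm and stage delay of 50 ps, a 10 mm line is estimated at 250 ps, and doubling the length doubles the estimate.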
22 Comparison of DEM-BIWS vs. TRIO
- TRIO is an interconnect synthesis engine
- Rd0 = rg/10, CL = cg × 10, buffer type is 100× min.
- For experiments, max. wire width is 20× min. width, and the wire is segmented every 100um
23 Comparison of DEM-BIWS vs. TRIO
- 0.18um, Rd0 = rg/10, CL = cg × 10, buffer type is 100× min.
- For experiments, max. wire width is 20× min. width, and the wire is segmented every 100um
24 DEM under BISWS
- Observations from extensive experiments
  - Linear delay versus length
  - Internal buffers are about the same size
- Therefore, we estimate BISWS by the best BIWS over the available buffer types
- Complexity O(|B|), where B is the set of buffer types; since B normally has fewer than 20 types, this is constant time in practice
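The BISWS estimate above is a single pass over the buffer library. A minimal sketch, assuming each buffer type carries a precomputed critical length and stage delay (the `BufferType` record and its fields are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferType:
    """Hypothetical per-type data: critical length lc (mm) and the stage
    delay Tows(Rb, lc, Cb) (ps), both assumed to be precomputed."""
    name: str
    lc: float
    stage_delay: float


def estimate_bisws(length: float, library: list[BufferType]) -> tuple[float, BufferType]:
    """Best linear BIWS estimate over all buffer types: one O(|B|) scan."""
    def biws(b: BufferType) -> float:
        return (length / b.lc) * b.stage_delay

    best = min(library, key=biws)
    return biws(best), best
```

Because each estimate is linear in length, the winning type is simply the one with the smallest delay per unit length (stage_delay / lc), independent of the wire length.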
25 Comparison of DEM-BISWS vs. TRIO
- 0.18um, Rd0 = rg/10, CL = cg × 10
- For experiments, max. allowable buffer/driver size is 400× min. device, max. wire width is 20× min. width, and the wire is segmented every 100um
26 Multiple-Pin Nets (TAU'99)
[Figure: driver G driving sinks S1, S2, ..., Sn with loading capacitances Cs1, Cs2, ..., Csn]
- Estimation with different optimization objectives
  - Minimize the delay to a single critical sink (SCS)
  - Minimize the maximum delay (defined as the tree delay) for multiple critical sinks (MCS)
  - Minimize weighted delay ...
27 Key Idea
- Estimation for a single critical sink
  - We first formulate the original problem as a single-line-multiple-load (SLML) problem
  - Then we transform SLML into a single-line-single-load (SLSL) problem
  - We use the previous 2-pin results to estimate delay and area on the critical path
- Estimation for multiple critical sinks
  - We obtain a lower-bound estimate of the optimal tree delay
  - We show that in practice, this lower bound is tight and close to the optimal tree delay
28 Outline
- Introduction
- Two representative works on interconnect planning
  - Interconnect performance estimation modeling (IPEM) [Cong-Pan, TCAD'01]
  - Buffer block planning during floorplanning [Cong-Kong-Pan, TVLSI'01]
- Rent's rule and SLIP workshop
  - SLIP stands for System-Level Interconnect Prediction
  - SLIP Workshop web page at http://www.sliponline.org
29 Why Do We Need Buffer Planning?
[Figure: chip floorplan with soft blocks and hard (IP) blocks]
- Many buffers in modern designs
  - Easily 10-20% of cells now
  - Projected to be 70% in future microprocessors, according to an Intel paper [ISPD'03]
- Restriction from hard IP blocks
- Impact on floorplan and placement
- ⇒ Need to plan ahead to ensure timing/design convergence
30 Limitation of Previous Works
- Buffer insertion
  - Mostly done in a net-by-net manner after detailed placement
  - Mostly no obstacles (hard IP blocks, etc.) considered
  - No global buffer planning (only manual or semi-manual planning)
  - Buffers are distributed in an almost random manner across the entire chip
31 Buffer Block Planning with Floorplanning
- Given: initial floorplan and performance constraint for each net
- Output: optimal location/dimension of buffer blocks such that the overall chip area and the number of buffer blocks are minimized
32 Feasible Region for BI
- The feasible region is the maximal region in which a buffer can be placed to meet the given delay constraint
[Figure: a driver-to-CL net with 1 buffer, and with k buffers]
33 Feasible Region for One Buffer
[Figure: driver (Rd) and wire of length l to load CL, with a buffer (Rb, Cb, Tb) at position x]
34 Feasible Region for One Buffer
- We obtain a closed-form formula of the FR for inserting one buffer to meet the delay constraint
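Assuming an Elmore delay model and an unsized wire with unit-length resistance r and capacitance c (both assumptions, as are the function names), the delay with one buffer at distance x from the driver is a convex quadratic in x. So the FR is an interval that the quadratic formula gives directly. A minimal sketch using the figure's symbols (Rd, Rb, Cb, Tb, l, CL, x):

```python
import math
from typing import Optional, Tuple


def buffered_delay(x, l, rd, cl, rb, cb, tb, r, c):
    """Elmore delay with one buffer (rb, cb, tb) at distance x from the driver."""
    first = rd * (c * x + cb) + r * x * (c * x / 2 + cb)
    second = rb * (c * (l - x) + cl) + r * (l - x) * (c * (l - x) / 2 + cl)
    return first + tb + second


def feasible_region_one_buffer(l, rd, cl, rb, cb, tb, r, c,
                               t_max) -> Optional[Tuple[float, float]]:
    """Interval of x in [0, l] with buffered_delay(x) <= t_max, or None if empty."""
    # buffered_delay expanded by hand as a*x^2 + b*x + k.
    a = r * c
    b = c * (rd - rb) + r * (cb - cl) - r * c * l
    k = rd * cb + tb + rb * (c * l + cl) + r * c * l * l / 2 + r * cl * l
    disc = b * b - 4 * a * (k - t_max)
    if disc < 0:
        return None  # even the best buffer position misses t_max
    root = math.sqrt(disc)
    x1, x2 = (-b - root) / (2 * a), (-b + root) / (2 * a)
    lo, hi = max(0.0, x1), min(l, x2)
    return (lo, hi) if lo <= hi else None
```

With hypothetical values l = 2000 um, Rd = Rb = 200 Ω, Cb = CL = 10 fF, Tb = 20 ps, r = 0.1 Ω/um, c = 0.2 fF/um, the best achievable delay is 126 ps. A 130 ps budget still allows the buffer anywhere from about 553 um to 1447 um, almost half the wire.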
35 KEY Observation for FR
- Even under a tight delay constraint, the FR for BI can still be very large!
⇒ FR provides a lot of flexibility to plan buffer locations
36 Extension I: FR for Multiple Buffers
[Figure: driver Rd, buffers 1, ..., i, ..., k (each with Rb, Cb, Tb), buffer i at position xi, load CL]
- More complicated, but still a closed-form solution for the FR
- We also obtain the minimum number of buffers kmin needed to meet the delay constraint
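The closed-form kmin is not reproduced here. A simpler, easily checked alternative: evaluate the Elmore delay of a chain with k equally spaced buffers and return the first k that meets the constraint. Equal spacing is not necessarily optimal when Rd ≠ Rb or CL ≠ Cb, so this sketch gives an upper bound on kmin rather than the exact value. The Elmore model, unit parasitics r and c, and function names are all assumptions.

```python
def chain_delay(positions, l, rd, cl, rb, cb, tb, r, c):
    """Elmore delay of a wire of length l with identical buffers at sorted positions."""
    pts = [0.0] + list(positions) + [l]
    total = 0.0
    for i in range(len(pts) - 1):
        seg = pts[i + 1] - pts[i]
        r_drv = rd if i == 0 else rb              # first segment is driven by the driver
        c_load = cl if i == len(pts) - 2 else cb  # last segment ends at the sink
        total += r_drv * (c * seg + c_load) + r * seg * (c * seg / 2 + c_load)
        if i > 0:
            total += tb  # intrinsic delay of the buffer driving this segment
    return total


def min_buffers_equal_spacing(l, rd, cl, rb, cb, tb, r, c, t_max, k_max=1000):
    """Smallest k whose equally spaced k-buffer chain meets t_max, or None.

    Equal spacing is an assumption, so the result is an upper bound on kmin.
    """
    for k in range(k_max + 1):
        positions = [l * (i + 1) / (k + 1) for i in range(k)]
        if chain_delay(positions, l, rd, cl, rb, cb, tb, r, c) <= t_max:
            return k
    return None
```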
37 Extension II: 2D Feasible Region
- FR extended to 2 dimensions with obstacles
[Figure: 2D feasible region between source and sink]
38 Overall Picture: BBP for Interconnect-Driven Floorplanning
- For each floorplan (FL) configuration
  - Apply BBP on the given FL
  - Evaluate the resulting FL in terms of timing, area, BB trade-off, etc.
- Return the best FL solution
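The loop above can be sketched as a minimum over candidate floorplans. This is a minimal skeleton under stated assumptions: `apply_bbp` and `cost` are hypothetical placeholders, where `cost` would combine timing, area, and buffer-block trade-offs with whatever weighting the flow chooses.

```python
from typing import Callable, Iterable, TypeVar

FL = TypeVar("FL")   # a floorplan configuration (hypothetical type)
Res = TypeVar("Res")  # the result of buffer block planning on it (hypothetical type)


def best_floorplan(configs: Iterable[FL],
                   apply_bbp: Callable[[FL], Res],
                   cost: Callable[[Res], float]) -> Res:
    """Run BBP on every candidate floorplan and return the lowest-cost result.

    Raises ValueError if configs is empty (behavior of min() on an empty iterable).
    """
    return min((apply_bbp(fl) for fl in configs), key=cost)
```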
39 Experimental Setting
- Two algorithms
  - RDM: no buffer planning, i.e., a buffer is randomly placed at any feasible location
  - BBP: buffer block planning
- Two scenarios
  - RES: restricted, delay-minimal BI position(s)
  - FR: feasible region
- 6 MCNC and 5 randomly generated circuits (0.18um technology)
- Delay budget randomly assigned to be 1 to 1.2 × Topt
40 Nets That Meet Delay Target
FR provides much more flexibility than RES to meet the delay target (e.g., to avoid obstacles during BI).
41 Comparison of BB
BBP reduces the number of BBs relative to RDM by a factor of up to 3. BBP/FR further reduces the number of BBs relative to BBP/RES by up to 34%.
42 Normalized Total Chip Area after BI
BBP/FR can effectively cluster more individual buffers and put them into dead area (up to 7% area saving).
43 Summary of Experimental Results
BBP/FR provides the best solution.
44 Summary and Recent Trends
- A key concept here is the feasible region (FR)
- FR provides a lot more flexibility to better meet the delay constraint and plan buffer locations
- Many follow-up and related works
  - You can do an advanced search by typing "buffer" and "planning" and "2003" in IEEE Xplore
    - 18 papers in 2003
    - 16 papers in 2002
  - Independent feasible region [Koh's group, ISPD 2000]
  - Buffer sites, even inside macros [Alpert et al., DAC 2001]
  - With noise consideration [Li et al., ASPDAC'03]
  - With congestion/routability consideration [Ma et al., DAC'03; Sham and Young, TCAD'03]
- Buffer planning will become more important with the growing number of buffers
45 A Priori System-Level Interconnect Prediction: Rent's Rule and Wire Length Distribution Models
Thanks to Dirk Stroobandt, Ghent University
46 Why A Priori Interconnect Prediction?
- Interconnect: the importance of wires increases (they do not scale as components do)
- A priori
  - For future designs, very little is known
  - The sooner information is available, the better
- A priori interconnect prediction: estimating interconnect properties and their consequences before any layout step is performed
  - Extrapolation to future systems (roadmaps)
  - To improve CAD tools for design layout generation
  - To evaluate new computer architectures
47 The Three Basic Models
Circuit model
[Figure: logic blocks connected by nets through terminals/pins]
48 Rent's Rule
Rent's rule was first described by Landman and Russo in 1971. For the average number of terminals T and blocks B per module in a partitioned design:
T = t·B^p
- p: Rent exponent
- t: average number of terminals per block
The Rent exponent is a measure of the complexity of the interconnection topology. Intrinsic Rent exponent p: (simple) 0 ≤ p ≤ 1 (complex)
Normal values: 0.5 ≤ p ≤ 0.75
- B. S. Landman and R. L. Russo. On a pin versus
block relationship for partitions of logic
graphs. IEEE Trans. on Comput., C-20, pp.
1469-1479, 1971.
49 Rent's Rule (cont.)
Rent's rule is a result of the self-similarity within circuits.
Assumption: the complexity of the interconnection topology is equal at all levels.
50 Rent's Rule (other definition)
Consider a (dense) region of B cells with T terminals.
- If ΔB cells are added, what is the increase ΔT? In the absence of any other information, we guess that the terminal count grows in proportion to the cell count.
- This is an overestimate: many of the ΔT new terminals connect to the existing T terminals and so do not contribute to the total. We introduce a factor p (p < 1) which indicates how self-connected the netlist is.
[Figure: placement optimization view: a region of B cells and T terminals extended by ΔB cells and ΔT terminals]
- Valid for a statistically homogeneous system, or if ΔB and ΔT are small compared to B and T
- P. Christie and D. Stroobandt. The
Interpretation and Application of Rents Rule.
IEEE Trans. on VLSI Systems, Special Issue on
SLIP, vol. 8 (no. 6), pp. 639-648, Dec. 2000.
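The argument above can be written out as a short derivation. This follows the standard form of the argument (as in the Christie-Stroobandt paper cited); the notation on the original slide may differ.

```latex
% Naive guess: terminals grow in proportion to cells
\Delta T = T\,\frac{\Delta B}{B}
% Correct for internal connections with self-connectivity factor p < 1
\Delta T = p\,T\,\frac{\Delta B}{B}
% Small increments in a statistically homogeneous system
\frac{dT}{T} = p\,\frac{dB}{B}
\;\Longrightarrow\; \ln T = p\,\ln B + \text{const}
\;\Longrightarrow\; T = t\,B^{p}, \qquad t = T(B=1)
```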
51 Rent's Rule (summary)
T = t·B^p
Rent's rule is experimentally validated for a lot of benchmarks.
- Distinguish between
  - the intrinsic Rent exponent
  - the placement Rent exponent
  - the partitioning Rent exponent
[Figure: average T versus B following Rent's rule, with a deviation for high B and T (Rent's region II) and also for low B and T (Rent's region III)]
Rents rule
- B. S. Landman and R. L. Russo. On a pin versus
block relationship for partitions of logic
graphs. IEEE Trans. on Comput., C-20, pp.
1469-1479, 1971.
- D. Stroobandt. On an efficient method for
estimating the interconnection complexity of
designs and on the existence of region III in
Rents rule. Proc. GLSVLSI, pp. 330-331, 1999.
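Rent's rule is typically checked on a benchmark by recursively partitioning it, recording the average block count B and terminal count T per module at each level, and fitting a straight line in log-log space. Here is a minimal sketch of that fit. The choice of ordinary least squares on log T versus log B is an assumption, and the data should be restricted to the Rent region, excluding the region II/III deviations.

```python
import math


def fit_rent(blocks, terminals):
    """Least-squares fit of log T = log t + p * log B; returns (t, p)."""
    xs = [math.log(b) for b in blocks]
    ys = [math.log(term) for term in terminals]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two (B, T) points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct block counts")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    p = sxy / sxx                    # slope = Rent exponent
    t = math.exp(my - p * mx)        # intercept = log of terminals per block
    return t, p
```

On synthetic data generated exactly from T = 3.5·B^0.6, the fit recovers t = 3.5 and p = 0.6.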
52 Wirelength Estimation
1. Partition the circuit into 4 modules of equal size such that Rent's rule applies (minimal number of pins).
2. Partition the Manhattan grid into 4 subgrids of equal size in a symmetrical way.
- W. E. Donath. Placement and Average
Interconnection Lengths of Computer Logic. IEEE
Trans. on Circuits Syst., vol. CAS-26, pp.
272-277, 1979.
53 Donath's Hierarchical Placement Model
3. Each subcircuit (module) is mapped to a subgrid.
4. Repeat recursively until all logic blocks are assigned to exactly one grid cell in the Manhattan grid.
54 Donath's Length Estimation Model
- At each level, Rent's rule gives the number of connections
  - Number of terminals per module: directly from Rent's rule (partitioning-based Rent exponent p)
  - Number of nets cut at level k (Nk): follows from these terminal counts, with a factor that depends on the total number of nets in the circuit and is bounded by 0.5 and 1
55 Donath's Length Estimation Model
Length of the connections at level k:
[Figure: adjacent (A-) and diagonal (D-) combinations of subgrids]
Donath assumes that all connection source and destination cells are uniformly distributed over the grid.
56 Results Donath
Scaling of the average length L as a function of the number of logic blocks G: similar to measurements on placed designs.
57 Results Donath
The theoretical average wire length is too high by a factor of 2.
58 Occupation Probability Function
The same result is found by using a terminal conservation technique.
[Figure: adjacent regions A, B, and C with terminal counts such as TB, TAB, TBC, and TABC]
Assumption: a net cannot connect A, B, and C.
- J. A. Davis et al. A Stochastic Wire-length
Distribution for Gigascale Integration (GSI) -
PART I Derivation and Validation. IEEE Trans. on
Electron Dev., 45 (3), pp. 580 - 589, 1998.
59 Occupation Probability Function
For cells placed in an infinite 2D plane
60 Occupation Probability Results
- Use the probability on each hierarchical level (local distributions).
[Figure: average wire length L (0 to 8) versus number of logic blocks G (10 to 10000) for the occupation probability model, Donath's model, and experiment]