Title: Design and Test Trends from Physical Design perspective
1Interconnect Synthesis
Session III Dr. Parthasarathi Dasgupta MIS
Group Indian Institute of Management Calcutta
2Outline
- Interconnect synthesis
- ITRS challenges (http://public.itrs.net)
- Delay models and estimators
- Routing tree construction methods
- Topological routing
- Delay reduction
- Buffer insertion
- Wire sizing
- Non-tree routing methods
- Clock tree routing
- Early interconnect planning
- Buffer block planning
- Interconnect architecture and metrics
3ITRS02 Interconnect Projected Parameters
ITRS 2002 Update - I
Legend: Solution known / Solution exists / Solution unknown
Year of Production                                              2010     2016
DRAM tech node (nm)                                               45       22
No. of metal levels                                               10       11
Total interconnect length (m/cm2)
  (active wiring only, excluding global levels)                16063    33508
Interconnect RC delay - 1 mm line (ns)                           565     2008
4ITRS02 Interconnect Projected Parameters
ITRS 2002 Update - II
Legend: Solution known / Solution exists / Solution unknown
Year of Production                                              2010     2016
Effective dielectric constant of inter-level metal insulator     2.1      1.8
Local wiring pitch (nm)                                          105      750
Minimum global wiring pitch (nm)                                 205      100
Intermediate wiring pitch (nm)                                   135       65
Conductor effective resistivity (microOhm-cm)                    2.2      2.0
5ITRS02 Some Grand Challenges - I
- Near Term (Through 2007)
- Mask-Making (Lithography)
- Process Control (Lithography)
- Integration of New Processes and Structures (Interconnect)
- Power Management (Design)
6ITRS02 Relevant Grand Challenges - II
- Long Term (2008 Through 2016)
- Next-Generation Lithography (Lithography)
- Identify Solutions Addressing Global Wiring Issues (Interconnect)
- Error-Tolerant Design (Design)
7Why are DSM Interconnects Important ?
- Signal Delay
- Reduction of chip size K times
- increases wire resistance K times
- increases wire capacitance K times, and hence
- increases global interconnect delay K^2 times
- reduces gate switching time K times
- local interconnect delay remains unchanged
9Why bother about Signal Delay?
- Global routing trees often need to be constructed with the objective of minimizing circuit delay
- Minimum circuit delay is preferred to increase the speed of the circuit
- Accurate measurement of signal delay is thus very important
- Exact signal delay measurement is too complex and time consuming
- Hence there is a need for accurate delay estimation
10Estimating Signal Delay
- Elmore Delay
- Delay through an on-path resistor = its resistance x downstream capacitance
- Delay through a path (driver to a sink pin) = sum of delays through individual edges on the path
- First moment of the impulse response of the interconnect
- Based on the 50% delay
[Figure: source driving a wire of resistance r with capacitance C1/2 at each end, followed by the rest of the circuit with load C2]
Delay through interconnect = r.(C1/2 + C2)
11Elmore Delay Characteristics
- Fairly accurate estimate of delay at nodes far from source
- Expressible as a closed-form expression involving only resistors and capacitors
- Provable upper bound on actual delay for all inputs
- Additive
[Figure: path from source S through node A to node B]
Delay(S, B) = Delay(S, A) + Delay(A, B)
12Elmore Delay - An Example
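[Figure: RC chain example. The surviving summation indices match the Elmore delay of an n-stage RC chain: $T_D = \sum_{i=1}^{n} C_i \sum_{j=1}^{i} R_j = \sum_{i=1}^{n} R_i \sum_{j=i}^{n} C_j$]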
13Elmore Delay Computation
The RC tree is traversed depth-first twice.
Pass 1: Compute the effective (downstream) capacitance at each node of the RC tree.
Pass 2: At each node, compute the Elmore delay from the source as the sum of (a) the delay up to the predecessor node, and (b) the product of the resistance between the predecessor node and the current node and the effective capacitance at the current node obtained in Pass 1.
[Figure: RC chain from A to B with five 1 kOhm resistors in series and a 500 fF capacitor at each of the five nodes]
tau_AB = 1 kOhm x 500 fF x 5 + 1 kOhm x 500 fF x 4 + 1 kOhm x 500 fF x 3 + 1 kOhm x 500 fF x 2 + 1 kOhm x 500 fF x 1 = 2500 + 2000 + 1500 + 1000 + 500 ps = 7.5 ns.
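The two-pass computation can be sketched in a few lines of Python. This is only an illustration: the dictionary-based tree representation and the names `elmore_delays`, `parent`, `res` and `cap` are assumptions made for the example, not part of the slides. The usage part reproduces the five-stage chain from A to B.

```python
from collections import defaultdict

def elmore_delays(parent, res, cap, root):
    """Elmore delay from `root` to every node of an RC tree.

    parent[v] is the predecessor of v, res[v] the resistance (ohms) of the
    edge (parent[v], v), cap[v] the capacitance (farads) lumped at v.
    """
    children = defaultdict(list)
    for v, p in parent.items():
        children[p].append(v)

    # Pass 1 (bottom-up): downstream capacitance seen at each node.
    down = {}
    def pass1(v):
        down[v] = cap.get(v, 0.0) + sum(pass1(c) for c in children[v])
        return down[v]
    pass1(root)

    # Pass 2 (top-down): delay(v) = delay(parent) + R(parent, v) * down(v).
    delay = {root: 0.0}
    def pass2(v):
        for c in children[v]:
            delay[c] = delay[v] + res[c] * down[c]
            pass2(c)
    pass2(root)
    return delay

# Five 1 kOhm segments from A to B, 500 fF at each of the five nodes.
parent = {"n1": "A", "n2": "n1", "n3": "n2", "n4": "n3", "B": "n4"}
res = {v: 1e3 for v in parent}
cap = {v: 500e-15 for v in parent}
delay = elmore_delays(parent, res, cap, "A")
print(delay["B"])  # about 7.5e-9 seconds
```

Each node is visited once per pass, so the cost is linear in the size of the tree.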
14Bounds on Signal Delay
Lower Bound and Upper Bound Computation
Define
R_ii = resistance between the source and node i
R_ki = resistance of the subpath common to the path between the source and node i and that between the source and node k.
Three quantities with the dimension of time are artificially constructed to simplify bound computation:
$T_P = \sum_k R_{kk} C_k, \quad T_{Di} = \sum_k R_{ki} C_k, \quad T_{Ri} = \left(\sum_k R_{ki}^2 C_k\right) / R_{ii}$
Let the signal delay at node i be t. Then
$T_{Di} - T_{Ri} \ln\frac{T_{Ri}}{T_P (1 - v_i(t))} < t < \frac{T_{Di}}{1 - v_i(t)} - T_{Ri}$
where $v_0 = 1$ and $v_i(t) = 0.5$.
J. Rubinstein, P. Penfield and M. A. Horowitz, "Signal Delay in RC Tree Networks", IEEE Trans. on Computer-Aided Design, CAD-2, 3, July 1983.
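As a rough illustration of the three time constants, the sketch below evaluates T_P, T_Di and T_Ri for an RC tree and then the upper bound T_Di / (1 - v) - T_Ri at v = 0.5. The tree encoding (each edge identified by its child node) and the function names are assumptions made for this example; only the upper bound is evaluated.

```python
def path_edges(parent, v):
    """Edges (named by their child node) on the path from the source to v."""
    edges = set()
    while v in parent:
        edges.add(v)
        v = parent[v]
    return edges

def rph_constants(parent, res, cap, i):
    """Return (T_P, T_Di, T_Ri) for output node i of an RC tree."""
    path_i = path_edges(parent, i)
    r_ii = sum(res[e] for e in path_i)
    t_p = t_d = t_r_num = 0.0
    for k, c_k in cap.items():
        path_k = path_edges(parent, k)
        r_kk = sum(res[e] for e in path_k)
        r_ki = sum(res[e] for e in path_k & path_i)  # shared subpath
        t_p += r_kk * c_k
        t_d += r_ki * c_k
        t_r_num += r_ki ** 2 * c_k
    return t_p, t_d, t_r_num / r_ii

def delay_upper_bound(t_d, t_r, v=0.5):
    """Upper bound on the time at which node i reaches fraction v."""
    return t_d / (1.0 - v) - t_r

# Same five-stage chain as before: 1 kOhm per segment, 500 fF per node.
parent = {"n1": "A", "n2": "n1", "n3": "n2", "n4": "n3", "B": "n4"}
res = {v: 1e3 for v in parent}
cap = {v: 500e-15 for v in parent}
t_p, t_d, t_r = rph_constants(parent, res, cap, "B")
upper = delay_upper_bound(t_d, t_r)
```

For this chain T_Di coincides with the Elmore delay of 7.5 ns, T_Ri is 5.5 ns, and the upper bound on the 50% delay is 9.5 ns.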
15Bounds on Signal Delay - An Example
16Other delay Metrics
Higher-order moments and models using R, L and C have also been tried by several researchers, but most of them are rarely used because they are difficult to compute, in spite of their better accuracy.
[Figure: chip package cross-section showing the inductance (L) of the bonding wire and of the lead frame; labels: chip, mounting cavity, pin]
17How effective are Delay Estimators?
- Fidelity of a delay estimator
- Degree to which an optimal or near-optimal solution according to a delay estimator correlates to a near-optimal solution according to actual delay.
- For a set of possible solutions obtained using the estimator, how closely are the ranks correlated to those of the solutions obtained by actual delay measurement?
- Measure of fidelity in the context of finding an optimal RST
- Proportion of the pair-wise inequality relations among the optimal solutions that are correctly determined by the heuristic solution
- If there are m instances of RST, and h_i, s_i are respectively the objective values of the heuristic and optimal solutions to instance i, then fidelity
- $f = \left|\{(i, j) : 0 < i < j \le m,\ (h_i - h_j)(s_i - s_j) > 0 \text{ or } s_i = s_j\}\right| / \binom{m}{2}$
P. Dasgupta, et al., "Relative Accuracies of Estimators and their use in VLSI Routing", IIM-C Tech. Report.
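The pairwise fidelity measure can be computed directly from its definition. The following is a small sketch; the function name `fidelity` and the sample values are made up for illustration.

```python
from itertools import combinations

def fidelity(h, s):
    """Fraction of instance pairs (i, j) that the estimator orders
    consistently with the reference values, i.e. for which
    (h[i] - h[j]) * (s[i] - s[j]) > 0 or s[i] == s[j]."""
    pairs = list(combinations(range(len(h)), 2))
    agree = sum(
        1 for i, j in pairs
        if (h[i] - h[j]) * (s[i] - s[j]) > 0 or s[i] == s[j]
    )
    return agree / len(pairs)

same_order = fidelity([1, 2, 3], [10, 20, 30])      # every pair agrees
reversed_order = fidelity([1, 2, 3], [30, 20, 10])  # no pair agrees
partial = fidelity([3, 1, 2], [10, 20, 30])         # one pair out of three
```

The value lies between 0 and 1 and requires at least two instances.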
18Relative Accuracy of Delay Estimators
- Existing work
- Used in constructing near-optimal routing trees based on Elmore delay (Boese et al., ICCAD, 1993)
- Used for optimum wire sizing in routing trees (Cong et al., ACM TODAES, 1996)
- Major drawback of existing work
- Fidelity measured on all possible samples
- Main ideas
- Fidelity should be computed on a reasonably diverse set of relevant (near-optimal?) samples
- Should be dimensionless
- Preferably in the range (-1, 1)
- Relevant, i.e., act as a discriminator for good solutions, and not for the bad solutions
- Should be least affected by ties
19New Delay Metric?
- Can we use the bounds to have a better delay metric?
- Preferable characteristics of this delay metric?
- Compact and closed-form expression
- Easily computable
- Efficient lower bound of actual delay (this helps!!)
20Practical use of delay minimization
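Required Arrival Times (RAT)
[Figure: source s0 driving sinks s1, s2, ..., sn]
$\mathrm{RAT}(s_0) \le \mathrm{RAT}(s_i) - \mathrm{delay}(s_0, s_i), \quad i = 1, \dots, n$
$\mathrm{RAT}(s_0) \le \min_{i = 1, \dots, n} \left(\mathrm{RAT}(s_i) - \mathrm{delay}(s_0, s_i)\right)$
$\mathrm{Slack}(s_0, s_i) = \left(\mathrm{RAT}(s_i) - \mathrm{delay}(s_0, s_i)\right) - \mathrm{RAT}(s_0)$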
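A small numeric sketch of these relations follows, taking the RAT at the source to be the tightest value allowed by the sinks. The function names and the sample numbers are illustrative only.

```python
def source_rat(rat, delay):
    """Latest required arrival time at the source: the minimum over all
    sinks of RAT(sink) - delay(source, sink)."""
    return min(rat[s] - delay[s] for s in rat)

def slacks(rat, delay):
    """Slack of each sink relative to the RAT at the source."""
    r0 = source_rat(rat, delay)
    return {s: (rat[s] - delay[s]) - r0 for s in rat}

rat = {"s1": 10.0, "s2": 8.0}     # required arrival times at the sinks
delay = {"s1": 3.0, "s2": 4.0}    # source-to-sink delays
r0 = source_rat(rat, delay)       # 4.0, set by the critical sink s2
slack = slacks(rat, delay)        # {"s1": 3.0, "s2": 0.0}
```

The sink with zero slack is the critical one; delay minimization on the other sinks does not improve the RAT at the source.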
22Routing Tree Construction
- Mostly based on finding minimum-cost Steiner trees (SMT)
- Some are based on Rectilinear Steiner Arborescences, which are minimum-cost Steiner trees (RSMT) with shortest source-sink paths
- Algorithms exist for simultaneous cost minimization and tree-diameter reduction
- Extended Prim with bounded diameter also proposed
- In the DSM range, driver resistance is comparable to unit wire resistance; hence the distributed structure (resistance and capacitance) of the interconnect must be taken into account
23Routing Tree Construction
- P-tree heuristics (Lillis et al)
- Iterated 1-Steiner (Kahng and Robins)
- GeoSteiner (best Steiner tree construction method!)
- Bounded PRIM (BPRIM)
- Shallow-Light trees (BRBC)
- Rectilinear Steiner Arborescence (RSA) (A-tree construction of Cong et al.)
24RSMT Problem - Key Results
- Reduction to discrete grid
- Iterated 1-Steiner heuristic
- Greedily adds Steiner points to the tree
- Almost 11% improvement over MST on average
- Fast batched implementation (BI1S)
- Exact algorithm GeoSteiner 3.0
- Branch-and-cut
- 11.5% improvement over MST on average
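The greedy idea behind Iterated 1-Steiner can be sketched as follows. This is a deliberately simplified version written for illustration: it tries every Hanan-grid point in each round and recomputes the MST from scratch, and it omits the removal of useless Steiner points and the batched speed-ups of BI1S. The function names are assumptions, and terminals are assumed to be distinct points.

```python
def mst_length(points):
    """Length of a minimum spanning tree under the Manhattan metric (Prim)."""
    pts = list(points)
    if len(pts) < 2:
        return 0
    x0, y0 = pts[0]
    dist = {p: abs(p[0] - x0) + abs(p[1] - y0) for p in pts[1:]}
    total = 0
    while dist:
        p = min(dist, key=dist.get)       # closest point not yet in the tree
        total += dist.pop(p)
        for q in dist:
            d = abs(p[0] - q[0]) + abs(p[1] - q[1])
            if d < dist[q]:
                dist[q] = d
    return total

def iterated_one_steiner(terminals):
    """Repeatedly add the single Hanan-grid point that shortens the MST most."""
    pts = list(terminals)
    xs = sorted({x for x, _ in terminals})
    ys = sorted({y for _, y in terminals})
    hanan = [(x, y) for x in xs for y in ys]   # reduction to a discrete grid
    best = mst_length(pts)
    steiner = []
    while True:
        candidates = [(mst_length(pts + [p]), p) for p in hanan if p not in pts]
        if not candidates:
            break
        length, p = min(candidates)
        if length >= best:                     # no point gives a positive gain
            break
        pts.append(p)
        steiner.append(p)
        best = length
    return best, steiner

terminals = [(0, 1), (2, 1), (1, 0), (1, 2)]
mst = mst_length(terminals)                        # 6
length, steiner = iterated_one_steiner(terminals)  # 4, [(1, 1)]
```

In this four-terminal cross, the single Steiner point at the centre reduces the tree length from 6 to 4.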
25A-Tree Construction
A Rectilinear Steiner Tree is an A-tree if every path from the source to any node in the tree is a shortest path.
A-Tree algorithm (in a nutshell)
- Start with a forest of n single-node arborescences
- Apply a sequence of moves
- Grows an existing arborescence
- Combines two arborescences to form a new one
- Stop when ONE arborescence is left
- A move may be safe (optimal) or heuristic (possibly sub-optimal)
J. Cong, K-S. Leung, D. Zhou, "Performance-Driven Interconnect Design based on Distributed RC Delay Model", Design Automation Conference, 1993.
27Topological Routing - A new idea
Our goal: Partitioning the routing area into zones of pins with geometric proximity for better area / topological routing, and finding ways of prioritizing zones. Why ??
Sinha, Sur-Kolay, Dasgupta and Bhattacharya, "Partitioning Routing Areas into Zones with Distinct Pins", IEEE International Conference on VLSI Design, Bangalore, India, 2001.
28Forming Zones in a Placement
Objective: all pins in a zone belong to distinct nets and are reachable through connected regions.
Rationale: First, connect nets among zones, then route each zone in detail within its connected region. Bus lines are likely to be routed together.
29Graph for Zone Partitioning
- Pins in a placement
- => Point set
- => Voronoi diagram
- => Delaunay triangulation
- => Planar triangulated graph, G
Net name for pin => color of point, i.e., vertex in G
30Formulation of the problem
- Input: Planar triangulated graph G with vertices having different colors.
- MIN_ZONE_PART
- Find a minimum set of connected sub-graphs which partitions G such that the vertices in each sub-graph have distinct colors.
- The proposed algorithm is based on a Genetic Algorithm.
32Can we reduce Interconnect delay?
- Buffer allocation
- New directions
- Wire sizing
- Non-tree routing
33Why use Buffers in Routing Trees?
- An added buffer shields the heavy load downstream on the branch from the rest of the tree.
- Recovers the slope of the signal's transition edge and screens out the noise.
- Boosts driving power and reduces delay.
34Buffer allocation schemes
- Classical technique of van Ginneken
- Permutation tree (P-tree)-based method to combine topology construction and buffer-insertion searches, with wire sizing
- Okamoto-Cong's work
35Van Ginneken's DP-based Method
Input: (a) A routing tree. (b) Required arrival times (RAT) at sinks. (c) Legal buffer positions (at vertices of the routing tree).
Output: The optimal buffer placement such that the RAT at the source is maximum.
Method: Two-stage dynamic programming-based algorithm.
Stage 1 (bottom-up): For each vertex of the routing tree, find the best choices for buffer assignment giving larger RAT at the vertex.
Stage 2 (top-down): Traverse from root to leaves following the best choice for the root obtained in Stage 1. This gives the actual buffer placement.
L.P.P.P. van Ginneken, "Buffer Placement in Distributed RC-tree Networks for Minimal Elmore Delay", Int. Symp. on Circuits and Systems, 1990, pp. 865-868.
36Van Ginneken's DP-based Method (Contd.)
B = total number of legal buffer positions. Time complexity: O(B^2)
[Figure: routing tree with source s0, sinks s1 to s6, and a buffer on one branch]
Without buffer (wire of length l): $T'_k = T_k - r l L_k - 0.5 r c l^2, \quad L'_k = L_k + c l$
With buffer: $T'_k = T_k - D_{buf} - R_{buf} L_k, \quad L'_k = C_{buf}$
- An option is strictly worse if (i) its load is larger, and (ii) its required time is earlier.
- At each vertex, the strictly worse options are not saved. At the root, the best option is chosen.
37Okamoto-Cong's Method - I
- Existing techniques mostly work in two stages
- optimum Steiner tree construction
- optimum buffer insertion in this tree
- This method: DP-based simultaneous application of
- van Ginneken's buffer insertion
- Cong's A-tree construction
[Figure: two routings of a net from the source to sinks S1 (critical), S2, S3 and S4: minimum-delay buffered tree vs. minimum-delay tree followed by buffer insertion]
38Okamoto-Cong's Method - II
- Characteristic features
- Critical path isolation: root gate drives critical sinks and a smaller additional load due to buffered non-critical paths
- If RATs at sinks are within a small range, balanced load decomposition is applied in order to decrease the load at the output of the root gate.
[Figure: critical signal isolation and balanced load decomposition]
T. Okamoto and J. Cong, "Interconnect Layout Optimization by Simultaneous Steiner Tree Construction and Buffer Insertion", ICCAD, 1996.
39Okamoto-Cong's Method - III
- Overview
- Critical path isolation (CPI): root gate drives critical sinks and a smaller additional load due to buffered non-critical paths
- If RATs at sinks are within a small range, balanced load decomposition (BLD) is applied in order to decrease the load at the output of the root gate.
- Bottom-up DP followed by top-down buffer placement
- CPI and BLD are taken into account when choosing two subtrees to be merged into a single root.
- Given the sets of options at two nodes u and v, a root node r, the distances dist(r, u) and dist(r, v), and the characteristics of the buffer to be placed at r, the set of options at r is computed
- Using a cost criterion for different roots in the A-tree, the best subtree is formed.
- In the 2nd phase, the option at the root which gives the maximum RAT(root) is chosen, and the tree is traversed top-down using the best nodes chosen in the previous phase.
41Improving Delay by Wire Sizing
- Why wire sizing?
- In DSM, when wire resistance becomes significant, proper sizing of the interconnects can reduce the interconnect delay.
- First proposed by Cong, Leung and Zhou in 1993.
- When driver resistance is much larger than the wire resistance of the interconnect, the interconnect can be modeled as a lumped capacitor without losing much accuracy, and the conventional minimum wire-width solution often leads to an optimal design.
- When driver resistance falls below unit wire resistance, optimal wire sizing can lead to substantial delay reduction.
J. Cong, K. S. Leung, "Optimal Wire Sizing under the Distributed Elmore Delay Model", ICCAD, 1993.
42P-tree-based method
- Salient features
- Uses the notion of permutation of sinks
- Constructs binary search trees as the routing trees
- Finds an optimal sink permutation P based on minimum length of tour on the sinks, and searches for the optimal binary tree for P
- Based on DP as in van Ginneken's algorithm
- Uses load and RAT as cost parameters in DP
- Performs simultaneous wire sizing for the constructed tree
[Figure: Two different trees induced by a sink permutation, each with source s0 and sinks s1 to s5]
Lillis, Cheng, Lin, "New Performance Driven Routing Techniques with Explicit Area/Delay Tradeoff and Simultaneous Wire Sizing", 33rd Design Automation Conference, pp. 395-400, 1996.
44 Battling with Manufacturing Defects
- Wire doubling
- Simple, easy to integrate in current design flows
- Can be applied to all nets
Can the use of graphs (with cycles) instead of (conventional) trees for routing topologies be useful?
- Non-tree routing (NTR)
- Still easy to integrate in current flows (post-processing approach)
- Appropriate for non timing-critical nets
- Potentially more effective
45NTR increases Reliability
- Open fault: missing material (or extra oxide where a via should be formed)
- Predominant for reduced feature size
- Manufacturing defects and electro-migration tend to be acute problems for DSM
- Reliability (ability of the interconnect to tolerate open faults) increases for NTR topology
46Spot Defect Classification
(Source: Ion Mandoiu, Fujitsu Lab Talk)
47Probability of Failures
48 NTR Problem formulation
Optimal Routing Graph (ORG) Problem: Given a signal net N = (n1, n2, ..., nm) with source s0, find a set S of Steiner points and a routing graph G = (N U S, E) such that the maximum source-sink signal propagation delay is minimized.
Result: The ORG problem is NP-hard.
B. A. McCoy and G. Robins, "Non-Tree Routing", IEEE Transactions on CAD/ICAS, Vol. 14, No. 6, June 1995.
49Other uses of Non-tree Routing
- May reduce signal propagation delay
- Wire capacitance increases, wire resistance decreases
- Observation: often, for DSM designs, decrease in R > increase in C
- Capable of reducing signal skew
- Signal skew improved by an average of 63% over Steiner routing
50Augmenting Paths for NTR Construction
(C) Paths connecting tree nodes or projections of tree nodes onto adjacent tree edges
(Source: Ion Mandoiu, Fujitsu Lab Talk)
52Achieving High Clock Speed!
- Factors determining the operating speed of a circuit
- Delay
- Clock distribution
- Clock skew
- Measures the asymmetry of the clock distribution
- = Maximum clock delay - Minimum clock delay
- Ideally should be zero (Zero Skew Trees)
Reducing Clock Skews
[Figure: clock distribution from the clock source]
53Zero Skew Routing
- Greedy Deferred Merge Embedding for zero skew
- Greedy bottom-up method
- Starts with a set of merging segments, each initially containing a single sink
- Iteratively finds the pair of closest segments
- Determines the position of the parent such that the delays from the parent to the children are equal
M. Edahiro, "A Clustering-based Optimization Algorithm in Zero-Skew Routings", 30th Design Automation Conference, 1993.
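The step of positioning the parent so that both children see equal delay can be worked out in closed form under the Elmore model. The sketch below is a generic balance computation, not the cited clustering algorithm; the function name and parameters are assumptions. Subtrees a and b have internal delays t_a, t_b and load capacitances c_a, c_b, and are joined by a wire of the given length with per-unit resistance r and capacitance c.

```python
def zero_skew_tap(t_a, c_a, t_b, c_b, length, r, c):
    """Balance point on the wire joining subtrees a and b.

    Returns (x, delay, cap): x is the fraction of the wire between the tap
    and subtree a, `delay` the common Elmore delay from the tap to the sinks,
    and `cap` the total capacitance seen at the tap. Assumes length > 0.
    """
    rl, cl = r * length, c * length
    # Solve rl*x*(cl*x/2 + c_a) + t_a == rl*(1-x)*(cl*(1-x)/2 + c_b) + t_b.
    x = (t_b - t_a + rl * (c_b + cl / 2.0)) / (rl * (cl + c_a + c_b))
    if not 0.0 <= x <= 1.0:
        # The balance point is off the wire; extra wire would be needed.
        raise ValueError("no balance point on the wire without elongation")
    delay = rl * x * (cl * x / 2.0 + c_a) + t_a
    return x, delay, c_a + c_b + cl

x_sym, d_sym, cap_sym = zero_skew_tap(0.0, 1.0, 0.0, 1.0, 2.0, 1.0, 1.0)
x, d, cap = zero_skew_tap(0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 1.0)
```

Two identical subtrees are tapped at the midpoint; when subtree b is slower and heavier, the tap moves towards b (here x = 0.7), which lengthens the wire on the a side.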
54Bounded-Skew Routing
- Problems with zero-skew tree construction
- Very difficult to achieve
- Increased wiring area
- Higher power dissipation
- Practical case: circuits operate correctly within some non-zero skew bound.
- BST/DME
- Form merge regions instead of merge segments
- Bottom-up region formation followed by a top-down process to determine the exact locations of the internal nodes.
Cong et al., "Bounded-Skew Clock and Steiner Routing", ACM TODAES, Vol. 3, 1998.
55Semi-Synchronous Circuits
- Cluster-based method for semi-synchronous circuits
- Semi-synchronous circuit: a circuit in which the clock is assumed to be distributed periodically to each individual register, though not necessarily simultaneously
- Clock period minimization is of prime importance
- Registers are partitioned into clusters depending on their geometric positions
- Registers within a cluster are in close proximity and have identical clock timing requirements
- Clusters are then modified to improve the clock period while keeping the radius small
- Each cluster of registers is driven by a buffer
- Clock period is 18% shorter than with the zero-skew method
- Wire length and power consumption are comparable to zero skew
Saitoh, Azuma and Takahashi, "A Clustering Based Fast Clock Schedule Algorithm for Light Clock Trees", IEICE Trans. Fundamentals, Dec. 2002.
57Early Design Planning Needs
- Interconnect Planning (Otten, others)
- Buffer Block Planning
- Interconnect Architecture (IA)
- Performance Prediction
- Others .
58Buffer Block Planning for Interconnects
- Why planning for buffers?
- Early works handled one net at a time, and had no global planning of buffer placement for all the nets
- Buffers cannot be placed just anywhere, as they occupy silicon and require connections to power/ground networks.
- Arbitrary buffer placement may affect use/reuse of IP blocks
59A Method for Buffer Block Planning
- Salient features of Cong's method
- Introduces the concept of feasible region (FR) for buffer placement under some delay constraint
- FR can be quite large and can be used to place buffer clusters
- An effective algorithm is proposed for finding FRs, which can be used for subsequent floorplanning and interconnect planning
J. Cong, T. Kong and Z. Pan, "Buffer Block Planning for Interconnect Planning and Prediction", IEEE TCAD/ICAS, December 2001.
60Routing Tree for Fixed Buffer Locations
- High performance design requires using a large number of buffers
- In practice, buffers are organized into buffer blocks which are pre-planned
- Buffer positions are fixed prior to routing tree construction
[Figure: Routing graph for a floorplan with buffer block planning; legend: obstacle, buffer block, logic block, pin]
61Routing Tree for Fixed Buffers - A Method
- Summary of the method
- Given the RAT values at sinks and the feasible regions of buffers, construct a routing tree and assign buffers such that the RAT at the source is maximum.
- Each node v is identified by the tree below it, and characterized by (i) the load capacitance at v in its subtree, (ii) RAT(v) in the subtree, (iii) RE, the reachable sink set in the subtree, and (iv) a flag buf indicating if a buffer is inserted in the subtree at v.
J. Cong and Xin Yuan, "Routing Tree Construction under Fixed Buffer Locations", Design Automation Conference, 2000.
62Routing Tree for Fixed Buffers - A Method
- Summary of the method (Contd.)
- Expansion of nodes (subtrees) and merging of nodes (subtrees) are done at each node, and the corresponding labels are generated.
- A label queue stores all the labels at any stage of the algorithm; at each iteration, a new label with maximum RAT is selected.
- Subtrees are not deleted immediately on merging.
- Stops when the algorithm fetches from the queue a label for the whole tree
63Routing Tree for Fixed Buffers - Example
65Can we Predict IA Performance?
- Performance of an Interconnect Architecture (IA) is traditionally predicted using delay, congestion, etc.
- Previous works do not consider several factors such as materials, use of vias, repeaters, number of layers, etc.
- Prediction helps to choose the dynamic parameters related to process and materials for an IA
66Interconnect Architecture
An Interconnect Architecture (IA) is a collection of pairs of wiring layers (tiers), with all wires in a given layer pair having uniform width (w), height (h), spacing (s) and thickness (t).
[Figure: Schematic of an IA, showing layer-pair j and layer-pair (j+1) connected by a via, with repeaters]
67Proposing a Novel Metric
- Novel metric for IA performance evaluation
- Efficient dynamic programming method for exactly computing the metric
- for a given Interconnect Architecture
- for a given Wirelength Distribution (WLD)
- Studies of effects of materials, geometry, frequency, design parameters, repeater resources on IA metric
Dasgupta, Kahng and Muddu, "A Novel Metric for Interconnect Performance", Design, Automation and Test in Europe (DATE), 2003.
68Performance-, WLD-Aware Routing Model
- All connections (wires) are two-pin, L-shaped
- Each segment of an L is assigned to one layer of a tier
- For a given WLD, longer wires are always routed on upper tiers; shorter wires are always routed on lower tiers
- Every wire has a target delay (proportional to clock period)
- Repeaters inserted as needed to meet delay targets
- Starting from longer wires first
- All repeaters used in wires of a tier are of the same size
- Repeater resource is specified as a fraction of total die area
- Repeaters inserted until repeater area budget is exhausted
69Rank Metric for IA
- Determines IA quality in terms of how completely target performance is met while embedding all wires
- Rank of a wire = its index in the WLD, where wires have been arranged in order of non-increasing length
- Rank of an IA = index of the highest-rank wire in the WLD that meets its target delay, subject to the constraints
- The given repeater area budget is not exceeded
- Lower-rank (= longer) wires in the WLD meet target delays
- All wires in the WLD can be assigned to the IA
- The rank of an IA is zero if not all the wires of the WLD can be assigned to the IA, even without meeting any delay targets
70Rank of IA Dependencies
[Figure: inputs on which the rank of an IA depends: WLD (number of wires, wirelength); IA (number of layer pairs; W, H, S and T); technology node, gate count and gate parameters; target delays; repeater area budget AR; via blockage]
71The Rank Computation Problem
- Given
- IA with fixed number of layer-pairs with fixed geometry
- WLD W with n wires
- Available repeater area AR
- Upper bound d_i = target delay for each wire
- Find
- An assignment of wires from the WLD to the IA
- using repeater insertion within the repeater area budget
- to meet target delays of wires
- such that rank of first wire failing to meet target delay is maximized
72Rank Computation
- Rank of an IA is computed by assigning maximum number of wires from the WLD to tiers of the IA
- by making ActualDelay <= TargetDelay
- within AR
- Maximizing the Rank requires optimum combination of
- wires assigned to tiers
- repeaters assigned to wires
- Exhaustive search over wires, tiers and repeaters is infeasible
- How to compute Rank efficiently?
- Greedy approach or Dynamic Programming (DP)
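To make the rank definition concrete, here is a heavily simplified greedy sketch. It is not the dynamic programme of the cited work: tier assignment is ignored, and `repeater_area_needed[i]` is a hypothetical, precomputed repeater area that wire i would need to meet its target delay (zero if it meets the target without repeaters). Wires are served longest first until the area budget runs out.

```python
def greedy_rank(wire_lengths, repeater_area_needed, area_budget,
                all_assignable=True):
    """Rank under a simplified single-budget model (illustrative only).

    Returns the number of wires, taken in order of non-increasing length,
    whose delay targets can be met before the repeater area is exhausted.
    The rank is zero if the wires cannot all be assigned to the IA.
    """
    if not all_assignable:
        return 0
    order = sorted(range(len(wire_lengths)), key=lambda i: -wire_lengths[i])
    used, rank = 0.0, 0
    for i in order:
        need = repeater_area_needed[i]
        if used + need > area_budget:
            break                # first wire failing to meet its target delay
        used += need
        rank += 1
    return rank

lengths = [5.0, 9.0, 2.0]
needs = [1.0, 3.0, 0.0]          # hypothetical repeater area per wire
tight = greedy_rank(lengths, needs, area_budget=3.5)    # 1
ample = greedy_rank(lengths, needs, area_budget=4.0)    # 3
```

A larger repeater budget lets more of the long wires meet their targets, which raises the rank.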
73Layout Enhancement for Manufacturability
Session III Dr. Parthasarathi Dasgupta MIS
Group Indian Institute of Management Calcutta
74Outline
- Issues in lithography
- Resolution enhancement
- Optical process correction
- Phase Shift Masking
- Phase assignment
- Chemical mechanical polishing
- Dummy fill synthesis
- Layout data representation and compaction
75Process flow for IC Manufacturing
Layout Design
Pattern Generation
Mask or Reticle
Chip Production
76IC Manufacturing Terminology
Reticle - A photographic plate developed from a sequence of polygonal patterns for a single layer of an IC
Depth of focus - A plus or minus deviation from a defined reference plane wherein the required resolution for photolithography is still achievable
Photoresist - A radiation-sensitive material used as a coating on the wafer prior to doping
77Photolithography
[Figure: photolithography steps, with labels radiation, mask, resist, oxide and silicon]
79IC Manufacturing in DSM - Problems?
- Feature dimensions (< 350 nm) < wavelength of the incident light
- Effects?
- Optical diffraction.
- Resist process effects.
- Distortion or disappearance of features.
- Rayleigh limit (resolution proportional to lambda / NA)
- Compensation schemes (amplitude / phase)
- Optical Proximity Correction.
- Phase-Shifting Masks.
- ...
Resolution Enhancement Techniques (RET)
80Optical Proximity Correction (OPC)
- Perturb the shapes of transmitting apertures in the mask in a systematic manner to address optical lithographic distortions.
- OPC correction primitives
- Serif: small L-shaped geometry added to (subtracted from) a convex (concave) corner to nullify rounding
- Hammerhead: a U or inverted U to compensate for line-end shortening
- Notch: local thinning of a feature to compensate for linewidth variation
- Outrigger: disconnected, non-printing geometry that uses diffraction effects to enhance linewidth control
81OPC Example
A. B. Kahng and Y. C. Pati, "Subwavelength
Optical Lithography Challenges and Impact on
Physical Design", Proc. ISPD, April 1999, pp.
112-119.
83Phase Shifting Mask (PSM)
Proposed in M. D. Levenson, et al., "Improving Resolution in Photolithography with a Phase-Shifting Mask", IEEE Trans. Electron Devices, 29, p. 1812, Dec. 1982.
By using a coating based on Chromium or Molybdenum Silicide (MoSi), two adjacent clear regions of a mask are enabled to transmit light with pre-defined phase shifts. For a phase shift of 180 degrees, the diffracted light in the intermediate dark region interferes destructively.
Effect: better resolution and depth of focus (DOF).
84PSM Example
Phase shifter
Light Intensity
Regions of Interference
Without Phase-shifting mask
With Phase-shifting mask
86Phase Assignment Problem
Input: A given set of features in a mask layout
Objective: Assign phases to all the features of the layout such that no two conflicting features are assigned the same phase.
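If only two phases (0 and 180 degrees) are available, which is an assumption made here for illustration, the problem amounts to 2-coloring a conflict graph whose vertices are features and whose edges join conflicting features. A breadth-first sketch, with made-up feature names:

```python
from collections import deque

def assign_phases(features, conflicts):
    """Assign phase 0 or 180 to each feature so that conflicting features
    differ; return None if the conflict graph is not 2-colorable."""
    adjacent = {f: [] for f in features}
    for a, b in conflicts:
        adjacent[a].append(b)
        adjacent[b].append(a)
    phase = {}
    for start in features:
        if start in phase:
            continue
        phase[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacent[u]:
                if v not in phase:
                    phase[v] = 180 - phase[u]   # opposite phase
                    queue.append(v)
                elif phase[v] == phase[u]:      # odd cycle of conflicts
                    return None
    return phase

chain = assign_phases(["a", "b", "c"], [("a", "b"), ("b", "c")])
triangle = assign_phases(["a", "b", "c"],
                         [("a", "b"), ("b", "c"), ("a", "c")])
```

An odd cycle of conflicts, such as the triangle above, admits no valid two-phase assignment, so the layout itself would have to change.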
87New Thoughts?
Constrained Physical Design !!
Layout Geometry Mask
Geometry
Actual geometry on Wafer
89Chemical-Mechanical Polishing (CMP)
- Requirements for ULSI
- smaller feature size
- higher resolution
- multi-layer interconnects
- global planarity on ILD and metal layers for better depth of focus
- CMP
- can be performed on both ILD and metals
- polishes wafer surface flat
- uses chemical slurry and circular action
90Problems with CMP
Wafer
Slurry
Rotating Pad
- Associated Problems
- wearing out of Polishing Pad over metal features
- dishing in sparse regions of layout
- greater ILD thickness over dense regions of
layout
92Uniform Feature Density?
- The density of a layer in any particular region is the total area covered by the drawn features on that layer divided by the area of the region
- ILD thickness varies with local feature density
- Uniform (feature) density achieved by
- Post-processing: insertion of dummy (electrically inactive) features
93Uniform Feature Density - Tiling
Dummy feature
Normal feature
Cross-sectional view
Top view
94(Dummy) Fill Synthesis Problem (Tiling)
- Foundry rules specify minimum and maximum feature densities to reduce the effect of CMP on yield (e.g., on each metal layer, every 2000 um x 2000 um window must be between 35% and 70% filled)
- Problem
- Input: A post-CMP feature distribution on a layout
- Objective: Determine the amount and location of dummy features to be placed into the layout.
- Constraints: Feature density > a prescribed minimum, variation in feature density is within a prescribed range, electrical and physical design rules are observed.
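A density rule of this kind can be checked with a short sketch. It assumes the layout has already been cut into equal-size tiles with known densities, and that each window is a block of r x r tiles; both the tiling and the names are assumptions made for illustration.

```python
def violating_windows(tile_density, r, d_min, d_max):
    """List (row, col, density) of r x r tile windows outside [d_min, d_max].

    tile_density -- 2D list of per-tile densities (equal-size tiles assumed)
    r            -- window side length measured in tiles
    """
    rows, cols = len(tile_density), len(tile_density[0])
    bad = []
    for i in range(rows - r + 1):
        for j in range(cols - r + 1):
            total = sum(tile_density[i + di][j + dj]
                        for di in range(r) for dj in range(r))
            d = total / (r * r)           # equal tiles: window density is the mean
            if d < d_min or d > d_max:
                bad.append((i, j, d))
    return bad
```

Windows reported as too sparse are candidates for dummy fill; windows reported as too dense cannot be fixed by adding fill.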
95Solving Tiling Problem
- Outline of a generic approach
- For every prescribed window size, find the available area for dummy features
- fixed dissection
- arbitrary dissection
- Compute amounts and locations of dummy fills satisfying the constraints
- Generate fill geometry
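The middle step of this outline can be sketched in a deliberately simplified form. The version below assumes a fixed dissection into equal tiles and enforces the minimum density tile by tile, whereas a real flow would enforce it per window and also control density variation. All names are hypothetical.

```python
def fill_amounts(feature_area, slack_area, tile_area, d_min):
    """Per-tile dummy fill area needed to reach d_min, capped by available slack.

    feature_area -- 2D list: area already covered by drawn features in each tile
    slack_area   -- 2D list: area in each tile where dummy features are allowed
    tile_area    -- area of one tile (fixed dissection assumed)
    d_min        -- prescribed minimum density (fraction between 0 and 1)
    """
    fills = []
    for row_f, row_s in zip(feature_area, slack_area):
        row = []
        for f, s in zip(row_f, row_s):
            need = max(0.0, d_min * tile_area - f)   # shortfall against the rule
            row.append(min(need, s))                 # cannot exceed free space
        fills.append(row)
    return fills
```

A tile whose result is smaller than its shortfall cannot meet the rule by fill alone, which is one reason the window-based formulations treat this as an optimization problem.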
96Methods for Dummy Feature Placement - I
- rule-based (widely used in Industry)
- boolean operation
- tiles of a single prescribed density
- fills ONLY open space
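A rule-based fill of this kind can be sketched as a boolean operation on a site grid. The grid model, the keep-out distance around drawn features, and the fixed fill pitch (standing in for "a single prescribed density") are assumptions for illustration only.

```python
def rule_based_fill(occupied, rows, cols, keepout=1, pitch=2):
    """Return the set of grid sites that receive a dummy tile.

    occupied -- set of (row, col) sites covered by drawn features
    keepout  -- sites around each feature left empty (hypothetical spacing rule)
    pitch    -- fill every `pitch`-th site in both directions (fixed density)
    """
    blocked = set()
    for r, c in occupied:
        for dr in range(-keepout, keepout + 1):
            for dc in range(-keepout, keepout + 1):
                blocked.add((r + dr, c + dc))
    # Boolean operation: regular fill pattern AND NOT (features plus keep-out).
    return {(r, c)
            for r in range(0, rows, pitch)
            for c in range(0, cols, pitch)
            if (r, c) not in blocked}
```

Because the pattern density is fixed, sparse and moderately dense regions receive the same fill wherever space is open, which is the limitation the model-based methods address.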
97Methods for Dummy Feature Placement - II
- model-based
- analytical expression of the relation between local density and ILD thickness
- more accurate than rule-based
R. Tian, D. F. Wong and R. Boone, "Model-Based Dummy Feature Placement for Oxide CMP Manufacturability", DAC 2000.
98Outline
- Issues in lithography
- Resolution enhancement
- Optical process correction
- Phase Shift Masking
- Phase assignment
- Chemical mechanical polishing
- Dummy fill synthesis
- Layout data representation and compaction
99Dummy Fills: Is DENSITY the ONLY factor?
- Representing fill patterns (GDSII)
- Generating compressed fill patterns
- Compressing existing fill patterns
100Defining Layouts - GDSII
- A standard (binary) file format
- Used for transferring / archiving 2D graphical design data
- Records
- Header (record type)
- Information (GDSII BNF)
- Contains hierarchy of structures
- Structure
- Boundary / polygon, path, text, box
- Structure references (SREF)
- Array of structures references (AREF)
101GDSII AREFs
X3, Y3
SREF / AREF
Inter row spacing
X1, Y1
Inter column spacing
X2, Y2
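The three AREF reference points can be turned into instance positions with a short sketch. It follows the usual GDSII convention, in which (X2, Y2) is the origin displaced by the number of columns times the inter-column spacing and (X3, Y3) is the origin displaced by the number of rows times the inter-row spacing. The function name and the sample numbers are illustrative.

```python
def aref_placements(cols, rows, p1, p2, p3):
    """Expand an AREF into the origins of its individual instances.

    p1 -- (X1, Y1): array origin
    p2 -- (X2, Y2): origin + cols * inter-column spacing
    p3 -- (X3, Y3): origin + rows * inter-row spacing
    """
    x1, y1 = p1
    col_dx, col_dy = (p2[0] - x1) / cols, (p2[1] - y1) / cols
    row_dx, row_dy = (p3[0] - x1) / rows, (p3[1] - y1) / rows
    return [(x1 + c * col_dx + r * row_dx, y1 + c * col_dy + r * row_dy)
            for r in range(rows)
            for c in range(cols)]
```

One AREF record therefore stands in for cols x rows separate references, which is where the compaction comes from.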
102GDSII File Description Example
Header
Bgnlib Lib1
Bgnstr Struct1
(<element>)
Endstr
Endlib
Element: <boundary> <path> <aref> <text> <node> <box> Endel

Header
Bgnlib
Libname LIB1
Bgnstr
Strname CELL1
Boundary
Layer 0
Datatype 0
XY 6
X -1000.000 Y 0.000
X 163000.000 Y 0.000
X 163000.000 Y 177000.000
X 80000.000 Y 260000.000
X -1000.000 Y 260000.000
X -1000.000 Y 0.000
Endel
Endstr
103GDSII File Description Example
Bgnstr
Strname AREF_SAMPLE1
Aref
Sname CELL1
Strans 0,0,0
Colrow 7, 3
XY 3
X -5114000.000 Y -3006000.000
X -3095600.000 Y -3006000.000
X -5114000.000 Y -1891800.000
Endel
Endstr
104GDSII File Description Example
Bgnstr
Strname SREF_SAMPLE1
Sref
Sname AREF_SAMPLE1
XY 1
X -7114000.000 Y -2006000.000
Endel
Endstr
Bgnstr
Strname LAYOUT
Aref
Sname SREF_SAMPLE1
Strans 0,0,0
Colrow 9, 5
XY 3
X -114000.000 Y -2006000.000
X -2095600.000 Y -2006000.000
X -114000.000 Y -2891800.000
Endel
105GDSII File Description Example
Aref
Sname CELL1
Strans 0,0,0
Colrow 2, 3
XY 3
X -3140000.000 Y -2006000.000
X -3240000.000 Y -2006000.000
X -3140000.000 Y -3891800.000
Endel
Endstr
Endlib
106Why is Layout Data Compaction Needed?
- Drastic increase in layout data
- pattern modification due to OPC and PSM
- fracturing of layout data (partitioning into primitive patterns, like triangles, trapezoids, etc.)
- fill pattern generation
107A Method for Layout Data Compaction
- Basic figure: a rectangle or a trapezoid corresponding to a fractured mask pattern.
- Grouped figure: group of multiple basic figures.
- Array: a reference to grouped figures with pitch and number of repetitions in each of X and Y directions.
- Cell: has some references and one definition. A cell includes grouped figures.
S. Ueki et al., "Effective Data Compaction Algorithm for Vector Scan EB Writing System", Proc. of SPIE, Vol. 4186, 2001.
108Compaction Steps in Ueki's Algorithm
Step 1. Search arrays of same type of figure (array search algorithm)
Step 2. Classify arrays by array type and create cells that include multiple figures for each array type.
Step 3. Search cells from all figures that are not positioned in array. (cell search algorithm)
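Step 1 can be illustrated with a toy array search. This is not the algorithm from the cited paper, only a sketch of the idea: figures of the same shape on the same row are scanned for runs with a constant X pitch. The input format, the one-dimensional restriction, and the minimum run length are all assumptions.

```python
from collections import defaultdict

def find_x_arrays(figures, min_count=3):
    """Detect 1-D arrays of identical figures repeated at a constant X pitch.

    figures -- iterable of (shape, x, y); `shape` identifies the figure type
    Returns a list of (shape, x0, y, pitch, count) tuples.
    """
    rows = defaultdict(list)
    for shape, x, y in figures:
        rows[(shape, y)].append(x)

    arrays = []
    for (shape, y), xs in sorted(rows.items()):
        xs.sort()
        i = 0
        while i < len(xs) - 1:
            pitch = xs[i + 1] - xs[i]
            j = i + 1
            while j + 1 < len(xs) and xs[j + 1] - xs[j] == pitch:
                j += 1                       # extend the run while pitch holds
            count = j - i + 1
            if count >= min_count:
                arrays.append((shape, xs[i], y, pitch, count))
                i = j + 1
            else:
                i += 1
    return arrays
```

Arrays that come out with the same pitch and the same count are the ones Step 2 would classify together and merge into a single cell containing several figures.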
109Example - I
[Figure: three panels]
- Figures classified by shape: Shape A (A1, A2, A3), Shape B (B1, B2, B3)
- Array figures of same type: array of A1, A2, A3; array of B1, B2, B3
- Cell extracted for arrays of same pitch and same number of figures: grouped figure AB, one array AB1, AB2, AB3
110Example - II
Three cell references
5 x 1 array
111Open Artwork System Interchange Standard
- OASIS: Salient Features
- Proposed in February 2003 by SEMI TWG
- Achieve >10x (order of magnitude) file size reduction compared to GDSII
- Allow integers to extend to >64 bits, when required
- Efficiently handle flat geometric data, including array of figures
- Make format publicly available to academic and commercial entities
"New Stream Format: Progress Report on Containing Data Size Explosion", DSM Technical Notes, Mentor Graphics
112OASIS Repetition Types
Type 1
Type 2
Type 4
Type 5
Type 3
Type 7
Type 6
Type 8
113References
1. J. Rubinstein, P. Penfield and M. A. Horowitz, "Signal Delay in RC Tree Networks", IEEE Trans. on Computer-Aided Design, Vol. CAD-2, No. 3, July 1983.
2. J. Lillis, C. K. Cheng, T. Y. Lin and C. Y. Ho, "New Performance Driven Routing Techniques with Explicit Area/Delay Tradeoff and Simultaneous Wire Sizing", 33rd Design Automation Conference, pp. 395-400, 1996.
3. J. Cong and Xin Yuan, "Routing Tree Construction under Fixed Buffer Locations", Design Automation Conference, 2000.
4. P. Dasgupta, "Relative Accuracies of Estimators and their use in VLSI Routing", IIM-C Tech. Report, 2003.
114References
5. K. Sinha, S. Sur-Kolay, P. Dasgupta and B. B. Bhattacharya, "Partitioning Routing Areas into Zones with Distinct Pins", Proc. International Conference on VLSI Design (IEEE-CS Press), Bangalore, India, 2001.
6. L.P.P.P. van Ginneken, "Buffer Placement in Distributed RC-tree Networks for Minimal Elmore Delay", International Symposium on Circuits and Systems, 1990, pp. 865-868.
7. T. Okamoto and J. Cong, "Interconnect Layout Optimization by Simultaneous Steiner Tree Construction and Buffer Insertion", International Conference on Computer-Aided Design (ICCAD), 1996.
115References
8. J. Cong and K.S. Leung, "Optimal Wiresizing Under the Distributed Elmore Delay Model", International Conference on Computer-Aided Design (ICCAD), 1993.
9. B. A. McCoy and G. Robins, "Non-Tree Routing", IEEE Transactions on CAD/ICAS, Vol. 14, No. 6, June 1995.
10. M. Edahiro, "A Clustering-based Optimization Algorithm in Zero-Skew Routings", 30th Design Automation Conference, 1993.
11. J. Cong, A. B. Kahng, C.-K. Koh and C.-W. A. Tsao, "Bounded-Skew Clock and Steiner Routing", ACM Transactions on Design Automation of Electronic Systems, Vol. 3, No. 3, 1998, pp. 341-388.
116References
12. M. Saitoh, M. Azuma and A. Takahashi, "A Clustering Based Fast Clock Schedule Algorithm for Light Clock Trees", IEICE Transactions Fundamentals, Vol. E85-A, No. 12, Dec. 2002.
13. J. Cong, T. Kong and Z. Pan, "Buffer Block Planning for Interconnect Planning and Prediction", IEEE Transactions on CAD/ICAS, December 2001.
14. J. Cong and Xin Yuan, "Routing Tree Construction under Fixed Buffer Locations", DAC, 2000.
15. P. Dasgupta, A. B. Kahng and S. V. Muddu, "A Novel Metric for Interconnect Performance", Design, Automation and Test in Europe (DATE), 2003.
117References
16. A. B. Kahng and Y. C. Pati, "Subwavelength Optical Lithography: Challenges and Impact on Physical Design", Proc. International Symposium on Physical Design, April 1999, pp. 112-119.
17. R. Tian, D. F. Wong and R. Boone, "Model-Based Dummy Feature Placement for Oxide CMP Manufacturability", Design Automation Conference, 2000.
18. S. Ueki et al., "Effective Data Compaction Algorithm for Vector Scan EB Writing System", Proceedings of SPIE, Vol. 4186, 2001.
118References
19. "New Stream Format: Progress Report on Containing Data Size Explosion", DSM Technical Notes, Mentor Graphics, 2003.
20. A. B. Kahng and G. Robins, "A New Class of Steiner Tree Heuristics with Good Performance: The Iterated 1-Steiner Approach", Proc. International Conference on Computer-Aided Design, pp. 428-431, 1990.
21. International Technology Roadmap for Semiconductors (ITRS), http://public.itrs.net.
22. J. Cong, K-S. Leung, D. Zhou, "Performance-Driven Interconnect Design based on Distributed RC Delay Model", Design Automation Conference, 1993.
119Session IV: Analyzing Layout Databases for Improving Test Quality
Dr. Sujit T Zachariah, Intel India Development Centre, Bangalore (sujit.t.zachariah_at_intel.com)
120Outline
- HVM Test Basics
- Case Studies
- Defect Based Testing
- Shorts
- Bridge defects (Random two-node and multi-node)
- Opens
- Open defects (Random and systematic)
- Circuit Marginality Testing
- Power Supply Droop
- Q & A
121High Volume Manufacturing (HVM) Test Basics
122HVM Testing Approaches: An Overview
- Functional testing
- Exorbitant cost of testers (at-speed application)
- Need for frequent tester upgrades
- Cost of manual test generation
- Structural testing
- Low cost, re-usable structural testers
- Automated approaches for test pattern generation
- Use of fault models
- Classical approach: stuck-at fault model
123But
- Most manufacturing defects behave electrically as shorts or opens
- Marginality issues introduced by design tool approximations and process variations are on the rise with device scaling
- Stuck-at fault model is inadequate for both cases
- We need to rethink fault models!
- Adequately model failing behavior
- Simple enough for targeting test generation
124Also
- For realistic fault models
- Number of possible faults is extremely large
- Current ATPG techniques limit target size
- Implies need for fault extraction prior to fault modeling
- Enumerate all failure sites
- Prioritize failure sites as a ranked list (probability)
- Analysis at lower level of design abstraction
- Circuit (schematic). Example: cross talk analysis
- Physical (layout): Layout Analysis for Test
125Layout Databases: Assumptions
- All standard industry formats converted to standard hierarchical database format
- Rectilinear polygons converted to set of non-overlapping rectangles
- Non-Manhattan geometry approximated as rectilinear polygons
126Case Study 1: Defect Based Testing
Extraction of Random Bridge Defects
127Bridge Fault Extraction: Overview
- Identify potential bridge failure sites in a layout
- Useful for yield estimation, test generation and failure analysis
- Approaches
- Capacitance Extraction Based Approaches [Stroud, Emmert et al. 00]
- Inductive Fault Analysis (IFA) Based Approaches [Ferguson, Shen 88]
- Uses defect information from manufacturing sources
- Likelihood of occurrence modeled using Weighted Critical Area (WCA)
128Inductive Fault Analysis (IFA): Overview
129Bridge Faults Types
Multi-Node Bridge Faults: <n2,n3> 1.8, <n1,n2> 0.7, <n2,n3> 0.6, <n1,n2,n3> 0.4
Two-Node Bridge Faults: <n2,n3> 2.2, <n1,n2> 1.1, <n2,n3> 1.0
- Why Multi-Node Bridge Analysis?
- Accuracy of extracted bridge list
- Impact on test quality and yield estimation
130IFA Based Approaches
- CARAFE [Jee, Ferguson 92]
- CREST [Nag, Maly 95]
- LOBS [Gonçalves, Teixeira, Teixeira 96, 97]
- Eiffel [Chakravarty, Zachariah 00]
- FedEx [Walker, Stanojevic 01]
131IFA Based Approaches: CARAFE
- Straightforward implementation of the WCA definition
- For each layer L (or layer pair)
- For each defect size S
- Expand each feature by the defect size S
- Determine critical area rectangles (CARs) as the intersection area of the expanded rectangles
- Annotate CARs with net name pair and collect them into a global list
- Find union of CARs by selectively merging the rectangles from the global list
- Repeat computations for each given defect size
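The expand-and-intersect step for one layer and one defect size can be sketched as follows. This is a naive pairwise illustration and not the CARAFE implementation. Features are assumed to be rectangles tagged with a net name, and the expansion margin is left as a parameter because whether it equals the defect size or half of it depends on the defect model.

```python
from itertools import combinations

def expand(rect, margin):
    """Grow an (x1, y1, x2, y2) rectangle by `margin` on every side."""
    x1, y1, x2, y2 = rect
    return (x1 - margin, y1 - margin, x2 + margin, y2 + margin)

def intersect(a, b):
    """Intersection rectangle of a and b, or None if they do not overlap."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    return (x1, y1, x2, y2) if x1 < x2 and y1 < y2 else None

def critical_area_rects(features, margin):
    """features: list of (net, rect). Returns [((net_a, net_b), CAR), ...]."""
    cars = []
    for (na, ra), (nb, rb) in combinations(features, 2):
        if na == nb:
            continue                      # same net: not a bridge
        car = intersect(expand(ra, margin), expand(rb, margin))
        if car is not None:
            cars.append((tuple(sorted((na, nb))), car))
    return cars
```

The all-pairs loop is quadratic in the number of features and has to be rerun for every defect size, which hints at why this style of computation is limited to small layouts.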
132IFA Based Approaches: CARAFE
- Sources of inefficiency
- Linear increase in run time with the number of defect sizes processed
- Sub-optimal line sweeping rectangle intersection algorithm
- Overhead due to global processing of CARs
- Limits use to very small layouts
133IFA Based Approaches: CREST
- Uses layout hierarchy (no flattening)
- WCA computations performed one instance at a time
- bottom up approach
- Through-the-cell routing and net name propagation issues
- Accuracy issues with generated fault list (WCA values, ranking)
134IFA Based Approaches: LOBS
- Uses sliding window algorithm for computing CARs based on maximum defect size
- Algorithm for determining union of CARs based on the cube generation of the intersections
- When two CARs A and B overlap,
- CA computed as Area(A) + Area(B) - Area(A intersection B)
- Potential explosion in number of computations if number of overlapping CARs is large
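The two-rectangle formula and its generalization can be written as a short inclusion-exclusion sketch. This only illustrates the arithmetic and is not the LOBS cube-generation algorithm; rectangles are assumed to be (x1, y1, x2, y2) tuples and the function names are illustrative.

```python
from itertools import combinations

def area(r):
    """Area of (x1, y1, x2, y2); empty or inverted rectangles count as 0."""
    return max(0, r[2] - r[0]) * max(0, r[3] - r[1])

def overlap(a, b):
    """Intersection of two rectangles (may be empty, i.e. have zero area)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))

def union_area_two(a, b):
    """Area(A) + Area(B) - Area(A intersection B)."""
    return area(a) + area(b) - area(overlap(a, b))

def union_area(rects):
    """General inclusion-exclusion over every non-empty subset of rects."""
    total = 0
    for k in range(1, len(rects) + 1):
        sign = 1 if k % 2 else -1
        for combo in combinations(rects, k):
            inter = combo[0]
            for r in combo[1:]:
                inter = overlap(inter, r)
            total += sign * area(inter)
    return total
```

For n mutually overlapping CARs the general form visits 2^n - 1 subsets, which is the explosion referred to in the last bullet.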
135IFA Based Approaches: Eiffel
- Process multiple defect sizes
- Results deduced for all defect sizes from the calculations for maximum defect size
- Interval tree based algorithm to determine rectangle intersections
- Novel algorithm for finding the union of rectangles constituting the critical area for a bridge
- Resulting algorithm is
- Able to process large number of defect sizes
- Able to handle larger layout databases
136 Algorithm Outline
- For each layer L (or layer pair)
- Step 1: Determine CARs for the maximum defect size
- Expand each feature by the maximum defect size Smax
- Determine max_CARs as the intersection area of the expanded rectangles
- Efficient computation using interval trees
- Annotate CARs with net name pair and collect them into buckets, with each net pair having its own bucket
137 Algorithm Outline
- Step 2: Process each bucket of max_CARs
- For each net name pair <N1,N2> (bridge)
- For each defect size S
- Shrink max_CARs by (Smax-S) to obtain CARs for the size S
- Merge CARs to obtain CA(N1,N2,S,L)
- (Efficient merging using novel algorithm)
- Weight CA(N1,N2,S,L) with pL(S) and update WCA<N1,N2>
- (Bridges and their associated WCA maintained in balanced AVL tree for efficiency of update process)
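Step 2 for a single bucket can be sketched as below. The union is computed with a plain coordinate-compression grid, which is a stand-in for the novel merging algorithm mentioned above and not a reproduction of it. Shrinking every side by (Smax - S) is an assumption about the expansion convention, and the probability table is a hypothetical input.

```python
def shrink(rect, delta):
    """Move every side of (x1, y1, x2, y2) inward by delta; None if it vanishes."""
    x1, y1, x2, y2 = rect
    r = (x1 + delta, y1 + delta, x2 - delta, y2 - delta)
    return r if r[0] < r[2] and r[1] < r[3] else None

def union_area(rects):
    """Area of the union of rectangles via a coordinate-compressed grid."""
    xs = sorted({x for r in rects for x in (r[0], r[2])})
    ys = sorted({y for r in rects for y in (r[1], r[3])})
    total = 0
    for xa, xb in zip(xs, xs[1:]):
        for ya, yb in zip(ys, ys[1:]):
            # A grid cell counts once if any rectangle fully covers it.
            if any(r[0] <= xa and xb <= r[2] and r[1] <= ya and yb <= r[3]
                   for r in rects):
                total += (xb - xa) * (yb - ya)
    return total

def weighted_critical_area(max_cars, s_max, defect_prob):
    """WCA of one net pair on one layer.

    max_cars    -- CARs computed once for the maximum defect size s_max
    defect_prob -- {defect size S: weight p_L(S)} (hypothetical table)
    """
    wca = 0.0
    for s, p in defect_prob.items():
        cars = [c for c in (shrink(r, s_max - s) for r in max_cars)
                if c is not None]
        wca += p * union_area(cars)       # CA(N1,N2,S,L) weighted by p_L(S)
    return wca
```

The point of the structure is that rectangle intersection is done once, for Smax, and every smaller defect size is derived from it by shrinking.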
138Experimental Results
300X improvement
139Experimental Results
140IFA Based Approaches: FedEx
- Algorithm targeted for fast results
- Capable of handling large VLSI layout databases
- Accuracy traded to achieve speed
141Multi-Node Bridges
- Computation more challenging than two-node analysis
- Eiffel Algorithm
- Compute two-node critical area rectangles
- Performed only for the maximum defect size
- Efficient interval tree based solution
- Resulting critical areas collected into a global list
- Critical area rectangles for all defect sizes deduced from critical areas corresponding to the maximum defect size
142Multi-Node Bridges (Continued)
- Compute multi-node WCA value increments from critical area rectangles
- Novel line sweep based solution
143Experimental Resul