Title: An O(n log n) Time Algorithm for Optimal Buffer Insertion
1 An O(n log n) Time Algorithm for Optimal Buffer Insertion
- Weiping Shi and Zhuo Li
- Department of Electrical Engineering
- Texas A&M University
2 Outline
- Introduction
- Problem formulation
- Review of van Ginneken
- New techniques
- Algorithm and analysis
- Simulation
- Conclusion
3 Introduction
- Buffer insertion and sizing is one of the most effective methods for reducing interconnect delay
- Fundamental algorithm
  - Van Ginneken (90): slack maximization in O(n^2) time and space, where n is the number of buffer positions
- Running time is polynomial but slow for
  - Large nets or multiple buffer types
  - Inner loop of simultaneous tree construction and buffer insertion
- Running time is not known to be polynomial for
  - Buffer cost minimization
  - Finding possible buffer positions
4 Other Related Work
- Extensions
  - Lillis, Cheng and Lin (96): O(B^2 n^2) time and space for B buffer types
  - Alpert and Devgan (97): wire segmenting
- Simultaneous tree construction and buffer insertion
  - Okamoto and Cong (96): buffered Steiner tree
  - Kang, Dai, Dillinger and LaPotin (97): delay bounded buffer tree
  - Zhou, Wong, Liu and Aziz (00): FastPath algorithm for 2-pin nets
  - Hassoun, Alpert and Thiagarajan (02): buffered routing path
5 Basic Buffer Insertion Problem
- Given: a routing tree, n possible buffer positions, sink capacitances and required arrival times (RAT), one buffer type, unit wire resistance and capacitance
[Figure: routing tree with source, sinks, and possible buffer positions; nodes labeled s0 to s4; one buffer type shown]
6 Basic Buffer Insertion Problem
- Find: buffer positions at which to insert buffers so that the slack at the source, Q(s0), is maximized
[Figure: routing tree with nodes s0 to s4]
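7 Linear Delay for Buffer
[Figure: buffer b between nodes u and v, with input capacitance C(b) at u and load capacitance Cv at v]
- Driver resistance R(b)
- Input capacitance C(b)
- Intrinsic buffer delay K(b)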
8 Elmore Delay for Wire
[Figure: wire of length L between nodes u and v, with unit capacitance C0, unit resistance R0, and load capacitance Cv at v]
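The two delay models named on the slides above can be written as small helper functions. This is a minimal sketch: the function names are hypothetical, and the formulas are inferred from the pre-buffer slack P = Q - K(b) - R(b)C and from the wire update for Q that appear later in this deck.

```python
def buffer_delay(K_b, R_b, C_load):
    """Linear buffer delay: intrinsic delay plus driver resistance times load."""
    return K_b + R_b * C_load


def wire_delay(L, R0, C0, C_load):
    """Elmore delay of a wire of length L driving a load C_load."""
    R = R0 * L   # total wire resistance
    C = C0 * L   # total wire capacitance
    return R * (C / 2 + C_load)


# Buffer with K(b)=0, R(b)=10 driving C=20 (assumed illustrative values).
print(buffer_delay(0, 10, 20))
# Length-10 wire with R0=C0=1 driving C=30.
print(wire_delay(10, 1, 1, 30))
```

Slack decreases by exactly these delays, which is why both the wire and buffer operations on later slides subtract them from Q.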
9 Review of van Ginneken
- Dynamic programming, bottom up
- Each candidate solution of a branch is represented by a (Q, C) pair, where Q is slack and C is capacitance
- For two candidates Ai and Aj of the same branch, if Q(Ai) < Q(Aj) and C(Ai) > C(Aj), then Ai is redundant
- For a routing tree with n buffer positions, there are at most n+1 nonredundant candidates
- Example
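The redundancy rule can be sketched as a single pass over candidates sorted by capacitance. The function name `prune` is hypothetical, and ties are treated as redundant here, which is an assumption beyond the strict inequalities on the slide.

```python
def prune(cands):
    """Return the nonredundant (Q, C) pairs in decreasing Q and C order."""
    kept = []
    # Scan by increasing C; on equal C, look at the higher Q first.
    for q, c in sorted(cands, key=lambda qc: (qc[1], -qc[0])):
        # Keep a candidate only if it beats the slack of every cheaper one.
        if not kept or q > kept[-1][0]:
            kept.append((q, c))
    return kept[::-1]


print(prune([(500, 30), (450, 35), (400, 20), (300, 15)]))
```

After pruning, Q and C decrease together along the list, which is the ordering the later slides rely on.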
10 Add a Wire
- For each candidate Ai, subtract the wire delay from Q(Ai) and add the wire capacitance to C(Ai)
- Example: assume R0 = C0 = 1 and a wire of length 10
  - New Q(A1) = 500 - (10^2/2 + 10*30) = 150
  - New C(A1) = 30 + 10 = 40
- Delete redundancy
Candidate    A1          A2          A3          A4
Before wire  (500, 30)   (400, 20)   (300, 15)   (250, 10)
After wire   (150, 40)   (150, 30)   (100, 25)   (100, 20)
- Time cost is O(n), where n is the number of buffer positions downstream
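The wire step can be checked against the example candidates with a short sketch. The names `add_wire` and `prune` are hypothetical, and treating equal-slack candidates with larger capacitance as redundant is an assumption.

```python
def add_wire(cands, L, R0=1.0, C0=1.0):
    """Apply a wire of length L to every (Q, C) candidate."""
    R, Cw = R0 * L, C0 * L   # total wire resistance and capacitance
    return [(q - R * (Cw / 2 + c), c + Cw) for q, c in cands]


def prune(cands):
    """Drop redundant candidates; result is in decreasing Q and C order."""
    kept = []
    for q, c in sorted(cands, key=lambda qc: (qc[1], -qc[0])):
        if not kept or q > kept[-1][0]:
            kept.append((q, c))
    return kept[::-1]


before = [(500, 30), (400, 20), (300, 15), (250, 10)]
after = add_wire(before, L=10)
print(after)          # slack and capacitance after the wire
print(prune(after))   # candidates that survive redundancy deletion
```

Both loops touch every candidate, which is the O(n) cost per wire that the later slides aim to avoid.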
11 Add a Buffer
- At each possible buffer position, create a new candidate with a buffer
- Example: assume K(b) = 0, R(b) = 10, C(b) = 10
- Value of Q if a buffer is added
  - 500 - 10*40 = 100
  - 400 - 10*20 = 200
  - 300 - 10*15 = 150
- New candidate: (200, 10)
- Time cost is O(n), where n is the number of buffer positions downstream
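The buffer step picks the downstream candidate with the best buffered slack and pairs it with the buffer's input capacitance. A minimal sketch, with a hypothetical function name and the example values from this slide:

```python
def add_buffer(cands, K_b, R_b, C_b):
    """Return the candidates plus the single new buffered candidate."""
    # Slack seen upstream of the buffer for each downstream candidate.
    best_q = max(q - K_b - R_b * c for q, c in cands)
    # The buffer hides the downstream load, so upstream sees only C(b).
    return cands + [(best_q, C_b)]


cands = [(500, 40), (400, 20), (300, 15)]
print(add_buffer(cands, K_b=0, R_b=10, C_b=10)[-1])
```

The enlarged list would normally go through redundancy deletion afterwards, since the new candidate can dominate existing ones.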
12 Merge Branches
- Merge candidates of two branches of the routing tree
- (Q1, C1) and (Q2, C2) → (min(Q1, Q2), C1 + C2)
- Example
  - One branch: (500, 20) (300, 10)
  - Other branch: (500, 30) (400, 20) (300, 15) (250, 10)
- Time cost is O(n1 + n2), where n1 and n2 are the number of buffer positions in the two branches
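A linear-time merge can be sketched as a two-pointer walk from the lowest slack upward, always advancing the branch that limits the merged slack. The function name is hypothetical, and both inputs are assumed to be already pruned and sorted in decreasing Q and C order.

```python
def merge(a, b):
    """Merge two pruned candidate lists given in decreasing Q and C order."""
    a, b = a[::-1], b[::-1]   # walk in increasing Q order
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        (qa, ca), (qb, cb) = a[i], b[j]
        out.append((min(qa, qb), ca + cb))
        # Only raising the smaller Q can raise the merged slack.
        if qa <= qb:
            i += 1
        if qb <= qa:
            j += 1
    return out[::-1]


one = [(500, 20), (300, 10)]
other = [(500, 30), (400, 20), (300, 15), (250, 10)]
print(merge(one, other))
```

Each iteration advances at least one pointer, so the loop runs at most n1 + n2 times.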
13 Analysis and Challenge
Operation          Van Ginneken's   Our goal
Wire               O(n)             O(log n)
Buffer             O(n)             O(log n)
Merge              O(n1 + n2)       O(n1 log(n2/n1 + 1))
Delete redundancy  included         O(log n) per deletion
Total              O(n^2)           O(n log n)
14 Idea 1: Candidate Tree
- Candidates are stored in a balanced search tree
  - Red-black tree, AVL tree
  - Decreasing Q and C order
  - Search, insertion and deletion can be done in O(log n) time
- Values of (Q, C) are implicitly stored by five fields: q, c, qa, ca and ra
  - When qa, ca and ra are all 0, Q = q and C = c
  - Q = q - qa - ra*c, C = c + ca
15 Example Candidate Tree
[Figure: example candidate tree; visible labels: (500, 20), (300, 10), s0, 10]
16 Compute (Q, C)
- Q = q - qa - ra*c, C = c + ca
- When each node is visited, values of Q and C are computed, and values of qa, ca, and ra are propagated down the tree
- A wire can now be processed in O(1) time. This lazy update saves a great amount of time
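The lazy update can be illustrated with a small hand-built binary tree. This is a sketch under stated assumptions, not the authors' implementation: it assumes the sign convention Q = q - qa - ra*c and C = c + ca, it assumes that qa, ca and ra at a node apply to that node's whole subtree, it omits rebalancing, and all names are hypothetical.

```python
class Node:
    def __init__(self, q, c, left=None, right=None):
        self.q, self.c = q, c
        self.qa = self.ca = self.ra = 0.0   # pending updates for this subtree
        self.left, self.right = left, right


def apply_pending(node, qa, ca, ra):
    """Compose a newer update (qa, ca, ra) onto the node's pending fields."""
    node.qa += qa + ra * node.ca   # must use the old node.ca
    node.ra += ra
    node.ca += ca


def add_wire(root, R, Cw):
    """O(1): record a wire of resistance R and capacitance Cw at the root."""
    apply_pending(root, R * Cw / 2, Cw, R)


def push_down(node):
    """Make node.q and node.c explicit; hand pending updates to the children."""
    for child in (node.left, node.right):
        if child is not None:
            apply_pending(child, node.qa, node.ca, node.ra)
    node.q -= node.qa + node.ra * node.c
    node.c += node.ca
    node.qa = node.ca = node.ra = 0.0


def candidates(node):
    """In-order walk yielding explicit (Q, C) pairs."""
    if node is None:
        return
    push_down(node)
    yield from candidates(node.left)
    yield (node.q, node.c)
    yield from candidates(node.right)


# Tree holding (500, 30), (400, 20), (300, 15), (250, 10) in decreasing order.
root = Node(400, 20, Node(500, 30), Node(300, 15, None, Node(250, 10)))
add_wire(root, R=10, Cw=10)     # touches only the root
print(list(candidates(root)))   # values become explicit when visited
```

The point of the design is that `add_wire` never walks the tree; the cost of applying the update is paid only for nodes that are visited later for other reasons.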
17 Idea 2: Pre-buffer Slack
- Pre-buffer slack of a candidate (Q, C) is the slack after a buffer is inserted: P = Q - K(b) - R(b)*C
- Example: K(b) = 0, R(b) = 10
Candidate  A1    A2    A3
Q          500   400   300
C          40    20    15
P          100   200   150
- If P(Ai) < P(Aj) and C(Ai) > C(Aj), then Ai is redundant
  - A buffer/driver will be added to every candidate eventually
  - If a buffer/driver is added now, Ai is worse than Aj
  - If a buffer/driver is added later, Ai will be even worse than Aj since C(Ai) > C(Aj)
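The (P, C) rule can be sketched in the same way as (Q, C) pruning, with P in place of Q. The function names are hypothetical, and ties are treated as redundant, which is an assumption.

```python
def pre_buffer_slack(q, c, K_b, R_b):
    """P = Q - K(b) - R(b)*C for one candidate."""
    return q - K_b - R_b * c


def prune_pc(cands, K_b, R_b):
    """Keep (Q, C) candidates that are nonredundant under (P, C).

    The result is returned in increasing C order.
    """
    kept = []   # entries are (P, Q, C), with P increasing
    for q, c in sorted(cands, key=lambda qc: (qc[1], -qc[0])):
        p = pre_buffer_slack(q, c, K_b, R_b)
        # Keep only if P beats every candidate with smaller capacitance.
        if not kept or p > kept[-1][0]:
            kept.append((p, q, c))
    return [(q, c) for _, q, c in kept]


cands = [(500, 40), (400, 20), (300, 15)]   # A1, A2, A3
print([pre_buffer_slack(q, c, 0, 10) for q, c in cands])
print(prune_pc(cands, K_b=0, R_b=10))
```

In this example A1 is nonredundant under (Q, C) but is dropped under (P, C), because A2 has both higher P and lower C.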
18 Pruning Based on (P, C)
- Using (P, C) to prune redundant candidates is much more efficient than using traditional (Q, C)
  - If a candidate is redundant under (P, C), it will be redundant under (Q, C) eventually
  - However, using (P, C) we can prune redundant candidates early and avoid generating more redundant candidates
- Using (P, C) alone can make van Ginneken's algorithm 7X faster!
- At each possible buffer position, we find the candidate with max P in O(1) time and combine it with the buffer
19 Idea 3: Redundancy Deletion
- A wire can make some candidates redundant
- When a wire of length L is added
  - Ci becomes Ci + L*C0, the order of Cs does not change
  - Qi becomes Qi - L^2*R0*C0/2 - L*R0*Ci, the order of Qs may change
- Example: A1 (500, 30), A2 (400, 20), A3 (300, 15), A4 (250, 10)
20 Expiration List
- For candidates A1, ..., An with Qi > Qi+1 and Ci > Ci+1, create an expiration list that stores Li = (Qi - Qi+1)/(Ci - Ci+1) in increasing order
- Example: L1 = 10, L2 = 20, L3 = 10
- A wire with resistance R is checked against min Li in the expiration list
  - If R ≥ min Li, then (Qi, Ci) is redundant. Delete it, update expiration list, and re-check
  - If R < min Li, no candidate is redundant
- Each redundant candidate can be deleted in O(log n) time
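The expiration check can be sketched with a binary heap and a doubly linked list over the candidates. This is a simplified illustration, not the authors' data structure: it works on a plain list rather than the candidate tree, it discards stale heap entries lazily, and the function name is hypothetical.

```python
import heapq


def prune_with_expiration(cands, R):
    """Return the candidates that survive a wire of resistance R.

    cands: (Q, C) pairs in strictly decreasing Q and C order.
    The returned pairs still hold their values from before the wire.
    """
    n = len(cands)
    prev = list(range(-1, n - 1))   # prev[i] == -1 means no predecessor
    nxt = list(range(1, n + 1))     # nxt[i] == n means no successor
    alive = [True] * n

    def expiry(i):
        (qi, ci), (qj, cj) = cands[i], cands[nxt[i]]
        return (qi - qj) / (ci - cj)

    heap = [(expiry(i), i, nxt[i]) for i in range(n - 1)]
    heapq.heapify(heap)
    while heap and heap[0][0] <= R:
        _, i, j = heapq.heappop(heap)
        if not alive[i] or nxt[i] != j:
            continue                # stale entry: a neighbor changed
        alive[i] = False            # candidate i is redundant after the wire
        p = prev[i]
        prev[j] = p
        if p >= 0:
            nxt[p] = j              # p and j are now adjacent
            heapq.heappush(heap, (expiry(p), p, j))
    return [cands[i] for i in range(n) if alive[i]]


cands = [(500, 30), (400, 20), (300, 15), (250, 10)]
print(prune_with_expiration(cands, R=10))
```

With R = 10 the two entries with Li = 10 expire, and the new neighbor pair has Li = 15, so the re-check stops. Each deletion costs a constant number of heap operations, in line with the O(log n) bound per deletion.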
21 Idea 4: Unbalanced Merge
- Similar to the O(n log n) algorithm for floorplan minimization
- Using field ca, we turn the candidate tree of one branch into the candidate tree of the merged branches
- Example: (500, 30+20) (400, 20+20) (300, 15+10) (250, 10+10)
- Two candidate trees with n1 and n2 candidates, where n1 ≤ n2, can be merged in time O(n1 log(n2/n1 + 1))
22 Algorithm
- Wire
  - In O(1) time, modify fields qa, ca and ra of the root of the candidate tree
  - In O(log n) time, delete each redundant candidate using the expiration list
- Buffer
  - In O(1) time, find the candidate that gives the max P for the buffer
  - Form a new candidate and in O(log n) time insert it into the candidate tree
  - In O(log n) time, delete each redundant candidate
- Merge
  - In O(n1 log(n2/n1 + 1)) time, merge two branches
23 Time and Space Cost
- Time cost (except deletion) of each step
  - Ta(n) ≤ c log n + Ta(n-1) for wire or buffer
  - Ta(n) ≤ c n1 log(n2/n1 + 1) + Ta(n1) + Ta(n2) for merge
  - where c is a fixed constant, n is the number of buffer positions in the sub-trees, n1 ≤ n2 and n = n1 + n2
- Solve the recurrence relation: Ta(n) = O(n log n)
- Since there are at most n deletions, and each deletion takes O(log n) time, total deletion time Td(n) = O(n log n)
- Total time cost T(n) = Ta(n) + Td(n) = O(n log n)
- Space cost is also O(n log n)
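For the wire and buffer steps alone, the recurrence unrolls directly. The following sketch covers only that case and assumes a constant base cost T_a(0); the merge case needs a separate induction on n1 and n2.

```latex
\begin{align*}
T_a(n) &\le c\log n + T_a(n-1) \\
       &\le c\sum_{k=1}^{n}\log k + T_a(0) \\
       &= c\log(n!) + T_a(0) \\
       &\le c\,n\log n + T_a(0) = O(n\log n).
\end{align*}
```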
24 Simulation (CPU Time)
25 Simulation (Memory)
26 Multiple Buffer Types
- For each buffer type bi, create a candidate tree Ti that stores (P, C), where P is the pre-buffer slack for buffer type bi
- For a wire, update every candidate tree Ti
- For a buffer position, add a buffer of type bj to every candidate tree Ti
- Merge is performed for the same type of candidate trees
- Time complexity is O(B^2 n log n)
27 Conclusion
- An innovative algorithm that finds optimal buffer insertion in time and space O(n log n)
- For industrial test cases, the new algorithm is 2 to 50 times faster and uses 1/2 to 1/100 of the memory of van Ginneken's O(n^2) time and space algorithm
- Since many algorithms for buffer insertion and sizing are based on van Ginneken's algorithm, our algorithm automatically improves these algorithms
- New concepts and techniques, such as the candidate tree, (P, C) pruning, the expiration list and the fast merging method, can be applied to other buffer insertion problems