An Efficient Surface-Based Low-Power Buffer Insertion Algorithm

Learn more at: http://archive.sigda.org

1
An Efficient Surface-Based Low-Power Buffer
Insertion Algorithm
  • Rajeev R. Rao, David Blaauw, Dennis Sylvester,
    Charles Alpert*, Sani Nassif*
  • Department of EECS, University of Michigan, Ann
    Arbor, MI
  • *IBM Austin Research Laboratory, Austin, TX
  • {rrrao, blaauw, dennis}@eecs.umich.edu, {alpert,
    nassif}@us.ibm.com

2
Interconnect Trends
  • Interconnect power a major issue
  • Huge power consumption in both global and local
    signal nets
  • Repeater counts increasing drastically
  • IBM: 50% of leakage in inverters/buffers
  • Assuming continuation of current design styles,
    dramatic projections for the 32nm technology node
  • 70% of cell count is repeaters
  • 65-80% of dynamic power due to interconnects
  • Leakage increasing exponentially
  • Requirement: optimal repeater usage with the
    objective of total power minimization

3
Outline
  • Introduction
  • Delay and Buffer models
  • Previous Work
  • Proposed Algorithm
  • Library characterization
  • Generation of different types of candidates
  • Merging, Propagation, Snapping
  • Results
  • Conclusion

4
Introduction
  • Wire RC delay is a quadratic function of wire
    length
  • Segmenting wires decreases delay
  • Same idea applicable for interconnect tree
    structures
  • Buffers inserted for delay management
  • Additional benefit: buffers/inverters decouple
    large output loads

[Figure: driver, repeater, receiver. A wire of length 2 is split by the repeater into two segments of length 1; wire delay ∝ (1)² + (1)² = 2.]
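
To make the quadratic argument concrete, here is a minimal sketch comparing an unsegmented wire with the same wire cut into equal pieces. It assumes unit resistance and capacitance per length and ideal repeaters (zero delay, zero input capacitance); the function name and parameters are illustrative and do not come from the presentation.

```python
def segmented_wire_delay(length, segments, r=1.0, c=1.0):
    """Sum of per-segment delays, each proportional to r * c * (segment length)^2.

    Assumption: ideal repeaters that add no delay and no load of their own.
    """
    seg_len = length / segments
    return segments * r * c * seg_len ** 2


print(segmented_wire_delay(2.0, 1))  # unsegmented wire of length 2 -> 4.0
print(segmented_wire_delay(2.0, 2))  # one repeater in the middle   -> 2.0
```

With real repeaters the gain is smaller, because each repeater contributes its own delay and input capacitance.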
5
Elmore Delay model
  • Represent interconnect tree with a lumped RC
    model
  • Assume binary tree topology is fixed with an
    initial Steiner tree estimation
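  • n vertices (branch points) and (n-1) edges (i.e.,
    wires)
  • For a wire e connecting vertices (u, v), with
    lumped resistance R_e and capacitance C_e, the
    Elmore delay is D(e) = R_e (C_e/2 + C(T(v)))
  • where T(v) is the maximal subtree rooted at v
    that does not contain buffers, and C(T(v)) is its
    total capacitance
  • The total delay from a vertex v to a sink node si
    is the sum of D(e) over the wires e on the path
    from v to si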
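
As an illustration, the following sketch evaluates the standard Elmore wire delay R_e (C_e/2 + C(T(v))) on a small unbuffered tree, so that T(v) is simply the whole subtree below v. The `Node` class, field names, and numbers are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    wire_r: float = 0.0   # resistance of the wire from the parent to this node
    wire_c: float = 0.0   # capacitance of that wire
    sink_c: float = 0.0   # pin capacitance if this node is a sink
    children: list = field(default_factory=list)


def subtree_cap(v):
    """C(T(v)): sink load at v plus all wires and loads downstream of v."""
    return v.sink_c + sum(ch.wire_c + subtree_cap(ch) for ch in v.children)


def wire_delay(v):
    """Elmore delay of the wire entering v: R_e * (C_e / 2 + C(T(v)))."""
    return v.wire_r * (v.wire_c / 2.0 + subtree_cap(v))


def sink_delays(v, acc=0.0, out=None):
    """Accumulate wire delays along every path from v down to a sink."""
    out = {} if out is None else out
    if not v.children:
        out[v.name] = acc
    for ch in v.children:
        sink_delays(ch, acc + wire_delay(ch), out)
    return out


s1 = Node("s1", wire_r=1.0, wire_c=2.0, sink_c=1.0)
s2 = Node("s2", wire_r=2.0, wire_c=2.0, sink_c=3.0)
u = Node("u", wire_r=1.0, wire_c=4.0, children=[s1, s2])
root = Node("so", children=[u])
print(sink_delays(root))  # {'s1': 12.0, 's2': 18.0}
```

The recursion recomputes subtree capacitances for clarity; a production version would cache them in a single post-order pass.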

6
Buffer model
  • Linear gate delay model used for the buffers
  • Assumption: delay is a linear function of output
    capacitance
  • Isolation property: buffer devices decouple
    downstream output loads from the parent trees
  • Assumption: Miller effect (bootstrapping) due
    to Cgd is negligible

Dbuffer = Dintrinsic-delay + Rintrinsic-resistance × Coutput-load
[Figure: buffer at vertex v driving the load Cload]
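
A small sketch of the linear buffer model and the isolation property follows. The `Buffer` class and its field names are hypothetical, and the numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    d_int: float  # intrinsic delay
    r_int: float  # intrinsic (output) resistance
    c_in: float   # input capacitance presented to the upstream tree


def buffer_delay(buf, c_load):
    """Linear gate delay: intrinsic delay plus resistance times output load."""
    return buf.d_int + buf.r_int * c_load


def upstream_load(buf, c_load):
    """Isolation property: the parent tree sees only the buffer input capacitance."""
    return buf.c_in


buf = Buffer(d_int=5.0, r_int=2.0, c_in=1.0)
print(buffer_delay(buf, 10.0))                               # 25.0
print(upstream_load(buf, 10.0), upstream_load(buf, 100.0))   # 1.0 1.0
```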
7
Buffer Insertion Problem
BufLib = {b1, b2, b3}
  • Timing Metrics
  • Required Arrival Time (RAT)
  • Each sink is assigned a given RAT(si) value and
    the source is fixed at RAT(so) = 0
  • Delay minimization ⇒ maximize slack at source
    q(so)
  • Subtree Delay (SD)
  • SD(si) = RATmax(si) − RAT(si)
  • Delay minimization ⇒ minimize SD(so)
  • Advantage: unlike RAT, equations using SD are
    additive
  • Our approach
  • Tradeoff surfaces in 3D space of delay,
    capacitance and power
  • Continuously-sized buffer libraries
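
The relation between the two timing metrics can be checked numerically. The sketch below assumes RATmax is the largest RAT over all sinks (an interpretation of the slide's notation) and uses made-up RAT and path-delay values.

```python
def subtree_delays(rat):
    """SD(si) = RATmax - RAT(si), with RATmax taken over all sinks (assumption)."""
    rat_max = max(rat.values())
    return {sink: rat_max - value for sink, value in rat.items()}


rat = {"s1": 100.0, "s2": 80.0}          # required arrival times at the sinks
path_delay = {"s1": 30.0, "s2": 25.0}    # source-to-sink delays

sd = subtree_delays(rat)
# SD accumulates by addition and is combined with max on the way to the source.
sd_source = max(sd[s] + path_delay[s] for s in rat)
# The RAT view subtracts delays and is combined with min.
slack_source = min(rat[s] - path_delay[s] for s in rat)

print(sd)            # {'s1': 0.0, 's2': 20.0}
print(sd_source)     # 45.0
print(slack_source)  # 55.0, which equals RATmax - sd_source
```

Minimizing SD at the source and maximizing slack at the source therefore select the same solution.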

8
Outline
  • Introduction
  • Delay and Buffer models
  • Previous Work
  • Proposed Algorithm
  • Library characterization
  • Generation of different types of candidates
  • Merging, Propagation, Snapping
  • Results
  • Conclusion

9
Previous Work
  • L. P. P. P. van Ginneken (VG), ISCAS '90
  • Two-phase dynamic programming algorithm
  • Backward traversal up the interconnect tree to
    compute load and delay values
  • Forward solution pass to reconstruct best
    candidate

Function BOTTOM_UP(v)
1. If v ∈ sink, return (Cv, SDv)
   Else
2.   /* compute options for subtrees */
3.   BOTTOM_UP(left(v))
4.   BOTTOM_UP(right(v))
5.   Join pairs of subtrees by a merge operation
6.   Find best cnd among merged cnds to add a buffer
7.   Add parent wire to both types of cnds
8.   Prune inferior cnds from set of cnds
9.   Store cnd list for node v and return

  • Post-order DFS traversal
  • Merge operation: Cparent = Cleft + Cright,
    SDparent = max(SDleft, SDright)
  • Buffer candidate creation
  • Pruning provably inferior candidates
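
The merge step can be sketched as a two-pointer walk over the children's candidate lists. This is a minimal illustration, assuming each list holds non-dominated (load, subtree delay) pairs sorted by increasing load, so that delay decreases along the list; the function name is illustrative.

```python
def merge(left, right):
    """Merge two candidate lists: C = Cleft + Cright, SD = max(SDleft, SDright).

    Both inputs must be sorted by increasing load (and thus decreasing delay).
    """
    i = j = 0
    merged = []
    while i < len(left) and j < len(right):
        (cl, sl), (cr, sr) = left[i], right[j]
        merged.append((cl + cr, max(sl, sr)))
        # Only the branch with the larger delay can lower max(sl, sr),
        # so that is the pointer worth advancing.
        if sl >= sr:
            i += 1
        if sr >= sl:
            j += 1
    return merged


left = [(1, 10), (3, 6)]
right = [(2, 8), (5, 4)]
print(merge(left, right))  # [(3, 10), (5, 8), (8, 6)]
```

The walk produces at most len(left) + len(right) - 1 candidates instead of the full cross product.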
10
VG Algorithm
  • Candidate format: 2-tuple (Load, Subtree Delay)
    = (c, s)
  • Recursive formulas for two possible cases
  • Pruning criteria: (c1, s1) is better than (c2, s2)
    if both load and subtree delay values are lower,
    i.e., c1 < c2 and s1 < s2
  • Merge operation: linear
  • Complexity: O(n²), where n = number of buffer
    locations
  • Additional objective: minimize buffer count ⇒
    complexity is non-polynomial

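Case 1: only a wire is added at root of subtree
  c1 = c0 + cwire
  s1 = s0 + dwire
Case 2: a buffer and a wire are added at root of subtree
  c1 = cbuf + cwire (Isolation Property)
  s1 = s0 + dint + rbuf·c0 + dwire
[Figure: in both cases the candidate (c0, s0) at the subtree root becomes (c1, s1) at the parent]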
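
The two recursive cases and the pruning rule can be written out as below. This is a sketch under stated assumptions: the wire delay uses the Elmore form r_wire · (c_wire/2 + c0), a single buffer type is considered, and pruning also drops ties, which is slightly stricter than the slide's criterion. All function and parameter names are illustrative.

```python
def add_wire(cands, c_wire, r_wire):
    """Case 1: c1 = c0 + cwire, s1 = s0 + dwire (dwire assumed Elmore)."""
    return [(c0 + c_wire, s0 + r_wire * (c_wire / 2.0 + c0)) for c0, s0 in cands]


def add_buffer(cands, c_buf, d_int, r_buf):
    """Buffer part of case 2: load becomes cbuf, delay s0 + dint + rbuf * c0.

    Every buffered candidate has the same load, so only the fastest survives.
    """
    best = min(s0 + d_int + r_buf * c0 for c0, s0 in cands)
    return (c_buf, best)


def prune(cands):
    """Keep candidates not matched or beaten in both load and delay by another."""
    kept = []
    for c, s in sorted(cands):
        if not kept or s < kept[-1][1]:
            kept.append((c, s))
    return kept


cands = [(3, 10.0), (5, 8.0), (8, 6.0)]
buffered = add_buffer(cands, c_buf=1, d_int=2.0, r_buf=1.0)
print(buffered)  # (1, 15.0)
at_parent = prune(add_wire(cands + [buffered], c_wire=2, r_wire=1.0))
print(at_parent)  # [(3, 17.0), (5, 14.0)]
```

Applying `add_wire` to the buffered candidate reproduces case 2 in full: a buffer followed by the parent wire.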
11
Previous Work
  • Extensions to VG by Lillis et al., ICCAD '95,
    JSSC '96
  • A buffer library B can be used during buffer
    insertion ⇒ complexity O(n²B²)
  • Simultaneous wire sizing and buffer insertion
  • Incorporate signal slew into buffer delay model
  • Dynamic power minimization subject to timing
    constraints
  • Candidate format: 3-tuple (Load, Subtree Delay,
    Power) = (c, s, p)
  • Equate power with effective total capacitance
  • Assumption: all capacitive values can be linearly
    mapped onto a polynomially-bounded integer domain
    (cmax = max cap value)
  • Sophisticated pruning mechanism using orthogonal
    range query
  • Complexity: O(n³ B cmax² log(n cmax)) based on the
    assumption
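
To show what pruning means once power joins the candidate tuple, here is a deliberately simple dominance filter. It is a quadratic scan written for clarity, not the orthogonal-range-query structure used by Lillis et al.; the function names are illustrative.

```python
def dominates(a, b):
    """a dominates b if it is no worse in load, delay and power, and differs from b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def prune3(cands):
    """Drop duplicates, then drop every (c, s, p) candidate dominated by another."""
    unique = list(dict.fromkeys(cands))
    return [b for b in unique if not any(dominates(a, b) for a in unique)]


cands = [(2, 10, 5), (3, 9, 4), (3, 11, 6), (2, 10, 5)]
print(prune3(cands))  # [(2, 10, 5), (3, 9, 4)]
```

With three coordinates fewer candidates dominate one another, so the surviving lists grow faster than in the 2-tuple case.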

12
Previous Work
  • Several approaches presented in literature to
    target power minimization in conjunction with
    buffer insertion. Examples:
  • Quadratic programming: Chu et al., TCAD '99
  • Lagrangian relaxation: C.-P. Chen et al., TCAD '99
  • ClockTune: J.-L. Tsai et al., TCAD '04
  • Associate total power with effective capacitive
    area of wires and devices
  • Area minimization ⇒ power minimization
  • Ignores the contribution of static leakage power
  • Inclusion of this component results in
    non-polynomial complexity
  • Addition of extra components in candidates
    generally leads to exponential complexity for
    dynamic programming

13
Contributions of this paper
  • Novel continuous buffer insertion algorithm
    with total power minimization
  • Inclusive of both dynamic and leakage power
  • Generate tradeoff surfaces in the 3D DCP (Delay,
    Capacitance, Power) space
  • User is able to pick any desired point on this 3D
    surface
  • Easy to explore trade-offs between the 3
    variables
  • Ability to handle arbitrarily large buffer
    libraries
  • Continuously sized cell libraries with numerous
    buffer sizes
  • Capable of snapping to discrete buffer sizes if
    necessary
  • Worst-case polynomial complexity: O(n²)
  • Similar to basic VG algorithm

14
Outline
  • Introduction
  • Delay and Buffer models
  • Previous Work
  • Proposed Algorithm
  • Library characterization
  • Generation of different types of candidates
  • Merging, Propagation, Snapping
  • Results
  • Conclusion

15
Library Characterization
  • Buffer library with a set of continuously sized
    buffers
  • Let S = sizing factor of the library. Express
    delay (db), capacitance (cb) and leakage (lb) in
    terms of S.
  • Determine the constants c0, c1, l0, l1, d0, d1
    through empirical fitting
  • Equations combine discrete buffer sizes to
    approximate the ideal of continuous buffer sizing

cb ∝ buffer area ⇒ cb = c0 + c1·S
lb ∝ device width ⇒ lb = l0 + l1·S
db: linear gate delay model ⇒ db = d0 + d1·(Cout/S)
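
The fitting step can be illustrated with an ordinary least-squares line fit in plain Python. The measurements below are synthetic values generated from known constants, not data from the presentation, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


sizes = [1.0, 2.0, 4.0, 8.0]          # sizing factors S of the discrete buffers
cap_meas = [2.5, 4.5, 8.5, 16.5]      # synthetic input capacitances
leak_meas = [0.3, 0.5, 0.9, 1.7]      # synthetic leakage values
c_out = 8.0                           # load used for the delay measurements
delay_meas = [9.0, 5.0, 3.0, 2.0]     # synthetic delays at that load

c0, c1 = fit_line(sizes, cap_meas)                         # cb = c0 + c1 * S
l0, l1 = fit_line(sizes, leak_meas)                        # lb = l0 + l1 * S
d0, d1 = fit_line([c_out / s for s in sizes], delay_meas)  # db = d0 + d1 * Cout / S
print(round(c0, 6), round(c1, 6), round(l0, 6), round(l1, 6), round(d0, 6), round(d1, 6))
```

The delay model is linear in Cout/S rather than in S, so that ratio is the regressor for the third fit.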
16
Generation of candidates
[Figure: interconnect with vertices o, u, v, t, wire segments lw1, lw2, lw3 and buffer positions b1 to b4]
  • Point Candidate
  • Candidate format: 3-tuple (Do, Co, Po)
  • Node has point candidate ⇔ there are no buffers
    in subtree rooted at that node
  • All sinks have point candidates
  • Write equations to determine candidate at u

17
Generation of candidates
[Figure: same interconnect, annotated with the point candidate (D0, C0, P0) and a buffer of variable size S]
  • Curve Candidate
  • Candidate format: {[Dumin, Dumax], (gi, ki),
    i ∈ [0, 2]}
  • Node has curve candidate ⇔ exactly one buffer in
    subtree rooted at node
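
One way to picture a curve candidate is to sweep the single buffer size S and record the resulting (D, C, P) triples. The sketch below is only an illustration: it samples the curve instead of storing fitted coefficients, ignores wire segments, and uses an assumed power model (leakage plus a dynamic term proportional to buffer capacitance, scaled by a hypothetical `k_dyn`).

```python
def curve_candidate(point, lib, sizes, k_dyn=1.0):
    """Trace (delay, load, power) as the size S of one inserted buffer varies.

    point: downstream point candidate (D0, C0, P0).
    lib:   fitted library constants c0, c1, l0, l1, d0, d1.
    """
    d_pt, c_pt, p_pt = point
    curve = []
    for s in sizes:
        cap = lib["c0"] + lib["c1"] * s                      # cb = c0 + c1 * S
        delay = d_pt + lib["d0"] + lib["d1"] * c_pt / s      # db = d0 + d1 * Cout / S
        power = p_pt + lib["l0"] + lib["l1"] * s + k_dyn * cap  # assumed power model
        curve.append((delay, cap, power))
    return curve


lib = {"c0": 0.0, "c1": 1.0, "l0": 0.0, "l1": 0.5, "d0": 1.0, "d1": 4.0}
for triple in curve_candidate((10.0, 2.0, 3.0), lib, [1.0, 2.0, 4.0]):
    print(triple)
```

Larger buffers shorten the delay but present more load upstream and cost more power, which is the trade-off the curve captures.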

18
Generation of candidates
[Figure: same interconnect, annotated with (Du, Cu, Pu); C-plane with discrete Cv]
For a given S, Cv is fixed; Dv and Pv vary based on Du
  • Surface Candidate
  • C-plane format: {Cv, [Dmin, Dmax], (ki), i ∈ [0, 2]}
  • Candidate format: vector<CPlane>

19
Generation of candidates
[Figure: same interconnect, annotated with (Du, Cu, Pu) and (Dv, Cv, Pv)]
  • Similar equations can be written to determine
    candidate at t
  • Ct depends only on S, but Dt, Pt depend on Cv,
    Dv, S
  • New set of C-planes.
  • ∀ C-plane, lower envelope ⇒ power-optimal
    solution
  • Surface candidate → surface candidate
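
A lower envelope over several power/delay curves in one C-plane can be sketched as a pointwise minimum. The representation below assumes each curve is a delay interval plus three polynomial coefficients for P as a function of D, which is one reading of the (ki) notation; the envelope is sampled on a grid rather than computed in closed form.

```python
def eval_curve(curve, d):
    """Power at delay d, or None if d lies outside the curve's delay range."""
    d_min, d_max, (k0, k1, k2) = curve
    if d_min <= d <= d_max:
        return k0 + k1 * d + k2 * d * d
    return None


def lower_envelope(curves, grid):
    """For each delay on the grid, keep the lowest power offered by any curve."""
    envelope = []
    for d in grid:
        values = [p for p in (eval_curve(c, d) for c in curves) if p is not None]
        if values:
            envelope.append((d, min(values)))
    return envelope


curve_a = (0.0, 4.0, (10.0, -2.0, 0.0))   # P = 10 - 2 D on [0, 4]
curve_b = (2.0, 6.0, (5.0, 0.0, 0.0))     # P = 5 on [2, 6]
grid = [float(d) for d in range(7)]
print(lower_envelope([curve_a, curve_b], grid))
```

Points above the envelope cost more power for the same delay and load, so discarding them is the pruning step.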

20
Design Choices
  • Wire network is a binary tree
  • Zero-length wires, dummy nodes
  • Ignore signal polarity on buffers
  • Pair of solution sets (similar to Lillis)
  • Number of surface candidates per node = 2
    (Buffered/Non-buffered)
  • Trade-off between more fine grained solutions and
    efficiency
  • No impact on optimality or complexity

21
Merging and Implicit Pruning
  • First, merge left and right candidates
  • Compare equal delay points by checking 4
    combinations of left and right candidates
  • Create P/C curves and extract the lower envelope
    ⇒ Pruning
  • Translate P/C curves with fixed D value into P/D
    curves with fixed C values ⇒ Creation of C-planes
    for 4 different surface candidates
  • Next, recombine these 4 surfaces into a single
    candidate
  • Map P/D curves from one C-plane to another using
    linear interpolation
  • ∀ (D, C) value, pick lowest power value ⇒ Pruning
  • Use composite surface to create the
    buffered/non-buffered candidate
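
The two building blocks of this slide, combining branches at equal delay and interpolating between C-planes, can be sketched on sampled points. This assumes that loads and powers of the two branches add at a merge point and that each branch offers a few discrete (C, P) choices at the chosen delay; both function names are illustrative.

```python
def merge_at_delay(left_pts, right_pts):
    """Combine (C, P) choices of two branches at one delay; keep the lower envelope.

    Loads and powers add; a combination survives only if it offers lower
    power than every combination with smaller or equal load.
    """
    combos = sorted((cl + cr, pl + pr) for cl, pl in left_pts for cr, pr in right_pts)
    envelope = []
    for c, p in combos:
        if not envelope or p < envelope[-1][1]:
            envelope.append((c, p))
    return envelope


def interpolate_power(c, plane_a, plane_b):
    """Linearly interpolate power at load c between two C-planes at the same delay."""
    (ca, pa), (cb, pb) = plane_a, plane_b
    t = (c - ca) / (cb - ca)
    return pa + t * (pb - pa)


left = [(1.0, 5.0), (2.0, 3.0)]
right = [(1.0, 4.0), (3.0, 1.0)]
print(merge_at_delay(left, right))   # [(2.0, 9.0), (3.0, 7.0), (4.0, 6.0), (5.0, 4.0)]
print(interpolate_power(3.0, (2.0, 9.0), (4.0, 6.0)))  # 7.5
```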

22
Reconstruction and Snapping
  • Pair of candidate solutions created for source
  • Any trade-off point in the DCP surface can be
    picked
  • Forward solution pass to reconstruct the tree
    structure with buffer locations
  • Snapping: if the required size is unavailable, then
    the buffer with the nearest size value is chosen
  • Problem: discrepancies in D, C, P values ⇒
    Solution: local refinements in the C-planes
  • Single pass through the RC tree
  • Complexity: O(n²), where n = number of possible
    buffer locations
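
Snapping and the discrepancy it introduces can be sketched as follows. The library sizes and model constants are made up, and the local refinement step mentioned above is not shown.

```python
def snap_size(s, available):
    """Return the available buffer size closest to the continuous size s."""
    return min(available, key=lambda a: abs(a - s))


def buffer_delay(s, c_out, d0, d1):
    """Library delay model: db = d0 + d1 * (Cout / S)."""
    return d0 + d1 * c_out / s


library = [1.0, 2.0, 4.0, 8.0, 16.0]   # hypothetical discrete sizes
ideal = 5.3                            # size picked on the continuous surface
snapped = snap_size(ideal, library)
error = buffer_delay(snapped, 8.0, 1.0, 1.0) - buffer_delay(ideal, 8.0, 1.0, 1.0)
print(snapped)       # 4.0
print(error > 0.0)   # True: the smaller snapped buffer is slower than planned
```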

23
Outline
  • Introduction
  • Delay and Buffer models
  • Previous Work
  • Proposed Algorithm
  • Library characterization
  • Generation of different types of candidates
  • Merging, Propagation, Snapping
  • Results
  • Conclusion

24
Results
  • Benchmarks: C-tree nets
  • TSMC 0.13 µm buffer library
  • Number of discrete buffer choices = 9
  • Multilinear fitting models using GNU Scientific
    Library
  • Example: 3D surface

25
Results: Snapping
26
Results: Comparison
  • Implementation of Lillis algorithm with leakage
    included
  • Pruning less effective

27
Conclusion
  • Buffer insertion algorithm with total power (Pdyn
    + Pstat) minimization as objective
  • Generate 3D surfaces in Delay, Capacitance and
    Power space
  • Ability to explore different types of trade-offs
  • Able to handle large buffer libraries with
    continuous sizes
  • Worst case polynomial complexity