Title: Full-chip Interconnect Power Estimation and Simulation Considering Concurrent Repeater and Flip-flop Insertion
1Full-chip Interconnect Power Estimation and
Simulation Considering Concurrent Repeater and
Flip-flop Insertion
- Weiping Liao and Lei He
- Wliao,lhe_at_ee.ucla.edu
- University of California at Los Angeles
Partially supported by HP, Intel, SRC and NSF
2Motivation
- Interconnect delay may exceed one clock cycle due to increased design scale and increased clock rate
- Existing research at the microarchitecture level does not explicitly consider interconnect power and interconnect pipelining
  - Superscalar: Wattch [Brooks et al., ISCA 2000], SimplePower [Ye et al., DAC 2000]
  - VLIW: PowerImpact [Liao et al., ICCAD 2002]
Focus of this work: microarchitecture-level interconnect prediction with layout impact
3Outline
- Net-based repeater and flip-flop (FF) insertion
- Full-chip interconnect power estimation
- Microarchitecture-level interconnect power estimation and simulation
- Performance impact of interconnect pipeline
- Conclusions
4Net-based Problem Formulation
- Given: a two-pin net with driver, load, and clock period
- Goal: insert repeaters and FFs such that the interconnect delay is smaller than the clock period
  - Min-FF solution: minimize the number of FFs
  - Min-power solution: minimize power
5Delay and Power Model
- Elmore delay model
- Power model: both dynamic and leakage power
  - α: dynamic switching factor
  - S: total repeater size
  - Cb: capacitance of a minimum-sized device
  - Cw: interconnect capacitance per unit length
  - SF: total size of FFs
  - Ioff: leakage current of a minimum-sized device
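The power formula itself is not preserved on this slide. As a rough sketch only, the symbols listed above could combine in the common "dynamic plus leakage" form shown below. The function name, the clock frequency `f_clk`, the supply voltage `vdd`, and the wire length `length` are assumptions introduced for illustration, and FF clocking power is ignored.

```python
def interconnect_power(alpha, f_clk, vdd, s_rep, s_ff, c_b, c_w, length, i_off):
    """Hypothetical power model for one buffered wire (not from the slides).

    Assumes:
      - switched capacitance = device capacitance of repeaters and FFs
        (sizes in multiples of the minimum device) + wire capacitance
      - leakage current scales with total device size
    Returns (dynamic_power, leakage_power).
    """
    c_total = (s_rep + s_ff) * c_b + c_w * length
    dynamic = alpha * f_clk * vdd ** 2 * c_total
    leakage = vdd * i_off * (s_rep + s_ff)
    return dynamic, leakage
```

Under these assumptions, larger repeaters and additional FFs raise both terms, which is why the min-power formulation trades device size against the delay target.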
6Parameters based on ITRS 100nm
[Sylvester et al., ICCAD 1998]
7Repeater Insertion Models
[Figure labels: driver, load]
- Single model
  - Number of repeaters
  - Size of repeaters
- Cascaded model
  - Number of repeaters
  - Design of cascaded inverters
    - First inverter size
    - Uniform stage ratio
    - Stage number
- Hybrid model
  - Design of first cascaded inverters
  - Design of the remaining repeaters as in the single model
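To make the single model concrete, here is a minimal sketch of the Elmore delay of a wire split into equal segments, each driven by an identical repeater. The parameter names are hypothetical, the driver and load are assumed to look like repeaters, and diffusion capacitance is ignored.

```python
def elmore_delay_single(length, k, h, r_w, c_w, r_d, c_g):
    """Elmore delay of a wire with k equal segments and repeaters of size h.

    r_w, c_w: wire resistance / capacitance per unit length
    r_d, c_g: output resistance / input capacitance of a minimum-sized device
    Simplifying assumptions (not from the slides): every stage drives an
    identical repeater, and device diffusion capacitance is neglected.
    """
    seg = length / k
    r_seg, c_seg = r_w * seg, c_w * seg
    c_load = h * c_g                      # input cap of the next repeater
    # Driver resistance sees the whole segment; the distributed wire
    # resistance sees half of its own capacitance plus the load.
    per_stage = (r_d / h) * (c_seg + c_load) + r_seg * (c_seg / 2 + c_load)
    return k * per_stage
```

With ideal devices (`r_d = 0`, `c_g = 0`) the wire term is `r_w * c_w * length**2 / (2 * k)`, so doubling the number of segments halves the wire delay; real repeaters add their own delay, which is what bounds the useful number and size of repeaters.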
8Concurrent Repeater and FF insertion
- Single model
  - Min-FF: analytical solution extended from [Lu et al., DATE 2002]
  - Min-power: enumerate the number of FFs for lowest power
- Cascaded and hybrid models
  - No existing analytical solution
  - Empirical formula or tables based on enumeration
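The min-power enumeration can be sketched as a simple search over FF counts. The helper `power_for` is a hypothetical callback that returns the lowest power achievable with a given number of FFs, or `None` when the delay target cannot be met; `toy_power` is an invented trend used only to exercise the loop, not data from this work.

```python
def min_power_ff_count(power_for, n_min, n_max):
    """Enumerate FF counts in [n_min, n_max] and keep the lowest-power one.

    power_for(n) -> power with n FFs, or None if n FFs cannot meet timing.
    Returns (n, power), or None if no count is feasible.
    """
    best = None
    for n in range(n_min, n_max + 1):
        p = power_for(n)
        if p is None:
            continue
        if best is None or p < best[1]:   # strict '<' keeps the fewest FFs on ties
            best = (n, p)
    return best


def toy_power(n_ff):
    # Hypothetical trend only: repeater power shrinks as segments get
    # shorter, while FF power grows linearly with the FF count.
    if n_ff < 1:                          # assume at least one FF is needed
        return None
    return 12.0 / (n_ff + 1) + 1.0 * n_ff


best = min_power_ff_count(toy_power, 0, 6)
```

Breaking ties toward fewer FFs is a design choice in this sketch, since every extra FF adds a cycle of latency.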
9Power Consumption for Wires under Min-FF solution
- Delay target: 80% of the clock period
- For the same wire length and delay target
  - With the same number of FFs, the hybrid model has the lowest power
  - With different numbers of FFs, no model always achieves the lowest power
10Power vs. Wire Length
- Power with only repeater insertion increases superlinearly with respect to the wire length
- By inserting an FF every 1 mm, we can achieve a virtually linear relation between power and wire length
11Power versus Wire Length
- Clock rate: 3 GHz
- Min-power solution reduces power by 50% for a wire length of 7 mm
12Outline
- Net-based repeater and flip-flop (FF) insertion
- Full-chip interconnect power estimation
- Microarchitecture-level interconnect power estimation and simulation
- Performance impact of interconnect pipeline
- Conclusions
13Layer Assignment
[Figure labels: two global layers; intermediate layers]
- Two global layers and 50% occupation rate
- Minimum wire length in the global layers satisfies [equation not preserved]
  - i(l): interconnect density function
  - Lmax: maximum interconnect length
14Intermediate Layers
- Initially assume one pair of intermediate layers
- Calculate the minimum wire length for intermediate interconnects according to [equation not preserved]
- If this length is greater than Lbuf (maximum wire length without repeater insertion)
  - Increment the number of pairs of intermediate layers
15Full-Chip Interconnect Power Estimation with
Random Interconnects
- Decide layer assignment with i(l) based on the stochastic wire length distribution [Davis, ED98]
- Iterate through interconnect length L from [lower bound not preserved] to Lmax
  - Calculate power p(L) by min-FF and min-power
  - Add p(L) * i(L) to the total power
- Do not consider local interconnect power
  - It is often included in logic power in uArch simulators
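The accumulation step above amounts to a weighted sum over wire lengths. A minimal sketch, assuming integer wire lengths and treating `p` and `i` as caller-supplied functions (the toy definitions below are placeholders, not the models used in this work):

```python
def full_chip_interconnect_power(p, i, l_min, l_max):
    """Sum p(L) * i(L) over integer wire lengths L in [l_min, l_max].

    p(L): power of one wire of length L (e.g. from min-FF or min-power)
    i(L): number of wires of length L (interconnect density function)
    """
    return sum(p(L) * i(L) for L in range(l_min, l_max + 1))


# Placeholder models, for illustration only.
toy_p = lambda L: 2 * L        # power grows with length
toy_i = lambda L: 10 - L       # fewer long wires than short ones
total = full_chip_interconnect_power(toy_p, toy_i, 1, 3)
```

Starting the sum at a lower bound above the local wire lengths is what excludes local interconnect power from the total.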
16Full-chip Interconnect Power Estimation Results
- Min-FF and min-power solutions reduce full-chip interconnect power by 2.17X and 2.56X, respectively, compared to the min-delay solution
  - Min-delay: repeater insertion to minimize interconnect delay
17Outline
- Net-based repeater and flip-flop (FF) insertion
- Full-chip interconnect power estimation
- Microarchitecture-level interconnect power estimation and simulation
  - Estimation based on fixed switching factor
  - Simulation based on cycle-accurate activities
- Performance impact of interconnect pipeline
18Microarchitecture-level Interconnect Power
Estimation
- Two types of interconnects
  - Inside each module: random interconnects
  - Busses between modules: structural interconnects
    - Bit-widths and Manhattan distance
- Update layer assignment based on the new interconnect density function
  - ik(l) is the interconnect density function, with k iterating over all modules and busses
  - We still assume stochastic interconnect distribution inside a module
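For the structural interconnects, each bus contributes as many wires as its bit-width, all with the Manhattan distance between its endpoints as length. The sketch below assumes bus endpoints sit at module centers; the module names and coordinates are invented for illustration.

```python
from collections import Counter


def bus_length_histogram(centers, buses):
    """Count bus wires by Manhattan length.

    centers: module name -> (x, y) of the module center (assumed pin location)
    buses:   iterable of (src, dst, bit_width)
    Returns a Counter mapping wire length -> number of wires.
    """
    hist = Counter()
    for src, dst, width in buses:
        (x1, y1), (x2, y2) = centers[src], centers[dst]
        hist[abs(x1 - x2) + abs(y1 - y2)] += width
    return hist


# Hypothetical floorplan fragment (arbitrary length units).
centers = {"regfile": (0, 0), "ialu": (3, 4), "dcache": (3, 0)}
buses = [("regfile", "ialu", 64), ("regfile", "dcache", 32)]
hist = bus_length_histogram(centers, buses)
```

A histogram of this kind can then be combined with the per-module stochastic distributions when the layer assignment is updated.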
19Floorplanning Used in Experiments
Floorplanning similar to MIPS R10000 processor
Floorplanning we used
20Interconnect Power
- Interconnect power differs by 1.31X and 1.16X for min-FF and min-power solutions, respectively
  - Structural information should be considered for accurate modeling
- Min-power provides the lower bound of pipelined interconnect power
  - Reduces power by 3.2% compared to the min-FF solution
- Min-FF is preferred considering the power and IPC tradeoff
  - Min-power solution may use an excessive number of FFs
21Cycle-Accurate Interconnect Power Simulation
- Cycle-accurate simulation based on SimpleScalar
- Min-FF solution
- For each module and bus, calculate power with ideal clock gating
  - Count both dynamic and leakage power if accessed
  - Count only leakage power otherwise
- No existing uArch simulator considers cycle-accurate interconnect power
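The ideal clock gating rule above can be sketched as a per-cycle accumulation. This is a simplified illustration, not the SimpleScalar integration itself; the trace format, unit names, and power values are assumptions.

```python
def simulate_interconnect_energy(access_trace, dyn_power, leak_power, t_cycle):
    """Accumulate interconnect energy per unit under ideal clock gating.

    access_trace: iterable with one set of accessed unit names per cycle
    dyn_power, leak_power: unit name -> power while active / always
    t_cycle: clock period
    A unit pays dynamic + leakage power in cycles where it is accessed,
    and leakage power only in all other cycles.
    """
    energy = {unit: 0.0 for unit in leak_power}
    for accessed in access_trace:
        for unit in energy:
            power = leak_power[unit]
            if unit in accessed:
                power += dyn_power[unit]
            energy[unit] += power * t_cycle
    return energy


# Hypothetical three-cycle trace.
trace = [{"bus"}, set(), {"bus", "alu"}]
energy = simulate_interconnect_energy(
    trace, {"bus": 2.0, "alu": 1.0}, {"bus": 0.5, "alu": 0.25}, 1.0
)
```

Dividing each unit's energy by the total simulated time gives its average power, which is the quantity compared against the fixed-switching-factor estimate.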
22Simulation vs. Estimation
- Difference between estimation and simulation is 1.71X
- Cycle-accurate simulation is required to obtain accurate power for global and intermediate interconnects
23Outline
- Net-based repeater and flip-flop (FF) insertion
- Full-chip interconnect power estimation
- Microarchitecture-level interconnect power estimation and simulation
- Performance impact of interconnect pipeline
- Conclusions
24Performance Impact of Interconnect Pipeline
- BIPS (billion instructions per second) as the performance metric
- Performance increases by up to 1.76X although IPC degrades
- Performance does not fully scale w.r.t. clock due to the IPC degradation
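Since BIPS is the product of IPC and clock rate in GHz, the net speedup from a faster clock is the clock ratio scaled down by the IPC ratio. A small sketch with made-up numbers (not results from this work):

```python
def bips(ipc, clock_ghz):
    """Billion instructions per second = IPC * clock rate in GHz."""
    return ipc * clock_ghz


def speedup(ipc_base, f_base_ghz, ipc_new, f_new_ghz):
    """BIPS ratio of a new design point over a baseline."""
    return bips(ipc_new, f_new_ghz) / bips(ipc_base, f_base_ghz)


# Hypothetical: clock doubles, but deeper interconnect pipelining
# lowers IPC from 2.0 to 1.5, so performance rises 1.5X rather than 2X.
gain = speedup(2.0, 1.0, 1.5, 2.0)
```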
25Floorplanning Optimization
[Figure: two floorplans, "Industrial floorplanning" and "Optimized for IPC", showing the placement of IALU1, IALU2, IALU3, and FALU]
- Reduce interconnect length of the following IPC-critical paths
  - Between Load/Store Queue and L1 data cache
  - Between issue window (register update unit) and frequently used functional units
26IPC under 3.5GHz Clock
- Floorplanning optimization can increase IPC
  - On average 23.49%
  - Up to 41.58%
27Conclusions
- Concurrent repeater and FF insertion becomes necessary to achieve the target delay specified by ITRS
- Interconnect pipelining should be considered for accurate power and performance modeling at uArch level
- Floorplanning optimization considering interconnect pipelining can improve IPC by up to 41.58%
  - A paradigm change for floorplanning from total wire length minimization to interconnect IPC optimization
- More generally, we need optimization simultaneously considering microarchitecture and VLSI
- More results can be found at http://eda.ee.ucla.edu