1
Full-chip Interconnect Power Estimation and
Simulation Considering Concurrent Repeater and
Flip-flop Insertion
  • Weiping Liao and Lei He
  • {wliao, lhe}@ee.ucla.edu
  • University of California at Los Angeles
    Partially supported by HP, Intel, SRC and NSF

2
Motivation
  • Interconnect delay may be more than one clock
    cycle due to increased design scale and increased
    clock rate
  • Existing research at microarchitecture level does
    not explicitly consider interconnect power and
    pipeline
  • Superscalar: Wattch [Brooks et al., ISCA 2000],
    SimplePower [Ye et al., DAC 2000]
  • VLIW: PowerImpact [Liao et al., ICCAD 2002]

Focus of this work: microarchitecture-level
interconnect prediction with layout impact
3
Outline
  • Net-based repeater and flip-flop (FF) insertion
  • Full-chip interconnect power estimation
  • Microarchitecture-level interconnect power
    estimation and simulation
  • Performance impact of interconnect pipeline
  • Conclusions

4
Net-based Problem Formulation
  • Given: a two-pin net with driver, load, and clock
    period
  • Goal: insert repeaters and FFs such that the
    interconnect delay is smaller than the clock period
  • Min-FF solution: minimize the number of FFs
  • Min-power solution: minimize power

5
Delay and Power Model
  • Elmore delay model
  • Power model: both dynamic and leakage power
    • α: dynamic switching factor
    • S: total repeater size
    • Cb: capacitance of a minimum-sized device
    • Cw: interconnect capacitance per unit length
    • SF: total size of FFs
    • Ioff: leakage current of a minimum-sized device
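
The slide's equations are not reproduced in this transcript. A minimal sketch of how the symbols above could combine, assuming the conventional form in which dynamic power is α·f·Vdd²·C and leakage power is Vdd·Ioff times the total device size. The function name, the clock frequency and supply voltage parameters, and the way FF capacitance is counted are assumptions, not the authors' formula.

```python
def interconnect_power(alpha, freq, vdd, wire_len, s_total, sf_total, cb, cw, ioff):
    """Return (dynamic, leakage) power for one net.

    Assumed form (not taken from the slides):
      dynamic = alpha * f * Vdd^2 * (Cw * L + Cb * (S + SF))
      leakage = Vdd * Ioff * (S + SF)
    S and SF are expressed in multiples of the minimum-sized device.
    """
    # Capacitance switched per transition: the wire plus all inserted devices.
    switched_cap = cw * wire_len + cb * (s_total + sf_total)
    dynamic = alpha * freq * vdd ** 2 * switched_cap
    # Leakage scales with total device size and is independent of activity.
    leakage = vdd * ioff * (s_total + sf_total)
    return dynamic, leakage
```

Under this form, a net that never switches (α = 0) still pays the leakage term, which is why both components are tracked separately.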

6
Parameters based on ITRS 100nm
[Sylvester et al., ICCAD 1998]
7
Repeater Insertion Models
[Figure: net from driver to load]
  • Single model
    • Number of repeaters
    • Size of repeaters
  • Cascaded model
    • Number of repeaters
    • Design of cascaded inverters
      • First inverter size
      • Uniform stage ratio
      • Stage number
  • Hybrid model
    • Design of first cascaded inverters
    • Design of the remaining repeaters as in the single model

8
Concurrent Repeater and FF insertion
  • Single model
    • Min-FF: analytical solution extended from
      [Lu et al., DATE 2002]
    • Min-power: enumerate the number of FFs for the lowest
      power
  • Cascaded and hybrid models
    • No existing analytical solution
    • Empirical formula or tables based on enumeration
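
The enumeration idea can be sketched as follows for the single model. This is an illustration under stated assumptions, not the authors' procedure: the Elmore stage-delay expression is the textbook form, `tech` is a hypothetical parameter dictionary, all repeaters in a net share one size, and total device size stands in for power (the wire capacitance is the same for every candidate, and FF clocking power is ignored).

```python
from itertools import product

def stage_delay(seg_len, size, r0, c0, rw, cw):
    """Elmore delay of one repeater of `size` driving `seg_len` of wire
    into an identical repeater (r0, c0: unit-size resistance and input cap)."""
    wire_c = cw * seg_len
    load_c = c0 * size
    return (r0 / size) * (wire_c + load_c) + rw * seg_len * (0.5 * wire_c + load_c)

def enumerate_solutions(length, target, tech, max_ffs=8, max_reps=8,
                        sizes=(1, 2, 4, 8, 16, 32, 64), ff_size=4):
    """Yield (num_ffs, reps_per_span, size, cost) for every feasible choice.

    A choice is feasible when the delay between consecutive FFs meets
    `target`. `cost` is the total device size, used as a stand-in for power.
    """
    for n_ff, reps, size in product(range(max_ffs + 1),
                                    range(1, max_reps + 1), sizes):
        span = length / (n_ff + 1)      # wire between consecutive FFs
        delay = reps * stage_delay(span / reps, size, **tech)
        if delay <= target:
            cost = (n_ff + 1) * reps * size + n_ff * ff_size
            yield n_ff, reps, size, cost

def min_ff(solutions):
    """Fewest FFs first, then lowest cost."""
    return min(solutions, key=lambda s: (s[0], s[3]))

def min_power(solutions):
    """Lowest cost first, then fewest FFs."""
    return min(solutions, key=lambda s: (s[3], s[0]))
```

Collect the generator into a list before calling both selectors, for example `sols = list(enumerate_solutions(4.0, 10.0, tech))`. The two selectors differ only in the ordering of their sort keys, which mirrors the difference between the min-FF and min-power objectives.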

9
Power Consumption for Wires under Min-FF solution
  • Delay target: 80% of clock period
  • For the same wire length and delay target
    • With the same number of FFs, the hybrid model has the
      lowest power
    • With different numbers of FFs, no model always
      achieves the lowest power

10
Power vs. Wire Length
  • Power with only repeater insertion increases
    superlinearly with respect to the wire length.
  • By inserting an FF every 1mm, we can achieve a
    virtually linear relation between power and wire
    length

11
Power versus Wire Length
  • Clock rate: 3GHz
  • Min-power reduces power by 50% for a wire length of
    7mm

12
Outline
  • Net-based repeater and flip-flop (FF) insertion
  • Full-chip interconnect power estimation
  • Microarchitecture-level interconnect power
    estimation and simulation
  • Performance impact of interconnect pipeline
  • Conclusions

13
Layer Assignment

[Figure: layer stack with two global layers above the intermediate layers]
  • Two global layers and 50% occupation rate
  • Minimum wire length in global layer
    satisfies
  • i(l): interconnect density function
  • Lmax: maximum interconnect length

14
Intermediate Layers
  • Initially assume one pair of intermediate layers
  • Calculate the minimum wire length for intermediate
    interconnects
  • If it is greater than Lbuf (the maximum wire length without
    repeater insertion), increment the number of pairs of
    intermediate layers
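
The pair-incrementing loop above can be sketched as below. The capacity rule is an assumption made for illustration: wires are assigned from longest to shortest, and a group of layer pairs holds wires until their accumulated length l · i(l) would exceed its routing capacity. The names `pair_capacity`, `global_min`, and `max_pairs` are hypothetical.

```python
def min_length_for_capacity(lengths, density, upper, capacity):
    """Assign wires shorter than `upper`, longest first, until the accumulated
    wire length l * i(l) would exceed `capacity`.
    Returns the shortest length that still fits (or `upper` if none fits)."""
    used = 0.0
    shortest = upper
    for l, count in sorted(zip(lengths, density), reverse=True):
        if l >= upper:
            continue                # already routed on the global layers
        if used + l * count > capacity:
            break
        used += l * count
        shortest = l
    return shortest

def intermediate_pairs(lengths, density, global_min, pair_capacity, l_buf,
                       max_pairs=16):
    """Add intermediate layer pairs until the minimum intermediate wire
    length no longer exceeds l_buf. Returns (pairs, minimum length)."""
    for pairs in range(1, max_pairs + 1):
        l_min = min_length_for_capacity(lengths, density, global_min,
                                        pairs * pair_capacity)
        if l_min <= l_buf:
            break
    return pairs, l_min
```

With this rule, every wire left for the local layers is no longer than Lbuf, so none of the wires excluded from the power estimate would need a repeater.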

15
Full-Chip Interconnect Power Estimation with
Random Interconnects
  • Decide layer assignment with i(l) based on the
    stochastic wire length distribution [Davis, ED98]
  • Iterate through interconnect length L from the
    minimum intermediate-layer wire length to Lmax
    • Calculate power p(L) by min-FF and min-power
    • Accumulate total power as the sum of p(L) · i(L)
  • Do not consider local interconnect power
    • It is often included in logic power in uArch
      simulators
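
A minimal sketch of the accumulation step, assuming integer wire lengths in an arbitrary length unit. The two callables are placeholders: `density` plays the role of i(L) and `power_of_length` the role of p(L) from either the min-FF or the min-power solution.

```python
def full_chip_power(density, power_of_length, l_start, l_max):
    """Sum p(L) * i(L) over integer wire lengths l_start..l_max (inclusive).

    Lengths below l_start are treated as local interconnects and skipped.
    """
    total = 0.0
    for length in range(l_start, l_max + 1):
        total += power_of_length(length) * density(length)
    return total
```

Calling the function once with a min-FF `power_of_length` and once with a min-power one gives the two full-chip totals that the next slide compares.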

16
Full-chip Interconnect Power Estimation Results
  • Min-FF and min-power solutions reduce full-chip
    interconnect power by 2.17X and 2.56X,
    respectively, compared to the min-delay solution
  • Min-delay: repeater insertion to minimize
    interconnect delay

17
Outline
  • Net-based repeater and flip-flop (FF) insertion
  • Full-chip interconnect power estimation
  • Microarchitecture-level interconnect power
    estimation and simulation
    • Estimation based on fixed switching factor
    • Simulation based on cycle-accurate activities
  • Performance impact of interconnect pipeline

18
Microarchitecture-level Interconnect Power
Estimation
  • Two types of interconnects
    • Inside each module: random interconnects
    • Busses between modules: structural interconnects,
      characterized by bit-widths and Manhattan distance
  • Update layer assignment based on the new interconnect
    density function
    • i_k(l) is the interconnect density function of
      component k, where k iterates over all modules and busses
  • We still assume a stochastic interconnect
    distribution inside each module
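
One way to picture the updated density function is as a sum of per-component histograms. The sketch below assumes each module's random interconnects are given as a length-to-count mapping and each bus contributes `width` wires at the Manhattan distance between its endpoints. The data layout and names are illustrative assumptions.

```python
from collections import Counter

def combined_density(module_densities, buses):
    """Sum per-component densities into one length -> wire-count histogram.

    module_densities: list of {length: count} dicts (random interconnects).
    buses: list of (bit_width, (x1, y1), (x2, y2)) tuples (structural).
    """
    total = Counter()
    for density in module_densities:
        total.update(density)       # Counter.update adds counts per length
    for width, (x1, y1), (x2, y2) in buses:
        manhattan = abs(x1 - x2) + abs(y1 - y2)
        total[manhattan] += width   # one wire per bit of the bus
    return total
```

A wide bus between distant modules shows up as a spike at one length, which a purely stochastic distribution would not capture.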

19
Floorplanning Used in Experiments
  • Floorplan similar to the MIPS R10000 processor
[Figure: floorplan used in the experiments]
20
Interconnect Power
  • Interconnect power differs by 1.31X and 1.16X
    for min-FF and min-power solutions, respectively
    • Structural information should be considered for
      accurate modeling
  • Min-power provides the lower bound of pipelined
    interconnect power
    • Reduces power by 3.2% compared to the min-FF solution
  • Min-FF is preferred considering the power and IPC
    tradeoff
    • Min-power solution may use an excessive number of
      FFs

21
Cycle-Accurate Interconnect Power Simulation
  • Cycle-accurate simulation based on SimpleScalar
  • Min-FF solution
  • For each module and bus, calculate power with
    ideal clock gating
    • Count both dynamic and leakage power if accessed
    • Count only leakage power otherwise
  • No existing uArch simulator considers
    cycle-accurate interconnect power
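
The ideal clock gating rule can be sketched as a per-cycle accumulation. This is a standalone illustration, not SimpleScalar code: the access trace format, the component table, and the function name are assumptions.

```python
def simulate_energy(access_trace, components, cycle_time):
    """Accumulate interconnect energy per component over a trace.

    access_trace: iterable of sets, each holding the names of the modules
                  and busses accessed in that cycle.
    components:   name -> (dynamic_power, leakage_power).
    Ideal clock gating: dynamic power is charged only in cycles where the
    component is accessed; leakage power is charged in every cycle.
    """
    energy = {name: 0.0 for name in components}
    for accessed in access_trace:
        for name, (p_dyn, p_leak) in components.items():
            power = p_leak + (p_dyn if name in accessed else 0.0)
            energy[name] += power * cycle_time
    return energy
```

Dividing each total by the simulated time gives average power, which can then be set against an estimate that applies one fixed switching factor to every component.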

22
Simulation vs. Estimation
  • Difference between estimation and simulation is
    1.71X
  • Cycle-accurate simulation is required to obtain
    accurate power for global and intermediate
    interconnects

23
Outline
  • Net-based repeater and flip-flop (FF) insertion
  • Full-chip interconnect power estimation
  • Microarchitecture-level interconnect power
    estimation and simulation
  • Performance impact of interconnect pipeline
  • Conclusions

24
Performance Impact of Interconnect Pipeline
  • BIPS as the performance metric
  • Performance increases by up to 1.76X although IPC
    degrades
  • Performance does not fully scale w.r.t. clock due
    to the IPC degradation
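
BIPS (billions of instructions per second) is the product of IPC and the clock rate in GHz, so a faster clock helps only to the extent that the IPC loss from extra pipeline stages does not cancel it. A small sketch; the function names are illustrative, and any numbers passed in are the caller's own, not results from this work.

```python
def bips(ipc, clock_ghz):
    """Billions of instructions per second = IPC * clock rate in GHz."""
    return ipc * clock_ghz

def speedup(ipc_base, ghz_base, ipc_new, ghz_new):
    """Performance ratio of a new design point over a baseline."""
    return bips(ipc_new, ghz_new) / bips(ipc_base, ghz_base)
```

For instance, doubling the clock while IPC drops by a quarter gives a speedup of 1.5 rather than 2, which is the sense in which performance does not fully scale with the clock.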

25
Floorplanning Optimization
[Figure: two floorplans, "Industrial floorplanning" and "Optimized for IPC", showing the placement of IALU1, IALU2, IALU3, and FALU]
  • Reduce the interconnect length of the following
    IPC-critical paths
    • Between the Load/Store Queue and the L1 data cache
    • Between the issue window (register update unit) and
      frequently used functional units
26
IPC under 3.5GHz Clock
  • Floorplanning optimization can increase IPC
    • On average: 23.49%
    • Up to: 41.58%

27
Conclusions
  • Concurrent repeater and FF insertion becomes
    necessary to achieve the target delay specified
    by ITRS
  • Interconnect pipelining should be considered for
    accurate power and performance modeling at uArch
    level
  • Floorplanning optimization considering
    interconnect pipelining can improve IPC by up
    to 41.58%
    • A paradigm change for floorplanning, from total
      wire length minimization to interconnect IPC
      optimization
  • More generally, we need optimization that simultaneously
    considers microarchitecture and VLSI
  • More results can be found at http://eda.ee.ucla.edu