Routing Track Duplication with Fine-Grained Power-Gating for FPGA Interconnect Power Reduction

Slides: 31
Provided by: Fei64
Learn more at: http://eda.ee.ucla.edu

1
Routing Track Duplication with Fine-Grained
Power-Gating for FPGA Interconnect Power Reduction
  • Yan Lin, Fei Li and Lei He
  • EE Department, UCLA
  • Partially supported by NSF grant CCR-0306682.
    Address comments to lhe_at_ee.ucla.edu.

2
Outline
  • Review and Motivation
  • Interconnect Leakage Power Reduction using
    Power-gating
  • Interconnect Dynamic Power Reduction using
    Dual-Vdd
  • Conclusions and Ongoing Work

3
Power Limitation of FPGAs
  • Existing FPGAs are HIGHLY power inefficient
    (>100X more than ASIC)
  • E.g. [Kusse, ISLPED98]
  • Power is likely the largest limitation for FPGAs

Design Example      Vdd     Energy
Xilinx XC4003A      5v      4.2mW/MHz
Static CMOS ASIC    3.3v    5.5uW/MHz

4
FPGA Power Reduction
  • Power aware FPGA CAD algorithms for existing FPGA
    architectures
  • CAD algorithms to minimize power-delay product
    [Lamoureux et al, ICCAD03]
  • Configuration inversion for leakage reduction
    [Anderson et al, FPGA04]
  • Power efficient FPGA circuits and architectures
  • Dual-Vdd and Vdd-programmable FPGA logic blocks
    [Li et al, FPGA04] [Li et al, DAC04]
  • Vdd-programmable FPGA interconnects
    [Li et al, ICCAD04] [Anderson et al, ICCAD04]

5
Overall FPGA Structure
  • Cluster-based Island Style FPGA Structure
  • Logic blocks are embedded into routing resources
  • Wire segment connectivity is programmable

6
FPGA Routing Structure
  • Subset Programmable switch block
  • An incoming track can be connected to different
    outgoing tracks with the same track number
  • Programmable connection block

7
Vdd-programmable Interconnects [Li et al, ICCAD04]
  • Conventional routing switch
  • Vdd-programmable switch
  • Vdd selection for used switch
  • Power-gating unused switch
  • Configurable Vdd-level conversion
  • Avoid excessive leakage when low Vdd switch
    drives high Vdd switches

(Figure label: power transistor)

8
Limitation of Vdd-programmable Interconnects
[Li et al, ICCAD04]
  • Fine-grained Vdd-level converter insertion
  • Area overhead
  • 54% area overhead for circuit s38584
  • Leakage overhead
  • 36% leakage overhead for circuit s38584
  • SRAM cell overhead
  • 300% SRAM cell overhead for each switch
  • Area/SRAM efficient low-power interconnects are
    needed

9
Outline
  • Review and Motivation
  • Interconnect Leakage Power Reduction using
    Power-gating
  • Interconnect Dynamic Power Reduction using
    Dual-Vdd
  • Conclusions and Ongoing Work

10
Low Utilization Rate of Interconnects
  • 78.15% of total power is consumed by global
    interconnect [Li et al, DAC04]
  • 47% of global interconnect power is leakage
  • Why?
  • Extremely low utilization rate (12% w/ minimum
    array)

Circuit     # of total              # of unused             Utilization
            interconnect switches   interconnect switches   rate (%)
alu4        36478                   31224                   14.40
apex4       43741                   37703                   13.80
bigkey      63259                   54017                    9.87
clma        653181                  593343                   9.16
des         87877                   79932                    9.04
diffeq      42746                   36974                   13.50
dsip        75547                   70138                    7.16
elliptic    140296                  125800                  10.33
ex5p        45404                   39288                   13.47
frisc       2388523                 216993                   9.15
Average                                                     11.90
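
The utilization rate in the table above appears to be the share of switches that are actually used, i.e. (total - unused) / total. A minimal sketch of that arithmetic, assuming this definition (the function name `utilization_rate` is hypothetical and not from the slides):

```python
def utilization_rate(total_switches: int, unused_switches: int) -> float:
    """Percentage of interconnect switches in use (assumed definition)."""
    if total_switches <= 0:
        raise ValueError("total_switches must be positive")
    used = total_switches - unused_switches
    return 100.0 * used / total_switches


# (total, unused) switch counts for three circuits taken from the table.
counts = {
    "alu4": (36478, 31224),
    "apex4": (43741, 37703),
    "clma": (653181, 593343),
}

for name, (total, unused) in counts.items():
    print(f"{name}: {utilization_rate(total, unused):.2f}%")
```

For these three rows the computed values match the utilization column of the table to two decimal places.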

11
Interconnect Utilization Rate is Intrinsically Low
  • Programmable switch block
  • Utilization rate is no more than 25%
  • Programmable connection block
  • Only one switch is used (for 64 tracks)
  • Power-gating unused interconnects is necessary

12
Vdd-gateable Routing Switch
  • Conventional routing switch
  • Vdd-gateable routing switch
  • Only two states for a routing switch
  • High Vdd
  • Power-gating
  • Enable power-gating capability w/o extra SRAM
    cells

(Figure label: power transistor)

13
Vdd-Gateable Connection Block
  • Conventional connection block
  • Vdd-gateable connection block
  • Enable power-gating capability w/ only one extra
    SRAM for a connection block
  • Only n+1 SRAM cells for 2^n connection switches
  • A low leakage decoder is needed
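
The SRAM count above can be illustrated with a small sketch. It assumes one reading of the slide: n SRAM bits feed a decoder that selects one of 2^n connection switches, and the one extra SRAM bit enables or power-gates the whole connection block. The names `sram_cells_gateable` and `decode`, and the role given to the extra bit, are assumptions for illustration, not details confirmed by the slides.

```python
def sram_cells_gateable(num_switches: int) -> int:
    """SRAM cells for a decoder-based gateable connection block.

    Assumes n = ceil(log2(num_switches)) select bits plus one extra bit.
    """
    if num_switches < 1:
        raise ValueError("num_switches must be at least 1")
    n = (num_switches - 1).bit_length()  # exact ceil(log2) for integers
    return n + 1


def decode(select: int, enable: bool, num_switches: int) -> list:
    """Per-switch power state: at most one switch is on, none if gated."""
    return [enable and i == select for i in range(num_switches)]


print(sram_cells_gateable(64))        # 6 select bits + 1 extra bit
print(sum(decode(5, True, 64)))       # one switch powered on
print(any(decode(5, False, 64)))      # whole block power-gated
```

Under this reading, a connection block serving 64 tracks needs 7 SRAM cells instead of one cell per switch.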

14
Power and Delay of Vdd-gateable Switch
  • Vdd-gateable switch compared to conventional
    switch
  • Dynamic power is almost the same
  • >300X leakage power reduction
  • 6% delay increase

Vdd     Routing switch delay (s)                Energy per switch (Joule)
        w/o power-gating    w/ power-gating     w/o power-gating    w/ power-gating
1.3v    5.90E-11            6.26E-11 (+6%)      3.3E-14             3.25E-14
1.0v    6.99E-11            7.42E-11 (+6.1%)    1.63E-14            1.65E-14

15
Power Reduction by Power-gating Unused
Interconnects
           Single-Vdd (baseline)            Total power saving (%)
Circuit    Interconnect    Total            Vdd-programmable       Vdd-gateable
           power (W)       power (W)        interconnects          interconnects
                                            [Li et al, ICCAD04]
alu4       0.0657          0.0769           25.13                  29.09
apex4      0.0437          0.0500           21.83                  30.70
bigkey     0.1044          0.1375           33.38                  24.89
clma       0.4918          0.5450           23.42                  45.69
des        0.1688          0.2136           36.71                  31.79
diffeq     0.0292          0.0360           17.50                  45.20
dsip       0.1003          0.1280           34.34                  43.66
Avg.       --              --               25.19                  38.18

16
Outline
  • Review and motivation
  • Interconnect Leakage Power Reduction using
    Power-gating
  • Interconnect Dynamic Power Reduction using
    Dual-Vdd
  • FPGA fabrics and algorithms
  • Design flow and quantitative evaluation
  • Conclusions and Ongoing Work

17
Pre-Defined Dual-Vdd Routing Architecture
  • Reduce dynamic power with dual-Vdd by making use
    of timing slack
  • Partition routing channel into VddH and VddL
    regions
  • Vdd-gateable interconnect switch is used
  • Ratio of VddH/VddL track is an architectural
    parameter

18
Ratio of VddH to VddL Track
  • Determine ratio using dual-Vdd assignment profile
    without considering layout constraint
  • Sensitivity-based dual-Vdd assignment
  • Assignment unit --- a routing tree
  • Power sensitivity --- ΔP/ΔVdd
  • Power difference for a routing tree between VddH
    and VddL
  • Greedy algorithm --- sensitivity based
  • Initial uniform VddH assignment
  • Procedure: iteratively assign VddL to the routing
    tree with the largest power sensitivity (without
    increasing critical path delay)
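
The greedy procedure above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it makes a single pass over routing trees in order of decreasing power difference and relies on a caller-supplied timing function. The names `greedy_dual_vdd` and `toy_critical_delay`, and all numbers in the toy example, are hypothetical.

```python
from typing import Callable, Dict


def greedy_dual_vdd(
    power_h: Dict[str, float],
    power_l: Dict[str, float],
    critical_delay: Callable[[Dict[str, str]], float],
) -> Dict[str, str]:
    """Assign 'H' or 'L' to each routing tree, greedily by sensitivity."""
    # Initial uniform VddH assignment.
    assign = {tree: "H" for tree in power_h}
    baseline = critical_delay(assign)
    # Sensitivity here: power difference of a tree between VddH and VddL.
    order = sorted(power_h, key=lambda t: power_h[t] - power_l[t], reverse=True)
    for tree in order:
        assign[tree] = "L"
        if critical_delay(assign) > baseline:
            assign[tree] = "H"  # revert: critical delay would increase
    return assign


# Toy timing model (assumed): two paths, each tree costs 1.0 at VddH
# and 1.5 at VddL; the critical delay is the longest path.
PATHS = [["a", "b"], ["c"]]
DELAY = {"H": 1.0, "L": 1.5}


def toy_critical_delay(assign: Dict[str, str]) -> float:
    return max(sum(DELAY[assign[t]] for t in path) for path in PATHS)


power_h = {"a": 1.0, "b": 0.8, "c": 0.9}
power_l = {"a": 0.5, "b": 0.5, "c": 0.5}
result = greedy_dual_vdd(power_h, power_l, toy_critical_delay)
print(result)
```

In the toy example only tree `c` has timing slack, so it is the only one moved to VddL; trees `a` and `b` sit on the critical path and stay at VddH.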

19
Profile of Dual-Vdd Assignment
  • Assignment with no critical path delay increase
    (VddH/VddL = 1.5v/1.0v)

Circuit    # of routing   # of logic   # of I/O   VddL routing   VddL logic
           trees          blocks       blocks     trees (%)      blocks (%)
alu4       782            162          22         49.74          82.10
apex4      849            134          28         35.45          78.36
bigkey     1542           294          426        67.77          85.03
clma       7995           1358         144        69.74          89.84
s38417     5426           982          135        64.17          80.05
seq        1138           274          76         20.74          61.62
spla       2091           461          122        54.52          88.47
Avg.                                              54.54          80.28

  • Set the ratio of VddH/VddL track to 1:1

21
Level Converter is NOT Needed
  • Wire segment can only be connected to another
    wire segment with the same track number via a
    subset switch block
  • No level converter is needed in switch block

22
Layout Constraint Due to Dual-Vdd
  • Dual-Vdd introduces performance degradation due
    to layout constraint
  • Insufficient routing resources for Vdd-matched
    routing trees
  • May introduce detours
  • Solutions
  • Vdd-programmable interconnects [Li et al,
    ICCAD04]
  • Provide sufficient routing tracks for Vdd-matched
    routing trees
  • Control leakage by power-gating unused
    interconnects

23
Design Flow for Dual-Vdd Interconnects
  • Tech Mapped Netlist (Single-Vdd)
  • Timing Driven Layout (Single-Vdd)
  • Dual-Vdd Assignment for Routing Trees
  • Timing Driven Layout (Dual-Vdd)
  • Power-gating Unused Switches
  • Delay/Power Estimation
  • Outputs: Delay, Power

24
Dual-Vdd Routing Algorithm
  • Based on the maze routing algorithm in VPR
  • Modify the cost function
  • TotalCost(n): the cost of routing tree T through
    wire segment n to the target sink j
  • PathCostDv(n): the cost of the path from the
    current partial routing tree to wire segment n
  • ExpectedDv(n,j): the estimated cost from wire
    segment n to the target sink j
  • Matched(T,n): boolean function describing the
    Vdd-matching status of tree T and wire segment n
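
The formula that combines these terms is not reproduced in this transcript, so the sketch below is NOT the authors' cost function. It only illustrates one plausible way a maze router could use such terms: a path cost plus a weighted expected cost, with a penalty when the wire segment's Vdd does not match the routing tree. The function `total_cost`, the weight `alpha`, and the `penalty` parameter are all assumptions made for illustration.

```python
import math


def total_cost(
    path_cost: float,
    expected_cost: float,
    matched: bool,
    alpha: float = 1.0,
    penalty: float = math.inf,
) -> float:
    """Hypothetical Vdd-aware routing cost for one wire segment.

    path_cost     -- stands in for PathCostDv(n)
    expected_cost -- stands in for ExpectedDv(n, j)
    matched       -- stands in for Matched(T, n)
    alpha, penalty are assumed tuning knobs, not from the slides.
    """
    base = path_cost + alpha * expected_cost
    return base if matched else base + penalty


# Choosing among candidate wire segments (toy numbers, hypothetical).
candidates = {
    "seg_vddh": total_cost(2.0, 3.0, matched=True),
    "seg_vddl": total_cost(1.0, 1.0, matched=False),
}
best = min(candidates, key=candidates.get)
print(best)
```

With the default infinite penalty, a Vdd-mismatched segment is never chosen even if it is cheaper; a finite penalty would instead trade mismatch against detour cost.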

25
Outline
  • Review and motivation
  • Interconnect Leakage Power Reduction using
    Power-gating
  • Interconnect Dynamic Power Reduction using
    Dual-Vdd
  • FPGA fabrics and algorithms
  • Quantitative evaluation
  • Conclusions and Ongoing Work

26
Comparison of Low Power Architectures
(Figure: power (watt) vs. clock frequency (MHz) for circuit s38584)
  • Dual-Vdd interconnects with fine-grained power
    gating
  • May have performance degradation due to layout
    constraint
  • Can reduce more power than purely power-gating
    unused switches
  • Achieve 9.78% interconnect dynamic power
    reduction, 38.68% total power saving with 1.5W
    channel width
  • W is the nominal routing channel width in
    single-Vdd FPGA

27
Impact of Routing Channel Width
  • We get the power reduction percentage at the
    maximum clock frequency achieved by dual-Vdd
    interconnects
  • Channel width increases from 1.0W to 2.0W
  • Power saving increases from 34.86% to 45%
  • Normalized clock frequency increases from 0.743
    to 0.955

28
Area Overhead of Vdd-gateable Interconnects
  • Device area is dominant

                     Single-Vdd    Dual-Vdd w/ Power-gating               [Li et al,
                     (baseline)    (1.0W)       (1.5W)       (2.0W)       ICCAD04]
Total FPGA area      7077044       11092744     15420197     20249865     22678225
Area overhead (%)    -             57           118          186          220

  • Area overhead is mainly due to power transistors
    for power-gating capability
  • Track duplication with power-gating vs
    Vdd-programmable interconnects [Li et al,
    ICCAD04]
  • More power reduction (45% vs 25%) and less area
    overhead
  • Mainly due to Vdd-level converter removal
  • High Vdd interconnects with power gating are BEST
    considering area
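
The area overhead row of the table on this slide follows from the total-area row. A minimal sketch of that arithmetic, assuming overhead is measured relative to the single-Vdd baseline (the helper name `area_overhead_percent` is hypothetical):

```python
BASELINE_AREA = 7077044  # Single-Vdd (baseline) total FPGA area

# Total FPGA area for each architecture, copied from the table.
areas = {
    "Dual-Vdd w/ Power-gating (1.0W)": 11092744,
    "Dual-Vdd w/ Power-gating (1.5W)": 15420197,
    "Dual-Vdd w/ Power-gating (2.0W)": 20249865,
    "Li et al, ICCAD04": 22678225,
}


def area_overhead_percent(area: int, baseline: int = BASELINE_AREA) -> float:
    """Area increase over the baseline, in percent."""
    return 100.0 * (area - baseline) / baseline


overheads = {name: round(area_overhead_percent(a)) for name, a in areas.items()}
for name, pct in overheads.items():
    print(f"{name}: {pct}%")
```

Rounded to whole percentages, the results reproduce the overhead row of the table (57, 118, 186, 220).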

29
Outline
  • Review and motivation
  • Interconnect Leakage Power Reduction using
    Power-gating
  • Interconnect Dynamic Power Reduction using
    Dual-Vdd
  • Conclusions and Ongoing Work

30
Conclusions and Ongoing Work
  • Conclusions
  • Developed power-gateable interconnects w/
    virtually no extra SRAM cell
  • Achieved 38.18% total power reduction using
    Vdd-gateable interconnects
  • Achieved 24.78% interconnect dynamic power
    reduction, 45.00% total power reduction with
    duplicated (2W) channel width
  • Ongoing work
  • Power-ground design to support dual-Vdd
  • Optimal mix of Vdd-programmable and Vdd-gateable
    interconnects
  • Architecture evaluation considering Vdd
    programmability [Lin et al, to appear in FPGA05]