Title: Routing Track Duplication with Fine-Grained Power-Gating for FPGA Interconnect Power Reduction
1Routing Track Duplication with Fine-Grained Power-Gating for FPGA Interconnect Power Reduction
- Yan Lin, Fei Li and Lei He
- EE Department, UCLA
- Partially supported by NSF grant CCR-0306682.
Address comments to lhe_at_ee.ucla.edu.
2Outline
- Review and Motivation
- Interconnect Leakage Power Reduction using Power-gating
- Interconnect Dynamic Power Reduction using Dual-Vdd
- Conclusions and Ongoing Work
3Power Limitation of FPGAs
- Existing FPGAs are HIGHLY power inefficient (>100X more than ASIC)
  - E.g. [Kusse, ISLPED'98]
- Power is likely the largest limitation for FPGAs
Design example | Vdd | Energy
Xilinx XC4003A | 5 V | 4.2 mW/MHz
Static CMOS ASIC | 3.3 V | 5.5 uW/MHz
4FPGA Power Reduction
- Power-aware FPGA CAD algorithms for existing FPGA architectures
  - CAD algorithms to minimize power-delay product [Lamoureux et al, ICCAD'03]
  - Configuration inversion for leakage reduction [Anderson et al, FPGA'04]
- Power-efficient FPGA circuits and architectures
  - Dual-Vdd and Vdd-programmable FPGA logic blocks [Li et al, FPGA'04] [Li et al, DAC'04]
  - Vdd-programmable FPGA interconnects [Li et al, ICCAD'04] [Anderson et al, ICCAD'04]
5Overall FPGA Structure
- Cluster-based Island Style FPGA Structure
- Logic blocks are embedded into routing resources
- Wire segment connectivity is programmable
6FPGA Routing Structure
- Subset programmable switch block
  - An incoming track can be connected only to outgoing tracks with the same track number
- Programmable connection block
7Vdd-programmable Interconnects [Li et al, ICCAD'04]
- Conventional routing switch
- Vdd-programmable switch
  - Vdd selection for used switch
  - Power-gating unused switch
- Configurable Vdd-level conversion
  - Avoid excessive leakage when low Vdd switch drives high Vdd switches
Power transistor
8Limitation of Vdd-programmable Interconnects [Li et al, ICCAD'04]
- Fine-grained Vdd-level converter insertion
  - Area overhead
    - 54% area overhead for circuit s38584
  - Leakage overhead
    - 36% leakage overhead for circuit s38584
- SRAM cell overhead
  - 300% SRAM cell overhead for each switch
- Area/SRAM efficient low-power interconnects are needed
9Outline
- Review and Motivation
- Interconnect Leakage Power Reduction using Power-gating
- Interconnect Dynamic Power Reduction using Dual-Vdd
- Conclusions and Ongoing Work
10Low Utilization Rate of Interconnects
- 78.15% of total power is consumed by global interconnect [Li et al, DAC'04]
- 47% of global interconnect power is leakage
- Why?
  - Extremely low utilization rate (12% w/ minimum array)
Circuit | # of total interconnect switches | # of unused interconnect switches | Utilization rate (%)
alu4 | 36478 | 31224 | 14.40
apex4 | 43741 | 37703 | 13.80
bigkey | 63259 | 54017 | 9.87
clma | 653181 | 593343 | 9.16
des | 87877 | 79932 | 9.04
diffeq | 42746 | 36974 | 13.50
dsip | 75547 | 70138 | 7.16
elliptic | 140296 | 125800 | 10.33
ex5p | 45404 | 39288 | 13.47
frisc | 2388523 | 216993 | 9.15
Average | | | 11.90
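The utilization rate in the table is the share of interconnect switches that are actually used. A minimal Python sketch of that arithmetic, using three rows of the table; the function name `utilization_percent` is chosen here for illustration and does not come from the slides:

```python
def utilization_percent(total_switches: int, unused_switches: int) -> float:
    """Percentage of interconnect switches that are actually used."""
    used = total_switches - unused_switches
    return 100.0 * used / total_switches


# (total, unused) switch counts copied from the table above
rows = {
    "alu4": (36478, 31224),
    "clma": (653181, 593343),
    "dsip": (75547, 70138),
}

for name, (total, unused) in rows.items():
    print(f"{name}: {utilization_percent(total, unused):.2f}%")
```

For these rows the computed values agree with the listed rates of 14.40, 9.16 and 7.16.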
11Interconnect Utilization Rate is Intrinsically Low
- Programmable switch block
  - Utilization is no more than 25%
- Programmable connection block
  - Only one connection switch is used (for 64 tracks)
- Power-gating unused interconnects is necessary
12Vdd-gateable Routing Switch
- Conventional routing switch
- Vdd-gateable routing switch
  - Only two states for a routing switch
    - High Vdd
    - Power-gating
  - Enable power-gating capability w/o extra SRAM cells
Power transistor
13Vdd-Gateable Connection Block
- Conventional connection block
- Vdd-gateable connection block
  - Enable power-gating capability w/ only one extra SRAM cell for a connection block
    - Only n+1 SRAM cells for 2^n connection switches
  - A low leakage decoder is needed
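One way to read the n+1 SRAM cell count is as n select bits feeding a decoder, plus one extra bit that gates the whole connection block. The Python sketch below models only that logical behaviour. Treating the extra cell as a global enable, and the name `decode_connection_block`, are assumptions made for illustration and are not taken from the slides.

```python
from typing import List


def decode_connection_block(select_bits: List[int], enable: int) -> List[bool]:
    """Return the on/off state of each of the 2**n connection switches.

    select_bits: n SRAM cells choosing which switch is on (MSB first).
    enable:      the one extra SRAM cell (assumed role); 0 gates every switch.
    """
    n = len(select_bits)
    states = [False] * (2 ** n)
    if enable:
        index = 0
        for bit in select_bits:
            index = (index << 1) | bit  # binary decode of the select bits
        states[index] = True
    return states


# 6 select bits + 1 enable bit control 64 switches (one per track)
used_block = decode_connection_block([0, 0, 0, 1, 0, 1], enable=1)
unused_block = decode_connection_block([0, 0, 0, 1, 0, 1], enable=0)
print(used_block.index(True), sum(used_block), sum(unused_block))  # 5 1 0
```

With 64 tracks this gives 7 SRAM cells for 64 switches, and at most one switch is ever on.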
14Power and Delay of Vdd-gateable Switch
- Vdd-gateable switch compared to conventional switch
  - Dynamic power is almost the same
  - >300X leakage power reduction
  - 6% delay increase
Vdd | Routing switch delay w/o power-gating (s) | Routing switch delay w/ power-gating (s) | Energy per switch w/o power-gating (J) | Energy per switch w/ power-gating (J)
1.3 V | 5.90E-11 | 6.26E-11 (+6%) | 3.3E-14 | 3.25E-14
1.0 V | 6.99E-11 | 7.42E-11 (+6.1%) | 1.63E-14 | 1.65E-14
15Power Reduction by Power-gating Unused Interconnects
Circuit | Single-Vdd baseline: interconnect power (W) | Single-Vdd baseline: total power (W) | Total power saving (%), Vdd-programmable interconnects [Li et al, ICCAD'04] | Total power saving (%), Vdd-gateable interconnects
alu4 | 0.0657 | 0.0769 | 25.13 | 29.09
apex4 | 0.0437 | 0.0500 | 21.83 | 30.70
bigkey | 0.1044 | 0.1375 | 33.38 | 24.89
clma | 0.4918 | 0.5450 | 23.42 | 45.69
des | 0.1688 | 0.2136 | 36.71 | 31.79
diffeq | 0.0292 | 0.0360 | 17.50 | 45.20
dsip | 0.1003 | 0.1280 | 34.34 | 43.66
Avg. | -- | -- | 25.19 | 38.18
16Outline
- Review and motivation
- Interconnect Leakage Power Reduction using Power-gating
- Interconnect Dynamic Power Reduction using Dual-Vdd
  - FPGA fabrics and algorithms
  - Design flow and quantitative evaluation
- Conclusions and Ongoing Work
17Pre-Defined Dual-Vdd Routing Architecture
- Reduce dynamic power with dual-Vdd by making use of timing slack
- Partition routing channel into VddH and VddL regions
  - Vdd-gateable interconnect switch is used
- Ratio of VddH/VddL track is an architectural parameter
18Ratio of VddH to VddL Track
- Determine ratio using dual-Vdd assignment profile without considering layout constraint
- Sensitivity-based dual-Vdd assignment
  - Assignment unit: a routing tree
  - Power sensitivity: ΔP/ΔVdd
    - Power difference for a routing tree between VddH and VddL
- Greedy algorithm: sensitivity based
  - Initial uniform VddH assignment
  - Procedure: assign VddL to the routing tree with the largest power sensitivity (but without increasing critical delay)
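The greedy procedure above can be sketched as follows. This is a minimal illustration, not the authors' implementation. The timing model (a path delay is the sum of the delays of its routing trees, and a tree is slower at VddL) and every name, such as `Tree` and `assign_vddl`, are assumptions made for the example. Power sensitivity is taken here as the power saved by moving a tree from VddH to VddL.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Tree:
    name: str
    power_high: float  # power at VddH
    power_low: float   # power at VddL
    delay_high: float  # delay at VddH
    delay_low: float   # delay at VddL (slower)
    vdd: str = "H"     # initial uniform VddH assignment

    @property
    def sensitivity(self) -> float:
        return self.power_high - self.power_low

    @property
    def delay(self) -> float:
        return self.delay_high if self.vdd == "H" else self.delay_low


def critical_delay(paths: List[List[str]], trees: Dict[str, Tree]) -> float:
    """Longest path delay under the current Vdd assignment (toy model)."""
    return max(sum(trees[t].delay for t in path) for path in paths)


def assign_vddl(trees: Dict[str, Tree], paths: List[List[str]]) -> List[str]:
    """Try trees in order of decreasing sensitivity; keep VddL only if
    the critical delay does not grow beyond the all-VddH value."""
    limit = critical_delay(paths, trees)
    assigned = []
    for tree in sorted(trees.values(), key=lambda t: t.sensitivity, reverse=True):
        tree.vdd = "L"
        if critical_delay(paths, trees) > limit:
            tree.vdd = "H"  # would slow the critical path: revert
        else:
            assigned.append(tree.name)
    return assigned


# Hypothetical data: path a-b is critical, path c-d has slack.
trees = {
    "a": Tree("a", 10.0, 5.0, 2.0, 3.0),
    "b": Tree("b", 6.0, 3.0, 2.0, 3.0),
    "c": Tree("c", 8.0, 4.0, 1.0, 2.0),
    "d": Tree("d", 4.0, 2.0, 1.0, 2.0),
}
paths = [["a", "b"], ["c", "d"]]
result = assign_vddl(trees, paths)
print(result)  # ['c', 'd']
```

In this toy case only the trees on the non-critical path end up at VddL, which is the timing-slack effect the slides rely on.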
19Profile of Dual-Vdd Assignment
- Assignment with no critical path delay increase (VddH/VddL = 1.5 V/1.0 V)
Circuit | # of routing trees | # of logic blocks | # of I/O blocks | VddL routing trees (%) | VddL logic blocks (%)
alu4 | 782 | 162 | 22 | 49.74 | 82.10
apex4 | 849 | 134 | 28 | 35.45 | 78.36
bigkey | 1542 | 294 | 426 | 67.77 | 85.03
clma | 7995 | 1358 | 144 | 69.74 | 89.84
s38417 | 5426 | 982 | 135 | 64.17 | 80.05
seq | 1138 | 274 | 76 | 20.74 | 61.62
spla | 2091 | 461 | 122 | 54.52 | 88.47
Avg. | | | | 54.54 | 80.28
- Set the ratio of VddH/VddL track to 1:1
21Level Converter is NOT Needed
- Wire segment can only be connected to another wire segment with the same track number via a subset switch block
- No level converter is needed in switch block
22Layout Constraint Due to Dual-Vdd
- Dual-Vdd introduces performance degradation due to layout constraint
  - Insufficient routing resources for Vdd-matched routing trees
  - May introduce detours
- Solutions
  - Vdd-programmable interconnects [Li et al, ICCAD'04]
  - Provide sufficient routing tracks for Vdd-matched routing trees
    - Control leakage by power-gating unused interconnects
23Design Flow for Dual-Vdd Interconnects
Flow, in order:
- Tech Mapped Netlist (Single-Vdd)
- Timing Driven Layout (Single-Vdd)
- Dual-Vdd Assignment for Routing Trees
- Timing Driven Layout (Dual-Vdd)
- Power-gating Unused Switches
- Delay/Power Estimation
  - Outputs: Delay, Power
24Dual-Vdd Routing Algorithm
- Based on the maze routing algorithm in VPR
- Modify the cost function
  - TotalCost(n): the cost of routing tree T through wire segment n to the target sink j
  - PathCostDv(n): the cost of the path from the current partial routing tree to wire segment n
  - ExpectedDv(n,j): the estimated cost from wire segment n to the target sink j
  - Matched(T,n): boolean function describing Vdd-matching status
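The slide names the terms of the cost function but not how they combine. As a rough illustration only, the sketch below assumes the usual A*-style form used by maze routers, TotalCost(n) = PathCostDv(n) + alpha * ExpectedDv(n,j), and assumes that Matched(T,n) enters as a penalty on wire segments whose Vdd differs from that of routing tree T. The penalty value, `alpha`, and all Python names are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class WireSegment:
    name: str
    base_cost: float  # cost of using this segment
    vdd: str          # "H" or "L", fixed by the track the segment sits on


def matched(tree_vdd: str, seg: WireSegment) -> bool:
    """True when the segment's Vdd equals the Vdd assigned to the tree."""
    return seg.vdd == tree_vdd


def path_cost_dv(cost_so_far: float, tree_vdd: str, seg: WireSegment,
                 mismatch_penalty: float = 1000.0) -> float:
    """Path cost up to and including seg, penalizing Vdd mismatch (assumed)."""
    penalty = 0.0 if matched(tree_vdd, seg) else mismatch_penalty
    return cost_so_far + seg.base_cost + penalty


def total_cost(path_cost: float, expected_cost: float, alpha: float = 1.0) -> float:
    """A*-style total: cost so far plus weighted estimate to the sink."""
    return path_cost + alpha * expected_cost


# A VddL routing tree choosing between a VddL and a VddH segment.
low_seg = WireSegment("track5", 1.0, "L")
high_seg = WireSegment("track1", 1.0, "H")
cost_low = total_cost(path_cost_dv(2.0, "L", low_seg), expected_cost=3.0)
cost_high = total_cost(path_cost_dv(2.0, "L", high_seg), expected_cost=3.0)
print(cost_low, cost_high)  # 6.0 1006.0
```

Under these assumptions the router expands Vdd-matched segments first and falls back to a mismatched one only when nothing else reaches the sink.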
25Outline
- Review and motivation
- Interconnect Leakage Power Reduction using Power-gating
- Interconnect Dynamic Power Reduction using Dual-Vdd
  - FPGA fabrics and algorithms
  - Quantitative evaluation
- Conclusions and Ongoing Work
26Comparison of Low Power Architectures
[Figure: power (watt) vs. clock frequency (MHz) for circuit s38584]
- Dual-Vdd interconnects with fine-grained power gating
  - May have performance degradation due to layout constraint
  - Can reduce more power than purely power-gating unused switches
  - Achieve 9.78% interconnect dynamic power reduction, 38.68% total power saving with 1.5W channel width
    - W is the nominal routing channel width in single-Vdd FPGA
27Impact of Routing Channel Width
- Power reduction is measured at the maximum clock frequency achieved by dual-Vdd interconnects
- Channel width increases from 1.0W to 2.0W
  - Power saving increases from 34.86% to 45%
  - Normalized clock frequency increases from 0.743 to 0.955
28Area Overhead of Vdd-gateable Interconnects
Metric | Single-Vdd (baseline) | Dual-Vdd w/ Power-gating (1.0W) | Dual-Vdd w/ Power-gating (1.5W) | Dual-Vdd w/ Power-gating (2.0W) | [Li et al, ICCAD'04]
Total FPGA area | 7077044 | 11092744 | 15420197 | 20249865 | 22678225
Area overhead (%) | - | 57 | 118 | 186 | 220
- Area overhead is mainly due to power transistors for power-gating capability
- Track duplication with power-gating vs Vdd-programmable interconnects [Li et al, ICCAD'04]
  - More power reduction (45% vs 25%) and less area overhead
  - Mainly due to Vdd-level converter removal
- High Vdd interconnects with power gating are BEST considering area
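The area overhead row follows directly from the total-area row: each architecture's area is compared with the single-Vdd baseline. A minimal Python sketch of that arithmetic, with the function name `area_overhead_percent` chosen here for illustration:

```python
BASELINE_AREA = 7077044  # single-Vdd total FPGA area from the table


def area_overhead_percent(area: int, baseline: int = BASELINE_AREA) -> float:
    """Extra area relative to the single-Vdd baseline, in percent."""
    return 100.0 * (area - baseline) / baseline


# Total FPGA area values copied from the table above
areas = {
    "Dual-Vdd w/ power-gating (1.0W)": 11092744,
    "Dual-Vdd w/ power-gating (1.5W)": 15420197,
    "Dual-Vdd w/ power-gating (2.0W)": 20249865,
    "Li et al, ICCAD04": 22678225,
}

for name, area in areas.items():
    print(f"{name}: {area_overhead_percent(area):.0f}%")
```

Rounded to whole percent, the results are 57, 118, 186 and 220, matching the table.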
29Outline
- Review and motivation
- Interconnect Leakage Power Reduction using Power-gating
- Interconnect Dynamic Power Reduction using Dual-Vdd
- Conclusions and Ongoing Work
30Conclusions and Ongoing Work
- Conclusions
  - Developed power-gateable interconnects w/ virtually no extra SRAM cell
  - Achieved 38.18% total power reduction using Vdd-gateable interconnects
  - Achieved 24.78% interconnect dynamic power reduction, 45.00% total power reduction with duplicated (2W) channel width
- Ongoing work
  - Power-ground design to support dual-Vdd
  - Optimal mix of Vdd-programmable and Vdd-gateable interconnects
  - Architecture evaluation considering Vdd programmability [Lin et al, to appear in FPGA'05]