Title: Keeping Hot Chips Cool
2 Keeping Hot Chips Cool
- Ruchir Puri, Leon Stok, Subhrajit Bhattacharya
- IBM T.J. Watson Research Center
- Yorktown Heights, NY
3 So, What's Going On?
- At the 65nm node, static power is equal to active power
- Clock distribution accounts for half of active power
4 Why Can't We Keep Scaling Vt?
(Figure: plot with a logarithmic axis spanning 1 to 1000; axis names and curves not transcribed)
5 Low Power Opportunities
- Most power reduction techniques exploit this positive slack.
6 Low Power Levers
- Structural Techniques
  - Voltage islands
  - Multi-threshold devices
  - Multi-oxide devices
  - Minimize capacitance by custom design
  - Power-efficient circuits
  - Parallelism in micro-architecture
- Dynamic Techniques
  - Clock gating
  - Power gating
  - Variable frequency
  - Variable voltage supply
  - Variable device threshold
7 Outline
Clock Latch Optimization
Voltage Islands
Power Gating
Leakage Power
Active Power
Clock Power
9 Minimizing Active Power: Coarse-Grained Voltage Islands
- Trade off power for delay by running functional blocks at different voltages
- Can use a mix of low and high Vt to balance performance and leakage
- Switch off inactive blocks to reduce leakage power
- E.g., a telecom ASIC with 1.0 V / 1.2 V islands saved
  - 16% active power
  - 50% standby power
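The power-for-delay trade-off above follows from the standard switching-power relation P = a * C * Vdd^2 * f. The sketch below is illustrative only: the function names and the 50% fraction are hypothetical, and it covers switching power alone, so it is not expected to reproduce the ASIC figures quoted on the slide.

```python
def dynamic_power(c_eff, vdd, freq, activity=1.0):
    """Switching power in watts: activity * C * Vdd^2 * f."""
    return activity * c_eff * vdd ** 2 * freq


def island_saving(v_high, v_low):
    """Fractional switching-power saving for logic moved from v_high to
    v_low, holding capacitance, frequency, and activity fixed."""
    return 1.0 - (v_low / v_high) ** 2


def chip_saving(fraction_moved, v_high, v_low):
    """Chip-level saving when only `fraction_moved` of the switching power
    sits in blocks placed on the low-voltage island (assumed value)."""
    return fraction_moved * island_saving(v_high, v_low)


if __name__ == "__main__":
    # Logic moved from a 1.2 V island to a 1.0 V island.
    print(f"per-block saving: {island_saving(1.2, 1.0):.1%}")
    # Hypothetical: half of the chip's switching power is in the moved blocks.
    print(f"chip-level saving: {chip_saving(0.5, 1.2, 1.0):.1%}")
```

Because the saving is quadratic in voltage, even a 0.2 V drop removes roughly 30% of the switching power of the blocks that can tolerate the extra delay.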
10 Fine-Grained Voltage Islands
PowerPC 405
- No timing degradation and no area increase for the core!
12 Minimizing Clock Power: Local Clock Buffer - Latch Clustering
- Clocks consume a large amount of power in high-performance designs
- A large portion of that power goes to the last stage of the clock tree
- Minimize the capacitive loading on local clock buffers by clustering latches around them
- Tradeoff between latch placement flexibility and clock power savings
- Reduction in clock skew between capturing and launching latches compensates for the loss in latch placement flexibility
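The clustering idea can be illustrated with a toy model. This is a minimal sketch, not the authors' placement algorithm: it assumes latches are points on a grid, that Manhattan wire length to the nearest local clock buffer is a proxy for the capacitive load on that buffer, and that each latch may be displaced by at most `max_move` units (a hypothetical knob standing in for placement flexibility).

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def nearest_buffer(latch, buffers):
    """Return the local clock buffer closest to a latch."""
    return min(buffers, key=lambda b: manhattan(latch, b))


def clock_wire_length(latches, buffers):
    """Total latch-to-buffer wire length, used here as a capacitance proxy."""
    return sum(manhattan(l, nearest_buffer(l, buffers)) for l in latches)


def sign(d):
    return (d > 0) - (d < 0)


def pull_toward_buffers(latches, buffers, max_move):
    """Move each latch up to max_move units toward its nearest buffer,
    spending the budget on x first and then on y."""
    moved = []
    for x, y in latches:
        bx, by = nearest_buffer((x, y), buffers)
        step_x = sign(bx - x) * min(abs(bx - x), max_move)
        budget = max_move - abs(step_x)
        step_y = sign(by - y) * min(abs(by - y), budget)
        moved.append((x + step_x, y + step_y))
    return moved


if __name__ == "__main__":
    latches = [(0, 0), (10, 0), (4, 7)]
    buffers = [(5, 0)]
    clustered = pull_toward_buffers(latches, buffers, max_move=3)
    print(clock_wire_length(latches, buffers), "->",
          clock_wire_length(clustered, buffers))
```

Raising `max_move` lowers the wire-length proxy further but moves latches farther from where the logic placement wanted them, which is the tradeoff the slide describes.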
13 Clock Power Savings
- Reduces total capacitance on the local clock buffer by 25%
- Direct savings in clock power in the random control logic
15 Minimizing Leakage Power: Power Supply Gating
- Leakage power is now more than switching power
  - Limits the performance of microprocessors
- Power gating is one of the most effective ways of minimizing leakage power
  - Cut off power to inactive units/components
  - Dynamic/workload-based power gating
  - Reduces both gate and sub-threshold leakage
  - 20x to 2000x reduction in leakage with little or no cycle-time penalty
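How much a unit gains from workload-based gating depends on how long it is idle as well as on the leakage reduction factor. The sketch below is a simple time-averaged model under stated assumptions: the 1 W leakage figure and the 80% idle fraction are hypothetical, and wake-up energy and transition time are ignored.

```python
def average_leakage(p_leak_on, idle_fraction, reduction):
    """Time-averaged leakage (watts) of a unit that is power gated while idle.

    p_leak_on     -- leakage while the unit is powered
    idle_fraction -- share of time the unit is gated (0..1)
    reduction     -- leakage reduction factor while gated (e.g. 20 or 2000)
    """
    powered = (1.0 - idle_fraction) * p_leak_on
    gated = idle_fraction * p_leak_on / reduction
    return powered + gated


if __name__ == "__main__":
    for reduction in (20, 2000):
        avg = average_leakage(p_leak_on=1.0, idle_fraction=0.8,
                              reduction=reduction)
        print(f"{reduction}x footer: {avg:.4f} W average leakage")
```

In this model the two ends of the quoted range give nearly the same average once the unit is busy part of the time, because the powered interval dominates the total.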
16 Power Gating Concept
Performance on Demand
(Figure: two chip floorplans, each with cores P1 to P4 and an L2, showing the dedicated units off and on)
- Dedicated units off: more power available to scalar units, higher SPEC performance
- Dedicated units on: dedicated units available for higher application performance
17 Normal Operation Mode
18 Sleep Mode
(Figure: core between VDDL and GNDL with a footer device on the virtual ground VGND; footer current IDS versus VDS, with labels VGS = VDD, IDS,MAX, and VGS = 0 V)
During sleep mode, all of the internal capacitive nodes and the VGND node are charged up to near VDD. The footer device must be sized down to reduce standby leakage.
19 Wake-Up Mode
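20 Sleep / Wake / Run State Control
(Figure: state diagram and timing diagram; surviving labels listed below in transcript order)
- Exit sleep state: assert run, assert wake, disable fence, discharge
- Enter sleep state: deassert wake/run, enable fence, charge
- Timing diagram: footer off and run levels; discharge cycle (wake), charge cycles, sleep, run (idle)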
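The sleep, wake, and run labels on this slide suggest a small controller. The sketch below is one possible reading, not the authors' design: the class and method names are hypothetical, and the ordering (wake before run, fence released only once running, fence re-enabled on the way back to sleep) is an assumption, since the original diagram did not survive transcription.

```python
from enum import Enum


class State(Enum):
    SLEEP = "sleep"  # footer off, virtual ground charged toward VDD
    WAKE = "wake"    # discharge cycle: virtual ground pulled back down
    RUN = "run"      # footer fully on, logic usable


class PowerGateController:
    """Toy sleep/wake/run sequencer (assumed ordering, see lead-in)."""

    def __init__(self):
        self.state = State.SLEEP
        self.fence_enabled = True  # outputs isolated while asleep

    def assert_wake(self):
        if self.state is not State.SLEEP:
            raise RuntimeError("wake is only valid from sleep")
        self.state = State.WAKE

    def assert_run(self):
        if self.state is not State.WAKE:
            raise RuntimeError("run is only valid after wake")
        self.state = State.RUN
        self.fence_enabled = False  # disable fence: outputs are valid again

    def enter_sleep(self):
        if self.state is not State.RUN:
            raise RuntimeError("can only sleep from run")
        self.fence_enabled = True   # enable fence before removing power
        self.state = State.SLEEP    # deassert wake/run; nodes charge up


if __name__ == "__main__":
    ctl = PowerGateController()
    ctl.assert_wake()
    ctl.assert_run()
    print(ctl.state, ctl.fence_enabled)
    ctl.enter_sleep()
    print(ctl.state, ctl.fence_enabled)
```

Rejecting out-of-order requests keeps the gated block's outputs fenced whenever its supply is not fully restored.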
21 Footer Selection and Sizing
(Figure: leakage reduction for different footer choices; data labels 15.5x, 20x, 25x, 33x, 50x, 100x; annotations: 10x-20x leakage reduction, < 1% frequency loss)
22 Power vs Performance Tradeoff
130nm Hardware
- 8% performance degradation due to the sleep transistor with 1% area overhead
- Target specification: 250 MHz at 0.9 V, 500 MHz at 1.4 V
- 1% footer size is used for a 2-stage pipelined 40-bit ALU
23 Sleep Transistor Sizing and Performance
130nm Hardware
(Figure labels: less than 2% performance degradation; more than 8% performance degradation)
24 Leakage Power Reduction
130nm Hardware
- Leakage suppression using VDD scaling: 8.4x
- Leakage suppression using power gating structure with 1% area overhead: 2000x
25 Physical Design: External Footer Switch
26 Physical Design: Internal Footer Switch
- Internal fine-grained power gating is more efficient in addressing electromigration and current delivery.
27 Ground Redistribution
- The real chip-level ground distribution is M4 and above. It is unchanged by power gating.
- This part of the redistribution is electrically similar to an unmodified distribution.
(Figure: metal stack cross-section with labels Global ground, Virtual ground, M3, V2, M2, V1, M1, Contact, Logic Device, Footer Cell)
28 Physical Design: Footer Insertion
(Figure: layouts without footers and with footers, showing the footer rows)
29 Power Gating in High-Performance
- Gated and non-gated logic have identical width
- 5% total area overhead for power gating
- 20x leakage reduction
- < 1% performance degradation
(Figure panels: Non-gated Logic, Gated Logic)
30 Power Gating Footer Area Overhead
(Figure annotation: 10 mV virtual ground)
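Footer area follows from how much voltage the virtual ground is allowed to rise above true ground. A first-order sketch using Ohm's law is shown below. It assumes the footer behaves as a resistor whose on-resistance scales inversely with width; the function names, the 0.1 A peak current, and the 1000 ohm*um unit resistance are hypothetical values for illustration, not figures from the slides.

```python
def footer_width_um(i_peak_a, r_on_ohm_um, v_vgnd_max_v):
    """Minimum total footer width (um) keeping the virtual-ground rise
    i_peak * R_on at or below v_vgnd_max, with R_on = r_on_ohm_um / width."""
    return i_peak_a * r_on_ohm_um / v_vgnd_max_v


def virtual_ground_rise_v(i_peak_a, r_on_ohm_um, width_um):
    """Virtual-ground voltage (V) for a given total footer width."""
    return i_peak_a * r_on_ohm_um / width_um


if __name__ == "__main__":
    width = footer_width_um(i_peak_a=0.1, r_on_ohm_um=1000.0,
                            v_vgnd_max_v=0.010)
    print(f"required footer width: {width:.0f} um")
    print(f"rise at that width: "
          f"{virtual_ground_rise_v(0.1, 1000.0, width) * 1e3:.1f} mV")
```

Halving the allowed virtual-ground voltage doubles the required footer width in this model, which is why the tolerated rise drives the area overhead.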
31 Conclusions
- Power is the limiting factor in traditional CMOS scaling and must be dealt with aggressively
- Controlling leakage is crucial for future scaling
- Power gating and voltage islands are effective techniques to minimize leakage and active power
- Special consideration must be given to clock distribution in high-performance designs to minimize clock power
- In order to keep hot chips cool, a holistic power minimization approach across the whole design stack is required, which must include
  - Device-level techniques
  - Circuit-level techniques
  - System-level power management