Title: Power-Efficient Architecture: A Primer and Some Research Ideas
1. Power-Efficient Architecture: A Primer and Some Research Ideas
- Prof. Margaret Martonosi
- Dept. of Electrical Engineering
- Princeton University
2. CPUs in the 1990s
- The Good News
- Technology and architecture improvements give us exponential performance improvements
- 2X performance every 18 months
- The Bad News
- Increases in power dissipation have a slower
doubling rate, but are also exponential
3. Why care about Power?
- Battery-operated devices
- You carry the energy you use: current battery technology stores roughly 20 watt-hours per pound
- Important even in non-battery devices though...
- Heat dissipation
- packaging
- cost
- Environment!?!
- Current delivery: getting tough to deliver large currents into the chip; 30 W at 3 V is 10 A!
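The two figures on this slide follow from simple arithmetic. A minimal sketch; the function names and the 5-hour runtime in the usage lines are illustrative assumptions, not values from the slides:

```python
def supply_current_amps(power_watts: float, supply_volts: float) -> float:
    """Current drawn at a given power and supply voltage (I = P / V)."""
    return power_watts / supply_volts


def battery_weight_pounds(power_watts: float, hours: float,
                          watt_hours_per_pound: float = 20.0) -> float:
    """Battery weight needed to sustain a load, at the slide's ~20 Wh per pound."""
    return power_watts * hours / watt_hours_per_pound


if __name__ == "__main__":
    print(supply_current_amps(30.0, 3.0))     # the slide's 30 W at 3 V case
    print(battery_weight_pounds(30.0, 5.0))   # hypothetical 5-hour runtime
```

Lowering the supply voltage at constant power raises the current that has to be delivered, which is why current delivery gets harder as voltages drop.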
4. Power Dissipation Basics
- In current CMOS circuits
- static power is negligible
- leakage currents
- turn off the clock, and power dissipation -> 0
- dynamic power matters
- charging/discharging capacitance in circuit
- when bits change, power is dissipated
5. Power-Efficient CPUs: Background
- Reducing dynamic power dissipation
- $P_d \propto C V^2 N f$
- Lots of work ongoing at the circuits level
- lots of work on reducing V (square law helps)
- almost all CPU supply voltages are < 3 V now...
- Can always reduce f: useful, but hurts performance!
- Analogy: a car has its best gas mileage when idling, so don't move?!?
- Reducing N
- clock gating...
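The proportionality on this slide can be explored numerically. The sketch below reads C as switched capacitance, V as supply voltage, N as switching activity, and f as clock frequency; that is the usual interpretation, but the slide itself does not spell the symbols out. The values are relative and illustrative only.

```python
def dynamic_power(c: float, v: float, n: float, f: float) -> float:
    """Relative dynamic power, P_d ~ C * V^2 * N * f (arbitrary units)."""
    return c * v**2 * n * f


baseline = dynamic_power(c=1.0, v=3.0, n=1.0, f=1.0)

# Halve one factor at a time and compare against the baseline.
half_v = dynamic_power(c=1.0, v=1.5, n=1.0, f=1.0) / baseline
half_f = dynamic_power(c=1.0, v=3.0, n=1.0, f=0.5) / baseline
half_n = dynamic_power(c=1.0, v=3.0, n=0.5, f=1.0) / baseline

print(half_v, half_f, half_n)
```

Halving V cuts power to a quarter, while halving f or N only halves it. In this model, halving f also doubles the run time of a fixed piece of work, so the energy spent on that work does not change.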
6. Power Optimization: More Global Approaches
- Rather than attacking C, V, f, N within a particular pre-set architecture, look at ways of dramatically restructuring architectures to:
- Streamline work and bit transitions to exactly what the application needs
- Shorten wires
- Replace broadcast structures with point-to-point structures
- Look at ways of performing several operations in parallel at a slower clock rate
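The last point pays off only if the slower clock lets the supply voltage drop. A rough sketch under stated assumptions: duplicating a unit multiplies switched capacitance by the number of copies, throughput is preserved by running each copy at f / units, and the 2.0 V figure is a hypothetical voltage that still meets timing at the slower clock. None of these numbers come from the slides.

```python
def dynamic_power(c: float, v: float, n: float, f: float) -> float:
    """Relative dynamic power, P_d ~ C * V^2 * N * f (arbitrary units)."""
    return c * v**2 * n * f


def parallel_power(units: int, v_slow: float, c: float = 1.0,
                   n: float = 1.0, f: float = 1.0) -> float:
    """Power of `units` copies of a block, each clocked at f / units.

    Assumes capacitance grows linearly with the number of copies;
    v_slow is the assumed supply voltage at the slower clock.
    """
    return dynamic_power(c * units, v_slow, n, f / units)


single = dynamic_power(c=1.0, v=3.0, n=1.0, f=1.0)
same_voltage = parallel_power(units=2, v_slow=3.0)    # parallelism alone
lower_voltage = parallel_power(units=2, v_slow=2.0)   # hypothetical 2.0 V

print(same_voltage / single)
print(lower_voltage / single)
```

At the same voltage, the doubled capacitance and the halved frequency cancel, so the model shows no gain. The saving comes entirely from the V squared term once the voltage is reduced.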
7. Operand-Value-Based Analysis and Optimizations
- Currently: some compiler and hardware optimizations are specific to operand values
- Constant propagation, algebraic simplification
- Null arithmetic elimination
- Our Goals
- Tailor computations to particular categories of operand values being computed on
- e.g. narrow-bitwidth operands
- Consider both software and hardware mechanisms
- also both compile-time and run-time mechanisms
8. Example: Narrow-Width Operands
- 64-bit architectures are largely motivated by addresses getting larger; data has not increased as quickly
- Multimedia instruction set extensions like MMX try to parallelize operations on narrow-width operands
- Works well when the programmer gives sufficient type information to infer operand sizes
- but programmers aren't always fastidious about defining variables to be as small as possible.
- Goal: harness optimizations for narrow-width operands even when the programmer hasn't defined quantities as narrow-width.
9. Motivation: Narrow-Width Operands are Common!
[Figure; axis label: Cumulative Percent Occurrence]
- Multimedia instruction sets (e.g. MMX) take advantage of them for sub-word parallelism
- But general-purpose apps also have many operations that turn out to have small (< 16 bits) operand sizes
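A cumulative operand-width distribution of the kind this slide refers to could be computed from a trace of operand values roughly as follows. This is a sketch under assumptions: width is measured as the number of two's-complement bits including the sign bit, the function names are illustrative, and the sample values are made up. It is not the measurement method behind the slide's figure.

```python
def significant_bits(value: int, word_bits: int = 64) -> int:
    """Bits needed to hold a signed two's-complement value, sign bit included."""
    if not -(1 << (word_bits - 1)) <= value < (1 << (word_bits - 1)):
        raise ValueError("value does not fit in the word")
    # For a negative value, ~value has the same number of magnitude bits.
    magnitude = value if value >= 0 else ~value
    return magnitude.bit_length() + 1


def cumulative_percent(values, word_bits: int = 64):
    """Percent of operands needing at most w bits, for w = 1..word_bits."""
    values = list(values)
    if not values:
        raise ValueError("empty trace")
    counts = [0] * (word_bits + 1)
    for v in values:
        counts[significant_bits(v, word_bits)] += 1
    running, result = 0, []
    for w in range(1, word_bits + 1):
        running += counts[w]
        result.append(100.0 * running / len(values))
    return result


sample = [0, 1, -1, 100, 40000, 2**40]   # made-up operand values
cdf = cumulative_percent(sample)
print(cdf[15])   # percent of operands that fit in 16 bits
```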
10. Optimizing for Narrow-Width Operations: What to Do?
- When performing operations with narrow-width operands, the upper bits are not needed in the computation
- What can be done to avoid the wasted upper bits of work?
- Save power through selective clock gating
- Increase performance through MMX-style packing
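To illustrate what MMX-style packing means, the sketch below emulates in software four independent 16-bit additions carried out in one 64-bit word. It is an illustration of the sub-word-parallel idea only, not MMX itself and not the mechanism proposed in these slides; the lane layout and helper names are assumptions.

```python
LANE_BITS = 16
LANES = 4
LANE_MASK = (1 << LANE_BITS) - 1
HIGH_BITS = 0x8000_8000_8000_8000   # top bit of each 16-bit lane


def pack(lanes):
    """Pack four 16-bit unsigned values into one 64-bit word (lane 0 lowest)."""
    if len(lanes) != LANES or any(not 0 <= x <= LANE_MASK for x in lanes):
        raise ValueError("expected four values in the range 0..65535")
    word = 0
    for i, x in enumerate(lanes):
        word |= x << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def packed_add(a, b):
    """Lane-wise add modulo 2**16; no carry crosses a lane boundary."""
    low = (a & ~HIGH_BITS) + (b & ~HIGH_BITS)   # add the low 15 bits of each lane
    return low ^ ((a ^ b) & HIGH_BITS)          # fold in each lane's top bit


result = unpack(packed_add(pack([1, 2, 3, 4]), pack([10, 20, 30, 40])))
print(result)
```

Masking off each lane's top bit before the add guarantees that a carry out of the low 15 bits stays inside its own lane; the final XOR restores the correct top bit.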
11. Clock Gating Architecture
- We propose selective clock gating based on the operand values.
- Observe operand values going to/from registers
- For operations with 2 narrow-width operands, disable upper bits of the functional unit.
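The gating decision described above can be modeled in a few lines. The slides do not say how narrowness is detected, so this sketch assumes a zero-detect on the upper 48 bits of a 64-bit operand (a 16-bit cutoff); the names and the sample trace are hypothetical.

```python
WORD_BITS = 64
NARROW_BITS = 16   # assumed cutoff; the slides do not fix the detection width
WORD_MASK = (1 << WORD_BITS) - 1


def is_narrow(operand: int) -> bool:
    """True when every bit above the low NARROW_BITS is zero."""
    return (operand & WORD_MASK) >> NARROW_BITS == 0


def gate_upper_bits(a: int, b: int) -> bool:
    """Gate the upper part of the functional unit only if both operands are narrow."""
    return is_narrow(a) and is_narrow(b)


def gated_fraction(operand_pairs) -> float:
    """Fraction of operations in a trace that could run with the upper bits gated."""
    pairs = list(operand_pairs)
    if not pairs:
        raise ValueError("empty trace")
    return sum(gate_upper_bits(a, b) for a, b in pairs) / len(pairs)


trace = [(1, 2), (70000, 1), (5, 9), (2**40, 2**41)]   # made-up operand pairs
print(gated_fraction(trace))
```

Because this version only checks for zeros, small negative values (whose upper bits are all ones in two's complement) are not treated as narrow; a real design might also detect sign-extended operands.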
12. Operand-Based Clock Gating: Power Savings Results
[Figure: Integer Unit Power Consumption (mW)]
- Total power saved: 50-60% of the power consumed by the integer execution unit
- High-performance microprocessors: integer execution unit is 10-15% of overall power
- 5-10% of total power
- In VLIW and DSPs, this number is likely to be even larger
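The whole-chip figure is the product of the two ranges above it. A small sketch of that arithmetic, reading the slide's numbers as percentages; the function name is illustrative:

```python
def total_savings_range(unit_savings, unit_share):
    """Whole-chip savings range from (low, high) fractions.

    unit_savings: fraction of the unit's power that is saved.
    unit_share:   the unit's share of total chip power.
    """
    low = unit_savings[0] * unit_share[0]
    high = unit_savings[1] * unit_share[1]
    return low, high


low, high = total_savings_range(unit_savings=(0.50, 0.60),
                                unit_share=(0.10, 0.15))
print(f"{low:.1%} to {high:.1%} of total power")
```

The product works out to about 5% to 9%, consistent with the slide's rounded 5-10%.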
13. More Power-Efficient Architecture Work
- Other Value-based optimizations
- Dynamic strength reduction
- to eliminate unneeded ALU ops
- Explicit Value Steering
- Register renaming, operand bypassing, reservation stations are all dynamic techniques that support passing data output by one calculation directly to the inputs of another.
- Look at more explicit, program-controlled ways of doing this with lower power.
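The slide does not define dynamic strength reduction further. One plausible reading, offered here only as an assumption, is a run-time check on operand values that replaces an expensive operation with a cheaper one, or skips it. The sketch below does this for multiplication of non-negative integers; the function names and operation labels are hypothetical.

```python
def is_power_of_two(x: int) -> bool:
    """True for 1, 2, 4, 8, ..."""
    return x > 0 and (x & (x - 1)) == 0


def reduced_multiply(a: int, b: int):
    """Return (product, operation_used) for non-negative a and b."""
    if a < 0 or b < 0:
        raise ValueError("sketch handles non-negative operands only")
    if a == 0 or b == 0:
        return 0, "none"          # result known without an ALU operation
    if b == 1:
        return a, "none"          # multiplying by one just copies the operand
    if a == 1:
        return b, "none"
    for x, y in ((a, b), (b, a)):
        if is_power_of_two(y):
            # Multiplying by 2**k is a left shift by k.
            return x << (y.bit_length() - 1), "shift"
    return a * b, "multiply"


print(reduced_multiply(7, 8))
print(reduced_multiply(6, 7))
```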
14. Power Efficiency: A Systems-Level Approach
- Compiler/CPU interaction: need to give the compiler hooks into the hardware for optimizing program power
- System-level power management: OS and run-time systems can manage power by prioritizing activities. Some activities need not be done as often if power is a concern.
- E.g. turn off MS Word's spell-checking on a long-distance flight
- Communication/computation tradeoffs: power really exacerbates all of the standard Comm/Comp tradeoffs. Need to study protocols and OS issues related to this.