Title: CADENCE DESIGN SYSTEMS, INC.
1Low Power Design Towards Multi-Voltage Domains
2The vision: Ambient Intelligence
- Power efficiency is one cornerstone of ambient intelligence
- Flexibility and adaptability are another
3Semiconductor System Chips
- Trend 1: Process technology migration to CMOS
- relentless digitization of signals and systems
- Trend 2: Increasing use of embedded intelligence
- variety of (multiple) compute engines available on-chip
- Trend 3: Networking of embedded intelligence
- multiple comm. front-ends, networking available on-chip
- The consequence
- smart spaces, intelligent interfaces, sensor networks
- Integrated circuit chips are driving capability increases with cost reductions in systems.
4As Chips Move
- From complete circuit boards to System-Chips
- With analog/digital, RF/baseband, computing, communications
- Optical and optoelectronic lenses, prisms, mirrors
- From racks to packages
- (bio)chemical lab-in-a-package
- Microfluidic, microelectronic, micromechanical processing
5CMOS Outlook
However
www.intel.com
6Guiding Observations
- Transistors (and silicon) are free
- Power is the only real limiter
- Optimizing for frequency AND/OR area may achieve neither
www.intel.com
7Technology Challenges
- Power (active and leakage)
- Interconnects (RC delay)
- Variations
www.intel.com
8Why worry about Power? Battery Size/Weight
9Power is a Limiter
[Figure: processor power (Watts, log scale from 0.1 to 100,000) versus year (1971 to 2008) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium, and P6]
Power delivery and dissipation will be prohibitive!
Source: Borkar, De, Intel
10Power Density will Increase
[Figure: power density (W/cm2, log scale from 1 to 10,000) versus year (1970 to 2010) for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium, and P6]
Power densities too high to keep junctions at low temps
Source: Borkar, De, Intel
11Dual-Mode CT model
- The tile operates in two states
- ON: executes a task (N inst/sec), burns power $P_{on}$
- OFF: functionally idle, burns power $P_{off}$
- Power minimization: minimize the number of executed instructions, maximize slack
[Figure: a tile switching between On and Off over time t]
The shortest executing program achieves minimum power
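The two-state model can be made concrete with a small calculation. The sketch below assumes a fixed period in which the tile is ON for as long as the task runs and OFF for the remaining slack. The function name and all numeric values are hypothetical, chosen only for illustration.

```python
def tile_energy(instructions, rate_ips, p_on, p_off, period_s):
    """Energy in joules used by a two-state tile over one period.

    Assumes the tile is ON while executing and OFF for the remaining slack.
    """
    t_on = instructions / rate_ips        # time spent executing (ON state)
    if t_on > period_s:
        raise ValueError("task does not fit in the period")
    slack = period_s - t_on               # time spent idle (OFF state)
    return p_on * t_on + p_off * slack


# Hypothetical tile: 1e9 inst/sec, 2 W when ON, 0.25 W when OFF, 1 s period.
long_program = tile_energy(8e8, 1e9, p_on=2.0, p_off=0.25, period_s=1.0)
short_program = tile_energy(4e8, 1e9, p_on=2.0, p_off=0.25, period_s=1.0)
```

With these numbers the shorter program spends more of the period in the OFF state and uses less energy, which is the point the slide makes.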
12Processor Sleep Modes
13CMOS Energy and Power
- $E = C_L V_{DD}^2 P_{0 \to 1} + t_{sc} V_{DD} I_{peak} P_{0/1 \to 1/0} + V_{DD} I_{leak} / f$
- $P = C_L V_{DD}^2 f + t_{sc} V_{DD} I_{peak} f + V_{DD} I_{leak}$
$f = P_{0 \to 1} \cdot f_{clock}$
Dynamic power (80% today and decreasing relatively)
Short-circuit power (5% today and decreasing absolutely)
Leakage power (15% today and increasing)
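As a rough illustration of the three power components, the sketch below evaluates the slide's power expression, read here as P = CL·VDD²·f + tsc·VDD·Ipeak·f + VDD·Ileak with f = P0→1·fclock. The function name and every numeric value are assumptions for illustration, not data from the presentation.

```python
def cmos_power(c_load, vdd, p_switch, f_clock, t_sc, i_peak, i_leak):
    """Split CMOS power (watts) into dynamic, short-circuit, and leakage parts."""
    f = p_switch * f_clock                    # effective switching frequency
    dynamic = c_load * vdd ** 2 * f           # charging/discharging the load
    short_circuit = t_sc * vdd * i_peak * f   # direct-path current while switching
    leakage = vdd * i_leak                    # flows regardless of switching
    total = dynamic + short_circuit + leakage
    return {"dynamic": dynamic, "short_circuit": short_circuit,
            "leakage": leakage, "total": total}


# Hypothetical aggregate values for a small block.
breakdown = cmos_power(c_load=1e-9, vdd=1.2, p_switch=0.1, f_clock=1e9,
                       t_sc=50e-12, i_peak=0.1, i_leak=0.02)
shares = {name: value / breakdown["total"]
          for name, value in breakdown.items() if name != "total"}
```

Only the first two terms scale with the switching frequency. The leakage term depends on the supply voltage alone.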
14Dynamic Energy Consumption
[Figure: CMOS gate with supply Vdd, input Vin, output Vout, and load capacitance CL]
Energy/transition = $C_L V_{DD}^2 P_{0 \to 1}$
15Leakage Energy
[Figure: gate with an OFF transistor and output Vout, annotated with the leakage components below]
- Drain junction leakage
- Sub-threshold current
- Gate leakage
Independent of switching
16Active Power Reduction
[Figure: Multiple Vdd, with blocks labeled Slow, Fast, Slow fed from a high and a low supply voltage]
- Vdd scaling will slow down
- Mimic Vdd scaling with multiple Vdd
- Challenges
- Interface between low and high Vdd
- Delivery and distribution
17Exploiting Variable Supply
- Supply voltage can be dynamically changed during system operation
- Cubic power savings
- Circuit slowdown
- Just-in-time computation
- Stretch execution time up to the max tolerable
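The cubic saving follows from dynamic power scaling as Vdd² × f, under the first-order assumption that the achievable clock frequency scales linearly with Vdd. The sketch below applies that assumption to just-in-time computation: it picks the lowest frequency that still meets a deadline and reports the resulting power and energy ratios. The function name and numbers are hypothetical.

```python
def just_in_time_scaling(cycles, f_max_hz, deadline_s):
    """Stretch execution to the deadline and estimate the savings.

    Assumes frequency scales linearly with Vdd, so P ~ Vdd^2 * f ~ f^3.
    """
    scale = cycles / (f_max_hz * deadline_s)  # fraction of f_max actually needed
    if scale > 1.0:
        raise ValueError("deadline cannot be met even at f_max")
    return {
        "freq_scale": scale,          # also the Vdd scale under the assumption
        "power_ratio": scale ** 3,    # cubic power saving
        "energy_ratio": scale ** 2,   # power ratio times the stretched run time
    }


# Hypothetical task: 5e8 cycles, 1 GHz maximum clock, 1 s deadline.
result = just_in_time_scaling(cycles=5e8, f_max_hz=1e9, deadline_s=1.0)
```

Running at half speed cuts power to one eighth in this model, while the task's energy falls only to one quarter because it runs twice as long.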
18Variable-supply Architectures
- High-efficiency adjustable DC-DC converter
- Adjustable synchronization
- Variable-frequency clock generator [Chandrakasan96]
- Self-timed circuits [Nielsen94]
- Example: Crusoe embedded processor [Transmeta00]
[Figure: block diagram with Prog ROM, Dec, CPU, Data RAM, Power Manager, DC-DC converter, and VCO; signals Vdd and CLK]
19Transmeta Processor
20Transmeta (Contd)
21Compilers can help!
A = foo(); if A > 0 then Flong(A) else Fshort(A)
[Figure: control-flow graph with branches A<0 and A>0, each block annotated with execution time T = K/f]
Compiler inserts voltage switching points based on control/dataflow analysis
f250MHz
f1100MHz
22Voltage Scaling for Low Power
- Dynamic power $\propto \alpha_{0 \to 1} \times \mathrm{freq} \times (C \, V_{dd}^2)$
- For a CMOS gate, when Vdd ↓
- power ↓↓
- delay ↑
- How to minimize delay penalty while enjoying power gain?
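To put rough numbers on the trade-off, the sketch below compares dynamic power and gate delay at two supply voltages. The quadratic power term comes from the slide. The delay model, delay ∝ Vdd / (Vdd − Vth)^α (the alpha-power law), is an assumption brought in for illustration, as are the Vth and α values. The supply voltages 3.3 V and 1.9 V are the VDDH and VDDL quoted later in this deck.

```python
def relative_dynamic_power(vdd, vdd_ref):
    """Dynamic power at vdd relative to vdd_ref, same frequency and activity."""
    return (vdd / vdd_ref) ** 2


def relative_delay(vdd, vdd_ref, vth, alpha):
    """Gate delay at vdd relative to vdd_ref under the assumed alpha-power law."""
    if min(vdd, vdd_ref) <= vth:
        raise ValueError("supply must exceed the threshold voltage")

    def delay(v):
        return v / (v - vth) ** alpha   # delay ~ Vdd / (Vdd - Vth)^alpha

    return delay(vdd) / delay(vdd_ref)


# Assumed device parameters: Vth = 0.6 V, alpha = 1.5.
power_ratio = relative_dynamic_power(1.9, 3.3)
delay_ratio = relative_delay(1.9, 3.3, vth=0.6, alpha=1.5)
```

Under these assumptions the lower supply roughly thirds the dynamic power while slowing the gate by well over 50%, which is why the lower voltage is attractive only where timing slack exists.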
23Voltage Scaling for Low Power
- Minimizing the delay penalty due to voltage scaling
- Circuit-level
- lowering threshold voltage
- heavily process-dependent
- Architecture-level
- speedup (pipelining, concurrency), then downscale supply voltage, or
- match supply voltage with throughput requirement
- multiple supply voltages in the same design
- one supply voltage for each block
24Basics of Multiple Supply Voltage
25Basics of Multiple Supply Voltage
26The Problem
- PMOS not turned off when input is weak-1, static power loss may offset the gain!
$V_{DDL} < V_{DDH} - |V_{th,p}|$
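The condition on this slide can be checked numerically. The sketch below reads it as VDDL < VDDH − |Vth,p|: a PMOS whose source sits at VDDH and whose gate is driven only to VDDL still sees a source-gate voltage above its threshold magnitude, so it conducts. The function name and the threshold value are hypothetical.

```python
def pmos_conducts_static_current(vddl, vddh, vth_p):
    """True if a VDDH gate driven by a VDDL 'weak 1' leaves its PMOS on.

    The PMOS source-gate voltage is vddh - vddl; it conducts when that
    exceeds |vth_p|, i.e. when vddl < vddh - |vth_p|.
    """
    return vddl < vddh - abs(vth_p)


# Assumed PMOS threshold of -0.6 V.
wide_gap = pmos_conducts_static_current(vddl=1.9, vddh=3.3, vth_p=-0.6)
narrow_gap = pmos_conducts_static_current(vddl=3.0, vddh=3.3, vth_p=-0.6)
```

With the assumed threshold, a 1.9 V output driving a 3.3 V gate leaves the PMOS on, while a 3.0 V output does not. The first case is the one that calls for a level converter.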
27Level Converter
28Basics of Multiple Supply Voltage
29Clustered Voltage Supply (CVS)
- The following kinds of connections are possible
- Inter VDDL gates
- Inter VDDH gates
- VDDH to VDDL gates
K. Usami and M. Horowitz, Clustered Voltage Scaling Technique for Low Power Designs, ISLPD 95
30How does CVS work ?
31How does CVS work ?
32How does CVS work ?
- Backward Graph Traversal
- Depth First Search algorithm from the Primary Outputs to the Primary Inputs
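The backward traversal can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' exact algorithm. It assumes the netlist is a DAG, that level conversion happens at the primary outputs, that a gate may move to VDDL only if every gate it drives is already at VDDL, and that a move is kept only if the worst path still meets the cycle time. The function name, data layout, and delay values are all hypothetical.

```python
def cluster_voltage_scaling(fanin, outputs, delay_h, delay_l, cycle_time):
    """Return the set of gates assigned to VDDL.

    fanin:   dict mapping each gate to the list of gates driving it
    outputs: gates that drive primary outputs (the traversal starts here)
    delay_h, delay_l: per-gate delay at VDDH and at VDDL
    """
    fanout = {g: [] for g in fanin}
    for gate, drivers in fanin.items():
        for d in drivers:
            fanout[d].append(gate)

    low = set()       # gates currently assigned to VDDL
    decided = set()   # gates whose supply has been fixed

    def worst_arrival():
        memo = {}

        def arrival(g):
            if g not in memo:
                own = delay_l[g] if g in low else delay_h[g]
                memo[g] = own + max((arrival(d) for d in fanin[g]), default=0.0)
            return memo[g]

        return max(arrival(g) for g in outputs)

    def visit(g):
        if g in decided:
            return
        if any(s not in decided for s in fanout[g]):
            return                    # revisit once every fanout is decided
        decided.add(g)
        if all(s in low for s in fanout[g]):   # never let VDDL drive VDDH
            low.add(g)
            if worst_arrival() > cycle_time:
                low.discard(g)        # timing violated: keep the gate at VDDH
        if g in low:
            for d in fanin[g]:        # depth-first, toward the primary inputs
                visit(d)

    for g in outputs:
        visit(g)
    return low


# Hypothetical four-gate circuit: g1 and g2 feed g3, g2 also feeds g4.
fanin = {"g1": [], "g2": [], "g3": ["g1", "g2"], "g4": ["g2"]}
delay_h = {g: 1.0 for g in fanin}
delay_l = {g: 1.5 for g in fanin}
vddl_gates = cluster_voltage_scaling(fanin, ["g3", "g4"], delay_h, delay_l, 2.6)
```

In this example the two output gates move to VDDL, while g1 and g2 stay at VDDH because slowing either one would push the g3 path past the cycle time. The result is a VDDL cluster at the output side, with only VDDH-to-VDDL connections crossing the boundary.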
33Extended Clustered Voltage Supply (ECVS)
K. Usami et al., Automated Low Power Technique Exploiting Multiple Supply Voltages Applied to a Media Processor, IEEE Journal of Solid State Circuits, 1998
34Extended Clustered Voltage Supply (ECVS)
35Supply Voltage Reduction in Clock Tree
36Results
- ECVS scheme showed a power reduction of 39% - 57% (47% on average) on an Mpact media processor chip (0.3um technology, 3-metal CMOS process)
- VDDH = 3.3V, VDDL = 1.9V
- ECVS can be applied with Vth reduction and transistor resizing so as to maximize the gain in power reduction
- 60% of the paths have a delay of half of the cycle time
- 0.3% of the paths are critical
- Power overhead of level converters was only 8%
- Power on the clock network was reduced by 69%; clock skew was almost the same as the original
- Area overhead was 15%, chip size increased by 7%
37Design Approaches for MVS
- Macro based MVS
- Allows functional units in the SoC to be operated at different Vdd
- Module based MVS
- Modules in SoC to be run at different Vdd
- Fine Grained MVS (CVS)
- Gates are run at different Vdds
- Cluster of voltage islands to be created
38MVS with multi Vth approach
- Optimum VDDL is around 60-70% of original VDD (single Vth)
- Optimum VDDL is around 50% of original VDD (dual Vth)
- 40-50% power reduction for multi VDD
- Advantage for multi Vth is mainly in terms of negating delay increase
39Power figure for Multi Vdd and Multi Vth
K = dynamic power / leakage power
40Power Grid routing issues
41Power Grid routing issues
42Power Grid routing issues
43Multi-Voltage flow to be at Cadence
Our area of investigation