Title: Design challenges in sub-100nm high performance microprocessors
1Design challenges in sub-100nm high performance
microprocessors
- Nitin Borkar, Siva Narendra, James Tschanz,
Vasantha Erraguntla - Circuit Research, Intel Labs
- nitin.borkar_at_intel.com
- siva.g.narendra_at_intel.com
- james.w.tschanz_at_intel.com
- vasantha.erraguntla_at_intel.com
2Outline
- Section 1: Challenges for low power and high performance (90 mins)
  - Historical device and system scaling trends
  - Sub-100nm device scaling challenges
  - Power delivery and dissipation challenges
  - Power-efficient design choices
- Section 2a: Circuit techniques for variation tolerance (90 mins)
  - Short channel effects
  - Adaptive circuit techniques for variation tolerance
3Outline (contd.)
- Section 2b: Circuit techniques for leakage control (90 mins)
  - Leakage power components
  - Leakage power prediction
  - Leakage reduction and control techniques
- Section 3: Full-chip power reduction techniques (90 mins)
  - Micro-architecture innovations
  - Coding techniques for interconnect power reduction
  - CMOS-compatible dense memory design
  - Special-purpose hardware
  - Design methodologies: challenges for CAD
4Section 1
- Challenges for low power and high performance
5Moore's Law on scaling
6 Scaling of dimensions
7Transistors on a chip
[Chart: transistor count (MT, log scale) vs. year, 1970-2010, from the 4004 through the Pentium 4; 2X growth in 1.96 years]
Transistors on lead microprocessors double every 2 years
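The doubling trend above can be sketched numerically. The 42 MT starting point (roughly Pentium 4-class, year 2000) is an assumed anchor for illustration, not a figure from the tutorial:

```python
# Sketch of the slide's trend: lead-microprocessor transistor count doubles
# roughly every 2 years. Base point (42 MT in 2000) is an assumption.
def transistors_mt(year: int, base_mt: float = 42.0, base_year: int = 2000,
                   doubling_years: float = 2.0) -> float:
    return base_mt * 2 ** ((year - base_year) / doubling_years)

print(round(transistors_mt(2010)))  # 1344 MT, inside the 200M-1.8B range quoted later
```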
8Die size growth
[Chart: die size (log scale) vs. year, 1970-2010, 4004 through Pentium 4; ~7% growth per year, 2X growth in 10 years]
Die size grows by ~14% per generation to satisfy Moore's Law
9Frequency
[Chart: frequency (MHz, log scale) vs. year, 1970-2010, 4004 through Pentium 4]
Lead microprocessor frequency doubles every 2 years
10Performance
Applications will demand TIPS performance
11Power
[Chart: power (W, log scale) vs. year, 1971-2000 and projected, 4004 through Pentium 4 and beyond]
Lead microprocessor power continues to increase
12Obeying Moore's Law...
200M to 1.8B transistors on the lead microprocessor
13Vcc will continue to reduce
[Chart: supply voltage (V, log scale) vs. year, 1970-2010, falling toward 1.35V, 1.15V, and 0.9V in future generations]
Only ~15% Vcc reduction per generation to meet frequency demand
14Constant Electric Field Scaling
15Active capacitance density
Active capacitance grows 30-35% each technology generation
16Power will be a problem
[Chart: power (W, log scale) vs. year, 1971-2008, 4004 through P4, extrapolating to 500W, 1.5kW, 5kW, and 18kW]
Power delivery and dissipation will be prohibitive
17Closer look at the power
[Chart: power (W, log scale) vs. year, 2002-2008; the "will be" trend runs 500W, 1.5kW, 5kW, 18kW, while the "should be" trend runs 135W, 225W, 375W, 623W]
18Advanced transistor design
[Cross-section: advanced transistor features]
- Shallow, highly doped source/drain extensions
- Thin Tox
- Halo/pocket implants
- Retrograde well
- Shallow trench isolation
- n-well
- Deep source/drain
19Intel's 15 nm bulk transistor
R. Chau et al., IEDM 2000
20Transistor scaling trends - SCE
[Figure: transistor cross-section annotated with Le, Tox, Dj, and depletion depth D, which define the aspect ratio]
- Short channel effects (SCE), as measured by the transistor aspect ratio, have been worsening with scaling
21Transistor scaling challenges - Dj
- Junction depth reduction
  - Device channel length can decrease for the same SCE
  - But series resistance to the channel increases
22Transistor scaling challenges - Tox
- Thinning gate oxide
- Increased gate tunneling leakage
- Electrical thickness is 2X physical thickness
- Gate stress now limits max VCC
- Solutions
- New decoupling caps
- Modified oxides/gate materials
- Model gate leakage in circuit simulation
23VCC and VT scaling
24Vcc scaling Soft errors
- Vcc and capacitance scaling with technology reduce the stored charge
- Soft errors become prominent in logic circuits
  - No error correction in logic circuits
- Storage nodes per chip increasing
- Higher soft error rate at the chip level
25Motivation
- Soft error rate (SER) per bit staying constant in future processes
  - T. Karnik et al., 2001 VLSI Circuits Symposium
- Need to reduce SER/bit
Goal Reduce chip-level SER with no performance
penalty and minimum power penalty
26Measured Latch Data
[Chart: measured latch errors and SER improvement (X) vs. supply voltage, 0.5-1.3V; hardened latch vs. original, up to ~2.25X improvement]
T. Karnik et al, 2001 VLSI Circuits Symposium
- Will need 2X SER improvement in latches with no
performance loss.
27VT vs. leakage
- Leakage rises as VT is lowered
- MOS has a sub-threshold slope of ~110mV/decade
  - Lowering VT by 50mV → ~3X leakage
- Solutions
  - Dual VT
  - Stacking of off gates
  - Controlled back-gate bias?
  - Multiple process technologies: mobile vs. performance?
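The 50mV → ~3X relationship follows directly from the sub-threshold slope; a minimal sketch, using the ~110mV/decade figure from the slide:

```python
# Subthreshold leakage grows as 10^(-VT/S) for slope S (mV/decade),
# so lowering VT by dVT multiplies Ioff by 10^(dVT/S).
def leakage_multiplier(delta_vt_mv: float, slope_mv_per_decade: float = 110.0) -> float:
    """Factor by which Ioff grows when VT is lowered by delta_vt_mv."""
    return 10 ** (delta_vt_mv / slope_mv_per_decade)

print(round(leakage_multiplier(50), 2))  # 2.85, i.e. roughly 3X leakage
```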
28Sub-threshold Leakage
Sub-threshold leakage current will increase exponentially
Assumptions: 0.25µm, Ioff = 1nA/µm; ~5X increase each generation at 30°C
29Leakage Power
Excessive sub-threshold leakage power
30Leakage Power increases
[Chart: Ioff (nA/µm, log scale) vs. temperature, 30-100°C, for 0.18µm through 0.05µm generations]
Drain leakage will have to increase to meet frequency demand, resulting in excessive leakage power
31Wide Domino Functionality
[Schematic: static gate vs. clocked D1/D2 domino gates with outputs Q1/Q2 and inputs A, B, C]
- Lower AC noise margin (~Vt)
- Ioff could limit NOR fan-in
- High activity → higher power (~2X)
- Irreversible logic evaluation
- Poor scalability
- High performance: ~30% over static
- High fan-in NOR → fewer logic gates
- High fan-in complex gates possible
- Smaller area
32Bitline Delay Scaling Problem
- Bit line swing limited by parameter mismatch and differential noise
- Cell stability degrades with Vt lowering
- Bit line delay ∝ (Cap/W)·Vswing / (Ion/W − rows·Ioff/W)
- Reduction of rows per bitline is approaching its limit
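The delay relation above can be played with numerically; all parameter values in this sketch are illustrative assumptions, not measured data:

```python
# Slide's bitline delay relation:
#   delay ∝ (C/W) * Vswing / (Ion/W - n_rows * Ioff/W)
def bitline_delay(c_per_w, v_swing, ion_per_w, ioff_per_w, n_rows):
    # Leakage of the unselected rows subtracts from the read current.
    drive = ion_per_w - n_rows * ioff_per_w
    if drive <= 0:
        raise ValueError("leakage of unselected cells overwhelms the read current")
    return c_per_w * v_swing / drive

# Doubling Ioff erodes the effective drive and slows the read:
fast = bitline_delay(2e-15, 0.1, 1e-3, 1e-8, 256)
slow = bitline_delay(2e-15, 0.1, 1e-3, 2e-8, 256)
print(slow > fast)  # True
```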
33Restrict transistor leakage
[Chart: frequency (MHz, log scale) vs. year, 1985-2010, 386 through Pentium 4, projecting 2.5, 4, 5.5, and 7 GHz]
Reduce leakage → frequency will not double every 2 years
34Interconnect scaling trends
35Interconnect performance
R increases faster at lower metal levels, C increases faster at higher levels, and RC increases 40-60%
36Interconnect distribution
Interconnect distribution does not change
significantly
37Wire Scaling
- uArch favoring short wires
- Repeaters
38Optimum Repeater
- Vary
- N size, P size
- Repeater distance
- Metal width, space
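Optimal repeater count and size follow textbook (Bakoglu-style) expressions; a sketch with assumed driver parameters, not values from the tutorial:

```python
import math

# Classic repeater insertion for a long RC wire.
# R0, C0: output resistance and input capacitance of a minimum-sized inverter (assumed).
def optimal_repeaters(R_wire, C_wire, R0, C0):
    """Returns (number of repeaters k, sizing factor h) minimizing wire delay."""
    k = math.sqrt(0.4 * R_wire * C_wire / (0.7 * R0 * C0))
    h = math.sqrt(R0 * C_wire / (R_wire * C0))
    return k, h

k, h = optimal_repeaters(R_wire=1000.0, C_wire=1e-12, R0=10000.0, C0=1e-15)
print(round(k, 1), round(h, 1))  # about 7.6 repeaters, each ~100X minimum size
```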
39P, V, T Variations
40Frequency SD Leakage
[Scatter plot: normalized frequency vs. normalized leakage (Isb), 0.18µm, 1000 samples; ~30% frequency spread and ~20X leakage spread]
41Vt Distribution
[Histogram: % of chips vs. ΔVTn (mV), 0.18µm, 1000 samples; distribution spans roughly −40mV to +32mV with σ ≈ 30mV]
42Frequency Distribution
[Histogram: % of chips vs. normalized frequency, 1.00-1.37]
43Isb Distribution
[Histogram: % of chips (log scale) vs. normalized Isb, 1.00-20.11]
44Supply Voltage Variation
- Activity changes
- Current delivery: IR and L(di/dt) drops
- Dynamic: ns to 10-100µs time scales
- Within-die variation
45Handling di/dt
- Bulk decoupling
- High-frequency decoupling
- VRM response
- Local decoupling
- Silver box response
- On-die decoupling
46Vcc Variation Reduction
- On-die decoupling capacitors reduce ΔVcc
  - Cost: area, and gate oxide leakage concerns
- On-die voltage down-converters / regulators
47Temperature Variation
Cache: 70°C
Core: 120°C
- Activity and ambient change
- Dynamic: 100-1000µs time scales
- Within-die variation
48Major Paradigm Shift
- From deterministic design to probabilistic and statistical design
- A path delay estimate is probabilistic (not deterministic)
- Multi-variable design optimization for
  - Yield and bin splits
  - Parameter variations
  - Active and leakage power
  - Performance
49Performance Efficiency of mArch
Pollack's Rule
[Chart: growth (X) vs. technology generation, 1.5µm to 0.18µm; area ratio (lead/compaction) of ~2-3X vs. performance ratio of ~1.5-1.7X. Note: performance measured using SpecINT and SpecFP]
- Implications (in the same technology)
  - A new microarchitecture takes 2-3X the die area of the last uArch
  - It provides 1.5-1.7X the performance of the last uArch
We are on the wrong side of a square law
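Pollack's Rule says performance grows roughly as the square root of area in the same process; a minimal sketch (the 2-3X area figures are from the slide, the square-root form is the rule's usual statement):

```python
import math

# Pollack's Rule: performance ~ sqrt(area) within one process generation.
def pollack_performance(area_ratio: float) -> float:
    return math.sqrt(area_ratio)

for area in (2.0, 3.0):
    print(f"{area:.0f}X area -> {pollack_performance(area):.2f}X performance")
# 2X area -> 1.41X, 3X area -> 1.73X: the wrong side of a square law.
```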
50Frequency Performance
- Frequency increased 61X
  - 18.3X from process technology
  - An additional 3.3X from uArch
- Performance increased 100X
  - 14X from process technology
  - An additional 7X from uArch and design
51Design EfficiencymArch
In the same process technology, compare Scalar → Super-scalar → Dynamic → Netburst:
2-3X growth in area, 1.4X growth in integer performance, 1.7X growth in total performance, 2-2.5X growth in power
Pollack's Rule in action: power inefficiency
52Power Efficiency - Circuits
Assumptions: activity of 0.2 for static and 0.5 for domino; clock consumes 40% of full-chip power
High-power circuits contribute to power inefficiency
53Power density will increase
Power density too high to keep junctions at low
temp
54Thermal Solutions
[Figure: thermal stack from ambient (Ta) to junction (Tj); heat sink with θsa (sink-to-ambient resistance), interface with θcs (case-to-sink resistance), and package with θjc (junction-to-case resistance), plus attachment and mounting layers]
55Thermal CapabilityToday
Package: polymer thermal interface, 1.5mm Cu heat spreader, 0.35°C/W (typical)
Thermal interface material: thermal grease / phase-change material, 0.12°C/W
Heat sink: Al folded fin with Cu base, 3.5 x 2.5 x 2 at 400g, 0.38°C/W (5% for RM fan)
[Chart: thermal resistance stack; heat sink 0.38°C/W + TIM 0.12°C/W + package 0.35°C/W, θJA ≈ 0.82°C/W]
TJ = 90°C, TA = 45°C, θJA = 0.82°C/W → P = (90−45)/0.82 ≈ 55W
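The power budget calculation at the bottom of the slide is just Ohm's law for heat; a sketch using the slide's numbers:

```python
# Max dissipable power for a given junction-to-ambient thermal resistance:
#   P = (Tj - Ta) / theta_JA
def max_power(tj_c: float, ta_c: float, theta_ja: float) -> float:
    """Power (W) that keeps the junction at tj_c with ambient ta_c."""
    return (tj_c - ta_c) / theta_ja

print(round(max_power(90, 45, 0.82)))  # 55 W, matching the slide
```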
56Thermal CapabilityFuture
Must improve on all fronts; no silver bullet
57Shrinking Size Quieter
[Chart: system volume (cubic inches) shrinking from PC tower (~3000) through mini tower, µ-tower, and slim line to small PC]
Small and quiet, yet high performance
58Thermal Budget
[Chart: desktop PC average selling price (US$) vs. year, 1995-2000; performance PCs falling from ~$2200, value PCs below ~$1000. Source: Dataquest, Personal Computers]
Shrinking ASP means a shrinking budget for thermals
59Thermal
- Throttling / clock gating
- Circuits and sizing
  - A 10% performance gain at the same power can be translated into a 25% power reduction by changing VCC
- Improved die attach / package
- Can enable new uArch / floorplanning
  - Spread and reduce power
60Thermal Envelope Cost
[Chart: thermal solution unit cost ($) vs. power (W), log-log; extrusions (Celeron), high-aspect-ratio heat sinks (Pentium 4), HAR HS with heat pipe (Itanium, mobile high-perf), liquid immersion, liquid spray, refrigeration]
61The Odds
[Chart: projected heat-sink volume (in³), projected air flow rate (CFM), and thermal budget (°C/W) vs. power (W), 0-250W, with Pentium III and Pentium 4 marked]
More power → more thermals → higher heat-sink volume → higher air flow. Is this cheaper, smaller, and quieter?
62What's next
- Circuit techniques for variation tolerance
- Circuit techniques for leakage control
- Full-chip power reduction techniques
- 30 min quiz
63Section 2a
- Circuit techniques for variation tolerance
64Moore's Law on scaling
65Scaling of dimensions
66Requires die size growth or same die size
67(No Transcript)
68Drain current (Linear scale)
[Figure: ID-VG curve; the linear scale shows the VT intercept, the log scale shows IOFF at VGS = 0]
69Barrier Lowering (BL)
[Band diagram: n+ source and drain around a p-channel of length L with depletion widths Xd; the source barrier height falls as L shrinks (barrier lowering)]
70Drain Induced BL (DIBL)
71Impact of variation in L
[Chart: VT (V) vs. channel length (µm); BL curve (VDS ≈ 0) and DIBL curve (VDS = VDD)]
ΔL → ΔVT → ΔION, ΔIOFF
72180nm measurements
Necessary to make circuits less sensitive to VT (ION, IOFF) variation
73Transistor scaling
[Figure: transistor cross-section annotated with L, Tox, Dj, and depletion depth D, which define the transistor aspect ratio]
Short channel effects increase with scaling
74Transistor scaling challenges - Dj
[Chart: NMOS and PMOS drive currents IDN, IDP (mA/µm) vs. junction depth (nm), 0-200nm. S. Thompson et al., 1998; S. Asai et al., 1997]
75Transistor scaling challenges - Tox
76High-K Gate Dielectric
- Lower gate leakage
- Higher Cox at a given gate leakage
77Parameter variation
Device- and chip-level parameters
Parameter variations increase with scaling
Adapt VDD and VT to reduce chip-level variation
78Scaling challenges summary
- L, VDD, VT scaling
  - → Increasing parameter variation
  - → Increasing sub-threshold leakage power
  - → Increasing gate leakage power
- Product life cycle reduced from 3.6 years to 2 years
  - → Concurrent engineering
  - → Better prediction models
79VT variation categories
80Adaptive Body Bias (ABB)
81Side effects of ABB
(2) Apply reverse bias
Determine impact of adaptive body bias on
within-die VT variation.
82Short Channel MOS VT
BL↑ → λb↑; DIBL↑ → λd↑
83Within-die VT Variation
Within-die VT variation is primarily due to CD
variation
84Solutions
- Bi-directional adaptive body bias
- Several separate bias generators on-chip
85Testchip die micrograph
- 150nm CMOS, 5.3 mm x 4.5 mm die
- 21 subsites per die
- Microprocessor critical path
  - Frequency = Min(F1..F21)
  - Power = Sum(P1..P21)
- Separate VBS for each subsite
- 62 dies per wafer
86Sub-site micrograph
21 sub-sites with separate body bias for each
sub-site
87CUT schematics
88Simple Adaptive Body Bias (S-ABB)
Neglects WID variation
Area overhead: ~2%
89Effectiveness of S-ABB
Frequency variation σ/µ (%): NBB 4.1, S-ABB 1.0
[Histograms: frequency distributions for S-ABB vs. NBB]
90Adaptive Body Bias (ABB)
Accounts for WID variation
Area overhead: ~2-3%
91Effectiveness of ABB
Frequency variation σ/µ (%): NBB 4.1, ABB 0.69
[Histograms: frequency distributions for ABB vs. NBB]
92Adaptive Bias Distribution
[Chart: distribution of adaptive NMOS/PMOS bias combinations (FBB/RBB) across dies; 1, 13, 38, and 10 dies per combination]
93Frequency vs. Critical Path Count (NCP)
- Frequency µ and σ reduce as NCP increases
- Frequency distribution unchanged for NCP > 14
94WID Delay Variation vs. Logic Depth
NMOS σ/µ = 5.6%, PMOS σ/µ = 3.0%
[Histogram: number of samples (%) vs. delay variation (%); delay σ/µ = 4.2%]

                     Miyazaki, ISSCC 2000   This work
Path depth           49                     16
Device σ/µ (%)       2.4                    4.27
Frequency σ/µ (%)    0.55                   4.17
95Within-Die Adaptive Body Bias (WID-ABB)
Compensates for WID variation
Area overhead: similar to ABB
96Effectiveness of WID-ABB
97% in highest bin
Frequency variation σ/µ (%): ABB 0.69, WID-ABB 0.21
[Histograms: frequency distributions for WID-ABB vs. ABB]
97Within-Die Bias Distributions
[Scatter plot: circuit block count over PMOS vs. NMOS body bias (V), spanning the quadrants P,N FBB; P FBB / N RBB; P RBB / N FBB; and P,N RBB]
98Bias Resolution
Bias resolution   ABB: % dies F > 1   ABB: σ/µ   WID-ABB: % dies F > 1.075   WID-ABB: σ/µ
500mV             79                  2.87       2                           1.89
300mV             100                 1.47       66                          0.50
100mV             100                 0.69       97                          0.21
- 300mV bias resolution is sufficient for ABB
- WID-ABB requires 100mV bias resolution
99ABB summary
- D2D and WID variations impact microprocessor frequency and leakage
- ABB improves the die acceptance rate from 50% to 100%
- ABB is most effective when WID variations are considered
- Compensating for WID variations by WID-ABB increases the number of high-frequency dies from 32% to 97%
100Adaptive VDD VT
For iso-frequency:
- Fast die: decrease VDD or increase VT
- Slow die: increase VDD or decrease VT
Psw ∝ a·VDD² ; Pleak ∝ 10^(−VT/S)
101Testchip goals
- Body bias (VBS) for VT modulation
- Measure frequency improvement with
  - Adaptive VDD
  - Adaptive VBS
  - Adaptive VDD + VBS
  - Adaptive VDD + within-die VBS
- Subject to total active and standby power constraints
102Baseline measurements
103Adaptive VDD vs. Fixed VDD
Active power limit: 10W/cm²; standby power limit: 0.5W/cm²
Fixed VDD = 1.05V: frequency reduced to meet the power limit
Adaptive VDD, 20mV resolution: VDD and frequency changed simultaneously
104VDD resolution requirement
[Histogram: die count vs. frequency bin (0.9-1.05) for fixed VDD = 1.05V, adaptive VDD at 50mV resolution, and adaptive VDD at 20mV resolution]
A minimum resolution of 20mV in VDD is required
105Adaptive VDD vs. Adaptive VBS
[Histogram: die count vs. frequency bin (0.9-1.05) for adaptive VDD at 20mV resolution vs. adaptive VBS at 100mV resolution]
Dies in the target frequency bin: 6% with fixed VDD, 10% with adaptive VDD, 16% with adaptive VBS
106Adaptive VDD VBS
[Histogram: die count vs. frequency bin (0.9-1.05) for adaptive VBS vs. adaptive VDD + VBS]
Adaptive VDD + VBS is more effective than adaptive VDD or adaptive VBS alone
107VDD distribution
[Histogram: die count vs. VDD (0.99-1.07V)]
Adaptive VDD + VBS results in lower VDD than adaptive VDD alone
108VBS distribution
[Histograms: VBS distributions for adaptive VBS vs. adaptive VDD + VBS]
Adaptive VDD + VBS results in more dies with FBB than adaptive VBS alone
109Adaptive VDD Within-die VBS
Adaptive VDD + within-die VBS is most effective
110AVDD ABB Summary
- In 150nm CMOS, with 10W/cm² active and 0.5W/cm² standby power density limits:
  - 20mV resolution in VDD is required
  - 100mV resolution in VBS is required
111Neighborhood VT variation
The devices of interest in close proximity can be of the same or different polarity.
- Voltage biasing: impacts sense amps, diff amps, current mirrors, etc.
- Current biasing: impacts clock generation circuits, switching thresholds, etc.
112Voltage biasing
Linear threshold voltage mismatch of a matched device pair for 500 mV forward body bias, zero body bias, and 500 mV reverse body bias.
113Application to sense-amp
Traditional sense-amplifier
New sense-amplifier
114Simulation results
1.5 V, 1 mV/ps ramp rate, and 110°C
115Current biasing
Basic iso-current biasing
116Application
Non-overlapping 2f clock generation
117Current biasing
Process insensitive current biasing
118Iref existing techniques
- Reference voltage to reference current conversion
  - Bandgap circuit with off-chip resistor
  - MOS reference voltage with off-chip resistor
- Direct reference current generation
  - MOS-based, temperature compensation only
119Objective
Generate a process-compensated current with thin-tox digital CMOS devices and without external resistors
120Device measurement
0.18 µm CMOS technology, 30°C, uncompensated current:
n = 77, σ = 235.8 µA, µ = 1.6 mA, σ/µ ≈ 15%
121Subtraction method
y1 and y2 vary with x, but yD = y1 − y2 (evaluated at and around x = xmid) is insensitive to x
122Example
Choose m1 ≠ m2 and n2; this provides a non-zero yD insensitive to x around xd for the proper n1
123Illustration
n2 = 1, m1 = 4.2, m2 = 2, xd = 15 → n1 = 0.13
[Chart: y1 (varies 35%), y2 (varies 47%), and yD (varies only 6%) vs. x around xd]
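A numerical sketch of the subtraction principle: two process-sensitive quantities with different power laws are subtracted so that, with coefficients chosen to match their slopes at the nominal point, the difference is first-order insensitive. The specific numbers below are illustrative assumptions, not the slide's:

```python
# Subtraction method: y1 = a1*x**m1, y2 = a2*x**m2; pick a1 so the slopes
# match at the nominal point xd, making yD = y1 - y2 flat to first order.
def make_subtracted(m1, m2, a2, xd):
    # Slope matching: a1*m1*xd**(m1-1) == a2*m2*xd**(m2-1)
    a1 = a2 * m2 / m1 * xd ** (m2 - m1)
    y1 = lambda x: a1 * x ** m1
    y2 = lambda x: a2 * x ** m2
    yD = lambda x: y1(x) - y2(x)
    return y1, y2, yD

y1, y2, yD = make_subtracted(m1=4.2, m2=2.0, a2=1.0, xd=1.0)
x = 1.10  # a 10% process shift in x
print(f"y1 shifts {y1(x)/y1(1)-1:+.0%}, y2 shifts {y2(x)/y2(1)-1:+.0%}, "
      f"yD shifts {yD(x)/yD(1)-1:+.0%}")
```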
124MOS devices in saturation
Using long-channel, wide devices in saturation: ID ≈ (µ·Cox·W / 2L)·(VGS − VT)²
125Compensation by subtraction
126VT generation circuit
[Schematic: VT generation circuit producing 1VT and 5VT references from VDD and ½VDD, with device sizes 20/2, 15/2, 15/2, 20/2, and 2/20]
127Subtraction circuit
[Schematic: subtraction circuit; currents I1 and I2 mirrored with ratio z1/z2 = 1/8, Vsg2 ≈ 2VT, output Iref = I1 − I2]
128Device measurement
0.18 µm digital CMOS technology, 30°C
[Histograms: measured I1 and I2 distributions, n = 112]
129Compensated current
0.18 µm digital CMOS technology, 30°C
n = 112, σ = 17.4 µA, µ = 305 µA, σ/µ = 5.7%
130Sub-1 V operation
(b, a)   Vddmin (V)   Temp (°C)   Iref variation   Vdd sensitivity
(5, 2)   0.9          30          5.0%             0.3% per 100 mV
(3, 2)   0.6          30          5.2%             0.4% per 100 mV
Low-voltage operation enabled by redesigning the VT generation circuit
131Process corner simulationresults
0.18 µm digital CMOS technology, 30°C, VDD = 0.9 V, z1/z2 = 1/6
[Chart: normalized current across process corners, slow-slow to fast-fast; uncompensated Iu varies −16% to +22%, compensated Iref stays within ±5%]
7.6X smaller variation than the uncompensated current
132Summary on Iref
- Subtraction technique for compensation
- The compensation technique reduces reference current variation from 38% to 5% at a Vdd of 0.9 V
- Variation remains ~5% at a Vdd of 0.6 V
133Section 2a Summary
- Device parameter variation increases with scaling → design margins increase
- Adaptive schemes are required to minimize the impact of device variation on the design margin of digital circuits
- Voltage and current biasing schemes minimize the impact of variation on analog circuits
134Section 2b
- Circuit techniques for leakage control
135Outline
- Leakage sources impact of variations
- Leakage estimation with variations
- Static leakage reduction techniques
- Dynamic leakage reduction techniques
- Leakage-tolerant circuits
136Sources of Leakage
137Transistor leakage mechanisms
From Keshavarzi, Roy, Hawkins (ITC 1997)
1. PN junction leakage
2. Weak inversion S-D leakage
3. DIBL and contribution from SCE
4. GIDL
5. Punchthrough current
6. Narrow width effects
7. Gate oxide leakage
8. Hot carrier injection
138Components of leakage
139Subthreshold leakage trends
- Historic Vt scaling: ~15% per generation
- S-D and gate leakage impact: 3-5X increase
- Significant component of total power
- Serious dynamic-circuit robustness penalty
140Leakage vs. switching power
Leakage > 50% of total power!
[Chart: leakage vs. switching power across the 250nm, 180nm, 130nm, 100nm, and 70nm generations]
- Key requirements
  - Accurate prediction of chip leakage power
  - Techniques to reduce chip leakage power
141DIBL impact on leakage
[Chart: VT (V) vs. channel length (µm); BL curve (VDS ≈ 0) and DIBL curve (VDS = VDD). Higher IOFF due to DIBL]
142Variation impact on leakage
[Chart: intrinsic IOFF (A, log scale, 1e-11 to 1e-5) vs. 1/IDlin, 0.18µm CMOS (150nm technology), 110°C, VD = 1V, NBB = 0V; worst-case-L (Lwc) devices leak orders of magnitude more than nominal-L (Lnom) devices]
Shorter L transistors contribute more to chip
leakage
143Transistor scaling challenges - Tox
144High-K Gate Dielectric
- Lower gate leakage
- Higher Cox at a given gate leakage
145Source/Drain Tunneling Leakage
146Leakage Estimation and Modeling
147Leakage estimation
Prior techniques
148New model
Includes within-die variation. After simplification using error-function properties, the chip leakage reduces to a closed-form expression.
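The reason variation must be included: Ioff depends exponentially on channel length, so the mean leakage exceeds the leakage of the mean-L device. A Monte Carlo stand-in for the closed-form model, with all device numbers assumed:

```python
import random

random.seed(0)
I_NOM = 1e-8          # Ioff at nominal L (A/um), assumed
SENSITIVITY = 0.05    # decades of Ioff per nm of L reduction, assumed
SIGMA_L = 5.0         # within-die L sigma (nm), assumed

def ioff(delta_l_nm):
    # Exponential (per-decade) dependence of leakage on channel-length offset.
    return I_NOM * 10 ** (-SENSITIVITY * delta_l_nm)

samples = [ioff(random.gauss(0.0, SIGMA_L)) for _ in range(100_000)]
mean_leak = sum(samples) / len(samples)
print(mean_leak / I_NOM)  # > 1: variation inflates total chip leakage
```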
149Applications
150Measurement results
0.18 µm 32-bit microprocessors (n = 960)
50% of the samples are within 20% of the measured leakage, compared with 11% and 0.2% of the samples using prior techniques
151Static Leakage Reduction 1) Transistor Stacks
152Leakage of Stacks
Stack leakage is 5-10X smaller
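The stack effect can be reproduced with a textbook subthreshold model; the slope, DIBL factor, and unit leakage below are assumed values, not the tutorial's silicon data:

```python
# Two-transistor stack effect: with both NMOS gates at 0, the internal node Vx
# settles where the two subthreshold currents balance. The top device then sees
# negative Vgs and reduced Vds, which is why stack leakage is 5-10X smaller.
S = 0.100    # subthreshold slope (V/decade), assumed
ETA = 0.1    # DIBL coefficient (V/V), assumed
VDD = 1.0
I0 = 1e-8    # leakage of one off device at Vgs=0, Vds=VDD (A), assumed

def i_sub(vgs, vds):
    # Normalized so that i_sub(0, VDD) == I0.
    return I0 * 10 ** ((vgs + ETA * (vds - VDD)) / S)

def stack_leakage():
    lo, hi = 0.0, VDD
    for _ in range(60):                    # bisect on the internal node voltage
        vx = (lo + hi) / 2
        if i_sub(-vx, VDD - vx) > i_sub(0.0, vx):
            lo = vx                        # top leaks more -> node rises
        else:
            hi = vx
    return i_sub(0.0, (lo + hi) / 2)

print(I0 / stack_leakage())  # close to an order-of-magnitude reduction
```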
153ScalabilityStack Effect
Stack effect becomes stronger with scaling
154Exploiting natural stacks
32-bit Kogge-Stone adder
                      High VT   Low VT
Energy overhead       1.64 nJ   1.84 nJ
Savings               2.2 mA    38.4 mA
Min time in standby   84 ms     5.4 ms

Reduction   Avg    Worst
High VT     1.5X   2.5X
Low VT      1.5X   2X
155Stack forcing
[Chart: delay penalty vs. leakage reduction for stack forcing at equal loading]
Low-Vt stack forcing reduces leakage power by ~3X
156Static Leakage Reduction 2) Dual-Vt Process
157Dual VT design technique
Leakage 3X smaller (active and standby), with no performance loss
158Optimum choices of high low Vt
75-100mV VT difference is optimal
159Dual-VT and sizing
- Techniques
- DVT
- min-lvt
- min-area
- min-pwr
Optimize the design with concurrent dual-VT allocation and sizing
160Results total power
[Chart: total power (normalized), split into switching and leakage, for min-lvt, min-pwr, min-area, and DVT at 1.96 GHz (high-VT target), 2.21 GHz, and 2.30 GHz (low-VT target)]
- Total power reduced by 6-8% over DVT-only
- Leakage power reduced by 20% over DVT-only
161Results total device width
[Chart: total device width (normalized), split into high-VT and low-VT, for min-lvt, min-pwr, DVT, and min-area at 1.96 GHz (high-VT target), 2.21 GHz, and 2.30 GHz (low-VT target)]
- Less low-VT usage than DVT-only
- Trade-off between area and low-VT usage
162Results area comparison
Frequency: 2.3GHz
[Layouts: DVT vs. min-lvt, min-pwr, and min-area]
min-lvt: 15% area overhead
20% burn-in power reduction
163Effect of leakage change
- Push leakage in manufacturing to increase frequency
- In a dual-VT design, ideally push low-VT only
[Chart: path count (x1000) vs. path delay (ps); DVTS original at 2.2 GHz vs. DVTS with low-VT leakage push at 2.76 GHz. High-VT paths do not speed up]
164Enhanced dual-VT design
- Allow for efficient frequency change
- Insert additional low-VT devices
[Chart: path count (x1000) vs. path delay (ps); EDVTS at 2.2 GHz vs. EDVTS with low-VT leakage push at 2.76 GHz]
Dual-VT insertion should consider process scaling
165Dual-VT sizing summary
- Dual-VT sizing reduces low-VT usage by 30-60% compared with DVT-only
- Leakage power reduced by 20%
- Dual-VT designs offer a 9% frequency improvement over single-VT
- The enhanced design allows frequency increase through low-VT leakage push
166Dynamic Leakage Reduction 1) Body bias
167Reverse body bias
Total leakage power measured on a 0.18µm test chip
Tech        0.35 µm   0.18 µm
Opt. RBB    2V        0.5V
Ioff red.   1000X     10X
RBB reduces S-D leakage; less effective with shorter L, lower VT, and scaling
168Impact of scaling on RBB effectiveness
RBB becomes less effective with technology scaling
169Switching leakage reduction forward body bias
20% power reduction at 1GHz; 8% higher frequency at iso-power; 20X lower idle-mode leakage
170Router chip with forward body bias
150nm technology
Digital core with on-chip PMOS body bias
generator (BG).
171Power and performance gain by FBB
33% performance gain at 1.1V! 25% power reduction at 1GHz!!
172Standby leakage control by FBB
173Dynamic Leakage Reduction 2) Dynamic sleep transistor
174Active leakage control
Sleep transistor
Body bias
175 32-bit ALU overview
Technology: 130nm dual-VT CMOS
Die area: 1.61 x 1.44 mm²
Transistors: 160K
Frequency: 4.05GHz at 1.28V + 450mV FBB, 75°C
CBG: central bias generator; LBG: local bias generator
176Sleep transistor layout
ALU
Sleep transistor cells
177Body bias layout
[Layout: ALU core LBGs and sleep transistor LBGs around the ALU]
Number of ALU core LBGs: 30
Number of sleep transistor LBGs: 10
PMOS device width: 13mm
Area overhead: 8%
178Frequency leakage impact
Reference: no sleep transistor, 450mV FBB to core, 1.35V, 75°C

Scheme                                  Frequency degradation   Leakage reduction   Area increase
No over/underdrive or sleep body bias   2.3%                    37X                 6%
200mV over/underdrive                   1.8%                    44X                 7%
Sleep body bias (FBB → RBB)             1.8%                    64X                 8%
Dynamic body bias (FBB → ZBB)           0%                      1.9X                8%

PMOS sleep transistor vs. PMOS body bias
179Virtual supply convergence
- Convergence time is dependent on capacitance (> 1ms with decap, < 1ms without)
- A leaky MOS decap on virtual VCC gives better leakage savings for > 1ms idle times
180Total power equal frequency
TON = 100 cycles, 75°C, a = 0.05, F = 4.05GHz
[Chart: total power (mW), split into switching, leakage, and LBG overhead, for clock gating only, clock gating + body bias (8% savings, leakage down 45%), and clock gating + sleep transistor (15% savings, leakage down 77%, switching up ~3%)]
181Leakage-Tolerant Circuits 1) Dynamic register file
182Impact of increasing leakage
- Leakage disturbs the local bit line (LBL)
- Noise can result in erroneous evaluation
- Wider addressing exacerbates the problem
183Dual-Vt design for robustness
- High-Vt devices and stronger keepers mitigate leakage and improve robustness
- Contention causes a severe delay penalty
184Source-follower NMOS (SFN)
- As leakage charges the output node, feedback
reduces the leakage
Automatic Vgs reduction and reverse Vbs
185Leakage bypass w/ stack forcing
- Extra PMOSs supply leakage currents
- Leakage is bypassed away from LBL
- Extra NMOS device forces stack node
Stack node
186Better robustness vs. delay
Larger keeper smaller skew
187Energy vs. delay for SFN
- Robustness fixed at 10% across all points
- Leakage-tolerant techniques not only improve robustness but reduce energy as well
- SFN width is not as competitive because of the PMOS pull-up
188Energy vs. delay for LBSF
- LBSF is faster despite a 3-stack pull-down in the LBL and a 2-stack in the GBL
- Comparable total widths in the pull-down stacks yield similar capacitance
189Summary of LBSF and SFN
                            Full LBSF   SFN   DVT SFN LBSF
Delay improvement (%)       33          10    31
Energy reduction (%)        37          24    38
Total width reduction (%)   47          -3    26
- Improved RF robustness without delay penalty
- The advantages of LBSF and SFN improve as leakage increases
190Leakage-Tolerant Circuits 2) L1 cache using bitline leakage reduction (BLR)
191Bitline leakage reduction
- Memory cell: HVT and Lmax
- Solution: larger, dual-Vt cell for the L1 cache
- 3 types of cells
  - HVT Lmax
  - HVT Lmin
  - DVT Lmin
192Intrinsic and effective read current
- The DVT Lmin cell's IINT is 35% larger, but its IEFF is smaller
193Bitline leakage reduction
WL at −100mV → Vmax; Vvc = Vmax + 100mV
194BLR test chip results
- 2Kb bank of a 16Kb L1 cache
BLR: 25% higher read current, 3% larger cell area
195BLR performance
1.2V, 110°C
- Bitline delay improved from 91ps to 75ps
- Read delay reduced from 159ps to 132ps
- Bitline development rate improved by 8%
196Leakage-Tolerant Circuits 3) Conditional keeper for burn-in
197Leakage at burn-in (BI)
- BI conditions (elevated voltage and temperature) further challenge the leakage issue
- Higher leakage at higher temperature
  - Thermal runaway and positive-feedback effects
- Impact of leakage (especially at BI) on circuit functionality
- Stability of IDDQ measurement with BI stress
198Keepers need to be upsized for burn-in
- Larger keepers increase delay under normal conditions
199Burn-in conditional keeper
[Schematic: burn-in conditional keeper; normal-mode keeper PK1 plus an effective burn-in keeper PKB gated by the burn-in signal (BI), over a minimum-sized clocked pull-down NMOS]
200Burn-in keeper 100nm comparison
[Chart: normalized delay (normal condition) vs. NOR fan-in (number of inputs) for STD vs. BI-CKP, with the burn-in keeper sized as a fraction of the pull-down]
Larger delay improvement for wider dynamic gates
201Summary
- Control of leakage power is becoming crucial
- Leakage estimation is necessary during the design phase
- Static and dynamic techniques can be used for leakage control
  - Dual-VT process and stack effect
  - Dynamic sleep transistor and body bias
- Leakage-tolerant circuits
  - Cache and memory leakage techniques
  - Burn-in leakage reduction
202Section 3
Full-chip power reduction techniques and design
methodologies
203Micro architecture innovations
204mArchitecture Tradeoffs
- Higher target frequency with
- Shallow logic depth
- Larger number of critical paths
- But with lower probability
205Improve mArch Efficiency
Thermals and power delivery are designed for full HW utilization
[Timeline: a single thread (ST) stalls waiting for memory; multi-threading (MT1-MT3) fills the wait slots]
Multi-threading improves performance without impacting thermals and power delivery
206Still obey Moores Law!
[Chart: transistors (MT, log scale) vs. year, 2000-2008; actual vs. Moore's Law]
Total transistor count meets Moore's Law
207Fred's Rule
[Chart: growth (X) vs. technology generation, 1.5µm to 0.18µm; area ratio (lead/compaction) vs. performance ratio]
In the same process technology 2X Area ? 1.4X
Performance
208Reduced die size causes Performance gap
30-60% performance loss even after meeting Moore's Law
209Exploit MemoryLow PD
- Large on-die caches provide
  - Increased data bandwidth and reduced latency
  - Hence, higher performance for much lower power
210Memory has lower power density
Exploit memory !
211Increase memory area
[Chart: memory area (% of total die) vs. year; 29% (2000), 41%, 54%, 55%, 57% (2008)]
Use > 50% of die area for memory
212Memory trend
[Chart: on-die memory (KB, log scale) vs. year, 1980-2010; from 8-16KB through 1M, 2.5M, 5.5M, and 12M to 24M]
213Power density is reduced
Full-chip power density is reduced, but local power density will be high
214Can DRAM help?
- Transistor performance is not critical for DRAM
- Don't need a large retention time
- 10X more storage in the same area and power
- TB/s bandwidth at < 10ns latency
215Embedded DRAM on logic
Provides 10X the memory at the same area and power as SRAM
216Embedded DRAM could improve performance
Source: Glenn Hinton, 1999
- Embedded DRAM provides
  - 10X increase in on-die memory
  - 1,000X increase in bandwidth
  - 10X reduction in latency
217On-die DRAM Applications
(1)
(2)
218130nm test chip
[Die photos: 130nm test-chip capacitor options; N/P inversion, P/N accumulation, and P/P depletion cells with dimensions from 0.52µm x 0.52µm up to 1.61µm x 1.10µm]
219Area and Power Comparison
- P/P is the best from a power and area perspective
220Interconnect power reduction
221Motivation CC Multiplier (CCM)
[Figure: victim wire between two aggressors with coupling caps Cc and ground cap Cg, for CCM = 0, 1, 2]
CCM = Cc multiplier
- The Rint·Cint delay of long busses is a key speed limiter
- Coupling capacitance (Cc) is a large component of Cint
Cint = Cg + CCM·(2Cc)
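The effective-capacitance model above is easy to evaluate; the capacitance values in this sketch are illustrative assumptions:

```python
# Cint = Cg + CCM * (2*Cc), where the coupling multiplier CCM depends on how
# the two neighbors switch (0: same direction, 1: quiet, 2: opposite).
CG = 0.10   # ground capacitance per mm (pF), assumed
CC = 0.15   # coupling capacitance to each neighbor per mm (pF), assumed

def c_int(ccm: float) -> float:
    return CG + ccm * (2 * CC)

for ccm in (0, 1, 2):
    print(f"CCM={ccm}: Cint={c_int(ccm):.2f} pF/mm")
# Worst case (CCM=2) is 7X the best case here: a static bus must be designed
# for CCM=2, while monotonic schemes cap the worst case at CCM=1.
```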
222Coupling Capacitance Scaling
- Coupling capacitance (metal-4) remains a large fraction of Cint despite the move from Al to Cu
223Static Bus (SB)
- Simple scheme with no timing constraints
- Minimize delay through optimal repeater insertion
- CCM of 2 has negative impact on delay
224Dynamic Bus
- Domino timing applied to interconnect
  - Monotonic transitions
- Reduced collinear capacitance
  - Static (worst case): 2X
  - Dynamic (worst case): 1X
- F2 repeater required; susceptible to noise
- Higher transition activity when the input is 1
- Static CMOS inverters drive all segments
225Dynamic Bus Advantages
- Capacitance effects reduced
  - Collinear capacitance reduced 2X
  - Orthogonal capacitance unchanged
- Inductance effects reduced
  - Can oppose transitions on a static bus
  - Can reduce capacitive effects on a dynamic bus
226Static Pulsed Bus (SPB)
- A static pulse generator (PG) generates a pulse on a data transition
- A toggle FF (TFF) restores the correct data at the bus end
- The leading edge is critical, so repeaters are skewed
227SPB Benefits
- In SPB, data transitions are monotonic
- worst-case CCM = 1 and repeaters can be skewed
- Similar to dynamic bus but (1) has no clock
overhead and (2) its energy scales with switching
activity
228SB Vs. SPB Delay
SPB Delay Breakdown: RC + repeaters 77%, other 23%
- SPB reduces delay by 22% as a result of
- Repeater skewing
- CCM < 1 due to useful noise coupling
229SB Vs. SPB Energy
- SPB reduces energy by 12% due to
- Smaller skewed repeater sizes
- Smaller CCM
230SB vs. SPB Different Bus Lengths
[Charts: SB vs. SPB at iso-energy and at iso-delay across bus lengths]
- At iso-energy, SPB improves delay by 15-25%
- At iso-delay, SPB reduces energy by 12-25%
- At iso-delay, SPB reduces current/width by 26-34%
231SPB summary
- SPB has monotonic data transitions
- ⇒ worst-case CCM = 1
- ⇒ repeaters can be skewed
- Unlike dynamic bus
- ⇒ no clock precharge-evaluate energy and routing
- ⇒ energy consumption is data activity dependent
- For 1500µm-4500µm metal-4 lines, SPB
- ⇒ improves delay by 15-25%
- ⇒ reduces energy by 12-25%
- ⇒ reduces width by 34-42%
- ⇒ reduces peak-current by 26-34%
232Transition-Encoded Bus (TEB)
- Encoder circuit
- XOR of previous and current input
- Domino compatible output
- Decoder circuit
- XOR of previous output and bus state
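The XOR encoder/decoder pair described above can be modeled bit-serially in a few lines (a behavioral sketch, not the domino circuit itself):

```python
# Transition-encoded bus sketch: the encoder sends a 1 only when the data
# bit changes (XOR of previous and current input); the decoder XORs each
# received transition flag with its previous output to rebuild the data.

def teb_encode(bits, prev=0):
    out = []
    for b in bits:
        out.append(b ^ prev)   # 1 = a transition occurred
        prev = b
    return out

def teb_decode(flags, prev=0):
    out = []
    for f in flags:
        prev ^= f              # toggle output on each transition flag
        out.append(prev)
    return out

data = [0, 1, 1, 0, 1]
enc = teb_encode(data)         # [0, 1, 0, 1, 1]
assert teb_decode(enc) == data
```

Because the bus carries transition flags, its switching activity tracks the data's transition count, which is the energy property the slides exploit.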
233TEB Advantages
- Dynamic bus performance improvement
- Collinear capacitance reduction
- Static bus energy
- Transition dependent switching activity
- No noise-sensitive F2 repeater required
- Regains noise immunity of CMOS inverter
234Energy Comparison
[Chart: energy vs. delay, static vs. transition-encoded bus; 9mm metal3, 130nm process, 1.2V, 30ºC]
235Results
- Averaged over 3-9mm buses
- Metal3 in 130nm technology, 1.2V, 30ºC
236TEB summary
- Transition-encoded bus
- High performance, energy efficient on-chip
interconnect technique
- 32% active area reduction
- 49% peak current reduction
- Transition dependent energy consumption
- ⇒ Energy savings at aggressive delay targets
- Enables 10-35% performance improvement on 79% of full-chip Pentium 4 buses
237Special purpose hardware
238Special-Purpose HW
- Special-purpose performance ⇒ more MIPS/mm²
- SIMD integer and FP instructions in several ISAs
- Integration of other platform components, e.g. memory controller, graphics
- Special-purpose logic, programmable logic, and separately programmable engines
                     Die Area   Power   Performance
General Purpose:     2X         2X      1.4X
Multimedia Kernels:  <10%       <10%    1.5-4X
Improve power efficiency with Valued Performance
239TCP/IP challenges
Saturated link: 1GbE = 1.48M pkts/sec (672 ns per packet); 10GbE = 14.8M pkts/sec (67.2 ns per packet)
General purpose MIPS will not keep up!
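The per-packet budgets quoted above follow from minimum-size Ethernet frames at wire speed; a quick check, using the standard 64B minimum frame plus 20B of preamble/SFD and inter-frame gap:

```python
# Wire-speed packet rate for minimum-size Ethernet frames.
# A 64-byte frame occupies 84 bytes on the wire: 8B preamble/SFD plus
# a 12B inter-frame gap are added to each frame.

def wire_speed_pps(link_bps, frame_bytes=64, overhead_bytes=20):
    bits_per_packet = (frame_bytes + overhead_bytes) * 8
    return link_bps / bits_per_packet

for name, bps in (("1GbE", 1e9), ("10GbE", 10e9)):
    pps = wire_speed_pps(bps)
    print(f"{name}: {pps / 1e6:.2f}M pkts/sec, {1e9 / pps:.1f} ns per packet")
```

This reproduces the 672 ns and 67.2 ns per-packet arrival times the TCP engine must keep up with.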
240Compute power required for TCP/IP
TCP/IP engine will provide the required MIPS
241A sample approach
- A programmable hardware engine for offloading TCP
processing - Focus on
- Most complex part TCP inbound processing
- Handle 10Gbps Ethernet traffic with sufficient
headroom for outbound processing - Aggressive wire speed goal - minimum packet size
on saturated wire - Simple, scalable, flexible design enabling fast
time to market
242Key features
- Special purpose processor
- Dual frequency, low latency, buffer-free design
- High frequency execution core
- Accelerated context lookup and loading
- Programmability for ever-changing protocols
- Programmable design with special instructions
- Rapid validation and debug
- Scalable solution
- Across bandwidth and packet sizes
- Extendable to multi-core solution
243Packet size vs. core frequency
[Chart: required core frequency vs. packet size: 64B packets need ~1GHz at 1Gbps and ~10GHz at 10Gbps]
Increase packet size ⇒ reduce frequency
244Chip characteristics
Chip area:    2.23 x 3.54 mm²
Process:      90nm dual-VT CMOS
Interconnect: 1 poly, 7 metal
Transistors:  460K
Pad count:    306
245Standard FP MAC
[Block diagram: standard FP MAC datapath (FB, MA)]
Critical path: 26 logic stages @ 30ps per stage; Fmax = 1.2GHz (P860, 1.1V)
246Prototype FP MAC
[Block diagram: prototype FP MAC: M(CS), FB(CS), MP(CS), ZD, 4:2 compressor, shift by 32, M > F overflow detector, ME/FBE]
Critical path: 12 logic stages @ 30ps per stage; Fmax = 3GHz (P860, 1.1V)
247Accumulator Algorithm
- Key: Minimize interaction between incoming operand and accumulator result
- Floating point number converted to base 32
- Exponent subtraction no longer necessary
- Exponent comparison reduced from 8 to 3 bits
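One way to read the base-32 idea (a hedged sketch of the arithmetic, not the chip's exact datapath): split the 8-bit exponent e as e = 32·e_hi + e_lo, fold the 5-bit e_lo into a mantissa pre-shift, and compare only the 3-bit e_hi inside the accumulation loop:

```python
# Illustrative sketch of base-32 accumulation. The low 5 exponent bits
# become a fixed pre-shift applied to the mantissa before the loop, so
# the accumulator only compares the 3-bit high part: an 8-bit exponent
# compare shrinks to 3 bits, as on the slide.

def to_base32(mantissa, exp8):
    e_hi = exp8 >> 5          # 3-bit base-32 exponent (compared in the loop)
    e_lo = exp8 & 0x1F        # 5-bit residue, absorbed into the mantissa
    return mantissa << e_lo, e_hi

m1, e1 = to_base32(0b1, 37)   # exp 37 = 32*1 + 5
m2, e2 = to_base32(0b1, 33)   # exp 33 = 32*1 + 1
assert e1 == e2 == 1          # same 3-bit exponent: mantissas add directly
assert m1 + m2 == (1 << 5) + (1 << 1)
```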
248Die photograph and characteristics
[Die photo: multiplier, aligner, accumulate, normalize, FIFOs/scan, clock grid buffers]
249Design methodologies
250Motivation
- Parameter variations will become worse with
technology scaling - Robust variation tolerant circuits and
microarchitectures needed - Multi-variable design optimizations considering
parameter variations - Major shift from deterministic to probabilistic
design
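The shift from deterministic to probabilistic design can be illustrated with a toy Monte Carlo over per-gate delay variation (all distributions and numbers are invented for illustration):

```python
import random

# Minimal Monte Carlo sketch of probabilistic path delay: sample each
# gate's delay under assumed process (Vt) variation and look at the
# spread of the critical path, instead of one deterministic number.

random.seed(0)

def path_delay(n_gates=20, t_nom=30e-12, sigma=0.08):
    # each gate's delay varies around nominal with process variation
    return sum(t_nom * (1.0 + random.gauss(0.0, sigma)) for _ in range(n_gates))

samples = sorted(path_delay() for _ in range(10000))
mean = sum(samples) / len(samples)
p99 = samples[int(0.99 * len(samples))]
print(f"mean = {mean * 1e12:.0f} ps, 99th percentile = {p99 * 1e12:.0f} ps")
```

A design signed off at the mean would fail on the tail dies, which is why the slides argue for optimizing against the distribution.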
251Impact on Design Methodology
Due to variations in Vdd, Vt, and Temp
[Charts: number-of-paths vs. delay distribution shifts from deterministic to probabilistic; frequency vs. leakage power shows 10X leakage variation, up to 50% of total power]
252Tool Complexity
- Problems
- Far too many tools and tool interfaces
- Data is not easily extractable
- Circuit reuse is minimal
- Solutions
- Common tool interfaces
- Standard databases
- Parameterized design
253Designer Cockpit
- Everything on the menu bar
254Designer Cockpit
255Designer Cockpit
[Screenshot: menu bar: File, Edit, View, Select, Synthesize, Parasitics, Sizing, Analyze, Experiment, Checks, Options; entries include tune-to-target sense amp / memory cell, autosize with restrictions, speed-power curve, sensitivity optimization, metal line optimization, VT selection, delay vs. size, cell / sense amp characterization, memory cell stability, setup/hold characterization, user-specified scripts]
- Tools work with partial or full selection
- Designer intervention allowed anywhere
- Layout planner provides wiring parasitics
- Not the route of the week
- All tools callable from user programs
- Experiment organizer
- Optimization and experiments built in
256Optimization Example
- Imagine
- Select gates from schematic editor or layout
planner to optimize - Select optimization for PD3
- Include a metal width and space
- Include VT range optimization
- Force a metal line length as a function of
transistor sizes in a cell - Select Pathmill analysis
- Run with sensitivity turned on
257Optimization Example
258Evolve a Macro Library
Feasibility studies and estimation → RTL → Circuit design → Layout → Tapeout
- Executable on-line documentation
- Designs must be easily absorbed into the library
259Tools and productivity
- Functional uarch modules
- Investigation tools and libraries
- Cross-discipline optimization, Monte Carlo
- Easy database access
- Designer has same access as developer
- Full chip path extraction and visualization
- Productive design requires
- Innovation to be early
- Early innovation enabled by
- Flexible and open tools
260Development CAD and DA
Research: core technologies
Tool vendors
CAD development: productize modules and sample flow
Design DA groups: interfaces, flows, and adaptations
Designers: special features
261Examples
262Chip with bias generator (BG)
150nm Communications router (ISSCC 01)
Digital core with on-chip PMOS body bias
generator (BG). 1.5 million PMOS devices
263Distributed biasing scheme
Central Bias Generator (CBG) and Local Bias
Generator (LBG)
264Bias generation distribution
265Routing details
[Diagram: global routing distributes Vcca and Vcca - 450mV to LBGs, plus an FBB/ZBB control bit; local routing delivers Vcc and Vcc - 450mV from LBGs]
266Router chip summary
267Dual-VT Motivation
- Low-VT used in critical paths
- Achieve same frequency as all low-VT design
- Leakage power much smaller than all low-VT design
268Dual-VT Options
- DVT
- H-SDVT
- L-SDVT
- DVTS
269Dual-VT Allocation Only (DVT)
- Transistors sized for original target
- Insert low-VT to meet new target frequency
270Selective LVT Insertion (H-SDVT)
- Size at target frequency
- Insert low-VT to fix critical paths
- Size to optimize slack (down-size)
271Selective HVT Insertion (L-SDVT)
- Convert netlist to all low-VT
- Size at target frequency
- Insert high-VT on non-critical paths
- Size to optimize slack
272Dual-VT and Sizing (DVTS)
- Iterative DVT flow
- Use different amounts of sizing, low-VT to reach
target - Pick best iteration
273Tutorial summary
- Challenges for low power and high performance
- Historical device and system scaling trends
- Sub-100nm device scaling challenges
- Power delivery and dissipation challenges
- Power efficient design choices
- Circuit techniques for variation tolerance
- Short channel effects
- Adaptive circuit techniques for variation
tolerance
274Tutorial summary (contd.)
- Circuit techniques for leakage control
- Leakage power components
- Leakage power prediction and control techniques
- Full-chip power reduction techniques
- Micro-architecture innovations
- Coding techniques for interconnect power
reduction - CMOS compatible dense memory design
- Special purpose hardware
- Design methodologies challenges for CAD
275Power limited microprocessor integration choices
[Diagram: present general purpose units plus memory evolve over the next decade into adaptive general purpose units, special purpose units (DSP, wired/wireless network processing), dense memory, active and standby power management, and adapt-to-process capability]
276Acknowledgements
- The presenters would like to thank all the CRL
team members and Intel design and manufacturing
teams for their contribution towards the contents
of this tutorial.
277Bibliography (1 of 7)
- De, V. Borkar, S. Technology and design
challenges for low power and high performance
microprocessors, Low Power Electronics and
Design, 1999. Proceedings. 1999 International
Symposium on , 1999, Page(s) 163-168 - Lundstrom, M. Ren, Z. Essential physics of
carrier transport in nanoscale MOSFETs, Electron
Devices, IEEE Transactions on , Volume 49 Issue
1 , Jan. 2002, Page(s) 133 -141 - Thompson, S. et al A 90 nm logic technology
featuring 50 nm strained silicon channel
transistors, 7 layers of Cu interconnects, low k
ILD, and 1um2 SRAM cell, Electron Devices
Meeting, 2002. IEDM '02. Digest. International ,
8-11 Dec. 2002, Page(s) 61 -64 - Karnik, T. Borkar, S. Vivek De Sub-90nm
technologies--challenges and opportunities for
CAD, Computer Aided Design, 2002. ICCAD 2002.
IEEE/ACM International Conference on , 2002,
Page(s) 203-206 - Belady, C. Cooling and power considerations for
semiconductors into the next century, Low Power
Electronics and Design, International Symposium
on, 2001. , 6-7 Aug. 2001, Page(s) 100 -105 - Karnik, T et al Selective node engineering for
chip-level soft error rate improvement, VLSI
Circuits Digest of Technical Papers, 2002.
Symposium on , 13-15 June 2002, Page(s) 204 -205 - Narendra, S. De, V. Borkar, S. Antoniadis, D.
Chandrakasan, A. Full chip sub-threshold leakage
power prediction model for sub 0.18um CMOS, Low
Power Electronics and Design, 2002. ISLPED '02.
Proceedings of the 2002 International Symposium
on , 2002, Page(s) 19 -23 - Narendra, S. Borkar, S. De, V. Antoniadis, D.
Chandrakasan, A. Scaling of stack effect and its
application for leakage reduction, Low Power
Electronics and Design, International Symposium
on, 2001. , 2001, Page(s) 195-200 - Narendra, S. et al 1.1 V 1 GHz communications
router with on-chip body bias in 150 nm CMOS,
Solid-State Circuits Conference, 2002. Digest of
Technical Papers. ISSCC. 2002 IEEE International
, Volume 1 , 2002, Page(s) 270 -466 vol.1
278Bibliography (2 of 7)
- Tschanz, J.W. Narendra, S. Nair, R. De, V.
Effectiveness of adaptive supply voltage and body
bias for reducing impact of parameter variations
in low power and high performance
microprocessors, Solid-State Circuits, IEEE
Journal of , Volume 38 Issue 5 , May 2003,
Page(s) 826 -829 - Tschanz, J.W. et al Adaptive body bias for
reducing impacts of die-to-die and within-die
parameter variations on microprocessor frequency
and leakage, Solid-State Circuits, IEEE Journal
of , Volume 37 Issue 11 , Nov. 2002, Page(s)
1396 -1402 - Vangal, S. et al 5GHz 32b integer-execution core
in 130nm dual-Vt CMOS, Solid-State Circuits
Conference, 2002. Digest of Technical Papers.
ISSCC. 2002 IEEE International , Volume 2 ,
2002, Page(s) 334 -535 - Narendra, S. Keshavarzi, A. Bloechel, B.A.
Borkar, S. De, V. Forward body bias for
microprocessors in 130-nm technology generation
and beyond, Solid-State Circuits, IEEE Journal of
, Volume 38 Issue 5 , May 2003, Page(s) 696
-701 - Somasekhar, D Lu, Shih-Lien Bloechel, Bradley
Lai, Konrad Borkar, Shekhar De, Vivek Planar
1T-Cell DRAM with MOS Storage Capacitors in a
130nm Logic Technology for High Density
Microprocessor Caches, European Solid-State
Circuits Conference, 2002, Proceedings of the
2002 International Conference on, ESSCIRC 2002,
Page(s) 127 - 130 - Khellah, M. Tschanz, J. Ye, Y. Narendra, S.
De, V. Static pulsed bus for on-chip
interconnects, VLSI Circuits Digest of Technical
Papers, 2002. Symposium on , 13-15 June 2002,
Page(s) 78-79 - Anders, M. Rai, N. Krishnamurthy, R.K. Borkar,
S. A transition-encoded dynamic bus technique
for high-performance interconnects, Solid-State
Circuits, IEEE Journal of , Volume 38 Issue 5 ,
May 2003, Page(s) 709-714 - Vangal, S. et al A 5GHz Floating Point Multiply
Accumulator in 90nm Dual-VT CMOS, Solid-State
Circuits Conference, 2003. Digest of Technical
Papers. ISSCC. 2003 IEEE International , Volume
46 , 2003, Page(s) 334 -335
279Bibliography (3 of 7)
- Hoskote, Y. et al A 10GHz TCP Offload
Accelerator for 10Gbps Ethernet in 90nm Dual-VT
CMOS, Solid-State Circuits Conference, 2003.
Digest of Technical Papers. ISSCC. 2003 IEEE
International , Volume 46 , 2003, Page(s)
- http://www.intel.com/research/silicon/mooreslaw.htm
- G.E. Moore, Cramming more components onto
integrated circuits, Electronics, vol. 38, no.
8, April 19, 1965. - K.G. Kempf, Improving Throughput across the
Factory Life-Cycle, Intel Technology Journal,
Q4, 1998. - S. Thompson, P. Packan, and M. Bohr, MOS
Scaling Transistor Challenges for the 21st
Century, Intel Technology Journal, Q3, 1998. - Y. Taur and T. H. Ning, Fundamentals of Modern
VLSI Devices, Cambridge University Press, 1998. - D. Antoniadis and J.E. Chung, Physics and
Technology of Ultra Short Channel MOSFET
Devices, Intl. Electron devices Meeting, pp.
21-24, 1991. - A. Chandrakasan, S. Sheng, and R. W. Brodersen,
Low-Power CMOS Digital design, IEEE J.
Solid-State Circuits, vol. 27, pp. 473-484, Apr.
1992. - Z. Chen, J. Shott, J. Burr, and J. D. Plummer,
CMOS Technology Scaling for Low Voltage Low
Power Applications, IEEE Symp. Low Power Elec.,
pp. 56-57, 1994. - H.C. Poon, L.D. Yau, R.L. Joh