Title: Impact of New Device Technologies on Design Optimization
1Impact of New Device Technologies on Design
Optimization
- David J. Frank
- IBM T.J. Watson Research Center
- Yorktown Heights, NY
- 2006 ICCAD Embedded Tutorial
2Outline
- Background
- Technology Optimization Methodology
- Optimization Results
- Summary
3Background New Technologies
- Device (FEOL)
- High-k / metal gate
- High mobility
- Alternate structures (FDSOI, FinFET, DGFET, etc.)
- Processing
- Strain
- New anneals
- EUV lithography
- New materials
- BEOL
- Low-k dielectrics
- 3-D stacking / multiple layers
- Circuits
- Lower supply voltages
- Defect / variability tolerance
- Body bias adjustments
- System
- Multiple processor cores
- Embedded DRAM / alternate memory technologies
- Data bandwidth constrained
- Packaging
- Micro-channel heatsinks with liquid cooling
- High density chip carriers
4New Design Trend
- Much more co-design
- In the past, device, circuit and architecture
design proceeded in parallel
- Increased product throughput
- Reduced complexity
- Sacrificed some performance
- In the future, there will be diminishing raw
performance improvements
- Greater performance must be achieved by
optimizing across boundaries.
- This talk describes the use of such optimizations
to evaluate the impact of technology options on
the performance of future systems.
5Existence of an Optimal Technology
- Practicality imposes power constraints.
- Electrostatics imposes geometric constraints.
- Thermodynamics imposes voltage constraints.
- Quantum mechanics imposes miniaturization
constraints due to tunneling.
Fixed architectural complexity, fixed power
constraints and device physics together imply the
existence of an optimal technology with maximal
performance.
[Figure: power (leakage and dynamic) and
log(Performance) versus miniaturization, from large
to small. Annotations: leakage increases due to
tunneling effects; declining available dynamic
power overwhelms speed improvements of scaling.]
6Schematic organization of optimization program
7Optimization Methodology
- Power modeling
- Active power: ½CV² switching energy and duty
factor
- Static power mechanisms included: sub-threshold
current, gate oxide tunneling, and body-to-drain
band-to-band tunneling
- Device modeling
- 'Realistic' bulk MOSFETs: VT, depletion depth,
DIBL and ideality are determined by the gate
length, halo doping and oxide thickness, with 2D
effects taken into account.
- Gate length is derived from the optimization.
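The power terms listed above can be sketched in a few lines. This is a minimal illustration, not the tutorial's actual model: the function names, the interpretation of the duty factor as switching events per clock cycle, and the textbook sub-threshold expression are all assumptions. Gate tunneling and band-to-band tunneling currents are passed in as plain numbers because the slides do not give their functional forms.

```python
import math

K_B_OVER_Q = 8.617333e-5  # Boltzmann constant / electron charge, in V/K


def active_power(c_switched, vdd, freq, duty):
    """Active power: (1/2) C V^2 per switching event times the event rate.

    Assumption: `duty` is the average number of switching events per
    clock cycle, so the event rate is duty * freq.
    """
    return duty * 0.5 * c_switched * vdd ** 2 * freq


def subthreshold_swing(ideality, temp_k):
    """Sub-threshold swing in V/decade: n * (kT/q) * ln(10)."""
    return ideality * K_B_OVER_Q * temp_k * math.log(10.0)


def subthreshold_current(i0, vt, ideality, temp_k):
    """Off current at zero gate bias; drops one decade per swing of VT.

    `i0` is a hypothetical current at threshold (an assumed parameter).
    """
    return i0 * 10.0 ** (-vt / subthreshold_swing(ideality, temp_k))


def static_power(vdd, i_sub, i_gate=0.0, i_btbt=0.0):
    """Static power: supply voltage times the sum of leakage currents."""
    return vdd * (i_sub + i_gate + i_btbt)
```

Under these assumptions, an ideal device (ideality 1) at 300 K has a swing of roughly 60 mV/decade, which is why lowering VT by a fixed step multiplies sub-threshold leakage by a fixed factor.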
8Methodology continued
- Circuit modeling
- Delay is for 2 input NAND gates with average
fanout = 1.65.
- Capacitance includes gate, parasitic, and wire
components.
- Wire lengths are based on Rent's rule.
- Wire resistance includes temperature dependence
and surface scattering in small wires.
- Thermal modeling
- Generalized 2D/3D heat flow model estimates hot
spot temperature.
- Wide range of heat sink technologies can be
simulated.
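Two of the circuit-model ingredients above have simple closed forms. The sketch below shows Rent's rule in its basic terminal-count form, T = k * G^p, and a linear temperature coefficient for wire resistance. The function names and parameters are illustrative assumptions. The slides do not say how wire lengths are derived from Rent's rule or how surface scattering is modeled, so neither is attempted here.

```python
def rent_terminals(n_gates, k, p):
    """Rent's rule: external terminals of a block of n_gates gates.

    k is the average number of terminals per gate and p is the Rent
    exponent (both are inputs; no values are taken from the slides).
    """
    return k * n_gates ** p


def wire_resistance(r_ref, temp_k, t_ref=300.0, alpha=0.004):
    """Wire resistance with a linear temperature coefficient.

    r_ref is the resistance at t_ref. The default alpha (per kelvin) is
    an illustrative, roughly copper-like value, not a fitted one.
    Surface scattering in narrow wires is NOT included in this sketch.
    """
    return r_ref * (1.0 + alpha * (temp_k - t_ref))
```

For example, with p = 0.5 a block of 1024 gates and k = 2 has 64 terminals, and a 50 K temperature rise raises the resistance by 20% for the assumed alpha.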
9Methodology continued
- System modeling
- Allocate fixed fraction of chip power and area to
logic, and assume fixed number of logic gates.
Logic part is optimized, and the rest is assumed
to scale similarly.
- Larger systems are composed of multiple processor
cores, assumed to be interconnected in a manner
which does not greatly add to the wiring burden.
- Long wires are fatter, and receive repeaters with
a spacing that is optimized.
- Long wire delay is accounted for using a latency
penalty factor.
- On-chip tolerance/variability and noise are
accounted for.
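To illustrate what optimizing repeater spacing means, here is a minimal sketch based on the common textbook Elmore-delay approximation for a repeated RC wire. The slides do not state which delay model the tool uses, so the model, the 0.69 and 0.38 coefficients, and all names here are assumptions.

```python
import math


def segment_delay(s, r_drv, c_in, r_wire, c_wire):
    """Approximate 50% delay of one repeater driving a wire of length s.

    r_drv, c_in: repeater output resistance and input capacitance.
    r_wire, c_wire: wire resistance and capacitance per unit length.
    """
    return (0.69 * r_drv * (c_wire * s + c_in)
            + r_wire * s * (0.38 * c_wire * s + 0.69 * c_in))


def delay_per_length(s, r_drv, c_in, r_wire, c_wire):
    """Delay per unit length of a long wire repeated every s."""
    return segment_delay(s, r_drv, c_in, r_wire, c_wire) / s


def optimal_spacing(r_drv, c_in, r_wire, c_wire):
    """Spacing that minimizes delay_per_length.

    Setting d/ds of delay_per_length to zero gives
    s_opt = sqrt(0.69 * r_drv * c_in / (0.38 * r_wire * c_wire)).
    """
    return math.sqrt(0.69 * r_drv * c_in / (0.38 * r_wire * c_wire))
```

With hypothetical values of 1 kΩ and 1 fF for the repeater, and 1 Ω/µm and 0.2 fF/µm for the wire, this model puts the optimum spacing near 95 µm. A full optimization would also size the repeaters and wire cross-sections, which this sketch holds fixed.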
10Local Variation Modeling
- Variation sources
- Signal coupling noise and supply noise
- Statistical doping variations
- LER gate length variations
- Consequences modeled
- Increased static power
- Critical path delay distribution
- yield-based, using estimated critical path
distribution
- Single stage functionality
- use worst case (6 sigma) of doping and length,
no noise.
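The two margining styles above (a yield-based delay from a critical path distribution, and a 6 sigma worst case for single-stage functionality) can be contrasted in a short sketch. The Gaussian, statistically independent critical paths are simplifying assumptions made for illustration; the slides do not describe the actual distribution used.

```python
from statistics import NormalDist


def yield_limited_delay(mean, sigma, n_paths, target_yield):
    """Cycle time met by all n_paths critical paths with the given yield.

    Assumes independent, identically distributed Gaussian path delays,
    so each path must individually pass with probability
    target_yield ** (1 / n_paths).
    """
    per_path = target_yield ** (1.0 / n_paths)
    return NormalDist(mean, sigma).inv_cdf(per_path)


def worst_case(nominal, sigma, n_sigma=6.0):
    """Worst-case corner: nominal value shifted by n_sigma deviations."""
    return nominal + n_sigma * sigma
```

In this model the yield-limited delay grows slowly with the number of critical paths, whereas the worst-case corner is fixed by the chosen number of sigmas regardless of path count.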
11Optimization Results
- General results
- Evaluating specific possible device directions
- Increasing mobility
- High-k gate dielectric and metal gates
- Margin issues
- Low-k wiring and 3D stacking
- Better heat sinks and sub-ambient cooling
- Multi-processor tradeoffs
12Optimize by generation fixed parameters
Note that the LG, tox, VDD, VT, etc. are NOT
preselected. They are solved for by the
optimizations.
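To make concrete what "solved for by the optimizations" means, here is a deliberately small toy. It searches over only two variables (supply voltage and threshold voltage) to maximize clock frequency under a total power budget. Everything in it is an assumption for illustration: the alpha-power-law drive current, the parameter values, and the brute-force grid search. The actual tool optimizes more variables (including tox and Lg) with much more detailed models.

```python
def evaluate(vdd, vt, n_gates=1e7, c_gate=1e-15, depth=20, duty=0.1,
             k_drive=1e-4, i0=1e-6, swing=0.09):
    """Return (frequency in Hz, total power in W) for one design point.

    All default parameter values are hypothetical.
    """
    i_on = k_drive * (vdd - vt) ** 1.3          # alpha-power-law drive
    delay = c_gate * vdd / i_on                 # CV/I gate delay
    freq = 1.0 / (depth * delay)                # `depth` gates per cycle
    p_active = duty * 0.5 * n_gates * c_gate * vdd ** 2 * freq
    p_static = vdd * n_gates * i0 * 10.0 ** (-vt / swing)
    return freq, p_active + p_static


def optimize(power_budget):
    """Grid search for the fastest (freq, vdd, vt) within the budget."""
    best = None
    for i in range(30, 151):                    # vdd from 0.30 to 1.50 V
        vdd = i / 100.0
        for j in range(5, 61):                  # vt from 0.05 to 0.60 V
            vt = j / 100.0
            if vdd - vt <= 0.05:                # require some gate overdrive
                continue
            freq, power = evaluate(vdd, vt)
            if power <= power_budget and (best is None or freq > best[0]):
                best = (freq, vdd, vt)
    return best
```

Calling `optimize` with different budgets returns different voltage pairs, which mirrors the point of the slide: the operating point is an output of the optimization, not an input.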
13Optimize by generation
Optimizations over 7 variables: tox, Lg, ND,
ltwgt, Vdd, Srpt, ltwrptgt
Dual core processor with aggressive air cooling
14Optimize by generation details
Dual core processor with aggressive air cooling
15Optimize by generation details
- Optimal gate length is not fixed by generation,
but varies with the power target.
[Figure panels: Gate Length vs Power; Oxide
Thickness vs Power. Curves: oxynitride; high-k,
for 32nm.]
Dual core processor with aggressive air cooling
(High-k case has 0.3 nm barrier layer, bandedge
metal gate, HfO2-like insulator.)
16Optimize by generation details
- Supply voltages should be lower for low power
applications.
- High-k lowers VDD by about 15% at the 45nm
generation.
Voltages vs Power
Dual core processor with aggressive air cooling
17Optimal Power Allocation Fractions
Active power fraction: about 70% at low power to
about 40% at high power.
45nm technology with microchannel heat sink and
water cooling. 4 core chip.
18Mobility dependence
Mobility enhancement expected due to strain
engineering. Alternate materials are also being
studied.
- Enhanced mobility has greatest benefit at high
power.
- Even for large mobility enhancements, performance
boost is modest: 10-15%.
45nm technology, dual core processor, water cooled
19Metal-gate workfunction for high-k and oxynitride
- High-k improves performance significantly, as
long as workfunction is near band-edge.
- Low power designs benefit most.
- Design changes
- Lower VDD (about 15%)
- Shorter gate lengths and improved short-channel
effects
45nm node, dual core processor with aggressive
air cooling
20Impact of variability on performance
- Atomistic effects are leading to greater device
variability.
- Increasing variability requires larger design
margins.
- Designing for larger margins decreases
performance.
- Increased variability requires
- Higher supply voltages
- Less scaled FETs
65nm node, dual processor core
21 3D stacking and low-k wiring dielectrics
- Multiple layers offer higher performance due to
shorter wires.
- Low-k wiring dielectrics improve performance by
lowering switching energy.
22Cooling scenario optimizations
Optimized over 7 variables: Lg, tox, Nd,
ltwgt, Drptr, ltwrptrgt, Vdd.
- Forced liquid cooling through microchannel fins
may permit very high power densities.
- Optimized (i.e., maximum) performance increases
as the log of the power.
8 core processor design, 32nm technology
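The logarithmic trend stated on this slide can be written as a one-line relation. As a sketch, with P_0 a reference power and b a fitted slope (both symbols are assumptions introduced here, not taken from the slides):

```latex
\mathrm{Perf}(P) \approx \mathrm{Perf}(P_0) + b \,\ln\!\left(\frac{P}{P_0}\right)
\quad\Longrightarrow\quad
\mathrm{Perf}(2P) - \mathrm{Perf}(P) \approx b \ln 2
```

Read this way, each doubling of power buys the same fixed performance increment, so the cost in power of each additional increment grows exponentially.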
23Multiprocessor motivation
- The energy / performance tradeoff is very steep
at the high end.
- Lower power, more parallel processors potentially
offer more computation for the same total power
level.
These results are for 4-processor chips with
micro-channel water cooling, pulling out all the
stops.
9 variables: tox, Lg, ND, ltwgt, Vdd, wHP, Srpt,
ltwrptgt, xhalo
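The multiprocessor argument can be illustrated with a toy calculation. It assumes that per-core performance grows as the log of per-core power (the trend the cooling-scenario slide reports for optimized designs), that the total power is split evenly across cores, and that the workload parallelizes perfectly. The slope `b` and the reference power are hypothetical numbers, not values from the talk.

```python
import math


def core_performance(p_core, p_ref=100.0, b=0.2):
    """Relative single-core performance under an assumed log-power law.

    Normalized to 1.0 at p_ref; clamped at zero for very low power.
    """
    return max(0.0, 1.0 + b * math.log(p_core / p_ref))


def chip_throughput(n_cores, p_total=100.0):
    """Aggregate throughput with p_total split evenly across n_cores.

    Assumes a perfectly parallel workload, which is optimistic.
    """
    return n_cores * core_performance(p_total / n_cores)


for n in (1, 2, 4, 8):
    print(n, round(chip_throughput(n), 2))
```

Under these assumptions, throughput at fixed total power rises with core count because each core gives up only a small amount of performance when its power is halved. Real workloads with serial sections would see less benefit.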
24Multiprocessors
32nm node optimizations. Aggressive air
cooling. Assume fixed total number of FETs,
divided into a varying number of cores.
Optimized over 7 variables: Lg, tox, Nd, ltwgt,
Drptr, ltwrptrgt, Vdd.
25Multiprocessor optimization results
- Optimal FET conditions do not vary significantly
with the number of processor cores.
[Figure curves: VDD, VT]
26Summary
- A set of simplified models has been developed
and combined into a research tool to enable
fast turnaround comparative technology
optimizations.
- Evaluated some specific possible device
directions
- Increasing mobility
- High-k gate dielectric and metal gates
- Better heat sinks and sub-ambient cooling
- Multi-processor tradeoffs
- Power and temperature rise are dominant limiters.
- Performance gains can still be obtained from
improved cooling and/or from lower power, slower,
more parallel processors.
27Acknowledgements
- Wilfried Haensch
- Ghavam Shahidi
- Omer Dokumaci
- Mary Wisniewski
- Mike Scheuermann
- Phillip Restle
- Steve Kosonocky
- Evan Colgan
- Philip Wong
- Yuan Taur
- Paul Solomon
- Bob Dennard