Title: Chapter 5: CMOS Inverter
1. Chapter 5: CMOS Inverter
- Boonchuay Supmonchai
- Integrated Design Application Research (IDAR) Laboratory
- July 5, 2004; revised June 25, 2005
2. Goals of This Chapter
- Quantification of Design Metrics of an inverter
  - Static (or Steady-State) Behavior
  - Dynamic (or Transient Response) Behavior
  - Energy Efficiency
- Optimization of an inverter design
- Technology Scaling and its impact on the inverter metrics
3. Digital Gate Design Metrics Recap
- Cost
  - Complexity and Area
- Reliability and Robustness → Static Behavior
  - Noise Margin, Regenerative Property
- Performance → Dynamic Behavior
  - Speed (Delay)
- Energy Efficiency
  - Energy and Power Consumption, Energy-Delay
4. Why CMOS Inverter?
- CMOS, because it is the dominant technology of the era:
  - High packing density
  - Relatively easy process
- Inverter, because it is the nucleus of all digital designs:
  - The behavior of more intricate structures (logic gates, adders, etc.) can be derived almost completely by extrapolating the results obtained from the inverter.
5. CMOS Inverter: A First Glance
[Figure: the input is driven by the output of another gate → fan-in; the output drives the collective capacitances of wires and gates → fan-out]
6. CMOS Inverter: Physical View Recap
7. Two CMOS Inverters: Physical View
[Figure: two inverter layouts as abutting cells]
8. CMOS Inverter: Static Behavior
State of the transistors (switch model; use magnitudes for the PMOS):
- ON: $V_{GT} = V_{GS} - V_T > 0$, $R_{on}$ finite
- OFF: $V_{GT} = V_{GS} - V_T < 0$, $R_{off} \approx \infty$
9. CMOS Inverter: Dynamic Behavior
- Gate response time is determined by the time to charge CL through Rp (or discharge CL through Rn)
10. CMOS Properties
- Full rail-to-rail swing
- High noise margins
- Logic levels not dependent on the relative device sizes ⇒ ratioless
  - Transistors can be minimum size
- Regenerative property
- Low output impedance
- Large fan-out (albeit with degraded performance)
- Typical output resistance in the kΩ range
11. CMOS Properties (2)
- Extremely high input resistance (the MOS gate is a near-perfect insulator)
  - Nearly zero steady-state input current
- No direct path between power and ground under steady-state conditions (but there is always a path with finite resistance between the output and either VDD or GND)
  - No static power dissipation
- Propagation delay is a function of the load capacitance and the resistance of the transistors
12. NMOS Short-Channel I-V Plot Recap
13. PMOS Short-Channel I-V Plot Recap
- All polarities of all voltages and currents are reversed
14. Transforming the PMOS I-V Plot
$$I_{DSp} = -I_{DSn}, \quad V_{GSn} = V_{in}, \quad V_{GSp} = V_{in} - V_{DD}, \quad V_{DSn} = V_{out}, \quad V_{DSp} = V_{out} - V_{DD}$$
15. CMOS Inverter: Load-Line Plot
16. CMOS Inverter VTC
VTC = Voltage-Transfer Characteristic
17. Robustness of the CMOS Inverter
- Precise value of the switching threshold, VM
  - VM is defined as the point where Vin = Vout
- Noise margins
  - Piece-wise linear approximation
  - Maximization
- Process variations
- Device variations
- Technology scaling
18. Switching Threshold
- At VM, where Vin = Vout, both the PMOS and NMOS transistors are in saturation (since VDS = VGS). When VDD is large compared with the threshold and saturation voltages:
$$V_M \approx \frac{r V_{DD}}{1 + r}, \qquad r = \frac{k_p V_{DSATp}}{k_n V_{DSATn}}$$
- The switching threshold is set by the ratio r, which compares the relative driving strengths of the PMOS and NMOS transistors
- Goal: set VM = VDD/2 (to maximize the noise margins), so r ≈ 1
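19. Switching Threshold Example
- In our generic 0.25 μm CMOS process, using the process parameters from Table 3.2, at VDD = 2.5 V and with a minimum-size NMOS device ((W/L)n = 1.5):

Device | VT0 (V) | γ (V^0.5) | VDSAT (V) | k' (A/V²) | λ (V^-1)
NMOS | 0.43 | 0.4 | 0.63 | 115 × 10^-6 | 0.06
PMOS | -0.4 | -0.4 | -1 | -30 × 10^-6 | -0.1

$$\frac{(W/L)_p}{(W/L)_n} = \frac{k'_n V_{DSATn}\,(V_M - V_{Tn} - V_{DSATn}/2)}{k'_p V_{DSATp}\,(V_{DD} - V_M + V_{Tp} + V_{DSATp}/2)} = \frac{115\times10^{-6} \times 0.63 \times (1.25 - 0.43 - 0.63/2)}{30\times10^{-6} \times 1.0 \times (1.25 - 0.4 - 0.5)} \approx 3.5$$

(W/L)p = 3.5 × 1.5 = 5.25 for a VM of 1.25 V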
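The sizing equation above can be checked numerically. The sketch below is a minimal Python version of the same first-order model (both devices velocity-saturated at VM, channel-length modulation ignored), using the table values; the helper names and the bisection solver are illustrative assumptions, not part of the original example.

```python
# First-order switching-threshold model for the 0.25 um example
# (velocity-saturated devices, channel-length modulation ignored).
VDD = 2.5
KN, VTN, VDSATN = 115e-6, 0.43, 0.63    # k'n (A/V^2), VT0 (V), VDSAT (V)
KP, VTP, VDSATP = -30e-6, -0.4, -1.0    # PMOS values keep their signs


def nmos_current(vm, wl_n):
    """NMOS current when Vin = Vout = vm."""
    return KN * wl_n * VDSATN * (vm - VTN - VDSATN / 2)


def pmos_current(vm, wl_p):
    """PMOS current magnitude at the same point (the two minus signs cancel)."""
    return KP * wl_p * VDSATP * (VDD - vm + VTP + VDSATP / 2)


def ratio_for_vm(vm):
    """(W/L)p / (W/L)n that places the switching threshold at vm."""
    return nmos_current(vm, 1.0) / pmos_current(vm, 1.0)


def solve_vm(wl_n, wl_p, iters=60):
    """Bisection for the vm at which the NMOS and PMOS currents balance."""
    lo, hi = 0.0, VDD
    for _ in range(iters):
        mid = (lo + hi) / 2
        if nmos_current(mid, wl_n) > pmos_current(mid, wl_p):
            hi = mid  # NMOS is stronger here, so the threshold lies lower
        else:
            lo = mid
    return (lo + hi) / 2


ratio = ratio_for_vm(1.25)
print(f"(W/L)p/(W/L)n = {ratio:.2f}, (W/L)p = {ratio * 1.5:.2f}")
for wl_p in (2.625, 5.25, 10.5):  # half, nominal, and double PMOS width
    print(f"(W/L)p = {wl_p:6.3f} -> VM = {solve_vm(1.5, wl_p):.2f} V")
```

Under this model, halving or doubling the PMOS width moves VM by only about 0.15 V around 1.25 V, which matches the observation that VM is relatively insensitive to the device ratio.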
20. Example: Simulated Results
[Figure: simulated results for the sizing example; NMOS at the minimum width-to-length ratio of 1.5]
21. Observations I
- VM is relatively insensitive to variations in the device ratio
  - Small variations of the ratio do not significantly disturb the VTC
  - Common industry practice is to set Wp smaller than the requirement
- Increasing the width of the PMOS moves VM toward VDD
- Increasing the width of the NMOS moves VM toward GND
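22. Noise Margins: Determining VIH and VIL
[Figure: piece-wise linear approximation of the VTC; gain g = slope in the transition region]
$$NM_H = V_{DD} - V_{IH}, \qquad NM_L = V_{IL} - GND$$
Approximating (with g < 0):
$$V_{IH} = V_M - \frac{V_M}{g}, \qquad V_{IL} = V_M + \frac{V_{DD} - V_M}{g}$$
So a high gain in the transition region is very desirable.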
23. CMOS Voltage Gain
- The gain is a strong function of the slopes of the currents in the saturation region, for Vin = VM
- It is determined only by technology parameters, especially channel-length modulation (λ). The designer can influence it only through the supply voltage and VM (transistor sizing).
24. Example: VTC and Noise Margins
- For a 0.25 μm process with (W/L)p/(W/L)n = 3.4, (W/L)n = 1.5 (minimum size), and VDD = 2.5 V:
  - Real values: VIL = 1.03 V, VIH = 1.45 V, NML = 1.03 V, NMH = 1.05 V
  - Piece-wise linear approximation: VM ≈ 1.25 V, g = -27.5, VIL = 1.2 V, VIH = 1.3 V, NML = NMH = 1.2 V
- Output resistance → sensitivity of the gate output to noise
  - Low-output state: 2.4 kΩ
  - High-output state: 3.3 kΩ
  - Preferably as low as possible
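The piece-wise linear noise-margin formulas can be evaluated directly. This is a minimal sketch, assuming GND = 0 V and the example's VM = 1.25 V; the function name is illustrative.

```python
def pwl_noise_margins(vm, gain, vdd):
    """Piece-wise linear VIL, VIH, NML, NMH for an inverter with gain g < 0."""
    vih = vm - vm / gain
    vil = vm + (vdd - vm) / gain
    nml = vil - 0.0   # GND = 0 V
    nmh = vdd - vih
    return vil, vih, nml, nmh


for g in (-27.5, -17.0):  # first-order gain vs. a smaller gain magnitude
    vil, vih, nml, nmh = pwl_noise_margins(1.25, g, 2.5)
    print(f"g = {g}: VIL = {vil:.2f} V, VIH = {vih:.2f} V, "
          f"NML = {nml:.2f} V, NMH = {nmh:.2f} V")
```

With g = -27.5 this reproduces VIL ≈ 1.2 V, VIH ≈ 1.3 V and NML = NMH ≈ 1.2 V; a smaller gain magnitude widens the transition region and shrinks both margins.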
25. Observations II
- First-order analysis overestimates the gain
  - The maximum gain magnitude is only about 17 at VM → VIL = 1.17 V, VIH = 1.33 V
  - The piece-wise linear approximation is overly optimistic
  - It is the major contributor to the deviation from the true gain
- The CMOS inverter is a poor analog amplifier!
- One of the major differences between analog and digital design is that digital circuits operate in regions of extreme nonlinearity
  - Well-defined and well-separated high and low signals
26. Impact of Process Variation on the VTC
- Process variations (mostly) cause a shift in the
switching threshold
27. Scaling the Supply Voltage
- Reducing VDD improves the gain
- But the characteristic deteriorates for very low VDD
- Practical lower bound: VDDmin > 2 to 4 kT/q
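To put the lower bound in volts, the thermal voltage kT/q can be computed from the SI constants. This sketch assumes room temperature is 300 K; the function name is illustrative.

```python
BOLTZMANN = 1.380649e-23           # J/K (exact SI value)
ELECTRON_CHARGE = 1.602176634e-19  # C (exact SI value)


def thermal_voltage(temp_kelvin):
    """kT/q in volts."""
    return BOLTZMANN * temp_kelvin / ELECTRON_CHARGE


phi_t = thermal_voltage(300.0)  # assumed room temperature
print(f"kT/q at 300 K = {phi_t * 1e3:.1f} mV")
print(f"VDDmin range: {2 * phi_t * 1e3:.0f} to {4 * phi_t * 1e3:.0f} mV")
```

At 300 K, kT/q is about 26 mV, so the bound corresponds to roughly 50 to 100 mV.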
28. Observations III
- Reducing the supply voltage has a positive impact on the energy dissipation
  - But it is also detrimental to the delay of the gate
- The DC characteristic becomes increasingly sensitive to device variations once the supply and intrinsic voltages become comparable
- Scaling the supply voltage reduces the signal swing
  - Reduces internal noise (e.g., crosstalk)
  - More susceptible to external noise that does not scale
29. CMOS Inverter: Dynamic Behavior
- The transient behavior of the gate is determined by the time it takes to charge and discharge the load capacitance, CL, through the on-transistors
- Delay is a function of the load capacitances and the transistor on-resistances
- Getting CL as small as possible is crucial to the realization of high-performance CMOS circuits:
  - Transistor capacitances
  - Wire capacitances
  - Fanout
- Wire resistances also become more important
30. Computing the Capacitances
[Figure: capacitances contributing to CL, including the fanout]
31. Finding Cgd: The Miller Effect
- M1 and M2 are either in cut-off or in saturation.
- The floating gate-drain capacitor is replaced by a capacitance to ground (gate-bulk capacitor).
A capacitor experiencing identical but opposite voltage swings at both its terminals can be replaced by a capacitor to ground whose value is two times the original value.
32. Diffusion Capacitances Cdb1 and Cdb2
- We can simplify the diffusion capacitance calculations by using Keq to linearize the nonlinear capacitor in terms of the junction capacitance under zero bias:
$$C_{eq} = K_{eq} C_{j0}$$

0.25 μm process | Keqbp (high-to-low) | Keqsw (high-to-low) | Keqbp (low-to-high) | Keqsw (low-to-high)
NMOS | 0.57 | 0.61 | 0.79 | 0.81
PMOS | 0.79 | 0.86 | 0.59 | 0.7

(bp = bottom plate, sw = sidewall)
33. Extrinsic Capacitances Cg3 and Cg4
- A simplification of the actual situation:
  - Assumes all the components of Cgate are between Vout and GND (or VDD)
  - Assumes the channel capacitances of the loading gates are constant
- The extrinsic, or fan-out, capacitance is the total gate capacitance of the loading gates M3 and M4:
$$C_{fan\text{-}out} = C_{gate}(NMOS) + C_{gate}(PMOS) = (C_{GSOn} + C_{GDOn} + W_n L_n C_{ox}) + (C_{GSOp} + C_{GDOp} + W_p L_p C_{ox})$$
34. Example: Layout of Two Inverters
AD = drain area, PD = drain perimeter, AS = source area, PS = source perimeter

0.25 μm | W/L | AD (μm²) | PD (μm) | AS (μm²) | PS (μm)
NMOS | 0.375/0.25 | 0.3 | 1.875 | 0.3 | 1.875
PMOS | 1.125/0.25 | 0.7 | 2.375 | 0.7 | 2.375
35. Example: Components of CL (0.25 μm)

C term | Expression | Value (fF) H→L | Value (fF) L→H
Cgd1 | 2 Cgd0n Wn | 0.23 | 0.23
Cgd2 | 2 Cgd0p Wp | 0.61 | 0.61
Cdb1 | Keqbpn ADn Cj + Keqswn PDn Cjsw | 0.66 | 0.90
Cdb2 | Keqbpp ADp Cj + Keqswp PDp Cjsw | 1.5 | 1.15
Cg3 | (2 Cgd0n) Wn + Cox Wn Ln | 0.76 | 0.76
Cg4 | (2 Cgd0p) Wp + Cox Wp Lp | 2.28 | 2.28
Cw | from extraction | 0.12 | 0.12
CL | Σ | 6.1 | 6.0
36. Wiring Capacitance
- The wiring capacitance depends on the length and width of the connecting wires and is a function of the distance of the fan-out from the driving gate and of the number of fan-out gates.
- Wiring capacitance is growing in importance with the scaling of technology.
37. Inverter Propagation Delay
- Propagation delay is proportional to the time constant of the network formed by the on-resistance and the load capacitance:
$$t_p = f(R_{on}, C_L)$$
$$t_{pLH} = 0.69\, R_{eqp} C_L, \qquad t_{pHL} = 0.69\, R_{eqn} C_L$$
- To equalize rise and fall times, make the on-resistances of the NMOS and PMOS approximately equal.
38. Inverter Transient Response (0.25 µm)
VDD = 2.5 V, (W/L)n = 1.5, (W/L)p = 4.5, Reqn = 13 kΩ/1.5, Reqp = 31 kΩ/4.5
- Analysis: tpHL = 36 ps, tpLH = 29 ps, so tp = (36 + 29)/2 = 32.5 ps
- Simulation: tpHL = 39.9 ps and tpLH = 31.7 ps
- The analysis results are too optimistic, by about 10%
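The analytic numbers follow from summing the CL components and applying tp = 0.69 Req CL. A minimal sketch using the table values from the "Components of CL" slide; variable names are illustrative.

```python
# Capacitance components (fF) from the "Components of CL" table.
C_HL = {"Cgd1": 0.23, "Cgd2": 0.61, "Cdb1": 0.66, "Cdb2": 1.5,
        "Cg3": 0.76, "Cg4": 2.28, "Cw": 0.12}
C_LH = dict(C_HL, Cdb1=0.90, Cdb2=1.15)  # only the diffusion terms change

REQN = 13e3 / 1.5   # ohms, (W/L)n = 1.5
REQP = 31e3 / 4.5   # ohms, (W/L)p = 4.5

cl_hl = sum(C_HL.values()) * 1e-15  # farads
cl_lh = sum(C_LH.values()) * 1e-15

tphl = 0.69 * REQN * cl_hl
tplh = 0.69 * REQP * cl_lh
tp = (tphl + tplh) / 2
print(f"CL(H->L) = {cl_hl * 1e15:.2f} fF, CL(L->H) = {cl_lh * 1e15:.2f} fF")
print(f"tpHL = {tphl * 1e12:.1f} ps, tpLH = {tplh * 1e12:.1f} ps, tp = {tp * 1e12:.1f} ps")
```

Summing the unrounded entries gives 6.16 fF and 6.05 fF (versus the rounded 6.1 fF and 6.0 fF), so the delays land within about a picosecond of the slide's 36 ps and 29 ps.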
39. Inverter Propagation Delay, Revisited
- To see how a designer can optimize the delay of a gate, we have to expand Req in the delay equation:
$$t_{pHL} = 0.69\, R_{eqn} C_L = 0.69 \cdot \frac{3 V_{DD}}{4 I_{DSATn}}\, C_L \approx 0.52\, \frac{C_L V_{DD}}{I_{DSATn}}$$
40. Minimizing Propagation Delay
- Reduce CL
  - Keep the drain diffusion as small as possible
- Increase the W/L ratio of the transistor
  - The most powerful and effective way
  - Watch out for self-loading! (when the intrinsic capacitance dominates)
- Increase VDD
  - Trades off energy efficiency for performance
  - Very minimal improvement above a certain level
  - Reliability concerns enforce a firm upper bound on VDD
41. PMOS-to-NMOS Ratio
- So far the PMOS and NMOS have been sized such that their Reqs match (ratio of 3 to 3.5)
  - Symmetrical VTC
  - Equal high-to-low and low-to-high propagation delays
- If speed is the only concern, reduce the width of the PMOS device!
  - Widening the PMOS degrades tpHL due to the larger parasitic capacitance
$$\beta = \frac{(W/L)_p}{(W/L)_n}, \qquad r = \frac{R_{eqp}}{R_{eqn}} \;\text{(resistance ratio of identically sized PMOS and NMOS)}$$
42. PMOS-to-NMOS Ratio Effects
[Figure: tpLH, tpHL, and tp versus β]
- β of 2.4 (= 31 kΩ/13 kΩ) gives a symmetrical response
- βopt = 1.6 to 1.9
- When the wire capacitance is negligible (Cdn1 + Cgn2 ≫ CW), βopt = √r
- If the wire capacitance dominates, a larger value of β must be used
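The √r rule follows from a first-order delay model. Below is a minimal sketch, assuming (as in the standard textbook treatment) that the PMOS drain and gate capacitances scale with β, so the self-loading is (1 + β)(Cdn1 + Cgn2), and that Reqp = r·Reqn/β; tp is then expressed in units of 0.345·Reqn. Minimizing ((1 + β)C + Cw)(1 + r/β) gives βopt = √(r(1 + Cw/(Cdn1 + Cgn2))), which the sweep checks numerically. The wire-to-self-loading ratios are hypothetical.

```python
import math


def tp_normalized(beta, r, c_self, c_wire):
    """tp / (0.345 * Reqn) for a PMOS/NMOS ratio beta.

    c_self = Cdn1 + Cgn2 (NMOS-side self-loading); the PMOS side adds beta * c_self.
    """
    return ((1 + beta) * c_self + c_wire) * (1 + r / beta)


def beta_opt(r, c_self, c_wire):
    """Closed-form minimum of tp_normalized with respect to beta."""
    return math.sqrt(r * (1 + c_wire / c_self))


def beta_sweep(r, c_self, c_wire, lo=0.5, hi=6.0, steps=5500):
    """Brute-force check: the beta on a uniform grid with the smallest delay."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda b: tp_normalized(b, r, c_self, c_wire))


r = 2.4  # 31 kOhm / 13 kOhm, as on the slide
for c_wire in (0.0, 1.0, 3.0):  # hypothetical Cw in units of (Cdn1 + Cgn2)
    print(f"Cw/C = {c_wire}: beta_opt = {beta_opt(r, 1.0, c_wire):.2f}, "
          f"sweep = {beta_sweep(r, 1.0, c_wire):.2f}")
```

With no wire load the model gives √2.4 ≈ 1.55; adding wire capacitance pushes the optimum toward larger β, as stated above.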
43. Device Sizing for Performance
- Divide the capacitive load, CL, into:
  - Cint: intrinsic (diffusion and Miller effect)
  - Cext: extrinsic (wiring and fanout)
$$t_p = 0.69\, R_{eq} C_{int}\left(1 + \frac{C_{ext}}{C_{int}}\right) = t_{p0}\left(1 + \frac{C_{ext}}{C_{int}}\right)$$
  where tp0 = 0.69 Req Cint is the intrinsic (unloaded) delay of the gate
- Widening both the PMOS and NMOS by a factor S reduces Req by an identical factor (Req = Rref/S) but raises the intrinsic capacitance by the same factor (Cint = S Ciref):
$$t_p = 0.69\, R_{ref} C_{iref}\left(1 + \frac{C_{ext}}{S C_{iref}}\right) = t_{p0}\left(1 + \frac{C_{ext}}{S C_{iref}}\right)$$
44. Observations IV
- The intrinsic delay of the inverter, tp0, is independent of the sizing of the gate
  - tp0 is determined purely by the technology and the inverter layout
  - With no load, the increased drive strength of the gate is totally offset by the increased capacitance
- Any S sufficiently larger than Cext/Cint yields nearly the same performance gain, at the cost of a substantial increase in area
45. Sizing Impacts on Delay
- The majority of the improvement is already obtained for S = 5.
- Sizing factors larger than 10 barely yield any extra gain (and cost significantly more area).
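The diminishing returns follow directly from tp = tp0(1 + Cext/(S Ciref)): the fraction of the achievable delay reduction captured at factor S is (tp(1) - tp(S))/(tp(1) - tp0) = 1 - 1/S, whatever the value of Cext/Ciref. A minimal sketch, with a hypothetical Cext/Ciref of 5:

```python
def tp_over_tp0(s, cext_over_ciref):
    """Normalized delay tp/tp0 = 1 + Cext / (S * Ciref)."""
    return 1 + cext_over_ciref / s


def improvement_captured(s):
    """Fraction of the maximum possible delay reduction achieved by sizing factor s.

    (tp(1) - tp(S)) / (tp(1) - tp0) = 1 - 1/S, independent of Cext/Ciref.
    """
    return 1 - 1 / s


ratio = 5.0  # hypothetical Cext / Ciref
for s in (1, 2, 5, 10, 20):
    print(f"S = {s:2d}: tp/tp0 = {tp_over_tp0(s, ratio):.2f}, "
          f"improvement captured = {improvement_captured(s):.0%}")
```

S = 5 already captures 80% of the possible improvement and S = 10 captures 90%, consistent with the slide.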
46. Impact of Fanout on Delay
- The extrinsic capacitance, Cext, is a function of the fanout of the gate: the larger the fanout, the larger the external load.
- First determine the input loading effect of the inverter. Both Cg and Cint are proportional to the gate sizing, so Cint = γCg, where γ is independent of the gate sizing, and
$$t_p = t_{p0}\left(1 + \frac{C_{ext}}{\gamma C_g}\right) = t_{p0}\left(1 + \frac{f}{\gamma}\right)$$
- The delay of an inverter is a function of the ratio between its external load capacitance and its input gate capacitance, the effective fan-out f:
$$f = \frac{C_{ext}}{C_g}$$
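47. Inverter Chain
- The delay of the j-th inverter stage is
$$t_{p,j} = t_{p0}\left(1 + \frac{C_{g,j+1}}{\gamma C_{g,j}}\right) = t_{p0}\left(1 + \frac{f_j}{\gamma}\right)$$
- Overall delay (with Cg,N+1 = CL):
$$t_p = \sum_{j=1}^{N} t_{p,j} = t_{p0} \sum_{j=1}^{N}\left(1 + \frac{C_{g,j+1}}{\gamma C_{g,j}}\right)$$
- If CL is given:
  - How should the inverters be sized?
  - How many stages are needed to minimize the delay?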
48. Sizing the Inverters in the Chain
- The optimum size of each inverter is the geometric mean of its neighbors, $C_{g,j} = \sqrt{C_{g,j-1} C_{g,j+1}}$. This means that if each inverter is sized up by the same factor f with respect to the preceding gate, every stage has the same effective fan-out and the same delay:
$$f = \sqrt[N]{F}$$
  where F represents the overall effective fan-out of the circuit (F = CL/Cg,1)
- The minimum delay through the inverter chain is
$$t_p = N\, t_{p0}\left(1 + \frac{\sqrt[N]{F}}{\gamma}\right)$$
- The relationship between tp and F is linear for one inverter, square root for two, etc.
49. Example: Inverter Chain Sizing
- CL/Cg,1 = 8/1 has to be evenly distributed over N = 3 inverters:
$$f = \sqrt[3]{8} = 2$$
- Stage sizes: 1, f = 2, f² = 4, driving CL = 8
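The chain-sizing rule can be written as a short helper. This minimal sketch assumes γ = 1 and capacitances in units of Cg,1; the function names are illustrative.

```python
def stage_sizes(cg1, cl, n):
    """Input capacitances of an n-stage chain with equal effective fan-out per stage."""
    f = (cl / cg1) ** (1 / n)  # f = F^(1/N)
    return f, [cg1 * f ** j for j in range(n)]


def chain_delay_tp0(cg1, cl, n, gamma=1.0):
    """Minimum chain delay in units of tp0: N * (1 + F^(1/N) / gamma)."""
    f, _ = stage_sizes(cg1, cl, n)
    return n * (1 + f / gamma)


f, sizes = stage_sizes(1.0, 8.0, 3)
print(f"f = {f:.3f}, stage sizes = {[round(c, 3) for c in sizes]}")
print(f"tp = {chain_delay_tp0(1.0, 8.0, 3):.1f} tp0 (gamma = 1)")
```

For the example this gives f = 2, stage sizes 1, 2, 4, and each middle stage is the geometric mean of its neighbors (2 = √(1 × 4)).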
50. Determining N: Optimal Number of Inverters
- What is the optimal value of N given F (= f^N)?
  - If the number of stages is too large, the intrinsic delay of the stages becomes dominant
  - If the number of stages is too small, the effective fan-out of each stage becomes dominant
- The optimum N is found by differentiating the minimum delay expression with respect to the number of stages and setting the result to 0, giving
$$\gamma + \sqrt[N]{F} - \frac{\ln(F)\sqrt[N]{F}}{N} = 0 \quad\Longleftrightarrow\quad f = e^{(1 + \gamma/f)}$$
- For γ = 0 (ignoring self-loading), N = ln(F) and the effective fan-out becomes f = e = 2.71828
- For γ = 1 (the typical case), the optimum effective fan-out (tapering factor) turns out to be close to 3.6
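The condition f = e^(1 + γ/f) has no closed-form solution for γ > 0, but it is easy to solve numerically. A minimal sketch using bisection on h(f) = f - exp(1 + γ/f), which is increasing in f for γ ≥ 0; the search interval [1, 20] is an assumption that holds for moderate γ.

```python
import math


def optimal_fanout(gamma, lo=1.0, hi=20.0, iters=100):
    """Solve f = exp(1 + gamma / f) by bisection.

    h(f) = f - exp(1 + gamma / f) is increasing for gamma >= 0, negative at
    f = 1 and positive at f = 20 for moderate gamma, so the root is unique.
    """
    def h(f):
        return f - math.exp(1 + gamma / f)

    for _ in range(iters):
        mid = (lo + hi) / 2
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for gamma in (0.0, 0.5, 1.0, 2.0):
    f = optimal_fanout(gamma)
    n = math.log(1000) / math.log(f)  # N = ln(F) / ln(f) for F = 1000
    print(f"gamma = {gamma}: f_opt = {f:.2f}, N for F = 1000: {n:.1f}")
```

γ = 0 returns e and γ = 1 returns about 3.59, matching the values above.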
51. Optimum Effective Fan-Out
- Choosing f larger than the optimum has little effect on delay and reduces the number of stages (and area)
  - Common practice is to use f = 4 (for γ = 1)
- But too many stages have a substantial negative impact on delay
52. Example: Inverter (Buffer) Staging
(F = 64, γ = 1, tp in units of tp0)
N | f | tp
1 | 64 | 65
2 | 8 | 18
3 | 4 | 15
4 | 2.8 | 15.3
53. Impact of Buffer Staging for Large CL
(γ = 1, tp in units of tp0)
F | Unbuffered | Two-stage chain | Optimized inverter chain
10 | 11 | 8.3 | 8.3
100 | 101 | 22 | 16.5
1,000 | 1,001 | 65 | 24.8
10,000 | 10,001 | 202 | 33.1
- Impressive speed-ups with an optimized cascaded inverter chain for very large capacitive loads.
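Both staging tables can be reproduced from tp = N tp0 (1 + F^(1/N)/γ). This is a minimal sketch with γ = 1; the "optimized" column appears to treat N as continuous, so the helper below minimizes over a fine grid of non-integer N (an assumption; rounding N to an integer gives slightly larger values, e.g. about 24.9 for F = 1000).

```python
def staged_delay(F, n, gamma=1.0):
    """Delay of an n-stage chain with equal per-stage fan-out, in units of tp0."""
    return n * (1 + F ** (1 / n) / gamma)


def best_staged_delay(F, gamma=1.0, n_max=30.0, step=0.001):
    """Minimum delay over a fine grid of (non-integer) stage counts N >= 1."""
    steps = int((n_max - 1.0) / step)
    return min(staged_delay(F, 1.0 + i * step, gamma) for i in range(steps + 1))


F = 64
for n in range(1, 5):
    print(f"N = {n}: f = {F ** (1 / n):.1f}, tp = {staged_delay(F, n):.1f} tp0")

for F in (10, 100, 1_000, 10_000):
    print(f"F = {F:>6}: unbuffered {staged_delay(F, 1):.0f}, "
          f"two-stage {staged_delay(F, 2):.1f}, optimized {best_staged_delay(F):.1f}")
```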
54. Input Signal Rise/Fall Time
- In reality, the input signal changes gradually (and both the PMOS and NMOS conduct for a brief time). This affects the current available for charging/discharging CL and impacts the propagation delay.
[Figure: tp versus the input signal slope ts, for a minimum-size inverter with a fan-out of a single gate]
- tp increases linearly with increasing input slope, ts, once ts > tp
- ts is due to the limited driving capability of the preceding gate
55. Design Challenge
- A gate is never designed in isolation: its performance is affected by both the fan-out and the driving strength of the gate(s) feeding its inputs. Revised tp expression:
$$t_p^{i} = t_{step}^{i} + \eta\, t_{step}^{i-1}, \qquad \eta \approx 0.25$$
- Keep signal rise times smaller than or equal to the gate propagation delays
  - Good for performance
  - Good for power consumption
- Keeping the rise and fall times of the signals small and of approximately equal values is one of the major challenges in high-performance designs: slope engineering.
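56. Delay with Long Interconnect
- When gates are farther apart, wire capacitance and resistance can no longer be ignored:
$$t_p = 0.69\, R_{dr} C_{int} + (0.69\, R_{dr} + 0.38\, R_w) C_w + 0.69\, (R_{dr} + R_w) C_{fan}, \qquad R_{dr} = \frac{R_{eqn} + R_{eqp}}{2}$$
- With per-unit-length wire capacitance cw and resistance rw (Cw = cw L, Rw = rw L):
$$t_p = 0.69\, R_{dr}(C_{int} + C_{fan}) + 0.69\, (R_{dr} c_w + r_w C_{fan}) L + 0.38\, r_w c_w L^2$$
- Wire delay rapidly becomes the dominant factor in the delay budget for longer wires (due to the quadratic term).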
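The two delay expressions are the same function of L, and the length at which the quadratic term overtakes the linear one follows from equating them. A minimal sketch; the driver resistance, capacitances, and per-unit-length wire values below are hypothetical, chosen only to illustrate the trend.

```python
def tp_lumped(rdr, cint, cfan, rw, cw, length):
    """First form: tp with total wire resistance Rw and capacitance Cw."""
    r_wire, c_wire = rw * length, cw * length
    return (0.69 * rdr * cint
            + (0.69 * rdr + 0.38 * r_wire) * c_wire
            + 0.69 * (rdr + r_wire) * cfan)


def tp_poly(rdr, cint, cfan, rw, cw, length):
    """Second form: the same delay written as a polynomial in the wire length L."""
    return (0.69 * rdr * (cint + cfan)
            + 0.69 * (rdr * cw + rw * cfan) * length
            + 0.38 * rw * cw * length ** 2)


def crossover_length(rdr, cfan, rw, cw):
    """Length at which the quadratic (distributed RC) term equals the linear term."""
    return 0.69 * (rdr * cw + rw * cfan) / (0.38 * rw * cw)


# Hypothetical values: 1 kOhm driver, 3 fF intrinsic, 6 fF fan-out,
# 0.2 Ohm/um and 0.1 fF/um wire (lengths in um).
params = dict(rdr=1e3, cint=3e-15, cfan=6e-15, rw=0.2, cw=0.1e-15)
for length in (100, 1_000, 10_000):
    print(f"L = {length:>6} um: tp = {tp_poly(length=length, **params) * 1e12:.1f} ps")
l_x = crossover_length(params["rdr"], params["cfan"], params["rw"], params["cw"])
print(f"quadratic term dominates beyond about {l_x / 1e3:.1f} mm")
```

With these assumed values the crossover is around 9 mm; a weaker driver (larger Rdr) pushes it out, while a more resistive wire pulls it in.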
57. Where Does Power Go?
- Static power consumption
  - Ideally zero for static CMOS, but in the real world...
  - Leakage current loss
    - Diodes and transistors constantly losing charge
- Dynamic power consumption
  - Charging/discharging capacitances
    - Major source of power dissipation in CMOS circuits
  - Direct-path current loss
    - Short circuit between the power rails during switching
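58. Dynamic Power Consumption
- Energy drawn from the supply per 0→1 transition: CL VDD²
- Energy stored on CL: CL VDD²/2 (the other half is dissipated in the PMOS; the stored half is dissipated in the NMOS on the 1→0 transition)
$$P_{dyn} = \text{Energy/cycle} \times f_{clk} = C_L V_{DD}^2 f_{clk}$$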
59. Switching Activity
- Power dissipation does not depend on the size of the devices but on how often the circuit is switched.
- Switching activity → frequency of energy-consuming transitions, f0→1
$$P_{dyn} = C_L V_{DD}^2 f_{0\to1} = C_L V_{DD}^2 P_{0\to1} f_{clk} = C_{eff} V_{DD}^2 f_{clk}$$
- Example: P0→1 = 0.25 gives f0→1 = fclk/4
- Effective capacitance Ceff = average capacitance switched per clock cycle
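60. Example: Power Dissipation of an IC
- Consider a 0.25 μm chip with a 500 MHz clock, an average load capacitance of 15 fF/gate (for a fanout of 4), and a 2.5 V supply.
- Dynamic power consumption per gate:
$$P_{dyn} = C_{eff} V_{DD}^2 f_{clk} = 15\ \text{fF} \times (2.5\ \text{V})^2 \times 500\ \text{MHz} \approx 47\ \mu\text{W}$$
- With 1 million gates and a switching activity of 25%, the dynamic power of the entire chip (Ng = number of gates, Pa = switching activity) is:
$$P_{chip} = P_{dyn} \times N_g \times P_a = 47\ \mu\text{W/gate} \times 10^6\ \text{gates} \times 0.25 = 11.75\ \text{W} \approx 12\ \text{W}$$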
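The same arithmetic as a short script, a minimal sketch using the example's numbers (the function name is illustrative):

```python
def dynamic_power(c_eff, vdd, f_clk):
    """Pdyn = Ceff * VDD^2 * fclk (watts)."""
    return c_eff * vdd ** 2 * f_clk


p_gate = dynamic_power(15e-15, 2.5, 500e6)  # per gate, switching every cycle
p_chip = p_gate * 1_000_000 * 0.25          # Ng gates, switching activity Pa
print(f"per gate: {p_gate * 1e6:.2f} uW, chip: {p_chip:.2f} W")
```

The unrounded per-gate figure is 46.875 μW, giving about 11.72 W for the chip; rounding to 47 μW first gives the slide's 11.75 W.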
61. Lowering Dynamic Power
62. Short-Circuit Power Consumption
The finite slope of the input signal causes a direct current path between VDD and GND for a short period of time during switching, when both the NMOS and PMOS transistors are conducting (active).
63. Short-Circuit Current Determinants
$$E_{sc} = t_{sc} V_{DD} I_{peak} P_{0\to1}, \qquad P_{sc} = t_{sc} V_{DD} I_{peak} f_{0\to1}$$
- tsc: duration of the slope of the input signal
- Ipeak is determined by:
  - the saturation current of the PMOS and NMOS transistors, which depends on their sizes, process technology, temperature, etc.
  - a strong function of the ratio between the input and output slopes
    - a function of CL
64. Impact of CL on Psc
- Large capacitive load: output fall time significantly larger than input rise time
- Small capacitive load: output fall time substantially smaller than input rise time
65. Ipeak as a Function of CL
When the load capacitance is small, Ipeak is large.
Short-circuit dissipation is minimized by matching the rise/fall times of the input and output signals: slope engineering.
66. Psc as a Function of Rise/Fall Times
[Figure: Psc normalized with respect to the zero-input-rise-time dissipation]
- When the load capacitance is small (tsin/tsout > 2 for VDD > 2 V), the power is dominated by Psc
- If VDD < VTn + |VTp|, Psc is eliminated, since both devices are never on at the same time
67. Static (Leakage) Power Consumption
$$P_{stat} = V_{DD} I_{stat}$$
- All leakages increase exponentially with temperature
  - Junction leakage doubles every 9 °C
- Sub-threshold current becomes more of a concern in VDSM (very deep submicron) technologies
  - The closer the threshold voltage is to zero, the larger the leakage current at VGS = 0 V (when the NMOS is off)
68. Leakage as a Function of VT
- Continued scaling of the supply voltage and the subsequent scaling of the threshold voltage will make sub-threshold conduction a dominant component of power dissipation.
- With a sub-threshold slope of about 90 mV/decade, each increase in VT of about 270 mV (3 × 90 mV) gives a three-orders-of-magnitude reduction in leakage (but adversely affects performance)
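The decade arithmetic can be sketched as follows, assuming leakage falls by exactly one decade per sub-threshold-slope increment of VT (a pure exponential model; the function name is illustrative):

```python
def leakage_reduction(delta_vt_mv, slope_mv_per_decade):
    """Factor by which subthreshold leakage drops when VT rises by delta_vt_mv."""
    return 10 ** (delta_vt_mv / slope_mv_per_decade)


for delta in (90, 180, 270):
    print(f"+{delta} mV VT at 90 mV/decade -> leakage / "
          f"{leakage_reduction(delta, 90):.0f}")
```

Under the same model, a slope of 85 mV/decade would give the same 1000× reduction for a 255 mV increase.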
69. TSMC Processes: Leakage and VT
From MPR, June 2000, p. 19: performance of various TSMC processes (G = generic, LP = low power, ULP = ultra low power, HS = high speed)
70. Exponential Increase in Leakages
- Leakage currents double with every 10 °C increase in temperature
- The leakage power is six orders of magnitude smaller than the dynamic power (at room temperature)
71. Energy and Power Equations
$$E = C_L V_{DD}^2 P_{0\to1} + t_{sc} V_{DD} I_{peak} P_{0\to1} + V_{DD} I_{leakage} T_{clock}$$
$$P = C_L V_{DD}^2 f_{0\to1} + t_{sc} V_{DD} I_{peak} f_{0\to1} + V_{DD} I_{leakage}$$
- Dynamic power (90% today and decreasing relatively)
- Short-circuit power (8% today and decreasing absolutely)
- Leakage power (2% today and increasing)
72. Sizing for Minimum Energy
- Goal: minimize the energy of the whole circuit
  - Design parameters: f and VDD
  - tp ≤ tpref of a circuit with f = 1 and VDD = Vref
- Overall effective fan-out: F = Cext/Cg1
- Intrinsic delay of the inverter:
$$t_{p0} \propto \frac{V_{DD}}{V_{DD} - V_{TE}}$$
73. Sizing for Minimum Energy II
- Performance constraint (γ = 1)
- Energy for a single transition
74. Sizing for Minimum Energy III
- Optimum (minimum-delay) sizing occurs at fopt = √F
- Increasing device sizes beyond fopt increases the self-loading factor
  - Deteriorates performance and requires an increase in supply voltage
75. Observations V
- Device sizing, combined with supply voltage reduction, is very effective in reducing the energy consumption
  - For F = 1, the minimum-size device is the most effective
  - For networks with a large effective fan-out (F ≫ 1), a large reduction factor of almost 10 can be obtained
- Oversizing transistors beyond the optimal value results in a hefty increase in energy
  - Unfortunately, a common approach in many of today's designs
- The optimal sizing factor for energy is smaller than the one for performance (delay), especially for large F
  - For a fan-out of 20, fopt(energy) = 3.53 and fopt(delay) = 4.47
76. Power-Delay and Energy-Delay Product
- Power-delay product (PDP):
$$PDP = P_{av}\, t_p = \frac{C_L V_{DD}^2}{2}$$
  - PDP is the average energy consumed per switching event (watts × seconds = joules)
  - A lower-power design could simply be a slower design
- Energy-delay product (EDP):
$$EDP = PDP \times t_p = P_{av}\, t_p^2$$
  - EDP is the average energy consumed multiplied by the computation time required
  - Takes into account that one can trade increased delay for lower energy per operation (e.g., via supply voltage scaling, which increases delay but decreases energy consumption)
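77. Energy-Delay Plot
VTn = 0.43 V, VDSATn = 0.63 V, VTEn = 0.74 V; VTp = -0.4 V, VDSATp = -1 V, VTEp = -0.9 V (VTE = VT + VDSAT/2)
$$V_{TE} = \frac{V_{TEn} + |V_{TEp}|}{2} \approx 0.8\ \text{V}, \qquad V_{DDopt} = \frac{3}{2} V_{TE} = \frac{3}{2} \times 0.8 = 1.2\ \text{V}$$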
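The 3/2 factor follows from EDP ∝ VDD² × VDD/(VDD - VTE), assuming E ∝ VDD² and the tp0 dependence from the "Sizing for Minimum Energy" slide: setting d/dVDD [VDD³/(VDD - VTE)] = 0 gives 3(VDD - VTE) = VDD, i.e. VDDopt = 1.5 VTE. A minimal sketch that checks the closed form against a numerical scan (names are illustrative):

```python
def vte(vt, vdsat):
    """VTE = VT + VDSAT/2 (volts)."""
    return vt + vdsat / 2


def edp_relative(vdd, v_te):
    """EDP up to a constant: E ~ VDD^2 and tp ~ VDD / (VDD - VTE)."""
    return vdd ** 3 / (vdd - v_te)


v_te = (abs(vte(0.43, 0.63)) + abs(vte(-0.4, -1.0))) / 2
vdd_closed_form = 1.5 * v_te

# Numerical check: scan VDD from just above VTE up to 2.5 V in 1 mV steps.
grid = [v_te + 0.001 * i for i in range(1, int((2.5 - v_te) / 0.001) + 1)]
vdd_scan = min(grid, key=lambda v: edp_relative(v, v_te))
print(f"VTE = {v_te:.3f} V, VDDopt = {vdd_closed_form:.3f} V (scan: {vdd_scan:.3f} V)")
```

Using the unrounded VTE of about 0.82 V gives VDDopt ≈ 1.23 V, consistent with the slide's rounded 1.2 V.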
78. Observations VI
- Voltage dependence of the EDP
  - Higher supply voltages reduce delay but harm the energy
  - Vice versa for low voltages
- VDDopt simultaneously optimizes performance (delay) and energy
  - For submicron technologies with VT in the range of 0.5 V, VDDopt ≈ 1 V
- VDDopt does not necessarily represent the optimum voltage for a given design problem
  - The goal of the design (speed or power) determines the supply voltage
79. Goals of Technology Scaling
- Make things cheaper
  - Want to sell more functions (transistors) per chip for the same money
  - Build the same products cheaper; sell the same part for less money
  - The price per transistor has to be reduced
- But also want to be faster, smaller, and lower power
80. Technology Scaling
- Goals of scaling the dimensions by 30%:
  - Reduce gate delay by 30% (increase operating frequency by 43%)
  - Double transistor density
  - Reduce energy per transition by 65% (50% power savings at a 43% increase in frequency)
- Die size used to increase by 14% per generation
- A technology generation spans 2-3 years
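The percentages above follow from a scale factor of 0.7. This minimal sketch assumes that both dimensions and supply voltage scale by 0.7 (so E ∝ C·V² scales by 0.7³), which is what the 65% energy figure implies; it is not a general scaling model.

```python
s = 0.7  # linear dimensions (and, by assumption, voltage) scale by 0.7

delay = s                   # gate delay scales with s
frequency = 1 / s           # so operating frequency rises by 1/s
density = 1 / s ** 2        # transistors per unit area
energy = s * s ** 2         # E ~ C * V^2, with C ~ s and V ~ s
power = energy * frequency  # P = E * f at the higher frequency

print(f"gate delay: -{(1 - delay):.0%}")
print(f"frequency: +{(frequency - 1):.0%}")
print(f"density: x{density:.2f}")
print(f"energy per transition: -{(1 - energy):.0%}")
print(f"power at the higher frequency: -{(1 - power):.0%}")
```

The exact values (43% faster, 2.04× density, 66% less energy, 51% less power) round to the slide's figures.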
81. Technology Evolution (ITRS 2000)
International Technology Roadmap for Semiconductors (ITRS) (http://public.itrs.net)
Node years: 2007/65 nm, 2010/45 nm, 2013/33 nm, 2016/23 nm
82. Technology Evolution (1999)
83. Technology Scaling Models
- Full scaling (constant electrical field)
  - Ideal model: dimensions and voltage scale together by the same factor S
- Fixed-voltage scaling
  - Most common until recently
  - Only dimensions scale; voltages remain constant
- General scaling
  - Most realistic for today's situation
  - Voltages and dimensions scale with different factors
84. Scaling Long-Channel Devices
85. Scaling Short-Channel Devices
86. Scaling Wire Capacitances
S = technology scaling, U = voltage scaling, SL = wire-length scaling, εc = impact of fringing and interwire capacitance

Parameter | Relation | General scaling
Wire capacitance | WL/t | εc/SL
Wire delay | Ron Cint | εc/SL
Wire energy | Cm V² | εc/(SL U²)
Wire delay / intrinsic delay | | εc S/SL
Wire energy / intrinsic energy | | εc S/SL
87. Power Density vs. Scaling Factor
- Power density increases approximately with S²
  - In correspondence with fixed-voltage scaling
- The recent trend is more in line with full scaling
  - Constant power density
  - Accelerated VDD scaling and more attention to power-reducing design techniques
88. Evolution of Wire Delay and Gate Delay
How the ratio of wire to intrinsic contributions will actually evolve is debatable.
89. Looking into the Future (Year 2010)
- Performance: 2X/16 months
  - 1 TIP (tera-instructions/s)
  - 30 GHz clock
- Size
  - No. of transistors: 2 billion
  - Die: 40 × 40 mm
- Power
  - 10 kW!!
  - Leakage: 1/3 of active power
90. Some Interesting Questions
- What will cause this model to break?
- When will it break?
- Will the model gradually slow down?
- Power and power density
- Leakage
- Process Variation