Variation Tolerant Analog and Digital Design Methodologies

Slides: 52

Provided by: rut128

Transcript and Presenter's Notes

1
Variation Tolerant Analog and Digital Design
Methodologies

Larry Pileggi
Carnegie Mellon
pileggi_at_ece.cmu.edu

2
Preview

Controlling the dominant (systematic) variations
Regular and restricted design rule (RDR) logic
fabrics
Methodologies and circuits that are optimized for
regular fabrics
Analog/RF regularity
Stochastic design methods for random variations
Modeling circuit-level variability
SRAM statistical modeling and design
Analog/RF stochastic design

3
Sub-65nm CMOS Challenges

Design and manufacturing costs are now
prohibitive
Printability limited by sub-wavelength
lithography
Standard layout rules become insufficient
First eliminate systematic variability, then
address random variability

4
Gridded RDRs

Return to the past? Gridded, fixed-pitch
layouts
The translation of stick layouts to GDSII patterns
dictates the required rules and layout density

5
Example: Gridded M1 Patterns

Example rules for contacts/vias at line-ends
Which micro-regular pattern is more
manufacturable?

[Figure: Pattern A and Pattern B, 100nm spacing]
6
Example: Gridded M1 Patterns

Exploiting the slightly tighter line-ends with
pattern B can improve area, particularly with
gridded layout
Relies on ability to characterize all possible
patterns

[Figure: Pattern B, desired process window]
7
Reduced Number of Patterns

Micro-regular patterning can reduce number of
unique patterns and reduce systematic variability
But the macro-regularity, or the way we group
subtle patterns, can be equally important
At what node do we benefit from limiting both
micro-regular patterns and macro-regular
groupings?
How can we best utilize this regularity?

8
Macro-Regular Predictability

Ex: SRAM-layout-specific SPICE models are
required for design closure of CMOS SRAMs
Statistical transistor models (90nm) based on
all possible patterns produce a much wider
noise margin distribution

[Figure: noise margin distributions for SRAM-layout-specific models and DR-compliant-layout models; sigma values 0.060 and 0.026]
9
Regularity Simplifies Rules Qualification

Standard design rules created for worst case
SRAM rules created for specific patterns
Rules can be simplified (pushed) for regular
patterns with known neighborhoods
If we can pre-qualify all regular patterns, there
is less need to pre-qualify all logic cells
Can now derive application-domain-specific logic
for improved logic efficiency and density
Requires methodology based on micro- and
macro-regularity

10
Logic Bricks for Macro-Regularity

To control the number of geometry patterns that
must be pre-qualified we can implement logic from
larger cells (bricks)
Reduces the number of edge patterns

11
CMU Experimental Brick Flow
12
ARM926EJ Example

65nm Low Power CMOS
Std Cell Spec Design
16KB D cache, 32KB I cache
250MHz worst case
Area 1.1323 mm²
Bricks derived from 7 fixed-size primitives
AO22, AO12, Nand2, Nor2, And2, 4:1 Mux
3 Flip Flop types
Various INV sizes for buffering
16 fixed-size application-specific bricks
Compatible boundary INVs, NANDs and NORs

Identical to std cell footprint
25 man-weeks of design time
First-pass working silicon

[Figure: unidirectional micro-regular fabric; 16KB D cache, 32KB I cache, 250MHz worst case, area 1.1323 mm²]
13
ARM926EJ Results

Std cells based on sizing and resynthesis using
complete library
Results do not reflect improved control of
variations, or possible improvement with
brick-specific synthesis and design flow
Normalized Leff comparison based on ACLV
simulations at nominal process conditions for DFF
cell vs. brick

14
Regular Bricks vs. Non-Regular Std Cells

DR-compliant, unidirectional FEOL pattern bricks
incur a 15-75% area penalty vs. non-regular
standard cells at 65nm
Simple patterns allow pushed line-end rules for
area improvement
Can merge diffusions within large brick functions

[Figure: normalized area (normalized to non-regular-pattern std cells) vs. brick index, for pushed-rule bricks]
15
Regular Bricks vs. Non-Regular Std Cells

DR-compliant, unidirectional FEOL pattern bricks
incur a 15-75% area penalty vs. non-regular
standard cells
Simple patterns allow pushed line-end rules for
area improvement
Can merge diffusions within large brick functions
Transistor-level optimization (TLO) of large
brick functions offers further improvement

[Figure: area normalized to non-regular-pattern std cells, pushed-rule bricks vs. bricks w/ manual TLO]
16
Baking Bricks

Can derive application-specific bricks that are
constructed from pre-qualified regular patterns
TLO can provide significant improvement for large
non-traditional logic functions
Some very efficient transistor-level
implementations are possible for certain
application-domain specific designs

Example brick: F ab bc bd acd
Synthesis with a standard cell library requires 18
transistors vs. 10 for this implementation
17
Mapping to Bricks

Difficult to add all possible large logic
functions to standard cell libraries; technology
mapping algorithms would struggle

Function ABC(DEFG): 20 transistors (2 AO22), 4
stages of logic
Function ABC(DEFG): 16 transistors, 2 stages of
logic
18
Logic/Pattern Co-Optimization
[Figure: gate-based micro-regular gridded layout vs. micro-regular layout of gridded TLO brick]
19
Application-Specific TLO

Can also attempt to extract functions that are
particularly efficient for a specific fabric or
application
Example: b0 p0p1 p2p3 p1p2p4 p0p3p4

12 transistors, 2 logic stages
26 transistors, 4 logic stages
20
Logic BRIX

Greater advantages below 65nm as methodologies
and mapping algorithms accommodate bricks
Beta version of commercial flow has demonstrated
the benefits of pre-qualified patterns and TLO

Courtesy of PDF Solutions pdBRIX
21
Analog and Mixed-Signal

Same lithography setup must work for analog and
mixed-signal components (SRAM)
SRAM has always been macro-regular, now becoming
more micro-regular
Analog layout has always been regular to control
systematic mismatch
Random variations now become dominant

22
Random Variations

Random variations most prominent for min-sized
FETs
E.g., line-edge roughness is most dominant for
min-length FETs
Wider FETs reduce variation via the Central Limit
Theorem

[Figure: distribution of ΔL variation and distribution of avg. length for widths W0 and W; L (nm) axis from 45 to 55]
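
The averaging effect behind this slide can be illustrated with a small simulation. This is a minimal sketch, assuming line-edge roughness can be modeled as independent Gaussian length errors on equal-width slices of the gate. The 50nm nominal length, the 2nm per-slice sigma, and the function name are illustrative assumptions, not values from the presentation.

```python
import random
import statistics


def simulate_avg_length_sigma(num_slices, sigma_slice_nm=2.0, nominal_nm=50.0,
                              num_devices=5000, seed=0):
    """Std. dev. of the average gate length across many simulated FETs.

    Each FET is modeled as `num_slices` width slices whose lengths are
    independent Gaussians (a hypothetical stand-in for line-edge roughness).
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(num_devices):
        slices = [rng.gauss(nominal_nm, sigma_slice_nm) for _ in range(num_slices)]
        averages.append(sum(slices) / num_slices)
    return statistics.pstdev(averages)


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n, "slices -> sigma of avg. length:",
              round(simulate_avg_length_sigma(n), 3), "nm")
```

Under these assumptions the spread of the average length shrinks roughly as 1/sqrt(number of slices), so a FET that is 16 slices wide shows about one quarter of the single-slice sigma.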
23
Stochastic Design: SRAMs

SRAM timing is determined by small FETs in
bit-cells

[Figure: SRAM read path with core cells, BL and complementary BL, WL, column mux, replica path, and sense amp (SA, SAEN, OUT and complementary OUT)]
Waveforms sampled from 90nm CMOS low-swing
bitline SRAM testchip (in collaboration with
Prof. Ken Mai, CMU)
24
Replica Bitline (RBL)

Conventional RBL chooses a fixed number of driver
cells to partially average out the randomness
Increasingly difficult as random mismatch becomes
more dominant

25
Configurable Replica Bitline (CRBL)

Instead select a subset of potential driver cells
(post-manufacturing) that best average out
randomness

26
Configurable Replica Bitline (CRBL)

Post manufacturing selection provided a 100ps
tuning range using 3 cells selected from 10
candidates in 90nm testchip
Randomness provides for wider tuning range
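
The selection step can be sketched as a small exhaustive search. This is only an illustration under stated assumptions: each candidate driver cell is reduced to a single measured delay number, the replica delay of a subset is taken to be the mean of its cells, and the function names and delay values are hypothetical. The actual testchip selection procedure is not described in the slides.

```python
import itertools


def subset_mean(values, indices):
    """Mean of the entries of `values` picked out by `indices`."""
    return sum(values[i] for i in indices) / len(indices)


def select_replica_cells(cell_delays_ps, num_selected, target_ps):
    """Return the index tuple whose mean delay is closest to `target_ps`."""
    candidates = itertools.combinations(range(len(cell_delays_ps)), num_selected)
    return min(candidates,
               key=lambda idx: abs(subset_mean(cell_delays_ps, idx) - target_ps))


def tuning_range_ps(cell_delays_ps, num_selected):
    """Spread between the slowest and fastest achievable subset means."""
    ordered = sorted(cell_delays_ps)
    fastest = sum(ordered[:num_selected]) / num_selected
    slowest = sum(ordered[-num_selected:]) / num_selected
    return slowest - fastest


if __name__ == "__main__":
    # Hypothetical per-cell delays for 10 candidate driver cells.
    measured = [80.0, 120.0, 100.0, 95.0, 130.0, 70.0, 105.0, 110.0, 90.0, 115.0]
    chosen = select_replica_cells(measured, 3, target_ps=100.0)
    print("chosen cells:", chosen, "mean:", subset_mean(measured, chosen))
    print("tuning range:", round(tuning_range_ps(measured, 3), 2), "ps")
```

Choosing 3 of 10 candidates gives 120 possible subsets, which is why a larger spread among the candidate cells translates into a wider tuning range.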

27
RBL vs. CRBL

Simulations of read path for a commercial 65nm
SRAM design

[Figure: replica path vs. read path, two panels: Global Only ? 0.91, Global + Local ? 0.41]
[Figure: RBL vs. CRBL (3 of 5 cells) and RBL vs. CRBL (3 of 10 cells), each comparing RBL delay w/o mismatch with configurable RBL delay w/ mismatch]
28
Capturing the System Level Impact

Build statistical response surface models (RSMs)
to compare and optimize designs
Example: SRAM self-timing
Self-timing circuit must track bitcell delay
Self-timing delay is part of READ delay
Buffer chain (BUF)
Insensitive to intra-die variations
Poor tracking of inter-die and environmental
variations
Replica bitline (RBL)
Better tracking for inter-die and environmental
variations
More sensitive to mismatch
Configurable Replica bitline (C-RBL)

29
Monte Carlo (Statistical) Analysis

Monte Carlo analysis
Randomly select M samples for e1, e2,...
Evaluate circuit performance at each sampling
point
Estimate performance distribution using the M
samples

[Figure: statistical device model feeding the simulator]
model NMOS bsim4 type=n tox = 4e-9 + 1e-10*e1 + 1.3e-10*e2 + ... vth0 = 0.6 + 0.24*e1 + 0.3*e2 + ... ...
e1, e2, e3, ... -> Simulator -> Performance Distribution
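
The three Monte Carlo steps listed on this slide can be sketched as follows. This is a minimal sketch, not the flow used by the author: `toy_delay_ps` is a hypothetical closed-form stand-in for a circuit simulation, and its coefficients are invented for illustration.

```python
import random
import statistics


def toy_delay_ps(e1, e2):
    """Hypothetical performance function standing in for a SPICE run."""
    return 100.0 * (1.0 + 0.10 * e1 + 0.05 * e2)


def monte_carlo(performance, num_samples, seed=0):
    """Draw M samples of (e1, e2) and evaluate the performance at each one."""
    rng = random.Random(seed)
    results = []
    for _ in range(num_samples):
        e1 = rng.gauss(0.0, 1.0)  # step 1: randomly select the sample
        e2 = rng.gauss(0.0, 1.0)
        results.append(performance(e1, e2))  # step 2: evaluate the circuit
    return results


if __name__ == "__main__":
    delays = monte_carlo(toy_delay_ps, num_samples=10000)
    # Step 3: estimate the performance distribution from the M samples.
    print("mean delay (ps):", round(statistics.mean(delays), 2))
    print("sigma (ps):", round(statistics.pstdev(delays), 2))
```

In a real flow the call to `performance` would be a full simulation, which is what makes large sample counts expensive.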
30
Monte Carlo Samples

Applying MC at system level can be run-time
costly
1k-10k sampling points are typically required
to achieve reasonable accuracy
Even with 10k sampling points, an accurate result
is not guaranteed!
MC analysis is random, and you can be unlucky
with samples (especially for results beyond the
+/-3 sigma range)
Controlling sampling points is often important,
especially for circuits like SRAMs
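
A quick calculation shows why tail results are fragile. The sketch below assumes the performance of interest is normally distributed, which is an illustrative simplification, and the function names are hypothetical.

```python
import math


def two_sided_tail_probability(k_sigma):
    """P(|Z| > k_sigma) for a standard normal variable Z."""
    return math.erfc(k_sigma / math.sqrt(2.0))


def expected_tail_samples(num_samples, k_sigma):
    """Expected number of MC samples that land beyond +/- k_sigma."""
    return num_samples * two_sided_tail_probability(k_sigma)


if __name__ == "__main__":
    for m in (1000, 10000):
        print(m, "samples ->", round(expected_tail_samples(m, 3.0), 1),
              "expected beyond +/-3 sigma")
```

Under this assumption only about 27 of 10k samples are expected to fall beyond +/-3 sigma, so any statistic computed from that region rests on very few points.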

31
Importance Sampling

Brute-force MC simulation is impractical for rare
events
If Pr{Performance < SPEC} < 10^-6, at least a
million samples are required to observe this event in
MC simulation
Idea: bias the random sample generation in such a
way as to observe rare events with a much smaller
number of samples
Build a response surface model (RSM) to identify
the failure space for MC analysis
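
The idea of biasing the sample generation can be illustrated on a one-dimensional toy problem. This sketch uses a simple mean-shifted Gaussian proposal rather than the RSM-guided approach named on the slide, and it assumes the failure event is a standard normal variable exceeding a threshold. All names are hypothetical.

```python
import math
import random


def exact_tail(threshold):
    """Exact P(Z > threshold) for a standard normal Z, for comparison."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


def importance_sampling_tail(threshold, num_samples, seed=0):
    """Estimate P(Z > threshold) by sampling from N(threshold, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = rng.gauss(threshold, 1.0)  # biased draw centered on the failure boundary
        if x > threshold:
            # Likelihood ratio N(0,1) / N(threshold,1) evaluated at x.
            total += math.exp(-threshold * x + 0.5 * threshold * threshold)
    return total / num_samples


if __name__ == "__main__":
    estimate = importance_sampling_tail(5.0, num_samples=20000)
    print("importance sampling estimate:", estimate)
    print("exact value:", exact_tail(5.0))
```

The exact probability at a threshold of 5 sigma is about 2.9e-7, so plain MC with 20,000 samples would almost never observe a single failure, while the weighted estimate uses roughly half of its samples.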

32
Response Surface Modeling (RSM)

Approximate the performance of interest (e.g.,
delay, power, gain, etc.) as an analytic
function of process parameters
Can cover local variations of process parameters
+/-30
Use linear or quadratic functions to approximate
the corresponding local variation
Fitting RSM to samples, then performing MC on
RSM, can be more efficient than direct MC if the
number of variables (N) is small
The number of sampling points must be equal to or
greater than the number of unknown model
coefficients to fit RSM
Linear RSM contains N + 1 model coefficients
Quadratic RSM contains N(N+1)/2 + N + 1 model
coefficients
PWL RSMs stitched together can be used to cover a
larger space

Local RSM, p(X)
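
The coefficient counts determine the minimum number of simulation samples needed for a fit. The short sketch below evaluates them, assuming a full quadratic model with all cross terms (N(N+1)/2 second-order terms, N linear terms, and one constant). The function names are hypothetical.

```python
def linear_rsm_coefficients(n):
    """Constant term plus one coefficient per variable."""
    return n + 1


def quadratic_rsm_coefficients(n):
    """Second-order terms (squares and cross terms), linear terms, constant."""
    return n * (n + 1) // 2 + n + 1


if __name__ == "__main__":
    for n in (10, 300):
        print("N =", n,
              "linear:", linear_rsm_coefficients(n),
              "quadratic:", quadratic_rsm_coefficients(n))
```

For N = 10 the counts are 11 and 66. For N = 300 they are 301 and 45,451, which shows how quickly a quadratic fit becomes impractical as the number of variables grows.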
33
Statistical Response Surface Models

Given set of correlated inter-die variations and
set of spatially correlated intra-die variations,
build statistical RSM
Fitted analytical performance model based on
well-chosen simulation samples
Accuracy depends on model complexity: linear,
quadratic, piecewise-linear, ...
Include uniform distribution or corner models for
VDD and temperature

34
Model Explosion

Statistical device models can be extremely
complex
Over 300 random variables (inter-die) for a 65nm
process
Mismatch modeling can require 10-20 additional
random variables for every transistor
If the number of variables is large, first
convert set of correlated random variables to
independent set of random variables
Simple example

ΔVTH,NL and ΔVTH,NR are correlated: ΔVTH,NL = y1 + y3,
ΔVTH,NR = y2 + y3, with y1, y2, y3 independent
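
The simple example can be checked numerically. This sketch assumes the garbled slide text expresses each threshold shift as the sum of a private independent variable (y1 or y2) and a shared independent variable (y3). The equal unit variances and the function names are illustrative assumptions.

```python
import math
import random


def sample_vth_shifts(rng, sigma_private=1.0, sigma_shared=1.0):
    """Two correlated shifts built from three independent Gaussians."""
    y1 = rng.gauss(0.0, sigma_private)
    y2 = rng.gauss(0.0, sigma_private)
    y3 = rng.gauss(0.0, sigma_shared)  # shared term creates the correlation
    return y1 + y3, y2 + y3


def correlation(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulated_correlation(num_samples=20000, seed=0):
    rng = random.Random(seed)
    pairs = [sample_vth_shifts(rng) for _ in range(num_samples)]
    return correlation([p[0] for p in pairs], [p[1] for p in pairs])


if __name__ == "__main__":
    print("simulated correlation:", round(simulated_correlation(), 3))
```

With equal variances the expected correlation is 0.5, since the shared variable contributes half of each shift's variance.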
35
PCA

Principal Component Analysis (PCA) does this in a
generalized way for jointly normal random
variables
Apply eigen decomposition to produce a new
(possibly smaller) set of parameters that are
uncorrelated
Similar to finding the orthogonal basis of a
vector space

[Figure: Δx (correlated parameters) mapped to Δy (uncorrelated parameters)]
36
Reducing Number of Variables

If some of the eigenvalues are small, they can be
removed to reduce the random space dimension
Allows us to use a compact set of independent
random variables to approximate the original
high-dimensional space
Most large problems tend to be rank deficient
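
The PCA reduction step can be illustrated on the smallest possible case. This is a minimal sketch, assuming two zero-mean parameters with unit variance and a known correlation. It uses the closed-form eigen decomposition of a symmetric 2x2 matrix, and the function names are hypothetical.

```python
import math


def eigen_2x2_symmetric(a, b, d):
    """Eigenpairs of [[a, b], [b, d]], largest eigenvalue first."""
    center = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    theta = 0.5 * math.atan2(2.0 * b, a - d)  # angle of the principal axis
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    return (center + radius, v1), (center - radius, v2)


def retained_variance_fraction(eigenvalues, num_kept):
    """Share of total variance captured by the largest `num_kept` components."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:num_kept]) / sum(ordered)


if __name__ == "__main__":
    # Covariance of two unit-variance parameters with correlation 0.95.
    (lam1, v1), (lam2, v2) = eigen_2x2_symmetric(1.0, 0.95, 1.0)
    print("eigenvalues:", lam1, lam2)
    print("principal direction:", v1)
    print("variance kept with 1 component:",
          retained_variance_fraction([lam1, lam2], 1))
```

For a correlation of 0.95 the eigenvalues are 1.95 and 0.05, so dropping the second component keeps 97.5% of the variance while halving the number of random variables.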

J. Friedman and W. Stuetzle, "Projection pursuit
regression," Journal of the American Statistical
Association, vol. 76, no. 376, pp. 817-823, 1981.
X. Li, J. Le, L. Pileggi and A. Strojwas,
"Projection-based performance modeling for
inter/intra-die variations," ICCAD, pp. 721-727,
2005.
37
Statistical Modeling Example

Constructed PWL RSMs to compare designs
Buffer chain (BUF)
Replica bit line (RBL)
Configurable Replica bit-line (C-RBL)
Applied MC analysis to the region of most likely
failures

38
M-C Simulation Results for 65nm CMOS

Comparison of self timing architectures (results
based on 0.98 success rate at chosen frequency)

39
Optimizing Designs

Can we use RSMs to optimize the designs over
local statistical parameter space?
Formally: find the optimum set of design
variables that minimizes a cost function and
meets a set of specifications
Both the objective and the constraint become
stochastic
Choice of optimization algorithm would depend on
the objective and constraint functions

40
Example: Sense Amp Optimization

Random offset impacts the self-timing of the READ
Build RSM of offset that is dominated by VTHn
variations
Simulations suggest a linear relationship between
offset and VTHn

65nm Latch type sense amp
Based on 1000 MC samples
41
Random Input Offset

There is little or no correlation for VTHp and
other variables
Offset is less sensitive to these other
variations even across different precharge
voltages

65nm Latch type sense amp
Based on 1000 M-C samples
42
Optimization

Since dominant variation parameter shows linear
relationship, a simple linear RSM model can be
used to optimize sizing of NFETs (N1 and N2) and
PFETs (P1 and P2)
Voffset = a*Diff(Vtn) + b*Diff(Vtp) + c
Model has less than 3% error
Vtn and Vtp are incorporated as independent and
Gaussian
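
Because the model is linear in independent Gaussian terms, the offset spread follows in closed form and can be checked against sampling. This sketch assumes the model reads Voffset = a*Diff(Vtn) + b*Diff(Vtp) + c, where Diff() is the threshold difference within a matched pair. The coefficient and sigma values are hypothetical, not fitted values from the presentation.

```python
import math
import random
import statistics


def offset_sigma_analytic(a, b, sigma_dvtn, sigma_dvtp):
    """Sigma of a*dVtn + b*dVtp + c for independent Gaussian dVtn, dVtp."""
    return math.sqrt((a * sigma_dvtn) ** 2 + (b * sigma_dvtp) ** 2)


def offset_sigma_monte_carlo(a, b, c, sigma_dvtn, sigma_dvtp,
                             num_samples=20000, seed=0):
    """Same quantity estimated by sampling the linear RSM directly."""
    rng = random.Random(seed)
    offsets = [a * rng.gauss(0.0, sigma_dvtn) + b * rng.gauss(0.0, sigma_dvtp) + c
               for _ in range(num_samples)]
    return statistics.pstdev(offsets)


if __name__ == "__main__":
    # Hypothetical sensitivities and pair-difference sigmas (mV).
    a, b, c = 1.0, 0.2, 0.0
    print("analytic sigma (mV):", round(offset_sigma_analytic(a, b, 20.0, 15.0), 2))
    print("sampled sigma (mV):",
          round(offset_sigma_monte_carlo(a, b, c, 20.0, 15.0), 2))
```

Evaluating the model is cheap enough that a sizing loop can recompute the offset sigma for each candidate NFET and PFET size without rerunning circuit simulations.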

43
Measurement Results

Large input offset voltage variation for 65nm
Optimized circuit: 10% larger gate area, 25%
lower offset
Measured data based on 14K SAs from 20 different
chips

44
Simulation Results

Comparison of simulation and measurement as a
function of precharge voltage, Vpc
Optimized circuit has been desensitized to
variations, including precharge

45
Pelgrom Model

It is well known (Pelgrom) that increasing the
device sizes will tend to average out random
variations
For random threshold variation, Pelgrom showed
that the (uncorrelated) variance improvement is
proportional to WL

Pelgrom et al., "Matching Properties of MOS
Transistors," IEEE JSSC, vol. 24, no. 5, Oct. 1989.
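
The area dependence can be made concrete with the commonly used form of the Pelgrom relation, sigma(dVT) = A_VT / sqrt(W*L). The sketch below keeps only this area term, and the matching coefficient value is a hypothetical placeholder rather than data for any specific process.

```python
import math


def pelgrom_sigma_dvt_mv(a_vt_mv_um, width_um, length_um):
    """Threshold mismatch sigma (mV) of a matched pair, area term only."""
    return a_vt_mv_um / math.sqrt(width_um * length_um)


def area_scale_for_sigma_reduction(reduction_factor):
    """Factor by which W*L must grow to cut sigma by `reduction_factor`."""
    return reduction_factor ** 2


if __name__ == "__main__":
    a_vt = 3.5  # mV*um, hypothetical matching coefficient
    print("sigma at W*L = 1 um^2:", pelgrom_sigma_dvt_mv(a_vt, 1.0, 1.0), "mV")
    print("sigma at W*L = 4 um^2:", pelgrom_sigma_dvt_mv(a_vt, 4.0, 1.0), "mV")
    print("area scale to halve sigma:", area_scale_for_sigma_reduction(2.0))
```

Halving the mismatch sigma requires four times the gate area, which is the slow 1/sqrt(area) improvement that motivates alternatives to plain upsizing.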
46
Results Comparison

How does the Pelgrom model compare as a function of
precharge?
Accuracy of the model depends on the region of
operation
Pelgrom model only applies if performance
variation is dominated by mismatch of 2 xtors

47
Analog/RF Design in Scaled CMOS

As CMOS continues to scale, oversizing
transistors can potentially cancel any benefit of
moving to the next generation technology
Example: Pelgrom model analysis of a 65nm
differential pair
Mismatch improves slowly with increasing
transistor size: 1/sqrt(area)

48
Sizing via Selection of Elements

Start with a regular fabric of analog
sub-components but select only a subset of
them for precision matching
Ex: open-loop amp for pipeline ADC, mismatch in
65nm CMOS
Select some (1/2) rather than all subcomponents
to minimize offset
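
The benefit of selecting a subset rather than using every element can be explored with a toy experiment. This is only a sketch under strong assumptions: each sub-component contributes an independent Gaussian error, the offset of a group is the mean of its members' errors, and the best half is found by exhaustive search. None of these details come from the presentation, and the function names are hypothetical.

```python
import itertools
import random


def best_subset_offset(errors, num_selected):
    """Smallest |mean error| over all subsets of the given size."""
    return min(abs(sum(subset)) / num_selected
               for subset in itertools.combinations(errors, num_selected))


def compare_selection(num_elements=12, num_trials=100, sigma=1.0, seed=0):
    """Average |offset| when using all elements vs. the best half."""
    rng = random.Random(seed)
    all_total = 0.0
    selected_total = 0.0
    for _ in range(num_trials):
        errors = [rng.gauss(0.0, sigma) for _ in range(num_elements)]
        all_total += abs(sum(errors)) / num_elements
        selected_total += best_subset_offset(errors, num_elements // 2)
    return all_total / num_trials, selected_total / num_trials


if __name__ == "__main__":
    use_all, use_best_half = compare_selection()
    print("mean |offset| using all elements:", use_all)
    print("mean |offset| using best half:", use_best_half)
```

In this toy model the number of candidate subsets grows combinatorially with the number of elements, which is one intuition for why selection can outpace the 1/sqrt(area) averaging of the Pelgrom model.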

49
Post-Silicon Element Selection for Mismatch

Some circuit overhead required to implement
post-silicon tuning
But with further scaling, post-silicon tuning
might be the only way to meet specs and reap the
benefits of next gen technology
Example: exponential vs. sqrt improvement
(Pelgrom model) with area for 65nm open-loop
amplifier

50
Conclusions

Regular patterning for logic, memory and analog
becomes increasingly important below 65nm
New circuits and methodologies can exploit this
regularity for improved performance
As systematic variations are better controlled,
random variations will become dominant
Stochastic design methods will be needed to
produce competitive chips
Configurable and tunable circuits will become
more imperative, particularly for analog and
mixed-signal
