Title: ECE260B CSE241A Winter 2005 Introduction and ASIC Flow
1 ECE260B CSE241A Winter 2005: Introduction and ASIC Flow
Instructor: Bao Liu
Website: http://vlsicad.ucsd.edu/courses/ece260b-w05
Slides courtesy of Prof. Andrew B. Kahng
2 Why not a Silicon Compiler?
[Figure: Spec/Matlab/VHDL → ? → Circuit on Silicon, where the "?" stands for synthesis, placement, routing, and verification]
3 Teams in a Design Process
- VLSI designers
- CAD developers
- Process people
- Testing team
[Figure: the flow from the previous slide annotated with the teams: VLSI designers at Spec/Matlab/VHDL; CAD developers at the "?" (synthesis, placement, routing, verification); Testing team and Process people at Circuit on Silicon]
4 Class Objectives
- Learn about ASIC implementation flow
- Verilog → GDSII
- Semi-custom implementation of CMOS digital circuits, and optimization with respect to different constraints: area, speed, power, reliability, cost
- Understand impact of constraints, tradeoffs, technology scaling
- Get some feel for each phase of the implementation flow
- Learn about building blocks: wires, gates, memories
- Prepare for future design experiences
- Get some feel for industry-standard design tools, libraries
- Will mostly use Cadence BuildGates and SOC Encounter, and Artisan TSMC 0.18/0.13um libraries
- Synthesize small cores from RTL into GDSII
5 Outline
- Introduction
- Technology Evolution
- Silicon Complexity
- System Complexity
- Design Flows
- Traditional
- State of the Art
- Design Metrics
- Design Closure
6 Technology Evolution: Cost and Integration Drivers
- Moore's Law is about cost
- Increased integration, decreased cost → more possibilities for semiconductor-based products
[Figure: Pentium 4 die shot, 2.2cm]
Slide courtesy of Mary Jane Irwin, PSU
7 Sense of Scale (Scaling)
- What fits on a VLSI chip today?
- State-of-the-art logic chip
- 20mm on a side (400mm^2)
- 0.13um drawn gate length
- 0.5um wire pitch
- 8-level metal
- For comparison
- 32b RISC processor
- 8Kλ x 16Kλ
- SRAM
- about 32λ x 32λ per bit
- 8Kλ x 16Kλ is 128Kb, 16KB
- DRAM
- 8λ x 16λ per bit
- 8Kλ x 16Kλ is 1Mb, 128KB
[Figure: scale comparison with labels 0.13um (2λ), 0.5um (8λ), 20mm (40,000 wire pitches) = 320,000λ, 64b FP Processor, 32b RISC Processor]
Slide courtesy of Ken Yang, UCLA
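The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below works in the slide's λ units (0.13um gate length = 2λ, 0.5um wire pitch = 8λ). The function and variable names are illustrative, and K is assumed to mean 1024.

```python
# Area bookkeeping in units of λ, using the numbers quoted on the slide.
# Assumption: "K" is the binary kilo (1024).
K = 1024

def bits_in_area(width_l, height_l, cell_w_l, cell_h_l):
    """Number of memory bits that tile a width x height region (all in λ)."""
    return (width_l // cell_w_l) * (height_l // cell_h_l)

# Footprint of the 32b RISC processor used as the yardstick: 8Kλ x 16Kλ.
RISC_W, RISC_H = 8 * K, 16 * K

sram_bits = bits_in_area(RISC_W, RISC_H, 32, 32)  # about 32λ x 32λ per SRAM bit
dram_bits = bits_in_area(RISC_W, RISC_H, 8, 16)   # 8λ x 16λ per DRAM bit

# Chip edge: 20 mm at 0.5 um per wire pitch, 8λ per pitch.
pitches_per_edge = round(20e-3 / 0.5e-6)
lambda_per_edge = pitches_per_edge * 8

print(sram_bits // K, "Kb SRAM =", sram_bits // 8 // K, "KB")
print(dram_bits // (K * K), "Mb DRAM =", dram_bits // 8 // K, "KB")
print(pitches_per_edge, "wire pitches,", lambda_per_edge, "λ per edge")
```

The same processor footprint holds 128Kb of SRAM or 1Mb of DRAM, which is the 8x density gap implied by the per-bit cell sizes.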
8 MOS Transistor Scaling (1974 to present)
S = 0.7 (0.5x per 2 nodes)
- Decreased transistor/feature sizes →
- Increased variability (tox, BEOL, DFM, SEU, etc.)
- Short channel effect, leakage power
Source: 2001 ITRS - Exec. Summary, ORTC Figure
9 HP / LOP / LSTP Device Roadmaps
10 SEMATECH Prototype BEOL stack, 2000
- Reverse-scaled global interconnects →
- Growing interconnect complexity
- Performance critical global interconnects
11 Intel 130nm BEOL Stack
Intel 6LM 130nm process with vias shown (connecting layers)
Aspect ratio = thickness / minimum width
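12 Interconnect Capacitance: Parallel Plate Model
ILD = interlevel dielectric
[Figure: wire of length L, width W, and thickness T at height H above the substrate, with SiO2 ILD in between]
Bottom plate of cap can be another metal layer
$C_{int} = \varepsilon_{ox} \cdot (W L / t_{ox})$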
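As a rough illustration of the parallel plate formula, the sketch below evaluates it for one wire. The wire dimensions are hypothetical and are not taken from the slides. The relative permittivity of 3.9 for SiO2 is the usual textbook value. The model ignores fringing and lateral capacitance.

```python
# Parallel-plate estimate: C_int = eps_ox * W * L / t_ox.
EPS_0 = 8.854e-12   # vacuum permittivity, F/m
K_SIO2 = 3.9        # relative permittivity of SiO2 (textbook value)

def parallel_plate_cap(width_m, length_m, t_ox_m, k_rel=K_SIO2):
    """Capacitance in farads of a W x L plate over a dielectric of thickness t_ox."""
    return k_rel * EPS_0 * width_m * length_m / t_ox_m

# Hypothetical wire: 0.5 um wide, 1 mm long, 1 um of oxide underneath.
c = parallel_plate_cap(0.5e-6, 1e-3, 1e-6)
print(f"{c * 1e15:.2f} fF")
```

Doubling the length or halving the dielectric thickness doubles the result, which is all the model captures. It underestimates the capacitance of narrow, tall wires.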
13 Line Dimensions and Fringing Capacitance
[Figure: adjacent wires of width w and spacing S, with lateral cap between them]
- Capacitive coupling →
- Crosstalk effect
- Signal integrity
14 Interconnect Evolution and Modeling Needs
- Before 1990, wires were thick and wide while devices were big and slow
- Large wiring capacitances and device resistances
- Wiring resistance << device resistance
- Model wires as capacitances only
- In the 1990s, scaling (by scale factor S) led to smaller and faster devices and smaller, more resistive wires
- Reverse scaling of properties of wires
- RC models became necessary
- In the 2000s, frequencies are high enough that inductance has become a major component of total impedance
15 Evolving Interconnects Affect Timing
- Interconnect capacitance > gate input capacitance
- Better prediction
- Interconnect resistance no longer ignorable
- Better modeling: distributed R(L)C network, AWE, etc.
- Effective capacitance < total load capacitance
- Interconnect delay > gate delay for sub-micron technologies
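To show why a distributed RC model differs from a single lumped RC, here is a minimal sketch using the Elmore delay of an RC ladder. Elmore delay is not named on the slide. It is used here only as the simplest first-order delay metric, and the resistance and capacitance values are hypothetical.

```python
def elmore_delay_ladder(r_total, c_total, n_segments, r_driver=0.0, c_load=0.0):
    """Elmore delay at the far end of a wire modeled as an n-segment RC ladder.

    Each segment is a series resistance r_total/n followed by a capacitance
    c_total/n to ground. r_driver is the driver resistance and c_load is the
    receiver input capacitance.
    """
    r_seg = r_total / n_segments
    c_seg = c_total / n_segments
    delay = 0.0
    r_upstream = r_driver
    for _ in range(n_segments):
        r_upstream += r_seg          # resistance from the source to this node
        delay += r_upstream * c_seg  # each capacitor sees all upstream resistance
    delay += r_upstream * c_load     # the load sees driver plus full wire resistance
    return delay

# Hypothetical wire: 100 ohm and 200 fF in total.
lumped = elmore_delay_ladder(100.0, 200e-15, 1)
distributed = elmore_delay_ladder(100.0, 200e-15, 100)
print(f"lumped: {lumped * 1e12:.1f} ps, distributed: {distributed * 1e12:.1f} ps")
```

With many segments the wire's own delay approaches RC/2, so a single lumped RC overestimates it by about a factor of two.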
16 Sub-Wavelength Optical Lithography
- What are implications of this picture?
- Slide courtesy of Numerical Technologies, Inc.
17 Complexity of Photomasks
18 Summary of Technology Scaling
- Scaling of 0.7x every three (two?) years
Node:          .25u  .18u  .13u  .10u  .07u  .05u
Year:          1997  1999  2002  2005  2008  2011
Metal layers:  5LM   6LM   7LM   7LM   8LM   9LM
- Interconnect delay dominates system performance
- consumes up to 70% of clock cycle
- Cross coupling capacitance is dominating
- cross capacitance → 100%, ground capacitance → 0%
- ground capacitance is 90% in .18u
- huge signal integrity implications (e.g., guardbands in static analysis approaches)
- Multiple clock cycles required to cross chip
- whether 3 or 15 not as important as fact of multiple > 1
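The "0.7x per node" rule can be checked directly against the node and year rows on this slide. The short sketch below does only that. The variable names are illustrative.

```python
# Feature sizes (um) and years as listed on the slide.
nodes_um = [0.25, 0.18, 0.13, 0.10, 0.07, 0.05]
years = [1997, 1999, 2002, 2005, 2008, 2011]

# Linear shrink from each node to the next, and the time it took.
ratios = [nxt / prev for prev, nxt in zip(nodes_um, nodes_um[1:])]
gaps = [y1 - y0 for y0, y1 in zip(years, years[1:])]

for prev, nxt, ratio, gap in zip(nodes_um, nodes_um[1:], ratios, gaps):
    print(f"{prev:.2f}u -> {nxt:.2f}u: {ratio:.2f}x in {gap} years")

# Two successive 0.7x shrinks give roughly 0.5x in linear dimension.
two_node_shrink = 0.7 ** 2
print(f"0.7 x 0.7 = {two_node_shrink:.2f}")
```

Every step lands between about 0.70x and 0.77x, and all but the first step take three years.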
19 New Materials Implications
- Lower dielectric permittivity
- reduces total capacitance
- doesn't change cross-coupled / grounded capacitance proportions
- Copper metallization
- reduces RC delay
- avoids electromigration (factor of 4-5 ?)
- thinner deposition reduces cross cap
- Multiple layers of routing
- enabled by planarization; 10% extra cost per layer
- reverse-scaled top-level interconnects
- relative routing pitch may increase
- room for shielding
20 Technical Issues
- Manufacturability (chip can't be built)
- antenna rules
- minimum area rules for stacked vias
- CMP (chemical mechanical polishing) area fill rules
- layout corrections for optical proximity effects in subwavelength lithography; associated verification issues
- Signal integrity (failure to meet timing targets)
- crosstalk induced errors
- timing dependence on crosstalk
- IR drop on power supplies
- Reliability (design failures in the field)
- electromigration on power supplies
- hot electron effects on devices
- wire self heat effects on clocks and signals
21 Noise
- Analog design concerns are due to physical noise sources
- because of discreteness of electronic charge and stochastic nature of electronic transport processes
- example: thermal noise, flicker noise, shot noise
- Digital circuits, due to large, abrupt voltage swings, create deterministic noise which is several orders of magnitude higher than stochastic physical noise
- still, digital circuits are prevalent because they are inherently immune to noise
- Technology scaling and performance demands make noisiness of digital circuits a big problem
Courtesy Hormoz/Muddu, ASIC99
22 Silicon Complexity Challenges
- Silicon Complexity: impact of process scaling, new materials, new device/interconnect architectures
- Non-ideal scaling (leakage, power management, circuit/device innovation, current delivery)
- Coupled high-frequency devices and interconnects (signal integrity analysis and management)
- Manufacturing variability (library characterization, analog and digital circuit performance, error-tolerant design, layout reusability, static performance verification methodology/tools)
- Scaling of global interconnect performance (communication, synchronization)
- Decreased reliability (soft error uncertainty, gate insulator tunneling and breakdown, joule heating and electromigration)
- Complexity of manufacturing handoff (reticle enhancement and mask writing/inspection flow, manufacturing NRE cost)
- If you don't know a term, ask
23 In a PDA
- Reference Design: personal digital assistant (PDA)
- Composed of CPU, DSP, peripheral I/O, and memory
24 Required Performance for Multi-Media Processing
[Figure: required performance by application, on a logarithmic GOPS axis (0.01, 0.1, 1, 10, 100)]
- Video: MPEG1, MPEG2 (MP/ML, MP/HL), and MPEG4 extraction and compression; JPEG
- Audio / Voice: MPEG, Dolby-AC3, word recognition, sentence translation, voice auto translation
- Graphics: 2D graphics, 3D graphics (10Mpps, 100Mpps)
- Communication / Recognition: FAX, modem, VoIP modem, SW defined radio, voice print recognition, face recognition, moving picture recognition
GOPS: Giga Operations Per Second
25 Implemented With an SoC
0.18um / 400MHz / 470mW (typical)
MM Application: MP3, JPEG, Simple Moving Picture
[Figure: SoC block diagram with three regions: Processor Area, Data Transfer Area, and Peripheral Area. Labeled blocks: CPU, I-cache 32KB, D-cache 32KB, DMA controller, LCD Cnt., MEM Cnt., CPG, PWR, PWM, RTC, FICP, SSP, I2C, GPIO, Sound, USB, OST, MMC, I2S, KEY, UART, AC97; external LCD, Flash 32MB, SDRAM 64MB. Other labels: 6.5MTrs., Max 400MHz, 100MHz, 4 48MHz]
Specification: Available Time 6-10Hr
- If the PDA must have 200h standby time with a 120g battery?
26 System Complexity Challenges
- System Complexity: exponentially increasing transistor counts, with increased diversity (mixed-signal SOC, ...)
- Reuse (hierarchical design support, heterogeneous SOC integration, reuse of verification/test/IP)
- Verification and test (specification capture, design for verifiability, verification reuse, system-level and software verification, AMS self-test, noise-delay fault tests, test reuse)
- Cost-driven design optimization (manufacturing cost modeling and analysis, quality metrics, die-package co-optimization, ...)
- Embedded software design (platform-based system design methodologies, software verification/analysis, codesign w/HW)
- Reliable implementation platforms (predictable chip implementation onto multiple fabrics, higher-level handoff)
- Design process management (team size / geographic distribution, data management, collaborative design, process improvement)
28 Levels of VLSI Design in a Traditional Flow
- Specification
- what the system (or component) is supposed to do
- Architecture
- high-level design of component
- state defined
- logic partitioned into major blocks
- Logic
- gates, flip-flops, and the connections between them
- Circuit
- transistor circuits to realize logic elements
- Device
- behavior of individual circuit elements
- Layout
- geometry used to define and connect circuit elements
- Process
- steps used to define circuit elements
29 Design Principles (Traditional)
- Partition the problem (hierarchical design)
- Different abstraction levels: RTL, gate-level, switch-level, transistor-level
- Orthogonalize concerns
- Abstraction vs. implementation
- Logic vs. timing
- Constrain the design space to simplify the design process
- Balance between design complexity and performance
- E.g., standard-cell methodology
30 VLSI Design Flow Evolution
- Expanding in two directions
- System-on-Chip (SoC) Design
- Design for Manufacturability (DFM)
- More design metrics
- Area
- Timing
- Power
- Signal Integrity
- Reliability
- Tighter Integration
- Design closure
- RTL/GDSII sign-off re-defined
31 Design Procedure and Tools
- Behavior modeling
- Matlab/C/VHDL
- Logic synthesis
- DesignCompiler, BuildGates, ...
- Verification of synthesis
- Formal Verification (Verplex)
- Static timing analysis (PrimeTime)
- Place and route
- Astro, SOCE, ...
- Verification of layout
- DRC, ERC, LVS (Calibre)
- Extraction (SignalStorm)
- Delay Calculation (CeltIC)
- Simulation (SPICE)
- DFM
32 Design Principles (State of the Art)
- Integrate the problem (design closure)
- Back-annotation, predictability
- Balance design metrics
- Area/timing/power/signal integrity/reliability
- Explore the design space
- Balance between design complexity and performance
- Platform-based SoC design
33 Design Methodologies (business models)
- Full-Custom (high effort, leading-edge performance, high-volume)
- Semi-Custom (strong infrastructure, economical in lower volumes)
- ASIC (Application-Specific Integrated Circuit)
- Standard Cell/Gate Array/Via Programmable/Structured ASIC
- FPGA
- Special
- Analog (custom layout, I/Os and sense amps)
- Mixed-Signal / RF (unique to each process, no scaling)
- System-on-Chip (→ System-in-Package)
- Various components: IP blocks, ASIC, FPGA, memory, uP, RF, etc.
- Define implementation platform, hardware-software co-design
- Performance vs. complexity
34 Flow
[Figure: design flow diagram with the following labels.
Library inputs: Device model, Wire Model (r, s, m), Layers, Layout rules, Schematic Entry, Layout Entry, 3-D RLC Modeling Tool, Cell Characterization.
Libraries: Standard Cell Library, Synthesis Library (Timing/Power/Area), Parasitic Extraction Library, Place & Route Library (Ports).
Design representations: C-Model, Verilog Behavioral Model, Verilog Structural RTL, Structural Model, Block Layout, Global Layout.
Steps: Synthesis, Floorplan, P&R.
Checks: Functional, Static Timing, Power/Area, Scan/Testability, Clock Routing/Analysis, Static/Dynamic Timing w/extract, DRC/ERC/LVS]
Slide courtesy of Mary Jane Irwin, PSU
35 Traditional Taxonomy
Front End
Back End
36 Generic Flow Steps
- Library preparation
- Library data preparation
- Design data preparation
- Logic design
- Specification to RTL
- RTL simulation
- Hierarchical floorplanning
- Synthesis
- Formal verification
- Gate level simulation
- Static timing analysis
- Physical design
- Physical floorplanning
- Place and route
- RC extraction
- Formal verification
- Physical verification
- Release to manufacturing
- Design for test
- Engineering change order
37 Library and Design Data
- Models and technology data required to execute the design flow
- Power, timing: ALF, DCL, OLA, .lib, STAMP
- Layout: LEF, DEF, GDSII
- Delays and path timing, parasitics: SDF, GCF, SDC, DSPF, RSPF, SPEF, SPICE
- Layout rules: Dracula, Calibre deck
38 Architecture Design
- Platform-based SoC Design
- Platform is a library of design resources
- Helps design space exploration
- Meet in the middle
- Embedded system
- Hardware-software co-design
[Figure: platform-based design, with labels Application space, Application instance, Platform specification, System platform, Platform design-space exploration, Platform instance, Architecture space]
Figure courtesy of Alberto Sangiovanni-Vincentelli, UCB
39 High-Level Synthesis (Behavior → RTL)
- Scheduling
- Assignment of each operation to a time slot corresponding to a clock cycle or time interval
- Resource allocation
- Selection of the types of hardware components and the number for each type to be included in the final implementation
- Module binding
- Assignment of operation to the allocated hardware components
- Controller synthesis
- Design of control style and clocking scheme
- Compilation
- of the input specification language to the internal representation
- Parallelism extraction
- usually via data flow analysis techniques
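The scheduling and resource allocation steps can be illustrated with a small sketch. It uses ASAP (as soon as possible) scheduling, one common textbook policy that the slide does not name, on a hypothetical dataflow graph. Every operation is assumed to take one clock cycle, and the graph is assumed to be acyclic.

```python
from collections import Counter

def asap_schedule(deps):
    """ASAP scheduling: place each operation in the earliest cycle after all
    of its predecessors. `deps` maps an operation to the operations it depends on."""
    cycle = {}

    def visit(op):
        if op not in cycle:
            preds = deps.get(op, [])
            cycle[op] = 1 + max((visit(p) for p in preds), default=0)
        return cycle[op]

    for op in deps:
        visit(op)
    return cycle

# Hypothetical dataflow graph for y = (a * b) + (c * d) - e.
deps = {
    "mul1": [],
    "mul2": [],
    "add": ["mul1", "mul2"],
    "sub": ["add"],
}
schedule = asap_schedule(deps)

# Resource allocation follows from the schedule: the number of multipliers
# needed is the largest number of multiplications sharing one cycle.
mults_per_cycle = Counter(c for op, c in schedule.items() if op.startswith("mul"))
multipliers_needed = max(mults_per_cycle.values())

print(schedule)
print("multipliers needed:", multipliers_needed)
```

Both multiplications land in cycle 1, so this schedule needs two multipliers. Delaying one of them by a cycle would trade latency for area, which is the kind of tradeoff high-level synthesis explores.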
40 Architecture Level Floorplanning
- Defines the basic chip layout architecture
- Define the standard cell rows and I/O placement locations
- Place RAMs and other macros
- Separate gate array, memory, analog, RF blocks
- Define power distribution structures such as rings and stripes
- Allow space for clock, major buses, etc.
- Rules of thumb for cell density are used to initially calculate design size
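A cell-density rule of thumb of the kind mentioned in the last bullet might look like the sketch below. The formula (cell area divided by a target utilization, plus macro area) is a common first-cut estimate and an assumption here. The gate count, area per gate, and utilization are hypothetical numbers, not values from the course libraries.

```python
import math

def estimate_core_area(cell_area_um2, macro_area_um2, utilization):
    """Core area in um^2: standard cells spread at the target row
    utilization, plus the area of hard macros."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return cell_area_um2 / utilization + macro_area_um2

# Hypothetical block: 100k gates at 10 um^2 each, 0.5 mm^2 of RAM, 70% utilization.
core = estimate_core_area(100_000 * 10.0, 0.5e6, 0.70)
side_mm = math.sqrt(core) / 1000.0
print(f"core area {core / 1e6:.2f} mm^2, about {side_mm:.2f} mm on a side")
```

The estimate leaves out I/O pads, power rings, and routing channels, so it is only a starting point for the floorplan.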
41 Logic Synthesis
- Conversion of RTL to gate-level netlist
- Targeted to a foundry-specific library
- Can be performed hierarchically (block by block)
- Timing-driven
- Clock information
- Primary input arrival times, primary output required times
- Input driving cells, output loading
- False paths, multi-cycle paths
- Interconnect delay may be calculated based on a wireload model which uses fanout to estimate delay
- Clock parameters (insertion delay, skew, jitter, etc.) are assumed to be attainable later in place and route
42 Formal Verification
- RTL description and gate level netlist are compared to verify functional equivalence, thereby verifying the synthesis results
- Formal methods
- Graph isomorphism
- Binary Decision Diagram (BDD)
- Emerging technology that supplements the more traditional gate-level simulation approach
- FV also performed after place-and-route (if gate netlist changes)
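What "functional equivalence" means can be shown on a toy example. The sketch below compares two descriptions of a 2:1 mux by exhaustive enumeration. This is not the BDD or graph isomorphism approach listed above. It needs 2^n evaluations for n inputs, which is exactly why real tools use those methods instead. All function names are hypothetical.

```python
from itertools import product

def equivalent(f, g, n_inputs):
    """Return (True, None) if f and g agree on every input vector,
    otherwise (False, first_counterexample)."""
    for bits in product((0, 1), repeat=n_inputs):
        if f(*bits) != g(*bits):
            return False, bits
    return True, None

# "RTL" view of a 2:1 mux.
def mux_rtl(a, b, sel):
    return b if sel else a

# "Gate-level" view after synthesis: (a AND NOT sel) OR (b AND sel).
def mux_gates(a, b, sel):
    return (a & (1 - sel)) | (b & sel)

# A deliberately broken netlist with the select polarity swapped.
def mux_buggy(a, b, sel):
    return (a & sel) | (b & (1 - sel))

print(equivalent(mux_rtl, mux_gates, 3))
print(equivalent(mux_rtl, mux_buggy, 3))
```

Unlike simulation with a chosen set of test vectors, the check covers every input combination and returns a counterexample when the two descriptions differ.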
43 RTL Simulation
- RTL code, written in Verilog, VHDL or a combination of both, is simulated to verify functional correctness
- Testbenches apply input stimulus to the design
- Several methods are used to verify the outputs
- Self-checking testbenches automatically verify output correctness and report mismatches
- Results can be stored in a file and compared to previous results
- Waveform displays can be used to interactively verify the outputs
44 Gate-Level Simulation
- Covers both functionality and timing
- Correctness is only as good as the test vectors used
- Especially critical for non-synchronous designs, verification of false path and multi-cycle path constraints
- Cell timing is included in the simulation models and interconnect delay is passed from the synthesis run
- Worst case PVT conditions are used to analyze for setup violations, and best case PVT conditions are used to analyze for hold violations
- PVT: Process, Voltage, Temperature
45 Static Timing Analysis
- Verifies that design operates at desired frequency
- Implicitly assumes correct timing constraints (!), e.g., boundary conditions
- Timing constraints are similar to those used by logic synthesis
- Verifies setup and hold times at FF inputs; can also check timing from and to PIs and POs; can also check point-to-point delay values (with blocking of pins, etc.)
- As with gate-level simulation, both best- and worst-case analysis is performed
- Typically performed on full-chip (not block) basis
- May require modified constraints for inter-block issues: multiple clock domains, multi-cycle paths, etc.
- For compatibility with timing-driven layout flow, helps to have simple / single set of constraints
- Other issues: incremental analysis, ...
46 Block-Level Physical Floorplanning
- Reconcile logical and physical hierarchies
- Cells that are interconnected want to be close together
- Take advantage of RTL hierarchy
- Generate a physical hierarchy
- RTL hierarchy = best physical hierarchy?
- Often bundled within the same cockpit as the place and route tool
- Give placement some initial clues to reduce complexity
47 Place and Route
- Automatically place the standard cells
- Generate clock trees
- Add any remaining power bus connections
- Route clock lines
- Route signal interconnects
- Design rule checks on the routes and cell placements
- Timing driven tools
- Require timing constraints and analysis algorithms similar to those used during the static timing analysis step
48 RC(L) Extraction
- Calculate resistance and capacitance (and inductance) of interconnects
- Based on placement of cells
- Routing segments
- Calculate capacitive (inductive) effects of adjacent segments
- Extract capacitance between metal segments
- RC(L) data transferred back to
- Static timing analysis (back annotation)
- Gate level simulation
- Replaces wire load model used in synthesis
- Drive delay calculation, signal integrity analysis (crosstalk, other noise), static timing
- Q: How do parasitics and noise affect performance?
49 Physical Verification
- DRC: Design Rule Check
- Spacing, min dimension rules
- LVS: Layout Versus Schematic
- Verifies that layout and netlist are equivalent at the transistor level
- ERC: Electrical Rule Check
- Dangling nets, floating nodes
- GDSII (Stream Format)
- Final merge of layout, routing and placement data for mask production
50 Release to Manufacturing
- Final edits to the layout are made
- Metal fill and metal stress relief rules are checked
- Manufacturing information such as scribe lanes, seal rings, mask shop data, part numbers, logos and pin 1 identification information for assembly are also added
- DRC and LVS are run to verify the correctness of the modified database
- Tapeout documentation is prepared prior to release of the GDSII to the foundry
- Pad location information is prepared, typically in a spreadsheet
- Cadence's Virtuoso is used for custom-manual edits of the mask layers
- Manufacturing steps
- generation of masks
- silicon processing
- wafer testing
- assembly and packaging
- manufacturing test
51 A More Detailed Design Flow
[Figure: flow chart with stages Design Specs, Fnl. Design Constraints / Reqmts., Synthesis (Lib., CWLM), Floor-plan P/G (Lib., CWLM), Placement and Physical re-synth, Clock distribution, Route and scan re-order, Timing analysis and IPO, Fnl., pwr., SI ECO, ERC/DRC/LVS, Tape-out. The notes below annotate the stages in order.]
- Architectural optimization (timing)
- Inter-group buses, bandwidth
- Clock, SI, test validation
- Floorplanning and custom WLM
- Power distribution (Internal, I/O)
- I/O driver, padring design
- Board-level timing, SI
- Row definitions
- Placement of cells
- Congestion analysis
- Placement-based re-synthesis
- Noise minimization, isolation
- Clock distribution
- Full routing
- Scan stitching, re-ordering
- Full RC back-annotation
- Hierarchical timing, electrical and SI analysis and IPO/ECO
A. Khan, Simplex/Altius
53 More Design Metrics and Techniques
- Cost minimization
- Synthesis (technology mapping)
- Placement, routing
- Performance optimization
- Logic transformation, transistor sizing
- Buffering, re-routing
- Power minimization
- Gating (sleep transistors), variant Vdd
- Process optimization
- Dual-Vth
- Signal Integrity
- Sizing, net ordering, shielding
- P/G design, placement, synthesis
- Reliability
- Statistical design optimization
- Design margin
- Area
- Cell area
- Wirelength
- Timing
- Gate
- Interconnect
- Power
- Dynamic
- Static
- Leakage
- Signal Integrity
- Crosstalk (capacitive, inductive)
- Supply voltage drop (IR drop, L dI/dt)
- Reliability
- Variation (Vdd, thermal, process variation (tox, BEOL))
- Electromigration
- Hot electron effect (SEU)
54 Design Flow Evolution (ITRS-2003)
55 Design Convergence Drivers and Approaches
56 Wireload Model
- Helps delay estimation at synthesis stage
- Gate delay = f(input slew, load cap)
- Wire cap = f(fanout number)
- Empirical
- Different for each technology, library, tool, design, and design stage
- Statistical (from library), custom (multiple iterations), structural (look at adjacent nets)
- Large deviation remains
- Routing obstacles (hard IP blocks, macros, etc.)
- Routing algorithms/implementations (timing driven, net ordering, details)
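A wireload model can be pictured as a lookup from fanout to estimated wire capacitance, feeding a gate delay function of input slew and load. The sketch below is only an illustration of that idea. The table entries, the extrapolation slope, the pin capacitance, and the linear delay coefficients are all hypothetical and do not come from any real library.

```python
# Hypothetical wireload table: fanout -> estimated wire capacitance in fF.
WIRELOAD_TABLE = {1: 2.0, 2: 4.5, 3: 7.5, 4: 11.0}
SLOPE_FF_PER_FANOUT = 4.0  # assumed extrapolation slope past the last entry

def wire_cap_ff(fanout):
    """Wire cap = f(fanout number): table lookup with linear extrapolation."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    if fanout in WIRELOAD_TABLE:
        return WIRELOAD_TABLE[fanout]
    max_fanout = max(WIRELOAD_TABLE)
    return WIRELOAD_TABLE[max_fanout] + SLOPE_FF_PER_FANOUT * (fanout - max_fanout)

def net_load_ff(fanout, pin_cap_ff=1.5):
    """Total load on the driver: estimated wire cap plus receiver pin caps."""
    return wire_cap_ff(fanout) + fanout * pin_cap_ff

def gate_delay_ps(input_slew_ps, load_cap_ff, intrinsic_ps=20.0, k_slew=0.1, k_load=3.0):
    """Linear stand-in for gate delay = f(input slew, load cap)."""
    return intrinsic_ps + k_slew * input_slew_ps + k_load * load_cap_ff

delay = gate_delay_ps(50.0, net_load_ff(3))
print(f"fanout 3: load {net_load_ff(3):.1f} fF, delay {delay:.1f} ps")
```

Two nets with the same fanout always get the same estimate here, regardless of where their pins end up after placement. That is the source of the large deviation noted in the last bullets.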
57 Interconnect Statistics
- What are some implications?
58 Rent's Rule
- Power law distribution
- $N = G^p$
- N = number of nets
- G = number of gates
- p = Rent exponent, between 0 and 1
- Foundation of statistical interconnect prediction
- Empirical, unclear theoretical root
[Figure: log-log plot of lg N versus lg G]
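Because the rule is a power law, the exponent p is the slope of a straight line on the lg N versus lg G plot. The sketch below recovers it with an ordinary least-squares fit. The data points are synthetic, generated from N = G^0.6 purely to exercise the code, and are not measurements. The intercept term is an assumption that allows for a constant factor.

```python
import math

def fit_rent_exponent(samples):
    """Least-squares fit of lg N = p * lg G + c over (G, N) samples.
    Returns (p, c), where p is the Rent exponent estimate."""
    xs = [math.log10(g) for g, _ in samples]
    ys = [math.log10(n) for _, n in samples]
    count = len(samples)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    p = sxy / sxx
    return p, mean_y - p * mean_x

# Synthetic data generated from N = G**0.6 (not measured data).
samples = [(g, g ** 0.6) for g in (10, 100, 1000, 10000)]
p, c = fit_rent_exponent(samples)
print(f"fitted Rent exponent p = {p:.3f}")
```

On real designs the points scatter around the line, which is one reason the slide calls the rule empirical.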
59 Constructive Interconnect Prediction
- Statistical models have their limitations
- Critical paths and the law of small numbers
- Statistical properties, e.g., average wirelength
- Extreme statistical properties, e.g., critical path length
- Implementation details
- Routing congestion, e.g., horizontal effect
- Timing optimization, e.g., layer assignment
- Via blockage, pin accessibility, wrong way routing, etc.
- Predict by construction (physical synthesis)
- try a fast (global) router
- Scheffer and Nequist, Proc. ACM SLIP 2000, pp. 139-144
60 Goal: Design Convergence
- What must converge?
- logic, timing, power, SI, reliability in a physical embedding
- support front-end signoff with a predictable back-end
- Achieve Convergence through Predictability
- correct by construction (assume, then enforce)
- constraints and assumptions passed downstream; not much goes upstream
- ignores concerns via guardbanding
- separates concerns as able (e.g., FE logic/timing vs. BE spatial embedding)
- construct by correction (tight loops)
- logic-layout unification; synthesis-analysis unification, concurrent optimization
- elimination of concerns
- reduced degrees of freedom, pre-emptive design techniques
- e.g., power distribution, layer assignment / repeater rules
61 Physical Prototyping Philosophy
- Prototype delivers accurate physical data
- Levels of accuracy
- Placement-acknowledgeable synthesis (PKS)
- Including global route
- Post-detailed-route (In-Place Optimization, i.e., IPO)
- Hierarchical timing budgeting
- Chip-level CTS, top-level route and IPO, power analysis and grid design
- Block-level synthesis, placement, IPO, routing
- Handoff with enough physical information to ensure correct results
[Figure: flow with labels RTL (Functionality known), Gates, Physical Prototype (Timing / routability known), Floorplan / Placement, Routing]
M. Courtoy, Silicon Perspective
62 Coarse Placement Drives Partitioning, Coarse Routing Drives Pin Assignment / Timing Opt
- Full-chip prototype results in optimal pin placement
- Results in narrower channels and reduced die size
- Reduces the routing congestion
- Improves the chip timing
- Accurate timing budgets result in predictable timing convergence
[Figure: Physical Prototype]
M. Courtoy, Silicon Perspective
63 Cool Pictures of the Pieces
[Figure: screenshots labeled Full Chip Power Planning; Timing Closure (Place, Detailed Trial Route, RC Extraction, Delay Calc / STA, IPO); Hierarchical Clock Tree Synthesis; Full Chip Physical Prototype; Power IR Drop Analysis; Block-Level Optimization; Partition]
M. Courtoy, Silicon Perspective
Tape Out Every Day