Title: Standard Cell Architecture for High Frequency Operation
1. Standard Cell Architecture for High Frequency Operation
Peter Hsu, Ph.D., Chief Architect, Microprocessor Development, Toshiba America Electronic Components, Inc.
Created 14 March 2001 at the University of Wisconsin in Madison
2. Disclaimer
- The ideas, data and conclusions presented here are solely those of the Author, and do not in any way represent Toshiba Corporation policy or strategy.
3. Introduction
- High Frequency is Difficult!
- Many Issues
- Signal Integrity, Power Dissipation, ...
- My Approach
- Disciplined Methodology
- Global Optimization
- Outline
- Layout
- Circuits
- Analysis
4. Layout Strategy
- Leverage Advanced Technologies
- Local Interconnect
- Flip-Chip Area Array I/O
- CAD Tool Compatibility
- Parasitic Estimation, Extraction
- Complex, High Frequency Designs
- Robust Power Grid
- Flexible Macro Embedding
5. Metal Usage
Dimensions are for a nominal 0.12 µm generation process.
[Figure: metal stack cross-section. Labels: Top Metal (Flip-Chip Solder Pads); Global Wires; Short; VDD; VSS; Signal; Clock (2x); Via; Contact; Local Interconnect (M0): Tungsten, Aluminum or Copper. Dimension callouts: 600nm, 450nm, 900nm, 450nm, 300nm, 200nm.]
6. Standard Cell Layout
[Figure: standard cell row layout. Labels: Unrelated Wire; U1.A; U1.Z; U2.A; U2.Z; Cell Row Power Vias (1 every 6 Tracks); Crosspoint Power Vias; Minimum Cell: 3 Tracks.]
7. Area Array I/O
[Figure: flip-chip area array of pads. Labels: Core VDD; Core VSS; I/O VDD; I/O VSS; Signal; 225 µm pitch; 225 µm; 1.2 µm; Cell 2.5 µm²; 5 I/O Macro (50K µm²); Largest SRAM Macro without sacrificing I/O (16 KBytes).]
8. I/O Macro Cell
- Self-Contained
- 5 Signals
- VDDQ, VSSQ
- ESD Protection
- Latch-Up Ring
- SoC Flexibility
- Many I/O Types
- Different Voltages
- Routing Porosity
- 50 Channels Free in Global Wiring Layers
- Short Output Trace on Top Metal (Electromigration)
[Figure: metal layer usage over an I/O macro. Labels: Top Metal M6; M5; M4; M3; Free Routing Channels; I/O Macro Uses M0, M1, M2.]
9. SRAM Metal Usage
[Figure: metal layer usage over an SRAM macro. Labels: M1; M2; 6-Transistor Cell (1.2 × 2.1 µm); M3 Global Wires (1× or 2× Pitch); SRAM Macro Uses M0, M1, M2; CAD Tool Inserts M3-M2 Power Vias.]
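As a rough cross-check of the area figures quoted on the Area Array I/O and SRAM slides, the arithmetic can be sketched as below. The 1.2 × 2.1 µm cell, the 225 µm pitch and the 16 KByte macro size come from the slides; treating a pad site as a 225 × 225 µm square and counting only raw bit cells (no decoders, sense amplifiers or routing) are assumptions made for illustration.

```python
# Rough area arithmetic from the slide figures (illustrative only).
CELL_W_UM = 1.2          # 6-transistor cell width, from the slide
CELL_H_UM = 2.1          # 6-transistor cell height, from the slide
PAD_PITCH_UM = 225.0     # area-array pad pitch, from the slide
SRAM_BYTES = 16 * 1024   # "16 KBytes" macro, from the slide

cell_area_um2 = CELL_W_UM * CELL_H_UM              # about 2.5 um^2
pad_site_area_um2 = PAD_PITCH_UM ** 2              # assumed square site, about 50K um^2
bit_count = SRAM_BYTES * 8
raw_bitcell_area_um2 = bit_count * cell_area_um2   # bit cells only, no periphery (assumption)
pad_sites_covered = raw_bitcell_area_um2 / pad_site_area_um2

print(f"cell area          : {cell_area_um2:.2f} um^2")
print(f"pad site area      : {pad_site_area_um2:.0f} um^2")
print(f"raw bit-cell area  : {raw_bitcell_area_um2:.0f} um^2")
print(f"pad sites covered  : {pad_sites_covered:.1f}")
```

The first two results agree with the 2.5 µm² and 50K µm² labels on the slides; the last two depend entirely on the stated assumptions.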
10. Word Line Shielding
[Figure: word line shielding. Labels: Signals (×3); VDD; VSS.]
11. Rationale
- Effective Area
- Actual Footprint + Routing Disturbance
- Larger, More Porous Layout → Faster
- Bigger Transistors
- More Space around Bit Lines
- Shielding
- SoC
- Complex Microarchitecture
- Many Small SRAMs
12. Circuit Design
- Building Blocks
- Latch Array
- Malleable, Porous, Multi-Port SRAM
- Dynamic Wire-OR Gate
- High Fan-in, Safe, CAD Compatible
- Power Dissipation
- Double Edge Flipflop
- ~50% Clock Tree → ~30% Peak Chip-Wide
- Interpolation Cells
13. Latch Array
[Figure: latch array schematic. Labels: Write Data; Write Address; Write Enable; Read Address; Read Data; Latch; Tristate Driver; Combinatorial Read Path; May Buffer during Place & Route; Test Mode.]
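The signals named on this slide suggest a simple behavior: a write port gated by Write Enable and a purely combinatorial read port. A minimal behavioral sketch follows, assuming one write port and one read port; the class name `LatchArray`, its method names and the sizes used are hypothetical, and Test Mode is not modelled.

```python
class LatchArray:
    """Behavioral sketch of a small latch-based memory with one write and one read port."""

    def __init__(self, depth: int, width: int) -> None:
        self.width = width
        self.rows = [0] * depth  # one word of latches per row

    def write(self, write_enable: bool, write_address: int, write_data: int) -> None:
        # Only the addressed row of latches is opened, and only when write is enabled.
        if write_enable:
            self.rows[write_address] = write_data & ((1 << self.width) - 1)

    def read(self, read_address: int) -> int:
        # Combinatorial read: the addressed row drives the shared read-data bus
        # (modelled here as a plain lookup in place of tristate drivers).
        return self.rows[read_address]


regs = LatchArray(depth=8, width=16)
regs.write(True, 3, 0xBEEF)
regs.write(False, 3, 0x1234)   # ignored: write enable is low
print(hex(regs.read(3)))
```

Because the read is a lookup with no clocked element, read data follows the read address directly, which is the property the "Combinatorial Read Path" label points at.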
14. Dynamic Wire-OR Gate
- Highest Leverage
- Dynamic vs. Static
- Safe, CAD Compatible
- Limit Wire Length using Timing Driven Placement
- No Dynamic Inputs
[Figure: dynamic wire-OR gate schematic. Labels: Driver Cell; Receiver Cell; Keeper; Output; Clock (×3); Input D1; Input DN; Sized for 1; Sized for Max-Fanout; Sized for Max. Length; Limit Max. N by Max-Fanout Spec.; Max. Length by Max-Load, Max-Transition Spec.]
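The slide ties the safety of the gate to two library limits: the number of driver cells (max-fanout) and the wire length (max-load, max-transition). The sketch below shows how such a rule check and the logical behavior might look. The numeric limits, the names `WireOrSpec`, `check_wire_or` and `evaluate_wire_or`, and the precharge-high polarity are all assumptions; the keeper is not modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WireOrSpec:
    max_fanout: int        # hypothetical limit on driver cells per wire
    max_length_um: float   # hypothetical limit on wire length


def check_wire_or(num_drivers: int, wire_length_um: float, spec: WireOrSpec) -> List[str]:
    """Return a list of rule violations for one wire-OR net (empty if clean)."""
    violations = []
    if num_drivers > spec.max_fanout:
        violations.append("too many driver cells on the wire")
    if wire_length_um > spec.max_length_um:
        violations.append("wire longer than the receiver is sized for")
    return violations


def evaluate_wire_or(clock: bool, inputs: List[bool]) -> bool:
    """One clock phase of the gate; returns the receiver output.

    Assumed polarity: clock low precharges the wire high, clock high lets any
    asserted input pull it low, and the receiver inverts the wire.
    """
    if not clock:
        wire = True                 # precharge phase
    else:
        wire = not any(inputs)      # evaluate phase: any driver discharges the wire
    return not wire                 # equals OR of the inputs while evaluating


spec = WireOrSpec(max_fanout=16, max_length_um=500.0)
print(check_wire_or(12, 400.0, spec))
print(evaluate_wire_or(True, [False, True, False]))
```

Expressing the limits as ordinary fanout and length checks is what would let a timing-driven placement flow enforce them without special handling.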
15. Double-Edge Flipflop
- Low Power
- Clock ½ Frequency
- Light Clock Load
- 2 Large + 4 Small
- Small, Fast
- 15P + 15N Transistors
- Safe, Flexible
- Fully Static
- Supports Scan
[Figure: flipflop schematic. Labels: D; Q; Ck; Switching Nodes with Constant 1 Data.]
B. Nikolic et al., "Sense Amplifier-Based Flip-Flop," ISSCC 1999.
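The "Clock ½ Frequency" point follows from capturing data on both clock edges. A minimal behavioral sketch, assuming an idealized flipflop with no setup or hold constraints, is shown below; `DoubleEdgeFlipFlop` and `count_clock_transitions` are hypothetical names.

```python
class DoubleEdgeFlipFlop:
    """Behavioral sketch: Q is updated from D on every clock transition."""

    def __init__(self) -> None:
        self.q = 0
        self._last_ck = 0

    def step(self, ck: int, d: int) -> int:
        if ck != self._last_ck:   # rising or falling edge
            self.q = d
        self._last_ck = ck
        return self.q


def count_clock_transitions(num_data_items: int, double_edge: bool) -> int:
    """Clock transitions needed to capture a stream of data items."""
    # A single-edge flop uses one of every two transitions; a double-edge flop uses both.
    return num_data_items if double_edge else 2 * num_data_items


ff = DoubleEdgeFlipFlop()
data = [1, 0, 1, 1]
ck = 0
captured = []
for d in data:
    ck ^= 1                       # one clock transition per data item
    captured.append(ff.step(ck, d))
print(captured)
print(count_clock_transitions(4, double_edge=True),
      count_clock_transitions(4, double_edge=False))
```

Half as many clock transitions for the same data rate is where the clock tree power saving comes from, since each transition charges or discharges the clock network.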
16. Interpolation Cells
[Figure: cell drive-strength variants. Labels: 1X Cell; 2X Cell; 4X Cell; Full Power; 5/6 Power; 2/3 Power; Same Footprint, Shorter Transistors; For Post Route In-Place Optimization.]
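One possible reading of this slide is that each footprint (1X, 2X, 4X) is also offered with shorter transistors at 5/6 and 2/3 of full power, so a post-route optimizer can swap variants without moving the cell or its wiring. The sketch below assumes that reading, and further assumes that drive strength scales with the power fraction; `VARIANTS` and `pick_variant` are hypothetical names.

```python
from fractions import Fraction
from typing import Optional, Tuple

# Power variants named on the slide; treating them as drive-strength scale
# factors available for every footprint is an assumption.
VARIANTS = [Fraction(2, 3), Fraction(5, 6), Fraction(1)]


def pick_variant(footprint: int, required_drive: Fraction) -> Optional[Tuple[int, Fraction]]:
    """Smallest variant of the given footprint (1, 2 or 4) whose assumed drive
    strength meets required_drive, or None if even full power is too weak."""
    for scale in VARIANTS:                      # ascending order
        if footprint * scale >= required_drive:
            return footprint, scale
    return None


print(pick_variant(2, Fraction(3, 2)))   # needs 1.5X of drive inside a 2X footprint
print(pick_variant(4, Fraction(5)))      # more than a 4X footprint can supply
```

Keeping the footprint fixed means the swap leaves the placement and extracted parasitics untouched, which is what makes it usable as an in-place step after routing.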
17. Analysis
- Signal Integrity
- Parasitics Accurate By Construction
- Uniform Metal Density
- Majority Coupling to Power Rails (Shielding)
- Speed Yield
- Balanced with Resources
- Area, Power, Design Time
- Goal: Adequate Confidence
18. Uniform Metal Density
[Figure: Post Route Metal Usage. Labels: Algorithmically Generated Filled Metal; Uniform Density on all Layers (except Local Interconnect).]
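The slide does not describe the fill algorithm itself. As a minimal sketch of the general idea, assuming the layer is divided into tiles and fill is added until every tile reaches a common target density, one could write the following; the tile grid, the 0.50 target and the function name `fill_to_uniform_density` are hypothetical.

```python
from typing import List


def fill_to_uniform_density(density: List[List[float]], target: float) -> List[List[float]]:
    """Return the amount of fill metal to add per tile so each tile reaches
    `target` density (tiles already at or above target get no fill)."""
    return [[max(0.0, target - d) for d in row] for row in density]


# Post-route metal density per tile on one layer (made-up numbers).
routed = [[0.10, 0.45],
          [0.30, 0.60]]
fill = fill_to_uniform_density(routed, target=0.50)
final = [[r + f for r, f in zip(routed_row, fill_row)]
         for routed_row, fill_row in zip(routed, fill)]
print(fill)
print(final)
```

A real flow would also have to place the fill shapes legally and decide what they connect to; the sketch only covers the density bookkeeping.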
19. Advantages
- Design
- Accurate Estimation
- Capacitance has Low Variance
- Known Coupling
- ~50% to Adjacent Power Line
- Quick Feedback
- Interconnect-Only Extraction is Accurate
- Manufacturing
- Uniform Etch Resist Loading
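Why coupling to a power line lowers capacitance variance can be illustrated with a common first-order model that is not taken from the slides: a neighbouring signal wire contributes between 0 and 2 times its coupling capacitance depending on whether it switches with or against the wire, while a quiet power line always contributes exactly 1 times. The function name `effective_cap_range` and the capacitance values are hypothetical.

```python
from typing import Tuple


def effective_cap_range(c_ground: float, c_couple: float,
                        power_fraction: float) -> Tuple[float, float]:
    """Min and max effective capacitance of a wire.

    c_couple is the total sidewall coupling capacitance; power_fraction of it
    goes to quiet power lines, the rest to signal neighbours that may switch
    with the wire (factor 0) or against it (factor 2).
    """
    c_power = power_fraction * c_couple
    c_signal = (1.0 - power_fraction) * c_couple
    c_min = c_ground + c_power + 0.0 * c_signal
    c_max = c_ground + c_power + 2.0 * c_signal
    return c_min, c_max


for fraction in (0.0, 0.5, 1.0):
    lo, hi = effective_cap_range(c_ground=1.0, c_couple=1.0, power_fraction=fraction)
    print(f"power fraction {fraction:.1f}: {lo:.2f} to {hi:.2f}")
```

In this model the spread between best and worst case shrinks in direct proportion to the share of coupling that goes to power lines, which is one way to read the "Known Coupling" and "Low Variance" bullets.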
20. Asymmetric Rise-Fall Delays
[Figure: effect of asymmetric rise and fall delays. Labels: Delay; Duty Cycle; Slow (×2); Same Size P Transistors; Same Size N Transistors; Shrink; Elongates.]
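The "Shrink" and "Elongates" labels can be illustrated with simple edge arithmetic. The sketch below assumes a chain of identical non-inverting buffers, each with a fixed rise delay and fall delay; the function name `pulse_width_after_chain` and the picosecond values are hypothetical.

```python
def pulse_width_after_chain(width_ps: float, rise_delay_ps: float,
                            fall_delay_ps: float, stages: int) -> float:
    """High-pulse width after `stages` identical non-inverting buffers."""
    # Per stage the rising edge arrives rise_delay late and the falling edge
    # fall_delay late, so the high pulse changes by (fall - rise).
    return width_ps + stages * (fall_delay_ps - rise_delay_ps)


print(pulse_width_after_chain(500.0, 60.0, 40.0, 5))  # slow rise: pulse shrinks
print(pulse_width_after_chain(500.0, 40.0, 60.0, 5))  # slow fall: pulse elongates
print(pulse_width_after_chain(500.0, 50.0, 50.0, 5))  # symmetric: width preserved
```

The error accumulates with every stage, which is why pulse-sensitive paths such as clocks and write pulses are the natural candidates for symmetric cells.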
21. Pros and Cons
- Advantages
- More Compact Cells, Faster Circuits
- Disadvantages
- Need Careful Analysis, Greater Margin
- Strategy
- Main Library
- Asymmetric, No Wasted Space
- Symmetric Subset
- Gated Clocks, Write Pulse Buffering, ...
22. Speed Yield Management
[Figure: process variation plotted over N and P transistor speed. Axis labels: Slow N to Fast N; Slow P Transistors to Fast P. Regions: Correct Operation; Setup Time Failures; Hold Time Failures. Other labels: Process Center; Mature Process Variation; Maximum Process Variation; Four Corner Analysis; Target Design and Characterize Library Here; Possibly Impossible to Meet Performance Goal, or Needlessly High Effort.]
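A four corner analysis of the kind named on this slide can be sketched as a setup and hold check repeated at each combination of slow and fast N and P transistors. The scale factors, the averaging of N and P factors, the function name `check_path` and all timing numbers below are hypothetical; real flows use characterized corner libraries.

```python
from itertools import product

# Hypothetical delay scale factors for slow and fast transistors.
SCALE = {"slow": 1.25, "fast": 0.8}


def check_path(max_delay_ps, min_delay_ps, period_ps, setup_ps, hold_ps):
    """Check one register-to-register path at the four N/P corners.

    Path delay is scaled by the average of the N and P factors, a deliberately
    crude assumption standing in for real corner libraries.
    """
    results = {}
    for n_corner, p_corner in product(SCALE, repeat=2):
        k = (SCALE[n_corner] + SCALE[p_corner]) / 2.0
        setup_ok = k * max_delay_ps + setup_ps <= period_ps   # slowest path must arrive in time
        hold_ok = k * min_delay_ps >= hold_ps                 # fastest path must not arrive too early
        results[(n_corner, p_corner)] = (setup_ok, hold_ok)
    return results


report = check_path(max_delay_ps=800.0, min_delay_ps=60.0,
                    period_ps=1000.0, setup_ps=50.0, hold_ps=50.0)
for corner, (setup_ok, hold_ok) in report.items():
    print(corner, "setup ok" if setup_ok else "SETUP FAIL",
          "hold ok" if hold_ok else "HOLD FAIL")
```

With these made-up numbers the slow/slow corner fails setup and the fast/fast corner fails hold, mirroring the two failure regions in the figure; tightening the assumed variation is what moves both corners back into the correct-operation region.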
23. Conclusions
- Precision Physical Design
- Global
- Power Grid
- Macro Routing Porosity
- Methodical
- Signal Integrity
- Parasitic Extraction
- Timing Uncertainties (Coupling)
- Confident
- Correctness and Speed