Title: IBM Microprocessor Global Clock Directions
1. IBM Microprocessor Global Clock Directions
Phillip Restle, IBM T. J. Watson Research Center, Yorktown Heights, NY
restle_at_us.ibm.com
Some of the animations shown are available through http://www.research.ibm.com/people/r/restle
Interconnect Focus Center (IFC) Quarterly Workshop: Optical and Electrical High-Speed Digital Clocking
Saturday, December 7, 2002, Stanford University, Packard Building, Room 101
2. Outline
- Goals and Trends
- Interconnect Design and Modeling
- Past clock strategies and issues
  - Trees
  - Grids
  - Grid-Trees
- Future
  - Major concerns with present methods
  - Resonant?
  - Optical?
3. Clock Distribution Goal Definitions
- Generate and distribute clock signal(s) to every latch and clocked dynamic gate
  - (Most chips are still quite synchronous)
- Simultaneous everywhere (no skew)
- Periodic everywhere (no jitter)
  - Note: David Harris' skew = my (skew + jitter)
- Use minimum power and resources (wire tracks, cost)
- Impervious to ACLV (across-chip linewidth variation) and Vdd noise
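A minimal sketch of the two definitions above, assuming skew is the spread of arrival times of one clock edge across latches and jitter is the worst deviation of a cycle's length from the nominal period at a single latch. The function names and all numbers are illustrative and do not come from the talk.

```python
def skew(arrival_times_ps):
    """Spread of arrival times of the same clock edge across sink points."""
    return max(arrival_times_ps) - min(arrival_times_ps)


def period_jitter(edge_times_ps, nominal_period_ps):
    """Worst deviation of any observed cycle length from the nominal period."""
    periods = [b - a for a, b in zip(edge_times_ps, edge_times_ps[1:])]
    return max(abs(p - nominal_period_ps) for p in periods)


# Hypothetical arrival times (ps) of one edge at four latches.
arrivals = [100, 112, 105, 98]
# Hypothetical rising-edge times (ps) at one latch, nominal period 1000 ps.
edges = [0, 1000, 1990, 3000]

print(skew(arrivals))              # 14
print(period_jitter(edges, 1000))  # 10
```

A zero result from both functions would correspond to the "simultaneous everywhere" and "periodic everywhere" goals.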
4. Does Jitter Matter?
- Jitter (cycle compression) from a 4-cycle clock distribution is large, but may NOT impact performance of on-chip paths
  - Assumes clock path and data paths track similarly with Vdd (O.K.)
  - Assumes Vdd averaged over clock latency time does not vary across chip (not O.K.!)
- Vdd variations create skew
5. Does Skew Matter?
- Corner-to-corner skew across chip is now irrelevant
- Early-mode (short-path) skew
  - distance < 10% of cycle
- Late-mode (long-path) skew
  - distance < 100% of cycle
(Figure: chip size, with early-mode skew box and late-mode skew box)
6. Worst-case long-path scenario
- Launching clock late
- Logic slow, done in low-Vdd region
- Receiving clock early
- Long path fails
(Figure labels: Clk, late Clk, early Clk, Logic, High Vdd, Low Vdd)
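The worst-case scenario above can be sketched as a simple late-mode (setup) check. This assumes a simplified model in which latch setup time and clock-to-output delay are ignored; the function name and every number are hypothetical.

```python
def late_mode_slack_ps(cycle_ps, launch_clk_ps, capture_clk_ps, logic_delay_ps):
    """Slack of a long path; a negative value means the path fails.

    launch_clk_ps and capture_clk_ps are clock arrival offsets relative to
    the ideal edge (positive = late, negative = early).
    """
    return cycle_ps + (capture_clk_ps - launch_clk_ps) - logic_delay_ps


# Hypothetical nominal case: 1000 ps cycle, 900 ps of logic, ideal clocks.
nominal = late_mode_slack_ps(1000, 0, 0, 900)

# Hypothetical worst case: launching clock 40 ps late, receiving clock
# 40 ps early, and logic slowed to 950 ps in a low-Vdd region.
worst = late_mode_slack_ps(1000, 40, -40, 950)

print(nominal, worst)  # 100 -30
```

All three effects erode the same margin, which is why they are worst when they occur together.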
7. Basic structure of global clock wires: coplanar transmission lines (fully shielded)
(Figure labels: Gnd, Vdd, Clock, Shield, 1 µm Cu, orthogonal wiring)
8. Current and Voltage Visualizations
- Use Z axis for voltage
- Use diameter for current (color for sign)
(Figure axes: X, Y, Volts (Vdd); label: I)
9. Frames from animation
- 1.6 µm wide, 1 cm long wire in good ground mesh
(Animation frames at 50 ps and 100 ps)
10. Return Path Discontinuity
1 cm long, 2.4 µm wide (wafer)
Need PEEC (partial element equivalent circuit) to analyze this complex 3D case
(Figure label: Source)
11. Discontinuous Power Grid (with L)
- Return current must detour around cut
(Animation frame; axes X, Y, V)
12. Discontinuous Power Grid (no L)
- RC-only model shows no such effects
- Current spreads out much more uniformly
(Animation frame; axes X, Y, V)
13. 1 very wide 18 µm clock wire
- Overshoot at far end of wide wire
(Animation frame; axes X, Y, V; labels: V overshoot, 5 mm)
14. 8 wires, 18 µm total width
- Splitting wire into 8 parallel fingers eliminates overshoot at far end of wire
- Capacitance is increased
(Animation frame; axes X, Y, V; label: No V overshoot)
15. 5 mm grid at 500 MHz (64 pF load)
- Grid, with drivers at all 4 edges, switches nicely at 0.5 GHz
(Animation frame; axes X, Y, V)
16. At 4 GHz it doesn't work
- More skew (center of grid 90° out of phase)
(Animation frame; axes X, Y, V)
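To put the phase figure in time units, here is a small conversion sketch. It reads the "90 out of phase" figure as 90 degrees; the function names are illustrative. The second call holds the delay fixed at a lower frequency purely for comparison, which is an assumption and not a claim about how the grid actually behaves.

```python
def phase_to_time_ps(phase_deg, freq_ghz):
    """Convert a phase offset at a given clock frequency into picoseconds."""
    period_ps = 1000.0 / freq_ghz
    return phase_deg / 360.0 * period_ps


def time_to_phase_deg(delay_ps, freq_ghz):
    """Convert a delay in picoseconds into degrees of phase at a frequency."""
    return delay_ps * freq_ghz / 1000.0 * 360.0


# 90 degrees at 4 GHz is a quarter of a 250 ps cycle.
delay = phase_to_time_ps(90, 4.0)
print(delay)  # 62.5

# Assumption for illustration only: the same delay at 0.5 GHz.
print(time_to_phase_deg(delay, 0.5))  # 11.25
```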
17. Tree network driving non-uniform loads
(Animation frame; axes X, Y, V)
18. Tree Tuning
(Animation frame; labels: Delay, Delay)
19. 4 Trees Driving One Grid
- This "grid-tree" topology is less susceptible to inductance problems
- Better local skew
- This is untuned, still looks O.K., can be tuned for very small skew
(Animation frame; axes X, Y, V)
20. Grid driven at 2 edges at 5 GHz
- Drivers not simultaneous
- 40 ps skew applied at drivers
- Strange waveforms
(Animation frame)
21. ACLV and Power Supply Noise
- Multi-cycle clock distribution amplifies ACLV effects
  - 5% ACLV x 4 cycles = skew of 20% of cycle
- Vdd not constant across chip
  - Similar effect to ACLV, but it can change every cycle!
- If skew varies smoothly across chip, this skew may not matter
- Getting worse, but details unclear
22. Grid-Tree with 40 ps Driver Skew
- Skew "smoothed"
- Waveforms determined by interconnects
  - (not FETs)
(Animation frame)
23. Power4 Grid-Tree (smoothing skew from process variations)
(Figure: Delay (ps) over X and Y; axis ticks 100 and 900; label: excess delay)
24. Using Inductance
- Inductance sharpens transitions
- Inductors also store energy (in B field)
- Capacitors store energy (in E field)
- Resonate!
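A minimal sketch of the resonance idea, assuming an ideal lumped LC tank with resonant frequency f = 1 / (2π√(LC)). The example values (64 pF and 4 GHz) are borrowed from other slides only for illustration; the talk does not pair them, and a real clock network is distributed and lossy.

```python
import math


def resonant_frequency_hz(inductance_h, capacitance_f):
    """Resonant frequency of an ideal lumped LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def inductance_for_resonance_h(capacitance_f, freq_hz):
    """Inductance that resonates with a given capacitance at freq_hz."""
    return 1.0 / ((2.0 * math.pi * freq_hz) ** 2 * capacitance_f)


# Illustrative assumption: a 64 pF clock load resonated at 4 GHz.
load_f = 64e-12
target_hz = 4e9
needed_h = inductance_for_resonance_h(load_f, target_hz)

print(f"L = {needed_h * 1e12:.1f} pH")
print(f"check: f = {resonant_frequency_hz(needed_h, load_f) / 1e9:.3f} GHz")
```

In this idealized model the energy moves back and forth between the inductor and the capacitor each cycle instead of being dissipated.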
25. Beyond the Hour Glass
- The first clocks were
  - large
  - inaccurate
  - power-hungry
  - (lossy)
- On-chip clock distribution is still using this method
26. Beyond the Hour Glass
- Oscillator!
- Pendulum
  - kinetic and potential
- Chip L and C
  - (magnetic and electric)
27. Spiral Inductor
(Figure labels: 150 µm, 7 µm)
28. Spiral Inductors
- Steady-state animation of spiral inductor
  - (LC circuit)
(Animation frame)
29. Distributed/Coupled Spiral Inductors
30. Distributed/Coupled Spiral Inductors
- Strongly coupled spiral inductors
- Could reduce jitter, skew, and power
(Animation frame)
31. Rotary Clock Scheme
- Make a "Moebius-like" differential transmission line
J. Wood et al., ISSCC '01
32. Rotary Clock Scheme
- Add restoring circuit, loads, inductance, and stir
- Node travels around loop, recycling energy
33. Rotary Clock Scheme
(Animation frame)
34. Standing Waves
- By permission from Frank O'Mahony
- single SWO (standing-wave oscillator)
35. Standing Waves
- By permission from Frank O'Mahony
- single SWO
(Animation frame)
36. Standing Waves
(Animation frame)
37. Standing Waves
(Animation frame)
38. Summary of Present Art
- On-chip transmission lines support EM waves at 1/2 the speed of light for 5 mm each
- Buffers needed to maintain edge rate for longer distances
- Buffers also needed to provide gain of 10^5 from PLL to mesh
- ACLV is annoying, but solutions exist for constant or slowly varying skew sources (like temperature)
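Two back-of-envelope numbers follow from the summary above; this is a rough sketch under stated assumptions. The flight time uses the quoted 1/2 speed of light over 5 mm. The stage count reads the PLL-to-mesh gain figure as 10^5 and assumes a gain of 4 per buffer stage, a common rule of thumb that is not taken from the talk. Function names are illustrative.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def flight_time_ps(distance_mm, velocity_fraction=0.5):
    """Time for a wave to cover distance_mm at a fraction of light speed."""
    velocity = velocity_fraction * SPEED_OF_LIGHT_M_PER_S
    return distance_mm * 1e-3 / velocity * 1e12


def stages_for_gain(total_gain, gain_per_stage):
    """Smallest number of cascaded stages reaching at least total_gain."""
    return math.ceil(math.log(total_gain) / math.log(gain_per_stage))


print(f"{flight_time_ps(5):.1f} ps")  # 33.4 ps
print(stages_for_gain(1e5, 4))        # 9  (gain of 4 per stage is assumed)
```

Each added stage contributes delay that moves with Vdd, which ties this count to the supply-noise concern raised later in the deck.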
39. Major Concerns
- How do we
  - get clock distribution and gain
  - without delay variations from power supply noise?
- Solve this, or hit brick wall
- Power is important: most techniques that reduce average chip power create even more power supply noise
40. Resonant Techniques?
- Need clock to run at various frequencies
  - Some wafer-level test capability (low-freq. and variable-freq. operation)
  - Must be tunable for sorting and line tuning (30% range)
- Minor issues with start-up, clock gating, sleep modes, etc.
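A rough sketch of what a tuning range costs in an ideal LC resonator, where f = 1 / (2π√(LC)), so the LC product must change by the square of the frequency ratio. Reading the tuning requirement as f_max / f_min = 1.3 is an assumption; the slide does not say how the range is defined, and the function name is illustrative.

```python
def lc_ratio_for_frequency_ratio(freq_ratio):
    """Ratio (max / min) of the LC product needed to span f_max / f_min.

    Follows from f being proportional to 1 / sqrt(L * C).
    """
    return freq_ratio ** 2


# Assumption: a 30% range interpreted as f_max / f_min = 1.3.
ratio = lc_ratio_for_frequency_ratio(1.3)
print(f"LC product must vary by about {ratio:.2f}x")  # about 1.69x
```

With a fixed inductor, that whole factor has to come from switchable or variable capacitance.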
41. Optical Techniques?
- Gain: optical power must be very large to help with "quiet gain" problem
- Cost: chip, package, and laser
- Test: wafer test, temporary module test, burn-in
42. Final Thought
- Do we really want the same clock signal everywhere?
- We really want each core or unit to vary clock frequency asynchronously with local Vdd and/or workload
  - Allows greatly reduced timing margins: run each part as fast as it can run, at every moment
  - Asynchronous overhead only between clock islands