Title: Prediction of High-Performance On-Chip Global Interconnection
1. Prediction of High-Performance On-Chip Global Interconnection
- Yulei Zhang (1), Xiang Hu (1), Alina Deutsch (2), A. Ege Engin (3), James F. Buckwalter (1), and Chung-Kuan Cheng (1)
- (1) Dept. of ECE, UC San Diego, La Jolla, CA
- (2) IBM T. J. Watson Research Center, Yorktown Heights, NY
- (3) Dept. of ECE, San Diego State Univ., San Diego, CA
2. Outline
- Introduction
- Technology trend
- Current approaches
- On-Chip Global Interconnection
- Overview: structures, tradeoffs
- Interconnect schemes
- Global wire modeling
- Performance analysis
- Design Methodologies for T-line schemes
- Prediction of Performance Metrics
- Experimental settings
- Performance metrics comparison and scaling trend
- Latency
- Energy per bit
- Throughput
- Signal Integrity
- Conclusion
3. Introduction: Performance Impact
- Interconnect delay determines the system performance [ITRS08]
- 542ps for 1mm minimum-pitch Cu global wire w/o repeater @ 45nm
- 150ps for 10-level FO4 delay @ 45nm
Figure source: [Ho2001], Future of Wire
4. Introduction: Power Dissipation
- Interconnects consume a significant portion of power
- 1-2 orders of magnitude larger than gates
- Half of the dynamic power is dissipated in repeaters to minimize latency [Zhang07]
- Wires consume 50% of total dynamic power for a 0.13um microprocessor [Magen04]
- About 1/3 burned on the global wires.
5. Introduction: Different Approaches and Our Contributions
- Different Approaches
- Repeater Insertion Approach
- Pros: High throughput density.
- Cons: Overhead in terms of power consumption and wiring complexity.
- T-line Approach [Zhang09]
- Pros: Low latency.
- Cons: Low throughput density due to low bandwidth and large wire dimension.
- Equalized T-line Approach [Zhang08]
- Pros: Low power, low noise, higher throughput than single-ended.
- Cons: The area overhead brought by passive components.
- We explore different global interconnection structures and compare their performance metrics across multiple technology nodes.
- Contributions
- A simple linear model
- A general design framework
- A complete prediction and comparison
6. Organization of On-Chip Global Interconnections
7. Multi-Dimensional Design Consideration
- Preliminary analysis results assuming 65nm CMOS process.
- Application-oriented choice
- Low Latency
- T-TL or UT-TL -> Single-Ended T-lines
- High Throughput
- R-RC
- Low Power
- PE-TL or UE-TL -> Differential T-lines
- Low Noise
- PE-TL or UE-TL -> Differential T-lines
- Low Area/Cost
- R-RC
For each architecture, the more area the pentagon covers, the better the overall performance.
8. On-Chip Global Interconnect Schemes (1)
- R-RC structure
- Repeater size/Length of segments
- Adopt previous design methodology [Zhang07]
- UT-TL structure
- Full swing at wire-end
- Tapered inverter chain as TX
- T-TL structure
- Optimize eye-height at wire-end
- Non-Tapered inverter chain as TX
Repeated RC wires (R-RC)
Un-Terminated and Terminated T-Line (UT-TL and T-TL)
9. On-Chip Global Interconnect Schemes (2)
Un-Equalized and Passive-Equalized T-Line (UE-TL and PE-TL)
- Driver side: Tapered differential driver
- Receiver side: Termination resistance, Sense-Amplifier (SA) and inverter chain
- Passive equalizer: parallel RC network
- Design constraint: enough eye-opening (50mV) needed at the wire-end
10. Global Wire Modeling: Single-Ended and Differential On-Chip T-lines
- Orthogonal layers replaced by ground planes -> 2D cap extraction, accurate when loading density is high.
- Top-layer thick wires used -> dimension is maintained as technology scales.
- LC-mode behavior dominant
Determine the bit rate
- Smallest wire dimensions that satisfy eye constraint
- Notice PE-TL needs narrower wire -> Equalization helps to increase density.
11. Global Wire Modeling: RC wires and T-lines
- RC wire modeling
- Distributed π model composed of wire resistance and capacitance
- Closed-form equations [Sim03] to calculate 2D wire capacitance
- T-line 2D-R(f)L(f)C parameter extraction
- T-line Modeling
- R(f)L(f)C Tabular model -> Transient simulation to estimate eye-height.
- Synthesized compact circuit model [Kopcsay02] -> Study signal integrity issue.
2D-C Extraction Template
2D-R(f)L(f) Extraction Template
12. Performance Analysis: Definitions
- Normalized delay (unit: ps/mm)
- Propagation delay includes wire delay and gate delay.
- Normalized energy per bit (unit: pJ/m)
- Bit rate is assumed to be the inverse of propagation delay for RC wires
- Normalized throughput (unit: Gbps/um)
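The three normalized metrics can be sketched as simple ratios. The formulas themselves did not survive on this slide, so the definitions below are assumptions inferred from the stated units (delay per length, energy per bit per length, bit rate per wire pitch); all function and parameter names are illustrative.

```python
def normalized_delay_ps_per_mm(delay_ps, length_mm):
    """Propagation delay (wire + gate) divided by wire length."""
    return delay_ps / length_mm


def rc_bit_rate_gbps(delay_ps):
    """Bit rate taken as the inverse of propagation delay (the RC-wire assumption)."""
    return 1e3 / delay_ps  # 1 / ps = 1000 Gbps


def normalized_energy_pj_per_m(power_mw, bit_rate_gbps, length_mm):
    """Energy per bit (power / bit rate) divided by wire length in meters."""
    energy_pj_per_bit = power_mw / bit_rate_gbps  # mW / Gbps = pJ per bit
    return energy_pj_per_bit / (length_mm * 1e-3)


def normalized_throughput_gbps_per_um(bit_rate_gbps, wire_pitch_um):
    """Bit rate divided by the wire pitch it occupies."""
    return bit_rate_gbps / wire_pitch_um


if __name__ == "__main__":
    # Hypothetical 5mm link with 275ps total delay.
    print(normalized_delay_ps_per_mm(275.0, 5.0), "ps/mm")
    print(rc_bit_rate_gbps(250.0), "Gbps")
```

For example, a hypothetical 5mm link with 275ps end-to-end delay has a normalized delay of 55ps/mm.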
13. Performance Analysis: Latency
- Variables: technology-defined parameters
- Supply voltage: Vdd (unit: V)
- Dielectric constant
- Min-sized inverter FO4 delay (unit: ps)
- R-RC structure (min-d)
- is roughly constant
- FO4 delay scales w/ scaling factor S
- T-line structures
- Sum of wire delay and TX delay
- Wire delay
- TX delay improved w/ FO4 delay
R-RC structure: delay increasing w/ technology scaling!
T-line structures: delay decreasing w/ technology scaling!
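The wire-delay term for T-lines can be illustrated with the textbook time-of-flight bound for an LC-mode line, sqrt(eps_r)/c0. This is a minimal sketch of that standard relation, not the slide's own formula (which was lost); the value eps_r = 2.25 is a hypothetical choice that happens to land near the 5ps/mm speed-of-light figure quoted in the delay results.

```python
import math

C0_MM_PER_PS = 0.299792458  # speed of light in vacuum, in mm per ps


def time_of_flight_ps_per_mm(eps_r):
    """Lower bound on LC-mode wire delay per mm for relative permittivity eps_r."""
    return math.sqrt(eps_r) / C0_MM_PER_PS


if __name__ == "__main__":
    for eps_r in (1.0, 2.25, 3.9):  # vacuum, hypothetical low-k, SiO2-like
        print(f"eps_r={eps_r}: {time_of_flight_ps_per_mm(eps_r):.2f} ps/mm")
```

Because this term depends only on the dielectric constant, it does not shrink with FO4 delay; only the TX delay term improves with device scaling.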
14. Performance Analysis: Energy per Bit
- Same variables defined before
Constant !
- R-RC structure (min-d)
- Vdd reduces as technology scales
- reduces as technology scales
- T-line structures
- Sum of power consumed on wire and TX.
- Power of T-line
- Power of TX circuit
- FO4 delay reduces exponentially
R-RC structure: energy decreases w/ technology scaling!
T-line structures: energy decreases w/ larger slope!!
15. Performance Analysis: Throughput
- Same variables defined before
- R-RC structure (min-d)
- Assuming wire pitch
- FO4 delay reduces exponentially
- T-line structures
- TX bandwidth
- Neglect the minor change of wire pitch
- K1 0, for UT-TL
- FO4 delay reduces exponentially
R-RC structure: throughput increases by 20% per generation!
T-line structures: throughput increases by 43% per generation!!
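One reading that reproduces both per-generation figures is sketched below. It assumes FO4 delay shrinks by a factor S = 0.7 per generation, that T-line throughput scales as 1/FO4, and that R-RC throughput scales as 1/sqrt(FO4). None of these three assumptions is stated in the surviving slide text; they are chosen here only because they match the quoted 43% and 20%.

```python
def per_generation_gain(fo4_scale, exponent):
    """Fractional throughput gain per generation if throughput ~ FO4**(-exponent)."""
    return fo4_scale ** (-exponent) - 1.0


def compound_gain(gain_per_generation, generations):
    """Total multiplicative gain after a number of generations."""
    return (1.0 + gain_per_generation) ** generations


if __name__ == "__main__":
    S = 0.7  # assumed FO4 scaling factor per generation
    t_line = per_generation_gain(S, 1.0)  # throughput ~ 1/FO4
    r_rc = per_generation_gain(S, 0.5)    # throughput ~ 1/sqrt(FO4)
    print(f"T-line: {100 * t_line:.1f}% per generation")
    print(f"R-RC:   {100 * r_rc:.1f}% per generation")
    # 90nm -> 65nm -> 45nm -> 32nm -> 22nm is four generations.
    print(f"T-line over 4 generations: {compound_gain(t_line, 4):.2f}x")
    print(f"R-RC over 4 generations:   {compound_gain(r_rc, 4):.2f}x")
```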
16. Design Framework for On-Chip T-line Schemes
- Proposed framework can be applied to design UT-TL/T-TL/UE-TL/PE-TL by changing wire configuration and circuit structure.
- Different optimization routines (LP/ILP/SQP, etc.) can be adopted according to the problem formulation.
17. Experimental Settings
- Design objective: min-d
- Technology nodes: 90nm-22nm
- Five different global interconnection structures
- Wire length: 5mm
- Parameter extraction
- 2D field solver CZ2D from EIP tool suite of IBM
- Tabular model or synthesized model
- Transistor models
- Predictive transistor model from [Uemura06]
- Synopsys level 3 MOSFET model tuned according to ITRS roadmap
- Simulation
- HSPICE 2005
- Modeling and Optimization
- Linear or non-linear regression/SQP routine
- MATLAB 2007
18. Performance Metric: Normalized Delay Results and Comparison
- Technology trends
- R-RC ↑
- T-line schemes ↓
- T-line structures
- Outperform R-RC beyond 90nm
- Single-ended: lowest delay
- At 22nm node
- R-RC: 55ps/mm
- T-lines: 8ps/mm (85% reduction)
- Speed of light: 5ps/mm
- Linear model
- < 6% average percent error
19. Performance Metric: Normalized Energy per Bit Results and Comparison
- Technology trends
- R-RC and T-lines ↓
- T-lines reduce more quickly
- T-line structures
- Outperform R-RC beyond 45nm
- Differential: lowest energy.
- Single-ended: similar to R-RC.
- T-TL > UT-TL
- At 22nm node
- R-RC: 100pJ/m
- Single-ended: 60% reduction
- Differential: 96% reduction
- Linear model
- < 12% average percent error
- Error for T-TL and PE-TL
- RL and passive equalizers.
20. Performance Metric: Normalized Throughput Results and Comparison
- Technology trends
- R-RC and T-lines ↑
- T-lines increase more quickly
- T-line structures
- Outperform R-RC beyond 32nm
- Differential better than single-ended
- At 22nm node
- R-RC: 12Gbps/um
- T-TL: 30% improvement
- UE-TL: 75% improvement
- PE-TL: 2X of R-RC
- Linear model
- < 7% average percent error
21. Signal Integrity: single-ended T-lines
Worst-case switching pattern for peak noise simulation
Using w.c. pattern
Using single or multiple PRBS patterns
- UT-TL structure
- 380mV peak noise at 1V supply voltage w/ 7ps rise time
- SI could be a big issue as supply voltage drops
- T-TL: less sensitive to noise
- At the same rise time, 50% reduction of peak noise
- Peak noise ? as technology scales
22. Signal Integrity: differential T-lines
Worst-case switching pattern for peak noise simulation
- More reliable
- Termination resistance
- Common-mode noise reduction
- Peak noise
- Within 10mV range
- Eye-Heights
- UE-TL
- Eye reduces as bit rate increases
- Harder to meet constraint.
- PE-TL
- > 70mV eye even at 22nm node
- Equalization does help!
23. Conclusion
- Compare five different global interconnections in terms of latency, energy per bit, throughput and signal integrity from 90nm to 22nm.
- A simple linear model provided to link
- Architecture-level performance metrics
- Technology-defined parameters
- Some observations from experimental results
- T-line structures have potential to replace R-RC at future nodes
- Differential T-lines are better than single-ended
- Low-power/High-throughput/Low-noise
- Equalization could be utilized for on-chip global interconnection
- Higher throughput density, improved signal integrity
- Even w/ lower energy dissipation (passive equalizations)
26. Introduction: Technology Trend
- On-Chip Interconnect Scaling
- Dimension shrinks
- Wire resistance increases -> RC delay
- Increasing capacitive coupling -> delay, power, noise, etc.
- Performance of global wires decreases w/ technology scaling.

Wire Category   Parameter      Technology Node
                               90nm     45nm     22nm
M1 Wire         Rw (kohm/mm)   1.914    8.860    34.827
M1 Wire         Cw (pF/mm)     0.183    0.157    0.129
Global Wire     Rw (kohm/mm)   0.532    2.970    11.000
Global Wire     Cw (pF/mm)     0.205    0.179    0.151

Scaling trend of PUL wire resistance and capacitance
Copper resistivity versus wire width
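To make the scaling trend concrete, the sketch below multiplies the per-unit-length resistance and capacitance from the table to get the lumped RC product of an unrepeated 1mm wire at each node. The table values are copied from the slide; the function name and the choice of the plain R*C product (rather than a distributed-line delay with a prefactor) are illustrative assumptions.

```python
# Per-unit-length values from the scaling table: (Rw in kohm/mm, Cw in pF/mm).
WIRES = {
    "M1": {"90nm": (1.914, 0.183), "45nm": (8.860, 0.157), "22nm": (34.827, 0.129)},
    "Global": {"90nm": (0.532, 0.205), "45nm": (2.970, 0.179), "22nm": (11.000, 0.151)},
}


def rc_product_ps(rw_kohm_per_mm, cw_pf_per_mm, length_mm=1.0):
    """Lumped R*C product of an unrepeated wire, in ps (kohm * pF = ns)."""
    r_total_kohm = rw_kohm_per_mm * length_mm
    c_total_pf = cw_pf_per_mm * length_mm
    return r_total_kohm * c_total_pf * 1e3


if __name__ == "__main__":
    for category, nodes in WIRES.items():
        for node, (rw, cw) in nodes.items():
            print(f"{category:7s} {node}: {rc_product_ps(rw, cw):8.1f} ps for 1mm")
```

For the 45nm global wire the product comes out at about 532ps for 1mm, close to the 542ps figure quoted in the introduction, and it grows by roughly 15x from 90nm to 22nm even though Cw falls slightly.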
27. Design methodology: single-ended T-lines
2D frequency-dependent tabular model
Inverter size, number of stages, Rload (if any)
Single-ended inverter chains
SPICE simulation
SPICE simulation to evaluate.
Optimization Routine: 1. Optimal cycle time 2. Sweep for optimal inverter chain
SPICE simulation to check in-plane crosstalk, etc.
28. Design methodology: differential T-lines
2D frequency-dependent tabular model
Wire width, driver impedance, RC equalizer (if any), termination resistance.
Differential lines, SA-based TX
Closed-form equation-based model
Evaluation based on models.
Optimization Routine: 1. Binary search for wire width 2. SQP for other var. optimization
SPICE simulation to check in-plane crosstalk, etc.
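The binary search over wire width in the optimization routine can be sketched as follows. This assumes the eye height is non-decreasing in wire width and uses the 50mV eye constraint from the scheme description; `toy_eye_height_mv` is a made-up stand-in for the closed-form model (in the real flow each evaluation would also optimize the remaining variables, e.g. via SQP).

```python
import math


def min_width_for_eye(eye_height_mv, lo_um, hi_um, eye_min_mv=50.0, tol_um=0.01):
    """Smallest width in [lo_um, hi_um] whose eye height meets eye_min_mv.

    Assumes eye_height_mv(width) is non-decreasing in width.
    Returns None if even the widest wire fails the constraint.
    """
    if eye_height_mv(hi_um) < eye_min_mv:
        return None
    if eye_height_mv(lo_um) >= eye_min_mv:
        return lo_um
    # Invariant: lo_um fails the constraint, hi_um satisfies it.
    while hi_um - lo_um > tol_um:
        mid_um = 0.5 * (lo_um + hi_um)
        if eye_height_mv(mid_um) >= eye_min_mv:
            hi_um = mid_um
        else:
            lo_um = mid_um
    return hi_um


def toy_eye_height_mv(width_um):
    """Hypothetical monotone eye-height model, for illustration only."""
    return 100.0 * (1.0 - math.exp(-width_um))


if __name__ == "__main__":
    width = min_width_for_eye(toy_eye_height_mv, lo_um=0.1, hi_um=5.0)
    print(f"smallest feasible width ~ {width:.3f} um")
```

Returning the upper end of the final bracket guarantees the reported width actually satisfies the constraint, at the cost of being up to `tol_um` wider than the true minimum.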
29. Effects of driver impedance and termination resistance
- Lowering driver impedance improves eye
- Eye reduces as frequency goes up
- Optimal termination resistance.
30. Effects of driver impedance and termination resistance on step response
- Larger driver impedance leads to slower rise edge and lower saturation voltage
- Larger termination resistance causes sharper rise edge but with larger reflection
31. Crosstalk effects
- Three different PRBS input patterns, min-ddp solutions
- T-line Scheme A: Delay increased by 9.6%, Power increased by 37%
- T-line Scheme B: Delay increased by 2%, Power increased by 25.7%
32. Transceiver Design
- Sense amplifier (SA)
- Double-tail latch-type [Schinkel07]
- Optimize sizing to minimize SA delay
- Inverter chain
- Number of stages
- Fixed to 6
- Sizing of each inverter
- RS: output resistance of inverter chain
- Sweep the 1st inverter size to minimize the total transceiver delay for given Veye, RS
Double-tail latch-type voltage sense amp. @ 45nm tech node
- M1/M3: 45nm/45nm
- M2/M4: 250nm/45nm
- M5/M6: 180nm/45nm
- M7/M8: 280nm/45nm
- M9: 495nm/45nm
- M10/M11: 200nm/45nm
- M12: 1.58um/45nm
33. Transceiver Modeling
- Driver side
- Voltage source Vs with output resistance Rs
- Vs: full-swing pulse signal with rise time Tr = 0.1Tc
- Rs: output resistance of the last inverter in the chain.
- Receiver side
- Extract look-up table for TX delay and power
- Fit the table using non-linear closed-form formula
- The relative error is within 2% for fitting models
Histogram of fitting errors at 45nm node
Transceiver delay map at 45nm node
Transceiver power map at 45nm node
34. Bit-rate: 50Gbps; Rs = 11.06ohm, Rd = 350ohm, Cd = 0.38pF, RL = 107.69ohm
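As a rough sanity check on these component values, the sketch below treats the passive equalizer as a parallel Rd-Cd network in series with the signal path, loaded only by the termination RL. That topology is an assumption (the schematic is not in the text), and the T-line itself and Rs are ignored, so the numbers indicate only where the equalizer starts boosting high frequencies. Under this model H(s) = RL/(RL+Rd) * (1 + s*Rd*Cd) / (1 + s*(Rd||RL)*Cd).

```python
import math


def parallel(r1, r2):
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


def equalizer_corners(rd_ohm, cd_farad, rl_ohm):
    """DC gain, zero frequency and pole frequency (Hz) of the lumped model."""
    dc_gain = rl_ohm / (rl_ohm + rd_ohm)
    f_zero = 1.0 / (2.0 * math.pi * rd_ohm * cd_farad)
    f_pole = 1.0 / (2.0 * math.pi * parallel(rd_ohm, rl_ohm) * cd_farad)
    return dc_gain, f_zero, f_pole


def cycle_time_ps(bit_rate_gbps):
    """Bit period Tc in ps for a given bit rate."""
    return 1e3 / bit_rate_gbps


if __name__ == "__main__":
    dc_gain, f_zero, f_pole = equalizer_corners(350.0, 0.38e-12, 107.69)
    tc = cycle_time_ps(50.0)
    print(f"Tc = {tc:.1f} ps, Tr = 0.1*Tc = {0.1 * tc:.1f} ps")
    print(f"DC gain = {dc_gain:.3f}")
    print(f"zero at {f_zero / 1e9:.2f} GHz, pole at {f_pole / 1e9:.2f} GHz")
```

In this simplified picture the low-frequency content is attenuated to roughly a quarter of its amplitude while components above a few GHz pass nearly unattenuated, which is the high-pass shaping that offsets the line's frequency-dependent loss.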
35. Conclusion (cont.)

Low-Latency Application (ps/mm)
Schemes   Tech Node
          90nm     65nm     45nm     32nm     22nm
R-RC      3/35     1/42     1/46     1/55     1/55
UT-TL     5/15     5/13     5/10     5/9      5/8
T-TL      5/15     5/13     5/10     5/9      5/8
UE-TL     1/37     3/25     3/16     3/12     5/8
PE-TL     1/37     3/25     3/16     3/12     5/8

Low-Energy Application (pJ/m)
Schemes   Tech Node
          90nm     65nm     45nm     32nm     22nm
R-RC      2/150    2/140    1/130    1/100    1/100
UT-TL     3/140    3/110    3/70     3/50     2/40
T-TL      1/260    1/200    2/100    2/60     3/40
UE-TL     4/60     4/36     4/20     4/10     5/4
PE-TL     5/26     5/16     5/8      5/5      5/2

High-Throughput Application (Gbps/um)
Schemes   Tech Node
          90nm     65nm     45nm     32nm     22nm
R-RC      5/5      5/6      3/8      3/10     2/12
UT-TL     2/3.3    1/3.3    1/3.3    1/3.3    1/3.3
T-TL      1/3      2/3.4    2/6      2/9      3/16
UE-TL     3/3      3/5      4/9      4/13     4/21
PE-TL     4/4      4/5.3    5/9      5/15     5/24

Low-Noise Application
Schemes   Tech Node
          90nm     65nm     45nm     32nm     22nm
R-RC      1        1        1        1        1
UT-TL     1        1        1        1        1
T-TL      3        3        3        3        3
UE-TL     5        5        4        4        4
PE-TL     4        4        5        5        5

Item in the table: score/value. Score: the higher, the better in terms of the given metric; max. score is 5. The best structure in each column is marked using red color.
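The 22nm values in these tables can be cross-checked against the percentage figures quoted on the results slides. The sketch below copies the 22nm value column from the latency, energy and throughput tables and computes each scheme's change relative to R-RC; the dictionary and function names are illustrative.

```python
# 22nm "value" entries copied from the score/value tables.
LATENCY_PS_PER_MM = {"R-RC": 55, "UT-TL": 8, "T-TL": 8, "UE-TL": 8, "PE-TL": 8}
ENERGY_PJ_PER_M = {"R-RC": 100, "UT-TL": 40, "T-TL": 40, "UE-TL": 4, "PE-TL": 2}
THROUGHPUT_GBPS_PER_UM = {"R-RC": 12, "UT-TL": 3.3, "T-TL": 16, "UE-TL": 21, "PE-TL": 24}


def percent_change(value, baseline):
    """Signed percent change of value relative to baseline."""
    return 100.0 * (value - baseline) / baseline


def versus_rrc(table):
    """Percent change of every T-line scheme relative to R-RC."""
    base = table["R-RC"]
    return {name: percent_change(v, base) for name, v in table.items() if name != "R-RC"}


if __name__ == "__main__":
    for label, table in (
        ("latency", LATENCY_PS_PER_MM),
        ("energy", ENERGY_PJ_PER_M),
        ("throughput", THROUGHPUT_GBPS_PER_UM),
    ):
        changes = ", ".join(f"{k} {v:+.0f}%" for k, v in versus_rrc(table).items())
        print(f"{label:10s}: {changes}")
```

The results line up with the quoted figures: about 85% lower delay for T-lines, 60% (single-ended) and 96% or more (differential) lower energy, and roughly 30%, 75% and 2X throughput for T-TL, UE-TL and PE-TL.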
36. Future Works
- Explore novel global signaling schemes for high throughput and low energy dissipation.
- Design, optimize > 50Gbps on-chip interconnection schemes
- Architecture-level study to identify trade-offs
- Wire configuration
- Dimension optimization, ground plane, etc.
- Un-interrupted architectures
- Equalization implementation, TX/RX choice
- Distributed architectures
- Active or Passive compensation (RC equalizers, other networks, etc.)
- Novel high-speed transceiver circuitry design
- Develop analysis and optimization capability to aid co-design and co-optimization of wire and transceiver circuit
- Fabrication to verify analysis and demonstrate feasibility
37. Related Publications
Repeated RC Wire
- L. Zhang, H. Chen, B. Yao, K. Hamilton, and C. K. Cheng, "Repeated on-chip interconnect analysis and evaluation of delay, power and bandwidth metrics under different design goals," IEEE International Symposium on Quality Electronic Design, 2007, pp. 251-256.
Un-Terminated/Terminated T-Line
- Y. Zhang, L. Zhang, A. Deutsch, G. A. Katopis, D. M. Dreps, J. F. Buckwalter, E. S. Kuh, and C. K. Cheng, "Design Methodology of High Performance On-Chip Global Interconnect Using Terminated Transmission-Line," IEEE International Symposium on Quality Electronic Design, 2009, pp. 451-458.
Passive-Equalized T-Line
- Y. Zhang, L. Zhang, A. Tsuchiya, M. Hashimoto, and C. K. Cheng, "On-chip high performance signaling using passive compensation," IEEE International Conference on Computer Design, 2008, pp. 182-187.
- Y. Zhang, L. Zhang, A. Deutsch, G. A. Katopis, D. M. Dreps, J. F. Buckwalter, E. S. Kuh, and C. K. Cheng, "On-chip bus signaling using passive compensation," IEEE Electrical Performance of Electronic Packaging, 2008, pp. 33-36.
- L. Zhang, Y. Zhang, A. Tsuchiya, M. Hashimoto, E. Kuh, and C. K. Cheng, "High performance on-chip differential signaling using passive compensation for global communication," Asia and South Pacific Design Automation Conference, 2009, pp. 385-390.
Overview and Comparison
- Y. Zhang, X. Hu, A. Deutsch, A. E. Engin, J. F. Buckwalter, and C. K. Cheng, "Prediction of High-Performance On-Chip Global Interconnection," ACM workshop on System Level Interconnection Prediction, 2009.