Title: 3D Interconnect: Architectural Challenges and Opportunities
1. 3D Interconnect: Architectural Challenges and Opportunities
Tim Sherwood
UC SANTA BARBARA
2. The Role of Architecture
(Battery Life, Performance, Programmability)
Applications
Runtime System
Architecture
3D Integration
Circuit
Device
Package
(Noise, Thermal, Yield)
3. Lab Overview
4. Lab Overview
5. Potential for Impact from 3D
3D Integration for Mixed Signal
3D Bandwidth
3D Specialization
3D Integration for Mixed Technology
3D Bandwidth
3D Specialization
3D Integration for Latency
6. Potential for Impact from 3D
3D Integration for Mixed Signal
3D Integration for Mixed Technology
3D Specialization
7. Presented Works
- Shashidhar Mysore, Banit Agrawal, Sheng-Chih Lin, Navin Srivastava, Kaustav Banerjee and Timothy Sherwood. Introspective 3D Chips, Proceedings of the Twelfth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 2006. San Jose, CA
- Gian Luca Loi, Banit Agrawal, Navin Srivastava, Sheng-Chih Lin, Timothy Sherwood, Kaustav Banerjee. A Thermally-Aware Performance Analysis of Vertically Integrated (3-D) Processor-Memory Hierarchy, Proceedings of the 43rd Design Automation Conference (DAC), June 2006. San Francisco, CA
8. Two Specific Opportunities
- 1) 3D Integration for Performance
- Bring Memory Closer to those that use it
- More Bandwidth and Lower Latency
- Tricky System Level Tradeoffs
- 2) 3D Integration for Specialization
- Integration offers unique specialization opportunity
- Decouple commodity from niche
- The ramifications of any radical change require a careful evaluation that considers all the parameters
9. A Simple Performance Ecosystem
performance
10. Two Specific Opportunities
- 1) 3D Integration for Performance
- Bring Memory Closer to those that use it
- More Bandwidth and Lower Latency
- Tricky System Level Tradeoffs
- 2) 3D Integration for Specialization
- Integration offers unique specialization opportunity
- Decouple commodity from niche
- The ramifications of any radical change require a careful evaluation that considers all the parameters
11. Basic Savings in 3D
Area 4: Dist = √8 ≈ 2.8, BW = √8 ≈ 2.8
Area 2: Dist = √4 = 2, BW = 2√4 = 4 (1L)
Area 1: Dist = √2 ≈ 1.4, BW = 4√2 ≈ 5.6 (3L)
On-chip latency improved; bandwidth could improve more.
What about real wires? What about apps? What about temp?
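The numbers on this slide can be reproduced with a short calculation. The sketch below is one reading of them, not a model taken from the talk: it assumes a total area of 4 units split evenly across the layers, takes the worst-case on-layer distance as the diagonal of a square of that area, and takes bandwidth as the layer count times that distance. The function name `stack_metrics` and the constant `TOTAL_AREA` are illustrative.

```python
import math

TOTAL_AREA = 4.0  # area of the single-layer design, arbitrary units (assumed)


def stack_metrics(layers):
    """Return (area per layer, distance, bandwidth) for an even split.

    Assumptions (not stated on the slide, chosen to match its numbers):
      distance  = sqrt(2 * area), the diagonal of a square layer
      bandwidth = layers * distance
    """
    area = TOTAL_AREA / layers
    dist = math.sqrt(2.0 * area)
    bw = layers * dist
    return area, dist, bw


if __name__ == "__main__":
    for n in (1, 2, 4):
        area, dist, bw = stack_metrics(n)
        print(f"layers={n}  area={area:g}  dist={dist:.2f}  bw={bw:.2f}")
```

Under these assumptions distance shrinks with the square root of the layer count while bandwidth grows with it, which matches the slide's remark that latency improves and bandwidth could improve more.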
12. Example Technology Node
Banerjee et al. IEEE 2001
13. 3D Wire Delay
[Figure: delay (sec, ×10^-11 scale, 0.2 to 1.4) versus wire length L (um, 0 to 800). Labels: vertical via model, horizontal line model, distributed RC delay, vertical wire length, horizontal wire length L.]
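The figure on this slide names a distributed RC delay model. As a rough illustration of why wire length matters so much, the sketch below uses the textbook approximation for the 50% delay of a distributed RC line, 0.38 · R · C, with R and C both proportional to length. The per-micron resistance and capacitance values are hypothetical placeholders, not values from the talk or from the cited technology node.

```python
# Hypothetical per-unit-length parameters, for illustration only.
R_PER_UM = 0.1        # ohms per micron (assumed)
C_PER_UM = 0.2e-15    # farads per micron (assumed)


def distributed_rc_delay(r_per_um, c_per_um, length_um):
    """Approximate 50% delay (seconds) of a distributed RC wire.

    Uses delay ~= 0.38 * (r * L) * (c * L), so delay grows with L squared.
    """
    return 0.38 * r_per_um * c_per_um * length_um ** 2


if __name__ == "__main__":
    for length in (160, 320, 480, 640, 800):
        delay = distributed_rc_delay(R_PER_UM, C_PER_UM, length)
        print(f"L = {length:3d} um  delay = {delay:.3e} s")
```

Because the delay is quadratic in length, halving a wire cuts its unbuffered delay to a quarter, which is the basic appeal of replacing a long horizontal route with a short vertical one.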
14. A Typical 2D System Design
[Figure: board-level 2D system. A CPU core with L1 I-Cache, L1 D-Cache, and L2 Unified Cache connects through the memory controller and the external L2-to-main-memory bus to DRAM chips on the board. Annotation: Memory Bottleneck.]
15. A 3D Memory System
[Figure: stacked 3D memory system. Layer 1: CPU core, L1 I-Cache, L1 D-Cache. Layer 2: L2 Unified Cache, reached over the L1-to-L2 vertical interlayer bus. Layers 3 to 18: stacked three-dimensional main memory, reached over the L2-to-main-memory vertical interlayer bus. Annotations: 8 bytes to 128 bytes; 200 MHz to 2 GHz.]
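The two ranges annotated on this slide (8 to 128 bytes, 200 MHz to 2 GHz) read naturally as bus width and bus clock. Assuming that reading, and assuming one full-width transfer per clock cycle, the peak bandwidth at each end of the range can be estimated as below. The function name is illustrative.

```python
def peak_bandwidth_gb_per_s(width_bytes, freq_mhz):
    """Peak bus bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumes one full-width transfer per clock cycle, which is an
    idealization and ignores protocol overhead.
    """
    return width_bytes * freq_mhz * 1e6 / 1e9


if __name__ == "__main__":
    low = peak_bandwidth_gb_per_s(8, 200)       # narrowest, slowest bus
    high = peak_bandwidth_gb_per_s(128, 2000)   # widest, fastest bus
    print(f"low  = {low:.1f} GB/s")
    print(f"high = {high:.1f} GB/s")
    print(f"ratio = {high / low:.0f}x")
```

Under these assumptions the range spans 1.6 GB/s to 256 GB/s, a factor of 160 in peak bandwidth.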
16. System-Level Simulation
- Simulator: Sim-Alpha
- Processor: Alpha 21264
- Benchmarks: mcf, parser, twolf with MinneSPEC reduced inputs
17. Effect of Bus Width and Frequency
[Figure: mcf. Execution time (sec, 0 to 7) versus L2 cache size in KBytes (10 to 10000). Series: 8 bytes bus width (2-D); 8, 16, 32, 64, and 128 bytes bus width (3-D). Annotation: Only a few vias required.]
18. Effect of Clock Frequency: mcf
19. Effect of Clock Frequency: parser
20. Effect of Clock Frequency: twolf
21. An Example Memory System
22. Self-consistent Thermal Modeling
- Insert the initial values of leakage and dynamic power for each layer
- Calculate the first thermal profile
- Based on the previous thermal profile, calculate the new power dissipation, considering that Ion decreases with temperature and Ileakage increases with temperature
- Calculate the new temperature profile
- Is it convergent? If no, recalculate the power dissipation from the new profile; if yes, finish
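The loop on this slide can be sketched as a fixed-point iteration. The version below is a toy, not the model used in the presented work: it collapses the thermal profile to a single chip temperature through one lumped thermal resistance, scales leakage exponentially with temperature rise, and derates dynamic power linearly as a stand-in for the Ion decrease. Every parameter name and default value is a hypothetical placeholder.

```python
import math


def solve_thermal(dynamic_w, leakage_w, t_ambient_k=318.0, r_th_k_per_w=0.5,
                  leak_coeff_per_k=0.02, ion_coeff_per_k=0.001,
                  tol_k=0.01, max_iters=100):
    """Iterate power and temperature until they agree.

    dynamic_w, leakage_w: per-layer power at ambient temperature (watts).
    Returns (temperature in K, list of per-layer powers in W).
    All coefficients are illustrative assumptions.
    """
    def temperature(powers):
        # Lumped model: one thermal resistance for the whole stack.
        return t_ambient_k + r_th_k_per_w * sum(powers)

    # Initial power values and first thermal profile.
    powers = [d + l for d, l in zip(dynamic_w, leakage_w)]
    temp = temperature(powers)

    for _ in range(max_iters):
        rise = temp - t_ambient_k
        # New power from the previous temperature: dynamic power derated
        # linearly, leakage power growing exponentially.
        powers = [
            d * max(0.0, 1.0 - ion_coeff_per_k * rise)
            + l * math.exp(leak_coeff_per_k * rise)
            for d, l in zip(dynamic_w, leakage_w)
        ]
        new_temp = temperature(powers)
        if abs(new_temp - temp) < tol_k:   # convergence check
            return new_temp, powers
        temp = new_temp
    raise RuntimeError("thermal iteration did not converge")


if __name__ == "__main__":
    temp, powers = solve_thermal([30.0, 5.0], [5.0, 1.0])
    print(f"temperature = {temp:.2f} K, total power = {sum(powers):.2f} W")
```

The convergence check matters because leakage feeds back on itself: with a large enough thermal resistance or leakage coefficient the iteration diverges, and the sketch reports that as an error.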
23. 3D Thermally-aware Performance Analysis
[Figure: mcf. Execution time per instruction (1 to 3) and temperature (K, 330 to 400). Labels: min execution time in 2-D, min execution time in 3-D, temperature constraint, 2-D max chip temperature, 3-D max chip temperature.]
24. 3D Thermally-aware Performance Analysis
[Figure: twolf. Execution time per instruction (0.3 to 1.1) and temperature (K, 330 to 390) versus frequency in MHz (600 to 3000). Labels: maximum frequency allowed due to temperature constraint, temperature constraint, min execution time in 2-D, min execution time in 3-D, 2-D max chip temperature, 3-D max chip temperature.]
25. 3D Memory Integration
- Many Unaccounted For Effects
- Effect of Multiple Cores and Memory Banks
- Spatial Variation
- Temporal Variation (thermal load balancing)
- All of these are intimately tied to the integration method and packaging
- How to Manage
- Architecture and Software will be increasingly involved
- Exposing Variation to higher levels
- Huge demand for models, sensors, and knobs
- Thermal, Packaging, Application, Architecture all tangled
- Need to build models that capture all of these aspects
- Models need to be self consistent
26. Two Specific Opportunities
- 1) 3D Integration for Performance
- Bring Memory Closer to those that use it
- More Bandwidth and Lower Latency
- Tricky System Level Tradeoffs
- 2) 3D Integration for Specialization
- Integration offers unique specialization opportunity
- Decouple commodity from niche
- The ramifications of any radical change require a careful evaluation that considers all the parameters
27. 3D Integration for Introspection
- Complex interactions across levels of abstraction make debugging, optimizing, securing, and analysis in general difficult
- The first requirement: visibility
- Not just data capture, we need the ability to put together a cohesive picture of system interactions and correlate between them in a sound and non-intrusive manner
- The hardware/software boundary is uniquely situated
- Piece together from low level events
- What would the programmer wish list look like?
28. What programmers want
Everything.
- 32-bit Memory Address
- 32-bit Memory Value
- 10-bit Opcodes
- 2 × 5-bit Register Names
- 2 × 32-bit Register Values
- 10 bits of status
Multipliers shown beside the six fields: 3x and 4x
1892 bits per cycle
1 terabyte/sec at 4 GHz
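The step from bits per cycle to roughly a terabyte per second is simple arithmetic, sketched below. The 1892 bits per cycle figure is taken from the slide as given; how the 3x and 4x multipliers combine to produce it is not spelled out, so the per-instruction field sum is shown only for reference. The names `FIELD_BITS` and `trace_bandwidth_bytes_per_s` are illustrative.

```python
# Field widths listed on the slide, in bits, for a single instruction.
FIELD_BITS = {
    "memory address": 32,
    "memory value": 32,
    "opcode": 10,
    "register names": 2 * 5,
    "register values": 2 * 32,
    "status": 10,
}


def trace_bandwidth_bytes_per_s(bits_per_cycle, clock_hz):
    """Bytes per second needed to export bits_per_cycle every clock."""
    return bits_per_cycle * clock_hz / 8


if __name__ == "__main__":
    per_instruction = sum(FIELD_BITS.values())
    print(f"bits per instruction (single copy of each field): {per_instruction}")

    bw = trace_bandwidth_bytes_per_s(1892, 4e9)  # slide's total at 4 GHz
    print(f"trace bandwidth: {bw / 1e12:.3f} TB/s")
```

At 4 GHz, 1892 bits per cycle works out to 946 GB/s, which the slide rounds to 1 terabyte/sec.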
29. Why programmers can't have it
- Interconnect is not free
- Huge cross chip busses
- OptBuf 285um
- 20,000 buffers
- Analysis is not free
- Significant processing required
- Extra cost of added heat
- 15 budget for cooling
- Used by developers
30. Cake Eating It Too
- Need a way to provide cheap (or high margin) HW to the masses
- No paying for developer functionality
- Get developers the powerful analysis they crave
- See everything at executable rate
- Provide snap-on functionality for developers
- Separate chip for analysis engine
- Only hook it onto developer systems
- Idea is not limited to development systems
- Security, Error Correction, Confidentiality, Accelerators,
- 3D Integration offers the potential
31. Thermal Impact
32. Conclusion: Opportunities and Challenges
- 3D Integration for Performance
- Bring Memory Closer to those that use it
- More Bandwidth and Lower Latency
- Requires few vias for big impact
- Tricky System Level Tradeoffs
- 3D Integration for Specialization
- Integration offers unique specialization opportunity
- Requires rethinking of integration process
- Decouple commodity from niche
- Challenges
- Cross layer models from app to package
- Cross layer optimization both static and dynamic
- Thermal Management is everybody's problem
33.
- http://www.cs.ucsb.edu/arch/
NSF CNS 0524771, NSF CCF 0702798, NSF CCF 0448654
34. Related Work
- Bryan Black, Murali M. Annavaram, Edward Brekelbaum, John DeVale, Gabriel H. Loh, Lei Jiang, Don McCauley, Pat Morrow, Don Nelson, Daniel Pantuso, Paul Reed, Jeff Rupley, Sadasivan Shankar, John Paul Shen, Clair Webb, "Die Stacking (3D) Microarchitecture," in IEEE International Symposium on Microarchitecture, pp. 469-479, 2006.
Publications on 3D stacked ICs
- Karthik Balakrishnan, Vidit Nanda, Siddharth Easwar, and Sung Kyu Lim, "Wire Congestion And Thermal Aware 3D Global Placement," IEEE/ACM Asia South Pacific Design Automation Conference, pp. 1131-1134, 2005.
- Jacob Minz, Sung Kyu Lim, and Cheng-Kok Koh, "3D Module Placement for Congestion and Power Noise Reduction," ACM Great Lakes Symposium on VLSI, pp. 458-461, 2005.
- Jacob Minz, Eric Wong, and Sung Kyu Lim, "Reliability-aware Floorplanning for 3D Circuits," to appear in IEEE International SOC Conference, 2005.
- Kiran Puttaswamy and Gabriel H. Loh, "Implementing Caches in a 3D Technology for High Performance Processors," IEEE International Conference on Computer Design, pp. 525-532, 2005.
- Eric Wong and Sung Kyu Lim, "3D Floorplanning with Thermal Vias," to appear in Design, Automation and Test in Europe, 2006.
- Kiran Puttaswamy and Gabriel H. Loh, "Implementing Register Files for High-Performance Microprocessors in a Die-Stacked (3D) Technology," IEEE International Symposium on VLSI, pp. 384-389, 2006.
- Kiran Puttaswamy and Gabriel H. Loh, "The Impact of 3-Dimensional Integration on the Design of Arithmetic Units," IEEE International Symposium on Circuits and Systems, pp. 4951-4954, 2006.
- Kiran Puttaswamy and Gabriel H. Loh, "Thermal Analysis of a 3D Die-Stacked High-Performance Microprocessor," ACM/IEEE Great Lakes Symposium on VLSI, pp. 19-24, 2006.
- Kiran Puttaswamy and Gabriel H. Loh, "Dynamic Instruction Schedulers in a 3-Dimensional Integration Technology," ACM/IEEE Great Lakes Symposium on VLSI, pp. 153-158, 2006.
- Yuan Xie, Gabriel H. Loh, Bryan Black and Kerry Bernstein, "Design Space Exploration for 3D Architectures," ACM Journal on Emerging Technologies in Computing Systems, vol. 2(2), pp. 65-103, 2006.
- Eric Wong, Jacob Minz, and Sung Kyu Lim, "Decoupling Capacitor Planning and Sizing for Noise and Leakage Reduction," to appear in IEEE International Conference on Computer Aided Design, 2006.
- Kiran Puttaswamy, Gabriel H. Loh, "Thermal Herding Microarchitecture Techniques for Controlling HotSpots in High-Performance 3D-Integrated Processors," in IEEE International Symposium on High-Performance Computer Architecture, 2007.
- Kiran Puttaswamy, Gabriel H. Loh, "Scalability of 3D-Integrated Arithmetic Units in High-Performance Microprocessors," to appear in ACM Design Automation Conference, 2007.