3D Interconnect: Architectural Challenges and Opportunities

Transcript and Presenter's Notes

Title: 3D Interconnect: Architectural Challenges and Opportunities


1
3D Interconnect: Architectural Challenges and
Opportunities
Tim Sherwood
UC SANTA BARBARA
2
The Role of Architecture
(Battery Life, Performance, Programmability)
Applications
Runtime System
Architecture
3D Integration
Circuit
Device
Package
(Noise, Thermal, Yield)
3
Lab Overview
4
Lab Overview
5
Potential for Impact from 3D
3D Integration for Mixed Signal
3D Bandwidth
3D Specialization
3D Integration for Mixed Technology
3D Bandwidth
3D Specialization
3D Integration for Latency
6
Potential for Impact from 3D
3D Integration for Mixed Signal
3D Integration for Mixed Technology
3D Specialization
7
Presented Works
  • Shashidhar Mysore, Banit Agrawal, Sheng-Chih Lin,
    Navin Srivastava, Kaustav Banerjee and Timothy
    Sherwood. "Introspective 3D Chips," Proceedings
    of the Twelfth International Conference on
    Architectural Support for Programming Languages
    and Operating Systems (ASPLOS), October 2006. San
    Jose, CA
  • Gian Luca Loi, Banit Agrawal, Navin Srivastava,
    Sheng-Chih Lin, Timothy Sherwood, Kaustav
    Banerjee. "A Thermally-Aware Performance Analysis
    of Vertically Integrated (3-D) Processor-Memory
    Hierarchy," Proceedings of the 43rd Design
    Automation Conference (DAC), June 2006. San
    Francisco, CA

8
Two Specific Opportunities
  • 1) 3D Integration for Performance
  • Bring Memory Closer to those that use it
  • More Bandwidth and Lower Latency
  • Tricky System Level Tradeoffs
  • 2) 3D Integration for Specialization
  • Integration offers unique specialization
    opportunity
  • Decouple commodity from niche
  • The ramifications of any radical change require
    a careful evaluation that considers all the
    parameters

9
A Simple Performance Ecosystem
performance
10
Two Specific Opportunities
  • 1) 3D Integration for Performance
  • Bring Memory Closer to those that use it
  • More Bandwidth and Lower Latency
  • Tricky System Level Tradeoffs
  • 2) 3D Integration for Specialization
  • Integration offers unique specialization
    opportunity
  • Decouple commodity from niche
  • The ramifications of any radical change require
    a careful evaluation that considers all the
    parameters

11
Basic Savings in 3D
Area 4: Dist √8 ≈ 2.8
Area 2: Dist √4 = 2, 1L
Area 1: Dist √2 ≈ 1.4, 3L
BW √8 ≈ 2.8
BW 4√2 ≈ 5.6
BW 2√4 = 4
On-chip Latency improved, Bandwidth could improve more
What about real wires? What about apps? What about temp?
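The numbers on this slide are consistent with a simple geometric model. The Python sketch below reproduces them under assumptions the slide does not state: a total area of 4 units split evenly over square layers, horizontal distance taken as the layer diagonal, one vertical hop per extra layer (the "1L" and "3L" terms), and bandwidth proportional to the layer count times that diagonal. The function name and its parameters are illustrative only.

```python
import math

def basic_3d_savings(layers, total_area=4.0):
    """Return (area per layer, horizontal distance, vertical hops, relative bandwidth)."""
    area = total_area / layers      # footprint of each layer
    dist = math.sqrt(2.0 * area)    # diagonal of a square with that area
    hops = layers - 1               # interlayer crossings, assumed one per extra layer
    bandwidth = layers * dist       # assumed: number of layers times the diagonal
    return area, dist, hops, bandwidth

for n in (1, 2, 4):
    area, dist, hops, bw = basic_3d_savings(n)
    print(f"layers={n} area={area:g} dist={dist:.2f} hops={hops} bw={bw:.2f}")
```

Under these assumptions the distance shrinks with the square root of the layer count while the bandwidth term grows with it, which matches the slide's remark that latency improves and bandwidth could improve more.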
12
Example Technology Node
Banerjee et al. IEEE 2001
13
3D Wire Delay
[Figure: delay (sec, ×10^-11, axis 0.2 to 1.4) versus wire length L (um, 0 to 800). Labels: vertical via model (vertical wire length), horizontal line model (horizontal wire length L), distributed RC delay.]
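The wire delay slide names a distributed RC delay model. As a rough illustration of why horizontal wire delay grows quickly with length, the Python sketch below uses the standard approximation that the 50% delay of a uniform distributed RC line is about 0.38 times total resistance times total capacitance. The per-unit-length resistance and capacitance values are hypothetical placeholders, not values from the presentation or from Banerjee et al.

```python
def distributed_rc_delay(length_um, r_per_um, c_per_um):
    """Approximate 50% delay (seconds) of a uniform distributed RC line."""
    r_total = r_per_um * length_um   # total wire resistance in ohms
    c_total = c_per_um * length_um   # total wire capacitance in farads
    return 0.38 * r_total * c_total

# Hypothetical per-unit-length values, chosen only for illustration.
R_PER_UM = 0.5       # ohms per um
C_PER_UM = 0.2e-15   # farads per um

for length in (160, 320, 640):
    delay = distributed_rc_delay(length, R_PER_UM, C_PER_UM)
    print(f"L={length} um delay={delay:.3e} s")
```

Because both resistance and capacitance scale with length, doubling the wire length quadruples the delay in this model, which is the motivation for replacing long horizontal wires with short vertical ones.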
14
A Typical 2D System Design
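[Figure: block diagram of a 2D system on a board. A CPU core with L1 I-Cache, L1 D-Cache, and L2 Unified Cache connects through a Memory Controller and an external L2-to-main-memory bus to multiple DRAM chips. The external bus is marked as the memory bottleneck.]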
15
A 3D Memory System
[Figure: block diagram of a stacked 3D memory system.]
Layer 1: CPU core, L1 I-Cache, L1 D-Cache
Layer 2: L2 Unified Cache
Layer 3 to 18: Stacked three dimensional main memory
L1 to L2 vertical interlayer bus
L2 to main memory vertical interlayer bus
Bus width: 8 bytes to 128 bytes
Bus frequency: 200 MHz to 2 GHz
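To put the bus ranges quoted on this slide (8 to 128 bytes wide, 200 MHz to 2 GHz) in perspective, the following Python sketch computes peak bus bandwidth as width times frequency. It assumes one transfer per clock cycle, which the slide does not state; the function name is illustrative.

```python
def peak_bus_bandwidth(width_bytes, freq_hz, transfers_per_cycle=1):
    """Peak bandwidth in bytes per second for a bus of the given width and clock."""
    return width_bytes * freq_hz * transfers_per_cycle

low = peak_bus_bandwidth(8, 200e6)     # narrowest, slowest configuration
high = peak_bus_bandwidth(128, 2e9)    # widest, fastest configuration

print(f"{low / 1e9:.1f} GB/s to {high / 1e9:.1f} GB/s")
print(f"ratio: {high / low:.0f}x")
```

Under that assumption the two ends of the design space differ by a factor of 160 in peak bandwidth.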
16
System-Level Simulation
  • Simulator: Sim-Alpha
  • Processor: Alpha 21264
  • Benchmarks: mcf, parser, twolf with MinneSPEC
    reduced inputs

17
Effect of Bus Width and Frequency
mcf
[Figure: execution time (sec, 0 to 7) versus L2 cache size in KBytes (10 to 10000). Curves: 8 bytes bus width (2-D), and 8, 16, 32, 64, and 128 bytes bus width (3-D). Annotation: Only a few vias required.]
18
Effect of Clock Frequency (mcf)
19
Effect of Clock Frequency (parser)
20
Effect of Clock Frequency (twolf)
21
An Example Memory System
22
Self-consistent Thermal Modeling
  • 1) Insert the initial values of leakage and dynamic
    power for each layer
  • 2) Calculate the first thermal profile
  • 3) Based on the previous thermal profile, calculate
    the new power dissipation, considering the Ion
    decrease with temperature and the Ileakage increase
    with temperature
  • 4) Calculate the new temperature profile
  • 5) Is it convergent? No: repeat from the power
    dissipation step. Yes: finish
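The self-consistent loop on this slide can be sketched as a fixed-point iteration. The Python example below is a minimal illustration for a single lumped thermal node rather than the per-layer model the slide describes. The linear drop in dynamic power (standing in for the Ion decrease), the exponential leakage growth, and every coefficient and default value are assumptions made for the example, not parameters from the presented work.

```python
import math

def self_consistent_temperature(p_dyn0, p_leak0, r_th=0.5, t_amb=318.0,
                                k_dyn=0.001, k_leak=0.02,
                                tol=1e-3, max_iter=100):
    """Iterate power -> temperature -> power until the temperature settles.

    p_dyn0, p_leak0: initial dynamic and leakage power in watts, at t_amb.
    r_th: hypothetical lumped thermal resistance in K/W.
    Returns (temperature in K, number of iterations used).
    """
    # First thermal profile from the initial power values.
    temp = t_amb + r_th * (p_dyn0 + p_leak0)
    for iteration in range(1, max_iter + 1):
        dt = temp - t_amb
        # Assumed models: dynamic power falls slightly, leakage rises with temperature.
        p_dyn = p_dyn0 * max(0.0, 1.0 - k_dyn * dt)
        p_leak = p_leak0 * math.exp(k_leak * dt)
        new_temp = t_amb + r_th * (p_dyn + p_leak)
        if abs(new_temp - temp) < tol:   # convergence check
            return new_temp, iteration
        temp = new_temp
    raise RuntimeError("thermal iteration did not converge")

temp, iterations = self_consistent_temperature(40.0, 10.0)
print(f"converged to {temp:.2f} K after {iterations} iterations")
```

The converged temperature is higher than the first thermal profile because leakage feeds back on itself; with a large enough thermal resistance or leakage coefficient the loop would not converge at all, which corresponds to thermal runaway.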
23
3D Thermally-aware Performance Analysis
mcf
[Figure: execution time in sec (axis 1 to 3) and temperature in K (axis 330 to 400). Labels: temperature constraint, 2-D max chip temperature, 3-D max chip temperature, min execution time in 2-D, min execution time in 3-D.]
24
3D Thermally-aware Performance Analysis
twolf
[Figure: execution time in sec (axis 0.3 to 1.1) and temperature in K (axis 330 to 390) versus frequency in MHz (600 to 3000). Labels: maximum frequency allowed due to temperature constraint, temperature constraint, 2-D max chip temperature, 3-D max chip temperature, min execution time in 2-D, min execution time in 3-D.]
25
3D Memory Integration
  • Many Unaccounted For Effects
  • Effect of Multiple Cores and Memory Banks
  • Spatial Variation
  • Temporal Variation (thermal load balancing)
  • All of these are intimately tied to the
    integration method and packaging
  • How to Manage
  • Architecture and Software will be increasingly
    involved
  • Exposing Variation to higher levels
  • Huge demand for models, sensors, and knobs
  • Thermal, Packaging, Application, Architecture all
    tangled
  • Need to build models that capture all of these
    aspects
  • Models need to be self-consistent

26
Two Specific Opportunities
  • 1) 3D Integration for Performance
  • Bring Memory Closer to those that use it
  • More Bandwidth and Lower Latency
  • Tricky System Level Tradeoffs
  • 2) 3D Integration for Specialization
  • Integration offers unique specialization
    opportunity
  • Decouple commodity from niche
  • The ramifications of any radical change require
    a careful evaluation that considers all the
    parameters

27
3D Integration for Introspection
  • Complex interactions across levels of abstraction
    make debugging, optimizing, securing, and
    analysis in general difficult
  • The first requirement: visibility
  • Not just data capture; we need the ability to put
    together a cohesive picture of system
    interactions and correlate between them in a
    sound and non-intrusive manner
  • The hardware/software boundary is uniquely
    situated
  • Piece together from low level events
  • What would the programmer wish list look like?

28
What programmers want
Everything.
  • 32 bit Memory Address
  • 32 bit Memory Value
  • 10 bit Opcodes
  • 2, 5 bit Register Names
  • 2, 32 bit Register Values
  • 10 bits of status

3x 3x 3x 3x 3x 3x
4x 4x 4x 4x 4x 4x
1892 bits per cycle
1 terabyte/sec at 4 GHz
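The headline figure on this slide follows from simple arithmetic, shown here as a small Python check. It takes the slide's 1892 bits per cycle and 4 GHz clock as given and only converts units; the function name is illustrative.

```python
def trace_bandwidth_bytes_per_sec(bits_per_cycle, clock_hz):
    """Raw trace bandwidth in bytes per second when every cycle is captured."""
    return bits_per_cycle * clock_hz / 8

bandwidth = trace_bandwidth_bytes_per_sec(1892, 4e9)
print(f"{bandwidth / 1e9:.0f} GB/s")
```

That is 946 GB/s, which the slide rounds to about 1 terabyte per second.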
29
Why programmers can't have it
  • Interconnect is not free
  • Huge cross chip busses
  • OptBuf 285um
  • 20,000 buffers
  • Analysis is not free
  • Significant processing required
  • Extra cost of added heat
  • 15 budget for cooling
  • Used by developers

30
Cake Eating It Too
  • Need a way to provide cheap (or high margin) HW
    to the masses
  • No paying for developer functionality
  • Get developers the powerful analysis they crave
  • See everything at executable rate
  • Provide snap-on functionality for developers
  • Separate chip for analysis engine
  • Only hook it onto developer systems
  • Idea is not limited to development systems
  • Security, Error Correction, Confidentiality,
    Accelerators,
  • 3D Integration offers the potential

31
Thermal Impact
32
Conclusion: Opportunities and Challenges
  • 3D Integration for Performance
  • Bring Memory Closer to those that use it
  • More Bandwidth and Lower Latency
  • Requires few vias for big impact
  • Tricky System Level Tradeoffs
  • 3D Integration for Specialization
  • Integration offers unique specialization
    opportunity
  • Requires rethinking of integration process
  • Decouple commodity from niche
  • Challenges
  • Cross layer models from app to package
  • Cross layer optimization both static and dynamic
  • Thermal Management is everybody's problem

33
  • http://www.cs.ucsb.edu/arch/

NSF CNS 0524771, NSF CCF 0702798, NSF CCF 0448654
34
Related Work
  • Bryan Black, Murali M. Annavaram, Edward
    Brekelbaum, John DeVale, Gabriel H. Loh, Lei
    Jiang, Don McCauley, Pat Morrow, Don Nelson,
    Daniel Pantuso, Paul Reed, Jeff Rupley, Sadasivan
    Shankar, John Paul Shen, Clair Webb, "Die
    Stacking (3D) Microarchitecture," in IEEE
    International Symposium on Microarchitecture,
    469-479, 2006.
  • PUBLICATIONS on 3D STACKED IC
  • 1. Karthik Balakrishnan, Vidit Nanda,
    Siddharth Easwar, and Sung Kyu Lim, "Wire
    Congestion And Thermal Aware 3D Global
    Placement," IEEE/ACM Asia South Pacific Design
    Automation Conference, pp. 1131-1134, 2005.
  • 2. Jacob Minz, Sung Kyu Lim, and Cheng-Kok
    Koh, "3D Module Placement for Congestion and
    Power Noise Reduction," ACM Great Lakes Symposium
    on VLSI, pp. 458-461, 2005.
  • 3. Jacob Minz, Eric Wong, and Sung Kyu Lim,
    "Reliability-aware Floorplanning for 3D
    Circuits," to appear in IEEE International SOC
    Conference, 2005.
  • 4. Kiran Puttaswamy and Gabriel H. Loh,
    "Implementing Caches in a 3D Technology for High
    Performance Processors," IEEE International
    Conference on Computer Design, pp. 525-532, 2005.
  • 5. Eric Wong and Sung Kyu Lim, "3D
    Floorplanning with Thermal Vias," to appear in
    Design, Automation and Test in Europe, 2006.
  • 6. Kiran Puttaswamy and Gabriel H. Loh,
    "Implementing Register Files for High-Performance
    Microprocessors in a Die-Stacked (3D)
    Technology," IEEE International Symposium on
    VLSI, pp. 384-389, 2006.
  • 7. Kiran Puttaswamy and Gabriel H. Loh, "The
    Impact of 3-Dimensional Integration on the
    Design of Arithmetic Units," IEEE International
    Symposium on Circuits and Systems, pp. 4951-4954,
    2006.
  • 8. Kiran Puttaswamy and Gabriel H. Loh,
    "Thermal Analysis of a 3D Die-Stacked
    High-Performance Microprocessor," ACM/IEEE Great
    Lakes Symposium on VLSI, pp. 19-24, 2006.
  • 9. Kiran Puttaswamy and Gabriel H. Loh,
    "Dynamic Instruction Schedulers in a
    3-Dimensional Integration Technology," ACM/IEEE
    Great Lakes Symposium on VLSI, pp. 153-158, 2006.
  • 10. Yuan Xie, Gabriel H. Loh, Bryan Black and
    Kerry Bernstein, "Design Space Exploration for 3D
    Architectures," ACM Journal on Emerging
    Technologies in Computing Systems, vol. 2(2), pp.
    65-103, 2006.
  • 11. Eric Wong, Jacob Minz, and Sung Kyu Lim,
    "Decoupling Capacitor Planning and Sizing for
    Noise and Leakage Reduction," to appear in IEEE
    International Conference on Computer Aided
    Design, 2006.
  • 12. Bryan Black, Murali M. Annavaram, Edward
    Brekelbaum, John DeVale, Gabriel H. Loh, Lei
    Jiang, Don McCauley, Pat Morrow, Don Nelson,
    Daniel Pantuso, Paul Reed, Jeff Rupley, Sadasivan
    Shankar, John Paul Shen, Clair Webb, "Die
    Stacking (3D) Microarchitecture," in IEEE
    International Symposium on Microarchitecture,
    469-479, 2006.
  • 13. Kiran Puttaswamy, Gabriel H. Loh, "Thermal
    Herding Microarchitecture Techniques for
    Controlling HotSpots in High-Performance
    3D-Integrated Processors," in IEEE International
    Symposium on High-Performance Computer
    Architecture, 2007.
  • 14. Kiran Puttaswamy, Gabriel H. Loh,
    "Scalability of 3D-Integrated Arithmetic Units in
    High-Performance Microprocessors," to appear in
    ACM Design Automation Conference, 2007.