Title: Firefly: Illuminating Future Network-on-Chip with Nanophotonics
1Firefly: Illuminating Future Network-on-Chip with Nanophotonics
- Yan Pan, Prabhat Kumar, John Kim, Gokhan Memik, Yu Zhang, Alok Choudhary
- EECS Department, Northwestern University, Evanston, IL, USA: panyan, prabhat-kumar, g-memik, yu-zhang, a-choudhary_at_northwestern.edu
- CS Department, KAIST, Daejeon, Korea: jjk12_at_cs.kaist.ac.kr
2On-Chip Network Topologies
- Mesh: MIT RAW, TILE64, Teraflops
- C-Mesh: [Balfour06], [Cianchetti09]
- Crossbar: [Vantrease08], [Kirman06]
- Others: Torus [Shacham07], Flattened Butterfly [Kim07], Dragonfly [Kim08], Hierarchical (Bus/Mesh) [Das08], Clos [Joshi09], Ring [Larrabee]
- Network-on-chip is critical for performance.
3Signaling technologies
- Electrical signaling
- Repeater insertion needed
- Bandwidth density (up to 8 Gbps/µm) [Chang HPCA08]
- Nanophotonics
- Bandwidth density 100 Gbps/µm! [Batten HOTI08]
- Generally distance-independent power consumption
- Speed of light → low latency
- Propagation
- Switching [Cianchetti ISCA09]
4Nanophotonic components
- Figure labels: off-chip laser source, coupler, waveguide, resonant modulators, resonant detectors (Ge-doped)
5Resonant Rings
- Selective
- Couple optical energy of a specific wavelength
6Putting it together
- Figure: example bit streams 11010101 and 10001011, shown at both the sending and the receiving end
- 64 wavelengths DWDM, 3-5 µm waveguide pitch, 10 Gbps per link
- Modulation and detection
- 100 Gbps/µm bandwidth density [Batten HOTI08]
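As a rough check on the figures on this slide, the sketch below computes a raw bandwidth density from the DWDM numbers. It assumes that "10 Gbps per link" means 10 Gbps per wavelength and that density is simply the aggregate rate of one waveguide divided by its pitch. The function name is illustrative only.

```python
def bandwidth_density_gbps_per_um(wavelengths, gbps_per_wavelength, pitch_um):
    """Raw bandwidth density of one waveguide: aggregate rate / pitch."""
    return wavelengths * gbps_per_wavelength / pitch_um

# Slide figures: 64-wavelength DWDM, 10 Gbps each (assumed per wavelength),
# waveguide pitch between 3 and 5 um.
for pitch in (3.0, 5.0):
    density = bandwidth_density_gbps_per_um(64, 10.0, pitch)
    print(f"pitch {pitch} um: {density:.0f} Gbps/um")
```

Under these assumptions the raw value is on the order of 100 Gbps/µm, the same order of magnitude as the figure quoted from Batten HOTI08. Any overheads that the quoted figure accounts for are not modeled here.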
7What's the catch?
- Power cost of nanophotonics: ring heating, laser power, E/O and O/E conversions
- Nanophotonic cost is distance insensitive
- For short links (2.5 mm): nanophotonics vs. electrical (RC lines with repeater insertion)
- For long links: nanophotonic cost stays the same, electrical cost increases
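The trade-off on this slide can be illustrated with a toy energy model: electrical energy per bit grows linearly with link length, while nanophotonic energy per bit is a fixed cost. The two constants below are hypothetical placeholders chosen only to show the shape of the comparison; they are not values from the talk.

```python
def electrical_energy_pj(length_mm, pj_per_bit_per_mm):
    """Repeated RC wire: energy per bit grows linearly with length."""
    return pj_per_bit_per_mm * length_mm

def optical_energy_pj(fixed_pj_per_bit):
    """Nanophotonic link: fixed cost (conversions, laser, heating), any length."""
    return fixed_pj_per_bit

def crossover_mm(fixed_pj_per_bit, pj_per_bit_per_mm):
    """Link length at which both technologies cost the same per bit."""
    return fixed_pj_per_bit / pj_per_bit_per_mm

# Hypothetical placeholder values, NOT taken from the slides.
OPTICAL_PJ = 0.5
ELECTRICAL_PJ_PER_MM = 0.1

for length in (2.5, 5.0, 10.0):
    e = electrical_energy_pj(length, ELECTRICAL_PJ_PER_MM)
    o = optical_energy_pj(OPTICAL_PJ)
    cheaper = "electrical" if e <= o else "optical"
    print(f"{length} mm: electrical {e:.2f}, optical {o:.2f} pJ/bit -> {cheaper}")
```

With any such linear-versus-constant model there is a single crossover length, which is the motivation for treating short and long links differently.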
8Here is the idea
- Design an architecture that differentiates traffic.
- Use electrical signaling for short links.
- Use nanophotonics only for long-range traffic.
- What do we gain?
- Low latency
- High bandwidth density
- High power efficiency
- Localized arbitration
- Scalability
9Outline
- Motivation
- Architecture of Firefly
- Evaluation
- Conclusion
10Layout View of 64-core Firefly
- Concentration
- 4 cores share a router
- 16 routers
11Layout View of 64-core Firefly
- Concentration
- Clusters
- Electrically connected
- Mesh topology
- 4 routers per cluster
- 4 clusters
- Figure label: Cluster 0 (C0)
12Layout View of 64-core Firefly
- Concentration
- Clusters
- Assemblies
- Routers from different clusters
- Optically connected
- Logical crossbars
13Layout View of 64-core Firefly
- Clusters
- Electrical CMESH
- Assemblies
- Nanophotonic crossbars
Efficient nanophotonic crossbars needed!
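The organization built up over the layout slides can be sketched as a small indexing scheme. The linear numbering of cores and routers, and the rule that routers with the same local index in each cluster form one assembly, are assumptions made for illustration; the slides only state the counts.

```python
CORES = 64
CONCENTRATION = 4          # cores sharing one router
ROUTERS_PER_CLUSTER = 4    # electrically connected routers per cluster

NUM_ROUTERS = CORES // CONCENTRATION                 # 16 routers
NUM_CLUSTERS = NUM_ROUTERS // ROUTERS_PER_CLUSTER    # 4 clusters

def router_of_core(core):
    return core // CONCENTRATION

def cluster_of_router(router):
    return router // ROUTERS_PER_CLUSTER

def assembly_of_router(router):
    # Assumption: same local index across clusters means same assembly.
    return router % ROUTERS_PER_CLUSTER

def assembly_members(assembly):
    """Routers, one per cluster, joined by one logical optical crossbar."""
    return [c * ROUTERS_PER_CLUSTER + assembly for c in range(NUM_CLUSTERS)]

print(NUM_ROUTERS, NUM_CLUSTERS, assembly_members(0))
```

In this numbering each assembly contains exactly one router from every cluster, so any two clusters are one optical hop apart through any assembly.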
14Nanophotonic crossbars
- Single-Write-Multiple-Read (SWMR) [Kirman06] (CMXbar [Joshi NOCS09])
- Dedicated sending channel
- Multicast in nature
- Receivers compare and discard
- High fan-out → laser power
- Figure: SWMR Crossbar
15Nanophotonic crossbars
- Multiple-Write-Single-Read (MWSR) [Vantrease08] (DMXbar [Joshi NOCS09])
- Dedicated receiving channel
- Demux to channel
- Global arbitration needed!
- Figure: MWSR Crossbar
16Reservation-assisted SWMR
- Goal
- Avoid global arbitration
- Reduce power
- Proposed design
- Reservation channels
- Narrow
- Multicast to reserve
- Destination ID
- Packet length
- Uni-cast data packet
- Figure: R-SWMR Crossbar
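The reservation idea on this slide can be sketched as a two-phase send on one sender's dedicated channel: a narrow multicast reservation carrying destination ID and packet length, followed by a data packet that only the reserved receiver picks up. The class and function names are hypothetical, and the model covers a single sender channel with no timing or contention.

```python
from dataclasses import dataclass, field

@dataclass
class Reservation:
    dest_id: int   # destination router within the assembly
    length: int    # packet length in flits

@dataclass
class Receiver:
    router_id: int
    listening_for: int = 0                      # data flits still expected
    received: list = field(default_factory=list)

    def on_reservation(self, r):
        # Every receiver sees the narrow multicast reservation, but only
        # the named destination activates its data-channel detectors.
        if r.dest_id == self.router_id:
            self.listening_for = r.length

    def on_data(self, flit):
        # Inactive receivers ignore the data channel entirely.
        if self.listening_for > 0:
            self.received.append(flit)
            self.listening_for -= 1

def send_packet(sender_id, dest_id, flits, receivers):
    """Reserve first (multicast), then send the data (effectively unicast)."""
    reservation = Reservation(dest_id, len(flits))
    others = [rx for rx in receivers if rx.router_id != sender_id]
    for rx in others:
        rx.on_reservation(reservation)
    for flit in flits:
        for rx in others:
            rx.on_data(flit)

receivers = [Receiver(i) for i in range(4)]
send_packet(0, 2, ["head", "body", "tail"], receivers)
print([rx.received for rx in receivers])
```

Because each sender owns its channel, no arbitration between senders is modeled; the reservation only tells receivers when to listen.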
17Router Microarchitecture
- Virtual-channel router
- Added optical link ports and extra buffer.
18Routing
(FIREFLY_dest)
- Routing
- Intra-cluster routing
- Traversing optical link
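The two routing steps listed here can be sketched as follows. The slide does not spell out the order of the steps, so the sketch takes it as a parameter rather than asserting which order FIREFLY_dest uses. The linear router numbering, the assembly rule (same local index across clusters), and the treatment of intra-cluster routing as a single step are all assumptions made for illustration.

```python
ROUTERS_PER_CLUSTER = 4

def split(router):
    """(cluster, local index) under a hypothetical linear numbering."""
    return divmod(router, ROUTERS_PER_CLUSTER)

def route(src, dst, optical_first):
    """Return the routers visited from src to dst, endpoints included."""
    (sc, sl), (dc, dl) = split(src), split(dst)
    path = [src]
    if sc == dc:
        if src != dst:
            path.append(dst)   # purely electrical, inside one cluster
        return path
    if optical_first:
        # Optical hop within the source router's assembly, then local routing.
        mid = dc * ROUTERS_PER_CLUSTER + sl
    else:
        # Local routing to the router in the destination's assembly, then optical.
        mid = sc * ROUTERS_PER_CLUSTER + dl
    if mid != path[-1]:
        path.append(mid)
    if dst != path[-1]:
        path.append(dst)
    return path

print(route(1, 14, optical_first=True))
print(route(1, 14, optical_first=False))
```

Either order uses exactly one optical traversal for inter-cluster traffic; they differ only in which cluster carries the electrical part of the route.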
19Firefly: another look
- Clusters
- Short electrical links
- Concentrated mesh
- Assemblies
- Long nanophotonic links
- Partitioned crossbars
- Benefits
- Traffic locality
- Reduced hardware
- Localized arbitration
- Distributed inter-cluster bandwidth
20Outline
- Motivation
- Architecture of Firefly
- Evaluation
- Conclusion
21Evaluation Setup
- Cycle-accurate simulator (Booksim)
- Firefly (hybrid) vs. CMESH (electrical), Dragonfly (hybrid) [Kim et al., ISCA08] and OP_XBAR (optical)
- Synthetic traffic patterns and traces
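For readers unfamiliar with the synthetic patterns named on the load/latency slide, here is a minimal sketch of bit-complement and uniform random destination selection. These are the textbook definitions, not code taken from Booksim, and excluding the source from the uniform pattern is an assumption of this sketch.

```python
import random

def bit_complement(src, num_nodes):
    """Destination is the bitwise complement of the source ID."""
    assert num_nodes & (num_nodes - 1) == 0, "num_nodes must be a power of two"
    return ~src & (num_nodes - 1)

def uniform_random(src, num_nodes, rng=random):
    """Destination drawn uniformly from all nodes except the source."""
    dst = rng.randrange(num_nodes - 1)
    return dst if dst < src else dst + 1

print(bit_complement(5, 64), uniform_random(5, 64))
```

Bit complement pairs each node with a far-away partner, which stresses the long-range links, while uniform random spreads load evenly.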
22Load / Latency Curve
- Figure panels: Bitcomp, 1-cycle; Uniform, 1-cycle
- Throughput
- Up to 4.8x over OP_XBAR
- At least 70% over Dragonfly
23Energy Breakdown
- Reduced hardware by partitioning
- Reduced heating
- Throughput impact
- Locality
- 34% energy reduction over OP_XBAR with locality
- Data path rings, 1 radix-64 crossbar: 1024K
- Data path rings, 8 radix-8 crossbars: 128K
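The two ring totals on this slide are consistent with a simple count of radix x radix x channel width per crossbar. Both that formula and the 256-bit width are assumptions: they are inferred because they reproduce the slide's numbers, and neither is stated on the slide.

```python
def data_path_rings(num_crossbars, radix, width_bits):
    """Assumed count: radix senders x radix receivers x width, per crossbar."""
    return num_crossbars * radix * radix * width_bits

WIDTH = 256  # assumption: inferred from the slide's totals, not stated on it

single = data_path_rings(1, 64, WIDTH)        # one radix-64 crossbar
partitioned = data_path_rings(8, 8, WIDTH)    # eight radix-8 crossbars
print(single // 1024, "K vs", partitioned // 1024, "K")
```

Under this count the ring total scales with the square of the radix, so splitting one radix-64 crossbar into eight radix-8 crossbars cuts the rings by a factor of 8.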
24Technology Sensitivity
- Figure panels: bitcomp; taper_L0.7D7
- α is the heating ratio and β is the laser ratio.
- Firefly favors traffic locality.
25Conclusion
- Technology impacts architecture
- New opportunities in nanophotonics
- Low latency, high bandwidth density
- Tailored architectures needed
- Firefly benefits from nanophotonics by providing
- Power Efficiency
- Hybrid signaling
- Partitioned R-SWMR crossbars → Reduced hardware/power
- Scalability
- Scalable inter-cluster bandwidth
- Low-radix routers/crossbars