Title: Enabling Technology for On-Chip Interconnection Networks
1 Enabling Technology for On-Chip Interconnection Networks
- William J. Dally, Computer Systems Laboratory, Stanford University
- NOCS-1
- May 7, 2007
2 Outline
- Off-Chip Networks
- Demand for On-Chip Networks
- What is Unique about On-Chip Networks
- Enabling Technologies
- Circuits - set the constraints
- Topology
- Micro-Architecture
- Some Open Problems
3 State of Off-Chip Networks
4 Technology Trends
BlackWidow
5 Some History
- Torus Routing Chip (1985)
- MARS Router (1984)
- MDP (1991)
- Network Design Frame (1988)
- Reliable Router (1994)
- YARC (2006)
- MAP (1998)
- Imagine (2002)
6 Some very good books
7 Summary of Off-Chip Networks
- Topology
- Fit to packaging and signaling technology
- High-radix: Clos or FlatBfly gives lowest cost
- Routing
- Global adaptive routing balances load without destroying locality
- Flow control
- Virtual channels/virtual cut-through
(oversimplified)
8 Urgent Demand for On-Chip Interconnection Networks (OCINs)
9 The Future is CMPs: OCINs are a Critical Component
[Figure: timeline with axis ticks from 2006 to 2015 in 1.5-year steps]
10 Example CMP OCIN
11 Growing Complexity of SoCs Demands an On-Chip Interconnection Network
Source: Avner Goren, TIEPF 2004
12 So, what's different about on-chip networks?
13 Cost, Channels, Workload are Different
- Cost
- Off-chip cost is channels: pins, connectors, cables, optics
- On-chip cost is Si area and power (storage and switches); wires are plentiful
- Drives networks with many long, wide channels and few buffers
- Channel Characteristics
- On-chip RC lines need a repeater every 1 mm (or less)
- Short distance, low latency
- Can put logic in repeaters, which motivates low-latency routers
- Workload
- CMP cache traffic
- SoC isochronous flows
- Design issues
- Floorplanning
- Different constraints motivate some surprising differences in design.
14 Enabling Technology is a Prerequisite
Channels, Buffers, Switches
Topology, Routing, Flow Control
Microarchitecture
15 Circuits Set Cost and Area Constraints for Architecture
- Can do substantially (10x-100x) better than default circuits
16 Channels
- 10x to 100x power reduction
- Equalized signaling for faster propagation and increased repeater distance (D&P Chapter 8, Heaton 01)
- Elastic channels provide free buffers (Mizuno 01)
- Send 4-8 bits per cycle per wire (assuming a 20 FO4 cycle)
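As a rough illustration of the last bullet, the sketch below converts bits per cycle per wire into the bit time in FO4 delays and into the number of wires needed to move one flit per cycle. The 128-bit flit width and the function names are assumptions for illustration, not figures from the talk.

```python
def bit_time_fo4(bits_per_cycle, cycle_fo4=20.0):
    """FO4 delays available per bit when one wire carries bits_per_cycle bits each cycle."""
    return cycle_fo4 / bits_per_cycle


def wires_per_flit(flit_bits, bits_per_cycle):
    """Wires needed to move one flit per cycle (ceiling division)."""
    return -(-flit_bits // bits_per_cycle)


if __name__ == "__main__":
    FLIT_BITS = 128  # hypothetical flit width, not from the slides
    for bits in (1, 4, 8):
        print(bits, bit_time_fo4(bits), wires_per_flit(FLIT_BITS, bits))
```

With a 20 FO4 cycle, 4 bits per cycle leaves 5 FO4 per bit and 8 bits per cycle leaves 2.5 FO4 per bit.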
17 Buffers
- Dense arrays (vs. flip-flops or latches)
- 1/10 area/bit
- 1/10 power for low-swing read
- Low-swing write: 1/10 power for writes
- Low-swing read: can keep swing low through muxes
18 Switches
- Low-swing bit lines
- Operate at channel rate
- Reduces area and hence power
- Equalized drive
- Buffered crosspoints
- Integral allocation
19 Circuits Impact Architecture
- With standard-cell approach
- Power is approximately evenly split between channels, buffers, and routers
- With efficient circuits
- Channels 1/30, buffers 1/3
- Routers dominate
- Routing >> Buffering >> Propagating
- Motivates topologies with fewer hops, longer channels
- Just propagate bits: avoid buffering, really avoid routing
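A small worked calculation makes the "routers dominate" claim concrete. It assumes one reading of the slide: the standard-cell design splits power exactly evenly, efficient circuits scale channel power by 1/30 and buffer power by 1/3, and router power is unchanged. The function and constant names are hypothetical.

```python
def power_shares(scale):
    """Return (share of power per component, total power relative to standard cell)."""
    # Assumed baseline: an exactly even three-way split.
    baseline = {"channels": 1 / 3, "buffers": 1 / 3, "routers": 1 / 3}
    scaled = {name: p * scale.get(name, 1.0) for name, p in baseline.items()}
    total = sum(scaled.values())
    shares = {name: p / total for name, p in scaled.items()}
    return shares, total


# Assumed interpretation of "Channels 1/30, buffers 1/3"; routers left unscaled.
EFFICIENT = {"channels": 1 / 30, "buffers": 1 / 3}

if __name__ == "__main__":
    shares, total = power_shares(EFFICIENT)
    for name, share in shares.items():
        print(f"{name}: {share:.1%}")
    print(f"total relative to standard cell: {total:.2f}")
```

Under these assumptions routers account for about 73% of the remaining power, buffers about 24%, and channels about 2%.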
20 Properties of these elements drive optimal network organization
21 On-Chip Interconnection Network
[Figure labels: System, Processor Tiles]
Source: Balfour and Dally, ICS 06
22 On-Chip Interconnection Network (2)
[Figure labels: System, Processor Tiles, Channels]
Source: Balfour and Dally, ICS 06
23 Interconnection Network (3)
[Figure labels: System, Processor Tiles, Channels, Routers]
Source: Balfour and Dally, ICS 06
24 Router Architecture
- Input-queued
- Virtual Channel
- Speculative Pipeline
Source: Balfour and Dally, ICS 06
25 Router Area
Accurate modeling requires floorplan
Source: Balfour and Dally, ICS 06
26 Torus
Source: Balfour and Dally, ICS 06
27 Concentrated Mesh
Source: Balfour and Dally, ICS 06
28 Express Links
Source: Balfour and Dally, ICS 06
29 Network Replication
- Abundant wire resources: build a second network
- Resource allocation tradeoff
- Wide
- Serialization latency
- Router energy efficiency
- (-) Router area
- Replicated
- Decoupled resources
- Area efficiency
- (?) Energy efficiency
- (-) Serialization latency
- Scalable
Source: Balfour and Dally, ICS 06
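The wide-versus-replicated tradeoff can be sketched with two simple models. Both are assumptions for illustration, not taken from the slides: serialization latency is the packet length divided by the channel width, rounded up, and crossbar area grows with the square of (ports x channel width). The 512-bit packet, 5-port router, and channel widths are hypothetical.

```python
import math


def serialization_cycles(packet_bits, channel_bits):
    """Cycles needed to push one packet across a channel of the given width."""
    return math.ceil(packet_bits / channel_bits)


def crossbar_area(ports, channel_bits, networks=1):
    """Relative crossbar area, assuming area ~ (ports * channel_bits) ** 2 per network."""
    return networks * (ports * channel_bits) ** 2


if __name__ == "__main__":
    PACKET_BITS, PORTS = 512, 5  # hypothetical values
    wide = (serialization_cycles(PACKET_BITS, 256), crossbar_area(PORTS, 256))
    replicated = (serialization_cycles(PACKET_BITS, 128),
                  crossbar_area(PORTS, 128, networks=2))
    print("one 256-bit network :", wide)
    print("two 128-bit networks:", replicated)
```

With the same total wire count, the single wide network halves serialization latency, while under this area model the two narrow networks need half the crossbar area.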
30 Energy Efficiency
Network Energy Completion Time (normalized to Torus network)
Source: Balfour and Dally, ICS 06
31 Large differences in efficiency. Optimal topology not obvious, not regular, and very sensitive to properties of network elements
32 Where is Energy Expended?
Source: Balfour and Dally, ICS 06
33 On-Chip Flattened Butterfly
[Figure panels: Conventional 2D Mesh; 2D Flattened Butterfly]
Source: Kim and Dally, to appear
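To see why a flattened butterfly lowers hop count relative to a mesh, the sketch below compares average router-to-router hops on a k x k grid. It assumes minimal routing, a 2D flattened butterfly in which each router links directly to every other router in its row and column, and it ignores concentration. The function names are hypothetical.

```python
from itertools import product


def mesh_hops(src, dst):
    """Minimal hops in a 2D mesh: Manhattan distance."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def flatbfly_hops(src, dst):
    """Minimal hops in a 2D flattened butterfly: one hop per differing dimension."""
    return int(src[0] != dst[0]) + int(src[1] != dst[1])


def average_hops(k, hops):
    """Average hop count over all ordered pairs of distinct routers on a k x k grid."""
    nodes = list(product(range(k), repeat=2))
    pairs = [(s, d) for s in nodes for d in nodes if s != d]
    return sum(hops(s, d) for s, d in pairs) / len(pairs)


if __name__ == "__main__":
    for k in (4, 8):
        print(k, average_hops(k, mesh_hops), average_hops(k, flatbfly_hops))
```

The flattened butterfly never needs more than two router hops in this model, whereas the mesh's worst case grows as 2(k - 1).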
34 On-Chip Flattened Butterfly
[Figure: Layout Mapping, dimension 1 and dimension 2]
Source: Kim and Dally, to appear
35 Bypass Channels
[Figure panels: Conventional Flattened Butterfly; Flattened Butterfly with Bypass Channels. Label: connected to local router]
Source: Kim and Dally, to appear
36 Latency Comparison
Source: Kim and Dally, to appear
37 Power Comparison
Source: Kim and Dally, to appear
38 Flow Control
- Trade channel bandwidth (cheap) for buffer space (expensive)
- Make buffers shallow
- Compensate for lower duty factor by overprovisioning channels
- Little cost in energy
- Circuit switching (no buffers)
- Elastic buffers: use free buffers in the channels
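The trade between buffer depth and channel bandwidth can be sketched numerically. The model below assumes credit-based flow control with a single buffer per channel, which the slides do not specify: a sender with B flit buffers downstream and a credit round trip of T cycles can keep the channel busy at most B/T of the time. The function names are hypothetical.

```python
def duty_factor(buffer_flits, credit_round_trip_cycles):
    """Largest fraction of cycles the channel can be busy under the assumed credit model."""
    return min(1.0, buffer_flits / credit_round_trip_cycles)


def channel_overprovision(buffer_flits, credit_round_trip_cycles):
    """Factor by which channel bandwidth must grow to keep the same delivered bandwidth."""
    return 1.0 / duty_factor(buffer_flits, credit_round_trip_cycles)


if __name__ == "__main__":
    ROUND_TRIP = 8  # hypothetical credit round trip, in cycles
    for depth in (8, 4, 2, 1):
        print(depth, duty_factor(depth, ROUND_TRIP),
              channel_overprovision(depth, ROUND_TRIP))
```

In this model, cutting the buffer from 8 flits to 2 with an 8-cycle round trip drops the duty factor to 0.25, so the channel would need to be 4x wider to deliver the same bandwidth.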
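39 Flow Control in an On-Chip FlatBfly
40 View as Two Buffered Links
[Figure: nodes S, X, D]
41 Channels Have Repeaters
[Figure: nodes S, X, D]
42 Buffers Decouple Channel Allocation in Time
[Figure: two panels, each with nodes S, X, D]
43 Circuit Switching
[Figure: two panels, each with nodes S, X, D]
44 With Elastic Buffers
[Figure: two panels, each with nodes S, X, D]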
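One way to picture elastic buffering is a repeated channel in which every repeater stage can hold a flit when the stage ahead is blocked. The toy cycle-level model below is a sketch under that assumption (one flit per stage, an idealized ready/valid handshake). It is not the circuit from the talk, and the function names are hypothetical.

```python
def step(stages, incoming, sink_ready):
    """Advance the channel by one cycle.

    stages[i] holds a flit or None; stages[0] is nearest the source.
    Returns (new_stages, delivered_flit_or_None, accepted_incoming).
    """
    new = list(stages)
    delivered = None
    if new[-1] is not None and sink_ready:
        delivered, new[-1] = new[-1], None
    # Walk from the sink end so a slot freed this cycle can be refilled.
    for i in range(len(new) - 2, -1, -1):
        if new[i] is not None and new[i + 1] is None:
            new[i + 1], new[i] = new[i], None
    accepted = incoming is not None and new[0] is None
    if accepted:
        new[0] = incoming
    return new, delivered, accepted


def run_stalled(num_stages, offered, cycles):
    """Offer flits for some cycles while the sink is stalled."""
    stages = [None] * num_stages
    pending = list(offered)
    accepted_count = 0
    for _ in range(cycles):
        incoming = pending[0] if pending else None
        stages, _, ok = step(stages, incoming, sink_ready=False)
        if ok:
            pending.pop(0)
            accepted_count += 1
    return stages, accepted_count


if __name__ == "__main__":
    print(run_stalled(num_stages=4, offered=range(10), cycles=8))
```

In this model a stalled four-stage channel absorbs four flits before it pushes back on the source, so the channel itself provides the storage.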
45 Research Directions
46 NSF Workshop Identified 3 Critical Issues
- Power
- OCINs will have 10x the required power with current approaches
- Circuit and architecture innovations can close this gap
- Latency
- OCIN latency currently not competitive with buses and dedicated wiring
- Lower diameter topologies
- Novel flow-control strategies required
- Tool Integration
- OCINs need to be integrated with standard tool flows to enable widespread use
- See http://www.ece.ucdavis.edu/ocin06/
47 A Research Agenda
1. Develop efficient network elements
- Channels, buffers, switches, allocators
- Opportunities for 10x-100x improvements in efficiency
- Enabling technology
2. Capture workloads representative of CMPs and SoCs
3. Develop optimal topologies for 1 and 2
4. Develop efficient routing and flow-control methods
- Load-balanced routing
- Buffer-efficient flow control
5. Develop efficient router microarchitectures
- Single cycle, area efficient
6. Prototype to test assumptions
7. Iterate
48 Some Specific Topics
- Efficient network elements: enabling building blocks
- Low diameter topologies with minimum cost (area and power)
- Flow control that allows packets to pass with elastic buffers
- Low-latency router microarchitecture with high radix
49 Summary
- OCINs critically important
- Vital component of CMPs, SoCs
- Less mature technology than other components
- Very different than off-chip networks
- Cost, Channels, Workloads, Design Issues
- Efficient network elements are enabling technology
- Energy and area efficient channels, buffers, switches
- Change the equation for network design: channels << buffers << routers
- Topology
- Minimize diameter
- Concentrated Mesh with Express Channels
- Flattened Butterfly
- Flow Control
- Minimize buffers at switch points
- Use elastic buffers to minimize latency
- Many research opportunities