High Performance Routing

1
High Performance Routing
Nick McKeown
Assistant Professor of Electrical Engineering and Computer Science, Stanford University
Abrizio/PMC-Sierra Inc.
nickm@stanford.edu
http://www.stanford.edu/nickm
2
Outline
  • Review: What is a Router?
  • The Evolution of Routers
  • Single-stage switching: The Fork-Join Router

3
Outline
  • Switching is the bottleneck in a router.
  • The trend has been to overcome limitations in
    memory bandwidth:
  • Shared memory -> single-stage, crossbar-based,
    combined input and output queued (CIOQ).
  • ... and to reduce power per rack and per system:
  • Single-box systems -> multi-rack systems (LCS).

4
Outline (2)
  • What comes next?
  • Multistage switches solve the wrong problem
  • $N^2$ (the number of crosspoints) is not the problem.
  • Multistage switches are more blocking, more
    power-hungry and less predictable.
  • Parallel single-stage switches (e.g. the
    Fork-Join Router) are non-blocking, use less
    power, can achieve as high capacity, and can be
    predictable.

5
Outline
  • Review: What is a Router?
  • The Evolution of Routers
  • Single-stage switching: The Fork-Join Router

6
Basic Architectural Components
  • Control Plane: Routing Protocols, Routing Table
  • Datapath (per-packet processing): Forwarding Table, Switching
7
Basic Architectural Components: Datapath (per-packet processing)
  1. Ingress
  2. Interconnect
  3. Egress
[Figure: the forwarding path is drawn three times, each with a Forwarding Table, a Classifier Table, Policing & Access Control, and a Forwarding Decision.]
8
Outline
  • Review: What is a Router?
  • The Evolution of Routers
  • Single-stage switching: The Fork-Join Router

9
First Generation Routers
Fixed-length DMA blocks or cells, reassembled on the egress linecard
Shared Backplane
Line Interface
Fixed-length cells or variable-length packets
Typically <0.5 Gb/s aggregate capacity
10
First Generation Routers: Queueing Structure
Shared Memory
  • A large body of work has proven and made possible:
  • Fairness
  • Delay Guarantees
  • Delay Variation Control
  • Loss Guarantees
  • Statistical Guarantees

[Figure: Inputs 1 to N and Outputs 1 to N share one memory.]
Large, single, dynamically allocated memory buffer:
N writes per cell time and N reads per cell time.
Limited by memory bandwidth.
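
The memory-bandwidth limit on this slide can be made concrete with a little arithmetic. The following is a minimal sketch, assuming N ports that each run at the same line rate R; the function name and the example figures are illustrative and do not come from the talk. N writes plus N reads per cell time means the shared buffer has to sustain 2NR.

```python
def shared_memory_bandwidth_gbps(num_ports: int, line_rate_gbps: float) -> float:
    """Bandwidth a single shared buffer must sustain.

    Every cell time the memory absorbs one write per input and one read
    per output, i.e. N writes + N reads, each at the line rate.
    """
    writes = num_ports * line_rate_gbps
    reads = num_ports * line_rate_gbps
    return writes + reads


if __name__ == "__main__":
    # Hypothetical example: 16 ports at 2.5 Gb/s each.
    print(shared_memory_bandwidth_gbps(16, 2.5))  # 80.0
```

Doubling either the port count or the line rate doubles the demand on the one memory, which is why the shared-memory design stops scaling first.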
11
Second Generation Routers
[Figure: a CPU with Route Table and Buffer Memory, plus Line Cards that each hold Buffer Memory, a Fwding Cache and a MAC. Annotations: Drop Policy or Backpressure, Drop Policy, Output Link Scheduling.]
Typically <5 Gb/s aggregate capacity
12
Second Generation Routers: As caching became ineffective
[Figure labels: Exception Processor, CPU, Route Table; Line Cards, each with Buffer Memory, a Fwding Table and a MAC.]
13
Second Generation Routers: Queueing Structure
Combined Input and Output Queueing
Bus
14
Third Generation Routers
[Figure: Line Cards and a CPU Card joined by a Switched Backplane. Labels: Line Interface, Local Buffer Memory, Fwding Table, MAC; CPU, Routing Table, Memory.]
Typically <50 Gb/s aggregate capacity
15
Third Generation Routers: Queueing Structure
Switch
16
Third Generation Routers
  • Size-constrained: 19" or 23" wide.
  • Power-constrained: <6 kW.
  • QoS unfriendly: input congestion.

[Figure: a rack 7' tall and 19" or 23" wide. Supply: 100A/200A maximum at 48V.]
17
Fourth Generation Routers/Switches
Optical links
Switch Core
Linecards
18
Fourth Generation Routers/Switches: The LCS Protocol
  • What is LCS?
  • Credit-based flow control enables separation.
  • Label-based multicast enables scaling.
  • Its Benefits
  • Large number of ports: separation enables a large
    number of ports in multiple racks.
  • Minimizes switch core complexity and power: the
    switch core can be bufferless and lossless. QoS,
    discard etc. are performed on the linecard.
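
To illustrate why credit-based flow control lets a link stay lossless across a physical separation, here is a minimal sketch of the general technique. It is not the LCS wire protocol: the class and method names are hypothetical, and credit-return latency over the link is ignored. The sender holds one credit per free buffer slot at the receiver and transmits only while it has credits, so the receiver can never overflow.

```python
from collections import deque


class CreditedLink:
    """Sender and receiver joined by a credit-based, lossless link (sketch)."""

    def __init__(self, receiver_slots: int):
        self.capacity = receiver_slots
        self.credits = receiver_slots      # one credit per free receiver slot
        self.receiver_buffer = deque()

    def try_send(self, cell) -> bool:
        """Send only if a credit is available; otherwise the cell waits upstream."""
        if self.credits == 0:
            return False                   # sender holds the cell, nothing is dropped
        self.credits -= 1
        self.receiver_buffer.append(cell)
        return True

    def receiver_drain(self):
        """Receiver forwards one cell and hands a credit back to the sender."""
        if not self.receiver_buffer:
            return None
        cell = self.receiver_buffer.popleft()
        self.credits += 1
        return cell


if __name__ == "__main__":
    link = CreditedLink(receiver_slots=2)
    print([link.try_send(c) for c in "abc"])  # third send is refused, not lost
    print(link.receiver_drain())
    print(link.try_send("c"))
```

Because the decision to hold a cell is made at the sender, any discard policy can live with the queues on the sending side, which matches the slide's point that QoS and discard are performed on the linecard.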

19
Fourth Generation Routers/Switches: Queueing Structure
[Figure: Linecards with Virtual Output Queues (1 write per cell time, 1 read per cell time), Lookup & Drop Policy, and Output Scheduling, attached to a bufferless Switch Core containing the Switch Fabric and Switch Arbitration.]
Typically <5 Tb/s aggregate capacity
20
Myths about CIOQ-based crossbar switches
  • 1. Input-queued crossbars have low throughput
  • An input-queued crossbar can have as high
    throughput as any switch.
  • 2. Crossbars don't support multicast traffic well
  • A crossbar inherently supports multicast
    efficiently.
  • 3. Crossbars don't scale well
  • Today, it is the number of chip I/Os, not the
    number of crosspoints, that limits the size of a
    switch fabric. Expect 5 Tb/s crossbar switches.

21
Myths about CIOQ-based crossbar switches (2)
  • 4. Crossbar switches can't support delay/QoS
    guarantees
  • With an internal speedup of 2, a CIOQ switch can
    precisely emulate a shared memory switch for all
    traffic.

22
What makes sense today?
23
What makes sense tomorrow?
  • Single-stage (if possible)
  • Reduces complexity
  • Minimizes interconnect b/w
  • Minimizes power

24
Outline
  • Review: What is a Router?
  • The Evolution of Routers
  • Single-stage switching: The Fork-Join Router

25
Buffer Memory: How Fast Can I Make a Packet Buffer?
[Figure: a Buffer Memory built from 5ns SRAM, with two 64-byte wide buses.]
  • Rough Estimate
  • 5ns per memory operation.
  • Two memory operations per packet.
  • Therefore, maximum 51.2Gb/s.
  • In practice, closer to 40Gb/s.
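
The 51.2 Gb/s figure follows from the bus width and access time shown on this slide: each packet needs one write and one read, so a 64-byte (512-bit) transfer completes every 10 ns. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def packet_buffer_rate_gbps(bus_width_bytes: int,
                            access_time_ns: float,
                            ops_per_packet: int = 2) -> float:
    """Upper bound on packet-buffer throughput.

    Each packet costs `ops_per_packet` memory operations (one write, one
    read), and each operation moves one bus-width of data.
    """
    bits_per_access = bus_width_bytes * 8
    time_per_packet_ns = access_time_ns * ops_per_packet
    return bits_per_access / time_per_packet_ns  # bits per ns equals Gb/s


if __name__ == "__main__":
    print(packet_buffer_rate_gbps(bus_width_bytes=64, access_time_ns=5))  # 51.2
```

The bound ignores bus turnaround and refresh-style overheads, which is one plausible reason the slide quotes a practical figure closer to 40 Gb/s.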

26
Buffer Memory: Is It Going to Get Better?
[Figure: Specmarks, memory size and gate density plotted against time.]
27
Fork-Join Router (Sponsored by NSF and ITRI)
  • How can we:
  • Increase capacity? -> Increase parallelism.
  • Reduce power per subsystem? -> Multiple racks.
  • While at the same time:
  • Keep the system simple? -> Single-stage buffering.
  • Support line rates faster than memory bandwidth? -> Pkt-by-pkt load balancing.
  • Support guaranteed services? -> Hmmm...?

28
The Fork-Join Router
[Figure labels: ports 1 to N, each at rate R on input and output; Router 1, 2, ..., k in parallel; Bufferless.]
29
The Fork-Join Router
  • Advantages
  • Single stage of buffering
  • As k increases, power per subsystem decreases
  • As k increases, memory bandwidth decreases
  • As k increases, forwarding table lookup rate decreases

30
The Fork-Join Router
  • Questions
  • Switching: What is the performance?
  • Forwarding Lookups: How do they work?

31
A Parallel Packet Switch
Arriving packet tagged with egress port
[Figure: ports 1 to N at rate R in and out, spread over k parallel Output Queued Switches (1, 2, ..., k).]
32
Performance Questions
  1. Can it be work-conserving?
  2. Can it emulate a single big output queued switch?
  3. Can it support delay guarantees,
    strict-priorities, WFQ, ?

33
Work Conservation
[Figure: an input and an output at rate R, each attached by links of rate R/k to k Output Queued Switches (1, 2, ..., k). Annotations: Input Link Constraint, Output Link Constraint.]
34
Work Conservation
[Figure: numbered cells (1 to 5) spread over layers 1 to k on links of rate R/k, with input and output at rate R. Annotation: Output Link Constraint.]
35
Work Conservation
[Figure: ports 1 to N at rate R in and out; the internal links to the k Output Queued Switches run at S(R/k).]
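
One way to read the input link constraint is that an internal link running at S(R/k) stays busy for ceil(k/S) external cell times after it carries a cell, so the demultiplexer may only choose among layers whose link is currently free. The sketch below assumes that reading. The class name, the slotted-time model and the "lowest free layer" choice are illustrative assumptions, not the dispatch algorithm from the talk.

```python
import math


class Demultiplexer:
    """One input port of a parallel packet switch (illustrative sketch)."""

    def __init__(self, k: int, speedup: float = 1.0):
        self.k = k
        # A link at S*(R/k) needs ceil(k/S) external cell times per cell.
        self.busy_slots = math.ceil(k / speedup)
        self.free_at = [0] * k  # first time slot at which each link is free

    def available_layers(self, now: int) -> list:
        """Layers whose internal link can accept a cell in slot `now`."""
        return [layer for layer in range(self.k) if self.free_at[layer] <= now]

    def dispatch(self, now: int, choose=min) -> int:
        """Send the cell arriving in slot `now` to one free layer."""
        candidates = self.available_layers(now)
        if not candidates:
            raise RuntimeError("input link constraint violated")
        layer = choose(candidates)
        self.free_at[layer] = now + self.busy_slots
        return layer


if __name__ == "__main__":
    demux = Demultiplexer(k=4, speedup=2)
    print([demux.dispatch(t) for t in range(6)])  # [0, 1, 0, 1, 0, 1]
```

With S = 1 every link is busy for k slots and the only legal pattern is a strict rotation over all k layers. Raising S frees links sooner and gives the demultiplexer a choice of layers, which is the freedom a real dispatch algorithm would need in order to also respect the output link constraint.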
36
Precise Emulation of an Output Queued Switch
[Figure: an Output Queued Switch with ports 1 to N.]
37
Parallel Packet Switch: Theorems
  • 1. If $S > 2k/(k+2) \approx 2$ then a parallel packet switch
    can be work-conserving for all traffic.
  • 2. If $S > 2k/(k+2) \approx 2$ then a parallel packet switch
    can precisely emulate an FCFS output-queued switch
    for all traffic.

38
Parallel Packet Switch: Theorems
  • 3. If $S > 3k/(k+3) \approx 3$ then a parallel packet
    switch can precisely emulate a switch with WFQ,
    strict priorities, and other types of QoS, for
    all traffic.
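
The speedup thresholds in theorems 1 to 3, read here as 2k/(k+2) and 3k/(k+3), depend on the number of layers k and approach 2 and 3 from below as k grows. A small sketch that tabulates them exactly; the function names are hypothetical:

```python
from fractions import Fraction


def fcfs_speedup_bound(k: int) -> Fraction:
    """Threshold 2k/(k+2) for work conservation and FCFS emulation."""
    return Fraction(2 * k, k + 2)


def qos_speedup_bound(k: int) -> Fraction:
    """Threshold 3k/(k+3) for emulating WFQ, strict priorities, etc."""
    return Fraction(3 * k, k + 3)


if __name__ == "__main__":
    for k in (2, 4, 10, 100, 1000):
        print(k, float(fcfs_speedup_bound(k)), float(qos_speedup_bound(k)))
```

For small k the thresholds are well below their limits (for k = 2 the first is exactly 1), so quoting them as roughly 2 and 3 is the conservative summary for large systems.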

39
Parallel Packet Switch: Theorems
  • 4. If $S > 2$ then a parallel packet switch with a
    small co-ordination buffer at rate R can
    precisely emulate a switch with WFQ, strict
    priorities, and other types of QoS, for all
    traffic.

40
The Fork-Join Router
  • Questions
  • Switching: What is the performance?
  • Forwarding Lookups: How do they work?

41
The Fork-Join Router: Lookahead Forwarding Table Lookups
Packet tagged with egress port at next router
Lookup performed in parallel at rate R/k
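
One possible reading of lookahead lookups: a packet arrives already tagged with its egress port for the current router, so it can be forked to a layer without any lookup at rate R, and the layer (running at R/k) computes the tag the packet will need at the next router. The sketch below assumes that reading. The `Packet` type, the `links` and `tables` dictionaries and the exact-match lookup are all hypothetical simplifications; a real router would use longest-prefix matching.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dst: str
    egress_tag: int  # egress port at the router that currently holds the packet


def layer_forward(packet: Packet, router: str, links: dict, tables: dict):
    """Runs inside one of the k layers, so lookups arrive at rate R/k.

    links[router][port] names the neighbour reached through that port.
    tables[router][dst] is that router's egress port for dst.
    """
    out_port = packet.egress_tag           # already known: no lookup at rate R
    next_router = links[router][out_port]
    # Lookahead: re-tag the packet with the port it will need at the next router.
    packet.egress_tag = tables[next_router][packet.dst]
    return out_port, next_router


if __name__ == "__main__":
    links = {"A": {0: "B"}, "B": {1: "C"}}
    tables = {"B": {"10.0.0.1": 1}, "C": {"10.0.0.1": 0}}
    pkt = Packet(dst="10.0.0.1", egress_tag=0)
    print(layer_forward(pkt, "A", links, tables), pkt.egress_tag)
```

The cost of this arrangement is that each router must hold its neighbours' forwarding information, which the sketch models simply as a shared `tables` dictionary.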
42
The Fork-Join Router
[Figure labels: ports 1 to N, each at rate R on input and output; Router 1, 2, ..., k in parallel.]
Expect >50 Tb/s aggregate capacity
43
Conclusions
  • The main problems are power (supply and
    dissipation) and memory bandwidth.
  • Multi-stage switches solve the wrong problem.
  • Single-stage switches are here to stay.
  • Very high capacity single-stage electronic
    routers are feasible.