Title: Packet Scheduling/Arbitration in Virtual Output Queues and Others
1Packet Scheduling/Arbitration in Virtual Output Queues and Others
2Key Characteristics in Designing Internet
Switches and Routers
- Scalability in terms of line rates
- Scalability in terms of number of interfaces
(port numbers)
3Switch/Router Architecture Comparison
http://www.lightreading.com/document.asp?doc_id=47959
4Head-of-Line Blocking
Blocked!
Blocked!
7Crossbar Switches: Virtual Output Queues
- Virtual Output Queues
- At each input port, there are N queues, each associated with an output port
- Only one packet can go from an input port at a time
- Only one packet can be received by an output port at a time
- It retains the scalability of FIFO input-queued switches (no memory bandwidth problem)
- It eliminates the HoL blocking problem of FIFO input queues
8Virtual Output Queues
9VOQs How Packets Move
VOQs
Scheduler
10Crossbar Scheduler in VOQ Architecture
11Question: do more lanes help?
- Answer: it depends on the scheduling
Head-of-Line Blocking
VOQs with Bad Scheduling
Good Scheduling? Like the Ayalon highway, it depends on the traffic matrix
12Crossbar Scheduler in VOQ Architecture
Which packets can I send during each configuration of the crossbar?
13Switch core architecture
Port 1
Scheduler (Like the Processor of A Computer)
Port 256
14Basic Switch Model
[Figure: N x N switch model at time slot n. A_i(n): arrivals at input i; A_ij(n): arrivals at input i destined for output j; L_ij(n): occupancy of VOQ(i,j); S(n): the crossbar schedule (matching); D_j(n): departures at output j.]
15Some definitions
3. Queue occupancies
16Some possible performance goals
When traffic is admissible
17VOQ Switch Scheduling
- The VOQ switch scheduling can be represented by a
bipartite graph - The left-hand side nodes of the bipartite graph
are the input ports - The right-hand side nodes of the bipartite graph
are the output ports - The edges between the nodes are requests for
packet transmission between input ports and
output ports.
[Figure: bipartite request graph with inputs A-F on the left and outputs 1-6 on the right; edges are transmission requests.]
18Maximum size bipartite match
- Intuition: maximizes instantaneous throughput
[Figure: request graph (an edge from input i to output j is present when L_ij(n) > 0) and the resulting maximum size bipartite match.]
19Network flows and bipartite matching
[Figure: the same bipartite request graph (inputs A-F, outputs 1-6) with a source s connected to every input and every output connected to a sink t; all edge capacities are 1.]
- Finding a maximum size bipartite matching is
equivalent to solving a network flow problem with
capacities and flows of size 1.
20Network Flows
[Figure: a small directed flow network with source s, sink t, and intermediate nodes a, b, c, d.]
- Let G = (V, E) be a directed graph with capacity cap(v,w) on edge (v,w).
- A flow is an (integer) function f chosen for each edge so that f(v,w) <= cap(v,w).
- We wish to maximize the total flow from s to t.
21A maximum network flow example: by inspection
Step 1
[Figure: the example network with source s, sink t, and intermediate nodes a, b, c, d; an initial flow is found by inspection.]
22A maximum network flow example
Step 2
[Figure: the same network with "capacity, flow" labels: 10,10 on three edges, 10,1 on two edges, and 1,1 on one edge.]
Flow is of size 10 + 1 = 11
23Ford-Fulkerson method of augmenting paths
- Set f(v,w) = -f(w,v) on all edges.
- Define a residual graph, R, in which res(v,w) = cap(v,w) - f(v,w).
- Find paths from s to t for which there is positive residue.
- Augment the flow along each such path by the minimum residue along the path.
- Keep augmenting paths until there are no more to augment (a minimal code sketch follows).
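For concreteness, here is a minimal Python sketch of the loop above (Ford-Fulkerson with a breadth-first augmenting-path search). The graph, its capacities, and the function name are illustrative, not the slide's example.

```python
from collections import defaultdict, deque

def max_flow(cap, s, t):
    """Ford-Fulkerson: repeatedly find an augmenting path in the residual
    graph and push the bottleneck residue along it.  cap maps a directed
    edge (v, w) to its capacity."""
    flow = defaultdict(int)                      # f(v, w), with f(w, v) = -f(v, w)
    nodes = {v for edge in cap for v in edge}

    def residual(v, w):
        return cap.get((v, w), 0) - flow[(v, w)]

    def augmenting_path():
        parent = {s: None}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in nodes:
                if w not in parent and residual(v, w) > 0:
                    parent[w] = v
                    if w == t:                   # found an s -> t path; rebuild it
                        path, node = [], t
                        while parent[node] is not None:
                            path.append((parent[node], node))
                            node = parent[node]
                        return list(reversed(path))
                    queue.append(w)
        return None                              # no augmenting path remains

    total, path = 0, augmenting_path()
    while path:
        bottleneck = min(residual(v, w) for v, w in path)
        for v, w in path:
            flow[(v, w)] += bottleneck           # push flow forward
            flow[(w, v)] -= bottleneck           # keep skew symmetry
        total += bottleneck
        path = augmenting_path()
    return total

# Illustrative graph (not the slide's figure); the maximum flow here is 15.
capacities = {('s', 'a'): 10, ('s', 'b'): 5, ('a', 'b'): 15,
              ('a', 't'): 5, ('b', 't'): 10}
print(max_flow(capacities, 's', 't'))
```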
24Example of Residual Graph
[Figure: a flow of size 10 on the example network, and its residual graph R, where res(v,w) = cap(v,w) - f(v,w); an augmenting path in R is highlighted.]
25Example of Residual Graph
[Figure: same as the previous slide, with the augmenting path through the residual graph highlighted.]
26Example of Residual Graph
Step 2
[Figure: flow of size 10 + 1 = 11 and the corresponding residual graph, with the next augmenting path highlighted.]
27Example of Residual Graph
Step 3
[Figure: flow of size 10 + 2 = 12 and the corresponding residual graph.]
28Another example: Ford-Fulkerson method
Find augmenting path p; |f| = 0
[Figure: flow network G with source s, sink t, and intermediate nodes a, b, c, d, alongside its residual graph Gf.]
29Another example: Ford-Fulkerson method
Find augmenting path p; |f| = 4
[Figure: G with the current flow shown as "flow/capacity" labels (4/13, 4/4, 4/11), and the updated residual graph Gf.]
30Another example: Ford-Fulkerson method
Find augmenting path p; |f| = 16
[Figure: G with the current flow (12/16, 12/12, 12/20, 4/13, 4/4, 4/11) and the updated residual graph Gf.]
31Another example: Ford-Fulkerson method
Find augmenting path p; |f| = 23
[Figure: the final flow (12/16, 12/12, 19/20, 7/7, 11/13, 4/4, 11/11) and the residual graph Gf.]
No more augmenting paths.
Maximum flow is 23.
32An example for flow: obvious solution
Total flow = 10: a sub-optimal solution!
33Flow algorithm: optimal version
Total flow = 10 + 9 = 19 units!
34Complexity of network flow problems
- In general, a solution can be found by considering at most |V|·|E| augmenting paths, by picking the shortest augmenting path first.
- There are many variations, such as picking the most-augmenting path first.
- The complexity is lower when the graph is bipartite.
- There are techniques other than the Ford-Fulkerson method.
35Ford - Fulkerson Algorithm 1
Network flows and bipartite matching
Finding a maximum size bipartite matching is
equivalent to solving a network flow problem with
capacities and flows of size 1.
[Figure: bipartite graph with inputs a-f, outputs 1-6, a source connected to every input and every output connected to a sink; all capacities are 1.]
36Ford - Fulkerson Algorithm 2
Increasing the flow by 1.
[Figure: the bipartite flow network after this augmentation.]
37Ford - Fulkerson Algorithm 3
Increasing the flow by 1.
[Figure: the bipartite flow network after this augmentation.]
38Ford - Fulkerson Algorithm 4
Increasing the flow by 1.
[Figure: the bipartite flow network after this augmentation.]
39Ford - Fulkerson Algorithm 5
Increasing the flow by 1.
[Figure: the bipartite flow network after this augmentation.]
40Ford - Fulkerson Algorithm 6
Increasing the flow by 1.
[Figure: the bipartite flow network after this augmentation.]
41Ford - Fulkerson Algorithm 7
Augmenting flow along the augmenting path.
[Figure: the flow is augmented along the highlighted augmenting path.]
42Ford - Fulkerson Algorithm 8
Maximum flow found! Thus maximum matching found.
[Figure: the final maximum matching between inputs a-f and outputs 1-6.]
43Complexity of Maximum Matchings
- Maximum Size/Cardinality Matchings
- Algorithm by Dinic: O(N^(5/2))
- Maximum Weight Matchings
- Algorithm by Kuhn: O(N^3 log N)
- ftp://dimacs.rutgers.edu/pub/netflow/matching/ (contains code for maximum size/weight matching algorithms)
- In general
- Hard to implement in hardware
- Slooooow.
44Maximum size bipartite match
- Intuition: maximizes instantaneous throughput for uniform traffic.
[Figure: request graph (an edge is present when L_ij(n) > 0) and the resulting maximum size bipartite match.]
45Why doesn't maximizing instantaneous throughput give 100% throughput for non-uniform traffic?
Three possible matches, S(n)
46Maximum weight matching
- Weight could be the length of the queue or the age of the packet
- Achieves 100% throughput under all admissible traffic patterns
[Figure: switch model in which the VOQ occupancies L_ij(n) are used as edge weights; the request graph becomes a weighted bipartite graph and the schedule S(n) is a maximum weight match.]
47Packet Scheduling/Arbitration in Virtual Output Queues: Maximal Matching Algorithms
48Maximum Matching in VOQ Architecture
49Complexity of Maximum Matchings
- Maximum Size/Cardinality Matchings
- Algorithm by Dinic: O(N^(5/2))
- Maximum Weight Matchings
- Algorithm by Kuhn: O(N^3 log N)
- In general
- Hard to implement in hardware
- Slooooow.
50Maximal Matching
- A maximal matching is a matching in which each edge is added one at a time and is never later removed from the matching.
- i.e., no augmenting paths are allowed (they remove edges added earlier); this is like the earlier by-inspection flow solution.
- No input and no output are left unnecessarily idle. (A greedy sketch is given below.)
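A greedy sketch in Python; the request list and its ordering are illustrative, and hardware schedulers do the equivalent in parallel rather than in a serial loop.

```python
def maximal_matching(requests):
    """Greedy maximal matching: scan the request edges once, add an edge
    whenever both endpoints are still free, and never remove it later.
    requests is an iterable of (input, output) pairs; the scan order
    determines which maximal matching is produced."""
    matched_in, matched_out, match = set(), set(), []
    for inp, out in requests:
        if inp not in matched_in and out not in matched_out:
            match.append((inp, out))
            matched_in.add(inp)
            matched_out.add(out)
    return match

# maximal_matching([('A', 1), ('A', 2), ('B', 1), ('C', 3)])
# -> [('A', 1), ('C', 3)]: maximal but not maximum (A-2, B-1, C-3 would be larger)
```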
51Example of Maximal Size Matching
[Figure: two matchings of the same request graph (inputs A-F, outputs 1-6) shown side by side: a maximum matching and a maximal matching.]
52Comments on Maximal Matchings
- In general, maximal matching is much simpler to implement and has a much faster running time.
- A maximal size matching is at least half the size of a maximum size matching.
- A maximal weight matching is defined in the obvious way.
- A maximal weight matching is at least half the weight of a maximum weight matching.
53PIM Maximal Size Matching Algorithm Performance
and Properties
- It is among the very first practical schedulers proposed for VOQ architectures (used by DEC).
- It is based on having arbiters at the inputs and outputs.
- It iterates the following steps until no more requests can be accepted (or for a given number of iterations); a sketch of one iteration follows:
- Request: each unmatched input sends a request to every output for which it has a queued cell.
- Grant (outputs): if an unmatched output receives any requests, it grants one of them, selected uniformly at random over all requests.
- Accept (inputs): if an unmatched input receives one or more grants, it accepts one of them, selected uniformly at random among the outputs that granted it.
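A minimal Python sketch of one PIM iteration, assuming the VOQ state is given as an N x N occupancy matrix; the function names and data layout are illustrative, not DEC's implementation.

```python
import random

def pim_iteration(voq, matched_in, matched_out):
    """One PIM request/grant/accept iteration over an N x N switch (a sketch).
    voq[i][j] > 0 means input i has at least one cell queued for output j;
    matched_in / matched_out record the partial match built so far."""
    n = len(voq)
    grants = {}                                   # input -> list of granting outputs
    # Request + Grant: each unmatched output grants one request uniformly at random.
    for j in range(n):
        if j in matched_out:
            continue
        requests = [i for i in range(n) if i not in matched_in and voq[i][j] > 0]
        if requests:
            grants.setdefault(random.choice(requests), []).append(j)
    # Accept: each input that received grants accepts one uniformly at random.
    for i, granted_outputs in grants.items():
        j = random.choice(granted_outputs)
        matched_in[i], matched_out[j] = j, i

def pim(voq, iterations=4):
    """Run a fixed number of PIM iterations and return the match (input -> output)."""
    matched_in, matched_out = {}, {}
    for _ in range(iterations):
        pim_iteration(voq, matched_in, matched_out)
    return matched_in
```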
54Implementation of the parallel maximal matching
algorithms
55Implementation of the parallel maximal matching
algorithms (another similar way)
56PIM Maximum Size Matching Algorithm Performance
and Properties
PIM 1st Iteration
57PIM Maximum Size Matching Algorithm Performance
and Properties
PIM 2nd Iteration
[Figure: 4 x 4 example of the request, grant, and accept phases over the first and second PIM iterations.]
58Traffic Types to evaluate Algorithms
Uniform traffic
Unbalanced traffic
Hotspot traffic
59Parallel Iterative Matching
PIM with a single iteration
60Parallel Iterative Matching
PIM with 4 iterations
61Parallel Iterative Matching: Analytical Results
Number of iterations to converge
62PIM Maximum Size Matching Algorithm Performance
and Properties
- It is a fair algorithm in servicing inputs.
- It can achieve 100% throughput under uniform traffic.
- It converges to a maximal size matching in O(log N) iterations on average.
- It performs poorly (about 63% throughput) with a single iteration, because the independent random grants tend to collide: several outputs may grant the same input, and all but one of those grants are wasted.
- It is not easy to build random arbiters in hardware.
- The best iterative maximal size matching algorithms take O(N^2 log N) serial or O(log N) parallel time steps.
- If the number of iterations is fixed, it can be implemented in constant time (which is why it is practical); the hardware design is still not trivial.
63RRM Maximum Size Matching Algorithm Performance
and Properties
- Round-Robin Matching (RRM) is easier to implement than PIM (in terms of designing the I/O arbiters).
- The pointers of the arbiters move in a straightforward way.
- It iterates the following steps until no more requests can be accepted (or for a given number of iterations):
- Request: each input sends a request to every output for which it has a queued cell.
- Grant: if an output receives any requests, it chooses the one that appears next in a fixed round-robin schedule, starting from the highest-priority element. The output notifies each input whether or not its request was granted. The pointer g_i to the highest-priority element of the round-robin schedule is incremented (modulo N) to one location beyond the granted input. If no request is received, the pointer stays unchanged.
64RRM Maximum Size Matching Algorithm Performance
and Properties
- Accept: if an input receives any grants, it accepts the one that appears next in a fixed round-robin schedule, starting from the highest-priority element. The pointer a_i to the highest-priority element of the round-robin schedule is incremented (modulo N) to one location beyond the accepted output. If no grant is received, the pointer stays unchanged.
65RRM Maximal Matching Algorithm (1)
Step 1: Request
66RRM Maximal Matching Algorithm (2)
Step 2: Grant
67RRM Maximal Matching Algorithm (2)
68RRM Maximal Matching Algorithm (2)
69RRM Maximal Matching Algorithm (2)
70RRM Maximal Matching Algorithm (3)
0 3 1 2
71RRM Maximal Matching Algorithm (3)
0 3 1 2
72RRM Maximal Matching Algorithm (3)
0 3 1 2
73Poor performance of RRM Maximal Matching Algorithm
[Figure: 2 x 2 example in which both inputs always have cells for both outputs; the grant pointers stay synchronized, so RRM achieves only 50% throughput.]
74iSLIP Maximum Size Matching Algorithm
Performance and Properties
- It is the scheduler used in many VOQ switches (e.g., by Cisco).
- It is exactly like the RRM algorithm with the following change:
- Grant: if an output receives any requests, it chooses the one that appears next in a fixed round-robin schedule, starting from the highest-priority element. The output notifies each input whether or not its request was granted. The pointer g_i to the highest-priority element of the round-robin schedule is incremented (modulo N) to one location beyond the granted input if and only if the grant is accepted (in the Accept phase). A sketch of one grant/accept iteration follows.
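A minimal Python sketch of the grant/accept logic with round-robin pointers, using the same illustrative VOQ-matrix representation as the PIM sketch; this is not Cisco's implementation, and the first_iter flag reflects the full iSLIP rule that pointers are updated only for matches made in the first iteration.

```python
def islip_iteration(voq, grant_ptr, accept_ptr, matched_in, matched_out, first_iter=True):
    """One iSLIP grant/accept iteration (a sketch).  grant_ptr[j] and accept_ptr[i]
    are the round-robin pointers of output j and input i.  RRM is identical except
    that an output's grant pointer advances whether or not its grant is accepted."""
    n = len(voq)

    def round_robin_pick(candidates, ptr):
        # first candidate at or after the pointer position, wrapping modulo n
        return min(candidates, key=lambda k: (k - ptr) % n)

    # Grant: each unmatched output picks, round-robin, among the inputs requesting it.
    grants = {}
    for j in range(n):
        if j in matched_out:
            continue
        requests = [i for i in range(n) if i not in matched_in and voq[i][j] > 0]
        if requests:
            grants.setdefault(round_robin_pick(requests, grant_ptr[j]), []).append(j)

    # Accept: each input picks, round-robin, among the outputs that granted it.
    for i, granted_outputs in grants.items():
        j = round_robin_pick(granted_outputs, accept_ptr[i])
        matched_in[i], matched_out[j] = j, i
        if first_iter:
            # Pointers move only when a grant is accepted (and, in the full iSLIP
            # spec, only for first-iteration matches): this desynchronizes them.
            grant_ptr[j] = (i + 1) % n
            accept_ptr[i] = (j + 1) % n
```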
75iSLIP Maximum Size Matching Algorithm
iSlip 1st Iteration
[Figure: 4 x 4 example of the first iSLIP iteration; the round-robin pointer values shown are 4 1 3 2 and 1 4 2 3.]
76iSLIP Maximum Size Matching Algorithm
iSlip 2nd Iteration
[Figure: the same 4 x 4 example continued into the second iSLIP iteration.]
77Simple Iterative Algorithms iSlip
Step 1: Request
78Simple Iterative Algorithms iSlip
Step 2: Grant
79Simple Iterative Algorithms iSlip
80Simple Iterative Algorithms iSlip
Step 3: Accept
[Figure: 4 x 4 accept step; the accept pointer values shown are 0, 3, 1, 2.]
81Simple Iterative Algorithms iSlip
Step 3: Accept
[Figure: (continued)]
82Simple Iterative Algorithms iSlip
83Simple Iterative Algorithms iSlip
Step 3: Accept
[Figure: (continued)]
84Simple Iterative Algorithms iSlip
Step 3: Accept
[Figure: (continued)]
85iSLIP Implementation
Programmable Priority Encoder
[Figure: iSLIP arbitration logic: N grant arbiters feeding N accept arbiters, each arbiter a programmable priority encoder with N request inputs and a log2 N-bit pointer (state), producing the decision and the updated state.]
86Hardware Design
Layout of the 256-bit priority encoder
87Hardware Design
Layout of the 256-bit grant arbiter
88FIRM Maximum Size Matching Algorithm Performance
and Properties
- It is exactly like iSLIP, with a very small yet significant modification:
- Grant (outputs): if an unmatched output receives any requests, it grants the one that appears next in a fixed round-robin schedule, starting from the highest-priority element. The output notifies each input whether or not its request is granted. If the grant is accepted, the pointer to the highest-priority element is incremented (modulo N) to one location beyond the granted input; if the grant is not accepted, the pointer is set to the granted input itself.
89Simple Iterative Algorithms FIRM
Step 3: Accept
90Pointer Synchronization
- Why this is good: this small change prevents the output arbiters from moving in lock-step (being synchronized, i.e., pointing to the same input), leading to a dramatic improvement in performance.
- If several outputs grant the same input, then no matter how that input chooses, only one match can be made and the other outputs stay idle.
- To get as many matches as possible, it is better that each output grants a different input.
- Since an output selects the input its pointer marks as highest priority whenever that input requests, it is better to keep the output pointers desynchronized (pointing to different inputs).
91iSLIP Maximal Matching Algorithm
[Figure: the same 2 x 2 example as before; with iSLIP the grant pointers desynchronize after one slot, giving 100% throughput.]
92Pointer Synchronization: Differences between RRM, iSLIP and FIRM
93Differences between RRM, iSlip FIRM
Pointer and event              | RRM                                  | iSLIP                                | FIRM
Input arbiter, no grant        | unchanged                            | unchanged                            | unchanged
Input arbiter, grant accepted  | one location beyond the accepted one | one location beyond the accepted one | one location beyond the accepted one
Output arbiter, no request     | unchanged                            | unchanged                            | unchanged
Output arbiter, grant accepted | one location beyond the granted one  | one location beyond the granted one  | one location beyond the granted one
Output arbiter, grant refused  | one location beyond the granted one  | unchanged                            | the granted one
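Read as code, the grant-pointer column of this table becomes the following sketch (function name and arguments are illustrative):

```python
def grant_pointer_update(alg, ptr, granted, accepted, n):
    """New grant-pointer value for an output arbiter, per the table above.
    granted is the input that was granted (None if no request was received);
    accepted says whether that input accepted the grant."""
    if granted is None:                      # no request: the pointer never moves
        return ptr
    if alg == "RRM":                         # moves regardless of acceptance
        return (granted + 1) % n
    if alg == "iSLIP":                       # moves only on an accepted grant
        return (granted + 1) % n if accepted else ptr
    if alg == "FIRM":                        # parks on the granted input if refused
        return (granted + 1) % n if accepted else granted
    raise ValueError("unknown algorithm: " + alg)
```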
94General remarks
- Since all of these algorithms try to approximate a maximum size matching, they can be unstable under non-uniform traffic.
- They can achieve 100% throughput under uniform traffic.
- With a large number of iterations, their performance is similar.
- They have similar implementation complexity.
95Input Queueing: Longest Queue First or Oldest Cell First
[Figure: example of scheduling weights. The weight of a request can be the VOQ length (e.g., queues of length 100, 10, 2, 1) or the waiting time of its head-of-line cell; the maximum weight match then favours the long or old queues.]
96Input Queueing: Why is serving long/old queues better than serving the maximum number of queues?
- When traffic is uniformly distributed, servicing the maximum number of queues leads to 100% throughput.
- When traffic is non-uniform, some queues become longer than others.
- A good algorithm keeps the queue lengths matched, and services a large number of queues.
97Maximum/Maximal Weight Matching
- 100% throughput for admissible traffic (uniform or non-uniform)
- Maximum Weight Matching:
- OCF (Oldest Cell First): w = cell waiting time
- LQF (Longest Queue First): w = input queue (VOQ) occupancy
- LPF (Longest Port First): w = total queue length at the source (input) port plus total queue length destined to the destination (output) port
- Maximal Weight Matching (practical algorithms):
- iOCF
- iLQF
- iLPF (the comparators in the critical path of iLQF are removed)
(A sketch of a maximum weight match computed as an assignment problem follows.)
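As an illustration, a per-slot maximum weight match can be computed by posing it as an assignment problem and using SciPy's linear_sum_assignment solver; the weight matrix is a made-up example, and real line-rate schedulers use the iterative approximations listed above instead.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def max_weight_match(weights):
    """One-slot maximum weight matching posed as an assignment problem.
    weights[i][j] is the scheduling weight of VOQ(i, j), e.g. its occupancy
    (LQF) or the age of its head-of-line cell (OCF)."""
    w = np.asarray(weights, dtype=float)
    rows, cols = linear_sum_assignment(w, maximize=True)
    # drop zero-weight pairings so idle VOQs are not "matched"
    return [(int(i), int(j)) for i, j in zip(rows, cols) if w[i, j] > 0]

# Example: max_weight_match([[0, 3, 0], [2, 0, 1], [0, 4, 0]])
# -> [(1, 0), (2, 1)]  (total weight 6)
```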
98Maximal Weight Matching Algorithms iLQF
- Request: each unmatched input sends a request word to every output for which it has a queued cell, indicating the number of cells it has queued to that output.
- Grant: if an unmatched output receives any requests, it chooses the largest-valued request. Ties are broken randomly.
- Accept: if an unmatched input receives one or more grants, it accepts the one to which it made the largest-valued request. Ties are broken randomly.
(A sketch of one iteration follows.)
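A sketch of one i-LQF iteration under the same illustrative VOQ-matrix representation used earlier; only the selection rule differs from the PIM/iSLIP sketches.

```python
import random

def ilqf_iteration(voq, matched_in, matched_out):
    """One i-LQF iteration (sketch): like PIM, but grant and accept pick the
    largest queue instead of picking at random; ties are broken randomly."""
    n = len(voq)

    def argmax_random_tie(candidates, key):
        best = max(key(c) for c in candidates)
        return random.choice([c for c in candidates if key(c) == best])

    # Grant: each unmatched output grants its largest-valued request.
    grants = {}
    for j in range(n):
        if j in matched_out:
            continue
        requests = [i for i in range(n) if i not in matched_in and voq[i][j] > 0]
        if requests:
            grants.setdefault(argmax_random_tie(requests, lambda i: voq[i][j]), []).append(j)

    # Accept: each input accepts the grant for which it made the largest request.
    for i, granted_outputs in grants.items():
        j = argmax_random_tie(granted_outputs, lambda jj: voq[i][jj])
        matched_in[i], matched_out[j] = j, i
```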
99Maximal Weight Matching Algorithms: iLQF
- The i-LQF algorithm has the following properties:
- Property 1: independent of the number of iterations, the longest input queue is always served.
- Property 2: as with i-SLIP, the algorithm converges in at most log N iterations.
- Property 3: for an inadmissible offered load, an input queue may be starved.
100Maximal Weight Matching Algorithms: iOCF
- The i-OCF algorithm works in a similar fashion to i-LQF, and has the following properties:
- Property 1: independent of the number of iterations, the cell that has been waiting the longest in the input queues is always served (it must be at the head of its queue).
- Property 2: as with i-LQF, the algorithm converges in at most log N iterations.
- Property 3: no input queue can be starved indefinitely.
- Property 4: it is difficult to keep time stamps on the cells.
101iLQF - Implementation
102iLQF - Implementation
Complicated hardware
103Other research efforts
- Packet-based arbitration
- Exhaustive-based arbitration
- Numerous other efforts
104Packet Scheduling/Arbitration in Virtual Output Queues: Randomized Algorithms and Others
105Input-Queued Packet Switch
Admissibility: Σi λi,j < 1 for every output j, and Σj λi,j < 1 for every input i
[Figure: N x N input-queued switch; Xi,j denotes the occupancy of the VOQ at input i for output j.]
106Bipartite Graph and Matrix
[Figure: bipartite graph between inputs 1-3 and outputs 1-3 and the equivalent 3 x 3 matrix representation of a matching.]
107Stability of Scheduling
- Definition: let Xi,j(t) be the number of packets queued at input i for output j at time slot t. Then an algorithm is stable iff the expected queue occupancies remain bounded, i.e., E[Xi,j(t)] stays finite for all i, j as t grows (see the formulation below).
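In the usual formulation (standard definitions, stated here for completeness rather than taken verbatim from the slide):

```latex
% Admissible traffic (no input or output is oversubscribed):
\sum_{i} \lambda_{i,j} < 1 \;\; \forall j,
\qquad
\sum_{j} \lambda_{i,j} < 1 \;\; \forall i .
% Stable scheduling algorithm (expected VOQ occupancies stay bounded):
\limsup_{t \to \infty} \; \mathbb{E}\!\left[ X_{i,j}(t) \right] < \infty
\quad \forall\, i, j .
```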
108Motivation
- Networking problems suffer from the curse of dimensionality: algorithmic solutions do not scale well.
- Typical causes:
- size: a large number of users or a large number of I/O ports
- time: very high speeds of operation
- A good deterministic algorithm exists (Max Flow), but
- it needs state information, and the state is too big
- it starts from scratch in each iteration
109Randomization
- Randomized algorithms have frequently been used in situations where the state space (e.g., the N! different input-output matchings) is very large.
- Randomized algorithms:
- are a powerful way of approximating the optimal solution
- it is often possible to randomize deterministic algorithms
- this simplifies the implementation while retaining a (surprisingly) high level of performance
- The main idea is:
- to simplify the decision-making process
- by basing decisions upon a small, randomly chosen sample of the state
- rather than upon the complete state
110Randomizing Iterative Schemes (e.g., iSLIP)
- Often, we want to perform some operation iteratively.
- Example: find the heaviest matching in a switch in every time slot.
- In each time slot:
- at most one packet can arrive at each input
- at most one packet can depart from each output
- so the queue sizes, i.e., the state of the switch, do not change by much between successive time slots
- hence a matching that was heavy at time t will quite likely still be heavy at time t+1
- This suggests that:
- knowing a heavy matching at time t should help in determining a heavy matching at time t+1
- there is no need to start from scratch in each time slot
111Summarizing Randomized Algorithms
- Randomized algorithms can help simplify the implementation by reducing the amount of work in each iteration.
- If the state of the system does not change much between iterations, we can reduce the work even further by carrying information between iterations.
- The big pay-off: even though it is an approximation, the performance of a randomized scheme can be surprisingly good.
112Randomized Scheduling Algorithms Example
- Consider a 3 x 3 input-queued switch.
- Input traffic is Bernoulli IID with λij = a/3 for all i, j, and a < 1.
- This is admissible.
- Note: there are a total of 6 (= 3!) possible service matrices.
113Random Scheduling Algorithms
- In time slot n, let S(n) be one of the 6 possible matchings, chosen independently and uniformly at random.
- Stability of Random:
- Consider L11(n), the number of packets in VOQ11.
- Arrivals to VOQ11 occur according to A11(n), which is Bernoulli IID with rate λ11 = a/3.
- This queue is served whenever the service matrix connects input 1 to output 1.
- There are 2 service matrices that connect input 1 to output 1; since Random chooses service matrices uniformly at random, input 1 is connected to output 1 for a fraction 2/6 = 1/3 of the time, which is the service rate between input 1 and output 1.
- E[L11(n)] stays finite iff λ11 < 1/3, i.e., iff a < 1.
- So this random algorithm is stable for this uniform traffic. (A quick numerical check of the 1/3 service rate follows.)
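A quick numerical sanity check of the 2/6 = 1/3 figure (illustrative script):

```python
import itertools
import random

# Among the 3! = 6 permutation matchings of a 3 x 3 switch, exactly 2 connect
# input 1 to output 1, so a uniformly random matching serves VOQ(1,1) with
# probability 2/6 = 1/3.
perms = list(itertools.permutations(range(3)))          # all 6 matchings
exact = sum(p[0] == 0 for p in perms) / len(perms)      # 2/6
trials = 100_000
empirical = sum(random.choice(perms)[0] == 0 for _ in range(trials)) / trials
print(exact, round(empirical, 3))                        # 0.333..., roughly 0.333
```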
114Random Scheduling Algorithms
- Instability of Random:
- Now suppose λii = a for all i and λij = 0 for i ≠ j.
- Clearly, this is admissible traffic for all a < 1.
- But, under Random, the service rate at VOQ11 is 1/3 at best.
- Hence VOQ11, and thus the switch, will be unstable as soon as a > 1/3.
- Stability (or 100% throughput) means the switch is stable under all admissible traffic!
115Obvious Randomized Schemes
- Choose a matching at random and use it as the schedule: does not give 100% throughput (already shown).
- Choose 2 matchings at random and use the heavier one as the schedule.
- Choose N matchings at random and use the heaviest one as the schedule.
- None of these can give 100% throughput!
117Iterative Randomized Scheme(Tassiulas)
- Say M is the matching used at time t.
- Let R be a new matching chosen uniformly at random (u.a.r.) among the N! possible matchings.
- At time t+1, use the heavier of M and R.
- Complexity is very low: O(1) iterations.
- This gives 100% throughput!
- Note: the boost in throughput is due to memory (remembering the previous matching).
- But delays are very large. (A sketch follows.)
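A minimal sketch of this scheme, again over an illustrative VOQ occupancy matrix, with matchings represented as permutations (match[i] = output assigned to input i):

```python
import random

def weight(match, voq):
    """Weight of a matching = total occupancy of the VOQs it serves."""
    return sum(voq[i][j] for i, j in enumerate(match))

def tassiulas_schedule(prev_match, voq):
    """Randomized scheme with memory (sketch): draw one matching uniformly
    at random and keep whichever of {previous matching, random matching}
    is heavier.  O(1) work per slot beyond computing two weights."""
    n = len(voq)
    rand_match = list(range(n))
    random.shuffle(rand_match)           # a uniform random permutation
    return max(prev_match, rand_match, key=lambda m: weight(m, voq))
```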
119Finer Observations
- Let M be the schedule used at time t.
- Choose a good random matching R.
- M' = Merge(M, R): M' keeps the best edges from M and R.
- Use M' as the schedule at time t+1.
- This procedure yields the algorithm called LAURA.
- There are many other small variations of this algorithm. (A sketch of the merge idea follows.)
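A sketch of one way to realize the Merge step: since two full matchings are permutations, their union splits into alternating cycles, and the heavier side of each cycle can be kept. This is illustrative code for the idea, not the authors' exact procedure.

```python
def merge(m, r, voq):
    """Merge two matchings.  m and r are permutations (m[i] = output matched
    to input i).  Their union splits into disjoint alternating cycles over the
    inputs; within each cycle keep the edges of whichever matching is heavier."""
    n = len(m)
    merged, seen = list(m), [False] * n
    # r_inv[j] = input that r matches to output j
    r_inv = [0] * n
    for i, j in enumerate(r):
        r_inv[j] = i
    for start in range(n):
        if seen[start]:
            continue
        cycle, i = [], start
        while not seen[i]:
            seen[i] = True
            cycle.append(i)
            i = r_inv[m[i]]              # next input in the alternating cycle
        w_m = sum(voq[i][m[i]] for i in cycle)
        w_r = sum(voq[i][r[i]] for i in cycle)
        if w_r > w_m:                    # keep the heavier side of this cycle
            for i in cycle:
                merged[i] = r[i]
    return merged
```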
120Merging Procedure
[Figure: merging example. The current matching X has weight W(X) = 12 and the random matching R has weight W(R) = 10; merging the heavier portions of each yields M with W(M) = 13, heavier than both.]
122Can we avoid having schedulers altogether !!!
123Recap Two Successive Scaling Problems
124IQ Arbitration Complexity
- Scaling to 160 Gbit/s:
- Arbitration time: 3.2 ns
- Request/grant communication bandwidth: 280 Gbit/s
- Two main alternatives for scaling:
- Increase the cell size
- Eliminate arbitration
125Desirable Characteristics for Router Architecture
- Ideal: OQ (output queueing)
- 100% throughput
- Minimum delay
- Maintains packet order
- Necessary: able to regularly connect any input to any output
- What if the world were perfect? Assume Bernoulli IID uniform arrival traffic...
126Round-Robin Scheduling
- Uniform, non-bursty traffic => 100% throughput
- Problem: real traffic is non-uniform and bursty
127Two-Stage Switch (I)
[Figure: two-stage switch: external inputs connect to internal inputs through a first round-robin stage, and internal inputs connect to external outputs through a second round-robin stage.]
128Two-Stage Switch (I)
[Figure: the same two-stage switch, shown again as the two round-robin stages advance.]
129Two-Stage Switch Characteristics
[Figure: two-stage switch in which both stages are cyclic shifts between external inputs, internal inputs, and external outputs.]
- 100% throughput
- Problem: unbounded mis-sequencing
130Two-Stage Switch (II)
New: N^3 queues instead of N^2
131Expanding VOQ Structure
Solution: expand the VOQ structure by distinguishing among switch inputs
132What is being done in practice (Cisco, for example)
- They want schedulers that achieve 100% throughput and very low delay (like MWM).
- They want them to be as simple as iSLIP in terms of hardware implementation.
- Is there any solution to this?!
133Typical Performance of iSLIP-like Algorithms
PIM with 4 iterations
134What is being done in practice (Cisco, for example)
Company  | Switching Capacity     | Switch Architecture | Fabric Overspeed
Agere    | 40 Gbit/s - 2.5 Tbit/s | Arbitrated crossbar | 2x
AMCC     | 20 - 160 Gbit/s        | Shared memory       | 1.0x
AMCC     | 40 Gbit/s - 1.2 Tbit/s | Arbitrated crossbar | 1 - 2x
Broadcom | 40 - 640 Gbit/s        | Buffered crossbar   | 1 - 4x
Cisco    | 40 - 320 Gbit/s        | Arbitrated crossbar | 2x