Some Unsolved Problems in High Speed Packet Switching

1
Some Unsolved Problems in High Speed Packet
Switching
Shivendra S. Panwar. Joint work with Yihan Li,
Yanming Shen and H. Jonathan Chao. Polytechnic
University, Brooklyn, NY. NY State Center for
Advanced Technology in Telecommunications.
http://catt.poly.edu/CATT/panwar.html
2
  • Advice to Woodward and Bernstein
  • Follow the money
  • -- Deep Throat
  • (aka Mark Felt)

3
  • Advice to performance analysts
  • Find the bottleneck

4
Packet Switching
5
Buffering in a Packet Switch
  • Fixed-size packet switches
  • Operate in a time-slotted manner
  • The slot duration is equal to the cell
    transmission time
  • Contention occurs when multiple inputs have
    arrivals destined to the same output
  • Buffering is needed to avoid packet loss
  • Buffering schemes in a packet switch
  • Output queueing (OQ)
  • Input queueing (IQ)
  • Virtual output queueing (VOQ) / combined
    input-output-queueing (CIOQ)

6
Output Queuing (OQ)
  • 100% throughput
  • Internal speedup of N
  • Impractical for large N

[Figure: 4x4 output-queued switch, Inputs 1-4 and Outputs 1-4]
7
Input Queuing (IQ)
  • Easy to implement
  • HOL blocking, throughput limited to 58.6%

[Figure: Head of Line Blocking in a 4x4 input-queued switch, Inputs 1-4 and Outputs 1-4]
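As background for the 58.6% figure: it is the classical saturation throughput of a FIFO input-queued switch under uniform i.i.d. traffic as the port count grows large. The closed form below is stated here as a known result, not taken from the slide, and the symbol for saturation throughput is notation assumed for this note.

```latex
\lim_{N \to \infty} \rho_{\max} = 2 - \sqrt{2} \approx 0.586
```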
8
Virtual Output Queuing (VOQ)
  • Virtual Output Queuing (VOQ)
  • Overcome HOL blocking
  • No speedup requirement
  • Need scheduling algorithms to resolve contention
  • Complexity
  • Performance guarantee

9
Challenges in Switch Design
  • Stability
  • 100% throughput
  • Delay performance
  • Scalability
  • Scale to high number of linecards and to high
    linecard speeds
  • Distributed scheduler is more desirable than a
    centralized scheduler
  • Scheduler complexity
  • Pin count

10
High Speed Packet Switches
  • VOQ switches and scheduling algorithms
  • Buffered crossbar switch
  • Load Balanced switch
  • Multi-stage switch

11
VOQ Switch Architecture
Input Segmentation Module (ISM): Segment packets
to fixed-length cells. Output Reassembly Module
(ORM): Reassemble cells into packets.
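A minimal sketch of what the ISM and ORM do, assuming a 64-byte cell payload and a simple (packet id, sequence number, last-cell flag, payload) cell layout. Both the cell size and the layout are hypothetical choices for illustration, not the format used in the talk.

```python
from typing import List, Tuple

CELL_PAYLOAD = 64  # hypothetical cell payload size in bytes

# A cell is (packet_id, sequence_number, is_last, payload); layout is assumed.
Cell = Tuple[int, int, bool, bytes]


def segment(packet_id: int, packet: bytes, size: int = CELL_PAYLOAD) -> List[Cell]:
    """ISM side: split a packet into fixed-length cells, zero-padding the last one."""
    chunks = [packet[i:i + size] for i in range(0, len(packet), size)] or [b""]
    cells = []
    for seq, chunk in enumerate(chunks):
        is_last = seq == len(chunks) - 1
        cells.append((packet_id, seq, is_last, chunk.ljust(size, b"\x00")))
    return cells


def reassemble(cells: List[Cell], length: int) -> bytes:
    """ORM side: order cells by sequence number and strip the padding."""
    ordered = sorted(cells, key=lambda cell: cell[1])
    return b"".join(cell[3] for cell in ordered)[:length]
```

The ORM cannot release a packet until its last cell has arrived, which is where the reassembly delay discussed later in the talk comes from.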
12
Scheduling for VOQ Switch
  • Scheduling is needed to avoid output contention
  • A scheduling problem can be modeled as a matching
    problem in a bipartite graph
  • An input and an output are connected by an edge
    if the corresponding VOQ is not empty
  • Each edge may have a weight, which can be
  • The length of the VOQ
  • The age of the HOL cell

13
Maximum Weight Matching (MWM)
  • MWM always finds a match with the maximum weight
  • Stable under any admissible traffic
  • Very high complexity
  • O(N^3), impractical

[Figure: bipartite graph with weighted edges; weight of the match: 25]
  • References
  • L. Tassiulas, A. Ephremides, Stability
    properties of constrained queueing systems and
    scheduling for maximum throughput in multihop
    radio networks, IEEE Transactions on Automatic
    Control, Vol. 37, No. 12, pp. 1936-1949, December
    1992.
  • E. Leonardi, M. Mellia, F. Neri, Marco A. Marsan,
    On the stability of Input-Queued Switches with
    speed-up, IEEE/ACM Transactions on Networking,
    Vol.9, No.1, pp.104-118, ISSN S
    1063-6692(01)01313, February 2001
  • N. McKeown, V. Anantharam, and J. Walrand,
    Achieving 100% Throughput in an Input-Queued
    Switch, IEEE Transactions on Comm., vol. 47, no.
    8, Aug. 1999, pp. 1260-1267.
  • J.G. Dai and B. Prabhakar, The throughput of
    data switches with and without speedup, INFOCOM
    2000.
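To make the definition of a maximum weight match concrete, here is a brute-force sketch that tries every one of the N! matches and keeps the heaviest, using VOQ length as the edge weight. It is only usable for very small N; the O(N^3) complexity quoted on the slide refers to proper assignment algorithms, which this sketch does not implement. The function name and the example queue matrix are made up for illustration.

```python
from itertools import permutations
from typing import List, Tuple


def max_weight_matching(q: List[List[int]]) -> Tuple[Tuple[int, ...], int]:
    """Return (match, weight), where match[i] is the output assigned to input i.

    q[i][j] is the length of VOQ(i, j). Exhaustive search over all N! matches.
    """
    n = len(q)
    best_match, best_weight = tuple(range(n)), -1
    for perm in permutations(range(n)):
        weight = sum(q[i][perm[i]] for i in range(n))
        if weight > best_weight:
            best_match, best_weight = perm, weight
    return best_match, best_weight


# Hypothetical 3x3 queue-length matrix.
q = [[7, 4, 0],
     [3, 0, 8],
     [0, 5, 6]]
match, weight = max_weight_matching(q)  # match == (0, 2, 1), weight == 20
```

A full permutation may pair an input with an empty VOQ; such an edge adds zero weight and simply means no cell is transferred on that connection.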

14
Maximum Weight Matching
  • The maximum weight matching algorithm is strongly
    stable under any admissible traffic pattern
  • Lyapunov function
  • Strongly stable
  • Admissible
  • References
  • Emilio Leonardi, Marco Mellia, Fabio Neri, Marco
    Ajmone Marsan, On the stability of Input-Queued
    Switches with speed-up, IEEE/ACM Transactions on
    Networking, Vol.9, No.1, pp.104-118, ISSN S
    1063-6692(01)01313, February 2001
  • N. McKeown, V. Anantharam, and J. Walrand,
    Achieving 100% Throughput in an Input-Queued
    Switch, IEEE Transactions on Comm., vol. 47, no.
    8, Aug. 1999, pp. 1260-1267.
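The usual definitions behind the three terms on this slide can be sketched as follows. The notation is assumed for this note: $\lambda_{ij}$ is the mean arrival rate from input $i$ to output $j$, and $X(n)$ is the vector of VOQ lengths at slot $n$. A quadratic Lyapunov function of the form shown is the typical choice in this kind of proof, though the exact function on the slide may differ.

```latex
\begin{aligned}
&\text{Admissible:} && \sum_{i=1}^{N} \lambda_{ij} < 1 \ \ \forall j,
  \qquad \sum_{j=1}^{N} \lambda_{ij} < 1 \ \ \forall i \\
&\text{Strongly stable:} && \limsup_{n \to \infty} \mathbb{E}\,\lVert X(n) \rVert < \infty \\
&\text{Lyapunov function:} && V(X(n)) = X(n)^{\mathsf{T}} X(n)
\end{aligned}
```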

15
Maximum Weight Matching
  • Fluid model
  • The maximum weight matching is rate stable if
  • The arrival processes satisfy a strong law of
    large numbers (SLLN) with probability one

  • [further conditions given as equations on the slide]
  • References
  • J.G. Dai and B. Prabhakar, The throughput of
    data switches with and without speedup, INFOCOM
    2000, pp. 556-564.

16
Approximate MWM
  • 1-APRX
  • A function f(.) is a sub-linear function if
    $\lim_{x \to \infty} f(x)/x = 0$
  • Let the weight of a schedule obtained by a
    scheduling algorithm B be $W_B$
  • Let the weight of the maximum weight match for
    the same switch state be $W$
  • If $W_B \ge W - f(W)$
  • B is a 1-APRX to MWM
  • B is stable if
  • Makes it possible to find stable matching
    algorithms with lower complexity than MWM.
  • References
  • D. Shah, M. Kopikare, Delay bounds for
    approximate Maximum weight matching algorithms
    for input-queued switches, IEEE INFOCOM, New
    York, USA, June 2002.

17
Average Delay Bound
  • Delay bound for MWM
  • Lyapunov function
  • References
  • E. Leonardi, M. Mellia, F. Neri, and M. Ajmone
    Marsan. Bounds on average delays and queue size
    averages and variances in input-queued cell-based
    switches. Proceedings of IEEE INFOCOM, 2001.

18
Average Delay Bound (contd.)
  • Delay bound for approximate-MWM
  • Lyapunov function
  • $C_B$: weight difference to the MWM matching
  • Under uniform traffic, both bounds give the same result
  • References
  • D. Shah, M. Kopikare, Delay bounds for
    approximate Maximum weight matching algorithms
    for input-queued switches, IEEE INFOCOM, New
    York, USA, June 2002.

19
Open Issues
  • In simulations, MWM has the best delay
    performance (cell delay)
  • Average delay: choose the weight of a queue as
    $Q^a$; then the delay increases with $a$ for $a > 0$
  • Is MWM the optimal scheduling scheme for
    achieving the minimum average cell delay?
  • What is the optimal scheduling scheme to achieve
    the minimum average packet delay (Including
    reassembly delay)?

20
Maximal Matching
  • Maximal Matching
  • Add connections incrementally, without removing
    connections made earlier
  • No more matches can be made trivially by the end
    of the operation
  • Solution may not be unique
  • Complexity O(NlogN)
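A minimal sketch of a maximal matching built greedily: connections are only ever added, never removed, and the pass stops when no free input can reach a free output. The fixed scan order (inputs and outputs in index order) is an assumption for illustration, and this simple scan costs O(N^2), not the complexity quoted on the slide.

```python
from typing import List, Tuple


def maximal_matching(q: List[List[int]]) -> List[Tuple[int, int]]:
    """Greedy maximal matching over non-empty VOQs; q[i][j] is the VOQ(i, j) length."""
    n = len(q)
    output_free = [True] * n
    match: List[Tuple[int, int]] = []
    for i in range(n):
        for j in range(n):
            if q[i][j] > 0 and output_free[j]:
                match.append((i, j))   # add a connection, never undo it
                output_free[j] = False
                break                  # input i is now matched
    return match


# Input 0 grabs output 0, which leaves input 1 with nothing to connect to.
q = [[1, 1, 0],
     [1, 0, 0],
     [0, 0, 1]]
match = maximal_matching(q)  # [(0, 0), (2, 2)]
```

In the example the result has two connections although a three-connection match, (0,1), (1,0), (2,2), exists. The result is maximal because nothing can be added to it, but it is not maximum, and a different scan order would give a different answer.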

21
Maximal Matching
  • A maximal matching achieves 100% throughput with
    speed-up S = 2 under any admissible traffic pattern
  • Leonardi, ToN 2001
  • 100% throughput
  • if
  • with probability 1
  • A maximal matching algorithm is rate stable with
    speed-up S = 2 (Dai, Infocom 2000)
  • References
  • Emilio Leonardi, Marco Mellia, Fabio Neri, Marco
    Ajmone Marsan, On the stability of Input-Queued
    Switches with speed-up, IEEE/ACM Transactions
    on Networking, Vol.9, No.1, pp.104-118, ISSN S
    1063-6692(01)01313, February 2001
  • J.G. Dai and B. Prabhakar, The throughput of
    data switches with and without speedup, INFOCOM
    2000, pp. 556-564.

22
Multiple Iterative Matching
  • Use multiple iterations to converge on a maximal
    matching
  • Parallel Iterative Matching (PIM)
  • iSLIP and DRRM
  • complexity of each iteration is O(logN)
  • O(logN) iterations are needed to converge on a
    maximal matching (iSLIP)
  • 100% throughput only under uniform traffic

23
iSLIP
  • Step 1: Request
  • Each input sends a request to every output for
    which it has a queued cell.
  • Step 2: Grant
  • If an output receives multiple requests it
    chooses the one that appears next in a fixed
    round-robin schedule.
  • The output arbiter pointer is incremented by one
    location beyond the granted input if, and only
    if, the grant is accepted in step 3.
  • Step 3: Accept
  • If an input receives multiple grants, it accepts
    the one that appears next in a fixed round-robin
    schedule.
  • The input arbiter pointer is incremented by one
    location beyond the accepted output.

[Figure: Request, Grant and Accept phases between inputs and outputs]
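The three steps can be sketched as a single iSLIP iteration. This is a simplified software model under stated assumptions: one iteration per time slot, pointers held in plain lists, and the function and variable names are invented for illustration. A hardware arbiter would do the round-robin selection in parallel.

```python
from typing import Dict, List


def islip_iteration(q: List[List[int]],
                    grant_ptr: List[int],
                    accept_ptr: List[int]) -> Dict[int, int]:
    """One request-grant-accept round. Returns {input: output}; updates pointers in place."""
    n = len(q)

    # Step 1: Request. Each input requests every output it has a cell for.
    requests = {j: [i for i in range(n) if q[i][j] > 0] for j in range(n)}

    # Step 2: Grant. Each output picks the requesting input that comes next
    # in round-robin order starting at its grant pointer.
    grants: Dict[int, List[int]] = {}
    for j in range(n):
        if requests[j]:
            winner = min(requests[j], key=lambda i: (i - grant_ptr[j]) % n)
            grants.setdefault(winner, []).append(j)

    # Step 3: Accept. Each input picks the granting output that comes next
    # in round-robin order starting at its accept pointer.
    match: Dict[int, int] = {}
    for i, outputs in grants.items():
        j = min(outputs, key=lambda o: (o - accept_ptr[i]) % n)
        match[i] = j
        accept_ptr[i] = (j + 1) % n
        grant_ptr[j] = (i + 1) % n  # moved only because the grant was accepted
    return match
```

Moving the grant pointer only on acceptance is what lets the output arbiters drift apart over successive slots, so that they stop granting to the same input.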
24
Achieving 100% Throughput without Speedup
  • Matching algorithms using memory
  • Polling system based matching

25
Low Complexity Algorithms with 100% Throughput
  • Algorithms with memory
  • Use the previous schedule as a candidate
  • References
  • L. Tassiulas, Linear complexity algorithms for
    maximum throughput in radio networks and input
    queued switches, IEEE INFOCOM 1998, vol.2, New
    York, 1998, pp.533-539.
  • P. Giaccone, B. Prabhakar, D. Shah, Toward
    simple, high-performance schedulers for
    high-aggregate bandwidth switches, IEEE INFOCOM
    2002, New York, 2002.
  • Polling system based matching algorithms
  • Improve the efficiency by using exhaustive
    service
  • References
  • Y. Li, S. Panwar, H. J. Chao, Exhaustive service
    matching algorithms for input queued switches,
    2004 Workshop on High Performance Switching and
    Routing (HPSR 2004), April 2004.
  • Y. Li, S. Panwar, H. J. Chao, Performance
    Analysis of a Dual Round Robin Matching Switch
    with Exhaustive Service, IEEE GLOBECOM 2002.

26
Matching Algorithms with Memory
  • The queue length of each VOQ does not change much
    during successive time slots
  • In each time slot, there can be
  • At most one cell arrives to each input
  • At most one cell departs from each input
  • It is likely that a busy connection will continue
    to be busy over a few time slots, if the queue
    length is used as the weight of a connection
  • Use the match in the previous time slot as a
    candidate for the new match
  • Important results
  • Randomized algorithm with memory (Tassiulas 98)
  • Derandomized algorithm with memory (Giaccone 02)
  • With higher complexity: APSARA, LAURA, SERENA
    (Giaccone 02)

27
Notations
  • For an NxN switch, there are N! possible matches
  • $Q(t) = [q_{ij}]_{N \times N}$, where $q_{ij}$ is the queue length of VOQ_ij
  • M(t), a match at time t
  • The weight of M(t)
  • $W(t) = \langle M(t), Q(t) \rangle$
  • the sum of the lengths of all matched VOQs

28
Randomized algorithm with memory
  • Randomized algorithm with memory
  • Let S(t) be the schedule used at time t
  • At time t+1, uniformly select a match R(t+1) at
    random from the set of all N! possible matches
  • Let $S(t+1) = \arg\max_{S \in \{S(t), R(t+1)\}} \langle S, Q(t+1) \rangle$
  • Stable under any Bernoulli i.i.d. admissible
    arrival traffic
  • Very simple to implement, complexity O(logN)
  • Delay performance is very poor
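A minimal sketch of the randomized algorithm with memory, assuming the selection rule is to keep whichever of the previous schedule S(t) and the fresh random match R(t+1) has the larger weight against the current queue lengths. A match is represented as a list where `match[i]` is the output of input `i`; that representation and the function names are choices made for this sketch.

```python
import random
from typing import List, Sequence


def weight(match: Sequence[int], q: List[List[int]]) -> int:
    """<M, Q>: the sum of the lengths of all VOQs selected by the match."""
    return sum(q[i][j] for i, j in enumerate(match))


def next_schedule(prev: Sequence[int], q: List[List[int]],
                  rng: random.Random) -> List[int]:
    """Pick S(t+1) from {S(t), R(t+1)}, where R(t+1) is a uniform random match."""
    candidate = list(range(len(q)))
    rng.shuffle(candidate)  # uniform over all N! permutations
    if weight(candidate, q) > weight(prev, q):
        return candidate
    return list(prev)


rng = random.Random(1)
q = [[5, 0], [0, 5]]
schedule = [1, 0]  # starts on the two empty VOQs
for _ in range(20):
    schedule = next_schedule(schedule, q, rng)
```

Because a random match is rarely a heavy one, the schedule improves slowly, which is consistent with the poor delay performance noted above.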

29
Derandomized Algorithm with Memory
  • Hamiltonian walk
  • A walk which visits every vertex of a graph
    exactly once.
  • In an NxN switch,
  • N! vertices (possible schedules), a Hamiltonian
    walk visits each vertex once every N! time slots
  • H(t): the value of the vertex which is visited at
    time t
  • The complexity of generating H(t+1) when H(t) is
    known is O(1)
  • Derandomized algorithm with memory
  • Use the match generated by Hamiltonian walk
    instead of the random match
  • Similar performance as randomized algorithm
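A small sketch of the derandomized variant, assuming the same keep-the-heavier rule as the randomized algorithm but with the candidate taken from a walk that cycles through all N! matches. For simplicity the walk here is plain lexicographic order from `itertools.permutations`; that ordering is an assumption of this sketch and does not give the O(1) per-step generation mentioned on the slide, which relies on a specially constructed walk not shown here.

```python
from itertools import permutations
from typing import Iterator, List, Sequence, Tuple


def hamiltonian_walk(n: int) -> Iterator[Tuple[int, ...]]:
    """Yield H(t) for t = 0, 1, 2, ...; every match appears once per N! slots."""
    while True:
        yield from permutations(range(n))


def weight(match: Sequence[int], q: List[List[int]]) -> int:
    """Sum of the lengths of the VOQs selected by the match."""
    return sum(q[i][j] for i, j in enumerate(match))


def derandomized_step(prev: Sequence[int], h: Sequence[int],
                      q: List[List[int]]) -> Tuple[int, ...]:
    """Keep the heavier of the previous schedule and the walk's current vertex."""
    return tuple(h) if weight(h, q) > weight(prev, q) else tuple(prev)


walk = hamiltonian_walk(3)
q = [[0, 9, 0], [9, 0, 0], [0, 0, 9]]
schedule = (0, 1, 2)
for _ in range(6):  # one full cycle of 3! = 6 slots
    schedule = derandomized_step(schedule, next(walk), q)
# With q held fixed, schedule is now (1, 0, 2), the heaviest match.
```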

30
Compared to MWM
  • Simple matching algorithms can achieve stability
    as MWM does
  • Not necessary to find the best match in each
    time slot to achieve 100% throughput
  • MWM has much better delay performance than
    randomized and derandomized matching
  • better matches lead to better delay performance

31
With Higher Complexity and Lower Delay
  • Introduce higher complexity for much lower delay
    than the randomized and derandomized algorithms
  • APSARA
  • include the neighbors of the latest match as
    candidates
  • LAURA
  • merge the latest match with a random match to
    remember the heavy edges
  • SERENA
  • Merge the latest match with the arrival graph
  • Graph generated from the current arrival
    pattern
  • Complexity O(N)

32
Polling System Based Matching
  • Exhaustive Service Matching
  • Inspired by exhaustive service polling systems
  • All the cells in the corresponding VOQ are served
    after an input and an output are matched
  • Slot times wasted to achieve an input-output
    match are amortized over all the cells waiting in
    the VOQ instead of only one
  • Cells within the same packet are transferred
    continuously
  • Hamiltonian walk is used to guarantee stability

33
Exhaustive Service Matching with Hamiltonian Walk
(EMHW)
  • EMHW
  • Let S(t) be the match at time t.
  • At time t+1, generate match Z(t+1) by the
    Exhaustive Service Matching algorithm based on
    S(t), and H(t+1) by Hamiltonian walk
  • Let $S(t+1) = \arg\max_{S \in \{Z(t+1), H(t+1)\}} \langle S, Q(t+1) \rangle$
  • where $\langle S, Q(t+1) \rangle$ is the weight of S at time t+1.
  • Stable under any admissible traffic
  • Analyzed by an exhaustive service polling system
  • Implementation complexity
  • HE-iSLIP O(logN)

34
E-iSLIP Average Delay Analysis
  • Exhaustive random polling system model
  • Symmetric system -- only consider one input
  • N VOQs per input, exhaustive service policy -- an
    exhaustive service polling system with N stations
  • The service order of the VOQs is not fixed --
    random polling system; assume all stations (VOQs)
    have the same probability of selection for
    service after a VOQ is served
  • Switch-over time: S
  • Average delay: T (Levy and Kleinrock)

35
Delay Performance of HE-iSLIP
  • Packet delay: the sum of cell delay and
    reassembly delay
  • Cell delay: measured from VOQ to destination
    output
  • Reassembly delay: time spent in an ORM, often
    ignored in other work

36
Performance Summary
Scheme       | Complexity | Stable | Packet delay performance
iSLIP        | O(log N)   | No     | Always higher than HE-iSLIP.
HE-iSLIP     | O(log N)   | Yes    | Lowest when packet size is larger than 1 cell.
Derandomized | O(log N)   | Yes    | Highest for all traffic patterns.
SERENA       | O(N)       | Yes    | Lower than HE-iSLIP only under nonuniform diagonal traffic.
MWM          | O(N^3)     | Yes    | Lowest when packet size is 1 cell.
37
Packet Delay under Uniform Traffic
  • Pattern 1: packet size is 1 cell.

[Figure: packet delay curves for SERENA, iSLIP, HE-iSLIP and MWM]
38
Packet Delay under Uniform Traffic
  • Pattern 3: packet length is variable, the average
    is 10 cells (Internet packet size distribution)
  • Pattern 2: packet length is 10 cells

[Figures: packet delay curves for SERENA, iSLIP, MWM and HE-iSLIP under Patterns 2 and 3]
39
When packet length is larger than 1 cell
  • Why does HE-iSLIP have a lower packet delay than
    MWM?
  • For example, when packet length is 10 cells
  • Reassembly delay
  • Cell delay
  • Low cell delay and low reassembly delay are both
    needed for low packet delay

Open Problem: Which scheduler minimizes packet
delay?
40
Packet-Based Scheduling
  • Packet-based scheduling algorithm
  • once it starts transmitting the first cell of a
    packet to an output port, it continues the
    transmission until the whole packet is completely
    received at the corresponding output port
  • Packet-based MWM is stable for any admissible
    Bernoulli i.i.d. traffic
  • Lyapunov function: M. A. Marsan, A. Bianco, P.
    Giaccone, E. Leonardi, and F. Neri, Packet
    Scheduling in Input-Queued Cell-Based Switches,
    INFOCOM 2001, pp. 1085-1094.
  • Packet-based MWM is stable under regenerative
    admissible input traffic
  • Fluid model: Y. Ganjali, A. Keshavarzian, D.
    Shah, Input Queued Switches: Cell switching vs.
    Packet switching, Proceedings of Infocom, 2003.
  • Regenerative: let T be the time between two
    successive occurrences of the event that all
    ports are free, with E(T) being finite
  • Modified waiting PB-MWM algorithm is stable under
    any admissible traffic

41
Buffered Crossbar Switch
  • One buffer for each crosspoint
  • Distributed arbitration for inputs and outputs
  • From each input, one cell can be sent to a
    crosspoint buffer if it has space
  • One cell can be sent to an output if at least one
    crosspoint buffer to that output is nonempty
  • References
  • Y. Doi and N. Yamanaka, A High-Speed ATM Switch
    with Input and Cross-Point Buffers, IEICE TRANS.
    COMMUN., VOL. E76, NO.3, pp. 310-314, March 1993.
  • R. Rojas-Cessa, E. Oki, Z. Jing, and H. J. Chao,
    CIXB-1: Combined Input-One-Cell-Crosspoint
    Buffered Switch, Proceedings of IEEE Workshop of
    High Performance Switches and Routers 2001.

42
Birkhoff-von Neumann Switch
  • When traffic matrix is known
  • Birkhoff-von Neumann decomposition
  • Reference
  • Cheng-Shang Chang, Wen-Jyh Chen and Hsiang-Yi
    Huang, "On service guarantees for input buffered
    crossbar switches: a capacity decomposition
    approach by Birkhoff and von Neumann," IEEE
    IWQoS'99, pp. 79-86, London, U.K., 1999.

43
Birkhoff-von Neumann Switch
  • Example
  • High complexity, impractical
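As a worked illustration of the decomposition, the sketch below splits a doubly stochastic rate matrix into a weighted sum of permutation matrices, each of which is one crossbar configuration to be used for the given fraction of time. It assumes exact rational rates and finds each permutation by brute force, so it is only suitable for tiny N; the function name and the example matrix are invented for this sketch and are not the example from the slide.

```python
from fractions import Fraction
from itertools import permutations
from typing import List, Tuple


def bvn_decompose(rates: List[List[Fraction]]) -> List[Tuple[Fraction, Tuple[int, ...]]]:
    """Return [(coefficient, permutation)] whose weighted sum equals `rates`.

    Assumes `rates` is doubly stochastic (every row and column sums to 1).
    perm[i] is the output connected to input i.
    """
    n = len(rates)
    residual = [row[:] for row in rates]
    terms = []
    while any(x > 0 for row in residual for x in row):
        # Birkhoff's theorem guarantees a permutation on the positive entries.
        perm = next(p for p in permutations(range(n))
                    if all(residual[i][p[i]] > 0 for i in range(n)))
        coeff = min(residual[i][perm[i]] for i in range(n))
        for i in range(n):
            residual[i][perm[i]] -= coeff
        terms.append((coeff, perm))
    return terms


F = Fraction
rates = [[F(1, 2), F(1, 2), F(0)],
         [F(1, 4), F(1, 4), F(1, 2)],
         [F(1, 4), F(1, 4), F(1, 2)]]
terms = bvn_decompose(rates)  # coefficients sum to 1
```

Each step zeroes at least one entry, so the loop ends after at most N^2 terms. The cost of finding the permutations and of storing and cycling through that many configurations is part of why the approach is considered impractical for large switches.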

44
Load-Balanced Switch
  • Load-balanced switch
  • Convert the traffic to uniform, then fixed
    switching
  • 100% throughput for a broad class of traffic
  • No centralized scheduler needed, scalable
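A minimal sketch of why fixed switching needs no scheduler, assuming the first stage uses a deterministic cyclic-shift connection pattern in which port p connects to port (p + t) mod N at slot t. That specific pattern is an assumption of this sketch; the point is only that a fixed, periodic pattern spreads any input's cells evenly over the intermediate ports regardless of their destinations.

```python
from typing import List


def stage_connection(port: int, t: int, n: int) -> int:
    """Fixed cyclic-shift pattern: at slot t, port p connects to (p + t) mod n."""
    return (port + t) % n


def spread(input_port: int, num_cells: int, n: int, start: int = 0) -> List[int]:
    """Count how many of an input's cells land on each intermediate port
    when it sends one cell per slot for num_cells consecutive slots."""
    counts = [0] * n
    for t in range(start, start + num_cells):
        counts[stage_connection(input_port, t, n)] += 1
    return counts


counts = spread(input_port=2, num_cells=8, n=4)  # [2, 2, 2, 2]
```

Because consecutive cells of one flow travel through different intermediate ports with different queue lengths, they can arrive out of order, which is the out-of-sequence problem the later variants address.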

45
Original Work on LB Switch
  • Stability: the load-balanced switch is stable
  • Delay: burst reduction
  • Problem: unbounded out-of-sequence delays
  • Reference
  • C.-S. Chang, D.-S. Lee and Y.-S. Jou, Load
    balanced Birkhoff-von Neumann switches, Part I:
    one-stage buffering, Computer Comm., Vol. 25,
    pp. 611-622, 2002.

46
LB Switch variants
  • Solve the out-of-sequence problem
  • FCFS (First come first serve)
  • Jitter control mechanism
  • Increase the average delay
  • EDF (Earliest deadline first)
  • Reduce the average delay
  • High complexity
  • Mailbox switch
  • Prevent packets from being out-of-sequence
  • Does not achieve 100% throughput
  • References
  • C.-S. Chang, D.-S. Lee and C.-M. Lien, Load
    balanced Birkhoff-von Neumann switches, Part II:
    multi-stage buffering, Computer Comm., Vol. 25,
    pp. 623-634, 2002.
  • C.S. Chang, D. Lee, and Y. J. Shih, Mailbox
    switch: A scalable two-stage switch architecture
    for conflict resolution of ordered packets, In
    Proceedings of IEEE INFOCOM, Hong Kong, March
    2004.

47
More LB switch variants
  • FFF (Full frames first) (Infocom 2002, McKeown)
  • Frame-based
  • No need for resequencing
  • Requires multi-stage buffer communication: high
    complexity
  • FOFF (Full ordered frames first) (Sigcomm 2003,
    McKeown)
  • Frame-based
  • Maximum resequencing delay N^2
  • Bandwidth wastage
  • References
  • I. Keslassy and N. McKeown, Maintaining packet
    order in two-stage switches, Proc. of the IEEE
    Infocom, June 2002.
  • I. Keslassy, S.-T. Chuang, K. Yu, D. Miller, M.
    Horowitz, O. Solgaard and N. McKeown , Scaling
    Internet routers using optics, ACM SIGCOMM 03,
    Karlsruhe, Germany, Aug. 2003.

48
Byte-Focal Switch Architecture
[Figure: Byte-Focal switch architecture. Labels: Arrival, Input VOQ, 1st stage switch fabric, Second-stage VOQ, 2nd stage switch fabric, Re-sequencing buffer]
49
Byte-Focal Switch
  • Packet-by-packet scheduling
  • Improves the average delay performance
  • The maximum resequencing delay is N^2
  • The time complexity of the resequencing buffer is
    O(1)
  • Does not need communications between linecards
  • References
  • Y. Shen, S. Jiang, S. S. Panwar, H. J. Chao,
    Byte-Focal: a practical load-balanced switch,
    HPSR 2005, Hong Kong.

50
Multi-Stage Switches
  • Single Stage Switches (e.g., Cross-point switch)
  • Single path between each input-output pair
  • Cannot meet the increasing demands of Internet
    traffic
  • No packets out-of-sequence
  • Easy to design
  • Lack of scalability
  • Multi-stage Switches (e.g., Clos-network switch)
  • Multiple paths between each input-output pair
  • Better tradeoff between the switch performance
    and complexity
  • Highly scalable and fault tolerant
  • Memory-less multi-stage switches
  • No packets out-of-sequence, may encounter
    internal blocking
  • Buffered multi-stage switches
  • Packet may be out-of-sequence, easy scheduling

51
Multi-Stage Architecture
52
Trueway: A Multi-Plane Multi-Stage Switch
53
Trueway Switch
  • The switch fabric consists of multiple switching
    planes, with each being a three-stage Clos
    network with m center modules
  • Each input/output pair has multiple routing paths
  • Highly scalable

54
Challenges in Multi-Stage Switching
  • How to efficiently allocate and share the limited
    on-chip memory?
  • How to schedule packets on multiple paths to
    maximize memory utilization and system
    performance?
  • How to minimize link congestion and prevent
    buffer overflow (i.e., stage-to-stage flow
    control)?
  • How to maintain cells/packet order if they are
    delivered over multiple paths (i.e., port-to-port
    flow control)?
  • How to achieve 100% throughput?

55
Conclusion
  • Introduced switch architecture trends
  • Many open research problems
  • Bottleneck keeps changing!