1
Dr. Multicast for Data Center Communication Scalability
HotNets, October 5, 2008
  • Ymir Vigfusson, Hussam Abu-Libdeh, Mahesh Balakrishnan, Ken Birman
  • Cornell University
  • Yoav Tock
  • IBM Research Haifa

3
IP Multicast in Data Centers
  • IPMC is not used in data centers
  • Would speed up products that use multicast

6
IP Multicast in Data Centers
  • Why is IP multicast rarely used?
  • Limited IPMC scalability on switches/routers and
    NICs
  • Broadcast storms: loss triggers a horde of NACKs,
    which triggers more loss, and so on
  • Disruptive even to non-IPMC applications

8
IP Multicast in Data Centers
  • IP multicast has a bad reputation
  • Works great up to a point, after which it breaks
    catastrophically

9
IP Multicast in Data Centers
  • Bottom line
  • Administrators have no control over multicast use
    ...
  • Without control, they opt for never.

11
Dr. Multicast  
12
Dr. Multicast (MCMD)
  • Policy: permits data center operators to
    selectively enable and control IPMC
  • Transparency: standard IPMC interface; system
    calls are overloaded
  • Performance: uses IPMC when possible, otherwise
    point-to-point unicast
  • Robustness: distributed, fault-tolerant service

13
Terminology
  • Process: application that joins logical IPMC
    groups
  • Logical IPMC group: a virtualized abstraction
  • Physical IPMC group: as usual
  • UDP multi-send: new kernel-level system call
  • Collection: set of logical IPMC groups with
    identical membership
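
To make the "collection" term concrete, here is a minimal sketch that groups logical IPMC groups by identical membership. The function name `build_collections`, the group names, and the process ids are hypothetical and chosen only for illustration; the slides do not show how MCMD represents collections internally.

```python
from collections import defaultdict


def build_collections(membership):
    """Group logical IPMC groups whose member sets are identical.

    membership: dict mapping a logical group name to an iterable of process ids.
    Returns a list of (frozenset_of_members, sorted_group_names) pairs.
    """
    by_members = defaultdict(list)
    for group, procs in membership.items():
        # frozenset makes the member set hashable and order-independent.
        by_members[frozenset(procs)].append(group)
    return [(members, sorted(groups)) for members, groups in by_members.items()]


# Hypothetical example: L1 and L2 have the same members, so they share a collection.
membership = {
    "L1": {"p1", "p2"},
    "L2": {"p2", "p1"},
    "L3": {"p3"},
}
collections = build_collections(membership)
```

In this example three logical groups collapse into two collections, which is the kind of saving the mapping step can exploit.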

14
Acceptable Use Policy
  • Assume a higher-level network management tool
    compiles policy into primitives
  • Explicitly allow a process to use IPMC groups:
  • allow-join(process,logical IPMC)
  • allow-send(process,logical IPMC)
  • UDP multi-send always permitted
  • Additional constraints:
  • max-groups(process,limit)
  • force-udp(process,logical IPMC)
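
The four primitives can be pictured as a small lookup structure. The sketch below is one possible reading, not the MCMD implementation: the class `Policy`, its method names, and the rule that a sender without `allow-send` falls back to UDP multi-send (because multi-send is "always permitted") are assumptions made for illustration.

```python
class Policy:
    """Toy policy store for the four primitives on the slide (hypothetical API)."""

    def __init__(self):
        self.join_ok = set()      # (process, group) pairs granted allow-join
        self.send_ok = set()      # (process, group) pairs granted allow-send
        self.udp_only = set()     # (process, group) pairs under force-udp
        self.group_limit = {}     # process -> max-groups limit

    def allow_join(self, process, group):
        self.join_ok.add((process, group))

    def allow_send(self, process, group):
        self.send_ok.add((process, group))

    def force_udp(self, process, group):
        self.udp_only.add((process, group))

    def max_groups(self, process, limit):
        self.group_limit[process] = limit

    def may_join(self, process, group, joined):
        """joined: set of groups the process already belongs to."""
        if (process, group) not in self.join_ok:
            return False
        limit = self.group_limit.get(process)
        return limit is None or len(joined) < limit

    def transport(self, process, group):
        """Assumed rule: IPMC only if allowed and not forced to UDP."""
        key = (process, group)
        if key in self.send_ok and key not in self.udp_only:
            return "ipmc"
        return "udp-multisend"
```

A higher-level management tool would populate such a structure; the agent would then consult it on every join and when building the mapping.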

15
Overview
  • Library module
  • Mapping module
  • Gossip layer
  • Optimization questions
  • Results

16
MCMD Library Module
  • Transparent: overloads the IPMC functions
  • setsockopt(), send(), etc.
  • Translation: each logical IPMC group maps to a
    set of P-IPMC/unicast addresses
  • Two extremes
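
The translation step can be sketched as a table lookup followed by one send per physical destination. Everything named here (`resolve`, `send_logical`, `MAPPING`, the addresses) is hypothetical, and the two example entries reflect one reading of "two extremes": a single physical IPMC address versus unicast to every receiver. The slides do not spell the extremes out.

```python
def resolve(logical_group, mapping):
    """Return the physical destinations currently mapped to a logical group."""
    return list(mapping.get(logical_group, []))


def send_logical(sock, payload, logical_group, mapping):
    """Send payload to every physical destination of a logical group.

    sock is any object with a sendto(bytes, (host, port)) method,
    such as a UDP socket. Returns the number of sends issued.
    """
    sent = 0
    for dest in resolve(logical_group, mapping):
        sock.sendto(payload, dest)
        sent += 1
    return sent


# Hypothetical mapping table, keyed by logical (address, port).
MAPPING = {
    # One extreme (assumed): the logical group is backed by one physical IPMC address.
    ("239.1.1.1", 5000): [("239.0.0.7", 5000)],
    # Other extreme (assumed): no IPMC address, unicast to each receiver.
    ("239.1.1.2", 5000): [("10.0.0.11", 5000), ("10.0.0.12", 5000)],
}
```

Because the application still calls the standard interface, it never sees which of the two cases applies to its group.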

17
MCMD Mapping Role
  • MCMD agent runs on each machine
  • Contacted by the library modules
  • Provides a mapping
  • One agent is elected leader
  • Allocates IPMC resources according to the current
    policy

18
MCMD Mapping Role
  • Allocating IPMC resources: an optimization
    problem
[Figure: Procs, Collections, L-IPMC; "This box intentionally left BLACK"]
19
MCMD Gossip Layer
  • Runs system-wide as part of the agent
  • Automatic failure detection
  • Group membership fully replicated via gossip
  • Node reports its own state
  • Future: replicate more selectively
  • Leader runs optimization algorithm on data and
    reports the mapping
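
Full replication via gossip can be illustrated with a simple view merge: each node keeps what it knows about every other node and, on each exchange, adopts any newer entry from its peer. The per-node version counter and the name `merge` are assumptions for this sketch; the slides do not describe the actual message format.

```python
def merge(local_view, remote_view):
    """Merge a peer's view into ours, keeping the newest entry per node.

    Each view maps node id -> (version, frozenset of logical groups joined).
    A node increments its own version whenever its membership changes
    (an assumed convention for this sketch).
    """
    merged = dict(local_view)
    for node, (version, groups) in remote_view.items():
        if node not in merged or version > merged[node][0]:
            merged[node] = (version, groups)
    return merged


# Hypothetical exchange: the peer knows a newer state for n2 and an unknown node n3.
mine = {"n1": (3, frozenset({"L1"})), "n2": (1, frozenset())}
theirs = {"n2": (2, frozenset({"L2"})), "n3": (1, frozenset({"L1"}))}
view = merge(mine, theirs)
```

Repeating such exchanges with random peers eventually gives every agent, including the leader, the full membership picture.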

20
MCMD Gossip Layer
  • But gossip is slow...
  • Implications:
  • Slow propagation of group membership
  • Slow propagation of new maps
  • We assume a low rate of membership churn
  • Remedy: broadcast module
  • Leader broadcasts urgent messages
  • Bounded bandwidth of urgent channel
  • Trade-off between latency and scalability
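
One common way to bound the bandwidth of a channel is a token bucket. The slides do not say which mechanism MCMD uses, so the sketch below is a stand-in technique, and `UrgentChannel`, its parameters, and the idea that rejected messages simply wait for gossip are all assumptions.

```python
class UrgentChannel:
    """Token-bucket limiter for urgent broadcasts (illustrative only)."""

    def __init__(self, rate_bytes_per_sec, burst_bytes):
        self.rate = rate_bytes_per_sec
        self.capacity = burst_bytes
        self.tokens = float(burst_bytes)
        self.last = 0.0

    def try_send(self, size, now):
        """Return True if a message of `size` bytes fits the budget at time `now`."""
        # Refill tokens for the elapsed time, capped at the burst capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        # Over budget: the caller would leave this update to the slower gossip path.
        return False


channel = UrgentChannel(rate_bytes_per_sec=100, burst_bytes=200)
```

A larger budget lowers latency for urgent updates but costs more broadcast traffic as the system grows, which is the trade-off named on the slide.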

22
Optimization Questions
[Figure: Procs, Collections, L-IPMC, with a black box]
  • First step: compress logical IPMC groups

23
Optimization Questions
  • How compressible are subscriptions?
  • Multi-objective optimization:
  • Minimize number of collections
  • Minimize bandwidth overhead on network
  • Thm: The general problem is NP-complete
  • Thm: In uniform random allocation, "little"
    compression opportunity
  • Social preferences
  • Lots of duplicates due to replication (e.g. for
    load balancing)

24
Optimization Questions
  • Which collections get an IPMC address?
  • Thm: If collections are ordered by decreasing
    traffic × size and P-IPMC addresses are assigned
    greedily, bandwidth is minimized
  • Tiling heuristic:
  • Sort L-IPMC by traffic × size
  • Greedily collapse identical groups
  • Assign IPMC to collections in reverse order of
    traffic × size, UDP-multisend to the rest
  • Building tilings incrementally
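
The heuristic above can be sketched in a few lines. This is a simplified reading, not the authors' algorithm: it collapses identical groups first and then ranks the resulting collections by total traffic times member count, it ignores the incremental construction, and the function name `tile` and the address budget `num_ipmc` are hypothetical.

```python
from collections import defaultdict


def tile(groups, num_ipmc):
    """Pick which collections get a physical IPMC address.

    groups: dict mapping logical group name -> (members, traffic_rate).
    num_ipmc: number of physical IPMC addresses available (assumed budget).
    Returns (ipmc, unicast), each a list of collections given as lists of
    logical group names.
    """
    traffic = defaultdict(float)
    names = defaultdict(list)
    # Collapse logical groups with identical membership into one collection.
    for name, (members, rate) in groups.items():
        key = frozenset(members)
        traffic[key] += rate
        names[key].append(name)
    # Rank collections by traffic x size, heaviest first.
    ordered = sorted(traffic, key=lambda m: traffic[m] * len(m), reverse=True)
    ipmc = [sorted(names[m]) for m in ordered[:num_ipmc]]
    unicast = [sorted(names[m]) for m in ordered[num_ipmc:]]
    return ipmc, unicast


# Hypothetical workload: A and B share members; weights are 45, 20 and 8.
groups = {
    "A": ({"p1", "p2", "p3"}, 10.0),
    "B": ({"p1", "p2", "p3"}, 5.0),
    "C": ({"p4", "p5"}, 4.0),
    "D": ({"p6"}, 20.0),
}
ipmc, unicast = tile(groups, num_ipmc=2)
```

With two addresses available, the collection {A, B} and group D use IPMC, while C falls back to UDP multi-send.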

25
Experimental Results

26
Overhead (max. throughput)
  • Insignificant overhead when mapping L-IPMC to
    P-IPMC.

27
Overhead (CPU utilization)
  • Insignificant overhead when mapping L-IPMC to
    P-IPMC.

28
Network Overhead
  • Gossip layer uses constant background bandwidth;
    urgent channel behaves well

29
Latency
  • Latency of propagation of joins/leaves and new
    maps

30
Policy control
  • A malfunctioning node bombards an existing IPMC
    group.
  • MCMD policy prevents ill-effects
[Figure annotations: "Traffic starts", "New policy"]
32
Conclusion
  • IPMC has been a bad citizen...
  • Dr. Multicast has the cure!
  • Opportunity for big performance enhancements and
    policy control.

33
Thank you!
38
Overhead
  • Linux kernel module increases UDP-multisend
    throughput by 17 (compared to user-space
    UDP-multisend)

39
Latency of events
  • Gossip: 99% of nodes aware of change within 9
    epochs (now 1 sec)

40
Conclusions
  • Policy: allows data center operators to
    enable and control IPMC
  • Transparency: standard IPMC interface; system
    calls are overloaded
  • Performance: uses IPMC when possible, otherwise
    point-to-point UDP
  • Robustness: distributed, fault-tolerant service

41
Results
  • Library module:
  • Insignificant slowdown
  • Linux kernel module provides 17 speed-up for UDP
    multi-send

42
Optimization questions
[Figure: Users, Groups, Topics; "This box intentionally left BLACK"]
  • Multi-objective:
  • Minimize number of groups
  • Minimize bandwidth overhead on network
  • Thm: This problem is NP-complete
  • Reduction to Minimum Normal Set Basis

43
MCMD Library Layer
  • Overloads the IPMC functions
  • setsockopt(), send(), etc.
  • Translates logical IPMC addresses to physical
    IPMC, or point-to-point UDP packets depending on
    policy
  • Notifies MCMD immediately about joins/leaves
  • Learns about new mappings from MCMD
  • Keeps statistics about group traffic rates

44
MCMD Library Layer
  • Overloads the IPMC functions
  • setsockopt(), send(), etc.
  • Translates logical IPMC addresses to physical
    IPMC, or point-to-point UDP packets depending on
    policy
  • Caches translation maps
  • Maintains a connection to MCMD for updates
