Title: Dr. Multicast for Data Center Communication Scalability
1Dr. Multicast for Data Center Communication Scalability
HotNets, October 5, 2008
- Ymir Vigfusson, Hussam Abu-Libdeh, Mahesh Balakrishnan, Ken Birman
- Cornell University
- Yoav Tock
- IBM Research Haifa
2IP Multicast in Data Centers
- IPMC is not used in data centers
- Would speed up products that use multicast
4IP Multicast in Data Centers
- Why is IP multicast rarely used?
- Limited IPMC scalability on switches/routers and
NICs
- Broadcast storms: loss triggers a horde of NACKs, which triggers more loss, etc.
- Disruptive even to non-IPMC applications.
7IP Multicast in Data Centers
- IP multicast has a bad reputation
- Works great up to a point, after which it breaks catastrophically
9IP Multicast in Data Centers
- Bottom line
- Administrators have no control over multicast use ...
- Without control, they opt for never.
11Dr. Multicast
12Dr. Multicast (MCMD)
- Policy: Permits data center operators to selectively enable and control IPMC
- Transparency: Standard IPMC interface; system calls are overloaded.
- Performance: Uses IPMC when possible, otherwise point-to-point unicast
- Robustness: Distributed, fault-tolerant service
13Terminology
- Process: Application that joins logical IPMC groups
- Logical IPMC group: A virtualized abstraction
- Physical IPMC group: As usual
- UDP multi-send: New kernel-level system call
- Collection: Set of logical IPMC groups with identical membership
14Acceptable Use Policy
- Assume a higher-level network management tool compiles policy into primitives
- Explicitly allow a process to use IPMC groups
- allow-join(process,logical IPMC)
- allow-send(process,logical IPMC)
- UDP multi-send always permitted
- Additional restraints
- max-groups(process,limit)
- force-udp(process,logical IPMC)
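The four primitives above can be sketched as a small in-memory policy store. This is only an illustration under stated assumptions: the class name `AcceptableUsePolicy`, the `may_join` and `transport` helpers, and the return strings are hypothetical, and the sketch reads "UDP multi-send always permitted" as meaning that the unicast fallback is always an available transport for a permitted send.

```python
from dataclasses import dataclass, field


@dataclass
class AcceptableUsePolicy:
    """Toy policy store; method names mirror the slide's primitives."""
    join_allowed: set = field(default_factory=set)   # (process, logical group)
    send_allowed: set = field(default_factory=set)   # (process, logical group)
    udp_forced: set = field(default_factory=set)     # (process, logical group)
    group_limit: dict = field(default_factory=dict)  # process -> max groups

    def allow_join(self, process, group):
        self.join_allowed.add((process, group))

    def allow_send(self, process, group):
        self.send_allowed.add((process, group))

    def max_groups(self, process, limit):
        self.group_limit[process] = limit

    def force_udp(self, process, group):
        self.udp_forced.add((process, group))

    def may_join(self, process, group, joined_so_far):
        """A join needs explicit permission and must respect max-groups."""
        if (process, group) not in self.join_allowed:
            return False
        limit = self.group_limit.get(process)
        return limit is None or joined_so_far < limit

    def transport(self, process, group):
        """Return 'deny', 'udp-multisend', or 'ipmc-eligible' for a send."""
        if (process, group) not in self.send_allowed:
            return "deny"
        if (process, group) in self.udp_forced:
            # Assumption: force-udp pins the pair to the unicast fallback.
            return "udp-multisend"
        return "ipmc-eligible"
```

For example, after `policy.allow_send("p1", "G2")` and `policy.force_udp("p1", "G2")`, a call to `policy.transport("p1", "G2")` reports that the send must go out as UDP multi-send.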
15Overview
- Library module
- Mapping module
- Gossip layer
- Optimization questions
- Results
16MCMD Library Module
- Transparent. Overloads the IPMC functions
- setsockopt(), send(), etc.
- Translation. Each logical IPMC group maps to a set of P-IPMC/unicast addresses.
- Two extremes
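The translation step can be pictured as a lookup followed by one datagram per mapped address. The sketch below is a user-space illustration only, not the MCMD library: `send_logical`, the `mapping` dictionary, and the example addresses are hypothetical. The two extremes then correspond to a logical group mapped to a single physical IPMC address and a logical group mapped to one unicast address per receiver.

```python
def send_logical(sock, payload, logical_group, mapping):
    """Send payload to every physical address the logical group maps to.

    sock    -- any object with a sendto(bytes, (host, port)) method,
               such as a UDP socket from the standard socket module
    mapping -- dict: logical (host, port) -> list of physical (host, port)
    Returns the number of datagrams sent.
    """
    targets = mapping.get(logical_group)
    if targets is None:
        raise KeyError(f"no mapping for logical group {logical_group!r}")
    sent = 0
    for addr in targets:
        sock.sendto(payload, addr)
        sent += 1
    return sent


# Illustrative mapping covering both extremes (addresses are made up).
EXAMPLE_MAPPING = {
    # Extreme 1: the logical group is backed by one physical IPMC address.
    ("239.1.0.1", 5000): [("239.200.0.7", 5000)],
    # Extreme 2: no physical IPMC address, one unicast send per receiver.
    ("239.1.0.2", 5000): [("10.0.0.11", 5000), ("10.0.0.12", 5000),
                          ("10.0.0.13", 5000)],
}
```

An application keeps sending to the logical address; only the library's view of `mapping` changes when a new mapping arrives.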
17MCMD Mapping Role
- MCMD Agent runs on each machine
- Contacted by the library modules
- Provides a mapping
- One agent elected to be a leader
- Allocates IPMC resources according to the current policy
18MCMD Mapping Role
- Allocating IPMC resources: An optimization problem
(Figure: Procs to L-IPMC mapping, and Procs to Collections to L-IPMC mapping; one box is labeled "This box intentionally left BLACK")
19MCMD Gossip Layer
- Runs system-wide as part of the agent
- Automatic failure detection
- Group membership fully replicated via gossip
- Node reports its own state
- Future: Replicate more selectively
- Leader runs optimization algorithm on data and reports the mapping
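Full replication via gossip can be illustrated with a tiny simulation. The slides do not specify the gossip protocol, so this sketch assumes a simple push scheme in which each node sends its whole view to one random peer per round and the receiver keeps the higher-versioned entry; `gossip_round` and `replicate` are hypothetical names.

```python
import random


def gossip_round(views, rng):
    """One push round: every node sends its whole view to one random peer."""
    nodes = list(views)  # needs at least two nodes
    for sender in nodes:
        peer = rng.choice([n for n in nodes if n != sender])
        for origin, (version, groups) in list(views[sender].items()):
            known = views[peer].get(origin)
            # Keep the newest state reported by each origin node.
            if known is None or known[0] < version:
                views[peer][origin] = (version, groups)


def replicate(membership, seed=0, max_rounds=100):
    """Gossip until every node holds every node's state.

    membership -- dict: node -> set of logical groups it has joined
    Returns (rounds_used, views); rounds_used is None if not converged.
    """
    rng = random.Random(seed)
    # Each node starts out knowing only its own state, at version 1.
    views = {n: {n: (1, frozenset(g))} for n, g in membership.items()}
    for rounds in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(len(v) == len(membership) for v in views.values()):
            return rounds, views
    return None, views
```

Once the views have converged, any node, including the elected leader, holds the complete membership table that the optimization step needs as input.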
20MCMD Gossip Layer
- But gossip is slow...
- Implications
- Slow propagation of group membership
- Slow propagation of new maps
- We assume a low rate of membership churn
- Remedy: Broadcast module
- Leader broadcasts urgent messages
- Bounded bandwidth of urgent channel
- Trade-off between latency and scalability
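One way to picture the bounded urgent channel is a per-epoch message budget on the leader. The slides do not say how the bound is enforced, so everything here is an assumption: the class `UrgentChannel`, the per-epoch refill, and the choice to leave over-budget messages to the slower gossip path.

```python
class UrgentChannel:
    """Leader-side broadcast channel with a fixed per-epoch message budget."""

    def __init__(self, budget_per_epoch):
        self.budget_per_epoch = budget_per_epoch
        self.remaining = budget_per_epoch
        self.broadcast_log = []   # messages sent on the fast path
        self.gossip_queue = []    # messages left to the slow gossip path

    def new_epoch(self):
        """Refill the budget at the start of each epoch."""
        self.remaining = self.budget_per_epoch

    def publish(self, message):
        """Broadcast if budget remains, else fall back to gossip."""
        if self.remaining > 0:
            self.remaining -= 1
            self.broadcast_log.append(message)
            return "broadcast"
        # Assumption: over-budget messages still spread, just more slowly.
        self.gossip_queue.append(message)
        return "gossip"
```

A larger budget lowers latency for membership changes but costs more broadcast bandwidth as the system grows, which is the trade-off named on the slide.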
22Optimization Questions
(Figure: Procs to L-IPMC mapping, and Procs to Collections to L-IPMC mapping through a box labeled "BLACK")
- First step: compress logical IPMC groups
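Since a collection is a set of logical IPMC groups with identical membership, this first compression step amounts to grouping by member set. A minimal sketch, with `build_collections` and the input shape chosen here for illustration:

```python
from collections import defaultdict


def build_collections(subscriptions):
    """Group logical IPMC groups whose member sets are identical.

    subscriptions -- dict: logical group name -> iterable of member processes
    Returns dict: frozenset of members -> sorted list of group names.
    """
    by_members = defaultdict(list)
    for group, members in subscriptions.items():
        # frozenset makes the member set hashable and order-independent.
        by_members[frozenset(members)].append(group)
    return {members: sorted(groups) for members, groups in by_members.items()}
```

Every logical group in one collection can share a single physical address without any receiver getting traffic it did not subscribe to.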
23Optimization Questions
- How compressible are subscriptions?
- Multi-objective optimization
- Minimize number of collections
- Minimize bandwidth overhead on network
- Thm: The general problem is NP-complete
- Thm: In uniform random allocation, "little" compression opportunity.
- Social preferences
- Lots of duplicates due to replication (e.g. for load balancing)
24Optimization Questions
- Which collections get an IPMC address?
- Thm: Assigning P-IPMC addresses greedily in order of decreasing traffic*size minimizes bandwidth.
- Tiling heuristic
- Sort L-IPMC by traffic*size
- Greedily collapse identical groups
- Assign IPMC to collections in reverse order of traffic*size, UDP-multisend to the rest
- Building tilings incrementally
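The three steps of the tiling heuristic can be sketched as follows. This is a hedged illustration rather than the authors' implementation: it reads the sort key as the product of a group's traffic rate and its member count, sums the rates of groups merged into one collection, and uses the hypothetical names `tile`, `ipmc_map`, and `unicast`.

```python
def tile(groups, ipmc_addresses):
    """Greedy tiling sketch.

    groups         -- dict: name -> (set of members, traffic rate)
    ipmc_addresses -- list of available physical IPMC addresses
    Returns (ipmc_map, unicast): address -> group names, and the names
    of groups left to UDP multi-send.
    """
    # Step 1: sort logical groups by traffic * size, largest first.
    ordered = sorted(groups.items(),
                     key=lambda kv: kv[1][1] * len(kv[1][0]), reverse=True)
    # Step 2: collapse groups with identical membership into collections.
    collections = {}
    for name, (members, rate) in ordered:
        entry = collections.setdefault(frozenset(members),
                                       {"groups": [], "rate": 0.0})
        entry["groups"].append(name)
        entry["rate"] += rate  # assumption: a collection's rate is the sum
    # Step 3: hand out the scarce addresses by decreasing traffic * size.
    ranked = sorted(collections.items(),
                    key=lambda kv: kv[1]["rate"] * len(kv[0]), reverse=True)
    ipmc_map, unicast = {}, []
    for (members, entry), addr in zip(ranked, ipmc_addresses):
        ipmc_map[addr] = entry["groups"]
    for members, entry in ranked[len(ipmc_addresses):]:
        unicast.extend(entry["groups"])
    return ipmc_map, unicast
```

With more collections than addresses, the low-traffic tail falls back to UDP multi-send, where the per-receiver cost of unicast matters least.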
25Experimental Results
26Overhead (max. throughput)
- Insignificant overhead when mapping L-IPMC to P-IPMC.
27Overhead (CPU utilization)
- Insignificant overhead when mapping L-IPMC to P-IPMC.
28Network Overhead
- Gossip layer uses constant background bandwidth; urgent channel behaves well
29Latency
- Latency of propagation of joins/leaves and new maps
30Policy control
- A malfunctioning node bombards an existing IPMC group.
- MCMD policy prevents ill-effects
(Figure annotations: "Traffic starts", "New policy")
31Conclusion
- IPMC has been a bad citizen...
- Dr. Multicast has the cure!
- Opportunity for big performance enhancements and
policy control.
33Thank you!
38Overhead
- Linux kernel module increases UDP-multisend throughput by 17% (compared to user-space UDP-multisend)
39Latency of events
- Gossip: 99% of nodes aware of change within 9 epochs (now 1 sec)
40Conclusions
- Policy: Allows data center operators to enable and control IPMC
- Transparency: Standard IPMC interface; system calls are overloaded.
- Performance: Uses IPMC when possible, otherwise point-to-point UDP
- Robustness: Distributed, fault-tolerant service
41Results
- Library Module
- Insignificant slowdown
- Linux Kernel module provides 17% speed-up for UDP multi-send
42Optimization questions
(Figure: Users to Topics mapping, and Users to Groups to Topics mapping; one box is labeled "This box intentionally left BLACK")
- Multi-objective
- Minimize number of groups
- Minimize bandwidth overhead on network
- Thm: This problem is NP-complete
- Reduction to Minimum Normal Set Basis
43MCMD Library Layer
- Overloads the IPMC functions
- setsockopt(), send(), etc.
- Translates logical IPMC addresses to physical IPMC, or point-to-point UDP packets depending on policy
- Notifies MCMD immediately about joins/leaves
- Learns about new mappings from MCMD
- Keeps statistics about group traffic rates
44MCMD Library Layer
- Overloads the IPMC functions
- setsockopt(), send(), etc.
- Translates logical IPMC addresses to physical IPMC, or point-to-point UDP packets depending on policy
- Caches translation maps
- Maintains a connection to MCMD for updates