Title: Slingshot: Time-Critical Multicast for Clustered Applications
1. Slingshot: Time-Critical Multicast for Clustered Applications
- Mahesh Balakrishnan
- Stefan Pleisch
- Ken Birman
- Cornell University
2. The Contemporary Datacenter
- Building-wide super-clusters: 1000s of commodity blade-servers
- Typically used as commercial website back-ends: Amazon, etc.
- Software paradigms: SOA, Eventing, Publish/Subscribe
- Many-to-many communication: Multicast!
3. Multicast in the Datacenter
- IP Multicast is available; adding reliability to it is a well-researched technology
- Scalability dimensions
  - Number of receivers
  - Number of senders?
  - Number of groups?
- Metrics
  - Throughput
  - Timeliness?
4. Time-Critical Applications
- Dealing in perishable data: stock quotes, location updates
- Willing to trade complete reliability for timeliness
- Requiring tunable reliability/timeliness/overhead tradeoffs
- Probabilistic guarantee of timeliness?
  - For x% overhead, y% of lost packets are recovered in time t.
  - Remainder can optionally be recovered in time t'.
5. Design Space
- Reactive vs. Proactive
- Reactive: Loss Discovery
  - ACK
  - Sender-Based Sequencing
    - If the multicast rate in a group is constant, the inter-multicast time at any sender goes up linearly with the number of senders
  - Gossip: Scalable
- Proactive: FEC, Tunable
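The sender-based sequencing point can be checked with a little arithmetic. The sketch below assumes the group rate is split evenly across senders; the function name is illustrative and not part of Slingshot.

```python
def inter_multicast_time_ms(group_rate_pps: float, num_senders: int) -> float:
    """Average gap between two multicasts from the same sender, in ms.

    Assumes the constant group rate is shared equally by all senders.
    """
    per_sender_rate_pps = group_rate_pps / num_senders
    return 1000.0 / per_sender_rate_pps


# The gap grows linearly with the number of senders at a fixed group rate.
for senders in (1, 8, 64):
    print(senders, inter_multicast_time_ms(1000, senders))
```

A receiver that detects loss from gaps in a sender's sequence numbers has to wait for that sender's next packet, so the detection delay grows with this gap.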
6. Slingshot Overview
Receiver-Based FEC
Senders send initially via unreliable IP Multicast
Phase 1: Receivers repair losses by proactively sending each other FEC repair packets
Phase 2: Remaining losses are recovered from the sender
Each receiver sends an error correction (XOR) packet, computed over the last r packets it received, to c randomly selected receivers
Rate-of-fire parameter (r, c): allows tuning of the overhead-timeliness tradeoff
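As a rough illustration of how (r, c) drives overhead: a receiver emits c repair packets for every r data packets it receives, so the packet-count overhead is about c/r. The sketch below counts packets only (not bytes or headers), and the (r, c) values are hypothetical, not the ones used in the evaluation.

```python
def repair_overhead(r: int, c: int) -> float:
    """Repair packets sent per data packet received (packet-count ratio)."""
    if r <= 0 or c < 0:
        raise ValueError("r must be positive and c non-negative")
    return c / r


# Hypothetical settings: larger c or smaller r means more repair traffic.
for r, c in [(16, 1), (8, 2), (4, 2)]:
    print(f"r={r}, c={c}: overhead ~ {repair_overhead(r, c):.0%}")
```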
7. Protocol Details 0
- Data Packet
  - Packet ID: (Sender, SeqNo)
  - Application Payload (Application MTU: 1024)
- Repair Packet
  - List of Data Packet IDs: (sender1, seqno1), (sender2, seqno2), ...
  - XOR of Data Packets
- Packet size: less than network MTU
- Terminology: data packets are "included" in a repair packet
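A minimal sketch of the two packet layouts, under assumptions not stated on the slide: the application MTU is taken to be 1024 bytes, and payloads are zero-padded to that size before XORing. All class and function names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple

PacketId = Tuple[str, int]  # (sender, seqno)
APP_MTU = 1024              # from the slide; the unit (bytes) is an assumption


@dataclass(frozen=True)
class DataPacket:
    packet_id: PacketId
    payload: bytes


@dataclass(frozen=True)
class RepairPacket:
    included: List[PacketId]  # IDs of the data packets folded into `xor`
    xor: bytes                # XOR of the (padded) included payloads


def pad(payload: bytes) -> bytes:
    """Zero-pad a payload to APP_MTU so all XOR operands have equal length."""
    if len(payload) > APP_MTU:
        raise ValueError("payload exceeds application MTU")
    return payload.ljust(APP_MTU, b"\x00")


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_repair(packets: List[DataPacket]) -> RepairPacket:
    """Build a repair packet that 'includes' the given data packets."""
    acc = bytes(APP_MTU)
    for p in packets:
        acc = xor_bytes(acc, pad(p.payload))
    return RepairPacket([p.packet_id for p in packets], acc)
```

With a fixed payload size, the repair packet is one payload plus a short ID list, which is how it can stay under the network MTU.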
8. Protocol Details 1
- Data Structures
  - Data Buffer: received data packets
  - Repair Bin: pointers to last <r data packets
- Arrival of Data Packet dp at Receiver
  - dp is added to the data buffer
  - dp is added to the repair bin
  - If repair bin size equals r, a repair packet rp is created from its contents, and the repair bin is cleared
  - rp is dispatched to c random receivers
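The data-packet arrival steps can be sketched as follows. This is not the authors' implementation: the class name, the `send_repair` callback, and the zero-padding of shorter payloads are assumptions made for illustration.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

PacketId = Tuple[str, int]  # (sender, seqno)


class ReceiverDataPath:
    def __init__(self, r: int, c: int, peers: List[str],
                 send_repair: Callable[[str, List[PacketId], bytes], None],
                 rng: Optional[random.Random] = None) -> None:
        self.r, self.c, self.peers = r, c, peers
        self.send_repair = send_repair  # hypothetical network hook
        self.rng = rng or random.Random()
        self.data_buffer: Dict[PacketId, bytes] = {}  # received data packets
        self.repair_bin: List[PacketId] = []          # pointers into the buffer

    def on_data(self, pid: PacketId, payload: bytes) -> None:
        self.data_buffer[pid] = payload
        self.repair_bin.append(pid)
        if len(self.repair_bin) == self.r:
            ids = list(self.repair_bin)
            xor = self._xor_of(ids)
            self.repair_bin.clear()
            # Dispatch the same repair packet to c distinct random receivers.
            k = min(self.c, len(self.peers))
            for peer in self.rng.sample(self.peers, k):
                self.send_repair(peer, ids, xor)

    def _xor_of(self, ids: List[PacketId]) -> bytes:
        # Shorter payloads are treated as zero-padded (an assumption).
        size = max(len(self.data_buffer[i]) for i in ids)
        acc = bytearray(size)
        for i in ids:
            for pos, byte in enumerate(self.data_buffer[i]):
                acc[pos] ^= byte
        return bytes(acc)
```

Keeping only packet IDs in the repair bin mirrors the "pointers" wording on the slide: the payloads live once, in the data buffer.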
9. Protocol Details 2
- Arrival of Repair Packet rp at Receiver: if the number of missing included data packets is
  - 0: rp is discarded
  - 1: the missing data packet is recovered by XORing rp with the other r-1 data packets
  - >1: rp is stored in a special buffer, in case future data packet arrivals and recoveries make it usable
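The three cases for an arriving repair packet can be sketched as below. The names are hypothetical, and the sketch assumes all payloads have the same length (for example, padded to the application MTU). Stored repair packets are retried whenever a new data packet arrives or is recovered.

```python
from typing import Dict, List, Tuple

PacketId = Tuple[str, int]  # (sender, seqno)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class RepairHandler:
    def __init__(self) -> None:
        self.data_buffer: Dict[PacketId, bytes] = {}
        # The "special buffer": repair packets that cannot be used yet.
        self.pending: List[Tuple[List[PacketId], bytes]] = []

    def on_repair(self, included: List[PacketId], xor: bytes) -> str:
        missing = [pid for pid in included if pid not in self.data_buffer]
        if not missing:
            return "discarded"            # 0 missing: nothing to repair
        if len(missing) > 1:
            self.pending.append((included, xor))
            return "stored"               # >1 missing: keep for later
        # Exactly 1 missing: XOR out the r-1 packets we already hold.
        recovered = xor
        for pid in included:
            if pid != missing[0]:
                recovered = xor_bytes(recovered, self.data_buffer[pid])
        self.on_data(missing[0], recovered)
        return "recovered"

    def on_data(self, pid: PacketId, payload: bytes) -> None:
        self.data_buffer[pid] = payload
        # A new packet may make stored repair packets usable; retry them.
        pending, self.pending = self.pending, []
        for included, xor in pending:
            self.on_repair(included, xor)
```

Retrying on every arrival is the simplest policy; an index from packet ID to the pending repair packets that include it would avoid rescanning the whole buffer.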
10. Evaluation Setup
- 64-node rack-style cluster at Cornell
- Loss rate fixed at 1%; packets dropped at end buffers
- All nodes send and receive
- Inter-node latencies: 50-100 microseconds
- Group data rate: 1000 packets per second
- Each node multicasts about 15.6 packets per second (1000/64), i.e. one packet every 64 milliseconds
11. Slingshot Tunability
For 27% overhead, 93.5% of lost packets are recovered at an average of 3.5 milliseconds
Example tradeoff points between overhead, timeliness, and reliability
[Figure: overhead and recovered packets plotted on left y-axis, recovery time on right y-axis]
12. Slingshot vs SRM
Slingshot recovers 93% in 10 ms, 97% in 25 ms
Fastest SRM packet recovery is 2.2 seconds; 93% in 4.85 seconds, 97% in 5.1 seconds
2-3 orders of magnitude faster
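The "orders of magnitude" claim can be checked directly from the recovery times quoted on this slide. The sketch assumes the 93 and 97 figures are percentages of lost packets recovered; the function name is illustrative.

```python
import math


def orders_of_magnitude(slingshot_s: float, srm_s: float) -> float:
    """log10 of the SRM / Slingshot recovery-time ratio."""
    return math.log10(srm_s / slingshot_s)


# (Slingshot seconds, SRM seconds) at each recovery level from the slide.
points = {93: (0.010, 4.85), 97: (0.025, 5.1)}
for level, (slingshot_s, srm_s) in points.items():
    print(level, round(orders_of_magnitude(slingshot_s, srm_s), 2))
```

Both ratios fall between 2 and 3 orders of magnitude, in line with the slide's summary.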
13. Slingshot Scalability: Group Size
Simulation Results
Gossip-style scalability: insensitive to scale beyond a certain size
14. Conclusion
- Slingshot provides a tunable, probabilistic guarantee of timeliness
- Outperforms SRM by 2 orders of magnitude in a 64-node system
- Insensitive to number of senders
- Future Work
  - Achieve scalability in other dimensions (number of groups)
  - Build a time-critical middleware layer that uses Slingshot as a generic primitive