Title: Informed Content Delivery Across Adaptive Overlay Networks
1Informed Content Delivery Across Adaptive
Overlay Networks
John Byers
Dept. of Computer Science, Boston University
www.cs.bu.edu/byers
Joint work with Jeffrey Considine, Michael Mitzenmacher and Stanislav Rost
2Overlays for Content Delivery
- Build distribution topology out of unicast connections (tunnels).
- Requires active participation of end-systems.
- Native IP multicast unnecessary.
- Saves considerable bandwidth over N unicast solution.
- Basic paradigm easy to build and deploy.
3Use of Overlays
- Killer apps
  - Millions of users want to download a new movie or watch the SIGCOMM technical sessions.
  - CDNs want to populate thousands of servers with new movies for those users.
- Research directions to date
  - Considerable effort on optimizing overlay layout (Narada, Overcast, RON, etc.).
  - Scalable solutions for indexing/locating content using overlays (CAN, Chord, etc.).
- Our focus
  - Maximize throughput of large transfers across overlays.
4Limitations of Existing Schemes
- Tree-like topologies
  - Rooted in history (IP Multicast)
  - Limitations
    - bandwidth decreases monotonically from the source
    - losses increase monotonically along a path
- Does this matter in practice?
  - Anecdotal and experimental evidence says yes
    - Downloads from multiple mirror sites in parallel [BLM 99, RKB 00].
    - Availability of better routes [SCHSA 99, ABKM 01].
    - Peer-to-peer: Morpheus, Kazaa and Grokster.
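The monotonic bandwidth limitation of trees can be illustrated with a minimal sketch. It assumes a hypothetical tree described by child-to-parent links with per-link capacities (the names and numbers are illustrative, not from the talk): a node can receive no faster than the slowest link on its path from the source.

```python
def tree_rates(parent, capacity, root):
    """Best-case receive rate of every node in a distribution tree.

    parent[n]   -> upstream node of n (hypothetical example topology)
    capacity[n] -> bandwidth of the link parent[n] -> n
    """
    rates = {root: float("inf")}  # the source holds the full content

    def rate(node):
        if node not in rates:
            # A node cannot receive faster than its parent does,
            # nor faster than its own incoming link allows.
            rates[node] = min(rate(parent[node]), capacity[node])
        return rates[node]

    for node in parent:
        rate(node)
    return rates


# Chain src -> a -> b -> c with one slow link in the middle.
parent = {"a": "src", "b": "a", "c": "b"}
capacity = {"a": 10, "b": 3, "c": 8}
rates = tree_rates(parent, capacity, "src")
```

Here node c sits behind an 8-unit link but still receives at most 3 units, the rate of the bottleneck above it. Extra links to other peers are the only way around that bottleneck.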
5An Illustrative Example
[Figure: 1. A basic tree topology.]
6Our Philosophy
- Go beyond trees.
- Use additional links and bandwidth by
  - downloading from multiple peers in parallel
  - taking advantage of perpendicular bandwidth
- Has potential to significantly speed up downloads
- But only effective if
  - collaboration is carefully orchestrated
  - methods are amenable to frequent adaptation of the overlay topology
7Suitable Applications
- Prerequisite conditions
  - Available bandwidth between peers.
  - Differences in content received by peers.
  - Rich overlay topology.
- Applications
  - Downloads of large, popular files.
  - Video-on-demand or nearly real-time streams.
  - Shared virtual environments.
8Erasure Codes
- We typically think of data as an ordered stream: "I need packets 1-1,000."
- Using erasure codes, data is like water
  - Can generate a pool of redundant data from the full original content.
  - You don't care what droplets you get.
  - You don't care if some spills.
  - You just want enough to get through the pipe: "I need any 1,000 packets."
- The digital fountain model [BLMR 98] is ideal for use in a fluid overlay environment.
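The "any 1,000 packets" idea can be sketched with a toy random linear code over GF(2). This is an illustrative stand-in, not the specific erasure code behind the digital fountain model: each encoded symbol is the XOR of a random subset of source blocks, and the receiver decodes by Gaussian elimination once it holds enough independent symbols, which in practice is slightly more than k. Blocks are modeled as Python integers, and all function names are hypothetical.

```python
import random

def encode_symbol(blocks, rng):
    """Return (mask, payload): mask is a k-bit int selecting blocks, payload their XOR."""
    k = len(blocks)
    mask = rng.getrandbits(k) or 1  # avoid the useless all-zero combination
    payload = 0
    for i in range(k):
        if mask >> i & 1:
            payload ^= blocks[i]
    return mask, payload

def decode(symbols, k):
    """Gaussian elimination over GF(2); returns the k blocks, or None if rank < k."""
    pivots = {}  # highest set bit -> (mask, payload) with that leading bit
    for mask, payload in symbols:
        while mask:
            top = mask.bit_length() - 1
            if top not in pivots:
                pivots[top] = (mask, payload)  # a new independent equation
                break
            pmask, ppayload = pivots[top]
            mask ^= pmask          # cancel the leading bit and keep reducing
            payload ^= ppayload
    if len(pivots) < k:
        return None                # not enough independent symbols yet
    blocks = [0] * k
    for top in range(k):           # back-substitute from the lowest block upward
        mask, payload = pivots[top]
        for low in range(top):
            if mask >> low & 1:
                payload ^= blocks[low]
        blocks[top] = payload
    return blocks

rng = random.Random(7)
k = 16
blocks = [rng.getrandbits(32) for _ in range(k)]
stream = [encode_symbol(blocks, rng) for _ in range(60)]
# Lose every third symbol; which ones arrive does not matter, only how many.
received = [s for i, s in enumerate(stream) if i % 3 != 0]
recovered = decode(received, k)
```

The receiver never asks for a particular symbol. It keeps collecting until the elimination reaches full rank, regardless of loss or arrival order.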
9Erasure Codes Offer Freedom
- Intrinsic resilience to packet loss, reordering.
- Better support for transient connections via stateless migration, suspension.
- Peers with full content can always generate useful symbols.
- Peers with partial content are more likely to have content to share.
- But using erasure codes comes at a price
  - Content is no longer an ordered stream.
  - Therefore, collaboration is more difficult.
10Informed Content Delivery: Definitions and Problem Statement
- Peers A and B have working sets of symbols S_A, S_B drawn from a large universe U and want to collaborate effectively.
- Key components
  - Summarize: Furnish a concise and useful sample of a working set to a peer.
  - Approximately Reconcile: Compute as many elements in S_A - S_B as possible and transmit them.
  - Do so with minimal control messaging overhead.
11Min-Wise Summaries
- Problem: Neighboring peers may have similar content.
- Solution: Give peers a "calling card" (fits in 1 packet) that summarizes the content they have, so that a neighbor can check similarity.
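A minimal sketch of such a calling card, assuming symbols are identified by integer IDs and that both peers share the same hash functions. Random linear maps modulo a prime stand in for random permutations here; that is an illustrative simplification, and all names and parameters are hypothetical. The fraction of positions where two summaries agree estimates the resemblance of the underlying sets (the size of their intersection divided by the size of their union).

```python
import random

P = (1 << 61) - 1  # a Mersenne prime; symbol IDs are assumed to be below it

def make_hashes(n, rng):
    """n random linear maps x -> (a*x + b) mod P, standing in for random permutations."""
    return [(rng.randrange(1, P), rng.randrange(P)) for _ in range(n)]

def summarize(working_set, hashes):
    """Min-wise summary: the smallest hash value of the set under each map."""
    return [min((a * x + b) % P for x in working_set) for a, b in hashes]

def estimate_resemblance(summary_a, summary_b):
    """Fraction of positions where the two minima agree."""
    matches = sum(1 for u, v in zip(summary_a, summary_b) if u == v)
    return matches / len(summary_a)

rng = random.Random(42)
hashes = make_hashes(128, rng)          # 128 values of 8 bytes each, about 1 KB
ids = rng.sample(range(1 << 32), 900)   # hypothetical symbol IDs
set_a, set_b = set(ids[:600]), set(ids[300:])   # true resemblance is 300/900
estimate = estimate_resemblance(summarize(set_a, hashes), summarize(set_b, hashes))
```

The summary size depends only on the number of hash functions, not on the size of the working set, which is what lets it fit in a single packet.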
12Recoding
- Problem: What to transmit when peers have similar content?
- Solution: Allow peers to probabilistically "hedge their bets", minimizing the chance of transmitting useless content.
- Example
  - Suppose the resemblance between S_A and S_B is 0.9. If A sends a symbol at random, the probability of it being useful to B is 0.1.
  - A better strategy is to XOR 10 random symbols together.
  - B can extract one useful symbol when exactly one of the 10 is new to B, which happens with probability 10 × (1/10) × (9/10)^9 ≈ 1/e ≈ 0.37.
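The arithmetic and the extraction step in the example can be sketched as follows. This is a minimal illustration under stated assumptions: payloads are modeled as integers, each constituent symbol is independently new to B with probability p_new, and the function names are hypothetical.

```python
import random
from functools import reduce

def useful_probability(degree, p_new):
    """Chance that exactly one of `degree` XORed symbols is new to the receiver."""
    return degree * p_new * (1 - p_new) ** (degree - 1)

def recode(symbols, degree, rng):
    """XOR `degree` random symbols; symbols maps symbol ID -> payload (an int here)."""
    ids = rng.sample(sorted(symbols), degree)
    return ids, reduce(lambda x, y: x ^ y, (symbols[i] for i in ids))

def extract(recoded, known):
    """Return (id, payload) if exactly one constituent is unknown, else None."""
    ids, payload = recoded
    missing = [i for i in ids if i not in known]
    if len(missing) != 1:
        return None  # nothing new, or too many unknowns to separate
    for i in ids:
        if i in known:
            payload ^= known[i]  # cancel everything the receiver already holds
    return missing[0], payload

rng = random.Random(0)
a_symbols = {i: rng.getrandbits(32) for i in range(100)}
b_symbols = {i: a_symbols[i] for i in range(90)}  # B already holds 90 of A's 100
packet = recode(a_symbols, 10, rng)
result = extract(packet, b_symbols)  # a (id, payload) pair, or None
```

With p_new = 0.1, `useful_probability(1, 0.1)` is 0.1 while `useful_probability(10, 0.1)` is about 0.387, matching the slide's comparison of a plain symbol against a degree-10 recoded symbol.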
13 Approximate Reconciliation Trees
- Problem: Collaborating peers have overlapping content.
- Solution: Efficient data structures for reconciliation.
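To make approximate reconciliation concrete, the sketch below uses a plain Bloom filter as a simpler stand-in for an approximate reconciliation tree; the tree structure itself is not reproduced here. B sends a compact digest of its set, and A transmits only the symbols that fail the membership test. Symbol IDs are assumed to be non-negative integers below 2^64, and the class, function names, and sizes are hypothetical.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes  # assumed to be below 256 (one salt byte)
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        data = item.to_bytes(8, "big")
        for j in range(self.num_hashes):
            # A different salt per index gives independent-looking hash functions.
            digest = hashlib.blake2b(data, digest_size=8, salt=bytes([j])).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] >> (pos % 8) & 1
                   for pos in self._positions(item))

def reconcile(set_a, digest_of_b):
    """Symbols of A that B's digest says B lacks (an approximation of S_A - S_B)."""
    return {x for x in set_a if x not in digest_of_b}

set_b = set(range(0, 1000))
set_a = set(range(500, 1500))
digest = BloomFilter(num_bits=16384, num_hashes=4)  # a 2 KB digest for 1,000 symbols
for x in set_b:
    digest.add(x)
to_send = reconcile(set_a, digest)
```

A Bloom filter has no false negatives, so every symbol A sends is genuinely new to B. False positives mean a few useful symbols may be withheld, which is why the reconciliation is only approximate.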
14Experimental Scenarios
- Three methods for collaboration
  - Uninformed: A transmits symbols at random to B.
  - Speculative: B transmits a min-wise summary to A; A then sends recoded symbols to B.
  - Reconciled: B transmits a digest of its set to A; A then sends packets from the set difference.
- Overhead
  - Decoding overhead: with erasure codes, fixed 2.5%.
  - Reception overhead: useless duplicate packets.
  - Recoding overhead: useless recoding packets.
15Pairwise Reconciliation
128MB file, 96K input symbols, 115K distinct symbols in system initially.
16Four peers in parallel
128MB file, 96K input symbols, 105K distinct symbols in system initially.
17Four peers, periodic updates
128MB file, 96K input symbols, 105K distinct symbols in system initially.
Digests updated at every 10%.
18Conclusions
- Even with ultimate routing topology optimization, the choice of what to send is paramount to content delivery.
- Digital fountain model ideal for fluid and ephemeral network environments.
- Richly connected topologies are key to harnessing perpendicular bandwidth.
- Wanted: more algorithms for intelligent collaboration.