Towards a Scalable, Adaptive and Network-aware Content Distribution Network



1
Towards a Scalable, Adaptive and Network-aware
Content Distribution Network
Yan Chen, EECS Department, UC Berkeley
2
Outline
  • Motivation and Challenges
  • Our Contributions: SCAN system
  • Case Study: Tomography-based overlay network
    monitoring system
  • Conclusions

3
Motivation
  • The Internet has evolved to become a commercial
    infrastructure for service delivery
  • Web delivery, VoIP, streaming media
  • Challenges for Internet-scale services
  • Scalability: 600M users, 35M Web sites, 2.1 Tb/s
  • Efficiency: bandwidth, storage, management
  • Agility: dynamic clients/network/servers
  • Security, etc.
  • Focus on content delivery - Content Distribution
    Network (CDN)
  • 4 billion Web pages in total, daily growth of 7M
    pages
  • Annual traffic growth of 200% for the next 4 years

4
How CDN Works
5
Challenges for CDN
  • Replica Location
  • Find nearby replicas with good DoS attack
    resilience
  • Replica Deployment
  • Dynamics, efficiency
  • Client QoS and server capacity constraints
  • Replica Management
  • Scalability of replica index state maintenance
  • Adaptation to Network Congestion/Failures
  • Overlay monitoring scalability and accuracy

6
SCAN Scalable Content Access Network
Provision: dynamic replication and update multicast
tree building
Replica management: (incremental) content clustering
Network: DoS-resilient replica location (Tapestry)
Network: end-to-end distance monitoring (Internet
Iso-bar for latency, TOM for loss rate)
7
Replica Location
  • Existing Work and Problems
  • Centralized, Replicated and Distributed Directory
    Services
  • No security benchmarking: which one has the best
    DoS attack resilience?
  • Solution
  • Proposed the first simulation-based network DoS
    resilience benchmark
  • Applied it to compare three directory services
  • DHT-based distributed directory services have the
    best resilience in practice
  • Publication
  • 3rd Int. Conf. on Info. and Comm. Security
    (ICICS), 2001

8
Replica Placement/Maintenance
  • Existing Work and Problems
  • Static placement
  • Dynamic but inefficient placement
  • No coherence support
  • Solution
  • Dynamically place a close-to-optimal number of
    replicas under clients' QoS (latency) and servers'
    capacity constraints
  • Self-organize replicas into a scalable
    application-level multicast tree for disseminating
    updates
  • Uses overlay network topology only
  • Publication
  • IPTPS 2002, Pervasive Computing 2002

9
Replica Management
  • Existing Work and Problems
  • Cooperative access for good efficiency requires
    maintaining replica indices
  • Per-Website replication: scalable, but poor
    performance
  • Per-URL replication: good performance, but
    unscalable
  • Solution
  • Clustering-based replication reduces the overhead
    significantly without sacrificing much
    performance
  • Proposed a unique online Web object popularity
    prediction scheme based on hyperlink structures
  • Online incremental clustering and replication to
    push replicas before they are accessed
  • Publication
  • ICNP 2002, IEEE J-SAC 2003

10
Adaptation to Network Congestion/Failures
  • Existing Work and Problems
  • Latency estimation
  • Clustering-based: network-proximity based,
    inaccurate
  • Coordinate-based: symmetric distance, unscalable
    to update
  • General metrics: O(n²) measurements for n end hosts
  • Solution
  • Latency: Internet Iso-bar - clustering based on
    latency similarity to a small number of landmarks
  • Loss rate: Tomography-based Overlay Monitoring
    (TOM) - selectively monitor a basis set of
    O(n log n) paths to infer the loss rates of other
    paths
  • Publication
  • Internet Iso-bar: SIGMETRICS PER 2002
  • TOM: SIGCOMM IMC 2003

11
SCAN Architecture
  • Leverage Distributed Hash Table - Tapestry for
  • Distributed, scalable location with guaranteed
    success
  • Search with locality

(Diagram: data plane with the data source, Web server and
SCAN servers; dynamic replication/update and replica
management, replica location; overlay network monitoring
on the network plane)
12
Methodology
Analytical evaluation
PlanetLab tests
  • Network topology
  • Web workload
  • Network end-to-end latency measurement

13
Case Study: Tomography-based Overlay Network
Monitoring
14
TOM Outline
  • Goal and Problem Formulation
  • Algebraic Modeling and Basic Algorithms
  • Scalability Analysis
  • Practical Issues
  • Evaluation
  • Application: Adaptive Overlay Streaming Media
  • Conclusions

15
Existing Work
Goal: a scalable, adaptive and accurate overlay
monitoring system to detect e2e
congestion/failures
  • General metrics: RON (O(n²) measurements)
  • Latency estimation
  • Clustering-based: IDMaps, Internet Iso-bar, etc.
  • Coordinate-based: GNP, ICS, Virtual Landmarks
  • Network tomography
  • Focuses on inferring the characteristics of
    physical links rather than E2E paths
  • Limited measurements -> under-constrained system,
    unidentifiable links

16
Problem Formulation
  • Given an overlay of n end hosts and O(n²) paths,
    how can we select a minimal subset of paths to
    monitor so that the loss rates/latency of all
    other paths can be inferred?
  • Assumptions
  • Topology measurable
  • Can only measure the E2E path, not the link

17
Our Approach
  • Select a basis set of k paths that fully describe
    the O(n²) paths (k ≪ O(n²))
  • Monitor the loss rates of the k paths, and infer
    the loss rates of all other paths
  • Applicable to any additive metric, such as latency

18
Algebraic Model
(Diagram: example overlay with end hosts A, B, C, D and
physical links 1, 2, 3)
  • Path loss rate p, link loss rate l
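
The slide's equations are not reproduced in the transcript; the standard
formulation behind tomography-based monitoring (a hedged reconstruction,
with the notation assumed here) turns multiplicative loss rates into an
additive metric by taking logarithms:

\[
  1 - p \;=\; \prod_{j \in \text{path}} (1 - l_j)
  \quad\Longrightarrow\quad
  \log\frac{1}{1-p} \;=\; \sum_{j \in \text{path}} \log\frac{1}{1-l_j}.
\]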

19
Putting All Paths Together
(Diagram: the same example topology with all overlay paths
shown)
In total r = O(n²) paths and s links, with s ≪ r
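
Stacking the r path equations gives a linear system; this is the standard
matrix form (a reconstruction with assumed notation, since the slide's
matrix is an image):

\[
  b = G\,x, \qquad G \in \{0,1\}^{r \times s}, \qquad
  b_i = \log\frac{1}{1-p_i}, \qquad x_j = \log\frac{1}{1-l_j},
\]

where G_{ij} = 1 iff path i traverses link j. Only k = rank(G) of the r
equations are linearly independent, so monitoring k well-chosen paths
suffices.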


20
Sample Path Matrix
  • x1 - x2 is unknown => cannot compute x1 and x2
    individually
  • The set of vectors that G maps to zero
    forms the null space
  • To separate identifiable vs. unidentifiable
    components: x = x_G + x_N
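
Spelled out (with notation assumed here, since the slide's formulas are
not in the transcript), the decomposition is orthogonal:

\[
  x = x_G + x_N, \qquad x_G \in \operatorname{row}(G), \qquad
  x_N \in \operatorname{null}(G)\ \ (G\,x_N = 0).
\]

Any end-to-end path vector lies in row(G), so its loss rate depends only
on x_G; the x_N component can never be determined from path
measurements, and never needs to be.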

21
Intuition through Topology Virtualization
  • Virtual links
  • Minimal path segments whose loss rates can be
    uniquely identified
  • Can fully describe all paths
  • x_G is composed of virtual links

All E2E paths are in the path space, i.e., G x_N = 0
22
More Examples
(Figure: virtualization examples - real links (solid) and
all of the overlay paths (dotted) traversing them, and
the resulting virtual links)
23
Basic Algorithms
  • Select k = rank(G) linearly independent paths to
    monitor
  • Use QR decomposition
  • Leverage sparse matrices: time O(rk²) and memory
    O(k²)
  • E.g., 79 sec for n = 300 (r = 44,850) and k = 2,541
  • Compute the loss rates of the other paths
  • Time O(k²) and memory O(k²)
  • E.g., 1.89 sec for the example above (a sketch of
    both steps follows)
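
A minimal sketch of these two steps, assuming a dense 0/1 path matrix G
(rows = paths, columns = links) and off-the-shelf pivoted QR from SciPy;
the deck's sparse-matrix and incremental machinery is not reproduced,
and the toy topology below is invented purely for illustration.

import numpy as np
from scipy.linalg import qr

def select_basis_paths(G):
    """Pick k = rank(G) linearly independent rows (paths) of G to monitor."""
    # Column-pivoted QR on G^T reveals a well-conditioned set of rows of G.
    _, R, piv = qr(G.T, pivoting=True)
    tol = abs(R[0, 0]) * max(G.shape) * np.finfo(float).eps
    k = int(np.sum(np.abs(np.diag(R)) > tol))   # numerical rank of G
    return piv[:k]                              # indices of monitored paths

def infer_all_paths(G, monitored, b_monitored):
    """Infer b = G x for every path from measurements on the basis paths.
    b_i = log(1 / (1 - p_i)) is the additive metric of path i."""
    G_bar = G[monitored]                        # k x s reduced system
    # The minimum-norm solution lies in row(G_bar) = row(G), i.e. it is x_G.
    x_G, *_ = np.linalg.lstsq(G_bar, b_monitored, rcond=None)
    return G @ x_G                              # metrics of all r paths

# Toy example (invented): 4 overlay paths over 3 links; path 4 is a linear
# combination of the others, so only k = 3 paths need to be monitored.
G = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)
link_loss = np.array([0.02, 0.05, 0.01])
x_true = np.log(1.0 / (1.0 - link_loss))
b_true = G @ x_true
monitored = select_basis_paths(G)               # 3 of the 4 paths
b_est = infer_all_paths(G, monitored, b_true[monitored])
print(monitored, 1.0 - np.exp(-b_est))          # recovered path loss rates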




24
Scalability Analysis
  • Is k ≪ O(n²)?
  • For a power-law Internet topology
  • When the majority of end hosts are on the overlay:
    k = O(n) (with proof)
  • When a small portion of end hosts are on the
    overlay: for reasonably large n (e.g., 100),
    k = O(n log n) (extensive linear regression tests
    on both synthetic and real topologies)
  • If the Internet were a pure hierarchical structure
    (tree): k = O(n)
  • If the Internet had no hierarchy at all (worst
    case, a clique): k = O(n²)
  • The Internet has a moderate hierarchical structure
    [TGJ02]
25
TOM Outline
  • Goal and Problem Formulation
  • Algebraic Modeling and Basic Algorithms
  • Scalability Analysis
  • Practical Issues
  • Evaluation
  • Application: Adaptive Overlay Streaming Media
  • Summary

26
Practical Issues
  • Tolerance of topology measurement errors
  • Router aliases
  • Incomplete routing info
  • Measurement load balancing
  • Randomly order the paths for scanning and selection
  • Adaptation to topology changes
  • Designed efficient algorithms for incremental
    updates
  • Add/remove a path: O(k²) time (vs. O(n²k²) to
    reinitialize)
  • Add/remove end hosts and routing changes (see the
    sketch below)
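
As one illustration of the adaptation step, a naive way (not the deck's
O(k²) incremental algorithm, which is not reproduced here) to decide
whether a newly added path must itself be monitored is to test whether
its row vector already lies in the row space of the monitored basis:

import numpy as np

def add_path(G_bar, new_row, tol=1e-9):
    """Extend the monitored basis only if the new path is not already
    expressible as a combination of currently monitored paths.
    Naive O(k^2 s) check; the deck's incremental update is O(k^2)."""
    coeff, *_ = np.linalg.lstsq(G_bar.T, new_row, rcond=None)
    redundant = np.linalg.norm(G_bar.T @ coeff - new_row) <= tol
    return G_bar if redundant else np.vstack([G_bar, new_row])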

27
Evaluation Metrics
  • Path loss rate estimation accuracy
  • Absolute error |p - p̂|
  • Error factor [BDPT02]
  • Lossy path inference: coverage and false positive
    ratio
  • Measurement load balancing
  • Coefficient of variation (CV)
  • Maximum vs. mean ratio (MMR)
  • Speed of setup, update and adaptation
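
For reference, the usual definitions behind these metrics (the
error-factor form follows [BDPT02]; the threshold ε and the reading of
m_i as per-host measurement counts are assumptions, not taken from the
deck):

\[
  F_\varepsilon(p,\hat p) = \max\left\{\frac{p_\varepsilon}{\hat p_\varepsilon},\,
  \frac{\hat p_\varepsilon}{p_\varepsilon}\right\},
  \qquad p_\varepsilon = \max(\varepsilon, p),\ \ \hat p_\varepsilon = \max(\varepsilon, \hat p),
\]
\[
  \mathrm{CV} = \frac{\sigma(m)}{\overline{m}}, \qquad
  \mathrm{MMR} = \frac{\max_i m_i}{\overline{m}},
\]

where m_i is the measurement load carried by end host i.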

28
Evaluation
  • Extensive Simulations
  • Experiments on PlanetLab
  • 51 hosts, each from a different organization
  • 51 × 50 = 2,550 paths
  • On average k = 872
  • Results on Accuracy
  • Average real loss rate: 0.023
  • Absolute error: mean 0.0027, 90th percentile < 0.014
  • Error factor: mean 1.1, 90th percentile < 2.0
  • On average 248 out of 2,550 paths have no or
    incomplete routing information
  • No router aliases resolved

Areas and domains (# of hosts):
US (40): .edu 33, .org 3, .net 2, .gov 1, .us 1
International (11): Europe (6): France 1, Sweden 1, Denmark 1, Germany 1, UK 2
International (11): Asia (2): Taiwan 1, Hong Kong 1
International (11): Canada 2, Australia 1
29
Evaluation (contd)
  • Results on Speed
  • Path selection (setup): 0.75 sec
  • Path loss rate calculation: 0.16 sec for all 2,550
    paths
  • Results on Load Balancing
  • Significantly reduces CV and MMR, by up to a factor
    of 7.3

30
TOM Outline
  • Goal and Problem Formulation
  • Algebraic Modeling and Basic Algorithms
  • Scalability Analysis
  • Practical Issues
  • Evaluation
  • Application: Adaptive Overlay Streaming Media
  • Conclusions

31
Motivation
  • Traditional streaming media systems treat the
    network as a black box
  • Adaptation only performed at the transmission end
    points
  • Overlay relay can effectively bypass
    congestion/failures
  • Built an adaptive streaming media system that
    leverages
  • TOM for real-time path info
  • An overlay network for adaptive packet buffering
    and relay

32
Adaptive Overlay Streaming Media
(Diagram: streaming relay among Stanford, UC San Diego,
UC Berkeley and HP Labs; X marks a blocked direct path)
  • Implemented with Winamp client and SHOUTcast
    server
  • Congestion introduced with a Packet Shaper
  • Skip-free playback: server buffering and
    rewinding
  • Total adaptation time: < 4 seconds

33
Adaptive Streaming Media Architecture
34
Summary
  • A tomography-based overlay network monitoring
    system
  • Selectively monitor a basis set of O(n log n)
    paths to infer the loss rates of O(n²) paths
  • Works in real time, adapts to topology changes,
    has good load balancing and tolerates topology
    errors
  • Both simulation and real Internet experiments show
    promising results
  • Built adaptive overlay streaming media system on
    top of TOM
  • Bypass congestion/failures for smooth playback
    within seconds

35
Tie Back to SCAN
Provision: dynamic replication and update multicast
tree building
Replica management: (incremental) content clustering
Network: DoS-resilient replica location (Tapestry)
Network: end-to-end distance monitoring (Internet
Iso-bar for latency, TOM for loss rate)
36
Contribution of My Thesis
  • Replica location
  • Proposed the first simulation-based network DoS
    resilience benchmark and quantified three types of
    directory services
  • Dynamically place a close-to-optimal number of
    replicas
  • Self-organize replicas into a scalable app-level
    multicast tree for disseminating updates
  • Cluster objects to significantly reduce the
    management overhead with little performance
    sacrifice
  • Online incremental clustering and replication to
    adapt to changes in users' access patterns
  • Scalable overlay network monitoring

37
Thank you !
38
Backup Materials
39
Existing CDNs Fail to Address these Challenges
No coherence for dynamic content
Unscalable network monitoring - O(M × N), where M = #
of client groups and N = # of server farms
Non-cooperative replication is inefficient
40
Network Topology and Web Workload
  • Network Topology
  • Pure-random, Waxman transit-stub synthetic
    topology
  • An AS-level topology from 7 widely-dispersed BGP
    peers
  • Web Workload

Web Site | Period | Duration | Requests (avg, min-max) | Clients (avg, min-max) | Client groups (avg, min-max)
MSNBC | Aug-Oct 1999 | 10-11am | 1.5M, 642K-1.7M | 129K, 69K-150K | 15.6K, 10K-17K
NASA | Jul-Aug 1995 | All day | 79K, 61K-101K | 5,940, 4,781-7,671 | 2,378, 1,784-3,011
  • Aggregate MSNBC Web clients with BGP prefix
  • BGP tables from a BBNPlanet router
  • Aggregate NASA Web clients with domain names
  • Map the client groups onto the topology

41
Network E2E Latency Measurement
  • NLANR Active Measurement Project data set
  • 111 sites in America, Asia, Australia and Europe
  • Round-trip time (RTT) between every pair of hosts
    every minute
  • 17M measurements daily
  • Raw data: Jun.-Dec. 2001, Nov. 2002
  • Keynote measurement data
  • Measures TCP performance from about 100 worldwide
    agents
  • Heterogeneous core network: various ISPs
  • Heterogeneous access networks
  • Dial-up 56K, DSL and high-bandwidth business
    connections
  • Targets
  • 40 most popular Web servers, 27 Internet Data
    Centers
  • Raw data: Nov.-Dec. 2001, Mar.-May 2002

42
Internet Content Delivery Systems
Properties | Web caching (client initiated) | Web caching (server initiated) | Conventional CDNs (Akamai) | SCAN
Replica access | Non-cooperative | Cooperative (Bloom filter) | Non-cooperative | Cooperative
Load balancing | No | No | Yes | Yes
Pull/push | Pull | Push | Pull | Push
Transparent to clients | No | No | Yes | Yes
Coherence support | No | No | No | Yes
Network-awareness | No | No | Yes, unscalable monitoring system | Yes, scalable monitoring system
43
Absolute and Relative Errors
  • For each experiment, get its 95th percentile
    absolute and relative errors for the estimation of
    2,550 paths

44
Lossy Path Inference Accuracy
  • 90 out of 100 runs have coverage over 85% and a
    false positive ratio of less than 10%
  • Many misses are caused by boundary effects around
    the 5% lossy-path threshold

45
PlanetLab Experiment Results
  • Loss rate distribution
  • Metrics
  • Absolute error |p - p̂|
  • Average 0.0027 for all paths, 0.0058 for lossy
    paths
  • Relative error [BDPT02]
  • Lossy path inference: coverage and false positive
    ratio
  • On average k = 872 out of 2,550

Loss rate distribution: [0, 0.05): 95.9% of paths; lossy
paths [0.05, 1.0]: 4.1% of paths, of which [0.05, 0.1):
15.2%, [0.1, 0.3): 31.0%, [0.3, 0.5): 23.9%, [0.5, 1.0):
4.3%, and 1.0: 25.6%
46
Experiments on Planet Lab
Areas and domains (# of hosts):
US (40): .edu 33, .org 3, .net 2, .gov 1, .us 1
International (11): Europe (6): France 1, Sweden 1, Denmark 1, Germany 1, UK 2
International (11): Asia (2): Taiwan 1, Hong Kong 1
International (11): Canada 2, Australia 1
  • 51 hosts, each from a different organization
  • 51 × 50 = 2,550 paths
  • Simultaneous loss rate measurement
  • 300 trials, 300 msec each
  • In each trial, send a 40-byte UDP packet to every
    other host
  • Simultaneous topology measurement
  • Traceroute
  • Experiments: 6/24 - 6/27
  • 100 experiments in peak hours

47
Motivation
  • With single-node relay
  • Loss rate improvement
  • Among 10,980 lossy paths
  • 5,705 paths (52.0%) have loss rate reduced by
    0.05 or more
  • 3,084 paths (28.1%) change from lossy to
    non-lossy
  • Throughput improvement
  • Estimated with
  • 60,320 paths (24%) with non-zero loss rate,
    throughput computable
  • Among them, 32,939 (54.6%) paths have throughput
    improved, and 13,734 (22.8%) paths have throughput
    doubled or more
  • Implication: use overlay paths to bypass
    congestion or failures
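
The formula behind "Estimated with" is an image that did not survive the
transcript; a common choice for this kind of estimate, stated here purely
as an assumption, is the TCP-friendly throughput model

\[
  T \;\approx\; \frac{\sqrt{1.5}\,\mathrm{MSS}}{\mathrm{RTT}\,\sqrt{p}},
\]

which is also why a throughput estimate requires a non-zero loss rate p.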

48
SCAN
Coherence for dynamic content
Cooperative clustering-based replication (servers s1, s4,
s5 in the diagram)
Scalable network monitoring: O(M + N)
49
Problem Formulation
  • Subject to a certain total replication cost (e.g.,
    # of URL replicas)
  • Find a scalable, adaptive replication strategy to
    reduce average access cost

50
SCAN Scalable Content Access Network
CDN Applications (e.g. streaming media)
Provision: cooperative clustering-based replication
Coherence: update multicast tree construction
Network distance/congestion/failure estimation
User behavior/workload monitoring
Network performance monitoring
(red = my work, black = out of scope)
51
Evaluation of Internet-scale System
  • Analytical evaluation
  • Realistic simulation
  • Network topology
  • Web workload
  • Network end-to-end latency measurement
  • Network topology
  • Pure-random, Waxman transit-stub synthetic
    topology
  • A real AS-level topology from 7 widely-dispersed
    BGP peers

52
Web Workload
Web Site | Period | Duration | Requests (avg, min-max) | Clients (avg, min-max) | Client groups (avg, min-max)
MSNBC | Aug-Oct 1999 | 10-11am | 1.5M, 642K-1.7M | 129K, 69K-150K | 15.6K, 10K-17K
NASA | Jul-Aug 1995 | All day | 79K, 61K-101K | 5,940, 4,781-7,671 | 2,378, 1,784-3,011
World Cup | May-Jul 1998 | All day | 29M, 1M-73M | 103K, 13K-218K | N/A
  • Aggregate MSNBC Web clients with BGP prefix
  • BGP tables from a BBNPlanet router
  • Aggregate NASA Web clients with domain names
  • Map the client groups onto the topology

53
Simulation Methodology
  • Network Topology
  • Pure-random, Waxman transit-stub synthetic
    topology
  • An AS-level topology from 7 widely-dispersed BGP
    peers
  • Web Workload

Web Site | Period | Duration | Requests (avg, min-max) | Clients (avg, min-max) | Client groups (avg, min-max)
MSNBC | Aug-Oct 1999 | 10-11am | 1.5M, 642K-1.7M | 129K, 69K-150K | 15.6K, 10K-17K
NASA | Jul-Aug 1995 | All day | 79K, 61K-101K | 5,940, 4,781-7,671 | 2,378, 1,784-3,011
  • Aggregate MSNBC Web clients with BGP prefix
  • BGP tables from a BBNPlanet router
  • Aggregate NASA Web clients with domain names
  • Map the client groups onto the topology

54
Online Incremental Clustering
  • Predict access patterns based on semantics
  • Simplify to popularity prediction
  • Groups of URLs with similar popularity? Use
    hyperlink structures!
  • Groups of siblings
  • Groups of the same hyperlink depth: the smallest #
    of links from the root (see the sketch below)
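
A minimal sketch of grouping URLs by hyperlink depth (shortest number of
links from the root page), assuming the site's link graph has already
been crawled into an adjacency dict; the names here are illustrative,
not from the SCAN code base.

from collections import deque

def hyperlink_depths(links, root):
    """BFS from the root URL; returns {url: depth}."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        url = queue.popleft()
        for nxt in links.get(url, ()):
            if nxt not in depth:              # first visit = shortest depth
                depth[nxt] = depth[url] + 1
                queue.append(nxt)
    return depth

def cluster_by_depth(links, root):
    """Group URLs with the same hyperlink depth into one cluster."""
    clusters = {}
    for url, d in hyperlink_depths(links, root).items():
        clusters.setdefault(d, []).append(url)
    return clusters

# Toy site: root links to /news and /sports, which link to articles.
site = {"/": ["/news", "/sports"],
        "/news": ["/news/a1", "/news/a2"],
        "/sports": ["/sports/b1"]}
print(cluster_by_depth(site, "/"))   # {0: ['/'], 1: [...], 2: [...]}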

55
Challenges for CDN
  • Over-provisioning for replication
  • Provide good QoS to clients (e.g., latency bound,
    coherence)
  • Small # of replicas with small delay and
    bandwidth consumption for updates
  • Replica Management
  • Scalability: billions of replicas if replicating
    per URL
  • O(10^4) URLs/server, O(10^5) CDN edge servers in
    O(10^3) networks
  • Adaptation to dynamics of content providers and
    customers
  • Monitoring
  • User workload monitoring
  • End-to-end network distance/congestion/failure
    monitoring
  • Measurement scalability
  • Inference accuracy and stability

56
SCAN Architecture
  • Leverage Decentralized Object Location and
    Routing (DOLR) - Tapestry for
  • Distributed, scalable location with guaranteed
    success
  • Search with locality
  • Soft state maintenance of dissemination tree (for
    each object)

(Diagram: data plane with the data source, Web server and
SCAN servers; dynamic replication/update and content
management, request location; network plane underneath)
57
Wide-area Network Measurement and Monitoring
System (WNMMS)
  • Select a subset of SCAN servers to be monitors
  • E2E estimation for
  • Distance
  • Congestion
  • Failures

(Diagram: network plane partitioned into clusters A, B and
C, each with monitors, SCAN edge servers and clients)
58
Dynamic Provisioning
  • Dynamic replica placement
  • Meets clients' latency and servers' capacity
    constraints
  • Close-to-minimal # of replicas
  • Self-organize replicas into an app-level multicast
    tree
  • Small delay and bandwidth consumption for update
    multicast
  • Each node only maintains states for its parent and
    direct children
  • Evaluated based on simulation of
  • Synthetic traces with various sensitivity
    analysis
  • Real traces from NASA and MSNBC
  • Publication
  • IPTPS 2002
  • Pervasive Computing 2002

59
Effects of the Non-Uniform Size of URLs
  • Replication cost constraint: bytes
  • Similar trends exist
  • Per URL replication outperforms per Website
    dramatically
  • Spatial clustering with Euclidean distance and
    popularity-based clustering are very
    cost-effective

60
SCAN Scalable Content Access Network
61
Web Proxy Caching
(Diagram: Web proxy caching with a client and ISP 1 and
ISP 2)
62
Conventional CDN Non-cooperative Pull
(Diagram: non-cooperative pull - Client 1, Web content
server, ISP 1 and ISP 2; inefficient replication)
63
SCAN Cooperative Push
(Diagram: cooperative push - Client 1, CDN name server,
ISP 1 and ISP 2)
Significantly reduces the # of replicas and update cost
64
Internet Content Delivery Systems
Properties | Web caching (client initiated) | Web caching (server initiated) | Pull-based CDNs (Akamai) | Push-based CDNs | SCAN
Efficiency (# of caches or replicas) | No cache sharing among proxies | Cache sharing | No replica sharing among edge servers | Replica sharing | Replica sharing
Scalability of request redirection | Pre-configured in browser | Use Bloom filter to exchange replica locations | Centralized CDN name server | Centralized CDN name server | Decentralized P2P location
Coherence support | No | No | Yes | No | Yes
Network-awareness | No | No | Yes, unscalable monitoring system | No | Yes, scalable monitoring system
65
Previous Work Update Dissemination
  • No inter-domain IP multicast
  • Application-level multicast (ALM) is unscalable
  • Root maintains states for all children (Narada,
    Overcast, ALMI, RMX)
  • Root handles all join requests (Bayeux)
  • Root splitting is a common solution, but suffers
    consistency overhead

66
Comparison of Content Delivery Systems (contd)
Properties | Web caching (client initiated) | Web caching (server initiated) | Pull-based CDNs (Akamai) | Push-based CDNs | SCAN
Distributed load balancing | No | Yes | Yes | No | Yes
Dynamic replica placement | Yes | Yes | Yes | No | Yes
Network-awareness | No | No | Yes, unscalable monitoring system | No | Yes, scalable monitoring system
No global network topology assumption | Yes | Yes | Yes | No | Yes