Title: SCAN: a Scalable, Adaptive, Secure and Network-aware Content Distribution Network
1 SCAN: a Scalable, Adaptive, Secure and Network-aware Content Distribution Network
Yan Chen, CS Department, Northwestern University
2 Motivation
- The Internet has evolved to become a commercial infrastructure for service delivery
  - Web delivery, VoIP, streaming media
- Challenges for Internet-scale services
  - Scalability: 600M users, 35M Web sites, 2.1 Tb/s
  - Efficiency: bandwidth, storage, management
  - Agility: dynamic clients/network/servers
  - Security: proliferating attacks/viruses/worms
- E.g., content delivery: Content Distribution Network (CDN)
  - Web delivery
  - Grid computing
3 How CDN Works
4 Challenges for CDN
- Content Location
  - Find nearby replicas with good DoS attack resilience
  - Dynamic, scalable semantic search
- Replica Deployment
  - Dynamics, efficiency
  - Client QoS (latency, coherence) and server capacity constraints
- Replica Management
  - Replica index state maintenance scalability
- Adaptation to Network Congestion/Failures
  - Overlay monitoring scalability and accuracy
- Security
  - Proactive anomaly/intrusion detection on high-speed networks
5 SCAN: Scalable Content Access Network
[Architecture diagram; component labels:]
- Provision: Dynamic Replication, Update Multicast Tree Building
- Replica Management: (Incremental) Content Clustering
- DHT-based Replica Location: Network DoS Attack Resilient, Semantic Search Support
- Proactive Anomaly/Intrusion Detection on High-speed Network
- Network End-to-End Distance Monitoring (latency, loss rate)
6 Replica Location (security)
- Existing Work and Problems
  - Centralized, Replicated and Distributed Directory Services
  - No security benchmarking: which one has the best DoS attack resilience?
- Solution
  - Proposed the first simulation-based network DoS resilience benchmark
  - Applied it to compare three directory services
  - DHT-based Distributed Directory Services have the best resilience in practice
- Publication
  - 3rd Int. Conf. on Info. and Comm. Security (ICICS), 2001
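A minimal sketch of how a DHT-style distributed directory can map object names to directory nodes, assuming a plain consistent-hashing ring. The `Ring` class, the node names `dir-0` to `dir-7`, the example URLs and the SHA-1 keying are all illustrative assumptions; this is not the benchmark or the specific DHT evaluated in the paper.

```python
import bisect
import hashlib


def key_of(name: str) -> int:
    """Hash a name onto a 32-bit identifier ring (illustrative choice)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest()[:4], "big")


class Ring:
    """Hypothetical consistent-hashing ring of directory nodes."""

    def __init__(self, nodes):
        self.points = sorted((key_of(n), n) for n in nodes)
        self.keys = [k for k, _ in self.points]

    def lookup(self, obj: str) -> str:
        """Return the node responsible for obj: first node clockwise from its key."""
        i = bisect.bisect_left(self.keys, key_of(obj)) % len(self.points)
        return self.points[i][1]

    def without(self, node: str) -> "Ring":
        """Ring after one node is knocked out (e.g. by a DoS attack)."""
        return Ring([n for _, n in self.points if n != node])


ring = Ring([f"dir-{i}" for i in range(8)])
objects = [f"http://example.org/obj{i}" for i in range(1000)]

before = {o: ring.lookup(o) for o in objects}
degraded = ring.without("dir-3")
after = {o: degraded.lookup(o) for o in objects}

# Only the objects that the failed node owned need a new directory node.
moved = [o for o in objects if before[o] != after[o]]
```

In this toy model, taking out one node remaps only the objects that node owned, which gives some intuition for why a distributed directory can degrade gracefully under attack.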
7 Replica Location (semantic search)
- Existing Work and Problems
  - Mostly keyword/title-based search
  - Emerging semantic search systems, but static and unscalable
- Solution
  - Apply DHT to distribute the indices
  - Use concept indexing to incrementally grow the semantic space => incrementally add new concepts and documents
  - Group the indices based on semantic locality => semantic routing, better query accuracy and efficiency
8 Replica Placement and Coherence Support
- Existing Work and Problems
  - Static placement
  - Dynamic but inefficient placement
  - No coherence support
- Solution
  - Dynamically place a close-to-optimal number of replicas subject to client QoS (latency) and server capacity constraints
  - Self-organize replicas into a scalable application-level multicast tree for disseminating updates
  - With overlay network topology only
- Publication
  - IPTPS 2002, Pervasive Computing 2002
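The placement goal on this slide (few replicas, every client within a latency bound, no server over capacity) can be illustrated with a generic greedy heuristic. This is a sketch under stated assumptions, not the SCAN placement algorithm: the function `place_replicas`, the client-to-server latency table and the example numbers are all hypothetical, and a global latency table is assumed to be known.

```python
def place_replicas(latency, capacity, bound):
    """Greedy placement sketch.

    latency[c][s]: latency from client c to candidate server s (assumed known).
    capacity[s]:   maximum number of clients server s can serve.
    bound:         client QoS latency bound.
    Returns (client -> server assignment, set of servers holding a replica).
    """
    unserved = set(latency)
    load = {s: 0 for s in capacity}
    assignment = {}

    def gain(s):
        # Number of still-unserved clients that s could take on right now.
        free = capacity[s] - load[s]
        eligible = sum(1 for c in unserved if latency[c][s] <= bound)
        return min(free, eligible)

    while unserved:
        best = max(sorted(capacity), key=gain)
        if gain(best) == 0:
            break  # the remaining clients cannot be served within the bound
        takers = sorted(c for c in unserved if latency[c][best] <= bound)
        for c in takers[: capacity[best] - load[best]]:
            assignment[c] = best
            load[best] += 1
            unserved.discard(c)
    return assignment, set(assignment.values())


# Made-up example: four clients, two candidate servers, 50 ms bound.
latency = {
    "c1": {"s1": 10, "s2": 80},
    "c2": {"s1": 20, "s2": 30},
    "c3": {"s1": 90, "s2": 25},
    "c4": {"s1": 15, "s2": 40},
}
capacity = {"s1": 2, "s2": 3}
assignment, replicas = place_replicas(latency, capacity, bound=50)
```

Here `s2` is picked first because it can serve three clients within the bound, and `s1` is then needed only for `c1`.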
9 Replica Management
- Existing Work and Problems
  - Cooperative access for good efficiency requires maintaining replica indices
  - Per-Website replication: scalable, but poor performance
  - Per-URL replication: good performance, but unscalable
- Solution
  - Clustering-based replication reduces the overhead significantly without sacrificing much performance
  - Proposed a unique online Web object popularity prediction scheme based on hyperlink structures
  - Online incremental clustering and replication to push replicas before they are accessed
- Publication
  - ICNP 2002, IEEE J-SAC 2003
10 Adaptation to Network Congestion/Failures
- Existing Work and Problems
  - Latency estimation systems are scalable, but cannot monitor congestion/failures, which requires n^2 measurements for n end hosts
- Solution
  - Tomography-based Overlay Monitoring (TOM): selectively monitor a basis set of O(n log n) paths to infer the loss rates of the other paths
  - Works in real time, adapts to topology changes, has good load balancing and tolerates topology errors
  - Built an adaptive overlay streaming media system on top of TOM
  - Root-cause diagnosis in progress
- Publication
  - Modeling: SIGCOMM IMC 2003 (extended abstract)
  - Full version under submission
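The basis-set idea can be illustrated with a toy example. Assuming each path's log success rate is the sum of the log success rates of its links, a path whose link-incidence row is a linear combination of monitored rows has its loss rate determined by the monitored measurements. The sketch below picks linearly independent rows by exact Gaussian elimination; the function names, the 4-path topology and the link loss rates are hypothetical, and a real system would use numerical linear algebra rather than this exact-arithmetic toy.

```python
import math
from fractions import Fraction


def select_basis(G):
    """Pick linearly independent rows of G (a basis of its row space).

    Returns (basis, coeffs) where basis lists the row indices to monitor and
    row i == sum(coeffs[i][b] * row b for b in coeffs[i]).
    """
    reduced = []  # (pivot column, reduced vector, combination of original rows)
    basis, coeffs = [], {}
    for i, row in enumerate(G):
        vec = [Fraction(v) for v in row]
        comb = {i: Fraction(1)}  # invariant: vec == sum(comb[k] * G[k])
        for pivot, rvec, rcomb in reduced:
            f = vec[pivot]
            if f:
                vec = [a - f * b for a, b in zip(vec, rvec)]
                for k, v in rcomb.items():
                    comb[k] = comb.get(k, Fraction(0)) - f * v
        nz = next((j for j, v in enumerate(vec) if v), None)
        if nz is None:
            # Dependent row: 0 == row_i + sum(comb[b] * row_b), so solve for row_i.
            coeffs[i] = {b: -c for b, c in comb.items() if b != i}
        else:
            scale = vec[nz]
            reduced.append((nz, [v / scale for v in vec],
                            {k: v / scale for k, v in comb.items()}))
            basis.append(i)
            coeffs[i] = {i: Fraction(1)}
    return basis, coeffs


def infer(b_measured, coeffs, i):
    """Infer path i's log success rate from the monitored paths only."""
    return sum(c * b_measured[b] for b, c in coeffs[i].items())


# Rows are paths, columns are links; G[i][j] = 1 if path i uses link j.
G = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 1], [0, 1, 1, 0]]
link_loss = [0.1, 0.0, 0.05, 0.2]  # made-up link loss rates


def path_log_success(row):
    return sum(math.log(1 - l) for l, used in zip(link_loss, row) if used)


basis, coeffs = select_basis(G)
b_measured = {i: path_log_success(G[i]) for i in basis}  # only these are probed
inferred_loss = 1 - math.exp(infer(b_measured, coeffs, 3))
true_loss = 1 - math.exp(path_log_success(G[3]))
```

In this example only paths 0, 1 and 2 are probed; path 3 equals path 0 + path 1 - path 2 in link-incidence terms, so its loss rate follows from the three measurements.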
11 Proactive Anomaly/Intrusion Detection on High-speed Network
- Existing Work and Problems
  - Anomaly/intrusion detection requires flow-level traffic monitoring, which is unscalable for high-speed networks
  - Most IDSs are signature-based and only catch known attacks
- Solution
  - Leverage the K-ary sketch, a compact probabilistic summary of flow-level traffic with constant update/query cost and linearity
  - Use statistical methods, such as Hidden Markov Models (HMM) and time series analysis, for proactive detection
  - Profile characteristics of new applications to reduce false positives
- Publication
  - K-ary sketch: SIGCOMM IMC 2003
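The properties listed for the K-ary sketch (compact summary, constant update/query cost, linearity) can be sketched as follows. This is a minimal illustration, assuming H rows of K counters and the commonly described per-row unbiased estimate followed by a median. The class and method names are hypothetical, and SHA-256 stands in for the 4-universal hash functions a real deployment would use.

```python
import hashlib
import statistics


class KArySketch:
    """H hash rows of K counters; each update touches exactly H counters."""

    def __init__(self, H=5, K=1024):
        self.H, self.K = H, K
        self.table = [[0.0] * K for _ in range(H)]

    def _bucket(self, i, key):
        # Stand-in hash (assumption): row index mixed into a SHA-256 digest.
        digest = hashlib.sha256(f"{i}|{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.K

    def update(self, key, value):
        for i in range(self.H):
            self.table[i][self._bucket(i, key)] += value

    def estimate(self, key):
        total = sum(self.table[0])  # every row sums to the same total
        per_row = [
            (self.table[i][self._bucket(i, key)] - total / self.K) / (1 - 1 / self.K)
            for i in range(self.H)
        ]
        return statistics.median(per_row)

    def combine(self, a, other, b):
        """Linearity: return the sketch a*self + b*other."""
        out = KArySketch(self.H, self.K)
        out.table = [[a * x + b * y for x, y in zip(r1, r2)]
                     for r1, r2 in zip(self.table, other.table)]
        return out


# Two measurement intervals over 200 made-up flows of 100 bytes each;
# in the second interval one flow jumps to 5000 bytes.
prev, cur = KArySketch(), KArySketch()
for n in range(200):
    prev.update(f"flow{n}", 100)
    cur.update(f"flow{n}", 5000 if n == 7 else 100)

diff = cur.combine(1, prev, -1)       # sketch of the per-flow change
change = diff.estimate("flow7")       # estimated change for the anomalous flow
```

Because sketches combine linearly, the change between intervals can be estimated from the two summaries alone, without keeping per-flow state.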