Title: Towards Autonomic Computing: Service Discovery and Web Hotspot Rescue
1Towards Autonomic Computing: Service Discovery and Web Hotspot Rescue
- Weibin Zhao
- zwb_at_cs.columbia.edu
- Department of Computer Science
- Columbia University
- April 5, 2006
With Prof. Henning Schulzrinne
2Autonomic Computing
- The growing complexity of computing systems
- Interconnected heterogeneous systems
- Roaming users in changing networking environments
- A myriad of network-enabled devices
- A grand vision
- Build self-managing systems to reduce management complexity and cost
- A grand challenge
- Fully autonomic systems vs. systems that have substantial autonomic components
3Thesis Research
- Autonomic networking and distributed systems
- Self-configuration of networking applications
- Dynamic scalability of Internet servers
- Service discovery
- A building block
- Enable end systems to discover desired services on networks automatically
- Web hotspot rescue
- A prototype system
- Allow web sites to scale dynamically to handle short-term load spikes effectively
4Thesis Outline
- Service discovery
- Enhancements to Service Location Protocol (SLP)
- Selective anti-entropy for high-availability partial replication
- Web hotspot rescue
- DotSlash: an automated web hotspot rescue system
- Traffic prediction for overload prevention
- Summary
5Service Discovery
- Service description → service access points
- Service providers advertise
- Service users query
- Discovery mechanisms
- Multicast
- Service registries (directory services)
- Existing service discovery systems
- Jini, UPnP, UDDI, SLP,
- Challenges
- Scalability: small scale, local domain, one service type
- New discovery scenarios: best match, multi-access-point services
6Our Approach
- Enhance Service Location Protocol (SLP)
- Leverage existing efforts
- IETF standard for service discovery in IP networks
- Flexible and powerful: multicast, directories, scopes, search filters, security features
- Enhancements to SLP
- Mesh enhancement (mSLP)
- Remote service discovery
- Preference filters
- Global attributes
7mSLP Overview
- mSLP
- Scope-based fully-meshed peer DA architecture
- Simplify SA registration
- Improve DA consistency
- SLP Entities
- UA: user agent
- SA: service agent
- DA: directory agent
(Figure: mSLP example. Directory agents DA1 (S1, S2), DA2 (S1, S2), DA3 (S2, S3), DA4 (S3); other nodes labeled (S1, S2), (S3), (S2), (S2).)
8mSLP Design
- Peer relationship management
- Learn about new peers
- Set up, maintain, tear down a peer relationship
- Exchange information about known peers
- Registration propagation control
- In shared scopes (partial replication)
- New updates only
- Version control
- Propagation methods
- Anti-entropy (for initial data and after failures)
- Direct forwarding (for newly accepted updates)
9Anti-Entropy
- For high-availability full replication [PODC87]
- Eventual consistency by exchanging new updates only
- Subset <Ri>: all updates accepted by Ri
- Summary vector: largest TS for each subset
- Exchange updates in all subsets during a session
(Figure: Problems for partial replication. Replica1 (Scope1, Scope2) holds Update1/Scope1/TS1, Update2/Scope2/TS2, Update3/Scope1/TS3; Replica2 (Scope1); Replica3 (Scope1, Scope2). Exchanges: (1) Update1/Scope1/TS1, Update3/Scope1/TS3; (2) Update1/Scope1/TS1, Update3/Scope1/TS3; (3) Update2/Scope2/TS2 ?)
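The partial-replication problem on this slide can be replayed in a few lines. This is a minimal sketch, not the mSLP implementation: the `Update` and `Replica` classes, the integer timestamps, and the order of the three exchanges are assumptions made for illustration. It shows how a per-origin summary vector (largest TS seen) makes Replica3 skip Update2 after it has already learned Update3 through Replica2.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Update:
    origin: str  # replica that first accepted the update
    scope: str
    ts: int      # timestamp assigned by the accepting replica


@dataclass
class Replica:
    name: str
    scopes: frozenset
    updates: set = field(default_factory=set)
    summary: dict = field(default_factory=dict)  # origin -> largest TS seen

    def accept(self, scope, ts):
        """Accept a new update locally (this replica is its origin)."""
        self._store(Update(self.name, scope, ts))

    def _store(self, update):
        self.updates.add(update)
        seen = self.summary.get(update.origin, 0)
        self.summary[update.origin] = max(seen, update.ts)

    def pull_from(self, peer):
        """Classic anti-entropy: fetch updates newer than our summary vector,
        keeping only those in scopes this replica serves."""
        for update in sorted(peer.updates, key=lambda u: u.ts):
            newer = update.ts > self.summary.get(update.origin, 0)
            if newer and update.scope in self.scopes:
                self._store(update)


r1 = Replica("R1", frozenset({"Scope1", "Scope2"}))
r2 = Replica("R2", frozenset({"Scope1"}))
r3 = Replica("R3", frozenset({"Scope1", "Scope2"}))
r1.accept("Scope1", 1)
r1.accept("Scope2", 2)
r1.accept("Scope1", 3)

r2.pull_from(r1)  # (1) R2 receives Update1 and Update3 only
r3.pull_from(r2)  # (2) R3 receives both; its summary for R1 becomes 3
r3.pull_from(r1)  # (3) Update2 has TS2 < 3, so it is never requested
missing = r1.updates - r3.updates
print(missing)
```

The remaining entry in `missing` is the Scope2 update: the summary vector claims R3 has everything from R1 up to TS3, which is only true for Scope1.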
10Selective Anti-Entropy
- For high-availability partial replication [PODC02]
- Exchange updates in any number of subsets during a session
- Use safe sessions only (no summary problem)
- (R6, <R3>, R1) is safe
- (R6, <R3>, R2) and (R6, <R3>, R4) are not safe
R1 (S1,S2,S3)
R2 (S1)
R3 (S1,S2)
R4 (S2,S3)
R5 (S3)
R6 (S1,S2)
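A possible safety check that reproduces the three examples above is sketched below. The rule itself is an assumption inferred from those examples, not taken from the slide: a session (receiver, <owner>, sender) is treated as safe when the sender serves every scope that the receiver shares with the subset's owner. The function name `is_safe_session` is hypothetical.

```python
# Scopes served by each replica, copied from the figure.
SCOPES = {
    "R1": {"S1", "S2", "S3"},
    "R2": {"S1"},
    "R3": {"S1", "S2"},
    "R4": {"S2", "S3"},
    "R5": {"S3"},
    "R6": {"S1", "S2"},
}


def is_safe_session(receiver, subset_owner, sender, scopes=SCOPES):
    """Assumed rule: the sender must serve every scope that the receiver
    shares with the owner of the requested subset, so nothing the receiver
    needs from that subset can be hidden by the sender's partial view."""
    needed = scopes[receiver] & scopes[subset_owner]
    return needed <= scopes[sender]


for sender in ("R1", "R2", "R4"):
    print(sender, is_safe_session("R6", "R3", sender))
```

Under this assumed rule, R2 fails because it lacks S2 and R4 fails because it lacks S1, which matches the slide's classification.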
11Thesis Outline
- Service discovery
- Enhancements to Service Location Protocol (SLP)
- Selective anti-entropy for high-availability partial replication
- Web hotspot rescue
- DotSlash: an automated web hotspot rescue system
- Traffic prediction for overload prevention
- Summary
12Web Hotspots
(Figure labels: Web Server, Internet)
- Flash crowds, the Slashdot effect
- Short-term dramatic load spikes
- Dynamic content sites: more vulnerable, different bottlenecks
- Challenges
- Capacity planning (clusters, mirrors, CDNs): not cost-effective
13Our Approach
- DotSlash
- An automated web hotspot rescue system
- Address different bottlenecks
- Usage model
- Mutual-aid community for different web sites
- Three types of communities
- Open
- Closed authentication
- Flood-insurance closed authentication tokens
14DotSlash Overview
(Figure: DotSlash overview. Origin server www.origin.com (1.2.3.4) with origin DNS (dynamic DNS); rescue server www.rescue.com (5.6.7.8) with rescue DNS (dynamic DNS) and a cache.
Client1, HTTP redirect, cache miss: (1) www.origin.com, (2) 1.2.3.4, (3), (4) HTTP redirect vh1.www.rescue.com, (5) vh1.www.rescue.com, (6) 5.6.7.8, (7), (8) reverse proxy, (9), (10).
Client2, DNS RR, cache hit: (1) www.origin.com, (2) 5.6.7.8, (3), (4).)
15DotSlash Components
- Basic system (static and dynamic content)
- Dynamic virtual hosting
- Request redirection
- Workload monitoring
- Rescue control
- Rescue server discovery
- Extensions for dynamic content
- Dynamic replication of application programs
- On-demand query result caching
16Workload Migration
- Request redirection at origin server
- DNS-RR: first-level crude load distribution
- Add rescue server IP address to local DNS
- HTTP redirect: second-level fine-grained load balancing
- Policies: weighted round robin based on rescue capacity
- Dynamic virtual hosting at rescue server
- Assign virtual host name to origin server
- Used in origin server's HTTP redirect
- Map client requests
- Its own name www.rescue.com → its own content
- An alias vh1.www.rescue.com (HTTP redirect) → origin server
- An origin server www.origin.com (DNS-RR) → origin server
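The request mapping at the rescue server and the weighted round-robin redirect policy at the origin server can be sketched as follows. This is an illustration under stated assumptions, not DotSlash code: the function names, the action labels, the integer capacities, and the second alias `vh1.www.rescue2.example` are hypothetical; only the three host-name cases come from the slide.

```python
import itertools

OWN_NAME = "www.rescue.com"
ORIGIN_OF_ALIAS = {"vh1.www.rescue.com": "www.origin.com"}  # assigned aliases
RESCUED_ORIGINS = {"www.origin.com"}                         # reached via DNS-RR


def map_request(host):
    """Return (action, target) for a request arriving at the rescue server,
    based on the Host name the client used."""
    host = host.lower()
    if host == OWN_NAME:
        return ("serve_own_content", OWN_NAME)
    if host in ORIGIN_OF_ALIAS:  # client followed an HTTP redirect
        return ("proxy_to_origin", ORIGIN_OF_ALIAS[host])
    if host in RESCUED_ORIGINS:  # client was sent here by DNS round robin
        return ("proxy_to_origin", host)
    return ("reject", None)


def weighted_round_robin(capacities):
    """Cycle through redirect targets in proportion to integer capacities."""
    expanded = [alias for alias, weight in capacities.items()
                for _ in range(weight)]
    return itertools.cycle(expanded)


picker = weighted_round_robin({"vh1.www.rescue.com": 2,
                               "vh1.www.rescue2.example": 1})
picks = [next(picker) for _ in range(6)]
print(map_request("vh1.www.rescue.com"), picks)
```

Expanding each target by its weight keeps the sketch short; a production policy would more likely interleave targets rather than emit each one in a run.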
17Rescue Management
- Workload monitoring
- Network and CPU utilization
- Load regions
- Trigger different rescue actions
- Rescue protocol
- Rescue server discovery
- DotSlash registries
- Replicated based on mSLP
- Registry discovery via DNS SRV
- dot-slash.net
(Figure: load regions (heavy, desired, light) and rescue protocol messages exchanged between the origin server and the rescue server: SOS, 200 OK, TOKEN, RATE, KEEPALIVE, SHUTDOWN)
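18Server States and Rescue Actions
(Figure: state diagram with states Normal, SOS, and Rescue. Pr: redirect probability; Rr: allowed redirect rate.
Between Normal and SOS: Get rescue, Release rescue. Within SOS: Get more rescue, Increase Pr, Decrease Pr.
Between Normal and Rescue: Provide rescue, Shutdown last rescue. Within Rescue: Provide more rescue, Shutdown some rescue, Increase Rr, Decrease Rr.)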
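The origin-server side of this diagram (Normal and SOS states, driven by the load region) might be coded along the following lines. Everything numeric here is an assumption: the thresholds, the step size, the use of an integer percentage for Pr, and the order in which actions are tried are illustrative choices, and `load_region` and `origin_step` are hypothetical names. The rescue-server side would adjust Rr in an analogous way.

```python
def load_region(utilization, lower=0.5, upper=0.8):
    """Classify a utilization in [0, 1] into a load region.
    The two thresholds are placeholder values."""
    if utilization > upper:
        return "heavy"
    if utilization < lower:
        return "light"
    return "desired"


def origin_step(state, region, pr, step=10):
    """One control step for an origin server.

    state: "normal" or "sos"; pr: redirect probability in percent (0-100).
    Returns (new_state, action, new_pr).
    """
    if state == "normal":
        if region == "heavy":
            return ("sos", "get rescue", step)
        return ("normal", "none", pr)
    # state == "sos"
    if region == "heavy":
        if pr < 100:
            return ("sos", "increase Pr", min(100, pr + step))
        return ("sos", "get more rescue", pr)
    if region == "light":
        if pr > 0:
            return ("sos", "decrease Pr", max(0, pr - step))
        return ("normal", "release rescue", 0)
    return ("sos", "none", pr)


state, pr = "normal", 0
for utilization in (0.9, 0.9, 0.3, 0.3, 0.3):
    state, action, pr = origin_step(state, load_region(utilization), pr)
    print(state, action, pr)
```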
19Dynamic Script Replication
- Dynamic content web sites
- LAMP configuration: Linux, Apache, MySQL, PHP
- Remove web/application server bottleneck
(Figure labels: Client, Origin Server (Apache), Rescue Server (Apache), Database (MySQL); steps (1) to (8), with step (5) labeled PHP)
20On-demand Query Result Caching
- Reduce workload at read-mostly databases
(Figure: origin server and rescue server panels, each labeled with query result cache, web/application server, data driver, and database server; client)
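A query result cache placed in front of a read-mostly database could look roughly like this. It is a sketch under assumptions, not the DotSlash data driver: the `QueryResultCache` class is hypothetical, and expiring entries after a fixed time-to-live is an assumed policy, since the slide does not say how cached results are invalidated.

```python
import time


class QueryResultCache:
    """Cache results of read queries, keyed by the SQL text."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # SQL text -> (expiry time, rows)

    def query(self, sql, run_query):
        """Return cached rows for a SELECT, or run the query and cache it.
        Anything that is not a SELECT always goes to the database."""
        if not sql.lstrip().lower().startswith("select"):
            return run_query(sql)
        now = self.clock()
        hit = self.entries.get(sql)
        if hit and hit[0] > now:
            return hit[1]
        rows = run_query(sql)
        self.entries[sql] = (now + self.ttl, rows)
        return rows


# Usage with a stand-in for the database and a controllable clock.
calls = []
now = [0.0]


def fake_db(sql):
    calls.append(sql)
    return [("row", len(calls))]


cache = QueryResultCache(ttl_seconds=10, clock=lambda: now[0])
first = cache.query("SELECT title FROM stories", fake_db)
second = cache.query("SELECT title FROM stories", fake_db)
print(first == second, len(calls))
```

Keying on the exact SQL text is the simplest choice; it only pays off when the same queries repeat, which is the read-mostly case the slide targets.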
21Caching and Data Driver Control
(Figure: caching control and data driver control for the normal, SOS, and rescue states, by load region: heavy load above the upper threshold, desired load, light load below the lower threshold. Each state is labeled Cache On and Cache Off.)
22Implementation
- Three configurations in using DotSlash
- Dots_Apache
- Dots_Apache + Dots_PHP
- Dots_Apache + Dots_PHP + Dots_MySQL
(Figure labels: Client, HTTP, Apache, DotSlash Module, Shared Memory, DotSlash Daemon, DotSlash Rescue Protocol, DotSlash Daemon, SLP, mSLP DA, DNS, BIND)
23Evaluation
- Experimental Setup
- LAMP: Redhat 9.0, Apache 2.0.49, PHP 4.3.6, MySQL 4.0.18
- Dynamic DNS: BIND 9.2.2, dot-slash.net
- Service discovery: enhanced SLP
- LAN and PlanetLab
- Workload
- Static content: httperf (HP Labs)
- Dynamic content: RUBBoS (Rice University)
- Metrics
- Max data rate delivered
- Max request rate supported
24Relieving Network Bottleneck
- Origin server: a PlanetLab node behind DSL
- Rescue server: a local machine connected to Internet2
25Workload Control and Migration
(Figure panels: Network Workload Control; Workload Migration)
- Request/redirect rate at origin server
- Rescue rate at rescue servers
- Data rate at origin server and rescue servers
- Total data rate delivered to clients
26Removing Web/Application Server Bottleneck
- Different configurations for origin server
- HC: 2 GHz CPU, 1 GB
- LC: 1 GHz CPU, 512 MB
- Origin server: HC / LC
- Rescue server: LC
- Database server: HC
- CPU utilization
- Origin server: 50-60%
- Rescue servers: 50%
- Database server: 95%
27Reducing Database Workload (Read-only)
(Figure legend: c = cache, r = rescue, sc = shared cache)
28Reducing Database Workload (Submission)
(Figure legend: c = cache, r = rescue)
29Thesis Outline
- Service discovery
- Enhancements to Service Location Protocol (SLP)
- Selective anti-entropy for high-availability partial replication
- Web hotspot rescue
- DotSlash: an automated web hotspot rescue system
- Traffic prediction for overload prevention
- Summary
30Traffic Prediction
- Traditional approach
- Based on a number of history intervals
- At a single time scale
- Use curve fitting
- Our approach
- Predict upper bound of future web traffic volume
- For overload prevention
- Use a multiple-time-scale approach
- Only based on current interval
- At different time scales: self-similarity
- Use statistical properties of web traffic
31Design
- Prediction algorithm [WWW03]
- Given a time scale T
- D(T): difference of traffic volume between adjacent intervals
- $\mu(D(T))$: mean of D(T)
- $\sigma(D(T))$: standard deviation of D(T)
- Divide T into n sub-intervals: $T' = T/n$
- $\mu(D(T)) = n^{H}\,\mu(D(T'))$, $\sigma(D(T)) = n^{H}\,\sigma(D(T'))$; H: Hurst parameter
- Parameter selection
- T: prediction interval, < 100 seconds
- n: scaling factor, 10, 100
- H: Hurst parameter, 0.8, 0.9
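The scaling step can be illustrated with a short function. This is a sketch under assumptions rather than the published algorithm: it takes the traffic volumes of the n sub-intervals (each of length T/n) of the current interval, computes the mean and standard deviation of their adjacent differences, and scales both by n^H. Using signed differences, and forming the upper bound as current volume plus the scaled mean plus k scaled standard deviations, are illustrative choices not stated on the slide; `predict_upper_bound` and `k` are hypothetical names.

```python
import statistics


def predict_upper_bound(sub_volumes, hurst=0.85, k=2.0):
    """Estimate an upper bound on the traffic volume of the next interval T.

    sub_volumes: volumes of the n sub-intervals of the current interval.
    hurst: Hurst parameter H used for self-similar scaling.
    k: safety margin in standard deviations (assumed, not from the slide).
    """
    n = len(sub_volumes)
    if n < 2:
        raise ValueError("need at least two sub-intervals")
    # Differences between adjacent sub-intervals at the fine time scale.
    diffs = [b - a for a, b in zip(sub_volumes, sub_volumes[1:])]
    mu_fine = statistics.mean(diffs)
    sigma_fine = statistics.pstdev(diffs)
    # Scale the fine-grained statistics up to time scale T by n^H.
    scale = n ** hurst
    mu_t = scale * mu_fine
    sigma_t = scale * sigma_fine
    current = sum(sub_volumes)
    return current + mu_t + k * sigma_t


flat = predict_upper_bound([100] * 10)
rising = predict_upper_bound(list(range(10, 101, 10)), hurst=1.0)
print(flat, rising)
```

With perfectly flat traffic the bound equals the current volume; with steadily rising traffic the scaled mean difference pushes the bound above it.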
32Experimental Results
(Figure: three servers on Day 74; n = 10, H = 0.85)
(Figure: Server 41 on Day 65; n = 10, H = 0.85, T = 100 seconds)
- 1998 World Cup data set: 1.35 billion requests, 30 servers, 92 days
33Thesis Summary
- Major thesis contributions
- Enhancements to Service Location Protocol
- Selective anti-entropy for high-availability partial replication
- DotSlash: an automated web hotspot rescue system
- Web traffic prediction for overload prevention
- Open-source software releases
- http://mslp.sourceforge.net
- http://dotslash.sourceforge.net
- Future work
- Apply DotSlash to other Internet servers, P2P systems, web services
- Address security issues in DotSlash
- Location-based service discovery
34Major Publications
- Request for Comments (RFCs)
- RFC 3528: mSLP
- RFC 3421: preference filters
- RFC 3832: remote service discovery
- Conference and journal papers
- DotSlash: PODC04, WCW04, GI05, ICAC06
- Traffic prediction: WWW03
- Selective anti-entropy: PODC02
- Service discovery: ICCCN00, ICCCN02, JSS05