Title: On the Placement of Web Server Replicas
1On the Placement of Web Server Replicas
- Lili Qiu, Microsoft Research
- Venkata N. Padmanabhan, Microsoft Research
- Geoffrey M. Voelker, UCSD
- IEEE INFOCOM 2001, Anchorage, AK, April 2001
2Outline
- Overview
- Related work
- Our approach
- Simulation methodology and results
- Summary
3Motivation
- Growing interest in Web server replicas
- Exponential growth in Web usage
- Content providers want to offer better service at lower cost
- Solution: replication
- Forms of Web server replicas
- Mirror sites
- Content Distribution Networks (CDNs)
- CDN: a network of servers
- Examples: Akamai, Digital Island
[Figure: content providers and clients connected through the Internet, with five replicas placed in the network]
4Placement of Web Server Replicas
- Problem specification
- Among a set of N potential sites, pick K sites as replicas to minimize users' latency or bandwidth usage
[Figure: content providers and clients connected through the Internet]
5Related Work
- Placement of Web proxies [LGI99]
- Cache location [KRS00]
- Placement of Internet instrumentation [JJJ00]
6Our Approach
- Model Internet as a graph
- Parameterize the graph using measured inputs
- Number of requests generated from each region
- Distance between different regions
- Map the placement problem onto a graph optimization problem
- Assumption
- Each client uses the single replica that is closest to it
- Solve the graph optimization problem
- Using various approximation algorithms
7Minimum K-median Problem
- Given a complete graph G(V,E), d(j), c(i,j)
- d(j): number of requests from node j
- c(i,j): distance between nodes i and j
- Latency
- or hop counts
- or other metric to be optimized
- Find a subset $V' \subseteq V$ with $|V'| = K$ that minimizes
$\sum_{v \in V} \min_{w \in V'} d(v)\, c(v,w)$
- NP-hard problem
[Figure: example graph with weighted edges]
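The K-median objective on this slide can be evaluated directly for any candidate set of replicas. The following is a minimal sketch, assuming demand is stored as a list d and distances as a full matrix c. The function name placement_cost and the 4-node line topology are hypothetical and only serve as an illustration.

```python
def placement_cost(demand, dist, replicas):
    """Sum over all nodes v of d(v) * min over chosen replicas w of c(v, w)."""
    if not replicas:
        raise ValueError("at least one replica is required")
    return sum(
        demand[v] * min(dist[v][w] for w in replicas)
        for v in range(len(demand))
    )


# Hypothetical example: four nodes on a line with unit spacing.
demand = [10, 1, 1, 10]                                      # d(j)
dist = [[abs(i - j) for j in range(4)] for i in range(4)]    # c(i, j)

cost_ends = placement_cost(demand, dist, [0, 3])     # replicas at the busy ends
cost_middle = placement_cost(demand, dist, [1, 2])   # replicas at the quiet middle
print(cost_ends, cost_middle)
```

In this toy instance, placing the two replicas at the heavily loaded end nodes gives a much lower cost than placing them in the middle.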
8Placement Algorithms
- Tree-based algorithm [LGG99]
- Assume the underlying topologies are trees, and model the placement as a dynamic programming problem
- $O(N^3 M^2)$ for choosing M replicas among N potential places
- Random
- Pick the best among several random assignments
- Hot spot
- Place replicas near the clients that generate the
largest load
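The random and hot spot heuristics can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the number of random trials is arbitrary, and "near the clients that generate the largest load" is simplified here to "at the K nodes with the largest demand". All function names and the example data are hypothetical.

```python
import random


def placement_cost(demand, dist, replicas):
    """K-median cost: each node is served by its closest replica."""
    return sum(
        demand[v] * min(dist[v][w] for w in replicas)
        for v in range(len(demand))
    )


def random_placement(demand, dist, k, trials=20, seed=0):
    """Pick the best among several random assignments (trial count is an assumption)."""
    rng = random.Random(seed)
    best_sites, best_cost = None, float("inf")
    for _ in range(trials):
        candidate = rng.sample(range(len(demand)), k)
        cost = placement_cost(demand, dist, candidate)
        if cost < best_cost:
            best_sites, best_cost = candidate, cost
    return best_sites, best_cost


def hot_spot_placement(demand, k):
    """Simplified hot spot: choose the k nodes that generate the most requests."""
    return sorted(range(len(demand)), key=lambda v: demand[v], reverse=True)[:k]


# Hypothetical example: four nodes on a line with unit spacing.
demand = [1, 10, 1, 9]
dist = [[abs(i - j) for j in range(4)] for i in range(4)]

random_sites, random_cost = random_placement(demand, dist, k=2)
hot_sites = hot_spot_placement(demand, k=2)
print(random_sites, random_cost, hot_sites)
```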
9Placement Algorithms (Cont.)
- Greedy algorithm
- Calculate costs of assigning clients to replicas
- Select replica with lowest cost
- Adjust costs based upon assignment, repeat until done
- Super-Optimal algorithm
- Lagrangian relaxation with subgradient method
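The greedy steps above can be sketched as follows. This is a minimal sketch, assuming every node is a potential site and that ties are broken by lowest node index. The function name greedy_placement and the example data are hypothetical.

```python
def greedy_placement(demand, dist, k):
    """Add one replica at a time, each time picking the site that lowers total cost most."""
    n = len(demand)
    chosen = []
    # closest[v] is the distance from v to its nearest replica chosen so far.
    closest = [float("inf")] * n
    total = float("inf")
    for _ in range(k):
        best_site, best_cost = None, float("inf")
        for s in range(n):
            if s in chosen:
                continue
            # Cost if s were added to the replicas already chosen.
            cost = sum(demand[v] * min(closest[v], dist[v][s]) for v in range(n))
            if cost < best_cost:
                best_site, best_cost = s, cost
        chosen.append(best_site)
        total = best_cost
        # Adjust per-node costs to reflect the new assignment.
        closest = [min(closest[v], dist[v][best_site]) for v in range(n)]
    return chosen, total


# Hypothetical example: four nodes on a line with unit spacing.
demand = [1, 10, 1, 9]
dist = [[abs(i - j) for j in range(4)] for i in range(4)]

sites, cost = greedy_placement(demand, dist, k=2)
print(sites, cost)
```

Each round costs O(N^2) in this naive form, so choosing K replicas takes O(K N^2). Greedy is a heuristic: it matches the best placement on this small instance, but that is not guaranteed in general.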
10Simulation Methodology
- Network topology
- Randomly generated topologies
- Using GT-ITM Internet topology generator
- Real Internet network topology
- AS-level topology obtained using BGP routing data from a set of seven geographically dispersed BGP peers
- Web Workload
- Real server traces
- MSNBC, ClarkNet, NASA Kennedy Space Center
- Performance Metric
- Relative performance: $\mathrm{cost}_{\text{practical}} / \mathrm{cost}_{\text{super-optimal}}$
11Simulation Methodology (Cont.)
- Simulate a network of N nodes (100 ≤ N ≤ 3000)
- Cluster clients using network-aware clustering [KW00]
- IP addresses with the same address prefix belong to a cluster
- A small number of popular clusters account for most requests
- Top 10, 100, 1000, 3000 clusters account for about 24%, 45%, 78%, and 94% of the requests respectively
- Pick the top N clusters
- Map them to different nodes
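Grouping clients by address prefix and ranking the resulting clusters might look like the sketch below. The slide says only that addresses with the same prefix form a cluster, so the fixed /24 prefix length used here is an assumption made for illustration. The function names and the sample addresses (taken from documentation address ranges) are hypothetical.

```python
import ipaddress
from collections import Counter


def cluster_requests(client_ips, prefix_len=24):
    """Count requests per cluster, where a cluster is a shared address prefix.

    A fixed prefix length is an assumption of this sketch.
    """
    counts = Counter()
    for ip in client_ips:
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        counts[str(network)] += 1
    return counts


def top_clusters(counts, n):
    """Return the n clusters that generate the most requests."""
    return counts.most_common(n)


# Hypothetical request log: one client IP per request.
requests = ["192.0.2.10", "192.0.2.77", "198.51.100.5", "192.0.2.10", "203.0.113.9"]
counts = cluster_requests(requests)
top = top_clusters(counts, 2)
print(top)
```

The top N clusters returned this way would then each be mapped to a distinct node of the simulated topology.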
12Simulation Methodology (Cont.)
- Random trees
- Random graphs
- AS-level topologies
- Sensitivity to the error in the input
13Random Tree Topologies
The tree-based algorithm performs well, as expected. The greedy algorithm performs equally well.
14Random Graph Topologies
The greedy and hot-spot algorithms outperform the tree-based algorithm.
15Large Random Graph Topologies
The greedy algorithm performs best, and the hot-spot algorithm performs nearly as well.
16AS-level Internet Topologies
The greedy algorithm performs best, and the hot-spot algorithm performs nearly as well.
17Effects of Imperfect Knowledge about Input Data
- Predicted workload (using moving window average)
- Perfect topology information
Within 5% degradation when using predicted workload
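A moving window average predictor for per-node load could be sketched as below. The window length and the measurement period are not given on the slide, so the window of 3 periods is an assumption. The class name MovingWindowPredictor and the sample demand vectors are hypothetical.

```python
from collections import deque


class MovingWindowPredictor:
    """Predict next-period demand as the mean of the last `window` periods."""

    def __init__(self, window):
        # deque with maxlen drops the oldest period automatically.
        self.history = deque(maxlen=window)

    def observe(self, demand):
        """Record the per-node request counts of one completed period."""
        self.history.append(list(demand))

    def predict(self):
        """Return the per-node average over the periods currently in the window."""
        if not self.history:
            raise ValueError("no observations yet")
        n = len(self.history[0])
        return [
            sum(period[v] for period in self.history) / len(self.history)
            for v in range(n)
        ]


# Hypothetical demand for two nodes over four periods; window length is an assumption.
predictor = MovingWindowPredictor(window=3)
for period in ([10, 0], [20, 2], [30, 4], [40, 6]):
    predictor.observe(period)

predicted = predictor.predict()
print(predicted)
```

The predicted vector can then be passed to a placement algorithm in place of the true future demand.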
18Effects of Imperfect Knowledge about Input Data (Cont.)
- Predicted workload (using moving window average)
- Noisy topology information
- Perturb the distance between two nodes i and j by
up to a factor of 2
Within 15% degradation when using predicted workload and noisy topology information
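One way to inject this kind of topology noise is sketched below. The slide states only that each distance is perturbed by up to a factor of 2. The choice of a multiplicative factor drawn log-uniformly between 1/2 and 2, and the decision to keep the matrix symmetric, are assumptions of this sketch. The function name perturb_distances is hypothetical.

```python
import random


def perturb_distances(dist, max_factor=2.0, seed=0):
    """Scale each pairwise distance by a random factor in [1/max_factor, max_factor].

    The log-uniform distribution of the factor is an assumption.
    """
    rng = random.Random(seed)
    n = len(dist)
    noisy = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            factor = max_factor ** rng.uniform(-1.0, 1.0)
            # Use the same noisy value in both directions to keep symmetry.
            noisy[i][j] = noisy[j][i] = dist[i][j] * factor
    return noisy


# Hypothetical example: four nodes on a line with unit spacing.
dist = [[abs(i - j) for j in range(4)] for i in range(4)]
noisy = perturb_distances(dist)
print(noisy)
```

A placement would then be computed on the noisy matrix and its cost evaluated on the true distances to measure the degradation.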
19Summary
- One of the first experimental studies on placement of Web server replicas
- Knowledge about client workload and topology is needed for provisioning replicas
- The greedy algorithm performs very well
- Within a factor of 1.1 to 1.5 of the super-optimal
- Insensitive to noise
- Stays within a factor of 2 of the super-optimal when the salted error is a factor of 4
- The hot spot algorithm performs nearly as well
- Within a factor of 1.6 to 2 of the super-optimal
- Obtaining input data
- Moving window average for load prediction
- Using BGP router data to obtain topology
information
20Conclusion
- Recommend using the greedy algorithm for deciding
the placement of Web server replicas
21Acknowledgement
- Craig Labovitz
- Yin Zhang
- Ravi Kumar
22Comments on greedy algorithm performance
- Worst-case performance unbounded
- Bad example
- A full homogeneous binary tree with $n = 2^i$ leaves and n caches
- optimal cost = 0
- greedy cost = (n-1)d
- However, the worst-case scenario seems unlikely to occur in real and random topologies
[Figure: binary tree with edge weights 0 and d]
23Simulation Results in Random Tree Topologies
28Simulation Results in Real Internet Topologies
29Obtaining Input Data
- Workload
- The number of requests generated by popular client clusters
- Stable
- Placement algorithm can use moving window average for predicting load with negligible impact on performance
- Network topology
- Propagation delay
- Hop count
- AS hop count
- Internet weather map
30Placement of Web Server Replicas
- Goal
- Placing K replicas to minimize users' latency or bandwidth usage
- Minimum K-median problem
- Select K servers to minimize the sum of assignment costs
- NP-hard problem
[Figure: content providers and clients connected through the Internet, with five replicas placed in the network]