Title: Taxonomy and Design Analysis for Distributed Web Caching
1Taxonomy and Design Analysis for Distributed Web Caching
- Sandra G. Dykes, Clinton L. Jeffery, Samir Das
- Division of Computer Science
- University of Texas at San Antonio
- http://www.cs.utsa.edu/research/proxy/proxy.html
2Outline
- Web traffic characteristics
- Taxonomy for distributed web caches
- Analysis of web cache design
- Which taxonomy categories best match Web traffic?
- Server-directed proxy sharing
- Simulation results
3Proxy Server Caching
[Figure: proxy server caching. Labels: ISP boundary, Proxy, User (x4)]
4Proxy server caching is not enough
- Hit rates depend upon overlap in user requests
- high hit rates only if there are many users or if users access the same objects.
- Maximum proxy cache hit rates (% of requests)
- DEC: 30 - 50%
- Virginia Tech: 27%, 28%, 43%
- AOL: 50%
- Measured hit rates
- Duska, Marwood and Feeley: 24 - 45%
5Web Traffic Characteristics
Characteristic                              UTSA       Other
Small objects                               <12 KB     <21 KB
Pareto distribution
Duplicated requests                         >99%       >97%
Skewed popularity: 10% of objects satisfy   74% req.   90% req.
HTML and images: transfers                  94%        >90%
HTML and images: bytes                      61%        52 - 94%
Embedded images: transfers                  42%
Embedded images: bytes                      34%
Embedded images: per page                   3.3        3 - 5
HICSS 32
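The popularity-skew row above (a small share of objects satisfying most requests) can be measured on any request log. The following is a minimal sketch, not the authors' analysis code: the function name `top_share` and the sample log are hypothetical and only illustrate the calculation.

```python
from collections import Counter
import math

def top_share(requests, fraction=0.10):
    """Share of requests satisfied by the most popular `fraction` of distinct objects."""
    counts = Counter(requests)
    k = max(1, math.ceil(fraction * len(counts)))  # number of objects in the top slice
    top_hits = sum(n for _, n in counts.most_common(k))
    return top_hits / len(requests)

# Hypothetical log: 10 distinct objects, one of them requested far more often.
log = ["/index.html"] * 11 + [f"/page{i}.html" for i in range(9)]
print(f"top 10% of objects satisfy {top_share(log):.0%} of requests")
```

Running the same function over a real proxy or server trace would give a figure comparable to the skew values in the table.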
6Web Traffic Characteristics
- Most objects are long-lived
- (Bestavros et al.)
- HTML 50 days
- GIF 85 days
- JPEG 100 days.
- Popularity varies by region.
- Popularity can change rapidly.
- Requests often arrive in bursts.
7Implications for Web caching
- Small objects -> latency important
- Pareto distribution -> bandwidth important
- Bursty arrivals -> dynamically distribute load
- Most cached objects won't be out-of-date.
- Adapt quickly to popularity shifts.
- Utilize popularity skew and Web page information (embedded images).
- Consider geography.
- Consider network topology.
8Taxonomy for Web Caching
Discovery: Fixed cache; Group query (Manual, Automatic); Directory lookup (Centralized, Distributed)
Dissemination: Client-initiated; Server-initiated
Delivery: Direct; Indirect
9 [Figure: three cache architectures, each in front of a Web Server: Proxy Caching (Fixed Cache), Flat mesh (Directory), Hierarchical (Group query)]
10Discovery
- Distributed metadata directories
- fast, local lookups
- separates discovery from site selection
- metadata can include URLs of related objects, timestamps, object size, site performance data, ...
- Fixed cache, non-hierarchical groups, and centralized directories
- do not scale.
- Hierarchical group query
- large miss penalties at each level
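To make the idea of a local metadata directory concrete, here is a minimal sketch. The class and field names (`ObjectMetadata`, `MetadataDirectory`, `cache_sites`, and so on) and the example hosts are hypothetical; the slides list the kinds of metadata but do not define a data structure. The point shown is that discovery is a local lookup that returns candidate sites, leaving site selection as a separate step.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectMetadata:
    """Hypothetical directory entry; field names are illustrative."""
    url: str
    size_bytes: int
    last_modified: float                               # timestamp, seconds since epoch
    cache_sites: list = field(default_factory=list)    # proxies believed to hold a copy
    related_urls: list = field(default_factory=list)   # e.g. embedded images

class MetadataDirectory:
    """Local in-memory directory: discovery is a dictionary lookup, no network query."""
    def __init__(self):
        self._entries = {}

    def update(self, meta):
        self._entries[meta.url] = meta

    def lookup(self, url):
        """Return candidate cache sites for url, or None on a directory miss."""
        meta = self._entries.get(url)
        return list(meta.cache_sites) if meta else None

directory = MetadataDirectory()
directory.update(ObjectMetadata("http://example.org/a.html", 4096, 0.0,
                                cache_sites=["proxy1.example.net", "proxy2.example.net"],
                                related_urls=["http://example.org/logo.gif"]))
print(directory.lookup("http://example.org/a.html"))
print(directory.lookup("http://example.org/missing.html"))
```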
11Dissemination
- Client-initiated
- adapts automatically to popularity shifts
- uses current request patterns.
- Server-initiated
- uses historic data
- may be sensitive to time window
- proxies do not control what they cache
12Delivery
- Direct
- lower latency
- fewer connections
- less network traffic.
- Indirect
- Each intermediate site requires a remote connection and object transfer.
- HTTP delivery is store-and-forward, making indirection most costly for large files
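The cost of indirection can be illustrated with a very simple store-and-forward model. This is a sketch under stated assumptions, not a result from the talk: every hop is assumed to have the same connection setup time and bandwidth, and the default values are made up for illustration.

```python
def delivery_time(size_bytes, hops, setup_s=0.1, bandwidth_bps=1_000_000):
    """Store-and-forward model: every hop opens a connection and transfers the whole object."""
    per_hop = setup_s + (size_bytes * 8) / bandwidth_bps
    return hops * per_hop

for size in (4_000, 260_000):                  # roughly a small page vs. a large file
    direct = delivery_time(size, hops=1)
    indirect = delivery_time(size, hops=3)     # two intermediate cache sites
    print(f"{size:>7} B  direct {direct:.3f} s  indirect {indirect:.3f} s  "
          f"extra {indirect - direct:.3f} s")
```

Because each intermediate site must receive the full object before forwarding it, the extra delay grows with object size, which is why the penalty is largest for large files.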
13 - How do we propagate metadata?
14Server-Directed Proxy Sharing
[Figure: SDP overview. Labels: Web Server, PROXY TABLE, Request, Object + Popular List]
15SDP Components
[Figure labels: Web Server, PROXY TABLE]
- Metadata Directory: proxies and other metadata for remote objects.
- Popular List: most popular objects in the local object cache.
- Proxy Table: proxies and other metadata for the server's objects.
16SDP Protocol
1. Proxy looks in local object cache.
17SDP Local cache miss
2. Proxy looks in Metadata Directory.
18SDP Directory miss
3. Proxy requests object from server. Server returns the object plus metadata for related objects (server direction).
[Figure: proxy sends Request to the Web Server (PROXY TABLE); server returns Object + Proxy List]
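The three lookup steps can be sketched as a single function. This is an illustration only: the function `sdp_get`, the stand-in fetchers, and the use of plain dictionaries for the object cache and Metadata Directory are assumptions, and site selection is reduced to taking the first listed proxy.

```python
def sdp_get(url, local_cache, directory, fetch_from_proxy, fetch_from_server):
    """Resolve url following the three lookup steps; returns (object, source)."""
    # 1. Local object cache.
    if url in local_cache:
        return local_cache[url], "local cache"
    # 2. Metadata Directory: a hit names peer proxies that hold the object.
    sites = directory.get(url)
    if sites:
        obj = fetch_from_proxy(sites[0], url)   # site selection simplified to "first site"
        local_cache[url] = obj
        return obj, f"proxy {sites[0]}"
    # 3. Directory miss: ask the server, which also returns metadata for related objects.
    obj, related = fetch_from_server(url)
    local_cache[url] = obj
    directory.update(related)                   # server direction: record sites for related objects
    return obj, "server"

# Hypothetical stand-ins for the network.
def fake_server(url):
    return f"<body of {url}>", {"/logo.gif": ["proxyB"]}

def fake_proxy(site, url):
    return f"<body of {url} from {site}>"

cache, directory = {}, {}
print(sdp_get("/index.html", cache, directory, fake_proxy, fake_server)[1])
print(sdp_get("/logo.gif", cache, directory, fake_proxy, fake_server)[1])
print(sdp_get("/index.html", cache, directory, fake_proxy, fake_server)[1])
```

In this run the first request goes to the server, the embedded image is then found through the directory at a peer proxy, and the repeated page request is served from the local cache.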
19SDP Site selection
[Figure: site selection by ping. Labels: Web Server, Ping (x3), Response]
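One plausible reading of the site-selection figure is that the proxy probes each candidate site and picks the one that answers fastest. The sketch below assumes exactly that; `select_site` and the measurement table are hypothetical, and the probe is passed in as a function so no real network call is made.

```python
def select_site(candidates, probe):
    """Return the candidate with the lowest measured round-trip time.

    probe(site) returns an RTT in seconds, or None if the site did not answer.
    """
    timings = {}
    for site in candidates:
        rtt = probe(site)
        if rtt is not None:
            timings[site] = rtt
    if not timings:
        return None
    return min(timings, key=timings.get)

# Hypothetical measurements standing in for real pings.
measured = {"proxyA": 0.080, "proxyB": 0.025, "webserver": 0.140, "proxyC": None}
print(select_site(measured, measured.get))
```

A real probe could be swapped in for `measured.get` without changing the selection logic, and static or statistical estimates could be combined with the run-time value in the same place.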
20SDP Proxy-to-Proxy request
Peer proxy returns the object plus metadata for popular objects (lazy prefetching).
[Figure: proxy sends Request to a peer proxy; peer returns Object + Popular List]
21 Simulation Results
[Plots: Web server traffic (MB/s); Web server requests (req/s)]
22Simulation results
[Plot: connection refusals]
23 Advantages of SDP design
- Takes advantage of Web characteristics
- linked object requests
- object popularity skew
- rapid changes in popularity
- Scalable
- Internet-wide sharing using local discovery
- Reduces server load, response time, network congestion
- Separates discovery from site selection
- Flexible
- Metadata content can support different purposes
- No change to routers or HTTP protocol
- No change to local cache policy
- No central administration, equipment or personnel
24Questions and Future Work
- What will hit rates be at Metadata Directories?
- Cache site selection
- Static: site bandwidth, network proximity, ...
- Statistical: avg site load, historical performance, ...
- Run-time: ping, tcping, bandwidth probe, ...
- Implementation
25Web Caching Projects
Project                   Discovery                  Dissemin.      Delivery
Proxy server cache        Fixed cache                Client-init.   Direct
Harvest / Squid           Group query - Manual       Client-init.   Indirect
Zhang, Floyd, Jacobson    Group query - Automatic    Client-init.   Indirect
Gwertzman and Seltzer     Directory - Centralized    Server-init.   Direct
Bestavros et al.          Directory - Centralized    Server-init.   Direct
Tewari, Dahlin, Vin,      Directory - Distributed    Server-init.   Direct
Dykes, Jeffery, Das       Directory - Distributed    Client-init.   Direct
26 Simulation
- Analytical workload
- Preliminary single server model
- Future extensions
- multiple servers
- network protocols, router queues
27 Why Analytical Workloads?
- Model functional dependencies of performance metrics on workload variables.
- Separate the effects of workload variables.
- Predict behavior for different workloads.
28 Single Server Simulation
- Sessions: Ta varied, Exp (a)
- Embedded images: Ta = 221 ms, Log-Normal (a,b)
- Connection duration: Ts = 289 ms, Log-Normal (a,b)
- Session: Probability P = 0.59
- HTML with images: P = 0.13
- HTML no images: P = 0.30
- Non-HTML: P = 0.16
- Embedded image: P = 0.41
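An analytical workload of this kind can be generated by sampling from the listed distributions. The sketch below is an assumption-laden illustration, not the authors' simulator: it treats 221 ms and 289 ms as means, picks an arbitrary log-normal shape `SIGMA` because the slide does not give the (a,b) values, and picks an arbitrary mean session gap because the slide says only "varied". The three session request types sum to the session probability of 0.59.

```python
import math
import random

def lognormal_with_mean(rng, mean, sigma):
    """Draw from a log-normal whose mean is `mean`; sigma is the std. dev. of the underlying normal."""
    mu = math.log(mean) - sigma ** 2 / 2       # because E[X] = exp(mu + sigma^2 / 2)
    return rng.lognormvariate(mu, sigma)

REQUEST_TYPES = {                  # probabilities from the slide
    "HTML with images": 0.13,
    "HTML no images": 0.30,
    "Non-HTML": 0.16,
    "Embedded image": 0.41,
}

rng = random.Random(1)             # fixed seed so runs are repeatable
SESSION_MEAN_S = 1.0               # assumption: the slide only says "varied"
SIGMA = 1.0                        # assumption: the slide does not give (a,b)

session_gap = rng.expovariate(1 / SESSION_MEAN_S)      # exponential inter-arrival time
image_gap = lognormal_with_mean(rng, 0.221, SIGMA)     # embedded-image gap, mean 221 ms
duration = lognormal_with_mean(rng, 0.289, SIGMA)      # connection duration, mean 289 ms
kind = rng.choices(list(REQUEST_TYPES), weights=list(REQUEST_TYPES.values()))[0]
print(session_gap, image_gap, duration, kind)
```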
29Size and Type Distributions
Object type    P_request   Avg Size   Distribution
HTML           0.430       4 KB       Pareto (a)
Image          0.506       11 KB      Pareto (a)
Audio          0.003       140 KB     Pareto (a)
Application    0.007       260 KB     Pareto (a)
Dynamic        0.019       1 KB       Pareto (a)
Other          0.031       11 KB      Pareto (a)
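The type and size table can drive a simple object generator. In this sketch the Pareto shape `ALPHA` is an assumed value, since the slide gives Pareto (a) without a number, and the scale is chosen so that each type's mean matches the listed average size. The listed probabilities sum to 0.996, so they are used as relative weights.

```python
import random

# (P_request, average size in KB) from the table
OBJECT_TYPES = {
    "HTML": (0.430, 4), "Image": (0.506, 11), "Audio": (0.003, 140),
    "Application": (0.007, 260), "Dynamic": (0.019, 1), "Other": (0.031, 11),
}
ALPHA = 1.5   # assumption: the slide gives Pareto (a) without a value

def pareto_scale(mean_kb, alpha=ALPHA):
    """Minimum size x_m such that a Pareto(alpha, x_m) has the given mean (needs alpha > 1)."""
    return mean_kb * (alpha - 1) / alpha

def sample_object(rng):
    """Pick an object type by request probability, then draw its size in KB."""
    names = list(OBJECT_TYPES)
    weights = [OBJECT_TYPES[n][0] for n in names]   # choices() treats these as relative weights
    kind = rng.choices(names, weights=weights)[0]
    size_kb = pareto_scale(OBJECT_TYPES[kind][1]) * rng.paretovariate(ALPHA)
    return kind, size_kb

rng = random.Random(7)
for _ in range(3):
    print(sample_object(rng))
```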
30Prototype Design
[Figure labels: SDP Server, SDP Proxy Server, SDP]
31How does SDP reduce server load?
- Proxies retrieve objects from other proxies without involving the Web server.
- Balances load between servers and proxies by including load estimators in site selection.
32How does SDP reduce congestion?
- Direct delivery reduces the number of store-and-forward object transfers.
- Using run-time probes for site selection favors less congested routes.
- Retrieving embedded objects from multiple sites helps distribute network traffic.
33How does SDP reduce response time?
- Discovery
- local lookup
- low overhead for metadata propagation
- servers piggyback metadata onto HTML files
- proxies piggyback metadata onto objects
- Delivery
- direct
- select cache site from estimates of response time
- retrieve embedded images concurrently from
multiple sites
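Concurrent retrieval of embedded images from several sites might look like the sketch below. The `fetch` function is a hypothetical stand-in that only sleeps and returns a string, so the example runs without a network; a thread pool is one possible mechanism and is not described in the slides.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(site, url):
    """Hypothetical stand-in for an HTTP GET to a cache site."""
    time.sleep(0.05)                       # pretend network delay
    return f"{url} from {site}"

def fetch_embedded(assignments):
    """Fetch every (site, url) pair concurrently; results keep the input order."""
    workers = max(1, len(assignments))     # ThreadPoolExecutor needs at least one worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda pair: fetch(*pair), assignments))

images = [("proxyA", "/a.gif"), ("proxyB", "/b.gif"), ("server", "/c.jpg")]
start = time.perf_counter()
results = fetch_embedded(images)
print(results, f"{time.perf_counter() - start:.2f} s")
```

With the fetches overlapping, total time is governed by the slowest site, not by the sum of all transfers.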
34Popularity Skew: UTSA-CS, UTSA-VIS
35 Object Size - Transfers: UTSA-CS, UTSA-VIS