Title: Simulation Evaluation of Web Caching Architectures
1Simulation Evaluation of Web Caching Architectures
- Carey Williamson
- Mudashiru Busari
- Department of Computer Science
- University of Saskatchewan
2Outline
- Introduction: Web Caching
- Proxy Workload Generator (ProWGen)
- Evaluation of Single-Level Caches
- Evaluation of Multi-Level Caches
- Conclusions and Future Work
- Questions?
3Introduction
- The Web is both a blessing and a curse
- Blessing
- Internet available to the masses
- Seamless exchange of information
- Curse
- Internet available to the masses
- Stress on networks, protocols, servers, users
- Motivation: techniques to improve the performance and scalability of the Web
4Why is the Web so slow?
- Three main possible reasons
- Client-side bottlenecks (PC, modem)
- Solution: better access technologies (TRLabs)
- Server-side bottlenecks (busy Web site)
- Solution: faster, scalable server designs
- Network bottlenecks (Internet congestion)
- Solutions: caching, replication, improved protocols for client-server communication
5What is a Web proxy cache?
- Intermediary between Web clients (browsers) and Web servers
- Controlled Internet access point for an institution or organization (e.g., firewall)
- Natural point for Web document caching
- Store local copies of popular documents
- Forward requests to servers only if needed
6Web Caching Proxy
[Figure: Web clients (C) inside a region or organization boundary reach Web servers on the Internet through a proxy]
7Some Technical Issues
- Size of cache
- Replacement policy when cache is full
- Cache coherence (Get-If-Modified)
- Some content is uncacheable
- Multi-cache coordination, peering (ICP)
- Security and privacy; hit metering
- Other issues...
8Our Previous Work
- Collaborative project with CANARIE, through the Advanced Networks Applications program (July 1998 to June 1999)
- Design and evaluation of Web caching strategies for Canada's CAnet II backbone (National Web Caching Infrastructure)
- For more information, see URL http://www.cs.usask.ca/faculty/carey/projects/nwci.html
9CAnet II Web Caching Hierarchy (Dec 1998)
10CAnet II Web Caching Hierarchy (Dec 1998)
[Figure: map of the caching hierarchy, marking the selected measurement points for our traffic analyses (3-6 months of data from each): USask and CANARIE (Ottawa), with a link to NLANR]
11Caching Hierarchy Overview
- Cache hit ratios (empirically observed) at each level of the hierarchy
- Top-Level/International proxy (20-50 GB): 5-10%
- National proxies (10-20 GB): 15-20%
- Regional/Univ. proxies (5-10 GB): 30-40%
- Clients (C) connect to the regional/university proxies
12NWCI Project Contributions
- Workload characterization and evaluation of CAnet II Web caching hierarchy (IEEE Network, May/June 2000)
- Developed Web proxy caching simulator for trace-driven simulation evaluation of Web proxy caching hierarchies
- Recommendations for CANARIE NWCI about configuration of future caches
13Overview of This Talk
- Constructed synthetic Web proxy workload generation tool (ProWGen) that captures the salient characteristics of empirical Web proxy workloads
- Use ProWGen to evaluate sensitivity of proxy caches to workload characteristics
- Use ProWGen to evaluate effectiveness of multi-level Web caching hierarchies (and cache management techniques)
14Research Methodology
- Design, construction, and parameterization of workload models
- Validation of ProWGen (statistically, and versus empirical workloads)
- Simulation evaluation of single cache
- Sensitivity to workload characteristics
- Different cache sizes, replacement policies
- Simulation evaluation of multi-level cache
- Sensitivity to workload characteristics
- Novel (heterogeneous) cache management policies
15Key Workload Characteristics
- One-timers (60-70% of documents; useless to cache!)
- Zipf-like document referencing popularity
- Heavy-tailed file size distribution (i.e., most files small, but most bytes are in big files)
- Correlations (if any) between document size and document popularity (debate!)
- Temporal locality (temporal correlation between recent past and near future references) [Mahanti et al. 2000]
16ProWGen Conceptual View
[Figure: input parameters (1, Z, a, c, L) feed the ProWGen software, which outputs a synthetic workload]
17ProWGen Conceptual View
[Figure: input parameters (1, Z, a, c, L) feed the ProWGen software, which outputs a synthetic workload; inset Zipf plot with axes P and r]
20ProWGen Workload Modeling Details
- Modeled workload characteristics
- One-time referencing
- Zipf-like referencing behaviour (Zipf's Law)
- File size distribution
- Body: lognormal distribution
- Tail: Pareto distribution
- Correlation between file size and popularity
- Temporal locality
- Static probabilities in finite-size LRU stack model
- Dynamic probabilities in finite-size LRU stack model
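A minimal sketch of two of the modeled characteristics, not ProWGen's actual code: Zipf-like popularity and a file size distribution with a lognormal body and a Pareto tail. The function names and every default parameter value (tail fraction, lognormal mu and sigma, tail start, tail index) are illustrative assumptions. One-timers, size-popularity correlation, and temporal locality are left out for brevity.

```python
import bisect
import itertools
import random


def zipf_cdf(num_docs, slope):
    """Cumulative distribution over ranks 1..num_docs with P(r) proportional to r**-slope."""
    weights = [rank ** -slope for rank in range(1, num_docs + 1)]
    total = sum(weights)
    return [c / total for c in itertools.accumulate(weights)]


def sample_size(rng, tail_fraction=0.07, mu=8.0, sigma=1.3,
                tail_start=10_000, tail_index=1.2):
    """File size in bytes: lognormal body, Pareto tail (all defaults are assumed values)."""
    if rng.random() < tail_fraction:
        # paretovariate returns values >= 1, so the tail starts at tail_start bytes
        return int(tail_start * rng.paretovariate(tail_index))
    return max(1, int(rng.lognormvariate(mu, sigma)))


def generate_workload(num_requests, num_docs, slope, seed=0):
    """Return a list of (doc_id, size) requests; doc_id 0 is the most popular document."""
    rng = random.Random(seed)
    cdf = zipf_cdf(num_docs, slope)
    # Sizes are drawn independently of popularity (zero correlation)
    sizes = [sample_size(rng) for _ in range(num_docs)]
    trace = []
    for _ in range(num_requests):
        # Inverse-CDF sampling; min() guards against floating-point round-off at the top
        doc = min(bisect.bisect_left(cdf, rng.random()), num_docs - 1)
        trace.append((doc, sizes[doc]))
    return trace
```

For example, `generate_workload(100_000, 10_000, 0.8)` would produce a trace whose popularity slope is near 0.8 for the most popular documents; a fixed seed makes the trace reproducible.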
21Validation of ProWGen
- To establish that the synthetic workloads possess
the desired characteristics (quantitative and
qualitative), and that the characteristics are
similar to those in empirical workloads
- Example: analyze 5 million requests from a proxy server trace and parameterize ProWGen to generate a similar workload
22Workload Synthesis
23Zipf-like Referencing Behaviour
Empirical trace: slope 0.81
Synthetic trace: slope 0.83
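One common way to obtain such a slope is a least-squares fit of log(reference count) against log(popularity rank). The sketch below assumes that method; the slides do not say how the slopes above were fitted, and `zipf_slope` is a hypothetical helper name.

```python
import math
from collections import Counter


def zipf_slope(doc_ids):
    """Least-squares slope of log(count) versus log(rank), reported as a positive number.

    Needs at least two distinct documents in doc_ids.
    """
    counts = sorted(Counter(doc_ids).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted line falls with rank, so negate to match the convention on the slide
    return -sxy / sxx
```

Applying the same function to the empirical and the synthetic trace gives two directly comparable numbers. In practice the fit is often restricted to the most popular documents, because the one-timers flatten the tail of the plot.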
24Transfer Size Distribution
25Research Questions: Single-Level Caches
- In a single-level proxy cache, how sensitive is Web proxy caching performance to certain workload characteristics (one-timers, Zipf-ness, heavy-tail index)?
- How does the degree of sensitivity change depending on the cache replacement policy?
26Simulation Model
[Figure: simulation model connecting Web clients to Web servers]
27Factors and Levels
- Cache size
- Cache Replacement Policy
- Recency-based: LRU
- Frequency-based: LFU-Aging
- Size-based: GD-Size
- Workload Characteristics
- One-timers, Zipf slope, tail index, correlation, temporal locality model
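To make two of the policies concrete, here is a minimal sketch of LRU and of GreedyDual-Size with a uniform cost of 1 per document. The class names and the `access` interface are assumptions for illustration, not the simulator used in this work; LFU-Aging is omitted, and the linear scan for the eviction victim is chosen for brevity over speed.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used document when space is needed."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.docs = OrderedDict()  # doc_id -> size, least recently used first

    def access(self, doc_id, size):
        """Return True on a hit; on a miss, insert the document if it can fit."""
        if doc_id in self.docs:
            self.docs.move_to_end(doc_id)
            return True
        if size <= self.capacity:
            while self.used + size > self.capacity:
                _, old_size = self.docs.popitem(last=False)
                self.used -= old_size
            self.docs[doc_id] = size
            self.used += size
        return False


class GDSizeCache:
    """GreedyDual-Size with cost 1: evicts the document with the lowest H = L + 1/size."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.inflation = 0.0  # the running value L, raised on every eviction
        self.docs = {}        # doc_id -> (size, H)

    def access(self, doc_id, size):
        """Return True on a hit; on a miss, insert the document if it can fit."""
        if doc_id in self.docs:
            stored_size = self.docs[doc_id][0]
            # A hit restores the document's H value relative to the current L
            self.docs[doc_id] = (stored_size, self.inflation + 1.0 / stored_size)
            return True
        if size <= self.capacity:
            while self.used + size > self.capacity:
                victim = min(self.docs, key=lambda d: self.docs[d][1])
                victim_size, victim_h = self.docs.pop(victim)
                self.inflation = victim_h
                self.used -= victim_size
            self.docs[doc_id] = (size, self.inflation + 1.0 / size)
            self.used += size
        return False
```

With a 10-byte cache and the request sequence big (8 bytes), small (2), big, new (2), LRU evicts the small document while GD-Size evicts the big one. This shows why a size-based policy tends to favour hit ratio over byte hit ratio.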
28Performance Metrics
- Cache hit ratio
- Percent of requested docs found in cache (HR)
- Percent of requested bytes found in cache (BHR)
- User response time
- Estimated analytically using request rates, cache
hit ratios, and (relative) cache miss penalties
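The two hit ratios follow directly from their definitions. The response time function is only a simplified assumption: it weights a relative miss penalty by the miss fraction and ignores the request-rate term mentioned above, so it is not the analytical model used in this work. Both function names are hypothetical.

```python
def hit_ratios(results):
    """results: iterable of (size_bytes, was_hit). Returns (HR, BHR) as percentages."""
    requests = hits = total_bytes = hit_bytes = 0
    for size, was_hit in results:
        requests += 1
        total_bytes += size
        if was_hit:
            hits += 1
            hit_bytes += size
    return 100.0 * hits / requests, 100.0 * hit_bytes / total_bytes


def mean_response_time(hit_ratio_pct, hit_time, miss_penalty):
    """Expected time per request: every request pays hit_time, misses also pay miss_penalty."""
    miss_fraction = 1.0 - hit_ratio_pct / 100.0
    return hit_time + miss_fraction * miss_penalty
```

For example, two hits on 100-byte documents and two misses on 900-byte documents give HR = 50% but BHR = 10%, which is why the two metrics are reported separately.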
29Simulation Results (Preview)
- Cache performance is very sensitive to
- Slope of Zipf-like doc referencing popularity
- Temporal locality property
- Correlations between size and popularity
- Cache performance relatively insensitive to
- Tail index of heavy-tailed file size distribution
- One-timers
30Sensitivity to One-timers (LRU)
(a) Hit Ratio
(b) Byte Hit Ratio
31Sensitivity to Zipf Slope (LRU)
Difference of 0.2 in Zipf slope impacts performance by as much as 10-15% in hit ratio and byte hit ratio
(a) Hit Ratio
(b) Byte Hit Ratio
32Sensitivity to Heavy Tail Index (LRU Replacement Policy)
(a) Hit Ratio
(b) Byte Hit Ratio
33Sensitivity to Heavy Tail Index (GD-Size Replacement Policy)
Difference of 0.2 in heavy tail index impacts performance by less than 3%
(a) Hit Ratio
(b) Byte Hit Ratio
34Sensitivity to Correlation (LRU)
(a) Hit Ratio
(b) Byte Hit Ratio
35Sensitivity to Temporal Locality (LRU)
(a) Hit Ratio
(b) Byte Hit Ratio
36Summary: Single-Level Caches
- Cache performance is sensitive to
- Slope of Zipf-like document referencing popularity
- Temporal locality
- Correlation between size and popularity
- Cache performance is insensitive to
- Tail index of heavy-tailed file size distribution
- One-timers
37Multi-Level Caching...
- Workload characteristics change as you move up the Web caching hierarchy (due to filtering effects, aggregation, etc.)
- Idea 1: Try different cache replacement policies at different levels of hierarchy
- Idea 2: Limit replication of cache content in overall hierarchy through partitioning (size, type, sharing, ...)
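The filtering effect can be illustrated with a small sketch: the misses of a lower-level cache become the request stream of the cache above it. The example assumes an LRU cache that holds a fixed number of documents rather than bytes, and `lru_miss_stream` is a hypothetical helper name.

```python
from collections import OrderedDict


def lru_miss_stream(requests, capacity):
    """Run requests through an LRU cache holding `capacity` documents; return the misses in order."""
    cache = OrderedDict()
    misses = []
    for doc in requests:
        if doc in cache:
            cache.move_to_end(doc)         # hit: absorbed by this level
        else:
            misses.append(doc)             # miss: forwarded to the next level up
            cache[doc] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used document
    return misses
```

For the request stream a, a, b, a, c, a, b and a capacity of 2, the forwarded stream is a, b, c, b. Document a makes up four of the seven requests at the lower level but only one of the four requests seen at the upper level, so the upper-level workload is far less skewed toward popular documents.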
38Research Questions: Multi-Level Caches
- In a multi-level caching hierarchy, can overall caching performance be improved by using different cache replacement policies at different levels of the hierarchy?
- In a multi-level caching hierarchy, can overall performance be improved by keeping disjoint document sets at each level of the hierarchy?
39Simulation Model
[Figure: simulation model connecting Web clients to Web servers]
40Experiment 1: Different Policies at Different Levels of the Hierarchy
(a) Hit Ratio
(b) Byte Hit Ratio
41Experiment 2: Shared Files at the Upper Level of the Hierarchy
42Experiment 3: Size-based Partitioning
- Partition files across the two levels based on sizes (e.g., keep small files at the lower level and large files at the upper level, or vice versa)
- Three size thresholds
- 5,000 bytes
- 10,000 bytes
- 100,000 bytes
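The partitioning rule itself can be sketched as a single decision function. The name `caching_level`, its parameters, and the choice to treat a document exactly at the threshold as small are assumptions for illustration.

```python
def caching_level(size_bytes, threshold_bytes, small_files_low=True):
    """Return "lower" or "upper": the only level allowed to store a document of this size.

    small_files_low=True keeps small files at the lower level and large files at the
    upper level; False reverses the assignment.
    """
    is_small = size_bytes <= threshold_bytes  # boundary handling is an assumption
    if is_small == small_files_low:
        return "lower"
    return "upper"


# Where a 7,500-byte document would be stored under each of the three thresholds
for threshold in (5_000, 10_000, 100_000):
    print(threshold, caching_level(7_500, threshold))
```

Because each document is stored at exactly one level, the two levels hold disjoint document sets, which is the point of the partitioning idea.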
43Small files at the lower level; large files at the upper level
Parent
Size threshold: 5,000 bytes
44Large files at the lower level; small files at the upper level
Size threshold: 5,000 bytes
45Summary: Multi-Level Caches
- Different policies at different levels
- LRU or LFU-Aging at the lower level with GD-Size at the upper level provided improvement in performance
- GD-Size at both levels provided better performance in hit ratio, but with some penalty in byte hit ratio
- Sharing-based approach
- No benefit compared to the other cases studied
- Size-threshold approach
- Small files at the lower level and large files at the upper level provided improvement in performance
- Reversing this policy offered no performance advantage
46Conclusions
- ProWGen is a valuable tool for the evaluation of Web proxy caching architectures, using synthetic workloads
- Existing multi-level caching hierarchies are not always that effective
- Heterogeneous caching architectures may better exploit workload characteristics and improve Web caching performance
47Future Work
- Extend ProWGen
- model response time
- model file size modifications
- Extend the multi-level experiments
- look into configurations where there is communication between the lower level proxies
- investigate configurations involving more levels and more lower level proxies
48For More Information...
- M. Busari, Simulation Evaluation of Web Caching Hierarchies, M.Sc. Thesis, June 2000
- Two papers available soon (under review)
- ProWGen tool is available now
- Email: carey_at_cs.usask.ca
- http://www.cs.usask.ca/faculty/carey/