Title: Simulation Evaluation of Web Caching Architectures


1
Simulation Evaluation of Web Caching Architectures
  • Carey Williamson
  • Mudashiru Busari
  • Department of Computer Science
  • University of Saskatchewan

2
Outline
  • Introduction: Web Caching
  • Proxy Workload Generator (ProWGen)
  • Evaluation of Single-Level Caches
  • Evaluation of Multi-Level Caches
  • Conclusions and Future Work
  • Questions?

3
Introduction
  • The Web is both a blessing and a curse
  • Blessing:
      • Internet available to the masses
      • Seamless exchange of information
  • Curse:
      • Internet available to the masses
      • Stress on networks, protocols, servers, and users
  • Motivation: techniques to improve the performance
    and scalability of the Web

4
Why is the Web so slow?
  • Three main possible reasons:
  • Client-side bottlenecks (PC, modem)
      • Solution: better access technologies (TRLabs)
  • Server-side bottlenecks (busy Web site)
      • Solution: faster, scalable server designs
  • Network bottlenecks (Internet congestion)
      • Solutions: caching, replication, and improved
        protocols for client-server communication

5
What is a Web proxy cache?
  • Intermediary between Web clients (browsers) and
    Web servers
  • Controlled Internet access point for an
    institution or organization (e.g., firewall)
  • Natural point for Web document caching:
      • Store local copies of popular documents
      • Forward requests to servers only if needed
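The hit/miss behaviour described above can be sketched in a few lines of Python. This is a minimal illustration, not a real proxy. It assumes an unbounded cache and uses a hypothetical `fetch_from_origin` callable in place of an HTTP request to the Web server. It ignores cache size, replacement, coherence, and uncacheable content.

```python
from typing import Callable, Dict


class ProxyCache:
    """Toy proxy cache: serve hits from local copies, forward misses to the server.

    `fetch_from_origin` is a hypothetical stand-in for an HTTP request;
    no size limit or replacement policy is modelled.
    """

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self.fetch_from_origin = fetch_from_origin
        self.store: Dict[str, bytes] = {}  # local copies, keyed by URL
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self.store:          # hit: answer from the local copy
            self.hits += 1
            return self.store[url]
        self.misses += 1               # miss: forward the request to the server
        body = self.fetch_from_origin(url)
        self.store[url] = body         # keep a copy for later requests
        return body


if __name__ == "__main__":
    origin_requests = []

    def fake_origin(url: str) -> bytes:
        origin_requests.append(url)
        return f"<html>{url}</html>".encode()

    proxy = ProxyCache(fake_origin)
    for u in ["/a", "/b", "/a", "/a"]:
        proxy.get(u)
    print(proxy.hits, proxy.misses, origin_requests)  # 2 2 ['/a', '/b']
```

Only the first request for each document reaches the server. Repeat requests are answered locally, which is where the savings in latency and network traffic come from.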

6
Web Caching Proxy
[Figure: Web clients (C) inside a region or organization boundary reach Web servers across the Internet through a proxy]
7
Some Technical Issues
  • Size of cache
  • Replacement policy when cache is full
  • Cache coherence (conditional GET, If-Modified-Since)
  • Some content is uncacheable
  • Multi-cache coordination, peering (ICP)
  • Security and privacy; hit metering
  • Other issues...

8
Our Previous Work
  • Collaborative project with CANARIE, through the
    Advanced Networks Applications program
    (July 1998 to June 1999)
  • Design and evaluation of Web caching strategies
    for Canada's CAnet II backbone (National Web
    Caching Infrastructure)
  • For more information, see
    http://www.cs.usask.ca/faculty/carey/projects/nwci.html

9
CAnet II Web Caching Hierarchy (Dec 1998)
10
CAnet II Web Caching Hierarchy (Dec 1998)
(selected measurement points for our traffic
analyses: 3-6 months of data from each)
[Figure: hierarchy diagram; labelled points include USask, CANARIE (Ottawa), and the link to NLANR]
11
Caching Hierarchy Overview
[Figure: proxy caching hierarchy above the Web clients (C), with empirically observed cache hit ratios per level]
  • Top-Level/International proxy (20-50 GB):
    5-10% hit ratio
  • National proxies (10-20 GB): 15-20% hit ratio
  • Regional/Univ. proxies (5-10 GB): 30-40% hit ratio
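The small worked calculation below shows how the per-level figures might combine. It rests on two assumptions that the slide does not state. First, each percentage is taken as the hit ratio for requests that reach that level, with misses forwarded upward. Second, the mid-point of each observed range is used.

```python
def overall_hit_ratio(level_hit_ratios):
    """Fraction of client requests satisfied somewhere in the hierarchy.

    Assumes each ratio applies to the requests that reach that level,
    i.e. misses at a lower level are forwarded to the next level up.
    """
    miss = 1.0
    for h in level_hit_ratios:
        miss *= 1.0 - h  # a request misses overall only if it misses at every level
    return 1.0 - miss


# Mid-points of the observed ranges: regional 35%, national 17.5%, international 7.5%
print(round(overall_hit_ratio([0.35, 0.175, 0.075]), 3))  # 0.504
```

Under these assumptions, about half of all client requests are served somewhere in the hierarchy. The two upper levels add roughly 15 percentage points to what the regional caches achieve alone.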
12
NWCI Project Contributions
  • Workload characterization and evaluation of
    CAnet II Web caching hierarchy (IEEE
    Network, May/June 2000)
  • Developed Web proxy caching simulator for
    trace-driven simulation evaluation of Web proxy
    caching hierarchies
  • Recommendations for CANARIE NWCI about
    configuration of future caches

13
Overview of This Talk
  • Constructed synthetic Web proxy workload
    generation tool (ProWGen) that captures the
    salient characteristics of empirical Web proxy
    workloads
  • Use ProWGen to evaluate sensitivity of proxy
    caches to workload characteristics
  • Use ProWGen to evaluate effectiveness of
    multi-level Web caching hierarchies (and
    cache management techniques)

14
Research Methodology
  • Design, construction, and parameterization of
    workload models
  • Validation of ProWGen (statistically, and versus
    empirical workloads)
  • Simulation evaluation of a single-level cache:
      • Sensitivity to workload characteristics
      • Different cache sizes, replacement policies
  • Simulation evaluation of multi-level caches:
      • Sensitivity to workload characteristics
      • Novel (heterogeneous) cache management policies

15
Key Workload Characteristics
  • One-timers (60-70%, useless for caching!)
  • Zipf-like document referencing popularity
  • Heavy-tailed file size distribution (i.e., most
    files are small, but most bytes are in big files)
  • Correlations (if any) between document size and
    document popularity (debate!)
  • Temporal locality (temporal correlation between
    recent past and near future references) [Mahanti
    et al. 2000]
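As a rough illustration of how the first two characteristics interact, the sketch below assigns reference counts to distinct documents. The popular documents follow a Zipf-like law, and the one-timers get exactly one reference each. The function name and all parameter values are hypothetical. This is not ProWGen's algorithm, only one simple way to produce the same two properties.

```python
def reference_counts(n_docs, one_timer_frac, zipf_slope, max_refs):
    """Hypothetical popularity model: number of references to each distinct document.

    The popular documents follow a Zipf-like law,
    count(r) ~ max_refs / r**zipf_slope (at least 2 references each);
    the remaining `one_timer_frac` of documents are referenced exactly once.
    """
    n_one = round(n_docs * one_timer_frac)
    n_multi = n_docs - n_one
    counts = [max(2, round(max_refs / r ** zipf_slope)) for r in range(1, n_multi + 1)]
    return counts + [1] * n_one


counts = reference_counts(n_docs=10_000, one_timer_frac=0.65, zipf_slope=0.8, max_refs=5_000)
one_timer_share = counts.count(1) / sum(counts)
print(f"one-timers: {counts.count(1)} documents, {one_timer_share:.1%} of requests")
```

A one-timer is never requested again, so its single reference is always a miss and any cache space it occupies is wasted. This is why a large fraction of one-timers places an upper bound on the achievable hit ratio.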

16
ProWGen Conceptual View
[Figure: input parameters (labelled 1, Z, a, c, L) feed the ProWGen software, which outputs a synthetic workload]
17
ProWGen Conceptual View
[Figure: input parameters (labelled 1, Z, a, c, L) feed the ProWGen software, which outputs a synthetic workload; an inset labelled "Zipf" plots P against r]
20
ProWGen Workload Modeling Details
  • Modeled workload characteristics:
  • One-time referencing
  • Zipf-like referencing behaviour (Zipf's Law)
  • File size distribution:
      • Body: lognormal distribution
      • Tail: Pareto distribution
  • Correlation between file size and popularity
  • Temporal locality:
      • Static probabilities in finite-size LRU stack
        model
      • Dynamic probabilities in finite-size LRU stack
        model
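The lognormal-body, Pareto-tail size model can be sketched with Python's standard `random` module (`lognormvariate` and `paretovariate`). The cutoff `k`, the tail probability, and the other parameter values below are illustrative assumptions, not ProWGen's settings. ProWGen may also splice the two pieces together differently.

```python
import random


def sample_file_size(rng, mu=8.0, sigma=1.5, k=10_000, alpha=1.2, tail_prob=0.2):
    """Draw one file size in bytes from a hybrid body/tail distribution.

    With probability `tail_prob` the size comes from a Pareto tail starting at
    k bytes with shape (tail index) alpha; otherwise it comes from a lognormal
    body(mu, sigma), redrawn until it falls below k. All values are illustrative.
    """
    if rng.random() < tail_prob:
        return int(k * rng.paretovariate(alpha))  # Pareto tail: size >= k
    while True:
        size = rng.lognormvariate(mu, sigma)      # lognormal body
        if size < k:
            return max(1, int(size))


rng = random.Random(42)
sizes = [sample_file_size(rng) for _ in range(100_000)]
big = [s for s in sizes if s >= 10_000]
print(f"{len(big) / len(sizes):.1%} of files, {sum(big) / sum(sizes):.1%} of bytes")
```

With a tail index close to 1, the few files above the cutoff typically account for most of the bytes. This matches the "most files small, most bytes in big files" property noted earlier.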

21
Validation of ProWGen
  • To establish that the synthetic workloads possess
    the desired characteristics (quantitative and
    qualitative), and that the characteristics are
    similar to those in empirical workloads
  • Example: analyze 5 million requests from a proxy
    server trace and parameterize ProWGen to generate
    a similar workload

22
Workload Synthesis
23
Zipf-like Referencing Behaviour
Empirical trace: slope 0.81
Synthetic trace: slope 0.83
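Slope values like these come from a rank-frequency plot on log-log axes. One common estimator is a least-squares line through log10(reference count) versus log10(rank), sketched below. The slide does not say whether the authors fit all ranks or only the most popular documents. That choice, especially how one-timers are treated, affects the estimate.

```python
import math
from collections import Counter


def zipf_slope(requests):
    """Estimate the Zipf-like slope of a request trace (needs >= 2 distinct documents).

    Documents are ranked by reference count; log10(count) is regressed on
    log10(rank) by ordinary least squares. The fitted slope is negative,
    so its magnitude is returned.
    """
    counts = sorted(Counter(requests).values(), reverse=True)
    xs = [math.log10(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log10(c) for c in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# Usage: a trace in which document r is requested round(1000 / r**0.8) times
trace = [f"doc{r}" for r in range(1, 201) for _ in range(round(1000 / r ** 0.8))]
print(round(zipf_slope(trace), 2))
```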
24
Transfer Size Distribution
25
Research Questions: Single-Level Caches
  • In a single-level proxy cache, how sensitive is
    Web proxy caching performance to certain workload
    characteristics (one-timers, Zipf-ness,
    heavy-tail index)?
  • How does the degree of sensitivity change
    depending on the cache replacement policy?

26
Simulation Model
[Figure: simulation model connecting Web clients and Web servers]
27
Factors and Levels
  • Cache size
  • Cache replacement policy:
      • Recency-based: LRU
      • Frequency-based: LFU-Aging
      • Size-based: GD-Size
  • Workload characteristics:
      • One-timers, Zipf slope, tail index, correlation,
        temporal locality model
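GD-Size refers to the GreedyDual-Size replacement algorithm. Each cached document gets a priority H = L + cost/size, and the lowest-priority document is evicted first. The inflation value L then rises to the evicted priority, so documents not referenced recently lose out over time. The sketch below assumes a cost of 1 per document, the variant aimed at hit ratio. The slides do not say which cost function was used in these experiments.

```python
class GDSizeCache:
    """GreedyDual-Size sketch with a uniform cost of 1 per document (assumed)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.L = 0.0     # inflation value: priority of the most recent victim
        self.H = {}      # url -> priority H = L + cost / size
        self.size = {}   # url -> size in bytes

    def request(self, url, size, cost=1.0):
        """Return True on a hit; on a miss, cache the document if it fits and return False."""
        if url in self.H:
            # Hit: restore the document's priority relative to the current L.
            self.H[url] = self.L + cost / self.size[url]
            return True
        if size > self.capacity:
            return False  # larger than the whole cache: never stored
        while self.used + size > self.capacity:
            # Evict the lowest-priority document; L inflates to its priority.
            victim = min(self.H, key=self.H.get)
            self.L = self.H.pop(victim)
            self.used -= self.size.pop(victim)
        self.H[url] = self.L + cost / size
        self.size[url] = size
        self.used += size
        return False
```

Small documents get large cost/size values, so this policy tends to keep many small files. That favours document hit ratio over byte hit ratio. The linear scan in `min` keeps the sketch short; a priority queue would be the usual choice for large caches.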

28
Performance Metrics
  • Cache hit ratio:
      • Percent of requested docs found in cache (HR)
      • Percent of requested bytes found in cache (BHR)
  • User response time:
      • Estimated analytically using request rates, cache
        hit ratios, and (relative) cache miss penalties
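The two hit ratios and a simplified response-time estimate can be computed as follows. The response-time formula is a per-request expected value that assumes fixed, relative costs for hits and misses. The analytic model on the slide also uses request rates, which this sketch omits.

```python
def hit_ratios(results):
    """results: list of (size_bytes, was_hit) pairs, one per request.

    Returns (HR, BHR): the fraction of requests and the fraction of
    requested bytes that were served from the cache.
    """
    total_bytes = sum(size for size, _ in results)
    hit_reqs = sum(1 for _, hit in results if hit)
    hit_bytes = sum(size for size, hit in results if hit)
    return hit_reqs / len(results), hit_bytes / total_bytes


def mean_response_time(hit_ratio, t_hit, t_miss):
    """Expected per-request time when hits cost t_hit and misses cost t_miss."""
    return hit_ratio * t_hit + (1.0 - hit_ratio) * t_miss


results = [(1000, True), (1000, True), (8000, False), (1000, False)]
hr, bhr = hit_ratios(results)
print(hr, round(bhr, 3))                               # 0.5 0.182
print(mean_response_time(hr, t_hit=1.0, t_miss=10.0))  # 5.5
```

In this toy trace, the single miss on a large file pulls the byte hit ratio well below the document hit ratio.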

29
Simulation Results (Preview)
  • Cache performance is very sensitive to:
      • Slope of Zipf-like doc referencing popularity
      • Temporal locality property
      • Correlations between size and popularity
  • Cache performance is relatively insensitive to:
      • Tail index of heavy-tailed file size distribution
      • One-timers

30
Sensitivity to One-timers (LRU)
(a) Hit Ratio
(b) Byte Hit Ratio
31
Sensitivity to Zipf Slope (LRU)
A difference of 0.2 in Zipf slope affects
performance by as much as 10-15% in hit ratio
and byte hit ratio
(a) Hit Ratio
(b) Byte Hit Ratio
32
Sensitivity to Heavy Tail Index (LRU Replacement Policy)
(a) Hit Ratio
(b) Byte Hit Ratio
33
Sensitivity to Heavy Tail Index (GD-Size Replacement Policy)
A difference of 0.2 in heavy tail index affects
performance by less than 3%
(a) Hit Ratio
(b) Byte Hit Ratio
34
Sensitivity to Correlation (LRU)
(a) Hit Ratio
(b) Byte Hit Ratio
35
Sensitivity to Temporal Locality (LRU)
(a) Hit Ratio
(b) Byte Hit Ratio
36
Summary: Single-Level Caches
  • Cache performance is sensitive to:
      • Slope of Zipf-like document referencing
        popularity
      • Temporal locality
      • Correlation between size and popularity
  • Cache performance is insensitive to:
      • Tail index of heavy-tailed file size
        distribution
      • One-timers

37
Multi-Level Caching...
  • Workload characteristics change as you move up
    the Web caching hierarchy (due to filtering
    effects, aggregation, etc.)
  • Idea 1: try different cache replacement policies
    at different levels of the hierarchy
  • Idea 2: limit replication of cache content in the
    overall hierarchy through partitioning (size,
    type, sharing, etc.)

38
Research Questions: Multi-Level Caches
  • In a multi-level caching hierarchy, can overall
    caching performance be improved by using
    different cache replacement policies at different
    levels of the hierarchy?
  • In a multi-level caching hierarchy, can overall
    performance be improved by keeping disjoint
    document sets at each level of the hierarchy?

39
Simulation Model
[Figure: multi-level simulation model connecting Web clients and Web servers]
40
Experiment 1: Different Policies at Different Levels of the Hierarchy
(a) Hit Ratio
(b) Byte Hit Ratio
41
Experiment 2: Shared Files at the Upper Level of the Hierarchy
42
Experiment 3: Size-based Partitioning
  • Partition files across the two levels based on
    size (e.g., keep small files at the lower level
    and large files at the upper level, or vice
    versa)
  • Three size thresholds:
      • 5,000 bytes
      • 10,000 bytes
      • 100,000 bytes
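Below is a minimal sketch of the small-files-low, large-files-high configuration. It assumes one lower-level (child) cache, one upper-level (parent) cache, LRU replacement at both levels, and a strict `size < threshold` test. The class names and capacities are hypothetical. Each file is only ever cached at one level, so the two caches hold disjoint document sets, which is the aim of the partitioning.

```python
from collections import OrderedDict


class LRUCache:
    """Byte-capacity LRU cache; entries are ordered least recently used first."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.docs = OrderedDict()  # url -> size in bytes

    def lookup(self, url):
        if url in self.docs:
            self.docs.move_to_end(url)  # mark as most recently used
            return True
        return False

    def insert(self, url, size):
        if size > self.capacity:
            return  # never cache a file larger than the whole cache
        while self.used + size > self.capacity:
            _, old_size = self.docs.popitem(last=False)  # evict the LRU entry
            self.used -= old_size
        self.docs[url] = size
        self.used += size


class SizePartitionedHierarchy:
    """Hypothetical two-level hierarchy: the child caches only files smaller than
    `threshold`; the parent caches only files of at least `threshold` bytes."""

    def __init__(self, child_bytes, parent_bytes, threshold=5_000):
        self.child = LRUCache(child_bytes)
        self.parent = LRUCache(parent_bytes)
        self.threshold = threshold

    def request(self, url, size):
        """Return 'child', 'parent', or 'server': where the request is satisfied."""
        level = self.child if size < self.threshold else self.parent
        if level.lookup(url):
            return "child" if level is self.child else "parent"
        level.insert(url, size)  # only the responsible level keeps a copy
        return "server"
```

Swapping the two branches of the size test gives the reversed configuration (large files low, small files high).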

43
Small files at the lower level, large files at
the upper level
[Figure; label: Parent]
Size threshold: 5,000 bytes
44
Large files at the lower level, small files at
the upper level
Size threshold: 5,000 bytes
45
Summary: Multi-Level Caches
  • Different policies at different levels:
      • LRU/LFU-Aging at the lower level with GD-Size at
        the upper level provided improvement in performance
      • GD-Size at both levels provided better performance
        in hit ratio, but with some penalty in byte hit ratio
  • Sharing-based approach:
      • no benefit compared to the other cases studied
  • Size-threshold approach:
      • small files at the lower level with large files at
        the upper level provided improvement in
        performance
      • reversing this policy offered no performance advantage

46
Conclusions
  • ProWGen is a valuable tool for the evaluation of
    Web proxy caching architectures, using synthetic
    workloads
  • Existing multi-level caching hierarchies are not
    always that effective
  • Heterogeneous caching architectures may better
    exploit workload characteristics and improve Web
    caching performance

47
Future Work
  • Extend ProWGen:
      • model response time
      • model file size modifications
  • Extend the multi-level experiments:
      • look into configurations where there is
        communication between the lower-level proxies
      • investigate configurations involving more levels
        and more lower-level proxies

48
For More Information...
  • M. Busari, "Simulation Evaluation of Web Caching
    Hierarchies", M.Sc. Thesis, June 2000
  • Two papers available soon (under review)
  • ProWGen tool is available now
  • Email: carey_at_cs.usask.ca
  • http://www.cs.usask.ca/faculty/carey/