Title: A Hybrid Caching Strategy for Streaming Media Files
- Jussara M. Almeida and Mary K. Vernon, University of Wisconsin-Madison
- Derek L. Eager, University of Saskatchewan
- November 2001
2. Outline
- Characteristics of Streaming Media (SM) files
- Delivery of SM files
- Hypothesis and Assumptions
- Previous Caching Policies
- New Policy Performance Comparison
- New Caching Policies
- Conclusions and Future Work
3. Characteristics of SM Files
- Large file size
  - cache on disk
- Sustained I/O bandwidth
  - inserting and reading new content
- Clients access partial files
  - initial portion
  - favored segment
  - base + variable number of layers of layered encoding
4. Delivery of SM Files
- Unicast streaming
  - server bandwidth is linear in client request rate
  - goal: maximize byte hit ratio
- Multicast streaming
  - save bandwidth
  - cost sharing introduces new tradeoffs
5. Caching for Multicast Streams: Tradeoffs
- Example:
  - 10 distributed proxy servers, each serving a local region
  - 100 requests (on avg) arrive per region during a given popular video
  - need 7 streams per region, or 12 streams at the remote server
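To make the tradeoff in this example concrete, the short tally below compares the two options. It assumes, as one reading of the slide, that the 12 remote-server streams are multicast to all 10 regions together, while the 7 streams per region are needed at every proxy if the video is cached locally. The constant names are illustrative only.

```python
# Figures taken from the example; the interpretation is an assumption.
NUM_REGIONS = 10
STREAMS_PER_REGION_IF_CACHED = 7   # each proxy serves its own ~100 requests
STREAMS_AT_REMOTE_SERVER = 12      # one server multicasts to all regions

# Total streams across all proxies if every proxy caches the video.
total_proxy_streams = NUM_REGIONS * STREAMS_PER_REGION_IF_CACHED

print("streams if cached at every proxy:", total_proxy_streams)
print("streams if served remotely:", STREAMS_AT_REMOTE_SERVER)
```

Under this reading, caching removes load from the remote server and network but uses more streams in total, which is the cost-sharing tradeoff the slide points to.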
6. Caching for Multicast Streams: Tradeoffs
- Caching popular content reduces the load on the remote server and network
- Delivering popular content from the remote server amortizes the cost of a stream over more clients
- Earlier portions of a popular video require more bandwidth and have less cost-sharing than later portions
7. New Caching Policies Research
- Hypothesis: popularity-based strategy will outperform replacement-based strategy
  - a significant fraction of requests to uncached files may be for files that are accessed very sporadically
- Assumptions
  - limited disk space implies limited disk bandwidth
  - proxy bandwidth for delivering cached streams is equal to the min of proxy disk bw and proxy network bw (call this proxy disk bandwidth)
8. Current Web Caching Policies
- Replacement based (cache on each miss)
- Top replacement candidate is an ad-hoc combination of
  - large files
  - least recently accessed or lower access frequency
  - miss penalty (server latency, bandwidth)
- Cache whole file or none
- Unicast
- Ignore limited disk bandwidth
9. Previous SM Caching Policies
- Interval Caching [DaSi93, KaRT95]
- Resource Based Caching (RBC) [TVDS98]
- Least Frequently Used (LFU)
- Block-based insertion and deletion [AcSm00]
- Popularity-based caching for layered encoding [RYHE00]
- Prefix and Segment Caching for smoothing [SeRT99, WZDS98]
10. Interval Caching
- Cache smallest intervals
- Target memory caches (lots of insertions)
[Figure: file f]
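A minimal sketch of the "cache smallest intervals" rule, under stated assumptions: an interval is taken to be the span of data between two consecutive active streams of the same file, sizes are measured in seconds of video (which is proportional to bytes only if the delivery rate is uniform), and the names `Interval`, `intervals_for`, and `select_intervals` are hypothetical. It illustrates the general idea rather than any particular published implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    file_id: str
    leader_pos: float    # playback position (s) of the preceding stream
    follower_pos: float  # playback position (s) of the following stream

    @property
    def size(self) -> float:
        # Amount of data that must stay cached to serve the follower.
        return self.leader_pos - self.follower_pos


def intervals_for(file_id, positions):
    """Build intervals between consecutive active streams of one file."""
    ordered = sorted(positions, reverse=True)
    return [Interval(file_id, a, b) for a, b in zip(ordered, ordered[1:])]


def select_intervals(intervals, capacity):
    """Greedily cache the smallest intervals until capacity is exhausted."""
    chosen, used = [], 0.0
    for iv in sorted(intervals, key=lambda iv: iv.size):
        if used + iv.size > capacity:
            break
        chosen.append(iv)
        used += iv.size
    return chosen


# Current playback positions (in seconds) of the active streams per file.
active = {"f": [600.0, 580.0, 300.0], "g": [120.0, 90.0]}
all_intervals = [iv for fid, pos in active.items() for iv in intervals_for(fid, pos)]
cached = select_intervals(all_intervals, capacity=60.0)
print([(iv.file_id, iv.size) for iv in cached])
```

Because streams keep arriving and finishing, the set of intervals changes constantly, which is why the slide notes that this policy implies many insertions and targets memory caches.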
11. Resource Based Caching
- Cache entire files and intervals/runs
- Goal: efficiently utilize the limited resource
  - limited space: cache smallest space requirement
  - limited bandwidth: cache smallest write overhead
- Pre-allocate bandwidth to each cached entity
- Complex algorithm
- Complex implementation
- High time complexity
12. RBC Algorithm
13. Least Frequently Used
- Different implementation options
  - What to do when receiving the first access to an object?
  - How to estimate frequency?
- Version studied: Currently Most Popular (CMP)
  - Insert only the most frequently accessed (file or segment)
  - On-line popularity estimate: future research
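The CMP idea can be sketched as follows, assuming access frequencies are known in advance and whole files are cached. The function names, the rule of stopping at the first file that does not fit, and the byte hit ratio helper (which assumes ample disk bandwidth and full-file playback) are illustrative assumptions, not the authors' implementation.

```python
def cmp_select(access_freq, sizes, capacity):
    """Return ids of the most frequently accessed files that fit in the cache."""
    chosen, used = [], 0.0
    # Visit files from most to least popular.
    for fid in sorted(access_freq, key=access_freq.get, reverse=True):
        if used + sizes[fid] > capacity:
            break  # assumption: stop rather than skip to a less popular file
        chosen.append(fid)
        used += sizes[fid]
    return chosen


def byte_hit_ratio(cached, access_freq, sizes):
    """Fraction of requested bytes served from cache (bandwidth not limiting)."""
    total = sum(access_freq[f] * sizes[f] for f in access_freq)
    hit = sum(access_freq[f] * sizes[f] for f in cached)
    return hit / total


freq = {"a": 50, "b": 30, "c": 15, "d": 5}        # requests per unit time
size = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}   # uniform file size
cached_files = cmp_select(freq, size, capacity=2.0)
ratio = byte_hit_ratio(cached_files, freq, size)
print(cached_files, round(ratio, 2))
```

With static popularities the cache contents never change after the initial fill, which is what makes this policy cheap compared with replacement on every miss.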
14. Previous Comparison: RBC vs. CMP [TVDS98]
- Fixed file access frequencies
- RBC outperforms CMP for all parameter values studied
- Limited design space
- e.g. total cache size ? 16GB
- Inconsistent results
15. New Performance Comparison
- Re-evaluate byte hit ratio of CMP and RBC
- Simulation with synthetic workload
- Broad design space
- New Pooled RBC
- New simple hybrid CMP/interval caching (CMP/IC)
policy
16. System Assumptions
- Arrivals: Poisson(λ)
  - extra experiments with Pareto(α, k)
- File access frequency: Zipf(θ)
- Perfect file popularity
  - extra experiments with approximate file popularity
- Uniform file size and delivery rate
  - extra experiments with variable file size and delivery rate
- Load balanced across multiple disks
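A synthetic workload matching these assumptions could be generated roughly as below. This is a sketch under stated assumptions: the Zipf form used here, with probability proportional to 1 / i^(1 - theta) for the file of rank i, is one common convention in the video-caching literature and is not confirmed by the slides. The names `zipf_probabilities`, `generate_requests`, `theta`, and `lam` are hypothetical.

```python
import random


def zipf_probabilities(n, theta):
    """Assumed form: p_i proportional to 1 / i**(1 - theta), for i = 1..n."""
    weights = [1.0 / (i ** (1.0 - theta)) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def generate_requests(n, theta, lam, horizon, seed=0):
    """Yield (arrival_time, file_rank) pairs for Poisson arrivals of rate lam."""
    rng = random.Random(seed)
    probs = zipf_probabilities(n, theta)
    ranks = range(1, n + 1)
    # Poisson process: inter-arrival times are exponential with rate lam.
    t = rng.expovariate(lam)
    while t < horizon:
        yield t, rng.choices(ranks, weights=probs)[0]
        t += rng.expovariate(lam)


# Time is measured in units of one file duration, so lam plays the role of
# the number of requests per file duration.
requests = list(generate_requests(n=100, theta=0.0, lam=450.0, horizon=1.0))
print(len(requests), requests[:3])
```

Fixing the seed keeps runs reproducible, which helps when comparing caching policies on the same request trace.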
17. System Parameters
- n: number of files
- θ: Zipf parameter
- N: arrival rate (avg. number of requests per avg. file duration T)
  - N = λ × T
- C: cache size (fraction of media data accessed)
18. System Parameters
- B: normalized disk bandwidth (fraction of the average number of simultaneous streams needed to deliver data that is cached by CMP)
  - B depends on N, θ, n, C and disk technology
- Relative performance of policies depends mainly on B
  - B = 1.0: CMP system is bandwidth balanced
  - B < 1.0: CMP system is bandwidth deficient
  - B > 1.0: CMP system is bandwidth abundant
19. Normalized Disk Bandwidth (B): Example
- Ultrastar 72ZX disk
  - disk space: 116.76 hours of MPEG-1 video (73.4 GB)
  - disk bandwidth: 108 MPEG-1 streams (22-37 MB/s)
- Assume 100 requests / hour for cached files
- If cache contains 2-hour movies
  - need 200 streams
  - B = 108/200 = 0.54
- If cache contains 30-minute TV shows
  - need 50 streams for cache content
  - B = 108/50 = 2.16
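The arithmetic in this example can be reproduced with a few lines. The sketch assumes the number of simultaneous streams needed equals request rate times file duration (Little's law), which matches the figures given for this example. The function names are hypothetical.

```python
def streams_needed(requests_per_hour, duration_hours):
    """Average number of simultaneous streams: arrival rate x duration."""
    return requests_per_hour * duration_hours


def normalized_disk_bandwidth(disk_streams, requests_per_hour, duration_hours):
    """B = streams the disk can sustain / streams needed for cached content."""
    return disk_streams / streams_needed(requests_per_hour, duration_hours)


def classify(b):
    """Label the system as on the slide: balanced, deficient, or abundant."""
    if b < 1.0:
        return "bandwidth deficient"
    if b > 1.0:
        return "bandwidth abundant"
    return "bandwidth balanced"


DISK_STREAMS = 108  # MPEG-1 streams sustained by the example disk

b_movies = normalized_disk_bandwidth(DISK_STREAMS, 100, 2.0)  # 2-hour movies
b_shows = normalized_disk_bandwidth(DISK_STREAMS, 100, 0.5)   # 30-minute shows

print(round(b_movies, 2), classify(b_movies))
print(round(b_shows, 2), classify(b_shows))
```

The same disk is bandwidth deficient for long movies and bandwidth abundant for short shows, which is why B rather than raw disk capacity is used to compare policies.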
20. RBC vs. CMP
N = 450, n = 100, θ = 0
- CMP outperforms RBC if B ? 1.0
- RBC slightly outperforms CMP if B ? 1.0 and
small caches
21. Files Cached by RBC
- Average fraction of each file cached by RBC (N = 450, n = 100, C = 0.25)
[Figure panels: B = 0.75, B = 1.0, B = 2.0]
22. Space and Bandwidth Utilization
[Figure panels: B = 0.75, B = 1.0, B = 2.0]
23. Pooled RBC
- Three improvements over RBC
  - simpler rule to select entity to cache
  - can keep cached intervals when deleting a full file
  - pool of pre-allocated bandwidth
- Similar complexity to RBC
24. Pooled RBC, RBC and LFU
N = 450, n = 100, θ = 0
- Pooled RBC ? CMP
- BUT, Pooled RBC is much more complex than CMP
25. Hybrid CMP/IC Policies
- Do interval caching on a separate (small) cache
- Interval cache in main memory: CMP/ICmem and Pooled RBC/ICmem
- Interval cache on disk: CMP/ICdisk
  - e.g. 5% of disk cache
26. CMP/ICmem vs. Pooled RBC/ICmem
N = 450, n = 100, θ = 0
- Memory cache improves CMP and Pooled RBC
- B ? 1.0 greater improvement for CMP
27. CMP/ICdisk vs. Pooled RBC
N = 450, n = 100, θ = 0
- CMP/ICdisk ? Pooled RBC ? CMP
28. Conclusions
- Simple CMP
  - simple to implement
  - performance similar to Pooled RBC, CMP/ICdisk (static file popularities)
- Hybrid CMP/IC policy
  - Performance ? Pooled RBC
  - simple to implement
  - possibly more robust (imperfect and dynamic popularity measures)
29. Future Work
- Develop on-line estimate of file popularity
- Server log analysis
  - client behavior and workloads (NOSSDAV01 paper)
  - More logs!!!!
- Caching policies for multicast streams
  - a popular file has greater cost-sharing if not cached
  - determine cache content that minimizes per-client cost
  - caching principles / on-line policy
  - (coming up soon)
- Prototype, experimental (live) workloads