Cache Storage For the Next Billion
1
Cache Storage For the Next Billion
  • Students: Anirudh Badam, Sunghwan Ihm
  • Research Scientist: KyoungSoo Park
  • Presenter: Vivek Pai
  • Collaborator: Larry Peterson

2
The Next Billion
  • Developing regions are not all alike
  • Many people have stable food, clean water,
    reasonable power
  • Connectivity, however, is bad
  • Growing middle class with a desire for education and technology
  • These people are the next billion

3
Bad Networking Options
  • Africa often backhauled through Europe
  • Satellite latency not fun
  • Ghana: 2Mbps, $6,000/month!
  • Emerging option: disk
  • 1TB disk now $200
  • Even latency better than satellite

4
Enter the Tiny Laptops
  • Problem: memory in the 256MB range

5
Making Storage Work
  • Populate disk with content
  • Preloaded HTTP cache
  • Preloaded WAN accelerator cache
  • Preloaded Web sites: Wikipedia, etc.
  • Ship disk to schools
  • Update as needed
  • Pull updates on-demand during peak
  • Push updates off-peak, overnight

6
Deployment Scenarios
  • Special servers per school
  • 2 for redundancy
  • Average school size: 100 students
  • @ $100/laptop, $10K/school
  • Problems:
  • 2 servers @ $5K doubles per-school cost
  • Servers don't ride laptop commodity curves
  • Solution: no servers, just laptops

7
Goal: 1TB Cache Store on a 256MB Laptop
  • Why caching?
  • Improves Web access
  • Improves WAN access
  • Problem:
  • Large disks are really slow
  • Disk storage requires index
  • In-memory indices optimize disk access

8
Memory Index Sizing
  • Squid: popular HTTP cache
  • 72 bytes/object
  • Web objects average 8KB each
  • 1TB = 125M objects
  • 125M objects = 9GB RAM just for the index
  • Commercial caches have better RAM usage
  • 32 bytes/object
  • 1TB disk = 4GB RAM
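The sizing above is simple arithmetic and worth checking (a sketch using decimal units, the 8KB average object size, and the per-object index costs quoted on this slide):

```python
# Index sizing for a 1TB cache, per the figures on this slide.
disk_bytes = 10**12              # 1TB (decimal)
avg_object = 8 * 10**3           # average Web object: 8KB

objects = disk_bytes // avg_object
print(objects)                   # 125000000 -> 125M objects

squid_index = objects * 72       # Squid: 72 bytes/object
commercial_index = objects * 32  # commercial caches: 32 bytes/object
print(squid_index / 10**9)       # 9.0 -> 9GB of RAM just for the index
print(commercial_index / 10**9)  # 4.0 -> 4GB of RAM
```

Either way, the index alone dwarfs the 256MB of RAM the target laptops have.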

9
Revisiting Cache Indexing
  • Seek reduction important
  • Most objects small
  • Access largely random
  • High insert rate
  • Assume hit rate is 50%
  • Assume cachable rate is 50%
  • Insert rate = 25% of request rate
  • High delete rate
  • Caches largely full
  • If insert rate is 25%, delete rate is 25%
  • Deletion using LRU, etc
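The rate reasoning can be written out (a sketch; "cachable rate" is read here as the fraction of misses that are cachable, which matches the 25% figure above):

```python
# Steady-state cache traffic, per the slide's assumptions.
hit_rate = 0.50        # half of requests are served from cache
cachable_rate = 0.50   # half of the remaining traffic is cachable

miss_rate = 1.0 - hit_rate
insert_rate = miss_rate * cachable_rate
print(insert_rate)     # 0.25 -> inserts run at 25% of the request rate

# A full cache must evict one object (LRU, etc.) per insert,
# so deletes also run at 25% of the request rate.
delete_rate = insert_rate
print(delete_rate)     # 0.25
```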

10
Restarting the Design
  • Eliminate in-memory index
  • Treat disk like memory
  • Optimize data structures for locality
  • Use location-sensitive algorithms
  • Measure performance
  • Now consider what to add
  • For each addition, measure performance

11
What This Yields
  • HashCache family
  • One basic storage engine
  • Pluggable indexing algorithms
  • HashCache proxy
  • Web proxy using HashCache engine
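A minimal sketch of the no-index idea behind the basic HashCache engine: hash the key straight to a fixed bin on disk, so a lookup costs one seek and no RAM. Everything here (class and helper names, one-block bins, overwrite-on-collision) is illustrative, not the real implementation:

```python
import hashlib, os, tempfile

BLOCK = 4096   # one disk block per bin
NBINS = 1024   # table size is fixed when the cache is formatted

def bin_offset(key: str) -> int:
    # Hash the key directly to a disk offset: no in-memory index at all.
    h = int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")
    return (h % NBINS) * BLOCK

class TinyHashCache:
    def __init__(self, path: str):
        self.f = open(path, "w+b")
        self.f.truncate(NBINS * BLOCK)         # preallocate the on-disk table

    def put(self, key: str, value: bytes) -> None:
        rec = key.encode() + b"\0" + value     # toy record: key NUL value
        assert len(rec) <= BLOCK
        self.f.seek(bin_offset(key))
        self.f.write(rec.ljust(BLOCK, b"\0"))  # one write; collisions overwrite

    def get(self, key: str):
        self.f.seek(bin_offset(key))           # one seek, one block read
        stored_key, _, rest = self.f.read(BLOCK).partition(b"\0")
        if stored_key == key.encode():         # verify the key: wrong owner -> miss
            return rest.rstrip(b"\0")          # toy: values must not end in NUL bytes
        return None

cache = TinyHashCache(os.path.join(tempfile.mkdtemp(), "cache.bin"))
cache.put("http://example.com/a", b"hello")
print(cache.get("http://example.com/a"))       # b'hello'
print(cache.get("http://example.com/b"))       # None
```

Treating a collision as a miss (or an eviction) trades some hit rate for a zero-byte memory footprint; richer variants add back small in-memory structures and measure what each buys.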

12
Performance Comparison
(chart omitted)
13
Index Bits Per Object
(chart: existing caches use roughly 240 and 576 index bits per object)
14
Index Bits Per Object
(chart: HashCache variants use 0, 11, 31, and 39 index bits per object, vs. 240 and 576 for existing caches)
15
HashCache Memory
(chart omitted)
16
Storage Limits w/2GB Index
(chart omitted)
17
Beyond Diminishing Returns
  • HTTP cachability has an upper limit
  • Beyond that, revalidating items helps
  • Revalidation on demand, or in the background
  • Uncached content still cachable
  • Wide-area accelerators
  • Must still contact servers, though
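Revalidation in this sense is a freshness check plus a conditional request; a 304 Not Modified reply lets the cache keep serving the stored body. A sketch follows, where the field and function names are illustrative, not a real cache's schema:

```python
import time

def needs_revalidation(entry: dict, now: float = None) -> bool:
    """True once the entry's freshness lifetime has expired."""
    now = time.time() if now is None else now
    return now - entry["fetched_at"] > entry["max_age"]

def conditional_headers(entry: dict) -> dict:
    """Headers for an on-demand (or background) revalidation request."""
    headers = {}
    if "etag" in entry:
        headers["If-None-Match"] = entry["etag"]
    if "last_modified" in entry:
        headers["If-Modified-Since"] = entry["last_modified"]
    return headers

entry = {"fetched_at": 1000.0, "max_age": 60, "etag": '"abc123"'}
print(needs_revalidation(entry, now=1100.0))   # True: 100s old, 60s lifetime
print(conditional_headers(entry))              # {'If-None-Match': '"abc123"'}
```

Background revalidation would run the same check off-peak over the whole store, so that peak-hour requests find entries already fresh.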

18
Why WAN Acceleration?
  • Lots of slowly-changing data
  • Wikipedia
  • News sites
  • Customized sites
  • WAN acceleration middleboxes
  • Custom protocol between boxes
  • Standard protocols to rest of net
  • Less desirable than caches for Web

19
WAN Acceleration Dilemma
  • WAN accelerators use chunks
  • Transit stream broken into chunks
  • Small chunks → high compression
  • Also lots of small objects
  • Large chunks → high performance
  • But worse for compression
  • Memory vs. disk tradeoff important

20
Merging WAN Acceleration and HashCache
  • Easily index huge chunks
  • Small chunks OK
  • Large chunks better
  • Store chunks redundantly
  • Optimize for performance and compression
  • Communicate tradeoffs to cache layer

21
Deployments
  • Two cache instances deployed
  • Both in Africa
  • Shared machines, multiple services
  • Working with OLPC on deployment
  • Working on licensing
  • Hopefully resolved this year
  • Goal: all-in-one server for schools

22
Longer Term Goals
  • Effort started around server consolidation
  • Virtualization nice, except for memory
  • Many apps very page-fault sensitive
  • Extracting/sharing components desirable
  • More work in developing regions
  • Even within the US: poor, rural, etc.
  • Customization for school-like workloads
  • More work on peak/off-peak behavior