Web Caching Dr. Yingwu Zhu – PowerPoint PPT presentation

Provided by: Ying139

Learn more at: http://fac-staff.seattleu.edu

1
Web Caching

Dr. Yingwu Zhu

2
What is Web Caching

Introducing proxy servers at certain points in
the network that cache Web documents
for faster client access.
Comparable to the cache memory in a computer
system

3
Proxy Cache
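
[Figure: clients send requests (Req.) to the proxy, which passes them on to the servers; replies (Reply) return to the clients through the proxy.]
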
4
How?

Clients send requests to the proxy.
If the requested document is in its cache, the
proxy serves the request from its cache.
Otherwise, the proxy forwards the request to the
server.
The server replies to the request through the proxy
(the proxy keeps a copy of the requested document).
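
The request flow on this slide can be sketched in a few lines of Python. This is a minimal illustration, not a real proxy: `ProxyCache`, `fake_origin`, and the in-memory dictionary are hypothetical names introduced here, and expiry is ignored.

```python
from typing import Callable, Dict


class ProxyCache:
    """Toy proxy: serve from the cache on a hit, else fetch and keep a copy."""

    def __init__(self, fetch_from_server: Callable[[str], bytes]) -> None:
        self.fetch_from_server = fetch_from_server  # hypothetical origin fetcher
        self.store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self.store:  # document is in the cache
            self.hits += 1
            return self.store[url]
        self.misses += 1  # otherwise forward the request to the server
        body = self.fetch_from_server(url)
        self.store[url] = body  # keep a copy for later requests
        return body


def fake_origin(url: str) -> bytes:
    """Stand-in for the real web server (assumption for the example)."""
    return f"<html>content of {url}</html>".encode()


proxy = ProxyCache(fake_origin)
first = proxy.get("/pub/www/index.html")   # miss: fetched from the origin
second = proxy.get("/pub/www/index.html")  # hit: served from the cache
print(proxy.hits, proxy.misses)
```

The second request never reaches `fake_origin`, which is where the latency and bandwidth savings come from.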

5
Why Web Caching?

HTTP traffic has grown rapidly to form the largest
part of Internet traffic, causing more
network congestion and server unavailability.
The number of static Web pages almost doubles
every year
Some old data
Number of unique pages: 800M < X < 2.2B
Number of unique web sites: 8,500,000
Static pages: 30 - 40%
Pages revisited: 80%
Expected hit rate: 24 - 32%
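
Reading the figures above as percentages, the expected hit rate matches the static-page share multiplied by the revisit share. The slide does not say how the number was derived, so the multiplication below is an assumed reading, and `expected_hit_rate` is a name introduced for this sketch.

```python
def expected_hit_rate(static_fraction: float, revisit_fraction: float) -> float:
    """Assumed model: only static pages are cacheable, and only revisits can hit."""
    return static_fraction * revisit_fraction


low = expected_hit_rate(0.30, 0.80)
high = expected_hit_rate(0.40, 0.80)
print(f"expected hit rate: {low:.0%} - {high:.0%}")
```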

6
Why Web Caching?

Bandwidth
Latency
Performance: response time
Server load
Failure: redundancy

7
Expected Gains

Bandwidth saving
Improving content availability.
Improving web server availability.
Server load balancing.
Reducing user-perceived latency

8
What Content and Protocols

HTTP 1.0: basic protocol
Send a request using one of a fixed number of verbs
GET
HEAD
POST
Receive response, meta-data, content

9
What Content and Protocols

HTTP Request
Request        = Simple-Request | Full-Request
Simple-Request = "GET" SP Request-URI CRLF
Full-Request   = Request-Line
                 *( General-Header
                  | Request-Header
                  | Entity-Header )
                 CRLF
                 [ Entity-Body ]

10
What Content and Protocols

Example:
GET /pub/www/index.html HTTP/1.0
Response:
HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Date: Sat, 19 Oct 2002 05:46:53 GMT
Expires: Sun, 20 Oct 2002 16:00:00 GMT
Content-Length: 2291
Content-Type: text/html
Cache-control: private

11
What Content and Protocols

Example: If-Modified-Since
GET /pub/www/index.html HTTP/1.0
If-Modified-Since: Sat, 19 Oct 2002 19:43:31 GMT
Response:
HTTP/1.1 200 OK
Server: Microsoft-IIS/5.0
Date: Thu, 13 Jul 2000 05:46:53 GMT
Expires: Sun, 20 Oct 2002 16:00:00 GMT
Content-Length: 2291
Content-Type: text/html
Cache-control: private

12
What Content and Protocols

Example: If-Modified-Since
GET /pub/www/index.html HTTP/1.0
If-Modified-Since: Sat, 19 Oct 2002 19:43:31 GMT
Response:
HTTP/1.1 304 Not Modified
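
The two If-Modified-Since outcomes (200 with a body, or 304 without one) can be sketched from the server's side as follows. This is a simplified illustration using only the Python standard library. `conditional_get` and the sample dates are assumptions made for the example, not any particular server's implementation.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional, Tuple


def conditional_get(
    last_modified: datetime, if_modified_since: Optional[str], body: bytes
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET that may carry If-Modified-Since."""
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    since = None
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: treat the request as unconditional
        if since is not None and since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
    if since is not None and last_modified <= since:
        return 304, headers, b""  # Not Modified: the cached copy is still good
    headers["Content-Length"] = str(len(body))
    return 200, headers, body


page = b"<html>hello</html>"
ims = "Sat, 19 Oct 2002 19:43:31 GMT"
older = datetime(2002, 10, 19, 12, 0, 0, tzinfo=timezone.utc)  # changed before IMS date
newer = datetime(2002, 10, 20, 8, 0, 0, tzinfo=timezone.utc)   # changed after IMS date

status_unchanged, _, _ = conditional_get(older, ims, page)
status_changed, _, _ = conditional_get(newer, ims, page)
status_plain, _, _ = conditional_get(older, None, page)
print(status_unchanged, status_changed, status_plain)
```

A 304 carries no body, so the cache pays one round trip but no transfer cost when its copy is still current.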

13
HTTP support for caching

Conditional requests (If-Modified-Since, IMS)
Servers can set Expires and max-age
Request indirection: application-level routing
Range requests, entity tags
Cache-Control header
Requests: min-fresh, max-stale, no-transform
Responses: must-revalidate, public, private,
no-cache
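
A cache has to split the Cache-Control header into directives before it can act on them. The sketch below is a deliberately simple parser: `parse_cache_control` and `shared_cache_may_store` are names introduced here, and the comma split does not handle quoted arguments that themselves contain commas.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def shared_cache_may_store(directives: Dict[str, Optional[str]]) -> bool:
    """A response marked private is meant for a single user, not a shared proxy."""
    return "private" not in directives


parsed = parse_cache_control("public, max-age=3600, must-revalidate")
private = parse_cache_control("Private")
print(parsed, shared_cache_may_store(parsed), shared_cache_may_store(private))
```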

14
Where

[Figure: where caches can be placed. Labels: Browser, Intranet, Local ISP, Data Center ISP, L4 Switch, Reverse Proxy, Content Server, with cache and cdn nodes along the path.]

15
Cache Types

Proxy Caching
Reverse Proxy Caching
Transparent Caching
Adaptive Caching
Push Caching
Active Caching

16
Proxy Caching

Harvest/Squid
Provide web content for a fixed user base
Deployed at the network edges (company or
institutional gateway or firewall hosts)
Standalone operation
Manual configuration in web browsers
Commodity product/technology
Single point of failure

17
Reverse Proxy Caching

Designed to offload duties from one or more
specific servers
Data size is limited to the size of the static content on
the server
The challenge is fast, diskless operation
Cache consistency is easy

18
Transparent Caching

Intercept HTTP requests and redirect them to web
cache servers or cache clusters
No client configuration
Violates the end-to-end paradigm
Client thinks it is talking directly to the server
Server thinks it is talking to the cache
Implemented as an L4 switch
A Layer 4 switch makes switching decisions based on
the TCP or UDP port number, e.g., port 80 for HTTP
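
The switching decision can be sketched as a function of the transport-layer header alone. The `Packet` type, `next_hop`, and the cache address below are hypothetical. A real L4 switch does this in hardware or in the kernel, not in Python.

```python
from dataclasses import dataclass

HTTP_PORT = 80
CACHE_ADDR = "10.0.0.5"  # hypothetical address of the web cache cluster


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    protocol: str  # "tcp" or "udp"
    dst_port: int


def next_hop(packet: Packet) -> str:
    """Redirect HTTP traffic to the cache; pass everything else through."""
    if packet.protocol == "tcp" and packet.dst_port == HTTP_PORT:
        return CACHE_ADDR
    return packet.dst_ip


web = Packet("192.0.2.10", "198.51.100.7", "tcp", 80)
dns = Packet("192.0.2.10", "198.51.100.53", "udp", 53)
print(next_hop(web), next_hop(dns))
```

The client addressed its packet to the origin server and needed no configuration, which is what makes the cache transparent.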

19
Transparent Caching
20
Adaptive Caching

ISP-level caching, global data placement
optimization
Multiple cooperating distributed caches
Operate as a cache mesh based on content demand
Cache Group Management Protocol
How meshes are formed
How individual caches join/leave the meshes
Content Routing Protocol: sends requests to the
appropriate cache within the meshes
Uses distributed cache meshes to solve the hot
spot problem
Caches dynamically join and leave the groups
based on content demand
Administrative boundaries must be relaxed

21
Push Caching

Keep data close to those clients requesting this
information
Send the data out proactively
Assumption: we are able to launch caches that may
cross administrative boundaries
Incurs cost (storage and transmission)

22
Active Caching

Applies caching to dynamic documents
30% of client HTTP requests contain cookies
The server provides the cache with the objects
and any associated cache applets
Use an applet inside of the cache to customize
dynamic pages on the fly

23
Cache Placement/Deployment

Close to clients/content consumers
Proxy caching
Transparent proxy caching
Close to servers/content providers
Improve access to logical sets of data
Delay-sensitive data: video, audio
Reverse proxy caching
Push caching
Network choke points: strategic deployment
Adaptive caching
Problem with administrative control

24
Zipf's Law vs. Web Access

Zipf's Law
Web Access
Caching?

25
Zipf's Law

Zipf's law: the frequency of an event P as a
function of rank i is a power law function
$P_i = \Omega / i^{\alpha}$, where $\Omega$ is a normalizing constant and $\alpha \approx 1$

26
Zipf's Law

Observed to be true for
Frequency of written words in English texts
Population of cities
Income of a company as a function of rank

27
Zipf's Law vs. Web Access

For a given server, page access by rank follows
Zipf's law
Web requests from a fixed population of users
follow a Zipf-like law with $0.64 < \alpha < 0.83$

28
Observations

Top 1% of all documents account for 20 - 35% of
proxy requests
Top 10% account for 45 - 55% of requests
It takes 25 to 40% of all documents to account
for 70% of requests
It takes 70 to 80% of all documents to account
for 90% of requests
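
To get a feel for how skewed a Zipf-like workload is, the share of requests going to the most popular documents can be computed directly from the power law. The document count and exponent below are illustrative assumptions. The output is a model result and is not expected to reproduce the measured figures above exactly.

```python
def top_share(num_docs: int, alpha: float, top_fraction: float) -> float:
    """Fraction of requests that go to the top `top_fraction` of documents
    when popularity of rank i is proportional to 1 / i**alpha."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_docs + 1)]
    top_n = max(1, round(num_docs * top_fraction))
    return sum(weights[:top_n]) / sum(weights)


shares = {}
for fraction in (0.01, 0.10, 0.40):
    shares[fraction] = top_share(num_docs=100_000, alpha=0.75, top_fraction=fraction)
    print(f"top {fraction:.0%} of documents -> {shares[fraction]:.0%} of requests")
```

A small cache already captures a large share of requests, but each further gain in hit rate needs a much larger cache.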

29
Zipf's Law and Caching

Discussion
How does this help in cache design?

30
Basic caching algorithm

Pages may be
Fresh: up-to-date
Expired: current date > expiration date
Stale: old

31
Basic caching algorithm - 2

If (page is in the cache)
    If (page is expired or stale)
        Get from server with If-Modified-Since (soft miss)
        If not modified, get from cache
        Else get from server
    Else
        Get from cache
Else
    Get from server
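
The lookup logic above can be written as a small runnable sketch. The `origin` callable, the `Entry` record, and the fixed one-hour `ttl` are assumptions made for the example. A real cache would take expiry from the response headers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Entry:
    body: bytes
    expires_at: float


# Hypothetical origin interface: origin(url, conditional) -> (modified, body).
Origin = Callable[[str, bool], Tuple[bool, bytes]]


def lookup(cache: Dict[str, Entry], url: str, origin: Origin,
           now: float, ttl: float = 3600.0) -> Tuple[str, bytes]:
    entry = cache.get(url)
    if entry is not None and now < entry.expires_at:
        return "hit", entry.body  # fresh copy: no server contact
    if entry is not None:
        modified, body = origin(url, True)  # expired: conditional request
        if not modified:
            entry.expires_at = now + ttl  # 304: keep the copy, extend its life
            return "soft miss", entry.body
    else:
        _, body = origin(url, False)  # not cached: plain request
    cache[url] = Entry(body, now + ttl)
    return "miss", body


def unchanged_origin(url: str, conditional: bool) -> Tuple[bool, bytes]:
    """Stand-in server whose page never changes (assumption for the example)."""
    return (not conditional, b"" if conditional else b"page")


cache: Dict[str, Entry] = {}
r1 = lookup(cache, "/a", unchanged_origin, now=0.0)
r2 = lookup(cache, "/a", unchanged_origin, now=10.0)
r3 = lookup(cache, "/a", unchanged_origin, now=4000.0)
print(r1[0], r2[0], r3[0])
```
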
32
Basic caching algorithm - 3

If cache has space
Store the file
Else
Delete expired from cache
Delete stale from cache
Delete LRU from cache
Delete largest/smallest from cache?

33
Cache Replacement

Cache size is limited, need replacement policy
LRU
LFU
Greedy-dual size
Many others
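
As one concrete replacement policy, here is a minimal byte-bounded LRU cache built on `collections.OrderedDict`. The class name and the byte-based capacity are choices made for this sketch. LFU or Greedy-Dual-Size would need different bookkeeping.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Evicts the least recently used object when the byte budget is exceeded."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, url: str) -> Optional[bytes]:
        body = self.entries.get(url)
        if body is not None:
            self.entries.move_to_end(url)  # mark as most recently used
        return body

    def put(self, url: str, body: bytes) -> None:
        if len(body) > self.capacity_bytes:
            return  # object can never fit: do not cache it
        if url in self.entries:
            self.used_bytes -= len(self.entries.pop(url))
        while self.used_bytes + len(body) > self.capacity_bytes:
            _, evicted = self.entries.popitem(last=False)  # oldest entry first
            self.used_bytes -= len(evicted)
        self.entries[url] = body
        self.used_bytes += len(body)


lru = LRUCache(capacity_bytes=10)
lru.put("/a", b"aaaa")
lru.put("/b", b"bbbb")
lru.get("/a")           # touch /a so that /b becomes the LRU entry
lru.put("/c", b"cccc")  # needs room: evicts /b
print(list(lru.entries))
```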

34
Cache Consistency

Multiple copies of objects created
How and when should the copies be renewed?
Goals
Avoid stale copies
Keep unnecessary traffic as low as possible

35
Cache Consistency Polling

Solution 1: polling every time

Implemented in HTTP using the optional
"If-Modified-Since" request header field
Benefit: strong consistency
Drawback: very slow cache hits
36
Cache Consistency Polling

Solution 2: polling if the TTL expires (widely used)
Associate a TTL (12 hours or 2 days) with each
cached object

Implemented in HTTP using the optional "Expires"
header field
Benefit: fast cache hits
Drawback: weak cache consistency (5% stale), because the TTL is
an a priori estimate of an object's lifetime
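
The trade-off between the two polling policies shows up when counting how often the cache must contact the server for one object. The request timestamps and the one-hour TTL below are made-up values, and both function names are introduced for this sketch.

```python
from typing import Iterable


def contacts_poll_every_time(request_times: Iterable[float]) -> int:
    """Solution 1: every request triggers a validation at the server."""
    return sum(1 for _ in request_times)


def contacts_with_ttl(request_times: Iterable[float], ttl: float) -> int:
    """Solution 2: contact the server only when the TTL has run out."""
    count = 0
    fresh_until = float("-inf")
    for t in request_times:
        if t >= fresh_until:
            count += 1
            fresh_until = t + ttl
    return count


times = [0, 10, 20, 3600, 3610, 50000]  # seconds, hypothetical trace
every_time = contacts_poll_every_time(times)
with_ttl = contacts_with_ttl(times, ttl=3600)
print(every_time, with_ttl)
```

The requests served without a server contact are the fast hits. They are also the ones that can return a stale copy if the object changed before its TTL ran out.
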
37
Cache Consistency

Solution 3: invalidation protocols
The server helps the proxy in maintaining
consistency
Invalidation protocols
When the proxy makes a request:
Piggyback cache validation (PCV): the proxy
provides some other potentially stale copies for
the server to validate
Piggyback cache invalidation (PCI): the server
lists copies which have been updated
since the last access
Use of volumes
Volume lease
The client receives a lease from the server
During the lease validity the client can retrieve
copies from the proxy
When the lease expires the client has to renew it
Problems: scalability, servers need to keep cache
state

38
Cache Cooperation

Hierarchical caching
Cache servers form a hierarchy, tree-like
structures
Parent servers: at the top of the hierarchy, receive
requests from child servers. If they do not have
the requested objects, they ask either their parents
or the origin web servers
Sibling servers: if the local cache does not have
the requested object, it asks its sibling
caches. If the sibling caches do not have the
object, the local cache asks the parent cache
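
The lookup order (local cache, then siblings, then parent, then origin server) can be sketched as below. `CacheNode`, `peek`, and the counting `origin` function are hypothetical names. Real caches query siblings over the network rather than by method call.

```python
from typing import Callable, Dict, List, Optional


class CacheNode:
    def __init__(self, name: str, origin: Callable[[str], bytes],
                 parent: Optional["CacheNode"] = None) -> None:
        self.name = name
        self.origin = origin
        self.parent = parent
        self.siblings: List["CacheNode"] = []
        self.store: Dict[str, bytes] = {}

    def peek(self, url: str) -> Optional[bytes]:
        """Answer a sibling's query from the local store only."""
        return self.store.get(url)

    def get(self, url: str) -> bytes:
        body = self.store.get(url)
        if body is None:  # local miss: ask the siblings
            for sibling in self.siblings:
                body = sibling.peek(url)
                if body is not None:
                    break
        if body is None:  # siblings missed too: go up, or to the origin
            body = self.parent.get(url) if self.parent else self.origin(url)
        self.store[url] = body
        return body


origin_calls: List[str] = []


def origin(url: str) -> bytes:
    origin_calls.append(url)
    return b"page"


parent = CacheNode("parent", origin)
child_a = CacheNode("a", origin, parent)
child_b = CacheNode("b", origin, parent)
child_a.siblings, child_b.siblings = [child_b], [child_a]

child_a.get("/x")  # misses everywhere, so the parent fetches from the origin
child_b.get("/x")  # answered by sibling a, without touching parent or origin
print(len(origin_calls))
```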

39
(No Transcript)
40
Cache Hierarchies

Use hierarchy to scale a proxy
Why?
Larger population: higher hit rate (fewer
compulsory misses)
Larger effective cache size
Why is the population for a single proxy limited?
Performance, administration, policy, etc.
NLANR cache hierarchy
Most popular
9 top-level caches
Based on the Internet Cache Protocol (ICP)
Squid/Harvest proxy
How to locate content?

41
ICP (Internet Cache Protocol)

Simple protocol to query another cache for
content
Uses UDP. Why?
ICP message contents
Type: query, hit, hit_obj, miss
Other: identifier, URL, version, sender address
Special message types used with the UDP echo port
Used to probe a server or dumb cache
Query and then wait until time-out (2 sec)
Transfers between caches are still done using HTTP
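
The message contents listed above map onto a small fixed header followed by the URL. The sketch below packs and unpacks an ICP query with `struct`. The 20-byte layout and opcode numbers are given as understood from RFC 2186 (ICP version 2) and should be checked against the specification before being relied on. The helper names are introduced here, and nothing is sent on the network.

```python
import socket
import struct
from typing import Tuple

ICP_VERSION = 2
ICP_OP_QUERY, ICP_OP_HIT, ICP_OP_MISS = 1, 2, 3  # assumed per RFC 2186

# opcode, version, message length, request number, options, option data, sender address
HEADER = struct.Struct("!BBHIII4s")


def build_query(request_number: int, url: str, sender_ip: str = "0.0.0.0") -> bytes:
    """Build an ICP query: header, requester address, null-terminated URL."""
    payload = socket.inet_aton("0.0.0.0") + url.encode("ascii") + b"\x00"
    header = HEADER.pack(
        ICP_OP_QUERY, ICP_VERSION, HEADER.size + len(payload),
        request_number, 0, 0, socket.inet_aton(sender_ip),
    )
    return header + payload


def parse_header(message: bytes) -> Tuple[int, int, int, int]:
    """Return (opcode, version, length, request number) of an ICP message."""
    opcode, version, length, request_number, _, _, _ = HEADER.unpack_from(message)
    return opcode, version, length, request_number


query = build_query(7, "http://www.squid-cache.org/")
print(parse_header(query))
```

A real client would send `query` in a single UDP datagram and give up after the time-out, which is why a connectionless protocol suits this kind of probe.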

42
Squid

[Figure: Squid hierarchy with one parent and three child caches. A client sends a web page request to a child cache, and two ICP queries are sent out.]

43
Squid

[Figure: Squid hierarchy with one parent and three child caches. Both ICP queries are answered with ICP MISS.]

44
Squid

[Figure: Squid hierarchy with one parent and three child caches. The web page request is shown at the parent level.]

45
Squid

[Figure: Squid hierarchy with one parent and three child caches. A client sends a web page request to a child cache, and three ICP queries are sent out.]

46
Squid

[Figure: Squid hierarchy with one parent and three child caches. The three ICP queries are answered with ICP HIT, ICP MISS, and ICP HIT.]

47
Squid

[Figure: Squid hierarchy with one parent and three child caches. The web page request is shown at the parent level.]

48
Hierarchical caching

Ideally, we want the cache mesh to behave as a
single cache with equivalent capacity and
processing capability
ICP: many copies of popular objects are created,
so capacity is wasted
High latency: more than one hop is needed to
search for an object
How to improve? Discuss!

49
Problems with caching

Over 50% of all HTTP objects are uncacheable.
Sources:
Dynamic data: stock prices, frequently updated
content
CGI scripts: results based on passed parameters
SSL: encrypted data is not cacheable
Most web clients don't handle mixed pages well,
so many generic objects are transferred with SSL
Cookies: results may be based on passed data
Hit metering: the owner wants to measure the number of hits
for revenue, etc., hence cache busting

50
Risks of Using Proxy

Benefits: reduced latency, bandwidth savings, etc.
Risks
Obsolete data
Violation of client privacy: the proxy can keep a log
file telling which objects the client has
requested
Data integrity

51
Real Proxy Servers

Squid: the most widely used. It works well and is free.
http://www.squid-cache.org/
Microsoft ISA Server 2004: Microsoft developed
ISA to replace Microsoft Proxy Server. It is fully
functional with Active Directory
http://www.microsoft.com/isaserver/
Apache: the Apache web server has a module to do
reverse caching (experimental)
http://httpd.apache.org/docs-2.0/mod/mod_cache.html
Cisco Cache Engine: sits next to (mostly) Cisco
routers and receives transparently redirected
HTTP requests
http://www.cisco.com/warp/public/cc/pd/cxsr/500/index.shtml
CERN/W3C HTTPd: it was the original proxy server.
http://www.w3.org/hypertext/WWW/Daemon/Status.html