Title: Web Caching
1Web Caching
- By
- Amisha Thakkar
- Alpa Shah
2Overview
- What is a Web Cache ?
- Caching Terminology
- Why use a cache?
- Disadvantages of Web Cache
- Other Features
- Caching Rules
3Overview
- Caching Architectures
- Comparison of Architectures
- Cache Deployment Scheme
- Client Side Cache Cooperation
- Active Caching
4What is a Web Cache ?
- A cache is a place where temporary copies of objects are stored
- Cached information is generally closer to the requester than the permanent information is
- Objects: HTML pages, images, files
5What is a Web Cache?
6Caching Terminology
- Client: An application program that establishes connections for sending requests
- Server: An application program that accepts connections to service requests by sending back responses
- Origin server: The server on which the given resource resides or is to be created
7Caching Terminology
- Proxy: An intermediary program which acts as both a server and a client, making requests on behalf of other clients
- A proxy is not necessarily a cache
- A proxy does not always cache the replies passing through it
- It may be used on a firewall to monitor accesses
8Why use a cache ?
- To reduce latency
- To reduce network traffic
- Load on origin servers will be reduced
- Can isolate end users from network failures
9Disadvantages of Web cache
- With cached data there is always a chance of receiving stale information
- Content providers lose access counts when cache hits are served
- Manual configuration is often required
- Operation of a cache requires additional resources
- In some situations the cache can be a single point of failure
10Other Features
- Depending on the perspective, the following may be good or bad
- The cache makes requests on behalf of clients, so the servers never see the clients' IP addresses
- A cache provides an easy opportunity to monitor and analyze browsing activities
- A cache can be used to block certain requests
11Types of Web Caches
- Proxy caches
- Serve a large number of users
- Large corporations and ISPs often set them up on their firewalls
- They are a type of shared cache
- Browser caches
- Use a section of the computer's hard disk to store objects that you have seen
12 Caching Rules
- Rules on which caches work:
- Some of them are set in protocols
- Some are set by the cache administrator
- Most common rules
- If the object is authenticated or secure, it won't be cached
- The object's headers indicate whether the object is cacheable or not
13Caching Rules
- An object is considered fresh when:
- It has an expiry time or other age-controlling directive set and is still within the fresh period
- The browser cache has already seen the object and has been set to check once a session
14Caching Rules
- An object is also considered fresh when a proxy cache has seen it recently and it was modified relatively long ago
- Fresh documents are served directly from the cache without checking with the origin server
15Caching Rules
- For a stale object, the origin server will be asked to validate the object, or tell the cache whether the copy is still good
- The most common validator is the time that the object was last changed
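The freshness and validation rules above can be sketched in a few lines. This is a minimal illustration, not how any particular cache implements them: the function names, the `heuristic_fraction` parameter and its 10% default are assumptions chosen for the example.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_fresh(now: datetime,
             fetched: datetime,
             expires: Optional[datetime] = None,
             last_modified: Optional[datetime] = None,
             heuristic_fraction: float = 0.1) -> bool:
    """Return True if a cached copy may be served without asking the origin."""
    if expires is not None:
        # Explicit expiry time: fresh until it has passed.
        return now < expires
    if last_modified is not None:
        # Assumed heuristic: an object that was already old when it was
        # fetched stays fresh for a fraction of that age.
        lifetime = (fetched - last_modified) * heuristic_fraction
        return now - fetched < lifetime
    return False  # no information at all: treat the copy as stale


def revalidate(cached_last_modified: datetime,
               origin_last_modified: datetime) -> str:
    """Validate a stale copy using the last-changed time as the validator."""
    if origin_last_modified <= cached_last_modified:
        return "still good"  # the origin confirms the cached copy
    return "refetch"         # the object changed, so fetch the new version


t0 = datetime(2024, 1, 1, 12, 0)
recently_seen = is_fresh(now=t0 + timedelta(hours=1), fetched=t0,
                         last_modified=t0 - timedelta(days=30))
```

Here `recently_seen` covers the proxy-cache rule: the object was fetched an hour ago and had not changed for 30 days, so it is still treated as fresh.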
16Caching Architectures Hierarchical /Simple Cache
- Browser-cache interaction is the same as browser-host interaction, i.e. a TCP connection is made and the item requested
- If not found, the request is sent to the parent cache
- A hierarchy is built up, each level indirectly serving a wider community of users
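A minimal sketch of the hierarchical lookup, assuming an in-memory store and a hypothetical `origin` callable standing in for the origin server. The class and attribute names are illustrative only.

```python
from typing import Callable, Dict, Optional


class HierarchicalCache:
    """A cache that forwards misses to its parent, or to the origin at the root."""

    def __init__(self, name: str,
                 parent: Optional["HierarchicalCache"] = None,
                 origin: Optional[Callable[[str], str]] = None) -> None:
        self.name = name
        self.parent = parent
        self.origin = origin
        self.store: Dict[str, str] = {}

    def get(self, url: str) -> str:
        if url in self.store:
            return self.store[url]          # hit at this level
        if self.parent is not None:
            body = self.parent.get(url)     # miss: ask the parent cache
        elif self.origin is not None:
            body = self.origin(url)         # root of the hierarchy: ask the origin
        else:
            raise LookupError(f"{self.name}: no parent or origin for {url}")
        self.store[url] = body              # keep a copy on the way back down
        return body
```

With two institutional caches sharing one regional parent, a document fetched through the first is served to the second from the parent without another trip to the origin.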
17Caching Architectures Hierarchical /Simple Cache
18Caching Architectures Distributed /Co-operating Cache
- Decentralized (cache mesh)
- Multiple servers cooperate in such a way that they share their individual caches to create a large distributed one
- Simply put, caching proxies communicate with each other to serve different users
- On a cache miss, a proxy checks with other proxy caches before contacting the origin server
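A minimal sketch of the cooperating lookup, assuming in-memory stores and a hypothetical `origin` callable. Peers are queried by direct method call here; a real mesh would use an inter-cache protocol.

```python
from typing import Callable, Dict, List


class MeshCache:
    """A proxy cache that asks its peers on a miss before going to the origin."""

    def __init__(self, name: str, origin: Callable[[str], str]) -> None:
        self.name = name
        self.origin = origin
        self.peers: List["MeshCache"] = []
        self.store: Dict[str, str] = {}

    def has(self, url: str) -> bool:
        return url in self.store

    def get(self, url: str) -> str:
        if url in self.store:
            return self.store[url]       # local hit
        for peer in self.peers:
            if peer.has(url):
                body = peer.store[url]   # hit in a cooperating cache
                break
        else:
            body = self.origin(url)      # no peer has it: contact the origin
        self.store[url] = body
        return body
```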
19Caching Architectures Distributed /Co-operating Cache
- Caches communicate amongst themselves using a protocol like ICP (Internet Cache Protocol)
- Caches can be selected on the basis of
- Distance from the end user
- Specialization in particular URLs (location hint)
20Caching Architectures Distributed /Co-operating Cache
- Why distributed: limitations of a hierarchy
- Width of cache: in a hierarchy, caches at the same level are inaccessible to each other
- LRU policy implies sufficient disk space
- Cost in replication of disk storage
- Amount of disk space required depends on the number of users served and their breadth of reading
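To make the LRU point concrete, here is a minimal sketch of least-recently-used replacement. Capacity is counted in objects rather than bytes of disk, which is a simplification for the example.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Keeps at most `capacity` objects, evicting the least recently used one."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.items: "OrderedDict[str, str]" = OrderedDict()

    def get(self, url: str) -> Optional[str]:
        if url not in self.items:
            return None
        self.items.move_to_end(url)          # mark as most recently used
        return self.items[url]

    def put(self, url: str, body: str) -> None:
        self.items[url] = body
        self.items.move_to_end(url)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used
```

If the capacity is small compared with the set of documents the users read, objects are evicted before they are requested again, which is why the policy needs sufficient disk space to be effective.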
21Caching Architectures Distributed /Co-operating Cache
- The more users, the more disk space is needed higher in the hierarchy
- Exponential growth of the number of documents on the WWW
22Caching Architectures Distributed /Co-operating Cache
- Caching close to the user is more effective; the higher the level, the lower the efficiency
- Can be created for load balancing
- Most effective when serving a community of interests
23Caching Architectures Distributed /Co-operating Cache
- First a UDP packet is sent for the cache inquiry
- The cache selection decision is determined by RTT
- Potential problem: network congestion because of UDP
- In favor:
- A UDP exchange takes 2 IP packets, TCP at least 8 packets
24Caching Architectures Distributed /Co-operating Cache
- UDP reply from cache can indicate
- a. Presence
- b. Speed
- c. Availability of requested documents
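A minimal sketch of RTT-based cache selection. It does not implement the ICP wire format: the `Reply` record and its fields are hypothetical stand-ins for whatever the UDP replies carry.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Reply:
    cache: str      # which cache answered (presence)
    hit: bool       # whether it holds the requested document (availability)
    rtt_ms: float   # measured round-trip time of the inquiry (speed)


def select_cache(replies: Iterable[Reply]) -> Optional[str]:
    """Pick the fastest cache that reported a hit, or None if nobody has it."""
    hits = [r for r in replies if r.hit]
    if not hits:
        return None  # fall back to the origin server
    return min(hits, key=lambda r: r.rtt_ms).cache
```

A cache that never replies is simply absent from `replies`, so an unreachable cache is never selected.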
25Caching Architectures Hybrid Cache
26Comparison of Architectures
- Hierarchical: caches placed at multiple levels
- Distributed: caches only at the bottom level, no intermediate caches
27Comparison of Architectures
- Performance parameters.
- Connection time (Tc) is defined as the time from when the document is requested until the first data byte is received
- Transmission time (Tt) is defined as the time taken to transmit the document
- Total latency = Tc + Tt
- Bandwidth usage
28Comparison of Architectures
- Fig 3: Connection time for different document popularities
29Comparison of Architectures
- For unpopular documents, the connection time is high
- As the number of requests increases, the average connection time decreases
- For extremely popular documents, distributed caching has smaller connection times
30Comparison of Architectures
- Fig 4 Network traffic generated
31Comparison of Architectures
- On lower levels, distributed caching practically doubles the network bandwidth usage
- Around the root node in the national network, the network traffic is reduced to half
- Distributed caching uses all possible network shortcuts between institutional caches, generating more traffic in the less congested low network levels
32Comparison of Architectures
- Fig 5 a, Not congested national network
33Comparison of Architectures
- The only bottleneck on the path from the client
to the origin server is the international path.
Hence transmission times are similar for both
34Comparison of Architectures
- Fig 5 b Congested National Networks
35Comparison of Architectures
- Both have higher transmission times compared to the previous case
- Distributed caching gives shorter transmission times than hierarchical because many requests travel through lower network levels
36Comparison of Architectures
- Fig 6 Average total latency
37Comparison of Architectures
- For large documents, transmission time is more relevant than connection time
- Hierarchical caching gives lower latencies for documents smaller than 200 KB due to lower connection times
- Distributed caching gives lower latencies for larger documents due to lower transmission times
38Comparison of Architectures
- The size threshold depends on the degree of congestion in the national network
- The higher the congestion, the lower the size threshold
- Above the threshold, distributed caching has lower latencies than hierarchical
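The size threshold can be illustrated with a simple model. This is an assumption made for the example, not the model behind the figures: total latency is taken as Tc + size / throughput, with hierarchical caching having the lower Tc and distributed caching the higher effective throughput. All numbers used with it are hypothetical.

```python
def total_latency(tc_s: float, size_kb: float, throughput_kb_s: float) -> float:
    """Total latency = connection time + transmission time (linear model)."""
    return tc_s + size_kb / throughput_kb_s


def size_threshold(tc_hier_s: float, tc_dist_s: float,
                   thr_hier_kb_s: float, thr_dist_kb_s: float) -> float:
    """Document size (KB) at which both schemes have equal total latency.

    Solves tc_hier + S / thr_hier == tc_dist + S / thr_dist for S.
    """
    if thr_dist_kb_s <= thr_hier_kb_s:
        raise ValueError("model assumes distributed throughput is higher")
    return (tc_dist_s - tc_hier_s) / (1.0 / thr_hier_kb_s - 1.0 / thr_dist_kb_s)


# Hypothetical values: more congestion lowers the hierarchical throughput.
light = size_threshold(0.1, 0.3, thr_hier_kb_s=100.0, thr_dist_kb_s=200.0)
heavy = size_threshold(0.1, 0.3, thr_hier_kb_s=50.0, thr_dist_kb_s=200.0)
```

Under this model `heavy` is smaller than `light`, matching the observation that higher congestion lowers the size threshold.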
39Comparison of ArchitecturesWith Hybrid Scheme
40Comparison of ArchitecturesWith Hybrid Scheme
41Comparison of ArchitecturesWith Hybrid Scheme
- In the hybrid scheme, if the number of cooperating caches (kc) is very small, the connection time is high
- When the number of cooperating caches increases, the connection time decreases to a minimum
- If the number increases over the threshold, the connection time increases very fast
42Comparison of ArchitecturesWith Hybrid Scheme
43Comparison of ArchitecturesWith Hybrid Scheme
- For an uncongested network, the number of cooperating caches (kt) at every level hardly influences Tt
- If the number of cooperating caches is very small, Tt is high, and vice versa
- If the number increases above the threshold, Tt increases
- The optimum number of caches depends on the number of caches reachable while avoiding congested links
44Comparison of ArchitecturesWith Hybrid Scheme
45Comparison of ArchitecturesWith Hybrid Scheme
46Comparison of ArchitecturesWith Hybrid Scheme
- The optimum number of cooperating caches (kopt) at every level depends on the document size, to minimize the total latency
- For small documents the optimum number is closer to kc
- For large documents the optimum number is closer to kt
47Comparison of ArchitecturesWith Hybrid Scheme
48Comparison of ArchitecturesWith Hybrid Scheme
- For any document the optimum kopt that minimizes the total latency is such that kc ≤ kopt ≤ kt
49Cache Deployment Schemes
50Cache Deployment Schemes
- Advantages
- Clients point all web requests directly to the cache; no effect on non-web traffic
- Cost of upgrading hardware and software is limited
- Administration of caches is limited to basic configuration
51Cache Deployment Schemes
- Disadvantages
- Every browser must be configured to point to the cache
- Each client can hit only one cache
- Single point of failure
- Unnecessary duplication of data
- Bottleneck in cases where content is otherwise available on the LAN
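As an illustration of what "configured to point to the cache" means for a client, here is a minimal sketch using Python's standard `urllib.request`. The host name and port are placeholders, and a real browser would be configured through its own settings rather than through code.

```python
import urllib.request


def opener_via_cache(cache_host: str, cache_port: int) -> urllib.request.OpenerDirector:
    """Build an opener that sends all http:// requests to one proxy cache."""
    proxy_url = f"http://{cache_host}:{cache_port}"
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler)


# Placeholder address: every request made through this opener goes to that
# one cache, so if the cache is down the client has no other path.
opener = opener_via_cache("cache.example.invalid", 3128)
```

Only `http` traffic is mapped to the proxy here, which mirrors the point that non-web traffic is unaffected.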
52Cache Deployment Schemes
- Transparent Proxy caching
53Cache Deployment Schemes
- Advantages
- No browser configuration
- Cost of upgrading hardware and software is limited
- No administration of intermediate systems required
54Cache Deployment Schemes
- Disadvantages
- Each client can hit only one cache
- If the cache goes down, internet as well as intranet access is lost
- Negative impact on non-web traffic
- Cache has to route non-web traffic
- Routing, packet examination and network address translation steal CPU cycles from the main cache-serving function
55Cache Deployment Schemes
- Transparent proxy caching with web cache redirection
56Cache Deployment Schemes
- Advantages
- A switch/router examines the packets
- Minimal impact on non-web traffic
- Frees up CPU cycles for the web cache
- Allows client load to be dynamically spread over multiple caches
- Eliminates the single point of failure, especially if redundant redirectors are used
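One way a redirector could spread load over several caches is to hash the destination host. The sketch below is an assumption for illustration, not how any particular switch works; the cache names are placeholders.

```python
import hashlib
from typing import Collection, Optional, Sequence


def pick_cache(dest_host: str,
               caches: Sequence[str],
               down: Collection[str] = ()) -> Optional[str]:
    """Choose a cache for a web request, skipping caches known to be down."""
    alive = [c for c in caches if c not in down]
    if not alive:
        return None  # no cache available: forward straight to the origin
    digest = hashlib.sha256(dest_host.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(alive)
    return alive[index]  # same host maps to the same cache while the set is stable
```

Hashing on the destination keeps requests for one site on one cache, which avoids duplicating its objects across caches, and a failed cache only causes its share of traffic to be reassigned.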
57Cache Deployment Schemes
- Disadvantages
- Additional intermediate systems must be deployed
- Increases expense
58Client Side Cache Cooperation.
59Active Caching
- Current problem: unable to cache dynamic documents
- Caching dynamic contents on the web using active web
- A cache applet is server-supplied code that is attached to a URL or a collection of URLs
- The applet is written in a platform-independent language
60Active Caching
- On a user request the applet is invoked by the cache
- The applet decides what is to be sent to the user
- Other functions of the applet:
- Logging user accesses
- Checking access permissions
- Rotating advertising banners
61Active Caching
- The proxy has the freedom not to invoke the applet but to send the request to the server
- The proxy promises not to send back a cached copy without invoking the applet
- If the applet is too large, the proxy sends the request to the server
- The proxy is not obligated to cache any applet; in that case it agrees not to service the request for that document
62Active Caching
- The proxy can devote resources to the applets associated with the hottest URLs for its users
- The proxy that receives the request is typically the proxy closest to the user, so the scheme automatically migrates the server processing to the nodes that are close to users
- This increases the scalability of web-based services
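A minimal sketch of the active caching contract, with the applet modelled as a plain Python callable. The names `ActiveProxy`, `banner_applet` and `install` are hypothetical; the real scheme ships server-supplied code in a platform-independent language.

```python
from itertools import cycle
from typing import Callable, Dict, List

Applet = Callable[[str, str], str]  # (user, cached_body) -> response body


def banner_applet(banners: List[str], log: List[str]) -> Applet:
    """Example applet: logs each access and rotates an advertising banner."""
    rotation = cycle(banners)

    def run(user: str, cached_body: str) -> str:
        log.append(user)                           # logging user accesses
        return f"{next(rotation)}\n{cached_body}"  # rotating advertising banners

    return run


class ActiveProxy:
    def __init__(self, origin: Callable[[str, str], str]) -> None:
        self.origin = origin
        self.bodies: Dict[str, str] = {}
        self.applets: Dict[str, Applet] = {}

    def install(self, url: str, body: str, applet: Applet) -> None:
        self.bodies[url] = body
        self.applets[url] = applet

    def handle(self, user: str, url: str) -> str:
        applet = self.applets.get(url)
        if applet is None or url not in self.bodies:
            return self.origin(user, url)      # free to pass the request to the server
        return applet(user, self.bodies[url])  # never serve the copy without the applet
```

The proxy either runs the applet or forwards the request; it has no path that returns the cached body directly.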