Title: 20755: The Internet
1 20-755 The Internet
2 Reading Assignments
3 Chapter 5: HTTP Support For Caching and Replication
4 Caching: To update or not to update? Conditional Requests
- The client specifies a condition.
- If the condition is true, the server sends the message body.
- If the condition is not true, the server indicates this with a status code in the response instead of sending the body.
- The predicate should be equivalent to "Has this object changed since I cached a copy?"
5 Conditional Headers Common in Caching: if-modified-since
- if-modified-since: last-modified-date
- If the object has changed since the specified date, execute the request. The dates are rounded to a whole second.
- Otherwise, send the 304 Not Modified status code.
- This is typically used to request another copy of an object if, and only if, it has changed since a prior request cached a copy.
- Can you think of any pitfalls with this approach? Is it possible to get a stale item from the cache? Try to provide two scenarios.
6 Conditional Headers Common in Caching: if-modified-since
- Scenario 1
  - The requested object was modified during the same second that our prior request was serviced.
  - Although the object has changed, the change wasn't within the precision of the timestamp.
  - Please note: a stale object might be accepted as up-to-date indefinitely, not just for one second.
- Scenario 2
  - The object is changed just after the 304 Not Modified is dispatched, but before this message is received by the client.
- Can you think of a solution?
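Scenario 1 can be reproduced in a few lines. This is only a sketch: the function names are hypothetical, and it assumes the server compares timestamps after dropping sub-second precision, as a whole-second HTTP date would.

```python
from datetime import datetime, timezone

def to_http_second(ts: datetime) -> datetime:
    """Drop sub-second precision, as a whole-second HTTP date would."""
    return ts.replace(microsecond=0)

def respond(last_modified: datetime, if_modified_since: datetime) -> int:
    """Return 200 if the object looks newer than the client's date, else 304."""
    if to_http_second(last_modified) > to_http_second(if_modified_since):
        return 200
    return 304

# The client cached the object at 12:00:00.2; the object changed at 12:00:00.7.
cached_at = datetime(2024, 1, 1, 12, 0, 0, 200000, tzinfo=timezone.utc)
changed_at = datetime(2024, 1, 1, 12, 0, 0, 700000, tzinfo=timezone.utc)

# Both timestamps fall in the same second, so the change goes unnoticed.
status = respond(changed_at, cached_at)
```

Here `status` is 304 even though the object changed, and every later validation gives the same answer until the object is modified again.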
7 Conditional Headers Common in Caching: Entity Tags
- We need a way of naming an object, such that the name changes if the object changes.
- This name is specified by the server, when delivering the object, as part of the header through an Entity Tag.
- Typically, the entity tag is the MD5 hash of the object, but it could be any unique name.
- MD5 is a technique for computing a hash value, or message digest, for an object. The goal is to produce a fingerprint for each object: something that is extremely unlikely to be the same for two different objects.
- Then, these fingerprints can be compared instead of the objects themselves.
- This saves time and congestion, since they are very small (16 bytes).
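As a rough illustration, an MD5-based entity tag can be computed with Python's standard `hashlib` module. The helper name `make_etag` and the sample bodies are assumptions made for this sketch. Real servers are free to derive entity tags in other ways.

```python
import hashlib

def make_etag(body: bytes) -> str:
    """Return a quoted entity tag derived from the MD5 digest of the body."""
    return '"' + hashlib.md5(body).hexdigest() + '"'

v1 = make_etag(b"<html>version 1</html>")
v2 = make_etag(b"<html>version 2</html>")

# The raw MD5 digest is always 16 bytes, whatever the size of the object.
digest_size = hashlib.md5(b"").digest_size
```

Comparing `v1` and `v2` is enough to tell that the object changed, without transferring either body.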
8 Conditional Headers Common in Caching: ETag and if-none-match
- if-none-match: ETag
- Each time the server sends an object, it can send the MD5, or other, fingerprint as part of the header in an ETag line.
- The client can then make future requests for the object conditional on the ETag changing, using the if-none-match conditional header.
- If it hasn't changed, the request won't be executed. Instead, the server will return 304 Not Modified for a GET or HEAD (412 Precondition Failed for other methods).
- Typically, the client will make a GET conditional upon the ETag of the object having changed from the cached copy.
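The server side of this exchange might look like the following sketch. The function `handle_get` and its return shape (status, headers, body) are hypothetical. The entity tag is assumed to be an MD5 fingerprint, as on the previous slide.

```python
import hashlib
from typing import Optional

def handle_get(body: bytes, if_none_match: Optional[str]):
    """Answer a GET, skipping the body when the client's ETag still matches."""
    etag = '"' + hashlib.md5(body).hexdigest() + '"'
    if if_none_match == etag:
        # The cached copy is still current: no body is sent.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body

# First request: nothing cached yet, so no condition is sent.
status1, headers1, _ = handle_get(b"hello", None)
# Second request: made conditional on the ETag having changed.
status2, _, body2 = handle_get(b"hello", headers1["ETag"])
```

The second call returns 304 with an empty body, so the client keeps using its cached copy.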
9 When to Validate the Cache?
- We can use either if-modified-since or if-none-match to validate the cache.
- But, how often should we check?
  - Each time?
  - Every minute?
  - Every few days?
- Would it be the same for, for example, CNN's background wallpaper as for the breaking news HTML container page?
10 When to Validate the Cache?
- Really, we need the server's guidance: only the content author really knows how often a particular object is likely to change and the consequences of using a stale copy.
- Sometimes, the client might have a comfort zone smaller than the server's. It should be able to specify tighter bounds to control intermediate caches, for example, a browser's own cache or that of an intermediate proxy.
11 The expires Header
- The server can specify the date and time, to the nearest second, when any cached copies become invalid using the expires header.
- When this happens, the cached copy is said to expire.
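A cache could check expiry along these lines. This is a sketch: `is_expired` is a hypothetical helper, and the sample date is only an illustration. `parsedate_to_datetime` comes from Python's standard `email.utils` module and parses the date format used in HTTP headers.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def is_expired(expires_header: str, now: datetime) -> bool:
    """True once the current time has reached the expires date."""
    expires_at = parsedate_to_datetime(expires_header)
    return now >= expires_at

expires = "Thu, 01 Dec 1994 16:00:00 GMT"
before = datetime(1994, 12, 1, 15, 59, 59, tzinfo=timezone.utc)
after = datetime(1994, 12, 1, 16, 0, 0, tzinfo=timezone.utc)
still_fresh = not is_expired(expires, before)
now_stale = is_expired(expires, after)
```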
12 cache-control Header Directive
- HTTP/1.1 added a new, more powerful mechanism to control caching.
- It allows both the client and the server to specify bounds.
- It also allows more than one type of specification to be combined: the lowest bound wins.
- If a server presents both an expires header and a cache-control header (specifically, its max-age directive), the cache-control header wins. In this case, the expires header is viewed as the less-sophisticated technique provided only for backward compatibility.
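The precedence rule can be sketched as a small function. The name `freshness_lifetime` and its parameters are assumptions for illustration. It takes the response's date header, an optional max-age value, and an optional expires header, and returns how many seconds the copy may be considered fresh.

```python
from email.utils import parsedate_to_datetime
from typing import Optional

def freshness_lifetime(date_header: str, max_age: Optional[int],
                       expires_header: Optional[str]) -> Optional[float]:
    """Seconds a response stays fresh; max-age takes precedence over expires."""
    if max_age is not None:
        return float(max_age)
    if expires_header is not None:
        delta = (parsedate_to_datetime(expires_header)
                 - parsedate_to_datetime(date_header))
        return max(delta.total_seconds(), 0.0)
    return None  # the server gave no explicit bound

sent = "Thu, 01 Dec 1994 16:00:00 GMT"
expires = "Thu, 01 Dec 1994 17:00:00 GMT"
from_expires = freshness_lifetime(sent, None, expires)
from_max_age = freshness_lifetime(sent, 60, expires)
```

With only the expires header the lifetime is one hour. Once max-age is present, it is used and expires is ignored.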
13 cache-control Header Directive In Responses
- no-cache
  - A cached copy of this response (at the requestor or proxy) must not be reused without first revalidating it with the origin server.
- no-store
  - Do not store this response at all (at the requestor or proxy), not even temporarily.
- private
  - If cached, it is only valid for the client that made the original request.
- public
  - Cached copies can be used by any client.
14 cache-control Header Directive In Responses, cont.
- must-revalidate
  - Okay to cache, but once the copy becomes stale it needs to be revalidated using a conditional request before it is used.
- proxy-revalidate
  - Same as above, but applies only to proxies, not client caches.
- max-age=time_in_seconds
  - A cached copy can remain valid for not more than the specified number of seconds.
- s-maxage=time_in_seconds
  - Same as above, but applies only to proxies, not client caches.
- no-transform
  - Do not change the content in any way. For example, do not reduce its quality for a low-bandwidth network or reformat it for a PDA.
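A cache might apply a few of these response directives as sketched below. The function names are hypothetical. The directives are assumed to have been parsed already into a dictionary that maps each directive name to its value, or to `None` when it has no value.

```python
from typing import Optional

def may_store(directives: dict, shared_cache: bool) -> bool:
    """no-store forbids storing; private forbids storing in a proxy cache."""
    if "no-store" in directives:
        return False
    if shared_cache and "private" in directives:
        return False
    return True

def freshness_bound(directives: dict, shared_cache: bool) -> Optional[int]:
    """Pick s-maxage for proxies when present, otherwise max-age."""
    if shared_cache and "s-maxage" in directives:
        return int(directives["s-maxage"])
    if "max-age" in directives:
        return int(directives["max-age"])
    return None

response = {"private": None, "max-age": "300", "s-maxage": "60"}
browser_ok = may_store(response, shared_cache=False)
proxy_ok = may_store(response, shared_cache=True)
```

A browser cache may keep this response for up to 300 seconds, while a proxy must not store it at all because it is marked private.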
15 cache-control Header Directive In Requests
- no-cache
  - Do not satisfy this request from a cache without revalidating with the origin server.
- no-store
  - Do not store the response to this request in any cache.
- max-age
  - Independent of its expiration date, don't satisfy this request with an object older than specified.
- min-fresh
  - Don't provide a cached copy that will expire soon. For example, the client may not want a cached copy that will expire before it actually expects to use it.
16 cache-control Header Directive In Requests, cont.
- max-stale
  - Used to indicate that even a recently expired object can be used to satisfy the request, if its expiration time was less than the specified amount of time in the past.
- no-transform
  - Proxies should not transform responses provided by the original server, for example, to reduce quality for a low-bandwidth network or reformat for a PDA.
- only-if-cached
  - A proxy should not forward the request on a cache miss.
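The request-side bounds max-age, min-fresh, and max-stale can be combined roughly as follows. This is a simplified sketch with hypothetical names, not a complete implementation of the HTTP rules. Here `age` is how long ago the copy was fetched or validated, and `lifetime` is how long the server allowed it to stay fresh, both in seconds.

```python
from typing import Optional

def cache_can_answer(age: float, lifetime: float,
                     max_age: Optional[float] = None,
                     min_fresh: float = 0.0,
                     max_stale: float = 0.0) -> bool:
    """Decide whether a cached copy satisfies the request's bounds."""
    # max-age: the client refuses copies older than this, fresh or not.
    if max_age is not None and age > max_age:
        return False
    remaining = lifetime - age  # negative once the copy has expired
    if remaining >= 0:
        # min-fresh: the copy must stay fresh at least this much longer.
        return remaining >= min_fresh
    # max-stale: an expired copy is acceptable if it expired recently enough.
    return -remaining <= max_stale

plain = cache_can_answer(age=30, lifetime=60)
too_old = cache_can_answer(age=30, lifetime=60, max_age=10)
tolerated = cache_can_answer(age=70, lifetime=60, max_stale=15)
```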
17 cache-control Header Directive, cont.
- As you might have observed, many header directives in requests prompt the server to issue the same directives in the response.
  - For example, no-cache and no-store
- Directives can be strung together in a comma-separated list.
  - cache-control: public, max-age=t, no-transform
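Splitting such a list back into individual directives is straightforward. The sketch below uses a hypothetical helper, `parse_cache_control`, and assumes that no directive value itself contains a comma, which a full parser would have to handle.

```python
def parse_cache_control(value: str) -> dict:
    """Map each directive name to its value, or None if it takes no value."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives

parsed = parse_cache_control("public, max-age=3600, no-transform")
# parsed == {'public': None, 'max-age': '3600', 'no-transform': None}
```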
18 Best-effort Replication
[Figure: Replicas 1 through 4 each hold Version 1. Client 1 and Client 4 each change the object, producing two different copies of Version 2 that are propagated as updates to the other replicas.]
- Which version 2 remains on each replica?
19 Conditional Headers Common in Replication: if-unmodified-since
- if-unmodified-since: last-modified-date
- Servers can avoid overwriting changes by doing the updates only if the target replica has the same old version.
- If the target replica does not have the same older version as the updating replica, it either has missed an update, or it has a concurrent update.
- If the update is concurrent, both versions are equally up-to-date. This should be logged and corrected by an administrator, who can determine which one is correct or merge changes, etc.
- Is there a problem with this approach?
20 Conditional Headers Common in Replication: if-match
- if-match: ETag
- As before, things could still change after the updating replica gets the reply, but before the update arrives.
- As before, the timestamp has a resolution of whole seconds.
- As before, we can work around the latter problem by using entity tags instead of the timestamp.
- This time, if-match is used instead of if-none-match.
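A conditional update guarded by if-match could be sketched like this. The `Replica` class and the choice of 204 as the success status are assumptions for illustration. The entity tag is again assumed to be an MD5 fingerprint.

```python
import hashlib

class Replica:
    """A toy replica holding one object."""

    def __init__(self, body: bytes):
        self.body = body

    def etag(self) -> str:
        return '"' + hashlib.md5(self.body).hexdigest() + '"'

    def put(self, new_body: bytes, if_match: str) -> int:
        """Apply the update only if the replica still holds the expected version."""
        if if_match != self.etag():
            return 412  # Precondition Failed: missed or concurrent update
        self.body = new_body
        return 204

target = Replica(b"version 1")
base = target.etag()
# Two updaters both start from version 1 and race to push their changes.
first = target.put(b"version 2 from client 1", if_match=base)
second = target.put(b"version 2 from client 4", if_match=base)
```

The first update succeeds. The second is rejected with 412 because the replica no longer holds version 1, which surfaces the conflict instead of silently overwriting the first change.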
21 Request Redirection
- The ability to redirect a request from one server to another is important in content delivery.
- It, for example, allows a server to tell the client about mirrors or proxy caches.
- These status codes are in the 3xx range.
- The new location is provided using the Location header, except for 300 Multiple Choices, in which case it is provided in the body.
22 Request Redirection, cont.
- 300 Multiple Choices
  - A list of locations that can satisfy the request is provided in the body of the message. This might be used to distribute load to mirrors.
- 301 Moved Permanently
  - This might be used as a result of infrastructure or organizational changes. It advises the client to look to a different server for the requested object.
- 302 Found
  - Kept for compatibility; HTTP/1.1 added 303 and 307 as less ambiguous alternatives. Equivalent to a temporary redirect.
- 303 See Other
  - A redirection that provides an alternate object name, not just location.
- 305 Use Proxy
  - Use the specified proxy.
- 307 Temporary Redirect
  - Use the provided URL this time, but not in the future.
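On the client side, handling these codes might look like the sketch below. Both function names are hypothetical, and the headers are assumed to be a plain dictionary keyed by the exact name `Location`.

```python
from typing import Mapping, Optional

def redirect_target(status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Return the URL to retry, if the status code names one in Location."""
    if status in (301, 302, 303, 307):
        return headers.get("Location")
    # 300 Multiple Choices lists its alternatives in the body instead.
    return None

def remember_redirect(status: int) -> bool:
    """Only a permanent move should change where future requests are sent."""
    return status == 301

target = redirect_target(307, {"Location": "http://mirror.example.com/file"})
```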
23 Range Requests
- Sometimes the transfer of an object is interrupted, for example by the failure of a modem line.
- Clients can request retransmission of only part of the object.
- The client makes the request exactly as before, except it specifies the range using the range header.
  - range: bytes=2000-4000
  - range: bytes=4000-
- Typically, it also specifies the ETag in an if-range header, to ensure that it is getting pieces from the same object.
  - if-range: ETag
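Resuming an interrupted transfer can be sketched as follows. Both helper names are hypothetical. The byte positions in a range are inclusive, so a range of 2 to 4 covers three bytes.

```python
from typing import Optional

def resume_headers(bytes_received: int, etag: str) -> dict:
    """Headers asking for the rest of an object, if it is still the same one."""
    return {"Range": f"bytes={bytes_received}-", "If-Range": etag}

def slice_for_range(body: bytes, first: int, last: Optional[int]) -> bytes:
    """Server side: select the requested bytes; both ends are inclusive."""
    end = len(body) if last is None else last + 1
    return body[first:end]

headers = resume_headers(4000, '"abc123"')
middle = slice_for_range(b"0123456789", 2, 4)
tail = slice_for_range(b"0123456789", 4, None)
```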
24 Cookies
- Cookies allow servers to record information about clients when they make requests.
- They then give this information back to the client and ask that the client present it again with each request.
- Oftentimes, cookies are used to drive dynamic content. In this way, servers can use information about prior contact with a client to drive future contact.
- In many ways, cookies allow client-preserved sessions.
25 Cookies, cont.
- Cookies are sent by the server to the client using the set-cookie header.
- This header can specify the name of the cookie, the domain to which it is applicable, the path (name of object) to which it is applicable, and a value.
  - set-cookie: name=profile, domain=www.gregorykesden.com, path=login.html, cookie=gkesden
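Python's standard `http.cookies` module can build and read such headers. The sketch below reuses the cookie name, domain, and value from the slide. The leading slash in the path is an assumption, and the header the library emits separates attributes with semicolons rather than commas.

```python
from http.cookies import SimpleCookie

# Server side: build the header that hands the cookie to the client.
cookie = SimpleCookie()
cookie["profile"] = "gkesden"
cookie["profile"]["domain"] = "www.gregorykesden.com"
cookie["profile"]["path"] = "/login.html"
header_line = cookie.output()

# Later request: the client presents the cookie again and the server reads it.
returned = SimpleCookie()
returned.load("profile=gkesden")
value = returned["profile"].value
```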
26 Cookies, cont.
- Although permitted, cookies should not be cached, because they are specific to a user, not to the object requested.
- HTTP/1.1 allows the caching of responses with cookies, so explicit cache-control should be used. But, as a precaution, with or without explicit direction from the server, many proxies will not cache responses with cookies.
- Cookies are often considered a privacy concern, since they are sent in plaintext and since the client has no control over how the information contained within the cookie will be used.
27 Varying Objects
- Sometimes, the content of an object may differ, depending on the header information contained in the request.
- This, for example, could be the result of information contained within a cookie.
- The vary header is provided by a server to indicate which request header information shaped the object.
- Caches can then use this information to ensure that they don't serve the wrong object for a request.
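One way a cache might honor the vary header is to fold the named request headers into its lookup key. This is a sketch: `cache_key` is a hypothetical helper, and the URL is only an example.

```python
def cache_key(url: str, request_headers: dict, vary: str) -> tuple:
    """Build a cache key from the URL plus the request headers named in vary."""
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    lowered = {k.lower(): v for k, v in request_headers.items()}
    return (url,) + tuple((n, lowered.get(n, "")) for n in names)

url = "http://www.example.com/index.html"
english = cache_key(url, {"Accept-Language": "en"}, "Accept-Language")
french = cache_key(url, {"Accept-Language": "fr"}, "Accept-Language")
```

Because the two keys differ, the English and French variants are stored separately and neither is served in place of the other.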
28 Virtual Hosting
- It is possible to host multiple domains from the same server using virtual hosts.
- The Web server is configured to discover the domain and service the request differently depending on the domain, not just the object and location.
- In order to support virtual hosts, HTTP/1.1 added the host request header. It specifies the name of the host. This supplements the name and directory of the object as included in the request line.
  - host: www.virtualdomain.com
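A server could dispatch on the host header roughly as below. The table `DOCUMENT_ROOTS`, its directory paths, and the second domain are all made up for this sketch.

```python
from typing import Optional

# Hypothetical mapping from virtual host name to its document directory.
DOCUMENT_ROOTS = {
    "www.virtualdomain.com": "/srv/virtualdomain",
    "www.example.org": "/srv/example",
}

def resolve(host_header: str, request_path: str) -> Optional[str]:
    """Map the host header plus the request-line path to a file location."""
    host = host_header.split(":")[0].strip().lower()  # drop any port number
    root = DOCUMENT_ROOTS.get(host)
    if root is None:
        return None  # unknown virtual host
    return root + request_path

location = resolve("www.virtualdomain.com", "/index.html")
```

The same path, `/index.html`, resolves to a different file for each domain, which is what lets one server host several sites.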
29 Learning the Proxy Chain
- If a request is made using TRACE, each hop through a proxy, or other intermediary, will be recorded in a via header. This records the name of the proxy and the protocol version.
- After reaching the server, the information from the via header is sent back in the body of the response.
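Each intermediary appends its own entry to the via header before forwarding the request. The sketch below uses a hypothetical helper, `add_via`, and made-up proxy names.

```python
def add_via(headers: dict, protocol_version: str, proxy_name: str) -> dict:
    """Return a copy of the headers with this hop appended to via."""
    updated = dict(headers)
    hop = f"{protocol_version} {proxy_name}"
    updated["Via"] = f"{updated['Via']}, {hop}" if "Via" in updated else hop
    return updated

request = {"Host": "www.example.com"}
after_first = add_via(request, "1.1", "proxy-a")
after_second = add_via(after_first, "1.0", "proxy-b")
chain = after_second["Via"]  # "1.1 proxy-a, 1.0 proxy-b"
```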
30 Cacheability of Content
- Things within a cgi-bin directory are generally not cached. The same is true of a URL containing a query string (?), or an object with the .cgi extension. These indicate CGI programs with dynamic content.
- Requests involving cookies are generally not cached: the content could be cookie dependent.
- Usually only GET and HEAD responses are cached.
- Requests with authentication information are generally not cached.
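These rules of thumb can be collected into one heuristic check. The sketch below is illustrative only: `looks_cacheable` is a hypothetical helper and the URLs are examples. Real caches also take the response's own headers into account.

```python
from urllib.parse import urlsplit

def looks_cacheable(method: str, url: str, request_headers: dict) -> bool:
    """Apply the heuristics above to decide whether a response may be cached."""
    parts = urlsplit(url)
    names = {k.lower() for k in request_headers}
    if method.upper() not in ("GET", "HEAD"):
        return False
    # Signs of dynamic content: cgi-bin, a .cgi extension, or a query string.
    if "/cgi-bin/" in parts.path or parts.path.endswith(".cgi") or parts.query:
        return False
    # Cookie-dependent or authenticated requests are user-specific.
    if "cookie" in names or "authorization" in names:
        return False
    return True

static_ok = looks_cacheable("GET", "http://www.example.com/images/logo.gif", {})
dynamic = looks_cacheable("GET", "http://www.example.com/cgi-bin/search", {})
```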