Title: Chapter 9 HTTP, Caching, Load Balancing
1 Chapter 9: HTTP, Caching, Load Balancing
- Professor Rick Han
- University of Colorado at Boulder
- rhan_at_cs.colorado.edu
2 Announcements
- HW 6 due next week, Thursday April 17
- Programming Assignment 4 soon
- No OH Today, Wed OH 1-3 pm
- Attending a conference on Tuesday April 22
- Next, Application Layer
3 Recap of Previous Lecture
- Domain Name System (DNS)
- Translate/resolve a name to an IP address
- www.cs.colorado.edu -> 128.9.17.42
- Hierarchical name space
- Hierarchical name servers
- Root name servers: about a dozen
- Then .edu, .com, .gov, .mil, .org, .net, ...
- Local name server
- Authoritative name server gives back final IP address
- Recursive vs. iterative queries
- Caching
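4 DNS Lookup Example
[Figure: iterative DNS lookup. The client asks its local DNS server for www.cs.colorado.edu. The local DNS server queries the root/edu DNS server, which answers NS colorado.edu, then the colorado.edu DNS server, which answers NS cs.colorado.edu, and finally the cs.colorado.edu authoritative DNS server, which answers www=IPaddr.]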
Courtesy Srini Seshan
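The referral chain in this lookup can be sketched as a toy iterative resolver. This is a minimal sketch, not real DNS: the server labels and the `ZONES` table are hypothetical stand-ins, no network traffic is sent, and the final address is simply the one quoted for www.cs.colorado.edu later in these slides.

```python
# Toy model of iterative resolution: each "server" either refers the
# local DNS server to a more specific name server (NS) or answers (A).
# All zone data below is made up for illustration.
ZONES = {
    "root/edu": {"colorado.edu": ("NS", "colorado.edu-dns")},
    "colorado.edu-dns": {"cs.colorado.edu": ("NS", "cs.colorado.edu-dns")},
    "cs.colorado.edu-dns": {"www.cs.colorado.edu": ("A", "128.138.242.195")},
}


def query(server, name):
    """Return the record for the longest suffix of `name` the server knows."""
    labels = name.split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in ZONES[server]:
            return ZONES[server][suffix]
    raise LookupError(f"{server} has no record for {name}")


def resolve_iteratively(name, start="root/edu"):
    """Follow NS referrals from `start` until an A record is returned."""
    server, path = start, []
    while True:
        path.append(server)
        rtype, value = query(server, name)
        if rtype == "A":
            return value, path
        server = value  # referral: ask the next name server


address, servers_asked = resolve_iteratively("www.cs.colorado.edu")
print(address, servers_asked)
```

The local DNS server does all the asking here, which is what makes the lookup iterative; in a recursive query each server would instead forward the question on the client's behalf.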
5 More on DNS
- In addition to name translation, DNS helps with
- Host aliasing
- DNS supports multiple host names for a single IP address, e.g. yahoo.com and www.yahoo.com
- Load distribution
- Instead of HTTP Redirect, use DNS!
- A busy site like cnn.com will have multiple replicated Web servers, each with a different IP address
- A set of IP addresses is associated with cnn.com
- DNS can return multiple records that match a single name
- Order of replicated server addresses is rotated
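The rotation of replicated server addresses can be sketched in a few lines. This is an illustrative model only: the class name `RoundRobinName` is hypothetical and the addresses come from the 192.0.2.0/24 documentation range, not from any real site.

```python
from collections import deque


class RoundRobinName:
    """Hands out the same address set, rotated by one on every query."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def answer(self):
        current = list(self._addresses)
        self._addresses.rotate(-1)  # next query starts with the next server
        return current


# Hypothetical replica addresses from the documentation range 192.0.2.0/24.
site = RoundRobinName(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
first = site.answer()
second = site.answer()
print(first[0], second[0])
```

Clients typically try the first address in the answer, so rotating the list spreads successive clients across the replicas without any of them knowing about the others.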
6 DNS Message Format
- Header (12 bytes)
- Identification, Flags
- No. of Questions, No. of Answer RRs
- No. of Authority RRs, No. of Additional RRs
- Questions (variable number of questions): name, type fields for a query
- Answers (variable number of resource records): RRs in response to query
- Authority (variable number of resource records): records for authoritative servers
- Additional Info (variable number of resource records): additional helpful info that may be used
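The 12-byte header is six 16-bit fields, which can be packed with Python's `struct` module. This is a minimal sketch: the helper names and the example identification value are arbitrary choices, and 0x0100 is the standard "recursion desired" flag bit.

```python
import struct

# Six unsigned 16-bit fields in network byte order = 12 bytes.
HEADER = struct.Struct("!6H")


def pack_header(ident, flags, questions, answers=0, authority=0, additional=0):
    """Pack the six header fields in the order they appear on the wire."""
    return HEADER.pack(ident, flags, questions, answers, authority, additional)


def unpack_header(data):
    """Read the six header fields back from the first 12 bytes."""
    return HEADER.unpack(data[:HEADER.size])


# Example values: an arbitrary ID and the recursion-desired flag (0x0100).
header = pack_header(ident=0x1234, flags=0x0100, questions=1)
print(len(header), unpack_header(header))
```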
7 More on DNS (2)
[Figure: Client, DNS, and Server Farm]
- DNS helps with
- Load distribution (cont.)
- DNS round robin to N servers
- Akamaizing: smarter than DNS round robin. Choose the server closest to you for better response time!
- Akamai serves a subset of cnn.com
- Each URL in the Akamai subset has a name for which the Akamai DNS server is authoritative
- www.cnn.com/foo.gif -> a128.g.akamai.net/foo.gif
8 More on DNS (3)
- RFC 1794, DNS Support for Load Balancing
- DNS also helps with
- Mail server aliasing
- Given hotmail.com, return the specific host name of its mail server (MX record)
- BIND is a popular name server for Unix
9 More on DNS (4)
- Dynamic DNS
- Mapping your well-known Web name to a dynamic IP address (from DHCP)
- Suppose you've reserved a hostname www.myweb.org to serve Web pages from your home PC
- Each time your PC connects via cable/DSL, your ISP assigns your PC a different dynamic IP address via DHCP
- Users won't know your dynamic IP address but may remember your Web address. How can they reach you?
- Solution: your PC includes a code snippet to update DNS each time your PC gets a new IP address via DHCP
- Your PC must have authorization at a DNS server to update its DNS record
- Dynamic DNS services are being offered on the Web, some for free, others you pay
10 HyperText Transfer Protocol (HTTP)
- Basis for Web
- Application-layer protocol built on top of TCP
- Request-Response type of protocol
- Request: e.g. GET URL HTTP_version
- Response from server
- Requests and responses are encoded in text
- Stateless: after request and response, no further state maintained
- Cookies maintain session state outside of HTTP
11 HTTP Request
- Request headers
- Authorization: authentication info
- Accept: acceptable document types/encodings
- From: user email
- If-Modified-Since: return page only if modified after date
- Referer: what caused this page to be requested
- User-Agent: client software
- Blank line
- Body
12 HTTP Request Example: GET
- GET / HTTP/1.1
- Accept: */*
- Accept-Language: en-us
- Accept-Encoding: gzip, deflate
- User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)
- Host: www.seshan.org
- Connection: Keep-Alive
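A request like this one is just text: a request line, header lines, a blank line, then an optional body. The sketch below serializes one by hand to show that layout; `build_request` is a hypothetical helper, not part of any library, and nothing is sent over the network.

```python
def build_request(method, path, headers, body=b""):
    """Serialize a request line, headers, a blank line, and an optional body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"  # blank line ends the headers
    return head.encode("ascii") + body


request = build_request(
    "GET",
    "/",
    {
        "Accept": "*/*",
        "Accept-Language": "en-us",
        "Accept-Encoding": "gzip, deflate",
        "Host": "www.seshan.org",
        "Connection": "Keep-Alive",
    },
)
print(request.decode("ascii"))
```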
13 HTTP Response
- Headers
- Location: for redirection
- Server: server software
- WWW-Authenticate: request for authentication
- Allow: list of methods supported (GET, HEAD, etc.)
- Content-Encoding: e.g. x-gzip
- Content-Length: bytes in content
- Content-Type: MIME type
- Expires: when contents become stale
- Last-Modified: time contents last modified by server
- Blank line
- Body
14 HTTP Response Example
- HTTP/1.1 200 OK
- Date: Tue, 27 Mar 2001 03:49:38 GMT
- Server: Apache/1.3.14 (Unix) (Red-Hat/Linux) mod_ssl/2.7.1 OpenSSL/0.9.5a DAV/1.0.2 PHP/4.0.1pl2 mod_perl/1.24
- Last-Modified: Mon, 29 Jan 2001 17:54:18 GMT
- ETag: "7a11f-10ed-3a75ae4a"
- Accept-Ranges: bytes
- Content-Length: 4333
- Keep-Alive: timeout=15, max=100
- Connection: Keep-Alive
- Content-Type: text/html (MIME type)
- ...
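Because the response is text with a blank line before the body, a client can split it with ordinary string operations. The following is a simplified sketch: `parse_response` is a hypothetical helper, the sample response is shortened, and real clients must also handle details such as repeated headers and chunked bodies.

```python
def parse_response(raw):
    """Split a raw HTTP response into (version, status, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line separates body
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return version, int(status), reason, headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Last-Modified: Mon, 29 Jan 2001 17:54:18 GMT\r\n"
    b"Content-Length: 5\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"hello"
)
version, status, reason, headers, body = parse_response(raw)
print(status, reason, headers["content-type"], body)
```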
15 HTTP 0.9/1.0
- One HTTP 1.0 request/response per TCP connection
- Simple to implement
- Disadvantages
- Multiple connection setups: three-way handshake each time
- Several extra round trips added to transfer
- Netscape browser opens up to 4 parallel HTTP 1.0 connections
- Multiple slow starts
16 HTTP 1.0 Interaction With TCP
[Figure: client/server timeline for HTTP 1.0]
- 0 RTT: client opens TCP connection (SYN, SYN, ACK)
- 1 RTT: client sends HTTP request for HTML (DAT, ACK); server reads from disk, sends the page (DAT) and closes (FIN)
- 2 RTT: client parses HTML, completes the close (ACK, FIN, ACK), and opens a new TCP connection (SYN, SYN, ACK)
- 3 RTT: client sends HTTP request for image (DAT, ACK); server reads from disk
- 4 RTT: image begins to arrive (DAT)
Courtesy Srini Seshan
17 More HTTP 1.0 TCP Interaction Problems
- Lots of extra connections
- Increases server state/processing
- Server also forced to keep TIME_WAIT connection state for dead TCP connections
- Tends to be an order of magnitude greater than the number of active connections
18 HTTP 1.1 Persistent Connection Solution
- Multiplex multiple requests onto one open TCP connection (and multiple responses in the reverse direction)
- Serialize transfers: client makes next request only after previous response
- Reduce slow start latency
- Reduce amount of TCP state at both endpoints
- Reduce overhead
- HTTP 1.1 adds complexity because multiple requests (and responses) have to be multiplexed and demultiplexed
19 HTTP 1.1 Persistent Connection Example
[Figure: client/server timeline on an already-open persistent TCP connection]
- 0 RTT: client sends HTTP request for HTML (DAT, ACK); server reads from disk and sends the page (DAT)
- 1 RTT: client parses HTML and sends HTTP request for image (ACK, DAT); server reads from disk (ACK, DAT)
- 2 RTT: image begins to arrive
Courtesy Srini Seshan
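The round-trip counts in the HTTP 1.0 and persistent-connection timelines can be reproduced with simple arithmetic. This is a deliberately crude model, assuming one RTT per handshake and one RTT per request, and ignoring transmission time, slow start, and server disk time; the function names are hypothetical.

```python
def rtts_http10(num_objects):
    """Each object needs a new TCP handshake (1 RTT) plus a request (1 RTT)."""
    return 2 * num_objects


def rtts_persistent(num_objects, connection_open=True):
    """One request per RTT, sent serially on a single connection."""
    handshake = 0 if connection_open else 1
    return handshake + num_objects


# HTML page plus one image: RTTs until the image begins to arrive.
print(rtts_http10(2), rtts_persistent(2))
```

With the connection already open the persistent case needs 2 RTTs instead of 4; counting the initial handshake it needs 3, and the saving grows with every additional object on the page.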
20 Web Caching Proxies
- Place a Web caching proxy in the network between Web client and Web server
- Reduces client response time
- HTTP GET only goes as far as intermediate cache, rather than all the way to server
- Reduces network bandwidth usage
- HTTP GET doesn't travel over wide area from caching proxy to server
- Reduces server load
- On a cache hit, HTTP GET never reaches server
21 Web Proxies
- Used for caching
- Improved response time, etc. from previous slide
- Provides a centralized coordination point to share cached information across all of a company's client hosts
- Also used for security
- Proxy for a company can be the only host that can access the Internet
- Administrators make sure that it is secure
- Used for protocol translation
- Translate HTTP 1.0 to/from HTTP 1.1. Enables old HTTP 1.0 clients to connect to HTTP 1.1 servers, and benefit from HTTP 1.1 performance boosts
22 Designing Caching Proxies
- How much can/should be cached?
- How large a cache is necessary?
- On disk vs. in memory: typically on disk
- What are the cache hit rates?
- If user behavior is uncorrelated, have to cache a lot of data to improve response time, resulting in small cache hit rate
- If user behavior is correlated, i.e. everyone visits only a few Web sites, then cache less data and still improve response time (high cache hit rate)
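The effect of correlated versus uncorrelated behavior on hit rate can be seen in a small simulation. Everything here is an assumption made for illustration: the LRU replacement policy, the cache size of 100 pages, the 10,000-page universe, and the 90% share of requests going to 50 popular pages.

```python
import random
from collections import OrderedDict


def lru_hit_rate(requests, capacity):
    """Replay `requests` through an LRU cache and return the hit fraction."""
    cache, hits = OrderedDict(), 0
    for url in requests:
        if url in cache:
            hits += 1
            cache.move_to_end(url)  # mark as most recently used
        else:
            cache[url] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(requests)


rng = random.Random(0)  # fixed seed so runs are repeatable
pages = [f"/page{i}" for i in range(10_000)]
popular = pages[:50]

uncorrelated = [rng.choice(pages) for _ in range(5_000)]
# Hypothetical mix: 90% of requests go to the 50 popular pages.
correlated = [
    rng.choice(popular) if rng.random() < 0.9 else rng.choice(pages)
    for _ in range(5_000)
]

print(lru_hit_rate(uncorrelated, 100), lru_hit_rate(correlated, 100))
```

The same small cache should show a far higher hit rate on the correlated trace, since the popular pages fit in the cache and are rarely evicted.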
23 Designing Caching Proxies (2)
- What can be cached?
- Cache first-time unknown documents/objects
- Non-cacheable documents
- CGI scripts
- Personalized documents (cookies, etc.)
- Encrypted data (SSL)
- Document should no longer be cached if updated/expired before reuse
24 Designing Caching Proxies (3)
- Performance
- How many TCP connections can the proxy handle?
- How to efficiently index into database/cache?
- Early caches used file system to find file
- Metadata now kept in memory on most caches
- Prefetching: combine with caching to reduce response time
- Proxy parses a Web page and prefetches its hyperlinked objects before the client asks for them
- Example: when a proxy fetches a Web page on behalf of a client, the proxy will parse and cache the Web page returned by the server, and then prefetch all links before client requests them
- Not widely used due to poor hit rates?
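The parsing step of prefetching can be sketched with the standard library's HTML parser. This only collects candidate URLs; the class and function names are hypothetical, www.example.org is a placeholder host, and the actual fetching and caching of each link is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the URLs of hyperlinks and embedded images in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = {"a": "href", "img": "src"}.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def links_to_prefetch(base_url, html):
    """Return absolute URLs the proxy could fetch ahead of the client."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


page = '<a href="/about.html">About</a> <img src="logo.gif">'
print(links_to_prefetch("http://www.example.org/index.html", page))
```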
25 Caching Policy at Proxy
- Relevant HTTP fields
- Request
- If-Modified-Since
- Response
- Last-Modified
- Expires
- Caching proxy doesn't cache pages with
- Pragma: no-cache header field
- WWW-Authenticate or Authorization headers
- Server and proxy clocks must be reliable
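The exclusions listed on this slide amount to a short header check. The sketch below assumes headers arrive as dicts keyed by lowercase name and, as a simplification, looks for Pragma on both the request and the response; `proxy_may_cache` is a hypothetical helper, not an API of any real proxy.

```python
def proxy_may_cache(request_headers, response_headers):
    """Return False for the exclusions listed above.

    Both arguments are dicts keyed by lowercase header name.
    """
    for headers in (request_headers, response_headers):
        if "no-cache" in headers.get("pragma", "").lower():
            return False
    if "authorization" in request_headers:
        return False
    if "www-authenticate" in response_headers:
        return False
    return True


print(proxy_may_cache({}, {"content-type": "text/html"}))
print(proxy_may_cache({"authorization": "Basic dXNlcjpwYXNz"}, {}))
```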
26 Caching Policy at Proxy (2)
- Browser has its own browser cache.
- Browser sends a conditional GET with If-Modified-Since header field when
- a user hits Reload, or
- a page expired in browser cache, or
- browser is set to always ask for a page
- A conditional GET will only succeed in returning a page if that page has been modified since the If-Modified-Since date.
- Otherwise, get back a status code 304 Not Modified
- Caching proxy receives a conditional GET: what is its policy?
27 Caching Policy at Proxy (3)
- If page is not in proxy's cache, or cached page has expired, or cached page was Last-Modified earlier than the request's If-Modified-Since date, then
- Forward conditional GET to server. If server finds its page was Last-Modified earlier than the If-Modified-Since date, then server's response to proxy is status 304 Not Modified
- Proxy returns status 304 Not Modified to client
- Else, server returned fresher page, so proxy caches it and returns it to client
- Else
- Return cached page to client (it's not expired and it's freshly modified)
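The policy on this slide translates almost directly into code. This is a sketch of the slide's rule only, under stated assumptions: the cache is a plain dict, `fetch_from_server` is a hypothetical callback standing in for the origin server, and the dates are made up.

```python
from datetime import datetime


def handle_conditional_get(cache, url, if_modified_since, now, fetch_from_server):
    """Return (status, page) for a conditional GET arriving at the proxy.

    cache maps url -> {"page", "last_modified", "expires"}.
    fetch_from_server(url, if_modified_since) is a hypothetical callback that
    returns (304, None) or (200, entry) with a fresh cache entry.
    """
    entry = cache.get(url)
    must_ask_server = (
        entry is None
        or entry["expires"] <= now
        or entry["last_modified"] < if_modified_since
    )
    if not must_ask_server:
        return 200, entry["page"]  # cached copy is unexpired and new enough
    status, fresh = fetch_from_server(url, if_modified_since)
    if status == 304:
        return 304, None  # relay Not Modified to the client
    cache[url] = fresh  # server sent a fresher page: cache and return it
    return 200, fresh["page"]


cache = {
    "/index.html": {
        "page": "<html>cached</html>",
        "last_modified": datetime(2001, 3, 20),
        "expires": datetime(2001, 4, 1),
    }
}


def origin_not_modified(url, if_modified_since):
    return 304, None  # stand-in server whose page has not changed


now = datetime(2001, 3, 27)
fresh_enough = handle_conditional_get(
    cache, "/index.html", datetime(2001, 3, 1), now, origin_not_modified
)
too_old = handle_conditional_get(
    cache, "/index.html", datetime(2001, 3, 25), now, origin_not_modified
)
print(fresh_enough, too_old)
```

In the first call the cached copy was modified after the client's date, so the proxy answers from its cache; in the second the client's copy is at least as new as the proxy's, so the question goes to the server.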
28 Caching Policy at Proxy (4)
- Summary
- Cached page returned only if not expired and new enough (recently modified), otherwise return fresher page from server or status 304 message
- Expires header may be missing, so proxy has to guess a probable expiration date
- If Last-Modified is recent, then guess the page is changing frequently, so choose a quick expiration date
- Chaining of caches is allowed, e.g. browser cache chained with proxy caches
- SQUID caching proxies are common freeware
- Based on Harvest caches developed in part at the University of Colorado, see http://www.squid-cache.org
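One way to guess an expiration date when Expires is missing is to scale it with the time since the page last changed, so recently modified pages expire quickly. The divisor of 10 below is an assumption chosen for illustration, not a value given in these slides, and `guess_expires` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def guess_expires(last_modified, now, divisor=10):
    """Guess an expiry time as a fraction of the time since the last change."""
    age = max(now - last_modified, timedelta(0))  # guard against clock skew
    return now + age / divisor


now = datetime(2001, 3, 27)
changed_recently = guess_expires(datetime(2001, 3, 17), now)
changed_long_ago = guess_expires(datetime(2000, 3, 27), now)
print(changed_recently, changed_long_ago)
```

A page changed 10 days ago is trusted for 1 more day, while one untouched for a year is trusted for about 5 weeks. The guard against negative ages reflects the earlier point that server and proxy clocks must be reliable.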
29 More on HTTP
- Configure the Web browser to access the Web via the HTTP proxy
- Internet Explorer: Tools -> Internet Options -> Connections -> LAN Settings
- Netscape: Edit -> Preferences -> Advanced -> Proxies -> Manual
- DNS lookup by HTTP Proxy
- Given URL http://www.cs.colorado.edu/index.html, then proxy must call DNS to translate www.cs.colorado.edu to 128.138.242.195
- Then, proxy establishes HTTP over TCP connection to 128.138.242.195 to retrieve the URL's page
30 More on HTTP (2)
- Most browsers now support HTTP 1.1
- Compatibility with HTTP 1.0 is expected but not mandated
- Eased via HTTP proxies
- Load balancing via HTTP Redirect
- In response to a GET request, a server can return an HTTP Redirect response
- Server selects another server that is less loaded
- Client is redirected to again send GET request to less loaded server
[Figure: (1) Client sends GET to HTTP Server 1; (2) HTTP Server 1 replies with Redirect; (3) Client sends GET to HTTP Server 2]
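The server side of redirect-based load balancing can be sketched as a single decision. The load threshold, the use of status 302, the load figures, and the host names are all illustrative assumptions; how servers learn each other's load is outside this sketch.

```python
def choose_response(my_name, path, loads, threshold=0.8):
    """Serve locally, or redirect to the least-loaded peer when overloaded.

    loads maps server host name -> load in [0, 1]. The threshold and the
    302 status code are illustrative choices.
    """
    if loads[my_name] < threshold:
        return 200, {}
    target = min(loads, key=loads.get)
    if target == my_name:
        return 200, {}  # nobody is less loaded; serve it here
    return 302, {"Location": f"http://{target}{path}"}


loads = {"server1.example.org": 0.95, "server2.example.org": 0.20}
print(choose_response("server1.example.org", "/index.html", loads))
```

The cost of this approach is visible in the figure: the client pays an extra round trip to the first server before it can send the real request to the second.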
31 More on HTTP (3)
- GET
- Retrieve document, no payload
- One-step roundtrip process
- Incorporate parameters via long URL
- The server returns a response file with a MIME header identifying the type of file.
- MIME was developed for email, but is reused by HTTP
- POST: sent from client to server
- Typically used by HTML to send data to a back-end CGI script
- Two-roundtrip process: contact form-processing server, then send data
- Give information to a server, has payload
- Expect a response
32 More on HTTP (4)
- POST vs. GET
- Use POST instead of GET if you want to send complex long text fields/parameters to server
- PUT: sent from client to server
- Store document at server under specified URL
- May be disabled at server to avoid modifying files
- Receive a response: Created, Modified, ...
- POST vs. PUT
- POST: URL specifies the CGI process that will handle the enclosed form
- PUT: URL specifies the enclosed document to be created/stored
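The difference between the three methods shows up in where the data goes. The sketch below builds, but does not send, one request of each kind with the standard library; the host www.example.org, the paths, and the form fields are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

fields = {"name": "Ada", "comment": "a longer text field"}

# GET: parameters ride in the URL, no payload.
get_req = Request("http://www.example.org/cgi-bin/form?" + urlencode(fields))

# POST: URL names the form handler, parameters travel in the body.
post_req = Request(
    "http://www.example.org/cgi-bin/form",
    data=urlencode(fields).encode("ascii"),
    method="POST",
)

# PUT: URL names the document to create or replace, body is the document.
put_req = Request(
    "http://www.example.org/notes.html",
    data=b"<html>notes</html>",
    method="PUT",
)

for req in (get_req, post_req, put_req):
    print(req.get_method(), req.full_url, req.data)
```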
33 Load Balancing Techniques
- HTTP Redirection
- DNS Load Balancing
- Router-based
- Zany idea 1: N servers each advertise the same IP address. Let IP shortest-hop routing determine the nearest server.
- Hopefully no loops.
34 Load Balancing Techniques (2)
- Router-based
- Better idea: place an IP router in front of N servers; the router balances the load
- Example: each server has a different IP address, and router substitutes IP address of the most lightly loaded server
- If a TCP connection is established to a specific server X, router must remember to route packets for this TCP connection to server X only
- Router can't just choose the most lightly loaded server L, because L might not be X, so server L would not be expecting server X's TCP packets
- NAT-based (see NAT section)
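The requirement that a router keep each TCP connection on one server can be sketched as a connection table. This is a simplified model under assumptions: connections are identified by client IP and port only, load is measured as the count of active connections, and the class name and addresses are hypothetical.

```python
class LoadBalancingRouter:
    """Pins each TCP connection to one server; new ones go to the lightest."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}  # connections per server
        self.table = {}  # (client_ip, client_port) -> server

    def route(self, client_ip, client_port, syn=False):
        """Return the server for this packet, choosing one only on a SYN."""
        key = (client_ip, client_port)
        if key not in self.table:
            if not syn:
                raise LookupError("packet for unknown connection")
            server = min(self.active, key=self.active.get)
            self.table[key] = server
            self.active[server] += 1
        return self.table[key]

    def close(self, client_ip, client_port):
        """Forget a finished connection and release its load."""
        server = self.table.pop((client_ip, client_port))
        self.active[server] -= 1


router = LoadBalancingRouter(["10.0.0.1", "10.0.0.2"])
a = router.route("198.51.100.7", 40000, syn=True)
b = router.route("198.51.100.8", 40001, syn=True)
a_again = router.route("198.51.100.7", 40000)  # later packet, same connection
print(a, b, a_again)
```

The load-based choice happens once per connection, at the SYN; every later packet is looked up in the table, so it reaches the server that holds that connection's TCP state.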