Chapter 9 HTTP, Caching, Load Balancing - PowerPoint PPT Presentation

Provided by: profri
Transcript and Presenter's Notes

1
Chapter 9: HTTP, Caching, Load Balancing
  • Professor Rick Han
  • University of Colorado at Boulder
  • rhan_at_cs.colorado.edu

2
Announcements
  • HW 6 due next week, Thursday April 17
  • Programming Assignment 4 soon
  • No OH Today, Wed OH 1-3 pm
  • Attending a conference on Tuesday April 22
  • Next, Application Layer

3
Recap of Previous Lecture
  • Domain Name Service
  • Translate/resolve a name to an IP address
  • www.cs.colorado.edu → 128.9.17.42
  • Hierarchical name space
  • Hierarchical name servers
  • Root name servers: about a dozen
  • Then, .edu, .com, .gov, .mil, .org, .net, etc.
  • Local name server
  • Authoritative name server gives back final IP
    address
  • Recursive vs. iterative queries
  • Caching

4
DNS Lookup Example
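[Figure: iterative DNS lookup for www.cs.colorado.edu]
  • Client asks its local DNS server for www.cs.colorado.edu
  • Local DNS server asks the root/edu DNS server, which refers it to
    colorado.edu (NS colorado.edu)
  • Local DNS server asks the colorado.edu DNS server, which refers it to
    cs.colorado.edu (NS cs.colorado.edu)
  • Local DNS server asks the cs.colorado.edu authoritative DNS server,
    which returns the IP address of www
Courtesy Srini Seshan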
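
The referral chain in this lookup can be sketched as a toy iterative resolver. Everything here is an in-memory model made up for illustration: the SERVERS table, the server labels and the resolve_iteratively helper are hypothetical, and the address is the example address from the recap slide, not a real DNS answer.

```python
# Toy model of iterative resolution: each "server" either refers the
# local DNS server to a more specific name server (NS) or answers (A).
# The table is illustrative only; no real DNS traffic is involved.
SERVERS = {
    "root/edu": {"colorado.edu": ("NS", "colorado.edu")},
    "colorado.edu": {"cs.colorado.edu": ("NS", "cs.colorado.edu")},
    "cs.colorado.edu": {"www.cs.colorado.edu": ("A", "128.9.17.42")},
}


def resolve_iteratively(name, start="root/edu"):
    """Follow NS referrals from `start` until some server returns an address."""
    server, path = start, []
    while True:
        path.append(server)
        # Find the entry on this server that covers the queried name.
        match = next(
            (rec for zone, rec in SERVERS[server].items()
             if name == zone or name.endswith("." + zone)),
            None,
        )
        if match is None:
            raise LookupError(f"{server} cannot resolve {name}")
        kind, value = match
        if kind == "A":
            return value, path
        server = value  # referral: ask the next name server


address, path = resolve_iteratively("www.cs.colorado.edu")
print(address, path)
```

The path list records which servers the local DNS server contacted, in order, which mirrors the three referral steps of the lookup.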
5
More on DNS
  • In addition to name translation, DNS helps with
  • Host aliasing
  • DNS supports multiple host names for a single IP
    address, e.g. yahoo.com and www.yahoo.com
  • Load distribution
  • Instead of HTTP Redirect, use DNS!
  • A busy site like cnn.com will have multiple
    replicated Web servers, each with a different IP
    address
  • A set of IP addresses associated with cnn.com
  • DNS can return multiple records that match a
    single name
  • Order of replicated server addresses is rotated

6
DNS Message Format
  • Header (12 bytes): Identification, Flags, No. of Questions,
    No. of Answer RRs, No. of Authority RRs, No. of Additional RRs
  • Questions (variable number of questions): name, type fields for a
    query
  • Answers (variable number of resource records): RRs in response to
    query
  • Authority (variable number of resource records): records for
    authoritative servers
  • Additional Info (variable number of resource records): additional
    helpful info that may be used
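
A minimal sketch of reading the 12-byte header, assuming the six fields are each 16 bits wide and sent in network byte order. The parse_dns_header helper and the sample values are made up for illustration.

```python
import struct

# Six unsigned 16-bit fields in network byte order = 12 bytes.
HEADER = struct.Struct("!6H")


def parse_dns_header(message: bytes) -> dict:
    """Unpack the fixed-size header at the start of a DNS message."""
    ident, flags, qd, an, ns, ar = HEADER.unpack_from(message)
    return {
        "identification": ident,
        "flags": flags,
        "questions": qd,
        "answer_rrs": an,
        "authority_rrs": ns,
        "additional_rrs": ar,
    }


# Hand-built header for a query with one question (values are made up).
sample = HEADER.pack(0x1234, 0x0100, 1, 0, 0, 0)
header = parse_dns_header(sample)
print(HEADER.size, header)
```

The variable-length question and resource record sections follow the header and need their own parsing, which this sketch leaves out.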
7
More on DNS (2)
[Figure: Client, DNS, Server Farm]
  • DNS helps with
  • Load distribution (cont.)
  • DNS round robin to N servers
  • Akamaizing: smarter than DNS round robin. Choose
    the server closest to you for better
    response time!
  • Akamai serves a subset of cnn.com
  • Each URL in Akamai subset has a name for which
    the Akamai DNS server is authoritative
  • www.cnn.com/foo.gif → a128.g.akamai.net/foo.gif

8
More on DNS (3)
  • RFC 1794, DNS Support for Load Balancing
  • DNS also helps with
  • Mail server aliasing
  • Given hotmail.com, return the specific host name of its mail server
  • BIND is a popular name server for Unix

9
More on DNS (4)
  • Dynamic DNS
  • Mapping your well-known Web name to a dynamic IP
    address (from DHCP)
  • Suppose you've reserved a hostname www.myweb.org
    to serve Web pages from your home PC
  • Each time your PC connects via cable/DSL, your
    ISP assigns your PC a different dynamic IP
    address via DHCP
  • Users won't know your dynamic IP address but may
    remember your Web address. How can they reach
    you?
  • Solution: your PC includes a code snippet to update
    DNS each time your PC gets a new IP address via
    DHCP
  • Your PC must have authorization at a DNS server
    to update its DNS record
  • Dynamic DNS services are being offered on the
    Web, some for free, others paid

10
HyperText Transfer Protocol (HTTP)
  • Basis for Web
  • Application-layer protocol built on top of TCP
  • Request-Response type of protocol
  • Request: e.g. GET URL HTTP_version
  • Response from server
  • Requests and responses are encoded in text
  • Stateless: after request and response, no further
    state is maintained
  • Cookies maintain session state outside of HTTP

11
HTTP Request
  • Request headers
  • Authorization: authentication info
  • Accept: acceptable document types/encodings
  • From: user email
  • If-Modified-Since: return page only if modified
    after date
  • Referer: what caused this page to be requested
  • User-Agent: client software
  • Blank line
  • Body

12
HTTP Request Example GET
  • GET / HTTP/1.1
  • Accept: */*
  • Accept-Language: en-us
  • Accept-Encoding: gzip, deflate
  • User-Agent: Mozilla/4.0 (compatible; MSIE 5.5;
    Windows NT 5.0)
  • Host: www.seshan.org
  • Connection: Keep-Alive
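
A request like this one can be assembled as plain text. The sketch below is only an illustration of the layout (request line, header lines, blank line, no body). The build_get_request helper is a made-up name, and the host is the one from the slide.

```python
def build_get_request(host, path="/", extra_headers=None):
    """Return the bytes of an HTTP/1.1 GET request with no body."""
    headers = {"Host": host, "Connection": "Keep-Alive"}
    headers.update(extra_headers or {})
    lines = [f"GET {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Each line ends with CRLF; an empty line marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_get_request("www.seshan.org", extra_headers={"Accept": "*/*"})
print(request.decode("ascii"))
```

Sending these bytes over a TCP connection to port 80 of the host is all a minimal client needs to do. The sketch stops short of opening the socket.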

13
HTTP Response
  • Headers
  • Location: for redirection
  • Server: server software
  • WWW-Authenticate: request for authentication
  • Allow: list of methods supported (GET, HEAD,
    etc.)
  • Content-Encoding: e.g. x-gzip
  • Content-Length: bytes in content
  • Content-Type: MIME type
  • Expires: when contents become stale
  • Last-Modified: time contents were last modified by server
  • Blank line
  • Body

14
HTTP Response Example
  • HTTP/1.1 200 OK
  • Date: Tue, 27 Mar 2001 03:49:38 GMT
  • Server: Apache/1.3.14 (Unix) (Red-Hat/Linux)
    mod_ssl/2.7.1 OpenSSL/0.9.5a DAV/1.0.2
    PHP/4.0.1pl2 mod_perl/1.24
  • Last-Modified: Mon, 29 Jan 2001 17:54:18 GMT
  • ETag: "7a11f-10ed-3a75ae4a"
  • Accept-Ranges: bytes
  • Content-Length: 4333
  • Keep-Alive: timeout=15, max=100
  • Connection: Keep-Alive
  • Content-Type: text/html (MIME type)
  • ...
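
Because responses are text, the status line and headers can be picked apart with ordinary string handling. This is a simplified sketch: parse_response_head is a hypothetical helper, the sample response is shortened, and it ignores details such as repeated or folded headers.

```python
def parse_response_head(raw: bytes):
    """Split an HTTP response into (version, status, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")   # blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return version, int(status), reason, headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Last-Modified: Mon, 29 Jan 2001 17:54:18 GMT\r\n"
    b"Content-Length: 5\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"hello"
)
version, status, reason, headers, body = parse_response_head(raw)
print(status, reason, headers["content-type"], body)
```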

15
HTTP 0.9/1.0
  • One HTTP 1.0 request/response per TCP connection
  • Simple to implement
  • Disadvantages
  • Multiple connection setups → three-way handshake
    each time
  • Several extra round trips added to transfer
  • Netscape browser opens up to 4 parallel HTTP 1.0
    connections
  • Multiple slow starts

16
HTTP 1.0 Interaction With TCP
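[Figure: client/server timeline for HTTP 1.0 over TCP]
  • 0 RTT: Client opens TCP connection (SYN); server answers with SYN
  • 1 RTT: Client sends ACK and the HTTP request for HTML (DAT); server
    ACKs, reads from disk, then sends DAT and FIN
  • 2 RTT: Client ACKs and parses HTML; the old connection is closed
    (FIN, ACK) and client opens a new TCP connection (SYN, SYN)
  • 3 RTT: Client sends ACK and the HTTP request for image (DAT); server
    ACKs and reads from disk
  • 4 RTT: Image begins to arrive (DAT)
Courtesy Srini Seshan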
17
More HTTP 1.0 TCP Interaction Problems
  • Lots of extra connections
  • Increases server state/processing
  • Server also forced to keep TIME_WAIT connection
    state for dead TCP connections
  • Tends to be an order of magnitude greater than
    the number of active connections

18
HTTP 1.1 Persistent Connection Solution
  • Multiplex multiple requests onto one open TCP
    connection (and multiple responses in the reverse
    direction)
  • Serialize transfers → client makes next request
    only after previous response
  • Reduce slow start latency
  • Reduce amount of TCP state at both endpoints
  • Reduce overhead
  • HTTP 1.1 adds complexity because multiple
    requests (and responses) have to be multiplexed
    and demultiplexed

19
HTTP 1.1 Persistent Connection Example
[Figure: client/server timeline for an HTTP 1.1 persistent connection]
  • 0 RTT: Client sends HTTP request for HTML (DAT); server ACKs, reads
    from disk, then sends DAT
  • 1 RTT: Client ACKs, parses HTML, and sends HTTP request for image
    (DAT); server ACKs, reads from disk, then sends DAT
  • 2 RTT: Image begins to arrive
Courtesy Srini Seshan
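
The round-trip counts in the HTTP 1.0 and persistent-connection timelines can be reproduced with a back-of-the-envelope model. This is a simplification under stated assumptions: objects are fetched one after another, each handshake and each request/response costs one RTT, and transmission time, disk time and slow start are ignored. The function names are made up for this sketch.

```python
def http10_rtts(num_objects):
    """HTTP 1.0: one TCP handshake plus one request/response per object."""
    return num_objects * (1 + 1)


def persistent_rtts(num_objects, connection_open=True):
    """HTTP 1.1 persistent, serialized: one request/response per object,
    plus one handshake if the connection is not open yet."""
    handshake = 0 if connection_open else 1
    return handshake + num_objects


# One HTML page plus one image, as in the two timelines.
print(http10_rtts(2), persistent_rtts(2), persistent_rtts(2, connection_open=False))
```

For the page-plus-image case this gives 4 RTTs for HTTP 1.0 and 2 RTTs for an already open persistent connection, matching the points at which the image begins to arrive in the two figures.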
20
Web Caching Proxies
  • Place a Web caching proxy in the network between
    Web client and Web server
  • Reduces client response time
  • HTTP GET only goes as far as intermediate cache,
    rather than all the way to server
  • Reduces network bandwidth usage
  • HTTP GET doesn't travel over wide area from
    caching proxy to server
  • Reduces server load
  • HTTP GET never reaches server

21
Web Proxies
  • Used for Caching
  • Improved response time, etc. from previous slide
  • Provides a centralized coordination point to
    share cached information across all of a
    company's client hosts
  • Also used for security
  • Proxy for a company can be the only host that can
    access Internet
  • Administrators make sure that it is secure
  • Used for protocol translation
  • Translate HTTP 1.0 to/from HTTP 1.1. Enables old
    HTTP 1.0 clients to connect to HTTP 1.1 servers,
    and benefit from HTTP 1.1 performance boosts

22
Designing Caching Proxies
  • How much can/should be cached?
  • How large a cache is necessary?
  • On disk vs. in memory → typically on disk
  • What are the cache hit rates?
  • If user behavior is uncorrelated, have to cache a
    lot of data to improve response time, resulting
    in small cache hit rate
  • If user behavior is correlated, i.e. everyone
    visits only a few Web sites, then cache less data
    and still improve response time (high cache hit
    rate)
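
The effect of correlated versus uncorrelated behavior on hit rate can be illustrated with a small simulation. The assumptions are loud ones: an LRU replacement policy (the slides do not name one), a cache of 10 URLs, and synthetic request streams drawn from 5 sites versus 10,000 sites. All names and numbers are made up for the sketch.

```python
import random
from collections import OrderedDict


def hit_rate(requests, capacity):
    """Fraction of requests served from an LRU cache holding `capacity` URLs."""
    cache, hits = OrderedDict(), 0
    for url in requests:
        if url in cache:
            hits += 1
            cache.move_to_end(url)         # mark as most recently used
        else:
            cache[url] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(requests)


rng = random.Random(0)  # fixed seed so the run is repeatable
correlated = [rng.randrange(5) for _ in range(1000)]         # everyone visits 5 sites
uncorrelated = [rng.randrange(10_000) for _ in range(1000)]  # spread over 10,000 sites
print(hit_rate(correlated, 10), hit_rate(uncorrelated, 10))
```

With the same small cache, the correlated stream is served almost entirely from cache while the uncorrelated stream almost never hits, which is the contrast the slide draws.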

23
Designing Caching Proxies (2)
  • What can be cached?
  • Cache first-time unknown documents/objects
  • Non-cacheable documents
  • CGI-scripts
  • Personalized documents (cookies, etc)
  • Encrypted data (SSL)
  • Document should no longer be cached if
    updated/expired before reuse

24
Designing Caching Proxies (3)
  • Performance
  • How many TCP connections can the proxy handle?
  • How to efficiently index into database/cache?
  • Early caches used file system to find file
  • Metadata now kept in memory on most caches
  • Prefetching: combine with caching to reduce
    response time
  • Proxy parses a Web page and prefetches its
    hyperlinked objects before the client asks for
    them
  • Example: when a proxy fetches a Web page on
    behalf of a client, the proxy will parse and
    cache the Web page returned by the server, and
    then prefetch all links before the client requests
    them
  • Not widely used due to poor hit rates?

25
Caching Policy at Proxy
  • Relevant HTTP fields
  • Request
  • If-Modified-Since
  • Response
  • Last-Modified
  • Expires
  • Caching proxy doesn't cache pages with
  • Pragma: no-cache header field
  • WWW-Authenticate or Authorization headers
  • Server and proxy clocks must be reliable

26
Caching Policy at Proxy (2)
  • Browser has its own browser cache.
  • Browser sends a conditional GET with
    If-Modified-Since header field when
  • a user hits Reload, or
  • a page has expired in the browser cache, or
  • the browser is set to always ask for a page.
  • A conditional GET will only succeed in returning
    a page if that page has been modified since the
    If-Modified-Since date.
  • Otherwise, get back a status code 304 Not
    Modified
  • Caching proxy receives a conditional GET: what
    is its policy?

27
Caching Policy at Proxy (3)
  • If page not in proxy's cache, or cached page has
    expired, or if cached page was Last-Modified
    earlier than request's If-Modified-Since date,
    then
  • Forward conditional GET to server. If server
    finds its page was Last-Modified earlier than
    If-Modified-Since date, then server's response to
    proxy is status 304 Not Modified
  • Proxy returns status 304 Not Modified to client
  • Else, server returned fresher page, so proxy
    caches it and returns it to client
  • Else
  • return cached page to client (it's not expired
    and it's freshly modified)
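
The policy on this slide can be written out as a small decision function. This is a sketch under assumptions, not a real proxy: the Page record, the dictionary standing in for the origin server, and the handle_conditional_get name are all hypothetical, and a page whose Last-Modified equals the If-Modified-Since date is treated as not modified.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Page:
    body: str
    last_modified: datetime
    expires: datetime


def handle_conditional_get(cache, origin, url, if_modified_since, now):
    """Return (status, body) for a conditional GET arriving at the proxy."""
    cached = cache.get(url)
    must_ask_server = (
        cached is None
        or cached.expires <= now
        or cached.last_modified < if_modified_since
    )
    if not must_ask_server:
        return 200, cached.body        # not expired and new enough
    page = origin[url]                 # stands in for forwarding the GET
    if page.last_modified <= if_modified_since:
        return 304, None               # server answers 304 Not Modified
    cache[url] = page                  # fresher page: cache it, return it
    return 200, page.body


def day(d):
    return datetime(2001, 3, d)


origin = {"/index.html": Page("v2", last_modified=day(20), expires=day(30))}
cache = {"/index.html": Page("v1", last_modified=day(10), expires=day(30))}

r1 = handle_conditional_get(cache, origin, "/index.html", day(5), now=day(21))
r2 = handle_conditional_get(cache, origin, "/index.html", day(15), now=day(21))
r3 = handle_conditional_get(cache, origin, "/index.html", day(25), now=day(21))
print(r1, r2, r3)
```

The three calls exercise the three outcomes: the cached page is returned, a fresher page is fetched and cached, and the server's 304 is passed back to the client.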

28
Caching Policy at Proxy (4)
  • Summary
  • cached page returned only if not expired and new
    enough (recently modified), otherwise return
    fresher page from server or status 304 message
  • Expires header may be missing, so proxy has
    to guess a probable expiration date
  • If Last-Modified is recent, then guess the page
    is changing frequently, so choose a quick
    expiration date
  • Chaining of caches is allowed, e.g. Browser cache
    chained with Proxy caches
  • SQUID caching proxies are common freeware
  • based on Harvest caches developed in part at the
    University of Colorado, see http://www.squid-cache.org
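
The guess for a missing Expires header can be sketched by scaling the time since Last-Modified. The 10% factor and the guess_expiry name are assumptions of this sketch. The slide only says that a recent Last-Modified should lead to a quick expiration.

```python
from datetime import datetime, timedelta


def guess_expiry(now, last_modified, fraction=0.1):
    """Guess an expiration time when the response has no Expires header.

    The page's age since Last-Modified is scaled by `fraction`, so a
    recently changed page gets a short lifetime and an old page a long one.
    The 10% default is an assumption of this sketch.
    """
    age = max(now - last_modified, timedelta(0))
    return now + age * fraction


now = datetime(2001, 3, 27, 12, 0, 0)
recent = guess_expiry(now, last_modified=now - timedelta(hours=1))
old = guess_expiry(now, last_modified=now - timedelta(days=100))
print(recent - now, old - now)
```

A page modified an hour ago is kept for only a few minutes, while one untouched for 100 days is kept for days.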

29
More on HTTP
  • Configure the Web browser to access the Web via
    the HTTP proxy
  • Internet Explorer: Tools → Internet Options →
    Connections → LAN Settings
  • Netscape: Edit → Preferences → Advanced → Proxies
    → Manual
  • DNS lookup by HTTP Proxy
  • Given URL http://www.cs.colorado.edu/index.html,
    then proxy must call DNS to translate
    www.cs.colorado.edu to 128.138.242.195
  • Then, proxy establishes HTTP over TCP connection
    to 128.138.242.195 to retrieve URL's page

30
More on HTTP (2)
  • Most browsers now support HTTP 1.1
  • Compatibility with HTTP 1.0 is expected but not
    mandated
  • Eased via HTTP proxies
  • Load balancing via HTTP Redirect
  • In response to a GET request, a server can return
    an HTTP Redirect Response
  • Server selects another server that is less loaded
  • Client is redirected to again send GET request to
    less loaded server

[Figure: (1) Client sends GET to HTTP Server 1; (2) HTTP Server 1
returns a Redirect; (3) Client sends GET to HTTP Server 2]
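
A server's side of redirect-based load balancing can be sketched as follows. The server names, the load figures and the redirect_if_busy helper are hypothetical, and the 302 status code is one possible choice of redirect response, used here as an assumption.

```python
def redirect_if_busy(path, my_name, loads):
    """Return a raw HTTP response: serve locally or redirect to a less loaded server.

    `loads` maps server host names to a load figure (e.g. active connections).
    """
    target = min(loads, key=loads.get)
    if target == my_name:
        body = b"served locally"
        return (b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % len(body)) + body
    location = f"http://{target}{path}"
    return (
        "HTTP/1.1 302 Found\r\n"
        f"Location: {location}\r\n"
        "Content-Length: 0\r\n\r\n"
    ).encode("ascii")


# Hypothetical server names and load figures.
loads = {"server1.example.com": 90, "server2.example.com": 12}
response = redirect_if_busy("/index.html", "server1.example.com", loads)
print(response.decode("ascii"))
```

The client then repeats its GET against the URL in the Location header, which costs an extra round trip compared with DNS-based load distribution.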
31
More on HTTP (3)
  • GET
  • Retrieve document, No payload
  • One-step roundtrip process
  • Incorporate parameters via long URL
  • The server returns a response file with a MIME
    header identifying the type of file.
  • MIME was developed for email, but is reused by
    HTTP
  • POST: sent from client to server
  • typically used by HTML forms to send data to a
    back-end CGI script
  • Two-roundtrip process: contact form-processing
    server, then send data
  • Give information to a server, has payload
  • Expect a response

32
More on HTTP (4)
  • POST vs. GET
  • Use POST instead of GET if you want to send
    complex long text fields/parameters to server
  • PUT: sent from client to server
  • Store document at server under specified URL
  • May be disabled at server to avoid modifying
    files
  • Receive a response: Created, Modified, etc.
  • POST vs. PUT
  • POST: URL specifies the CGI process that will
    handle the enclosed form
  • PUT: URL specifies the enclosed document to be
    created/stored

33
Load Balancing Techniques
  • HTTP Redirection
  • DNS Load Balancing
  • Router-based
  • Zany idea 1: N servers each advertise the same
    IP address. Let IP shortest-hop routing
    determine the nearest server.
  • Hopefully no loops.

34
Load Balancing Techniques (2)
  • Router-based
  • Better idea: place an IP router in front of N
    servers; the router balances the load
  • Example: each server has a different IP address,
    and router substitutes IP address of lightest
    loaded server
  • If a TCP connection is established to a specific
    server X, router must remember to route packets
    for this TCP connection to server X only
  • router can't just choose most lightly loaded
    server L, because L might not be X, so server L
    would not be expecting server X's TCP packets
  • NAT-Based (see NAT section)
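
The router's need to remember which server owns each TCP connection can be sketched with a connection table. This is a toy model under assumptions: connections are identified only by client IP and port, load is counted as active connections, and the class name and addresses are made up.

```python
class ConnectionTrackingBalancer:
    """Pick the most lightly loaded server for a new TCP connection and
    keep sending that connection's packets to the same server."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}  # connections per server
        self.table = {}                                   # (client ip, port) -> server

    def route(self, client_ip, client_port):
        key = (client_ip, client_port)
        if key not in self.table:                         # first packet of a connection
            server = min(self.active, key=self.active.get)
            self.table[key] = server
            self.active[server] += 1
        return self.table[key]

    def close(self, client_ip, client_port):
        server = self.table.pop((client_ip, client_port), None)
        if server is not None:
            self.active[server] -= 1


lb = ConnectionTrackingBalancer(["10.0.0.1", "10.0.0.2"])
a = lb.route("198.51.100.7", 40001)      # new connection
b = lb.route("198.51.100.8", 40002)      # new connection, other server is lighter
again = lb.route("198.51.100.7", 40001)  # later packet of the first connection
print(a, b, again)
```

Later packets of an established connection bypass the load comparison entirely, so they always reach the server that holds the connection's TCP state.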