Title: Internet and Intranet Protocols and Applications
1Internet and Intranet Protocols and Applications
- Lecture 8a
- WWW - Proxy Servers
- March 20, 2002
- Joseph Conron
- Computer Science Department
- New York University
- jconron_at_cs.nyu.edu
2Terminology
- Origin Server
- the Web server that hosts the resource
- Destination Server
- same as Origin Server
- Proxy Server
- Intermediate server that accepts requests from clients and forwards them to origin servers or to other proxy servers, or services the request from its cache.
- Acts as server to the requesting client, and as client to the origin server
3More Terminology
- Firewall
- general term for hardware, software, or a combination used to protect an internal network from intruders.
- Uses packet filtering to enforce generic security policies
- Uses application level proxy servers to enforce protocol-specific policies
- Packet filter
- control based on something in packet headers (e.g., IP addresses or port numbers)
- Application level proxy
- control based on knowledge of application level protocol (e.g., SMTP headers or HTTP methods)
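The difference between the two kinds of control can be sketched in a few lines. This is only an illustration: the `Packet` class, the function names, and the rule sets are hypothetical and are not taken from any real firewall product.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical view of the header fields a packet filter can see."""
    src_ip: str
    dst_port: int


def packet_filter_allows(pkt, blocked_ips, allowed_ports):
    # Packet filter: the decision uses header fields only.
    return pkt.src_ip not in blocked_ips and pkt.dst_port in allowed_ports


def http_proxy_allows(request_line, allowed_methods):
    # Application level proxy: the decision uses knowledge of HTTP itself,
    # here the method named at the start of the request line.
    method = request_line.split(" ", 1)[0].upper()
    return method in allowed_methods
```

A packet filter would pass any traffic to port 80 from an acceptable address, while the application level check can still refuse, say, a DELETE request arriving on that same port.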
4Application Level Proxy
- Implemented completely in software
- Usually handle a specific protocol (HTTP)
- Can provide a rich set of features
- Improve performance (latency reduction, bandwidth conservation)
- Advanced access control (authentication, authorization)
- Advanced filtering (e.g., detect espionage!)
- Logging and auditing
5Web Proxy Servers
- We will focus our attention on proxy servers for a specific application: WWW.
- Web proxy servers use HTTP.
- In particular, we focus on the use of proxy servers to improve performance.
- We will use the terms proxy server and Web proxy server interchangeably.
- We will use HTTP 1.1 only.
6Proxy Servers - General Properties
- Transparency
- users get same response whether connection was direct or to proxy
- non-transparent proxy modifies content in some way
- Use is client controlled
- client programs (e.g., browsers) can be configured to use (or not use) proxy servers.
- Origin Server is unaware of proxy server
- origin server does not have to process a request from a proxy differently than one from a browser.
7Virtual Servers (understanding partial vs. full URI)
- As WWW became popular, many companies wanted several domain names, but did not want separate server hardware.
- OK - just define CNAME entries in DNS and point to same IP address.
- Problem: since all DNS names resolve to the same IP address, how does the server know which domain name (server) was selected in the request?!
8Partial vs. Absolute URI
- Absolute URI includes host name
- GET http://www.starwind.com/somefilename HTTP/1.1
- Partial URI contains resource name only
- GET /somefilename HTTP/1.1
- Read RFC 2616, section 3.2 and RFC 2396!
- So, if all URIs are absolute, the origin server can parse the URI and detect the virtual host name.
- In HTTP 1.1, we can use a relative URI and the Host header.
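A minimal sketch of how a server might pick the virtual host, assuming the request head arrives as one CRLF-separated string. The function name `virtual_host` is hypothetical, and IPv6 literals in the Host header are not handled.

```python
from urllib.parse import urlsplit


def virtual_host(request_text):
    """Return the host name a request is addressed to, or None if unknown."""
    lines = request_text.split("\r\n")
    method, target, version = lines[0].split(" ")
    # An absolute URI carries the host name itself.
    if target.lower().startswith("http://"):
        return urlsplit(target).hostname
    # A partial URI names only the resource, so look for the Host header.
    for line in lines[1:]:
        if not line:  # blank line ends the headers
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            # Drop an optional ":port" suffix.
            return value.strip().split(":")[0].lower()
    return None
```

With a partial URI and no Host header the function returns None, which is exactly the ambiguity described on the Virtual Servers slide.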
9Proxy Server Basic Operation
- Accept connection request from client
- establishes new Socket client_sock
- Read HTTP request
- Parse HTTP request
- reject invalid requests with appropriate response code
- Connect to requested server
- establishes new socket serv_sock
- Send original HTTP request to server
10Proxy Server Basic Operation (continued)
- Read response from Server
- time-out server connection!
- Send response to client
- If Connection: close header received, close client connection (client_sock)
- What about server connection (serv_sock)?
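The steps on these two slides can be sketched with the standard `socket` module. This is a simplified sketch under stated assumptions, not a complete proxy: it relays one request per client connection, forwards only the bytes read up to the end of the headers (so request bodies are not handled), and always closes both `client_sock` and `serv_sock` afterwards, which is only one possible answer to the question about `serv_sock`. The names `parse_target`, `handle_client`, and `serve` are hypothetical.

```python
import socket
from urllib.parse import urlsplit

BAD_REQUEST = b"HTTP/1.1 400 Bad Request\r\nConnection: close\r\n\r\n"


def parse_target(head):
    """Return (host, port) for a request head, or raise ValueError."""
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        raise ValueError("malformed request line")
    url = urlsplit(parts[1])
    if url.scheme == "http" and url.hostname:  # absolute URI
        return url.hostname, url.port or 80
    for line in lines[1:]:                     # partial URI: use Host header
        name, _, value = line.partition(":")
        if name.strip().lower() == "host" and value.strip():
            host, _, port = value.strip().partition(":")
            return host, int(port) if port else 80
    raise ValueError("no host in request")


def handle_client(client_sock, timeout=5.0):
    """Relay one request/response pair, then close both sockets."""
    head = b""
    while b"\r\n\r\n" not in head:             # read the HTTP request head
        chunk = client_sock.recv(4096)
        if not chunk:
            break
        head += chunk
    try:
        target = parse_target(head)            # parse the HTTP request
    except ValueError:
        client_sock.sendall(BAD_REQUEST)       # reject invalid requests
        client_sock.close()
        return
    with socket.create_connection(target, timeout=timeout) as serv_sock:
        serv_sock.sendall(head)                # send original request to server
        try:
            while True:                        # read response, send to client
                data = serv_sock.recv(4096)
                if not data:
                    break
                client_sock.sendall(data)
        except socket.timeout:
            pass                               # time out a silent server
    client_sock.close()


def serve(port=8080):
    """Accept client connections one at a time (no concurrency)."""
    with socket.create_server(("", port)) as listener:
        while True:
            client_sock, _addr = listener.accept()
            handle_client(client_sock)
```

Because the sketch waits for the server to close or time out, a server that keeps the connection open delays the client by up to `timeout` seconds. A real proxy would instead use Content-Length or chunked encoding to find the end of the response.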
11HTTP State Management: Cookies
- We said earlier that HTTP is a stateless protocol
- We also said that stateful protocols can provide improved performance. This is usually achieved by establishing a session between client and server.
- So, how can we get sessions in HTTP?
- COOKIES!
12COOKIES (briefly)
- Cookie protocol - RFC 2109
- A cookie is a token given to a client by a server.
- Server sends Set-Cookie header in response
- Client associates cookie with issuing server (directory)
- The token is just a file with a simple format (name/value pairs)
- Each cookie has a unique name
13Client-server interaction: cookies
- server sends cookie to client in response msg
- Set-cookie: 1678453
- client presents cookie in later requests
- cookie: 1678453
- server matches presented cookie with server-stored info
- authentication
- remembering user preferences, previous choices
[Figure: message exchange between client and server; labels: usual http request msg, usual http response with Set-cookie, cookie-specific action]
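The server side of this exchange might look like the sketch below. It assumes a single cookie named `session` and an in-memory dictionary as the server-stored info; both are illustrative choices, and the function names are hypothetical.

```python
import secrets

# Server-stored info, keyed by the token handed out in the cookie.
sessions = {}


def issue_cookie(user):
    """Create a session and return the header line sent in the response."""
    token = secrets.token_hex(8)
    sessions[token] = {"user": user, "prefs": {}}
    return f"Set-Cookie: session={token}"


def handle_request(headers):
    """Match the presented cookie against server-stored info (or None)."""
    name, _, token = headers.get("Cookie", "").partition("=")
    if name.strip() == "session":
        return sessions.get(token.strip())
    return None
```

The cookie itself carries only an opaque token; authentication state and preferences stay on the server and are looked up on each later request.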
14Cookie example
1. User Agent -> Server
     POST /acme/login HTTP/1.1
     [form data]
   User identifies self via a form.
2. Server -> User Agent
     HTTP/1.1 200 OK
     Set-Cookie: Customer="WILE_E_COYOTE"; Version="1"; Path="/acme"
   Cookie reflects user's identity.
3. User Agent -> Server
     POST /acme/pickitem HTTP/1.1
     Cookie: $Version="1"; Customer="WILE_E_COYOTE"; $Path="/acme"
     [form data]
   User selects an item for "shopping basket."
15Cookie example (continued)
4. Server -> User Agent
     HTTP/1.1 200 OK
     Set-Cookie: Part_Number="Rocket_Launcher_0001"; Version="1"; Path="/acme"
   Shopping basket contains an item.
5. User Agent -> Server
     POST /acme/shipping HTTP/1.1
     Cookie: $Version="1";
             Customer="WILE_E_COYOTE"; $Path="/acme";
             Part_Number="Rocket_Launcher_0001"; $Path="/acme"
     [form data]
   User selects shipping method from form.
16Cookie example (continued)
6. Server -> User Agent
     HTTP/1.1 200 OK
     Set-Cookie: Shipping="FedEx"; Version="1"; Path="/acme"
   New cookie reflects shipping method.
7. User Agent -> Server
     POST /acme/process HTTP/1.1
     Cookie: $Version="1";
             Customer="WILE_E_COYOTE"; $Path="/acme";
             Part_Number="Rocket_Launcher_0001"; $Path="/acme";
             Shipping="FedEx"; $Path="/acme"
     [form data]
   User chooses to process order.
8. Server -> User Agent
     HTTP/1.1 200 OK
   Transaction is complete.
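The client side of the example, where cookies accumulate and are sent back on matching paths, can be sketched with the standard `http.cookies` module. `TinyJar` is a hypothetical helper written for this illustration; it sends only plain name=value pairs and leaves out the Version and Path attributes that RFC 2109 also returns.

```python
from http.cookies import SimpleCookie


class TinyJar:
    """Hypothetical client-side store: cookie name -> (value, path)."""

    def __init__(self):
        self.cookies = {}

    def store(self, set_cookie_value):
        # Parse the value of one Set-Cookie header.
        parsed = SimpleCookie()
        parsed.load(set_cookie_value)
        for name, morsel in parsed.items():
            self.cookies[name] = (morsel.value, morsel["path"] or "/")

    def header_for(self, request_path):
        # Return every cookie whose Path is a prefix of the request path.
        pairs = [
            f"{name}={value}"
            for name, (value, path) in self.cookies.items()
            if request_path.startswith(path)
        ]
        return "; ".join(pairs)


jar = TinyJar()
jar.store('Customer="WILE_E_COYOTE"; Version="1"; Path="/acme"')
jar.store('Part_Number="Rocket_Launcher_0001"; Version="1"; Path="/acme"')
jar.store('Shipping="FedEx"; Version="1"; Path="/acme"')
cookie_header = jar.header_for("/acme/process")
```

After the three responses, `cookie_header` holds all three name/value pairs, mirroring step 7 of the example; a request for a path outside /acme gets none of them.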
17Cookies and Proxies
- HTTP cookies are meant for the end-point entities (client and origin server)
- Cannot be used for state between proxy and end-point
- Why would we need cookies for proxy servers?
18A Case for Proxy Cookies
- A common use of cookies is for authentication, so a cookie may contain the IP address of the client
- In a network of load balancing servers, requests between two endpoints may not follow the same route. This would invalidate the client cookie!
- Proxy cookies might be used to establish proxy credentials.
- Note: proxy cookies do not exist!
19Web Caching
- Web proxy servers store copies of documents retrieved from origin servers
- Advantages
- improves performance
- decreases latency
- saves bandwidth
- Disadvantages
- stale (out of date) data
20Web Caches (proxy server)
Goal: satisfy client request without involving origin server
- user sets browser: Web accesses via web cache
- client sends all http requests to web cache
- if object at web cache, web cache immediately returns object in http response
- else requests object from origin server, then returns http response to client
[Figure: clients, proxy server, and origin servers exchanging http requests and http responses]
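The hit-or-miss logic of a web cache can be sketched as follows. The class name `WebCache` and the injected `fetch_from_origin` callable are assumptions made for this illustration; the sketch ignores freshness and never evicts anything.

```python
class WebCache:
    """Minimal cache: answer from the store, else ask the origin server."""

    def __init__(self, fetch_from_origin):
        # fetch_from_origin(url) -> response body; supplied by the caller.
        self.fetch_from_origin = fetch_from_origin
        self.store = {}
        self.origin_requests = 0

    def get(self, url):
        if url in self.store:            # hit: origin server is not involved
            return self.store[url]
        self.origin_requests += 1        # miss: request object from origin
        body = self.fetch_from_origin(url)
        self.store[url] = body           # keep a copy for later clients
        return body
```

Requesting the same URL twice through one `WebCache` leaves `origin_requests` at 1, which is the goal stated on the slide.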
21Why Web Caching?
- Assume cache is close to client (e.g., in same network)
- smaller response time: cache is closer to client
- decrease traffic to distant servers
- link out of institutional/local ISP network is often a bottleneck
[Figure: institutional network (10 Mbps LAN) with an institutional cache, connected by a 1.5 Mbps access link to origin servers on the public Internet]
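A rough calculation shows why the access link is the bottleneck and how a cache relieves it. Only the 1.5 Mbps link speed comes from the slide. The request rate, object size, and hit rate used in the usage note are assumed values chosen for illustration.

```python
ACCESS_LINK_BPS = 1.5e6   # 1.5 Mbps access link, from the slide


def access_link_utilization(requests_per_sec, object_bits, hit_rate=0.0):
    """Fraction of the access link consumed by requests that miss the cache."""
    miss_traffic_bps = requests_per_sec * object_bits * (1.0 - hit_rate)
    return miss_traffic_bps / ACCESS_LINK_BPS
```

Assuming 15 requests per second for 100,000-bit objects, `access_link_utilization(15, 100_000)` gives a utilization of 1.0, so the link is saturated and queuing delays grow without bound. With an assumed cache hit rate of 0.4, the same load uses about 60% of the link.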
22On-Demand Caching vs On-Command Caching
- On-Demand
- document does not exist in cache unless it has been requested (at least once) by some client.
- On-Command
- proxy server automatically retrieves documents
(or even entire web sites!) at regular intervals.
23HTTP 1.1 Cache Control (Definitions)
- Freshness of objects: a document is fresh
- when it is first retrieved from an origin server,
- when the origin server is contacted to make an up-to-date check, or
- when its age does not exceed its freshness lifetime
- Age of an object
- time that has elapsed since object was retrieved, or
- time since last up-to-date check
24HTTP 1.1 Cache Control (determining an object's age)
- Cache servers use the Date header in the response, plus some compensation for latency between response creation and receipt, to calculate an initial-age.
- When a cache server sends this object in a response, it adds the elapsed time (since object receipt) to the initial-age and sends the result in an Age header.
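The age calculation can be written out following the algorithm in RFC 2616, section 13.2.3. All times are seconds on the cache's own clock; the function names are chosen for this sketch, while the intermediate variable names follow the RFC.

```python
def initial_age(date_value, request_time, response_time, age_value=0.0):
    """Age of a response at the moment the cache received it.

    date_value    -- time in the response's Date header
    request_time  -- when the cache sent its request
    response_time -- when the cache received the response
    age_value     -- Age header of the response, if an upstream cache set one
    """
    apparent_age = max(0.0, response_time - date_value)
    corrected_received_age = max(apparent_age, age_value)
    response_delay = response_time - request_time   # latency compensation
    return corrected_received_age + response_delay


def current_age(initial, response_time, now):
    """Value to send in the Age header when serving the cached object."""
    resident_time = now - response_time              # time spent in the cache
    return initial + resident_time
```

For example, a response dated 1000, requested at 1001 and received at 1003, has an initial age of 5 seconds; served from the cache at time 1063 it carries an Age of 65.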
25HTTP 1.1 Cache Control (determining an object's freshness lifetime)
- Cache-Control header contains max-age directive, or
- Expires header in response contains date and time the object becomes stale
- Since both of these values come from the server, no latency compensation is needed.
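A sketch of the freshness lifetime rule, assuming the response headers are available as a plain dictionary. The function name is hypothetical. When both values are present, max-age takes precedence over Expires, as RFC 2616 section 14.9.3 specifies.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Return the freshness lifetime in seconds, or None if not specified."""
    # max-age directive in Cache-Control wins when present.
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    # Otherwise use Expires minus Date; both come from the server's clock.
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return None
```

An object is then fresh as long as its current age does not exceed the value returned here.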
26HTTP 1.1 Cache Control (controlling an object's cacheability)
- Cache-Control general header is used to specify directives that MUST be obeyed by ALL proxy servers handling the request or response.
- Directives used in requests
- no-cache: an end-to-end revalidation should be performed
- no-store: do not store any part of request or response on disk
- max-age: max age acceptable to client
27HTTP 1.1 Cache Control (controlling an object's cacheability)
- Directives used in responses
- public: response is cacheable by any cache (proxy or client)
- private: response is cacheable by client only
- no-cache: cached copy must not be reused without first revalidating with the origin server
- no-store: response may not be written to disk
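How a cache might act on the response directives can be sketched as below. The function names are hypothetical, and the policy is deliberately conservative: a response marked no-cache could only be reused after revalidation with the origin server, so this sketch takes the simplest route and does not store it at all.

```python
def directives(cache_control):
    """Parse a Cache-Control value into {lower-case name: value}."""
    result = {}
    for item in cache_control.split(","):
        name, _, value = item.strip().partition("=")
        if name:
            result[name.lower()] = value.strip('"')
    return result


def may_store(response_cache_control, shared_cache=True):
    """Decide whether this cache keeps a copy of the response."""
    d = directives(response_cache_control)
    if "no-store" in d:                 # must never be written to storage
        return False
    if "no-cache" in d:                 # conservative: skip storing entirely
        return False
    if "private" in d and shared_cache: # only the client's own cache may keep it
        return False
    return True
```

A proxy would call `may_store(value)` with the default `shared_cache=True`, while a browser cache would pass `shared_cache=False` and so be allowed to keep private responses.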
28Cache Architectures
- A Web proxy cache requires several components
- storage mechanism for storing the cache data
- mapping mechanism to establish relationship between URLs and their cached copies
- format for cached object content and its metadata
29Cache Architecture mapping
- Direct mapping
- e.g., map URL to a file system path
- direct mappings are reversible
- Hash mapping
- compute some unique ID
- could be file name or index to table
- not reversible
- Why do we care about reversibility?
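The two mappings can be compared side by side. The path layouts below are invented for this sketch and do not reproduce the on-disk format of any particular proxy; MD5 is used only as a convenient hash function.

```python
import hashlib
from urllib.parse import urlsplit


def direct_path(url):
    """Direct mapping: the cache path is built from the URL's own parts."""
    parts = urlsplit(url)
    return f"{parts.hostname}{parts.path}"


def url_from_direct_path(path):
    """Reverse of direct_path (assumes plain http URLs without a query)."""
    return "http://" + path


def hash_path(url):
    """Hash mapping: a fixed-length name, bucketed by its first two hex digits."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return f"{digest[:2]}/{digest}"
```

`url_from_direct_path(direct_path(u))` returns the original URL, but nothing comparable exists for `hash_path`. A hashed cache that needs the URL later, for example to revalidate an entry or to list its contents, has to store the URL in the entry's metadata.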
30Existing Cache Architectures
- Directly mapping URLs to filesystem
- CERN httpd used a tree map (like DNS tree!)
- Easy to implement, but not a good performer
- long pathnames mean long inode searches
- garbage collection requires complete traversal of tree
31Existing Cache Architectures
- Hashing URLs (Netscape Proxy server uses URL hashing)
- Object location (on disk) based on MD5 hash
- very fast
- good distribution of different object types (image, text) across cache
- disadvantage: cannot compute URLs from hash