Internet and Intranet Protocols and Applications


1
Internet and Intranet Protocols and Applications
  • Lecture 8a
  • WWW - Proxy Servers
  • March 20, 2002
  • Joseph Conron
  • Computer Science Department
  • New York University
  • jconron_at_cs.nyu.edu

2
Terminology
  • Origin Server
  • the Web server that hosts the resource
  • Destination Server
  • same as Origin Server
  • Proxy Server
  • Intermediate server that accepts requests from
    clients and forwards them to origin servers, to
    other proxy servers, or services request from its
    cache.
  • Acts as server to requesting client, and as
    client to origin server

3
More Terminology
  • Firewall
  • general term for hardware, software, or
    combination used to protect internal network from
    intruders.
  • Uses packet filtering to enforce generic security
    policies
  • Uses application level proxy servers to enforce
    protocol-specific policies
  • Packet filter
  • control based on something in packet headers
    (e.g., IP addresses or port numbers)
  • Application level proxy
  • control based on knowledge of application level
    protocol (e.g., SMTP headers or HTTP methods)

4
Application Level Proxy
  • Implemented completely in software
  • Usually handle a specific protocol (HTTP)
  • Can provide a rich set of features
  • Improve performance (latency reduction, bandwidth
    conservation)
  • Advanced access control (authentication and
    authorization)
  • Advanced filtering (e.g. detect espionage!)
  • Logging and auditing

5
Web Proxy Servers
  • We will focus our attention on proxy servers for
    a specific application - WWW.
  • Web proxy servers use HTTP.
  • In particular, we focus on the use of proxy
    servers to improve performance.
  • We will use the terms proxy server and Web proxy
    server interchangeably.
  • We will use HTTP 1.1 only.

6
Proxy Servers - General Properties
  • Transparency
  • users get same response whether connection was
    direct or to proxy
  • non-transparent proxy modifies content in some
    way
  • Use is client controlled
  • client programs (e.g., browsers) can be
    configured to use (or not use) proxy servers.
  • Origin Server is unaware of proxy server
  • Origin server does not have to process a request from
    a proxy differently than one from a browser.

7
Virtual Servers (understanding partial vs. full URI)
  • As WWW became popular, many companies wanted
    several domain names, but did not want separate
    server hardware.
  • OK - just define CNAME entries in DNS and point
    to same IP address.
  • Problem: since all DNS names resolve to the same IP
    address, how does the server know which domain name
    (server) was selected in the request?

8
Partial vs. Absolute URI
  • Absolute URI includes host name
  • GET http://www.starwind.com/somefilename HTTP/1.1
  • Partial URI contains resource name only
  • GET /somefilename HTTP/1.1
  • Read RFC 2616, section 3.2 and RFC 2396!
  • So, if all URIs are absolute, the origin server
    can parse the URI and detect the virtual host name.
  • In HTTP 1.1, we can use a relative URI plus the Host
    header.
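
The two cases above can be sketched in a few lines of Python. This is only an illustration, not code from the lecture: the function name requested_host is made up here, and the headers are assumed to arrive as a plain dict keyed by "Host".

```python
from urllib.parse import urlsplit


def requested_host(request_line, headers):
    """Return the virtual host name a request is addressed to, or None."""
    method, target, version = request_line.split()
    parts = urlsplit(target)
    if parts.scheme and parts.hostname:
        # Absolute URI: the host name is part of the request target.
        return parts.hostname.lower()
    # Relative URI: fall back to the Host header, which may carry ":port".
    host = headers.get("Host", "")
    return host.split(":", 1)[0].lower() or None


print(requested_host("GET http://www.starwind.com/somefilename HTTP/1.1", {}))
print(requested_host("GET /somefilename HTTP/1.1", {"Host": "www.starwind.com"}))
```

Both calls name the same virtual host, even though the second request line carries only the resource name.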

9
Proxy Server Basic Operation
  • Accept connection request from client
  • establishes new Socket client_sock
  • Read HTTP request
  • Parse HTTP request
  • reject invalid requests with appropriate response
    code
  • Connect to requested server
  • establishes new socket serv_sock
  • Send original HTTP request to server

10
Proxy Server Basic Operation (continued)
  • Read response from Server
  • time-out server connection!
  • Send response to client
  • If a Connection: close header is received, close
    the client connection (client_sock)
  • What about server connection (serv_sock)?
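
A minimal Python sketch of the steps on the last two slides might look like the following. It is written under simplifying assumptions that are not from the lecture: the whole request arrives in one read, host names are not IPv6 literals, and the proxy handles exactly one request per client connection. The names parse_request and handle_client are illustrative; client_sock and serv_sock follow the slides.

```python
import socket
from urllib.parse import urlsplit


def parse_request(raw):
    """Return (host, port) of the origin server named in a raw HTTP request.

    Raises ValueError for a malformed request line or a missing host.
    """
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    method, target, version = lines[0].split()  # ValueError unless 3 parts
    if not version.startswith("HTTP/"):
        raise ValueError("not an HTTP request")
    netloc = urlsplit(target).netloc
    if not netloc:
        # Relative URI: take the host from the Host header instead.
        for line in lines[1:]:
            name, _, value = line.partition(":")
            if name.strip().lower() == "host":
                netloc = value.strip()
    host, _, port = netloc.partition(":")
    if not host:
        raise ValueError("request does not name a host")
    return host, int(port or 80)


def handle_client(client_sock, timeout=10.0):
    """Serve one request on client_sock, then close both connections."""
    serv_sock = None
    try:
        request = client_sock.recv(65536)  # assumes the request fits one read
        try:
            host, port = parse_request(request)
        except ValueError:
            # Reject invalid requests with an appropriate response code.
            client_sock.sendall(
                b"HTTP/1.1 400 Bad Request\r\nConnection: close\r\n\r\n")
            return
        serv_sock = socket.create_connection((host, port), timeout=timeout)
        serv_sock.sendall(request)  # send the original request to the server
        while True:
            chunk = serv_sock.recv(65536)  # times out if the server stalls
            if not chunk:
                break
            client_sock.sendall(chunk)  # relay the response to the client
    except OSError:
        pass  # time-out or connection error: give up on this request
    finally:
        if serv_sock is not None:
            serv_sock.close()
        client_sock.close()
```

This sketch sidesteps the question about serv_sock by always closing it after one response. A proxy that wants to reuse server connections would need to parse response framing (Content-Length or chunked encoding) instead of reading until the server closes or the time-out fires.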

11
HTTP State Management: Cookies
  • We said earlier that HTTP is a stateless protocol
  • We also said that stateful protocols can provide
    improved performance. State is usually kept by
    establishing a session between client and
    server.
  • So, how can we get sessions in HTTP?
  • COOKIES!

12
COOKIES (briefly)
  • Cookie protocol - RFC 2109
  • A cookie is a token given to a client by a
    server.
  • Server sends Set-Cookie header in response
  • Client associates cookie with issuing server
    (directory)
  • The token is just a file with a simple format
    (name/value pairs)
  • Each cookie has a unique name
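
As a rough illustration of the name/value format, Python's standard http.cookies module can build a Set-Cookie header and parse a Cookie header. The helper names make_set_cookie and read_cookies are invented for this sketch, and the module emits the common Netscape-style syntax rather than the exact RFC 2109 attribute syntax.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name, value, path):
    """Server side: build the Set-Cookie header line for a response."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = path  # client returns the cookie only under path
    return cookie.output()


def read_cookies(cookie_header_value):
    """Server side: turn the value of a Cookie request header into a dict."""
    cookie = SimpleCookie()
    cookie.load(cookie_header_value)
    return {name: morsel.value for name, morsel in cookie.items()}


print(make_set_cookie("Customer", "WILE_E_COYOTE", "/acme"))
print(read_cookies("Customer=WILE_E_COYOTE; Shipping=FedEx"))
```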

13
Client-server interaction: cookies
  • server sends cookie to client in response msg
  • Set-Cookie: 1678453
  • client presents cookie in later requests
  • Cookie: 1678453
  • server matches presented cookie with
    server-stored info
  • authentication
  • remembering user preferences, previous choices

[Figure: client and server timeline: usual http request msg; usual http response plus Set-Cookie; later requests carry the cookie and trigger a cookie-specific action on the server]

14
Cookie example
  1. User Agent -> Server
       POST /acme/login HTTP/1.1
       [form data]
     User identifies self via a form.
  2. Server -> User Agent
       HTTP/1.1 200 OK
       Set-Cookie: Customer="WILE_E_COYOTE"; Version="1"; Path="/acme"
     Cookie reflects user's identity.
  3. User Agent -> Server
       POST /acme/pickitem HTTP/1.1
       Cookie: $Version="1"; Customer="WILE_E_COYOTE"; $Path="/acme"
       [form data]
     User selects an item for "shopping basket."

15
Cookie example (continued)
  4. Server -> User Agent
       HTTP/1.1 200 OK
       Set-Cookie: Part_Number="Rocket_Launcher_0001"; Version="1";
               Path="/acme"
     Shopping basket contains an item.
  5. User Agent -> Server
       POST /acme/shipping HTTP/1.1
       Cookie: $Version="1";
               Customer="WILE_E_COYOTE"; $Path="/acme";
               Part_Number="Rocket_Launcher_0001"; $Path="/acme"
       [form data]
     User selects shipping method from form.

16
Cookie example (continued)
  6. Server -> User Agent
       HTTP/1.1 200 OK
       Set-Cookie: Shipping="FedEx"; Version="1"; Path="/acme"
     New cookie reflects shipping method.
  7. User Agent -> Server
       POST /acme/process HTTP/1.1
       Cookie: $Version="1";
               Customer="WILE_E_COYOTE"; $Path="/acme";
               Part_Number="Rocket_Launcher_0001"; $Path="/acme";
               Shipping="FedEx"; $Path="/acme"
       [form data]
     User chooses to process order.
  8. Server -> User Agent
       HTTP/1.1 200 OK
     Transaction is complete.

17
Cookies and Proxies
  • HTTP cookies are meant for the end-point entities
    (client and origin server)
  • Cannot be used for state between proxy and
    end-point
  • Why would we need cookies for proxy servers?

18
A Case for Proxy Cookies
  • A common use of cookies is for authentication -
    so cookie may contain IP Address of client
  • In a network of load balancing servers, requests
    between two endpoints may not follow the same
    route. This would invalidate the client cookie!
  • Proxy cookies might be used to establish proxy
    credentials.
  • Note: proxy cookies do not exist!

19
Web Caching
  • Web proxy Servers store copies of documents
    retrieved from origin servers
  • Advantages
  • improves performance
  • decreases latency
  • saves bandwidth
  • Disadvantages
  • stale (out of date) data

20
Web Caches (proxy server)
Goal: satisfy client request without involving
origin server
  • user sets browser: Web accesses via web cache
  • client sends all http requests to web cache
  • if object at web cache, web cache immediately
    returns object in http response
  • else requests object from origin server, then
    returns http response to client
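
The hit-or-miss logic in these bullets can be sketched as a small Python class. It is deliberately simplified and not taken from the lecture: the WebCache name and the fetch_from_origin callable are assumptions of the sketch, objects are kept in memory, and freshness is ignored entirely.

```python
class WebCache:
    """On-demand cache: an object is stored the first time it is requested."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # callable: url -> response bytes
        self._store = {}                 # url -> cached response
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self._store:
            # Object is at the cache: answer without contacting the origin.
            self.hits += 1
            return self._store[url]
        # Otherwise request the object from the origin server, keep a copy,
        # and return the response to the client.
        self.misses += 1
        response = self._fetch(url)
        self._store[url] = response
        return response


def fake_origin(url):
    # Stand-in for a real HTTP fetch so the sketch runs without a network.
    return b"contents of " + url.encode("ascii")


cache = WebCache(fake_origin)
cache.get("http://www.starwind.com/somefilename")  # miss: goes to origin
cache.get("http://www.starwind.com/somefilename")  # hit: served from cache
print(cache.hits, cache.misses)
```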

[Figure: clients, proxy server, and origin servers exchanging http requests and http responses]

21
Why Web Caching?
  • Assume cache is close to client (e.g., in same
    network)
  • smaller response time: cache is closer to client
  • decrease traffic to distant servers
  • link out of institutional/local ISP network is
    often a bottleneck

[Figure: origin servers on the public Internet; 1.5 Mbps access link; institutional network with 10 Mbps LAN and institutional cache]

22
On-Demand Caching vs On-Command Caching
  • On-Demand
  • document does not exist in cache unless it has
    been requested (at least once) by some client.
  • On-Command
  • proxy server automatically retrieves documents
    (or even entire web sites!) at regular intervals.

23
HTTP 1.1 Cache Control (Definitions)
  • Freshness of objects: a document is fresh
  • when it is first retrieved from an origin server,
  • when the origin server is contacted to make an
    up-to-date check, or
  • when its age does not exceed its freshness
    lifetime
  • Age of an object
  • time that has elapsed since object was retrieved,
    or
  • time since last up-to-date check

24
HTTP 1.1 Cache Control (determining an object's age)
  • Cache servers use the Date header in the response plus
    some compensation for latency between response
    creation and receipt to calculate an initial-age.
  • When a cache server sends this object in a
    response, it adds the elapsed time (since object
    receipt) to the initial-age and sends the result in an Age header.
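
The age arithmetic follows the calculation given in RFC 2616, section 13.2.3, and can be written out in Python as below. All arguments are assumed to be times in seconds on the cache's own clock (for example Unix timestamps), with date_value and age_value already parsed from the Date and Age headers; the function names are chosen for this sketch.

```python
def corrected_initial_age(date_value, age_value, request_time, response_time):
    """Age of a response at the moment the cache received it, in seconds."""
    # Age implied by the origin's Date header (never negative).
    apparent_age = max(0, response_time - date_value)
    # Trust an upstream Age header if it reports something older.
    corrected_received_age = max(apparent_age, age_value)
    # Latency compensation: time spent waiting for the response.
    response_delay = response_time - request_time
    return corrected_received_age + response_delay


def current_age(initial_age, response_time, now):
    """Value for the Age header when the cached copy is served at `now`."""
    resident_time = now - response_time  # how long the cache has held it
    return initial_age + resident_time


initial = corrected_initial_age(date_value=1000, age_value=0,
                                request_time=1001, response_time=1003)
print(initial)                                          # 5
print(current_age(initial, response_time=1003, now=1063))  # 65
```

In the example the response was created at t=1000, requested at t=1001, and received at t=1003, so its initial age is 3 + 2 = 5 seconds; served a minute later, it carries Age: 65.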

25
HTTP 1.1 Cache Control (determining an object's freshness lifetime)
  • Cache-Control header contains max-age directive,
    or
  • Expires header in response contains date and
    time the object becomes stale
  • Since both of these values come from server, no
    latency compensation is needed.
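
A possible Python sketch of this rule is shown below. It assumes the response headers are available as a dict with the exact key spellings "Cache-Control", "Expires", and "Date", gives max-age precedence over Expires as RFC 2616 specifies, and does not handle malformed dates. The function names are illustrative.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Return the freshness lifetime in seconds, or None if unspecified."""
    # 1. Cache-Control: max-age=N takes precedence.
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    # 2. Otherwise use Expires minus Date; both come from the server,
    #    so no latency compensation is applied.
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - date).total_seconds()))
    return None


def is_fresh(age, lifetime):
    """A cached object is fresh while its age is below its lifetime."""
    return lifetime is not None and age < lifetime


print(freshness_lifetime({"Cache-Control": "public, max-age=3600"}))
print(freshness_lifetime({"Date": "Wed, 20 Mar 2002 12:00:00 GMT",
                          "Expires": "Wed, 20 Mar 2002 12:10:00 GMT"}))
```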

26
HTTP 1.1 Cache Control (controlling an object's cacheability)
  • Cache-Control general header is used to specify
    directives that MUST be obeyed by ALL proxy
    servers handling the request or response.
  • Directives used in Requests
  • no-cache: an end-to-end revalidation should be
    performed
  • no-store: do not store any part of request or
    response on disk
  • max-age: maximum age acceptable to the client

27
HTTP 1.1 Cache Control (controlling an object's cacheability)
  • Directives used in responses
  • public: response is cacheable by any cache
    (proxy or client)
  • private: response is cacheable by client only
  • no-cache: response must be revalidated with the
    origin server before a cache reuses it
  • no-store: response may not be written to disk
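
One way a proxy might act on these response directives is sketched below in Python. The parser is simplified (it does not cope with commas inside quoted directive arguments), the function names are invented for the sketch, and treating a response with no directives as reusable is an assumption made here, not a rule from the slides.

```python
def parse_cache_control(value):
    """Split a Cache-Control header value into {directive: argument}."""
    directives = {}
    for item in value.split(","):
        name, _, arg = item.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') or None
    return directives


def proxy_may_serve_from_cache(response_cache_control):
    """True if a proxy may answer later requests from its stored copy
    without first going back to the origin server."""
    d = parse_cache_control(response_cache_control)
    # private: only the client's own cache may keep the response.
    # no-store: nothing may be kept at all.
    # no-cache: a stored copy cannot be reused without contacting the origin.
    return not ({"private", "no-store", "no-cache"} & d.keys())


print(parse_cache_control("public, max-age=60"))
print(proxy_may_serve_from_cache("public, max-age=60"))
print(proxy_may_serve_from_cache("private"))
```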

28
Cache Architectures
  • A Web proxy cache requires several components
  • storage mechanism for storing the cache data
  • mapping mechanism to establish relationship
    between URLs and their cached copies
  • format for cached object content and its metadata

29
Cache Architecture mapping
  • Direct mapping
  • e.g., map URL to a file system path
  • direct mappings are reversible
  • Hash mapping
  • compute some unique ID
  • could be file name or index to table
  • not reversible
  • Why do we care about reversibility?
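
To make reversibility concrete, here is a small Python sketch of one possible direct mapping. The path layout (scheme/host/percent-encoded rest) is an assumption chosen for illustration, not the scheme used by any particular server.

```python
from urllib.parse import quote, unquote, urlsplit


def url_to_path(url):
    """Direct mapping: derive a relative cache path from the URL itself."""
    parts = urlsplit(url)
    rest = parts.path or "/"
    if parts.query:
        rest += "?" + parts.query
    # Percent-encode the rest so it becomes a single safe file name.
    return f"{parts.scheme}/{parts.netloc}/{quote(rest, safe='')}"


def path_to_url(path):
    """Reverse mapping: recover the URL from the cache path."""
    scheme, netloc, rest = path.split("/", 2)
    return f"{scheme}://{netloc}{unquote(rest)}"


p = url_to_path("http://www.starwind.com/somefilename")
print(p)               # http/www.starwind.com/%2Fsomefilename
print(path_to_url(p))  # http://www.starwind.com/somefilename
```

With a mapping like this, a maintenance task that walks the cache directory can tell which URL each file belongs to without a separate index, which is one plausible reason to care about reversibility.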

30
Existing Cache Architectures
  • Directly mapping URLs to filesystem
  • CERN httpd used a tree map (like DNS tree!)
  • Easy to implement, but not a good performer
  • long pathnames mean long inode searches
  • garbage collection requires complete traversal of
    tree

31
Existing Cache Architectures
  • Hashing URLs (Netscape Proxy server uses URL
    hashing)
  • Object location (on disk) based on MD5 hash
  • very fast
  • good distribution of different object types
    (image, text) across cache
  • disadvantage: cannot compute URLs from hash
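
A hash-based location can be sketched in Python with the standard hashlib module. The directory layout below (two levels taken from the first hex digits of the digest) is a made-up example and should not be read as the layout Netscape Proxy Server actually used.

```python
import hashlib


def hashed_location(url, levels=2):
    """Hash mapping: derive a relative cache path from the MD5 of the URL."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()  # 32 hex digits
    # Leading digit pairs pick the subdirectories, which spreads objects
    # evenly across the cache regardless of host name or object type.
    dirs = [digest[2 * i:2 * i + 2] for i in range(levels)]
    return "/".join(dirs + [digest])


print(hashed_location("http://www.starwind.com/somefilename"))
```

The path has a fixed, short length no matter how long the URL is, but nothing in it allows the URL to be recovered, so the URL has to be stored alongside the object if it is ever needed again.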