Title: Proxy Lab Recitation I
Outline
- What is an HTTP proxy?
- HTTP Tutorial
- HTTP Request
- HTTP Response
- Sequential vs. concurrent proxies
- Caching
What is a proxy?
Figure: Client (browser) <-> Proxy <-> Server (www.google.com)
- Why a proxy?
- Access control (allowed websites)
- Filtering (viruses, for example)
- Caching (multiple people request CNN)
Brief HTTP Tutorial
- Hypertext Transfer Protocol
- Protocol spoken between a browser and a web server
- From browser → web server: REQUEST
- GET http://www.google.com/ HTTP/1.0
- From web server → browser: RESPONSE
- HTTP/1.0 200 OK
- Followed by headers, a blank line, and the body
HTTP Request
The request line contains the request type, the URL (host and path), and the HTTP version:
- GET http://csapp.cs.cmu.edu/simple.html HTTP/1.1
- Host: csapp.cs.cmu.edu
- User-Agent: Mozilla/5.0 ...
- Accept: text/xml,application/xml ...
- Accept-Language: en-us,en;q=0.5 ...
- Accept-Encoding: gzip,deflate ...
An empty line (a bare \r\n) terminates the request headers; for a GET request, that is the end of the request.
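Before contacting the server, the proxy has to split the request line into its parts. Below is a minimal C sketch, assuming the absolute-URL form shown above (http://host[:port]/path). The struct request_line, the function parse_request_line, and the buffer sizes are illustrative choices, not part of the lab handout; port 80 is used when the URL names no port.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for a parsed request line; sizes are arbitrary. */
typedef struct {
    char method[16];
    char host[256];
    int port;            /* 80 unless the URL names another port */
    char path[4096];
    char version[16];
} request_line;

/* Parse a line such as "GET http://csapp.cs.cmu.edu/simple.html HTTP/1.1\r\n".
   Returns 0 on success, or -1 if the line is not of the form
   METHOD http://host[:port][/path] VERSION. */
int parse_request_line(const char *line, request_line *rl) {
    char uri[4096];
    char hostport[256];

    assert(line != NULL && rl != NULL);
    /* %s stops at whitespace, so the trailing "\r\n" is never copied. */
    if (sscanf(line, "%15s %4095s %15s", rl->method, uri, rl->version) != 3)
        return -1;
    if (strncmp(uri, "http://", 7) != 0)    /* a proxy receives absolute URLs */
        return -1;

    const char *hostp = uri + 7;
    const char *slash = strchr(hostp, '/');
    size_t hp_len = slash ? (size_t)(slash - hostp) : strlen(hostp);
    if (hp_len == 0 || hp_len >= sizeof hostport)
        return -1;
    memcpy(hostport, hostp, hp_len);
    hostport[hp_len] = '\0';

    rl->port = 80;                          /* default HTTP port */
    char *colon = strchr(hostport, ':');
    if (colon != NULL) {
        char *end;
        long p = strtol(colon + 1, &end, 10);
        if (*end != '\0' || p <= 0 || p > 65535)
            return -1;
        rl->port = (int)p;
        *colon = '\0';                      /* cut ":port" off the host name */
    }
    if (hostport[0] == '\0')
        return -1;

    strcpy(rl->host, hostport);             /* fits: both buffers are 256 bytes */
    strcpy(rl->path, slash ? slash : "/");  /* fits: path is a suffix of uri */
    return 0;
}
```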
The Host header is optional in HTTP/1.0 (and required in HTTP/1.1), but we recommend always including it.
The User-Agent header identifies the browser type. Some websites use it to decide what to send, and may reject you if you claim to use MyWeirdBrowser. The proxy must pass this header, and all other headers, through to the server.
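Forwarding headers is a line-by-line loop that copies each header through untouched and stops at the blank line. The minimal sketch below also adds a Host header when the client omitted one, following the recommendation above. Two parts are assumptions: forward_headers is a hypothetical name, and standard FILE streams with fgets() stand in for the RIO routines (rio_readlineb) a real proxy would use on its sockets. Lines longer than MAXLINE (8192 bytes, an arbitrary choice) would be copied in several pieces.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

#define MAXLINE 8192

/* Copy header lines from `in` to `out` unchanged (User-Agent included)
   up to the blank line that ends the request. If the client sent no
   Host header, add one just before the blank line.
   Returns the number of header lines copied from the client, or -1 if
   the input ended before the blank line. */
int forward_headers(FILE *in, FILE *out, const char *host) {
    char line[MAXLINE];
    int seen_host = 0, count = 0;

    assert(in != NULL && out != NULL && host != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        if (strcmp(line, "\r\n") == 0 || strcmp(line, "\n") == 0) {
            if (!seen_host)
                fprintf(out, "Host: %s\r\n", host);
            fputs("\r\n", out);              /* terminate the forwarded headers */
            return count;
        }
        if (strncasecmp(line, "Host:", 5) == 0)   /* header names are case-insensitive */
            seen_host = 1;
        fputs(line, out);                    /* forward every header as-is */
        count++;
    }
    return -1;                               /* EOF before the blank line */
}
```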
HTTP Response
- HTTP/1.1 200 OK
- Date: Mon, 20 Nov 2006 03:34:17 GMT
- Server: Apache/1.3.19 (Unix)
- Last-Modified: Mon, 28 Nov 2005 23:31:35 GMT
- Content-Length: 129
- Connection: Keep-Alive
- Content-Type: text/html
The status line indicates whether the request succeeded, whether it is a redirect, etc. The proxy should send the complete response back to the client transparently, without modifying it.
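The proxy forwards the response as-is, but reading the status code can still be useful, for example for logging (or, as an assumption beyond these slides, to decide which responses are worth caching). A minimal sketch; parse_status_code and status_class are hypothetical helper names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the numeric code from a status line such as "HTTP/1.1 200 OK\r\n".
   Returns the code, or -1 if the line is malformed. */
int parse_status_code(const char *status_line) {
    int major, minor, code;
    assert(status_line != NULL);
    if (sscanf(status_line, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    if (code < 100 || code > 599)
        return -1;
    return code;
}

/* Map a status code to its class (the first digit of the code). */
const char *status_class(int code) {
    switch (code / 100) {
    case 1: return "informational";
    case 2: return "success";
    case 3: return "redirect";
    case 4: return "client error";
    case 5: return "server error";
    default: return "invalid";
    }
}
```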
The Content-Length field gives the number of bytes in the response body. Not all web servers send it. DO NOT RELY ON IT!
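Because Content-Length may be missing, one simple alternative is to relay the server's bytes until the server closes its end of the connection (read() returns 0). A minimal sketch: relay_until_eof and write_all are hypothetical names, and plain read()/write() stand in for the RIO routines (e.g. rio_readnb) recommended later in these slides. It assumes the server closes the connection after the response, which is not guaranteed when the response says Connection: Keep-Alive; forwarding the request as HTTP/1.0 or with a Connection: close header is one way to get that behavior, but that choice is an assumption of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all n bytes, retrying after short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t n) {
    while (n > 0) {
        ssize_t w = write(fd, buf, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Copy bytes from srcfd to dstfd until srcfd reports EOF. Content-Length is
   ignored entirely. Raw read()/write() are used instead of string functions
   because the body may be binary and contain '\0' bytes.
   Returns the number of bytes relayed, or -1 on error. */
long relay_until_eof(int srcfd, int dstfd) {
    char buf[8192];
    long total = 0;
    for (;;) {
        ssize_t r = read(srcfd, buf, sizeof buf);
        if (r == 0)
            return total;                /* EOF: the sender closed its end */
        if (r < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(dstfd, buf, (size_t)r) < 0)
            return -1;
        total += r;
    }
}
```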
Concurrent Proxy
- Need to handle multiple requests simultaneously
- From different clients
- From the same client
- E.g., each individual image in an HTML document needs to be requested separately
- Serving requests sequentially decreases throughput
- A sequential server spends most of its time waiting for I/O
- This time can be used to start serving other clients
- Multiple outstanding requests
Concurrent Proxy
- Use threads to make the proxy concurrent
- Create one thread for each new client request
- The thread finishes and exits after serving the client request
- Use the pthread library
- pthread_create(), pthread_detach(), etc.
- Can also use select() to add concurrency
- Much more difficult to get right
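The thread-per-request design above might look roughly like this minimal sketch. serve_client, client_thread, and spawn_client_thread are hypothetical names, and serve_client only writes a placeholder message where a real proxy would parse the request, forward it, and relay the response. The connected descriptor is passed in its own malloc'd int so that the main thread's next accept() cannot overwrite it before the new thread has read it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical placeholder for the real per-client work. */
static void serve_client(int connfd) {
    const char msg[] = "served\n";
    ssize_t n = write(connfd, msg, sizeof msg - 1);
    (void)n;
}

static void *client_thread(void *arg) {
    int connfd = *(int *)arg;
    free(arg);                        /* this thread owns the heap copy */
    pthread_detach(pthread_self());   /* resources are reclaimed on exit */
    serve_client(connfd);
    close(connfd);                    /* the thread closes its own connection */
    return NULL;
}

/* Start one detached thread for a newly accepted connection.
   Returns 0 on success; on -1 the caller still owns (and must close) connfd. */
int spawn_client_thread(int connfd) {
    int *fdp = malloc(sizeof *fdp);
    if (fdp == NULL)
        return -1;
    *fdp = connfd;
    pthread_t tid;
    if (pthread_create(&tid, NULL, client_thread, fdp) != 0) {
        free(fdp);
        return -1;
    }
    return 0;
}
```

In an accept loop this would be called as if (spawn_client_thread(connfd) < 0) close(connfd); and because each thread detaches itself, no one needs to call pthread_join().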
Caching Proxy
- Most geeks visit http://slashdot.org/ every 2 minutes
- Why fetch the same content again and again?
- (If it doesn't change frequently)
- The proxy can cache responses
- Serve directly out of its cache
- Reduces latency and network load
Caching Implementation Issues
- Use the GET URL (host/path) to locate the appropriate cache entry
- THREAD SAFETY
- A single cache is accessed by multiple threads
- Easy to create bugs: thread 1 is reading an entry while thread 2 is deleting the same entry
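One way to avoid the read-while-delete bug is a readers-writer lock around the whole cache: lookups take the read lock and copy the entry out, while insertions (which may evict and free an old entry) take the write lock. A minimal sketch keyed by the GET URL, as suggested above. cache_lookup, cache_insert, the 16-slot array, and round-robin eviction are all hypothetical placeholders; a real cache would likely want a size limit and a better eviction policy.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 16                /* hypothetical fixed capacity */

typedef struct {
    char *url;                        /* key: GET URL (host/path); NULL if unused */
    char *data;                       /* cached response bytes (may contain '\0') */
    size_t len;
} cache_entry;

static cache_entry cache[CACHE_SLOTS];
static size_t next_victim;            /* round-robin eviction cursor (placeholder) */
static pthread_rwlock_t cache_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Caller must hold cache_lock (read or write). */
static int find_slot(const char *url) {
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].url != NULL && strcmp(cache[i].url, url) == 0)
            return i;
    return -1;
}

/* On a hit, return a malloc'd copy of the response (caller frees it) and
   store its length in *len_out. On a miss, return NULL. The copy is taken
   under the read lock, so no writer can free the entry mid-read. */
char *cache_lookup(const char *url, size_t *len_out) {
    char *copy = NULL;
    assert(url != NULL && len_out != NULL);
    pthread_rwlock_rdlock(&cache_lock);
    int i = find_slot(url);
    if (i >= 0 && (copy = malloc(cache[i].len + 1)) != NULL) {
        memcpy(copy, cache[i].data, cache[i].len);
        *len_out = cache[i].len;
    }
    pthread_rwlock_unlock(&cache_lock);
    return copy;
}

/* Insert or replace the entry for url. Returns 0, or -1 if out of memory. */
int cache_insert(const char *url, const char *data, size_t len) {
    size_t url_len = strlen(url);
    char *url_copy = malloc(url_len + 1);
    char *data_copy = malloc(len + 1);
    if (url_copy == NULL || data_copy == NULL) {
        free(url_copy);
        free(data_copy);
        return -1;
    }
    memcpy(url_copy, url, url_len + 1);
    memcpy(data_copy, data, len);     /* memcpy, not strcpy: bodies may hold '\0' */

    pthread_rwlock_wrlock(&cache_lock);   /* waits until no reader is copying */
    int i = find_slot(url);
    if (i < 0) {                          /* not cached yet: evict the next slot */
        i = (int)next_victim;
        next_victim = (next_victim + 1) % CACHE_SLOTS;
    }
    free(cache[i].url);                   /* safe: no reader holds the lock now */
    free(cache[i].data);
    cache[i].url = url_copy;
    cache[i].data = data_copy;
    cache[i].len = len;
    pthread_rwlock_unlock(&cache_lock);
    return 0;
}
```

Copying under the read lock trades some memory and copying time for a short critical section: a slow client never holds the lock while its response is being written to the socket.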
General advice
- Use RIO routines
- rio_readnb, rio_readlineb
- Be very careful when you are reading line-by-line (HTTP request) versus just a stream of bytes (HTTP response)
- Know when to use strcpy() vs. memcpy()
- gethostbyname() and inet_ntoa() are not thread-safe!
- Suggested path: sequential → concurrency → caching
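gethostbyname() and inet_ntoa() return pointers into static storage that every thread shares, so one thread's call can overwrite another's result. A common replacement, and a swap of technique relative to the advice above, is getaddrinfo() plus inet_ntop(), which write into memory the caller owns. A minimal sketch; the helper name resolve_ipv4 is hypothetical and it handles IPv4 only.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Resolve hostname to its first IPv4 address and write it in dotted-decimal
   form into out (at least INET_ADDRSTRLEN bytes). Every buffer involved is
   private to the caller, so concurrent calls from different threads do not
   interfere. Returns 0 on success, -1 on failure. */
int resolve_ipv4(const char *hostname, char *out, socklen_t outlen) {
    struct addrinfo hints, *res = NULL;
    assert(hostname != NULL && out != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* IPv4 only, to mirror inet_ntoa() */
    hints.ai_socktype = SOCK_STREAM;  /* HTTP runs over TCP */

    if (getaddrinfo(hostname, NULL, &hints, &res) != 0)
        return -1;
    const struct sockaddr_in *sin = (const struct sockaddr_in *)res->ai_addr;
    const char *ok = inet_ntop(AF_INET, &sin->sin_addr, out, outlen);
    freeaddrinfo(res);                /* each call frees its own result list */
    return ok != NULL ? 0 : -1;
}
```

If you prefer to keep gethostbyname(), the other usual approach is to wrap it in a mutex and copy the result out before unlocking.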