Title: Lecture 19 Overview
1Lecture 19 Overview
2Hyper Text Transfer Protocol
- HTTP is the protocol that supports communication
between web browsers and web servers.
- A Web server is an HTTP server.
- Most clients/servers today speak version 1.1, but
1.0 is also in use.
- HTTP is an application-level protocol with the
lightness and speed necessary for distributed,
hypermedia information systems.
3HTTP overview
- Web's application-layer protocol
- client/server model
- client: browser that requests, receives, and
displays Web objects
- server: Web server that sends objects in response to
requests
[Figure: a PC running Explorer and a Mac running Navigator each send an HTTP request to a server running the Apache Web server and receive an HTTP response]
4Request - Response
- HTTP has a simple structure
- client sends a request
- server returns a reply
- HTTP can support multiple request-reply exchanges
over a single TCP connection
- The well-known TCP port for HTTP servers is
port 80
- Other ports can be used as well
5HTTP connections
- HTTP is stateless
- server maintains no information about past client
requests
- Nonpersistent HTTP
- At most one object is sent over a TCP connection
- Persistent HTTP
- Multiple objects can be sent over a single TCP
connection between client and server
6Request Line
- Method URI HTTP-Version\r\n
- The request line contains 3 tokens (words)
- space characters separate the tokens
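The request line format above can be sketched in a few lines of Python. This is a minimal illustration, not a full HTTP parser; the function name `parse_request_line` and the sample URI are assumptions made for the example.

```python
def parse_request_line(line: bytes) -> tuple[str, str, str]:
    """Split an HTTP request line into (method, uri, version)."""
    text = line.decode("ascii")
    if not text.endswith("\r\n"):
        raise ValueError("request line must end with CRLF")
    # Drop the trailing CRLF, then split on the single spaces
    # that separate the three tokens.
    tokens = text[:-2].split(" ")
    if len(tokens) != 3:
        raise ValueError(f"expected 3 tokens, got {len(tokens)}")
    method, uri, version = tokens
    return method, uri, version


method, uri, version = parse_request_line(b"GET /mgunes/cpe401 HTTP/1.1\r\n")
print(method, uri, version)
```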
7URI: Uniform Resource Identifier
- URIs are defined in RFC 2396
- Absolute URI
- scheme://hostname:port/path
- http://www.cse.unr.edu:80/mgunes/cpe401
- Relative URI
- /path
- /blah/foo (no server mentioned)
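As a quick check of the two URI forms, Python's standard `urllib.parse.urlsplit` can take them apart. This is only a sketch using the two example URIs from this slide.

```python
from urllib.parse import urlsplit

# Absolute URI: scheme, hostname, port and path are all present.
absolute = urlsplit("http://www.cse.unr.edu:80/mgunes/cpe401")
print(absolute.scheme, absolute.hostname, absolute.port, absolute.path)

# Relative URI: only a path, no server is mentioned.
relative = urlsplit("/blah/foo")
print(relative.hostname, relative.path)
```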
8Request Method
- The Request Method can be
- GET, HEAD, DELETE
- PUT, POST, TRACE
- OPTIONS
- future expansion is supported
- GET, HEAD and POST are supported everywhere
- HTTP 1.1 servers often support PUT, DELETE,
OPTIONS, TRACE
9Methods
- GET: retrieve information identified by the URI
- Typically used to retrieve an HTML document
- HEAD: retrieve meta-information about the URI
- used to find out if a document has changed
- POST: send information to a URI and retrieve
the result
- used to submit a form
10More Methods
- PUT: store information in the location named by the URI
- DELETE: remove the entity identified by the URI
- TRACE: used to trace HTTP forwarding through
proxies, tunnels, etc.
- OPTIONS: used to determine the capabilities of the
server, or characteristics of a named resource
11The Header Lines
- Request headers provide information to the server
about the client
- what kind of client
- what kind of content will be accepted
- who is making the request
- Each header line contains
- an attribute name followed by a ":" followed by a
space and the attribute value
- HTTP 1.1 requires a Host header
12End of the Headers
- Each header ends with a CRLF ( \r\n )
- The end of the header section is marked with a
blank line
- just CRLF
- For GET and HEAD requests, the end of the headers
is the end of the request!
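The header rules above (name, colon, space, value, CRLF after each line, and a blank line to finish) can be sketched as a pair of helper functions. The names `build_request` and `parse_head` are hypothetical, and the host and path are just sample values.

```python
def build_request(method: str, uri: str, headers: dict[str, str]) -> bytes:
    """Build a request with no body, such as GET or HEAD."""
    lines = [f"{method} {uri} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Each line ends with CRLF; one extra CRLF (a blank line)
    # marks the end of the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_head(message: bytes) -> tuple[str, dict[str, str], bytes]:
    """Return (request line, headers, whatever follows the blank line)."""
    head, separator, rest = message.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("no blank line: header section is incomplete")
    request_line, *header_lines = head.decode("ascii").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return request_line, headers, rest


message = build_request("GET", "/mgunes/cpe401", {"Host": "www.cse.unr.edu"})
print(message)
print(parse_head(message))
```

Header names are lower-cased on parsing because HTTP header names are case-insensitive.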
13HTTP request message format
14POST
- A POST request includes some content (some data)
after the headers
- after the blank line
- There is no format for the data
- just raw bytes
- A POST request must include a Content-Length line
in the headers
- Content-Length: 267
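A rough sketch of how Content-Length ties the headers to the body: the sender counts the body bytes, and the receiver reads exactly that many bytes after the blank line. The helper names, the URI and the form data below are made up for illustration.

```python
def build_post(uri: str, host: str, body: bytes) -> bytes:
    """Build a POST request whose Content-Length matches the body."""
    head = (
        f"POST {uri} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def read_body(message: bytes) -> bytes:
    """Return exactly Content-Length bytes that follow the blank line."""
    head, _, rest = message.partition(b"\r\n\r\n")
    for line in head.decode("ascii").split("\r\n")[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-length":
            return rest[: int(value)]
    raise ValueError("POST request without Content-Length")


post = build_post("/cgi-bin/login", "www.cse.unr.edu", b"user=mgunes")
print(post)
print(read_body(post))
```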
15HTTP Response
- ASCII Status Line
- Headers Section
- Content can be anything
- not just text
- typically an HTML document or some kind of image
16Response Status Line
- HTTP-Version Status-Code Message
- Status Code is a 3-digit number (for computers)
- 1xx Informational
- 2xx Success
- 3xx Redirection
- 4xx Client Error
- 5xx Server Error
- Message is text (for humans)
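The status line can be split the same way as the request line, with the first digit of the code selecting the class. This is a minimal sketch; `parse_status_line` and `STATUS_CLASSES` are names chosen for the example.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def parse_status_line(line: str) -> tuple[str, int, str, str]:
    """Return (version, status code, message, status class)."""
    # Split into at most 3 parts so the message may contain spaces.
    version, code, message = line.rstrip("\r\n").split(" ", 2)
    if len(code) != 3 or not code.isdigit():
        raise ValueError("status code must be a 3-digit number")
    status_class = STATUS_CLASSES.get(int(code[0]), "Unknown")
    return version, int(code), message, status_class


print(parse_status_line("HTTP/1.0 304 Not Modified\r\n"))
```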
17Response Headers
- Provide the client with information about the
returned entity (document)
- what kind of document
- how big the document is
- how the document is encoded
- when the document was last modified
- Response headers end with a blank line
18Content
- Content can be anything
- sequence of raw bytes
- Content-Length header is required for any
response that includes content
- Content-Type header is also required
19Single Request/Reply
- The client sends a complete request
- The server sends back the entire reply
- The server closes its socket
- If the client needs another document it must open
a new connection
This was the default for HTTP 1.0
20Persistent Connections
- HTTP 1.1 supports persistent connections
- this is the default
- Multiple requests can be handled over a single
TCP connection
- The Connection header is used to exchange
information about persistence (HTTP/1.1)
- 1.0 clients used a Keep-alive header
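One way a server might decide whether to keep the TCP connection open after replying is sketched below. It assumes the common conventions: HTTP/1.1 stays open unless the client sends `Connection: close`, and HTTP/1.0 closes unless the client asks for keep-alive (commonly sent as `Connection: Keep-Alive`). The function name is hypothetical.

```python
def keep_connection_open(version: str, headers: dict[str, str]) -> bool:
    """Decide persistence from the HTTP version and the Connection header.

    `headers` is assumed to use lower-case header names.
    """
    tokens = {t.strip().lower() for t in headers.get("connection", "").split(",")}
    if version == "HTTP/1.1":
        # Persistent by default; the client must opt out.
        return "close" not in tokens
    # HTTP/1.0: single request/reply by default; the client must opt in.
    return "keep-alive" in tokens


print(keep_connection_open("HTTP/1.1", {}))
print(keep_connection_open("HTTP/1.0", {}))
```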
21User-server state: cookies
- Four components
- 1) cookie header line in the HTTP response message
- 2) cookie header line in the HTTP request message
- 3) cookie file kept on the user's host, managed by
the user's browser
- 4) back-end database at the Web site
- Cookies and privacy
- cookies permit sites to learn a lot about you
- you may supply name and e-mail to sites
22Cookies: keeping state
[Figure: cookie exchange between client and server, showing the client's cookie file, the server's backend database, and a return visit one week later]
23Cookies (continued)
- What cookies can bring
- authorization
- shopping carts
- recommendations
- user session state (Web e-mail)
- How to keep state
- protocol endpoints: maintain state at
sender/receiver over multiple transactions
- cookies: HTTP messages carry state
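The four cookie components can be walked through with Python's standard `http.cookies` module. This is a sketch only: the cookie name `id`, the value `1678`, the dictionary standing in for the browser's cookie file and the dictionary standing in for the back-end database are all invented for the example.

```python
from http.cookies import SimpleCookie

# 1) Server: cookie header line in the HTTP response message.
response_cookie = SimpleCookie()
response_cookie["id"] = "1678"
set_cookie_line = response_cookie.output()
print(set_cookie_line)

# 3) Client: the browser stores the cookie (a dict stands in for the file).
cookie_file = {}
received = SimpleCookie()
received.load(set_cookie_line.split(":", 1)[1].strip())
for name, morsel in received.items():
    cookie_file[name] = morsel.value

# 2) Client: cookie header line in a later HTTP request message.
request_cookie_line = "Cookie: " + "; ".join(
    f"{name}={value}" for name, value in cookie_file.items()
)
print(request_cookie_line)

# 4) Server: the cookie value is the key into the back-end database.
backend_database = {"1678": {"cart": ["book"]}}
sent_back = SimpleCookie()
sent_back.load(request_cookie_line.split(":", 1)[1].strip())
state = backend_database[sent_back["id"].value]
print(state)
```

The HTTP exchange itself stays stateless; the state lives in the cookie file and the database, and the messages only carry the key.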
24Web caches (proxy server)
Goal: satisfy client request without involving
origin server
- user sets browser: Web accesses via cache
- browser sends all HTTP requests to cache
- if object in cache: cache returns object
- else cache requests object from origin server,
then returns object to client
[Figure: two clients send requests to the proxy server, which sits between them and two origin servers]
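The cache behaviour described on this slide amounts to a lookup with a fallback to the origin server. A minimal sketch follows; the class `WebCache`, the `fetch_from_origin` callback and the sample URL are assumptions, and a real proxy would also handle expiry and HTTP headers.

```python
class WebCache:
    """Toy proxy cache: serve from the store, else ask the origin server."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin  # callable: url -> bytes
        self.store = {}
        self.origin_requests = 0

    def get(self, url: str) -> bytes:
        if url in self.store:
            # Object in cache: return it without involving the origin server.
            return self.store[url]
        # Otherwise request the object from the origin server,
        # keep a copy, then return it to the client.
        self.origin_requests += 1
        obj = self.fetch_from_origin(url)
        self.store[url] = obj
        return obj


def fake_origin(url: str) -> bytes:
    # Stand-in for a real HTTP request to the origin server.
    return f"contents of {url}".encode("ascii")


cache = WebCache(fake_origin)
first = cache.get("http://www.cse.unr.edu/index.html")
second = cache.get("http://www.cse.unr.edu/index.html")
print(cache.origin_requests)
```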
25More about Web caching
- cache acts as both client and server
- typically cache is installed by ISP
- university, company, residential ISP
- Why Web caching?
- reduce response time for client request
- reduce traffic on an institution's access link
- an Internet dense with caches enables "poor"
content providers to effectively deliver content
(but so does P2P file sharing)
26Conditional GET
- Goal: don't send the object if the cache has an
up-to-date cached version
- cache: specify the date of the cached copy in the HTTP
request
- If-modified-since: <date>
- server: response contains no object if the cached
copy is up-to-date
- HTTP/1.0 304 Not Modified
[Figure: two exchanges between cache and server. Object not modified: the cache sends an HTTP request msg with If-modified-since: <date>, and the server replies without the object. Object modified: the cache sends an HTTP request msg with If-modified-since: <date>, and the server sends HTTP response HTTP/1.0 200 OK <data>.]
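The server side of a conditional GET can be sketched as a date comparison. The function name `conditional_get`, the sample timestamp and the sample body are assumptions; the date string is produced and parsed with Python's standard `email.utils` helpers.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(request_headers: dict, last_modified: datetime, body: bytes):
    """Return (status line, content) for a GET that may be conditional."""
    since = request_headers.get("if-modified-since")
    if since is not None and last_modified <= parsedate_to_datetime(since):
        # Cached copy is up to date: reply without the object.
        return "HTTP/1.0 304 Not Modified", b""
    # No date given, or the object changed since that date: send it.
    return "HTTP/1.0 200 OK", body


last_modified = datetime(2015, 9, 9, 9, 23, 24, tzinfo=timezone.utc)
cached_copy_date = format_datetime(last_modified, usegmt=True)
page = b"<html>...</html>"

# Object not modified since the cache fetched it.
print(conditional_get({"if-modified-since": cached_copy_date}, last_modified, page))

# Object modified one day after the cached copy was fetched.
print(conditional_get(
    {"if-modified-since": cached_copy_date},
    last_modified + timedelta(days=1),
    page,
))
```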
27Lecture 20: Dynamic Web Servers
- CPE 401 / 601
- Computer Network Systems
slides are modified from Dave Hollinger
28Web Server
- Talks HTTP
- Looks at METHOD, URI to determine what the client
wants.
- For GET, URI often is just the path of a file
- relative to some directory on the web server
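Mapping the URI of a GET onto a file under the server's document directory might look like the sketch below. The directory `/var/www` and the function name `uri_to_path` are assumptions; the normalization step is there so that `..` segments cannot climb out of that directory.

```python
import posixpath


def uri_to_path(document_root: str, uri_path: str) -> str:
    """Map the path part of a URI to a file path under document_root."""
    # Normalize as an absolute path: "." and ".." segments are resolved,
    # and ".." can never go above the root "/".
    normalized = posixpath.normpath("/" + uri_path.lstrip("/"))
    return posixpath.join(document_root, normalized.lstrip("/"))


print(uri_to_path("/var/www", "/foo/blah"))
print(uri_to_path("/var/www", "/../../etc/passwd"))
```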
29GET /foo/blah
[Figure: directory tree on the web server, from the root / down to the file blah]
30Dynamic Documents
- Dynamic Documents can provide
- automation of web site maintenance
- customized advertising
- database access
- shopping carts
- date and time service
31Web Programming
- Writing programs that create dynamic documents
has become very important
- There are a number of general approaches
- Create a custom server for each service desired
- Each is available on a different port.
- Develop a really smart web server
- Server Side Includes, scripting, server APIs
- Have the web server run external programs
32Custom Server
- Write a TCP server that watches a well-known
port for requests
- Develop a mapping from HTTP requests to service
requests
- Send back HTML (or whatever) that is
created/selected by the server process
- Have to handle HTTP errors, headers, etc.
33Drawbacks to Custom Server Approach
- We might have lots of ideas for custom services
- Each requires dedicated address (port)
- Each needs to include
- basic TCP server code
- parsing HTTP requests
- error handling
- headers
- access control
34Smart Web Server
- Take a general purpose Web server (that can
handle static documents) and
- have it process requested documents as it sends
them to the client
- The documents could contain commands that the
server understands
- the server includes some kind of interpreter
35Example Smart Server
- Have the server read each HTML file as it sends
it to the client
- The server could look for this
- <SERVERCODE> some command </SERVERCODE>
- The server doesn't send this part to the client,
instead it interprets the command and sends the
result to the client
- Everything else is sent normally
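The idea of this example smart server can be sketched with a regular expression that replaces each SERVERCODE section by the result of its command. The SERVERCODE tag is the illustrative one from this slide; the command name `hits` and the table of command handlers are invented for the sketch.

```python
import re

SERVERCODE = re.compile(r"<SERVERCODE>(.*?)</SERVERCODE>", re.DOTALL)


def render(html: str, commands: dict) -> str:
    """Replace every SERVERCODE section with the output of its command."""

    def run(match: re.Match) -> str:
        name = match.group(1).strip()
        handler = commands.get(name)
        # Unknown commands produce nothing; the tag itself is never sent.
        return handler() if handler else ""

    # Text outside the SERVERCODE sections is passed through unchanged.
    return SERVERCODE.sub(run, html)


page = "<p>Visitors so far: <SERVERCODE> hits </SERVERCODE></p>"
print(render(page, {"hits": lambda: "42"}))
```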
36Server Side Includes
- Server Side Includes (SSI) provides a set of
commands that a server will interpret
- Typically the server is configured to look for
commands only in specially marked documents
- so normal documents aren't slowed down
- SSI commands are called directives
- Directives are embedded in HTML comments
37SSI Directives
- A comment looks like this
- <!-- this is an HTML comment -->
- A directive looks like this
- <!--#command parameter="arg"-->
- SSI servers keep a number of useful things in
environment variables
- DOCUMENT_NAME, DOCUMENT_URL
- echo inserts the value of an environment
variable into the page
- This page is located at <!--#echo var="DOCUMENT_URL"-->
38SSI Directives
- include inserts the contents of a text file.
- <!--#include file="banner.html"-->
- flastmod inserts the time and date that a file
was last modified.
- Last modified <!--#flastmod file="foo.html"-->
- exec runs an external program and inserts the
output of the program.
- Current users <!--#exec cmd="/usr/bin/who"-->
Danger! Danger! Danger!
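A toy SSI processor for the echo and include directives is sketched below, assuming the usual `<!--#command parameter="arg"-->` directive syntax. The function name, the sample variable value and the in-memory `files` dictionary (standing in for files on disk) are assumptions. The exec directive is deliberately left unimplemented, since running commands named inside a document is the dangerous part.

```python
import re

DIRECTIVE = re.compile(r'<!--#(\w+)\s+(\w+)="([^"]*)"\s*-->')


def expand_ssi(page: str, environment: dict, files: dict) -> str:
    """Expand echo and include directives; drop every other directive."""

    def expand(match: re.Match) -> str:
        command, parameter, arg = match.groups()
        if command == "echo" and parameter == "var":
            return environment.get(arg, "")
        if command == "include" and parameter == "file":
            return files.get(arg, "")
        # exec and unknown directives are not run in this sketch.
        return ""

    # Ordinary HTML comments do not start with "<!--#" and are left alone.
    return DIRECTIVE.sub(expand, page)


page = 'This page is located at <!--#echo var="DOCUMENT_URL"-->'
print(expand_ssi(page, {"DOCUMENT_URL": "/index.shtml"}, {}))
```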
39More Power
- Some servers support elaborate scripting
languages
- Scripts are embedded in HTML documents, the
server interprets the script
- Microsoft Active Server Pages (ASP)
- JScript, VBScript, PerlScript
- Netscape LiveWire
- JavaScript, SQL connection library.
- Many others
40Server Mapping and APIs
- Some servers include a programming interface that
allows you to extend the capabilities of the server
by writing modules
- Specific URLs are mapped to specific modules
instead of to files
41External Programs
- Another approach is to provide a standard
interface between external programs and web
servers
- We can run the same program from any web server
- The web server handles all the HTTP,
- we focus on the special service only
- It doesn't matter what language we use to write
the external program
42Common Gateway Interface
- CGI is a standard interface to external programs
supported by most (if not all) web servers
- CGI programs are often written in scripting
languages (perl, tcl, etc.)
- The interface that is defined by CGI includes
- Identification of the service (i.e., external
program)
- Mechanism for passing the request to the external
program
43Common Gateway Interface
- CGI is a standard mechanism for
- Associating URLs with programs that can be run by
a web server
- A protocol (of sorts) for how the request is
passed to the external program
- How the external program sends the response to
the client
44CGI Programming
[Figure: the CLIENT sends an http request to the HTTP SERVER, which uses setenv(), dup(), fork(), exec(), ... to run the CGI Program and then returns the http response to the CLIENT]
45CGI URLs
- There is a mapping between URLs and CGI programs
provided by the web server
- The exact mapping is not standardized
- the web server admin can set it up
- Typically
- requests that start with /CGI-BIN/, /cgi-bin/ or
/cgi/, etc. refer to CGI programs,
not to static documents
46HTTP Server - CGI Interaction
[Figure: the HTTP SERVER passes the request to the CGI Program through environment variables and stdin, and reads the CGI Program's output from stdout]
47Environment Variables
- The web server sets some environment variables
with information about the request
- The web server fork()s and the child process
exec()s the CGI program
- The CGI program gets information about the
request from environment variables
48STDIN, STDOUT
- Before calling exec(), the child process sets up
pipes so that
- stdin comes from the web server and
- stdout goes to the web server
- In some cases part of the request is read from
stdin
- Anything written to stdout is forwarded by the
web server to the client
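The server side of this interaction can be approximated in Python with `subprocess.run`, which creates the child process and sets up the stdin/stdout pipes. This is a sketch rather than a real web server: the CGI program here is a small inline Python script invented for the example, and only three of the standard CGI variables (REQUEST_METHOD, QUERY_STRING, CONTENT_LENGTH) are set.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for a CGI program: it reads the request body from
# stdin, reads request details from environment variables, and writes
# headers, a blank line and content to stdout.
CGI_PROGRAM = (
    "import os, sys\n"
    "body = sys.stdin.read()\n"
    "print('Content-Type: text/plain')\n"
    "print()\n"
    "print(os.environ['REQUEST_METHOD'], os.environ['QUERY_STRING'], body)\n"
)


def run_cgi(method: str, query_string: str, body: bytes) -> bytes:
    """Run the CGI program and return everything it wrote to stdout."""
    env = dict(
        os.environ,
        REQUEST_METHOD=method,
        QUERY_STRING=query_string,
        CONTENT_LENGTH=str(len(body)),
    )
    result = subprocess.run(
        [sys.executable, "-c", CGI_PROGRAM],
        input=body,              # becomes the child's stdin
        stdout=subprocess.PIPE,  # the child's stdout comes back to us
        env=env,
        check=True,
    )
    return result.stdout


output = run_cgi("POST", "mgunes", b"hello")
print(output)
```

A real server would forward `output` to the client after adding the status line.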
49Request Method: GET
- GET requests can include a query string as part
of the URL
- GET /cgi-bin/login?mgunes HTTP/1.0
- Request Method: GET
- Resource Name: /cgi-bin/login
- Delimiter: ?
- Query String: mgunes
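Splitting the example request above into its labelled parts takes two steps: split the request line on spaces, then split the URI at the first "?". The function name `split_request_uri` is an assumption made for this sketch.

```python
def split_request_uri(uri: str) -> tuple[str, str]:
    """Split a request URI into (resource name, query string)."""
    # The first "?" is the delimiter; everything after it is the query string.
    resource, _delimiter, query = uri.partition("?")
    return resource, query


method, uri, version = "GET /cgi-bin/login?mgunes HTTP/1.0".split(" ")
resource, query = split_request_uri(uri)
print(method, resource, query)
```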