Title: Hypertext Transfer Protocol (HTTP)
1Hypertext Transfer Protocol (HTTP)
- http://www.w3c.org/Protocols/HTTP
2Outline
- HTTP 0.9, HTTP/1.0
- HTTP/1.1 overview
- HTTP-NG overview
- HTTP Extension Framework - an extension mechanism for HTTP/1.1
- WebMux - a simple transport multiplexing protocol
3Static HTTP connection
4Listen ... port 80.
a. Click anchor <A href=http://www.csie.ncnu.edu.tw:80/hychen/index.html>
b. Find and set up connection to www.csie.ncnu.edu.tw
5a. Look for /hychen/index.html
c. Send file (index.html) to client
d. Break connection
6Dynamic HTTP connection
7Listen ...
a. Submit <form action=cgi-bin/add method=GET>
b. Find and set up connection to www.csie.ncnu.edu.tw
8cgi-bin
add
9HTTP 1.0 Hypertext Transfer Protocol
10HTTP Usage
- HTTP is the protocol that supports communication between web browsers and web servers.
- A Web Server is an HTTP server
- We will look at HTTP Version 1.0
11From the RFC
- HTTP is an application-level protocol with the
lightness and speed necessary for distributed,
hypermedia information systems.
12Transport Independence
- The RFC states that the HTTP protocol generally
takes place over a TCP connection, but the HTTP
protocol itself is not dependent on a specific
transport layer.
13Request - Response
- HTTP has a simple structure
- client sends a request
- server returns a reply.
- HTTP can support multiple request-reply exchanges
over a single TCP connection, but this is a
special case.
14Well Known Address
- The well known TCP port for HTTP servers is port 80.
- Other ports can be used as well...
15HTTP Versions
- The original version now goes by the name HTTP Version 0.9
- HTTP 0.9 was used for many years.
- Jan. 1992 to 1996
- Starting with HTTP 1.0 the version number is part of every request.
- HTTP is still changing...
16HTTP 1.0 Request
- Lines of text (ASCII).
- Lines end with CRLF \r\n
- First line is called Request-Line
17Request Line
- Method URI HTTP-Version \r\n
- The request line contains 3 tokens (words).
- space characters separate the tokens.
- Newline seems to work by itself (but the protocol
requires CRLF)
18Request Method
- The Request Method can be
- GET HEAD PUT
- POST DELETE LINK
- UNLINK
- future expansion allowed
19Methods
- GET: retrieve information identified by the URI.
- HEAD: retrieve meta-information about the URI.
- POST: send information to a URI and retrieve the result (used in forms for CGI applications).
20Methods (cont.)
- PUT: store information in the location named by the URI.
- DELETE: remove the entity identified by the URI.
- LINK, UNLINK: create/destroy a link relationship??
21Common Usage
- GET, HEAD and POST are supported everywhere.
- HTTP 1.1 servers often support PUT, DELETE, OPTIONS, and TRACE.
22URI: Universal Resource Identifier
- URIs defined in RFC 1630 (1994).
- URI is a superset of URL and URN.
- Full URI: proto://hostname/path (identifies the server)
- http://www.csie.ncnu.edu.tw:80/hychen/
- Partial URI: /path (no server mentioned)
- /hychen/
23URI Usage
- When dealing with an HTTP server, only a partial URI is used.
- When dealing with a proxy HTTP server, a full URI is used.
- The client has to tell the proxy where to get the document!
- More on proxy servers in a bit.
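The partial versus full URI rule above can be illustrated with a minimal sketch. The helper name `request_target` and the `via_proxy` flag are assumptions made for this example; only `urllib.parse.urlsplit` comes from the Python standard library.

```python
from urllib.parse import urlsplit


def request_target(url: str, via_proxy: bool) -> str:
    """Return the URI to place on the request line (hypothetical helper)."""
    if via_proxy:
        # Full URI: the proxy must be told which server holds the document.
        return url
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Partial URI: the origin server does not need its own name repeated.
    return path


url = "http://www.csie.ncnu.edu.tw:80/hychen/"
print("GET " + request_target(url, via_proxy=False) + " HTTP/1.0")
print("GET " + request_target(url, via_proxy=True) + " HTTP/1.0")
```

The first line printed is what an origin server would receive; the second is what the same client would send to a proxy.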
24HTTP Version Number
- HTTP/1.0 or HTTP/1.1
- HTTP 0.9 did not include a version number in a request line.
- If a server gets a request line with no HTTP version number, it assumes 0.9
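As a rough sketch of how a server might read the request line, the function below splits it into its tokens and falls back to HTTP 0.9 when the version is missing. The function name `parse_request_line` is an assumption for illustration, and real servers validate far more than this.

```python
def parse_request_line(line: str):
    """Split a request line into (method, uri, version)."""
    # Strip CRLF, but also tolerate a bare newline as lenient servers do.
    tokens = line.rstrip("\r\n").split()
    if len(tokens) == 3:
        method, uri, version = tokens
    elif len(tokens) == 2:
        # No version token on the line: assume an HTTP 0.9 request.
        method, uri = tokens
        version = "HTTP/0.9"
    else:
        raise ValueError("malformed request line: %r" % line)
    return method, uri, version


print(parse_request_line("GET /hychen/ HTTP/1.0\r\n"))
print(parse_request_line("GET /hychen/\n"))
```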
25The Header Lines
- After the Request-Line come a number of HTTP headers.
- Each header line contains an attribute name followed by a ":" followed by the attribute value.
26Headers
- Request Headers provide information to the server about the client
- what kind of client
- what kind of content will be accepted
- who is making the request
- There can be 0 headers!
27Example HTTP Headers
- Accept: text/html
- From: hychen@csie.ncnu.edu.tw
- User-Agent: Netscape 4.7
- Referer: http://www.csie.ncnu.edu.tw/hychen
28End of the Headers
- Each header ends with a CRLF
- The end of the header section is marked with a blank line
- \r\n\r\n
- For GET and HEAD requests the end of the headers
is the end of the request!
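The blank-line rule can be sketched in a few lines. This is a minimal illustration, assuming the whole request is already in memory as bytes; `split_request` is a hypothetical name, and header names are lower-cased here purely for convenient lookup.

```python
def split_request(raw: bytes):
    """Separate request line, headers, and body at the first blank line."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line yet: the header section is incomplete")
    lines = head.decode("ascii").split("\r\n")
    request_line = lines[0]
    headers = {}
    for line in lines[1:]:  # may be empty: a request can carry 0 headers
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return request_line, headers, body


raw = (b"GET /hychen/index.html HTTP/1.0\r\n"
       b"Accept: text/html\r\n"
       b"User-Agent: Netscape 4.7\r\n"
       b"\r\n")
print(split_request(raw))
```

For a GET or HEAD request the returned body is empty, which matches the rule that the end of the headers is the end of the request.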
29POST
- A POST request includes some data after the headers (after the blank line).
- There is no format for the data (just raw bytes).
- A POST request must include a Content-Length line in the headers
- Content-Length: 267
- (this information is provided by the browser)
30Example Request
- GET /hychen/testanswers.html HTTP/1.0
- Accept: */*
- User-Agent: Internet Explorer
- From: cheater@cheaters.org
- Referer: http://foo.com/
31Example Post
- POST /CGI-BIN/add_appointments HTTP/1.0
- Accept: */*
- User-Agent: Internet Explorer
- Content-Length: 34
- 1220surgery0110doom0320bypass
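A minimal sketch of how a client might assemble such a POST, computing Content-Length from the body rather than typing it by hand. The function name `build_post`, the default User-Agent string, and the sample body are all assumptions for illustration.

```python
def build_post(path: str, body: bytes, user_agent: str = "example-client") -> bytes:
    """Assemble an HTTP/1.0 POST request with a correct Content-Length."""
    head = (
        f"POST {path} HTTP/1.0\r\n"
        "Accept: */*\r\n"
        f"User-Agent: {user_agent}\r\n"
        f"Content-Length: {len(body)}\r\n"  # counted in bytes, not characters
        "\r\n"                               # blank line ends the headers
    )
    return head.encode("ascii") + body


print(build_post("/cgi-bin/add", b"a=1&b=2").decode("ascii"))
```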
32Typical Method Usage
- GET: used to retrieve an HTML document.
- HEAD: used to find out if a document has changed.
- POST: used to submit a form.
33HTTP Response
- ASCII Status Line
- Headers Section
- Content can be anything (not just text)
- typically is HTML document
34Response Status Line
- HTTP-Version Status-Code Message
- Status Code is 3 digit number (for computers)
- Message is text (for humans)
35Status Codes
- 1xx Informational
- 2xx Success
- 3xx Redirection
- 4xx Client Error
- 5xx Server Error
36Example Status Lines
- HTTP/1.0 200 OK
- HTTP/1.0 301 Moved Permanently
- HTTP/1.0 400 Bad Request
- HTTP/1.0 500 Internal Server Error
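The status line format and the five status classes can be tied together in a short sketch. The name `parse_status_line` is hypothetical; the class names are the ones listed on the Status Codes slide.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def parse_status_line(line: str):
    """Return (version, code, message, class name) for a status line."""
    # Split on the first two spaces only: the message may contain spaces.
    version, code, message = line.rstrip("\r\n").split(" ", 2)
    code = int(code)
    return version, code, message, STATUS_CLASSES.get(code // 100, "Unknown")


for line in ("HTTP/1.0 200 OK", "HTTP/1.0 500 Internal Server Error"):
    print(parse_status_line(line))
```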
37Response Headers
- Provide the client with information about the returned entity (document).
- what kind of document
- how big the document is
- how the document is encoded
- when the document was last modified
- Response headers end with blank line
38Response Header Examples
- Date: Thu, 27 Jan 2000 12:48:17 EST
- Server: Apache/1.17
- Content-Type: text/html
- Content-Length: 1756
- Content-Encoding: gzip
39Content
- Content can be anything (sequence of raw bytes).
- Content-Length header is required for any response that includes content.
- Content-Type header also required
40Try it with telnet
- > telnet www.csie.ncnu.edu.tw 80
- GET / HTTP/1.0 (request)
- (blank line: end of headers)
- HTTP/1.0 200 OK (response)
- Server: Apache
- ...
41HTTP 1.0
- Stateless, request-response protocol
- Trial
- telnet www.ncnu.edu.tw 80
- Trying 163.22.3.4...
- Connected to moon.ncnu.edu.tw.
- Escape character is '^]'.
- GET /index.html HTTP/1.0
- (response data ...)
42Continuing
- HTTP/1.1 200 OK
- Date: Mon, 29 Oct 2001 05:57:09 GMT
- Server: Apache/1.3.19 (Unix)
- Last-Modified: Tue, 25 Jul 2000 07:18:49 GMT
- ETag: "2c68f-81-397d3f59"
- Accept-Ranges: bytes
- Content-Length: 129
- Connection: close
- Content-Type: text/html
- <html>
- <head>
- <meta http-equiv=refresh content=1;url="http://163.22.4.67">
- </head>
- </html>
43HTTP/1.0 other features
- POST
- Client can send information to server
- Allows forms
- If-Modified-Since request header
- Client says: I have old data, give me new data or tell me I'm okay
- Expires response header
- Server can set data to time out
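The If-Modified-Since exchange can be sketched as a server-side decision between 200 (send new data) and 304 Not Modified ("you're okay"). This is an illustrative sketch: `conditional_status` is a hypothetical name, and it assumes both dates arrive in the usual HTTP date format with a GMT zone.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: str, if_modified_since: Optional[str] = None) -> int:
    """Pick the status code for a GET that may carry If-Modified-Since."""
    if if_modified_since is None:
        return 200  # plain GET: always send the document
    modified = parsedate_to_datetime(last_modified)
    cached = parsedate_to_datetime(if_modified_since)
    # Newer on the server: send it again. Otherwise the client copy is fine.
    return 200 if modified > cached else 304


last_modified = "Tue, 25 Jul 2000 07:18:49 GMT"
print(conditional_status(last_modified, "Mon, 29 Oct 2001 05:57:09 GMT"))
print(conditional_status(last_modified, "Sat, 01 Jan 2000 00:00:00 GMT"))
```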
44HTTP/1.0 Authentication
- Basic authentication
- When challenged, the client sends user-id and password in plain text to the server
- Not at all secure (snooping is easy), but widely used for simple things
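To show why snooping is easy, the sketch below builds a Basic Authorization header and then reverses it. The credentials are base64-encoded, not encrypted, so anyone who sees the header can recover them. The function names are assumptions for illustration; the user name and password are the well-known example pair from the HTTP authentication RFC.

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build the Authorization header a client sends when challenged."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Authorization: Basic " + token


def snoop(header: str):
    """Recover (user, password) from a captured header: no key is needed."""
    token = header.split(" ", 2)[2]
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


header = basic_auth_header("Aladdin", "open sesame")
print(header)
print(snoop(header))
```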
45HTTP caching
- Proxy site between client and server
- Hopefully reduces client time, long-distance
bandwidth
[Figure: without a proxy, the browser sends long-distance requests to the HTTP server, and these are slow. With a proxy cache, the browser sends short, fast requests to the local proxy, and hopefully fewer long, slow requests reach the HTTP server.]
46Extensions: Secure Socket Layer
- A proprietary extension to HTTP/1.0
- Uses public key encryption to establish an encrypted (secured) channel
47HTTP Quick overview
48Web Technologies
[Figure: web technologies arranged by function against interactive responsiveness, in three stages: Hypertext Web (E-Publishing), Simple Response Web (Fill-in Forms), Object Web (Full-Blown Client/Server)]
- JavaBeans/Applets
- ActiveX Controls
- Application Servers and OTMs
- ORB-Based interactions via CORBA or DCOM
- Shippable Places
- Object based documents: XML, DOM and XSL
- Dynamic HTML
- Scripts
- Cookies/Sessions
- Active Server Pages (ASPs)
- CORBA plug-ins (WAI)
- Push
- WebObjects
- Servlets
- Forms
- CGI
- Tables
- ISAPI
- NSAPI
49Web Application Servers
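[Figure: web application servers. A web browser with HTML forms reaches a server using HTTP over TCP/IP across the Internet; the server provides HTML documents and runs CGI applications. A web browser running a Java client reaches a server through middleware.]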
50HTTP Request
HTTP Request Syntax:
<method> <resource identifier> <HTTP version><crlf> (request line)
<Header>: <value><crlf> (request header fields)
<Header>: <value><crlf>
<crlf> (blank line)
entity body
Example:
GET /path/file.html HTTP/1.0 (request line)
Accept: text/html (request header fields)
Accept: audio/x
User-agent: MacWeb
51HTTP Response
HTTP Response Syntax:
<HTTP version> <result code> <explanation><crlf> (status line)
<Header>: <value><crlf> (header fields)
<Header>: <value><crlf>
<crlf> (blank line)
entity body
Example:
HTTP/1.0 200 OK (status line)
Server: Apache/1.1 (header fields)
MIME-Version: 1.0
Content-Type: text/html
Content-Length: 2000
<HTML> <HEAD><TITLE> ... (entity body, i.e. the HTML document)
52URI, URN & URL
- Uniform Resource Identifier (URI)
- Uniform Resource Name (URN) and Uniform Resource Locator (URL)
- URNs are meant to be persistent
- URL syntax: protocol://username:passwd@hostname:port/path/subdirs/resource?param1=value1&param2=value2
- protocol: Protocol Scheme
- username:passwd: Identification
- hostname:port: Service Address
- /path/subdirs/resource: Target Resource
- ?param1=value1&param2=value2: Arguments
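The URL parts named above map closely onto what Python's standard `urllib.parse` returns. The sketch below is illustrative only: the host `www.example.com`, the credentials, and the helper name `describe_url` are made up for the example.

```python
from urllib.parse import urlsplit, parse_qs


def describe_url(url: str) -> dict:
    """Break a URL into the parts named on the slide."""
    parts = urlsplit(url)
    return {
        "protocol scheme": parts.scheme,
        "identification": (parts.username, parts.password),
        "service address": (parts.hostname, parts.port),
        "target resource": parts.path,
        "arguments": parse_qs(parts.query),
    }


url = "http://user:secret@www.example.com:8080/path/subdirs/resource?param1=value1&param2=value2"
for name, value in describe_url(url).items():
    print(name, "=", value)
```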
53Parameters passing
- With cookies in header
- With GET method
- Through URLs
- <A HREF=titi.php3?arg1=val1&arg2=val2>that link</A>
- Through forms
- <FORM ACTION=titi.php3><INPUT ...></FORM>
- With POST method
- Only through forms
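The difference between the two methods is where the parameters travel: in the URL for GET, in the entity body for POST. A minimal sketch follows, assuming the form-encoding convention `application/x-www-form-urlencoded`; the function names are hypothetical.

```python
from urllib.parse import urlencode


def get_request(path: str, params: dict) -> str:
    """GET: parameters are appended to the URL after '?'."""
    return f"GET {path}?{urlencode(params)} HTTP/1.0\r\n\r\n"


def post_request(path: str, params: dict) -> str:
    """POST: parameters are sent in the body, after the blank line."""
    body = urlencode(params)  # percent-encoded, so pure ASCII
    return (
        f"POST {path} HTTP/1.0\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
        f"{body}"
    )


params = {"arg1": "val1", "arg2": "val2"}
print(get_request("/titi.php3", params))
print(post_request("/titi.php3", params))
```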
54CGI-Model
[Figure: CGI model. Client: web browser (Submit). Server: web server, environment (variables), CGI program. The interaction is numbered in steps 1 to 10.]
55Interaction problem
- HTTP is connectionless
- Stateless (lacking persistence)
- No out-of-the-box user tracking
- Replacement solution
- Cookies
- HTTP-Authentication
- Hidden fields
56User tracking
- Authentication: password and user-id
- HTTP-AUTH
- Hidden field: token to be generated
- Cookie: idem
- Session id with expiration time
- Problems
- Password hidden
- Session ending (clearing identification)
57Support
- Apache included
- Apache cookie (!) validity browser session
- UNIQUE_ID Apache environment variable
- PHP support
- setCookie( name , value ,expiration)
58Definitions I
- Message
- The basic unit of HTTP communication, consisting of a structured sequence of octets matching the HTTP syntax and transmitted via the connection.
- Request
- An HTTP request message.
- Response
- An HTTP response message.
- Resource
- A network data object or service that can be identified by a URI. Resources may be available in multiple representations (e.g. multiple languages, data formats, sizes, resolutions) or vary in other ways.
59Definitions II
- Entity
- The information transferred as the payload of a request or response. An entity consists of metainformation in the form of entity-header fields and content in the form of an entity-body.
- Representation
- An entity included with a response that is subject to content negotiation. There may exist multiple representations associated with a particular response status.
- Content Negotiation
- The mechanism for selecting the appropriate representation when servicing a request. The representation of entities in any response can be negotiated (including error responses).
60Definitions III
- Variant
- A resource may have one, or more than one, representation(s) associated with it at any given instant. Each of these representations is termed a variant. Use of the term variant does not necessarily imply that the resource is subject to content negotiation.
- Client
- A program that establishes connections for the purpose of sending requests.
- User agent
- The client which initiates a request. These are often browsers, editors, spiders (web-traversing robots), or other end user tools.
61Definitions IV
- Server
- An application program that accepts connections in order to service requests by sending back responses. Any given program may be capable of being both a client and a server; these terms refer only to the role being performed by the program for a particular connection, rather than to the program's capabilities in general.
- Proxy
- An intermediary program which acts as both a server and a client for the purpose of making requests on behalf of other clients. Requests are serviced internally or by passing them on, with possible translation, to other servers. A proxy must implement both the client and server requirements.
62Definitions V
- Cache
- A program's local store of response messages and the subsystem that controls its message storage, retrieval, and deletion. A cache stores cachable responses in order to reduce the response time and network bandwidth consumption on future, equivalent requests. Any client or server may include a cache, though a cache cannot be used by a server that is acting as a tunnel.
- Cachable
- A response is cachable if a cache is allowed to store a copy of the response message for use in answering subsequent requests (see rules in ref.). Even if a resource is cachable, there may be additional constraints on whether a cache can use the cached copy for a particular request.
63Definitions VI
- Gateway
- A server which acts as an intermediary for some other server. Unlike a proxy, a gateway receives requests as if it were the origin server for the requested resource; the requesting client may not be aware that it is communicating with a gateway.
- Tunnel
- An intermediary program which is acting as a blind relay between two connections. Once active, a tunnel is not considered a party to the HTTP communication, though the tunnel may have been initiated by an HTTP request. The tunnel ceases to exist when both ends of the relayed connections are closed.
64HTTP 1.1 RFC 2068
65Problems of HTTP/1.0
- Each request requires a new connection
- Starting up a new connection is slow (TCP slow-start)
- Starting up connections takes several packets.
- Caching is not very flexible
- Primitive cache model
- Lack of support for partial transfer of entities
- Insecure basic authentication mechanism
- Virtual hosts (servers) require lots of IPs
- Assisted by DNS
66IP-based Virtual hosts
Before HTTP/1.1
Domain Name System: www.vh1.com -> 163.22.21.50, www.vh2.com -> 163.22.21.51
Server: one IP address per virtual host (163.22.21.50 and 163.22.21.51)
67HTTP/1.1 main features (1)
- Supporting Host header field
- Enables non-IP-based virtual hosts
- Reports an error if the Host field is missing
- Accepts absolute URLs in requests
- HTTP/1.0 does this only in requests to a proxy
- New request methods
- DELETE, OPTIONS, PUT, and TRACE
68HTTP/1.1 main features (2)
- Partial entity transfer
- Saves bandwidth
- Resumes an interrupted transfer
- Content negotiation
- Makes a selection between different representations of a resource
- Language, quality, encoding, or other parameters.
- Chunked encoding
- For applications with unknown content length (dynamically created content)
- Saves buffering time on the server side
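Of the features listed above, content negotiation lends itself to a small sketch: the client states its preferences with quality values, and the server picks the best representation it actually has. The code below is a simplified illustration that handles only exact matches and `q` parameters (no wildcards); the function names are assumptions, not part of any library.

```python
def parse_accept(header: str):
    """Parse 'da, en-gb;q=0.8, en;q=0.7' into [(value, q), ...]."""
    prefs = []
    for item in header.split(","):
        value, _, params = item.strip().partition(";")
        q = 1.0  # a missing q parameter means full preference
        for param in params.split(";"):
            name, _, val = param.strip().partition("=")
            if name == "q" and val:
                q = float(val)
        prefs.append((value.strip(), q))
    return prefs


def negotiate(header: str, available):
    """Return the available representation with the highest q, or None."""
    best, best_q = None, 0.0
    for value, q in parse_accept(header):
        if value in available and q > best_q:
            best, best_q = value, q
    return best


print(negotiate("da, en-gb;q=0.8, en;q=0.7", ["en", "fr"]))
```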
69HTTP/1.1 main features (3)
- Bandwidth optimization
- Persistent connections
- Pipelining
- More sophisticated support for caching
- More secure authentication scheme
- Digest access authentication (MD5)
- HTTP 1.0: insecure basic authentication
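The idea behind digest authentication is that the password itself never crosses the wire; the client sends an MD5 hash that mixes the password with a server-supplied nonce. The sketch below follows the original digest calculation without the later `qop` extensions; the realm, nonce, and credentials are invented for the example.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(user, realm, password, method, uri, nonce):
    """Compute the 'response' value a client sends back to the challenge."""
    ha1 = md5_hex(f"{user}:{realm}:{password}")  # who is asking
    ha2 = md5_hex(f"{method}:{uri}")             # what is being asked for
    return md5_hex(f"{ha1}:{nonce}:{ha2}")       # bound to this challenge


# Hypothetical values, for illustration only.
print(digest_response("hychen", "ncnu", "secret", "GET", "/index.html", "abc123"))
```

Because the nonce changes from challenge to challenge, a snooped response cannot simply be replayed against a later challenge, unlike a Basic header.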
70Non IP-based Virtual hosts
HTTP/1.1 introduces the Host header field in the HTTP request header
Domain Name System: www.vh1.com -> 163.22.21.50, www.vh2.com -> 163.22.21.50
Server: both names are served from the single address 163.22.21.50
71HTTP/1.1 Host header
- Problem: virtual servers use too many IP addresses in HTTP/1.0
- GET /index.html HTTP/1.0 doesn't specify the server name
- Solution: include the hostname in the request
- GET /index.html HTTP/1.1
- Host: www.csie.ncnu.edu.tw (required)
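A name-based virtual host lookup can be sketched as a dictionary keyed by the Host header. The document roots below are hypothetical, the host names reuse the ones from the virtual-host slides, and answering 404 for an unknown host is just one possible design choice.

```python
# Hypothetical document roots for two virtual hosts on one IP address.
VHOSTS = {
    "www.vh1.com": "/var/www/vh1",
    "www.vh2.com": "/var/www/vh2",
}


def document_path(headers: dict, path: str):
    """Return (status, file path) for an HTTP/1.1 request."""
    host = headers.get("Host")
    if host is None:
        return 400, None  # HTTP/1.1 requests must carry a Host header
    hostname = host.split(":")[0].lower()  # drop an optional ":port"
    root = VHOSTS.get(hostname)
    if root is None:
        return 404, None
    return 200, root + path


print(document_path({"Host": "www.vh1.com"}, "/index.html"))
print(document_path({"Host": "www.vh2.com:80"}, "/index.html"))
print(document_path({}, "/index.html"))
```

The same path, `/index.html`, resolves to different files depending only on the Host header, which is what lets both names share one IP address.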
72HTTP/1.1 Range request
- Support partial content (specified in bytes) requests
- Use Range: bytes=s1-e1,s2-e2
- Optional
- If-Range: validator tag (e.g. cc678-12d12-66394036)
- If-Unmodified-Since (e.g., Tue, 29 Oct 2002 10:50:20 GMT)
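How a server might interpret a byte-range request is sketched below. Byte positions are inclusive and start at 0; the sketch also accepts the open-ended form `s-` and the suffix form `-n` (last n bytes). The function names are assumptions, and range merging and error responses are left out.

```python
def parse_range(value: str, size: int):
    """Turn 'bytes=0-3,6-8' into a list of inclusive (start, end) pairs."""
    unit, _, spec = value.partition("=")
    if unit.strip() != "bytes":
        raise ValueError("only byte ranges are handled in this sketch")
    ranges = []
    for part in spec.split(","):
        start_s, _, end_s = part.strip().partition("-")
        if start_s == "":
            # Suffix form "-n": the last n bytes of the entity.
            start, end = max(size - int(end_s), 0), size - 1
        else:
            start = int(start_s)
            end = min(int(end_s), size - 1) if end_s else size - 1
        if start <= end:
            ranges.append((start, end))
    return ranges


def slice_ranges(data: bytes, value: str):
    """Return the requested pieces of data."""
    return [data[s:e + 1] for s, e in parse_range(value, len(data))]


print(slice_ranges(b"0123456789", "bytes=0-3,6-8"))
```

A client resuming an interrupted download would typically ask for `bytes=N-`, where N is the number of bytes it already holds.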
73HTTP/1.1 Persistent connections
- Problem: opening a new connection is slow
- Solution: send multiple requests over one connection
- GET /index.html HTTP/1.1
- Connection: keep-alive
- (response)
- GET /images/map.gif HTTP/1.1
- Connection: close
74HTTP Evolution
[Figure: connection open/close timelines compared for HTTP 1.0, HTTP 1.1, and HTTP 1.1 with pipelining]
75Chunked transfer coding
- In a persistent connection, the receiver must know where each message ends
- For most static resources, the server knows the Content-Length in advance
- However, in CGI (Common Gateway Interface) applications, the content length is unknown in advance
- HTTP/1.1 defines chunked transfer coding, which is specified in the Transfer-Encoding header
- Message layout: chunk size, chunk data, chunk size, chunk data, zero-size chunk, trailer
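The chunk layout can be made concrete with a small encoder and decoder. Each chunk is its size in hexadecimal, CRLF, the data, CRLF; a zero-size chunk ends the body. This is a simplified sketch: the function names are assumptions, and the optional trailer is written as empty and ignored on decode.

```python
def encode_chunked(pieces) -> bytes:
    """Encode an iterable of byte strings as a chunked body."""
    out = b""
    for piece in pieces:
        if piece:  # an empty piece would look like the final chunk
            out += f"{len(piece):x}\r\n".encode("ascii") + piece + b"\r\n"
    return out + b"0\r\n\r\n"  # zero-size chunk, then an empty trailer


def decode_chunked(data: bytes) -> bytes:
    """Reassemble the body from a chunked message."""
    body = b""
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        # The size line may carry ";extensions", which are ignored here.
        size = int(data[pos:eol].split(b";")[0].decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            return body
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF


wire = encode_chunked([b"Hello, ", b"world"])
print(wire)
print(decode_chunked(wire))
```

The sender never needs the total length up front, which is why this suits dynamically generated content on a persistent connection.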
76Cache model and Proxy
[Figure: browser, proxy, and HTTP server, with caches along the path]
77Content Negotiation
[Figure: the client sends a request; the server selects one of several entities and returns it in the response]
78Content Negotiation
[Figure: the client sends a request and receives a response, selects one of the available entities, then sends a second request and receives that entity in the response]
79Content Negotiation
- Transparent content negotiation
[Figure: client, proxy, and server; requests and responses pass through the proxy, and selection among entities is shown at both the proxy and the server]
80Security Issues
- Create a secure transmission
- Using a secure transport infrastructure: HTTP over SSL (Secure Socket Layer), i.e. HTTPS
- Using a secure application-level protocol: Secure HTTP (S-HTTP)
81SSL solution
[Figure: protocol stack. Application layer: HTTP, FTP, Telnet, others; below it SSL (Secure Socket Layer); below that the transport layer: TCP/IP]
82Secure HTTP (S-HTTP)
[Figure: protocol stack. Application layer: S-HTTP, HTTP, FTP, Telnet; below it the transport layer: TCP/IP]
83Cookies: State management
- HTTP is a stateless protocol
- A cookie is a piece of information exchanged between client and server, used to maintain state information
- RFC 2109, evolved from Netscape's initial specification
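84HTTP state management: cookies
[Figure: message flow between the client, the HTTP server (HTTP), and the application (CGI)]
- Client sends a request; the HTTP server forwards the request to the application
- Application generates a cookie and outputs it; the server sends the response with Set-Cookie
- Client sends the next request with Cookie; the server forwards the request and the cookie
- Application analyzes the cookie and produces its output; the server sends the response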
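The two halves of the cookie exchange, generating a Set-Cookie header and later analyzing the Cookie header that comes back, can be sketched with Python's standard `http.cookies` module. The cookie name `session`, its value, and the helper names are assumptions for illustration.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name: str, value: str) -> str:
    """Application side: generate the Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    return cookie.output(header="Set-Cookie:")


def read_cookie(cookie_header: str, name: str):
    """Application side, next request: analyze the Cookie request header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(name)
    return morsel.value if morsel is not None else None


print(make_set_cookie("session", "abc123"))
print(read_cookie("session=abc123; theme=dark", "session"))
```

The state lives in the value the client echoes back, so the protocol itself stays stateless.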
85Usage of HTTP
Order  Headers  Count  Percent
1      5        10844  41.73
2      6        6615   25.45
3      8        4008   15.42
4      3        2444   9.40
5      4        1047   4.03
6      7        909    3.50
7      9        50     0.19
8      2        46     0.18
9      0        20     0.08
10     1        4      0.02
11     10       1      0.00
Total           25988
1997 UK web survey
86Frequency of HTTP response header
87The future of HTTP
- Caching
- Hit count reporting
- Compression
- Distributed Authoring and Versioning of web contents
- Transparent content negotiation
- HTTP protocol extensibility
- Multiplexing of HTTP streams
88HTTP 1.0 and HTTP/1.1
- Overview of HTTP
- RFC 1945 and RFC 2616
- HTTP extensibility
- Security
- HTTPS using SSL in Web exchanges
- Security in HTTP/1.0
- HTTP compression
- Paper
- Key Differences between HTTP/1.0 and HTTP/1.1
89Protocol Extension
- HTTP/1.2, /1.3
- HTTP Extension Framework
- http://www.w3.org/Protocols/HTTP/ietf-http-ext/
- Feb 14, 2000: HTTP Extension Framework moved to Experimental RFC (RFC 2774)
90HTTP next generation (HTTP-ng)
- Is very different from HTTP/1.1
- A first working draft has been published.
- Part of the HTTP-ng initiative covers a multiplexing protocol: multiplexing multiple HTTP-ng connections over a single transport connection.
91Multiplexing Protocol (SMUX)
- Although HTTP/1.1 improves over HTTP/1.0 with
- Persistent connections
- Pipelining
- It's still not possible to transfer requests/responses in parallel over a single HTTP connection.
- SMUX is designed as a layer
- Between TCP and HTTP
92Why MUX?
- HTTP/1.0 opens a TCP connection for each URI retrieved (at a cost of both packets and round trip times (RTTs)), and then closes the connection. For small HTTP requests, these connections have poor performance due to TCP slow start as well as the round trips required to open and close each TCP connection.
- HTTP/1.1 persistent connections and pipelining will reduce network traffic and the amount of TCP overhead caused by opening and closing TCP connections.
93WebMUX Overview
- MUX is a session management protocol separating the underlying transport from the upper level application protocols.
- It provides a lightweight communication channel to the application layer by multiplexing data streams on top of a reliable stream oriented transport.
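The general idea of multiplexing several data streams over one reliable byte stream can be sketched with a toy framing scheme. To be clear, the frame layout below (a 2-byte stream id and a 4-byte length) is invented for illustration and is not the actual WebMUX/SMUX wire format.

```python
import struct

# Hypothetical frame header: stream id (2 bytes) + payload length (4 bytes),
# both in network byte order. NOT the real WebMUX header layout.
HEADER = struct.Struct("!HI")


def mux(frames) -> bytes:
    """Interleave (stream_id, payload) fragments into one byte stream."""
    return b"".join(HEADER.pack(sid, len(data)) + data for sid, data in frames)


def demux(wire: bytes) -> dict:
    """Split the byte stream back into per-stream payloads."""
    streams = {}
    pos = 0
    while pos < len(wire):
        sid, length = HEADER.unpack_from(wire, pos)
        pos += HEADER.size
        streams[sid] = streams.get(sid, b"") + wire[pos:pos + length]
        pos += length
    return streams


wire = mux([(1, b"GET /a"), (2, b"GET /b"), (1, b" HTTP/1.0")])
print(demux(wire))
```

Fragments of stream 1 and stream 2 share the connection without either having to finish first, which is the property that persistent connections and pipelining alone do not provide.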
94WebMUX Overview (cont.)
- By supporting coexistence of multiple application
level protocols (e.g. HTTP and HTTP-NG), MUX will
ease transitions to future Web protocols, and
communications of client applets using private
protocols with servers over the same connection
as the HTTP conversation.
95HTTP Related Protocols
- IMAP
- MIME
- RFC 822 defines a message representation protocol
- File Transfer Protocol (FTP)
- is defined in RFC 959
- Network News Transfer Protocol (NNTP)
- is defined in RFC 977
- News format is defined in RFC 850
- Gopher
96HTTP Performance Issues
- Compression and Performance
- HTTP and TCP Interactions
- HTTP and System Overhead
- TCP Analysis Tools
- Papers
- Network Performance Effects of HTTP/1.1, CSS1, and PNG
- Other interesting papers about performance
97Web Traffic and Performance
- Overview for web traffic measurements
- monitoring the web transfers
- generating the measurement records
- preprocessing the data in preparation for analysis
- Performance Tuning
98Web applications
- Information retrieval and search engine
- Multimedia streaming
- Real Time Streaming Protocol (RTSP)
- which borrows several key concepts from HTTP/1.1