Title: Internet Engineering Course
1Internet Engineering Course
2Introduction
- A company needs to provide various web services
- Hosting intranet applications
- Company web site
- Various internet applications
- Therefore it needs to provide an HTTP server
- First we look at what the HTTP protocol is
- Then we discuss Web servers, and Apache as the leading Web server application
3The World Wide Web (WWW)
- Global hypertext system
- Initially developed in 1989
- By Tim Berners-Lee at the European Laboratory for Particle Physics (CERN) in Switzerland.
- To give a geographically dispersed group of scientists an easy way of sharing and editing research documents.
- In 1993, started to grow rapidly
- Mainly due to NCSA developing a Web browser called Mosaic (an X Window-based application)
- First widely used graphical interface to the Web, which made browsing more convenient
- A flexible way for people to navigate through worldwide resources on the Internet and retrieve them
4Web Browsers
- Provides access to a Web server
- Basic components
- HTML interpreter
- HTTP client used to retrieve HTML pages
- Some also support
- FTP, NNTP, POP, SMTP
5Web Servers
- Definitions
- A computer responsible for accepting HTTP requests from clients and serving them Web pages.
- A computer program that provides the above-mentioned functionality.
- Common features
- Accepting HTTP requests from the network
- Providing an HTTP response to the requester
- The response typically consists of an HTML document
- Usually capable of logging
- Client requests/server responses
6Web Servers cont.
- Returned content
- Static
- Comes from an existing file
- Dynamic
- Dynamically generated by some other program/script called by the Web server.
- Path translation
- Translates the path component of a URL into a local file system resource
- The path specified by the client is relative to the server's root directory
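Path translation can be sketched in a few lines of Python. This is a minimal illustration, not how any particular server implements it: the document root /var/www/html and the default file name index.html are assumptions made for the example.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def translate_path(url, doc_root="/var/www/html", index="index.html"):
    """Map the path component of a URL to a file name under doc_root.

    doc_root and index are assumed values chosen for illustration.
    """
    path = unquote(urlsplit(url).path)
    # Collapse "." and ".." segments so the result cannot leave doc_root.
    normalised = posixpath.normpath("/" + path.lstrip("/"))
    # A request for a directory is mapped to its index file.
    if path.endswith("/") or normalised == "/":
        normalised = posixpath.join(normalised, index)
    return doc_root.rstrip("/") + normalised
```

For example, translate_path("http://www.somehost.com/path/file.html") gives /var/www/html/path/file.html, and a request containing ".." segments is still resolved inside the document root.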
7Basic Client/Server Architecture in WWW
- Overall organization of the Web.
- The basic operation is to fetch documents
- Client issues requests, browser displays document
- Server is responsible for retrieving the document from the local file system
- Client/server communication is based on the HTTP protocol
8Dynamic Content
- Parts of documents may be specified via scripts/programs
- Client-side (executed on the client machine, e.g., within the browser)
- Client-side script: script embedded in the HTML document
- Applet: precompiled program passed to the client
- Server-side (executed on the server machine)
- Server-side script embedded in the document
- Servlet: precompiled program executed within the server's address space
- CGI scripts
9Common Gateway Interface (CGI)
- The principle of using server-side CGI programs.
- Allows documents to be generated dynamically, on the fly
- Provides a standard way for a Web server to execute a program using user-provided data as input
- To the server, a CGI program appears as the program responsible for fetching the requested document
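A CGI program receives its input through environment variables (and standard input) and writes headers, a blank line, and the document to standard output. The sketch below shows the idea in Python; the query parameter "name" is a hypothetical example, not something defined by CGI itself.

```python
import os
from html import escape
from urllib.parse import parse_qs


def cgi_response(environ):
    """Build CGI output (headers, blank line, body) from the environment."""
    # QUERY_STRING holds the part of the URL after "?".
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]  # "name" is a hypothetical parameter
    body = f"<html><body><h1>Hello, {escape(name)}!</h1></body></html>"
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # When run by a Web server, the response is written to standard output.
    print(cgi_response(os.environ), end="")
```

The server runs the program, reads what it prints, and forwards it to the client as if it had fetched a static document.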
10Architectural Overview
- Architectural details of a client and server in the Web.
- Document fetch (and possibly server-side script): 2b-3b
- Execute CGI script (separate process): 2c-3c-4c
- Execute servlet program (runs within the server): 2a-3a-4a
11http protocol
- Defines the communication between a Web server and a client
- Used to deliver virtually all files and other data (collectively called resources) on the World Wide Web
- A browser is an HTTP client because it sends requests to an HTTP server (Web server)
- The standard (and default) port for HTTP servers to listen on is 80, though they can use any port.
12Structure of http transactions
- Request/response, text-based protocol
- Format of an HTTP message
- <initial line, different for request vs. response>
- Header1: value1
- Header2: value2
- Header3: value3
- <blank line>
- <optional message body goes here, like file contents or query data; it can be many lines long, or even binary data>
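The message layout above maps directly onto a small parser. This is a simplified sketch: it assumes the whole message is already in memory and ignores details such as repeated or folded headers.

```python
def parse_message(raw):
    """Split a raw HTTP message into (initial line, headers, body)."""
    # The first blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    initial_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only; values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return initial_line, headers, body
```

Header names are lower-cased here because they are case-insensitive in HTTP.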
13The Format of a Request
- Request line: method sp URL sp version
- Header lines: header: value (repeated)
- Entity body
14Request Example
- GET /index.html HTTP/1.1 CRLF
- Accept: image/gif, image/jpeg CRLF
- User-Agent: Mozilla/4.0 CRLF
- Host: www.ui.ac.ir:80 CRLF
- Connection: Keep-Alive CRLF
- CRLF
15Request Example
- GET /index.html HTTP/1.1
- Accept: image/gif, image/jpeg
- User-Agent: Mozilla/4.0
- Host: www.ui.ac.ir:80
- Connection: Keep-Alive
- (blank line here)
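The request on this slide can be assembled programmatically. The helper below is a minimal sketch (build_request is a name invented for this example); it shows that every line ends in CRLF and that an empty line closes the header section.

```python
def build_request(method, path, headers, version="HTTP/1.1"):
    """Return the bytes of an HTTP request without a body."""
    lines = [f"{method} {path} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Each line ends with CRLF; one extra CRLF is the blank line.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request("GET", "/index.html", {
    "Accept": "image/gif, image/jpeg",
    "User-Agent": "Mozilla/4.0",
    "Host": "www.ui.ac.ir:80",
    "Connection": "Keep-Alive",
})
```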
16The Format of a Response
- Status line: version sp status code sp phrase
- Header lines: header: value (repeated)
- Entity body
17Response Example
- HTTP/1.0 200 OK
- Date: Fri, 31 Dec 1999 23:59:59 GMT
- Content-Type: text/html
- Content-Length: 1354
- (blank line here)
- <html>
- <body>
- <h1>Hello World</h1>
- (more file contents) . . .
- </body>
- </html>
19Initial line
- A typical initial request line
- GET /path/to/file/index.html HTTP/1.0
- Initial response line
- HTTP/1.0 200 OK
- HTTP/1.0 404 Not Found
- Status code
- 1xx indicates an informational message only
- 2xx indicates success of some kind
- 3xx redirects the client to another URL
- 4xx indicates an error on the client's part
- 5xx indicates an error on the server's part
- Common status codes
- 200 OK
- 404 Not Found
- 301 Moved Permanently
- 302 Moved Temporarily
- 303 See Other (HTTP 1.1 only)
- 500 Server Error
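The first digit of the status code is enough to tell the class of a response. A small sketch of reading a status line this way (the class labels are paraphrased from the list above):

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def classify(status_line):
    """Return (status code, class name) for an initial response line."""
    # Split into at most three parts; the reason phrase may contain spaces.
    _version, code, _phrase = status_line.split(" ", 2)
    return int(code), STATUS_CLASSES.get(int(code) // 100, "unknown")
```

For example, classify("HTTP/1.0 404 Not Found") returns (404, "client error").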
20Header lines
- Typical request headers
- From: email address of the requester
- User-Agent: for example, User-Agent: Mozilla/3.0Gold
- Typical response headers
- Server: for example, Server: Apache/1.2b3-dev
- Last-Modified: for example, Last-Modified: , 19 Feb 2006 23:59:59 GMT
21Message body
- In a response, this is where the requested resource is returned to the client (the most common use of the message body), or perhaps explanatory text if there's an error.
- In a request, this is where user-entered data or uploaded files are sent to the server.
- If an HTTP message includes a body, there are usually header lines in the message that describe the body. In particular,
- The Content-Type header gives the MIME type of the data in the body, such as text/html or image/gif.
- The Content-Length header gives the number of bytes in the body.
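Content-Length counts bytes, not characters, which matters as soon as the body contains non-ASCII text. The following sketch builds a response and computes the header from the encoded body; the UTF-8 charset is an assumption made for the example.

```python
def build_response(body_text, content_type="text/html; charset=utf-8"):
    """Return the bytes of a 200 response whose headers describe the body."""
    body = body_text.encode("utf-8")
    head = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        # The length is taken after encoding, so it is a byte count.
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


response = build_response("<h1>café</h1>")
```

The text has 13 characters, but the letter é takes two bytes in UTF-8, so the header reads Content-Length: 14.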
22MIME Media types
- Multipurpose Internet Mail Extensions
- HTTP sends the media type of the file using the Content-Type header
- Some important media types are
- text/plain, text/html
- image/gif, image/jpeg
- audio/basic, audio/wav
- model/vrml
- video/mpeg, video/quicktime
- application/*: application-specific data that does not fall under any other MIME category, e.g. application/octet-stream
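A server usually derives the media type from the file name extension. Python's standard mimetypes module does this lookup; the fallback to application/octet-stream for unknown extensions is a choice made in this sketch.

```python
import mimetypes


def content_type_for(filename):
    """Guess the Content-Type value for a file from its extension."""
    guessed, _encoding = mimetypes.guess_type(filename)
    # Unknown extensions fall back to the generic binary type.
    return guessed or "application/octet-stream"
```

For example, content_type_for("index.html") returns "text/html".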
23Sample HTTP exchange
- To retrieve the file at the URL http://www.somehost.com/path/file.html
- Request
- GET /path/file.html HTTP/1.0
- From: someuser@jmarshall.com
- User-Agent: HTTPTool/1.0
- (blank line here)
- Response
- HTTP/1.0 200 OK
- Date: Fri, 31 Dec 1999 23:59:59 GMT
- Content-Type: text/html
- Content-Length: 1354
- (blank line here)
- <html> <body> <h1>Happy New Millennium!</h1> (more file contents) . . . </body> </html>
24HTTP methods
- GET: request a resource by URL
- HEAD
- Is just like a GET request, except it asks the server to return the response headers only, and not the actual resource (i.e., no message body).
- This is useful to check characteristics of a resource without actually downloading it, thus saving bandwidth.
- POST
- A POST request is used to send data to the server to be processed in some way, like by a CGI script.
- There's a block of data sent with the request, in the message body. There are usually extra headers to describe this message body, like Content-Type and Content-Length.
- The request URI is not a resource to retrieve; it's usually a program to handle the data you're sending.
- The HTTP response is normally program output, not a static file.
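A POST request differs from GET mainly in carrying a body plus the headers that describe it. The sketch below builds one for form data; the path /cgi-bin/register.cgi and the form fields are hypothetical values chosen for illustration.

```python
from urllib.parse import urlencode


def build_post(path, host, form):
    """Return the bytes of a POST request carrying URL-encoded form data."""
    body = urlencode(form).encode("ascii")
    head = (
        f"POST {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


# Hypothetical program path and form fields.
post = build_post("/cgi-bin/register.cgi", "www.somehost.com",
                  {"name": "Ali Reza", "lang": "fa"})
```

The request URI names the program that handles the data, and the body (name=Ali+Reza&lang=fa) follows the blank line.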
25HTTP 1.1
- It is a superset of HTTP 1.0. Improvements include
- Faster response, by allowing multiple transactions to take place over a single persistent connection.
- Faster response and great bandwidth savings, by adding cache support.
- Faster response for dynamically generated pages, by supporting chunked encoding, which allows a response to be sent before its total length is known.
- Efficient use of IP addresses, by allowing multiple domains to be served from a single IP address.
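Chunked encoding sends the body as a series of pieces, each prefixed by its size in hexadecimal, and ends with a zero-size chunk. The following is a simplified sketch of both directions; it ignores trailers and assumes the decoder is given the complete body.

```python
def encode_chunked(chunks):
    """Encode an iterable of byte strings as a chunked message body."""
    out = b""
    for chunk in chunks:
        if chunk:  # a zero-length chunk would end the body early
            out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    return out + b"0\r\n\r\n"  # the last chunk has size 0


def decode_chunked(data):
    """Reassemble the original body from a complete chunked body."""
    body = b""
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size line may carry extensions after ";", which are ignored here.
        size = int(data[pos:line_end].split(b";")[0], 16)
        if size == 0:
            return body
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
```

Because each chunk states its own length, the sender never needs to know the total length in advance.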
26Manually Experimenting with HTTP
- > telnet eng.ui.ac.ir 80
- Trying 192.168.50.84...
- Connected to eng.ui.ac.ir.
- Escape character is '^]'.
27Sending a Request
- > GET /ladani/index.htm HTTP/1.0
- (blank line)
28The Response
- HTTP/1.1 200 OK
- Date: Fri, 29 Feb 2008 08:23:33 GMT
- Server: Apache/2.0.52 (CentOS)
- Last-Modified: Wed, 07 Nov 2007 12:27:44 GMT
- ETag: "6ccb6-741c-43e55e05a5000"
- Accept-Ranges: bytes
- Content-Length: 29724
- Connection: close
- Content-Type: text/html; charset=WINDOWS-1256
- (blank line)
- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
- <html>
- <head>
- <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
- <meta name="GENERATOR" content="Microsoft FrontPage 5.0">
- .
29GET /ladani/index.htm HTTP/1.0
HTTP/1.1 200 OK
HTML code
30GET /ladani/no-such-page.htm HTTP/1.0
HTTP/1.1 404 Not Found
HTML code
31GET /index.html HTTP/1.1
HTTP/1.1 400 Bad Request
HTML code
Why is it a Bad Request?
Because an HTTP/1.1 request must include a Host header, and this one has none.
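The rule behind this 400 response can be written as a small check. This is only a sketch of that one rule (check_request is a name invented here); a real server performs many more checks before answering.

```python
def check_request(raw):
    """Return the status line a server might send for this request head."""
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    _method, _target, version = lines[0].split(" ")
    header_names = {
        line.split(":", 1)[0].strip().lower() for line in lines[1:] if line
    }
    # HTTP/1.1 requires a Host header; HTTP/1.0 does not.
    if version == "HTTP/1.1" and "host" not in header_names:
        return "HTTP/1.1 400 Bad Request"
    return "HTTP/1.1 200 OK"
```

The same request line sent as HTTP/1.0, or sent as HTTP/1.1 with a Host header, passes this check.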
32Session-persistent State
- What does session-persistent state mean?
- State information that is preserved between browsing sessions.
- Information that is stored semi-permanently (i.e., on disk) for later access.
- Why was the calculator example not session-persistent?
- Sum, current display, etc. were not preserved if we went to a different website and back to the calculator.
33Why session-persistence?
- User-based customizations.
- MyYahoo, ETrade, etc.
- Long transactions.
- Electronic shopping carts.
- Order preparation
- Server-side state maintenance.
- Large amounts of state info that you don't want to pass back and forth.
34Cookie Overview
- HTTP cookies are a mechanism for creating and using session-persistent state.
- Cookies are simple string values that are associated with a set of URLs.
- Servers set cookies using an HTTP header.
- The client transmits the cookie as part of the HTTP request whenever an associated URL is visited in the future.
35Anatomy of a cookie.
- Cookie has 6 parts
- Name
- Value
- Domain
- Path
- Expiration
- Security flag
- Name and Value are required; the others have default values.
36Setting a cookie.
- A cookie is set using the Set-Cookie header in an HTTP response.
- The string value of the Set-Cookie header is parsed into semicolon-separated fields that define the different parts of the cookie.
- The cookie is stored by the client.
37Sending cookies
- Every time a client makes an HTTP request, it tests every cookie for a match.
- A cookie matches if
- The cookie domain is a suffix of the URL's server name.
- The cookie expiration has not passed.
- The cookie path is a prefix of the URL path.
- If the cookie's security flag is on, the connection is secure.
- If a match is made, then the name/value pair of the cookie is sent in the Cookie header of the request.
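The matching rules above can be expressed as one function. This sketch follows the slide's simplified rules (plain suffix and prefix tests); the Cookie class and its field names are invented for the example, and real browsers apply stricter domain and path rules.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Cookie:
    name: str
    value: str
    domain: str
    path: str = "/"
    expires: Optional[datetime] = None  # None means no expiration was set
    secure: bool = False


def cookie_matches(cookie, host, path, is_secure, now):
    """Decide whether the client should send this cookie with a request."""
    if not host.endswith(cookie.domain):  # domain is a suffix of the server name
        return False
    if cookie.expires is not None and now >= cookie.expires:  # not yet expired
        return False
    if not path.startswith(cookie.path):  # cookie path is a prefix of URL path
        return False
    if cookie.secure and not is_secure:  # secure cookies need a secure connection
        return False
    return True
```

Only the name/value pair of a matching cookie is sent; the other fields stay on the client.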
38Setting a Cookie
- Full cookie
- Set-Cookie: my_cookie=This is my cookie value; domain=.eng.ui.ac.ir; path=/ladani; expires=Thu, 06-March-08 12:00:00 GMT
- Can have more than one Set-Cookie header, or can combine more than one cookie in one header by separating them with a comma (,)
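On the client side, the Set-Cookie value is split on semicolons into the cookie's parts. A minimal sketch of that parsing for a single cookie (it does not handle several comma-separated cookies in one header):

```python
def parse_set_cookie(header_value):
    """Parse one Set-Cookie header value into a dict of its parts."""
    fields = [field.strip() for field in header_value.split(";")]
    # The first field is always name=value.
    name, _, value = fields[0].partition("=")
    cookie = {"name": name, "value": value}
    for field in fields[1:]:
        if not field:
            continue
        key, _, val = field.partition("=")
        # Fields without "=" (such as "secure") are stored as flags.
        cookie[key.lower()] = val if val else True
    return cookie
```

Applied to the full cookie above, this yields the name my_cookie, its value, and the domain, path, and expires fields as separate entries.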
39Cookie Matching
- Biggest misunderstanding
- Servers do not RETRIEVE cookies!!!!
- Servers RECEIVE cookies previously planted.
- Step 1
- Some response by the server installs a cookie with the Set-Cookie header.
- The client saves the cookie to disk.
40Cookie Matching
- Step 2
- Browser goes to some page which matches a previously received cookie.
- Cookie name and value are sent in the request as the Cookie HTTP header.
- Step 3
- CGI program detects the presence of the cookie and uses it.
- Where is the cookie info?
- Environment variable HTTP_COOKIE
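In a CGI program, reading the cookies means parsing the HTTP_COOKIE environment variable, which holds name=value pairs separated by semicolons. A minimal sketch (the function name is invented for this example):

```python
import os


def cookies_from_environ(environ=os.environ):
    """Return the cookies passed to a CGI program as a dict."""
    cookies = {}
    for pair in environ.get("HTTP_COOKIE", "").split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:  # skip empty or malformed entries
            cookies[name] = value
    return cookies
```

If the browser sent no Cookie header, the variable is absent and the result is an empty dict.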
41Where are cookies stored on client?
- Client-specific locations.
- No standard.
- Latest IE stores in a folder called Temporary Internet Files
- Each cookie stored in a separate file.
- Netscape stores in cookies.txt
42Typical Cookie Usages
- Cookies as Database Index
- Most common use of cookies.
- State information is kept in some sort of database and the cookie acts as an index.
- Cookies as State Variables
- Name of cookie is like variable name.
- Value of cookie is state information.
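The "cookie as database index" pattern can be sketched as follows. The in-memory dict stands in for a real database, and the cookie name session_id is an assumption made for the example.

```python
import secrets

SESSIONS = {}  # stands in for a real database table


def start_session(state):
    """Store state on the server and return the Set-Cookie header line."""
    session_id = secrets.token_hex(16)  # random, hard-to-guess index
    SESSIONS[session_id] = state
    return f"Set-Cookie: session_id={session_id}; path=/"


def load_session(cookie_header):
    """Look up the stored state from the value of a Cookie request header."""
    for pair in cookie_header.split(";"):
        name, _, value = pair.strip().partition("=")
        if name == "session_id":
            return SESSIONS.get(value)
    return None
```

Only the short identifier travels between client and server; the state itself stays on the server.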
43Cookie Security
- Security flag restricts when the browser will send a cookie back to the server.
- Requires a secure connection.
- For example, HTTPS in effect.
- What does this mean about when the cookie was set?
44First Web Server
- Berners-Lee wrote two programs
- A browser called WorldWideWeb
- The world's first Web server, which ran on NeXTSTEP
- The machine is on exhibition at CERN's public museum
45Most Famous Web Servers
- Apache HTTP Server from the Apache Software Foundation
- Internet Information Services (IIS) from Microsoft
- Google Web Server (GWS)
- Started from May 2007
- Lighttpd
- Powers several popular Web 2.0 sites like YouTube, Wikipedia and meebo
46Web Servers Usage Statistics
- The most popular Web servers used for public Web sites are tracked by the Netcraft Web Server Survey
- Details are given in the Netcraft Web Server Reports
- Apache has been the most popular since April 1996
- Currently (February 2008) about
- 50.93% Apache
- 35.56% Microsoft (IIS, PWS, etc.)
- 5.16% Google
- 0.99% Lighttpd
47Web Servers Usage Statistics cont.
Total Sites Across All Domains August 1995 -
February 2008
48Web Servers Usage Statistics cont.
Market Share for Top Servers Across All Domains
August 1995 - February 2008
49Web Servers Usage Statistics cont.
Totals for Active Servers Across All Domains, June 2000 - February 2008
50Apache (A PAtCHy) Web Server
- Origins: NCSA (Univ. of Illinois, Urbana/Champaign)
- Now: Apache Software Foundation (www.apache.org), developers worldwide
- Most widely used Web server today (Netcraft Web survey, 2/2008)
- Open source software
- Geographically distributed developers
- A modular, extensible design was needed, so that third-party developers could override or extend basic characteristics
51Web Server Processing Steps
52Apache HTTP Server
- Apache Core
- Receives client request
- Typically, allocates a new process for each incoming request
- Allocates request record
- Invokes handlers on individual modules in sequence
- Modules register handlers during configuration
- Handler
- Request record passed as single parameter
- Each handler reads/modifies request record
53Web Server Phases
- Apache core invokes a handler for each phase
- Resolve document reference (URI) to a local file name (or CGI program and parameters)
- Client authentication (verify client identity)
- Client access control (determine access rights)
- Request access control (check if access allowed)
- MIME type determination of the response
- General phase for handling leftovers (e.g., check syntax of returned response, build up user profile)
- Transmission of the response to client
- Logging data on the processing of the request
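The idea of a core that passes one request record through registered handlers, phase by phase, can be sketched in Python. This is an illustration of the pattern only, not Apache's actual module API: the phase names, the RequestRecord fields, and the /var/www/html root are all invented for the example.

```python
from dataclasses import dataclass, field

# A reduced, hypothetical set of phases, processed in this order.
PHASES = ["translate", "authenticate", "access", "mime_type", "respond", "log"]
HANDLERS = {phase: [] for phase in PHASES}


@dataclass
class RequestRecord:
    uri: str
    filename: str = ""
    content_type: str = ""
    status: int = 200
    log: list = field(default_factory=list)


def register(phase):
    """Decorator a module uses to register a handler for one phase."""
    def decorator(handler):
        HANDLERS[phase].append(handler)
        return handler
    return decorator


@register("translate")
def translate(record):
    record.filename = "/var/www/html" + record.uri  # assumed document root


@register("mime_type")
def set_mime_type(record):
    is_html = record.filename.endswith(".html")
    record.content_type = "text/html" if is_html else "application/octet-stream"


@register("log")
def log_request(record):
    record.log.append(f"{record.uri} -> {record.status}")


def process(uri):
    """Core loop: allocate a record and invoke each phase's handlers in turn."""
    record = RequestRecord(uri)
    for phase in PHASES:
        for handler in HANDLERS[phase]:
            handler(record)  # the record is the handler's single parameter
    return record
```

Phases with no registered handler are simply skipped, which is what lets modules extend or override only the steps they care about.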
54References
- http://www.jmarshall.com/easy/http/
- TCP/IP Tutorial and Technical Overview, Rodriguez, Gatrell, Karas, Peschke, IBM Redbooks, August 2001
- Wikipedia, the free encyclopedia
- Apache: The Definitive Guide, 2nd edition, Ben Laurie, Peter Laurie, O'Reilly, February 1999
- Webmaster in a Nutshell, 1st edition, Stephen Spainhour, Valerie Quercia, O'Reilly, October 1996
- Netcraft February 2006 Web Server Survey