Title: 20755: The Internet Lecture 10: Web Services III
1. 20-755: The Internet, Lecture 10: Web Services III
- David O'Hallaron
- School of Computer Science and
- Department of Electrical and Computer Engineering
- Carnegie Mellon University
- Institute for eCommerce, Summer 1999
2. Today's lecture
- Anatomy of a simple Web server (40 min)
- Break (10 min)
- Advanced server features (45 min)
3. Anatomy of Tiny: A simple Web server
    #!/usr/local/bin/perl5 -w
    use IO::Socket;

    #
    # tiny.pl - The Tiny HTTP server
    #
4. Tiny configuration
    #
    # Configuration
    #
    $port = 8000;                      # the port we listen on
    $htmldir = "./html/";              # the base html directory
    $cgidir = "./cgi-bin/";            # the base cgi directory
    $server = "Tiny Web server 1.0";   # server info
5. Tiny error messages
    #
    # Error messages
    #

    # Terse error messages go in the response header
    %terse_errors = (
        "403", "Forbidden",
        "404", "Not Found",
        "501", "Not Implemented",
    );

    # Verbose error messages go in the response message body
    %verbose_errors = (
        "403", "You are not allowed to access this item",
        "404", "Tiny couldn't find the requested item on the server",
        "501", "Tiny does not support the given request type",
    );
6. Tiny: Create a listening socket
    # Create a TCP listening socket file descriptor
    #   LocalPort: listen on port $port
    #   Type: use TCP
    #   Reuse: reuse address right away
    #   Listen: buffer at most 10 requests
    $listenfd = IO::Socket::INET->new(
        LocalPort => $port,
        Type      => SOCK_STREAM,
        Reuse     => 1,
        Listen    => 10)
        or die "Couldn't listen on port $port: $@\n";
7. Tiny: main loop structure
    # Loop forever waiting for HTTP requests
    while (1) {

        # Wait for a connection request from a client
        $connfd = $listenfd->accept();

        # Determine the domain name and IP address of this client
        # Parse the request line (after stripping the newline)
        # Parse the URI
        # Parse the request headers
        # OPTIONS method
        # HEAD method
        # GET method
        # misc POST, PUT, DELETE, and TRACE methods
    }
8. Tiny: error procedure
    # error - send an error message back to the client
    #   $_[0]: the error number
    #   $_[1]: the method or URI that caused the error
    sub error {
        local($errno) = $_[0];
        local($errmsg) = "$errno $terse_errors{$errno}";
        print $connfd <<EndOfMessage;
    HTTP/1.1 $errmsg
    Content-type: text/html

    <html>
    <head><title>$errmsg</title></head>
    <body bgcolor="ffffff">
    <h1>$errmsg</h1>
    <p>$verbose_errors{$errno}: $_[1]</p>
    <hr><em>The Tiny Web Server</em>
    </body>
    </html>
    EndOfMessage
    }
9. Tiny: get client's name and address
    # Determine the domain name and IP address of this client
    $client_sockaddr = getpeername($connfd);
    ($client_port, $client_iaddr) = unpack_sockaddr_in($client_sockaddr);
    $client_port = $client_port;   # so -w won't complain
    $client_name = gethostbyaddr($client_iaddr, AF_INET);
    ($a1, $a2, $a3, $a4) = unpack('C4', $client_iaddr);
    print "Opened connection with $client_name ($a1.$a2.$a3.$a4)\n";
10. Tiny: parsing the request line
    # Parse the request line (after stripping the newline)
    chomp($line = <$connfd>);
    ($method, $uri, $version) = split(/\s+/, $line);
    print "received $line\n";
11. Tiny: parsing the URI
    # Parse the URI

    # Either the URI refers to a CGI program...
    if ($uri =~ m{^/cgi-bin/}) {
        $is_static = 0;

        # extract the program name and its arguments
        ($filename, $cgiargs) = split(/\?/, $uri);
        if (!defined($cgiargs)) {
            $cgiargs = "";
        }

        # replace /cgi-bin with the default cgi directory
        $filename =~ s{^/cgi-bin/}{$cgidir}o;
    }
12. Tiny: parsing the URI (cont)
    # ... or the URI refers to a file
    else {
        $is_static = 1;   # static content
        $cgiargs = "";

        # replace the first / with the default html directory
        $filename = $uri;
        $filename =~ s{^/}{$htmldir}o;

        # use index.html for the default file
        $filename =~ s{/$}{/index.html};
    }

    # debug statements like this will help you a lot
    print "parsed URI: is_static=$is_static, filename=$filename, cgiargs=$cgiargs\n";
13. Tiny: parsing the request headers
    # Parse the request headers
    $content_length = 0;
    $content_type = "text/html";
    while (<$connfd>) {   # read request header into $_

        # Delete CR and NL chars
        s/\n|\r//g;       # delete CRLF and CR chars from $_

        # Determine the length of the message body:
        # search for "Content-Length:" at beginning of string $_,
        # ignore the case
        if (/^Content-Length: (\S+)/i) {
            $content_length = $1;
        }
14. Tiny: parsing the request headers (cont)
        # determine the type of content (if any) in msg body:
        # search for "Content-Type:" at beginning of string $_,
        # ignore the case
        if (/^Content-Type: (\S+)/i) {
            $content_type = $1;
        }

        # If $_ was a blank line, exit the loop
        if (length($_) == 0) {
            last;
        }
    }
15. Tiny: OPTIONS
    # OPTIONS method
    if ($method eq "OPTIONS") {
        $today = gmtime()." GMT";
        $connfd->print("$version 200 OK\n");
        $connfd->print("Date: $today\n");
        $connfd->print("Server: $server\n");
        $connfd->print("Content-length: 0\n");
        $connfd->print("Allow: OPTIONS, HEAD, GET\n");
        $connfd->print("\n");
    }
16. Tiny: HEAD
    # HEAD method
    elsif ($method eq "HEAD") {

        # we're disallowing HEAD methods on scripts
        if (!$is_static) {
            error(403, $filename);
        }
        else {
            $today = gmtime()." GMT";
            head_method($filename, $uri, $today, $server);
        }
    }
17. Tiny: HEAD (cont)
    # process the HEAD method on static content
    #   $_[0]: the file to be processed
    #   $_[1]: the uri
    #   $_[2]: today's date
    #   $_[3]: server name
    sub head_method {
        local($filename) = $_[0];
        local($uri) = $_[1];
        local($today) = $_[2];
        local($server) = $_[3];
        local $modified;
        local $filesize;
        local $filetype;
18. Tiny: HEAD (cont)
        # make sure the requested file exists
        if (!(-e $filename)) {
            error(404, $uri);
        }

        # make sure the requested file is readable
        elsif (!(-r $filename)) {
            error(403, $uri);
        }
19. Tiny: HEAD (cont)
        # serve the response header but not the file
        else {
            # determine file modification date
            $modified = gmtime((stat($filename))[9])." GMT";

            # determine filesize in bytes
            $filesize = (stat($filename))[7];

            # determine filetype (default is text)
            if ($filename =~ /\.html$/) {
                $filetype = "text/html";
            }
            elsif ($filename =~ /\.gif$/) {
                $filetype = "image/gif";
            }
            elsif ($filename =~ /\.jpg$/) {
                $filetype = "image/jpeg";
            }
            else {
                $filetype = "text/plain";
            }
20. Tiny: HEAD (cont)
            # print the response header
            $connfd->print("HTTP/1.1 200 OK\n");
            $connfd->print("Date: $today\n");
            $connfd->print("Server: $server\n");
            $connfd->print("Last-modified: $modified\n");
            $connfd->print("Content-length: $filesize\n");
            $connfd->print("Content-type: $filetype\n");
            $connfd->print("\r\n");   # CRLF required by HTTP standard
        }   # end of else
    }   # end of procedure
21. Some Tiny issues
- How would you serve static and dynamic content with GET?
- How would you serve dynamic content with POST?
- How safe are your CGI scripts?
  - Hint: consider the impact of allowing .. in URIs.
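
To make the hint concrete, here is a minimal Python sketch (not part of tiny.pl) of one way a server could refuse URIs that contain a ".." segment before mapping them onto the html directory. The function name `resolve_static_path` and the `HTML_DIR` constant are assumptions made for illustration; `HTML_DIR` simply mirrors Tiny's `$htmldir` setting.

```python
import posixpath
from urllib.parse import unquote

HTML_DIR = "./html/"  # assumption: mirrors Tiny's $htmldir setting


def resolve_static_path(uri, base=HTML_DIR):
    """Map a URI to a file name under base, or return None if it is unsafe."""
    # Drop any query string, then undo %XX escapes so "%2e%2e" cannot hide "..".
    path = unquote(uri.split("?", 1)[0])
    # Refuse any path that contains a ".." segment.
    if ".." in path.split("/"):
        return None
    # Use index.html for a directory request, as Tiny does.
    if path.endswith("/"):
        path += "index.html"
    return posixpath.join(base, path.lstrip("/"))


# Without the check, simple substitution would give "./html/../etc/passwd",
# which normalizes to a path outside the html directory:
escaped = posixpath.normpath("./html/../etc/passwd")
```

Rejecting the request outright is the simplest policy; a server could instead normalize the path and then verify that the result still lies under the base directory.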
22. Break time!
23. Today's lecture
- Anatomy of a simple Web server (40 min)
- Break (10 min)
- Advanced server features (45 min)
24. Cookies
- An HTTP session is a sequence of request and response messages between a client and a server.
- Regular HTTP sessions are stateless
  - Each request/response pair is independent of the others
- Cookies are a mechanism for creating stateful sessions (RFC 2109)
  - Allows servers and CGI scripts to maintain state information (e.g., which items are in a shopping cart) during a session.
  - Based on HTTP Set-Cookie (server -> client) and Cookie (client -> server) headers.
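
As a rough illustration of the two headers, the following Python sketch uses the standard library's `http.cookies` module to build a Set-Cookie header on the server side and to parse a Cookie header on a later request. The helper names `make_set_cookie` and `parse_cookie_header` are hypothetical, and the cookie name and value are borrowed from the RFC 2109 example.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name, value, path):
    """Build the Set-Cookie header line a server would add to its response."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = path
    return cookie.output()  # a "Set-Cookie: ..." header line


def parse_cookie_header(header_value):
    """Parse the value of a Cookie request header into a name -> value dict."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}


header = make_set_cookie("Customer", "WILY_COYOTE", "/acme")
state = parse_cookie_header("Customer=WILY_COYOTE; Shipping=FedEx")
```

The server itself stays stateless here: everything it needs to remember travels back to it in the Cookie header of the next request.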
25. Cookies
- request 1 (client -> server): Client initiates request to server.
- response 1 (server -> client, Set-Cookie): Server includes a Set-Cookie header in the HTTP response that contains info (the cookie) that identifies the user. The client stores the cookie on disk.
26. Cookies
- request 2 (client -> server, Cookie): Next time the client sends a request to the server, it includes the cookie as a Cookie header in the HTTP request message.
- response 2 (server -> client, Set-Cookie): The server incorporates any relevant new info from request 2 into the Set-Cookie header in response 2.
27. Cookie example (from RFC 2109)
- Initially the client has no stored cookies.
- Client -> server
  - POST /acme/login HTTP/1.1
  - [form data]
  - user identifies self in form data
- Server -> client
  - HTTP/1.1 200 OK
  - Set-Cookie: Customer=WILY_COYOTE; path=/acme
  - cookie identifies user
  - client stores cookie for the next request to this server
28. Cookie example (cont)
- Client -> server
  - POST /acme/pickitem HTTP/1.1
  - Cookie: Customer=WILY_COYOTE; $Path=/acme
  - [form data]
  - User selects an item for a shopping basket
- Server -> client
  - HTTP/1.1 200 OK
  - Set-Cookie: Part_Number=Rocket_Launcher_0001; path=/acme
  - Server remembers that shopping basket contains an item
29. Cookie example (cont)
- Client -> server
  - POST /acme/shipping HTTP/1.1
  - Cookie: Customer=WILY_COYOTE; $Path=/acme; Part_Number=Rocket_Launcher_0001; $Path=/acme
  - [form data]
  - user selects a shipping method from form
- Server -> client
  - HTTP/1.1 200 OK
  - Set-Cookie: Shipping=FedEx; path=/acme
30. Cookie example (cont)
- Client -> server
  - POST /acme/process HTTP/1.1
  - Cookie: Customer=WILY_COYOTE; $Path=/acme; Part_Number=Rocket_Launcher_0001; $Path=/acme; Shipping=FedEx; $Path=/acme
  - [form data]
  - user chooses to process order
- Server -> client
  - HTTP/1.1 200 OK
  - transaction complete
31. Cookies
- Cookies are grouped by the URI pathname in the request headers (in this case /acme)
- The server adds cookies to the client in the response headers.
- The server can implicitly delete cookies by setting an expiration date in the Set-Cookie header (not shown in previous example)
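
The client-side bookkeeping implied by these points can be sketched in Python as a toy cookie store. The `CookieJar` class below is hypothetical and is not a real browser or library API; it assumes the simple prefix path matching and the Max-Age attribute described in RFC 2109, where a Max-Age of 0 asks the client to discard the cookie.

```python
class CookieJar:
    """Toy client-side cookie store, keyed by (path, name)."""

    def __init__(self):
        self._cookies = {}

    def store(self, name, value, path="/", max_age=None):
        """Record a cookie from a Set-Cookie header; Max-Age=0 deletes it."""
        if max_age == 0:
            self._cookies.pop((path, name), None)
        else:
            self._cookies[(path, name)] = value

    def header_for(self, request_path):
        """Return the Cookie header value to send with a request for request_path."""
        pairs = [
            f"{name}={value}"
            for (path, name), value in self._cookies.items()
            if request_path.startswith(path)  # simple prefix match on the path
        ]
        return "; ".join(pairs)


# Replay the Set-Cookie headers from the shopping example.
jar = CookieJar()
jar.store("Customer", "WILY_COYOTE", path="/acme")
jar.store("Part_Number", "Rocket_Launcher_0001", path="/acme")
jar.store("Shipping", "FedEx", path="/acme")
```

Requests under /acme carry all three cookies, while requests to any other path carry none, which is the grouping by pathname described above.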
32. Applications and implications of cookies
- Click tracking
  - Can be used to correlate a user's activity at many different sites.
  - Doubleclick.com pays a web site to place an <img src> tag on the site's page.
  - Causes an advertising banner and a cookie from Doubleclick.com to be loaded into the client when the site's page is referenced.
  - Firms like Doubleclick maintain a unique id per client machine, but have no way to determine the user's name or other info unless the user supplies it.
33. Applications of cookies
- Content customization
  - Cookies can be used to remember user preferences and customize content to suit those preferences.
  - Firms like Doubleclick can record past browsing patterns and target advertising based on those patterns and on where the user is currently browsing.
34. Refer links
- User looking at page www.cs.cmu.edu/~droh/755/foo.html clicks a link to kittyhawk.cmcl.cs.cmu.edu/bar.html
- Browser sends a referer (sic) header to identify the source page of the request

    GET /bar.html HTTP/1.1
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/vnd.ms-excel, application/msword, application/vnd.ms-powerpoint, */*
    Referer: http://www.cs.cmu.edu/~droh/755/foo.html
    Accept-Language: en-us
    Accept-Encoding: gzip, deflate
    User-Agent: Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)
    Host: kittyhawk.cmcl.cs.cmu.edu:8000
    Connection: Keep-Alive
35. Applications of refer links
- Allows advertisers to gauge the effectiveness of ads they place on other sites.
- Enables third-party referral businesses such as BeFree.com.
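
One way to picture how a site might gauge referrals is to tally the hosts named in incoming Referer headers. The Python sketch below is illustrative only: the function names `referer_of` and `count_referring_hosts` are made up, and the sample request is a trimmed version of the request shown on the previous slide.

```python
from collections import Counter
from urllib.parse import urlsplit


def referer_of(request_text):
    """Return the Referer header value from a raw HTTP request, or None."""
    for line in request_text.splitlines()[1:]:  # skip the request line
        if not line:                            # blank line ends the headers
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "referer":
            return value.strip()
    return None


def count_referring_hosts(requests):
    """Count how many requests were referred by each source host."""
    counts = Counter()
    for request_text in requests:
        referer = referer_of(request_text)
        if referer:
            counts[urlsplit(referer).hostname] += 1
    return counts


sample = (
    "GET /bar.html HTTP/1.1\r\n"
    "Referer: http://www.cs.cmu.edu/~droh/755/foo.html\r\n"
    "Host: kittyhawk.cmcl.cs.cmu.edu:8000\r\n"
    "\r\n"
)
counts = count_referring_hosts([sample, sample, "GET / HTTP/1.1\r\n\r\n"])
```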
36. Log files
    extissnj1.foo.com - - [14/Jul/1999:20:14:38 -0400] "GET /people/faculty/dohallaron HTTP/1.0" 301 375 "http://www.ecom.cmu.edu/people/faculty/" "Mozilla/4.05 [en] (WinNT; I)"
    inet-fw1-o.foo.com - - [15/Jul/1999:02:58:10 -0400] "GET /people/faculty/dohallaron HTTP/1.0" 301 375 "http://www.ecom.cmu.edu/people/faculty/" "Mozilla/4.06 [en] (WinNT; U)"
    internet5.foo.com - - [15/Jul/1999:16:35:59 -0400] "GET /people/faculty/dohallaron HTTP/1.0" 301 375 "http://www.ecom.cmu.edu/people/faculty/" "Mozilla/4.04 [en]C-c32f404p (Win95; I)"
    tmpce001.foo.com - - [16/Jul/1999:16:04:18 -0400] "GET /people/faculty/dohallaron HTTP/1.0" 301 375 "http://www.ecom.cmu.edu/people/faculty/" "Mozilla/4.06 [en] (Win95; I)"
    hqinbh2.foo.com - - [22/Jul/1999:16:03:51 -0400] "GET /people/faculty/dohallaron/droh.quake.gif HTTP/1.0" 200 14336 "http://www.ecom.cmu.edu/people/faculty/dohallaron/" "Mozilla/4.6C-CCK-MCD [en] (X\
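
Entries in this layout (client host, two placeholder fields, timestamp, request line, status, size, referer, user agent) can be pulled apart with a regular expression. The Python sketch below assumes each entry is one complete line in that layout; the pattern and the name `parse_log_line` are illustrative and do not come from any particular server's tooling.

```python
import re

# One named group per field of the log layout shown above.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


line = (
    'extissnj1.foo.com - - [14/Jul/1999:20:14:38 -0400] '
    '"GET /people/faculty/dohallaron HTTP/1.0" 301 375 '
    '"http://www.ecom.cmu.edu/people/faculty/" "Mozilla/4.05 [en] (WinNT; I)"'
)
entry = parse_log_line(line)
```

Once the fields are separated, it is easy to see how much a log reveals: who asked for what, when, from which page, and with which browser.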
37. Implications of logs
- Contain a great deal of personal information about the browsing patterns of people inside and outside a site.
- Important issues:
  - Who has access to logs?
  - How is the log information being used?
38. Virtual hosting
- Virtual hosting allows one web server to serve requests for multiple domains.
- Allows ISPs to provide customers with their own vanity sites.
- Each eCommerce student has their own virtual Web server running at a subdomain of student.ecom.cmu.edu.
  - e.g., http://zak.student.ecom.cmu.edu
  - equivalent to http://euro.ecom.cmu.edu/zack
39. Virtual hosting: How it works
- Configure DNS so that all virtual hosts have the same IP address
  - e.g., each eCommerce student site has the IP address 128.2.218.2 (same as euro.ecom)
  - verify this yourself with nslookup
- Server maintains a list of (domain name, directory tree) pairs in a hash.
- Server sets base html and cgi directories according to the target domain name.
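
The hash lookup can be sketched in a few lines of Python. This is an illustration under assumptions, not the course server's code: the directory layout, the `DEFAULT_ROOT` fallback, and the function name `directories_for` are all hypothetical, and the target domain name is assumed to arrive in the request's Host header.

```python
# Hypothetical (domain name -> directory tree) table.
VIRTUAL_HOSTS = {
    "zak.student.ecom.cmu.edu": "./zak/www/",
    "elenak.student.ecom.cmu.edu": "./elenak/www/",
}
DEFAULT_ROOT = "./www/"  # assumption: fallback tree for unknown names


def directories_for(host_header):
    """Return (html_dir, cgi_dir) for the domain named in a Host header."""
    # Strip an optional ":port" suffix and ignore case, as in "Host: name:8000".
    host = host_header.split(":", 1)[0].strip().lower()
    root = VIRTUAL_HOSTS.get(host, DEFAULT_ROOT)
    return root + "html/", root + "cgi-bin/"


html_dir, cgi_dir = directories_for("zak.student.ecom.cmu.edu:8000")
```

Because every name resolves to the same IP address, the Host header is the only thing that tells the server which site the client wanted.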
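40. Virtual hosting
[Figure: requests to 128.2.218.2 for elenak.student.ecom.cmu.edu and zak.student.ecom.cmu.edu arrive at a single server, which holds one directory per user (zak, elenak, mansoo); each user directory contains a www directory with cgi-bin and html subdirectories.]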
41. Server-side includes
- Server mechanism that inserts dynamic or static content directly into an HTML document.
    some html some more html
    some html some more html
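
To show the idea of inserting content into a document, here is a minimal Python sketch that expands include directives. It assumes the Apache-style directive syntax `<!--#include file="..." -->`, which may differ from the directive on the original slide, and the function name `expand_includes` and its `read_file` callback are hypothetical.

```python
import re

# Assumed directive syntax: <!--#include file="name" -->
INCLUDE = re.compile(r'<!--#include\s+file="([^"]+)"\s*-->')


def expand_includes(html, read_file):
    """Replace each include directive with the text read_file returns for its file."""
    return INCLUDE.sub(lambda match: read_file(match.group(1)), html)


# Stand-in for the file system: a dict of file name -> content.
files = {"footer.html": "<hr>The Tiny Web Server"}
page = 'some html <!--#include file="footer.html" --> some more html'
result = expand_includes(page, files.__getitem__)
```

Passing the file reader in as a callback keeps the sketch independent of where included content lives; a real server would also have to decide which files an include is allowed to name.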