Title: Internet Applications
1Internet Applications
2The World Wide Web
- By far the best-known distributed application is
the World Wide Web (WWW), or the Web for short.
Technically, the Web is a distributed system of
HTTP servers and clients, more commonly known as
web servers and web browsers.
- Prior to the emergence of the Web, the user
community of the Internet consisted largely of
researchers and academics who used network
services such as electronic mail and file
transfer to exchange data.
- The World Wide Web originated with Tim
Berners-Lee in late 1990 at CERN, the European
Particle Physics Laboratory in Geneva,
Switzerland. A proposal for a "universal
hypertext system" was submitted in November 1990
by Tim Berners-Lee and Robert Cailliau.
3The World Wide Web
- Since the original proposal, the growth of the
World-Wide Web has been extraordinary (see Figure
1), and has expanded far beyond the research and
academic community into all sectors world-wide,
including commerce and private homes. The
continued development of the Web technology is
currently coordinated by the World-Wide Web
Consortium, W3C.
4The World Wide Web
- The genius of the World-Wide Web is that it
combines three important and well-established
computing technologies:
- Hypertext documents: documents in which chosen
words or phrases, typically highlighted, can be
marked as links to other documents, so that a
user is able to access the linked documents by
clicking with a mouse on the highlighted text.
- Network-based information retrieval: the File
Transfer Protocol (FTP) service was the most
widely used service for such information
retrieval.
- Standard Generalized Markup Language (SGML): an
ISO standard which allows documents to be marked
up with tags so that they can be displayed in a
uniform format on any platform, independent of
the presentation mechanics.
5The World Wide Web
- At its most basic, the World-Wide Web is a
client-server application based on a protocol
named the HyperText Transfer Protocol (HTTP).
- A web server is a connection-oriented server that
implements HTTP. By default, an HTTP server
runs at the well-known port 80.
- A user runs a World-Wide Web client (sometimes
referred to as a browser) on a local computer.
The client interacts with a web server according
to HTTP, specifying a document to be fetched.
If the server locates the document in its
directory, the document's contents are returned to
the client, which presents them to the user.
6The Hypertext Markup Language (HTML)
- HTML is a markup language used to create
documents that can be retrieved using the World
Wide Web.
- HTML is based on SGML, with semantics that are
appropriate for representing information of a
wide range of types. HTML markup can represent
hypertext news, mail, documentation, and
hypermedia; menus of options; database query
results; simple structured documents with
in-lined graphics; and hypertext views of
existing bodies of information.
7HTML
- <HTML>
- <HEAD>
- <TITLE>A Sample Web Page</TITLE>
- </HEAD>
- <HR>
- <BODY>
- <center>
- <H1>My Home Page</H1>
- <IMG SRC="/images/myPhoto.gif">
- <b>Welcome to Kelly's page!</b>
- <p>
- <!-- A list of hyperlinks follows. -->
- <a href="/doc/myResume.html"> My resume</a>.
- <p>
- <a href="http://www.someUniversity.edu/">My
university</a>
- </center>
- <HR>
- </BODY>
8The Extensible Markup Language XML
- Whereas HTML is a language that allows a document
to be marked up for the presentation or display
of the information contained in a document, XML
allows a document to be marked up for structured
information.
- Also based on SGML, XML uses tags to describe the
information contained in a document.
- <message>
- <to>you@yourAddress.com</to>
- <from>me@myAddress.com</from>
- <subject>This is a message</subject>
- <text>
- Hello world!
- </text>
- </message>
9HTTP
10The HyperText Transfer Protocol (HTTP)
- Originally conceived for fetching and displaying
text files, HTTP has been extended to allow the
transfer of web contents of virtually
unlimited types.
- The first version of HTTP, HTTP/0.9, was a simple
protocol for raw data transfer.
- The most widely used HTTP version is HTTP/1.0,
which was drafted by Tim Berners-Lee [13] but has
no formal specification, although its "common
usage" is described in RFC 1945 [8].
- Since then, an improved protocol, known as
HTTP/1.1, has been developed and widely adopted.
HTTP/1.1 is a far more extensive protocol than
HTTP/1.0. However, the basics of the protocol are
well represented in the simpler HTTP/1.0.
11The HyperText Transfer Protocol (HTTP)
- HTTP is a connection-oriented, stateless,
request-response protocol.
- An HTTP server, or web server, runs on TCP port
80 by default.
- HTTP clients, colloquially called web browsers,
are processes which implement HTTP to interact
with a web server to retrieve documents written
in HTML, whose contents are displayed according
to the documents' markup.
12The HyperText Transfer Protocol (HTTP)
- In HTTP/1.0, each connection allows only one
round of request-response.
- A client obtains a connection and issues a request.
- The server processes the request, issues a
response, and closes the connection thereafter.
13The HyperText Transfer Protocol (HTTP)
- HTTP is text-based: the requests and responses are
character strings.
- Each request and response is composed of these
parts, in order:
- The request/response line
- A header section
- A blank line
- The body
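The four-part layout above can be sketched in Java as follows. This is only an illustration: the class name HttpMessageParts is hypothetical (it does not appear in the course samples), and the sketch assumes the message uses \r\n line endings throughout.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the course samples): splits a raw HTTP
// message into its start line, header lines, and body.
class HttpMessageParts {
    final String startLine;
    final List<String> headerLines = new ArrayList<>();
    final String body;

    HttpMessageParts(String raw) {
        // The first blank line (CRLF CRLF) separates the head from the body.
        int split = raw.indexOf("\r\n\r\n");
        String head = (split >= 0) ? raw.substring(0, split) : raw;
        body = (split >= 0) ? raw.substring(split + 4) : "";
        String[] lines = head.split("\r\n");
        startLine = lines[0];
        for (int i = 1; i < lines.length; i++) {
            headerLines.add(lines[i]);
        }
    }

    public static void main(String[] args) {
        String raw = "POST /cgi/postForm.cgi HTTP/1.0\r\n"
                   + "Host: somehost.com\r\n"
                   + "Content-Length: 11\r\n"
                   + "\r\n"
                   + "name=Donald";
        HttpMessageParts m = new HttpMessageParts(raw);
        System.out.println("Start line: " + m.startLine);
        System.out.println("Headers:    " + m.headerLines);
        System.out.println("Body:       " + m.body);
    }
}
```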
14A sample HTTP session
15The HTTP request
- A client request is sent to the server after the
client has established a connection to the
server.
- A request line is of the following form:
- <HTTP method><space><Request-URI><space><protocol
specification>\r\n
- where
- <HTTP method> is the name of a method defined for
the protocol,
- <Request-URI> is the URI of a web document, or,
more generally, a web object,
- <protocol specification> is a specification of
the protocol observed by the client, and
- <space> is a space character.
- An example client request is as follows:
- GET /index.html HTTP/1.0
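As a rough illustration of the request-line format, the following Java sketch builds and parses such a line. The class RequestLine and its method names are assumptions made for this example only.

```java
// Hypothetical helper: formats and parses an HTTP request line of the form
// <HTTP method><space><Request-URI><space><protocol specification>\r\n
class RequestLine {
    static String format(String method, String uri, String protocol) {
        return method + " " + uri + " " + protocol + "\r\n";
    }

    // Returns {method, Request-URI, protocol}; rejects malformed lines.
    static String[] parse(String line) {
        String[] parts = line.trim().split(" ");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Malformed request line: " + line);
        }
        return parts;
    }

    public static void main(String[] args) {
        String line = format("GET", "/index.html", "HTTP/1.0");
        String[] p = parse(line);
        System.out.println(p[0] + " | " + p[1] + " | " + p[2]);
    }
}
```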
16HTTP Methods in a client request
- The HTTP method in a client request is a reserved
word (in uppercase) which specifies an operation
of the server that the client desires.
- Some of the key client request methods are
listed below:
- GET: for retrieving the contents of the web object
referenced by the specified URI.
- HEAD: for retrieving a header from the server
only, not the object itself.
- POST: used to send data to a process on the
server host.
- PUT: used to request the server to store the
contents enclosed with the request on the
server machine, in the file location specified by
the URI.
17The Request Header
- The request header fields allow the client to
pass additional information about the request,
and about the client itself, to the server. These
fields act as request modifiers, with semantics
equivalent to the parameters on a programming
language method (procedure) invocation.
- A header is composed of one or more lines, each
line in the form of
- <keyword>: <value>\r\n
18The Request Header
- Some of the keywords and values that may appear
in a request header are:
- Accept: content types acceptable by the client
- User-Agent: specifies the type of browser
- Connection: "Keep-Alive" can be specified so that
the server does not immediately close a
connection after sending a response.
- Host: host name of the server
- An example request header is as follows:
- Accept: */*
- Connection: Keep-Alive
- Host: www.someU.edu
- User-Agent: Generic
19Request Body
- A request optionally ends with a request body,
which contains data that needs to be transferred
to the server in association with the request.
- For example, if the POST method is specified in
the request line, then the body contains data to
be passed to the target process. (This is an
important feature and will become clearer when we
discuss CGI, servlets, and SOAP.)
20Examples of a complete client request
- Example 1
- GET / HTTP/1.1
- <blank line>
- Example 2
- HEAD / HTTP/1.1
- Accept: */*
- Connection: Keep-Alive
- Host: somehost.com
- User-Agent: Generic
- <blank line>
21Examples of a complete client request
- Example 3
- POST /servlet/myServer.servlet HTTP/1.0
- Accept: */*
- Connection: Keep-Alive
- Host: somehost.com
- User-Agent: Generic
- <blank line>
- Name=donald&email=donald@someU.edu
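A complete request such as those in the examples can be assembled programmatically. The sketch below is a minimal illustration; RequestBuilder is a hypothetical name, and the Content-Type and Content-Length headers in its main method are additions that a real POST would normally carry, although the example above omits them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: assembles the request line, header lines,
// blank line, and optional body into one request string.
class RequestBuilder {
    static String build(String method, String uri, String protocol,
                        Map<String, String> headers, String body) {
        StringBuilder sb = new StringBuilder();
        sb.append(method).append(' ').append(uri).append(' ')
          .append(protocol).append("\r\n");
        for (Map.Entry<String, String> h : headers.entrySet()) {
            sb.append(h.getKey()).append(": ")
              .append(h.getValue()).append("\r\n");
        }
        sb.append("\r\n");              // blank line ends the header section
        if (body != null) {
            sb.append(body);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String body = "Name=donald&email=donald@someU.edu";
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Host", "somehost.com");
        headers.put("User-Agent", "Generic");
        // Assumed headers, not shown in the example above.
        headers.put("Content-Type", "application/x-www-form-urlencoded");
        // For an ASCII body the character count equals the byte count.
        headers.put("Content-Length", String.valueOf(body.length()));
        System.out.print(build("POST", "/servlet/myServer.servlet",
                               "HTTP/1.0", headers, body));
    }
}
```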
22The HTTP Server Response
- In response to a request received from a client,
the HTTP server sends a response to it.
- Like the request, an HTTP response is composed of
these parts, in order:
- 1. The response or status line
- 2. A header section
- 3. A blank line
- 4. The body
23The response status line
- The status line is in the form of
- <protocol><sp><status-code><sp><description>\r\n
- The status code designations are as follows:
- 100-199 Informational
- 200-299 Client request successful
- 300-399 Client request redirected
- 400-499 Client error (request incomplete or invalid)
- 500-599 Server errors
- Example 1
- HTTP/1.0 200 OK
- Example 2
- HTTP/1.1 404 NOT FOUND
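A client might interpret a status line along the following lines. This is a sketch under the assumption that the line is well formed; the class StatusLine is hypothetical, and the category labels paraphrase the ranges listed above.

```java
// Hypothetical helper: parses "<protocol> <status-code> <description>"
// and classifies the status code by its hundreds digit.
class StatusLine {
    final String protocol;
    final int code;
    final String description;

    StatusLine(String line) {
        // Split into at most three parts so the description may contain spaces.
        String[] parts = line.trim().split(" ", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Malformed status line: " + line);
        }
        protocol = parts[0];
        code = Integer.parseInt(parts[1]);
        description = (parts.length > 2) ? parts[2] : "";
    }

    // Maps the code to one of the five status ranges.
    String category() {
        switch (code / 100) {
            case 1:  return "Informational";
            case 2:  return "Client request successful";
            case 3:  return "Client request redirected";
            case 4:  return "Client error";
            case 5:  return "Server error";
            default: return "Unknown";
        }
    }

    public static void main(String[] args) {
        StatusLine s = new StatusLine("HTTP/1.1 404 NOT FOUND\r\n");
        System.out.println(s.code + " -> " + s.category());
    }
}
```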
24HTTP Response Header
- The status line is followed by a response header.
A response header is composed of one or more
lines, each line in the form of
- <keyword>: <value>\r\n
- There are two types of response header lines:
- Response header lines
- Entity header lines
25HTTP Response Header
- Response header lines: these header lines
return information about the response, the
server, and further access to the resource
requested, as follows:
- Age: seconds
- Location: URI
- Retry-After: date|seconds
- Server: string
- WWW-Authenticate: scheme realm
26HTTP Response Header
- Entity header lines: these header lines contain
information about the contents of the object
requested by the client, as follows:
- Content-Encoding
- Content-Length
- Content-Type: type/subtype (see MIME)
- Expires: date
- Last-Modified: date
27HTTP Response Header
- An example response header is as follows:
- Date: Mon, 30 Oct 2000 18:52:08 GMT
- Server: Apache/1.3.9 (Unix) ApacheJServ/1.0
- Last-Modified: Mon, 17 June 2001 16:45:13 GMT
- Content-Length: 1255
- Connection: close
- Content-Type: text/html
- The Content-Type specifies the type of the data,
using the content type designation of the MIME
protocol.
- The Content-Encoding specifies the encoding
scheme (such as gzip or compress) applied to the data,
usually for the purpose of data compression.
- The Expires date gives the date/time
(specified in a format defined with HTTP) after
which the web object should be considered stale.
- The Last-Modified date specifies the date that the
object was last modified.
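Header lines of the form keyword: value can be collected into a map for easy lookup. The sketch below is illustrative only; HeaderParser is a hypothetical class, and it uses case-insensitive keys because HTTP header names are not case-sensitive.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: parses "keyword: value" lines into a map whose
// keys compare case-insensitively.
class HeaderParser {
    static Map<String, String> parse(String headerSection) {
        Map<String, String> headers =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String line : headerSection.split("\r\n")) {
            // Only the first colon separates keyword from value;
            // values such as dates may contain further colons.
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue;               // skip blank or malformed lines
            }
            headers.put(line.substring(0, colon).trim(),
                        line.substring(colon + 1).trim());
        }
        return headers;
    }

    public static void main(String[] args) {
        String section = "Date: Mon, 30 Oct 2000 18:52:08 GMT\r\n"
                       + "Content-Length: 1255\r\n"
                       + "Content-Type: text/html\r\n";
        Map<String, String> h = parse(section);
        int length = Integer.parseInt(h.get("content-length"));
        System.out.println("Body length: " + length);
        System.out.println("Body type:   " + h.get("Content-Type"));
    }
}
```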
28HTTP Response Body
- The body of the response follows the header and
a blank line, and contains the contents of the
web object requested.
- HTTP/1.1 200 OK
- Date: Sat, 15 Sep 2001 06:55:30 GMT
- Server: Apache/1.3.9 (Unix) ApacheJServ/1.0
- Last-Modified: Mon, 30 Apr 2001 23:02:36 GMT
- ETag: "5b381-ec-3aedef0c"
- Accept-Ranges: bytes
- Content-Length: 236
- Connection: close
- Content-Type: text/html
- <blank line>
- <html>
- <head>
- <title>My web page </title>
- </head>
- <body>
- Hello world!
- </BODY></HTML>
29Content Type MIME Protocol
30Content Type and the Mime Protocol
- One of the header lines returned in a server
response is the content type of the object
requested.
- Specification of the content type follows the
scheme established in a protocol known as MIME
(Multipurpose Internet Mail Extensions).
- Originally used for email, MIME is now widely
used for describing the content of a document
sent over a network.
- It supports a large and evolving set of
predefined content types, specified in the format
type/subtype.
31The Mime Protocol
- A small subset of the types and subtypes are
32Simple implementations of an HTTP Client
33A Basic HTTP Client implementation
- InetAddress host =
- InetAddress.getByName(args[0]);
- int port = Integer.parseInt(args[1]);
- String fileName = args[2].trim();
- // HTTP specifies \r\n line endings; many servers also accept \n
- String request =
- "GET " + fileName + " HTTP/1.0\n\n";
- MyStreamSocket mySocket =
- new MyStreamSocket(host, port);
- mySocket.sendMessage(request);
- // now receive the response from the HTTP server
- String response = mySocket.receiveMessage();
- // read and display one line at a time
- while (response != null) {
- System.out.println(response);
- response = mySocket.receiveMessage();
- } // end while
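The client above relies on a MyStreamSocket helper class that is not shown here. As a rough equivalent, the following sketch uses only java.net.Socket from the standard library. The class name SimpleHttpClient and the added Host header are assumptions of this example, not part of the original client.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone client: sends one HTTP/1.0 GET request and
// prints the raw response (status line, headers, and body).
class SimpleHttpClient {
    static String buildRequest(String fileName, String host) {
        return "GET " + fileName + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "\r\n";                  // blank line ends the request
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 3) {
            System.out.println(
                "usage: java SimpleHttpClient <host> <port> <file>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        String fileName = args[2].trim();
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(fileName, host)
                          .getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(new InputStreamReader(
                socket.getInputStream(), StandardCharsets.ISO_8859_1));
            String line;
            // With HTTP/1.0 the server closes the connection after the
            // response, so readLine() eventually returns null.
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

It could be run as, for example, java SimpleHttpClient www.someU.edu 80 /index.html, where the host name is a placeholder.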
34The Java URL Class
- The Java API provides a class called URL
specifically for retrieving the data from a web
object identified using a URI.
35The URLBrowser
- String host = args[0];
- String port = args[1].trim();
- String fileName = args[2].trim();
- String HTTPString =
- "http://" + host + ":" + port + "/" + fileName;
- URL theURL = new URL(HTTPString);
- InputStream inStream = theURL.openStream();
- BufferedReader input =
- new BufferedReader
- (new InputStreamReader(inStream));
- String response = input.readLine();
- // read and display one line at a time
- while (response != null) {
- System.out.println(response);
- response = input.readLine();
- } // end while
36Characteristics of HTTP
37HTTP is a Connection-Oriented Protocol
- With HTTP/1.0, a connection to a server is
automatically closed as soon as the server
returns a response. Thus exactly one round of
exchange is allowed between a client and a web
server: if a client needs to contact the same
server again in one session, it must reconnect to
the server to issue another request.
38HTTP is a Connection-Oriented Protocol
- The scheme is adequate for the original intent
of HTTP: retrieving simple network documents.
- It is inefficient for documents such as those
that contain a large number of links to image
objects to be fetched from the server, since
fetching each of these links requires the
reestablishment of a connection.
- It is also insufficient for sophisticated web
applications based on HTTP (such as shopping
carts).
39HTTP is a stateless Protocol
- HTTP/1.0 (as well as version 1.1) is also a
stateless protocol: the server does not maintain
any state information on a client's session.
Regardless of whether the connection is kept
alive, each request is handled by a server as a
new request. As with the non-persistent connections
originally in practice with HTTP, a stateless
protocol is adequate for the original intent of
the protocol, but not for the more complex
applications to which HTTP has been extended,
the next topic that we will study.
40HTTP is a Connection-Oriented Protocol
- HTTP/1.0 was extended to allow a request header
line, "Connection: Keep-Alive", to be issued by a
client that wishes to maintain a persistent
connection with the server; a cooperating server
will keep the connection open after sending a
response. In HTTP/1.1, connections are persistent
by default. Such a connection allows multiple
requests to be sent over the same TCP connection.
41Dynamically generated web contents
42Dynamically-generated Web Contents
- In the beginning, HTTP was employed to transfer
static contents, that is, contents that exist in
a constant state, such as a plain text file or an
image file.
- As the web evolved, applications began to use
HTTP for a purpose not originally intended: an
application which allows a browser user to
retrieve data based on dynamic information
entered during an HTTP session.
43Dynamically-generated Web Contents
- A typical web application, such as a shopping
cart, requires fetching remote data based on data
entered by a client at runtime.
- For example, an enterprise application typically
allows a user to key in data, which is then used
to formulate a query to retrieve data from a
database, and the outcome is displayed to the
user.
- Applied to the web, it is desirable to allow a
client to submit data during a web session to
retrieve data from the web server host, to be
displayed by the web browser.
44Dynamically-generated Web Contents
- A generic HTTP server does not possess the
application logic for fetching the data from the
data source.
- Instead, an external process that has the
application logic will serve as an intermediary.
- The external process runs on the server host,
accepts input data from the web server, exercises
its application logic to obtain data from the
data source, and returns the outcome to the web
server, which transmits the outcome to the
client.
45Dynamically-generated Web Contents
- The first widely adopted protocol to augment HTTP
in supporting run-time generated web contents is
the Common Gateway Interface (CGI) protocol.
- Although rudimentary by comparison, CGI is the
predecessor of more sophisticated protocols and
facilities (such as the Java Servlet) that serve
a similar purpose.
- An understanding of CGI and some of its
supplementary protocols is important in that it
prepares us for understanding more
advanced protocols and facilities.
46The Common Gateway Interface (CGI) Protocol
47Common Gateway Interface (CGI)
- The Common Gateway Interface (CGI) is a standard
for providing an interface, or a gateway, between
an information server and an external process
(that is, a process external to the server).
- Using the protocol, a web client may specify a
program, known as a CGI script, as the target web
object in an HTTP request.
- The web server fetches the CGI script and activates
it as a process, passing to the process the input
data transmitted by the web client. The CGI
script executes and transmits its output to the
web server, which returns the script-generated
data as the body of a response to the
web client.
48CGI - 2
- An HTTP request may specify a CGI program, or CGI
script.
- A CGI program can be written in
- Programming languages such as C, Ada, C++, or
Fortran; such a program needs to be compiled to
generate an executable.
- Script languages such as Perl, Tcl, or cobra; such a
program, referred to as a CGI script, requires
the appropriate language interpreter to be
present at the server host.
- CGI is commonly used for processing user input from
HTML forms, and subsequently composing a web page
sent as part of the server response.
49CGI Program - 3
- When a web server receives a request whose URI
specifies a web program, the web server initiates
the execution of the web program.
- The web program formulates its output in HTML,
which is sent to the server and forwarded to the
web client as the HTTP response.
50CGI program
51Action field in a web page
- A web script can be specified in the action field
of a web page. When the web page is submitted,
an HTTP request is issued by the browser
specifying the web script as the URI:
- <HTML>
- <HEAD>
- <TITLE>A Simple Web Page which illustrates
CGI</TITLE>
- </HEAD>
- <BODY>
- <FORM ACTION="Hello.cgi">
- <CENTER>
- Click on the SUBMIT button to activate
- the CGI script Hello.cgi<br>
- <INPUT TYPE="Submit" NAME="submit"
VALUE="SUBMIT">
- </CENTER>
- </FORM>
- </BODY>
- </HTML>
52Common Gateway Interface (CGI)
53A sample web page (hello.html) which invokes a
CGI script
- <HTML>
- <HEAD>
- <TITLE>A web page which invokes a web
script</TITLE>
- </HEAD>
- <BODY>
- <H1>This web page illustrates the use of a web
script</H1>
- <P>
- <BR>
- The script or program is either a run-script
written in a
- script language such as Perl, or an executable
generated
- from a source program written in a language such
as C/C++.
- </P>
- <HR>
- <FORM METHOD="post" ACTION="hello.cgi">
- <HR>
- Press <input type="submit" value="here"> to
submit your query.
- </FORM>
- <HR>
- </BODY>
54A sample web script hello.c
- /*
- This C program is for a CGI script which
generates
- the output for a web page. When displayed by
a
- browser, the message "Hello there!" will be
shown
- in blue.
- */
- #include <stdio.h>
- int main(int argc, char *argv[]) {
- /* header line followed by a blank line (10 is the newline character) */
- printf("Content-type: text/html%c%c",10,10);
- printf("<font color = blue>");
- printf("<H1>Hello there!</H1>");
- printf("</font>");
- return 0;
- }
55A sample web script hello.pl
- #!/usr/local/bin/perl
- # A simple Perl CGI script
- print "Content-type: text/html\n\n";
- print "<head>\n";
- print "<title>Hello, World</title>\n";
- print "</head>\n";
- print "<body>\n";
- print "<font color = blue>\n";
- print "<h1>Hello, World</h1>\n";
- print "</font>\n";
- print "</body>\n";
56Web forms
57A Web Form
- You may have noticed that the hello example
presented does not make use of any user input,
and the contents of the dynamically generated web
page are predetermined. This is because the
example is provided as an overview of the CGI
protocol.
- In practice, a CGI script is typically invoked
by a special kind of web page known as a web
form, to be described in the next section, which
accepts input at run time and invokes a CGI
script which makes use of such input. We will
next look at the CGI
58A web form
- A web form is a special kind of web page which
- provides a graphical user interface that prompts
for input data from a user
- invokes the execution of an external program on
the web server host when a submit button on the
page is pressed by the user
- See form.html
59A web form
- The code that generates a web form is enclosed
between the HTML tags <FORM> ... </FORM>
- Within the <FORM> tag, attributes can be
specified to provide additional information
related to the CGI protocol, including
- ACTION=<a character string containing the
absolute or relative URL of
the external program which is to be initiated by
the web server when the form is submitted>
- METHOD=<a reserved word, POST or GET, which
specifies the manner in which the external program
expects to receive from the web server the
collection of data submitted by the user, called
the query data.>
- <FORM METHOD="post" ACTION="form.cgi">
60A web form
- In the coding for the form, each of the input
items (also called input elements) has a NAME
attribute.
- For each of these items, the browser user enters
or selects a value.
- What is thy NAME: <INPUT NAME="name"><P>
- What is thy favorite color:
- <SELECT NAME="color">
- The collection of the data for the input items is
a character string, called a query string, of
name=value pairs separated by the & character.
- name=John%20Chen&color=red
- Each name=value pair is encoded using
URL-encoding, so that some unsafe characters
(such as spaces, quotes, %, and &) are mapped to a
hexadecimal representation.
- For example, the value string
- The return is >17% is encoded as
The%20return%20is%20%3E17%25.
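The encoding step normally performed by the browser can be imitated in Java. The sketch below is an illustration only: QueryStringEncoder is a hypothetical class, and it assumes Java 10 or later for the URLEncoder.encode(String, Charset) overload. Note that URLEncoder writes a space as "+", so the sketch converts it to "%20" to match the form used in these notes.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a URL-encoded query string of
// name=value pairs separated by '&'.
class QueryStringEncoder {
    static String encode(String s) {
        // URLEncoder encodes a space as "+". A literal "+" in the input
        // has already become "%2B", so this replacement is unambiguous.
        return URLEncoder.encode(s, StandardCharsets.UTF_8)
                         .replace("+", "%20");
    }

    static String build(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> f : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(f.getKey())).append('=')
              .append(encode(f.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "John Chen");
        fields.put("color", "red");
        System.out.println(build(fields));
        System.out.println(encode("The return is >17%"));
    }
}
```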
61A Web Form Query String
- An example of a query string for the example form
is
- name=John%20Doe&quest=peace%20on%20earth&color=azure
- &swallow=continental&text=The%20return%20is%20%3E17%25
(all on one line)
- The collection of the data into a query string,
including the encoding of the values, is
performed by the browser.
- When the form is submitted by the user, the
query string is passed to the server in the HTTP
request, in a manner depending on the FORM METHOD
specified in the form. The query string is then
forwarded by the server to the external program.
62Web Form Query String Processing
- Based on the form input, the browser assembles
the query string.
- The string is transmitted to the web server,
which in turn passes it on to the external
program (the CGI script named in the form).
- The manner in which the string is transmitted
depends on the specification of the FORM METHOD in
the web form.
63FORM GET Method browser to server
- If GET is specified with the FORM METHOD attribute,
the query string is transmitted to the server in
an HTTP request with a GET method line.
- <FORM METHOD="get" ACTION="getForm.cgi">
- Recall that an HTTP GET request specifies a URI
for the web object requested by the client. To
accommodate the query string, the syntax for the
URI specification was extended to allow the
attachment of the query string to the end of the
URI (for the CGI script), delimited by the ?
character, as, for example:
- GET /cgi/getForm.cgi?name=John%20Doe&quest=peace
HTTP/1.0
- Since the length of the GET Request-URI line is
limited, the length of the query string that can
be appended in this manner is also limited.
Hence this method is not suitable if the form
needs to send a large amount of data, such as
data in a text box.
64Form GET method server to external program
- The server invokes the CGI script and passes on
the query string that it received from the
browser, as appended to the URI in the HTTP
request.
- The CGI program, or the external program in
general, will receive the encoded form input in
an environment variable called QUERY_STRING.
- Environment variables are variables maintained by
the operating system of the server host.
- The CGI program retrieves the query string from
the environment variable, decodes the character
string to obtain the name-value pairs, and uses
the parameters during the execution of the
program to generate output phrased in HTML.
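The course sample for this step is the C program getForm.c. Purely as an illustration in Java, the sketch below shows the same sequence: read QUERY_STRING, decode it, and emit HTML. The class name GetFormCgi is hypothetical, Java 10 or later is assumed for URLDecoder.decode(String, Charset), and a production script should also HTML-escape the values before echoing them.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical CGI-style program: decodes the query string passed in
// the QUERY_STRING environment variable and lists the name-value pairs.
class GetFormCgi {
    static Map<String, String> decode(String query) {
        Map<String, String> pairs = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return pairs;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = (eq >= 0) ? pair.substring(0, eq) : pair;
            String value = (eq >= 0) ? pair.substring(eq + 1) : "";
            pairs.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                      URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // The web server sets QUERY_STRING before starting the program.
        Map<String, String> pairs = decode(System.getenv("QUERY_STRING"));
        // Header line, blank line, then the HTML body.
        System.out.print("Content-type: text/html\n\n");
        System.out.println("<html><body><h1>Query Results</h1><ul>");
        for (Map.Entry<String, String> p : pairs.entrySet()) {
            // Note: values are echoed unescaped to keep the sketch short.
            System.out.println("<li><code>" + p.getKey() + " = "
                               + p.getValue() + "</code>");
        }
        System.out.println("</ul></body></html>");
    }
}
```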
65The getForm example
- See
- getForm.html
- getForm.c
68- > telnet www 80
- Trying 129.65.241.7...
- Connected to tiedye-srv.csc.calpoly.edu.
- Escape character is '^]'.
- GET /mliu/form/getForm.cgi?name=Donald HTTP/1.0
- HTTP/1.1 200 OK
- Date: Sun, 24 Feb 2002 22:30:55 GMT
- Server: Apache/1.3.9 (Unix) ApacheJServ/1.0
- Connection: close
- Content-Type: text/html
- <body bgcolor="#CCFFCC"><h2>This page is
generated dynamically by getForm.cgi.</h2>
- <H1>Query Results</H1>You submitted the
following name/value pairs:<p>
- <ul>
- <li> <code>name = Donald</code>
- </body></html>Connection closed by foreign host.
69FORM POST Method browser to server
- If POST is specified with the FORM METHOD attribute,
the query string is transmitted to the server in
an HTTP request with a POST method line, as
previously described.
- <FORM METHOD="post" ACTION="postForm.cgi">
- Recall that an HTTP POST request is followed by a
request body, which holds text contents to be
sent to the server. Using the POST method, the
URI of the CGI script is specified with the POST
request line, followed by the request header, a
blank line, then the query string, as, for
example:
- POST /cgi/postForm.cgi HTTP/1.0
- Accept: */*
- Connection: Keep-Alive
- Host: myHost.someU.edu
- User-Agent: Generic
- <blank line>
- name=John%20Doe&quest=peace%20on%20earth&color=azure
- Since the protocol places no limit on the length
of the request body, the query string can be of
arbitrary length. Hence the POST method can be
used to send a large amount of query data to the
server.
70Form POST method server to external program
- The server invokes the CGI script and passes on
the query string that it received from the
browser via the request body.
- The CGI program, or the external program in
general, will receive the encoded form input on
the standard input.
- The server will NOT send an EOF at the end
of the data; instead, the program should use the
environment variable CONTENT_LENGTH to determine
how much data to read from the standard
input.
- The CGI program reads the query string from the
standard input, decodes the character string to
obtain the name-value pairs, and uses the
parameters during the execution of the program to
generate output phrased in HTML.
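The course sample for this step is the C program postForm.c. The Java sketch below only illustrates the key point, reading exactly CONTENT_LENGTH bytes from the standard input rather than waiting for end-of-file. The class name PostFormCgi is hypothetical, and the body is assumed to be ASCII, as a URL-encoded query string is.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical CGI-style program: reads the POSTed query string from
// the standard input, using CONTENT_LENGTH to know when to stop.
class PostFormCgi {
    // Reads up to 'length' bytes; stops early only if the stream ends.
    static String readBody(InputStream in, int length) {
        byte[] buf = new byte[length];
        int total = 0;
        try {
            while (total < length) {
                int n = in.read(buf, total, length - total);
                if (n < 0) {
                    break;              // stream ended before 'length' bytes
                }
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buf, 0, total, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // The web server sets CONTENT_LENGTH before starting the program.
        String lengthVar = System.getenv("CONTENT_LENGTH");
        int length = (lengthVar == null || lengthVar.trim().isEmpty())
                   ? 0 : Integer.parseInt(lengthVar.trim());
        String query = readBody(System.in, length);
        System.out.print("Content-type: text/html\n\n");
        System.out.println("<html><body>Received " + query.length()
                           + " bytes of query data.</body></html>");
    }
}
```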
71The postForm example
- See
- postForm.html
- postForm.c
74- > telnet www 80
- Trying 129.65.241.7...
- Connected to tiedye-srv.csc.calpoly.edu.
- Escape character is '^]'.
- POST /mliu/form/postForm.cgi HTTP/1.0
- Content-type: application/x-www-form-urlencoded
- Content-length: 11
- name=Donald
- HTTP/1.1 200 OK
- Date: Sun, 24 Feb 2002 22:52:33 GMT
- Server: Apache/1.3.9 (Unix) ApacheJServ/1.0
- Connection: close
- Content-Type: text/html
- <body bgcolor="#FFFF99"><H1>Query Results</H1>You
submitted the following name/value pairs:<p>
- <ul>
- <li> <code>name = Donald</code>
75Encoding and decoding query strings
- Whether a query string is obtained from the
QUERY_STRING environment variable or from the
standard input, the CGI program must decode the
string and extract the name-value pairs from it,
so that the parameters may be used for the
program's execution.
- Due to the popularity of CGI programs, there are
a number of existing libraries or classes that
provide routines (functions) and methods for this
purpose. For example, Perl has easy-to-use
procedures in a library called CGI-lib for
decoding and for extracting the name-value pairs
into a data structure called an associative
array, and NCSA provides a library of C routines
for the same purpose.
- See getForm.c, postForm.c
76Environment Variables used with CGI
- An environment variable is a parameter of
a user's working environment on a computer
system, such as the default directory path for
the system to locate a program invoked by the
user. Environment variables are used across
multiple languages and operating systems to
provide information to applications that may be
specific to a user.
- CGI uses environment variables that are set by
the HTTP server to pass information about
requests from the server to the external program
(CGI script).
77Environment Variables used with CGI
- Some of the key environment variables related to
CGI are listed below:
- REQUEST_METHOD: The method with which the
request was made. For CGI, this is "GET" or
"POST".
- QUERY_STRING: If the GET method was specified
in the form, this variable contains a character
string for the form data.
- CONTENT_TYPE: The content type of the data,
which should be application/x-www-form-urlencoded
for a query string.
- CONTENT_LENGTH: The length of the query
string.
78Web Session State Data
79Web Session and session state data
- During a session of a web application such as a
shopping cart, several HTTP requests are issued,
each of which invokes an external program such as
a CGI script.
80Web Session and session state data
- Note that in our example it is necessary for the
second CGI script, form2.cgi, to have knowledge
of the value of the data item id in the query string
sent to the first CGI script, form1.cgi.
- However, the two web scripts are two separate
programs and are executed by the web server
independently.
- Data that need to be shared among CGI scripts
invoked successively during a web session are
called session state data.
- Neither HTTP nor CGI has a provision for
such sharing, as both of these protocols are
stateless and do not support the notion of a
session.
81Session Data Sharing Mechanisms
- Because of the popularity of Internet
applications, a variety of mechanisms have
emerged to allow the sharing of session data
among CGI scripts (and other external programs).
- These mechanisms can be classified as follows:
- Server-side facilities
- Client-side facilities
82Server-side facilities for session state data
- Secondary storage (file or database) on the
server host may be used as a repository of
session state data.
- Software objects may be employed as a state
data repository: Java beans, session objects,
application context state data objects.
83Client-side facilities for session state data
- An ingenious idea for maintaining session state
data is to maintain the data through the web
client.
- Since each session is associated with a single
client, this scheme allows the state data to be
maintained in a decentralized fashion.
- Specifically, the scheme allows the state data to
be passed from a web script to the web client,
which passes the data to a subsequent web script.
The data passing can be repeated throughout the
duration of the web session.
84Client-side facilities for session state data
- Two schemes make use of client-side
facilities to maintain session data:
- Hidden form fields: this scheme embeds session
state data in dynamically generated web forms.
- Cookies: this mechanism uses transient or
persistent storage on the client host to hold
state data, which is passed in the HTTP request
header to web scripts that require the data.
85Maintaining state data using hidden form fields
86Using HIDDEN FORM Fields
- A hidden form field, or hidden field, is an INPUT
element in a web form specified with
TYPE=HIDDEN.
- Unlike other INPUT elements, a hidden
field is not displayed by the browser and
requires no input. Rather, the value of the
element is the VALUE attribute specified with the
field, and the name-value pair of the field is
collected by the browser, along with the
name-value pairs of other INPUT elements, in the
query string when the form is submitted.
87Using HIDDEN FORM Fields
88Using HIDDEN FORM Fields
- The first web script, form.cgi, generates the
element
- <input type="hidden" name="id" value="12345">
- in the dynamically generated form2.html.
- form2.html, when presented by the browser, will
not display this field, but another input field
is displayed which prompts for a purchase.
- When form2.html is submitted, the query string
id=12345&buy=tv is sent to the second web script,
form2.cgi.
- When the query string is decoded, the value of
the state data item id becomes available to
form2.cgi.
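The exchange above can be sketched in a few lines. This is a minimal illustration in Python rather than the C of the program samples; the function names render_form2 and decode_query are hypothetical, and the form layout (a text field named buy, GET submission) is assumed from the query string id=12345&buy=tv.

```python
from html import escape
from urllib.parse import parse_qs


def render_form2(session_id):
    """Return the HTML that the first script would emit as form2.html.

    The session id travels in a hidden INPUT element; the visible
    field named "buy" (an assumed name) prompts for the purchase.
    """
    return (
        '<form method="get" action="form2.cgi">\n'
        '  <input type="hidden" name="id" value="%s">\n'
        '  Purchase: <input type="text" name="buy">\n'
        '  <input type="submit">\n'
        '</form>' % escape(session_id, quote=True)
    )


def decode_query(query_string):
    """Decode a query string into a name -> value dictionary.

    parse_qs returns a list of values per name; only the first value
    is kept here, which is enough for this sketch.
    """
    return {name: values[0] for name, values in parse_qs(query_string).items()}


if __name__ == "__main__":
    print(render_form2("12345"))
    # What the second script sees once the form is submitted:
    print(decode_query("id=12345&buy=tv"))
```

The hidden element never appears on screen, yet its name-value pair arrives in the query string exactly like the visible field's pair.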
89Using HIDDEN FORM Fields
90Using HIDDEN FORM Fields
- The hidden field is a rudimentary scheme for
maintaining session data. It has the merit of
simplicity, requiring only the introduction of a
new form field element and no additional
resources on either the server side or the
client side.
- In the scheme, the HTTP client becomes a
temporary repository for the state information,
and the session data is sent using the normal
mechanisms for transmitting query strings.
- The simplicity of the scheme comes at the cost of
a security risk, in the sense that the state data
transmitted using hidden form fields is
unprotected.
91Using HIDDEN FORM Fields
- Although a hidden input element is not displayed
by the browser, it is embedded in the source code
of the dynamically generated web page form2.html,
which is plainly viewable by any browser user who
exercises the view-source capability provided by
the browser. Hence the scheme allows data to
become exposed, and therefore poses a security
risk.
- Hidden fields should not be used to transmit
sensitive data such as identification or
account balances.
92Example code of using hidden fields to pass state
data
- See the files in the CGI/hiddenFields folder of
the program samples:
- Form.html
- hiddenForm.c
- hiddenForm2.c
93Maintaining state data using cookies
94Using cookies for state data
- A more sophisticated scheme for session state
data repository on the client side is a mechanism
known as a cookie, for no compelling reason.
- The scheme makes use of an extension of basic
HTTP to allow a server's response to contain a
piece of state information for which the client
will provide storage in an object.
- Included in that state object is a description
of the range of URLs for which that state is
valid. Any future HTTP requests made by the
client which fall in that range will include a
transmittal of the current value of the state
object from the client back to the server.
95Using cookies for state data
- A CGI script creates a cookie by including a
Set-Cookie header line as part of the HTTP
response that it outputs.
- Each cookie contains a URL-encoded name-value
pair, similar to a name-value pair in a query
string, for a state data item (for example,
id=12345). When the response is received by the
browser, it creates an object (a cookie) which
contains the name-value pair.
- The cookie is sent as a request header line in
each subsequent request sent by the browser to
the web server, which passes the name-value pair
on to the web script being invoked.
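As a rough sketch of the first step, the function below assembles what a CGI script might write to its standard output in order to create a cookie. The name cgi_response and the Content-Type header are illustrative assumptions; only the Set-Cookie line and the URL-encoded name-value pair come from the description above.

```python
from urllib.parse import quote


def cgi_response(name, value, body):
    """Build CGI output: header lines, a blank line, then the body.

    The Set-Cookie line carries one URL-encoded name-value pair and
    must appear among the headers, before the blank line.
    """
    headers = [
        "Content-Type: text/html",
        "Set-Cookie: %s=%s" % (quote(name), quote(value)),
    ]
    return "\n".join(headers) + "\n\n" + body


if __name__ == "__main__":
    print(cgi_response("id", "12345", "<html><body>Welcome</body></html>"))
```

A browser that receives this response would store the pair id=12345 and return it in a Cookie request header on later requests to the same server.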
96Using cookies for state data
97Using cookies for state data
98Syntax of the Set-Cookie HTTP Response Header Line
- The core syntax of the Set-Cookie header line is
a string in the following format (the keywords
are Set-Cookie, expires, path, domain, and
secure):
- Set-Cookie: NAME=VALUE; expires=DATE;
path=PATH; domain=DOMAIN_NAME; secure
- The line starts with the keyword Set-Cookie and
the delimiter colon (:), followed by a list of
attributes separated by semicolons. The
attributes are explained as follows.
99Syntax of the Set-Cookie HTTP Response Header
Line
- NAME=VALUE
- URL-encoded name-value pair for the state data
to be stored in the cookie created. This is the
only required attribute on the Set-Cookie header
line.
100Syntax of the Set-Cookie HTTP Response Header
Line
- expires=DATE
- The expires attribute specifies a date string
that defines the valid lifetime of the cookie.
Once the expiration date has been reached, the
client host is free to deallocate the cookie, and
the state data contained in the cookie can no
longer be assumed to be sent to the server.
- The date string is formatted as
- Wdy, DD-Mon-YYYY HH:MM:SS GMT
- The time format is based on RFC 822, RFC 850, RFC
1036, and RFC 1123, with the variations that the
only legal time zone is GMT and the separators
between the elements of the date must be dashes.
- expires is an optional attribute. If not
specified, the cookie will expire when the user's
session ends.
101Syntax of the Set-Cookie HTTP Response Header
Line
- domain=DOMAIN_NAME
- This attribute sets the domain for the cookie
created.
- Among the cookies stored on the client host, a
browser is supposed to send only those cookies
whose domain attribute matches the Internet
domain name of the host specified in the URI of
the object in the HTTP request (with which the
cookie is sent).
- If there is a tail match, then the cookie will
go through path matching to see if it should be
sent. "Tail matching" means that the domain
attribute is matched against the tail of the
fully qualified domain name in the URI.
102Syntax of the Set-Cookie HTTP Response Header Line
- For example, a domain attribute of "acme.com"
would match the host name "anvil.acme.com" as
well as "shipping.crate.acme.com",
- so that the name-value pair in a cookie
tagged with the domain attribute acme.com
will be sent with an HTTP request where the
requested object has a URI containing the host
name anvil.acme.com (such as
anvil.acme.com/index.html) or
shipping.crate.acme.com (such as
shipping.crate.acme.com/sales/shop.htm).
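Tail matching can be sketched as a suffix test on the host name. The function name domain_matches is hypothetical, and one detail is an assumption that goes slightly beyond the description above: the match is required to fall on a label boundary, so that "acme.com" does not match an unrelated host such as "notacme.com".

```python
def domain_matches(cookie_domain, request_host):
    """Return True if request_host tail-matches cookie_domain.

    Host names are compared case-insensitively. The host must either
    equal the domain or end with "." followed by the domain.
    """
    domain = cookie_domain.lower().lstrip(".")
    host = request_host.lower()
    return host == domain or host.endswith("." + domain)


if __name__ == "__main__":
    for host in ("anvil.acme.com", "shipping.crate.acme.com", "www.someU.edu"):
        print(host, domain_matches("acme.com", host))
```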
103Syntax of the Set-Cookie HTTP Response Header Line
- The default value of domain is the host name of
the server which generated the cookie response.
- For example, if the server is www.someU.edu
and no domain attribute is set with a cookie,
then the cookie's domain is www.someU.edu.
104Syntax of the Set-Cookie HTTP Response Header Line
- path=PATH
- The path attribute is used to specify the subset
of URIs in a domain for which the cookie is
valid.
- If a cookie has already passed the domain
matching, then the pathname component of the URI
is compared with the path attribute, and if there
is a match, the cookie is considered valid and is
sent along with the HTTP request. The path "/foo"
would match "/foobar" and "/foo/bar.html". The
path "/" is the most general path.
- If the path is not specified, it is assumed to be
the same path as the document being described by
the header which contains the cookie.
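The path comparison described above amounts to a prefix test, which is why "/foo" matches both "/foobar" and "/foo/bar.html". A minimal sketch, with the hypothetical function name path_matches:

```python
def path_matches(cookie_path, request_path):
    """Return True if the cookie's path is a prefix of the request path."""
    return request_path.startswith(cookie_path)


if __name__ == "__main__":
    for request_path in ("/foobar", "/foo/bar.html", "/bar"):
        print(request_path, path_matches("/foo", request_path))
```

Since every pathname starts with "/", a cookie whose path is "/" is sent with every request to a matching domain.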
105The path attribute in set cookie
106The secure attribute in set cookie
- secure
- If a cookie is marked secure, it will only be
transmitted if the communications channel with
the host is a secure one. Currently this means
that secure cookies will only be sent to HTTPS
(HTTP over SSL) servers.
- If secure is not specified, a cookie is
considered safe to be sent in the clear over
unsecured channels.
107How cookies are passed from the browser to the
server
- When requesting a URL from an HTTP server, the
browser will match the URI against all cookies
stored on the client host.
- If any matching cookie is found, then a line
containing the name/value pairs of all matching
cookies will be included in the HTTP request.
The format of the line is
- Cookie: NAME1=VALUE1; NAME2=VALUE2; ...;
NAMEn=VALUEn
108How cookies are passed from the browser to the
server
- Cookie: NAME1=VALUE1; NAME2=VALUE2; ...;
NAMEn=VALUEn
- When such a line is encountered by the HTTP
server in the request header, the server extracts
the substring containing the name-value pairs
from the line and places the string in an
environment variable named HTTP_COOKIE.
- When the CGI script is executed, it may retrieve
the state data, as name-value pairs, from the
environment variable HTTP_COOKIE.
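A script can recover the individual pairs by splitting the HTTP_COOKIE string on semicolons. The sketch below is one way to do so; the function name parse_http_cookie is hypothetical, and the pairs are returned as a list rather than a dictionary so that two cookies with the same name are both preserved.

```python
import os
from urllib.parse import unquote


def parse_http_cookie(raw):
    """Split 'NAME1=VALUE1; NAME2=VALUE2' into a list of (name, value) pairs."""
    pairs = []
    for item in raw.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate empty segments such as a trailing ';'
        name, _, value = item.partition("=")
        pairs.append((unquote(name), unquote(value)))
    return pairs


if __name__ == "__main__":
    # In a real CGI script the server sets HTTP_COOKIE before invocation.
    print(parse_http_cookie(os.environ.get("HTTP_COOKIE", "")))
```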
109How cookies are passed from the browser to the
server
- Example
- If the following request is sent to the server
- GET /cgi/hello.cgi?name=John&quest=peace HTTP/1.0
- Cookie: age=25
- <blank line>
- then the server will place the string
name=John&quest=peace in the environment
variable QUERY_STRING and the string age=25 in
HTTP_COOKIE for the invoked CGI script.
110How cookies are passed from the browser to the
server
- Example
- If a request sent to a server is
- POST /cgi/hello.cgi HTTP/1.0
- Cookie: age=25
- <blank line>
- name=John&quest=peace
- then the string name=John&quest=peace will be
sent by the server to the standard input of the
CGI script, while the string age=25 will be
placed in the environment variable HTTP_COOKIE.
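The two examples differ only in where the form data arrives; the cookie string is found in HTTP_COOKIE either way. The sketch below illustrates this with a hypothetical helper, read_cgi_input. It relies on the standard CGI variables REQUEST_METHOD and CONTENT_LENGTH, which the examples above do not show, so treat their use here as an assumption about the server's behaviour.

```python
import io
from urllib.parse import parse_qsl


def read_cgi_input(environ, stdin):
    """Return (form data dictionary, raw cookie string) for a CGI request.

    GET data is read from QUERY_STRING; POST data is read from the
    standard input stream, limited to CONTENT_LENGTH characters.
    """
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        raw = environ.get("QUERY_STRING", "")
    return dict(parse_qsl(raw)), environ.get("HTTP_COOKIE", "")


if __name__ == "__main__":
    body = "name=John&quest=peace"
    get_env = {"REQUEST_METHOD": "GET", "QUERY_STRING": body,
               "HTTP_COOKIE": "age=25"}
    post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body)),
                "HTTP_COOKIE": "age=25"}
    print(read_cgi_input(get_env, io.StringIO("")))
    print(read_cgi_input(post_env, io.StringIO(body)))
```

In an actual script, os.environ and sys.stdin would be passed in place of the dictionaries and the StringIO objects used for this demonstration.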
111How cookies are passed from the browser to the
server
- The domain and path attributes for the cookies
are designed to allow state data to be shared
among selected CGI scripts.
- Two example transaction sequences follow.
112First Example transaction sequence
- Client requests a document, and receives in the
response
- Set-Cookie: CUSTOMER=WILE_E_COYOTE; path=/;
expires=Wednesday, 09-Nov-99 23:12:40 GMT
- When client requests a URL in path "/" on this
server, it sends
- Cookie: CUSTOMER=WILE_E_COYOTE
- Client requests a document, and receives in the
response
- Set-Cookie: PART_NUMBER=ROCKET_LAUNCHER_0001;
path=/
- When client requests a URL in path "/" on this
server, it sends
- Cookie: CUSTOMER=WILE_E_COYOTE;
PART_NUMBER=ROCKET_LAUNCHER_0001
- Client receives
- Set-Cookie: SHIPPING=FEDEX; path=/foo
- When client requests a URL in path "/" on this
server, it sends
- Cookie: CUSTOMER=WILE_E_COYOTE;
PART_NUMBER=ROCKET_LAUNCHER_0001
- When client requests a URL in path "/foo" on
this server, it sends
- Cookie: CUSTOMER=WILE_E_COYOTE;
PART_NUMBER=ROCKET_LAUNCHER_0001; SHIPPING=FEDEX
113Second Example transaction sequence
- Client receives
- Set-Cookie: PART_NUMBER=ROCKET_LAUNCHER_0001;
path=/
- When client requests a URL in path "/" on
this server, it sends
- Cookie: PART_NUMBER=ROCKET_LAUNCHER_0001
- Client receives
- Set-Cookie: PART_NUMBER=RIDING_ROCKET_0023;
path=/ammo
- When client requests a URL in path
"/ammo" on this server, it sends
- Cookie: PART_NUMBER=RIDING_ROCKET_0023;
PART_NUMBER=ROCKET_LAUNCHER_0001
- NOTE: There are two name/value pairs named
"PART_NUMBER" because two cookies match the
request path: the one with path "/" and the one
with path "/ammo".
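The second transaction sequence can be replayed with a toy model of the browser's cookie store. Everything here is illustrative: the class name CookieJar and its methods are hypothetical, domain matching and expiry are left out, and the rule that cookies with a more specific path are listed first is an assumption inferred from the order shown in that sequence.

```python
class CookieJar:
    """Toy client-side cookie store keyed by (name, path)."""

    def __init__(self):
        self._cookies = {}  # (name, path) -> value

    def set_cookie(self, name, value, path="/"):
        """Store a cookie; the same name may coexist under different paths."""
        self._cookies[(name, path)] = value

    def cookie_header(self, request_path):
        """Return the value of the Cookie header for request_path."""
        matches = [
            (path, name, value)
            for (name, path), value in self._cookies.items()
            if request_path.startswith(path)
        ]
        # Assumed ordering: longest (most specific) path first.
        matches.sort(key=lambda match: len(match[0]), reverse=True)
        return "; ".join("%s=%s" % (name, value) for _, name, value in matches)


if __name__ == "__main__":
    jar = CookieJar()
    jar.set_cookie("PART_NUMBER", "ROCKET_LAUNCHER_0001", "/")
    print("Cookie:", jar.cookie_header("/"))
    jar.set_cookie("PART_NUMBER", "RIDING_ROCKET_0023", "/ammo")
    print("Cookie:", jar.cookie_header("/ammo"))
```

Keying the store on the pair (name, path) is what allows both PART_NUMBER cookies to be held at once instead of the second overwriting the first.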
114A sample set of CGI scripts which make use of
cookies
- Cookie/Cookie.html
- Cookie/Cookie.c
- Cookie/Cookie2.c
115Summary - 1
- You have been introduced to Internet
applications and the key protocols that support
them.
- The Hypertext Markup Language (HTML) is a markup
language used to create documents that can be
retrieved using the World Wide Web.
- XML (Extensible Markup Language) allows a
document to be marked up for structured
information.
116Summary - 2
- HTTP (Hypertext Transfer Protocol) is the
protocol that transports content on the web.
- It allows the transfer of web content of
virtually unlimited types.
- It is a connection-oriented, stateless,
request-response protocol.
- In HTTP/1.0, each connection allows only one
round of request-response.
- HTTP is text-based: the requests and responses
are character strings.
- Each HTTP request and response is composed of
four parts: the request/response line, a header
section, a blank line, and the body.
117Summary - 3
- The Common Gateway Interface (CGI) protocol is a
protocol to augment HTTP in supporting run-time
generated web content.
- Using the protocol, a web client may specify an
external program, known as a CGI script, as the
target web object in an HTTP request.
- When requested, the web server fetches the CGI
script, activates it as a process, and passes to
the process input data transmitted by the web
client.
- The web script executes and transmits its output
to the web server, which returns the web-script
generated data as the body of a response to the
web client.
118Summary - 4
- A web form is a special kind of web page which
(i) provides a graphical user interface that
prompts a user for input data, and (ii) when a
submit button on the page is pressed by the user,
invokes the execution of an external program on
the web server host.
- The input data is gathered in a query string,
which is sent to a web script.
119Summary - 5
- To allow session data to be shared among the web
scripts invoked during a web session, there are a
number of mechanisms:
- Server-side facilities: files, databases, and
others.
- Client-side facilities: hidden form fields and
cookies.
- The use of hidden form fields and cookies raises
privacy and security concerns.