Title: HTTP, HTML, and the Web
1HTTP, HTML, and the Web
2Basic Ideas of the Web
- Certain machines on the internet have direct access (usually on their disks) to a large number of documents of varying formats whose content others are interested in. Call these web server machines. Document formats are identified by file name suffixes such as file.doc or file.html.
- The goal is to allow users on remote computers across an internet to see these documents displayed according to their format.
- To accomplish this, a user program on the user's remote host becomes a client of a web server and must contact that server.
- The user program has to somehow identify which web server machine it wants to reach and which document on that machine it wants.
- Then it has to set up a connection to that machine, ask for the document, and wait until the server machine sends back a copy.
- Then the user's program must display the document according to its format.
- The user's application program that does all this is called a web browser.
3Web Browser-Server Model
Figure: a client machine (client browser displaying the web page) and a web server machine (web server application with disk storage, where the document resides), connected across the Internet (WWW). HTTP spoken here.
1. User on client machine inputs a request to see a document (web page) with a given URL.
2. Client requests the web page from the server (GET request).
3. Web server retrieves the web page from disk storage.
4. Server downloads the web page to the client as the body part of the reply to the GET.
5. Browser visually renders the document. User sees it!
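The five steps of the model can be walked through on a single machine. This is a minimal sketch, assuming Python's standard http.server and http.client modules; the in-memory DOCUMENTS table (standing in for disk storage), the path /hello.html, and the names DocumentHandler and fetch_once are hypothetical.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for the server's disk storage (hypothetical content).
DOCUMENTS = {"/hello.html": b"<HTML><BODY>Hello, Web</BODY></HTML>"}


class DocumentHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Step 3: the server looks up the requested document.
        body = DOCUMENTS.get(self.path)
        if body is None:
            self.send_error(404)
            return
        # Step 4: the document travels back as the body of the reply.
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def fetch_once(path):
    """Run a throwaway local server, GET one path, return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), DocumentHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Steps 1-2: the client names a document and sends a GET request.
        conn = HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", path)
        response = conn.getresponse()
        status, body = response.status, response.read()
        conn.close()
        return status, body
    finally:
        server.shutdown()
        server.server_close()


status, body = fetch_once("/hello.html")
# Step 5: a real browser would render the HTML; here it is only printed.
print(status, body.decode("ascii"))
```

Both ends run in one process here only to keep the sketch self-contained; in the model they sit on different machines.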
4Client/Server
- A Uniform Resource Locator (URL) uniquely identifies a <server-machine, document> pair, with the layout protocol://computer:port/document
- http://clem.mscd.edu/evansell/MAINPOINTS.html
- http://clem.mscd.edu:80/evansell/MAINPOINTS.html
- ftp://clem.mscd.edu:21/evansell/MAINPOINTS.html
- Given a URL, the client contacts the server listed as computer and sets up a TCP connection
- Requests the document by sending a message on that connection
- Server replies by sending back the document requested, then tears down the connection
- Transient connection
- Usually works well, as users tend to get one page and then browse elsewhere, or browse slowly
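A URL can be taken apart into its protocol, computer, port, and document pieces in a few lines. This is a small sketch, assuming Python's urllib.parse; the helper name locate and the DEFAULT_PORTS table are illustrative, not part of any standard.

```python
from urllib.parse import urlsplit

# Well-known default ports for the two schemes used in the examples.
DEFAULT_PORTS = {"http": 80, "ftp": 21}


def locate(url):
    """Split a URL into (protocol, computer, port, document)."""
    parts = urlsplit(url)
    # When the URL names no port, fall back to the protocol's default.
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port, parts.path


for url in (
    "http://clem.mscd.edu/evansell/MAINPOINTS.html",
    "http://clem.mscd.edu:80/evansell/MAINPOINTS.html",
    "ftp://clem.mscd.edu:21/evansell/MAINPOINTS.html",
):
    print(locate(url))
```

The first two URLs name the same server and document, because 80 is the default port for http.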
5Client/Server Continued
- If a document has embedded images (or other embedded objects), a new connection is made for each embedded entity
- Not the best
- Therefore some persistence has been added
- Can create persistent connections
- Negotiated at connect
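One way to see a persistent connection at work is to send several requests over a single connection and have the server note which client port each one arrived from. This is a sketch under assumptions: it uses HTTP/1.1, where connections persist by default, rather than an explicit negotiation, and the handler, the paths, and the client_ports list are hypothetical.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

client_ports = []  # source port of each request, as seen by the server


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 connections persist by default

    def do_GET(self):
        client_ports.append(self.client_address[1])
        body = b"item " + self.path.encode("ascii")
        self.send_response(200)
        # Content-Length tells the client where this reply ends, so the
        # connection can stay open for the next request.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass


def fetch_all(paths):
    """Fetch every path over one connection to a throwaway local server."""
    server = HTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    try:
        bodies = []
        for path in paths:
            conn.request("GET", path)
            bodies.append(conn.getresponse().read())
        return bodies
    finally:
        conn.close()  # close first so the server's handler can finish
        server.shutdown()
        server.server_close()


bodies = fetch_all(["/page.html", "/logo.gif"])
print(bodies, "connections used:", len(set(client_ports)))
```

Every request shows the same client port, meaning the page and its embedded item shared one TCP connection.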
6HTTP Protocol used from Browser to Web Server
- Set of messages and rules for messaging between web browsers (clients) and web servers: a protocol
- HyperText Transfer Protocol (HTTP) (RFC 2616). Text-based protocol, easy to read in the clear
- Just a few simple messages
- GET item version CRLF - Request to read a general object
- HEAD (status) - A request to return the response header only, without the content. This can convey much useful information about the requested entity without the need to actually load it, e.g. how big it is.
- POST - Originally defined as a request to "append to a named resource" (e.g. a Web page), this method is extensively used in CGI-based systems
- PUT - Request to store an object (e.g. Web page, image, etc). In the early Web it was used mostly experimentally.
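Because the protocol is plain text, a request message can be assembled with ordinary string handling. This is a minimal sketch: build_request is a hypothetical helper, /cgi-bin/form is a made-up path, and only the request line, optional headers, and body are modeled.

```python
def build_request(method, item, version="HTTP/1.0", headers=None, body=b""):
    """Return the bytes of one HTTP request message."""
    lines = [f"{method} {item} {version}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body)}")
    # Every line ends in CRLF; an empty line marks the end of the headers.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


print(build_request("GET", "/index.html"))
print(build_request("HEAD", "/index.html"))
print(build_request("POST", "/cgi-bin/form", body=b"name=value"))
```

GET and HEAD differ only in the method word; it is the server that leaves the content out of a HEAD reply.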
7HTTP Get
- GET /index.html HTTP/1.0<newline><newline>
- The response from the server consists of a status line, then a number of plain text headers, followed by a blank line and then the requested data object.
- Request: GET /index.html HTTP/1.0
- Response:
- HTTP/1.0 200 OK
- Server: Netscape-Enterprise/3.5.1C
- Date: Sun, 16 Mar 2003 11:48:39 GMT
- Content-type: text/html
- Last-modified: Fri, 14 Mar 2003 02:22:52 GMT
- Content-length: 11378
- <!doctype html public "-//w3c//dtd html 4.0 transitional//en">
- <html>
- <head>
- ........(etc: the actual text of the document you asked to get)
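The status line, headers, blank line, and data object can be separated with a few string operations. This is a sketch, not a full HTTP parser: parse_response is a hypothetical helper, and the sample reply is a shortened stand-in whose Content-length has been adjusted to match its tiny body.

```python
def parse_response(raw):
    """Split a raw HTTP response into (version, status, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return version, int(status), reason, headers, body


# Shortened sample reply (assumed for illustration).
raw = (
    b"HTTP/1.0 200 OK\r\n"
    b"Server: Netscape-Enterprise/3.5.1C\r\n"
    b"Date: Sun, 16 Mar 2003 11:48:39 GMT\r\n"
    b"Content-type: text/html\r\n"
    b"Content-length: 11\r\n"
    b"\r\n"
    b"<html>\n...\n"
)

version, status, reason, headers, body = parse_response(raw)
print(version, status, reason)
print(headers["content-type"], len(body))
```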
8Server Response
9Hypertext Markup Language (HTML)
- What is hypermedia and hypertext? What is a hyperlink?
- Need a standard hypertext document format
- HTML is that standard: it is a markup language that uses tags to specify document content and layout
- Child of SGML, which is a meta-language that HTML is defined in
- <tag> </tag>
10Details on HTML
- HTML is a markup language -- documents are (in general) plain ASCII text files, with certain characters reserved to denote markup. Such languages have a long and venerable history in computing (e.g. starting with roff, TeX, LaTeX, SGML and subsequently XML).
- The structure (or, to a somewhat lesser extent, the displayed appearance) of an HTML document (or Web page) is described using embedded formatting codes (or tags) intermingled with the information in the document.
- In HTML, the markup tags are delimited by the special characters "<" and ">" -- the "less than" and "greater than" characters, often (rather clumsily IMHO) called "angle brackets". If either of these characters must appear as part of the actual data, they are written as &lt; and &gt; respectively.
- HTML introduced a uniform, and revolutionary, way of specifying hyperlinks in a document, using the <A HREF="...some URL...">link text</A> structure.
- Modern HTML standards have evolved to support incredibly complex document layouts (using the <TABLE> markup, style sheets, client-side scripting, etc), seamlessly mingling text and graphics into what has become an entirely new form of media.
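The rule for writing the reserved characters as data, and the hyperlink structure, can both be shown with Python's html.escape. This is a small sketch; make_link is a hypothetical helper.

```python
from html import escape


def make_link(url, text):
    """Build an <A HREF="...">...</A> hyperlink with both parts escaped."""
    return '<A HREF="{}">{}</A>'.format(escape(url, quote=True), escape(text))


# "<" and ">" in ordinary data become &lt; and &gt;
print(escape("if a < b and b > c"))
print(make_link("http://clem.mscd.edu/evansell", "Metro State"))
```

Escaping the link text matters because a stray "<" in it would otherwise be read as the start of a tag.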
11Simple HTML Example
<HTML>
<HEAD> <TITLE> Elliott's Web Page </TITLE> </HEAD>
<BODY BGCOLOR="bb00bb" LINK="ffff00" ALINK="ff0000" VLINK="00ff00">
<H1> Elliott on the Web </H1>
<UL>
<LI> See my web page at <A HREF="http://clem.mscd.edu/evansell"> Metro State </A> </LI>
<LI> My personal home page link will be added here - later </LI>
<LI> Interests: Jai-lai, outdated technology, older movies </LI>
</UL>
<HR>
Email me at <A HREF="mailto:evansell@mscd.edu"> evansell@mscd.edu </A>
</BODY>
</HTML>
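A program reads a page like this one by walking its tags. The sketch below, assuming Python's html.parser module, pulls out the title and the hyperlinks; the LinkCollector class is hypothetical and the page string is a shortened version of the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the page title and every (href, link text) pair."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names.
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
            self.title = self.title.strip()
        elif tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


page = """<HTML> <HEAD> <TITLE> Elliott's Web Page </TITLE> </HEAD>
<BODY> <H1> Elliott on the Web </H1>
<UL> <LI> See my web page at <A HREF="http://clem.mscd.edu/evansell"> Metro State </A> </LI> </UL>
</BODY> </HTML>"""

collector = LinkCollector()
collector.feed(page)
print(collector.title, collector.links)
```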
12Web Browsers and Server
- Server application is fairly simple: it just receives individual requests on temporary connections, responds, and tears down the connection
- Usually multi-threaded
- HTTP server on port 80
- Web client browser is much more complex
- Must be able to talk various protocols to get documents from web servers: http, ftp, etc.
- Must be able to format and render documents, at least HTML if not other types
- Must do caching to save time
- Must do other things, like handle embedded documents or support bookmarks (favorites)
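The server side can be sketched in a few dozen lines: one thread per connection, one request per connection, then teardown. This is an illustration assuming Python's socketserver module, not a real web server; it echoes the request line instead of reading a document, and the names OneShotHandler, ThreadedServer, ask, and demo are hypothetical.

```python
import socket
import socketserver
import threading


class OneShotHandler(socketserver.StreamRequestHandler):
    """Serve one request per connection, then let the connection close."""

    def handle(self):
        request_line = self.rfile.readline().decode("ascii", "replace").strip()
        # Skip the request headers, up to the blank line that ends them.
        while self.rfile.readline() not in (b"\r\n", b"\n", b""):
            pass
        body = ("You asked for: " + request_line + "\n").encode("ascii", "replace")
        self.wfile.write(
            b"HTTP/1.0 200 OK\r\n"
            b"Content-Type: text/plain\r\n"
            b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
            b"\r\n" + body
        )
        # Returning from handle() tears the connection down.


class ThreadedServer(socketserver.ThreadingTCPServer):
    daemon_threads = True  # each connection gets its own thread
    allow_reuse_address = True


def ask(port, request):
    """Open a connection, send one request, read until the server closes."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


def demo(request):
    # A real web server listens on port 80; port 0 asks for any free port.
    server = ThreadedServer(("127.0.0.1", 0), OneShotHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return ask(server.server_address[1], request)
    finally:
        server.shutdown()
        server.server_close()


print(demo(b"GET /index.html HTTP/1.0\r\n\r\n").decode("ascii"))
```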
14Cache
- Browsers cache documents for faster repeat access
- Save a copy on the local disk
- Faster than going through network
- Files kept for a while
- Under user control
- Client and server interact for document refreshing
- Server can specify how long a document is valid for
- From always reload to never reload
- Client can check with server on date of document and reload only if newer
- Browser uses a HEAD message to get timeliness info
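The two cache decisions, whether a copy is still within its validity period and whether the server's copy is newer, come down to date comparisons. This is a minimal sketch; the CachedCopy class, the one-hour validity period, and the second date are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


class CachedCopy:
    """One cached document plus the facts needed to judge its freshness."""

    def __init__(self, body, last_modified, fetched_at, valid_for):
        self.body = body
        self.last_modified = parsedate_to_datetime(last_modified)
        self.fetched_at = fetched_at
        self.valid_for = valid_for  # timedelta granted by the server

    def is_fresh(self, now):
        # Inside the validity period the copy is used without asking.
        return now - self.fetched_at < self.valid_for

    def needs_reload(self, server_last_modified):
        # Reload only if the server's copy is newer than the cached one.
        return parsedate_to_datetime(server_last_modified) > self.last_modified


fetched = datetime(2003, 3, 16, 11, 48, 39, tzinfo=timezone.utc)
cached = CachedCopy(
    b"<html>...</html>",
    "Fri, 14 Mar 2003 02:22:52 GMT",  # Last-modified value kept with the copy
    fetched,
    timedelta(hours=1),  # assumed validity period
)

print(cached.is_fresh(fetched + timedelta(minutes=10)))
print(cached.needs_reload("Fri, 14 Mar 2003 02:22:52 GMT"))
print(cached.needs_reload("Mon, 17 Mar 2003 09:00:00 GMT"))
```

A validity period of zero behaves like "always reload", and a very long one like "never reload". The date passed to needs_reload would come from the Last-modified header of a HEAD reply.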
15Security
- https
- Uses public-key cryptography to set up a secure connection
- Using a 128-bit session key
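An https exchange can be sketched with Python's ssl module: the same text request travels inside an encrypted connection. This is an assumption-laden illustration; https_head is a hypothetical helper, it is defined but not called here, and the key length actually used depends on what client and server negotiate.

```python
import socket
import ssl

# Default settings verify the server's certificate and host name.
context = ssl.create_default_context()


def https_head(host, path="/", port=443):
    """Send a HEAD request over TLS and return (cipher info, raw reply)."""
    with socket.create_connection((host, port), timeout=10) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            request = f"HEAD {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
            tls_sock.sendall(request.encode("ascii"))
            reply = b""
            while True:
                chunk = tls_sock.recv(4096)
                if not chunk:
                    break
                reply += chunk
            # cipher() returns (cipher name, protocol version, secret bits).
            return tls_sock.cipher(), reply


print(context.verify_mode, context.check_hostname)
```

Calling https_head with the name of a reachable https server returns the negotiated cipher, whose last field is the number of secret bits in the session key.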
16Emerging Trends
- Browsing from other devices: memory- and display-limited devices
- Wide usage of XML
- eXtensible Markup Language
- Specifies structure and meaning of data
- Does not specify formatting; it is really about the content of data
- Need style sheets to format for visual rendering: XSL, XSLT
- Wireless Internet
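The point that XML carries structure and meaning rather than formatting shows up as soon as a program reads it. This is a small sketch using Python's xml.etree.ElementTree; the person record and its tag names are hypothetical.

```python
import xml.etree.ElementTree as ET

# A hypothetical record: the tags name the data, not its appearance.
document = """<person>
  <name>Elliott</name>
  <interest>outdated technology</interest>
  <interest>older movies</interest>
</person>"""

root = ET.fromstring(document)
name = root.findtext("name")
interests = [node.text for node in root.findall("interest")]
print(name, interests)
```

Nothing in the record says how to display it; that is the job left to a style sheet.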
17Telnet HTTP Session
telnet clem.mscd.edu 80
Trying 147.153.1.3...
Connected to clem.mscd.edu.
Escape character is '^]'.
GET /evansell/NETWORKS/simplehtml.html HTTP/1.0

HTTP/1.1 200 OK
Server: Netscape-Enterprise/6.0
Date: Mon, 05 May 2003 22:13:34 GMT
Content-type: text/html
Last-modified: Mon, 28 Apr 2003 22:07:29 GMT
Content-length: 483
Accept-ranges: bytes
Connection: close

<HTML>
<HEAD> <TITLE> Elliott's Web Page </TITLE> </HEAD>
<BODY BGCOLOR="bb00bb" LINK="ffff00" ALINK="ff0000" VLINK="00ff00">
<H1> Elliott on the Web </H1>
<UL>
<LI> See my web page at <A HREF="http://clem.mscd.edu/evansell"> Metro State </A> </LI>
<LI> My personal home page link will be added here - later </LI>
<LI> Interests: Jai-lai, outdated technology, older movies </LI>
</UL>
<HR>
Email me at <A HREF="mailto:evansell@mscd.edu"> evansell@mscd.edu </A>
</BODY>
</HTML>
Connection closed by foreign host.
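What the telnet session does by hand can also be done by a short program. This is a sketch assuming Python's socket module; telnet_style_get is a hypothetical helper, and it defines the function without contacting any server.

```python
import socket


def telnet_style_get(host, port, path):
    """Send one GET by raw socket and return (header text, body bytes)."""
    request = "GET {} HTTP/1.0\r\n\r\n".format(path).encode("ascii")
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the "Connection closed by foreign host." moment
                break
            chunks.append(chunk)
    reply = b"".join(chunks)
    # The blank line separates the status line and headers from the document.
    head, _, body = reply.partition(b"\r\n\r\n")
    return head.decode("iso-8859-1"), body
```

A call such as telnet_style_get("clem.mscd.edu", 80, "/evansell/NETWORKS/simplehtml.html") would repeat the session, provided that server and document are still reachable.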
18Exercises
- Chapter 35: 3, 7, 13, 14
- Make sure you telnet to a web server and ask for a simple HTML document