Title: Web Servers, Data Transmission and Exchange
1Web Servers, Data Transmission and Exchange
- Zachary G. Ives
- University of Pennsylvania
- CIS 455 / 555 Internet and Web Systems
- January 28, 2009
2Today
- Finish discussion of thread pools and Web servers
- Communications: sending data
- Physical vs. logical representation
- Encoding and management of heterogeneity
3HTTP Overview
- Requests
- A small number of request types (GET, POST, PUT, DELETE)
- Request may contain additional information, e.g. client info, parameters for forms, etc.
- Responses
- Response codes: 200 (OK), 404 (not found), etc.
- Metadata: content's MIME type, length, etc.
- The payload or data
4A Simple HTTP Request
- GET /cis455/index.html HTTP/1.1
  If-Modified-Since: Sun, 25 Jan 2009 11:12:23 GMT
  Referer: http://www.cis.upenn.edu/index.html
- Requests data at a path using HTTP 1.1 protocol
- Example response
- HTTP/1.1 200 OK
  Date: Wed, 28 Jan 2009 09:56:00 GMT
  Last-Modified: Sun, 25 Jan 2009 08:30:00 GMT
  Content-Type: text/html
  Content-Length: 3931
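To make the request format concrete, here is a minimal sketch of how a server might split a raw request into its request line and headers. The function name parse_request is hypothetical, and the sketch assumes a well-formed request with CRLF line endings; a real server would need far more error handling.

```python
def parse_request(raw: str):
    """Split a raw HTTP request into (method, path, version, headers)."""
    # The header section ends at the first blank line.
    head, _, _body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    # Request line: METHOD SP PATH SP VERSION
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split at the first colon only; values (dates, URLs) contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, path, version, headers


raw = (
    "GET /cis455/index.html HTTP/1.1\r\n"
    "If-Modified-Since: Sun, 25 Jan 2009 11:12:23 GMT\r\n"
    "Referer: http://www.cis.upenn.edu/index.html\r\n"
    "\r\n"
)
method, path, version, headers = parse_request(raw)
print(method, path, version)
print(headers["referer"])
```

Header names are lower-cased here because HTTP header names are case-insensitive.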
5Request Types
- GET
- Retrieve the resource at a URL
- PUT
- Publish the specified data at a URL
- DELETE
- (Self-explanatory)
- POST
- Submit form content
6The Thread Pool Request Handler Queue
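One way to picture the thread pool and its request queue is the sketch below: a fixed set of worker threads pulls requests from a shared queue. This is only an illustration under simple assumptions, not the design used in the course; handle, worker, and serve are hypothetical names, and handle is a stand-in for real request processing.

```python
import queue
import threading


def handle(request: str) -> str:
    # Stand-in for real request handling (hypothetical).
    return f"200 OK for {request}"


def worker(work_queue: queue.Queue, responses: list, lock: threading.Lock) -> None:
    while True:
        request = work_queue.get()      # blocks until a request is queued
        if request is None:             # sentinel: shut this worker down
            work_queue.task_done()
            break
        result = handle(request)
        with lock:                      # protect the shared result list
            responses.append(result)
        work_queue.task_done()


def serve(all_requests, num_workers: int = 4) -> list:
    work_queue = queue.Queue()
    responses, lock = [], threading.Lock()
    pool = [threading.Thread(target=worker, args=(work_queue, responses, lock))
            for _ in range(num_workers)]
    for t in pool:
        t.start()
    for r in all_requests:
        work_queue.put(r)
    for _ in pool:                      # one sentinel per worker
        work_queue.put(None)
    work_queue.join()
    for t in pool:
        t.join()
    return responses


results = serve([f"GET /page{i}" for i in range(10)])
print(len(results))
```

The number of threads stays fixed no matter how many requests arrive; extra requests simply wait in the queue.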
7Forms: Returning Data to the Server
- HTML forms allow assignments of values to variables
- Two means of submitting forms to apps
- GET-style: within the URL
- GET /home/my.cgi?param=val&param2=val2
- POST-style: as the data
- POST /home/second.cgi
- Content-Length: 34
- searchKey=Penn&where=www.google.com
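Both submission styles carry the same name=value encoding; only its location differs. A minimal sketch of decoding each on the server side, using Python's standard urllib.parse (get_params and post_params are hypothetical helper names):

```python
from urllib.parse import parse_qs, urlsplit


def get_params(request_target: str) -> dict:
    # GET-style: the parameters follow the "?" in the URL.
    return parse_qs(urlsplit(request_target).query)


def post_params(body: str) -> dict:
    # POST-style: the same encoding, but carried in the request body.
    return parse_qs(body)


print(get_params("/home/my.cgi?param=val&param2=val2"))
print(post_params("searchKey=Penn&where=www.google.com"))
```

parse_qs maps each name to a list of values, since a form may repeat a name.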
8Authentication and Authorization
- Authentication
- At minimum, user ID and password authenticates requestor
- Client may wish to authenticate the server, too!
- SSL (we'll discuss this more later)
- Part of SSL: certificate from trusted server, validating machine
- Also public key for encrypting client's transmissions
- Authorization
- Determine what user can access
- For files, applications: typically, access control list
- If data from database, may also have view-based security
9Programming Support in Web Servers
- CGI (Common Gateway Interface): the oldest
- A CGI is a separate program, often in Perl, invoked by the server
- Certain info is passed from server to CGI via Unix-style environment variables
- QUERY_STRING, REMOTE_HOST, CONTENT_TYPE, ...
- HTTP post data is read from stdin
- Interface to persistent process
- In essence, how communication with a database is done: Oracle or MySQL is running on the side
- Communicate via pipes, APIs like ODBC/JDBC, etc.
- Server module running in the same process
- Might be custom code (e.g., Apache extension) or an interpreter/runtime system
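The CGI contract (environment variables in, POST data on stdin, response on stdout) can be sketched as follows. This is a simplified illustration, not a production CGI program: run_cgi is a hypothetical name, the environment and streams are passed in so the sketch can be exercised without a web server, and CONTENT_LENGTH is a standard CGI variable that the list above does not mention.

```python
import io
import os
import sys
from urllib.parse import parse_qs


def run_cgi(environ=os.environ, stdin=sys.stdin, stdout=sys.stdout) -> None:
    # GET-style parameters arrive in an environment variable.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # POST data arrives on stdin; CONTENT_LENGTH says how much to read.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    post_data = stdin.read(length) if length else ""
    form = parse_qs(post_data)
    # The program writes headers, a blank line, then the payload.
    stdout.write("Content-Type: text/plain\r\n\r\n")
    stdout.write(f"host={environ.get('REMOTE_HOST', '')}\n")
    stdout.write(f"query={query}\n")
    stdout.write(f"form={form}\n")


# Simulate what a server would set up before invoking the program.
out = io.StringIO()
run_cgi(
    environ={"QUERY_STRING": "param=val",
             "REMOTE_HOST": "client.example.org",
             "CONTENT_LENGTH": "14"},
    stdin=io.StringIO("searchKey=Penn"),
    stdout=out,
)
print(out.getvalue())
```

Because the server starts a fresh process per request, nothing in this program survives between requests, which is one motivation for the persistent-process and in-process alternatives.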
10Server Modules
- Interpreters
- JavaScript/JScript, PHP, ASP, ...
- Often a full-fledged programming language
- Code is generally embedded within HTML, not stand-alone
- Custom runtimes/virtual machines
- Most modern Perl runtimes; Java servlets; ASP.NET
- A virtual machine runs within the web server process
- Functions are invoked within that JVM to handle each request
- Code is generally written as usual, but may need to use HTML to create UI rather than standard GUI APIs
- Most of these provide (at least limited) protection mechanisms
11Servlets
- An interesting model for programming applications in Java
- A servlet is a subclass of HttpServlet
- It overrides methods doGet() or doPost()
- It's given a number of objects: HttpServletRequest (includes info about parameters, browser, etc.), HttpServletResponse (a means for sending info back to the browser, including data, forwarding requests, etc.)
- There's a notion of a session that can be used to share state across doGet()/doPost() invocations; it's generally connected with a cookie
- Those of you who took CSE 330/CIS 550 should be generally familiar with servlets
- Those who didn't should be able to catch up by looking at, e.g., http://www.apl.jhu.edu/~hall/java/Servlet-Tutorial/
- http://www.novocode.com/doc/servlet-essentials/
- Your homework assignment will be to build a simple servlet engine a la Tomcat
12(Cross-)Session State: Cookies
- Major problem with sessionless nature of HTTP: how do we keep info between connections?
- Cookie: an opaque string associated with a web site, stored at the browser
- Create in HTTP response with "Set-Cookie: xxx"
- Passed in HTTP header as "Cookie: xxx"
- Interpretation is up to the application
- Usually, object-value pairs passed in HTTP header
- Cookie: user=Joe; pwd=blob
- Often have an expiration
- Very common: session cookies
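The cookie round trip can be sketched with Python's standard http.cookies module: the server emits a Set-Cookie header, and on a later request it parses the Cookie header the browser sends back. The one-hour lifetime is an arbitrary value chosen for illustration.

```python
from http.cookies import SimpleCookie

# Response side: build a Set-Cookie header with an expiration.
outgoing = SimpleCookie()
outgoing["user"] = "Joe"
outgoing["user"]["max-age"] = 3600   # lifetime in seconds (arbitrary choice)
set_cookie_header = outgoing.output()
print(set_cookie_header)

# Request side: parse the Cookie header sent back by the browser.
incoming = SimpleCookie()
incoming.load("user=Joe; pwd=blob")
values = {name: morsel.value for name, morsel in incoming.items()}
print(values)
```

The server sees only the name-value pairs; what they mean (a user name, a session ID) is up to the application.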
13Persistent State: Interfacing with a Database
- A very common operation
- Read some data from a database, output in a web form
- e.g., postings on Slashdot, items for a product catalog, etc.
- Three problems, abstracted away by ODBC/ADO/JDBC
- Impedance mismatch from relational DBs to objects in Java (etc.)
- Standard API for different databases
- Physical implementation for each DB
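As a rough analogue of the ODBC/JDBC pattern, the sketch below uses Python's DB-API with the built-in sqlite3 driver: rows come back as tuples (the relational side), and the program turns them into HTML. The catalog table, its contents, and the function name catalog_as_html are made up for illustration.

```python
import sqlite3
from html import escape

# An in-memory database standing in for Oracle/MySQL "running on the side".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE catalog (item TEXT, price REAL)")
conn.executemany("INSERT INTO catalog VALUES (?, ?)",
                 [("widget", 9.5), ("gadget <deluxe>", 20.0)])


def catalog_as_html(conn) -> str:
    # Each row is a tuple; the program maps it to markup.
    rows = conn.execute("SELECT item, price FROM catalog ORDER BY item")
    cells = "".join(
        f"<tr><td>{escape(item)}</td><td>{price:.2f}</td></tr>"
        for item, price in rows
    )
    return f"<table>{cells}</table>"


page = catalog_as_html(conn)
print(page)
```

Swapping in a different database would, in principle, change only the connect call, which is the "standard API" point above.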
14Going One Step Further
- Today, data doesn't just come from databases
- Web services, e.g., Amazon or corporate intranet services
- External entities like credit card companies, shippers
- Web pages
- Etc.
15Sending Data
- How do we send data within a program?
- What is the implicit model?
- How does this change when we need to make the data persistent?
- What happens when we are coupling systems?
- How do we send data between programs on the same machine?
- Between different machines?
16Marshalling
- Converting from an in-memory data structure to something that can be sent elsewhere
- Pointers -> something else
- Specific byte orderings
- Metadata
- Note that the same logical data gets a different physical encoding
- A specific case of Codd's idea of logical-physical separation
- Data model vs. data
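A small sketch of marshalling with Python's struct module shows the same logical record getting two different physical encodings, depending on byte order. The record and its wire layout (4-byte ID, 1-byte grade, 2-byte length prefix, then the string bytes) are assumptions made up for this illustration; the length prefix plays the role of "something else" that replaces the in-memory pointer to the string.

```python
import struct

record = {"sid": 23, "grade": "A", "course": "455-s04"}


def marshal(rec: dict, byte_order: str) -> bytes:
    course = rec["course"].encode("ascii")
    # Assumed layout: 4-byte unsigned sid, 1-byte grade,
    # 2-byte length prefix, then the course bytes inline.
    header = struct.pack(byte_order + "IcH", rec["sid"],
                         rec["grade"].encode("ascii"), len(course))
    return header + course


def unmarshal(data: bytes, byte_order: str) -> dict:
    fmt = byte_order + "IcH"
    sid, grade, n = struct.unpack_from(fmt, data)
    start = struct.calcsize(fmt)
    return {"sid": sid, "grade": grade.decode("ascii"),
            "course": data[start:start + n].decode("ascii")}


big = marshal(record, ">")      # big-endian ("network order")
little = marshal(record, "<")   # little-endian
print(big.hex())
print(little.hex())
```

Both byte strings decode to the same record, but only if the receiver knows which byte order the sender used; that agreement is part of the metadata.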
17Communication and Streams
- When storing data to disk, we have a combination of sequential and random access
- When sending data on the wire, data is only sequential
- Stream-based communication based on packets
- What are the implications here?
- Pipelining, incremental evaluation, ...
18Why Data Interchange Is Hard
- Need to be able to understand
- Data encoding (physical data model)
- May have syntactic heterogeneity
- Endian-ness, marshalling issues
- Impedance mismatches
- Data representation (logical data model)
- May have semantic heterogeneity
- Imprecise and ambiguous values/descriptions
19Examples
- MP3 ID3 format: record at end of file
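The record-at-end-of-file form is ID3v1: a fixed 128-byte block that starts with the bytes "TAG". The sketch below reads it from an in-memory byte string; read_id3v1 is a hypothetical helper, and the sample data is fabricated purely to exercise the parser.

```python
import struct


def read_id3v1(data: bytes):
    """Return the ID3v1 fields from the last 128 bytes, or None if absent."""
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return None
    # After "TAG": title(30) artist(30) album(30) year(4) comment(30) genre(1)
    title, artist, album, year, comment, genre = struct.unpack(
        "<30s30s30s4s30sB", data[-125:])

    def text(raw: bytes) -> str:
        # Fields are fixed width, padded with NULs (or sometimes spaces).
        return raw.split(b"\x00")[0].decode("latin-1").rstrip()

    return {"title": text(title), "artist": text(artist), "album": text(album),
            "year": text(year), "comment": text(comment), "genre": genre}


# Fabricated stand-in for an MP3 file: some bytes, then a 128-byte tag.
fake_mp3 = (b"\xff\xfb" + b"\x00" * 64
            + b"TAG"
            + b"Song".ljust(30, b"\x00")
            + b"Artist".ljust(30, b"\x00")
            + b"Album".ljust(30, b"\x00")
            + b"2009"
            + b"".ljust(30, b"\x00")
            + bytes([17]))
tag = read_id3v1(fake_mp3)
print(tag)
```

Nothing in the bytes themselves says what the fields mean; the layout has to come from documentation.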
20Examples
- JPEG: JFIF header
- Start of Image (SOI) marker -- two bytes (FFD8)
- JFIF marker (FFE0)
- length -- two bytes
- identifier -- five bytes: 4A, 46, 49, 46, 00 (the ASCII code equivalent of a zero-terminated "JFIF" string)
- version -- two bytes: often 01, 02
- the most significant byte is used for major revisions
- the least significant byte for minor revisions
- units -- one byte: units for the X and Y densities
- 0 => no units, X and Y specify the pixel aspect ratio
- 1 => X and Y are dots per inch
- 2 => X and Y are dots per cm
- Xdensity -- two bytes
- Ydensity -- two bytes
- Xthumbnail -- one byte: 0 = no thumbnail
- Ythumbnail -- one byte: 0 = no thumbnail
- (RGB)n -- 3n bytes: packed (24-bit) RGB values for the thumbnail pixels, n = Xthumbnail * Ythumbnail
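The field list above maps directly onto a struct format string. A minimal sketch, assuming big-endian multi-byte fields (as JPEG uses) and a hand-built 20-byte sample header rather than a real image file; parse_jfif is a hypothetical name:

```python
import struct

# SOI(2) marker(2) length(2) identifier(5) version(1+1) units(1)
# Xdensity(2) Ydensity(2) Xthumbnail(1) Ythumbnail(1), all big-endian.
JFIF_HEADER = struct.Struct(">HHH5sBBBHHBB")


def parse_jfif(data: bytes) -> dict:
    (soi, marker, length, ident, major, minor,
     units, xdensity, ydensity, xthumb, ythumb) = JFIF_HEADER.unpack_from(data)
    if soi != 0xFFD8 or marker != 0xFFE0 or ident != b"JFIF\x00":
        raise ValueError("not a JFIF header")
    # The thumbnail, if any, is 3 bytes (RGB) per pixel.
    start = JFIF_HEADER.size
    thumbnail = data[start:start + 3 * xthumb * ythumb]
    return {"length": length, "version": (major, minor), "units": units,
            "xdensity": xdensity, "ydensity": ydensity,
            "thumbnail_size": (xthumb, ythumb), "thumbnail": thumbnail}


# Hand-built sample: version 1.02, 72x72 dots per inch, no thumbnail.
sample = bytes.fromhex("ffd8 ffe0 0010 4a46494600 0102 01 0048 0048 00 00")
info = parse_jfif(sample)
print(info)
```

The format string is, in effect, the manual entry for the file format written as code.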
21Finding File Formats
- http://www.wikipedia.org/
- http://www.wotsit.org/
- etc.
22The Problem
- You need to look into a manual to find file formats
- (At best, e.g., MS .DOC file format)
- The Web is about making data exchange easier. Maybe we can do better!
- The mother of all file formats
23Desiderata for Data Interchange
- Ability to represent many kinds of information
- Different data structures
- Hardware-independent encoding
- Endian-ness, UTF vs. ASCII vs. EBCDIC
- Standard tools and interfaces
- Ability to define shape of expected data
- With forwards- and backwards-compatibility!
- That's XML
24Consumers of XML
- A myriad of tools and interfaces, including
- DOM: document object model
- Standard OO representation of an XML tree
- SAX: simple API for XML
- An event-driven parser interface for XML
- startElement, endElement, etc.
- Ant: Java-based "make" tool with XML "makefile"
- XPath, XQuery, XSL, XSLT
- Web service standards
- Anything AJAX (mash-ups)
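The difference between the DOM and SAX interfaces can be sketched with Python's standard library, which offers both (xml.dom.minidom and xml.sax). The small document here is a made-up bibliography fragment used only for illustration, and YearHandler is a hypothetical class name.

```python
import xml.sax
from xml.dom import minidom

DOC = """<dblp>
  <mastersthesis key="ms/Brown92"><author>Kurt P. Brown</author><year>1992</year></mastersthesis>
  <article key="tr/dec/SRC1997-018"><editor>Paul R. McJones</editor><year>1997</year></article>
</dblp>"""

# DOM: parse the whole document into a tree, then navigate it.
dom = minidom.parseString(DOC)
dom_years = [y.firstChild.data for y in dom.getElementsByTagName("year")]


# SAX: react to startElement/characters/endElement events as the parser streams.
class YearHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.years = []
        self._in_year = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == "year":
            self._in_year, self._buf = True, []

    def characters(self, content):
        if self._in_year:
            self._buf.append(content)   # text may arrive in several pieces

    def endElement(self, name):
        if name == "year":
            self.years.append("".join(self._buf))
            self._in_year = False


handler = YearHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
print(dom_years, handler.years)
```

DOM keeps the whole tree in memory and allows random access; SAX holds only what the handler chooses to remember, which suits sequential, stream-style processing.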
25XML as a Data Model
- XML information set includes 7 types of nodes
- Document (root)
- Element
- Attribute
- Processing instruction
- Text (content)
- Namespace
- Comment
- XML data model includes this, plus typing info, plus order info and a few other things
26Example XML Document
<?xml version="1.0" encoding="ISO-8859-1" ?>
<dblp>
  <mastersthesis mdate="2002-01-03" key="ms/Brown92">
    <author>Kurt P. Brown</author>
    <title>PRPL: A Database Workload Specification Language</title>
    <year>1992</year>
    <school>Univ. of Wisconsin-Madison</school>
  </mastersthesis>
  <article mdate="2002-01-03" key="tr/dec/SRC1997-018">
    <editor>Paul R. McJones</editor>
    <title>The 1995 SQL Reunion</title>
    <journal>Digital System Research Center Report</journal>
    <volume>SRC1997-018</volume>
    <year>1997</year>
    <ee>db/labs/dec/SRC1997-018.html</ee>
    <ee>http://www.mcjones.org/System_R/SQL_Reunion_95/</ee>
  </article>
(Slide callouts label the processing instruction, an open-tag, an element, an attribute, and a close-tag.)
27XML Data Model Visualized (Document Object Model)
[Figure: tree view of the example document. The root has a processing-instruction child (?xml) and the dblp element. dblp contains the mastersthesis and article elements, each with mdate and key attributes. mastersthesis has author, title, year, and school children; article has editor, title, year, journal, volume, and two ee children. The leaves are text nodes. Legend: root, p-i, element, attribute, text.]
28A Few Common Uses of XML
- Serves as an extensible HTML
- Allows custom tags (e.g., used by MS Word, OpenOffice)
- Supplement it with stylesheets (XSL) to define formatting
- Provides an exchange format for data (still need to agree on terminology)
- Tables, objects, etc.
- Format for marshalling and unmarshalling data in Web Services
29XML as a Super-HTML (MS Word)
<h1 class="Section1"><a name="_top" />CIS 550: Database and Information Systems</h1>
<h2 class="Section1">Fall 2003</h2>
<p class="MsoNormal">
  <place>311 Towne</place>, Tuesday/Thursday
  <time Hour="13" Minute="30">1:30PM - 3:00PM</time>
</p>
30XML Easily Encodes Relations
Student-course-grade
<student-course-grade>
  <tuple><sid>1</sid><course>330-f03</course><grade>B</grade></tuple>
  <tuple><sid>23</sid><course>455-s04</course><grade>A</grade></tuple>
</student-course-grade>
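The encoding of a relation as XML is mechanical, as this sketch with Python's standard xml.etree.ElementTree suggests. The helper names relation_to_xml and xml_to_relation are hypothetical; the rows are the two tuples from the slide.

```python
import xml.etree.ElementTree as ET

columns = ("sid", "course", "grade")
rows = [(1, "330-f03", "B"), (23, "455-s04", "A")]


def relation_to_xml(name: str, columns, rows) -> str:
    root = ET.Element(name)
    for row in rows:
        tup = ET.SubElement(root, "tuple")       # one element per tuple
        for col, value in zip(columns, row):
            ET.SubElement(tup, col).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_relation(text: str) -> list:
    root = ET.fromstring(text)
    return [tuple(field.text for field in tup) for tup in root.findall("tuple")]


doc = relation_to_xml("student-course-grade", columns, rows)
print(doc)
print(xml_to_relation(doc))
```

Note that the round trip returns every value as a string: the XML text alone does not say that sid was an integer.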
31It Also Encodes Objects (with Pointers Represented as IDs)
<projects>
  <project class="cse455">
    <type>Programming</type>
    <memberList>
      <teamMember>Joan</teamMember>
      <teamMember>Jill</teamMember>
    </memberList>
    <codeURL>www.</codeURL>
    <incorporatesProjectFrom class="cse330" />
  </project>
32XML and Code
- Web Services (.NET, Java web service toolkits) are using XML to pass parameters and make function calls: marshalling as part of remote procedure calls
- SOAP + WSDL
- Why?
- Easy to be forwards-compatible
- Easy to read over and validate (?)
- Generally firewall-compatible
- Drawbacks? XML is a verbose and inefficient encoding!
- But if the calls are only sending a few 100s of bytes, who cares?
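To see XML used as an RPC marshalling format, the sketch below uses XML-RPC from Python's standard library. XML-RPC is a simpler relative of SOAP, swapped in here only because it ships with Python; it is not the SOAP/WSDL stack itself. The method name lookupGrade and its arguments are made up.

```python
import xmlrpc.client

# Marshal a call to a hypothetical remote method into XML.
call = xmlrpc.client.dumps((23, "455-s04"), methodname="lookupGrade")
print(call)

# Unmarshal on the receiving side: recover the parameters and method name.
params, method = xmlrpc.client.loads(call)
print(method, params)
print(len(call), "characters of XML for two small arguments")
```

The printed message makes the verbosity trade-off visible: the markup is much larger than the two argument values it carries, yet still only a few hundred bytes.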
33XML When Tags Are Used by Different Sources
- Namespaces allow us to specify a context for different tags
- Two parts
- Binding of namespace to URI
- Qualified names
<tag xmlns:myns="http://www.fictitious.com/mypath" xmlns="http://www.default/mypath">
  <thistag>is in default namespace</thistag>
  <myns:thistag>is a different tag</myns:thistag>
</tag>
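A parser resolves each qualified name to a (namespace URI, local name) pair, so the two thistag elements really are different tags. A minimal sketch with Python's standard xml.etree.ElementTree, which writes the resolved name as {URI}local; the prefix "m" in the lookup is an arbitrary choice, since only the URI matters.

```python
import xml.etree.ElementTree as ET

DOC = """<tag xmlns:myns="http://www.fictitious.com/mypath"
     xmlns="http://www.default/mypath">
  <thistag>is in default namespace</thistag>
  <myns:thistag>is a different tag</myns:thistag>
</tag>"""

root = ET.fromstring(DOC)
names = [child.tag for child in root]
print(names)

# Look up by namespace URI; the prefix used in the query is arbitrary.
ns = {"m": "http://www.fictitious.com/mypath"}
found = root.find("m:thistag", ns)
print(found.text)
```

The prefix myns is just local shorthand inside the document; two documents can use different prefixes for the same namespace and still agree on the tag.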
34XML Isn't Enough on Its Own
- It's too unconstrained for many cases!
- How will we know when we're getting garbage?
- How will we query?
- How will we understand what we got?