Title: It started with connectivity
1. It started with connectivity
- A short history of the web and the basics of how it works
- CS387, Jan 2004
2. Early Days: birth of core concepts
- 1945: the idea of linking together microfiche appears in an Atlantic Monthly article by Vannevar Bush.
- The article detailed a photo-electrical-mechanical device called Memex
- 1960s: a prototype of a computer system for browsing hypertext, editing, and emailing is developed by Doug Engelbart.
- The system was called NLS, for oNLine System
- 1960s: the term "hypertext" is coined by Ted Nelson at the ACM 20th National Conference
3. Enter Tim Berners-Lee
- 1980: While at CERN, Berners-Lee writes a notebook program to link arbitrary nodes. Each node has a type, a title, and a list of bidirectional typed links.
- 1989: Berners-Lee makes a proposal on information management at CERN
- 1990: Berners-Lee's boss approves the purchase of a NeXT cube
- Development on a global hypertext system begins.
4. WWW: World Wide Web
- 1990
- Berners-Lee begins work on a hypertext GUI browser/editor and dubs it WorldWideWeb.
- First web server developed at this time
- Berners-Lee's information management proposal from 1989 is reformulated
- Development on the NeXT box continues
- Line-mode browser and WorldWideWeb reach a demonstrable state near end of year
6. Time to show the world
- 1991, May 17: the general release of WWW on central CERN machines
- 1991, June: CERN hosts seminar on WWW
- 1991, Aug: files available by FTP
- 1992: more browsers, Viola and Erwise, released
- 1993: roughly 50 known web servers exist
- 1993: alpha release of Mosaic for X
- Marc Andreessen and Eric Bina offer up the release while at NCSA and continue refining Mosaic.
7. From Mosaic to Mozilla
- Berners-Lee on Marc, Eric, and Mosaic:
- They made their browser easy to install
- They had outstanding customer support
- They were the first to get inline images working. Before, pictures were displayed in separate windows
- This made web pages much sexier
- 1994: over 200 web servers by start of year
- 1994: Andreessen and colleagues leave NCSA to form Mosaic Communications Corp., which later became Netscape.
- Side note: Jamie Zawinski coins the name Mozilla
8. Building Upon Layers: a simple view of how it all works
- IP is how we get things from Computer A to Computer B (network layer)
- TCP is how we reliably get things from Computer A to Computer B (transport layer)
- HTTP is how we communicate what things we want from web servers (application layer)
9. Request / Response: how browsers and web servers communicate
- Web servers, at the core, are simple
- They speak HTTP: all requests made from browsers are done in this protocol
- Regardless of web server implementation, there is a set of common tasks they perform.
- Connection set up: accept client connection, or close if client is unwanted
- Receive request: read an HTTP request message from the network
- Process request: interpret request message and take appropriate action
- Access resource(s): access the resource specified in message
- Construct response: create the HTTP response message with correct headers
- Send response: send the response back to the client
- Log transaction: place notes about completed transaction in log file
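The seven tasks above can be sketched in Python as a server that handles a single connection. This is a minimal illustration under stated assumptions, not a production server: the names `serve_once`, `fetch`, and `demo`, the in-memory `DOCS` table, and the log format are all invented for the example, and anything other than a GET for a known path is answered with 404 for brevity.

```python
import socket
import threading

# Hypothetical in-memory "document tree" standing in for files on disk.
DOCS = {"/hello.txt": b"Hello, Welcome to the Txt-Web!\n"}

def serve_once(listener, log):
    """Handle exactly one HTTP request on an already-listening socket."""
    conn, addr = listener.accept()                     # 1. connection set up
    with conn:
        data = b""
        while b"\r\n\r\n" not in data:                 # 2. receive request
            chunk = conn.recv(1024)
            if not chunk:
                break
            data += chunk
        request_line = data.split(b"\r\n", 1)[0].decode("ascii", "replace")
        parts = request_line.split()                   # 3. process request
        method, path = (parts + ["", ""])[:2]
        body = DOCS.get(path) if method == "GET" else None  # 4. access resource
        if body is None:
            status, body = "404 Not Found", b"Not found\n"
        else:
            status = "200 OK"
        head = (f"HTTP/1.0 {status}\r\n"               # 5. construct response
                "Content-Type: text/plain\r\n"
                f"Content-Length: {len(body)}\r\n"
                "Connection: close\r\n\r\n")
        conn.sendall(head.encode("ascii") + body)      # 6. send response
    log.append(f'{addr[0]} "{request_line}" {status}')  # 7. log transaction

def fetch(port, path):
    """Tiny client: send one GET over loopback TCP and read until close."""
    with socket.create_connection(("127.0.0.1", port)) as s:
        s.sendall(f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii"))
        chunks = []
        while True:
            chunk = s.recv(1024)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)

def demo(path="/hello.txt"):
    """Run one request/response exchange and return (reply, log)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    log = []
    worker = threading.Thread(target=serve_once, args=(listener, log))
    worker.start()
    reply = fetch(port, path)
    worker.join()
    listener.close()
    return reply, log
```

Calling `demo()` returns the raw bytes the client received together with the one-line log entry, which makes each layer visible: TCP carries the bytes, HTTP gives them meaning.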
10.
#!/usr/bin/perl
use Socket; use Carp; use FileHandle;
$port = (@ARGV ? $ARGV[0] : 8080);   # (1) use port 8080 by default, unless overridden on command line
$proto = getprotobyname('tcp');      # (2) create local TCP socket and set it to listen for connections
socket(S, PF_INET, SOCK_STREAM, $proto) || die;
setsockopt(S, SOL_SOCKET, SO_REUSEADDR, pack("l", 1)) || die;
bind(S, sockaddr_in($port, INADDR_ANY)) || die;
listen(S, SOMAXCONN) || die;
printf(" %d\n\n", $port);            # (3) print a startup message
while (1) {
    $cport_caddr = accept(C, S);     # (4) wait for a connection C
    ($cport, $caddr) = sockaddr_in($cport_caddr);
    C->autoflush(1);
    $cname = gethostbyaddr($caddr, AF_INET);   # (5) print who the connection is from
    printf("%s\n", $cname);
    close(C);                        # not on the slide: close the connection so the loop body is complete
}
11. Ask and Ye Shall Receive
- A request message always includes a start line and header information (and optionally a body)
- GET /foo/bar/baz.txt HTTP/1.0
- Accept: */*
- Accept-language: en-us
- Accept-encoding: gzip, deflate
- User-agent: Mozilla/5.0
- Host: www.six-fifty.com:8080
- Connection: Keep-alive
- A response message always includes a start line, header info, and usually contains a body
- HTTP/1.1 200 OK
- Connection: Close
- Content-type: text/plain
- Hello, Welcome to the Txt-Web!
- The body is the payload of the message and can be any MIME type
12. HTTP Request
- The first line of an HTTP request looks like this:
- GET /index.html HTTP/1.0
- Format:
- HTTP method
- Document address
- HTTP protocol
- Optional header information can also be sent
- User-Agent: Mozilla/5.0 (Windows 2000; U) Opera 6.0 [en]
- Accept: image/gif, image/jpeg, text/*, */*
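The request format described above (method, document address, protocol, then optional headers) can be pulled apart with a few lines of Python. This is a sketch only: `parse_request` is a hypothetical helper, it assumes well-formed input with CRLF line endings, and it ignores any message body.

```python
def parse_request(message):
    """Split an HTTP request into (method, address, protocol, headers)."""
    head = message.split("\r\n\r\n", 1)[0]      # drop the body, if any
    lines = head.split("\r\n")
    method, address, protocol = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return method, address, protocol, headers

raw = ("GET /index.html HTTP/1.0\r\n"
       "User-Agent: Mozilla/5.0\r\n"
       "Accept: image/gif, image/jpeg, text/*, */*\r\n"
       "\r\n")
method, address, protocol, headers = parse_request(raw)
print(method, address, protocol)
print(headers["accept"])
```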
13. HTTP Response
- The first line of an HTTP response looks like this:
- HTTP/1.1 200 OK
- Format: protocol, status code, description
- Following this are additional informational headers
- Date: Mon, 19 Jan 2004 19:49:41 GMT
- Server: Apache/1.3.29 (Unix) mod_ssl/2.8.16 OpenSSL/0.9.7c
- Last-Modified: Tue, 02 Sep 2003 23:32:29 GMT
- ETag: "18c586-d62-3f55288d"
- Accept-Ranges: bytes
- Content-Length: 3426
- Connection: close
- Content-Type: text/html
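A short Python sketch of the response layout described above: a status line of protocol, status code, and description, followed by headers, a blank line, and the body. The helper names `parse_status_line` and `build_response` are assumptions made for this example.

```python
def parse_status_line(line):
    """Split 'HTTP/1.1 200 OK' into (protocol, status code, description)."""
    protocol, code, description = line.split(" ", 2)
    return protocol, int(code), description

def build_response(code, description, headers, body=b"", protocol="HTTP/1.1"):
    """Assemble status line, headers, blank line, and body into raw bytes."""
    headers = {**headers, "Content-Length": str(len(body))}
    lines = [f"{protocol} {code} {description}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii") + body

raw = build_response(200, "OK", {"Content-Type": "text/plain"}, b"Hello\n")
print(raw.decode("ascii"))
```

Computing `Content-Length` from the body, rather than passing it in, keeps the header and the payload from drifting apart.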
14. HTTP Methods
- Two most common methods
- GET
- Designed for information retrieval: getting documents, images, results of a database query, etc.
- POST
- Designed for posting information to the server, such as credit card info, forum messages, or info to be stored in a database.
- Other methods: DELETE, PUT, TRACE, OPTIONS, and HEAD
- OPTIONS reports which methods are supported; not all 7 must be supported
- DELETE and PUT are not supported by most servers
- HEAD returns only the headers from a response
- TRACE traces the message through proxy servers to the server
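To make the difference between GET, HEAD, and OPTIONS concrete, here is a small Python sketch of how a server might choose its reply for one resource. The `SUPPORTED` tuple and the `respond` function are hypothetical, and real servers differ in which methods they enable.

```python
# Hypothetical server that supports only three of the seven methods.
SUPPORTED = ("GET", "HEAD", "OPTIONS")

def respond(method, body=b"<html>hi</html>"):
    """Return (status code, headers, body) for a single known resource."""
    allow = {"Allow": ", ".join(SUPPORTED)}
    if method == "OPTIONS":
        return 200, allow, b""            # report which methods are supported
    if method not in SUPPORTED:
        return 405, allow, b""            # 405 Method Not Allowed
    headers = {"Content-Type": "text/html", "Content-Length": str(len(body))}
    if method == "HEAD":
        return 200, headers, b""          # same headers as GET, but no body
    return 200, headers, body

print(respond("HEAD"))
```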
15. MIME types
- MIME (Multipurpose Internet Mail Extensions), as the name suggests, was originally designed to solve problems with moving email messages between systems.
- After its great success in solving email issues, MIME was adopted by HTTP to describe and label multimedia content.
- Web servers attach a MIME type to all HTTP object data.
- HEAD /people/lenards/andy_title.gif HTTP/1.0
- HTTP/1.1 200 OK
- Date: Mon, 19 Jan 2004 22:30:57 GMT
- Server: Apache/1.3.29 (Unix) mod_ssl/2.8.16 OpenSSL/0.9.7c
- Last-Modified: Mon, 26 Nov 2001 23:52:49 GMT
- ETag: "18c563-930a-3c02d5d1"
- Accept-Ranges: bytes
- Content-Length: 37642
- Connection: close
- Content-Type: image/gif
16. MIME types (cont.)
- A MIME type is a textual label, represented as a primary object type and a specific subtype, separated by a slash.
- Examples:
- An HTML document would be labeled text/html
- An XML document would be labeled text/xml
- A plain ASCII text document would be labeled text/plain
- A GIF-format image would be labeled image/gif
- A PNG-format image would be labeled image/png
- A Microsoft PowerPoint presentation would be application/vnd.ms-powerpoint
- Complete list is available at
- http://www.isi.edu/in-notes/iana/assignments/media-types/
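As an illustration of how a server might attach these labels, the Python sketch below maps file extensions to the MIME types listed above and splits a label into its primary type and subtype. The table and both function names are assumptions made for this example; Python's standard `mimetypes` module provides a much fuller mapping.

```python
import os

# Hypothetical extension table built from the examples on this slide.
MIME_BY_EXTENSION = {
    ".html": "text/html",
    ".xml": "text/xml",
    ".txt": "text/plain",
    ".gif": "image/gif",
    ".png": "image/png",
    ".ppt": "application/vnd.ms-powerpoint",
}

def mime_for(filename, default="application/octet-stream"):
    """Pick a MIME type from the file extension, with a generic fallback."""
    extension = os.path.splitext(filename)[1].lower()
    return MIME_BY_EXTENSION.get(extension, default)

def split_mime(mime_type):
    """Split 'image/gif' into its primary type and subtype."""
    primary, _, subtype = mime_type.partition("/")
    return primary, subtype

print(mime_for("andy_title.gif"), split_mime(mime_for("andy_title.gif")))
```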
17. The Web is Stateless
- HTTP is a stateless protocol
- Once a web server completes a client's request, the connection is severed.
- Connection to raptor.cs.arizona.edu closed by foreign host.
- So, there is no way to recognize a sequence of requests originating from a given client.
- As programmers, we like state.
- How do we get around the inherent nature of the protocol we'll be building applications with?
18. Clever Workarounds
- Techniques
- Viewstate: using hidden form fields to pass around data from page to page.
- URL rewriting (or munging): modifying every link to contain an extra piece of data in the query string to identify the user.
- Cookies: information handed to the client that can be passed back to the server upon request. Implemented as a simple string with several fields
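The three techniques above can each be shown in a few lines of Python using only the standard library. This is a sketch under assumptions: the parameter name `sid` and all four helper names are invented for the example, and a real application would also need to generate and validate the identifier.

```python
import html
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def hidden_field(name, value):
    """Hidden form field: carry a value to the next page inside a form."""
    return (f'<input type="hidden" name="{html.escape(name)}" '
            f'value="{html.escape(value)}">')

def rewrite_url(url, session_id, param="sid"):
    """URL rewriting: put a session identifier in the link's query string."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]                      # replace any stale identifier
    query.append((param, session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))

def set_cookie_header(name, value, path="/"):
    """Cookie: the string a server would send in a Set-Cookie header."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = path
    return cookie[name].OutputString()

def read_cookie(cookie_header, name):
    """Read one value back from the Cookie header the client returns."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(name)
    return morsel.value if morsel is not None else None

print(rewrite_url("/cart?item=42", "abc123"))
print(hidden_field("sid", "abc123"))
print(set_cookie_header("sid", "abc123"))
```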
19. Mmm... Cookies
- Below is a cookie from blackbeard.eastnet.ecu.edu
- EGSOFT_ID
- 216.39.182.210-4289295232.29602783
- blackbeard.eastnet.ecu.edu/
- 1024
- 2865430528
- 30124157
- 133967936
- 29602784
- *
- For Microsoft Internet Explorer, cookies appear in the user's Cookies directory under Documents and Settings
- Files are named with the following scheme:
- username@website-url[1].txt
20. Never fear
- Maintaining state
- Session tracking
- Session management
- etc.
- However you put it, each web technology has elegant ways of helping overcome the stateless nature of HTTP
- PHP, ASP.NET, Cold Fusion, and JSP all have ways of managing session data
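Under the hood, session management in these technologies generally comes down to the same idea: hand the client an opaque identifier and keep the real data on the server, keyed by that identifier. The Python sketch below illustrates the idea only; the `SessionStore` class is hypothetical and is not how any of the products named above is implemented.

```python
import secrets

class SessionStore:
    """In-memory session table: session id -> dictionary of per-user data."""

    def __init__(self):
        self._sessions = {}

    def open(self, session_id=None):
        """Return (session_id, data); unknown or missing ids get a new session."""
        if session_id not in self._sessions:
            session_id = secrets.token_hex(16)   # hard-to-guess identifier
            self._sessions[session_id] = {}
        return session_id, self._sessions[session_id]

store = SessionStore()
sid, data = store.open()           # first request: no identifier yet
data["cart"] = ["book"]
same_sid, same_data = store.open(sid)   # later request presents the identifier
print(same_data["cart"])
```

The identifier itself would travel between requests by one of the earlier techniques: a cookie, a hidden form field, or a rewritten URL.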
21. Web Programming Considerations
- Programming model is loosely coupled
- Lots can go wrong
- Hardware issues: servers crash, etc.
- Connection issues: copper pirates steal your ethernet cables
- Lack of robust user interaction
- Server round-trips
- Connection speeds vary based on user
- Browser issues
- Mozilla, Opera, Safari, Galeon, Lynx, Netscape, IE
- Each browser has its quirks
22. Enterprise
- A word with many meanings
- For this course, an Enterprise application is:
- an application that solves a business problem (or set of problems) on a large scale.
23. What is an Enterprise Website?
- A robust website built with extensibility in mind to solve a business problem (or set of problems) on a large scale.
- Extensible design: n-tier architectural approach
- For this course, we will assume a 3-tier approach
- Presentation
- Business Logic
- Data Services
- Often tiers are divided differently, but to simplify matters we will stick with this decomposition.
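The 3-tier decomposition can be sketched in Python with one small piece per tier. Everything here is hypothetical and chosen only for illustration: the product table, the bulk-discount rule, and the names `ProductData`, `Pricing`, and `render_quote`.

```python
class ProductData:
    """Data services tier: only stores and retrieves data."""

    def __init__(self):
        self._rows = {1: {"name": "Widget", "price_cents": 250}}

    def get(self, product_id):
        return self._rows.get(product_id)

class Pricing:
    """Business logic tier: applies business rules to data."""

    def __init__(self, data):
        self._data = data

    def quote(self, product_id, quantity):
        row = self._data.get(product_id)
        if row is None:
            raise KeyError(product_id)
        total = row["price_cents"] * quantity
        if quantity >= 10:                 # hypothetical rule: 10% bulk discount
            total = total * 90 // 100
        return {"name": row["name"], "quantity": quantity, "total_cents": total}

def render_quote(quote):
    """Presentation tier: only formats what the business tier returns."""
    dollars = quote["total_cents"] / 100
    return f"{quote['quantity']} x {quote['name']}: ${dollars:.2f}"

pricing = Pricing(ProductData())
print(render_quote(pricing.quote(1, 10)))
```

Each tier talks only to the one directly beneath it, so the discount rule can change without touching either the storage code or the formatting code.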
24. Enterprise App vs. Website
- The main difference separating enterprise applications from enterprise websites is the presentation layer.
- Both are distributed in nature
- Both make use of tiered, or layered, design
- Extensibility
- Both are built with scalability in mind
- Both are designed to be robust
25. Why? The issues a web application is confronted with
- Concurrency
- 1000s of people might be using the site at the same time
- Unpredictable load
- 100 users today, 100,000 users tomorrow: usage can spike and dip
- Security risks
- Web applications are exposed, sitting ducks for attacks
- Opportunity for wide-area distributed computing
- Using services provided by other machines on the Internet
- Creating a user experience
- In the face of unreliable connections and stateless protocols, we must build reliability and maintain state
- Developing in Internet-time
- Extreme requirements and absurd development schedules
- Coping with change
- Requirements that change mid-way through a project, sometimes due to experience gained from testing with users
- User demands for ubiquitous access
- Users want browser, mobile (WAP), and voice access to web resources
- Source: adapted from Philip and Alex's Guide to Web Publishing
26. How? What approaches do we use to overcome these issues
- Concurrency
- Hardware
- Limit data download (fewer pictures)
- Unpredictable load
- Hardware
- Design systems to run on multiple machines at once
- Security risks
- Keep stored information to a minimum
- Constant monitoring: installing patches, reading log files, etc.
- Opportunity for wide-area distributed computing
- Use available services in our code
- Expose useful parts of our own code
27. How? (cont.)
- Creating a user experience
- Cookies, Viewstate, URL rewriting, and Databases
- Developing in Internet-Time
- Good programming style
- Use modular design
- Code reuse
- Business concern
- Coping with change
- Again, good programming style helps (reuse)
- Good scheduling, plan for extra time
- User demands for ubiquitous access
- Separation of presentation and content (data)
- XML, XPath, XSL, and WML
28. Evolving to N-Tier
- Client-Server gave programmers the freedom to use resources not located on the client machine.
- This also meant that hardware could be tailored to the needs of applications.
- However, as database applications became more complex (and more demanding of system resources), it became evident that multiple classes of servers were needed.
29. Rise of the middle tier
- Relational databases (RDBs) provide features that allow database administrators to define data integrity rules
- The role of RDBs increased as the complexity of triggers and stored procedures grew.
- Business rules began to creep into the database area.
- Business rules: units of processing or algorithms that represent some concept of importance to the organization using the database.
- Like broad and wide brush strokes, business rules are applicable throughout an application.
- This generates the desire to implement business rules in a centrally managed area.
- They are not directly related to data integrity, so it is not correct to lump them in with the database.
30. It depends on how you count
- Sometimes N-Tier is referred to as 3-Tier
- Presentation tier
- Clients remain focused on presenting data and gathering input from users
- Data tier
- Only functionality to access data and maintain its integrity is implemented here
- Business tier (also called application logic, business services, or middle tier)
- Implementation and enforcement of business rules is defined in this tier
- The term N-tier
- Comes from the subdivision of the business logic tier. This tier is rarely homogeneous, so subdivisions based on tasks are defined and implemented.
- Some programmers still count this as 3-Tier, where others count these subdivisions, giving rise to N-Tier.
31. Benefits of using an N-Tier approach
- Separation of presentation from function
- Ability to optimize each tier in isolation for scaling and performance
- Limited parallelism during development
- Reuse of functional modules
- Ability to select the appropriate platform for each task
32. Separation of presentation from function
- Visual presentation is isolated from tested functionality
- Users like graphs better than data in grids: this change to how data is shown does not warrant retesting the code that retrieved the data
- The client computer need only be as powerful as the rendering requires it to be.
- Intranet applications that implement complex functionality on the server are excellent examples.
- With server farms and powerful computing resources available within networks or the Internet, this benefit has become more and more popular
- Side note: as we move into a climate where more web-enabled devices exist (PDAs, phones, etc.), this benefit of N-Tier will become even more important
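The grid-versus-graph point can be sketched in a few lines of Python: one function supplies the data, and two interchangeable renderers display it. The sales figures and the three function names are made up for the example.

```python
def monthly_sales():
    """Tested 'function' layer: returns data and knows nothing about display."""
    return [("Jan", 3), ("Feb", 5), ("Mar", 2)]

def render_grid(rows):
    """Presentation option 1: a plain two-column grid."""
    return "\n".join(f"{label:<5}{value:>3}" for label, value in rows)

def render_bars(rows):
    """Presentation option 2: a text bar graph of the same data."""
    return "\n".join(f"{label:<5}{'#' * value}" for label, value in rows)

print(render_grid(monthly_sales()))
print(render_bars(monthly_sales()))
```

Switching from `render_grid` to `render_bars` changes nothing in `monthly_sales`, which is why the retrieval code does not need retesting.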
33. Optimize each tier in isolation
- Not just code optimization
- The flexibility of an N-Tier application means that hardware can be chosen based on the work that tier will be doing
- With databases, disk and memory performance concerns outweigh others
- With business logic, CPU and network I/O performance concerns outweigh others
- Brute force: throw the right hardware at the tiers
- Elegant approach: refactor and adjust the software to the needs of each tier. Tune existing hardware. Ensure data services and business logic are not on the same server (since they have different needs).
34. Limited parallelism during development
- Each tier can be developed with only stub functionality from the other tiers.
- As functionality is implemented, the stubs are removed and the other tiers gain a chance to code against live code.
- With a mature application, development proceeds without any of the teams really noticing
- However, this does not happen without clean interfaces between the tiers (this is key).
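A small Python sketch of developing against a stub: the business tier is written against an agreed interface (here a single `in_stock` method), first backed by a stub and later by a real implementation. All class names and the interface itself are assumptions made for this example.

```python
class StubInventory:
    """Stand-in for the data tier: canned answer, no database needed."""

    def in_stock(self, sku):
        return 5

class DictInventory:
    """Later 'live' data tier exposing the same interface as the stub."""

    def __init__(self, counts):
        self._counts = counts

    def in_stock(self, sku):
        return self._counts.get(sku, 0)

class OrderService:
    """Business tier: depends only on the in_stock(sku) interface."""

    def __init__(self, inventory):
        self._inventory = inventory

    def can_order(self, sku, quantity):
        return quantity <= self._inventory.in_stock(sku)

# The business-tier team can work before the data tier exists...
print(OrderService(StubInventory()).can_order("A-1", 3))
# ...and swap in the live tier later without changing OrderService.
print(OrderService(DictInventory({"A-1": 2})).can_order("A-1", 3))
```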
35. Reuse of functional modules
- A point poorly elaborated in 386: object-oriented programming fails to deliver reuse without requiring source-code-level knowledge.
- If the interface is not sufficient, source code access allows easy manipulation.
- Component software aims at maximizing reuse
- Components are reused at run-time; source is rarely available. This forces designers to think carefully and thoroughly about the interface.
36. Ability to select the appropriate platform for each task
- This does not mean Linux, UNIX, HP-UX, Mac OS X (necessarily).
- Windows has more than one platform
- Windows CE, Windows XP, Windows 2000, Windows 2000 Server, and Windows Server 2003
- Again, the flexibility of an N-Tier solution allows tiers to reside on different platforms (even multiple platforms within the same tier).
37. Resources
- http://www.w3.org/History.html
- http://www.w3.org/People/Berners-Lee/
- HTTP: The Definitive Guide
- By David Gourley, Brian Totty
- Professional Windows DNA
- By Christopher Blexrud, et al.
- Programming PHP
- By Rasmus Lerdorf, Kevin Tatroe
- Philip and Alex's Guide to Web Publishing
- By Philip Greenspun
- Web Database Applications with PHP and MySQL
- By Hugh E. Williams, David Lane
- Input from
- Barney Boisvert
- Danny Mandel
- Kelly Heffner
- Ted Glaza
- Ben Ahern