Title: Office Automation
1 Office Automation Intranets
Lecture 10: Intranet Functionality 2: Textual
Media and Database Integration
2 Agenda
- we will consider a number of issues relating to
text resources
- how intranets can provide information
- re-purposing text documents into HTML
- the development and operation of doc bases
- integrating databases with intranet applications
(and the associated technical issues of state and
persistence)
3 Quick Publishing
4 Quick Publishing: Before Intranets
- traditional print publications
- require designers, printers, binding, shipping and
receiving: many professionals
- require many expensive and time-consuming
decisions
- source materials are often out of date (risking
professional and legal difficulties)
5 Quick Publishing
- quick publishing is typically an organisation's
first major intranet activity
- there is an overlap between quick publishing and
information management (the latter generally
involves database publishing; refer to Lecture 3)
6 Quick Publishing
- circumvents the time taken to print and publish
material by creating digital documents for web
publication straight from the document source
applications (word processors and DTP
applications)
- digital documents are posted on the intranet, e.g.
the NetObjects TeamFusion (1992) Acrobat file on the
BUSS909 Intranet
7 Quick Publishing
- information is available for viewing, downloading
and printing
- some of these kinds of documents are referred to
as brochure-ware
- but may include more substantial materials such
as policy documents
- this use of intranets treats the WWW like an
information parts store
8 Quick Publishing
- information parts stores are more than just a
collection of files: they invite use, e.g. the PPT
files in the BUSS909 Intranet
- information on intranets can be updated and
re-published very quickly, bypassing mailrooms and
printing agencies
- this removes the need for multiple physical
copies, so there are no costs for additional copies
9 Quick Publishing
- some companies have measurable cost reductions
- normally intranets can reduce some paper use, but
that will not happen immediately
- copying costs may in fact increase, while overall
paper use would be expected to decrease
- additional costs may be incurred internally while
the IT/IS department tools up for Quick Publishing
10 Quick Publishing
- generally, really quick publishing is
accomplished by using an electronic publishing
application like Acrobat (see BUSS909 Intranet)
- but as soon as you want to create real digital
documents (with hyperlinks etc.) you are
looking at converting documents to HTML
- very soon problems can emerge!
11 Web/Database Integration
12 Client/Server DB Applications: Traditional vs.
Web-based
Traditional:
- Platform-dependent
- Client is natively compiled and therefore
executes fast
- Installation necessary
- Fat client: maintenance needs incurred
- New, unfamiliar interface
- Rich, custom GUI constructs possible
- Difficult to integrate with existing applications
- Difficult to add multimedia
- Persistent connection to database
Web-based:
- Platform-independent
- Client is an interpreter (HTML, Java, JavaScript,
Microsoft ActiveX, etc.) and is therefore slower
- No installation necessary, depending on model
used
- Thin client: maintenance is minimized
- One common, familiar interface across
applications
- Limited set of GUI constructs; with Java applets,
custom-coded ones add to download time
- Easy to integrate with existing applications
- Easy to add multimedia
- Nonpersistent connection to databases
13 Intranets: Web Database Applications: Components
- the fundamental components of web database
applications make up the architecture of a web
database application
- a database gateway is a combination of one or
more of the first three layers of the
architecture: browser, application logic, and
database connection
14 Intranets: Web Database Applications: Components
- web database applications are composed of four
components or layers
- browser layer
- application logic layer
- database connection layer
- database layer
15 Intranets: Web Database Applications: Components
[Figure: example components of each layer]
- Browser Layer: HTML document, Java application,
Java applet, external helper program
- Application Logic Layer: CGI program, proprietary
Web server, Web server API module
- Database Gateway Layer: vendor's database API,
command-line interface to database, JDBC, ODBC
- Database Layer: database (RDBMS)
16 Intranets: Web Database Applications: Two-tiered
- web database applications can consist of multiple
tiers; there are two varieties
- two-tiered applications
- three-tiered applications
- two-tiered applications consist of a client, which
supplies the user interface and the database
connection, and the database
17 Intranets: Web Database Applications: Three-tiered
- three-tiered applications consist of a client
which supplies the user interface, a middle tier
which supports the database connection, and the
database
- additional tiers can be used for operations such
as security and state management (see Technical
Note 4)
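The three-tier split can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the RDBMS, and the `staff` table, the `StaffDirectory` class and the `render_page` function are hypothetical names that do not come from the lecture.

```python
import sqlite3
from html import escape


def make_database():
    """Database tier: an in-memory SQLite database stands in for the RDBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (name TEXT, extension TEXT)")
    conn.executemany(
        "INSERT INTO staff VALUES (?, ?)",
        [("Ann", "4021"), ("Bob", "4077")],
    )
    conn.commit()
    return conn


class StaffDirectory:
    """Middle tier: owns the database connection and returns plain data."""

    def __init__(self, conn):
        self._conn = conn

    def lookup(self, name):
        row = self._conn.execute(
            "SELECT extension FROM staff WHERE name = ?", (name,)
        ).fetchone()
        return row[0] if row else None


def render_page(directory, name):
    """Client tier: builds the user interface only and never issues SQL."""
    extension = directory.lookup(name)
    if extension is None:
        return f"<p>No entry for {escape(name)}</p>"
    return f"<p>{escape(name)}: ext. {escape(extension)}</p>"
```

In a two-tiered version the client function would hold the connection and run the query itself; moving that work into the middle tier is what leaves room for further tiers such as security or state management.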
18 Web-Database Integration
- there are two ways to integrate Web database
applications with other applications
- by directly linking applications of one
technology base: generally involves
straightforward coding
- passing data between two applications: generally
involves CGI (Common Gateway Interface) scripts
19 Web-Database Integration: CGI Protocol
- the CGI protocol is generally the heart of
integration: it was designed to do this in web
applications
- CGI scripting was the first way that web sites
were able to be integrated with databases and
other resources external to the Web
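As a rough illustration of how a CGI program receives a request, the sketch below reads the query string from the environment the Web server sets up and returns a header block, a blank line and an HTML body. The `dept` parameter and the `handle_request` function are assumptions made for the example, not part of the lecture material.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def handle_request(environ):
    """Build a CGI response (headers, blank line, body) from the environment."""
    # The Web server passes the part of the URL after "?" in QUERY_STRING.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    dept = params.get("dept", ["unknown"])[0]
    body = f"<html><body><p>Department: {escape(dept)}</p></body></html>"
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # When run by a Web server as a CGI script, the response goes to stdout.
    sys.stdout.write(handle_request(os.environ))
```

Because the program sees only environment variables and standard input and writes only to standard output, the same contract can be met by a script in almost any language, which is why CGI became the common route to databases and other external resources.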
20 Web-Database Integration: Methods for Passing Data
- integration of Web database applications with
other applications can be accomplished with CGI
coupled with one of several methods for passing
data
- Hidden Fields
- URL Parameters
- Cookies
- JavaScript
21 Web-Database Integration: Passing Data: Hidden
Fields (1)
- a hidden HTML form field is commonly used as a
simple storage container for data that needs to
be passed from page to page of an HTML-based and
CGI application
- for example
- <input type="hidden" name="sessionID"
value="jwsr438kowkmgl">
22 Web-Database Integration: Passing Data: Hidden
Fields (2)
- CGI programs automatically populate the field
with a session ID that can be used to look up
session information stored in a database
- suppose an intranet requires employees to
authenticate their identity by logging in with a
user name and password
23 Web-Database Integration: Passing Data: Hidden
Fields (3)
- at login
- the employee can be assigned a randomly generated
unique key that is stored in the database along
with the user's name
- the authentication procedure stores the key in a
hidden HTML form field
- during the session
- as the employee moves within the site, the hidden
field can be referenced by new resources to
determine who is accessing them
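The login flow above could look something like the following sketch. It assumes an in-memory SQLite table as the session database and skips the password check entirely; the `sessions` table and the function names are hypothetical.

```python
import secrets
import sqlite3
from html import escape


def open_session_store():
    """Create the (assumed) table that maps session keys to user names."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sessions (session_id TEXT PRIMARY KEY, user_name TEXT)"
    )
    return conn


def login(conn, user_name):
    """At login: store a random key with the user's name, return the hidden field."""
    session_id = secrets.token_hex(16)  # 32 hexadecimal characters
    conn.execute("INSERT INTO sessions VALUES (?, ?)", (session_id, user_name))
    conn.commit()
    field = (
        f'<input type="hidden" name="sessionID" value="{escape(session_id)}">'
    )
    return session_id, field


def who_is(conn, session_id):
    """During the session: resolve the key posted back in the hidden field."""
    row = conn.execute(
        "SELECT user_name FROM sessions WHERE session_id = ?", (session_id,)
    ).fetchone()
    return row[0] if row else None
```

Each page the employee requests must carry the hidden field forward in its own form; a page that omits it loses track of who is asking.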
24 Web-Database Integration: Passing Data: Hidden
Fields (4)
- this functionality can be used in contexts other
than simply authenticating users
- hidden fields are also an ideal way of passing
data between Lotus Domino applications and straight
CGI or Web Server API applications because
Domino's collaborative document management system
presents its documents as HTML
25 Web-Database Integration: Passing Data: URL
Parameters (1)
- URL parameters, the string following the question
mark (?) in the URL of a GET request, are another
storage area for data that can be accessed by Web
database applications (see Lecture 8)
- like HTML hidden form fields, the URL parameter
string can be retrieved by CGI programs, web
server API programs etc.
26 Web-Database Integration: Passing Data: URL
Parameters (2)
- the URL parameters can be used in the same way
that HTML hidden fields are used
- for example, a session key can be stored in the
URL and used to access user authentication, user
privileges, and session state information
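A minimal sketch of carrying a session key in the URL, using Python's standard `urllib.parse`. The host name `intranet.example` and the parameter name `sessionID` are illustrative assumptions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def add_session_key(base_url, session_key):
    """Append the session key as a URL parameter after the question mark."""
    return f"{base_url}?{urlencode({'sessionID': session_key})}"


def read_session_key(url):
    """Recover the session key from the parameter string, if present."""
    query = urlsplit(url).query
    values = parse_qs(query).get("sessionID")
    return values[0] if values else None
```

For example, `add_session_key("http://intranet.example/policy.cgi", "jwsr438kowkmgl")` yields a link that every subsequent page can parse with `read_session_key` before looking up the user's privileges and session state.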
27 Web-Database Integration: Passing Data: Cookies (1)
- cookies are pieces of information sent by the Web
server in HTTP headers and stored on the client
machine
- they are used for the same purposes as HTML hidden
form fields and URL parameters, but have an
additional feature...
28 Web-Database Integration: Passing Data: Cookies (2)
- the data stored in a cookie can be retrieved
across multiple Web browsing sessions
- when a user quits an instance of a Web browser,
any data stored in an HTML hidden field or URL
parameter during the session is lost, but if...
29 Web-Database Integration: Passing Data: Cookies (3)
- if the data is stored in a cookie on the client
machine (one that has been given an expiry date)
then the information is retrievable even
after the user quits the browser
- as a consequence the data will be ready to be
accessed during a later session
30 Web-Database Integration: Passing Data: Cookies (4)
- cookies relevant to a URL are sent back to the
server and are accessible via the environment
variable HTTP_COOKIE
- cookies are retrievable from Java (via
JavaScript), JavaScript, CGI, and Web server
modules such as Lotus Domino
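The round trip can be sketched with Python's standard `http.cookies` module: the server emits a `Set-Cookie` header, and a later CGI request reads the value back from `HTTP_COOKIE`. The cookie name `sessionID` and the one-day lifetime used in the usage note are assumptions for the example.

```python
from http.cookies import SimpleCookie


def make_set_cookie_header(session_key, max_age_seconds):
    """Build the Set-Cookie header line the server sends with its response."""
    cookie = SimpleCookie()
    cookie["sessionID"] = session_key
    # A lifetime makes the cookie survive after the browser is closed.
    cookie["sessionID"]["max-age"] = max_age_seconds
    cookie["sessionID"]["path"] = "/"
    return cookie.output()


def read_cookie(environ, name):
    """Read one cookie from the HTTP_COOKIE variable of a CGI environment."""
    cookie = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    morsel = cookie.get(name)
    return morsel.value if morsel is not None else None
```

Calling `make_set_cookie_header("jwsr438kowkmgl", 86400)` gives a header the browser stores for a day; on any later visit `read_cookie(os.environ, "sessionID")` would recover the key without a hidden field or URL parameter.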
31 Web-Database Integration: Passing Data: JavaScript
(1)
- JavaScript can be accessed from Java
- this provides a bridge between a Java applet and
the document on which it lives, and adds new
capabilities to Web database application
programming
- an applet can know about any forms residing on
the same page as its own and can access their
fields
32 Web-Database Integration: Passing Data: JavaScript
(2)
- in versions higher than JDK 1.1, if an applet
resides within a frame, then it can access parent
and sibling frames
- reading data from them
- executing JavaScript functions defined in them,
or
- overwriting the framed documents completely
33 Persistence and State
34 Persistence
- persistent database connections are highly
efficient data channels between a database client
and the DBMS: ideal for database applications
- however, they allow a single application to
exhaust these valuable data channels, as
applications may require more than one constant
connection
35 Non-Persistence
- however, Web-based database applications, and
web applications in general, do not have
persistence
- the non-persistent connection architecture of the
Web is a mixed blessing
36 State
- non-persistent connections mean that programmers
must take care of application state management
- in order to understand persistence we must
understand state
- the state of a system is expressed through the
values that it currently holds as a result of its
execution
37 Persistence and State
- persistence is the result of remembering or
tracking all the incremental intermediate changes
in the state of the system (its objects,
movements, or the actions of various media)
- persistence is the capability of remembering a
state across different applications or time
periods
38 Persistence and State for Traditional Applications
- for traditional applications, managing
persistence is easy: the application's state can
be kept in memory as long as the computer has
enough memory
- stand-alone applications do not interact with any
other applications, clients, or servers, and so
do not need to make their state available or
dependent on external factors
39 Resource Allocation Model: Traditional Databases
Connections are maintained to the database; idle
sessions waste resources.
40 Persistence and State for Web Applications (1)
- state maintenance in Web database applications
adds another level of complexity which must be
handled by programmers
- HTTP, the main protocol of the Web, is
connectionless, which means that once an HTTP
request is sent and the response is received the
connection for the communication is closed
41 Persistence and State for Web Applications (2)
- if a connection could be kept open between the
client and the server, the server could at any
time query the client for state information and
vice versa
- the server would know the identity of the client
user throughout the session once the user had
logged in
42 Persistence and State for Web Applications (3)
- in practice the server has no constant memory of
the user's identity, even after the user has
logged in
- therefore, HTTP clients must make a new
connection for each server request
- in a stateless request, the transaction is atomic
43 Persistence and State for Web Applications (4)
- programmers must also address the added overhead
of creating new database connections each time a
CGI program or Web server module requires
database access
- database connections are expensive because they
take time to establish, which is a problem when
implementing Web applications
44 Resource Allocation Model: Web Databases
Persistent connections reduce the overhead of
accessing databases from the WWW or intranet.
Non-persistent connections optimize the number of
available database connections at any time,
which promotes the sharing of database resources.
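A common compromise between the two models, which the slides themselves do not describe, is a connection pool: a few connections are opened once and lent out per request. The sketch below is an assumption-laden illustration using SQLite and a single thread; `ConnectionPool` and its methods are hypothetical names.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Keep a fixed number of open connections and lend them out per request."""

    def __init__(self, size, database=":memory:"):
        self._idle = queue.Queue()
        for _ in range(size):
            # The connection cost is paid once, when the pool is created.
            self._idle.put(sqlite3.connect(database))

    @contextmanager
    def connection(self, timeout=1.0):
        # Raises queue.Empty if every connection is still busy after timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Hand the connection back so the next request can share it.
            self._idle.put(conn)

    def idle_count(self):
        return self._idle.qsize()
```

A request handler would wrap its database work in `with pool.connection() as conn:`, so that no request holds a connection while idle and no request pays to open one.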
45 Doc Bases
46 Doc Bases: Introduction...
- I have mentioned in previous lectures that the
success of the Internet owes much to its
simplicity
- internet protocols connect clients to servers
using standard ASCII text
- programmers can write distributed programs using
almost any programming or scripting language:
all languages can be applied to produce and
consume text-based protocols
47 Doc Bases: Introduction...
- Why bother developing when you can buy
distributed computing from Microsoft?
- many people simply assume that developing
distributed applications will be difficult
- the reason is that these proprietary solutions
use proprietary APIs, packaged into proprietary
components that speak inaccessible binary
protocols to other proprietary components
48 Doc Bases: Introduction
- looking at a news group message (NNTP)
shows how simple it can be...
- each NNTP message consists of a few
elements...
- Path: localhost!not-for-mail
- From: Rodney Clarke <rclarke@nimnet.net>
- Newsgroups: test
- Subject: sample message
- Date: Tue, 14 Apr 1999 22:29:59 -0400
- Message-ID: <35760488.F61E274@nimnet.net>
- NNTP-Posting-Host: localhost
- MIME-Version: 1.0
- Content-Type: text/plain; charset=us-ascii
- Content-Transfer-Encoding: 7bit
- X-Mailer: Mozilla 4.5 [en] (WinNT; I)
- This is the body of the sample message!
49 Doc Bases
- the Newsgroups header identifies which group
the message was posted to,
- the Message-ID is automatically generated by the
news server and uniquely identifies the specific
message, and
- there are of course other fields including From,
Subject, Date and so on whose meanings are
familiar to you.
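Because a news message is plain text with a header block, a blank line and a body, a few lines of code are enough to take it apart. The sketch below uses Python's standard `email` package on a shortened version of the sample message; the `split_message` helper is a hypothetical name.

```python
from email import message_from_string

# A shortened copy of the sample message: headers, a blank line, then the body.
SAMPLE = """\
Path: localhost!not-for-mail
From: Rodney Clarke <rclarke@nimnet.net>
Newsgroups: test
Subject: sample message
Message-ID: <35760488.F61E274@nimnet.net>
Content-Type: text/plain; charset=us-ascii

This is the body of the sample message!
"""


def split_message(raw):
    """Return (headers as a dict, body text) for one news message."""
    msg = message_from_string(raw)
    headers = dict(msg.items())
    body = msg.get_payload().strip()
    return headers, body
```

The headers come back as named fields that a program can index and sort on, while the body stays free-form text.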
50 Doc Bases: General Definition...
- a collection of these news messages in a tree of
subdirectories is the news server's primary
message data base
- this collection of messages is an example of an
Internet-style data store or document data base,
which is being given the name of docbase (see
Udell, 1999)
51 Doc Bases: General Definition...
- docbase is used to describe ASCII text that is
structured according to some specific rules
- in the case of news messages these rules are
defined by the appropriate standard for USENET
messages (RFC 1036)
52 Doc Bases: General Definition...
- this kind of Internet data store holds
semi-structured information, defined as a
combination of data that is
- structured: for example, news messages define
key dimensions such as author, subject and date,
together with
- unstructured: for example, a message body
contains free-form text
53 Doc Bases: Impose Rules on Web Page Collections (1)
- a collection of web pages won't hold the same
kind of semi-structured data
- web pages are not required to carry headers, for
example
- nor are they required to exhibit the more complex
structures that can be expressed in XML
- but you can choose to impose these kinds of rules
on a collection of web pages
54 Doc Bases: Impose Rules on Web Page Collections (2)
- imposing rules on web pages converts them into
containers for semi-structured data, and
collections of them form a web doc base
- semi-structured data is at the heart of groupware
and requires the kind of skills necessary in
publishing as well as conventional data management
55 Doc Bases: Produce, Transform and Deliver
- it is possible to develop programs which produce,
transform and deliver web doc bases
- using exactly the same Internet protocols that we
are already familiar with, these web doc bases
can then be turned into CSCW/groupware
applications
56 Doc Bases: Structure (1)
- doc bases consist of several elements, including
a repository format
- the repository format defines a doc base: for
mail and news data stores it is usually the copies
of the headers and message bodies
- for web doc bases the repository format is often
a markup language, e.g. HTML, XHTML (a new
standard which requires HTML documents to be
well formed), SGML, or XML
57 Doc Bases: Structure (2) Input and Delivery Format
- an input tool moves content into the repository:
often a text editor, export filter or a web form
and its handler
- the delivery format is the data store that a
server application uses to deliver a document to
a client application
58 Doc Bases: Structure (3) Transformation and
Viewing Tools
- when repository and delivery format differ, a
transformation tool bridges the gap between them;
this is usually true for web doc bases but not
always
- a viewing tool is a client application that reads
and displays a delivery format, generally by means
of a web browser, news reader or on occasion a
mail reader
59 Doc Bases: Implementation (1) XML Query Languages
- there are several ways in which docbases can be
implemented
- XML-compatible query languages are now emerging,
although some of these are not stable (still
experimental, but they will become increasingly
viable in the next 2 yrs)
- there are some practical ways for implementing
scalable, real-world doc bases now
60 Doc Bases: Implementation (2) XML and Perl
- perhaps the most practical way to implement a
web-based docbase is to
- use XML as the repository format and then
- develop a translator in Perl using the
XML::Parser module that connects Perl to an XML
parser called expat
- Perl is considered to be one of the few languages
of choice for web masters
61 Doc Bases: How They Work: Programmer's Analogy
- if you program, then developing these kinds of
systems is like
- turning the repository into a form of source code
- the translator becomes a compiler, and
- the deliverable HTML pages can be thought of as a
form of object code
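The compiler analogy can be sketched as a tiny translator. The lecture suggests Perl with XML::Parser; the version below uses Python's standard `xml.etree.ElementTree` instead (which also parses with expat) purely for illustration. The `<tip>` document structure and its element names are assumptions, not a format defined in the lecture.

```python
import xml.etree.ElementTree as ET
from html import escape

# "Source code": one repository document in an assumed XML format.
REPOSITORY_DOC = """\
<tip>
  <title>Resizing a layout</title>
  <author>rclarke</author>
  <body>Drag the layout handle, then republish the page.</body>
</tip>
"""


def translate(xml_text):
    """'Compile' one XML repository document into a deliverable HTML page."""
    tip = ET.fromstring(xml_text)
    title = escape(tip.findtext("title", default=""))
    author = escape(tip.findtext("author", default=""))
    body = escape(tip.findtext("body", default=""))
    parts = [
        f"<html><head><title>{title}</title></head><body>",
        f"<h1>{title}</h1>",
        f"<p>{body}</p>",
        f"<p>Author: {author}</p>",
        "</body></html>",
    ]
    return "".join(parts)
```

Because the translator knows the repository structure, a change to the page design means editing `translate` once and regenerating every page, much as recompiling source code refreshes the object code.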
62 Doc Bases: Example of Help Desk Doc Base (1)
- a hypothetical doc base
- could be created with information on how to
create web pages and perform procedures using
NetObjects
- this kind of archive of helpful tips is added to
when new solutions are found to interesting
problems in the company involving an
Intranet/Extranet site
63 Doc Bases: Example of Help Desk Doc Base (2)
- web content in almost any form provides
opportunities for creating groupware
- there are possibilities to connect internal
groups (product development, support, marketing
and other groups) with external groups (existing
and prospective customers, business partners)
64 Doc Bases: Example of Help Desk Doc Base (2)
- e.g. a link on a web page
- <a href="mailto:rclarke@nimnet.com">Author</a>
- can be rendered automatically by the translator
into a parameterised mailto
- <a href="mailto:rclarke@nimnet.com?subject=TechnicalFeedback\August1998NetObjectsClientPageLayout">rclarke@nimnet.net</a>
- this allows the reader of a technical paper to
contact the systems support officer: the mail
header is automatically filled out for them
65 Doc Bases: Example of Help Desk Doc Base (3)
- the translator can do this because it knows about
the doc base structure
- automatic translator substitution ensures that
the context of the message is provided and
consistent
- the recipient could use a client-side mail filter
to manage the messages from the doc base,
organise them and count them by technical paper
or weblet section (as with messages to me about
A2 and A3)
66 References
- Greer, T. (1998) Understanding Intranets.
Strategic Technology Series. Microsoft Press.
- Ju, P. (1997) Databases on the Web: Designing and
Programming for Network Access. Pencom Web Works/
M&T Books.
- Udell, J. (1999) Practical Internet Groupware:
Building Tools for Collaboration. Cambridge:
O'Reilly & Associates Inc.