Title: The Document In The 21st Century
1The Document In The 21st Century
- William J. "Bill" McCalpin
- MIT, LIT, CDIA, EDP
- Principal, MHE
2Who MHE Is...
- MHE is the consulting firm which specializes in
the transition of information both within and
between the electronic printing, imaging, and
Internet environments.
3Introduction
4Thesis, Antithesis, Synthesis
- In the philosophy of Hegel, these words show the
inevitable transition of thought, by
contradiction and reconciliation, from
an initial conviction to its opposite and then to
a new, higher conception that involves but
transcends both of them
5The Hegelian Dialectic
- Thesis: Most businesses have well-established,
productive legacy systems
- Antithesis: XML is springing forth everywhere and
will replace most legacy systems
- Synthesis: XML will be integrated with legacy
systems - enhancing some processes, changing many
others, and eliminating some altogether
- In short, XML will change - not destroy - what
you do
6The Document In The 21st Century
7What Is A Document?
- The American Heritage Dictionary defines a
document as information in writing placed on a
medium such as paper, often used as a record.
- Documents have been placed on clay tablets, gold
leaf, animal skins, all types of paper,
microfilm, optical storage, and so on
8Information And Presentation
- In every case, the document represents a
fundamental union of information and presentation
- But presentation presumes that the primary
audience for the document is a human being
- With the coming of the Internet, this is no
longer the case
9The Curse Of Presentation
- Composition products require that you specify a
printer, even before you know where the document
will print
10Why Are Print, Image, And Presentation Formats
Incompatible?
11Printing And Imaging Formats
- Many printing formats: AFP, Metacode, DJDE, XES
(UDK), PostScript, PCL, etc.
- All formats use external resources like fonts,
forms, graphics, etc., although sometimes
inconsistently
- Most are escape-sequence based, some are formal
data architectures, and some are almost
programming languages
12Printing And Imaging Formats
- Many imaging formats - while most use CCITT Group
4 for image compression, most also have
proprietary data wrappers
- Later systems adopted text-based formats such as
PDF, although storing other print streams is not
unknown
- Systems which store text-based formats must
wrestle with resource issues
13Different Print Formats
- Why do printers have different formats? Because
of physical constraints imposed by the hardware
- resources reduce the amount of data sent through
the pipeline to the printer
- pages must be imaged in a fraction of a
second
- complex graphics can be developed on the printer,
but this needs a special language
14Different Imaging Formats
- Why do imaging systems have different formats?
Because of physical constraints imposed by the
hardware
- Mass storage was expensive
- Indexing schemes were too close to the
application
- Text is avoided sometimes because of resource
issues
- Interoperability with other products was an issue
15Result
- In each case, data architecture decisions were
made in order to enhance some aspect of
legibility of the stored objects.
- If there were no requirement to present the
information (to a human reader), then the
requirement for custom data formats for each
vendor would probably disappear!
16Information Exchanges
- B2C - business to consumer
- B2B - business to business
- B2B2C - business to business to consumer
- 2C requires presentation information
- B2B requires no presentation information, if the
recipient is a process, not a person
17Why B2B?
- NYSE (New York Stock Exchange)
- Formerly, 100 million trades in a day was
considered very heavy
- Now 1 billion trades a day is considered very
heavy
- The difference is automation; the same multiplier
applies to B2B
- One effect of XML is the separation of information
from presentation
18The Nature Of XML
19XML And SGML
- XML is eXtensible Markup Language
- XML is a subset (an application profile) of SGML,
Standard Generalized Markup Language, an ISO
standard (ISO 8879)
- XML is "extensible" because people and
enterprises with common interests get together to
define the tags which describe their data
20XML And Print Formats
- In most print formats, something like an account
number would be
- AMB 200 AMI 300 SCFL 01 STO 0, 90 TRN 12345-67890
- In XML, the same information is
- <account_number>12345-67890</account_number>
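As a minimal sketch of why the tagged form is easier for a program to consume, the snippet below reads the account number from the XML line above using Python's standard xml.etree.ElementTree module. The variable names are illustrative assumptions and are not part of the original slides.

```python
import xml.etree.ElementTree as ET

# The XML fragment from the slide above.
xml_fragment = "<account_number>12345-67890</account_number>"

element = ET.fromstring(xml_fragment)

# The tag says what the data is; no positioning commands are needed to find it.
tag_name = element.tag
account_number = element.text
print(tag_name, account_number)
```

By contrast, a program reading the print-format line would have to know that the value follows the TRN command and that the preceding commands are only positioning and font selection.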
21XML and Image Formats
- Raster-based image formats contain only bitmaps
- To read the text data within the bitmap requires
an OCR/ICR process, which can fail
- Most usable data is extracted from the document
and placed in the index
22XML And Electronic Formats
- The nature of all electronic presentation formats
is to be focused on the presentation of the
information.
- The nature of XML is focused on the author's
content, that is, information is described as
what it is, not how it looks.
23Separating Information From Presentation
- XML enables the total separation of information
from presentation
- Thus, some XML objects have only tagged
information, while others have content and
presentation information
[Diagram: XML (information only); XML + XSL (information plus presentation)]
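The separation described on this slide can be sketched as follows: one content-only XML object is rendered in two different presentations without changing the data. This is a hand-written Python illustration, not XSL; the element names and both rendering functions are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical content-only XML; the element names are assumptions for illustration.
statement_xml = """
<statement>
  <account_number>12345-67890</account_number>
  <state>Texas</state>
</statement>
"""

def render_plain_text(root):
    """Presentation 1: one 'name: value' line per element."""
    return "\n".join(f"{child.tag}: {child.text}" for child in root)

def render_html(root):
    """Presentation 2: the same information as an HTML table."""
    rows = "".join(
        f"<tr><th>{escape(child.tag)}</th><td>{escape(child.text or '')}</td></tr>"
        for child in root
    )
    return f"<table>{rows}</table>"

root = ET.fromstring(statement_xml)
plain = render_plain_text(root)
html_table = render_html(root)
print(plain)
print(html_table)
```

A receiving process in a B2B exchange could ignore both renderers and work with the tagged values directly.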
24How To Relate XML to Everyman
- You might think that XML is too esoteric for most
people to understand
- But XML is based on a basic human need:
exchanging information
- XML couples the communication skills we have used
over the last several thousand years to modern
Internet technology
- So how can you understand it?
25Communication Difficulty 1
- In order for any communication to take place,
both parties must share the same fundamental
mechanism which carries information
- For example, in writing, if a boy and girl don't
even share the same writing schemes, they can't
possibly understand...
26Chinese Characters vs Latin Alphabet
27Underlying Structure of XML
- Text characters
- Tags are delimited by < and >, e.g., <xml>
- Ending tags have /, e.g., </xml>
- Parameters are indicated by double quotes, e.g.,
<PAPER track="Application">
- XML is a series of tags and data, e.g.,
<STATE>Texas</STATE>
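The rules on this slide can be checked with a short sketch using Python's standard xml.etree.ElementTree parser. The sample document simply combines the slide's own PAPER and STATE examples; nesting STATE inside PAPER is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Built from the examples on the slide: a start tag with a parameter
# (attribute), nested tagged data, and matching end tags.
document = '<PAPER track="Application"><STATE>Texas</STATE></PAPER>'

paper = ET.fromstring(document)
track = paper.get("track")          # attribute value given in quotes
state = paper.find("STATE").text    # data between <STATE> and </STATE>

# A missing end tag breaks the syntax, and the parser rejects the text.
try:
    ET.fromstring("<STATE>Texas")
    well_formed = True
except ET.ParseError:
    well_formed = False

print(track, state, well_formed)
```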
28Communication Difficulty 2
- Once both parties agree to the fundamental
syntax, then both parties must next agree to the
words to be used
- In the case of XML, how do both parties know that
<STATE> means a political subdivision and not one
of gas, liquid, solid?
29A Date Gone Bad
- One evening in the hotel lobby bar, two young
Italian men spent a while talking to an
attractive Venezuelan girl...and her aunt
- They spoke Italian and she spoke Spanish, but
they communicated passably
30A Date Still Going Bad
- However, the aunt wanted to go up to her room
with her niece
- The Italians wanted to take the young lady out
dancing...
- So they asked her
31Oops
- What the boys said
- Vuoi andare con noi sta sera?
- What the young lady needed to hear
- Quisieras ir con nosotros esta tarde?
32Miscommunication
- Even though Italian and Spanish use the same
sounds, the same grammar, and have a common
ancestry in Latin, some words are different
- Unfortunately, the most common words in both
languages are likely to be the most different
33The Cost Of Data Differences
- "NASA lost a $125 million Mars orbiter because
one engineering team used metric units while
another used English units for a key spacecraft
operation..." (CNN, 9/30/99)
34XML Words
- HTML has a certain number of fixed tags -
everyone knows what they are, but they can't be
augmented
- In XML, everyone can make up their own tags to
suit their needs - but how do we avoid a Tower of
CyberBabel?
35Communication Difficulty 3
- Even when you agree to common tags, you still
need to agree to a common understanding
- In XML, the Schema (now replacing the DTD)
defines what tags are allowed to describe a
particular collection of data
- For example, in the field of human relations,
what is a date?
36One DTD For A Date
- A woman thinks
- Invitation - formal
- Dress-up - nicely
- Eat out: dinner with wine at nice restaurant
- Entertainment: see a movie
- Private moment: good night kiss
- <!DOCTYPE date [
- <!ELEMENT date (invitation, dress, meal,
entertainment, intimacy)>
- <!ELEMENT invitation (#PCDATA)>
- <!ELEMENT dress (#PCDATA)>
- <!ELEMENT meal (#PCDATA)>
- <!ELEMENT entertainment (#PCDATA)>
- <!ELEMENT intimacy (#PCDATA)>
- ]>
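What the DTD above enforces can be sketched in a few lines of Python. The standard xml.etree.ElementTree parser does not validate against a DTD, so the hand-written check below only stands in for a validating parser; the function name and the list constant are assumptions for illustration. The two sample documents are the woman's and the man's views of a date from the slides that follow.

```python
import xml.etree.ElementTree as ET

# Content model taken from the DTD above: a date must contain exactly these
# child elements, in this order. Names are lower case here to match the
# instance documents, because XML names are case-sensitive.
DATE_CONTENT_MODEL = ["invitation", "dress", "meal", "entertainment", "intimacy"]

def conforms(xml_text, root_name="date", model=DATE_CONTENT_MODEL):
    """Return True if the root element and its children match the model."""
    root = ET.fromstring(xml_text)
    return root.tag == root_name and [child.tag for child in root] == model

womans_date = (
    "<date><invitation>Telephone call</invitation><dress>Long dress</dress>"
    "<meal>4-star restaurant</meal><entertainment>the theatre</entertainment>"
    "<intimacy>A passionate, romantic kiss</intimacy></date>"
)
mans_date = "<date><meal>six-pack of beer</meal><intimacy>necking</intimacy></date>"

womans_ok = conforms(womans_date)
mans_ok = conforms(mans_date)
print(womans_ok, mans_ok)
```

The man's document is well-formed XML, but it does not conform to this DTD; the two parties only understand each other once they agree on one definition.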
37A Woman's View Of A Date
- <date>
- <invitation>Telephone call</invitation>
- <dress>Long dress</dress>
- <meal>4-star restaurant</meal>
- <entertainment>the theatre</entertainment>
- <intimacy>A passionate, romantic kiss</intimacy>
- </date>
38Another DTD For A Date
- A man thinks
- Eat out: six-pack of beer
- Private moment: necking
- <!DOCTYPE date [
- <!ELEMENT date (meal, intimacy)>
- <!ELEMENT meal (#PCDATA)>
- <!ELEMENT intimacy (#PCDATA)>
- ]>
39A Man's View Of A Date
- <date>
- <meal>six-pack of beer</meal>
- <intimacy>necking</intimacy>
- </date>
40When Men And Women Agree
- <date>
- <invitation>Telephone call</invitation>
- <dress>Long dress</dress>
- <meal>4-star restaurant</meal>
- <entertainment>the theatre</entertainment>
- <intimacy>A passionate, romantic kiss</intimacy>
- </date>
- <date>
- <invitation>Honking</invitation>
- <dress>Not the shirt he changed the oil in</dress>
- <meal>food and beer</meal>
- <entertainment>rent a video</entertainment>
- <intimacy>A passionate, romantic kiss while necking</intimacy>
- </date>
41The Four Stages Of XML Evolution
42The Evolution Of Technology
- Creation of basic technology
- Growth of technical tools
- Conversion of technology into business
applications - the penetration into verticals
- Reduction to commodity
43Stage 1: Creation Of The Basic Technology Of XML
44Creation Of Basic Technology
- In 1998, the World Wide Web Consortium declared
XML to be a recommendation, that is, a
world-wide standard
- This phase began in 1990 with the creation of the
Web and browsers, and is now substantially
complete
45Stage 2: The Growth Of Technical Tools
46Growth Of Technical Tools
- Once the underlying technology has been created,
tools and utilities are built to use this
technology
- These tools are often somewhat primitive and are
not focused on the business problem
- This phase has been going furiously since 1998
47The World Wide Web Consortium and XML
48World Wide Web Consortium
- The World Wide Web Consortium was created in
October 1994 to develop common protocols that
promote the Web's evolution and ensure its
interoperability
- The W3C has more than 500 Member organizations
from around the world
- The W3C has many roles
49The Roles of the W3C
- Standards Body (XML and others)
- Software and Services
- Working Groups
- Initiatives
- Activities with other standards bodies
50W3C and Standards
- XML
- XSL
- CSS1 and CSS2
- DOM
- HTML
- MathML
- PICS
- PNG
- RDF
- SMIL
- SVG
- XHTML
- XPath, XPointer, XML Base, XLink
- XML Schema
51Standards
- XML (eXtensible Markup Language) is the universal
format for structured documents and data on the
Web. The base specifications are XML 1.0 Feb '98,
and Namespaces, Jan '99.
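Namespaces are one way to address the earlier question of whether STATE means a political subdivision or one of gas, liquid, solid. The sketch below is an illustration only: both namespace URIs and the record element are made-up placeholders, and the parsing uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs below are made-up placeholders for illustration.
document = """
<record xmlns:geo="http://example.com/geography"
        xmlns:phys="http://example.com/physics">
  <geo:STATE>Texas</geo:STATE>
  <phys:STATE>liquid</phys:STATE>
</record>
"""

root = ET.fromstring(document)

# ElementTree expands each prefix to its URI, so the two STATE tags differ.
tags = [child.tag for child in root]
political = root.find("{http://example.com/geography}STATE").text
physical = root.find("{http://example.com/physics}STATE").text
print(tags, political, physical)
```

The namespace only keeps the two names apart; the parties still have to agree on what each name means.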
52Standards (Cont.)
- XSL (eXtensible Stylesheet Language)
- XSL is a language (in XML) for expressing
stylesheets. It consists of two parts:
- XSL Transformations (XSLT): a language for
transforming XML documents
- An XML vocabulary for specifying formatting
semantics (XSL Formatting Objects)
53Standards (Cont.)
- CSS (Cascading Style Sheets): CSS1 and CSS2
describe how documents are presented on screens,
in print, or perhaps how they are pronounced
- Authors and readers can influence the
presentation of documents without sacrificing
device-independence or adding new HTML tags
54Standards (Cont.)
- CSS3 is now a Working Draft
- The main purpose of CSS3 is to modularize the
specification, so that dozens of changes don't
have to be shove(d) ... into a single monolithic
specification
- Devices which are constrained (such as an aural
browser) can choose to support only certain
modules instead of all of CSS.
55Why Two Style Sheet Languages?
56Standards (Cont.)
- DOM (Document Object Model)
- a standard API to the document structure that aims
to make it easy for programmers to access
components of a document and delete, add or edit
their content, attributes and style.
- HTML (HyperText Markup Language)
- The current language of the Internet, which is
being redefined as XHTML 1.0
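The DOM operations listed on this slide (access, delete, add, edit content and attributes) can be sketched with xml.dom.minidom, the small DOM implementation in Python's standard library. The sample document reuses the man's view of a date from the earlier slides; the stars attribute is a made-up example.

```python
from xml.dom.minidom import parseString

dom = parseString("<date><meal>six-pack of beer</meal><intimacy>necking</intimacy></date>")
root = dom.documentElement

# Edit content: replace the text inside <meal>.
meal = root.getElementsByTagName("meal")[0]
meal.firstChild.data = "4-star restaurant"

# Edit attributes: the "stars" attribute is a made-up example.
meal.setAttribute("stars", "4")

# Add a component: a new <dress> element inserted before <meal>.
dress = dom.createElement("dress")
dress.appendChild(dom.createTextNode("Long dress"))
root.insertBefore(dress, meal)

# Delete a component: remove <intimacy>.
intimacy = root.getElementsByTagName("intimacy")[0]
root.removeChild(intimacy)

result = root.toxml()
print(result)
```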
57Standards (Cont.)
- MathML (Mathematical Markup Language)
- provides a much needed foundation for the
inclusion of mathematical expressions in Web
pages.
- PICS (Platform for Internet Content Selection)
- The PICS specification enables labels (metadata)
to be associated with Internet content. It was
originally designed to help parents and teachers
control what children access on the Internet.
58Standards (Cont.)
- PNG (Portable Network Graphics)
- a patent-free replacement for GIF and many common
uses of TIFF
- RDF (Resource Description Framework)
- provides a lightweight metadata system to support
the exchange of knowledge on the Web.
59Standards (Cont.)
- SMIL (Synchronized Multimedia Integration
Language)
- for television-like multimedia on the Web
- SVG (Scalable Vector Graphics)
- SVG is a language for describing two-dimensional
graphics in XML
60Standards (Cont.)
- XHTML (eXtensible HyperText Markup Language)
- What is the difference between XHTML 1.0, XHTML
Basic and XHTML 1.1?
- XHTML 1.0 - HTML 4.01 reformulated in XML
- XHTML Basic - subset for mobile apps
- XHTML 1.1 - modularized tags to help support
other applications
61Standards (Cont.)
- XPath, XPointer, XML Base, XLink
- define linking, pointers, base URIs, etc.
- XML Schema
- offers facilities for describing the structure
and constraining the contents of XML 1.0
documents
- The major difference between DTDs and Schemas is
that Schemas allow better data typing (and
Schemas are in XML)
- Became a recommendation on May 2, 2001
62Software and Services
- Amaya - W3C's Editor/Browser
- Amaya is a browser/authoring tool that allows you
to publish documents on the Web.
- From http://www.w3.org/Amaya/
- CSS Validator - W3C CSS Validation Service
- At http://jigsaw.w3.org/css-validator/
63Software and Services (cont.)
- HTML Tidy
- Tidy is a utility which is able to fix up a wide
range of HTML problems.
- From http://www.w3.org/People/Raggett/tidy/
- HTML Validator
- It checks HTML documents for conformance to W3C
HTML and XHTML Recommendations and other HTML
standards.
- From http://validator.w3.org/
64Software and Services (cont.)
- Jigsaw - W3C's Java Server
- Jigsaw is W3C's leading-edge Web server platform,
providing a sample HTTP 1.1 implementation on top
of an advanced architecture implemented in Java.
From http://www.w3.org/Jigsaw/
- Libwww
- Libwww is a highly modular, general-purpose
client side Web API written in C for Unix and
Windows (Win32). From http://www.w3.org/Library/
65Working Groups
- CC/PP (Composite Capabilities/Preference
Profiles)
- Automating the way in which your agent (PC, cell
phone, PDA) identifies its capabilities and
preferences
- Device Independence Activity
- These Groups are working towards making the
information of the World Wide Web accessible to
various devices and achieving Web device
independent authoring.
66Working Groups (cont.)
- Internationalization Working Group and
Internationalization Interest Group
- These groups promote the use of Unicode in other
recommendations and activities
- Micropayments
- The Internet enables commerce in intangibles
(like information), but conventional payment
methods are too expensive for this
67Working Groups (cont.)
- XForms - Interactive forms in XML
- XML Encryption - encrypting/decrypting XML
documents and their contents
- XML Protocol - using XML as an encapsulation
language in communications
- XML Query - enabling collections of XML files to
be accessed like databases
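The idea behind XML Query, accessing XML like a database, can be hinted at with a small sketch. It does not use XML Query itself; it uses the limited XPath subset supported by Python's standard xml.etree.ElementTree module, and the policies data, element names, and attribute names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: element and attribute names are assumptions.
policies_xml = """
<policies>
  <policy state="Texas"><account_number>12345-67890</account_number></policy>
  <policy state="Ohio"><account_number>22222-00000</account_number></policy>
  <policy state="Texas"><account_number>33333-11111</account_number></policy>
</policies>
"""

root = ET.fromstring(policies_xml)

# Roughly "SELECT account_number FROM policies WHERE state = 'Texas'".
texas_accounts = [
    policy.findtext("account_number")
    for policy in root.findall("./policy[@state='Texas']")
]
print(texas_accounts)
```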
68Working Groups (cont.)
- Voice Browser Activity
- This group has created a number of working
drafts, such as on a Speech Recognition Grammar
and a Speech Synthesis Markup Language
- The W3C working group is basing its proposal for
Dialog Markup Language on VoiceXML, from the
VoiceXML Forum (www.voicexml.org), which is an
IEEE group
69Initiatives
- Web Accessibility Initiative (WAI)
- These guidelines explain how to make Web content
accessible to people with disabilities
- P3P - Platform for Privacy Preferences
- P3P is an industry standard providing a simple,
automated way for users to gain more control over
the use of personal information on Web sites they
visit.
70Where Can I find...?
- Each of the preceding items can be found (today)
at www.w3c.org
- Everyone should check here periodically to obtain
updates
- Members can participate in projects and setting
standards
- www.xml.com is a commercial site with a
newsletter and a huge amount of educational
material
71Stage 3: Conversion Of Technology Into Business
Applications
72XML In The Verticals
- The next step in the evolution of XML is the
integration of XML objects into the processes of
verticals, e.g., insurance, telecommunications,
banking, finance, etc.
- In each vertical, groups will come together to
create standards for that vertical
- This phase is just beginning in most verticals
73The Insurance Vertical
- ACORD (www.acord.org) is a well-known body in the
insurance vertical
- ACORD, the Association for Cooperative Operations
Research and Development, describes itself as
the insurance industry's nonprofit standards
developer
- ACORD initially developed standard forms to
enable information sharing in the vertical
74ACORD And P&C
- In the Property and Casualty business, the main
driver to the Internet is the real-time exchange
of data between producers, carriers, rating
bureaus, service providers, and more.
- The ACORD XML standard is designed to address
the real-time requirement by defining P&C
transactions that include both a request and a
response message.
- from http://www.acord.org/xml_frame.htm
75Stage 4: Reduction To Commodity
76Reduction To Commodity
- In the last phase, the technology disappears
from the view of the user
- Older technologies are invisibly replaced with
the newer technology, e.g. EDI by XML
- Users perform business-oriented tasks without
being aware of underlying technology
77Past Progressions - Example 1
- 1 - Computer chips
- 2 - assembler
- 3 - COBOL, Fortran, PL/I, C, and a host of 3rd
generation languages
- 4 - GUI-based code generators
- We are now well into phase 4
78Past Progressions - Example 2
- 1 - Laser printer
- 2 - FDL (Xerox), PPFA (IBM), etc.
- 3 - Business-user friendly composition and
formatting tools
- 4 - GUI-based products with multiple,
transparent drivers
- We are now in phase 4
79The Growth Of The XML Bubble
80[Diagram: business functions - Compliance, Archive, New Sales, Reprints, Policy Print, Reports, Notices, CRM, 1:1 Mark., Campaign Manage., Billing, HR, Pol. Proc., EDI]
81[Diagram: the same functions, plus XML and EBPP]
82-86[Diagram sequence: the same functions and EBPP, with a growing XML Bubble among them]
87Today's Billing Process
[Diagram: Billing Extract, Post Process, Print/Format, Data Base]
88Today's Billing Process XML
[Diagram: Billing Extract, Post Process, Print/Format, Data Base, XML App.]
89As the Bubble Grows
[Diagram: Print/Format, Data Base, Post Process, XML App., Billing Extract]
90[Diagram: XML Applications with business rules, Email, four Drivers]
91Composition Systems Before XML - 1
[Diagram: Business Rules, Composition, Data base]
92Composition Systems Before XML - 2
[Diagram: Business Rules, Composition, Data base]
93[Diagram: Composition, Business Rules, XML Applications with business rules, Email, three Drivers]
94The Effect on Complex Systems
- Over time, simple tools became complex systems
- Due to competition, these systems added
functionality beyond the core product
- The XML Bubble will cause these systems to split
again
- Much of the added functionality was and will be
vertically specific, and fall into the XML Bubble
95Reference
- www.w3c.org - the official World Wide Web
Consortium site (you'll find links to the XML
spec here)
- http://www.w3.org/XML/ - a long but not
exhaustive list of XML sites, software, and
information
- Taming The Web With XML - an entry level
article describing XML at
http://www.mhe-consulting.com/writep1.html
96William J. "Bill" McCalpin
- MIT, LIT, CDIA, EDP
- Principal, MHE
- 1400 Cheyenne Dr.
- Richardson, Texas 75080-3921
- 972-231-3660 (v) 972-690-4521 (f)
- mccalpin_at_mhe-consulting.com