Title: XML What It Means To You
1XML - What It Means To You
- William J. "Bill" McCalpin
- EDPP, CDIA, MIT, LIT
- Principal
- MHE
2Introduction
3Thesis, Antithesis, Synthesis
- In the philosophy of Hegel, these words show the
inevitable transition of thought, by
contradiction and reconciliation, from
an initial conviction to its opposite and then to
a new, higher conception that involves but
transcends both of them
4The Hegelian Dialectic
- Thesis: Most businesses have well-established,
productive legacy systems
- Antithesis: XML is springing forth everywhere
- Synthesis: XML will be integrated with legacy
systems, enhancing some processes, changing many
others, and eliminating some altogether
- In short, XML will affect what you do
5How To Relate XML to Everyman
- You might think that XML is too esoteric for most
people to understand
- But XML is based on a basic human need:
exchanging information
- XML couples the communication skills we have used
over the last several thousand years to modern
Internet technology
- So how can you understand it?
6Sex And The Single Pixel
- Or, How To Explain XML Through Human Relationships
7Men Are From Mars, Women Are From Venus
- Author John Gray wrote the best-selling book
describing the difficulties of communication
- Why would there be such difficulties?
8Communication Difficulty 1
- In order for any communication to take place,
both parties must share the same fundamental
mechanism which carries information
- For example, in writing, if a boy and girl don't
even share the same writing scheme, they can't
possibly understand...
9Chinese Characters vs Latin Alphabet
10Underlying Structure of XML
- Text characters
- Tags are delimited by < and >, e.g., <xml>
- Ending tags have /, e.g., </xml>
- Parameters (attributes) have their values in double
quotes, e.g., <PAPER track="Application">
- XML is a series of tags and data, e.g.,
<STATE>Texas</STATE>
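A minimal sketch, in Python, of how a program sees these pieces, using the standard library's xml.etree.ElementTree. The sample combines the slide's PAPER and STATE examples; nesting STATE inside PAPER is an assumption made only for illustration.

```python
import xml.etree.ElementTree as ET

# Sample built from the slide's examples. Nesting STATE inside PAPER
# is an assumption for illustration only.
SAMPLE = '<PAPER track="Application"><STATE>Texas</STATE></PAPER>'

root = ET.fromstring(SAMPLE)
state = root.find("STATE")

print(root.tag)               # the tag name found between < and >
print(root.get("track"))      # the value of the quoted parameter
print(state.tag, state.text)  # a tag and the data it wraps
```

The parser only checks the syntax; it has no idea what PAPER or STATE mean, which is the next difficulty.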
11Communication Difficulty 2
- Once both parties agree on the fundamental
syntax, they must next agree on the
words to be used
- In the case of XML, how do both parties know that
<STATE> means a political subdivision and not one
of gas, liquid, or solid?
12A Date Gone Bad
- One evening in the hotel lobby bar, two young
Italian men spent a while talking to an
attractive Venezuelan girl...and her aunt
- They spoke Italian and she spoke Spanish, but
they communicated passably
13A Date Still Going Bad
- However, the aunt wanted to go up to her room
with her niece
- The Italians wanted to take the young lady out
dancing...
- So they asked her
14Oops
- What the boys said
- Vuoi andare con noi sta sera?
- What the young lady needed to hear
- Quisieras ir con nosotros esta tarde?
15Miscommunication
- Even though Italian and Spanish share the same
sounds, the same grammar, and a common
ancestry in Latin, some words are different
- Unfortunately, the most common words in both
languages are likely to be the most different
16The Cost Of Data Differences
- NASA lost a $125 million Mars orbiter because
one engineering team used metric units while
another used English units for a key spacecraft
operation... (CNN, 9/30/99)
17XML Words
- HTML has a fixed set of tags:
everyone knows what they are, but they can't be
augmented
- In XML, everyone can make up their own tags to
suit their needs, but how do we avoid a Tower of
CyberBabel?
18Communication Difficulty 3
- Even when you agree to common tags, you still
need to agree to a common understanding
- In XML, the Schema (now replacing the DTD)
defines what tags are allowed to describe a
particular collection of data
- For example, in the field of human relations,
what is a date?
19One DTD For A Date
- A woman thinks
- Invitation: formal
- Dress-up: nicely
- Eat out: dinner with wine at a nice restaurant
- Entertainment: see a movie
- Private moment: good night kiss
- <!DOCTYPE date [
- <!ELEMENT date (invitation, dress, meal,
entertainment, intimacy)>
- <!ELEMENT invitation (#PCDATA)>
- <!ELEMENT dress (#PCDATA)>
- <!ELEMENT meal (#PCDATA)>
- <!ELEMENT entertainment (#PCDATA)>
- <!ELEMENT intimacy (#PCDATA)>
- ]>
20A Woman's View Of A Date
- <date>
- <invitation>Telephone call</invitation>
- <dress>Long dress</dress>
- <meal>4-star restaurant</meal>
- <entertainment>the theatre</entertainment>
- <intimacy>A passionate, romantic kiss</intimacy>
- </date>
21Another DTD For A Date
- A man thinks
- Eat out: six-pack
- Private moment: necking
- <!DOCTYPE date [
- <!ELEMENT date (meal, intimacy)>
- <!ELEMENT meal (#PCDATA)>
- <!ELEMENT intimacy (#PCDATA)>
- ]>
22A Man's View Of A Date
- <date>
- <meal>six-pack of beer</meal>
- <intimacy>necking</intimacy>
- </date>
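The mismatch between the two views of a date can be sketched in a few lines of Python. This is a hand-rolled check of the child-element order, not a real DTD validator (Python's standard library does not validate against DTDs). The two lists copy the content models from the DTD slides, written in lowercase to match the instance documents; the function name conforms is made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Content models copied from the two DTD slides, in lowercase
# to match the tags used in the instance documents.
WOMAN_MODEL = ["invitation", "dress", "meal", "entertainment", "intimacy"]
MAN_MODEL = ["meal", "intimacy"]

WOMAN_DATE = """<date>
  <invitation>Telephone call</invitation>
  <dress>Long dress</dress>
  <meal>4-star restaurant</meal>
  <entertainment>the theatre</entertainment>
  <intimacy>A passionate, romantic kiss</intimacy>
</date>"""

MAN_DATE = """<date>
  <meal>six-pack of beer</meal>
  <intimacy>necking</intimacy>
</date>"""


def conforms(xml_text, model):
    """Return True if the root is <date> and its children match the model in order."""
    root = ET.fromstring(xml_text)
    return root.tag == "date" and [child.tag for child in root] == model


print(conforms(WOMAN_DATE, WOMAN_MODEL))  # her date against her definition
print(conforms(MAN_DATE, MAN_MODEL))      # his date against his definition
print(conforms(MAN_DATE, WOMAN_MODEL))    # his date against her definition
```

Each document satisfies its own author's definition, but his date does not satisfy hers: both parties use the same tags and still disagree on what a date is.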
23When Men And Women Agree
- <date>
- <invitation>Telephone call</invitation>
- <dress>Long dress</dress>
- <meal>4-star restaurant</meal>
- <entertainment>the theatre</entertainment>
- <intimacy>A passionate, romantic kiss</intimacy>
- </date>
- <date>
- <invitation>Honking</invitation>
- <dress>Not the shirt he changed the oil in</dress>
- <meal>food and beer</meal>
- <entertainment>rent a video</entertainment>
- <intimacy>A passionate, romantic kiss while necking</intimacy>
- </date>
24Presentation
- In human relationships, it's normal for someone
to present themselves in the best light possible
- We try to minimize any deficiencies while
maximizing our positive attributes
- Thus, we would like to present ourselves as
25Author's View
26Original Data
27XSL
- XSL: Extensible Stylesheet Language
- XSL builds on CSS (Cascading Style Sheets)
- XSL enables the author to create one or many
views of XML
- Since XSL can be separate from the XML object,
the reader can apply the presentation information
as well as the author
28Communication Difficulty 4
- When all we had was paper and film, the author
alone controlled the presentation of the data
- One of the great advantages of electronic formats
is that the presentation of data can now be put
into the hands of the reader
- How can we describe this in the field of human
relationships?
30Three Bachelors To Choose From
- Our contestant has to choose from 3 bachelors
- But if the information about the bachelors were
on paper, then the information would be presented
only one way
31How To Choose?
- But with XML (and other electronic formats like
HTML), our contestant can view the information in
different ways, to help her make her decision
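A minimal sketch of what "different ways" can mean in practice: one XML record, two reader-chosen views. Plain Python stands in here for a stylesheet, and all of the bachelor data (names, ages, hobbies), the tag names, and the load and view helpers are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: names, ages, hobbies and tag names are invented.
BACHELORS = """<bachelors>
  <bachelor><name>Al</name><age>34</age><hobby>sailing</hobby></bachelor>
  <bachelor><name>Bob</name><age>29</age><hobby>cooking</hobby></bachelor>
  <bachelor><name>Cy</name><age>41</age><hobby>chess</hobby></bachelor>
</bachelors>"""


def load(xml_text):
    """Turn each <bachelor> element into a dict of tag -> text."""
    root = ET.fromstring(xml_text)
    return [{field.tag: field.text for field in b} for b in root.findall("bachelor")]


def view(records, sort_by, show):
    """Sort on one field and keep only the fields the reader asked for."""
    if sort_by == "age":
        key = lambda r: int(r["age"])  # compare ages as numbers, not text
    else:
        key = lambda r: r[sort_by]
    return [tuple(r[f] for f in show) for r in sorted(records, key=key)]


records = load(BACHELORS)
by_age = view(records, "age", ["name", "age"])
by_hobby = view(records, "hobby", ["name", "hobby"])
print(by_age)
print(by_hobby)
```

The data never changes; only the view does, which is exactly what a sheet of paper cannot offer.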
32The Datament™
33The Datament
- Efforts to expand the meaning of "document" to
include all manner of electronic formats have
been unsuccessful
- Hence, we have invented the concept of the
datament™, which is an organized collection of
information in time which can be viewed by both
human and machine
34The Readers Of Dataments
- Because the datament is in XML, presentation
information can be ignored and the data directly
extracted from the appropriate tags
- Dataments can also carry one or more views of
the data
- One view should be the original static view
- Another view can allow the reader flexibility
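A rough sketch of a machine reader at work on such an object. The slides do not define a concrete datament format, so the layout below (the datament, data, views and view elements, and the phone-call records inside them) is entirely hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: every element name and value here is invented
# for this sketch; the slides do not specify a datament format.
DATAMENT = """<datament>
  <data>
    <call><number>972-555-0100</number><minutes>12</minutes></call>
    <call><number>972-555-0199</number><minutes>3</minutes></call>
  </data>
  <views>
    <view name="static" sort="none"/>
    <view name="by-length" sort="minutes"/>
  </views>
</datament>"""

root = ET.fromstring(DATAMENT)

# A machine reader ignores <views> and pulls values straight from the data tags.
calls = [(c.findtext("number"), int(c.findtext("minutes"))) for c in root.iter("call")]
total_minutes = sum(minutes for _, minutes in calls)

# A human-facing reader can still list the views the author supplied.
view_names = [v.get("name") for v in root.iter("view")]

print(total_minutes)
print(view_names)
```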
35Why Multiple Views?
- Think of a 60,000 page phone bill: it's
impossible to make any sense of it without
sorting, hiding, etc., as with a spreadsheet
- On the other hand, if one reader alters the view,
then another reader might miss important
information, hence there is a default view
- This default or author-centric view will also
help satisfy regulatory authorities
36Communication Difficulty 5
- Without resorting to bars, how can people easily
find compatible partners?
- Now think about all the classified ads you might
have to pore through in order to find someone who
interests you
- Fortunately, personals have a standard indexing
method
37A Personal Ad
- DWF - divorced white female
- SBM - single black male
- WBFP - wood burning fireplace - oops
- This system works because there is a standard
method of indexing personals
- If the authors of the classifieds made up their
own indexes, think of the confusion
38Apples And Oranges
- nice DWM seeks girl who wants a good time
- cons w trvst in spc prog seeks swng aln to tk to
their ldr
39Extending XML
- XML is not only a useful way to accurately
describe people, er, information, but it can be
used as the basis of many other standards
- For example, RDF stands for Resource
Description Framework, that is, a framework for
describing and interchanging metadata (i.e.,
information about information).
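As a small illustration of metadata expressed in XML, the sketch below parses an RDF/XML record with Python's standard library. The record itself is hypothetical: the example.org address is a placeholder, and the choice of the Dublin Core title and creator properties is an assumption made for this example, not something the slides prescribe.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core, assumed for this sketch

# Hypothetical metadata record; the rdf:about URL is a placeholder.
RECORD = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://example.org/slides/xml-talk">
    <dc:title>XML - What It Means To You</dc:title>
    <dc:creator>William J. McCalpin</dc:creator>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(RECORD)

# ElementTree spells namespaced names as "{namespace}local-name".
description = root.find(f"{{{RDF_NS}}}Description")
about = description.get(f"{{{RDF_NS}}}about")

# Strip the namespace part to get plain property names.
metadata = {child.tag.split("}")[1]: child.text for child in description}

print(about)
print(metadata)
```

The record says nothing about how the talk looks; it only describes it, which is what "information about information" means.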
40XML
- XML has a common underlying syntax
- Industries and groups can create XML tags which
suit their needs
- XML enables both the author and the reader to
control the presentation
- But let's digress...
41What Is A Document?
- The American Heritage Dictionary defines a
document as information in writing placed on a
medium such as paper, often used as a record.
- Documents have been placed on clay tablets, gold
leaf, animal skins, all types of paper,
microfilm, optical storage, and so on
42Information And Presentation
- In every case, the document represents a
fundamental union of information and presentation
- But presentation presumes that the primary
audience for the document is a human being
- With the coming of the Internet, this is no
longer the case
43The Curse Of Presentation
- Composition products require that you specify a
printer, even before you know where the document
will print
44Why Are Print, Image, And Presentation Formats
Incompatible?
45Printing And Imaging Formats
- Many printing formats: AFP, Metacode, DJDE, XES
(UDK), PostScript, PCL, etc.
- All formats use external resources like fonts,
forms, graphics, etc., although sometimes
inconsistently
- Most are escape-sequence based, some are formal
data architectures, and some are almost
programming languages
46Printing And Imaging Formats
- Many imaging formats: while most used CCITT
Group 4 for image compression, most also had
proprietary data wrappers
- Later systems adopted text-based formats such as
PDF, although storing other print streams is not
unknown
- Systems which store text-based formats must
wrestle with resource issues
47Different Print Formats
- Why do printers have different formats? Because
of physical constraints imposed by the hardware
- resources reduce the amount of data sent through
the pipeline to the printer
- pages must be imaged in a fraction of a
second
- complex graphics can be developed on the printer,
but this needs a special language
48Different Imaging Formats
- Why do imaging systems have different formats?
Because of physical constraints imposed by the
hardware
- Mass storage was expensive
- Indexing schemes were too close to the
application
- Text is sometimes avoided because of resource
issues
- Interoperability with other products was an issue
49Result
- In each case, data architecture decisions were
made in order to enhance some aspect of
legibility of the stored objects.
- If there were no requirement to present the
information (to a human reader), then the
requirement for custom data formats for each
vendor would probably disappear!
50Universal Literacy
- Who's reading our documents?
51The Road To Universal Literacy
- First, only the few could read
- After the printing press, the many began to read
- Eventually, educational reforms brought the
ability to read to all
52Literacy In The Internet Age
- Can there be a spread of literacy beyond all?
- How many webpages have you ever read?
- You will never be able to keep up with the Web
alone
- There are already an estimated 98,685,000 host
computers on the Internet (www.mids.org)
53Intelligent Agents
- Just around the corner is software that will read
the Web for us: not search, but read
- So we have to spread literacy to an audience
beyond "all" (all people, that is)
- Does increased quality in presentation mean
better computer literacy?
54Noise On The Net
- Think of the average webpage
- three dimensional spinning objects
- marquees scrolling across the bottom
- multiple frames bookmarks
- audio
- These items are all designed to attract the eye:
your eye
- This does nothing for the machine reading the
webpage
55Two Important Truths
- There are two important truths of the Internet
era
- Documents which are read by humans need to be
dynamic in their presentation
- Documents which are read by computers don't need
any presentation information at all
- XML totally divorces presentation from
information!
56What Have We Learned About XML?
57XML Summary
- XML uses tags to describe data
- <state>Texas</state>
- Businesses and non-profits join together to build
DTDs/Schemas to describe data objects in their
spaces
- <?xml version="1.0" encoding="ISO-8859-1"?>
- <!DOCTYPE claim
58XML Summary
- An XML document contains information for a
particular event or transaction which can be
understood by both parties
- XML documents can be intended for two types of
readers: human and machine
59XML Summary
- XML documents intended for a machine do not
require any presentation information
- XML dataments carry the information which
enables both static (author-centric) and dynamic
(reader-centric) presentations, using XSL
60What Will You Tell Your Boss?
- Well, this dude named Hegel met Drew Carey while
speaking Spanish in an Italian bar when they met
a transvestite space alien who was looking for a
missing NASA satellite who told them that women
were not either from Venus and that Mimi and
Pierce Brosnan were on a date but each was
reading different versions of the same menu
because it was a datament in XML.
61Reference
- www.w3c.org - the official World Wide Web
Consortium site (you'll find links to the XML
spec here)
62William J. "Bill" McCalpin
- EDPP, CDIA, MIT, LIT
- Principal, MHE
- 1400 Cheyenne Dr.
- Richardson, Texas 75080-3921
- 972-231-3660 (v) 972-690-4521 (f)
- mccalpin_at_mhe-consulting.com