Title: Beyond HTML: Extensible Markup Language
1 Beyond HTML: Extensible Markup Language
- Timothy W. Cole
- Grainger Engineering Library Information Center, University of Illinois at Urbana-Champaign
- American Association of Law Libraries, 19 July 2000
- t-cole3_at_uiuc.edu
- http://dli.grainger.uiuc.edu/Publications/TWCole/AALL_2000/
2 Ordered Hierarchy of Content Objects: A Definition of Text in Computer Terms
- Premise: A Text is the Sum of its Components
- So a <BOOK> Could Be Defined as Containing <FRONT_MATTER>, <CHAPTER>s, <BACK_MATTER>
- <FRONT_MATTER> Could Contain <BOOK_TITLE>, <AUTHOR>s, <PUBLISHER>
- While Each <CHAPTER> Could Contain <CHAPTER_TITLE>, <SECTION>s
- And Each <SECTION> Could Contain <SECTION_TITLE>, <PARAGRAPH>s
- Components Chosen Reflect Anticipated Use
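The hierarchy on this slide can be sketched as a tree of elements. The following is a minimal illustration using Python's standard library; the element names come from the slide, while the text values and the helper names (build_book, outline) are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET


def build_book():
    """Build a tiny <BOOK> following the content model on the slide."""
    book = ET.Element("BOOK")

    front = ET.SubElement(book, "FRONT_MATTER")
    ET.SubElement(front, "BOOK_TITLE").text = "An Example Book"  # placeholder text
    ET.SubElement(front, "AUTHOR").text = "A. Author"
    ET.SubElement(front, "PUBLISHER").text = "Example Press"

    chapter = ET.SubElement(book, "CHAPTER")
    ET.SubElement(chapter, "CHAPTER_TITLE").text = "Chapter One"
    section = ET.SubElement(chapter, "SECTION")
    ET.SubElement(section, "SECTION_TITLE").text = "First Section"
    ET.SubElement(section, "PARAGRAPH").text = "Text of the first paragraph."

    ET.SubElement(book, "BACK_MATTER")
    return book


def outline(element, depth=0):
    """Return the hierarchy as indented tag names, in document order."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


book = build_book()
print("\n".join(outline(book)))
```

Every content object sits inside exactly one parent and in a fixed order, which is the "ordered hierarchy" the model refers to.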
3 Ordered Hierarchy of Content Objects (continued)
- OHCO is a Useful, Albeit Imperfect Model
- More Powerful Than Model of Text as a Stream of Characters and Formatting Instructions
- Does Not Allow for Overlapping Content Objects
- OHCO Model is Inherent in XML, HTML
- XML Designed for Descriptive Content Objects, Not Presentational Content Objects
- XML Syntax is Fixed, But Semantics Are Extensible
4 XML Basics: Markup and Content
- Consider:
  <?xml version='1.0' ?>
  <!-- This is an Example -->
  <author sequence='first'><LName>Col&egrave;</LName>, <FName>Tim</FName></author>
- Would Display As: Colè, Tim
- This example illustrates
- XML Processing Instructions
- XML Comments (Ignored by XML Applications)
- XML Element Markup, Including an Attribute
- XML Content, Including an Entity
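As a rough sketch, the slide's example can be run through a generic XML parser to see each piece. One assumption is added here: because XML requires named entities to be declared, the example supplies a small internal DTD subset declaring egrave, which the slide itself does not show.

```python
import xml.etree.ElementTree as ET

# The slide's example, plus an assumed internal DTD subset declaring the entity.
DOC = """<?xml version='1.0'?>
<!DOCTYPE author [<!ENTITY egrave "&#232;">]>
<!-- This is an Example -->
<author sequence='first'><LName>Col&egrave;</LName>, <FName>Tim</FName></author>"""

root = ET.fromstring(DOC)

# Element markup and its attribute.
print(root.tag, root.attrib)

# Content with the entity expanded; the comment is ignored by the parser.
displayed = "".join(root.itertext())
print(displayed)
```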
5 XML Basics (continued)
- Well-Formed XML Rules
- XML Element Markup is Case-Sensitive
- All XML Tags Must Be Closed
- Hierarchical Nesting: No Overlapping Elements
- All XML Attribute Values Must Be Quoted
- Enforces Stricter Syntax than HTML
- Facilitates Fast, Efficient Parsing
- Extensible Semantics Provide Flexibility
- Well-Formed XML is More Lightweight Than SGML
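The well-formedness rules above can be checked mechanically. A minimal sketch, assuming Python's built-in (non-validating) parser as the checker; the sample strings and the is_well_formed helper are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """True if a generic XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# One sample per rule on the slide (all hypothetical).
SAMPLES = {
    "well-formed": "<author sequence='first'><LName>Cole</LName></author>",
    "case mismatch": "<author><LName>Cole</lname></author>",
    "unclosed tag": "<author><LName>Cole</author>",
    "overlapping elements": "<b><i>Cole</b></i>",
    "unquoted attribute": "<author sequence=first>Cole</author>",
}

results = {label: is_well_formed(text) for label, text in SAMPLES.items()}
for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```

Only the first sample is accepted; an HTML browser of the period would typically have tolerated several of the others.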
6 Is It Valid or Well-Formed? When Does It Matter?
- All Web Browsers Need Is Well-Formed
- XML Authoring Tools Need To Validate
- Otherwise Tower of Babel Ensues
- Indexing Agents and Schema-Specific Rendering Agents May Need To Validate
- Illustrations
- Malformed XML
- Well-Formed But Invalid XML
- Valid XML
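The three cases can be told apart in code. Python's standard library has no validating parser, so this sketch uses a toy stand-in for a DTD: a hypothetical CONTENT_MODEL dictionary listing the children each element must contain, in order. A real application would validate against an actual DTD or schema.

```python
import xml.etree.ElementTree as ET

# Toy content model (an assumption for this sketch, not a real DTD).
CONTENT_MODEL = {"author": ["LName", "FName"], "LName": [], "FName": []}


def classify(xml_text):
    """Return 'malformed', 'well-formed but invalid', or 'valid'."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return "malformed"
    for element in root.iter():
        expected = CONTENT_MODEL.get(element.tag)
        children = [child.tag for child in element]
        if expected is None or children != expected:
            return "well-formed but invalid"
    return "valid"


print(classify("<author><LName>Cole</author>"))
print(classify("<author><FName>Tim</FName><LName>Cole</LName></author>"))
print(classify("<author><LName>Cole</LName><FName>Tim</FName></author>"))
```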
7 Library Uses of XML: Using XML for Primary Sources
- Facilitates Searching
- Full-Text Searching and Field-Specific Searching
- More Meaningful Proximity Searching
- Better Retrieval / Browsing
- Selective Views / Suppression of Personal Data
- Re-Ordered and Piecemeal Views
- Illustration -- Illinois Agronomy Handbook
- Search
- Browsing
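To show the difference between full-text and field-specific searching, here is a small sketch. The sample markup, element names, and text are invented placeholders and are not taken from the Illinois Agronomy Handbook.

```python
import xml.etree.ElementTree as ET

# Hypothetical marked-up source; names and content are placeholders.
SAMPLE = """<handbook>
  <chapter><title>Corn</title><para>Plant after soil warms; rotate with soybeans.</para></chapter>
  <chapter><title>Soybeans</title><para>Inoculate seed where soybeans are new.</para></chapter>
  <chapter><title>Wheat</title><para>Winter wheat follows corn in some rotations.</para></chapter>
</handbook>"""


def full_text_search(root, term):
    """Titles of chapters that mention the term anywhere in their text."""
    term = term.lower()
    return [ch.findtext("title") for ch in root.iter("chapter")
            if term in "".join(ch.itertext()).lower()]


def field_search(root, field, term):
    """Titles of chapters where the term occurs inside the named element."""
    term = term.lower()
    return [ch.findtext("title") for ch in root.iter("chapter")
            if any(term in (el.text or "").lower() for el in ch.iter(field))]


root = ET.fromstring(SAMPLE)
print(full_text_search(root, "soybeans"))
print(field_search(root, "title", "soybeans"))
```

The field-specific search returns only the chapter that is about soybeans, while the full-text search also returns a chapter that mentions them in passing.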
8 Library Uses of XML: XML for Metadata Wrapping
- Facilitates Interchange, Normalization, ...
- Simpler than Fixed Fields, Record Headers, Etc.
- XML Implementations of Metadata Standards, e.g. RDF, EAD, DC, FGDC, US-MARC
- Easier Routing / Handling of Specialized Content
- In Combination with Primary Source XML
- Automatic Extraction of Metadata From Source
- Facilitates Authority Control
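Automatic extraction of metadata from a marked-up source might look roughly like the following. The source document and the MAPPING from source elements to Dublin Core elements are assumptions made for this sketch; only the Dublin Core namespace URI is a published identifier.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set
ET.register_namespace("dc", DC_NS)

# Hypothetical primary-source document.
SOURCE = """<BOOK><FRONT_MATTER>
  <BOOK_TITLE>An Example Book</BOOK_TITLE>
  <AUTHOR>A. Author</AUTHOR><AUTHOR>B. Author</AUTHOR>
  <PUBLISHER>Example Press</PUBLISHER>
</FRONT_MATTER></BOOK>"""

# Hypothetical mapping from source elements to Dublin Core elements.
MAPPING = {"BOOK_TITLE": "title", "AUTHOR": "creator", "PUBLISHER": "publisher"}


def extract_metadata(source_xml):
    """Copy mapped elements from the source into a Dublin Core record."""
    source = ET.fromstring(source_xml)
    record = ET.Element("metadata")
    for element in source.iter():
        if element.tag in MAPPING:
            dc = ET.SubElement(record, f"{{{DC_NS}}}{MAPPING[element.tag]}")
            dc.text = (element.text or "").strip()
    return record


record = extract_metadata(SOURCE)
print(ET.tostring(record, encoding="unicode"))
```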
9 Library Uses of XML: XML for Document Management
- Smarter Documents
- XML Namespaces -- Integrating Multiple XML Schemas (Including XHTML)
- Rights Management, Technical Requirements, ...
- Facilitates Enhanced Linking Between Docs.
- Creation of Links From Marked Up Content
- Easy to Add or Modify Links Over Time
- XLink and XPointer Promise More Robust Linking
- Metadata File from Illinois DLIB Testbed
- Schema Integrates RDF, DC, Project Design
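A sketch of how namespaces let one document mix vocabularies, and how links can be harvested from the marked-up content. The record below is invented for illustration and is not the Illinois DLIB Testbed metadata file; the wrapper elements record and abstract are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "xhtml": "http://www.w3.org/1999/xhtml",
}

# Hypothetical record combining Dublin Core and XHTML elements.
DOC = """<record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <dc:title>An Example Article</dc:title>
  <dc:rights>Hypothetical rights statement</dc:rights>
  <abstract><xhtml:p>See the <xhtml:a href="http://example.org/related">related item</xhtml:a>.</xhtml:p></abstract>
</record>"""

root = ET.fromstring(DOC)

# Each vocabulary is addressed through its own namespace prefix.
title = root.findtext("dc:title", namespaces=NS)
links = [a.get("href") for a in root.iterfind(".//xhtml:a", NS)]

print(title)
print(links)
```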
10 Components of XML Implementations: DTDs and XML Schemas
- Use Either to
- Define Content Models
- Declare Attributes and Entities
- DTDs Inherited from SGML
- DTDs Themselves Not Well-Formed XML
- Limits on Detail of Content Model Definitions
- Minimal Data Typing
- XML Schemas Are Well-Formed XML
- Data Typing and Better Content Models Supported
- Not Yet in Widespread Use
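The point that DTDs are not well-formed XML while XML Schemas are can be checked directly. Both definitions below are hypothetical and describe the same small author element; the schema uses the namespace of the later W3C XML Schema Recommendation, which is an assumption, since drafts current at the time of this talk used different namespace URIs.

```python
import xml.etree.ElementTree as ET

# A hypothetical DTD: its own declaration syntax, inherited from SGML.
DTD_TEXT = """<!ELEMENT author (LName, FName)>
<!ATTLIST author sequence CDATA #IMPLIED>
<!ELEMENT LName (#PCDATA)>
<!ELEMENT FName (#PCDATA)>"""

# A roughly equivalent hypothetical XML Schema, with data types on each item.
XSD_TEXT = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="author">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="LName" type="xs:string"/>
        <xs:element name="FName" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="sequence" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def parses_as_xml(text):
    """True if a generic XML parser accepts the text as a document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print("DTD parses as XML:", parses_as_xml(DTD_TEXT))
print("Schema parses as XML:", parses_as_xml(XSD_TEXT))
```

Because the schema is itself XML, the same tools used on documents (parsers, editors, transformations) can be applied to it.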
11 Components of XML Implementations: Encoding and Entities (Using Characters Not on Your Keyboard)
- Computers Use 1s and 0s, but Characters Form the Basis of Human-Readable Texts
- Coded Character Sets (CCS) Assign Integer Values to Characters -- ASCII, ISO 8859, Unicode
- Character Encoding Schemes (CES) Map Those Integers to Bytes -- 7-bit, 8-bit, UTF-8
- Bytes Are Then Rendered as Glyphs by Your Computer, Using Font Appropriate to CCS/CES
- Font Unavailable Or CCS/CES Misunderstood Results in Incorrect Character(s) on Screen
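The distinction between a coded character set and an encoding scheme, and what happens when the scheme is misunderstood, can be shown with a single character. This sketch uses è as the example character, chosen because it also appears in the author example earlier in the talk.

```python
char = "\u00e8"  # the character è

# Coded character set: Unicode (and ISO 8859-1) assign è the integer 232.
code_point = ord(char)

# Character encoding schemes: the same integer maps to different bytes.
latin1_bytes = char.encode("iso-8859-1")  # one byte
utf8_bytes = char.encode("utf-8")         # two bytes

# CES misunderstood: UTF-8 bytes read as ISO 8859-1 give the wrong characters.
misread = utf8_bytes.decode("iso-8859-1")

# ASCII is a 7-bit set with no integer for è at all.
try:
    char.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(code_point, latin1_bytes, utf8_bytes, misread, ascii_ok)
```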
12 Components of XML Implementations: Encoding and Entities (continued)
- Common Ways to Deal With This Problem
- Select CCS/CES Appropriate to Language
- Use Default CCS/CES, but Override Default Font
- Use XML/HTML Named or Numeric Entity
- HTML Understands Non-Extensible Set of Named Entities
- XML Understands Numeric Entities Corresponding to Unicode CCS; All Named Entities (Apart From the Five Predefined: lt, gt, amp, apos, quot) Must Be Declared in DTD
- Use Unicode for CCS, UTF-8 for CES -- XML Defaults
- An Illustration in HTML
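The different treatment of named and numeric entities in HTML and XML can be sketched as follows, using Python's standard library as a stand-in for an HTML browser and an XML parser. The xml_accepts helper and the sample strings are invented for the example.

```python
import html
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# HTML: a fixed, built-in set of named entities.
html_text = html.unescape("Col&egrave;, Tim")
html_code_point = name2codepoint["egrave"]

# XML: numeric character references always work (232 is è in Unicode).
numeric = ET.fromstring("<LName>Col&#232;</LName>").text


def xml_accepts(text):
    """True if the XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# XML: a named entity is rejected unless a DTD declares it.
undeclared_ok = xml_accepts("<LName>Col&egrave;</LName>")
declared_ok = xml_accepts(
    '<!DOCTYPE LName [<!ENTITY egrave "&#232;">]><LName>Col&egrave;</LName>'
)

print(html_text, html_code_point, numeric, undeclared_ok, declared_ok)
```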
13 Components of XML Implementations: Presentation - CSS Style Sheets
- XML Content Objects Have No Style
- Use Cascading Style Sheets (CSS): Work Like CSS for HTML, Except
- Must Be Explicit About Everything
- No Special Treatment of Class and ID Attributes
- Attach CSS to XML Using Special XML PI
- CSS Does Define Formatting
- CSS DOES NOT Reorganize or Add Content
- Simple XML-CSS Example (The CSS Used)
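A sketch of attaching a style sheet with the xml-stylesheet processing instruction. The style sheet text and the file name author.css are hypothetical; Python does not apply the CSS, it only shows that the PI is part of the document and can be read by any XML application.

```python
from xml.dom import minidom
from xml.dom.minidom import Node

# Hypothetical style sheet: every element needs an explicit display rule,
# since XML elements carry no built-in presentation.
CSS = """author { display: block; font-family: serif; }
LName  { display: inline; font-weight: bold; }
FName  { display: inline; }"""

# The PI on the second line points the browser at the (hypothetical) CSS file.
DOC = """<?xml version='1.0'?>
<?xml-stylesheet type="text/css" href="author.css"?>
<author sequence='first'><LName>Cole</LName>, <FName>Tim</FName></author>"""

document = minidom.parseString(DOC)
stylesheet_pis = [
    node for node in document.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
    and node.target == "xml-stylesheet"
]

print(stylesheet_pis[0].data)
print(CSS)
```

The CSS changes how LName and FName look but cannot swap their order or add text, which is the limitation noted on the slide.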
14 Components of XML Implementations: Transformations - XSLT Style Sheets
- Some Characteristics of XSLT Style Sheets
- XSLT Files Are Well-Formed XML
- XSLT Transform to Another Schema, Or to XHTML
- XSLT Objects Have Implicit Functionality
- Attach XSLT To Document Using XML PI
- XSLT Can Reorganize and Add Content
- Still Need CSS for Presentation -- CSS Style Sheets Work on the Output of XSLT Processing
- Supplement XSLT With Script To Manipulate and Modify Actual Content
- Simple XSLT Example (The XSLT Style Sheet)
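Because XSLT files are well-formed XML, a generic parser can read one. The stylesheet below is a hypothetical example written for this sketch, not the one used in the talk. Python's standard library has no XSLT processor, so the code only parses and inspects the stylesheet; running the transformation would require a separate XSLT processor.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical stylesheet: wraps the author in HTML and puts FName before
# LName, i.e. it both adds and reorganizes content.
STYLESHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <html><body><xsl:apply-templates select="author"/></body></html>
  </xsl:template>
  <xsl:template match="author">
    <p><xsl:value-of select="FName"/><xsl:text> </xsl:text><xsl:value-of select="LName"/></p>
  </xsl:template>
</xsl:stylesheet>"""

sheet = ET.fromstring(STYLESHEET)

# Each template declares which source nodes it applies to.
patterns = [t.get("match") for t in sheet.findall(f"{{{XSL_NS}}}template")]
print(sheet.tag)
print(patterns)
```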
15 The State-of-the-Art in XML Tools
- XML Authoring
- Add-Ons to Established Word Processors, e.g. WordPerfect 9 / WordPerfect 2000
- Tools With SGML Roots, e.g. ArborText's Epic (was Adept) Editor, SoftQuad's XMetaL Editor
- New XML Tools, e.g. Vervet Logic's XMLPro, Extensibility's XML Authority / XML Turbo
- So Far, There Are Fewer Authoring Tools Customized for Specialized XML Schemas
16 The State-of-the-Art in XML Tools (continued)
- XML Presentation Tools
- Latest Releases of Netscape Navigator/Mozilla and Microsoft's Internet Explorer Support XML -- But Support is Generic, Partial, Uneven
- Plug-Ins, Standalones Available / In Work for Advanced XML Schemas (CML, MML, VML, ...)
- XML Database Integration Tools
- Add-Ons to Established DBMS Available / In Work: Microsoft SQL Server-XML Technology Preview
- Illustration With Query, CSS, XML Source File
- XML Query Language Specification In Work
17 Developing XML Applications: The Politics of XML
- Evolution of XML
- XML Formalized as W3C Recommendation 2/98
- Numerous Ancillary Specs Released / In Work: Namespaces, XSLT, XLink/XPointer, XML Signature
- Numerous Early Implementors (Chemistry, Biology, Multimedia, Metadata)
- Prerequisites for Community Implementations
- Identify Target(s) of Opportunity
- Define Horizontal and Vertical Content Objects
- Consensus Building and Community Buy-In
- Test Implementations and Tool Building
18 Developing XML Applications: The Politics of XML (continued)
- Status of XML In Legal Community
- LegalXML Has Identified Targets and Begun Process of Defining Content Objects and Building Consensus
- Progress in Some Areas, e.g. Court Filing (see also XML Court Interface)
- Less Visible Progress in Other Workgroups, e.g. Reference, Public Law, Users
- Presence (and Vested Interests) of Extensive Non-XML Legal Automation Systems In Place Lessens Motivation
19 Developing XML Applications: The Politics of XML (continued)
- Status of XML In Publishing and Libraries
- Extensive XML Work in Metadata: Unfortunately Has Led to Competing Stds.
- Many Publishers Have Been Using SGML for a Decade or More -- But Only Internally
- Perceived Tradeoff (probably overrated): Publicly Releasing Primary Sources in XML vs. Control of Product Marketplace
- Problems with Early SGML Web Experiments
- No One Wants to be First, But No One Wants to be Last Either
20 Future Directions
- Continued Evolution of Standards, Tools
- Continued Development of Community Implementations -- Selected Disciplines
- Increased Use of XML Behind the Scenes
- Carryover from SGML Trends
- Integration of XML with Databases
- XML Unlikely to Replace HTML, Other Document Formats, But Will Co-Exist
- Magnitude of Role in Law Libraries Uncertain, but Likely to Have At Least Some Role