Title: AN INTRODUCTION TO XML (We'll look at XidML later)
1AN INTRODUCTION TO XML (We'll look at XidML later)
2THIS SESSION
- What is XML?
- History of XML
- Applications of XML
- Tools for XML (more in session 9)
- Pros and Cons of XML
- Data Interchange with XML (more in session 9)
- Later, we will discuss
- Why XidML?
- The XidML schema in detail
3WHAT IS XML?
- eXtensible Markup Language
- An industry standard language for the creation of structured documents
- By marking up data we mean inserting tags (i.e. elements or attributes) or other information into the document so that it is easier to process
<Book>
  <Name type="text">The Outsider</Name>
  <Author type="text">Albert Camus</Author>
  <Category type="text">Fiction</Category>
  <Published type="date"></Published>
  <Price currency="USD" Type="listprice">19.99</Price>
  <InStock type="Integer">4</InStock>
  <Discount type="boolean">No</Discount>
</Book>
- By the way: this example is often used to illustrate XML. If this XML had been developed by xidml.org, the type attributes would be in a schema (BooksML) and the currency would be an element (more later)
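To make "easier to process" concrete, here is a minimal sketch that reads the Book example above with Python's standard-library parser. The function name `describe` is an illustrative assumption, not part of any XML standard or of XidML.

```python
import xml.etree.ElementTree as ET

# The Book example from this slide, reproduced as a string.
BOOK_XML = """
<Book>
  <Name type="text">The Outsider</Name>
  <Author type="text">Albert Camus</Author>
  <Category type="text">Fiction</Category>
  <Published type="date"></Published>
  <Price currency="USD" Type="listprice">19.99</Price>
  <InStock type="Integer">4</InStock>
  <Discount type="boolean">No</Discount>
</Book>
"""

def describe(xml_text):
    """Return (tag, attributes, text) for each child of the root element."""
    root = ET.fromstring(xml_text)
    return [
        (child.tag, dict(child.attrib), (child.text or "").strip())
        for child in root
    ]

rows = describe(BOOK_XML)
for tag, attrs, text in rows:
    # Each value arrives labelled with its element name and attributes.
    print(tag, attrs, repr(text))
```

Every value comes back paired with its tag and attributes, so the program never has to guess which column a value belongs to.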
4XML IS MORE THAN A BETTER ASCII
- ASCII is normal text
- Human readable (evolved from typewriter)
- Many tools to manipulate it
- Can be easily converted to proprietary formats (e.g. Word, Excel)
- Plain text: low information
Name Author Category Published Price In Stock Discount
The Outsider Albert Camus Fiction 19.99 4 No
The Road to McCarthy P. McCarthy Travel 2002 24.99 3 Yes
- No context
- No Structure
- Needs Human interpretation
5XML IS MORE THAN A BETTER CSV
- CSV (Comma-Separated Values) is structured ASCII
- Human readable
- Can be easily converted to proprietary formats (e.g. Word, Excel)
- More information
Name, Author, Category, Published, Price, In Stock, Discount
The Outsider, Albert Camus, Fiction,, 19.99, 4, No
The Road to McCarthy, P. McCarthy, Travel, 2002, 24.99, 3, Yes
- Missing Information highlighted
- Words grouped correctly
- Still no context
- Needs human intervention to import/export
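As a rough illustration of the points above, the sketch below loads the same CSV rows with Python's standard `csv` module. The missing "Published" value shows up as an empty field, but every value is still just a string: nothing in the file says that Price is a currency amount or that In Stock is an integer.

```python
import csv
import io

# The CSV rows from this slide, reproduced as a string.
CSV_TEXT = """Name, Author, Category, Published, Price, In Stock, Discount
The Outsider, Albert Camus, Fiction,, 19.99, 4, No
The Road to McCarthy, P. McCarthy, Travel, 2002, 24.99, 3, Yes
"""

# skipinitialspace drops the blank that follows each comma.
reader = csv.DictReader(io.StringIO(CSV_TEXT), skipinitialspace=True)
records = list(reader)

for record in records:
    # The gap is visible, but the reader must still decide what it means.
    published = record["Published"] or "(missing)"
    print(record["Name"], published, type(record["Price"]).__name__)
```

Interpreting each column (dates, prices, yes/no flags) is still left to whoever imports the file.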
6XML
- Adds context and structure
<Book>
  <Name type="text">The Outsider</Name>
  <Author type="text">Albert Camus</Author>
  <Category type="text">Fiction</Category>
  <Published type="date"></Published>
  <Price currency="USD" Type="listprice">19.99</Price>
  <InStock type="Integer">4</InStock>
  <Discount type="boolean">No</Discount>
</Book>
- Context added
- Type information added
- Can be automatically manipulated
- Information rich
- Needs rule book to understand
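"Can be automatically manipulated" can be sketched as follows: a small program uses the `type` attributes to convert each value without human help. The conversion rules in `convert` are an assumed, simplified "rule book" for this one example; a real application would take them from the schema.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """
<Book>
  <Name type="text">The Outsider</Name>
  <Author type="text">Albert Camus</Author>
  <Category type="text">Fiction</Category>
  <Published type="date"></Published>
  <Price currency="USD" Type="listprice">19.99</Price>
  <InStock type="Integer">4</InStock>
  <Discount type="boolean">No</Discount>
</Book>
"""

def convert(element):
    """Convert an element's text using its 'type' attribute (assumed rules)."""
    kind = element.get("type", "text").lower()
    text = (element.text or "").strip()
    if text == "":
        return None  # an empty element means "no value supplied"
    if kind == "integer":
        return int(text)
    if kind == "boolean":
        return text.lower() in ("yes", "true", "1")
    return text  # text, date and untyped values are left as strings here

record = {child.tag: convert(child) for child in ET.fromstring(BOOK_XML)}
print(record)
```

Note that Price stays a string in this sketch, because its `Type` attribute describes the kind of price rather than a data type.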
7IF WE USED BookML
<Book Reference="BK0001">
  <Name>The XidML handbook</Name>
  <Authors>
    <Author Index="0">
      <Author>Sid Emel</Author>
    </Author>
    <Author Index="1">
      <Author>Joe Bloggs</Author>
    </Author>
  </Authors>
  <Category>Reference</Category>
  <PublishedDate>27 March 2005</PublishedDate>
  <Currency>UsDollars</Currency>
  <Amount>19.99</Amount>
  <NumberInStock>20</NumberInStock>
  <Discount>Yes</Discount>
</Book>
8A WORLDWIDE STANDARD
- Who Controls it?
- W3C (World Wide Web Consortium, www.w3.org)
- Adobe, AmEx, AT&T, Boeing, Computer Associates, Ericsson, HP, Intel, Microsoft, Oracle, Siemens, Xerox etc.
- XML 1.0 published as a W3C Recommendation in February 1998
- Intended as a means of distributing context-rich documents on the internet
- A lot of information on the internet (www.xml.org)
9HISTORY OF XML
- XML is not new!
- GML: Generalized Markup Language
- Developed by IBM in the 1960s; good for humans, easy manipulation
- Allowed a lot of cheating: many variations
- SGML: Standard Generalized Markup Language
- ISO standard in 1986
- Standardized document validation and interchange
- HTML: HyperText Markup Language
- Evolved along with the World Wide Web for document interchange
- Loose standard
- Focus on presentation, not content or structure
- XML: eXtensible Markup Language
- Addresses weaknesses of HTML
- A meta-language for defining structure and context
- Focus on data and data interchange
- First standard (a subset of SGML) published as a W3C Recommendation in February 1998
10TOOLS FOR XML
- XML has a lot of support tools (some free)
- Designed to make tools easy to develop
- (2 weeks for a competent computer science graduate)
- Parsers
- Off-the-shelf tools for reading XML files
- Stylesheets
- A language (XSL) for transforming XML into other documents
- Controls how browsers display XML files
- Validators
- Check XML files for correct structure and syntax
- Check field values for range, format etc.
- Need to be told what the XML structure should be (schema)
- Editing tools
- Graphical environments that integrate other tools
- Allow XML, schemas and stylesheets to be developed
- Validate files with strong error checking
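A parser's most basic job, checking correct syntax, can be sketched with the Python standard library as below. This only tests that a document is well-formed (tags properly nested and closed); checking field values against a schema needs a separate validating tool, which the standard-library parser does not provide. The helper name `is_well_formed` is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return (True, "") if the text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""

good = "<Category>Reference</Category>"
bad = "<Category>Reference<Category>"  # closing tag is missing its slash

for sample in (good, bad):
    ok, message = is_well_formed(sample)
    print(sample, "->", "well-formed" if ok else "rejected: " + message)
```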
11ADVANTAGES
- Allows specialists to extend the language for a domain
- (music, mathematics, graphics, flight test instrumentation)
- Self-documenting
- With a schema
- Robust, recoverable and future-proof
- XML formats are text-based, making them more readable, easier to document, and easier to debug.
- Off-the-shelf tools
- Tools are available on different platforms, making it simpler to use XML instead of binary formats to exchange complex information streams.
- Maps very well onto structured data (e.g. databases)
- Allows creation of own-labelled structures for storing information.
- Well understood and supported
12DISADVANTAGES
- Different specialists developing different standards!
- Can be made incomprehensible and proprietary
- Yet another language with a learning curve
- Files can be large
- Not (inherently) an indexed medium
- No native support for libraries and nested files
- No native support for archiving and revision
control (of the data as opposed to the schema)
13APPLICATIONS OF XML
- XML is good for
- Data interchange between computers, platforms and applications
- Representing structured data for automated processing
- Future-proofing data
- XML is not good for
- Data storage/exchange in a homogeneous environment
- It is not a database (despite the hype!)
14EXAMPLE - PROBLEM
[Diagram boxes: Test Vehicle, Ground, Flight Test DAU, Ground Station, Vendor 1 Database, Vendor 2 Database, FTE, Test Config Database]
- Same information (or a subset) required in each place
- Different database structures
- Different vendors
15EXAMPLE SOLUTION 1
[Diagram boxes: Test Vehicle, Ground, Flight Test DAU, Ground Station, Vendor 1 Database, Vendor 2 Database, two Custom SQL Applications, FTE, Test Config Database]
- Database structure cannot change
- Very static
- No flexibility
- Hard to maintain
- Must learn database mapping for each vendor
16EXAMPLE SOLUTION 2
[Diagram boxes: Test Vehicle, Ground, Flight Test DAU, Ground Station, Vendor 1 Database, Vendor 2 Database, two Proprietary File Formats, FTE, Test Config Database]
- Must learn to write/read a new format for each vendor
- Must learn database mapping for each vendor
- Freedom to change databases
17EXAMPLE SOLUTION 3
[Diagram boxes: Test Vehicle, Ground, Flight Test DAU, Ground Station, Vendor 1 Database, Vendor 2 Database, XML I, XML II, XML III, FTE, Test Config Database]
- Tools available to assist XML transformation
- Must learn database mapping for each vendor
- Freedom to change databases
- Reading/writing significantly easier
18EXAMPLE SOLUTION 4
[Diagram boxes: Test Vehicle, Ground, Flight Test DAU, Ground Station, Vendor 1 Database, Vendor 2 Database, XML, FTE, Test Config Database]
- Must learn database mapping for a single XML file
- Freedom to change databases
- Reading/writing significantly easier
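The idea behind Solution 4 can be sketched in a few lines: each side writes one mapping to and from a single shared XML form, instead of one mapping per vendor pair. Everything named below (the `Parameters`/`Parameter`/`Units` elements, the vendor row layouts, the function names) is a hypothetical stand-in for illustration; it is not the XidML schema.

```python
import xml.etree.ElementTree as ET

def to_common_xml(parameters):
    """Serialise a list of {'name', 'units'} dicts to the shared XML form."""
    root = ET.Element("Parameters")
    for p in parameters:
        node = ET.SubElement(root, "Parameter", Name=p["name"])
        ET.SubElement(node, "Units").text = p["units"]
    return ET.tostring(root, encoding="unicode")

def from_common_xml(xml_text):
    """Read the shared XML form back into a list of {'name', 'units'} dicts."""
    root = ET.fromstring(xml_text)
    return [
        {"name": n.get("Name"), "units": n.findtext("Units")}
        for n in root.findall("Parameter")
    ]

# Vendor 1 side: its (hypothetical) database rows are (name, units) tuples.
vendor1_rows = [("AIRSPEED", "kt"), ("ALTITUDE", "ft")]
shared = to_common_xml([{"name": n, "units": u} for n, u in vendor1_rows])

# Vendor 2 side: it stores (hypothetical) records keyed by label/unit.
vendor2_rows = [
    {"label": p["name"], "unit": p["units"]} for p in from_common_xml(shared)
]
print(shared)
print(vendor2_rows)
```

Either vendor can change its internal database layout and only its own mapping to the shared file has to follow.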
19SO FAR
- XML: a world standard for data interchange
- A better ASCII
- Community of XML developers for support and development
- Not a database, but closely tied to databases
- An optimal solution to the problem of data interchange
- Development effort still required
20XML SCHEMAS
- Specifies the grammar for an XML document
- Provides a description of the document structure and meaning (meta-meta-data)
- A schema is necessary and sufficient to fully describe an XML grammar
- Provides support for development and documentation
- Tools exist to automatically build data model diagrams from a schema
- Tools exist to automatically compare an XML document against its schema and test for validity (or not)
- A schema can consist of many sub-schemas that provide descriptions of subsets of the document.
- When we talk about a specialist implementation of XML we really refer to a publicly released schema for that domain
- MathML for mathematics
- SensorML for sensors
- XidML for data acquisition and processing
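To give a feel for what "comparing a document against its schema" involves, the sketch below uses a tiny hand-written rule table in place of a real schema. This is only a stand-in: an actual XML schema is itself an XML document and is checked by a validating tool, and the `RULES` table, its element names and the `validate` function are assumptions made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table standing in for a schema:
# child element name -> (required?, converter that must accept the text)
RULES = {
    "Name": (True, str),
    "Amount": (True, float),
    "NumberInStock": (False, int),
}

def validate(xml_text, rules=RULES):
    """Return a list of problems found; an empty list means 'valid'."""
    root = ET.fromstring(xml_text)
    errors = []
    for tag, (required, caster) in rules.items():
        node = root.find(tag)
        if node is None:
            if required:
                errors.append(f"missing required element <{tag}>")
            continue
        try:
            caster((node.text or "").strip())
        except ValueError:
            errors.append(f"<{tag}> has invalid value {node.text!r}")
    return errors

print(validate("<Book><Name>X</Name><Amount>19.99</Amount></Book>"))
print(validate("<Book><Name>X</Name><NumberInStock>many</NumberInStock></Book>"))
```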
21XML STYLESHEETS (XSLT)
- XML stylesheets can be used to transform XML documents to almost any target format
- Best used for XML or HTML target formats
- Can avoid coding with a parser
- Stylesheets are all about presentation
- A single source document can be used to generate different reports for different audiences
- An XML transform can be applied to an XML document by including a stylesheet processing instruction
- Single line added before the root XML node
- Any program that understands XSLT (such as Internet Explorer) can use the stylesheet to transform the XML into the target format.
22XPath
- Used as a general-purpose query notation for addressing and filtering the elements and text of XML documents
- Usually used in conjunction with XSLT (or a parser) to extract data from a source XML document and transform this data to a different format
- Allows tools for searching XML files to be built
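A few XPath-style queries can be tried with Python's standard-library parser, which supports a limited subset of XPath. The sketch reuses the BookML-style record from earlier in the deck; the enclosing `Books` element is an assumption added so there is something to search within.

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = """
<Books>
  <Book Reference="BK0001">
    <Name>The XidML handbook</Name>
    <Authors>
      <Author Index="0"><Author>Sid Emel</Author></Author>
      <Author Index="1"><Author>Joe Bloggs</Author></Author>
    </Authors>
    <Category>Reference</Category>
  </Book>
</Books>
"""

root = ET.fromstring(LIBRARY_XML)

# Address one element by the value of an attribute.
book = root.find("./Book[@Reference='BK0001']")
title = book.findtext("Name")

# Filter: every inner Author element, at any depth.
authors = [a.text for a in root.findall(".//Author/Author")]

# Combine both: the author whose wrapper has Index="1".
second = root.findtext(".//Author[@Index='1']/Author")

print(title, authors, second)
```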
23DATA, META-DATA, AND
24DOCUMENTING XML
- Sources of information
- The schema
- A structured description of all possible elements, acceptable data types and values
- The documentation
- Word document or HTML file with graphical description of elements
- Useful when programming
25THE SCHEMA
- The schema
- A structured description of all possible elements, acceptable data types and values
- Actually, there are many files; they combine to describe the schema
- Not really for humans to read: more useful when used in conjunction with a tool like XMLSpy
26THE SCHEMA - TREE
27THE SCHEMA - GRID
28XML DOCUMENTATION
- Usually HTML or Word document
- Every element is described
- Data model diagrams provide a lot of information about the element
- Can drill down into more and more detail as required
- Provides information on cross-referencing
29DATA MODEL DIAGRAM - FRAGMENT
Dotted line indicates optional element
indicates contains hidden child elements
(click to expand)
Indicates number of instances allowed
Connector indicates a sequence, a choice or all
30DATA MODEL DIAGRAM - EXAMPLE
31THREE VIEWS OF THE SAME DATA
Schema Documentation
32ATTRIBUTES AND ELEMENTS
<DataLinks>
  <DataLinkSet>
    <X-DataLink-1.0 Name="MyRS232_Link1">
      <DataBitsPerWord>8</DataBitsPerWord>
    </X-DataLink-1.0>
  </DataLinkSet>
</DataLinks>
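The difference between the two shows up directly when the fragment above is read by a program: `Name` is an attribute, fetched from the element itself, while `DataBitsPerWord` is a child element with its own text. A minimal sketch using Python's standard-library parser:

```python
import xml.etree.ElementTree as ET

DATALINKS_XML = """
<DataLinks>
  <DataLinkSet>
    <X-DataLink-1.0 Name="MyRS232_Link1">
      <DataBitsPerWord>8</DataBitsPerWord>
    </X-DataLink-1.0>
  </DataLinkSet>
</DataLinks>
"""

root = ET.fromstring(DATALINKS_XML)

# Locate the link element by its tag name, wherever it sits in the tree.
link = next(root.iter("X-DataLink-1.0"))

name = link.get("Name")                       # attribute: data about the element
bits = int(link.findtext("DataBitsPerWord"))  # child element: data it contains

print(name, bits)
```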
33THIS SESSION COVERED
- What is XML?
- Main elements of XML
- How XML is documented
- Next, we will talk about XidML
34END OF SESSION 1