Title: MIS 3150 Data and Info Mgmt: XML
1. MIS 3150 Data and Info Mgmt: XML
2. Learning Objectives
- Learn what XML is
- Learn the various ways in which XML is used
- Learn the key companion technologies
- See how XML is being used in industry as a
meta-language
3. Agenda
- Overview
- Syntax and Structure
- The XML Alphabet Soup
- XML as a meta-language
4. Overview: What is XML?
- A tag-based meta-language
- Designed for structured data representation
- Represents data hierarchically (in a tree)
- Provides context to data (makes it meaningful)
- Self-describing data
- Separates presentation (HTML) from data (XML)
- An open W3C standard
- A subset of SGML
- vs. HTML, which is an application of SGML
5. Overview: What is XML?
- XML is a "use everywhere" data specification
6. Overview: Documents vs. Data
- XML is used to represent two main types of things
- Documents
- Lots of text with tags to identify and annotate portions of the document
- Data
- Hierarchical data structures
7. Overview: XML and Structured Data
- Pre-XML representation of data
"PO-1234","CUST001","X9876","5","14.98"
- XML representation of the same data
<PURCHASE_ORDER>
  <PO_NUM>PO-1234</PO_NUM>
  <CUST_ID>CUST001</CUST_ID>
  <ITEM_NUM>X9876</ITEM_NUM>
  <QUANTITY>5</QUANTITY>
  <PRICE>14.98</PRICE>
</PURCHASE_ORDER>
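The step from the comma-separated record to the tagged record can be sketched in Python with the standard library. This is only an illustration: the field names come from the XML on the slide, and the function name row_to_purchase_order is made up for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Field names are taken from the slide's XML example; the CSV row itself carries none.
FIELDS = ["PO_NUM", "CUST_ID", "ITEM_NUM", "QUANTITY", "PRICE"]


def row_to_purchase_order(csv_line: str) -> ET.Element:
    """Turn one comma-separated record into a PURCHASE_ORDER element."""
    values = next(csv.reader(io.StringIO(csv_line)))
    order = ET.Element("PURCHASE_ORDER")
    for name, value in zip(FIELDS, values):
        ET.SubElement(order, name).text = value
    return order


order = row_to_purchase_order('"PO-1234","CUST001","X9876","5","14.98"')
xml_text = ET.tostring(order, encoding="unicode")
print(xml_text)
```

The CSV reader needs outside knowledge (the FIELDS list) to know what each position means, while the XML output carries that context in its tags.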
8. Overview: Benefits of XML
- Open W3C standard
- Representation of data across heterogeneous environments
- Cross platform
- Allows for high degree of interoperability
- Strict rules
- Syntax
- Structure
- Case sensitive
9. Overview: Who Uses XML?
- Submissions by
- Microsoft
- IBM
- Hewlett-Packard
- Fujitsu Laboratories
- Sun Microsystems
- Netscape (AOL), and others
- Technologies using XML
- SOAP, ebXML, BizTalk, WebSphere, many others
10. Agenda
- Overview
- Syntax and Structure
- The XML Alphabet Soup
- XML as a meta-language
11. Syntax and Structure: Components of an XML Document
- Elements
- Each element has a beginning and ending tag
- <TAG_NAME>...</TAG_NAME>
- Elements can be empty (<TAG_NAME />)
- Attributes
- Describe an element, e.g. data type, data range, etc.
- Can only appear on beginning tag
- Processing instructions
- Encoding specification (Unicode by default)
- Namespace declaration
- Schema declaration
12. Syntax and Structure: Components of an XML Document
<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="template.xsl"?>
<ROOT>
  <ELEMENT1><SUBELEMENT1 /><SUBELEMENT2 /></ELEMENT1>
  <ELEMENT2> </ELEMENT2>
  <ELEMENT3 type='string'> </ELEMENT3>
  <ELEMENT4 type='integer' value='9.3'> </ELEMENT4>
</ROOT>
- Prologue (processing instructions): the first two lines
- Elements: ROOT, ELEMENT1, ELEMENT2 and the sub-elements
- Elements with Attributes: ELEMENT3, ELEMENT4
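As a rough illustration of how a parser sees these components, the sketch below loads the slide's example document with Python's xml.etree.ElementTree and lists each element with its attributes. The variable names are arbitrary. Note that this parser skips the prologue's processing instruction by default and reports only the elements.

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="template.xsl"?>
<ROOT>
  <ELEMENT1><SUBELEMENT1 /><SUBELEMENT2 /></ELEMENT1>
  <ELEMENT2> </ELEMENT2>
  <ELEMENT3 type='string'> </ELEMENT3>
  <ELEMENT4 type='integer' value='9.3'> </ELEMENT4>
</ROOT>"""

root = ET.fromstring(document)

# iter() walks the tree in document order, starting with the root itself.
components = [(element.tag, dict(element.attrib)) for element in root.iter()]
for tag, attributes in components:
    print(tag, attributes)
```

The parser accepts value='9.3' on an element typed 'integer': attribute values are plain strings unless a schema says otherwise.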
13. Syntax and Structure: Rules For Well-Formed XML
- There must be one, and only one, root element
- Sub-elements must be properly nested
- A tag must end within the tag in which it was started
- Attributes are optional
- Defined by an optional schema
- Attribute values must be enclosed in " " or ' '
- Processing instructions are optional
- XML is case-sensitive
- <tag> and <TAG> are not the same type of element
14. Syntax and Structure: Well-Formed XML?
- No, CHILD2 and CHILD3 do not nest properly
<?xml version="1.0" ?>
<PARENT>
  <CHILD1>This is element 1</CHILD1>
  <CHILD2><CHILD3>Number 3</CHILD2></CHILD3>
</PARENT>
15. Syntax and Structure: Well-Formed XML?
- No, there are two root elements
<?xml version="1.0" ?>
<PARENT>
  <CHILD1>This is element 1</CHILD1>
</PARENT>
<PARENT>
  <CHILD1>This is another element 1</CHILD1>
</PARENT>
16. Syntax and Structure: Well-Formed XML?
- Yes: one root element, properly nested children, and CHILD2 is a valid empty element
<?xml version="1.0" ?>
<PARENT>
  <CHILD1>This is element 1</CHILD1>
  <CHILD2/>
  <CHILD3></CHILD3>
</PARENT>
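The three quiz documents can be checked mechanically. The sketch below is a minimal illustration using Python's standard parser, which rejects anything that is not well-formed. The helper name is_well_formed is made up for this example, and the XML declaration is left out of the test strings for brevity.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    # CHILD2 is closed before the CHILD3 that was opened inside it.
    "bad nesting": "<PARENT><CHILD1>This is element 1</CHILD1>"
                   "<CHILD2><CHILD3>Number 3</CHILD2></CHILD3></PARENT>",
    # A second PARENT follows the first, so there are two root elements.
    "two roots": "<PARENT><CHILD1>This is element 1</CHILD1></PARENT>"
                 "<PARENT><CHILD1>This is another element 1</CHILD1></PARENT>",
    # One root, proper nesting, one empty element.
    "well formed": "<PARENT><CHILD1>This is element 1</CHILD1>"
                   "<CHILD2/><CHILD3></CHILD3></PARENT>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}
print(results)
```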
17. Syntax and Structure: An XML Document
<?xml version='1.0'?>
<bookstore>
  <book genre='autobiography' publicationdate='1981' ISBN='1-861003-11-0'>
    <title>The Autobiography of Benjamin Franklin</title>
    <author>
      <first-name>Benjamin</first-name>
      <last-name>Franklin</last-name>
    </author>
    <price>8.99</price>
  </book>
  <book genre='novel' publicationdate='1967' ISBN='0-201-63361-2'>
    <title>The Confidence Man</title>
    <author>
      <first-name>Herman</first-name>
      <last-name>Melville</last-name>
    </author>
    <price>11.99</price>
  </book>
</bookstore>
18. Syntax and Structure: Namespaces: Overview
- Part of XML's extensibility
- Allow authors to differentiate between tags of the same name (using a prefix)
- Frees author to focus on the data and decide how to best describe it
- Allows multiple XML documents from multiple authors to be merged
- Identified by a URI (Uniform Resource Identifier)
- When a URL is used, it does NOT have to represent a live server
19. Syntax and Structure: Namespaces: Declaration
Namespace declaration examples
xmlns:bk = "http://www.example.com/bookinfo/"
xmlns:bk = "urn:mybookstuff.org:bookinfo"
Parts of xmlns:bk = "http://www.example.com/bookinfo/"
- Namespace declaration: xmlns
- Prefix: bk
- URI (URL): "http://www.example.com/bookinfo/"
20. Syntax and Structure: Namespaces: Examples
<BOOK xmlns:bk="http://www.bookstuff.org/bookinfo">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE currency='US Dollar'>19.99</bk:PRICE>
</BOOK>

<bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo"
         xmlns:money="urn:finance:money">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE money:currency='US Dollar'>19.99</bk:PRICE>
</bk:BOOK>
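To show that the URI is the real identity and the prefix only a local shorthand, here is a small sketch with Python's xml.etree.ElementTree. It uses the second example from the slide. The lookup prefixes b and m are deliberately different from the document's bk and money, and are chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

document = """<bk:BOOK xmlns:bk="http://www.bookstuff.org/bookinfo"
         xmlns:money="urn:finance:money">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE money:currency="US Dollar">19.99</bk:PRICE>
</bk:BOOK>"""

root = ET.fromstring(document)

# ElementTree stores each name as {namespace-URI}local-name; the prefix is gone.
print(root.tag)

# Prefixes in this mapping are local to the query; only the URIs must match.
ns = {"b": "http://www.bookstuff.org/bookinfo", "m": "urn:finance:money"}
price = root.find("b:PRICE", ns)
currency = price.get("{urn:finance:money}currency")
print(price.text, currency)
```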
21. Syntax and Structure: Namespaces: Default Namespace
- An XML namespace declared without a prefix becomes the default namespace for all sub-elements
- All elements without a prefix will belong to the default namespace
<BOOK xmlns="http://www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
</BOOK>
22. Syntax and Structure: Namespaces: Scope
- Unqualified elements belong to the inner-most default namespace.
- BOOK, TITLE, and AUTHOR belong to the default book namespace
- PUBLISHER and NAME belong to the default publisher namespace
<BOOK xmlns="www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
  <PUBLISHER xmlns="urn:publishers:publinfo">
    <NAME>Microsoft Press</NAME>
  </PUBLISHER>
</BOOK>
23. Syntax and Structure: Namespaces: Attributes
- Unqualified attributes do NOT belong to any namespace
- Even if there is a default namespace
- This differs from elements, which belong to the default namespace
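The difference between elements and attributes under a default namespace is easy to see in a parser's output. The sketch below is an illustration only: it combines the default-namespace and scope examples, and the ISBN attribute on BOOK is an assumption added here so that there is an unqualified attribute to inspect.

```python
import xml.etree.ElementTree as ET

# ISBN is a hypothetical attribute added for this illustration.
document = """<BOOK xmlns="http://www.bookstuff.org/bookinfo" ISBN="1234567890">
  <TITLE>All About XML</TITLE>
  <PUBLISHER xmlns="urn:publishers:publinfo">
    <NAME>Microsoft Press</NAME>
  </PUBLISHER>
</BOOK>"""

root = ET.fromstring(document)

# Unprefixed elements pick up the innermost default namespace in scope.
element_names = [element.tag for element in root.iter()]
for name in element_names:
    print(name)

# The unprefixed attribute keeps its bare name: it is in no namespace.
attribute_names = list(root.attrib)
print(attribute_names)
```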
24. Syntax and Structure: Entities
- Entities provide a mechanism for textual substitution (see the table below)
- You can define your own entities
- Parsed entities can contain text and markup
- Unparsed entities can contain any data
- JPEG photos, GIF files, movies, etc.
Entity | Substitution
&lt; | <
&amp; | &
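Entity substitution can be observed directly when a document is parsed. The sketch below is a minimal illustration with Python's standard library: it shows the built-in entities being replaced, a user-defined entity declared in an internal DTD subset, and the reverse step of escaping text before writing it out. The NOTE element, the publisher entity name and the sample strings are made up for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Built-in entities are replaced by the characters they stand for.
note = ET.fromstring("<NOTE>5 &lt; 7 &amp; 7 &gt; 5</NOTE>")
print(note.text)  # 5 < 7 & 7 > 5

# A user-defined parsed entity, declared in the document's internal DTD subset.
custom = ET.fromstring(
    '<!DOCTYPE NOTE [<!ENTITY publisher "Microsoft Press">]>'
    "<NOTE>Published by &publisher;</NOTE>"
)
print(custom.text)

# Going the other way: escape reserved characters before writing text into XML.
escaped = escape("AT&T <fast>")
print(escaped)  # AT&amp;T &lt;fast&gt;
```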
25. Agenda
- Overview
- Syntax and Structure
- The XML Alphabet Soup
- XML as a meta-language
26. The XML 'Alphabet Soup'
- XML itself is fairly simple
- Most of the learning curve is knowing about all
of the related technologies
27. The XML 'Alphabet Soup'
Acronym | Name | Purpose
XML | Extensible Markup Language | Defines XML documents
Infoset | Information Set | Abstract model of XML data; definition of terms
DTD | Document Type Definition | Non-XML schema
XSD | XML Schema | XML-based schema language
XDR | XML Data Reduced | An earlier XML schema
CSS | Cascading Style Sheets | Allows you to specify styles
XSL | Extensible Stylesheet Language | Language for expressing stylesheets; consists of XSLT and XSL-FO
XSLT | XSL Transformations | Language for transforming XML documents
XSL-FO | XSL Formatting Objects | Language to describe precise layout of text on a page
28. The XML 'Alphabet Soup'
Acronym | Name | Purpose
XPath | XML Path Language | A language for addressing parts of an XML document, designed to be used by both XSLT and XPointer
XPointer | XML Pointer Language | Supports addressing into the internal structures of XML documents
XLink | XML Linking Language | Describes links between XML documents
XQuery | XML Query Language (draft) | Flexible mechanism for querying XML data as if it were a database
DOM | Document Object Model | API to read, create and edit XML documents; creates in-memory object model
SAX | Simple API for XML | API to parse XML documents; event-driven
Data Island | - | XML data embedded in an HTML page
Data Binding | - | Automatic population of HTML elements from XML data
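The contrast between the two APIs in the table, DOM (in-memory object model) and SAX (event-driven), can be sketched with Python's standard xml.dom.minidom and xml.sax modules. The BOOK document is borrowed from later slides. The edited title text and the TagCollector class are made up for this illustration.

```python
import xml.sax
from xml.dom import minidom

document = "<BOOK><TITLE>All About XML</TITLE><AUTHOR>Joe Developer</AUTHOR></BOOK>"

# DOM: the whole document is loaded into an in-memory tree that can be edited.
dom = minidom.parseString(document)
title_node = dom.getElementsByTagName("TITLE")[0]
title_node.firstChild.data = "All About XML, 2nd Edition"  # illustrative edit
dom_result = dom.documentElement.toxml()
print(dom_result)


# SAX: the parser fires events while reading; nothing is kept unless the handler keeps it.
class TagCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


collector = TagCollector()
xml.sax.parseString(document.encode("utf-8"), collector)
sax_tags = collector.tags
print(sax_tags)
```

DOM suits documents that must be navigated or modified. SAX suits large inputs that only need one pass.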
29. The XML 'Alphabet Soup': Schemas: Overview
- DTD (Document Type Definitions)
- Not written in XML
- No support for data types or namespaces
- XSD (XML Schema Definition)
- Written in XML
- Supports data types
- Current standard recommended by W3C
30. The XML 'Alphabet Soup': Schemas: Purpose
- Define the "rules" (grammar) of the document
- Data types
- Value bounds
- An XML document that conforms to a schema is said to be valid
- More restrictive than well-formed XML
- Define which elements are present and in what order
- Define the structural relationships of elements
31. The XML 'Alphabet Soup': Schemas: DTD Example
<BOOK ISBN="1234567890">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
</BOOK>

<!DOCTYPE BOOK [
  <!ELEMENT BOOK (TITLE, AUTHOR)>
  <!ATTLIST BOOK ISBN CDATA #REQUIRED>
  <!ELEMENT TITLE (#PCDATA)>
  <!ELEMENT AUTHOR (#PCDATA)>
]>
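Python's standard library parsers do not validate against a DTD, so the sketch below is not DTD validation. It is a hand-written stand-in that checks the same three rules the DTD above expresses (root is BOOK, ISBN is required, children are TITLE then AUTHOR), to show how "valid" is stricter than "well-formed". The function name check_book and the sample documents are assumptions for this illustration.

```python
import xml.etree.ElementTree as ET


def check_book(text: str) -> list:
    """Return a list of rule violations; an empty list means the rules hold."""
    book = ET.fromstring(text)  # raises ParseError if the text is not well-formed
    problems = []
    if book.tag != "BOOK":
        problems.append("root element must be BOOK")
    if "ISBN" not in book.attrib:
        problems.append("BOOK requires an ISBN attribute")
    if [child.tag for child in book] != ["TITLE", "AUTHOR"]:
        problems.append("BOOK must contain TITLE then AUTHOR")
    return problems


conforming = (
    '<BOOK ISBN="1234567890"><TITLE>All About XML</TITLE>'
    "<AUTHOR>Joe Developer</AUTHOR></BOOK>"
)
# Well-formed, but the attribute is missing and the children are in the wrong order.
non_conforming = (
    "<BOOK><AUTHOR>Joe Developer</AUTHOR><TITLE>All About XML</TITLE></BOOK>"
)

print(check_book(conforming))
print(check_book(non_conforming))
```

A real validating parser would read these rules from the DTD itself instead of having them hard-coded.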
32. The XML 'Alphabet Soup': Schemas: XSD Example
<CATALOG>
  <BOOK>
    <TITLE>All About XML</TITLE>
    <AUTHOR>Joe Developer</AUTHOR>
  </BOOK>
</CATALOG>
33. The XML 'Alphabet Soup': Schemas: XSD Example
<xsd:schema id="NewDataSet"
    targetNamespace="http://tempuri.org/schema1.xsd"
    xmlns="http://tempuri.org/schema1.xsd"
    xmlns:xsd="http://www.w3.org/1999/XMLSchema"
    xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xsd:element name="book">
    <xsd:complexType content="elementOnly">
      <xsd:all>
        <xsd:element name="title" minOccurs="0" type="xsd:string"/>
        <xsd:element name="author" minOccurs="0" type="xsd:string"/>
      </xsd:all>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="Catalog" msdata:IsDataSet="True">
    <xsd:complexType>
      <xsd:choice maxOccurs="unbounded">
        <xsd:element ref="book"/>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
34. The XML 'Alphabet Soup': Schemas: Why You Should Use XSD
- Newest W3C Standard
- Broad support for data types
- Reusable "components"
- Simple data types
- Complex data types
- Extensible
- Inheritance support
- Namespace support
- Ability to map to relational database tables
- XSD support in Visual Studio.NET
35. The XML 'Alphabet Soup': Transformations: XSL
- Language for expressing document styles
- Specifies the presentation of XML
- More powerful than CSS
- Consists of
- XSLT
- XPath
- XSL Formatting Objects (XSL-FO)
36. The XML 'Alphabet Soup': Transformations: Overview
- XSLT: a language used to transform XML data into a different form (commonly XML or HTML)
[Diagram: XML input, transformed by XSLT, produces XML, HTML, ... output]
37. The XML 'Alphabet Soup': Transformations: XSLT
- The language used for converting XML documents into other forms
- Describes how the document is transformed
- Expressed as an XML document (.xsl)
- Template rules
- Patterns match nodes in source document
- Templates instantiated to form part of result document
- Uses XPath for querying, sorting, etc.
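Python's standard library has no XSLT processor, so the sketch below is not XSLT. It is a hand-rolled Python imitation of the template-rule idea: one function per matched element type, each producing part of the result document, with a path expression choosing which nodes to process. The function names and the cut-down bookstore input are assumptions for this illustration.

```python
import xml.etree.ElementTree as ET

source = ET.fromstring(
    "<bookstore>"
    "<book><title>The Autobiography of Benjamin Franklin</title><price>8.99</price></book>"
    "<book><title>The Confidence Man</title><price>11.99</price></book>"
    "</bookstore>"
)


def book_template(book: ET.Element) -> ET.Element:
    """Plays the role of a template rule matching 'book': emits one <li>."""
    item = ET.Element("li")
    item.text = f"{book.findtext('title')} ({book.findtext('price')})"
    return item


def bookstore_template(store: ET.Element) -> ET.Element:
    """Plays the role of a template rule matching 'bookstore': emits a <ul>."""
    listing = ET.Element("ul")
    for book in store.findall("book"):  # the path expression selects nodes to process
        listing.append(book_template(book))
    return listing


html = ET.tostring(bookstore_template(source), encoding="unicode", method="html")
print(html)
```

In real XSLT the same rules would be written declaratively in an .xsl document and applied by an XSLT engine.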
38. The XML 'Alphabet Soup': XPath (XML Path Language)
- General purpose query language for identifying nodes in an XML document
- Declarative (vs. procedural)
- Contextual: the results depend on current node
- Supports standard comparison, Boolean and mathematical operators (=, <, and, or, +, *, etc.)
39. The XML 'Alphabet Soup': XPath Operators
Operator | Description
/ | Child operator; selects only immediate children (when at the beginning of the pattern, context is root)
// | Recursive descent; selects elements at any depth (when at the beginning of the pattern, context is root)
. | Indicates current context
.. | Selects the parent of the current node
* | Wildcard
@ | Prefix to attribute name (when alone, it is an attribute wildcard)
[ ] | Applies filter pattern
40. The XML 'Alphabet Soup': XPath Query Examples
./author (finds all author elements within current context)
/bookstore (find the bookstore element at the root)
/* (find the root element)
//author (find all author elements anywhere in document)
/bookstore[@specialty = "textbooks"] (find all bookstores where the specialty attribute = "textbooks")
//book[@style = /bookstore/@specialty] (find all books where the style attribute = the specialty attribute of the bookstore element at the root)
//book[title = "ABCD"]/author/name/text() (find the text node for the author's name of ABCD)
//book[title = "ABCD" and price > 30] (boolean combination of two conditions)
//book[not(.//publisher = "Addison Wesley")] (more like a "not exists": no publisher tag with that name)
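Some of these queries can be tried in Python, whose xml.etree.ElementTree supports only a limited subset of XPath. The sketch below is an illustration under that limitation: paths are written relative to the root element rather than absolute, and the comparisons ElementTree cannot express are done in plain Python. The bookstore data, including the author names, is invented for this example.

```python
import xml.etree.ElementTree as ET

bookstore = ET.fromstring("""<bookstore specialty="textbooks">
  <book style="textbooks">
    <title>ABCD</title>
    <author><name>Pat Example</name></author>
    <price>35.50</price>
  </book>
  <book style="novel">
    <title>EFGH</title>
    <author><name>Sam Sample</name></author>
    <price>12.00</price>
  </book>
</bookstore>""")

# //author : all author elements at any depth below the context node
authors = bookstore.findall(".//author")

# //book[@style = /bookstore/@specialty] : done in two steps here
specialty = bookstore.get("specialty")
same_style = bookstore.findall(f".//book[@style='{specialty}']")

# //book[title = "ABCD"]/author/name/text()
names = [name.text for name in bookstore.findall(".//book[title='ABCD']/author/name")]

# //book[title = "ABCD" and price > 30] : the numeric test is written in Python
abcd_over_30 = [
    book for book in bookstore.iter("book")
    if book.findtext("title") == "ABCD" and float(book.findtext("price")) > 30
]

print(len(authors), [b.findtext("title") for b in same_style], names, len(abcd_over_30))
```

A full XPath engine would evaluate the slide's expressions as written, without these workarounds.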
41. More XPath Examples
Path Expression | Result
/bookstore/book[1] | Selects the first book element that is the child of the bookstore element
/bookstore/book[last()] | Selects the last book element that is the child of the bookstore element
/bookstore/book[last()-1] | Selects the last but one book element that is the child of the bookstore element
/bookstore/book[position()<3] | Selects the first two book elements that are children of the bookstore element
//title[@lang] | Selects all the title elements that have an attribute named lang
//title[@lang='eng'] | Selects all the title elements that have an attribute named lang with a value of 'eng'
/bookstore/book[price>35.00] | Selects all the book elements of the bookstore element that have a price element with a value greater than 35.00
/bookstore/book[price>35.00]/title | Selects all the title elements of the book elements of the bookstore element that have a price element with a value greater than 35.00
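The positional and attribute predicates in this table are within the XPath subset that Python's xml.etree.ElementTree understands, so they can be tried directly. The sketch below is an illustration on invented data: paths are relative to the bookstore root, and the price comparison, which ElementTree does not support, is done with an ordinary Python filter.

```python
import xml.etree.ElementTree as ET

bookstore = ET.fromstring("""<bookstore>
  <book><title lang="eng">First</title><price>29.99</price></book>
  <book><title lang="eng">Second</title><price>39.95</price></book>
  <book><title>Third</title><price>45.00</price></book>
</bookstore>""")


def titles(books):
    """Collect the title text of each book element."""
    return [book.findtext("title") for book in books]


first = titles(bookstore.findall("book[1]"))
last = titles(bookstore.findall("book[last()]"))
last_but_one = titles(bookstore.findall("book[last()-1]"))

with_lang = [t.text for t in bookstore.findall(".//title[@lang]")]
english = [t.text for t in bookstore.findall(".//title[@lang='eng']")]

# book[price>35.00] is outside ElementTree's subset, so filter in Python instead.
expensive = titles(
    book for book in bookstore.findall("book")
    if float(book.findtext("price")) > 35.00
)

print(first, last, last_but_one, with_lang, english, expensive)
```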
42. XPath Functions
- Accessor functions
- node-name, data, base-uri, document-uri
- Numeric value functions
- abs, ceiling, floor, round, ...
- String functions
- compare, concat, substring, string-length, upper-case, lower-case, starts-with, ends-with, matches, replace, ...
- Other functions include functions on boolean values, dates, nodes, etc.
43. The XML 'Alphabet Soup': Data Islands
- XML embedded in an HTML document
- Manipulated via client side script or data binding
<XML id="XMLID">
  <BOOK>
    <TITLE>All About XML</TITLE>
    <AUTHOR>Joe Developer</AUTHOR>
  </BOOK>
</XML>

<XML id="XMLID" src="mydocument.xml">
44. The XML 'Alphabet Soup': Data Islands
- Can be embedded in an HTML SCRIPT element
- XML is accessible via the DOM
<SCRIPT language="xml" id="XMLID">
<SCRIPT type="text/xml" id="XMLID">
<SCRIPT language="xml" id="XMLID" src="mydocument.xml">
45. The XML 'Alphabet Soup': XML-Based Applications
- Microsoft SQL Server
- Retrieve relational data as XML
- Query XML data
- Join XML data with existing database tables
- Update the database via XML Updategrams
- New XML data type in SQL 2005
- Microsoft Exchange Server
- XML is native representation of many types of data
- Used to enhance performance of UI scenarios (for example, Outlook Web Access (OWA))
46. Agenda
- Overview
- Syntax and Structure
- The XML Alphabet Soup
- XML as a meta-language
47. XML as a Meta-Language
A Language to create Languages
[Diagram: XML surrounded by related languages and technologies: CSS, SAX, DOM, DSSL, XSL, XML/DTD, XLL, XSLT, GO, XSchema, CML, XPath, MathML, WML, XPointer, BeanML, XQL]
48. Gene Ontology (GO)
- Describing and manipulating information about the molecular function, biological process and cellular component of gene products.
- Gene Ontology website
- http://www.geneontology.org
- GO DTD
- ftp://ftp.geneontology.org/pub/go/xml/dtd/go.dtd
- GO Browsers and tools
- http://www.geneontology.org/tools
- GO Resources and samples
- http://www.geneontology.org/annotations
49. Math ML
- Describing and manipulating mathematical notations
- MathML website
- www.w3.org/Math
- MathML DTD
- www.w3.org/Math/DTD
- MathML Browser
- www.w3.org/Amaya
- MathML Resources
- www.webeq.com/mathml (see sample documents here)
50. Chemical ML
- Representing molecular and chemical information
- CML website
- www.xml-cml.org
- CML DTD
- www.xml-cml.org/dtdschema/index.html
- CML Browser and Authoring Environment
- www.xml-cml.org/jumbo.html
- CML Resources
- www.xml-cml.org/chimeral/index.html
- see sample documents here
- some require plug-in downloads, can be slow
51. Wireless ML
- Allows web pages to be displayed on mobile devices
- WML works with WAP to deliver the content
- Underlying model: a Deck of Cards that the user can sift through
- WAP/WML website
- www.wapforum.org
- WML DTD
- www.wapforum.org/DTD/wml_1.1.xml
- WAP/WML Resources
- www.oasis-open.org/cover/wap-wml.html
- www.w3scripts.com/wap (tutorial on WML, also see WAP Demo)
52. Scalable Vector Graphics
- Describing vector graphics data for use over the web
- Rendering is done in the browser
- Bandwidth demands lower, scaling easier
- SVG website
- www.w3.org/Graphics/SVG
- SVG Plug-Ins
- www.adobe.com/svg
- SVG Resources
- www.irt.org/articles/js176 (1999 article and good, brief tutorial)
- planet.svg (an example from Deitel)
53. Bean ML
- Describing software components such as Java Beans
- Defines how the components are interconnected and can be used
- Bean ML Specs and Tools
- www.alphaworks.ibm.com/aw.nsf/techmain/bml
- Bean ML Resources
- www.oasis-open.org/cover/beanML.html
- With Bean ML
- You can mark up beans using Bean ML
- And invoke different operations on Beans
- Includes BML Scripting Framework
54. XBRL
- Extensible Business Reporting Language
- Capturing and representing financial and accounting information
- Variety of situations
- e.g. publishing reports, extracting data for analysis, regulatory forms, etc.
- Initiated under the direction of AICPA
- XBRL website
- www.xbrl.org
- XBRL DTDs and Schemas
- http://www.xbrl.org/Core/2000-07-31/default.htm
- Demos and Tools
- http://www.xbrl.org/Demos/demos.htm
- http://www.xbrl.org/Tools.htm
55. News ML
- Designed to be media-independent
- Initiated by International Press Telecommunications Council
- Enables tracking of news stories over time
- NewsML website
- www.newsml.org
- NewsML DTD
- http://www.oasis-open.org/cover/newsML.html
- SportsML DTD: derived from NewsML DTD
- http://xml.coverpages.org/sportsML.html
56. cXML
- CommerceXML from Ariba plus 40 other companies
- cXML website
- www.cxml.org
- Primary Set of Tools/Implementations to support cXML
- http://www.ariba.com/solutions/solutions_overview.cfm
- See also Whitepapers link explaining how these can be used for
- E-procurement
- E-fulfillment
- And others ...
57. xCBL
- xCBL from Microsoft, SAP, Sun
- xCBL website
- www.xcbl.org
- Marketed as XML component library for B2B e-commerce
- Available Resources (see internal links)
- DTDs and Schemas
- XDK SOX Parser and an XSLT Engine
- Example Documents
58. ebXML
- UN/CEFACT: the United Nations body whose mandate covers worldwide policy and technical development in the area of trade facilitation and electronic business.
- www.uncefact.org
- ebXML website
- www.ebxml.org
- Current Endorsements
- http://www.ebxml.org/endorsements.htm
- Still needs buy-in from the larger IS/IT vendors
- Related Effort: RosettaNet
- http://www.rosettanet.org/rosettanet/Rooms/DisplayPages/LayoutInitial
- Business Processes for IT, Component and Chip companies
59. Conclusion
- Overview
- Syntax and Structure
- The XML Alphabet Soup
- XML as a meta-language
60. Resources
- http://www.xml.com/
- http://www.w3.org/xml/
- http://www.w3schools.com/
- http://msdn.microsoft.com/xml/