Title: II. XML Data Management
1 II. XML Data Management
A - XML refresher, using material from A. Silberschatz and M. Sapossnek
B - XML data management (1): query languages XPath, XQuery, SQLX
C - Mapping XML data to databases, native XML data management
2 What is XML?
- Acronym for eXtensible Markup Language
- Syntax for structuring data and documents in human-readable form
- THE "syntax of the Web"
- Meta language for defining languages
- Basis of many extensions
  - Namespaces
  - Stylesheets
  - Hyperlinks
  - Schemata
- Standardized by W3C: http://www.w3.org/TR/REC-xml
3 What XML is not
- Not a protocol
  - Language for describing data
  - Used as data format in protocols
  - Protocols may be syntactically defined by XML
- Not a programming language, but
  - XML documents may contain code fragments
  - New languages allow for XML code as part of the language (Xen, an MS extension of C#)
  - Some XML extensions have superimposed programming-language semantics, e.g. rule semantics in XSLT
- No magic semantics
  - Interpretation by humans, applications, standards derived from XML
4 Why XML?
- Not a question any more, since widely adopted
- Simple
- Extensible
- Easy to process
- Easy to generate
- Data interchange is critical for networked applications
"XML will be the ASCII of the Web: basic, essential, unexciting" (Tim Bray)
... it is already
5 XML example
- Pre-XML representation of data
  PO-1234,CUST001,X9876,5,14.98
- XML representation of the same data
6 XML example
[Figure: tree diagram with root PURCHASE_ORDER, child nodes PO_NUM, Cust_ID, QUNTY and leaf values PO-1234, CUST001]
XML documents
- tree structured
- data and metadata in the same document (as opposed to RDBS)
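The step from the flat record to a tree can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names PO_NUM, Cust_ID and QUNTY are taken from the slide, while ITEM_NUM and PRICE, and the assignment of the last three values to fields, are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Flat, pre-XML record from the slide: the meaning of each field is implicit.
record = "PO-1234,CUST001,X9876,5,14.98"

# PO_NUM, Cust_ID and QUNTY appear on the slide; ITEM_NUM and PRICE are
# hypothetical names for the two fields the slide does not label.
field_names = ["PO_NUM", "Cust_ID", "ITEM_NUM", "QUNTY", "PRICE"]

values = next(csv.reader(io.StringIO(record)))

# Build the tree: one root element, one child element per field.
order = ET.Element("PURCHASE_ORDER")
for name, value in zip(field_names, values):
    ET.SubElement(order, name).text = value

# Data and metadata (the tags) now travel in the same document.
xml_text = ET.tostring(order, encoding="unicode")
print(xml_text)
```

The resulting document carries its own field names, which is the point of the slide: a receiver no longer needs out-of-band knowledge of the column order.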
7 XML Usage
- Two basic types of XML usage
  - Document centric (document oriented)
    - structuring a digital document, including logical layout
    - primary focus of SGML, the predecessor of XML
  - Data centric
    - description of data in a self-describing form for later processing
- Distinction not totally clear
  - See the purchase order example: if typical document characteristics were included (company address, customer address, date, company logo), it would be a document-oriented usage of XML
8 Document centric XML documents: example
  <Product>
    <Name>Adjustable open-end wrench</Name>
    <Developer>Full Fabrication Labs, Inc.</Developer>
    <Summary>Large, adjustable wrench</Summary>
    <Description>
      <Para>The monkey wrench is made of first-class steel and has a rubberized handle. The jaw opening ranges from 0 to 32 mm.</Para>
      <Para>You can.....</Para>
      <List>
        <Item><Link URL="Order.html">Order</Link></Item>
        <Item><Link URL="Wrenches.htm">View other tools</Link></Item>
        <Item><Link URL="catalog.zip">Download the catalog</Link></Item>
      </List>
      <Para>The wrench costs 15.33 euros incl. VAT. If you order now, you will additionally receive our worthless hobbyist's primer.</Para>
    </Description>
  </Product>
Typical: long text elements
9 Data centric XML documents: example
  <Orders>
    <SalesOrder SONumber="12345">
      <Customer CustNumber="543">
        <CustName>ABC Industries</CustName>
        <Street>123 Main St.</Street>
        <City>Chicago</City>
        ....
      </Customer>
      <Line LineNumber="1">
        <Part PartNumber="123">
          <Description>
            <p><b>Turkey wrench</b><br />
            Stainless steel, one-piece construction, lifetime guarantee.</p>
          </Description>
          <Price>9.95</Price>
        </Part>
        <Quantity>10</Quantity>
      </Line>
      .......
    </SalesOrder>
  </Orders>
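A data-centric document is typically consumed by a program rather than read by a person. The following sketch shows one plausible way to do that with Python's standard ElementTree module; it uses a shortened copy of the sales order (elided parts left out) and computes an order total, which is an illustrative task not taken from the slides.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

# Shortened copy of the sales order from the slide; elided parts are omitted.
doc = """
<Orders>
  <SalesOrder SONumber="12345">
    <Customer CustNumber="543">
      <CustName>ABC Industries</CustName>
    </Customer>
    <Line LineNumber="1">
      <Part PartNumber="123">
        <Price>9.95</Price>
      </Part>
      <Quantity>10</Quantity>
    </Line>
  </SalesOrder>
</Orders>
"""

root = ET.fromstring(doc)

# All values arrive as strings; the application converts them itself.
totals = {}
for sales_order in root.findall("SalesOrder"):
    total = Decimal("0")
    for line in sales_order.findall("Line"):
        price = Decimal(line.findtext("Part/Price"))
        quantity = int(line.findtext("Quantity"))
        total += price * quantity
    totals[sales_order.get("SONumber")] = total

print(totals)
```

Note that the conversion to Decimal and int happens in the application: the document itself says nothing about data types.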
10 XML Syntax
- One, and only one, root element
- Sub-elements must be properly nested
  - A tag must end within the tag in which it was started
- Attributes are optional
  - Attribute values must be enclosed in double quotes ("...") or single quotes ('...')
  - No data type but 'string'
- Processing instructions are optional
- XML is case-sensitive
  - <tag> and <TAG> are not the same type of element
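These rules can be tried out against any XML parser. A minimal sketch, assuming Python's standard ElementTree parser as the checker; the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One sample per syntax rule on the slide.
samples = {
    "single root, properly nested": "<a><b>x</b></a>",
    "two root elements": "<a/><b/>",
    "improper nesting": "<a><b></a></b>",
    "unquoted attribute value": "<a id=1/>",
    "case mismatch in tags": "<tag></TAG>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```

Only the first sample is accepted; each of the others violates exactly one of the rules listed above.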
11 Why hierarchical "data model"?
- Hierarchies (nesting) in databases? Why not?
  - REDUNDANCY!
  - Multiple items, customers, occur multiple times in different orders
  - Normalization replaces redundancies by foreign keys
  - OO / OR databases??
- Nesting useful in data transfer
  - External application does not have access to foreign key / to database.
12 XML Attributes vs Elements
- Distinction between subelement and attribute
  - In the context of documents
    - attributes are part of markup
    - subelement contents are part of the basic document contents
  - In the context of data representation, the difference is unclear and may be confusing
- Same information can be represented in two ways
  - <account account-number="A-101"> .... </account>
  - <account>
      <account-number>A-101</account-number>
    </account>
- Suggestion: use attributes for identifiers of elements, use subelements for contents
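For a consuming program the two representations differ only in how the value is reached. A small sketch, assuming ElementTree; the helper name account_number is hypothetical.

```python
import xml.etree.ElementTree as ET

# The same information, once as an attribute and once as a subelement.
as_attribute = ET.fromstring('<account account-number="A-101"/>')
as_subelement = ET.fromstring(
    "<account><account-number>A-101</account-number></account>"
)


def account_number(account):
    """Read the account number from either representation (hypothetical helper)."""
    value = account.get("account-number")  # attribute form
    if value is None:
        value = account.findtext("account-number")  # subelement form
    return value.strip() if value is not None else None


print(account_number(as_attribute), account_number(as_subelement))
```

Having to support both forms is exactly the confusion the slide warns about, which is why a convention (identifiers as attributes, contents as subelements) helps.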
13 How to use XML data?
[Figure: application with XML generator -> XML document -> XML parser with standard interfaces (DOM, SAX) -> receiving application]
How does the application know about
- syntactical correctness
- data semantics?
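The two standard interfaces named in the figure differ in style: DOM hands the application a complete tree, SAX delivers a stream of events. A minimal sketch with Python's standard library; the tiny order document and its element names are hypothetical.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical document used only for this comparison.
doc = '<order number="PO-1234"><item>X9876</item><item>Y1111</item></order>'

# DOM: the parser builds the whole tree in memory, then we navigate it.
dom = xml.dom.minidom.parseString(doc)
dom_items = [node.firstChild.data for node in dom.getElementsByTagName("item")]


# SAX: the parser calls back for each start tag, text chunk and end tag.
class ItemHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it.
        if self._in_item:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "item":
            self.items.append("".join(self._buffer))
            self._in_item = False


handler = ItemHandler()
xml.sax.parseString(doc.encode("utf-8"), handler)
sax_items = handler.items

print(dom_items, sax_items)
```

Both interfaces report a parse error for documents that are not well-formed, which answers the first question on the slide; neither of them knows anything about what the data means.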
14 Correct or not correct?
15 Correctness of XML documents
- Syntactic correctness
  - Conformance to XML syntax
  - A document structured according to XML syntax is well-formed
  - Compare: syntax checker for a program
- Semantic correctness
  - Given: a meta-level description of XML documents, either a Document Type Definition (DTD) or an XML Schema
  - A document is valid with respect to a DTD (Schema) if all definitions and restrictions have been fulfilled
  - Documents without a DTD are allowed; applications must then know what is meant
- What is semantics??
  - Interpretation of tags is a matter of humans and/or the application program: <xyz> could mean "book title" or "first name" or ...
16 XML Namespaces
- Part of XML's extensibility
- Allow autonomous users to differentiate between tags of the same name (using a prefix)
  - Frees author to focus on the data and decide how to best describe it
  - Allows multiple XML documents from multiple authors to be merged
17 Namespace
- Examples
  <BOOK xmlns:bk="http://www.bookstuff.org/bookinfo">
    <bk:TITLE>All About XML</bk:TITLE>
    <bk:AUTHOR>Joe Developer</bk:AUTHOR>
    <bk:PRICE currency="US Dollar">19.99</bk:PRICE>
    ...
- No prefix: all elements belong to the same namespace
  <BOOK xmlns="http://www.bookstuff.org/bookinfo">
    <TITLE>All About XML</TITLE>
    <AUTHOR>Joe Developer</AUTHOR>
    ...
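How a parser sees the two variants can be checked directly. The sketch below assumes ElementTree, which reports each element name in the form {namespace-URI}local-name; the documents are shortened, closed copies of the two BOOK examples.

```python
import xml.etree.ElementTree as ET

NS = "http://www.bookstuff.org/bookinfo"

# Prefixed variant: only elements written with the bk: prefix are in the namespace.
prefixed = ET.fromstring(
    '<BOOK xmlns:bk="http://www.bookstuff.org/bookinfo">'
    "<bk:TITLE>All About XML</bk:TITLE>"
    "</BOOK>"
)

# Default-namespace variant: BOOK and all unprefixed descendants are in the namespace.
default = ET.fromstring(
    '<BOOK xmlns="http://www.bookstuff.org/bookinfo">'
    "<TITLE>All About XML</TITLE>"
    "</BOOK>"
)

prefixed_tags = [prefixed.tag, prefixed[0].tag]
default_tags = [default.tag, default[0].tag]
print(prefixed_tags)
print(default_tags)

# The prefix itself is arbitrary: a query may bind any prefix to the same URI.
title = prefixed.find("b:TITLE", {"b": NS}).text
print(title)
```

In the prefixed variant the BOOK element itself is in no namespace, because it carries no prefix and no default namespace is declared.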
18 DTD and XML schema
- Type of XML document defined as
  - DTD - not expressible in XML syntax
  - XML schema
- Document Type Definition (DTD)
  - Does not constrain types: all values are strings in XML
  - Syntax
    - <!ELEMENT elem (subelement-spec)>
    - <!ATTLIST elem (attribute-specs)>
19 DTD elements and attributes
- Example (element declarations)
  - <!ELEMENT depositor (customer-name, account-number)>
  - <!ELEMENT customer-name (#PCDATA)>
  - <!ELEMENT account-number (#PCDATA)>
- Subelements can be specified as
  - names of elements
  - #PCDATA (parsed character data), i.e., character strings
  - EMPTY (no subelements) or ANY (anything can be a subelement)
- Subelement specification may have regular expressions
  - <!ELEMENT bank ((account | customer | depositor)+)>
  - Notation
    - "|" alternatives
    - "+" 1 or more occurrences
    - "?" 0 or 1 occurrence
    - "*" 0 or more occurrences
20 DTD example
  <!DOCTYPE bank [
    <!ELEMENT bank ((account | customer | depositor)+)>
    <!ELEMENT account (account-number, branch-name, balance)>
    <!ELEMENT customer (customer-name, customer-street, customer-city)>
    <!ELEMENT depositor (customer-name, account-number)>
    <!ELEMENT account-number (#PCDATA)>
    <!ELEMENT branch-name (#PCDATA)>
    <!ELEMENT balance (#PCDATA)>
    <!ELEMENT customer-name (#PCDATA)>
    <!ELEMENT customer-street (#PCDATA)>
    <!ELEMENT customer-city (#PCDATA)>
  ]>
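Because a content model is a regular expression over child element names, its effect can be imitated with an ordinary regex engine. The sketch below is not a DTD validator: the content models of the bank DTD are translated by hand into Python regular expressions, and the sample documents and their values are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models of the bank DTD. Every child name is
# followed by one space so that names cannot run into each other.
CONTENT_MODELS = {
    "bank": r"((account|customer|depositor) )+",
    "account": r"account-number branch-name balance ",
    "customer": r"customer-name customer-street customer-city ",
    "depositor": r"customer-name account-number ",
}


def conforms(element):
    """Check an element and its descendants against CONTENT_MODELS."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        # Everything else is treated as (#PCDATA): no subelements allowed.
        return len(element) == 0
    children = "".join(child.tag + " " for child in element)
    if re.fullmatch(model, children) is None:
        return False
    return all(conforms(child) for child in element)


# Hypothetical documents: one follows the DTD, one swaps two subelements.
good = ET.fromstring(
    "<bank><account><account-number>A-101</account-number>"
    "<branch-name>Downtown</branch-name><balance>500</balance></account>"
    "<depositor><customer-name>Hayes</customer-name>"
    "<account-number>A-101</account-number></depositor></bank>"
)
bad = ET.fromstring(
    "<bank><account><branch-name>Downtown</branch-name>"
    "<account-number>A-101</account-number><balance>500</balance></account></bank>"
)

print(conforms(good), conforms(bad))
```

A real validating parser does this work from the DTD itself; the sketch only makes visible why the comma (sequence), "|" and "+" operators matter.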
21 DTD attributes
- Attribute specification: for each attribute
  - Name
  - Type of attribute
    - CDATA
    - ID (identifier) or IDREF (ID reference) or IDREFS
      - more on this later
  - Whether
    - mandatory (#REQUIRED)
    - has a default value (value),
    - or neither (#IMPLIED)
- Examples
  - <!ATTLIST account acct-type CDATA "checking">
  - <!ATTLIST customer
      customer-id ID #REQUIRED
      accounts IDREFS #REQUIRED>
22 DTD attribute ID
- At most one attribute of type ID per element
- The ID attribute value of each element in an XML document must be distinct
  - The ID attribute value is an object identifier
- An attribute of type IDREF must contain the ID value of an element in the same document
- An attribute of type IDREFS contains a set of (0 or more) ID values. Each of them must be the ID value of an element in the same document
- ID, IDREF, IDREFS do not designate a particular domain (no type!)
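The ID rules amount to two checks: uniqueness of ID values and resolvability of references. A minimal sketch of those checks follows. It does not read a DTD; the attribute typing is written out by hand, the customer attributes follow the ATTLIST example, and the account attributes account-number (ID) and owners (IDREFS) as well as all sample values are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Attribute types as a DTD might declare them (hand-written, see lead-in).
ID_ATTRS = {"customer": "customer-id", "account": "account-number"}
IDREFS_ATTRS = {"customer": "accounts", "account": "owners"}


def check_ids(root):
    """Return a list of violations of the ID / IDREFS rules."""
    errors = []
    seen = set()
    # Pass 1: every ID value must be distinct within the document.
    for elem in root.iter():
        id_attr = ID_ATTRS.get(elem.tag)
        if id_attr and elem.get(id_attr) is not None:
            value = elem.get(id_attr)
            if value in seen:
                errors.append(f"duplicate ID {value}")
            seen.add(value)
    # Pass 2: every reference must name an ID that exists in the document.
    for elem in root.iter():
        ref_attr = IDREFS_ATTRS.get(elem.tag)
        if ref_attr and elem.get(ref_attr) is not None:
            for ref in elem.get(ref_attr).split():
                if ref not in seen:
                    errors.append(f"dangling IDREF {ref}")
    return errors


ok_doc = ET.fromstring(
    '<bank><account account-number="A-401" owners="C100"/>'
    '<customer customer-id="C100" accounts="A-401"/></bank>'
)
dangling_doc = ET.fromstring(
    '<bank><customer customer-id="C100" accounts="A-999"/></bank>'
)
# No domain: an account whose owners attribute points at an account passes.
untyped_doc = ET.fromstring(
    '<bank><account account-number="A-401" owners="A-401"/></bank>'
)

print(check_ids(ok_doc), check_ids(dangling_doc), check_ids(untyped_doc))
```

The third document illustrates the last bullet: since references carry no domain, nothing prevents an owners attribute from pointing at an account instead of a customer.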
23 DTD declaration
- External DTD declaration
  <?xml version="1.0"?>
  <!DOCTYPE bank SYSTEM "http://www.x-ag.de/banks.dtd">
  <bank> ... </bank>
- Internal DTD declaration
  <!DOCTYPE custDesc [
    <!ELEMENT custDesc (#PCDATA)>
  ]>
  <custDesc> consumer rights protagonist </custDesc>
- Mixed usage
  <!DOCTYPE bank SYSTEM "http://www.x-ag.de/banks.dtd" [
    <!ATTLIST bank Descr CDATA #REQUIRED>
  ]>
  <bank Descr=" mostly private customers and ATM"> ... </bank>
24 DTD limits
- No typing of text elements and attributes
  - All values are strings, no integers, reals, etc.
- Difficult to specify unordered sets of subelements
  - Order is usually irrelevant in databases
  - (A | B)* allows specification of an unordered set, but
    - cannot ensure that each of A and B occurs only once
  - How to express subelements b, c and d of a in arbitrary order?
    <!ELEMENT a ((b,c,d) | (c,b,d) | (b,d,c) | ...)>
- IDs and IDREFs are untyped
  - The owners attribute of an account may contain a reference to another account, which is meaningless
  - The owners attribute should ideally be constrained to refer to customer elements
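The cost of the enumeration workaround is easy to quantify: n subelements in arbitrary order need n! alternatives. A short sketch that generates such a declaration; the function name is hypothetical.

```python
from itertools import permutations


def unordered_content_model(parent, children):
    """Spell out every ordering of the children as one DTD alternative."""
    alternatives = ["(" + ",".join(order) + ")" for order in permutations(children)]
    return f"<!ELEMENT {parent} ({' | '.join(alternatives)})>"


declaration = unordered_content_model("a", ["b", "c", "d"])
print(declaration)

# Number of alternatives for 3, 5 and 8 subelements.
sizes = [sum(1 for _ in permutations(range(n))) for n in (3, 5, 8)]
print(sizes)
```

With three subelements the declaration has 6 alternatives; with eight it would already need 40320, which is why DTDs are a poor fit for order-free database records.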
25 XML Schema
- XML Schema (XSD): a much more expressive schema language than DTDs
  - Typing of values
    - e.g. integer, string, etc.
    - constraints on min/max values
  - User-defined types
  - Specified in XML syntax, unlike DTDs
    - more standard representation, but verbose
  - Namespace support
  - Many more features
    - list types, uniqueness and foreign key constraints, inheritance
    - ability to map to RDB
- Significantly more complicated than DTD syntax
- Use of XSD recommended
26 XSD example (from Silberschatz)
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="bank" type="BankType"/>
    <xsd:element name="account">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="account-number" type="xsd:string"/>
          <xsd:element name="branch-name" type="xsd:string"/>
          <xsd:element name="balance" type="xsd:decimal"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
    ..... definitions of customer and depositor ....
    <xsd:complexType name="BankType">
      <xsd:sequence>
        <xsd:element ref="account" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element ref="customer" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element ref="depositor" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:schema>
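Since a schema is itself an XML document, it can be read with the same tools as the data it describes. The sketch below assumes ElementTree and a shortened copy of the account declaration; it only lists the declared subelements and their types and does not validate anything.

```python
import xml.etree.ElementTree as ET

# ElementTree writes namespaced names as {namespace-URI}local-name.
XSD = "{http://www.w3.org/2001/XMLSchema}"

# Shortened copy of the schema: only the account declaration.
schema_text = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="account">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="account-number" type="xsd:string"/>
        <xsd:element name="branch-name" type="xsd:string"/>
        <xsd:element name="balance" type="xsd:decimal"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""

schema = ET.fromstring(schema_text)

# Locate the declaration of account, then collect its typed subelements.
account = schema.find(f"{XSD}element[@name='account']")
declared_types = {
    child.get("name"): child.get("type")
    for child in account.iter(f"{XSD}element")
    if child.get("type") is not None
}
print(declared_types)
```

This is the practical meaning of "specified in XML syntax, unlike DTDs": no separate parser is needed to inspect or generate a schema.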
27 Using XML
- Data exchange?
- Data management
  - Store, retrieve, query large document sets efficiently
  - Today's solutions
    - Mapping to RDB / ORDB / OODB
    - "Native" XML data management (not necessarily very different from storing in a conventional DB)
- Standardized data description: different extensions and applications
  - Bioinformatic Sequence Markup Language (BSML)
  - MathML
  - Scalable Vector Graphics (SVG)
  - ... and many, many more
  - Resource description on the web (RDF)
28 Using XML: RDF with XML syntax
[Figure: RDF model with the labels www.me.de/fritz, Homepage, Fritz Müller, Creator]
Many of these triples form a graph
29 Using XML
- Layout of documents?
  - XML documents have logical structure
  - Layout structure needed for output
  - Use a transformation language to describe device-specific transformations
Transformation into all kinds of languages (HTML, PDF, ...) on all kinds of devices
30 XML transformation
- XSLT: the language used for converting XML documents into other forms
  - Describes how the document is transformed
  - Expressed as an XML document (.xsl)
  - Template rules
    - Patterns match nodes in source document
    - Templates instantiated to form part of result document
  - XPath for querying, sorting, etc.
- XSL-FO: language for describing layout
- XSL = XSLT + XPath + XSL-FO
31 XML transformation example (1)
  <sales>
    <summary>
      <heading>Scootney Publishing</heading>
      <subhead>Regional Sales Report</subhead>
      <description>Sales Report</description>
    </summary>
    <data>
      <region>
        <name>West Coast</name>
        <quarter number="1" books_sold="24000" />
        <quarter number="2" books_sold="38600" />
        <quarter number="3" books_sold="44030" />
        <quarter number="4" books_sold="21000" />
      </region>
      ...
    </data>
  </sales>
32 XML transformation example (2)
- XSL style sheet - mapping to HTML
  <xsl:param name="low_sales" select="21000"/>
  <BODY>
    <h1><xsl:value-of select="//summary/heading"/></h1>
    ...
    <table><tr><th>Region\Quarter</th>
      <xsl:for-each select="//data/region[1]/quarter">
        <th>Q<xsl:value-of select="@number"/></th>
      </xsl:for-each>
      ...
      <xsl:for-each select="//data/region">
        <tr><th><xsl:value-of select="name"/></th>
          <xsl:for-each select="quarter">
            <td><xsl:choose>
              <xsl:when test="number(@books_sold &lt; $low_sales)">color:red</xsl:when>
              <xsl:otherwise>color:green</xsl:otherwise></xsl:choose>
              <xsl:value-of select="format-number(@books_sold,'###,###')" />
            </td>
          ...
          <td><xsl:value-of select="format-number(sum(quarter/@books_sold),'###,###')"/>
The values of the select and test attributes are XPath expressions (XPath: query language on document trees)
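To see what the path expressions in the style sheet select, the same traversal can be written out in Python. This is not XSLT, only a sketch of the selection logic under stated assumptions: it uses ElementTree's limited XPath subset, a shortened copy of the sales document, and a strict "less than" comparison against low_sales.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sales document (only the West Coast region).
sales = ET.fromstring("""
<sales>
  <summary><heading>Scootney Publishing</heading></summary>
  <data>
    <region>
      <name>West Coast</name>
      <quarter number="1" books_sold="24000" />
      <quarter number="2" books_sold="38600" />
      <quarter number="3" books_sold="44030" />
      <quarter number="4" books_sold="21000" />
    </region>
  </data>
</sales>
""")

LOW_SALES = 21000  # plays the role of the low_sales parameter

# Counterpart of select="//summary/heading"
heading = sales.findtext(".//summary/heading")

rows = []
# Counterpart of for-each select="//data/region"
for region in sales.findall(".//data/region"):
    cells = []
    # Counterpart of the nested for-each select="quarter"
    for quarter in region.findall("quarter"):
        sold = int(quarter.get("books_sold"))  # @books_sold
        color = "red" if sold < LOW_SALES else "green"
        cells.append((quarter.get("number"), f"{sold:,}", color))
    # Counterpart of sum(quarter/@books_sold)
    total = sum(int(q.get("books_sold")) for q in region.findall("quarter"))
    rows.append((region.findtext("name"), cells, f"{total:,}"))

print(heading)
print(rows)
```

The style sheet expresses the same loops declaratively and emits HTML directly, which is what makes XSLT convenient for device-specific output.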
33 XML transformation example (2)