II. XML Data Management - PowerPoint PPT Presentation

Slides: 34
Provided by: schw75

1
II. XML Data Management
A - XML refresher, using material from A. Silberschatz and M. Sapossnek
B - XML data management (1): query languages XPath, XQuery, SQLX
C - Mapping XML data to databases; native XML data management

2
What is XML?
  • Acronym for eXtensible Markup Language
  • Syntax for structuring data and documents in
    human-readable form
  • THE "Syntax of the Web"
  • Meta-language for defining languages
  • Basis of many extensions
      • Namespaces
      • Stylesheets
      • Hyperlinks
      • Schemata
  • Standardized by W3C: http://www.w3.org/TR/REC-xml

3
What XML is not
  • Not a protocol
      • Language for describing data
      • Used as a data format in protocols
      • Protocols may be syntactically defined in XML
  • Not a programming language, but
      • XML documents may contain code fragments
      • New languages allow XML code as part of the
        language (Xen, an MS extension of C#)
      • Some XML extensions have superimposed PL
        semantics, e.g. rule semantics in XSLT
  • No magic semantics
      • Interpretation by humans, applications,
        standards derived from XML

4
Why XML?
  • Not a question any more, since XML is widely adopted
  • Simple
  • Extensible
  • Easy to process
  • Easy to generate
  • Data interchange critical for networked
    applications

"XML will be the ASCII of the Web: basic,
essential, unexciting" (Tim Bray)
... it is already.
5
XML example
  • Pre-XML representation of data:
    PO-1234,CUST001,X9876,5,14.98
  • XML representation of the same data
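
As a minimal sketch of the step from the flat record to XML, the following Python turns the comma-separated line into a small element tree. The names PURCHASE_ORDER, PO_NUM, Cust_ID and QUNTY appear in the tree diagram of this example; ITEM_NUM and PRICE, and the assignment of the last three fields to item, quantity and price, are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# PO_NUM, Cust_ID and QUNTY are taken from the slide's tree diagram;
# ITEM_NUM and PRICE are assumed names for the remaining fields.
FIELDS = ["PO_NUM", "Cust_ID", "ITEM_NUM", "QUNTY", "PRICE"]


def csv_line_to_xml(line):
    """Wrap each comma-separated value in an element named after its field."""
    root = ET.Element("PURCHASE_ORDER")
    for name, value in zip(FIELDS, line.split(",")):
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


xml_text = csv_line_to_xml("PO-1234,CUST001,X9876,5,14.98")
print(xml_text)
```

The values are unchanged; what the XML form adds is the metadata (the field names) travelling with the data.
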
6
XML example
  • Graphical representation

[Figure: tree diagram with the nodes PURCHASE_ORDER, PO_NUM, Cust_ID, QUNTY and the values PO-1234, CUST001]

XML documents
  • are tree structured
  • hold data and metadata in the same document (as
    opposed to RDBS)
7
XML Usage
  • Two basic types of XML usage
      • Document centric (document oriented)
          • structuring a digital document, including its logical
            layout
          • primary focus of SGML, the predecessor of XML
      • Data centric
          • description of data in a self-describing form for
            later processing
  • Distinction not totally clear
      • See the purchase order example: if typical document
        characteristics were included (company address, customer
        address, date, ..., company logo), it would be a
        document-oriented usage of XML

8
Document centric XML documents: example

    <Product>
      <Name>Adjustable wrench</Name>
      <Developer>Full Fabrication Labs, Inc.</Developer>
      <Summary>Large, adjustable wrench</Summary>
      <Description>
        <Para>The adjustable wrench is made of first-class
          steel and has a rubberized handle. The jaw opening
          ranges from 0 to 32 mm.</Para>
        <Para>You can.....</Para>
        <List>
          <Item><Link URL="Order.html">Order</Link></Item>
          <Item><Link URL="Wrenches.htm">View other tools</Link></Item>
          <Item><Link URL="catalog.zip">Download the catalog</Link></Item>
        </List>
        <Para>The wrench costs 15.33 euros incl. VAT. If you
          order now, you will also receive our worthless
          hobbyist's primer.</Para>
      </Description>
    </Product>

Typical: long text elements
9
Data centric XML documents: example

    <Orders>
      <SalesOrder SONumber="12345">
        <Customer CustNumber="543">
          <CustName>ABC Industries</CustName>
          <Street>123 Main St.</Street>
          <City>Chicago</City>
          ....
        </Customer>
        <Line LineNumber="1">
          <Part PartNumber="123">
            <Description>
              <p><b>Turkey wrench</b><br />
              Stainless steel, one-piece construction,
              lifetime guarantee.</p>
            </Description>
            <Price>9.95</Price>
          </Part>
          <Quantity>10</Quantity>
        </Line>
        .......
      </SalesOrder>
    </Orders>
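
Data-centric documents are written to be processed by programs. As a rough sketch, the Python below reads a trimmed copy of the sales order (the elided parts and the description are left out, which is an assumption about what a complete document would allow) and computes the order total from price and quantity. The function name order_totals is hypothetical.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Trimmed copy of the slide's document; elided parts are simply omitted.
ORDERS = """<Orders>
  <SalesOrder SONumber="12345">
    <Customer CustNumber="543">
      <CustName>ABC Industries</CustName>
    </Customer>
    <Line LineNumber="1">
      <Part PartNumber="123">
        <Price>9.95</Price>
      </Part>
      <Quantity>10</Quantity>
    </Line>
  </SalesOrder>
</Orders>"""


def order_totals(xml_text):
    """Return {SONumber: sum of price * quantity over all order lines}."""
    totals = {}
    for order in ET.fromstring(xml_text).findall("SalesOrder"):
        total = Decimal("0")
        for line in order.findall("Line"):
            # XML itself carries only strings; the application converts them.
            price = Decimal(line.findtext("Part/Price"))
            quantity = int(line.findtext("Quantity"))
            total += price * quantity
        totals[order.get("SONumber")] = total
    return totals


totals = order_totals(ORDERS)
```
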
10
XML Syntax
  • One, and only one, root element
  • Sub-elements must be properly nested
      • A tag must end within the element in which it was
        started
  • Attributes are optional
      • Attribute values must be enclosed in " or '
      • No data type but 'string'
  • Processing instructions are optional
  • XML is case-sensitive
      • <tag> and <TAG> are not the same type of element
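
The syntax rules above can be tried out with any XML parser. A small sketch using Python's standard xml.etree.ElementTree, whose parser rejects documents that break these rules; the helper name is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = {
    '<order id="1"><item/></order>': "one root, nested properly, quoted attribute",
    "<a/><b/>": "two root elements",
    "<a><b></a></b>": "improper nesting",
    "<a x=1/>": "attribute value without quotes",
    "<tag></TAG>": "case mismatch between start and end tag",
}
for text, why in samples.items():
    print(is_well_formed(text), "-", why)
```

Only the first sample is accepted; each of the others violates one of the rules listed above.
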

11
Why hierarchical "data model"?
  • Hierarchies (nesting) in databases? Why not?
      • REDUNDANCY!
      • Multiple items, customers, ... occur multiple times
        in different orders
      • Normalization replaces redundancies by foreign
        keys
      • OO / OR databases??
  • Nesting is useful in data transfer
      • An external application does not have access to
        the foreign keys / to the database
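
To make the redundancy argument concrete, here is a sketch that takes nested order data in which the same customer is repeated and splits it into two "tables" linked by a key. The second order (SONumber 12346) is invented for the illustration, and the function name normalize is hypothetical.

```python
import xml.etree.ElementTree as ET

# The customer is repeated inside every order: redundancy through nesting.
NESTED = """<Orders>
  <SalesOrder SONumber="12345">
    <Customer CustNumber="543"><CustName>ABC Industries</CustName></Customer>
  </SalesOrder>
  <SalesOrder SONumber="12346">
    <Customer CustNumber="543"><CustName>ABC Industries</CustName></Customer>
  </SalesOrder>
</Orders>"""


def normalize(xml_text):
    """Split nested orders into a customer table and an order table."""
    customers = {}  # CustNumber -> CustName, each customer stored once
    orders = []     # (SONumber, CustNumber); CustNumber acts as foreign key
    for order in ET.fromstring(xml_text).findall("SalesOrder"):
        customer = order.find("Customer")
        number = customer.get("CustNumber")
        customers[number] = customer.findtext("CustName")
        orders.append((order.get("SONumber"), number))
    return customers, orders


customers, orders = normalize(NESTED)
```

The nested form is what one would ship to an external application; the normalized form is what a relational database would store.
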

12
XML Attributes vs Elements
  • Distinction between subelement and attribute
      • In the context of documents
          • attributes are part of the markup
          • subelement contents are part of the basic document
            contents
      • In the context of data representation the
        difference is not clear, and can be confusing
          • The same information can be represented in two ways:
              <account account-number="A-101"> ... </account>
              <account>
                <account-number>A-101</account-number>
              </account>
  • Suggestion: use attributes for identifiers of
    elements, use subelements for contents
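
A short sketch of what the two representations mean for the consuming program: with Python's ElementTree the attribute form is read with get() and the subelement form with findtext(). The helper account_number is a hypothetical name, and both sample documents are completed with closing tags.

```python
import xml.etree.ElementTree as ET

AS_ATTRIBUTE = '<account account-number="A-101"></account>'
AS_SUBELEMENT = "<account><account-number>A-101</account-number></account>"


def account_number(xml_text):
    """Read the account number from either representation."""
    account = ET.fromstring(xml_text)
    value = account.get("account-number")           # attribute form
    if value is None:
        value = account.findtext("account-number")  # subelement form
    return value.strip() if value is not None else None


print(account_number(AS_ATTRIBUTE), account_number(AS_SUBELEMENT))
```
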

13
How to use XML data?
  • Basic idea

[Figure: an application with an XML generator sends an XML document to a receiving application, which reads it through an XML parser with the standard interfaces DOM and SAX]

How does the application know about
  • syntactical correctness
  • data semantics?
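
The two standard interfaces differ in how the receiving application sees the document: DOM hands over a complete tree, SAX delivers a stream of events. A minimal sketch with Python's standard xml.dom.minidom and xml.sax modules; the sample document and its balance value are assumptions for illustration.

```python
import xml.dom.minidom
import xml.sax

DOC = "<account><account-number>A-101</account-number><balance>500</balance></account>"


def element_names_dom(text):
    """DOM: parse the whole document into a tree, then walk the tree."""
    names = []

    def walk(node):
        if node.nodeType == node.ELEMENT_NODE:
            names.append(node.tagName)
        for child in node.childNodes:
            walk(child)

    walk(xml.dom.minidom.parseString(text).documentElement)
    return names


class NameCollector(xml.sax.ContentHandler):
    """SAX: the parser calls back for each start tag; no tree is built."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def element_names_sax(text):
    handler = NameCollector()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.names


print(element_names_dom(DOC))
print(element_names_sax(DOC))
```

Both functions report the same element names in document order; the difference lies in memory use and in who drives the traversal.
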
14

Correct or not correct?
15
Correctness of XML documents
  • Syntactic correctness
      • Conformance to XML syntax
      • A document structured according to XML syntax is
        well-formed
      • Compare: syntax checker for a program
  • Semantic correctness
      • Given: a meta-level description of XML documents,
        a Document Type Definition (DTD) or an XML Schema
      • A document is valid with respect to a DTD (Schema)
        if all definitions and restrictions are
        fulfilled
      • Documents without a DTD are allowed; applications must
        then know what is meant
  • What is semantics??
      • Interpretation of tags is a matter of humans
        and/or the application program: <xyz> could
        mean "book title" or "first name" or ...

16
XML Namespaces
  • Part of XML's extensibility
  • Allow autonomous users to differentiate between
    tags of the same name (using a prefix)
  • Frees author to focus on the data and decide how
    to best describe it
  • Allows multiple XML documents from multiple
    authors to be merged

17
Namespace
  • Examples

    <BOOK xmlns:bk="http://www.bookstuff.org/bookinfo">
      <bk:TITLE>All About XML</bk:TITLE>
      <bk:AUTHOR>Joe Developer</bk:AUTHOR>
      <bk:PRICE currency="US Dollar">19.99</bk:PRICE>

  • No prefix: all elements belong to the same namespace

    <BOOK xmlns="http://www.bookstuff.org/bookinfo">
      <TITLE>All About XML</TITLE>
      <AUTHOR>Joe Developer</AUTHOR>
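
How a namespace-aware parser sees the two examples can be sketched with Python's ElementTree, which expands every element name to {namespace URI}local-name. Both documents are completed with a closing BOOK tag here, which is an assumption since the examples above are cut off.

```python
import xml.etree.ElementTree as ET

URI = "http://www.bookstuff.org/bookinfo"
NS = {"bk": URI}  # the prefix used in queries is local to this program

PREFIXED = """<BOOK xmlns:bk="http://www.bookstuff.org/bookinfo">
  <bk:TITLE>All About XML</bk:TITLE>
  <bk:AUTHOR>Joe Developer</bk:AUTHOR>
  <bk:PRICE currency="US Dollar">19.99</bk:PRICE>
</BOOK>"""

DEFAULT = """<BOOK xmlns="http://www.bookstuff.org/bookinfo">
  <TITLE>All About XML</TITLE>
  <AUTHOR>Joe Developer</AUTHOR>
</BOOK>"""

prefixed = ET.fromstring(PREFIXED)
default = ET.fromstring(DEFAULT)

# With a prefix declaration only, the unprefixed BOOK element is in no namespace.
print(prefixed.tag)
# With a default namespace, BOOK and all unprefixed children are in it.
print(default.tag)
# The same namespace-qualified query finds TITLE in both documents.
print(prefixed.find("bk:TITLE", NS).text, "|", default.find("bk:TITLE", NS).text)
```
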
18
DTD and XML schema
  • Type of XML document defined as
      • DTD - not expressed in XML syntax
      • XML Schema
  • Document Type Definition (DTD)
      • Does not constrain types: all values are
        strings in XML
      • Syntax
          • <!ELEMENT elem (subelement-spec)>
          • <!ATTLIST elem (attribute-specs)>

19
DTD elements and attributes
  • Example (element declarations)
      • <!ELEMENT depositor (customer-name, account-number)>
      • <!ELEMENT customer-name (#PCDATA)>
      • <!ELEMENT account-number (#PCDATA)>
  • Subelements
      • names of elements
      • #PCDATA (parsed character data), i.e., character
        strings
      • EMPTY (no subelements) or ANY (anything can be a
        subelement)
  • Subelement specification may have regular
    expressions
      • <!ELEMENT bank ((account | customer | depositor)+)>
      • Notation
          • "|"  alternatives
          • "+"  1 or more occurrences
          • "?"  0 or 1 occurrence
          • "*"  0 or more occurrences

20
DTD example
    <!DOCTYPE bank [
      <!ELEMENT bank ((account | customer | depositor)+)>
      <!ELEMENT account (account-number, branch-name, balance)>
      <!ELEMENT customer (customer-name, customer-street, customer-city)>
      <!ELEMENT depositor (customer-name, account-number)>
      <!ELEMENT account-number (#PCDATA)>
      <!ELEMENT branch-name (#PCDATA)>
      <!ELEMENT balance (#PCDATA)>
      <!ELEMENT customer-name (#PCDATA)>
      <!ELEMENT customer-street (#PCDATA)>
      <!ELEMENT customer-city (#PCDATA)>
    ]>
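
Python's standard library parses XML but does not validate against a DTD, so the following is only a sketch of the idea behind content models: each declaration of the bank DTD is hand-translated into a regular expression over the sequence of child element names, and every element is checked against it. The table CONTENT_MODELS, the function content_errors and the sample data values are all assumptions made for this illustration; a real validator does considerably more.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated content models of the bank DTD. Each child name is
# followed by one space, so "account " cannot match "account-number ".
CONTENT_MODELS = {
    "bank": r"(account |customer |depositor )+",
    "account": r"account-number branch-name balance ",
    "customer": r"customer-name customer-street customer-city ",
    "depositor": r"customer-name account-number ",
}


def content_errors(elem):
    """Return a message for every element whose children break its model."""
    errors = []
    model = CONTENT_MODELS.get(elem.tag)
    if model is not None:
        children = "".join(child.tag + " " for child in elem)
        if re.fullmatch(model, children) is None:
            errors.append(f"<{elem.tag}>: unexpected children [{children.strip()}]")
    for child in elem:
        errors.extend(content_errors(child))
    return errors


VALID = """<bank>
  <account><account-number>A-101</account-number>
    <branch-name>Downtown</branch-name><balance>500</balance></account>
  <depositor><customer-name>Johnson</customer-name>
    <account-number>A-101</account-number></depositor>
</bank>"""

# branch-name is missing, so the account content model is violated.
INVALID = """<bank>
  <account><account-number>A-101</account-number><balance>500</balance></account>
</bank>"""

print(content_errors(ET.fromstring(VALID)))
print(content_errors(ET.fromstring(INVALID)))
```
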

21
DTD attributes
  • Attribute specification: for each attribute
      • Name
      • Type of attribute
          • CDATA
          • ID (identifier) or IDREF (ID reference) or IDREFS
              • more on this later
      • Whether
          • mandatory (#REQUIRED)
          • has a default value (value),
          • or neither (#IMPLIED)
  • Examples
      • <!ATTLIST account acct-type CDATA "checking">
      • <!ATTLIST customer
            customer-id ID     #REQUIRED
            accounts    IDREFS #REQUIRED>

22
DTD attribute ID
  • At most one attribute of type ID per element
  • The ID attribute value of each element in an XML
    document must be distinct
      • The ID attribute value is an object identifier
  • An attribute of type IDREF must contain the ID value
    of an element in the same document
  • An attribute of type IDREFS contains a set of (0 or
    more) ID values. Each of them must be the ID
    value of an element in the same document
  • ID, IDREF, IDREFS do not designate a particular
    domain (no type!)
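
The ID rules can be sketched as a small check in Python. Since the standard library does not read attribute types from a DTD, the tables ID_ATTRS and IDREFS_ATTRS below state by hand which attributes are of type ID and IDREFS; treating account-number and owners as such attributes of account is an assumption, as are the function name check_ids and the sample data.

```python
import xml.etree.ElementTree as ET

# Assumed attribute typing (normally declared in the DTD's ATTLIST).
ID_ATTRS = {"account": "account-number", "customer": "customer-id"}
IDREFS_ATTRS = {"account": "owners", "customer": "accounts"}


def check_ids(xml_text):
    """Report duplicate ID values and references to non-existent IDs."""
    root = ET.fromstring(xml_text)
    problems = []
    seen = set()
    for elem in root.iter():
        attr = ID_ATTRS.get(elem.tag)
        if attr and elem.get(attr) is not None:
            value = elem.get(attr)
            if value in seen:
                problems.append(f"duplicate ID {value}")
            seen.add(value)
    for elem in root.iter():
        attr = IDREFS_ATTRS.get(elem.tag)
        if attr:
            # An IDREFS value is a whitespace-separated list of IDs.
            for ref in (elem.get(attr) or "").split():
                if ref not in seen:
                    problems.append(f"dangling reference {ref}")
    return problems


GOOD = ('<bank><account account-number="A-101" owners="C100"/>'
        '<customer customer-id="C100" accounts="A-101"/></bank>')
BAD = ('<bank><account account-number="A-101" owners="C100 C200"/>'
       '<customer customer-id="C100" accounts="A-101"/>'
       '<customer customer-id="C100" accounts="A-999"/></bank>')
# Passes the check although an account "owns" an account: IDs are untyped.
UNTYPED = '<bank><account account-number="A-101" owners="A-101"/></bank>'

print(check_ids(GOOD), check_ids(BAD), check_ids(UNTYPED))
```
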

23
DTD declaration
  • External DTD declaration:
      <?xml version="1.0"?>
      <!DOCTYPE bank SYSTEM "http://www.x-ag.de/banks.dtd">
      <bank> ... </bank>
  • Internal DTD declaration:
      <!DOCTYPE custDesc [
        <!ELEMENT custDesc (#PCDATA)>
      ]>
      <custDesc>consumer rights protagonist</custDesc>
  • Mixed usage:
      <!DOCTYPE bank SYSTEM "http://www.x-ag.de/banks.dtd" [
        <!ATTLIST bank Descr CDATA #REQUIRED>
      ]>
      <bank Descr="mostly private customers and ATM"> ... </bank>

24
DTD limits
  • No typing of text elements and attributes
      • All values are strings, no integers, reals, etc.
  • Difficult to specify unordered sets of
    subelements
      • Order is usually irrelevant in databases
      • (A | B)* allows specification of an unordered
        set, but
          • cannot ensure that each of A and B occurs only
            once
      • How to express b, c and d in arbitrary order?
        <!ELEMENT a ((b,c,d) | (c,b,d) | (b,d,c) | ...)>
  • IDs and IDREFs are untyped
      • The owners attribute of an account may contain a
        reference to another account, which is
        meaningless
      • The owners attribute should ideally be constrained to
        refer to customer elements

25
XML Schema
  • XML Schema (XSD): a much more expressive schema
    language than DTDs
  • Typing of values
      • E.g. integer, string, etc.
      • Constraints on min/max values
  • User-defined types
  • Specified in XML syntax, unlike DTDs
      • More standard representation, but verbose
  • Namespace support
  • Many more features
      • List types, uniqueness and foreign key
        constraints, inheritance, ability to map to RDB, ...
  • Significantly more complicated than DTD syntax
  • Use of XSD recommended

26
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <xsd:element name="bank" type="BankType"/>
      <xsd:element name="account">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="account-number" type="xsd:string"/>
            <xsd:element name="branch-name" type="xsd:string"/>
            <xsd:element name="balance" type="xsd:decimal"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
      ... definitions of customer and depositor ...
      <xsd:complexType name="BankType">
        <xsd:sequence>
          <xsd:element ref="account" minOccurs="0" maxOccurs="unbounded"/>
          <xsd:element ref="customer" minOccurs="0" maxOccurs="unbounded"/>
          <xsd:element ref="depositor" minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:schema>

XSD example (from Silberschatz)
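
Because an XSD is itself an XML document, it can be read with the same tools as the data it describes. The sketch below uses Python's ElementTree to pull the declared types of the account subelements out of the schema and then applies a very rough lexical check to an instance. It is not an XSD validator: the CHECKS table covers only the two built-in types used here, it relies on the literal prefix xsd:, and the function names and sample instances are assumptions.

```python
import re
import xml.etree.ElementTree as ET

NS = {"xsd": "http://www.w3.org/2001/XMLSchema"}

# The account declaration from the XSD example.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="account">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="account-number" type="xsd:string"/>
        <xsd:element name="branch-name" type="xsd:string"/>
        <xsd:element name="balance" type="xsd:decimal"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

# Rough lexical checks for two built-in types (a tiny subset of XSD).
CHECKS = {
    "xsd:string": lambda text: True,
    "xsd:decimal": lambda text: re.fullmatch(
        r"[+-]?(\d+(\.\d*)?|\.\d+)", text.strip()) is not None,
}


def declared_types(schema_text, element_name):
    """Map each subelement name of the given element to its declared type."""
    schema = ET.fromstring(schema_text)
    decl = schema.find(f"xsd:element[@name='{element_name}']", NS)
    fields = decl.findall("xsd:complexType/xsd:sequence/xsd:element", NS)
    return {field.get("name"): field.get("type") for field in fields}


def type_errors(schema_text, instance_text):
    """Return the names of subelements whose text does not fit the type."""
    instance = ET.fromstring(instance_text)
    types = declared_types(schema_text, instance.tag)
    return [child.tag for child in instance
            if child.tag in types
            and not CHECKS[types[child.tag]](child.text or "")]


GOOD = ("<account><account-number>A-101</account-number>"
        "<branch-name>Downtown</branch-name><balance>500</balance></account>")
BAD = GOOD.replace("500", "lots")
print(type_errors(SCHEMA, GOOD), type_errors(SCHEMA, BAD))
```

A DTD could not express the second check at all, since it knows no type other than character data.
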
27
Using XML
  • Data exchange
  • Data management
      • Store, retrieve, query large document sets
        efficiently
      • Today's solutions
          • Mapping to RDB / ORDB / OODB
          • "Native" XML data management (not necessarily
            very different from storing in a conventional DB)
  • Standardized data description: different
    extensions and applications
      • Bioinformatic Sequence Markup Language (BSML)
      • MathML
      • Scalable Vector Graphics (SVG) ... and many, many
        more
      • Resource description on the Web (RDF)

28
Using XML: RDF with XML syntax
[Figure: RDF model with the labels www.me.de/fritz, Homepage, Creator and Fritz Müller]
Many of these triples form a graph
29
Using XML
  • Layout of documents?
      • XML documents have a logical structure
      • A layout structure is needed for output
      • Use a transformation language to describe
        device-specific transformations

Transformation into all kinds of languages
(HTML, PDF, ...) on all kinds of devices
30
XML transformation
  • XSLT: the language used for converting XML
    documents into other forms
      • Describes how the document is transformed
      • Expressed as an XML document (.xsl)
      • Template rules
          • Patterns match nodes in the source document
          • Templates are instantiated to form part of the
            result document
  • XPath for querying, sorting, etc.
  • XSL-FO: language for describing layout
  • XSL = XSLT + XPath + XSL-FO

31
XML transformation example (1)
  • Document

    <sales>
      <summary>
        <heading>Scootney Publishing</heading>
        <subhead>Regional Sales Report</subhead>
        <description>Sales Report</description>
      </summary>
      <data>
        <region>
          <name>West Coast</name>
          <quarter number="1" books_sold="24000" />
          <quarter number="2" books_sold="38600" />
          <quarter number="3" books_sold="44030" />
          <quarter number="4" books_sold="21000" />
        </region>
        ...
      </data>
    </sales>
32
XML transformation example (2)
  • XSL style sheet - mapping to HTML

    <xsl:param name="low_sales" select="21000"/>
    <BODY>
      <h1><xsl:value-of select="//summary/heading"/></h1>
      ...
      <table>
        <tr>
          <th>Region\Quarter</th>
          <xsl:for-each select="//data/region[1]/quarter">
            <th>Q<xsl:value-of select="@number"/></th>
          </xsl:for-each>
          ...
        <xsl:for-each select="//data/region">
          <tr>
            <th><xsl:value-of select="name"/></th>
            <xsl:for-each select="quarter">
              <td>
                <xsl:choose>
                  <xsl:when test="number(@books_sold &lt; $low_sales)">color:red;</xsl:when>
                  <xsl:otherwise>color:green;</xsl:otherwise>
                </xsl:choose>
                <xsl:value-of select="format-number(@books_sold,'###,###')"/>
              </td>
              ...
              <td><xsl:value-of select="format-number(sum(quarter/@books_sold),'###,###')"/>

The select and test values are XPath expressions (XPath: query language on document trees)
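
Python's standard library has no XSLT processor, but its ElementTree module supports a subset of XPath, which is enough to sketch the logic of the style sheet: select the heading, loop over regions and quarters, pick a colour by comparing with low_sales, and format the numbers with thousands separators. The function name report and the shape of its return value are assumptions; the sample is the sales document with the elided regions left out.

```python
import xml.etree.ElementTree as ET

SALES = """<sales>
  <summary>
    <heading>Scootney Publishing</heading>
    <subhead>Regional Sales Report</subhead>
    <description>Sales Report</description>
  </summary>
  <data>
    <region>
      <name>West Coast</name>
      <quarter number="1" books_sold="24000" />
      <quarter number="2" books_sold="38600" />
      <quarter number="3" books_sold="44030" />
      <quarter number="4" books_sold="21000" />
    </region>
  </data>
</sales>"""


def report(xml_text, low_sales=21000):
    """Return (heading, rows); each row is (region, cells, total)."""
    root = ET.fromstring(xml_text)
    rows = []
    for region in root.findall("./data/region"):
        quarters = region.findall("quarter")
        cells = []
        for quarter in quarters:
            sold = int(quarter.get("books_sold"))
            # Same comparison as the xsl:when test: below low_sales is red.
            color = "red" if sold < low_sales else "green"
            cells.append((quarter.get("number"), f"{sold:,}", color))
        total = sum(int(q.get("books_sold")) for q in quarters)
        rows.append((region.findtext("name"), cells, f"{total:,}"))
    return root.findtext("./summary/heading"), rows


heading, rows = report(SALES)
print(heading, rows)
```

Calling report(SALES, low_sales=25000) marks quarters 1 and 4 red, which corresponds to changing the xsl:param of the style sheet.
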
33
XML transformation example (2)
  • The result