XML: Extensible Markup Language - PowerPoint PPT Presentation

1
XML: Extensible Markup Language
  • FST-UMAC
  • Gong Zhiguo

2
How the Web is Today
  • HTML documents
  • all intended for human consumption
  • many generated automatically by applications

Easy to fetch any Web page, from any server, any
platform
3
Limits of the Web Today
  • Applications cannot consume HTML
  • HTML wrapper technology is brittle
  • screen scraping
  • OO technology (CORBA) requires a controlled
    environment
  • Companies merge and form partnerships, so they need
    interoperability fast

4
Paradigm Shift on the Web
  • new Web standard: XML
  • XML generated by applications
  • XML consumed by applications
  • data exchange
  • across platforms: enterprise interoperability
  • across enterprises

The Web: from a collection of documents to data and
documents
5
XML
  • a W3C standard to complement HTML
  • origins: structured text, SGML
  • motivation
  • HTML describes presentation
  • XML describes content
  • http://www.w3.org/TR/REC-xml (2/98)

6
From HTML to XML
HTML describes the presentation
7
HTML
<h1> Bibliography </h1>
<p> <i> Foundations of Databases </i> Abiteboul, Hull, Vianu
    <br> Addison Wesley, 1995
<p> <i> Data on the Web </i> Abiteboul, Buneman, Suciu
    <br> Morgan Kaufmann, 1999
8
XML
<bibliography>
  <book> <title> Foundations </title>
         <author> Abiteboul </author>
         <author> Hull </author>
         <author> Vianu </author>
         <publisher> Addison Wesley </publisher>
         <year> 1995 </year>
  </book>
</bibliography>
9
XML Terminology
  • tags: book, title, author, ...
  • start tag: <book>, end tag: </book>
  • elements: <book>...</book>, <author>...</author>
  • elements are nested
  • empty element: <red></red>, abbreviated <red/>
  • an XML document has a single root element

A well-formed XML document has matching tags
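A minimal sketch of these terms using Python's standard `xml.etree.ElementTree` parser. The document text is a cleaned-up version of the bibliography example, and `is_well_formed` is a helper name introduced here for illustration, not something from the slides.

```python
import xml.etree.ElementTree as ET

DOC = """<bibliography>
  <book>
    <title>Foundations of Databases</title>
    <author>Abiteboul</author>
    <author>Hull</author>
    <author>Vianu</author>
    <publisher>Addison Wesley</publisher>
    <year>1995</year>
  </book>
</bibliography>"""


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(DOC)          # the single root element
book = root.find("book")           # an element nested inside the root
authors = [a.text for a in book.findall("author")]

print(root.tag, authors)
print(is_well_formed(DOC))                      # matching tags
print(is_well_formed("<book><title>x</book>"))  # mismatched end tag
print(is_well_formed("<red/>"), is_well_formed("<red></red>"))
```

The parser only checks well-formedness here; it says nothing about whether the document follows any particular schema.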
10
More XML: Attributes
<book price = "55" currency = "USD">
  <title> Foundations of Databases </title>
  <author> Abiteboul </author>
  <year> 1995 </year>
</book>
attributes are alternative ways to represent data
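As a rough illustration of that point, the sketch below reads the same book both ways with Python's `xml.etree.ElementTree`: `price` and `currency` come from attributes, `year` from a child element. The variable `alt` is a hypothetical re-encoding that moves the attributes into child elements.

```python
import xml.etree.ElementTree as ET

book = ET.fromstring(
    '<book price="55" currency="USD">'
    "<title>Foundations of Databases</title>"
    "<author>Abiteboul</author><year>1995</year></book>"
)

price = int(book.get("price"))        # data stored in an attribute
currency = book.attrib["currency"]    # same, via the attribute mapping
year = int(book.findtext("year"))     # data stored in a child element

# The same information re-encoded with child elements instead of attributes.
alt = ET.Element("book")
for name, value in book.attrib.items():
    ET.SubElement(alt, name).text = value

print(price, currency, year)
print(ET.tostring(alt, encoding="unicode"))
```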
11
Query Languages: Motivation
  • granularity of the HTML Web: one file
  • granularity of Web data varies
  • single data item: get John's salary
  • entire database: get all salaries
  • aggregates: get average salary
  • need query language to define granularity
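The three granularities can be sketched against a small document. The `<staff>` data, its element names, and the salary figures are invented for illustration, and plain Python over `xml.etree.ElementTree` stands in for a real query language.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; the slides do not show a salary document.
STAFF = """<staff>
  <employee><name>John</name><salary>50000</salary></employee>
  <employee><name>Mary</name><salary>70000</salary></employee>
</staff>"""
root = ET.fromstring(STAFF)


def salary_of(name):
    """Single data item: the salary of one named employee, or None."""
    for employee in root.iter("employee"):
        if employee.findtext("name") == name:
            return int(employee.findtext("salary"))
    return None


# Entire database: every salary in the document.
all_salaries = [int(s.text) for s in root.iter("salary")]

# Aggregate: the average salary.
average = sum(all_salaries) / len(all_salaries)

print(salary_of("John"), all_salaries, average)
```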

12
XML-QL: A Query Language for XML
  • http://www.w3.org/TR/NOTE-xml-ql (8/98)
  • features
  • regular path expressions
  • patterns, templates
  • Skolem Functions
  • based on OEM data model

13
Pattern Matching in XML-QL
where <book language="french">
        <publisher> <name> Morgan Kaufmann </name> </publisher>
        <author> $a </author>
      </book> in "www.a.b.c/bib.xml"
construct $a
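The query on this slide binds a variable to each author of a French-language book published by Morgan Kaufmann. The sketch below emulates that pattern match in Python rather than running XML-QL itself; the `BIB` document and the author names in it are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the contents of bib.xml.
BIB = """<bib>
  <book language="french">
    <publisher><name>Morgan Kaufmann</name></publisher>
    <author>Dupont</author><author>Martin</author>
  </book>
  <book language="english">
    <publisher><name>Morgan Kaufmann</name></publisher>
    <author>Smith</author>
  </book>
  <book language="french">
    <publisher><name>Addison Wesley</name></publisher>
    <author>Durand</author>
  </book>
</bib>"""


def french_mk_authors(text):
    """Return every author bound by the pattern, in document order."""
    authors = []
    for book in ET.fromstring(text).iter("book"):
        if book.get("language") != "french":
            continue                      # attribute part of the pattern
        names = [n.text.strip() for n in book.findall("publisher/name") if n.text]
        if "Morgan Kaufmann" not in names:
            continue                      # nested publisher/name part
        authors.extend(a.text.strip() for a in book.findall("author") if a.text)
    return authors


print(french_mk_authors(BIB))
```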
14
Simple Constructors in XML-QL
where <book language = $l>
        <author> $a </>
      </> in "www.a.b.c/bib.xml"
construct <result> <author> $a </> <lang> $l </> </>
Note: </> abbreviates </book> or </result> or ...
Result:
<result> <author>Smith</author><lang>English</lang></result>
<result> <author>Smith</author><lang>Mandarin</lang></result>
<result> <author>Doe</author><lang>English</lang></result>
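A rough Python emulation of this where/construct query, assuming an input document that would yield the three results listed on the slide. The `BIB` text is a guess at such an input, and `construct_results` is a name introduced here; this is not an XML-QL engine.

```python
import xml.etree.ElementTree as ET

# Assumed input: one book per (author, language) pair in the slide's output.
BIB = """<bib>
  <book language="English"><author>Smith</author></book>
  <book language="Mandarin"><author>Smith</author></book>
  <book language="English"><author>Doe</author></book>
</bib>"""


def construct_results(text):
    """Build one <result> element per (author, language) binding."""
    results = []
    for book in ET.fromstring(text).iter("book"):
        lang = book.get("language")           # binds the language variable
        if lang is None:
            continue
        for author in book.findall("author"):  # binds the author variable
            result = ET.Element("result")
            ET.SubElement(result, "author").text = author.text
            ET.SubElement(result, "lang").text = lang
            results.append(ET.tostring(result, encoding="unicode"))
    return results


for line in construct_results(BIB):
    print(line)
```

Each binding of the two variables produces one constructed element, which is why Smith appears twice.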
15
Schemas in XML
  • Document Type Definition (DTD)
  • XML Schema
  • RDF Schema

16
Document Type Definition: DTD
  • part of the original XML specification
  • an XML document may have a DTD
  • terminology for XML
  • well-formed if tags are correctly closed
  • valid if it has a DTD and conforms to it
  • validation is useful in data exchange

17
DTDs as Grammars
<!DOCTYPE paper [
  <!ELEMENT paper (section*)>
  <!ELEMENT section ((title,section*) | text)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT text (#PCDATA)>
]>
<paper>
  <section> <text> </text> </section>
  <section> <title> </title>
    <section> ... </section>
    <section> ... </section>
  </section>
</paper>
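To make the "DTD as grammar" reading concrete, here is a hand-written checker for this one DTD, assuming its content models are paper → section*, section → (title, section*) | text, and title and text → #PCDATA. Python's standard library does not validate against DTDs, so the rules are coded directly; `valid_paper` and `valid_section` are illustrative names.

```python
import xml.etree.ElementTree as ET


def valid_section(section):
    """section -> (title, section*) | text"""
    children = list(section)
    tags = [child.tag for child in children]
    if tags == ["text"]:
        return len(children[0]) == 0          # text holds character data only
    if tags and tags[0] == "title" and all(t == "section" for t in tags[1:]):
        title_ok = len(children[0]) == 0      # title holds character data only
        return title_ok and all(valid_section(c) for c in children[1:])
    return False


def valid_paper(xml_text):
    """paper -> section*; also requires the text to be well-formed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    # Simplification: stray character data between child elements is ignored.
    return root.tag == "paper" and all(
        child.tag == "section" and valid_section(child) for child in root
    )


GOOD = ("<paper><section><text>t</text></section>"
        "<section><title>T</title><section><text>a</text></section>"
        "<section><title>B</title></section></section></paper>")
BAD = "<paper><section><title>T</title><text>x</text></section></paper>"

print(valid_paper(GOOD), valid_paper(BAD))
```

`BAD` is well-formed but not valid: a section may hold a title or a text, not both.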
18
DTDs as Schemas
  • Not so well suited
  • impose unwanted constraints on order:
    <!ELEMENT person (name,phone)>
  • references cannot be constrained
  • can be too vague:
  • <!ELEMENT person ((name|phone|email)*)>

19
XML Storage
  • text file (XML)
  • store in ternary relation
  • use DTD to derive schema
  • mine data to derive schema
  • build special purpose repository (Lore)

20
XML Storage: Text File
  • advantages
  • simple
  • less space than one thinks
  • reasonable clustering
  • disadvantages
  • no updates
  • requires a special-purpose query processor

21
Store XML in Ternary Relation
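[Figure: the XML document drawn as a labeled graph with nodes o1 to o6 and edge labels paper, year, title, author, author; one leaf value shown is 1986.]
Florescu, Kossmann 1999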
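One way to picture the ternary relation is as a table of (source, label, target) edges. The sketch below is a simplification under stated assumptions, not the scheme from the cited paper: leaf text goes into a second table named `leaf`, object ids are assigned in document order, and the title and author strings in `DOC` are placeholders.

```python
import itertools
import sqlite3
import xml.etree.ElementTree as ET

# Placeholder document shaped like the figure: paper with year, title, authors.
DOC = """<paper>
  <year>1986</year>
  <title>A Title</title>
  <author>First Author</author>
  <author>Second Author</author>
</paper>"""


def shred(xml_text):
    """Return (edges, leaves); edges are (source, label, target) triples."""
    ids = (f"o{n}" for n in itertools.count(1))
    edges, leaves = [], []

    def visit(elem, parent_id):
        oid = next(ids)
        edges.append((parent_id, elem.tag, oid))
        if len(elem) == 0:                      # leaf element: keep its text
            leaves.append((oid, (elem.text or "").strip()))
        for child in elem:
            visit(child, oid)

    root_id = next(ids)                         # o1 stands for the document
    visit(ET.fromstring(xml_text), root_id)
    return edges, leaves


edges, leaves = shred(DOC)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE edge (source TEXT, label TEXT, target TEXT)")
db.execute("CREATE TABLE leaf (oid TEXT, val TEXT)")
db.executemany("INSERT INTO edge VALUES (?, ?, ?)", edges)
db.executemany("INSERT INTO leaf VALUES (?, ?)", leaves)

# The path paper/author costs one self-join on the edge table per step.
rows = db.execute(
    """SELECT v.val
       FROM edge p
       JOIN edge a ON a.source = p.target
       JOIN leaf v ON v.oid = a.target
       WHERE p.label = 'paper' AND a.label = 'author'
       ORDER BY a.target"""
).fetchall()
authors = [row[0] for row in rows]
print(edges)
print(authors)
```

The join per path step is the main cost of this layout: longer paths mean more self-joins.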
22
Use DTD to derive Schema
  • DTD
  • ODMG classes
  • Christophides et al. 1994 , Shanmugasundaram et
    al. 1999

<!ELEMENT employee (name, address, project*)>
<!ELEMENT address (street, city, state, zip)>

class Employee public type tuple (name:string,
    address:Address, project:List(Project))
class Address public type tuple (street:string, ...)
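The same DTD-to-class idea can be sketched with Python dataclasses instead of ODMG classes. The field list of `Address` follows the address element declaration; the `Project` class and its single `name` field are assumptions, since the slide does not define what a project contains, and `employee_from_xml` is a helper introduced for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Address:
    street: str
    city: str
    state: str
    zip: str


@dataclass
class Project:
    name: str  # assumption: the slide leaves project's content unspecified


@dataclass
class Employee:
    name: str
    address: Address
    project: List[Project] = field(default_factory=list)  # repeated element


def employee_from_xml(text):
    """Build an Employee from an <employee> element matching the DTD."""
    elem = ET.fromstring(text)
    addr = elem.find("address")
    return Employee(
        name=elem.findtext("name"),
        address=Address(*(addr.findtext(t) for t in ("street", "city", "state", "zip"))),
        project=[Project(p.text) for p in elem.findall("project")],
    )


SAMPLE = ("<employee><name>Ann</name>"
          "<address><street>1 Main St</street><city>Macau</city>"
          "<state>MO</state><zip>0000</zip></address>"
          "<project>P1</project><project>P2</project></employee>")
employee = employee_from_xml(SAMPLE)
print(employee)
```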
23
Mine Data to Derive Schema
Deutsch et al. 1999
24
XML and Databases (1)
  • Is XML a database?
  • In a strict sense, no.
  • In a more liberal sense, yes, but
  • XML has
  • Storage (the XML document)
  • A schema (DTD)
  • Query languages (XQL, XML-QL, ...)
  • Programming interfaces (SAX, DOM)
  • XML lacks
  • Efficient storage, indexes, security,
    transactions, multi-user access, triggers,
    queries across multiple documents
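The two programming interfaces named above differ in style: SAX pushes parse events to callbacks, while DOM builds an in-memory tree to navigate. A small sketch with Python's standard `xml.sax` and `xml.dom.minidom` modules, using a made-up two-book document; `TitleHandler` is an illustrative class name.

```python
import xml.sax
from xml.dom import minidom

DOC = "<bib><book><title>A</title></book><book><title>B</title></book></bib>"


class TitleHandler(xml.sax.ContentHandler):
    """SAX: collect title text from a stream of parse events."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        if self._in_title:
            self._buffer.append(content)   # text may arrive in several chunks

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


handler = TitleHandler()
xml.sax.parseString(DOC.encode("utf-8"), handler)
sax_titles = handler.titles

# DOM: parse the whole document into a tree, then navigate it.
dom = minidom.parseString(DOC)
dom_titles = [t.firstChild.data for t in dom.getElementsByTagName("title")]

print(sax_titles, dom_titles)
```

SAX keeps memory use flat for large documents; DOM is more convenient when the program needs random access to the tree.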

25
XML and Databases (2)
  • Data versus Documents
  • There are two ways to use XML in a database
    environment
  • Use XML as a data transport, i.e., to get data in
    and out of the database
  • Data is stored in a relational or object-oriented
    database
  • Middleware converts between the database and XML
  • Use a native XML database, i.e., store data in
    document form
  • Use a content management system

26
XML and Databases (3)
  • Data-centric documents
  • Fairly regular structure
  • Fine-grained data
  • Little or no mixed content
  • Order of sibling elements often not significant
  • Document-centric documents
  • Irregular structure
  • Larger-grained data
  • Lots of mixed content
  • Order of sibling elements is significant

27
XML and Databases (4)
  • Data-centric storage and retrieval systems
  • Use a database
  • Add middleware to convert to/from XML
  • Use an XML server (specialized product for
    e-commerce)
  • Use an XML-enabled web server with a database
    backend
  • Document-centric storage and retrieval systems
  • Content management system
  • Persistent DOM implementation

28
XML and Databases (5)
  • Mapping document structure to database structure
  • Template-driven
  • No predefined mapping
  • Embedded commands process (retrieve) data
  • Currently only available from RDBMS to XML
  • <?xml version="1.0"?>
    <FlightInfo>
      <Intro>The following flights have available seats</Intro>
      <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
      <Conclude>We hope one of these meets your needs</Conclude>
    </FlightInfo>

29
XML and Databases (6)
  • Template-driven - Example result
  • <?xml version="1.0"?>
    <FlightInfo>
      <Intro>The following flights have available seats</Intro>
      <Flights>
        <Row>
          <Airline>ACME</Airline>
          <FltNumber>123</FltNumber>
          <Depart>Dec 12, 2000, 13:43</Depart>
          <Arrive>Dec 13, 2000, 01:21</Arrive>
        </Row>
      </Flights>
      <Conclude>We hope one of these meets your needs</Conclude>
    </FlightInfo>
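A minimal sketch of how middleware might expand such a template, using Python's `sqlite3` and `xml.etree.ElementTree`. The `expand` function, its `wrapper` parameter (the name of the element that replaces the statement), and the in-memory table with one sample row modeled on the slide are all assumptions; real products of the period had their own template syntax.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEMPLATE = """<FlightInfo>
  <Intro>The following flights have available seats</Intro>
  <SelectStmt>SELECT Airline, FltNumber, Depart, Arrive FROM Flights</SelectStmt>
  <Conclude>We hope one of these meets your needs</Conclude>
</FlightInfo>"""


def expand(template, db, wrapper="Flights"):
    """Replace each <SelectStmt> with <wrapper><Row>...</Row></wrapper>.

    The embedded SQL is executed as written, so the template must be trusted.
    """
    root = ET.fromstring(template)
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag != "SelectStmt":
                continue
            cursor = db.execute(child.text)
            columns = [d[0] for d in cursor.description]
            table = ET.Element(wrapper)
            for record in cursor:
                row = ET.SubElement(table, "Row")
                for column, value in zip(columns, record):
                    ET.SubElement(row, column).text = str(value)
            table.tail = child.tail        # keep the original whitespace
            parent[index] = table          # swap the statement for its result
    return root


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE Flights (Airline TEXT, FltNumber INTEGER, Depart TEXT, Arrive TEXT)"
)
db.execute(
    "INSERT INTO Flights VALUES ('ACME', 123, 'Dec 12, 2000, 13:43', 'Dec 13, 2000, 01:21')"
)
result = expand(TEMPLATE, db)
print(ET.tostring(result, encoding="unicode"))
```

Everything outside the embedded statement passes through unchanged, which is why this approach needs no predefined mapping.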

30
XML and Databases (7)
  • Mapping document structure to database structure
  • Model-driven
  • A data model is imposed on the structure of the
    XML document
  • This model is mapped to the structures in the
    database
  • There are two common models
  • Model the XML document as a single table or a set
    of tables
  • Model the XML document as a tree of data-specific
    objects (good for OODBMS mapping)

31
XML and Databases (8)
  • Single table or set of tables
  • <?xml version="1.0"?>
    <database>
      <table>
        <row>
          <column1>...</column1>
          <column2>...</column2>
          ...
        </row>
      </table>
    </database>
  • Tree organization

               Orders
                 |
             SalesOrder
            /    |    \
      Customer  Item  Item
                 |     |
                Part  Part
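For the single-table model, a document of the database/table/row/column shape can be loaded into relational tables almost mechanically. The sketch below assumes that each child of the root names a table and each child of a row names a column; the `Flights` data and the `load` function are illustrative, and values are stored as untyped text.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical instance of the database/table/row/column layout.
DOC = """<database>
  <Flights>
    <row><Airline>ACME</Airline><FltNumber>123</FltNumber></row>
    <row><Airline>Zenith</Airline><FltNumber>456</FltNumber></row>
  </Flights>
</database>"""


def load(xml_text, db):
    """Create one table per child of the root and insert one record per row."""
    for table in ET.fromstring(xml_text):
        for row in table.findall("row"):
            columns = [column.tag for column in row]
            # Names are spliced into SQL, so accept only plain identifiers
            # (a sketch-level check; it does not screen out SQL keywords).
            for name in [table.tag, *columns]:
                if not name.isidentifier():
                    raise ValueError(f"unsafe identifier: {name!r}")
            column_list = ", ".join(columns)
            marks = ", ".join("?" for _ in columns)
            db.execute(f"CREATE TABLE IF NOT EXISTS {table.tag} ({column_list})")
            db.execute(
                f"INSERT INTO {table.tag} ({column_list}) VALUES ({marks})",
                [column.text for column in row],
            )


db = sqlite3.connect(":memory:")
load(DOC, db)
flights = db.execute(
    "SELECT Airline, FltNumber FROM Flights ORDER BY FltNumber"
).fetchall()
print(flights)
```

The tree-of-objects model would instead map each element type to a class, which suits nested data such as orders with items and parts.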

32
XML and Databases (9)
  • Generating DTDs from a database schema and vice
    versa
  • The DTD for an application often changes rarely, so it
    does not need to be generated automatically.
  • Some simple conversions are possible
  • Example DTD from relational schema
  • For each table, create an ELEMENT.
  • For each column in a table, create an attribute
    or a PCDATA-only child ELEMENT.
  • For each primary key/foreign key relationship
    in which a column of the table contributes the
    primary key, create a child ELEMENT.
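The three rules above can be sketched as a small generator. The `SCHEMA` dictionary format, the table and column names, and the choice of PCDATA-only child elements (rather than attributes) for columns are assumptions made for the example; the `*` on child tables additionally assumes each relationship is one-to-many.

```python
# Hypothetical schema description: columns per table, plus the tables that
# reference this table's primary key through a foreign key ("children").
SCHEMA = {
    "Orders": {"columns": ["OrderID", "CustomerName"], "children": ["Items"]},
    "Items": {"columns": ["ItemID", "Quantity"], "children": []},
}


def dtd_from_schema(schema):
    """Emit ELEMENT declarations for tables, columns, and key relationships."""
    lines = []
    declared = set()
    for table, info in schema.items():
        # Rule 1: one ELEMENT per table.
        # Rule 3: referencing tables become repeatable child ELEMENTs.
        parts = list(info["columns"]) + [f"{child}*" for child in info["children"]]
        content = f"({', '.join(parts)})" if parts else "EMPTY"
        lines.append(f"<!ELEMENT {table} {content}>")
        # Rule 2: one PCDATA-only child ELEMENT per column.
        for column in info["columns"]:
            if column not in declared:
                declared.add(column)
                lines.append(f"<!ELEMENT {column} (#PCDATA)>")
    return "\n".join(lines)


dtd = dtd_from_schema(SCHEMA)
print(dtd)
```

Column types and key constraints are lost in this direction, which is one reason the slides call DTDs not so well suited as schemas.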

33
XML and Databases (10)
  • Document-centric storage and retrieval systems
  • Content management system
  • Allows the storage of discrete content fragments,
    such as examples, procedures, chapters, as well
    as metadata such as author names, revision dates,
    etc.
  • Many content management systems are built on top
    of relational or object-oriented database
    systems.
  • Examples
  • BladeRunner (Interleaf), SigmaLink (STEP),
    Parlance Content Manager (XyEnterprise), Target
    2000 (Progressive Information Technology)
  • Persistent DOM implementation

34
Further Readings
www.w3.org/XML
www-db.stanford.edu/widom
www-rocq.inria.fr/abiteboul
db.cis.upenn.edu
www.research.att.com/suciu
Abiteboul, Buneman, Suciu: Data on the Web: From Relational to Semistructured to XML. Morgan Kaufmann, 1999 (appears in October)