Managing XML and Semistructured Data

1
Managing XML and Semistructured Data
  • Lecture 12: XML Schema

Prof. Dan Suciu
Spring 2001
2
In this lecture
  • XML Schemas
  • Elements vs. Types
  • Regular expressions
  • Expressive power
  • Resources
  • W3C Recommendation: www.w3.org/TR/2001/REC-xmlschema-1-20010502

3
XML Schemas
  • http://www.w3.org/TR/xmlschema-1/ (10/2000)
  • generalizes DTDs
  • uses XML syntax
  • two documents: structure and datatypes
  • http://www.w3.org/TR/xmlschema-1
  • http://www.w3.org/TR/xmlschema-2
  • XML-Schema is very complex
  • often criticized
  • some alternative proposals

4
XML Schemas
<xsd:element name="paper" type="papertype"/>
<xsd:complexType name="papertype">
  <xsd:sequence>
    <xsd:element name="title" type="xsd:string"/>
    <xsd:element name="author" minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="year"/>
    <xsd:choice>
      <xsd:element name="journal"/>
      <xsd:element name="conference"/>
    </xsd:choice>
  </xsd:sequence>
</xsd:complexType>

DTD: <!ELEMENT paper (title,author*,year,(journal|conference))>
5
Elements vs. Types in XML Schema
<xsd:element name="person">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="address" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

<xsd:element name="person" type="ttt"/>
<xsd:complexType name="ttt">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="address" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>

DTD: <!ELEMENT person (name,address)>
6
Elements vs. Types in XML Schema
  • Types
  • Simple types (integers, strings, ...)
  • Complex types (regular expressions, like in DTDs)
  • Element-type-element alternation
  • Root element has a complex type
  • That type is a regular expression of elements
  • Those elements have their complex types...
  • ...
  • On the leaves we have simple types

7
Local and Global Types in XML Schema
  • Local type:
    <xsd:element name="person">
      <!-- define locally the person's type -->
    </xsd:element>
  • Global type:
    <xsd:element name="person" type="ttt"/>
    <xsd:complexType name="ttt">
      <!-- define here the type ttt -->
    </xsd:complexType>

Global types can be reused in other elements.
8
Local vs. Global Elements in XML Schema
  • Local element:
    <xsd:complexType name="ttt">
      <xsd:sequence>
        <xsd:element name="address" type="..."/>
        ...
      </xsd:sequence>
    </xsd:complexType>
  • Global element:
    <xsd:element name="address" type="..."/>
    <xsd:complexType name="ttt">
      <xsd:sequence>
        <xsd:element ref="address"/>
        ...
      </xsd:sequence>
    </xsd:complexType>

Global elements: like in DTDs.
9
Regular Expressions in XML Schema
  • Recall the element-type-element alternation:
    <xsd:complexType name="....">
      <!-- regular expression on elements -->
    </xsd:complexType>
  • Regular expressions:
  • <xsd:sequence> A B C </...>  =  A B C
  • <xsd:choice> A B C </...>  =  A | B | C
  • <xsd:group> A B C </...>  =  (A B C)
  • <xsd:... minOccurs="0" maxOccurs="unbounded"> .. </...>  =  (...)*
  • <xsd:... minOccurs="0" maxOccurs="1"> .. </...>  =  (...)?
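
As a worked illustration of the correspondences above, here is a sketch of one DTD content model written out as a complex type. The element names r, a, b, c, d and the type name rType are hypothetical and do not come from the lecture.

```xml
<!-- DTD: <!ELEMENT r (a, (b|c)*, d?)> -->
<xsd:element name="r" type="rType"/>
<xsd:complexType name="rType">
  <xsd:sequence>
    <!-- a : exactly once (minOccurs and maxOccurs both default to 1) -->
    <xsd:element name="a" type="xsd:string"/>
    <!-- (b|c)* : a choice repeated zero or more times -->
    <xsd:choice minOccurs="0" maxOccurs="unbounded">
      <xsd:element name="b" type="xsd:string"/>
      <xsd:element name="c" type="xsd:string"/>
    </xsd:choice>
    <!-- d? : optional -->
    <xsd:element name="d" type="xsd:string" minOccurs="0" maxOccurs="1"/>
  </xsd:sequence>
</xsd:complexType>
```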

10
Local Names in XML-Schema
<xsd:element name="person">
  <xsd:complexType>
    . . . . .
    <xsd:element name="name">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="firstname" type="xsd:string"/>
          <xsd:element name="lastname" type="xsd:string"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
    . . . .
  </xsd:complexType>
</xsd:element>

<xsd:element name="product">
  <xsd:complexType>
    . . . . .
    <xsd:element name="name" type="xsd:string"/>
  </xsd:complexType>
</xsd:element>

name has different meanings in person and in product.
11
Subtle Use of Local Names
<xsd:complexType name="oneB">
  <xsd:choice>
    <xsd:element name="B" type="xsd:string"/>
    <xsd:sequence>
      <xsd:element name="A" type="onlyAs"/>
      <xsd:element name="A" type="oneB"/>
    </xsd:sequence>
    <xsd:sequence>
      <xsd:element name="A" type="oneB"/>
      <xsd:element name="A" type="onlyAs"/>
    </xsd:sequence>
  </xsd:choice>
</xsd:complexType>

<xsd:element name="A" type="oneB"/>

<xsd:complexType name="onlyAs">
  <xsd:choice>
    <xsd:sequence>
      <xsd:element name="A" type="onlyAs"/>
      <xsd:element name="A" type="onlyAs"/>
    </xsd:sequence>
    <xsd:element name="A" type="xsd:string"/>
  </xsd:choice>
</xsd:complexType>

Arbitrarily deep binary tree with A elements, and a single B element.
12
Attributes in XML Schema
<xsd:element name="paper" type="papertype"/>
<xsd:complexType name="papertype">
  <xsd:sequence>
    <xsd:element name="title" type="xsd:string"/>
    . . . . . .
  </xsd:sequence>
  <xsd:attribute name="language" type="xsd:NMTOKEN" fixed="English"/>
</xsd:complexType>

Attributes are associated with the type, not with the element, and only with complex types; it takes more work to add attributes to simple types.
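
To make the remark about simple types concrete, here is a minimal sketch of the usual workaround: wrap the simple type in a complex type with simple content and extend it with the attribute. The names price and currency are hypothetical, chosen only for illustration.

```xml
<xsd:element name="price">
  <xsd:complexType>
    <!-- simpleContent: the element body is still a plain xsd:decimal value -->
    <xsd:simpleContent>
      <xsd:extension base="xsd:decimal">
        <!-- the attribute is attached by extending the simple base type -->
        <xsd:attribute name="currency" type="xsd:string"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>

<!-- a conforming instance -->
<price currency="EUR">423.46</price>
```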
13
Mixed Content, Any Type
<xsd:complexType mixed="true"> . . . .
  • Better than in DTDs: can still enforce the type, but now may have text between any elements

<xsd:element name="anything" type="xsd:anyType"/> . . . .
  • Means anything is permitted there
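
A small sketch of mixed content, assuming a hypothetical letterBody element: the sequence of child elements is still enforced, while text may appear between them.

```xml
<xsd:element name="letterBody">
  <xsd:complexType mixed="true">
    <xsd:sequence>
      <xsd:element name="salutation" type="xsd:string"/>
      <xsd:element name="quantity" type="xsd:positiveInteger"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

<!-- a conforming instance: free text around salutation and quantity,
     which must still appear once each and in this order -->
<letterBody>
  Dear <salutation>Mr. Smith</salutation>, your order of
  <quantity>1</quantity> item has shipped.
</letterBody>
```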
14
All Group
<xsd:complexType name="PurchaseOrderType">
  <xsd:all>
    <xsd:element name="shipTo" type="USAddress"/>
    <xsd:element name="billTo" type="USAddress"/>
    <xsd:element ref="comment" minOccurs="0"/>
    <xsd:element name="items" type="Items"/>
  </xsd:all>
  <xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
  • A restricted form of & in SGML
  • Restrictions:
  • Only at top level
  • Has only elements
  • Each element occurs at most once
  • E.g. comment occurs 0 or 1 times

15
Derived Types by Extensions
<complexType name="Address">
  <sequence>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
  </sequence>
</complexType>

<complexType name="USAddress">
  <complexContent>
    <extension base="ipo:Address">
      <sequence>
        <element name="state" type="ipo:USState"/>
        <element name="zip" type="positiveInteger"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>

Corresponds to inheritance.
16
Derived Types by Restrictions
  • A restriction may narrow cardinalities, e.g. (0,∞) to (1,1)
  • may restrict choices
  • other restrictions

<complexContent>
  <restriction base="ipo:Items">
    <!-- rewrite the entire content, with restrictions... -->
  </restriction>
</complexContent>

Corresponds to set inclusion.
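
A sketch of a complete restriction, assuming two hypothetical types Notes and SingleNote (not from the lecture). The derived type restates the whole content model and narrows the cardinality of comment from (0,∞) to (1,1), so every valid SingleNote is also a valid Notes.

```xml
<!-- Hypothetical base type: any number of comments, then an id -->
<xsd:complexType name="Notes">
  <xsd:sequence>
    <xsd:element name="comment" type="xsd:string"
                 minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="id" type="xsd:positiveInteger"/>
  </xsd:sequence>
</xsd:complexType>

<!-- Derived by restriction: the entire content is rewritten,
     with comment now required exactly once -->
<xsd:complexType name="SingleNote">
  <xsd:complexContent>
    <xsd:restriction base="Notes">
      <xsd:sequence>
        <xsd:element name="comment" type="xsd:string"
                     minOccurs="1" maxOccurs="1"/>
        <xsd:element name="id" type="xsd:positiveInteger"/>
      </xsd:sequence>
    </xsd:restriction>
  </xsd:complexContent>
</xsd:complexType>
```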
17
Simple Types
  • string
  • token
  • byte
  • unsignedByte
  • integer
  • positiveInteger
  • int (32-bit; a restriction of integer, which is unbounded)
  • unsignedInt
  • long
  • short
  • ...
  • time
  • dateTime
  • duration
  • date
  • ID
  • IDREF
  • IDREFS

18
Facets of Simple Types
  • Facets: additional properties restricting a simple type
  • 12 constraining facets defined by XML Schema 1.0
  • Examples:
  • length
  • minLength
  • maxLength
  • pattern
  • enumeration
  • whiteSpace
  • maxInclusive
  • maxExclusive
  • minInclusive
  • minExclusive
  • totalDigits
  • fractionDigits

19
Facets of Simple Types
  • Can further restrict a simple type by changing some facets
  • Restriction = subset
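
A minimal sketch of restricting simple types through facets. The type name myInteger echoes the list example elsewhere in this transcript, but its range here is an assumption; SKU and its pattern are hypothetical.

```xml
<!-- integers from 10000 to 99999: a subset of xsd:integer -->
<xsd:simpleType name="myInteger">
  <xsd:restriction base="xsd:integer">
    <xsd:minInclusive value="10000"/>
    <xsd:maxInclusive value="99999"/>
  </xsd:restriction>
</xsd:simpleType>

<!-- strings of the form 123-AB: a subset of xsd:string -->
<xsd:simpleType name="SKU">
  <xsd:restriction base="xsd:string">
    <xsd:pattern value="\d{3}-[A-Z]{2}"/>
  </xsd:restriction>
</xsd:simpleType>
```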

20
Not so Simple Types
  • List types
  • Union types
  • Restriction types

<xsd:simpleType name="listOfMyIntType">
  <xsd:list itemType="myInteger"/>
</xsd:simpleType>

<listOfMyInt>20003 15037 95977 95945</listOfMyInt>
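
The slide shows only a list type. For comparison, here is a sketch of a union type; the names sizeLabel and size are hypothetical. A value is valid if it is valid for at least one member type.

```xml
<!-- an enumerated label: restriction of xsd:token -->
<xsd:simpleType name="sizeLabel">
  <xsd:restriction base="xsd:token">
    <xsd:enumeration value="small"/>
    <xsd:enumeration value="large"/>
  </xsd:restriction>
</xsd:simpleType>

<!-- union: either a positive integer or one of the labels -->
<xsd:simpleType name="size">
  <xsd:union memberTypes="xsd:positiveInteger sizeLabel"/>
</xsd:simpleType>

<!-- both instances are valid for an element declared with type="size" -->
<size>42</size>
<size>large</size>
```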
21
Summary of XML Schema
  • Formal Expressive Power
  • Can express precisely the regular tree languages
    (over unranked trees)
  • Lots of other stuff
  • Some form of inheritance
  • A null value
  • Large collection of data types
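
The null value mentioned above is expressed in XML Schema with nillable and xsi:nil. A minimal sketch, assuming a hypothetical shipDate element:

```xml
<!-- schema: the element may be explicitly null -->
<xsd:element name="shipDate" type="xsd:date" nillable="true"/>

<!-- instance: an empty shipDate marked as null instead of carrying a date -->
<shipDate xsi:nil="true"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
```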

22
Summary of Schemas
  • In semistructured (SS) data:
  • graph theoretic
  • data and schema are decoupled
  • used in data processing
  • In XML:
  • from grammar to object-oriented
  • schema wired with the data
  • emphasis on semantics for exchange