Title: Managing XML and Semistructured Data
Prof. Dan Suciu
Spring 2001
In this lecture
- XML Schemas
- Elements v. Types
- Regular expressions
- Expressive power
- Resources
- W3C Recommendation: www.w3.org/TR/2001/REC-xmlschema-1-20010502
XML Schemas
- http://www.w3.org/TR/xmlschema-1/ (10/2000)
- generalizes DTDs
- uses XML syntax
- two documents: structure and datatypes
- http://www.w3.org/TR/xmlschema-1
- http://www.w3.org/TR/xmlschema-2
- XML-Schema is very complex
- often criticized
- some alternative proposals
XML Schemas
<xsd:element name="paper" type="papertype"/>
<xsd:complexType name="papertype">
  <xsd:sequence>
    <xsd:element name="title" type="xsd:string"/>
    <xsd:element name="author" minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element name="year"/>
    <xsd:choice>
      <xsd:element name="journal"/>
      <xsd:element name="conference"/>
    </xsd:choice>
  </xsd:sequence>
</xsd:complexType>
DTD: <!ELEMENT paper (title, author*, year, (journal|conference))>
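The point of this slide is that a content model is a regular expression over the names of an element's children. A minimal Python sketch of that idea, assuming author is repeatable (author*); the space-terminated encoding of child names and the helper name children_match are our own illustration, not part of XML Schema:

```python
import re

# Content model of <!ELEMENT paper (title, author*, year, (journal|conference))>
# as a regex over child-element names. Each name is followed by one space
# so that adjacent names cannot blur together.
PAPER_MODEL = re.compile(r"(?:title )(?:author )*(?:year )(?:journal |conference )")


def children_match(children: list[str]) -> bool:
    """Return True if this sequence of child names fits the content model."""
    word = "".join(name + " " for name in children)
    return PAPER_MODEL.fullmatch(word) is not None


print(children_match(["title", "author", "author", "year", "journal"]))  # True
print(children_match(["title", "year", "conference"]))                   # True
print(children_match(["title", "year"]))                                 # False
```

The same regex describes both the XML Schema fragment and the DTD; only the surface syntax differs.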
Elements vs. Types in XML Schema
<xsd:element name="person">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="address" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
Equivalently, with a named type:
<xsd:element name="person" type="ttt"/>
<xsd:complexType name="ttt">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="address" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>
DTD: <!ELEMENT person (name, address)>
Elements vs. Types in XML Schema
- Types
- Simple types (integers, strings, ...)
- Complex types (regular expressions, like in DTDs)
- Element-type-element alternation
- Root element has a complex type
- That type is a regular expression of elements
- Those elements have their complex types...
- ...
- On the leaves we have simple types
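The element-type-element alternation can be sketched as a recursive check: a complex type constrains the sequence of child names with a regular expression, and each child is then checked against its own type until a simple type is reached. This is not an XML Schema processor; the dictionary TYPES and the function valid are hypothetical, only the person/ttt type from the earlier slide is encoded, and text and attributes are ignored:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical in-memory schema: each complex type maps to a pair
# (regex over child names, {child name: child type}); "string" is simple.
TYPES = {
    "ttt": (r"(?:name )(?:address )", {"name": "string", "address": "string"}),
}


def valid(elem: ET.Element, type_name: str) -> bool:
    """Check elem against type_name, alternating element -> type -> elements."""
    if type_name == "string":
        # Leaf: a simple type allows text but no child elements.
        return len(elem) == 0
    model, child_types = TYPES[type_name]
    word = "".join(child.tag + " " for child in elem)
    if re.fullmatch(model, word) is None:
        return False  # children do not match the type's regular expression
    # Each child element is checked against the type its name is bound to.
    return all(valid(child, child_types[child.tag]) for child in elem)


doc = ET.fromstring("<person><name>Ann</name><address>Seattle</address></person>")
print(valid(doc, "ttt"))  # True
```

Because the child-to-type map belongs to each complex type, the same element name can be bound to different types under different parents, which is exactly what local element declarations allow.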
Local and Global Types in XML Schema
- Local type:
<xsd:element name="person">
  [define locally the person's type]
</xsd:element>
- Global type:
<xsd:element name="person" type="ttt"/>
<xsd:complexType name="ttt">
  [define here the type ttt]
</xsd:complexType>
Global types can be reused in other elements.
Local vs. Global Elements in XML Schema
- Local element:
<xsd:complexType name="ttt">
  <xsd:sequence>
    <xsd:element name="address" type="..."/>
    ...
  </xsd:sequence>
</xsd:complexType>
- Global element:
<xsd:element name="address" type="..."/>
<xsd:complexType name="ttt">
  <xsd:sequence>
    <xsd:element ref="address"/>
    ...
  </xsd:sequence>
</xsd:complexType>
Global elements are like elements in DTDs.
Regular Expressions in XML Schema
- Recall the element-type-element alternation:
<xsd:complexType name="....">
  [regular expression on elements]
</xsd:complexType>
- Regular expressions:
- <xsd:sequence> A B C </...>  =  A B C
- <xsd:choice> A B C </...>  =  A | B | C
- <xsd:group> A B C </...>  =  (A B C)
- <xsd:... minOccurs="0" maxOccurs="unbounded"> .. </...>  =  (...)*
- <xsd:... minOccurs="0" maxOccurs="1"> .. </...>  =  (...)?
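The correspondence above can be made mechanical. Here is a hedged sketch that compiles a small tuple encoding of particles into a Python regular expression over space-terminated child names; the tuple encoding and the name to_regex are hypothetical, and xsd:group is not modeled separately because it only adds parentheses, which "seq" already provides:

```python
import re

# Hypothetical tuple encoding of content-model particles:
#   ("elem", name)          an element particle
#   ("seq", p1, p2, ...)    <xsd:sequence>
#   ("choice", p1, ...)     <xsd:choice>
#   ("occurs", lo, hi, p)   minOccurs=lo, maxOccurs=hi (None = "unbounded")


def to_regex(p) -> str:
    """Translate a particle into a Python regex over space-terminated names."""
    kind = p[0]
    if kind == "elem":
        return "(?:" + re.escape(p[1]) + " )"
    if kind == "seq":
        return "(?:" + "".join(to_regex(q) for q in p[1:]) + ")"
    if kind == "choice":
        return "(?:" + "|".join(to_regex(q) for q in p[1:]) + ")"
    if kind == "occurs":
        lo, hi, body = p[1], p[2], to_regex(p[3])
        if hi is None:
            return body + "{%d,}" % lo  # (...)* when lo == 0
        return body + "{%d,%d}" % (lo, hi)  # (...)? when lo == 0 and hi == 1
    raise ValueError("unknown particle kind: " + kind)


# (title, author*, year, (journal | conference))
paper = ("seq", ("elem", "title"), ("occurs", 0, None, ("elem", "author")),
         ("elem", "year"), ("choice", ("elem", "journal"), ("elem", "conference")))
rx = re.compile(to_regex(paper))
print(bool(rx.fullmatch("title author author year conference ")))  # True
print(bool(rx.fullmatch("title year ")))                           # False
```

General minOccurs/maxOccurs values become bounded repetition {lo,hi}, so * and ? are just the special cases (0, unbounded) and (0, 1).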
Local Names in XML Schema
<xsd:element name="person">
  <xsd:complexType>
    . . . . .
    <xsd:element name="name">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="firstname" type="xsd:string"/>
          <xsd:element name="lastname" type="xsd:string"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
    . . . .
  </xsd:complexType>
</xsd:element>

<xsd:element name="product">
  <xsd:complexType>
    . . . . .
    <xsd:element name="name" type="xsd:string"/>
  </xsd:complexType>
</xsd:element>
name has different meanings in person and in product.
Subtle Use of Local Names
<xsd:complexType name="oneB">
  <xsd:choice>
    <xsd:element name="B" type="xsd:string"/>
    <xsd:sequence>
      <xsd:element name="A" type="onlyAs"/>
      <xsd:element name="A" type="oneB"/>
    </xsd:sequence>
    <xsd:sequence>
      <xsd:element name="A" type="oneB"/>
      <xsd:element name="A" type="onlyAs"/>
    </xsd:sequence>
  </xsd:choice>
</xsd:complexType>

<xsd:element name="A" type="oneB"/>

<xsd:complexType name="onlyAs">
  <xsd:choice>
    <xsd:sequence>
      <xsd:element name="A" type="onlyAs"/>
      <xsd:element name="A" type="onlyAs"/>
    </xsd:sequence>
    <xsd:element name="A" type="xsd:string"/>
  </xsd:choice>
</xsd:complexType>
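Arbitrarily deep binary tree with A elements and a single B element.
(Note: the XML Schema 1.0 Recommendation's Element Declarations Consistent constraint requires all declarations of A inside one content model to have the same type, so a conforming processor rejects this schema; the example shows what local names could express without that constraint.)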
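To see which trees this schema describes, here is a hedged Python sketch that checks the same tree language directly with xml.etree.ElementTree, one function per complex type. The function names only_as, one_b and accepts are ours, and the sketch does not apply XML Schema's own constraints on content models:

```python
import xml.etree.ElementTree as ET


def only_as(e: ET.Element) -> bool:
    """Content of type onlyAs: (A:onlyAs, A:onlyAs) | A:xsd:string."""
    kids = list(e)
    if len(kids) == 1:
        return kids[0].tag == "A" and len(kids[0]) == 0
    return len(kids) == 2 and all(k.tag == "A" and only_as(k) for k in kids)


def one_b(e: ET.Element) -> bool:
    """Content of type oneB: B | (A:onlyAs, A:oneB) | (A:oneB, A:onlyAs)."""
    kids = list(e)
    if len(kids) == 1:
        return kids[0].tag == "B" and len(kids[0]) == 0
    if len(kids) != 2 or any(k.tag != "A" for k in kids):
        return False
    x, y = kids
    # Exactly one of the two subtrees carries the B.
    return (only_as(x) and one_b(y)) or (one_b(x) and only_as(y))


def accepts(xml: str) -> bool:
    """The root element A has type oneB."""
    root = ET.fromstring(xml)
    return root.tag == "A" and one_b(root)


print(accepts("<A><A><A>x</A></A><A><B>y</B></A></A>"))  # True: exactly one B
print(accepts("<A><A><B/></A><A><B/></A></A>"))         # False: two Bs
```

Every node is named A, so a DTD, which gives each element name a single content model, cannot separate "subtree with one B" from "subtree with no B"; the two types oneB and onlyAs carry that information.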
Attributes in XML Schema
<xsd:element name="paper" type="papertype"/>
<xsd:complexType name="papertype">
  <xsd:sequence>
    <xsd:element name="title" type="xsd:string"/>
    . . . . . .
  </xsd:sequence>
  <xsd:attribute name="language" type="xsd:NMTOKEN" fixed="English"/>
</xsd:complexType>
Attributes are associated with the type, not with the element, and only with complex types; adding attributes to a simple type takes more work (it has to be wrapped in a complex type with simple content).
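The fixed="English" declaration means the language attribute may be omitted, but if it is present its value must be English. A minimal sketch of that check with xml.etree.ElementTree; FIXED and fixed_ok are hypothetical names, and the sketch compares strings literally, whereas a schema processor compares normalized values:

```python
import xml.etree.ElementTree as ET

# Fixed attribute values declared on papertype (fixed="English" for language).
FIXED = {"language": "English"}


def fixed_ok(elem: ET.Element) -> bool:
    """A fixed attribute may be omitted; if present it must equal the fixed value."""
    return all(elem.get(name) in (None, value) for name, value in FIXED.items())


print(fixed_ok(ET.fromstring('<paper language="English"><title>T</title></paper>')))  # True
print(fixed_ok(ET.fromstring("<paper><title>T</title></paper>")))                     # True
print(fixed_ok(ET.fromstring('<paper language="French"><title>T</title></paper>')))   # False
```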
Mixed Content, Any Type
<xsd:complexType mixed="true"> . . . .
- Better than in DTDs: can still enforce the type, but now may have text between any elements
<xsd:element name="anything" type="xsd:anyType"/> . . . .
- Means anything is permitted there
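On the instance side, the difference mixed="true" makes is whether non-whitespace text may appear directly inside the element, between its children: in element-only content only whitespace is allowed there. A small hedged sketch with xml.etree.ElementTree; has_char_data is a hypothetical helper that only detects such text, it does not validate the children:

```python
import xml.etree.ElementTree as ET


def has_char_data(elem: ET.Element) -> bool:
    """True if elem directly contains non-whitespace text around its children."""
    # ElementTree keeps text before the first child in elem.text and
    # text after each child in that child's .tail.
    chunks = [elem.text] + [child.tail for child in elem]
    return any(c is not None and c.strip() != "" for c in chunks)


letter = ET.fromstring("<letter>Dear <name>Ann</name>, thanks.</letter>")
person = ET.fromstring("<person>\n  <name>Ann</name>\n  <address>Seattle</address>\n</person>")
print(has_char_data(letter))  # True: needs mixed="true"
print(has_char_data(person))  # False: fine for element-only content
```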
All Group
<xsd:complexType name="PurchaseOrderType">
  <xsd:all>
    <xsd:element name="shipTo" type="USAddress"/>
    <xsd:element name="billTo" type="USAddress"/>
    <xsd:element ref="comment" minOccurs="0"/>
    <xsd:element name="items" type="Items"/>
  </xsd:all>
  <xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
- A restricted form of & in SGML
- Restrictions
- Only at top level
- Has only elements
- Each element occurs at most once
- E.g. comment occurs 0 or 1 times
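Because an all group fixes neither the order nor (beyond "at most once") the count, it is easier to check by counting than with a regular expression. A minimal sketch under that reading; PURCHASE_ORDER and matches_all are hypothetical names, and each particle's minOccurs is taken from the PurchaseOrderType example:

```python
from collections import Counter

# minOccurs for each particle of the <xsd:all> in PurchaseOrderType;
# maxOccurs is 1 for every particle of an all group.
PURCHASE_ORDER = {"shipTo": 1, "billTo": 1, "comment": 0, "items": 1}


def matches_all(children: list[str], group: dict[str, int]) -> bool:
    counts = Counter(children)
    if any(name not in group for name in counts):
        return False  # only the declared elements may appear
    if any(n > 1 for n in counts.values()):
        return False  # each element occurs at most once
    # Every required element must be present; the order is irrelevant.
    return all(counts[name] >= lo for name, lo in group.items())


print(matches_all(["items", "billTo", "shipTo"], PURCHASE_ORDER))  # True
print(matches_all(["shipTo", "comment", "comment", "billTo", "items"], PURCHASE_ORDER))  # False
```

Writing the same constraint as a DTD-style regular expression would require listing every permutation of the four elements, which is why & was a convenience in SGML.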
Derived Types by Extension
<complexType name="Address">
  <sequence>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
  </sequence>
</complexType>

<complexType name="USAddress">
  <complexContent>
    <extension base="ipo:Address">
      <sequence>
        <element name="state" type="ipo:USState"/>
        <element name="zip" type="positiveInteger"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>
Corresponds to inheritance
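The effective content model of an extended type is the base type's content followed by the particles added in the extension. A hedged sketch that builds USAddress's model that way, reusing the regex-over-child-names encoding (our own illustration) for the Address and USAddress types above; accepts_children is a hypothetical helper:

```python
import re

# Content models as regexes over child names, each name followed by a space.
ADDRESS = r"(?:street )(?:city )"
US_ADDRESS_EXT = r"(?:state )(?:zip )"
# Extension: base content first, then the newly declared particles.
US_ADDRESS = "(?:" + ADDRESS + ")(?:" + US_ADDRESS_EXT + ")"


def accepts_children(model: str, children: list[str]) -> bool:
    return re.fullmatch(model, "".join(c + " " for c in children)) is not None


print(accepts_children(US_ADDRESS, ["street", "city", "state", "zip"]))  # True
print(accepts_children(US_ADDRESS, ["state", "zip", "street", "city"]))  # False
print(accepts_children(ADDRESS, ["street", "city"]))                     # True
```

Since new particles can only be appended, every USAddress begins with the content of an Address, which is what makes extension read like inheritance.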
Derived Types by Restriction
- May restrict cardinalities, e.g. from (0, ∞) to (1, 1)
- May restrict choices
- Other restrictions...
<complexContent>
  <restriction base="ipo:Items">
    [rewrite the entire content, with restrictions...]
  </restriction>
</complexContent>
Corresponds to set inclusion
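For the cardinality case, "set inclusion" has a direct reading: the derived (minOccurs, maxOccurs) interval must lie inside the base interval, so every instance of the derived type is also an instance of the base type. A minimal sketch of that one check only (choices and other restrictions are not covered); occurs_restricts is a hypothetical name:

```python
from math import inf


def occurs_restricts(base: tuple[float, float], derived: tuple[float, float]) -> bool:
    """True if the derived (minOccurs, maxOccurs) range lies inside the base range."""
    (bmin, bmax), (dmin, dmax) = base, derived
    return bmin <= dmin <= dmax <= bmax


print(occurs_restricts((0, inf), (1, 1)))  # True: (0, ∞) narrowed to (1, 1)
print(occurs_restricts((1, 1), (0, inf)))  # False: this would widen the range
```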
Simple Types
- string
- token
- byte
- unsignedByte
- integer
- positiveInteger
- int (32-bit, a subrange of integer)
- unsignedInt
- long
- short
- ...
- time
- dateTime
- duration
- date
- ID
- IDREF
- IDREFS
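The integer-valued types above differ mainly in their value ranges: integer and positiveInteger are unbounded above, while long, int, short and byte are fixed-width subranges. A small table of inclusive bounds as given by the XML Schema 1.0 datatypes specification; RANGES and in_range are hypothetical names:

```python
# Inclusive value bounds of some built-in integer types; None = unbounded.
RANGES = {
    "integer": (None, None),
    "positiveInteger": (1, None),
    "long": (-2**63, 2**63 - 1),
    "int": (-2**31, 2**31 - 1),
    "short": (-2**15, 2**15 - 1),
    "byte": (-2**7, 2**7 - 1),
    "unsignedInt": (0, 2**32 - 1),
    "unsignedByte": (0, 2**8 - 1),
}


def in_range(type_name: str, value: int) -> bool:
    lo, hi = RANGES[type_name]
    return (lo is None or value >= lo) and (hi is None or value <= hi)


print(in_range("byte", 200))          # False
print(in_range("unsignedByte", 200))  # True
print(in_range("int", 10**12))        # False
print(in_range("integer", 10**12))    # True
```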
Facets of Simple Types
- Facets: additional properties restricting a simple type
- XML Schema 1.0 defines 12 constraining facets:
- length
- minLength
- maxLength
- pattern
- enumeration
- whiteSpace
- maxInclusive
- maxExclusive
- minInclusive
- minExclusive
- totalDigits
- fractionDigits
Facets of Simple Types
- Can further restrict a simple type by changing some facets
- Restriction = subset
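"Restriction = subset" can be checked on a small example: a restriction only adds or tightens facets, so any value that passes the derived type's facets also passes the base type's. A hedged sketch for integer-valued types; satisfies, base and derived are hypothetical, only four facets are handled, and Python's regex dialect stands in for XML Schema's (they agree on simple patterns such as \d{2}):

```python
import re


def satisfies(lexical: str, facets: dict) -> bool:
    """Check a lexical integer value against a few constraining facets."""
    if "pattern" in facets and re.fullmatch(facets["pattern"], lexical) is None:
        return False  # XSD patterns match the whole value, hence fullmatch
    if "enumeration" in facets and lexical not in facets["enumeration"]:
        return False
    value = int(lexical)  # assumes the base type is an integer type
    if "minInclusive" in facets and value < facets["minInclusive"]:
        return False
    if "maxInclusive" in facets and value > facets["maxInclusive"]:
        return False
    return True


# Hypothetical types: "base" restricts integer; "derived" restricts "base"
# further by tightening maxInclusive and adding a pattern.
base = {"minInclusive": 1, "maxInclusive": 1000}
derived = {**base, "maxInclusive": 99, "pattern": r"\d{2}"}

print(satisfies("42", derived), satisfies("420", derived))  # True False
# Restriction = subset: every sampled value valid for derived is valid for base.
print(all(satisfies(str(v), base) for v in range(-50, 2000) if satisfies(str(v), derived)))  # True
```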
Not so Simple Types
- List types
- Union types
- Restriction types
<xsd:simpleType name="listOfMyIntType">
  <xsd:list itemType="myInteger"/>
</xsd:simpleType>
<listOfMyInt>20003 15037 95977 95945</listOfMyInt>
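A list-typed value is a whitespace-separated sequence of items, each of which must be valid for the item type. myInteger is not defined on this slide; the sketch below assumes, hypothetically, that it restricts xsd:integer to the range 10000 to 99999, and parse_list is our own helper name:

```python
def parse_list(lexical: str, lo: int = 10000, hi: int = 99999) -> list[int]:
    """Parse an xsd:list of integers; lo/hi stand in for myInteger's facets."""
    # List items are separated by whitespace; split() also drops leading and
    # trailing whitespace, matching the whitespace collapsing of list values.
    items = [int(token) for token in lexical.split()]
    for v in items:
        if not lo <= v <= hi:
            raise ValueError(f"{v} is outside myInteger's assumed range")
    return items


print(parse_list("20003 15037 95977 95945"))  # [20003, 15037, 95977, 95945]
```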
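Summary of XML Schema
- Formal expressive power
- Can express the regular tree languages (over unranked trees) only if the same element name may have different types within one content model, as in the A/B example; the 1.0 Recommendation's Element Declarations Consistent constraint forbids this, which limits schemas to a strict subset
- Lots of other stuff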
- Some form of inheritance
- A null value
- Large collection of data types
Summary of Schemas
- In semistructured (SS) data
- graph theoretic
- data and schema are decoupled
- used in data processing
- In XML
- from grammar to object-oriented
- schema wired with the data
- emphasis on semantics for exchange