Title: CSE 636 Data Integration
XML Schema
XML Schemas
- W3C Recommendation: http://www.w3.org/XML/Schema
- Generalizes DTDs
- Uses XML syntax
- Two documents: structure and datatypes
  - http://www.w3.org/TR/xmlschema-1
  - http://www.w3.org/TR/xmlschema-2
- XML Schema is very complex
  - often criticized
  - some alternative proposals
XML Schemas
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="paper" type="papertype"/>
  <xsd:complexType name="papertype">
    <xsd:sequence>
      <xsd:element name="title" type="xsd:string"/>
      <xsd:element name="author" minOccurs="0"/>
      <xsd:choice>
        <xsd:element name="journal"/>
        <xsd:element name="conference"/>
      </xsd:choice>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
DTD: <!ELEMENT paper (title, author?, (journal|conference))>
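A minimal instance document that the paper schema above should accept, with made-up values: a title, the optional author, then exactly one of journal or conference. Since author, journal and conference are declared without a type, they default to xsd:anyType, so any content is allowed inside them.

```xml
<?xml version="1.0"?>
<!-- title, then the optional author, then exactly one of journal/conference -->
<paper>
  <title>Integrating Heterogeneous Sources</title>
  <author>J. Doe</author>
  <conference>Hypothetical Conference 2001</conference>
</paper>
```

Dropping author still validates; listing both journal and conference does not, because xsd:choice selects exactly one alternative.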
XML Namespaces
- http://www.w3.org/TR/REC-xml-names (1/99)
- Solve the problem of tag name conflicts
- name ::= [prefix:]localpart
<book xmlns:isbn="www.isbn-org.org/def">
  <title> ... </title>
  <number> 15 </number>
  <isbn:number> ... </isbn:number>
</book>
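A prefix is only a local shorthand for a namespace URI. In this sketch the prefix i is an arbitrary choice bound to the same URI as isbn in the book example above, so i:number names the same element as isbn:number, while the unprefixed number remains a different name.

```xml
<!-- i is bound to the same URI as isbn, so i:number and isbn:number are the same name -->
<book xmlns:i="www.isbn-org.org/def">
  <title> ... </title>
  <number> 15 </number>
  <i:number> ... </i:number>
</book>
```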
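XML Namespaces
- Syntactic: <number>, <isbn:number>
- Semantic: provide a URL for the schema (the namespace URI serves as an identifier; it need not be retrievable)
<tag xmlns:mystyle="http://...">
  <mystyle:title> ... </mystyle:title>
  <mystyle:number> ... </mystyle:number>
</tag>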
Elements vs. Types in XML Schema
<xsd:element name="person" type="pType"/>
<xsd:complexType name="pType">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="address" type="xsd:string"/>
  </xsd:sequence>
</xsd:complexType>

<xsd:element name="person">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="address" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
DTD: <!ELEMENT person (name,address)>
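Both declarations of person above accept exactly the same documents. A hypothetical instance, with made-up values:

```xml
<!-- name followed by address, as required by either declaration -->
<person>
  <name>Ann Lee</name>
  <address>12 Main St</address>
</person>
```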
Elements vs. Types in XML Schema
- Types
- Simple types (integers, strings, ...)
- Complex types (regular expressions, like in DTDs)
- Element-type-element alternation
- Root element has a complex type
- That type is a regular expression of elements
- Those elements have their complex types...
- ...
- On the leaves we have simple types
Local vs. Global Types in XML Schema
- Local type
  <xsd:element name="person">
    [define locally the person's type]
  </xsd:element>
- Global type
  <xsd:element name="person" type="pType"/>
  <xsd:complexType name="pType">
    [define here the type pType]
  </xsd:complexType>
Global types can be reused in other elements
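A small sketch of that reuse: one global type shared by two element declarations. The element employee is a hypothetical addition; pType is the type from the slide.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- a single global type definition -->
  <xsd:complexType name="pType">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="address" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <!-- two elements reusing it; "employee" is hypothetical -->
  <xsd:element name="person" type="pType"/>
  <xsd:element name="employee" type="pType"/>
</xsd:schema>
```

With a local (anonymous) type, the same structure would have to be written out again inside each element declaration.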
Local vs. Global Elements in XML Schema
- Local element
  <xsd:complexType name="pType">
    <xsd:sequence>
      <xsd:element name="address" type="..."/>
      ...
    </xsd:sequence>
  </xsd:complexType>
- Global element
  <xsd:element name="address" type="..."/>
  <xsd:complexType name="pType">
    <xsd:sequence>
      <xsd:element ref="address"/>
      ...
    </xsd:sequence>
  </xsd:complexType>
Global elements: like in DTDs
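As with a DTD element declaration, a global element is declared once and can be referenced from several content models. In this sketch, address gets the concrete type xsd:string, and companyType is a hypothetical second type reusing it.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- declared once, at the top level -->
  <xsd:element name="address" type="xsd:string"/>
  <xsd:complexType name="pType">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element ref="address"/>
    </xsd:sequence>
  </xsd:complexType>
  <!-- hypothetical type referencing the same global element, with a different cardinality -->
  <xsd:complexType name="companyType">
    <xsd:sequence>
      <xsd:element ref="address" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```

The cardinality (minOccurs, maxOccurs) belongs to each reference, not to the global declaration.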
Regular Expressions in XML Schema
Recall the element-type-element alternation:
  <xsd:complexType name="...">
    [regular expression on elements]
  </xsd:complexType>
Regular expressions:
  <xsd:sequence> A B C </...>                            A, B, C
  <xsd:choice> A B C </...>                              A | B | C
  <xsd:group> A B C </...>                               (A B C)
  <xsd:... minOccurs="0" maxOccurs="unbounded"> </...>   (...)*
  <xsd:... minOccurs="0" maxOccurs="1"> </...>           (...)?
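As an illustration of combining these constructs, the DTD content model (A, (B | C)*, D?) could be written as below. The element names come from the slide; the type name exampleType and the xsd:string leaf types are hypothetical.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- DTD equivalent: (A, (B | C)*, D?) -->
  <xsd:complexType name="exampleType">
    <xsd:sequence>
      <xsd:element name="A" type="xsd:string"/>
      <!-- a repeated choice plays the role of (B | C)* -->
      <xsd:choice minOccurs="0" maxOccurs="unbounded">
        <xsd:element name="B" type="xsd:string"/>
        <xsd:element name="C" type="xsd:string"/>
      </xsd:choice>
      <!-- minOccurs="0" (with the default maxOccurs="1") plays the role of D? -->
      <xsd:element name="D" type="xsd:string" minOccurs="0"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```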
Local Names in XML Schema
<xsd:element name="person">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="name">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="firstname" type="xsd:string"/>
            <xsd:element name="lastname" type="xsd:string"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

<xsd:element name="product">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
name has different meanings in person and in product
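Two hypothetical instance documents, one per global element above, showing the two unrelated meanings of name side by side (values are made up):

```xml
<!-- inside person, name has element content -->
<person>
  <name>
    <firstname>Ann</firstname>
    <lastname>Lee</lastname>
  </name>
</person>

<!-- inside product, name is a plain string -->
<product>
  <name>Widget</name>
</product>
```

A DTD cannot express this, since it allows only one declaration per element name.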
Subtle Use of Local Names
<xsd:element name="A" type="oneB"/>

<xsd:complexType name="onlyAs">
  <xsd:choice>
    <xsd:sequence>
      <xsd:element name="A" type="onlyAs"/>
      <xsd:element name="A" type="onlyAs"/>
    </xsd:sequence>
    <xsd:element name="A" type="xsd:string"/>
  </xsd:choice>
</xsd:complexType>

<xsd:complexType name="oneB">
  <xsd:choice>
    <xsd:element name="B" type="xsd:string"/>
    <xsd:sequence>
      <xsd:element name="A" type="onlyAs"/>
      <xsd:element name="A" type="oneB"/>
    </xsd:sequence>
    <xsd:sequence>
      <xsd:element name="A" type="oneB"/>
      <xsd:element name="A" type="onlyAs"/>
    </xsd:sequence>
  </xsd:choice>
</xsd:complexType>
Arbitrarily deep binary tree of A elements, with a single B element
Caveat: XML Schema 1.0's Element Declarations Consistent rule requires all declarations of A within one content model to have the same type, so a conforming processor rejects this schema as written; it illustrates the abstract element-type model
Attributes in XML Schema
<xsd:element name="paper" type="papertype"/>
<xsd:complexType name="papertype">
  <xsd:sequence>
    <xsd:element name="title" type="xsd:string"/>
  </xsd:sequence>
  <xsd:attribute name="language" type="xsd:NMTOKEN" fixed="English"/>
</xsd:complexType>
- Attributes are associated with the type, not with the element
- Only complex types can have attributes
- Adding attributes to simple types takes more work
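A hypothetical instance of papertype. Because the attribute is declared with fixed="English" (and the default use="optional"), it may be omitted, but if present its value must be exactly English.

```xml
<!-- language may be left out; any value other than "English" is invalid -->
<paper language="English">
  <title>XML Schema</title>
</paper>
```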
Adding Attributes to Simple Types
<xsd:element name="B">
  <xsd:complexType>
    <xsd:simpleContent>
      <xsd:extension base="xsd:string">
        <xsd:attribute name="testAttr" type="xsd:string"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>
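The resulting element holds string content, as a simple type would, plus the attribute added by the extension. A hypothetical instance with made-up values:

```xml
<!-- text content from the xsd:string base, attribute from the extension -->
<B testAttr="draft">Some text</B>
```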
Mixed Content, Any Type
- Mixed content: better than in DTDs, since the element structure can still be enforced, but text may now appear between any elements
  <xsd:complexType mixed="true"> ... </xsd:complexType>
- Any type: means anything is permitted there
  <xsd:element name="anything" type="xsd:anyType"/>
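A sketch of mixed content with hypothetical names letter, name and date: the child elements must still appear in the declared sequence, while free text may surround them. A schema excerpt is followed by a matching instance.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="letter">
    <!-- mixed="true" allows character data between the child elements -->
    <xsd:complexType mixed="true">
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="date" type="xsd:date"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

<!-- matching instance: text is interleaved, but name still precedes date -->
<letter>Dear <name>Ann</name>, your order shipped on <date>2001-05-03</date>.</letter>
```

In a DTD, mixed content (#PCDATA | name | date)* cannot force the order or the number of occurrences of name and date.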
All Group
<xsd:complexType name="PurchaseOrderType">
  <xsd:all>
    <xsd:element name="shipTo" type="USAddress"/>
    <xsd:element name="billTo" type="USAddress"/>
    <xsd:element ref="comment" minOccurs="0"/>
    <xsd:element name="items" type="Items"/>
  </xsd:all>
  <xsd:attribute name="orderDate" type="xsd:date"/>
</xsd:complexType>
- Restrictions
  - Only at the top level of a content model
  - Contains only elements
  - Each element occurs at most once
    - E.g. comment occurs 0 or 1 times
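The children of xsd:all may appear in any order. A hypothetical instance, assuming a global declaration such as the one in the comment below (it follows the W3C XML Schema Primer's purchase-order example) and eliding the address and item contents:

```xml
<!-- assumes: <xsd:element name="purchaseOrder" type="PurchaseOrderType"/>
     items, billTo, shipTo appear out of declaration order; comment is omitted -->
<purchaseOrder orderDate="1999-10-20">
  <items><!-- ... --></items>
  <billTo><!-- ... --></billTo>
  <shipTo><!-- ... --></shipTo>
</purchaseOrder>
```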
Derived Types by Extension
<complexType name="Address">
  <sequence>
    <element name="street" type="string"/>
    <element name="city" type="string"/>
  </sequence>
</complexType>
<complexType name="USAddress">
  <complexContent>
    <extension base="ipo:Address">
      <sequence>
        <element name="state" type="ipo:USState"/>
        <element name="zip" type="positiveInteger"/>
      </sequence>
    </extension>
  </complexContent>
</complexType>
- Corresponds to inheritance
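With extension, the new elements are appended after the content of the base type, so a USAddress lists street and city first, then state and zip. A sketch, assuming some element (here a hypothetical shipTo) is declared with type USAddress and that CA is a valid ipo:USState value:

```xml
<!-- inherited from Address, then added by the extension -->
<shipTo>
  <street>123 Maple Street</street>
  <city>Mill Valley</city>
  <state>CA</state>
  <zip>90952</zip>
</shipTo>
```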
Derived Types by Restriction
- May restrict cardinalities, e.g. (0, ∞) to (1, 1)
- May restrict choices
- Other restrictions
- Corresponds to set inclusion
<complexContent>
  <restriction base="ipo:Items">
    [rewrite the entire content, with restrictions]
  </restriction>
</complexContent>
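A self-contained sketch of restricting a cardinality from (0, ∞) to (1, 1). The type names ItemList and SingleItem are hypothetical. Note that the restricted type repeats its whole content model rather than listing only the change.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- base type: any number of item elements, cardinality (0, unbounded) -->
  <xsd:complexType name="ItemList">
    <xsd:sequence>
      <xsd:element name="item" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
  <!-- restricted type: the entire content is rewritten, now exactly one item (1, 1) -->
  <xsd:complexType name="SingleItem">
    <xsd:complexContent>
      <xsd:restriction base="ItemList">
        <xsd:sequence>
          <xsd:element name="item" type="xsd:string"/>
        </xsd:sequence>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:schema>
```

Every document valid for SingleItem is also valid for ItemList, which is the set inclusion the slide refers to.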
Simple Types
- string
- token
- byte
- unsignedByte
- integer
- positiveInteger
- int (32-bit range; narrower than integer, which is unbounded)
- unsignedInt
- long
- short
- ...
- time
- dateTime
- duration
- date
- ID
- IDREF
- IDREFS
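ID, IDREF and IDREFS behave differently from plain values: an ID must be unique within the document, and each IDREF (or each token of an IDREFS) must match some ID in it. A hypothetical schema excerpt followed by a matching instance; doc, section, seeAlso and written are made-up names.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="doc">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="section" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:attribute name="id" type="xsd:ID" use="required"/>
            <xsd:attribute name="seeAlso" type="xsd:IDREFS"/>
            <xsd:attribute name="written" type="xsd:date"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

<!-- matching instance: s1 is defined once and referenced from s2 -->
<doc>
  <section id="s1" written="2001-05-03"/>
  <section id="s2" seeAlso="s1"/>
</doc>
```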
Facets of Simple Types
- Facets: additional properties restricting a simple type
- XML Schema 1.0 defines 12 constraining facets:
- length
- minLength
- maxLength
- pattern
- enumeration
- whiteSpace
- maxInclusive
- maxExclusive
- minInclusive
- minExclusive
- totalDigits
- fractionDigits
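A few of these facets in use; the type names zipCode, percentage and color are hypothetical. A pattern is matched against the whole lexical value, so no anchors are needed.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- pattern: exactly five digits -->
  <xsd:simpleType name="zipCode">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[0-9]{5}"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- range and precision facets on a decimal -->
  <xsd:simpleType name="percentage">
    <xsd:restriction base="xsd:decimal">
      <xsd:minInclusive value="0"/>
      <xsd:maxInclusive value="100"/>
      <xsd:fractionDigits value="2"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- enumeration: a closed set of allowed values -->
  <xsd:simpleType name="color">
    <xsd:restriction base="xsd:token">
      <xsd:enumeration value="red"/>
      <xsd:enumeration value="green"/>
      <xsd:enumeration value="blue"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```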
Facets of Simple Types
- Can further restrict a simple type by changing some facets
- Restriction = subset
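A sketch of restricting an already restricted type; percentage and smallPercentage are hypothetical names. Tightening maxInclusive yields a subset of the base type's values, whereas raising it above 100 would not be a valid restriction.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:simpleType name="percentage">
    <xsd:restriction base="xsd:decimal">
      <xsd:minInclusive value="0"/>
      <xsd:maxInclusive value="100"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- inherits minInclusive="0"; only maxInclusive is tightened -->
  <xsd:simpleType name="smallPercentage">
    <xsd:restriction base="percentage">
      <xsd:maxInclusive value="10"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
```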
Not so Simple Types
- List types
- Union types
- Restriction types
<xsd:simpleType name="listOfMyIntType">
  <xsd:list itemType="myInteger"/>
</xsd:simpleType>
<listOfMyInt>20003 15037 95977 95945</listOfMyInt>
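A self-contained sketch that completes the list example and adds a union. The definition of myInteger (five-digit integers) is one possible choice, since the slide leaves it undefined; zipOrUnknown and zip are hypothetical names.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- assumed definition: integers from 10000 to 99999 -->
  <xsd:simpleType name="myInteger">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="10000"/>
      <xsd:maxInclusive value="99999"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- list: whitespace-separated sequence of myInteger values -->
  <xsd:simpleType name="listOfMyIntType">
    <xsd:list itemType="myInteger"/>
  </xsd:simpleType>
  <xsd:element name="listOfMyInt" type="listOfMyIntType"/>
  <!-- union: a value is valid if it is valid for any member type -->
  <xsd:simpleType name="zipOrUnknown">
    <xsd:union memberTypes="myInteger">
      <xsd:simpleType>
        <xsd:restriction base="xsd:token">
          <xsd:enumeration value="unknown"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:union>
  </xsd:simpleType>
  <xsd:element name="zip" type="zipOrUnknown"/>
</xsd:schema>
```

Under this schema both <zip>95977</zip> and <zip>unknown</zip> are valid, while <zip>123</zip> is not, because 123 is outside myInteger's range and is not the token unknown.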
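Summary of XML Schema
- Formal expressive power
  - The element-type-element model can express precisely the regular tree languages (over unranked trees)
  - XML Schema 1.0's Element Declarations Consistent and Unique Particle Attribution rules exclude some of these, so legal schemas express a strict subset
- Lots of other stuff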
- Some form of inheritance
- A null value
- Large collection of data types
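A sketch of the null value: an element declared with nillable="true" may be marked as null in an instance with xsi:nil="true", in which case it must be empty. The names person, name and phone are hypothetical; a schema excerpt is followed by a matching instance.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="person">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <!-- nillable="true" permits an explicit null for this element -->
        <xsd:element name="phone" type="xsd:string" nillable="true"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

<!-- matching instance: phone is present but null -->
<person xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <name>Ann Lee</name>
  <phone xsi:nil="true"/>
</person>
```

This differs from an empty string (<phone/>) and from omitting phone, which the sequence above would not allow.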
References
- Lecture Slides
  - Dan Suciu
  - http://www.cs.washington.edu/homes/suciu/COURSES/590DS/12xmlschema.htm
- BRICS XML Tutorial
  - A. Moeller, M. Schwartzbach
  - http://www.brics.dk/~amoeller/XML/index.html
- W3C's XML Schema homepage
  - http://www.w3.org/XML/Schema
- XML School
  - http://www.w3schools.com