Title: XML Document Typing and Automata
- DTD, XML Schema and tree automata
- Serge Abiteboul
- With contributions from Omar Benjelloun
Motivations
XML documents
<a> <b> <c>bla</c> </b>
<b>bla</b> </a>
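XML documents
- Representation as trees
[Figure: the document above as a tree: root a with two b children; the first b has a c child containing "bla", the second b contains "bla".]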
Typing
- It is not mandatory in XML
- Semi-structured data
- But typing
- Improves storage
- Makes navigating the data easier
- data guide
- Makes querying easier
- Makes describing/explaining the data easier
- Helps optimization
- Enables interoperability between programs
- Makes it possible to protect the data
Improves storage
Lower-bound schema
Store the rest in an overflow graph
Helps optimization
select X.title from Bib._ X where X.#.zip = 12345
is rewritten, thanks to the upper-bound schema, into
select X.title from Bib.book X where X.address.zip = 12345
Schema extraction
- Problem statement
- given a data instance D
- find some schema S for D
- In practice, the most specific schema may be too large; we need to relax it
Schema extraction: sample data
[Figure: data graph with root r; employee edges to p1 to p8; a company edge to c; manages and managedby edges between employees; worksfor edges from employees to c.]
Lower-bound schema extraction
[Figure: lower-bound schema with Root {r}; employee edges to Bosses {p1, p4, p6} and to Regulars {p2, p3, p5, p7, p8}; a company edge to Company {c}; manages, managedby and worksfor edges between these classes.]
Upper-bound schema extraction: data guides
[Figure: data guide with Root {r}; employee leads to Employees {p1, p2, p3, p4, p5, p6, p7, p8} and company to Company {c}; Employees carries manages, managedby and worksfor edges; shown next to the lower-bound classes Bosses {p1, p4, p6} and Regulars {p2, p3, p5, p7, p8}.]
Data guide
- Gives all the possible paths in the data
- Automaton minimization
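One classic way to build such a guide is to read the data graph as a nondeterministic automaton over labels and determinize it. The sketch below does this with a subset construction, where each guide node is the set of data nodes reached by the same label path. The graph encoding (a dict from node to `(label, child)` pairs) and the small sample fragment are assumptions made for illustration. A minimization step, as mentioned on the slide, could then merge equivalent guide nodes.

```python
from collections import deque

def data_guide(graph, root):
    """Subset construction over a labeled data graph.

    graph maps each node to a list of (label, child) edges. Each guide node is
    the frozenset of data nodes reached by one and the same label path, so every
    label path of the data appears exactly once in the guide, even with cycles.
    """
    start = frozenset({root})
    edges = {}  # guide node -> {label: guide node}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current in edges:
            continue  # already expanded (cycle or shared target)
        by_label = {}
        for node in current:
            for label, child in graph.get(node, []):
                by_label.setdefault(label, set()).add(child)
        edges[current] = {label: frozenset(kids) for label, kids in by_label.items()}
        queue.extend(edges[current].values())
    return start, edges

def label_paths(start, edges, max_len):
    """List every label path of length 1..max_len that exists in the data."""
    paths, frontier = [], [((), start)]
    for _ in range(max_len):
        frontier = [(path + (label,), target)
                    for path, node in frontier
                    for label, target in edges[node].items()]
        paths.extend(path for path, _ in frontier)
    return paths

# Hypothetical fragment in the spirit of the sample data.
graph = {
    "r": [("employee", "p1"), ("employee", "p2"), ("company", "c")],
    "p1": [("manages", "p2"), ("worksfor", "c")],
    "p2": [("managedby", "p1"), ("worksfor", "c")],
}
start, edges = data_guide(graph, "r")
print(sorted(label_paths(start, edges, 2)))
# prints [('company',), ('employee',), ('employee', 'managedby'), ('employee', 'manages'), ('employee', 'worksfor')]
```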
Graph simulation
[Figure: a data graph (root; nodes c1, c2, e1 to e4 and p1 to p9; edges programmer, statistician, employee, project, workson, leads and consults; string leaves "exercise", "lecture", "finance", "adminstr.", "PR", "undergrad", "grad", "postgrad", "web") is matched against a schema graph (edges programmer|statistician, _, employee and projects; types t1, t2 and STRING). Successive steps build the simulation relation R between data nodes and schema nodes.]
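The largest simulation can be computed as a fixpoint. The sketch starts from all (data node, schema node) pairs and repeatedly removes every pair in which some outgoing data edge cannot be matched by a schema edge leading to a related pair. Several parts of this encoding are assumptions: `_` is treated as a wildcard, `a|b` as a choice between labels, and value types are ignored, so a data leaf is simulated by any schema node. The small graphs are hypothetical.

```python
def matches(data_label, schema_label):
    """A schema edge labeled "_" accepts any label; "a|b" accepts a or b."""
    return schema_label == "_" or data_label in schema_label.split("|")

def largest_simulation(data, schema):
    """Greatest relation S such that (x, y) in S implies: every data edge
    x -l-> x2 is matched by a schema edge y -l2-> y2 with matches(l, l2)
    and (x2, y2) in S. Start from all pairs, remove violations until stable."""
    sim = {(x, y) for x in data for y in schema}
    changed = True
    while changed:
        changed = False
        for x, y in list(sim):
            for label, x2 in data[x]:
                if not any(matches(label, label2) and (x2, y2) in sim
                           for label2, y2 in schema[y]):
                    sim.discard((x, y))
                    changed = True
                    break
    return sim

# Hypothetical fragment inspired by the figure; leaves stand for string values.
data = {
    "root": [("programmer", "e1"), ("statistician", "e2")],
    "e1": [("name", "s1"), ("workson", "p1")],
    "e2": [("name", "s2")],
    "p1": [("title", "s3")],
    "s1": [], "s2": [], "s3": [],
}
schema = {
    "R": [("programmer|statistician", "t1")],
    "t1": [("name", "STRING"), ("workson", "t2")],
    "t2": [("_", "STRING")],
    "STRING": [],
}
sim = largest_simulation(data, schema)
print(("root", "R") in sim, ("e1", "t1") in sim, ("e1", "R") in sim)  # True True False
```

If a data node has an edge that the schema does not allow, the mismatch propagates upward and the root is no longer simulated by `R`.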
Typing trees
- Two main families
- coupled types/labels
- decoupled types/labels
Coupled types: DTDs
- Two elements with the same label have the same type
- Type: a regular expression over the children
- Example
- book (title, author, price)
Example DTD
- <!ELEMENT populationdata (continent)>
- <!ELEMENT continent (name, country)>
- <!ELEMENT country (name, province)>
- <!ELEMENT province (name, city)>
- <!ELEMENT city (name, pop)>
- <!ELEMENT name (#PCDATA)>
- <!ELEMENT pop (#PCDATA)>
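Validation against a DTD amounts to checking, at every element, that the word formed by its children's labels belongs to the regular language of the rule for that label. The sketch below shows this with Python's `re` and `xml.etree.ElementTree`. The content models are hand-translated into regexes over child tags written as `<tag>`, and the sample document is made up. With coupled types, the rule is selected from the element's label alone.

```python
import re
import xml.etree.ElementTree as ET

# The DTD above, each content model hand-translated into a Python regex over
# the sequence of child tags written as "<tag>"; None stands for (#PCDATA).
DTD = {
    "populationdata": r"<continent>",
    "continent": r"<name><country>",
    "country": r"<name><province>",
    "province": r"<name><city>",
    "city": r"<name><pop>",
    "name": None,
    "pop": None,
}

def is_valid(elem):
    """Coupled typing: the rule to check is chosen from the label alone."""
    if elem.tag not in DTD:
        return False
    children = list(elem)
    model = DTD[elem.tag]
    if model is None:  # (#PCDATA): text only, no child elements
        return not children
    word = "".join(f"<{child.tag}>" for child in children)
    return re.fullmatch(model, word) is not None and all(map(is_valid, children))

# Made-up sample document.
doc = ET.fromstring(
    "<populationdata><continent><name>A</name>"
    "<country><name>B</name><province><name>C</name>"
    "<city><name>D</name><pop>1000</pop></city>"
    "</province></country></continent></populationdata>")
print(is_valid(doc))  # True
```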
Pros and cons
- Pro: efficient validation
- Con: not closed under union and complement
- Con: limited expressive power
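[Figure: a voitures (cars) tree with an occasion (used) subtree and a neuve (new) subtree, both made of annonce (ad) elements; a used-car annonce has an année (year) 1992 and a marque (make) Peugeot; a new-car annonce only has a marque, Renault.]
<!ELEMENT annonce (année?, marque)>
We cannot tell new cars (without année) apart from used cars (with année).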
Decoupled types
- Each type implies a label, but not the converse
- annonce1 = annonce (année, marque)
- annonce2 = annonce (marque)
- Many good properties
XML Schema: many other gadgets in the schemas
[Figure: the same voitures tree, with occasion and neuve subtrees of annonce elements (année 1992, marques Peugeot and Renault).]
occasion = occasion (annonce1)
neuve = neuve (annonce2)
annonce1 = annonce (année, marque)
annonce2 = annonce (marque)
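With decoupled types, the type of a child comes from its parent's type, not from its label. Two `annonce` elements can therefore be checked against different rules. The sketch encodes the four type definitions above together with a root type `voitures`. It assumes that `occasion` and `neuve` may hold any number of `annonce` children (the `*` is our assumption) and that "text" means text-only content. Checking proceeds top-down.

```python
import re
import xml.etree.ElementTree as ET

# type -> (label, content model over child tags, child tag -> child type).
# The "*" repetitions and the root type are assumptions; "text" = text only.
SCHEMA = {
    "voitures": ("voitures", r"<occasion><neuve>",
                 {"occasion": "occasion", "neuve": "neuve"}),
    "occasion": ("occasion", r"(<annonce>)*", {"annonce": "annonce1"}),
    "neuve": ("neuve", r"(<annonce>)*", {"annonce": "annonce2"}),
    "annonce1": ("annonce", r"<année><marque>",
                 {"année": "text", "marque": "text"}),
    "annonce2": ("annonce", r"<marque>", {"marque": "text"}),
}

def has_type(elem, type_name):
    """Top-down check: each child's type is fixed by its parent's type."""
    if type_name == "text":
        return len(elem) == 0
    label, model, child_types = SCHEMA[type_name]
    if elem.tag != label:
        return False
    word = "".join(f"<{child.tag}>" for child in elem)
    if re.fullmatch(model, word) is None:
        return False
    return all(has_type(child, child_types[child.tag]) for child in elem)

good = ET.fromstring(
    "<voitures><occasion><annonce><année>1992</année><marque>Peugeot</marque>"
    "</annonce></occasion><neuve><annonce><marque>Renault</marque></annonce>"
    "</neuve></voitures>")
bad = ET.fromstring(  # a new car with a year: rejected, unlike with the DTD
    "<voitures><occasion/><neuve><annonce><année>1992</année>"
    "<marque>Renault</marque></annonce></neuve></voitures>")
print(has_type(good, "voitures"), has_type(bad, "voitures"))  # True False
```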
<?xml version="1.0"?>
<xsd:schema targetNamespace="http://www.net-language.com"
            xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
            xmlns="http://www.net-language.com"
            elementFormDefault="qualified">
  <xsd:element name="flashcards">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="flashcard" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="flashcard">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="question"/>
        <xsd:element ref="answer"/>
        <xsd:element ref="author" minOccurs="0"/>
        <xsd:element ref="comment" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="question" type="xsd:string"/>
  <xsd:element name="answer" type="xsd:string"/>
  <xsd:element name="author" type="xsd:string"/>
  <xsd:element name="comment" type="xsd:string"/>
</xsd:schema>
XML with a particular type
Typing
Tree automata
Automata on words
- Transitions
- Alphabet
- States
- Initial state
- Accepting states
Example
[Figure: runs of an automaton with states q0, q1 and q2 on the words "a b a a b" and "a b a".]
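A complete deterministic automaton fits in a few lines. It consists of a transition table, an initial state and a set of accepting states, and a run simply follows the table letter by letter. The automaton below is a hypothetical one over {a, b}, with states q0, q1 and q2, that accepts the words ending in "ab". It is not necessarily the automaton drawn on the slide.

```python
class DFA:
    """A complete deterministic automaton on words."""

    def __init__(self, delta, initial, accepting):
        self.delta = delta          # (state, letter) -> state
        self.initial = initial
        self.accepting = accepting  # set of accepting states

    def accepts(self, word):
        state = self.initial
        for letter in word:
            state = self.delta[(state, letter)]
        return state in self.accepting

# Hypothetical automaton: q0 = no progress, q1 = last letter was a,
# q2 = the last two letters were "ab" (accepting).
ends_with_ab = DFA(
    delta={("q0", "a"): "q1", ("q0", "b"): "q0",
           ("q1", "a"): "q1", ("q1", "b"): "q2",
           ("q2", "a"): "q1", ("q2", "b"): "q0"},
    initial="q0",
    accepting={"q2"},
)
print(ends_with_ab.accepts("abaab"), ends_with_ab.accepts("aba"))  # True False
```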
Properties
- Determinization
- Minimization
- Closed under union and intersection
- Complement
- Limitations
- An essential tool of computer science
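Closure under intersection and union follows from the standard product construction. The two automata run in parallel on pairs of states, and the acceptance condition combines their verdicts with "and" or "or". Complement just swaps accepting and non-accepting states, which is only correct for a complete DFA. The dict encoding and both sample automata are illustrative assumptions.

```python
from itertools import product

def run(dfa, word):
    state = dfa["initial"]
    for letter in word:
        state = dfa["delta"][(state, letter)]
    return state in dfa["accepting"]

def product_dfa(d1, d2, alphabet, accept):
    """Run d1 and d2 in parallel; accept(b1, b2) combines their verdicts."""
    states = set(product(d1["states"], d2["states"]))
    delta = {((p, q), a): (d1["delta"][(p, a)], d2["delta"][(q, a)])
             for (p, q) in states for a in alphabet}
    accepting = {(p, q) for (p, q) in states
                 if accept(p in d1["accepting"], q in d2["accepting"])}
    return {"states": states, "initial": (d1["initial"], d2["initial"]),
            "delta": delta, "accepting": accepting}

def complement(d):
    # Correct only because the DFA is complete: each word reaches one state.
    return {**d, "accepting": set(d["states"]) - set(d["accepting"])}

# Hypothetical automata: an even number of a's; ends with b.
even_a = {"states": {0, 1}, "initial": 0, "accepting": {0},
          "delta": {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}}
ends_b = {"states": {"x", "y"}, "initial": "x", "accepting": {"y"},
          "delta": {("x", "a"): "x", ("x", "b"): "y",
                    ("y", "a"): "x", ("y", "b"): "y"}}

both = product_dfa(even_a, ends_b, "ab", lambda u, v: u and v)
either = product_dfa(even_a, ends_b, "ab", lambda u, v: u or v)
print(run(both, "aab"), run(both, "ab"), run(either, "ab"))  # True False True
print(run(complement(even_a), "a"))  # True
```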
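Another use of automata: XPath (is x in //a/b?)
[Figure: an NFA for //a/b over the labels a, b and x (any other label), and the DFA obtained from it step by step, with states {0}, {0,1} and {0,2}.]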
Size of the deterministic automaton
//a/*/*/b
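A node is selected by a linear pattern such as //a/b exactly when the sequence of labels on its root-to-node path is accepted by the pattern's NFA. In that NFA, state i means "the first i steps matched", and a `//` step may loop on any label. The sketch builds the DFA lazily by subset construction over a small alphabet in which `x` stands for "any other label" (an assumption). It also counts DFA states to show the growth when wildcard steps follow a descendant step. The pattern encoding is hypothetical.

```python
def step(states, label, pattern):
    """NFA transition: state i means "the first i steps of the pattern matched"."""
    nxt = set()
    for i in states:
        if i < len(pattern):
            descendant, test = pattern[i]
            if descendant:          # '//' may skip any number of ancestors
                nxt.add(i)
            if test in ("*", label):
                nxt.add(i + 1)
    return frozenset(nxt)

def determinize(pattern, alphabet):
    """Subset construction: each DFA state is a set of NFA states."""
    start = frozenset({0})
    delta, seen, todo = {}, {start}, [start]
    while todo:
        current = todo.pop()
        for label in alphabet:
            target = step(current, label, pattern)
            delta[(current, label)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return start, delta, seen

def selects(pattern, path, alphabet):
    """Is a node whose root-to-node label path is `path` selected?"""
    state, delta, _ = determinize(pattern, alphabet)
    for label in path:
        state = delta[(state, label)]
    return len(pattern) in state

A_B = [(True, "a"), (False, "b")]                                        # //a/b
A_STAR_STAR_B = [(True, "a"), (False, "*"), (False, "*"), (False, "b")]  # //a/*/*/b

print(selects(A_B, "xab", "abx"), selects(A_B, "xaxb", "abx"))  # True False
print(sorted(sorted(s) for s in determinize(A_B, "abx")[2]))    # [[0], [0, 1], [0, 2]]
print(len(determinize(A_STAR_STAR_B, "abx")[2]))                # 12
```

Each additional `/*` step roughly doubles the number of DFA states, because the DFA must remember which of the last few ancestors were labeled a.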
Tree automata
- Case 1
- trees of bounded rank
- Example
- binary trees
Binary tree automata
- Main change
- For leaves
- For nodes
Example
Evaluating a Boolean circuit
OK
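In the usual formulation of a bottom-up binary tree automaton, a leaf transition maps a label to a state, and a node transition maps a label plus the states of the two children to a state. This is assumed here, since the slide's formulas are not shown. Evaluating a Boolean circuit is then a run with two states: q0 for "false" and q1 for "true". The tree encoding with `and`/`or` labels is hypothetical.

```python
LEAF = {"0": "q0", "1": "q1"}  # leaf transitions: label -> state

def gate(op, left, right):
    if op == "and":
        return "q1" if left == right == "q1" else "q0"
    return "q1" if "q1" in (left, right) else "q0"  # op == "or"

# Internal-node transitions: (label, left state, right state) -> state.
NODE = {(op, lq, rq): gate(op, lq, rq)
        for op in ("and", "or") for lq in ("q0", "q1") for rq in ("q0", "q1")}

def run(tree):
    """Bottom-up run; a tree is a leaf label or a triple (label, left, right)."""
    if isinstance(tree, str):
        return LEAF[tree]
    label, left, right = tree
    return NODE[(label, run(left), run(right))]

def accepts(tree):
    return run(tree) == "q1"  # q1 is the only accepting state

print(accepts(("and", ("or", "0", "1"), ("or", "1", "0"))))  # True
print(accepts(("and", "1", ("or", "0", "0"))))               # False
```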
Bounded rank: properties
- Determinization
- Minimization
- Closure under
- Complement
- Intersection/Union
As for words
Yes, but harder
Yes
But
- XML documents correspond to trees of arbitrary rank
- book (intro, section, conclusion)
Case 2: unbounded rank
Unranked tree automata
- Main change
- accepts if, for each leaf, we obtain a state qi such that
Example
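Evaluating a Boolean circuit
[Figure: a Boolean circuit whose internal nodes are ∨ gates with any number of children and whose leaves are labeled 0 or 1, evaluated bottom-up.]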
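For unranked trees, a node has any number of children, so a transition cannot list the children's states one by one. One standard formulation, assumed here, gives for each label and target state a regular language over the word formed by the children's states. The sketch evaluates circuits with `or` and `and` gates of any arity this way. Because the two languages for each label partition {0,1}*, the bottom-up run is deterministic. The encoding is illustrative.

```python
import re

# For each label and state: a regular language over the children's states
# ("0" or "1"). For each label, the languages partition {0,1}*.
RULES = {
    "or": {"1": r"[01]*1[01]*", "0": r"0*"},
    "and": {"1": r"1*", "0": r"[01]*0[01]*"},
}

def run(tree):
    """tree is a leaf "0"/"1" or a pair (label, list of children)."""
    if isinstance(tree, str):
        return tree  # a leaf labeled 0 or 1 gets that state
    label, children = tree
    word = "".join(run(child) for child in children)
    for state, language in RULES[label].items():
        if re.fullmatch(language, word):
            return state
    raise ValueError(f"no transition for {label!r} on {word!r}")

circuit = ("or", [("and", ["1", "1", "0"]), "0", ("and", ["1"])])
print(run(circuit))  # 1
```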
Properties
IT DOES NOT ALWAYS WORK
Top-down evaluation is tricky.
Properties
Determinization is always possible
Bottom-up evaluation.
Properties
- Determinization
- Yes, with bottom-up evaluation
- Minimization: yes
- Closure
- under complement
- under intersection/union
Yes
DTD -> automaton; XML Schema -> automaton
Deterministic top-down evaluation
Example with XML Schema
- root = voitures (occases, neuves)
- occases = occasion (annonce1)
- neuves = neuve (annonce2)
- annonce1 = annonce (année, marque)
- annonce2 = annonce (marque)
XML Schema
- Heavily criticized
- Not yet standardized
- Many competitors
Goals
- Avoid needing special tools to create and maintain DTDs
- XML Schema must allow the definition of richer data
- Schemas should be extensible
- See
- http://www.w3schools.com/schema/
XML Schema
- XML syntax
- Like tree automata: decoupled types
- <element name="annonce" type="annonce1"/>
- Extra features
- Namespace management
- Atomic types
- Integrity constraints
- etc.
Simple elements
- <xs:element name="xxx" type="yyy"/>
- Example
- <lastname>Refsnes</lastname>
- <age>34</age>
- <dateborn>1968-03-27</dateborn>
- <xs:element name="lastname" type="xs:string"/>
- <xs:element name="age" type="xs:integer"/>
- <xs:element name="dateborn" type="xs:date"/>
- Most common types: xs:string, xs:decimal, xs:integer, xs:boolean, xs:date, xs:time
Attributes
- <xs:attribute name="xxx" type="yyy"/>
- Example
- <lastname lang="EN">Smith</lastname>
- <xs:attribute name="lang" type="xs:string"/>
- Most common types: xs:string, xs:decimal, xs:integer, xs:boolean, xs:date, xs:time
Restrictions
<xs:element name="age">
  <xs:simpleType>
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
- Other restrictions: enumerated types, patterns, etc.
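Checking a simple element means checking its text against the lexical form of the base type, and then against any facets such as `minInclusive` and `maxInclusive`. The sketch hard-codes the declarations from the simple-element and restriction examples. The lexical checks for `xs:integer` and `xs:date` are simplified (for instance, no time-zone suffix on dates), so this is an illustration rather than a conforming XML Schema validator.

```python
import re
from datetime import date
import xml.etree.ElementTree as ET

def is_integer(text):
    # Simplified lexical check for xs:integer: optional sign, then digits.
    return re.fullmatch(r"[+-]?[0-9]+", text) is not None

def is_date(text):
    # Simplified xs:date check: YYYY-MM-DD with a real calendar date.
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text) is None:
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True

# Hypothetical declarations mirroring the slides: name -> (check, bounds).
DECLS = {
    "lastname": (lambda text: True, None),  # xs:string
    "age": (is_integer, (0, 100)),          # xs:integer, minInclusive 0, maxInclusive 100
    "dateborn": (is_date, None),            # xs:date
}

def valid_simple(elem):
    check, bounds = DECLS[elem.tag]
    text = (elem.text or "").strip()
    if len(elem) or not check(text):
        return False  # a simple element has no child elements
    if bounds is not None:
        low, high = bounds
        return low <= int(text) <= high
    return True

for xml in ["<age>34</age>", "<age>134</age>",
            "<dateborn>1968-03-27</dateborn>", "<dateborn>1968-13-27</dateborn>"]:
    print(xml, valid_simple(ET.fromstring(xml)))  # True, False, True, False
```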
Complex elements: 4 kinds
- A complex XML element, "product", which is empty
- <product pid="1345"/>
- A complex XML element which contains only other elements
- <employee> <firstname>John</firstname> <lastname>Smith</lastname> </employee>
- A complex XML element which contains only text
- <food type="dessert">Ice cream</food>
- A complex XML element which contains both elements and text
- <description> It happened on <date lang="norwegian">03.03.99</date> .... </description>
Example: elements only
<person>
  <firstname>John</firstname>
  <lastname>Smith</lastname>
</person>
<xs:element name="person">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
Other gadgets in XML Schema
- Importing the types associated with a namespace
- <import namespace="http:// ..." schemaLocation="http:// ..."/>
- Including or extending a schema
- <include schemaLocation="http:// ..."/>
- <redefine schemaLocation="http:// ...">
- .... Extensions ...
- </redefine>
Named data types
<xs:element name="employee">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
- Only the "employee" element can use the specified complex type (<sequence> indicates an order on the child elements)
- Alternative
<xs:element name="employee" type="personinfo"/>
<xs:complexType name="personinfo">
  <xs:sequence>
    <xs:element name="firstname" type="xs:string"/>
    <xs:element name="lastname" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
Example: a DTD
<?xml version="1.0"?>
<!ELEMENT EMAIL (TO+, FROM, CC*, BCC*, SUBJECT?, BODY?)>
<!ATTLIST EMAIL
  LANGUAGE (Western|Greek|Latin|Universal) "Western"
  ENCRYPTED CDATA #IMPLIED
  PRIORITY (NORMAL|LOW|HIGH) "NORMAL">
<!ELEMENT TO (#PCDATA)>
<!ELEMENT FROM (#PCDATA)>
<!ELEMENT CC (#PCDATA)>
<!ELEMENT BCC (#PCDATA)>
<!ATTLIST BCC HIDDEN CDATA #FIXED "TRUE">
<!ELEMENT SUBJECT (#PCDATA)>
<!ELEMENT BODY (#PCDATA)>
<!ENTITY SIGNATURE "Bill">
The same in an XML schema language (here Microsoft's XML-Data Reduced syntax)
<?xml version="1.0"?>
<Schema name="email"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <AttributeType name="language" dt:type="enumeration"
                 dt:values="Western Greek Latin Universal"/>
  <AttributeType name="encrypted"/>
  <AttributeType name="priority" dt:type="enumeration" dt:values="NORMAL LOW HIGH"/>
  <AttributeType name="hidden" default="true"/>
  <ElementType name="to" content="textOnly"/>
  <ElementType name="from" content="textOnly"/>
  <ElementType name="cc" content="textOnly"/>
  <ElementType name="bcc" content="mixed">
    <attribute type="hidden" required="yes"/>
  </ElementType>
  <ElementType name="subject" content="textOnly"/>
  <ElementType name="body" content="textOnly"/>
  <ElementType name="email" content="eltOnly">
    <attribute type="language" default="Western"/>
    <attribute type="encrypted"/>
    <attribute type="priority" default="NORMAL"/>
    <element type="to" minOccurs="1" maxOccurs="*"/>
    <element type="from" minOccurs="1" maxOccurs="1"/>
    <element type="cc" minOccurs="0" maxOccurs="*"/>
    <element type="bcc" minOccurs="0" maxOccurs="*"/>
    <element type="subject" minOccurs="0" maxOccurs="1"/>
    <element type="body" minOccurs="0" maxOccurs="1"/>
  </ElementType>
</Schema>
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.net-language.com">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
        <xs:element name="character" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="friend-of" type="xs:string"
                          minOccurs="0" maxOccurs="unbounded"/>
              <xs:element name="since" type="xs:date"/>
              <xs:element name="qualification" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="isbn" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
XML Schema
Typing
Type checking
- Who checks?
- XML editors
- Data exchanges (at both the sending and the receiving end)
- Dynamic checking: once the data has been generated
- Static checking: checking the program that builds the data
- application, XQuery query, XSLT transformation
Type checking and type inference
- Input: an input schema $T$ and a function $f$
- Checking: given $T'$, is $f(d) \in T'$ for every $d \in T$?
- Inference: find the smallest $T'$ such that $f(d) \in T'$ for every $d \in T$
- Special case: $f(\mathit{any})$
- Undecidable in general because of joins
Example
for $p in doc("parts.xml")//part[color="red"]
return <pièce>
         <nom>{$p/name}</nom>
         <desc>{$p/desc}</desc>
       </pièce>
- Type of the result: pièce (nom (string), desc (any))
- If the type of parts.xml//part/desc is string: pièce (nom (string), desc (string))
Illustration of the difficulty
- for X in Input, Y in Input do print (<b/>)
- Input: <a/> <a/>
- Result: <b/> <b/> <b/> <b/>
- Problem: $\{b^i \mid i = n^2,\ n \geq 0\}$ is not a type
- No best type:
- $b^*$
- ? b2 b2b
- ? b2 b4 b4b
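To see why no regular type fits exactly, it helps to compute the output sizes. With n input elements, the nested loop prints n² `b` elements. The sketch models the query on Python lists, which is an illustrative simplification. The gaps between consecutive sizes grow without bound, whereas the set of lengths of any infinite regular language over a single letter is ultimately periodic and therefore has bounded gaps.

```python
def query(inp):
    """for X in Input, Y in Input do print(<b/>): one b per pair (X, Y)."""
    return ["b" for _x in inp for _y in inp]

sizes = [len(query(["a"] * n)) for n in range(6)]
print(sizes)  # [0, 1, 4, 9, 16, 25]

# The gaps between consecutive output sizes keep growing (2n + 1).
print([later - earlier for earlier, later in zip(sizes, sizes[1:])])  # [1, 3, 5, 7, 9]
```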
Automata for computation
Proposal: k-pebble transducers
[Figure: the pebbles are managed as a stack]
Milo, Suciu, Vianu
k-pebble transducers: result
They capture a core aspect of XQuery, but not the data-management part
Many other possible uses of types: incomplete information
Scenario
- Information is continuously enriched by successive queries to XML sources
- Need to
- represent incomplete information
- intelligently answer queries using incomplete information
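Wish list
- An intuitive representation system: a graceful extension of DTDs
- Efficient incremental maintenance through consecutive queries
[Figure: starting from the input DTD T0, each query/answer pair (q1, A1), (q2, A2), (q3, A3), ..., (qk, Ak) refines the representation into T1, T2, T3, ..., Tk.]
$\mathrm{rep}(T_k) = \mathrm{rep}(T_0) \cap q_1^{-1}(A_1) \cap \dots \cap q_k^{-1}(A_k)$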
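Querying incomplete XML docs
- Using just the available information
- compute a description of all answers
[Figure: diagram relating T, q(T), rep(T) and q(rep(T)): applying q to every tree of rep(T) should give exactly rep(q(T)).]
We need a representation system that is strong with respect to the query language: $q(\mathrm{rep}(T)) = \mathrm{rep}(q(T))$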
- Answer queries using the sure and possible modalities
- is t surely true in all answers to q?
- is t possibly true in some answer to q?
- Example
- is t a prefix of all trees in q(rep(T))?
- is t a prefix of some tree in q(rep(T))?
- Decide whether the available information is enough to fully answer the query
- similar to answering queries using views
- If not, seek additional information
- mediator problem: find a minimal set of additional queries to the sources needed to fully answer the query
- use the representation of incomplete information to guide the mediator
Challenge: balance expressiveness and tractability!
- Parameters: choice of
- XML document types (DTDs)
- Query language
- Representation system
- Proposal of Abiteboul, Segoufin and Vianu (PODS 2001)
- simple, practically appealing
- many limitations
- justification: extra features lead to serious problems!
- XML abstraction
- unranked trees with labels and values
- simplified DTD
- tree type: unordered, simple cardinality constraints
[Figure: tree type with catalog above product; product has name, price and category (each exactly one) and picture; category has an optional subcat. Legend: 1 = exactly one, + = at least one, * = unrestricted, ? = zero or one.]
- Query language
- prefix-selection queries (ps-queries)
[Figure: a ps-query pattern over the nodes catalog, product, name, price, category, picture and subcat]
- No branching: sibling nodes with the same label are not allowed
[Figure: a forbidden pattern in which catalog has two product children]
[Figure: the tree type catalog, product, name, price, category, picture and subcat, annotated with the conditions price < 200 and category = elec]
Find the electronics products with price < 200, without their pictures (return their name, price, category and subcategory)
Important assumption: persistent node ids! (queries return actual nodes from the input, so answers to consecutive queries can be joined)
Example
[Figure: the source DTD (tree type); Query 1, a ps-query catalog/product with name, price < 200, cat = elec and subcat; and the answer to Query 1: three products, Canon (120, elec, camera), Nikon (199, elec, camera) and Sony (175, elec, cd-player).]
Incomplete information after Query 1
[Figure: catalog with the three products returned by Query 1: Canon (120, elec, camera), Nikon (199, elec, camera) and Sony (175, elec, cd-player).]
Known information: a prefix data tree
Incomplete information after Query 1
[Figure: the prefix data tree, extended with the specialized types product1 (cat ≠ elec) and product2 (cat = elec, price ≥ 200), each with name, price, cat, picture and subcat, describing the products that may still be missing.]
- Known information: a prefix data tree
- Missing information: an extended tree type
- conditions on data values
- specialization
- Together: an incomplete tree
Incomplete tree
Query 2
[Figure: Query 2, a ps-query catalog/product with name, cat = elec, subcat = camera and picture, and the answer to Query 2: Canon (elec, camera, c.jpg) and Olympus (elec, camera, o.jpg).]
Incomplete tree after Query 2
[Figure: the known products now include Olympus (elec, camera, o.jpg) and a picture c.jpg for Canon, next to Nikon (199, elec, camera) and Sony (175, elec, cd-player); the products that may still be missing are described by the specialized types product1, product2a, product2b, product2c and product3, with conditions on price (≥ 200), category (elec) and subcategory (camera or ≠ camera).]
Suppose the next query is Query 3: find the name, price and pictures of all cameras costing less than 200 and having at least one picture.
[Figure: Query 3, a ps-query catalog/product with name, price < 200, cat = elec, subcat = camera and picture.]
It can be fully answered using the available information.
Query 4: find all cameras
[Figure: Query 4, a ps-query catalog/product with name, cat = elec and subcat = camera.]
- Using the available information, we can
- provide the complete list of cameras that cost less than 200 or have a picture
- tell the user that there may be more cameras (expensive ones with no pictures)
- We can fully answer the query by asking the extra query
- Query 5: find the cameras that cost at least 200 and have no picture.
Conclusion
- Typing XML documents is essential
- for navigation
- for querying/optimization
- for exchange between applications
- for data protection
- DTDs: a type system that is simple to use but limited
- XML Schema: more expressive, perhaps too rich in features
- Tree automata provide a good level of abstraction and a good understanding of the validation process
- Other kinds of automata can also be used, based on a single sequential pass over the document (pushdown tree automata)
Thank you