Typing XML Documents and Automata (Typage de documents XML et automates)

1
Typing XML Documents and Automata
  • DTD, XML Schema, and tree automata
  • Serge Abiteboul
  • With contributions from Omar Benjelloun

2
Motivations
3
XML documents
  • Textual representation

<a> <b> <c>bla</c> </b>
<b>bla</b> </a>
4
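XML documents
  • Tree representation

[Figure: tree with root a and two b children; the first b has a child c containing the text "bla", the second b contains the text "bla"]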
5
Typing
  • It is not imposed in XML
  • "Semi-structured" data
  • But typing
  • Improves storage
  • Eases navigation in the data
  • data guide
  • Eases querying
  • Eases the description/explanation of the data
  • Helps optimization
  • Enables interoperability between programs
  • Helps protect the data

6
Improves storage
Lower-bound schema
Store rest in overflow graph
7
Helps optimization
select X.title from Bib._ X where X.*.zip = 12345
select X.title from Bib.book X where X.address.zip = 12345
Upper-bound schema
8
Schema extraction
  • Problem statement
  • given a data instance D
  • find some schema S for D
  • In practice, the most specific schema may be too
    large, so it needs to be relaxed

9
Schema Extraction: Sample Data
[Figure: data graph over the nodes r, p1 to p8 and c, with edges labelled employee (8), company (1), manages (5), managedby (5) and worksfor (8)]
10
Lower Bound Schema Extraction
[Figure: schema graph with the classes Root = {r}, Bosses = {p1, p4, p6}, Regulars = {p2, p3, p5, p7, p8} and Company = {c}, connected by edges labelled employee, company, manages, managedby and worksfor]
11
Upper Bound Schema Extraction: Data Guides
[Figure: schema graph with the classes Root = {r}, Employees = {p1, p2, p3, p4, p5, p6, p7, p8}, Bosses = {p1, p4, p6}, Regulars = {p2, p3, p5, p7, p8} and Company = {c}, connected by edges labelled employee, company, manages, managedby and worksfor]
12
Data guide
root
  • Gives all the possible paths in the
    data
  • Automaton minimization
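
A data guide lists every label path of the data exactly once. The sketch below is one way to build such a summary, assuming the data is an edge-labelled graph stored in a Python dict. It uses the subset construction from automata theory (a determinization step, swapped in here for illustration; the slide itself only mentions minimization). The function name `build_dataguide` and the sample graph are hypothetical.

```python
from collections import deque

def build_dataguide(edges, root):
    """Determinize an edge-labelled graph (subset construction).

    edges: dict mapping node -> list of (label, target) pairs.
    Returns (start, guide), where guide maps a frozenset of data nodes
    to {label: frozenset of data nodes reached by that label}.
    """
    start = frozenset([root])
    guide = {}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state in guide:
            continue
        by_label = {}
        for node in state:
            for label, target in edges.get(node, []):
                by_label.setdefault(label, set()).add(target)
        guide[state] = {lab: frozenset(ts) for lab, ts in by_label.items()}
        queue.extend(guide[state].values())
    return start, guide

# Hypothetical data: a root with three employees working for one company.
edges = {
    "r": [("employee", "p1"), ("employee", "p2"), ("employee", "p3"),
          ("company", "c")],
    "p1": [("manages", "p2"), ("manages", "p3"), ("worksfor", "c")],
    "p2": [("managedby", "p1"), ("worksfor", "c")],
    "p3": [("managedby", "p1"), ("worksfor", "c")],
}
start, guide = build_dataguide(edges, "r")
employees = guide[start]["employee"]   # all nodes reached by the path "employee"
```

Each state of the result groups the data nodes reachable by one label path, so a path exists in the data if and only if it can be followed in `guide`.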

13
Graph simulation
[Figure: a data graph and a schema graph related by R. Labels in the data graph: root; edges programmer, statistician, employee, project, workson, leads, consults; nodes c1, c2, e1 to e4, p1 to p9; string values "exercise", "lecture", "finance", "adminstr.", "PR", "undergrad", "grad", "postgrad", "web". Labels in the schema graph: programmer statistician, _, employee, projects, t1, t2, STRING.]
14-19
Graph simulation
[Six further animation frames of the same figure; the extracted text of the frames differs only in where the labels R and t1 appear]
20
Tree typing
21
Tree typing
  • Two main families
  • coupled types/labels
  • decoupled types/labels

22
Coupled types: DTD
  • Two elements with the same label have the same type
  • Type: regular expression over the children
  • Example
  • book (title,author,price)

23
DTD example
  • <!ELEMENT populationdata (continent)>
  • <!ELEMENT continent (name, country)>
  • <!ELEMENT country (name, province)>
  • <!ELEMENT province (name, city)>
  • <!ELEMENT city (name, pop)>
  • <!ELEMENT name (#PCDATA)>
  • <!ELEMENT pop (#PCDATA)>
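
With coupled types, validation reduces to checking, for every element, that the sequence of its children's labels belongs to the regular language attached to its label. A minimal sketch of that idea, assuming the content models of the DTD above exactly as they appear in this transcript and modelling `#PCDATA` as "no element children"; the table `CONTENT_MODELS` and the function `validate` are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# One regular expression per label, over child labels.
# Every child label is followed by a space so that names cannot run together.
CONTENT_MODELS = {
    "populationdata": r"continent ",
    "continent": r"name country ",
    "country": r"name province ",
    "province": r"name city ",
    "city": r"name pop ",
    "name": r"",   # text only: no element children
    "pop": r"",    # text only: no element children
}

def validate(elem):
    """Return True if elem and all its descendants match their content model."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        return False                      # undeclared element
    word = "".join(child.tag + " " for child in elem)
    if re.fullmatch(model, word) is None:
        return False                      # children do not match the model
    return all(validate(child) for child in elem)

good = ET.fromstring("<city><name>Paris</name><pop>2000000</pop></city>")
bad = ET.fromstring("<city><pop>2000000</pop><name>Paris</name></city>")
good_ok = validate(good)
bad_ok = validate(bad)
```

Because the type of a node depends only on its label, one lookup per element is enough, which is why validation is efficient.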

24
Advantages and drawbacks
  • + Efficient validation
  • - Not closed under union and complement
  • - Limited expressive power

25
[Figure: tree rooted at voitures with children occasion and neuve, annonce elements below them, and leaves année (1992) and marque (Peugeot, Renault)]
<!ELEMENT annonce (année?, marque)>
New cars (no year) cannot be distinguished from
used cars (with a year).
26
Decoupled types
  • Each type implies a label, but not
    the converse.
  • annonce1 : annonce (année, marque)
  • annonce2 : annonce (marque)
  • Plenty of good properties

XML Schema: many other gadgets in
schemas
27
[Figure: tree rooted at voitures with children occasion and neuve, annonce elements below them, and leaves année (1992) and marque (Peugeot, Renault)]
occasion : occasion (annonce1)
neuve : neuve (annonce2)
annonce1 : annonce (année, marque)
annonce2 : annonce (marque)
28
<?xml version="1.0"?>
<xsd:schema targetNamespace="http://www.net-language.com"
            xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
            xmlns="http://www.net-language.com"
            elementFormDefault="qualified">
  <xsd:element name="flashcards">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="flashcard" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="flashcard">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="question"/>
        <xsd:element ref="answer"/>
        <xsd:element ref="author" minOccurs="0"/>
        <xsd:element ref="comment" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="question" type="xsd:string"/>
  <xsd:element name="answer" type="xsd:string"/>
  <xsd:element name="author" type="xsd:string"/>
  <xsd:element name="comment" type="xsd:string"/>
</xsd:schema>
XML with a particular type
29
Typing
  • How does it work?

Tree automata
30
Automata on words
Transitions
Alphabet
States
Initial state
Accepting states
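
The five components listed above are enough to run an automaton on a word. The sketch below is an illustration only: the transition table is a hypothetical automaton over the alphabet {a, b} that accepts the words ending in "ab", not necessarily the one drawn on the slides.

```python
def run_dfa(delta, initial, accepting, word):
    """Run a deterministic word automaton and report acceptance."""
    state = initial
    for symbol in word:
        state = delta[(state, symbol)]   # exactly one transition per (state, symbol)
    return state in accepting

# Hypothetical automaton: accepts the words over {a, b} that end in "ab".
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q1", ("q2", "b"): "q0",
}

accepted = run_dfa(DELTA, "q0", {"q2"}, "abaab")
rejected = run_dfa(DELTA, "q0", {"q2"}, "aba")
```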
31
Example
[Figure: runs of a word automaton on the inputs "a b a a b" and "a b a", annotated with the states q0, q1 and q2]
32
Properties
  • Determinization
  • Minimization
  • Closed under union and intersection
  • Complement
  • Limitations
  • An essential tool of computer science

33
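Another use of automata: XPath, x in //a/b
[Figure: an XML tree with nodes labelled a and b, next to the NFA and the DFA for //a/b; the nodes selected by the query are marked x and the DFA starts in state (0)]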
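
The path expression //a/b can be evaluated by reading the labels along each root-to-node path with a small automaton. The sketch below simulates a three-state NFA (0 = start, 1 = "just read a", 2 = "read a then b") on sets of states, which yields the DFA states written (0), (01) and (02) on the slides. The functions `step` and `select` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def step(states, label):
    """One move of the NFA for //a/b, simulated on a set of NFA states."""
    nxt = set()
    if 0 in states:
        nxt.add(0)                 # state 0 loops on every label (the // step)
        if label == "a":
            nxt.add(1)             # an a has just been read
    if 1 in states and label == "b":
        nxt.add(2)                 # a followed by b: the node is selected
    return frozenset(nxt)

def select(elem, states=frozenset({0}), path="", out=None):
    """Top-down traversal; returns the paths of the nodes selected by //a/b."""
    out = [] if out is None else out
    states = step(states, elem.tag)
    path = path + "/" + elem.tag
    if 2 in states:
        out.append(path)
    for child in elem:
        select(child, states, path, out)
    return out

doc = ET.fromstring("<a><b><c/></b><a><b/></a><b/></a>")
selected = select(doc)
```

Only one set of states is carried per node, so the document is processed in a single pass.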
34-47
Example: //a/b
[Animation frames of the same figure: the tree is traversed while each node is annotated with a DFA state, (0), (01) or (02), and the selected nodes are marked x]
48
Size of the deterministic automaton
//a/*/*/b
49
Tree automata
  • Case 1
  • trees of bounded rank
  • Example
  • binary trees

50
Binary tree automata
  • Main change
  • For the leaves
  • For the nodes

51
Example
52
Evaluating a Boolean circuit
OK
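
Evaluating a Boolean circuit is a natural example of a bottom-up automaton on binary trees: a leaf rule assigns a state to each leaf, and a node rule computes the state of a node from its label and the states of its two children. A minimal sketch under assumed conventions (states `q0` and `q1` for the values 0 and 1, gates named "and" and "or", trees as nested tuples); none of these names come from the slides.

```python
# Rules for the leaves: label -> state.
LEAF_RULES = {"0": "q0", "1": "q1"}

# Rules for the nodes: (label, state of left child, state of right child) -> state.
NODE_RULES = {
    ("and", "q0", "q0"): "q0", ("and", "q0", "q1"): "q0",
    ("and", "q1", "q0"): "q0", ("and", "q1", "q1"): "q1",
    ("or", "q0", "q0"): "q0", ("or", "q0", "q1"): "q1",
    ("or", "q1", "q0"): "q1", ("or", "q1", "q1"): "q1",
}

ACCEPTING = {"q1"}   # the circuit is accepted when it evaluates to 1

def run(tree):
    """Bottom-up run: a leaf is a string, an inner node is (label, left, right)."""
    if isinstance(tree, str):
        return LEAF_RULES[tree]
    label, left, right = tree
    return NODE_RULES[(label, run(left), run(right))]

circuit = ("or", ("and", "1", "0"), ("and", "1", "1"))
root_state = run(circuit)
accepted = root_state in ACCEPTING
```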
53
Bounded rank: properties
  • Determinization
  • Minimization
  • Closure under
  • Complement
  • Intersection/union

As for words
Yes, but harder
Yes
54
But
  • XML documents correspond to trees of
    arbitrary rank
  • book (intro,section,conclusion)

Case 2: unbounded rank
55
Unranked tree automata
  • Main change
  • accepts if, for each leaf, we obtain a state
    qi such that

56
Example
57
Evaluating a Boolean circuit
[Figure: a Boolean circuit drawn as an unranked tree; gate nodes are rendered as "v" and every node is annotated with its value, 0 or 1]
58
Properties
  • Determinization

IT DOES NOT ALWAYS WORK
"Top-down" evaluation is delicate.
59
Properties
  • Determinization (continued)

Determinization is always possible
"Bottom-up" evaluation.
60
Properties
  • Determinization
  • Yes, with "bottom-up" evaluation
  • Minimization: yes
  • Closure
  • under complement
  • under intersection/union

Yes
61
DTD -> automaton
XML Schema -> automaton
Deterministic "top-down" evaluation
62
Example with XML Schema
  • racine : voitures (occases, neuves)
  • occases : occasion (annonce1)
  • neuves : neuve (annonce2)
  • annonce1 : annonce (année, marque)
  • annonce2 : annonce (marque)
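
The rules above can be read as an unranked tree automaton whose states are the type names: a node receives type t if its label is the label of t and the types of its children form a word of the regular expression of t. The sketch below runs such an automaton bottom-up, computing for each node the set of types it can take. It follows the rules as printed here and adds two leaf types, `année` and `marque`, as an assumption; `RULES` and `types_of` are hypothetical names.

```python
import re
from itertools import product
import xml.etree.ElementTree as ET

# type -> (label, regular expression over the child types).
# Every child type is followed by a space so that names cannot run together.
RULES = {
    "racine": ("voitures", r"occases neuves "),
    "occases": ("occasion", r"annonce1 "),
    "neuves": ("neuve", r"annonce2 "),
    "annonce1": ("annonce", r"année marque "),
    "annonce2": ("annonce", r"marque "),
    "année": ("année", r""),     # assumed leaf type (text content ignored)
    "marque": ("marque", r""),   # assumed leaf type (text content ignored)
}

def types_of(elem):
    """Bottom-up run: the set of types that can be assigned to elem."""
    child_types = [types_of(child) for child in elem]
    found = set()
    for type_name, (label, model) in RULES.items():
        if label != elem.tag:
            continue
        # Try every way of picking one type per child.
        for choice in product(*child_types):
            if re.fullmatch(model, "".join(t + " " for t in choice)):
                found.add(type_name)
                break
    return found

good = ET.fromstring(
    "<voitures>"
    "<occasion><annonce><année>1992</année><marque>Peugeot</marque></annonce></occasion>"
    "<neuve><annonce><marque>Renault</marque></annonce></neuve>"
    "</voitures>"
)
bad = ET.fromstring(
    "<voitures>"
    "<occasion><annonce><marque>Peugeot</marque></annonce></occasion>"
    "<neuve><annonce><marque>Renault</marque></annonce></neuve>"
    "</voitures>"
)
good_types = types_of(good)
bad_types = types_of(bad)
```

The two `annonce` types share a label, which is exactly what a DTD cannot express; the document is valid when the root receives the type `racine`.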

63
XML Schema
  • Heavily criticized
  • Not yet standardized
  • Many competitors

64
Goals
  • Avoid the need for special tools to create
    and maintain DTDs
  • XML Schema must allow the definition of
    richer data
  • Schemas must be extensible
  • See
  • http://www.w3schools.com/schema/

65
XML Schema
  • XML syntax
  • Like tree automata: decoupled types
  • <element name="annonce" type="annonce1"/>
  • Additional features
  • Namespace management
  • Atomic types
  • Integrity constraints
  • etc.

66
Simple elements
  • <xs:element name="xxx" type="yyy"/>
  • Example
  • <lastname>Refsnes</lastname>
  • <age>34</age>
  • <dateborn>1968-03-27</dateborn>
  • <xs:element name="lastname" type="xs:string"/>
  • <xs:element name="age" type="xs:integer"/>
  • <xs:element name="dateborn" type="xs:date"/>
  • Most common types: xs:string, xs:decimal,
    xs:integer, xs:boolean, xs:date, xs:time

67
Attributes
  • <xs:attribute name="xxx" type="yyy"/>
  • Example
  • <lastname lang="EN">Smith</lastname>
  • <xs:attribute name="lang" type="xs:string"/>
  • Most common types: xs:string, xs:decimal,
    xs:integer, xs:boolean, xs:date, xs:time

68
Restrictions
<xs:element name="age">
  <xs:simpleType>
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>
  • Other restrictions: enumerated types, patterns,
    etc.
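
To make the meaning of the restriction concrete, the sketch below checks a text value against it by hand: the value must first be a lexically valid integer, then lie between 0 and 100 inclusive. This is an illustration of the two facets only, not how a schema validator is implemented; `is_valid_age` is a hypothetical name.

```python
import re

def is_valid_age(text):
    """Check a value against xs:integer restricted to the range [0, 100]."""
    text = text.strip()
    if re.fullmatch(r"[+-]?[0-9]+", text) is None:
        return False                 # not an integer literal
    return 0 <= int(text) <= 100     # minInclusive = 0, maxInclusive = 100

ok = is_valid_age("34")
too_big = is_valid_age("101")
```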

69
Complex elements: 4 kinds
  • A complex XML element, "product", which is empty
  • <product pid="1345"/>
  • A complex XML element which contains only other
    elements
  • <employee> <firstname>John</firstname>
    <lastname>Smith</lastname> </employee>
  • A complex XML element which contains only text
  • <food type="dessert">Ice cream</food>
  • A complex XML element which contains both
    elements and text
  • <description> It happened on <date
    lang="norwegian">03.03.99</date> ....
    </description>

70
Example: elements only
<person>
  <firstname>John</firstname>
  <lastname>Smith</lastname>
</person>
<xs:element name="person">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

71
Other gadgets in XML Schema
  • Types associated with a namespace can be
    imported
  • <import namespace="http://..."
  • schemaLocation="http://..."
    />
  • A schema can be included or extended
  • <include schemaLocation="http://..."/>
  • <redefine schemaLocation="http://...">
  • .... Extensions ...
  • </redefine>

72
Named data types
<xs:element name="employee">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
  • Only the "employee" element can use the specified
    complex type
  • (<sequence> indicates an order on child
    elements)
  • Alternative
<xs:element name="employee" type="personinfo"/>
<xs:complexType name="personinfo">
  <xs:sequence>
    <xs:element name="firstname" type="xs:string"/>
    <xs:element name="lastname" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

73
Example: a DTD
<?xml version="1.0"?>
<!ELEMENT EMAIL (TO+, FROM, CC*, BCC*, SUBJECT?, BODY?)>
<!ATTLIST EMAIL
  LANGUAGE (Western|Greek|Latin|Universal) "Western"
  ENCRYPTED CDATA #IMPLIED
  PRIORITY (NORMAL|LOW|HIGH) "NORMAL">
<!ELEMENT TO (#PCDATA)>
<!ELEMENT FROM (#PCDATA)>
<!ELEMENT CC (#PCDATA)>
<!ELEMENT BCC (#PCDATA)>
<!ATTLIST BCC HIDDEN CDATA #FIXED "TRUE">
<!ELEMENT SUBJECT (#PCDATA)>
<!ELEMENT BODY (#PCDATA)>
<!ENTITY SIGNATURE "Bill">
74
in XML Schema
<?xml version="1.0" ?>
<Schema name="email"
        xmlns="urn:schemas-microsoft-com:xml-data"
        xmlns:dt="urn:schemas-microsoft-com:datatypes">
  <AttributeType name="language" dt:type="enumeration"
                 dt:values="Western Greek Latin Universal" />
  <AttributeType name="encrypted" />
  <AttributeType name="priority" dt:type="enumeration"
                 dt:values="NORMAL LOW HIGH" />
  <AttributeType name="hidden" default="true" />
  <ElementType name="to" content="textOnly" />
  <ElementType name="from" content="textOnly" />
  <ElementType name="cc" content="textOnly" />
  <ElementType name="bcc" content="mixed">
    <attribute type="hidden" required="yes" />
  </ElementType>
  <ElementType name="subject" content="textOnly" />
  <ElementType name="body" content="textOnly" />
  <ElementType name="email" content="eltOnly">
    <attribute type="language" default="Western" />
    <attribute type="encrypted" />
    <attribute type="priority" default="NORMAL" />
    <element type="to" minOccurs="1" maxOccurs="*" />
    <element type="from" minOccurs="1" maxOccurs="1" />
    <element type="cc" minOccurs="0" maxOccurs="*" />
    <element type="bcc" minOccurs="0" maxOccurs="*" />
    <element type="subject" minOccurs="0" maxOccurs="1" />
    <element type="body" minOccurs="0" maxOccurs="1" />
  </ElementType>
</Schema>
75
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.net-language.com">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
        <xs:element name="character" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="friend-of" type="xs:string"
                          minOccurs="0" maxOccurs="unbounded"/>
              <xs:element name="since" type="xs:date"/>
              <xs:element name="qualification" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="isbn" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
76
XML Schema
  • Quite rich
  • Too rich?

77
Typing
78
Type checking
  • Who checks?
  • XML editors
  • Data exchange (at the sending and receiving ends)
  • Dynamic checking: once the data has been
    generated
  • Static checking: checking the program
    that builds the data
  • application, XQuery query, XSLT transformation

79
Type checking and type inference
  • Input: an input schema T and a function f
  • Checking: given T', is it the case that for
    all $d \in T$, $f(d) \in T'$?
  • Inference: find the smallest T' such that for
    all $d \in T$, $f(d) \in T'$
  • Special case: f(any)
  • Undecidable in general because of joins

80
Example
  • for $p in doc("parts.xml")//part[color="red"]
  • return <pièce>
  • <nom>{$p/name}</nom>
  • <desc>{$p/desc}</desc>
  • </pièce>
  • Type of the result
  • (pièce (nom (string) desc (any) )
  • If the type of parts.xml//part/desc is string
  • (pièce (nom (string) desc (string) )

81
Illustration of the difficulty
  • for X in Input, Y in Input do print(<b/>)
  • Input: <a/> <a/>
  • Result: <b/> <b/> <b/> <b/>
  • Problem: $\{ b^i \mid i = n^2 \text{ for } n \ge 0 \}$ is not a
    type
  • No best type
  • b
  • ? b2 b2b
  • ? b2 b4 b4b
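
The nested loop can be replayed directly to see where the difficulty comes from. In the sketch below (the function name `run_query` is hypothetical), an input of n elements always produces n² output elements, so the possible output lengths are exactly the perfect squares, a set that no regular expression over b describes.

```python
def run_query(inputs):
    """for X in Input, Y in Input do print(<b/>)"""
    return ["<b/>" for _x in inputs for _y in inputs]

# Output sizes for inputs made of 0, 1, 2, 3 and 4 <a/> elements.
sizes = [len(run_query(["<a/>"] * n)) for n in range(5)]
```

Any regular type that contains all these outputs also contains words of other lengths, and it can always be tightened a little further, hence no best type.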

82
Automata for computing
83
Proposal: k-pebble transducers
stack
Milo, Suciu, Vianu
84
k-pebble transducers: result
Capture a core aspect of XQuery but not the data
management part
85
Many other possible uses of types:
incomplete information
86
Scenario
  • Information continuously enriched
    by successive queries to XML sources
  • Need to
  • -represent incomplete information
  • -intelligently answer queries
  • using incomplete information

87
Wish list
  • Intuitive representation system
    extension of DTDs
  • Efficient incremental maintenance
    through consecutive queries


T0
Input DTD
88-94
Wish list
[Animation frames of the same slide: the chain T0, T1, T2, T3 is built step by step from the input DTD and the query-answer pairs (q1, A1), (q2, A2), (q3, A3), ..., (qk, Ak)]
95
Wish list
  • Intuitive representation system:
    graceful extension of DTDs
  • Efficient incremental maintenance
    through consecutive queries

[Diagram: starting from the input DTD T0, the query-answer pairs (q1, A1), (q2, A2), (q3, A3), ..., (qk, Ak) produce T1, T2, T3, ..., Tk]
$rep(T_k) = rep(T_0) \cap q_1^{-1}(A_1) \cap \dots \cap q_k^{-1}(A_k)$
96
Querying incomplete XML docs
  • Using just available information
  • compute description of all answers

97-99
Querying incomplete XML docs
[Animation frames that build, arrow by arrow, the diagram relating T, rep(T), q(T) and q(rep(T))]
100
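Querying incomplete XML docs
  • Using just available information
  • compute description of all answers

[Diagram: rep maps T to rep(T) and q(T) to rep(q(T)); q maps T to q(T) and rep(T) to q(rep(T)); the requirement is q(rep(T)) = rep(q(T))]
Need a strong representation system with respect to the
query language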
101
  • Answer queries using the
    sure and possible modalities
  • -is t surely true in all answers to q?
  • -is t possibly true in some answer to q?

Example:
-is t a prefix of all trees in q(rep(T))?
-is t a prefix of some tree in q(rep(T))?
102
  • Decide if the available info is enough
    to fully answer the query
  • - similar to answering queries using views
  • If not, seek additional information
  • - mediator problem: find a minimal set of
    additional queries to sources needed to
    fully answer the query
  • - use the representation of incomplete info to
    guide the mediator

103
Challenge: balance expressiveness and
tractability!
  • Parameters: choice of
  • XML document types (DTDs)
  • Query language
  • Representation system
  • Proposal of Abiteboul, Segoufin, Vianu - PODS
    2001
  • -simple, practically appealing
  • -many limitations
  • -justification:
  • extra features lead to serious problems!

104
  • XML abstraction
  • unranked trees with labels and values
  • simplified DTD
  • tree type: unordered, simple cardinality
    constraints

[Figure: tree type with catalog above product, product above name, price, category and picture, and category above subcat; each edge carries a cardinality]
Legend: 1 exactly one, + at least one, * unrestricted, ? zero or one
105
  • Query language
  • prefix-selection queries (ps-queries)

[Query tree: catalog / product / name, price, category, picture; subcat below category]
106
  • Query language
  • prefix-selection queries (ps-queries)

[Query tree: catalog / product / name, price, category, picture; subcat below category]
No branching siblings with same label
[Counter-example: catalog with two product children, one above subcat and one above picture]
107-109
  • Query language
  • prefix-selection queries (ps-queries)

[Animation frames of the same query tree: catalog / product / name, price, category, picture; subcat below category]
110
  • Query language
  • prefix-selection queries (ps-queries)

[Query tree: catalog / product / name, price (< 200), category (elec), picture; subcat below category]
Find electronics products with price < 200 and
without pictures (display all info about their
name, the price, the category and subcategory)
Important assumption: persistent node
ids! (Queries return actual nodes from the input, so one
can join answers from consecutive queries.)
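
A simplified reading of a prefix-selection query is a tree pattern with conditions on data values: a data node is kept when its label and value satisfy the pattern node and every child of the pattern is matched by at least one child of the data node. The answer is the part of the input covered by such matches. The sketch below implements this reading on trees written as `(label, value, children)` tuples; it leaves out persistent node ids and other features of the actual proposal, and every name in it, as well as the Olympus price, is made up for the illustration.

```python
def leaf(label, value):
    return (label, value, [])

def product(name, price, cat, subcat, pictures=()):
    """Build a product node with the shape used in the catalog example."""
    return ("product", None, [
        leaf("name", name),
        leaf("price", price),
        ("cat", cat, [leaf("subcat", subcat)]),
        *[leaf("picture", p) for p in pictures],
    ])

def select(node, pattern):
    """Return the prefix of node matched by pattern, or None if no match."""
    label, value, children = node
    p_label, p_cond, p_children = pattern
    if label != p_label or (p_cond is not None and not p_cond(value)):
        return None
    kept = []
    for p_child in p_children:
        matches = [m for c in children
                   if (m := select(c, p_child)) is not None]
        if not matches:
            return None          # a required branch of the pattern is missing
        kept.extend(matches)
    return (label, value, kept)

catalog = ("catalog", None, [
    product("Canon", 120, "elec", "camera", ["c.jpg"]),
    product("Nikon", 199, "elec", "camera"),
    product("Sony", 175, "elec", "cd-player"),
    product("Olympus", 250, "elec", "camera", ["o.jpg"]),  # price is made up
])

# Electronics products with a price below 200, returning name, price, cat, subcat.
query1 = ("catalog", None, [
    ("product", None, [
        ("name", None, []),
        ("price", lambda v: v < 200, []),
        ("cat", lambda v: v == "elec", [("subcat", None, [])]),
    ]),
])

answer = select(catalog, query1)
names = [prod[2][0][1] for prod in answer[2]]
```

The picture nodes are absent from the answer because the pattern does not mention them, which is what makes the answer a prefix of the input rather than a copy of it.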
111
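Example
Source DTD (tree type)
[Figure: catalog above product; product above name, price, cat and picture; cat above subcat; edges annotated with cardinalities]
Query 1
[Query tree: catalog / product / name, price < 200, cat = elec; subcat below cat]
Answer to Query 1
[Data tree: catalog with three product nodes]
  name   price  cat   subcat
  Canon  120    elec  camera
  Nikon  199    elec  camera
  Sony   175    elec  cd-player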
112
Incomplete information after Query 1
[Data tree: catalog with three product nodes: Canon 120 elec camera; Nikon 199 elec camera; Sony 175 elec cd-player]
known information: prefix data tree
113-115
Incomplete information after Query 1
[Figure, built up over three animation frames: the data tree with the products Canon 120 elec camera, Nikon 199 elec camera and Sony 175 elec cd-player, extended with two specialized types product1 and product2, each with children name, price, cat, picture and subcat, and annotated with conditions on data values (rendered as "?elec" and "?200")]
missing information: extended tree type
--conditions on data values
--specialization
known information: prefix data tree

Incomplete tree
116
Incomplete tree
.
117
Query 2
[Query tree: catalog / product / name, cat = elec, picture; subcat = camera below cat]
Answer to Query 2
[Data tree: catalog with two product nodes]
  name     cat   picture  subcat
  Canon    elec  c.jpg    camera
  Olympus  elec  o.jpg    camera
118
Incomplete tree after Query 2
[Figure: catalog with the known products
  Canon    120  elec  camera     c.jpg
  Nikon    199  elec  camera
  Sony     175  elec  cd-player
  Olympus       elec  camera     o.jpg
and the specialized types product1, product2a, product2b, product2c and product3 describing the missing products, with conditions on price (rendered as "?200"), cat (elec) and subcat (camera, "?camera")]
119
Suppose the next query is Query 3: find the name,
price and pictures of all cameras costing
less than 200 and having at least one picture.
[Query tree: catalog / product / name, price < 200, cat = elec, picture; subcat = camera below cat]
Can be fully answered using available information
120
Query 4: find all cameras
[Query tree: catalog / product / name, cat = elec; subcat = camera below cat]
  • Using the available information, one can

--provide the complete list of cameras that cost
less than 200 or have a picture
--tell the user that there may be more cameras (those that are
expensive and have no pictures).
  • The query can be fully answered by asking
  • the extra query
  • Query 5: find the cameras
  • that cost at least 200 and have no picture.
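
The reasoning behind Query 5 can be replayed with a small case analysis. The sketch below is a deliberate simplification, not the algorithm of the proposal: it describes a camera by two yes/no facts (costs less than 200, has a picture) and checks which combinations are already covered by the answers to Query 1 and Query 2. All names are hypothetical.

```python
from itertools import product

# Every kind of camera, described as (cheap, has_picture).
KINDS = list(product([True, False], repeat=2))

def covered_by_query1(cheap, has_picture):
    return cheap          # Query 1 returned every elec product costing less than 200

def covered_by_query2(cheap, has_picture):
    return has_picture    # Query 2 returned every camera that has a picture

# Kinds of cameras about which the available information says nothing.
missing = [kind for kind in KINDS
           if not (covered_by_query1(*kind) or covered_by_query2(*kind))]
```

The only uncovered combination is "not cheap and no picture", which is the extra query stated above.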

121
Conclusion
  • Typing XML documents is essential
  • For navigation
  • For querying/optimization
  • For exchange between applications
  • For data protection
  • DTDs: a type system that is simple to use but
    limited
  • XML Schema: more expressive, perhaps too rich
    in features
  • Tree automata provide a good level of
    abstraction and a good understanding of the
    validation process.
  • Other kinds of automata can also be
    used, based on a single sequential pass
    over the document (pushdown tree automata)

122
Thank you