XML ? Relational DB - PowerPoint PPT Presentation

1 / 62
About This Presentation
Title:

XML ? Relational DB

Description:

EBERHARD-KARLS-UNIVERSIT T T BINGEN SFS TCL. XML Relational DB. Proposal ... Semantic markup added to irregularly structured or largely text-based information ... – PowerPoint PPT presentation

Number of Views:42
Avg rating:3.0/5.0
Slides: 63
Provided by: kep7
Category:

less

Transcript and Presenter's Notes

Title: XML ? Relational DB


1
XML ? Relational DB
  • Proposal by Richard Edwards
  • Presented by Stephan Kepser

2
Richard Edwards
  • Computer Scientist
  • School of Informatics
  • University of Bangor, Wales
  • rich_at_informatics.bangor.ac.uk
  • http//www.informatics.bangor.ac.uk/rich/
  • Source of this talk 2 PPT presentations by
    Edwards.

3
Aim
  • To store arbitrary XML documents
  • without recourse to underlying DTD or schema
  • in a relational database
  • in particular in a SQL database.

4
In a Nutshell
  • Store the Document Object Model of the XML
    document.
  • Extend each node entry by two natural numbers
    resembling the position of the node in the
    document tree.

5
Overview of Presentation
  • XML and Databases
  • Document object model (DOM)
  • Storing the DOM in a relational database
  • Adding navigational information
  • Conclusions

6
XML
  • Extensible Markup Language
  • Form of a class of documents may be given by
  • a Document Type Definition (DTD)
  • an XML Schema
  • Model of a document A tree.

7
Example
8
Terminology
  • Well-formed
  • Conforms to the rules of XML
  • Valid
  • Document structure conforms to a specified
    schema
  • Structure hierarchy of elements, attributes
    content
  • DTD (SGML), XML Schema

9
Categorising documents
  • Data-centric
  • Low-level atomic data
  • Highly structured, e.g. a database extract
  • Document-centric
  • Semantic markup added to irregularly structured
    or largely text-based information
  • These are loose categories
  • Documents displaying the characteristics of both
    are common, e.g. notes added to structured data.

10
XML versus Databases
  • An XML document is not a database.
  • Database management systems offer
  • Indexing (clustered and non-clustered)
  • Constraints
  • Foreign Keys
  • Patterns
  • Querying
  • Multi-user access

11
XML versus Databases
  • Database management systems offer
  • Security
  • Transaction management
  • Thread safety
  • Crash recovery
  • Cacheing
  • Locking mechanisms
  • Deadlock contention handling

12
Document Object Model
  • At the same time
  • a model for arbitrary XML documents
  • an application programming interface (API)
  • for languages like Java, C, Python
  • Base tree structure of a document

13
DOM (2)
  • A DOM is not the elements' tree of the document.
  • Because an XML document contains much more than
    elements
  • Attributes
  • Comments
  • Text nodes
  • Notation
  • Processing instructions

14
DOM root
  • A DOM is a tree.
  • Its root is the document node.
  • This is a new root node to attach the whole
    document to.
  • Its daughters the top level objects of the
    document. These are
  • the document's root element
  • processing instructions
  • lt?xml-stylesheet href"person.css"
    type"text/css" ?gt

15
DOM root (2)
  • document type node
  • lt!DOCTYPE person gt
  • comment nodes
  • lt!-- no comment --gt

16
Node types
  • Element node
  • Attribute node
  • Name and value of an attribute
  • Document node
  • Root of the DOM tree
  • Document type node
  • Root of the DTD
  • Comment node

17
Node types (2)
  • Processing instruction node
  • Text node
  • Contains ordinary text of the document
  • CDATA section node
  • lt!CDATA blabla bla gt
  • Document fragment node
  • Root of a scrap board for editing,
  • Only for processing, not part of a real
    XML-document

18
Node types (3)
  • Entity node
  • Definition of an entity in a DTD
  • lt!ENTITY coord "((x, y)(y,x))"gt
  • Entity reference node
  • Optional reference of an entity, e.g., lt
  • Notation node
  • Definition of a notation in a DTD
  • lt!NOTATION jpeg SYSTEM "image/jpeg"gt

19
Node properties
  • Node name
  • Element or attribute name, or a fixed string
    like comment
  • Node value
  • Attribute value or the actual text of a text
    node
  • Node type
  • One of the node types above

20
Node properties (2)
  • Parent node
  • List of child nodes
  • First child
  • Last child
  • Previous sibling
  • Next sibling
  • Attributes set

21
Leaf nodes
  • Attribute node
  • CDATA section node
  • Comment node
  • Entity node
  • Entity reference node
  • Notation node
  • Processing instruction node
  • Text node

22
Example
  • ltsample bogus"value"gt
  • lttext_nodegtTest data.lt/text_nodegt
  • lt/samplegt

23
Example
24
DOM
  • The DOM contains a document's structure and ist
    content.
  • It is a generic model, exists for any
    XML-document.
  • It is independent of a DTD (may contain a DTD!)
    or a schema.

25
Persistent DOM
  • Store DOM in relation database
  • Translate the tree into a set of nodes
  • Store node properties in tables.

26
Database design
27
Node type
  • Lookup table

28
doc
  • Data table

29
doc
  • Sample data

30
node
  • Data table

31
node
  • Sample data

32
element_name_leaf
my_incredibly_verbosely_named_element
33
text_leaf
34
element_namespace_leaf
my_incredibly_verbose_nsmyelement
35
More tables
  • attribute
  • node_id
  • attribute_id
  • attribute_namespace_leaf
  • attribute_name_leaf
  • attribute_value_leaf

36
Yet more tables
  • comment_leaf
  • cdata_leaf
  • entity_reference_leaf
  • pi_data_leaf
  • pi_target_leaf

37
Database design repeated
38
Labelling nodes
  • Each node receives two natural numbers as
    coordinates (x,y).
  • The tree is traversed depth-first.
  • A counter is installed. On entering and leaving a
    node it is incremented.
  • On entering a node, the node receives the current
    counter value as x-coordinate.
  • On leaving a node, the node receives the current
    counter value as y-coordinate.

39
x 1 y ?
doc
1
?
  • ltagt
  • ltbgt
  • ltc/gt
  • ltd/gt
  • lt/bgt
  • lte/gt
  • ltfgt
  • ltggtlt/ggt
  • lt/fgt
  • lt/agt

SAX event startDocument() DB action insert doc
(and return unique doc ID)
40
x 2 y ?
doc
1
?
  • ltagt
  • ltbgt
  • ltc/gt
  • ltd/gt
  • lt/bgt
  • lte/gt
  • ltfgt
  • ltggtlt/ggt
  • lt/fgt
  • lt/agt

a
2
?
SAX event startElement(nsURI, locName, qName,
attrs) DB action insert node, insert
element_name_leaf
41
x 3 y ?
doc
1
?
  • ltagt
  • ltbgt
  • ltc/gt
  • ltd/gt
  • lt/bgt
  • lte/gt
  • ltfgt
  • ltggtlt/ggt
  • lt/fgt
  • lt/agt

a
2
?
b
3
?
SAX event startElement(nsURI, locName, qName,
attrs) DB action insert node, insert
element_name_leaf
42
x 4 y ?
doc
1
?
  • ltagt
  • ltbgt
  • ltc/gt
  • ltd/gt
  • lt/bgt
  • lte/gt
  • ltfgt
  • ltggtlt/ggt
  • lt/fgt
  • lt/agt

a
2
?
b
3
?
c
4
?
SAX event startElement(nsURI, locName, qName,
attrs) DB action insert node, insert
element_name_leaf
43
x 4 y 5
doc
1
?
  • ltagt
  • ltbgt
  • ltc/gt
  • ltd/gt
  • lt/bgt
  • lte/gt
  • ltfgt
  • ltggtlt/ggt
  • lt/fgt
  • lt/agt

a
2
?
b
3
?
c
4
5
SAX event endElement(nsURI, locName, qName) DB
action update node (set y index)
44
x 6 y ?
doc
1
?
  • ltagt
  • ltbgt
  • ltc/gt
  • ltd/gt
  • lt/bgt
  • lte/gt
  • ltfgt
  • ltggtlt/ggt
  • lt/fgt
  • lt/agt

a
2
?
b
3
?
c
d
4
5
6
?
SAX event startElement(nsURI, locName, qName,
attrs) DB action insert node, insert
element_name_leaf
45
x 6 y 7
doc
1
?
  • ltagt
  • ltbgt
  • ltc/gt
  • ltd/gt
  • lt/bgt
  • lte/gt
  • ltfgt
  • ltggtlt/ggt
  • lt/fgt
  • lt/agt

a
2
?
b
3
?
c
d
4
5
6
7
SAX event endElement(nsURI, locName, qName) DB
action update node (set y index)
46
x 3 y 8
doc
1
?
  • ltagt
  • ltbgt
  • ltc/gt
  • ltd/gt
  • lt/bgt
  • lte/gt
  • ltfgt
  • ltggtlt/ggt
  • lt/fgt
  • lt/agt

a
2
?
b
3
8
c
d
4
5
6
7
SAX event endElement(nsURI, locName, qName) DB
action update node (set y index)
47
x 9 y ?
doc
1
?
  • ltagt
  • ltbgt
  • ltc/gt
  • ltd/gt
  • lt/bgt
  • lte/gt
  • ltfgt
  • ltggtlt/ggt
  • lt/fgt
  • lt/agt

a
2
?
b
e
3
8
9
?
c
d
4
5
6
7
SAX event startElement(nsURI, locName, qName,
attrs) DB action insert node, insert
element_name_leaf
48
x 9 y 10
doc
1
?
  • ltagt
  • ltbgt
  • ltc/gt
  • ltd/gt
  • lt/bgt
  • lte/gt
  • ltfgt
  • ltggtlt/ggt
  • lt/fgt
  • lt/agt

a
2
?
b
e
3
8
9
10
c
d
4
5
6
7
SAX event endElement(nsURI, locName, qName) DB
action update node (set y index)
49
x 11 y ?
doc
1
?
  • ltagt
  • ltbgt
  • ltc/gt
  • ltd/gt
  • lt/bgt
  • lte/gt
  • ltfgt
  • ltggtlt/ggt
  • lt/fgt
  • lt/agt

a
2
?
b
e
f
3
8
9
10
11
?
c
d
4
5
6
7
SAX event startElement(nsURI, locName, qName,
attrs) DB action insert node, insert
element_name_leaf
50
x 12 y ?
doc
1
?
  • ltagt
  • ltbgt
  • ltc/gt
  • ltd/gt
  • lt/bgt
  • lte/gt
  • ltfgt
  • ltggtlt/ggt
  • lt/fgt
  • lt/agt

a
2
?
b
e
f
3
8
9
10
11
?
c
d
g
?
12
4
5
6
7
SAX event startElement(nsURI, locName, qName,
attrs) DB action insert node, insert
element_name_leaf
51
x 13 y 14
doc
1
?
  • ltagt
  • ltbgt
  • ltc/gt
  • ltd/gt
  • lt/bgt
  • lte/gt
  • ltfgt
  • ltggtlt/ggt
  • lt/fgt
  • lt/agt

a
2
?
b
e
f
3
8
9
10
11
?
c
d
g
?
12
4
5
6
7

SAX event characters(char ch, int start, int
length) DB action insert node, insert text_leaf
13
14
52
x 12 y 15
doc
1
?
  • ltagt
  • ltbgt
  • ltc/gt
  • ltd/gt
  • lt/bgt
  • lte/gt
  • ltfgt
  • ltggtlt/ggt
  • lt/fgt
  • lt/agt

a
2
?
b
e
f
3
8
9
10
11
?
c
d
g
15
12
4
5
6
7

SAX event endElement(nsURI, locName, qName) DB
action update node (set y index)
13
14
53
x 11 y 16
doc
1
?
  • ltagt
  • ltbgt
  • ltc/gt
  • ltd/gt
  • lt/bgt
  • lte/gt
  • ltfgt
  • ltggtlt/ggt
  • lt/fgt
  • lt/agt

a
2
?
b
e
f
3
8
9
10
11
16
c
d
g
15
12
4
5
6
7

SAX event endElement(nsURI, locName, qName) DB
action update node (set y index)
13
14
54
x 2 y 17
doc
1
?
  • ltagt
  • ltbgt
  • ltc/gt
  • ltd/gt
  • lt/bgt
  • lte/gt
  • ltfgt
  • ltggtlt/ggt
  • lt/fgt
  • lt/agt

a
2
17
b
e
f
3
8
9
10
11
16
c
d
g
15
12
4
5
6
7

SAX event endElement(nsURI, locName, qName) DB
action update node (set y index)
13
14
55
x 1 y 18
doc
1
18
  • ltagt
  • ltbgt
  • ltc/gt
  • ltd/gt
  • lt/bgt
  • lte/gt
  • ltfgt
  • ltggtlt/ggt
  • lt/fgt
  • lt/agt

a
2
17
b
e
f
3
8
9
10
11
16
c
d
g
15
12
4
5
6
7

SAX event endDocument() DB action update doc
(set success flag) do extras
13
14
56
doc
1
18
  • ltagt
  • ltbgt
  • ltc/gt
  • ltd/gt
  • lt/bgt
  • lte/gt
  • ltfgt
  • ltggtlt/ggt
  • lt/fgt
  • lt/agt

a
2
17
b
e
f
3
8
9
10
11
16
c
d
g
15
12
4
5
6
7

13
14
57
Navigation
  • The next sibling (x,y) of any given node (x, y)
    has xy1
  • The first child (x,y) of any given node (x, y)
    has xx1
  • The set of nodes that originate from a given node
    (x, y) have xltxlty

58
Node labelling
  • Facilitates rapid serialisation (serial output)
  • No need to traverse the tree
  • Facilitates simple locking mechanism
  • But
  • Maintenance of node labels
  • Trade-off between select speed and update
    efficiency

59
Querying
  • XPath expressions have to be translated into
    complicated select statements.
  • Subcomponents of an XPath expression are
    translated into where-clauses or temporary tables
    (embeded select).
  • Node coordinates help a lot, but do not do
    everything.

60
Querying (2)
  • Example
  • XPath //book
  • SQL select from node as nd, element_node_leaf
  • as enl where nd.owner_doc_id 2321 and
  • nd.node_id enl.node_id and
  • enl.leaf_text 'text_node'

61
Querying (3)
  • XPath //book/author
  • SQL select from node as nd1, node as nd2,
    element_node_leaf as enl1, element_node_leaf as
    enl2 where
  • nd1.owner_doc_id 2321 and nd1.node_id
    enl1.node_id and enl1.leaf_text 'author' and
  • nd2.owner_doc_id 2321 and nd2.node_id
    enl2.node_id and enl2.leaf_text 'book' and
  • nd1.x_index nd2.x_index 1

62
Querying (4)
  • XSLT and XQuery require many select-statements
    and additional computations outside of the
    database.
  • XSLT and XQuery Turing-complete
  • SQL II (Standard) First-order logic
  • SQL III FOL least fixed points
Write a Comment
User Comments (0)
About PowerShow.com