XML and databases - PowerPoint PPT Presentation

Provided by: susanv5
Transcript and Presenter's Notes

1
  • XML and databases

2
XML and Databases
  • Data today
  • Structured: info in databases
  • Data organized into chunks, similar entities
    grouped together
  • Descriptions for entities in a group have the
    same format, length, etc.
  • Semi-structured data has certain structure, but
    not all items identical
  • Schema info may be mixed in with data values
  • Similar entities grouped together may have
    different attributes
  • Self-describing data
  • May be displayed as a graph
  • Unstructured data
  • Data can be of any type, may have no format or
    sequence
  • Web pages in HTML
  • Video, sound, images

3
XML documents and DBMS
  • Options
  • Use a DBMS to store XML documents as text
  • If DBMS has module for document processing
  • Use DBMS to store XML document contents as data
    elements
  • Works if all documents have same structure
  • Map XML schema to DB schema
  • Design special DBMS for storing XML data
  • New type of XML DBMS designed, e.g. based on
    hierarchical model (Tamino)
  • Create XML documents from preexisting RDBMS
    data and store into DB
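
The second option above (storing document contents as data elements) can be sketched in Python. This is a minimal illustration, not a description of any particular DBMS feature: the sample document, the `employee` table, and the `shred_to_table` helper are all hypothetical, and SQLite stands in for the relational store. It only works because every `employee` element has the same structure.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; every employee has the same structure.
XML_DOC = """<company>
  <employee><name>Smith</name><salary>30000</salary></employee>
  <employee><name>Wong</name><salary>40000</salary></employee>
</company>"""


def shred_to_table(xml_text, conn):
    """Map each <employee> element to one row of a relational table."""
    conn.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
    root = ET.fromstring(xml_text)
    rows = [
        (e.findtext("name"), int(e.findtext("salary")))
        for e in root.findall("employee")
    ]
    conn.executemany("INSERT INTO employee VALUES (?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
inserted = shred_to_table(XML_DOC, conn)

# Once shredded, the data is queried with ordinary SQL.
high_paid = conn.execute(
    "SELECT name FROM employee WHERE salary > 35000"
).fetchall()
print(inserted, high_paid)
```

If documents had differing structures, a fixed table like this would no longer fit, which is the limitation the slide notes.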

4
Why XML
  • Want: data source (database) with Web interface
  • Specify content and format of Web pages with HTML
  • Use HTML tags (predefined) for formatting Web
    documents
  • HTML not suitable for specifying structured data
    from databases
  • Does not contain schema information
  • only how to display information
  • XML: standard for structuring and exchanging
    data over Web

5
Types of XML documents
  • Data-centric documents have small data items
    following a specific structure
  • Document-centric documents with large amount of
    text (little structured data)
  • Hybrid documents with structured and
    unstructured data

6
XML
  • Basic object is XML document
  • Structuring concepts
  • Elements (tags)
  • Attributes
  • XML attributes describe properties and
    characteristics of elements (tags)
  • Also
  • Entities, identifiers, references

7
Elements
  • Elements identified by
  • Start tag: < >
  • End tag: </ >
  • Simple elements: data values
  • Complex elements: constructed from other
    elements hierarchically
  • XML is called a tree or hierarchical model
  • No limit on number of nesting elements
  • See fig. 27.1

8
Schemaless
  • If semistructured data: schemaless XML document,
    and it is standalone
  • <?xml version="1.0" standalone="yes"?>
  • (no corresponding file, e.g. DTD, specifying
    schema)

9
Well-formed
  • XML document is well formed if
  • Starts with XML declaration to indicate version
    and other relevant attributes
  • Single root element
  • Every element has a matching pair of start/end
    tags, nested within its parent element
  • Syntactically correct
  • Can be processed to create internal tree

10
Next step?
  • Want to use info in XML document to determine
    schema of database
  • To do this, parse document to create tree
    structure of data
  • After parsing, if not standalone, can compare to
    definition of data structure (DTD or XML schema)
    to validate

11
Parsing summary
  • To specify structure of semi-structured data
  • DTD, XML Schema
  • Each time access data, must parse document to
    create tree structure of data
  • After parsed, compare to definition of data
    structure (DTD or XML schema) to validate
  • Parsing SLOWS down the process

12
Parsing: create internal tree
  • Whole document must be parsed beforehand to
    generate tree
  • Set of API functions to manipulate tree and
    parsing models
  • DOM (Document Object Model) - uses main memory to
    parse entire document
  • SAX allows processing XML documents on the fly
    (also good for streaming XML documents)
  • Once parsed, allows validation of XML documents
    against
  • DTD (Document Type Definition) file or
  • XML schema file
  • Valid means well formed and elements must follow
    structure and types specified in the separate
    schemas
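
The difference between the two parsing models can be sketched with Python's standard library, which ships both a DOM implementation (`xml.dom.minidom`) and a SAX interface (`xml.sax`). The sample document and the `EmployeeHandler` class are hypothetical. The DOM version builds the whole tree in memory before any lookup, while the SAX version reacts to events as the parser reads the input.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample document.
XML_DOC = "<company><employee>Smith</employee><employee>Wong</employee></company>"

# DOM: parse the entire document into an in-memory tree, then navigate it.
dom = xml.dom.minidom.parseString(XML_DOC)
dom_names = [e.firstChild.data for e in dom.getElementsByTagName("employee")]


# SAX: no tree is built; the handler receives events on the fly.
class EmployeeHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []
        self._in_employee = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "employee":
            self._in_employee = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect it.
        if self._in_employee:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "employee":
            self.names.append("".join(self._buffer))
            self._in_employee = False


handler = EmployeeHandler()
xml.sax.parseString(XML_DOC.encode("utf-8"), handler)
sax_names = handler.names

print(dom_names, sax_names)
```

Neither standard-library parser is used here to validate against a DTD or XML schema; that step would need a validating parser.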

13
XML DTD (Fig. 27.2)
  • First specify root tag
  • Parentheses following element name can contain
  • Type
  • Names of other elements (children)
  • If #PCDATA, means element is a leaf node (parsed
    character data string)
  • Parentheses can be nested
  • | indicates either (a choice between elements)

14
DTD contd
  • To check for conformance to DTD add to XML
    document
  • <?xml version="1.0" standalone="no"?>
  • <!DOCTYPE project SYSTEM "proj.dtd">
  • Could also include DTD doc at beginning of XML
    doc
  • Problems with DTD
  • datatypes not general
  • Special syntax requires specialized processors
  • Elements must follow ordering of document

15
DTD - Notation for specifying elements
  • * - element can be repeated 0 or more times
  • + - element can be repeated 1 or more times
  • ? - element can appear zero or one times
  • If no symbol, element must appear exactly once
  • Type is specified in parentheses; (#PCDATA)
    means parsed character data
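
The DTD occurrence indicators (`*`, `+`, `?`, or no symbol) amount to minimum and maximum bounds on how often a child element may appear. The following Python sketch restates them that way. The `OCCURRENCE_BOUNDS` table and `occurrence_ok` helper are hypothetical names used only for illustration, not part of any DTD processor.

```python
# Occurrence indicators expressed as (min, max) bounds.
# None means "no upper limit"; "" stands for "no symbol".
OCCURRENCE_BOUNDS = {
    "*": (0, None),  # zero or more
    "+": (1, None),  # one or more
    "?": (0, 1),     # zero or one
    "": (1, 1),      # exactly once
}


def occurrence_ok(indicator, count):
    """Check whether `count` occurrences satisfy the given indicator."""
    low, high = OCCURRENCE_BOUNDS[indicator]
    return count >= low and (high is None or count <= high)


print(occurrence_ok("*", 0), occurrence_ok("+", 0), occurrence_ok("?", 2))
```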

16
XML Schema
  • Alternative to (evolution from) DTD
  • Standard for specifying structure of XML
    documents
  • xsd: XML schema definition

17
XML Schema
  • Same syntax rules as XML documents, so same
    processors can be used on both
  • Could display the entire Company database as a
    single document
  • Could store DB in XML format instead of
    relational DB

18
Features of XML Schema
  • 1) To identify XML schema language elements used,
    specify a file at a Web site location
  • Each such definition is an XML namespace
  • <xsd:schema
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  • File name assigned to xsd, and this variable used
    as prefix to all XML schema commands
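
Because an XML schema is itself an XML document, an ordinary XML parser can read it. The sketch below uses a small hypothetical schema fragment (the `company`, `department`, and `employee` element names are assumptions for illustration) to show how the `xsd` prefix resolves to the namespace URI. It only reads the schema and does not validate anything against it.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema fragment.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="company">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="department" minOccurs="0" maxOccurs="unbounded"/>
        <xsd:element name="employee" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

root = ET.fromstring(SCHEMA)

# The parser replaces the "xsd:" prefix with the namespace URI it is bound to.
print(root.tag)

# List every element declared in the schema, in document order.
declared = [el.get("name") for el in root.iter("{%s}element" % XSD_NS)]
print(declared)
```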

19
Features contd
  • 2) Annotations, documentation, and language
  • Used for providing comments and other
    descriptions, e.g. "en" means English
  • 3) Elements and types
  • xsd:element - specifies element name
  • xsd:complexType - if element has children
  • xsd:sequence - ordered set of element types,
    e.g. dept, employee, etc.

20
Features contd
  • 4) First-level elements
  • specified in element tags
  • Element type, minimum and maximum occurrences
  • minOccurs, etc.
  • Also
  • xsd:key - PK
  • xsd:unique tag, but must give constraint a name
  • xsd:keyref - foreign keys

21
Features contd
  • Structures of complex elements - complex types
  • Composite (compound) attributes: complex types
  • http://www.deepx.com/resources/quickref/XSD1-1.0.pdf

22
XPath Language
  • Addresses parts of an XML document
  • Language mainly consists of location paths and
    expressions
  • Provides a common syntax and semantics for
    functionality shared between XSL Transformations
    (XSLT) and XPointer
  • XSL: Extensible Stylesheet Language
  • Describes how files encoded in XML are to be
    formatted or transformed
  • XSLT: a language for transforming XML documents
    into other XML documents or human-readable
    documents
  • Transforms XML data into data of other formats,
    e.g. XML data into HTML or PDF
  • Uses set of well-defined rules

23
XSLT
  • Functional language
  • Evaluation of math functions
  • Avoids state/mutable data, e.g. Mathematica
  • Domain specific programming language (for
    specific task)
  • Lambda-calculus

24
To Query
  • XPointer
  • Specify position in XML document so other
    documents can link to it
  • XPath
  • facilities for manipulation of strings, numbers
    and booleans

25
To Query
  • Use XPath (XML Path Language) to retrieve data
  • XPath: expression language, based on tree
    representation of XML document
  • Small query language
  • Provides ability to navigate around the tree
  • Addresses specific parts of XML document
  • Common syntax and behaviour model between
    XPointer (addresses components of XML-based
    internet media) and XSLT (translates XML into
    human-readable documents)

26
XPath location path and expressions
  • A location path is e.g. child::para[position()=1]
  • XPath expressions
  • Returns collection of element nodes that satisfy
    patterns specified in expression
  • Name with qualifier conditions
  • Separators
  • / means tag must appear as child of previous
    parent tag
  • // means tag can appear as descendant of previous
    tag at any level

27
XPath
  • To access whole XML document
  • doc("www.company.com/info.xml")/company
  • /company/department
  • //employee[employeeSalary gt 70000]/employeeName
  • /company/employee[employeeSalary gt
    70000]/employeeName
  • /company/project/projectWorker[hours ge 20.0]
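
The difference between the `/` and `//` separators can be tried out in Python. This is a sketch under assumptions: the sample document is hypothetical, and the standard library's `ElementTree` supports only a subset of XPath with no numeric comparisons such as `gt`, so the salary condition is applied in Python through the illustrative `names_above` helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one employee nested in a department, two at top level.
XML_DOC = """<company>
  <department>
    <employee><employeeName>Ann</employeeName><employeeSalary>90000</employeeSalary></employee>
  </department>
  <employee><employeeName>Bob</employeeName><employeeSalary>80000</employeeSalary></employee>
  <employee><employeeName>Cy</employeeName><employeeSalary>50000</employeeSalary></employee>
</company>"""

company = ET.fromstring(XML_DOC)

# Like /company/employee: employees that are direct children of the root.
children = company.findall("employee")

# Like //employee: employees at any level below the root.
descendants = company.findall(".//employee")


def names_above(employees, threshold):
    """Stand-in for the [employeeSalary gt threshold] qualifier."""
    return [
        e.findtext("employeeName")
        for e in employees
        if int(e.findtext("employeeSalary")) > threshold
    ]


child_high = names_above(children, 70000)
any_high = names_above(descendants, 70000)
print(child_high, any_high)
```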

28
To Query
  • XQuery (like SQL) to query data using XPath
    expressions
  • Based on SQL-like FLWOR expressions, which also
    allow joins
  • For, Let, Where, Order by, Return

29
XQuery querying in XML
  • FOR <variable bindings to individual nodes>
  • LET <variable bindings to collections of nodes>
  • WHERE <qualifier conditions>
  • RETURN <query result specification>

30
Example query
  • for $x in
  • doc("www.company.com/info.xml")/company/employee
  • where $x/employeeSalary gt 70000
  • return <res> {$x/employeeName/firstName,
  • $x/employeeName/lastName} </res>
  • What does this do?
  • What do you think of XQuery?
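
One way to read the query above is as a loop with a filter. The Python sketch below mirrors the for / where / return clauses over a hypothetical in-memory document. The sample data is invented for illustration, and the sketch does not run an XQuery engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the document at www.company.com/info.xml.
XML_DOC = """<company>
  <employee>
    <employeeName><firstName>John</firstName><lastName>Smith</lastName></employeeName>
    <employeeSalary>30000</employeeSalary>
  </employee>
  <employee>
    <employeeName><firstName>James</firstName><lastName>Borg</lastName></employeeName>
    <employeeSalary>75000</employeeSalary>
  </employee>
</company>"""

company = ET.fromstring(XML_DOC)

results = []
for x in company.findall("employee"):              # for $x in .../company/employee
    if int(x.findtext("employeeSalary")) > 70000:  # where $x/employeeSalary gt 70000
        res = ET.Element("res")                    # return <res> ... </res>
        res.append(x.find("employeeName/firstName"))
        res.append(x.find("employeeName/lastName"))
        results.append(res)

for r in results:
    print(ET.tostring(r, encoding="unicode"))
```

Under this reading, the query returns one `res` element, holding first and last name, for each employee earning more than 70000.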

31
Research
  • "XML Parsing: A Threat to Database Performance",
    M. Nicola and J. John
  • Enterprises embracing XML
  • Working to standardize industry data processing
  • XML-enabled relational systems and XML databases
    do not match the performance of an RDBMS
  • Performance affected by
  • Performance of XML parser
  • Converting XML into relational format
  • Evaluating XPath expressions
  • XSLT processing

32
Nicola and John paper contd
  • Parsing allows for optional validation of XML
    document
  • Schema validation checks document's compliance
    and determines type info
  • DB systems and XQuery sensitive to data types
  • If insert to DB, must parse
  • Need to index or extract information
  • If update to DB, must reparse
  • Read doesn't require XML parsing
  • If mapped to relational schema, XQueries are
    translated into SQL

33
Nicola and John paper contd
  • What are the parser bound applications?
  • What is the XML parser performance?
  • What is needed?

34
Research Topics
  • XML to relational data mapping
  • XML Parsing
  • Updating XML in RDBMS
  • XML and access control
  • In your paper review, please indicate if you
    would recommend it for the rest of the class

35
Oracle and XML
  • /Schema for XML when using SQL*Plus
  • //A DTD is not needed!!!
  • DROP TABLE Company;
  • CREATE TABLE Company OF XMLType;
  • The rest of the definition
