Semistructured Data and XML - PowerPoint PPT Presentation

Slides: 101
Provided by: thomas861

1
Chapter 29
  • Semistructured Data and XML
  • Transparencies

2
Chapter - Objectives
  • What semistructured data is.
  • Concepts of the Object Exchange Model (OEM), a
    model for semistructured data.
  • Basics of Lore, a semistructured DBMS, and its
    query language, Lorel.
  • Main language elements of XML.
  • Difference between well-formed and valid XML
    documents.
  • How Document Type Definitions (DTDs) can be used
    to define the valid syntax of an XML document.

3
Chapter - Objectives
  • How Document Object Model (DOM) compares with
    OEM.
  • About other related XML technologies.
  • Limitations of DTDs and how the W3C XML Schema
    overcomes these limitations.
  • How RDF and RDF Schema provide a foundation for
    processing meta-data.

4
DTD: XML Names and NMTOKEN
  • Name characters are letters, digits, hyphens,
    underscores, colons or full stops.
  • An NMTOKEN is any non-empty string of name
    characters.
  • NMTOKENS is a list of NMTOKENs separated by
    white space (space, tab, newline, etc.).
  • Case is significant: PERSON and person are
    distinct names.
  • Attribute and element names must be NMTOKENs,
    with further restrictions:
  • Names cannot begin with a digit (nor with a
    hyphen or a full stop).
  • Names beginning with xml (in any mix of upper
    and lower case) are reserved for use by the XML
    standards.
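The naming rules above can be sketched with two regular expressions. This is a minimal sketch that assumes ASCII input only (real XML names also admit many non-ASCII letters); the function names are hypothetical.

```python
import re

# ASCII-only approximation of the XML 1.0 productions.
# Name characters: letters, digits, full stop, underscore, colon, hyphen.
NAME_CHARS = r"[A-Za-z0-9._:-]"
# A Name may not start with a digit, hyphen or full stop.
NAME_START = r"[A-Za-z_:]"

NMTOKEN_RE = re.compile(rf"{NAME_CHARS}+")
NAME_RE = re.compile(rf"{NAME_START}{NAME_CHARS}*")


def is_nmtoken(s: str) -> bool:
    """Any non-empty run of name characters."""
    return NMTOKEN_RE.fullmatch(s) is not None


def is_name(s: str) -> bool:
    """An NMTOKEN that also starts with a letter, '_' or ':'."""
    return NAME_RE.fullmatch(s) is not None


def is_nmtokens(s: str) -> bool:
    """A whitespace-separated list of one or more NMTOKENs."""
    parts = s.split()
    return bool(parts) and all(is_nmtoken(p) for p in parts)


def uses_reserved_prefix(name: str) -> bool:
    """True for names starting with 'xml' in any combination of case."""
    return name.lower().startswith("xml")
```

For instance, 2000 is a valid NMTOKEN (which is why it can appear in an enumerated attribute type) but not a valid element name.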

5
Element Declarations: EMPTY
  • The keyword ELEMENT introduces a new element:
    <!ELEMENT NAME CONTENT_MODEL>
  • An element name must begin with a letter (or an
    underscore) and may additionally contain digits
    and the punctuation characters ., -, _ and :, as
    described earlier under NMTOKEN.
  • If an element can hold no child elements and
    also no text, it is known as an empty element
    and is declared with EMPTY as its CONTENT_MODEL.
  • This seems trivial, but it isn't: the presence
    or absence of such an element in an XML file can
    be used as a flag.
  • HTML has several examples, such as HR and IMG,
    which never have children and contain no text.
    Here we would write <!ELEMENT HR EMPTY>, and then
    <HR/> or <HR></HR> generates a horizontal line.
  • Empty elements can still have attributes, such as
    the SRC attribute in <IMG/> that specifies the
    source of the image.

6
Element Declarations: ANY
  • An element declared with content ANY may contain
    text and any of the other elements declared in
    the DTD, in any order.
  • This is not quite the same as having no DTD for
    the file:
  • <!DOCTYPE fred [<!ELEMENT fred ANY>]>
  • <fred><people>Me and You</people><people>Them</people></fred>
  • A validating parser reports an error because the
    <people> element is not declared.
  • Adding <!ELEMENT people ANY> inside the DTD
    declaration produces a valid document.

7
Entities
  • The DTD of an XML document can contain entity
    declarations. These are like macro substitutions
    in other languages.
  • Entities are defined in the DTD and come in
    several flavors:
  • General entities are referenced as &EntName;
  • Parameter entities are referenced as %EntName;
  • We have already seen the predefined character
    entities:
  • &amp; for &
  • &apos; for '
  • &gt; for >
  • &lt; for <
  • &quot; for "
  • These are built in, but you could add other such
    entities with
  • <!ENTITY aitself "A">, and &aitself; would be
    replaced by A

8
General Entities
  • As another example, we can declare in the DTD
    <!ENTITY TODAY "May 12 2003"> and then
    <comment>&TODAY; was very quiet in Irvine</comment>
    is parsed as
    <comment>May 12 2003 was very quiet in Irvine</comment>
  • General entity references can be nested inside a
    DTD, e.g., one can write <!ENTITY YEAR "2003">
    <!ENTITY TODAY "May 12 &YEAR;">
  • However, one must use parameter entities, not
    general entities, for macro substitution inside
    other DTD declarations such as <!ATTLIST and
    <!ELEMENT.
  • Parameter entities are declared with a % sign, as
    in <!ENTITY % CUSTARDTAGS "(NAME,DATE,ORDERS)">
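Python's standard xml.etree.ElementTree module is built on the non-validating expat parser, which still expands general entities declared in the internal DTD subset, so the TODAY example (with the nested YEAR entity) can be checked directly. A minimal sketch; the constant and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring two general entities; TODAY nests YEAR.
DOC = """<!DOCTYPE comment [
  <!ENTITY YEAR "2003">
  <!ENTITY TODAY "May 12 &YEAR;">
]>
<comment>&TODAY; was very quiet in Irvine</comment>"""


def expanded_text(xml_source: str) -> str:
    """Return the root element's text after the parser has replaced each
    general entity reference with its replacement text."""
    root = ET.fromstring(xml_source)
    return root.text


if __name__ == "__main__":
    print(expanded_text(DOC))
```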

9
Parameter Entities
  • <!ENTITY % peopletags "(firstname,lastname,dateofbirth)">
    <!ELEMENT student %peopletags;>
    <!ELEMENT teacher %peopletags;>
    <!ELEMENT administrator %peopletags;>
  • Declares several person-like elements that have
    the same child elements.
  • Parameter entities are even more commonly used
    for attributes, because several elements almost
    always share the same attributes (often a basic
    set augmented in different ways for different
    elements).
  • This basic set can be declared once in a
    parameter entity.
  • Note: in the internal DTD subset, parameter-entity
    references may appear only between declarations,
    not inside them, so declarations like these
    belong in an external DTD.

10
Defining Implied Attributes
  • Attributes must be declared in the DTD before
    they can be used.
  • #IMPLIED means that the attribute is optional and
    has no default value.
  • <!ELEMENT population (#PCDATA)>
  • <!ATTLIST population year CDATA #IMPLIED>
  • The year attribute may be present or absent on
    the population element. Valid examples:
  • <population year="2000">80</population>
  • <population>80</population>

11
Defining Required Attributes
  • <!ELEMENT population (#PCDATA)> <!ATTLIST
    population year CDATA #REQUIRED>
  • The population element must have a year attribute:
  • <population year="1996">80</population>
  • <!ELEMENT population (#PCDATA)> <!ATTLIST
    population year (2000|2001) #REQUIRED>
  • The population element must have a year attribute
    whose value is 2000 or 2001:
  • <population year="2000">80</population>
  • The enumerated values are not quoted in the
    declaration (in the document they are quoted like
    any other attribute value).

12
Defining Default Attributes
  • <!ELEMENT population (#PCDATA)> <!ATTLIST
    population year CDATA "2000">
  • All of these are valid:
  • <population year="2001">80</population>
  • <population year="2000">80</population>
  • <population>80</population> (the parser supplies
    year="2000")

13
Defining Fixed Attributes
  • <!ELEMENT population (#PCDATA)> <!ATTLIST
    population year CDATA #FIXED "2000">
  • Invalid: <population year="2001">80</population>
  • Valid: <population year="2000">80</population>
  • Valid: <population>80</population>
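The four default declarations above can be summarised as one decision function. This is a hypothetical sketch, not a parser API: the string "DEFAULT" is an invented tag for a plain quoted default such as CDATA "2000", and the function only models what a validating parser would report or supply.

```python
from typing import Optional


def resolve_attribute(default_decl: str, value: Optional[str],
                      default: Optional[str] = None) -> Optional[str]:
    """Return the attribute value the application sees, or raise
    ValueError if the document violates the declaration.

    default_decl: "#IMPLIED", "#REQUIRED", "#FIXED" or "DEFAULT"
                  ("DEFAULT" is a made-up tag for a plain quoted default).
    value:        the value written in the document, or None if absent.
    default:      the quoted value from the declaration, if any.
    """
    if default_decl == "#IMPLIED":
        return value                      # optional, nothing supplied
    if default_decl == "#REQUIRED":
        if value is None:
            raise ValueError("required attribute missing")
        return value
    if default_decl == "#FIXED":
        if value is not None and value != default:
            raise ValueError(f"fixed attribute must be {default!r}")
        return default                    # supplied when absent
    if default_decl == "DEFAULT":
        return value if value is not None else default
    raise ValueError(f"unknown default declaration {default_decl!r}")
```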

14
Defining Unique Attributes
  • <!ELEMENT animal (name)>
  • <!ATTLIST animal code ID #REQUIRED>
  • The value of the code attribute must be unique
    within the XML document (and, being an ID, must
    be a valid XML name, so it cannot start with a
    digit).
  • <animal code="T50"><name>Lion</name></animal>
    <animal code="T51"><name>Rabbit</name></animal>

15
Referring Unique Attributes
  • <!ELEMENT website (url)>
    <!ATTLIST website animal_refer IDREF #REQUIRED>
  • The value of animal_refer must match the value of
    some ID attribute in the document (the ID may
    appear before or after the referring element).
  • <website animal_refer="T50">
    <url>http://www.lions.com</url>
    </website>

16
Referring Multiple Unique Attributes
  • <!ELEMENT website (url)>
    <!ATTLIST website contents IDREFS #REQUIRED>
  • The contents attribute contains a
    whitespace-separated list of IDs.
  • <website contents="T50 T51">
    <url>http://www.animals.com</url>
    </website>
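The ID, IDREF and IDREFS rules amount to two passes over the document: collect the IDs (rejecting duplicates), then check that every reference names a collected ID. A minimal sketch using Python's xml.etree.ElementTree; the function name is hypothetical, and its mapping arguments are a hypothetical stand-in for the ATTLIST declarations.

```python
import xml.etree.ElementTree as ET


def check_id_refs(root, id_attrs, idref_attrs, idrefs_attrs):
    """Return a list of ID/IDREF/IDREFS errors (empty list = no errors).

    Each *_attrs argument maps an element name to the attribute declared
    with that type, standing in for the ATTLIST declarations.
    """
    errors, ids = [], set()

    # Pass 1: collect IDs; each value must be unique in the whole document.
    for el in root.iter():
        attr = id_attrs.get(el.tag)
        if attr in el.attrib:
            value = el.attrib[attr]
            if value in ids:
                errors.append(f"duplicate ID {value}")
            ids.add(value)

    # Pass 2: references may point forwards or backwards, so they are
    # checked only after every ID is known.
    for el in root.iter():
        refs = []
        attr = idref_attrs.get(el.tag)
        if attr in el.attrib:
            refs.append(el.attrib[attr])          # IDREF: a single ID
        attr = idrefs_attrs.get(el.tag)
        if attr in el.attrib:
            refs.extend(el.attrib[attr].split())  # IDREFS: space-separated
        errors.extend(f"dangling IDREF {r}" for r in refs if r not in ids)

    return errors
```

For example, check_id_refs(root, {"animal": "code"}, {"website": "animal_refer"}, {"website": "contents"}) applies the declarations from the three slides above.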

17
XML Example - the DTD
  • <!ELEMENT addressBook (person)>
  • <!ELEMENT person (name, email, link?)>
  • <!ATTLIST person id ID #REQUIRED>
  • <!ATTLIST person gender (male|female) #IMPLIED>
  • <!ELEMENT name (#PCDATA|(family,given))>
  • <!ELEMENT family (#PCDATA)>
  • <!ELEMENT given (#PCDATA)>
  • <!ELEMENT email (#PCDATA)>
  • <!ELEMENT link EMPTY> <!ATTLIST link manager
    IDREF #IMPLIED
    subordinates IDREF #IMPLIED>
  • (Strictly, XML 1.0 does not allow a sequence such
    as (family,given) inside mixed content; a
    conforming declaration would be
    <!ELEMENT name (#PCDATA|family|given)*>.)

18
DOCTYPE declarations
  • Internal: the DTD is defined locally, inside the
    document
  • External: the DTD is in a separate file
  • The two can be combined

19
Internal DTD
  • <?xml version="1.0" standalone="yes" ?>
  • <!--open the DOCTYPE declaration -
  • the open square bracket indicates an internal
    DTD-->
  • <!DOCTYPE foo [
  • <!--define the internal DTD-->
  • <!ELEMENT foo (#PCDATA)>
  • <!--close the DOCTYPE declaration-->
  • ]>
  • <foo>Hello World.</foo>

20
Internal DTD rules
  • The document type declaration must be placed
    between the XML declaration and the first element
    (root element) in the document.
  • The keyword DOCTYPE must be followed by the name
    of the root element in the XML document.
  • The keyword DOCTYPE must be in upper case.

21
External DTD
  • Useful for creating a common DTD that can be
    shared between multiple documents.
  • Any changes made to the external DTD
    automatically update all the documents that
    reference it.
  • Two types: private and public.
  • Rule:
  • If any elements, attributes, or entities used in
    the XML document are declared in an external DTD,
    the document is not standalone: declare
    standalone="no" in the XML declaration (or omit
    the attribute, since "no" is then assumed).

22
"Private" External DTDs
  • Identified by the keyword SYSTEM
  • Intended for use by a single author or group of
    authors.
  • Example:
  • <!DOCTYPE root_element SYSTEM "DTD_location">
  • where DTD_location is a relative or absolute URL
    (such as http:// or file://).

23
"Private" External DTDs (cont)
  • XML document:
  • <?xml version="1.0" standalone="no" ?>
  • <!DOCTYPE document SYSTEM "subjects.dtd">
  • <document> </document>
  • subjects.dtd:
  • <!ELEMENT document >

24
"Public" External DTDs
  • Identified by the keyword PUBLIC
  • Intended for broad use.
  • <!DOCTYPE root_element PUBLIC "DTD_name"
    "DTD_location"> where
  • DTD_location is a relative or absolute URL
  • DTD_name follows the syntax
  • "prefix//owner_of_the_DTD//description_of_the_DTD//ISO 639_language_identifier"
  • "DTD_location" is used to find the public DTD if
    it cannot be located by the "DTD_name".

25
"Public" External DTDs (cont)
  • <?xml version="1.0" standalone="no" ?>
  • <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0
    Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
  • <HTML>
  • <HEAD>
  • <TITLE>A typical HTML file</TITLE>
  • </HEAD>
  • <BODY>
  • </BODY>
  • </HTML>

26
"Public" External DTDs (cont)
  • Valid DTD_name prefixes:
  • ISO: the DTD is an ISO standard. All ISO
    standards are approved.
  • +: the DTD is an approved non-ISO standard.
  • -: the DTD is an unapproved non-ISO standard.

27
Combining Internal and External DTDs
  • A document can use both internal and external DTD
    subsets.
  • The internal DTD subset is specified between the
    square brackets of the DOCTYPE declaration.
  • The declaration for the external DTD subset is
    placed before the square brackets immediately
    after the SYSTEM keyword.
  • Declaring an ELEMENT with the same name in both
    the internal and external DTD subsets is invalid

28
Example
  • <?xml version="1.0" standalone="no" ?>
  • <!DOCTYPE document SYSTEM "subjects.dtd" [
  • <!ATTLIST assessment assessment_type (exam |
    assignment | prac)>
  • <!ELEMENT results (#PCDATA)>
  • ]>
  • subjects.dtd:
  • <!ELEMENT document (title,subjectID,subjectname,
    prerequisite?,classes,assessment,syllabus,textbooks)>
  • <!ELEMENT prerequisite (subjectID,subjectname)>

29
DTD Validation
  • XML content can be well-formed but still invalid
    under the DTD rules
  • e.g. DTD rule: <!ELEMENT name (#PCDATA)>
  • Acceptable: <name> Giancarlo Succi </name>
  • Unacceptable:
  • <name>
  • <first_name> Giancarlo </first_name>
  • <last_name> Succi </last_name>
  • </name>

30
Beyond DTDs
  • DTD limitations
  • Simple document structures
  • Lack of real datatypes
  • Advanced schema languages
  • XML Schema
  • RELAX NG

31
Limitations of DTDs
  • No typing of text elements and attributes
  • All values are strings, no integers, reals, etc.
  • Difficult to specify unordered sets of
    subelements
  • Order is usually irrelevant in databases
  • (A | B)* allows specification of an unordered
    set, but
  • Cannot ensure that each of A and B occurs only
    once
  • IDs and IDREFs are untyped
  • The owners attribute of an account may contain a
    reference to another account, which is
    meaningless
  • owners attribute should ideally be constrained to
    refer to customer elements

32
Shortcomings of DTDs
  • Useful for documents, but not so good for data
  • No support for structural re-use
  • Object-oriented-like structures aren't supported
  • No support for data types
  • Can't do data validation
  • Can have a single key item (ID), but
  • No support for multi-attribute keys
  • No support for foreign keys (references to other
    keys)
  • No constraints on IDREFs (e.g., cannot require
    a reference to point only to a Section element)

33
XML Schema
  • In XML format
  • Includes primitive data types (integers, strings,
    dates, etc.)
  • Supports value-based constraints (integers > 100)
  • User-definable structured types
  • Inheritance (extension or restriction)
  • Foreign keys
  • Element-type reference constraints

34
XML Schema
  • XML Schema is a more sophisticated schema
    language which addresses the drawbacks of DTDs.
    Supports
  • Typing of values
  • E.g. integer, string, etc
  • Also, constraints on min/max values
  • User defined types
  • Is itself specified in XML syntax, unlike DTDs
  • More standard representation, but verbose
  • Is integrated with namespaces
  • Many more features
  • List types, uniqueness and foreign key
    constraints, inheritance, etc.
  • BUT significantly more complicated than DTDs.

35
XML Schema Simple Types
  • Elements that do not contain other elements or
    attributes are of type simpleType.
  • <xsd:element name="STAFFNO" type="xsd:string"/>
  • <xsd:element name="DOB" type="xsd:date"/>
  • <xsd:element name="SALARY" type="xsd:decimal"/>
  • Attributes must be declared last (after the
    content model of the enclosing complex type):
  • <xsd:attribute name="branchNo" type="xsd:string"/>

36
XML Schema Complex Types
  • Elements that contain other elements are of type
    complexType.
  • The children of a complex type are listed inside
    a sequence element.
  • <xsd:element name="STAFFLIST">
  • <xsd:complexType>
  • <xsd:sequence>
  • <!-- children defined here -->
  • </xsd:sequence>
  • </xsd:complexType>
  • </xsd:element>

37
Cardinality
  • Cardinality of an element can be represented
    using the attributes minOccurs and maxOccurs
    (both default to 1).
  • To represent an optional element, set minOccurs
    to 0; to indicate that there is no maximum
    number of occurrences, set maxOccurs to
    "unbounded".
  • <xsd:element name="DOB" type="xsd:date"
  • minOccurs="0"/>
  • <xsd:element name="NOK" type="xsd:string"
  • minOccurs="0" maxOccurs="3"/>
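A schema validator performs an occurrence check like the following for each element declaration. This is a sketch that assumes the counts come from direct children of one parent; the helper names are hypothetical.

```python
from typing import Union

import xml.etree.ElementTree as ET

UNBOUNDED = "unbounded"


def occurs_ok(count: int, min_occurs: int = 1,
              max_occurs: Union[int, str] = 1) -> bool:
    """Check an occurrence count against minOccurs/maxOccurs.
    The defaults mirror XML Schema, where both attributes default to 1."""
    if count < min_occurs:
        return False
    return max_occurs == UNBOUNDED or count <= max_occurs


def count_children(parent: ET.Element, name: str) -> int:
    """How many direct children of `parent` are called `name`."""
    return sum(1 for child in parent if child.tag == name)
```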

38
References
  • Can use references to elements and attribute
    definitions.
  • <xsd:element name="STAFFNO" type="xsd:string"/>
  • ...
  • <xsd:element ref="STAFFNO"/>
  • If STAFFNO is referenced in many places, using
    references keeps its definition in one place and
    improves the maintainability of the schema.

39
Defining New Types
  • Can also define new data types to create elements
    and attributes.
  • <xsd:simpleType name="STAFFNOTYPE">
  • <xsd:restriction base="xsd:string">
  • <xsd:maxLength value="5"/>
  • </xsd:restriction>
  • </xsd:simpleType>
  • The new type is defined as a restriction of
    string to a maximum length of 5 characters.
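Restriction can be modelled as a base type plus a set of facets, each of which narrows the allowed values. This is a minimal sketch covering only maxLength, minLength and pattern; the class is hypothetical, and Python's re dialect only approximates XML Schema regular expressions (XSD patterns are implicitly anchored, hence fullmatch).

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RestrictedString:
    """A simple type derived from xsd:string by restriction.
    Facets left as None are not checked."""
    max_length: Optional[int] = None
    min_length: Optional[int] = None
    pattern: Optional[str] = None

    def is_valid(self, value: str) -> bool:
        if self.max_length is not None and len(value) > self.max_length:
            return False
        if self.min_length is not None and len(value) < self.min_length:
            return False
        if self.pattern is not None and re.fullmatch(self.pattern, value) is None:
            return False
        return True


# STAFFNOTYPE from the slide: xsd:string restricted to at most 5 characters.
STAFFNOTYPE = RestrictedString(max_length=5)
```

The sample staff numbers one might test with (such as SG37) are illustrative values only.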

40
Groups
  • Can define both groups of elements and groups of
    attributes. Group is not a data type but acts as
    a container holding a set of elements or
    attributes.
  • <xsd:group name="StaffType">
  • <xsd:sequence>
  • <xsd:element name="StaffNo"
    type="StaffNoType"/>
  • <xsd:element name="Position" type="PositionType"
    />
  • <xsd:element name="DOB" type="xsd:date"/>
  • <xsd:element name="Salary" type="xsd:decimal"/>
  • </xsd:sequence>
  • </xsd:group>

41
Constraints
  • XML Schema provides XPath-based features for
    specifying uniqueness constraints and
    corresponding reference constraints that will
    hold within a certain scope.
  • <xsd:unique name="NAMEDOBUNIQUE">
  • <xsd:selector xpath="STAFF"/>
  • <xsd:field xpath="NAME/LNAME"/>
  • <xsd:field xpath="DOB"/>
  • </xsd:unique>
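The xsd:unique constraint above can be read as: among the nodes chosen by the selector, the tuple of field values must not repeat. A sketch in which ElementTree paths stand in for the restricted XPath of selector and field; the function name and the sample data it is tested on are hypothetical.

```python
import xml.etree.ElementTree as ET


def unique_violations(root, selector, fields):
    """For each node matched by `selector`, build the tuple of `fields`
    values and return the tuples that occur more than once.
    Nodes missing a field are skipped, as xsd:unique allows."""
    seen, duplicates = set(), []
    for node in root.findall(selector):
        key = []
        for path in fields:
            found = node.find(path)
            if found is None:
                break                     # incomplete key: not constrained
            key.append((found.text or "").strip())
        else:                             # runs only if no field was missing
            key = tuple(key)
            if key in seen:
                duplicates.append(key)
            seen.add(key)
    return duplicates
```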

42
XML Schema Version of Bank
  • <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  • <xsd:element name="bank" type="BankType"/>
  • <xsd:element name="account"><xsd:complexType>
    <xsd:sequence> <xsd:element
    name="account-number" type="xsd:string"/>
    <xsd:element name="branch-name"
    type="xsd:string"/> <xsd:element
    name="balance" type="xsd:decimal"/>
    </xsd:sequence></xsd:complexType>
  • </xsd:element>
  • ... definitions of customer and depositor ...
  • <xsd:complexType name="BankType"><xsd:sequence>
  • <xsd:element ref="account" minOccurs="0"
    maxOccurs="unbounded"/>
  • <xsd:element ref="customer" minOccurs="0"
    maxOccurs="unbounded"/>
  • <xsd:element ref="depositor" minOccurs="0"
    maxOccurs="unbounded"/>
  • </xsd:sequence>
  • </xsd:complexType>
  • </xsd:schema>

43
References
  • http://www.java.sun.com/xml/docs/tutorial/TOC.html
  • http://www.xml.com/pub/a/1999/09/expat/index.html
  • http://xmlfiles.com/dtd/dtd_attributes.asp
  • http://xmlwriter.net/xml_guide/doctype_declaration.shtml

44
What is an XML Parsing API?
  • Programming model for accessing an XML document
  • Sits on top of an XML parsing engine
  • Language/platform independent

45
Java XML Parsing Specification
  • The Java XML Parsing Specification is a request
    to include a standardised way of parsing XML into
    the Java standard library
  • The specification defines the following packages
  • javax.xml.parsers
  • org.xml.sax
  • org.xml.sax.helpers
  • org.w3c.dom
  • The first is an all-new pluggability layer; the
    others come from existing packages

46
Two ways of using XML parsers: SAX and DOM
  • The Java XML Parsing Specification specifies two
    interfaces for XML parsers
  • Simple API for XML (SAX) is a flat, event-driven
    parser
  • Document Object Model (DOM) is an object-oriented
    parser which translates the XML document into a
    Java Object hierarchy

47
SAX
  • Simple API for XML
  • Event-based XML parsing API
  • Not governed by any standards body
  • A guy named David Megginson basically owns it
  • SAX is simply a programming model that the
    developers of individual XML parsers implement
  • A SAX parser written in Java exposes the
    equivalent events
  • "serial access" protocol for XML

48
SAX (cont)
  • A SAX parser reads the XML document as a stream
    of XML tags
  • starting elements, ending elements, text
    sections, etc.
  • Every time the parser encounters an XML tag, it
    calls a method in its HandlerBase object to deal
    with the tag. (HandlerBase is the SAX 1 class;
    SAX 2 replaces it with
    org.xml.sax.helpers.DefaultHandler.)
  • The HandlerBase object is usually written by the
    application programmer.
  • The HandlerBase object is given as a parameter to
    the parse() method in the SAX parser. It includes
    all the code that defines what the XML tags
    actually do.
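Python's xml.sax module follows the same SAX model, so the handler idea above can be sketched with it; in Java the equivalent would be a subclass of org.xml.sax.helpers.DefaultHandler passed to SAXParser.parse(). A minimal sketch for the address-book document used in these slides; the class and function names are hypothetical.

```python
import xml.sax


class AddressBookHandler(xml.sax.ContentHandler):
    """Collects (name, email) pairs as the parser reports SAX events."""

    def __init__(self):
        super().__init__()
        self.people = []      # finished (name, email) tuples
        self._person = {}     # fields of the person being read
        self._current = None  # "name" or "email" while inside one
        self._text = []       # text chunks for the current field

    def startElement(self, name, attrs):
        if name == "person":
            self._person = {}
        elif name in ("name", "email"):
            self._current, self._text = name, []

    def characters(self, content):
        # A single text node may arrive in several chunks.
        if self._current is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == self._current:
            self._person[name] = "".join(self._text).strip()
            self._current = None
        elif name == "person":
            self.people.append((self._person.get("name"),
                                self._person.get("email")))


def parse_addressbook(xml_bytes: bytes):
    """Stream the document through the handler; no tree is built."""
    handler = AddressBookHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.people
```

The handler keeps only the state it needs for the current person, which is why SAX suits very large documents.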

49
How Does SAX work?
[Figure: the parser reads the XML document as a stream and reports each tag it meets to the SAX objects as an event. Document shown:]
<?xml version="1.0"?>
<addressbook> </addressbook>
<person> </person>
<name>John Doe</name>
<email>jdoe@yahoo.com</email>
<person> </person>
<name>Jane Doe</name>
<email>jdoe@mail.com</email>
50
SAX structure
51
SAX tutorial
  • http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/sax/index.html
  • Note: some files are at
  • http://www.ics.uci.edu/ics185/handouts/slides13-sax/

52
More info about SAX
  • Read the tutorial
  • http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/sax/index.html

53
Document Object Model (DOM)
  • Most common XML parser API
  • Tree-based API
  • W3C Standard
  • All DOM compliant parsers use the same object
    model

54
DOM (cont)
  • A DOM parser is usually referred to as a document
    builder. It is not really a parser, more like a
    translator that uses a parser.
  • In fact, most DOM implementations include a SAX
    parser within the document builder.
  • A document builder reads in the XML document and
    outputs a hierarchy of Node objects, which
    corresponds to the structure of the XML document.

55
How Does DOM work?
[Figure: an XML parser converts the XML document into a tree of DOM objects. Document shown:]
<?xml version="1.0"?>
<addressbook> </addressbook>
<person> </person>
<name>John Doe</name>
<email>jdoe@yahoo.com</email>
<person> </person>
<name>Jane Doe</name>
<email>jdoe@mail.com</email>
56
DOM Structure Model and API
  • hierarchy of Node objects
  • document, element, attribute, text, comment, ...
  • Language-independent DOM programming API
  • get... first/last child, prev/next sibling,
    childNodes
  • insertBefore, replaceChild
  • getElementsByTagName
  • ...
  • Alternative event-based SAX API (Simple API for
    XML)
  • does not build a parse tree (reports events when
    encountering begin/end tags)
  • for (partially) parsing very large documents
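xml.dom.minidom, a small DOM implementation in Python's standard library, offers the getElementsByTagName, firstChild and insertBefore calls listed above. A sketch against the same address-book structure; the helper names and any person added with them are hypothetical.

```python
from xml.dom import minidom


def emails_by_name(xml_text: str) -> dict:
    """Build the whole tree, then navigate it with DOM calls."""
    doc = minidom.parseString(xml_text)
    result = {}
    for person in doc.getElementsByTagName("person"):
        name = person.getElementsByTagName("name")[0].firstChild.data
        email = person.getElementsByTagName("email")[0].firstChild.data
        result[name] = email
    return result


def add_person_first(xml_text: str, name: str, email: str) -> str:
    """Create a new person element and insertBefore the root's first child."""
    doc = minidom.parseString(xml_text)
    book = doc.documentElement
    person = doc.createElement("person")
    for tag, value in (("name", name), ("email", email)):
        child = doc.createElement(tag)
        child.appendChild(doc.createTextNode(value))
        person.appendChild(child)
    # insertBefore with a None reference node simply appends.
    book.insertBefore(person, book.firstChild)
    return book.toxml()
```

Unlike the SAX handler, both functions need the complete tree in memory, but in exchange they can move freely around it and modify it.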

57
DOM references
  • Online tutorial
  • http://java.sun.com/xml/jaxp/dist/1.1/docs/tutorial/dom/index.html
  • API
  • http://java.sun.com/j2se/1.4.1/docs/guide/plugin/dom/

58
Validating versus Non-validating
  • An XML document is well-formed if it is
    syntactically correct.
  • An XML document is valid if it is well-formed and
    conforms to all constraints imposed by its DTD.
  • A parser is validating if it reports whether an
    XML document is valid. Otherwise, it is
    non-validating.
  • The tutorial has examples for both validating
    parsers and non-validating parsers.
  • All of them check the well-formedness of an XML
    document.
  • Here we focus on non-validating parsers.
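Python's expat-based parsers are non-validating, which makes the distinction above easy to see: the nested name example from the DTD Validation slide parses without complaint even with its DTD attached, so the (#PCDATA) rule has to be checked separately. A sketch; both helpers are hypothetical, and the second hand-codes only the single rule <!ELEMENT name (#PCDATA)>.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """expat (used by ElementTree) reports syntax errors but does not
    check the document against its DTD."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def name_is_pcdata_only(xml_text: str) -> bool:
    """Hand-written check of <!ELEMENT name (#PCDATA)>: every name
    element may contain text but no child elements."""
    root = ET.fromstring(xml_text)
    return all(len(el) == 0 for el in root.iter("name"))
```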

59
Querying XML
[Figure: an application or user poses a query over XML documents to a query engine, which returns an XML result that is processed or displayed in a browser.]
60
Outline
  • XML queries
  • Many standards
  • As an example: X-Query

61
Example XML DTD
<!ELEMENT book (booktitle, author)>
<!ELEMENT booktitle (#PCDATA)>
<!ELEMENT author (name, address)>
<!ATTLIST author id ID #REQUIRED>
<!ELEMENT name (firstname?, lastname)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT address ANY>
<!ELEMENT article (title, author, contactauthor)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT contactauthor EMPTY>
<!ATTLIST contactauthor authorID IDREF #IMPLIED>
<!ELEMENT monograph (title, author, editor)>
<!ELEMENT editor (monograph)>
<!ATTLIST editor name CDATA #REQUIRED>
62
DTD Graph
63
An example XML document
  • <book>
  • <booktitle> Gene </booktitle>
  • <author id="dawkins">
  • <name>
  • <firstname> Richard </firstname>
  • <lastname> Dawkins </lastname>
  • </name>
  • <address>
  • <city> Timbuktu </city>
  • <zip> 99999 </zip>
  • </address>
  • </author>
  • </book>
  • Note: an XML document can be rooted at any
    element declared in the DTD!

64
An XML Query Language X-Query
  • Full specification: http://www.w3.org/TR/xquery/
  • FLWOR Expressions
  • for
  • let
  • where
  • order by
  • return

65
X-Query Example Q1
Find the last names of the authors of the book(s)
titled Gene.
  • <geneLastnameList>
  • { for $b in doc("pub.xml")//book
  • let $t := $b/booktitle
  • where $t = "Gene"
  • return
  • <geneLastname>
  • { $b/author/name/lastname }
  • </geneLastname> }
  • </geneLastnameList>

Results: <geneLastnameList> <geneLastname>
</geneLastname> <geneLastname>
</geneLastname> </geneLastnameList>
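To make clear what Q1 computes, here is the same selection written against Python's xml.etree.ElementTree. It is a sketch only, not an XQuery engine: the function name is hypothetical, and the strip() call is an added simplification explained in the comment.

```python
import copy
import xml.etree.ElementTree as ET


def gene_lastname_list(root: ET.Element) -> ET.Element:
    """Mirror Q1: for each book titled Gene, wrap copies of its authors'
    lastname elements in <geneLastname>, all inside <geneLastnameList>."""
    result = ET.Element("geneLastnameList")
    for book in root.iter("book"):          # like //book (root included)
        # strip() is a simplification: the sample pads text with spaces,
        # whereas XQuery's = compares the untrimmed string value.
        if (book.findtext("booktitle") or "").strip() == "Gene":
            wrapper = ET.SubElement(result, "geneLastname")
            for lastname in book.findall("author/name/lastname"):
                wrapper.append(copy.deepcopy(lastname))
    return result
```

As in the query, one geneLastname element is produced per matching book, and it holds copies of the lastname nodes rather than their text.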
66
for versus let
  • Both clauses bind variables
  • for binds its variable to each item of the
    resulting sequence in turn
  • Items are iterated one by one
  • let binds its variable to the entire resulting
    sequence at once

67
for example
  • for $s in (<one/>, <two/>, <three/>)
  • return <out>{$s}</out>
  • Tuple stream
  • ltoutgt
  • ltone/gt
  • lt/outgt
  • ltoutgt
  • lttwo/gt
  • lt/outgt
  • ltoutgt
  • ltthree/gt
  • lt/outgt

68
let example
  • let $s := (<one/>, <two/>, <three/>)
  • return <out>{$s}</out>
  • Tuple Stream
  • ltoutgt
  • ltone/gt
  • lttwo/gt
  • ltthree/gt
  • lt/outgt
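The difference between the for and let examples can be mimicked with plain Python lists, using strings as stand-ins for the element nodes. The helper names are hypothetical, and the sketch illustrates the binding semantics only.

```python
items = ["<one/>", "<two/>", "<three/>"]   # stand-ins for element nodes


def for_clause(seq):
    # for $s in seq return <out>{$s}</out>: one result per item
    return ["<out>" + s + "</out>" for s in seq]


def let_clause(seq):
    # let $s := seq return <out>{$s}</out>: the whole sequence, once
    return ["<out>" + "".join(seq) + "</out>"]
```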

69
Multiple for bindings
  • for $i in (1, 2), $j in (3, 4)
  • Tuple stream
  • ($i = 1, $j = 3)
  • ($i = 1, $j = 4)
  • ($i = 2, $j = 3)
  • ($i = 2, $j = 4)
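A for clause with several variables produces the Cartesian product of the bound sequences, with the leftmost variable varying slowest, and itertools.product reproduces exactly that order. A hypothetical helper in which each variable is passed as a keyword argument:

```python
from itertools import product


def for_tuples(**bindings):
    """Tuple stream of `for $i in (...), $j in (...)`: one binding per
    combination, leftmost variable varying slowest."""
    names = list(bindings)
    return [dict(zip(names, combo)) for combo in product(*bindings.values())]
```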

70
Path expressions
  • Steps are separated by / or //
  • Starting with /
  • Example: /student
  • The root element must be student
  • Starting with //
  • Example: //student
  • Selects every student element in the document:
    the root or any of its descendants.

71
Evaluating E1/E2
  • Expression E1 is evaluated,
  • If the result is not a sequence of nodes, raise a
    dynamic error
  • Each node resulting from the evaluation of E1
    then serves in turn to provide an inner focus for
    an evaluation of E2
  • Each evaluation of E2 must result in a sequence
    of nodes
  • otherwise, a dynamic error is raised.
  • The sequences of nodes resulting from all the
    evaluations of E2 are merged, eliminating
    duplicate nodes based on node identity, and the
    result is returned in document order.
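The evaluation rule above can be sketched over an ElementTree tree, with each step restricted to a child name or ".." (parent). Node identity is modelled with Python's id(), and document order with each node's position in root.iter(). This is a sketch under these assumptions: the error cases (non-node results) are not modelled, and the function name is hypothetical. With a as the context node, the steps ["b", "c"] correspond to a/b/c.

```python
import xml.etree.ElementTree as ET


def evaluate_path(root, context, steps):
    """Evaluate E1/E2/... where each step is a child name or "..".

    For every step, each node of the current result supplies the focus
    for the next step; the per-node results are merged, duplicates (by
    node identity) are removed, and the sequence is put in document order.
    """
    order = {id(n): i for i, n in enumerate(root.iter())}   # document order
    parent = {id(c): p for p in root.iter() for c in p}     # child -> parent

    current = list(context)
    for name in steps:
        merged = {}                      # id(node) -> node, so no duplicates
        for node in current:
            if name == "..":
                hits = [parent[id(node)]] if id(node) in parent else []
            else:
                hits = [c for c in node if c.tag == name]
            for hit in hits:
                merged[id(hit)] = hit
        current = sorted(merged.values(), key=lambda n: order[id(n)])
    return current
```

The parent step shows why the merge matters: every b has the same parent, yet a/b/.. returns that parent only once.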

72
Example
  • a/b/c returns 3, 2, 5, 8, 1
[Figure: tree with root a, three b nodes and five c nodes labeled 1, 8, 2, 3 and 5.]
73
Example Q2
Find the names of the authors of the book(s)
titled Gene.
  • <geneNameList>
  • { for $b in doc("pub.xml")//book
  • where $b/booktitle = "Gene"
  • return
  • <geneName>
  • { $b/author/name/lastname }
  • { $b/author/name/firstname }
  • </geneName> }
  • </geneNameList>

Results: <geneNameList> <geneName>
</geneName> <geneName> </geneName>
</geneNameList>
74
Example Q3
For each author last name, list his/her address
and the titles of all his/her books. Sort the
results based on the lastnames.
  • <authorInfo>
  • { for $ln in distinct-values(doc("pub.xml")/article/
    author/name/lastname)
  • order by $ln
  • return
  • <author aLastname="{ $ln }">
  • { for $b in doc("pub.xml")//book,
  • $a in $b/author
  • where $a/name/lastname = $ln
  • return
  • (<title>{ $b/booktitle }</title>,
  • <addr>{ $a/address }</addr>) }
  • </author> }
  • </authorInfo>

Results: <authorInfo> <author aLastname="...">
<title> </title> <addr> </addr> <title> </title>
<addr> </addr> </author> </authorInfo>
75
Example Q4
  • <article-pairs>
  • { for $ar1 in doc("pub.xml")//article,
  • $a1 in $ar1/author,
  • $ar2 in doc("pub.xml")//article
  • where some $a2 in $ar2/author
  • satisfies ($a2/name/lastname = $a1/name/lastname
    and $a2/name/firstname = $a1/name/firstname)
  • and $ar1/title < $ar2/title
  • return
  • <article-pair>
  • { $ar1/title }
  • { $ar2/title }
  • </article-pair> }
  • </article-pairs>

Find all article pairs by the same author
(without duplicates).
Results: <article-pairs> <article-pair>
title1, title2 </article-pair>
</article-pairs>
76
some, every, satisfies
  • True:
  • some $x in (1, 2, 3) satisfies $x = 1
  • some $x in (1, 2, 3), $y in (5, 6, 7) satisfies
    $x + $y = 8
  • every $x in (1, 2, 3), $y in (5, 6, 7) satisfies
    $x < $y
  • False:
  • some $x in (1, 2, 3), $y in (5, 6, 7)
    satisfies $x + $y = 20
  • every $x in (1, 2, 3) satisfies $x = 1
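The quantified expressions above map directly onto Python's any() and all() over the Cartesian product of the bound sequences. Hypothetical helpers; the satisfies condition is passed as a function taking one argument per variable.

```python
from itertools import product


def some(*seqs, satisfies):
    """some $x in s1, $y in s2 satisfies P: true if any binding satisfies P."""
    return any(satisfies(*combo) for combo in product(*seqs))


def every(*seqs, satisfies):
    """every $x in s1, $y in s2 satisfies P: true if all bindings satisfy P
    (and therefore true for an empty sequence)."""
    return all(satisfies(*combo) for combo in product(*seqs))
```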

77
Problem
  • Given
  • DTDs
  • Collection of XML documents conforming to DTDs
  • Query
  • Based on DTD schemas
  • Over collection of XML documents, performing
    selections, joins, etc.
  • Producing an XML result

78
RDF
  • http://www.w3.org/TR/REC-rdf-syntax (2/99)
  • purpose: metadata for the Web
  • help search engines
  • syntax: XML
  • semantics: edge-labeled graphs

79
Resource Description Framework (RDF)
  • Even XML Schema does not provide the support for
    semantic interoperability required.
  • For example, when two applications exchange
    information using XML, both must agree on the use
    and intended meaning of the document structure.
  • Must first build a model of the domain of
    interest, to clarify what kind of data is to be
    sent from first application to second.
  • However, as XML Schema just describes a grammar,
    there are many different ways to encode a
    specific domain model into an XML Schema, thereby
    losing the direct connection from the domain
    model to the Schema.

80
Resource Description Framework (RDF)
  • Problem compounded if third application wishes to
    exchange information with other two.
  • Not sufficient to map one XML Schema to another,
    since the task is not to map one grammar to
    another grammar, but to map objects and relations
    from one domain of interest to another.
  • Three steps are required:
  • reengineer original domain models from XML
    Schema
  • define mappings between the objects in the domain
    models
  • define translation mechanisms for the XML
    documents, for example using XSLT.

81
Resource Description Framework (RDF)
  • RDF is an infrastructure that enables encoding,
    exchange, and reuse of structured meta-data.
  • This infrastructure enables meta-data
    interoperability through design of mechanisms
    that support common conventions of semantics,
    syntax, and structure.
  • RDF does not stipulate semantics for each domain
    of interest, but instead provides ability for
    these domains to define meta-data elements as
    required.
  • RDF uses XML as a common syntax for exchange and
    processing of meta-data.

82
RDF Data Model
  • The basic RDF data model consists of three object
    types:
  • Resource: anything that can have a URI, e.g., a
    Web page, a number of Web pages, or a part of a
    Web page, such as an XML element.
  • Property: a specific attribute used to describe
    a resource, e.g., the attribute Author may be
    used to describe who produced a particular XML
    document.
  • Statement: a combination of a resource, a
    property, and a value.

83
RDF Data Model
  • Components known as subject, predicate, and
    object of an RDF statement.
  • Example statement:
  • Author of http://www.dh.co.uk/staff_list.xml is
    John White
  • <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:s="http://www.dh.co.uk/schema/">
  • <rdf:Description about="http://www.dh.co.uk/staff_list.xml">
  • <s:Author>John White</s:Author>
  • </rdf:Description>
  • </rdf:RDF>
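The RDF/XML above encodes a single (subject, predicate, object) statement. This sketch recovers such triples with ElementTree for the simple pattern shown only (an rdf:Description whose properties have literal values); the function name is hypothetical, nested descriptions, containers and rdf:resource references are deliberately ignored, and the predicate URI is formed RDF/XML-style by concatenating the property's namespace URI and local name.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def simple_triples(rdf_xml: str):
    """Extract (subject, predicate, object) triples from rdf:Description
    elements whose properties have plain literal values."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # The slides use an unqualified about attribute; current RDF/XML
        # uses rdf:about, so accept either.
        subject = desc.get("about") or desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            if len(prop):                # element content: not handled here
                continue
            if prop.tag.startswith("{"):
                ns, _, local = prop.tag[1:].partition("}")
                predicate = ns + local   # namespace URI + local name
            else:
                predicate = prop.tag
            triples.append((subject, predicate, (prop.text or "").strip()))
    return triples
```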

84
RDF Data Model
  • To store descriptive information about the
    author, model author as a resource.

85
RDF Schema
  • Specifies information about classes in a schema
    including properties (attributes) and
    relationships between resources (classes).
  • RDF Schema mechanism provides a basic type system
    for use in RDF models, analogous to XML Schema.
  • Defines resources and properties such as
    rdfs:Class and rdfs:subClassOf that are used in
    specifying application-specific schemas.
  • Also provides a facility for specifying a small
    number of constraints such as cardinality.

86
RDF Metadata standard
  • <rdf:Description about="www.mypage.com">
  • <about> birds, butterflies, snakes
    </about>
  • <author> <rdf:Description>
  • <firstname> John
    </firstname>
  • <lastname> Smith
    </lastname>
  • </rdf:Description>
  • </author>
  • </rdf:Description>

87
More RDF Examples
88
RDF Terminology
statement
89
More RDF: Containers
  • bag, sequence, alternative
  • <rdf:Description> <a> <rdf:Bag>
  • <rdf:li> s1 </rdf:li>
  • <rdf:li> s2 </rdf:li>
  • </rdf:Bag>
  • </a>
  • </rdf:Description>

90
RDF Containers (contd)
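[Figure: graph of the Bag above: the edge a leads to the container node, which has an rdf:type edge to Bag and edges rdf:_1 and rdf:_2 to s1 and s2.]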
91
More RDF: Higher-Order Statements
  • the author of www.thispage.com says: "the topic
    of www.thatpage.com is environment"

RDF uses reification
92
What/where is the Semantic Web?
  • Machine-understandable information: Semantic Web
  • A new form of Web content that is meaningful to
    computers will unleash a revolution of new
    possibilities

93
What/where is the Semantic Web?
  • Layered on top of existing Web. (Just like HTTP
    is built on top of TCP, which is on top of IP,
    which is on top of the data-link layer)

[Figure: layer diagram with labels "research / vapourware", "solid implementations" and "TCP/IP, Data-Link".]
94
Layer 1 URI
  • Everything is a Resource (people, books, the
    attribute title of an Amazon book object, Web
    pages, the concept "laziness", ...)
  • Every resource has a unique identifier: a Uniform
    Resource Identifier (URI)
  • e.g., the URI of a Web page is its URL
  • e.g., the URI of my email address is
    mailto:nick@ucd.ie
  • The owner of an object can pick any URI they
    want, as long as it is unique. URIs often have
    URL-like syntax, but that is purely convention.

95
Layer 2 XML
  • Use XML as common formatting standard for
    encoding data.
  • (Could invent a new format for every kind of data
    but why bother?)
  • Data:
    <book><title>War &amp; Peace</title></book>
  • Meta-data:
    <taxonomy id="amazon">
      <concept superclass="thing">book</concept>
      <attribute class="book">title</attribute>
    </taxonomy>
  • Meta-meta-data:
    <ontology>
      <match><source from="amazon">title</source>
        <dest ont="fredhanna">name</dest></match>
    </ontology>

Danger/Warning: made-up syntax!!
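A sketch, assuming the slide's made-up tags are taken literally: Python's standard `xml.etree.ElementTree` can read all three levels, since data, meta-data and meta-meta-data are all just XML. The variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Data: note that "&" must be escaped as &amp; in real XML.
data = ET.fromstring("<book><title>War &amp; Peace</title></book>")

# Meta-data: Amazon's (made-up) taxonomy.
meta = ET.fromstring(
    '<taxonomy id="amazon">'
    '<concept superclass="thing">book</concept>'
    '<attribute class="book">title</attribute>'
    '</taxonomy>')

# Meta-meta-data: a (made-up) mapping between two ontologies.
meta_meta = ET.fromstring(
    '<ontology><match>'
    '<source from="amazon">title</source>'
    '<dest ont="fredhanna">name</dest>'
    '</match></ontology>')

print(data.findtext("title"))                # the data value itself
print(meta.find("attribute").get("class"))   # "title" is an attribute of class "book"
match = meta_meta.find("match")
print(match.findtext("source"), "->", match.findtext("dest"))
```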
96
XML Schema
  • An XML Schema document is an XML document that
    defines a set of XML tags (and how they may be
    used)

97
XML Namespaces
  • An XML document may use tags defined in more
    than one XML Schema document
  • Namespace prefixes (xxx:yyy) are used to
    unambiguously identify which namespace each tag
    belongs to, typically one defined by an XML Schema
    document

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description about="http://www.cs.ucd.ie/staff/nick">
        <dc:title>Nick's Home Page</dc:title>
      </rdf:Description>
    </rdf:RDF>
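Prefixes are only local aliases for namespace URIs. In this minimal Python sketch using the standard `xml.etree.ElementTree`, two hypothetical fragments bind different prefixes (`dc` and `x`) to the Dublin Core namespace, and after parsing both tags carry the same expanded name.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# Two fragments using different prefixes bound to the same namespace URI.
doc_a = ET.fromstring(f"<dc:title xmlns:dc='{DC}'>Nick's Home Page</dc:title>")
doc_b = ET.fromstring(f"<x:title xmlns:x='{DC}'>Nick's Home Page</x:title>")

# ElementTree reports names as "{namespace-URI}local-name",
# so the prefix itself disappears after parsing.
print(doc_a.tag)
print(doc_a.tag == doc_b.tag)
```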
98
Layer 3 RDF
  • All data/knowledge/facts/opinions/information is
    expressed on the Semantic Web as Resource
    Description Framework (RDF) statements
  • A very simple language for making assertions
  • Triple: (value) (attribute) (object)
  • (nick@ucd.ie) (is email address of) (Nick
    Kushmerick)
  • (0140444173) (is ISBN number of) (War & Peace)
  • (field 5 of database A) (is a field of type)
    (postal code)
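A minimal sketch, assuming triples are held as plain Python tuples in the slide's (value, attribute, object) order. The `match` helper, with `None` as a wildcard, is a hypothetical stand-in for a real triple-store query.

```python
# Statements from the slide as (value, attribute, object) tuples.
triples = [
    ("nick@ucd.ie", "is email address of", "Nick Kushmerick"),
    ("0140444173", "is ISBN number of", "War & Peace"),
    ("field 5 of database A", "is a field of type", "postal code"),
]

def match(pattern, store):
    """Return the triples matching a 3-element pattern; None acts as a wildcard."""
    return [t for t in store
            if all(p is None or p == v for p, v in zip(pattern, t))]

print(match((None, "is ISBN number of", None), triples))
```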

99
Everything is XML
  • Remember (Nick's Home Page) (is title of)
    (http://www.cs.ucd.ie/staff/nick)? It is actually
    encoded as some very ugly XML:
    <?xml version="1.0"?>
    <!DOCTYPE rdf:RDF SYSTEM
      "http://purl.org/dc/schemas/dcmes-xml-20000714.dtd">
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description about="http://www.cs.ucd.ie/staff/nick">
        <dc:title>Nick's Home Page</dc:title>
      </rdf:Description>
    </rdf:RDF>
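The sketch below recovers the title triple from that XML using Python's standard `xml.etree.ElementTree`. It is a deliberately simplified reader, not a conforming RDF/XML parser: it handles only rdf:Description elements with simple literal-valued properties, accepts both the old unqualified `about` attribute and `rdf:about`, and omits the XML declaration and DOCTYPE for brevity. The function name `description_triples` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

doc = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description about="http://www.cs.ucd.ie/staff/nick">
    <dc:title>Nick's Home Page</dc:title>
  </rdf:Description>
</rdf:RDF>"""

def description_triples(xml_text):
    """Simplified reading: one (subject, property, literal) triple per property element."""
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        # Old RDF syntax used an unqualified 'about'; later RDF/XML uses rdf:about.
        subject = desc.get("about") or desc.get(f"{{{RDF}}}about")
        for prop in desc:
            triples.append((subject, prop.tag, (prop.text or "").strip()))
    return triples

print(description_triples(doc))
```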

100
Layer 4 Ontologies (RDF Schema)
  • There are lots of common RDF attribute sets for
    lots of common tasks
  • E.g., Dublin Core (Dublin, Ohio; sorry!) defines a few
    dozen standard attributes for asserting
    statements about documents: title, author, date,
    version, format, owner, ...
  • But suppose you want to define your own
    concepts/attributes:
  • RDF Schema: a set of RDF tags for defining a new
    set of RDF tags (no, this isn't circular)
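As an illustration of what an RDF Schema adds, here is a minimal sketch of the inference that rdfs:subClassOf licenses: the relation is transitive, and an instance of a class is also an instance of each superclass. The class names and the dictionary encoding are hypothetical, not taken from any real ontology.

```python
def superclasses(cls, sub_class_of):
    """All (direct and indirect) superclasses of cls, following subClassOf transitively."""
    seen, stack = set(), [cls]
    while stack:
        for parent in sub_class_of.get(stack.pop(), []):
            if parent not in seen:          # guard also stops cycles
                seen.add(parent)
                stack.append(parent)
    return seen

# Hypothetical schema: class -> list of direct superclasses (rdfs:subClassOf).
sub_class_of = {"Novel": ["Book"], "Book": ["Document"], "Document": ["Resource"]}
# Hypothetical instance data: resource -> asserted rdf:type.
types = {"War & Peace": "Novel"}

for thing, cls in types.items():
    print(thing, "rdf:type", sorted({cls} | superclasses(cls, sub_class_of)))
```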

101
RDF Schema for Dublin Core Ontology
102
Layer 4½ Mapping Between Ontologies
  • Taxonomy crisis:
  • How can your agent know that my "title" is your
    "name"?!
  • How can my agent know that some of your address
    objects are post boxes, not physical addresses?!
  • How can my agent know that many Asian first names
    correspond to Western surnames?
  • Semantic Web solution: services for
    translating/mapping between related ontologies.
  • Suppose Amazon.com uses Dublin Core ("title"),
    while Fred Hanna uses its own document ontology
    ("name"). Currently my agent is forced to choose
    one ontology, or must be carefully crafted to
    understand both languages
  • A better solution: a niche now exists for an
    independent entity (UniversalBookInfo.com) that
    maps "title" to "name", etc.
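A minimal sketch of what a mapping service such as UniversalBookInfo.com might do, assuming it publishes a table from (ontology, term) pairs to a shared term. The table, the function `translate` and the ontology labels `amazon` and `fredhanna` are hypothetical.

```python
# Hypothetical mapping table: (ontology, term) -> shared term.
TO_SHARED = {("amazon", "title"): "title", ("fredhanna", "name"): "title"}

def translate(record, src, dst):
    """Rewrite a record's attribute names from ontology `src` to ontology `dst`."""
    # Reverse lookup for the destination ontology: shared term -> dst term.
    from_shared = {shared: term for (ont, term), shared in TO_SHARED.items() if ont == dst}
    out = {}
    for key, value in record.items():
        shared = TO_SHARED.get((src, key))
        out[from_shared.get(shared, key)] = value   # unmapped keys pass through unchanged
    return out

print(translate({"title": "War & Peace"}, "amazon", "fredhanna"))
```

With such a service, each agent needs to know only the mapping service, not every shop's ontology.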

103
without UniversalBookInfo.com
(Diagram labels: Nick wants to buy War & Peace; Nick's very complicated agent; programmer's bank account; Amazon ontology; Fred Hanna ontology; Amazon; Fred Hanna.)
104
with UniversalBookInfo.com
(Diagram labels: Nick wants to buy War & Peace; Nick's agent; Joe's agent; Jane's agent; UniversalBookInfo.com; Amazon; Fred Hanna; bank account.)
105
Layer 5 Logic
DAML+OIL
  • Ontologies also allow axioms, e.g.:
  • "All people have brains"
  • Expressiveness: a key challenge in formalizing
    axioms is that we want to be able to say anything we
    need to in a particular domain.
  • "All people have brains, except George Bush."
  • But more expressive logics mean slower inference
  • Intuitively, applying a rule such as "You can't
    fool all of the people all of the time" could
    require checking everyone in the universe to
    determine whether there exists even one foolable
    person.
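A toy illustration of why applying even one axiom touches every fact: the sketch below applies "all people have brains" by naive forward chaining over (predicate, individual) facts and counts how many facts it examines. The individuals and predicate names are hypothetical; richer logics (exceptions, nested quantifiers) only make this more expensive.

```python
# Hypothetical facts as (predicate, individual) pairs.
facts = {("Person", "alice"), ("Person", "bob"), ("Book", "war_and_peace")}

def apply_rule(facts, if_pred, then_pred):
    """Apply if_pred(x) -> then_pred(x) to every fact; return (new fact set, facts examined)."""
    checks = 0
    derived = set()
    for pred, x in facts:
        checks += 1                       # every fact is examined once
        if pred == if_pred:
            derived.add((then_pred, x))
    return facts | derived, checks

closed, checks = apply_rule(facts, "Person", "HasBrain")
print(sorted(closed), checks)
```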

106
Integrating Services
  • Sources can be services rather than data
    repositories
  • E.g., Amazon as a composite service for book buying
  • The separating line is somewhat thin
  • Handling services:
  • Description (API, I/O spec): WSDL
  • Composition: planning in general
  • Execution: data-flow architectures
  • See next part
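Composition as planning can be sketched, under strong simplifying assumptions, as forward chaining over service descriptions that list required inputs and produced outputs (a crude stand-in for a WSDL description). The services below are hypothetical, and the greedy loop may include services that are not strictly needed; it is not an optimal planner.

```python
# Hypothetical service descriptions: name -> (required inputs, produced outputs).
SERVICES = {
    "find_isbn":   ({"title"}, {"isbn"}),
    "quote_price": ({"isbn"}, {"price"}),
    "place_order": ({"isbn", "price", "credit_card"}, {"receipt"}),
}

def compose(have, goal, services):
    """Greedy forward chaining: run any service whose inputs are available until goal appears."""
    have, plan = set(have), []
    progress = True
    while goal not in have and progress:
        progress = False
        for name, (inputs, outputs) in services.items():
            if name not in plan and inputs <= have:
                plan.append(name)
                have |= outputs
                progress = True
    return plan if goal in have else None   # None: goal unreachable with these services

print(compose({"title", "credit_card"}, "receipt", SERVICES))
```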

107
Who will annotate the data?
  • The Semantic Web works if users annotate their
    pages using some existing ontology (or their own
    ontology, but with mappings to other ontologies)
  • But users typically do not conform to standards...
  • ...and are not patient enough for delayed
    gratification
  • The way to force them is to act as if you are
    helping them write Web pages
  • Currently most people don't write their own HTML
    code; the MS FrontPages and Claris Home Pages of
    the world do.
  • What if we change MS FrontPage/Claris
    Home Page so that they (slyly) add annotations?
  • E.g., the Mangrove project at U. Wash.
  • Help users tag their data (allow graphical
    editing)
  • Provide instant gratification by running services
    that use the tags.

108
Layer 6 Proofs
(Diagram: a proof, in an ugly XML encoding, is passed to a proof verifier, which answers either "Yes, this proof is correct" or "No, this proof is flawed". Easy to build once the Logic layer is fixed.)
109
Proofs Huh?!??!
(Diagram: a buyer sends, in an ugly XML encoding, "I would like to buy this book; please send my company an invoice" together with the proof "I am an employee of XYZ Corp (because it says so on this Web page, which is an official XYZ Corp document)". A proof verifier checks it ("Yes, this proof is correct" / "No, this proof is flawed"; easy to build once the Logic layer is fixed), leading either to "OK, book successfully ordered" or to "Sorry, we need a credit card!")
110
Proofs and Trust
  • In the Semantic Web, a proof is a procedure
    that can be automatically followed in order to
    verify an assertion.
  • Believability is always relative to a set of
    resources that you trust
  • E.g., "I own bank account 239489248234, because my
    digital signature XXXX matches the record on Web
    page http://bank.com/accounts, and you trust this
    page because you own bank.com"
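A minimal sketch of a proof verifier in which believability is relative to a set of trusted sources: each step either quotes a fact from a source, accepted only if that source is trusted, or derives a claim by modus ponens ("mp") from two earlier steps. The proof encoding, the claims and the trust set are hypothetical.

```python
def verify(proof, goal, trusted_sources):
    """Check each step; return True only if every step is valid and the last claim is goal."""
    proven = []
    for step in proof:
        kind = step[0]
        if kind == "fact":
            _, claim, source = step
            if source not in trusted_sources:
                return False                 # believability is relative to whom you trust
        elif kind == "mp":                   # from P and ("implies", P, Q), conclude Q
            _, i, j = step
            if not (0 <= i < len(proven) and 0 <= j < len(proven)):
                return False                 # may only cite earlier steps
            premise, rule = proven[i], proven[j]
            if not (isinstance(rule, tuple) and len(rule) == 3
                    and rule[0] == "implies" and rule[1] == premise):
                return False
            claim = rule[2]
        else:
            return False
        proven.append(claim)
    return bool(proven) and proven[-1] == goal

BANK = "http://bank.com/accounts"
goal = "I own account 239489248234"
proof = [
    ("fact", "signature XXXX matches record", BANK),
    ("fact", ("implies", "signature XXXX matches record", goal), BANK),
    ("mp", 0, 1),
]
print(verify(proof, goal, {BANK}))   # accepted: the verifier trusts bank.com
print(verify(proof, goal, set()))    # rejected: no trusted sources
```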

111
Summary
  • A distributed global information ecosystem enables a
    wide variety of value-added information services
    (monitoring your online purchases, finding
    entertainment in which you might be interested,
    scheduling appointments, ...)
  • But doing so is difficult or impossible if the relevant
    data is tied up in legacy documents intended for
    human eyes and common sense
  • The Semantic Web as a global database/brain for all
    humanity: probably hopelessly futile
  • But within sufficiently motivated (i.e., rich)
    segments of the Web, today's syntactic Web may
    well evolve into a Semantic Web

Rose colored glasses are never made in bi-focals
because no-body wants to read small print in
dreams