XML

Provided by: utda



1
  • XML

2
Outline (ambitious)
  • Background: documents (SGML/HTML) and databases
    (structured and semistructured data)
  • XML Basics and Document Type Descriptors
  • XML query languages: XML-QL and XSL
  • XML additions: XLink, XPointer, RDF, SOX,
    XML-Data
  • Document Object Model (XML APIs)

3
Some Useful Articles
  • XML, Java, and the future of the web
    http://webreview.com/wr/pub/97/12/19/xml/index.html
  • XML and the Second-Generation Web
    http://www.sciam.com/1999/0599issue/0599bosak.html
  • Articles/standards for XML, XSL, XML-QL
    http://www.w3c.org/
    http://www.w3.org/TR/REC-xml

4
Part I Background
  • What's the difference between the world of
    documents and information retrieval and the world
    of databases and query interfaces?

5
Documents vs Databases
  • Document world
    • plenty of small documents
    • usually static
    • implicit structure: section, paragraph, toc
    • tagging
    • human friendly
    • content: form/layout, annotation
    • paradigms: "Save as", WYSIWYG
    • meta-data: author name, date, subject
  • Database world
    • a few large databases
    • usually dynamic
    • explicit structure (schema)
    • records
    • machine friendly
    • content: schema, data, methods
    • paradigms: Atomicity, Consistency, Isolation, Durability
    • meta-data: schema description

6
What to do with them
  • Documents
    • editing
    • printing
    • spell-checking
    • counting words
    • retrieving (IR)
    • searching
  • Database
    • updating
    • cleaning
    • querying
    • composing/transforming

7
HTML
  • Publishing hypertext on the World Wide Web
  • Designed to describe how a Web browser should
    arrange text, images and push-buttons on a page.
  • Easy to learn, but does not convey structure.
  • Fixed tag set.

[Figure: an HTML fragment containing the text "Welcome to the XML course" and "Introduction", with labels for the opening tag, closing tag, bachelor tag, attribute name, attribute value, and text (PCDATA)]
8
The Structure of XML
  • XML consists of tags and text
  • Tags come in pairs ...
  • They must be properly nested
  • ... ... --- good
  • ... ... --- bad
  • (You can't do ... ... ... in
    HTML)
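
The nesting rule can be checked mechanically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` parser; the tag names `date`, `day`, and `month` are hypothetical, chosen only for illustration because the slide's own tags did not survive.

```python
import xml.etree.ElementTree as ET

def is_well_nested(text: str) -> bool:
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical tag names, for illustration only.
good = "<date><day>12</day><month>May</month></date>"  # tags close in reverse order
bad = "<date><day>12</date></day>"                      # tags overlap

print(is_well_nested(good))
print(is_well_nested(bad))
```

The parser rejects the second string because `</date>` arrives while `<day>` is still open.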

9
XML text
  • XML has only one basic type -- text.
  • It is bounded by tags e.g.
  • The Big Sleep
  • 1935 --- 1935 is still text
  • XML text is called PCDATA (for parsed
    character data). Characters are Unicode, and any
    character can be written as a numeric character
    reference, e.g. &#x05DE; for the Hebrew letter Mem
  • Later we shall see how new types are specified by
    XML-data
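
A small sketch of what "only one basic type" means in practice, assuming Python's standard `xml.etree.ElementTree`. The element names `movie`, `title`, and `year` are assumptions made for this example; only the values come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the values are the ones on the slide.
movie = ET.fromstring(
    "<movie><title>The Big Sleep</title><year>1935</year></movie>"
)

year_text = movie.find("year").text   # always a str, never an int
year_number = int(year_text)          # the application decides on a type

# A numeric character reference is expanded by the parser.
# U+05DE is the Hebrew letter Mem.
letter = ET.fromstring("<c>&#x05DE;</c>").text

print(type(year_text).__name__, year_number, letter)
```

Any typing beyond text is therefore imposed by the application or by a schema layer, not by XML itself.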

10
XML structure
  • Nesting tags can be used to express various
    structures, e.g. a tuple (record):

<person>
  <name>Malcolm Atchison</name>
  <tel>(215) 898 4321</tel>
  <email>mp@dcs.gla.ac.sc</email>
</person>
11
XML structure (cont.)
  • We can represent a list by using the same
    tag repeatedly

...
... ...
...
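
As a rough sketch of both ideas (a record as nested tags, a list as a repeated tag), the snippet below reads such a document into Python dictionaries. The tag names `addresses`, `person`, `name`, `tel`, and `email` are assumptions, and the second person is an invented placeholder entry.

```python
import xml.etree.ElementTree as ET

# Tag names are assumed; the second <person> is a made-up placeholder.
doc = """
<addresses>
  <person>
    <name>Malcolm Atchison</name>
    <tel>(215) 898 4321</tel>
    <email>mp@dcs.gla.ac.sc</email>
  </person>
  <person>
    <name>A. N. Other</name>
    <tel>(000) 000 0000</tel>
    <email>ano@example.org</email>
  </person>
</addresses>
""".strip()

def to_record(person):
    # One child element per field: the tag becomes the key, the text the value.
    return {child.tag: child.text for child in person}

root = ET.fromstring(doc)
# The list is simply every repetition of the same tag, in document order.
records = [to_record(p) for p in root.findall("person")]
print(records[0]["name"], len(records))
```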
12
Terminology
  • The segment of an XML document between an opening
    and a corresponding closing tag is called an
    element.

Malcolm Atchison
(215) 898 4321
(215) 898 4321 mp@dcs.gla.ac.sc

element
element, a sub-element of
not an element
13
XML is tree-like
Malcolm Atchison
(215) 898 4321
(215) 898 4321
mp@dcs.gla.ac.sc
Semistructured data models typically put the
labels on the edges
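
One way to see the tree is to walk it. This is a minimal sketch with `xml.etree.ElementTree`, which puts the labels on the nodes (as XML does) rather than on the edges; the tag names are assumed.

```python
import xml.etree.ElementTree as ET

def walk(element, depth=0):
    """Yield (depth, tag, text) for the element and all its descendants."""
    text = (element.text or "").strip()
    yield depth, element.tag, text
    for child in element:
        yield from walk(child, depth + 1)

# Assumed tag names; the values are those shown on the slide.
person = ET.fromstring(
    "<person><name>Malcolm Atchison</name>"
    "<tel>(215) 898 4321</tel><tel>(215) 898 4321</tel>"
    "<email>mp@dcs.gla.ac.sc</email></person>"
)

nodes = list(walk(person))
for depth, tag, text in nodes:
    print("  " * depth + tag, text)
```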
14
Mixed Content
  • An element may contain a mixture of sub-elements
    and PCDATA
  • British Airways
  • World's favorite airline
  • Data of this form is not typically generated from
    databases. It is needed for consistency with HTML
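
Mixed content shows up in APIs as text that sits between sub-elements. A minimal sketch, assuming Python's `xml.etree.ElementTree`, where such text is exposed as the `tail` of the preceding sub-element; the tags `airline` and `name` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical tags; only the text comes from the slide.
airline = ET.fromstring(
    "<airline><name>British Airways</name> World's favorite airline</airline>"
)

name = airline.find("name")
inside = name.text    # PCDATA inside the sub-element
after = name.tail     # PCDATA that follows it, still inside <airline>
full_text = "".join(airline.itertext())

print(repr(inside), repr(after))
```

A relational export would rarely produce the trailing text, which is why this form is associated with documents rather than databases.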

15
A Complete XML Document
<?xml version="1.0"?>
<person>
  <name>Malcolm Atchison</name>
  <tel>(215) 898 4321</tel>
  <email>mp@dcs.gla.ac.sc</email>
</person>

16
Two ways of representing a DB

projects:  title, budget, managedBy
employees: name, ssn, age
17
Project and Employee relations in XML
Projects and employees are intermixed
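<db>
  <project>
    <title>Pattern recognition</title>
    <budget>10000</budget>
    <managedBy>Joe</managedBy>
  </project>
  <employee>
    <name>Joe</name>
    <ssn>344556</ssn>
    <age>34</age>
  </employee>
  <employee>
    <name>Sandra</name>
    <ssn>2234</ssn>
    <age>35</age>
  </employee>
  <project>
    <title>Auto guided vehicle</title>
    <budget>70000</budget>
    <managedBy>Sandra</managedBy>
  </project>
</db>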

18
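Project and Employee relations in XML (cont'd)
Employees follow projects
<db>
  <projects>
    <project>
      <title>Pattern recognition</title>
      <budget>10000</budget>
      <managedBy>Joe</managedBy>
    </project>
    <project>
      <title>Auto guided vehicles</title>
      <budget>70000</budget>
      <managedBy>Sandra</managedBy>
    </project>
  </projects>
  <employees>
    <employee>
      <name>Joe</name>
      <ssn>344556</ssn>
      <age>34</age>
    </employee>
    <employee>
      <name>Sandra</name>
      <ssn>2234</ssn>
      <age>35</age>
    </employee>
  </employees>
</db>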

19
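Project and Employee relations in XML (cont'd)
Or without separator tags
<db>
  <projects>
    <title>Pattern recognition</title>
    <budget>10000</budget>
    <managedBy>Joe</managedBy>
    <title>Auto guided vehicles</title>
    <budget>70000</budget>
    <managedBy>Sandra</managedBy>
  </projects>
  <employees>
    <name>Joe</name>
    <ssn>344556</ssn>
    <age>34</age>
    <name>Sandra</name>
    <ssn>2234</ssn>
    <age>35</age>
  </employees>
</db>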
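
The mapping from relations to XML can be sketched in a few lines. This example assumes the "employees follow projects" layout and a root element named `db` (an assumption), and uses only the rows shown on the slides.

```python
import xml.etree.ElementTree as ET

# Rows taken from the slides; values are kept as text.
projects = [("Pattern recognition", "10000", "Joe"),
            ("Auto guided vehicle", "70000", "Sandra")]
employees = [("Joe", "344556", "34"), ("Sandra", "2234", "35")]

def rows_to_xml(wrapper, row_tag, fields, rows):
    """Wrap each row in <row_tag>, with one sub-element per field."""
    parent = ET.Element(wrapper)
    for row in rows:
        item = ET.SubElement(parent, row_tag)
        for field, value in zip(fields, row):
            ET.SubElement(item, field).text = value
    return parent

db = ET.Element("db")  # the root name is an assumption
db.append(rows_to_xml("projects", "project",
                      ("title", "budget", "managedBy"), projects))
db.append(rows_to_xml("employees", "employee",
                      ("name", "ssn", "age"), employees))

xml_text = ET.tostring(db, encoding="unicode")
print(xml_text)
```

Dropping the `project` and `employee` wrappers, or interleaving the two kinds of rows, only changes how `rows_to_xml` is called; the data is the same in every layout.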
20
Attributes
  • An (opening) tag may contain attributes. These
    are typically used to describe the content of
    an element
  • cheese
  • fromage
  • branza
  • A food made
  • Order of attributes in an element does not matter
  • XML elements are ordered
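
The last two points can be observed directly. A minimal sketch with `xml.etree.ElementTree`; the element and attribute names (`entry`, `lang`, `kind`, `x`, `y`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Same attributes in a different order, same children in a different order.
a = ET.fromstring('<entry lang="en" kind="noun"><x/><y/></entry>')
b = ET.fromstring('<entry kind="noun" lang="en"><y/><x/></entry>')

same_attributes = a.attrib == b.attrib                       # order is ignored
same_child_order = [c.tag for c in a] == [c.tag for c in b]  # order matters

print(same_attributes, same_child_order)
```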

21
Attributes (contd)
  • Another common use for attributes is to express
    dimension or type
  • 2400
  • 96
  • M05-.C@02!G96YE
  • A document that obeys the nested tags rule and
    does not repeat an attribute within a tag is said
    to be well-formed.

22
When to use attributes
  • It's not always clear when to use attributes

F. MacNiel
fmacn@dcs.barra.ac.sc
...
123 45 6789
F. MacNiel
fmacn@dcs.barra.ac.sc
...
23
XML Misc.
  • Apart from elements and attributes, XML allows
    processing instructions and comments. A
    processing instruction is a statement of the
    form <?target ... ?>
  • A comment is enclosed between <!-- and -->

24
Part III Document Type Descriptors
  • Imposing structure on XML documents

25
Document Type Descriptors
  • Document Type Descriptors (DTDs) impose structure
    on an XML document.
  • There is some relationship between a DTD and a
    schema, but it is not close -- hence the need for
    additional typing systems.
  • The DTD is a syntactic specification.

26
Example The Address Book
<person>
  <name>MacNiel, John</name>          exactly one name
  <greet>Dr. John MacNiel</greet>     at most one greeting
  <addr>1234 Huron Street</addr>      as many address lines as needed (in order)
  <addr>Rome, OH 98765</addr>
  <tel>(321) 786 2543</tel>           mixed telephones and faxes
  <fax>(321) 786 2543</fax>
  <tel>(321) 786 2543</tel>
  <email>jm@abc.com</email>           as many as needed
</person>
27
Specifying the structure
  • name: specifies a name element
  • greet?: specifies an optional (0 or 1)
    greet element
  • name, greet?: specifies a name followed by an
    optional greet

28
Specifying the structure (cont)
  • addr*: specifies 0 or more address lines
  • tel | fax: a tel or a fax element
  • (tel | fax)*: 0 or more repeats of tel or fax
  • email*: 0 or more email elements

29
Specifying the structure (cont)
  • So the whole structure of a person entry is
    specified by
  • name, greet?, addr*, (tel | fax)*, email*
  • This is known as a regular expression. Why is it
    important?
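
One reason it matters: a content model can be checked with ordinary regular-expression machinery. The sketch below assumes the model is `name, greet?, addr*, (tel | fax)*, email*` (the `*` and `|` operators follow the slides' descriptions "0 or more" and "a tel or a fax") and translates it by hand into a Python `re` pattern over the sequence of child tag names. It is an illustration, not how a validating parser is actually implemented.

```python
import re

# Each child tag is written as the tag name followed by one space, so the
# content model becomes a regular expression over a plain string.
PERSON = re.compile(r"(name )(greet )?(addr )*(tel |fax )*(email )*")

def matches_person(child_tags):
    """Check a list of child tag names against the person content model."""
    sequence = "".join(tag + " " for tag in child_tags)
    return PERSON.fullmatch(sequence) is not None

print(matches_person(["name", "addr", "addr", "tel", "fax", "tel", "email"]))
print(matches_person(["name", "email", "tel"]))  # tel after email is rejected
```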

30
Regular Expressions
  • Each regular expression determines a
    corresponding finite state automaton. Let's start
    with a simpler example:
  • name, addr*, email*
  • This suggests a simple parsing program

[Figure: finite state automaton with transitions labelled name, addr, and email]
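
Such a parsing program might look like the following sketch: a hand-written transition table for `name, addr*, email*`. The state names (`start`, `addrs`, `emails`) are invented for this example.

```python
# Transition table: state -> {tag: next state}. State names are invented.
TRANSITIONS = {
    "start": {"name": "addrs"},                      # exactly one name first
    "addrs": {"addr": "addrs", "email": "emails"},   # any number of addr
    "emails": {"email": "emails"},                   # then any number of email
}
ACCEPTING = {"addrs", "emails"}

def accepts(child_tags):
    """Run the automaton over a sequence of child tag names."""
    state = "start"
    for tag in child_tags:
        state = TRANSITIONS[state].get(tag)
        if state is None:        # no transition for this tag: reject
            return False
    return state in ACCEPTING

print(accepts(["name", "addr", "addr", "email"]))
print(accepts(["name", "email", "addr"]))
```

Each child tag is consumed once and never revisited, so checking an element takes time linear in the number of its children.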
31
Another example
  • name, address*, (tel | fax)*, email*

[Figure: finite state automaton for this expression, with transitions labelled name, address, tel, fax, and email]
Adding in the optional greet further complicates
things
32
A DTD for the address book
  • <!ELEMENT person (name, greet?, address*, (fax | tel)*,
    email*)>

33
Our relational DB revisited

projects:  title, budget, managedBy
employees: name, ssn, age
34
Two DTDs for the relational DB
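<!ELEMENT db (projects, employees)>
<!ELEMENT projects (project*)>
<!ELEMENT employees (employee*)>
<!ELEMENT project (title, budget, managedBy)>
<!ELEMENT employee (name, ssn, age)>
...

<!ELEMENT db ((project | employee)*)>
<!ELEMENT project (title, budget, managedBy)>
<!ELEMENT employee (name, ssn, age)>
...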
35
Some things are hard to specify
  • Each employee element is to contain name, age and
    ssn elements in some order.
  • ( (name, age, ssn) | (age, ssn, name)
  • | (ssn, name, age) | ...
  • )
  • Suppose there were many more fields!
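
To get a feel for how quickly this grows, the sketch below generates the "any order" content model by enumerating permutations. The helper name `any_order_model` is made up for this example.

```python
from itertools import permutations
from math import factorial

def any_order_model(fields):
    """Build a DTD-style content model that allows the fields in any order."""
    alternatives = ["(" + ", ".join(p) + ")" for p in permutations(fields)]
    return "(" + " | ".join(alternatives) + ")"

model = any_order_model(["name", "age", "ssn"])
print(model)

# n fields need n! alternatives: 6 for three fields, 3628800 for ten.
alternatives_for_ten_fields = factorial(10)
print(alternatives_for_ten_fields)
```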

36
Summary of XML regular expressions
  • A        The tag A occurs
  • e1, e2   The expression e1 followed by e2
  • e*       0 or more occurrences of e
  • e?       Optional -- 0 or 1 occurrences
  • e+       1 or more occurrences
  • e1 | e2  either e1 or e2
  • (e)      grouping

37
Specifying attributes in the DTD
  • dimension CDATA #REQUIRED
  • accuracy CDATA #IMPLIED
  • The dimension attribute is required; the accuracy
    attribute is optional.
  • CDATA is the type of the attribute -- it means
    string.

38
The DTD Language
  • Default modifiers in DTD attributes

39
The DTD Language
  • Datatypes in DTD attributes

40
Consistency of ID and IDREF attribute values
  • If an attribute is declared as ID
  • the associated values must all be distinct (no
    confusion)
  • ID is a poor cousin of a key in relational
    databases.
  • If an attribute is declared as IDREF
  • the associated value must exist as the value of
    some ID attribute (no dangling pointers)
  • IDREF is a poor cousin of a foreign key in
    relational databases.
  • Similarly for all the values of an IDREFS
    attribute
  • An attribute of type IDREFS represents a
    space-separated list of references to
    valid IDs.
  • ID and IDREF attributes are not typed
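
The two consistency rules can be sketched as a small check. The standard-library parser does not read DTDs for this purpose, so the example hard-codes which attribute names are ID, IDREF, and IDREFS; that mapping, the `family`/`person` tags, and the deliberately dangling reference `joe` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed attribute typing; a real validator would read this from the DTD.
ID_ATTRS = {"id"}
IDREF_ATTRS = {"mother", "father"}
IDREFS_ATTRS = {"children"}

def check_references(root):
    """Return (duplicate_ids, dangling_refs) for the attribute names above."""
    seen, duplicates, refs = set(), [], []
    for element in root.iter():
        for name, value in element.attrib.items():
            if name in ID_ATTRS:
                if value in seen:
                    duplicates.append(value)
                seen.add(value)
            elif name in IDREF_ATTRS:
                refs.append(value)
            elif name in IDREFS_ATTRS:
                refs.extend(value.split())   # space-separated list of IDs
    # References are resolved only after all IDs are known, so a reference
    # may point forward in the document.
    dangling = [r for r in refs if r not in seen]
    return duplicates, dangling

family = ET.fromstring("""
<family>
  <person id="jane" children="mary jack">Jane Doe</person>
  <person id="john" children="mary jack">John Doe</person>
  <person id="mary" mother="jane" father="john">Mary Doe</person>
  <person id="jack" mother="jane" father="joe">Jack Doe</person>
</family>
""".strip())

duplicates, dangling = check_references(family)
print(duplicates, dangling)
```

Note that the check says nothing about what kind of element a reference points to, which is the sense in which ID and IDREF are untyped.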

41
Specifying ID and IDREF attributes
  • id ID #REQUIRED
  • mother IDREF #IMPLIED
  • father IDREF #IMPLIED
  • children IDREFS #IMPLIED

42
Some conforming data
  • father="john"
  • Jane Doe
  • John Doe
  • Mary Doe
  • father="john"
  • Jack Doe

43
An alternative specification
  • children?)

44
The revised data
  • Jane Doe
  • John Doe
  • ...

45
Types of Attributes
  • Enumerated - List of values (You must use one of
    the items)
  • ALIGN (LEFT | CENTER | RIGHT) "LEFT"
  • Programming XML in Java ALIGN="CENTER" Programming XML in Java

46
Types of Attributes
  • NMTOKEN - The characters of an NMTOKEN value must
    be a letter, digit, '.', '-', '_', or ':'.
  • It may not include whitespace.
  • student_name
  • student_no NMTOKEN #REQUIRED
  • Jo Smith
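
The NMTOKEN rule can be approximated with a regular expression. This sketch is ASCII-only, which is a simplification: the XML specification also admits many non-ASCII letters and digits.

```python
import re

# ASCII-only approximation of the allowed name characters.
NMTOKEN = re.compile(r"[A-Za-z0-9._:\-]+")

def is_nmtoken(value):
    """Return True if value is a non-empty run of allowed characters."""
    return NMTOKEN.fullmatch(value) is not None

print(is_nmtoken("S-1234.a"))   # letters, digits, '-' and '.' are fine
print(is_nmtoken("Jo Smith"))   # whitespace is not allowed
```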

47
Types of Attributes
  • Entities
  • XML's way of referring to a data item.
  • Text or Binary data.
  • General Entity
  • Use in the content of XML document
  • References start with '&' and end with ';'
  • Parameter Entity
  • Use in a DTD
  • References start with '%' and end with ';'
  • Internal Entity - Defined in XML Document
  • External Entity - Defined in a external source
    file, URI.

48
Types of Attributes
  • Internal General Entities
  • Example 1: DATE (#PCDATA)

  • TODAY
  • Example 2

49
Types of Attributes
  • External General Entities
  • FPI URIExample

  • TODAY

50
Types of Attributes
  • Predefined General Entity References
  • &amp;  - The & character
  • &apos; - The ' character
  • &gt;   - The > character
  • &lt;   - The < character
  • &quot; - The " character
  • lkhanat_newutdal
    las.edu
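
In code, the five predefined references are usually produced by an escaping helper rather than typed by hand. A minimal sketch using `escape` and `unescape` from Python's `xml.sax.saxutils`, which handle `&`, `<`, and `>` by default and take the quote characters as an extra mapping; the sample string is made up.

```python
from xml.sax.saxutils import escape, unescape

# The two quote characters are not escaped by default, so pass them in.
QUOTES = {'"': "&quot;", "'": "&apos;"}

raw = 'Tom & "Jerry" <cat>'
escaped = escape(raw, QUOTES)
restored = unescape(escaped, {ref: char for char, ref in QUOTES.items()})

print(escaped)
print(restored == raw)
```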

51
Types of Attributes
  • Parameter Entities
  • Internal: <!ENTITY % name "replacement text">
  • External: <!ENTITY % name PUBLIC "FPI" "URI">
  • Example


52
A useful abbreviation
  • When an element has empty content we can use
  • <blahblahbla/> for <blahblahbla></blahblahbla>
  • For example
  • Jane Doe
  • ...
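
A quick sketch showing that a parser treats the abbreviated and the long form of an empty element the same way, using `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

short = ET.fromstring("<blahblahbla/>")
long = ET.fromstring("<blahblahbla></blahblahbla>")

# Both parse to an element with no text and no children.
both_empty = (short.text is None and long.text is None
              and len(short) == 0 and len(long) == 0)

# Serialising either one gives the same string.
same_output = (ET.tostring(short, encoding="unicode")
               == ET.tostring(long, encoding="unicode"))

print(both_empty, same_output)
```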

53
Schema.dtd
  • )
  • REQUIRED

54
Schema.dtd (contd)
  • directed)
  • REQUIRED

55
The DTD Language
  • Example Sales Order Document
  • An order document is comprised of several sales
    orders. Each individual order has a number and it
    contains the customer information, the date when
    the order was received, and the items ordered.
    Each customer has a number, a name, street, city,
    state, and ZIP code. Each item has an item
    number, parts information and a quantity. The
    parts information contains a number, a
    description of the product and its unit price.
  • The numbers should be treated as attributes.

56
The DTD Language
  • Example Sales Order Document DTD




)










57
The DTD Language
  • Example Sales Order XML Document

  • ABC Industries
    123 Main St.
    Chicago
    IL
    60609
    10222000
    Turkey wrench
    9.95
    10
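
The sales order described above, with the numbers held in attributes, could be processed along these lines. The markup of the original example did not survive, so every tag name, attribute name, and number value below (`SalesOrder`, `SONumber="12345"`, and so on) is an assumption; only the customer, date, part, price, and quantity values come from the slide.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# All tag names, attribute names, and number values are assumed.
ORDERS = """
<Orders>
  <SalesOrder SONumber="12345">
    <Customer CustNumber="543">
      <CustName>ABC Industries</CustName>
      <Street>123 Main St.</Street>
      <City>Chicago</City>
      <State>IL</State>
      <PostCode>60609</PostCode>
    </Customer>
    <OrderDate>10222000</OrderDate>
    <Item ItemNumber="1">
      <Part PartNumber="123">
        <Description>Turkey wrench</Description>
        <Price>9.95</Price>
      </Part>
      <Quantity>10</Quantity>
    </Item>
  </SalesOrder>
</Orders>
""".strip()

def order_total(order):
    """Sum price times quantity over the items of one sales order."""
    total = Decimal("0")
    for item in order.findall("Item"):
        price = Decimal(item.findtext("Part/Price"))  # text, typed here
        quantity = int(item.findtext("Quantity"))
        total += price * quantity
    return total

root = ET.fromstring(ORDERS)
totals = {order.get("SONumber"): order_total(order)
          for order in root.findall("SalesOrder")}
print(totals)
```

Prices are read with `Decimal` rather than `float` so that the arithmetic on currency values stays exact.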

58
Connecting the document with its DTD
  • In line
  • ...
  • Another file
  • A URL
  • "http://www.schemaauthority.com/schema.dtd"

59
Well-formed and Valid Documents
  • Well-formed: applies to any document (with or
    without a DTD). It requires proper nesting of tags
    and unique attributes
  • Valid: specifies that the document conforms to the
    DTD. It conforms to the regular expression grammar,
    the types of attributes are correct, and the
    constraints on references are satisfied

60
DTDs vs. Schemas (or Types)
  • By database (or programming language) standards
    DTDs are rather weak specifications.
  • Only one base type -- PCDATA
  • No useful abstractions, e.g., sets
  • IDREFs are untyped. You point to something, but
    you don't know what!
  • No constraints, e.g., child is inverse of parent
  • No methods
  • Tag definitions are global
  • Some of the XML extensions impose something like
    a schema or type on an XML document. We'll see
    these later

61
Lots of possibilities for schemas
  • XML Schema (under W3C's spotlight)
  • XDR (Microsoft's BizTalk)
  • SOX (Schema for Object-Oriented XML)
  • Schematron
  • DSD (AT&T Labs and BRICS)
  • and more.

62
Some tools
  • XML Authority http://www.extensibility.com/tibco/solutions/xml_authority/index.htm
  • XML Spy http://www.xmlspy.com/download.html

63
Summary
  • XML is a new data format. Its main virtues are
    widespread acceptance and the (important) ability
    to handle semistructured data (data without
    schema)
  • DTDs provide some useful syntactic constraints on
    documents. As schemas they are weak.
  • How to store large XML documents?
  • How to query them?
  • How to map between XML and other representations?