Title: XML Data and Technologies
1XML Data and Technologies
- Chapter 30.2.2, 30.3.2 - 30.3.5, 30.4, 30.5 (but
30.5.1, 30.5.2, 30.5.4 30.5.6)
2XML Document Modeling
- XML vocabularies
- Are designed for a specific type of content
- Conform to general XML standards
- All documents from the same application must also conform to a consistent set of rules
- Modeling tools
- XML Document Type Definitions (DTDs)
- XML schemas
3Document Type Definitions
- DTD forms a vocabulary
- Defines a set of allowable elements
- Defines a set of allowable attributes
- DTD forms the grammar
- A content model is a pattern that determines how elements and attributes may appear
- DTD facilitates content management
4Declaring a DTD
- Internal DTD is placed within the same document
- <!DOCTYPE root_element [ declarations ]>
- External DTD: all declarations are placed in a separate file with the extension .dtd
5Declaring an External DTD
- Private external DTD is stored locally on the server
- <!DOCTYPE root_element SYSTEM "URL">
- Public external DTD
- <!DOCTYPE root_element PUBLIC "FPI" "URL">
- Where FPI = formal public identifier
6Declaring Document Elements
- <!ELEMENT element content-model>
- Content-model specifies the type of element content
- Any: no restriction on the element's content
- Empty: the element cannot store any content
- Character data: a text string
- Elements: contains child (nested) elements
- Mixed: contains both character data and child elements
7Element Definition ANY Content
- <!ELEMENT element_name ANY>
- Definition
- <!ELEMENT card ANY>
- Element appearance in XML document
- <card> <name> Toon Mermaid </name> <kind> Monster </kind></card>
- Or
- <card> Toon Mermaid </card>
8Element Definition EMPTY Content
- <!ELEMENT element_name EMPTY>
- Definition
- <!ELEMENT attack EMPTY>
- Element appearance in XML document
- <attack></attack>
- Or
- <attack/>
9Element Definition Character Data Only Content
- <!ELEMENT element_name (#PCDATA)>
- Where PCDATA = parsed character data
- Definition
- <!ELEMENT rarity (#PCDATA)>
- Element appearance in XML document
- <rarity> Super rare foil</rarity>
- NOT VALID appearance
- <rarity> <class>Super rare</class> <type> foil</type></rarity>
10Element Definition Element Content
- <!ELEMENT element_name (child_elements)>
- Definition
- <!ELEMENT rarity (class)>
- Element appearance in XML document
- <rarity><class> Super rare foil</class></rarity>
- NOT VALID appearance
- <rarity> <class>Super rare</class> <type> foil</type></rarity>
11List of Child Elements
- Sequence is a list of elements with a defined order
- <!ELEMENT element_name (child1, ..., childN)>
- Choice lists possible elements
- <!ELEMENT element_name (child1 | child2)>
- Only one child element is allowed
- Choice and sequence can be combined
12Occurrence of Elements
- Modifying symbols
- Allow zero or one: element?
- Allow one or more: element+
- Allow zero or more: element*
- Modifying symbols can be applied to a sequence or choice
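The sequence, choice, and occurrence rules behave much like a regular expression over the names of an element's children. The sketch below illustrates that idea only; it assumes a hypothetical content model `(name, kind?, effect*)` for a `card` element, and the helper `matches_model` is made up for this example. A validating parser does considerably more.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical DTD declaration: <!ELEMENT card (name, kind?, effect*)>
# Rewritten as a regex over the child tag names, each followed by a space.
CARD_MODEL = re.compile(r"(name )(kind )?(effect )*")

def matches_model(xml_text: str) -> bool:
    """Return True if the root's children follow the hypothetical model."""
    root = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in root)
    return CARD_MODEL.fullmatch(child_names) is not None

# One required name, optional kind, any number of effects.
print(matches_model("<card><name>Toon Mermaid</name></card>"))
print(matches_model("<card><name>a</name><kind>b</kind><effect/><effect/></card>"))
# Wrong order: the sequence requires name before kind.
print(matches_model("<card><kind>b</kind><name>a</name></card>"))
```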
13Working with Mixed Content
- <!ELEMENT element_name (#PCDATA | Child1 | Child2)*>
- Definition
- <!ELEMENT rarity (#PCDATA | type)*>
- Element appearance in XML document
- <rarity> Super rare foil</rarity>
- Or
- <rarity> <type> Super rare foil</type></rarity>
- Limits control over the document structure: the order and number of child elements cannot be constrained
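To see what mixed content looks like to a program, the following sketch parses a hypothetical `rarity` element that interleaves text with a `type` child. It uses Python's standard `xml.etree.ElementTree`, which exposes the text before a child as `.text` and the text after it as `.tail`; the sample string is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed-content element: character data around a child element.
rarity = ET.fromstring("<rarity>Super rare <type>foil</type> card</rarity>")

leading_text = rarity.text    # character data before the first child
child = rarity[0]             # the nested <type> element
trailing_text = child.tail    # character data following </type>

print(repr(leading_text), child.tag, repr(child.text), repr(trailing_text))
```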
14Defining Element Attributes
- Attribute-list declaration
- List of the names of attributes associated with a specific element
- Attribute data types
- Indicates whether the attribute is required or optional
- Default value for the attribute
15Attribute Declaration Syntax
- <!ATTLIST element attribute1 type1 default1
attribute2 type2 default2>
- Or
- <!ATTLIST element attribute1 type1 default1>
- <!ATTLIST element attribute2 type2 default2>
- <!ATTLIST element attribute3 type3 default3>
16Attribute Types
- String type: CDATA
- attribute CDATA
- Enumerated types
- attribute (value1 | value2)
- <!ATTLIST card kind (Magic | Trap)>
- Tokenized type specifies some rules for the format and content
- attribute token
17Attribute Tokens
18Entities
- An entity: a physical storage unit
- Entity reference: the method used to refer to the storage unit
- Entities are introduced
- To refer to frequently repeated text
- To include the content of external files
- Entity reference in XML documents
- &entity_name;
19Defining Entities
- Internal entity
- <!ENTITY entity_name "value">
- Example
- Definition
- <!ENTITY effect1 "Destroys one opponent's monster">
- In XML document: <effect>&effect1;</effect>
- External entity
- <!ENTITY entity_name SYSTEM "URL">
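A short sketch of an internal entity being expanded at parse time. It uses Python's standard `xml.etree.ElementTree`, whose underlying parser expands entities declared in the internal DTD subset; the wrapping `card` document is an assumption built around the `effect1` example.

```python
import xml.etree.ElementTree as ET

# The internal DTD subset declares the entity; the body only references it.
DOC = """<?xml version="1.0"?>
<!DOCTYPE card [
  <!ENTITY effect1 "Destroys one opponent's monster">
]>
<card><effect>&effect1;</effect></card>"""

root = ET.fromstring(DOC)
effect_text = root.find("effect").text  # the reference is already replaced
print(effect_text)
```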
20Attribute Defaults
21Merging XML Documents
- An XML document can be created by combining several XML documents
- Name collision: elements from different documents may share the same name
- Declaring a namespace in the document
- Assigning a prefix to the namespace
- Applying the prefix to corresponding elements
22Namespaces
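- Namespace: a defined collection of element and attribute names
- Declaring a namespace
- <?xml:namespace ns="URI" prefix="pr_name"?>
- The W3C Namespaces in XML recommendation declares a namespace with an attribute instead: xmlns:pr_name="URI"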
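A minimal sketch of how prefixes resolve a name collision, using the standard `xmlns:prefix="URI"` attribute form and Python's `xml.etree.ElementTree`. The combined `deck` document and the `http://deck/player/ns` URI are hypothetical; both vocabularies are assumed to define a `name` element.

```python
import xml.etree.ElementTree as ET

# Hypothetical combined document: two vocabularies, each with a <name> element.
DOC = """<deck xmlns:mc="http://deck/monster/ns" xmlns:pl="http://deck/player/ns">
  <mc:name>Toon Mermaid</mc:name>
  <pl:name>Alice</pl:name>
</deck>"""

# Prefix-to-URI map used only for lookups; the prefixes need not match the document's.
NS = {"mc": "http://deck/monster/ns", "pl": "http://deck/player/ns"}

root = ET.fromstring(DOC)
monster_name = root.find("mc:name", NS).text
player_name = root.find("pl:name", NS).text
qualified_tag = root.find("mc:name", NS).tag  # the parser stores {URI}local-name

print(monster_name, player_name, qualified_tag)
```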
23Combine a Few XML Documents
- Although namespaces can be used to distinguish elements from different XML documents within a given document, a new DTD must be created to check the validity of the combined document
- A DTD cannot be associated with a namespace
24DTDs Limitations
- Written in a different (non-XML) syntax: Extended Backus-Naur Form
- Don't work well with namespaces
- Limited data typing
- Limited control over mixed content
25XML Schema
- A definition of a specific XML structure
- It is an XML document
- Defines
- Each element type of the structure
- Each data type associated with the element type
- Schema dialects
26Creating XML Schema
- W3C Schema Working Group
- File written in the XML Schema dialect with the extension .xsd
- Single root element: schema
- <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
- Element declarations
- </xsd:schema>
27Schema Element Types
- Simple type: the element contains no attributes or child elements
- Complex type: the element contains attributes and/or child elements
- A complex type is defined by a compositor and attribute declarations
28Compositors
- Sequence compositor forces elements in the XML document to be entered in the same order as in the schema
- Choice compositor specifies that only one of the elements in the list may be used in the XML document
- All compositor allows the listed elements to appear in any order in the XML document
- Compositors can be nested
29Element Declaration. Simple type
- <xsd:element name="element_name" type="xsd:data_type"/>
- Where data_type can be
- string
- decimal
- integer, positiveInteger, and others
- boolean
- date
- ...
30Element Declaration. Complex type
- <xsd:element name="element_name">
- <xsd:complexType>
- <xsd:compositor>
- element declarations
- </xsd:compositor>
- attribute declarations
- </xsd:complexType>
- </xsd:element>
31Element Cardinality
- The number of possible occurrences
- The minimum number
- minOccurs
- The maximum number
- maxOccurs
- Default value is 1 for both attributes
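The effect of the two bounds can be sketched as a simple count check. This is not schema validation; `occurs_ok` is a hypothetical helper written for this example, with `max_occurs=None` standing in for maxOccurs="unbounded" and both bounds defaulting to 1 as stated above.

```python
import xml.etree.ElementTree as ET

def occurs_ok(parent, child_name, min_occurs=1, max_occurs=1):
    """Check a child count against minOccurs/maxOccurs-style bounds.

    max_occurs=None plays the role of maxOccurs="unbounded".
    """
    count = len(parent.findall(child_name))
    if count < min_occurs:
        return False
    return max_occurs is None or count <= max_occurs

card = ET.fromstring("<card><name>Toon Mermaid</name><effect/><effect/></card>")

print(occurs_ok(card, "name"))             # default bounds: exactly one occurrence
print(occurs_ok(card, "effect"))           # two effects exceed the default maximum
print(occurs_ok(card, "effect", 0, None))  # zero or more
print(occurs_ok(card, "kind", 0, 1))       # optional element that is absent
```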
32Declaring Attributes
- An attribute must be declared along with the element it belongs to
- <xsd:attribute name="attr_name" type="xsd:data_type" use="Is_required" default="default_value" fixed="fixed_value"/>
- Is_required can be required, optional, or prohibited
- default and fixed cannot be used together on the same attribute
33Elements with Simple Content and Attributes
- <xsd:element name="element_name">
- <xsd:complexType>
- <xsd:simpleContent>
- <xsd:extension base="data_type">
- attribute declarations
- </xsd:extension>
- </xsd:simpleContent>
- </xsd:complexType>
- </xsd:element>
34Using Schemas in a Combined Document
- To attach the schemas to different parts of the document
- Add the XML Schema-instance namespace to the document's root element
- Assign namespaces to the different parts of the document
- Add a schemaLocation attribute to the parent element of each part
35Example
- <?xml version="1.0"?>
- <mc:set xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:mc="http://deck/monster/ns"
xsi:schemaLocation="URI schema">
- <card>
- ...
- </mc:set>
36Structuring a Schema
- Russian Doll design: a set of nested declarations
- Flat Catalog design: all element declarations are made globally
- References
- <xsd:element ref="element_name"/>
37Displaying Contents of XML Documents
- Cascading Style Sheets have limitations
- Can't change the format of the content (like date format)
- Can't add additional text
- Hard to work with images and hyperlinks
- Are applied to the elements, but not attributes
- Extensible Stylesheet Language
- XSL Formatting Objects (page layout and design)
- XSL Transformations (transforms XML content into another presentation format)
- XPath (locates information and performs operations)
38XSLT Style Sheets
- Convert a source document (XML document) into a result document
- Transformation is performed by an XSLT processor
- Server-side transformation
- Client-side transformation
- Browser support for XSLT
- Internet Explorer 6.0 fully supports the W3C XSLT specifications
39XSLT Transformation
- XSLT can create a result document as an HTML, XML, or text file
- <xsl:output method="html" version="4.0"/>
- Some XSLT processors generate result documents according to the specification
40Creating XSLT
- <?xml version="1.0" ?>
- <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
- Content
- </xsl:stylesheet>
- An XSLT file has the extension .xsl
41XSLT Content
- Template: a set of elements that defines how a part of the source document should be transformed in the result document
- Template
- <xsl:template match="node">
- XSLT and literal result elements
- </xsl:template>
- Where node is either /, a name, or an XPath expression
42Template
- XSLT elements: commands to the XSLT processor
- Literal result element: text sent to the result document, but not processed by the XSLT processor
- Example: HTML tags
43Referencing Parts of XML Documents
- XPath is the language for referencing the content of an XML document
- An XML document is viewed as a node tree
- Similar to Unix or DOS specifications for file paths
- Reaching elements
- /child1/child2
- Reaching an attribute of child2
- /child1/child2/@attribute
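A small sketch of path-style navigation, using the limited XPath subset in Python's standard `xml.etree.ElementTree`. That subset takes paths relative to an element and does not select attributes with `@name` as a final step, so the attribute is read with `.get()` instead. The `deck` document is a hypothetical example.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document.
DOC = """<deck>
  <card kind="Monster"><name>Toon Mermaid</name></card>
  <card kind="Trap"><name>Sample Trap</name></card>
</deck>"""

root = ET.fromstring(DOC)

# Reaching elements: the equivalent of /deck/card/name, relative to the root.
names = [n.text for n in root.findall("./card/name")]

# Reaching an attribute: select the element, then read the attribute value.
kinds = [c.get("kind") for c in root.findall("./card")]

# A predicate on an attribute value narrows the selection.
trap_name = root.find("./card[@kind='Trap']/name").text

print(names, kinds, trap_name)
```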
44Working with XSLT
- Inserting the value of a node expression into the result document
- <xsl:value-of select="XPath_expression"/>
- Applying XSLT to an XML document
- <?xml-stylesheet type="text/xsl" href="URL" ?>
45Querying XML Data
- XML Query Working Group
- Basic Documents
- XML Query Requirements
- XML Query Data Model
- XML Query Algebra
- XQuery
46Query Requirements
- Language must be declarative
- Independent of any protocol
- Data model must
- Represent XML 1.0 data types of the Schema specification
- Support references within and outside of the document
- The language must support specific operations
47XML Query Data Model
- A node: the basic construct
- Node types
- Document
- Element
- Value
- Attribute
- Namespace
- Processing instruction (PI)
- Comment
48Node Document
- Root node is connected to all the nodes that are reachable directly or indirectly from it
- Connected nodes form a tree
- Every node belongs to exactly one tree
- Every tree has exactly one root node
49XML Query Algebra
- Projection
- Selection
- Iteration
- Join
- Sorting
- Aggregation
50XQuery
- Is applied to XML
- Returns a sequence of XML fragments or atomic values
- XQuery relies on XPath and XML Schema data types
- XQuery is not an XML language
51XQuery Expressions
- Path expressions
- Element constructors
- FLWOR ("flower") expressions
- List expressions
- Conditional expressions
- Quantified expressions
- Datatype expressions
52FLWOR Expression
- For-Let-Where-Order-Return
- Similar to SELECT-FROM-WHERE-GROUP-BY from SQL
- For: binds values to one or more variables using a path expression; used when iteration is required
- Generates a list of bindings per variable
53FLWOR Expression (cont)
- Let: binds values to one or more variables without iteration
- Single binding per variable
- Where: specifies a qualification condition
- Return: constructs the output of the expression
- Node
- Set of nodes
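The clauses map loosely onto a loop with a filter. The block below is a rough Python analogue, not XQuery itself, of a hypothetical query that returns the names of all Monster cards in sorted order; the `deck` document and its contents are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document.
DOC = """<deck>
  <card kind="Monster"><name>Toon Mermaid</name></card>
  <card kind="Trap"><name>Sample Trap</name></card>
  <card kind="Monster"><name>Blue Dragon</name></card>
</deck>"""

root = ET.fromstring(DOC)
result = []
for card in root.findall("./card"):      # for: iterate, one binding per card
    name = card.find("name").text        # let: a single binding, no iteration
    if card.get("kind") == "Monster":    # where: qualification condition
        result.append(name)
result.sort()                            # order: sort the qualifying bindings

print(result)                            # return: the constructed output
```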
54XML and Databases
- Databases store data for machine processing
- Data-centric model
- Data is stored in a database
- Data is transferred as XML documents
- Document-centric model
- Documents are designed for human consumption
55XML Storing to and Retrieving from a Database
- Documents with simple recordsets
- Relational mapping
- Documents with hierarchical recordsets
- Object-relational mapping
- Object oriented mapping
- Schema-independent representation
- Relations are used to describe the structure of an XML document
56Converting a Relation into XML Document
Root/Relation
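One common mapping, assumed here since the slide's diagram did not survive, makes the relation the root element, each tuple a child element, and each column a nested element. A minimal sketch with Python's standard `sqlite3` and `xml.etree.ElementTree`; the `card` table, the `row` element name, and the helper `relation_to_xml` are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def relation_to_xml(conn, table):
    """Serialize a table as <table><row><column>value</column>...</row></table>."""
    # The table name is interpolated directly, so it must come from trusted code.
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [d[0] for d in cur.description]
    root = ET.Element(table)                      # root element = relation
    for row in cur:
        row_el = ET.SubElement(root, "row")       # one element per tuple
        for col, value in zip(columns, row):
            ET.SubElement(row_el, col).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE card (name TEXT, kind TEXT)")
conn.execute("INSERT INTO card VALUES ('Toon Mermaid', 'Monster')")

xml_text = relation_to_xml(conn, "card")
print(xml_text)
```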
57SQL and XML
- SQL/XML Standard
- Oracle XML
- XML SQL Utility
- Directly serves and stores XML from the database
- Takes SQL queries and generates XML documents from results
- Flexible XML output can be produced as text or as DOM trees
- Generates DTDs and XML schemas from database schemas