Title: XML Overview
1XML Overview
CSc 335 Object Oriented Programming Design
2Contents
- Introduction
- Purpose/Objectives
- Background
- Overview
- XML Example
- Uses of XML
- XML Basics
- Elements
- Attributes
- Comments
- Well-formed XML
- Validation
- DTDs
- Schemas
- The DOCTYPE Element
- Parsing
- XML Parsers
- Parsing APIs
- DOM Parsing
3Purpose/Objectives
- Any Java developer will use XML
- Becoming more important
- Explain motivation for XML
- Provide an overview and basic understanding
- Provide a starting point for learning additional XML related topics
- Teach what you need to know for programming assignment
- Can only scratch the surface
4Background
- HTML
- Application of Standard Generalized Markup Language (SGML)
- Document language that combines data with display formatting information
- Used for document display/publishing
- Combination of data and formatting info limits usefulness
- Browser specific interpretations of formatting tags
- Cascading Style Sheets
- Allowed the separation of formatting information from HTML documents
- Allowed web pages and web sites to be more flexible and easier to maintain
5Background (cont.)
- Creation of CSS allowed complete separation of data and formatting
- XML is the result
- Didn't happen by accident (two-part plan of W3C to fix HTML and create a more usable SGML)
- XML is a subset of SGML (not an application of it)
- Much more flexible (not limited to web-display)
- Allows anyone to easily define their own application of XML
- HTML is like an application, XML is like a language
6Overview What is XML?
- A markup language for applying structure to data
- A subset of SGML
- Much more flexible than HTML
- Describes data without specifying a particular meaning
- Not limited to predefined tags
- Human readable
- Machine readable
7Example Text
- Taken (and modified) from XML: A Primer, 3rd Edition, by Simon St.Laurent
12729 Maple 1x1x2 4.25
12829 Oak 1x1x2 5.75
13029 Pine 1x1x2 2.00
8Example HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <title>Product List</title>
  </head>
  <body>
    The following products are available
    <ul>
      <li>12729 Maple 1x1x2 <b>4.25</b></li>
      <li>12829 Oak 1x1x2 <b>5.75</b></li>
      <li>13029 Pine 1x1x2 <b>2.00</b></li>
    </ul>
  </body>
</html>
10Example XML
<product-list>
  <product>
    <id>12729</id>
    <description>Maple 1x1x2</description>
    <price>4.25</price>
  </product>
  <product>
    <id>12829</id>
    <description>Oak 1x1x2</description>
    <price>5.75</price>
  </product>
  <product>
    <id>13029</id>
    <description>Pine 1x1x2</description>
    <price>2.00</price>
  </product>
</product-list>
11Example XML
<product-list>
  <product>
    <id>12729</id>
    <description>
      <item>Cut Lumber</item>
      <species>Maple</species>
      <height><inches>1</inches></height>
      <width><inches>1</inches></width>
      <length><feet>2</feet></length>
    </description>
    <price>4.25</price>
  </product>
  <product>
    ...
  </product>
</product-list>
12Example XML with Attributes
<product-list>
  <product>
    <id>12729</id>
    <description>
      <item>Cut Lumber</item>
      <species>Maple</species>
      <height unit="inches">1</height>
      <width unit="inches">1</width>
      <length unit="feet">2</length>
    </description>
    <price>4.25</price>
  </product>
  <product>
    ...
  </product>
</product-list>
13Uses of XML
- Data Exchange
- Machine to Machine
- Human to Machine
- Machine to Human
- Human to Human
- Data Storage
- Multiple Use of Same Document/Data
14Specific Uses of XML
- EDI Replacement
- Web-Services (i.e. RPC format over protocols like HTTP)
- Data Display
- Web (XHTML, XML + XSLT)
- Multiple Devices, same data (formatting not embedded)
- Data Storage Format (e.g. XML databases)
- Configuration Files
- Many other uses
15XML Details Elements
- XML Tags
- Main Building Block of XML Documents
- Opening and Closing Tag Required
- <my-tag>Some Data</my-tag>
- Can Combine Opening and Closing Tags (called empty tags)
- <br/>
16XML Details Attributes
- Used to Specify Additional Details about Element Data
- Can use in place of or in addition to element data
- <my-tag attribute="value">Text</my-tag>
- <my-tag data="Text"/>
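The two attribute forms above can be read back with the DOM parser that ships with the JDK. The following is a minimal sketch rather than a definitive implementation; the class name `ElementAttributeDemo` and its `parse` helper are hypothetical names introduced only for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original slides.
final class ElementAttributeDemo {

    // Parses an XML string and returns its root element.
    // Checked exceptions are wrapped so the calling code stays short.
    static Element parse(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Attribute used in addition to element data.
        Element withText = parse("<my-tag attribute=\"value\">Text</my-tag>");
        System.out.println(withText.getTagName());                // my-tag
        System.out.println(withText.getAttribute("attribute"));   // value
        System.out.println(withText.getTextContent());            // Text

        // Attribute used in place of element data (empty tag).
        Element empty = parse("<my-tag data=\"Text\"/>");
        System.out.println(empty.getAttribute("data"));           // Text
        System.out.println(empty.hasChildNodes());                // false
    }
}
```

Whether a value belongs in an attribute or in element content is a design choice; the parser exposes both, as the two calls to `getAttribute` and `getTextContent` suggest.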
17XML Details - Comments
<!-- This is a comment. Comments are ignored by XML parsers -->
<product-list>
  <product>
    <id>12729</id>
    <description>
      <item>Cut Lumber</item>
      <species>Maple</species>
      <!-- This is another comment -->
      <height unit="inches">1</height>
      <width unit="inches">1</width>
      <length unit="feet">2</length>
    </description>
    <price>4.25</price>
  </product>
  <product>
    ...
  </product>
</product-list>
18Well-Formed XML Documents
- Every start-tag must have a matching end-tag
- Tags cannot overlap (strict hierarchical tree structure, one parent per tag)
- Each document has exactly one root element
- Other rules such as allowed characters in tag names
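One way to see these rules in action is to hand small documents to a parser and observe which ones it rejects. The sketch below assumes the JDK's bundled DOM parser; `WellFormedCheck` and `isWellFormed` are hypothetical names chosen for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; not part of the original slides.
final class WellFormedCheck {

    // Returns true if the parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // A well-formedness violation is reported as a fatal parse error.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a><b>text</b></a>")); // true
        System.out.println(isWellFormed("<a><b>text</a>"));     // false: no matching end-tag for b
        System.out.println(isWellFormed("<a><b></a></b>"));     // false: overlapping tags
        System.out.println(isWellFormed("<a/><b/>"));           // false: two root elements
        System.out.println(isWellFormed("<1tag/>"));            // false: illegal tag name
    }
}
```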
19Valid XML Documents DTDs
- Document Type Definition (DTD)
- Specifies a datatype for the document
- Inherited from SGML
- Allows XML author to specify validation rules for a document
- Allows tools or programs to verify conformance to expected structure, format, naming, etc.
- Simple
- Widespread Use
- Declared in the document with a <!DOCTYPE ...> declaration
- Lacks support for data typing of element or attribute content
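As an illustration of what DTD validation does and does not check, the sketch below validates product documents against a small DTD embedded in the DOCTYPE declaration. The DTD text, the class name `DtdValidationDemo`, and the `validate` helper are all assumptions made for this example; they do not come from the slides.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; the DTD below is an assumed example.
final class DtdValidationDemo {

    // Internal DTD subset: structure rules only, no data types.
    static final String DTD =
            "<!DOCTYPE product-list [\n"
          + "  <!ELEMENT product-list (product*)>\n"
          + "  <!ELEMENT product (id, description, price)>\n"
          + "  <!ELEMENT id (#PCDATA)>\n"
          + "  <!ELEMENT description (#PCDATA)>\n"
          + "  <!ELEMENT price (#PCDATA)>\n"
          + "]>\n";

    // Parses DTD + body with validation on and returns the validation errors.
    static List<String> validate(String body) {
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException ex) { }
                @Override public void error(SAXParseException ex) {
                    errors.add(ex.getMessage()); // validity errors arrive here
                }
                @Override public void fatalError(SAXParseException ex) throws SAXException {
                    throw ex; // not well-formed
                }
            });
            builder.parse(new InputSource(new StringReader(DTD + body)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String ok = "<product-list><product><id>12729</id>"
                  + "<description>Maple 1x1x2</description><price>4.25</price>"
                  + "</product></product-list>";
        String missingPrice = "<product-list><product><id>12829</id>"
                  + "<description>Oak 1x1x2</description></product></product-list>";
        String wordPrice = ok.replace("4.25", "cheap");

        System.out.println(validate(ok).isEmpty());           // true
        System.out.println(validate(missingPrice).isEmpty()); // false: structure violated
        System.out.println(validate(wordPrice).isEmpty());    // true: DTD cannot type-check content
    }
}
```

The last case shows the data typing limitation: a price of "cheap" passes, because the DTD can only say that `price` contains character data.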
20Valid XML Documents XML Schemas
- XML Schema
- Similar to DTDs
- More Complex
- More Powerful
- Newer than DTDs
- Not as Widely Used
- Heavily used within certain domains (such as web-services)
- Quickly Increasing in Popularity
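To illustrate the extra power of schemas, the sketch below uses the JDK's `javax.xml.validation` API with a one-element schema that types `price` as `xs:decimal`. The schema text, the class name `SchemaValidationDemo`, and the `isValid` helper are assumptions made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical demo class; the schema below is an assumed example.
final class SchemaValidationDemo {

    // A tiny schema: the root element is a price holding a decimal number.
    static final String XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
          + "  <xs:element name=\"price\" type=\"xs:decimal\"/>"
          + "</xs:schema>";

    // Returns true if the document validates against XSD.
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<price>4.25</price>"));  // true
        System.out.println(isValid("<price>cheap</price>")); // false: not a decimal
    }
}
```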
21Example Document Prolog
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE config SYSTEM "rules-config.dtd">
<config>
</config>
22DOCTYPE Element System ID
- <!DOCTYPE config SYSTEM "rules-config.dtd">
- DTD's root element is config
- SYSTEM denotes a private DTD
- Can be found locally using name rules-config.dtd
- Location of DTD defaults to document relative file location
- Can be an absolute address or URI
- Can write code to change that (e.g. make it CLASSPATH relative)
23DOCTYPE Element Public ID
<!DOCTYPE struts-config PUBLIC
  "-//Apache Software Foundation//DTD Struts Configuration 1.1//EN"
  "http://jakarta.apache.org/struts/dtds/struts-config_1_1.dtd">
- First string following PUBLIC is public identifier (human and machine understandable description of DTD)
- Last string is URI specifying DTD location
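The parts of a DOCTYPE declaration are visible to a program through the DOM `DocumentType` node. The sketch below reads them from the Struts declaration shown above; the class name `DoctypeDemo` and the `readDoctype` helper are hypothetical, and the entity resolver that substitutes an empty DTD is an assumption added so that the example does not fetch the DTD over the network.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original slides.
final class DoctypeDemo {

    static final String XML =
            "<!DOCTYPE struts-config PUBLIC\n"
          + "  \"-//Apache Software Foundation//DTD Struts Configuration 1.1//EN\"\n"
          + "  \"http://jakarta.apache.org/struts/dtds/struts-config_1_1.dtd\">\n"
          + "<struts-config/>";

    // Parses without validation and returns the document's DOCTYPE node.
    static DocumentType readDoctype(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Assumption for this demo: answer every external entity request
            // with an empty DTD so no network access is attempted.
            builder.setEntityResolver(
                    (publicId, systemId) -> new InputSource(new StringReader("")));
            return builder.parse(new InputSource(new StringReader(xml))).getDoctype();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DocumentType doctype = readDoctype(XML);
        System.out.println("root element: " + doctype.getName());
        System.out.println("public id:    " + doctype.getPublicId());
        System.out.println("system id:    " + doctype.getSystemId());
    }
}
```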
24XML Parsers
- Prewritten code for parsing text out of XML documents (elements and attributes)
- Handle validation against DTD or Schema
- Drop into XML enabled applications as Jar files
- Provide API for accessing document content
- Eliminate the need to write low-level text parsing code
- Apache Xerces: Popular Open Source parser
25XML Parsing
- Simple API for XML (SAX)
- Event-based API
- Parser invokes callback (listener) methods for start-tags, end-tags, text data, etc.
- Memory efficient (useful for large XML documents)
- Document Object Model (DOM)
- Tree-based API
- Parser parses (and optionally validates) entire document
- Method provided to get the root element
- All nodes of tree accessible from root element
- Easier to use than SAX
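For contrast with the DOM material that follows, here is a minimal sketch of the event-based SAX style: the parser calls `startElement` once per start-tag and no tree is ever built. The class name `SaxElementCounter` and its `count` helper are hypothetical names used only for this illustration.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the original slides.
final class SaxElementCounter extends DefaultHandler {

    // Element name -> number of start-tags seen (sorted by name).
    private final Map<String, Integer> counts = new TreeMap<>();

    // Callback invoked by the parser for every start-tag.
    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attributes) {
        counts.merge(qName, 1, Integer::sum);
    }

    // Streams through the document and returns the per-element counts.
    static Map<String, Integer> count(String xml) {
        try {
            SaxElementCounter handler = new SaxElementCounter();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.counts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<product-list>"
                   + "<product><id>12729</id></product>"
                   + "<product><id>12829</id></product>"
                   + "</product-list>";
        System.out.println(count(xml)); // {id=2, product=2, product-list=1}
    }
}
```

Only the running counts are held in memory, which is why this style suits large documents.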
26Java-Based DOM Parsing
- javax.xml and javax.xml.parsers packages
- Contain Java wrapper classes for commercial and open-source XML parsers
- Main classes: DocumentBuilderFactory and DocumentBuilder
- org.w3c.dom package
- Part of J2SE since 1.4
- Contains classes and interfaces used to implement the DOM API
- Main interfaces: Document, Element
- org.xml.sax package
- Some types used even when using DOM
- Main types: InputSource (a class), EntityResolver, ErrorHandler (interfaces)
27DOM Example Parsing
protected final Element getRootElement()
        throws ParserConfigurationException, SAXException, IOException {
    if (this.rootElement == null) {
        // Create an InputSource for the configuration file with the
        // SystemId set to match the file name so the entity resolver
        // callback method can find the file.
        InputSource inputSource = resolveEntity(null, configFileName);

        // Get the document builder factory
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);

        // Get the document builder and parse the document
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(this);
        builder.setErrorHandler(this);
        Document doc = builder.parse(inputSource);

        // Get the root element from the DOM tree
        this.rootElement = doc.getDocumentElement();
    }
    return this.rootElement;
}
28DOM Example Parsing
- From root element
- Use methods of org.w3c.dom.Element (including methods inherited from Node) to drill-down into document.
- Useful methods
- NodeList getElementsByTagName(String)
- NodeList getChildNodes()
- String getAttribute(String)
- Several others
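The methods listed above can be combined to drill down from the root element. The sketch below applies them to the lumber example from the earlier slides; the class name `DomNavigationDemo` and the helpers `root` and `text` are hypothetical names introduced for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original slides.
final class DomNavigationDemo {

    static final String XML =
            "<product-list><product><id>12729</id><description>"
          + "<item>Cut Lumber</item><species>Maple</species>"
          + "<height unit=\"inches\">1</height><width unit=\"inches\">1</width>"
          + "<length unit=\"feet\">2</length>"
          + "</description><price>4.25</price></product></product-list>";

    // Parses XML and returns the root element of the DOM tree.
    static Element root() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Text of the first descendant of parent with the given tag name.
    static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        Element root = root();
        NodeList products = root.getElementsByTagName("product");
        for (int i = 0; i < products.getLength(); i++) {
            Element product = (Element) products.item(i);
            Element length = (Element) product.getElementsByTagName("length").item(0);
            System.out.println(text(product, "id") + " "
                    + text(product, "species") + " "
                    + length.getTextContent() + " " + length.getAttribute("unit"));
            // prints: 12729 Maple 2 feet
        }
        // Direct children only: the single product element.
        System.out.println(root.getChildNodes().getLength()); // 1
    }
}
```

Note that `getElementsByTagName` searches all descendants, while `getChildNodes` returns only direct children (including any whitespace text nodes when the document is indented).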
29DOM Example Resolving Entities
- Allows programmer to specify how external entities (DTDs, Schemas, XML Documents) are accessed.
- Can override the default behavior of resolving locations relative to the XML document.
- Implement EntityResolver interface and create a resolveEntity(String, String) method.
- Useful for reading documents and DTDs off the CLASSPATH.
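A resolver along these lines might look like the sketch below. To keep the example self-contained, the DTD is held in an in-memory string rather than loaded from the CLASSPATH; the DTD text, the class name `InMemoryDtdResolver`, and the `rootNameOf` helper are assumptions made for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; the DTD text is an assumed example.
final class InMemoryDtdResolver implements EntityResolver {

    static final String DTD_TEXT = "<!ELEMENT config (#PCDATA)>";

    // Called by the parser whenever it needs an external entity.
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (systemId != null && systemId.endsWith("rules-config.dtd")) {
            // A real application might instead open a CLASSPATH resource here,
            // e.g. with getClass().getResourceAsStream(...).
            return new InputSource(new StringReader(DTD_TEXT));
        }
        return null; // let the parser fall back to its default lookup
    }

    // Validates xml using this resolver and returns the root element name.
    static String rootNameOf(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setEntityResolver(new InMemoryDtdResolver());
            builder.setErrorHandler(new DefaultHandler() {
                @Override public void error(SAXParseException ex) throws SAXParseException {
                    throw ex; // treat validity errors as failures
                }
            });
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement().getTagName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE config SYSTEM \"rules-config.dtd\">"
                   + "<config>rules</config>";
        System.out.println(rootNameOf(xml)); // config
    }
}
```

The parser passes an expanded system identifier to the resolver, which is why the sketch matches on the file name suffix rather than on the exact string from the DOCTYPE declaration.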
30DOM Example Handling Errors
// Implementation of the ErrorHandler interface
public void warning(SAXParseException ex) {
    System.err.println(ex);
}

public void error(SAXParseException ex) throws SAXException {
    throw ex;
}

public void fatalError(SAXParseException ex) throws SAXException {
    throw ex;
}
31Topics Not Covered
- DTD Syntax
- XML Schemas
- Namespaces
- Entities
- XSL/XSLT
- XPath/XQuery
- XLink/XPointer
- Web Services
- Language Support (Covered Briefly)
- Other