Title: XML Fundamentals
1XML Fundamentals
- Jie Liu, Ph.D.
- Professor
- Department Of Computer Science
- Western Oregon University
- Monmouth, OR 97361
- Liuj_at_wou.edu
2Outline
- XML Background
- Document Type Definitions
- Style Sheets and Processing
- Other XML Related Technologies
- Summary
- Where to go for more help
3Beginnings of XML
- XML (eXtensible Markup Language) was developed
by a working group of the World Wide Web
Consortium (W3C) to bring Standard Generalized
Markup Language (SGML) to the web
- XML is a faster, slimmed-down version of SGML
designed for use on the Web
- First appeared in 1996
- SGML was 10 years old at this time
- The first official XML specification was published
in February 1998
4XML
- XML
- Primary function is to describe and contain data
- Extensible to allow for describing and defining
virtually any kind of data
- Separates the data from the presentation
- Files must be well-formed
- XML is case sensitive
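The well-formedness and case-sensitivity points can be checked with any XML parser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` is made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Start and end tags agree exactly, including case.
print(is_well_formed("<Breed>Doberman</Breed>"))

# <Breed> and </breed> are two different names in XML, so the
# element is never closed and the parser rejects the document.
print(is_well_formed("<Breed>Doberman</breed>"))
```

The first call reports a well-formed document and the second does not, because the parser treats `Breed` and `breed` as unrelated names.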
5XML Suits Many Applications
- XML is used in many applications that primarily
involve
- Complex document creation
- Uses a language similar to HTML to provide a
richer, virtually unlimited set of document elements
- Data exchange and database connectivity
- Used to define an interchange format suitable for
data transfer between databases from different
vendors on different operating systems
6XML Benefits
- Benefits of XML
- Structured data - XML handles data
- Style sheets can handle presentation
- Complements HTML
- XML data can be presented using an HTML page
- No predefined tags
- You make up the tags - or more properly, elements
- Self-describing
- A dictionary of tag definitions can be provided
for wide use
- Requires precise rules to define the structure of
a document
- Processing is easier with strict rules to follow
7A Simple Example of XML
<?xml version="1.0" standalone="yes"?>
<KennelClub location="Northeast USA">
  <Breed>Doberman</Breed>
  <Category name="working" />
  <Features>
    <Trait>long straight ears</Trait>
    <Trait>short tail</Trait>
    <Trait>short hair</Trait>
    <Trait>loyal to an individual</Trait>
  </Features>
  <TrainingInstructions>
    Make use of a strong steel chain collar.
    Keep the dog on your left, walk with even steps.
    When stopped, the dog should sit beside you.
  </TrainingInstructions>
  <FeedingInstructions>
    After 1 year of age, two cans of wet food mixed with 1 can of dry food.
    Provide a constant supply of water.
    Raise the feeding dish to the dog's shoulder height.
  </FeedingInstructions>
</KennelClub>
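To show how a program reads a document like this one, here is a small sketch that parses a shortened copy of the kennel club example with Python's standard `xml.etree.ElementTree` module. The shortened copy and the variable names are choices made for this illustration only.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the kennel club example from this slide.
KENNEL_XML = """<?xml version="1.0" standalone="yes"?>
<KennelClub location="Northeast USA">
  <Breed>Doberman</Breed>
  <Category name="working" />
  <Features>
    <Trait>long straight ears</Trait>
    <Trait>short tail</Trait>
  </Features>
</KennelClub>"""

root = ET.fromstring(KENNEL_XML)

# Attributes are read with get(); element text with findtext() or .text.
location = root.get("location")
breed = root.findtext("Breed")
category = root.find("Category").get("name")
traits = [trait.text for trait in root.iter("Trait")]

print(location, breed, category, traits)
```

Note that the element names (`Breed`, `Trait`, and so on) carry the meaning of the data, which is what "self-describing" refers to.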
8Road Map
- XML Background
- Document Type Definitions
- Style Sheets and Processing
- Other XML Related Technologies
- Summary
- Where to go for more help
9Document Type Definitions (DTD)
- Is a set of rules that defines the elements, and
their attributes, of one or more XML documents
- Not required, but recommended for complex
documents
- Advantages
- Create rules that many XML documents will follow
- Explicitly define the markup and rules for
documents
- Provide a common frame of reference that can be
shared
10Sample Document DTD
<?xml version="1.0" standalone="yes"?>
<!-- Begin bookcatalog.dtd -->
<!DOCTYPE Catalog [
  <!ENTITY bpl "Boston Public Library">
  <!ELEMENT Catalog (Book+)>
  <!ELEMENT Book (Title,Author,Publisher,PubDate)>
  <!ELEMENT Title (#PCDATA)>
  <!ATTLIST Title ISBN CDATA #REQUIRED>
  <!ELEMENT Author (#PCDATA)>
  <!ATTLIST Author Status CDATA #IMPLIED>
  <!ELEMENT Publisher (#PCDATA)>
  <!ELEMENT PubDate (#PCDATA)>
]>
11Sample Document Body
<!-- Begin Document Body -->
<Catalog>
  <Book>
    <Title ISBN="1201823764">UNIX for the Impatient</Title>
    <Author Status="active">Paul W. Abrams</Author>
    <Publisher>Addison-Wesley</Publisher>
    <PubDate>1993</PubDate>
  </Book>
  <Book>
    <Title ISBN="1565920546">The Korn Shell</Title>
    <Author Status="deceased">Bill Rosenblatt</Author>
    <Publisher>O'Reilly &amp; Associates</Publisher>
    <PubDate>1994</PubDate>
  </Book>
</Catalog>
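Python's standard library parsers are non-validating, so they will not enforce a DTD. As a rough illustration of what the sample DTD's rules mean for the document body, the sketch below checks two of them by hand: the child order of `Book` and the required `ISBN` attribute on `Title`. The function `check_catalog` and the deliberately broken second document are hypothetical and exist only for this example; a real validating parser would do this work from the DTD itself.

```python
import xml.etree.ElementTree as ET

# Child order required by the Book content model in the sample DTD.
BOOK_CHILDREN = ["Title", "Author", "Publisher", "PubDate"]

GOOD_XML = """<Catalog>
  <Book>
    <Title ISBN="1201823764">UNIX for the Impatient</Title>
    <Author Status="active">Paul W. Abrams</Author>
    <Publisher>Addison-Wesley</Publisher>
    <PubDate>1993</PubDate>
  </Book>
</Catalog>"""

# Hypothetical broken variant: no ISBN attribute and no PubDate element.
BAD_XML = """<Catalog>
  <Book>
    <Title>UNIX for the Impatient</Title>
    <Author Status="active">Paul W. Abrams</Author>
    <Publisher>Addison-Wesley</Publisher>
  </Book>
</Catalog>"""


def check_catalog(root):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if root.tag != "Catalog":
        problems.append("root element is not Catalog")
    for i, book in enumerate(root.findall("Book"), start=1):
        if [child.tag for child in book] != BOOK_CHILDREN:
            problems.append(f"Book {i}: unexpected child elements")
        title = book.find("Title")
        if title is None or "ISBN" not in title.attrib:
            problems.append(f"Book {i}: Title lacks the required ISBN attribute")
    return problems


good_problems = check_catalog(ET.fromstring(GOOD_XML))
bad_problems = check_catalog(ET.fromstring(BAD_XML))
print(good_problems)
print(bad_problems)
```

Both documents are well-formed, but only the first one satisfies the rules. That is the distinction between a well-formed document and a valid one.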
12Document Type Declaration
- An internal document type definition
- <!DOCTYPE Catalog [ ... ]>
- The DTD information is contained within the
document
- The name Catalog must identify the root element
in the XML document
- Example of an external document type definition
- <!DOCTYPE Catalog SYSTEM "catalog_file.dtd">
- Identifies Catalog as the root element
- Instructs the processor to fetch the external
document called catalog_file.dtd, which is a
relative path name to the DTD
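A parser exposes the document type declaration to programs. The sketch below, which assumes Python's standard `xml.dom.minidom` module, reads the declared name and system identifier and compares the declared name with the actual root element. `minidom` is non-validating and does not fetch `catalog_file.dtd`, so the comparison is done by hand here.

```python
from xml.dom import minidom

DOC = """<!DOCTYPE Catalog SYSTEM "catalog_file.dtd">
<Catalog></Catalog>"""

doc = minidom.parseString(DOC)

# The DOCTYPE declaration is available as the document's doctype node.
doctype = doc.doctype
root_name = doc.documentElement.tagName

print(doctype.name, doctype.systemId, root_name)

# A validating processor would require these two names to match.
names_match = doctype.name == root_name
print(names_match)
```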
13Why Have a DTD?
- Reasons to have a DTD
- They allow you to create rules for all documents
to follow
- They clarify what markup may be used and how that
markup should be sequenced
- They provide a common frame of reference that may
be shared by many users
- Disadvantages of having a DTD
- May not have value when working with a few small
documents
- Some processors are non-validating, so a DTD is
not needed
- The DTD itself is NOT in XML syntax, making
processing and understanding it like parsing
another language
- Vocabularies from different DTDs cannot easily be
mixed
14Namespace and Schema
- Namespaces help XML vocabulary designers to
break complex problems into smaller pieces and
mix multiple vocabularies as needed.
- Schemas permit vocabulary designers to create a
more precise definition of the vocabulary than is
possible with DTDs, and to do so using XML syntax.
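As an illustration of mixing vocabularies, the sketch below puts two `Title` elements from different namespaces into one document and tells them apart with Python's standard `xml.etree.ElementTree` module. Both namespace URIs, the prefixes, and the sample content are hypothetical and were made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; they only need to be unique identifiers.
BOOKS_NS = "http://example.org/books"
DOGS_NS = "http://example.org/dogs"

MIXED_XML = f"""<lib:Catalog xmlns:lib="{BOOKS_NS}" xmlns:dog="{DOGS_NS}">
  <lib:Title>The Korn Shell</lib:Title>
  <dog:Title>Working Group Champion</dog:Title>
</lib:Catalog>"""

root = ET.fromstring(MIXED_XML)

# Map prefixes to URIs for lookups; the prefixes used in queries
# do not have to match the ones used in the document.
ns = {"lib": BOOKS_NS, "dog": DOGS_NS}
book_title = root.findtext("lib:Title", namespaces=ns)
dog_title = root.findtext("dog:Title", namespaces=ns)

# Internally each name is qualified by its namespace URI.
print(root.tag)
print(book_title, "|", dog_title)
```

The two `Title` elements share a local name but never collide, because the namespace URI is part of each element's full name.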
15Road Map
- XML Background
- Document Type Definitions
- Style Sheets and Processing
- Other XML Related Technologies
- Web Services
- Summary
- Where to go for more help
16Introducing Style Sheets
- Extensible Stylesheet Language (XSL)
- not only control presentation,
- but provide much more, including
- the functionality of a scripting language
- transform XML to HTML
- transform XML into XML.
- The output DTD can be different from the source
DTD
17More about XSL
- Extensible Stylesheet Language - XSL
- Similar to CSS but much more powerful
- Is created for XML
- XSL style sheets are XML documents
- Made up of three parts
- Language for transforming XML documents - XSLT
- Language for specifying parts of XML documents -
XPath
- Vocabulary for specifying formatting properties -
XSL-FO
- XSLT and XPath are standards
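To give a feel for how XPath selects parts of a document, the sketch below runs a few path expressions against a trimmed copy of the sample catalog. It assumes Python's standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath; a full XSLT or XPath processor offers much more.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample catalog (Publisher elements omitted).
CATALOG_XML = """<Catalog>
  <Book>
    <Title ISBN="1201823764">UNIX for the Impatient</Title>
    <Author Status="active">Paul W. Abrams</Author>
    <PubDate>1993</PubDate>
  </Book>
  <Book>
    <Title ISBN="1565920546">The Korn Shell</Title>
    <Author Status="deceased">Bill Rosenblatt</Author>
    <PubDate>1994</PubDate>
  </Book>
</Catalog>"""

root = ET.fromstring(CATALOG_XML)

# Child steps: every Title that is a child of a Book under the root.
all_titles = [t.text for t in root.findall("./Book/Title")]

# Attribute predicate: authors whose Status attribute is "active".
active_authors = [a.text for a in root.findall(".//Author[@Status='active']")]

# Child-text predicate: books whose PubDate child contains "1994".
titles_1994 = [b.findtext("Title") for b in root.findall("./Book[PubDate='1994']")]

print(all_titles)
print(active_authors)
print(titles_1994)
```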
18The Document Object Model, DOM
- A programming interface
- Uses standard syntax to describe a document
- Allows for the manipulation of XML documents
- Creating XML documents
- Transforming XML documents
- Importing Data
- Accessed by many languages
- C
- JavaScript
- VBScript
- Java
- VB
- C
19The Document Object Model (cont.)
- A DOM is created each time an XML processor
parses a document
- Parsing result is stored as objects
- DOM objects, called Nodes, can be accessed and
manipulated
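The sketch below shows the kind of node access and manipulation the DOM allows, using Python's standard `xml.dom.minidom` module as one possible DOM implementation. The small input document is a fragment of the sample catalog chosen for this illustration.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<Catalog><Book><Title>The Korn Shell</Title></Book></Catalog>"
)

# Access: walk from the document to specific element and text nodes.
book = doc.getElementsByTagName("Book")[0]
title = book.getElementsByTagName("Title")[0]
title_text = title.firstChild.data

# Manipulation: add an attribute, then create and attach a new element.
title.setAttribute("ISBN", "1565920546")
pub_date = doc.createElement("PubDate")
pub_date.appendChild(doc.createTextNode("1994"))
book.appendChild(pub_date)

# Serialize the modified tree back to XML text.
result = doc.documentElement.toxml()
print(title_text)
print(result)
```

Because the whole document is held in memory as a tree of nodes, any part of it can be reached and changed in any order.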
20Simple API for XML (SAX)
- Offers events as it parses the document
- Does not retain the document
- Makes minimal demands on system resources
- Benefits
- It is simple and fast, and can parse files of any size
- It is useful when
- You want to build your own data structure
- You only want a small subset of the information
- Your system has limited resources
21Simple API for XML (SAX)
- Drawbacks
- No random access to the document
- Is difficult to implement complex searches
- The DTD is not available
- Lexical Information is not Available
- Is Read-only
- Not supported in some browsers
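The event-driven style described above can be sketched with Python's standard `xml.sax` module. The handler class `TitleCollector` is a hypothetical name for this example. It keeps only the book titles and discards everything else as the parser streams through the document, which is the "small subset of the information" case.

```python
import xml.sax

# Trimmed copy of the sample catalog, passed to the parser as bytes.
CATALOG_XML = b"""<Catalog>
  <Book>
    <Title ISBN="1201823764">UNIX for the Impatient</Title>
    <PubDate>1993</PubDate>
  </Book>
  <Book>
    <Title ISBN="1565920546">The Korn Shell</Title>
    <PubDate>1994</PubDate>
  </Book>
</Catalog>"""


class TitleCollector(xml.sax.ContentHandler):
    """Collect the text of every Title element (hypothetical handler)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "Title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so buffer it.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "Title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


handler = TitleCollector()
xml.sax.parseString(CATALOG_XML, handler)
print(handler.titles)
```

The handler never sees the document as a whole, which is why SAX uses little memory and also why random access and complex searches are hard.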
22Road Map
- XML Background
- From HTML to XML
- Document Type Definitions
- Style Sheets and Processing
- Other XML Related Technologies
- Summary
- Where to go for more help
23Other Related Technologies
- XML Linking Language (XLink) specification
provides a framework for creating links between
resources
- XML Pointer Language (XPointer) specification
- Supports addressing into the internal structures
of XML docs
- Built on top of the XML Path Language, XPath
- The XML Query group is drafting a specification
for a query language, XQuery, to query XML documents
24Other Related Technologies (cont.)
- XML Schema Structures definition proposing
facilities for
- Describing the structure of XML documents
- Constraining the contents of XML documents
- Providing a superset of the capabilities found in
DTDs
- XML Schema Datatypes definition proposing
facilities for
- Defining datatypes to be used in XML Schemas and
other XML specifications
- Providing a superset of the capabilities found in
DTDs for specifying datatypes on elements and
attributes
- The language description is, itself, an XML
document
25Summary
- SGML is the foundation for HTML and XML
- HTML is an application for display of data
- XML is an extensible language for describing data
- It is not difficult to transition HTML knowledge to
XML use
- Document Type Definitions, DTDs, describe the
elements and attributes of XML documents and
define their use.
- Schema is more powerful
- Style Sheets provide the ability to format and
display data described with XML
- The Document Object Model, DOM, is a programming
interface for accessing and manipulating elements
and attributes of XML documents
- There are many other XML related technologies;
web services are the most important one!
26A Question
- An XML document can be well-formed but not valid.
What does this mean?
27Where to Go For More Help - 1
- General XML references
- W3C's XML Page - www.w3.org/XML
- XML.com - www.xml.com
- XMLInfo - www.xmlinfo.com
- Parsers and validation references
- Robin Cover's Check or Validate XML page
- www.oasis-open.org/cover/check-xml.html
- IBM's DOMit Validation Servlet
- www.networking.ibm.com/xml/XmlValidatorForm.html
- Microsoft's XML Validation page
- http://msdn.microsoft.com/downloads/samples/internet/xml/xml_validator
28Where to Go For More Help - 2
- XML.com RUWF? Page
- www.xml.com/pub/tools/ruwf/check.html
- XML.com parse page
- www.xml.com/pub/Guide/XML_Parsers
- DTD and Style Sheet references
- Lumeria Network's Web-based DTD service
- www.dtd.com
- Cascading Style Sheets page at the W3C
- www.w3.org/Style/css
- CSS1 Specification
- www.w3.org/TR/REC-CSS1
29Where to Go For More Help - 3
- CSS2 Specification
- www.w3.org/TR/REC-CSS2/selector.html
- Web Design Group's CSS tutorials
- www.htmlhelp.com/reference/css
- WebMonkey's CSS tutorials
- www.hotwired.com/webmonkey/stylesheets
- Webreview's CSS articles
- webreview.com/wr/pub/Style_Sheets
- XSL page at the W3C
- www.w3.org/Style/XSL
- XSLT Recommendation
- www.w3.org/TR/xslt
30Where to Go For More Help - 4
- XSLInfo
- www.xslinfo.com
- Document Object Model references
- DOM 1.0 Recommendation
- www.w3.org/DOM
- Converting XML documents to HTML using the DOM
- msdn.microsoft.com/xml/XSLGuide/default.asp
- Other related technology references
- XPath Recommendation
- www.w3.org/TR/xpath
- XPointer Recommendation
- www.w3.org/TR/WD-xptr
- SAX references
- http://www.megginson.com
31Where to Go For More Help - 5
- W3C's mailing list page
- www.w3.org/XML/discussion
- Some great tools on the web
- ADEPT Editor - www.arbortext.com
- CLIP! Editor - www.t2000-usa.com
- XMetaL - www.xmetal.com
- XML Pro - www.vervet.com
- AElfred - www.microstar.com
- expat - www.jclark.com
- SP - www.jclark.com
- LARK - www.texuality.com
- XML4J Parser - www.alphaworks.ibm.com
32Where to Go For More Help - 6
- Amaya - www.w3.org/Amaya
- Jumbo - www.xml-cml.org/jumbo
- Mozilla - www.mozilla.org
- POET Software - www.poet.com
- Microsoft's XML tools - www.microsoft.com/xml
- webMethods B2B - www.webmethods.com
- XML Software - www.xmlsoftware.com