XML - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: XML


1
CIT 383 Administrative Scripting
  • XML

2
Topics
  1. What is XML?
  2. XML Structure
  3. REXML

3
eXtensible Markup Language
  • Extensible framework for descriptive markup languages.
  • Began as a subset of Standard Generalized Markup
    Language (SGML).
  • Designed to ensure that data remains available after
    the programs that originally created or read it become
    obsolete or unusable.

<?xml version="1.0" encoding="UTF-8"?>
<inventory>
  <book isbn="0976694042">
    <author>Chris Pine</author>
    <title>Learn to Program</title>
  </book>
</inventory>

4
Descriptive vs Presentational
  • Presentational languages describe how documents should look.
    • <b>text</b> turns on boldface for text.
    • What if you want to change book titles from bold
      to italics?
    • Search-and-replace won't work if items other than
      books are bold.
  • Descriptive languages focus on the meaning.
    • <title>xml and you</title>
    • Stylesheets describe how to present logical
      items.
    • Can also be used purely for data storage and interchange.
    • A/K/A logical or structural markup languages.
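
The bold-to-italics question above can be sketched in Ruby with REXML. This is only an illustration: the inventory document, the constant XML_SOURCE, and the helper render_titles are hypothetical names, not part of the slides. Because the markup says what the text is (a title), the presentation is chosen in one place.

```ruby
require 'rexml/document'

# Hypothetical inventory using descriptive markup: <title> says what the
# text is, not how it should look.
XML_SOURCE = <<~XML
  <inventory>
    <book><title>Learn to Program</title></book>
    <book><title>XML and You</title></book>
  </inventory>
XML

# Wrap every book title in the given presentational tag.
def render_titles(xml, tag)
  doc = REXML::Document.new(xml)
  titles = []
  doc.elements.each('inventory/book/title') do |t|
    titles << "<#{tag}>#{t.text}</#{tag}>"
  end
  titles
end

puts render_titles(XML_SOURCE, 'b')   # titles in bold
puts render_titles(XML_SOURCE, 'i')   # same data, now in italics
```

Switching from bold to italics changes one argument, and nothing else that happens to be bold is affected.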

5
XML-based Languages
  • Ant
  • Atom
  • CML
  • MathML
  • MML
  • MusicXML
  • ODF
  • OPML
  • RDF
  • SAML
  • SOAP
  • SVG
  • VoiceXML
  • WML
  • XHTML
  • XUL

6
Evolution of XML
  • 1986 SGML standard published as ISO 8879
  • 1987 Unicode proposal published
  • 1991 First volume of Unicode standard
  • 1996 XML work started
  • 1998 XML 1.0 released as a W3C standard
  • 2001 XML Schema language
  • 2004 XML 1.1 released (not widely used)
  • 2007 Unicode 5.0 published

7
XML Tree Structure
  <todo>
    <title>Monday's List</title>
    <item>Study for midterm</item>
    <item>
      <priority>10</priority>
      Scripting Class
    </item>
    <item>Bathe cat</item>
  </todo>
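
One way to see the tree is to walk it recursively. The sketch below assumes a simplified version of the to-do list; TODO_XML and the helper outline are made-up names for illustration.

```ruby
require 'rexml/document'

# Simplified to-do list, assumed for illustration.
TODO_XML = <<~XML
  <todo>
    <title>Monday's List</title>
    <item>Study for midterm</item>
    <item>Bathe cat</item>
  </todo>
XML

# Return one line per element, indented by its depth in the tree.
def outline(element, depth = 0)
  lines = ["#{'  ' * depth}#{element.name}"]
  element.elements.each { |child| lines.concat(outline(child, depth + 1)) }
  lines
end

doc = REXML::Document.new(TODO_XML)
puts outline(doc.root)
```

The root element sits at depth 0 and each child element is indented one level further.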

8
Elements and Attributes
  • An element consists of tags and contents.
    • <title>Learn to Program</title>
  • Begin and end tags are mandatory.
    • <isbn number="0976694042" />
  • Attributes
    • number="0976694042"
    • Elements may have zero or more attributes.
    • Attribute values must always be quoted.
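
A minimal sketch of reading element content and attributes with REXML. The format attribute and the variable names are assumptions added for the example.

```ruby
require 'rexml/document'

# The format attribute is hypothetical; isbn and title follow the slides.
doc = REXML::Document.new(<<~XML)
  <book isbn="0976694042" format="paperback">
    <title>Learn to Program</title>
  </book>
XML

book = doc.root
isbn = book.attributes['isbn']           # attribute value as a String
title = book.elements['title'].text      # text content of the child element
missing = book.attributes['publisher']   # an absent attribute yields nil

# Collect the names of all attributes on the element.
attribute_names = []
book.attributes.each { |name, _value| attribute_names << name }

puts "#{title} (ISBN #{isbn})"
puts "attributes: #{attribute_names.sort.join(', ')}"
```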

9
Text
  • XML declaration specifies character encoding
    • <?xml version="1.0" encoding="UTF-8"?>
  • Encodings
    • Unicode: universal character set; UTF-8, UTF-32
    • ISO-8859: 8-bit encodings; 8859-1 is Western Europe
  • Entities
    • &#nnnn; encodes the specified Unicode character
    • &name; are named character entities, such as
      • &lt; is <
      • &gt; is >
      • &amp; is &
      • currency symbols, fractions, Greek letters, math
        symbols, etc. (available when a DTD declares them,
        as XHTML's does)
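
As a rough sketch of how a parser treats entities, REXML resolves named entities and numeric character references when text is read, and escapes markup characters when text is written. The sample strings and variable names are assumptions for illustration.

```ruby
require 'rexml/document'

# &lt; and &amp; are predefined entities; &#233; and &#x20AC; are numeric
# character references (decimal and hexadecimal).
doc = REXML::Document.new('<note>1 &lt; 2 &amp;&amp; caf&#233; costs &#x20AC;3</note>')

decoded = doc.root.text   # entities and references resolved to characters
puts decoded

# Going the other way: markup characters are escaped on output.
out = REXML::Element.new('note')
out.text = 'a < b & c'
serialized = out.to_s
puts serialized
```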

10
XML Syntax Rules
  1. There is one and only one root element.
  2. Every begin tag must be matched by an end tag.
  3. XML tags must be properly nested.
  4. XML tags are case sensitive.
  5. All attribute values must be quoted.
  6. Whitespace in element content is part of the text.
  7. Parsers normalize all line endings to LF.
  8. Comments use HTML-style syntax: <!-- comment -->
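
Several of these rules can be checked by handing small fragments to a parser. The sketch below assumes REXML reports violations by raising REXML::ParseException; the helper well_formed? and the SAMPLES table are hypothetical names.

```ruby
require 'rexml/document'

# Return true if REXML parses the string, false if it reports a parse error.
def well_formed?(xml)
  REXML::Document.new(xml)
  true
rescue REXML::ParseException
  false
end

# Fragment => the rule it illustrates.
SAMPLES = {
  '<a><b>ok</b></a>'  => 'properly nested',
  '<a><b>bad</a></b>' => 'improperly nested (rule 3)',
  '<a>bad</A>'        => 'case mismatch between tags (rule 4)',
  '<a id=1/>'         => 'unquoted attribute value (rule 5)'
}

SAMPLES.each do |xml, rule|
  status = well_formed?(xml) ? 'well-formed' : 'rejected'
  puts "#{xml.ljust(20)} #{status}: #{rule}"
end
```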

11
Correctness
  • Well-formed
    • Conforms to XML syntax rules.
    • A conforming parser will not parse documents that
      are not well-formed.
  • Valid
    • Conforms to the structure rules defined in a schema, such as a
      • Document Type Definition (DTD)
      • XML Schema
    • A validating parser rejects invalid documents.

12
XML Schema Languages
<?xml version="1.0" encoding="utf-8" ?>
<xs:schema elementFormDefault="qualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Recipient" type="xs:string" />
        <xs:element name="House" type="xs:string" />
        <xs:element name="Street" type="xs:string" />
        <xs:element name="Town" type="xs:string" />
        <xs:element minOccurs="0" name="County" type="xs:string" />
        <xs:element name="PostCode" type="xs:string" />
        <xs:element name="Country">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:enumeration value="FR" />
              <xs:enumeration value="DE" />
              <xs:enumeration value="ES" />
              <xs:enumeration value="UK" />
              <xs:enumeration value="US" />
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

  • Document Type Definitions
    • Inherited from SGML.
    • Do not support all XML features (for example, namespaces).
  • XML Schema
    • Most commonly used.
    • Schemas are XML docs.
    • A/K/A WXS, XSD
  • RELAX NG
    • REgular LAnguage for XML Next Generation
    • Has both XML and non-XML syntaxes.

13
Ruby XML Parsers
  • REXML: Ruby Electric XML
    • Standard with the Ruby language.
    • Slow on large documents.
  • libxml-ruby
    • Ruby bindings for the GNOME libxml2 XML toolkit.
    • Very fast (30x as fast as REXML).
  • Hpricot
    • Parses XML as well as HTML.
    • Fast (3-4x as fast as REXML).
    • Does not check for well-formedness or validity.

14
Types of Parsing
  • Tree Parsing (DOM-like)
    • Good for small documents.
    • Loads entire document into memory.
    • Simple API
  • Stream Parsing (SAX-like)
    • Good for large documents.
    • User defines callback methods, passes to API.
    • Parser runs callback methods on pattern match.

15
Tree Parsing
  • Loads entire XML doc into memory.
      require 'rexml/document'
      include REXML
      input = File.new('data.xml')
      doc = Document.new(input)
      root = doc.root
  • Search document as a tree using XPath
      doc.elements.each('ch/section') do |e|
        puts e.attributes['title']
      end
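
A self-contained variant of the tree-parsing steps, parsing from a string instead of a file so it runs without data.xml. The sample document is an assumption: the slides do not show the contents of data.xml, so BOOK_XML is a hypothetical stand-in shaped to match the ch/section path.

```ruby
require 'rexml/document'

# Hypothetical stand-in for the contents of data.xml.
BOOK_XML = <<~XML
  <ch>
    <section title="What is XML?"/>
    <section title="XML Structure"/>
    <section title="REXML"/>
  </ch>
XML

doc = REXML::Document.new(BOOK_XML)   # the whole tree is built in memory here
root_name = doc.root.name

# Visit every matching element in document order.
section_titles = []
doc.elements.each('ch/section') do |e|
  section_titles << e.attributes['title']
end

# Once the tree is loaded, any node can be reached directly
# (XPath positions are 1-based).
second_title = doc.elements['ch/section[2]'].attributes['title']

puts "root: #{root_name}"
puts section_titles
puts "second section: #{second_title}"
```

Random access like the last lookup is what the tree approach buys in exchange for holding the full document in memory.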

16
Stream Parsing
  • Define listener class.
      require 'rexml/document'
      require 'rexml/streamlistener'
      include REXML
      class MyListener
        include REXML::StreamListener
        def tag_start(*args)
          puts "start: #{args.map { |x| x.inspect }.join(', ')}"
        end
      end
  • Invoke parser
      listen = MyListener.new
      source = File.new('data.xml')
      Document.parse_stream(source, listen)
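
A self-contained sketch of the same callback idea, recording events in an array rather than printing them and parsing from a string rather than data.xml. The class name RecordingListener and the sample document are assumptions; tag_start, text, and tag_end are callbacks defined by REXML::StreamListener.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Listener that records parser events in the order they arrive.
class RecordingListener
  include REXML::StreamListener

  attr_reader :events

  def initialize
    @events = []
  end

  # Called for each begin tag with its name and attributes.
  def tag_start(name, attrs)
    @events << [:start, name, attrs.to_h]
  end

  # Called for character data; whitespace-only text is skipped here.
  def text(data)
    stripped = data.strip
    @events << [:text, stripped] unless stripped.empty?
  end

  # Called for each end tag.
  def tag_end(name)
    @events << [:end, name]
  end
end

listener = RecordingListener.new
source = '<todo><item priority="10">Bathe cat</item></todo>'
REXML::Document.parse_stream(source, listener)
listener.events.each { |event| p event }
```

No tree is built: the parser hands each event to the listener and moves on, which is why this style suits large documents.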

17
XPath Searches
  • doc.search("p")
    • Find all paragraph tags in document.
  • doc.search("/html/body//p")
    • Find all paragraph tags within the body tag.
  • doc.search("//a[@src]")
    • Find all anchor tags with a src attribute.
  • doc.search("//a[@src='google.com']")
    • Find all a tags with a src attribute of
      google.com.
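
The slide does not say which library provides the search method shown above. As a hedged alternative, the same XPath expressions can be tried with REXML::XPath from the standard library. The page content in PAGE_XML is a made-up example, and it must be well-formed XML for REXML to accept it.

```ruby
require 'rexml/document'

# Hypothetical page; it must be well-formed XML for REXML to parse it.
PAGE_XML = <<~XML
  <html>
    <body>
      <p>First <a src="google.com">link</a></p>
      <div><p>Second <a>no attribute</a></p></div>
    </body>
  </html>
XML

doc = REXML::Document.new(PAGE_XML)

all_paragraphs  = REXML::XPath.match(doc, '//p')              # every p element
body_paragraphs = REXML::XPath.match(doc, '/html/body//p')    # p anywhere under body
with_src        = REXML::XPath.match(doc, '//a[@src]')        # a elements that have src
google_links    = REXML::XPath.match(doc, "//a[@src='google.com']")

puts "paragraphs: #{all_paragraphs.size}"
puts "in body: #{body_paragraphs.size}"
puts "anchors with src: #{with_src.size}"
puts "google.com anchors: #{google_links.map(&:text).inspect}"
```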
