Title: Applications of CFG
1. Applications of CFG
- Parsers and parser generators
- Markup languages
2. Grammar and XML
- XML Basics
- History
- Syntax
- Well-formed XML
- Valid XML
- Parsing XML
3. What is XML (eXtensible Markup Language, not "eXtremely Marketed Language")
- Markup
- A sequence of characters inserted into a text file to indicate how the file should be displayed, or to describe its logical structure.
- Markup is everything in a document that is not content.
- Initially used in typesetting documents
4. What is XML (eXtensible Markup Language)
- Markup in XML
- Markup indicators are called tags, e.g.
- <font color="blue">
- A pair of tags together with everything enclosed between them is called an element, e.g.
- <font color="blue"> formatted as blue </font>
5. What is XML (eXtensible Markup Language) (cont.)
- Extensible
- In general: something designed so that users or later designers can extend its capabilities.
- In XML: you can define your own tags to describe data
- You can represent any information
- You can represent it in the way you want
- XML is a meta-language
- A language to define other languages
- Use a DTD to define the syntax of a language
6. Markup (and extensible) languages are not new
- SGML (Standard Generalized Markup Language)
- Markup, extensible
- 1980 first publication, 1986 ISO standard
- HTML (HyperText Markup Language)
- Markup, hypertext, an application of SGML
- Started 1990 at CERN (Conseil Européen pour la Recherche Nucléaire, the European high-energy particle physics lab)
- XML (eXtensible Markup Language)
- Subset of SGML
- Started 1996, adopted by the W3C in 1998
- Eliminates the complexity of SGML
- Separates the data from the formatting information found in HTML
[Figure: diagram with the labels SGML and HTML]
7. HTML
- Table: <table>
- Table head: <TH>
- Table row: <TR>
- Table data: <TD>
stock.html (slide callouts: opening tag, element name, attribute, attribute value, closing tag, data):
<html><body> Stock table
<TABLE border="1">
  <TR><TH> Exchange </TH> <TH> Name </TH> <TH> Price </TH> </TR>
  <TR><TD> nasdaq </TD> <TD> amazon corp </TD> <TD> 16.875 </TD> </TR>
  <TR><TD> nyse </TD> <TD> IBM inc </TD> <TD> 102.250 </TD> </TR>
</TABLE> </body></html>
[Figure: stock.html as displayed in a browser]
8. XML and HTML
- Similarities
- They are both markup languages
- They are both simple.
- Differences
9. XML Example
stock.xml (slide callouts mark an attribute and an element):
<?xml version="1.0" ?>
<stocks>
  <stock exchange="nasdaq">
    <name>amazon corp</name>
    <symbol>amzn</symbol>
    <price>16</price>
  </stock>
  <stock exchange="nyse">
    <name>IBM inc</name>
    <price>102</price>
  </stock>
</stocks>
- An XML document has a group of elements
- Each element has an opening tag and a closing tag
- An element can have attributes.
10. Well-formed XML document
- An XML document is well formed if it conforms to the XML syntax rules.
- Additional rules
- An XML document must have a root element
- Attribute values must be quoted
- XML is case sensitive
- Try to find the bugs in the following XML document:
<?xml version="1.0" ?>
<stock exchange="nasdaq">
  <name>amazon corp</name>
  <symbol>amzn</symbol>
  <price>16</price>
</stock>
<stock exchange=nyse >
  <name>IBM inc</name>
  <price> 102 </PRICE>
</stock>
<stocks>
</stocks>
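The three rules above can be checked mechanically, since any conforming XML parser must reject a document that is not well formed. The following is a minimal sketch, assuming the JAXP parser bundled with the JDK; the class name `WellFormedCheck` and the small test documents are made up for illustration, each isolating one of the bugs in the exercise.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {
    /** Returns null if the text is well formed, otherwise the parser's first complaint. */
    static String firstError(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        String[] documents = {
            "<stocks><stock exchange=\"nyse\"><name>IBM inc</name></stock></stocks>", // fine
            "<stocks><stock exchange=nyse></stock></stocks>",   // attribute value not quoted
            "<stocks><price>102</PRICE></stocks>",              // closing tag differs in case
            "<stock></stock><stocks></stocks>"                  // more than one root element
        };
        for (String xml : documents) {
            String error = firstError(xml);
            System.out.println(error == null ? "well formed" : "not well formed, " + error);
        }
    }
}
```

The exact wording of the error messages depends on the parser implementation, so the sketch only distinguishes "no error" from "some error".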
11. Valid XML document
<?xml version="1.0" ?>
<stocks>
  <name> <stock> 102</stock> </name>
  <price>IBM inc</price>
  <symbol>amzn </symbol>
  <price>16</price>
  <stock exchange="nyse"> <price> amazon </price> </stock>
</stocks>
- Problem
- Not every well-formed document makes sense
- Solution
- Associate the XML document with a DTD or an XML Schema
- A valid XML document conforms to its DTD or XML Schema.
12. XML DTD (Document Type Definition)
- What: A DTD is a set of rules that defines the syntax of a language. It is similar to a context-free grammar.
- Why: It helps XML generation and processing.
- How: Write a sequence of element declarations and attribute declarations.
- Element declaration
- <!ELEMENT tagName tagContent>
- Attribute declaration
- <!ATTLIST tagName attName attContent>
stock.dtd:
<!ELEMENT stocks (stock*)>
<!ELEMENT stock (name, symbol?, price)>
<!ATTLIST stock exchange CDATA #IMPLIED>
<!ELEMENT name (#PCDATA)>
<!ELEMENT symbol (#PCDATA)>
<!ELEMENT price (#PCDATA)>
- * means repeat 0 or more times
- ? means occur 0 times or once
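To see what a DTD buys in practice, a validating parser can be asked to check a document against it. The sketch below assumes the JAXP parser bundled with the JDK and embeds the stock DTD as an internal subset so that no files are needed. The class name `DtdValidation` is hypothetical, and `#IMPLIED` as the default for the `exchange` attribute is an assumption.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdValidation {
    // The stock DTD as an internal subset; #IMPLIED is an assumed attribute default.
    static final String DOCTYPE =
        "<!DOCTYPE stocks [\n"
        + "<!ELEMENT stocks (stock*)>\n"
        + "<!ELEMENT stock (name, symbol?, price)>\n"
        + "<!ATTLIST stock exchange CDATA #IMPLIED>\n"
        + "<!ELEMENT name (#PCDATA)>\n"
        + "<!ELEMENT symbol (#PCDATA)>\n"
        + "<!ELEMENT price (#PCDATA)>\n"
        + "]>\n";

    /** Parses DOCTYPE + body with validation switched on; returns the validity errors. */
    static List<String> validate(String body) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // check the document against its DTD
        DocumentBuilder builder = factory.newDocumentBuilder();
        List<String> errors = new ArrayList<>();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) { errors.add(e.getMessage()); } // validity error
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(DOCTYPE + body)));
        return errors;
    }

    public static void main(String[] args) throws Exception {
        String valid = "<stocks><stock exchange=\"nyse\"><name>IBM inc</name>"
            + "<price>102</price></stock></stocks>";
        // Well formed, but the nesting makes no sense under the DTD.
        String invalid = "<stocks><name><stock>102</stock></name></stocks>";
        System.out.println("valid document, errors: " + validate(valid));
        System.out.println("invalid document, errors: " + validate(invalid));
    }
}
```

Validity errors are reported through `error`, not thrown, which is why the sketch collects them in a list; only a document that is not well formed reaches `fatalError`.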
13. DTD and Context-Free Grammar
- A DTD is similar to a context-free grammar
- <!ELEMENT stocks (stock*)>
- stocks → ε | stock stocks
- <!ELEMENT stock (name, symbol?, price)>
- stock → name price | name symbol price
- DTD makes the language extensible
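Because the content models correspond to grammar rules, a sequence of element names can be checked with an ordinary recursive-descent recognizer. The sketch below is one possible illustration, not how an XML parser validates internally. The class `StockGrammar` is hypothetical and treats name, symbol, and price as terminal tokens.

```java
import java.util.Arrays;
import java.util.List;

class StockGrammar {
    private final List<String> tokens;
    private int pos = 0;

    private StockGrammar(List<String> tokens) { this.tokens = tokens; }

    /** True if the whole token sequence can be derived from the start symbol stocks. */
    static boolean accepts(String... tokens) {
        StockGrammar g = new StockGrammar(Arrays.asList(tokens));
        return g.stocks() && g.pos == g.tokens.size();
    }

    // stocks -> ε | stock stocks
    private boolean stocks() {
        if (!peek("name")) return true; // ε: every stock must start with name
        return stock() && stocks();
    }

    // stock -> name price | name symbol price
    private boolean stock() {
        if (!eat("name")) return false;
        eat("symbol");                  // the optional symbol
        return eat("price");
    }

    private boolean peek(String t) { return pos < tokens.size() && tokens.get(pos).equals(t); }

    private boolean eat(String t) {
        if (!peek(t)) return false;
        pos++;
        return true;
    }

    public static void main(String[] args) {
        System.out.println(accepts("name", "symbol", "price", "name", "price")); // true
        System.out.println(accepts());                                           // true (ε)
        System.out.println(accepts("name", "symbol"));                           // false
    }
}
```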
14. Grammar and XML
- XML Basics
- History
- Syntax
- Well-formed XML
- Valid XML
- Parsing XML
15. DOM (Document Object Model)
- What: DOM is an application programming interface (API) for processing XML documents
- http://www.w3c.org/DOM/
- Why: a single interface, independent of platform and language
- How: It defines the logical structure of documents and the way to access and manipulate them
- With the Document Object Model, one can
- Create an object tree
- Navigate its structure
- Access, add, modify, or delete elements, etc.
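The operations listed above can be sketched with the `org.w3c.dom` interfaces that ship with the JDK. This is a minimal illustration, assuming the JAXP factory is used to obtain an empty document; the class `BuildStockTree` and its helper `addTextElement` are hypothetical names.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class BuildStockTree {
    /** Creates the tree for one stock: stocks > stock(exchange) > name, price. */
    static Document build() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element stocks = doc.createElement("stocks");
        doc.appendChild(stocks);                       // root element
        Element stock = doc.createElement("stock");
        stock.setAttribute("exchange", "nyse");
        stocks.appendChild(stock);
        addTextElement(doc, stock, "name", "IBM inc");
        addTextElement(doc, stock, "price", "102");
        return doc;
    }

    /** Hypothetical helper: appends <tag>text</tag> to parent. */
    static void addTextElement(Document doc, Element parent, String tag, String text) {
        Element child = doc.createElement(tag);
        child.appendChild(doc.createTextNode(text));
        parent.appendChild(child);
    }

    public static void main(String[] args) throws Exception {
        Document doc = build();                                             // create
        Element stock = (Element) doc.getDocumentElement().getFirstChild(); // navigate
        Element price = (Element) stock.getLastChild();
        System.out.println(price.getTextContent());                         // access: 102
        price.setTextContent("102.25");                                     // modify
        stock.removeChild(stock.getFirstChild());                           // delete <name>
        System.out.println(stock.getChildNodes().getLength());              // 1
    }
}
```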
16. XML tree hierarchy
- XML can be described by a tree hierarchy
[Figure: a tree with Document at the top, Unit below it, and Sub-unit below that; the relationships between nodes are labeled parent, child, and sibling]
17. DOM tree model
- Generic tree model
- Node
- Type, name, value
- Attributes
- Parent node
- Previous, next sibling nodes
- First, last child nodes
- Many other entities extend Node
- Document
- Element
- Attribute
- ...
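To make the generic node model concrete, the sketch below walks a small tree and prints the type, name, and value of every node, using first-child and next-sibling links for the traversal. It assumes the JAXP parser bundled with the JDK; `NodeDump` and the one-stock sample document are illustrative choices.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeDump {
    static String describe(Node n) {
        return "type=" + n.getNodeType() + " name=" + n.getNodeName() + " value=" + n.getNodeValue();
    }

    /** Appends one line per node; attributes are marked with '@'. */
    static void dump(Node node, String indent, StringBuilder out) {
        out.append(indent).append(describe(node)).append('\n');
        NamedNodeMap attrs = node.getAttributes();      // null unless node is an Element
        for (int i = 0; attrs != null && i < attrs.getLength(); i++) {
            out.append(indent).append("  @").append(describe(attrs.item(i))).append('\n');
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            dump(c, indent + "  ", out);
        }
    }

    static String dump(String xml) throws Exception {
        Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        dump(doc, "", out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(dump("<stock exchange=\"nyse\"><name>IBM inc</name></stock>"));
        // type=9 name=#document value=null
        //   type=1 name=stock value=null
        //     @type=2 name=exchange value=nyse
        //     type=1 name=name value=null
        //       type=3 name=#text value=IBM inc
    }
}
```

The type codes are the constants defined on `Node`: 9 is `DOCUMENT_NODE`, 1 is `ELEMENT_NODE`, 2 is `ATTRIBUTE_NODE`, and 3 is `TEXT_NODE`.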
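18. DOM class hierarchy
- Node is extended by Document, DocumentFragment, DocumentType, Element, Attr, CharacterData, Entity, EntityReference, Notation, and ProcessingInstruction
- CharacterData is extended by Text and Comment
- Text is extended by CDATASection
- NodeList and NamedNodeMap are collection interfaces and do not extend Node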
19. JavaDoc of DOM API
http://xml.apache.org/xerces-j/apiDocs/index.html
20. A few words about javadoc
- javadoc is a command included in the JDK
- It is a useful tool that generates HTML descriptions of your programs, so that you can use a browser to look at the descriptions of the classes
- JavaDoc describes classes, their relationships, methods, attributes, and comments.
- When you write Java programs, the JavaDoc is the first place you should look
- For core Java, there is JavaDoc describing every class in the language
- To learn how to use regex, look at the java.util.regex package
- To learn how to write a string to a file, look at the File and FileWriter classes in the JavaDoc
- To learn how to use DOM, look at the JavaDoc of the org.w3c.dom package.
- If you are a serious Java programmer
- You should have the core JDK JavaDoc ready on your hard disk
- You should generate JavaDoc for other people to look at.
- To run javadoc, type
- D:\>javadoc *.java
- This generates JavaDoc for all the classes in the current directory.
21. Some important DOM interfaces
- The DOM defines several Java interfaces
- Node: The base data type of the DOM
- Element: Represents an element
- Attr: Represents an attribute of an element
- Text: The content of an element or attribute
- Document: Represents the entire XML document. A Document object is often referred to as a DOM tree.
22. Methods in the Node interface
- Three categories of methods
- Node characteristics
- name, type, value
- Contextual location and access to relatives
- parents, siblings, children, ancestors, descendants
- Node modification
- Edit, delete, re-arrange child nodes
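One call from each of the three categories is shown in the sketch below. It assumes the JAXP parser bundled with the JDK; the class `NodeMethods` and its helper `childNames` are hypothetical, and the input is written without whitespace between tags so that every child is an element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodeMethods {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    /** Hypothetical helper: the names of parent's children in order, e.g. "name,price". */
    static String childNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (sb.length() > 0) sb.append(',');
            sb.append(c.getNodeName());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<stock><name>IBM inc</name><price>102</price></stock>");
        Node stock = doc.getDocumentElement();
        Node price = stock.getLastChild();

        // 1. Node characteristics
        System.out.println(price.getNodeName() + " " + price.getNodeType()); // price 1

        // 2. Contextual location and access to relatives
        System.out.println(price.getParentNode().getNodeName());             // stock
        System.out.println(price.getPreviousSibling().getNodeName());        // name

        // 3. Node modification: inserting an existing child moves it
        stock.insertBefore(price, stock.getFirstChild());
        System.out.println(childNames(stock));                               // price,name
    }
}
```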
23. XML parser and DOM
[Figure: a DOM XML parser, the DOM tree, the DOM API, and your XML application]
- When you parse an XML document with a DOM parser, you get back a tree structure that contains all of the elements of your document
- DOM also provides a variety of functions you can use to examine the contents and structure of the document.
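24. DOM tree and DOM classes
[Figure: DOM tree for a stock document. The root <stocks> has two <stock> children. The stock with exchange="nyse" has children <name> (IBM) and <price> (105). The stock with exchange="nasdaq" has children <name> (Amazon inc), <price> (15.45), and <symbol> (amzn). Tag nodes are labeled Element and edges are labeled child.]
- The real tree has more nodes than shown here; refer to slide 28.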
25. Use Java to process XML
- Tasks
- How to construct the DOM tree from an XML text file?
- How to get the list of stock elements?
- How to get the attribute value of the second stock element?
- Construct the Document object
- Need to use an XML parser (XML4J)
- Remember to import the necessary packages
- The benefit of DOM: the lines that create the parser and construct the Document are the only difference if you use another DOM XML parser.
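The three tasks can be sketched as follows. The slide names XML4J as the parser; this sketch instead assumes the JAXP parser bundled with the JDK, which is a substitution, and it parses from a string so that it runs without files. The class `StockTasks` and the method `exchangeOfStock` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StockTasks {
    static final String STOCK_XML =
        "<?xml version=\"1.0\" ?>\n"
        + "<stocks>\n"
        + "<stock exchange=\"nasdaq\"><name>amazon corp</name><symbol>amzn</symbol><price>16</price></stock>\n"
        + "<stock exchange=\"nyse\"><name>IBM inc</name><price>102</price></stock>\n"
        + "</stocks>";

    /** Task 1: construct the DOM tree. Only this method depends on the parser chosen. */
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // For a file on disk: builder.parse(new java.io.File("stock.xml"))
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Task 3: the exchange attribute of the stock element at the given 0-based index. */
    static String exchangeOfStock(Document doc, int index) {
        NodeList stocks = doc.getElementsByTagName("stock"); // Task 2: list of stock elements
        return ((Element) stocks.item(index)).getAttribute("exchange");
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(STOCK_XML);
        System.out.println(doc.getElementsByTagName("stock").getLength()); // 2
        System.out.println(exchangeOfStock(doc, 1));                        // nyse
    }
}
```

Everything after `parse` uses only `org.w3c.dom` interfaces, which is why switching parsers touches only the lines that build the `Document`.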
26. Get the first stock element
<?xml version="1.0" ?>
<stocks>
  <stock exchange="nasdaq">
    <name>amazon corp</name>
    <symbol>amzn</symbol>
    <price>16</price>
  </stock>
  <stock exchange="nyse">
    <name>IBM inc</name>
    <price>102</price>
  </stock>
</stocks>
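One way to reach the first stock element is to ask the root element for its stock descendants and take item 0. The sketch below assumes the JAXP parser bundled with the JDK; `FirstStock`, `firstStock`, and `childText` are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class FirstStock {
    static final String STOCK_XML =
        "<?xml version=\"1.0\" ?>\n"
        + "<stocks>\n"
        + "  <stock exchange=\"nasdaq\">\n"
        + "    <name>amazon corp</name>\n"
        + "    <symbol>amzn</symbol>\n"
        + "    <price>16</price>\n"
        + "  </stock>\n"
        + "  <stock exchange=\"nyse\">\n"
        + "    <name>IBM inc</name>\n"
        + "    <price>102</price>\n"
        + "  </stock>\n"
        + "</stocks>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    /** The first stock element below the root, in document order. */
    static Element firstStock(Document doc) {
        Element root = doc.getDocumentElement();
        return (Element) root.getElementsByTagName("stock").item(0);
    }

    /** Text of the first descendant of parent with the given tag name. */
    static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        Element stock = firstStock(parse(STOCK_XML));
        System.out.println(stock.getAttribute("exchange")); // nasdaq
        System.out.println(childText(stock, "name"));       // amazon corp
    }
}
```

Using `getElementsByTagName` instead of `getFirstChild` sidesteps the whitespace between `<stocks>` and the first `<stock>`, which the parser reports as a Text node.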
27. Navigate to the next sibling of the first stock element
<?xml version="1.0" ?>
<stocks>
  <stock exchange="nasdaq">
    <name>amazon corp</name>
    <symbol>amzn</symbol>
    <price>16</price>
  </stock>
  <stock exchange="nyse">
    <name>IBM inc</name>
    <price>102</price>
  </stock>
</stocks>
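Sibling navigation can be sketched as below, assuming the JAXP parser bundled with the JDK. When the document contains line breaks between tags, the immediate next sibling of the first stock element is a whitespace Text node, so the hypothetical helper `nextElementSibling` keeps stepping until it finds an element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NextSibling {
    static final String STOCK_XML =
        "<stocks>\n"
        + "  <stock exchange=\"nasdaq\"><name>amazon corp</name></stock>\n"
        + "  <stock exchange=\"nyse\"><name>IBM inc</name></stock>\n"
        + "</stocks>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    /** The next sibling that is an Element, or null if there is none. */
    static Element nextElementSibling(Node node) {
        Node n = node.getNextSibling();
        while (n != null && n.getNodeType() != Node.ELEMENT_NODE) {
            n = n.getNextSibling(); // skip Text, Comment, and other non-element nodes
        }
        return (Element) n;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(STOCK_XML);
        Node first = doc.getElementsByTagName("stock").item(0);
        System.out.println(first.getNextSibling().getNodeName());                // #text
        System.out.println(nextElementSibling(first).getAttribute("exchange")); // nyse
    }
}
```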
28. Be aware of the Text objects between elements
<?xml version="1.0" ?>
<stocks>
<stock exchange="nasdaq">
<name>amazon corp</name>
<symbol>amzn</symbol>
<price>16</price>
</stock>
<stock exchange="nyse">
<name>IBM inc</name>
<price>102</price>
</stock>
</stocks>
Question: How many children does the stocks node have?
[Figure: ten "text" callouts mark Text nodes in the document above]
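The question can be answered by counting. The sketch below assumes the JAXP parser bundled with the JDK, run without a DTD, and a document with exactly one line break between consecutive tags; a different layout would give different whitespace nodes. `CountChildren` and its methods are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class CountChildren {
    static final String STOCK_XML =
        "<?xml version=\"1.0\" ?>\n"
        + "<stocks>\n"
        + "<stock exchange=\"nasdaq\">\n"
        + "<name>amazon corp</name>\n"
        + "<symbol>amzn</symbol>\n"
        + "<price>16</price>\n"
        + "</stock>\n"
        + "<stock exchange=\"nyse\">\n"
        + "<name>IBM inc</name>\n"
        + "<price>102</price>\n"
        + "</stock>\n"
        + "</stocks>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    /** Counts the whitespace-only Text nodes in the subtree rooted at node. */
    static int whitespaceTextNodes(Node node) {
        int count = 0;
        if (node.getNodeType() == Node.TEXT_NODE && node.getNodeValue().trim().isEmpty()) {
            count++;
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            count += whitespaceTextNodes(c);
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(STOCK_XML);
        Node stocks = doc.getDocumentElement();
        System.out.println(stocks.getChildNodes().getLength());              // 5
        System.out.println(doc.getElementsByTagName("stock").getLength());   // 2
        System.out.println(whitespaceTextNodes(doc));                        // 10
    }
}
```

With this layout the stocks node has five children: a Text node, a stock element, a Text node, a stock element, and a final Text node. Code that indexes children by position therefore has to check node types.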