Title: Introduction to XML: DTD
1 Introduction to XML: DTD
2 Document type definition structure
- Topics
- Elements
- Attributes
- Entities
- Processing instructions (PI)
- DTD design
3 DTD
<!-- Document type description (DTD) example (part) -->
<!ELEMENT university (department)>
<!ELEMENT department (name, address)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT address (#PCDATA)>
- Document type description, structural description
- one rule per element
- name
- content
- a grammar for document instances
- "regular clauses"
- (not necessary)
4 DTD advantages
- validating parsers check that the document conforms to the DTD
- enforces logical use of tags
- there are existing DTD standards for many application areas
- common vocabulary
5 Well-formed documents
- An XML document is well-formed if
- its elements are properly nested so that it has a hierarchical tree structure, and all elements have an end tag (or are empty elements)
- it has one and only one root element
- it complies with the basic syntax and structural rules of the XML 1.0 specification
- rules for characters, white space, quotes, etc.
- and each of its parsed entities is well-formed
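As a minimal sketch of the rules above, the following document should be well-formed: it has a single root element, the elements nest properly, and every element is either closed or written as an empty element. The element names (note, to, body, em, attachment) are invented for illustration only.

```xml
<?xml version="1.0"?>
<!-- Hypothetical element names; one root element, properly nested -->
<note>
  <to>Student</to>
  <body>Remember the <em>DTD</em> exercise.</body>
  <attachment/>  <!-- empty element -->
</note>
```

Swapping the two closing tags inside body (closing body before em) would break the nesting rule, and a conforming parser would reject the document.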
6 Validity
- An XML document is valid if
- it is well-formed
- it has an attached DTD (or schema)
- it conforms to the DTD (or schema)
- Validity is checked with a validating parser, either
- the whole document at once ("batch")
- interactively
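The three conditions can be sketched with a document that carries its DTD in an internal subset. It reuses the university declarations from the DTD example earlier in the deck; the text values are made up.

```xml
<?xml version="1.0"?>
<!-- Internal DTD subset: the rules travel with the document -->
<!DOCTYPE university [
  <!ELEMENT university (department)>
  <!ELEMENT department (name, address)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
]>
<university>
  <department>
    <name>Information Technology</name>
    <address>Example Street 1</address>
  </department>
</university>
```

A validating parser (for example xmllint with its --valid option) should accept this document. Removing the address element leaves it well-formed but no longer valid, because department requires both name and address.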
7 Document type declaration
- Shared
- <!DOCTYPE catalog PUBLIC "-//ORG_NAME//DTD CATALOG//EN">
- flag (-/+): "-" marks an unregistered owner, "+" a registered one
- ISO standards start with ISO
- ORG_NAME: the owner of the DTD
- DTD: file type
- CATALOG: document name
- EN: language
- the document type definition can be included in the internal database of the processor (no connection needed)
- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
- in XML (unlike SGML/HTML), a system identifier giving the location of the DTD must follow the public identifier
8 External or internal DTD
- Document type declaration format
- <!DOCTYPE document_element source location [ internal subset of DTD ]>
- internal DTD, DTD and instance in the same file
- <!DOCTYPE catalog [ <!ELEMENT catalog and so on > ]>
- simple example
- <!DOCTYPE Mymessage [ <!ELEMENT Mymessage (#PCDATA)> ]>
- external DTD, examples
- <!DOCTYPE dictionary SYSTEM "dictionary.dtd">
- <!DOCTYPE dictionary SYSTEM "http://www.evtek.fi/DTD/dictionary.dtd">
9 External or internal DTD
- internal and external
- <!DOCTYPE Mymessage SYSTEM "myDTD.dtd" [ <!ELEMENT Mymessage (#PCDATA)> and so on ]>
- Shared document type
- <!DOCTYPE Dictionary PUBLIC "http://www.evtek.fi/DTD/dictionary.dtd">
- in XML, PUBLIC expects a public identifier followed by a system identifier such as this URL
10 Element type declaration
- <!ELEMENT country (capital)>
- element name
- element content
- content declaration, content model
- delimiters (<!, >, (, )) and keyword (ELEMENT)
11 Sub-elements, children
- Children in specified order
- <!ELEMENT country (cname, capital, population)>
- choice of a child (pipe |)
- <!ELEMENT country (cname | official_name)>
- optional singular element ?: only one or zero
- <!ELEMENT country (cname, capital, population?)>
12 Cardinality operators
- Number of occurrences
- * zero or more, optional
- <!ELEMENT country (cname, capital, city*)>
- + one or more child elements, required
- <!ELEMENT country (cname, neighbour_country+)>
- ? optional singular element
- no operator: one required singular element
- repeating group
- <!ELEMENT country (cname, (city, city_population)*)>
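To make the operators concrete, here is a small sketch of an instance checked against a repeating group. It assumes the group is declared with * (zero or more pairs); the country, city names and numbers are invented placeholders.

```xml
<?xml version="1.0"?>
<!DOCTYPE country [
  <!-- cname exactly once, then zero or more (city, city_population) pairs -->
  <!ELEMENT country (cname, (city, city_population)*)>
  <!ELEMENT cname (#PCDATA)>
  <!ELEMENT city (#PCDATA)>
  <!ELEMENT city_population (#PCDATA)>
]>
<country>
  <cname>Exampleland</cname>
  <city>CityA</city>
  <city_population>1000</city_population>
  <city>CityB</city>
  <city_population>2000</city_population>
</country>
```

A city without its city_population, or two city elements in a row, would not match the group and should be reported by a validating parser.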
13 Element content model
- Data
- <!ELEMENT cname (#PCDATA)>
- "parsed character data"
- Elements
- sub-elements (child elements)
- Mixed content
- data and elements
- <!ELEMENT para (#PCDATA | sub | super)*>
- #PCDATA must be first in the content model; the group is a list of alternatives (|) followed by *
- child element sequence, choices and cardinality cannot be specified
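A short sketch of mixed content in use, based on the para declaration above. The sub and super declarations and the sentence itself are assumptions added for the example.

```xml
<?xml version="1.0"?>
<!DOCTYPE para [
  <!-- Mixed content: text freely interleaved with sub and super -->
  <!ELEMENT para (#PCDATA | sub | super)*>
  <!ELEMENT sub (#PCDATA)>
  <!ELEMENT super (#PCDATA)>
]>
<para>Water is H<sub>2</sub>O and the area is 5 m<super>2</super>.</para>
```

Because the model is a starred choice, the DTD cannot require that sub comes before super or limit how often either appears.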
14 Empty element and ANY
- <!ELEMENT image EMPTY>
- in the document instance, written as <image/>
- <image></image> is also accepted by XML 1.0, but nothing (not even white space) may appear between the tags; <image/> is preferred
- a regular declaration <!ELEMENT im (...)>
- allows <im/> and <im></im> (when the content model permits empty content) or <im>...</im>
- <!ELEMENT some ANY>
- element can contain character data and any declared elements
- flexible, maybe too flexible?
15 Short example: Dictionary
- <!ELEMENT dictionary (word_article+)>
- <!ELEMENT word_article (head_word, pronunciation, sense+)>
- <!ELEMENT head_word (#PCDATA)>
- <!ELEMENT pronunciation (#PCDATA)>
- <!ELEMENT sense (definition, example*)>
- <!ELEMENT definition (#PCDATA)>
- <!ELEMENT example (#PCDATA)>
16 Dictionary XML
<?xml version="1.0" ?>
<!DOCTYPE dictionary SYSTEM "dict.dtd">
<dictionary>
  <word_article>
    <head_word>carry</head_word>
    <pronunciation>kaeri</pronunciation>
    <sense>
      <definition>support the weight of and move from place to place</definition>
      <example>Railways and ships carry goods.</example>
      <example>He carried the news to everyone.</example>
    </sense>
    <sense>
      <definition>wear, possess</definition>
      <example>I never carry much money with me.</example>
    </sense>
  </word_article>
  <word_article>
    <head_word>gossamer</head_word>
    <pronunciation>gosomo</pronunciation>
    <sense>
      <definition>fine, silky substance of webs made by small spiders</definition>
    </sense>
  </word_article>
</dictionary>
19 Content models
- Try to make definitions unambiguous (clear)
- wrong: (item?, item)
- right: (item, item?)
- wrong: ((surname, employee) | (surname, customer))
- right: (surname, (employee | customer))
- <!ELEMENT BookCatalog (Catalog, Publisher+, Book*)>
- document must have <BookCatalog> top level element
- contains always one <Catalog> and at least one <Publisher> child
- <Book> elements may not be present
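The BookCatalog model described above (exactly one Catalog, one or more Publisher, zero or more Book) can be sketched as follows. The content of the child elements is not given in the slides, so declaring them as #PCDATA is an assumption, as are the text values.

```xml
<?xml version="1.0"?>
<!DOCTYPE BookCatalog [
  <!ELEMENT BookCatalog (Catalog, Publisher+, Book*)>
  <!-- Child content is not specified in the text; #PCDATA is assumed -->
  <!ELEMENT Catalog (#PCDATA)>
  <!ELEMENT Publisher (#PCDATA)>
  <!ELEMENT Book (#PCDATA)>
]>
<BookCatalog>
  <Catalog>Spring list</Catalog>
  <Publisher>Publisher A</Publisher>
  <Publisher>Publisher B</Publisher>
  <!-- no Book elements: still valid because of Book* -->
</BookCatalog>
```

Dropping both Publisher elements, or placing a Book before the Catalog, would make the instance invalid.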
20 Attribute declarations: ATTLIST
- Attributes can be used to describe the metadata or properties of the associated element
- attributes are also an alternative way to mark up data
- <!ATTLIST country
      population NMTOKEN #IMPLIED
      language CDATA #REQUIRED
      continent (Europe | America | Asia) "Europe">
- CDATA
- character data, any text
- enumerated values (choice list)
- <!ATTLIST country continent (Europe | America | Asia) "Europe">
- remember that XML is case sensitive
- default value given above
- the parser supplies the default value if the attribute is not given
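The effect of the declared default can be sketched with two country elements, one relying on the default and one setting every attribute. The countries wrapper element, the #PCDATA content and the attribute values are assumptions made for this example.

```xml
<?xml version="1.0"?>
<!DOCTYPE countries [
  <!-- countries is a hypothetical wrapper element -->
  <!ELEMENT countries (country+)>
  <!ELEMENT country (#PCDATA)>
  <!ATTLIST country
      population NMTOKEN #IMPLIED
      language   CDATA   #REQUIRED
      continent  (Europe | America | Asia) "Europe">
]>
<countries>
  <!-- continent omitted: a parser that reads the DTD reports continent="Europe" -->
  <country language="Finnish">Finland</country>
  <!-- all three attributes given explicitly -->
  <country language="Japanese" population="100" continent="Asia">Japan</country>
</countries>
```

Writing continent="asia" would be rejected by a validating parser, since enumerated values are matched case-sensitively.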
21 Attribute defaults
- #REQUIRED
- the attribute must appear in every instance of the element
- #IMPLIED
- optional
- enumerated values can have a default
- no default for implied/required
- <!ATTLIST catalog type CDATA #REQUIRED>
- <!ATTLIST catalog type NMTOKEN #IMPLIED>
- <!ATTLIST catalog type (phone | e-mail) #IMPLIED>
- <!ATTLIST catalog type (phone | e-mail) "phone">
22 Attribute types
- NMTOKEN
- name token: <country population="100">
- NMTOKENS
- a list of name tokens delimited by white space
- These types are useful primarily to processing applications. They are used to specify one or more valid names. You might use them when you are associating some other component with the element, such as a Java class or a security algorithm
- <!ATTLIST DATA AUTHORIZED_USERS NMTOKENS #IMPLIED>
- <DATA SECURITY="ON" AUTHORIZED_USERS="IggieeB SelenaS GuntherB">element content</DATA>
23 Attribute types
- ID
- attribute value is the unique identifier for this element instance, must be a valid XML name
- IDREF
- reference to the element whose ID has the same value as the IDREF
- IDREFS
- a list of IDREFs delimited by white space
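A minimal sketch of the three types working together. The staff and person elements and the attribute names id, manager and team are hypothetical, chosen only to show one ID and two kinds of reference.

```xml
<?xml version="1.0"?>
<!DOCTYPE staff [
  <!ELEMENT staff (person+)>
  <!ELEMENT person (#PCDATA)>
  <!ATTLIST person
      id      ID     #REQUIRED
      manager IDREF  #IMPLIED
      team    IDREFS #IMPLIED>
]>
<staff>
  <person id="p1">Anna</person>
  <!-- manager must match the id of some element in the document -->
  <person id="p2" manager="p1">Ben</person>
  <!-- team holds several references separated by white space -->
  <person id="p3" manager="p1" team="p1 p2">Cara</person>
</staff>
```

A validating parser should complain if two elements share an id, if manager points at a value no element carries, or if an id is not a valid XML name (for example a value starting with a digit).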
24 Attribute defaults
- <!ATTLIST country position CDATA #FIXED "independent">
- attribute must match the default value
- why? for example to supply a value for an application
- reserved attributes
- xml:lang
- xml:space
- prefix 'xml'
25 Element vs attribute
- When to mark up with an element, when to use attributes?
- Element
- to describe structures, expandable
- when shown in the output
- contents cannot be defined as strictly as with attributes
- Attribute
- no structure, no multiple values
- internal information
- default values possible
26 Entities
- Each XML document is an entity and may comprise several entities
- document entity
- "subdocument" entities
- general entities
- internal or external
- parsed or unparsed (external only)
- a parsed entity can include any well-formed content (replacement text)
- entity declaration
- entity reference
- all unparsed entities must have an associated notation
27 Internal text entities
- Predefined string
- <!ENTITY evitech "Espoo-Vantaa Institute of Technology">
- I study at the &evitech;
- Single versus double quotes
- <!ENTITY sent 'His foot is 12" long'>
- <!ENTITY sent "His foot is 12&quot; long">
- Entity reference: character string
- &gt; >
- &lt; <
- &quot; "
- &apos; '
- &amp; &
- &#60; <
- &#65; A
- &#x3C; < (hexadecimal)
- &#xFFF8; ... (Unicode)
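The references listed above can be combined in one small document. This is a sketch only; the message element is hypothetical, while the evitech entity follows the declaration shown in the slide.

```xml
<?xml version="1.0"?>
<!DOCTYPE message [
  <!ELEMENT message (#PCDATA)>
  <!-- Internal text entity: a named replacement string -->
  <!ENTITY evitech "Espoo-Vantaa Institute of Technology">
]>
<!-- &lt; and &amp; are predefined; &#65; and &#x41; both denote the letter A -->
<message>I study at the &evitech;. 3 &lt; 5 &amp; &#65; = &#x41;</message>
```

After parsing, the text content of message reads: I study at the Espoo-Vantaa Institute of Technology. 3 < 5 & A = A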
28 CDATA: special characters in XML
<action>
  <script language='Javascript'>
    <![CDATA[
      function Fhello() {
        if (n > 1 && m > 8)
          alert("Hello");
      }
    ]]>
  </script>
</action>
29 External entity
- Outside the document entity itself
- within the same resource
- <!ENTITY myfile SYSTEM "extra_files/file.xml">
- public location
- <!ENTITY myfile PUBLIC "... description...">
- needs an index (a catalog that maps the public identifier to a location)
- an unparsed external entity is a reference to an external resource (e.g. an image file)
- Binary file
- file type has to be declared
- <!ENTITY myphoto SYSTEM "/figures/photo.gif" NDATA GIF>
- Use: Take a look at my photo <picture name="myphoto"/>.
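30 Attribute types
- ENTITY
- <!ENTITY mypicture SYSTEM "123.jpg" NDATA JPEG>
- an ENTITY attribute must name an unparsed entity, so the declaration needs SYSTEM and NDATA (the notation name JPEG is assumed here)
- <!ELEMENT pic EMPTY>
- <!ATTLIST pic picfile ENTITY "mypicture">
- in the document instance <pic picfile="mypicture"/>
- ENTITIES
- a list of ENTITY names
- Notation
- <!ATTLIST image format NOTATION (TeX | TIFF) #IMPLIED>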
31 Entity references, summary
- General parsed entity reference
- in the document instance
- not in the DTD
- entity hierarchy
- unparsed entity
- no references from the text
- given as attribute values
- parameter entity
- in DTD, not in the document instance
32 Parameter entity
- Only usable within a DTD (not in an XML document)
- <!ENTITY % parapart "(emph | supersc | subsc)">
- <!ELEMENT paragraph (%parapart; | bold)>
- <!ELEMENT list (%parapart; | item)>
- the first declaration is equivalent to <!ELEMENT paragraph (emph | supersc | subsc | bold)>
33 Notation declaration
- <!NOTATION PIXI SYSTEM "">
- <!NOTATION TIFF SYSTEM "C:\APPS\Show_tiff.exe">
- Entity declaration refers to notation
- <!ENTITY Logo SYSTEM "logo.tif" NDATA TIFF>
- Notation tells an application how to process unparsed entities
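Putting the pieces together, a notation, an unparsed entity and an ENTITY attribute might be combined as in the sketch below. The page and pic elements and the picfile attribute are assumptions; the TIFF notation and the Logo entity follow the declarations above.

```xml
<?xml version="1.0"?>
<!DOCTYPE page [
  <!-- The notation names the format and a helper application -->
  <!NOTATION TIFF SYSTEM "C:\APPS\Show_tiff.exe">
  <!-- Unparsed entity: external file tagged with the notation -->
  <!ENTITY Logo SYSTEM "logo.tif" NDATA TIFF>
  <!ELEMENT page (pic)>
  <!ELEMENT pic EMPTY>
  <!ATTLIST pic picfile ENTITY #REQUIRED>
]>
<page>
  <!-- The attribute value is the entity name, without & and ; -->
  <pic picfile="Logo"/>
</page>
```

The parser does not read logo.tif itself; it only passes the entity's identifiers and notation to the application, which decides how to display the file.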
34 Without a DTD
- Attributes have no default values
- attributes are always text type CDATA
- all attributes are optional
- entities cannot be declared
- only the predefined entities are possible (e.g. &apos;)
- element contents are not clearly defined
- elements, data or mixed
35 DTD design
- XML often replaces a previous system
- when transforming to XML
- a standard DTD could be selected (with possible modifications)
- partners, affiliations
- a new DTD is designed
- DTD design based on
- existing document models in the company
- (representative) model documents
- other designers consulted
36 Document analysis
- Document features
- name, could it be without a name?
- how many occurrences
- preceding/following information, regularity
- parts of the document
- standard contents (automatically generated)
- an XML document (or parts of it) may be generated from a database
- use database relations, descriptions and models (UML) when designing the DTD
37 DTD design
- Standard DTD or new?
- Compatibility and data exchange
- processing needs, applications
- future needs, linking
- consistent names
- Element order, granularity, structure
- element vs. attribute?
- Rules? Order of rules ?
- comments?
- modularity?
- Naming style: short or descriptive, upper or lower case?
38 Tree diagrams
- <!ELEMENT farm (farmer, farm_hand)>
- <!ELEMENT farm (dairy_farm | forestry_farm)>
- [Tree diagrams of the two content models; node labels include dairy_farm, forestry_farm, name and #PCDATA]
39 Standard DTDs
- http://www.xml.com/pub/rg/DTDs
- http://www.xml.org/
- http://xml.coverpages.org/
- http://www.ebxml.org/specs/ebBPSS.dtd
- MathML,
- CML (chemistry),
- UXF (UML eXchange Format),
- SMIL (multimedia),
- RDF (Resource Description Framework),
- HumanML (natural language),
- DocBook, etc.