ITR3 lecture 2: XML

1
ITR3 lecture 2: XML
  • Thomas Krichel
  • 2002-10-16

2
Structure
  • URIs (we will come back to them in lecture 3)
  • XML
  • Sofix XML example

3
Literature
  • Castro, Elizabeth (2001). XML for the World Wide
    Web. Peachpit Press.
  • RFC 2396: Uniform Resource Identifiers (URI):
    Generic Syntax
  • http://openlib.org/home/krichel/lis900gp02i

4
Uniform Resource Identifiers (URI)
  • A Uniform Resource Identifier (URI) is a compact
    string of characters for identifying an abstract
    or physical resource.
  • They provide a simple and extensible means for
    identifying a resource.

5
Universal concept of resource
  • A resource can be anything that has identity.
    Not all resources are network "retrievable".
  • The resource identifier identifies a resource,
    not necessarily the state the resource is in at
    a particular point in time.

6
Benefits of uniformity
  • It allows different types of resource identifiers
    to be used in the same context, even when the
    mechanisms used to access those resources may
    differ
  • It allows uniform semantic interpretation of
    common syntactic conventions across different
    types of resource identifiers
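
Because the syntax is uniform, one generic parser can split identifiers of different schemes into the same components. A minimal sketch using Python's standard urllib.parse.urlsplit; apart from the openlib.org address, the URIs below are hypothetical examples.

```python
from urllib.parse import urlsplit

# Identifiers of different schemes, all split by one generic parser.
uris = [
    "http://openlib.org/home/krichel",
    "ftp://ftp.example.org/pub/notes.txt",   # hypothetical
    "mailto:someone@example.org",            # hypothetical
]

for uri in uris:
    parts = urlsplit(uri)
    # Each result exposes the same fields, whatever the access mechanism.
    print(parts.scheme, parts.netloc, parts.path)
```

The mailto URI simply has an empty network location; the parser still applies the same rules to it.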

7
Benefits of extensibility
  • It allows the introduction of new types of
    resource identifiers without interfering with the
    way that existing identifiers are used
  • It allows the identifiers to be reused in many
    different contexts, thus permitting new
    applications or protocols to leverage a
    pre-existing, large, and widely-used set of
    resource identifiers.

8
Transcribability
  • The URI syntax was designed with global
    transcribability as one of its main concerns.
  • A URI is a sequence of characters, not a sequence
    of bytes
  • A URI may be transcribed from a non-network
    source, and thus should consist of characters
    that are most likely to be able to be typed into
    a computer
  • A URI often needs to be remembered by people, and
    it is easier for people to remember a URI when it
    consists of meaningful components.
  • Therefore URIs use a restricted set of
    characters: a subset of US-ASCII.
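
Characters outside the allowed set have to be escaped as %XX sequences. A minimal sketch with Python's standard urllib.parse.quote, which by default percent-encodes the UTF-8 bytes of each character; note that this UTF-8 convention was standardized after RFC 2396 and is an assumption here, not something the RFC itself mandates. The helper name to_uri_path is ours.

```python
from urllib.parse import quote, unquote

def to_uri_path(text: str) -> str:
    """Percent-encode text so the result uses only US-ASCII URI characters."""
    # quote() leaves letters, digits, '_.-~' and '/' alone; everything else
    # (spaces, accented letters) becomes %XX escapes of its UTF-8 bytes.
    return quote(text)

for original in ["ITR3 lecture 2", "Saint-Saëns"]:
    encoded = to_uri_path(original)
    assert encoded.isascii()            # only US-ASCII remains
    assert unquote(encoded) == original  # the escaping is reversible
    print(original, "->", encoded)
```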

9
XML
  • Stands for eXtensible Markup Language
  • It is a recommendation by the World Wide Web
    Consortium (W3C). It is a new (1998) markup
    language that is expected to carry a lot of
    content over the Internet in the future.
  • In terms of complexity, it sits between HTML
    and SGML.

10
Importance of XML
  • XML will be, for the information industry, what
    the container is for international shipping.
  • A uniform syntactic convention for the encoding
    of any piece of information expressed as textual
    data (i.e. as characters)
  • The default character encoding is UTF-8, an
    encoding of Unicode.

11
HTML and XML
  • HTML comes with predefined tags such as HTML,
    HEAD, TITLE, BODY, H1, H2, P, UL, LI, IMG, A, EM,
    B, etc.
  • XML allows authors to use any tags.
  • XML has not yet replaced HTML. It lacks native
    support for images and links.

12
XML and SGML
  • SGML is the Standard Generalized Markup
    Language, an ISO standard (ISO 8879)
  • Very complicated, to the extent that no software
    implementing all of it has ever been written
  • The XML specification was written by SGML
    aficionados who were aware of its problems

13
Original design goals
  • XML shall be straightforwardly usable over the
    Internet.
  • XML shall support a wide variety of applications.
  • XML shall be compatible with SGML.
  • It shall be easy to write programs which process
    XML documents.
  • The number of optional features in XML is to be
    kept to the absolute minimum, ideally zero.
  • XML documents should be human-legible and
    reasonably clear.
  • The XML design should be prepared quickly.
  • The design of XML shall be formal and concise.
  • XML documents shall be easy to create.
  • Terseness in XML markup is of minimal importance.

14
Well-formed and valid XML
  • Every piece of data that wants to be XML has to
    obey a set of rules. Otherwise it is just not XML.
  • These rules ensure that the document is
    well-formed.
  • In addition, an XML document may obey other
    rules (such as those in a DTD); in that case it
    is called valid.
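
A quick well-formedness check can be sketched with Python's standard xml.etree.ElementTree, which refuses to parse anything that breaks the rules. This parser does not check validity; that would need a DTD- or schema-aware tool. The helper name is_well_formed is ours.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<sex>F</sex>"))     # True
print(is_well_formed("<sex>F</Sex>"))     # False: names are case-sensitive
print(is_well_formed("<a><b></a></b>"))   # False: improper nesting
```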

15
XML element
  • Syntax: <name>contents</name>
  • Where name is the name of the element and
    contents is the contents of the element.
  • <name> is called the opening tag
  • </name> is called the closing tag
  • Examples
  • <sex>F</sex>
  • <story>Once upon a time there was...</story>
  • Element names are case-sensitive. They must start
    with a letter or an underscore (_).
  • Element names must not start with "xml" in any
    capitalization.
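
The naming rules on this slide can be sketched as a small checker. This is a simplified, ASCII-only approximation under our own assumptions: the full XML 1.0 name grammar also allows ':' and many non-ASCII letters, and the helper is_simple_element_name is hypothetical.

```python
import re

# Simplified ASCII-only name pattern: a letter or underscore first,
# then letters, digits, underscores, periods or hyphens.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

def is_simple_element_name(name: str) -> bool:
    """Check a name against the slide's rules (ASCII approximation)."""
    if not _NAME.fullmatch(name):
        return False
    # Names starting with "xml" in any capitalization are reserved.
    return not name.lower().startswith("xml")

for candidate in ["sex", "_id", "2nd", "XMLdata", "story"]:
    print(candidate, is_simple_element_name(candidate))
```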

16
Attributes of XML elements
  • Are name/value pairs that further qualify element
    contents
  • Syntax: <name attribute_name="attribute_value">
    contents</name>
  • Example
  • <temperature unit="F">64</temperature>
  • <swearword language="fr">con</swearword>
  • Attribute names have to obey the same rules as
    element names.
  • Attribute values must be surrounded by single or
    double quotes.
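
Reading attributes back out can be sketched with xml.etree.ElementTree, using the slide's own examples: get() returns an attribute value, attrib holds them all as a dict, and an unquoted value is rejected.

```python
import xml.etree.ElementTree as ET

temp = ET.fromstring('<temperature unit="F">64</temperature>')
print(temp.tag, temp.get("unit"), temp.text)   # temperature F 64

# Single quotes are just as valid as double quotes.
word = ET.fromstring("<swearword language='fr'>con</swearword>")
print(word.attrib)                             # {'language': 'fr'}

# An unquoted attribute value is not well-formed XML.
try:
    ET.fromstring("<temperature unit=F>64</temperature>")
except ET.ParseError as err:
    print("rejected:", err)
```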

17
Empty elements
  • Elements that are empty may be written as
    <name/>. This is a shorthand for <name></name>.
  • Empty elements may have attributes.
  • Example
  • <grade value="A"/>
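
That the two spellings mean the same thing can be checked with xml.etree.ElementTree: both parse to an element with the same tag and attributes and no text. When writing, ElementTree uses the short form unless short_empty_elements=False is passed.

```python
import xml.etree.ElementTree as ET

short = ET.fromstring('<grade value="A"/>')
long_form = ET.fromstring('<grade value="A"></grade>')

# Both spellings give the same element: same tag, attributes, no text.
print(short.tag == long_form.tag, short.attrib == long_form.attrib)  # True True
print(short.text, long_form.text)                                    # None None

# On output ElementTree chooses the shorthand by default.
print(ET.tostring(long_form, encoding="unicode"))
print(ET.tostring(short, encoding="unicode", short_empty_elements=False))
```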

18
Processing instructions
  • They are instructions to the software reading the
    XML.
  • General syntax is
  • <?name attribute_name1="attribute_value1"
    attribute_name2="attribute_value2" ?>
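
How software sees a processing instruction can be sketched with Python's xml.dom.minidom, which keeps instructions from the prolog as nodes. The parser reports the name (the "target") and the rest as one unparsed string; splitting it into name/value pairs is left to the application. The xml-stylesheet instruction is a real convention, but the file name sofix.css is hypothetical.

```python
from xml.dom import minidom

text = ('<?xml-stylesheet type="text/css" href="sofix.css"?>'
        '<item/>')
doc = minidom.parseString(text)

pi = doc.firstChild  # the instruction sits in the prolog, before <item>
print(pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE)  # True
print(pi.target)     # xml-stylesheet
print(pi.data)       # type="text/css" href="sofix.css"
```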

19
Comments
  • Start with <!--
  • End with -->
  • May not contain a double hyphen (--)
  • Comments may not be nested, i.e. no comments
    inside other comments.
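
The comment rules can be tried out with xml.etree.ElementTree, which rejects a "--" inside a comment; an attempt at nesting fails for the same reason, since the inner "<!--" contains "--". By default this parser also discards comments from the tree. The helper name parses is ours.

```python
import xml.etree.ElementTree as ET

def parses(text: str) -> bool:
    """Return True if text is accepted by the parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(parses("<a><!-- a comment --></a>"))             # True
print(parses("<a><!-- bad -- comment --></a>"))        # False: contains --
print(parses("<a><!-- outer <!-- inner --> --></a>"))  # False: no nesting

# Comments are dropped by default, so <a> ends up with no children.
print(len(ET.fromstring("<a><!-- a comment --></a>")))  # 0
```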

20
Nesting elements
  • Elements are allowed to contain other elements.
  • Elements that contain other elements are called
    parent elements.
  • Elements that are contained in another element
    are children of that element.
  • Elements must be properly nested, i.e. child
    element closing tag must appear before parent
    element closing tag.
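
Parent and child relations can be explored with xml.etree.ElementTree: iterating over an element yields its children. ElementTree keeps no pointer from child to parent, so a common workaround, sketched here, is to build a child-to-parent dictionary. The sample document reuses element names from the Sofix example later in this lecture.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("<item><work><track/><track/></work></item>")

# Children: the elements directly contained in a parent.
work = doc.find("work")
print([child.tag for child in work])   # ['track', 'track']

# ElementTree has no parent pointers, so map each child to its parent.
parent_of = {child: parent for parent in doc.iter() for child in parent}
first_track = work[0]
print(parent_of[first_track].tag)      # work
```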

21
Root and prolog
  • There must be exactly one root element that
    contains all other elements in the document.
  • The prolog is what appears before the root
    element.
  • The prolog may contain the XML declaration.
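
The single-root rule can be demonstrated with xml.etree.ElementTree: a prolog (here a declaration and a comment) followed by one root element parses fine, while two top-level elements are rejected.

```python
import xml.etree.ElementTree as ET

# Prolog (declaration plus a comment) followed by the single root element.
doc = ET.fromstring('<?xml version="1.0"?>\n<!-- prolog comment -->\n<item/>')
print(doc.tag)  # item

# Two top-level elements: no single root, so not well-formed.
try:
    ET.fromstring("<item/><item/>")
except ET.ParseError as err:
    print("rejected:", err)
```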

22
XML declaration
  • The XML declaration is a special case of a
    processing instruction; it is written as
  • <?xml version="1.0"?>
  • If the XML declaration is present, it must come
    first in the document, with nothing (not even
    whitespace) before it.
  • You can declare your character encoding in the
    XML declaration, like
  • <?xml version="1.0" encoding="ucs-2"?>
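
The effect of the declaration can be sketched with xml.etree.ElementTree by handing it raw bytes. UTF-16 is used here instead of UCS-2 on the assumption that the parser bundled with Python (expat) supports UTF-16 out of the box; UCS-2 may not be available there. The second part shows that even a leading newline before the declaration is an error.

```python
import xml.etree.ElementTree as ET

# Encode a document as UTF-16 bytes, declaring that encoding up front.
text = '<?xml version="1.0" encoding="UTF-16"?><name>Saint-Saëns</name>'
data = text.encode("utf-16")   # Python adds a byte-order mark

root = ET.fromstring(data)     # the parser decodes the bytes itself
print(root.text)               # Saint-Saëns

# The declaration must be the very first thing in the document.
try:
    ET.fromstring(b'\n<?xml version="1.0"?><name/>')
except ET.ParseError as err:
    print("rejected:", err)
```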

23
Quote special symbols
  • & is written as &amp;
  • < is written as &lt;
  • > is written as &gt;
  • " is written as &quot;
  • ' is written as &apos;
  • Example: <story content="she pronounced the
    &quot;l-word&quot;"/>
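
Escaping by hand is error-prone, so libraries provide helpers. A minimal sketch with Python's xml.sax.saxutils.escape, which always replaces &, < and >, and replaces quotes only when they are passed in its extra entities mapping. Parsing the result with ElementTree turns the entities back into characters.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# escape() handles &, < and >; quotes are added through the entities map.
quotes = {'"': "&quot;", "'": "&apos;"}
value = escape('she pronounced the "l-word"', quotes)
print(value)  # she pronounced the &quot;l-word&quot;

# Put the escaped text into an attribute and read it back.
story = ET.fromstring(f'<story content="{value}"/>')
print(story.get("content"))  # she pronounced the "l-word"
```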

24
Document Type Definition DTD
  • DTDs are a legacy SGML tool to further define and
    refine the contents of an XML document. XML can
    be defined by an SGML
  • Still in use by the technologically conservative.
  • Not covered here, because there are more powerful
    replacements (such as XML Schema).

25
Example application sofix
  • Sofix is an XML based cataloging format for
    classical music CDs.
  • It is named after Sophie C. Rigny.
  • It is a creation of Thomas Krichel.
  • Used for teaching purposes only.

26
Key concepts in Sofix
  • Item: an individual CD or a collection of CDs
    kept physically together (i.e. sold together)
  • Work: a piece of music as recorded on a CD. For
    simplicity, we do not distinguish between
    composition and recording of that composition.
  • Track: semantics associated with the physical
    separation of tracks on the disc

27
Sofix in XML
  • <item>
  •   <work>
  •     <track>
  •     </track>
  •   </work>
  • </item>
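
The item, work and track nesting can be generated rather than typed. A minimal sketch with xml.etree.ElementTree, where SubElement creates a child inside a given parent; ET.indent, used for pretty-printing, assumes Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

# Build the item > work > track nesting programmatically.
item = ET.Element("item")
work = ET.SubElement(item, "work")
ET.SubElement(work, "track")
ET.SubElement(work, "track")

ET.indent(item)  # adds line breaks and indentation (Python 3.9+)
print(ET.tostring(item, encoding="unicode"))
```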

28
Sofix general rules
  • Record all titles in English. If no English title
    is provided, use a translation if it is obvious.
    If the translation is not obvious, use the
    original language.
  • Record all personal names as "Lastname,
    Firstname".
  • Record translatable names in English.

29
Contents of <item>
  • <labelname>name of label</labelname>
  • <number>number of the CD</number>
  • (followed by the works on the CD)

30
Contents of <work>
  • <title>title of the work</title>
  • <compositionyear>year when the work was
    composed</compositionyear>
  • <recordingyear>year when the recording was
    made</recordingyear>
  • <contributor role="contributor role">name of
    contributor</contributor>
  • Possibly many contributors, followed by a series
    of tracks

31
Contributor roles
  • alto, alto_sax, bariton, bass, bassoon,
    chamber orchestra, cello, choir, choir_master,
    clarinett, composer, conductor, flute,
    french_horn, horn, oboe, orchestra, organ, piano,
    piano_trio, prepared_piano, recorder, soprano,
    speaker, string_orchestra, string_quartett,
    viola, violin, xylophone
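
Because roles come from a fixed list, a cataloguing tool can flag values that are not on it. A minimal sketch with xml.etree.ElementTree; the role strings are copied verbatim from this slide, while the helper unknown_roles and the contributor names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Role values exactly as listed on the slide (spellings kept verbatim).
SOFIX_ROLES = {
    "alto", "alto_sax", "bariton", "bass", "bassoon", "chamber orchestra",
    "cello", "choir", "choir_master", "clarinett", "composer", "conductor",
    "flute", "french_horn", "horn", "oboe", "orchestra", "organ", "piano",
    "piano_trio", "prepared_piano", "recorder", "soprano", "speaker",
    "string_orchestra", "string_quartett", "viola", "violin", "xylophone",
}

def unknown_roles(work: ET.Element) -> list:
    """Return role values of <contributor> elements not on the list."""
    return [c.get("role") for c in work.iter("contributor")
            if c.get("role") not in SOFIX_ROLES]

work = ET.fromstring(
    "<work>"
    '<contributor role="composer">Doe, Jane</contributor>'
    '<contributor role="banjo">Roe, Richard</contributor>'
    "</work>"
)
print(unknown_roles(work))  # ['banjo']
```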

32
Contents of <track>
  • <title>full title as given on the CD</title>
  • <time>minutes:seconds</time>
  • where minutes and seconds are numbers.
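
Putting the pieces together, a complete record can be read back with xml.etree.ElementTree. A minimal sketch: every value in the record is invented for illustration, and times are assumed to be written as minutes:seconds. The helper seconds converts each track time, and the work's playing time is the sum over its tracks.

```python
import xml.etree.ElementTree as ET

# A hypothetical Sofix record; all values are invented for illustration.
RECORD = """<item>
  <labelname>Example Label</labelname>
  <number>EX 001</number>
  <work>
    <title>Suite in D</title>
    <compositionyear>1900</compositionyear>
    <recordingyear>2001</recordingyear>
    <contributor role="composer">Doe, Jane</contributor>
    <contributor role="piano">Roe, Richard</contributor>
    <track><title>Prelude</title><time>3:05</time></track>
    <track><title>Finale</title><time>12:40</time></track>
  </work>
</item>"""

def seconds(mmss: str) -> int:
    """Convert an assumed minutes:seconds string to a number of seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + int(secs)

item = ET.fromstring(RECORD)
for work in item.iter("work"):
    total = sum(seconds(t.findtext("time")) for t in work.iter("track"))
    # findtext("title") matches only the work's own <title> child.
    print(work.findtext("title"), total // 60, "min", total % 60, "s")
```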

33
http://openlib.org/home/krichel
  • Thank you for your attention!