Title: XML and XSL


1
XML and XSL
  • A report on the workshop given by Shaoping Moss
    on October 16, 2004

...with additional examples from a real-life
project
Presented by ASIST members Caryn Anderson,
Prairie Clayton, and Kara Schwartz at Simmons
College, November 1, 2004
2
Topics discussed
  • SGML, XML, and HTML
  • XML and XSL Basics
  • XML in Libraries and Academics
  • XML in Future Web Development
  • Slide content courtesy of Shaoping Moss.

3
Markup Languages
  • Address the structure of a document.
  • Identify different components of the document.
  • Convey information to software that will allow it
    to:
      • Index the data for searching.
      • Render the data.
      • Transform the data.
  • SGML, XML, and HTML are all markup languages.
  • Slide content courtesy of Shaoping Moss.

4
Document, Structure, and Format
  • A document is:
      • A record which contains information, originally
        an inscribed or written record but now considered
        to include any format in which information might
        be held (e.g. map, manuscript, tape, video,
        software). (International Encyclopedia of
        Information and Library Science)
      • A collection of small elements, which can be
        headings, subheadings, paragraphs, quotations,
        etc.
  • Structure vs. Format:
      • Structure is about the content of the document.
      • Format is about the way a document looks.
  • Slide content courtesy of Shaoping Moss.

5
What is SGML?
  • Stands for Standard Generalized Markup Language.
  • Initiated by Charles Goldfarb at IBM in the
    1960s.
  • Adopted as a standard of the International
    Organization for Standardization (ISO 8879) in
    1986.
  • Slide content courtesy of Shaoping Moss.

6
SGML and Its Subdivisions
  • SGML is composed of tag-set building rules.
  • SGML has given birth to several subdivisions:
      • HTML and XML.
      • CALS for defense.
      • BOEING for commercial airlines.
      • C-H for publishing.
      • OED for the Oxford English Dictionary.
      • TEI guidelines for the Text Encoding Initiative.
      • EAD for Encoded Archival Description.
  • Slide content courtesy of Shaoping Moss.

7
HTML Development
  • HTML stands for Hypertext Markup Language.
  • HTML was developed by Tim Berners-Lee at CERN, a
    physics lab near Geneva, Switzerland, in the
    early 1990s.
  • Its simplicity has contributed to the rapid
    growth of the World Wide Web in the 1990s.
  • HTML version 4 came out in 1997.
  • XHTML 1.0 is the latest HTML standard.
  • Slide content courtesy of Shaoping Moss.

8
HTML Problems
  • HTML is easy to code loosely, which makes
    documents harder for browsers to handle.
  • Tags are predefined in HTML.
  • Format and content are mixed and content is hard
    to reuse.
  • Slide content courtesy of Shaoping Moss.

9
What is XML?
  • XML is a new Web standard developed by the World
    Wide Web Consortium in 1998.
  • XML stands for eXtensible Markup Language.
  • XML was designed to describe data.
  • XML tags are not predefined in XML.
  • XML separates format from content and semantic
    structure.
  • Data encoded in XML can function much like a
    traditional database.
  • XML content can be output in many formats, such
    as XHTML, text, Word documents, PDF, etc.
  • Slide content courtesy of Shaoping Moss.

10
The Display of the Document
My First XML
Chapter 1 Introduction to XML
What is HTML?
What is XML?
Chapter 2 XML Syntax
Elements must have a closing tag
Elements must be properly nested
Slide content courtesy of Shaoping Moss.
11
An HTML Document
An HTML document describes the book:
<h1>My First XML</h1>
<h2>Introduction to XML</h2>
<p>What is HTML?</p>
<p>What is XML?</p>
<h2>XML Syntax</h2>
<p>Elements must have a closing tag.</p>
<p>Elements must be properly nested.</p>
Slide content courtesy of Shaoping Moss.
12
An XML Document
An XML document describes the book:
<book>
  <title>My First XML</title>
  <chapter>Introduction to XML
    <para>What is HTML?</para>
    <para>What is XML?</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag.</para>
    <para>Elements must be properly nested.</para>
  </chapter>
</book>
Slide content courtesy of Shaoping Moss, 2004
13
HTML Elements/Tags
An HTML document describes the book:
<h1>My First XML</h1>
<h2>Introduction to XML</h2>
<p>What is HTML?</p>
<p>What is XML?</p>
<h2>XML Syntax</h2>
<p>Elements must have a closing tag.</p>
<p>Elements must be properly nested.</p>
  • HTML elements/tags are:
      • defined by the HTML standard
      • always the same
      • can be used in any order

Original slide content courtesy of Shaoping Moss.
14
XML Elements/Tags
An XML document describes the book:
<book>
  <title>My First XML</title>
  <chapter>Introduction to XML
    <para>What is HTML?</para>
    <para>What is XML?</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag.</para>
    <para>Elements must be properly nested.</para>
  </chapter>
</book>
  • XML elements/tags are:
      • defined by user/groups (DTD/Schema)
      • different for each DTD/Schema
      • hierarchical (tree structure)

Original slide content courtesy of Shaoping Moss.
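
The "hierarchical (tree structure)" point can be made concrete with a short sketch. It is written in Python using the standard library's xml.etree.ElementTree module, which is a choice made here and not something the workshop used; the function names outline and render_outline are hypothetical. It walks the book document and records how deep each element sits in the tree.

```python
import xml.etree.ElementTree as ET

# The book document from the slide above.
BOOK_XML = """\
<book>
  <title>My First XML</title>
  <chapter>Introduction to XML
    <para>What is HTML?</para>
    <para>What is XML?</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag.</para>
    <para>Elements must be properly nested.</para>
  </chapter>
</book>
"""


def outline(element, depth=0):
    """Return (depth, tag) pairs for an element and all of its descendants."""
    rows = [(depth, element.tag)]
    for child in element:  # iterating an element yields its direct children
        rows.extend(outline(child, depth + 1))
    return rows


def render_outline(xml_text):
    """Render the element tree as an indented outline, one tag per line."""
    root = ET.fromstring(xml_text)
    return "\n".join("  " * depth + tag for depth, tag in outline(root))


if __name__ == "__main__":
    print(render_outline(BOOK_XML))
```

Each level of indentation in the printed outline corresponds to one level of nesting in the XML, which is the structure a DTD or Schema describes.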
15
XML is flexible and extensible
An XML document describes the book for a
different user group:
<manuscript>
  <name>My First XML</name>
  <part>Introduction to XML
    <section>What is HTML?</section>
    <section>What is XML?</section>
  </part>
  <part>XML Syntax
    <section>Element Rules</section>
    <para>Elements must have a closing tag.</para>
    <para>Elements must be properly nested.</para>
  </part>
</manuscript>
  • <manuscript> instead of <book>
  • Extended to accommodate greater detail of part,
    section, AND paragraph
Original slide content courtesy of Shaoping Moss.
16
Differences between HTML and XML
XML is not a replacement for HTML. XML and HTML
were designed with different goals.
  - XML was designed to describe data and to focus on
    what data is.
  - HTML was designed to display data and to focus on
    how data looks.
HTML structure and tags are very loose while XML
structure and tags are strict.
  - XML documents must be well-formed.
  - XML elements must be properly nested.
  - All XML elements must be closed.
  - Tag names must be case consistent (the opening
    and closing tags must match exactly).
Slide content courtesy of Shaoping Moss.
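
The strictness rules on this slide can be tried out with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module (an assumption made for illustration, not a tool named in the workshop); the helper name is_well_formed and the sample strings are hypothetical. A parser that follows the XML rules rejects each of the three broken samples.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One sample per rule on the slide.
GOOD = "<book><title>My First XML</title></book>"
BAD_NESTING = "<book><title>My First XML</book></title>"  # not properly nested
NOT_CLOSED = "<book><title>My First XML</title>"          # <book> never closed
BAD_CASE = "<book><Title>My First XML</title></book>"     # <Title> vs </title>

if __name__ == "__main__":
    for sample in (GOOD, BAD_NESTING, NOT_CLOSED, BAD_CASE):
        print(is_well_formed(sample), sample)
```

An HTML browser would typically try to display all four samples; an XML parser stops at the first violation, which is what "strict" means here.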
17
Differences: HTML vs. XML
Content
  • HTML: Held in generic containers (<h1>, <p>, etc.)
  • XML: Held in specific containers that describe
    what the data is (<book>, <chapter>, etc.)
Format
  • HTML: In the default format of the content tag OR
    as defined by a Cascading Style Sheet (internal
    or external)
  • XML: XSLT files define the formats of each section
    (i.e. font, color, size, etc.); multiple XSLTs
    for same XML
Selection
  • HTML: All content always included (no option to
    easily select or suppress content; must manually
    change document)
  • XML: XSLT selects and determines order of display
    of content
Organization
  • HTML: Content only displayed in the order written
    (to change order you must manually change
    document)
  • XML: Multiple XSLTs for same XML (one to produce
    just book title list, one to display full text,
    one for citations, etc.)

18
Differences: HTML vs. XML (analogy)
HTML
  • Analogy: Address List in plain WORD document
  • What you can get: One document of your list of
    contacts with all the information that you have
    for each person in the order you typed it.
XML
  • Analogy: Address List in database or MAIL MERGE
    data file
  • What you can get:
      • Friends & Family with full addresses for Holiday
        cards
      • E-mail list of just Professional contacts for
        announcing new product
      • Special formatting of whole list for better
        display on PDA
      • Etc. etc. etc. all from SAME XML document
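
To illustrate the address-list analogy, here is a minimal sketch of pulling two different lists out of one XML document. It uses Python's standard xml.etree.ElementTree module in place of XSLT, purely for illustration. Everything in it is hypothetical: the contacts element names, the group attribute, the people, and the function names.

```python
import xml.etree.ElementTree as ET

# Hypothetical contact data; the element names, the "group" attribute,
# and the people are invented for this example.
CONTACTS_XML = """\
<contacts>
  <contact group="family">
    <name>Ann Example</name>
    <address>1 Main St, Springfield</address>
    <email>ann@example.org</email>
  </contact>
  <contact group="professional">
    <name>Bob Sample</name>
    <address>2 Oak Ave, Shelbyville</address>
    <email>bob@example.com</email>
  </contact>
  <contact group="friends">
    <name>Cy Placeholder</name>
    <address>3 Elm Rd, Capital City</address>
    <email>cy@example.net</email>
  </contact>
</contacts>
"""


def holiday_labels(root):
    """Names and full addresses for friends and family only."""
    return [
        f"{c.findtext('name')}, {c.findtext('address')}"
        for c in root.findall("contact")
        if c.get("group") in ("friends", "family")
    ]


def professional_emails(root):
    """E-mail addresses of professional contacts only."""
    return [
        c.findtext("email")
        for c in root.findall("contact[@group='professional']")
    ]


if __name__ == "__main__":
    contacts = ET.fromstring(CONTACTS_XML)
    print(holiday_labels(contacts))
    print(professional_emails(contacts))
```

The data is typed once; each function plays the role that a separate XSL file would play in the workshop's setup, selecting and arranging only what one kind of output needs.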

19
How to Build an XML file family
  • Establish the Document Type Definition (DTD) or
    Schema
  • Write a well-formed XML document that holds your
    data in the containers established by your
    DTD/Schema
  • Validate your XML document to make sure you
    conformed to your DTD/Schema
  • Build as many different XSL documents as you need
    to select data from your XML file, organize it
    the way you want it to appear, and format it so
    it looks the way you want.

Now you can link your XML file to whatever XSL
you want to get the kind of display you want at
any given time.
20
The XML family unit of files and languages
WEB PAGE: http://www.mysite.org/myfile.xml
  1. Calls the .xml file
  2. Calls .xsl for display instructions
  3. Looks in .xml for content
  4. Returns content to .xsl
  5. Displays content to browser
XML (file type .xml): Where the data is held
XSL (file type .xsl): Instructions for using XML
data and displaying it
DTD or Schema (file types .dtd, .xml (schemas)):
The organizational chart for the data. For
validation during creation.
Languages used in XSLT documents during creation:
  • HTML for formatting
  • XSLT to select data from the .xml file and
    format it
  • XPath to access certain spots in the .xml file
  • XSL-FO for specifying formatting semantics (?)
21
The DTD or Schema
+ means there can be as many of this element as
you want (one or more; * would also allow none)
<!ELEMENT booklist (book+)>
<!ELEMENT book (booktitle,author+,country,publisher,price,year)>
<!ELEMENT booktitle (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT year (#PCDATA)>
The DTD establishes the hierarchy of
elements/tags.
Original file content courtesy of Shaoping Moss.
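
A DTD content model is essentially a pattern over the names of an element's children. The sketch below shows that idea by hand in Python; it is not DTD validation (Python's standard library does not include a validating parser), and it assumes the book model allows one or more author elements, as the two-author example in this deck suggests. The names BOOK_MODEL and book_matches_model are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed content model for <book>, written as a regular expression over
# child tag names: booktitle, one or more author, then the remaining
# elements in a fixed order. Each name is followed by a comma.
BOOK_MODEL = re.compile(r"booktitle,(author,)+country,publisher,price,year,")


def book_matches_model(book):
    """Check the order and count of a <book> element's direct children."""
    child_names = "".join(child.tag + "," for child in book)
    return BOOK_MODEL.fullmatch(child_names) is not None


# A book with two authors fits the model; a book missing most of the
# required elements does not.
GOOD_BOOK = ET.fromstring(
    "<book><booktitle>T</booktitle><author>A</author><author>B</author>"
    "<country>USA</country><publisher>P</publisher>"
    "<price>19.95</price><year>2000</year></book>"
)
BAD_BOOK = ET.fromstring(
    "<book><booktitle>T</booktitle><year>2000</year></book>"
)

if __name__ == "__main__":
    print(book_matches_model(GOOD_BOOK), book_matches_model(BAD_BOOK))
```

A validating parser does this kind of check for every element in the document, using the rules it reads from the DTD or Schema.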
22
The XML document
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE booklist SYSTEM "dtdforbooklist.dtd">
<?xml-stylesheet type="text/xsl" href="xslforbooklist.xsl"?>
<booklist>
  <book>
    <booktitle>HTML and XHTML: the Definitive Guide</booktitle>
    <author>Chuck Musciano</author>
    <author>Bill Kennedy</author>
    <country>USA</country>
    <publisher>O'Reilly</publisher>
    <price>19.95</price>
    <year>2000</year>
  </book>
  <book>
    <booktitle>XHTML 1.0 Language Sourcebook</booktitle>
    <author>Ian S. Graham</author>
    <country>USA</country>
    <publisher>John Wiley and Sons</publisher>
    <price>30.00</price>
    <year>2000</year>
  </book>
</booklist>
The <!DOCTYPE ...> line shows which DTD is being
used. The name after DOCTYPE must match the root
element (booklist).
The <?xml-stylesheet ...?> line shows which XSL is
being used.
Original file content courtesy of Shaoping Moss.
23
The XSL document
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
  <html>
  <body>
    <h1>My Book Collection</h1>
    <table border="1">
      <tr bgcolor="#9acd32">
        <th>Title</th>
        <th>Author</th>
        <th>Publisher</th>
        <th>Country</th>
        <th>Price</th>
      </tr>
      <xsl:for-each select="booklist/book">
        <xsl:sort select="publisher"/>
        <xsl:if test="year &gt; 1995">
          <tr>
            <td><xsl:value-of select="booktitle"/></td>
            <td><xsl:value-of select="author"/></td>
            <td><xsl:value-of select="publisher"/></td>
            <td><xsl:value-of select="country"/></td>
            <td><xsl:value-of select="price"/></td>
          </tr>
        </xsl:if>
      </xsl:for-each>
    </table>
  </body>
  </html>
</xsl:template>
</xsl:stylesheet>
xsl:template is XSLT for "use the template
below"
match is XPath for "link to" or "start with",
and / means the root of the document (the node
that contains booklist in this case)
This is basic HTML for the template
xsl:for-each with the select instruction is
XSLT for "select from each of the books in the
booklist"
xsl:sort with the select instruction is XSLT
for "sort by publisher"
xsl:if with the test instruction is XSLT for
"only those books when the year is later than
1995"
xsl:value-of with the select instruction is
XSLT for "use the data from this element"
You must close your XSLT commands
You must close the HTML tags of your template
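
For readers who think more easily in a programming language, the sketch below restates what the stylesheet's instructions do, step by step, in Python with the standard xml.etree.ElementTree module. It is an analogue written for this report, not XSLT and not the workshop's method; the names FIELDS and table_rows are hypothetical, and the book data is copied from the XML document slide.

```python
import xml.etree.ElementTree as ET

BOOKLIST_XML = """\
<booklist>
  <book>
    <booktitle>HTML and XHTML: the Definitive Guide</booktitle>
    <author>Chuck Musciano</author>
    <author>Bill Kennedy</author>
    <country>USA</country>
    <publisher>O'Reilly</publisher>
    <price>19.95</price>
    <year>2000</year>
  </book>
  <book>
    <booktitle>XHTML 1.0 Language Sourcebook</booktitle>
    <author>Ian S. Graham</author>
    <country>USA</country>
    <publisher>John Wiley and Sons</publisher>
    <price>30.00</price>
    <year>2000</year>
  </book>
</booklist>
"""

# Column order of the table in the stylesheet.
FIELDS = ["booktitle", "author", "publisher", "country", "price"]


def table_rows(xml_text, after_year=1995):
    """Return one list of cell values per book, as the stylesheet would."""
    root = ET.fromstring(xml_text)  # root is the <booklist> element
    # like xsl:for-each select="booklist/book"
    books = root.findall("book")
    # like xsl:sort select="publisher"
    books.sort(key=lambda book: book.findtext("publisher"))
    rows = []
    for book in books:
        # like xsl:if test="year > 1995"
        if int(book.findtext("year")) > after_year:
            # like xsl:value-of: findtext takes the first matching child,
            # so only the first author of a book is shown
            rows.append([book.findtext(field) for field in FIELDS])
    return rows


if __name__ == "__main__":
    for row in table_rows(BOOKLIST_XML):
        print(" | ".join(row))
```

Note that both this sketch and xsl:value-of in XSLT 1.0 use only the first author when a book has several; listing every author would need a nested loop over the author elements.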
24
The Web Page
Original file content courtesy of Shaoping Moss.
25
Done! not so hard
  • Logical
  • Flexible
  • Extensible
  • Interoperable!!

26
XML in Libraries
  • Use XML to map MARC to MARC XML, HTML, or
    MODS formats
  • MARC XML Conversion Stylesheets
  • Use XML to improve searching of archival finding
    aids and to catalog Web sites: Five College
    Archives & Manuscript Collections.
  • http://asteria.fivecolleges.edu/index.html
  • XML-based eScholarship.
  • http://escholarship.cdlib.org/
  • Use XML for interlibrary loan.
  • XML-based database systems.
  • Slide content courtesy of Shaoping Moss.

27
XML in Academics
  • Text Encoding Initiative (TEI)
  • http://www.tei-c.org/
  • Initially launched in 1987, TEI is an
    international and interdisciplinary standard
    for encoding, keeping, and analyzing the textual
    content and structure of digital texts.
  • This standard is designed for use with a broad
    range of text types, especially in the
    humanities. It is widely used in libraries,
    archives, and by publishers and researchers for
    online research and teaching and for the storage
    and exchange of large and small text collections.
  • Since 1987, TEI projects have mushroomed in all
    humanities disciplines, including language,
    literature, history, classics, social science and
    computer science.
  • Slide content courtesy of Shaoping Moss.

28
TEI projects
  • Women Writers Project.
  • http://www.wwp.brown.edu
  • Perseus Digital Library.
  • http://www.perseus.tufts.edu/
  • Early American Fiction Collection.
  • http://etext.lib.virginia.edu/eaf/pubindex.html
  • American Memory Project: Historical Collections
    for the National Digital Library.
  • http://lcweb2.loc.gov/ammem/ammemhome.html
  • The Newton Papers Project.
  • http://www.newtonproject.ic.ac.uk
  • Slide content courtesy of Shaoping Moss.

29
XML is Going to Be Everywhere
  • TEI guidelines for the Text Encoding Initiative
  • http://www.tei-c.org/Guidelines2/index.html
  • EAD for Encoded Archival Description
  • http://www.loc.gov/ead/
  • The Dublin Core Metadata Initiative (DCMI)
  • http://dublincore.org/
  • MARC XML: MARC 21 XML Schema
  • http://www.loc.gov/standards/marcxml/
  • MODS XML: Metadata Object Description Schema
  • http://www.loc.gov/standards/mods
  • Slide content courtesy of Shaoping Moss.

30
XML is Going to Be Everywhere
  • Resource Description Framework (RDF)
  • Information and Content Exchange (ICE)
  • Online Information Exchange (ONIX)
  • Metadata for Images in XML (MIX)
  • XML/EDI (Electronic Data Interchange)
  • Bioinformatic Sequence Markup Language (BSML)
  • Mathematical Markup Language (MathML)
  • Slide content courtesy of Shaoping Moss.

31
XML in Future Web Development
  • XML is a cross-platform, software and hardware
    independent tool for transmitting information.
  • XML will be as important to the future of the Web
    as HTML has been to the foundation of the Web.
  • XML will become the most common tool for all data
    manipulation and data transmission.
  • Every serious Web technology is now expected to
    define its relationship to XML.
  • Slide content courtesy of Shaoping Moss.

32
XML in Future Web Development
  • "Every serious Web technology is now expected to
    define its relationship to XML."
    (Catherine Ebenezer in Trends in Integrated
    Library Systems)
  • Slide content courtesy of Shaoping Moss.

33
Shaoping Moss
Information Technology Consultant
Research and Instructional Support
Mount Holyoke College
Email: smoss_at_mtholyoke.edu
Phone: 413.538.3034
Fax: 413.538.3112
We are grateful to Shaoping Moss for being such
an excellent instructor and giving us permission
to use her slides and materials in this
presentation.
34
So this XML stuff is rad and all, but could I see
why I'd want to learn it and not just an encoding
set like EAD?
35
Well, suppose you've got a batch of metadata on
your hands. Not just any metadata, but some
weird set of information that can't really be
shoehorned into your pal MARC 21. You need some
way of organizing the metadata. It would be nice
if you could make the metadata look all pretty
and whatnot, while you're at it.
36
Here's where XML comes in!
  • Get your metadata together, having done all the
    sexy stuff like data dictionary creation first
  • Define labels for everything
  • Match related terms, including subordinates
  • Define your rules (Y can only appear after X, and
    if you have X and Y, you must have Z, but Q is
    optional, etc.)
  • You've pretty much just made up a schema right
    there
  • Wait, what was that about making it pretty?

37
Oh, right, it should be attractive. Well, then
you just start playing with XSL.
<LINK REL="STYLESHEET" TYPE="text/css"
HREF="./games.css" TITLE="MASTER"/>
Specifically, you tell the XSL to go look at the
plain ol' stylesheet you've adapted from a
thousand other HTML pages.
38
So then you've got this.
39
Hey, wait. I thought you said this was all
cross-platform and cross-browser. How come this
isn't parsing in my browser? And how do I search
individual records? You mean I have to hand
encode every record?
Well, yes. You can write your own parser, export
encoded records from a database, or create a
search engine if you like. You'll just need more
than a semester's worth of practice to do it.
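
As a rough idea of what "export encoded records from a database" and "search individual records" might look like, here is a minimal sketch in Python using the standard xml.etree.ElementTree module. It is not the project's actual code. The rows, the games and game element names, and both function names are hypothetical; the list of dictionaries stands in for the result of a real database query.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows, standing in for the result of a database query.
ROWS = [
    {"title": "Chess", "players": "2"},
    {"title": "Bridge", "players": "4"},
]


def rows_to_xml(rows, root_tag="games", record_tag="game"):
    """Encode each row as one record element with one child per field.

    Field names are used as tag names, so they must be valid XML names.
    """
    root = ET.Element(root_tag)
    for row in rows:
        record = ET.SubElement(root, record_tag)
        for field, value in row.items():
            ET.SubElement(record, field).text = str(value)
    return root


def find_records(root, field, value):
    """Return the records whose `field` child has exactly the text `value`."""
    return [record for record in root if record.findtext(field) == value]


if __name__ == "__main__":
    games = rows_to_xml(ROWS)
    print(ET.tostring(games, encoding="unicode"))
    for match in find_records(games, "players", "4"):
        print(match.findtext("title"))
```

This avoids hand-encoding each record, but it is only an exact-match lookup; a real search engine over the records is a much larger job, as the slide says.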