Overview of XML

1
Overview of XML
  • Terry Butler
  • Director, Research Computing
  • Arts Resource Centre
  • 2005 February

2
Overview of XML
  • What is XML?
  • History of text encoding technologies
  • XML in Corpus Linguistics
  • Tools for XML

3
What is XML?
  • Extensible Markup Language (XML) is a simple,
    very flexible text format (from the W3C page)
  • it is a W3C standard
  • w3c.org/TR/2004/REC-xml-20040204/
  • it is a human-readable language for text markup
  • and an easy-to-process language for information
    exchange between computer systems

4
XML vs. HTML
  • HTML has a predefined set of tags which are meant
    to display information via a particular program
    (a web browser)
  • XML is a language to define any set of tags which
    describe text; the text can be shared, displayed
    and transformed in many ways
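
As a minimal sketch of the second point, the Python snippet below invents a tiny tag set (`holiday`, `head`, `p` and `place` are hypothetical names, not part of any standard) and produces two different renderings of the same document using only the standard library. It illustrates the idea and is not a recommended publishing pipeline.

```python
import xml.etree.ElementTree as ET
from html import escape

# A document written with a made-up tag set; the names are illustrative only.
DOC = (
    "<holiday>"
    "<head>My Summer Holiday</head>"
    "<p>We went to <place>Jasper</place>.</p>"
    "</holiday>"
)

def to_plain_text(root):
    """Display: drop every tag and keep one line of text per child element."""
    return "\n".join("".join(child.itertext()) for child in root)

def to_html(root):
    """Transform: map the descriptive tags onto HTML presentation tags."""
    head = root.findtext("head")
    body = "".join(root.find("p").itertext())
    return f"<h1>{escape(head)}</h1><p>{escape(body)}</p>"

root = ET.fromstring(DOC)
plain = to_plain_text(root)
html_out = to_html(root)
print(plain)
print(html_out)
```

The same source text feeds both outputs; only the transformation differs.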

5
XML Basics
  • a DTD (document type definition) defines a class
    of documents; each document is an instance
  • the document is created under control of the DTD,
    so it can be made well-formed and valid
  • not well-formed
  • <greetings>I'm pleased to <action>be
    here</greetings></action>.
  • not valid (well-formed, but uses an element the
    DTD does not declare)
  • <grootings>I'm pleased to <action>be
    here</action>.</grootings>
  • there are shortcuts for composing
  • entities: &nbsp; &myAddress;
  • the underlying character set is Unicode
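
The well-formedness check and the entity shortcut can be sketched with Python's standard library. The sketch assumes the built-in `xml.etree.ElementTree` parser, which checks well-formedness only and does not validate against a DTD; checking validity would need a validating parser, for example the third-party lxml library. The helper name `is_well_formed` and the address text are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Overlapping elements: <greetings> is closed before <action>.
overlapping = "<greetings>I'm pleased to <action>be here</greetings></action>"

# Properly nested elements.
nested = "<greetings>I'm pleased to <action>be here</action>.</greetings>"

# An entity declared in the internal DTD subset acts as a shortcut.
# The replacement text is a placeholder, not a real address.
with_entity = (
    '<!DOCTYPE greetings [<!ENTITY myAddress "1 Example Street">]>'
    "<greetings>Write to &myAddress;</greetings>"
)

print(is_well_formed(overlapping))
print(is_well_formed(nested))
print(ET.fromstring(with_entity).text)
```

The parser expands `&myAddress;` while reading, so the application sees only the replacement text.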

6
History of Text Encoding Technologies
  • from GML to SGML to XML
  • www.oasis-open.org/cover/sgmlhist0.html
  • presentation-oriented markup (like Word) vs.
    descriptive markup
  • My Summer Holiday
  • <head>My Summer Holiday</head>

7
History of Text Encoding Technologies
  • SGML (ISO 8879) was powerful but complex
  • used in publishing, technical documentation
    fields
  • difficult for humans to craft, computers to
    process
  • HTML appeared, and the web took off
  • simple SGML-like syntax, few tags
  • browsers could format the text for viewing
  • no rules checking!
  • SGML proved very hard to adapt to the web
  • browsers are the final arbiter

8
History of Text Encoding Technologies
  • XML designed to overcome these obstacles
  • document can be processed without the DTD
  • minimization and simplification of syntax
  • XML has many affiliated technologies
  • style sheets
  • linking
  • queries (searching)
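
Two of these points, processing without a DTD and querying, can be sketched in Python. The snippet assumes the standard library's `xml.etree.ElementTree`, which supports only a limited subset of XPath; style sheets and linking are not shown. The element names, attribute names and titles are invented for the example.

```python
import xml.etree.ElementTree as ET

# No DOCTYPE and no DTD: the parser only needs well-formed input.
# All names and values below are invented for this example.
DOC = (
    "<library>"
    '<book year="1998"><title>First Book</title></book>'
    '<book year="2004"><title>Second Book</title></book>'
    "</library>"
)

root = ET.fromstring(DOC)

# Query: every title anywhere in the document.
all_titles = [t.text for t in root.findall(".//title")]

# Query: titles of books whose year attribute equals "2004".
titles = [t.text for t in root.findall("./book[@year='2004']/title")]

print(all_titles)
print(titles)
```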

9
Tools for XML
  • XML Spy (Altova)
  • powerful XML editing and stylesheet capabilities
  • excellent visualization and validation features
  • XMetaL (Blast Radius)
  • simple XML editor
  • familiar user interface (like word processor)
  • Turbo XML (TIBCO)
  • integrates with content repositories
  • has a transformation creation tool (XML
    Transform)

10
XML in Corpus Linguistics
  • SGML, and now XML, is used to mark up text with
    features of interest
  • users can both search and display the text with
    or without the tags
  • examples
  • BNC (British National Corpus)
  • ANC (American National Corpus)
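
Searching and displaying text with or without the tags can be sketched as follows. The markup is a simplified, hypothetical scheme (`s` for a sentence, `w` for a word, a `pos` attribute for the word class) and is not the actual BNC or ANC encoding; only Python's standard library is assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical word-level markup: <s> is a sentence, <w> a word,
# and the pos attribute holds an illustrative word-class label.
SENTENCE = (
    "<s>"
    '<w pos="DET">The </w>'
    '<w pos="NOUN">cat </w>'
    '<w pos="VERB">sat</w>'
    "</s>"
)

s = ET.fromstring(SENTENCE)

# Display without tags: just the running text.
plain = "".join(s.itertext())

# Display with tags: each word followed by its word class.
tagged = " ".join(f"{w.text.strip()}_{w.get('pos')}" for w in s.iter("w"))

# Search: all words marked as nouns.
nouns = [w.text.strip() for w in s.findall("w[@pos='NOUN']")]

print(plain)
print(tagged)
print(nouns)
```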

11
BNC word class shown

12
BNC word class and tags shown

13
TAPoR Project
  • computer infrastructure (workshop, servers,
    storage)
  • funded by Canada Foundation for Innovation
  • resources to convert, store, query, transform and
    display textual data (encoded or not)
  • more information and projects at
    tapor.ualberta.ca