1
Lecture 14: Metadata and Markup
SIMS 202: Information Organization and Retrieval
  • Prof. Ray Larson & Prof. Marc Davis
  • UC Berkeley SIMS
  • Tuesday and Thursday 10:30 am - 12:00 pm
  • Fall 2003
  • http://www.sims.berkeley.edu/academics/courses/is202/f03/

2
Lecture Overview
  • Review
  • XML and Document Engineering
  • Metadata And Markup
  • XML As A Metadata Lingua Franca
  • METS
  • SGML vs. XML DTD Construction
  • XML Schemas
  • XML For Protocols And Metadata Languages
  • Readings/Discussion

5
XML as a common syntax
  • XML (and SGML) provide a way of expressing the
    structure of documents that can be verified and
    validated by document processing systems
  • Documents can be metadata structures
  • Such as the description of a particular
    photograph in our Phone project
  • XML thus provides a way of representing metadata
    descriptions as well as the content that they
    describe

6
XML as a common syntax
  • All XML documents follow some simple rules that
    make them interchangeable and usable across
    different systems
  • All data and markup are in Unicode
  • All elements are marked by begin and end tags
    (or by a single empty-element tag)
  • All markup is case-sensitive
  • XML DTDs and/or Schemas define the valid
    structure (and sometimes content) of the documents
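
The rules above can be checked mechanically. A minimal sketch, assuming Python's standard-library xml.etree.ElementTree parser; the helper name is_well_formed and the sample strings are illustrative and not from the lecture:

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Every start tag has a matching end tag.
print(is_well_formed("<memo><to>Ray</to></memo>"))
# Markup is case-sensitive, so <to> is not closed by </To>.
print(is_well_formed("<memo><to>Ray</To></memo>"))
# A missing end tag is an error in XML (SGML could allow tag omission).
print(is_well_formed("<memo><to>Ray</memo>"))
```

This only tests well-formedness; checking a document against a DTD or Schema is a separate validation step that this parser does not perform.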

7
Example: METS
  • METS (the Metadata Encoding and Transmission
    Standard) is a new schema intended to provide
  • "a standard for encoding descriptive,
    administrative, and structural metadata regarding
    objects within a digital library, expressed using
    the XML schema language of the World Wide Web
    Consortium"
  • METS can be used to wrap complex sets of data
    (the actual data, with rules for encoding binary
    forms), the metadata describing the parts of that
    data, and the sequence and conditions under which
    the data can or should be presented or displayed

9
SGML/XML Structure
  • An SGML document consists of three parts
  • The SGML Declaration
  • The Document Type Definition (DTD)
  • The Document Instance
  • An XML document REQUIRES only the document
    instance, but for effective processing a DTD is
    very important
  • XML Schema (later) provides an alternative to
    DTDs for XML applications

10
Document Type Definitions
  • The DTD describes the structural elements and
    "shorthand" markup for a particular document type
    and defines
  • Names of "legal" elements
  • How many times elements can appear
  • The order of elements in a document
  • Whether markup can be omitted (SGML only)
  • Contents of elements (i.e., nested structures)
  • Attributes associated with elements
  • Names of "entities"
  • Short-hand conventions for element tags (SGML
    only)

11
DTD Components
  • The major components of a DTD are
  • Entity Declarations
  • Element Declarations
  • Attribute Declarations

12
Document Type Definitions
  • Entity Declarations are a "macro" definition
    facility for both DTD and Document instance parts
  • General Internal Entity Definitions:
    <!ENTITY name "substitute string">
    referenced by &name;
  • General External Entity Definitions:
    <!ENTITY name SYSTEM "file path">
    referenced by &name;
  • Parameter Entity Definitions (used only inside
    DTDs):
    <!ENTITY % name "substitute string"> or
    <!ENTITY % name SYSTEM "file path">
    referenced by %name; or %name
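
The "macro" behaviour of a general internal entity can be observed with any XML parser that reads the internal DTD subset. A minimal sketch, assuming Python's standard-library xml.etree.ElementTree; the entity name school and the memo text are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring one general internal entity.
doc = """<!DOCTYPE memo [
  <!ENTITY school "UC Berkeley SIMS">
]>
<memo>From &school;</memo>"""

memo = ET.fromstring(doc)
# The parser substitutes the entity's replacement text for &school;.
expanded = memo.text
print(expanded)
```

External entities (the SYSTEM form) are a different matter: whether a parser fetches them depends on its configuration, so they are left out of this sketch.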

13
Document Type Definitions
  • SGML Element Declarations define the structural
    elements of a document and its associated markup:
    <!ELEMENT name - - content_model or
    declared_content +(include_list) -(exclude_list)>
  • Omitted tag minimization indicates whether
    start-tags or end-tags can be omitted in the
    markup: "O" (omissible) or "-" (required). These
    indicators are required in SGML but can NOT be
    used in XML

14
Document Type Definitions
  • Content model provides a nested structural
    description of the elements that make up this
    element, e.g.
  • <!ELEMENT memo - - ((to & from), body, close?)>
  • <!ELEMENT body - O (p)+>
  • <!ELEMENT p - O (#PCDATA | q)*>
  • <!ELEMENT q - - (#PCDATA)>...
  • ANY (in SGML) may be used to indicate a content
    model of any elements in the DTD, in any order

15
Document Type Definitions
  • Same content model in XML
  • <?xml version="1.0"?>
  • <!DOCTYPE memo [
  • <!ELEMENT memo ((to, from), body, close?)>
    <!ELEMENT body (p)+>
    <!ELEMENT p (#PCDATA | q)*>
    <!ELEMENT q (#PCDATA)>
  • ]>
  • Note the XML declaration (written like a
    processing instruction) in the prolog
  • Note that the "&" connector on the previous page
    is not legal XML

16
Document Type Definitions
  • Declared content can be #PCDATA, CDATA, RCDATA,
    or EMPTY
  • Inclusion and Exclusion lists can be used to
    indicate elements that can occur or are forbidden
    to occur in any sub-elements of the content model
    (NOT in XML), e.g.
  • <!ELEMENT memo - - ((to & from), body, close?)
    +(fn)>
  • Says that element fn can appear anyplace in the
    memo

17
Document Type Definitions
  • Attribute Declarations define attributes
    associated with (potentially) each element of a
    document and provide the acceptable values for
    those attributes

18
Attributes Example
  • <!ATTLIST associated_element attribute_name
    declared_value default_value>
  • <!ATTLIST memo status (PUBLIC | CONFIDENTIAL)
    PUBLIC>
  • In the markup of a document:
    <memo status="CONFIDENTIAL">
  • Also, because of the default set, <memo> would
    be the same as <memo status="PUBLIC">
  • There are a variety of special defaults and data
    types that can be given in attribute definitions
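
The effect of a default value can be seen by parsing the same element with and without the attribute. A minimal sketch, assuming Python's standard-library xml.etree.ElementTree and assuming its underlying parser applies attribute defaults declared in the internal DTD subset. The default is quoted here because XML, unlike SGML, requires it:

```python
import xml.etree.ElementTree as ET

# Internal subset with an enumerated attribute and a default value.
DTD = """<!DOCTYPE memo [
  <!ATTLIST memo status (PUBLIC | CONFIDENTIAL) "PUBLIC">
]>
"""

defaulted = ET.fromstring(DTD + "<memo/>")
explicit = ET.fromstring(DTD + '<memo status="CONFIDENTIAL"/>')

# <memo/> picks up the declared default; the explicit value is kept as written.
print(defaulted.get("status"))
print(explicit.get("status"))
```

The parser is non-validating, so it supplies the default but would not reject a value outside the enumerated list.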

19
Sample SGML DTD
<!doctype ELIB-TEXTS [
<!-- This is a DTD for bibliographic records extracted from the
     elib/rfc1357 simple bibliographic format. -->
<!ELEMENT ELIB-TEXTS o o (ELIB-BIB)*>
<!-- We allow most elements to occur any number of times in any order -->
<!-- this is because there is little consistency in the actual usage. -->
<!ELEMENT ELIB-BIB - - (BIB-VERSION, ID, ENTRY?, DATE?, TITLE, ORGANIZATION,
    (SERIES | TYPE | REVISION | REVISION-DATE | AUTHOR-PERSONAL |
     AUTHOR-INSTITUTIONAL | AUTHOR-CONTRIBUTING-PERSONAL |
     AUTHOR-CONTRIBUTING-PERSONAL | AUTHOR-CONTRIBUTING-INSTITUTIONAL |
     CONTACT | AUTHOR | PROJECT | PAGES | BIOREGION | CERES-BIOREGION |
     TEXTSOUP | LOCATION | ULTIMATE-CLIENT | URL | KEYWORDS | NOTES |
     ABSTRACT)*, (TEXT-REF | PAGED-REF)* )>
<!-- We won't make any assumptions about content... all PCDATA -->
<!ELEMENT ID - o (#PCDATA)>
<!ELEMENT ABSTRACT - o (#PCDATA)>
<!ELEMENT AUTHOR-CONTRIBUTING-INSTITUTIONAL - o (#PCDATA)>
<!ELEMENT AUTHOR-CONTRIBUTING-PERSONAL - o (#PCDATA)>
<!ELEMENT AUTHOR-PERSONAL-CONTRIBUTING - o (#PCDATA)>
etc.
]>
20
XML Version
<!doctype ELIB-TEXTS [
<!-- This is a DTD for bibliographic records extracted from the
     elib/rfc1357 simple bibliographic format. -->
<!ELEMENT ELIB-TEXTS (ELIB-BIB)*>
<!-- We allow most elements to occur any number of times in any order -->
<!-- this is because there is little consistency in the actual usage. -->
<!ELEMENT ELIB-BIB (BIB-VERSION, ID, ENTRY?, DATE?, TITLE, ORGANIZATION,
    (SERIES | TYPE | REVISION | REVISION-DATE | AUTHOR-PERSONAL |
     AUTHOR-INSTITUTIONAL | AUTHOR-CONTRIBUTING-PERSONAL |
     AUTHOR-CONTRIBUTING-PERSONAL | AUTHOR-CONTRIBUTING-INSTITUTIONAL |
     CONTACT | AUTHOR | PROJECT | PAGES | BIOREGION | CERES-BIOREGION |
     TEXTSOUP | LOCATION | ULTIMATE-CLIENT | URL | KEYWORDS | NOTES |
     ABSTRACT)*, (TEXT-REF | PAGED-REF)* )>
<!-- We won't make any assumptions about content... all PCDATA -->
<!ELEMENT ID (#PCDATA)>
<!ELEMENT ABSTRACT (#PCDATA)>
<!ELEMENT AUTHOR-CONTRIBUTING-INSTITUTIONAL (#PCDATA)>
<!ELEMENT AUTHOR-CONTRIBUTING-PERSONAL (#PCDATA)>
<!ELEMENT AUTHOR-PERSONAL-CONTRIBUTING (#PCDATA)>
etc.
]>
21
Document Using That DTD
<ELIB-BIB>
<BIB-VERSION>ELIB-v1.0</BIB-VERSION>
<ID>6</ID>
<ENTRY>February 13 1995</ENTRY>
<DATE>March 1, 1993</DATE>
<TITLE>Water Conditions in California Report 2</TITLE>
<ORGANIZATION>California Department of Water Resources</ORGANIZATION>
<SERIES>120-93</SERIES>
<TYPE>bulletin</TYPE>
<AUTHOR-INSTITUTIONAL>California Department of Water Resources</AUTHOR-INSTITUTIONAL>
<PAGES>17</PAGES>
<TEXT-REF>/elib/data/disk/disk5/documents/6/HYPEROCR/hyperocr.html</TEXT-REF>
<PAGED-REF>/elib/data/disk/disk5/documents/6/OCR-ASCII-NOZONE</PAGED-REF>
</ELIB-BIB>
22
Dublin Core
  • Review
  • Simple metadata for describing internet resources
  • For Document-Like Objects
  • 15 Elements

23
Dublin Core Elements
  • Title
  • Creator
  • Subject
  • Description
  • Publisher
  • Other Contributors
  • Date
  • Resource Type
  • Format
  • Resource Identifier
  • Source
  • Language
  • Relation
  • Coverage
  • Rights Management

24
DC XML DTD Implementation
  • There have been various versions
  • This one is the one recommended (required) by the
    Open Archives Initiative Protocol for Metadata
    Harvesting (OAI-PMH)
  • Uses XML namespaces
  • Available at
    http://dublincore.org/documents/2001/09/20/dcmes-xml/
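
A namespaced Dublin Core description can be produced with a few lines of code. A minimal sketch, assuming Python's standard-library xml.etree.ElementTree and the Dublin Core element-set namespace http://purl.org/dc/elements/1.1/. The container element record and the helper dc_record are hypothetical, and the values are borrowed from the sample ELIB record:

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)          # serialize with the usual dc: prefix


def dc_record(fields: dict) -> ET.Element:
    """Wrap simple name/value pairs in dc:* elements under a container."""
    record = ET.Element("record")  # hypothetical container, not part of DC
    for name, value in fields.items():
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


record = dc_record({
    "title": "Water Conditions in California Report 2",
    "creator": "California Department of Water Resources",
    "date": "March 1, 1993",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The dc: prefix is only a local abbreviation; what identifies the elements as Dublin Core is the namespace URI it is bound to.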

25
DC Element and Attribute Definitions
<!-- The elements from DCMES 1.1 -->
<!-- The name given to the resource. -->
<!ELEMENT dc:title (#PCDATA)>
<!ATTLIST dc:title xml:lang CDATA #IMPLIED>
<!-- An entity primarily responsible for making the content of the resource. -->
<!ELEMENT dc:creator (#PCDATA)>
<!ATTLIST dc:creator xml:lang CDATA #IMPLIED>
<!-- The topic of the content of the resource. -->
<!ELEMENT dc:subject (#PCDATA)>
<!ATTLIST dc:subject xml:lang CDATA #IMPLIED>
<!-- An account of the content of the resource. -->
<!ELEMENT dc:description (#PCDATA)>
<!ATTLIST dc:description xml:lang CDATA #IMPLIED>
<!-- The entity responsible for making the resource available. -->
<!ELEMENT dc:publisher (#PCDATA)>
<!ATTLIST dc:publisher xml:lang CDATA #IMPLIED>
<!-- An entity responsible for making contributions to the content of the resource. -->
<!ELEMENT dc:contributor (#PCDATA)>
<!ATTLIST dc:contributor xml:lang CDATA #IMPLIED>
<!-- A date associated with an event in the life cycle of the resource. -->
<!ELEMENT dc:date (#PCDATA)>
<!ATTLIST dc:date xml:lang CDATA #IMPLIED>
26
DC Element Definitions (cont.)
<!-- The nature or genre of the content of the resource. -->
<!ELEMENT dc:type (#PCDATA)>
<!ATTLIST dc:type xml:lang CDATA #IMPLIED>
<!-- The physical or digital manifestation of the resource. -->
<!ELEMENT dc:format (#PCDATA)>
<!ATTLIST dc:format xml:lang CDATA #IMPLIED>
<!-- An unambiguous reference to the resource within a given context. -->
<!ELEMENT dc:identifier (#PCDATA)>
<!ATTLIST dc:identifier xml:lang CDATA #IMPLIED>
<!ATTLIST dc:identifier rdf:resource CDATA #IMPLIED>
<!-- A Reference to a resource from which the present resource is derived. -->
<!ELEMENT dc:source (#PCDATA)>
<!ATTLIST dc:source xml:lang CDATA #IMPLIED>
<!ATTLIST dc:source rdf:resource CDATA #IMPLIED>
<!-- A language of the intellectual content of the resource. -->
<!ELEMENT dc:language (#PCDATA)>
<!ATTLIST dc:language xml:lang CDATA #IMPLIED>
<!-- A reference to a related resource. -->
<!ELEMENT dc:relation (#PCDATA)>
<!ATTLIST dc:relation xml:lang CDATA #IMPLIED>
<!ATTLIST dc:relation rdf:resource CDATA #IMPLIED>
<!-- The extent or scope of the content of the resource. -->
<!ELEMENT dc:coverage (#PCDATA)>
<!ATTLIST dc:coverage xml:lang CDATA #IMPLIED>
<!-- Information about rights held in and over the resource. -->
<!ELEMENT dc:rights (#PCDATA)>
<!ATTLIST dc:rights xml:lang CDATA #IMPLIED>
27
A More Complex SGML DTD
<!DOCTYPE USMARC [
<!-- USMARC DTD. UCB-SLIS v.0.08 -->
<!-- By Jerome P. McDonough, April 1, 1994 -->
<!ELEMENT USMARC - - (Leader, Directry, VarFlds)>
<!ATTLIST USMARC Material (BK|AM|CF|MP|MU|VM|SE) "BK"
                 id CDATA #IMPLIED>
<!-- Author's Note: the id attribute for the USMARC element is
     intended to hold a unique record number for each MARC record in
     the local database. That is to say, it is intended ONLY as an aid
     in maintaining the local database of MARC records -->
<!ELEMENT Leader - O (LRL, RecStat, RecType, BibLevel, UCP, IndCount,
    SFCount, BaseAddr, EncLevel, DscCatFm, LinkRec, EntryMap)>
<!ELEMENT Directry - O (#PCDATA)>
<!ELEMENT VarFlds - O (VarCFlds, VarDFlds)>
<!-- Component parts of Leader -->
<!-- Logical Record Length -->
<!ELEMENT LRL - O (#PCDATA)>
etc.
28
More Complex DTD (cont.)
<!-- Variable Data Fields -->
<!ELEMENT VarDFlds - O (NumbCode, MainEnty?, Titles, EdImprnt?,
    PhysDesc?, Series?, Notes?, SubjAccs?, AddEnty?, LinkEnty?,
    SAddEnty?, HoldAltG?, Fld9XX?)>
<!-- Component Parts of Variable Data Fields -->
<!-- Numbers Codes -->
<!ELEMENT NumbCode - O (Fld010?, Fld011?, Fld015?, Fld017, Fld018?,
    Fld019, Fld020, Fld022, Fld023, Fld024, Fld025, Fld027, Fld028,
    Fld029, Fld030, Fld032, Fld033, Fld034, Fld035, Fld036?, Fld037,
    Fld039, Fld040?, Fld041?, Fld042?, Fld043?, Fld044?, Fld045?,
    Fld046?, Fld047?, Fld048, Fld050, Fld051, Fld052, Fld055, Fld060,
    Fld061, Fld066?, Fld069, Fld070, Fld071, Fld072, Fld074, Fld080?,
    Fld082, Fld084, Fld086, Fld088, Fld090, Fld096)>
<!-- Main Entries -->
<!ELEMENT MainEnty - O (Fld100?, Fld110?, Fld111?, Fld130?)>
<!-- Titles -->
<!ELEMENT Titles - O (Fld210?, Fld211, Fld212, Fld214, Fld222,
    Fld240?, Fld242, Fld243?, Fld245, Fld246, Fld247)>
<!-- Edition, Imprint, etc. -->
<!ELEMENT EdImprnt - O (Fld250?, Fld254?, Fld255, Fld256?, Fld257?,
    Fld260?, Fld261?, Fld262?, Fld263?, Fld265?)>
<!-- Physical Description, etc. -->
<!ELEMENT PhysDesc - O (Fld300, Fld305, Fld306?, Fld310?, Fld315?,
    Fld321, Fld340, Fld350?, Fld351, Fld355, Fld357, Fld362)>
etc.
29
Complex DTD (cont.)
<!-- Title Statement -->
<!ELEMENT Fld245 - O (Six?, (a|b|c|f|g|h|k|n|p|s))>
<!ATTLIST Fld245 AddEnty (No|Yes|Blank) #IMPLIED
                 NFChars (0|1|2|3|4|5|6|7|8|9|Blnk) #IMPLIED>
etc.
<!-- Subfield Element Declarations -->
<!ELEMENT a - O (#PCDATA)>
<!ELEMENT b - O (#PCDATA)>
<!ELEMENT c - O (#PCDATA)>
<!ELEMENT d - O (#PCDATA)>
<!ELEMENT e - O (#PCDATA)>
30
Document Markup
  • All document markup is derived from the DTD for
    the particular document type
  • In SGML the DTD should be referenced in the
    document using the DOCTYPE declaration
  • <!DOCTYPE name SYSTEM "file_path">
    or
    <!DOCTYPE name SYSTEM "file_path"
    [ doctype_declaration_subset ]>
    or
    <!DOCTYPE name [ doctype_declaration_subset ]>
  • The doctype_declaration_subset can be any
    combination of element, entity, and attribute
    declarations

31
HTML
  • HTML was not originally "real" SGML; the DTD was
    invented after the language
  • It is often more concerned with the form of the
    output on the screen than with the structural
    contents of the HTML docs
  • Relies on the application (such as Netscape) to
    implement interesting actions like hypertext
    linking
  • XHTML is now a W3C recommendation that applies
    XML conventions to HTML, and provides a growing
    set of capabilities within an XML framework (our
    phones use XHTML)

33
What are XML Schemas?
  • An XML vocabulary for expressing your data's
    structure AND content types, and even the
    business rules involved in processing the data
  • Written in XML themselves
  • Support namespaces for combining multiple schemas
    in the same documents
  • The slides in this section are based on an XML
    tutorial by Roger L. Costello

34
Example
<location>
    <latitude>32.904237</latitude>
    <longitude>73.620290</longitude>
    <uncertainty units="meters">2</uncertainty>
</location>
Is this data valid? To be valid, it must meet
these constraints (data business rules):
  1. The location must be comprised of a latitude,
     followed by a longitude, followed by an
     indication of the uncertainty of the lat/lon
     measurements.
  2. The latitude must be a decimal with a value
     between -90 and 90.
  3. The longitude must be a decimal with a value
     between -180 and 180.
  4. For both latitude and longitude, the number of
     digits to the right of the decimal point must
     be exactly six.
  5. The value of uncertainty must be a non-negative
     integer.
  6. The uncertainty units must be either meters or
     feet.
We can express all these data constraints using
XML Schemas
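
To make the six rules concrete, here is what checking them by hand looks like. This is a minimal sketch in Python using only the standard library, not an XML Schema validator; the function name location_errors and the error messages are made up for illustration:

```python
import re
import xml.etree.ElementTree as ET

# Rule 4: an optional sign, digits, and exactly six digits after the point.
SIX_DECIMALS = re.compile(r"-?[0-9]+\.[0-9]{6}")


def location_errors(xml_text: str) -> list:
    """Return the list of violated rules (empty if the location is valid)."""
    loc = ET.fromstring(xml_text)
    # Rule 1: latitude, then longitude, then uncertainty.
    if [child.tag for child in loc] != ["latitude", "longitude", "uncertainty"]:
        return ["rule 1: expected latitude, longitude, uncertainty in order"]
    lat, lon, unc = ((child.text or "").strip() for child in loc)
    errors = []
    # Rules 2-4: format first, then range.
    for rule, name, value, limit in ((2, "latitude", lat, 90),
                                     (3, "longitude", lon, 180)):
        if not SIX_DECIMALS.fullmatch(value):
            errors.append(f"rule 4: {name} needs exactly six decimal digits")
        elif abs(float(value)) > limit:
            errors.append(f"rule {rule}: {name} out of range")
    # Rule 5: non-negative integer.
    if not re.fullmatch(r"[0-9]+", unc):
        errors.append("rule 5: uncertainty must be a non-negative integer")
    # Rule 6: units restricted to two values.
    if loc[2].get("units") not in ("meters", "feet"):
        errors.append("rule 6: units must be meters or feet")
    return errors


sample = """<location>
  <latitude>32.904237</latitude>
  <longitude>73.620290</longitude>
  <uncertainty units="meters">2</uncertainty>
</location>"""
print(location_errors(sample))
```

Every rule here is procedural code that has to be kept in sync with the data format by hand. A schema states the same constraints declaratively, so a generic validator can enforce them.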
35
Validating your data
36
Purpose of XML Schemas
  • Specify
  • the structure of instance documents
  • "this element contains these elements, which
    contains these other elements, etc"
  • the datatype of each element/attribute
  • "this element shall hold an integer with the
    range 0 to 12,000" (DTDs don't do too well with
    specifying datatypes like this)

37
Why Schemas?
Motivation for XML Schemas
  • People are dissatisfied with DTDs
  • It's a different syntax
  • You write your XML (instance) document using one
    syntax and the DTD using another syntax --> bad,
    inconsistent
  • Limited datatype capability
  • DTDs support a very limited capability for
    specifying datatypes. You can't, for example,
    express "I want the <elevation> element to hold
    an integer with a range of 0 to 12,000"
  • Desire a set of datatypes compatible with those
    found in databases
  • DTDs support 10 datatypes; XML Schemas support
    44 datatypes

38
Highlights of XML Schemas
  • XML Schemas are a tremendous advancement over
    DTDs
  • Enhanced datatypes
  • 44 versus 10
  • Can create your own datatypes
  • Example "This is a new type based on the string
    type and elements of this type must follow this
    pattern ddd-dddd, where 'd' represents a digit".
  • Written in the same syntax as instance documents
  • less syntax to remember
  • Object-oriented'ish
  • Can extend or restrict a type (derive new type
    definitions on the basis of old ones)
  • Can express sets, i.e., can define the child
    elements to occur in any order

39
Highlights of XML Schemas
  • Can specify element content as being unique (keys
    on content) and uniqueness within a region
  • Can define multiple elements with the same name
    but different content
  • Can define elements with nil content
  • Can define substitutable elements - e.g., the
    "Book" element is substitutable for the
    "Publication" element.

40
BookStore.dtd
<!ELEMENT BookStore (Book)+>
<!ELEMENT Book (Title, Author, Date, ISBN, Publisher)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT ISBN (#PCDATA)>
<!ELEMENT Publisher (#PCDATA)>
41
[Figure: the DTD vocabulary (ELEMENT, ATTLIST, ENTITY, ID, CDATA,
NMTOKEN, PCDATA) alongside the new vocabulary it defines (BookStore,
Book, Title, Author, Date, ISBN, Publisher)]
This is the vocabulary that DTDs provide to
define your new vocabulary
42
[Figure: two vocabularies. The http://www.w3.org/2001/XMLSchema
namespace contains schema, element, complexType, sequence, string,
boolean, integer. The http://www.books.org namespace
(targetNamespace) contains BookStore, Book, Title, Author, Date,
ISBN, Publisher.]
This is the vocabulary that XML Schemas provide
to define your new vocabulary
One difference between XML Schemas and DTDs is
that the XML Schema vocabulary is associated with
a name (namespace). Likewise, the new vocabulary
that you define must be associated with a name
(namespace). With DTDs neither set of vocabulary
is associated with a name (namespace), because
DTDs pre-dated namespaces.
43
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.books.org"
            xmlns="http://www.books.org"
            elementFormDefault="qualified">
    <xsd:element name="BookStore">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
    <xsd:element name="Book">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element ref="Title" minOccurs="1" maxOccurs="1"/>
                <xsd:element ref="Author" minOccurs="1" maxOccurs="1"/>
                <xsd:element ref="Date" minOccurs="1" maxOccurs="1"/>
                <xsd:element ref="ISBN" minOccurs="1" maxOccurs="1"/>
                <xsd:element ref="Publisher" minOccurs="1" maxOccurs="1"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
    <xsd:element name="Title" type="xsd:string"/>
    <xsd:element name="Author" type="xsd:string"/>
    <xsd:element name="Date" type="xsd:string"/>
    <xsd:element name="ISBN" type="xsd:string"/>
    <xsd:element name="Publisher" type="xsd:string"/>
</xsd:schema>
BookStore.xsd (xsd = XML Schema Definition)
44
[Side-by-side comparison: the BookStore.xsd schema shown above next
to the equivalent BookStore.dtd declarations]
45
All XML Schemas have "schema" as the root element.
46
The elements and datatypes that are used to
construct schemas (schema, element, complexType,
sequence, string) come from the
http://.../XMLSchema namespace, which BookStore.xsd
binds to the xsd prefix with
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
47
XMLSchema Namespace
[Figure: the http://www.w3.org/2001/XMLSchema namespace, containing
schema, element, complexType, sequence, string, boolean, integer]
48
targetNamespace="http://www.books.org"
Says that the elements defined by this schema
(BookStore, Book, Title, Author, Date, ISBN,
Publisher) are to go in this namespace
49
Book Namespace (targetNamespace)
[Figure: the http://www.books.org namespace (targetNamespace),
containing BookStore, Book, Title, Author, Date, ISBN, Publisher]
50
xmlns="http://www.books.org"
The default namespace is http://www.books.org,
which is the targetNamespace!
<xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/>
This is referencing a Book element
declaration. The Book in what namespace? Since
there is no namespace qualifier, it is referencing
the Book element in the default namespace, which
is the targetNamespace! Thus, this is a
reference to the Book element declaration in this
schema.
51
elementFormDefault="qualified"
This is a directive to any instance documents
which conform to this schema: any elements used
by the instance document which were declared in
this schema must be namespace qualified.
52
Referencing a schema in an XML instance document
<?xml version="1.0"?>
<BookStore xmlns="http://www.books.org"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.books.org
                               BookStore.xsd">
    <Book>
        <Title>My Life and Times</Title>
        <Author>Paul McCartney</Author>
        <Date>July, 1998</Date>
        <ISBN>94303-12021-43892</ISBN>
        <Publisher>McMillin Publishing</Publisher>
    </Book>
    ...
</BookStore>
1. First, using a default namespace declaration
(xmlns), tell the schema-validator that all of the
elements used in this instance document come from
the http://www.books.org namespace.
2. Second, with schemaLocation tell the
schema-validator that the http://www.books.org
namespace is defined by BookStore.xsd (i.e.,
schemaLocation contains a pair of values).
3. Third, with xmlns:xsi tell the schema-validator
that the schemaLocation attribute we are using is
the one in the XMLSchema-instance namespace.
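
The three steps can be observed from the consuming side. A minimal sketch, assuming Python's standard-library xml.etree.ElementTree, which resolves namespaces but does not itself validate against a schema. The instance is trimmed to a single Title for brevity:

```python
import xml.etree.ElementTree as ET

BOOKS = "http://www.books.org"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

instance = """<?xml version="1.0"?>
<BookStore xmlns="http://www.books.org"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.books.org BookStore.xsd">
  <Book>
    <Title>My Life and Times</Title>
  </Book>
</BookStore>"""

root = ET.fromstring(instance)

# Step 1: the default namespace qualifies every unprefixed element name.
print(root.tag)
title = root.find(f"{{{BOOKS}}}Book/{{{BOOKS}}}Title")

# Steps 2 and 3: schemaLocation lives in the XMLSchema-instance namespace
# and holds whitespace-separated (namespace, schema document) pairs.
tokens = root.get(f"{{{XSI}}}schemaLocation").split()
pairs = dict(zip(tokens[0::2], tokens[1::2]))
print(pairs)
```

A schema-aware validator would use the pair to locate BookStore.xsd for the http://www.books.org namespace. Here it is only extracted.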
53
XMLSchema-instance Namespace
[Figure: the http://www.w3.org/2001/XMLSchema-instance namespace,
containing schemaLocation, noNamespaceSchemaLocation, type, nil]
54
Referencing a schema in an XML instance document
BookStore.xml
schemaLocation="http://www.books.org BookStore.xsd"
- uses elements from namespace
http://www.books.org
BookStore.xsd
targetNamespace="http://www.books.org"
- defines elements in namespace
http://www.books.org
A schema defines a new vocabulary. Instance
documents use that new vocabulary.
55
Note multiple levels of checking
BookStore.xml -> BookStore.xsd
Validate that the XML document conforms to the
rules described in BookStore.xsd
BookStore.xsd -> XMLSchema.xsd (schema-for-schemas)
Validate that BookStore.xsd is a valid schema
document, i.e., it conforms to the rules
described in the schema-for-schemas
56
Default Value for minOccurs and maxOccurs
  • The default value for minOccurs is "1"
  • The default value for maxOccurs is "1"

<xsd:element ref="Title" minOccurs="1" maxOccurs="1"/>
Equivalent!
<xsd:element ref="Title"/>
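
Code that reads a schema has to apply the same defaults. A minimal sketch, assuming Python's standard-library xml.etree.ElementTree. The helper name occurs and the use of None for "unbounded" are illustrative choices:

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"


def occurs(particle: ET.Element) -> tuple:
    """Return (minOccurs, maxOccurs), applying the default of 1 to each."""
    min_occurs = int(particle.get("minOccurs", "1"))
    max_raw = particle.get("maxOccurs", "1")
    # None stands for "unbounded" (no upper limit).
    max_occurs = None if max_raw == "unbounded" else int(max_raw)
    return min_occurs, max_occurs


explicit = ET.fromstring(
    f'<xsd:element xmlns:xsd="{XSD}" ref="Title" minOccurs="1" maxOccurs="1"/>')
implicit = ET.fromstring(f'<xsd:element xmlns:xsd="{XSD}" ref="Title"/>')
repeated = ET.fromstring(
    f'<xsd:element xmlns:xsd="{XSD}" ref="Book" maxOccurs="unbounded"/>')

print(occurs(explicit) == occurs(implicit))
print(occurs(repeated))
```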
57
Much More to XMLSchema!
  • This was an overview of some basics
  • There are many other features, such as
  • The ability to import other schemas or parts of
    schemas
  • Ability to specify many data types
  • Etc.
  • XMLSchema definitions are at W3C
  • http://www.w3.org/TR/xmlschema-0/ is a good
    place to start

59
Other Protocols and Metadata Systems Using XML
  • SOAP (Simple Object Access Protocol)
  • DAV/DASL (Distributed Authoring and Versioning)
  • SDLIP (Simple Digital Library Interoperability
    Protocol)
  • RDF (Resource Description Framework)
  • ADL Gazetteer Protocol
  • OAI-PMH (already discussed)
  • MPEG-7 (more next time)
  • METS
  • Also versions of MARC and other formats in XML

60
SGML and XML Sources and Resources
  • Books
  • van Herwijnen, Eric. Practical SGML. (2nd Ed.)
    Boston: Kluwer Academic Publishers, 1994.
  • Goldfarb, Charles F. The SGML Handbook. Oxford:
    Clarendon Press, 1990. (and MANY XML books)
  • Web Sites
  • The W3C web site (all XML standards documents)
  • http://www.w3.org
  • Robin Cover's SGML/XML Site
  • http://www.oasis-open.org/cover/sgml-xml.html

62
Discussion: Vam Makam
  • Kirk covers examples of DTDs for books and
    newspapers. Many individuals and corporations
    have been creating numerous DTDs for themselves
    and general purposes. What are some innovative
    and useful ideas for areas where designing DTDs
    might be useful? For ideas that may have already
    been thought of, how could they be improved or
    extended?

63
Discussion: Vam Makam
  • Although XML DTDs emerged only recently, newer
    ideas such as XML Schemas have presented
    themselves as a better option. Given the thought
    and work that have gone into designing existing
    DTDs, at what point is it worth converting an
    existing DTD to an XML Schema?
  • Now that you have learned how to design a DTD and
    have basic knowledge about XML, what are some
    existing technologies that become more useful
    when combined with XML?

64
Discussion: Annie Yeh
  • Kirk addresses the advantages of using external
    DTDs, the reusability of public DTDs, the ability
    to focus on content rather than structure, easier
    management of multiple documents, and easier data
    error checking. What are some of the existing
    repositories in which we can store these DTDs?
    What are some of the ways with which we can
    facilitate this process? What are their pros and
    cons? What are some of the more ideal interfaces
    with which to facilitate this?

65
Discussion: Annie Yeh
  • What are the differences between DTDs and
    Schemas, and what are the pros and cons of each?

66
Next Time
  • Metadata for Motion Pictures: MPEG-7
  • Readings/Discussion
  • MPEG-7 (Part 1) (J. M. Martinez, R. Koenen, F.
    Pereira)
  • MPEG-7 (Part 2) (J. Martinez)
