XML and meta-tagging

Transcript and Presenter's Notes

1
XML and meta-tagging
Technical seminar for Pathfinder LEAs, BECTa, Coventry, 26 February 2002
Email: p.johnston_at_ukoln.ac.uk
URL: http://www.ukoln.ac.uk/
  • Pete Johnston
  • UKOLN, University of Bath
  • Bath, BA2 7AY

UKOLN is supported by
2
XML and meta-tagging
  • What is metadata? What is it used for?
  • Sharing metadata
  • semantics: introducing the Dublin Core
  • syntax: introducing the Extensible Markup
    Language (XML)
  • structure: the limits of XML
  • Introducing the Resource Description Framework
    (RDF)

3
What is metadata?
  • "Data associated with objects which relieves
    their potential users of having to have full
    advance knowledge of their existence or
    characteristics. A user might be a program or a
    person." (Dempsey and Heery, 1998)
  • "Machine understandable information about web
    resources or other things." (Berners-Lee, 1997)
  • Structured data about resources that can be used
    to help support a wide range of operations

4
What resources, objects, things?
  • HTML documents
  • digital images
  • databases
  • books
  • museum objects
  • archival records
  • metadata records
  • Web sites
  • collections
  • services
  • physical places
  • people
  • abstract works
  • concepts
  • events

5
Who/what is metadata for?
  • Used by:
  • human agents (owner, user/researcher, 3rd party
    services)
  • software agents (e.g. aggregators, portals,
    brokers)
  • Different flavours of metadata serve different
    purposes:
  • simple, generic vs. rich, specific
  • published widely vs. shared within community vs.
    used by resource owner/manager
  • Created by:
  • software tools (resource creation tools, indexing
    robots/web crawlers)
  • human agents (resource creator/owner, other
    parties)

6
Metadata embedded in resource
  • e.g. meta elements in HTML docs, summary
    properties in word processor docs
  • Can resource support embedding of metadata?
  • Does metadata creator have write access to
    resource?
  • Can service extract embedded metadata?
  • Metadata about aggregates of resources?
  • Metadata about people, places, concepts?
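As a rough illustration of a service extracting embedded metadata, the sketch below reads meta elements from an HTML page with Python's standard html.parser module. The sample page, the DC.creator / DC.date naming and the function name are assumptions made for the example; they are not taken from the slides.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect (name, content) pairs from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.metadata = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if "name" in attributes and "content" in attributes:
            self.metadata.append((attributes["name"], attributes["content"]))


# Hypothetical page with metadata embedded in the resource itself.
SAMPLE_PAGE = """<html><head>
<title>Report</title>
<meta name="DC.creator" content="J Smith">
<meta name="DC.date" content="2001-11-05">
</head><body><p>...</p></body></html>"""


def extract_embedded_metadata(html_text):
    """Return the embedded name/content pairs in document order."""
    parser = MetaExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.metadata


embedded = extract_embedded_metadata(SAMPLE_PAGE)
print(embedded)
```

The extraction only works because the service can fetch and parse the resource itself, which is the dependency the questions on this slide point at.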
7
Metadata linked from resource
  • e.g. link elements in HTML docs
  • Metadata record may be remote from resource
  • Can resource support embedding of link?
  • Does metadata creator have write access to
    resource?
  • Can service follow link to metadata record?
  • Metadata about aggregates of resources?
  • Metadata about people, places, concepts?
8
Metadata points to resource
  • e.g. most metadata records
  • Metadata record may be remote from resource
  • Does not require embedding of metadata or link
  • Does not require metadata creator to have write
    access to resource
  • Service obtains metadata record independently of
    resource
  • Metadata record can describe anything (with
    identifier)
  • Metadata record may persist after resource
    deleted
9
Metadata managed in database
Metadata content stored in database, exposed in
form(s) appropriate for service(s)
10
What operations?
  • Owner / manager / provider: establish control of
    own resources; administer/manage (through time);
    disclose/promote own resources widely; enable and
    control access/use; contextualise
  • Other metadata creator: disclose/promote
    resources (including resources owned by others);
    re-contextualise (re-describe, annotate)
  • Discovery service: disclose/promote resources
    from range of providers; re-contextualise
    (re-describe, annotate); facilitate user
    discovery
  • End user: find, identify, select resources from
    range of providers; obtain/use; interpret
11
Single resource provider
(Diagram labels: Resources; Resource owner; Metadata creator; Service provider)
12
Multiple resource providers
(Diagram labels: Multiple resource owners / Metadata creators / Local service providers; Separate portal service provider)
13
(Diagram labels: Multiple resource owners / Metadata creators / Local service providers; Other metadata creators; Separate portal service provider)
14
(No Transcript)
15
Metadata for resource discovery
  • Metadata for resource discovery
  • is used beyond its creator community
  • is combined/compared with metadata from other
    communities
  • is aggregated or cross-searched by services
  • Challenges of interoperability
  • How does a metadata provider make metadata
    records available in a commonly understood form?
  • (How does a service provider obtain these
    metadata records from data providers?)

16
How is metadata shared?
  • Metadata as language: metadata records as sets of
    statements
  • Effective transmission of information requires
    agreement on:
  • semantics: what terms mean
    (e.g. "cat", "to sit", "mat")
  • structure: significance of arrangement of terms
    (e.g. sentence: subject -> verb -> object, in
    English)
  • syntax: rules of expression
    (e.g. "The cat sat on the mat.")

17
Sharing metadata: semantics
Introducing the Dublin Core
18
Introducing the Dublin Core
  • Initiative to improve resource discovery on Web
  • not for complex resource description
  • simple document-like objects
  • extended to other classes of resource
  • Interdisciplinary consensus on simple element set
  • 15 elements
  • all optional
  • all repeatable

19
Introducing the Dublin Core (2)
  • Title
  • Subject
  • Description
  • Creator
  • Publisher
  • Contributor
  • Date
  • Type
  • Format
  • Identifier
  • Source
  • Language
  • Relation
  • Coverage
  • Rights
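To make "15 elements, all optional, all repeatable" concrete, here is a minimal sketch that holds a description as a list of (element, value) pairs and reports any names outside the element set. The helper name and the sample values are assumptions for illustration; the sketch does not check content rules such as date formats.

```python
# The 15 Dublin Core elements listed on this slide, in lower case.
DC_ELEMENTS = {
    "title", "subject", "description", "creator", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def unknown_elements(record):
    """Return the element names in `record` that are not Dublin Core elements.

    `record` is a list of (element, value) pairs. A list is used rather than
    a dict because every element may be repeated, and nothing is required.
    """
    return sorted({name for name, _value in record if name.lower() not in DC_ELEMENTS})


# Hypothetical description: "subject" appears twice, most elements are absent.
report = [
    ("title", "Report"),
    ("creator", "J Smith"),
    ("date", "2001-11-05"),
    ("subject", "XML"),
    ("subject", "metadata"),
]

print(unknown_elements(report))
print(unknown_elements([("level", "3")]))
```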

20
Introducing the Dublin Core (3)
  • Simplicity of semantics, ease of use
  • Provides basic semantic interoperability
  • across domains
  • across language communities
  • Allows for extensibility
  • but tension between extending DC and choosing
    other, richer schema
  • Interoperability requires
  • use of content rules/standards
  • clarity about resource being described
  • e.g. digital surrogate v physical original

21
Using the Dublin Core
  • Not a replacement for richer descriptive
    standards
  • "A pidgin language for use by tourists on the
    Internet commons" (Tom Baker, A Grammar of Dublin
    Core)
  • Can provide 15 windows into richer resource
    descriptions
  • disclose rich description in simple form
  • semantic cross-walks, mappings
  • (if you have rich descriptions, then) export
    rather than create?
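The "export rather than create" idea can be sketched as a crosswalk: a table mapping the fields of a richer local description onto Dublin Core elements. The local field names below (main_title, composer, shelf_mark and so on) are invented for the example and do not come from any real schema.

```python
# Hypothetical crosswalk from a richer local schema to Dublin Core elements.
CROSSWALK = {
    "main_title": "title",
    "composer": "creator",
    "lyricist": "contributor",
    "recording_date": "date",
    "label": "publisher",
}


def to_dublin_core(rich_record):
    """Export a rich record as a list of (element, value) pairs.

    Fields with no mapping are dropped: the simple form discloses the
    rich description without carrying all of its detail.
    """
    simple = []
    for field, value in rich_record.items():
        element = CROSSWALK.get(field)
        if element is not None:
            simple.append((element, value))
    return simple


rich = {
    "main_title": "Frownland",
    "composer": "Don Van Vliet",
    "shelf_mark": "A/12",  # hypothetical local field with no Dublin Core equivalent
}

print(to_dublin_core(rich))
```

The rich record stays the master copy; the Dublin Core view is generated from it whenever a simple description is needed.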

22
Sharing metadata: syntax
Introducing XML
23
Introducing XML
  • Extensible Markup Language
  • Recommendation of W3C, 1998, 2000
  • Defines means of describing tree-structured data
    in text-based format
  • embedded markup delimits and describes data
  • Meta-language
  • language for describing markup languages
  • can define unlimited number of markup languages
  • Widely adopted for transferring data between
    programs, systems

24
Introducing XML (2)
  • Simple syntax
  • Rules of XML made public so any programmer can
    write parser
  • Many parsers available for application developer
  • reusable software components
  • standard programming interfaces
  • Data independent of platform
  • Support from major software vendors
  • use of XML increasingly invisible to user
  • Foundation for Web services
  • distributed applications invoked over Web

25
Doc | Creator | Date       | Title
1   | J Smith | 2001-11-05 | Report

<table>
  <record>
    <doc>1</doc>
    <creator>J Smith</creator>
    <date>2001-11-05</date>
    <title>Report</title>
  </record>
</table>
26
Creator
Date
Title
Doc
Serialisation
<record> ... </record>
Transmission
<record> ... </record>
Remote application
De-serialisation
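The serialise / transmit / de-serialise cycle can be sketched with Python's standard xml.etree.ElementTree module. The element names follow the table / record example on the previous slide; the function names are assumptions, and "transmission" is reduced here to passing a string.

```python
import xml.etree.ElementTree as ET

FIELDS = ("doc", "creator", "date", "title")


def serialise(record):
    """Turn one table row (a dict) into an XML string."""
    table = ET.Element("table")
    rec = ET.SubElement(table, "record")
    for field in FIELDS:
        ET.SubElement(rec, field).text = record[field]
    return ET.tostring(table, encoding="unicode")


def deserialise(xml_text):
    """Rebuild the row from the XML string, as a remote application might."""
    table = ET.fromstring(xml_text)
    rec = table.find("record")
    return {child.tag: child.text for child in rec}


row = {"doc": "1", "creator": "J Smith", "date": "2001-11-05", "title": "Report"}
wire = serialise(row)          # what would be sent between systems
received = deserialise(wire)   # what the remote application reconstructs

print(wire)
print(received == row)
```

Both ends rely only on an XML parser and on knowing the element names, not on each other's database or platform.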
27
XML document types and vocabularies
  • XML lets me make up names for element types!
    Great!
  • But.
  • XML says nothing about what your names mean
  • will a human recipient of your document recognise
    your <level> element?
  • will a software agent process your <level>
    element correctly?
  • Communication requires consensus on
  • structural model of class of document/data
  • labelling of components
  • semantics of components
  • Shared use of common XML vocabularies

28
XML DTDs, XML Schemas
  • Means to codify syntax rules of vocabulary
  • what markup is allowed
  • structural constraints on use of markup
  • N.B. say nothing about what markup means
  • Document Type Definition
  • part of XML Recommendation
  • W3C XML Schema
  • recent W3C recommendation
  • data-typing i.e. tighter control on element
    content
  • support for combining vocabularies
  • uses XML syntax
  • Parser/authoring tool can validate markup of
    instance against rules in DTD or Schema

29
XML namespaces
  • Applications wish to use elements from multiple
    vocabularies (DTDs/Schemas)
  • particularly true of metadata applications
  • problems of name collisions
  • XML Namespaces
  • recommendation of W3C
  • provides universal naming mechanism
  • Namespace
  • a collection of names
  • given a name, which has the form of a URI
  • Element type names, attribute names qualified by
    a namespace name (a URI)
  • through use of prefix
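A small sketch of how namespaces resolve a name collision: two vocabularies both define an element called title, and a namespace-aware parser tells them apart by URI rather than by prefix. The Dublin Core namespace URI is the published one; the example.org namespace and the sample document are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
LOCAL = "http://example.org/terms/"  # hypothetical second vocabulary

# Both vocabularies use the local name "title" with different meanings.
DOCUMENT = """<record xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:my="http://example.org/terms/">
  <dc:title>Report</dc:title>
  <my:title>Dr</my:title>
</record>"""

root = ET.fromstring(DOCUMENT)

# ElementTree expands each prefixed name to "{namespace-URI}local-name".
dc_title = root.find(f"{{{DC}}}title").text
local_title = root.find(f"{{{LOCAL}}}title").text
expanded_names = [child.tag for child in root]

print(dc_title, local_title)
print(expanded_names)
```

The prefixes dc and my are only local shorthand; another document could bind different prefixes to the same URIs and the expanded names would be identical.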

30
Sharing metadata: structure
The limits of XML
31
The problem with XML
  • Statement:
  • this resource (song, document, picture... etc!)
    has dc:creator "Don Van Vliet"
  • Multiple expressions in XML:

<song id="123">
  <title>Frownland</title>
  <creator>Don Van Vliet</creator>
</song>

<lyric id="456" title="Frownland">
  <creator name="Don Van Vliet"/>
</lyric>

<music id="789" creator="Don Van Vliet">
  <title text="Frownland"/>
</music>

32
The problem with XML (2)
  • Different communities make different design
    choices for DTDs/XML Schemas
  • all good (and valid)
  • human reader of document can interpret (maybe)
  • program needs prior knowledge of structural
    conventions in each XML schema
  • Within resource description community, meaning(s)
    of structure(s) may be limited
  • Across communities, potentially unlimited
  • not scalable in an open environment
  • how to manage ever increasing set of conventions
  • always encountering unknown schemas
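The scaling problem can be seen in a short sketch. The three documents are the song, lyric and music expressions from the previous slide, written with quoted attribute values; the extractor table and function name are assumptions. Each structural convention needs its own hand-written rule, and a document from an unknown schema yields nothing.

```python
import xml.etree.ElementTree as ET

SONG = '<song id="123"><title>Frownland</title><creator>Don Van Vliet</creator></song>'
LYRIC = '<lyric id="456" title="Frownland"><creator name="Don Van Vliet"/></lyric>'
MUSIC = '<music id="789" creator="Don Van Vliet"><title text="Frownland"/></music>'

# One rule per schema: prior knowledge of where each one keeps the creator.
EXTRACTORS = {
    "song": lambda root: root.findtext("creator"),           # child element content
    "lyric": lambda root: root.find("creator").get("name"),  # attribute of child element
    "music": lambda root: root.get("creator"),               # attribute of root element
}


def find_creator(xml_text):
    """Return the creator, or None when the schema is not one we know."""
    root = ET.fromstring(xml_text)
    extractor = EXTRACTORS.get(root.tag)
    if extractor is None:
        return None
    return extractor(root)


creators = [find_creator(doc) for doc in (SONG, LYRIC, MUSIC)]
unknown = find_creator("<track><by>Don Van Vliet</by></track>")  # hypothetical fourth schema

print(creators)
print(unknown)
```

A human reader can guess that the by element in the fourth document names a creator; the program cannot until someone adds a fourth rule.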

33
The problem with XML (3)
  • "XML allows users to add arbitrary structure to
    their documents but says nothing about what the
    structures mean." (Berners-Lee, 2001)
  • Consensus on syntax
  • use of XML
  • Consensus on semantics of terms
  • meaning of (uniquely named through XML namespace)
    elements/attributes
  • No consensus on meaning of structure
  • e.g. parent-child element relations

34
Introducing RDF
  • Resource Description Framework Model and Syntax
  • Recommendation of W3C, 1999
  • Generic architecture for metadata
  • set of conventions for applications exchanging
    metadata
  • allow semantics to be defined by different
    resource description communities
  • accommodate mixing of metadata from diverse
    sources

35
Introducing RDF (2)
  • Defines
  • model for making statements about resources
  • conventions for encoding statements using XML
    syntax
  • Resource: any object identified by URI
  • not necessarily accessible via Web
  • Property: attribute to describe resource
  • properties also uniquely identified by URI
  • Statement: triple of specific resource, named
    property, and value

36
The RDF model
  • A resource has some property whose value is
    either (i) a simple string value (literal)

http://js.org/doc/1 --author--> "John"
  • The resource identified by the URI
    http://js.org/doc/1 has a property "author" whose
    value is "John"
  • Or, "John" is the author of the resource
    identified by http://js.org/doc/1

37
The RDF model (2)
  • or (ii) another resource...

http://js.org/doc/1 --author--> (unnamed resource)
(unnamed resource) --name--> "John"
(unnamed resource) --email--> "john_at_js.org"
  • The value of property "author" is another
    resource which has a property "name" with value
    "John" and a property "email" with value
    "john_at_js.org"

38
The RDF model (3)
  • which may itself have a URI

http://js.org/doc/1 --author--> http://js.org/person/john
http://js.org/person/john --name--> "John"
http://js.org/person/john --email--> "john_at_js.org"
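The model on the last three slides can be sketched as plain data: a statement is a triple of resource, property and value, and the value is either a literal or another resource. The class and function names are assumptions for the example, and the short property names (author, name, email) stand in for the property URIs a real RDF description would use.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    uri: str


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Statement:
    subject: Resource
    prop: str  # simplified: real RDF identifies properties by URI
    value: Union[Resource, Literal]


doc = Resource("http://js.org/doc/1")
john = Resource("http://js.org/person/john")

statements = [
    Statement(doc, "author", john),                       # value is another resource
    Statement(john, "name", Literal("John")),             # value is a literal
    Statement(john, "email", Literal("john_at_js.org")),
]


def describe(resource, all_statements):
    """Collect the properties stated about one resource."""
    return {s.prop: s.value for s in all_statements if s.subject == resource}


author = describe(doc, statements)["author"]
author_name = describe(author, statements)["name"].value

print(author.uri, author_name)
```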
39
The power of the RDF model
  • Extensible model
  • supports any vocabularies
  • Supports arbitrary complexity of description
  • URIs as unique fixed points to identify
  • resources
  • properties
  • Descriptions created independently can be
    merged using URIs as anchors
  • i.e. supports distributed metadata

40
First source
http://js.org/doc/1 --author--> http://js.org/person/john
http://js.org/person/john --name--> "John"
http://js.org/person/john --email--> "john_at_js.org"
41
Second source
http://js.org/doc/1 --subject--> "XML"
42
Third source
http://js.org/person/john --organisation--> "JS Foundation"
43
Three descriptions merged
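Merging can be sketched as a set union over triples. The three sources below repeat the statements from the preceding slides as (resource, property, value) tuples; the helper names are assumptions. Because all three sources use the same URIs, statements about the same thing line up without any negotiation between the sources.

```python
DOC = "http://js.org/doc/1"
JOHN = "http://js.org/person/john"

first = {
    (DOC, "author", JOHN),
    (JOHN, "name", "John"),
    (JOHN, "email", "john_at_js.org"),
}
second = {(DOC, "subject", "XML")}
third = {(JOHN, "organisation", "JS Foundation")}

# Merging independently created descriptions is just the union of their statements.
merged = first | second | third


def properties_of(uri, graph):
    """Collect property/value pairs for the resource identified by `uri`."""
    return {prop: value for subject, prop, value in graph if subject == uri}


def organisation_of_author(doc_uri, graph):
    """Answer a question that needs statements from more than one source."""
    author = properties_of(doc_uri, graph).get("author")
    return properties_of(author, graph).get("organisation")


print(len(merged))
print(organisation_of_author(DOC, merged))
```

No single source can say which organisation the document's author belongs to; the answer only exists in the merged description.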
44
The RDF XML syntax
  • XML representation of model
  • to store/exchange descriptions
  • All property names made unique through use of XML
    namespaces
  • Conventions for the meaning of structures in XML
    document
  • Service can know in advance the meaning of
    structures
  • even if unanticipated vocabularies used
  • partial understanding
  • can read multiple descriptions into store and
    merge on URIs
  • Generated by tools... more later!
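As an illustration of a tool generating RDF/XML, the sketch below writes one description using rdf:Description, rdf:about and Dublin Core property elements. The two namespace URIs are the published RDF and Dublin Core ones; the function name and sample values are assumptions, and only literal property values are handled.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to use the conventional prefixes when serialising.
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)


def describe_in_rdf_xml(resource_uri, dc_properties):
    """Serialise (element, literal value) pairs about one resource as RDF/XML."""
    root = ET.Element(f"{{{RDF}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": resource_uri}
    )
    for element, value in dc_properties:
        ET.SubElement(description, f"{{{DC}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


rdf_xml = describe_in_rdf_xml(
    "http://js.org/doc/1", [("creator", "John"), ("subject", "XML")]
)
print(rdf_xml)
```

The meaning of the nesting is fixed by convention: a child element of rdf:Description is always a property of the resource named in rdf:about, whichever vocabulary the property comes from.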

45
RDF Schema
  • Resource Description Framework Schema
  • Candidate Recommendation of W3C, 2000
  • Provides mechanisms to describe
  • terms used in RDF statements
  • semantic relationships between terms
  • e.g. Dublin Core metadata element set defined
    using RDF(S)
  • Defines type system
  • resources grouped into classes
  • classes related hierarchically (subClassOf)
  • properties related hierarchically (subPropertyOf)
  • use of properties constrained (domain, range)
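A sketch of what a subPropertyOf hierarchy makes possible: if a schema declares that composer is a sub-property of creator, a service that only knows creator can still use a statement made with composer. The property names, the example.org URI and the functions are assumptions for illustration; the sketch assumes the hierarchy contains no cycles and ignores classes, domain and range.

```python
# Hypothetical schema: each property maps to its declared super-property.
SUB_PROPERTY_OF = {
    "composer": "creator",
    "lyricist": "creator",
}


def super_properties(prop):
    """Follow subPropertyOf links upwards (assumes the hierarchy has no cycles)."""
    chain = []
    while prop in SUB_PROPERTY_OF:
        prop = SUB_PROPERTY_OF[prop]
        chain.append(prop)
    return chain


def infer(statements):
    """Add the statements implied by the property hierarchy."""
    inferred = set(statements)
    for subject, prop, value in statements:
        for parent in super_properties(prop):
            inferred.add((subject, parent, value))
    return inferred


known = {("http://example.org/song/1", "composer", "Don Van Vliet")}
expanded = infer(known)

print(sorted(expanded))
```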

46
RDF Schema (2)
  • RDF Schema employs RDF model
  • expressible using RDF/XML syntax
  • Other ontology languages building on RDF/RDFS
  • e.g. DAML+OIL
  • describe more complex relations between entities
  • Berners-Lee's vision of the Semantic Web
  • software agents navigating web of
    machine-processable descriptions and ontologies
  • making inferences about data collected
  • communicating via partial understanding

47
Summary
  • Resource discovery metadata is shared across
    boundaries of domain, sector etc
  • Effective sharing requires consensus on
  • semantics: shared vocabularies of uniquely named
    terms
  • syntax: XML
  • structure: common XML DTD/schema or RDF?
  • Simple RDF model as basis of machine-processable
    statements about resources

48
Acknowledgements
  • UKOLN is funded by Resource: The Council for
    Museums, Archives and Libraries, the Joint
    Information Systems Committee (JISC) of the UK
    higher and further education funding councils, as
    well as by project funding from the JISC and the
    European Union. UKOLN also receives support from
    the University of Bath where it is based.
  • http://www.ukoln.ac.uk/