Title: Creating Topic Maps (Topic Maps and Knowledge Organization)
1 Creating Topic Maps
Topic Maps and Knowledge Organization
- Steve Pepper
- pepper.steve@gmail.com
- Oslo University College, 2007-09-15
2 Course agenda
- Week 37 (09-08): Introduction to Topic Maps, Part 1
- Week 38 (09-15): Creating a topic map
- Week 39 (09-22): Introduction to Topic Maps, Part 2
- Week 42 (10-13): Ontology-driven editing
- Week 43 (10-20): The machinery of Topic Maps
- Week 46 (11-10): (Semantic Web)
- Week 48 (11-24): (Ontologies)
- Terminology
  - Topic Maps: the technology and the standard
  - topic maps: the artefacts (documents) we create
3 Today's agenda
- Quick recap: basic concepts and building blocks
- Topic Maps and Knowledge Organization
  - Metadata, taxonomies, thesauri, faceted classification
- Interchange syntaxes
  - XTM, LTM and CTM
- Demo: Creating a topic map using LTM
  - Pay close attention...
4 Recap: Core concepts
5 Recap: Basic building blocks
- Basic building blocks are
  - Topics, e.g. Puccini, Lucca, Tosca
  - Associations, e.g. "Puccini was born in Lucca"
  - Occurrences, e.g. "http://www.opera.net/puccini/bio.html is a biography of Puccini"
- Each of these constructs can be typed
  - Topic types: composer, city, opera
  - Association types: born in, composed by
  - Occurrence types: biography, street map, synopsis
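To make the building blocks concrete, here is a minimal Python sketch of an in-memory topic map holding the Puccini example. The class names `Topic`, `Association` and `Occurrence` and their fields are hypothetical, chosen for illustration; they are not the API of any Topic Maps engine, and roles and scope are deliberately left out.

```python
from dataclasses import dataclass, field

# Hypothetical classes for illustration only; not a standard Topic Maps API.
# Types are referenced by topic id, because in Topic Maps the types are topics too.

@dataclass
class Topic:
    id: str
    types: list[str] = field(default_factory=list)   # ids of topic-type topics
    names: list[str] = field(default_factory=list)

@dataclass
class Association:
    type: str             # id of the association-type topic, e.g. "born-in"
    players: list[str]    # ids of the participating topics (roles omitted here)

@dataclass
class Occurrence:
    topic: str            # id of the topic the resource is relevant to
    type: str             # id of the occurrence-type topic, e.g. "biography"
    resource: str         # a URL (or an inline value)

topics = {
    "puccini": Topic("puccini", ["composer"], ["Puccini"]),
    "lucca": Topic("lucca", ["city"], ["Lucca"]),
    "tosca": Topic("tosca", ["opera"], ["Tosca"]),
}
associations = [Association("born-in", ["puccini", "lucca"])]
occurrences = [
    Occurrence("puccini", "biography", "http://www.opera.net/puccini/bio.html"),
]

def instances_of(topic_type: str) -> list[str]:
    """Return the ids of all topics that have topic_type among their types."""
    return [t.id for t in topics.values() if topic_type in t.types]

def occurrences_of(topic_id: str, occ_type: str) -> list[str]:
    """Return the resources of the given type attached to a topic."""
    return [o.resource for o in occurrences
            if o.topic == topic_id and o.type == occ_type]

print(instances_of("composer"))               # ['puccini']
print(occurrences_of("puccini", "biography"))
```

Types are stored as topic ids rather than free-text labels: in Topic Maps, `composer`, `born-in` and `biography` are themselves topics.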
6 Topic Maps and Knowledge Organization
- Keywords and controlled vocabularies
- Taxonomies, thesauri and classifications
- Indexes and glossaries
- Ontologies
7 Bibliographic languages
- Work language
- Author language
- Title language
- Edition language
- Subject language
- Classification language
- Index language
- Document language
- Production language
- Carrier language
- Location language
- Svenonius, Elaine (2000). The Intellectual Foundation of Information Organization. Cambridge, MA: MIT Press (p. 54)
- Work languages
  - Work languages describe information entities, their intellectual (as opposed to physical) attributes, and relationships among them. (p. 87)
- Document languages
  - A document is a particular space-time embodiment of information; a document language describes and provides access to this embodiment. (p. 107)
- Subject languages
  - A subject language is used to depict what a document is about. (p. 127)
8 Two perspectives
- Works have tended to be conflated with documents
- So in practice there have been two kinds of language
  - Document languages
    - describe the work and its manifestations
    - document-centric (or resource-centric), e.g.
      - document metadata (Dublin Core)
      - bibliographic records (MARC)
  - Subject languages
    - describe the subject space in which the work exists
    - subject-centric, e.g.
      - thesauri, taxonomies (ICD)
      - classification schemes (LCSH, DDC)
      - faceted classification (Colon)
9 Metadata
- Data about data
  - Information about documents
  - e.g. author, title, publisher, date, format, keywords
- Useful for managing the content
  - Especially suitable for librarians
- Somewhat useful for searching
  - Especially for experts
- Less useful for end-users
  - the user starts out wanting to know more about a subject
  - traditional metadata, however, focuses on the document
  - if aboutness is provided at all, it gets squeezed into a single field
10 Keywords
- Primitive form of subject-based classification
  - The keywords are used to describe the subject
- Cheap and simple: folksonomies and tagging
- But also problematic, because authors
  - misspell keywrods,
  - use different keywords/terms/tags for the same thing, and
  - use keywords that make no sense
- Secondary problem
  - No way for the user to find out what keywords have been used
- A keyword is a topic name
11 Controlled vocabularies
- Solution: create a list of legal keywords!
  - Requires somewhere to keep the list, and a process for adding new terms
- Benefits
  - Solves the problems of misspellings and duplicates (synonyms)
- Disadvantages
  - Introduces some overhead (a flat list is difficult to manage)
  - Users can still search using the wrong terms
  - Users (and authors) still have difficulty finding terms
- A controlled vocabulary is a well-defined set of topics with one name per topic
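A controlled vocabulary can be enforced with a simple lookup. The sketch below, using a hypothetical four-term vocabulary, accepts legal keywords and suggests the closest legal term for rejected ones with Python's standard `difflib` module. It shows both the benefit (misspellings are caught) and a remaining weakness (a synonym gets no useful suggestion).

```python
import difflib

# Hypothetical controlled vocabulary: the list of legal keywords.
VOCABULARY = {"opera", "composer", "librettist", "premiere"}

def check_keywords(keywords):
    """Return {rejected keyword: [closest legal term]} for keywords not in VOCABULARY.

    Legal keywords are accepted silently and do not appear in the result.
    """
    rejected = {}
    for kw in keywords:
        term = kw.strip().lower()
        if term not in VOCABULARY:
            # difflib only catches spelling variants; it cannot map a synonym
            # such as "music drama" to "opera".
            rejected[kw] = difflib.get_close_matches(term, sorted(VOCABULARY), n=1)
    return rejected

print(check_keywords(["Opera", "composr", "music drama"]))
```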
12 Taxonomies
- Organize the keywords into a tree
  - Most general at the top, more specific further down
  - Common structure used by Yahoo!, etc.
- The folder metaphor
  - file systems, email, favourites
- Requires relationships between terms
  - Relationships state that one term is more specific than another
- Advantage: terms are somewhat easier to find
- Disadvantage: the real world does not fit neatly into a hierarchy
- A taxonomy is a set of topics related through a specific type of hierarchical association
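A taxonomy needs only one relationship type, "more specific than". In this minimal sketch (the terms are hypothetical) each term stores its single broader term, which is enough to walk up to the root or down to all narrower terms.

```python
# Hypothetical taxonomy: each term points to its single broader (parent) term.
PARENT = {
    "music": None,        # root: the most general term
    "opera": "music",
    "symphony": "music",
    "verismo": "opera",
}

def path_to_root(term):
    """Return the chain from the most general term down to term."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return list(reversed(path))

def narrower(term):
    """Return all terms below term in the tree (depth-first)."""
    result = []
    for child, parent in PARENT.items():
        if parent == term:
            result.append(child)
            result.extend(narrower(child))
    return result

print(path_to_root("verismo"))   # ['music', 'opera', 'verismo']
print(narrower("music"))         # ['opera', 'verismo', 'symphony']
```

Because the dictionary holds exactly one parent per term, a term that belongs under two headings cannot be represented, which is the "real world does not fit neatly into a hierarchy" problem in miniature.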
13 Thesauri
- Like a taxonomy, but with some extensions
  - Also better defined: there are ISO standards for thesauri
- Relationship types
  - BT: Broader term; NT: Narrower term
  - USE: Preferred term; UF: Non-preferred terms
  - RT: Related term
  - SN: Scope note
- A thesaurus is a set of topics related through particular, predefined association types
  - BT/NT (hierarchical) and RT (untyped, associative)
  - (Scope notes are a kind of occurrence)
  - (USE and UF represent multiple names for the same concept/topic)
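The thesaurus relationships map onto a small data structure. In this sketch (the terms and the scope note are illustrative assumptions, and each term has at most one broader term for brevity) only BT and USE are stored; NT and UF are derived as their inverses, and RT is treated as symmetric.

```python
# Hypothetical thesaurus data; terms and scope note are illustrative only.
BT = {"verismo": "opera", "opera buffa": "opera"}   # narrower term -> broader term
USE = {"music drama": "opera"}                      # non-preferred -> preferred term
RT = {("opera", "libretto")}                        # related terms (unordered pairs)
SN = {"opera": "Dramatic work set to music."}       # scope notes

def entry(term):
    """Return the thesaurus entry for term, redirecting non-preferred terms (USE)."""
    pref = USE.get(term, term)
    return {
        "term": pref,
        "BT": BT.get(pref),
        "NT": sorted(n for n, b in BT.items() if b == pref),    # NT is the inverse of BT
        "UF": sorted(n for n, p in USE.items() if p == pref),   # UF is the inverse of USE
        "RT": sorted({b for a, b in RT if a == pref} | {a for a, b in RT if b == pref}),
        "SN": SN.get(pref),
    }

print(entry("music drama"))
```

Storing one direction and deriving the other mirrors the slide's point: BT/NT is a single hierarchical association type read from either end, and USE/UF is really one topic with several names.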
14 Faceted classification
- Invented by S. R. Ranganathan in the 1930s
- Defines a number of facets, or dimensions
- Defines a set of terms within each facet
  - Sometimes these terms are arranged in a taxonomy
- Documents are classified against each facet separately
- A faceted classification is a collection of topic hierarchies
  - Each hierarchy contains topics whose names are used as terms within a particular facet
- XFML: an XML interchange syntax for faceted classification, inspired by Topic Maps
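Classifying documents against each facet separately makes combined searches straightforward. A minimal sketch, assuming three hypothetical facets and three hypothetical documents:

```python
# Hypothetical facets: each facet has its own set of legal terms.
FACETS = {
    "composer": {"puccini", "verdi"},
    "form": {"opera", "requiem"},
    "language": {"italian", "latin"},
}
# Each document is classified against every facet separately.
DOCS = {
    "doc1": {"composer": "puccini", "form": "opera", "language": "italian"},
    "doc2": {"composer": "verdi", "form": "requiem", "language": "latin"},
    "doc3": {"composer": "verdi", "form": "opera", "language": "italian"},
}

def classification_ok(classification):
    """Check that every assigned value is a legal term of its facet."""
    return all(value in FACETS.get(facet, set())
               for facet, value in classification.items())

def search(**criteria):
    """Return ids of documents matching all given facet values."""
    return sorted(doc for doc, cls in DOCS.items()
                  if all(cls.get(facet) == value for facet, value in criteria.items()))

print(search(composer="verdi", form="opera"))   # ['doc3']
print(search(language="italian"))               # ['doc1', 'doc3']
```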
15 Expressivity progression
(Slide scale: open model at the top, fixed model below it, no model at the bottom)
- Topic maps
  - use any types, properties, and relationships you like
- Faceted classification
  - multiple vocabularies, taxonomies or thesauri (one per facet)
- Thesauri
  - more formal taxonomy; still no topic types; two association types
- Taxonomy
  - terms arranged in a hierarchy; no topic types; single association type
- Controlled vocabulary, folksonomies
  - just a list of terms; no topic types; no associations
16 Document-centric approaches
- Traditional metadata is document-centric
  - Provides substantial descriptive power for documents
  - Allows connection into subject-based classification
  - Crucial for the management of content
  - However, users are most interested in the subjects
- Taxonomies, thesauri, and faceted classification are also document-centric
  - These are methods for subject-based classification
  - They provide hardly any descriptive power for subjects
17 Subject-centric approaches
- Topic maps are subject-centric
  - They provide great descriptive power for subjects
  - Good as finding aids, because subjects are what users care about
- Documents can be treated as subjects
  - This enables topic maps to capture metadata as well
  - It also enables topic maps to stitch metadata and subject-based classification together into one seamless whole
- Topic Maps is the knowledge model par excellence
  - A subject-centric knowledge model that encompasses every other kind of knowledge organization model
  - Topic Maps can therefore be used to relate and combine taxonomies, indexes, thesauri, classifications, etc.
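Treating documents as subjects means one set of associations can carry both metadata and subject-based classification. The sketch below (topic ids, association types such as `is-about`, and the triple representation are hypothetical) starts from a subject, follows knowledge about it, and ends at a document and its metadata.

```python
# Hypothetical topics: a document and the subjects it concerns are all topics.
TOPICS = {
    "bio-doc": {"type": "document", "title": "Puccini biography", "format": "text/html"},
    "puccini": {"type": "composer", "name": "Puccini"},
    "lucca": {"type": "city", "name": "Lucca"},
}
# Associations as (type, player, player) triples.
ASSOCS = [
    ("is-about", "bio-doc", "puccini"),   # subject-based classification of a document
    ("born-in", "puccini", "lucca"),      # knowledge about the subject itself
]

def related(topic, assoc_type):
    """Return topics linked to topic by assoc_type, in either direction."""
    out = []
    for kind, a, b in ASSOCS:
        if kind != assoc_type:
            continue
        if a == topic:
            out.append(b)
        elif b == topic:
            out.append(a)
    return out

# Start at a subject (Lucca), go to people born there, then to documents about them.
people = related("lucca", "born-in")
docs = [d for p in people for d in related(p, "is-about")
        if TOPICS[d]["type"] == "document"]
for d in docs:
    print(TOPICS[d]["title"], TOPICS[d]["format"])   # the document's metadata
```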
18 Syntaxes
- XTM, LTM and CTM
- What are they?
- When should I use which?
19 Topic Maps syntaxes
- HyTM (HyTime Topic Maps)
  - Original syntax, expressed in terms of SGML and HyTime
  - No longer part of ISO 13250
- XTM (XML Topic Maps Syntax)
  - Later, XML-based syntax, recently moved to version 2.0
  - Easy to understand but very verbose
- LTM (Linear Topic Map Notation)
  - Defined by Ontopia in 2001 and supported by other products
  - A simple plain-text syntax for rapid prototyping
- CTM (Compact Topic Maps Syntax)
  - ISO standard replacement for LTM
  - Complete draft exists, but no implementations yet
20 Topic Map: XTM 1.0 Syntax
<!ELEMENT topicMap ( topic | association | mergeMap )* >
<!ATTLIST topicMap
   id           ID      #IMPLIED
   xmlns        CDATA   #FIXED 'http://www.topicmaps.org/xtm/1.0/'
   xmlns:xlink  CDATA   #FIXED 'http://www.w3.org/1999/xlink'
   xml:base     CDATA   #IMPLIED >

<?xml version="1.0" encoding="ISO-8859-1"?>
<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <!-- topics, associations, and mergeMap elements go here -->
</topicMap>
21 Topic Map: LTM Syntax
/* topics, associations, and occurrences go here */
22 Topic: XTM 1.0 Syntax
<!ELEMENT topic ( instanceOf*, subjectIdentity?, ( baseName | occurrence )* ) >
<!ATTLIST topic id ID #REQUIRED >

<topic id="italy"> ... </topic>
<topic id="puccini"> ... </topic>
23 Topic: LTM Syntax
[topic-id]

[italy]
[puccini]
24 Topic Name: XTM 1.0 Syntax (1 of 2)
<!ELEMENT baseName ( scope?, baseNameString, variant* ) >
<!ATTLIST baseName id ID #IMPLIED >
<!ELEMENT baseNameString ( #PCDATA ) >
<!ATTLIST baseNameString id ID #IMPLIED >
<!ELEMENT variant ( parameters, variantName?, variant* ) >
<!ATTLIST variant id ID #IMPLIED >
<!ELEMENT variantName ( resourceRef | resourceData ) >
<!ATTLIST variantName id ID #IMPLIED >
25 Topic Name: XTM 1.0 Syntax (2 of 2)
<topic id="la-boheme">
  <baseName>
    <baseNameString>La Bohème</baseNameString>
    <variant>
      <parameters>
        <subjectIndicatorRef xlink:href="http://www.topicmaps.org/xtm/1.0/core.xtm#sort"/>
      </parameters>
      <variantName>
        <resourceData>Bohème, La</resourceData>
      </variantName>
    </variant>
  </baseName>
</topic>
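The sort variant in the example above exists so that "La Bohème" can be filed under B. A minimal sketch, assuming topics are plain dictionaries with an optional `sort` entry standing in for the XTM `sort` variant (the topic list is hypothetical):

```python
# Hypothetical topics; "sort" plays the role of the sort-name variant.
TOPICS = [
    {"id": "tosca", "name": "Tosca"},
    {"id": "la-boheme", "name": "La Bohème", "sort": "Bohème, La"},
    {"id": "falstaff", "name": "Falstaff"},
    {"id": "aida", "name": "Aida"},
]

def sort_key(topic):
    """Use the sort-name variant if present, else the base name."""
    return topic.get("sort", topic["name"]).casefold()

for t in sorted(TOPICS, key=sort_key):
    print(t["name"])   # Aida, La Bohème, Falstaff, Tosca (one per line)
```

Without the variant, "La Bohème" would sort after "Falstaff". This compares code points after case folding; a real application would probably want locale-aware collation.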
26 Topic Name: LTM Syntax
[topic-id = "basename" ; "sortname"? ; "dispname"?]

[la-boheme = "La Bohème" ; "Bohème, La"]
27 Topic Type: XTM 1.0 Syntax
Use the <instanceOf> subelement
<topic id="opera"> ... </topic>
<topic id="tosca">
  <instanceOf>
    <topicRef xlink:href="#opera"/>
  </instanceOf>
</topic>
<topic id="boito">
  <instanceOf>
    <topicRef xlink:href="#composer"/>
  </instanceOf>
  <instanceOf>
    <topicRef xlink:href="#librettist"/>
  </instanceOf>
</topic>
28 Topic Type: LTM Syntax
[topic-id : topic-type]

[tosca : opera]
[boito : composer librettist]
29 Occurrence: XTM 1.0 Syntax
Use the <occurrence> subelement: <resourceRef> for external resources, <resourceData> for internal ones
<!ELEMENT occurrence ( instanceOf?, scope?, ( resourceRef | resourceData ) ) >
<!ATTLIST occurrence id ID #IMPLIED >

<topic id="la-boheme">
  <occurrence>
    <instanceOf><topicRef xlink:href="#homepage"/></instanceOf>
    <resourceRef xlink:href="http://www.opera.it/Opere/La-Boheme/La-Boheme.html"/>
  </occurrence>
  <occurrence>
    <instanceOf><topicRef xlink:href="#premiere-date"/></instanceOf>
    <resourceData>1896 (1 Feb)</resourceData>
  </occurrence>
</topic>
30 Occurrence: LTM Syntax
{topic-id, occurrence-type, "URL"}
{topic-id, occurrence-type, [[data]]}

{la-boheme, homepage, "http://www.opera.it/Opere/La-Boheme/La-Boheme.html"}
{la-boheme, premiere-date, [[1896 (1 Feb)]]}
31 Topic Complete: XTM 1.0 Syntax
<topic id="la-boheme">
  <instanceOf><topicRef xlink:href="#opera"/></instanceOf>
  <baseName>
    <baseNameString>La Bohème</baseNameString>
    <variant>
      <parameters>
        <subjectIndicatorRef xlink:href="http://www.topicmaps.org/xtm/1.0/core.xtm#sort"/>
      </parameters>
      <variantName><resourceData>Bohème, La</resourceData></variantName>
    </variant>
  </baseName>
  <occurrence>
    <instanceOf><topicRef xlink:href="#homepage"/></instanceOf>
    <resourceRef xlink:href="http://www.opera.it/Opere/La-Boheme/La-Boheme.html"/>
  </occurrence>
  <occurrence>
    <instanceOf><topicRef xlink:href="#premiere-date"/></instanceOf>
    <resourceData>1896 (1 Feb)</resourceData>
  </occurrence>
</topic>
32 Topic Complete: LTM Syntax
[la-boheme : opera = "La Bohème" ; "Bohème, La"]
{la-boheme, homepage, "http://www.opera.it/Opere/La-Boheme/La-Boheme.html"}
{la-boheme, premiere-date, [[1896 (1 Feb)]]}
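The verbosity of XTM compared with LTM is easiest to appreciate by generating it. This sketch uses Python's standard `xml.etree.ElementTree` to build the complete `la-boheme` topic; the helper functions `el`, `type_ref` and `build_topic` are hypothetical names introduced here, and the output is not validated against the XTM 1.0 DTD.

```python
import xml.etree.ElementTree as ET

XTM = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"
SORT = "http://www.topicmaps.org/xtm/1.0/core.xtm#sort"
HREF = f"{{{XLINK}}}href"               # serialized as xlink:href
ET.register_namespace("", XTM)          # XTM as the default namespace
ET.register_namespace("xlink", XLINK)

def el(parent, tag, attrib=None, text=None):
    """Add an XTM-namespaced child element, optionally with text."""
    child = ET.SubElement(parent, f"{{{XTM}}}{tag}", attrib or {})
    if text is not None:
        child.text = text
    return child

def type_ref(parent, type_id):
    """Add <instanceOf><topicRef xlink:href="#type_id"/></instanceOf>."""
    el(el(parent, "instanceOf"), "topicRef", {HREF: "#" + type_id})

def build_topic(parent, tid, type_id, name, sort_name, occurrences):
    """Add a <topic> with one type, one base name with a sort variant, and
    occurrences given as (occurrence-type, value, is_url) triples."""
    topic = el(parent, "topic", {"id": tid})
    type_ref(topic, type_id)
    base = el(topic, "baseName")
    el(base, "baseNameString", text=name)
    variant = el(base, "variant")
    el(el(variant, "parameters"), "subjectIndicatorRef", {HREF: SORT})
    el(el(variant, "variantName"), "resourceData", text=sort_name)
    for occ_type, value, is_url in occurrences:
        occ = el(topic, "occurrence")
        type_ref(occ, occ_type)
        if is_url:
            el(occ, "resourceRef", {HREF: value})      # external resource
        else:
            el(occ, "resourceData", text=value)         # internal resource
    return topic

topic_map = ET.Element(f"{{{XTM}}}topicMap")
build_topic(topic_map, "la-boheme", "opera", "La Bohème", "Bohème, La", [
    ("homepage", "http://www.opera.it/Opere/La-Boheme/La-Boheme.html", True),
    ("premiere-date", "1896 (1 Feb)", False),
])
xml_text = ET.tostring(topic_map, encoding="unicode")
print(xml_text)
```

The output is a single line; on Python 3.9 or later, calling `ET.indent(topic_map)` before serializing produces indented XML.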
33 Association: XTM 1.0 Syntax
<!ELEMENT association ( instanceOf?, scope?, member+ ) >
<!ATTLIST association id ID #IMPLIED >
<!ELEMENT member ( roleSpec?, ( topicRef | ... )* ) >
<!ATTLIST member id ID #IMPLIED >
<!ELEMENT roleSpec ( topicRef | ... ) >

<association>
  <instanceOf><topicRef xlink:href="#composed-by"/></instanceOf>
  <member>
    <roleSpec><topicRef xlink:href="#composer"/></roleSpec>
    <topicRef xlink:href="#puccini"/>
  </member>
  <member>
    <roleSpec><topicRef xlink:href="#work"/></roleSpec>
    <topicRef xlink:href="#tosca"/>
  </member>
</association>
34 Association: LTM Syntax
assoc-type( role-player, role-player, ... )

composed-by( puccini, tosca )

Note 1: There can be more than two role-players in an association. We'll talk about that next week.

Note 2: The above is an oversimplification, because we have not yet talked about role types. We'll do that next week. The full syntax is as follows:

assoc-type( role-player : role-type, role-player : role-type, ... )

composed-by( puccini : composer, tosca : work )

When the role type is omitted, it is assumed to be identical to the type of the role-playing topic. This can be a useful shorthand and we will use it for now, but it is not always what you want...
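The role-type default described in Note 2 can be sketched with a tiny parser for the simple association form only. The topic types and the function `parse_association` are hypothetical, and the regular expression does not cover the full LTM grammar (no scope, no reification, and topics are assumed to have a single type).

```python
import re

# Hypothetical topic types, as declared in LTM with [puccini : composer] etc.
TOPIC_TYPES = {"puccini": "composer", "tosca": "opera"}

# Matches the simple form: assoc-type( ... )
ASSOC_RE = re.compile(r"^\s*([\w-]+)\s*\((.*)\)\s*$")

def parse_association(ltm):
    """Parse 'type( player : role, ... )' into (type, [(player, role), ...]).

    If ': role' is omitted, the role type defaults to the player's topic type.
    """
    match = ASSOC_RE.match(ltm)
    if match is None:
        raise ValueError(f"not a simple LTM association: {ltm!r}")
    assoc_type, body = match.groups()
    members = []
    for part in body.split(","):
        player, _, role = (s.strip() for s in part.partition(":"))
        members.append((player, role or TOPIC_TYPES[player]))
    return assoc_type, members

print(parse_association("composed-by( puccini : composer, tosca : work )"))
print(parse_association("composed-by( puccini , tosca )"))
```

In the second call `tosca` ends up playing the role `opera` rather than `work`, which is exactly the case where the shorthand is not what you want.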
35 Subject Identity: XTM 1.0 Syntax
<!ELEMENT topic ( instanceOf*, subjectIdentity?, ... ) >
<!ELEMENT subjectIdentity ( resourceRef?, ( topicRef | subjectIndicatorRef )* ) >

<!-- Refer to a resource as subject -->
<topic id="foo">
  <subjectIdentity>
    <resourceRef xlink:href="http://www.ontopia.net"/>
  </subjectIdentity>
  <baseName>
    <baseNameString>The Ontopia Website</baseNameString>
  </baseName>
</topic>

<!-- Refer to a subject indicator -->
<topic id="bar">
  <subjectIdentity>
    <subjectIndicatorRef xlink:href="http://www.ontopia.net/about.html"/>
  </subjectIdentity>
  <baseName>
    <baseNameString>Ontopia</baseNameString>
  </baseName>
</topic>
36 Subject Identity: LTM Syntax
[topic-id = "names" %"subject-address-URL"]
[topic-id = "names" @"subject-indicator-URL"]

/* Refer to a resource as subject */
[foo = "The Ontopia Website" %"http://www.ontopia.net"]

/* Refer to a subject indicator */
[bar = "Ontopia" @"http://www.ontopia.net/about.html"]
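The difference between a subject address and a subject indicator matters when deciding whether two topics are about the same thing. This sketch assumes a simplified version of the XTM 1.0 identity rule: the same subject address, or at least one shared subject indicator. The `Identity` class and the third topic `ontopia` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Identity:
    topic_id: str
    subject_address: Optional[str] = None                  # the resource IS the subject (%"..." in LTM)
    subject_indicators: set = field(default_factory=set)   # resources that INDICATE the subject (@"...")

def same_subject(a: Identity, b: Identity) -> bool:
    """Simplified rule: shared subject address, or any shared subject indicator."""
    if a.subject_address is not None and a.subject_address == b.subject_address:
        return True
    return bool(a.subject_indicators & b.subject_indicators)

foo = Identity("foo", subject_address="http://www.ontopia.net")
bar = Identity("bar", subject_indicators={"http://www.ontopia.net/about.html"})
ontopia = Identity("ontopia", subject_indicators={"http://www.ontopia.net/about.html"})

print(same_subject(foo, bar))       # False: the website is not the company
print(same_subject(bar, ontopia))   # True: shared subject indicator
```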
37 Scope: XTM 1.0 Syntax
<!-- "scope" subelements on baseName, occurrence, and association
     (also "parameters" on variant) -->
<topic id="composed-by">
  <baseName>
    <baseNameString>composed by</baseNameString>
  </baseName>
  <baseName>
    <scope><topicRef xlink:href="#composer"/></scope>
    <baseNameString>composer of</baseNameString>
  </baseName>
</topic>
<topic id="la-boheme2">
  <baseName>
    <baseNameString>La Bohème (Leoncavallo)</baseNameString>
  </baseName>
  <baseName>
    <scope><topicRef xlink:href="#leoncavallo"/></scope>
    <baseNameString>La Bohème</baseNameString>
  </baseName>
</topic>
38 Scope: LTM Syntax
(name or occurrence or association) / scoping-topic(s)

[composed-by = "composed by" = "composer of" / composer]
[la-boheme1 = "La Bohème (Puccini)" = "La Bohème" / puccini]
[la-boheme2 = "La Bohème (Leoncavallo)" = "La Bohème" / leoncavallo]
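Scope lets an application pick the right name for the context. The selection rule below (prefer the scoped name sharing the most topics with the current context, otherwise fall back to an unscoped name) is an illustrative assumption, not a rule defined by XTM or LTM; the name table mirrors the examples above.

```python
# Names as (name, scope) pairs; an empty scope means the name is unconstrained.
NAMES = {
    "la-boheme1": [("La Bohème (Puccini)", set()), ("La Bohème", {"puccini"})],
    "la-boheme2": [("La Bohème (Leoncavallo)", set()), ("La Bohème", {"leoncavallo"})],
    "composed-by": [("composed by", set()), ("composer of", {"composer"})],
}

def display_name(topic_id, context):
    """Pick the name whose scope best matches the context (hypothetical rule)."""
    names = NAMES[topic_id]
    best_name, best_scope = max(names, key=lambda n: len(n[1] & context))
    if best_scope & context:
        return best_name
    # No scoped name applies: use the first unconstrained name, else the first name.
    return next((name for name, scope in names if not scope), names[0][0])

print(display_name("la-boheme1", set()))           # La Bohème (Puccini)
print(display_name("la-boheme2", {"leoncavallo"}))  # La Bohème
print(display_name("composed-by", {"composer"}))    # composer of
```

On a page about Leoncavallo, the short name is unambiguous; everywhere else, the unscoped name disambiguates the two operas.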
39 Demo: Creating a topic map
40 Home assignment
- Prerequisites
  - You have installed Java and the OKS Samplers
  - You know the basics of LTM
    - http://www.ontopia.net/download/ltm.html
- Create your first topic map
  - Decide what domain you want to cover
  - Write LTM in a text editor (Notepad, TextPad, emacs, ...)
  - Keep it in its own directory
  - Copy it to .../apache-tomcat/webapps/omnigator/WEB-INF/topicmaps for testing in the Omnigator
  - Use the Reload function
41 Your own topic map
- Choose something that really interests you
  - It's much more fun than something boring!
- Some ideas
  - Sport (football, cricket, ...)
  - Culture (music, film, literature, theatre, ...)
  - Study courses
  - Project management
  - Conference website
  - Languages
  - Geography
- This first topic map is your own personal one
  - The next one will be a group project for term assessment
- Requirements
  - Minimum 4 topic types, 4 association types, 4 occurrence types
  - Minimum 10 topics, 20 associations, 10 occurrences
  - Send to pepper.steve@gmail.com by Monday 29 September
42 Next lecture
- Monday September 22
- Same time, same place
- Agenda
  - Advanced features (roles, scope, identity, reification)
  - Help with home assignment