Title: Topic Maps
1Topic Maps
- Vijay Raghavan
- Distinguished Professor
- University of Louisiana at Lafayette -ULL
- The Center for Advanced Computer Studies -CACS
2What are Topic Maps?
- An ontology technology
- based on a model of concepts and relations
- not hierarchical: a network model
- starts from what the information is about
- Designed for information and knowledge organization
- based on the concepts from back-of-book indexes
- used to organize large sets of information resources
- Also used for information integration
- identity model used for automated merging
- information can be gathered from many different sources and integrated
- very powerful for portal integration
3The 2-Layer Topic Map Model
- The core concepts of Topic Maps are based on those of the back-of-book index
- The same basic concepts have been extended and generalized for use with digital information
- Envisage a 2-layer data model consisting of
- a set of information resources (below), and a knowledge map (above)
- This is like the division of a book
4(1) The Information Layer
- The lower layer contains the content
- usually digital, but need not be
- can be in any format or notation
- can be text, graphics, video, audio, etc.
- This is like the content of the book to which the
back-of-book index refers
5(2) The Knowledge Layer
- The upper layer consists of topics and associations
- Topics represent the subjects that the information is about
- Like the list of topics that forms a back-of-book index
- Associations represent relationships between those subjects
- Like "see also" relationships in a back-of-book index
6(2) The Knowledge Layer Example
7Linking the Layers Through Occurrences
- The two layers are linked together
- Occurrences are relationships with information resources that are pertinent to a given subject
- The links (or locators) are like page numbers in a back-of-book index
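The two layers and the occurrence links between them can be sketched as a small data structure. This is only an illustrative sketch, not part of any topic map standard or tool: the class names, the example topics, and the example.org locators are all assumptions made for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Occurrence:
    """A link from a topic to an information resource (like a page number)."""
    locator: str              # address of the resource in the information layer
    kind: str = "mention"     # hypothetical occurrence type


@dataclass
class Topic:
    """A subject in the knowledge layer (like an index entry)."""
    name: str
    occurrences: list = field(default_factory=list)


@dataclass(frozen=True)
class Association:
    """A relationship between subjects (like a 'see also' entry)."""
    kind: str
    members: tuple


# Knowledge layer: topics and one association between them.
tosca = Topic("Tosca")
puccini = Topic("Puccini")
associations = [Association("composed-by", ("Tosca", "Puccini"))]

# Occurrence: ties a topic to a resource in the information layer.
tosca.occurrences.append(
    Occurrence("http://example.org/tosca-synopsis.html", "synopsis")
)


def related(topic_name, all_associations):
    """Return the names of topics associated with the given topic."""
    found = []
    for assoc in all_associations:
        if topic_name in assoc.members:
            found.extend(m for m in assoc.members if m != topic_name)
    return found
```

Calling `related("Tosca", associations)` walks the knowledge layer only, while `tosca.occurrences` leads down into the information layer, which mirrors the index-versus-content split of a book.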
8Types
- In topic maps topics can be typed, which provides considerable power for describing the world from which the topics are taken.
- This capability is missing from traditional classification techniques.
- Using this, one could create types and assign them to topics, and thus say that "topic maps" is-a "technology", "Norwegian" is-a "language", and "TMQL" is-a "query language", and so on.
- This is a very simple capability, but it is also very powerful.
9Types
- Once explicit types have been provided it is possible to let the user perform searches such as "find 'paris', but show only 'places'", or to show lists of all cities, separate from other kinds of subjects.
- Without types there is no way to do this, since the necessary information will be missing.
- Since the types are themselves topics the creator of the topic map can choose which types to use.
- As a result, the model is infinitely extensible and adaptable and can capture just about any kind of information.
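A typed search such as "find 'paris', but show only 'places'" can be sketched as follows. This is a minimal sketch under assumed names: the `Topic` class, the `search` function, and the sample topics are hypothetical and do not come from any topic map engine. Note that the types are ordinary topics, as the slide describes.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare topics by identity, not by field values
class Topic:
    name: str
    types: list = field(default_factory=list)  # the types are topics too


# Types are ordinary topics chosen by the topic map author.
place = Topic("place")
person = Topic("person")

# Two different subjects that share a name, plus one more place.
paris_city = Topic("Paris", [place])
paris_myth = Topic("Paris", [person])
oslo = Topic("Oslo", [place])
all_topics = [place, person, paris_city, paris_myth, oslo]


def search(topics, text, only_type=None):
    """Find topics whose name contains text, optionally filtered by type."""
    return [
        t for t in topics
        if text.lower() in t.name.lower()
        and (only_type is None or only_type in t.types)
    ]


places_named_paris = search(all_topics, "paris", only_type=place)
all_places = search(all_topics, "", only_type=place)
```

Without the `types` field the two "Paris" topics could not be told apart by a query, which is the point made on the slide.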
10Additional features Scope
- Scope can be attached to any name, occurrence, or association in a topic map. Basically, scope can be attached to anything you can say in a topic map. Scope allows you to qualify a statement, but still express it.
- Users can then choose to see all information in all scopes, or only those in particular scopes, basically tailoring their view of the world as they want to see it.
11Additional features Scope
- Example
- Suppose we have a topic map about languages, based on the ISO 639 and Ethnologue lists of language codes.
- We might want to record that ISO 639 assigns English to the Germanic language group, while Ethnologue considers it a West Germanic language.
- This can be done by scoping the association between English and Germanic with a topic representing ISO 639, and the association between English and West Germanic with a topic representing Ethnologue.
- Similarly, one might use scope to record that what Ethnologue calls Maldivian, ISO 639 calls Divehi.
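The language example can be sketched with scoped statements and a view filter. This is an illustrative sketch only: the `Statement` class, the `view` function, the relation names, and the subject label "maldivian-language" are assumptions, not constructs from the topic map standards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """A name or association, qualified by the scope in which it holds."""
    subject: str
    relation: str
    value: str
    scope: frozenset = frozenset()  # empty scope = holds in every view


statements = [
    Statement("English", "language-group", "Germanic",
              frozenset({"ISO 639"})),
    Statement("English", "language-group", "West Germanic",
              frozenset({"Ethnologue"})),
    Statement("maldivian-language", "name", "Divehi",
              frozenset({"ISO 639"})),
    Statement("maldivian-language", "name", "Maldivian",
              frozenset({"Ethnologue"})),
]


def view(all_statements, accepted=None):
    """Return the statements visible when only some scopes are accepted.

    With accepted=None every statement is shown, in all scopes.
    """
    if accepted is None:
        return list(all_statements)
    return [
        s for s in all_statements
        if not s.scope or s.scope & accepted
    ]


iso_view = view(statements, {"ISO 639"})
ethnologue_view = view(statements, {"Ethnologue"})
```

Both authorities' statements stay in the map; the user's choice of scopes decides which ones are displayed.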
12Additional features URI
- URIs are used to identify subjects.
- A topic may have any number of subject identifiers (URIs) which identify the subject the topic is about.
- These URIs should point to resources which describe the subject to a human; these resources are known as subject indicators.
- This allows subjects to be uniquely identified across topic maps and the entire web. For example, the URI http://www.topicmaps.org/xtm/1.0/core.xtm#superclass-subclass uniquely identifies the subclassing association type.
13Additional features Merge
- This unambiguous identification of subjects is used in topic maps to merge topics that, through these identifiers, are known to have the same subject.
- Two topics with the same subject are replaced by a new topic that has the union of the characteristics (names, occurrences, and associations) of the two originals. There is in fact a well-defined procedure for automatically merging topic maps based on this rule.
- The combination of globally unique identifiers and the merging procedure makes integration of diverse information sources and reuse of information very much easier.
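The merging rule can be sketched as follows. This is a simplified sketch under stated assumptions, not the merging procedure defined by the standard: it handles only subject identifiers, names, and occurrences (associations are left out), and the `Topic` class, `merge_topics`, and the example.org identifiers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    identifiers: set                                  # subject identifier URIs
    names: set = field(default_factory=set)
    occurrences: set = field(default_factory=set)


def merge_topics(topics):
    """Merge topics that share at least one subject identifier.

    Each merged topic carries the union of the characteristics of the
    originals. Topics without identifiers are never merged.
    """
    merged = []
    for topic in topics:
        current = Topic(set(topic.identifiers), set(topic.names),
                        set(topic.occurrences))
        kept = []
        for existing in merged:
            if existing.identifiers & current.identifiers:
                # Same subject: absorb the existing topic's characteristics.
                current.identifiers |= existing.identifiers
                current.names |= existing.names
                current.occurrences |= existing.occurrences
            else:
                kept.append(existing)
        merged = kept + [current]
    return merged


NORWAY = "http://example.org/psi/country/NO"   # hypothetical identifier
SWEDEN = "http://example.org/psi/country/SE"   # hypothetical identifier

map_a = [Topic({NORWAY}, {"Norway"}, {"http://example.org/a/norway.html"})]
map_b = [Topic({NORWAY}, {"Norge"}), Topic({SWEDEN}, {"Sverige"})]

result = merge_topics(map_a + map_b)
norway = next(t for t in result if NORWAY in t.identifiers)
```

After the merge, `norway.names` holds both "Norway" and "Norge", because the two source topics declared the same subject identifier.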
14OASIS
- OASIS hosts a published subjects activity, which is developing guidelines for how to create, publish, and maintain subject indicators intended for wide usage.
- One example of this is well-known URIs for all the countries in the world (based on the ISO 3166 country codes), which will allow us to tell that when you say 'Norway' in one topic map and I say 'Norge' in another, we mean the same thing. More on this in a future article.
15Use case forskning.no
- Norwegian government portal to popular science and research information
- basically an online popular science journal
- owned by the Norwegian Research Council
- name means research.no
- Purpose
- present science and research information to young adults
- intended to raise interest and recruitment
16The Dual Classification
17Topic maps Standards
- Topic maps are an ISO standard, published as ISO/IEC 13250 in 2000.
- That standard defines the basic model and an SGML-based syntax for it, which uses HyTime for linking, and is therefore known as HyTM.
- TopicMaps.Org is an organization that was formed to create a more web-optimized topic map syntax based on XML and URIs.
18Topic maps Standards
- TopicMaps.Org published its XML Topic Maps (XTM) 1.0 specification in early 2001, and in October of the same year that syntax was accepted into the second edition of ISO 13250 as an annex.
- Today, XTM is the main topic map syntax and is supported by nearly all topic map tools.
19How to create a topic map?
- For creating a topic map there are four main approaches
- Have humans author the topic maps manually. This usually gives very high-quality and rich topic maps, but at the cost of human labor. This is appropriate for some projects, while prohibitively expensive for others.
- Automatically generate the topic map from existing source data. This can give very good results if the existing data are well-structured; if not, there are various natural-language processing tools that might help.
20Topic Maps Construction Approaches
- Automatically produce the topic map from structured source data like XML, RDBMSs, LDAP servers, and more specialized applications.
- To produce a topic map by hand, a text editor is all we need, and for automatic generation XSLT stylesheets can be used perfectly well. This won't be enough for all uses, of course, and therefore there is specialized software for topic map editing and automatic generation of topic maps.
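Automatic generation from structured source data can be sketched in a few lines. The slide mentions XSLT; this sketch swaps in Python's standard `xml.etree.ElementTree` instead. The input rows, the `rows_to_xtm` function, and the example.org URLs are assumptions for illustration, and the element names (`topicMap`, `topic`, `baseName`, `baseNameString`, `occurrence`, `resourceRef`) follow a reading of XTM 1.0; the output is not validated against the XTM DTD.

```python
import xml.etree.ElementTree as ET

XTM = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"


def rows_to_xtm(rows):
    """Turn rows of structured data into an XTM-style XML string.

    Each row is a dict with 'id' and 'name', and optionally 'url'.
    """
    ET.register_namespace("", XTM)
    ET.register_namespace("xlink", XLINK)
    root = ET.Element(f"{{{XTM}}}topicMap")
    for row in rows:
        topic = ET.SubElement(root, f"{{{XTM}}}topic", {"id": row["id"]})
        base_name = ET.SubElement(topic, f"{{{XTM}}}baseName")
        name_string = ET.SubElement(base_name, f"{{{XTM}}}baseNameString")
        name_string.text = row["name"]
        if row.get("url"):
            # One occurrence per row, pointing at the source resource.
            occurrence = ET.SubElement(topic, f"{{{XTM}}}occurrence")
            ET.SubElement(occurrence, f"{{{XTM}}}resourceRef",
                          {f"{{{XLINK}}}href": row["url"]})
    return ET.tostring(root, encoding="unicode")


# Rows as they might come out of a database query (made-up data).
rows = [
    {"id": "puccini", "name": "Giacomo Puccini",
     "url": "http://example.org/composers/puccini.html"},
    {"id": "tosca", "name": "Tosca"},
]
xtm_text = rows_to_xtm(rows)
```

For anything beyond a simple one-row-per-topic mapping (associations, types, scopes), dedicated topic map software is likely the better fit, as the slide notes.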
21Topic Maps Construction
22Functionalities of the Construction Process
- A TM building approach must include the following functionalities
- Defining resources: identifying resource types, adding, deleting, modifying, and merging resources.
- Identifying and maintaining concepts/topics.
- Identifying and maintaining relationships/associations between topics and relationship instances.
23Functionalities of the Construction Process..
- Defining different views on a Topic Map including selected topics, relationships, and/or resources.
- Storing Topic Maps persistently either in standard XTM files or in databases.
- Merging Topic Maps.
- Importing/exporting Topic Maps.
- Including external resources
- Providing a user interface for search and navigation in the TM
- Evaluating and validating the resulting Topic Map.
24TM construction approaches
- Topic Maps Building from Existing Data Sources
- TM construction from structured document content such as XML documents and web pages.
- from document metadata (RDF documents)
- from structured knowledge such as ontologies, existing database schema, learning repositories
- from unstructured documents.
25TM construction from structured document content
- This approach intends to extract knowledge from web sites to help users find relevant information on the Web using clustering (unsupervised learning) techniques.
- The process starts by defining the profile of a TM (later applied to Web sites), which characterizes the Topic Map and helps evaluate its relevance to users' information needs.
- Second, the analysis identifies topics that are of no interest
26from document metadata
- This work aims to develop a framework and toolkit for auto-generating topic maps, called MapMaker. It consists of a set of configurable processing modules, which are chained together according to the needs of each individual auto-generation application.
- The different modules have access to an RDF model that is constructed during processing.
- The RDF model is cleaned and extended, then finally converted to a topic map.
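The idea of chained modules sharing one model can be sketched as below. This is not MapMaker's API, which the slides do not show: the module functions, the plain (subject, predicate, object) tuples standing in for an RDF model, and the sample record are all hypothetical.

```python
def harvest(records):
    """Build a module that adds one triple per metadata field."""
    def module(model):
        added = {
            (record["uri"], key, value)
            for record in records
            for key, value in record.items()
            if key != "uri"
        }
        return set(model) | added
    return module


def clean(model):
    """Trim whitespace and drop triples whose object is empty."""
    return {(s, p, o.strip()) for s, p, o in model if o.strip()}


def extend(model):
    """Add an inverse 'created' triple for every 'creator' triple."""
    return model | {(o, "created", s) for s, p, o in model if p == "creator"}


def run(modules, model=frozenset()):
    """Chain the modules: each one receives the model built so far."""
    for module in modules:
        model = module(model)
    return model


def to_topic_map(model):
    """Convert triples to topics: one topic per subject, grouped by predicate."""
    topics = {}
    for s, p, o in model:
        topics.setdefault(s, {}).setdefault(p, set()).add(o)
    return topics


DOC = "http://example.org/doc1"   # hypothetical document URI
records = [{"uri": DOC, "title": " Tosca ", "creator": "Puccini", "subject": ""}]

model = run([harvest(records), clean, extend])
topic_map = to_topic_map(model)
```

Because each module is just a function from model to model, the chain can be reconfigured per application by changing the list passed to `run`.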
27From structured knowledge
- The architecture of the repository includes wrappers created to convert dispersed knowledge structures into an integrated XML schema used in the repository.
- The repository is implemented as a relational database (using MySQL), an XML-enabled application server, a customized XML schema for Topic Maps, a set of XML stylesheets for transforming and displaying topic maps, and a set of Java servlets and JSP programs to generate XML files dynamically from the database.
28from unstructured documents
- The proposed approaches are based on different extraction techniques, namely learning techniques and Natural Language Processing techniques.
- This approach has focused on Inductive Natural Language Processing techniques to construct a Navigable Topic Map adapted to different users' viewpoints from free textual documents.
- In fact, by keeping track of words' association patterns, the system detects fluctuations in words' meanings which can reveal different points of view.
29Putting Topic Maps in Context
- Topic maps are often thought of as an add-on to XML, something that adds extra value beyond what XML itself can do.
- In fact, topic maps can be used without using XML at all.
- The two standards are similar without competing. They both have data models, interchange syntaxes, query languages, schema languages, and so on.
- Being developed for different purposes and doing different things, they can peacefully coexist and complement one another.
30Putting Topic Maps in Context
- The relationship between RDF and topic maps is less obvious, however. Structurally, they are very similar, and their semantics are very close, although the distinctions in topic maps between base names, occurrences, and associations do not exist in RDF.
- At first glance it may appear that they are nearly the same, but on closer inspection it turns out that their respective communities think of the technologies in very different ways, and that features such as scope and merging actually make them rather different after all. Again, the conclusion seems to be that they are good for different things, and that there is room for both.
31What should be used where?
- Generally, use XML for interchange and document
contents, RDF for fine-grained metadata, and
topic maps for making information findable and
anything that is mostly about relationships
32Benefits of topic maps
- A simple, intuitive model
- easy to teach to people, easy to apply
- focus on findability: subjects and their relations
- Supports information architecture patterns
- taxonomies, thesauri, faceted classification
- synonym rings, best bets
- Formal structure
- supports advanced searching and querying
- can be exploited in many different ways, once created
- Easy to create web sites from
- the site structure flows from the Topic Maps ontology
- a simple natural model means you get an understandable site
- advanced search capabilities can be added easily
33 Summary
- Topic maps are not so much an extension of the traditional schemes as a model on a higher level. That is, thesauri extend taxonomies by adding more built-in relationships and properties.
- Topic maps do not add to a fixed vocabulary, but provide a more flexible model with an open vocabulary.
- A consequence of this is that topic maps can actually represent taxonomies, thesauri, faceted classification, synonym rings, and authority files, simply by using the fixed vocabularies of these classifications as a topic map vocabulary.
34Tools References
- The TAO of topic maps, the classic introduction to topic maps.
- TM4J is an open source topic map engine project in Java.
- Perl XTM is an open source topic map engine in Perl.
- tmproc is an open source topic map engine in Python.
- The Omnigator is a free (as in beer) topic map browser that can display any topic map. There's also an online demo.
- easytopicmaps.com is a wiki site about topic maps.
35Tools References
- topicmap.com is a useful site about topic maps.
- XTM 1.0 is currently the most important specification. It has now been incorporated in the second edition of the topic map ISO standard.
- isotopicmaps.com tells you where the topic map standards are headed next.
- LTM, the Linear Topic Map notation, is a text-based syntax for topic maps that is easier to read and write for humans than XTM.
- Jan Algermissen maintains a registry of publicly available topic maps.