Title: Overview of technological solutions to terminology services
1 Overview of technological solutions to terminology services
- Doug Tudhope
- Hypermedia Research Unit
- University of Glamorgan
JISC Terminology Workshop, London, February 2004
- Networked Knowledge Organisation Systems/Services
- Broad review of technological approaches
- NKOS Lifecycle
- Introduce Workshop Demonstrations
- Critical Issues and possible gaps
- References
3 Taxonomy of Knowledge Organisation Systems
- Term Lists
- Authority Files, Glossaries, Gazetteers, Dictionaries
- Classification and Categorization
- Subject Headings
- Classification Schemes and Taxonomies
- eg DDC, scientific taxonomies
- Relationship Schemes
- Thesauri
- Semantic Networks (eg WordNet)
- (Ontologies)
- Hodg00, http://www.clir.org/pubs/abstract/pub91ab
- Thesauri
- 3 Standard Relationships between concepts (Aitc00)
- Equivalence, Hierarchical, Associative
- Inherent domain lexicon (lead-in vocabulary)
- Concept definitions and warrant (Scope Notes)
- Ontologies
- Higher level conceptualisation (McGu02, Noy)
- formal definition of relationships
- inference rules and definition of roles (sometimes)
- KOS an element of ontologies and schemas
- Jaco03, Ontologies and the Semantic Web.
- ASIST Bulletin, April/May 2003, Special Issue on Semantic Web
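The three standard thesaurus relationships listed above (equivalence, hierarchical, associative) can be pictured as a small data structure. The following is a minimal sketch in Python, not taken from any of the cited sources: the class name `Thesaurus`, its method names, and the sample terms are all hypothetical.

```python
from collections import defaultdict


class Thesaurus:
    """Toy thesaurus holding the three standard relationship types."""

    def __init__(self):
        self.use = {}                      # equivalence: non-preferred -> preferred (USE)
        self.used_for = defaultdict(set)   # equivalence: preferred -> non-preferred (UF)
        self.broader = defaultdict(set)    # hierarchical: term -> broader terms (BT)
        self.narrower = defaultdict(set)   # hierarchical: term -> narrower terms (NT)
        self.related = defaultdict(set)    # associative: term -> related terms (RT)
        self.scope_notes = {}              # term -> scope note text

    def add_equivalence(self, preferred, non_preferred):
        self.use[non_preferred] = preferred
        self.used_for[preferred].add(non_preferred)

    def add_hierarchy(self, broader, narrower):
        # BT and NT are reciprocal, so both directions are stored.
        self.broader[narrower].add(broader)
        self.narrower[broader].add(narrower)

    def add_association(self, a, b):
        # RT is symmetric.
        self.related[a].add(b)
        self.related[b].add(a)

    def preferred(self, term):
        """Map a lead-in (non-preferred) term to its preferred term."""
        return self.use.get(term, term)


# Hypothetical sample entries.
t = Thesaurus()
t.add_equivalence("locomotives", "railway engines")
t.add_hierarchy("rail vehicles", "locomotives")
t.add_association("locomotives", "railways")
t.scope_notes["locomotives"] = "Self-propelled rail vehicles used to haul trains."

print(t.preferred("railway engines"))
print(sorted(t.broader["locomotives"]))
```

An ontology would typically go further than this, for example by typing the relationships more finely and attaching inference rules, which this sketch does not attempt.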
5 Recent Sources
- NKOS Networked Knowledge Organization Systems/Services
- http://jodi.ecs.soton.ac.uk/?vol4iss4 NKOS JoDI Special Issue
- http://www.multites.com/conference03.htm MultiTes Conference
- http://nkos.slis.kent.edu/ JCDL and ECDL Workshops 2003
- http://www.lub.lu.se/SEMKOS/ SEMKOS IP Proposal Resources
- Semantic Web - RDF/XML, RDF Schema, Metalog, OWL
- http://www.w3.org/2001/sw/ W3C Semantic Web Activity
- http://www.semanticweb.org/
- http://ontoweb.aifb.uni-karlsruhe.de/ OntoWeb
- http://www.w3c.rl.ac.uk/SWAD/thesaurus.html SWAD-Europe Thesaurus index
- Semantic Grid - Semantic Web, Web service, eScience, GRID links
- http://www.semanticgrid.org/
- http://www.w3.org/2002/ws/ W3C Web Services Activity
- http://www.ariadne.ac.uk/issue29/gardner/intro.html Gardner's Intro to Web Services
6 JISC Application Area
- Search/retrieval for educational purposes (?)
- students, teachers, researchers
- possibly
- Generalised search
- possibly integrated into applications
- triggered to take account of context (eg Brow02)
- link eScience applications?
- Current operational systems (eg RDN)
- lack terminology services
- some browsing categories
- but not integrated into search
- Information Science
- Controlled Terminology
- Information Retrieval (probabilistic, full text)
- Intellectual/Automatic Indexing
- search/browse, user interfaces
- Facet Analysis
- Ontology Engineering (AI Knowledge Representation)
- formal (finer grained) representation, description logics
- automated reasoning, Semantic Web
- Distributed Systems
- Z39.50, Web Services, Semantic Grid
- Language Engineering
- Social Engineering
8 Enriching / Formalising KOS
- KOS Legacy - large (multilingual) vocabularies,
- indexed multimedia (and print) collections
- Products of peer review that follow standards
- However
- Not utilised to full potential in some applications
- Designed for human inspection, semantic structure not explicitly represented
- May be inconsistently evolved from various sources
- Opportunity to formalise / enrich
- Partly a matter of representation in RDF/XML
- but may be inconsistencies in logical structure
- --> deconstruction and ontological formalisation
- --> mutually exclusive concept structures
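One way to picture the "inconsistencies in logical structure" point is a simple structural check run before a legacy vocabulary is formalised. The sketch below is an illustration only: the function names, the dictionary layout, and the sample terms are hypothetical, and it covers just two kinds of problem (loops in the hierarchy and BT links with no reciprocal NT link).

```python
def find_cycles(broader):
    """Return the terms that can reach themselves by following BT links."""
    cyclic = set()
    for start in broader:
        stack = list(broader.get(start, ()))
        seen = set()
        while stack:
            term = stack.pop()
            if term == start:
                cyclic.add(start)
                break
            if term not in seen:
                seen.add(term)
                stack.extend(broader.get(term, ()))
    return cyclic


def find_missing_reciprocals(broader, narrower):
    """Return (narrower_term, broader_term) BT links that lack an NT link."""
    return sorted(
        (n, b)
        for n, parents in broader.items()
        for b in parents
        if n not in narrower.get(b, ())
    )


# Hypothetical vocabulary with two deliberate errors.
broader = {
    "kettles": {"cookware"},
    "cookware": {"household objects"},
    "household objects": {"kettles"},   # error: closes a loop
    "teapots": {"cookware"},
}
narrower = {
    "cookware": {"kettles"},            # error: "teapots" is missing here
    "household objects": {"cookware"},
    "kettles": {"household objects"},
}

print(sorted(find_cycles(broader)))
print(find_missing_reciprocals(broader, narrower))
```

Checks of this kind find only structural problems. Deciding whether a hierarchical link is logically a generic, partitive, or instance relationship still needs intellectual review.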
9 Facet Analysis (a link between technologies)
- Fundamental categories / foundational concepts
- eg CRG: Entity, Part, Property, Material, Process, Operation, Product, Agent, Space, Time, ...
- Mapped to facets for particular KOS
- Basis of several scientific and industrial KOS
- Synthesis rules for principled combination of concepts
- rules for combining base concepts when indexing/querying
- Browsing and Searching applications
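The idea of synthesis rules can be sketched as code: concepts are tagged with a facet and combined in a fixed citation order. This is a hedged illustration, not a rule set from any particular scheme. The citation order used here is simply the order in which the fundamental categories are listed above, which is an assumption, and the function name `synthesise`, the separator, and the sample concepts are hypothetical.

```python
# Illustrative citation order only: the sequence in which the fundamental
# categories are listed on the slide. Real schemes define their own order.
CITATION_ORDER = ["Entity", "Part", "Property", "Material", "Process",
                  "Operation", "Product", "Agent", "Space", "Time"]


def synthesise(concepts, separator=" : "):
    """Combine (facet, term) pairs into one compound string in citation order."""
    unknown = [facet for facet, _ in concepts if facet not in CITATION_ORDER]
    if unknown:
        raise ValueError(f"Unknown facet(s): {unknown}")
    ordered = sorted(concepts, key=lambda pair: CITATION_ORDER.index(pair[0]))
    return separator.join(term for _, term in ordered)


# Hypothetical concepts, supplied in arbitrary order by an indexer or searcher.
compound = synthesise([
    ("Time", "19th century"),
    ("Material", "oak"),
    ("Entity", "chairs"),
    ("Operation", "carving"),
])
print(compound)  # chairs : oak : carving : 19th century
```

Because indexers and searchers apply the same order, the same set of base concepts always yields the same compound, which is what makes the combination "principled".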
10 KOS integration into DL services (from Hill02)
Research Agenda KOS/DL
- Taxonomy of KOS
- KOS types linked to DL service protocols
- Registries of KOS and KOS-level metadata to represent them
- XML/RDF KOS representations - customisable
- Core set of relationship types across all KOS
- General KOS service protocol
- from which protocols for specific types of KOS can be derived
- Robust linking model in which DL entities (collections, objects, and services) can refer to KOS entities (concepts, labels, and relationships)
- Visualization tools that fully use and display the rich semantics embedded in KOS
- --> move towards a model of search service flow?
- how semantic search services combine
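The notion of a general KOS service protocol from which type-specific protocols are derived maps naturally onto an abstract interface with specialised subclasses. The sketch below is one possible reading, not the protocol proposed in Hill02: the class names, method names, and sample data are all hypothetical.

```python
from abc import ABC, abstractmethod


class KOSService(ABC):
    """Hypothetical general protocol: operations any KOS type could offer."""

    @abstractmethod
    def lookup(self, label):
        """Return identifiers of concepts that carry the given label."""

    @abstractmethod
    def relationships(self, concept_id):
        """Return {relationship_type: [concept_id, ...]} for a concept."""


class ThesaurusService(KOSService):
    """Derived protocol for one KOS type, adding a thesaurus-specific call."""

    def __init__(self, labels, relations):
        self._labels = labels        # concept_id -> list of labels
        self._relations = relations  # concept_id -> {type: [concept_id, ...]}

    def lookup(self, label):
        wanted = label.lower()
        return [cid for cid, labels in self._labels.items()
                if wanted in (name.lower() for name in labels)]

    def relationships(self, concept_id):
        return self._relations.get(concept_id, {})

    def narrower(self, concept_id):
        """Thesaurus-specific convenience built on the general call."""
        return self.relationships(concept_id).get("NT", [])


# Hypothetical sample data.
service = ThesaurusService(
    labels={"c1": ["Vehicles"], "c2": ["Bicycles", "Cycles"]},
    relations={"c1": {"NT": ["c2"]}, "c2": {"BT": ["c1"]}},
)
print(service.lookup("cycles"))
print(service.narrower("c1"))
```

A digital library component written against `KOSService` alone would work with any KOS type, while a thesaurus-aware client could use the extra `narrower` call when it is available.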
11 Terminology Services (from Koch04)
Structured Overview - Activities to advance the powerful use of vocabularies
- Searching for concepts
- schemes in registries
- concepts/terms in taxonomy servers
- Search support for queries
- collection finding
- cross-searching, cross-browsing, mapping services
- KOS browsing and user interface/visualisation
- query expansion, disambiguation
- automatic indexing and classification
- extraction/mining of terms
- translation support using vocabularies
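Of the activities listed above, query expansion is perhaps the easiest to sketch in code. The example below is a minimal illustration under stated assumptions: the function name `expand_query`, the dictionary layout, and the sample vocabulary are hypothetical, and expansion is limited to synonyms and narrower terms down to a chosen depth.

```python
def expand_query(term, use, used_for, narrower, max_depth=1):
    """Map a term to its preferred form, then add narrower terms and synonyms."""
    preferred = use.get(term, term)
    expanded = {preferred}
    frontier = [preferred]
    # Walk down the hierarchy one level per iteration.
    for _ in range(max_depth):
        next_frontier = []
        for current in frontier:
            for child in narrower.get(current, ()):
                if child not in expanded:
                    expanded.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
    # Add the entry vocabulary (UF terms) of every concept reached.
    for concept in list(expanded):
        expanded.update(used_for.get(concept, ()))
    return sorted(expanded)


# Hypothetical vocabulary fragment.
use = {"autos": "cars"}
used_for = {"cars": ["autos", "automobiles"]}
narrower = {"cars": ["sports cars", "taxis"], "sports cars": ["roadsters"]}

print(expand_query("autos", use, used_for, narrower))
print(expand_query("autos", use, used_for, narrower, max_depth=2))
```

The same lookup of the preferred term also gives a crude form of disambiguation support, since a searcher's free-text term is mapped onto the controlled vocabulary before the query is run.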
12 Workshop Demonstrations
- in context of NKOS Information Lifecycle
13 NKOS Information Lifecycle
- KOS creation and maintenance
- Mapping, merging vocabularies
- Document creation and maintenance
- Indexing, classification, annotation
- intellectual, automatic
- Discovery of services and databases/collections
- Searching for concepts --> controlled terminology, auto-disambiguation
- Querying and result display
- Cross-searching, cross-browsing, mapping services
- KOS browsing and user interface/visualisation
- Query expansion
- Extraction/mining of terms
- Translation support using vocabularies
- Content integration and mediation
14 High Level Thesaurus (HILT) - Information Science
- Pilot Terminology Service
- HILT team, Wordmap s/w, OCLC
- discovery of collections
- cross-searching JISC collections
- mapping from Terminologies to
15 geoXwalk - Geographic Information Science
- Geo-spatial Gazetteer Service
- Edina, Data Archive, CIE
- feature (concept) searching
- geographic searching, spatial operators
- spatial result visualisation, flexible footprint
- geoparser - automated geographic indexing
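Feature searching combined with spatial operators over footprints can be illustrated with a toy gazetteer. This sketch does not reproduce the geoXwalk service or its API: the class and function names are hypothetical, footprints are simplified to bounding boxes, and the place names and coordinates are made-up values for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other):
        return (self.west <= other.east and other.west <= self.east
                and self.south <= other.north and other.south <= self.north)

    def contains(self, other):
        return (self.west <= other.west and other.east <= self.east
                and self.south <= other.south and other.north <= self.north)


@dataclass(frozen=True)
class GazetteerEntry:
    name: str
    feature_type: str
    footprint: BoundingBox


def search(entries, query_box, feature_type=None, operator="intersects"):
    """Return names of entries matching the feature type and spatial operator."""
    if operator not in ("intersects", "within"):
        raise ValueError(f"Unsupported operator: {operator}")
    results = []
    for entry in entries:
        if feature_type is not None and entry.feature_type != feature_type:
            continue
        if operator == "within":
            hit = query_box.contains(entry.footprint)
        else:
            hit = query_box.intersects(entry.footprint)
        if hit:
            results.append(entry.name)
    return results


# Made-up entries and coordinates, for illustration only.
entries = [
    GazetteerEntry("Town A", "town", BoundingBox(-3.4, 51.5, -3.3, 51.6)),
    GazetteerEntry("River B", "river", BoundingBox(-3.5, 51.4, -3.2, 51.9)),
    GazetteerEntry("Town C", "town", BoundingBox(-1.0, 52.0, -0.9, 52.1)),
]
query = BoundingBox(-3.6, 51.45, -3.25, 51.7)

print(search(entries, query))
print(search(entries, query, operator="within"))
print(search(entries, query, feature_type="river"))
```

A geoparser would sit in front of such a gazetteer, spotting place names in text and attaching the matching footprints as index terms.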
16 Renardus - Information Science
- cross-browsing service
- NetLab, UKOLN, ILRT, SUB,
- classification mapping via DDC
- cross-searching EU subject gateways (multilingual)
- user interface for browsing in large classifications
17 Learning and Teaching Portal SSL - Information
- Systems Simulations Ltd, Index
- Learning and Teaching Support Network
- Web-based thesaurus service
- vocabulary management - Suggest a Term
- data entry
- browse and search
18 CIE Health Demonstrator - Information Science, facet analysis
- Adiuri Systems Ltd (from IDEA Project)
- Waypoint
- Health Info search demonstrator
- faceted, multi-concept query via browsing
- non-zero match, postings displayed
- faceted browsing user
19 COHSE Conceptual Open Hypermedia - Ontology, description logic, hypertext navigation
- Link Navigation Using Ontologies
- Manchester, Southampton University
- Open Hypermedia System (Soton DLS)
- open-source downloadable tools for Ontology and Annotation Services
- eg OilEd lightweight ontology editor for DAML+OIL
20 OpenGALEN - Ontology, GRAIL logic, facet analysis
- Open GALEN Common Reference Model - Medical coding and classification systems
- Manchester University
- faceted compositional rather than traditional enumerative medical codes
- multilingual
- GALEN-in-use Project
- OpenKnoME, GALEN Case Env toolsets
21 Co-ODE Collaborative Open Ontology Development Env - Ontology management
- Manchester University
- new project
- develop Ontology management tools as plugins for Protégé (Stanford)
- building on earlier experience with OilEd
- concern with usability
22 FACET faceted knowledge organisation for semantic retrieval - Information Science, facet
- University of Glamorgan, Science Museum
- faceted, multi-concept best-match search
- semantic expansion as browsing service
- faceted thesaurus search interface
- standalone and Web demonstrators
23 E-Biosci EC platform e-publishing and info integration in Life Sciences - Information Science
- European Molecular Biology Organisation
- Collexis B. V. technology
- semantic matching, conceptual fingerprints
- link genomic data, life sciences research lit
- multilingual
- integrated search full text/data/researchers
- peer-reviewed, different publishing models
24 SKOS Simple knowledge organisation for the semantic web - Information Science, Ontology
- CCLRC, SWAD-EUROPE project
- Migrate existing KOS to SemWeb via common RDF schema for thesauri and for inter-thesaurus mapping (formal OWL spec planned)
- use cases for thesaurus services
- lightweight RDF service demonstrators using Jena RDF API toolkit
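Migrating a thesaurus to a common RDF schema can be sketched in a few lines. The example below is an illustration under several assumptions: it uses Python's standard library rather than the Jena toolkit named above, it uses the class and property names of the later W3C SKOS vocabulary (which may differ from the SWAD-Europe schema demonstrated at the workshop), and the concept URIs, the helper `concept_element`, and the sample terms are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"   # later W3C SKOS namespace (assumption)
BASE = "http://example.org/thesaurus/"          # hypothetical concept URI base

ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)


def concept_element(parent, cid, pref, alt=(), broader=(), related=()):
    """Append one skos:Concept description to the rdf:RDF element."""
    concept = ET.SubElement(parent, f"{{{SKOS}}}Concept",
                            {f"{{{RDF}}}about": BASE + cid})
    ET.SubElement(concept, f"{{{SKOS}}}prefLabel").text = pref
    for label in alt:                       # equivalence (UF terms)
        ET.SubElement(concept, f"{{{SKOS}}}altLabel").text = label
    for target in broader:                  # hierarchical (BT)
        ET.SubElement(concept, f"{{{SKOS}}}broader",
                      {f"{{{RDF}}}resource": BASE + target})
    for target in related:                  # associative (RT)
        ET.SubElement(concept, f"{{{SKOS}}}related",
                      {f"{{{RDF}}}resource": BASE + target})
    return concept


root = ET.Element(f"{{{RDF}}}RDF")
concept_element(root, "vehicles", "vehicles")
concept_element(root, "bicycles", "bicycles", alt=["cycles"], broader=["vehicles"])

rdf_xml = ET.tostring(root, encoding="unicode")
print(rdf_xml)
```

Inter-thesaurus mapping would add further statements linking concepts in one scheme to concepts in another, which this sketch leaves out.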
25 Some critical issues
- Standards
- User Interface
- Gaps?
26 Critical issues (1) Standards
- Ongoing initiatives to revise thesaurus standards
- ANSI/NISO Z39.19
- BS 5723 and BS 6723 - Dext03
- BSI public draft soon, extended scope, interoperability
- Thesaurus Representations
- RDF - SWAD03, Topic Map - Ligh03, various XML
- Possibilities to extend current relationships by specialisation
- enriching standards but maintaining compatibility
- KOS Service Protocols - Bind04
- service oriented approach with composite service provision
- not based on atomic elements of data structures and relationships
- expansion service provision
- NKOS Registry - Vizi01, MEG Registry Project
27 Cost/benefit issues
- Thesaurus long-lived, pragmatic and useful tool
- cost-effective granularity of relationships
- for some search apps
- Domain lexicon (UF/ALTs, Scope Notes)
- Cost/benefit issues in KOS formalisation
- Application dependent level of precision in concept use
- Some apps very precise use of concepts (medical?)
- Other apps may vary in concept application (humanities?)
- Indexer - Searcher variation
- Results based on probable relevance judgements
28 Critical issues (2) User interface
- User interface critical
- given controlled terminology demands
- Offer different options
- Move beyond minimal assumptions
- of current web search engines on
- users, query structure, collections
- Link with service protocol issues
- kinds of interfaces easily afforded
- Accessibility issues
29 Critical issues (3) Gaps?
- Language Engineering
- Related standards - Shre03
- POS tagging tools
- large statistical corpora
- --> source of context data
- for disambiguation, annotation, proactive search
- JISC-specific corpora?
- Collect portal use data --> taxonomies, synonyms
- Time-varying synonyms - BBCi04
- Probabilistic IR
- term frequency information, automatic weighting
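To make "term frequency information, automatic weighting" concrete, the sketch below computes TF-IDF weights. TF-IDF is swapped in here as a widely used heuristic weighting and is not a full probabilistic retrieval model; the function name `tf_idf`, the whitespace tokenisation, and the sample documents are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        tf = Counter(tokens)
        weights.append({term: count * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return weights


# Hypothetical mini-collection.
docs = [
    "thesaurus search service",
    "search engine",
    "thesaurus thesaurus mapping",
]
weights = tf_idf(docs)
for doc, doc_weights in zip(docs, weights):
    print(doc, "->", {term: round(w, 3) for term, w in doc_weights.items()})
```

Terms that are frequent in one document but rare across the collection receive the highest weights, which is the kind of statistical evidence that could complement controlled terminology in search.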
30 Social Engineering?
- What do users really want?
- Problems of introducing new technologies
- Sometimes a matter of both reflecting and shaping user needs
- Done implicitly by successful projects
- but also extant literature on sociology/philosophy of innovation
- Lessons from
- Participatory Design, Rapid Application Development - Tudh00
- evolving network prototypes, user expectations, requirements and working practices
- Lead / Ambassador Users
- training, tailoring and advocacy / motivation.
31 Contact Information
- Doug Tudhope
- School of Computing
- University of Glamorgan
- Pontypridd CF37 1DL
- Wales, UK
- dstudhope_at_glam.ac.uk
- http://www.comp.glam.ac.uk/pages/staff/dstudhope
