Title: Taxonomies And Ontologies
T.B. Rajashekar, National Centre for Science Information, Indian Institute of Science, Bangalore 560 012 (raja_at_ncsi.iisc.ernet.in)
Prepared for the talk at the Sarada Ranganathan Endowment for Library Science, Bangalore, 9th August 2003
Agenda
- To highlight:
  - The growing importance of taxonomies and ontologies
  - Definitions, methodologies, tools, examples, experiences and trends
- To draw implications for information professionals
Growing Importance
- Google search (pages found at the time of the talk):
  - Taxonomy/ies: 186,000 pages
  - Ontology/ies: 202,000 pages
  - Library classification: 1,160,000 pages
  - Thesauri/us: 899,000 pages
- A Web of Science (WOS) search shows a significant increase in the number of publications related to taxonomies and ontologies during 2000 and 2001
Why This Interest?
- Rapidly growing volume of digital information
  - Within enterprises
  - On the Internet
- Enterprises
  - Effective management of, and access to, the intellectual capital of the enterprise (corporate memory)
- Internet
  - Access to and exchange of meaningful information and data between human beings, and between machines and services (software agents)
  - Semantic web: content schemas, metadata, encoding schemes, vocabulary
Definitions - Taxonomy
- Definitions vary widely in scope, from simple and narrow to complex and broad
- Roots in biological taxonomy
- Adopted mainly by the KM (knowledge management), enterprise portal, content management and CS (computer science) communities
- Beginning to be used and accepted in information science
- Let's peruse a few definitions given by KM and taxonomy experts
Definitions - Taxonomy
- A systematic way of classifying knowledge
- Provides a hierarchical structure of concepts, using terms that help in the development of a common language to aid knowledge sharing
Jan Wyllie, Taxonomies: Framework for Corporate Knowledge (2003)
Definitions - Taxonomy
- A correlation of the different functional languages used by the enterprise:
  - to support a mechanism for navigating and gaining access to the intellectual capital of the enterprise
  - by providing such tools as portal navigation aids, authority for tagging documents and other information objects, and support for search engines and knowledge maps
  - and, possibly, to serve as a knowledge base in its own right
Alan Gilchrist and Peter Kibby, TFPL (definition of an enterprise taxonomy)
Definitions - Taxonomy
- Structures that provide a way of classifying things (living organisms, products, books) into a series of hierarchical groups to make them easier to identify, study, or locate
- Taxonomies consist of two parts: structures and applications
  - Structures consist of the categories (or terms) themselves and the relationships that link them together
  - Applications are the navigation tools available to help users find information
Jean Graef, Montague Institute
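Graef's split between structures and applications can be made concrete in a few lines of Python. This is a minimal sketch, not a method from the talk: the `Category` class and the sample labels are hypothetical. The tree of labeled categories is the structure, and `breadcrumb()` stands in for a navigation application built on it.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Category:
    """A taxonomy node: a category label plus links to broader and narrower ones."""
    label: str
    # Excluded from repr/eq so printing a node does not recurse back up the tree.
    parent: Optional["Category"] = field(default=None, repr=False, compare=False)
    children: List["Category"] = field(default_factory=list)

    def add(self, label: str) -> "Category":
        """Structure: create a narrower category linked to this one."""
        child = Category(label, parent=self)
        self.children.append(child)
        return child

    def breadcrumb(self) -> List[str]:
        """Application: the navigation path from the root down to this node."""
        path: List[str] = []
        node: Optional["Category"] = self
        while node is not None:
            path.append(node.label)
            node = node.parent
        return path[::-1]


root = Category("Products")
science = root.add("Books").add("Science")
print(" > ".join(science.breadcrumb()))  # Products > Books > Science
```

The same structure could feed other applications (a site map, a category picker for tagging) without change, which is the point of keeping the two parts separate.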
Definitions - Taxonomy
- The specification of the names of people, places, things, and anything else that is needed to allow search engines and other content applications to work better
Joseph A. Busch, Taxonomy Strategies
Definitions - Taxonomy
Taxonomy facets
Joseph A. Busch, Taxonomy Strategies
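A faceted taxonomy classifies each item along several independent dimensions instead of placing it at one spot in a single hierarchy. The facets on Busch's slide are not transcribed, so the facet names and values below are hypothetical. This is a minimal sketch of facet assignment and of filtering by any combination of facet values.

```python
from typing import Dict, List

# Hypothetical facets: each document gets one value per facet instead of a
# single position in one hierarchy.
FACETS: Dict[str, List[str]] = {
    "content_type": ["report", "manual", "presentation"],
    "topic": ["drilling", "safety", "logistics"],
    "region": ["asia", "europe", "americas"],
}


def classify(doc_id: str, **values: str) -> Dict[str, str]:
    """Build a facet record, rejecting facets or values not in FACETS."""
    record = {"id": doc_id}
    for facet, value in values.items():
        if value not in FACETS.get(facet, []):
            raise ValueError(f"{value!r} is not a valid value for facet {facet!r}")
        record[facet] = value
    return record


def facet_filter(records: List[Dict[str, str]], **selected: str) -> List[str]:
    """Return ids of records that match every selected facet value."""
    return [r["id"] for r in records
            if all(r.get(facet) == value for facet, value in selected.items())]


docs = [
    classify("d1", content_type="report", topic="safety", region="asia"),
    classify("d2", content_type="manual", topic="safety", region="europe"),
    classify("d3", content_type="report", topic="drilling", region="asia"),
]
print(facet_filter(docs, topic="safety"))                        # ['d1', 'd2']
print(facet_filter(docs, content_type="report", region="asia"))  # ['d1', 'd3']
```

Three small facets here describe 27 combinations without anyone having to build a 27-leaf hierarchy, which is the usual argument for facets over a single deep tree.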
Definitions - Taxonomy
- Taxonomy: a classification of elements within a domain (read: corporate domain)
- Domain: a sphere of knowledge, influence, or activity
- Classification: the operation of grouping elements and establishing relationships between them (or the product of that operation)
- Relationship: a defined linkage between two elements
- Element: an object or concept
Mike Crandall, Microsoft Information Services
Definitions - Taxonomy
- Global Knowledge Object (GKO) taxonomy standard (standard data architecture and vocabulary)
  - PwC's trade name for its standard unstructured data architecture
  - Establishes a defined series of fields
  - Defines vocabularies allowing consistent tagging of content based on business usage and context
  - Includes multiple taxonomies covering unstructured data/content
  - Acts as a logical data model representing what we need to know about our content
  - Provides a standard approach for metadata tagging of content, to enable repurposing in multiple systems
  - GKO will support multi-term and multi-language vocabulary sets, allowing differences across business units and languages
Mark Zoeckler, Director, PricewaterhouseCoopers
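The combination of defined fields, controlled vocabularies and multi-language labels can be sketched as a small schema. This is a hypothetical illustration, not PwC's actual GKO: the field names, term codes and labels are invented. Content is tagged with stable term codes, each field only accepts codes from its own vocabulary, and labels are looked up per language at display time.

```python
from typing import Dict

# Hypothetical schema, not PwC's actual GKO: each field is bound to a controlled
# vocabulary of term codes, and each code carries labels in several languages.
SCHEMA: Dict[str, Dict[str, Dict[str, str]]] = {
    "industry": {
        "IND-01": {"en": "Banking", "fr": "Banque"},
        "IND-02": {"en": "Insurance", "fr": "Assurance"},
    },
    "doc_type": {
        "DOC-01": {"en": "Proposal", "fr": "Proposition"},
    },
}


def validate(record: Dict[str, str]) -> None:
    """Raise ValueError if a record uses a field or term code outside the schema."""
    for fld, code in record.items():
        if fld not in SCHEMA:
            raise ValueError(f"unknown field: {fld}")
        if code not in SCHEMA[fld]:
            raise ValueError(f"{code} is not in the {fld} vocabulary")


def display(record: Dict[str, str], lang: str) -> Dict[str, str]:
    """Render a record's stored codes as labels in the requested language."""
    validate(record)
    return {fld: SCHEMA[fld][code][lang] for fld, code in record.items()}


record = {"industry": "IND-02", "doc_type": "DOC-01"}
print(display(record, "fr"))  # {'industry': 'Assurance', 'doc_type': 'Proposition'}
```

Storing codes rather than labels is what lets different business units and languages share one tag while each sees its own term.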
Definitions - Taxonomy
- Taxonomies are evolving to be much more than traditional classification systems. They:
  - Support structure, content and applications (navigation tools)
  - Are customized to reflect the language, culture and goals of a particular enterprise
  - Are often created using a combination of human effort and software
  - Reflect disparate information resources: e-mails, memos, people, documents, etc.
  - Are part of a process, constantly refined and updated
Definitions - Ontology
- Ontologies are more complex than taxonomies: they express relationships between the elements of a taxonomy, such as "part of" or "located in"
- Theory
  - Ontology: the science or study of being
- The term is used to refer to the shared understanding of some domain of interest, as a set of concepts (e.g. entities, attributes, processes), their definitions and their interrelationships: a world view (conceptual ontology)
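The difference between a taxonomy's broader/narrower links and an ontology's richer relationships can be sketched with subject-relation-object triples. The facts below are hypothetical, and treating "located_in" as transitive is an assumption made for the example. A plain taxonomy would only need the "is_a" relation; "part_of" and "located_in" are the kind of typed relationships the definition above attributes to ontologies.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical facts. "is_a" is the broader/narrower link a taxonomy already has;
# "part_of" and "located_in" are the extra relation types an ontology can express.
FACTS: Set[Triple] = {
    ("Pump", "is_a", "Equipment"),
    ("Impeller", "part_of", "Pump"),
    ("Pump", "located_in", "Warehouse A"),
    ("Warehouse A", "located_in", "Plant 1"),
}


def objects(subject: str, relation: str) -> List[str]:
    """Direct answers to the pattern (subject, relation, ?), sorted."""
    return sorted(o for s, r, o in FACTS if s == subject and r == relation)


def transitive(subject: str, relation: str) -> List[str]:
    """Follow a relation assumed to be transitive to every reachable object."""
    reached: List[str] = []
    frontier = [subject]
    while frontier:
        current = frontier.pop()
        for obj in objects(current, relation):
            if obj not in reached:
                reached.append(obj)
                frontier.append(obj)
    return reached


print(objects("Impeller", "part_of"))    # ['Pump']
print(transitive("Pump", "located_in"))  # ['Warehouse A', 'Plant 1']
```

Knowing which relations are transitive is itself part of the ontology's specification; here it is hard-coded only in how `transitive()` is called.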
Definitions - Ontology
- An explicit ontology typically includes a vocabulary of terms, some specification of their meaning (i.e. definitions) and their relationships (Rokhlenko Oleg, Data Integration Seminar, Spring 2002)
- Why ontologies?
  - Communication between people and organizations within a domain
  - Interoperability between systems
  - A prerequisite for the semantic web
Example (from Rokhlenko Oleg, Data Integration Seminar, Spring 2002)
- A procedure viewer asks: "give me the procedure for ..."
- A translator maps "procedure" to the ontology term "process": "give me the process for ..."
- A second translator maps "process" to the method library's term: "give me the METHOD for ..."
- The method library replies "here is the METHOD for ...", which is translated back via "process" into "here is the procedure for ..."
The term "procedure" used by one tool is translated into the term "method" used by the other via the ontology, whose term for the same underlying concept is "process".
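The translation in this example can be sketched as two lookups through the shared concept. Only the procedure/process/method terms come from the example; the tool names and the dictionary layout are hypothetical. Each tool's vocabulary is mapped once to the ontology, and a term is translated by going up to the concept and back down to the other tool's term.

```python
from typing import Dict

# Each tool's local vocabulary is mapped to the shared ontology concept.
# The procedure/process/method terms come from the example above; the tool
# names and this dictionary layout are hypothetical.
TO_ONTOLOGY: Dict[str, Dict[str, str]] = {
    "procedure_viewer": {"procedure": "process"},
    "method_library": {"method": "process"},
}


def translate(term: str, source: str, target: str) -> str:
    """Translate a source tool's term into the target tool's term via the ontology."""
    concept = TO_ONTOLOGY[source][term]  # e.g. "procedure" -> "process"
    # Search the target tool's mapping in reverse for the same concept.
    for local_term, mapped in TO_ONTOLOGY[target].items():
        if mapped == concept:
            return local_term
    raise KeyError(f"{target} has no term for concept {concept!r}")


print(translate("procedure", "procedure_viewer", "method_library"))  # method
print(translate("method", "method_library", "procedure_viewer"))     # procedure
```

Routing through the ontology means a new tool needs only one mapping to the shared concepts, rather than a separate translator to and from every existing tool.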
Building (Corporate) Taxonomies
Building Taxonomies - 1
Mike Crandall, Microsoft Information Services
- Define project scope
  - Define boundaries, determine required resources
  - Obtain resource commitments
- User needs survey
  - What content users need, and how they access it
- Information audit
  - Existing content, its structure, and who is responsible for it
- Involve users
  - Include key stakeholders in the process
- Decide on the architectural approach: how you are going to do it
- Build the taxonomy
Building Taxonomies - 1
Mike Crandall, Microsoft Information Services
The Process
- Identify business needs
  - User needs survey
  - Tag audit
  - Content audit
- Collect/structure terms
  - Build vocabularies
  - Define rules
  - Create change control process
- Tag content
  - Embed vocabulary access in tools
  - Provide guidelines for use
- Expose content
  - Embed tags in interfaces
  - Segment content by attributes
  - Enable through XML/XSL
- Define needed attributes
  - Build object model
  - Create flat list
  - Provide mapping schema?
Building Taxonomies - 2
Joseph A. Busch, Taxonomy Strategies
- Phase 1: Scope the taxonomy
  - Brief, identify and interview stakeholders and subject matter experts
  - Collect example documents
  - Discover existing controlled vocabularies
  - Analyze data (inductively and deductively)
  - Present scoping results
- Phase 2: Build the taxonomy
  - Develop a broad taxonomy outline (1-3 levels deep)
  - Review, revise, and approve the taxonomy outline with stakeholders and subject matter experts
  - Fill in the taxonomy outline (with approx. 1,500 terms)
  - Review, revise, and approve the draft taxonomy with stakeholders and subject matter experts
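Phase 2's constraints (an outline 1-3 levels deep, later filled to roughly 1,500 terms) are easy to check mechanically once the outline is kept in a plain indented form. This is a minimal sketch under an assumed format of two spaces per level; `parse_outline`, `too_deep` and the sample terms are hypothetical and not part of Busch's method.

```python
from typing import List, Tuple


def parse_outline(text: str, indent: int = 2) -> List[Tuple[int, str]]:
    """Turn an indented outline into (level, term) pairs; level 1 is top-level.

    Assumes each level is indented by `indent` spaces (a hypothetical format).
    """
    entries: List[Tuple[int, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        spaces = len(line) - len(line.lstrip(" "))
        entries.append((spaces // indent + 1, line.strip()))
    return entries


def too_deep(entries: List[Tuple[int, str]], max_depth: int = 3) -> List[str]:
    """Terms sitting below the allowed number of outline levels."""
    return [term for level, term in entries if level > max_depth]


outline = """\
Products
  Drilling
    Bits
      Roller cone bits
Services
  Logistics
"""
entries = parse_outline(outline)
print(len(entries), too_deep(entries))  # 6 ['Roller cone bits']
```

`len(entries)` gives the running term count to compare against the target during the fill-in step.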
Building Taxonomies - 2
Joseph A. Busch, Taxonomy Strategies
Taxonomy outline example (MS Word)
Building Taxonomies - 2
Joseph A. Busch, Taxonomy Strategies
- Phase 3: Maintain the taxonomy
  - Develop tagging rules and procedures
  - Specify the taxonomy maintenance business process
  - Document taxonomy maintenance procedures
  - Train users
Building Taxonomies - 2
Joseph A. Busch, Taxonomy Strategies
Taxonomy maintenance: editorial rules
Building Taxonomies - 2
Joseph A. Busch, Taxonomy Strategies
Taxonomy maintenance: business process
Building Taxonomies - 2
Joseph A. Busch, Taxonomy Strategies
Taxonomy maintenance: change scenario

    # old = old label to be deleted; new = new label to take its place.
    # Remember to match using the full path of the code, since parts of
    # codes may have similar names.
    for each file in the staging area:
        V = vpath for the file
        system("iwmetadata -get V") > tmpFile
        updateFlag = false
        if (tmpFile contains old):
            # Note that if you are renaming an internal term, a file may have
            # multiple concepts with that internal label, so be sure to loop
            # through them all (replace every occurrence, not just the first).
            s/old/new/g
            updateFlag = true
        if (updateFlag == true):
            system("iwmetadata set tmpFile V")

Listing 1: Simple metadata update pseudocode.
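The listing relies on the `iwmetadata` command to read and write each file's metadata; its options are not reproduced here. The relabeling logic itself can be sketched in Python against an in-memory map from file vpath to category paths. This is a hypothetical stand-in, not that tool's API: `relabel` and the sample paths are illustrative, and also renaming descendants of the old path is an assumption beyond the listing.

```python
from typing import Dict, List, Set


def relabel(metadata: Dict[str, List[str]], old: str, new: str) -> Set[str]:
    """Rename category path `old` to `new` in every file's tag list, in place.

    Matching is on the full path, so "Products/Drilling" is untouched when
    "Services/Drilling" is renamed. Descendants such as "Services/Drilling/Bits"
    are renamed too (an assumption beyond the listing). Returns the vpaths whose
    metadata changed, i.e. the files for which updateFlag would be true.
    """
    changed: Set[str] = set()
    for vpath, tags in metadata.items():
        update_flag = False
        # A file may carry several concepts under the old label: check them all.
        for i, tag in enumerate(tags):
            if tag == old or tag.startswith(old + "/"):
                tags[i] = new + tag[len(old):]
                update_flag = True
        if update_flag:
            changed.add(vpath)  # only these files need to be written back
    return changed


staging = {
    "/docs/a.html": ["Services/Drilling", "Products/Drilling"],
    "/docs/b.html": ["Services/Drilling/Bits"],
    "/docs/c.html": ["Services/DrillingRigs"],
}
print(sorted(relabel(staging, "Services/Drilling", "Services/Well Construction")))
# ['/docs/a.html', '/docs/b.html']
```

Matching on whole path segments is what keeps "Services/DrillingRigs" from being rewritten, the failure a plain text substitution would risk.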
Building Taxonomies - 2
Joseph A. Busch, Taxonomy Strategies
Taxonomy example
Use of Taxonomies - 1: Microsoft Information Services
Use of Taxonomies - 1
Microsoft Information Services
- Content creation: tagging of documents and other items
- Site navigation: using categories
- Information retrieval: search
- Personalization: delivery
  - Linking content to people via user profiles requires well-tagged content
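Why personalization depends on well-tagged content can be seen in a small matching sketch. The profile and document tags are hypothetical, and ranking by the number of shared tags is one simple choice among many, not Microsoft's method. A document is recommended only if its tags overlap the user's profile, so an untagged document can never reach anyone.

```python
from typing import Dict, List, Set


def recommend(profile: Set[str], tagged_docs: Dict[str, Set[str]]) -> List[str]:
    """Rank documents by how many of their tags appear in a user's profile.

    Documents sharing no tag with the profile are dropped; ties sort by id.
    """
    scores = {doc: len(tags & profile) for doc, tags in tagged_docs.items()}
    matches = [doc for doc, score in scores.items() if score > 0]
    return sorted(matches, key=lambda doc: (-scores[doc], doc))


docs = {
    "memo-1": {"safety"},
    "report-7": {"drilling", "safety"},
    "faq-3": {"payroll"},
    "memo-2": set(),  # untagged: can never be recommended
}
print(recommend({"safety", "drilling"}, docs))  # ['report-7', 'memo-1']
```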
Content Tagging
Tagged Content
Category-based Navigation
Categories in Search
Use of Taxonomies - 2: Joseph A. Busch, Taxonomy Strategies
Case Study: Halliburton
- Consolidate multiple taxonomies that had been developed for separate product lines, so that employees, contractors and customers can easily find information about products and services regardless of which product line they're in
- Find everything on a piece of equipment before performing a service
- Find a specific piece of information that you know is there (or should be there)
- Help communities of practice organize detailed technical information in their intranet portals
- Help organize and present search results in categories and sub-categories
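Presenting search results in categories and sub-categories amounts to grouping each hit by its taxonomy path. This is a minimal sketch, assuming each hit carries one "Category/Sub-category" path string; `group_results` and the sample hits are hypothetical, not Halliburton's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_results(hits: List[Tuple[str, str]]) -> Dict[str, Dict[str, List[str]]]:
    """Group (title, "Category/Sub-category") search hits into a two-level tree.

    Only the first "/" is split on, so deeper levels stay in the sub-category label.
    Hits tagged with a bare category go under "(general)".
    """
    tree: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for title, path in hits:
        category, _, sub = path.partition("/")
        tree[category][sub or "(general)"].append(title)
    # Convert to plain dicts for clean printing and comparison.
    return {cat: dict(subs) for cat, subs in tree.items()}


hits = [
    ("Pump seal replacement", "Services/Maintenance"),
    ("Pump catalogue", "Products/Pumps"),
    ("Seal kit", "Products/Seals"),
    ("Safety checklist", "Services"),
]
for category, subs in group_results(hits).items():
    print(category, subs)
```

Categories appear in the order of their first hit, so a relevance-ranked result list keeps its best category on top.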
Halliburton Taxonomy (Case Study)
- Key data drivers are Business Process and Logistics (materials and equipment)
New Halliburton Facet Search Browser
Case Study: NASA
- Make it easy for various audiences to find all the relevant information from all NASA programs quickly
- Provide one-stop shopping for NASA resources
- Provide search results targeted to user interests
- Help users easily find links to databases and tools
- Inspire the next generation of explorers
- Prepare America's future in innovation, technology and exploration
- Advance e-government strategies through projects that are citizen-centric
NASA Taxonomy (Case Study)
- Key data drivers are business roles, functions, and skills
Use of Taxonomies - 3: TopicMap in the Highwire E-Journal Service
- Automatic tagging: Semio Tagger (Entrieva)
- Visualization: using the Hyperbolic Tree technology of Inxight Software
Benefits of Taxonomies
- Improved management of unstructured information
- Provide context (additional value)
- Simplify navigation
- Improve search
- Enable personalization
- Share information between systems and people in B2B e-commerce
- Make connections between related concepts across disciplines, applications and services
Taxonomy Software
- Approaches for taxonomy generation
  - Training by example: the editor defines what a category is and provides a training set
  - Rules-based: documents are classified according to specified business rules
  - Statistical methods: word patterns are identified according to parameters such as frequency of use and proximity to other words and phrases (clustering)
  - Natural language processing (computational linguistics): classification relies on an extensive dictionary and thesaurus for identifying concepts
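Of the four approaches, the rules-based one is the simplest to show directly. This is a minimal sketch, with hypothetical categories and patterns standing in for "specified business rules": a document receives every category whose patterns match its text. The other approaches replace or supplement such hand-written rules with training sets, statistics or linguistic resources.

```python
import re
from typing import Dict, List

# Hypothetical business rules: a category is assigned when any of its
# regular-expression patterns occurs in the text (case-insensitive).
RULES: Dict[str, List[str]] = {
    "Drilling": [r"\bdrill(ing|s)?\b", r"\bbits?\b"],
    "Safety": [r"\bhazard\w*", r"\bsafety\b"],
}


def classify(text: str) -> List[str]:
    """Return every category whose rules match the text, in rule order."""
    return [category for category, patterns in RULES.items()
            if any(re.search(p, text, re.IGNORECASE) for p in patterns)]


print(classify("Replace worn drill bits before the next job"))  # ['Drilling']
print(classify("Hazardous materials on the drilling floor"))    # ['Drilling', 'Safety']
```

Rules are transparent and easy to audit, but every new term variant has to be added by hand, which is the maintenance cost the statistical and NLP approaches try to reduce.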
Taxonomy Software
- Key features supported by taxonomy software
  - Provide a variety of user control and interaction
  - Suggest a classification that the user can accept or override
  - Co-exist with content-management and search-and-retrieval software
  - Most taxonomy tools automatically insert metadata as XML tags into documents
  - Visualization: graphical presentation of the taxonomy
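Two of these features, suggesting a classification the user may override and inserting metadata as XML tags, fit in one small function built on Python's standard `xml.etree.ElementTree`. The element names (`document`, `metadata`, `category`, `body`) are hypothetical; real tools each use their own schema.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def tag_document(body: str, suggested: List[str],
                 override: Optional[List[str]] = None) -> str:
    """Wrap a document body in XML with its categories as metadata elements.

    `suggested` stands for a classifier's proposal; a non-None `override` is
    the user's correction and replaces the suggestion entirely.
    """
    categories = suggested if override is None else override
    doc = ET.Element("document")
    meta = ET.SubElement(doc, "metadata")
    for category in categories:
        ET.SubElement(meta, "category").text = category
    ET.SubElement(doc, "body").text = body
    return ET.tostring(doc, encoding="unicode")


print(tag_document("Pump manual", ["Products/Pumps"]))
# <document><metadata><category>Products/Pumps</category></metadata><body>Pump manual</body></document>
```

Building the XML through ElementTree rather than string concatenation means characters such as `&` or `<` in a category label are escaped correctly.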
Taxonomy Software
- Vendors
  - Heavyweights: players whose product suites encompass the whole lifecycle of information, including taxonomy creation, document classification and information retrieval (Verity, Autonomy)
  - Specialist taxonomy suppliers using innovative approaches
- Consolidation in the marketplace between classification tool vendors and search engine providers
Standards
- Interoperability
  - Need for the interchange of information between organizations in a commonly agreed way
  - XML: a basic-level information exchange standard
  - Need for broad industry-wide consortia to use agreed schemas for sharing data across applications
- Related standards
  - RDF (Resource Description Framework): a framework for describing and using metadata schemes (e.g. Dublin Core)
  - OIL (Ontology Inference Layer)
  - WSDL (Web Services Description Language), UDDI (Universal Description, Discovery and Integration), ISO Topic Maps
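RDF describes resources as subject-predicate-object statements, and Dublin Core supplies a shared set of predicates (title, creator, subject, date, and so on) identified by URIs in the `http://purl.org/dc/elements/1.1/` namespace. This is a minimal sketch that serializes such statements in the N-Triples syntax. The resource URI is a placeholder on example.org, and only plain string literals are handled (no language tags, datatypes or blank nodes).

```python
from typing import Dict, List

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace


def escape(value: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def dublin_core_ntriples(uri: str, fields: Dict[str, List[str]]) -> str:
    """Serialize Dublin Core statements about `uri` as N-Triples lines."""
    lines = []
    for element, values in fields.items():
        for value in values:
            lines.append(f'<{uri}> <{DC}{element}> "{escape(value)}" .')
    return "\n".join(lines)


print(dublin_core_ntriples(
    "http://example.org/talks/taxonomies-2003",  # placeholder URI
    {
        "title": ["Taxonomies And Ontologies"],
        "creator": ["T.B. Rajashekar"],
        "date": ["2003-08-09"],
        "subject": ["Taxonomies", "Ontologies"],
    },
))
```

Because every predicate is a shared URI, any RDF-aware system can recognize these statements as Dublin Core, which is the kind of agreed schema the interoperability point above calls for.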
Taxonomy Development: Some Lessons
What are experts saying?
- Precise methodologies do not exist; it is still an art form
- Invest in small pilot projects: stakeholders can judge success, and there is potential for quick success
- Use people with domain knowledge and library skills
- Keep it simple: a good taxonomy is intuitive and intelligible
Taxonomy Development: Some Lessons
What are experts saying?
- Involve users in all aspects of the development and usage lifecycle
- Focus on fitness for purpose: it is better to have several application- or user-specific taxonomies than a one-size-fits-all approach
- Taxonomies must be constantly updated based on usage and changing terminology
- Use technology to identify patterns and terminology, but use human intervention for final judgment
Taxonomy Development: Some Challenges
What are experts saying?
- Overall number of nodes
- The right balance between breadth and depth
- Choosing appropriate hierarchical structures
- Whether to start from scratch or use an off-the-shelf taxonomy
- Choosing between automation and specialist human classifiers
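The first two challenges can at least be measured, even if the right balance remains a judgment call. This is a minimal sketch, assuming a taxonomy held as nested dictionaries (a hypothetical representation) with hypothetical sample terms. It reports the total node count, the maximum depth, and the average branching factor (children per non-leaf node): a wide, shallow taxonomy has a high branching factor and low depth, a narrow, deep one the reverse.

```python
from typing import Dict, Tuple

Tree = Dict[str, "Tree"]  # each key is a term; its value holds the narrower terms


def shape(tree: Tree, depth: int = 1) -> Tuple[int, int, int, int]:
    """Return (nodes, max_depth, parent_nodes, child_links) for a nested-dict taxonomy."""
    nodes = max_depth = parents = links = 0
    for children in tree.values():
        nodes += 1
        max_depth = max(max_depth, depth)
        if children:
            parents += 1
            links += len(children)
            sub_nodes, sub_depth, sub_parents, sub_links = shape(children, depth + 1)
            nodes += sub_nodes
            max_depth = max(max_depth, sub_depth)
            parents += sub_parents
            links += sub_links
    return nodes, max_depth, parents, links


def summarize(tree: Tree) -> Dict[str, float]:
    """Node count, depth and average branching factor (children per parent node)."""
    nodes, depth, parents, links = shape(tree)
    return {
        "nodes": nodes,
        "max_depth": depth,
        "avg_branching": round(links / parents, 2) if parents else 0.0,
    }


taxonomy: Tree = {
    "Products": {"Pumps": {}, "Seals": {}, "Bits": {}},
    "Services": {"Maintenance": {"Field repair": {}}},
}
print(summarize(taxonomy))  # {'nodes': 7, 'max_depth': 3, 'avg_branching': 1.67}
```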
Future Directions
What are experts saying?
- Taxonomies and ontologies are closely linked to the future of the web: widely accepted taxonomies of hundreds (if not thousands) of different knowledge domains are the building blocks of the future semantic web
- The intelligent web will incorporate topic maps, knowledge maps and ontologies that act on the basis of the precise meaning of specified terms and the relationships between them
- It is also expected that, in addition to intelligence being artificially built into the network, intelligence will also have to come from enhancing the intelligence, disciplines and skills of the people who use taxonomies and search engines
Implications for Information Professionals
- Huge demand for taxonomy specialists and for skills in portal/intranet/website content management, metadata and vocabulary development
- How can we contribute?
  - Enhance (corporate/web) taxonomy and ontology development theory, principles and procedures with those of classification and thesaurus design
  - Evolve, and impart to our students, competencies in developing enterprise taxonomies and metadata schemas, and in integrating these into enterprise IKM systems (portals, CMS, DMS, DLs, etc.)
Implications for Information Professionals
- Lessons for us?
  - Systematization of taxonomy development and maintenance
  - The dynamic nature of enterprise taxonomies
  - Awareness of existing taxonomies and vocabularies, and their adoption (e.g. taxonomywarehouse.com)
Thank You!