Title: Enabling the Semantic Web: The role of metadata, semantics and domain ontologies
1 Enabling the Semantic Web: The role of metadata, semantics and domain ontologies
- Vipul Kashyap
- Presentation to National Library of Medicine
- 8th January, 2002
2 Outline
- What is the Semantic Web?
- Metadata, Ontologies and the Semantic Web
- A Three Level Approach for the Semantic Web
- The Semantic Web Fabric: A Collection of Metadata and Ontologies
- Components of the Semantic Web Fabric
- Metadata-based approach for Heterogeneous Digital Data
- OBSERVER: Incremental Query Expansion across Multiple Ontologies
- Ontology Integration and Query Rewriting
- Intensional Loss of Information
- Extensional Loss of Information
- Conclusions and Future Work
3 What is the Semantic Web?
- Semantics
- meaning or relationship of meanings, or relating to meaning (Webster)
- meaning and use of data (Information System perspective)
- Semantic Web
- An extension of the current web, in which information is given well-defined meaning, better enabling computers and people to work in cooperation (Berners-Lee, Hendler, Lassila, 2001)
- Emergent Semantic Web
- a semantic platform for people and applications to collaborate in creating, validating, and using dynamic knowledge where semantics emerges from the interactions
4 Metadata, Ontologies and the Semantic Web
Get the titles, authors, documents, maps published by the United States Geological Survey (USGS) about regions having a population greater than 5000, area greater than 1000 acres having a low density urban area land cover
domain specific metadata terms chosen from domain specific ontologies
What is Metadata?
- data/information about data
- useful/derived properties of media
- properties/relationships between objects
- may or may not capture information content of underlying data
What are Ontologies?
- collection of terms, definitions and interrelationships
- specification of a representational vocabulary for a shared domain of discourse
- Semantically rich metadata capturing the information content of underlying data repositories
- DL descriptions organized as a lattice
5 Examples of Metadata for Digital Data
6 A Metadata Classification: The Information Pyramid
[Figure: pyramid, listed here from top to bottom]
User
Ontologies, Classifications, Domain Models
Domain Specific Metadata: area, population (Census), land-cover, relief (GIS), metadata concept descriptions from ontologies
Content Descriptive Metadata
Domain Independent (structural) Metadata (C++ class-subclass relationships, HTML Document Type Definitions, C program structure)
Direct Content Based Metadata (inverted lists, document vectors, WAIS, Glimpse, LSI)
Content Dependent Metadata (size, max colors, rows, columns)
Content Independent Metadata (creation-date, location, type-of-sensor)
Data (Heterogeneous Types/Media)
7 A Three-Level Approach for the Semantic Web
[Figure: three levels, annotated with Problem Components and Solution Components]
Vocabulary: Ontological terms (domain, application specific)
used-by
Content: Metadata (content descriptions, intensional)
abstracted-into
Representation: Data (heterogeneous types, media)
8 The Semantic Web Fabric: A Collection of Metadata Descriptions and Ontologies
[Figure labels: Ontology Server; Metadata Repository; Distributed Computing Infrastructure (J2EE, .NET, CORBA, Agents)]
9 Components of the Semantic Web Fabric
- Bootstrapping, Creation and Maintenance of Semantic Knowledge
- Collaborative and Sociological Processes, Statistical Techniques
- Ontology Building, Maintenance and Versioning Tools
- Re-use of Existing Semantic Knowledge (Ontologies)
- Annotation/Association/Extraction of Knowledge with/from Underlying Data
- Information Retrieval and Analysis (Distributed Querying/Search/Inference Middleware)
- Semantic Discovery and Composition of Services
- Distributed Computing/Communication Infrastructures
- Component based technologies, Agent based systems, Web Services
- Repositories for managing data and semantic knowledge
- Relational Databases, Content Management Systems, Knowledge Base Systems
Collaboration between people and applications
10 Metadata-based Approach for handling Heterogeneous Digital Data
- Annotation/Association/Extraction of Knowledge with/from Underlying Data
- Structured Databases
- Mapping concepts in domain ontologies to schema metadata elements
- Text Databases
- Mapping of concepts in domain ontologies to textual metadata
- Information Retrieval and Analysis
- Structured Databases
- Distributed Query Processing across Multiple Information Sources
- Text Databases
- Mapping SQL/Description Logic based queries into text retrieval expressions
- Re-use of Existing Semantic Knowledge
- Interoperation Across Multiple Ontologies
- Loss of Information
11 Metadata-based Approach: Analysis of Schema Metadata
Schematic Conflicts
- Naming Conflicts, Database Identifier Conflicts, Schema Isomorphism Conflicts, Missing Data Items Conflicts
- Data Value Attribute Conflict, Entity Attribute Conflict, Data Value Entity Conflict
- Naming Conflicts
- Generalization Conflicts, Aggregation Conflicts
- Known Inconsistency, Temporal Inconsistency, Acceptable Inconsistency
- Data Representation Conflicts
- Data Scaling Conflicts
- Data Precision Conflicts
- Default Value Conflicts
- Attribute Integrity Constraint Conflicts
12 Metadata-based Approach: Describing database objects using DL expressions
ONTOLOGICAL TERMS
AgencyConcept
DocumentConcept
hasOrganization
All documents stored in the database have been published by some agency:
Database Documents ? (AND DocumentConcept (hasOrganization AgencyConcept))
DATABASE OBJECTS
AGENCY(RegNo, Name, Affiliation)
DOC(Id, Title, Agency)
- Advantages
- Use of ontologies for an intensional domain specific description of data
- Representation of extra information
- Relationships between objects not represented in the database schema
- Using terminological relationships in the ontology
13 Metadata-based Approach: Mapping ontological elements to textual metadata
Domain Specific !!
<ACCRUE>(<SENTENCE>(person.name, <PHRASE>(<Input>)),
         <SENTENCE>(person.name, <STEM>(appointed), <PHRASE>(<Input>)),
         <SENTENCE>(person.name, <STEM>(become), <PHRASE>(<Input>)))
<ACCRUE>(<SENTENCE>(person.name, <STEM>(leader), party.name),
         <SENTENCE>(person.name, <STEM>(representing), party.name))
Parameterization !!
14 Metadata-based Approach: Mapping DL queries to Topic Expressions
has_document from (AND person (FILLS name Aleksandr Shokhin) (FILLS profession Prime Minister))
<ACCRUE>(<TOPIC>(person),
         <PHRASE>(<WORD>(Aleksandr), <WORD>(Shokhin)),
         <ACCRUE>(
           <SENTENCE>(<PHRASE>(<WORD>(Aleksandr), <WORD>(Shokhin)), <STEM>(appointed), <PHRASE>(<WORD>(Prime), <WORD>(Minister))),
           <SENTENCE>(<PHRASE>(<WORD>(Aleksandr), <WORD>(Shokhin)), <STEM>(becomes), <PHRASE>(<WORD>(Prime), <WORD>(Minister)))))
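The mapping from the DL query to the topic expression on this slide can be sketched as simple string templating. This is a minimal illustration, not the system's actual translator: the helper names `phrase`, `sentence` and `profession_topic` are hypothetical, and the operators (ACCRUE, TOPIC, PHRASE, WORD, SENTENCE, STEM) are copied from the slide and written in angle brackets.

```python
def phrase(text):
    """Wrap each whitespace-separated word in <WORD>, all inside one <PHRASE>."""
    words = ", ".join(f"<WORD>({w})" for w in text.split())
    return f"<PHRASE>({words})"


def sentence(*parts):
    """Build a <SENTENCE> operator over the given sub-expressions."""
    return "<SENTENCE>(" + ", ".join(parts) + ")"


def profession_topic(name, profession, stems=("appointed", "becomes")):
    """Hypothetical template: evidence that `name` holds `profession`.

    The stems are the two shown on the slide; a real mapping would be
    domain specific and parameterized per ontology role.
    """
    evidence = [
        sentence(phrase(name), f"<STEM>({stem})", phrase(profession))
        for stem in stems
    ]
    inner = "<ACCRUE>(" + ", ".join(evidence) + ")"
    return "<ACCRUE>(<TOPIC>(person), " + phrase(name) + ", " + inner + ")"


expr = profession_topic("Aleksandr Shokhin", "Prime Minister")
print(expr)
```

The FILLS values of the query become the template parameters, which is the "parameterization" idea from the previous slide.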
15(No Transcript)
16 Metadata-based Approach: Using DL expressions to reason about information
Query: hasDocument for (FILLS hasOrganization USGS))
- Reasoning with DL Expressions
- Ontological Inferences
- DocumentConcept - (hasOrganization, USGS)
Challenge 1: Use of Multiple Ontologies
Challenge 2: Estimating the Loss of Information
17 OBSERVER: Ontology-based System Enhanced with (terminological) Relationships for Vocabulary hEterogeneity Resolution
[Figure: OBSERVER architecture. Component labels: User Query, Query Processor, Mappings/Ontology Server, Ontologies, Repositories, IRM, Inter-ontology Relationships. The Query Processor, Mappings/Ontology Server, Ontologies and Repositories labels repeat across the diagram.]
18 Controlled and Incremental Query Expansion to a new Ontology
[Figure: flowchart with labels Query Construction, Local Ontology, Yes, No, END]
19 Bibliography Data Ontology: The Red Ontology
[Figure: concept hierarchy with nodes including Conference, Agent, Person, Organization, Author, Publisher, University, Thesis, Periodical-Publication]
http://www-ksl.stanford.edu/knowledge-sharing/ontologies/html/bibliographic-data/
20 A subset of WordNet 1.5: The Blue Ontology
http://www.cogsci.princeton.edu/wn/w3wn.html
21 Inter-ontological relationships
- Synonyms
- lead to semantics preserving translations
- Hyponyms/Hypernyms
- lead to semantics altering translations
- typically results in loss of recall and precision
- List of Hyponyms
- technical-manual hyponym manual
- book hyponym book
- proceedings hyponym book
- thesis hyponym book
- misc-publication hyponym book
- technical-reports hyponym book
- press hyponym periodical-publication
- periodical hyponym periodical-publication
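The hyponym list above can be held as a simple lookup table. The sketch below is illustrative only: the pairs are copied from the slide, while the dictionary layout and the function names `broader_term` and `narrower_terms` are assumptions. Replacing a term by its broader term tends to retrieve extra items (a precision loss), while replacing a term by its narrower terms can miss items (a recall loss).

```python
# Hyponym pairs from the slide: narrower term -> broader term in the other ontology.
HYPONYM_OF = {
    "technical-manual": "manual",
    "book": "book",
    "proceedings": "book",
    "thesis": "book",
    "misc-publication": "book",
    "technical-reports": "book",
    "press": "periodical-publication",
    "periodical": "periodical-publication",
}


def broader_term(term):
    """Return the hypernym recorded for `term`, or None if there is none."""
    return HYPONYM_OF.get(term)


def narrower_terms(term):
    """Return all hyponyms recorded for `term`, in alphabetical order."""
    return sorted(n for n, b in HYPONYM_OF.items() if b == term)


print(broader_term("thesis"))
print(narrower_terms("periodical-publication"))
```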
22 Ontology Integration and Query Rewriting
[Figure: integrated ontology with the following node labels]
- union(Journal, union(Book, Proceedings, ..., Misc-Publication)), union(Periodical-Publication, union(Book, ....., Misc-Publication)), Document
- Journal, Periodical-Publication
- union(Book, Proceedings, ..., Misc-Publication)
- Technical-Manual
- GuideBook
23 Intensional Loss of Information
- Original Query
- NAME PAGES for (AND BOOK (FILLS CREATOR Carl Sagan))
- Modified Query
- NAME PAGES for (AND document (FILLS doc-author-name Carl Sagan))
- Terminological Relationships
- BOOK = (AND PUBLICATION (ATLEAST 1 ISBN))
- PUBLICATION = (AND document (ATLEAST 1 PLACE-OF-PUBLICATION))
- Terminological Difference
- (AND (ATLEAST 1 ISBN) (ATLEAST 1 PLACE-OF-PUBLICATION))
- Loss of Information
- Instead of books authored by Carl Sagan, OBSERVER returns those documents by Carl Sagan that may not have an ISBN or may not have been published
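The terminological difference on this slide can be obtained by unfolding the definitions of BOOK until the translated term is reached and collecting the constraints dropped along the way. The following is a minimal sketch under that reading; the `DEFINITIONS` table encodes only the two relationships shown above, and the function name is hypothetical.

```python
# Each defined term maps to (parent term, constraints added by the definition),
# following the two terminological relationships on the slide.
DEFINITIONS = {
    "BOOK": ("PUBLICATION", ["(ATLEAST 1 ISBN)"]),
    "PUBLICATION": ("document", ["(ATLEAST 1 PLACE-OF-PUBLICATION)"]),
}


def terminological_difference(term, translation):
    """Collect the constraints lost when `term` is replaced by `translation`."""
    dropped = []
    current = term
    while current != translation:
        if current not in DEFINITIONS:
            raise ValueError(f"{translation!r} is not reachable from {term!r}")
        parent, constraints = DEFINITIONS[current]
        dropped.extend(constraints)
        current = parent
    return dropped


diff = terminological_difference("BOOK", "document")
print("(AND " + " ".join(diff) + ")")
```

The printed expression describes the loss intensionally: the answers may lack an ISBN or a place of publication.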
24 Intensional Loss of Information: Disadvantages and Advantages
- May not make sense as it mixes two vocabularies
- e.g., does Book - Book make any sense?
- The problem becomes worse if the two ontologies are in different languages
- e.g., English and Italian
- Makes it hard for the system to differentiate between the various alternatives
- On the other hand
- An information loss interval doesn't make much sense to the user.
25 Estimating Loss of Information based on Term Extensions
[Figure: Venn diagram of Ext(Term) and Ext(Translation), with regions labeled Loss in Precision and Loss in Recall]
Precision = |Ext(Term) ∩ Ext(Translation)| / |Ext(Translation)|
Recall = |Ext(Term) ∩ Ext(Translation)| / |Ext(Term)|
Percentage Loss Ext(Term) ?
Ext(Translation)
Ext(Term) Ext(Translation)
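As a small worked example of the precision and recall measures on this slide, the sketch below treats extensions as Python sets and assumes the operator in both formulas is set intersection. The document identifiers are made up for illustration.

```python
def precision_recall(ext_term, ext_translation):
    """Precision and recall of a translation, computed from term extensions."""
    overlap = len(ext_term & ext_translation)
    precision = overlap / len(ext_translation) if ext_translation else 0.0
    recall = overlap / len(ext_term) if ext_term else 0.0
    return precision, recall


# Hypothetical extensions: the objects that satisfy each expression.
ext_term = {"d1", "d2", "d3", "d4"}
ext_translation = {"d3", "d4", "d5"}

precision, recall = precision_recall(ext_term, ext_translation)
print(precision, recall)
```

Here two of the three translated results are relevant, and two of the four relevant objects are retrieved.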
26 Estimating Term Extension Intervals
- Intersections
- |Ext(Expr1) ∩ Ext(Expr2)|.low = 0
- |Ext(Expr1) ∩ Ext(Expr2)|.high = min(|Ext(Expr1)|.high, |Ext(Expr2)|.high)
- Unions
- |Ext(Expr1) ∪ Ext(Expr2)|.low = max(|Ext(Expr1)|.low, |Ext(Expr2)|.low)
- |Ext(Expr1) ∪ Ext(Expr2)|.high = |Ext(Expr1)|.high + |Ext(Expr2)|.high
- Term
- |Ext(Term)|.high = |Ext(Term)|.low = |Ext(Term)|
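The interval rules on this slide compose naturally, so bounds for a nested expression can be computed bottom-up. The sketch below assumes the union upper bound is the sum of the operands' upper bounds; the `Interval` class, the function names and the extension sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Lower and upper bound on the size of an extension."""
    low: int
    high: int


def term(size):
    """A single term has a known extension size, so low equals high."""
    return Interval(size, size)


def intersection(a, b):
    """The operands may be disjoint (low 0) or one may contain the other."""
    return Interval(0, min(a.high, b.high))


def union(a, b):
    """At least the larger operand, at most both operands with no overlap."""
    return Interval(max(a.low, b.low), a.high + b.high)


# Hypothetical extension sizes for three terms.
book, journal, thesis = term(120), term(40), term(15)

books_or_journals = union(book, journal)
expr = intersection(books_or_journals, thesis)
print(books_or_journals, expr)
```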
27 Estimating Intervals of Information Loss
- Intervals of Precision and Recall
- Precision.high, Precision.low
- Recall.high, Recall.low
- Leads to Intervals of Information Loss
28 Comparison of two translations
- Consider two translations
- Trans1 with bounds low1 and high1
- Trans2 with bounds low2 and high2
- Choosing the appropriate translation
- Compute mLossi = (lowi + highi)/2
- if mLoss1 < mLoss2, choose Trans1
- if mLoss2 < mLoss1, choose Trans2
- if mLoss1 = mLoss2, choose translation with lesser interval (highi - lowi)
- Need for probabilistic models
- Let (low1, high1) = (10, 80) and (low2, high2) = (20, 60)
- mLoss2 (40) < mLoss1 (45) => Trans2 is chosen
- However there are cases for which Trans1 returns a lower (10 - 20) loss!
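The selection rule on this slide can be sketched as a sort key: compare midpoints of the loss intervals first and break ties by interval width. The function name and the dictionary layout are assumptions; the numbers are the ones given on the slide.

```python
def choose_translation(candidates):
    """Pick the translation with the lowest midpoint loss.

    `candidates` maps a translation name to its (low, high) loss bounds.
    Ties on the midpoint are broken in favour of the narrower interval.
    """
    def key(name):
        low, high = candidates[name]
        return ((low + high) / 2, high - low)

    return min(candidates, key=key)


slide_example = {"Trans1": (10, 80), "Trans2": (20, 60)}
print(choose_translation(slide_example))
```

As the slide points out, the midpoint hides the fact that Trans1 could still turn out better, since its lower bound is below that of Trans2.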
29 Semantic Adaptation of Precision and Recall
- Term subsumes Translation
- Ext(Translation) ⊆ Ext(Term) ⇒ Ext(Term) ∩ Ext(Translation) = Ext(Translation)
- Precision = 1
- Recall = |Ext(Translation)| / |Ext(Term)|
- However Term and Translation belong to different ontologies
- Ext(Term) is taken as Ext(Term) ∪ Ext(Translation)
- Recall.low = |Ext(Translation)|.low / (|Ext(Translation)|.low + |Ext(Term)|)
- Recall.high = |Ext(Translation)|.high / max(|Ext(Translation)|.high, |Ext(Term)|)
- Need to evolve a common framework for relating subsumption and information loss
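A small numeric sketch of the recall bounds for the case where the term subsumes its translation but the two come from different ontologies. It assumes the denominator of the lower bound is a sum and that of the upper bound is a maximum, which matches the bounds for the size of a union; the function name and the sizes are hypothetical.

```python
def recall_bounds_when_term_subsumes(ext_term, trans_low, trans_high):
    """Return (Recall.low, Recall.high) given extension sizes.

    ext_term:   size of the term's extension in its own ontology
    trans_low:  lower bound on the size of the translation's extension
    trans_high: upper bound on the size of the translation's extension
    """
    low_denominator = trans_low + ext_term         # largest possible union
    high_denominator = max(trans_high, ext_term)   # smallest possible union
    recall_low = trans_low / low_denominator if low_denominator else 0.0
    recall_high = trans_high / high_denominator if high_denominator else 0.0
    return recall_low, recall_high


low, high = recall_bounds_when_term_subsumes(ext_term=100, trans_low=20, trans_high=50)
print(low, high)
```

Precision stays at 1 in this case, so only the recall interval contributes to the estimated loss.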
30 Semantic Adaptation of Precision and Recall
- Translation subsumes Term
- Analogous (Dual?) of the previous case
- Recall = 1
- Precision = |Ext(Term)| / |Ext(Translation)|
- Cases of no Information Loss
- Translation of a term by the intersection of its immediate parents which is also its definition
- Translation of a term by the union of its immediate children if there exists a covering relationship between the two
- Need for extensional inter-ontological relationships
- e.g., 20% of publications are 50% of books
- characterizing degree of overlap
31 Computation of Precision and Recall in the absence of Semantic Relationships
- Precision
- Precision.low = 0
- Precision.high = max( min(|Ext(Term)|, |Ext(Translation)|.high) / |Ext(Translation)|.high, min(|Ext(Term)|, |Ext(Translation)|.low) / |Ext(Translation)|.low )
- Recall
- Recall.low = 0
- Recall.high = min(|Ext(Term)|, |Ext(Translation)|.high) / |Ext(Term)|
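When no semantic relationship is known, the bounds on this slide depend only on extension sizes. The sketch below reads each pair of lines as a fraction (numerator over denominator); the function names and the example sizes are assumptions made for illustration.

```python
def precision_bounds(ext_term, trans_low, trans_high):
    """Return (Precision.low, Precision.high) from extension sizes."""
    ratios = [
        min(ext_term, bound) / bound
        for bound in (trans_high, trans_low)
        if bound > 0  # skip an empty extension to avoid dividing by zero
    ]
    return 0.0, max(ratios, default=0.0)


def recall_bounds(ext_term, trans_high):
    """Return (Recall.low, Recall.high) from extension sizes."""
    if ext_term == 0:
        return 0.0, 0.0
    return 0.0, min(ext_term, trans_high) / ext_term


# Hypothetical sizes: the term has 10 instances, the translation between 20 and 80.
print(precision_bounds(10, 20, 80))
print(recall_bounds(10, 80))
```

The lower bounds are always 0 because, without a known relationship, the two extensions may be disjoint.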
32 Choosing an optimal translation: Local v/s Global Decision Making
[Figure: translation paths among Publication, Document, Journal and Book, with edges labeled LOSS(Document, Publication), LOSS(Publication, Journal), LOSS(Document, Book) and LOSS(Journal, Book)]
- Local Decision Making
- LOSS(Publication, Journal) > LOSS(Document, Publication)
- Document is chosen as the translation
- But LOSS(Book, Document) > LOSS(Book, Journal) !!
- Global Decision Making
- Both translations Document, Journal are passed on to the next level
- Journal is chosen as the appropriate translation
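The contrast between local and global decision making can be illustrated with a toy loss table. The loss values below are hypothetical and chosen only to reproduce the ordering stated on the slide; the argument order of each pair is normalized to (original term, candidate), which is also an assumption.

```python
# Hypothetical losses, keyed as (original term, candidate translation).
LOSS = {
    ("Publication", "Journal"): 0.30,
    ("Publication", "Document"): 0.20,
    ("Book", "Document"): 0.60,
    ("Book", "Journal"): 0.25,
}

CANDIDATES = ["Document", "Journal"]


def local_choice(candidates):
    """Commit at the first level, looking only at the loss relative to Publication."""
    return min(candidates, key=lambda c: LOSS[("Publication", c)])


def global_choice(candidates):
    """Carry every candidate to the next level and compare losses relative to Book."""
    return min(candidates, key=lambda c: LOSS[("Book", c)])


print(local_choice(CANDIDATES), global_choice(CANDIDATES))
```

With these numbers the local strategy commits to Document too early, while the global strategy ends up with Journal.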
33 Loss of Information for Correlated Answers across Ontologies
[Figure: Venn diagrams relating NewAnswer_i, Answer_(i+1) and the Ideal Answer]
- NewAnswer_i: Correlated answer from previous ontologies (O1, Oi)
- Answer_(i+1): Answer obtained from new target ontology O_(i+1)
- The following cases arise
- NewAnswer_(i+1) = NewAnswer_i ∪ Answer_(i+1)
- Loss(NewAnswer_(i+1)) > Max loss defined by user
- NewAnswer_i and Answer_(i+1) are displayed separately to the user with an appropriate warning
34 Conclusions
- Analysis of the Semantic Web Technology Space
- Proposed a layered approach for analysis
- Identified components of the Semantic Web Fabric
- Re-use of pre-existing real world ontologies (off the shelf)
- Mapping the ontologies to structured and text databases
- Mechanisms for translation of queries across different ontologies
- Approach for adaptation of information loss based on semantic relationships
- Loss of information measures to determine the semantic appropriateness of a particular ontology and translation
- The future Semantic Web will be based on browsing domain specific ontologies and vocabularies
- Need to provide critical underlying infrastructure based on the above
35 Future Work
- Extensions to current work
- Information Extraction from Textual Data
- Evolve a common framework to relate subsumption with loss of information
- Explore relationships with standards such as SQL, XML/RDF based QLs, DAML+OIL
- Complex probabilistic modeling for ranking translations
- Experimentation and Validation of measures for Loss of Information
- Bootstrapping, Creation, Validation of Semantic Knowledge
- Ongoing work in collaboration with Stanford University and University of Georgia (NSF ITR Proposal)
- Use of statistical clustering to determine central terms
- Use of consensus analysis across SMEs to enrich terminology and create ontology
- Use of scalable knowledge composition to re-use existing knowledge and support ontology interoperation
- Use of IScapes to specify and validate hypotheses and feedback from the process to generate new semantic knowledge
- Interaction of above processes
- Ontology Maintenance and Versioning