Enabling the Semantic Web: The role of metadata, semantics and domain ontologies

1
Enabling the Semantic Web: The role of metadata,
semantics and domain ontologies
  • Vipul Kashyap
  • Presentation to National Library of Medicine
  • 8th January, 2002

2
Outline
  • What is the Semantic Web?
  • Metadata, Ontologies and the Semantic Web
  • A Three-Level Approach for the Semantic Web
  • The Semantic Web Fabric: A Collection of Metadata
    and Ontologies
  • Components of the Semantic Web Fabric
  • Metadata-based approach for Heterogeneous Digital
    Data
  • OBSERVER: Incremental Query Expansion across
    Multiple Ontologies
  • Ontology Integration and Query Rewriting
  • Intensional Loss of Information
  • Extensional Loss of Information
  • Conclusions and Future Work

3
What is the Semantic Web?
  • Semantics
  • meaning or relationship of meanings, or relating
    to meaning (Webster),
  • meaning and use of data (Information System
    perspective)
  • Semantic Web
  • "An extension of the current web in which
    information is given well-defined meaning, better
    enabling computers and people to work in
    cooperation" [Berners-Lee, Hendler, Lassila, 2001]
  • Emergent Semantic Web
  • a semantic platform for people and applications
    to collaborate in creating, validating, and using
    dynamic knowledge where semantics emerges from
    the interactions

4
Metadata, Ontologies and the Semantic Web
Example query: Get the titles, authors, documents and maps
published by the United States Geological Survey
(USGS) about regions having a population greater
than 5000, an area greater than 1000 acres and a
low density urban area land cover.
The constraints in this query are domain specific metadata terms chosen from
domain specific ontologies.
What is Metadata?
  • data/information about data
  • useful/derived properties of media
  • properties/relationships between objects
  • may or may not capture the information content of
    the underlying data
What are Ontologies?
  • collection of terms, definitions and
    interrelationships
  • specification of a representational vocabulary
    for a shared domain of discourse
  • semantically rich metadata capturing the
    information content of underlying data
    repositories
  • DL (Description Logic) descriptions organized as
    a lattice
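The following is a minimal sketch of how domain specific metadata makes the example query answerable. It filters a list of metadata records against every constraint in the query. The record type `RegionDoc`, its field names and the sample records are all hypothetical. In a real system these attributes would come from Census and GIS metadata mapped to ontology terms, not from a hand-written list.

```python
from dataclasses import dataclass


@dataclass
class RegionDoc:
    """Hypothetical metadata record for one published item."""
    title: str
    author: str
    publisher: str
    population: int      # domain specific metadata (Census)
    area_acres: float    # domain specific metadata (GIS)
    land_cover: str      # domain specific metadata (GIS)


def matches_usgs_query(doc: RegionDoc) -> bool:
    """True if the record satisfies every constraint of the example query."""
    return (doc.publisher == "USGS"
            and doc.population > 5000
            and doc.area_acres > 1000
            and doc.land_cover == "low density urban")


# Illustrative records, not real data.
docs = [
    RegionDoc("Urban fringe land cover", "A. Author", "USGS", 12000, 2500.0, "low density urban"),
    RegionDoc("Rural relief map", "B. Author", "USGS", 3000, 5000.0, "forest"),
    RegionDoc("City census atlas", "C. Author", "Census Bureau", 80000, 9000.0, "low density urban"),
]

hits = [(d.title, d.author) for d in docs if matches_usgs_query(d)]
print(hits)  # [('Urban fringe land cover', 'A. Author')]
```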
5
Examples of Metadata for Digital Data
6
A Metadata Classification: The Information
Pyramid
(from the User at the top to the Data at the base)
  • User
  • Ontologies, Classifications, Domain Models
  • Domain Specific Metadata: area, population (Census),
    land-cover, relief (GIS), metadata concept
    descriptions from ontologies
  • Content Descriptive Metadata
  • Domain Independent (structural) Metadata (C++
    class-subclass relationships, HTML Document Type
    Definitions, C++ program structure)
  • Direct Content Based Metadata (inverted lists,
    document vectors, WAIS, Glimpse, LSI)
  • Content Dependent Metadata (size, max colors,
    rows, columns)
  • Content Independent Metadata (creation-date,
    location, type-of-sensor)
  • Data (Heterogeneous Types/Media)
7
A Three-Level Approach for the Semantic Web
Levels (diagram columns: Problem Components and Solution Components)
  • Vocabulary / Ontological terms (domain, application specific)
  • Metadata / Content (content descriptions, intensional)
  • Data / Representation (heterogeneous types, media)
The vocabulary level is used by the metadata level, and the data level
is abstracted into the metadata level.
8

The Semantic Web Fabric: A Collection of Metadata
Descriptions and Ontologies
Diagram components: Ontology Server; Metadata Repository;
Distributed Computing Infrastructure (J2EE, .NET,
CORBA, Agents)
9
Components of the Semantic Web Fabric
  • Bootstrapping, Creation and Maintenance of
    Semantic Knowledge
  • Collaborative and Sociological Processes,
    Statistical Techniques
  • Ontology Building, Maintenance and Versioning
    Tools
  • Re-use of Existing Semantic Knowledge
    (Ontologies)
  • Annotation/Association/Extraction of Knowledge
    with/from Underlying Data
  • Information Retrieval and Analysis (Distributed
    Querying/Search/Inference Middleware)
  • Semantic Discovery and Composition of Services
  • Distributed Computing/Communication
    Infrastructures
  • Component based technologies, Agent based
    systems, Web Services
  • Repositories for managing data and semantic
    knowledge
  • Relational Databases, Content Management Systems,
    Knowledge Base Systems

Collaboration between people and applications
10
Metadata-based Approach for handling
Heterogeneous Digital Data
  • Annotation/Association/Extraction of Knowledge
    with/from Underlying Data
  • Structured Databases
  • Mapping concepts in domain ontologies to schema
    metadata elements
  • Text Databases
  • Mapping of concepts in domain ontologies to
    textual metadata
  • Information Retrieval and Analysis
  • Structured Databases
  • Distributed Query Processing across Multiple
    Information Sources
  • Text Databases
  • Mapping SQL/Description Logic based queries into
    text retrieval expressions
  • Re-use of Existing Semantic Knowledge
  • Interoperation Across Multiple Ontologies
  • Loss of Information

11
Metadata-based Approach: Analysis of Schema
Metadata
Schematic Conflicts
  • Naming Conflicts, Database Identifier Conflicts,
    Schema Isomorphism Conflicts, Missing Data Items
    Conflicts
  • Data Value Attribute Conflict, Entity Attribute
    Conflict, Data Value Entity Conflict
  • Generalization Conflicts, Aggregation Conflicts
  • Known Inconsistency, Temporal Inconsistency,
    Acceptable Inconsistency
  • Naming Conflicts, Data Representation Conflicts,
    Data Scaling Conflicts, Data Precision Conflicts,
    Default Value Conflicts, Attribute Integrity
    Constraint Conflicts
12
Metadata-based Approach: Describing database
objects using DL expressions
ONTOLOGICAL TERMS: AgencyConcept, DocumentConcept,
hasOrganization
DATABASE OBJECTS: AGENCY(RegNo, Name, Affiliation),
DOC(Id, Title, Agency)
All documents stored in the database have been
published by some agency:
Database Documents ⊑ (AND DocumentConcept
(hasOrganization AgencyConcept))

  • Advantages
  • Use of ontologies for an intensional domain
    specific description of data
  • Representation of extra information
  • Relationships between objects not represented in
    the database schema
  • Using terminological relationships in the
    ontology
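Below is a minimal sketch of mapping the ontological terms onto the two database objects and then answering a DL-style query such as (FILLS hasOrganization USGS) with SQL. It uses Python's built-in `sqlite3` module with an in-memory database. The mapping table `TERM_TO_SCHEMA`, the function name and the sample rows are hypothetical. The sketch also assumes that `DOC.Agency` stores the `RegNo` of the publishing agency; the slide does not say this explicitly.

```python
import sqlite3

# Hypothetical mapping from ontological terms to schema elements.
# Assumption: DOC.Agency holds the RegNo of the publishing AGENCY.
TERM_TO_SCHEMA = {
    "DocumentConcept": "DOC",
    "AgencyConcept": "AGENCY",
    "hasOrganization": ("DOC.Agency", "AGENCY.RegNo"),
}


def documents_filling_has_organization(conn, agency_name):
    """Evaluate (AND DocumentConcept (FILLS hasOrganization <agency_name>))."""
    doc_table = TERM_TO_SCHEMA["DocumentConcept"]
    agency_table = TERM_TO_SCHEMA["AgencyConcept"]
    fk, pk = TERM_TO_SCHEMA["hasOrganization"]
    sql = (f"SELECT {doc_table}.Id, {doc_table}.Title FROM {doc_table} "
           f"JOIN {agency_table} ON {fk} = {pk} "
           f"WHERE {agency_table}.Name = ? ORDER BY {doc_table}.Id")
    return conn.execute(sql, (agency_name,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE AGENCY (RegNo INTEGER PRIMARY KEY, Name TEXT, Affiliation TEXT);
    CREATE TABLE DOC (Id INTEGER PRIMARY KEY, Title TEXT, Agency INTEGER);
    -- illustrative rows, not real data
    INSERT INTO AGENCY VALUES (1, 'USGS', 'Federal'), (2, 'NOAA', 'Federal');
    INSERT INTO DOC VALUES (10, 'Land cover map', 1), (11, 'Tide tables', 2),
                           (12, 'Relief survey', 1);
""")
print(documents_filling_has_organization(conn, "USGS"))
# [(10, 'Land cover map'), (12, 'Relief survey')]
```

The ontology stays the user-facing vocabulary in this design. Only the mapping knows about table and column names.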

13
Metadata-based Approach: Mapping ontological
elements to textual metadata
Domain Specific!!
<ACCRUE>(<SENTENCE>(person.name, <PHRASE>(<Input>)),
         <SENTENCE>(person.name, <STEM>(appointed), <PHRASE>(<Input>)),
         <SENTENCE>(person.name, <STEM>(become), <PHRASE>(<Input>)))
<ACCRUE>(<SENTENCE>(person.name, <STEM>(leader), party.name),
         <SENTENCE>(person.name, <STEM>(representing), party.name))
Parameterization!!
Parameterization !!
14
Metadata-based Approach: Mapping DL queries to
Topic Expressions
has_document from (AND person (FILLS name Aleksandr Shokhin)
(FILLS profession Prime Minister))
<ACCRUE>(<TOPIC>(person),
         <PHRASE>(<WORD>(Aleksandr), <WORD>(Shokhin)),
         <ACCRUE>(<SENTENCE>(<PHRASE>(<WORD>(Aleksandr), <WORD>(Shokhin)),
                             <STEM>(appointed),
                             <PHRASE>(<WORD>(Prime), <WORD>(Minister))),
                  <SENTENCE>(<PHRASE>(<WORD>(Aleksandr), <WORD>(Shokhin)),
                             <STEM>(becomes),
                             <PHRASE>(<WORD>(Prime), <WORD>(Minister)))))
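The "Parameterization" idea can be sketched as string templating. The person's name and profession fillers from the DL query are substituted into a fixed topic-expression template. The helper names and the `stems` parameter are hypothetical, and the template is read off the Shokhin example rather than taken from any topic-engine API. This only builds the query text; it does not evaluate it against a text database.

```python
def phrase(text: str) -> str:
    """<PHRASE> containing one <WORD> per whitespace-separated token."""
    words = ", ".join(f"<WORD>({w})" for w in text.split())
    return f"<PHRASE>({words})"


def sentence(subject: str, stem: str, obj: str) -> str:
    """<SENTENCE> linking a subject and an object through a verb <STEM>."""
    return f"<SENTENCE>({subject}, <STEM>({stem}), {obj})"


def person_profession_topic(name: str, profession: str,
                            stems=("appointed", "becomes")) -> str:
    """Instantiate a hypothetical person/profession topic template.

    `stems` holds the verb stems that connect a person to a profession.
    """
    who = phrase(name)
    role = phrase(profession)
    sentences = ", ".join(sentence(who, s, role) for s in stems)
    return f"<ACCRUE>(<TOPIC>(person), {who}, <ACCRUE>({sentences}))"


# (AND person (FILLS name Aleksandr Shokhin) (FILLS profession Prime Minister))
expr = person_profession_topic("Aleksandr Shokhin", "Prime Minister")
print(expr)
```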
15
(No Transcript)
16
Metadata-based Approach: Using DL expressions to
reason about information
Query: hasDocument for (FILLS hasOrganization USGS)
  • Reasoning with DL expressions
  • Ontological inferences: DocumentConcept,
    (hasOrganization, USGS)
Challenge 1: Use of Multiple Ontologies
Challenge 2: Estimating the Loss of Information
17
OBSERVER Ontology-based System Enhanced with
(terminological) Relationships for Vocabulary
hEterogeneity Resolution
Architecture (diagram components): User Query; IRM holding the
Inter-ontology Relationships; several Query Processors, each with
its own Mappings/Ontology Server, Ontologies and Repositories.
18
Controlled and Incremental Query Expansion to a
new Ontology
(Flowchart: Query Construction; Local Ontology; a Yes/No decision; END)
19
Bibliography Data Ontology: The Red Ontology
Classes shown: Conference, Agent, Person, Organization, Author,
Publisher, University, Thesis, Periodical-Publication
http://www-ksl.stanford.edu/knowledge-sharing/ontologies/html/bibliographic-data/
20
A subset of WordNet 1.5: The Blue Ontology
http://www.cogsci.princeton.edu/wn/w3wn.html
21
Inter-ontological relationships
  • Synonyms
  • lead to semantics-preserving translations
  • Hyponyms/Hypernyms
  • lead to semantics-altering translations
  • typically result in a loss of recall and precision
  • List of Hyponyms
  • technical-manual hyponym manual
  • book hyponym book
  • proceedings hyponym book
  • thesis hyponym book
  • misc-publication hyponym book
  • technical-reports hyponym book
  • press hyponym periodical-publication
  • periodical hyponym periodical-publication
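The hyponym pairs listed above are enough to rewrite a term as the union of its narrower terms in the other ontology. This matches the union(Book, Proceedings, ..., Misc-Publication) form used for query rewriting. The sketch below reads "X hyponym Y" as "X is narrower than Y" and copies only the pairs from the slide. The helper names are hypothetical. The resulting translation is semantics altering unless the narrower terms fully cover the broader one.

```python
# (narrower term, broader term) pairs, taken from the list of hyponyms.
HYPONYMS = [
    ("technical-manual", "manual"),
    ("book", "book"),
    ("proceedings", "book"),
    ("thesis", "book"),
    ("misc-publication", "book"),
    ("technical-reports", "book"),
    ("press", "periodical-publication"),
    ("periodical", "periodical-publication"),
]


def narrower_terms(term: str) -> list:
    """All terms recorded as hyponyms of `term`, in list order."""
    return [narrow for narrow, broad in HYPONYMS if broad == term]


def union_expression(term: str):
    """Rewrite `term` as a union of its hyponyms, or None if it has none."""
    terms = narrower_terms(term)
    return f"union({', '.join(terms)})" if terms else None


print(union_expression("book"))
# union(book, proceedings, thesis, misc-publication, technical-reports)
```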

22
Ontology Integration and Query Rewriting
(Diagram nodes: Document; Journal, Periodical-Publication;
union(Book, Proceedings, ..., Misc-Publication); Technical-Manual; GuideBook)
Rewritten expressions shown:
  • union(Journal, union(Book, Proceedings, ...,
    Misc-Publication))
  • union(Periodical-Publication, union(Book, ...,
    Misc-Publication))
23
Intensional Loss of Information
  • Original Query
  • NAME PAGES for (AND BOOK (FILLS CREATOR Carl
    Sagan))
  • Modified Query
  • NAME PAGES for (AND document (FILLS
    doc-author-name Carl Sagan))
  • Terminological Relationships
  • BOOK ≡ (AND PUBLICATION (ATLEAST 1 ISBN))
  • PUBLICATION ≡ (AND document (ATLEAST 1
    PLACE-OF-PUBLICATION))
  • Terminological Difference
  • (AND (ATLEAST 1 ISBN) (ATLEAST 1
    PLACE-OF-PUBLICATION))
  • Loss of Information
  • Instead of books authored by Carl Sagan, OBSERVER
    returns documents by Carl Sagan that may
    not have an ISBN or may not have been published.
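The terminological difference can be sketched structurally. Each concept is expanded into the restrictions it adds on top of its parent, and the difference keeps the restrictions of the original term that the translation's expansion lacks. This is a simplified sketch over an already integrated hierarchy. The `DEFINITIONS` table and the function names are hypothetical, and a real DL reasoner computes concept difference far more generally than set subtraction on conjuncts.

```python
# concept -> (parent concept or None, restrictions this concept adds)
DEFINITIONS = {
    "BOOK": ("PUBLICATION", ["(ATLEAST 1 ISBN)"]),
    "PUBLICATION": ("document", ["(ATLEAST 1 PLACE-OF-PUBLICATION)"]),
    "document": (None, []),
}


def expand(concept: str) -> list:
    """Collect the restrictions along the chain from `concept` up to the root."""
    restrictions = []
    while concept is not None:
        parent, own = DEFINITIONS[concept]
        restrictions.extend(own)
        concept = parent
    return restrictions


def terminological_difference(original: str, translation: str) -> list:
    """Restrictions of `original` that `translation` does not impose."""
    translated = set(expand(translation))
    return [r for r in expand(original) if r not in translated]


diff = terminological_difference("BOOK", "document")
print("(AND " + " ".join(diff) + ")")
# (AND (ATLEAST 1 ISBN) (ATLEAST 1 PLACE-OF-PUBLICATION))
```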

24
Intensional Loss of Information: Disadvantages
and Advantages
  • May not make sense, as it mixes two vocabularies
  • e.g., does Book - Book make any sense?
  • The problem becomes worse if the two ontologies
    are in different languages,
  • e.g., English and Italian
  • Makes it hard for the system to differentiate
    between the various alternatives
  • On the other hand
  • An information loss interval doesn't make much
    sense to the user.

25
Estimating Loss of Information based on Term
Extensions
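(Diagram: overlapping Ext(Term) and Ext(Translation), marking the
Loss in Precision and the Loss in Recall)
$$\text{Precision} = \frac{|\text{Ext}(Term) \cap \text{Ext}(Translation)|}{|\text{Ext}(Translation)|}$$
$$\text{Recall} = \frac{|\text{Ext}(Term) \cap \text{Ext}(Translation)|}{|\text{Ext}(Term)|}$$
$$\text{Percentage Loss} = \frac{|\text{Ext}(Term) \,\triangle\, \text{Ext}(Translation)|}{|\text{Ext}(Term)| + |\text{Ext}(Translation)|}$$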
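Here is a small sketch of the three measures computed on explicit extensions represented as Python sets. The loss function assumes that the garbled "Percentage Loss" formula is the symmetric difference of the two extensions divided by the sum of their sizes. Under that assumption the loss equals one minus the harmonic mean of precision and recall, and the code asserts this identity. The sample sets are hypothetical, and the loss is returned as a fraction (multiply by 100 for a percentage).

```python
def precision(term: set, translation: set) -> float:
    """Fraction of the translation's extension that belongs to the term."""
    return len(term & translation) / len(translation)


def recall(term: set, translation: set) -> float:
    """Fraction of the term's extension retrieved by the translation."""
    return len(term & translation) / len(term)


def percentage_loss(term: set, translation: set) -> float:
    """Assumed form: symmetric difference over the summed extension sizes."""
    return len(term ^ translation) / (len(term) + len(translation))


term = {"a", "b", "c", "d"}       # hypothetical Ext(Term)
translation = {"c", "d", "e"}     # hypothetical Ext(Translation)
p, r = precision(term, translation), recall(term, translation)
loss = percentage_loss(term, translation)
# Under the assumed form, loss = 1 - 2PR/(P + R).
assert abs(loss - (1 - 2 * p * r / (p + r))) < 1e-12
print(p, r, loss)
```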
26
Estimating Term Extension Intervals
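  • Intersections
  • $(\text{Ext}(Expr_1) \cap \text{Ext}(Expr_2)).\text{low} = 0$
  • $(\text{Ext}(Expr_1) \cap \text{Ext}(Expr_2)).\text{high} = \min(\text{Ext}(Expr_1).\text{high}, \text{Ext}(Expr_2).\text{high})$
  • Unions
  • $(\text{Ext}(Expr_1) \cup \text{Ext}(Expr_2)).\text{low} = \max(\text{Ext}(Expr_1).\text{low}, \text{Ext}(Expr_2).\text{low})$
  • $(\text{Ext}(Expr_1) \cup \text{Ext}(Expr_2)).\text{high} = \text{Ext}(Expr_1).\text{high} + \text{Ext}(Expr_2).\text{high}$
  • Term
  • $\text{Ext}(Term).\text{high} = \text{Ext}(Term).\text{low} = |\text{Ext}(Term)|$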
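The interval rules above translate directly into a small interval type. A term has an exactly known size. An intersection can be empty, or as large as the smaller upper bound. A union is at least as large as the larger lower bound, and at most the sum when the extensions are disjoint. The `Interval` type, the helper names and the sizes in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Bounds on the size of an expression's extension."""
    low: int
    high: int


def term(size: int) -> Interval:
    """A term with a known extension size: low == high == |Ext(Term)|."""
    return Interval(size, size)


def intersect(a: Interval, b: Interval) -> Interval:
    """The extensions may be disjoint, or one may contain the other."""
    return Interval(0, min(a.high, b.high))


def union(a: Interval, b: Interval) -> Interval:
    """At least the larger lower bound, at most the sum (disjoint case)."""
    return Interval(max(a.low, b.low), a.high + b.high)


# Hypothetical sizes: |Ext(book)| = 120, |Ext(proceedings)| = 40, |Ext(journal)| = 50
books_or_proceedings = union(term(120), term(40))
print(books_or_proceedings)                       # Interval(low=120, high=160)
print(intersect(books_or_proceedings, term(50)))  # Interval(low=0, high=50)
```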

27
Estimating Intervals of Information Loss
  • Intervals of Precision and Recall
  • Precision.high, Precision.low
  • Recall.high, Recall.low
  • Leads to Intervals of Information Loss

28
Comparison of two translations
  • Consider two translations
  • Trans1 with bounds low1 and high1
  • Trans2 with bounds low2 and high2
  • Choosing the appropriate translation:
  • Compute $mLoss_i = (low_i + high_i)/2$
  • if mLoss1 < mLoss2, choose Trans1
  • if mLoss2 < mLoss1, choose Trans2
  • if mLoss1 = mLoss2, choose the translation with
    the narrower interval ($high_i - low_i$)
  • Need for probabilistic models
  • Let (low1, high1) = (10, 80) and (low2, high2) =
    (20, 60)
  • mLoss2 (40) < mLoss1 (45) ⇒ Trans2 is chosen
  • However, there are cases in which Trans1 returns
    a lower loss (10 to 20)!
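The selection rule can be sketched in a few lines, using the slide's own example intervals. The function names are hypothetical. The slide does not say what happens when both the means and the widths are equal, so the sketch defaults to Trans1 in that case as an assumption.

```python
def mean_loss(low: float, high: float) -> float:
    """Midpoint of a loss interval."""
    return (low + high) / 2


def choose_translation(trans1: tuple, trans2: tuple) -> str:
    """Pick between two (low, high) loss intervals by midpoint, then width."""
    m1, m2 = mean_loss(*trans1), mean_loss(*trans2)
    if m1 < m2:
        return "Trans1"
    if m2 < m1:
        return "Trans2"
    # Equal midpoints: prefer the narrower interval (full ties go to Trans1, an assumption).
    width1, width2 = trans1[1] - trans1[0], trans2[1] - trans2[0]
    return "Trans1" if width1 <= width2 else "Trans2"


print(choose_translation((10, 80), (20, 60)))  # Trans2 (midpoints 45 vs 40)
```

The midpoint ignores where the actual loss falls inside each interval. That limitation is what the example above exposes and why the slide calls for probabilistic models.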

29
Semantic Adaptation of Precision and Recall
  • Term subsumes Translation
  • $\text{Ext}(Translation) \subseteq \text{Ext}(Term) \Rightarrow \text{Ext}(Term) \cap \text{Ext}(Translation) = \text{Ext}(Translation)$
  • Precision = 1
  • Recall = $|\text{Ext}(Translation)| / |\text{Ext}(Term)|$
  • However, Term and Translation belong to different
    ontologies, so $|\text{Ext}(Term)|$ is taken as
    $|\text{Ext}(Term) \cup \text{Ext}(Translation)|$
  • $Recall.\text{low} = \dfrac{\text{Ext}(Translation).\text{low}}{\text{Ext}(Translation).\text{low} + |\text{Ext}(Term)|}$
  • $Recall.\text{high} = \dfrac{\text{Ext}(Translation).\text{high}}{\max(\text{Ext}(Translation).\text{high}, |\text{Ext}(Term)|)}$
  • Need to evolve a common framework for relating
    subsumption and information loss
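This sketch computes the recall bounds for the case where Term subsumes Translation. Precision is 1 here. Because the two terms come from different ontologies, the denominator is the union of both extensions. The union is largest when the extensions are disjoint, which gives the low bound, and smallest when they overlap as much as possible, which gives the high bound. The function name and the example sizes are hypothetical.

```python
def recall_interval_term_subsumes(trans_low: float, trans_high: float,
                                  term_size: float) -> tuple:
    """Recall bounds when Term subsumes Translation across ontologies.

    Ext(Term) is replaced by Ext(Term) ∪ Ext(Translation), whose size lies
    between max(|Ext(Translation)|, |Ext(Term)|) and their sum.
    """
    low = trans_low / (trans_low + term_size)        # disjoint (worst) case
    high = trans_high / max(trans_high, term_size)   # maximal-overlap (best) case
    return low, high


# Hypothetical sizes: Ext(Translation) in [20, 50], |Ext(Term)| = 100
print(recall_interval_term_subsumes(20, 50, 100))
```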

30
Semantic Adaptation of Precision and Recall
  • Translation subsumes Term
  • Analogous to (the dual of?) the previous case
  • Recall = 1
  • Precision = $|\text{Ext}(Term)| / |\text{Ext}(Translation)|$
  • Cases of no Information Loss
  • Translation of a term by the intersection of its
    immediate parents, which is also its definition
  • Translation of a term by the union of its
    immediate children, if there exists a covering
    relationship between the two
  • Need for extensional inter-ontological
    relationships
  • e.g., 20% of publications are 50% of books
  • characterizing the degree of overlap

31
Computation of Precision and Recall in the
absence of Semantic Relationships
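  • Precision
  • Precision.low = 0
  • $Precision.\text{high} = \max\left(\dfrac{\min(|\text{Ext}(Term)|, \text{Ext}(Translation).\text{high})}{\text{Ext}(Translation).\text{high}}, \dfrac{\min(|\text{Ext}(Term)|, \text{Ext}(Translation).\text{low})}{\text{Ext}(Translation).\text{low}}\right)$
  • Recall
  • Recall.low = 0
  • $Recall.\text{high} = \dfrac{\min(|\text{Ext}(Term)|, \text{Ext}(Translation).\text{high})}{|\text{Ext}(Term)|}$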
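Without semantic relationships, the two extensions may be disjoint, so both lower bounds are 0. The upper bounds come from the largest possible intersection, min(|Ext(Term)|, size of Ext(Translation)), evaluated at the interval ends as in the formulas above. The function names and example sizes are hypothetical. The sketch assumes strictly positive sizes so that no division by zero occurs.

```python
def precision_interval(term_size: int, trans_low: int, trans_high: int) -> tuple:
    """Precision bounds with no known semantic relationship (trans_low > 0)."""
    high = max(min(term_size, trans_high) / trans_high,
               min(term_size, trans_low) / trans_low)
    return 0.0, high


def recall_interval(term_size: int, trans_high: int) -> tuple:
    """Recall bounds with no known semantic relationship (term_size > 0)."""
    return 0.0, min(term_size, trans_high) / term_size


# Hypothetical sizes: |Ext(Term)| = 100, Ext(Translation) in [150, 400]
print(precision_interval(100, 150, 400))  # upper bound is 100/150
print(recall_interval(100, 400))          # (0.0, 1.0)
```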

32
Choosing an optimal translation: Local vs. Global
Decision Making
(Diagram: translation paths through Publication, Document, Journal and
Book, labeled LOSS(Document, Book), LOSS(Publication, Journal),
LOSS(Journal, Book) and LOSS(Document, Publication))
  • Local Decision Making
  • LOSS(Publication, Journal) > LOSS(Document,
    Publication)
  • Document is chosen as the translation
  • But LOSS(Book, Document) > LOSS(Book, Journal)!!
  • Global Decision Making
  • Both translations, Document and Journal, are passed
    on to the next level
  • Journal is chosen as the appropriate translation
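A minimal sketch of the contrast between local and global decision making. As we read the diagram, the term Publication has two candidate translations, Document and Journal, and each of them must then be translated to Book. The loss values below are hypothetical, since the slide gives none. Combining losses along a path by simple addition is also an assumption made only for illustration.

```python
# Hypothetical losses (0 = no loss); the slide gives no numbers.
FIRST_HOP = {"Document": 0.1, "Journal": 0.3}    # Publication -> candidate
SECOND_HOP = {"Document": 0.6, "Journal": 0.2}   # candidate -> Book


def local_choice() -> str:
    """Greedy: keep only the cheapest translation at the first level."""
    return min(FIRST_HOP, key=FIRST_HOP.get)


def global_choice() -> str:
    """Keep every candidate and compare whole paths (losses summed, an assumption)."""
    return min(FIRST_HOP, key=lambda t: FIRST_HOP[t] + SECOND_HOP[t])


print(local_choice(), global_choice())  # Document Journal
```

Global decision making costs more work, because every candidate is carried to the next level. In exchange, it avoids locking in a translation that looks cheap locally but forces an expensive translation later.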

33
Loss of Information for Correlated Answers across
Ontologies
(Diagram: overlaps between $NewAnswer_i$, $Answer_{i+1}$ and the Ideal Answer)
  • $NewAnswer_i$: correlated answer from the previous
    ontologies $(O_1, \ldots, O_i)$
  • $Answer_{i+1}$: answer obtained from the new target
    ontology $O_{i+1}$
  • The following case arises:
  • $NewAnswer_{i+1} = NewAnswer_i \cap Answer_{i+1}$
  • If $Loss(NewAnswer_{i+1})$ > the maximum loss defined by the user,
  • $NewAnswer_i$ and $Answer_{i+1}$ are displayed separately
    to the user with an appropriate warning

34
Conclusions
  • Analysis of the Semantic Web Technology Space
  • Proposed a layered approach for analysis
  • Identified components of the Semantic Web Fabric
  • Re-use of pre-existing real world ontologies
    (off the shelf)
  • Mapping the ontologies to structured and text
    databases
  • Mechanisms for translation of queries across
    different ontologies
  • Approach for adaptation of information loss based
    on semantic relationships
  • Loss of information measures to determine the
    semantic appropriateness of a particular ontology
    and translation
  • The future Semantic Web will be based on browsing
    domain specific ontologies and vocabularies
  • Need to provide critical underlying
    infrastructure based on the above

35
Future Work
  • Extensions to current work
  • Information Extraction from Textual Data
  • Evolve a common framework to relate subsumption
    with loss of information
  • Explore relationships with standards such as SQL,
    XML/RDF based QLs, DAML+OIL
  • Complex probabilistic modeling for ranking
    translations
  • Experimentation and Validation of measures for
    Loss of Information
  • Bootstrapping, Creation, Validation of Semantic
    Knowledge
  • Ongoing work in collaboration with Stanford
    University and University of Georgia (NSF ITR
    Proposal)
  • Use of statistical clustering to determine
    central terms
  • Use of consensus analysis across SMEs to enrich
    terminology and create ontology
  • Use of scalable knowledge composition to re-use
    existing knowledge and support ontology
    interoperation
  • Use of IScapes to specify and validate hypotheses
    and feedback from the process to generate new
    semantic knowledge
  • Interaction of above processes
  • Ontology Maintenance and Versioning