Knowledge Access Semantic technology for KM

Slides: 111

Provided by: Opte

1
Knowledge Access: Semantic technology for KM
ACAI 05 SEKT SUMMER SCHOOL ON KNOWLEDGE
TECHNOLOGY

John Davies
BT Research
john.nj.davies_at_bt.com

2
Overview

Introduction to the Semantic Web
Language stack
Semantic Search and Browse
Knowledge Sharing
Natural Language Generation - Summarisation
Knowledge Delivery via Device Independence
Quiz!

3
Limitations of the Web today

Machine-to-human, not machine-to-machine

4
The Semantic Web

allowing information to be shared and processed
adding context and structure Tim Berners-Lee
an extension of the current web in which
information is given well-defined meaning, better
enabling computers and people to work in
cooperation
An open platform

5
Semantic Web
The Semantic Web is an extension of the current
web in which information is given well-defined
meaning, better enabling computers and people to
work in co-operation. Berners-Lee et al.,
2001
6
... Semantic Web HISTORY
10.2.2004: Resource Description Framework (RDF)
and Web Ontology Language (OWL) become W3C
recommendations
Source: http://www.zakon.org/robert/internet/timeline/
7
Semantic Web Layers
Entailment of the Implicit
Explicit Semantics
Relational Distributed Data
Data Exchange
8
Where we are Today: the Syntactic Web
Hendler & Miller 02
9
i.e. the Syntactic Web is

A place where
computers do the presentation (easy) and
people do the linking and interpreting (hard).
Why not get computers to do more of the hard
work?

Goble 03
10
Hard Work using the Syntactic Web

Complex queries involving background knowledge
Find information about animals that use sonar
but are not either bats, dolphins or whales
Locating information in data repositories
Travel enquiries
Prices of goods and services
Results of human genome experiments
Delegating complex tasks to web agents
Book me a holiday next weekend somewhere warm,
not too far away, and where they speak French or
English

11
Motivation: Knowledge Management

Knowledge workers are overwhelmed with
information
from intranets, emails, external newslines
but may still lack the information required
They need information identified
by semantics, not just keywords
by their interests and their task context
in a form appropriate to their current physical
context
mobile phone, PDA, BlackBerry, laptop, ...

12
Knowledge access

context-aware tools for access to
semantically-annotated knowledge
search, browse, share, summarise
integrated into day-to-day business processes
automatic knowledge delivery based on current
context
activity, location, device, interests
support multiple end-user devices

13
XML is a first step

Semantic markup
HTML → layout
use bold font
Insert an image here
XML → content
this part of the document is the product price
this document describes a telecommunications
service

14
XML

<play>
  <title>The Life and Death of King John</title>
  <DramatisPersonae>
    <persona>The Earl of PEMBROKE</persona>
    <persona>The Earl of ESSEX</persona>
  </DramatisPersonae>
  <Stagedir>SCENE England, the Court.</Stagedir>
  <act>Act 1
    <scene>Scene I.
      <speech>
        <speaker>John</speaker>
        <line>Now, Chatillon, what would France with us?</line>
      </speech>

15
QuizXML

Standard search engine
WWW pages indexed
maps keywords to WWW pages
QuizXML
A finer-grained index
maps keywords to documents and the XML tags in
which they occur
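
The finer-grained index described above can be sketched in a few lines of Python. This is only an illustration, not the QuizXML implementation: the function names, the document name and the shortened King John fragment are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_document(doc_id, xml_text, index):
    """Map each word to the (document, enclosing tag) pairs it occurs in."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Only the text directly inside a tag is indexed in this sketch.
        for word in (element.text or "").lower().split():
            index[word.strip(".,?!")].add((doc_id, element.tag))


def search(index, word, tag=None):
    """Return sorted (document, tag) hits, optionally restricted to one tag."""
    hits = index.get(word.lower(), set())
    return sorted(hit for hit in hits if tag is None or hit[1] == tag)


index = defaultdict(set)
index_document(
    "king_john.xml",  # hypothetical document name
    "<play><title>The Life and Death of King John</title>"
    "<speech><speaker>John</speaker>"
    "<line>Now, Chatillon, what would France with us?</line></speech></play>",
    index,
)

print(search(index, "john"))                 # hits in both <title> and <speaker>
print(search(index, "john", tag="speaker"))  # only the speaker hit
```

A standard keyword index would stop at the document; keeping the tag lets a query ask for "john" as a speaker rather than anywhere in the play.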

16
QuizXML demo

17
XML is a first step

Metadata (with limitations)
within documents, not across documents
prescriptive, not descriptive
No commitment on vocabulary and modelling
primitives (subclass, instance, etc)
<vehicle>
  <car>ford
    <engine>xyz123-4</engine>
    <model>mondeo</model>
  </car>
</vehicle>
RDF and ontologies are the next step

18
What are Ontologies?

Ontologies provide a shared and common
understanding of a domain (medicine, finance, ...)
a shared specification of a conceptualisation
Concept map
A simple example - Yahoo
Business & Economy > Finance > Banking
for WWW, defined using RDF(S) and OWL

19
Taxonomies
Animals
  Vertebrates
    Reptiles
    Mammals
  Invertebrates
    Insects
    Arachnids
  ..
20
Ontology of People and their Roles
Employee
Expert
Analyst
Manager
Programme Mgr
Project Mgr
21
Structure of an Ontology

Typically two distinct components
Names for important concepts and relationships in
the domain
Elephant is a concept whose members are a kind of
animal
Herbivore is a concept whose members are those
animals who eat only plants
Background knowledge/constraints on the domain
Adult_Elephants weigh at least 2,000 kg
No individual can be both a Herbivore and a
Carnivore

22
Why develop an ontology?

Define web resources more precisely and make them
amenable to machine processing
Make domain assumptions explicit
Easier to change domain assumptions
Easier to understand and update legacy data
Separate domain and operational knowledge
Re-use separately
A community reference for applications
To share a consistent understanding of what
information means

23
Ontologies - Some Examples

General purpose ontologies
The Upper Cyc Ontology, http://www.cyc.com/cyc-2-1/index.html
IEEE Standard Upper Ontology, http://suo.ieee.org/
Domain and application-specific ontologies
RDF Site Summary (RSS), http://groups.yahoo.com/group/rss-dev/files/schema.rdf
Dublin Core, http://dublincore.org/
UMLS, http://www.nlm.nih.gov/research/umls/
Open Biological Ontologies, http://obo.sourceforge.net/
FOAF, www.foaf.org
Ontologies in a wider sense
Agrovoc, http://www.fao.org/agrovoc/
UNSPSC, http://eccma.org/unspsc/
DAML.org library, http://www.daml.org/

24
Ontology and Logic

Reasoning over ontologies
Inferencing capabilities
X is author of Y → Y is written by X
X co-wrote D and Y co-wrote D →
X and Y collaborate
Cars are a kind of vehicle and
Vehicles have 2 or more wheels →
Cars have 2 or more wheels
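
The three inference patterns on this slide can be sketched as a small forward-chaining loop over triples. The property names (authorOf, coWrote, kindOf, minWheels) and the sample facts are illustrative assumptions; a real reasoner would work from an ontology language rather than hard-coded rules.

```python
def infer(facts):
    """Apply the three example rules repeatedly until nothing new is derived."""
    facts = set(facts)
    while True:
        new = set()
        for s, p, o in facts:
            if p == "authorOf":
                # X is author of Y -> Y is written by X
                new.add((o, "writtenBy", s))
            elif p == "coWrote":
                # X co-wrote D and Y co-wrote D -> X and Y collaborate
                for s2, p2, o2 in facts:
                    if p2 == "coWrote" and o2 == o and s2 != s:
                        new.add((s, "collaboratesWith", s2))
            elif p == "kindOf":
                # Properties stated for the more general class are inherited.
                for s2, p2, o2 in facts:
                    if s2 == o and p2 != "kindOf":
                        new.add((s, p2, o2))
        if new <= facts:
            return facts
        facts |= new


facts = {
    ("twain", "authorOf", "tom_sawyer"),
    ("alice", "coWrote", "paper"),
    ("bob", "coWrote", "paper"),
    ("car", "kindOf", "vehicle"),
    ("vehicle", "minWheels", 2),
}
inferred = infer(facts)
print(("car", "minWheels", 2) in inferred)
```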

25
RDF and RDF-S

W3C standards
RDF-S defines the ontology
classes and their properties and relationships
There are books and authors. Authors write books.
RDF defines the instances of these classes and
their properties
Mark Twain is an author
Mark Twain wrote Adventures of Tom Sawyer
Adventures of Tom Sawyer is a book
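
As a rough sketch of the split between schema and data, the example below stores both as plain Python triples and derives the class memberships from the schema's domain and range statements, in the spirit of RDF Schema's domain and range rules. The property name "wrote" and the class names are taken loosely from the slide; nothing here uses a real RDF library.

```python
# Schema (RDF-S level): authors write books.
schema = {
    ("wrote", "domain", "Author"),
    ("wrote", "range", "Book"),
}

# Data (RDF level): one statement about instances.
data = {("Mark Twain", "wrote", "Adventures of Tom Sawyer")}


def infer_types(schema, data):
    """Derive (instance, 'type', class) triples from domain and range."""
    domains = {p: c for p, kind, c in schema if kind == "domain"}
    ranges = {p: c for p, kind, c in schema if kind == "range"}
    types = set()
    for s, p, o in data:
        if p in domains:
            types.add((s, "type", domains[p]))
        if p in ranges:
            types.add((o, "type", ranges[p]))
    return types


types = infer_types(schema, data)
print(sorted(types))
```

With only the "wrote" statement in the data, the schema is enough to conclude that Mark Twain is an author and that Adventures of Tom Sawyer is a book.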

26
An example RDF Schema
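Annotation of WWW resources and semantic links
[Figure: RDF graph in two layers. Schema (RDFS): Writer, Book, FamousWriter (subClassOf Writer), property hasWritten with domain and range. Data (RDF): /twain.com/mark and books.com/ISBN00010475, linked by hasWritten, with type links into the schema and DoB 25/12/68.]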
27
RDF
hasName (http://www.famouswriters.org/twain/mark, Mark Twain)
hasWritten (http://www.famouswriters.org/twain/mark, http://www.books.org/ISBN00001047582)
title (http://www.books.org/ISBN00001047582, The Adventures of Tom Sawyer)
XML version
<rdf:Description rdf:about="http://www.famouswriters.org/twain/mark">
  <s:hasName>Mark Twain</s:hasName>
  <s:hasWritten rdf:resource="http://www.books.org/ISBN0001047"/>
</rdf:Description>
28
QuizRDF

Searching RDF-annotated web resources

29
RDF metadata annotations
Data (WWW document)
Annotation (metadata)
Lost information

Subjective
One of several interpretations
Not exhaustive

RDF
30
RDF as an Enrichment
Text
Annotation
RDF
Text
31
Precision and recall - the IR dilemma

Trade-off between precision and recall
recall - how many of relevant were found
precision - how many of found were relevant
Holy grail: high precision and high recall
QuizRDF offers both
separately
closely-coupled
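
The two measures can be made concrete with a short calculation. The URIs below are made-up sample data, and the function name is an assumption for this sketch.

```python
def precision_recall(found, relevant):
    """Precision: share of found items that are relevant.
    Recall: share of relevant items that were found."""
    found, relevant = set(found), set(relevant)
    hits = found & relevant
    precision = len(hits) / len(found) if found else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


found = {"URI1", "URI3", "URI7"}             # what the engine returned
relevant = {"URI1", "URI2", "URI3", "URI9"}  # what it should have returned
precision, recall = precision_recall(found, relevant)
print(precision, recall)
```

Here two of the three returned documents are relevant (precision 2/3) and two of the four relevant documents were returned (recall 1/2). Returning more documents tends to raise recall and lower precision, which is the trade-off named on the slide.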

32
Indexing data model
33
Multidimensional Indexing

Traditional search engine indexing
term → documents
employee → URI1, URI3, URI9
miller → URI3, URI7
QuizRDF indexing
<literal, class, property> → URIs
<george, Employee, first_name> → URI2
<miller, Employee, last_name> → URI1, URI3
<miller, Employee, ?> → URI1, URI3, URI7
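
A minimal sketch of such a multidimensional index, assuming an in-memory dictionary keyed by (literal, class, property). The helper names are hypothetical, and the entry that places "miller" in the first_name of URI7 is an assumption added so that the wildcard lookup reproduces the slide's last line.

```python
from collections import defaultdict

index = defaultdict(set)


def add(literal, cls, prop, uri):
    """Index a literal under the class and property it was found in."""
    index[(literal, cls, prop)].add(uri)


def lookup(literal, cls=None, prop=None):
    """Return matching URIs; None plays the role of the '?' wildcard."""
    result = set()
    for (lit, c, p), uris in index.items():
        if lit == literal and cls in (None, c) and prop in (None, p):
            result |= uris
    return sorted(result)


add("george", "Employee", "first_name", "URI2")
add("miller", "Employee", "last_name", "URI1")
add("miller", "Employee", "last_name", "URI3")
add("miller", "Employee", "first_name", "URI7")  # assumed for illustration

print(lookup("miller", "Employee", "last_name"))  # ['URI1', 'URI3']
print(lookup("miller", "Employee"))               # ['URI1', 'URI3', 'URI7']
```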

34
QuizRDF demo
35
Two Retrieval Channels
Browser interface
Keyword query
RQL

Precise
Machine readable
Subjective
Incomplete
Higher precision

Original content
Complete
Imprecise
Higher recall

36
Contribution

Combination of
User familiar keyword search
More precise RDF querying
Data and metadata as complementary
Low threshold, high ceiling
Works on non-RDF information
Exploits RDF where it exists
Integrates browsing and querying
Fits users' information-seeking behavior

37
Conclusions about RDF(S)

Next step up from plain XML
(small) ontological commitment to modeling
primitives
possible to define domain vocabulary
limited reasoning
subsumption, but no transitivity, symmetry, ...
limited expressive power
no cardinality constraints, equality,
disjointness, ...

38
Web Ontology Language Requirements

Desirable features identified for Web Ontology
Language
Extends existing Web standards
Such as XML, RDF, RDFS
Easy to understand and use
Should be based on familiar KR idioms
Formally specified
Of adequate expressive power
Possible to provide automated reasoning support

39
OWL Language

OWL is based on Description Logics knowledge
representation formalism
OWL (DL) benefits from many years of DL research
Well defined semantics
Formal properties well understood (complexity,
decidability)
Known reasoning algorithms
Implemented systems (highly optimised)
Three species of OWL
OWL Full: maximum expressivity, undecidable
OWL DL: based on the SHIQ family of DLs (formally SHOIN(D)), decidable
OWL Lite: subset of OWL DL, most efficient
reasoning

40
Why OWL?

OWL: Web Ontology Language
Owl's superior intelligence is known throughout
the Hundred Acre Wood, as are his talents for
Writing, Spelling, other Educated and Special
tasks.
"My spelling is Wobbly. It's good spelling, but
it Wobbles, and the letters get in the wrong
places."

41
QuizOWL!
42
Re-cap

XML, RDF, OWL language stack
Increasingly sophisticated search
QuizXML
subdocument searching
QuizRDF
browsing by concept and across relations
searching on metadata and full-text
Next steps in semantic search
identification of named entities within documents
Exploitation of world knowledge
KIM (Ontotext)

43
The KIM Platform

A platform offering services and infrastructure
for
(semi-) automatic semantic annotation
ontology population
semantic indexing and retrieval of content
query and navigation
Based on an Information Extraction technology
Aim to underpin Semantic Web applications
by providing a metadata generation technology
in a standard, consistent, and scalable framework

44
Ontologies
http://proton.semanticweb.org/

PROTON - a light-weight upper-level ontology
250 NE classes
100 relations and attributes
covers mostly NE classes, and to a smaller degree
general concepts

45
Ontologies II
46
KIM World KB

Aims to cover the most popular entities in the
world
Entities of general importance like the ones
that appear in the news
KIM knows about
Organizations of all important sorts: business,
international, political, government, sport,
academic
Specific people (e.g. politicians)
Locations: countries, regions, cities, roads,
etc.

47
KIM World KB Content

Collected from various sources, like geographical
and business intelligence gazetteers.
KIM also learns from documents indexed
via GATE information extraction
KB scale
RDF Statements       Small KB     Full KB
- explicit            444,086   2,248,576
- after inference   1,014,409   5,200,017

48
KIM Scaling on Data

The Semantic Repository is based on Sesame/OWLIM.
Our practical tests demonstrate a perfect
performance on top of
1.2M entity descriptions
about 15M explicit statements
above 30M statements after forward chaining.
Fulltext indexing with Lucene
.5M docs, retrieval in milliseconds

49
Semantic Annotation
50
Simple Usage Highlight, Hyperlink, and
51
Simple Usage Explore and Navigate
52
People search for People

A recent large-scale human interaction study on a
personal content IR system, carried out by
Microsoft demonstrated that
"The most common query types in our logs were
People/places/things, Computers/internet and
Health/science. In the People/places/things
category, names were especially prevalent. Their
importance is highlighted by the fact that 25% of
the queries involved people's names ... . In
contrast, general informational queries are less
prevalent."

53
Semantic Queries

The standard IR query is
give me documents that contain the words
company, Europe, telecommunication
KIM provides indexing and retrieval with respect to named entities (NEs)
More precise specification and satisfaction of
information needs
specify the NEs we are interested in, and to
restrict them by their attributes and relations
Give me documents that mention a company in
Europe from the telecommunications industry
sector

54
Precision in Semantic Search

KIM can match
a query: "Documents concerning a telecom company
in Europe, John Smith, and a date in the first
half of 2002."
with a document containing: "At its meeting on
the 10th of May, the board of Vodafone appointed
John G. Smith as CTO"
Classical IR cannot do the required reasoning:
Vodafone is a mobile operator, which is a kind of
telecom company
Vodafone is in the UK, which is a part of Europe.
10th of May is a "date in the first half of 2002"
John G. Smith matches John Smith.
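
The reasoning steps listed above can be sketched with a toy knowledge base. This is not how KIM is implemented: the dictionaries, class names and helper functions are assumptions, and the year 2002 is assumed to come from the document's context because the quoted sentence only says "10th of May".

```python
from datetime import date

# Toy knowledge base (all names are illustrative).
SUBCLASS_OF = {"MobileOperator": "TelecomCompany", "TelecomCompany": "Company"}
PART_OF = {"UK": "Europe"}
ENTITIES = {"Vodafone": {"class": "MobileOperator", "location": "UK"}}


def ancestors(start, parent_of):
    """Follow parent links upwards, returning start and everything above it."""
    chain = [start]
    while chain[-1] in parent_of:
        chain.append(parent_of[chain[-1]])
    return chain


def entity_matches(name, wanted_class, wanted_region):
    info = ENTITIES[name]
    return (wanted_class in ancestors(info["class"], SUBCLASS_OF)
            and wanted_region in ancestors(info["location"], PART_OF))


def name_matches(query_name, doc_name):
    """True if every query token occurs in the document's form of the name."""
    doc_tokens = set(doc_name.lower().replace(".", "").split())
    return set(query_name.lower().split()) <= doc_tokens


def in_first_half_of_2002(day):
    return date(2002, 1, 1) <= day <= date(2002, 6, 30)


print(entity_matches("Vodafone", "TelecomCompany", "Europe"))
print(name_matches("John Smith", "John G. Smith"))
print(in_first_half_of_2002(date(2002, 5, 10)))  # year assumed from context
```

A keyword index has none of these links, which is why the query and the document share almost no terms yet still match once the entities are resolved.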

55
Entity Pattern Search
56
Pattern Search Entity Results
57
Entity Pattern Search KIM Explorer
58
Predefined Pattern Search
59
Pattern Search Multiple-Entity Results
60
Pattern Search, Referring Documents
61
Document Details
62
KIM - summary

KIM is a platform for
semantic annotation,
ontology population,
semantic indexing and retrieval,
providing an API for remote access and
integration,
based on Information Extraction (IE) using mature
HLT (GATE).
powered by massive world knowledge
http://www.ontotext.com/kim

63
SEKTAgent

Periodic agent search for named entities
e.g. a person in an organisation
Returns relevant documents and metadata
Proactive knowledge delivery
Linked to device independence module (see later)
Based upon KIM architecture
Result-led indexing
Adds relevant pages to next crawl list

64
SEKTAgent demo
65
TAP

Uses Google for traditional search
Augments results with relevant data aggregated
from distributed (and semantically annotated)
data
Offers distributed query interface

66
TAP
tap.stanford.edu for more information
67
Swoogle

Searching for semantic web documents and
ontologies
See swoogle.umbc.edu

68
Google vs. Swoogle

How to find a popular ontology that defines the
concept of person?
Ask Google?
Type: Person filetype:rdf
Type: Person filetype:owl
More complicated query: person rdfs:Class
filetype:rdf
Ask Swoogle?
Type "person" in document search
1. http://xmlns.com/foaf/0.1/index.rdf

69
Find Time Ontology
We can use a set of keywords to search ontology.
For example, time, before, after are basic
concepts for a Time ontology.
70
Beyond search, beyond documents

a long list of documents is rarely the ultimate
information need of the end user
there's too much relevant information!
support for the next step - the analysis of the
returned information
e.g. key points on a topic from a large document
you don't want to read
e.g. creation of a digest of information from
multiple documents about Bush's statements on a
given topic

71
Search Engine trends
markets

Seamless and integrated
one search engine for Web and desktop
implicit queries based on user activity
Personalisation
based on user interaction
Beyond document lists
sub-document analysis
Taxonomies and classification
taxonomy / enterprise search growing at 10% p.a.
Ontologies and semantic annotation
A coherent approach to all these issues

72
Knowledge Sharing

Sharing knowledge through an organisation
learning from success and failures of others
avoiding duplication of effort
(Virtual) communities of practice
Groups with shared interests who will benefit
from collaboration and sharing knowledge
(Using WWW technology to increase collaborative
radius)

73
Communities & the Semantic Web

Communities require a shared conceptual
vocabulary
Consensual, evolving concept map
Ontologies!
OntoShare
automates sharing of knowledge in an
organisation via community-based RDF(S) ontologies

74
OntoShare

Sharing and Classifying resources according to an
Ontology
Informs users when relevant document added to
store
Ontology-based personalisation
Provides knowledge store for browsing and
searching

75
(No Transcript)
76
OntoShare: Sharing knowledge

User shares knowledge
WWW document
Any textual data
Can supply annotation

77
OntoShare: Sharing knowledge

System automatically extracts keywords and summary
System assigns knowledge to concepts

78
OntoShare: Sharing knowledge

System emails an alert to selected users based on
match to user profile

79
OntoShare: Evolving Ontologies

OntoShare automatically suggests changes to
concept characterisation
Concept characterisations evolve over time

80
OntoShare: Evolving Ontologies

User can suggest new concepts for ontology at any
time
System emails community on suggestion (à la
Usenet) and counts votes

81
Finding People & Collaboration

Use of personal profiles
Who else is interested in this document?
Who else is interested in this topic?
Encouraging exchange of tacit knowledge
Discussion threads around shared knowledge
Adding value to the knowledge stored

82
SWAP: Semantic Web and Peer-to-Peer

Distributed Knowledge Management
Different participants with different
conceptualizations of their domain
Different knowledge sources
Physically distributed, dynamic environment
Peer-To-Peer Approach
Decentralized nature: local control
Symmetry: everyone is provider and consumer
P2P networks as a reflection of social networks
Flexible collaboration beyond hierarchical
structures

83
Case Study: The Bibster System

Scenario: Sharing of bibliographic metadata in a
Peer-to-Peer network
Bibliographic metadata is created and maintained
in a decentralized manner
Researchers are willing to share their data
Use of semantics is crucial in this setting
The Bibster system allows users to
Easily share bibliographic data
Save work in finding this data
Avoid re-typing this data by hand

84
Semantic Methods in Bibster

Semantic representation and querying of metadata
Extraction and classification from e.g. BibTeX
files
Semantic Web Research Community Ontology and ACM
Topic hierarchy as light-weight ontologies
Peer selection using semantic topologies
Scalability requires intelligent query routing
Semantic descriptions of peers' expertise as
basis for peer selection
Semantic duplicate detection
Highly redundant and inconsistent representation
of bibliographic metadata
Semantic similarity measures to detect duplicates

85
Bibster Screenshot
Open Source: http://bibster.sourceforge.net/
86
NLG - Summarisation

NLG takes as input structured data in a knowledge
base or ontology and produces natural language
text
Applied to provide automatic documentation of
ontologies or generate textual reports from
formal knowledge
Keeps texts constantly up-to-date so they reflect
changes in the ontology
OntoSum, University of Sheffield

87
The Property Hierarchy

Special linguistically-motivated properties
introduced to make the NLG modules more generic
active-action (e.g. works-for)
passive-action (e.g., published-by)
Attribute (e.g. has-age, has-web-address)
part-whole (e.g., consists-of)
All properties from the ontology were made
sub-properties of one of these 4
Attribute properties recognised using heuristics,
such as property name starts with "has"
(hasWebPage)

88
Summary Structuring

Capture regular patterns; can be applied
recursively
Describe-Instance -> Describe-Attributes, Describe-Part-Whole, Describe-Active-Actions, Describe-Passive-Actions
Describe-Attributes ->
attribute(Instance, Attribute),
Describe-Attributes
Collect all subproperties of Attribute property
relating to Instance
Attribute(John, hasMobileNumber)..

89
Ontology-Based Aggregation

Joining attribute and part-whole properties with
the same first argument to have more coherent
sentences
ATTR(Researcher XXX, Appellation Dr)
ATTR(Researcher XXX, string my_email_at_sheff)
ATTR(Researcher XXX, string 012344567)
ATTR(Researcher XXX, string www.mypage.ac.uk)
Without aggregation: Kalina Bontcheva has a Dr
appellation. Kalina Bontcheva has email
my_email_at_shef.com. Kalina Bon
With aggregation: Kalina Bontcheva has a Dr
appellation, email my_email_at_shef.com and
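
Aggregation of this kind can be sketched as grouping facts by their first argument and joining the values into one sentence. This is only an illustration and not OntoSum's code: the attribute labels "email", "phone" and "web page" are assumptions (the slide gives only "string" for these values), and the sentence pattern is deliberately simplistic.

```python
from collections import defaultdict


def aggregate(facts):
    """Group (subject, attribute, value) facts by subject, one sentence each."""
    grouped = defaultdict(list)
    for subject, attribute, value in facts:
        grouped[subject].append(f"{attribute} {value}")
    sentences = []
    for subject, parts in grouped.items():
        if len(parts) > 1:
            body = ", ".join(parts[:-1]) + " and " + parts[-1]
        else:
            body = parts[0]
        sentences.append(f"{subject} has {body}.")
    return sentences


facts = [
    ("Researcher XXX", "email", "my_email_at_sheff"),
    ("Researcher XXX", "phone", "012344567"),
    ("Researcher XXX", "web page", "www.mypage.ac.uk"),
]
sentences = aggregate(facts)
print(sentences[0])
```

Without the grouping step each fact would become its own sentence repeating the subject, which is the less coherent output shown on the slide.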

90
Lexicalisation of Classes Properties

3 options
Specified by ontology engineer
Same as concept/property name
Added manually when parameterising OntoSum

91
Description of HSBC
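[Figure: ontology fragment with classes Organisation, Financial Institution, Bank and Person, lendsTo relations, and the instance HSBC with properties employees and market-cap (values 137000 and 43bn)]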
92
Description of HSBC
93
Innovative aspects

Can tailor summary to device profile
Apply length restriction
e.g. for text message for mobile phone
Generate HTML for web browser or plain text for
email
See device independence (next!)
Readability heuristics
introduce lists when verbalising more than 3
attributes
Use of ontology mapping rules to run same system
on multiple ontologies

94
Related work

Wilcock (Helsinki)
Fully automatic, no lexicon
Talking OWLs, ISWC-03
MIAKT
Some manual input
More effort, more fluency
OntoSum based on MIAKT
Bontcheva, NLDB04

95
OntoSum demonstration
96
Device Independence

context-aware tools for access to
semantically-annotated knowledge
search, browse, share, summarise
integrated into day-to-day business processes
automatic knowledge delivery based on current
context
activity, location, device, interests
support multiple end-user devices

97
Device independence

3 approaches
Hand-craft different sites for different devices
Labour intensive, difficult to maintain
Extend HTML to describe interaction, navigation
and selection
Server software generates output in suitable
format using CC/PP
Inflexible: difficult to control output
precisely
No support for large volume sites
Unclear what extensions are necessary and
sufficient
SEKT approach
Use templates to format data content appropriate
for each class of device
Fine control of output based on CC/PP profiles
can handle large volumes of structured data -
XML databases
device-dependencies coded in the templates, e.g.
mouse capability

98
Device Profiles in RDF

CC/PP - W3C RDF standard for describing device
characteristics
CC/PP vocabularies define device components and
component attributes
UAProf is an application of CC/PP adopted by many
terminal device manufacturers
An ontology of devices: inheritance and
specialisation
Profile references and Profile Diffs are sent
with an information request
javax.ccpp package for processing profiles

99
User Profiles

Effective presentation must take user preferences
and accessibility issues into account
Font size
Colour preference
Hi res/Lo res
Device characteristics and preference/
accessibility requirements need to be combined
Effective screen size depends on both physical
size and user preferences (e.g. font size)
Specialisation/extension of UAProf

100
Profile Engine

The Profile engine combines device and user
profiles to generate a set of conditions
The engine can be queried by other applications
PROLOG is being used as a prototyping language
Arithmetic calculations of effective screen size
(for example) require more than RDF/OWL
DL (DIG) interface to SWI-Prolog

101
Content Adaptation

The content adaptation engine uses conditions
generated by profile engine queries
Example conditions
Screen size x font size →
number of characters of text
GraphicsSupported?
Colour or BW
Device characteristic or
Accessibility issue
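
A rough sketch of how device and user profiles might be combined into conditions. The slides state that the prototype uses PROLOG; Python is used here purely for illustration, and every field name (screen_width_px, font_width_px and so on) is an assumption rather than a CC/PP or UAProf attribute.

```python
def conditions(device, user):
    """Combine a device profile and a user profile into adaptation conditions."""
    # Effective screen size depends on physical size and preferred font size.
    chars_per_line = device["screen_width_px"] // user["font_width_px"]
    lines = device["screen_height_px"] // user["font_height_px"]
    return {
        "max_text_chars": chars_per_line * lines,
        "graphics": device["graphics_supported"] and user.get("graphics_wanted", True),
        "colour": device["colour"],
    }


phone = {"screen_width_px": 128, "screen_height_px": 160,
         "graphics_supported": True, "colour": False}
large_print_user = {"font_width_px": 8, "font_height_px": 16,
                    "graphics_wanted": False}

result = conditions(phone, large_print_user)
print(result)
```

With these sample numbers the screen holds 16 characters per line over 10 lines, so at most 160 characters of text. This is the sort of arithmetic the Profile Engine slide says requires more than RDF/OWL.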

102
Content Generation

Different content must be generated for different
devices
The current context (set of conditions) will be
made available to SEKT applications
Natural Language Processing techniques are
used to generate or modify information
Mobile phone: 400-character text message
PC: multimedia document
NLG describing ontology-based knowledge in
natural language (OntoSum!)

103
Device Independence

A functional presentation of a resource should be
available via any suitable device
Requirements include content selection, layout
transformation and style selection
At present, no one language can be interpreted by
all clients
It follows that content must be formatted for the
target device on the server

104
Templates

Declarative templates are used to format the
(XML-based) data
Context (conditions) can be used to select
templates, and sections within templates
Template 1 WML
InputEnabled?
Template 2 HTML
GraphicsWanted?
Separation of data storage, processing and
display
W3C working group on device independence
No standard for templates (yet)
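
Condition-driven template selection can be sketched as below. The slides note that there is no standard for templates, so this is a hypothetical format: the condition names (markup, input_enabled, graphics_wanted) and the markup fragments are assumptions, not the SEKT templates.

```python
def render(content, conditions):
    """Pick a template from the conditions, then switch sections on or off."""
    if conditions["markup"] == "WML":
        # Template 1: WML, with an optional input section.
        extra = '<input name="q"/>' if conditions.get("input_enabled") else ""
        return f"<wml><card><p>{content['text']}{extra}</p></card></wml>"
    # Template 2: HTML, with an optional graphics section.
    extra = f'<img src="{content["image"]}"/>' if conditions.get("graphics_wanted") else ""
    return f"<html><body><p>{content['text']}</p>{extra}</body></html>"


content = {"text": "Hello", "image": "logo.png"}  # the same data for every device
print(render(content, {"markup": "WML", "input_enabled": True}))
print(render(content, {"markup": "HTML", "graphics_wanted": True}))
```

The data stays the same for both calls. Only the conditions decide which template and which sections are used, which reflects the separation of storage, processing and display listed on the slide.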

105
Overview
UAProf (RDF(S))
Device Properties
Context
User preferences
Repurposed Information
Raw Information
Profiling engine
Content Adaptation
(syntactic & semantic)
106
Device Independence demo
107
Device Independence Summary

Device and User profiles need to be combined
using a suitable ontology
A profile reasoning engine is used to generate
conditions on the format
Content can be generated according to the context
(set of conditions)
NLP techniques can be used to generate/summarise
text (semantic)
Templates are used to transform the results to a
format suitable for the device at hand (syntactic)

108
Conclusion