Title: Slide 1
1OKKAM: AN ENTITY NAME SYSTEM (ENS) FOR THE SEMANTIC WEB
FP7-215032, Large-scale Integrating Project
Paolo Bouquet, University of Trento, Italy, Project Coordinator
Semantic Interoperability: A Practical Approach, Berlin, 25 March 2008
2Factsheet
3The OKKAM consortium
- University of Trento, IT
- L3S Hannover, DE
- SAP Research, DE
- Elsevier, NL
- Expert System, IT
- Europe Unlimited, BE
- MAC, IE
- EPFL, CH
- DERI Galway, IE
- University of Malaga, ES
- INMARK, ES
- ANSA, IT
5 universities
4 SMEs
3 corporations
4Background: a vision for the Semantic Web
- Knowledge representation is a field which currently seems to have the reputation of being initially interesting, but which did not seem to shake the world to the extent that some of its proponents hoped.
- It made sense but was of limited use on a small scale, but never made it to the large scale. This is exactly the state which the hypertext field was in before the Web.
- The Semantic Web is what we will get if we perform the same globalization process to Knowledge Representation that the Web initially did to Hypertext. We remove the centralized concepts of absolute truth, total knowledge, and total provability, and see what we can do with limited knowledge.
- Tim Berners-Lee, What the Semantic Web can represent, 1998
5In practice
[Figure: two layers. The Semantic Web layer shows the entities Sonia, Paolo, UniTN, UniMORE and Wisdom connected by the relations Knows, Is_involved_in, Works-for, Coordinates and Web_page. The Hypertext Web layer shows the pages www.unitn.it, www.trento.it, www.ryanair.com, www.l3s.org, www.paolobouquet.net, www.google.com and ockham.org connected by href links.]
6The Semantic Web as a global K space
- The Semantic Web way to information integration
- People will create and publish collections of RDF statements about any resources (documents, people, locations, events, topics, ...).
- If information is contained in e.g. databases, it can be exported to RDF with some additional work.
- Any collection of RDF triples defines a graph, whose nodes are resources and whose arcs are relations between resources.
- The meaning of relations can be defined in vocabularies/ontologies.
- Different RDF graphs can be merged by collapsing nodes and relations with identical identifiers (URIs).
- The expected outcome is a smooth information integration across distributed information sources (a global distributed space of knowledge).
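The merge step described on this slide can be sketched in plain Python, assuming each graph is modelled as a set of (subject, predicate, object) triples. The `example.org` URIs and property names are made up for illustration, and a real system would more likely use an RDF library than bare sets.

```python
# Minimal sketch: an RDF graph as a set of (subject, predicate, object) triples.
# All URIs below are hypothetical examples, not identifiers from a real dataset.

def merge_graphs(*graphs):
    """Merge graphs by set union; nodes with the same URI collapse into one."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged

def nodes(graph):
    """Return every resource that appears as a subject or an object."""
    return {s for s, _, _ in graph} | {o for _, _, o in graph}

graph_a = {
    ("http://example.org/id/paolo",
     "http://example.org/vocab/worksFor",
     "http://example.org/id/unitn"),
}
graph_b = {
    ("http://example.org/id/paolo",
     "http://example.org/vocab/knows",
     "http://example.org/id/sonia"),
}

merged = merge_graphs(graph_a, graph_b)
shared = nodes(graph_a) & nodes(graph_b)  # nodes on which the two graphs join
```

If the two sources had used different URIs for the same person, `shared` would be empty and the merged graph would stay disconnected.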
7A Web-like scenario
London
8London!?
- Did you mean
- London, KY
- London, Laurel, KY
- London, OH
- London, Madison, OH
- London, AR
- London, Pope, AR
- London, TX
- London, Kimble, TX
- London, MO
- London, MO
- London, London, MI
- London, London, Monroe, MI
- London, Uninc Conecuh County, AL
- London, Uninc Conecuh County, Conecuh, AL
- London, Uninc Shelby County, IN
- London, Uninc Shelby County, Shelby, IN
- London, Deerfield, WI
- London, Deerfield, Dane, WI
- Or
- London, Jack, 2612 Almes Dr, Montgomery, AL, (334) 272-7005
- London, Jack R, 2511 Winchester Rd, Montgomery, AL 36106-3327, (334) 272-7005
- London, Jack, 1222 Whitetail Trl, Van Buren, AR 72956-7368, (479) 474-4136
- London, Jack, 7400 Vista Del Mar Ave, La Jolla, CA 92037-4954, (858) 456-1850
- ...
9Entity-centric IM
10An RDF scenario
[Figure: two RDF graphs published by different sources, hotel.it and I-ESA2008.org. Nodes: Marriot Hotel, Berlino, 2000, Germania, I-ESA, bouquet, Berlin, Germany, country. Arcs: built_in, located_in, placed_in, held_in, attends, part_of, rdftype.]
11An example
[Figure: a single merged RDF graph. Nodes: Mariott Hotel, I-ESA, Berlin, 2000, Germany, bouquet, country. Arcs: built_in, located_in, held_in, attends, part_of, rdftype.]
Query: find me a hotel located where I-ESA is held in 2008
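A query like the one on this slide can be sketched over a merged triple set. The triples below are an assumed reading of the figure (which arc connects which nodes is not recoverable with certainty), and short names stand in for full URIs.

```python
# Sketch of the slide's query over a merged graph.
# The triples are an assumed reading of the figure, not verified data.
triples = {
    ("Mariott Hotel", "located_in", "Berlin"),
    ("Mariott Hotel", "built_in", "2000"),
    ("I-ESA", "held_in", "Berlin"),
    ("bouquet", "attends", "I-ESA"),
    ("Berlin", "part_of", "Germany"),
    ("Germany", "rdftype", "country"),
}

def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}

def subjects(graph, predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return {s for s, p, o in graph if p == predicate and o == obj}

def located_where_held(graph, event):
    """Everything located in a place where the given event is held."""
    result = set()
    for place in objects(graph, event, "held_in"):
        result |= subjects(graph, "located_in", place)
    return result

answer = located_where_held(triples, "I-ESA")

# Same data, but the hotel source names the city "Berlino" as in the previous slide.
mismatched = (triples - {("Mariott Hotel", "located_in", "Berlin")}) | {
    ("Mariott Hotel", "located_in", "Berlino")
}
no_answer = located_where_held(mismatched, "I-ESA")
```

With a shared identifier for the city the query finds the hotel. With the mismatched name the join fails and the result is empty.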
12Knowledge integration in the global space
- To implement this scenario, however, we need to address and solve two serious integration problems:
- Schema-level heterogeneity: matching different vocabularies/ontologies so that equivalent (or related) classes and properties can be put in relation or collapsed
- Instance-level mismatch: resolving identities between instances, so that all statements about the same entity can be found and used
13State-of-the-art
- Schema-level integration is well-studied and supported (see ontology matching methods and tools)
- See presentations today!
- Instance-level integration is mostly neglected, perhaps because most people believe it is an easy (or at least easier) problem, mainly technological and not worth investigating for researchers.
14And indeed
- the situation of entity-level integration is currently as follows:
- The RDF metadata of some of the most important WWW and Semantic Web conferences is poorly integrated at the instance (entity) level
- FOAF profiles rely on ad hoc methods for identifying people (and don't identify much else)
- The most common available ontology editors (e.g. Protégé) or metadata management systems (e.g. in digital libraries) generate local URIs for any newly created instance
- Some efforts exist to create reusable URIs (e.g. LSID, DOI), but they are very vertical or commercial
- URI retrieval and reuse in general is not well supported
15What we have today
- <http://www.w3.org/RDF/Validator/run/1198080996828#me> <http://xmlns.com/foaf/0.1/name> "Heiko Stoermer"
- <http://www.w3.org/RDF/Validator/run/1198080996828#me> <http://xmlns.com/foaf/0.1/knows> _:jA64256
- _:jA64256 <http://xmlns.com/foaf/0.1/name> "Paolo Bouquet"
- <http://www.owl-ontologies.com/Ontology1198072971.owl#Person_2> <http://www.owl-ontologies.com/Ontology1198072971.owl#Name> "Paolo Bouquet"^^<http://www.w3.org/2001/XMLSchema#string>
- <http://www.owl-ontologies.com/Ontology1198072971.owl#Person_2> <http://www.owl-ontologies.com/Ontology1198072971.owl#attended> <http://www.owl-ontologies.com/Ontology1198072971.owl#Conference_1>
- <http://www.owl-ontologies.com/Ontology1198072971.owl#Conference_1> <http://www.owl-ontologies.com/Ontology1198072971.owl#Name> "SWAP2007"^^<http://www.w3.org/2001/XMLSchema#string> .
16What we would like for tomorrow
- <http://www.okkam.org/entity/ok5f23a5cea6834c4dae73b78cdc17aec1> <http://xmlns.com/foaf/0.1/name> "Heiko Stoermer"
- <http://www.okkam.org/entity/ok5f23a5cea6834c4dae73b78cdc17aec1> <http://xmlns.com/foaf/0.1/knows> <http://www.okkam.org/entity/ok200706301185791252056>
- <http://www.okkam.org/entity/ok200706301185791252056> <http://xmlns.com/foaf/0.1/name> "Paolo Bouquet"
- <http://www.okkam.org/entity/ok200706301185791252056> <http://www.owl-ontologies.com/Ontology1198072971.owl#Name> "Paolo Bouquet"^^<http://www.w3.org/2001/XMLSchema#string>
- <http://www.okkam.org/entity/ok200706301185791252056> <http://www.owl-ontologies.com/Ontology1198072971.owl#attended> <http://www.okkam.org/entity/okasdf12191185791252056>
- <http://www.okkam.org/entity/okasdf12191185791252056> <http://www.owl-ontologies.com/Ontology1198072971.owl#Name> "SWAP2007"^^<http://www.w3.org/2001/XMLSchema#string>
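The move from today's local identifiers to shared ones can be pictured as a rewriting step. The sketch below assumes that a mapping from local names to global identifiers is already available (producing it is the job of the ENS). The prefixed names such as `validator:me` and `okkam:paolo` are shortened placeholders, not the real identifiers from the listings.

```python
def rewrite(graph, mapping):
    """Replace every subject or object found in `mapping` with its global identifier."""
    return {(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in graph}

# Two independently produced graphs, each with its own local identifiers.
foaf_graph = {
    ("validator:me", "foaf:name", "Heiko Stoermer"),
    ("validator:me", "foaf:knows", "_:jA64256"),
    ("_:jA64256", "foaf:name", "Paolo Bouquet"),
}
owl_graph = {
    ("onto:Person_2", "onto:Name", "Paolo Bouquet"),
    ("onto:Person_2", "onto:attended", "onto:Conference_1"),
}

# Hypothetical local-to-global mapping (assumed to come from an ENS lookup).
mapping = {
    "validator:me": "okkam:heiko",
    "_:jA64256": "okkam:paolo",
    "onto:Person_2": "okkam:paolo",
    "onto:Conference_1": "okkam:swap2007",
}

merged = rewrite(foaf_graph, mapping) | rewrite(owl_graph, mapping)
about_paolo = {(p, o) for s, p, o in merged if s == "okkam:paolo"}
```

After rewriting, statements that came from two unrelated files end up attached to the same node, so `about_paolo` contains both the FOAF name and the conference attendance.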
17The OKKAM approach: from DNS to ENS
- By analogy with the Domain Name System (DNS) for the Web, we need something like an Entity Naming System (ENS) for the Semantic Web
- Instead of resolving domain names into IP addresses, the ENS will resolve local entity names (or descriptions) into their global unique identifiers (URIs)
- It can be made available (through public APIs and protocols) as a service which can be invoked by any content creation and management application
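To make the DNS analogy concrete, here is a toy in-memory resolver. It is only a sketch: the class name, the `resolve` method, the naive string matching and the UUID-based identifier format are all assumptions for illustration, not the project's actual API. Only the `http://www.okkam.org/entity/` prefix is taken from the slides.

```python
import uuid

class EntityNameSystem:
    """Toy in-memory ENS: maps entity descriptions to global identifiers."""

    def __init__(self, base="http://www.okkam.org/entity/"):
        self._base = base
        self._index = {}  # normalised description -> identifier

    @staticmethod
    def _normalise(description):
        # Naive matching: case- and whitespace-insensitive string equality.
        return " ".join(description.lower().split())

    def resolve(self, description):
        """Return the identifier for a description, minting one on first sight."""
        key = self._normalise(description)
        if key not in self._index:
            # Hypothetical identifier format: "ok" followed by a random hex string.
            self._index[key] = self._base + "ok" + uuid.uuid4().hex
        return self._index[key]

ens = EntityNameSystem()
uri_1 = ens.resolve("Paolo Bouquet")
uri_2 = ens.resolve("  paolo   BOUQUET ")
uri_3 = ens.resolve("Heiko Stoermer")
```

Two spellings of the same name resolve to one identifier, while a different name gets its own. A real ENS would need far more robust entity matching than string normalisation.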
18ENS-empowered applications
- Applications which can interact with the ENS for:
- Creating new content (in any format) in which entities are annotated with global IDs
- Offering smart services based on the availability of global IDs
- Examples: text processors, HTML/XML editors, RDF/OWL editors, filters for exporting relational DBs, metadata management tools, ...
19An example: word processing
I-ESA2008 in Berlin
ENS
I-ESA2008 in <id>Berlin</id>
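The annotation step on this slide can be sketched as a small text filter. It assumes the set of entity names known to the ENS is given up front. The function name `annotate` is hypothetical, and the bare `<id>` tag simply mirrors the slide rather than a defined markup format.

```python
import re

def annotate(text, known_entities):
    """Wrap each known entity name in <id> tags, matching whole words only."""
    # Longest names first, so "New Berlin" wins over "Berlin" at the same position.
    names = sorted(known_entities, key=len, reverse=True)
    if not names:
        return text
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    # A single pass over the text avoids tagging a name inside an earlier tag.
    return pattern.sub(r"<id>\1</id>", text)

annotated = annotate("I-ESA2008 in Berlin", {"Berlin"})
nested = annotate("New Berlin and Berlin", {"Berlin", "New Berlin"})
```

In an ENS-empowered editor the tag would presumably also carry the global identifier returned by the service, which this sketch leaves out.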
20What counts as an entity in OKKAM?
- OKKAM is about instances (individuals, objects), not about universal resources (e.g. classes or properties).
- Why? Because schemas embody viewpoints, entities don't (expect objections on this!).
- For example:
- Michael Schumacher, his web page, Ferrari, Maranello, Italy are all entities
- Pegasus, π, √2, ... are entities
- Computer_science as a topic may be an entity (borderline...)
- Pizza Margherita in a food ontology??
- La Divina Commedia, MS Word, VW Golf are tricky entities
- The class Person and the property Has_email_address are not entities
- Types of entities to start with: people, locations, organizations, events.
21The steps of OKKAM
- First, creating a scalable and sustainable infrastructure (the ENS) for the Semantic Web, which will enable in practice a space of globally unique reusable identifiers (e.g. URIs)
- Second, bootstrapping the ENS by populating it with billions of entities and by making available tools for creating OKKAMized content
- Third, showcasing the benefits of the approach by building high-impact applications and services which take advantage of this infrastructure to do something that was not possible before
22Research challenges
- Infrastructure
- Designing and implementing a scalable, distributed and secure storage service
- Providing reliable and efficient methods for entity matching
- Providing methods for managing the lifecycle of entities
- Bootstrapping the Web of Entities
- Developing methods for automatically populating the repository of the ENS
- Supporting the OKKAMization of a critical mass of content and data
- Making available simple ENS-empowered tools for end users
- Ensuring trust through secure and privacy-respectful operations
23OKKAM applications
- In the project, there are three application scenarios:
- Enterprise knowledge management systems: Product Lifecycle Management (with and within SAP)
- Entity-intensive content production: supporting the creation of news (ANSA) and scientific articles (Elsevier) in an OKKAM-empowered environment
- Entity-centric semantic search engine: making search a different experience (DERI Ireland)
24Expected advantages
- Improving data-level integration of information
- Automatic merging of knowledge from different sources on the Web
- Integration of corporate knowledge across formats, systems, applications
- Enabling distributed queries on multiple sources
- Web 2.0
- Improving the quality of tagging of multimedia material
- Making social networks portable across multiple platforms
- New tools for navigating/searching
- entity-centric semantic search
- entity-centric browsers or explorers
- Vertical application domains
- business intelligence
- publishing news
- knowledge management
- ...
25Conclusions
- OKKAM supports entity-centric semantic integration and interoperability
- First, find the entities you are interested in
- Second, collect all statements about them
- Third, clean results through schema/vocabulary mapping
- It provides a simple but powerful way of integrating data without starting from schemas/vocabularies
- It enables the vision of the Semantic Web as an open, decentralized, virtually integrated knowledge base sketched by TBL in the initial quote.
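The three steps listed on this slide can be sketched as follows, assuming the entity's global identifier is already known (step one) and that each source is a set of (subject, predicate, object) triples. The prefixed names and the property equivalence in `vocabulary_map` are illustrative assumptions.

```python
def collect_statements(entity_id, sources):
    """Step 2: gather every (predicate, object) pair about the entity."""
    return {(p, o) for graph in sources for s, p, o in graph if s == entity_id}

def clean(statements, vocabulary_map):
    """Step 3: map equivalent properties onto one preferred name."""
    return {(vocabulary_map.get(p, p), o) for p, o in statements}

source_a = {("okkam:paolo", "foaf:name", "Paolo Bouquet")}
source_b = {
    ("okkam:paolo", "onto:Name", "Paolo Bouquet"),
    ("okkam:paolo", "onto:attended", "okkam:swap2007"),
}

# Assumed equivalence between two vocabularies' "name" properties.
vocabulary_map = {"onto:Name": "foaf:name"}

# Step 1 is taken as done: the entity of interest is "okkam:paolo".
raw = collect_statements("okkam:paolo", [source_a, source_b])
cleaned = clean(raw, vocabulary_map)
```

Here the schema mapping is applied last and only to the statements already collected, which is the ordering the slide argues for: integration starts from entities, not from vocabularies.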
26But why OKKAM?
- Ockham's Razor (14th century)
- entities should not be multiplied beyond necessity
- OKKAM's Razor (21st century)
- entity identifiers should not be multiplied beyond necessity
27fp7.www.
.org