Title: DB Lunch @ Berkeley 10.28.05
- Semantic Interoperability in Large-Scale Heterogeneous Networks
- Philippe Cudré-Mauroux, EPFL
- Joint work with
- Karl Aberer (advisor @ EPFL)
- Manfred Hauswirth (Semantic Gossiping)
- T. van Pelt, L. Zhou, A. Feher (Implementation)
Overview
- Motivation
- Picture Sharing in Decentralized Settings
- Decentralized Data Integration
- Peer Data Management Systems
- Probabilistic Message-passing
- Aspects of self-organization
- Studying semantic interoperability in the large
- Applications
- GridVine
- PicShark
- Conclusions
1. Motivation: Picture Sharing
- Profusion of Digital Images
- Variety of powerful devices
- Gigabytes of pictures are the new norm
- Most of the images are kept local
- Some are shared
- Mostly point-to-point
- Primitive search capabilities
Opportunity
- More and more software uses metadata to organize images locally
- (Semi-)structured metadata (e.g., XML, PSA)
- Ontological metadata (e.g., RDF, XMP)
- Type-based metadata (e.g., WinFS)
<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
<x:xapmeta xmlns:x='adobe:ns:meta/'>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
<rdf:Description about='' xmlns:xap='http://ns.adobe.com/xap/1.0/'>
<xap:CreateDate>2001-12-19T18:49:03Z</xap:CreateDate>
<xap:ModifyDate>2001-12-19T20:09:28Z</xap:ModifyDate>
<xap:Creator>John Doe</xap:Creator>
</rdf:Description>
Hurdle: Metadata Heterogeneity
- Why not take advantage of this metadata in a distributed setting?
- X Syntactic discrepancies
- X Semantic heterogeneity
- All the aforementioned standards are extensible
- Shared representation is not enough
ImageGUID  cDate
A0657B25   05.08.04
109E7A25   05.08.04
VS
<es:cDate> 05/08/2004 </es:cDate>
<rdf:Property rdf:ID="Length-Y">
<rdfs:label>Length-Y</rdfs:label>
<rdfs:subPropertyOf rdf:resource="length"/>
</rdf:Property>
VS
<rdf:Property rdf:ID="width">
<rdfs:label>Width</rdfs:label>
<rdfs:subPropertyOf rdf:resource="length"/>
</rdf:Property>
Beyond Keyword Search
- Searching semantically richer objects in large-scale heterogeneous networks
date?
<xap:CreateDate>2001-12-19T18:49:03Z</xap:CreateDate>
<xap:ModifyDate>2001-12-19T20:09:28Z</xap:ModifyDate>
<es:DofCreation> 05/08/2004 </es:DofCreation>
<myRDF:Date> Jan 1, 2005 </myRDF:Date>
2. Decentralized Semantics
- Traditional database techniques (e.g., LAV/GAV) rely on centralized schemas to integrate data sources
- Not applicable to our context
- Scale (upper ontologies?)
- Churn
- Autonomy
(Figure: a central attribute Date with the mappings m(Date) = myDate and m(Date) = yourDate to the local attributes myDate and yourDate.)
Semantic Interoperability
Photoshop (own schema)
Q1 = <GUID>p/GUID</GUID> FOR p IN /Photoshop_Image WHERE p/Creator LIKE "Robi"
<Photoshop_Image>
  <GUID>178A8CD8865</GUID>
  <Creator>Robinson</Creator>
  <Subject> <Bag>
    <Item> Tunbridge Wells </Item>
    <Item>Royal Council</Item>
  </Bag> </Subject>
</Photoshop_Image>
WinFS (known schema)
Q2 = <GUID>p/GUID</GUID> FOR p IN T12 WHERE p/Creator LIKE "Robi"
T12 = <Photoshop_Image> <GUID>fs/GUID</GUID> <Creator> fs/Author/DisplayName </Creator> </Photoshop_Image> FOR fs IN /WinFSImage
<WinFSImage>
  <GUID>178A8CD8866</GUID>
  <Author>
    <DisplayName> Henry Peach Robinson </DisplayName>
    <Role>Photographer</Role>
  </Author>
  <Keyword> Tunbridge </Keyword>
  <Keyword>Council</Keyword>
</WinFSImage>
- => Extending semantic interoperability techniques to decentralized settings
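The rewriting of Q1 into Q2 through T12 can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionary form of the query, the attribute-level reading of T12, and the helper names `reformulate`, `lookup` and `matches` are made up for the example and are not taken from the system.

```python
# Hypothetical attribute-level reading of the view T12:
# each Photoshop attribute is defined by a path in the WinFS schema.
T12 = {"GUID": "GUID", "Creator": "Author/DisplayName"}

# Q1 from the slide, in a made-up dictionary form.
Q1 = {"root": "Photoshop_Image", "select": "GUID", "where": ("Creator", "Robi")}


def reformulate(query, mapping, target_root):
    """Rewrite a query posed against the local schema for another schema.

    Returns None when an attribute used by the query has no image under
    the mapping, i.e. the query cannot be expressed on the other side.
    """
    attr, pattern = query["where"]
    if query["select"] not in mapping or attr not in mapping:
        return None
    return {
        "root": target_root,
        "select": mapping[query["select"]],
        "where": (mapping[attr], pattern),
    }


def lookup(record, path):
    """Follow a slash-separated path through nested dictionaries."""
    for step in path.split("/"):
        if not isinstance(record, dict) or step not in record:
            return None
        record = record[step]
    return record


def matches(record, query):
    """Evaluate the LIKE predicate as a plain substring test (assumption)."""
    path, pattern = query["where"]
    value = lookup(record, path)
    return isinstance(value, str) and pattern in value


# The WinFS record from the slide, as nested dictionaries.
winfs_image = {
    "GUID": "178A8CD8866",
    "Author": {"DisplayName": "Henry Peach Robinson", "Role": "Photographer"},
    "Keyword": ["Tunbridge", "Council"],
}

Q2 = reformulate(Q1, T12, "WinFSImage")
if Q2 is not None and matches(winfs_image, Q2):
    print(lookup(winfs_image, Q2["select"]))
```

A query that selects an attribute T12 does not cover (for instance Subject) yields None here, which is one simple way to signal that the query should not be forwarded over this mapping.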
2.1 Peer Data Management Systems
(Figure: peers linked by pairwise schema mappings, e.g., between es:cDate and xap:CreateDate.)
- Local pairwise mappings
- Peer Data Management Systems (PDMS)
- Pairwise mappings overcome global schema heterogeneity
- Transitive closures on mapping operations
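A transitive closure over pairwise mappings amounts to composing them along a path. The sketch below assumes mappings are plain attribute-to-attribute dictionaries; the function names and the attribute `es:author` are hypothetical and only serve the example.

```python
def compose(first, second):
    """Compose two attribute mappings: apply `first`, then `second`.

    Attributes with no image in `second` are dropped, which is how
    information gets lost along a chain of pairwise mappings.
    """
    return {a: second[b] for a, b in first.items() if b in second}


def reachable_mapping(path):
    """Fold a non-empty list of pairwise mappings into one end-to-end mapping."""
    result = path[0]
    for step in path[1:]:
        result = compose(result, step)
    return result


# Hypothetical pairwise mappings between three peers' schemas.
es_to_xap = {"es:cDate": "xap:CreateDate", "es:author": "xap:Creator"}
xap_to_my = {"xap:CreateDate": "myRDF:Date"}

end_to_end = reachable_mapping([es_to_xap, xap_to_my])
print(end_to_end)
```

Only the date attribute survives the two hops in this example; the creator attribute has no counterpart in the third schema and disappears from the composed mapping.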
Problem: Precision/Recall Tradeoff
- Semantic query routing
- To whom shall I forward a query posed against my local schema?
- Some (most) mappings will be (partially) faulty
- Low expressive power of mappings
- Automatic schema alignment techniques
- Granularity of conceptualizations
- Local query resolution
- Low recall
- Flooding (PDMS)
- Low precision
- Standard deductive integration is not sufficient
- Uncertainty on mappings and conceptualizations
- Abductive reasoning (on transitive closures of mappings)
2.2. Probabilistic Message Passing
- Link-based analysis of the PDMS
- Mapping Cycles
- Parallel Paths
- => Semantics as global agreement
(Figure: example mapping graph with mappings m0 to m5.)
q VS m3(m4(m0(q)))
Computing a Marginal for One Cycle
(Figure legend: observed vs. unknown variables)
- $P(m_0, m_1, m_2, m_3, f_0) = P(m_0)\,P(m_1)\,P(m_2)\,P(m_3)\,P(f_0 \mid m_0, m_1, m_2, m_3)$
- $P(m_0 \mid f_0) = \sum_{m_1, m_2, m_3} P(m_0, m_1, m_2, m_3, f_0)\; P(f_0)^{-1}$
- But feedbacks on different cycles are correlated
- Need to express a global probabilistic model for the mapping graph
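For a single cycle, the marginal above can be computed by brute-force enumeration. The sketch below is a toy illustration: the conditional P(f0 | m0, ..., m3) is an assumed model (positive feedback is certain when all mappings are correct, impossible when exactly one is incorrect, and has probability delta when two or more are incorrect), and the function name is hypothetical.

```python
from itertools import product


def posterior_correct(priors, positive, delta=0.1, target=0):
    """P(mapping `target` is correct | cycle feedback), by enumeration.

    priors   -- prior probability of correctness for each mapping in the cycle
    positive -- True if the observed cycle feedback is positive
    delta    -- assumed probability that two or more errors cancel out
    """
    joint = [0.0, 0.0]  # joint[s] = P(m_target = s, observed feedback)
    for states in product((0, 1), repeat=len(priors)):  # 1 = correct
        p = 1.0
        for state, prior in zip(states, priors):
            p *= prior if state == 1 else 1.0 - prior
        wrong = states.count(0)
        # Assumed feedback model P(f+ | states).
        p_pos = 1.0 if wrong == 0 else (0.0 if wrong == 1 else delta)
        p *= p_pos if positive else 1.0 - p_pos
        joint[states[target]] += p
    # Dividing by the sum is the multiplication by P(f0)^-1.
    return joint[1] / (joint[0] + joint[1])


after_positive = posterior_correct([0.7] * 4, positive=True)
after_negative = posterior_correct([0.7] * 4, positive=False)
print(round(after_positive, 4), round(after_negative, 4))
```

Under these assumptions, positive feedback on a four-mapping cycle raises the belief in each mapping above its prior of 0.7, and negative feedback lowers it. The enumeration is exponential in the cycle length, so it only serves to make the single-cycle formula concrete.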
A Brief Intro to Factor-Graphs
- $g(x_1, x_2, x_3, x_4) = f_A(x_1, x_2)\, f_B(x_2, x_3, x_4)$
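The point of the factorization is that marginals can be computed by pushing sums inside the product. The sketch below checks this on binary variables; the numeric values of the two factors are arbitrary and chosen only for illustration.

```python
from itertools import product


def f_a(x1, x2):
    """Arbitrary example factor over (x1, x2)."""
    return 0.9 if x1 == x2 else 0.1


def f_b(x2, x3, x4):
    """Arbitrary example factor over (x2, x3, x4)."""
    return 0.5 + 0.1 * (x2 + x3 + x4)


def marginal_brute_force(x1):
    """Sum g = f_a * f_b over x2, x3, x4 directly (8 terms)."""
    return sum(
        f_a(x1, x2) * f_b(x2, x3, x4)
        for x2, x3, x4 in product((0, 1), repeat=3)
    )


def marginal_factored(x1):
    """Same marginal, using the factor structure.

    f_b is first summarized into a message over x2 alone, which is then
    combined with f_a. This is the step that sum-product repeats on
    larger graphs.
    """
    message_from_b = {
        x2: sum(f_b(x2, x3, x4) for x3, x4 in product((0, 1), repeat=2))
        for x2 in (0, 1)
    }
    return sum(f_a(x1, x2) * message_from_b[x2] for x2 in (0, 1))


for x1 in (0, 1):
    print(x1, marginal_brute_force(x1), marginal_factored(x1))
```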
Deriving PDMS Factor-Graphs
PDMS Factor-Graphs
- Cyclic graph
- Junction tree? Clustering / stretching of variables?
- Not applicable (decentralization)
- Iterative sum-product
- Approximate results
- How to perform iterative sum-product by message passing on the mapping graph?
- Message passing in the factor graph does not correspond to the connectivity of the mapping graph
- We want to rely on decentralized computations only
- Locality vs. globality of nodes in the factor graph
- Mappings: local
- Feedback factor: common, global knowledge
- Observed feedback variables: neighborhood
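Iterative sum-product over mapping variables and cycle-feedback factors can be sketched as below. This is a centralized toy version, not the embedded, decentralized scheme of the talk: the data layout, the function names and the feedback model (positive feedback certain with no error, impossible with exactly one error, probability delta with two or more) are assumptions made for the example.

```python
from itertools import product


def feedback_factor(states, positive, delta):
    """Assumed P(observed feedback | mapping states), with 1 = correct."""
    wrong = states.count(0)
    p_pos = 1.0 if wrong == 0 else (0.0 if wrong == 1 else delta)
    return p_pos if positive else 1.0 - p_pos


def sum_product(priors, cycles, delta=0.1, rounds=10):
    """Approximate P(mapping correct | all feedback) by iterative sum-product.

    priors -- {mapping name: prior probability of correctness}
    cycles -- list of (tuple of mapping names, feedback is positive?)
    """
    # Factor-to-variable messages, indexed by (factor index, mapping).
    f2v = {(a, m): [1.0, 1.0] for a, (ms, _) in enumerate(cycles) for m in ms}
    for _ in range(rounds):
        # Variable-to-factor: prior times messages from the other factors.
        v2f = {}
        for (a, m) in f2v:
            msg = [1.0 - priors[m], priors[m]]
            for (b, m2), incoming in f2v.items():
                if m2 == m and b != a:
                    msg = [msg[0] * incoming[0], msg[1] * incoming[1]]
            v2f[(m, a)] = msg
        # Factor-to-variable: marginalize the factor over the other mappings.
        updated = {}
        for a, (ms, positive) in enumerate(cycles):
            for i, m in enumerate(ms):
                out = [0.0, 0.0]
                for states in product((0, 1), repeat=len(ms)):
                    weight = feedback_factor(states, positive, delta)
                    for j, other in enumerate(ms):
                        if j != i:
                            weight *= v2f[(other, a)][states[j]]
                    out[states[i]] += weight
                total = out[0] + out[1]
                updated[(a, m)] = [out[0] / total, out[1] / total]
        f2v = updated
    beliefs = {}
    for m, prior in priors.items():
        belief = [1.0 - prior, prior]
        for (a, m2), incoming in f2v.items():
            if m2 == m:
                belief = [belief[0] * incoming[0], belief[1] * incoming[1]]
        beliefs[m] = belief[1] / (belief[0] + belief[1])
    return beliefs


priors = {name: 0.7 for name in ("m0", "m1", "m2", "m3", "m4")}
cycles = [(("m0", "m1", "m2", "m3"), True), (("m0", "m4"), False)]
print(sum_product(priors, cycles))
```

With a single cycle the factor graph is a tree and the result is exact; once cycles share mappings the graph has loops and the beliefs are approximations, which matches the "approximate results" caveat above.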
Embedded Message-Passing (1)
Embedded Message-Passing (2)
Sending Messages in the Mapping Graph
- Message-Passing Schedules
- Periodic
- Lazy (piggybacking on query forwarding)
- No message overhead
Implemented System
- Schemas
- Import from OWL (Web Ontology Language)
- Mappings
- KnowledgeWeb Ontology Alignment API
- Import from RDF/XML
- Automated on-the-fly creation
- Comparison to standard alignments
- Automatic derivation of quality measures $P(m = \text{correct} \mid F)$ for the mappings using iterative message-passing
- Per-Hop Forwarding Behaviors (Semantic Gossiping)
Some (Preliminary) Results: Convergence
(undirected example graph, prior 0.7, delta 0.1)
Impact of Cycle Length
(simple cycle, prior 0.5)
Fault-tolerance (faulty links)
(undirected example graph, prior 0.8, delta 0.1)
Preliminary Results: EON (Alignment contest)
- Worst-case scenario: no prior knowledge
- Set of 6 schemas on bibliographic data (approx. 30-40 attributes)
- 396 generated attribute mappings (84 incorrect)
2.3. Semantic Gossiping
- Selectively reformulate queries through mapping links
- Semantic distances
- Cycles analysis (?)
- Results analysis
- Syntactic distance
- Lost predicates
(Figure: a query reformulated along mapping links between relations R1 to R5; two links are crossed out.)
π_Title σ_Creature=Joe (R5)
π_Title σ_Creator=Joe (R3)
π_Title σ_Author=Joe (R2)
π_Title σ_Creator=Joe (R4)
π_Titre σ_Auteur=Joe (R1)
π_? σ_Author=Joe (R4)
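The syntactic side of this analysis (how many predicates a reformulation loses) can be sketched as a forwarding test. This is a simplified illustration, not the measure used in the system: attributes are unweighted, mappings are plain dictionaries, and the names `syntactic_similarity`, `should_forward` and the threshold value are assumptions.

```python
def syntactic_similarity(query_attributes, mapping):
    """Fraction of the query's attributes that survive the mapping."""
    if not query_attributes:
        return 1.0
    kept = [a for a in query_attributes if a in mapping]
    return len(kept) / len(query_attributes)


def should_forward(query_attributes, mapping, threshold=0.75):
    """Forward the query over a mapping only if few predicates are lost."""
    return syntactic_similarity(query_attributes, mapping) >= threshold


# Attributes used by the example query (projection on Title, selection on Author).
query = ["Title", "Author"]

# Hypothetical mappings: the first keeps both attributes, the second loses Title.
to_r1 = {"Title": "Titre", "Author": "Auteur"}
to_r4 = {"Author": "Creator"}

print(syntactic_similarity(query, to_r1), should_forward(query, to_r1))
print(syntactic_similarity(query, to_r4), should_forward(query, to_r4))
```

A purely syntactic test cannot detect a mapping that keeps every attribute but maps one of them to the wrong concept; that is what the semantic analyses (cycles and results) are for.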
Self-Organization
- Two types of self-organization
- Static network
- Self-organizing dissemination of queries (?)
- Dynamic network
- Self-organizing network of mappings
- Idea
- Quality evaluation of mappings through Semantic Gossiping
- Drop low-quality links
- Reorganized network leads to different quality evaluation
- Dynamic network changes
- => Self-organizing, self-referential semantic network
Some Results (1)
Sensitivity to TTL (cycle analysis only, 25 schemas, 4 concepts)
Some Results (2)
Scalability (results analysis only, 4 concepts, TTL = 3, misclassification rate = 0.1, 2 documents/peer on avg.)
2.4. Semantic Interoperability in the Large
- Do we have enough (good) mappings?
- Modeling semantic interoperability
- The semantic connectivity graph
- Idea: as for physical network analyses, define a connectivity layer
- Unweighted, non-redundant version of the schema-to-schema graph
- Observation
- Peers in a set $P_s$ are semantically interoperable iff $S_s$ is strongly connected, with $S_s \equiv \{ s \mid \exists p \in P_s,\ p \leftrightarrow s \}$
- Schema-to-Schema Graph
- Logical model
- Directed
- Weighted
- Redundant
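The strong-connectivity test on the semantic connectivity graph can be sketched as follows. The adjacency-dictionary representation and the function names are assumptions; the check itself is the standard one (every schema reachable from a start node, and the start node reachable from every schema).

```python
def reachable(graph, start):
    """Set of nodes reachable from `start` by following directed edges."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_strongly_connected(graph):
    """True if every schema can reach every other schema through mappings.

    graph -- {schema: set of schemas it has a mapping to}
    """
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    if not nodes:
        return True
    reverse = {n: set() for n in nodes}
    for src, targets in graph.items():
        for dst in targets:
            reverse[dst].add(src)
    start = next(iter(nodes))
    return reachable(graph, start) == nodes and reachable(reverse, start) == nodes


# Hypothetical schema graphs: a directed ring and a one-way chain.
ring = {"A": {"B"}, "B": {"C"}, "C": {"A"}}
chain = {"A": {"B"}, "B": {"C"}}

print(is_strongly_connected(ring), is_strongly_connected(chain))
```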
Analyzing Semantic Interoperability in the Large
- Analyzing semantic interoperability in large-scale, decentralized networks
- Percolation theory for directed graphs
- Based on recent graph-theoretic frameworks
- Random graphs with specific degree distributions $p_{jk}$, clustering coefficient $cc$ and bidirectionality coefficient $bc$
- Necessary condition for semantic interoperability in the large: $\sum_{j,k} \left( jk - j(bc + cc) - k \right) p_{jk} \geq 0$
- Excellent approximations of the size of semantically interoperable clusters in the graph
- Analysis: Sequence Retrieval System
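The necessary condition can be evaluated directly from an empirical degree distribution. In the sketch below, reading j as the in-degree and k as the out-degree of a schema is an assumption, as are the function names; bc and cc are passed in rather than computed.

```python
from collections import Counter


def degree_distribution(nodes, edges):
    """Empirical p_jk: fraction of nodes with in-degree j and out-degree k."""
    indeg = Counter(dst for _, dst in edges)
    outdeg = Counter(src for src, _ in edges)
    counts = Counter((indeg[n], outdeg[n]) for n in nodes)
    return {jk: c / len(nodes) for jk, c in counts.items()}


def connectivity_indicator(p_jk, bc, cc):
    """Left-hand side of the condition: sum over (j, k) of (jk - j(bc+cc) - k) p_jk."""
    return sum((j * k - j * (bc + cc) - k) * prob for (j, k), prob in p_jk.items())


# Hypothetical schema graph: a directed ring of three schemas.
nodes = ["A", "B", "C"]
edges = [("A", "B"), ("B", "C"), ("C", "A")]

p = degree_distribution(nodes, edges)
ci = connectivity_indicator(p, bc=0.0, cc=0.0)
print(p, ci, ci >= 0)
```

The ring sits exactly on the boundary (the indicator is 0): every schema has one incoming and one outgoing mapping, so removing any single mapping breaks interoperability.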
3. Applications
- GridVine
- Self-organizing semantic overlay network
- PicShark
- Self-organizing middleware to export pictures and
create mappings
3.1 GridVine
- Building large-scale semantic systems
- Self-organizing semantic overlay network
Semantic Mediation Layer
(Figure: layer stack, top to bottom)
Semantic Mediation Layer
Correlated / Uncorrelated
Overlay Layer
Correlated / Uncorrelated
Physical Layer
Features
- Based on the P-Grid P2P structure
- Distributed Hash Table developed at EPFL
- Self-organized, scalable, decentralized
- Resolves key-based searches in O(log(n)) even for unbalanced trees
- Semantic Web compliant
- RDF triples, RDFS schemas, OWL mappings
- Structured searches
- RDQL queries
- Semantic Gossiping
- Fosters semantic interoperability
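How RDF triples might be stored on top of a key-based overlay can be sketched with an in-memory stand-in for the hash table. Everything here is an assumption made for illustration: `ToyDHT` is not P-Grid, the modulo placement of keys is a simplification, and indexing each triple under its subject, predicate and object is one plausible scheme rather than a description of GridVine's API.

```python
import hashlib


class ToyDHT:
    """In-memory stand-in for a distributed hash table (not P-Grid)."""

    def __init__(self, peers=4):
        self.peers = [dict() for _ in range(peers)]

    def _peer_for(self, key):
        # Hash the key and pick the peer responsible for it.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return self.peers[int.from_bytes(digest[:4], "big") % len(self.peers)]

    def insert(self, key, value):
        self._peer_for(key).setdefault(key, []).append(value)

    def lookup(self, key):
        return list(self._peer_for(key).get(key, []))


def index_triple(dht, triple):
    """Store the triple once under each distinct term it contains."""
    for term in dict.fromkeys(triple):
        dht.insert(term, triple)


dht = ToyDHT()
index_triple(dht, ("img:178A8CD8865", "xap:Creator", "Robinson"))
index_triple(dht, ("img:178A8CD8866", "xap:Creator", "John Doe"))

# A structured search with a constant predicate becomes one key lookup.
print(dht.lookup("xap:Creator"))
```

With this layout a triple pattern that has at least one constant term can be resolved with a single key lookup, and the remaining terms are filtered at the peer that holds the key.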
GridVine: Annotating Content
Decentralized Query Resolution: Overview
3.2 PicShark
- Where do the translation links come from?
- Middleware for sharing semi-structured metadata attached to pictures and creating translation links
(Figure: PicShark architecture. Labels: 60 moments; PicShark; (Distributed) Hashtable (e.g., GridVine); Features Extractor; Insert; PSP; Retrieve; Metadata Extractor; XMP; Information Tracker; WinFS.)
Features
- Self-organization of mappings
- Based on low-level features extracted from
- Picture (color moments, textures)
- Structured metadata (lexicographical analysis)
- Self-organization of annotations
- Probabilistic propagation of annotations between similar individuals
- Self-organization of query propagation
- Schema distance based on probabilistic subsumption
- Propagation within a certain diameter
- Driven by user interaction
- Scalable
- Computationally expensive operations are local at the peers
- Only simple in-network operations (look-ups)
- (Ongoing) collaborative effort with Microsoft Research Asia
PicShark Prototype
4. Conclusions
- Fundamental issue: interoperability in large-scale (semi-)structured environments
- Content sharing
- Information search
- Semantic Web?
- Traditional techniques are not sufficient
- Scale
- Autonomy
- Uncertainty
- Self-organizing, decentralized stochastic processes
- Data indexing
- Data integration
- Query dissemination
Some References (1)
Semantic Gossiping
- A Framework for Semantic Gossiping. Karl Aberer, Philippe Cudré-Mauroux, Manfred Hauswirth. SIGMOD Record, 31(4), December 2002.
- The Chatty Web: Emergent Semantics through Gossiping. Karl Aberer, Philippe Cudré-Mauroux, Manfred Hauswirth. International World Wide Web Conference (WWW 03).
- Probabilistic Message-Passing in Peer Data Management Systems. Philippe Cudré-Mauroux, Karl Aberer, and Andras Feher. International Conference on Data Engineering (ICDE 06).
Self-Organizing Semantics
- Start making sense: The Chatty Web approach for global semantic agreements. Karl Aberer, Philippe Cudré-Mauroux, Manfred Hauswirth. Journal of Web Semantics, 1(1), December 2003.
- Emergent Semantics Principles and Issues. Karl Aberer, Philippe Cudré-Mauroux and Aris M. Ouksel (editors), Tiziana Catarci, Mohand-Said Hacid, Arantza Illarramendi, Vipul Kashyap, Massimo Mecella, Eduardo Mena, Erich J. Neuhold, Olga De Troyer, Thomas Risse, Monica Scannapieco, Fèlix Saltor, Luca de Santis, Stefano Spaccapietra, Steffen Staab and Rudi Studer. International Conference on Database Systems for Advanced Applications (DASFAA 04).
Some References (2)
Semantic Interoperability in the Large
- A Necessary Condition for Semantic Interoperability in the Large. Philippe Cudré-Mauroux and Karl Aberer. International Conference on Ontologies, DataBases, and Applications of Semantics (ODBASE 04).
- Analyzing Semantic Interoperability in Bioinformatic Database Networks. Philippe Cudré-Mauroux, Julien Gaugaz, Adriana Budura and Karl Aberer. Semantic Network Analysis (SNA 05).
GridVine
- GridVine: Building Internet-Scale Semantic Overlay Networks. Karl Aberer, Philippe Cudré-Mauroux, Manfred Hauswirth and Tim van Pelt. International Semantic Web Conference (ISWC 04).
- Semantic Overlay Networks (tutorial). Karl Aberer and Philippe Cudré-Mauroux. International Conference on Very Large Data Bases (VLDB 05).
More references at http://lsirpeople.epfl.ch/pcudre/
Questions?