Title: Taxonomy-based Routing in P2P Networks
1Taxonomy-based Routing in P2P Networks
- Alexander Löser
- HPLB, Friday, 12 March 2004
2Hello,
- I'm Alexander Löser (aloeser_at_cs.tu-berlin.de)
- ö = Alt 148, ö = oe
- Round your mouth to say the letter "o" and say "e" instead: round your lips and push them forward, like a fish, or as in puckering to kiss
3Hello,
- I'm Alexander Löser (aloeser_at_cs.tu-berlin.de)
- Graduate Research Associate/Project Manager at University of Technology Berlin, Group Computation and Information Structures
- Research Interests
- Semantic Query Routing
- Distributed SW applications for MP3 Searching
- P2P based storage and retrieval of meta data
- Research activities with
- Fraunhofer Institute ISST (Berlin)
- Edutella Project (L3S Hannover)
4Agenda
- Networking
- Information Integration in Berlin
- My Work Area
- Semantic Query Routing
- Benefits of my work for the Semantic Web
- P2P People at HP
5Networking: Data Integration in Berlin/Hannover
- University of Technology
- Correspondences, Mapping Rules (Busse)
- Mediator Based Information Integration
- in an EU-wide Migration Data Network (Kutsche)
- of E-Learning Sources (ROODOLF, VELO) (Löser)
- Taxonomy-based Query Routing (Löser)
- Humboldt University
- Knowledge Management in Bioinformatics (Leser)
- Schema matching, schema mapping and data transformation (Naumann)
- Learning Lab Lower Saxony, Hannover
- Personalized Access to Distributed Learning Repositories (Nejdl)
6My Work Area: Adaptive Taxonomy-based Overlays
- Problem
- Broadcasting all queries to all information sources does not scale
- Idea
- Route queries only to those data sources that can (possibly) provide results
7Adaptive Taxonomy based Overlays
- Assumptions
- large number of autonomous data sources
- data sources are dynamically available
- each data source provides a description, e.g. a classification within one or more taxonomies
- My Application Scenarios
- Distributed Educational Learning Materials
- MP3 File Sharing
11Adaptive Taxonomy based Overlays
How will a learner discover relevant data sources for a particular topic?
[Figure: data sources classified as DMOZ /Top/Computers//UML/Education/]
15Conceptual idea: Logical Overlay Structures
[Figure labels: Taxonomy-based Overlays: DMOZ /Top/Computers//UML/Education/; SourceID: A, B, C]
16Adaptive Taxonomy-based Overlays
[Figure labels: Taxonomy-based Overlays: DMOZ /Top/Computers//UML/Education/; Local Result Computation; SourceID: A, B, C]
http://cis.tu-berlin.de/NE/awc.htm
17Problem Definition: P2P-based Catalogue
Catalogue of all available data source models (by SourceID)
Classified Query: -> Classification DMOZ /Top/Computers//UML/Education/
Classified Models: SourceID A -> Classification DMOZ /Top/Computers//UML/Education/
18Why a distributed Catalogue?
- no Central Point of Power
- participants keep control over their materials
- not interested in central administration/organization
- Self organizing
- nodes join and leave the catalog automatically
- no central organization needed
- Low cost
- P2P search system is literally free
- Costs are distributed among participants
- Scalability
- Need to register 100,000s of participants
- Availability, Robustness
- DDoS attacks will not affect P2P
[Side labels: Philosophy, Costs, Technical]
21Why a distributed Catalogue?
Roussopoulos, Giuli, Baker, Maniatis, Rosenthal, Mogul: "2 P2P or Not 2 P2P?", IPTPS 2004
22How does it work?
- Data Source Descriptions
- Distributed Hash Tables
- Inverted Indices
- Key-2-Query References
- Load Balancing Methods
- Combined: Distributed Data Source Catalogue
23Distributed Hash Tables: The lookup problem
131.196.45.61 stores DMOZ /Top/Computers//UML/Education/
22.65.144.31 requests DMOZ /Top/Computers//UML/Education/
24Distributed Hash Tables: The lookup problem
- Uses Hash function
- Key identifier = SHA-1(key)
- Node identifier = SHA-1(IP address)
DMOZ /Top/Computers//UML/Education/ -> K54
131.196.45.61 -> N54
22.65.144.31 -> N8
131.196.45.61 stores DMOZ /Top/Computers//UML/Education/
22.65.144.31 requests DMOZ /Top/Computers//UML/Education/
25Distributed Hash Tables: The lookup problem
N56 stores K54
N8 looks up K54
27Distributed Hash Tables: The lookup problem
- Uses Hash function
- Key identifier = SHA-1(key)
- Node identifier = SHA-1(IP address)
- Both are uniformly distributed
- Both exist in the same ID space
- A key is stored at its successor: the node with the next-higher ID
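The placement rule can be sketched in a few lines. This is a minimal illustration, not the deck's implementation: the 7-bit identifier space is an assumption to keep IDs readable (Chord uses the full 160-bit SHA-1 output), the helper names `chord_id` and `successor` are hypothetical, and the node IDs are the ones drawn on the later ring slides. As in Chord, a node whose ID equals the key ID also counts as its successor.

```python
import hashlib
from bisect import bisect_left

M = 7  # identifier bits; an assumption to keep IDs small (Chord uses 160)

def chord_id(text: str, m: int = M) -> int:
    """Hash a key or an IP address into the shared ID space [0, 2**m)."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

def successor(ident: int, node_ids: list[int]) -> int:
    """First node ID >= ident on the ring, wrapping around to the smallest."""
    ring = sorted(node_ids)
    i = bisect_left(ring, ident)
    return ring[i % len(ring)]

# Node IDs as drawn on the ring slides (N5 ... N110).
nodes = [5, 10, 20, 32, 60, 80, 99, 110]
print(successor(12, nodes))    # 20: K12 is stored at N20
print(successor(120, nodes))   # 5: past N110 the ring wraps around to N5

# Keys and nodes are hashed into the same ID space.
key_id = chord_id("DMOZ/Top/Computers/UML/Education/")
node_id = chord_id("131.196.45.61")
print(key_id, node_id, successor(key_id, nodes))
```

Because keys and node addresses go through the same hash, no node needs a global directory: any participant can compute which ID a classification path maps to.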
30Distributed Hash Tables: The lookup problem
- CHORD [Stoica2001] Characteristics
- One operation: map a key to a node
- Provides peer-to-peer hash lookup
- Efficient: O(log(n)) messages per lookup
- Robust as nodes fail and join (1.6 unanswered messages if 50% of the nodes fail)
- Good primitive for peer-to-peer systems
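To make the O(log(n)) claim concrete, here is a small simulation of finger-table routing. It is a sketch under stated assumptions: a 7-bit ID space, a static ring with perfect finger tables (no joins or failures), and hypothetical names `ChordRing`, `between` and `lookup`. It counts forwarding hops only.

```python
from bisect import bisect_left

M = 7                 # identifier bits; assumed small for illustration
RING = 2 ** M

def between(x, a, b, inclusive_right=False):
    """True if x lies in the ring interval (a, b), or (a, b] if requested."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b      # interval wraps past zero (or a == b)

class ChordRing:
    def __init__(self, node_ids):
        self.nodes = sorted(set(node_ids))
        # Finger i of node n points at successor(n + 2**i).
        self.fingers = {
            n: [self.successor((n + 2 ** i) % RING) for i in range(M)]
            for n in self.nodes
        }

    def successor(self, ident):
        i = bisect_left(self.nodes, ident)
        return self.nodes[i % len(self.nodes)]

    def lookup(self, start, key):
        """Route from node `start` to the node responsible for `key`.

        Returns (responsible_node, number_of_forwarding_hops)."""
        node, hops = start, 0
        while True:
            succ = self.fingers[node][0]
            if between(key, node, succ, inclusive_right=True):
                return succ, hops
            # Forward to the farthest finger that still precedes the key.
            nxt = next((f for f in reversed(self.fingers[node])
                        if between(f, node, key)), succ)
            node, hops = nxt, hops + 1

ring = ChordRing([5, 10, 20, 32, 60, 80, 99, 110])
print(ring.lookup(80, 12))   # (20, 2): N80 -> N5 -> N10, which hands K12 to N20
```

Each forwarding step at least halves the remaining distance to the key's predecessor, which is where the logarithmic bound comes from.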
31Technologies: Distributed Hash Tables
Data Source Model: Peer PID A, PATH ClassificationPath = DMOZ/Top/Computers/UML/Education/
Consistent Hashing: K12 = SHA-1(PATH ClassificationPath = DMOZ/Top/Computers/UML/Education/)
Hash Table
32Technologies: Distributed Hash Tables
[Figure: Chord ring with nodes N5, N10, N20, N32, N60, N80, N99, N110; operation: Insert K12: PID A]
35Technologies: Distributed Hash Tables
[Figure: Chord ring with nodes N5, N10, N20, N32, N60, N80, N99, N110, holding the entry K12: PID A; request: Lookup(K12)]
37Technologies: Distributed Hash Tables
[Figure: Chord ring with nodes N5, N10, N20, N32, N60, N80, N99, N110]
PID A is classified as ClassificationPath dmoz/Top/UML/Education/
Which sources are classified as dmoz/Top/UML/Education/?
38Technologies: Inverted Index
Problem: Two data sources use the same classification
PID A: PATH ClassificationPath = DMOZ/Top/Computers/UML/Education/
PID B: PATH ClassificationPath = DMOZ/Top/Computers/UML/Education/
Hash Table
39Technologies: Inverted Index
[Figure: Chord ring with nodes N5, N10, N20, N32, N60, N80, N99, N110; operations: Insert PID A at K12, Insert PID B at K12]
40Technologies: Inverted Index
[Figure: Chord ring with nodes N5, N10, N20, N32, N60, N80, N99, N110, holding the entry K12: PID A, PID B]
Lookup(K12) returns PID A, PID B
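The inverted index idea can be sketched as follows: the node responsible for a key keeps a set of peer IDs, so a second insert under the same key extends the entry instead of overwriting it. `CatalogueRing` and its methods are hypothetical names, and the whole ring lives in one process here purely for illustration.

```python
from bisect import bisect_left
from collections import defaultdict

class CatalogueRing:
    """Toy DHT: each node holds an inverted index key ID -> set of peer IDs."""

    def __init__(self, node_ids):
        self.nodes = sorted(node_ids)
        self.store = {n: defaultdict(set) for n in self.nodes}

    def responsible(self, key_id):
        """Successor node of the key ID (wraps around the ring)."""
        i = bisect_left(self.nodes, key_id)
        return self.nodes[i % len(self.nodes)]

    def insert(self, key_id, pid):
        # A second insert under the same key extends the posting set.
        self.store[self.responsible(key_id)][key_id].add(pid)

    def lookup(self, key_id):
        return sorted(self.store[self.responsible(key_id)].get(key_id, set()))

ring = CatalogueRing([5, 10, 20, 32, 60, 80, 99, 110])
ring.insert(12, "PID A")
ring.insert(12, "PID B")
print(ring.lookup(12))   # ['PID A', 'PID B']
```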
42Technologies: Key-2-Query References
Query: Which data sources store materials about /Top/Computer/UML/Education?
/Top/Computer/UML/Education: PID A, B
43Technologies: Key-2-Query References
Query: Which data sources store materials about /Top/Computer/UML/Education?
K12: PID A, PID B
44Technologies: Key-2-Query References
Query: Which data sources store materials about /Top/Computer/UML/Education AND specialized topics?
K12: PID A, PID B
K02: PID C
45Technologies: Key-2-Query References
Problem: Storing the successor relationship
[Figure: taxonomy tree rooted at /Top]
46Technologies: Key-2-Query References
[Figure: taxonomy tree rooted at /Top]
Storage of additional Key-to-Query relations
50Technologies: BFS Lookup
Problem: Find the peers of a category and of its successors
Query: /Top/Computers/UML/
/Top
/Top/Computer
/Top/Computer/UML
/Top/Computer/UML/Education: PID A, PID B
/Top/Computer/UML/Education/UseCase: PID C
Solution: Breadth-First Search in DHT
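A breadth-first lookup over successor references might look like the sketch below. The `catalogue` dictionary mirrors the categories on the slide; its layout (a `peers` list plus `succ` references per entry) and the function names are assumptions of this example, and `dht_lookup` stands in for a real DHT lookup on the hashed path.

```python
from collections import deque

# Hypothetical catalogue content mirroring the slide. Each entry keeps the
# peers registered for a category plus references to its direct successors.
catalogue = {
    "/Top": {"peers": [], "succ": ["/Top/Computer"]},
    "/Top/Computer": {"peers": [], "succ": ["/Top/Computer/UML"]},
    "/Top/Computer/UML": {
        "peers": [], "succ": ["/Top/Computer/UML/Education"]},
    "/Top/Computer/UML/Education": {
        "peers": ["PID A", "PID B"],
        "succ": ["/Top/Computer/UML/Education/UseCase"]},
    "/Top/Computer/UML/Education/UseCase": {"peers": ["PID C"], "succ": []},
}

def dht_lookup(path):
    """Stand-in for one DHT lookup of the hashed path; here a dict access."""
    return catalogue.get(path, {"peers": [], "succ": []})

def bfs_lookup(start_path):
    """Collect the peers of a category and of all its successors, level by level."""
    found, seen = [], {start_path}
    queue = deque([start_path])
    while queue:
        entry = dht_lookup(queue.popleft())
        for pid in entry["peers"]:
            if pid not in found:
                found.append(pid)
        for child in entry["succ"]:
            if child not in seen:        # guards against cyclic references
                seen.add(child)
                queue.append(child)
    return found

print(bfs_lookup("/Top/Computer/UML"))   # ['PID A', 'PID B', 'PID C']
```

Each dequeued category costs one DHT lookup, so the message count grows with the size of the queried subtree rather than with the size of the network.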
53Technologies: Complex Models
- Model PID D
- PATH ClassificationPath = TU/NE/Modeling/UML (K05)
- PATH ClassificationPath = DMOZ/Top/Computers/UML/Education/UML (K12)
54Technologies: Complex Queries
ALL data sources storing UML materials classified as PATH ClassificationPath = DMOZ/Top/Computers/UML/Education/UML (K12) AND PATH ClassificationPath = TU/NE/Modeling/UML (K05)
Lookup (K12) AND (K05)
58Technologies: Complex Queries
Query: Lookup (K12) AND (K05)
- Lookup (K12) = PID A, B, D
- Lookup (K05) = PID D
- Intersection: (PID A, B, D) AND (PID D)
- Intersection = PID D
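The conjunctive query above boils down to one lookup per key followed by a set intersection. A minimal sketch, assuming the posting lists from the slide are already available locally (in the catalogue each would come from a separate DHT lookup); `lookup_and` is a hypothetical helper name.

```python
from functools import reduce

# Posting lists as on the slide; in the catalogue each would be fetched
# with one DHT lookup (an assumption of this sketch).
index = {
    "K12": {"PID A", "PID B", "PID D"},
    "K05": {"PID D"},
}

def lookup_and(keys, index):
    """Answer 'K1 AND K2 AND ...' by intersecting the posting sets."""
    # Start with the smallest set so intermediate results stay small.
    postings = sorted((index.get(k, set()) for k in keys), key=len)
    if not postings:
        return set()
    return reduce(set.intersection, postings)

print(lookup_and(["K12", "K05"], index))   # {'PID D'}
```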
59Technologies: Fast Inverted Index Intersection
- Sort keys, use structure to intersect in partitions
- Sort-Merge Join
- Zig-Zag Join
- Adaptive Set Intersection
- Prefetch
- Compress intersected keys, compare compressions
- Gap Compression
- Bloom Filter
- Caching: precomputation of popular combinations
- Intersect only best keys
- Incremental Intersection, Ranking
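Two of the listed techniques can be sketched briefly. The zig-zag join below intersects two sorted, duplicate-free lists and leaps ahead with binary search. The Bloom filter lets one catalogue node ship a compact summary of its posting list so the other node only returns candidates. Filter size, hash count and all names are assumptions of this sketch, not values from the talk.

```python
import hashlib
from bisect import bisect_left

def zigzag_intersect(a, b):
    """Intersect two sorted, duplicate-free lists, leaping via binary search."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i, j = i + 1, j + 1
        elif a[i] < b[j]:
            i = bisect_left(a, b[j], i + 1)   # leap in a up to b[j]
        else:
            j = bisect_left(b, a[i], j + 1)   # leap in b up to a[i]
    return out

class BloomFilter:
    """Compact set summary: no false negatives, some false positives."""

    def __init__(self, n_bits=256, n_hashes=3):
        self.n_bits, self.n_hashes, self.bits = n_bits, n_hashes, 0

    def _positions(self, item):
        for seed in range(self.n_hashes):
            digest = hashlib.sha1(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all(self.bits >> pos & 1 for pos in self._positions(item))

k12 = ["PID A", "PID B", "PID D"]
k05 = ["PID D"]
print(zigzag_intersect(k12, k05))            # ['PID D']

# The node holding K05 sends a filter instead of its full list.
summary = BloomFilter()
for pid in k05:
    summary.add(pid)
candidates = [pid for pid in k12 if pid in summary]   # may hold false positives
print(zigzag_intersect(candidates, k05))     # exact check removes them
```

The final exact intersection is needed because a Bloom filter can report members that were never added; it never misses one that was.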
60Technologies: Load Balancing
- Problem: Models and queries are not equally distributed and requested among Catalogue Nodes
- Query Load Balancing (too many queries on one node)
- CUP: Controlled Update Propagation [Roussopoulos03]
- PCX: Path Caching with Expiration [Stoica01]
- Index Load Balancing (too many registered data sources at one node)
- Index Load Balancing Directory [Steenkiste02]
- Virtual Server [Stoica01]
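One way to read path caching with expiration is sketched below: every node on the lookup path keeps answers for a limited time, so repeated queries for a popular key stop before they reach the responsible node. This is a simplified interpretation, not the published protocol; `NodeCache`, `cached_lookup`, the explicit `now` clock and the `ttl` value are all assumptions of the example. CUP, which propagates updates instead of waiting for expiry, is not shown.

```python
class NodeCache:
    """Per-node cache of lookup answers with an expiry time per entry."""

    def __init__(self):
        self.entries = {}          # key -> (value, expiry time)

    def put(self, key, value, expires):
        self.entries[key] = (value, expires)

    def get(self, key, now):
        """Return (value, expires) if a fresh entry exists, else None."""
        hit = self.entries.get(key)
        if hit is None or now >= hit[1]:
            self.entries.pop(key, None)   # drop stale entries
            return None
        return hit

def cached_lookup(path, key, fetch, now, ttl):
    """Walk the lookup path and answer from the first fresh cache entry.

    On a miss everywhere, fetch from the responsible node. Either way the
    answer is cached on the nodes passed so far, keeping its expiry time.
    Returns (value, number of caches that had to be passed)."""
    for hops, cache in enumerate(path):
        hit = cache.get(key, now)
        if hit is not None:
            break
    else:
        hops, hit = len(path), (fetch(key), now + ttl)
    for cache in path[:hops]:
        cache.put(key, *hit)
    return hit[0], hops

path = [NodeCache(), NodeCache(), NodeCache()]   # nodes between requester and owner
owner = {"K12": ["PID A", "PID B"]}
print(cached_lookup(path, "K12", owner.get, now=0, ttl=10))    # (['PID A', 'PID B'], 3)
print(cached_lookup(path, "K12", owner.get, now=5, ttl=10))    # (['PID A', 'PID B'], 0)
print(cached_lookup(path, "K12", owner.get, now=10, ttl=10))   # (['PID A', 'PID B'], 3)
```

The third call goes all the way to the owner again because the entries cached at time 0 expire at time 10.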
62Open Issues: Inter-Taxonomy Relationships
NE: /InfMod/DB/Books
DMOZ: /../DataBases/Books
Problems: Cycles, Inconsistencies! Any ideas?
63Discussion
?
64Discussion
!
66My CHORD Extensions: Discussion
- Distributed storage of Taxonomies and Source IDs
- Inverted Index
- Key-to-Query (SUCC) References for Taxonomies
- Reuse of existing relations
- Scalable Lookup, self-organizing search engine
- Chord, O(log N) messages per lookup
- Chord ring topology
- Taxonomy-based Routing in DHT
- Breadth-First Search
- Complex Queries
- Prefetch
- Fast Inverted Index Intersection
- Load Balancing (CUP, LBM)
- Scalable,
- distributed,
- self organizing,
- robust,
- inexpensive
- technology to store and query several kinds of meta data, such as
- Predefined Vocabularies,
- Taxonomies,
- Semantic Web Service Descriptions
- ...
67My CHORD Extensions: Discussion
- Extending existing NARSES event driven Chord simulation
- Simulating 5000 Catalogue Nodes, 200,000 Data Sources, 30,000 Categories
- Categories and Models taken from DMOZ/MusicMoz
- First Results 04/2004
- Some evaluation questions
- Scalability: 100 to 5000 Catalogue Nodes
- Bandwidth: Joining/Leaving Nodes, Queries
- Robustness: Failing catalog nodes
- Load balancing: CUP/PCX, LBM
- Intersection: Sort Merge, Bloom Filter, Gap Compression, Incremental Intersection
68Benefit for the Semantic Web: Distributed Semantic Web Service Repository
69Benefit for the Semantic Web: What's possible to store and query in a DHT?
70Benefit for the Semantic Web: Possible Tasks for me at HP (?)
- Investigate methods to store and query existing Web Service Descriptions in a DHT
- Simulate storage and query of 10,000s of taxonomy-based source descriptions in a DHT
71Benefit for the Semantic Web: Some Visions
- Distributed Semantic Search in Gnutella: Who shares MP3 files of MusicMoz/Style/Brazilian/Bossa Nova by MusicMoz/Composer/Gilberto? (Alex)
- Distributed Semantic Web Service Repository: Find in all online bookshops books where dc:author = Goethe and dc:language = de (SWWS, HP)
- Web of Personalized Servers: Who shares documents with the keywords "Information Integration" in my company? (YouServ, IBM)
- Distributed fresh Google: Find documents about "Information Integration". Fresh results, no one-month indexing delay (P-Store, HP)
- Distributed Educational Meta Data Repository: Find any resource where dc:language is equal to de and lom:context is equal to undergrad (EDUTELLA)
My Focus
72Selected Publications
- Super Peer-based Routing and Clustering Strategies for RDF-based Peer-to-Peer Networks. Nejdl, Löser et al., 12th WWW, 2003
- Information Integration in Schema-based Peer-to-Peer Networks. Löser, Nejdl et al., 15th CAISE, 2003
- Semantic Overlay Clusters within Super-Peer Networks. Löser, Naumann, Nejdl, 29th VLDB, Workshop Paper, 2003
- Super-Peer-Based Routing Strategies for RDF-Based Peer-to-Peer Networks. Nejdl, Löser et al., Elsevier Journal on Web Semantics
- Efficient Data Store Discovery in a Scientific P2P Network. Löser, Wolpers, Siberski, Nejdl, 2nd ISWC, Workshop Paper, 2003
74P2P State of the Art: Some People (at HP)
- at HP
- Mary Baker, Zhichen Xu, ... (Internet Systems Storage Lab, Palo Alto)
- Bernard Burg??? (Distributed Semantic Metadata Repository)
- in Europe
- Karl Aberer (Switzerland), Rudi Studer (Ger), Wolfgang Nejdl (Ger), Clemens Böhm (Ger), Peter Triantafillou (Greece), ...
- in the US
- Hector Garcia-Molina, Vana Kalogeraki, Ion Stoica, Amit P. Sheth, Mema Roussopoulos, ...