1
Taxonomy-based Routing in P2P Networks
  • Alexander Löser
  • HPLB, Friday, 12th of March 2004

2
Hello,
  • I'm Alexander Löser (aloeser_at_cs.tu-berlin.de)
  • ö: Alt 148, ö = oe
  • Round your mouth to say the letter "o" and say
    "e" instead - round your lips and push them
    forward, like a fish, or as in puckering to kiss

3
Hello,
  • I'm Alexander Löser (aloeser_at_cs.tu-berlin.de)
  • Graduate Research Associate/Project Manager @
    University of Technology Berlin, Group
    "Computation and Information Structures"
  • Research Interests
    • Semantic Query Routing
    • Distributed SW applications for MP3 Searching
    • P2P-based storage and retrieval of meta data
  • Research activities with
    • Fraunhofer Institute ISST (Berlin)
    • Edutella Project (L3S Hannover)

4
Agenda
  • Networking
  • Information Integration in Berlin
  • My Work Area
  • Semantic Query Routing
  • Benefits of my work for the Semantic Web
  • P2P People @ HP

5
Networking: Data Integration in Berlin/Hannover
  • University of Technology
    • Correspondences, Mapping Rules (Busse)
    • Mediator-Based Information Integration
      • in an EU-wide Migration Data Network (Kutsche)
      • of E-Learning Sources (ROODOLF, VELO) (Löser)
    • Taxonomy-based Query Routing (Löser)
  • Humboldt University
    • Knowledge Management in Bioinformatics (Leser)
    • Schema matching, schema mapping and data
      transformation (Naumann)
  • Learning Lab Lower Saxony, Hannover
    • Personalized Access to Distributed Learning
      Repositories (Nejdl)

6
My Work Area: Adaptive Taxonomy-based Overlays
  • Problem
    • Broadcasting all queries to all information
      sources does not scale
  • Idea
    • Route queries only to those data sources that
      can (possibly) provide results

7
Adaptive Taxonomy-based Overlays
  • Assumptions
    • Large number of autonomous data sources
    • Data sources are dynamically available
    • Each data source provides a description, e.g.
      a classification within one or more taxonomies
  • My Application Scenarios
    • Distributed Educational Learning Materials
    • MP3 File Sharing

11
Adaptive Taxonomy-based Overlays
How will a learner discover relevant data sources
for a particular topic?
DMOZ /Top/Computers//UML/Education/
15
Conceptual idea: Logical Overlay Structures
Taxonomy-based Overlays: DMOZ /Top/Computers//UML/Education/
SourceID: A, B, C
16
Adaptive Taxonomy-based Overlays
Taxonomy-based Overlays: DMOZ /Top/Computers//UML/Education/
Local Result Computation
SourceID: A, B, C
http://cis.tu-berlin.de/NE/awc.htm
DMOZ /Top/Computers//UML/Education/
17
Problem Definition: P2P-based Catalogue
SourceID
Catalogue of all available data source models
Classified Models
Classified Query
-> Classification: DMOZ /Top/Computers//UML/Education/
SourceID: A -> Classification: DMOZ /Top/Computers//UML/Education/
18
Why a distributed Catalogue?
  • No central point of power
    • Participants keep control over their materials
    • Not interested in central
      administration/organization
  • Self-organizing
    • Nodes join and leave the catalogue automatically
    • No central organization needed
  • Low cost
    • P2P search system is literally free
    • Costs are distributed among participants
  • Scalability
    • Need to register 100,000s of participants
  • Availability, Robustness
    • DDoS attacks will not affect P2P

Philosophy
Costs
Technical
21
Why a distributed Catalogue?
Roussopoulos, Giuli, Baker, Maniatis, Rosenthal,
Mogul: "2 P2P or Not 2 P2P?", IPTPS '04
22
How does it work?
  • Data Source Descriptions
  • Distributed Hash Tables
  • Inverted Indices
  • Key-2-Query References
  • Load Balancing Methods
  • -------------------------------------
  • Distributed Data Source Catalogue

23
Distributed Hash Tables: The lookup problem
131.196.45.61 stores DMOZ /Top/Computers//UML/Education/
22.65.144.31 requests DMOZ /Top/Computers//UML/Education/
24
Distributed Hash Tables: The lookup problem
  • Uses a hash function
    • Key identifier = SHA-1(key)
    • Node identifier = SHA-1(IP address)

DMOZ /Top/Computers//UML/Education/ -> K54
131.196.45.61 -> N54
22.65.144.31 -> N8
131.196.45.61 stores DMOZ /Top/Computers//UML/Education/
22.65.144.31 requests DMOZ /Top/Computers//UML/Education/
25
Distributed Hash Tables: The lookup problem
  • Uses a hash function
    • Key identifier = SHA-1(key)
    • Node identifier = SHA-1(IP address)

DMOZ /Top/Computers//UML/Education/ -> K54
131.196.45.61 -> N54
22.65.144.31 -> N8
N56 stores K54
N8 looks up K54
27
Distributed Hash Tables: The lookup problem
  • Uses a hash function
    • Key identifier = SHA-1(key)
    • Node identifier = SHA-1(IP address)
  • Both are uniformly distributed
  • Both exist in the same ID space
  • A key is stored at its successor: the node with
    the next-higher ID
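
The mapping on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the identifier space is cut down to 16 bits for readability (Chord itself uses the full SHA-1 output), the third IP address is made up, and the helper names chord_id and successor are hypothetical.

```python
import hashlib
from bisect import bisect_left

M = 16  # identifier bits (assumption; shortened from SHA-1's 160 bits)

def chord_id(text, bits=M):
    """Hash a key or an IP address into the shared identifier space."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)

def successor(key_id, node_ids):
    """Return the first node ID >= key_id, wrapping around the ring."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]

# Two IPs from the slides plus one made-up address (assumption).
nodes = [chord_id(ip) for ip in ("131.196.45.61", "22.65.144.31", "10.0.0.7")]
key = chord_id("DMOZ/Top/Computers/UML/Education/")
home = successor(key, nodes)  # node ID that stores the key
```

Keys and nodes share one ID space, so finding the storing node is a single successor search over the sorted node IDs.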

28
Distributed Hash Tables: The lookup problem
29
Distributed Hash Tables: The lookup problem
30
Distributed Hash Tables: The lookup problem
  • Chord (Stoica2001) characteristics
    • One operation: map a key to a node
    • Provides peer-to-peer hash lookup
    • Efficient: O(log n) messages per lookup
    • Robust as nodes fail and join (1.6 unanswered
      messages if 50% of the nodes fail)
    • Good primitive for peer-to-peer systems
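
A rough sketch of how finger tables give the logarithmic lookup, assuming a 7-bit identifier space and the node IDs that appear on the later ring slides (N5 ... N110). The class and function names are hypothetical, and the ring is simulated in one process rather than over a network.

```python
from bisect import bisect_left

M = 7             # identifier bits (assumption): IDs live in 0..127
SIZE = 2 ** M

def in_half_open(x, a, b):
    """True if x lies in the clockwise ring interval (a, b]."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # interval wraps past zero; a == b covers the whole ring

class ChordRing:
    def __init__(self, node_ids):
        self.nodes = sorted(node_ids)
        # fingers[n][i] is the successor of n + 2**i, as in Chord's finger table
        self.fingers = {
            n: [self.successor((n + 2 ** i) % SIZE) for i in range(M)]
            for n in self.nodes
        }

    def successor(self, ident):
        """First node whose ID is >= ident, wrapping around the ring."""
        i = bisect_left(self.nodes, ident)
        return self.nodes[i % len(self.nodes)]

    def lookup(self, start, key):
        """Return (responsible node, list of nodes visited)."""
        n, path = start, [start]
        while True:
            succ = self.fingers[n][0]
            if in_half_open(key, n, succ):
                path.append(succ)
                return succ, path
            nxt = succ
            # forward to the farthest finger that still precedes the key
            for f in reversed(self.fingers[n]):
                if f != key and in_half_open(f, n, key):
                    nxt = f
                    break
            n = nxt
            path.append(n)

ring = ChordRing([5, 10, 20, 32, 60, 80, 99, 110])
home, path = ring.lookup(80, 12)
```

On this ring, a lookup of K12 that starts at N80 ends at N20. Each forwarding step jumps to the farthest finger that still precedes the key, which is where the O(log n) bound comes from.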

31
Technologies: Distributed Hash Tables
Data Source Model: Peer PID A, PATH
ClassificationPath = DMOZ/Top/Computers/UML/Education/
Consistent Hashing: K12 = SHA-1(PATH
ClassificationPath = DMOZ/Top/Computers/UML/Education/)
Hash Table
32
Technologies: Distributed Hash Tables
(Figure: Chord ring with nodes N5, N10, N20, N32, N60,
N80, N99, N110. Insert K12: PID A.)
33
Technologies: Distributed Hash Tables
34
Technologies: Distributed Hash Tables
(Figure: Chord ring with nodes N5, N10, N20, N32, N60,
N80, N99, N110.)
35
Technologies: Distributed Hash Tables
(Figure: Chord ring with nodes N5, N10, N20, N32, N60,
N80, N99, N110. Stored entry K12: PID A. Request
Lookup(K12).)
36
Technologies: Distributed Hash Tables
(Figure: Chord ring with nodes N5, N10, N20, N32, N60,
N80, N99, N110.)
37
Technologies: Distributed Hash Tables
(Figure: Chord ring with nodes N5, N10, N20, N32, N60,
N80, N99, N110.)
ClassificationPath: dmoz/Top/UML/Education/
PID A is classified as ClassificationPath
dmoz/Top/UML/Education
Which sources are classified as
dmoz/Top/UML/Education/?
38
Technologies: Inverted Index
Problem: Two data sources use the same classification
PID A: PATH ClassificationPath = DMOZ/Top/Computers/UML/Education/
PID B: PATH ClassificationPath = DMOZ/Top/Computers/UML/Education/
Hash Table
39
Technologies: Inverted Index
(Figure: Chord ring with nodes N5, N10, N20, N32, N60,
N80, N99, N110. Insert at K12: PID A. Insert at K12:
PID B.)
40
Technologies: Inverted Index
(Figure: Chord ring with nodes N5, N10, N20, N32, N60,
N80, N99, N110. Stored entry K12: PID A, PID B.
Lookup(K12) = PID A, PID B.)
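
The inverted index on these slides can be approximated as a set of source IDs per key, held by the key's successor node. A minimal sketch, assuming the ring from the figure and a hypothetical CatalogueDHT class that simulates all nodes in one process:

```python
from bisect import bisect_left
from collections import defaultdict

class CatalogueDHT:
    """Toy single-process stand-in for the distributed catalogue."""

    def __init__(self, node_ids):
        self.nodes = sorted(node_ids)
        # each node keeps an inverted index: key ID -> set of source IDs
        self.store = {n: defaultdict(set) for n in self.nodes}

    def home(self, key_id):
        """Successor node of the key, wrapping around the ring."""
        i = bisect_left(self.nodes, key_id)
        return self.nodes[i % len(self.nodes)]

    def insert(self, key_id, pid):
        self.store[self.home(key_id)][key_id].add(pid)

    def lookup(self, key_id):
        # .get() avoids creating an empty entry for unknown keys
        return set(self.store[self.home(key_id)].get(key_id, set()))

dht = CatalogueDHT([5, 10, 20, 32, 60, 80, 99, 110])
dht.insert(12, "PID A")
dht.insert(12, "PID B")
sources = dht.lookup(12)
```

Because both inserts hash to the same key, they land in one posting set on the same node, so a single lookup returns both sources.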
41
(No Transcript)
42
Technologies: Key-2-Query References
Query: Which data sources store materials about
/Top/Computer/UML/Education?
/Top/Computer/UML/Education: PID A, B
43
Technologies: Key-2-Query References
Query: Which data sources store materials about
/Top/Computer/UML/Education?
K12: PID A, PID B
44
Technologies: Key-2-Query References
Query: Which data sources store materials about
/Top/Computer/UML/Education AND specialized
topics?
K12: PID A, PID B
K02: PID C
45
Technologies: Key-2-Query References
Problem: Storing the successor relationship
/Top
46
Technologies: Key-2-Query References
/Top
Storage of additional Key-to-Query relations
47
Technologies: Key-2-Query References
48
(No Transcript)
49
Technologies: BFS Lookup
Problem: Peers and their successors
/Top/Computers/UML/
Query
50
Technologies: BFS Lookup
Problem: Peers and their successors
/Top/Computers/UML/
/Top
/Top/Computer
Query
/Top/Computer/UML
/Top/Computer/UML/Education: PID A, PID B
/Top/Computer/UML/Education/UseCase: PID C
Solution: Breadth-First Search in the DHT
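
One way to picture the breadth-first search over successor references. This is a sketch, not the talk's implementation: a plain dict stands in for the DHT, its contents are modelled on the paths listed on the slide, and the names catalogue, dht_lookup and bfs_lookup are hypothetical.

```python
from collections import deque

# Hypothetical catalogue: path -> (registered sources, child paths).
catalogue = {
    "/Top": (set(), ["/Top/Computer"]),
    "/Top/Computer": (set(), ["/Top/Computer/UML"]),
    "/Top/Computer/UML": (set(), ["/Top/Computer/UML/Education"]),
    "/Top/Computer/UML/Education": (
        {"PID A", "PID B"},
        ["/Top/Computer/UML/Education/UseCase"],
    ),
    "/Top/Computer/UML/Education/UseCase": ({"PID C"}, []),
}

def dht_lookup(path):
    """Stand-in for one DHT lookup of the hashed path (assumption)."""
    return catalogue.get(path, (set(), []))

def bfs_lookup(start_path):
    """Collect sources registered at start_path and at all its successors."""
    found, seen = set(), {start_path}
    queue = deque([start_path])
    while queue:
        sources, children = dht_lookup(queue.popleft())
        found |= sources
        for child in children:
            if child not in seen:  # guards against cycles in the references
                seen.add(child)
                queue.append(child)
    return found

result = bfs_lookup("/Top/Computer/UML")
```

Each dequeued path costs one DHT lookup, so the message count grows with the number of categories below the queried one.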
51
(No Transcript)
52
Technologies: Complex Models
  • Model PID D
  • PATH ClassificationPath = TU/NE/Modeling/UML
  • PATH ClassificationPath =
    DMOZ/Top/Computers/UML/Education/UML

53
Technologies: Complex Models
  • Model PID D
  • PATH ClassificationPath = TU/NE/Modeling/UML
    (K05)
  • PATH ClassificationPath =
    DMOZ/Top/Computers/UML/Education/UML (K12)

54
Technologies: Complex Queries
All data sources storing UML materials classified as
PATH ClassificationPath = DMOZ/Top/Computers/UML/Education/UML (K12)
AND PATH ClassificationPath = TU/NE/Modeling/UML (K05)
Lookup (K12) AND (K05)
58
Technologies: Complex Queries
Query: Lookup (K12) AND (K05)
  • Lookup (K12) = PID A, B, D
  • Lookup (K05) = PID D
  • Intersection: (PID A, B, D) AND (PID D)
  • Intersection = PID D
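
The conjunctive query above comes down to one lookup per key followed by a set intersection. A small sketch, with a dict standing in for the DHT and hypothetical helper names lookup and lookup_and:

```python
# Posting sets taken from the slide; the dict stands in for the DHT.
index = {
    "K12": {"PID A", "PID B", "PID D"},
    "K05": {"PID D"},
}

def lookup(key):
    """Stand-in for a DHT lookup; returns a copy of the posting set."""
    return set(index.get(key, ()))

def lookup_and(*keys):
    """Intersect the result sets of all keys, smallest set first."""
    results = sorted((lookup(k) for k in keys), key=len)
    if not results:
        return set()
    answer = results[0]
    for other in results[1:]:
        answer &= other
    return answer

result = lookup_and("K12", "K05")
```

Starting from the smallest posting set keeps the intermediate result small, which matters once each set has to travel between catalogue nodes.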

59
Technologies: Fast Inverted Index Intersection
  • Sort keys, use structure to intersect in
    partitions
  • Sort-Merge Join
  • Zig-Zag Join
  • Adaptive Set Intersection
  • Prefetch
  • Compress intersected keys, compare compressions
  • Gap Compression
  • Bloom Filter
  • Caching: precomputation of popular combinations
  • Intersect only best keys
  • Incremental Intersection, Ranking
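
Two of the listed techniques, sort-merge join and zig-zag join, can be sketched on sorted posting lists. The code assumes each list is sorted and free of duplicates; the function names and the integer source IDs are illustrative only.

```python
from bisect import bisect_left

def sort_merge_intersect(a, b):
    """Intersect two sorted lists by advancing one element at a time."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def zigzag_intersect(a, b):
    """Like sort-merge, but skip ahead with binary search on each mismatch."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i = bisect_left(a, b[j], i + 1)  # jump to first a[k] >= b[j]
        else:
            j = bisect_left(b, a[i], j + 1)  # jump to first b[k] >= a[i]
    return out

left = [1, 3, 5, 7, 9, 11]
right = [3, 4, 9, 10, 11]
common = zigzag_intersect(left, right)
```

Both functions return the same result. The zig-zag variant pays off when one list is much longer than the other, because long runs of non-matching IDs are skipped in a single step.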

60
Technologies: Load Balancing
  • Problem: Models and queries are not equally
    distributed and requested among catalogue nodes
  • Query Load Balancing (too many queries on one
    node)
    • CUP: Controlled Update Propagation
      (Roussopoulos03)
    • PCX: Path Caching with Expiration (Stoica01)
  • Index Load Balancing (too many registered data
    sources at one node)
    • Index Load Balancing Directory (Steenkiste02)
    • Virtual Server (Stoica01)

61
(No Transcript)
62
Open Issues: Inter-Taxonomy Relationships
NE
DMOZ
/InfMod/DB/Books
/../DataBases/Books
Problems: cycles, inconsistencies! Any ideas?
63
Discussion
?
64
Discussion
!
66
My CHORD Extensions: Discussion
  • Distributed storage of Taxonomies and Source IDs
    • Inverted Index
    • Key-to-Query (SUCC) References for Taxonomies
    • Reuse of existing relations
  • Scalable lookup, self-organizing search engine
    • Chord: O(log N) messages per lookup
    • Chord ring topology
  • Taxonomy-based Routing in DHT
    • Breadth-First Search
  • Complex Queries
    • Prefetch
    • Fast Inverted Index Intersection
  • Load Balancing (CUP, LBM)
  • A scalable, distributed, self-organizing, robust,
    inexpensive technology to store and query several
    kinds of meta data, such as
    • Predefined Vocabularies
    • Taxonomies
    • Semantic Web Service Descriptions
    • ...

67
My CHORD Extensions: Discussion
  • Extending the existing NARSES event-driven Chord
    simulation
  • Simulating 5,000 catalogue nodes, 200,000 data
    sources, 30,000 categories
  • Categories and models taken from DMOZ/MusicMoz
  • First results: 04/2004
  • Some evaluation questions
    • Scalability: 100 to 5,000 catalogue nodes
    • Bandwidth: joining/leaving nodes, queries
    • Robustness: failing catalogue nodes
    • Load balancing: CUP/PCX, LBM
    • Intersection: Sort-Merge, Bloom Filter, Gap
      Compression, Incremental Intersection

68
Benefit for the Semantic Web: Distributed Semantic
Web Service Repository
69
Benefit for the Semantic Web: What's possible to
store and query in a DHT?
70
Benefit for the Semantic Web: Possible Tasks for me
@HP (?)
  • Investigate methods to store and query existing
    Web Service Descriptions in a DHT
  • Simulate storage and querying of 10,000s of
    taxonomy-based source descriptions in a DHT

71
Benefit for the Semantic Web: Some Visions
  • Distributed Semantic Search in Gnutella: Who
    shares MP3 files of
    MusicMoz/Style/Brazilian/Bossa Nova by
    MusicMoz/Composer/Gilberto? (Alex)
  • Distributed Semantic Web Service Repository: Find,
    in all online bookshops, books where
    dc:author = Goethe and dc:language = de. (SWWS, HP)
  • Web of Personalized Servers: Who shares documents
    with the keywords "Information Integration" in my
    company? (YouServ, IBM)
  • Distributed fresh Google: Find documents about
    "Information Integration". Fresh results, no
    one-month indexing delay. (P-Store, HP)
  • Distributed Educational Meta Data Repository: Find
    any resource where dc:language is equal to "de"
    and lom:context is equal to "undergrad".
    (EDUTELLA)

My Focus
72
Selected Publications
  • Super Peer-based Routing and Clustering
    Strategies for RDF-based Peer-to-Peer Networks.
    Nejdl, Löser et al., 12th WWW, 2003
  • Information Integration in Schema-based
    Peer-to-Peer Networks. Löser, Nejdl et al., 15th
    CAISE, 2003
  • Semantic Overlay Clusters within Super-Peer
    Networks. Löser, Naumann, Nejdl, 29th VLDB,
    workshop paper, 2003
  • Super-Peer-Based Routing Strategies for RDF-Based
    Peer-to-Peer Networks. Nejdl, Löser et al.,
    Elsevier's Journal on Web Semantics
  • Efficient Data Store Discovery in a Scientific
    P2P Network. Löser, Wolpers, Siberski, Nejdl, 2nd
    ISWC, workshop paper, 2003

73
(No Transcript)
74
P2P State of the Art: Some People (@HP)
  • @HP
    • Mary Baker, Zhichen Xu, ...
      (Internet Systems Storage Lab, Palo Alto)
    • Bernard Burg??? (Distributed Semantic Metadata
      Repository)
  • @Europe
    • Karl Aberer (Switzerland), Rudi Studer (Ger),
      Wolfgang Nejdl (Ger), Clemens Böhm (Ger), Peter
      Triantafillou (Greece), ...
  • @US
    • Hector Garcia-Molina, Vana Kalogeraki, Ion
      Stoica, Amit P. Sheth, Mema Roussopoulos, ...