Title: Seminar: Information Management in the Web
1Seminar Information Management in the Web
- Gnutella, Freenet and more: an overview of file sharing architectures
- Thomas Zahn
2Peer-to-Peer - Introduction
- "opposite" of Client/Server
- no central servers -> information highly distributed
- every peer acts as a client AND server
- -> can query, reply to queries and route messages at the same time
- every peer can directly "talk" to any other peer
3Popular Peer-to-Peer Networks
- Napster
- Gnutella
- Freenet
- FastTrack (Kazaa)
- CHORD, CAN, PASTRY, TAPESTRY
4Napster
- was used primarily for file sharing
- NOT a pure peer-to-peer network
- -> hybrid system
- peer turns to central DB for querying (client/server)
- peer downloads directly from other peer(s) (peer-to-peer)
5Napster
[Figure: peers 1-6 arranged around the central DB]
1. Query (peer -> central DB)
2. Response (central DB -> peer)
3. Download Request (peer -> peer)
4. File (peer -> peer)
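The four steps above can be sketched in a few lines of Python. This is only an illustration of the hybrid idea, not Napster's actual protocol: the names `CentralIndex`, `Peer`, `register` and `fetch` are hypothetical, and everything runs in one process instead of over a network.

```python
from collections import defaultdict


class CentralIndex:
    """Stands in for the central DB: maps file names to the peers sharing them."""

    def __init__(self):
        self._index = defaultdict(set)

    def register(self, peer_name, file_names):
        for file_name in file_names:
            self._index[file_name].add(peer_name)

    def query(self, file_name):
        # Steps 1 + 2: the query goes to the central DB, which answers
        # with the names of the peers that share the file.
        return sorted(self._index.get(file_name, set()))


class Peer:
    def __init__(self, name, files):
        self.name = name
        self.files = dict(files)  # file name -> content

    def serve(self, file_name):
        # Step 4: the file is delivered by the peer itself, not by the central DB.
        return self.files[file_name]

    def fetch(self, index, peers, file_name):
        holders = [p for p in index.query(file_name) if p != self.name]
        if not holders:
            return None
        # Step 3: download request goes directly to another peer.
        content = peers[holders[0]].serve(file_name)
        self.files[file_name] = content
        return content


index = CentralIndex()
peers = {
    "peer1": Peer("peer1", {}),
    "peer2": Peer("peer2", {"song.mp3": b"audio bytes"}),
}
for peer in peers.values():
    index.register(peer.name, peer.files)

print(peers["peer1"].fetch(index, peers, "song.mp3"))
```

Only the lookup is client/server; if the central DB disappears, peers can no longer find each other, even though the files themselves never pass through it.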
6Gnutella - overview
- pure peer-to-peer
- used for file sharing
- very popular -> practically proven?
- very simple protocol
- no routing "intelligence"
- messages are always broadcast
7Gnutella - PING/PONG
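[Figure: peer 1, whose only known host is peer 2, sends Ping 1; the ping is broadcast on to peers 3-8]
- every peer that receives Ping 1 answers with a Pong (Pong 2 ... Pong 8)
- Pongs are relayed back towards peer 1 (e.g. Pong 3,4,5 and Pong 6,7,8)
- known hosts of peer 1 afterwards: 2, 3,4,5, 6,7,8
- Query/Response analogous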
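The broadcast behaviour can be sketched as a breadth-first flood. The topology below is an assumption (a tree with peer 2 connected to 3 and 6, which in turn connect to 4, 5 and 7, 8); the figure does not fix the exact links. The TTL parameter and the suppression of already seen messages are taken from the Gnutella protocol in general, not from the slide, and `flood_ping` is a hypothetical helper name.

```python
from collections import deque

# Assumed example topology: peer -> directly connected peers.
EXAMPLE_TOPOLOGY = {
    1: [2],
    2: [1, 3, 6],
    3: [2, 4, 5],
    4: [3],
    5: [3],
    6: [2, 7, 8],
    7: [6],
    8: [6],
}


def flood_ping(neighbors, origin, ttl):
    """Broadcast a ping from `origin`.

    Returns the sorted list of peers that answer with a Pong and the
    number of ping messages sent over the network.
    """
    seen = {origin}
    messages = 0
    queue = deque([(origin, None, ttl)])  # (peer, sender, remaining TTL)
    while queue:
        peer, sender, remaining = queue.popleft()
        if remaining == 0:
            continue  # TTL used up: the ping is not forwarded any further
        for neighbor in neighbors[peer]:
            if neighbor == sender:
                continue  # never send the ping back where it came from
            messages += 1  # every forwarded copy costs traffic, duplicates too
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, peer, remaining - 1))
    return sorted(seen - {origin}), messages


pongs, messages = flood_ping(EXAMPLE_TOPOLOGY, origin=1, ttl=3)
print("Pongs from:", pongs, "ping messages:", messages)
```

No peer makes a routing decision here: every ping is copied to every neighbour, which is exactly why the traffic grows so quickly with the number of connections and the TTL.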
8Gnutella - Pro & Con
Pro:
- VERY simple protocol -> easy to implement
- very little overhead
- practically proven functionality (?)
Con:
- message broadcasts flood network
- -> heavy network traffic
- -> bad, bad scalability
9Gnutella Reachable Peers
10Gnutella Generated Traffic in Bytes (1)
- query message length: 83 bytes
- simple query relaying (no responses)
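A back-of-the-envelope sketch of how such numbers can be computed. It assumes an idealized topology that the slides do not state: every peer holds `n` connections, there are no cycles, and every reachable peer receives the query exactly once. The function names are hypothetical; only the 83-byte query length comes from the slide.

```python
QUERY_BYTES = 83  # query message length from the slide


def reachable_peers(n, ttl):
    """Peers reached by a broadcast in an idealized tree topology.

    The first hop reaches n peers; every further hop multiplies the
    newly reached peers by (n - 1), since one connection leads back.
    """
    return sum(n * (n - 1) ** (hop - 1) for hop in range(1, ttl + 1))


def query_traffic_bytes(n, ttl, message_bytes=QUERY_BYTES):
    """Traffic for simple query relaying (no responses): one copy per reached peer."""
    return message_bytes * reachable_peers(n, ttl)


for n, ttl in [(4, 5), (4, 7), (8, 8)]:
    print(n, ttl, reachable_peers(n, ttl), query_traffic_bytes(n, ttl))
```

Under these assumptions 4 connections and a TTL of 5 already reach 484 peers and generate 40,172 bytes for a single query; real networks contain cycles, so the numbers are an upper-end estimate of reach rather than a measurement.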
11Gnutella Generated Traffic in Bytes (2)
- Mean percentage of users who typically share content: 30%
- Mean percentage of users who typically have responses to search queries: 40%
- Mean number of search responses the typical respondent offers: 10
- Mean length of search responses the typical respondent offers: 60 bytes
- -> "Standard client settings yield a whopping 17MB generated in response to search query "
12Freenet - Concepts
- peer-to-peer file storage & retrieval system
- every document has a globally unique ID
- efficient (?) retrieval algorithm
- documents are retrieved with sublinear effort
- routing based on likelihood of answer capability
- focus on security
13Freenet Query Routing (1)
- every peer maintains routing table
- table contains known peers along with the IDs of the documents they are storing
- a request is routed to the peer most likely to have an answer (closest matching ID)
- responses are sent back upstream
- intermediate peers also store document and
augment their routing tables
14Freenet Query Routing (2)
[Figure: peers A, B, C and D, each with a routing table and a document cache]
- A: Routing Table: B (14, 20), X (47, 60); Doc Cache: 5, 89
- B: Routing Table: C (19, 30), D (45, 51); Doc Cache: 14, 20
- C: Routing Table: B (14, 20); Doc Cache: 19, 30
- D: Routing Table: B (14, 20), Z (105, 110); Doc Cache: 17, 45, 51, 102, 205
1. Query for doc 17 (A -> B)
2. Forward to best match (B -> C)
3. C has no match -> backtrack
4. Forward query to 2nd best match (B -> D)
5. Send back doc 17 (D -> B)
6. Route back response (B -> A)
Tables after the response:
- A: Routing Table: B (14, 17, 20), X (47, 60); Doc Cache: 5, 17, 89
- B: Routing Table: C (19, 30), D (17, 45, 51); Doc Cache: 14, 17, 20
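The example above can be replayed with a small simulation. This is a sketch under assumptions, not Freenet's implementation: "closest matching ID" is taken to be the smallest numeric distance between the requested ID and any ID listed for a neighbour, peers that appear only in routing tables (X, Z) are treated as unreachable, and the names `Peer`, `query` and `build_example` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Peer:
    name: str
    routing: Dict[str, Set[int]] = field(default_factory=dict)  # neighbour -> known doc IDs
    cache: Set[int] = field(default_factory=set)                # locally stored doc IDs


def build_example():
    """Routing tables and document caches as shown in the figure."""
    peers = [
        Peer("A", {"B": {14, 20}, "X": {47, 60}}, {5, 89}),
        Peer("B", {"C": {19, 30}, "D": {45, 51}}, {14, 20}),
        Peer("C", {"B": {14, 20}}, {19, 30}),
        Peer("D", {"B": {14, 20}, "Z": {105, 110}}, {17, 45, 51, 102, 205}),
    ]
    return {peer.name: peer for peer in peers}


def query(network, peer_name, doc_id, visited=None, trace=None):
    """Depth-first routing with backtracking; returns (found, visited peers in order)."""
    visited = set() if visited is None else visited
    trace = [] if trace is None else trace
    visited.add(peer_name)
    trace.append(peer_name)
    peer = network[peer_name]
    if doc_id in peer.cache:
        return True, trace
    # Rank neighbours by how close their known IDs are to the requested ID.
    ranked = sorted(
        (name for name in peer.routing if name in network),
        key=lambda name: min(abs(key - doc_id) for key in peer.routing[name]),
    )
    for next_peer in ranked:
        if next_peer in visited:
            continue
        found, _ = query(network, next_peer, doc_id, visited, trace)
        if found:
            # On the way back upstream: store the document and augment the table.
            peer.cache.add(doc_id)
            peer.routing[next_peer].add(doc_id)
            return True, trace
    return False, trace  # nothing found below this peer -> backtrack


network = build_example()
found, trace = query(network, "A", 17)
print(found, trace)
print(sorted(network["A"].cache), sorted(network["B"].routing["D"]))
```

The trace visits C before D because 19 is closer to 17 than 45 is, which is the backtracking step of the figure; afterwards A and B hold doc 17 in their caches and routing tables.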
15Freenet Document Insert
- analogous to query routing
- insert is routed to the peer most likely to be interested in new doc (closest matching ID)
- intermediate peers cache document and augment routing tables
- until TTL is reached
16Freenet - Discussion
- efficient routing algorithm (compared to Gnutella)
- adequate security features/heuristics (the more popular a document, the more frequently it gets cached)
- no metasearch
- no updates or deletes possible
- worst case query routing: DFS
17FUtella Concepts
- peer-to-peer platform for general knowledge sharing
- tries to model learning style of humans
- content-based routing
- combines and extends approaches from
- Gnutella (message format)
- JXTA (peer groups)
- JXTA Search (queryspaces and registrations)
- Freenet (routing of registration discoveries)
18FUtella - Knowledge Groups
[Figure: a knowledge group inside the FUtella net]
- Knowledge Group with Queryspace "Computer Architecture"
- Group Head: peer E, inserts the group's registration into the FUtella net
- Members: M1 ... Mi
19FUtella - Knowledge Group Discovery 1
[Figure: peers A, B, C and D, each with a routing table and a registration cache]
- A: Routing Table: "computer" -> B, "data base" -> X; Registration Cache: "computer" B, "data base" X
- B: Routing Table: "computer analysis" -> C, "computer systems" -> D, "data base" -> A; Registration Cache: "computer analysis" Y, "computer systems" Z, "data base" X
- C: Routing Table: "computer" -> B, "computer analysis" -> Y; Registration Cache: "computer" B, "computer analysis" Y
- D: Routing Table: "computer" -> B, "computer systems" -> Z, "computer architecture" -> E; Registration Cache: "computer systems" Z, "computer" B, "computer architecture" E
1. Discovery request "computer architecture" (A -> B)
2. Forward discovery request (B -> C)
3. C has no cached registration for "computer architecture" -> backtrack
4. Forward discovery request to 2nd best match (B -> D)
20FUtella - Knowledge Group Discovery 2
[Figure: peers A, B and D after the discovery response has been routed back]
5. Discovery response (D -> B), containing registration "computer architecture" E
6. Forward discovery response (B -> A)
Tables after the response:
- A: Routing Table: "computer" -> B, "computer architecture" -> D, "data base" -> X; Registration Cache: "computer" B, "computer architecture" E, "data base" X
- B: Routing Table: "computer analysis" -> C, "computer architecture" -> D, "computer systems" -> D, "data base" -> A; Registration Cache: "computer analysis" Y, "computer architecture" E, "computer systems" Z, "data base" X
- D: Routing Table: "computer" -> B, "computer systems" -> Z, "computer architecture" -> E; Registration Cache: "computer systems" Z, "computer" B, "computer architecture" E
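The discovery of a knowledge group can be replayed in a short sketch. The slides do not say how the "best match" between queryspaces is computed; the code below assumes the number of shared words, with ties broken by routing-table order, which happens to reproduce the order C before D. Peers that appear only in tables (X, Y, Z, E) are treated as unreachable for discovery, and the names `Peer`, `overlap`, `discover` and `build_example` are hypothetical.

```python
class Peer:
    def __init__(self, name, routing, registrations):
        self.name = name
        self.routing = dict(routing)              # queryspace -> next peer
        self.registrations = dict(registrations)  # queryspace -> group head


def overlap(queryspace_a, queryspace_b):
    """Assumed similarity measure: number of words two queryspaces share."""
    return len(set(queryspace_a.split()) & set(queryspace_b.split()))


def build_example():
    """Routing tables and registration caches before the discovery."""
    peers = [
        Peer("A", {"computer": "B", "data base": "X"},
             {"computer": "B", "data base": "X"}),
        Peer("B", {"computer analysis": "C", "computer systems": "D", "data base": "A"},
             {"computer analysis": "Y", "computer systems": "Z", "data base": "X"}),
        Peer("C", {"computer": "B", "computer analysis": "Y"},
             {"computer": "B", "computer analysis": "Y"}),
        Peer("D", {"computer": "B", "computer systems": "Z", "computer architecture": "E"},
             {"computer systems": "Z", "computer": "B", "computer architecture": "E"}),
    ]
    return {peer.name: peer for peer in peers}


def discover(network, start, queryspace, visited=None):
    """Return (group head, peer that held the registration), or None if not found."""
    visited = set() if visited is None else visited
    visited.add(start)
    peer = network[start]
    if queryspace in peer.registrations:
        return peer.registrations[queryspace], start
    # Best match first; sorted() is stable, so ties keep routing-table order.
    ranked = sorted(peer.routing.items(), key=lambda item: -overlap(item[0], queryspace))
    for _, next_peer in ranked:
        if next_peer in visited or next_peer not in network:
            continue
        result = discover(network, next_peer, queryspace, visited)
        if result is not None:
            head, source = result
            # Cache the registration and remember where it was found.
            peer.registrations[queryspace] = head
            peer.routing[queryspace] = source
            return head, source
    return None  # no cached registration below this peer -> backtrack


network = build_example()
print(discover(network, "A", "computer architecture"))
print(network["A"].routing)
```

After the call, A and B both cache the registration of group head E and route "computer architecture" to D, matching the tables shown after the discovery response.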
21FUtella - Query Processing
[Figure: peers A, B, C and D; knowledge group "computer architecture" with group head E and members M1 ... Mi]
1. Discovery request "computer architecture"
2. Forward discovery request
3. C has no cached registration for "computer architecture" -> backtrack
4. Forward discovery request to 2nd best match
5. Discovery response containing cached registration
6. Forward discovery response
7. Send query
8. Forward query to member (M1 ... Mi)
9. Query response (M1 ... Mi)
22FUtella - Test Results (1)
23FUtella - Test Results (2)
24Conclusion
- first and second generation P2P systems still most widely used
- practically proven
- very flexible in terms of topology
- bad scalability (Gnutella)
- no guaranteed lower bound on query effort (Freenet)
- (scientifically) far better approach: DHTs (see next presentation)
25Questions ?