Seminar: Information Management in the Web

Slides: 26
Provided by: zahn8

1
Seminar: Information Management in the Web
  • Gnutella, Freenet and more: an overview of file
    sharing architectures
  • Thomas Zahn

2
Peer-to-Peer - Introduction
  • "opposite" of Client/Server
  • no central servers => information highly
    distributed
  • every peer acts as a client AND server
  • -> can query, reply to queries and route messages
    at the same time
  • every peer can directly "talk" to any other peer

3
Popular Peer-to-Peer Networks
  • Napster
  • Gnutella
  • Freenet
  • FastTrack (Kazaa)
  • CHORD, CAN, PASTRY, TAPESTRY

4
Napster
  • was used primarily for file sharing
  • NOT a pure peer-to-peer network
  • => hybrid system
  • peer turns to central DB for querying
    (client/server)
  • peer downloads directly from other peer(s)
    (peer-to-peer)
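
The hybrid split described above can be sketched in a few lines. This is only an illustration, not Napster's actual protocol: the class names `CentralIndex` and `Peer`, the method names and the peer names are all hypothetical.

```python
class CentralIndex:
    """Stand-in for the central DB: maps a file name to the peers offering it."""

    def __init__(self):
        self._offers = {}

    def register(self, peer_name, filename):
        self._offers.setdefault(filename, set()).add(peer_name)

    def query(self, filename):
        # Client/server part: a peer asks, the index answers with candidate peers.
        return sorted(self._offers.get(filename, ()))


class Peer:
    def __init__(self, name, index):
        self.name = name
        self.index = index
        self.files = {}

    def share(self, filename, data):
        self.files[filename] = data
        self.index.register(self.name, filename)

    def download(self, filename, peers):
        holders = self.index.query(filename)      # ask the central index
        if not holders:
            return None
        return peers[holders[0]].files[filename]  # fetch directly from another peer


index = CentralIndex()
peers = {name: Peer(name, index) for name in ("alice", "bob")}
peers["alice"].share("song.mp3", b"...audio bytes...")
print(peers["bob"].download("song.mp3", peers))
```

Only the lookup touches the central component; the file itself never passes through it.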

5
Napster
[Figure: peers 1 to 6 grouped around a central DB, with the message sequence below]
  • 1. Query
  • 2. Response
  • 3. Download Request
  • 4. File
6
Gnutella - overview
  • pure peer-to-peer
  • used for file sharing
  • very popular => practically proven ?
  • very simple protocol
  • no routing "intelligence"
  • messages are always broadcast
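
A minimal sketch of such a broadcast, assuming a hop limit (TTL) and an overlay given as an adjacency list. The function name `flood` and the eight-peer topology are made up for illustration; this is not the Gnutella wire protocol.

```python
from collections import deque


def flood(graph, origin, ttl):
    """Broadcast one message from origin; return (reached peers, messages sent)."""
    seen = {origin}
    messages = 0
    frontier = deque([(origin, None, ttl)])
    while frontier:
        peer, came_from, hops_left = frontier.popleft()
        if hops_left == 0:
            continue                      # TTL used up, do not forward any further
        for neighbour in graph[peer]:
            if neighbour == came_from:
                continue                  # never send a message back where it came from
            messages += 1
            if neighbour not in seen:     # duplicates are received but not forwarded again
                seen.add(neighbour)
                frontier.append((neighbour, peer, hops_left - 1))
    return seen - {origin}, messages


# Hypothetical overlay: peer 1 only knows peer 2, which knows 3 and 6, and so on.
graph = {1: [2], 2: [1, 3, 6], 3: [2, 4, 5], 6: [2, 7, 8],
         4: [3], 5: [3], 7: [6], 8: [6]}
print(flood(graph, 1, ttl=3))
print(flood(graph, 1, ttl=2))
```

There is no routing decision anywhere: every peer simply forwards to all neighbours, which is what makes the protocol simple and the traffic heavy.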

7
Gnutella - PING/PONG
[Figure: Ping/Pong example with peers 1 to 8. Labels: "Ping 1" on every link; "Pong 2" to "Pong 8" from the individual peers; bundled "Pong 3,4,5" and "Pong 6,7,8" on the way back; Known Hosts: 2, then 3,4,5 and 6,7,8]
  • Query/Response analogous
8
Gnutella - Pro & Con
  • VERY simple protocol => easy to implement
  • very little overhead
  • practically proven functionality (?)
  • message broadcasts flood network
  • => heavy network traffic
  • => bad, bad scalability

9
Gnutella Reachable Peers
10
Gnutella Generated Traffic in Bytes (1)
  • query message length: 83 bytes
  • simple query relaying (no responses)
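
A rough back-of-the-envelope sketch of how reach and relay traffic grow. It assumes every peer keeps the same number of connections and that the overlay contains no cycles, so the figures are upper bounds; the function names and the example values (4 connections, TTL 7) are illustrative assumptions, only the 83-byte query length comes from the slide.

```python
def reachable_peers(connections, ttl):
    """Peers reached by one broadcast in a cycle-free overlay.

    The first hop reaches `connections` peers; every further hop multiplies
    the newly reached peers by (connections - 1).
    """
    return sum(connections * (connections - 1) ** (hop - 1)
               for hop in range(1, ttl + 1))


def relay_traffic_bytes(connections, ttl, message_bytes=83):
    """Bytes generated by relaying one query, responses not counted."""
    # In a cycle-free overlay each reached peer receives the query exactly once.
    return message_bytes * reachable_peers(connections, ttl)


print(reachable_peers(4, 7))        # 4 connections per peer, TTL 7
print(relay_traffic_bytes(4, 7))
```

With 4 connections and a TTL of 7 this gives 4372 reachable peers and 362876 bytes for a single query, before any response is sent.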

11
Gnutella Generated Traffic in Bytes (2)
  • Mean percentage of users who typically share
    content: 30%
  • Mean percentage of users who typically have
    responses to search queries: 40%
  • Mean number of search responses the typical
    respondent offers: 10
  • Mean length of search responses the typical
    respondent offers: 60 bytes
  • => "Standard client settings yield a whopping 17MB
    generated in response to search query"

12
Freenet - Concepts
  • peer-to-peer file storage and retrieval system
  • every document has a globally unique ID
  • efficient (?) retrieval algorithm
  • documents are retrieved with sublinear effort
  • routing based on likelihood of answer capability
  • focus on security

13
Freenet Query Routing (1)
  • every peer maintains routing table
  • table contains known peers along with the IDs of
    the documents they are storing
  • a request is routed to the peer most likely to
    have an answer (closest matching ID)
  • responses are sent back upstream
  • intermediate peers also store document and
    augment their routing tables
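
The routing rule above can be sketched as a depth-first lookup with backtracking. This is a simplified illustration, not Freenet's implementation: document IDs are plain integers, "closest" means smallest numeric distance, and the `Peer` class and function names are hypothetical. The tables are those of the worked example with peers A to D on the next slide; which table belongs to which peer is inferred from its entries.

```python
class Peer:
    def __init__(self, cache, table):
        self.cache = set(cache)   # IDs of locally stored documents
        # neighbour name -> document IDs that neighbour is believed to store
        self.table = {n: set(ids) for n, ids in table.items()}


def by_closeness(table, doc_id):
    """Neighbours ordered by their closest known document ID."""
    return sorted(table, key=lambda n: (min(abs(i - doc_id) for i in table[n]), n))


def query(net, name, doc_id, ttl, visited, trace):
    """Depth-first lookup with backtracking; True if the document was found."""
    visited.add(name)
    trace.append(name)
    peer = net[name]
    if doc_id in peer.cache:
        return True
    if ttl == 0:
        return False
    for neighbour in by_closeness(peer.table, doc_id):
        if neighbour in visited or neighbour not in net:
            continue                            # already asked, or outside this toy net
        if query(net, neighbour, doc_id, ttl - 1, visited, trace):
            peer.cache.add(doc_id)              # intermediate peers store the document
            peer.table[neighbour].add(doc_id)   # and augment their routing table
            return True
    return False                                # no match here: the caller backtracks


net = {
    "A": Peer({5, 89}, {"B": {14, 20}, "X": {47, 60}}),
    "B": Peer({14, 20}, {"C": {19, 30}, "D": {45, 51}}),
    "C": Peer({19, 30}, {"B": {14, 20}}),
    "D": Peer({17, 45, 51, 102, 205}, {"B": {14, 20}, "Z": {105, 110}}),
}
trace = []
found = query(net, "A", 17, ttl=5, visited=set(), trace=trace)
print(found, trace, sorted(net["A"].cache), sorted(net["B"].table["D"]))
```

Run on these tables, the lookup visits A, B, C (dead end) and D, and afterwards A and B hold doc 17 in their caches and routing tables, as in the example.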

14
Freenet Query Routing (2)
[Figure: query for doc 17 routed across peers A, B, C and D]
  • 1. Query for doc 17
  • 2. Forward to best match
  • 3. C has no match -> backtrack
  • 4. Forward query to 2nd best match
  • 5. Send back doc 17
  • 6. Route back response
  • Peer A before: Routing Table B 14, 20; X 47, 60.
    Doc Cache 5, 89
  • Peer A after: Routing Table B 14, 17, 20; X 47, 60.
    Doc Cache 5, 17, 89
  • Peer B before: Routing Table C 19, 30; D 45, 51.
    Doc Cache 14, 20
  • Peer B after: Routing Table C 19, 30; D 17, 45, 51.
    Doc Cache 14, 17, 20
  • Peer C: Routing Table B 14, 20. Doc Cache 19, 30
  • Peer D: Routing Table B 14, 20; Z 105, 110.
    Doc Cache 17, 45, 51, 102, 205
15
Freenet Document Insert
  • analogous to query routing
  • insert is routed to the peer most likely to be
    interested in new doc (closest matching ID)
  • intermediate peers cache document and augment
    routing tables
  • until TTL is reached
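
An insert can be sketched along the same lines. The sketch assumes integer document IDs, numeric distance as the matching rule and no backtracking; which routing-table entry an intermediate peer adds is also an assumption here (each peer records that the next peer on the path now stores the document). All names and the four-peer network are hypothetical.

```python
def closest_neighbour(table, doc_id, exclude):
    """Neighbour whose known document IDs come closest to doc_id, or None."""
    candidates = [n for n in table if n not in exclude and table[n]]
    if not candidates:
        return None
    return min(candidates, key=lambda n: (min(abs(i - doc_id) for i in table[n]), n))


def insert(caches, tables, start, doc_id, ttl):
    """Route an insert hop by hop; return the peers that cached the document."""
    path, current = [], start
    while current is not None and ttl > 0:
        caches[current].add(doc_id)               # every peer on the path caches the doc
        path.append(current)
        nxt = closest_neighbour(tables[current], doc_id, exclude=path)
        if nxt is not None and ttl > 1:
            tables[current][nxt].add(doc_id)      # nxt will cache the doc on the next hop
        current, ttl = nxt, ttl - 1
    return path


caches = {"A": {5, 89}, "B": {14, 20}, "C": {19, 30}, "D": {45, 51}}
tables = {
    "A": {"B": {14, 20}},
    "B": {"C": {19, 30}, "D": {45, 51}},
    "C": {"B": {14, 20}},
    "D": {"B": {14, 20}},
}
path = insert(caches, tables, "A", 17, ttl=3)
print(path, sorted(caches["C"]), sorted(tables["B"]["C"]))
```

Because the document ends up on peers that already hold similar IDs, later queries for it follow the same closest-match path.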

16
Freenet - Discussion
  • efficient routing algorithm (compared to
    Gnutella)
  • adequate security features/heuristics (the more
    popular a document, the more frequently it gets
    cached)
  • no metasearch
  • no updates or deletes possible
  • worst case: query routing degenerates into a
    depth-first search (DFS)

17
FUtella - Concepts
  • peer-to-peer platform for general knowledge
    sharing
  • tries to model learning style of humans
  • content-based routing
  • combines and extends approaches from:
      • Gnutella (message format)
      • JXTA (peer groups)
      • JXTA Search (queryspaces and registrations)
      • Freenet (routing of registration discoveries)

18
FUtella - Knowledge Groups
[Figure: FUtella net containing a knowledge group with queryspace "Computer Architecture". Group head: peer E, which inserts the registration. Members: M1 ... Mi]
19
FUtella - Knowledge Group Discovery 1
[Figure: peers A, B, C and D with their routing tables and registration caches]
  • 1. Discovery request "computer architecture"
  • 2. Forward discovery request
  • 3. C has no cached registration for "computer
    architecture" -> backtrack
  • 4. Forward discovery request to 2nd best match
  • Peer A: Routing Table "computer" -> B,
    "data base" -> X. Registration Cache "computer" B,
    "data base" X
  • Peer B: Routing Table "computer analysis" -> C,
    "computer systems" -> D, "data base" -> A.
    Registration Cache "computer analysis" Y,
    "computer systems" Z, "data base" X
  • Peer C: Routing Table "computer" -> B,
    "computer analysis" -> Y. Registration Cache
    "computer" B, "computer analysis" Y
  • Peer D: Routing Table "computer" -> B,
    "computer systems" -> Z, "computer architecture" -> E.
    Registration Cache "computer systems" Z,
    "computer" B, "computer architecture" E
20
FUtella - Knowledge Group Discovery 2
[Figure: peers A, B and D with their routing tables and registration caches after the discovery]
  • 5. Discovery response, containing registration
    "computer architecture" E
  • 6. Forward discovery response
  • Peer A: Routing Table "computer" -> B,
    "computer architecture" -> D, "data base" -> X.
    Registration Cache "computer" B,
    "computer architecture" E, "data base" X
  • Peer B: Routing Table "computer analysis" -> C,
    "computer architecture" -> D, "computer systems" -> D,
    "data base" -> A. Registration Cache
    "computer analysis" Y, "computer architecture" E,
    "computer systems" Z, "data base" X
  • Peer D: Routing Table "computer" -> B,
    "computer systems" -> Z, "computer architecture" -> E.
    Registration Cache "computer systems" Z,
    "computer" B, "computer architecture" E
21
FUtella - Query Processing
[Figure: peers A, B, C and D; knowledge group "computer architecture" with peer E and members M1 ... Mi]
  • 1. Discovery request "computer architecture"
  • 2. Forward discovery request
  • 3. C has no cached registration for "computer
    architecture" -> backtrack
  • 4. Forward discovery request to 2nd best match
  • 5. Discovery response containing cached
    registration
  • 6. Forward discovery response
  • 7. Send query
  • 8. Forward query to member (shown for M1 and Mi)
  • 9. Query response (shown for M1 and Mi)
22
FUtella - Test Results (1)
23
FUtella - Test Results (2)
24
Conclusion
  • first and second generation P2P systems still
    most widely used
  • practically proven
  • very flexible in terms of topology
  • bad scalability (Gnutella)
  • no guaranteed upper bound on query effort
    (Freenet)
  • (scientifically) far better approach: DHTs (see
    next presentation)

25
Questions ?
  • ?