Transcript and Presenter's Notes

Title: Peer-to-Peer Systems


1
Peer-to-Peer Systems
  • Chapter 25

2
What is Peer-to-Peer (P2P)?
  • Napster?
  • Gnutella?
  • Most people think of P2P as music sharing

3
What is a peer?
  • Contrast with the client-server model
  • Servers are centrally maintained and administered
  • A client has fewer resources than a server

4
What is a peer?
  • A peer's resources are similar to the resources of the other participants
  • P2P: peers communicate directly with other peers and share resources

5
P2P Concepts
  • Client-client, as opposed to client-server
  • File sharing: I get a copy from someone and then make it available for others to download---the copies (and the workload) are spread out
  • Advantages: scalable, stable, self-repairing
  • Process: a peer joins the system when a user starts the application, contributes some resources while making use of the resources provided by others, and leaves the system when the user exits the application
  • Session: one such join-participate-leave cycle
  • Churn: the independent arrival and departure of thousands or millions of peers creates the collective effect we call churn
  • The user-driven dynamics of peer participation must be taken into account in both the design and evaluation of any P2P application. For example, the distribution of session lengths can affect the overlay structure, the resiliency of the overlay, and the selection of key design parameters

6
Types of clients
  • Based on client behavior, there are three types of clients:
  • True clients (not active participants; take but don't give; short duration of stay)
  • Peers: clients that stay long enough and are well-connected enough to participate actively (take and give)
  • Servers (give, but don't take)
  • Safe vs. probabilistic protocols
  • Mostly logarithmic order of performance/cost

7
Levels of P2P-ness
  • P2P as a mindset
  • Slashdot
  • P2P as a model
  • Gnutella
  • P2P as an implementation choice
  • Application-layer multicast
  • P2P as an inherent property
  • Ad-hoc networks

8
P2P Goals/Benefits
  • Cost sharing
  • Resource aggregation
  • Improved scalability/reliability
  • Increased autonomy
  • Anonymity/privacy
  • Dynamism
  • Ad-hoc communication

9
P2P File Sharing
  • Content exchange
  • Gnutella
  • File systems
  • OceanStore
  • Filtering/mining
  • Opencola

10
P2P File Sharing Benefits
  • Cost sharing
  • Resource aggregation
  • Improved scalability/reliability
  • Anonymity/privacy
  • Dynamism

11
P2P Application Taxonomy
P2P Systems
  • Distributed Computing: SETI@home
  • File Sharing: Gnutella
  • Collaboration: Jabber
  • Platforms: JXTA
12
Management/Placement Challenges
  • Per-node state
  • Bandwidth usage
  • Search time
  • Fault tolerance/resiliency

13
Approaches
  • Centralized
  • Flooding
  • Document Routing

14
Centralized
  • Napster model
  • Benefits
  • Efficient search
  • Limited bandwidth usage
  • No per-node state
  • Drawbacks
  • Central point of failure
  • Limited scale

15
Flooding
  • Gnutella model
  • Benefits
  • No central point of failure
  • Limited per-node state
  • Drawbacks
  • Slow searches
  • Bandwidth intensive

16
Connectivity
17
Napster
  • Uses a centralized directory mechanism
  • To control the selection of peers
  • To generate other revenue-generating activities
  • In addition, it has several regional servers
  • Users first connect to Napster's centralized server and then to one of the regional servers
  • Basically, each client system has a Napster proxy that keeps track of the local shared files and informs the regional server
  • Napster uses some heuristic evaluation mechanisms to assess the reliability of a client before it starts using it as a shared workspace (a sketch of the centralized directory follows below)
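
The centralized-directory idea above can be illustrated with a small sketch. The following Python fragment is only a toy model (the class, peer addresses, and filenames are invented; it is not Napster's actual protocol): a client-side proxy registers the locally shared files with a directory server, searches go to the directory, and downloads then happen directly between peers.

    # Toy sketch of a Napster-style centralized directory (hypothetical API, not
    # Napster's real protocol): clients register their shared files with the
    # directory; searches hit the directory, but downloads stay peer-to-peer.

    class DirectoryServer:
        def __init__(self):
            self.index = {}                      # filename -> set of peer addresses

        def register(self, peer, filenames):
            for name in filenames:
                self.index.setdefault(name, set()).add(peer)

        def unregister(self, peer):
            for peers in self.index.values():
                peers.discard(peer)

        def search(self, filename):
            return sorted(self.index.get(filename, set()))

    # Each client's "proxy" reports its local shared folder to the server.
    server = DirectoryServer()
    server.register("peer-alice:6699", ["song.mp3", "talk.pdf"])
    server.register("peer-bob:6699", ["song.mp3"])
    print(server.search("song.mp3"))             # ['peer-alice:6699', 'peer-bob:6699']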

18
Gnutella and Kazaa
  • Unlike Napster, Gnutella is a pure P2P system with no centralized component---all peers are completely equal
  • Protocol
  • Ensures that each user's system is concerned with only a few Gnutella nodes
  • Search for files: if the specified distance is 4, then all machines within 4 hops of the client will be probed (first all machines within 1 hop, then 2 hops, and so on); see the sketch below
  • The anycast mechanism becomes extremely costly as the system scales up
  • Kazaa also does not have centralized control (like Gnutella); it uses Plaxton trees
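
The hop-limited search described above can be sketched as a breadth-first flood over the overlay graph. This is only an illustration of the idea (the peer names, data layout, and TTL value are made up; it is not the Gnutella wire protocol):

    # Toy hop-limited flood over an overlay graph: the query reaches every peer
    # within `ttl` hops of the origin, and each peer reports its local matches.

    from collections import deque

    def flood_search(overlay, origin, filename, ttl=4):
        """overlay: peer -> {'files': [...], 'neighbors': [...]}; returns hit peers."""
        hits, visited = [], {origin}
        queue = deque([(origin, 0)])
        while queue:
            peer, dist = queue.popleft()
            if filename in overlay[peer]["files"]:
                hits.append(peer)
            if dist < ttl:
                for nbr in overlay[peer]["neighbors"]:
                    if nbr not in visited:
                        visited.add(nbr)
                        queue.append((nbr, dist + 1))
        return hits

    overlay = {
        "a": {"files": [], "neighbors": ["b", "c"]},
        "b": {"files": ["x.mp3"], "neighbors": ["a", "d"]},
        "c": {"files": [], "neighbors": ["a"]},
        "d": {"files": ["x.mp3"], "neighbors": ["b"]},
    }
    print(flood_search(overlay, "a", "x.mp3", ttl=4))   # ['b', 'd']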

19
CAN
  • Content Addressable Network
  • Each object is expected to have a unique system-wide name or identifier
  • The name is hashed into a d-tuple---the identifier is converted into a random-looking number using some cryptographic hash function
  • In a 2-dimensional CAN the ID is hashed to a 2-dimensional tuple (x, y)
  • The same scheme is used to convert machine IDs
  • Recursively subdivide the space of possible d-dimensional identifiers, storing each object at the node owning the part of the space (zone) that the object's ID falls in
  • When a new node is added, an existing node shares its space with the new node; similarly, when a node leaves, its space is taken over by a nearby node
  • Once a user provides the search key, it is converted to (x, y); the receiving CAN node finds a path from itself to the node owning the (x, y) space. If d is the number of dimensions and N is the number of nodes, then the number of hops is (d/4)N^(1/d)
  • To take care of node failures, there will be backups
  • Cost is high when there are frequent joins/leaves (a sketch of the hashing and zone ownership follows below)
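
A minimal sketch of the CAN mapping for d = 2, under assumed details the slides leave open (SHA-1 as the hash, a 2^16 x 2^16 coordinate space, and splitting a zone's longer side when a node joins): an object's name is hashed to a point, and the node whose zone contains that point stores the object.

    import hashlib

    SPACE = 2**16                                  # side length of the 2-D space

    def hash_to_point(name):
        digest = hashlib.sha1(name.encode()).digest()
        x = int.from_bytes(digest[:4], "big") % SPACE
        y = int.from_bytes(digest[4:8], "big") % SPACE
        return (x, y)

    class Zone:
        """A rectangular region [x0, x1) x [y0, y1) owned by one node."""
        def __init__(self, node, x0, x1, y0, y1):
            self.node, self.x0, self.x1, self.y0, self.y1 = node, x0, x1, y0, y1

        def contains(self, p):
            return self.x0 <= p[0] < self.x1 and self.y0 <= p[1] < self.y1

        def split(self, new_node):
            """Give half of this zone to a joining node (split the longer side)."""
            if self.x1 - self.x0 >= self.y1 - self.y0:
                mid = (self.x0 + self.x1) // 2
                other = Zone(new_node, mid, self.x1, self.y0, self.y1)
                self.x1 = mid
            else:
                mid = (self.y0 + self.y1) // 2
                other = Zone(new_node, self.x0, self.x1, mid, self.y1)
                self.y1 = mid
            return other

    zones = [Zone("n1", 0, SPACE, 0, SPACE)]       # first node owns the whole space
    zones.append(zones[0].split("n2"))             # n2 joins; n1's zone is halved

    key = "report.pdf"
    point = hash_to_point(key)
    owner = next(z.node for z in zones if z.contains(point))
    print(key, "->", point, "stored at", owner)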

20
Document Routing
  • FreeNet, Chord, CAN, Tapestry, Pastry model
  • Benefits
  • More efficient searching
  • Limited per-node state
  • Drawbacks
  • Limited fault-tolerance vs redundancy

(Figure: a query for document ID 212 is forwarded from node to node until it reaches the node responsible for 212)
21
Document Routing CAN
  • Associate to each node and each item a unique ID in a d-dimensional space
  • Goals
  • Scale to hundreds of thousands of nodes
  • Handle rapid arrival and failure of nodes
  • Properties
  • Routing table size: O(d)
  • Guarantees that a file is found in at most d·n^(1/d) steps, where n is the total number of nodes (see the worked example after this slide)

Slide modified from another presentation
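
As a worked instance of the bound above (illustrative numbers only):

    # Worked example of the d * n**(1/d) bound stated on the previous slide.
    d, n = 2, 10_000
    print(d * n ** (1 / d))      # 200.0: at most ~200 steps for a 2-D CAN of 10,000 nodes
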
22
CAN Example Two Dimensional Space
  • Space is divided between nodes
  • Together the nodes cover the entire space
  • Each node covers either a square or a rectangular area with side ratio 1:2 or 2:1
  • Example
  • Node n1 (1, 2) is the first node to join → it covers the entire space

(Figure: 2-D coordinate space with axes 0-7; n1's zone covers the entire space)
Slide modified from another presentation
23
CAN Example Two Dimensional Space
  • Node n2 (4, 2) joins → the space is divided between n1 and n2

(Figure: the 2-D space split between n1 and n2)
Slide modified from another presentation
24
CAN Example Two Dimensional Space
  • Node n3 (3, 5) joins → the space is divided further

(Figure: the space now divided among n1, n2, and n3)
Slide modified from another presentation
25
CAN Example Two Dimensional Space
  • Nodes n4 (5, 5) and n5 (6, 6) join

(Figure: the space divided among n1 through n5)
Slide modified from another presentation
26
CAN Example Two Dimensional Space
  • Nodes: n1 (1, 2), n2 (4, 2), n3 (3, 5), n4 (5, 5), n5 (6, 6)
  • Items: f1 (2, 3), f2 (5, 1), f3 (2, 1), f4 (7, 5)

(Figure: nodes n1-n5 and items f1-f4 plotted in the 2-D space)
Slide modified from another presentation
27
CAN Example Two Dimensional Space
  • Each item is stored by the node that owns its mapping in the space

(Figure: each item shown inside the zone of the node that stores it)
Slide modified from another presentation
28
CAN Query Example
  • Each node knows its neighbors in the d-space
  • Forward the query to the neighbor that is closest to the query ID
  • Example: assume n1 queries f4
  • Can route around some failures
  • Some failures require local flooding (see the routing sketch after this slide)

(Figure: the query for f4 is forwarded hop by hop from n1 toward the node owning f4's point)
Slide modified from another presentation
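
The greedy forwarding rule from this example can be sketched as follows. The coordinates are the ones used on the slides (n2 at (4, 2), n3 at (3, 5), f4 at (7, 5)), but the sketch simplifies by assuming n2 and n3 are n1's neighbors and by measuring distance to the neighbors' own points rather than to their zones:

    import math

    neighbors = {"n2": (4, 2), "n3": (3, 5)}   # coordinates from the example slides
    target = (7, 5)                            # the point f4's key hashes to

    def next_hop(neighbors, target):
        # forward to the neighbor closest to the query point
        return min(neighbors, key=lambda n: math.dist(neighbors[n], target))

    print(next_hop(neighbors, target))         # 'n3', which is closer to (7, 5)
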
29
CAN Query Example (repeat of the previous slide)
30
CAN Query Example (repeat of the previous slide)
31
CAN Query Example (repeat of the previous slide)
32
CFS and PAST
  • Files are replicated prior to storage---copies are stored at adjacent locations in the hashed-ID space (a placement sketch follows below)
  • They make use of indexing systems to locate the nodes on which they store objects or from which they retrieve copies
  • IDs are hashed to a 1-dimensional space
  • Leaves/joins result in several file copies having to be made---this could be a bottleneck
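
A sketch of the replica-placement idea, in the spirit of the description above: a file's key is hashed onto a 1-dimensional ring, the first node at or after the key owns it, and copies also go to the next few nodes along the ring. The node IDs, hash truncation, and k below are made up for illustration.

    import hashlib
    from bisect import bisect_left

    RING = 2**16

    def key_id(name):
        return int.from_bytes(hashlib.sha1(name.encode()).digest()[:2], "big") % RING

    def replica_nodes(node_ids, key, k=3):
        ids = sorted(node_ids)
        i = bisect_left(ids, key) % len(ids)       # first node at or after the key
        return [ids[(i + j) % len(ids)] for j in range(k)]

    nodes = [100, 120, 175, 9000, 40000, 61000]
    print(replica_nodes(nodes, key_id("thesis.pdf"), k=3))   # owner plus its two successors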

33
OceanStore
  • Focused on long-term archival storage (rather than file sharing)---e.g., digital libraries
  • Erasure codes---a class of error-correcting codes that can reconstruct a valid copy of a file given some fraction of the stored fragments (a toy illustration follows below)
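
A toy illustration of the erasure-coding idea, far simpler than the codes OceanStore actually uses: with a single XOR parity fragment over equal-sized data fragments, any one lost fragment can be rebuilt from the survivors.

    def xor_bytes(blocks):
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                out[i] ^= byte
        return bytes(out)

    fragments = [b"ABCD", b"EFGH", b"IJKL"]        # equal-sized data fragments
    parity = xor_bytes(fragments)

    lost = fragments[1]                            # pretend fragment 1 is lost
    rebuilt = xor_bytes([fragments[0], fragments[2], parity])
    assert rebuilt == lost
    print(rebuilt)                                 # b'EFGH'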

34
Distributed Indexing in P2P
  • Two requirements
  • A lookup mechanism to track down a node holding an object
  • A superimposed file system that knows how to store and retrieve files
  • DNS is a distributed object locator: it maps machine names to IP addresses
  • P2P indexing tools let users store (key, value) pairs---a distributed hash system (a minimal interface sketch follows below)
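
The (key, value) interface such a distributed hash system exposes can be sketched as follows; the class and method names are hypothetical, and the "nodes" are just in-process dictionaries selected by hashing the key.

    import hashlib

    class ToyDHT:
        def __init__(self, node_count=8):
            self.buckets = [dict() for _ in range(node_count)]   # one dict per "node"

        def _owner(self, key):
            h = int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")
            return h % len(self.buckets)

        def put(self, key, value):
            self.buckets[self._owner(key)][key] = value

        def get(self, key):
            return self.buckets[self._owner(key)].get(key)

    dht = ToyDHT()
    dht.put("report.pdf", "peer-42:6881")          # key -> where the file lives
    print(dht.get("report.pdf"))                   # peer-42:6881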

35
Chord
  • It is a major DHT architecture
  • Forms a massive virtual ring in which every node in the distributed system is a member---each owning part of the periphery
  • If the hash value of a node is h, the next lower node value is hL, and the next higher is hH, then the node with h owns objects in the range (hL, h]
  • E.g., if a, b, c hash to 100, 120, and 175, respectively, then b is responsible for IDs in the range 101-120 and c is responsible for 121-175 (see the sketch after this slide)
  • When a new node joins, it computes its hash and then joins at the right place in the ring; the corresponding range of objects is then transferred to it
  • Potential problem---adjacent nodes on the ring could be far apart in network distance
  • Statistics: the average path length in the Internet is 22 network routers, leading to an average latency of 10 milliseconds; this is further slowed by slow nodes
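
A sketch of the ownership rule, using the example numbers above (nodes hashing to 100, 120, and 175) on an assumed small ring of size 2^8:

    from bisect import bisect_left

    RING = 2**8
    node_ids = sorted([100, 120, 175])

    def successor(key):
        """The node responsible for `key`: the first node ID at or after it, wrapping."""
        i = bisect_left(node_ids, key % RING)
        return node_ids[i % len(node_ids)]

    print(successor(110))   # 120  (b owns 101..120)
    print(successor(150))   # 175  (c owns 121..175)
    print(successor(200))   # 100  (wraps around the ring to a)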

36
Chord---cont.
  • Two mechanisms in Chord
  • For applications that repeatedly access the same object---Chord nodes cache link information, so that after the initial lookup each node on the path remembers (the IP addresses of) all nodes on the path for future use
  • When a node joins the Chord system at hashed location hash(key), it looks up the nodes associated with hash(key)/2, hash(key)/4, hash(key)/8, etc. This is in a circular range (a finger-table sketch follows below)
  • It uses a binary-search-like scheme to locate an object, resulting in log(N) search time, but this is not good enough---cached pointers help the effort
  • Frequent leaves create dangling pointers---a problem
  • Churn---frequent joins/leaves---results in many key shuffles---a problem
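
The halving-distance pointers described above correspond to the standard Chord finger-table formulation (finger i of node n points to the successor of n + 2^i). The sketch below uses made-up node hashes on a small ring and a greedy finger lookup that contacts O(log N) nodes:

    from bisect import bisect_left

    M = 8                                          # ID bits; ring size 2^M
    RING = 2**M
    node_ids = sorted([12, 47, 101, 120, 175, 201])

    def successor(key):
        i = bisect_left(node_ids, key % RING)
        return node_ids[i % len(node_ids)]

    def finger_table(n):
        return [successor((n + 2**i) % RING) for i in range(M)]

    def in_interval(a, x, b):
        """True if x lies in the ring interval (a, b]."""
        return x != a and (x - a) % RING <= (b - a) % RING

    def lookup(start, key, hops=0):
        """Follow the farthest finger that does not overshoot the key's owner."""
        target = successor(key)
        if start == target:
            return start, hops
        candidates = [f for f in finger_table(start) if in_interval(start, f, target)]
        best = max(candidates, key=lambda f: (f - start) % RING, default=target)
        return lookup(best, key, hops + 1)

    print(lookup(201, 150))    # (175, 2): node 175 owns key 150, reached in 2 hops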

37
Document Routing Chord
  • MIT project
  • Uni-dimensional ID space
  • Keep track of log N nodes
  • Search through log N nodes to find desired key

38
Pastry
  • Basic idea: construction of a matrix (of size r x log_r N) of pointers at each participating node---r is a radix and N is the size of the network. If N = 16^5 and r = 16, then each matrix is of size 16 x 5
  • Maps keys to a hashed space (like the others)
  • By following the pointers, a request is routed closer and closer to the node owning the portion of the space that an object belongs to
  • Hexadecimal addresses: with log_r N = 5, the first 5 hexadecimal digits of the address are used---65A1FC as in the example
  • The top row has indices 0 to F, representing the 1st hexadecimal digit of the hash address. For 65A1FC there is a match at 6, so it has another level of indices 0-F representing the 2nd position in the address. For the current node there is a 2nd-level match at 5, so this node is extended to the next level from 0-F; once again there is a match at A, which is further expanded to the 4th level. This has 0-F in the 4th position, the current one matching at F. This is further expanded to the 5th level from 0-F (not shown in Figure 25.5). Thus, each node has a 16 x 5 matrix of pointers to nodes (a routing sketch follows below)
  • To take care of joins/leaves, Pastry periodically probes each pointer (finger) and repairs broken links when it notices problems
  • It uses application-level multicast (an overlay multicast architecture)
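
A sketch of the prefix-matching hop rule described above. The routing-table entries below are invented; the point is only that each hop fixes at least one more hexadecimal digit of the destination ID:

    def shared_prefix_len(a, b):
        n = 0
        while n < len(a) and n < len(b) and a[n] == b[n]:
            n += 1
        return n

    def next_hop(current_id, dest_id, routing_table):
        """routing_table[(row, digit)] is a node whose first `row` digits match
        current_id and whose digit at position `row` equals `digit`."""
        row = shared_prefix_len(current_id, dest_id)
        return routing_table.get((row, dest_id[row]))

    # Hypothetical entries for a node with ID 65A1FC.
    table = {
        (0, "D"): "D13DA3",     # shares no prefix; first digit D
        (2, "1"): "651FC0",     # shares "65"; next digit 1
        (3, "2"): "65A230",     # shares "65A"; next digit 2
    }
    print(next_hop("65A1FC", "D46A1C", table))   # D13DA3
    print(next_hop("65A1FC", "651B34", table))   # 651FC0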

39
Doc Routing Tapestry/Pastry
  • Global mesh
  • Suffix-based routing
  • Uses underlying network distance in constructing
    mesh

(Figure: global mesh of nodes with 4-digit hexadecimal IDs; routing matches one more suffix digit per hop)
40
Node Failure Recovery
  • Simple failures
  • Know your neighbors' neighbors
  • When a node fails, one of its neighbors takes over its zone (a sketch follows after this slide)
  • More complex failure modes
  • Simultaneous failure of multiple adjacent nodes
  • Scoped flooding to discover neighbors
  • Hopefully, a rare event

Slide modified from another presentation
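
A sketch of the simple-failure rule from the previous slide, using a global view of the neighbor lists for brevity (a real node would only know its own neighbors and their neighbors):

    neighbors_of = {
        "A": ["B", "C"],
        "B": ["A", "D"],
        "C": ["A"],
        "D": ["B"],
    }

    def handle_failure(failed, taking_over):
        orphans = [n for n in neighbors_of.pop(failed) if n != taking_over]
        # the surviving neighbor absorbs the failed node's zone and neighbor links
        for n in orphans:
            if n not in neighbors_of[taking_over]:
                neighbors_of[taking_over].append(n)
            neighbors_of[n] = [taking_over if x == failed else x
                               for x in neighbors_of[n]]
        neighbors_of[taking_over] = [x for x in neighbors_of[taking_over] if x != failed]

    handle_failure("B", taking_over="A")
    print(neighbors_of)    # A now links to C and D; D points back to A
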
41
Comparing Guarantees
System     Model               State            Search
Chord      Uni-dimensional     log N            log N
CAN        Multi-dimensional   2d               d N^(1/d)
Tapestry   Global mesh         b log_b N        log_b N
Pastry     Neighbor map        b log_b N + b    log_b N
42
Remaining Problems?
  • Hard to handle highly dynamic environments
  • Usable services
  • Methods don't consider peer characteristics