P2P Systems - PowerPoint PPT Presentation


Title: P2P Systems


1
P2P Systems technologies
  • Zacharioudakis Giorgos

2
Presentation overview
  • P2P architectures and typical systems
  • Technical issues
  • Popular P2P Systems
  • Research areas
  • Project JXTA technology
  • Vision about SeLene project

3
What is Peer-to-Peer?
  • Definition: nodes of equal roles exchanging
    information and services directly
  • Scale: millions (billions?) of peers
  • Nature of peers: PCs
  • Application: lightweight semantics (e.g.,
    file-sharing)
  • Is this a new idea?
  • IP routing
  • DNS, NTP
  • Distributed Databases

4
P2P vs. Distributed DBMS
  • Traditional DDBMS Issues
  • Transactions
  • Network Partitions
  • Distributed Query Optimization
  • Interoperation of heterogeneous data sources
  • Reliability/failure of nodes
  • Complex features do not scale
  • Example P2P application: file-sharing
  • Simple data model and query language
  • No complex query optimization
  • Easy interoperation
  • No guarantee on quality of results
  • Individual site availability unimportant
  • Local updates
  • No transactions
  • Network partitions OK
  • Simple: amenable to a large-scale network of
    PCs

5
P2P Applications
  • File sharing
  • Napster, Gnutella
  • Instant Messaging
  • Jabber
  • Distributed Computation
  • SETI@home
  • Web services
  • Akamai
  • Distributed storage
  • Freenet
  • Anonymity, censorship resistance
  • Mixmaster remailers
  • Red Rover, Publius
  • Cooperative work
  • Groove
  • Other ...

6
Technical issues
  • scalability
  • fault tolerance
  • speed
  • bandwidth consumption
  • processing cost
  • security
  • anonymity
  • publishing/retrieval
  • metadata
  • semantic querying
  • availability of results
  • interoperability
  • ...

7
Metadata and Interoperability
  • Metadata
  • System metadata (e.g., filename, bitrate,
    filesize)
  • Resource metadata (e.g., relations,
    hierarchies)
  • Currently, queries are in the form of keyword
    matching
  • We would like to perform queries in more
    expressive languages, taking advantage of
    semantic knowledge metadata
  • Technologies
  • Programming interfaces
  • XML-RPC, SOAP, HTTP, JXTA
  • Data and metadata representation - common
    ontologies and format
  • XML, RDF

8
Different Approaches to Distributed Search
  • Network topology based architectures
  • Rely on the organization of peers within the
    network to route requests
  • These approaches focus on how to reduce the
    diameter of the graph representing the
    distributed network
  • Content based approaches
  • Message content is used in either the
    organization of the network or the routing of
    messages or both
  • These approaches focus on how to reduce the query
    path-length of the access structure they use

9
Spectrum of Purity
  • Hybrid
  • Centralized index, P2P file storage and transfer
  • Napster, SETI@home
  • Super-peer
  • A pure network of hybrid clusters
  • Morpheus, eDonkey
  • Pure
  • functionality completely distributed
  • Freenet, Gnutella

10
Publishing/Requesting/Responding
  • hybrid
  • central indexing
  • each node registers to a central index
  • queries are performed to the central index
  • retrieval is done from other peer nodes
  • pure
  • each peer manages its own index about local
    (remote) resources
  • queries are typically performed with broadcasts
  • retrieval is done from responding peers that
    hold the requested resource
  • super-peers
  • some nodes act as coordinators and manage indices
    for a subset of nodes
  • each node registers to its local coordinator
  • queries are performed to the coordinators, which
    in turn communicate as in a distributed p2p
    system with other super-peers
  • retrieval is done from other peers that hold
    the requested resource
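
The super-peer variant above can be sketched roughly as follows. This is a minimal illustration, not any real system's protocol: the SuperPeer class, its method names and the peer names are all hypothetical, and the coordinators simply ask every other coordinator directly instead of flooding.

```python
class SuperPeer:
    """Hypothetical coordinator that indexes the resources of its own clients."""

    def __init__(self, name):
        self.name = name
        self.index = {}    # resource name -> set of client names holding it
        self.others = []   # other super-peers this coordinator can ask

    def register(self, client, resources):
        # Each node registers its resources with its local coordinator.
        for resource in resources:
            self.index.setdefault(resource, set()).add(client)

    def holders(self, resource):
        return set(self.index.get(resource, ()))

    def query(self, resource):
        # Answer from the local index, then ask the other coordinators.
        found = self.holders(resource)
        for other in self.others:
            found |= other.holders(resource)
        return sorted(found)


sp1, sp2 = SuperPeer("sp1"), SuperPeer("sp2")
sp1.others.append(sp2)
sp2.others.append(sp1)
sp1.register("alice", ["a.mp3"])
sp2.register("bob", ["a.mp3", "b.mp3"])
result = sp1.query("a.mp3")   # clients that hold a.mp3, across both clusters
```

Retrieval would then happen directly between the querying client and one of the returned peers; the coordinators only carry index traffic.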

11
Representative P2P Systems
  • Network topology based architectures
  • Napster
  • Gnutella
  • Morpheus
  • Content based architectures
  • Chord
  • P-Grid

12
Napster (hybrid)
  • Membership: each client joins a server, where it
    registers its local files with the central index
  • Query: a client sends queries to the central
    server, which returns references to the clients
    that actually hold the resources
  • Retrieval: the client connects to other peer
    clients and retrieves the resource. The selection
    is performed by the user, but it could be done
    automatically based on bandwidth, load or other
    criteria
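
A minimal sketch of the membership/query steps above, assuming a toy in-memory index. CentralIndex, its methods and the peer addresses are illustrative names only and do not reflect the actual Napster protocol.

```python
from collections import defaultdict


class CentralIndex:
    """Toy stand-in for a central index server in a hybrid P2P system."""

    def __init__(self):
        # filename -> set of peer addresses that registered it
        self._holders = defaultdict(set)

    def register(self, peer, filenames):
        # Membership: a joining client registers its local files.
        for name in filenames:
            self._holders[name].add(peer)

    def unregister(self, peer):
        # A departing client is dropped from every entry.
        for holders in self._holders.values():
            holders.discard(peer)

    def query(self, keyword):
        # Query: keyword match on filenames; returns references to peers,
        # never the files themselves.
        keyword = keyword.lower()
        return {
            name: sorted(peers)
            for name, peers in self._holders.items()
            if keyword in name.lower() and peers
        }


index = CentralIndex()
index.register("10.0.0.1:6699", ["blue_train.mp3", "so_what.mp3"])
index.register("10.0.0.2:6699", ["so_what.mp3"])
hits = index.query("so_what")
```

The client would then pick one of the returned addresses and fetch the file from that peer directly, which is the only peer-to-peer step in this design.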

13
Napster (hybrid)

14
Gnutella (pure)
  • Gnutella is not a system; it is a protocol, with
    various existing Gnutella clients that implement
    it.
  • Membership: through a predefined static list
    of addresses or through host caches, a peer
    can connect to a set of Gnutella clients. After
    connecting, a client expands its list of known
    addresses with the lists obtained from other
    peers.
  • Query: a peer broadcasts a query to its known
    peers; these forward the query to their known
    peers, and so on until a maximum TTL (the packet's
    Time To Live) is reached, which is the depth limit
    of the query.
  • Retrieval: peers that hold the requested resource
    respond to the peer that issued the query.
    Through the reverse path of the query, the
    originating peer finally discovers a list of
    peers having the resource and then obtains it
    from one of them.

15
Gnutella (pure)
Breadth-First Search (BFS)
16
Gnutella (pure)
  • Each peer maintains a small minimum number of
    simultaneous active connections
  • These peers are selected from a locally
    maintained host catcher list containing the
    addresses of all known peers
  • Peer discovery
  • watching PING-PONG messages
  • noting the addresses of peers initiating queries
  • receiving connections from previously unknown
    hosts
  • out-of-band channels (IRC, Web)
  • host caches
  • Query propagation: upon receiving a query, a peer
    broadcasts it to all peers it is currently
    connected to, and so on, like a chain letter
  • If a peer has a file that matches the query, it
    sends an answer back (though it still forwards
    the query). This process continues to a maximum
    depth (search horizon)
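
The query propagation described above can be sketched as a TTL-limited breadth-first flood. This is a simplified model under stated assumptions: the overlay is a plain adjacency dictionary, a peer re-sends the query on every connection (including the one it arrived on), duplicates are dropped by the receiver, and answers are collected directly instead of travelling back along the reverse path. The function name and the sample topology are hypothetical.

```python
from collections import deque


def flood_query(neighbors, files, origin, keyword, ttl):
    """Flood a keyword query from origin; return (peers with a hit, messages sent)."""
    seen = {origin}
    frontier = deque([(origin, ttl)])
    hits = []
    messages = 0
    while frontier:
        peer, remaining = frontier.popleft()
        # A peer with a matching file answers, but still forwards the query.
        if peer != origin and any(keyword in name for name in files.get(peer, ())):
            hits.append(peer)
        if remaining == 0:
            continue  # search horizon reached
        for nxt in neighbors.get(peer, ()):
            messages += 1
            if nxt not in seen:  # a peer that already saw the query drops it
                seen.add(nxt)
                frontier.append((nxt, remaining - 1))
    return hits, messages


overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
shared = {"D": ["song.mp3"], "E": ["song.mp3"]}
near = flood_query(overlay, shared, "A", "song", ttl=2)
far = flood_query(overlay, shared, "A", "song", ttl=3)
```

Raising the TTL from 2 to 3 reaches one more peer here but also sends more messages, which is the bandwidth cost that the later comparison slides attribute to pure P2P systems.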

17
Morpheus (Super-Peer)
  • Self organizing network
  • Neither search requests nor actual downloads pass
    through any central server
  • The network is multi-layered, so that more
    powerful computers get to become search hubs
    ("SuperNodes")
  • Any client may become a SuperNode, if it meets
    the criteria of processing power, bandwidth and
    latency
  • Network management is automatic - SuperNodes
    appear and disappear according to demand

18
Morpheus (Super-Peer)
[Figure: SuperNode network with SuperNodes SN1, SN2, SN3 and SN4 (SN4 at 12.34.56.78)]
19
Morpheus (Super-Peer)
  • Intelligent downloads
  • Morpheus implements a type of fail-over system
    that attempts to locate another peer sharing the
    same file, and automatically resume the download
    where it left off at the failed host
  • When Morpheus search engine finds that more than
    one active peer is serving a particular file, it
    associates the list of peers with the file for
    later reference
  • If the user instructs Morpheus to download the
    file, it can distribute the download task over
    this list of peers
  • SuperNodes act like local search hubs and proxy
    search requests on behalf of their connected peers
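
The download behaviour above (splitting one file over several peers and failing over when a host disappears) might look roughly like this. The chunking scheme, the function names and the peer names are assumptions made for illustration; the slides do not describe how Morpheus actually schedules ranges.

```python
def plan_chunks(file_size, peers, chunk_size):
    """Assign consecutive byte ranges [start, end) to peers round-robin."""
    plan = []
    for i, start in enumerate(range(0, file_size, chunk_size)):
        end = min(start + chunk_size, file_size)
        plan.append((start, end, peers[i % len(peers)]))
    return plan


def reassign(plan, failed_peer, alive_peers):
    """Fail-over: hand the failed peer's ranges to the peers still serving the file."""
    new_plan = []
    replacement = 0
    for start, end, peer in plan:
        if peer == failed_peer:
            peer = alive_peers[replacement % len(alive_peers)]
            replacement += 1
        new_plan.append((start, end, peer))
    return new_plan


plan = plan_chunks(10, ["p1", "p2"], chunk_size=4)
recovered = reassign(plan, "p1", ["p2", "p3"])
```

A real client would also track how many bytes of each range have already arrived, so that a reassigned range resumes where the failed host left off.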

20
Chord (content based search)
  • Chord is a lookup service, not a search service
  • Based on binary search trees
  • Provides just one operation
  • A peer-to-peer hash lookup
  • Lookup(key) → IP address
  • Chord does not store the data
  • Uses a hash function
  • Key identifier = SHA-1(key)
  • Node identifier = SHA-1(IP address)
  • Both are uniformly distributed
  • Both exist in the same ID space
  • How to map key IDs to node IDs?
  • A key is stored at its successor: the node with
    the next higher ID (modulo N)
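
A small sketch of the mapping above, assuming an 8-bit identifier space to keep the numbers readable (SHA-1 itself yields 160 bits). The names chord_id, successor and M are illustrative, and the successor is taken as the first node whose ID is equal to or follows the key ID, wrapping around the circle.

```python
import hashlib
from bisect import bisect_left

M = 8  # identifier bits; an assumption for this sketch


def chord_id(text, m=M):
    """Hash a key or an IP address into the m-bit identifier space."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def successor(node_ids, ident):
    """First node ID that is equal to or follows ident, wrapping past the largest ID."""
    ring = sorted(node_ids)
    i = bisect_left(ring, ident)
    return ring[i % len(ring)]


nodes = [chord_id(ip) for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3")]
home = successor(nodes, chord_id("so_what.mp3"))  # node responsible for this key
```

Because keys and nodes are hashed into the same space, no directory is needed to decide which node is responsible for a key.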

21
Chord (content based search)
  • The goal of Chord is to provide the performance
    of a binary search, which means an O(log N) query
    path-length
  • To keep the maximum path-length at O(log N),
    each node maintains a routing table (called the
    finger table) with at most $m$ entries (where
    $m = \log N$)
  • The $i$-th entry in the table at node $n$ contains
    the identity of the first node $s$ that succeeds
    $n$ by at least $2^{i-1}$ on the identifier circle
    (all arithmetic modulo $2^m$)
  • i.e., $s = \mathrm{successor}(n + 2^{i-1})$,
    $1 \le i \le m$
  • Note that the first finger of $n$ is its
    immediate successor on the circle
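
The finger-table rule above can be checked on a tiny ring. The sketch assumes a 3-bit identifier space with nodes 0, 1 and 3; the function names are illustrative.

```python
from bisect import bisect_left


def successor(ring, ident):
    """First node ID in the sorted ring that is equal to or follows ident."""
    i = bisect_left(ring, ident)
    return ring[i % len(ring)]


def finger_table(n, node_ids, m):
    """Entry i (1-based) is successor(n + 2^(i-1)), with arithmetic modulo 2^m."""
    ring = sorted(node_ids)
    size = 2 ** m
    return [successor(ring, (n + 2 ** (i - 1)) % size) for i in range(1, m + 1)]


nodes = [0, 1, 3]
tables = {n: finger_table(n, nodes, m=3) for n in nodes}
# tables[0] lists the successors of 1, 2 and 4
```

The first entry of each table is the node's immediate successor, and later entries cover exponentially larger jumps around the circle.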

[Figure legend: existing node; non-existing node that is still a possible value in the ID space]
22
Chord (content based search)
  • Important characteristics
  • Each node stores info only about a small number
    of possible IDs (at most log N)
  • It knows more about nodes closely following it
    on the identifier circle
  • A node's finger table does not generally contain
    enough info to locate the successor of an
    arbitrary key k

[Figure: identifier circle with IDs 0 to 7]
23
Chord (content based search)
Finger Table Allows Log(n)-time Lookups
  • How do we locate the successor of a key k?
  • If n can find a node whose ID is closer than its
    own to k, that node will know more about the
    identifier circle in the region of k than n does
  • Thus n searches its finger table for the node j
    whose ID most immediately precedes k, and asks j
    for the node it knows whose ID is closest to k
  • By repeating this process, n learns about nodes
    with IDs closer and closer to k
  • Gradually we will find the immediate predecessor
    of k

[Figure: Chord ring with nodes N5, N10, N20, N32, N60, N80, N99, N110 and key K19]
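
The lookup procedure above can be sketched on the ring from the figure (nodes N5 to N110, key K19). Two assumptions go beyond the slides: a 7-bit identifier space, chosen only because it is the smallest that holds ID 110, and N32 as the node that starts the lookup. The Ring class and helper names are illustrative.

```python
from bisect import bisect_left


def in_open(x, a, b):
    """True if x lies strictly between a and b, going clockwise on the ring."""
    if a < b:
        return a < x < b
    return x > a or x < b  # the interval wraps past zero


def in_half_open(x, a, b):
    """True if x lies in (a, b], going clockwise."""
    return x == b or in_open(x, a, b)


class Ring:
    def __init__(self, node_ids, m):
        self.nodes = sorted(node_ids)
        size = 2 ** m
        # fingers[n][i] = successor(n + 2^i); fingers[n][0] is n's successor
        self.fingers = {
            n: [self._succ((n + 2 ** i) % size) for i in range(m)]
            for n in self.nodes
        }

    def _succ(self, ident):
        i = bisect_left(self.nodes, ident)
        return self.nodes[i % len(self.nodes)]

    def lookup(self, start, key):
        """Return (node responsible for key, nodes visited on the way)."""
        n, path = start, [start]
        # Stop once key falls between n and n's successor: n precedes key.
        while not in_half_open(key, n, self.fingers[n][0]):
            nxt = n
            for f in reversed(self.fingers[n]):  # farthest finger first
                if in_open(f, n, key):           # f precedes key
                    nxt = f
                    break
            if nxt == n:
                break
            n = nxt
            path.append(n)
        return self.fingers[n][0], path


ring = Ring([5, 10, 20, 32, 60, 80, 99, 110], m=7)
owner, path = ring.lookup(32, 19)
```

Each hop lands on the finger that most closely precedes the key, so the remaining distance on the circle roughly halves every step.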
24
Chord Autonomy
  • Inserting new keys does not affect the rest of
    the system: it just finds the appropriate node
    and stores the key there
  • When nodes join or leave, the finger tables must
    be correctly maintained and some keys must
    be transferred to other nodes
  • Also, every key is stored on only one node, which
    means that if that node becomes unavailable the
    key is also unavailable
  • Maintaining the finger tables and assuring the
    correctness of the system while nodes join/leave
    incurs an $O(\log^2 N)$ cost
  • This implies a restricted autonomy of the system
  • The only replicated information is (implicitly)
    the finger tables, because each node has to
    maintain its own

25
P-Grid
  • Basic characteristics
  • Based on building distributed, binary prefix
    trees
  • Use of randomized algorithms for constructing the
    access structure, updating the data and
    performing the search
  • Scale gracefully, equally for all nodes
  • Access structure
  • We assume that the index terms are binary
    strings, built from 0s and 1s
  • The search space is partitioned into intervals
  • Every peer takes over responsibility for one
    interval
  • As each key corresponds to a path in the binary
    prefix tree the peer is also responsible for one
    path of the search tree
  • Each peer stores the peers responsible for the
    other branches of the path for routing
  • Search requests are either processed locally or
    forwarded to the peers on the alternative branches
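
The search step above (process locally or forward to a peer on the alternative branch) can be sketched with four hand-built peers. The Peer class, the refs layout and the assumption that keys are at least as long as peer paths are illustrative choices, not details taken from the slides.

```python
import random


class Peer:
    def __init__(self, name, path):
        self.name = name
        self.path = path  # binary prefix this peer is responsible for
        self.refs = {}    # level -> peers covering the other branch at that level


def search(peer, key, hops=0):
    """Return (peer responsible for key, number of forwards)."""
    for level, (own_bit, key_bit) in enumerate(zip(peer.path, key)):
        if own_bit != key_bit:
            # The key leaves this peer's path here: forward to a peer
            # known to cover the alternative branch at this level.
            return search(random.choice(peer.refs[level]), key, hops + 1)
    return peer, hops  # the whole path is a prefix of the key: answer locally


a, b, c, d = Peer("a", "00"), Peer("b", "01"), Peer("c", "10"), Peer("d", "11")
a.refs = {0: [c], 1: [b]}
b.refs = {0: [d], 1: [a]}
c.refs = {0: [a], 1: [d]}
d.refs = {0: [b], 1: [c]}
found, hops = search(a, "1101")
```

Every forward fixes at least one more bit of the key, so the number of hops is bounded by the length of the paths.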

26
P-Grid
  • P-Grid construction
  • Initially, all peers are responsible for the
    whole search space
  • Whenever peers meet, they try to make a
    refinement to the access structure
  • They split the search space into two parts and
    each takes responsibility for one half
  • They also store a reference to the other peer
    in order to cover the other part of the search
    space
  • The same happens whenever two peers meet that
    are responsible for the same interval at the same
    level
  • To avoid overspecialization of peers, we restrict
    the maximal length of paths that can be
    constructed to a defined maxlength
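
A simplified simulation of the construction above, in which random pairs of peers meet and refine their paths. It is a sketch under assumptions: meetings are drawn uniformly at random, a peer whose path is a prefix of the other's simply takes the opposite branch, and the recursive reference exchange of the full algorithm is left out. All names are hypothetical.

```python
import random


class Peer:
    def __init__(self, name):
        self.name = name
        self.path = ""   # initially responsible for the whole search space
        self.refs = {}   # level -> peers covering the other branch


def link(a, b, level):
    """Make a and b reference each other at the given level."""
    for x, y in ((a, b), (b, a)):
        bucket = x.refs.setdefault(level, [])
        if y not in bucket:
            bucket.append(y)


def flip(bit):
    return "1" if bit == "0" else "0"


def meet(a, b, maxlength):
    """Refine the access structure when peers a and b meet."""
    common = 0
    while (common < min(len(a.path), len(b.path))
           and a.path[common] == b.path[common]):
        common += 1
    if common == len(a.path) == len(b.path):
        if common < maxlength:          # same interval: split it in two
            a.path += "0"
            b.path += "1"
            link(a, b, common)
    elif common == len(a.path):         # a covers a prefix of b's interval
        a.path += flip(b.path[common])
        link(a, b, common)
    elif common == len(b.path):         # b covers a prefix of a's interval
        b.path += flip(a.path[common])
        link(a, b, common)
    else:                               # paths already diverge: just link
        link(a, b, common)


random.seed(0)
peers = [Peer(f"p{i}") for i in range(8)]
for _ in range(200):
    first, second = random.sample(peers, 2)
    meet(first, second, maxlength=3)
```

Paths only ever grow, so a reference stored at some level keeps pointing to a peer on the opposite branch at that level even after both peers specialize further.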

27
P-Grid
[Figure: P-Grid key intervals, Level 0; labels 001, 0010, 01, 0100, 100, 1001, 1011, 110]
28
P-Grid
[Figure: P-Grid queries. Key intervals Level 0: 0, 1. Key intervals Level 1: 01, 11, 00, 10. Key intervals Level 2; labels 001, 0010, 01, 0100, 100, 1001, 1011, 110]
29
P-Grid Autonomy
  • The system assumes that peers eventually meet,
    but does not examine how this occurs, i.e. it
    is possible that they never meet
  • As many peers can be responsible for the same key,
    the general problem is how to find all those
    peers in case of an update
  • Proposed solutions
  • Multiple BFS or DFS searches for a key,
    propagating the update to the peers found
  • Creating lists of buddies for each peer (i.e.
    other peers that share the same key) and
    propagating the update to all buddies
  • These imply that although the system is
    decentralized and peers do not rely on central
    authorities, the construction and update of the
    access structure may cause performance
    issues, especially when updating a key

30
P-Grid Autonomy
  • When a new node enters the system, it assumes
    that it is responsible for the whole prefix
    namespace interval
  • When it meets other nodes, they split the
    interval and each maintains a reference to the
    other node
  • When a node leaves abruptly, the other nodes hold
    incorrect references; as soon as they become
    aware of it they resume responsibility for
    that prefix interval
  • The replicated information in this system is the
    multiple references to the same keys and the
    buddies lists (when used), which address the
    update problem

31
P2P comparison
32
P2P performance metrics
  • Bandwidth
  • Storage (replication)
  • Processing cost
  • Path-length (required hops)
  • Quality of Results
  • Number of results
  • Satisfaction (true if results > X, false
    otherwise)
  • Time to satisfaction
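
The last two metrics can be made concrete with a few lines. This is one possible reading, assuming each result carries an arrival time in seconds since the query was issued; the function names are illustrative.

```python
def satisfied(num_results, x):
    """Satisfaction: true if the query returned more than x results."""
    return num_results > x


def time_to_satisfaction(arrival_times, x):
    """Arrival time of the result that first pushes the count above x, else None."""
    ordered = sorted(arrival_times)
    # More than x results means at least x + 1, i.e. the result at index x.
    return ordered[x] if len(ordered) > x else None


arrivals = [0.4, 0.1, 0.9]
t = time_to_satisfaction(arrivals, 2)
```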

33
Hybrid p2p
  • Advantages
  • Simple to manage, with good availability of
    results, due to central indexing
  • Less (aggregated) bandwidth consumption
  • Small processing cost for peers
  • Idle nodes that do not offer resources do not
    degrade the system's performance
  • Disadvantages
  • Does not scale
  • Single point of failure
  • Great processing cost for server
  • Vulnerable to censorship

34
Pure p2p
  • Advantages
  • Efficiency: harnessing unused resources
  • Self-organizing
  • Robustness and availability through replication
  • Anonymity/legal protection/censorship resistant
  • Disadvantages
  • Difficult to manage and poor results due to lack
    of central indexing
  • Bandwidth consuming
  • Idle nodes degrade the overall performance
  • Higher processing cost for peers

35
Super peers
  • Advantages
  • Scalable
  • Fault tolerant
  • Adaptable and self-organizing
  • Efficient
  • Low path-length
  • Disadvantages
  • Hard to manage/maintain
  • Complex topology, difficult to evaluate its
    metrics (through simulation or trace driven
    analysis)

36
Content-based searching architectures
  • Advantages
  • Low search cost ( O(logN) )
  • Harnessing the content information in queries.
  • Good approach for content that can be described
    with simple attributes.
  • Fewer messages per query than in a random graph.
  • Load balancing.
  • Disadvantages
  • More restrictions than topology-based
    architectures: when nodes join/leave, rehashing
    and content migration need to be performed.
  • A peer needs to know what it is looking for, to
    map it to an address.
  • Not practical for content described by multiple
    attributes.
  • Storage and routing are closely connected

37
Conclusions about p2p systems
  • Benefits
  • Efficiency: harnessing unused resources
  • Self-organizing
  • Sharing cost of ownership
  • Robustness and availability through replication
  • Anonymity/legal protection
  • Challenges
  • No authority to enforce behavior
  • Cooperation
  • Unreliability of individual peers
  • Efficiency of distributed operations (absolute
    resources)
  • Imposed research issues
  • Resource Management
  • Security
  • Efficient Search

38
Resource Management
  • Resource
  • Storage/information
  • CPU processing
  • Bandwidth
  • Issues
  • fairness
  • load balancing

39
Security
  • Issues
  • Reputation
  • Trust
  • Accountability
  • Information Preservation Quality
  • Denial of service attacks
  • Problem: detecting and punishing bad behavior

40
Efficiency of Search
  • Problem: finding a needle in a haystack
  • Efficiency measured in terms of absolute
    resources consumed
  • Bandwidth
  • Processing cost
  • Several factors
  • Purity
  • Control
  • Query expressiveness

41
Project JXTA
  • JXTA is a set of protocols which allow peers to
    discover and communicate with each other
  • Protocols are defined in terms of XML messages
    exchanged between peers
  • JXTA is platform (e.g., Windows), language (e.g.,
    Java) and transport (e.g., TCP/IP) independent

42
JXTA Concepts
  • Concepts
  • Peer - a node that speaks the JXTA protocols
  • Peer Group - a collection of cooperating peers
  • Message - a datagram containing an envelope,
    protocol headers and bodies
  • Pipe - an async communication channel for
    sending/receiving messages
  • Advertisement - an XML document that publishes
    the existence of a resource (peer, peer group,
    pipe, service)

43
JXTA Model
44
JXTA Protocols
  • Peer Discovery Protocol - used between any peers
    to find other peers, peer groups, or
    advertisements
  • Peer Information Protocol - used to learn about
    another peer's properties
  • Peer Resolver Protocol - 'foundation protocol'
    for the Peer Discovery Protocol and the Peer
    Information Protocol. Can be used to build other
    protocols as well. Defines send/receive 'generic
    queries' and responses to be sent from one peer
    to another
  • Peer Membership Protocol - used to find out
    about, join and leave groups
  • Pipe Binding Protocol - used to bind a pipe to an
    actual endpoint
  • Peer Endpoint Protocol - used to provide routing
    information for paths between peers (if a direct
    connection is not possible)

45
JXTA Search
  • JXTASearch is a framework for searching in
    distributed networks
  • A protocol for registration, query and response
  • A series of services for interacting via this
    protocol

46
JXTA Search
  • Advantages
  • Supports very dynamic networks
  • Reduces publishing and query response latency
  • Centralized control (centralized implementation
    of security, accounting, membership, ...)
  • Disadvantages
  • Single point of failure
  • Scalability
  • Centralized control 

47
Towards a Super-Peer Architecture for SeLene