Title: Distributed Systems Concepts and Design Chapter 10: Peer-to-Peer Systems
- Bruce Hammer, Steve Wallis, Raymond Ho
10.1 Introduction
- Peer-to-Peer Systems
- Data and computational resources are contributed by many hosts
- Objective: balance network traffic and reduce the load on the primary host
- Management requires knowledge of all hosts: their accessibility (distance in number of hops), availability and performance
- They exploit existing naming, routing, data replication and security techniques in new ways
10.1 Introduction
- Goal of Peer-to-Peer Systems
- Sharing data and resources on a very large scale
- Applications that exploit resources available at the edges of the Internet: storage, cycles, content, human presence (Shirky 2000)
- Uses data and computing resources available in personal computers and workstations
10.1 Introduction
- Characteristics of Peer-to-Peer Systems
- Each computer contributes resources
- All the nodes have the same functional capabilities and responsibilities
- Correct operation does not depend on any centrally administered system
- Offers a limited degree of anonymity
- Algorithm for placing and accessing the data
- Balance workload, ensure availability
- Without adding undue overhead
10.1 Introduction
- Evolution of Peer-to-Peer Systems
- Napster: music downloads; the central index returns the addresses of peers holding a file
- Freenet, Gnutella, Kazaa and BitTorrent
- More sophisticated: greater scalability, anonymity and fault tolerance
- Pastry, Tapestry, CAN, Chord, Kademlia
- Peer-to-peer middleware
10.1 Introduction
- Evolution (continued)
- Immutable files (music, video)
- GUIDs (Globally Unique Identifiers)
- Middleware to provide better routing algorithms and react to outages
- Evolve to mutable files
- Application within one company's intranet
10.2 Napster and its Legacy
- Napster
- Provided a means for users to share music files, primarily MP3s
- Launched in 1999; attracted several million users
- Not fully peer-to-peer: central servers maintained lists of connected systems and the files they provided, while the actual transfers were conducted directly between machines
- Proved the feasibility of a service using hardware and data owned by ordinary Internet users
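The split between a central index and direct transfers can be sketched as follows. This is a minimal illustration, not Napster's actual protocol: the class name `CentralIndex`, its methods, and the addresses are all hypothetical.

```python
from collections import defaultdict


class CentralIndex:
    """Toy Napster-style index: maps file names to the peers offering them."""

    def __init__(self):
        self._peers_by_file = defaultdict(set)

    def register(self, peer_addr, file_names):
        # A connected peer announces the files it is willing to share.
        for name in file_names:
            self._peers_by_file[name].add(peer_addr)

    def unregister(self, peer_addr):
        # Called when a peer disconnects; drop it from every listing.
        for peers in self._peers_by_file.values():
            peers.discard(peer_addr)

    def lookup(self, file_name):
        # The server returns addresses only; it never serves file content.
        return sorted(self._peers_by_file.get(file_name, ()))


index = CentralIndex()
index.register("10.0.0.5:6699", ["song_a.mp3", "song_b.mp3"])
index.register("10.0.0.9:6699", ["song_b.mp3"])
sources = index.lookup("song_b.mp3")
print(sources)  # the requester would now contact one of these peers directly
```

The index is a single point of failure and control, which is why the slide calls the design not fully peer-to-peer.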
10.2 Napster and its Legacy
- BitTorrent
- Designed and implemented in 2001
- Next generation after Napster: true peer-to-peer (P2P)
- Can handle large files, e.g. WAV, DVD, FLAC (e.g. one CD is approx. 500 MB)
- After the initial pieces are transferred from the seed, the pieces are transferred individually from client to client. The original seeder only needs to send out one copy of the file for all the clients to receive a copy
- Tracker URL hosted at a BitTorrent site, e.g. Traders Den
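The claim that the seeder only needs to send out one copy can be illustrated with a small simulation. It is a sketch under simplifying assumptions (round-robin hand-out by the seed, no bandwidth limits, no piece-selection policy); the function name `distribute` and the peer names are hypothetical, and the upload counter uses the key "seed", so no peer may carry that name.

```python
def distribute(num_pieces, peer_names):
    """Toy swarm: the seed uploads each piece once; peers trade the rest."""
    have = {name: set() for name in peer_names}
    uploads = {name: 0 for name in peer_names}
    uploads["seed"] = 0

    # Phase 1: the seed hands each piece to exactly one peer (round-robin).
    for piece in range(num_pieces):
        receiver = peer_names[piece % len(peer_names)]
        have[receiver].add(piece)
        uploads["seed"] += 1

    # Phase 2: each peer fetches its missing pieces from another peer.
    for name in peer_names:
        for piece in range(num_pieces):
            if piece in have[name]:
                continue
            # Phase 1 placed every piece on some peer, so a source exists.
            source = next(p for p in peer_names if p != name and piece in have[p])
            uploads[source] += 1
            have[name].add(piece)

    return have, uploads


have, uploads = distribute(8, ["p1", "p2", "p3", "p4"])
peer_total = sum(count for name, count in uploads.items() if name != "seed")
print(uploads["seed"], peer_total)
```

In this run the seed uploads exactly as many pieces as the file contains, and every remaining transfer is served by the peers themselves.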
10.2 Napster and its Legacy
- BitTorrent (continued)
- Many BitTorrent clients, e.g. Vuze
- Keep track of seeders and leechers
- Torrent: contains metadata about the files to be shared and about the tracker
- Tracker: coordinates the file distribution and tells each peer which other peers to download the pieces of the file from
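The tracker's bookkeeping role can be sketched as below. This is a toy model only: it does not implement the real tracker protocol, and the class `Tracker`, its `announce` method, and the peer addresses are hypothetical names chosen for illustration. A peer holding every piece is treated as a seeder, any other peer as a leecher.

```python
class Tracker:
    """Toy tracker: records which peers are in a swarm and how complete they are."""

    def __init__(self, total_pieces):
        self.total_pieces = total_pieces
        self.pieces_held = {}  # peer address -> number of pieces held

    def announce(self, peer_addr, pieces_held):
        # A peer reports its progress; the reply lists other peers to contact.
        self.pieces_held[peer_addr] = pieces_held
        return sorted(p for p in self.pieces_held if p != peer_addr)

    def seeders(self):
        # Peers that hold the complete file.
        return sorted(p for p, n in self.pieces_held.items() if n == self.total_pieces)

    def leechers(self):
        # Peers that are still downloading.
        return sorted(p for p, n in self.pieces_held.items() if n < self.total_pieces)


tracker = Tracker(total_pieces=4)
tracker.announce("seed:6881", 4)
tracker.announce("a:6881", 0)
peers_for_b = tracker.announce("b:6881", 2)
print(peers_for_b, tracker.seeders(), tracker.leechers())
```

Note that the tracker never carries file data; it only introduces peers to each other.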
10.3 Peer-to-Peer Middleware
- Peer-to-Peer Middleware
- Provides a mechanism to access data resources anywhere in the network
- Functional Requirements
- Simplify construction of services across many hosts in a wide network
- Add and remove resources at will
- Add and remove hosts at will
- Interface to application programmers should be simple and independent of the types of distributed resources
10.3 Peer-to-Peer Middleware
- Peer-to-Peer Middleware (continued)
- Non-Functional Requirements
- Global scalability
- Load balancing
- Optimization for local interactions between neighboring peers
- Accommodation of highly dynamic host availability
- Security of data in an environment with heterogeneous trust
- Anonymity, deniability and resistance to censorship
10.3 Peer-to-Peer Middleware
- Peer-to-Peer Middleware (continued)
- Global scalability, dynamic host availability, and load sharing and balancing across large numbers of computers pose major design challenges
- Design of the middleware layer
- Knowledge of the locations of objects must be distributed throughout the network
- Use of replication to achieve this
10.4 Routing Overlays
- Routing Overlays
- Sub-systems (APIs) within the peer-to-peer middleware
- Responsible for locating nodes and objects
- Implements a routing mechanism in the application layer
- Separate from any other routing mechanisms such as IP routing
- Ensures that any node can access any object by routing each request through a sequence of nodes
- Exploits knowledge at each node to locate the destination
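Routing a request through a sequence of nodes, each using only its local knowledge, can be sketched as follows. The sketch assumes a circular identifier space of size 2**16 and gives each node only its two ring neighbours; the node IDs and function names are hypothetical. Overlays such as Pastry keep richer routing tables so that far fewer hops are needed; this toy version takes a number of hops proportional to the number of nodes.

```python
ID_SPACE = 2 ** 16  # assumed size of the circular identifier space


def ring_distance(a, b):
    # Distance between two identifiers on the circular ID space.
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def build_neighbours(node_ids):
    # Each node knows only its two ring neighbours.
    ordered = sorted(node_ids)
    n = len(ordered)
    return {
        node: [ordered[(i - 1) % n], ordered[(i + 1) % n]]
        for i, node in enumerate(ordered)
    }


def route(neighbours, start, guid):
    # Forward the request hop by hop until no neighbour is closer to the GUID.
    path = [start]
    current = start
    while True:
        best = min(neighbours[current], key=lambda n: ring_distance(n, guid))
        if ring_distance(best, guid) >= ring_distance(current, guid):
            return path
        current = best
        path.append(current)


nodes = [100, 9000, 21000, 40000, 52000, 65000]
table = build_neighbours(nodes)
path = route(table, start=100, guid=41000)
print(path)  # [100, 65000, 52000, 40000]
```

No node holds a global view: each hop consults only the current node's neighbour list, yet the request ends at the node whose ID is closest to the GUID.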
10.4 Routing Overlays
- GUIDs
- Pure names or opaque identifiers
- Reveal nothing about the locations of the objects
- Building blocks for routing overlays
- Computed from all or part of the state of the object using a function that delivers a value that is very likely to be unique. Uniqueness is then verified by searching for another object with the same GUID
- Not human readable
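Computing a GUID from an object's state can be sketched with a cryptographic hash. The choice of SHA-1 here is an assumption made for illustration (the slides do not name a function), and `compute_guid` and `matches` are hypothetical helper names.

```python
import hashlib


def compute_guid(content: bytes) -> str:
    # SHA-1 is an assumed choice; any hash with a large output range would do.
    return hashlib.sha1(content).hexdigest()


def matches(content: bytes, guid: str) -> bool:
    # For immutable objects, a holder can re-hash the content to check
    # that it really corresponds to the GUID it was requested under.
    return compute_guid(content) == guid


guid_a = compute_guid(b"track-01 audio bytes")
guid_b = compute_guid(b"track-02 audio bytes")
print(guid_a)
print(guid_a != guid_b, matches(b"track-01 audio bytes", guid_a))
```

The resulting 40-character hexadecimal string says nothing about where the object is stored, which is what makes it a pure name.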
10.4 Routing Overlays
- Tasks of a routing overlay
- A client submits a request including the object's GUID; the routing overlay routes the request to a node at which a replica of the object resides
- A node introduces a new object by computing its GUID and announcing it to the routing overlay
- Clients can remove an object
- Nodes may join and leave the service
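What happens to object placement when nodes join or leave can be sketched as below. The placement rule (an object lives on the node whose ID is closest to its GUID on a circular space of size 2**16) is an assumption for illustration, as are all IDs and function names.

```python
ID_SPACE = 2 ** 16  # assumed size of the circular identifier space


def ring_distance(a, b):
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def responsible_node(node_ids, guid):
    # The object lives on the node whose ID is closest to its GUID
    # (ties go to the lower node ID because candidates are sorted first).
    return min(sorted(node_ids), key=lambda n: ring_distance(n, guid))


def placement(node_ids, guids):
    return {g: responsible_node(node_ids, g) for g in guids}


guids = [500, 12000, 30000, 47000, 60000]
before = placement([1000, 20000, 45000], guids)
after_join = placement([1000, 20000, 45000, 58000], guids)   # node 58000 joins
after_leave = placement([1000, 45000, 58000], guids)         # node 20000 leaves

moved_on_join = [g for g in guids if before[g] != after_join[g]]
moved_on_leave = [g for g in guids if after_join[g] != after_leave[g]]
print(moved_on_join, moved_on_leave)  # [60000] [12000, 30000]
```

Only the objects near the joining or departing node change hands, which is what keeps membership changes cheap for the overlay.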
10.4 Routing Overlays
- Types of Routing Overlays
- DHT: Distributed Hash Tables
- DOLR: Distributed Object Location and Routing
- DOLR is a layer over the DHT that maps GUIDs to the addresses of nodes
- DHT: data is stored at a node chosen from the hash value (GUID)
- DOLR: the host address for a GUID is announced using the Publish() operation
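The difference between the two styles can be sketched side by side. Only the publish operation is named in the slides; the other method names (`put`, `get`, `remove`, `unpublish`, `locate`), the class names, and the placement rule are assumptions made for this illustration, and both classes run in a single process rather than across a network.

```python
class ToyDHT:
    """DHT style: the overlay decides where data lives, based on the GUID."""

    def __init__(self, node_ids):
        self.store = {n: {} for n in node_ids}  # node ID -> {guid: data}

    def _home(self, guid):
        # Assumed placement rule: the numerically closest node ID.
        return min(sorted(self.store), key=lambda n: abs(n - guid))

    def put(self, guid, data):
        self.store[self._home(guid)][guid] = data

    def get(self, guid):
        return self.store[self._home(guid)].get(guid)

    def remove(self, guid):
        self.store[self._home(guid)].pop(guid, None)


class ToyDOLR:
    """DOLR style: hosts keep their own objects and publish where replicas are."""

    def __init__(self):
        self.locations = {}  # guid -> set of host addresses

    def publish(self, guid, host):
        self.locations.setdefault(guid, set()).add(host)

    def unpublish(self, guid, host):
        self.locations.get(guid, set()).discard(host)

    def locate(self, guid):
        return sorted(self.locations.get(guid, ()))


dht = ToyDHT([10, 50, 90])
dht.put(47, b"payload")            # stored on node 50, chosen by the overlay

dolr = ToyDOLR()
dolr.publish(47, "hostA")          # hostA and hostB chose to hold replicas
dolr.publish(47, "hostB")
print(dht.get(47), dolr.locate(47))
```

In the DHT the caller hands the data over and the overlay picks its home; in the DOLR the data stays where its host put it and only the GUID-to-address mapping is recorded.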
10.5 Overlay Case Studies: Pastry, Tapestry
10.6 Application Case Studies: Squirrel, OceanStore, Ivy
10.7 Summary
- Napster: immutable data, unsophisticated routing
- Current: mutable data, routing overlays, sophisticated algorithms
- Internet or company intranet support
- Distributed computing (e.g. SETI@home)
10.7 Summary
- Benefits of Peer-to-Peer Systems
- Ability to exploit unused resources (storage, processing) in the host computers
- Scalability to support large numbers of clients and hosts, with load balancing of network links and host computer resources
- Self-organizing properties of the middleware platforms reduce costs
10.7 Summary
- Weaknesses of Peer-to-Peer Systems
- Costly for the storage of mutable data compared to a trusted, centralized service
- Cannot yet guarantee anonymity to hosts