Title: P2P: An Overview
1P2P An Overview
- Dr. Tony White
- Carleton University
2Outline
- Introduction
- Evolution of Network Computing
- Definitions
- The Rise of Edge Computing
- Why Peer-to-Peer? What is it?
- Applications
- Cycle Sharing
- Content Delivery
- Open Problems
- Summary
3Evolution of Network Computing
- Web introduced
- - A common protocol: HTTP
- - A common document format: HTML
- - A universal client: the browser
- Client/server
- - Introduced inequalities
- - Required homogeneity
4P2P Definition
- Peer-to-peer computing is the location and sharing of computer resources and services by direct exchange between servents.
- A servent is a peer that can adopt the roles of both server and client when operating.
5P2P Definition
- "P2P is a class of applications that takes advantage of resources -- storage, cycles, content, human presence -- available at the edges of the Internet. Because accessing these decentralized resources means operating in an environment of unstable connectivity and unpredictable IP addresses, P2P nodes must operate outside the DNS system and have significant or total autonomy from central servers." (Clay Shirky, February 2000)
6Definitions I
- Pure peer-to-peer is completely decentralized and characterized by the lack of a central server or central entity; clients make direct contact with one another.
- Computational peer-to-peer uses P2P technology to disseminate computational tasks over multiple clients; peers do not have a direct connection to one another.
7Definitions II
- Datacentric peer-to-peer makes information and data residing on systems or devices accessible to others when users connect. It is sometimes called peer-assisted or grid-assisted delivery. Applications include distributed file and content sharing.
- Usercentric/hybrid peer-to-peer involves clients contacting others via a central server or entity to communicate, share data, or process data. Often used in collaboration applications.
8What is a P2P network?
- It is an overlay network
- Peer applications know the IP addresses of other peer applications.
- A link between two nodes is actually an application-level connection.
9What matters?
- Topology of overlay matters
- Where content is stored matters
- Search protocol matters
- Gnutella results in
- Poor performance
- Poor reliability
10The Rise of Edge Computing
- In P2P, clients are also servers, hence are peers.
- Driving P2P is the abundance of
- Computing power
- Non-volatile storage
- Network bandwidth
- (This seems to turn thin clients on their heads.)
- Sharing from the edge
- Physical resources: cycles, disk
- Information resources: files, database access
- Services: code mobility implied
11P2P Enables Complete Access
- P2P file swapping is the obvious application
- Text, audio, video, executables, ...
- Searching and sharing
- Resources
- Information
- Information processing capacity
- Searches
- More current than Google
- Indexing web logs (blogs, klogs, ...)
- More focused search within a peer group
12P2P Enables Complete Access
- Searching and sharing
- Instant messaging
- Locate user quickly, independent of service provider.
- Buyers and sellers
- P2P auctions compete with eBay.
- Blogging
- Sharing of self.
- Edge-based multi-media streaming
- Web radio
- Web TV
- Peer shells
- Script complex P2P applications from simpler ones.
- Service creation using service composition.
13P2P Enables Complete Access
- A New Style of Distributed Computing
- P2P applications tolerate peers coming/going.
- Result depends on which peers are available.
- High availability comes from the probability that some peers are available.
- Not from load-balancing and fail-over schemes.
- Must avoid tragedy of the commons.
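The availability point above can be made concrete with a back-of-the-envelope model. This is a minimal sketch, assuming (the slide does not say this) that each peer holding a replica is online independently with the same probability p; the function names are illustrative.

```python
import math

def availability(p, n):
    """Probability that at least one of n replica holders is reachable,
    assuming each peer is online independently with probability p."""
    return 1 - (1 - p) ** n

def peers_needed(p, target):
    """Smallest n for which availability(p, n) reaches `target`
    under the same independence assumption."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))

if __name__ == "__main__":
    print(availability(0.1, 10))     # ten flaky peers, each online 10% of the time
    print(peers_needed(0.1, 0.99))   # replicas needed for "two nines"
```

Even with peers that are online only 10% of the time, ten replicas give roughly 65% availability, which is the sense in which availability comes from numbers rather than from fail-over machinery.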
14Examples of Early P2P
- Some new Internet applications are different
- SETI@home
- Instant messaging services (AIM, MS Messenger, ...)
- P2P applications: no central authority/server.
- Napster: quasi-P2P
- Gnutella
- Freenet
- These applications are vertically integrated
- Non-standard protocols
- Closed namespaces
- Stand alone
15Problems I
- Topology
- Bandwidth usage
- Fault tolerance
- Search efficiency
- Identity
- Trust
- Anonymity
- Security
- Authorization
- Privacy
16Problems II
- Namespaces
- Community Management
- Overlaps traditional enterprise groups
- Highly dynamic, user controlled
- Firewall traversal
- Political
- IT loses control of content distribution
- No control of information flow!
- Legal
- DRM
17What is needed?
- Interoperability (common protocols, standards)
- Communication protocols (e.g. JXTA, Jabber, ...)
- Representation of identity (or not!)
- Semantic content (meta-data)
- Secure information exchange
- Must be able to guarantee trust within a network
- Prevent unauthorized access to network
- Policy-based control of information exchange
- Ubiquity
- Buy-in from large groups of users
18Securing Distributed Computations in a Commercial Environment
- Philippe Golle, Stanford University
- Stuart Stubblebine, CertCo
19Example of a Distributed Computation
- 580,000 active participants
- 565,800 years of CPU time since 1996
- 26.1 TeraFLOPs / sec
20Commercialization: supply
- A dozen companies have recruited thousands of participants
- $100 million in venture funding in 2000
- www.mithral.com
- www.dcypher.net
- (with www.processtree.com)
- www.distributed.net
- www.entropia.com
- www.parabon.com
- www.uniteddevices.com
- www.popularpower.com
- www.distributedsciences.com
- www.datasynapse.com
- www.juno.com
21Commercialization: demand
- Super-computing market: $2 billion / year
- Computationally intensive, parallelizable projects
- Drug design research
- Mathematical research
- Economic simulations
- Digital entertainment
22Cheaters!
"Fifty percent of the project's resources have been spent dealing with security problems."
"The really hard part has to do with verifying computational results."
David Anderson, SETI@home's director.
23Cycle Sharing Participants
- Trusted supervisor
- Maintains a pool of registered participants
- Bids for large computations
- Divides the computation into tasks that are assigned to participants
- Collects the results and distributes payment to the participants
- Example: Distributed.net, Entropia.com, etc.
- Untrusted participants
- May range from large companies to individual users
- Participants are anonymous (no real-world leverage)
- Participants may collude. We distinguish between real-world entities (agents) and anonymous participants.
- Participants may leave the computation at any time, either temporarily or for good.
24Organization
- Distribution of tasks
- The unit of computation is a task
- Assumption: all tasks have the same size and can be run by any participant within the same time bounds.
- The supervisor runs a probabilistic algorithm to assign tasks to participants.
- The supervisor keeps track of who did what
25Security
- Definition: a computation is secure if no rational, non-risk-seeking participant ever cheats.
- Collusion may occur only before tasks are assigned.
- A participant has 3 choices
- Request a computation and do it
- Request a computation and NOT do it
- Take a leave
- Assumption: all errors are malicious
26Utility function of an agent
[Figure: decision tree of an agent's payoffs]
- Run the computation: payoff $a$
- Cheat and guess the result
- Cheating detected: payoff $-L$
- Cheating undetected: payoff $a + E$
- $a$: payment received per task
- $E$: benefit of defecting ($E = e \cdot a$)
- $L$: cost of getting caught cheating
- Security condition: $(a + E)\,P - L\,(1 - P) < 0$
- where $P$ is the probability that cheating is undetected
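The security condition can be checked numerically. This is a minimal sketch, taking the condition to be (a + E)P - L(1 - P) < 0 with E = e * a; the operators are a reading of the garbled slide, and the function names are illustrative rather than from the talk.

```python
def cheating_utility(a, e, L, p_undetected):
    """Expected utility of cheating on one task.

    a            -- payment received per task
    e            -- defection benefit as a multiple of a (E = e * a)
    L            -- cost of getting caught cheating
    p_undetected -- probability P that cheating goes undetected
    """
    E = e * a
    return (a + E) * p_undetected - L * (1 - p_undetected)

def is_secure(a, e, L, p_undetected):
    """True when cheating has negative expected utility (assumed condition)."""
    return cheating_utility(a, e, L, p_undetected) < 0

if __name__ == "__main__":
    # Same payment and penalty, two different detection regimes.
    print(is_secure(1.0, 1.0, 10.0, 0.5))
    print(is_secure(1.0, 1.0, 10.0, 0.9))
```

Under this reading the supervisor has two levers: raise the penalty L, or lower P by verifying more results.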
27Basic scheme
- Registration
- Participant performs d1 unpaid tasks
- The supervisor verifies them (at limited cost)
- The participant is accepted iff all the results are correct
- Assignment of a task
- A task is given to N participants chosen uniformly and independently at random
- The number N is chosen according to the probability distribution
- Payment: a constant amount a per task if all the results agree
- If not, the task is re-assigned to a new set of participants
- Severance: a participant is paid an amount $d \cdot a$
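The assignment and payment steps of the basic scheme can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the distribution for N is not recoverable from the transcript, so `draw_n` is a hypothetical caller-supplied function, participants are sampled without replacement, and `compute` stands in for a participant running a task.

```python
import random

def assign_once(task, pool, n, compute, rng):
    """Give `task` to n participants drawn uniformly at random from `pool`
    and report whether all returned results agree."""
    chosen = rng.sample(pool, n)
    results = [compute(p, task) for p in chosen]
    return chosen, results, len(set(results)) == 1

def run_task(task, pool, draw_n, compute, rng, max_rounds=20):
    """Re-assign the task to a fresh random set until all results agree."""
    for _ in range(max_rounds):
        chosen, results, agreed = assign_once(task, pool, draw_n(rng), compute, rng)
        if agreed:
            # Every participant in `chosen` would now be paid the amount a.
            return results[0], chosen
    raise RuntimeError("no agreement reached")

if __name__ == "__main__":
    rng = random.Random(0)
    honest = lambda participant, task: task * task
    value, paid = run_task(7, ["p1", "p2", "p3", "p4"],
                           lambda r: r.randint(1, 3), honest, rng)
    print(value, paid)
```

When N happens to be 1, a lone cheater trivially agrees with itself; the scheme relies on the participant not knowing N in advance, which is why N is drawn at random.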
28Properties
- Computational overhead
- Security condition: $(a + E)\,P - L\,(1 - P) < 0$
Computational overhead | Setup time | Maximum coalition size | Maximum e
10 | 10 | 1 | 1
17 | 10 | 10 | 1
46 | 10 | 1 | 10
243 | 10 | 1 | 100
29Participants with varying computational resources
- Until now, implicit assumption that all participants have the same computational resources.
- Unrealistic assumption
- Security threat: an adversary may briefly control a number of participants out of proportion with her real computational power
- Activity: a probability distribution over the pool of participants, which evolves dynamically over time
- Participants are drawn at random according to the Activity
- We define rules for updating the activity
- Security implications
30Content Delivery Networks
- Swarmcast/OnionNetworks
- File is stored in multiple locations
- Idea is to retrieve portions of file from separate hosts
- File is split into small (32k) pieces
- Requests are random
- Space of packets bigger than file
- Only subset of packets required
- Technique is Forward Error Correction
- Kazaa/Morpheus
- MojoNation (HiveCache)
- Distributed backup and restore system
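The Swarmcast bullets ("space of packets bigger than file", "only subset of packets required") describe an erasure code. The sketch below shows the simplest possible instance, a single XOR parity packet over k equal-length pieces, so that any k of the k+1 packets rebuild the file. It is a toy stand-in chosen for brevity, not the code Swarmcast used, and all names are illustrative.

```python
from functools import reduce

def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(pieces):
    """Return the k data pieces followed by one XOR parity packet."""
    return list(pieces) + [reduce(xor_bytes, pieces)]

def recover(packets):
    """Rebuild the k data pieces from k+1 packets, at most one of which is None."""
    missing = [i for i, p in enumerate(packets) if p is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost packet")
    packets = list(packets)
    if missing:
        present = [p for p in packets if p is not None]
        # XOR of all surviving packets equals the lost one.
        packets[missing[0]] = reduce(xor_bytes, present)
    return packets[:-1]

if __name__ == "__main__":
    pieces = [b"abcd", b"efgh", b"ijkl"]
    packets = encode(pieces)
    packets[1] = None              # one host never answered
    print(recover(packets) == pieces)
```

Practical systems use stronger codes that tolerate many losses, but the principle is the same: the downloader does not care which packets arrive, only that enough of them do.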
31Privacy Networks: Publius
- Publius
- Publishers want to publish anonymously
- Servers host random-looking content
- Storage
- The publisher takes the key K that is used to encrypt the file and splits it into n shares, such that any k of them can reproduce the original K, but k-1 give no hints as to the key.
- Each server receives the encrypted Publius content and one of the shares.
- Retrieval
- A retriever must get the encrypted Publius content from some server and k of the shares.
- Content is tied to a URL that is used to recover the data and the shares.
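The "any k of n shares" property described above is what a threshold secret-sharing scheme provides. The sketch below uses Shamir's scheme over a prime field as one standard way to obtain it; it is illustrative rather than Publius's actual code, the prime and function names are assumptions, and the key is treated as an integer smaller than the prime.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; assumes the key K is an integer below it

def split(secret, n, k):
    """Split `secret` into n shares such that any k of them recover it."""
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]

    def f(x):
        acc = 0
        for c in reversed(coeffs):      # Horner evaluation mod PRIME
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, f(x)) for x in range(1, n + 1)]

def combine(shares):
    """Recover the secret from k shares by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

if __name__ == "__main__":
    key = 123456789
    shares = split(key, n=5, k=3)
    print(combine(shares[:3]) == key, combine(shares[2:]) == key)
```

Each server would store one `(x, y)` pair next to the encrypted content; a retriever contacts servers until it holds k pairs. The modular inverse via `pow(den, -1, PRIME)` needs Python 3.8 or later.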
32Privacy Networks: Free Haven
- Anonymity
- Publishers that insert documents,
- Readers that retrieve documents,
- Servers that store documents.
- Uses a free, low-latency, two-way mixnet for forward-anonymous communication.
- Accountability
- Reputation and micropayment schemes, which allow us to limit the damage done by servers that misbehave.
- Persistence
- Publisher of a document determines its lifetime.
- Flexibility
- System functions smoothly as peers dynamically
join or leave
33OceanStore: Toward Global-Scale, Self-Repairing, Secure and Persistent Storage
- John Kubiatowicz
- University of California at Berkeley
34OceanStore Context: Ubiquitous Computing
- Computing everywhere
- Desktop, Laptop, Palmtop
- Cars, Cellphones
- Shoes? Clothing? Walls?
- Connectivity everywhere
- Rapid growth of bandwidth in the interior of the net
- Broadband to the home and office
- Wireless technologies such as CDMA, satellite, laser
- Where is persistent data????
35Utility-based Infrastructure?
- Data service provided by storage federation
- Cross-administrative domain
- Pay for Service
36OceanStore: Everyone's Data, One Big Utility
"The data is just out there"
- How many files in the OceanStore?
- Assume $10^{10}$ people in world
- Say 10,000 files/person (very conservative?)
- So $10^{14}$ files in OceanStore!
- If 1 gig files (ok, a stretch), get 1 mole of bytes!
- Truly impressive number of elements, but small relative to physical constants
- Aside: new results: 1.5 Exabytes/year ($1.5 \times 10^{18}$)
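The estimate above is simple enough to check by hand or in a few lines. The sketch below takes "1 gig" to mean 10^9 bytes, which is an assumption, and compares the total with Avogadro's number.

```python
AVOGADRO = 6.02214076e23      # elements per mole

people = 10**10
files_per_person = 10**4
bytes_per_file = 10**9        # "1 gig", taken here as 10^9 bytes

total_files = people * files_per_person
total_bytes = total_files * bytes_per_file
moles_of_bytes = total_bytes / AVOGADRO

print(total_files, total_bytes, moles_of_bytes)
```

The total comes to 10^23 bytes, a little under a fifth of a mole, so "1 mole of bytes" on the slide is an order-of-magnitude statement rather than an exact figure.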
37OceanStore Assumptions
- Untrusted Infrastructure
- The OceanStore is comprised of untrusted components
- Individual hardware has finite lifetimes
- All data encrypted within the infrastructure
- Responsible Party
- Some organization (i.e. service provider) guarantees that your data is consistent and durable
- Not trusted with content of data, merely its integrity
- Mostly Well-Connected
- Data producers and consumers are connected to a high-bandwidth network most of the time
- Exploit multicast for quicker consistency when possible
- Promiscuous Caching
- Data may be cached anywhere, anytime
38The Peer-To-Peer View: Irregular Mesh of Pools
39Key Observation: Want Automatic Maintenance
- Can't possibly manage billions of servers by hand!
- System should automatically
- Adapt to failure
- Exclude malicious elements
- Repair itself
- Incorporate new elements
- System should be secure and private
- Encryption, authentication
- System should preserve data over the long term (accessible for 1000 years)
- Geographic distribution of information
- New servers added from time to time
- Old servers removed from time to time
- Everything just works
40Outline: Three Technologies and a Principle
- Principle: ThermoSpective Systems Design
- Redundancy and Repair everywhere
- Structured, Self-Verifying Data
- Let the Infrastructure Know What is important
- Decentralized Object Location and Routing
- A new abstraction for routing
- Deep Archival Storage
- Long Term Durability
41Attack Resistant P2P
- Content can be compromised by
- Attack by malicious agents
- Censorship
- Faulty nodes
- Remember
- Nodes have finite resources
42Gnutella
43Morpheus/Kazaa
super peer
44Examples
- Napster shut down by attacks on central server
- Gnutella spammed by Flatplanet
- Removal of a few peers shatters Gnutella
- 63 from 1800 in figures
45Performance
After deletion of 2/3 of peers, 99% of the remainder can still access 99% of the data items
46DRN design (Jared Saia)
- Topology based upon butterfly network (constant degree version of hypercube)
- Each vertex of butterfly called a supernode
- Each supernode represents a set of peers
- Each peer is in multiple supernodes
47DRN Topology
- N peers, n supernodes
- Each peer participates in C log n randomly chosen supernodes
- Supernode X connected to supernode Y means all nodes in X are connected to all nodes in Y
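The membership rule on this slide can be sketched in a few lines. This is a rough illustration only: the logarithm base, the rounding, and the function names are assumptions, and the butterfly wiring between supernodes is left out.

```python
import math
import random

def assign_supernodes(num_peers, num_supernodes, c, rng):
    """Place each peer in ceil(c * log2(n)) distinct, randomly chosen supernodes."""
    per_peer = min(num_supernodes,
                   max(1, math.ceil(c * math.log2(num_supernodes))))
    members = {s: set() for s in range(num_supernodes)}
    for peer in range(num_peers):
        for s in rng.sample(range(num_supernodes), per_peer):
            members[s].add(peer)
    return members, per_peer

def overlay_links(members, x, y):
    """Supernode x linked to supernode y: every peer in x links to every peer in y."""
    return {(p, q) for p in members[x] for q in members[y] if p != q}

if __name__ == "__main__":
    members, per_peer = assign_supernodes(100, 16, 2, random.Random(0))
    print(per_peer, len(overlay_links(members, 0, 1)))
```

Because every supernode holds many peers and every peer sits in many supernodes, deleting a large fraction of peers still tends to leave each supernode populated, which is the intuition behind the robustness result quoted earlier.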
48Conclusion
- P2P systems popular today
- Limewire, Kazaa
- Existing P2P systems vulnerable and inefficient
- Many challenges ahead
- Search
- Resource Management
- Security and Privacy
Lots of good research to be done
49Appendix I
- Open Problems in P2P Data Sharing
50Open Problems in Data Sharing Peer-To-Peer Systems
- Hector Garcia-Molina
- ICDT Conference, January 10, 2003
- Contributors: Mayank Bawa, Brian Cooper, Arturo Crespo, Neil Daswani, Prasanna Ganesan, Sergio Marti, Qi Sun, Beverly Yang and others
51P2P Challenges
- Search
- Resource Management
- Security and Privacy
Not independent challenges!
52Search
- Search Options
- Query Expressiveness
- Comprehensiveness
- Topology
- Data Placement
- Message Routing
53Comparison
[Figure: comparison chart; contents not recoverable]
54Content Addressable Network (CAN)
[Figure labels: Nodes (1), Data (2)]
- A distributed hash table at Internet scale
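The core idea of a CAN is that nodes own zones of a coordinate space and a key is stored at whichever node owns the point the key hashes to. The sketch below shows that mapping in two dimensions. It is a simplified illustration: the zone layout, node names, and hash-to-point rule are assumptions, and routing between neighbouring zones is omitted.

```python
import hashlib

def key_to_point(key):
    """Hash a key to a point in the unit square [0, 1) x [0, 1)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    x = int.from_bytes(digest[:4], "big") / 2**32
    y = int.from_bytes(digest[4:8], "big") / 2**32
    return x, y

# Hypothetical zones: node -> (x_min, x_max, y_min, y_max), half-open intervals
# that together tile the unit square.
ZONES = {
    "n1": (0.0, 0.5, 0.0, 1.0),
    "n2": (0.5, 1.0, 0.0, 0.5),
    "n3": (0.5, 1.0, 0.5, 1.0),
}

def owner(point, zones):
    """Return the node whose zone contains `point`."""
    x, y = point
    for node, (x0, x1, y0, y1) in zones.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return node
    raise LookupError("point is not covered by any zone")

if __name__ == "__main__":
    print(owner(key_to_point("origin-of-species.txt"), ZONES))
```

Unlike Gnutella-style flooding, a lookup here is deterministic: any node can compute where a key must live, at the cost of dictating data placement to the peers.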
55Comparison
[Figure: comparison chart; contents not recoverable]
56Challenge Exploring the Space
[Figure labels: autonomy, efficiency, robustness; systems shown: Gnutella, CAN]
57Search Index Link (SIL) Model
- Forwarding search link (FSL)
- Non-forwarding search link (NSL)
- Forwarding index link (FIL)
- Non-forwarding index link (NIL)
59Super-Peer Network
[Figure labels: nodes A through H, core]
60SIL Challenges
- Desirable graph properties
- Desirable features
- Dynamic configuration
61Example Property: Redundancy
[Figure: nodes A, B, C]
- Redundancy exists in a SIL graph if a link can be removed without reducing coverage
62Example: Undesirable Feature
[Figure: nodes A, B, C, D, E]
- One-index cycle: Node A has an index link to B, and there is a search path from B to A
63Avoiding Undesirable Features
- Node D is joining the system
- What neighbors should it connect to?
- What type of links should it use?
[Figure: nodes A, B, C, E, and joining node D marked "?"]
64Open Problems: Security
- Availability (e.g., coping with DoS attacks)
- Authenticity
- Anonymity
- Access Control (e.g., IP protection,
payments,...)
65Authenticity
[Figure: a document whose authenticity is in question]
title: origin of species
author: charles darwin
date: 1859
body: In an island far, far away ...
66More than Just File Integrity
title: origin of species
author: charles darwin
date: 1859
body: In an island far, far away ...
[Figure labels: ?, 00, checksum]
67More than Fetching One File
T=origin Y=?    A=darwin B=?
68Solutions
- Authenticity function: A(doc) = T or F
- at expert sites, at all sites?
- can use signature: expert -> sig(doc) -> user
- Voting based
- authentic is what majority says
- Time based
- e.g., oldest version (available) is authentic
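The voting-based and time-based rules above can be written down directly. This is a minimal sketch, assuming each reachable copy is reported as a (content, first_seen) pair; that representation and the function names are illustrative, not from the talk.

```python
from collections import Counter

def voting_based(copies):
    """Return the content held by a strict majority of copies, or None."""
    counts = Counter(content for content, _ in copies)
    winner, votes = counts.most_common(1)[0]
    return winner if votes > len(copies) / 2 else None

def time_based(copies):
    """Return the content of the oldest available copy."""
    return min(copies, key=lambda copy: copy[1])[0]

if __name__ == "__main__":
    copies = [("darwin-original", 1859),
              ("darwin-original", 1860),
              ("forged-edition", 1999)]
    print(voting_based(copies), time_based(copies))
```

Both rules are easy to game: a collective of malicious peers can outvote honest ones, and timestamps are only as trustworthy as whoever reports them.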
69Added Challenge: Efficiency
- Example: current music sharing
- everyone has authenticity function
- but downloading files is expensive
70How to Track Peer Behavior?
- Trust vector: [v1, v2, v3, v4], one entry per peer a, b, c, d
- Pair of values: (total downloads, good downloads)?
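One way to realise the "pair of values" idea is to keep, per remote peer, a count of total and good downloads and derive a score from the two. This is a hypothetical sketch; the class name, the scoring rule, and the handling of unknown peers are assumptions rather than anything stated on the slide.

```python
class TrustVector:
    """Per-peer record of (total downloads, good downloads)."""

    def __init__(self):
        self.stats = {}   # peer id -> [total, good]

    def record(self, peer, good):
        """Log one download from `peer`; `good` says whether it was authentic."""
        entry = self.stats.setdefault(peer, [0, 0])
        entry[0] += 1
        if good:
            entry[1] += 1

    def score(self, peer):
        """Fraction of good downloads, or None for a peer never seen."""
        total, good = self.stats.get(peer, (0, 0))
        return good / total if total else None

if __name__ == "__main__":
    trust = TrustVector()
    trust.record("b", True)
    trust.record("b", False)
    print(trust.score("b"), trust.score("z"))
```

Keeping the pair instead of a single ratio distinguishes a peer with one good download from a peer with a thousand, which a bare score cannot do.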
71Trust Operations
[Figure: peers a, b, c, d, e holding trust vectors (1, .9, .5, 0, 0), (1, 1, 0, .3, 1), (1, 0, 1, 1, .2), with link weights .5, .9, .3, 1, .3, .2; caption: update?]
72Issues
- Trust computations in dynamic system
- Overloading good nodes
- Bad nodes provide good content sometimes
- Bad nodes can build up reputation
- Bad nodes can form collectives
- ...
73Sample Results
[Plot: fraction of inauthentic downloads vs. fraction of malicious peers]
74P2P Challenges
- Search
- Resource Management
- Security and Privacy
75Resource Management
[Figure: nodes 1, 2, 3 with capacities C1, C2, C3]
- Local work ?Ci
- Remote work (1 - ?) Ci
76Incentives for Remote Work
- What is best value for ??
- How do I get remote nodes to work for me?
- Local work ?Ci
- Remote work (1 - ?) Ci
[Figure: nodes 1, 2, 3 with capacities C1, C2, C3]
77Conclusion
- P2P systems popular today
- Limewire, Kazaa
- Existing P2P systems vulnerable and inefficient
- Many challenges ahead
- Search
- Resource Management
- Security and Privacy
Lots of good research to be done
78For Additional Information
- Google
- Stanford Peers, OceanStore, Tapestry, Chord
- http://www-db.stanford.edu/peers/
- http://www.freehaven.net/
- http://cs1.cs.nyu.edu/waldman/publius/
- http://www.onionnetworks.com
79Appendix II
80Peer-to-Peer is Not Always Decentralized: when Centralization is Good
- Nelson Minar
- <nelson@monkey.org>
- http://www.media.mit.edu/nelson/
81Talk Overview
- Topologies of distributed systems
- Strengths and weaknesses
- Conclusions
Warning: broad generalizations ahead
82What is P2P Anyway?
- Decentralized systems? No
- Popular Power fails test
- Napster fails test
- Most Instant Messaging fails test
- Confuses topology with function
- Edge resources? Yes
- Small computers on edges contribute back
- All peers are active participants
83Distributed Systems Topologies
- Get away from fundamentalism
- Pure P2P, True P2P, etc
- Focus instead on system architecture
- How do the pieces fit together?
- Concentrate on connection topology
- Which topology for which problem?
84Centralized
- Client/server
- Web servers
- Databases
- Napster search
- Instant Messaging
- Popular Power
85Ring
- Fail-over clusters
- Simple load balancing
- Assumption
- Single owner
86Hierarchical
87Decentralized
- Gnutella
- Freenet
- Hive
- Internet routing
89Centralized + Centralized
- N-tier apps
- Database heavy systems
- Web services gateways
- Grand Central
90Centralized + Ring
- Serious web applications
- High availability servers
91Centralized + Decentralized
- Clip2 Gnutella Reflector
- FastTrack
- KaZaA
- Morpheus
- Email
92What about other topologies?
- Centralized + Hierarchical?
- Back end tree of information
- Caching architectures
- Decentralized + Ring?
- P2P network of fail-over clusters
- Decentralized + Hierarchical?
- Decentralized + Centralized?
93Strengths and Weaknesses
- Plenty of topologies to choose from
- What is each kind good for?
- Need a set of properties to measure
- Caution What follows is very high level
94Things to Measure
- Manageability
- How hard is it to keep working?
- Information coherence
- How authoritative is info? (Auditing, non-repudiation)
- Extensibility
- How easy is it to grow?
- Fault tolerance
- How well can it handle failures?
- Security
- How hard is it to subvert?
- Resistance to legal or political intervention
- How hard is it to shut down? (Can be good or bad)
- Scalability
- How big can it grow?
95Centralized
- Manageable: System is all in one place
- Coherent: All information is in one place
- Extensible: No one can add on to system
- Fault Tolerant: Single point of failure
- Secure: Simply secure one host
- Lawsuit-proof: Easy to shut down
- Scalable: One machine. But in practice?
96Ring
- Manageable: Simple rules for relationships
- Coherent: Easy logic for state
- Extensible: Only ring owner can add
- Fault Tolerant: Fail-over to next host
- Secure: As long as ring has one owner
- Lawsuit-proof: Shut down owner
- Scalable: Just add more hosts
97Hierarchical
- Manageable: Chain of authority
- Coherent: Cache consistency
- Extensible: Add more leaves, rebalance
- Fault Tolerant: Root is vulnerable
- Secure: Too easy to spoof links
- Lawsuit-proof: Just shut down the root
- Scalable: Hugely scalable (DNS)
98Decentralized
- Manageable: Very difficult, many owners
- Coherent: Difficult, unreliable peers
- Extensible: Anyone can join in!
- Fault Tolerant: Redundancy
- Secure: Difficult, open research
- Lawsuit-proof: No one to sue! (but follow )
- Scalable: Theory yes, practice no
99Centralized + Ring
- Manageable: Just manage the ring
- Coherent: As coherent as ring
- Extensible: No more than ring
- Fault Tolerant: Ring is a huge win
- Secure: As secure as ring
- Lawsuit-proof: Still single place to shut down
- Scalable: Ring is a huge win
Common architecture for web applications
100Centralized + Decentralized
- Manageable: Same as decentralized
- Coherent: Better than decentralized
- Extensible: Anyone can still join!
- Fault Tolerant: Plenty of redundancy
- Secure: Same as decentralized
- Lawsuit-proof: Still no one to sue
- Scalable: Looking very hopeful
Best architecture for P2P networks?
101Centralized vs. Decentralized
- Centralized is pretty good!
- Manageable
- Coherent
- Security
- Decentralized is exciting
- Extensible
- Massive fault tolerance
- Lawsuit-proof
- Scalability is the big question
102Conclusions
- Centralized is easy to deal with
- Major architecture for distributed systems
- Combines well with rings
- Decentralized is good, needs research
- Coherence, Manageability, Security
- Scalability
- Hierarchical is overlooked
- Combining architectures is powerful
103Peer-to-Peer is Not Always Decentralized: when Centralization is Good
- Nelson Minar
- <nelson@monkey.org>
- http://www.media.mit.edu/nelson/
Thanks to Marc Hedlund, Raffi Krikorian, Tony White
104Appendix III
105P2P Industry Outline
- "There's no peer-to-peer market any more than there's a client/server market." Anne Manes, Sun Microsystems
- Peer-to-peer encompasses a wide range of technologies centered around decentralizing computing
- Business and revenue models are still fuzzy
- There are clear opportunities and research excitement
106Distribution of P2P Companies
Category | Examples | Industry Share
Distributed Computing | Entropia, United Devices | 35%
Collaboration / Knowledge Management | Groove Networks, Engenia | 20%
Content Distribution | Akamai, Proksim | 10%
Infrastructure / Platform | Akavi, Xdegrees | 10%
File Sharing | Kazaa, Napster | 10%
Distributed Search | OpenCola, Thinkstream | 5%
(From "P2P 101: An Overview of the P2P Landscape" by Larry Cheng)
107Major Features of P2P Industry
- (From "P2P 101: An Overview of the P2P Landscape" by Larry Cheng)
- Lack of experienced, quality management teams
- Lack of detailed business models
- Skeptical investors
- 150 active companies
- Estimated 95% failure rate
- "The elephant in the room is the fact that most companies here are not commercially viable." Heard from a speaker at O'Reilly
108Current P2P Business Models
- Sell P2P products to end-users
- No current revenue-generating business model
- Sometimes coupled with content-sale models
- Sell content through P2P
- Subscription-based: I buy content from you
- Sponsor-based: Someone pays you to give me content
- Ad-based: You give me content and sell ads
109Current P2P Business Models
- Sell something which lets others profit from P2P
- Solve a critical problem for decentralized applications
- Offer support and enhanced services for free tools
- Specialized packages for particular industries
- Tools and libraries for P2P infrastructure
- "The people most likely to make money during a Gold Rush are the ones selling pickaxes and shovels." Andy Oram, The O'Reilly Network