P2P: An Overview

1
P2P: An Overview

Dr. Tony White
Carleton University

2
Outline

Introduction
Evolution of Network Computing
Definitions
The Rise of Edge Computing
Why Peer-to-Peer? What is it?
Applications
Cycle Sharing
Content Delivery
Open Problems
Summary

3
Evolution of Network Computing

Web introduced
- A common protocol HTTP
- A common document format HTML
- A universal client the browser

Client/server
- Introduced inequalities
- Required homogeneity

4
P2P Definition

Peer-to-peer computing is the location and
sharing of computer resources and services by
direct exchange between servents.
A servent is a peer that can adopt the roles of
both server and client when operating.

5
P2P Definition

P2P is a class of applications that takes
advantage of resources -- storage, cycles,
content, human presence -- available at the edges
of the Internet. Because accessing these
decentralized resources means operating in an
environment of unstable connectivity and
unpredictable IP addresses, P2P nodes must
operate outside the DNS system and have
significant or total autonomy from central
servers. (Clay Shirky, February 2000)

6
Definitions I

Pure peer-to-peer is completely decentralized and
characterized by the lack of a central server or
central entity; clients make direct contact with
one another.
Computational peer-to-peer uses P2P technology to
disseminate computational tasks over multiple
clients; peers do not have a direct connection to
one another.

7
Definitions II

Datacentric peer-to-peer is information and data
residing on systems or devices that is accessible
to others when users connect. It is sometimes
called peer-assisted or grid-assisted delivery.
Applications include distributed file and content
sharing.
Usercentric/hybrid peer-to-peer involves clients
contacting others via a central server or entity
to communicate, share data, or process data.
Often used in collaboration applications.

8
What is a P2P network?

It is an overlay network
Peer applications know IP addresses of other peer
applications.
Link between two nodes is actually an
application-level connection.

9
What matters?

Topology of overlay matters
Where content is stored matters
Search protocol matters
Gnutella results in
Poor performance
Poor reliability
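
Gnutella's search is commonly described as TTL-limited flooding over the overlay. The sketch below illustrates why that costs so many messages; the function name, the toy ring overlay, and the message counting are assumptions made for illustration, not the Gnutella wire protocol.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple


def flood_search(overlay: Dict[int, List[int]], start: int,
                 has_item: Callable[[int], bool], ttl: int) -> Tuple[List[int], int]:
    """Flood a query from `start`; return (peers holding the item, messages sent)."""
    seen = {start}
    queue = deque([(start, ttl)])
    hits: List[int] = []
    messages = 0
    while queue:
        node, hops_left = queue.popleft()
        if has_item(node):
            hits.append(node)
        if hops_left == 0:
            continue  # TTL exhausted: this peer does not forward the query
        for neighbor in overlay.get(node, []):
            messages += 1  # every forwarded copy costs bandwidth, even duplicates
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops_left - 1))
    return hits, messages


# Hypothetical 4-peer ring overlay; peer 2 holds the item.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(flood_search(ring, 0, lambda n: n == 2, ttl=2))
```

Even in this tiny overlay, several of the forwarded messages reach peers that have already seen the query, which is the overhead the slide alludes to.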

10
The Rise of Edge Computing

In P2P, clients also are servers, hence are
peers.
Driving P2P is the abundance of
Computing power
Non-volatile storage
Network bandwidth
(This seems to turn thin clients on their heads.)
Sharing from the edge
Physical resources: cycles, disk
Information resources: files, database access
Services: code mobility implied

11
P2P Enables Complete Access

P2P file swapping is the obvious application
Text, audio, video, executables,
Searching and sharing
Resources
Information
Information processing capacity
Searches
More current than Google
Indexing web logs (blogs, klogs)
More focused search within a peer group

12
P2P Enables Complete Access

Searching and sharing
Instant messaging
locate user quickly independent of service
provider.
Buyers and sellers
P2P auctions compete with Ebay.
Blogging
Sharing of self.
Edge-based multi-media streaming
Web radio
Web TV
Peer shells
Script complex P2P applications from simpler
ones.
Service creation using service composition.

13
P2P Enables Complete Access

A New Style of Distributed Computing
P2P applications tolerate peers coming/going.
Result depends on which peers are available.
High availability comes from probability that
some peers are available.
Not on load-balancing and fail over schemes.
Must avoid tragedy of the commons.
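
The availability argument above can be made concrete with a small calculation. This is a minimal sketch assuming each peer holding a replica is online independently with the same probability; both the independence assumption and the numbers are illustrative.

```python
def availability(p_online: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent peers is online."""
    return 1.0 - (1.0 - p_online) ** replicas


# Peers that are each online only 30% of the time (an assumed figure).
for n in (1, 5, 10, 20):
    print(n, round(availability(0.3, n), 4))
```

With enough replicas the service is almost always reachable even though no individual peer is reliable.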

14
Examples of Early P2P

Some new Internet applications are different
SETI@home
Instant messaging services (AIM, MS Messenger, ...)
P2P applications: no central authority/server.
Napster: quasi-P2P
Gnutella
Freenet
These applications are vertically integrated
Non-standard protocols
Closed namespaces
Stand alone

15
Problems I

Topology
Bandwidth usage
Fault tolerance
Search efficiency
Identity
Trust
Anonymity
Security
Authorization
Privacy

16
Problems II

Namespaces
Community Management
Overlaps traditional enterprise groups
Highly dynamic, user controlled
Firewall traversal
Political
IT loses control of content distribution
No control of information flow!
Legal
DRM

17
What is needed?

Interoperability (common protocols, standards)
Communication protocols (e.g. JXTA, Jabber, ...)
Representation of identity (or not!)
Semantic content (meta-data)
Secure information exchange
Must be able to guarantee trust within a network
Prevent unauthorized access to network
Policy-based control of information exchange
Ubiquity
Buy-in from large groups of users

18
Securing Distributed Computations in a Commercial
Environment

Philippe Golle, Stanford University
Stuart Stubblebine, CertCo

19
Example of a Distributed Computation

580,000 active participants
565,800 years of CPU time since 1996
26.1 TeraFLOPs / sec

20
Commercialization: supply

A dozen companies have recruited thousands of
participants
$100 million in venture funding in 2000
www.mithral.com
www.dcypher.net
(with www.processtree.com)
www.distributed.net
www.entropia.com
www.parabon.com
www.uniteddevices.com
www.popularpower.com
www.distributedsciences.com
www.datasynapse.com
www.juno.com

21
Commercialization: demand

Super-computing market: $2 billion / year
Computationally intensive, parallelizable
projects
Drug design research
Mathematical research
Economic simulations
Digital entertainment

22
Cheaters!
"Fifty percent of the project's resources have
been spent dealing with security problems."
"The really hard part has to do with verifying
computational results." David Anderson,
SETI@home's director.
23
Cycle Sharing Participants

Trusted supervisor
Maintains a pool of registered participants
Bids for large computations
Divides the computation into tasks that are
assigned to participants
Collects the results and distributes payment to
the participants
Examples: Distributed.net, Entropia.com, etc.

Untrusted participants
May range from large companies to individual
users
Participants are anonymous (No real world
leverage)
Participants may collude. We distinguish between
real-world entities (agents) and anonymous
participants.
Participants may leave the computation at any
time, either temporarily or for good.

24
Organization

Distribution of tasks
The unit of computation is a task
Assumption: all tasks have the same size and can
be run by any participant within the same time
bounds.
The supervisor runs a probabilistic algorithm to
assign tasks to participants.
The supervisor keeps track of who did what

25
Security

Definition: a computation is secure if no
rational, non-risk-seeking participant ever
cheats.
Collusion may occur only before tasks are
assigned.
A participant has 3 choices:
Request a computation and do it
Request a computation and NOT do it
Take a leave
Assumption: all errors are malicious

26
Utility function of an agent

Run the computation: utility a
Cheat and guess the result:
Cheating detected: utility -L
Cheating undetected: utility a + E

a: payment received per task
E: benefit of defecting (E = e·a)
L: cost of getting caught cheating

Security condition: (a + E)P - L(1 - P) < 0
where P is the probability that cheating is
undetected
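
A small sketch of the security condition, reading it as (a + E)P - L(1 - P) < 0 with E = e·a. The function names and the sample numbers are assumptions for illustration, not part of the original scheme.

```python
def cheating_utility(a: float, e: float, L: float, P: float) -> float:
    """Expected utility of cheating: gain a + E if undetected, lose L if caught."""
    E = e * a  # benefit of defecting, expressed as a multiple of the payment
    return (a + E) * P - L * (1.0 - P)


def is_secure(a: float, e: float, L: float, P: float) -> bool:
    """True when a rational participant has negative expected utility from cheating."""
    return cheating_utility(a, e, L, P) < 0


def max_undetected_probability(a: float, e: float, L: float) -> float:
    """Rearranged condition: the scheme is secure only while P < L / (a + E + L)."""
    return L / (a + e * a + L)


print(is_secure(a=1.0, e=1.0, L=10.0, P=0.5))
print(max_undetected_probability(a=1.0, e=1.0, L=10.0))
```

The rearranged form shows the trade-off directly: a larger penalty L lets the supervisor tolerate a higher chance that cheating goes undetected.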

27
Basic scheme

Registration
Participant performs d1 unpaid tasks
The supervisor verifies them (at limited cost)
The participant is accepted iff all the results
are correct
Assignment of a task
A task is given to N participants chosen
uniformly and independently at random
The number N is chosen according to the
probability distribution
Payment: a constant amount a per task if all the
results agree
If not, the task is re-assigned to a new set of
participants
Severance: a participant is paid an amount d·a
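
The assignment and payment steps can be sketched as follows. This is an illustration under stated assumptions: workers are modelled as plain functions, `draw_n` stands in for the unspecified probability distribution over N, participants are drawn without repetition, and registration and severance are omitted.

```python
import random
from typing import Callable, Dict, Tuple

Worker = Callable[[int], int]  # hypothetical model: a worker maps a task to a result


def run_task(task: int, workers: Dict[str, Worker],
             draw_n: Callable[[random.Random], int], payment: float,
             rng: random.Random, max_rounds: int = 100) -> Tuple[int, Dict[str, float]]:
    """Assign the task to N random workers; pay them only if all results agree."""
    for _ in range(max_rounds):
        n = min(draw_n(rng), len(workers))
        chosen = rng.sample(sorted(workers), n)  # uniform choice of N distinct workers
        results = {name: workers[name](task) for name in chosen}
        if len(set(results.values())) == 1:
            agreed = next(iter(results.values()))
            return agreed, {name: payment for name in chosen}
        # Disagreement: nobody is paid and the task goes to a new random set.
    raise RuntimeError("no agreement reached")


def honest(t: int) -> int:
    return t * t  # does the real work


def cheater(t: int) -> int:
    return 0  # guesses instead of computing


workers = {"h1": honest, "h2": honest, "c": cheater}
print(run_task(7, workers, lambda r: 2, payment=1.0, rng=random.Random(1)))
```

When N happens to be 1 a cheater is paid unchecked, which is why the distribution over N matters to the security condition.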

28
Properties

Security condition: (a + E)P - L(1 - P) < 0

Computational overhead   Setup time   Maximum coalition size   Maximum e
10                       10           1                        1
17                       10           10                       1
46                       10           1                        10
243                      10           1                        100

Overhead for small p

29
Participants with varying computational resources

Until now, we implicitly assumed that all
participants have the same computational
resources.
Unrealistic assumption
Security threat: an adversary may briefly control
a number of participants out of proportion to
her real computational power

Activity: a probability distribution over the
pool of participants, which evolves dynamically
over time
Participants are drawn at random according to the
Activity
We define rules for updating the activity
Security implications
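
Drawing participants according to an activity distribution can be sketched as a weighted random choice. The slide does not give the update rules, so none are shown here; the function names and sample weights are assumptions.

```python
import random
from typing import Dict, List


def normalize(activity: Dict[str, float]) -> Dict[str, float]:
    """Scale non-negative activity weights so they sum to 1."""
    total = sum(activity.values())
    return {name: weight / total for name, weight in activity.items()}


def draw_participants(activity: Dict[str, float], k: int,
                      rng: random.Random) -> List[str]:
    """Draw k participants (with replacement) in proportion to their activity."""
    names = list(activity)
    weights = [activity[name] for name in names]
    return rng.choices(names, weights=weights, k=k)


activity = normalize({"alice": 2.0, "bob": 6.0})
print(activity)
print(draw_participants(activity, 5, random.Random(0)))
```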

30
Content Delivery Networks

Swarmcast/OnionNetworks
File is stored in multiple locations
Idea is to retrieve portions of file from
separate hosts
File is split into small (32k) pieces
Requests are random
Space of packets bigger than file
Only subset of packets required
Technique is Forward Error Correction
Kazaa/Morpheus
MojoNation (HiveCache)
Distributed backup and restore system
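
The idea that only a subset of packets is required can be illustrated with the simplest erasure code, a single XOR parity packet. This is not Swarmcast's actual encoding (the slide does not describe it); it tolerates only one lost packet, whereas practical systems use stronger codes. All names are illustrative.

```python
from typing import Dict, List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal pieces (zero padded) and append one parity packet."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    pieces = [padded[i * size:(i + 1) * size] for i in range(k)]
    return pieces + [xor_blocks(pieces)]


def decode(received: Dict[int, bytes], k: int, length: int) -> bytes:
    """Rebuild the data from any k of the k + 1 packets (index k is the parity)."""
    missing = [i for i in range(k) if i not in received]
    if len(missing) > 1 or (missing and k not in received):
        raise ValueError("too many packets lost for a single-parity code")
    pieces = dict(received)
    if missing:
        others = [received[i] for i in range(k + 1) if i != missing[0]]
        pieces[missing[0]] = xor_blocks(others)  # XOR of the rest restores the gap
    return b"".join(pieces[i] for i in range(k))[:length]


data = b"hello peer-to-peer world"
packets = encode(data, 4)
survivors = {i: p for i, p in enumerate(packets) if i != 2}  # packet 2 never arrives
print(decode(survivors, 4, len(data)))
```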

31
Privacy Networks: Publius

Publius
Publishers want to publish anonymously
Servers host random-looking content
Storage
The publisher takes the key K that is used to
encrypt the file and splits it into n shares,
such that any k of them can reproduce the
original K, but k-1 give no hints as to the key.
Each server receives the encrypted Publius
content and one of the shares.
Retrieval
A retriever must get the encrypted Publius
content from some server and k of the shares.
Content is tied to URL that is used to recover
the data and the shares.
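
Shamir's secret sharing is one standard construction with the "any k of n shares" property described above. The sketch below works over a prime field and treats the key as an integer; the prime, function names, and sample key are assumptions, and file encryption itself is left out.

```python
import secrets
from typing import List, Tuple

PRIME = 2**521 - 1  # a Mersenne prime; the key must be smaller than this


def split_key(key: int, n: int, k: int) -> List[Tuple[int, int]]:
    """Split `key` into n shares such that any k of them recover it."""
    # Random polynomial of degree k - 1 whose constant term is the key.
    coeffs = [key] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_key(shares: List[Tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 using the supplied shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


key = 123456789
shares = split_key(key, n=5, k=3)
print(recover_key(shares[:3]) == key, recover_key(shares[2:]) == key)
```

Each server would hold one `(x, y)` pair alongside the encrypted content; `pow(den, -1, PRIME)` needs Python 3.8 or later.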

32
Privacy Networks: Freehaven

Anonymity
Publishers that insert documents,
Readers that retrieve documents,
Servers that store documents.
Uses a free, low-latency, two-way mixnet for
forward-anonymous communication.
Accountability
Reputation and micropayment schemes, which allow
us to limit the damage done by servers that
misbehave.
Persistence
Publisher of a document determines its lifetime.
Flexibility
System functions smoothly as peers dynamically
join or leave

33
OceanStore: Toward Global-Scale, Self-Repairing,
Secure and Persistent Storage

John Kubiatowicz
University of California at Berkeley

34
OceanStore Context: Ubiquitous Computing

Computing everywhere
Desktop, Laptop, Palmtop
Cars, Cellphones
Shoes? Clothing? Walls?
Connectivity everywhere
Rapid growth of bandwidth in the interior of the
net
Broadband to the home and office
Wireless technologies such as CDMA, satellite,
laser
Where is persistent data?

35
Utility-based Infrastructure?

Data service provided by storage federation
Cross-administrative domain
Pay for Service

36
OceanStore: Everyone's Data, One Big Utility
The data is just out there

How many files in the OceanStore?
Assume 10^10 people in world
Say 10,000 files/person (very conservative?)
So 10^14 files in OceanStore!
If 1 gig files (ok, a stretch), get 1 mole of
bytes!
Truly impressive number of elements but small
relative to physical constants
Aside: new results: 1.5 Exabytes/year (1.5×10^18)
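
The back-of-the-envelope estimate above can be checked in a few lines. The sketch assumes "1 gig" means 10^9 bytes and uses Avogadro's number for "1 mole"; the variable names are illustrative.

```python
people = 10**10
files_per_person = 10**4
bytes_per_file = 10**9          # assumed reading of "1 gig files"
AVOGADRO = 6.022e23

files = people * files_per_person        # 10**14 files
total_bytes = files * bytes_per_file     # 10**23 bytes
print(files == 10**14, total_bytes == 10**23)
print(total_bytes / AVOGADRO)            # fraction of a mole
```

The total comes to roughly a sixth of a mole, so "1 mole of bytes" holds as an order-of-magnitude statement.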

37
OceanStore Assumptions

Untrusted Infrastructure
The OceanStore is comprised of untrusted
components
Individual hardware has finite lifetimes
All data encrypted within the infrastructure
Responsible Party
Some organization (i.e. service provider)
guarantees that your data is consistent and
durable
Not trusted with content of data, merely its
integrity
Mostly Well-Connected
Data producers and consumers are connected to a
high-bandwidth network most of the time
Exploit multicast for quicker consistency when
possible
Promiscuous Caching
Data may be cached anywhere, anytime

38
The Peer-To-Peer View: Irregular Mesh of Pools
39
Key Observation: Want Automatic Maintenance

Can't possibly manage billions of servers by
hand!
System should automatically
Adapt to failure
Exclude malicious elements
Repair itself
Incorporate new elements
System should be secure and private
Encryption, authentication
System should preserve data over the long term
(accessible for 1000 years)
Geographic distribution of information
New servers added from time to time
Old servers removed from time to time
Everything just works

40
Outline: Three Technologies and a Principle

Principle: ThermoSpective Systems Design
Redundancy and Repair everywhere
Structured, Self-Verifying Data
Let the Infrastructure Know What is important
Decentralized Object Location and Routing
A new abstraction for routing
Deep Archival Storage
Long Term Durability

41
Attack Resistant P2P

Content can be compromised by
Attack by malicious agents
Censorship
Faulty nodes
Remember
Nodes have finite resources

42
Gnutella
43
Morpheus/Kazaa
super peer
44
Examples

Napster shut down by attacks on central server
Gnutella spammed by Flatplanet
Removal of a few peers shatters Gnutella
63 from 1800 in figures

45
Performance
After deletion of 2/3 of peers, 99% of the remainder
can still access 99% of the data items
46
DRN design (Jared Saia)

Topology based upon butterfly network (constant
degree version of hypercube)
Each vertex of butterfly called a supernode
Each supernode represents a set of peers
Each peer is in multiple supernodes

47
DRN Topology
N peers, n supernodes.
Each peer participates in C log n randomly chosen supernodes.
Supernode X connected to supernode Y means all nodes in X
are connected to all nodes in Y.
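
The membership rule can be sketched as follows. The butterfly wiring between supernodes is not reproduced: supernode edges are supplied by the caller. The natural logarithm, the rounding, and all names are assumptions made for illustration.

```python
import math
import random
from typing import Dict, List, Set, Tuple


def assign_supernodes(num_peers: int, num_supernodes: int, c: float,
                      rng: random.Random) -> Dict[int, Set[int]]:
    """Place each peer in about c * log(n) distinct, randomly chosen supernodes."""
    per_peer = min(num_supernodes,
                   max(1, math.ceil(c * math.log(num_supernodes))))
    membership: Dict[int, Set[int]] = {s: set() for s in range(num_supernodes)}
    for peer in range(num_peers):
        for s in rng.sample(range(num_supernodes), per_peer):
            membership[s].add(peer)
    return membership


def peer_links(membership: Dict[int, Set[int]],
               supernode_edges: List[Tuple[int, int]]) -> Set[Tuple[int, int]]:
    """Supernode X connected to Y: every peer in X links to every peer in Y."""
    links: Set[Tuple[int, int]] = set()
    for x, y in supernode_edges:
        for p in membership[x]:
            for q in membership[y]:
                if p != q:
                    links.add((min(p, q), max(p, q)))
    return links


membership = assign_supernodes(20, 8, 1.0, random.Random(0))
print({s: sorted(peers) for s, peers in membership.items()})
```

Because each peer sits in several supernodes, a supernode stays usable as long as some of its members survive.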
48
Conclusion

P2P systems popular today
Limewire, Kazaa
Existing P2P systems vulnerable and inefficient
Many challenges ahead
Search
Resource Management
Security and Privacy

Lots of good research to be done
49
Appendix I

Open Problems in P2P Data Sharing

50
Open Problems in Data Sharing Peer-To-Peer Systems

Hector Garcia-Molina
ICDT Conference, January 10, 2003
Contributors Mayank Bawa, Brian Cooper, Arturo
Crespo,
Neil Daswani, Prasanna Ganesan, Sergio Marti,
Qi Sun, Beverly Yang and others

51
P2P Challenges

Search
Resource Management
Security Privacy

Not independent challenges!
52
Search

Search Options
Query Expressiveness
Comprehensiveness
Topology
Data Placement
Message Routing

53
Comparison
54
Content Addressable Network (CAN)
(Figure: nodes and data)
A distributed hash table at Internet scale
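
CAN is usually described as hashing keys to points in a coordinate space that is partitioned into zones, one per node. The sketch below shows only that key-to-owner mapping in two dimensions, with no routing, joins, or zone splitting; the `Zone` class, the hash choice, and the four-node layout are assumptions.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Zone:
    """A rectangular region [x0, x1) x [y0, y1) of the unit square owned by one node."""
    owner: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def key_to_point(key: str) -> Tuple[float, float]:
    """Hash a key to a point in [0, 1) x [0, 1)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Keep 53 bits per coordinate so the float conversion is exact and below 1.
    x = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    y = (int.from_bytes(digest[8:16], "big") >> 11) / 2**53
    return x, y


def owner_of(key: str, zones: List[Zone]) -> str:
    """Return the node whose zone contains the key's point."""
    x, y = key_to_point(key)
    for zone in zones:
        if zone.contains(x, y):
            return zone.owner
    raise LookupError("zones do not cover the key's point")


zones = [Zone("n1", 0.0, 0.0, 0.5, 0.5), Zone("n2", 0.5, 0.0, 1.0, 0.5),
         Zone("n3", 0.0, 0.5, 0.5, 1.0), Zone("n4", 0.5, 0.5, 1.0, 1.0)]
print(owner_of("origin of species", zones))
```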
55
Comparison
56
Challenge: Exploring the Space
(Figure: design space with axes autonomy, efficiency,
and robustness; Gnutella and CAN shown as points)

57
Search Index Link (SIL) Model

Forwarding search link (FSL)
Non-forwarding search link (NSL)
Forwarding index link (FIL)
Non-forwarding index link (NIL)

59
Super-Peer Network
(Figure: super-peer network with a core and nodes A through H)
60
SIL Challenges

Desirable graph properties
Desirable features
Dynamic configuration

61
Example Property: Redundancy
(Figure: SIL graph over nodes A, B, and C)

Redundancy exists in a SIL graph if a link can
be removed without reducing coverage

62
Example: Undesirable Feature
(Figure: SIL graph over nodes A, B, C, D, and E)

One-index cycle: node A has an index link to B,
and there is a search path from B to A
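
Detecting a one-index cycle amounts to a reachability check over search links. In this sketch every search link is treated as forwarding, and the adjacency-dict representation and function names are assumptions.

```python
from collections import deque
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]


def has_search_path(search_links: Graph, src: str, dst: str) -> bool:
    """Breadth-first search along search links from src to dst."""
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in search_links.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def one_index_cycles(search_links: Graph, index_links: Graph) -> List[Tuple[str, str]]:
    """Index links A -> B for which a search path leads from B back to A."""
    return [(a, b)
            for a, targets in index_links.items()
            for b in targets
            if has_search_path(search_links, b, a)]


search_links = {"B": ["C"], "C": ["A"]}
index_links = {"A": ["B"]}
print(one_index_cycles(search_links, index_links))
```

A joining node could run the same check on a candidate link before creating it.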

63
Avoiding Undesirable Features

Node D is joining the system
what neighbors should it connect to?
what type of links should it use?

(Figure: node D joining a network of nodes A, B, C, and E)
64
Open Problems: Security

Availability (e.g., coping with DoS attacks)
Authenticity
Anonymity
Access Control (e.g., IP protection,
payments,...)

65
Authenticity
(Figure: a document whose authenticity is in question)
title: origin of species
author: charles darwin
date: 1859
body: In an island far, far away ...
66
More than Just File Integrity
(Figure: the same document with a checksum attached)
title: origin of species
author: charles darwin
date: 1859
body: In an island far, far away ...
checksum
67
More than Fetching One File
T = origin, Y = ?; A = darwin, B = ?
68
Solutions

Authenticity function: A(doc) = T or F
at expert sites, at all sites?
can use signature: expert -> sig(doc) -> user
Voting Based
authentic is what majority says
Time Based
e.g., oldest version (available) is authentic
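
The voting-based and time-based rules can be sketched as two small functions. Versions are represented by hypothetical content digests, and the strict-majority threshold is an assumption; the slide only says "what majority says".

```python
from collections import Counter
from typing import List, Optional, Tuple


def majority_version(reports: List[str]) -> Optional[str]:
    """Voting based: the version reported by a strict majority of peers, if any."""
    if not reports:
        return None
    version, count = Counter(reports).most_common(1)[0]
    return version if count > len(reports) / 2 else None


def oldest_version(versions: List[Tuple[float, str]]) -> Optional[str]:
    """Time based: the version with the earliest timestamp among those available."""
    if not versions:
        return None
    return min(versions, key=lambda v: v[0])[1]


print(majority_version(["d1", "d1", "d2"]))
print(oldest_version([(1859.0, "d1"), (1999.0, "d2")]))
```

Both rules can be gamed: colluding peers can outvote honest ones, and timestamps are only as trustworthy as whoever reports them.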

69
Added Challenge: Efficiency

Example: current music sharing
everyone has authenticity function
but downloading files is expensive

70
How to Track Peer Behavior?

Trust vector: (v1, v2, v3, v4) for peers a, b, c, d

Pair of values: (total downloads, good
downloads)?
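
One way to read the "pair of values" idea is to keep, for each peer, a count of total and good downloads and derive a trust score from the ratio. The class below is a sketch under that assumption; the names and the default score for unknown peers are illustrative.

```python
from typing import Dict, Tuple


class TrustVector:
    """Tracks (total downloads, good downloads) for every peer we have used."""

    def __init__(self) -> None:
        self._stats: Dict[str, Tuple[int, int]] = {}

    def record(self, peer: str, good: bool) -> None:
        """Record the outcome of one download from `peer`."""
        total, good_count = self._stats.get(peer, (0, 0))
        self._stats[peer] = (total + 1, good_count + (1 if good else 0))

    def trust(self, peer: str, default: float = 0.0) -> float:
        """Fraction of good downloads, or `default` for a peer never used."""
        total, good_count = self._stats.get(peer, (0, 0))
        return good_count / total if total else default


tv = TrustVector()
tv.record("a", good=True)
tv.record("a", good=True)
tv.record("a", good=False)
print(tv.trust("a"), tv.trust("b"))
```

Keeping both counts, rather than only the ratio, lets a peer distinguish a long good record from a single lucky download.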

71
Trust Operations
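(Figure: peers a through e; trust vectors (1, .9, .5, 0, 0),
(1, 1, 0, .3, 1), and (1, 0, 1, 1, .2); edges labeled with
trust values such as .5, .9, .3, 1, and .2)
How should the vectors be updated?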
72
Issues

Trust computations in dynamic system
Overloading good nodes
Bad nodes provide good content sometimes
Bad nodes can build up reputation
Bad nodes can form collectives
...

73
Sample Results
(Figure: fraction of inauthentic downloads versus
fraction of malicious peers)
74
P2P Challenges

Search
Resource Management
Security and Privacy

75
Resource Management
(Figure: nodes 1, 2, and 3 with capacities C1, C2, and C3)

Local work ?Ci
Remote work (1 - ?) Ci

76
Incentives for Remote Work

What is best value for ??
How do I get remote nodes to work for me?

(Figure: nodes 1, 2, and 3 with capacities C1, C2, and C3)

Local work ?Ci
Remote work (1 - ?) Ci
77
Conclusion

P2P systems popular today
Limewire, Kazaa
Existing P2P systems vulnerable and inefficient
Many challenges ahead
Search
Resource Management
Security and Privacy

Lots of good research to be done
78
For Additional Information

Google
Stanford Peers, OceanStore, Tapestry, Chord
http://www-db.stanford.edu/peers/
http://www.freehaven.net/
http://cs1.cs.nyu.edu/waldman/publius/
http://www.onionnetworks.com

79
Appendix II

P2P Architectures

80
Peer-to-Peer is Not Always Decentralized: When
Centralization is Good

Nelson Minar
<nelson@monkey.org>
http://www.media.mit.edu/nelson/

81
Talk Overview

Topologies of distributed systems
Strengths and weaknesses
Conclusions

Warning: broad generalizations ahead
82
What is P2P Anyway?

Decentralized systems? No
Popular Power fails test
Napster fails test
Most Instant Messaging fails test
Confuses topology with function
Edge resources? Yes
Small computers on edges contribute back
All peers are active participants

83
Distributed Systems Topologies

Get away from fundamentalism
Pure P2P, True P2P, etc
Focus instead on system architecture
How do the pieces fit together?
Concentrate on connection topology
Which topology for which problem?

84
Centralized

Client/server
Web servers
Databases
Napster search
Instant Messaging
Popular Power

85
Ring

Fail-over clusters
Simple load balancing
Assumption
Single owner

86
Hierarchical

DNS
NTP
Usenet (sort of)

87
Decentralized

Gnutella
Freenet
Hive
Internet routing

88
(No Transcript)
89
Centralized + Centralized

N-tier apps
Database heavy systems
Web services gateways
Grand Central

90
Centralized + Ring

Serious web applications
High availability servers

91
Centralized + Decentralized

Clip2 Gnutella Reflector
FastTrack
KaZaA
Morpheus
Email

92
What about other topologies?

Centralized + Hierarchical?
Back end tree of information
Caching architectures
Decentralized + Ring?
P2P network of fail-over clusters
Decentralized + Hierarchical?
Decentralized + Centralized?

93
Strengths and Weaknesses

Plenty of topologies to choose from
What is each kind good for?
Need a set of properties to measure
Caution: what follows is very high level

94
Things to Measure

Manageability
How hard is it to keep working?
Information coherence
How authoritative is info? (Auditing,
non-repudiation)
Extensibility
How easy is it to grow?
Fault tolerance
How well can it handle failures?
Security
How hard is it to subvert?
Resistance to legal or political intervention
How hard is it to shut down? (Can be good or bad)
Scalability
How big can it grow?

95
Centralized

Manageable: System is all in one place
Coherent: All information is in one place
Extensible: No one can add on to system
Fault Tolerant: Single point of failure
Secure: Simply secure one host
Lawsuit-proof: Easy to shut down
Scalable: One machine. But in practice?

96
Ring

Manageable: Simple rules for relationships
Coherent: Easy logic for state
Extensible: Only ring owner can add
Fault Tolerant: Fail-over to next host
Secure: As long as ring has one owner
Lawsuit-proof: Shut down owner
Scalable: Just add more hosts

97
Hierarchical

Manageable: Chain of authority
Coherent: Cache consistency
Extensible: Add more leaves, rebalance
Fault Tolerant: Root is vulnerable
Secure: Too easy to spoof links
Lawsuit-proof: Just shut down the root
Scalable: Hugely scalable (DNS)

98
Decentralized

Manageable: Very difficult, many owners
Coherent: Difficult, unreliable peers
Extensible: Anyone can join in!
Fault Tolerant: Redundancy
Secure: Difficult, open research
Lawsuit-proof: No one to sue! (but follow )
Scalable: In theory, yes; in practice, no

99
Centralized + Ring

Manageable: Just manage the ring
Coherent: As coherent as ring
Extensible: No more than ring
Fault Tolerant: Ring is a huge win
Secure: As secure as ring
Lawsuit-proof: Still single place to shut down
Scalable: Ring is a huge win

Common architecture for web applications
100
Centralized + Decentralized

Manageable: Same as decentralized
Coherent: Better than decentralized
Extensible: Anyone can still join!
Fault Tolerant: Plenty of redundancy
Secure: Same as decentralized
Lawsuit-proof: Still no one to sue
Scalable: Looking very hopeful

Best architecture for P2P networks?
101
Centralized vs. Decentralized

Centralized is pretty good!
Manageable
Coherent
Security
Decentralized is exciting
Extensible
Massive fault tolerance
Lawsuit-proof
Scalability is the big question

102
Conclusions

Centralized is easy to deal with
Major architecture for distributed systems
Combines well with rings
Decentralized is good, needs research
Coherence, Manageability, Security
Scalability
Hierarchical is overlooked
Combining architectures is powerful

103
Peer-to-Peer is Not Always Decentralized: When
Centralization is Good

Nelson Minar
<nelson@monkey.org>
http://www.media.mit.edu/nelson/

Thanks to Marc Hedlund, Raffi Krikorian, Tony
White
104
Appendix III

P2P Industry

105
P2P Industry Outline

"There's no peer-to-peer market any more than
there's a client/server market." Anne Manes, Sun
Microsystems
Peer-to-peer encompasses a wide range of
technologies centered around decentralizing
computing
Business and revenue models are still fuzzy
There are clear opportunities and research
excitement

106
Distribution of P2P Companies
Category                               Examples                    Industry Share (%)
Distributed Computing                  Entropia, United Devices    35
Collaboration / Knowledge Management   Groove Networks, Engenia    20
Content Distribution                   Akamai, Proksim             10
Infrastructure / Platform              Akavi, Xdegrees             10
File Sharing                           Kazaa, Napster              10
Distributed Search                     OpenCola, Thinkstream       5
(From "P2P 101: An Overview of the P2P Landscape"
by Larry Cheng)
107
Major Features of P2P Industry

(From "P2P 101: An Overview of the P2P Landscape"
by Larry Cheng)
Lack of experienced, quality management teams
Lack of detailed business models
Skeptical investors
150 active companies
Estimated 95% failure rate
"The elephant in the room is the fact that most
companies here are not commercially viable."
Heard from a speaker at O'Reilly

108
Current P2P Business Models

Sell P2P products to end-users
No current revenue-generating business model
Sometimes coupled with content-sale models
Sell content through P2P
Subscription-based: I buy content from you
Sponsor-based: Someone pays you to give me
content
Ad-based: You give me content and sell ads

109
Current P2P Business Models

Sell something which lets others profit from P2P
Solve a critical problem for decentralized
applications
Offer support and enhanced services for free
tools
Specialized packages for particular industries
Tools and libraries for P2P infrastructure
"The people most likely to make money during a
Gold Rush are the ones selling pickaxes and
shovels." Andy Oram, The O'Reilly Network