1. Peer-to-Peer Systems and File Sharing
- Carl Lagoze, CS431, Cornell University
- May 3, 2004
- Portions borrowed from sources listed on the next slide
2. Sources of this lecture
- J. Berkes, Decentralized Peer-to-Peer Network Architecture: Gnutella and Freenet
- R. Morris, Chord/DHash/Ivy: Building Principled Peer-to-Peer Systems
- S. Kamvar, M. Schlosser, H. Garcia-Molina, EigenRep: Reputation Management in P2P Networks
- J. Golbeck, B. Parsia, J. Hendler, Trust Networks on the Semantic Web
3. Characteristics of P2P Networks
- Sharing of computing resources by direct exchange
- Blurred distinction between clients, servers, and routers
- Nodes are autonomous
4. P2P Advantages
- Efficient use of resources
- Scalability
- Reliability
- Administrative simplicity
- Democracy
5. P2P's Political History
- Major basis in the music-sharing context
  - Overshadows numerous other applications
- Recent research is investigating generic applicability
  - DHTs
  - Reputation and trust
6. Small-World Phenomenon
- Milgram's "six degrees of separation" (1967)
  - Forwarding of letters from Nebraska to Boston, MA
  - Average chain of six hops
7. Power Laws and Small Worlds
- Out-degree distribution is $P(k) \propto 1/k^{a}$, where $a > 0$
- Characteristic of a variety of phenomena
  - Web graph
  - IMDb connections (acted in the same movie)
  - Social interactions
  - P2P networks (Gnutella)
  - Epidemiology
8. Strength of Weak Ties
- Extension of the power-law phenomenon
- Short-cuts (between cliques) are critical to the small-world phenomenon
9. Napster and P2P
- Not really P2P
  - Central search index
  - Direct interaction for access (P2P)
- Central index was key to litigation
10. Gnutella
- Fully P2P
- Flooded query
- Scalability problems
  - TTL (time-to-live) limits how far a query is broadcast
  - Query memory prevents queries from circulating endlessly
- Reliability problems
- But whom to sue?
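A minimal sketch of a flooded query, assuming a hypothetical overlay given as an adjacency dict. It models the two controls above: the TTL bounds how many hops the broadcast travels, and a "seen" set stands in for query memory, so a query that loops back over a cycle is not forwarded again. Real Gnutella tracks queries by message ID and routes hits back along the reverse path; both are omitted here, and all names are illustrative.

```python
from collections import deque

def flood_query(neighbors, origin, has_file, ttl):
    """Flood a query outward from `origin`; return the set of peers reporting a hit.

    neighbors: dict mapping each peer to its adjacent peers (hypothetical overlay)
    has_file:  set of peers holding a matching file
    ttl:       maximum number of hops the query may travel
    """
    seen = {origin}                  # query memory: each peer handles a query once
    hits = set()
    queue = deque([(origin, ttl)])   # breadth-first, like hop-by-hop forwarding
    while queue:
        peer, remaining = queue.popleft()
        if peer in has_file:
            hits.add(peer)
        if remaining == 0:
            continue                 # TTL exhausted: do not forward further
        for nxt in neighbors.get(peer, ()):
            if nxt not in seen:      # a copy arriving over a cycle is dropped
                seen.add(nxt)
                queue.append((nxt, remaining - 1))
    return hits

# Hypothetical overlay: a cycle A-B-C-D plus a peer E hanging off D.
overlay = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"],
           "D": ["C", "A", "E"], "E": ["D"]}
print(flood_query(overlay, "A", {"C", "E"}, ttl=1))
print(flood_query(overlay, "A", {"C", "E"}, ttl=2))
```

With a TTL of 1 the query from A reaches only B and D and finds nothing; raising it to 2 reaches C and E. Each extra hop multiplies the number of messages, which is the scalability problem the slide names.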
11. Kazaa
- Hybrid of the Napster and Gnutella models
- Notion of a super-peer
  - Like a regionalized Napster server
  - Chosen dynamically based on peer characteristics
- P2P relationship among super-peers
- Queries directed towards super-peers
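A minimal sketch of query routing in a two-tier super-peer overlay. Kazaa's actual protocol (FastTrack) was proprietary, so this is a generic model under stated assumptions: each leaf is attached to one super-peer that indexes its files, and a super-peer forwards a query only one hop to its super-peer neighbors. All names and data are hypothetical.

```python
def superpeer_search(leaf_of, index, superpeer_links, requester, filename):
    """Find leaf peers sharing `filename` via a two-tier super-peer overlay.

    leaf_of:         leaf peer -> the super-peer it is attached to
    index:           super-peer -> {filename: set of its leaves sharing that file}
    superpeer_links: super-peer -> neighboring super-peers (the P2P tier)
    """
    home = leaf_of[requester]
    # 1. The leaf sends its query only to its own super-peer, which checks its index.
    results = set(index[home].get(filename, set()))
    # 2. The super-peer forwards the query to its super-peer neighbors (one hop here).
    for other in superpeer_links[home]:
        results |= index[other].get(filename, set())
    return results

leaf_of = {"l1": "S1", "l2": "S1", "l3": "S2", "l4": "S3"}
index = {"S1": {"song.mp3": {"l2"}}, "S2": {"song.mp3": {"l3"}}, "S3": {}}
superpeer_links = {"S1": ["S2"], "S2": ["S1", "S3"], "S3": ["S2"]}
print(superpeer_search(leaf_of, index, superpeer_links, "l1", "song.mp3"))
```

Because forwarding stops after one super-peer hop, a search from l4 (attached to S3) finds l3 but not l2; a larger hop limit, like Gnutella's TTL, would widen the search at the cost of more messages.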
12. Free-riding
- Definition: downloading but not sharing any data
- On Gnutella networks, 15% of users contribute 94% of content
13. Freenet
- Goal: create an uncensorable and secure global information store
  - Anonymity and fault tolerance
- http://freenet.sourceforge.net/
- Three types of network messages
  - Advertise storage space to store unknown data
  - Insert a file into the network
  - Request a file (with a key) from the network
- Use of one-way secure hashes to identify files and encryption to store files
  - A node does not know what it is storing
- Non-traceability of messages
  - A node cannot determine where its message is stored
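A toy illustration of hash-based file identification, assuming a single node and using SHA-1 as the one-way hash (our choice for illustration). The encryption step that keeps a node from knowing what it stores is omitted, as are Freenet's routing and its different key types; the class and function names are hypothetical.

```python
import hashlib

def content_key(data: bytes) -> str:
    """Hypothetical content-hash key: the hex SHA-1 digest of the file's bytes."""
    return hashlib.sha1(data).hexdigest()

class NodeStore:
    """Toy single-node store keyed by content hash (encryption is omitted here)."""

    def __init__(self):
        self.blobs = {}

    def insert(self, data: bytes) -> str:
        key = content_key(data)
        self.blobs[key] = data
        return key

    def request(self, key: str) -> bytes:
        data = self.blobs[key]
        # Because the key is a one-way hash of the content, the requester can
        # re-hash what it receives and detect a node that returned altered data.
        if content_key(data) != key:
            raise ValueError("returned data does not match its key")
        return data

store = NodeStore()
key = store.insert(b"lecture notes")
print(key, store.request(key))
```

The hash is one-way, so the key reveals nothing useful about the content, yet anyone holding the content can recompute the key and check integrity.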
14. Freenet Request/Response Sequence
15. Distributed Hash Tables (DHT)
- Overcoming the flooded-search problem
- Operationally like standard hash tables
  - Data is distributed around the network
- Features
  - Efficient
    - O(log N) messages per lookup
    - Even distribution of keys among nodes
  - Adaptable
    - Network reconfiguration does not cascade to all nodes
  - Robust
    - Replication of tables allows survival of node failures
16. Chord
- One implementation of a DHT within a larger P2P project
  - http://www.pdos.lcs.mit.edu/chord/
- Algorithm properties
  - A common hash function distributes node IDs (from IP addresses) and document IDs uniformly
  - Maps a content key to its successor node
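A sketch of the "common hash function" idea, assuming SHA-1 (the base hash the Chord paper uses, as we recall) reduced to a hypothetical 7-bit identifier space so that IDs fall in 0..127. The addresses and file name are made up.

```python
import hashlib

M = 7   # hypothetical identifier length in bits: IDs lie in 0 .. 2**M - 1

def chord_id(name: str, m: int = M) -> int:
    """Hash a node address or document name onto the 2**m identifier circle."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

# The same function places nodes (by IP address) and documents (by name).
for name in ["128.84.154.1", "128.84.154.2", "lecture-notes.pdf"]:
    print(f"{name:>18} -> {chord_id(name)}")
```

Using one function for both nodes and keys is what lets a key be assigned to "its successor node": both live on the same circle.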
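17. Chord Key Mapping
[Figure: circular ID space with nodes N10, N32, N60, N80, N100; each key is stored at its successor node: K5, K10 at N10; K11, K30 at N32; K33, K40, K52 at N60; K65, K70 at N80; K100 at N100]
- Robustness via each node remembering N successors and replicating its table at those successors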
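The key-to-node mapping for nodes N10, N32, N60, N80 and N100 can be reproduced with a sorted list of node IDs and a binary search: a key goes to the first node whose ID is equal to or greater than it, wrapping around to the smallest ID past the top of the circle. A minimal sketch, assuming keys are already reduced into the identifier space:

```python
from bisect import bisect_left

NODES = [10, 32, 60, 80, 100]    # node IDs from the figure, kept sorted

def successor(key: int, nodes=NODES) -> int:
    """Return the first node whose ID is >= key, wrapping around the circle."""
    i = bisect_left(nodes, key)
    return nodes[i] if i < len(nodes) else nodes[0]

for k in [5, 10, 11, 30, 33, 40, 52, 65, 70, 100]:
    print(f"K{k} -> N{successor(k)}")
```

When a node joins or leaves, only the keys between it and its predecessor change owners, which is why reconfiguration does not cascade to all nodes.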
18. Use of finger table to avoid linear lookups
The $i$-th finger table entry of node $n$ points to the first node that succeeds $n$ by at least $2^{i-1}$
19. Key location with finger table
- Use the finger table to find the furthest node that precedes the key
- O(log N) hops lead to the target
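A minimal single-process simulation of finger tables and finger-based lookup, assuming a hypothetical 7-bit ID space and the five nodes N10, N32, N60, N80, N100 from the key-mapping example. finger[i] of node n is successor(n + 2^(i-1)); a lookup repeatedly jumps to the furthest finger that still precedes the key until the key falls between the current node and its successor. Real Chord nodes hold only their own fingers and exchange messages; here every table is computed from a global node list for brevity.

```python
from bisect import bisect_left

M = 7                              # hypothetical 7-bit identifier space
RING = 2 ** M                      # IDs run from 0 to 127
NODES = [10, 32, 60, 80, 100]      # sorted node IDs from the key-mapping example

def successor(key):
    """First node whose ID is >= key (mod RING), wrapping past the top."""
    i = bisect_left(NODES, key % RING)
    return NODES[i] if i < len(NODES) else NODES[0]

def fingers(n):
    """finger[i] = successor(n + 2**(i-1)) for i = 1..M."""
    return [successor(n + 2 ** (i - 1)) for i in range(1, M + 1)]

def between(x, a, b):
    """True if x lies strictly inside the clockwise interval (a, b)."""
    if a < b:
        return a < x < b
    return x > a or x < b          # the interval wraps past 0 (or a == b)

def closest_preceding_finger(n, key):
    for f in reversed(fingers(n)):     # try the longest jumps first
        if between(f, n, key):
            return f
    return n

def lookup(start, key):
    """Route from node `start` toward `key`; return (responsible node, nodes visited)."""
    n, path = start, [start]
    while True:
        succ = successor(n + 1)        # n's immediate successor on the ring
        if between(key, n, succ) or key == succ:
            return succ, path          # key falls in (n, succ]: succ stores it
        nxt = closest_preceding_finger(n, key)
        if nxt == n:                   # defensive: no finger precedes the key
            return succ, path
        n = nxt
        path.append(n)

print(fingers(80))       # [100, 100, 100, 100, 100, 10, 32]
print(lookup(10, 70))    # (80, [10, 60])
```

From N10, key 70 is resolved by jumping straight to N60, whose successor N80 stores K70; a linear walk along successor pointers would also have passed through N32. Each finger jump at least halves the remaining distance to the key, which is where the O(log N) hop bound comes from.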
20. From DHTs to P-trees
- DHTs only support equality queries
  - Return the value of the resource with identifier ID1
- Need to support range queries
  - OAI-type query: find all resources that were changed between D1 and D2
- P-tree reuses aspects of Chord's fault-tolerant ring, with logarithmic search for both equality and range queries
21. Pastry Project
- Factors network locality into the DHT algorithm
  - http://research.microsoft.com/~antr/Pastry/
22. Identity, Trust, Reputation
- Identity
  - Who is making a statement?
  - Certificates, PKI
- Trust
  - Can I believe the person who is making the statement?
  - PGP Web of Trust
- Reputation
  - What is the history of trust in the person making the statement?
  - Reputation management
23. Reputation Issues
- The small-world phenomenon makes a web of trust feasible
- Reputation is context-specific
  - I can be trusted with questions about OAI-PMH
  - But can you trust me to belay you?
24. Simple reputation network
[Figure: A → B → C]
- A knows and trusts B
- B knows and trusts C
- A can infer trust for C
25. Reputation Inference Algorithm
- Begin at the source (the node seeking a reputation)
- Poll each of its neighbors whose reputation it trusts
  - Ignore neighbors with a bad reputation
- Have each neighbor recursively find the reputation of the sink (the node whose reputation is sought)
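A minimal sketch of this recursive inference, assuming binary ratings (1 = trusted, 0 = not trusted), that a direct rating always wins, and that polled neighbors' answers are combined by strict majority vote (ties count as distrust). These aggregation rules are our assumptions, not necessarily those of Golbeck and Hendler's algorithm; the data is hypothetical.

```python
def infer_trust(ratings, source, sink, visited=frozenset()):
    """Infer whether `source` should trust `sink`: 1 = trust, 0 = distrust, None = no path.

    ratings: node -> {neighbor: 1 (trusted) or 0 (not trusted)}  (hypothetical data)
    """
    direct = ratings.get(source, {})
    if sink in direct:                       # a direct opinion answers immediately
        return direct[sink]
    visited = visited | {source}             # nodes already on this search path
    votes = []
    for neighbor, rating in direct.items():
        if rating != 1 or neighbor in visited:
            continue                         # ignore distrusted neighbors and cycles
        vote = infer_trust(ratings, neighbor, sink, visited)
        if vote is not None:
            votes.append(vote)
    if not votes:
        return None
    return 1 if 2 * sum(votes) > len(votes) else 0   # strict majority; ties -> distrust

# The three-node example: A trusts B, B trusts C, so A infers trust in C.
ratings = {"A": {"B": 1}, "B": {"C": 1}}
print(infer_trust(ratings, "A", "C"))        # 1
```

Skipping distrusted neighbors is what limits the damage from a node that A already rates badly: its opinions are never polled.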
26. Accuracy of inferences
- An incorrect bad rating by a node has minimal effect
  - The rated node will be dropped from that path when seeking reputation
  - It will be overcome by a correct good rating from another node
- An incorrect good rating by a node can have a cascading effect
  - Can cause ratings of good nodes to be ignored through lies
  - Serious threat to the network
- A good trust algorithm minimizes the effect of bad nodes
27. From Golbeck and Hendler
28. Trellis
- http://trellis.semanticweb.org
- Semantic Web-based system for decision making
  - Assessing the reliability of information and sources
- Decision maker can construct compound statements justifying a decision and providing a basis for others' decisions
29. Trellis (cont.)
- Components
  - Statement (e.g., "Carl Lagoze is a bad teacher")
  - Basis of statement
    - http://cornellbigred.collegesports.com/sports/m-crew/mtt/kruse_william00.html
  - Principal source of basis/statement
    - William Kruse
  - Qualifications to state certainty of component
30. Trellis compound statement
From Gil and Ratnakar
31. Advogato
- Trust metrics for open-source software developers
- http://www.advogato.org/
- Three levels of trust/certification
- Master
- Journeyer
- Apprentice
32. Advogato (cont.)
- Graph structure of trust
  - Domain of master is only master
  - Domain of journeyer is master and journeyer
  - Domain of apprentice is all
- Computation of trust is via network flow (a well-known problem with efficient solutions)
  - Hard-wired set of seed users from which all trust flows (the "gods" of the system)
  - People reached by the flow are those accepted by the trust metric
  - With the three levels, the max flow is computed three times
- Robust (resistant to attack) and efficient
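A minimal sketch of a single-level Advogato-style trust metric, following the node-splitting construction as we understand it from Levien's description: each node x becomes x- and x+, the edge x- to x+ carries capacity c(x) - 1, each x- has a unit edge to a supersink, and certifications become edges of unbounded capacity. Capacities c(x) shrink with distance from the seed; the capacity list and graph below are hypothetical. A node is accepted if its unit edge to the supersink carries flow. Advogato repeats this once per level (master, journeyer, apprentice); only one run is shown, using Edmonds-Karp for the max flow.

```python
from collections import defaultdict, deque

INF = float("inf")

def bfs_distances(graph, seed):
    """Hop distance of every node reachable from the seed via certifications."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def max_flow(cap, source, sink):
    """Edmonds-Karp: repeatedly augment along shortest residual paths (mutates cap)."""
    total = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push          # residual (reverse) capacity
        total += push

def advogato_accept(graph, seed, capacities):
    """Return the set of nodes accepted by an Advogato-style flow from `seed`."""
    dist = bfs_distances(graph, seed)
    cap = defaultdict(lambda: defaultdict(int))
    for x, d in dist.items():
        c = capacities[min(d, len(capacities) - 1)]   # capacity shrinks with distance
        cap[("-", x)][("+", x)] = c - 1               # flow x may pass on to others
        cap[("-", x)]["SINK"] = 1                     # one unit "accepts" x itself
        for y in graph.get(x, ()):
            cap[("+", x)][("-", y)] = INF             # certification edges
    max_flow(cap, ("-", seed), "SINK")
    return {x for x in dist if cap[("-", x)]["SINK"] == 0}

# Hypothetical certification graph: mallory and two sybils hang off bob.
certs = {"seed": ["alice", "carol"], "alice": ["bob"], "bob": ["mallory"],
         "mallory": ["sybil1", "sybil2"]}
print(sorted(advogato_accept(certs, "seed", capacities=[4, 2, 1])))
```

Because bob is far enough from the seed that his capacity is 1, he can be accepted but can pass no flow onward, so mallory and the sybil accounts behind him are rejected however many there are. That bottleneck is what makes a flow-based metric resistant to attack.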
33. EigenTrust
- Algorithm for Reputation Management in P2P Networks
  - Kamvar, Schlosser, Garcia-Molina (Stanford)
  - http://www.stanford.edu/~sdkamvar/research.html
34. EigenTrust Approach
- Goal: identify sources of inauthentic files and bias peers against downloading from them
- Method: give each peer a trust value based on its previous behavior
- Trust values
  - Local: opinion a peer has of another, based on past experience
  - Global: trust that the entire system places in a peer
  - Want the latter computed from an aggregate of the former
- Dual goals
  - Know all peers
  - Perform minimal computation and store minimal data
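To make "local" trust concrete, the EigenTrust paper (as we understand it) derives peer i's local trust in peer j from download history and then normalizes it. Here sat(i, j) and unsat(i, j) count the satisfactory and unsatisfactory downloads i has made from j; normalization keeps every value in [0, 1], makes each peer's opinions sum to 1, and prevents a malicious peer from assigning arbitrarily high trust to its friends:

```latex
s_{ij} = \mathrm{sat}(i,j) - \mathrm{unsat}(i,j),
\qquad
c_{ij} = \frac{\max(s_{ij}, 0)}{\sum_{j} \max(s_{ij}, 0)}
```

Global trust is then built by aggregating these normalized c_ij values across the network.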
35. Past History Approach
- Each peer biases its choice of downloads using its own opinion vector
- Problems
  - Each peer has limited past experience
  - Inertia: if a peer has had good past experience with another, it will be biased toward relying on it
36. Friends-of-friends approach
- Ask for the opinions of the people you trust
- Weigh their opinions by your trust in them
- Problems
  - If you have a lot of friends: too much to compute and store
  - If you have few friends: they won't have enough data
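Written as a formula, with c_ij the normalized local trust of peer i in peer j and C the matrix of those values, weighting each friend's opinion of peer k by your trust in that friend gives:

```latex
t_{ik} = \sum_{j} c_{ij}\, c_{jk}
\qquad\Longleftrightarrow\qquad
\vec{t}_i = C^{\mathsf{T}} \vec{c}_i
```

Asking friends of friends gives (C^T)^2 c_i, and n rounds give (C^T)^n c_i. According to the EigenTrust paper's analysis (under assumptions on C), this converges for large n to the same vector for every peer, the principal left eigenvector of C, which is what allows one shared global trust vector instead of per-peer computations.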
37. EigenTrust Approach
- The whole network cooperates to store and compute the trust vector
  - Each peer holds its own opinions
  - Each peer holds its own global reputation
- Iterative algorithm that converges to global trust ratings (in the nature of PageRank)
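A centralized sketch of the iteration, assuming the update t(k+1) = (1 - a) C^T t(k) + a p as we understand it from the EigenTrust paper, where C holds the normalized local trust values, p is a uniform distribution over a set of pre-trusted peers, and a is a damping weight. The real system distributes this computation across peers; the function name, the default alpha, and the example ratings are hypothetical.

```python
def eigentrust(local, pretrusted, alpha=0.15, tol=1e-10, max_iter=1000):
    """Centralized sketch of EigenTrust-style power iteration.

    local:      n x n matrix (list of lists) of local trust s[i][j], e.g. sat - unsat
    pretrusted: set of peer indices trusted a priori; p is uniform over them
    alpha:      weight given to the pre-trusted distribution p (hypothetical default)
    """
    n = len(local)
    p = [1.0 / len(pretrusted) if i in pretrusted else 0.0 for i in range(n)]
    # Normalize local trust: c[i][j] = max(s_ij, 0) / sum_j max(s_ij, 0)
    c = []
    for i in range(n):
        row = [max(s, 0.0) for s in local[i]]
        total = sum(row)
        # A peer with no positive opinions falls back to the pre-trusted peers.
        c.append([x / total for x in row] if total > 0 else p[:])
    t = p[:]                                    # start from the pre-trusted distribution
    for _ in range(max_iter):
        # t_new[k] = (1 - alpha) * sum_i c[i][k] * t[i] + alpha * p[k]
        t_new = [(1 - alpha) * sum(c[i][k] * t[i] for i in range(n)) + alpha * p[k]
                 for k in range(n)]
        if sum(abs(a - b) for a, b in zip(t_new, t)) < tol:
            return t_new
        t = t_new
    return t

# Hypothetical ratings: peers 0-2 are honest; all of them rate peer 3 negatively.
local = [[0, 4, 2, -3],
         [3, 0, 3, -2],
         [2, 5, 0, -4],
         [0, 0, 0, 0]]
trust = eigentrust(local, pretrusted={0})
print([round(x, 3) for x in trust])
```

Because every row of C sums to 1, the trust vector keeps summing to 1 and the damping term makes each step a contraction, so the loop converges; peer 3, which nobody rates positively, ends with zero global trust.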
38. More EigenTrust Issues
- Secure score management
  - Voting among multiple score managers
  - A peer's score is held by another peer
- Threat scenarios
  - Malicious individuals (always bad)
  - Malicious collectives (always bad, think highly of each other)
  - Camouflaged collectives (sometimes good, to trick people)
  - Malicious spies (good all the time, but friends with bad folks)