Title: IEG5270 Advanced Topics in P2P Networking: Introduction
1. IEG5270 Advanced Topics in P2P Networking: Introduction
- Dah Ming Chiu
- Chinese University of Hong Kong
2. What is P2P networking?
- What P2P applications are you aware of?
- What distinguishes a P2P application from a non-P2P application?
- Why are we interested in P2P networking?
3. P2P traffic in the Internet
4. Client-server applications
- Traditional applications are all client-server
- A service provider must set up a server, e.g.
- Email server
- Web server
- A client is configured with the server's address/port
- The server responds to clients one by one
5. Limitations of client/server applications
- Scalability
- Compute power, access bandwidth, storage
- Costs
- Server, bandwidth, power, management
- Privacy concerns
- Some are for illegal reasons
- The need for fixed address/port
6. Peer-to-peer applications
- In a pure p2p network, every node is both a client and a server
- As the number of clients increases, the number of servers also increases
- Perfectly scalable
- Also
- Distributes costs
- Increases privacy
- May use dynamic addresses
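A back-of-the-envelope sketch of the scalability argument above. The uplink figures (1000 Mbps for the server, 1 Mbps per peer) are illustrative assumptions, not numbers from these slides, and the p2p formula is an idealized upper bound that ignores every bottleneck other than total upload capacity.

```python
def per_client_rate_client_server(server_uplink_mbps, n_clients):
    """Each client gets an equal share of the single server's uplink."""
    return server_uplink_mbps / n_clients


def per_peer_rate_p2p(server_uplink_mbps, peer_uplink_mbps, n_peers):
    """Idealized p2p: total upload capacity (server plus all peers) shared equally."""
    total_upload = server_uplink_mbps + peer_uplink_mbps * n_peers
    return total_upload / n_peers


for n in (10, 1000, 100000):
    cs = per_client_rate_client_server(1000.0, n)
    p2p = per_peer_rate_p2p(1000.0, 1.0, n)
    print(f"n={n}: client-server {cs:.2f} Mbps, p2p {p2p:.2f} Mbps")
```

Under these assumptions the client-server rate shrinks towards zero as n grows, while the p2p rate never drops below the per-peer uplink.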
7. Hurdles for p2p applications
- How to find where things are?
- Instead of static configuration of servers
- How to make peers able to help each other?
- A peer usually needs to acquire the right content to be able to serve others
- How to deal with unreliable peers?
- Peers come and go: churn
- How to make peers willing to help each other?
- Incentives
- Why should other peers be trustworthy?
8. A short history of some noteworthy p2p applications
- Napster (1999-2002)
- First p2p file sharing application
- Relies on a centralized directory to help find content
- Once peer X knows peer Y has what it wants, X contacts Y directly
- X and Y often exchanged copyright-protected material
- Napster server shut down due to lawsuits
- Another company bought Napster and still uses the name to operate an online music shop
9. History - cont.
- Gnutella (2000-?)
- Completely distributed p2p file sharing
- Peers form an overlay network: each peer knows its neighbors
- Each peer floods its request to all other peers
- Prohibitive overheads
- Open source
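A minimal sketch of why flooding is expensive. The overlay topology, the TTL (hop limit), and the function name are assumptions made for illustration; unlike a real client, this simplified peer also sends the query back to the neighbor it came from, which slightly inflates the count.

```python
from collections import deque


def flood_query(overlay, source, has_content, ttl):
    """Flood a query from `source`; return (peers holding the content, messages sent)."""
    seen = {source}
    hits = set()
    messages = 0
    queue = deque([(source, ttl)])
    while queue:
        peer, hops_left = queue.popleft()
        if peer in has_content:
            hits.add(peer)
        if hops_left == 0:
            continue
        for neighbor in overlay[peer]:
            messages += 1  # every forwarded copy costs a message, even a duplicate
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops_left - 1))
    return hits, messages


# Hypothetical five-peer overlay; only peer E holds the requested file.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
hits, messages = flood_query(overlay, "A", has_content={"E"}, ttl=3)
print(hits, messages)
```

Even in this tiny overlay, locating one file costs more messages than there are peers, and most of them are duplicates.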
10. History - cont.
- Kazaa or FastTrack (2001-2003)
- Partly centralized, relies on super-nodes
- Also met with lawsuits
- Skype (2003-now)
- Created by the same people who created Kazaa
- Uses technology similar to Kazaa's to discover VoIP destinations, instead of illegal content
- Plus other bells and whistles (codec, encryption, instant messaging, friends list/presence)
- Provides relay service if firewalls/NAT prevent connectivity
11. History - cont.
- BitTorrent (2002-now)
- Does not deal with the content discovery problem (avoiding legal problems)
- Solves another problem: how to distribute a file to a (large) number of peers
- Divide a file into many parts (chunks)
- Give chunks to different peers
- Peers rely on a tracker to find other peers and what chunks they have
- Content flows through multiple, ad-hoc trees
- Several variations, and several other protocols (e.g. eDonkey)
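The chunk-and-tracker idea above can be sketched as follows. The `Tracker` class, the peer names, and the 8-byte chunk size are hypothetical. The sketch follows the simplified picture in the bullets, where the tracker knows who holds which chunk; in the deployed BitTorrent protocol, piece availability is exchanged between the peers themselves.

```python
import hashlib


def split_into_chunks(data: bytes, chunk_size: int):
    """Cut the file into fixed-size chunks; the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class Tracker:
    """Hypothetical in-memory tracker: remembers which peer holds which chunk indices."""

    def __init__(self):
        self.chunks_by_peer = {}

    def announce(self, peer, chunk_indices):
        self.chunks_by_peer.setdefault(peer, set()).update(chunk_indices)

    def peers_with(self, chunk_index):
        return sorted(p for p, have in self.chunks_by_peer.items() if chunk_index in have)


data = b"example file contents to distribute"
chunks = split_into_chunks(data, 8)
# Per-chunk digests let a receiver verify each chunk independently.
digests = [hashlib.sha1(c).hexdigest() for c in chunks]

tracker = Tracker()
tracker.announce("seed", range(len(chunks)))
tracker.announce("peer1", [0, 2])
tracker.announce("peer2", [1])
print(tracker.peers_with(0))
```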
12. History - cont.
- PPLive (2005-now)
- Uses technology similar to BT to stream video (TV)
- Needs a directory of TV channels/programs
- Some earlier start-ups had legal problems
- Mainly popular in China
- Tens of millions of steady customers
- Several competitors: PPStream, UUC
13. Summary
- Success stories
- Skype: p2p VoIP
- BitTorrent: p2p file sharing
- PPLive: p2p streaming
- Some other client-server applications may change to the p2p model
- Microsoft's study of using P2P for VoD service (Sigcomm 2007)
- Google to use p2p for YouTube?
14. Other notable P2P projects
- LOCKSS (Lots of Copies Keep Stuff Safe)
- Data preservation mechanism (suppose libraries want to keep books preserved, even when publishers die)
- Data replicated many times
- How to deal with sabotage? Peers do periodic voting to validate all copies.
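A toy illustration of validating replicas by voting. This is a plain majority vote over content digests, assumed here for simplicity; it is not the actual LOCKSS polling protocol, and the library names are made up.

```python
import hashlib
from collections import Counter


def digest(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def poll(copies):
    """Majority vote over content digests; return (winning digest, peers that disagree)."""
    votes = {peer: digest(content) for peer, content in copies.items()}
    winner, count = Counter(votes.values()).most_common(1)[0]
    if count * 2 <= len(votes):
        raise ValueError("no majority; cannot decide which copy is valid")
    damaged = sorted(peer for peer, d in votes.items() if d != winner)
    return winner, damaged


copies = {
    "library_a": b"volume 1, issue 1",
    "library_b": b"volume 1, issue 1",
    "library_c": b"volume 1, issue 1 (tampered)",
}
winner, damaged = poll(copies)
print(damaged)
```

A peer listed in `damaged` would then repair its copy from a peer that voted with the majority.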
15. Other P2P projects
- SETI@home (Search for Extra-Terrestrial Intelligence)
- Peers devote their compute power to help analyze radio signals from a telescope, to check for signs of ET
- P2P because peers are autonomous, operation is distributed
- A grid computer, really
16. Other P2P projects
- RON (Resilient Overlay Network)
- In the Internet, packets do not always flow along the shortest paths due to ISP peering
- Peers help relay packets to support shortest-path routing as much as possible
- Peers form an overlay network, and check delay between each pair of neighboring nodes
- RON demonstrated that it can provide lower delay
- Overlay versus p2p
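A small sketch of the relay idea, assuming each overlay node has measured the delay to every other node and that at most one intermediate relay is used. The delay table is invented for illustration.

```python
def best_path(delay_ms, src, dst):
    """Return (total delay, path), choosing between the direct path and any one-hop relay."""
    best = (delay_ms[src][dst], [src, dst])
    for relay in delay_ms:
        if relay in (src, dst):
            continue
        total = delay_ms[src][relay] + delay_ms[relay][dst]
        if total < best[0]:
            best = (total, [src, relay, dst])
    return best


# Hypothetical measured delays: the direct A-B path is slow because of ISP peering.
delay_ms = {
    "A": {"A": 0, "B": 120, "C": 30},
    "B": {"A": 120, "B": 0, "C": 40},
    "C": {"A": 30, "B": 40, "C": 0},
}
print(best_path(delay_ms, "A", "B"))
```

Here relaying through C beats the direct Internet path, which is the effect the overlay tries to exploit.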
17. Instant messaging vs p2p
- Some consider ICQ the first p2p application
- In some sense, it is similar to Skype
- The communication part is p2p
- The search part may be partly via a centralized server
- Instant messaging may be integrated with file sharing and streaming, as well as VoIP (e.g. QQ and QQlive, a p2p application in China)
18. Summary of other p2p applications
- The idea of P2P overlaps with application-layer infrastructures, e.g.
- Grid computing
- Overlay networks
- Distributed databases, search
- DHT is a building block for many such infrastructures
19. Important building blocks
- How to find things?
- Centralized approach
- Flooding
- Partially distributed approach
- DHT
- How to distribute content to multiple peers?
- IP multicast
- Structured tree(s) (or push)
- Unstructured (or data-driven, or pull)
20. Distributed Hash Tables
- Each object (e.g. file, chunk) is mapped to a key
- The key space is partitioned among the peers
- Given a key, you know where to store/retrieve the object
- There are quite a few DHT proposals
- Chord, CAN, Pastry
- Every DHT supports one main operation
- Given a key, route to a peer that holds the key
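One way to picture "the key space is partitioned among the peers" is a hash ring, sketched below. The 16-bit identifier space, the use of SHA-1, the `Ring` class, and the rule that a key belongs to the first peer clockwise from it are all assumptions for illustration; each DHT proposal partitions the space in its own way. The sketch also uses a global view of all peers, which a real DHT replaces with routing.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 16  # small identifier space, chosen only to keep the example readable


def ring_id(name: str) -> int:
    """Hash a peer name or object name to a position on the ring."""
    h = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(h, "big") % (2 ** RING_BITS)


class Ring:
    def __init__(self, peers):
        self.points = sorted((ring_id(p), p) for p in peers)
        self.ids = [pid for pid, _ in self.points]

    def lookup(self, key: str) -> str:
        """Return the peer responsible for `key`: the first peer clockwise from the key's id."""
        i = bisect_left(self.ids, ring_id(key))
        return self.points[i % len(self.points)][1]


ring = Ring(["peer-a", "peer-b", "peer-c", "peer-d"])
print(ring.lookup("file-42"))
```

A useful property of this partitioning is that when a peer leaves, only the keys it was responsible for move to another peer.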
21. DHT - cont.
- A DHT should have the following properties
- Decentralization: no central management needed
- Scalability: supports millions of nodes
- Reliability: still works when peers leave and join
- Other properties
- Small diameter, small degree, get to destination node in log(n) steps
- Compared to index server: no single point of failure
- Compared to flooding: more efficient
- Limitation: supports lookup, not search
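To make "small degree, log(n) steps" concrete, here is a Chord-style lookup sketch. It assumes a static set of ten node ids on a 64-position ring and builds every routing table from global knowledge; joins, departures, and failures are left out, so it shows only the routing idea, not a full protocol.

```python
M = 6                # identifier bits, so the ring has 2**M = 64 positions
RING = 2 ** M
NODES = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])  # hypothetical node ids


def successor_of(x):
    """First node at or after position x, wrapping around the ring."""
    x %= RING
    for n in NODES:
        if n >= x:
            return n
    return NODES[0]


# FINGER[n][i] is the node responsible for position n + 2**i, so each node
# keeps only M routing entries regardless of how many nodes exist.
FINGER = {n: [successor_of(n + 2 ** i) for i in range(M)] for n in NODES}


def in_open(x, a, b):
    """True if x lies strictly between a and b going clockwise."""
    return a < x < b if a < b else (x > a or x < b)


def in_half_open(x, a, b):
    """True if x lies in (a, b] going clockwise."""
    return a < x <= b if a < b else (x > a or x <= b)


def lookup(start, key):
    """Route from node `start` to the node responsible for `key`; return (node, hops)."""
    n, hops = start, 0
    while not in_half_open(key, n, FINGER[n][0]):
        # Forward to the farthest finger that does not overshoot the key.
        n = next(f for f in reversed(FINGER[n]) if in_open(f, n, key))
        hops += 1
    return FINGER[n][0], hops


print(lookup(8, 54))
```

Each forward at least halves the remaining distance to the key, which is where the logarithmic hop count comes from.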
22. The magic of streams
- Traditional multicast
- Single tree
- Receivers do not contribute
- P2P content distribution
- Use multiple trees to distribute chunks
- Peers all contribute uplink bandwidth
- Peers are unreliable
- Use data-driven approach to build distribution trees dynamically
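A toy simulation of the data-driven (pull) approach. The round structure, the one-upload-per-round limit, the "lowest missing chunk first" rule, and the four-node mesh are all assumptions made for illustration; they are not a specific deployed algorithm.

```python
def pull_round(have, neighbors, upload_slots=1):
    """One round: each peer pulls at most one missing chunk from a neighbor with a free upload slot."""
    slots = {p: upload_slots for p in have}
    # Only chunks held at the start of the round can be served during it.
    snapshot = {p: set(c) for p, c in have.items()}
    transfers = []
    for peer in have:
        for nb in neighbors[peer]:
            wanted = sorted(snapshot[nb] - have[peer])
            if wanted and slots[nb] > 0:
                have[peer].add(wanted[0])
                slots[nb] -= 1
                transfers.append((nb, peer, wanted[0]))
                break
    return transfers


# Hypothetical mesh: source S holds three chunks, peers A, B, C start empty.
all_chunks = {0, 1, 2}
have = {"S": set(all_chunks), "A": set(), "B": set(), "C": set()}
neighbors = {"S": [], "A": ["S", "B", "C"], "B": ["S", "A", "C"], "C": ["S", "A", "B"]}

log = []
while any(chunks != all_chunks for chunks in have.values()):
    log.extend(pull_round(have, neighbors))

for uploader, downloader, chunk in log:
    print(f"chunk {chunk}: {uploader} -> {downloader}")
```

No tree is configured in advance: the (uploader, downloader) pairs recorded for each chunk form its distribution tree after the fact, and different chunks can end up with different trees.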
23. Summary of building blocks to be covered in this course
- DHT
- Multi-tree content distribution
- Models for understanding the capacity achieved by these algorithms
- Others: incentive systems, etc.
24. The economics of p2p applications
- P2P applications induce a huge amount of traffic for ISPs
- P2P applications can often extract profit from users in one way or another
- Skype provides a gateway into phone networks to collect money
- Streaming and VoD services can generate targeted advertising opportunities (the same game Google plays)
25. Net neutrality
- ISPs are often prevented from sharing the profits from content providers (who use p2p technology)
- To extract such profit, ISPs need almost monopoly power
- Governments do not want ISPs to become monopolies
- Net neutrality is the implicit business model
- Charge users based on volume of bits, not on content
26. The tussle between ISPs and P2P
- P2P file sharing or streaming determines its own routes for content to travel
- ISPs' settlements depend on their roles in peering relationships
- E.g. a customer ISP pays both transit providers, but may be transiting some p2p traffic for them
[Figure: a customer ISP connected to Transit Provider 1 and Transit Provider 2]
27. Tussle (conflict)
- How should ISPs consider peering decisions in view of p2p traffic?
- How should ISPs provision their peering bandwidth?
- How should ISPs charge subscribers (peers)?
- How should ISPs manage their p2p traffic?
28. P2P traffic monitoring and detection
- Traditional ISP settlements are based on NetFlow
- Gives total traffic volume, and volume for each 5-minute interval
- ISPs can see traffic types based on well-known ports
- P2P flows may not use well-known ports
- Deep packet inspection: check signatures in payloads
- What if the payload is encrypted?
29. Detecting P2P traffic by flow behavior
- How many other nodes a node talks to
- What combination of protocols is used (UDP, TCP, etc.)
- Patterns in packet sizes
- Active research area
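A minimal sketch of behavior-based detection using the first two features above. The flow record layout, the fan-out threshold of 3, and the "both TCP and UDP" rule are hypothetical choices for illustration, not a validated classifier.

```python
from collections import defaultdict


def summarize_hosts(flows):
    """flows: iterable of (src, dst, protocol, bytes). Return per-source behavior features."""
    peers = defaultdict(set)
    protocols = defaultdict(set)
    for src, dst, proto, _size in flows:
        peers[src].add(dst)
        protocols[src].add(proto)
    return {h: {"fan_out": len(peers[h]), "protocols": protocols[h]} for h in peers}


def looks_like_p2p(features, min_fan_out=3):
    """Hypothetical rule: many distinct peers, and both TCP and UDP in use."""
    return features["fan_out"] >= min_fan_out and {"TCP", "UDP"} <= features["protocols"]


# Made-up flow records: the first host talks to many nodes over both protocols.
flows = [
    ("10.0.0.1", "10.0.1.1", "TCP", 1500),
    ("10.0.0.1", "10.0.1.2", "UDP", 90),
    ("10.0.0.1", "10.0.1.3", "TCP", 1500),
    ("10.0.0.2", "10.0.9.9", "TCP", 600),
    ("10.0.0.2", "10.0.9.9", "TCP", 700),
]
summary = summarize_hosts(flows)
suspects = sorted(h for h, f in summary.items() if looks_like_p2p(f))
print(suspects)
```

Neither feature looks at ports or payloads, so the rule still applies when a p2p application uses random ports or encrypts its traffic.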
30. ISP-friendly P2P
- ISPs love and hate p2p applications at the same time
- P2P applications are the killer applications for broadband access, drawing and keeping customers
- P2P traffic quickly eats up any added bandwidth
- ISPs and researchers are looking for p2p algorithms and ISP caching services to reduce cross-ISP traffic demands
31. Discussion: the future of p2p
- Can it ever be made reliable enough as a service?
- May need to deploy smart servers to help p2p when necessary, e.g. in PPLive.
- Will ISPs change their traffic controls or charges to kill p2p?
- Unlikely in the near term: ISPs need content distribution applications
- P2P is quite effective, and can be made more ISP-friendly
32. Commercial interest in P2P
- Currently there is a lot of commercial interest in p2p technology
- Large investment in Joost
- Many other start-ups
- Interest by Microsoft and Google
33. Research interest in p2p
- Content distribution algorithms
- For file sharing
- For streaming
- For Video on Demand (VoD)
- DHTs and their use
- Revisit naming, addressing and routing in the next generation Internet
- Incentive and reputation systems
- No good incentive system for p2p streaming
34. Research interests - cont.
- Applying network coding to p2p
- It may not help improve optimal throughput
- But it can help simplify peer and chunk selection strategies in distribution algorithms
- P2P traffic classification
35. Compared to a regular course
- There are some theoretical foundations, but less mature than a regular course
- The long-term prospect may not be so clear, but the current interest is high
- Although undergraduates are allowed, it will run like a graduate course
- No exams, no textbook
- Read papers, do some projects, learn to do research
- Practice oral presentation and writing
36. Homework projects
- Streaming algorithm design and simulation
- Based on YP Zhou's research paper (ICNP 2007)
- Under some assumptions, try to design the best chunk selection strategy
- Simulate it and compare to YP's algorithm
- P2P traffic trace analysis
- Be a detective: find p2p traffic in a trace
- Find properties of different p2p applications
- PlanetLab - ?
37. Individual project
- If there are too many students, we need to make them group projects (2 in a group)
- Three types I can see
- Survey a problem in depth (read several papers related to the problem and summarize/discuss)
- Do some research on a given problem
- Implement some specific P2P application/mechanism and demo it.
- Will give a list of potential topics
- Will give a list of papers to read
38. Some example topics
- Most commercial systems (e.g. PPLive) are not based on open source. Try to deduce the algorithm for specific p2p streaming applications
- What is P2P-SIP and what are its applications?
- Is it possible to implement a viable search engine based on p2p technology?
39. Assessment
- Main project: 50%
- Oral presentation and written report
- Homework projects: 40%
- Class participation: 10%