Title: PowerPoint presentation
2. Introduction
- Widespread unstructured P2P network
- Currently between 200,000 and 300,000 hosts
- Ideal as a research test bed
- Large-scale network demonstrates the need for
scalable P2P protocols
- A Gnutella client has 4-10 TCP connections to
other peers
- UDP is used for signaling traffic, and to gain
some of the benefits of server-based networks, an
ultra-peer state was created
3. Introduction (Cont.)
- Ultra-peer status is self-assigned by powerful
peers and provides some extra functionality
compared to ordinary nodes
- There are many freely available Gnutella
clients
- Some of the most popular are:
- Limewire
- Bearshare
- Morpheus
- Shareaza
- Shareaza has the fastest-growing number of users
- Shareaza has a very pleasant GUI and also connects
to eDonkey and BitTorrent
4. Its Main Features
- This protocol underlies much of the current
file-sharing activity on the Internet.
- It is based on TCP/IP and HTTP.
- A file-sharing network (FSN) is a set of
machines that exchange files using Gnutella.
- To connect to a gnutella network, you need the IP
address of one single machine that is already
part of the network.
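The bootstrap step above can be illustrated with the original v0.4 handshake. The connecting servent opens a TCP connection to the known peer and sends the ASCII string `GNUTELLA CONNECT/0.4\n\n`. A servent that accepts the connection answers `GNUTELLA OK\n\n`. The sketch below only builds and checks these byte strings and opens no sockets. The function names are hypothetical. Later protocol versions (0.6) replace this exchange with a header-based handshake.

```python
from typing import Optional

# v0.4 handshake strings (ASCII, terminated by two newlines).
CONNECT_REQUEST = b"GNUTELLA CONNECT/0.4\n\n"
CONNECT_REPLY = b"GNUTELLA OK\n\n"


def build_connect_request() -> bytes:
    """Bytes a new servent sends to the one peer it already knows."""
    return CONNECT_REQUEST


def accept_connection(request: bytes) -> Optional[bytes]:
    """Reply from an existing servent, or None if it rejects the request."""
    if request == CONNECT_REQUEST:
        return CONNECT_REPLY
    return None


def handshake_succeeded(reply: Optional[bytes]) -> bool:
    """True once the new servent has received the expected OK reply."""
    return reply == CONNECT_REPLY


if __name__ == "__main__":
    reply = accept_connection(build_connect_request())
    print(handshake_succeeded(reply))
```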
5. Gnutella
- Peer-to-peer indexing and searching service.
- Peer-to-peer point-to-point file downloading
using HTTP.
- A Gnutella node needs a server (or a set of
servers) to start up: gnutellahosts.com
provides a service with reliable initial
connection points,
but this introduces a new single point of failure!
6. Gnutella vs. Napster
- Like Napster, distributed file storage and
transmission
- Added the ability to distribute file discovery
- Ask your direct peers who else they know
- Query those machines directly
7. Concepts of Unstructured Services
- There are many interesting ideas being explored
- Breaking shared files into many parts both to
increase bandwidth (parallel I/O) and to increase
the security of content, as no one site can access
files without cooperation from its peers
- This type of technology makes censorship very
hard.
- MojoNation has a load-balancing and scheduling
algorithm in the form of micropayments to reward
those who contribute most to the community of
peers.
- Gnutella, which is a family of related products,
is usually described as a P2P search engine, as
its interface is nearer to that of a search engine
than to that of a Web file system
8. Characteristics
- Gnutella is a distributed system for file
sharing
- provides means for network discovery
- provides means for file searching and sharing
- Defines a network at the application level
- Employs the concept of peer-to-peer
- all hosts are equal (symmetry)
- there is no central point
- searches are anonymous, but IP addresses are
revealed when downloading
9. Connection
- Once you establish a connection to the first
servent, you announce your presence.
- The first servent will pass on that message to
all the servents that it is connected to, and so
on.
- These servents all reply with data about
themselves
- how many files they are sharing
- how many kilobytes the files take up
- This already adds up to a lot of traffic!
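Each of these replies is a PONG. The sketch below packs and parses one, assuming the v0.4 payload layout: a 2-byte port, a 4-byte IPv4 address, then two 4-byte counters for the number of files shared and the number of kilobytes shared. That is 14 bytes in all, with the port and counters little-endian. The function names are hypothetical, and port 6346 in the example is just the conventional Gnutella port.

```python
import socket
import struct

# Assumed v0.4 PONG payload: port (uint16, little-endian), IPv4 address
# (4 raw bytes in network order), files shared and kilobytes shared
# (uint32 each, little-endian). "<" disables padding, so the size is 14.
PONG_FORMAT = "<H4sII"


def pack_pong(port: int, ip: str, files: int, kilobytes: int) -> bytes:
    """Build the PONG payload a servent sends about itself."""
    return struct.pack(PONG_FORMAT, port, socket.inet_aton(ip), files, kilobytes)


def unpack_pong(payload: bytes) -> dict:
    """Decode a PONG payload into its four fields."""
    port, raw_ip, files, kilobytes = struct.unpack(PONG_FORMAT, payload)
    return {"port": port, "ip": socket.inet_ntoa(raw_ip),
            "files": files, "kilobytes": kilobytes}


if __name__ == "__main__":
    payload = pack_pong(6346, "192.0.2.10", files=120, kilobytes=512000)
    print(len(payload), unpack_pong(payload))
```

Every servent reached by the announcement sends one of these back. The number of replies therefore grows with the size of the reachable network, which is the traffic problem the slide points out.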
10. Gnutella File-Sharing Model
- Users register files with network neighbors
- Search across the network to find files to copy
- Does not require a centralized broker (unlike Napster)
[Figure: peers Alice, Bob, Carol and Ted. The query "Where is Final Fantasy 4?" is passed between neighbors, the answer "Carol has Final Fantasy 4" travels back, and the file is then copied from Carol.]
11. Decentralized File-Sharing Model
Resource Discovery
- Peers have the same capability and responsibility
- The communication between peers is symmetric
- There is no central directory server; the index
of the metadata of shared files is stored locally
among all peers
- Gnutella
- FreeServe
- MojoNation
12. Decentralized (Cont.)
Resource Discovery
- Every user acts as a client, a server, or both
(a "servent")
- A user connects to the framework and becomes a
member of the community, allowing others to connect
through him/her
- Users speak directly to other users with no
intermediate or central authority
- No one entity controls the information that
passes through the community
13. Advantages and Disadvantages
Resource Discovery
- Advantages
- Inherent scalability
- Avoidance of the single-point-of-litigation
problem
- Fault tolerance
- Disadvantages
- Slow information discovery
- More query traffic on the network
14. Unstructured Decentralized Services
- There are some 200 Napster clones available in
this area (http://www.ultimateresourcesite.com/napster/main.htm)
- Currently the most popular is iMesh
(http://www.imesh.com), which has some 2 million
users and can share any type of file.
- Some of the best-known file-sharing systems are:
- MojoNation http://www.mojonation.net
- Freenet http://freenet.sourceforge.net/
- Gnutella http://gnutella.wego.com/
- These three are not server-based like Napster;
rather, they support waves of software agents,
expressing resource availability and interest,
that propagate among informal, dynamic networks of
peers
15. DFS Variations
- DFS: Distributed File Sharing
16. P2P File-Sharing Benefits
- Cost sharing
- Resource aggregation
- Improved scalability/reliability
- Anonymity/privacy
- Dynamism
17. Management/Placement Challenges
- Per-node state
- Bandwidth usage
- Search time
- Fault tolerance/resiliency
18. Gnutella in Detail
- Shares any type of file (not just music)
- Decentralized search, unlike Napster
- You ask your neighbors for files of interest
- Neighbors ask their neighbors, and so on
- A TTL (time-to-live) field quenches messages
after a number of hops
- Users with matching files reply to you
Figure from http://computer.howstuffworks.com/file-sharing.htm
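The TTL rule can be sketched as follows. It assumes the v0.4 convention: each servent decrements TTL and increments a Hops counter before forwarding, and stops forwarding once TTL reaches zero, so TTL + Hops stays equal to the original TTL. The `Descriptor` class and the `prepare_forward` and `hop_trace` functions are hypothetical names for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Descriptor:
    """Only the header fields that matter for the hop limit."""
    descriptor_id: bytes
    ttl: int
    hops: int


def prepare_forward(msg: Descriptor) -> Optional[Descriptor]:
    """Copy of `msg` to pass on to neighbors, or None if the TTL is used up."""
    ttl = msg.ttl - 1
    if ttl <= 0:
        return None  # the message is quenched at this servent
    return replace(msg, ttl=ttl, hops=msg.hops + 1)


def hop_trace(initial_ttl: int) -> List[Tuple[int, int]]:
    """(ttl, hops) of each copy sent along one path until it is quenched."""
    msg = Descriptor(b"\x00" * 16, ttl=initial_ttl, hops=0)
    trace = [(msg.ttl, msg.hops)]
    forwarded = prepare_forward(msg)
    while forwarded is not None:
        trace.append((forwarded.ttl, forwarded.hops))
        forwarded = prepare_forward(forwarded)
    return trace


if __name__ == "__main__":
    print(hop_trace(3))
```

With an initial TTL of 3 the query crosses three links and is then dropped. The TTL is therefore the maximum search radius in hops.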
19. The Gnutella Protocol (v0.4)
- PING: notify a peer of your existence
- PONG: reply to a PING request
- QUERY: find a file in the network
- RESPONSE: give the location of a file
- PUSHREQUEST: request a server behind a firewall
to push a file out to a client.
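Every message in the list above travels behind a common header. The sketch below packs and unpacks it, assuming the v0.4 layout:

- a 16-byte descriptor ID
- a 1-byte payload type
- a 1-byte TTL
- a 1-byte hop count
- a 4-byte little-endian payload length

It also assumes the type codes 0x00 Ping, 0x01 Pong, 0x40 Push, 0x80 Query and 0x81 QueryHit. The helper names are hypothetical.

```python
import os
import struct
from typing import Optional, Tuple

# Payload-type codes of the v0.4 specification, keyed by the names used
# on the slide (QueryHit = RESPONSE, Push = PUSHREQUEST).
PAYLOAD_TYPES = {"PING": 0x00, "PONG": 0x01, "PUSHREQUEST": 0x40,
                 "QUERY": 0x80, "RESPONSE": 0x81}
TYPE_NAMES = {code: name for name, code in PAYLOAD_TYPES.items()}

# Descriptor ID (16 bytes), payload type, TTL, hops (1 byte each) and
# payload length (uint32, little-endian); "<" means no padding.
HEADER_FORMAT = "<16sBBBI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 23 bytes


def pack_header(kind: str, ttl: int, payload_len: int,
                descriptor_id: Optional[bytes] = None, hops: int = 0) -> bytes:
    """Build the header that precedes every message."""
    if descriptor_id is None:
        descriptor_id = os.urandom(16)  # fresh random ID for a new message
    return struct.pack(HEADER_FORMAT, descriptor_id, PAYLOAD_TYPES[kind],
                       ttl, hops, payload_len)


def unpack_header(data: bytes) -> Tuple[bytes, str, int, int, int]:
    """Return (descriptor_id, kind, ttl, hops, payload_len)."""
    descriptor_id, code, ttl, hops, length = struct.unpack(
        HEADER_FORMAT, data[:HEADER_SIZE])
    return descriptor_id, TYPE_NAMES[code], ttl, hops, length
```

The descriptor ID is what lets servents recognize a message they have already seen and route replies back along the request path.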
20. Joining the Gnutella Network
Gnutella Network
- The new node connects to a well-known anchor
node.
- It then sends a PING message to discover other
nodes.
- PONG messages are sent in reply from hosts
offering new connections with the new node.
- Direct connections are then made to the newly
discovered nodes.
[Figure: the new node sends a PING to anchor node A; the PING is flooded onward to further nodes, and PONGs are returned to the new node.]
21. Properties of Flooding
- Searching by flooding
- If you don't have the file you want, query 7 of
your partners.
- If they don't have it, they contact 7 of their
partners, for a maximum hop count of 10.
- Requests are flooded, but there is no tree
structure.
- No looping, but packets may be received twice
Note: play the Gnutella animation at
http://www.limewire.com/index.jsp/p2p
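The numbers on this slide (7 partners per hop, at most 10 hops) imply a geometric series, 7 + 7^2 + ... + 7^10 transmissions for a single query. That holds under the simplifying assumption that every forwarded copy reaches a peer that has not yet seen the query. The function name below is hypothetical.

```python
def max_query_messages(fanout: int, max_hops: int) -> int:
    """Worst-case transmissions for one query when every peer forwards it to
    `fanout` partners that have not seen it yet, for up to `max_hops` hops."""
    return sum(fanout ** hop for hop in range(1, max_hops + 1))


if __name__ == "__main__":
    print(max_query_messages(7, 10))  # 329554456
```

That is roughly 3.3 × 10^8 transmissions, far more than the 200,000-300,000 hosts quoted for the whole network. In practice, most copies must therefore land on peers that already have the query. These are the duplicate receptions mentioned above, and they are why a small hop limit matters.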
22. Query Flooding
- Gnutella
- no hierarchy
- use bootstrap node to learn about others
- join message
- Send query to neighbors
- Neighbors forward query to all attached neighbors
(floods)
- If queried peer has object, it sends message back
to querying peer
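The flooding steps above amount to a breadth-first traversal of the overlay that is cut off by the TTL. In the sketch below, the `seen` set stands in for Gnutella's rule of dropping a query whose descriptor ID was already seen. The sender is skipped when forwarding, matching "forward to all attached neighbors" other than the one the query came from. The graph, file table and function name are hypothetical, and the message counter includes duplicate copies because they still consume bandwidth.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple


def flood_query(graph: Dict[str, List[str]], files: Dict[str, Set[str]],
                origin: str, wanted: str, ttl: int) -> Tuple[List[str], int]:
    """TTL-limited query flood over a hypothetical overlay.

    Returns the peers that would answer with a hit and the number of
    query messages sent (duplicates included).
    """
    seen = {origin}  # models dropping queries whose descriptor ID was seen before
    queue = deque([(origin, ttl, None)])  # (peer, remaining TTL, sender)
    hits: List[str] = []
    messages = 0
    while queue:
        peer, remaining, sender = queue.popleft()
        if peer != origin and wanted in files.get(peer, set()):
            hits.append(peer)  # this peer sends a hit back toward the origin
        if remaining == 0:
            continue  # TTL exhausted: do not forward further
        for neighbor in graph.get(peer, []):
            if neighbor == sender:
                continue  # never send the query back where it came from
            messages += 1  # a duplicate copy still costs a message
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1, peer))
    return hits, messages
```

For example, on the overlay A-B, A-C, B-D, C-D, D-E with the file at C and E, a flood from A with TTL 2 finds only C. Raising the TTL to 3 also reaches E, at the cost of two more messages.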
23. More on Query Flooding
- Pros
- peers have similar responsibilities: no group
leaders
- highly decentralized
- no peer maintains directory info
- Cons
- excessive query traffic
- the query radius may not reach content even when
it is present in the network
- bootstrap node still required
- maintenance of overlay network
24. About Flooding
- There is nothing that stops a servent from
flooding its network region with messages.
- Cost of maintaining Network
- Cost of searching file
25. Breadth-First Search (BFS)
26. Pros and Cons
Resource Discovery
- Benefits
- Peers speak directly with no central authority
- Nobody owns the Gnutella Network and nobody can
shut it down
- No central point of failure
- Limited per-node state
- Isolated node failures can be worked around
quickly and automatically
- Freeloading
- Scalability
- Drawbacks
- Searches are less effective and can be slow
- Bandwidth intensive
- The Gnutella network is evolving to include
controlled decentralization (LimeWire, BearShare,
ToadNode)
27. Searching for a File
Gnutella Network
- A node broadcasts its QUERY to all its peers, who
in turn broadcast to their peers.
- Nodes route QUERYHITs, which contain file location
details, along the QUERY path back to the sender.
- To download files a direct connection is made
using details of the host in the QUERYHIT
messages.
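Routing QUERYHITs back along the QUERY path can be sketched with a per-servent table that maps each QUERY's descriptor ID to the neighbor it arrived from. A hit carrying the same ID is handed back one hop at a time until it reaches the originator, which records itself as the previous hop. This is a hypothetical model with illustrative class and function names and no networking.

```python
from typing import Dict, List, Optional


class Servent:
    """Hypothetical minimal servent that only tracks QUERY routes."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.routes: Dict[bytes, "Servent"] = {}  # descriptor ID -> previous hop
        self.delivered: List[str] = []            # hits that reached the originator

    def on_query(self, descriptor_id: bytes, previous_hop: "Servent") -> bool:
        """Record where a QUERY came from; False means it is a duplicate."""
        if descriptor_id in self.routes:
            return False
        self.routes[descriptor_id] = previous_hop
        return True

    def on_query_hit(self, descriptor_id: bytes, hit: str) -> Optional["Servent"]:
        """Deliver or forward a QUERYHIT; returns the next hop, if any."""
        previous_hop = self.routes.get(descriptor_id)
        if previous_hop is None:
            return None  # no matching QUERY was seen: drop the hit
        if previous_hop is self:
            self.delivered.append(hit)  # this servent issued the QUERY
            return None
        return previous_hop


def send_hit(responder: Servent, descriptor_id: bytes, hit: str) -> List[str]:
    """Walk a QUERYHIT back along the QUERY path; returns the servents visited."""
    path = []
    node = responder.routes.get(descriptor_id)
    while node is not None:
        path.append(node.name)
        node = node.on_query_hit(descriptor_id, hit)
    return path
```

Only the QUERYHIT follows this reverse path. As the slide says, the download itself then uses a direct connection to the host named in the hit.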
28. The Cooperation Spectrum
29. Free Riding
- File-sharing networks rely on users sharing data
- Two types of free riding
- Downloading but not sharing any data
- Not sharing any interesting data
- On Gnutella
- 15% of users contribute 94% of content
- 63% of users never responded to a query
- Didn't have interesting data
- Data from E. Adar and B. A. Huberman (2000), "Free
Riding on Gnutella"
30. Example: GNUTELLA
31. Summary of Gnutella's Features
- Decentralized
- No single point of failure
- Not as susceptible to denial of service
- Cannot ensure correct results
- Flooding queries
- Search is now distributed but still not scalable
32. Initial Problems and Fixes
- Freeloading: WWW sites offering search/retrieval
from the Gnutella network without providing file
sharing or query routing
- Fix: block file-serving to browser-based,
non-file-sharing users
- Prematurely terminated downloads
- Software bugs
- Long download times over modems
- Modem users run the Gnutella peer only briefly
(a Napster problem also!), or any user becomes
overloaded
- Fix: a peer can reply "I have it, but I am busy.
Try again later"
33. Initial Problems and Fixes (2)
- 2000: the average size of the reachable network
was only 400-800 hosts
- Why so small?
- Modem users: not enough bandwidth to provide
search-routing capabilities, so they become
routing black holes
- Fix: create a peer hierarchy based on capabilities
- previously all peers were identical, and most
modem users were black holes
- connection preferencing
- favors routing to well-connected peers
- favors replies to clients that themselves serve
a large number of files, to prevent freeloading
- Limewire gateway functions as a Napster-like
central server on behalf of other peers
- for searching purposes
34. Gnutella Enhancements
- Pings/Pongs can consume up to 50% of bandwidth
- Solutions
- Pong Limiting
- Pong Caching
- Ping Multiplexing
- http://www.limewire.com/index.jsp/pingpong
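Pong caching and pong limiting can be sketched together. A servent remembers recently seen PONGs and answers a PING from that cache, returning at most a fixed number of addresses, instead of flooding the PING onward. It forwards the PING only when the cache holds nothing fresh. This is a simplified, hypothetical illustration. The class name, the 3-second freshness window and the reply cap are assumptions, not LimeWire's actual scheme, and ping multiplexing is not shown.

```python
import time
from typing import Callable, List, Optional, Tuple


class PongCache:
    """Reply to PINGs from recently seen PONGs instead of re-flooding them."""

    def __init__(self, max_age: float = 3.0, max_replies: int = 10,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_age = max_age          # seconds a cached PONG stays usable (assumed)
        self.max_replies = max_replies  # pong limiting: cap on replies per PING
        self.clock = clock
        self.entries: List[Tuple[float, str]] = []  # (time received, "ip:port")

    def add(self, address: str) -> None:
        """Store a PONG that passed through this servent."""
        self.entries.append((self.clock(), address))

    def answer_ping(self) -> Optional[List[str]]:
        """Addresses to answer with, or None if the PING must be forwarded."""
        now = self.clock()
        self.entries = [(t, a) for t, a in self.entries if now - t <= self.max_age]
        if not self.entries:
            return None
        newest_first = sorted(self.entries, key=lambda e: e[0], reverse=True)
        return [address for _, address in newest_first[: self.max_replies]]
```

The injectable `clock` is there only so the expiry logic can be exercised with a fake time source.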
35. Gnutella Enhancements (2)
- Cache query responses
- Results
- Evolving Protocol
- Gnutella Developer Forum
- UltraPeers
- Alternative query routing algorithms
36. Can Heterogeneity Make Gnutella Scale?
- Ideas
- Replace query flooding with multiple random
walks
- Proactive replication
- replicas proportional to sqrt(request rate)
- Result: two orders of magnitude improvement in
terms of query time, per-node load and message
traffic
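The square-root replication rule can be made concrete. Given a total replica budget R and request rates q_i, object i gets R·√q_i / Σ_j √q_j replicas. The sketch below compares this with purely proportional replication, which favors popular objects much more heavily. The function names are hypothetical, and the fractional counts would have to be rounded in practice.

```python
import math
from typing import List, Sequence


def square_root_allocation(request_rates: Sequence[float],
                           total_replicas: float) -> List[float]:
    """Split a replica budget so that replicas_i is proportional to sqrt(rate_i)."""
    roots = [math.sqrt(rate) for rate in request_rates]
    scale = total_replicas / sum(roots)
    return [scale * root for root in roots]


def proportional_allocation(request_rates: Sequence[float],
                            total_replicas: float) -> List[float]:
    """Baseline for comparison: replicas_i proportional to rate_i."""
    scale = total_replicas / sum(request_rates)
    return [scale * rate for rate in request_rates]


if __name__ == "__main__":
    print(square_root_allocation([1, 4, 16], 70))  # [10.0, 20.0, 40.0]
    print(proportional_allocation([1, 4, 16], 70))
```

With request rates 1, 4 and 16 and a budget of 70 replicas, the square-root rule gives 10, 20 and 40. Proportional replication gives about 3.3, 13.3 and 53.3, leaving the rarely requested object much harder to find by random search.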
37. Can Heterogeneity Make Gnutella Scale? (2)
- Gnutella assumption
- All peers are equal
- Not true! Heterogeneity among P2P peers (dial-up
users vs. college users)
- Evolve topology to match node capacities
- Use random walks over this topology
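Replacing the flood with random walks can be sketched as follows. Several walkers start at the querying peer, and at each step each walker forwards the query to one randomly chosen neighbor. The cost is therefore bounded by walkers × max_steps messages instead of multiplying by the fanout at every hop. This is a simplified, hypothetical version with illustrative names and defaults. It does not bias walks toward high-capacity peers, and it omits the periodic check-back with the originator that lets walkers stop once the file is found.

```python
import random
from typing import Dict, List, Optional, Set, Tuple


def random_walk_search(graph: Dict[str, List[str]], files: Dict[str, Set[str]],
                       origin: str, wanted: str, walkers: int = 4,
                       max_steps: int = 20,
                       rng: Optional[random.Random] = None) -> Tuple[Optional[str], int]:
    """Search with `walkers` independent random walks instead of a flood.

    Returns (peer holding `wanted` or None, number of messages sent).
    """
    rng = rng or random.Random()
    positions = [origin] * walkers
    messages = 0
    for _ in range(max_steps):
        for i, peer in enumerate(positions):
            neighbors = graph.get(peer, [])
            if not neighbors:
                continue  # dead end: this walker cannot move
            next_peer = rng.choice(neighbors)
            messages += 1
            positions[i] = next_peer
            if wanted in files.get(next_peer, set()):
                return next_peer, messages
    return None, messages
```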
38. Can Heterogeneity Make Gnutella Scale? (3)
- Solution outline
- $C_i$: capacity of node $i$;
$\mathit{in}_{j,i}$: messages from $j$ to $i$;
$\mathit{out}_{i,j}$: messages from $i$ to $j$
- Init: $\mathit{in}_{j,i} = \mathit{out}_{i,j} = 0$,
$\mathit{OutMax}_{i,j} = C_i / d_i$, where $d_i$ is the number of neighbors of $i$
- Update according to the messages received/sent
- Check if overloaded
- If so, redirect a high-input neighbor to a neighbor
with high OutMax (spare capacity)
- Intuitively, take yourself out of the loop
- If no such node can be found, ask the neighbor to
throttle back
- Result: average query length drops from 70 to
2-9 hops
- depending on topology
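The overload check in this outline can be sketched for a single node $i$ using the slide's notation. `capacity` is $C_i$, `incoming[j]` is $\mathit{in}_{j,i}$, `outgoing[j]` is $\mathit{out}_{i,j}$ and `out_max[j]` is $\mathit{OutMax}_{i,j}$. The sketch is one simplified reading of the outline, not the authors' exact algorithm. The redirect is returned as an action rather than carried out, the update step is left to the caller, and the function name is hypothetical.

```python
from typing import Dict, Optional, Tuple


def balance_step(capacity: float, incoming: Dict[str, float],
                 outgoing: Dict[str, float],
                 out_max: Dict[str, float]) -> Optional[Tuple[str, ...]]:
    """One overload check at node i.

    Returns None when not overloaded, ("redirect", j, k) to hand neighbor j
    over to neighbor k, or ("throttle", j) to ask neighbor j to send less.
    """
    if sum(incoming.values()) <= capacity:
        return None  # total input fits within C_i
    busiest = max(incoming, key=lambda j: incoming[j])  # highest-input neighbor
    spare = {k: out_max[k] - outgoing.get(k, 0.0)
             for k in out_max if k != busiest}
    candidates = {k: s for k, s in spare.items() if s > 0}
    if candidates:
        target = max(candidates, key=lambda k: candidates[k])
        return ("redirect", busiest, target)  # take yourself out of the loop
    return ("throttle", busiest)  # nobody has spare capacity
```

For example, a node with $C_i = 10$ and three neighbors starts with $\mathit{OutMax}_{i,j} = 10/3$ for each. If it receives 8, 4 and 1 messages from neighbors a, b and c, the busiest neighbor a is redirected to whichever other neighbor has the most unused OutMax.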
39. Measurement Results
- Who is sharing what?
- August 2000
40. Problems With Gnutella
- Protocol scalability
- Message broadcast technique imposes limitations
on the network size
- packets per message grow with the number of
peers (noPeers)
- In November 2000, the dial-up bandwidth barrier
was reached
- Overlay network efficiency
- Random selection of peers results in inefficient
use of the underlying network
- Redundant traffic generated on the Internet
41. Heterogeneous Connection Qualities of Gnutella Peers
- 35% have an upstream bottleneck bandwidth of at
least 100 Kbps
- only 8% have at least 10 Mbps bandwidth
- 22% have bandwidth of 100 Kbps or less
42. Number of Shared Files
43. Why Look at Gnutella?
- Widespread unstructured P2P network
- Currently between 200,000 and 300,000 hosts
- 2006: still heavily used, by about 2 million
users
- Gnutella clients (among others)
- LimeWire
- Morpheus
- BearShare
- OpenCola
- Shareaza
- Shareaza has the fastest-growing number of users
- Shareaza has a very pleasant GUI and also connects
to eDonkey and BitTorrent
- Ideal as a research test bed
- Large-scale network demonstrates the need for
scalable P2P protocols
44. Limewire Improvements on Gnutella
- Creation of a peer hierarchy based on capabilities
- previously all peers were identical, and most
modem users were black holes
- connection preferencing
- favors routing to well-connected peers
- favors replies to clients that themselves serve
a large number of files, to prevent freeloading
- Limewire gateway functions as a Napster-like
central server on behalf of other peers
- for searching purposes
45. Limewire
- The Limewire P2P file-sharing program connects to
the Gnutella P2P network
- Limewire client software is widely recognized for
its clean user interface that does not contain
adware
- Sometimes billed as the fastest file-sharing
program
- Limewire claims to offer relatively good search
and download performance
- Free Limewire software downloads are available
for Windows, Linux and Macintosh operating
systems
- Paid Limewire Pro clients also exist
46. BearShare
- The BearShare P2P file-sharing program is a
popular free software client for the Gnutella P2P
network
- Both free and paid versions of the BearShare
file-sharing program exist
47. Shareaza
- Shareaza is an up-and-coming P2P file-sharing
program
- This client offers an extremely powerful search
engine capable of connecting to multiple popular
P2P networks, including eDonkey, BitTorrent and
Gnutella
- Shareaza file-sharing software includes
intelligence for detecting fake and/or corrupted
files
- The free Shareaza download also contains no ads
or spyware
- As the installed base of Shareaza users grows,
expect Shareaza to become an even better P2P
file-sharing program
48. Anonymous?
- The person you are getting the file from knows
who you are
- That's not anonymous.
- Other protocols exist where the owner of the
files doesn't know the requester.
- Peer-to-peer anonymity exists
49. Summary
- Peer-to-peer networking applications connect to
peer applications
- Focus: a decentralized method of searching for
files
- Each application instance serves to
- store selected files
- route queries (file searches) from and to its
neighboring peers
- respond to queries (serve the file) if the file is
stored locally
- Gnutella history
- 3/14/00: released by AOL, almost immediately
withdrawn
- too late: 23K users on Gnutella at 8 am this
morning
- many iterations to fix the poor initial design (the
poor design turned many people off)
- What we care about
- How much traffic does one query generate?
- How many hosts can it support at once?
- What is the latency associated with querying?
- Is there a bottleneck?