Crawling Gnutella Network - PowerPoint PPT Presentation

1
Crawling Gnutella Network
By Samer Al-Kiswany
2
Roadmap
  • Introduction
  • Gnutella network structure
  • Gnutella protocol overview
  • Gnutella crawling protocol
  • Crawling topology information
  • Crawling node content
  • Demo

3
Introduction
The Gnutella network is a decentralized peer-to-peer
system for file sharing.
  • Originally created by Justin Frankel of Nullsoft
  • Large scale
    • today up to 4M nodes, 1000 TB of data, 100M files
  • Fast growth in its early stages
    • grew more than 50-fold during the first half of 2001
    • (and 50-fold again from 2001 to 2006)
  • Self-organizing network
  • Open architecture, simple and flexible protocol

5
Gnutella Network Structure
Gnutella protocol 0.6 uses a two-tier architecture of
ultrapeers and leaves.
[Figure: two-tier topology with ultrapeers and leaves]
7
Basic Primitives for File Sharing
  • Join: How do I begin participating?
  • Publish: How do I advertise my file(s)?
  • Search: How do I find a file?
  • Fetch: How do I retrieve a file?

8
Gnutella Protocol Overview
  • Join: on startup, the client contacts one or more
    ultrapeer nodes
  • Publish: not needed
  • Search:
    • Ask the ultrapeer node
    • The ultrapeer propagates the query to other
      ultrapeers and returns the answers
  • Fetch: get the file directly from the peer (HTTP)

10
Crawling a Gnutella node
  • By crawling a node, we want two main pieces of
    information:
    • With whom is the node connected? (topology
      information)
      Gnutella protocol term: Crawling/Communicating
      Network Topology Information
    • What files is the node sharing with others?
      Gnutella protocol term: Browse Host

11
Crawling Topology Information
Gnutella protocol 0.6 supports crawling network topology
information!
  • Topology information
    • Ultrapeers
    • Leaves

12
Crawling Topology Information
Crawler request:
    GNUTELLA CONNECT/0.6
    User-Agent: LimeWire (crawl)
    X-Ultrapeer: False
    Query-Routing: 0.1
    Crawler: 0.1

Node response:
    GNUTELLA/0.6 200 OK
    User-Agent: BearShare
    Leaves: 127.0.0.1:6346,127.0.0.2:6346
    Peers: 127.0.0.4:6346,127.0.0.5:6346

Crawler acknowledgment:
    GNUTELLA/0.6 200 OK
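
The handshake on this slide can be sketched in Python as follows. This is a minimal illustration, not the presenter's implementation: it assumes the header lines are colon-separated and CRLF-terminated (the colons appear to have been lost in the transcript), and the function names `build_crawl_request` and `parse_crawl_response` are hypothetical.

```python
def build_crawl_request(user_agent="LimeWire (crawl)"):
    """Build the crawler's opening handshake (header values taken from the slide)."""
    lines = [
        "GNUTELLA CONNECT/0.6",
        f"User-Agent: {user_agent}",
        "X-Ultrapeer: False",
        "Query-Routing: 0.1",
        "Crawler: 0.1",
    ]
    # Assumption: headers end with a blank line, as in HTTP-style handshakes.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def _endpoints(value):
    """Turn 'ip:port,ip:port' into a list of (ip, port) tuples; skip malformed items."""
    result = []
    for item in value.split(","):
        host, sep, port = item.strip().rpartition(":")
        if sep and host and port.isdigit():
            result.append((host, int(port)))
    return result


def parse_crawl_response(raw):
    """Extract status, user agent, leaves and peers from the node's reply."""
    text = raw.decode("ascii", errors="replace")
    head = text.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return {
        "ok": lines[0].startswith("GNUTELLA/0.6 200"),
        "user_agent": headers.get("user-agent", ""),
        "leaves": _endpoints(headers.get("leaves", "")),
        "peers": _endpoints(headers.get("peers", "")),
    }
```

After a successful parse, the crawler would still send its own `GNUTELLA/0.6 200 OK` line to complete the exchange shown above.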
13
Browsing Node Content
[Figure: Gnutella network]
14
Browsing Node Content
Crawler request:
    GET / HTTP/1.1
    Host: Crawler_IP:PORT
    User-Agent: UBCECE
    Accept: application/x-gnutella-packets
    Connection: close

Node response:
    HTTP/1.1 200 OK
    Server: LimeWire/x.y
    Content-Type: application/x-gnutella-packets
    Connection: close

    <List of files>   (Query Hit message)
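
A rough Python sketch of this request/response pair is given below. It assumes standard HTTP/1.1 framing (CRLF line endings, blank line before the body); the helper names and the `host_header` parameter are hypothetical. The content-type check matters because a node that does not answer with `application/x-gnutella-packets` is not returning Query Hit data.

```python
GNUTELLA_PACKETS = "application/x-gnutella-packets"


def build_browse_request(host_header, user_agent="UBCECE"):
    """Build the Browse Host GET request with the headers listed on the slide."""
    lines = [
        "GET / HTTP/1.1",
        f"Host: {host_header}",
        f"User-Agent: {user_agent}",
        f"Accept: {GNUTELLA_PACKETS}",
        "Connection: close",
    ]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def split_http_response(raw):
    """Split a raw HTTP response into (status_code, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ", 2)
    # Status code 0 signals a status line we could not understand.
    code = int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else 0
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return code, headers, body


def is_gnutella_packets(headers):
    """True when the body should hold Query Hit messages rather than, say, HTML."""
    content_type = headers.get("content-type", "")
    return content_type.split(";")[0].strip().lower() == GNUTELLA_PACKETS
```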
15
Query Hit Parsing
Query Hit Message
1. Gnutella message header. Important field: message
   length. The HTTP response may contain more than one
   Query Hit message.
2. Query Hit header. Important field: number of files.
A-F. List of shared files; each entry includes the file
   name and size.
3. Other Gnutella protocol fields.
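
The parsing steps can be sketched in Python as follows. The byte layout used here is not given on the slide; it is an assumption based on the commonly documented Gnutella descriptor format (23-byte header with a little-endian payload length at offset 19; Query Hit payload starting with hit count, port, IP and speed; each result holding a 4-byte index, a 4-byte size, a null-terminated name and a second null after optional extension data). Check it against the protocol specification before relying on it. The function names are hypothetical.

```python
import struct

HEADER_LEN = 23      # GUID(16) + type(1) + TTL(1) + hops(1) + payload length(4)
QUERY_HIT = 0x81     # payload type assumed for Query Hit messages
RESULTS_OFFSET = 11  # hit count(1) + port(2) + IP(4) + speed(4)


def _parse_result_set(payload):
    """Return (name, size) pairs from one Query Hit payload."""
    results = []
    offset = RESULTS_OFFSET
    for _ in range(payload[0]):              # payload[0] = number of files
        if offset + 8 > len(payload):
            break
        _index, size = struct.unpack_from("<II", payload, offset)
        offset += 8
        name_end = payload.find(b"\x00", offset)
        if name_end < 0:
            break
        ext_end = payload.find(b"\x00", name_end + 1)
        if ext_end < 0:
            break
        name = payload[offset:name_end].decode("utf-8", errors="replace")
        results.append((name, size))
        offset = ext_end + 1                 # skip extension data and its terminator
    return results


def parse_query_hits(data):
    """Walk a byte stream that may hold several messages; collect shared files."""
    files = []
    pos = 0
    while pos + HEADER_LEN <= len(data):
        msg_type = data[pos + 16]
        (payload_len,) = struct.unpack_from("<I", data, pos + 19)
        payload = data[pos + HEADER_LEN: pos + HEADER_LEN + payload_len]
        if msg_type == QUERY_HIT and len(payload) >= RESULTS_OFFSET:
            files.extend(_parse_result_set(payload))
        pos += HEADER_LEN + payload_len      # message length tells us where the next one starts
    return files
```

The message length field is what lets the loop step from one Query Hit to the next inside a single HTTP body.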
16
Limitations: does this always work?
  • Topology crawling
    • Topology information crawling is not supported by
      some Gnutella protocol v0.4 implementations
  • Host browsing
    • Some Gnutella node implementations (BearShare, for
      instance) return the list of files as HTML and do
      not respond with a Query Hit message

18
Single Gnutella-Node Crawler
A proof-of-concept implementation of a single
Gnutella-node crawler.
  • The main class that implements the crawling
    protocol is the Crawler class
    • crawlpeers(ip_address, port)
    • parsePeers(byte[])
    • listFiles(ip_address, port)
    • processQueryHit(byte[])

Available at:
http://www.ece.ubc.ca/samera/TA/project/sgnc.html
19
Demo!
Crawling reala.ece.ubc.ca:5627
20
Project Phase II
  • Implement a single-node Gnutella network crawler
  • Report:
    • The active leaf nodes
    • Information about the agent (i.e., the
      implementation: LimeWire, BearShare, etc.)
    • The domain name corresponding to the node's IP
      address
    • All the files shared (except for BearShare
      servents)

Avoid cycles!
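
One common way to avoid cycles is to keep a set of nodes already seen and never enqueue a node twice. The Python sketch below illustrates this with a breadth-first traversal; it is an assumption about how the requirement could be met, not the project's reference solution. `fetch_neighbors` is a hypothetical callable that returns the (ip, port) neighbors of a node, for example from the Leaves and Peers handshake headers.

```python
from collections import deque


def crawl(seed, fetch_neighbors, max_nodes=1000):
    """Visit each reachable node at most once, starting from seed."""
    visited = {seed}          # every node ever enqueued, so cycles are cut off
    queue = deque([seed])
    order = []
    while queue and len(order) < max_nodes:
        node = queue.popleft()
        order.append(node)
        for neighbor in fetch_neighbors(node):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return order
```

Marking a node as visited when it is enqueued, rather than when it is processed, keeps the same node from sitting in the queue several times.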
21
Project Phase III
  • Implement a master/worker crawler with Java NIO
    sockets.

Crawl the following list
Results: peer IPs, statistics
Problems?
[Figure: Gnutella network]
22
References
  • Single Gnutella-Node Crawler:
    http://www.ece.ubc.ca/samera/TA/project/sgnc.html
  • Gnutella Crawling protocol:
    http://www.ece.ubc.ca/samera/TA/project/Gnuttela-Protocol.html
  • Other references
    • http://gnutella-specs.rakjar.de/index.php/Main_Page
    • www.limewire.com

23
Thank you
www.ece.ubc.ca/samera