Title: Peer to Peer Information Retrieval
1. Peer to Peer Information Retrieval
- By Chetan K. Sundarde
- @CHETANSUNDARDE
- https://www.linkedin.com/in/chetansundarde
2. Outline
- Peer to Peer Network
- Information Retrieval
- Peer to Peer Information Retrieval (P2PIR)
- Peer to peer IR system architectures
- Techniques used in IR in P2P networks
- Basic algorithms used in P2PIR
- Evaluation techniques used in P2PIR
- Challenges
- Conclusion
- References
3. Peer to Peer Network
- Collection of distributed systems
- Computers leave and join the network frequently
- Each computer acts as a server and a client simultaneously
- Three tasks that every peer-to-peer network performs:
  - Searching: querying and getting a list of document references.
  - Locating: resolving a document reference to the concrete location of the full document.
  - Transferring: downloading the document.
4. Applications of P2P
- Information Retrieval
- File Sharing
  - Gnutella, Napster, BitTorrent, etc.
5. Information Retrieval
- Information retrieval is the field dealing with the structure, analysis, organization, storage, searching and retrieval of information
- It searches for relevant documents on the basis of user input
[Figure: an information need (Info. need) is expressed as a Query to the IR system, which performs Retrieval over the Document collection and returns an Answer list.]
6. Basic Architecture of Information Retrieval
[Figure: IR system architecture. Components: User Interface, Text Operations, Query Operations, Indexing, Searching, Ranking, Database Manager, Index, Text Database. Data flows: User Need, Text, Query, User Feedback, Retrieved Docs, Ranked Docs.]
7. Fields of Information Retrieval
Example of Content | Example of Application | Example of Task
Text | Web search | Ad hoc search
Images, video | Vertical search | Filtering
Scanned documents | Desktop search | Question answering
Text, images, audio, video, documents, zip files, etc. | Peer-to-peer search | P2P information retrieval
P2PIR: P2P file sharing and federated IR
8. Comparison between File Sharing and Information Retrieval
Aspect | File Sharing | Information Retrieval
Application | Locating | Searching
Index content | File identifiers | Document content
Index size | Small | Large
Data exchange unit | File | Search result
Data exchange size | Megabytes | Kilobytes (small)
P2PIR: file sharing networks and federated information retrieval
9. Peer to Peer Information Retrieval (P2PIR)
- Searching in peer-to-peer networks
- Each peer shares its information with other peers
- A peer searches for information by sending queries to its peers
  - Queries are routed to one or many other peers.
- The query result is provided in the form of an index
10. Generations of P2PIR
- 1st generation
- 2nd generation
- 3rd generation
11. Peer to peer IR system architectures
- Based on relationship between peers
  - Cooperative system
  - Uncooperative system
- Based on the network structure
  - Centralized network
  - Structured architecture
  - Unstructured architecture
- Based on the task performed in the P2P network
  - Centralized Global Index
  - Distributed Global Index
  - Strict Local Indices
  - Aggregated Local Indices
12. Peer to peer IR system architectures
- Based on relationship between peers
  - Cooperative system
    - Resource descriptions, collection statistics and the collection index are usually stored in a central place
    - Peers can use this information to help their search
  - Uncooperative system
    - Each peer is independent
- Based on the network structure
  - Centralized network
  - Structured architecture
  - Unstructured architecture
13. Peer to peer IR system architectures
- Centralized network
  - Mix of traditional client-server architecture and pure peer-to-peer architecture
- Unstructured architecture
  - All the peers in the system are equal.
  - They can all issue requests, respond to other requests and route requests to other nodes to locate information.
- Structured architecture
  - Peers are grouped or clustered
  - Documents are placed not at random nodes but at specified locations
  - Uses a Distributed Hash Table (DHT)
14. Peer to peer IR system architectures
- Based on the task performed in the P2P network
  - Subtasks of the searching task
    - Indexing
      - Who constructs the index? Where is it stored?
    - Query routing
      - What path is used to send the query?
    - Query processing
      - Which peer performs the actual query processing?
  - Four commonly used peer-to-peer architectures
    - Centralized Global Index
    - Distributed Global Index
    - Strict Local Indices
    - Aggregated Local Indices
15. Peer-to-Peer architectures used in IR
[Figure: four panels titled Central Global Index, Distributed Global Index, Aggregated Local Index and Strict Local Index, with nodes labelled G (global index) or L (local index).]
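The indexing question (who constructs the index, and where it is stored) can be illustrated with a small sketch. It assumes a plain inverted index and two hypothetical peers; the function names and peer names are illustrative only and do not come from the slides. Each peer builds a local index over its own documents, and a global index is the merge of those local indices. Where that merged index lives (one server, spread across peers, or only partly aggregated) is what distinguishes the four architectures.

```python
from collections import defaultdict

def build_local_index(peer_docs):
    """Inverted index for one peer: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in peer_docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def build_global_index(local_indices):
    """Merge local indices into one: term -> set of (peer, document id)."""
    global_index = defaultdict(set)
    for peer, index in local_indices.items():
        for term, doc_ids in index.items():
            for doc_id in doc_ids:
                global_index[term].add((peer, doc_id))
    return global_index

# Hypothetical peers and documents (the texts reuse the deck's examples).
peers = {
    "peer-1": {"A": "books on computer networks"},
    "peer-2": {"B": "network routing in P2P networks"},
}
local = {peer: build_local_index(docs) for peer, docs in peers.items()}
global_index = build_global_index(local)

print(sorted(global_index["networks"]))
```

With strict local indices, only the per-peer `local` entries exist and a query has to be routed to the peers themselves; with a global index, a single lookup on `global_index` already names the peers that hold matching documents.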
16. Algorithm used in P2PIR
- Statistical IR algorithms
  - Vector Space Model (VSM)
    - Document A: books on computer networks
    - Document B: network routing in P2P networks
    - Query Q: computer network
    - Each element of the vector corresponds to the importance of the term in the document
    - Retrieved documents are ranked by the similarity between the document vector and the query vector
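As a minimal sketch of the vector space model above, the two example documents can be ranked against query Q by cosine similarity. The sketch assumes raw term counts as the term weights and a crude plural-stripping step so that "networks" matches "network"; both are illustrative choices rather than anything stated in the slides (real systems typically use TF-IDF weights and a proper stemmer).

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase, split on whitespace, strip a trailing 's' (crude stemming, an assumption)."""
    words = text.lower().split()
    return [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words]

def vectorize(text):
    """Sparse term-frequency vector: term -> count."""
    return Counter(tokenize(text))

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

docs = {
    "A": "books on computer networks",
    "B": "network routing in P2P networks",
}
query = vectorize("computer network")

scores = {name: cosine(vectorize(text), query) for name, text in docs.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

Under these assumptions Document A ranks above Document B, because A contains both query terms while B contains only "network".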
17. Algorithm used in P2PIR
- Statistical IR algorithms
  - Latent Semantic Indexing (LSI)
    [Figure: term-document matrix with document vectors Va, Vb, ...]
    - SVD: singular value decomposition
    - Reduces dimensionality
    - Discovers word semantics
      - Cat <-> Pet
      - Bus <-> Travel
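The SVD step can be written out in the usual LSI notation. The symbols below are the standard ones and are an assumption, since the slides do not define them: A is the m x n term-document matrix, k is the number of dimensions kept, U_k and V_k hold the first k left and right singular vectors, and \Sigma_k is the diagonal matrix of the k largest singular values. A query vector q is folded into the reduced space and compared with the reduced document vectors \hat{d}_j (the rows of V_k) by cosine similarity.

```latex
A \approx A_k = U_k \Sigma_k V_k^{\top}, \qquad
\hat{q} = \Sigma_k^{-1} U_k^{\top} q, \qquad
\mathrm{sim}(\hat{q}, \hat{d}_j) =
  \frac{\hat{q} \cdot \hat{d}_j}{\lVert \hat{q} \rVert \, \lVert \hat{d}_j \rVert}
```

Because related terms such as "cat" and "pet" tend to occur in similar documents, they end up close together in the k-dimensional space, which is how the reduced representation captures word semantics.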
18. Algorithm used in P2PIR
- Distributed Hash Table (DHT)
  - Method of hash table lookup over a decentralized distributed network
  - Key-value pairs are stored in the DHT at a parent node (structured architecture)
    - Kd = hash(books on computer networks)
    - Kq = hash(computer network)
  - Any node in the DHT can then efficiently retrieve the value by providing its key.
  - Example: BitTorrent (Napster, by contrast, used a central index)
  - Modern DHTs include CAN, Chord, etc.
- Extend with content-based search
  - Full-text retrieval
  - Content-based image retrieval
  - Content-based music retrieval, etc.
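The key-value behaviour described above can be sketched with a toy, single-process model of a hash ring. It assumes SHA-1 hashing, a 16-bit identifier space and a Chord-like successor rule; the class `ToyDHT` and the peer names are hypothetical, and a real DHT would route the lookup across the network instead of consulting one sorted list.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 16  # small identifier space, chosen only for readability (assumption)

def ring_id(text):
    """Map a string to a position on the identifier ring using SHA-1."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)

class ToyDHT:
    """In-memory stand-in for a DHT: every key is owned by exactly one node."""

    def __init__(self, node_names):
        self.ring = sorted((ring_id(name), name) for name in node_names)
        self.store = {name: {} for name in node_names}

    def responsible_node(self, key):
        """Successor rule: first node at or after the key's position, wrapping around."""
        ids = [node_id for node_id, _ in self.ring]
        index = bisect_left(ids, ring_id(key)) % len(self.ring)
        return self.ring[index][1]

    def put(self, key, value):
        self.store[self.responsible_node(key)][key] = value

    def get(self, key):
        return self.store[self.responsible_node(key)].get(key)

dht = ToyDHT(["peer-1", "peer-2", "peer-3", "peer-4"])
dht.put("books on computer networks", "document A")

found = dht.get("books on computer networks")   # exact key: found
missing = dht.get("computer network")           # different key: not found
print(found, missing)
```

The second lookup fails because Kq = hash(computer network) is unrelated to Kd = hash(books on computer networks). A plain DHT only supports exact-key lookup, which is why it has to be extended for content-based search, for example by storing one entry per term.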
19. P2P Information Retrieval Techniques
[Figure: taxonomy of P2P IR techniques for Unstructured and Structured networks. Labels: Blind Search, BFS, RBFS (e.g. Gnutella), Random Walk, Routing Indices, Clustering, Indexing, Semantic Searching (e.g. SON), pSearch.]
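Two of the blind search techniques named above, BFS flooding and random walk, can be sketched on a small overlay. The topology, the peer names and the TTL values are hypothetical, and the sketch only counts messages; it does not model result delivery or duplicate suppression between peers.

```python
import random
from collections import deque

# Hypothetical overlay: peer -> neighbours.
OVERLAY = {
    "p1": ["p2", "p3"],
    "p2": ["p1", "p4"],
    "p3": ["p1", "p4"],
    "p4": ["p2", "p3", "p5"],
    "p5": ["p4"],
}
HOLDERS = {"p5"}  # peers that hold a matching document (assumption)

def flood(start, ttl):
    """BFS flooding: forward the query to every neighbour until the TTL runs out.

    Returns (peers that reported a hit, number of messages sent).
    """
    visited = {start}
    queue = deque([(start, ttl)])
    hits, messages = set(), 0
    while queue:
        peer, remaining = queue.popleft()
        if peer in HOLDERS:
            hits.add(peer)
        if remaining == 0:
            continue
        for neighbour in OVERLAY[peer]:
            messages += 1
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, remaining - 1))
    return hits, messages

def random_walk(start, max_steps, rng):
    """Random walk: forward the query to one randomly chosen neighbour per step."""
    peer, messages = start, 0
    for _ in range(max_steps):
        if peer in HOLDERS:
            return peer, messages
        peer = rng.choice(OVERLAY[peer])
        messages += 1
    return (peer if peer in HOLDERS else None), messages

print(flood("p1", ttl=2))
print(flood("p1", ttl=3))
print(random_walk("p1", max_steps=50, rng=random.Random(0)))
```

On this overlay a TTL of 2 never reaches p5, while a TTL of 3 finds it at the cost of more messages. The random walk sends one message per step, trading lower traffic for a less predictable search time.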
20. Evaluation in P2P IR
- Recall (Are all the relevant documents retrieved?)
  - Fraction of the documents relevant to the query that are successfully retrieved
  - Recall = number of relevant documents retrieved / total number of relevant documents in the collection
- Precision (Are the retrieved documents relevant?)
  - Fraction of the retrieved documents that are relevant to the search query
  - Precision = number of relevant documents retrieved / number of documents retrieved
[Figure: the Relevant and Retrieved document sets and their overlap (retrieved relevant).]
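The two definitions above translate directly into set operations. The sketch below uses made-up document ids purely for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    retrieved_relevant = retrieved & relevant
    precision = len(retrieved_relevant) / len(retrieved) if retrieved else 0.0
    recall = len(retrieved_relevant) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical answer list and relevance judgements.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d3", "d5", "d6", "d7"]

p, r = precision_recall(retrieved, relevant)
print(p, r)
```

Here two of the four retrieved documents are relevant (precision 0.5), and two of the five relevant documents were retrieved (recall 0.4).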
21. Evaluation Techniques in P2P IR
- F-Score / F-measure
  - Harmonic mean of precision and recall.
- Hits per Query
  - Average number of distinct relevant documents discovered per search query.
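Both measures can be sketched in a few lines. The balanced F1 form is assumed for the F-score, since the slide only says "harmonic mean", and the query results and relevance judgements below are invented for illustration.

```python
def f_score(precision, recall):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def hits_per_query(results_per_query, relevant_per_query):
    """Average number of distinct relevant documents found per query."""
    if not results_per_query:
        return 0.0
    total = sum(
        len(set(results) & set(relevant))
        for results, relevant in zip(results_per_query, relevant_per_query)
    )
    return total / len(results_per_query)

f1 = f_score(0.5, 0.4)

# Two hypothetical queries; duplicates in a result list are counted once.
results = [["d1", "d1", "d2"], ["d3"]]
relevant = [["d1", "d2"], ["d4"]]
hpq = hits_per_query(results, relevant)
print(f1, hpq)
```

The first query finds two distinct relevant documents and the second finds none, giving one hit per query on average.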
22. Applications of P2P Information Retrieval in the Real World
- YaCy (www.yacy.net)
  - Local index entries are injected into a distributed global index
  - YaCy uses no centralized servers
  - The resulting decentralized web search has about 1.4 billion documents in its index, and more than 600 peer operators contribute each month. About 130,000 search queries are performed with this network each day (as of Feb 2015)
- Faroo (www.faroo.com)
  - A proprietary peer-to-peer search engine that uses a distributed global index.
  - It performs distributed crawling and ranking.
  - Faroo encrypts queries and results for privacy protection.
  - 2 million peers.
- Some other P2PIR systems: Sixearch, ODISSEA, MINERVA, Seeks, etc.
24. Challenges
- Cross-Language Information Retrieval
- Maintaining index freshness
- Security features
- Quality of service
- Efficient use of resources
- Increase range of peer-to-peer network
25. Conclusion
- P2PIR is one of the applications of peer-to-peer networks
- P2PIR combines key elements of file sharing and federated information retrieval
- No single technique solves all P2PIR problems
- Recall and precision are used for the evaluation of P2PIR
26. References
- Almer S. Tigelaar, Djoerd Hiemstra and Dolf Trieschnigg. Peer-to-Peer Information Retrieval. University of Twente, IEEE paper, Sept 2012.
- Rasanjalee Dissanayaka Mudiyanselage. Ontology-based Search Algorithms over Large-Scale Unstructured Peer-to-Peer Networks. Georgia State University, IEEE, Oct 2014.
- Demetrios Zeinalipour-Yazti. Information Retrieval in Peer-to-Peer Systems. University of California Riverside, IEEE, June 2003.
- Chengye Lu. Peer to Peer English/Chinese Cross-Language Information Retrieval. Queensland University of Technology, Sept 2008.
27. References
- Xiuqi Li and Jie Wu. Searching Techniques in Peer-to-Peer Networks. Florida Atlantic University, Boca Raton, FL 33431, 2007.
- Christos Gkantsidis, Milena Mihail and Amin Saberi. Random Walks in Peer-to-Peer Networks. Georgia Institute of Technology, Atlanta, GA, 2002.
- Taoufik Yeferny, Amel Bouzeghoub and Khedija Arour. A Query Learning Routing Approach Based on Semantic Clusters. International Journal of Advanced Information Technology (IJAIT), Vol. 1, No. 6, December 2011.
- Yulian Yang. Semantic Information Retrieval over P2P Networks. Université de Lyon, CNRS, INSA-Lyon, LIRIS, UMR5205, F-69621, France, 2009.