1
Web-based P2P support for Research Collaboration

Thesis defence
Pradeep Monga
Supervisor: Dr. Vlado Keselj
Faculty of Computer Science
Dalhousie University
Date: 20 June 2005

2
Outline

Introduction
Problem description
Motivation and Objective
Our Contributions
Architectural design and related issues
Use case scenarios
Testing and Evaluation
Conclusion and Future work

3
Introduction to Research Collaboration

Formation of communities by researchers,
universities and/or industry for collaborating in
research work
Needed because some problems are inherently
large-scale and cannot be solved individually
Leverages expertise and provides breadth in
knowledge availability

4
Problem description

We address the research collaboration problem
using web-based P2P support
This problem is peculiar because the user knows
well in advance what kind of data items will be
required in the near or foreseeable future
Information exchange is prolonged in time
compared with generic P2P problems

5
Motivation and Objective

Existing P2P applications are unable to fulfill
the special needs of research collaboration
Semi-automatic communication
Utilizing idle time of the researchers
Helping researchers to maintain the database of
data-items
No central database of nodes required
Adding another novel dimension to the mosaic of
Semantic Web applications

6
Our Contributions

Semi-autonomous
Pull-only connections
Keeping Keyword profiles
TTL update
Evaluation of the approach
and Paper Re-coll

7
Paper Re-coll

In a few words:
It is an open-source, P2P-like system developed
in Perl.
Users configure their nodes according to their
interests.
Metadata of newly uploaded data-items travels
through the network and updates all the nodes.
If a match is found at any node, that data-item
is downloaded automatically to that node.
It also supports manual filtering and paper
download.

8
Architectural design

Proposed network is a directed graph of nodes
The figure shows the structure of one node
The user interface is a web browser
The processing module handles communication
between the user interface, the paper archive and
the cache archive
The paper archive consists of data-items and
their metadata
The cache archive consists of metadata cached
from the neighboring nodes
Each node has control over only its incoming
links of communication
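
The node structure described above can be sketched as a small data type. This is only an illustration, not the Perl implementation of Paper Re-coll: the class name `Node`, its field names, and the method `add_incoming_link` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One peer: a paper archive, a cache archive, and its incoming links."""
    name: str
    # item id -> metadata of data-items stored locally (assumed layout)
    paper_archive: dict = field(default_factory=dict)
    # item id -> metadata cached from neighboring nodes (assumed layout)
    cache_archive: dict = field(default_factory=dict)
    # names of the nodes this node pulls metadata from
    pulls_from: set = field(default_factory=set)

    def add_incoming_link(self, other: str) -> None:
        # A node only chooses whom it pulls from; it cannot make others pull from it.
        self.pulls_from.add(other)


a, b = Node("A"), Node("B")
a.add_incoming_link("B")  # directed edge B -> A: metadata flows from B to A
```

Because only `a` changed, the link is one-directional: `b.pulls_from` stays empty until B's owner adds A on their side.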

9
Data Storage

Data-items are the items transported between
nodes, such as publications, publication
metadata, CFPs, software and software metadata
Data-items are stored in binary form, and their
metadata is stored automatically in plain text
files
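
A plain-text metadata file could be read with a few lines of code. The sketch below assumes a simple `key : value` layout with comma-separated keywords; the field names, the sample record and the example.org URL are hypothetical and are not taken from the actual metadata files.

```python
def parse_metadata(text: str) -> dict:
    """Parse 'key : value' lines into a dict; keywords become a lowercase list."""
    record = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(":", 1)  # split on the first colon only, so URLs survive
        record[key.strip().lower()] = value.strip()
    if "keywords" in record:
        record["keywords"] = [
            k.strip().lower() for k in record["keywords"].split(",") if k.strip()
        ]
    return record


# Hypothetical sample record, for illustration only.
sample = """title : A Lexicon
url : http://example.org/papers/a_lexicon.pdf
keywords : NLP, lexicon
"""
meta = parse_metadata(sample)
```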

10
Example of metadata in plain text
11
Related Issues

Communication: HTTP
Naming and Discovery: static names for nodes
Availability: fault tolerant, handles
intermittent disconnections
Security: HTTPS and .htaccess
Periodic Retrieval: typical to our approach

12
Duplication Resolution

Unidirectional flow
If node P1 communicates some metainfo to P2,
then P2 will not communicate it back to P1
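
One way to picture the unidirectional-flow rule is to record, for each cached entry, which neighbor it came from, and to leave those entries out when that same neighbor asks. This is a sketch under that assumption; the function name and the cache layout are hypothetical.

```python
def metadata_to_serve(cache: dict, requester: str) -> list:
    """Return ids of cached metadata that did not arrive from the requester."""
    return sorted(
        item_id
        for item_id, received_from in cache.items()
        if received_from != requester
    )


# Cache at P2: item id -> neighbor it was received from (None = uploaded locally).
p2_cache = {"paper-1": "P1", "paper-2": "P3", "paper-3": None}
served_to_p1 = metadata_to_serve(p2_cache, "P1")
```

Here `paper-1` came from P1, so P2 does not offer it back to P1.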

13
Duplication Resolution

Delete cache
The idea is to keep a repository of recently
deleted metadata at each node so that duplicates
can be detected
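
The delete cache can be sketched as a bounded set of recently deleted item ids that is consulted before incoming metadata is stored. The capacity limit, the class `DeleteCache` and the helper `accept` are assumptions made for illustration; the slides do not say how the repository is bounded.

```python
from collections import OrderedDict


class DeleteCache:
    """Remembers the most recently deleted item ids, up to a fixed capacity."""

    def __init__(self, capacity: int = 100):
        self.capacity = capacity
        self._ids = OrderedDict()

    def remember(self, item_id: str) -> None:
        self._ids[item_id] = True
        self._ids.move_to_end(item_id)
        while len(self._ids) > self.capacity:
            self._ids.popitem(last=False)  # forget the oldest deletion

    def __contains__(self, item_id: str) -> bool:
        return item_id in self._ids


def accept(item_id: str, cache: set, deleted: DeleteCache) -> bool:
    """Store incoming metadata unless it is already cached or was recently deleted."""
    if item_id in cache or item_id in deleted:
        return False
    cache.add(item_id)
    return True
```

Without such a record, metadata that a user deleted would simply be pulled again from a neighbor on the next retrieval.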

14
Algorithms

Some mechanisms that help to automate our
approach
Periodic Retrieval
Automatic Download
TTL Update

15
Periodic Retrieval

Each node periodically pulls the metadata from
its neighboring nodes.
Crontab entries are used for automatic execution
Manual execution is also possible
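
A single pull round can be sketched in memory as follows. The real system transfers metadata over HTTP and is triggered by cron; here the network is just two dictionaries, and the names `pull_once`, `neighbors` and `caches` are hypothetical.

```python
def pull_once(name: str, neighbors: dict, caches: dict) -> set:
    """Copy metadata ids that `name` lacks from each node it pulls from."""
    new_ids = set()
    for neighbor in neighbors[name]:
        new_ids |= caches[neighbor] - caches[name]
    caches[name] |= new_ids
    return new_ids


# A pulls from B, and B pulls from C; only C holds metadata at the start.
neighbors = {"A": ["B"], "B": ["C"], "C": []}
caches = {"A": set(), "B": set(), "C": {"paper-1"}}

first = pull_once("A", neighbors, caches)   # nothing yet: B has not pulled from C
second = pull_once("B", neighbors, caches)  # B learns about paper-1
third = pull_once("A", neighbors, caches)   # now A learns about it too
```

The example shows why update delay depends on the order and frequency of pulls: A needs two rounds to see an item that is two hops away.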

16
Automatic Download

Each node matches the keywords of newly received
metadata against its own node keywords.
If a match is found, that data-item is downloaded
to the node
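
The matching step might look like the sketch below. It assumes that a match means at least one shared keyword, compared case-insensitively; the slides do not define the matching rule, and the function names and record layout are hypothetical.

```python
def matches(node_keywords, item_keywords) -> bool:
    """True if the item shares at least one keyword with the node."""
    wanted = {k.strip().lower() for k in node_keywords}
    offered = {k.strip().lower() for k in item_keywords}
    return bool(wanted & offered)


def select_downloads(node_keywords, new_metadata) -> list:
    """Return the ids of newly received items whose keywords match the node's."""
    return [m["id"] for m in new_metadata if matches(node_keywords, m["keywords"])]


new_metadata = [
    {"id": "paper-1", "keywords": ["NLP", "lexicon"]},
    {"id": "paper-2", "keywords": ["databases"]},
]
to_download = select_downloads(["nlp", "p2p"], new_metadata)
```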

17
TTL Update

Each node has a TTL value that it attaches to
data-items newly uploaded at that node.
This value is updated if it is not sufficient to
cover the entire network or if the network keeps
growing.

18
TTL update illustration

Suppose N is the originating node of a data-item
with TTL value 2
All the numbered nodes received the data-item
The nodes marked P (3 in total) would send a
message to N that they could not receive the
data-item
Now N's new TTL value would be 5, which is
sufficient to cover the entire network for future
uploads on N.
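
The illustration can be reproduced on a toy network. The chain topology below is hypothetical (the original figure is not in the transcript), and taking the new TTL to be the largest hop distance from N is an assumption, not necessarily the update rule used in Paper Re-coll.

```python
from collections import deque


def hop_distances(flows_to: dict, origin: str) -> dict:
    """Breadth-first hop counts from the origin along the direction metadata flows."""
    dist = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for nxt in flows_to.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def reached(flows_to: dict, origin: str, ttl: int) -> set:
    """Nodes that receive the metadata when it may travel at most `ttl` hops."""
    return {n for n, d in hop_distances(flows_to, origin).items() if 0 < d <= ttl}


def covering_ttl(flows_to: dict, origin: str) -> int:
    """Smallest TTL that reaches every node reachable from the origin."""
    return max(hop_distances(flows_to, origin).values())


# Hypothetical chain: N -> 1 -> 2 -> P1 -> P2 -> P3
flows_to = {"N": ["1"], "1": ["2"], "2": ["P1"], "P1": ["P2"], "P2": ["P3"]}
got_it = reached(flows_to, "N", ttl=2)
missed = set(hop_distances(flows_to, "N")) - got_it - {"N"}
new_ttl = covering_ttl(flows_to, "N")
```

On this chain a TTL of 2 reaches the two numbered nodes, three P nodes are missed, and a TTL of 5 covers everything, which mirrors the numbers in the illustration.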

19
Establishing New Node

P_est represents the established nodes of a
working network
P_new represents a new node willing to connect
P_new can go to any P_est's homepage, enter its
own name and download the provided files; P_new
is then connected to the network

20
Disconnecting/Re-connecting
21
What else can we do?

Manual paper filtering and download
Add/Delete nodes in PeerGroup
Suggests possible candidates for PeerGroup
Local Search

22
Testing and Evaluations

Each node in a network has
Maximal time - maximum time a node takes to
retrieve metainfo from any other node in network
(represents farthest node)
Mean time - mean of time durations that a node
takes to retrieve data from all nodes in the
network (represents mean retrieval time)
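
The two per-node measures are straightforward to compute once the retrieval durations are known. The sketch below uses made-up durations in minutes; the function name and the input layout are hypothetical.

```python
from statistics import mean


def node_times(retrieval_minutes: dict) -> tuple:
    """Return (maximal time, mean time) for one node.

    `retrieval_minutes` maps each other node to the time this node needed
    to retrieve that node's metainfo.
    """
    values = list(retrieval_minutes.values())
    return max(values), mean(values)


# Made-up durations observed at node A, in minutes.
times_at_a = {"B": 5, "C": 10, "D": 30}
maximal_time, mean_time = node_times(times_at_a)
```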

23
Testing and Evaluations

first set of experiments - experimental set up
Analyze the effect of increasing number of nodes
in the network
All networks are unstructured
Metadata retrieval times are different for each
node
All nodes are hosted on same machine
Figures are average of three runs

24
Testing and Evaluations

first set of experiments
Maximal network pull time: greatest maximal time
in the network
Mean of maximal times: average of maximal
times of all nodes in a network
Mean network pull time: average of mean times
of all nodes in a network
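
The three network-level figures follow directly from the per-node maximal and mean times. A minimal sketch with made-up values in minutes; the function name and dictionary keys are hypothetical.

```python
from statistics import mean


def network_summary(per_node: dict) -> dict:
    """Aggregate per-node (maximal time, mean time) pairs into network figures."""
    maximal = [mx for mx, _ in per_node.values()]
    means = [mn for _, mn in per_node.values()]
    return {
        "maximal_network_pull_time": max(maximal),
        "mean_of_maximal_times": mean(maximal),
        "mean_network_pull_time": mean(means),
    }


# Made-up per-node values: node -> (maximal time, mean time), in minutes.
per_node = {"A": (30, 15), "B": (20, 10), "C": (40, 20)}
summary = network_summary(per_node)
```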

(Results not transcribed; times are in minutes.)
25
Testing and Evaluations

first set of experiments

26
Testing and Evaluations

second set of experiments - experimental set up
Analyze the effect of increasing the number of
data items in the network
Compare the effect on different network
topologies
Number of nodes in each network is 16
Figures show metadata update delays and not the
data item retrieval delays
Metadata retrieval times are regular for each
node and are similar for all nodes
All nodes are hosted on same machine
Figures are average of three runs

27
Testing and Evaluations

second set of experiments

28
Testing and Evaluations

second set of experiments

(Results not transcribed; times are in minutes.)
29
Testing and Evaluations

second set of experiments

30
Testing and Evaluations

third set of experiments - experimental set up
Experimental set up is same as in second set of
experiments so that results can be compared
The only change is that metadata retrieval times
are now randomized for each node and differ
across nodes

31
Testing and Evaluations

third set of experiments

(Results not transcribed; times are in minutes.)
32
Testing and Evaluations

third set of experiments

33
Conclusions

Collaboration characterizes a scenario where
interaction is fundamentally peer-to-peer.
Hence, a P2P platform is a good choice for
enhancing the collaboration experience
A P2P framework is more flexible and scalable,
and becomes a logical alternative when being
physically distributed is an essential property
of a network
We provided a concise view of how a pull-only
approach can be used to retrieve information from
neighboring nodes
The aim was to bring together people with common
research interests, give them networking
opportunities, and remove barriers to
collaborative efforts
Preliminary experience with this approach is
encouraging and supports the usefulness of the
prototype
Additional experimentation and deployment of the
prototype can further validate our ideas.

34
Future work

Generalizing keyword-based profiling to a more
structured approach, possibly using the Semantic
Web, to profile documents and nodes and perform
structured matching
Moving towards the Semantic Web by using XML and
SOAP along with CGI on top of HTTP for better
metadata transfers
Generating metadata automatically as far as
possible
The current implementation assumes relatively
high connectivity on the part of the user; this
assumption can be relaxed to a fair degree by
adding more URL links to the metadata of a data
item
Keeping profiles of users according to their
expertise so that users with related expertise
can be grouped together for faster retrieval
Creating a network backbone of more powerful peers

38
Uploading a new data item

As soon as a new data item is uploaded at a node,
it is added to the metadata cache of that node
This metadata is then pulled by other connected
nodes in the network

39
Automatic Download

Node-keywords at each node can be configured by
the owner to match the owner's requirements
At each node, the metadata cache is searched
frequently for data items whose keywords match
the node-keywords
If any such data item is present, it is pulled
directly from the node that holds it