Web-based P2P support for Research Collaboration

1
Web-based P2P support for Research Collaboration
  • Thesis defence
  • Pradeep Monga
  • Supervisor: Dr. Vlado Keselj
  • Faculty of Computer Science
  • Dalhousie University
  • Date: 20th June 2005

2
Outline
  • Introduction
  • Problem description
  • Motivation and Objective
  • Our Contributions
  • Architectural design and related issues
  • Use case scenarios
  • Testing and Evaluation
  • Conclusion and Future work

3
Introduction to Research Collaboration
  • Formation of communities by researchers,
    universities and/or industry for collaborating in
    research work
  • Needed because some problems are inherently
    large-scale and cannot be solved individually
  • Leverages expertise and provides breadth in
    knowledge availability

4
Problem description
  • We address the research collaboration problem
    using web-based P2P support
  • The problem is distinctive because the user knows
    in advance, to a fair degree, what kind of data
    items will be required in the near or foreseeable
    future
  • Information exchange extends over a longer period
    than in generic P2P problems

5
Motivation and Objective
  • Existing P2P applications are unable to meet
    the special needs of research collaboration
  • Semi-automatic communication
  • Utilizing idle time of the researchers
  • Helping researchers maintain a database of
    data-items
  • No central database of nodes required
  • Adding another novel dimension to the mosaic of
    Semantic Web applications

6
Our Contributions
  • Semi-autonomous
  • Pull-only connections
  • Keeping Keyword profiles
  • TTL update
  • Evaluation of the approach
  • and Paper Re-coll

7
Paper Re-coll
  • In a few words:
  • It is an open-source, P2P-like system developed
    in Perl.
  • Users configure their nodes according to
    their interests.
  • Metadata of newly uploaded data-items travels
    through the network and updates all the nodes
  • If a match is found at any node, that data-item
    is downloaded automatically at the
    concerned node.
  • Also supports manual filtering and paper download.

8
Architectural design
  • The proposed network is a directed graph of nodes
  • The figure shows the structure of one node
  • User interfaces are web browsers
  • The processing module handles communication
    between the user interface, paper archive and
    cache archive
  • The paper archive consists of data-items and their
    metadata
  • The cache archive consists of the metadata cached
    from the neighboring nodes
  • Each node has control over only its incoming
    communication links

9
Data Storage
  • Data-items are the items transported between
    nodes, such as publications, publication metadata,
    CFPs, software and software metadata
  • Data-items are stored in binary form; their
    metadata is stored automatically in plain text
    files
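
A minimal sketch of reading such a plain-text metadata file, assuming one `key : value` pair per line. The field names (`title`, `url`, `keywords`), the host name and the sample values are hypothetical; the actual Paper Re-coll file layout is not reproduced in this transcript.

```python
def parse_metadata(text):
    """Parse 'key : value' lines into a dict (assumed layout, not the real one)."""
    record = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        # Split on the first colon only, so URLs in the value stay intact.
        key, value = line.split(":", 1)
        record[key.strip().lower()] = value.strip()
    # Keywords are assumed to be a comma-separated list.
    record["keywords"] = [k.strip().lower()
                          for k in record.get("keywords", "").split(",")
                          if k.strip()]
    return record


sample = """title : An Example Paper
url : http://node-a.example.org/papers/example.pdf
keywords : NLP, lexicon
"""
meta = parse_metadata(sample)
print(meta["keywords"])
```

Keeping the metadata as plain text means it can be inspected and edited by hand, while the data-item itself stays an opaque binary file.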

10
Example of metadata in plain text
11
Related Issues
  • Communication: HTTP
  • Naming and Discovery: static names for nodes
  • Availability: fault tolerant, handles
    intermittent disconnections
  • Security: HTTPS and .htaccess
  • Periodic Retrieval: specific to our approach

12
Duplication Resolution
  • Unidirectional-flow
  • That is, if node P1 communicates some metainfo
    to P2, then P2 will not communicate it back to P1
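
One way to picture unidirectional flow is a cache that remembers which neighbour each entry came from and leaves those entries out when that neighbour asks. This is an illustrative sketch only; the class and method names are hypothetical and do not come from Paper Re-coll.

```python
class MetadataCache:
    """Metadata cache that remembers which neighbour each entry came from."""

    def __init__(self):
        self._source = {}  # item id -> node it was pulled from (None if local)

    def add(self, item_id, pulled_from=None):
        # Keep the first source seen; later copies are duplicates.
        self._source.setdefault(item_id, pulled_from)

    def entries_for(self, requester):
        """Entries to hand to `requester`, omitting what it gave us."""
        return sorted(i for i, src in self._source.items() if src != requester)


p2 = MetadataCache()
p2.add("paper-1", pulled_from="P1")  # P1 communicated this to P2
p2.add("paper-2")                    # uploaded locally at P2
print(p2.entries_for("P1"))  # paper-1 is not sent back to P1
print(p2.entries_for("P3"))  # any other node sees both entries
```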

13
Duplication Resolution
  • Delete cache
  • The idea is to keep a repository of recently
    deleted metadata at each node, against which
    incoming metadata is checked for duplication
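
A delete cache can be sketched as a bounded record of recently deleted ids that is consulted before incoming metadata is stored. The capacity limit, the use of ids as keys, and all names below are assumptions made for illustration.

```python
from collections import OrderedDict


class DeleteCache:
    """Bounded record of recently deleted metadata ids."""

    def __init__(self, capacity=1000):
        self.capacity = capacity      # hypothetical size limit
        self._ids = OrderedDict()

    def remember(self, item_id):
        self._ids[item_id] = True
        self._ids.move_to_end(item_id)
        while len(self._ids) > self.capacity:
            self._ids.popitem(last=False)  # forget the oldest deletion

    def __contains__(self, item_id):
        return item_id in self._ids


def accept(item_id, cache, deleted):
    """Store incoming metadata unless it is already held or recently deleted."""
    if item_id in cache or item_id in deleted:
        return False
    cache.add(item_id)
    return True


cache, deleted = {"paper-1"}, DeleteCache(capacity=2)
cache.discard("paper-1")       # the owner deletes the entry ...
deleted.remember("paper-1")    # ... and the node remembers the deletion
print(accept("paper-1", cache, deleted))  # rejected: recently deleted
print(accept("paper-2", cache, deleted))  # accepted: new metadata
```

Without such a record, metadata deleted at a node would simply be pulled again from a neighbour on the next retrieval.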

14
Algorithms
  • Some mechanisms that help to automate our
    approach
  • Periodic Retrieval
  • Automatic Download
  • TTL Update

15
Periodic Retrieval
  • Each node periodically pulls the metadata
    from its neighboring nodes.
  • Crontabs are used for automatic execution
  • Manual execution is also possible
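
The following sketch shows what a single retrieval round might look like when it is run by cron or by hand. The `fetch` callable stands in for an HTTP request, and the URLs, the `(item_id, metadata)` pair format and the function names are all hypothetical.

```python
def pull_once(neighbours, fetch, cache):
    """Pull metadata from every neighbour once; return the ids newly added."""
    new_ids = []
    for url in neighbours:
        try:
            entries = fetch(url)          # list of (item_id, metadata) pairs
        except OSError:
            continue                      # neighbour offline: retry next round
        for item_id, metadata in entries:
            if item_id not in cache:
                cache[item_id] = metadata
                new_ids.append(item_id)
    return new_ids


def fake_fetch(url):
    """Stand-in for an HTTP GET so that the sketch runs offline."""
    if url == "http://node-b.example.org/":
        raise OSError("node unreachable")
    return [("paper-1", {"keywords": ["nlp"]})]


cache = {}
neighbours = ["http://node-a.example.org/", "http://node-b.example.org/"]
print(pull_once(neighbours, fake_fetch, cache))  # first round finds paper-1
print(pull_once(neighbours, fake_fetch, cache))  # second round finds nothing new
```

Skipping an unreachable neighbour and trying again on the next scheduled run is one simple way to tolerate intermittent disconnections.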

16
Automatic Download
  • Each node matches the keywords of newly received
    metadata against its own node keywords.
  • If a match is found, that data-item is downloaded
    to the node
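
A small sketch of the matching step, assuming that a "match" means at least one shared keyword, compared case-insensitively. The slides do not spell out the matching rule, so this rule and the function names are assumptions.

```python
def matches(node_keywords, item_keywords):
    """True if the item shares at least one keyword with the node."""
    node = {k.strip().lower() for k in node_keywords}
    item = {k.strip().lower() for k in item_keywords}
    return bool(node & item)


def select_downloads(node_keywords, new_metadata):
    """Return ids of newly received items that should be downloaded."""
    return [item_id for item_id, meta in new_metadata.items()
            if matches(node_keywords, meta["keywords"])]


node_keywords = ["NLP", "P2P"]
new_metadata = {
    "paper-1": {"keywords": ["nlp", "lexicon"]},
    "paper-2": {"keywords": ["databases"]},
}
print(select_downloads(node_keywords, new_metadata))
```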

17
TTL Update
  • Each node has a ttl value that it attaches to
    data-items newly uploaded at that node.
  • This value is updated whenever it is not
    sufficient to cover the entire network, or when
    the network grows.

18
TTL update illustration
  • Suppose N is the originating node of a data-item
    with ttl value 2
  • All the numbered nodes receive the data-item
  • The nodes marked P (3 in total) send a message to
    N saying that they could not receive the data-item
  • N's new ttl value would then be 5, which is
    sufficient to cover the entire network for future
    uploads on N.
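
The slide does not state the rule by which N arrives at its new ttl. The sketch below therefore only computes which nodes a given ttl leaves out, and the smallest ttl that would cover everyone, using breadth-first search. The chain topology, the node names and the convention that a ttl of k reaches nodes up to k hops away are all assumptions.

```python
from collections import deque


def hop_distances(pullers, origin):
    """Breadth-first hop count from `origin` to every node its metadata can reach."""
    dist = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for nxt in pullers.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def unreached(pullers, origin, ttl):
    """Nodes that are connected to `origin` but lie more than `ttl` hops away."""
    dist = hop_distances(pullers, origin)
    return sorted(n for n, d in dist.items() if d > ttl)


# Hypothetical chain: A pulls from N, B pulls from A, and so on up to E.
pullers = {"N": ["A"], "A": ["B"], "B": ["C"], "C": ["D"], "D": ["E"]}
print(unreached(pullers, "N", ttl=2))             # three nodes are left out
print(max(hop_distances(pullers, "N").values()))  # smallest ttl that covers everyone
```

With this particular chain, a ttl of 2 leaves three nodes uncovered and a ttl of 5 covers the whole network, which mirrors the numbers in the illustration without claiming to reproduce its topology.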

19
Establishing New Node
  • P_est represents an established node of a working
    network
  • P_new represents a new node willing to connect
  • P_new can go to any P_est's homepage, enter its
    own name and download the provided files; P_new is
    then connected to the network

20
Disconnecting/Re-connecting
21
What else can we do?
  • Manual paper filtering and download
  • Add/Delete nodes in PeerGroup
  • Suggest possible candidates for PeerGroup
  • Local Search

22
Testing and Evaluations
  • Each node in a network has two measures:
  • Maximal time: the maximum time the node takes to
    retrieve metainfo from any other node in the
    network (represents the farthest node)
  • Mean time: the mean of the times the node
    takes to retrieve metainfo from all nodes in the
    network (represents the mean retrieval time)

23
Testing and Evaluations
  • first set of experiments - experimental set up
  • Analyze the effect of increasing number of nodes
    in the network
  • All networks are unstructured
  • Metadata retrieval times are different for each
    node
  • All nodes are hosted on the same machine
  • Figures are averages of three runs

24
Testing and Evaluations
  • first set of experiments
  • Maximal network pull time: the greatest maximal
    time in the network
  • Mean of maximal times: the average of the maximal
    times of all nodes in a network
  • Mean network pull time: the average of the mean
    times of all nodes in a network
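
The three network-level measures follow directly from the per-node maximal and mean times. The retrieval times below are made-up numbers used only to show the arithmetic; they are not results from the experiments.

```python
from statistics import mean

# Hypothetical retrieval times in minutes: times[a][b] is how long node a
# takes to retrieve metainfo that originates at node b.
times = {
    "A": {"B": 2.0, "C": 6.0},
    "B": {"A": 3.0, "C": 3.0},
    "C": {"A": 4.0, "B": 2.0},
}

# Per-node measures.
maximal_time = {n: max(t.values()) for n, t in times.items()}
mean_time = {n: mean(t.values()) for n, t in times.items()}

# Network-level measures built from the per-node ones.
maximal_network_pull_time = max(maximal_time.values())
mean_of_maximal_times = mean(maximal_time.values())
mean_network_pull_time = mean(mean_time.values())

print(maximal_network_pull_time, mean_of_maximal_times, mean_network_pull_time)
```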

(Figure: time in minutes)
25
Testing and Evaluations
  • first set of experiments

26
Testing and Evaluations
  • second set of experiments - experimental set up
  • Analyze the effect of increasing the number of
    data items in the network
  • Compare the effect on different network
    topologies
  • Number of nodes in each network is 16
  • Figures show metadata update delays, not
    data item retrieval delays
  • Metadata retrieval times are regular for each
    node and similar across all nodes
  • All nodes are hosted on the same machine
  • Figures are averages of three runs

27
Testing and Evaluations
  • second set of experiments

28
Testing and Evaluations
  • second set of experiments

(Figure: time in minutes)
29
Testing and Evaluations
  • second set of experiments

30
Testing and Evaluations
  • third set of experiments - experimental set up
  • The experimental setup is the same as in the
    second set of experiments, so that results can be
    compared
  • The only change is that metadata retrieval times
    are now randomized, so they differ from node to
    node

31
Testing and Evaluations
  • third set of experiments

(Figure: time in minutes)
32
Testing and Evaluations
  • third set of experiments

33
Conclusions
  • Collaboration characterizes a scenario where
    interaction is fundamentally peer-to-peer.
    Hence, a P2P platform is a good choice for
    enhancing the collaboration experience
  • A P2P framework is more flexible and scalable, and
    becomes a logical alternative when being
    physically distributed is an essential property
    of a network
  • We provided a concise view of how a pull-only
    approach can be used to retrieve information from
    neighboring nodes
  • The aim was to bring together people with common
    research interests, provide them with networking
    opportunities and remove barriers to collaborative
    efforts
  • Preliminary experience with this approach is
    encouraging and supports the usefulness of the
    prototype
  • Additional experimentation and deployment of the
    prototype can further validate our ideas.

34
Future work
  • Generalizing keyword-based profiling to a more
    structured approach, possibly using the Semantic
    Web, to profile documents and nodes and to perform
    structured matching
  • Moving towards the Semantic Web by using XML and
    SOAP along with CGI on top of HTTP for better
    metadata transfers
  • Generating metadata automatically as far as
    possible
  • The current implementation assumes relatively
    high connectivity on the part of the user;
    this assumption can be relaxed to a fair
    degree by adding more URL links to the metadata
    of a data item
  • Keeping profiles of users according to their
    expertise, so that users with related expertise
    can be grouped together for faster retrieval
  • Creating a network backbone of more powerful peers

38
Uploading a new data item
  • As soon as a new data item is uploaded at a node,
    it is added to the metadata cache of that node
  • This metadata is then pulled by other connected
    nodes in the network

39
Automatic Download
  • The node-keywords at each node can be configured
    by the owner to match the owner's requirements
  • At each node, the metadata cache is searched
    frequently for data items whose keywords
    match the node-keywords
  • If any such data item is present, it is
    pulled directly from the concerned node
