WP3l: Services and Overlay Networks - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: WP3l: Services and Overlay Networks


1
WP3l: Services and Overlay Networks
  • TUC (lead), EPFL, UCL
  • March 3, 2008
  • Danny Bickson

2
Presentation Outline
  • WP3l focus in the final year of Evergrow
  • Highlights of important results
  • Conclusions

3
WP3 Focus
  • The design, prototype implementation and
    evaluation of the following services on top of
    DHTs:
  • Large-scale, distributed, information retrieval
    and filtering
  • Large-scale, distributed inference by belief
    propagation
  • Network monitoring and visualization (not in this
    presentation but see Deliverable D3l.3)
  • This work is reported in Deliverable D3l.3.

4
Large-scale, distributed, information retrieval
and filtering
  • Previous years: Development of the information
    retrieval and filtering systems DHTrie and
    LibraRing (TUC); initial proposal of MAPS (TUC
    and MPII Saarbrücken, DELIS).
  • This year: Emphasis on information filtering.
    Further development and detailed experimental
    evaluation of MAPS (TUC and MPII Saarbrücken,
    DELIS).

5
DHTrie and LibraRing (Exact Information Filtering)
  • DHTrie
  • An exact information filtering system built on
    top of a DHT.
  • Basic idea: Index and store subscriptions in the
    DHT. Make sure publications meet subscriptions.
  • The DHT is used as an indexing engine
  • nodes index the queries
  • publications are sent to the appropriate nodes
    (distributed query execution)
  • Filtering effectiveness (aka recall) of a
    centralised system.
  • LibraRing: (Digital) Library (Chord) Ring
  • Extension to the DHTrie protocols to support both
    information retrieval and filtering
  • tailored for digital library applications

6
DHTrie and LibraRing (Exact Information Filtering)
  • Deterministic query placement
  • depends on query components
  • basically a distributed index for continuous
    queries
  • Nodes are responsible for
  • indexing queries
  • disseminating documents to nodes that may index
    matching queries
  • Document-granularity dissemination
  • message overhead
  • disseminates a node's content to other nodes
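
The deterministic placement described above can be illustrated with a minimal sketch. It is not the DHTrie protocol itself: the 16-bit identifier space, the `ToyRing` class, the choice of the lexicographically smallest term as the placement key, and the treatment of queries as conjunctions of terms are all assumptions made for illustration.

```python
import hashlib
from bisect import bisect_left
from collections import defaultdict

RING_BITS = 16  # small identifier space, chosen only for illustration


def ring_id(key: str) -> int:
    """Hash a string to a position on the identifier ring."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


class ToyRing:
    """Hypothetical single-process stand-in for a DHT of indexing nodes."""

    def __init__(self, node_names):
        self.nodes = sorted((ring_id(name), name) for name in node_names)
        self.ids = [node_id for node_id, _ in self.nodes]
        # node name -> term -> list of (subscriber, query terms)
        self.index = defaultdict(lambda: defaultdict(list))

    def responsible_node(self, term):
        """Successor rule: the first node at or after the term's ring position."""
        pos = bisect_left(self.ids, ring_id(term)) % len(self.nodes)
        return self.nodes[pos][1]

    def subscribe(self, subscriber, query_terms):
        """Place the query at the node responsible for one of its terms."""
        terms = frozenset(query_terms)
        key = min(terms)  # assumed rule: smallest term decides the placement
        node = self.responsible_node(key)
        self.index[node][key].append((subscriber, terms))
        return node

    def publish(self, doc_terms):
        """Send the document to every node that may index a matching query."""
        doc = set(doc_terms)
        notified, contacted = set(), set()
        for term in doc:
            node = self.responsible_node(term)
            contacted.add(node)
            for subscriber, query in self.index[node].get(term, []):
                if query <= doc:  # all query terms occur in the document
                    notified.add(subscriber)
        return notified, contacted


ring = ToyRing(["P1", "P2", "P3", "P4", "P5", "P6"])
ring.subscribe("alice", ["opera", "vienna"])
notified, contacted = ring.publish(["vienna", "opera", "concert"])
```

Because placement depends only on the query terms, a publisher can find every candidate query by contacting the nodes responsible for the document's terms. The size of `contacted` also shows where the per-document message overhead comes from.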

[Figure labels: peers P1 to P6, with terms Vienna (P2), concert (P3), opera (P4), golf (P6)]
7
MAPS (Approximate Information Filtering)
  • MAPS: Minerva Approximate Publish/Subscribe
  • Built on the Minerva P2P search engine developed
    by MPII in the context of project DELIS.
  • Basic idea: Relax the assumption of delivering
    notifications for all matching publications
  • subscribers monitor only selected publishers
    likely to publish relevant documents in the
    future
  • rank publishers using novel ranking techniques
  • standard resource selection techniques from IR
    (e.g., CORI) are not suitable for publisher
    ranking (they refer to the past behaviour of a
    publisher)
  • we need publishing prediction techniques tailored
    to the IF case
  • new techniques are based on time series analysis
    of IR statistics
  • trade recall for scalability and efficiency
  • First proposal in the literature for approximate
    information filtering

8
MAPS protocols at a glance
  • Directory Service
  • A distributed directory layered on top of a DHT.
  • DHT partitions the term space.
  • Peers distribute per-term summaries to the
    directory.
  • The directory manages aggregated statistical
    information for terms.
  • Subscription Service
  • Use Directory Service to retrieve peer
    statistics
  • Rank peers
  • compute a score to predict how likely a
    publisher is to produce matching documents in
    the future
  • use time-series analysis to predict publishing
    behaviour
  • Forward the query to the top selected publishers
  • Publication Service
  • Only publishers indexing a query produce a
    notification!
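
The ranking step of the Subscription Service can be sketched as follows. This is a minimal illustration, not the MAPS scoring formula: the linear blend with weight `a`, the use of simple exponential smoothing as the time-series predictor, the normalisation by the maximum value, and the function names are all assumptions. The slides only state that the score combines resource selection with behaviour prediction.

```python
def smoothed_forecast(history, beta=0.5):
    """Forecast the next value with simple exponential smoothing (assumed predictor)."""
    if not history:
        return 0.0
    level = float(history[0])
    for value in history[1:]:
        level = beta * value + (1 - beta) * level
    return level


def rank_publishers(stats, a=0.5, top_k=2):
    """Rank publishers for one query term.

    stats maps a publisher to its per-period counts of matching documents,
    oldest first (hypothetical format of the directory statistics).
    a close to 1 emphasises resource selection (what was published so far);
    a close to 0 emphasises behaviour prediction (what is expected next).
    """
    selection = {p: float(sum(h)) for p, h in stats.items()}
    prediction = {p: smoothed_forecast(h) for p, h in stats.items()}
    max_sel = max(selection.values(), default=0.0) or 1.0  # avoid division by zero
    max_pred = max(prediction.values(), default=0.0) or 1.0
    scores = {
        p: a * selection[p] / max_sel + (1 - a) * prediction[p] / max_pred
        for p in stats
    }
    ranked = sorted(scores, key=lambda p: (-scores[p], p))  # ties broken by name
    return ranked[:top_k]


stats = {"P1": [9, 9, 0, 0], "P2": [0, 0, 4, 8], "P3": [1, 1, 1, 1]}
past_oriented = rank_publishers(stats, a=1.0, top_k=1)
future_oriented = rank_publishers(stats, a=0.0, top_k=1)
```

In this toy data P1 published the most overall but has stopped, while P2 has only recently started. A purely past-oriented score therefore selects P1 and a purely prediction-oriented score selects P2, which is the effect the slides attribute to standard resource selection.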

9
DHTrie vs. MAPS
  • Exact Information Filtering
  • Pros and cons
  • Retrieval effectiveness
  • Message traffic
  • Publication rate dependence
  • DHTrie Architecture
  • Deterministic query placement
  • Implicitly collected statistics (needed for
    matching)
  • Explicit load balancing (appropriate algorithm)
  • Approximate Information Filtering
  • Pros and cons
  • Scalability, data model independence
  • Publication rate independence
  • Lower recall
  • MAPS Architecture
  • Statistics-based query placement
  • Explicitly collected statistics (needed for peer
    ranking and matching)
  • Implicit load balancing (by query placement)

For a detailed comparison see also our IEEE
Internet Computing article
10
Experimental Evaluation of MAPS
  • Web crawls of >2M Web pages with timestamp data,
    1000 peers
  • Measure recall
  • under various publishing scenarios: consistent
    publishing, publishing breaks, topic changing,
    intervals of topic changes, ...
  • when emphasizing resource selection (a closer to
    1) or behaviour prediction (a closer to 0)

Recall while monitoring a fraction of the
publisher population
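
The recall measured in these experiments can be illustrated with a small sketch. The data layout and the function name are hypothetical, and queries are assumed to be conjunctions of terms: recall is the share of matching publications that come from publishers the subscriber actually monitors.

```python
def recall(publications, monitored, query_terms):
    """Fraction of matching publications that reach the subscriber.

    publications: list of (publisher, set of terms) pairs.
    monitored: set of publishers holding the subscriber's query.
    """
    query = set(query_terms)
    matching = [pub for pub, terms in publications if query <= set(terms)]
    if not matching:
        return 1.0  # nothing was published that could have been missed
    delivered = sum(1 for pub in matching if pub in monitored)
    return delivered / len(matching)


publications = [
    ("P1", {"opera", "vienna"}),
    ("P2", {"opera"}),
    ("P3", {"opera", "vienna"}),
    ("P1", {"golf"}),
]
partial = recall(publications, {"P1"}, ["opera", "vienna"])
full = recall(publications, {"P1", "P3"}, ["opera", "vienna"])
```

Monitoring only P1 misses the matching document from P3, so recall drops below 1. Adding P3 to the monitored set restores it.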
11
Experimental Evaluation of MAPS
  • Message traffic in MAPS is insensitive to
    publication rate (contrary to DHTrie).
  • Recall can be further improved by fine-tuning
    prediction parameters per peer and per query.
  • For more details see
  • Deliverable D3l.3
  • our LSDS-IR paper
  • our unpublished manuscript on MAPS protocols

Message traffic under different publication rates
12
Belief Propagation
  • Bayesian network on P-Grid
  • Spring relaxation
  • physics-inspired approach for CS algorithms.
  • Correlated data are placed close for efficiency
  • Minimum energy configuration
  • Variable clustering
  • Reduced communication cost
  • Trade-off with load-balance
  • Investigated networks
  • Trees, scale-free, random
  • CoopIS: Efficient Peer-to-Peer Belief
    Propagation

13
Implementation over P-Grid
  • Approximation
  • we consider only the variables that are located
    at the source and at the destination.
  • Push strategy
  • hosts select among all possible actions (sending
    a variable v to neighbor n) the one that gives
    the highest reduction of tension
  • Load balance mechanism
  • load of every host in the interval [l-, l+]
  • hosts send variables only if their load is
    greater than l-
  • Variables are sent only to neighbors that have a
    load smaller than l+
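
The push strategy and the load bounds can be sketched as below. This is an illustration under stated assumptions, not the P-Grid implementation: tension is modelled as the number of edges whose endpoints sit on different hosts, load as the number of variables per host, and the names `tension`, `best_push`, `l_minus` and `l_plus` are hypothetical. As in the approximation above, the gain of a move is computed only from variables at the source and at the destination.

```python
from collections import defaultdict


def tension(placement, edges):
    """Assumed energy: number of edges stretched across two hosts."""
    return sum(1 for u, v in edges if placement[u] != placement[v])


def load(placement, host):
    """Assumed load measure: number of variables stored on the host."""
    return sum(1 for h in placement.values() if h == host)


def best_push(host, placement, edges, overlay, l_minus, l_plus):
    """Pick the (variable, neighbor, reduction) with the highest tension reduction.

    Returns None if the host may not send or no move reduces tension.
    """
    if load(placement, host) <= l_minus:
        return None  # hosts send only if their load is greater than l-
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    best = None
    for var in sorted(v for v, h in placement.items() if h == host):
        for neighbor in sorted(overlay[host]):
            if load(placement, neighbor) >= l_plus:
                continue  # receivers must have a load smaller than l+
            pulled = sum(1 for w in adjacency[var] if placement[w] == neighbor)
            held = sum(1 for w in adjacency[var] if placement[w] == host)
            reduction = pulled - held  # uses source and destination only
            if reduction > 0 and (best is None or reduction > best[2]):
                best = (var, neighbor, reduction)
    return best


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
placement = {"a": "H1", "b": "H2", "c": "H2", "d": "H1"}
overlay = {"H1": {"H2"}, "H2": {"H1"}}
move = best_push("H1", placement, edges, overlay, l_minus=1, l_plus=4)
if move is not None:
    variable, neighbor, _ = move
    placement[variable] = neighbor  # apply the push
```

With the 0/1 distance used here, looking only at the source and the destination gives the exact change in tension. If distance depends on overlay hops, the same rule becomes an approximation, which is one possible source of the approximation errors mentioned in the results.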

14
Results
  • Two distinct phases in the evolution
  • dependent on host popularity
  • Reduction of the distant edges is not monotonic
  • partial knowledge of the distribution of
    variables
  • Approximation errors

15
Conclusions
  • The research done in WP3l, which was completed
    in the final year of Evergrow, produced many
    interesting results, including:
  • Three state-of-the-art information retrieval and
    filtering systems built on top of DHTs (DHTrie,
    LibraRing, MAPS)
  • Papers on these proposals appeared in top
    conferences (SIGIR 2005, ECDL 2005 and 2007) and
    journals (IEEE Internet Computing 2007, ACM TOIS
    - under revision). This work is already highly
    appreciated in the literature and cited very
    often by other researchers.
  • Exchange of results and cross-fertilisation with
    other EU projects (DELIS, DELOS NoE, SelfMan).
  • Implementation and evaluation of a Belief
    Propagation algorithm as a middleware service

16
Future Work
  • Improve the implementation of DHTrie and MAPS
    using large-scale WAN deployment
  • Filtering out duplicate published information
  • Extending prediction of publisher behavior to
    other domains