Title: WP3l: Services and Overlay Networks
1WP3l Services and Overlay Networks
- TUC (lead), EPFL, UCL
- March 3, 2008
- Danny Bickson
2Presentation Outline
- WP3l focus in the final year of Evergrow
- Highlights of important results
- Conclusions
3WP3 Focus
- The design, prototype implementation and
evaluation of the following services on top of DHTs:
- Large-scale, distributed information retrieval
and filtering
- Large-scale, distributed inference by belief
propagation
- Network monitoring and visualization (not in this
presentation; see Deliverable D3l.3)
- This work is reported in Deliverable D3l.3.
4Large-scale, distributed, information retrieval
and filtering
- Previous years: Development of the information
retrieval and filtering systems DHTrie and
LibraRing (TUC); initial proposal of MAPS (TUC
and MPII Saarbrücken, DELIS).
- This year: Emphasis on information filtering.
Further development and detailed experimental
evaluation of MAPS (TUC and MPII Saarbrücken,
DELIS).
5DHTrie and LibraRing (Exact Information Filtering)
- DHTrie
- An exact information filtering system built on
top of a DHT.
- Basic idea: Index and store subscriptions in the
DHT. Make sure publications meet subscriptions.
- The DHT is used as an indexing engine
- nodes index the queries
- publications are sent to the appropriate nodes
(distributed query execution)
- Filtering effectiveness (aka recall) of a
centralised system.
- LibraRing: (Digital) Library (Chord) Ring
- Extension to the DHTrie protocols
- to support both information retrieval and
filtering
- tailored for digital library applications
6DHTrie and LibraRing (Exact Information Filtering)
- Deterministic query placement
- depends on query components
- basically a distributed index for continuous
queries
- Nodes are responsible for
- indexing queries
- disseminating documents to nodes that may index
matching queries
- Document-granularity dissemination
- message overhead
- disseminates a node's content to other nodes
[Figure: peers P1 to P6 in the DHT; the terms golf, opera, Vienna and concert are shown at P6, P4, P2 and P3 respectively]
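The deterministic query placement described on this slide can be illustrated with a minimal sketch. It is not the DHTrie protocol itself: it assumes a plain hash modulo a fixed node count as a stand-in for a DHT lookup, conjunctive keyword queries, and a single indexing term per query. The names `NUM_NODES`, `responsible_node` and `FilteringNetwork` are hypothetical and introduced only for illustration.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 6  # hypothetical network size, chosen only for this example


def responsible_node(term: str) -> int:
    """Map a term to a node id with a stable hash (stand-in for a DHT lookup)."""
    digest = hashlib.sha1(term.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES


class FilteringNetwork:
    def __init__(self) -> None:
        # node id -> term -> list of (subscriber, full set of query terms)
        self.index = defaultdict(lambda: defaultdict(list))

    def subscribe(self, subscriber: str, query_terms: set) -> int:
        """Index the query at the node responsible for one of its terms."""
        key_term = min(query_terms)  # deterministic choice of the indexing term
        node = responsible_node(key_term)
        self.index[node][key_term].append((subscriber, frozenset(query_terms)))
        return node

    def publish(self, doc_terms: set) -> set:
        """Send the document to every node that may index a matching query."""
        notified = set()
        for term in doc_terms:
            node = responsible_node(term)
            for subscriber, query in self.index[node].get(term, []):
                if query <= doc_terms:  # conjunctive match: all query terms present
                    notified.add(subscriber)
        return notified


net = FilteringNetwork()
net.subscribe("alice", {"vienna", "opera"})
net.subscribe("bob", {"golf"})
print(net.publish({"vienna", "opera", "concert"}))
```

Because a publication visits the node of each of its terms, every indexed query that could match is checked, which is why this style of filtering can reach the recall of a centralised system at the cost of per-document message traffic.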
7MAPS (Approximate Information Filtering)
- MAPS: Minerva Approximate Publish/Subscribe
- Built on the Minerva P2P search engine developed
by MPII in the context of project DELIS.
- Basic idea: Relax the assumption of delivering
notifications for all matching publications
- subscribers monitor only selected publishers
likely to publish relevant documents in the
future
- rank publishers using novel ranking techniques
- standard resource selection techniques from IR
(e.g., CORI) are not suitable for publisher
ranking (they refer to the past behaviour of a
publisher)
- we need publishing prediction techniques tailored
to the IF case
- new techniques are based on time series analysis
of IR statistics
- trade recall for scalability and efficiency
- First proposal in the literature for approximate
information filtering
8MAPS protocols at a glance
- Directory Service
- A distributed directory layered on top of a DHT.
- DHT partitions the term space.
- Peers distribute per-term summaries to the
directory.
- The directory manages aggregated statistical
information for terms.
- Subscription Service
- Use the Directory Service to retrieve peer
statistics
- Rank peers
- compute a score to predict how likely a
publisher is to produce matching documents in the
future
- use time-series analysis to predict publishing
behaviour
- Forward the query to the top selected publishers
- Publication Service
- Only publishers indexing a query produce a
notification!
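The peer-ranking step of the Subscription Service can be sketched as follows. This is an illustration under stated assumptions, not the MAPS scoring function: it assumes the score is a linear mix of a resource selection score and a predicted publishing value, weighted by a parameter `alpha`, and it uses simple exponential smoothing as a stand-in for the time-series analysis. The function names, the `smoothing` parameter and the input layout are all hypothetical, and both inputs are assumed to be on comparable scales.

```python
def predict_next(series, smoothing=0.5):
    """Forecast the next value of a series with simple exponential smoothing."""
    if not series:
        return 0.0
    level = float(series[0])
    for value in series[1:]:
        level = smoothing * value + (1.0 - smoothing) * level
    return level


def rank_publishers(stats, alpha=0.5, top_k=2):
    """Return the top_k publishers for a query term.

    stats maps publisher -> (resource_selection_score, history), where history
    is a list of past per-period publishing statistics for the term.
    """
    scored = []
    for publisher, (selection_score, history) in stats.items():
        # alpha near 1 favours past content, alpha near 0 favours the forecast
        score = alpha * selection_score + (1.0 - alpha) * predict_next(history)
        scored.append((score, publisher))
    # highest score first; ties broken by publisher name for determinism
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [publisher for _, publisher in scored[:top_k]]


stats = {
    "p1": (0.9, [0, 0, 0]),  # strong past collection, no recent activity
    "p2": (0.1, [1, 1, 1]),  # weak past collection, steady recent activity
    "p3": (0.5, [0, 0, 1]),
}
print(rank_publishers(stats, alpha=1.0))
print(rank_publishers(stats, alpha=0.0))
```

The query would then be forwarded only to the returned publishers, which is where the approach trades recall for lower message traffic.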
9DHTrie vs. MAPS
- Exact Information Filtering
- Pros and cons
- Retrieval effectiveness
- Message traffic
- Publication rate dependence
- DHTrie Architecture
- Deterministic query placement
- Implicitly collected statistics (needed for
matching)
- Explicit load balancing (appropriate algorithm)
- Approximate Information Filtering
- Pros and cons
- Scalability, data model independence
- Publication rate independence
- Lower recall
- MAPS Architecture
- Statistical query placement
- Explicitly collected statistics (needed for peer
ranking and matching)
- Implicit load balancing (by query placement)
For a detailed comparison see also our IEEE
Internet Computing article
10Experimental Evaluation of MAPS
- Web crawls of >2M Web pages with timestamp data,
1000 peers
- Measure recall
- under various publishing scenarios: consistent
publishing, publishing breaks, topic changing,
intervals of topic changes, ...
- when emphasizing resource selection (α closer to
1) or behaviour prediction (α closer to 0)
[Figure: Recall while monitoring a fraction of the
publisher population]
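The recall measure used in this evaluation can be made concrete with a small sketch. It assumes recall is the share of matching publications that come from monitored publishers; the function name and the input layout are hypothetical, and the data below is invented purely to show the calculation.

```python
def recall(publications, monitored):
    """Fraction of matching publications issued by monitored publishers.

    publications: list of (publisher, matches_query) pairs.
    monitored: set of publishers the subscriber is following.
    """
    matching = [publisher for publisher, matches in publications if matches]
    if not matching:
        # recall is undefined without matching publications; 0.0 by convention here
        return 0.0
    delivered = sum(1 for publisher in matching if publisher in monitored)
    return delivered / len(matching)


publications = [("p1", True), ("p1", True), ("p2", True), ("p3", False)]
print(recall(publications, {"p1"}))        # monitoring one of three publishers
print(recall(publications, {"p1", "p2"}))  # monitoring two of three publishers
```

Monitoring a larger fraction of publishers can only add delivered notifications, so recall is non-decreasing in the size of the monitored set.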
11Experimental Evaluation of MAPS
- Message traffic in MAPS is insensitive to
publication rate (contrary to DHTrie).
- Recall can be further improved by fine-tuning
prediction parameters per peer and per query.
- For more details see
- Deliverable D3l.3
- our LSDS-IR paper
- our unpublished manuscript
on MAPS protocols
[Figure: Message traffic under different publication rates]
12Belief Propagation
- Bayesian network on P-Grid
- Spring relaxation
- physics-inspired approach for CS algorithms.
- Correlated data are placed close for efficiency
- Minimum energy configuration
- Variable clustering
- Reduced communication cost
- Trade-off with load-balance
- Investigated networks
- Trees, scale-free, random
- CoopIS: Efficient Peer-to-Peer Belief
Propagation
13Implementation over P-Grid
- Approximation
- we consider only the variables that are located
at the source and at the destination.
- Push strategy
- hosts select among all possible actions (sending
a variable v to neighbor n) the one that gives
the highest reduction of tension
- Load balance mechanism
- load of every host in the interval [l-, l+]
- hosts send variables only if their load is
greater than l-
- Variables are sent only to neighbors that have a
load smaller than l+
14Results
- Two distinct phases in the evolution
- dependent on host popularity
- Reduction of the distant edges is not monotonic
- partial knowledge of the distribution of
variables
- Approximation errors
15Conclusions
- The research done in WP3l, which was
completed in the final year of Evergrow, produced
many interesting results, including:
- Three state-of-the-art information retrieval and
filtering systems built on top of DHTs (DHTrie,
LibraRing, MAPS)
- Papers on these proposals appeared in top
conferences (SIGIR 2005, ECDL 2005 and 2007) and
journals (IEEE Internet Computing 2007; ACM TOIS,
under revision). This work is already well
regarded in the literature and frequently cited
by other researchers.
- Exchange of results and cross-fertilisation with
other EU projects (DELIS, DELOS NoE, SelfMan).
- Implementation and evaluation of a Belief
Propagation algorithm as a middleware service
16Future Work
- Improve implementation of DHTrie and MAPS using
large scale WAN deployment
- Filtering out duplicate published information
- Extending prediction of publisher behavior to
other domains