Associative Peer to Peer Networks: Harnessing Latent Semantics

1
Associative Peer to Peer Networks: Harnessing
Latent Semantics
  • Edith Cohen
  • AT&T Labs-Research

Amos Fiat, Haim Kaplan (Tel-Aviv University)
2
Traditional Client-server Web
3
Peer-to-peer Networks
Distributed network for sharing content (music,
video, software, etc.), where each host acts as
both a server and a client
  • Harness vast resources
  • Scalability/Robustness to failures/shutdowns

4
P2P Search

Overall performance of a P2P network depends
heavily on the efficiency and versatility of
search.
What features are important?
  • Scope: ability to locate rare items
    "Find the 10th episode of Star Trek Voyager"
  • Partial-match/complex queries
    "Find an Indiana Jones movie"
    Or a misspelled "Indiana Joens" movie

5
(Search in) Basic P2P Architectures
Compared on: partial matches, scope
  • Centralized (Napster): central index service.
  • Decentralized: peers are connected by a low-degree
    overlay network.

6
Associative P2P networks
  • Retain Gnutella's desirable properties
  • Distributed overlay network
  • Peers store only what they need (common good on
    par with own welfare)
  • No tight control of topology/content
  • Support partial-match queries
  • AND
  • Have search scope (orders-of-magnitude
    improvement over Gnutella)
  • Make implicit use of latent semantics
  • Provably good on a reasonable model
  • Very good in simulations

7
P2P search framework
  • Search queries are propagated on the overlay
    (from a peer to a neighbor peer).
  • When a peer receives a query, it checks whether it
    can satisfy it, decrements the hop count, and
    forwards it to a subset of its neighbors.
  • Each search includes a query and a propagation
    rule, which determines which neighbors the
    search is propagated to.

DHTs: the propagation rule is a hash of the query
Gnutella: the propagation rule is independent of the query
Associative: propagation rules are predicates (guide rules)
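
The framework above can be sketched as a hop-limited search that takes the propagation rule as a parameter. This is only an illustration: the `Peer` class, `blind_rule`, `possession_rule`, and `search` are hypothetical names, and letting a rule inspect a neighbor's items directly stands in for whatever the overlay would know about its neighbors.

```python
from collections import deque
from typing import Callable, List, Optional, Set


class Peer:
    """Hypothetical peer: a name, the items it possesses, and overlay neighbors."""

    def __init__(self, name: str, items: Set[str]):
        self.name = name
        self.items = set(items)
        self.neighbors: List["Peer"] = []


# A propagation rule decides whether a given neighbor receives the query.
Rule = Callable[[Peer], bool]


def blind_rule(neighbor: Peer) -> bool:
    """Gnutella-style rule: independent of the query, every neighbor qualifies."""
    return True


def possession_rule(item: str) -> Rule:
    """Associative guide rule: forward only to neighbors that possess `item`."""
    return lambda neighbor: item in neighbor.items


def search(start: Peer, wanted: str, rule: Rule, hops: int) -> Optional[Peer]:
    """Propagate breadth-first; each forwarding step uses up one hop."""
    seen = {start.name}
    queue = deque([(start, hops)])
    while queue:
        peer, remaining = queue.popleft()
        # A peer that receives the query first checks whether it can satisfy it.
        if peer is not start and wanted in peer.items:
            return peer
        if remaining == 0:
            continue
        # Forward to the subset of neighbors selected by the propagation rule.
        for nb in peer.neighbors:
            if nb.name not in seen and rule(nb):
                seen.add(nb.name)
                queue.append((nb, remaining - 1))
    return None
```

Swapping `blind_rule` for `possession_rule("A")` changes only which neighbors are probed; the propagation loop itself stays the same.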
8
Overview
  • What do we mean by latent semantics?
  • Challenges in using latent semantics in a P2P
    setting
  • Our proposal: search propagation via possession
    rules
  • Possession-rule overlays
  • Search strategies
  • Possession-rule search strategies: RAPIER, GAS
  • Models for blind search strategies (Gnutella)
  • Analysis in the Itemsets model
  • Experimental evaluation
  • More on the GAS search strategy

9
View of P2P file sharing network
10
What is latent semantics?
  • Selections people make are dependent
  • If you buy baby formula, you are more likely to
    buy diapers.
  • If two people loved a show, they are more likely
    to agree on other shows.
  • The peer/item matrix is a market basket dataset,
    similar to buyers/items, documents/terms,
    Web pages/hyperlinks, movies/viewers.
  • Applications for extracting patterns from market
    basket data: Information Retrieval, Collaborative
    Filtering, Web search, Marketing, Recommendation
    Systems, ... (clustering, search, association
    rules)

=> P2P search: direct queries to peers with
interests that match yours
11
Challenges
  • Overlay topology (networking aspects) must be
    coupled with search strategy (Information
    Retrieval/Data-Mining)
  • Traditional IR and data-mining tools are not
    adapted to the highly distributed P2P setting.
  • Similarity metrics/clustering/ranking involve
    matrix operations on the market basket data:
    principal component analysis (LSI), eigenvalue
    computations, association rules

12
Possession Rules
  • Rule(O): do you possess item O?
  • A peer maintains a possession rule for each item in
    its index (a subset if the index is large)
  • Search strategy: a sequence of possession rules
    (with hop counts/search size limit)

Making this work
13
Possession-rules overlays
Peer 26
Index of P26 (rules/items): Rule(A), Rule(B), Rule(C), Rule(D)
14
Rules/items: Rule(A), Rule(B), Rule(C), Rule(D)
15
Possession-rule overlay
The network is Gnutella-like within each rule
  • Coverage: the induced overlay on the peers that
    satisfy each rule consists of large connected
    components.
  • Small degree: each peer participates in a limited
    number of rules (yet overall there is a large
    number of rules). For each rule it participates
    in, the peer maintains several participating
    neighbors.
  • Overlay and search boost each other (easy to find
    appropriate neighbors for each rule)
  • When you find O, you often discover multiple
    peers that have O; when you give O, the searcher
    informs you of other peers with O.
  • Peers that have O can find other peers that have O

(A super-peer overlay can be used within each rule!)
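
A rough sketch of such an overlay, assuming a global view of who possesses what (which real peers would not have). The construction below, where each peer picks up to `k` random co-possessors per rule, is a hypothetical stand-in for neighbor discovery through search; `build_rule_overlay` and `largest_component` are illustrative names, not from the slides.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set


def build_rule_overlay(index: Dict[str, Set[str]], k: int = 2,
                       seed: int = 0) -> Dict[str, Dict[str, List[str]]]:
    """Return {item: {peer: neighbors within rule(item)}}.

    `index` maps each peer to the items it possesses. Links are symmetric,
    so a peer may end up with more than k neighbors in a rule.
    """
    rng = random.Random(seed)
    members = defaultdict(list)
    for peer in sorted(index):
        for item in sorted(index[peer]):
            members[item].append(peer)

    overlay = {}
    for item, peers in members.items():
        adj = {p: set() for p in peers}
        for p in peers:
            others = [q for q in peers if q != p]
            for q in rng.sample(others, min(k, len(others))):
                adj[p].add(q)
                adj[q].add(p)
        overlay[item] = {p: sorted(ns) for p, ns in adj.items()}
    return overlay


def largest_component(adj: Dict[str, List[str]]) -> int:
    """Coverage check: size of the largest connected component in one rule."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best
```

Comparing `largest_component(overlay[item])` with the number of peers in the rule gives a quick view of the coverage property for that rule.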
16
Search strategies
  • To beat blind search, associative search should
    probe peers that are more likely to answer than
    random peers
  • Associative search
  • RAPIER (Random Possession Rule): crudest
    strategy
  • GAS: greedy selection; refined strategy
  • Blind search
  • Urand (Gnutella): all peers have the same
    likelihood of being probed in each query
  • Prand (Gnutella modified): peers are probed
    proportionally to their index size (RAPIER has
    the same bias)

17
RAPIER (Random Possession Rule): simplest
possession-rule-based strategy
  • RAPIER search strategy
  • Repeat until found:
  • Pick a random item O from your index
  • Search peers that have this item (using rule(O))

Straightforward to implement on top of a
possession-rule overlay network
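
A minimal simulation of the RAPIER loop, under simplifying assumptions that are not from the slides: a global `index` (peer to item set) replaces the overlay, each iteration probes a single random peer in the chosen rule, and probes are drawn with replacement. `rapier_search` and its parameters are hypothetical names.

```python
import random
from typing import Dict, Optional, Set, Tuple


def rapier_search(index: Dict[str, Set[str]], searcher: str, wanted: str,
                  max_probes: int = 1000,
                  seed: int = 0) -> Tuple[Optional[str], int]:
    """Return (peer that has `wanted` or None, number of probes used)."""
    rng = random.Random(seed)

    # rule(O) for each item O in the searcher's index: the other peers with O.
    rules = {}
    for item in sorted(index[searcher]):
        members = sorted(p for p in index
                         if p != searcher and item in index[p])
        if members:
            rules[item] = members
    if not rules:
        return None, 0  # the searcher shares no item with anyone

    rule_items = sorted(rules)
    for probes in range(1, max_probes + 1):
        rule_item = rng.choice(rule_items)       # pick a random item O
        peer = rng.choice(rules[rule_item])      # probe a peer in rule(O)
        if wanted in index[peer]:
            return peer, probes
    return None, max_probes
```

The probe limit is there only so the simulation terminates when no peer reachable through the searcher's rules has the item.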
18
Analysis Itemsets Model
  • Items belong to topics. There are very many
    topics, but each peer selects items only from
    a fixed set of topics. Topic popularities can
    vary widely, but each peer has equal interest in
    each of its topics.
  • We show that
  • RAPIER is at least as good as Prand
  • RAPIER is better than Prand when peers have fewer
    topics
  • A simple model that hints at what is going on

19
Experiments
  • Data used: client/hostname matrix from proxy
    logs as the peer/item matrix. Each entry, in turn,
    is treated as a search item.
  • Similarly structured market basket data
  • Has rare items (which current P2P networks don't
    support)
  • No universal model for market basket data
  • Can't get a full index for many peers from
    current P2P networks, and these networks don't
    reflect well on rare items.
  • Metric: ESS (Expected Search Size: number of
    peers probed until the search is resolved). CDF of
    the fraction of searches that have ESS below x.

20
ESS: Expected Search Size
  • ESS = 1/(success probability in each probe) (when
    probes are independent; not true for GAS)
  • Probe success probability
  • Urand: fraction of peers that have the item in
    their index
  • Prand: the weight of each peer is its index size
    divided by the sum of index sizes of all peers.
  • Success prob = (weight of peers with item) /
    (total weight of all peers)
  • RAPIER: the average, over the possession rules the
    peer participates in, of the fraction of peers in
    the rule that have the item.
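
The three success probabilities can be computed on a toy peer/item table as follows. This is a sketch with assumed names: `peers` holds the candidate peers (the searcher excluded) and `my_items` is the searcher's index. For Prand, success is taken as the index-size weight of the peers holding the item over the total weight, which is how proportional probing works out.

```python
from typing import Dict, Set


def urand_success(peers: Dict[str, Set[str]], item: str) -> float:
    """Fraction of peers that have the item."""
    return sum(item in items for items in peers.values()) / len(peers)


def prand_success(peers: Dict[str, Set[str]], item: str) -> float:
    """Index-size weight of peers with the item over the total weight."""
    total = sum(len(items) for items in peers.values())
    with_item = sum(len(items) for items in peers.values() if item in items)
    return with_item / total


def rapier_success(peers: Dict[str, Set[str]], my_items: Set[str],
                   item: str) -> float:
    """Average, over the searcher's non-empty rules, of the fraction of
    peers in each rule that have the item."""
    fractions = []
    for rule_item in sorted(my_items):
        members = [p for p, items in peers.items() if rule_item in items]
        if members:
            hits = sum(item in peers[p] for p in members)
            fractions.append(hits / len(members))
    return sum(fractions) / len(fractions) if fractions else 0.0


def ess(success_probability: float) -> float:
    """Expected search size for independent probes."""
    if success_probability == 0:
        return float("inf")
    return 1.0 / success_probability


# Toy data: the searcher holds A and is looking for X.
peers = {"p1": {"A", "X"}, "p2": {"A"}, "p3": {"B", "C", "D"}, "p4": {"B"}}
print(ess(urand_success(peers, "X")))
print(ess(prand_success(peers, "X")))
print(ess(rapier_success(peers, {"A"}, "X")))
```

On this toy table the success probabilities are 1/4, 2/7, and 1/2, so the expected search sizes are about 4, 3.5, and 2 probes respectively.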

21
Peer-Item Matrix - Experiment
[Figure: peer-item matrix; axes: Items, Peers]
22
Urand and Prand
[Figure: peer-item matrix; axes: Items, Peers]
23
RAPIER (Random Possession Rule)
[Figure: peer-item matrix; axes: Items, Peers]
24
Caveat: comparing apples and oranges
  • When searching by possession rules we have a bias
    towards peers that participate in more rules/
    have more items.
  • But with this bias, a strategy has a better chance
    of finding what it is looking for! So:
  • We show that the likelihood of being probed is
    proportional to the number of rules you participate
    in.
  • The Prand blind search strategy has the same bias.
  • Thus, it is fair to compare Prand search with
    possession-rule-based RAPIER

25
GAS: Refining RAPIER
  • Ideas
  • Some rules are better than others (e.g.,
    possession of a very popular item carries weaker
    information)
  • An unsuccessful search carries information. Suppose
    you lost something and think you lost it at
    home. You search home, going through various
    closets and drawers, and don't find it. You
    may then decide to search the office, even if you
    have not completed an exhaustive search at home.
    What happened? The posterior distribution on the
    item's location has changed as a result of the
    search.

26
All Items
  • Urand: blind search (Gnutella)
  • Prand: Gnutella modified
  • RAPIER, GAS: our algorithms

27
Rare items: present in 1% of peers
28
Rarer items: 0.1% of peers
29
Even rarer items: 0.01% of peers
30
GAS: Greedy Strategy
  • Idea: use the search strategy that would have
    optimized your search on previous queries.
  • Caveat: this is NP-complete
  • Can do a greedy approximation strategy: GAS
  • GAS
  • Initialize the query vector to a uniform
    distribution on previous selections.
  • Iterate the following:
  • Apply the possession rule that maximizes success
    probability with respect to the query posterior
  • Update the query posterior.

Theorem: GAS is a constant-factor approximation
of the optimal strategy
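
One way to read the GAS loop in code, offered as a sketch only. The slides do not spell out the posterior update; the version below assumes a Bayes-style reweighting after a failed probe, where each sample item's weight is multiplied by the probability that the probe would have missed it. `gas_strategy` and the `frac` table (fraction of peers in rule(y) that have sample item x) are hypothetical names.

```python
from typing import Dict, List


def gas_strategy(frac: Dict[str, Dict[str, float]], steps: int) -> List[str]:
    """Greedy sequence of rules to probe; frac[x][y] is the fraction of
    peers in rule(y) that have sample item x."""
    items = sorted(frac)
    rules = sorted(frac[items[0]])

    # Start from a uniform distribution over previous selections.
    posterior = {x: 1.0 / len(items) for x in items}
    sequence = []
    for _ in range(steps):
        # Pick the rule with the highest success probability under the posterior.
        best = max(rules,
                   key=lambda y: sum(posterior[x] * frac[x][y] for x in items))
        sequence.append(best)

        # Assumed update: condition on the probe in `best` having failed.
        unnormalized = {x: posterior[x] * (1.0 - frac[x][best]) for x in items}
        total = sum(unnormalized.values())
        if total == 0:
            break  # every sample item would certainly have been found
        posterior = {x: w / total for x, w in unnormalized.items()}
    return sequence
```

With two sample items that live in different rules, the sequence alternates between the rules instead of exhausting one first, which matches the "search the office before finishing the house" intuition.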
31
Building GAS strategies
  • GAS
  • Take a sample of items currently in your index:
    D, E, F, G.
  • Search for these items in each possession rule
    you participate in: A, B, C
  • Obtain a matrix: fraction of peers with item x in
    rule(y)
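
The matrix could be tabulated as below. In a live network the fractions would come from probing each rule; this sketch instead assumes a known `peers` table (peer to item set), and `build_gas_matrix` is an illustrative name.

```python
from typing import Dict, List, Set


def build_gas_matrix(peers: Dict[str, Set[str]], rules: List[str],
                     sample: List[str]) -> Dict[str, Dict[str, float]]:
    """matrix[x][y] = fraction of peers in rule(y) that have sample item x."""
    matrix = {}
    for x in sample:
        row = {}
        for y in rules:
            # Peers in rule(y) are those that possess item y.
            members = [p for p, items in peers.items() if y in items]
            hits = sum(x in peers[p] for p in members)
            row[y] = hits / len(members) if members else 0.0
        matrix[x] = row
    return matrix


# Toy data: rules A and B, sample items D and E.
peers = {"p1": {"A", "D"}, "p2": {"A"}, "p3": {"B", "D", "E"}}
matrix = build_gas_matrix(peers, ["A", "B"], ["D", "E"])
```

Here `matrix["D"]["A"]` is 0.5 because one of the two peers in rule(A) also has D.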

32
GAS strategy (example)
C,C,C,A,C,C,A,C,A,C,B,B,A,C,B,B,C,A,B,B,C
GAS search of size 21: 10 probes in rule(C),
6 probes in rule(B), 5 probes in rule(A)

RAPIER search of size 21: 7 probes in
rule(C), 7 probes in rule(B), 7 probes in
rule(A)
33
Summary
  • We proposed a general framework for associative
    P2P search: exploit patterns inherent in human
    selections to boost search, adapted to the P2P
    setting.
  • Search strategies and the overlay structure are
    symbiotic and guided/boosted by previous
    selections/queries.
  • Common good on par with own welfare: all data
    maintained by each peer has direct personal
    benefit (like Gnutella). Helping others helps
    you.
  • Possession rules
  • Strategies are approximations to standard
    similarity metrics that work!
  • Easy to find other sources of a desired item (for
    alternative/parallel downloads)

34
Related work
  • IR/DM: association rules/collaborative
    filtering/Web search
  • P2P networks: unstructured networks, DHTs
  • DHTs have symbiotic overlay/search strategy
  • Caching at peers (Freenet): adapt overlay
    according to search
  • Intersection
  • Crespo/Garcia-Molina 02: routing indexes
  • System isolates topics: maps queries/items to
    topics.
  • A peer knows a summary of what can be reached
    through it/each neighbor
  • Query keywords are used to select a neighbor who
    is a best match
  • Differences from our approach
  • No connection between search and overlay topology
  • Uses only text/keywords. We use co-location
    associations between items.
  • CG02: tradeoff between topic divergence (all
    nodes ending up with similar index summary) and
    restricted coverage (number of peers included in
    each peer summary)
  • neurogrid.net (Sam Joseph, U. Tokyo): agent
    text-based approach
  • Peers learn and remember content of other peers

35
Future
  • Integrate text matching (of query keywords) in
    search strategy (use rule(O) if query keywords
    match O's metadata)
  • Select which possession rules to participate in
    (e.g., using item popularity heuristic or
    GAS-like selection)
  • Search strategy gives more weight to more recent
    selections (which are more indicative of the next
    query)
  • Explore other types of propagation rules
  • P2P communities?
  • Integrate Recommendation Systems in P2P?
  • Implementation

36
Thank You!
37
Some Extra Comments
  • Issues with straightforward importing of IR
    techniques
  • Vector space approach
  • Similarity metrics
  • Why do we need to use several propagation rules
    in a search? (when searching according to examples
    in the index)

38
Straight IR vector-space approach
  • Peers are mapped to vectors according to their
    index content. Queries are mapped to vectors
    in the same space.
  • Overlay topology is correlated with distances in
    this vector space (bias towards closer peers)
  • Search propagation targets regions of the space
    that are closest to the query.
  • Number of neighbors is O(dimension); want a small
    dimension
  • Yet matrix operations, e.g., principal component
    analysis (LSI), are hard in our distributed
    setting
  • Yet each peer should be able to compute the
    mapping for its queries and/or index
  • Proximity metric alone is insufficient (need
    different propagation rules)

39
Why we need several propagation rules for the
same query: decision-tree-like search
  • A propagation rule approximates an interest area
  • Each peer covers several interest areas; peers
    have different sets of interest areas.
  • Peer query: 80% basketball, 20% polo
  • World index: 5% basketball, 0.1% polo
  • All basketball lovers would be close matches,
    but we need to direct the search to more polo lovers
  • Multi-rule search strategy: basketball 200
    peers, polo 200 peers