Associative Peer to Peer Networks: Harnessing Latent Semantics - PowerPoint PPT Presentation


About This Presentation

Title:

Associative Peer to Peer Networks: Harnessing Latent Semantics

Description:

Associative Peer to Peer Networks: Harnessing Latent Semantics Edith Cohen AT&T Labs-research Amos Fiat Haim Kaplan Tel-Aviv University Traditional Client ... – PowerPoint PPT presentation


Transcript and Presenter's Notes


1
Associative Peer to Peer Networks: Harnessing Latent Semantics

Edith Cohen
AT&T Labs-research

Amos Fiat, Haim Kaplan
Tel-Aviv University
2
Traditional Client-server Web
3
Peer-to-peer Networks
Distributed network for sharing content (music,
video, software, etc.), where each host acts as
both a server and a client

Harness vast resources
Scalability/Robustness to failures/shutdowns

4
P2P Search

Overall performance of a P2P network depends heavily
on the efficiency and versatility of its search
What features are important?

Scope: ability to locate rare items
Find the 10th episode of Star Trek Voyager
Partial-match/complex queries
Find an Indiana Jones movie
Or an Indiana Joens movie...

5
(search in) Basic P2P Architectures
Partial-Matches
Scope
Centralized (Napster): central index service.

Decentralized: peers are connected by a low-degree
overlay network.

6
Associative P2P networks

Retain Gnutella's desirable properties
Distributed overlay network
Peers store only what they need (common good on
par with own welfare)
No tight control of topology/content
Support partial-match queries
AND
Have search scope (orders-of-magnitude
improvement over Gnutella)

Make implicit use of latent semantics
Provably good on a reasonable model
Very good in simulations

7
P2P search framework

Search queries are propagated on the overlay
(from a peer to a neighbor peer).
When a peer receives a query, it checks whether it
can satisfy it, decrements the hop count, and
forwards the query to a subset of its neighbors.
Each search includes a query and a propagation
rule, which determines the neighbors to which the
search is propagated.

DHTs: the propagation rule is a hash of the query
Gnutella: the propagation rule is independent of the query
Associative: propagation rules are predicates (guide rules)
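The forwarding loop described above can be sketched as a hop-limited breadth-first propagation. This is a minimal illustration under stated assumptions, not the authors' implementation: the overlay and the per-peer indexes are plain dictionaries, duplicate deliveries are suppressed with a global visited set, and `rule` is a hypothetical predicate that the forwarding peer evaluates on each neighbor. A Gnutella-style rule accepts every neighbor; an associative rule such as "does this neighbor possess item A?" forwards only to matching neighbors.

```python
from collections import deque
from typing import Callable, Dict, Set, Tuple

Overlay = Dict[str, Set[str]]  # peer -> its overlay neighbors
Index = Dict[str, Set[str]]    # peer -> items the peer stores


def propagate(overlay: Overlay, index: Index, origin: str, item: str,
              hops: int, rule: Callable[[str], bool]) -> Tuple[Set[str], int]:
    """Forward a query for `item` from `origin`.

    Returns (peers that can satisfy the query, number of peers probed).
    """
    found: Set[str] = set()
    visited = {origin}                       # suppress duplicate deliveries
    frontier = deque([(origin, hops)])
    while frontier:
        peer, remaining = frontier.popleft()
        if peer != origin and item in index[peer]:
            found.add(peer)                  # this peer can satisfy the query
        if remaining == 0:
            continue                         # hop count exhausted: stop forwarding
        for nbr in overlay[peer]:
            if nbr not in visited and rule(nbr):  # propagation rule selects the subset
                visited.add(nbr)
                frontier.append((nbr, remaining - 1))
    return found, len(visited) - 1


# Toy overlay, invented for illustration.
overlay = {"p1": {"p2", "p3"}, "p2": {"p1", "p4"},
           "p3": {"p1", "p4"}, "p4": {"p2", "p3"}}
index = {"p1": {"A"}, "p2": {"B"}, "p3": {"A", "X"}, "p4": {"A", "X"}}

gnutella = propagate(overlay, index, "p1", "X", 2, lambda p: True)
associative = propagate(overlay, index, "p1", "X", 2, lambda p: "A" in index[p])
print(gnutella, associative)
```

In this toy run both rules locate the same two holders of X, but the associative rule (possession of A) probes two peers instead of three because it skips p2.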
8
Overview

What do we mean by latent semantics?
Challenges in using latent semantics in the P2P
setting
Our proposal: search propagation via possession
rules
Possession-rule overlays
Search strategies
Possession-rule search strategies: RAPIER, GAS
Models for blind search strategies (Gnutella)
Analysis in the Itemsets model
Experimental evaluation
More on the GAS search strategy

9
View of a P2P file-sharing network
10
What is latent semantics?

The selections people make are not independent
If you buy baby formula, you are more likely to
buy diapers.
If two people loved a show, they are more likely
to agree on other shows.

The peer/item matrix is a market-basket dataset,
similar to buyers/items, documents/terms,
Web pages/hyperlinks, movies/viewers.
Applications that extract patterns from market-basket
data: information retrieval, collaborative
filtering, Web search, marketing, recommendation
systems, ... (clustering, search, association
rules)

For P2P search: direct queries to peers whose
interests match yours
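The dependence between selections can be made concrete with the standard association-rule quantities support and confidence. The sketch below is illustrative only: the baskets are invented, and each basket stands for one row (one buyer, or one peer's index) of a market-basket matrix. Dependence shows up as confidence(formula ⇒ diapers) exceeding the plain support of diapers.

```python
from typing import List, Set


def support(baskets: List[Set[str]], item: str) -> float:
    """Fraction of baskets (rows of the market-basket matrix) that contain `item`."""
    return sum(item in b for b in baskets) / len(baskets)


def confidence(baskets: List[Set[str]], given: str, item: str) -> float:
    """Estimated P(item | given): among baskets with `given`, the fraction that also have `item`."""
    with_given = [b for b in baskets if given in b]
    if not with_given:
        raise ValueError(f"no basket contains {given!r}")
    return sum(item in b for b in with_given) / len(with_given)


# Toy baskets, invented for illustration.
baskets = [
    {"formula", "diapers"},
    {"formula", "diapers", "milk"},
    {"milk"},
    {"bread"},
    {"diapers", "bread"},
]
print(support(baskets, "diapers"))                # 0.6
print(confidence(baskets, "formula", "diapers"))  # 1.0
```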
11
Challenges

Overlay topology (networking aspects) must be
coupled with search strategy (Information
Retrieval/Data-Mining)
Traditional IR and data-mining tools are not
adapted to the highly distributed P2P setting.
Similarity metrics, clustering, and ranking involve
matrix operations on the market-basket data:
principal component analysis (LSI), eigenvalue
computations, association rules

12
Possession Rules

Rule(O): do you possess item O?
Each peer maintains a possession rule for each item
in its index (or for a subset, if the index is large)
Search strategy: a sequence of possession rules
(with hop counts/search-size limits)

Making this work
13
Possession-rules overlays
Peer26
Index of P26 (Rules/Items): Rule(A), Rule(B), Rule(C), Rule(D)

Item   Rule(item) neighbors
A      P11, P7, P3
B      P2, P6, P9
C      P13, P15, P1
D      P4, P5, P10
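The per-peer table above can be modeled as a small data structure: a peer's index plus, for each possession rule it participates in, the list of neighbors in that rule. This is a hypothetical sketch (the class `Peer` and its method names are illustrative, not from the talk), using the P26 table as data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Peer:
    name: str
    index: Set[str]                                    # items this peer stores
    rule_neighbors: Dict[str, List[str]] = field(default_factory=dict)

    def satisfies(self, item: str) -> bool:
        """Rule(item): 'do you possess item?'"""
        return item in self.index

    def forward_targets(self, rule_item: str) -> List[str]:
        """Neighbors to which a search under Rule(rule_item) is forwarded."""
        return self.rule_neighbors.get(rule_item, [])


p26 = Peer(
    name="P26",
    index={"A", "B", "C", "D"},
    rule_neighbors={
        "A": ["P11", "P7", "P3"],
        "B": ["P2", "P6", "P9"],
        "C": ["P13", "P15", "P1"],
        "D": ["P4", "P5", "P10"],
    },
)
print(p26.forward_targets("C"))  # ['P13', 'P15', 'P1']
```

A peer that does not participate in a rule simply has no neighbor list for it, so a search under that rule is not forwarded through it.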
14
Rules/Items: Rule(A), Rule(B), Rule(C), Rule(D)
15
Possession-rule overlay
Within each rule, the network is Gnutella-like

Coverage: the overlay induced on the peers that
satisfy each rule consists of large connected
components.
Small degree: each peer participates in a limited
number of rules (yet overall there is a large
number of rules); for each rule it participates
in, the peer maintains several neighbors that also
participate in that rule.
Overlay and search boost each other (it is easy to
find appropriate neighbors for each rule)

When you find O, you often discover multiple
peers that have O; when you give O, the searcher
informs you of other peers with O.
Peers that have O can find other peers that have O

(Can use a super-peer overlay within each rule!)
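The neighbor-discovery effect described above (a searcher that obtains O from you tells you about other holders of O) could be handled by a small update routine like the following. It is a sketch under assumptions: `learn_rule_neighbors` and the per-rule cap `max_per_rule` are hypothetical names, and capping the list is one simple way to keep the per-rule degree small.

```python
from typing import Dict, Iterable, List


def learn_rule_neighbors(table: Dict[str, List[str]], item: str,
                         discovered: Iterable[str], self_name: str,
                         max_per_rule: int = 3) -> List[str]:
    """Add newly reported holders of `item` to this peer's Rule(item) neighbor list."""
    neighbors = table.setdefault(item, [])
    for peer in discovered:
        if len(neighbors) >= max_per_rule:
            break                        # keep the per-rule degree small
        if peer != self_name and peer not in neighbors:
            neighbors.append(peer)
    return neighbors


# P26 already knows P11 in Rule(A); a searcher reports other holders of A.
table = {"A": ["P11"]}
print(learn_rule_neighbors(table, "A", ["P26", "P7", "P11", "P3", "P9"], "P26"))
# ['P11', 'P7', 'P3']
```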
16
Search strategies

To beat blind search, associative search should
probe peers that are more likely to answer than
random peers
Associative search
RAPIER (Random Possession Rule): the crudest
strategy
GAS (Greedy Selection): a refined strategy

Blind search
Urand (Gnutella): all peers have the same
likelihood of being probed in each query
Prand (modified Gnutella): peers are probed in
proportion to their index size (RAPIER has the
same bias)
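The two blind strategies differ only in the probability with which each peer is probed. A minimal sketch, assuming index sizes are known and probes are independent draws; the function names and the toy sizes are hypothetical.

```python
import random
from typing import Dict


def probe_distribution(index_sizes: Dict[str, int], strategy: str) -> Dict[str, float]:
    """Probability that each peer is chosen by a single blind probe."""
    if strategy == "urand":              # every peer equally likely
        return {p: 1 / len(index_sizes) for p in index_sizes}
    if strategy == "prand":              # proportional to index size
        total = sum(index_sizes.values())
        return {p: size / total for p, size in index_sizes.items()}
    raise ValueError(f"unknown strategy: {strategy}")


def pick_probe(index_sizes: Dict[str, int], strategy: str, rng: random.Random) -> str:
    """Draw one peer to probe according to the strategy's distribution."""
    dist = probe_distribution(index_sizes, strategy)
    peers = list(dist)
    return rng.choices(peers, weights=[dist[p] for p in peers], k=1)[0]


sizes = {"a": 1, "b": 3}
print(probe_distribution(sizes, "urand"))  # {'a': 0.5, 'b': 0.5}
print(probe_distribution(sizes, "prand"))  # {'a': 0.25, 'b': 0.75}
print(pick_probe(sizes, "prand", random.Random(0)))
```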

17
RAPIER (Random Possession Rule): the simplest
possession-rule-based strategy

RAPIER Search strategy
Repeat until found
Pick a random item O from your index
Search peers that have this item (using rule(O))

Straightforward to implement on top of a
possession-rule overlay network
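The RAPIER loop can be simulated centrally as follows. This is a hedged sketch, not the overlay implementation: the global `indexes` dictionary stands in for the possession-rule overlay, each probe picks a random usable rule and then a random peer satisfying it (independent draws, with replacement), and `max_probes` is a hypothetical search-size limit.

```python
import random
from typing import Dict, Optional, Set


def rapier_search(indexes: Dict[str, Set[str]], searcher: str, target: str,
                  rng: random.Random, max_probes: int = 1000) -> Optional[int]:
    """Return the number of probes until a holder of `target` is found, or None."""
    # Possession rules the searcher can use, each with the other peers satisfying it.
    rules: Dict[str, list] = {}
    for item in sorted(indexes[searcher] - {target}):
        members = [p for p in sorted(indexes) if p != searcher and item in indexes[p]]
        if members:
            rules[item] = members
    if not rules:
        return None
    rule_items = list(rules)
    for probe in range(1, max_probes + 1):
        item = rng.choice(rule_items)    # pick a random item O from your index
        peer = rng.choice(rules[item])   # probe a random peer satisfying Rule(O)
        if target in indexes[peer]:
            return probe
    return None


indexes = {"s": {"a", "b"}, "p1": {"a", "t"}, "p2": {"b", "t"}, "p3": {"c"}}
print(rapier_search(indexes, "s", "t", random.Random(1)))  # 1
```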
18
Analysis: Itemsets Model

Items belong to topics. There are very many
topics, but each peer can only select items from
a fixed set of topics. Topic popularities can
vary widely, but each peer has equal interest in
each of its topics.
We show that
RAPIER is at least as good as Prand
RAPIER is better than Prand when peers have fewer
topics
A simple model that hints at what is going on

19
Experiments

Data used: a client/hostname matrix from proxy
logs, used as the peer/item matrix. Each entry, in
turn, is treated as a search item.
Similarly structured market-basket data
Has rare items (which current P2P networks don't
support)
There is no universal model for market-basket data
We can't get a full index for many peers from
current P2P networks, and these networks don't
reflect rare items well.
Metric: ESS (Expected Search Size: the number of
peers probed until the search is resolved). Shown
as a CDF: the fraction of searches that have ESS
below x.
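The CDF metric is an empirical distribution over per-search ESS values. A minimal sketch; `ess_cdf` is a hypothetical name, the sample values are made up, and it counts "at most x" (the usual CDF convention) rather than strictly below x.

```python
from bisect import bisect_right
from typing import Sequence


def ess_cdf(ess_values: Sequence[float], x: float) -> float:
    """Fraction of searches whose ESS is at most x (empirical CDF)."""
    ordered = sorted(ess_values)
    return bisect_right(ordered, x) / len(ordered)


values = [2, 5, 8, 40]     # hypothetical per-search ESS values
print(ess_cdf(values, 8))  # 0.75
```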

20
ESS: Expected Search Size

ESS = 1/(success probability of each probe) (when
probes are independent; not true for GAS)
Probe success probability:
Urand: fraction of peers that have the item in
their index
Prand: the weight of each peer is its index size
divided by the sum of index sizes of all peers.
Success probability = (total weight of peers with
the item) / (total weight of all peers)
RAPIER: the average, over the possession rules the
peer participates in, of the fraction of peers in
the rule that have the item.
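The three probe-success formulas can be evaluated directly on the 10 × 10 peer/item matrix from the experiment slide (slide 21). This sketch makes assumptions the slides do not spell out: the searcher is excluded from the probed population, the target item is hidden from the searcher's own index, and exact fractions are used so results are easy to check.

```python
from fractions import Fraction
from typing import List, Set

# Peer/item matrix from the experiment slide: rows are peers, columns are items.
MATRIX: List[List[int]] = [
    [0, 0, 1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 1, 0],
    [1, 1, 0, 0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 1, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
]


def index_of(peer: int) -> Set[int]:
    return {j for j, bit in enumerate(MATRIX[peer]) if bit}


def others(searcher: int) -> List[int]:
    return [q for q in range(len(MATRIX)) if q != searcher]


def ess(p: Fraction) -> Fraction:
    """ESS = 1 / (per-probe success probability), assuming independent probes."""
    if p == 0:
        raise ValueError("item unreachable under this strategy")
    return 1 / p


def ess_urand(searcher: int, item: int) -> Fraction:
    peers = others(searcher)
    return ess(Fraction(sum(MATRIX[q][item] for q in peers), len(peers)))


def ess_prand(searcher: int, item: int) -> Fraction:
    peers = others(searcher)
    total = sum(len(index_of(q)) for q in peers)
    with_item = sum(len(index_of(q)) for q in peers if MATRIX[q][item])
    return ess(Fraction(with_item, total))


def ess_rapier(searcher: int, item: int) -> Fraction:
    per_rule = []
    for rule_item in sorted(index_of(searcher) - {item}):
        members = [q for q in others(searcher) if MATRIX[q][rule_item]]
        if members:
            per_rule.append(Fraction(sum(MATRIX[q][item] for q in members), len(members)))
    p = sum(per_rule, Fraction(0)) / len(per_rule) if per_rule else Fraction(0)
    return ess(p)


# Peer 0 searches for item 3 (hidden from its own index).
print(ess_urand(0, 3), ess_prand(0, 3), ess_rapier(0, 3))  # 9/2 29/8 12/5
```

On this one toy query the ordering matches the slides' claims (RAPIER needs fewer probes than Prand, which needs fewer than Urand), though a single matrix entry is of course no evidence either way.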

21
Peer-Item Matrix - Experiment
Rows: peers; columns: items
0 0 1 1 1 0 0 0 0 0
0 0 0 0 0 1 0 0 1 1
1 1 0 0 0 0 1 0 0 0
0 0 1 0 1 0 0 0 1 0
0 0 0 0 0 0 1 1 1 0
1 1 0 0 0 0 0 0 1 0
0 0 0 1 1 0 0 1 1 1
0 0 1 1 0 0 0 0 1 0
1 1 0 0 0 1 0 0 0 0
0 1 0 0 1 0 0 0 1 0
22
Urand and Prand
(Figure: the same peer/item matrix as on slide 21)
23
RAPIER (Random Possession Rule)
(Figure: the same peer/item matrix as on slide 21)
24
Caveat: comparing apples and oranges

When searching by possession rules, we are biased
towards peers that participate in more rules,
i.e., have more items.
But with this bias, a strategy has a better chance
of finding what it is looking for! So:
We show that the likelihood of being probed is
proportional to the number of rules you
participate in.
The Prand blind search strategy has the same bias.
Thus, it is fair to compare Prand search with
possession-rule-based RAPIER.

25
GAS: Refining RAPIER

Ideas
Some rules are better than others (e.g.,
possession of a very popular item carries weaker
information)
Unsuccessful search carries information: suppose
you lost something and think you lost it at
home. You search at home, going through various
closets and drawers, and don't find it. Then you
may decide to search the office, even if you
have not completed an exhaustive search at home.
What happened? The posterior distribution on the
item's location changed as a result of the
search.
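The lost-item analogy is a Bayesian update after an unsuccessful search. The numbers below are invented for illustration: a 70/30 prior between home and office, and an assumed 80% chance that the partial home search would have found the item had it been there. `update_after_miss` is a hypothetical helper.

```python
from fractions import Fraction
from typing import Dict


def update_after_miss(prior: Dict[str, Fraction], searched: str,
                      detect: Fraction) -> Dict[str, Fraction]:
    """Posterior over locations after searching `searched` without success.

    `detect` is the assumed probability that the (partial) search would have
    found the item if it really were at `searched`.
    """
    unnormalized = {
        loc: p * (1 - detect) if loc == searched else p
        for loc, p in prior.items()
    }
    total = sum(unnormalized.values())
    return {loc: w / total for loc, w in unnormalized.items()}


prior = {"home": Fraction(7, 10), "office": Fraction(3, 10)}
posterior = update_after_miss(prior, "home", Fraction(4, 5))
print({loc: float(p) for loc, p in posterior.items()})
```

After the miss, home drops to 7/22 (about 0.32) and the office rises to 15/22 (about 0.68), so searching the office next is the better move even though home was never searched exhaustively. GAS applies the same kind of reasoning to the query's distribution over items.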

26
All Items

Urand: blind search (Gnutella)
Prand: modified Gnutella
RAPIER, GAS: our algorithms

27
Rare items: present in 1% of peers
28
Rarer items: 0.1% of peers
29
Even rarer items: 0.01% of peers
30
GAS: Greedy Strategy

Idea: use the search strategy that would have been
optimal for your previous queries.
Caveat: finding this strategy is NP-complete.
Instead, use a greedy approximation strategy: GAS

GAS
Initialize the query vector to a uniform
distribution over previous selections.
Iterate the following:
Apply the possession rule that maximizes success
probability with respect to the query posterior.
Update the query posterior.

Theorem: GAS is a constant-factor approximation
of the optimal strategy
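The greedy loop can be sketched as below. The update rule is an assumption, since the slides only say "update the query posterior": a probe in rule r is modeled as hitting a random peer of r, which holds item x with probability M[r][x], so a failed probe multiplies q(x) by (1 - M[r][x]) before renormalizing. The function names and the two-rule matrix are hypothetical, not the numbers from the slides, and the sketch makes no claim to reproduce the approximation guarantee.

```python
from typing import Dict, List

Matrix = Dict[str, Dict[str, float]]  # rule -> {sample item -> fraction of rule members holding it}


def probe_success(matrix: Matrix, q: Dict[str, float], rule: str) -> float:
    """Probability that one probe in `rule` succeeds when the query is drawn from q."""
    return sum(p * matrix[rule].get(x, 0.0) for x, p in q.items())


def gas_sequence(matrix: Matrix, prior: Dict[str, float], length: int) -> List[str]:
    q = dict(prior)  # query posterior, initialized to the prior
    sequence: List[str] = []
    for _ in range(length):
        best = max(matrix, key=lambda r: probe_success(matrix, q, r))
        sequence.append(best)
        # Condition on this probe failing: q(x) is proportional to q(x) * (1 - M[best][x]).
        unnormalized = {x: p * (1 - matrix[best].get(x, 0.0)) for x, p in q.items()}
        total = sum(unnormalized.values())
        if total == 0:
            break  # under the model this probe cannot fail; nothing left to update
        q = {x: w / total for x, w in unnormalized.items()}
    return sequence


# Hypothetical two-rule, two-item example.
matrix = {"R1": {"x": 0.5}, "R2": {"y": 0.2}}
prior = {"x": 0.5, "y": 0.5}  # uniform over two previous selections
print(gas_sequence(matrix, prior, 4))  # ['R1', 'R1', 'R2', 'R2']
```

Rule R1 wins at first because it is the stronger rule, but each failed probe in R1 shifts posterior mass toward y until R2 becomes the better choice: the "unsuccessful search carries information" idea in action.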
31
Building GAS strategies

GAS
Take a sample of items currently in your index:
D, E, F, G.
Search for these items in each possession rule
you participate in: A, B, C.
Obtain a matrix: the fraction of peers with item x
in rule(y)

Rule \ Item   D   E   F   G
rule(A) 0.03 0.2
rule(B) 0.04 0.1
rule(C) 0.1 0.2 0.03
32
GAS strategy (example)
Rule \ Item   D   E   F   G
rule(A) 0.03 0.2
rule(B) 0.04 0.1
rule(C) 0.1 0.2 0.03
C,C,C,A,C,C,A,C,A,C,B,B,A,C,B,B,C,A,B,B,C
GAS search of size 21: 10 probes in rule(C),
6 probes in rule(B), 5 probes in rule(A)

RAPIER search of size 21: 7 probes in
rule(C), 7 probes in rule(B), 7 probes in
rule(A)
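The probe counts quoted for the example can be checked by tallying the GAS sequence. The RAPIER line is read here as the expected split when each of the three rules is chosen uniformly at random, which is presumably what the 7/7/7 figure represents.

```python
from collections import Counter

gas = "C,C,C,A,C,C,A,C,A,C,B,B,A,C,B,B,C,A,B,B,C".split(",")
counts = Counter(gas)
print(len(gas), counts["C"], counts["B"], counts["A"])  # 21 10 6 5

# RAPIER picks each of its 3 rules uniformly at random, so in expectation:
rapier_expected = {rule: len(gas) / 3 for rule in "ABC"}
print(rapier_expected)  # {'A': 7.0, 'B': 7.0, 'C': 7.0}
```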
33
Summary

We proposed a general framework for associative
P2P search: exploit patterns inherent in human
selections to boost search, adapted to the P2P
setting.
Search strategies and the overlay structure are
symbiotic and are guided/boosted by previous
selections/queries.
Common good on par with own welfare: all data
maintained by each peer has a direct personal
benefit (as in Gnutella). Helping others helps
you.
Possession rules:
Strategies are approximations to standard
similarity metrics that work!
Easy to find other sources of a desired item (for
alternative/parallel downloads)

34
Related work

IR/DM: association rules, collaborative
filtering, Web search
P2P networks: unstructured networks, DHTs
DHTs have a symbiotic overlay/search strategy
Caching at peers (Freenet): adapts the overlay
according to search
Intersection
Crespo/Garcia-Molina '02: routing indexes
The system isolates topics and maps queries/items
to topics.
Each peer knows a summary of what can be reached
through it and through each neighbor.
Query keywords are used to select the neighbor
that is the best match.
Differences from our approach:
No connection between search and overlay topology
Uses only text/keywords; we use co-location
associations between items.
CG02: tradeoff between topic divergence (all
nodes ending up with similar index summaries) and
restricted coverage (number of peers included in
each peer's summary)
neurogrid.net (Sam Joseph, U. Tokyo): agent,
text-based approach
Peers learn and remember the content of other peers

35
Future

Integrate text matching (of query keywords) into
the search strategy (use rule(O) if the query
keywords match O's metadata)
Select which possession rules to participate in
(e.g., using an item-popularity heuristic or
GAS-like selection)
Search strategy that gives more weight to more
recent selections (which are more indicative of
the next query)
Explore other types of propagation rules
P2P communities?
Integrate recommendation systems in P2P?
Implementation

36
Thank You!
37
Some Extra Comments

Issues with straightforwardly importing IR
techniques
Vector-space approach
Similarity metrics
Why do we need to use several propagation rules
in a search? (when searching according to
examples in the index)

38
Straightforward IR vector-space approach

Peers are mapped to vectors according to their
index content. Queries are mapped to vectors in
the same space.
Overlay topology is correlated with distances in
this vector space (bias towards closer peers)
Search propagation targets regions of the space
that are closest to the query.

Number of neighbors = O(dimension), so we want a
small dimension
Yet matrix operations, e.g., principal component
analysis (LSI), are hard in our distributed
setting
Yet each peer should be able to compute the
mapping for its queries and/or index
A proximity metric alone is insufficient (we need
different propagation rules)
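The straightforward vector-space approach can be sketched with binary index vectors and cosine similarity, ranking neighbors by closeness to the query. This is an illustration in the raw item space, without the dimension reduction (LSI) that the slide notes is hard to distribute; the function names and toy data are hypothetical.

```python
import math
from typing import Dict, List, Set


def cosine(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity of two binary index vectors, represented as item sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def rank_neighbors(query: Set[str], neighbor_indexes: Dict[str, Set[str]]) -> List[str]:
    """Neighbors ordered from closest to farthest from the query vector."""
    return sorted(neighbor_indexes,
                  key=lambda n: cosine(query, neighbor_indexes[n]),
                  reverse=True)


query = {"a", "b"}
nbrs = {"n1": {"a", "c"}, "n2": {"a", "b", "d"}, "n3": {"e"}}
print(rank_neighbors(query, nbrs))  # ['n2', 'n1', 'n3']
```

A single proximity ranking like this always favors the same region of the space, which is why the slide argues that proximity alone is insufficient and several propagation rules are needed.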

39
Why we need several propagation rules for the
same query: decision-tree-like search

A propagation rule approximates an interest area
Each peer covers several interest areas; peers
have different sets of interest areas.
Peer queries: 80% basketball, 20% polo
World index: 5% basketball, 0.1% polo
All basketball lovers would be close matches,
but we need to direct the search to more polo lovers
Multi-rule search strategy: basketball, 200
peers; polo, 200 peers
