Semantic Small World: An Overlay Network for Peer-to-Peer Search

1
Semantic Small World: An Overlay Network for
Peer-to-Peer Search
  • Mei Li, Wang-Chien Lee, Anand Sivasubramaniam
  • The Pennsylvania State University
  • Oct. 2004 @ ICNP'04

2
Roadmap
  • Introduction
  • Semantic Small World
  • Performance evaluation
  • Conclusion and future work

3
Peer-to-Peer (P2P) Systems
  • Different from the client-server model
  • Each node can act as a server as well as a client
  • A mandatory function
    • Search for resources available at peer nodes
  • Fundamental challenges
    • Large scale
    • Dynamic membership changes (peer
      join/leave/failure)

4
The State-of-the-art Techniques
  • Flooding / random walk
    • High traffic / long latency
  • Distributed hash tables (CAN/Chord)
    • High search efficiency
    • Maintenance becomes complicated
    • Only support key-based search

5
Semantic Based Search
  • Existing techniques address scalability in terms
    of network size to a certain extent
  • As with the WWW, voluminous data content is also an
    important issue in P2P!
  • Most existing search techniques are key-based
  • ⇒ Content/semantic-based search is much more
    favorable than key-based search

6
Related Work - pSearch
  • P2P search engine

Rolling index: the lower 6 dimensions are
partitioned into 3 groups, and each group is mapped
separately onto a 2-dimensional CAN.
  • High search index publishing cost
  • Not adaptive to membership changes
  • Regular search space partitioning: not adaptive
    to data distribution
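The rolling-index description can be made concrete with a short Python sketch. It assumes that the "lower 6 dimensions" are the first six coordinates of the SV. It also assumes that each group of two consecutive coordinates becomes one 2-D CAN key. The function name `rolling_index_keys` is hypothetical, and the CAN insertion itself is not shown.

```python
def rolling_index_keys(sv, num_groups=3, dims_per_group=2):
    """Split the lower num_groups * dims_per_group dimensions of a semantic
    vector into consecutive groups. Each group is used as a point in its own
    2-dimensional CAN, so one object yields num_groups index keys.
    (Consecutive grouping is an assumption, not taken from pSearch.)"""
    needed = num_groups * dims_per_group
    if len(sv) < needed:
        raise ValueError(f"need at least {needed} dimensions, got {len(sv)}")
    return [
        tuple(sv[g * dims_per_group:(g + 1) * dims_per_group])
        for g in range(num_groups)
    ]


# An 8-dimensional SV: only the lower 6 dimensions are used.
sv = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
print(rolling_index_keys(sv))  # [(0.1, 0.2), (0.3, 0.4), (0.5, 0.6)]
```

Under this reading, each object is published once per group, giving three index entries per object. That is one way to see the high publishing cost listed above.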

7
Our Goal
  • Build a semantic overlay from the ground up
  • Efficient semantic/content based search
  • Adaptive to dynamic network and content changes
  • Adaptive to data distribution

8
Assumptions
  • Data objects
    • Documents, multimedia, etc.
  • Semantics/features of data objects are represented
    by a k-element semantic vector (SV)
  • This representation is generic, not limited to
    specific data formats

9
Background: Small World Network
  • Small average path length
  • Large clustering coefficient
    • Two neighbors of a node are also very likely to
      be neighbors themselves
  • Each node knows a number of
    • short-range contacts, i.e., local neighbors
    • long-range contacts, chosen with probability
      proportional to 1/distance
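The long-range rule can be sketched in Python for the simplest case, a one-dimensional ring of n nodes. There, the probability of picking node v as a long-range contact of u is proportional to 1/d(u, v). The ring layout and the names `ring_distance` and `sample_long_range_contact` are illustrative assumptions, not part of the slides.

```python
import random


def ring_distance(a, b, n):
    """Distance between positions a and b on a ring of n positions."""
    d = abs(a - b)
    return min(d, n - d)


def sample_long_range_contact(node, n, rng=random):
    """Pick one long-range contact for `node` with probability proportional
    to 1/d, where d >= 1 is the ring distance to the candidate."""
    candidates = [v for v in range(n) if v != node]
    weights = [1.0 / ring_distance(node, v, n) for v in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


rng = random.Random(42)
samples = [sample_long_range_contact(0, 100, rng) for _ in range(10_000)]
near = sum(1 for v in samples if ring_distance(0, v, 100) == 1)
far = sum(1 for v in samples if ring_distance(0, v, 100) >= 40)
print(near, far)  # near is several times larger than far
```

Nearby nodes dominate, but distant nodes still get picked occasionally. Those occasional distant links give a small world its short average path length.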

10
Our Approach
  • Position peers and data objects in the semantic
    space
  • Peers with similar data objects form clusters
  • Peer clusters form a small-world overlay
  • ⇒ Semantic Small World

11
Roadmap
  • Introduction
  • Semantic Small World
  • Performance evaluation
  • Conclusion and future work

12
Design Issues
  • Peer placement
  • Cluster formation
  • Space partition
  • Overlay formation
  • Dimension reduction

13
Roles of a Peer Node
  • Responsible for managing a semantic subspace
  • Storage
    • Manage locally stored data objects
    • Data objects not semantically mapped here are
      published elsewhere as foreign indexes
  • Index
    • Provide physical locations of data objects
      semantically mapped to its subspace
  • Overlay network
    • Serve as part of the overlay network
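The three roles can be sketched as a Python data structure. The box-shaped `Subspace`, the field names, and the `store`/`accept_foreign_index` methods are hypothetical. The slides only state that a peer manages a subspace, stores local objects, indexes objects mapped to its subspace, and takes part in the overlay.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


@dataclass
class Subspace:
    """Hypothetical axis-aligned box [low, high) in the semantic space."""
    low: Vector
    high: Vector

    def contains(self, sv: Vector) -> bool:
        return all(lo <= x < hi for lo, x, hi in zip(self.low, sv, self.high))


@dataclass
class Peer:
    address: str
    subspace: Subspace  # semantic subspace this peer manages
    # Storage role: objects physically stored here (object id -> SV).
    local_objects: Dict[str, Vector] = field(default_factory=dict)
    # Index role: object id -> (SV, address of the peer storing the object),
    # for every object whose SV falls in this peer's subspace.
    index: Dict[str, Tuple[Vector, str]] = field(default_factory=dict)
    # Overlay role: addresses of short-range and long-range contacts.
    short_range: List[str] = field(default_factory=list)
    long_range: List[str] = field(default_factory=list)

    def store(self, obj_id: str, sv: Vector) -> bool:
        """Store an object locally. Returns True if its index entry stays on
        this peer, False if it must be published elsewhere as a foreign index."""
        self.local_objects[obj_id] = sv
        if self.subspace.contains(sv):
            self.index[obj_id] = (sv, self.address)
            return True
        return False

    def accept_foreign_index(self, obj_id: str, sv: Vector, owner: str) -> None:
        """Record an index entry for an object stored on another peer."""
        if not self.subspace.contains(sv):
            raise ValueError("SV does not map to this peer's subspace")
        self.index[obj_id] = (sv, owner)


p1 = Peer("p1", Subspace((0.0, 0.0), (0.5, 0.5)))
p2 = Peer("p2", Subspace((0.5, 0.0), (1.0, 0.5)))
print(p1.store("a", (0.1, 0.2)))  # True
if not p1.store("b", (0.9, 0.3)):
    p2.accept_foreign_index("b", (0.9, 0.3), p1.address)
print(p2.index)  # {'b': ((0.9, 0.3), 'p1')}
```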

14
Peer Positioning
  • Cluster local data objects
  • Choose the semantic centroid of the largest data
    cluster as the position of a peer in the semantic
    space
  • Advantages
    • Reduce index publishing cost when the data objects
      stored at a peer are homogeneous
    • Adaptive to data distribution
    • Take advantage of query locality
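The positioning rule can be illustrated with plain k-means in Python. The slides do not say which clustering algorithm is used, so k-means, k = 2, and the helper names are assumptions. Only the final step, taking the centroid of the largest cluster, follows the slide.

```python
import math
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the groups from the last assignment."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Recompute centers; keep the old center if a group is empty.
        centers = [centroid(g) if g else centers[j] for j, g in enumerate(groups)]
    return groups


def peer_position(objects, k=2):
    """Peer position = semantic centroid of its largest local data cluster."""
    groups = kmeans(objects, min(k, len(objects)))
    return centroid(max(groups, key=len))


objects = [(0.10, 0.10), (0.12, 0.10), (0.10, 0.12), (0.11, 0.11),
           (0.90, 0.90), (0.92, 0.88)]
print(peer_position(objects))  # approximately (0.1075, 0.1075)
```

The two outlying objects near (0.9, 0.9) do not pull the peer's position toward them. This is the behavior the slide relies on when it says publishing cost drops for peers with homogeneous data.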

15
Semantic Clustering
  • Peers with nearby centroids form clusters
  • Cluster size is predefined (M)
  • A cluster is partitioned (or merged) when necessary
  • Partition strategy
    • The two nodes with the farthest centroids are
      chosen as seeds for two subclusters
    • Other nodes are assigned to the subclusters based
      on shortest distance to the seeds
    • The subspace is partitioned at the middle point of
      the dimension that has the longest span between
      the two centroids of the subclusters
  • Advantages
    • Improve tolerance to membership changes and
      failures
    • Achieve good load balance
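The three-step partition strategy translates almost line by line into Python. One step needs an interpretation: "middle point" is read as the midpoint between the two subcluster centroids on the chosen dimension. That reading is an assumption, though split values such as 0.44 and 0.37 in the search example look data-dependent, which fits it. `split_cluster` is a hypothetical name.

```python
import itertools
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / len(points) for i in range(dim))


def split_cluster(peers):
    """Split a cluster into two subclusters.

    peers: dict peer_id -> centroid of that peer (its position in semantic space).
    Returns (subcluster_a, subcluster_b, split_dim, split_value).
    """
    if len(peers) < 2:
        raise ValueError("need at least two peers to split")
    # 1. The two peers with the farthest centroids become the seeds.
    seed_a, seed_b = max(
        itertools.combinations(peers, 2),
        key=lambda pair: math.dist(peers[pair[0]], peers[pair[1]]),
    )
    # 2. Every peer joins the subcluster of its nearer seed.
    sub_a, sub_b = [], []
    for pid, pos in peers.items():
        if math.dist(pos, peers[seed_a]) <= math.dist(pos, peers[seed_b]):
            sub_a.append(pid)
        else:
            sub_b.append(pid)
    if not sub_b:
        raise ValueError("all peers share the same position; cannot split")
    # 3. Split on the dimension where the subcluster centroids are farthest
    #    apart, at the midpoint between them (assumed reading of "middle point").
    ca = centroid([peers[p] for p in sub_a])
    cb = centroid([peers[p] for p in sub_b])
    split_dim = max(range(len(ca)), key=lambda i: abs(ca[i] - cb[i]))
    split_value = (ca[split_dim] + cb[split_dim]) / 2
    return sub_a, sub_b, split_dim, split_value


peers = {"p1": (0.1, 0.2), "p2": (0.2, 0.3), "p3": (0.8, 0.25), "p4": (0.9, 0.35)}
print(split_cluster(peers))  # (['p1', 'p2'], ['p3', 'p4'], 0, ~0.5)
```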

16
Overlay Formation
  • Clusters are formed into a small world network
  • Each peer maintains
    • Short-range contacts to its neighboring clusters
    • One or several long-range contacts to clusters at
      a certain distance
  • Long-range contacts obey the distance
    distribution function $p(d) \propto 1/d^{k}$
    • d: distance to the long-range contact
    • k: dimensionality
  • Advantages
    • Good search performance
    • Low maintenance cost
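Below, the distribution is written out with the normalizing constant used in Kleinberg-style small worlds, followed by a worked example for k = 1 on a ring of eight clusters. The slides give only the proportionality $p(d) \propto 1/d^{k}$; normalizing over all other clusters w is the standard construction and is assumed here.

```latex
\Pr[u \to v] = \frac{d(u,v)^{-k}}{\sum_{w \neq u} d(u,w)^{-k}}

% Worked example: k = 1, eight clusters on a ring, u = 0.
% Ring distances from u to the other seven clusters: 1, 1, 2, 2, 3, 3, 4
\sum_{w \neq u} d(u,w)^{-1} = 2 \cdot 1 + 2 \cdot \tfrac{1}{2} + 2 \cdot \tfrac{1}{3} + \tfrac{1}{4} = \tfrac{47}{12}

\Pr[\text{a given cluster at } d = 1] = \frac{1}{47/12} = \frac{12}{47} \approx 0.255

\Pr[\text{the cluster at } d = 4] = \frac{1/4}{47/12} = \frac{3}{47} \approx 0.064
```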

17
Examples
18
Challenge of High Dimensionality
  • How to incorporate high-dimensional semantics in
    the small world index structure?
  • Semantic vectors are normally high-dimensional
    • e.g., LSI uses 50-300 dimensions
  • Simply assigning short-range contacts along each
    dimension makes maintenance costly and
    complicated
  • Solution: dimension reduction
    • Adaptive space linearization (ASL)

19
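ASL
[Figure: the semantic space partitioned into clusters, each labeled with a binary ClusterID, e.g., 0000, 0101 (5), 1011 (11), 1111 (15); peer P4 is marked.]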
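One way to read the ASL figure is that every partition step adds one bit to a cluster's ID. A cluster's ClusterID would then be its path through the partition history, padded with trailing zeros to a fixed length. This matches labels such as 0000, 1000, 0100 and 1100 in the search example later in the deck. The padding rule, the 0/1 convention, and the `Split`/`resolve_cluster_id` names are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Split:
    """Hypothetical node of a partition-history tree.

    An internal node records one partition step (dimension index, split value);
    a node with dim=None is a leaf, i.e. a cluster.
    """
    dim: Optional[int] = None
    value: float = 0.0
    low: Optional["Split"] = None   # taken when sv[dim] <  value, emits bit 0
    high: Optional["Split"] = None  # taken when sv[dim] >= value, emits bit 1


def resolve_cluster_id(root: Split, sv: Sequence[float], id_bits: int) -> str:
    """Walk the partition history for sv, one bit per partition step, then pad
    with trailing zeros so every ClusterID has id_bits bits."""
    bits = []
    node = root
    while node.dim is not None:
        if sv[node.dim] < node.value:
            bits.append("0")
            node = node.low
        else:
            bits.append("1")
            node = node.high
    return "".join(bits).ljust(id_bits, "0")


# First split on d1 (index 0) at 0.44; the upper half is split again on d2 at 0.53.
history = Split(dim=0, value=0.44,
                low=Split(),
                high=Split(dim=1, value=0.53, low=Split(), high=Split()))
print(resolve_cluster_id(history, (0.9, 0.3), 4))  # 1000
print(resolve_cluster_id(history, (0.9, 0.7), 4))  # 1100
print(resolve_cluster_id(history, (0.2, 0.9), 4))  # 0000
```

Because each bit depends only on the splits above it, a query's ClusterID can be resolved one bit at a time as it learns more of the partition history. That is the incremental resolution the search slide describes.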
20
SSW-1D
21
Example: Join
22
Search
  • Navigation stage
    • Resolve the ClusterID of the search semantic vector
      (SV) incrementally according to the partition history
    • Greedily navigate the small world overlay based on
      ClusterID to reach the target cluster
  • Flooding stage
    • Flood within the target cluster
    • Return the most similar data object in the cluster
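The two search stages can be sketched in Python over a one-dimensional overlay, as in SSW-1D. Three choices here are assumptions: ClusterIDs are treated as integers, overlay distance is their numeric difference, and objects are ranked by Euclidean distance. The function names and the example cluster set are hypothetical.

```python
import math


def build_contacts(cluster_ids, long_range):
    """Short-range contacts = predecessor and successor in ClusterID order;
    long_range maps a ClusterID to extra (hypothetical) long-range contacts."""
    ids = sorted(cluster_ids)
    contacts = {}
    for i, cid in enumerate(ids):
        links = []
        if i > 0:
            links.append(ids[i - 1])
        if i < len(ids) - 1:
            links.append(ids[i + 1])
        links.extend(long_range.get(cid, []))
        contacts[cid] = links
    return contacts


def greedy_route(start, target, contacts):
    """Navigation stage: repeatedly hop to the contact whose ClusterID is
    numerically closest to the target until the target cluster is reached."""
    path = [start]
    current = start
    while current != target:
        candidates = contacts[current]
        if not candidates:
            break  # isolated cluster; nowhere to go
        nxt = min(candidates, key=lambda c: abs(c - target))
        if abs(nxt - target) >= abs(current - target):
            break  # no contact makes progress; target is not in the overlay
        path.append(nxt)
        current = nxt
    return path


def flood_search(cluster_objects, query_sv):
    """Flooding stage: return the id of the object in the target cluster whose
    SV is closest (Euclidean distance) to the query SV."""
    return min(cluster_objects, key=lambda obj: math.dist(obj[1], query_sv))[0]


contacts = build_contacts([0, 2, 4, 6, 8, 10, 11, 12, 14, 15], {0: [8]})
print(greedy_route(0, 11, contacts))  # [0, 8, 10, 11]
objects_in_11 = [("doc-a", (0.85, 0.40)), ("doc-b", (0.95, 0.28)), ("doc-c", (0.50, 0.50))]
print(flood_search(objects_in_11, (0.9, 0.3)))  # doc-b
```

Every cluster keeps its predecessor and successor as short-range contacts, so each greedy hop strictly reduces the distance to an existing target and the walk always arrives. Long-range contacts (here 0 to 8) only shorten the path.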

23
Example: Peer 1 searches for (0.9, 0.3)
[Figure: the query travels from the requestor's cluster to the target cluster; clusters are labeled with 4-bit ClusterIDs (0000 to 1111), partitions are annotated with entries such as d1, 0.44 and d2, 0.37, and the labels PCN 8 and PCN 11 also appear.]
24
Roadmap
  • Introduction
  • Semantic Small World
  • Performance evaluation
  • Conclusion and future work

25
Metrics
  • Search efficiency
    • Search path length
    • Search traffic (msgs)
  • Adaptivity to membership changes
    • Overlay maintenance cost
    • Index publishing cost
  • Resilience to failure
    • Search failure ratio
  • Load balance
    • Index load
    • Routing load
  • Result quality
    • $1 - (\mathit{dissim}_{\mathrm{real}} - \mathit{dissim}_{\mathrm{ideal}})$
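The result-quality metric is easy to state in code once a dissimilarity measure is fixed. The slides do not define dissim, so this sketch assumes Euclidean distance between SVs. It takes dissim_ideal as the distance to the best-matching object anywhere in the network. Under these assumptions the score is 1 for a perfect answer, and it is not bounded below by 0 unless distances are normalized.

```python
import math


def result_quality(query, returned, all_objects):
    """1 - (dissim_real - dissim_ideal), with dissimilarity taken to be
    Euclidean distance between semantic vectors (an assumption)."""
    dissim_real = math.dist(query, returned)
    dissim_ideal = min(math.dist(query, obj) for obj in all_objects)
    return 1 - (dissim_real - dissim_ideal)


network = [(0.95, 0.28), (0.90, 0.50), (0.50, 0.50)]
print(result_quality((0.9, 0.3), (0.95, 0.28), network))  # 1.0: best object returned
print(result_quality((0.9, 0.3), (0.90, 0.50), network))  # about 0.854
```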

26
Simulation setup
  • A random mixture of join/leave/query operations
    is injected into a network of a given size
  • Compared with
    • pSearch (CAN with rolling index)
    • SWRI (small world with rolling index)

27
Parameter settings
28
Scalability: Search Efficiency
29
Scalability: Maintenance Cost
30
Scalability: Publishing Cost
31
Clustering effect
32
Data locality
pSearch is in the range of 4000
Choosing the semantic centroid as the position
of a peer takes advantage of data locality
very effectively
33
Query locality
Without update vs. with update
Updating long-range contacts according to query
history takes advantage of query locality
effectively
34
Tolerance to peer failure
Forming clusters and the relative randomness of the
overlay greatly improve network stability
35
Load balance
36
Result quality
37
Roadmap
  • Introduction
  • Semantic Small World
  • Performance evaluation
  • Conclusion and future work

38
Summary
  • We propose a semantic overlay network based on
    small world, called semantic small world (SSW),
    for semantic based search in P2P systems
  • SSW considered various heuristics for semantic
    clustering and adopted an effective dimension
    reduction method to address the high
    dimensionality issue.
  • Experimental results demonstrate SSW's strengths.

39
Work-in-Progress and Future Work
  • Conducting an in-depth evaluation using real data
  • Exploring strategies for different types of
    queries in SSW
  • Exploring strategies to utilize resource
    heterogeneity
  • Investigating locality of interest in multiple
    queries