Title: Semantic Small World: An Overlay Network for Peer-to-Peer Search
1Semantic Small World: An Overlay Network for
Peer-to-Peer Search
- Mei Li, Wang-Chien Lee, Anand Sivasubramaniam
- The Pennsylvania State University
- Oct. 2004 at ICNP'04
2Roadmap
- Introduction
- Semantic Small World
- Performance evaluation
- Conclusion and future work
3Peer-to-Peer (P2P) Systems
- Different from client-server model
- Each node can act as a server as well as a client
- A mandatory function
- Search for resources available at peer nodes
- Fundamental challenges
- Large scale
- Dynamic membership changes (peer
join/leave/failure)
4The State-of-the-art Techniques
- Flooding / random walk
- High traffic / long latency
- Distributed hash tables (CAN/Chord)
- High search efficiency
- Maintenance becomes complicated
- Only support key-based search
5Semantic Based Search
- Existing techniques address scalability in terms
of network size to a certain extent
- Like the WWW, voluminous data content is also an
important issue in P2P!
- Most existing search techniques are key-based
- → Content/semantic-based search is much more
favorable than key-based search
6Related Work - pSearch
- Rolling index: the lower 6 dimensions are
partitioned into 3 groups. Each group is mapped
separately onto a 2-dimensional CAN.
- High search index publishing cost
- Not adaptive to membership changes
- Regular search space partition: not adaptive
to data distribution
7Our Goal
- Build a semantic overlay from ground up
- Efficient semantic/content based search
- Adaptive to dynamic network and content changes
- Adaptive to data distribution
8Assumptions
- Data objects
- Documents, multimedia, etc.
- Semantics/features of a data object are represented
by a k-element semantic vector (SV)
- This representation is generic, not limited to
specific data formats
9Background: Small World Network
- Small average path length
- Large clustering coefficient
- Two neighbors of a node are also very likely to
be neighbors themselves
- Each node knows a number of
- short range contacts, i.e., local neighbors
- long range contacts, chosen with probability
proportional to 1/distance
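The two defining properties on this slide can be made concrete with a small sketch. The Python snippet below is illustrative and not taken from the talk: it computes the local clustering coefficient of a node and the average shortest-path length of a graph. The adjacency-dict representation, the function names, and the 6-node ring are all assumptions chosen for the example.

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(graph, node):
    """Fraction of pairs of `node`'s neighbors that are themselves neighbors."""
    neighbors = graph[node]
    if len(neighbors) < 2:
        return 0.0
    pairs = list(combinations(sorted(neighbors), 2))
    linked = sum(1 for a, b in pairs if b in graph[a])
    return linked / len(pairs)


def average_path_length(graph):
    """Mean shortest-path hop count over all ordered pairs (connected graph assumed)."""
    total, count = 0, 0
    for source in graph:
        # Breadth-first search gives hop counts from `source` to every other node.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        count += len(dist) - 1
    return total / count


# Hypothetical 6-node ring in which each node links to its 2 nearest
# neighbors on each side (short range contacts only).
n = 6
ring = {i: {(i + s) % n for s in (-2, -1, 1, 2)} for i in range(n)}
cc = clustering_coefficient(ring, 0)
apl = average_path_length(ring)
```

On larger rings of this kind the clustering coefficient stays high while the path length grows with the ring size; adding a few long range contacts is what shortens the paths.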
10Our Approach
- Position peers and data objects in the semantic
space
- Peers with similar data objects form clusters
- Peer clusters form a small-world overlay
- → Semantic Small World
11Roadmap
- Introduction
- Semantic Small World
- Performance evaluation
- Conclusion and future work
12Design Issues
- Peer placement
- Cluster formation
- Space partition
- Overlay formation
- Dimension reduction
13Roles of a Peer Node
- Responsible for managing a semantic subspace
- Storage
- Manage locally stored data objects
- Data objects not semantically mapped here are
published elsewhere as foreign indexes
- Index
- Provide physical locations of data objects
semantically mapped to its subspace
- Overlay Network
- Serve as a part of the overlay network
14Peer Positioning
- Cluster local data objects
- Choose the semantic centroid of the largest data
cluster as the position of a peer in the semantic
space
- Advantage
- Reduce index publishing cost when data objects
stored at a peer are homogeneous
- Adaptive to data distribution
- Take advantage of query locality
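A minimal sketch of the positioning rule, assuming the local data objects have already been clustered. The slides do not say which clustering algorithm is used, so the `labels` input (one cluster label per semantic vector) and the function name `peer_position` are hypothetical.

```python
from collections import Counter


def peer_position(vectors, labels):
    """Return the centroid of the largest local data cluster."""
    # Find the cluster label that covers the most data objects.
    largest, _ = Counter(labels).most_common(1)[0]
    members = [v for v, lab in zip(vectors, labels) if lab == largest]
    dims = len(members[0])
    # The centroid is the per-dimension mean of the member vectors.
    return [sum(v[d] for v in members) / len(members) for d in range(dims)]


# Hypothetical peer holding five 2-element semantic vectors in two clusters.
vectors = [[0.1, 0.2], [0.2, 0.2], [0.3, 0.2], [0.9, 0.8], [0.8, 0.9]]
labels = [0, 0, 0, 1, 1]
position = peer_position(vectors, labels)
```

Here the peer is placed near (0.2, 0.2), so its three majority objects fall close to its own position and only the two remaining objects would need to be published elsewhere as foreign indexes.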
15Semantic Clustering
- Peers with nearby centroids form into clusters
- Cluster size is predefined (M)
- Cluster is partitioned (or merged) when necessary
- Partition strategy
- Two nodes with farthest centroids are chosen as
seeds for two subclusters
- Other nodes are assigned to the subclusters based
on shortest distance to the seeds
- Subspace is partitioned at the middle point of
the dimension that has the longest span between the
two centroids of the sub-clusters
- Advantages
- Improve tolerance to membership changes and
failures
- Achieve good load balance
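The partition strategy above can be sketched as follows. This is one reading of the three bullets, not the authors' code: Euclidean distance is assumed as the distance measure, the split point is taken as the midpoint between the two sub-cluster centroids along their widest dimension, and all names are hypothetical.

```python
from itertools import combinations
from math import dist


def centroid(points):
    """Per-dimension mean of a list of equal-length vectors."""
    return [sum(p[d] for p in points) / len(points) for d in range(len(points[0]))]


def partition_cluster(centroids):
    """Split a cluster of peer centroids into two sub-clusters.

    Returns (group_a, group_b, split_dim, split_value), where the groups
    hold indexes into `centroids`.
    """
    # 1. The two peers whose centroids are farthest apart become the seeds.
    seed_a, seed_b = max(
        combinations(range(len(centroids)), 2),
        key=lambda pair: dist(centroids[pair[0]], centroids[pair[1]]),
    )
    # 2. Every peer joins the sub-cluster of its nearest seed.
    group_a, group_b = [], []
    for i, c in enumerate(centroids):
        if dist(c, centroids[seed_a]) <= dist(c, centroids[seed_b]):
            group_a.append(i)
        else:
            group_b.append(i)
    # 3. Split the subspace at the midpoint of the dimension along which
    #    the two sub-cluster centroids are farthest apart.
    ca = centroid([centroids[i] for i in group_a])
    cb = centroid([centroids[i] for i in group_b])
    split_dim = max(range(len(ca)), key=lambda d: abs(ca[d] - cb[d]))
    split_value = (ca[split_dim] + cb[split_dim]) / 2
    return group_a, group_b, split_dim, split_value


# Hypothetical cluster of four peers in a 2-D semantic space.
peers = [[0.1, 0.1], [0.2, 0.15], [0.8, 0.2], [0.9, 0.1]]
group_a, group_b, split_dim, split_value = partition_cluster(peers)
```

In this example peers 0 and 3 are the seeds, the groups come out as [0, 1] and [2, 3], and the subspace is cut along dimension 0 at roughly 0.5.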
16Overlay Formation
- Clusters are formed into small world network
- Each peer maintains
- Short range contacts to its neighboring clusters
- One or several long range contacts to clusters at
certain distance
- Long range contacts obey the distance
distribution function p(d) ∝ 1/d^k
- d: distance to the long range contact
- k: dimensionality
- Advantages
- Good search performance
- Low maintenance cost
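To illustrate the distance distribution for long range contacts, the sketch below draws a contact with probability proportional to 1/d^k. How candidate clusters and their distances are obtained is not covered on the slide, so the candidate list, the distances, and the function names are assumptions.

```python
import random


def long_range_weights(distances, k):
    """Normalized selection probabilities proportional to 1 / d**k."""
    raw = [1.0 / (d ** k) for d in distances]
    total = sum(raw)
    return [w / total for w in raw]


def pick_long_range_contact(candidates, distances, k, rng=random):
    """Draw one candidate cluster with probability proportional to 1 / d**k."""
    weights = long_range_weights(distances, k)
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical example: three candidate clusters at distances 1, 2 and 4
# in a 1-dimensional space (k = 1).
probs = long_range_weights([1, 2, 4], k=1)
contact = pick_long_range_contact(
    ["A", "B", "C"], [1, 2, 4], k=1, rng=random.Random(0)
)
```

With k = 1 the three candidates receive probabilities 4/7, 2/7 and 1/7, so nearby clusters are favored but distant ones are still chosen often enough to provide shortcuts.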
17Examples
18Challenge of High Dimensionality
- How to incorporate high dimensional semantics in
the small world index structure?
- Semantic vectors normally are high-dimensional
vectors
- e.g., LSI has 50-300 dimensions
- Simply assigning short range contacts along each
dimension makes maintenance costly and
complicated
- Solution: dimension reduction
- Adaptive space linearization (ASL)
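One way to picture linearization is as a mapping from a high-dimensional semantic vector to a one-dimensional ClusterID. The sketch below assumes the partition history is a binary tree of (dimension, split value) decisions and that each split contributes one bit of the ClusterID, most significant bit first, with the upper half of a split receiving bit 1. These encoding details, the tree representation, and the function name are assumptions made for illustration; the slides do not spell them out.

```python
def resolve_cluster_id(vector, history, id_bits=4):
    """Walk the partition history and emit one bit per split, MSB first.

    `history` is a nested tuple (dim, split, low_subtree, high_subtree),
    or None once no further partition has happened. Unused low-order
    bits stay 0.
    """
    cluster_id = 0
    depth = 0
    node = history
    while node is not None and depth < id_bits:
        dim, split, low, high = node
        go_high = vector[dim] >= split
        if go_high:
            # Assumption: the upper half of a split is encoded as bit 1.
            cluster_id |= 1 << (id_bits - 1 - depth)
        node = high if go_high else low
        depth += 1
    return cluster_id


# Hypothetical 2-D partition history: split dimension 0 at 0.44 first,
# then split each half along dimension 1.
history = (0, 0.44,
           (1, 0.37, None, None),
           (1, 0.53, None, None))
cid = resolve_cluster_id([0.9, 0.3], history)
cid_bits = format(cid, "04b")
```

Under these assumptions the vector (0.9, 0.3) resolves to ClusterID 1000 (8): it lies in the upper half of the first split and the lower half of the second.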
19ASL
[Figure: ASL example. Subspaces are labeled with 4-bit ClusterIDs such as 0000, 0101 (5), 1011 (11), and 1111 (15). Other labels: P4, ClusterID.]
20SSW-1D
21Example: Join
22Search
- Navigation stage
- Resolve ClusterID of search semantic vector (SV)
incrementally according to partition history
- Greedily navigate on small world overlay based on
ClusterID to reach target cluster
- Flooding stage
- Flood within target cluster
- Return the most similar data object in the cluster
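The two search stages can be sketched as below, assuming the target ClusterID has already been resolved. The overlay is modeled as a hypothetical dict from ClusterID to contact ClusterIDs, greedy navigation picks the contact whose ClusterID is numerically closest to the target, and Euclidean distance stands in for dissimilarity. All of these are assumptions for illustration rather than details given on the slides.

```python
from math import dist


def greedy_route(contacts, start, target):
    """Navigation stage: hop to the contact whose ClusterID is closest to `target`."""
    path = [start]
    current = start
    while current != target:
        nxt = min(contacts[current], key=lambda c: abs(c - target))
        if abs(nxt - target) >= abs(current - target):
            raise RuntimeError("no contact is closer to the target")
        current = nxt
        path.append(current)
    return path


def flood_cluster(cluster_objects, query):
    """Flooding stage: ask every peer in the cluster, keep the most similar object."""
    candidates = [obj for objs in cluster_objects.values() for obj in objs]
    return min(candidates, key=lambda obj: dist(obj, query))


# Hypothetical overlay of 16 clusters on a line: short range contacts to the
# adjacent ClusterIDs, plus one made-up long range contact from cluster 0 to 9.
contacts = {i: [j for j in (i - 1, i + 1) if 0 <= j < 16] for i in range(16)}
contacts[0].append(9)
path = greedy_route(contacts, start=0, target=11)

# Hypothetical contents of the target cluster: peer name -> semantic vectors.
cluster_11 = {"p1": [[0.7, 0.4], [0.95, 0.35]], "p2": [[0.85, 0.1]]}
best = flood_cluster(cluster_11, query=[0.9, 0.3])
```

The long range contact shortens the route to 0, 9, 10, 11 instead of eleven single-step hops, and the flood returns (0.95, 0.35) as the object nearest the query.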
23Example: Peer 1 searches for (0.9, 0.3)
[Figure: search example showing the partition history (splits such as d1 at 0.44 and d2 at 0.37 and 0.53), the resulting 4-bit ClusterIDs, and the route from the requestor to the target cluster. Other labels: PCN 8, PCN 11.]
24Roadmap
- Introduction
- Semantic Small World
- Performance evaluation
- Conclusion and future work
25Metrics
- Search efficiency
- Search path length
- Search traffic (msgs)
- Adaptivity to membership changes
- Overlay maintenance cost
- Index publishing cost
- Resilience to failure
- Search failure ratio
- Load balance
- Index load
- Routing load
- Result quality
- 1 - (dissim_real - dissim_ideal)
26Simulation setup
- A random mixture of join/leave/query operations
is injected into a network of a certain size
- Compare with
- pSearch (CAN with rolling index)
- SWRI (small world with rolling index)
27Parameter settings
28Scalability: Search Efficiency
29Scalability: Maintenance Cost
30Scalability: Publishing Cost
31Clustering effect
32Data locality
pSearch is in the range of 4000
Choosing the semantic centroid as the position
for a peer can take advantage of data locality
very effectively
33Query locality
[Figure panels: without update; with update]
Updating long range contacts according to query
history can take advantage of query locality
effectively
34Tolerance to peer failure
Forming clusters and the relative randomness in the
overlay improve network stability greatly
35Load balance
36Result quality
37Roadmap
- Introduction
- Semantic Small World
- Performance evaluation
- Conclusion and future work
38Summary
- We propose a semantic overlay network based on
small world, called semantic small world (SSW),
for semantic-based search in P2P systems
- SSW considers various heuristics for semantic
clustering and adopts an effective dimension
reduction method to address the high
dimensionality issue
- Experimental results demonstrate SSW's strength
39Work-in-Progress and Future Work
- Conducting an in-depth evaluation using real data
- Exploring strategies for different types of
queries in SSW
- Exploring strategies to utilize resource
heterogeneity
- Investigating locality of interest in multiple
queries