Title: Seminar Series Social Information Systems
Slide 1: Seminar Series: Social Information Systems
Manos Papagelis, Department of Computer Science, University of Toronto
papaggel_at_cs.toronto.edu
Slide 2: Presentation Outline
- Part I: Exploiting Social Networks for Internet Search
- Part II: An Experimental Study of the Coloring Problem on Human Subject Networks
Slide 3: Exploiting Social Networks for Internet Search
Alan Mislove, Krishna Gummadi, and Peter Druschel, HotNets 2006
Slide 4: Introduction
- Social Networking (SN)
  - A new form of publishing and locating information
- Objective
  - To understand whether these social links can be exploited by search engines to provide better results
- Contributions
  - Comparison of the mechanisms in the Web and online SNs for
    - Publishing: mechanisms to make information available to users
    - Locating: mechanisms to find information
  - Results from an experiment in social network-based Web search
  - Challenges and opportunities in using social networks for Internet search
Slide 5: Web vs. SN (1/2)
- Web
  - Publishing: by placing documents on a Web server (and then seeking incoming links)
  - Locating: via search engines (exploiting the link graph)
- Pros
  - Very effective (incoming links are good indicators of importance)
- Limitations
  - No fresh data
  - No personalized results
  - Unlinked pages are not indexed
Slide 6: Web vs. SN (2/2)
- Social Networks
  - Publishing: no explicit links between content (photos, videos, blogs), but implicit links between content through explicit links between users
  - Locating
    - Navigation through the social network and browsing users' content
    - Keyword-based search for textual or tagged content
    - Through "Top-10" lists
- Pros
  - Helps a user find timely, relevant information by browsing adjacent regions of the network of users with similar interests
  - Content is rated rapidly (by comments and feedback of a community)
Slide 7: Integration of Web Search and SN
- Web and SN information is disjoint
- No unified search tool that locates information across different systems
Slide 8: PeerSpective: SN-based Web Search
- Technology
  - Lucene text search engine and FreePastry P2P overlay
  - Lightweight HTTP proxy transparently indexes all URLs visited by the user
Slide 9: Searching Process
- A query is submitted by a user to Google
- The proxy transparently forwards the query to both Google and the proxies of users in the network
- Each proxy executes the query on its local index
- Results are then collated and presented alongside Google results
- PeerSpective ranking
  - Combines the Lucene score, PageRank, and scores from users who previously viewed the result
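The collation step can be sketched in a few lines of Python. This is a minimal illustration, not PeerSpective's actual code: the slide does not give the exact ranking formula, so the rule used here (text score weighted by a link-based score, summed over the peers that viewed the page) is an assumption, and `collate_peer_results`, the `pagerank` dictionary, and the example URLs are hypothetical names.

```python
from collections import defaultdict


def collate_peer_results(peer_results, pagerank=None):
    """Merge per-peer hits into one ranked list of URLs.

    peer_results: one entry per peer; each entry is a list of
                  (url, text_score) pairs from that peer's local index.
    pagerank:     optional dict url -> weight (hypothetical input;
                  a missing URL gets weight 1.0).
    """
    pagerank = pagerank or {}
    combined = defaultdict(float)
    for hits in peer_results:
        for url, text_score in hits:
            # Assumption: weight the text score by a link-based score and
            # sum over every peer that previously viewed the page.
            combined[url] += text_score * pagerank.get(url, 1.0)
    # Highest combined score first; ties are broken alphabetically.
    return sorted(combined, key=lambda u: (-combined[u], u))


# Three peers answer the same query from their local indexes.
peers = [
    [("a.example/bus", 0.5), ("b.example/route", 0.25)],
    [("a.example/bus", 0.5)],
    [("c.example/timetable", 0.75)],
]
ranking = collate_peer_results(peers)
```

In this toy run the page viewed by two peers ranks first, which is the kind of community bias the ranking is meant to capture.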
Slide 10: Search Results Example
Slide 11: Experiments
- 10 graduate students share downloaded or viewed Web content
- One-month-long experiment
- 200,000 distinct URLs
  - 25% were of type text/html or application/pdf (so they can be indexed)
- Reports on
  - Limits of hyperlink-based search
  - Benefits of SN-based search
Slide 12: Limits of hyperlink-based search
- Report on the fraction of visited URLs that are not indexed by Google
  - Page too new (blogs)
  - Deep Web
  - Dark Web (no links)
- Results
  - About 1/3 of requests cannot be retrieved by Google
  - PeerSpective's index covers 30% of the requested URLs
  - 13.3% of URLs were contained in PeerSpective but not in Google's index
Slide 13: Random samples of URLs not in Google and Potential Reason
Slide 14: Benefits of SN-based Search
- Experiments on clicks on first-page results
  - For 1730 queries (1079 resulted in clicks)
- Results
  - 86.5% of the clicked results were returned only by Google
  - 5.7% of the clicked results were returned by both
  - 7.7% of the clicked results were returned only by PeerSpective
- Conclusions
  - This 7.7% is a gain over Google, which is considered to be the gold standard of web search engineering
  - Inherent advantage of using social links in web search
Slide 15: Reasons for Clicks on PeerSpective
- Disambiguation
  - A community tends to share definitions or interpretations of popular terms (e.g., "bus")
- Ranking
  - SN information can bias the ranking algorithms toward the interests of users (e.g., "CoolStreaming")
- Serendipity
  - Ample opportunity to find interesting things without searching
Slide 16: Example of URLs found in PeerSpective
Slide 17: Opportunities and Challenges
- Privacy
  - Willingness of users to disclose information
  - Need for mechanisms to control information flow and anonymity
- Membership and clustering of SNs
  - Users may participate in many networks
  - Need for searching with respect to the different clusters
- Content rating and ranking
  - New approaches to ranking search results
- System architecture: centralized or distributed?
Slide 18: An Experimental Study of the Coloring Problem on Human Subject Networks
Michael Kearns, Siddharth Suri, and Nick Montfort, Science (313), Aug 2006
Slide 19: Experimental Study on Human Subject Networks
- Theoretical work suggests that structural properties of naturally occurring networks are important in shaping behavior and dynamics
  - E.g., hubs in networks are important in routing information
- Empirical structural properties established by many disciplines
  - Small diameter (the "six degrees of separation")
  - Local clustering of connectivity
  - Heavy-tailed distribution of connectivity (power-law distributions)
- Empirical studies of networks
  - Limitation: networks are fixed and given (no alternatives)
  - Other approach: controlled laboratory study
Slide 20: Experiment
- Experimental scenario
  - Distributed problem-solving from local information
- Experimental setting
  - 38 human subjects (network vertices)
  - Each subject controls the color of a vertex in a network
  - Networks: simple and more complex
  - Goal: select a color different from that of all neighbors
  - Problem: coloring problem
  - Information available: variable (Low, Medium, High)
Slide 21: Graph Coloring Problem
- Graph coloring
  - An assignment of "colors" to certain objects in a graph such that no two adjacent objects are assigned the same color
- Graph coloring problem
  - Find the minimum number of colors for an arbitrary graph (NP-hard)
- Chromatic number
  - The least number of colors needed to color the graph
- Example
  - Vertex coloring
  - A 3-coloring suits this graph, but fewer colors would result in adjacent vertices of the same color
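The definitions above can be made concrete with a small brute-force sketch. The graph shown on the original slide is not recoverable from the text, so a 5-vertex cycle is assumed here as a stand-in, since it also needs exactly three colors. The function names are illustrative, and the exhaustive search is only practical for tiny graphs because the problem is NP-hard.

```python
from itertools import product


def is_proper_coloring(edges, coloring):
    """True if no edge joins two vertices of the same color."""
    return all(coloring[u] != coloring[v] for u, v in edges)


def chromatic_number(vertices, edges):
    """Return (k, coloring) for the smallest k admitting a proper coloring.

    Tries every assignment of k colors for k = 1, 2, ...; exponential time,
    so this is only meant for very small graphs.
    """
    for k in range(1, len(vertices) + 1):
        for assignment in product(range(k), repeat=len(vertices)):
            coloring = dict(zip(vertices, assignment))
            if is_proper_coloring(edges, coloring):
                return k, coloring
    return 0, {}  # only reached for a graph with no vertices


# Assumed example: a cycle on 5 vertices (an odd cycle).
vertices = list(range(5))
edges = [(i, (i + 1) % 5) for i in range(5)]
k, coloring = chromatic_number(vertices, edges)  # k == 3
```

Two colors fail on any odd cycle because alternating colors around the ring forces the last vertex to match the first.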
Slide 22: Network Topologies
[Figure: the six network topologies. Panels: Simple Cycle, 5-Chord Cycle, 20-Chord Cycle, Leader Cycle, Pref. Att. v2, Pref. Att. v3]
Slide 23: Information View
[Figure: the three information views. Panels: Low (color of each neighbor), Medium (number of links of each neighbor), All (whole network). Each panel marks the subject's own vertex as "YOU" and shows an "Overall Progress" indicator]
Slide 24: Graph Properties and Experimental Results
Slide 25: 1. Collective Performance
- Subjects could indeed solve the coloring problem across a wide range of networks
  - 31/38 experiments ended in a solution in less than 300 seconds
  - 82 sec mean completion time
- Collective performance is affected by network structure
  - Preferential attachment networks are harder than cycle-based networks
- Cycle-based networks
  - Monotonic relationship between solution time and average network distance (smaller distance leading to shorter solution times)
  - Addition of random chords systematically reduces solution time
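To see why adding chords lowers the average network distance, the following sketch builds a 38-vertex cycle, adds random chords, and measures the mean shortest-path length with breadth-first search. How the chords were actually placed in the experiment is not stated on the slides, so uniform random placement, the seed, and the function names are assumptions made for illustration.

```python
import random
from collections import deque


def cycle_with_chords(n, num_chords, seed=0):
    """Adjacency sets for an n-cycle plus num_chords random extra edges."""
    if num_chords > n * (n - 1) // 2 - n:
        raise ValueError("more chords requested than non-adjacent pairs")
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for v in range(n):
        adj[v].add((v + 1) % n)
        adj[(v + 1) % n].add(v)
    added = 0
    while added < num_chords:
        u, v = rng.sample(range(n), 2)
        if v not in adj[u]:  # skip pairs that are already connected
            adj[u].add(v)
            adj[v].add(u)
            added += 1
    return adj


def average_distance(adj):
    """Mean shortest-path length over all ordered pairs (connected graph)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


d_simple = average_distance(cycle_with_chords(38, 0))   # 361/37, about 9.76
d_5 = average_distance(cycle_with_chords(38, 5))
d_20 = average_distance(cycle_with_chords(38, 20))
```

With the same seed, the 20-chord graph contains the 5-chord graph, so the three values decrease in that order, matching the direction of the trend reported for human subjects.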
Slide 26: 2. Human Performance vs. Artificial Distributed Heuristics
- Heuristic considered
  - A vertex is randomly selected
  - If there are colors unused in the neighborhood of this vertex, then a color is selected randomly from the available ones
  - If there are no unused colors, then a color is selected randomly
- Comparison measure
  - Number of vertex color changes
- Findings
  - Results exactly reversed: lower average distance increases the difficulty for the heuristic
  - Preferential attachment networks are easier for the heuristic
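The heuristic described above can be simulated roughly as follows. This is a sketch under stated assumptions rather than the authors' implementation: the random initial coloring, the stopping rule (stop once the coloring is proper), the step limit, and the name `run_heuristic` are not given on the slides. The count of color changes is the comparison measure named on the slide.

```python
import random


def run_heuristic(adj, num_colors, seed=0, max_steps=100_000):
    """Simulate the random-recoloring heuristic on an adjacency-set graph.

    Returns (coloring, changes); coloring is None if no proper coloring
    was reached within max_steps.
    """
    rng = random.Random(seed)
    vertices = sorted(adj)
    # Assumption: every vertex starts with a uniformly random color.
    colors = {v: rng.randrange(num_colors) for v in vertices}
    changes = 0

    def has_conflict():
        return any(colors[u] == colors[w] for u in adj for w in adj[u])

    for _ in range(max_steps):
        if not has_conflict():
            return colors, changes
        v = rng.choice(vertices)  # a vertex is randomly selected
        used = {colors[w] for w in adj[v]}
        unused = [c for c in range(num_colors) if c not in used]
        # Prefer a color no neighbor uses; otherwise pick any color.
        new_color = rng.choice(unused) if unused else rng.randrange(num_colors)
        if new_color != colors[v]:
            colors[v] = new_color
            changes += 1
    return None, changes


# Illustrative run: a 38-vertex cycle with 3 available colors.
n = 38
cycle = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
solution, num_changes = run_heuristic(cycle, num_colors=3)
```

Three colors are used in this run only to keep the example quick; with the minimum number of colors the same loop applies but typically needs many more steps.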
Slide 27: 3. Effects of Varying the Locality of Information View
- Variable locality of information provided to subjects
  - Low: their own and neighboring colors are visible
  - Medium: their own and neighboring colors are visible, plus information on the connectivity of neighbors
  - High: global coloring state visible at all times
- Findings
  - Increased amount of information
    - Reduces solution times for cycle-based networks
    - Increases solution times for preferential attachment networks
  - Rapid convergence to one of the two solutions in cycle-based networks
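One way to picture the three information views is as a function that filters the global state down to what a single subject may see. The sketch below is purely illustrative: the function name, the dictionary layout, and the lowercase view labels are assumptions, not part of the experimental software.

```python
def visible_information(view, vertex, adj, colors):
    """Return what the subject at `vertex` can see under a given view."""
    neighbors = sorted(adj[vertex])
    info = {
        "own_color": colors[vertex],
        "neighbor_colors": {w: colors[w] for w in neighbors},
    }
    if view == "medium":
        # Additionally expose how many links each neighbor has.
        info["neighbor_degrees"] = {w: len(adj[w]) for w in neighbors}
    elif view == "high":
        # The full coloring state of the network.
        info["all_colors"] = dict(colors)
    elif view != "low":
        raise ValueError(f"unknown view: {view}")
    return info


# A path 0 - 1 - 2 with an extra vertex 3 attached to vertex 2.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
colors = {0: "red", 1: "blue", 2: "red", 3: "blue"}
low = visible_information("low", 1, adj, colors)
medium = visible_information("medium", 1, adj, colors)
high = visible_information("high", 1, adj, colors)
```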
Slide 28: Information View Effect 1: Pref. Att. vs. Cycle-based Networks
Slide 29: Information View Effect 2: Cycle-based Solution Convergence
- Low information view: population oscillates between approaches to the two solutions
- High information view: rapid convergence to one of the two possible solutions
Slide 30: Individual Strategies
- Choosing colors that result in the fewest local conflicts
- Attempting to avoid conflicts with highly connected subjects
- Signaling behavior of subjects
- Introducing conflicts to avoid local minima
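The first strategy, choosing the color with the fewest local conflicts, is easy to state in code. The sketch below is one reading of it, with a hypothetical function name and an assumed tie-breaking rule (lowest color index wins); it does not model the other strategies listed.

```python
def least_conflict_color(vertex, adj, colors, num_colors):
    """Pick the color that clashes with the fewest neighbors of `vertex`."""
    neighbor_colors = [colors[w] for w in adj[vertex]]
    # Assumption: ties are broken in favor of the lowest color index.
    return min(range(num_colors), key=lambda c: (neighbor_colors.count(c), c))


# Vertex 0 has three neighbors: two use color 0 and one uses color 1.
adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
colors = {0: 0, 1: 0, 2: 0, 3: 1}
with_two_colors = least_conflict_color(0, adj, colors, num_colors=2)    # 1
with_three_colors = least_conflict_color(0, adj, colors, num_colors=3)  # 2
```

With two colors the subject still ends up in one conflict, which is the kind of local minimum the last strategy on the slide tries to escape.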
Slide 31: Questions?
Slide 32: Thanks!