Title: Optimizing Web Search Using Social Annotations
1Optimizing Web Search Using Social Annotations
- Shenghua Bao, Xiaoyuan Wu, Guirong Xue, Yong Yu
- Shanghai JiaoTong University
- Ben Fei, Zhong Su
- IBM China Research Lab
- WWW 2007
2Introduction (1/3)
- Two general aspects on improving web search
- Ordering the web pages according to the
query-document similarity - Ex Anchor text generation, search log mining
etc - Ordering the web pages according to their
qualities - Static ranking
- Ex PageRank, HITS etc
3Introduction (2/3)
- Social annotation service ( social bookmarking)
- Developed for web users to organize and share
their favorite web pages online by social
annotations - Emergent useful information that has been
explored for folksonomy, visualization, semantic
web, etc - Delicious
4Introduction (3/3)
- Utilizing social annotations for better web
search from the two aspects - Similarity ranking
- The annotations provided by web users are usually
good summaries (new metadata) of the
corresponding web pages - The annotation data may be sparse and incomplete
- SocialSimRank (SSR) algorithm
- Static ranking
- The amount of annotations assigned to a page
indicates its popularity and implies its quality
in some sense - Different annotation may have different weights
in indicating the popularity of web pages - SocialPageRank (SPR) algorithm
5Search with Social Annotation
- Web page annotators provide cleaner data for
users browsing - Similar or closely related annotations are
usually given to the same web pages
6Similarity Ranking between the Query and Social
Annotations
- Term-Matching based similarity ranking
- suffers from the synonymy problem
- qq1,q2,, qn, A(p)a1,a2,, am
- Social Similarity Ranking (SSR)
- Observation 1 Similar (semantically-related)
annotations are usually assigned to similar
(semantically-related) web pages by users with
common interests. In the social annotation
environment, the similarity among annotations in
various forms can further be identified by the
common web pages they annotated.
7Illustration of SocialSimRank
- A(a)ubuntu, A(b)linux,ubuntu,
A(c)gnome,linux,ubuntu - P(ubuntu)a,b,c, P(linux)b,c, P(gnome)c
- MAP(ubuntu, a)1, MAP(linux, b)1, MAP(gnome, c)2
8(No Transcript)
9Page Quality Estimation Using Social Annotations
- Observation 2 High quality web pages are usually
popularly annotated. Popular web pages,
up-to-date web users and hot social annotations
usually have the following relations 1) popular
web pages are bookmarked by many up-to-date users
and annotated by hot annotations 2) up-to-date
users like to bookmark popular pages and use hot
annotations 3) hot annotations are used to
annotate popular web pages and used by up-to-date
users. - Notations
- MPU NP NU association matrix between pages and
users - MUA NU NA association matrix between users and
annotations - MAP NA NP association matrix between
annotations and pages - P0 vector of randomly initialized SocialPageRank
scores
10SocialPageRank Algorithm
11Illustration of Quality Transition in the SPR
Algorithm
12Dynamic Ranking with Social Information
- Dynamic ranking method
- RankSVM
- Features
13Experiment Data (1/2)
- Delicious data
- 1,736,268 web pages and 269.566 annotations are
crawled from Delicious during May, 2006. - Split compound annotations into standard words
with the help of WordNet - ex java.programming ? java, programming
14Experiment Data (2/2)
- Test set for dynamic ranking with social
annotation - Manual query set (MQ)
- 50 queries and their corresponding ground truths
in Delicious data manually created by CS students - Pooling judge the top 100 documents returned by
Lucene - Automatic query set (AQ) from Open Directory
Project (ODP) - Merging Delicious data with ODP and discarding
ODP categories that contain no Delicious URLs - Randomly sample 3000 ODP categories and extract
the category paths as the query set and the
corresponding web pages - ex extract path TOP/Computer/Software/Graphics
- as Computer Software Graphics
- 5-fold cross validation for each query set
15Evaluation of Annotation Similarities
Table. Explored similar annotations based on
SocialSimRank
16PageRank vs. Average Count
17SPR vs. PageRank
- SPR is normalized into a scale of 0-10 so that
SPR and PageRank have the same number of pages in
each grade from 0 to 10 - The pages with each PageRank value diversify a
lot on the number of annotations and users - SPR successfully characterizes the web pages
popularity degrees among web annotators
18Results of Dynamic Ranking (1/2)
- Table. Comparison of MAP between similarity
features
19Results of Dynamic Ranking (2/2)
- Figure. NDCG at K for comparison of baseline,
baselineTM, baselineSSR, baselinePR, and
baselineSPR on query set AQ
20Discussions
- There are still several problems to further
address - Annotation Coverage
- The user submitted queries may not match any
social annotation - Many web pages may have no annotations 1) newly
emerging web pages 2) key-page-associated web
pages while users tend to annotate key pages
only 3) uninteresting web pages. - Annotation Ambiguity
- SSR may find the similar terms to the query terms
while fail to disambiguate terms that have more
than one meanings - Annotation Spamming
- As social annotation becomes more and more
popular, the amount of spam could drastically
increase in the near future
21Conclusion
- The problem of integrating social annotations
into web search is studied. - We observed that social annotations could benefit
web search in both similarity ranking and static
ranking. - The experimental results showed that SSR can
successfully find the latent semantic relations
among annotations and SPR can provide the static
ranking from the web annotators perspective. - In the future, we would optimize the proposed
algorithms and explore more sophisticated social
features.