Optimizing Web Search Using Social Annotations

1
Optimizing Web Search Using Social Annotations
  • Shenghua Bao, Xiaoyuan Wu, Guirong Xue, Yong Yu
  • Shanghai JiaoTong University
  • Ben Fei, Zhong Su
  • IBM China Research Lab
  • WWW 2007

2
Introduction (1/3)
  • Two general aspects of improving web search
  • Ordering the web pages according to the
    query-document similarity
  • Ex: anchor text generation, search log mining,
    etc.
  • Ordering the web pages according to their
    qualities
  • Static ranking
  • Ex: PageRank, HITS, etc.

3
Introduction (2/3)
  • Social annotation service (social bookmarking)
  • Developed for web users to organize and share
    their favorite web pages online by social
    annotations
  • An emerging source of useful information that has
    been explored for folksonomy, visualization, the
    semantic web, etc.
  • Delicious

4
Introduction (3/3)
  • Utilizing social annotations for better web
    search from the two aspects
  • Similarity ranking
  • The annotations provided by web users are usually
    good summaries (new metadata) of the
    corresponding web pages
  • The annotation data may be sparse and incomplete
  • SocialSimRank (SSR) algorithm
  • Static ranking
  • The number of annotations assigned to a page
    indicates its popularity and implies its quality
    in some sense
  • Different annotations may have different weights
    in indicating the popularity of web pages
  • SocialPageRank (SPR) algorithm

5
Search with Social Annotation
  • Web page annotators provide cleaner data for
    users browsing
  • Similar or closely related annotations are
    usually given to the same web pages

6
Similarity Ranking between the Query and Social
Annotations
  • Term-Matching based similarity ranking
  • suffers from the synonymy problem
  • $q = \{q_1, q_2, \ldots, q_n\}$, $A(p) = \{a_1, a_2, \ldots, a_m\}$
  • Social Similarity Ranking (SSR)
  • Observation 1: Similar (semantically related)
    annotations are usually assigned to similar
    (semantically related) web pages by users with
    common interests. In the social annotation
    environment, the similarity among annotations in
    various forms can further be identified by the
    common web pages they annotate.
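
The synonymy problem can be seen in a minimal Python sketch. The scoring function used here, the fraction of a page's annotations that also occur in the query, is an assumed form of term matching, since the slide's own formula is not in the transcript. The page data is borrowed from the ubuntu/linux illustration.

```python
def term_match_similarity(query_terms, page_annotations):
    """Fraction of a page's annotations that literally appear in the query.

    Assumed scoring form for illustration; the slide does not give the formula.
    """
    query, annotations = set(query_terms), set(page_annotations)
    if not annotations:
        return 0.0
    return len(query & annotations) / len(annotations)


# Annotation sets A(p) for three hypothetical pages.
annotations_of = {
    "a": {"ubuntu"},
    "b": {"linux", "ubuntu"},
    "c": {"gnome", "linux", "ubuntu"},
}

# Score every page against the single-term query "linux".
scores = {
    page: term_match_similarity(["linux"], tags)
    for page, tags in annotations_of.items()
}
```

Page a is annotated only with "ubuntu", so the query "linux" scores zero against it even though the two terms are closely related. This is the gap a similarity measure over annotations is meant to close.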

7
Illustration of SocialSimRank
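  • $A(a) = \{\text{ubuntu}\}$, $A(b) = \{\text{linux}, \text{ubuntu}\}$,
    $A(c) = \{\text{gnome}, \text{linux}, \text{ubuntu}\}$
  • $P(\text{ubuntu}) = \{a, b, c\}$, $P(\text{linux}) = \{b, c\}$, $P(\text{gnome}) = \{c\}$
  • $M_{AP}(\text{ubuntu}, a) = 1$, $M_{AP}(\text{linux}, b) = 1$, $M_{AP}(\text{gnome}, c) = 2$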
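
A rough Python sketch of the idea behind this illustration follows. It derives P from A and then runs a plain SimRank-style iteration on the annotation-page graph. This is a simplification and not the SocialSimRank algorithm from the talk: it ignores the annotation counts in the association matrix, and the damping value 0.7, the iteration count, and all function names are assumptions made for illustration.

```python
from itertools import product

# A(p): annotations assigned to each page, as in the illustration.
A = {"a": {"ubuntu"}, "b": {"linux", "ubuntu"}, "c": {"gnome", "linux", "ubuntu"}}

# P(t): pages annotated with each annotation, derived by inverting A.
P = {}
for page, tags in A.items():
    for tag in tags:
        P.setdefault(tag, set()).add(page)


def simrank_step(items, neighbours, other_sim, damping):
    """One update: two items are similar if their neighbours are similar."""
    sim = {}
    for x, y in product(items, repeat=2):
        if x == y:
            sim[x, y] = 1.0
            continue
        total = sum(other_sim[m, n] for m in neighbours[x] for n in neighbours[y])
        sim[x, y] = damping * total / (len(neighbours[x]) * len(neighbours[y]))
    return sim


def social_simrank_sketch(A, P, damping=0.7, iterations=5):
    # Start from identity: every annotation or page is similar only to itself.
    sim_a = {(x, y): float(x == y) for x, y in product(P, repeat=2)}
    sim_p = {(x, y): float(x == y) for x, y in product(A, repeat=2)}
    for _ in range(iterations):
        new_a = simrank_step(P, P, sim_p, damping)  # annotations via pages
        new_p = simrank_step(A, A, sim_a, damping)  # pages via annotations
        sim_a, sim_p = new_a, new_p
    return sim_a, sim_p


sim_a, sim_p = social_simrank_sketch(A, P)
```

After the iteration, `sim_a["linux", "ubuntu"]` is greater than zero because the two annotations share pages b and c, even though the strings never match literally.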

8
(No Transcript)
9
Page Quality Estimation Using Social Annotations
  • Observation 2: High-quality web pages are usually
    popularly annotated. Popular web pages,
    up-to-date web users and hot social annotations
    usually have the following relations: 1) popular
    web pages are bookmarked by many up-to-date users
    and annotated by hot annotations; 2) up-to-date
    users like to bookmark popular pages and use hot
    annotations; 3) hot annotations are used to
    annotate popular web pages and are used by
    up-to-date users.
  • Notations
  • $M_{PU}$: $N_P \times N_U$ association matrix between pages and
    users
  • $M_{UA}$: $N_U \times N_A$ association matrix between users and
    annotations
  • $M_{AP}$: $N_A \times N_P$ association matrix between
    annotations and pages
  • $P_0$: vector of randomly initialized SocialPageRank
    scores

10
SocialPageRank Algorithm
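
The body of this slide is not in the transcript. The sketch below is one way to realize the mutual reinforcement described in Observation 2 using the three association matrices from the notation slide. The propagation order (pages to users to annotations to pages, then back again), the unit-sum normalization, the stopping rule, and the toy data are all assumptions, not the authors' stated procedure.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def normalize(vector):
    """Scale to unit sum so scores stay bounded across iterations (assumption)."""
    total = sum(vector)
    return [v / total for v in vector] if total else vector


def social_pagerank_sketch(m_pu, m_ua, m_ap, p0, iterations=50, tol=1e-10):
    p = normalize(p0)
    for _ in range(iterations):
        u = mat_vec(transpose(m_pu), p)            # pages -> users
        a = mat_vec(transpose(m_ua), u)            # users -> annotations
        p_mid = mat_vec(transpose(m_ap), a)        # annotations -> pages
        a_back = mat_vec(m_ap, p_mid)              # pages -> annotations
        u_back = mat_vec(m_ua, a_back)             # annotations -> users
        p_next = normalize(mat_vec(m_pu, u_back))  # users -> pages
        if max(abs(x - y) for x, y in zip(p, p_next)) < tol:
            return p_next
        p = p_next
    return p


# Hypothetical toy data: 3 pages, 2 users, 2 annotations.
# Page 0 is bookmarked by both users; pages 1 and 2 by one user each.
m_pu = [[1, 1], [1, 0], [0, 1]]   # pages x users
m_ua = [[2, 0], [0, 2]]           # users x annotations
m_ap = [[1, 1, 0], [1, 0, 1]]     # annotations x pages

scores = social_pagerank_sketch(m_pu, m_ua, m_ap, p0=[1.0, 1.0, 1.0])
```

In this toy data, page 0 ends up with the highest score because both users and both annotations point to it.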
11
Illustration of Quality Transition in the SPR
Algorithm
12
Dynamic Ranking with Social Information
  • Dynamic ranking method
  • RankSVM
  • Features

13
Experiment Data (1/2)
  • Delicious data
  • 1,736,268 web pages and 269,566 annotations were
    crawled from Delicious in May 2006.
  • Split compound annotations into standard words
    with the help of WordNet
  • Ex: java.programming → java, programming
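
A minimal Python sketch of the delimiter-based part of this splitting step is shown below. The function name is hypothetical, and the sketch does not use WordNet: it only handles compounds joined by punctuation, so a tag written with no delimiter at all would need the dictionary lookup mentioned on the slide.

```python
import re


def split_compound_annotation(annotation):
    """Split on runs of non-alphanumeric characters and lowercase the parts."""
    parts = re.split(r"[^0-9A-Za-z]+", annotation)
    return [part.lower() for part in parts if part]


words = split_compound_annotation("java.programming")
```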

14
Experiment Data (2/2)
  • Test set for dynamic ranking with social
    annotation
  • Manual query set (MQ)
  • 50 queries and their corresponding ground truths
    in Delicious data manually created by CS students
  • Pooling: judge the top 100 documents returned by
    Lucene
  • Automatic query set (AQ) from Open Directory
    Project (ODP)
  • Merging Delicious data with ODP and discarding
    ODP categories that contain no Delicious URLs
  • Randomly sample 3000 ODP categories and extract
    the category paths as the query set and the
    corresponding web pages
  • Ex: the path TOP/Computer/Software/Graphics is
    extracted as the query "Computer Software Graphics"
  • 5-fold cross validation for each query set
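
The query extraction and the cross-validation split can be sketched in Python as follows. The function names are hypothetical, and the round-robin assignment of queries to folds is an assumption; the slide does not say how the folds were formed.

```python
def category_path_to_query(path, root="TOP"):
    """Turn an ODP category path into a space-separated query string."""
    parts = [p for p in path.split("/") if p]
    if parts and parts[0] == root:
        parts = parts[1:]  # drop the root marker
    return " ".join(parts)


def k_fold_splits(items, k=5):
    """Yield (train, test) pairs; item i goes to fold i mod k (assumption)."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


query = category_path_to_query("TOP/Computer/Software/Graphics")
```

With the 50 queries of the manual set, each of the five folds holds 10 test queries and 40 training queries.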

15
Evaluation of Annotation Similarities
Table. Explored similar annotations based on
SocialSimRank
16
PageRank vs. Average Count
17
SPR vs. PageRank
  • SPR is normalized into a scale of 0-10 so that
    SPR and PageRank have the same number of pages in
    each grade from 0 to 10
  • Pages with the same PageRank value vary widely
    in their numbers of annotations and users
  • SPR successfully characterizes the web pages'
    popularity among web annotators
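
One way to obtain the normalization in the first bullet is rank-based grade matching, sketched below in Python. The slide states only the outcome (the same number of pages in each grade), so the method, the function name, and the toy data here are assumptions.

```python
from collections import Counter


def match_grade_distribution(spr_scores, pagerank_grades):
    """Assign 0-10 grades to SPR scores so grade counts match PageRank's.

    spr_scores: {page: float}; pagerank_grades: {page: int} over the same pages.
    The i-th lowest SPR score receives the i-th lowest PageRank grade.
    """
    pages_by_spr = sorted(spr_scores, key=spr_scores.get)
    grades_sorted = sorted(pagerank_grades.values())
    return dict(zip(pages_by_spr, grades_sorted))


# Hypothetical toy data for four pages.
spr = {"p1": 0.40, "p2": 0.05, "p3": 0.30, "p4": 0.25}
pagerank = {"p1": 3, "p2": 5, "p3": 5, "p4": 0}

spr_grades = match_grade_distribution(spr, pagerank)
same_histogram = Counter(spr_grades.values()) == Counter(pagerank.values())
```

A page can receive very different grades under the two measures (p2 has PageRank grade 5 but the lowest SPR grade here), while the number of pages per grade stays identical.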

18
Results of Dynamic Ranking (1/2)
  • Table. Comparison of MAP between similarity
    features
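
Here MAP refers to mean average precision, the evaluation measure, not the annotation-page matrix from the earlier slides. A short Python sketch of the standard definition follows; the document identifiers are made up for illustration and are not results from the talk.

```python
def average_precision(ranked_docs, relevant):
    """Mean of precision values at the ranks where relevant documents appear."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranked_docs, relevant_docs) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two hypothetical queries.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6"], {"d6"}),                    # AP = 1/2
]
map_score = mean_average_precision(runs)
```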

19
Results of Dynamic Ranking (2/2)
  • Figure. NDCG at K for comparison of baseline,
    baseline+TM, baseline+SSR, baseline+PR, and
    baseline+SPR on query set AQ
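
For readers unfamiliar with the measure, a small Python sketch of NDCG at K is given below. It uses the common exponential-gain form with a log2 discount, which is an assumption about the exact variant behind the figure. It also computes the ideal ordering from the returned list alone, which is a simplification.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k graded results."""
    return sum(
        (2 ** g - 1) / math.log2(i + 1)
        for i, g in enumerate(gains[:k], start=1)
    )


def ndcg_at_k(gains, k):
    """DCG divided by the DCG of the same gains in ideal (descending) order."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal else 0.0


# Hypothetical relevance grades of the top three results for one query.
perfect = ndcg_at_k([3, 2, 1], k=3)
shuffled = ndcg_at_k([1, 3, 2], k=3)
```

A ranking that already lists results in descending relevance scores 1.0, and any other ordering of the same results scores lower.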

20
Discussions
  • Several problems remain to be addressed
  • Annotation Coverage
  • User-submitted queries may not match any
    social annotation
  • Many web pages may have no annotations: 1) newly
    emerging web pages; 2) pages associated with a
    key page, since users tend to annotate key pages
    only; 3) uninteresting web pages.
  • Annotation Ambiguity
  • SSR may find terms similar to the query terms
    but fail to disambiguate terms that have more
    than one meaning
  • Annotation Spamming
  • As social annotation becomes more and more
    popular, the amount of spam could drastically
    increase in the near future

21
Conclusion
  • The problem of integrating social annotations
    into web search is studied.
  • We observed that social annotations could benefit
    web search in both similarity ranking and static
    ranking.
  • The experimental results showed that SSR can
    successfully find the latent semantic relations
    among annotations and SPR can provide the static
    ranking from the web annotators' perspective.
  • In the future, we plan to optimize the proposed
    algorithms and explore more sophisticated social
    features.