Title: A Study on Organizing Web Search Results
1A Study on Organizing Web Search Results
- Student: Shawn, Ching-Hsiang Tsai
- Advisor: Dr. Lee-Feng Chien
- WKD Lab
- Department of Information Management, NTU
- 2005/06/27
2Outline
- Motivation
- Related Work
- The Proposed Approach
- Topic Finding
- Search Result Organizing by User-defined Topic Classes
- Experiments
- Conclusions
- Demo!
3Motivation
- Web search results often lack good organization, so users must
spend effort examining the retrieved pages to identify
the relevant ones.
4Search Result Examples
5Existing Clustering Engines
- Some clustering engines, e.g., Vivisimo, try to organize search results into clusters
- Problems remain, e.g., clustered results that are hard to comprehend, clustering complexity, etc.
6Search Result Snippet
Title
Short description
Snippet
Link
7Goal
- To develop a new approach that can
- Provide a more comprehensive overview of the important topics in the search result
- Present the result in the manner the user prefers
- Help users browse quickly and formulate more effective searches
8LiveMotif
- A system that realizes the proposed approach
10Related Work on Search Result Clustering (SRC)
- Term-based clustering
- Document clustering
- Weak in comprehension
- Scatter/Gather (Hearst, SIGIR96)
- Term clustering
- STC (Zamir, WWW99)
- DisCover (Kummamuru, WWW04)
- Salient phrases ranking (Zeng, SIGIR04)
- Link-based clustering
- Co-citation and Companion
- Contents-Link (Wang, CIKM02)
11Previous Research Result
- Trends
- Document clustering → Term clustering
- Finding topics to form document clusters
- Why don't conventional clustering approaches work?
- Snippets are short
- Clustering algorithms may produce noise
- Determining a threshold for the number of groups is hard
12User Preferences
- Previous search result clustering lacks explanation
- In this paper, we take users' preferences into account
- The proposed approach allows search results to be organized by topic classes defined by users
14Problems to Solve
- Find meaningful topics
- Organize search results with user-defined topic classes
15The Proposed Approach
- Phase I: Topic Finding
- Topic extraction
- Topic selection
- Topic set formation
- Phase II: Search Result Organizing
- Classifier training
- Topic classification
17Topic Finding
- Topic extraction
- Source
- Title and short description of snippet
- Nouns
- Chinese: CKIP segmentation system
- English: POS tagging, n-grams (n < 3)
- Topic selection
- Ranking
- Remove low-ranked topics
- Remove redundant topics, merge similar topics
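The selection steps above can be sketched as follows. This is only an illustration, assuming candidate topics have already been extracted per snippet. The scoring (number of snippets mentioning a topic), the `top_n` cutoff, and the containment rule for redundancy are hypothetical choices, not the ranking used in the study.

```python
from collections import Counter

def rank_topics(snippet_topics, top_n=5):
    """Rank candidate topics by the number of snippets that mention them."""
    freq = Counter()
    for topics in snippet_topics:
        freq.update(set(topics))  # count each topic once per snippet
    # Highest frequency first; alphabetical order breaks ties.
    ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]  # cutting the list removes low-ranked topics

def drop_redundant(ranked):
    """Skip a topic whose words already appear inside a higher-ranked topic."""
    kept = []
    for topic, score in ranked:
        if any(f" {topic} " in f" {k} " for k, _ in kept):
            continue
        kept.append((topic, score))
    return kept

# Hypothetical candidate topics for four snippets of the query "apple".
snippets = [
    ["apple computer", "mac"],
    ["apple computer", "ipod"],
    ["apple", "fruit"],
    ["apple computer", "mac"],
]
print(drop_redundant(rank_topics(snippets)))
```

Here "apple" is dropped because the higher-ranked "apple computer" already contains it; a real system would more likely merge the two.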
18Topic Finding (cont'd)
- Topic set formation
- Two criteria
- Snippet coverage
- Topic compactness
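One way to trade off the two criteria is a greedy selection, sketched below. The slides do not define either criterion precisely, so this assumes coverage means the fraction of snippets matched by at least one chosen topic and compactness means keeping the topic set small. `target_coverage` and `max_topics` are hypothetical parameters.

```python
def form_topic_set(topic_to_snippets, total_snippets,
                   target_coverage=0.8, max_topics=10):
    """Greedily pick topics until enough snippets are covered or the set is full."""
    chosen, covered = [], set()
    remaining = dict(topic_to_snippets)
    while remaining and len(chosen) < max_topics:
        if len(covered) / total_snippets >= target_coverage:
            break  # coverage goal reached; stop to keep the set compact
        # Pick the topic adding the most uncovered snippets (ties: alphabetical).
        best = max(sorted(remaining), key=lambda t: len(remaining[t] - covered))
        gain = remaining.pop(best) - covered
        if not gain:
            break  # no remaining topic adds coverage
        chosen.append(best)
        covered |= gain
    return chosen, len(covered) / total_snippets

# Hypothetical mapping from topic to the ids of the snippets it appears in.
topics = {"mac": {0, 1, 2}, "ipod": {2, 3}, "fruit": {4}, "store": {0, 1}}
print(form_topic_set(topics, total_snippets=5))
```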
20User-defined Topic Classes
- The user describes his/her preferences with topic classes
- Query: National Taiwan University
- User preference 1
- Professor
- Department
- Project
- User preference 2
- Student
- Scholarship
- Program
21Example Challenges
User Preference 1
User Preference 2
22Search Result Organizing
- Adopt classification to label topic terms with user-defined topic classes, so that search results can be organized by users' preferences
- kNN
- Adopt the vector space model to describe the features of both topics and topic classes
- Term weighting: TFIDF
- Similarity: cosine of the angle between vectors
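The vector space model with TFIDF weighting and cosine similarity can be sketched as below. The exact weighting variant is not given in the slides, so the common form tf × log(N / df) is assumed, and the token lists are made up for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TFIDF vectors (term -> weight dicts) from token lists."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed weighting: raw term frequency times log inverse document frequency.
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical feature terms for two topics and one topic class.
docs = [
    ["professor", "research", "lab"],
    ["student", "scholarship"],
    ["professor", "lab", "project"],
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[2]), cosine(vecs[0], vecs[1]))
```

Vectors that share no terms get similarity 0, which is why short snippets usually need to be expanded with additional features before comparison.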
23Search Result Organizing (cont'd)
- Classifier training (Huang, WWW04)
- Get general concept for each topic class
- Get Nmax snippets (class objects) for each topic class
- Reformulate the general concepts to specific concepts
24Search Result Organizing (cont'd)
25Search Result Organizing (cont'd)
- Label each topic term with the most relevant topic class
Topic Class 1
Topic Class 2
Topic Class 3
sim
r
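The labeling step can be sketched with a small kNN classifier over class objects. This is a minimal illustration under assumptions: the class-object vectors, the value of `k`, and the similarity-weighted vote are hypothetical and may differ from the classifier in the study.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse term -> weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def knn_label(topic_vec, class_objects, k=3):
    """Label a topic with the class that wins a similarity-weighted vote
    among its k most similar class objects."""
    scored = sorted(
        ((cosine(topic_vec, vec), name) for name, vec in class_objects),
        reverse=True,
    )
    votes = Counter()
    for sim, name in scored[:k]:
        votes[name] += sim
    # Alphabetical order breaks ties between classes with equal votes.
    return max(sorted(votes), key=lambda c: votes[c])

# Hypothetical class objects: (topic class, feature vector) pairs.
class_objects = [
    ("school", {"professor": 1.0, "student": 1.0}),
    ("school", {"department": 1.0, "professor": 1.0}),
    ("travel", {"hotel": 1.0, "tour": 1.0}),
    ("travel", {"flight": 1.0, "hotel": 1.0}),
]
print(knn_label({"professor": 1.0, "lab": 1.0}, class_objects))
```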
27Experiments
- Clustering is really difficult to evaluate!
- Exp I: Topic Finding
- 1-1 Performance
- 1-2 Overlap
- 1-3 Coverage of top k topics
- Exp II: Search Result Organizing
- 2-1 Performance of topic classification
- 2-2 The effect of indirect relevance on the performance
- 2-3 Purity of each topic class
- 2-4 The effect of indirect relevance on the entropy
- 2-5 The effect of indirect relevance on the entropy and performance
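Purity and entropy, as used in Exp 2-3 to 2-5, are not defined in the slides. A sketch using the standard definitions, assumed here, where `labels` holds the true class of every topic assigned to one topic class:

```python
import math
from collections import Counter

def purity(labels):
    """Share of items that carry the most common true label (1.0 = pure)."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

def entropy(labels):
    """Shannon entropy in bits of the true-label distribution (0.0 = pure)."""
    n = len(labels)
    counts = Counter(labels)
    return abs(-sum((c / n) * math.log2(c / n) for c in counts.values()))

# Hypothetical true labels of the topics placed in the "government" class.
assigned = ["government", "government", "government", "travel"]
print(purity(assigned), entropy(assigned))
```

Higher purity and lower entropy both indicate that a topic class mostly contains topics that truly belong to it.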
28Exp I Set Up
- Query
- Selected from AltaVista query log
- Ambiguous
- apple, jaguar, saturn, java
- Named entity
- iraq, harry potter, honda, dell, disney, academia sinica, w3c, ibm
- General
- sports, jokes, resume, maps, news, wallpapers, photos, radio, business, science, movies, bank
- Specific
- mp3, yoga
- Topic
- 200 search result snippets per query (Google)
- Average 108 topics (76~140)
29Exp I Set Up (cont'd)
- Manually label standard answers
- 3 persons
- Very relevant: 2 points
- Relevant: 1 point
- Other: 0 points
- Total points ≥ 2 → RELEVANT
          Person 1  Person 2  Person 3  Total
Topic A   1         1         0         2      RELEVANT
Topic B   2         0         0         2      RELEVANT
Topic C   0         0         1         1
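The labeling rule can be written out as a short check. This sketch assumes the threshold is a total of at least 2 points, which is consistent with the example rows; the function name is hypothetical.

```python
def is_relevant(scores, threshold=2):
    """A topic is RELEVANT when the judges' points sum to the threshold or more."""
    return sum(scores) >= threshold

# Points from Person 1, Person 2, Person 3 for the three example rows.
rows = [(1, 1, 0), (2, 0, 0), (0, 0, 1)]
for scores in rows:
    print(scores, sum(scores), "RELEVANT" if is_relevant(scores) else "")
```

Note that a single "very relevant" vote is enough to make a topic relevant, as in the second row.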
30Exp I Set Up (cont'd)
- Topic selection methods
- TFIDF
- VTFIDF
- LCA
31Exp 1-1 Performance
32Exp 1-2 Overlap
33Exp 1-3 Coverage
34Exp 2 Set Up
- Query
- China, United States, Japan, Germany, India, Singapore, Malaysia, Taipei, California, Beijing and Kyoto
- Topic classes
- government
- travel
- business
- sports
- culture
- school
35Exp 2 Set Up (cont'd)
- Class-Topic pair
- Modified precision and recall
36Exp 2-1 Performance of topic classification
37Exp 2-1 Performance of topic classification (cont'd)
38Exp 2-2 The effect of indirect relevance on the performance
Totally direct
Totally indirect
39Exp 2-2 The effect of indirect relevance on the performance (cont'd)
40Exp 2-3 Purity of each topic class
41Exp 2-4 The effect of indirect relevance on the entropy
42Exp 2-5 The effect of the indirect relevance on the purity and performance
43User Study
Question  Evaluated Criteria
1a        Topics help in understanding the overview of the search result.
1b        Topics help in quickly browsing the search result snippets of interest.
1c        Topics are representative and distinct from one another.
2a        User-defined topic classes help in browsing search result snippets of different topic classes.
2b        User-defined topic classes help in understanding unknown topics.
2c        User-defined topic classes help in understanding the relationship among topics.
3         Generally speaking, LiveMotif can provide more help than Vivisimo.
44User Study Result
                  1a  1b  1c  2a  2b  2c  3
Agree (10~6)      33  31  26  33  32  24  26
Neutral (5)        0   3   6   2   3   9   4
Disagree (4~0)     2   2   1   0   0   2   4
46Conclusions
- Summary
- We provide a new approach for organizing search results
- Topic finding
- User-defined topic classes
- Contributions
- Comprehensive overview of the important topics in the search result
- Organize the result in the manner the user prefers
- Future Work
- Automatically suggest topic classes
- More analysis and comparison, e.g. type of topic
- The impact of user-defined classification on different types of query
- Apply to other non-Web document retrieval applications
48Demo!
- LiveMotif (http://livemotif.wkdlab.net/)
49Thank You