Title: Organizing Search Results
1 Organizing Search Results
- Susan Dumais
- Microsoft Research
2 Organizing Search Results
- Algorithms and interfaces that improve the effectiveness of search
- Beyond ranked lists
- Main goal: to support search
- Also information analysis and discovery
- Example applications
- SWISH: results classification
- GridViz: results summarization
- SIS: personal landmarks for context
3 Searching with Information Structured Hierarchically (SWISH)
- Collaborators
- Edward Cutrell, Hao Chen (Berkeley)
- Key Themes
- Going beyond long lists of results
- Classification algorithms
- UI techniques
- More about it
- http://research.microsoft.com/sdumais
4 Organizing Search Results
Query: jaguar
5 Web Directory
- LookSmart Directory Structure
- 400k pages, 17k categories, 7 levels
- 13 top-level categories, 150 second-level categories
- Top-level Categories
Automotive, Business & Finance, Computers & Internet, Entertainment & Media, Health & Fitness, Hobbies & Interests, Home & Family, People & Chat, Reference & Education, Shopping & Services, Society & Politics, Sports & Recreation, Travel & Vacations
6 SWISH System
- Combines the advantages of
- Directories: manually crafted structure but small
- Search engines: broad coverage but limited metadata
- Project search engine results to category structure
- Two main components
- Text classification models
- UI for integrating search results and structure
- Context (category structure) plus focus (search results)
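A minimal sketch of projecting ranked search results onto a category structure, for the query "jaguar". The function names, the keyword rules in `toy_classify`, and the sample snippets are all hypothetical; the keyword rules only stand in for a trained text classifier and are not the SWISH implementation.

```python
from collections import defaultdict

def group_by_category(results, classify):
    """Group ranked results under category labels, keeping rank order within each group."""
    groups = defaultdict(list)
    for rank, result in enumerate(results, start=1):
        groups[classify(result)].append((rank, result))
    # Categories with the most results come first; ties are broken by best rank.
    return sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[1][0][0]))

def toy_classify(snippet):
    """Hypothetical keyword rules standing in for a learned category model."""
    text = snippet.lower()
    if "car" in text or "dealer" in text:
        return "Automotive"
    if "game" in text or "atari" in text:
        return "Computers & Internet"
    return "Other"

# Hypothetical result snippets for the query "jaguar", in ranked order.
results = [
    "Jaguar cars official site",
    "Atari Jaguar game console FAQ",
    "Jaguar dealer locator",
    "Jaguar: big cat facts",
]

grouped = group_by_category(results, toy_classify)
for category, hits in grouped:
    print(category)
    for rank, snippet in hits:
        print(f"  {rank}. {snippet}")
```

The category headings provide the context and the ranked snippets under each heading provide the focus.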
7 SWISH Architecture
8 Learning Classification
- Support Vector Machine (SVM)
- Accurate and efficient for text classification (Dumais et al., Joachims)
- Model: weighted vector of words
- Automobile: motorcycle, vehicle, parts, automobile, harley, car, auto, honda, porsche
- Computers & Internet: rfc, software, provider, windows, user, users, pc, hosting, os, downloads ...
- Hierarchical models for LookSmart directory
- 1 model for top level, N models for second
- Very useful in conjunction w/ user interaction
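A rough sketch of how a "weighted vector of words" model scores a page, and how one top-level model plus per-category second-level models can be chained. A trained linear SVM reduces to this form at prediction time, but every weight, bias, and second-level category name below is made up for illustration; nothing here is learned from data or taken from the actual system.

```python
def score(weights, bias, words):
    """Linear decision value: sum of the weights of the distinct words present, plus a bias."""
    return sum(weights.get(w, 0.0) for w in set(words)) + bias

def best_label(models, words):
    """Pick the label whose linear model gives the highest decision value."""
    return max(models, key=lambda label: score(*models[label], words))

# Hypothetical (weights, bias) pairs; a real system would learn these from labeled pages.
top_level = {
    "Automotive": ({"car": 1.2, "honda": 0.9, "parts": 0.6}, -0.5),
    "Computers & Internet": ({"software": 1.1, "windows": 0.8, "pc": 0.7}, -0.5),
}

# One set of second-level models per top-level category (names are assumptions).
second_level = {
    "Automotive": {
        "Motorcycles": ({"harley": 1.5, "motorcycle": 1.3}, -0.4),
        "Parts": ({"parts": 1.4, "auto": 0.5}, -0.4),
    },
    "Computers & Internet": {
        "Software": ({"software": 1.3, "downloads": 0.9}, -0.4),
        "Hosting": ({"hosting": 1.6, "provider": 0.8}, -0.4),
    },
}

def classify(text):
    """Choose a top-level category first, then a second-level category beneath it."""
    words = text.lower().split()
    top = best_label(top_level, words)
    return top, best_label(second_level[top], words)

print(classify("Honda car parts and auto accessories"))
```

Only the second-level models under the chosen top-level category are evaluated, which keeps classification cheap enough to run as results arrive.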
9 User Interface Experiments
List Organization
Category Organization
10
11 Effect of Query Difficulty
12 SWISH Summary and Design Implications
- Text Classification
- Learn accurate category models
- Classify new web pages on-the-fly
- Organize search results
- User Interface
- Tightly couple search results with category structure
- User manipulation of presentation of category structure
13 Organizing Search Results, other examples
14 GridViz
- Collaborators
- George Robertson, Edward Cutrell, Jeremy Goecks (Georgia Tech)
- Key Themes
- Abstract beyond individual results
- Highly interactive interface to support understanding of trends and relationships
- More about it
- http://research.microsoft.com/sdumais
15 GridViz
- Summarize the results of a search
- Grid-based design
- Axes represent topic, time, people
- Cells encode frequency, recency
- Supports activities like
- What newsgroups are active (on topic x)?
- What people are active, authoritative (on topic x)?
- When did I last interact w/ people?
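One way to picture the grid summary: aggregate results into cells keyed by two attributes, and keep a count (frequency) and a latest date (recency) per cell. This is a sketch under assumptions; the function name, the field names, and the sample posts are hypothetical and do not come from GridViz itself.

```python
from collections import defaultdict
from datetime import date

def build_grid(items, row_key, col_key):
    """Aggregate items into cells keyed by (row, column), tracking frequency and recency."""
    grid = defaultdict(lambda: {"count": 0, "latest": None})
    for item in items:
        cell = grid[(item[row_key], item[col_key])]
        cell["count"] += 1
        if cell["latest"] is None or item["date"] > cell["latest"]:
            cell["latest"] = item["date"]
    return dict(grid)

# Hypothetical search results: newsgroup posts matching a query.
posts = [
    {"person": "alice", "topic": "svm", "date": date(2003, 5, 1)},
    {"person": "alice", "topic": "svm", "date": date(2003, 6, 12)},
    {"person": "bob", "topic": "svm", "date": date(2003, 4, 20)},
    {"person": "bob", "topic": "ui", "date": date(2003, 6, 2)},
]

# Rows are people and columns are topics; any pair of attributes could be used as axes.
grid = build_grid(posts, row_key="person", col_key="topic")
for (person, topic), cell in sorted(grid.items()):
    print(person, topic, cell["count"], cell["latest"])
```

A cell with a high count and a recent date answers "who is active on topic x", and the latest date alone answers "when did I last interact with this person".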
16 GridViz Demo
17 User Interface Experiments
18 GridViz Summary
- Abstracting beyond individual results
- Highly interactive interface
- Grid-based design
- Axes represent people, topic, time
- Cells encode frequency, recency
- Preliminary but promising
19 Stuff I've Seen (SIS)
- Collaborators
- Edward Cutrell, Raman Sarin, JJ Cadiz, Gavin Jancke, Daniel Robbins, Merrie Ringel (Stanford)
- Key Themes
- Your content
- Information re-use
- Integration across sources
- More about it
- internal for now
20 Search Today
- Many locations, interfaces for finding things (e.g., web, mail, local files, help, history, intranet)
21 Search with SIS
- Unified index of stuff you've seen
- Unify access to information regardless of source: mail, archives, calendar, files, web pages, etc.
- Full-text index of content plus metadata attributes (e.g., creation time, author, title, size)
- Automatic and immediate update of index
- Rich UI possibilities, since it's your content
- Architecture
- Client side indexing and storage
- Built using MS Search components
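The idea of one index over content and metadata from several sources can be sketched with a toy in-memory inverted index. This is only an illustration: the class, its methods, and the sample items are hypothetical, and it does not use or imitate the MS Search components that SIS is built on.

```python
from collections import defaultdict
from datetime import datetime

class UnifiedIndex:
    """Toy in-memory index over items from any source (mail, files, web pages)."""

    def __init__(self):
        self.items = {}                   # item id -> metadata dict
        self.postings = defaultdict(set)  # word -> ids of items containing it

    def add(self, item_id, source, text, **metadata):
        """Index an item as soon as it is seen; title and author are searchable too."""
        self.items[item_id] = {"source": source, **metadata}
        searchable = " ".join([text, metadata.get("title", ""), metadata.get("author", "")])
        for word in searchable.lower().split():
            self.postings[word].add(item_id)

    def search(self, query):
        """Return ids of items containing every query word, newest first."""
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(hits, key=lambda i: self.items[i]["date"], reverse=True)

# Hypothetical items from three different sources.
index = UnifiedIndex()
index.add("m1", "mail", "budget review notes",
          author="Pat", title="Q3 budget", date=datetime(2003, 6, 1))
index.add("f1", "file", "budget spreadsheet draft",
          author="Lee", title="budget.xls", date=datetime(2003, 6, 10))
index.add("w1", "web", "conference travel page",
          title="SIGIR travel", date=datetime(2003, 5, 20))

print(index.search("budget"))      # matches mail and file, regardless of source
print(index.search("pat budget"))  # author metadata narrows the match
```

Because each item is indexed when `add` is called, a new mail message or file is searchable immediately, with no separate batch step.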
22 SIS Demo
23 SIS Alpha Observations
- 800 internal users
- Usage logs (incl. different interfaces), survey data
- File types opened
- 76% Email
- 14% Web pages
- 10% Files
- Age of items accessed
- 7% today
- 22% within the last week
- 46% within the last month
24 SIS Alpha Observations
- Use of other search tools
- Non-SIS search for web, email, and files decreases
- Importance of people
- 25% of the queries involve people's names
- Importance of time
- Date by far the most popular sort field, followed by rank, author, title
- Even when rank is the default
25 SIS UI Innovations: Timeline w/ Landmarks
- Importance of time
- Timeline interface
- Contextualize results using important landmarks as pointers into human memory
- General: holidays, world events
- Personal: important photos, appointments
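A small sketch of contextualizing dated results with landmarks: each result is paired with the most recent landmark on or before its date. The function name and all sample landmarks and results are hypothetical; the real timeline interface may select and display landmarks differently.

```python
from bisect import bisect_right
from datetime import date

def annotate_with_landmarks(results, landmarks):
    """Pair each dated result with the most recent landmark on or before it (or None)."""
    ordered = sorted(landmarks)              # (date, label) tuples, oldest first
    dates = [d for d, _ in ordered]
    annotated = []
    for when, title in sorted(results, reverse=True):  # newest result first
        pos = bisect_right(dates, when)
        landmark = ordered[pos - 1][1] if pos else None
        annotated.append((when, title, landmark))
    return annotated

# Hypothetical landmarks: one general, one personal.
landmarks = [
    (date(2003, 1, 1), "New Year's Day"),
    (date(2003, 3, 14), "Project review"),
]

# Hypothetical search results with the date each item was seen.
results = [
    (date(2003, 2, 3), "Mail: draft agenda"),
    (date(2003, 3, 20), "File: review slides"),
    (date(2002, 12, 15), "Web: venue page"),
]

timeline = annotate_with_landmarks(results, landmarks)
for when, title, landmark in timeline:
    print(when, title, "| after:", landmark)
```

A result that predates every landmark gets `None`, so the interface can show it without a memory cue.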
26 Milestones in Time Demo
27 Milestones in Timeline
28 SIS Summary
- Unified index of stuff you've seen
- Fast access to full-text and metadata, from heterogeneous sources
- Automatic and immediate update of index
- Rich UI possibilities
- Next steps
- Better support for tagging - flatland
- Implicit queries for finding related info, and identifying Stuff I Should See
- Integration with richer activity-based info, Eve
29 Organizing Search Results
- Algorithms and interfaces to improve search
- Use structure and context
- Examples and key themes
- SWISH: grouping
- GridViz: abstraction
- SIS: personal content and landmarks
- Also
- Important attributes: people, topics, time
- Interaction
- Evaluation
- More information
- http://research.microsoft.com/sdumais
- sdumais_at_microsoft.com
- Christopher Lee of (SIG)IR
- http://www.cdvp.dcu.ie/SIGIR/index.html