Title: About
2 About
- Vivisimo Inc. is an enterprise software company
  - Creates innovative software to access and cluster the world's information, for better search and discovery
  - Founded June 2000 in Pittsburgh, 20 employees, sustained profitability
  - Organic growth, no venture capital funding
  - About 80 customers: Cisco, JNJ, NSA, JAMA, Micropatent, AOL, AAAS, etc.
- Clusty.com, acclaimed web search engine
  - Launched on Sept 30 of last year
- Raul Valdes-Perez, CEO & co-founder
  - Last stop: Carnegie Mellon Computer Science Dept (1986-2000)
3 Problem: Information Overload & Overlook
- Information Overload and Information Overlook
  - Look for information → get too much back
  - Most people handle overload by overlooking most information
- How to Get People to Overlook Less Information?
  - Provide categorized information!
4 Categorization at Creation Time
- Also Known as Taxonomy Building
  - Create a controlled vocabulary (taxonomy) of categories
  - Index new documents into the taxonomy
  - Update the taxonomy over time
  - At search time, group results into most frequent categories
- Examples
  - National Library of Medicine's Medical Subject Headings (MeSH)
    - Developed over decades
    - Human indexers assign MeSH terms to new articles
    - Ten people refine MeSH over time; recently added SARS
  - Yahoo directory, Open Directory
- Why Not Widespread in Electronic World?
  - Taxonomy model is costly & complicated
  - End-users need to meta-search, which undermines taxonomies
  - Much of electronic world still in the "stack of books on the floor" mode
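
The search-time step of the taxonomy approach (group results into their most frequent categories) can be sketched roughly as follows. This is only an illustration: the document ids, the category labels, and the function name `group_by_top_categories` are hypothetical and are not drawn from MeSH or any real index.

```python
from collections import Counter

# Hypothetical index built at creation time: document id -> assigned categories.
DOC_CATEGORIES = {
    "doc1": ["infections", "vaccines"],
    "doc2": ["vaccines"],
    "doc3": ["public health"],
    "doc4": ["vaccines", "public health"],
}

def group_by_top_categories(result_ids, doc_categories, top_k=2):
    """Group search results under the top_k most frequent categories among them."""
    counts = Counter(
        cat for doc_id in result_ids for cat in doc_categories.get(doc_id, [])
    )
    groups = {}
    for cat, _ in counts.most_common(top_k):
        # Keep the original result order inside each category.
        groups[cat] = [d for d in result_ids if cat in doc_categories.get(d, [])]
    return groups

groups = group_by_top_categories(["doc1", "doc2", "doc3", "doc4"], DOC_CATEGORIES)
for category, docs in groups.items():
    print(category, docs)
```

Note that all the work of assigning categories happens before any search is run, which is where the cost of the taxonomy model comes from.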
9 New Way: Clustering
- To cluster means to form groups
  - Stars, candy, disease outbreaks, computers
- Quick linguistic/statistical analysis of search results
- Forms groups based on the major themes in the top N results
- Can leverage taxonomies, as shown at http://Clustermed.info
- Software based on 6 years of research & development funded by National Science Foundation
- Builds on many research-years by others in academia & industry
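
As a rough illustration of grouping the top N results by theme on the fly, the toy sketch below labels groups with the terms shared by the most result snippets. It is a deliberately simple stand-in, not Vivisimo's actual algorithm; the stopword list, the sample snippets, and the function names are all assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}

def tokenize(text):
    """Return the set of lowercase words in text, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def cluster_snippets(query, snippets, max_clusters=2, min_size=2):
    """Group snippet indices under the non-query terms they share most often."""
    query_terms = tokenize(query)
    term_sets = [tokenize(s) - query_terms for s in snippets]
    doc_freq = Counter(term for terms in term_sets for term in terms)
    # Most frequent terms first; ties broken alphabetically for determinism.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    clusters = {}
    for term, freq in ranked:
        if freq < min_size or len(clusters) == max_clusters:
            break
        clusters[term] = [i for i, terms in enumerate(term_sets) if term in terms]
    return clusters

snippets = [
    "Jaguar cars for sale",
    "Jaguar cars and dealers",
    "Jaguar habitat in the rainforest",
    "Rainforest habitat of the jaguar cat",
    "Jaguar cars price guide",
]
clusters = cluster_snippets("jaguar", snippets)
print(clusters)
```

Nothing is indexed or categorized ahead of time here: the groups are derived only from the snippets returned for this one query.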
10 End User Benefits of Clustering
User Benefits
1 See what's available: At a glance, the folders show you an information landscape.
2 See much further: Following our interests, we navigate to low-ranked but interesting search results, which we're unlikely ever to see otherwise.
3 See similar information together: We don't have to be satisfied with the first reference we come across. We can compare several and pick the best one.
Key Advantages
1 Works on the fly: no need for pre-processing
2 Spontaneous categories: no need to pre-define them
11 Question: Clustering, Query Refinement, Personalization, or Entity Extraction?
- Query Refinement: show alternative queries to the user
  - Pro: little computation, don't need search results as input
  - Con
    - History-based, not matched to search results, relevance is a challenge
    - Click on a query, your screen disappears, absorb new context
- Personalization
  - Pro: it's about you!
  - Con
    - People's interests aren't static (Olympics, Oscars, Tsunamis, etc.)
    - Shared computers lead to shared personalization
- Entity Extraction
  - Pro: nouns are informative and familiar
  - Con
    - Verbs matter (and adjectives, adverbs, etc.)
    - Ungrammatical search-result descriptions
13 Question: Index Everything or Meta-Search?
- Index Everything
  - Pros: centralization, universal ranking, simplicity
  - Cons: centralization, universal ranking, complexity
- Meta-Search
  - Pros: decentralization, can leverage
    - Partners, secondary content, free/government search engines
    - Ranking methods can leverage voting
    - Publishers & aggregators can create vertical destination sites (vortals)
  - Cons
    - Need to create relationships with partners
    - Need meta-search software
    - Don't have a single ranking score for everything
- Is Meta-Search a Practical Necessity?
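
One way a meta-search ranking could "leverage voting" without a single universal score is a Borda-style count over each engine's ranked list. The sketch below is an assumption about what such voting might look like, not a description of any particular product; the engine lists, URLs, and the function name `merge_by_voting` are hypothetical.

```python
from collections import defaultdict

def merge_by_voting(ranked_lists):
    """Merge ranked result lists with a Borda-style count.

    Each engine gives a result (list length - position) points, so higher-ranked
    results and results returned by several engines score more.
    """
    scores = defaultdict(int)
    for results in ranked_lists:
        n = len(results)
        for position, url in enumerate(results):
            scores[url] += n - position
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical ranked results from three source engines.
engine_a = ["x.example/1", "y.example/2", "z.example/3"]
engine_b = ["y.example/2", "x.example/1", "w.example/4"]
engine_c = ["y.example/2", "z.example/3"]

merged = merge_by_voting([engine_a, engine_b, engine_c])
print(merged)
```

Only rank positions are used, so the engines' internal relevance scores never need to be comparable with one another.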
15 Some Early Adopters of Clustering
16 Old Way of Thinking
- If a search returns many hits, then you're a novice
- Most patrons/users are novice searchers!
- New way
  - Purposely start out with broad searches
  - Learn something by viewing the results
  - Use what you learned to search again
  - Find what you're looking for
  - or discover what you'd normally miss
17 Clustering Will Go Mainstream in 2005-06
- Why? Critical Mass of Installations and Buzz
  - AOL & Clusty reach about 10% of web searches
- Unbeatable Value Proposition
  - Instant organized information without taxonomies
  - Can leverage taxonomies where they exist
  - No surprises
- Real End-User Benefits
  - Users can consider lots more info with the same effort
  - Makes people smarter!
- Transform online world of information
  - Less like 1-person used bookstore with piles of books
  - Look more like Barnes & Noble or Borders