Metasearch Engines - PowerPoint PPT Presentation

About This Presentation

Title:

Metasearch Engines

Slides: 31

Provided by: hoz

Transcript and Presenter's Notes

1
Metasearch Engines
2
Course Outline (recap)

Introduction and the MPEG standards
The research issues in MPEG-7
Introduction to speech processing for multimedia
Introduction to statistical pattern recognition
Media indexing and retrieval
Past, present and future
Content-based retrieval (CBR)
Introduction to concept-based retrieval
Metasearch engines
Human-computer interface
Human body movement analysis
Human emotion recognition
Media transmission over peer-2-peer networks
Dynamic resource allocation in media
transmission

3
This class

What are metasearch engines
Common features
Some metasearch engines
Research issues

4
Searching more than one database

Users find more good documents, but must:
Learn how to use each search engine
Combine results

5
Metasearch Engines

Metasearch engines search many databases in
parallel
Combine results

6
Metasearch engine
Read query
Choose databases
For each chosen database, translate and send the query
7
Metasearch engine
Accept search results
Select a subset from each
Merge and display results
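
The steps on the two slides above (read query, choose databases, translate and send, accept results, select a subset, merge) can be sketched as a single function. This is only an illustrative sketch: the names `metasearch`, `backends`, `translators`, `choose` and `per_engine` are hypothetical, each backend is assumed to be a plain callable that returns a ranked list of URLs, and merging is reduced to concatenation with duplicates dropped.

```python
from typing import Callable, Dict, List

# Hypothetical types: a backend takes a query string and returns ranked URLs;
# a translator rewrites the user's query into one engine's own syntax.
Backend = Callable[[str], List[str]]
Translator = Callable[[str], str]


def metasearch(
    query: str,
    backends: Dict[str, Backend],
    translators: Dict[str, Translator],
    choose: Callable[[str, List[str]], List[str]],
    per_engine: int = 10,
) -> List[str]:
    """Run one query through the chosen backends and merge the hits."""
    chosen = choose(query, list(backends))          # choose databases
    merged: List[str] = []
    seen = set()
    for name in chosen:
        translate = translators.get(name, lambda q: q)
        hits = backends[name](translate(query))     # translate and send query
        for url in hits[:per_engine]:               # select a subset from each
            if url not in seen:                     # merge, skipping repeats
                seen.add(url)
                merged.append(url)
    return merged
```

A real system would query the backends in parallel and rank the merged list; both are left out here to keep the control flow visible.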
8
Advantages

A uniform query language
Choose best databases for query
Save users time
Provide better retrieval results

9
Common features

Search most of the popular search engines.
Fast, because they use "parallel" (i.e.,
simultaneous) querying and have high-speed
processors
Allow you to set length of wait time
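
Parallel querying with a user-set wait time might look roughly like the sketch below. It assumes each search engine is wrapped in a hypothetical Python callable; `query_in_parallel`, `max_wait`, and the three demo backends are invented for illustration. Engines that have not answered when the wait time runs out, or that raise an error, are simply left out of the result.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def query_in_parallel(backends, query, max_wait):
    """Send `query` to every backend at once; keep answers that arrive in time."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(backends)))
    futures = {pool.submit(fn, query): name for name, fn in backends.items()}
    done, _late = wait(futures, timeout=max_wait)   # block at most max_wait seconds
    results = {}
    for fut in done:
        if fut.exception() is None:                 # skip engines that failed
            results[futures[fut]] = fut.result()
    pool.shutdown(wait=False)                       # do not wait for slow engines
    return results


# Hypothetical stand-ins for real search engines.
def fast_engine(q):
    return [q + "-hit"]


def slow_engine(q):
    time.sleep(1.0)                                 # answers after the deadline
    return ["too-late"]


def broken_engine(q):
    raise RuntimeError("engine unavailable")
```

With `max_wait=0.3`, only `fast_engine` contributes results; the slow engine keeps running in its worker thread but its answer is ignored.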

10
Differences

How results are compiled when reported
How and whether they can handle complex searches
Whether you can customize the search strategy

11
How results are compiled when reported

Some report the results from each search engine
in sequence
Others sort the results, eliminating duplicates.
In some you can specify how results are sorted

12
How and whether they can handle complex searches

Some allow phrase searching.
Some allow Boolean operators (especially OR and
NOT)
Some strip out quotations or Boolean operators,
or create garbage by passing them through as
search terms.
Few allow you to request truncation.
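
To avoid passing quotation marks or Boolean operators through as garbage search terms, a metasearch engine can rewrite the query for each target according to what that target supports. The sketch below is one assumed approach: `translate` and its two capability flags are hypothetical, and when Boolean operators are unsupported it drops a `NOT` together with the term it negates so that the excluded word does not become a positive search term.

```python
import re

BOOLEAN_WORDS = {"AND", "OR", "NOT"}


def translate(query, supports_phrases, supports_boolean):
    """Rewrite a query for an engine with the given (assumed) capabilities."""
    # Split into quoted phrases and bare words.
    tokens = re.findall(r'"[^"]*"|\S+', query)
    out = []
    skip_next = False
    for tok in tokens:
        if skip_next:                      # term that followed an unsupported NOT
            skip_next = False
            continue
        if len(tok) >= 2 and tok.startswith('"') and tok.endswith('"'):
            # Keep the phrase, or fall back to its bare words.
            out.append(tok if supports_phrases else tok[1:-1])
        elif tok in BOOLEAN_WORDS:
            if supports_boolean:
                out.append(tok)
            elif tok == "NOT":
                skip_next = True           # drop the negated term as well
        else:
            out.append(tok)
    return " ".join(out)
```

For example, `translate('"meta search" AND engines NOT spam', False, False)` yields `meta search engines`.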

13
Whether you can customize the search strategy

In some you have more flexibility to vary time
limits and choose how results are reported.
Some let you specify which search tool databases
are queried and in what order.

14
Metacrawler

No choice
Fast searches
Sends query to AltaVista, Excite, Infoseek,
LookSmart, Lycos, The Mining Co., WebCrawler,
Yahoo!
Identifies and removes duplicates
Consolidates results in one large list, ranked by
a "vote"

15
Metacrawler

Merges results by first normalizing all the
scores to values 0 to 1000
Then adding the scores of multiply retrieved
documents
Query ALL terms (AND), ANY terms (OR), or
exact PHRASE. Use +/- and " " around phrases.
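
The merging rule described above (normalize each engine's scores to the range 0 to 1000, then add the scores of documents retrieved more than once) can be sketched as follows. The slide does not say how the normalization is done; this sketch assumes each engine's scores are scaled so that its top hit gets 1000. The function names and the input layout (engine name mapped to a dictionary of URL and raw score) are hypothetical.

```python
from collections import defaultdict


def normalize_scores(hits, top=1000.0):
    """Scale one engine's raw scores so its best hit scores `top` (assumed rule)."""
    if not hits:
        return {}
    best = max(hits.values())
    if best <= 0:
        return {url: 0.0 for url in hits}
    return {url: top * score / best for url, score in hits.items()}


def merge_by_vote(per_engine):
    """Sum normalized scores per URL; documents found by several engines rise."""
    totals = defaultdict(float)
    for hits in per_engine.values():
        for url, score in normalize_scores(hits).items():
            totals[url] += score
    # Highest total first; ties broken by URL for a stable order.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
```

A document ranked second by two engines can therefore outrank a document ranked first by only one, which is the "vote" effect.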

16
Inference Find!

Queries 6 search engines; currently uses
WebCrawler, Yahoo!, Lycos, AltaVista, Infoseek,
and Excite.
Results are merged and clustered; redundancies
are removed.
Default is AND (can use OR and NOT; these are
ignored in tools that don't support them)
Allows phrases in

17
Internet Sleuth www.isleuth.com

Users may search for appropriate database (3000
available)
Will search for appropriate database
A search for databases with pictures (or recipes)
finds a variety of databases
Then users choose ones to search
Does not merge results

18
Dogpile

AltaVista, Excite, Excite Subj. Guide, GoTo.com,
Infoseek, Lycos, Lycos' a2z, Magellan, The
Mining Co., PlanetSearch, Thunderstone,
WebCrawler, What-U-Seek, Yahoo!

19
Dogpile

List of hits after each search tool queried.
Duplicates may occur
If 10 or more hits found among first 3 tried,
option to search more.
Click on a link to a search engine

20
Cyber 411

Fast. Contacts 15 search engines for each query.
Query one word or phrase
Does not merge results

21
SavvySearch www.savvysearch.com

(Colorado State, Howe)
Search engines selected based on:
Query text
Sources and types of information selected
Estimated Internet traffic
Anticipated response time
The load on the CSU computer

22
Research issues

How to choose best DBs
How to merge results

23
Choosing the best databases automatically

Depends on available information
Different researchers and systems make different
assumptions
Choose DB X if it can provide good documents and
if the user's query can be executed on it

24
Stored queries/relevancy (Voorhees)

Queries with relevant results are stored
New query compared to stored queries
Use previous results to select databases and
the number of documents to merge from each

25
DB summary index (Callan)

Collection information is available
Commonly used keywords and their document frequencies (dfs)
Query is compared to databases
Similarity used to select databases and
the number of documents from each
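
One way to picture this: score each database by comparing the query with the database's stored keyword summary, then split the total number of requested documents across databases in proportion to those scores. The sketch below is a deliberately simplified stand-in and not the published method; `db_similarity` (a sum of per-term document-frequency fractions) and `allocate` (largest-remainder rounding) are hypothetical names and assumed rules.

```python
import math


def db_similarity(query_terms, dfs, n_docs):
    """Assumed score: sum over query terms of the fraction of docs containing it."""
    if n_docs <= 0:
        return 0.0
    return sum(dfs.get(term, 0) / n_docs for term in query_terms)


def allocate(scores, total):
    """Split `total` documents across databases in proportion to their scores."""
    score_sum = sum(scores.values())
    if score_sum <= 0:
        return {name: 0 for name in scores}
    raw = {name: total * s / score_sum for name, s in scores.items()}
    alloc = {name: int(math.floor(v)) for name, v in raw.items()}
    leftover = total - sum(alloc.values())
    # Hand out the remaining documents to the largest fractional parts.
    order = sorted(raw, key=lambda n: (-(raw[n] - alloc[n]), n))
    for name in order[:leftover]:
        alloc[name] += 1
    return alloc
```

A database whose summary shares no terms with the query gets a score of zero and contributes no documents, which is how selection and allocation fall out of the same number.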

26
GlOSS

Assumes knowledge of each database's term document frequencies (dfs)
Computes the probability of finding a document
containing all of the query terms in database
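
A minimal sketch of such an estimate, assuming query terms occur independently of one another within a database: the probability that a document contains every query term is the product of df/N over the terms, and multiplying by N gives the expected number of matching documents. The independence assumption, the function names, and the summary layout (database name mapped to a pair of document count and term dfs) are assumptions made for illustration.

```python
def prob_all_terms(dfs, n_docs, query_terms):
    """P(a document contains every query term), assuming term independence."""
    if n_docs <= 0:
        return 0.0
    p = 1.0
    for term in query_terms:
        p *= dfs.get(term, 0) / n_docs   # unseen terms give probability 0
    return p


def rank_databases(summaries, query_terms):
    """Order databases by the expected number of documents matching all terms."""
    scored = []
    for name, (n_docs, dfs) in summaries.items():
        expected = n_docs * prob_all_terms(dfs, n_docs, query_terms)
        scored.append((name, expected))
    return sorted(scored, key=lambda s: (-s[1], s[0]))
```

For a database of 100 documents with dfs of 50 and 20 for the two query terms, the estimate is 100 × 0.5 × 0.2 = 10 expected matches.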

27
Merging retrieval results

Similarity values may not be available
Similarity values may not be comparable
Should similarity be modified when documents are
retrieved by more than one search engine?

28
Same search engine, different databases