Quad Search: A novel metasearch engine (http://cheetah.csd.auth.gr/lakritid)
Leonidas Akritidis (1), George Voutsakelis (2), Dimitrios Katsaros (1,2), Panayiotis Bozanis (2)
(1) Data Engineering Lab, Dept. of Informatics, Aristotle Univ., Thessaloniki, Hellas
(2) Computer & Communication Engineering Dept., Univ. of Thessaly, Volos, Hellas
11th Panhellenic Conference of Informatics,
Patras, Hellas, 18-20/05/2007
Introduction
- Single Search Engines
- Maintenance of a document database
- Low Web Coverage
- Medium Scalability
- Paid Listings
- Metasearch Engines
- Effortless invocation of multiple search engines
- No document database
- Increased Web Coverage
- Improved retrieval effectiveness
Metasearch Engines
Metasearch engines use the document databases maintained by their component search engines.
Rank Aggregation
- What is Rank Aggregation?
  - The collected data is merged into a final, unordered list
  - A rank aggregation procedure proposes a way to sort this list
- Why do we need Rank Aggregation?
  - To provide robust search on the Web
  - For meta-searching
  - To address the spam problem
Rank Aggregation
What is Rank Aggregation?
Rank Aggregation Methods
- Unweighted Borda Count
- Spearman's Footrule
- Kendall's Tau
- Markov Chains
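The slide only names these methods. As an illustration of the simplest one, here is a minimal Python sketch of one common variant of the unweighted Borda count; the point scheme (in a list of length k, position p earns k - p + 1 points and absent items earn nothing) is an assumption, since other Borda variants exist.

```python
from collections import defaultdict


def borda_count(ranked_lists: list[list[str]]) -> list[tuple[str, int]]:
    """Unweighted Borda count over top-k lists (one common variant).

    In a list of length k, the item at 1-based position p earns k - p + 1
    points; an item missing from a list earns 0 points from that list.
    """
    scores: dict[str, int] = defaultdict(int)
    for ranking in ranked_lists:
        k = len(ranking)
        for position, item in enumerate(ranking, start=1):
            scores[item] += k - position + 1
    # Highest total first; ties broken alphabetically for a deterministic order.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    engines = [
        ["a.com", "b.com", "c.com"],
        ["b.com", "a.com", "d.com"],
        ["b.com", "c.com", "a.com"],
    ]
    print(borda_count(engines))
    # -> [('b.com', 8), ('a.com', 6), ('c.com', 3), ('d.com', 1)]
```

Every engine contributes equally here ("unweighted"); a weighted variant would multiply each list's points by a per-engine factor.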
KE Method
Description: Each result is called a candidate. Each candidate receives a score (weight) according to the KE formula, which is built from the following quantities:
- r(i): the candidate's rank in the i-th engine
- n: the number of the candidate's appearances (the number of top-k lists that contain it)
- m: the number of invoked search engines
- k: the length of the top-k list
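The KE formula itself is not reproduced in this text, so the following Python sketch stops at gathering its inputs: for every candidate it records the ranks r(i) it received and its number of appearances n, together with m and k for the whole query. The names `Candidate` and `collect_ke_inputs` are hypothetical, and taking k as the length of the longest list assumes all engines return top-k lists of the same length.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """Per-candidate inputs of the KE weight, as defined on the slide."""
    ranks: list[int] = field(default_factory=list)  # r(i), only for engines that returned it

    @property
    def n(self) -> int:
        """Number of top-k lists in which the candidate appears."""
        return len(self.ranks)


def collect_ke_inputs(top_k_lists: list[list[str]]) -> tuple[dict[str, Candidate], int, int]:
    """Return the candidates plus m (number of engines) and k (top-k length)."""
    m = len(top_k_lists)
    # Assumption: every engine returns the same k; the longest list is used.
    k = max((len(results) for results in top_k_lists), default=0)
    candidates: dict[str, Candidate] = {}
    for results in top_k_lists:
        for rank, url in enumerate(results, start=1):
            candidates.setdefault(url, Candidate()).ranks.append(rank)
    return candidates, m, k


if __name__ == "__main__":
    cands, m, k = collect_ke_inputs([["a.com", "b.com", "c.com"], ["b.com", "a.com", "d.com"]])
    for url, cand in cands.items():
        print(url, cand.ranks, cand.n)
    print("m =", m, "k =", k)
```

With these quantities in hand, any weight that combines them (the KE formula among them) can be applied per candidate.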
Antispam Version of the KE Method
- We say that a search engine has been spammed by a page when it ranks the page too highly with respect to the other pages, according to the view of a typical user
- We try to constrain this phenomenon by proposing the antispam version of the KE Method, which can be described by the following pseudocode:
  - Find which items appear in more than half of the result lists (let the number of these items be c)
  - Apply the KE Method to these items
  - Position them in the results list, starting at rank 1
  - Apply the KE Method to the rest of the items
  - Position them in the results list, starting at rank c+1
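The two-stage pseudocode above can be sketched in Python as follows. Because the KE formula is not reproduced here, the sketch takes the per-item weight as a parameter and assumes that a lower weight means a better position; the `mean_rank` function in the usage example is a hypothetical stand-in, not the KE weight.

```python
from typing import Callable, Hashable, Sequence


def antispam_rank(top_k_lists: Sequence[Sequence[Hashable]],
                  weight: Callable[[Hashable], float]) -> list:
    """Order candidates following the two-stage antispam pseudocode.

    `weight` stands in for the KE score; this sketch assumes lower is better.
    """
    m = len(top_k_lists)
    appearances: dict = {}
    for results in top_k_lists:
        for item in dict.fromkeys(results):  # count each list once, keep order
            appearances[item] = appearances.get(item, 0) + 1
    # Stage 1: the c items found in more than half of the lists.
    majority = [item for item, n in appearances.items() if n > m / 2]
    rest = [item for item, n in appearances.items() if n <= m / 2]
    # Stage 1 fills ranks 1..c; stage 2 continues from rank c + 1.
    return sorted(majority, key=weight) + sorted(rest, key=weight)


if __name__ == "__main__":
    lists = [
        ["spam.com", "a.com", "b.com"],
        ["a.com", "b.com", "c.com"],
        ["b.com", "a.com", "d.com"],
    ]

    def mean_rank(item):  # hypothetical stand-in for the KE weight
        ranks = [lst.index(item) + 1 for lst in lists if item in lst]
        return sum(ranks) / len(ranks)

    print(antispam_rank(lists, mean_rank))
    # -> ['a.com', 'b.com', 'spam.com', 'c.com', 'd.com']
```

Ranked by mean rank alone, `spam.com` (ranked first by a single engine) would come out on top; the two-stage split pushes it below every item that a majority of the engines agree on.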
Existing Metasearch Engines
- Features
  - Metasearch engines have a bad reputation, and many users avoid them for the following reasons:
    - Slow result retrieval
    - Slow result processing
    - Unreliable or no ranking algorithms
    - Many sponsored results and paid listings
    - They do not evolve
Quad Search
- Features
  - Quad Search tries to discard the features that made metasearch engines infamous
  - Evolves constantly
  - Supports Web, Image, Scientific, Video, Audio and News searches, with options and advanced features
  - Implements the KE Method and other algorithms
  - Provides user-specified options
  - Fast and accurate results
  - Free of sponsored results and paid listings
  - Friendly user interface and detailed search hints
Web Platform
- Technical Specifications
  - Quad Search is hosted on the Cheetah server at the Aristotle University of Thessaloniki
  - The server setup includes:
    - Apache 2.0.40 HTTP server
    - PHP 5.1.6
    - MySQL 5 database server
    - Several PHP extensions
    - Zend Optimizer for faster script execution
Quad Search's Architecture
- Compartments and Modules
  - The web search part of Quad Search consists of the following modules:
    - User Interface
    - Database Selector
    - Options Page
    - Quad Bot
    - Object Builder
    - Classification Module
    - Presentation Module
Quad Search's Architecture
Schematic diagram of Quad Search's architecture
User Interface
- Features
  - Quad Search's user interface is friendly and simple in order to ensure:
    - Short download times
    - Compatibility with all major browsers
    - Convenient usage
  - For this reason, we avoided using:
    - Large graphics files
    - JavaScript and AJAX
    - Flash presentations
User Interface (Search Hints)
- Search Hints
- We developed this part of Quad Search to provide
- Detailed information about all its features
- Explanation for simple and complex operations
- Many helpful examples
Quad Bot (1)
- Description
  - Quad Bot is responsible for result retrieval. It consists of the following sub-modules:
    - Input Validator: performs security checks
    - Query Dispatcher: submits the query to the component search engines simultaneously
    - Result Collector: gathers the engines' responses
    - Result Validator: performs several conversions on the collected data
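Quad Search itself runs on PHP; purely as an illustration of the Query Dispatcher and Result Collector roles, here is a minimal Python sketch that sends a query to all engines at once and keeps only the replies that arrive before a connection timeout (the user option described under Web Search Options). The engine callables, the `dispatch` name and the one-line input check standing in for the Input Validator are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable

Engine = Callable[[str], list[str]]  # hypothetical: query -> list of result URLs


def dispatch(query: str, engines: dict[str, Engine], timeout: float) -> dict[str, list[str]]:
    """Query all engines concurrently; keep replies that arrive within `timeout` seconds."""
    query = query.strip()
    if not query:  # minimal stand-in for the Input Validator
        raise ValueError("empty query")
    pool = ThreadPoolExecutor(max_workers=max(1, len(engines)))
    futures = {pool.submit(fetch, query): name for name, fetch in engines.items()}
    done, _late = wait(futures, timeout=timeout)
    # Return without blocking on late engines: queued calls are cancelled,
    # calls already running finish in the background and are ignored.
    pool.shutdown(wait=False, cancel_futures=True)
    collected = {}
    for future in done:
        if future.exception() is None:  # skip engines that raised an error
            collected[futures[future]] = future.result()
    return collected


if __name__ == "__main__":
    def fast(q):
        return ["fast/" + q]

    def slow(q):
        time.sleep(0.5)
        return ["slow/" + q]

    def broken(q):
        raise ConnectionError("engine down")

    print(dispatch("rank aggregation", {"fast": fast, "slow": slow, "broken": broken}, timeout=0.2))
```

Threads suit this workload because the engines' latency is dominated by network waits rather than computation.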
Quad Bot (2 - Architecture)
Architecture
Web Search APIs
- What is a Web Search API?
  - API stands for Application Programming Interface.
  - It is a programming tool supplied by the manufacturer of a large-scale application.
  - A Web Search API is used to retrieve results from major search engines.
- Disadvantages
  - Less accurate results than those of the parent engine
  - Limited number of queries per day
  - Registration IDs required
  - Limited number of queries per registration ID
- Quad Search does not make use of Search APIs
Engine Bombing
Definition: Engine bombing occurs when multiple results from the same domain enter the presented results list. Many metasearch engines suffer from engine bombing.
Engine Bombing Protection: Quad Search supports a feature that limits the number of results coming from the same domain.
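A minimal Python sketch of such a per-domain limit applied to an already ranked list. It assumes that "domain" means the URL's host name with a leading `www.` removed (so `a.com` and `www.a.com` count together, but `news.a.com` counts separately), and the default limit of 2 is a hypothetical value, not Quad Search's setting.

```python
from collections import Counter
from urllib.parse import urlsplit


def limit_per_domain(ranked_urls: list[str], max_per_domain: int = 2) -> list[str]:
    """Keep at most `max_per_domain` results per host, preserving the ranking order."""
    kept: list[str] = []
    seen: Counter = Counter()
    for url in ranked_urls:
        # urlsplit lower-cases the host name; "www." is treated as the same domain.
        host = (urlsplit(url).hostname or "").removeprefix("www.")
        if seen[host] < max_per_domain:
            seen[host] += 1
            kept.append(url)
    return kept


if __name__ == "__main__":
    urls = [
        "http://a.com/1",
        "http://www.a.com/2",
        "http://b.org/x",
        "http://a.com/3",
        "http://b.org/y",
    ]
    print(limit_per_domain(urls))
    # -> ['http://a.com/1', 'http://www.a.com/2', 'http://b.org/x', 'http://b.org/y']
```

Because the filter runs after ranking, each domain keeps its best-ranked results and only the surplus lower-ranked ones are dropped.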
Results Filtering
- Provided Filters
  - Antispam Filter: applies the antispam version of the KE Method
  - Ranking Algorithm Selector: Quad Search provides an option to determine how the collected results will be ranked
  - Engine Bombing Protection
Advanced Web Search
- Advanced Search Filters
  - File Type Selector: the user can search for files of a specific format (PDF, DOC, XLS and PPT)
  - Language Filter: Quad Search can return documents written in a specified language
  - Domain Filter: the user can search within a given domain, or exclude a domain from a search
  - Date Filter: returns results updated in the past 3, 6, or 12 months
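The file type, domain and date filters can be pictured as a predicate applied to each collected result. The Python sketch below is a hypothetical illustration, not Quad Search's code: the `Result` record and its `updated` field are assumptions, a month is approximated as 30 days, results without a date fail the date filter, and the language filter is left out because it needs language detection.

```python
import posixpath
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class Result:
    url: str
    updated: Optional[date] = None  # hypothetical: last-update date reported by an engine


def passes_filters(result: Result, *, file_type: Optional[str] = None,
                   site: Optional[str] = None, exclude_site: Optional[str] = None,
                   months: Optional[int] = None, today: Optional[date] = None) -> bool:
    parts = urlsplit(result.url)
    host = parts.hostname or ""

    def in_domain(domain: str) -> bool:
        # "auth.gr" matches auth.gr itself and any sub-domain such as www.auth.gr
        return host == domain or host.endswith("." + domain)

    if file_type is not None:
        extension = posixpath.splitext(parts.path)[1].lower()
        if extension != "." + file_type.lower():
            return False
    if site is not None and not in_domain(site):
        return False
    if exclude_site is not None and in_domain(exclude_site):
        return False
    if months is not None:
        # Approximate a month as 30 days; results without a date are dropped.
        cutoff = (today or date.today()) - timedelta(days=30 * months)
        if result.updated is None or result.updated < cutoff:
            return False
    return True


if __name__ == "__main__":
    r = Result("http://www.auth.gr/papers/quad.PDF", date(2007, 4, 1))
    print(passes_filters(r, file_type="pdf", site="auth.gr", months=3, today=date(2007, 5, 18)))
    # -> True
```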
Web Search Options
- Quad Search lets the user set options that will be used in future searches
- Some of these options are:
  - Connection Timeout: how long Quad Search should wait for a search engine to respond
  - The number of candidates to be collected per component engine
  - The number of results to be displayed per result page
  - Whether the results will be opened in a new browser window
Results Presentation (1)
Classic View: the results are displayed in the classic way.
Array View: the results are displayed in a ranked array, so the user can see the results and their rankings more easily.
Results Presentation (2)
Results Page: the results page is highly customizable. A related screenshot is shown below.
Extra Features
- Quad Search supports a set of extra features:
  - Related Searches: suggested query strings that narrow or expand a search
  - Query String Explosion: splits the query string into its search terms and lets the user perform single-term searches
  - Ranking Algorithm Selector: the user can determine how the collected results will be ranked by employing one of the supported algorithms
  - Search for Scientific Articles: Quad Search supports searches for scientific articles in the richest scientific databases
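Query String Explosion amounts to tokenizing the query. A minimal Python sketch, assuming that whitespace separates terms, that a double-quoted phrase counts as a single term, and that repeated terms are offered only once; the function name is hypothetical.

```python
import re

# A double-quoted phrase, or any run of non-whitespace characters.
_TERM = re.compile(r'"[^"]+"|\S+')


def explode_query(query: str) -> list[str]:
    """Split a query into its search terms, dropping duplicates but keeping order."""
    return list(dict.fromkeys(_TERM.findall(query)))


if __name__ == "__main__":
    for term in explode_query('metasearch "rank aggregation" KE metasearch'):
        print("single-term search:", term)
```

Each returned term can then be submitted as its own query.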
Extra Features: Calculations
Calculator: Quad Search can compute the results of simple algebraic calculations. The user must use the calc operator.
The calculator supports simple functions such as sine, cosine and logarithm.
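One safe way to implement a calc operator is to parse the expression into a syntax tree and evaluate only whitelisted nodes, instead of passing user input to `eval`. The Python sketch below assumes the operator is the first word of the query, that `log` means the natural logarithm (the slide does not state a base), and that `sqrt`, `pi` and `e` are acceptable extras; none of this is taken from Quad Search's PHP code.

```python
import ast
import math
import operator

_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
           ast.Div: operator.truediv, ast.Pow: operator.pow}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_FUNCS = {"sin": math.sin, "cos": math.cos, "log": math.log, "sqrt": math.sqrt}
_CONSTS = {"pi": math.pi, "e": math.e}


def _eval(node):
    """Evaluate a whitelisted arithmetic syntax tree; reject everything else."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        return _BINARY[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_eval(node.operand))
    if isinstance(node, ast.Name) and node.id in _CONSTS:
        return _CONSTS[node.id]
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCS and len(node.args) == 1 and not node.keywords):
        return _FUNCS[node.func.id](_eval(node.args[0]))
    raise ValueError("unsupported expression")


def calc(query: str):
    """Evaluate 'calc <expression>'; return None if the query has no calc operator."""
    prefix, _, expression = query.strip().partition(" ")
    if prefix.lower() != "calc" or not expression.strip():
        return None
    return _eval(ast.parse(expression.strip(), mode="eval"))


if __name__ == "__main__":
    print(calc("calc 2 + 3 * 4"))   # -> 14
    print(calc("calc cos(0)"))      # -> 1.0
```

Anything outside the whitelist, such as attribute access or calls to other functions, raises `ValueError` instead of being executed.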
Extra Features: Unit Conversions
Unit Converter: Quad Search can perform unit conversions. The user must use the calc operator.
Extra Features: Term Dictionary
Dictionary: Quad Search can provide definitions for words, terms and phrases. The user must use the define operator.
Scientific Search
General Features: Quad Search can search for scientists, authors and/or published articles. Google Scholar provides the required data; Quad Search collects the data and produces statistics and charts.
Related Work
- There is a small number of similar services across the Web. These services have all, or some, of the following drawbacks:
  - Slow result retrieval and processing
  - Unreliability, as they fail to fetch all the results
  - Problems in cooperating with Google
  - No result pagination
  - Poor statistics
- Quad Search eliminates these drawbacks.
H-Index
Definition: The h-index quantifies the scientific productivity of physicists and other scientists based on their publication record. A scientist has index h if h of his or her Np papers have at least h citations each, and the other (Np - h) papers have no more than h citations each.
Quad Search computes the h-index when the user searches for an author.
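The definition translates directly into a short computation: sort the citation counts in descending order and find the last position p whose paper still has at least p citations. A minimal Python sketch (the function name is illustrative):

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for position, cites in enumerate(counts, start=1):
        if cites >= position:
            h = position  # the top `position` papers all have >= position citations
        else:
            break  # counts only decrease from here, so h cannot grow further
    return h


if __name__ == "__main__":
    print(h_index([10, 8, 5, 4, 3]))  # -> 4
    print(h_index([25, 8, 5, 3, 3]))  # -> 3
```

In the first example, four papers have at least 4 citations and the remaining paper has 3, which is no more than 4, matching the definition above.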
Scientific Search Options
- The scientific search part of Quad Search offers a variety of options that can be stored and used in future searches
- The user can define:
  - The results language
  - The results subject area (biology, chemistry, physics, engineering, medicine, etc.)
  - The number of results to be displayed per page
  - Whether the results will be opened in the current or in a new window
Advanced Scientific Search
- Quad Search supports some advanced features for more thorough scientific searches
- In particular, the user can search for articles that:
  - Contain exact phrases
  - Do not contain specific terms
  - Have been written by a specific author
  - Have been published in a specific journal
  - Have been written in a specific time period
Cache
What is a cache? A temporary storage area where frequently accessed data can be stored for rapid access. Once the data is stored in the cache, future requests can be served from the cached copy rather than by re-fetching or recomputing the original data, so the average access time is lower.
Quad Search uses a local database to store the results of submitted queries. When another user submits the same query, the engine fetches the results from that database rather than from Google Scholar. The cache is refreshed after 30 days.
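The caching policy can be sketched in a few lines. Quad Search stores results in a MySQL database; this Python sketch swaps in an in-memory dictionary, uses a hypothetical `fetch` callable in place of the Google Scholar request, and assumes that queries differing only in letter case or spacing count as the same query.

```python
import time
from typing import Callable

THIRTY_DAYS = 30 * 24 * 60 * 60  # seconds


class QueryCache:
    """Serve repeated queries locally; refetch entries older than `max_age` seconds."""

    def __init__(self, fetch: Callable[[str], list], max_age: float = THIRTY_DAYS,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch      # hypothetical stand-in for the Google Scholar request
        self._max_age = max_age
        self._clock = clock      # injectable so expiry can be tested
        self._store: dict[str, tuple[float, list]] = {}

    def get(self, query: str) -> list:
        key = " ".join(query.lower().split())  # normalize case and whitespace
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]  # cache hit: no remote request
        results = self._fetch(query)  # miss or stale entry: fetch and refresh
        self._store[key] = (now, results)
        return results
```

For example, `QueryCache(fetch).get("h index")` followed by `get("H  Index")` calls `fetch` only once; after 30 days the next `get` refreshes the entry.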
Extra Features: Charts
The user can visually inspect the number of citations per paper of a specified author. This feature applies to Author Searches.
Extra Features: Excluding Papers
When a user performs an Author Search, Quad Search retrieves all results from Google Scholar (or from its cache). Some of these articles may not belong in the calculations (e.g., of the h-index). The user can exclude such papers from the calculations by deselecting the corresponding checkbox.
Future Work
- Our plans for Quad Search:
  - Support for extra ranking algorithms (e.g., Markov chains)
  - Geography-aware News search
  - News search with RSS feeds
  - Extensive personalization (users, profiles, topics of interest, stored multimedia and user-defined customization)
  - Image and video searches
  - Searches in P2P networks (eDonkey, Gnutella, etc.)
  - Torrent searches
Concluding Remarks
- Conclusions
  - In this session, we presented a pair of rank aggregation algorithms: the KE Method and its antispam version
  - We introduced new parameters, such as the number of top-k lists in which a page appears and the total number of exploited search engines
  - We also presented a novel metasearch engine, Quad Search
  - Quad Search offers a wide variety of new features for web search, such as the ranking algorithm selector and engine bombing protection
  - Quad Search also provides options for searching scientific articles, and it computes statistics such as the h-index