afea 1 - PowerPoint PPT Presentation

1
Quad Search: A novel metasearch engine (http://cheetah.csd.auth.gr/lakritid)
Leonidas Akritidis (1), George Voutsakelis (2), Dimitrios Katsaros (1,2), Panayiotis Bozanis (2)
(1) Data Engineering Lab, Dept. of Informatics, Aristotle Univ., Thessaloniki, Hellas
(2) Computer and Communication Engineering Dept., Univ. of Thessaly, Volos, Hellas
11th Panhellenic Conference on Informatics, Patras, Hellas, 18-20/05/2007
2
Introduction
  • Single Search Engines
    • Maintenance of a document database
    • Low Web coverage
    • Medium scalability
    • Paid listings
  • Metasearch Engines
    • Effortless invocation of multiple search engines
    • No document database
    • Increased Web coverage
    • Improved retrieval effectiveness

Introduction Metasearch Engines Rank
Aggregation Rank Aggregation Methods KE
Method Antispam Version
3
Metasearch Engines
Metasearch engines do not keep a document database of their own; they use the
document databases that the component search engines maintain.
4
Rank Aggregation
  • What is Rank Aggregation?
    • The collected data is merged into a final unordered list
    • A Rank Aggregation procedure proposes a way to sort this list
  • Why do we need Rank Aggregation?
    • To provide robust search on the Web
    • Meta-searching
    • Spam problem

5
Rank Aggregation
What is Rank Aggregation?
6
Rank Aggregation Methods
  • Unweighted Borda Count
  • Spearman's Footrule
  • Kendall's Tau
  • Markov Chains
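As an illustration of the simplest method on this slide, here is a minimal sketch of an unweighted Borda Count over top-k lists. It assumes one common variant in which an item at position p of a list of length k earns k - p + 1 points and an item missing from a list earns nothing; the function name `borda_count` and the sample data are hypothetical and not taken from Quad Search.

```python
from collections import defaultdict


def borda_count(top_k_lists):
    """Merge several ranked lists into one list, best item first.

    Assumption: an item at position p (1-based) earns k - p + 1 points,
    where k is the length of the longest input list. Items that a list
    does not contain earn no points from that list.
    """
    if not top_k_lists:
        return []
    k = max(len(results) for results in top_k_lists)
    scores = defaultdict(int)
    for results in top_k_lists:
        for position, item in enumerate(results, start=1):
            scores[item] += k - position + 1
    # Higher total score means a better final rank.
    return sorted(scores, key=lambda item: -scores[item])


merged = borda_count([
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
])
```

Here `b` collects 8 points, `a` 6, `c` 3 and `d` 1, so `merged` lists them in that order.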
7
KE Method
Description
  • Each result is called a candidate
  • Each candidate receives a score (weight), computed from the following
    quantities
    • r(i): the candidate's rank in the i-th engine
    • n: the number of the candidate's appearances
    • m: the number of invoked search engines
    • k: the length of the top-k list
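The formula itself was an image and is not part of this transcript. The sketch below assumes the form given in the authors' published description of the KE Method, W = (sum of r(i)) / (n^m * (k/10 + 1)^n), where a lower weight means a better final position. Treat the formula as an assumption to be checked against the paper; the function name `ke_weight` is hypothetical.

```python
def ke_weight(ranks, m, k):
    """Score one candidate; a lower weight is better.

    ranks -- the candidate's rank r(i) in each engine that returned it
    m     -- the number of invoked search engines
    k     -- the length of the top-k list

    Assumed formula (not shown in the transcript):
        W = sum(r(i)) / (n**m * (k / 10 + 1)**n)
    with n = len(ranks), the number of the candidate's appearances.
    """
    n = len(ranks)
    if n == 0:
        raise ValueError("a candidate must appear in at least one list")
    return sum(ranks) / (n ** m * (k / 10 + 1) ** n)


# A page returned by all three engines versus a page that a single
# engine placed first, with m = 3 engines and top-10 lists.
everywhere = ke_weight([1, 2, 3], m=3, k=10)
single_top = ke_weight([1], m=3, k=10)
```

Under this assumed formula, the page found by all three engines receives a much lower (better) weight than the page that only one engine ranked first, because n appears in the denominator twice.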

8
Antispam Version of the KE Method
  • We say that a search engine has been spammed by a page when it ranks the
    page too highly with respect to the other pages, according to the view of
    a typical user
  • We try to constrain this phenomenon by proposing the Antispam version of
    the KE Method, which is best described by the following pseudocode
    • Find which items appear in more than half of the result lists
      (let the number of these items be c)
    • Apply the KE Method to these items
    • Position them in the results list, starting at rank 1
    • Apply the KE Method to the rest of the items
    • Position them in the results list, starting at rank c+1
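The pseudocode above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the scoring function is passed in as a parameter (the KE weight would be plugged in there), lower scores are assumed to be better, and the names `antispam_rank` and `mean_rank` are hypothetical.

```python
def antispam_rank(top_k_lists, score):
    """Two-tier ranking: majority items first, then everything else.

    top_k_lists -- one ranked list of items per search engine
    score       -- function mapping an item's list of ranks to a weight,
                   where a lower weight is better (an assumption)
    """
    m = len(top_k_lists)
    ranks = {}
    for results in top_k_lists:
        for position, item in enumerate(results, start=1):
            ranks.setdefault(item, []).append(position)

    # Items that appear in more than half of the lists; there are c of them.
    majority = [item for item, r in ranks.items() if len(r) > m / 2]
    majority_set = set(majority)
    rest = [item for item in ranks if item not in majority_set]

    def by_score(item):
        return score(ranks[item])

    # Majority items fill ranks 1..c, the remaining items start at c + 1.
    return sorted(majority, key=by_score) + sorted(rest, key=by_score)


def mean_rank(r):
    # Stand-in scorer used only for this demonstration.
    return sum(r) / len(r)


final = antispam_rank(
    [["spam", "a", "b"], ["a", "b", "c"], ["b", "a", "d"]],
    mean_rank,
)
```

The page `spam` is ranked first by one engine only, so it cannot enter the top tier: `final` starts with `a` and `b`, and `spam` appears at rank c + 1 = 3.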

9
Existing Metasearch Engines
  • Features
  • Metasearch engines have a bad reputation, and many users avoid them for
    the following reasons
    • Slow result retrieval
    • Slow result processing
    • Unreliable or no ranking algorithms
    • Many sponsored results and paid listings
    • They do not evolve

Existing Engines Quad Search Web
Platform Architecture User Interface Quad Bot Web
Search APIs Engine Bombing Results
Filtering Advanced Search Search Options Result
Presentation Extra Features
10
Quad Search
  • Features
  • Quad Search tries to discard the features that gave metasearch engines
    their bad reputation
    • Evolves constantly
    • Supports Web, Image, Scientific, Video, Audio and News searches with
      options and advanced features
    • Implements the KE Method and other algorithms
    • Provides user-specified options
    • Fast and accurate results
    • Free of sponsored results and paid listings
    • Friendly user interface and detailed search hints

11
Web Platform
  • Technical Specifications
  • Quad Search is hosted on the Cheetah server at the Aristotle University
    of Thessaloniki
  • The server setup includes
    • Apache 2.0.40 HTTP server
    • PHP 5.1.6
    • MySQL 5 database server
    • Several PHP extensions
    • Zend Optimizer for faster script execution

12
Quad Search's Architecture
  • Compartments and Modules
  • The web search part of Quad Search consists of the following modules
    • User Interface
    • Database Selector
    • Options Page
    • Quad Bot
    • Object Builder
    • Classification Module
    • Presentation Module

13
Quad Search's Architecture
Schematic diagram of Quad Search's architecture
14
User Interface
  • Features
  • Quad Search's user interface is friendly and simple in order to ensure
    • Short download times
    • Compatibility with all major browsers
    • Convenient usage
  • For this reason, we avoided using
    • Large graphics files
    • JavaScript and AJAX
    • Flash presentations

15
User Interface (Search Hints)
  • Search Hints
  • We developed this part of Quad Search to provide
    • Detailed information about all its features
    • Explanations of simple and complex operations
    • Many helpful examples

16
Quad Bot (1)
  • Description
  • Quad Bot is responsible for result retrieval. It consists of the
    following sub-modules
    • Input Validator: performs security checks
    • Query Dispatcher: submits the query to the component search engines
      simultaneously
    • Result Collector: gathers the engines' responses
    • Result Validator: performs multiple conversions on the collected data
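The dispatcher and collector roles can be sketched as below. Quad Search itself runs on PHP, so this Python version only illustrates the idea of querying all component engines at once and keeping the answers that arrive within a time limit. The stub engines and the names `dispatch`, `_fast`, `_slow` and `_broken` are hypothetical; no real search engine is contacted.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def dispatch(query, engines, timeout):
    """Send the query to every engine in parallel and collect the replies.

    engines -- dict mapping an engine name to a callable(query) -> list
    timeout -- seconds to wait; engines that are late or fail are skipped
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(engines)))
    futures = {pool.submit(fetch, query): name
               for name, fetch in engines.items()}
    done, _late = wait(futures, timeout=timeout)
    # Do not block on stragglers; their threads finish in the background.
    pool.shutdown(wait=False)

    responses = {}
    for future in done:
        if future.exception() is None:
            responses[futures[future]] = future.result()
    return responses


# Stub engines standing in for real component search engines.
def _fast(query):
    return [query + "-result"]


def _slow(query):
    time.sleep(0.5)
    return ["too late"]


def _broken(query):
    raise RuntimeError("engine unavailable")


responses = dispatch(
    "metasearch",
    {"fast": _fast, "slow": _slow, "broken": _broken},
    timeout=0.1,
)
```

Only the `fast` engine's response is kept here: `slow` misses the time limit and `broken` raises an error, so both are left out rather than delaying the whole search.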

17
Quad Bot (2 - Architecture)
Architecture
18
Web Search APIs
  • What is a Web Search API?
    • API stands for Application Programming Interface
    • It is a programming tool supplied by the manufacturer of a
      large-scale application
    • A Web Search API is used to retrieve results from major search engines
  • Disadvantages
    • Inaccurate results compared to the parent engine
    • Limit on queries per day
    • Registration IDs required
    • Limit on queries per registration ID
  • Quad Search does not make use of Search APIs

19
Engine Bombing
Definition: Engine Bombing occurs when multiple results from the same domain
enter the presented results list. Many metasearch engines suffer from engine
bombing.
Engine Bombing Protection: Quad Search supports a feature that limits the
number of results coming from the same domain.
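A minimal sketch of such a per-domain limit, assuming the results are already ranked and that "domain" means the host name of the result URL. The function name `limit_per_domain` and the default limit of two are hypothetical choices for the example.

```python
from collections import Counter
from urllib.parse import urlsplit


def limit_per_domain(urls, max_per_domain=2):
    """Keep at most max_per_domain results per host, preserving rank order."""
    seen = Counter()
    kept = []
    for url in urls:
        # hostname is lowercased by urlsplit; it is None for malformed URLs.
        domain = urlsplit(url).hostname or ""
        if seen[domain] < max_per_domain:
            kept.append(url)
            seen[domain] += 1
    return kept


ranked = [
    "http://a.com/1",
    "http://a.com/2",
    "http://b.org/x",
    "http://a.com/3",
]
protected = limit_per_domain(ranked, max_per_domain=2)
```

The third result from `a.com` is dropped, while the relative order of the remaining results is unchanged.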
20
Results Filtering
  • Provided Filters
    • Antispam Filter: application of the antispam version of the KE Method
    • Ranking Algorithm Selector: Quad Search provides an option to
      determine how the collected results will be ranked
    • Engine Bombing Protection

21
Advanced Web Search
  • Advanced Search Filters
    • File Type Selector: the user can search for files of a specific
      format (PDF, DOC, XLS and PPT)
    • Language Filter: Quad Search can return documents written in a
      specified language
    • Domain Filter: the user can search within a given domain, or exclude
      a domain from a search
    • Date Filter: return results updated in the past 3, 6, or 12 months

22
Web Search Options
  • Quad Search lets the user set options that will be used in future
    searches
  • Some of these options are
    • Connection timeout: how long Quad Search should wait for a search
      engine to respond
    • The number of candidates to be collected per component engine
    • The number of results to be displayed per result page
    • Whether the results will be opened in a new browser window

23
Results Presentation (1)
Classic View: The results are displayed in the classic way.
Array View: The results are displayed in a ranked array, so the user can
compare the results and their rankings more easily.
24
Results Presentation (2)
Results Page: The results page is highly customizable, as the slide's
screenshot illustrates.
25
Extra Features
  • Quad Search supports a set of extra features
    • Related Searches: proposed query strings to narrow or expand a search
    • Query String Explosion: splits the query string into its search terms
      and lets the user perform single-term searches
    • Ranking Algorithm Selector: the user can determine how the collected
      results will be ranked by choosing one of the supported algorithms
    • Search for Scientific Articles: Quad Search supports searches for
      scientific articles in the richest scientific databases

26
Extra Features - Calculations
Calculator: Quad Search can evaluate simple algebraic calculations. The user
must use the calc operator. The calculator supports simple functions such as
sine, cosine and logarithm.
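How Quad Search evaluates expressions is not described in the slides. The sketch below is one safe way to evaluate the kind of input the calculator accepts: it parses the expression and allows only numbers, basic arithmetic and the three functions mentioned above. The function name `evaluate`, the spellings `sin`, `cos` and `log`, and the use of the natural logarithm are assumptions; the syntax of the calc operator itself is not modeled.

```python
import ast
import math
import operator

_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY = {ast.USub: operator.neg, ast.UAdd: operator.pos}
# Assumed function names; log is the natural logarithm here.
_FUNCTIONS = {"sin": math.sin, "cos": math.cos, "log": math.log}


def evaluate(expression):
    """Evaluate a simple algebraic expression without using eval()."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCTIONS
                and not node.keywords):
            return _FUNCTIONS[node.func.id](*[walk(arg) for arg in node.args])
        # Names, attribute access, other calls and so on are rejected.
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval"))


total = evaluate("2 + 3 * 4")
cosine = evaluate("cos(0)")
```

Walking the syntax tree with a whitelist means that user input such as a call to an arbitrary function raises `ValueError` instead of being executed.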
27
Extra Features - Unit Conversions
Unit Converter: Quad Search can perform unit conversions. The user must use
the calc operator.
28
Extra Features - Term Dictionary
Dictionary: Quad Search can provide definitions for words, terms and phrases.
The user must use the define operator.
29
Scientific Search
General Features
  • Quad Search can search for scientists, authors and/or published articles
  • Google Scholar provides the required data
  • Quad Search collects the data and produces statistics and charts
Scientific Search Related Work H-Index Search
Options Advanced Search Cache Extra Features
30
Related Work
  • There is a small number of similar services across the web. These
    services have all, or some, of the following drawbacks
    • Slow result retrieval and processing
    • Unreliability, as they fail to fetch all the results
    • Problems in cooperation with Google
    • No result pagination
    • Poor statistics
  • Quad Search eliminates these drawbacks

31
H-Index
Definition: The h-index is an index for quantifying the scientific
productivity of physicists and other scientists based on their publication
record. A scientist has index h if h of his Np papers have at least h
citations each, and the other (Np - h) papers have no more than h citations
each.
Quad Search computes the h-index when the user searches for authors.
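The definition translates directly into a short computation over an author's citation counts. This is a minimal sketch of the standard h-index calculation, not Quad Search's own code; the function name `h_index` and the sample counts are hypothetical.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, cites in enumerate(ranked, start=1):
        if cites >= position:
            h = position
        else:
            break
    return h


# Five papers with these citation counts give an h-index of 4:
# four papers have at least 4 citations, and the fifth has only 3.
example = h_index([10, 8, 5, 4, 3])
```

Excluding a paper from the calculation, as the author search allows, amounts to removing its citation count from the input list before calling the function.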
32
Scientific Search Options
  • The scientific search part of Quad Search offers a variety of options
    that can be stored and used in future searches
  • The user can define
    • The results language
    • The results subject area (biology, chemistry, physics, engineering,
      medicine etc.)
    • The number of results to be displayed per page
    • Whether the results will be opened in the current or in a new window

33
Advanced Scientific Search
  • Quad Search supports some advanced features for more thorough
    scientific searches
  • In particular, the user can search for articles that
    • Contain exact phrases
    • Do not contain specific terms
    • Have been written by a specific author
    • Have been published in a specific journal
    • Have been written in a specific time period

34
Cache
What is Cache? A temporary storage area where frequently accessed data can be
stored for rapid access. Once the data is stored in the cache, future use can
be made by accessing the cached copy rather than re-fetching or recomputing
the original data, so that the average access time is lower.
  • Quad Search makes use of a local database to store the results from
    submitted queries
  • When another user submits the same query, the engine fetches the
    results from that database, not from Google Scholar
  • The cache is refreshed after 30 days
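The caching behaviour described above can be sketched as follows. Quad Search uses its own local database for this; the example swaps in SQLite from the Python standard library purely for illustration. The class name `QueryCache`, the table layout and the injectable clock are hypothetical.

```python
import json
import sqlite3
import time

TTL_SECONDS = 30 * 24 * 3600  # entries older than 30 days are refreshed


class QueryCache:
    """Store query results locally and reuse them until they expire."""

    def __init__(self, path=":memory:", ttl=TTL_SECONDS, clock=time.time):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "query TEXT PRIMARY KEY, "
            "results TEXT NOT NULL, "
            "stored_at REAL NOT NULL)"
        )
        self.ttl = ttl
        self.clock = clock

    def get(self, query, fetch):
        """Return cached results, calling fetch(query) only when needed."""
        row = self.db.execute(
            "SELECT results, stored_at FROM cache WHERE query = ?", (query,)
        ).fetchone()
        now = self.clock()
        if row is not None and now - row[1] < self.ttl:
            return json.loads(row[0])
        # Missing or expired entry: fetch again and overwrite the old copy.
        results = fetch(query)
        self.db.execute(
            "INSERT OR REPLACE INTO cache (query, results, stored_at) "
            "VALUES (?, ?, ?)",
            (query, json.dumps(results), now),
        )
        self.db.commit()
        return results


# Demonstration with a fake clock and a stub in place of the remote source.
calls = []
fake_now = [0.0]


def stub_fetch(query):
    calls.append(query)
    return [query.upper()]


cache = QueryCache(clock=lambda: fake_now[0])
first = cache.get("h-index", stub_fetch)
second = cache.get("h-index", stub_fetch)   # served from the cache
fake_now[0] = TTL_SECONDS + 1
third = cache.get("h-index", stub_fetch)    # expired, fetched again
```

The stub is called twice in total: once for the first request and once after the 30-day lifetime has passed.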
35
Extra Features - Charts
The user can visually check the number of citations per paper of a specified
author. This feature is available for Author Searches.
36
Extra Features - Excluding Papers
  • When a user performs an Author Search, Quad Search transfers all results
    from Google Scholar (or its cache)
  • Possibly, some of these articles should not participate in the
    calculations (e.g. the h-index)
  • The user can exclude such papers from the calculations by deselecting
    the appropriate checkbox
37
Future Work
  • Our plans for Quad Search
    • Support for extra ranking algorithms (e.g. Markov chains)
    • Geography-aware search for News
    • News Search with RSS feeds
    • Wide personalization (users, profiles, topics of interest, stored
      multimedia and user-defined customization)
    • Image and Video searches
    • Searches in P2P networks (eDonkey, Gnutella, etc.)
    • Torrent searches

Future Work Concluding remarks
38
Concluding Remarks
  • Conclusions
    • In this session, we presented a pair of rank aggregation algorithms:
      the KE Method and its antispam version
    • We introduced new parameters, such as the number of top-k lists in
      which a page appears and the total number of search engines used
    • We also presented a novel metasearch engine, Quad Search
    • Quad Search offers a wide variety of new features for web search,
      such as the ranking algorithm selector and the engine bombing
      protection
    • Quad Search also provides options for scientific article searches
      and computes statistics such as the h-index
