Search Engines - PowerPoint PPT Presentation

Slides: 23
Provided by: xinyi8

1
Search Engines
  • Xin Ying Qiu
  • Ph.D. candidate, University of Iowa

2
Outline
  • The goal of a search engine
  • How search engines work
  • The challenges
  • Search strategies

3
The Goal of a Search Engine
  • You are searching for
      • A library book
      • Bebo
      • Your project paper saved on your computer
  • The goal of a search engine
      • To satisfy the user's information need

4
The Goal of a Search Engine
5
History of Web Search Engines
6
What Makes a Good Search Engine
  • Fast results
  • Where to search
  • Relevance and quality of results
      • How to process the query: synonyms, proxy search,
        alternative spellings
  • Ranking of results
      • How to order the retrieved results
  • Snippets of result details
      • The context of the search terms
  • Similar pages
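
The slide lists snippets as a quality feature without saying how they are built. One simple approach, assumed here, is to cut a fixed-width window of page text around the first matching query term and mark the matches. `make_snippet` and its `width` parameter are hypothetical names, not any engine's API.

```python
import re
from typing import Sequence

def make_snippet(text: str, terms: Sequence[str], width: int = 30) -> str:
    """Return up to `width` characters of context on each side of the first
    query-term match, with every match in the window wrapped in **...**."""
    # Longest terms first so "web crawler" wins over "web" in the alternation.
    alternatives = sorted((t for t in terms if t), key=len, reverse=True)
    if not alternatives:
        return text[: 2 * width]
    pattern = re.compile("|".join(map(re.escape, alternatives)), re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        # No term found: fall back to the start of the page.
        end = min(len(text), 2 * width)
        return text[:end] + ("..." if end < len(text) else "")
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    window = pattern.sub(lambda m: f"**{m.group(0)}**", text[start:end])
    return ("..." if start > 0 else "") + window + ("..." if end < len(text) else "")

page = "Crawlers gather the contents of web pages by following hyperlinks."
print(make_snippet(page, ["web"], width=10))
```

A real engine would more likely snap the window to word or sentence boundaries; the character window keeps the sketch short.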

7
How Search Engines Work
  • Three components of search engines
      • Crawling
          • Crawlers are also known as agents, spiders, robots
      • Indexing
          • Why use a database?
          • Why store the index?
      • Query processing
          • What is actually searched when you search the web?

8
How Search Engines Work
9
How Search Engines Work
  • Crawlers are computer programs that gather the
    contents of web pages
  • How to crawl the web
      • Start with a domain name
      • Follow the hyperlinks within each page
      • Remember which pages have been visited and which
        are still to be visited
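
The three crawling steps amount to a graph traversal with a to-be-visited queue (the frontier) and a visited set. Below is a minimal breadth-first sketch. It assumes links come from a hypothetical `get_links` function, which here reads a small in-memory link graph instead of fetching real pages over HTTP. Robots.txt handling, politeness delays, and error reporting are deliberately left out.

```python
from collections import deque
from typing import Callable, Dict, List

def crawl(seed: str, get_links: Callable[[str], List[str]], max_pages: int = 100) -> List[str]:
    """Breadth-first crawl from `seed`; returns URLs in the order they were visited."""
    visited = set()            # pages already crawled
    frontier = deque([seed])   # pages still to be visited
    order = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue           # the same URL may have been queued more than once
        visited.add(url)
        order.append(url)
        for link in get_links(url):   # follow the hyperlinks within the page
            if link not in visited:
                frontier.append(link)
    return order

# Hypothetical link graph standing in for real web pages.
web: Dict[str, List[str]] = {
    "http://a.example.com/": ["http://a.example.com/x", "http://b.example.com/"],
    "http://a.example.com/x": ["http://a.example.com/"],
    "http://b.example.com/": ["http://a.example.com/x", "http://c.example.com/"],
    "http://c.example.com/": [],
}

print(crawl("http://a.example.com/", lambda url: web.get(url, [])))
```

Taking URLs from the front of the queue with `popleft()` gives breadth-first order. Taking them from the back with `pop()` would make the crawl depth-first.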

10
The Web Is HUGE
11
How Search Engines Work
  • Four laws of crawling
      • A crawler must show identification
      • A crawler must obey the robots exclusion standard
          • http://www.robotstxt.org/wc/norobots.html
      • A crawler must not hog resources
      • A crawler must report errors
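
Python's standard library includes a parser for the robots exclusion standard, `urllib.robotparser.RobotFileParser`. It covers two of the laws above: checking whether a URL may be fetched, and reading a site's `Crawl-delay` so the crawler does not hog resources. In this sketch the robots.txt text and the `ExampleBot` user-agent string are hypothetical. The user-agent string is how a crawler identifies itself to web servers. A real crawler would download each host's `/robots.txt` (for example with `set_url()` and `read()`) rather than parse an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical identification string a crawler would send as its User-Agent.
USER_AGENT = "ExampleBot/0.1 (+http://www.example.com/bot.html)"

# Hypothetical robots.txt content for www.example.com.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

def make_policy(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a policy object the crawler can query."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

policy = make_policy(ROBOTS_TXT)
print(policy.can_fetch(USER_AGENT, "http://www.example.com/index.html"))      # True
print(policy.can_fetch(USER_AGENT, "http://www.example.com/private/a.html"))  # False
print(policy.crawl_delay(USER_AGENT))  # seconds to wait between requests: 5
```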

12
Indexing
  • An index maps the salient information in a corpus
    into a format designed to let you quickly locate
    specific content.
  • What to index
      • Keywords, URLs, anchor text
      • Use stemming and remove stop words to reduce
        index size
  • Index structure
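
Stemming and stop-word removal can be sketched in a few lines. The stop-word list is a tiny illustrative sample. `crude_stem` is a deliberately naive suffix stripper that stands in for a real stemmer such as Porter's. Both are assumptions for illustration, not what any particular engine uses.

```python
import re

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}

def crude_stem(word: str) -> str:
    """Strip a few common English suffixes (NOT the Porter algorithm)."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text: str) -> list:
    """Lowercase, tokenize, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

print(index_terms("The crawler crawled and is crawling linked pages"))
# ['crawler', 'crawl', 'crawl', 'link', 'page']
```

Eight tokens shrink to five index terms, and two surface forms (`crawled`, `crawling`) collapse into `crawl`. This is the kind of index-size saving the slide refers to.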

13
Inverted Index
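
The structure this slide names is standard: an inverted index maps each term to the list of documents (the posting list) that contain it. Below is a minimal in-memory sketch. It assumes simple lowercase alphanumeric tokenization with no stemming or stop-word removal. The three example documents are made up.

```python
import re
from collections import defaultdict
from typing import Dict, List

def build_inverted_index(docs: Dict[int, str]) -> Dict[str, List[int]]:
    """Map each term to the sorted list of IDs of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            postings[term].add(doc_id)
    # Sorted posting lists make later merging (AND/OR) straightforward.
    return {term: sorted(ids) for term, ids in postings.items()}

docs = {
    1: "Search engines crawl the web",
    2: "Crawlers follow hyperlinks on the web",
    3: "An inverted index maps terms to documents",
}
index = build_inverted_index(docs)
print(index["web"])    # [1, 2]
print(index["index"])  # [3]
```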
14
Full Text Index
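
A full-text index records every word of a page rather than a few keywords. One common refinement, assumed here and not taken from the slide, also stores each word's positions. Positions make the phrase search listed under search strategies possible. `build_positional_index` and `phrase_search` are hypothetical names.

```python
import re
from collections import defaultdict
from typing import Dict, List

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def build_positional_index(docs: Dict[int, str]) -> Dict[str, Dict[int, List[int]]]:
    """term -> {doc_id: [token positions of the term in that document]}"""
    index: Dict[str, Dict[int, List[int]]] = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term][doc_id].append(pos)
    return index

def phrase_search(index, phrase: str) -> List[int]:
    """Return IDs of documents containing the words of `phrase` consecutively."""
    terms = tokenize(phrase)
    if not terms:
        return []
    postings = [index.get(t, {}) for t in terms]
    # A document can only match if it contains every word of the phrase.
    candidates = set(postings[0])
    for p in postings[1:]:
        candidates &= set(p)
    hits = []
    for doc_id in sorted(candidates):
        # Match if word i appears at position start + i for some start.
        if any(all(start + i in postings[i][doc_id] for i in range(1, len(terms)))
               for start in postings[0][doc_id]):
            hits.append(doc_id)
    return hits

docs = {1: "search engines rank web pages", 2: "pages on the web rank search results"}
index = build_positional_index(docs)
print(phrase_search(index, "web pages"))    # [1]
print(phrase_search(index, "rank search"))  # [2]
```

Both documents contain "web" and "pages", but only document 1 has them next to each other in that order. A plain inverted index without positions cannot make that distinction.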
15
Query Processing
16
Query Processing
  • The query processor has several parts
      • User interface (search box)
      • The engine that evaluates queries and matches
        them to relevant documents
          • The query is compared with the index objects
          • Documents are retrieved
      • Results formatter
          • Inclusion/exclusion results
          • Relevance-based results
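
The matching step can be illustrated with Boolean retrieval over an inverted index. Plain query words are combined with an implicit AND, as the search-strategy tips note. A word prefixed with `-` excludes documents, which is one simple reading of the inclusion/exclusion results above. `evaluate` and the example documents are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Set

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index.setdefault(term, set()).add(doc_id)
    return index

def evaluate(index: Dict[str, Set[int]], all_ids: Iterable[int], query: str) -> List[int]:
    """Implicit AND over plain words; a word starting with '-' is excluded."""
    words = query.lower().split()
    include = [w for w in words if not w.startswith("-")]
    exclude = [w[1:] for w in words if w.startswith("-") and len(w) > 1]
    result = set(all_ids)                 # start from every document
    for term in include:
        result &= index.get(term, set())  # AND: keep docs containing the term
    for term in exclude:
        result -= index.get(term, set())  # NOT: drop docs containing the term
    return sorted(result)

docs = {1: "jaguar car dealer", 2: "jaguar big cat habitat", 3: "used car prices"}
index = build_index(docs)
print(evaluate(index, docs, "jaguar -car"))  # [2]
```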

17
Query Processing
  • Each search engine has its own complex, closely
    guarded ranking algorithm for ordering search results.
  • Some criteria
      • Number of search terms in the retrieved page
      • Location of search terms
      • Proximity of search terms to each other
      • Link analysis of pages pointing to the retrieved
        page
      • Freshness of the page
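
Real ranking functions are closely guarded, so the following is only a toy scorer. It combines four of the listed criteria:

  • Term count
  • Location (a bonus for terms in the title)
  • Proximity (the smallest token window containing every query term)
  • Freshness

The weights, the 30-day freshness scale, and the `Page` type are arbitrary assumptions for illustration. Link analysis is left to a separate PageRank-style score.

```python
import re
from dataclasses import dataclass
from datetime import date
from itertools import product
from typing import List

@dataclass
class Page:
    title: str
    body: str
    last_modified: date

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def score(page: Page, query: str, today: date,
          w_count: float = 1.0, w_title: float = 2.0,
          w_prox: float = 3.0, w_fresh: float = 1.0) -> float:
    """Toy relevance score; all weights are arbitrary illustrative choices."""
    terms = tokenize(query)
    body = tokenize(page.body)
    title = set(tokenize(page.title))
    # Criterion 1: number of search-term occurrences in the page body.
    count = sum(body.count(t) for t in terms)
    # Criterion 2: location, here a bonus for each term found in the title.
    in_title = sum(1 for t in terms if t in title)
    # Criterion 3: proximity, the smallest window holding every query term.
    positions = [[i for i, tok in enumerate(body) if tok == t] for t in terms]
    proximity = 0.0
    if len(terms) > 1 and all(positions):
        # Brute force over position combinations; fine for short toy pages.
        span = min(max(c) - min(c) for c in product(*positions))
        proximity = 1.0 / (1 + span)
    # Criterion 4: freshness, decaying with page age in days.
    age_days = max(0, (today - page.last_modified).days)
    freshness = 1.0 / (1 + age_days / 30)
    return w_count * count + w_title * in_title + w_prox * proximity + w_fresh * freshness

today = date(2024, 1, 31)
close = Page("Web crawlers", "a web crawler visits pages", date(2024, 1, 1))
far = Page("Gardening", "web pages about soil and a crawler", date(2024, 1, 1))
print(score(close, "web crawler", today) > score(far, "web crawler", today))  # True
```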

18
How Google Works
  • Googlebot, Google's web crawler
      • Googlebot doesn't traverse the web at all.
      • Fresh crawls and deep crawls send requests to
        web servers
  • Google's indexer
      • Full-text index, stop-word removal
  • Google's PageRank algorithm
      • Idea: important pages are pointed to by other
        important pages
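
The PageRank idea on this slide is usually written, in a common normalized form, as PR(p) = (1 - d)/N + d × Σ PR(q)/L(q). The sum runs over the pages q that link to p. N is the number of pages, L(q) is the number of outgoing links on q, and d is a damping factor often set to 0.85.

The sketch below solves this by power iteration on a small hypothetical link graph. It is a textbook version, not Google's production system. It spreads the rank of pages with no outgoing links evenly over all pages, which is one common convention.

```python
from typing import Dict, List

def pagerank(links: Dict[str, List[str]], d: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Power iteration for PageRank; `links` maps each page to the pages it links to."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start from a uniform distribution
    for _ in range(iterations):
        new_rank = {p: (1.0 - d) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page passes an equal share of its rank to each page it links to.
                share = d * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Dangling page: distribute its rank evenly over all pages.
                for q in pages:
                    new_rank[q] += d * rank[p] / n
        rank = new_rank
    return rank

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

C ranks highest because three pages point to it. A and B each have a single in-link, yet A ranks well above B because A's in-link comes from the important page C. That is the "important pages are pointed to by other important pages" idea in action.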

19
How Google Works
20
The Challenges
  • Manipulating ranking
      • Spamming techniques
          • Cloaking (aka bait and switch)
          • Keyword spam (stuffing a page with irrelevant
            words)
          • Link spam (linking to a bad neighbor)
          • Robots (sending automated fake queries)
      • Paid ranking

21
Search Strategy Tips
  • More information is better.
  • People use different words for the same thing.
  • Use keywords, unusual and proper terms
  • Put important terms first
  • Boolean search, phrase search, prefix search
      • Query words have an implicit Boolean AND
      • Quotes, ( ), wildcards, -
  • Make use of slang, jargon, and acronyms
  • Advanced search
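
Prefix search is commonly written with a trailing wildcard such as `comput*`. That `*` syntax is an assumption here, since engines differ. In a sorted vocabulary, the matching terms form one contiguous run. A binary search (Python's `bisect`) therefore finds the first candidate, and a linear scan collects the rest. `expand_prefix` and the sample vocabulary are hypothetical.

```python
from bisect import bisect_left
from typing import List

def expand_prefix(vocabulary: List[str], pattern: str) -> List[str]:
    """Expand 'prefix*' to every vocabulary term starting with prefix.

    `vocabulary` must be sorted; a pattern without '*' is an exact lookup.
    """
    if not pattern.endswith("*"):
        i = bisect_left(vocabulary, pattern)
        return [pattern] if i < len(vocabulary) and vocabulary[i] == pattern else []
    prefix = pattern[:-1]
    i = bisect_left(vocabulary, prefix)   # index of the first term >= prefix
    matches = []
    while i < len(vocabulary) and vocabulary[i].startswith(prefix):
        matches.append(vocabulary[i])
        i += 1
    return matches

vocabulary = sorted(["complete", "compute", "computer", "computing", "engine", "search"])
print(expand_prefix(vocabulary, "comput*"))  # ['compute', 'computer', 'computing']
```

The expanded terms would then typically be OR-ed into the query, so `comput*` behaves like `compute OR computer OR computing`.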

22
Beyond Search Engines
  • Discovering knowledge from hypertext data
      • Question answering from the web
      • Summarization
      • Recommender systems
      • Mining business relationships or product
        reputation on the web