Title: How Search Engines Work and General Search Strategies
- Dr. Dania Bilal
- IS 587
- SIS Fall 2007
Fun Quiz
- Take the search engine quiz located at http://websearch.about.com/library/quizzes/search_engine_quiz/blsearchenginequiz.htm
- Record the number of incorrect answers
- Share the results of the quiz with a classmate.
How Do Search Engines Work?
- They collect information from selected Web sites
- They employ special software robots, called spiders, to crawl Web pages
- Spiders build lists of the words found on Web sites
- When a spider is building its lists, the spider is Web crawling
- Spiders store the lists in the engine's database
- The engine's indexing software builds an index of words
- The query input is matched against the index, and matching information is retrieved (processing algorithm)
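The crawl, store, index, and retrieve steps above can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions, not how any real engine is built: the "Web" is a hypothetical in-memory dictionary of page texts, the "database" is a dictionary of word lists, and the index simply maps each word to the URLs that contain it.

```python
# Toy model of the crawl -> store -> index -> retrieve pipeline.
# PAGES is a hypothetical stand-in for the Web (URL -> page text).
PAGES = {
    "http://a.example/": "search engines use spiders",
    "http://b.example/": "spiders crawl web pages",
    "http://c.example/": "an index of words speeds retrieval",
}

def crawl(pages):
    """Spider step: build a list of the words found on each page."""
    return {url: text.lower().split() for url, text in pages.items()}

def build_index(database):
    """Indexer step: map each word to the set of URLs that contain it."""
    index = {}
    for url, words in database.items():
        for word in words:
            index.setdefault(word, set()).add(url)
    return index

def search(index, query):
    """Processing step: return the URLs that contain every query word."""
    results = None
    for word in query.lower().split():
        urls = index.get(word, set())
        results = urls if results is None else results & urls
    return sorted(results or [])

database = crawl(PAGES)        # word lists stored per page
index = build_index(database)  # index of words
print(search(index, "spiders"))  # ['http://a.example/', 'http://b.example/']
```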
How Do Spiders and Crawlers Work?
- They begin with popular, heavily used Web servers
- They start at a popular site, collect the words on its pages, and follow every link found within the site
- Spiders travel from page to page, spreading out over the most widely used portions of the Web
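Starting at a popular site and following every link amounts to a graph traversal. The sketch below assumes a breadth-first order and a hypothetical in-memory link graph in place of real HTTP fetches; it also shows why a page that nothing links to (`isolated.example`) is never reached from the seed.

```python
from collections import deque

# Hypothetical link graph standing in for real pages: URL -> outgoing links.
LINKS = {
    "popular.example": ["news.example", "shop.example"],
    "news.example": ["sports.example", "popular.example"],
    "shop.example": ["news.example"],
    "sports.example": [],
    "isolated.example": ["popular.example"],  # nothing links to this page
}

def crawl(seed, links):
    """Breadth-first crawl: visit the seed, then follow every link found."""
    visited = []
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        visited.append(url)  # a real spider would fetch and parse the page here
        for target in links.get(url, []):
            if target not in seen:  # never queue the same URL twice
                seen.add(target)
                queue.append(target)
    return visited

print(crawl("popular.example", LINKS))
# ['popular.example', 'news.example', 'shop.example', 'sports.example']
```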
How Do Spiders and Crawlers Work?
- A search engine company (e.g., Google) builds a dedicated URL server so that spiders can collect information quickly
- More than one spider is used to crawl Web pages at a time
- Google's early system used 3-4 spiders, which together collected over 100 pages per second
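A dedicated URL server feeding several spiders at once can be modeled as a shared work queue consumed by parallel workers. In this sketch the "URL server" is just a hypothetical `queue.Queue` preloaded with made-up URLs, and each spider is a thread that records URLs instead of downloading pages; the point is that the queue hands every URL to exactly one spider.

```python
import queue
import threading

# Hypothetical "URL server": a shared queue that hands out URLs to spiders.
url_server = queue.Queue()
for i in range(12):
    url_server.put(f"http://site{i}.example/")

fetched = []              # (spider name, URL) pairs collected by all spiders
lock = threading.Lock()   # protects the shared results list

def spider(name):
    """Pull URLs from the URL server until it is empty."""
    while True:
        try:
            url = url_server.get_nowait()
        except queue.Empty:
            return
        # A real spider would download and parse the page here.
        with lock:
            fetched.append((name, url))

# Run several spiders at the same time (4 here, an arbitrary choice).
threads = [threading.Thread(target=spider, args=(f"spider-{n}",)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(fetched))  # 12: every URL was handed out exactly once
```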
How Do Spiders and Crawlers Work?
- When no dedicated URL server is used, the search engine company relies on an ISP to translate domain names into addresses for crawling the Web, which leads to:
- Delay in gathering information
- Delay in updating information
- Lack of control over URL addresses
Google Spider and How It Works
- A spider looks at the HTML, XML, or other code used to build a Web page and collects information from the meta tags
- It indexes words within the actual text of a page
- It records where the words were found (URL, title, headings, etc.)
- It disregards articles (a, an, the)
- It disregards pages that should not be crawled or indexed
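The steps above (reading meta tags, indexing words, noting where each word was found, skipping articles) can be sketched with Python's standard `html.parser`. This is an illustrative assumption, not Google's code: the set of skipped articles and the simple field labels (`title`, `heading`, `body`) are choices made for the example.

```python
from html.parser import HTMLParser

ARTICLES = {"a", "an", "the"}  # assumption: the words treated as skippable articles

class PageSpider(HTMLParser):
    """Collect meta tags and record which part of the page each word came from."""

    def __init__(self):
        super().__init__()
        self.meta = {}       # meta tag name -> content
        self.words = []      # (word, field) pairs, e.g. ("engines", "title")
        self.field = "body"  # where in the page the parser currently is

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag in ("title", "h1", "h2", "h3"):
            self.field = "title" if tag == "title" else "heading"

    def handle_endtag(self, tag):
        if tag in ("title", "h1", "h2", "h3"):
            self.field = "body"

    def handle_data(self, data):
        for word in data.lower().split():
            word = word.strip(".,;:!?\"'()")
            if word and word not in ARTICLES:
                self.words.append((word, self.field))

html = """<html><head><title>The Search Engines</title>
<meta name="keywords" content="spiders, crawling"></head>
<body><h1>How a Spider Works</h1><p>The spider follows links.</p></body></html>"""

spider = PageSpider()
spider.feed(html)
print(spider.meta)   # {'keywords': 'spiders, crawling'}
print(spider.words)
```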
Google Spider and How It Works
- It uses the Robots Exclusion Protocol to decide which pages to disregard
- Implemented in the meta-tag section at the beginning of a Web page (site-wide rules can also be placed in a robots.txt file at the site's root)
- Tells a spider to leave the page alone: neither index the words on the page nor try to follow its links
- Franklin, C. How Internet Search Engines Work. http://computer.howstuffworks.com/search-engine.htm
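Honoring the page-level exclusion described above comes down to reading a robots meta tag such as `<meta name="robots" content="noindex, nofollow">`. The sketch below is an assumed, simplified reader built on `html.parser`; it treats `none` as shorthand for both `noindex` and `nofollow` and ignores other directives.

```python
from html.parser import HTMLParser

class RobotsMetaReader(HTMLParser):
    """Read <meta name="robots" content="..."> directives from a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives |= {d.strip().lower() for d in content.split(",") if d.strip()}

def crawl_policy(html):
    """Return (may_index, may_follow_links) for a page."""
    reader = RobotsMetaReader()
    reader.feed(html)
    d = reader.directives
    no_index = "noindex" in d or "none" in d
    no_follow = "nofollow" in d or "none" in d
    return (not no_index, not no_follow)

page = '<html><head><meta name="robots" content="noindex, nofollow"></head><body>Private</body></html>'
print(crawl_policy(page))  # (False, False): leave the page alone
```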
How Do Search Engines Store Indexed Words?
- The process varies among engines
- Words are stored with the number of times they appear on a page (posting)
- A weight is assigned to each word
- Words may be given more weight when they appear near the top of a page, in subheadings, in links, in meta tags, in the title, etc.
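A posting that records a word's count and weight can be sketched as below. The field weights, the bonus for words near the top of the body, and the cutoff of five words are all hypothetical numbers chosen for illustration; each engine uses its own formula.

```python
from collections import Counter, defaultdict

# Hypothetical field weights; real engines use their own formulas.
FIELD_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}
TOP_BONUS = 0.5  # hypothetical extra weight for words near the top of the body
TOP_N = 5        # "near the top" = the first TOP_N body words (assumption)

def postings(url, fields):
    """Return {word: (url, count, weight)} for one page.

    fields maps a field name to its text, e.g. {"title": "...", "body": "..."}.
    """
    counts = Counter()
    weights = defaultdict(float)
    for field, text in fields.items():
        for position, word in enumerate(text.lower().split()):
            counts[word] += 1
            weight = FIELD_WEIGHTS.get(field, 1.0)
            if field == "body" and position < TOP_N:
                weight += TOP_BONUS
            weights[word] += weight
    return {w: (url, counts[w], weights[w]) for w in counts}

page = {
    "title": "spiders",
    "body": "spiders crawl pages and spiders index words near the end of pages",
}
p = postings("http://a.example/", page)
print(p["spiders"])  # ('http://a.example/', 3, 6.0)
print(p["pages"])    # ('http://a.example/', 2, 2.5)
```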
How Do Search Engines Store Indexed Words?
- Information is encoded to save space
- Information is indexed
- An index of words is built by the automatic indexer (indexing software)
- A hash table is created with an assigned weight or value for each word indexed
- Hashing evens out the distribution of popular entries (e.g., words beginning with M) and less popular ones (e.g., words beginning with X), allowing quick retrieval
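Why hashing helps can be seen by comparing two ways of assigning words to buckets. The word list is hypothetical and skewed toward M, and CRC-32 (`zlib.crc32`) is swapped in as a convenient deterministic hash function; it is not a claim about what any engine used. Grouping by first letter piles most words into one bucket, while hashing spreads them across all buckets, so a lookup scans fewer entries.

```python
import zlib
from collections import Counter

# Hypothetical word list skewed toward "m" (many more words start with M than X).
WORDS = ["machine", "map", "market", "media", "memory", "method", "model",
         "monitor", "mouse", "music", "xenon", "xray"]
NUM_BUCKETS = 4

def letter_bucket(word):
    """Naive scheme: group words by first letter (the M bucket gets crowded)."""
    return word[0]

def hash_bucket(word):
    """Hashing: CRC-32 of the word, reduced modulo the number of buckets."""
    return zlib.crc32(word.encode("utf-8")) % NUM_BUCKETS

by_letter = Counter(letter_bucket(w) for w in WORDS)
by_hash = Counter(hash_bucket(w) for w in WORDS)

# A tiny hash table storing (word, weight); the weight of 1.0 is a placeholder.
table = [[] for _ in range(NUM_BUCKETS)]
for w in WORDS:
    table[hash_bucket(w)].append((w, 1.0))

def lookup(word):
    """Scan only the one bucket the word hashes to."""
    for w, weight in table[hash_bucket(word)]:
        if w == word:
            return weight
    return None

print("largest letter bucket:", max(by_letter.values()))  # 10 words under "m"
print("largest hash bucket:  ", max(by_hash.values()))
print(lookup("model"), lookup("zebra"))                   # 1.0 None
```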
Using General Directories
- Yahoo and its family
- Browsing the directory
- Directory database
- Small, with sites selected and indexed by humans
- Searching using keywords
- Search database
- Larger, non-selective database
- Spider and machine indexing
Yahoo
- Yahoo.com
- Works like a search engine rather than a directory
- Searches the Web
- Exercise: search under my name and see how Yahoo processes the query while you're typing it
- The directory is found under "More" or at
- http://search.yahoo.com/dir
Yahoo Search Engine
- Search
- Web
- Images
- Videos
- Local information
- Shopping
- More
Yahoo Advanced Search
- Advanced Search feature
- Shown on screen after you perform a search, or reached by going directly to
- http://search.yahoo.com/web/advanced
- Lots of search features to explore
Yahoo Advanced Search Features
- Boolean
- Phrase
- Currency
- Domain
- File format
- Country
- Language
- Other
Yahoo Advanced Search Features
- Exercise
- Perform a search on a topic of your choice
- Use Boolean equivalents
- All the words = AND
- The exact phrase = phrase (proximity) search
- Any of these words = OR
- None of these words = NOT
- Choose the part of the page to search
- Choose a language other than English
- Report results in class
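The Boolean equivalents in the exercise map directly onto set operations over an inverted index. This sketch uses a hypothetical index of four documents; phrase search is left out because it needs word positions, not just document sets.

```python
# Hypothetical inverted index: word -> set of document IDs containing it.
INDEX = {
    "search": {1, 2, 3, 4},
    "engine": {1, 2, 4},
    "spider": {2, 3},
    "yahoo": {1, 4},
    "google": {2},
}
ALL_DOCS = {1, 2, 3, 4}

def advanced_search(all_words=(), any_words=(), none_words=()):
    """Map advanced-search form fields to Boolean set operations."""
    results = set(ALL_DOCS)
    for w in all_words:                  # "All the words" -> AND
        results &= INDEX.get(w, set())
    if any_words:                        # "Any of these words" -> OR
        either = set()
        for w in any_words:
            either |= INDEX.get(w, set())
        results &= either
    for w in none_words:                 # "None of these words" -> NOT
        results -= INDEX.get(w, set())
    return sorted(results)

print(advanced_search(all_words=["search", "engine"], none_words=["google"]))  # [1, 4]
print(advanced_search(any_words=["spider", "google"]))                         # [2, 3]
```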
Yahoo Search Services
- For searching a specific content area, such as:
- Search Services
- Web Search: Find anything from across the Web
- Answers: Ask questions and get answers from real people
- Audio Search: Find over 50 million audio files from across the Web
- Creative Commons Search: Find Creative Commons content that you can share or re-use in your own works
- Directory Search: Search or browse Yahoo!'s categorized guide to the Web
- Image Search: Find over 1.6 billion photos and illustrations from all over the Web
- Job Search: Search for jobs, post your resume and more on Yahoo! HotJobs
- Local: Find everything in your area, from dry cleaners to day spas
- Maps: Find maps and driving directions for anywhere you want to go
- Mobile Search: Find whatever, wherever you are
- My Web (Beta): The newest way to save, share and organize any page you want on the Web
- News Search: Search for news stories and related photos, videos and audio clips
Yahoo Next
- http://next.yahoo.com/
- Cutting-edge technology at Yahoo
- Blogs, Web 2.0, use of AlltheWeb, Yahoo Maps, podcasts, audio, and other features that are in beta testing
Yahoo Preferences
- Customize Yahoo to fit your needs
- Go to Preferences from the Web search page
- Edit preferences based on your needs
- Edited preferences are saved in the browser on your desktop computer
General Search Strategies in Search Engines
Strategies
- Boolean
- Boolean equivalents
- Proximity and phrase searching
- Searching within a field
- Search limits
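Phrase and proximity searching differ from plain Boolean AND because word order and distance matter, so the index must store word positions. A minimal sketch, assuming a hypothetical positional index over three made-up documents: a gap of 1 is an exact two-word phrase, and a larger gap is a "near" search.

```python
from collections import defaultdict

def positional_index(docs):
    """word -> {doc_id: [positions]} so that word order can be checked."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index

def near(index, w1, w2, max_gap):
    """Docs where w2 occurs within max_gap words after w1.

    max_gap=1 is an exact two-word phrase search ("w1 w2").
    """
    hits = set()
    for doc_id, positions1 in index[w1].items():
        positions2 = index[w2].get(doc_id, [])
        if any(0 < p2 - p1 <= max_gap for p1 in positions1 for p2 in positions2):
            hits.add(doc_id)
    return sorted(hits)

DOCS = {  # hypothetical documents
    1: "search engines index the web",
    2: "the engines that search the web",
    3: "search tools and engines",
}
idx = positional_index(DOCS)
print(near(idx, "search", "engines", 1))  # phrase "search engines": [1]
print(near(idx, "search", "engines", 3))  # within 3 words: [1, 3]
```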
Yahoo Search Strategies
- Explore Yahoo's help page
- Read the Search Tips
- Read about search limit parameters such as
- intitle:
- url:
- inurl:
- Read how to use Boolean equivalents and other search parameters
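Search limits such as `intitle:` and `inurl:` restrict a term to one field of the page. The sketch below is a hypothetical, simplified query parser, not Yahoo's implementation: it separates field limits from plain terms and matches them against made-up pages by simple substring tests.

```python
# Hypothetical pages with separate fields that limits can target.
PAGES = [
    {"url": "http://www.example.org/courses", "title": "Course pages", "text": "search strategies"},
    {"url": "http://www.example.com/search-tips", "title": "Search tips", "text": "boolean search"},
]

FIELD_OPERATORS = {"intitle": "title", "inurl": "url"}  # operators modeled here

def parse_query(query):
    """Split a query into (field, term) limits and plain terms."""
    limits, terms = [], []
    for token in query.lower().split():
        op, sep, value = token.partition(":")
        if sep and op in FIELD_OPERATORS and value:
            limits.append((FIELD_OPERATORS[op], value))
        else:
            terms.append(token)
    return limits, terms

def search(query):
    """Return URLs of pages satisfying every field limit and every plain term."""
    limits, terms = parse_query(query)
    hits = []
    for page in PAGES:
        everything = " ".join(page.values()).lower()
        if all(value in page[field].lower() for field, value in limits) and \
           all(term in everything for term in terms):
            hits.append(page["url"])
    return hits

print(search("intitle:search boolean"))    # ['http://www.example.com/search-tips']
print(search("inurl:example strategies"))  # ['http://www.example.org/courses']
```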
General Search Engines Besides Yahoo Search
Engines and Information Need
- There are several general search engines on the Web
- Select the engine(s) that best fit your need
- Visit the Web Search Guide for the latest information
- http://websearch.about.com/od/generalsearchengines/General_AllPurpose_Search_Engines.htm
Hands-on Activity
- Browse the list of general search engines in the Web Search Guide
- Explore 4 of the engines listed
- Wisenut, Snap.com, Lycos, Exalead
- Search under my name in each engine
- Compare the results by viewing the first two pages retrieved
- How many overlaps were found among the four engines?
- How many unique results were found in each engine?
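Counting overlaps and unique results is set arithmetic once the result URLs from each engine are recorded. The result sets below are placeholders (`u1`, `u2`, ...) standing in for whatever URLs you collect from the first two pages of each engine.

```python
from itertools import combinations

# Hypothetical results recorded from each engine's first two pages.
results = {
    "Wisenut": {"u1", "u2", "u3", "u4"},
    "Snap.com": {"u2", "u3", "u5"},
    "Lycos": {"u1", "u2", "u6"},
    "Exalead": {"u2", "u7"},
}

# Results retrieved by every engine.
in_all = set.intersection(*results.values())

# Overlap between each pair of engines.
pairwise = {(a, b): len(results[a] & results[b]) for a, b in combinations(results, 2)}

# Results that only one engine retrieved.
unique = {
    name: urls - set.union(*(other for n, other in results.items() if n != name))
    for name, urls in results.items()
}

print(sorted(in_all))                     # ['u2']
print(pairwise[("Wisenut", "Lycos")])     # 2
print({n: sorted(u) for n, u in unique.items()})
# {'Wisenut': ['u4'], 'Snap.com': ['u5'], 'Lycos': ['u6'], 'Exalead': ['u7']}
```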
Specialized Search Engines
- The Web Search Guide has a listing of specialized search engines
- Chapter 3 of the textbook's Web companion describes a variety of specialized engines
- Explore chapter 3 to familiarize yourself with the engines described
Hands-on Activity
- Find the answer or relevant information for these two queries using an appropriate, specialized search engine
- Do squirrels hibernate?
- Find me a list of foreign-owned companies based in the U.S., organized by state.