Title: Effective Web Searching
T.B. Rajashekar
National Centre for Science Information
Indian Institute of Science
Bangalore - 560 012
(E-mail: raja_at_ncsi.iisc.ernet.in)
Effective Web Searching
- How do we use libraries and IR systems?
- Organization of the web
- Accessing web-based information: key problems
- Tools for information retrieval on the web
  - Directories/guides
  - Search engines
  - Meta search tools
  - People finding tools
- Strategies for web searching
- Guides to search tools
- Keeping current
How Do We Use Libraries and IR Systems?
- Libraries
  - How the documents are organised: document types, classification system used
  - Access tools: catalogues, indexes, automated catalogues, access points
  - Our information need (search topic): translated into the terms of the organisation scheme employed by the library
- Information retrieval systems (e.g. bibliographic databases)
  - How the database is organised: record content, fields, search elements
  - Indexing and query language: thesaurus, Boolean logic, truncation, etc.
  - Our information need: formulated as a search expression using the query language
Organization of the Web
- Adopt the same strategy while searching the Web
  - Understand the web information architecture
  - Understand the information access tools and the information access mechanisms they provide
  - Represent our query in terms of the mechanisms supported by these tools, and search the web
- Web sites
  - How the content is organised (document types, structuring and navigation)
  - Searchable/indexable and non-searchable/non-indexable content
- Structure of web pages
  - Meta tags, page attributes (properties)
Organization of the Web...
- The Web is the totality of web pages stored on web servers
- Spectacular growth in web-based information sources and services
  - Education and research
  - Entertainment
  - Business and commerce
  - Personal home pages
- Estimated to contain over 1 billion indexable web pages
  - Doubling each year
- Over 80 million web sites
Accessing Web-based Information: Key Problems
- Identification of sources (documents)
  - No central card catalogue
  - Most web pages are not indexed using a standard vocabulary, unlike library catalogues or journal article indexes
  - Impossible to reach all related pages/sites directly
  - Need to use intermediate, resource-finding tools
Information Retrieval on the Web
- How to find relevant documents on the Web?
- Informal
  - Browsing (and bookmarking for later use)
  - Friends
  - Print sources
  - Discussion forums (mailing lists)
  - Current awareness services (e.g. Scout Report)
  - Guessing web site addresses!
- Formal (using information finding tools)
  - Web directories/guides
  - Web search engines
  - Meta search tools
  - Specialty search engines
Web Directories/Guides
- Also called virtual libraries and Internet resource catalogues
- Organised collections of descriptions of, and links to, Internet sources
- Organisation
  - By subject category (hierarchical)
  - By resource type (patents, e-journals, institutes, etc.)
- Most use human experts for source selection, indexing and classification
- Some include reviews/ratings of listed sites
Web Directories/Guides...
- Examples of general web directories
  - Librarians' Index to the Internet (www.lii.org)
  - Britannica's Web's Best Sites (www.britannica.com)
  - Infomine (infomine.ucr.edu)
  - Scout Report Signpost (www.signpost.org)
  - BUBL Link (bubl.ac.uk/link)
  - Yahoo (www.yahoo.com)
  - Magellan (www.mckinley.com)
  - Galaxy (www.galaxy.com)
  - Looksmart (www.looksmart.com)
  - Snap (www.snap.com)
  - New directory (October 2001): JoeAnt (www.joeant.com)
Web Directories/Guides...
- Guides to directories
  - WWW Virtual Library (www.vlib.org)
  - Argus Clearinghouse (www.clearinghouse.net)
  - Gogettem (www.gogettem.com/)
- Subject-specific guides (subject gateways)
  - Edinburgh Engineering Virtual Library (www.eevl.ac.uk)
  - Social Science Information Gateway (sosig.ac.uk)
  - The Internet Pilot To Physics (physicsweb.org/TIPTOP)
  - Chemcenter (www.acs.com)
  - Programmers Heaven (www.programmersheaven.com)
- Resource type guides
  - Patents (www.european-patent-office.org)
  - Electronic journals (www.publist.com)
Web Directories/Guides...
- Most web directories support searching within categories and descriptions, in addition to browsing
- Advantages
  - Access to high-quality sources
  - Do not contain redundant links
  - Faster access to sources
- Disadvantages
  - One needs to be aware of such directories/guides
  - May not be up to date
  - May not be exhaustive
  - Categories (subject hierarchies) vary across directories
Web Directories/Guides...
- When to use web directories/guides?
  - For broad/general topics, where keyword searching on search engines retrieves too many irrelevant sites
  - When you want a few highly relevant sites and the intention is not an exhaustive/comprehensive search
- When not to use web directories/guides?
  - For concept/keyword searches
  - When the search terms are distinctive
- Effective directory/guide usage
  - Take advantage of the sub-search within categories, supported by most directories/guides
  - Join their mailing lists for automatic updates on new sites
Web Directories/Guides...
- Demonstration of directories/guides
  - Librarians' Index to the Internet (www.lii.org)
  - Britannica's Web's Best Sites (www.britannica.com)
  - Scout Report Signpost (www.signpost.org)
  - BUBL Link (bubl.ac.uk/link)
  - Yahoo (www.yahoo.com)
  - WWW Virtual Library (www.vlib.org)
  - Argus Clearinghouse (www.clearinghouse.net)
Web Search Engines
- Just as A&I (abstracting and indexing) journals index published literature, web search engines build a full-text index of web pages gathered from web sites and provide a keyword search interface to this index
- Spider programs periodically visit web sites and gather web pages for indexing
- They also index web sites submitted by site developers
- A brief summary of each indexed web page is also prepared
- The index usually contains URLs, titles, headings, and other words from the HTML document
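To make the idea of a full-text index concrete, here is a minimal Python sketch of an inverted index (each word mapped to the pages that contain it). It assumes the spider has already fetched the pages and reduced them to plain text; the function names, URLs and page text are hypothetical, and real engines also store titles, headings and summaries and rank the results.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: each word maps to the URLs of pages containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def keyword_search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs of pages that contain every query word (implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical pages, as a spider might have gathered them.
pages = {
    "http://example.org/lca": "Light Combat Aircraft project in India",
    "http://example.org/sports": "Combat sports clubs in India",
}
index = build_index(pages)
print(sorted(keyword_search(index, "combat india")))  # both pages
print(sorted(keyword_search(index, "aircraft")))      # only the LCA page
```

Looking up a word is a dictionary access rather than a scan of every page, which is why keyword queries can be answered quickly; the catch is that the index is only as current as the spider's last visit.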
Web Search Engines...
- Search engines provide a forms-based search interface for entering queries
- They support simple and advanced search interfaces
- Search results are returned as a list of web pages matching the query
- Some key features supported
  - Phrase searching (double quotes)
  - Boolean searching (AND, OR, NOT)
  - Implied Boolean: term inclusion (+), term exclusion (-)
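The phrase and implied-Boolean operators can be illustrated with a small Python sketch that checks whether one page satisfies a query. It assumes AltaVista-style conventions: a leading + marks a required item, a leading - an excluded item, double quotes make a phrase, and unmarked items are optional, with at least one of them needed when nothing is required. These rules and all names are illustrative assumptions; engines differ in the details, and real engines evaluate queries against an index rather than page by page.

```python
import re

# A query token: optional +/- sign, then either a "quoted phrase" or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def parse_query(query):
    """Split a query into required (+), excluded (-) and optional items.

    Each item is a tuple of words; a quoted phrase becomes a multi-word tuple.
    """
    required, excluded, optional = [], [], []
    for match in TOKEN.finditer(query):
        sign, phrase, word = match.groups()
        words = tuple(tokenize(phrase if phrase is not None else word))
        if not words:
            continue
        if sign == "+":
            required.append(words)
        elif sign == "-":
            excluded.append(words)
        else:
            optional.append(words)
    return required, excluded, optional


def contains(words, item):
    """True if the word sequence `item` occurs contiguously in `words`."""
    n = len(item)
    return any(tuple(words[i:i + n]) == item for i in range(len(words) - n + 1))


def matches(text, query):
    """Decide whether a page's text satisfies an implied-Boolean query."""
    words = tokenize(text)
    required, excluded, optional = parse_query(query)
    if any(contains(words, item) for item in excluded):
        return False
    if not all(contains(words, item) for item in required):
        return False
    # With no required items, at least one optional item must appear.
    return bool(required) or any(contains(words, item) for item in optional)


page = "Light Combat Aircraft developed in India"
print(matches(page, '"light combat aircraft"'))  # True: the phrase is present
print(matches(page, '+aircraft -india'))         # False: india is excluded
```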
Web Search Engines...
- Key features
  - Proximity searches (NEAR, ADJ, BEFORE, AFTER)
  - Use of parentheses to group search terms
  - Truncation searches (industr*)
  - Field-specific searching (title, URL, text)
  - Natural language queries (e.g. Why is the sky blue?)
  - Relevance ranking of search results, based on
    - Number of search terms
    - Number of times each search term occurs
    - Proximity of search terms
    - Location of search terms (title, text)
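How the four ranking factors might combine into a single score can be sketched in Python. This is a toy model, not any engine's actual formula: the weights (10 per distinct query term found, 1 per body occurrence, a hypothetical `title_weight` per title occurrence, and 5 divided by the smallest gap between two different query terms) are arbitrary assumptions chosen only to show each factor at work.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def min_gap(words, terms):
    """Smallest distance (in words) between two *different* query terms, or None."""
    positions = [(i, w) for i, w in enumerate(words) if w in terms]
    best = None
    for a, (i, wi) in enumerate(positions):
        for j, wj in positions[a + 1:]:
            if wi != wj and (best is None or j - i < best):
                best = j - i
    return best


def score(query, title, body, title_weight=3.0):
    """Toy score: distinct terms found, term frequency, title location, proximity."""
    terms = set(tokenize(query))
    title_counts = Counter(tokenize(title))
    body_words = tokenize(body)
    body_counts = Counter(body_words)
    matched = {t for t in terms if title_counts[t] or body_counts[t]}
    frequency = sum(body_counts[t] for t in terms)
    in_title = sum(title_counts[t] for t in terms)
    gap = min_gap(body_words, terms)
    proximity = 5.0 / gap if gap else 0.0
    return 10.0 * len(matched) + frequency + title_weight * in_title + proximity


def rank(query, pages):
    """Return (url, score) pairs for pages with a positive score, best first."""
    scored = [(url, score(query, title, body)) for url, title, body in pages]
    return sorted([p for p in scored if p[1] > 0], key=lambda p: p[1], reverse=True)


# Hypothetical pages: (url, title, body text).
pages = [
    ("a", "Activated sludge modelling", "A model of the activated sludge process."),
    ("b", "Water treatment", "Sludge is produced; a model of aeration is given elsewhere."),
    ("c", "Cooking", "Recipes only."),
]
for url, s in rank("activated sludge model", pages):
    print(url, s)  # prints: a 44.0, then b 23.25
```

Page "a" wins on every factor: it contains all three terms, two of them in the title, and "activated sludge" appears as adjacent words. Page "c" matches nothing and is dropped.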
Web Search Engines...
- Key features
  - Sub-searching (searching within retrieved records)
  - Case sensitivity
  - Limit by language
  - Limit by age of documents
  - Limit by media type (audio, video, image)
  - Translation of search results (title and description)
  - Limit by domain, host
Web Search Engines...
- Examples
  - Fastsearch (alltheweb.com)
  - Altavista (www.altavista.com)
  - Google (www.google.com)
  - Northernlight (www.northernlight.com)
  - HotBot (www.hotbot.com)
  - Excite (www.excite.com)
- New search engines (October 2001)
  - Teoma (http://www.teoma.com/)
  - Wisenut (http://www.wisenut.com/)
Web Search Engines...
- Specialty search engines
  - Country-specific search engines
    - www.khoj.com
    - www.123india.com
  - Subject-specific search engines
    - Chemfinder (www.chemfinder.com)
    - Engineering Resources Online (www.er-online.co.uk)
    - MathSearch (www.maths.usyd.edu.au:8000/MathSearch.html)
    - Netpart Company site locator (www.websense.com/locator.cfm)
    - World Trade Locator (www.intl-tradenet.com)
  - Resource-specific search engines
    - Patents (www.uspto.gov)
    - Journal articles (www.findarticles.com)
Web Search Engines...
- Example tutorials
  - Let's look at a couple of tutorials that present and compare the features of the major search engines (use local copies if you cannot connect)
  - Finding Information on the Internet: A Tutorial (www.lib.berkeley.edu/TeachingLib/Guides/Internet/FindInfo.html)
  - How to search the world wide web: A tutorial for beginners and non-experts. David P. Habib and Robert L. Balliot, September 1999 (204.17.98.73/midlib/tutor.htm)
Web Search Engines...
- Advantages of search engines
  - Best suited for complex keyword/concept searches
  - Control over the search: search terms can be combined as required
  - Searches can be limited by time period, fields, source type, etc.
  - Currency of information, made possible by regular additions from web spiders
  - Exhaustive information can be retrieved (with lots of patience!)
- Disadvantages
  - Time-consuming
  - False positives
  - Search engines vary in their search techniques/syntax
  - Dead links, redundant links (the same document is displayed more than once)
  - Spamming (salting of pages)
  - Higher ranking of paying sites
Web Search Engines...
- Limitations of web search engines
  - Poor retrieval effectiveness (relevance), as little vocabulary control is exercised by web site developers and the indexing engines
  - Different search engines return different results due to variations in their indexing and search processes (40% non-overlap)
  - None of the search engines comes close to indexing the entire web, much less the entire Internet. Content not indexed:
    - PDF documents
    - Content that requires login
    - Databases searched using CGI programs
    - Web content on intranets behind firewalls
Web Search Engines...
- Limitations of web search engines
  - Limited support for field-based searching (the limitation lies mostly with HTML itself)
  - Poor support for searching on META tag fields
Web Search Engines...
- Demonstration of search engines
  - Fastsearch (www.alltheweb.com)
  - Altavista (www.altavista.com)
  - Google (www.google.com)
  - Northernlight (www.northernlight.com)
Meta Search Tools
- Exhaustive searches require the use of more than one web search engine, and familiarity with their search interfaces
- Meta search tools provide a common interface, conduct searches in many search engines simultaneously, and return the results in a uniform format
- They do not gather web pages, build indexes, accept URL additions, or classify or review web sites
- Some features supported
  - Removal of duplicate hits
  - Ranking of results
  - Selection of the search engine(s) to be used
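Duplicate removal and ranking in a meta search tool can be sketched in Python as follows. The sketch assumes each underlying engine has already returned a ranked list of URLs; the engine names and URLs, the crude URL normalisation used to spot duplicates, and the choice of reciprocal rank fusion (with its commonly used constant k = 60) are all illustrative assumptions, not how any particular meta search tool works.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Crude duplicate key: ignore scheme, letter case of host, 'www.' and trailing '/'."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    key = host + parts.path.rstrip("/")
    return key + ("?" + parts.query if parts.query else "")


def merge_results(result_lists, k=60):
    """Merge ranked URL lists from several engines, removing duplicates.

    Reciprocal rank fusion: each appearance of a URL at rank r adds 1 / (k + r),
    so pages ranked highly by several engines rise to the top.
    """
    scores = {}
    first_seen = {}
    for results in result_lists.values():
        for rank, url in enumerate(results, start=1):
            key = normalize(url)
            scores[key] = scores.get(key, 0.0) + 1.0 / (k + rank)
            first_seen.setdefault(key, url)  # show the first spelling of each URL
    ordered = sorted(scores, key=lambda key: scores[key], reverse=True)
    return [first_seen[key] for key in ordered]


# Hypothetical ranked results from two engines.
results = {
    "engine_a": ["http://www.lii.org/", "http://www.vlib.org/", "http://example.org/x"],
    "engine_b": ["http://www.lii.org", "http://vlib.org", "http://example.org/y"],
}
print(merge_results(results))  # four URLs; the lii.org and vlib.org duplicates are merged
```

Rank-based fusion is used here because engines do not report comparable relevance scores; it also shows the limitation noted on the slides, since only the results each engine returns can ever be merged.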
Meta Search Tools...
[Figure: searching with multiple search engines compared with searching through a meta search tool]
Meta Search Tools...
- Meta search tools (remote sites)
  - MetaCrawler (www.metacrawler.com)
  - Ixquick (www.ixquick.com)
  - Dogpile (www.dogpile.com)
  - ProFusion (www.profusion.com)
- Meta search tools (local, installable software)
  - Copernic (www.copernic.com)
  - SearchPad (www.searchpad.com)
  - LexiBot (www.completeplanet.com)
Meta Search Tools...
- Advantages
  - A query can be run across multiple search engines
  - The user needs to learn only the search interface of the meta search tool
  - Better results: retrieves the top-ranking pages from individual search engines
- Disadvantages
  - Unique features of individual search engines are lost
  - Not exhaustive: uses only the top results returned by each search engine
Meta Search Tools...
- When to use meta search tools?
  - They need to be used cautiously
  - Good for simple searches, particularly if the search terms are distinctive or unique
  - Good for testing a few keywords to find which individual search engine returns good results
  - Good for quick-and-dirty searching, if you are in a hurry and want to find a few relevant sites quickly
  - For complex searches involving many search terms, Boolean logic, etc., it is better to use individual search engines
Meta Search Tools...
- Demonstration
  - MetaCrawler (www.metacrawler.com)
  - Ixquick (www.ixquick.com)
  - Dogpile (www.dogpile.com)
  - ProFusion (www.profusion.com)
People Finding Tools
- Register names and addresses, and find e-mail addresses
- Examples
  - Bigfoot (www.bigfoot.com)
  - Peoplesearch (www.peoplesearch.net)
  - Ahoy (ahoy.cs.washington.edu:6060/)
  - Four11 (www.four11.com)
  - Switchboard (www.switchboard.com)
  - Whowhere (www.whowhere.lycos.com/)
- Most search engines also support people searches (e.g. Altavista, Google, Yahoo!)
People Finding Tools...
- Using people finding tools
  - The person must have registered with the tool(s)
  - The searcher should know both the surname and the first name, or else too many names will be retrieved
  - Bias towards U.S.-based people
  - Often, the required e-mail address cannot be retrieved through these tools
  - Alternatively, any search engine may be used (phrase search using the person's name)
  - If the person's affiliation is known, the Yahoo! Directory may be used to locate the institution and the e-mail address
Web Search Strategies
- Search steps
  - Analyse the search topic and identify the search terms (both inclusion and exclusion), their synonyms (if any), phrases and Boolean relations (if any)
  - Select the search tool(s) to be used (meta search engine, directory, general search engine, specialty search engine)
  - Translate the search terms into search statements for the selected search engine
  - Perform the search
  - Refine the search based on the results
  - Visit the actual site(s) and save the information (using the File > Save option of the browser)
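The "translate" step can be sketched in Python. The sketch assumes the topic has already been analysed into concept groups, each a list of synonyms or word variants, and produces an AltaVista-style advanced query string: synonyms are ORed, concepts are ANDed, multi-word terms are quoted as phrases, and exclusions use AND NOT. The function `build_query` and this exact syntax are hypothetical; operators differ across tools, so check each engine's help pages.

```python
def build_query(concepts, exclude=()):
    """Turn analysed search concepts into a Boolean search statement.

    concepts: list of concept groups, each a non-empty list of synonyms/variants.
    exclude: terms to rule out, appended with AND NOT.
    """
    if not concepts or not all(concepts):
        raise ValueError("each concept group needs at least one term")

    def as_term(term):
        # Quote multi-word terms so the engine treats them as a phrase.
        return f'"{term}"' if " " in term else term

    groups = []
    for synonyms in concepts:
        terms = [as_term(t) for t in synonyms]
        # A single term stands alone; synonyms are ORed inside parentheses.
        groups.append(terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")")
    query = " AND ".join(groups)
    for term in exclude:
        query += " AND NOT " + as_term(term)
    return query


print(build_query([["simulat*", "model*"], ["activated sludge process"]]))
# (simulat* OR model*) AND "activated sludge process"
```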
Web Search Strategies...
- Tips for effective web searching
  - Broad or general concept searches: start with directory-based services (when you want a few highly relevant sites on a broad topic)
  - Highly specific topics, or topics with unique terms/many concepts: use the search tools
  - Go through the help pages of search tools carefully
  - Gather sufficient information about the search topic before searching
    - Spelling variations, synonyms, broader and narrower terms
  - Use specific keywords; rare/unusual words are better than common ones
Web Search Strategies...
- Tips for effective web searching
  - Prefer phrase (adjacency) searching to Boolean searching ("stuffed animal" rather than stuffed AND animal)
  - Use as many synonyms as possible: search engines use statistical retrieval methods and produce better results with more query words
  - Avoid very common words (e.g. computer)
  - Enter search terms in lower case; use capitals to force an exact, case-sensitive match (e.g. Light Combat Aircraft, LCA)
  - Use the "More like this" option, if supported by the search engine (e.g. Excite, Google)
Web Search Strategies...
- Tips for effective web searching
  - Repeat the search, varying the search terms and their combinations; try this on different search tools
  - Enter the most important terms first: some search tools are sensitive to word order
  - Use the NOT operator to exclude unwanted pages (e.g. bio-data, resumes, courses)
  - Go through at least 5 pages of search results before giving up
  - Select 2 or 3 search tools and master their search techniques
Sample Web Searches
- Companies dealing with polymers
  - Do not use search engines (too many irrelevant hits)
  - Use directory sources (e.g. www.yahoo.com)
    - Follow the categories
      - Business and Economy
      - Business-to-Business
      - Chemicals
    - Do a sub-search on Polymers
  - Use specialty search engines (e.g. www.bizweb.com)
Sample Web Searches...
- Web pages related to Light Combat Aircraft
  - Keywords are unique
  - Use search tools (e.g. www.altavista.com)
  - Search for "Light Combat Aircraft" (phrase search in the simple search interface)
  - Double quotes force the search engine to treat the set of keywords as a phrase
  - The search can be limited to specific dates
  - A more refined search in the advanced search interface: "Light Combat Aircraft" AND India
Sample Web Searches...
- Web sources related to simulation or modelling of the activated sludge process
  - This is a concept search: search tools are better
  - Using Altavista, the query may be submitted as
    (simulat* OR model*) AND "activated sludge process"
  - Note the use of * to cover word variations like simulated, simulate, models, etc.
  - Note the use of the phrase form for "activated sludge process"
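Truncation (a trailing *, as in simulat* or industr*) lets one search term cover several word variations. A small Python sketch of what such a pattern would match is shown below; it assumes * stands for any run of word characters, and the vocabulary is made up. Engines differ in whether and how they support truncation, so this only illustrates the idea.

```python
import re


def truncation_to_regex(pattern):
    """Compile a truncated term such as 'simulat*' into a whole-word regex.

    Each '*' matches any run of word characters; all other characters are literal.
    """
    parts = [re.escape(part) for part in pattern.lower().split("*")]
    return re.compile(r"\b" + r"\w*".join(parts) + r"\b")


def expand(pattern, vocabulary):
    """Return, in sorted order, the vocabulary words a truncated term matches."""
    regex = truncation_to_regex(pattern)
    return sorted(w for w in vocabulary if regex.fullmatch(w.lower()))


# Hypothetical vocabulary of words seen in indexed pages.
vocabulary = ["simulate", "simulated", "simulation", "stimulate",
              "model", "models", "modelling", "remodel"]
print(expand("simulat*", vocabulary))  # ['simulate', 'simulated', 'simulation']
print(expand("model*", vocabulary))    # ['model', 'modelling', 'models']
```

Note that "remodel" is not matched: the wildcard is only at the end, so the word must begin with the stem.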
Guides to Search Tools
- www.beaucoup.com (guide to 2,000 search engines, indices and directories)
- www.searchpower.com (a very comprehensive search engine directory; claims over 16,000 search engine listings!)
- www.123go.com/drw/search/search.htm (Dr. Webster's Big Page of Search Engines)
- www.finderseeker.com (the search engine of search engines)
- www.virtualfreesites.com (over 1,000 specialised search engines)
Keeping Current
- AskScott (www.askscott.com): provides a very comprehensive tutorial on search engines
- SearchEngineWatch (www.searchenginewatch.com): offers information about new developments in search engines, and provides reviews and tutorials
- Botspot (www.botspot.com): a collection of, and guide to, a variety of bots (intelligent agents)
Thank You!
raja_at_ncsi.iisc.ernet.in