Effective Web Searching - PowerPoint PPT Presentation

Slides: 41
Provided by: tbraja6

1
Effective Web Searching
T.B. Rajashekar
National Centre for Science Information
Indian Institute of Science
Bangalore - 560 012 (E-mail: raja_at_ncsi.iisc.ernet.in)
2
Effective Web Searching
  • How do we use libraries and IR systems?
  • Organization of the web
  • Accessing web-based information: key problems
  • Tools for Information retrieval on the web
  • Directories/ guides
  • Search engines
  • Meta search tools
  • People finding tools
  • Strategies for web searching
  • Guides to search tools
  • Keeping current

3
How Do We Use Libraries and IR Systems?
  • Libraries
  • How the documents are organised: document types,
    classification system used
  • Access tools: catalogues, indexes, automated
    catalogues, access points
  • Our information need (search topic): translate
    this in terms of the organization scheme employed
    by the library
  • Information Retrieval systems (e.g. bibliographic
    databases)
  • How the database is organised: record content,
    fields, search elements
  • Indexing and query language: thesaurus, Boolean
    logic, truncation, etc.
  • Our information need: formulated as a search
    expression using the query language

4
Organization of the Web
  • Adopt the same strategy when searching the Web
  • Understand web information architecture
  • Understand the information access tools and the
    information access mechanisms they provide
  • Represent our query in terms of mechanisms
    supported by these tools and search the web
  • Web sites
  • How the content is organised (document types,
    structuring and navigation)
  • Searchable/indexable and non searchable/indexable
    content
  • Structure of web pages
  • Meta tags, page attributes (properties)

5
Organization of the Web...
  • Web is the totality of web pages stored on web
    servers
  • Spectacular growth in web-based information
    sources and services
  • Education and research
  • Entertainment
  • Business and commerce
  • Personal home pages
  • Estimated to contain over 1 billion indexable web
    pages
  • Doubling each year
  • Over 80 million web sites

6
Accessing Web-based Information: Key Problems
  • Identification of sources (documents)
  • No central card catalog
  • Most web pages are not indexed in standard
    vocabulary, unlike library catalogues or journal
    article indexes
  • Impossible to reach all related pages/ sites
    directly
  • Need to use intermediate, resource finding tools

7
Information Retrieval on the Web
  • How to find relevant documents on the Web?
  • Informal
  • Browsing (and book marking for later use)
  • Friends
  • Print sources
  • Discussion forums (mailing lists)
  • Current awareness services (e.g. Scout Report)
  • Guessing web site addresses!
  • Formal (using information finding tools)
  • Web directories/ guides
  • Web search engines
  • Meta-search tools
  • Specialty search engines

8
Web Directories/ Guides
  • Also called virtual libraries and Internet
    resource catalogues
  • Organised collection of descriptions and links to
    Internet sources
  • Organisation by subject categories
    (hierarchical) or by resource type (patents,
    e-journals, institutes, etc.)
  • Most use human experts for source selection,
    indexing and classification
  • Some include reviews/ ratings of listed sites

9
Web Directories/ Guides...
  • Examples of general web directories
  • Librarians' Index to the Internet (www.lii.org)
  • Britannica's Web's Best Sites
    (www.britannica.com)
  • Infomine (infomine.ucr.edu)
  • Scout Report Signpost (www.signpost.org)
  • BUBL link (bubl.ac.uk/link)
  • Yahoo (www.yahoo.com)
  • Magellan (www.mckinley.com)
  • Galaxy (www.galaxy.com)
  • Looksmart (www.looksmart.com)
  • Snap (www.snap.com)

10
Web Directories/ Guides...
  • Guides to directories
  • WWW Virtual Library (www.vlib.org)
  • Argus Clearinghouse (www.clearinghouse.net)
  • Gogettem (www.gogettem.com/)
  • Subject-specific guides (subject gateways)
  • Edinburgh Engineering Virtual Library
    (www.eevl.ac.uk)
  • Social Science Information Gateway (sosig.ac.uk)
  • The Internet Pilot To Physics
    (physicsweb.org/TIPTOP)
  • Chemcenter (www.acs.com)
  • Programmers Heaven (www.programmersheaven.com)
  • Resource type guides
  • Patents (www.european-patent-office.org)
  • Electronic journals (www.publist.com)

11
Web Directories/ Guides...
  • Most web directories support searching within
    categories and descriptions, in addition to
    browsing
  • Advantages
  • Access to high quality sources
  • Do not contain redundant links
  • Faster access to sources
  • Disadvantages
  • One needs to be aware of such directories/ guides
  • May not be up-to-date
  • May not be exhaustive
  • Categories (subject hierarchy) vary across
    directories

12
Web Directories/ Guides...
  • When to use web directories/ guides?
  • For broad/ general topics where keyword searching
    on search engines retrieves too many irrelevant
    sites
  • When you want a few highly relevant sites and
    intention is not exhaustive/ comprehensive search
  • When not to use web directories/ guides?
  • For concept/ keyword searches
  • Search terms are distinctive
  • Effective directory/ guide usage
  • Take advantage of the sub-search within
    categories, supported by most directories/ guides
  • Join their mailing lists for automatic updates on
    new sites

13
Web Directories/ Guides...
  • Demonstration of directories/ guides
  • Librarians' Index to the Internet (www.lii.org)
  • Britannica's Web's Best Sites
    (www.britannica.com)
  • Scout Report Signpost (www.signpost.org)
  • BUBL link (bubl.ac.uk/link)
  • Yahoo (www.yahoo.com)
  • WWW Virtual Library (www.vlib.org)
  • Argus Clearinghouse (www.clearinghouse.net)

14
Web Search Engines
  • Just as A&I (abstracting and indexing) journals
    index published literature,
    web search engines build a full-text index to web
    pages gathered from web sites and provide a
    keyword search interface to this index
  • Spider programs periodically visit web sites and
    gather the web pages for indexing
  • Also index web sites submitted by site developers
  • A brief summary of the indexed web page is also
    prepared
  • The index usually contains URLs, titles,
    headings, and other words from the HTML document
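
The full-text index described on this slide can be sketched as an inverted index: a mapping from each word to the pages that contain it. This is only a minimal illustration, assuming a tiny in-memory collection in place of pages gathered by a spider; the URLs, page text, and function names are made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for what a spider would gather.
PAGES = {
    "http://example.org/a": "Activated sludge process simulation",
    "http://example.org/b": "Polymer companies and chemical suppliers",
    "http://example.org/c": "Modeling the activated sludge process",
}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


INDEX = build_index(PAGES)
print(sorted(INDEX["sludge"]))
```

A keyword search then becomes a lookup in `INDEX`, which is why a search engine can answer queries without rereading every page.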

15
Web Search Engines...
  • The search engines provide a forms-based search
    interface for entering the queries
  • Support simple and advanced search interfaces
  • Search results are returned in the form of a list
    of web sites matching the query
  • Some key features supported
  • Phrase searching (double quotes)
  • Boolean searching (AND, OR, NOT)
  • Implied Boolean: term inclusion (+), term
    exclusion (-)
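
To make the implied Boolean feature concrete, here is a small sketch of how a query with `+` and `-` prefixes might be applied to a set of documents. The documents and the function name are hypothetical, and the handling of unmarked terms (treated as optional) is an assumption; search engines differed on this point.

```python
import re

# Hypothetical documents; a real engine would consult its index instead.
DOCS = {
    "doc1": "light combat aircraft programme in India",
    "doc2": "combat sports and martial arts",
    "doc3": "light aircraft for hire",
}


def words(text):
    """Return the set of lower-cased words in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def implied_boolean(query, docs):
    """Return ids of docs that contain every +term and no -term.

    Unmarked terms are treated as optional (an assumption): a doc must
    contain at least one of them when any are present in the query.
    """
    required, excluded, optional = set(), set(), set()
    for token in query.lower().split():
        if token.startswith("+"):
            required.add(token[1:])
        elif token.startswith("-"):
            excluded.add(token[1:])
        else:
            optional.add(token)
    hits = []
    for doc_id, text in docs.items():
        found = words(text)
        if not required <= found:
            continue  # a required term is missing
        if excluded & found:
            continue  # an excluded term is present
        if optional and not (optional & found):
            continue  # none of the optional terms is present
        hits.append(doc_id)
    return sorted(hits)


print(implied_boolean("+aircraft -hire", DOCS))
```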

16
Web Search Engines
  • Key features
  • Proximity searches (NEAR, ADJ, BEFORE, AFTER)
  • Use of parentheses to group search terms
  • Truncation searches (industr*)
  • Field-specific searching (Title, URL, Text)
  • Natural language queries (Why is the sky blue?)
  • Relevance ranking of search results
  • Number of search terms
  • Number of times each search term occurs
  • Proximity of search terms
  • Location of search terms (title, text)
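
The ranking factors listed on this slide can be combined into a toy scoring function. The sketch below is illustrative only: the weights are arbitrary assumptions, proximity is left out for brevity, and no real search engine is claimed to score pages this way.

```python
import re


def tokens(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, title, body, title_weight=3):
    """Toy relevance score; the weights are arbitrary assumptions.

    Counts how many distinct query terms match, how often each occurs
    in the body, and gives extra credit to occurrences in the title.
    """
    terms = set(tokens(query))
    title_tokens, body_tokens = tokens(title), tokens(body)
    distinct = sum(1 for t in terms if t in title_tokens or t in body_tokens)
    occurrences = sum(body_tokens.count(t) for t in terms)
    in_title = sum(title_tokens.count(t) for t in terms)
    return 10 * distinct + occurrences + title_weight * in_title


# Hypothetical (title, body) pairs.
PAGES = [
    ("Chemistry notes", "Notes that mention a polymer once."),
    ("Polymer suppliers", "A list of polymer and resin suppliers."),
]
ranked = sorted(PAGES, key=lambda p: score("polymer suppliers", *p), reverse=True)
for title, _ in ranked:
    print(title)
```

The page that matches both terms, and matches them in its title, sorts ahead of the page that mentions one term once in its text.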

17
Web Search Engines
  • Key features
  • Sub-searching (searching within retrieved
    records)
  • Case sensitivity
  • Limit by language
  • Limit by age of documents
  • Limit by audio, video and image type
  • Translation of search results (title and
    description)
  • Limit by domain, host

18
Web Search Engines...
  • Examples
  • Fastsearch (alltheweb.com)
  • Altavista (www.altavista.com)
  • Google (www.google.com)
  • Northernlight (www.northernlight.com)
  • HotBot (www.hotbot.com)
  • Excite (www.excite.com)
  • Lycos (www.lycos.com)
  • InfoSeek Guide (www.infoseek.com)
  • WebCrawler (www.webcrawler.com)
  • Worldwide Web Worm (www.goto.com)

19
Web Search Engines...
  • Specialty search engines
  • Country-specific search engines
  • www.khoj.com
  • www.123india.com
  • Subject-specific search engines
  • Chemfinder (www.chemfinder.com)
  • Engineering Resources Online (www.er-online.co.uk)
  • MathSearch
    (www.maths.usyd.edu.au:8000/MathSearch.html)
  • Netpart Company site locator
    (www.websense.com/locator.cfm)
  • World Trade Locator (www.intl-tradenet.com)
  • Resource-specific search engines
  • Patents (www.uspto.gov)
  • Journal articles (www.findarticles.com)

20
Web Search Engines...
  • Advantages of search engines
  • Best suited for complex keyword/ concept searches
  • Control over search: search terms can be combined
    as required
  • Searches can be limited to period of time,
    fields, source type, etc.
  • Currency of information, made possible by regular
    addition by web spiders
  • Exhaustive information can be retrieved (with
    lots of patience!)
  • Disadvantages
  • Time consuming
  • False positives
  • Search engines vary in terms of search
    techniques/ syntax
  • Dead links, redundant links (same document gets
    displayed)
  • Spamming (salting of pages)
  • Higher ranking of paying sites

21
Web Search Engines...
  • Limitations of web search engines
  • Poor retrieval effectiveness (relevance) as
    little vocabulary control is exercised by web
    site developers and the index engines
  • Different search engines return different search
    results due to the variation in indexing and
    search process (40% non-overlap)
  • None of the search engines comes close to indexing
    the entire web, much less the entire Internet.
    Content not indexed:
  • PDF documents
  • Content that requires log in
  • Databases searched using CGI programs
  • Web content on intranets behind fire walls

22
Web Search Engines...
  • Demonstration of search engines
  • Fastsearch (www.alltheweb.com)
  • Altavista (www.altavista.com)
  • Google (www.google.com)
  • Northernlight (www.northernlight.com)

23
Meta Search Tools
  • Exhaustive searches require use of more than one
    web search engine and familiarity with their
    search interface
  • Meta search tools provide a common interface and
    conduct searches in many search engines
    simultaneously and return results in a uniform
    format
  • Do not gather web pages, build indexes, accept
    URL additions, classify or review web sites
  • Some features supported
  • Duplicate hits removal
  • Rank results
  • Selection of search engine(s) to be used
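
The duplicate removal and ranking features listed above can be sketched as a merge of ranked result lists. This is a minimal illustration under stated assumptions: the engine names and URLs are placeholders, the URL normalisation is deliberately crude, and the 1/rank scoring is one simple choice, not the method of any particular meta search tool.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Crude key for spotting duplicates: host plus path, with a
    lower-cased host, no leading 'www.' and no trailing slash."""
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge(results_by_engine):
    """Merge ranked URL lists from several engines into one list.

    Each URL earns 1/rank from every engine that returned it, so pages
    found by several engines, or ranked high, come first.
    """
    scores, first_seen = {}, {}
    for urls in results_by_engine.values():
        for rank, url in enumerate(urls, start=1):
            key = normalize(url)
            scores[key] = scores.get(key, 0.0) + 1.0 / rank
            first_seen.setdefault(key, url)
    ordered = sorted(scores, key=lambda k: (-scores[k], k))
    return [first_seen[k] for k in ordered]


# Hypothetical result lists; the engine names are placeholders.
RESULTS = {
    "engine_a": ["http://www.example.org/lca/", "http://example.com/news"],
    "engine_b": ["http://example.net/x", "http://example.org/lca"],
}
print(merge(RESULTS))
```

Here the page returned by both engines is listed once and moves to the top, even though one engine ranked it second.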

24
Meta Search Tools...
[Figure: searching with multiple search engines separately versus searching through a meta search tool]
25
Meta Search Tools...
  • Meta search tools (remote sites)
  • MetaCrawler (www.metacrawler.com)
  • Ixquick (www.ixquick.com)
  • Dogpile (www.dogpile.com)
  • ProFusion (www.profusion.com)
  • Meta search tools (local, installable software)
  • Copernic (www.copernic.com)
  • SearchPad (www.searchpad.com)
  • LexiBot (www.completeplanet.com)

26
Meta Search Tools...
  • Advantages
  • Query can be run across multiple search engines
  • User needs to learn only the search interface of
    the meta search tool
  • Better results: retrieves top-ranking pages from
    individual search engines
  • Disadvantages
  • Unique features of individual search engines are
    lost
  • Not exhaustive: uses only the top results returned
    by search engines

27
Meta Search Tools...
  • When to use meta search tools?
  • Need to be used cautiously
  • Good for simple searches, particularly if search
    terms are distinctive or unique
  • Good for testing a few keywords to find which
    individual search engine returns good results
  • Good for "quick and dirty" searching if you are
    in a hurry and want to find a few relevant sites
    quickly
  • For complex searches, involving many search
    terms, Boolean logic, etc., it is better to use
    individual search engines

28
Meta Search Tools...
  • Demonstration
  • MetaCrawler (www.metacrawler.com)
  • Ixquick (www.ixquick.com)
  • Dogpile (www.dogpile.com)
  • ProFusion (www.profusion.com)

29
People Finding Tools
  • Register names and addresses and find e-mail
    addresses
  • Examples
  • Bigfoot (www.bigfoot.com)
  • Peoplesearch (www.peoplesearch.net)
  • Ahoy (ahoy.cs.washington.edu:6060/)
  • Four11 (www.four11.com)
  • Switchboard (www.switchboard.com)
  • Whowhere (www.whowhere.lycos.com/)
  • Most search engines also support people searches
    (e.g. Altavista, Google, Yahoo!)

30
People Finding Tools
  • Using people finding tools
  • Person should have registered in the tool(s)
  • Searcher should know both surname and first name,
    else too many names will be retrieved
  • Bias for U.S. based people
  • Often, required e-mail cannot be retrieved
    through these tools
  • Alternatively, any search engine may be used
    (phrase search using the person's name)
  • If the person's affiliation is known, Yahoo!
    Directory may be used to locate the institution
    and e-mail

31
Web Search Strategies
  • Search steps
  • Analyze the search topic and identify the search
    terms (both inclusion and exclusion), their
    synonyms (if any), phrases and Boolean relations
    (if any)
  • Select the search tool(s) to be used (meta search
    engine, directory, general search engine,
    specialty search engine)
  • Translate the search terms into search statements
    of the selected search engine
  • Perform search
  • Refine the search based on results
  • Visit the actual site(s) and save the information
    (using File-Save option of the browser)
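
The step of translating search terms into a search statement can be sketched as a small helper that ORs synonyms within a concept, ANDs the concepts together, and quotes multi-word phrases. The function names are hypothetical, and the generic `AND`/`OR`/`AND NOT` syntax is an assumption; each search tool's help pages give its actual syntax.

```python
def quote(term):
    """Wrap multi-word terms in double quotes so they search as a phrase."""
    return f'"{term}"' if " " in term else term


def build_statement(concepts, exclude=()):
    """AND the concepts together; OR the synonyms inside each concept.

    `concepts` is a list of synonym lists; `exclude` lists unwanted terms.
    """
    groups = []
    for synonyms in concepts:
        joined = " OR ".join(quote(t) for t in synonyms)
        groups.append(f"({joined})" if len(synonyms) > 1 else joined)
    statement = " AND ".join(groups)
    for term in exclude:
        statement += f" AND NOT {quote(term)}"
    return statement


print(build_statement([["simulat*", "model*"], ["activated sludge process"]]))
print(build_statement([["Light Combat Aircraft"], ["India"]], exclude=["resumes"]))
```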

32
Web Search Strategies
  • Tips for effective web searching
  • Broad or general concept searches: start with
    directory-based services (when you want a few
    highly relevant sites for a broad topic)
  • Highly specific topics, or topics with unique
    terms/ many concepts: use the search tools
  • Go through the help pages of search tools
    carefully
  • Gather sufficient information about the search
    topic before searching
  • Spelling variations, synonyms, broader and
    narrower terms
  • Use specific keywords, rare/unusual words are
    better than common ones

33
Web Search Strategies...
  • Tips for effective web searching
  • Prefer phrase/ adjacency searching to Boolean
    ("stuffed animal" rather than stuffed AND animal)
  • Use as many synonyms as possible - search engines
    use statistical retrieval methods and produce
    better results with more query words
  • Avoid use of very common words (e.g., computer)
  • Enter search terms in lower case. Use upper case
    to force exact match (e.g. Light Combat
    Aircraft, LCA)
  • Use the "More like this" option, if supported by
    the search engine (e.g. Excite, Google)

34
Web Search Strategies...
  • Tips for effective web searching
  • Repeat the search by varying search terms and
    their combinations; try this on different search
    tools
  • Enter most important terms first - some search
    tools are sensitive to word order
  • Use the NOT operator to exclude unwanted pages
    (e.g. bio-data, resumes, courses)
  • Go through at least 5 pages of search results
    before giving up the scan
  • Select 2 or 3 search tools and master the search
    techniques

35
Sample Web Searches
  • Companies dealing with polymers
  • Do not use search engines (too many irrelevant
    hits)
  • Use directory sources (e.g. www.yahoo.com)
  • Follow the categories
  • Business and Economy
  • Business-to-Business
  • Chemicals
  • Do a sub-search on Polymers
  • Use specialty search engines (e.g.
    www.bizweb.com)

36
Sample Web Searches...
  • Web pages related to Light Combat Aircraft
  • Keywords are unique
  • Use Search Tools (e.g. www.altavista.com)
  • Search for "Light Combat Aircraft" (phrase search
    in simple search interface)
  • Use of double quotes will force the search engine
    to consider the set of keywords as a phrase
  • Search can be limited to specific dates
  • More refined search in advanced search interface:
    "Light Combat Aircraft" AND India

37
Sample Web Searches...
  • Web sources related to simulation or modeling of
    activated sludge process
  • This is a concept search - search tools are
    better
  • Using Altavista, the query may be submitted as
  • (simulat* OR model*) AND "activated sludge
    process"
  • Note use of * to cover word variations like
    simulated, simulate, models, etc.
  • Note use of phrase form (double quotes) for
    "activated sludge process"
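
What this query asks for can be shown by evaluating it against a few sample sentences. The sketch below uses regular expressions as a stand-in for a search engine: a truncated term matches any word starting with the stem, and the phrase must appear with its words adjacent and in order. The sample sentences and function names are made up for illustration.

```python
import re


def truncation(stem):
    """Regex for a truncated term: the stem followed by any word characters."""
    return re.compile(r"\b" + re.escape(stem) + r"\w*", re.IGNORECASE)


def phrase(text):
    """Regex for an exact phrase, allowing any whitespace between words."""
    parts = [re.escape(w) for w in text.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


SIMULAT = truncation("simulat")
MODEL = truncation("model")
PHRASE = phrase("activated sludge process")


def matches(doc):
    """Evaluate (simulat* OR model*) AND "activated sludge process"."""
    return bool((SIMULAT.search(doc) or MODEL.search(doc)) and PHRASE.search(doc))


SAMPLES = [
    "Dynamic models of the activated sludge process",
    "Simulation of sludge in an activated process",
    "The activated sludge process explained",
]
for doc in SAMPLES:
    print(matches(doc), "-", doc)
```

Only the first sample matches: the second lacks the exact phrase and the third lacks any word beginning with simulat or model.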

38
Guides to Search Tools
  • www.beaucoup.com (guide to 2,000 search engines,
    indices and directories)
  • www.searchpower.com (a very comprehensive search
    engine directory - claims over 16,000 search
    engine listings!)
  • www.123go.com/drw/search/search.htm (Dr.
    Webster's Big Page of Search Engines)
  • www.finderseeker.com (The search engine of search
    engines)
  • www.virtualfreesites.com (Over 1,000 specialised
    search engines)

39
Keeping Current
  • AskScott (www.askscott.com): provides a very
    comprehensive tutorial on search engines
  • SearchEngineWatch (www.searchenginewatch.com):
    offers information about new developments in
    search engines and provides reviews and
    tutorials
  • Botspot (www.botspot.com): collection and guide
    to a variety of bots (intelligent agents)

40
Thank You!
raja_at_ncsi.iisc.ernet.in