Indexing and Search Engines

Provided by: NCSI

1
Indexing and Search Engines for the Intranets
By Suvarsha Walters (suvarsha_at_ncsi.iisc.ernet.in)
2
Overview
  • Introduction
  • Types of Searching
  • Parts of a Local Search Engine
  • Working of a Local Search Engine
  • Choosing a search engine
  • List of some Intranet Search Engines
  • Conclusion
  • References

3
Introduction: Searching and Search Engines
  • A good site is one in which content is king
  • A lot of information makes a site huge and
    complex, and makes navigation difficult
  • Search is the user's lifeline for mastering
    complex websites
  • A search feature is essential for users when they
    revisit a site, looking for specific information

4
Introduction: Searching and Search Engines
  • Search is also users' escape hatch when they are
    stuck in navigation. When they can't find a
    reasonable place to go next, they often turn to
    the site's search function.
  • This is why site search is an important feature
    of any site of reasonable size

5
Types of Searching
  • A search can be of various types
  • Internet search: Search engines like Yahoo and
    Infoseek crawl the web gathering web pages or
    information on web pages, index them, and
    retrieve them when a specific term is found
  • Database search: Databases store their
    information neatly organized into fields. A
    search interface is provided for this.

6
Types of Searching
  • With databases one can set up complex queries to
    find the search words in all applicable fields.
  • But this makes them slower to respond, requires
    more memory, and requires programming.
  • Database search is not oriented towards text
    search and relevance ranking; databases are great
    for listings such as an inventory or the
    directory of an institute
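As a rough illustration of a database search across several fields, the sketch below uses Python's built-in sqlite3 module. The table name, column names, and sample rows are invented for the example and do not come from the presentation. Note that every field has to be named in the query and that the rows come back in name order, not in any relevance order.

```python
import sqlite3

def search_directory(conn, word):
    """Return names of staff rows whose name, department, or role contains the word."""
    pattern = f"%{word}%"
    rows = conn.execute(
        "SELECT name FROM staff "
        "WHERE name LIKE ? OR department LIKE ? OR role LIKE ? "
        "ORDER BY name",
        (pattern, pattern, pattern),
    ).fetchall()
    return [name for (name,) in rows]

# Hypothetical institute directory held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, department TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [
        ("Asha Rao", "Library", "Cataloguer"),
        ("Ravi Kumar", "Physics", "Librarian liaison"),
        ("Meera Nair", "Chemistry", "Lecturer"),
    ],
)
print(search_directory(conn, "libr"))  # ['Asha Rao', 'Ravi Kumar']
```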

7
Types of Searching
  • Intranet search: Search is restricted to a site
    or a group of sites.
  • Text search engines store this information in
    one index and can find words in any field for a
    record.
  • Many high-end search engines can also store
    field information, so searches can be limited to
    a specific field as well.
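The idea of a single index that can also remember field information might be sketched as follows. This is a minimal illustration, not how any particular product works; the record layout and the function names build_index and search are assumptions made for the example.

```python
from collections import defaultdict

def build_index(records):
    """Map each word to the set of (record id, field name) pairs where it occurs."""
    index = defaultdict(set)
    for record_id, fields in records.items():
        for field_name, text in fields.items():
            for word in text.lower().split():
                index[word].add((record_id, field_name))
    return index

def search(index, word, field=None):
    """Return sorted record ids containing the word, optionally in one field only."""
    hits = index.get(word.lower(), set())
    return sorted({rid for rid, fname in hits if field is None or fname == field})

# Hypothetical records with two fields each.
records = {
    1: {"title": "Library hours", "body": "The library opens at nine"},
    2: {"title": "Campus map", "body": "Find the library and the canteen"},
}
index = build_index(records)
print(search(index, "library"))           # [1, 2]  (any field)
print(search(index, "library", "title"))  # [1]     (title field only)
```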

8
Parts of a Local Site Search Tool
  • Search Indexer
  • The program that recognizes and creates an index
    of all the documents on the site. The index is
    stored in a file called the index file, where
    the search engine will find it.
  • Search Index File
  • Created by the Search Indexer program, this file
    stores the data from the site in a special index
    or database, designed for very quick
    access.
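A toy version of the indexer and its index file could look like the sketch below. It assumes the site is a directory of .html files and stores the index as JSON; both choices, and all function names, are assumptions for illustration only. The text extractor is deliberately crude and also picks up script and style text.

```python
import json
import re
import tempfile
from html.parser import HTMLParser
from pathlib import Path

class TextExtractor(HTMLParser):
    """Collect the text of an HTML page, ignoring the tags themselves."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def build_index(site_dir):
    """Map each word to the list of pages (relative paths) that contain it."""
    index = {}
    for page in sorted(Path(site_dir).rglob("*.html")):
        extractor = TextExtractor()
        extractor.feed(page.read_text(encoding="utf-8"))
        words = set(re.findall(r"[a-z0-9]+", " ".join(extractor.chunks).lower()))
        for word in words:
            index.setdefault(word, []).append(page.relative_to(site_dir).as_posix())
    return index

def write_index_file(index, index_path):
    """Save the index so the search engine can load it later."""
    Path(index_path).write_text(json.dumps(index, sort_keys=True), encoding="utf-8")

def load_index_file(index_path):
    """Read an index previously saved by write_index_file."""
    return json.loads(Path(index_path).read_text(encoding="utf-8"))

# Demo on a throwaway two-page site.
with tempfile.TemporaryDirectory() as site:
    Path(site, "hours.html").write_text("<h1>Library hours</h1>", encoding="utf-8")
    Path(site, "map.html").write_text("<p>Campus map with library</p>", encoding="utf-8")
    index_path = Path(site, "site.idx")
    write_index_file(build_index(site), index_path)
    print(load_index_file(index_path)["library"])  # ['hours.html', 'map.html']
```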

9
Parts of a Local Site Search Tool
  • Search Form
  • HTML interface to the site search tool, provided
    for visitors to enter their search terms and
    specify their preferences for the search
  • Search Engine
  • The program (CGI, server module or separate
    server) that accepts the request from the form or
    URL, searches the index, and returns the results
    page to the server

10
Parts of a Local Site Search Tool
  • Results Listing
  • HTML page listing the pages which contain text
    matching the search term(s). These are sorted in
    some kind of relevance order, with the closest
    match at the top. The format of this is often
    defined by the site search tool, but may be
    modified in some ways.
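One very simple notion of "relevance order" is to count how often the search terms occur in each page and list the highest count first. The sketch below does only that; real site search tools typically use more elaborate measures, and the page texts and the name rank_pages are assumptions for the example.

```python
import re
from collections import Counter

def rank_pages(pages, query):
    """Return (url, score) pairs for pages matching any query term, best match first."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    results = []
    for url, text in pages.items():
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        score = sum(counts[term] for term in terms)
        if score > 0:
            results.append((url, score))
    # Highest score first; ties are broken by URL so the order is stable.
    return sorted(results, key=lambda item: (-item[1], item[0]))

# Hypothetical page texts.
pages = {
    "/hours.html": "Library hours: the library is open daily",
    "/map.html": "Campus map showing the library and hostels",
    "/canteen.html": "Canteen menu for the week",
}
for url, score in rank_pages(pages, "library hours"):
    print(score, url)
```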

11
Working of a Local Search Engine
12
Types of Search Engines
  • CGI Programs
  • The Common Gateway Interface (CGI) standard
    allows a web server to communicate with external
    programs. CGI Programs run as Search Engines.
  • Server Plug-Ins
  • For better data interchange, less overhead and
    more flexibility, web server companies have
    defined APIs (Application Programmer Interfaces)
    to their servers. This allows third-party
    developers to create modules for the servers
    which run inside the server process

13
Types of Search Engines
  • Search Servers
  • Some search engines run as separate servers. The
    form data is passed as part of the URL, just like
    a CGI request, but the search engine application
    runs as a separate HTTP server on a different
    machine. This reduces the load on the main web
    server.
  • Remote Searching
  • It is also possible to outsource search to a
    remote site search service. The indexer and
    search engine run on the remote server. Using a
    web indexing robot, or spider, they follow links
    on the site and read the pages, then store every
    word in the index file on that server. When it
    comes time to search, the form on the site's web
    page sends a message to the remote search engine,
    which sends results back to the site.
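The spider's behaviour (follow links on the site, read the pages, store every word) can be sketched in a few lines. To keep the example runnable without network access, pages are fetched from a dictionary; the URLs, the fetch callback, and the crawl function are all assumptions for illustration. A real robot would fetch over HTTP and respect the site's robots rules.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkParser(HTMLParser):
    """Collect the href targets of <a> tags and the page's text."""
    def __init__(self):
        super().__init__()
        self.links, self.text = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)

    def handle_data(self, data):
        self.text.append(data)

def crawl(start_url, fetch):
    """Follow links within the start URL's host; return word -> set of URLs."""
    host = urlparse(start_url).netloc
    index, seen, queue = {}, {start_url}, deque([start_url])
    while queue:
        url = queue.popleft()
        page = fetch(url)
        if page is None:  # page could not be fetched; skip it
            continue
        parser = LinkParser()
        parser.feed(page)
        for word in " ".join(parser.text).lower().split():
            index.setdefault(word, set()).add(url)
        for link in parser.links:
            target = urljoin(url, link)
            # Stay on the same site and never queue a page twice.
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return index

# Hypothetical two-page site with one link to another host.
SITE = {
    "http://intranet.example/":
        '<a href="/hours.html">Hours</a> <a href="http://other.example/">Other</a> Welcome',
    "http://intranet.example/hours.html": '<a href="/">Home</a> Library hours',
}
index = crawl("http://intranet.example/", SITE.get)
print(sorted(index["hours"]))
```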

14
Choosing a Site Search Tool
  • Technical Considerations
  • Indexing Features
  • Searching Capabilities
  • Results display
  • Costs, licensing and registration requirements
  • Unique features (if any)

15
Features of search engines: Technical Considerations
16
Features of search engines: Indexing features
17
Features of search engines: Indexing features
18
Features of search engines: Search Capabilities
19
Features of search engines: Searching features
20
Features of search engines: Results Display
21
Choosing the right search engine
  • Checklist of factors to be considered while
    selecting the search engine
  • Size of the website
  • Technical expertise available (local and/or from
    the supplier / developer)
  • System platforms available
  • Information sources and services to be supported
  • Document collection type, volume (now and in
    future)
  • Indexing, search and display requirements

22
Choosing the right search engine
  • Checklist of factors to be considered while
    selecting the search engine
  • User community to be served
  • Differentiate between the need for indexing the
    web site pages and the need for indexing
    databases / document collections (text,
    bibliographic, DBMS, etc.)
  • Support for the concept of a "record" by the
    search engine.
  • Support for structured fields and metadata
  • Cost

23
Choosing the right search engine
  • Steps in the selection and procurement of search
    engines
  • Conduct a needs analysis.
  • Talk to other libraries.
  • Attend trade shows and talk to vendors.
  • Read the literature that reviews search
    engines.
  • Compile a list of possible products.

24
Choosing the right search engine
  • Steps in the selection and procurement of search
    engines
  • Compare the functionality of each product to the
    criteria you developed through needs analysis
  • Narrow your list down to three possible products.
  • Spend additional time learning about each
    product.
  • Invite the vendors in for demonstrations.
  • Ask for references and follow up with each
    reference
  • Select product and implement.
  • Follow up with end users.
  • Continue an ongoing review with end users.

25
Choosing the right search engine
  • Some Suggestions
  • The search system development or selection should
    be based primarily on the local needs
  • Consider using freeware search engines, if your
    requirements are met by these.
  • For large, highly developed intranet sites, you
    may like to consider commercial search engines
  • Consider if the webserver you are using supports
    indexing and search, and if this is adequate for
    you.

26
Choosing the right search engine
  • IT professionals should make an effort to
    keep themselves abreast of current web
    technologies
  • The features available within a tool should be
    used properly to get maximum benefit
  • Carefully consider the interrelations between the
    three major components: document resources, users,
    and the search engines.

27
Conclusion
  • Since search is such a common activity, the
    search box should appear on every page of your
    web site.
  • The initial target of the basic search should be
    the contents of the entire web site.
  • The basic search should allow for Boolean
    commands ("and," "or"), although this does not
    need to be explained.
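Boolean "and" and "or" map naturally onto set intersection and union over the pages listed in the index. The sketch below evaluates a query strictly left to right with no operator precedence or parentheses, which is a simplification; the index contents and the name boolean_search are assumptions for the example.

```python
def boolean_search(index, query):
    """Evaluate a query such as 'library and hours' left to right."""
    result, operator = None, "and"
    for token in query.lower().split():
        if token in ("and", "or"):
            operator = token
            continue
        pages = index.get(token, set())
        if result is None:
            result = set(pages)      # first term: copy so the index is not modified
        elif operator == "and":
            result &= pages          # keep pages that contain both terms
        else:
            result |= pages          # keep pages that contain either term
        operator = "and"             # adjacent terms with no operator mean AND
    return sorted(result or [])

# Hypothetical index: word -> set of pages containing it.
index = {
    "library": {"/hours.html", "/map.html"},
    "hours": {"/hours.html"},
    "canteen": {"/canteen.html"},
}
print(boolean_search(index, "library and hours"))  # ['/hours.html']
print(boolean_search(index, "hours or canteen"))   # ['/canteen.html', '/hours.html']
```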

28
Conclusion
  • A quality search process begins with quality
    metadata. It's that old principle: garbage in,
    garbage out. Metadata is about giving a structure
    to the content. For example, if every document
    is assigned keywords or classified by
    geography, the reader will get a much more
    accurate return from his or her search.
  • Search engines are the mortar of the Intranet.
    As important as they are, their implementation
    must be given high priority with the necessary
    time allotted for research and development

29
List of some (Free) Intranet Indexing Tools (for
Windows)
  • Microsoft Index Server
  • http://www.microsoft.com/ntserver/web/exec/feature/IndexServerSummary.asp
  • DeepSearch
  • http://www.namo.com/products/ds3/info/index.html
  • Harvest
  • http://www.tardis.ed.ac.uk/harvest/
  • HomepageSearchEngine
  • http://www.HomepageSearchEngine.com/
  • Swish-E
  • http://www.webaugur.com/wares/swish

30
List of some (Free) Intranet Indexing Tools (for
Windows)
  • PLWeb Turbo (PLS / AOL)
  • http://www.pls.com/plweb.htm
  • Namazu
  • http://www.namazu.org/
  • Oracle interMedia
  • http://www.oracle.com/intermedia/
  • Sharewire SiteSearch
  • http://www.sharewire.com/nav/Products/sitesearch.shtml

31
Free and commercial search engines
  • For HTML and text files (web site indexing and
    file/directory level indexing)
  • SWISH-E (sunsite.berkeley.edu/SWISH-E/)
  • ht//Dig (htdig.sdsu.edu/)
  • Excite For Web Servers (www.excite.com/navigate/)
  • WebGlimpse (glimpse.cs.arizona.edu/webglimpse/)
  • For structured/formatted data
  • MySQL (www.mysql.com)

32
Free and commercial search engines
  • Commercial search engines
  • AltaVista (www.altavista.digital.com/)
  • Fulcrum (www.fulcrum.com/)
  • Infoseek (software.infoseek.com)
  • Open Text (www.opentext.com/)
  • Oracle (www.oracle.com/)
  • PLS (www.pls.com/)
  • Verity (www.verity.com/)

33
References
  • Practical Example of Choosing a Site Search Tool,
    University of Pennsylvania
  • http://www.upenn.edu/computing/web/webteam/rnd/search.html
  • Search Engine Watch Page
  • http://www.searchenginewatch.com/resources/software.html
  • Web Admin's Guide to Site Search Tools
  • http://www.searchtools.com/guide/index.html
  • List of Search Tools
  • http://www.searchtools.com/tools/tools.html
  • Review of Remote Search Services
  • http://www.searchtools.com/reviews/remotesearch/