Title: Indexing and Search Engines
1 Indexing and Search Engines for Intranets
By Suvarsha Walters (suvarsha_at_ncsi.iisc.ernet.in)
2 Overview
- Introduction
- Types of Searching
- Parts of a Local Search Engine
- Working of a Local Search Engine
- Choosing a search engine
- List of some Intranet Search Engines
- Conclusion
- References
3 Introduction: Searching and Search Engines
- A good site is one in which content is king.
- A large volume of information makes a site huge and complex, and navigation difficult.
- Search is the user's lifeline for mastering complex websites.
- A search feature is essential for users who revisit a site looking for specific information.
4 Introduction: Searching and Search Engines
- Search is also the users' escape hatch when they are stuck in navigation. When they can't find a reasonable place to go next, they often turn to the site's search function.
- This is why site search is an important feature of any site of reasonable size.
5 Types of Searching
- A search can be of various types:
- Internet search: Search engines like Yahoo and Infoseek crawl the web, gathering web pages or information about web pages, index them, and retrieve them when the specified term is found.
- Database search: Databases store their information neatly organized into fields. A search interface is provided for this.
6 Types of Searching
- With databases one can set up complex queries to find the search words in all applicable fields.
- But this makes them slower to respond, requires more memory, and requires programming.
- Database search is not oriented towards text search and relevance ranking. Databases are well suited to listings such as an inventory or the directory of the institute.
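A fielded database query of the kind described on this slide might look like the minimal sketch below. It uses Python's built-in `sqlite3` module with an in-memory database; the `staff` table, its columns, and the sample rows are hypothetical and stand in for an institute directory.

```python
import sqlite3

def build_directory():
    """Create an in-memory table standing in for an institute directory (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (name TEXT, department TEXT, room TEXT)")
    conn.executemany(
        "INSERT INTO staff VALUES (?, ?, ?)",
        [
            ("Asha Rao", "Library", "101"),
            ("Library Helpdesk", "Administration", "102"),
            ("Chitra Iyer", "Chemistry", "303"),
        ],
    )
    return conn

def search_all_fields(conn, word):
    """Look for the word in every applicable field; each field has to be named in the query."""
    pattern = f"%{word}%"
    rows = conn.execute(
        "SELECT name, department, room FROM staff "
        "WHERE name LIKE ? OR department LIKE ? OR room LIKE ? "
        "ORDER BY name",
        (pattern, pattern, pattern),
    )
    return rows.fetchall()
```

Note that the rows come back in name order, not in any relevance order, and that the query has to be programmed against the known field names.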
7 Types of Searching
- Intranet search: Search is restricted to a site or a group of sites.
- Text search engines store this information in one index and can find words in any field for a record.
- Many high-end search engines can also store field information, so searches can be limited to a specific field as well.
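The difference between searching any field and limiting a search to one field can be sketched with a single in-memory index. This is only an illustration: the function names, the record layout, and the sample records are assumptions, not taken from any particular search engine.

```python
from collections import defaultdict

def build_index(records):
    """Map each word to the set of (record id, field name) pairs where it occurs."""
    index = defaultdict(set)
    for rec_id, fields in records.items():
        for field, text in fields.items():
            for word in text.lower().split():
                index[word].add((rec_id, field))
    return index

def search(index, word, field=None):
    """Return sorted record ids containing the word, optionally limited to one field."""
    hits = index.get(word.lower(), set())
    return sorted({rec_id for rec_id, f in hits if field is None or f == field})

# Hypothetical sample records with two fields each.
records = {
    1: {"title": "Library opening hours", "body": "The reading room opens at nine"},
    2: {"title": "Canteen menu", "body": "The library canteen serves lunch"},
}
index = build_index(records)
```

With these records, `search(index, "library")` finds both records, while `search(index, "library", field="title")` finds only the first.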
8 Parts of a Local Site Search Tool
- Search Indexer
- The program that reads all the documents on the site and creates an index of them. The index is stored in a file called the index file, where the search engine will find it.
- Search Index File
- Created by the Search Indexer program, this file stores the data from the site in a special index or database, designed for very quick access.
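The indexer and index file can be sketched as follows. This is a simplified illustration, assuming the site is a directory of `.html` and `.txt` files and that the index file is plain JSON; real indexers use more compact formats, and the function names here are hypothetical.

```python
import json
import re
from pathlib import Path

def index_site(doc_root, index_path):
    """Scan every .html and .txt file under doc_root and write a word -> files index."""
    index = {}
    root = Path(doc_root)
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in {".html", ".txt"}:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        text = re.sub(r"<[^>]+>", " ", text)  # crude tag stripping, enough for a sketch
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            index.setdefault(word, []).append(path.relative_to(root).as_posix())
    # The index file is what the search engine reads at query time.
    Path(index_path).write_text(json.dumps(index, sort_keys=True), encoding="utf-8")
    return index

def load_index(index_path):
    """Read the index file back, as the search engine would."""
    return json.loads(Path(index_path).read_text(encoding="utf-8"))
```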
9 Parts of a Local Site Search Tool
- Search Form
- HTML interface to the site search tool, provided for visitors to enter their search terms and specify their preferences for the search.
- Search Engine
- The program (CGI, server module or separate server) that accepts the request from the form or URL, searches the index, and returns the results page to the server.
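The search engine's job (accept the request, search the index, return a results page) can be sketched in a few lines. The query parameter name `q`, the function name, and the index layout (a word mapped to a list of page paths) are assumptions made for this illustration.

```python
from html import escape
from urllib.parse import urlparse, parse_qs

def handle_search(url, index):
    """Parse the query from a form-style URL, look it up, and build an HTML results page."""
    params = parse_qs(urlparse(url).query)
    query = params.get("q", [""])[0]
    # A page matches only if it contains every search word.
    matches = None
    for word in query.lower().split():
        pages = set(index.get(word, []))
        matches = pages if matches is None else matches & pages
    items = "".join(
        f'<li><a href="{escape(page)}">{escape(page)}</a></li>'
        for page in sorted(matches or [])
    )
    # Escape the echoed query so user input cannot inject markup.
    return f"<html><body><h1>Results for {escape(query)}</h1><ul>{items}</ul></body></html>"
```

A form that submits with the GET method would produce a URL such as `/search?q=library+hours`, which this function can take directly.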
10 Parts of a Local Site Search Tool
- Results Listing
- HTML page listing the pages which contain text
matching the search term(s). These are sorted in
some kind of relevance order, with the closest
match at the top. The format of this is often
defined by the site search tool, but may be
modified in some ways.
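One very simple notion of relevance order is to count how often the search terms occur on each page. The sketch below assumes that scoring rule purely for illustration; actual site search tools use their own, usually more elaborate, ranking, and the function name is hypothetical.

```python
def rank_results(pages, terms):
    """Score each page by how often the search terms occur and sort best-first."""
    terms = [t.lower() for t in terms]
    scored = []
    for url, text in pages.items():
        words = text.lower().split()
        score = sum(words.count(t) for t in terms)
        if score > 0:
            scored.append((score, url))
    # Highest score first; ties broken alphabetically by URL for a stable listing.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]
```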
11 Working of a Local Search Engine
12 Types of Search Engines
- CGI Programs
- The Common Gateway Interface (CGI) standard allows a web server to communicate with external programs. CGI programs run as search engines.
- Server Plug-Ins
- For better data interchange, less overhead and more flexibility, web server companies have defined APIs (Application Programmer Interfaces) to their servers. This allows third-party developers to create modules for the servers which run inside the server process.
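A CGI search program receives the form data through the `QUERY_STRING` environment variable and writes a header, a blank line, and the document to standard output. The sketch below follows that pattern; the parameter name `q`, the function name, and the tiny stand-in index are hypothetical.

```python
import os
import sys
from urllib.parse import parse_qs

def cgi_response(environ, index):
    """Build the full CGI output (header, blank line, body) for one search request."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    word = params.get("q", [""])[0].lower()
    pages = index.get(word, [])
    body = "\n".join(pages) if pages else "No matches"
    # A CGI program writes a header block, a blank line, then the document.
    return "Content-Type: text/plain\n\n" + body + "\n"

if __name__ == "__main__":
    # The web server sets QUERY_STRING before launching the program.
    demo_index = {"library": ["/about/library.html"]}  # hypothetical stand-in index
    sys.stdout.write(cgi_response(os.environ, demo_index))
```

Because the web server starts a new process for each request, this approach carries the overhead that server plug-ins are designed to avoid.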
13 Types of Search Engines
- Search Servers
- Some search engines run as separate servers. The form data is passed as part of the URL, just as with a CGI program, but the search engine application runs as a separate HTTP server on a different machine. This reduces the load on the main web server.
- Remote Searching
- It is also possible to outsource search to a remote site search service. The indexer and search engine run on the remote server. Using a web indexing robot, or spider, they follow links on the site and read the pages, then store every word in the index file on that server. When it comes time to search, the form on the site's web page sends a message to the remote search engine, which sends results back to the site.
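The spider's behaviour (follow links, read pages, store every word) can be sketched with Python's standard `html.parser`. To keep the example self-contained, pages are supplied by a caller-provided `fetch` function rather than fetched over HTTP; that function, the class name, and the sample URLs used with it are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkAndTextParser(HTMLParser):
    """Collect href targets and visible words from one HTML page."""
    def __init__(self):
        super().__init__()
        self.links, self.words = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)

    def handle_data(self, data):
        self.words.extend(data.lower().split())

def crawl(start_url, fetch):
    """Follow links from start_url and index every word of each page fetch() can supply."""
    index, seen, queue = {}, set(), [start_url]
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)
        if html is None:  # page missing or outside the site
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        for word in parser.words:
            index.setdefault(word, set()).add(url)
        # Resolve relative links against the current page before queueing them.
        queue.extend(urljoin(url, link) for link in parser.links)
    return index
```

Passing a dictionary's `get` method as `fetch` restricts the crawl to the pages in that dictionary, which mirrors a spider that stays within one site.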
14 Choosing a Site Search Tool
- Technical Considerations
- Indexing Features
- Searching Capabilities
- Results display
- Costs, licensing and registration requirements
- Unique features (if any)
15 Features of search engines: Technical Considerations
16 Features of search engines: Indexing features
17 Features of search engines: Indexing features
18 Features of search engines: Search Capabilities
19 Features of search engines: Searching features
20 Features of search engines: Results Display
21 Choosing the right search engine
- Checklist of factors to be considered while selecting the search engine:
- Size of the website
- Technical expertise available (local and/or from the supplier / developer)
- System platforms available
- Information sources and services to be supported
- Document collection type, volume (now and in future)
- Indexing, search and display requirements
22 Choosing the right search engine
- Checklist of factors to be considered while selecting the search engine:
- User community to be served
- Differentiate between the need for indexing the web site pages and the need for indexing databases / document collections (text, bibliographic, DBMS, etc.)
- Support for the concept of a "record" by the search engine
- Support for structured fields and metadata
- Cost
23 Choosing the right search engine
- Steps in the selection and procurement of search engines:
- Conduct a needs analysis.
- Talk to other libraries.
- Attend trade shows and talk to vendors.
- Read the literature that reviews search engines.
- Compile a list of possible products.
24 Choosing the right search engine
- Steps in the selection and procurement of search engines:
- Compare the functionality of each product to the criteria you developed through needs analysis.
- Narrow your list down to three possible products.
- Spend additional time learning about each product.
- Invite the vendors in for demonstrations.
- Ask for references and follow up with each reference.
- Select a product and implement it.
- Follow up with end users.
- Continue an ongoing review with end users.
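The comparison and narrowing steps can be approached as a weighted scoring table. The sketch below is one possible way to do this, not a prescribed method: the function name is an assumption, and the criteria names, weights, and product scores in the usage note are hypothetical placeholders for the results of a real needs analysis.

```python
def shortlist(products, weights, keep=3):
    """Rank candidate products by weighted criteria scores and keep the top few."""
    def total(scores):
        # A criterion the product was not scored on counts as zero.
        return sum(weights[c] * scores.get(c, 0) for c in weights)
    ranked = sorted(products, key=lambda name: (-total(products[name]), name))
    return ranked[:keep]
```

For example, with weights `{"indexing": 3, "search": 2, "cost": 1}` and each product scored from 1 to 5 on those criteria, `shortlist(products, weights)` returns the three highest-scoring product names for closer study and vendor demonstrations.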
25 Choosing the right search engine
- Some Suggestions
- The search system development or selection should be based primarily on local needs.
- Consider using freeware search engines if they meet your requirements.
- For large, highly developed intranet sites, you may want to consider commercial search engines.
- Consider whether the web server you are using supports indexing and search, and whether this is adequate for you.
26 Choosing the right search engine
- IT professionals should make an effort to keep themselves abreast of current web technologies.
- The features available within a tool should be used properly to get maximum benefit.
- Carefully consider the interrelations between the three major components: document resources, users, and the search engines.
27 Conclusion
- Since search is such a common activity, the search box should appear on every page of your web site.
- The initial target of the basic search should be the contents of the entire web site.
- The basic search should allow for Boolean commands ("and," "or"), although this does not need to be explained.
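Support for the Boolean commands "and" and "or" in a basic search can be sketched as below. This is a deliberately flat evaluator, assuming left-to-right evaluation with no operator precedence or parentheses; the function name and the index layout (a word mapped to a list of pages) are hypothetical.

```python
def boolean_search(index, query):
    """Evaluate a flat query such as 'library and hours' or 'canteen or sports'.

    Terms are combined left to right; adjacent terms with no operator default to 'and'.
    """
    result, op = None, "and"
    for token in query.lower().split():
        if token in ("and", "or"):
            op = token
            continue
        pages = set(index.get(token, ()))
        if result is None:
            result = pages
        elif op == "and":
            result &= pages
        else:
            result |= pages
        op = "and"  # reset so that the next bare term is treated as 'and'
    return sorted(result or [])
```

Defaulting to "and" when no operator is given means a visitor who simply types two words gets the narrower result set without needing to know the commands exist.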
28 Conclusion
- A quality search process begins with quality metadata. It's that old principle: garbage in, garbage out. Metadata is about giving a structure to the content. For example, if every document is assigned keywords or classified by geography, the reader will get a much more accurate return from his or her search.
- Search engines are the mortar of the Intranet. Given their importance, their implementation must be given high priority, with the necessary time allotted for research and development.
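The effect of metadata on accuracy can be illustrated with a small filter: a full-text match is kept only if the document's metadata also matches. The document layout, the `geography` key, and the function name are assumptions made for this sketch.

```python
def search_with_metadata(documents, word, **required):
    """Return titles whose text contains the word and whose metadata matches every filter."""
    word = word.lower()
    hits = []
    for doc in documents:
        in_text = word in doc["text"].lower().split()
        meta_ok = all(doc["meta"].get(k) == v for k, v in required.items())
        if in_text and meta_ok:
            hits.append(doc["title"])
    return hits
```

If two reports both mention "rainfall" but are classified under different geographies, calling `search_with_metadata(documents, "rainfall", geography="India")` returns only the report classified under India, whereas the plain text search returns both.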
29 List of some (Free) Intranet Indexing Tools (for Windows)
- Microsoft Index Server
- http://www.microsoft.com/ntserver/web/exec/feature/IndexServerSummary.asp
- DeepSearch
- http://www.namo.com/products/ds3/info/index.html
- Harvest
- http://www.tardis.ed.ac.uk/harvest/
- HomepageSearchEngine
- http://www.HomepageSearchEngine.com/
- Swish-E
- http://www.webaugur.com/wares/swish
30 List of some (Free) Intranet Indexing Tools (for Windows)
- PLWeb Turbo (PLS / AOL)
- http://www.pls.com/plweb.htm
- Namazu
- http://www.namazu.org/
- Oracle interMedia
- http://www.oracle.com/intermedia/
- HomepageSearchEngine
- http://www.HomepageSearchEngine.com/
- Sharewire SiteSearch
- http://www.sharewire.com/nav/Products/sitesearch.shtml
31 Free and commercial search engines
- For HTML and text files (web site indexing and file/directory level indexing)
- SWISH-E (sunsite.berkeley.edu/SWISH-E/)
- ht://Dig (htdig.sdsu.edu/)
- Excite For Web Servers (www.excite.com/navigate/)
- WebGlimpse (glimpse.cs.arizona.edu/webglimpse/)
- For structured/formatted data
- MySQL (www.mysql.com)
32 Free and commercial search engines
- Commercial search engines
- AltaVista (www.altavista.digital.com/)
- Fulcrum (www.fulcrum.com/)
- Infoseek (software.infoseek.com)
- Open Text (www.opentext.com/)
- Oracle (www.oracle.com/)
- PLS (www.pls.com/)
- Verity (www.verity.com/)
33 References
- Practical Example of Choosing a Site Search Tool
- University of Pennsylvania
- http://www.upenn.edu/computing/web/webteam/rnd/search.html
- Search Engine Watch Page
- http://www.searchenginewatch.com/resources/software.html
- Web Admin's Guide to Site Search Tool
- http://www.searchtools.com/guide/index.html
- List of Search Tools
- http://www.searchtools.com/tools/tools.html
- Review of Remote Search Services
- http://www.searchtools.com/reviews/remotesearch/