Title: Search Engines for Intranets
1. Search Engines for Intranets
- Types of search engines
- How do search engines work?
- Features of search engines
- Choosing the right search engine
- Free and commercial search engines
- Demo of ht//Dig and MG
2. Types of Search Engines
- Internet search engines
- Crawl, index and search the entire Internet. E.g. AltaVista, Lycos, Infoseek.
- Intranet search engines
- Crawl and index internal web servers, and/or portions of these servers, to create a custom, searchable index of the documents and data housed on the servers. E.g. ht//Dig, SWISH.
- Website indexing (e.g. a library website)
- Indexing textual databases (e.g. bibliographic and full-text files)
3. Internet search engines
4. Intranet search engine
5. Types of Search Engines
- Intranet search engines differ from Internet search engines in the following ways:
- They often provide indexing for many document formats, such as PDF, word processing, spreadsheets, databases and graphics.
- The indexing process is usually deeper than its Internet counterpart.
6. Types of Search Engines
- Why search engines for information professionals?
- Knowledge of the indexing and searching process helps in the implementation and evaluation of intranet search engines.
- They need to familiarise themselves with the products available and the issues surrounding their selection, implementation and use.
7. Types of Search Engines
- Why search engines for information professionals?
- In-depth knowledge of searching techniques, including the use of controlled vocabulary, Boolean operators, proximity operators and relevancy ranking, is necessary for evaluation.
- An understanding of, and experience with, standard indexing practices and parameters can also ensure that the data contained in the various indexes built on a corporate intranet will facilitate accurate and efficient data retrieval.
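The Boolean and proximity operators mentioned above can be sketched against a toy inverted index. This is a minimal illustration, not any particular product's implementation; the documents, terms and positions are invented.

```python
# Toy inverted index: term -> {doc_id: [word positions]}.
# All documents and terms here are invented for illustration.
index = {
    "intranet": {1: [0], 2: [3]},
    "search":   {1: [1], 2: [4], 3: [0]},
    "engine":   {1: [2], 3: [1]},
}
all_docs = {1, 2, 3}

def docs(term):
    """Set of documents containing a term."""
    return set(index.get(term, {}))

# Boolean operators map directly onto set operations over posting lists.
and_result = docs("intranet") & docs("search")  # AND: both terms present
or_result = docs("intranet") | docs("engine")   # OR: either term present
not_result = all_docs - docs("engine")          # NOT: term absent

def near(t1, t2, k):
    """Proximity: documents where t1 and t2 occur within k positions."""
    hits = set()
    for d in docs(t1) & docs(t2):
        for p1 in index[t1][d]:
            if any(abs(p1 - p2) <= k for p2 in index[t2][d]):
                hits.add(d)
    return hits

print(and_result)
print(near("search", "engine", 1))
```

Relevancy ranking would add a scoring step over these matched sets rather than returning them unordered.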
8. How do search engines work?
- Intranet search engines operate in a manner similar to information retrieval systems (Fig 1).
- Components of a search engine: the gatherer, the indexer and the search engine (Fig 2).
- Gatherer
- The gatherer, or crawler, gathers content descriptors from the document collection. In the case of HTML files it follows links to other pages within the site; this is called the site being "spidered" or "crawled". In the case of remote indexing, the gatherer returns to the site on a regular basis.
9. Fig 1
10. Fig 2
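The link-following step of the gatherer can be sketched with Python's standard html.parser. The page content and intranet URLs below are invented; a real gatherer adds HTTP fetching, duplicate detection, politeness delays and the revisit scheduling described above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkGatherer(HTMLParser):
    """Collects href targets from <a> tags, resolved against a base URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links so the crawler can queue them.
                    self.links.append(urljoin(self.base_url, value))

# Hypothetical page fetched from an intranet server.
page = '<html><body><a href="/docs/a.html">A</a> <a href="b.html">B</a></body></html>'
g = LinkGatherer("http://intranet.example/dir/")
g.feed(page)
print(g.links)
```

A crawl loop would push each discovered link onto a queue, skipping URLs already visited or outside the configured site.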
11. How do search engines work?
- Indexer
- Everything the gatherer finds goes into the second part of a search engine, the index. The index, also called the catalog, contains all the descriptors that the gatherer finds.
- Search engine
- This is the program that sifts through the millions of descriptors recorded in the index to find matches to a search. Search engines also support free-text indexing and relevance ranking. This process is shown in Fig 3a and Fig 3b.
12. Fig 3a
13. Fig 3b
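The indexer and search-engine roles described above can be sketched as an inverted index queried with a simple term-frequency ranking. The sample documents are invented, and production engines use more refined scoring (e.g. tf-idf), but the division of labour is the same: the indexer builds the catalog once, the search program only consults it.

```python
from collections import defaultdict

# Hypothetical document collection gathered from an intranet.
documents = {
    "intro.html":  "search engines index intranet documents",
    "howto.html":  "the indexer builds the index the search engine queries",
    "policy.html": "intranet usage policy",
}

# Indexer: build an inverted index mapping term -> {doc: frequency}.
index = defaultdict(dict)
for doc, text in documents.items():
    for term in text.lower().split():
        index[term][doc] = index[term].get(doc, 0) + 1

# Search engine: rank documents by summed query-term frequency,
# breaking ties alphabetically for a deterministic ordering.
def search(query):
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc, freq in index[term].items():
            scores[doc] += freq
    return sorted(scores, key=lambda d: (-scores[d], d))

print(search("search index"))
```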
14. Features of search engines
- Technical functionality
- Indexing features
- Search features
- Results display
- Costs, licensing and registration requirements
- Unique features (if any)
15. Features of search engines: Technical functionality
16. Features of search engines: Indexing features
17. Features of search engines: Indexing features
18. Features of search engines: Searching features
19. Features of search engines: Searching features
20. Features of search engines: Results display
21. Choosing the right search engine
- Checklist of factors to consider when selecting a search engine:
- Size of the website
- Technical expertise available (local and/or from the supplier/developer)
- System platforms available
- Information sources and services to be supported
- Document collection type and volume (now and in the future)
- Indexing, search and display requirements
22. Choosing the right search engine
- Checklist of factors to consider when selecting a search engine:
- User community to be served
- Differentiate between the need for indexing the website pages and the need for indexing databases/document collections (text, bibliographic, DBMS, etc.)
- Support for the concept of a "record" by the search engine
- Support for structured fields and metadata
- Cost
23. Choosing the right search engine
- Steps in the selection and procurement of search engines:
- Conduct a needs analysis.
- Talk to other libraries.
- Attend trade shows and talk to vendors.
- Read the literature that reviews search engines.
- Compile a list of possible products.
24. Choosing the right search engine
- Steps in the selection and procurement of search engines:
- Compare the functionality of each product to the criteria you developed through the needs analysis.
- Narrow your list down to three possible products.
- Spend additional time learning about each product.
- Invite the vendors in for demonstrations.
- Ask for references and follow up with each reference.
- Select a product and implement it.
- Follow up with end users.
- Continue an ongoing review with end users.
25. Choosing the right search engine
- Some suggestions
- The development or selection of a search system should be based primarily on local needs.
- Consider using freeware search engines if they meet your requirements.
- For large, highly developed intranet sites, you may want to consider commercial search engines.
- Consider whether the web server you are using supports indexing and search, and whether this is adequate for you.
26. Choosing the right search engine
- IT professionals should make an effort to keep abreast of current web technologies.
- The features available within a tool should be used properly to get maximum benefit.
- Carefully consider the interrelations between the three major components: document resources, users and the search engine.
27. Free and commercial search engines
- For bibliographic and textual databases (multi-record files):
- MG (Managing Gigabytes) (www.mds.rmit.edu.au/mg/)
- freeWAIS-sf (www.wsc.com/freeWAIS-sf/fwmain.html)
- Isearch (www.cnidr.org/ir/isearch.html)
- WWWISIS (www.bireme.br/wwwisis2.htm)
28. Free and commercial search engines
- For HTML and text files (website indexing and file/directory-level indexing):
- SWISH-E (sunsite.berkeley.edu/SWISH-E/)
- ht//Dig (htdig.sdsu.edu/)
- Excite for Web Servers (www.excite.com/navigate/)
- WebGlimpse (glimpse.cs.arizona.edu/webglimpse/)
- For structured/formatted data:
- MySQL (www.tcx.se/)
29. Free and commercial search engines
- Commercial search engines
- AltaVista (www.altavista.digital.com/)
- Fulcrum (www.fulcrum.com/ )
- Infoseek (software.infoseek.com)
- Open Text (www.opentext.com/)
- Oracle (www.oracle.com/)
- PLS (www.pls.com/)
- Verity (www.verity.com/)
30. Search engines: Related sources
- Boeri, Robert J. Intranet searching: A light at the end of the tunnel. EMedia Professional, June 1998, pp. 63-69.
- Esler, Sandra L. and Nelson, Michael L. NASA indexing benchmarks: evaluating text search engines. Journal of Network and Computer Applications, 20, 1997, pp. 339-353.
- Hibbard, Justin. Applications--Straight Line to Relevant Data--Customized Content Should Slash Intranet Search Time. Information Week, November 17, 1997.
- Nance, Barry. Internal Search Engines Get You Where You Want To Go. Network Computing, October 8, 1997.
31. Search engines: Related sources
- Railsback, Kevin. Serving Up Quality Searches--Six Server-based Packages for Adding Search Capability to a Website. Internet Computing, February 16, 1998.
- Sullivan, Danny. Search Engine Solutions for Your Site--Make Your Site Easy to Search with an Assortment of Features and Techniques. NetGuide, December 1, 1996.
- Zorn, Peggy et al. Surfing corporate intranets: Search tools that control the undertow. Online, May/June 1997, pp. 30-51.
32. Intranet search engine: ht//Dig
- Developed in 1995 at San Diego State University as a way to search the various web servers on the campus network.
- The current release is htdig-3.1.3.tar.gz and is available at htdig.sdsu.edu/files/htdig-3.1.3.tar.gz
- The ht//Dig system is a complete World Wide Web indexing and searching system for a small domain or intranet.
- It contains four program modules: htdig (retrieves HTML documents), htmerge (creates the document index and word database), htfuzzy (creates indexes for different "fuzzy" search algorithms) and htsearch (the search engine).
33. Intranet search engine: MG
- Developed in 1994 by Tim C. Bell (University of Canterbury), Alistair Moffat (University of Melbourne), Ian Witten (University of Waikato) and Justin Zobel (RMIT).
- The current version is 1.2.1.
- The MG software is a collection of programs that, through the use of compression, provide economical storage and indexing for large collections of documents, as well as fast index construction and query processing.
- It can be obtained via anonymous FTP from the Australian archive host munnari.oz.au (128.250.1.21), in the directory /pub/mg; the documentation is available at www.mds.rmit.edu.au/mg/
- It consists of three program modules: mgbuild (database creation), mgquery (database search) and mgmerge (database updating).
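The compression idea behind MG can be illustrated with a common inverted-file technique: storing each posting list as gaps between ascending document numbers, then coding the gaps with a variable-byte scheme. This is a generic sketch of the technique, not MG's actual code or file format.

```python
def vbyte_encode(numbers):
    """Variable-byte code: 7 data bits per byte; high bit marks the last byte."""
    out = bytearray()
    for n in numbers:
        chunk = []
        while True:
            chunk.append(n & 0x7F)
            n >>= 7
            if n == 0:
                break
        chunk[0] |= 0x80              # flag the final (least significant) byte
        out.extend(reversed(chunk))
    return bytes(out)

def vbyte_decode(data):
    numbers, n = [], 0
    for b in data:
        if b & 0x80:                  # terminating byte of a number
            numbers.append((n << 7) | (b & 0x7F))
            n = 0
        else:
            n = (n << 7) | b
    return numbers

# A posting list of ascending document numbers, stored as gaps:
# small gaps compress to a single byte each.
postings = [3, 7, 154, 160]
gaps = [postings[0]] + [b - a for a, b in zip(postings, postings[1:])]
encoded = vbyte_encode(gaps)
assert vbyte_decode(encoded) == gaps  # round-trips losslessly
print(len(encoded), "bytes for", len(postings), "postings")
```

Because common terms have dense (hence small-gap) posting lists, this kind of coding is what lets systems like MG keep both the text and the index far smaller than the raw collection.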