Title: Finding Resources On Your Web Site
1. Finding Resources On Your Web Site
- Aims of Talk
- Review approaches taken by UK HE and Public Library communities to indexing web sites
- Discussion of findings
- Describe future developments
- Brian Kelly
- UK Web Focus
- UKOLN
- University of Bath
- Bath, BA2 7AY
- Email: B.Kelly_at_ukoln.ac.uk
- URL: http://www.ukoln.ac.uk/
UKOLN is funded by the Library and Information
Commission, the Joint Information Systems
Committee (JISC) of the Higher Education Funding
Councils, as well as by project funding from the
JISC and the European Union. UKOLN also
receives support from the University of Bath
where it is based.
2. UKOLN and UK Web Focus
- UKOLN
- UK Office for Library and Information Networking
- Small research and advisory group based at University of Bath
- Funded by JISC and LIC (MLAC from 1 April) to advise Higher Education and Library (and Museums and Archives from 1 April) communities on digital networking issues
- UK Web Focus
- JISC-funded post to advise HE community on web matters
3. Contents
- Background
- A Survey of Two Communities
- Comparisons
- Interesting Examples
- Other Developments
- Conclusions
4. Importance of Indexing
- Design and browsing tend to be given priority
- But
- Users will search as well as browse
- Users may not understand navigation structure / metaphors which are obvious to members of the organisation
- Searching becomes more important as a web site grows
5. Which To Choose?
Can choose by reading reviews, web sites, etc. or by looking at usage in community
- Alkaline (Vestris)
- AltaVista - Search Intranet
- ASTAWare SearchKey
- atomz Search (remote)
- BooleanSearch
- BBDBot
- BRS/Search (Dataware)
- Compass Server (Netscape)
- Cybotics
- DataWare BRS/Search
- DocFather (formerly SiteSearch)
- dtSearch Web
- Excalibur RetrievalWare
- EWS (Excite)
- Excerpt (Obsolete)
- Extense
- FAST Search Server
- Findex (code library)
- Folio siteDirector
- Glimpse
- Harvest
- ht://Dig
- ICE
- iHound (ICATT)
- Index Search (Xavatoria)
- Index Server (Microsoft)
- IndexMySite (remote)
- Infoseek - Ultraseek
- Intermediate Search
- intraSearch (remote)
- I-Search
- Isearch
- ITMS
- Isysweb
- Java Applets
- JHLSearch
- JObjects QuestAgent
- Lycos / InMagic
- Magnifi Enterprise Server
- Matt's SimpleSearch
- Microsoft Index Server
- Microsoft Site Server
- MiniSearch (remote)
- MondoSearch
- Muscat
- NetResults (now SearchKey Plus)
- Netscape - Compass Server
- OpenText - LiveLink
- Perl Scripts
- Perlfect Search
- Phantom (Maxum)
- PicoSearch (remote)
- Etc.
Indexing software from <http://searchtools.com/tools/tools.html>. Which to choose? What software may be obsolete? What does remote mean?
6. Two Surveys
- Two surveys have been carried out
- Summer 1999: a survey of search engines used on institutional UK University web sites (updated recently)
- January 2000: a survey of search engines used on UK Public Library web sites
7. Characteristics of HE Community
- The UK Higher Education community
- Long-standing involvement in Internet and Web
- Much technical expertise available (e.g. PhD students)
- Early involvement in web by enthusiasts
- Initially little finance available, so interest in public domain and open source software
- More financial resources becoming available as senior managers become aware of strategic importance of Web
8. Findings: UK HE Web Sites
- Main findings of two surveys
Software     Nos. (Jul)   Nos. (Mar)
ht://Dig     25           32
eXcite       19           17
Microsoft    12           15
Harvest      8            6
Ultraseek    7            8
Other        29           29
None         60           51
Totals       160          163
- Article published in Ariadne issue 21: <http://www.ariadne.ac.uk/issue21/webwatch/>
- Results (including update on survey) available from <http://www.ukoln.ac.uk/web-focus/surveys/uk-he-search-engines/>
9. Popular Products: ht://Dig
- ht://Dig
- Now used at 32 (up from 25) UK HEIs
- Freely available
- New version released in December 1999
- Own domain with well-designed web site
- Robot to index multiple servers
See <http://www.htdig.org/>
Oxford Case Study: 131 servers, 438,500 resources. Indexes MS Office, PDF, etc. files (external parser)
Case Studies produced by Helen Sargan (Cambridge)
10. Popular Products: eXcite
- eXcite
- Now used at 17 (down from 19) UK HEIs
- By-product of the eXcite Internet search engine
- Bug announced in January 1998. Notice not updated since!
- Time to change?
See <http://www.excite.com/navigate/>
11. Popular Products: Microsoft
- Microsoft
- Several Microsoft indexing tools available (FrontPage, Index Server, SiteServer, ...)
- Most powerful is the SiteServer indexer
- Now used at 15 (up from 12) UK HEIs
Essex Case Study: 16 servers indexed, 11,500 resources. Constrained searches possible. Indexes MS Office, PDF, etc. files
12. Popular Products: Ultraseek
- Ultraseek
- Used at 8 (up from 7) UK HEIs
- Powerful but expensive
- See <http://software.infoseek.com/>
Cambridge Case Study: 232 servers, 188,000 resources. Weightings given to meta tags. Useful logs and reports
13. Popular Products: Harvest
- Harvest
- Now used at 6 UK HEIs (down from 8)
- For IR research use?
- See <http://www.tardis.ed.ac.uk/harvest/>
14. Other Popular Products
Output from SWISH
- SWISH / SWISH-E
- Used at 5 HEIs
- Dated?
- Webinator
- Used at 4 HEIs
- Useful functionality
- See <http://www.thunderstone.com/webinator/>
Output from Webinator
15. Use of Third Party Services
- Small usage of third parties to provide indexes
- FreeFind (used at 2 HEIs) and AltaVista (used at 1 HEI)
- Why not more use by 50 institutions with no search facility?
- Benefits from services provided by popular large-scale search engine
- Low cost (free?)
- Incomplete coverage?
- Loss of control, advertising, ...
16. Characteristics of Public Library Community
- Public Library Community
- Relatively new to Internet and Web
- Less technical expertise available
- Large OPACs available
- Often part of Council's web site
Note: "Well Connected: A Snapshot of Local Authority Websites" (Society of Information Technology Management report) found that in 1999 69% of local authority websites did not have a search facility
17. Results
- Survey carried out on 4-5th January 2000
- Results for 137 web sites
- 49 have no search facility?!
- Of those that do
- 45% (18) use Microsoft
- 7.5% (3) use Domino
- 7.5% (3) use Muscat
- 40% (16) use another solution
- Comments
- Some sites use the general Council search facility and in some sites the Council search facility can be used to search areas (e.g. Library)
- Some sites very small (1 page with opening hours)
- See <http://www.ukoln.ac.uk/web-focus/surveys/pub-lib-search-jan-2000/survey.html>
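The figures 45, 7.5, 7.5 and 40 quoted on this slide are the four product counts expressed as percentages of their combined total (18 + 3 + 3 + 16 = 40 sites). A minimal Python sketch of that arithmetic follows. The counts come from the slide; the function name `shares` and the label "Other" are illustrative only.

```python
# Counts of public library sites per search product, as given on the slide.
# "Other" stands for the slide's "another solution" category.
counts = {"Microsoft": 18, "Domino": 3, "Muscat": 3, "Other": 16}


def shares(counts):
    """Return each count as a percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


result = shares(counts)
for name, pct in result.items():
    print(f"{name}: {counts[name]} sites, {pct}%")
```

The percentages are relative to the 40 sites counted as having a search facility, not to the 137 sites surveyed.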
18. Popular Products: Microsoft
- Microsoft
- Several Microsoft options available
- Used in 18 public libraries
- Sometimes can restrict searches to selected areas
- Popularity indicative of use of Windows NT in public libraries
19. Popular Products: Muscat
- Muscat Empower
- Powerful licensed product
- Agent technology
- Email alerting of changed resources
- Foreign language support
- Used in 2 Public Libraries (full Council web site only)
- Muscat FX also used (1 site)
- See <http://www.muscat.com/>
20. Popular Products: Domino
- Lotus Domino (Notes)
- Powerful, licensed web server system
- Used at 3 Public Libraries
- See <http://www.lotus.com/home.nsf/welcome/domino>
21. Home-Grown Solution
- A small number of Public Libraries have developed their own indexing software. Leeds Public Library has a good example
- Various areas can be searched
- Multiple search terms
- Boolean operators
- Attractive interface
- Software
- Written in C
- Interrogates files when they are live
- Directories can be excluded
- Operational for 3 years
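The features listed above (multiple search terms, Boolean operators, excluded directories, interrogating the live files rather than a prebuilt index) can be illustrated with a short sketch. This is not the Leeds software, which the slide says was written in C and whose internals are not described here. It is a hypothetical Python illustration of the general idea, and the names `search_live` and `demo` and the sample file names are invented for the example.

```python
import os
import tempfile


def search_live(root, terms, mode="AND", exclude=()):
    """Scan files under root at query time and return paths matching the terms.

    mode is "AND" (every term must occur) or "OR" (any term may occur).
    Directory names listed in exclude are skipped entirely.
    """
    wanted = [t.lower() for t in terms]
    combine = all if mode == "AND" else any
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in exclude]
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                text = f.read().lower()
            if combine(t in text for t in wanted):
                hits.append(os.path.relpath(path, root))
    return sorted(hits)


def demo():
    """Build a tiny throwaway site and run an AND query and an OR query."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "private"))
        pages = {
            "hours.html": "Library opening hours",
            "events.html": "Library events and talks",
            "private/draft.html": "Library opening hours draft",
        }
        for rel, text in pages.items():
            with open(os.path.join(root, *rel.split("/")), "w", encoding="utf-8") as f:
                f.write(text)
        both = search_live(root, ["library", "hours"], mode="AND", exclude=("private",))
        either = search_live(root, ["hours", "events"], mode="OR", exclude=("private",))
        return both, either
```

Reading every file on each query keeps results current without a separate indexing run, at the cost of query time that grows with the size of the site.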
22. Try Them For Yourself
- Interfaces to UK University search engines are available, providing a single location for evaluation
- The page also provides a link to organisational search pages
- The resources are grouped in alphabetical order and by search engine
What does Aberdeen's search facility provide? What functionality do libraries using Domino provide?
See <http://www.ukoln.ac.uk/web-focus/surveys/>
23. Other Developments
- What else is happening to indexing of these communities?
- eLib Hybrid Libraries
- National search engines
- Local initiatives
24. eLib Hybrid Libraries
- eLib Phase 3 includes "Hybrid Library" projects
- Help users find electronic (web, OPAC, etc.) and "real world" resources
- Includes regional and subject-specific approaches
MusicOnline: search of Music Catalogues
BUILDER: search of eLib Phase 3 web sites
25. National Search Engines
- ACDC (Academic Directory)
- (Unfunded) pilot of index of ac.uk domain based on distributed approach using Harvest
- Set up in March 1996
- Lack of development effort resulted in degraded service (e.g. indexer not aware of JavaScript code)
- No longer being developed?
http://acdc.hensa.ac.uk/
26. Institutional Developments
- Maestro robot (Dundee)
- Indexes Scottish resources
- Volunteer effort
- North East Universities (UNIS4NE)
- Appearance of cross-searching
- Actually interface to HotBot / AltaVista
27. Other Possibilities
- What other developments may we expect?
- Increased indexing in institutions of other web sites (opposition / friends)
- Development of a HE (or public sector?) national search engine
- "Surface-scraping" of institutional search engines
- Leave it to commercial sector
- European developments
- New developments (XML / RDF / etc.)
28. Indexing Remote Sites
- May see increased indexing of remote sites within institutions
- Examples provided by Dundee and BUILDER
- Feeling of ownership
- Easily done
- Can develop enhancements locally
- Increased server load locally
- Increased server load remotely
- Increased network load
- Not scalable
- Unnecessary duplication
29. "Meta-Search" Possibility
- A collection of interfaces to search engines for UK HEIs is available
- This could be used as the basis of a "meta-searcher"
- Indexes aren't duplicated
- Local site responsible for content of its index
- A hack
- Problems with maintenance
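A rough sketch of the "meta-searcher" idea: send one query to each institution's own search engine and merge whatever comes back, so that no index is duplicated. Everything named here is an assumption made for illustration. The function `meta_search`, the stand-in backends and the example.org URL are hypothetical, and a real version would need per-site code to submit the query and scrape each results page, which is where the maintenance problems noted above arise.

```python
from concurrent.futures import ThreadPoolExecutor


def meta_search(query, backends, timeout=5.0):
    """Send query to every backend and merge the results.

    backends maps a site name to a callable that takes the query and
    returns a list of (title, url) pairs. Sites that raise an error or
    do not answer within timeout seconds are reported in the second
    return value instead of aborting the whole search.
    """
    merged, failed = [], []
    with ThreadPoolExecutor(max_workers=max(1, len(backends))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in backends.items()}
        for name, future in futures.items():
            try:
                for title, url in future.result(timeout=timeout):
                    merged.append({"site": name, "title": title, "url": url})
            except Exception:
                failed.append(name)
    return merged, failed


# Hypothetical stand-ins for two institutional search engines.
def working_site(query):
    return [(f"Page about {query}", "http://www.example.org/page1")]


def broken_site(query):
    raise ConnectionError("results page changed or site unavailable")


merged, failed = meta_search("library", {"site-a": working_site, "site-b": broken_site})
print(merged)
print("No results from:", failed)
```

Each local site stays responsible for the content of its own index. The `failed` list shows why the approach is a hack: any change to a remote search page silently breaks that backend until someone repairs it.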
30. Commercial Solutions
- Could leave searching to commercial world
- No costs to institution / HE community
- Results too broad
- Distracting interface
- Little scope for tailoring
- Not integrated with non-Web services
31. European Developments (1)
- DESIRE project
- EU-funded project with resource discovery component
- Nordic Web Index provides index across Nordic countries (but partly discontinued due to lack of funding)
- See <http://www.desire.org/html/services/resourcediscovery/indexing/>
- REIS
- Pilot project on Research Education Indexing Service for Europe
- See <http://www.terena.nl/projects/reis/>
32. European Developments (2)
- Surfnet
- Dutch Research network service
- Use of AltaVista search software for national index
- But how widely used is it?
- Is there a user demand for this type of service?
http://www.surfnet.nl/en/surfnet-searchtools/
33. What About Metadata?
- Metadata can
- Improve search results
- Provide structured information (for automated processing) which can provide richer services
- Fielded searches
- Limit searches (e.g. only Library pages on Council web site)
- Web site administration
- Alternative browsing interfaces
- Tools, standards, etc. becoming available
- Expected growth area
34. Example
- Exploit Interactive web magazine (www.exploit-lib.org) is using metadata to provide enhanced searching
- Search for foo in
- Issue 2 or in issues 2 and 4 (this is possible using directory structure)
- Feature Articles (needs metadata)
- Articles about EU-funded projects
- Etc.
- Combinations of above
- Also provides alternative browsing structures
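To make the distinction concrete, here is a small sketch of a metadata-restricted ("fielded") search: a page matches only if it contains the search term and its embedded metadata has the required values. This is an illustration under stated assumptions, not a description of how Exploit Interactive implements its search. The `<meta>` field names `type` and `issue`, the sample pages, and the helpers `extract_meta` and `fielded_search` are all hypothetical.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.fields[name.lower()] = content


def extract_meta(html):
    parser = MetaExtractor()
    parser.feed(html)
    return parser.fields


def fielded_search(pages, term, **required):
    """Return URLs of pages containing term whose metadata matches every required field.

    For simplicity the term is matched against the raw HTML, markup included.
    """
    hits = []
    for url, html in pages.items():
        meta = extract_meta(html)
        if term.lower() in html.lower() and all(
            meta.get(field) == value for field, value in required.items()
        ):
            hits.append(url)
    return sorted(hits)


# Hypothetical pages with invented metadata fields.
pages = {
    "/issue2/feature.html": '<head><meta name="type" content="feature">'
                            '<meta name="issue" content="2"></head><body>All about foo</body>',
    "/issue2/news.html": '<head><meta name="type" content="news">'
                         '<meta name="issue" content="2"></head><body>News on foo</body>',
    "/issue4/feature.html": '<head><meta name="type" content="feature">'
                            '<meta name="issue" content="4"></head><body>Nothing relevant</body>',
}

features = fielded_search(pages, "foo", type="feature")
issue_two = fielded_search(pages, "foo", issue="2")
print(features)
print(issue_two)
```

The issue restriction could equally be derived from the directory part of each URL, as the slide notes. The article-type restriction is the case that needs metadata, because nothing in the path records it.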
35. JISC Developments
- DNER (Distributed National Electronic Resource)
- Seamless access to national resources
- What about local resources?
- Need for "institutional portals"
- RDN
- Resource Discovery Network
- Builds on work of eLib subject gateways
- Based on standards (Z39.50, whois, LDAP, etc.)
- Lessons for institutions
36. Conclusions
Questions welcome
- To conclude
- No clear "best buy" for indexing software
- Probably some to avoid
- In 2 years' time are you likely to
- Still be using same software?
- Have changed software / architecture?
- If changes likely, need to think about change migration strategies, interoperability issues, etc.
- Need for user studies (not covered)
Useful Resources: http://SearchTools.com/ http://www.searchenginewatch.com/ http://www.builder.com/Servers/AddSearch/