Title: Search Facilities in Websites
Nicola Gillespie
Background
I have reviewed several website search facilities,
which vary in the technologies they use. For
example, the Stirling University search facility
uses ht://Dig, and the Strathclyde University
search facility is powered by Google. I also
reviewed large commercial websites such as Amazon
and the BBC. There were some striking mistakes.
For example, searching Amazon for "seller fees"
returned the message "We were unable to find exact
matches for your search for killer. Would you like
to search again?". In contrast, searching Google
for "Amazon seller fees" returns an exact match.
The general conclusion for many websites was that
using Google instead of the website's own search
facility gave better results. Searching Google for
"Stirling University Computing Science timetable"
finds the exact page; however, the same search in
the Stirling University search facility places the
exact match 7th in the results.
The Project
I am currently writing a web-crawler/indexer in
Java using Lucene, which crawls a subset of the
Stirling University website, indexing the web
pages as it goes. Once this is complete, I will
write a search facility to see if I can reproduce
some of the behaviours observed with existing
sites. I shall then make recommendations about
successful and unsuccessful search strategies and
test them using my own program.
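
As an illustration of the indexing and search steps, here is a minimal
sketch of an inverted index in plain Java. It is not the project's code
and does not use Lucene: the class TinySearchIndex, its methods, and the
ranking rule (sum of query-term counts per page) are all assumptions
made for this example, standing in for what a Lucene index and query
would do.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a Lucene index (assumption, not the project's code).
class TinySearchIndex {
    // term -> (page URL -> number of times the term occurs on that page)
    private final Map<String, Map<String, Integer>> postings = new HashMap<>();

    // Lower-case the text and split it on runs of non-alphanumeric characters.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Record every term of the page text against the page URL.
    void addPage(String url, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new HashMap<>())
                    .merge(url, 1, Integer::sum);
        }
    }

    // Return the URLs that contain every query term, ordered by the
    // summed term counts (highest first), with ties broken by URL.
    List<String> search(String query) {
        Map<String, Integer> scores = null;
        for (String term : tokenize(query)) {
            Map<String, Integer> hits =
                    postings.getOrDefault(term, Collections.<String, Integer>emptyMap());
            if (scores == null) {
                scores = new HashMap<>(hits);
            } else {
                // Keep only pages that also contain this term, then add its count.
                scores.keySet().retainAll(hits.keySet());
                for (Map.Entry<String, Integer> e : scores.entrySet()) {
                    e.setValue(e.getValue() + hits.get(e.getKey()));
                }
            }
        }
        if (scores == null) {
            return new ArrayList<>(); // empty query
        }
        final Map<String, Integer> finalScores = scores;
        List<String> results = new ArrayList<>(scores.keySet());
        results.sort((a, b) -> {
            int byScore = Integer.compare(finalScores.get(b), finalScores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return results;
    }

    public static void main(String[] args) {
        TinySearchIndex index = new TinySearchIndex();
        index.addPage("/timetable", "Computing Science timetable");
        index.addPage("/staff", "Computing Science staff");
        System.out.println(index.search("computing timetable"));
    }
}
```

A ranking rule this simple is useful mainly as a baseline: it makes it
easy to see how a relevant page can end up below less relevant ones,
which is the kind of behaviour the project sets out to reproduce.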
[Figure: Process of a web-crawler/indexer. Flowchart boxes: URL to follow; Check hashtable: URL already indexed? (Yes/No); Is crawler allowed? (Yes/No); Contact server; Process page; Extract URL; Index words.]
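
The crawler/indexer process in the figure could be sketched in Java
roughly as follows. The order of the steps is one plausible reading of
the flowchart, not a confirmed one, and everything named here is an
assumption for illustration: the class CrawlSketch, the in-memory map
that stands in for contacting the server, the isAllowed predicate that
stands in for a robots.txt check, and the simple href pattern used to
extract URLs.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical crawl loop (assumption, not the project's code).
class CrawlSketch {
    // Simplified link pattern: matches href="..." with double quotes only.
    private static final Pattern LINK = Pattern.compile("href=\"([^\"]+)\"");

    // Crawl from startUrl; 'pages' simulates the server (URL -> HTML).
    // Fills wordIndex (word -> URLs containing it) and returns the URLs indexed.
    static Set<String> crawl(String startUrl,
                             Map<String, String> pages,
                             Predicate<String> isAllowed,
                             Map<String, Set<String>> wordIndex) {
        Set<String> indexed = new LinkedHashSet<>(); // the "hashtable" of indexed URLs
        Deque<String> toFollow = new ArrayDeque<>();
        toFollow.add(startUrl);

        while (!toFollow.isEmpty()) {
            String url = toFollow.poll();          // URL to follow
            if (indexed.contains(url)) {
                continue;                          // already indexed: skip
            }
            if (!isAllowed.test(url)) {
                continue;                          // crawler not allowed: skip
            }
            String html = pages.get(url);          // contact server (simulated)
            if (html == null) {
                continue;                          // no such page
            }
            indexed.add(url);

            // Process page, part 1: extract URLs and queue them.
            Matcher m = LINK.matcher(html);
            while (m.find()) {
                toFollow.add(m.group(1));
            }

            // Process page, part 2: strip tags and index the remaining words.
            String text = html.replaceAll("<[^>]*>", " ").toLowerCase(Locale.ROOT);
            for (String word : text.split("[^a-z0-9]+")) {
                if (!word.isEmpty()) {
                    wordIndex.computeIfAbsent(word, k -> new TreeSet<>()).add(url);
                }
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        Map<String, String> pages = new HashMap<>();
        pages.put("/", "<a href=\"/a\">A</a> home");
        pages.put("/a", "<a href=\"/\">back</a> timetable");
        Map<String, Set<String>> wordIndex = new HashMap<>();
        System.out.println(crawl("/", pages, u -> true, wordIndex));
        System.out.println(wordIndex);
    }
}
```

Checking the set of indexed URLs before anything else keeps pages that
link to each other from being fetched repeatedly, which is why that
check sits ahead of the server contact in this sketch.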