Title: Introduction to Integrated Search

1
Introduction to Integrated Search

Thomas Place
Presentation for Digital Libraries à la Carte 2009
Friday, 31 July 2009

2
Overview

A Personal History
Users
Common Architecture
Concluding remarks

3
A Personal History
4
History of search at Tilburg University

1988 Searching only in the library
1992 Searching moves to the desktop
1995-1997 Homogeneous search interface
2001 Metasearch plus dynamic linking
2009 Integrated search
201? Searching completely in the cloud

5
1988 Searching only in the library

Psychological Abstracts: print index (appeared 1927-2006)
Social Sciences Citation Index: print index (appeared 1973/4 - ????)
OPAC terminals: Online in the library ? Public
Stand-alone PC in the library with CD-ROMs: PsycLit, SSCI
(In 1985: my first PC)
Each week: Current Contents, a print journal with the tables of contents of journals
Each month: exhibition of newly acquired books
Browsing the shelves

6
1992 searching moves to the desktop

New library building
LBS3 of Pica (now OCLC)
New-generation ILS; the core is still operational in 2009
Building the database from the union catalogue took weeks; transfer by tapes
Updates online
OPAC accessible via the Internet (telnet)
Tilburg: the first Dutch university with a Campus Wide Information System (1991), with entry points for the local bibliographical databases
Catalogue
Excerpta Informatica
Online Contents: journal articles
Student theses
Attent: reports in economics
Brabant database
and for external databases on the Internet.
CD-ROMs available via the campus network

7
1995-1997 homogeneous search interface

All local databases (Trip) have a Z39.50 interface; the exception is the catalogue
Z39.50 MS Windows client (Kwik)
Soon replaced by a Web application (Trix)
Homogeneous access to internal and external Z39.50 databases via a Web browser (Netscape)
Each database was, however, searched separately, as in 1988 with the print indexes.
Users didn't understand that Catalogue is for books and journals, Online Contents is for articles, etc. The default selection is the first database in the list.

8
One Interface
Z39.50
Homogeneous user interface
9
One Interface
XML
Federator
Z39.50
SRU
Metasearch
10
2001 metasearch plus dynamic linking

European project Decomate II, then commercialization by OCLC PICA with software development by Tilburg University; no longer on the market, but other products are
First Dutch implementation of metasearch; still running.
Database lists, homogeneous user interface for SRU/Z39.50 databases, metasearch, de-duplication, dynamic linking to fulltext (OpenURL resolving), book shelves, current awareness services
Local databases only available via the user interface of iPort
User interface conforms to house style (demo)
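
To make the merging and de-duplication step of metasearch concrete, here is a minimal sketch in Python. It is not the iPort implementation: the record fields (title, year, source) and the match key built from a normalised title plus year are assumptions chosen for illustration only.

```python
import re
from typing import Dict, Iterable, List


def dedup_key(record: Dict[str, str]) -> str:
    """Build a crude match key from title and year (an assumed heuristic)."""
    title = re.sub(r"[^a-z0-9]+", " ", record.get("title", "").lower()).strip()
    return f"{title}|{record.get('year', '')}"


def merge_results(result_sets: Iterable[List[Dict[str, str]]]) -> List[Dict[str, str]]:
    """Merge result lists from several databases.

    The first copy of each record is kept; later duplicates only add
    their database name to the 'sources' field of the kept copy.
    """
    merged: Dict[str, Dict[str, str]] = {}
    for results in result_sets:
        for record in results:
            key = dedup_key(record)
            if key not in merged:
                merged[key] = dict(record, sources=record["source"])
            else:
                merged[key]["sources"] += ", " + record["source"]
    return list(merged.values())
```

A real system would need fuzzier matching (ISBN/ISSN, author names, pagination); the point here is only that every result set has to be fetched completely before merging, which is one reason metasearch is slow.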

11
Problems with metasearching

The performance is sometimes disappointing (no Google-like performance).
The presentation of the information is not optimal (merging, sorting).
Users find it difficult to select the right databases for a federated search. As a workaround they select all databases, which hurts performance and increases the noise in the search results.
Users don't know how to formulate the best queries for the databases they have chosen. In many cases this is not even possible, because a query that is optimal for one database is not optimal for another database the user also wants to search: indexes differ across databases.

12
One Interface
Z39.50
Homogeneous user interface
13
One Interface
XML
Federator
Z39.50
SRU
Metasearch
14
One Interface
SRU
XML
OAI-PMH
Integrated search
15
2009 Integrated search

The starting point is no longer a page with databases, but the search box.
No database selection, just search (demo)
Technical solution: Meresco of CQ2
Open Source
We work together with TU Delft, which also implements Meresco (Discover)
The Meresco infrastructure is also used for special services, e.g., Economists Online

16
What are our goals?

To be THE one and only search engine of Tilburg University
Searching scientific information (library) AND non-scientific information (website, learning material)
A query leads (in the future) to:
Relevant documents and web pages (Meresco)
Experts (expert finding system developed by a master's student)
Specialised databases (Purple search, metasearch application of the University of Groningen)
Finding documents: no more clicking through to the full record display; the most important information is presented directly in the result list
Informing the user about the search results: facets, clusters
Added value: add-ons / mash-ups, integration in the workflow

18
Components

Information resources
Ingest
Search engine
Presentation and integration of external services

19
[Architecture diagram. Sources: NEEO institutional repositories, other economics repositories, RePEc. Data: logs, metadata, objects. Components: crawler, harvester, metadata enrichment server, gateway, search engine, Ajax server, publication list generator, portlet, portal. Protocols: OAI-PMH, HTTP, SRU, RSS/Atom. Legend: Service component, Data, subcomponent, Protocol.]
20
Information resources

Repositories with OAI-PMH interface
Local databases (IR, Student theses, Online Contents, ...)
SHARED repositories with metadata of publishers
Elsevier repository at UvT
External repositories
RePEc
IRs (e.g., NEEO)
...
GGC: Dutch Shared Cataloguing System with OUF/SRU interface (catalogue records)
...

21
Ingest

Meresco harvester
OAI-PMH: repositories to harvester
SRU Update: from harvester to search engine
Inbox
Pica records from the GGC go into the inbox
Records are fetched from the inbox by the search engine
Records are stored in their original format in the database of the search engine
If a record is not MODS, it is converted: the MODS version is stored alongside the original record, so no dynamic conversion is needed for indexing and presentation
Parts (e.g., ratings, annotations or fulltext) can be added to the record
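
The repository-to-harvester leg can be sketched as follows. This is not the Meresco harvester, only an illustration of the OAI-PMH ListRecords request and the resumption token that drives paging. The base URL, the sample response and the function names are assumptions; the verb, parameter names and XML namespace come from the OAI-PMH 2.0 protocol.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written, heavily abbreviated response used for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request; a resumption token replaces all other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text: str) -> Tuple[List[str], Optional[str]]:
    """Return the record identifiers and the token for the next page (None when done)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token
```

A harvester would fetch list_records_url(...), parse the response, and repeat with the returned token until it is None.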

22
Meresco search engine

Lucene
XML-based: all paths in the tree can be indexed
Powerful facetting engine (not Lucene)
Search term suggestions
Clustering (sort of)
Indexing of fulltext
Has its own GUI
But integration via SRU with other front ends (e.g., Economists Online) is possible
Flexible: write your own pluggable components in Python
UvT develops tools for configuration by Functional Application Managers (information specialists)
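
What a facetting engine returns can be illustrated with a few lines of Python. This is a naive sketch, not the Meresco facetting engine: it counts values over an in-memory result list, and the record layout (a dict with lists of values per field) is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def facet_counts(records: Iterable[Dict[str, List[str]]], field: str,
                 top: int = 5) -> List[Tuple[str, int]]:
    """Count how often each value of `field` occurs in the result set."""
    counts: Counter = Counter()
    for record in records:
        # Records without the field simply contribute nothing.
        counts.update(record.get(field, []))
    return counts.most_common(top)
```

A production engine computes these counts from index structures for the whole hit set rather than by looping over records, which is why facetting is usually a dedicated component.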

23
Integration of external services (UvT)

place locator
OpenURL resolver
No menus: OpenURL in, XML out
Info about location as specific as possible
Connection with ILS for availability info (need for standards: DLF)
Is called from results list (Ajax)
Journal covers (local server)
Book covers (Syndetics)
More to come
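
The "OpenURL in" side of the resolver can be sketched as below. The resolver address and the function name are hypothetical; the keys follow the OpenURL 1.0 (Z39.88-2004) key/encoded-value format for journal articles. The XML that the UvT resolver returns is not described in the slides, so it is not shown.

```python
from urllib.parse import urlencode


def journal_article_openurl(resolver_base: str, issn: str, volume: str,
                            spage: str, date: str) -> str:
    """Build an OpenURL 1.0 KEV request for a journal article."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.issn": issn,
        "rft.volume": volume,
        "rft.spage": spage,
        "rft.date": date,
    }
    return f"{resolver_base}?{urlencode(params)}"
```

The result list page would send such a request from the browser (Ajax) for each record and insert the returned location and availability info.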

24
What is in the search engine now (June 2009)?
25
What will be added?
26
Users (Delft)

Students lack an overview of the domain in which they search. They are inexperienced searchers and don't know the terminology of the disciplines in which they search. The challenge for students is to find structure in the chaos of information.
Students search without a clear plan. They want to be able to revisit earlier search paths. This is not well supported by present systems.
When students start searching, they have no clear idea of what they are searching for. During the search process their information need gradually becomes clearer and they discover the relevant search terms.
For students it is difficult to verify the trustworthiness of the information that they find while searching.
Students don't know RSS.
The way students search is not very well organised. They change strategies and goals. They are very receptive to unexpected results (serendipity), which give them new leads for finding more information.

27
Metalib statistics of the University of Groningen: 50% zero or false results

Misspellings and typos in search terms
Picking databases at random
Unable to understand QuickSearch, MetaSearch, Find Database
Using the wrong search keys
Using search keys incorrectly
Using Dutch search terms in English-language databases
Using non-specific terms, phrases that are too broad
Lack of understanding of Boolean logic or database peculiarities

Metalib statistics
28
Common architecture

Data layer
Search layer: in most cases Lucene as core (Omega of Utrecht University: Autonomy)
Presentation layer

30
Primo Search Engine
Import/Data APIs
data
Publishing Platform
data
Harvesting (OAI-PMH,..)
Source Repository
32
Data layer

Collection of metadata and documents from
external sources by
OAI-PMH harvesting
downloading from CDs or DVDs
FTP get
SRU/Z39.50 requests
Cleaning of the metadata (e.g., repairing invalid XML)
Adding metadata elements: local data, subject info. Also availability info?
Merging metadata (e-holdings and print holdings of the same journals; expressions and manifestations of the same work: FRBR)
Conversion to a standard XML format (PNX, MODS, MARC21): proprietary vs standardized formats. What is stored? Original and/or converted records? Or nothing, or only the external record location?
Adding admin info: source, ingest date, access rights
Fetching documents and adding the (ASCII or XML version of the) fulltext to the records
Processing of data generated by users: tags, annotations, ratings. User-generated data: external (shareable) or internal (non-shareable) data
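
One possible answer to "what is stored?" is sketched below: the original record, a converted MODS version next to it, admin info, and room for added parts. The class, the field names and the to_mods converter are all hypothetical; a real converter would be a stylesheet or a crosswalk per source format.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Dict, Optional


@dataclass
class StoredRecord:
    identifier: str
    original: str          # record exactly as received
    original_format: str   # e.g. "oai_dc", "pica", "mods"
    mods: str              # converted once at ingest, not at query time
    admin: Dict[str, str] = field(default_factory=dict)
    parts: Dict[str, str] = field(default_factory=dict)  # ratings, fulltext, ...


def ingest(identifier: str, xml_text: str, record_format: str, source: str,
           access_rights: str, to_mods: Callable[[str], str],
           ingest_date: Optional[date] = None) -> StoredRecord:
    """Store the original plus a MODS version and attach admin info."""
    mods = xml_text if record_format == "mods" else to_mods(xml_text)
    admin = {
        "source": source,
        "ingest_date": (ingest_date or date.today()).isoformat(),
        "access_rights": access_rights,
    }
    return StoredRecord(identifier, xml_text, record_format, mods, admin)
```

Keeping the original makes it possible to re-run an improved conversion later without harvesting again; the cost is roughly double the storage.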

33
Data layer

Sharing of data
Processing of publisher data at one place,
indexing at many places
Sharing of annotations, tags and ratings (?)
Issues
What is stored?
Pre-processing in the data layer versus post-processing in the presentation layer: static data versus dynamic data. Data generated during post-processing can't be indexed.

34
Search layer

Indexing of records from the data layer
Loading: SRU Update or batch mechanism
Filters, analyzers
Index definitions (Lucene document format)
Separate indexes for facetting, search suggestions and/or clustering? Or use Solr?
Searching in the index(es)
Search results, including facets and clusters, in XML
SRU
RSS
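
A front end talks to the search layer with an SRU searchRetrieve request. The sketch below builds such a request; the base URL and the function name are assumptions, while the parameter names are those of SRU 1.1. How facets and clusters are carried in the response is engine-specific and is not shown.

```python
from urllib.parse import urlencode


def sru_search_url(base_url: str, cql_query: str, start_record: int = 1,
                   maximum_records: int = 10, version: str = "1.1") -> str:
    """Build an SRU searchRetrieve URL for a CQL query."""
    params = {
        "version": version,
        "operation": "searchRetrieve",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return f"{base_url}?{urlencode(params)}"
```

Paging through a result list only changes startRecord, so the same function serves both the first page and "next" links.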

35
Search layer

Sharing of indexes
1. One central index with subcollections
2. Distributed index: standardization of index definitions
3. Exchanging of indexes
Option 1 is possible but requires organisation
Options 2 and 3 are probably technically possible, but I don't know of successful examples
Issues
Standard search interfaces: SRU,

36
Presentation layer

Web application that sends the (converted) user query (HTTP request) to the search engine and receives search results in XML
Processing of the XML and returning an HTTP response to the browser
For dynamic content, the browser is responsible (Ajax), e.g., availability info
Possible modules
Query parser: Google-like queries -> CQL
OpenURL generator
Tag cloud builder
Authorisation module with access rules; authentication is external: support of SAML (A-Select, Shibboleth)
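
The query parser module can be illustrated with a small sketch that turns a Google-like query into CQL. The supported syntax is an assumption made for this example: bare words are ANDed, double quotes mark a phrase, and a leading minus excludes a term. It is not the parser used at UvT.

```python
import re
from typing import List


def to_cql(user_query: str) -> str:
    """Convert a Google-like query (words, "phrases", -excluded) to CQL."""
    positive: List[str] = []
    negative: List[str] = []
    # A token is an optionally negated quoted phrase, or a run of non-space characters.
    for token in re.findall(r'-?"[^"]*"|\S+', user_query):
        target = positive
        if token.startswith("-") and len(token) > 1:
            target, token = negative, token[1:]
        token = token.strip('"')
        if not token:
            continue
        escaped = token.replace("\\", "\\\\")
        target.append(f'"{escaped}"')
    if not positive:
        # CQL's NOT is a binary "and not", so it needs a left-hand side.
        raise ValueError("query needs at least one positive term")
    cql = " AND ".join(positive)
    for term in negative:
        cql += f" NOT {term}"
    return cql
```

For example, to_cql('open access "digital library" -google') gives "open" AND "access" AND "digital library" NOT "google". Because CQL evaluates booleans left to right, the NOT clauses apply to the whole AND group.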

37
Presentation layer

Integration of external services: the application must allow for easy integration of external web services
Recommender systems like Purple search, metasearch application of the University of Groningen
Personalised services, e.g.,
Current awareness service: storage of profiles (or is RSS sufficient?)
E-shelves, shopping cart: permanent storage? Sharing of e-shelves.
Tagging, annotations, ratings. Sharing
Location services: integrating OpenURL resolver and Circulation Control of ILS. Issue: standardized access to availability information of ILS.
Federated search server
Amazon (book covers, book reviews) / Syndetics (book covers, reviews, tables of contents)
Google Books
Web of Science: impact factor (or new service of Ex Libris)
Export services
xISSN (OCLC): get all related ISSNs. Can also be used during pre-processing in the data layer
TicTOCs: Journal Tables of Contents Service

38
Concluding remarks

Just search, no database selection
Integrated search systems must give guidance to the user: facets, clusters, suggestions, recommendations, ...
Sharing of resources requires a common
architecture, common APIs, common standards
