Implementing a Faceted Search Framework - PowerPoint PPT Presentation

Slides: 49
Provided by: emilyl
Learn more at: https://www.lib.ncsu.edu

1
Implementing a Faceted Search Framework
  • Emily Lynema, Andrew K. Pace
  • NC State University Libraries
  • ASIST Seminar
  • April 9, 2007

2
Agenda
  • The Context
  • Problem motivation
  • Local Implementation
  • What and How?
  • Challenges Encountered
  • Outcomes
  • Usage Statistics
  • Future Opportunities

3
The Context
4
Online Catalogs
"Most integrated library systems, as they are
currently configured and used, should be removed
from public view." - Roy Tennant, CDL
5
What was the problem?
  • Existing catalogs are hard to use
  • known item searching works pretty well, but
  • users often do keyword searching on topics and
    get large result sets returned in system sort
    order
  • catalogs are unforgiving on spelling errors,
    stemming

6
Catalog value is buried
  • Subject headings are not leveraged in searching
  • they should be browsed or linked from, not
    searched
  • Data from the item record is not leveraged
  • should be able to filter by item type, location,
    circulation status, popularity

7
What was the motivation?
  • Unresponsive vendors (2001-2006)
  • Some reading and writing
  • SUNY Buffalo XML OPAC (2004)
  • My Kingdom for an OPAC (Feb 2005)
  • Some casual conversation (Jan 2005)
  • Some formal conversation (Feb-June 2005)
  • Organizational culture (all along)
  • Fast implementation (July 2005-Jan 2006)

8
What's the big picture?
  • Improve the quality of the library catalog user
    experience
  • Exploit our existing authority infrastructure
    (aka make MARC data work harder)
  • Build a more flexible catalog tool that can be
    integrated with discovery tools of the future.

9
What is Endeca?
  • Software company based in Cambridge, MA
  • Search and information access technology provider
    for a number of major e-commerce websites
  • Developers of the Endeca Information Access
    Platform

10
Why Endeca?
  • Customized relevance ranking of results
  • Better subject access by leveraging available
    metadata (including item level data!) through
    facets
  • Improved response time
  • Enhanced natural language searching through spell
    correction, etc.
  • Browse

11
Local Implementation
12
Demo
13
Relevance ranking
  • Based on locally customizable algorithm
  • Most relevant: query as entered
  • For multi-term searches: phrase match
  • Field match: title match more relevant than notes match
  • Other factors: number of fields matched, weighted frequency
    (tf/idf), static ordering (publication date, circulation
    stats)
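
A minimal sketch of how a tiered ranking like the one on this slide could be expressed. This is not Endeca's algorithm: the field weights, the `Record` class, and the naive substring matching are all assumptions made for illustration, and tf/idf weighting is left out.

```python
from dataclasses import dataclass

# Hypothetical weights: a title match counts for more than a notes match.
FIELD_WEIGHTS = {"title": 3, "subject": 2, "notes": 1}

@dataclass
class Record:
    title: str
    subject: str
    notes: str
    pub_year: int

def score(record, query):
    """Build a sort key for one record; larger tuples rank first."""
    q = query.lower()
    terms = q.split()
    fields = {name: getattr(record, name).lower() for name in FIELD_WEIGHTS}
    # Tier 1: the query as entered appears as a phrase in some field.
    phrase = any(q in text for text in fields.values())
    # Tier 2: weight of the best field that contains every query term.
    matched = [n for n, text in fields.items() if all(t in text for t in terms)]
    best_field = max((FIELD_WEIGHTS[n] for n in matched), default=0)
    # Tier 3: number of fields matched. Tier 4: static order (newer first).
    return (phrase, best_field, len(matched), record.pub_year)

def rank(records, query):
    """Sort records from most to least relevant for the query."""
    return sorted(records, key=lambda r: score(r, query), reverse=True)

records = [
    Record("War and civil society", "", "", 2005),
    Record("Battle cries", "", "Covers the civil war", 2010),
    Record("The Civil War", "United States History", "", 1990),
]
ranked_titles = [r.title for r in rank(records, "civil war")]
```

Comparing tuples gives strict tiers: a phrase match always outranks a non-phrase match, and publication date only breaks ties among records that are otherwise equal.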

14
Faceted navigation
  • Combine search and browse in single interface
    (Guided Navigation)
  • Filter results across multiple facets
  • Remove facets in any order
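
The filtering behavior described above can be sketched in a few lines, assuming each record is a flat dictionary of facet values. The records, facet values, and function names here are hypothetical; the real system relies on the Endeca engine for this.

```python
from collections import Counter

# Hypothetical records; the facet names follow the catalog's refinements.
RECORDS = [
    {"title": "A", "Format": "Book", "Language": "English", "Library": "Main"},
    {"title": "B", "Format": "Book", "Language": "Spanish", "Library": "Main"},
    {"title": "C", "Format": "Video", "Language": "English", "Library": "Branch"},
    {"title": "D", "Format": "Book", "Language": "English", "Library": "Branch"},
]

def refine(records, selections):
    """Keep only records that match every selected facet value."""
    return [r for r in records
            if all(r.get(facet) == value for facet, value in selections.items())]

def facet_counts(records, facets):
    """Count each facet's values within the current result set."""
    return {f: Counter(r[f] for r in records if f in r) for f in facets}

# Apply two refinements, then remove the first one applied.
selections = {"Format": "Book", "Language": "English"}
narrowed = refine(RECORDS, selections)
del selections["Format"]
widened = refine(RECORDS, selections)
counts = facet_counts(widened, ["Format", "Library"])
```

Because the selections are an unordered set of facet/value pairs, any one of them can be dropped without replaying the others, which is what lets users remove facets in any order.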

15
Facet refinements
  • Availability
  • Author
  • Library
  • Format
  • Language
  • New
  • LC Classification
  • Subject Topic
  • Subject Genre
  • Subject Region
  • Subject Era

16
Added search tools
  • Automatic spell correction
  • "Did you mean" suggestions
  • Automatic stemming
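
As a rough illustration of a "Did you mean" suggestion, the sketch below compares a query term against a vocabulary of indexed words using Python's standard `difflib` module. The vocabulary and the `suggest` helper are assumptions; Endeca's own spell correction is not described in the slides and may work quite differently.

```python
import difflib

# Hypothetical vocabulary, standing in for the terms found in the index.
VOCABULARY = ["history", "chemistry", "textiles", "engineering"]

def suggest(term, vocabulary, cutoff=0.6):
    """Return the closest indexed term, or None if no suggestion is needed."""
    if term in vocabulary:
        return None  # the term is already indexed as spelled
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

suggestion = suggest("histroy", VOCABULARY)
```

Raising `cutoff` makes suggestions more conservative; lowering it catches more misspellings at the cost of more false matches.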

17
Implementation team
  • Information Technology
  • Team chair and project manager
  • Technical lead
  • ILS Librarian
  • Technical manager
  • Research and Information Services
  • Reference librarian
  • Metadata and Cataloging
  • Cataloging librarian
  • Digital Library Initiatives
  • Interface development

18
Implementation timeline
  • License / negotiation: Spring 2005
  • Acquire: Summer 2005
  • Implementation
  • August 2005: vendor training
  • September 2005: finalize requirements
  • October 2005 - January 2006: design and
    development
  • January 12, 2006: go-live date
  • It doesn't have to be perfect!

19
The nitty gritty
  • Endeca co-exists with SirsiDynix Unicorn ILS and
    Web2 online catalog
  • Endeca handles keyword search
  • Web2 handles authority search and detail page
    display
  • Endeca indexes MARC records exported nightly from
    Unicorn
  • Index is refreshed nightly with records
    added/updated during previous day

20
Technical overview
[Diagram: Raw MARC data → NCSU exports and reformats → Flat text files → Data Foundry (parse text files) → Indices → MDEX Engine ↔ HTTP ↔ NCSU Web Application ↔ HTTP. Data Foundry and MDEX Engine are grouped as the Information Access Platform.]
21
Technical overview
[Same diagram as slide 20, with a portion labeled "Offline - Nightly"]
22
Technical overview
[Same diagram as slide 20, with a portion labeled "Always Online"]
23
Challenges - System design
  • Identifying appropriate facets
  • Integrating 2 independent data systems
  • Unique identifiers are important!
  • Designing the user interface
  • Search page
  • Results page

24
  • Too many boxes, lines, and shaded areas.
  • Elements for a single record not visually grouped.

25
First version of results page wireframe (8 total
iterations). Ideas drawn from OPAC,
RedLightGreen, Amazon, etc.
26
Brief view vs. Full view gives user choice about
displaying holdings.
Reduces complexity of continuing and online
resources.
8th (and final) revision: aggregate holdings
information by library.
27
Challenges - Data
  • MARC data with MARC-8 encoding → Text data with
    UTF-8 encoding

28
Fun with MARC
  • MARC → flat text file(s) for ingest by Endeca.
  • Transformation accomplished with MARC4J.
  • Opportunity to manipulate data on the back-end.
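
To illustrate the kind of flattening described above, here is a small sketch that turns one parsed record into pipe-delimited `name|value` lines. The project used MARC4J (a Java library) for this step; the version below uses plain Python data as a stand-in for a parsed record, and the output layout, the `TAG_NAMES` mapping, and the `flatten` function are assumptions, not the format NCSU actually fed to Endeca.

```python
# Hypothetical mapping from MARC tags to flat-file field names.
TAG_NAMES = {"100": "author", "245": "title", "650": "subject"}

def flatten(record_id, fields, tag_names):
    """Render one record as newline-separated 'name|value' lines."""
    lines = [f"id|{record_id}"]
    for tag, subfields in fields:
        name = tag_names.get(tag)
        if name is None:
            continue  # skip tags that are not mapped to an indexed field
        # Join subfield values, then trim trailing ISBD punctuation.
        # This is one example of manipulating data on the back end.
        value = " ".join(subfields.values()).rstrip(" /:;,")
        lines.append(f"{name}|{value}")
    return "\n".join(lines)

# Stand-in for a parsed MARC record: (tag, {subfield code: value}) pairs.
fields = [
    ("245", {"a": "Moby Dick /"}),
    ("100", {"a": "Melville, Herman,", "d": "1819-1891."}),
    ("650", {"a": "Whaling", "v": "Fiction."}),
    ("500", {"a": "A general note that is not indexed."}),
]
flat = flatten("rec1", fields, TAG_NAMES)
```

Keeping the tag-to-name mapping in data rather than in code makes it easy to add or rename an indexed field without touching the conversion logic.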

29
Transformed data
30
Challenges - Data
  • MARC data with MARC-8 encoding → Text data with
    UTF-8 encoding
  • Data issues revealed by exposing metadata in
    facets
  • Relevance ranking for bibliographic data

31
Maintenance
  • Little ongoing work required after deployment
  • Quarterly data refresh from ILS
  • Version upgrades
  • 6 member product team meets monthly
  • Lots of development ideas (as time / library
    priorities afford)!
  • Loosely coupled: making changes twice

32
Outcomes
33
Relevance
  • Are search results in Endeca more likely to be
    relevant to a user's query than search results in
    old OPAC?
  • 100 topical user searches from 1 month in Fall
    2005
  • How many of top 5 results relevant?
  • 40% relevant in Web2 OPAC; 31 no hits
  • 68% relevant in Endeca catalog; 12 no hits

34
Usage statistics
35
[Chart: July 06 - Jan 07]
36
[Chart: July 06 - Jan 07]
37
[Chart: July 06 - Jan 07; label: 19.4% Subj./Class]
38
[Chart: July 06 - Jan 07]
39
[Chart: July 06 - Jan 07]
40
The Future
41
Future opportunities
  • Integrate catalog w/other tools through web
    services
  • Enrich catalog through external web services
  • book jackets, reviews, etc. (Amazon/OCLC)
  • Build cross-application shopping cart
    functionality

42
The catalog web services
  • Initial impetus: 2 requests
  • Can we have RSS feeds for the catalog?
  • Can we integrate catalog results into library
    website QuickSearch?
  • Initial plan
  • Build RSS feeds and extend with OpenSearch for
    integration.
  • Where did we end up?

43
Introducing CatalogWS
  • A Web API for dynamically querying information
    from the NCSU Libraries Catalog
  • http://www.lib.ncsu.edu/catalog/ws/
  • Generic XML layer provides same functionality as
    HTML interface
  • REST web API: define HTTP GET requests via URL
    parameters
  • Enables server-side user-defined XSL
    transformations
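
A sketch of what a client request to a REST API of this shape might look like. Only the base URL comes from the slide; the parameter names `query` and `style` are hypothetical placeholders, since the actual CatalogWS parameters are not listed here. The code only builds the URL and does not contact the server.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.lib.ncsu.edu/catalog/ws/"  # base URL from the slide

def build_request_url(params):
    """Encode the parameters as a query string for an HTTP GET request."""
    return BASE_URL + "?" + urlencode(params)

# Hypothetical parameters: a search query and a named output stylesheet.
url = build_request_url({"query": "faceted search", "style": "rss"})
```

Because the whole request lives in the URL, the same query can be bookmarked, embedded in another page, or subscribed to as a feed without any client-side state.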

44
Why go there?
  • More open access to the data available in our
    library catalog
  • Core XML schema can be re-used and modified via
    stylesheets
  • Enable other developers in the library to build
    applications using catalog data
  • Reduce bottleneck (I don't have to do everything)

45
RSS
46
QuickSearch
47
Mobile device searching
48
Thanks
  • NCSU project site
  • http://www.lib.ncsu.edu/endeca
  • Andrew K. Pace
  • Head, Information Technology
  • andrew_pace_at_ncsu.edu
  • Emily Lynema
  • Systems Librarian for Digital Projects
  • emily_lynema_at_ncsu.edu