H' Lundbeck AS21Nov091 - PowerPoint PPT Presentation

1
Assessing the effectiveness of your current
search and retrieval function
Case story evaluating human metadata indexing
versus automatic query expansion using a
corporate thesaurus
  • Anna G. Eslau, Information Specialist, H.
    Lundbeck A/S
  • Marianne Lykke Nielsen, Associate Professor,
    Royal School of Library and Information Science

2
Agenda
  • Motivation
  • Case study
  • Research partners
  • Purpose
  • Test design
  • Findings
  • Conclusions
  • Summing up

3
Motivation
  • A lot of money has been invested, but does our
    current search and retrieval function perform as
    expected?
  • An advanced and time-consuming indexing task has
    been laid upon our end users, but is our current
    indexing strategy effective?
  • Do we have alternatives to manual indexing of
    equally high quality?

4
Agenda
  • Motivation
  • Case study
  • Research partners
  • Purpose
  • Test design
  • Findings
  • Conclusions
  • Summing up

5
Case study - Research partners
  • H. Lundbeck A/S
      • Pharmaceutical company
      • 5,000 employees in more than 40 countries
      • Information systems with electronic documents
      • Corporate thesaurus
      • Users and search requests
  • Royal School of Librarianship
      • Thesaurus research expertise
      • Domain knowledge from former research project
  • Ensight A/S
      • Verity K2 search engine and Intelligent
        Classifier
      • Technical expertise

6
Purpose of case study
  • To evaluate
      • Information retrieval based on controlled, human
        indexing (controlled metadata)
      • Information retrieval based on full-text
        indexing, with thesaurus-based automatic query
        expansion

7
Case study - Retrieval system and indexing policy
  • Electronic document management system (EDMS) and
    bibliographic information system containing
    research documentation
  • Indexing policy
  • Written indexing policy
  • Mandatory training of indexers
  • Corporate Thesaurus
  • Human, controlled indexing
  • Topical checklist/Faceted indexing
  • Searching by controlled metadata and full text
  • Domain-specific thesaurus containing 5,500
    concepts and 16,000 terms

8
EDMS 1/2 - Indexing
9
EDMS 2/2 - Searching
10
Lundbeck Thesaurus 1/3
11
Lundbeck Thesaurus 2/3
12
Lundbeck Thesaurus 3/3
13
Agenda
  • Motivation
  • Case study
  • Research partners
  • Purpose
  • Test design
  • Findings
  • Conclusions
  • Summing up

14
Test design - Retrieval performance of different
search strategies
  • Three different search strategies were evaluated
      • Searches based on natural language (words from
        original request) in full text
      • Searches based on natural language in full text
        expanded with words from thesaurus (query
        expansion with synonyms and narrower terms)
      • Searches based on (manually assigned) controlled
        keywords in selected metadata fields

15
Test design - Query expansion
  • Search for information about intravenous
    administration of a drug AND Alzheimer's disease
  • Intravenous OR IV OR Intravenously OR
  • AND
  • Alzheimer's disease OR Alzheimer's disorders OR
    Alzheimer type dementia OR..
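
The expansion shown on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Verity K2 setup used in the case study: the `THESAURUS` dictionary, the helper names `expand_concept` and `expand_query`, and the generic Boolean output syntax are all hypothetical. Only the terms visible on the slide are used; the elided synonyms are not filled in.

```python
# Hypothetical mini-thesaurus: preferred term -> synonyms.
# Only the terms visible on the slide are included.
THESAURUS = {
    "intravenous": ["IV", "intravenously"],
    "Alzheimer's disease": ["Alzheimer's disorders", "Alzheimer type dementia"],
}


def expand_concept(term, thesaurus):
    """Return the term plus its thesaurus synonyms as one OR group."""
    variants = [term] + thesaurus.get(term, [])
    # Quote multi-word terms so they are searched as phrases.
    quoted = ['"%s"' % v if " " in v else v for v in variants]
    return "(" + " OR ".join(quoted) + ")"


def expand_query(concepts, thesaurus):
    """AND together one OR group per concept in the request."""
    return " AND ".join(expand_concept(c, thesaurus) for c in concepts)


query = expand_query(["intravenous", "Alzheimer's disease"], THESAURUS)
print(query)
```

A term that is missing from the thesaurus is passed through unchanged, so the expanded query never retrieves less than the original natural-language query.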

16
Lundbeck Thesaurus
17
Test design - Test persons and retrieval system
  • Persons
      • Query expansion tests were carried out by the
        thesaurus manager and did not involve end users
      • Evaluation of search results was carried out by
        end users: 4 subject experts (medical advisers)
        who had formerly answered the search requests
  • System
      • The Verity K2 search system was used as the test
        retrieval system for the query expansion tests
      • The original document management systems were
        used as retrieval systems for the metadata
        searches

18
Test design - Test thesaurus
  • The Lundbeck Thesaurus was the test thesaurus.
    The thesaurus formed the basis for query
    formulations
  • Synonyms and narrower terms were picked from
    the thesaurus for the test searches based on
    expansion of natural language in full-text
    searches
  • Preferred keywords were picked from the
    thesaurus for the test searches based on
    controlled keywords in selected metadata fields

19
Test design - Test collection
  • 25,384 document objects from two different
    sources
      • 24,369 document objects from a bibliographical
        (BRS) information system (internal research
        reports and published research articles)
      • 1,015 documents from the full-text EDMS system
        (internal research reports)

20
Test design - Search requests
  • 10 search requests were selected from a set of
    searches which in real life had been carried out
    in the corporate information systems

Work task 7: You are a medical reviewer. A
physician has contacted you. He would like to
have data on the use of Citalopram and Reboxetine
together to treat resistant depression. He wants
any reporting of possible interactions.

Indicative request: Find reports, papers or case
stories that investigate the possible interaction of
Citalopram and Reboxetine on resistant depression.

21
Agenda
  • Motivation
  • Case study
  • Research partners
  • Purpose
  • Test design
  • Findings
  • Conclusions
  • Summing up

22
Findings - Performance
SJ = Search Job, QE = Query Expansion
Precision (% relevant docs out of all retrieved
docs) went down from 33% to 24% with query
expansion
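
For reference, the two measures used in this kind of evaluation can be computed as in the sketch below. The document ids and hit lists are invented for illustration and do not reproduce the study's figures; the function names are assumptions.

```python
def precision(retrieved, relevant):
    """Share of the retrieved documents that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved, relevant):
    """Share of the relevant documents that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


# Invented example: four relevant documents exist in the collection.
relevant = {"d1", "d2", "d3", "d4"}
plain_hits = ["d1", "d2", "d9"]                      # natural-language search
expanded_hits = ["d1", "d2", "d3", "d7", "d8", "d9"]  # same search with QE

for name, hits in (("plain", plain_hits), ("expanded", expanded_hits)):
    print(name, round(precision(hits, relevant), 2), round(recall(hits, relevant), 2))
```

In this made-up case the expanded search finds one more relevant document (higher recall) but also pulls in more irrelevant ones (lower precision), which is the general trade-off the slide reports.
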
23
Findings - Human indexing problems
24
Findings - Other metadata
  • Topical retrieval and situational relevance
    ranking - the importance of contextual parameters
  • Document type
  • Publication year
  • Source
  • Language
  • Author

25
Findings - Thesaurus
  • Thesaurus
  • Relevant synonyms (acronyms with multiple
    meanings should be omitted)
  • Logical hierarchies
  • High topical relevance

26
Findings - Documents and search requests
  • Document collection
      • OCR-scanned documents may contain errors,
        leading to false positive hits
      • Large (>100 pages) full-text documents lower
        precision (irrelevant hits)
  • Search requests
      • If people search using very general terms, QE
        becomes extremely complicated and extensive,
        increasingly so with each level of QE added
      • Different types of facets result in
          • Different relevance assessment according to
            document types
          • Different recall in metadata search
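
The point about general terms and levels of QE can be illustrated with a small sketch. The `NARROWER` hierarchy below is invented for the example and is not taken from the Lundbeck Thesaurus; `expand_levels` is a hypothetical helper.

```python
# Invented hierarchy: term -> narrower terms.
NARROWER = {
    "disease": ["mental disorder", "neurological disease"],
    "mental disorder": ["depression", "anxiety"],
    "neurological disease": ["dementia"],
    "depression": ["resistant depression"],
    "dementia": ["Alzheimer's disease"],
}


def expand_levels(term, narrower, levels):
    """Collect the term and its narrower terms down to `levels` levels."""
    terms, frontier = [term], [term]
    for _ in range(levels):
        # Move one level down the hierarchy from every term in the frontier.
        frontier = [n for t in frontier for n in narrower.get(t, [])]
        terms.extend(frontier)
    return terms


for levels in range(4):
    print(levels, len(expand_levels("disease", NARROWER, levels)))
```

Starting from the general term "disease", each extra level adds more search terms, while a specific term such as "resistant depression" has nothing below it and stays a single-term query.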

27
Findings - Search software
  • Search software settings are important
  • Stemming
  • Case sensitivity
  • Character sensitivity (())
  • Number of search terms allowed
  • Zoning
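
Why settings such as stemming and case sensitivity matter can be shown with a toy matcher. This is a deliberately crude sketch, not how Verity K2 works: `normalize` and `matches` are hypothetical helpers, and the "stemmer" only strips a trailing "ly".

```python
def normalize(token, case_sensitive=False, stem=False):
    """Apply the chosen settings to a single token."""
    if not case_sensitive:
        token = token.lower()
    # Toy stemming rule (assumption): strip a trailing "ly" from longer words.
    if stem and token.endswith("ly") and len(token) > 4:
        token = token[:-2]
    return token


def matches(query_term, text, **settings):
    """True if any whitespace-separated token equals the query term."""
    q = normalize(query_term, **settings)
    return any(normalize(tok, **settings) == q for tok in text.split())


text = "given intravenously in section iv"
print(matches("intravenous", text))                # no stemming
print(matches("intravenous", text, stem=True))     # stemming on
print(matches("IV", text, case_sensitive=True))    # acronym, case-sensitive
print(matches("IV", text))                         # acronym, case-insensitive
```

With stemming on, "intravenous" also finds "intravenously". With case sensitivity off, the acronym "IV" also hits the section number "iv", a false positive of the kind the thesaurus findings warn about for ambiguous acronyms.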

28
Agenda
  • Motivation
  • Case study
  • Research partners
  • Purpose
  • Test design
  • Findings
  • Conclusions
  • Summing up

29
Conclusion - Thesaurus and QE
  • A domain-specific thesaurus is well suited for
    QE
  • QE improves recall but decreases precision
  • QE with synonyms only is in most cases sufficient

30
Conclusion - Search result display
  • Users want to see all hits (recall is important)
  • Manual sorting of search results by (other than
    topical) metadata is requested by the users
  • Ranking based on e.g. zoning is not always useful
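
The manual sorting users ask for amounts to re-ordering a hit list by non-topical metadata fields. A minimal sketch follows; the field names (`doc_type`, `year`) and the sample hits are invented, and `sort_hits` is a hypothetical helper.

```python
from operator import itemgetter

# Invented hit list with non-topical metadata.
hits = [
    {"title": "Report A", "doc_type": "research report", "year": 2004},
    {"title": "Article B", "doc_type": "published article", "year": 2007},
    {"title": "Report C", "doc_type": "research report", "year": 2008},
]


def sort_hits(hits, *fields, reverse=False):
    """Return the hits sorted by one or more metadata fields."""
    return sorted(hits, key=itemgetter(*fields), reverse=reverse)


newest_first = sort_hits(hits, "year", reverse=True)
by_type_then_year = sort_hits(hits, "doc_type", "year")
print([h["title"] for h in newest_first])
print([h["title"] for h in by_type_then_year])
```

All hits stay in the list, so recall is untouched; only the order changes, and the user chooses it rather than the ranking algorithm.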

31
Conclusion - Indexing policy
  • Difficult to obtain complete, accurate and
    exhaustive human indexing
  • Findings suggest that searching for specific
    topics should be based on full-text indexing,
    supported by thesaurus-based query expansion
  • Human indexing should focus on a few important,
    well-defined topics, e.g. used to develop
    taxonomies for broad browsing
  • Non-topical context metadata are important in
    assessment of document relevance
      • Document type
      • Publication year
      • Source
      • Language
      • Author

32
Conclusion - Implications for Lundbeck
  • Lundbeck Thesaurus has been integrated with the
    bibliographic information system to perform
    automated QE
  • EDMS upgrade planned where QE should be possible
  • OCR scanning of existing documents is being
    considered
  • Metadata on document types in EDMS are being
    evaluated and revised (simplified)
  • New models for adding metadata are being
    considered (dictionaries)
  • New indexing tools for the users are being
    developed (indexing keys)

33
Agenda
  • Motivation
  • Case study
  • Research partners
  • Purpose
  • Test design
  • Findings
  • Conclusions
  • Summing up

34
Summing up
  • If your current search and retrieval function
    does NOT perform as expected, your organisation
    may lose important information
  • You may have an indexing strategy (which is
    good), but evaluation may reveal that the
    resources invested could be used even better
  • Evaluation is important; it may save your
    organisation money over time