Improving MEDLINE Query Precision With Pseudo Relevance Feedback

1 / 42

About This Presentation

Title:

Improving MEDLINE Query Precision With Pseudo Relevance Feedback

Description:

The MetaThesaurus is a comprehensive vocabulary that includes definitions MeSH terms [13] ... Senior Engineering Manager, Marquette Medical Systems. 2/86 to 8/87. ... –

Number of Views:51

Avg rating:3.0/5.0

Slides: 43

Provided by: jayur

Category:

more less

Transcript and Presenter's Notes

Title: Improving MEDLINE Query Precision With Pseudo Relevance Feedback

1
Improving MEDLINE Query Precision With Pseudo
Relevance Feedback

January 30, 2004
Jay Urbain
Advisor Ophir Frieder

2
Improving MEDLINE Query Precision With Pseudo
Relevance Feedback

Improving the precision of MEDLINE search results
is an important challenge.
Prior investigations on improving retrieval
performance using thesaurus and relevance
feedback have produced mixed results.
The use of a mixture of IR strategies utilities
along with RF makes it difficult to determine the
contributions of each technique.
We review prior work, and use a systematic method
to evaluate best techniques parameters for
using pseudo RF on MEDLINE.
Our research showed an increase in average
precision of 21.5 using pseudo RF, exceeding the
results of similar efforts.

3
MEDLINE

Collection of more than 10,000,000 citations from
more than 4,200 medical journals, indexed from a
MetaThesaurus containing more than 19,000 Medical
Subject Headings (MeSH) .
MeSH organized in a related hierarchy of
categories and sub-categories.
Myocardial Ischemia (304,753)
Coronary Disease (140,720)
Angina Pectoris (33,938)
Angina, Pectoris, Variant (1,889)
Angina, Unstable
Syndrome X
Coronary Aneurysm
Coronary Arteriosclerosis
Coronary Thrombosis
Coronary Vasospasm
http//www.nlm.nih.gov/mesh/MBrowser.html

4
Sample MeSH

ltMeshHeadingListgt
ltMeshHeadinggt
ltDescriptorName MajorTopicYN"N"gtAdultlt/Descript
orNamegt
lt/MeshHeadinggt
ltMeshHeadinggtltDescriptorName MajorTopicYN"N"gtCar
diovascular Diseaseslt/DescriptorNamegt
ltQualifierName MajorTopicYN"Y"gtepidemiologylt/Qu
alifierNamegt
ltQualifierName MajorTopicYN"N"gtprevention
controllt/QualifierNamegt
lt/MeshHeadinggt
ltMeshHeadinggt
ltDescriptorName MajorTopicYN"N"gtDiabetes
Mellitus, Type IIlt/DescriptorNamegt
ltQualifierName MajorTopicYN"N"gtepidemiologylt/Qu
alifierNamegt
lt/MeshHeadinggt
lt/MeshHeadingListgt

5
UMLS MetaThesaurus

The MetaThesaurus is a comprehensive vocabulary
that includes definitions MeSH terms 13.
Thesauri can have three types of relationships
Synonym
Represents same underlying concept.
cancer vs. carcinoma
Hierarchical
Broader/narrower is-a relationship.
Angiotensin Converting Enzyme Inhibitors vs.
Captopril
Related
Some other type of relationship
hypertensive vs. anti-hypertensive

6
Searching the MEDLINE Database

Requires knowledge of the hierarchical MeSH
indexing scheme.
Even expert users tend to select MeSH terms
that are too general.
Studies by Hersh, et al. 5 found novice users
using VSM search can be as effective as experts
using Boolean search.
IR systems are becoming essential tools for the
practice of evidence based medicine 12.
Novice and expert users alike need to find
relevant documents with the facts needed to make
decisions.

7
Retrieval Strategies

Vector Space Model (VSM)
Measure similarity by inner product, or Cosine
angle between query document vectors.
IDF log(N/df) // N total of docs df of
docs with term
SC ? (tfdidf tfqidf) // dot product.
SC ? (tfdidf tfqidf)/nd // cosine
normalized by document length.
Probabilistic
Probability estimate of the chance of a document
being relevant to a query.
P(R t1) (num relevant with t1 / num
relevant)
(num with t1 / all documents)
P(R t1, t2, , tn) P(R t1) x P(R t1) x
P(R t2) x x P(R tn)

8
Relevance Feedback

Refine search queries based on prior relevance
judgments.
Manual
Relevant documents identified by the user.
New terms selected by the user or automatically.
Automatic Pseudo
Assume the top-ranked documents are relevant.
New terms are selected automatically.

9
Prior Work
10
Experimental Methodology

Prior work used combinations of several IR
strategies and utilities, making the
identification of the contribution due to RF
difficult.
Results from Hersh and Srinivasan, and others
vary considerably.
Used a variation of the systematic method
outlined by Lundquist, Grossman, and Frieder 7
for identifying optimal techniques to improve
relevance feedback.
Our method include calibrating the number of
top-ranked documents, the number of feedback
terms used for relevance feedback, and adjusting
the term weight.

11
Methodology Experimental Steps

Evaluate baseline retrieval strategy and term
weights.
Dot Product
Cosine
Probabilistic
Explore expansion using MeSH and title terms.
Identify optimal query expansion parameters
Term relevance (term count, idf, term count and
idf)
Number of top documents to include in term
selection
Number of terms added to the query
Number of iterations

12
Data set, Sample Query

Data set
The TREC-9 training subset of the OHSUMED MEDLINE
training collection 54,710 MEDLINE citations
from 1987 and 63 queries developed by the Hersh
team at Oregon Health Sciences University 5.
Sample query
lttopgt
ltnumgt Number OHSU1
lttitlegt 60 year old menopausal woman without
hormone replacement therapy
ltdescgt Description
Are there adverse effects on lipids when
progesterone is given with estrogen replacement
therapy
lt/topgt

13
Sample MEDLINE Citation from Dataset

.I 48200
.U
87316326
.S
Obstet Gynecol Clin North Am 8712 14(1)299-320
.M
Drug Administration Schedule Endometrium/DE/PA
Estrogens/AD/PD Female Human Menopause/DE
Middle Age Norethindrone/AA/AD/PD Support,
Non-U.S. Gov't Uterine Hemorrhage/ET/PA Uterine
Neoplasms/CI/PC.
.T
The effects of estrogens and progestogens on the
endometrium. Modern approach to treatment.
.P
JOURNAL ARTICLE REVIEW.
.W
The major hazard of postmenopausal cyclic
estrogen therapy is endometrial hyperstimulation.
The incidence of hyperplasia is dose dependent
the incidence of carcinoma is both dose and
duration dependent. The risk of carcinoma is
small. Invasive procedures such as endometrial
biopsy to detect those patients with hyperplasia
and carcinoma are unlikely to be cost-effective
and have other disadvantages.
.A
Whitehead MI Fraser D.

14
Methodology Baseline

Phrasing
Phrasing techniques were not included in the
study.
Initial testing with 2-word phrasing showed no
significant increase in precision.
Significantly increased the size of the index,
and index build time.
Stemming
Standard Porter and Lovins stemmers significantly
reduce precision in early testing.
Would have also increased the number of
variables in the study.
Environment
The SimpleIR IR engine used in Information
Retrieval classes at IIT was enhanced to include
weighted terms, support for the tested
information retrieval strategies, and modified to
improve scalability.
All runs were performed on a 400 MHz Pentium III
Dell laptop with 500M of memory.

15
Results - Baseline retrieval strategy
Identify Document Term Weights
16
Results - Baseline retrieval strategy
Identify Retrieval Strategy with Constant
Relevance Feedback Parameters 3 additional
terms from top 10 documents with one additional
iteration, feedback term weight
termFrequencyidfidf
17
Expansion using MeSH and title terms
Query Term Expansion MeSH terms, Title terms 3
additional terms from top 10 documents with one
additional iteration
18
Identify query expansion parameters
Term Relevance (term count, idf, term count and
idf) 3 additional terms from top 10 documents
with one additional iteration, dot product SC,
max(1, 0.1tf) document term weighting, query
expansion with MeSH terms only
19
Identify query expansion parameters
Number of Top Documents to Include In Term
Selection 3 additional terms from top documents
with one additional iteration, dot product SC,
max(1, 0.1tf) document term weighting, query
expansion with MeSH terms only, term relevance
measure of termFrequencyidf.
20
Identify query expansion parameters
Number of Terms Added To Query Terms 15 top
documents with one additional iteration, dot
product SC, max(1, 0.1tf) document term
weighting, query expansion with MeSH terms only,
term relevance measure of termFrequencyidf.
21
Identify query expansion parameters
Number of Feedback Iterations 3 terms selected
from 15 top documents, dot product SC, max(1,
0.1tf) document term weighting, query expansion
with MeSH terms only, term relevance measure of
termFrequencyidf.
22
Summary - RF Parameters

Optimal parameters
Select the 3 most discriminating terms
Use tfidf term selection criteria
Select from the 15 most highly ranked documents
Use one feedback iteration.
These relevance feedback parameters yielded an
average precision of .3590 over 11-points of
recall and represented an increase of 21.5 over
the average precision of .2955 achieved without
any relevance feedback

23
RF vs. No RF Precision
24
RF vs. No RF Precision at n Docs
25
Discussion
Pseudo RF using MeSH terms uniformly increased
precision at all points of recall. Results
exceeded the improvement in average precision in
similar studies by Hersh 21, Srinivasan 11,
and Aronson 1, though they cant be directly
compared due to differences in underlying IR
strategies and utilities. Results indicate a
broad based robustness of utilizing relevance
feedback to improve precision. Results and
tuning parameters, may be subject to some bias
introduced by the order in which we conducted our
experiments. By RFs very nature of expanding
queries with terms from top ranked documents,
relevance feedback is bound to provide
improvement since we are utilizing the criteria
we have defined for relevance to identify
relevant. terms.
26
Future Work

Apply adaptive learning methods at each iteration
of RF.
Restrict MeSH term expansion for highly ranked
retrieved documents to MetaThesaurus concepts
related to the initial query.
Expansion using Subject heading/sub-headings
relations.
fuzzy term matcher utilizing machine learning.
Explore causes of rare diseases by looking for
indirect links in different subsets of MEDLINE
biomedical literature (Swanson 15).
UMLS co-occurrence statistics.
Add page Score algorithm to help identify gold
standard papers.
Full OSHUMED data set.
Automated weighting of feedback terms.

27
Jay Urbain

Professional
7/99 to Present. Principal, Upstream Development
LLC
8/03 to Present. Lecturer, Milwaukee School of
Engineering, EECS.
9/00 to Present. Certified Instructor, Learning
Tree International
7/97 to 7/99. VP Software, ThinkMed LLC
3/93 to 7/97. Senior Engineering Manager,
Marquette Medical Systems
2/86 to 8/87. Senior Systems Engineer, Rockwell,
Computer Vision Group
4/84 to 2/86. Senior Software Engineer, GCA Corp.
1/82 to 4/84. Software Engineer, Northrop Defense
Systems
Academic
Post Grad. Studies in Computer Science, IIT,
Chicago
MBA, May 97, UNIVERSITY OF WISCONSIN, Madison
MSEE, Computer Engineering, 1986, IIT, Chicago
BS Engineering, Computer Info. Systems, 1981,
UNIVERSITY OF ILLINOIS, Chicago
Certifications
Sun Certified Enterprise Architect for Java
Platform Enterprise Edition Technology
Sun Certified Programmer for the Java 2 Platform

28
Old Stuff follows
29
Retrieval Strategy VSM

VSM represents each document and query as a
vector of terms.
Measure similarity (SC) by taking dot product, or
measuring angle (Cosine) between them.
Documents most similar to the query are deemed to
be most relevant.
Each term in a document and a query is weighted
by term frequency and inverse document frequency
score (IDF).
IDF provides basic measure of how discriminating
a term is in identifying a document

30
Retrieval Strategy VSM

Measure similarity (SC) by taking dot product, or
measuring angle (Cosine) between document and
query vector.
Each term in a document and a query is weighted
by term frequency and inverse document frequency
score (IDF).
IDF log(N/df) // N total of docs df of
docs with term
SC ? (tfdidf tfqidf) // dot product.
SC ? (tfdidf tfqidf)/nd // cosine
normalized by document length.

31
Retrieval Strategy Probabilistic

Use probability estimate of the chance of a
document being relevant to a query.
Odds a single term will be in a relevant
document
P(R t1) (num relevant with t1 / num
relevant)
(num with t1 / all documents)
Probability for multiple terms
P(R t1, t2, , tn) P(R t1) x P(R t1) x
P(R t2) x x P(R tn)

32
Retrieval Strategy Issues

Small size of the query vector makes finding a
higher percentage of relevant documents difficult
since query vector contains only a few relevant
terms.
In a MEDLINE application, it is difficult for the
average user to identify relevant MeSH terms to
facilitate high precision searches.
Relevance feedback can address both issues by
adjusting the query vector through term expansion
of highly relevant/discriminating terms.

33
Relevance Feedback

Automatically expand query by adding terms not in
the original query, but in the ranked relevant
documents.
The more like this link available with search
engines allows user to select which documents are
relevant.
In Rocchios approach
VSM is used to rank documents.
After feedback from the user, relevant document
vectors are added to the query, and non-relevant
document vectors are removed

34
Relevance Feedback

Many extensions of Rocchio's algorithm have been
proposed, like the Ide regular algorithm and Ide
dec-hi algorithm.
Salton and Buckley found a variation of Ide
dec-hi method, where relevant terms from top
ranked docs are added directly to the queries,
to be the most effective technique in their
study.

35
Pseudo RF Approach

Assume the top ranking retrieved documents are
relevant.
Identify most relevant terms (MeSH, title,
abstract)
Term frequency
How effectively a term uniquely identifies a
document (IDF).
Add most relevant terms to query.
Re-execute the expanded query.
Repeat process (if helpful).

36
(No Transcript)
37
Prior Work - Hersh

Hersh, Price, and Donohoe 21 conducted a study
assessing thesaurus-based query expansion using
the MetaThesaurus.
OHSUMED test collection 106 queries, 350,000
MEDLINE refs.
Created MetaThesaurus concepts for each query.
Used SAPHIRE concept matching program to identify
best terms
Used synonym, hierarchical, and related term
information
Found all types of query expansion degraded
aggregate retrieval performance as measured by
recall and precision.
Some queries showed some improvement.

38
Prior Work - Srinivasan

Srinivasan demonstrated effectiveness of
expanding MEDLINE queries with three expansion
strategies
expansion of MeSH terms.
expansion of document free text.
expansion of both MeSH and document free text
terms.
Used subset of OHSUMED test collection of 75
queries and 2,344 MEDLINE documents.
Results showed significant improvement for all
three strategies.
Showed that expansion of MeSH terms outperformed
free text query expansion.
MeSH term expansion improved 11-AvgP by 14.4
over baseline of .5169.

39
Prior Work - Aranson

Aronson, et. al. 1 used the UMLS MetaMapTM
program for associating UMLS Metathesaurus
concepts with terms in the original query.
Query expansion included human assigned MeSH
terms, along with terms through MetaThesaurus
lookup.
Used probabilistic retrieval strategy.
Used same OHSUMED test collection as Srinivasan
(75 queries and 2,344 MEDLINE documents.
Results compared favorable to automated relevance
feedback and the results of Srinivasan (2.2 to
9.4 over baseline).
Authors suggest an optimal strategy is to use
both UMLS Metathesaurus and some form of
automated relevance feedback.

40
Prior Work - Yang and Chute

Yang and Chute 14 utilized a least squared
error technique to map query terms to MeSH terms
in the MetaThesaurus.
They reported a 32.2 improvement in average
precision over their baseline.
Difficult to compare results since they utilized
a custom collection.
An additional concern is the risk of over-fitting
their data by utilizing 88 of their data as the
training set.

41
Prior Work - Mao and Chu

Mao and Chu8 use a variant on query expansion
through the use of concepts and relations in the
MetaThesaurus.
Define a phrase as consisting of word stems and
concepts.
The similarity between two phrases is jointly
determined by their conceptual similarity and
their common word stems.
Concepts represent a UMLS concept identifier,
e.g., high body temperature is a hyponym of
hyperthermia, and hyperthermia is synonymous
with fever.
Their results showed an increase of 16 in
precision, but they only tested with one query.

42
Prior Work - Mendonca and Cimino

Mendonca and Cimino 23 investigated the
utilization of co-occurrence of MeSH terms in
MEDLINE citations.
Interested in terms associated with the search
strategies optimal for evidencebased medicine to
automate construction of a knowledge base.
Used UMLS MRCOC MRREL table.
Found good specificity and sensitivity for
co-occurrence of MeSH terms in MEDLINE citations.
These results lend support to the notion of
utilizing MeSH terms from highly ranked documents
as part of a query expansion strategy for
relevance feedback.