Title: Improving MEDLINE Query Precision With Pseudo Relevance Feedback
1Improving MEDLINE Query Precision With Pseudo
Relevance Feedback
- January 30, 2004
- Jay Urbain
- Advisor Ophir Frieder
2Improving MEDLINE Query Precision With Pseudo
Relevance Feedback
- Improving the precision of MEDLINE search results
is an important challenge. - Prior investigations on improving retrieval
performance using thesaurus and relevance
feedback have produced mixed results. - The use of a mixture of IR strategies utilities
along with RF makes it difficult to determine the
contributions of each technique. - We review prior work, and use a systematic method
to evaluate best techniques parameters for
using pseudo RF on MEDLINE. - Our research showed an increase in average
precision of 21.5 using pseudo RF, exceeding the
results of similar efforts.
3MEDLINE
- Collection of more than 10,000,000 citations from
more than 4,200 medical journals, indexed from a
MetaThesaurus containing more than 19,000 Medical
Subject Headings (MeSH) . - MeSH organized in a related hierarchy of
categories and sub-categories. - Myocardial Ischemia (304,753)
- Coronary Disease (140,720)
- Angina Pectoris (33,938)
- Angina, Pectoris, Variant (1,889)
- Angina, Unstable
- Syndrome X
- Coronary Aneurysm
- Coronary Arteriosclerosis
- Coronary Thrombosis
- Coronary Vasospasm
-
- http//www.nlm.nih.gov/mesh/MBrowser.html
4Sample MeSH
- ltMeshHeadingListgt
- ltMeshHeadinggt
- ltDescriptorName MajorTopicYN"N"gtAdultlt/Descript
orNamegt - lt/MeshHeadinggt
- ltMeshHeadinggtltDescriptorName MajorTopicYN"N"gtCar
diovascular Diseaseslt/DescriptorNamegt - ltQualifierName MajorTopicYN"Y"gtepidemiologylt/Qu
alifierNamegt - ltQualifierName MajorTopicYN"N"gtprevention
controllt/QualifierNamegt - lt/MeshHeadinggt
- ltMeshHeadinggt
- ltDescriptorName MajorTopicYN"N"gtDiabetes
Mellitus, Type IIlt/DescriptorNamegt - ltQualifierName MajorTopicYN"N"gtepidemiologylt/Qu
alifierNamegt - lt/MeshHeadinggt
-
- lt/MeshHeadingListgt
5UMLS MetaThesaurus
- The MetaThesaurus is a comprehensive vocabulary
that includes definitions MeSH terms 13. - Thesauri can have three types of relationships
- Synonym
- Represents same underlying concept.
- cancer vs. carcinoma
- Hierarchical
- Broader/narrower is-a relationship.
- Angiotensin Converting Enzyme Inhibitors vs.
Captopril - Related
- Some other type of relationship
- hypertensive vs. anti-hypertensive
6Searching the MEDLINE Database
- Requires knowledge of the hierarchical MeSH
indexing scheme. - Even expert users tend to select MeSH terms
that are too general. - Studies by Hersh, et al. 5 found novice users
using VSM search can be as effective as experts
using Boolean search. - IR systems are becoming essential tools for the
practice of evidence based medicine 12. - Novice and expert users alike need to find
relevant documents with the facts needed to make
decisions.
7Retrieval Strategies
- Vector Space Model (VSM)
- Measure similarity by inner product, or Cosine
angle between query document vectors. - IDF log(N/df) // N total of docs df of
docs with term - SC ? (tfdidf tfqidf) // dot product.
- SC ? (tfdidf tfqidf)/nd // cosine
normalized by document length. - Probabilistic
- Probability estimate of the chance of a document
being relevant to a query. - P(R t1) (num relevant with t1 / num
relevant) - (num with t1 / all documents)
- P(R t1, t2, , tn) P(R t1) x P(R t1) x
P(R t2) x x P(R tn)
8Relevance Feedback
- Refine search queries based on prior relevance
judgments. - Manual
- Relevant documents identified by the user.
- New terms selected by the user or automatically.
- Automatic Pseudo
- Assume the top-ranked documents are relevant.
- New terms are selected automatically.
9Prior Work
10Experimental Methodology
- Prior work used combinations of several IR
strategies and utilities, making the
identification of the contribution due to RF
difficult. - Results from Hersh and Srinivasan, and others
vary considerably. - Used a variation of the systematic method
outlined by Lundquist, Grossman, and Frieder 7
for identifying optimal techniques to improve
relevance feedback. - Our method include calibrating the number of
top-ranked documents, the number of feedback
terms used for relevance feedback, and adjusting
the term weight.
11Methodology Experimental Steps
- Evaluate baseline retrieval strategy and term
weights. - Dot Product
- Cosine
- Probabilistic
- Explore expansion using MeSH and title terms.
- Identify optimal query expansion parameters
- Term relevance (term count, idf, term count and
idf) - Number of top documents to include in term
selection - Number of terms added to the query
- Number of iterations
12Data set, Sample Query
- Data set
- The TREC-9 training subset of the OHSUMED MEDLINE
training collection 54,710 MEDLINE citations
from 1987 and 63 queries developed by the Hersh
team at Oregon Health Sciences University 5. - Sample query
- lttopgt
- ltnumgt Number OHSU1
- lttitlegt 60 year old menopausal woman without
hormone replacement therapy - ltdescgt Description
- Are there adverse effects on lipids when
progesterone is given with estrogen replacement
therapy - lt/topgt
13Sample MEDLINE Citation from Dataset
- .I 48200
- .U
- 87316326
- .S
- Obstet Gynecol Clin North Am 8712 14(1)299-320
- .M
- Drug Administration Schedule Endometrium/DE/PA
Estrogens/AD/PD Female Human Menopause/DE
Middle Age Norethindrone/AA/AD/PD Support,
Non-U.S. Gov't Uterine Hemorrhage/ET/PA Uterine
Neoplasms/CI/PC. - .T
- The effects of estrogens and progestogens on the
endometrium. Modern approach to treatment. - .P
- JOURNAL ARTICLE REVIEW.
- .W
- The major hazard of postmenopausal cyclic
estrogen therapy is endometrial hyperstimulation.
The incidence of hyperplasia is dose dependent
the incidence of carcinoma is both dose and
duration dependent. The risk of carcinoma is
small. Invasive procedures such as endometrial
biopsy to detect those patients with hyperplasia
and carcinoma are unlikely to be cost-effective
and have other disadvantages. - .A
- Whitehead MI Fraser D.
14Methodology Baseline
- Phrasing
- Phrasing techniques were not included in the
study. - Initial testing with 2-word phrasing showed no
significant increase in precision. - Significantly increased the size of the index,
and index build time. - Stemming
- Standard Porter and Lovins stemmers significantly
reduce precision in early testing. - Would have also increased the number of
variables in the study. - Environment
- The SimpleIR IR engine used in Information
Retrieval classes at IIT was enhanced to include
weighted terms, support for the tested
information retrieval strategies, and modified to
improve scalability. - All runs were performed on a 400 MHz Pentium III
Dell laptop with 500M of memory.
15Results - Baseline retrieval strategy
Identify Document Term Weights
16Results - Baseline retrieval strategy
Identify Retrieval Strategy with Constant
Relevance Feedback Parameters 3 additional
terms from top 10 documents with one additional
iteration, feedback term weight
termFrequencyidfidf
17Expansion using MeSH and title terms
Query Term Expansion MeSH terms, Title terms 3
additional terms from top 10 documents with one
additional iteration
18Identify query expansion parameters
Term Relevance (term count, idf, term count and
idf) 3 additional terms from top 10 documents
with one additional iteration, dot product SC,
max(1, 0.1tf) document term weighting, query
expansion with MeSH terms only
19Identify query expansion parameters
Number of Top Documents to Include In Term
Selection 3 additional terms from top documents
with one additional iteration, dot product SC,
max(1, 0.1tf) document term weighting, query
expansion with MeSH terms only, term relevance
measure of termFrequencyidf.
20Identify query expansion parameters
Number of Terms Added To Query Terms 15 top
documents with one additional iteration, dot
product SC, max(1, 0.1tf) document term
weighting, query expansion with MeSH terms only,
term relevance measure of termFrequencyidf.
21Identify query expansion parameters
Number of Feedback Iterations 3 terms selected
from 15 top documents, dot product SC, max(1,
0.1tf) document term weighting, query expansion
with MeSH terms only, term relevance measure of
termFrequencyidf.
22Summary - RF Parameters
- Optimal parameters
- Select the 3 most discriminating terms
- Use tfidf term selection criteria
- Select from the 15 most highly ranked documents
- Use one feedback iteration.
- These relevance feedback parameters yielded an
average precision of .3590 over 11-points of
recall and represented an increase of 21.5 over
the average precision of .2955 achieved without
any relevance feedback
23RF vs. No RF Precision
24RF vs. No RF Precision at n Docs
25Discussion
Pseudo RF using MeSH terms uniformly increased
precision at all points of recall. Results
exceeded the improvement in average precision in
similar studies by Hersh 21, Srinivasan 11,
and Aronson 1, though they cant be directly
compared due to differences in underlying IR
strategies and utilities. Results indicate a
broad based robustness of utilizing relevance
feedback to improve precision. Results and
tuning parameters, may be subject to some bias
introduced by the order in which we conducted our
experiments. By RFs very nature of expanding
queries with terms from top ranked documents,
relevance feedback is bound to provide
improvement since we are utilizing the criteria
we have defined for relevance to identify
relevant. terms.
26Future Work
- Apply adaptive learning methods at each iteration
of RF. - Restrict MeSH term expansion for highly ranked
retrieved documents to MetaThesaurus concepts
related to the initial query. - Expansion using Subject heading/sub-headings
relations. - fuzzy term matcher utilizing machine learning.
- Explore causes of rare diseases by looking for
indirect links in different subsets of MEDLINE
biomedical literature (Swanson 15). - UMLS co-occurrence statistics.
- Add page Score algorithm to help identify gold
standard papers. - Full OSHUMED data set.
- Automated weighting of feedback terms.
27Jay Urbain
- Professional
- 7/99 to Present. Principal, Upstream Development
LLC - 8/03 to Present. Lecturer, Milwaukee School of
Engineering, EECS. - 9/00 to Present. Certified Instructor, Learning
Tree International - 7/97 to 7/99. VP Software, ThinkMed LLC
- 3/93 to 7/97. Senior Engineering Manager,
Marquette Medical Systems - 2/86 to 8/87. Senior Systems Engineer, Rockwell,
Computer Vision Group - 4/84 to 2/86. Senior Software Engineer, GCA Corp.
- 1/82 to 4/84. Software Engineer, Northrop Defense
Systems - Academic
- Post Grad. Studies in Computer Science, IIT,
Chicago - MBA, May 97, UNIVERSITY OF WISCONSIN, Madison
- MSEE, Computer Engineering, 1986, IIT, Chicago
- BS Engineering, Computer Info. Systems, 1981,
UNIVERSITY OF ILLINOIS, Chicago - Certifications
- Sun Certified Enterprise Architect for Java
Platform Enterprise Edition Technology - Sun Certified Programmer for the Java 2 Platform
28Old Stuff follows
29Retrieval Strategy VSM
- VSM represents each document and query as a
vector of terms. - Measure similarity (SC) by taking dot product, or
measuring angle (Cosine) between them. - Documents most similar to the query are deemed to
be most relevant. - Each term in a document and a query is weighted
by term frequency and inverse document frequency
score (IDF). - IDF provides basic measure of how discriminating
a term is in identifying a document
30Retrieval Strategy VSM
- Measure similarity (SC) by taking dot product, or
measuring angle (Cosine) between document and
query vector. - Each term in a document and a query is weighted
by term frequency and inverse document frequency
score (IDF). - IDF log(N/df) // N total of docs df of
docs with term - SC ? (tfdidf tfqidf) // dot product.
- SC ? (tfdidf tfqidf)/nd // cosine
normalized by document length.
31Retrieval Strategy Probabilistic
- Use probability estimate of the chance of a
document being relevant to a query. - Odds a single term will be in a relevant
document - P(R t1) (num relevant with t1 / num
relevant) - (num with t1 / all documents)
- Probability for multiple terms
- P(R t1, t2, , tn) P(R t1) x P(R t1) x
P(R t2) x x P(R tn)
32Retrieval Strategy Issues
- Small size of the query vector makes finding a
higher percentage of relevant documents difficult
since query vector contains only a few relevant
terms. - In a MEDLINE application, it is difficult for the
average user to identify relevant MeSH terms to
facilitate high precision searches. - Relevance feedback can address both issues by
adjusting the query vector through term expansion
of highly relevant/discriminating terms.
33Relevance Feedback
- Automatically expand query by adding terms not in
the original query, but in the ranked relevant
documents. - The more like this link available with search
engines allows user to select which documents are
relevant. - In Rocchios approach
- VSM is used to rank documents.
- After feedback from the user, relevant document
vectors are added to the query, and non-relevant
document vectors are removed
34Relevance Feedback
- Many extensions of Rocchio's algorithm have been
proposed, like the Ide regular algorithm and Ide
dec-hi algorithm. - Salton and Buckley found a variation of Ide
dec-hi method, where relevant terms from top
ranked docs are added directly to the queries,
to be the most effective technique in their
study.
35Pseudo RF Approach
- Assume the top ranking retrieved documents are
relevant. - Identify most relevant terms (MeSH, title,
abstract) - Term frequency
- How effectively a term uniquely identifies a
document (IDF). - Add most relevant terms to query.
- Re-execute the expanded query.
- Repeat process (if helpful).
36(No Transcript)
37Prior Work - Hersh
- Hersh, Price, and Donohoe 21 conducted a study
assessing thesaurus-based query expansion using
the MetaThesaurus. - OHSUMED test collection 106 queries, 350,000
MEDLINE refs. - Created MetaThesaurus concepts for each query.
- Used SAPHIRE concept matching program to identify
best terms - Used synonym, hierarchical, and related term
information - Found all types of query expansion degraded
aggregate retrieval performance as measured by
recall and precision. - Some queries showed some improvement.
38Prior Work - Srinivasan
- Srinivasan demonstrated effectiveness of
expanding MEDLINE queries with three expansion
strategies - expansion of MeSH terms.
- expansion of document free text.
- expansion of both MeSH and document free text
terms. - Used subset of OHSUMED test collection of 75
queries and 2,344 MEDLINE documents. - Results showed significant improvement for all
three strategies. - Showed that expansion of MeSH terms outperformed
free text query expansion. - MeSH term expansion improved 11-AvgP by 14.4
over baseline of .5169.
39Prior Work - Aranson
- Aronson, et. al. 1 used the UMLS MetaMapTM
program for associating UMLS Metathesaurus
concepts with terms in the original query. - Query expansion included human assigned MeSH
terms, along with terms through MetaThesaurus
lookup. - Used probabilistic retrieval strategy.
- Used same OHSUMED test collection as Srinivasan
(75 queries and 2,344 MEDLINE documents. - Results compared favorable to automated relevance
feedback and the results of Srinivasan (2.2 to
9.4 over baseline). - Authors suggest an optimal strategy is to use
both UMLS Metathesaurus and some form of
automated relevance feedback.
40Prior Work - Yang and Chute
- Yang and Chute 14 utilized a least squared
error technique to map query terms to MeSH terms
in the MetaThesaurus. - They reported a 32.2 improvement in average
precision over their baseline. - Difficult to compare results since they utilized
a custom collection. - An additional concern is the risk of over-fitting
their data by utilizing 88 of their data as the
training set.
41Prior Work - Mao and Chu
- Mao and Chu8 use a variant on query expansion
through the use of concepts and relations in the
MetaThesaurus. - Define a phrase as consisting of word stems and
concepts. - The similarity between two phrases is jointly
determined by their conceptual similarity and
their common word stems. - Concepts represent a UMLS concept identifier,
e.g., high body temperature is a hyponym of
hyperthermia, and hyperthermia is synonymous
with fever. - Their results showed an increase of 16 in
precision, but they only tested with one query.
42Prior Work - Mendonca and Cimino
- Mendonca and Cimino 23 investigated the
utilization of co-occurrence of MeSH terms in
MEDLINE citations. - Interested in terms associated with the search
strategies optimal for evidencebased medicine to
automate construction of a knowledge base. - Used UMLS MRCOC MRREL table.
- Found good specificity and sensitivity for
co-occurrence of MeSH terms in MEDLINE citations.
- These results lend support to the notion of
utilizing MeSH terms from highly ranked documents
as part of a query expansion strategy for
relevance feedback.