1
Learning to Extract Genic Interactions Using Gleaner
  • LLL05 Workshop, 7 August 2005
  • ICML 2005, Bonn, Germany
  • Mark Goadrich, Louis Oliphant and Jude Shavlik
  • Department of Computer Sciences
  • University of Wisconsin-Madison, USA

2
Learning Language in Logic
  • Biomedical Information Extraction Challenge
  • Two tasks: with and without co-reference
  • 80 sentences for training
  • 40 sentences for testing
  • Our approach: Gleaner (ILP 2004)
  • Fast ensemble ILP algorithm
  • Focused on recall and precision evaluation

3
A Sample Positive Example
  • Given: medical journal abstracts tagged
    with genic interaction relations
  • Do: construct a system to extract genic
    interaction phrases from unseen text
  • ykuD was transcribed by SigK RNA polymerase
    from T4 of sporulation.

4
What is a Negative Example?
  • All unlabeled word pairings?
  • Wastes time with irrelevant words
  • We know the test set will include a dictionary
  • Use only unlabeled pairings of words in the
    dictionary (sketched below)
  • 106 positive, 414 negative without co-reference
  • 59 positive, 261 negative with co-reference
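A minimal sketch of this negative-example generation, assuming plain list and set inputs (make_examples and its argument names are hypothetical, not the challenge data format):

    # Negatives: unlabeled ordered pairs of dictionary words in a sentence.
    def make_examples(sentence_words, dictionary, positive_pairs):
        candidates = [w for w in sentence_words if w in dictionary]
        negatives = []
        for agent in candidates:
            for target in candidates:
                pair = (agent, target)
                if agent != target and pair not in positive_pairs:
                    negatives.append(pair)
        return negatives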

5
Tagging and Parsing
  [Figure: part-of-speech tags and phrase parse of the sample sentence
  ykuD was transcribed by SigK RNA polymerase from T4 of sporulation]

6
Some Additional Predicates
  • High-scoring words in agent phrases (scoring
    sketched below)
  • depend, bind, protein, ...
  • High-scoring words in target phrases
  • gene, promote, product, ...
  • High-scoring words between agent and target phrases
  • negative, regulate, transcribe, ...
  • Medical Subject Headings (MeSH)
  • canonical vocabulary for indexing biomedical articles
  • in_mesh(RNA), in_mesh(gene)
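One plausible way to compute such high-scoring words is sketched below; the talk does not specify the actual scoring metric, so the frequency-ratio score and the function high_scoring_words are assumptions:

    from collections import Counter

    # Rank words by how concentrated they are in agent phrases relative to
    # all phrases (assumes agent_phrases is a subset of all_phrases).
    def high_scoring_words(agent_phrases, all_phrases, k=10):
        in_agent = Counter(w for p in agent_phrases for w in p)
        overall = Counter(w for p in all_phrases for w in p)
        score = {w: in_agent[w] / overall[w] for w in in_agent}
        return sorted(score, key=score.get, reverse=True)[:k]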

7
Even More Predicates
  • Lexical Predicates
  • internal_caps(Word)
  • alphanumeric(Word)
  • Look-ahead Phrase Predicates
  • few_POS_in_phrase(Phrase, POS)
  • phrase_contains_specific_word_triple(Phrase, W1,
    W2, W3)
  • phrase_contains_some_marked_up_arg(Phrase, Arg,
    Word, Fold)
  • Relative Location of Phrases
  • agent_before_target(ExampleID)
  • word_pair_in_between_target_phrases(ExampleID,
    W1, W2)

8
Enriched Data from the Challenge Committee
  • Link Parser (CMU) creates a parse tree
  • Root lemma of each word (not used)
  • 27 Syntactic Information Predicates
  • complement_of_N_N(Word, Word)
  • modifier_ADV_V(Word, Word)
  • object_V_Passive_N(Word, Word)

9
Gleaner
  • Definition of Gleaner
  • One who gathers grain left behind by reapers
  • Key Ideas of Gleaner
  • Use Aleph as underlying ILP clause engine
  • Keep wide range of clauses usually discarded
  • Create separate theories for different recall
    ranges

10
Aleph - Background
  • Seed Example
  • A positive example that our clause must cover
  • Bottom Clause
  • All predicates that are true of the seed example

11
Aleph - Learning
  • Aleph learns theories of clauses (Srinivasan,
    v4, 2003)
  • Pick positive seed example, find bottom clause
  • Use heuristic search to find best clause
  • Pick a new seed from the uncovered positives and
    repeat until a threshold of positives is covered
    (loop sketched below)
  • Theory produces one recall-precision point
  • Learning complete theories is time-consuming
  • Can produce a ranking with ensembles
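A minimal sketch of Aleph's covering loop under these assumptions (build_bottom_clause, heuristic_search, and clause.covers are hypothetical helpers; real Aleph is a Prolog program):

    # Learn clauses until a threshold fraction of positives is covered.
    def learn_theory(positives, negatives, coverage_threshold=0.95):
        theory, uncovered = [], set(positives)
        while len(uncovered) > (1 - coverage_threshold) * len(positives):
            seed = next(iter(uncovered))        # pick a positive seed
            bottom = build_bottom_clause(seed)  # hypothetical helper
            clause = heuristic_search(bottom, positives, negatives)  # hypothetical
            theory.append(clause)
            uncovered = {p for p in uncovered if not clause.covers(p)}
        return theory  # one theory = one recall-precision point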

12
Gleaner - Background
  • Rapid Random Restart (Zelezny et al., ILP 2002;
    sketched below)
  • Stochastic selection of initial clause
  • Time-limited local heuristic search
  • Randomly choose new initial clause and repeat
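A sketch of that restart loop (random_clause_from, refine, and score are hypothetical stand-ins for the clause sampler, refinement operator, and search heuristic):

    import time

    def rrr_search(bottom_clause, score, restarts=10, time_limit=1.0):
        best = None
        for _ in range(restarts):
            clause = random_clause_from(bottom_clause)  # stochastic initial clause
            deadline = time.time() + time_limit         # time-limited local search
            while time.time() < deadline:
                neighbor = refine(clause)               # hypothetical greedy step
                if score(neighbor) <= score(clause):
                    break                               # local optimum reached
                clause = neighbor
            if best is None or score(clause) > score(best):
                best = clause
        return best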

13
Gleaner - Learning
  • Create B Bins
  • Generate Clauses
  • Record Best per Bin
  • Repeat for K seeds

  [Plot: precision vs. recall of generated clauses, best clause
  recorded per bin; loop sketched below]
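A minimal sketch of this gathering loop (generate_clauses, recall, and precision are hypothetical stand-ins for the per-seed search and clause evaluation):

    # B recall bins; keep the highest-precision clause per bin for each seed.
    def gleaner_learn(seeds, B, generate_clauses, recall, precision):
        best = [dict() for _ in range(B)]                # best[b][seed] = clause
        for seed in seeds:
            for clause in generate_clauses(seed):        # clauses found from this seed
                b = min(int(recall(clause) * B), B - 1)  # bin index by recall
                kept = best[b].get(seed)
                if kept is None or precision(clause) > precision(kept):
                    best[b][seed] = clause
        return best                                      # up to K clauses per bin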
14
Gleaner - Combining
  • Combine K clauses per bin
  • If at least L of K clauses match, call example
    positive
  • How to choose L?
  • L = 1: high recall, low precision
  • L = K: low recall, high precision
  • We want a collection of high-precision theories
    spanning the space of recall levels (vote sketched
    below)
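A sketch of the L-of-K vote and the curve it traces (clause.covers is a hypothetical matching method; assumes at least one positive label):

    def classify(example, clauses, L):
        # positive iff at least L of the bin's K clauses match the example
        return sum(1 for c in clauses if c.covers(example)) >= L

    def recall_precision_points(clauses, examples, labels):
        points = []
        for L in range(1, len(clauses) + 1):  # sweep the threshold
            preds = [classify(x, clauses, L) for x in examples]
            tp = sum(1 for p, y in zip(preds, labels) if p and y)
            fp = sum(1 for p, y in zip(preds, labels) if p and not y)
            fn = sum(1 for p, y in zip(preds, labels) if not p and y)
            points.append((tp / (tp + fn), tp / (tp + fp) if tp + fp else 1.0))
        return points  # one (recall, precision) point per L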

15
Gleaner - Overlap
  • Take the topmost curve of the overlapping theories
    (sketched below)

  [Plot: precision vs. recall, topmost curve over the
  overlapping theories]
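A sketch of taking that topmost curve, i.e. discarding points dominated in both recall and precision once all bins' points are pooled:

    def topmost_curve(points):
        # points: (recall, precision) pairs pooled from every bin's curve
        out, best_precision = [], 0.0
        for r, p in sorted(points, reverse=True):  # scan from high recall down
            if p > best_precision:                 # keep only undominated points
                out.append((r, p))
                best_precision = p
        return sorted(out)                         # curve from low to high recall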
16
Gleaner - Practical Use
  • Generate Curve
  • User Selects Recall Bin
  • Return classifications with L-of-K confidence
    (selection sketched below)

  [Plot: precision vs. recall; example operating point at
  recall 0.50, precision 0.70]
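A sketch of that selection step (select_operating_point and the tuple layout are hypothetical); for the slide's example, asking for recall 0.50 would return the bin and threshold that gave precision 0.70 on the tuning set:

    def select_operating_point(curve, desired_recall):
        # curve: (recall, precision, bin_id, L) tuples measured on a tuning set
        return min(curve, key=lambda pt: abs(pt[0] - desired_recall))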
17
Sample Extraction Clause
  agent_target(Agent, Target, Sentence) :-
      several_phrases_in_sentence(Sentence),
      some_wordPOS_in_sentence(Sentence, novelword),
      n(Agent),
      alphabetic(Agent),
      word_parent(Agent, F),
      phrase_contains_internal_cap_word(F, noun, _),
      few_POS_in_phrase(F, novelword),
      in_between_target_phrases(Agent, Target, _),
      n(Target).
  • 0.14 recall, 0.93 precision on the without-co-reference
    training set

18
Sample Extraction Clause
  agent_target(Agent, Target, Sentence) :-
      avg_length_sentence(Sentence),
      n(Agent),
      word_previous(Target, _),
      in_between_target_phrases(Agent, Target, _).
  • 0.76 recall, 0.49 precision on the without-co-reference
    training set

19
Experimental Methodology
  • Used the other task's training set as a tuning set
    in both cases
  • Test set unlabeled, but a dictionary was provided
  • Included sentences with no positives
  • 936 total test-set examples generated
  • Parameter Settings
  • Gleaner (20 recall bins)
  • seeds: 100
  • clauses: 25,000
  • Aleph (0.75 minimum accuracy)
  • nodes: 1K and 25K
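For reference, the reported settings collected as plain Python dicts (key names are mine; values are from the slide):

    gleaner_params = {"recall_bins": 20, "seeds": 100, "clauses": 25000}
    aleph_params = {"min_accuracy": 0.75, "search_nodes": [1000, 25000]}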

20
LLL Without Co-reference Results
  [Results chart: recall-precision curves for Gleaner Basic,
  Aleph Basic 1K, and Gleaner Enriched]
21
LLL With Co-reference Results
  [Results chart: recall-precision curves for Gleaner Basic,
  Aleph Basic 1K, and Gleaner Enriched]
22
We Need More Datasets
  • LLL Challenge task is small
  • Would prefer to do cross-validation
  • Need labels for the test set
  • Our ILP 2004 dataset is open to the community
  • ftp://ftp.cs.wisc.edu/machine-learning/shavlik-group/datasets/IE-protein-location
  • Biomedical information-extraction tasks
  • Genetic Disorder (Ray and Craven, 2001)
  • Genia, BioCreAtiVe

23
Conclusions
  • Contributions
  • Developed a large amount of background knowledge
  • Exploited normally discarded clauses
  • Visually presented the precision-recall trade-off
  • Proposed Work
  • Achieve gains in high-recall areas
  • Reduce overfitting when using enriched data
  • Increase diversity of learned clauses

24
Acknowledgements
  • USA DARPA Grant F30602-01-2-0571
  • USA Air Force Grant F30602-01-2-0571
  • USA NLM Grant 5T15LM007359-02
  • USA NLM Grant 1R01LM07050-01
  • UW Condor Group
  • David Page, Vitor Santos Costa, Ines Dutra,
    Soumya Ray, Marios Skounakis, Mark Craven, Burr
    Settles, Jesse Davis