Event Extraction: Learning from Corpora - PowerPoint PPT Presentation

Provided by: ralp96
Learn more at: https://cs.nyu.edu

Transcript and Presenter's Notes
1
Event Extraction: Learning from Corpora
  • Prepared by Ralph Grishman
  • Based on research and slides by Roman Yangarber

2
Finding Patterns
  • How can we collect patterns?
  • Supervised learning
  • mark the information to be extracted in the text
  • collect patterns specific to the marked
    information and its context
  • generalize the patterns
  • Annotation is quite expensive
  • The Zipfian distribution of patterns means that
    annotating consecutive text is inefficient:
    the same common patterns are annotated many times

3
Unsupervised learning?
  • The intuition
  • if we collect a set of documents D_R relevant to
    the scenario, patterns relevant to the scenario
    will occur more frequently in D_R than in the
    language as a whole
  • (cf. sublanguage predicates in Harris's
    distributional analysis)

4
Riloff 96
  • Corpus manually divided into relevant and
    irrelevant documents
  • Collect patterns around each noun phrase
  • Score patterns by R · log F, where
    R = relevance rate = frequency in relevant docs /
    overall frequency, and F is the pattern's frequency
  • Select top-ranked patterns
  • Each of these patterns fills a single template
    slot; combining filled slots into templates is a
    separate task
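The scoring rule above can be sketched in a few lines of Python. The function name is invented, and F is taken here as the pattern's frequency in relevant documents — an assumption, since the slide does not pin F down.

```python
import math

def riloff_score(rel_freq: int, total_freq: int) -> float:
    """Score a candidate pattern as R * log2(F), Riloff-96 style.

    R = relevance rate = rel_freq / total_freq
    F = rel_freq (assumed here; the slide leaves F unspecified)
    """
    if rel_freq == 0 or total_freq == 0:
        return 0.0
    relevance_rate = rel_freq / total_freq
    return relevance_rate * math.log2(rel_freq)

# A pattern seen 30 times in relevant docs out of 40 total outranks
# one seen 3 times out of 4, even though both have R = 0.75.
high = riloff_score(30, 40)
low = riloff_score(3, 4)
```

The log term rewards frequent patterns, so a rare pattern with a perfect relevance rate does not dominate the ranking.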

5
Extending the Discovery Procedure
  • Finding relevant documents automatically
  • Yangarber: use patterns to select documents
  • Sudo: use keywords and an IR engine
  • Defining larger patterns (covering several
    template slots)
  • Yangarber: clause structures
  • Nobata & Sudo: larger structures

6
Automated Extraction Pattern Discovery
  • Goal: find examples / patterns relevant to a
    given scenario, without any corpus tagging
    (Yangarber 00)
  • Method
  • identify a few seed patterns for the scenario
  • retrieve documents containing those patterns
  • find subject-verb-object patterns with
  • high frequency in the retrieved documents
  • relatively high frequency in retrieved docs vs.
    other docs
  • add the top pattern to the seed set and repeat
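The loop above can be sketched as follows. The document representation (a set of patterns per document), the scoring function, and the stopping rule are simplified assumptions for illustration, not the exact formulas of Yangarber 00.

```python
def discover_patterns(docs, seeds, iterations=10):
    """Bootstrap pattern discovery (sketch).

    docs: list of sets, each the patterns occurring in one document.
    seeds: initial pattern set for the scenario.
    """
    accepted = set(seeds)
    for _ in range(iterations):
        # 1. Retrieve documents matching at least one accepted pattern.
        relevant = [d for d in docs if d & accepted]
        other = [d for d in docs if not d & accepted]
        candidates = set().union(*relevant) - accepted if relevant else set()
        if not candidates:
            break

        # 2. Rank candidates: frequent in the relevant docs, and
        #    relatively frequent there vs. other docs (illustrative score).
        def score(p):
            rel = sum(p in d for d in relevant)
            tot = rel + sum(p in d for d in other)
            return (rel / tot) * rel

        # 3. Add the top-ranked pattern to the accepted set and repeat.
        accepted.add(max(candidates, key=score))
    return accepted

docs = [{"person-retires", "person-named-president"},
        {"person-retires", "person-named-president"},
        {"company-posts-earnings"}]
found = discover_patterns(docs, {"person-retires"})
```

On this toy corpus the seed selects the first two documents, and the co-occurring pattern "person-named-president" is discovered on the first iteration.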

7
1. Pick seed pattern
  • Seed: <person retires>

8
2. Retrieve relevant documents
  • Seed: <person retires>

Relevant documents:
  Fred retired. ... Harry was named president.
  Maki retired. ... Yuki was named president.
Other documents: (no match for the seed)
9
3. Pick new pattern
  • Seed: <person retires>
  • <person was named president> appears in
    several relevant documents (top-ranked by the
    Riloff metric)

Relevant documents:
  Fred retired. ... Harry was named president.
  Maki retired. ... Yuki was named president.
10
4. Add new pattern to pattern set
  • Pattern set: <person retires>,
    <person was named president>
  • Note: new patterns are added with confidence < 1
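One simple way to realize "confidence < 1" is to discount each pattern by the iteration at which it was discovered. The decay schedule below is an illustrative assumption, not the weighting actually used in the paper.

```python
# Seed patterns get confidence 1.0; discovered patterns are discounted
# by the iteration at which they were accepted (schedule is illustrative).
pattern_confidence = {"<person retires>": 1.0}

def add_discovered(pattern: str, iteration: int, decay: float = 0.9) -> None:
    # decay ** iteration is < 1 for any iteration >= 1
    pattern_confidence[pattern] = decay ** iteration

add_discovered("<person was named president>", iteration=1)
```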

11
Experiment
  • Task: management succession (as in MUC-6)
  • Source: Wall Street Journal
  • Training corpus: ~6,000 articles
  • Test corpus:
  • 100 documents: MUC-6 formal training
  • 150 documents: judged manually

12
Pre-processing
  • For each document, find and classify names:
  • person, location, organization
  • Parse the document
  • (regularize passives, relative clauses, etc.)
  • For each clause, collect a candidate
    pattern: a tuple of the heads of the
  • subject, verb, direct object, object/subject
    complement, and locative and temporal modifiers
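The candidate pattern tuple above can be pictured as a small record of clause-constituent heads. The field names and the example regularization are illustrative assumptions, not the paper's data structures.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ClauseTuple:
    """Heads of the clause constituents listed above (field names assumed)."""
    subject: Optional[str]
    verb: str
    direct_object: Optional[str] = None
    complement: Optional[str] = None   # object/subject complement
    locative: Optional[str] = None
    temporal: Optional[str] = None

# "Harry was named president yesterday", after passive regularization,
# reads roughly as: (someone) <named> Harry president yesterday.
t = ClauseTuple(subject=None, verb="name",
                direct_object="Harry", complement="president",
                temporal="yesterday")
```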

13
Experiment two seed patterns
  • v-appoint: appoint, elect, promote, name
  • v-resign: resign, depart, quit, step down
  • Run the discovery procedure for 80 iterations

14
Evaluation
  • Look at discovered patterns
  • new patterns, missed in manual training
  • Document filtering
  • Slot filling

15
Discovered patterns
16
Evaluation Text Filtering
  • How effective are the discovered patterns at
    selecting relevant documents?
  • IR-style:
  • a document counts as relevant if it matches at
    least one pattern
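The IR-style criterion is a one-line set test; the precision/recall helper below (scored against hand judgments) is an assumed harness for illustration, not the paper's scorer.

```python
def filter_documents(docs, patterns):
    """Keep documents that match at least one pattern.

    docs: dict mapping doc_id -> set of patterns found in that doc.
    """
    return {doc_id for doc_id, pats in docs.items() if pats & patterns}

def precision_recall(selected, gold_relevant):
    """Compare the selected set against hand-judged relevant doc ids."""
    tp = len(selected & gold_relevant)
    precision = tp / len(selected) if selected else 0.0
    recall = tp / len(gold_relevant) if gold_relevant else 0.0
    return precision, recall

docs = {"d1": {"<person retires>"},
        "d2": {"<company posts earnings>"},
        "d3": {"<person was named president>"}}
selected = filter_documents(docs, {"<person retires>",
                                   "<person was named president>"})
```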

17
(No Transcript)
18
Evaluation Slot filling
  • How effective are the patterns within a complete
    IE system?
  • MUC-style IE on the MUC-6 corpora
  • Caveat: output was filtered / aligned by hand

(Results table, flattened in the transcript; in each triple of
scores the third number is the F-measure of the first two:)
               74 27 40    52 72 60
manual-MUC:    54 71 62    47 70 56
manual-now:    69 79 74    56 75 64