Robust Reading of Ambiguous Names in Texts

1
Robust Reading of Ambiguous Names in Texts
  • Xin Li, Paul Morie and Dan Roth
  • Dept. of Computer Science
  • University of Illinois at Urbana-Champaign

2
The problem
Document 1: The Justice Department has officially
ended its inquiry into the assassinations of John
F. Kennedy and Martin Luther King Jr., finding
"no persuasive evidence" to support conspiracy
theories, according to department documents. The
House Assassinations Committee concluded in 1978
that Kennedy was "probably" assassinated as the
result of a conspiracy involving a second gunman,
a finding that broke from the Warren Commission's
belief that Lee Harvey Oswald acted alone in
Dallas on Nov. 22, 1963.

Document 2: In 1953, Massachusetts Sen. John F.
Kennedy married Jacqueline Lee Bouvier in Newport,
R.I. In 1960, Democratic presidential candidate
John F. Kennedy confronted the issue of his Roman
Catholic faith by telling a Protestant group in
Houston, "I do not speak for my church on public
matters, and the church does not speak for me."

Document 3: David Kennedy was born in Leicester,
England in 1959. Kennedy co-edited The New Poetry
(Bloodaxe Books 1993), and is the author of New
Relations: The Refashioning of British Poetry
1980-1994 (Seren 1996).
4
Robust Reading of Ambiguous Names
We identify the different entities that are
mentioned in text, and map mentions, within and
across documents, to the corresponding entities.
5
Problems
  • Entity Identity
      • Do mentions A and B refer to the same entity?
  • Name Expansion
      • Given a writing of a name, find other likely
        writings of the same entity.
  • Prominence
      • "What's Bush's foreign policy?"
      • Find the most prominent Bush.
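
The three questions above can be phrased as queries over a mention-to-entity mapping. The sketch below is only an illustration: the hand-labeled mapping `ENTITY_OF`, the `Mention` type, and the function names are assumptions made here, not part of the presentation, and counting mentions is just one simple stand-in for prominence. In the real task the mapping is unknown and must be inferred.

```python
from collections import Counter, defaultdict
from typing import Dict, NamedTuple, Set


class Mention(NamedTuple):
    doc_id: int  # which document the mention occurs in
    index: int   # position of the mention within that document
    text: str    # the surface string as written


# Hand-labeled toy mapping for the three example documents (illustration only).
JFK = "John F. Kennedy (U.S. president)"
DK = "David Kennedy (poet)"
ENTITY_OF = {
    Mention(1, 0, "John F. Kennedy"): JFK,
    Mention(1, 1, "Kennedy"): JFK,
    Mention(2, 0, "John F. Kennedy"): JFK,
    Mention(2, 1, "John F. Kennedy"): JFK,
    Mention(3, 0, "David Kennedy"): DK,
    Mention(3, 1, "Kennedy"): DK,
}


def same_entity(a: Mention, b: Mention) -> bool:
    """Entity identity: do mentions a and b refer to the same entity?"""
    return ENTITY_OF[a] == ENTITY_OF[b]


def writings_by_entity() -> Dict[str, Set[str]]:
    """Collect every surface string used for each entity."""
    out = defaultdict(set)
    for mention, entity in ENTITY_OF.items():
        out[entity].add(mention.text)
    return dict(out)


def expand_name(text: str) -> Dict[str, Set[str]]:
    """Name expansion: other writings, grouped by each entity `text` may denote."""
    return {
        entity: writings - {text}
        for entity, writings in writings_by_entity().items()
        if text in writings
    }


def most_prominent(text: str) -> str:
    """Prominence: among entities written as `text`, the one mentioned most overall."""
    candidates = expand_name(text).keys()
    totals = Counter(e for e in ENTITY_OF.values() if e in candidates)
    return totals.most_common(1)[0][0]
```

Here the bare string "Kennedy" is ambiguous between two entities, which is exactly the situation the three problems address.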

6
Why is this problem important?
  • Intelligent access to textual information
    requires identifying entities and discovering
    knowledge about entities from text.
  • Information Extraction: extract knowledge about
    entities
  • Question Answering: answer English questions
    automatically
  • Most research in NLP is still done with
    individual mentions of an entity.
  • We would like to start moving from mentions to
    concepts and treat mentions as a whole based on
    the real-world entities they refer to.

7
Our solution
  • We developed machine learning techniques for this
    problem.
  • They are based on a natural generation process
    of documents.
  • Data collection: New York Times news articles
    and Yahoo movie databases.

8
A generative model
  • A natural process of how documents are
    generated.
  • A probabilistic view of how documents are
    generated and how "mentions" of entities are
    "sprinkled" into them.
  • Entity identification through inference, once the
    model is learned. Learning is done in an
    unsupervised way.

9
Generate a document d
The Justice Department has officially ended its
inquiry into the assassinations of President John
F. Kennedy and Martin Luther King Jr., finding
"no persuasive evidence" to support conspiracy
theories, according to department documents. The
House Assassinations Committee concluded in 1978
that Kennedy was "probably" assassinated as the
result of a conspiracy involving a second gunman,
a finding that broke from the Warren Commission's
belief that Lee Harvey Oswald acted alone in
Dallas on Nov. 22, 1963.
President Kennedy; JFK
10
At the beginning, we have a set of entities in
mind.
A set of entities E:
The Justice Department
Dallas
The House Assassinations Committee
David Kennedy
11
First Step: Select a subset of entities for a
document d. Underlying probability
distribution: $P(E_d)$.
The Justice Department
Dallas
The House Assassinations Committee
$E_d$: the set of entities in a document d
12
Second Step: For each entity e, select a
representative r. Underlying probability
distributions: $P(r \mid e)$ and $P(R_d \mid E_d) \sim \prod P(r \mid e)$.
$R_d$: the set of representatives in document d
13
Third Step: For each representative r, select a
set of mentions m. Underlying probability
distributions: $P(m \mid r)$ and $P(M_d \mid R_d) \sim \prod P(m \mid r)$.
Kennedy, JFK, President Kennedy
President John F. Kennedy
$M_d$: the set of actual mentions in document d
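
The three steps above can be sketched as a small sampling program. This is a minimal illustration under stated assumptions, not the authors' implementation: every probability in the tables is made up, entities are included independently as a simple stand-in for the distribution over entity sets, and the fixed number of mentions per representative (`mentions_per_rep`) is a hypothetical parameter.

```python
import random

# Toy parameters; all names and numbers below are assumptions for illustration.
P_ENTITY = {  # independent inclusion probabilities, a simple stand-in for P(E_d)
    "John F. Kennedy": 0.6,
    "The Justice Department": 0.5,
    "Dallas": 0.4,
    "David Kennedy": 0.1,
}
P_REP = {  # P(r | e): how an entity is represented in a document
    "John F. Kennedy": {"President John F. Kennedy": 0.7, "John F. Kennedy": 0.3},
    "The Justice Department": {"The Justice Department": 1.0},
    "Dallas": {"Dallas": 1.0},
    "David Kennedy": {"David Kennedy": 1.0},
}
P_MENTION = {  # P(m | r): how a representative is written as a mention
    "President John F. Kennedy": {
        "President John F. Kennedy": 0.4,
        "President Kennedy": 0.2,
        "Kennedy": 0.3,
        "JFK": 0.1,
    },
    "John F. Kennedy": {"John F. Kennedy": 0.6, "Kennedy": 0.4},
    "The Justice Department": {"The Justice Department": 1.0},
    "Dallas": {"Dallas": 1.0},
    "David Kennedy": {"David Kennedy": 0.5, "Kennedy": 0.5},
}


def sample(dist, rng):
    """Draw one outcome from a {outcome: probability} table."""
    outcomes, weights = zip(*dist.items())
    return rng.choices(outcomes, weights=weights, k=1)[0]


def generate_document(rng, mentions_per_rep=2):
    # Step 1: select the subset of entities E_d for this document.
    entities = [e for e, p in P_ENTITY.items() if rng.random() < p]
    # Step 2: select one representative r for each entity e.
    reps = {e: sample(P_REP[e], rng) for e in entities}
    # Step 3: select the mentions m for each representative r.
    mentions = {
        r: [sample(P_MENTION[r], rng) for _ in range(mentions_per_rep)]
        for r in reps.values()
    }
    return entities, reps, mentions


if __name__ == "__main__":
    print(generate_document(random.Random(0)))
```

Note that the string "Kennedy" can be produced by more than one representative, which is how the generative view accounts for ambiguous names.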
15
Robust Reading
  • Assuming we have the model, the fundamental
    problem is to decide which entities are mentioned
    in a given document and which entity each mention
    most likely refers to. (Li, Morie and
    Roth, HLT-NAACL 2004)
  • $E_d = \arg\max_{(E_d, R_d)} P(E_d, R_d \mid M_d, \theta)$
  • $\phantom{E_d} = \arg\max_{(E_d, R_d)} P(E_d, R_d, M_d \mid \theta)$
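
The step from the conditional to the joint probability can be spelled out as follows. The first two lines are just Bayes' rule: the denominator does not depend on the entities or representatives being chosen, so it drops out of the maximization. The last line is one reading of the three generation steps as a chain, taking $\theta$ to collect the parameters of those distributions; this factorization is an assumption stated here for illustration rather than a formula quoted from the slides.

```latex
\begin{align*}
\arg\max_{(E_d, R_d)} P(E_d, R_d \mid M_d, \theta)
  &= \arg\max_{(E_d, R_d)} \frac{P(E_d, R_d, M_d \mid \theta)}{P(M_d \mid \theta)} \\
  &= \arg\max_{(E_d, R_d)} P(E_d, R_d, M_d \mid \theta), \\
P(E_d, R_d, M_d \mid \theta)
  &= P(E_d)\, P(R_d \mid E_d)\, P(M_d \mid R_d).
\end{align*}
```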

16
Significant Applications
  • Based on the work, we can implement an analysis
    tool that can be used to browse, retrieve and
    track information about entities using textual
    resources. The tool can
  • Automatically retrieve knowledge about specific
    persons and locations,
  • Automatically extract relations between them,
  • Automatically build and unify important databases.