NJIT CIS 634 Information Retrieval Fall 2002


Transcript and Presenter's Notes

1
NJIT CIS 634 Information Retrieval Fall 2002
  • Information Extraction
  • Material
  • Information Extraction: Techniques and
    Challenges, by Ralph Grishman

2
What do people want from IE?
  • Lists of relevant entities rather than lists of
    relevant documents.
  • How many companies filed for bankruptcy in 2001?
  • How many universities are there in the United
    States?

3
Definitions
  • IE is the identification of instances of a
    particular class of events or relationships in a
    natural language text, and the extraction of the
    relevant arguments of the event or relationship.
  • It involves the creation of a structured
    representation of selected information drawn from
    the text.

4
Example
  • Text: "19 March. A bomb went off this morning
    near a power tower in San Salvador, leaving a
    large part of the city without energy, but no
    casualties have been reported. According to
    unofficial sources, the bomb, allegedly
    detonated by urban guerrilla commandos, blew up
    a power tower in the northwestern part of San
    Salvador at 0650 (1250 GMT)."

5
Results
  • INCIDENT TYPE: bombing
  • DATE: March 19
  • LOCATION: El Salvador: San Salvador (city)
  • PERPETRATOR: urban guerrilla commandos
  • PHYSICAL TARGET: power tower
  • HUMAN TARGET: -
  • EFFECT ON PHYSICAL TARGET: destroyed
  • EFFECT ON HUMAN TARGET: no injury or death
  • INSTRUMENT: bomb
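
As an editorial illustration (not part of the original slides), the filled template above can be held as a simple Python dictionary; the slot names are lowercased versions of the MUC-style slots, and the empty HUMAN TARGET slot becomes None:

    # MUC-style filled template for the bombing example (illustrative).
    bombing_template = {
        "incident_type": "bombing",
        "date": "March 19",
        "location": "El Salvador: San Salvador (city)",
        "perpetrator": "urban guerrilla commandos",
        "physical_target": "power tower",
        "human_target": None,          # no human target reported
        "effect_on_physical_target": "destroyed",
        "effect_on_human_target": "no injury or death",
        "instrument": "bomb",
    }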

6
Top Level Overview of Processes
  • Facts are extracted from the text through local
    text analysis.
  • Facts are integrated, producing larger facts or
    new facts.
  • Facts are translated into the required output
    format.
  • Domain vs. scenario vs. template.

7
Desired outputs
  • Scenario: Sam Schwartz retired as executive vice
    president of the famous hot dog manufacturer,
    Hupplewhite, Inc. He will be succeeded by Harry
    Himmelfarb.
  • Templates:
  • Event: start job
  • Person: Harry Himmelfarb
  • Position: executive vice president
  • Company: Hupplewhite Inc.
  • --------------------------------------------------
  • Event: leave job
  • Person: Sam Schwartz
  • Position: executive vice president
  • Company: Hupplewhite Inc.

8
Pattern creation and template structure building
  • Create sets of expression patterns:
  • person retires as position
  • person is succeeded by person
  • Structures for templates:
  • Entities
  • Events
  • (The role of patterns is to extract events or
    relationships relevant to the scenario; a sketch
    follows below.)
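
A minimal sketch of how such expression patterns could be written as regular expressions over raw text (the pattern strings and group names are illustrative; a real system matches over recognized entities, not surface strings):

    import re

    # Toy executive-succession patterns; capitalized word runs crudely
    # approximate PERSON, and the lookahead trims the POSITION phrase.
    PATTERNS = {
        "leave-job": re.compile(
            r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)*) retired as "
            r"(?P<position>[a-z ]+?)(?= of|[.,])"),
        "succeed": re.compile(
            r"(?P<person2>\w+) will be succeeded by "
            r"(?P<person1>[A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
    }

    text = ("Sam Schwartz retired as executive vice president of the "
            "famous hot dog manufacturer, Hupplewhite, Inc. "
            "He will be succeeded by Harry Himmelfarb.")

    for event_type, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            print(event_type, match.groupdict())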

9
Local text analysis, step 1: Lexical Analysis
  • Text is first divided into sentences and into
    tokens.
  • Each token is looked up in the dictionaries
    (general vs. specialized) to determine its
    possible parts of speech and features.
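
A toy version of this step, with a hand-made dictionary standing in for the general and specialized lexicons (everything here is illustrative):

    import re

    # Tiny stand-in lexicon: token -> possible parts of speech.
    DICTIONARY = {
        "he": ["PRP"],
        "will": ["MD", "NN"],   # ambiguous: modal or noun
        "be": ["VB"],
        "succeeded": ["VBD", "VBN"],
        "by": ["IN"],
    }

    def lexical_analysis(text):
        # Split into sentences on sentence-final punctuation, then tokens.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            tokens = re.findall(r"\w+|[^\w\s]", sentence)
            # Unknown tokens (e.g. names) get no features at this stage.
            yield [(t, DICTIONARY.get(t.lower(), ["?"])) for t in tokens]

    for analyzed_sentence in lexical_analysis(
            "He will be succeeded by Harry Himmelfarb."):
        print(analyzed_sentence)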

10
Local text analysis, steps 2 and 3
  • Name Recognition
  • Identifying various types of proper names and
    other special forms (e.g., dates, currency); see
    the sketch below.
  • Syntactic Structure
  • Arguments are mostly noun phrases.
  • Relationships: grammatical functional relations.
  • Example: company-description, company-name,
    position of company.
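
Illustrative recognizers for two of the "special form" classes, dates and currency (real systems combine large name lists with contextual rules; these regexes are only a sketch):

    import re

    DATE = re.compile(
        r"\b\d{1,2} (?:January|February|March|April|May|June|July|"
        r"August|September|October|November|December)\b")
    CURRENCY = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?")

    text = "On 19 March the repairs cost $1,250,000.00 to complete."
    print(DATE.findall(text))      # ['19 March']
    print(CURRENCY.findall(text))  # ['$1,250,000.00']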

11
Example of syntactic structure
  • [np e1 Sam Schwartz] [vg retired] as [np e2
    executive vice president] of [np e3 the famous
    hot dog manufacturer], [np e4 Hupplewhite, Inc.]
    [np e5 He] [vg will be succeeded] by [np e6
    Harry Himmelfarb].

12
Example (cont)
  • Semantic entities
  • Entity e1 type: person name: Sam Schwartz
  • Entity e2 type: position value: executive vice
    president
  • Entity e3 type: manufacturer
  • Entity e4 type: company name: Hupplewhite Inc.
  • Entity e5 type: person
  • Entity e6 type: person name: Harry Himmelfarb
  • Updated according to the pattern "position of
    company":
  • Entity e1 type: person name: Sam Schwartz
  • Entity e2 type: position value: executive vice
    president company: e3
  • Entity e3 type: manufacturer name: Hupplewhite
    Inc.
  • Entity e5 type: person
  • Entity e6 type: person name: Harry Himmelfarb
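
The same before/after step, written out as Python dicts (an editorial sketch; ids and slots follow the slide):

    # Entities from local semantic analysis, keyed by id.
    entities = {
        "e1": {"type": "person", "name": "Sam Schwartz"},
        "e2": {"type": "position", "value": "executive vice president"},
        "e3": {"type": "manufacturer"},
        "e4": {"type": "company", "name": "Hupplewhite Inc."},
        "e5": {"type": "person"},   # the pronoun "he"
        "e6": {"type": "person", "name": "Harry Himmelfarb"},
    }

    # Applying "position of company": e2 gains a company slot, and the
    # apposition folds e4's name into e3, removing e4.
    entities["e2"]["company"] = "e3"
    entities["e3"]["name"] = entities.pop("e4")["name"]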

13
Local text analysis, step 4: Scenario Pattern
Matching
  • Extract the events or relationships relevant to
    the scenario, which is executive succession in
    this case.
  • Person (A) is succeeded by person (B).
  • Entity e1 type: person name: Sam Schwartz
  • Entity e2 type: position value: executive vice
    president
  • Entity e3 type: manufacturer name: Hupplewhite
    Inc.
  • Entity e5 type: person
  • Entity e6 type: person name: Harry Himmelfarb
  • Event e7 type: leave-job person: e1 position: e2
  • Event e8 type: succeed person1: e6 person2: e5
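
As dicts, the two events produced by the scenario patterns refer to entities by id (again an editorial sketch):

    # Events from scenario pattern matching; values are entity ids.
    events = {
        "e7": {"type": "leave-job", "person": "e1", "position": "e2"},
        "e8": {"type": "succeed", "person1": "e6", "person2": "e5"},
    }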

14
Discourse analysis, step 1: Coreference Analysis
  • Resolving anaphoric references by pronouns and
    definite noun phrases.
  • Entity e5 type: person (the pronoun "he")
  • It is replaced by the most recent previously
    mentioned entity of type person, which is e1, Sam
    Schwartz.
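
A toy recency-based resolver for this case (an assumed, simplified strategy: walk backwards to the nearest named person):

    # Entity table and mention order for the example sentence pair.
    entities = {
        "e1": {"type": "person", "name": "Sam Schwartz"},
        "e2": {"type": "position", "value": "executive vice president"},
        "e3": {"type": "manufacturer", "name": "Hupplewhite Inc."},
        "e5": {"type": "person"},   # "he"
    }
    mentions = ["e1", "e2", "e3", "e5"]  # order of appearance in text

    def resolve_pronoun(pronoun_id):
        # Scan backwards for the most recent named person entity.
        idx = mentions.index(pronoun_id)
        for eid in reversed(mentions[:idx]):
            if entities[eid]["type"] == "person" and "name" in entities[eid]:
                return eid
        return pronoun_id  # leave unresolved if no antecedent is found

    print(resolve_pronoun("e5"))  # -> e1 (Sam Schwartz)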

15
Discourse analysis, step 2: Inferencing and Event
Merging
  • leave-job (X-person, Y-job) and succeed
    (Z-person, X-person)
  • => start-job (Z-person, Y-job)
  • start-job (X-person, Y-job) and succeed
    (X-person, Z-person)
  • => leave-job (Z-person, Y-job)
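
A minimal forward-chaining pass for the first rule (the second is symmetric); this is a sketch, not Grishman's implementation:

    # Events after coreference: e8's person2 now points at e1.
    events = {
        "e7": {"type": "leave-job", "person": "e1", "position": "e2"},
        "e8": {"type": "succeed", "person1": "e6", "person2": "e1"},
    }

    inferred = {}
    for a in events.values():
        for b in events.values():
            # leave-job(X, Y) and succeed(Z, X) => start-job(Z, Y)
            if (a["type"] == "leave-job" and b["type"] == "succeed"
                    and b["person2"] == a["person"]):
                inferred["e9"] = {"type": "start-job",
                                  "person": b["person1"],
                                  "position": a["position"]}

    print(inferred)
    # {'e9': {'type': 'start-job', 'person': 'e6', 'position': 'e2'}}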

16
Inferencing and Event Merging (cont)
  • Entity e1 type: person name: Sam Schwartz
  • Entity e2 type: position value: executive vice
    president company: e3
  • Entity e3 type: manufacturer name: Hupplewhite
    Inc.
  • Entity e6 type: person name: Harry Himmelfarb
  • Event e7 type: leave-job person: e1 position: e2
  • Event e8 type: succeed person1: e6 person2: e1
  • Event e9 type: start-job person: e6 position: e2

17
(No Transcript)
18
Design Issues
  • To parse or not to parse: linguistic complexity
    is involved.
  • Portability: low.
  • Performance: not yet satisfactory.
  • Five of the nine systems represented in MUC-6
    got F scores ranging from 51 to 56 (recall 43 to
    50, precision 59 to 70).
  • F score = (2 × precision × recall) / (precision
    + recall)
  • Precision = N correct / N response
  • Recall = N correct / N key
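
The MUC scoring arithmetic, as a small helper (the example counts are made up so that the result lands inside the quoted MUC-6 band):

    # Precision, recall, and F score over slot counts, MUC style:
    # n_correct correct fills, n_response fills attempted by the system,
    # n_key fills in the human-prepared answer key.
    def muc_scores(n_correct, n_response, n_key):
        precision = n_correct / n_response
        recall = n_correct / n_key
        f = 2 * precision * recall / (precision + recall)
        return precision, recall, f

    p, r, f = muc_scores(n_correct=46, n_response=71, n_key=100)
    print(f"precision={100*p:.0f} recall={100*r:.0f} F={100*f:.1f}")
    # precision=65 recall=46 F=53.8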