Information Extraction - PowerPoint PPT Presentation

1
Information Extraction
  • Jordi Turmo
  • TALP Research Centre
  • Dep. Llenguatges i Sistemes Informàtics
  • Universitat Politècnica de Catalunya
  • turmo_at_lsi.upc.edu
  • http://www.lsi.upc.edu/turmo

2
Summary
  • Information Extraction Systems
  • Evaluation
  • Multilinguality
  • Adaptability

3
Summary
  • Information Extraction Systems
  • Introduction
  • Historical framework
  • Architecture
  • Knowledge specific for IE
  • Examples
  • Evaluation
  • Multilinguality
  • Adaptability

4
Introduction
Definition
  • Goal: localization and extraction, in a specific
    format, of the relevant information included in a
    collection of documents
  • Input requirements: scenario of extraction and
    document collection
  • Output requirements: output format

5
Introduction
Typology
  • Different points of view:
  • conceptual coverage: restricted-domain IE vs.
    open-domain IE
  • language coverage: monolingual IE vs.
    multilingual IE
  • media coverage: written text IE, speech IE,
    image IE, multimedia IE
  • document type: IE from free text, from
    semi-structured documents, from structured
    documents (including Web pages in HTML and XML)

7
Introduction
Example 1 Structured documents
  • Web pages
  • A list of members of an organization per document
  • English
  • Scenario of Extraction
  • Name, degree, school and affiliation of the
    member

8
Introduction
Example 1 Structured documents
Name        Degree  School   Affiliation
WL Hsu      PhD     Cornell  IIS, Sinica
CS Ho       PhD     NTU      EE,NTIT
C.Chen      PhD     SUNY     EE,NTIT
C.Wu        PhD     Utexas   Cedu,NNU
Mark Liao   PhD     NWU      IIS, Sinica
CJ Liau     PhD     NTU      IIS, Sinica
WK Cheng    PhD     TKU      Tunghai
WC Wang     MS      Syracus  FIT
...
9
Introduction
Example 2 Semi-structured documents
  • 485 seminar announcements
  • A description of one seminar per document
  • English
  • Scenario of Extraction
  • Speaker, location, start time and end time of the
    seminar

10
Introduction
Example 2 Semi-structured documents
11
Introduction
Example 3 Free text
  • 318 Wall Street Journal articles
  • A description of an incident per document
  • English
  • Scenario of Extraction
  • Type of incident, perpetrator, target, date,
    location, effects and instrument

12
Introduction
Example 3 Free text
Incident type: bombing
Date: March 19
Location: El Salvador: San Salvador (city)
Perpetrator: urban guerrilla commandos
Physical target: power tower
Human target: -
Effect on physical target: destroyed
Effect on human target: no injury or death
Instrument: bomb
13
Introduction
Example 4 Free text
  • 78 documents
  • A description of a mushroom per document
  • Spanish
  • Scenario of Extraction
  • colors of parts of mushrooms and the
    circumstances in which they occur

14
Introduction
Example 4 Free text
15
Introduction
Example 4 Free text
El color blanco de su sombrero pasa a amarillo
crema al corte. El sombrero ennegrece si se corta.
(English: The white color of its cap turns creamy
yellow when cut. The cap blackens if it is cut.)
color_1 base blanco tono indef luz indef
Sombrero_1 color
virar_1 inicio final causa corte
color_2 base amarillo tono crema luz indef
Sombrero_2 color
virar_2 inicio indef final causa corte
color_3 base indef tono negro luz indef
16
Introduction
Example 5 Combination
  • 78 documents
  • A description of a mushroom per document
  • Spanish
  • Scenario of Extraction
  • Names of the mushroom in different languages,
    etymology
  • colors of parts of mushrooms and the
    circumstances in which they occur

17
Introduction
Example 5 Combination
18
Introduction
Applications
  • IE from the Web
  • Building of news DBs
  • Information Integration
  • Support for QA and Summarization
  • Limitation: when P < 80

19
Introduction
References
  • D.E. Appelt, D.J. Israel, 1999
  • E. Hovy, 1999
  • R.J. Mooney, C. Cardie, 1999
  • Muslea, 1999
  • J. Cowie, Y. Wilks, 2000
  • M.T. Pazienza, 2003
  • Turmo, 2003
  • Turmo et al. 2005

20
Introduction
Recent events
  • IJCAI 2001 Workshop on Adaptive Text Extraction
    and Mining (ATEM-2001)
  • ECML 03/PKDD Workshop on Adaptive Text Extraction
    and Mining (ATEM-2003)
  • AAAI 04 Workshop on Adaptive Text Extraction and
    Mining (ATEM-2004)
  • EACL 06 Workshop on Adaptive Text Extraction and
    Mining (ATEM-2006)
  • COLING-ACL 06 Workshop on Information Extraction
    Beyond the Document
  • ECAI 06 Workshop on Adaptive Text Extraction and
    Mining (ATEM-2006)

22
Historical framework
Origin of IE
  • Acquisition of the relevant information involved
    in knowledge-based systems
  • Traditionally (High human cost)

23
Historical framework
Origin of IE
  • Acquisition of the relevant information involved
    in knowledge-based systems
  • 80s (text sources)

Relevant Information
24
Historical framework
Origin of IE
  • Text-Based Intelligent Systems (TBIS)
  • Information Retrieval
  • Information Integration
  • Information Filtering
  • Information Routing
  • Information Extraction
  • Document Classification
  • Question Answering
  • Automatic Summarization
  • Topic Detection and Tracking
  • ...

25
Historical framework
Relevant Historical Programs
  • Precedents: LSP (Sager, 81), FRUMP (DeJong, 82),
    JASPER (Hayes, 86)
  • In the USA
  • (1987-1991): MUC, US Navy
  • TIPSTER (1991-1998): MUC, DARPA
  • TIDES (1999-): ACE, NIST
  • In Europe
  • LRE (1993-1996): TREE, AVENTINUS, FACILE, ECRAN,
    SPARKLE
  • PASCAL excellence network (2003-)

26
Historical framework
MUC Evolution
  • MUC-1 (1987)
  • naval operations
  • auto-definition of scenarios
  • auto-evaluation
  • MUC-2 (1989)
  • naval operations
  • output structure with 10 attributes
    (type of event, agent, place, ...)
  • auto-evaluation

27
Historical framework
MUC Evolution
  • MUC-3 (1991),
  • Latin-American terrorism
  • output structure with 18 attributes
    (type of incident, date, place, ...)
  • recall and precision measures

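[Figure: Venn diagram of the extracted set and the relevant set, with regions a, b, c, d, e and f; f is the partially extracted region]
$\text{extracted} = a + b + e + f$
$\text{relevant} = a + f + d$
$\text{recall} = \frac{a + 0.5 f}{a + f + d}$
$\text{precision} = \frac{a + 0.5 f}{a + f + b + e}$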
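The MUC-3 measures can be sketched in a few lines. This is only an illustration of the formulas on the slide, recall = (a + 0.5 f) / (a + f + d) and precision = (a + 0.5 f) / (a + f + b + e); the function name and the counts in the usage lines are hypothetical, and reading b and e as extracted items that earn no credit is an assumption.

```python
def muc3_scores(a: int, b: int, d: int, e: int, f: int) -> tuple[float, float]:
    """Return (recall, precision) with half credit for partial extractions.

    a: relevant items fully extracted
    f: items partially extracted (each counts 0.5)
    d: relevant items that were not extracted
    b, e: other extracted items (assumed here to earn no credit)
    """
    credit = a + 0.5 * f
    relevant = a + f + d
    extracted = a + f + b + e
    recall = credit / relevant if relevant else 0.0
    precision = credit / extracted if extracted else 0.0
    return recall, precision


# Hypothetical counts, chosen only to exercise the formulas.
recall, precision = muc3_scores(a=6, b=1, d=2, e=1, f=2)
print(recall, precision)
```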
28
Historical framework
MUC Evolution
  • MUC-4 (1992),
  • Latin-American terrorism
  • 24 attributes
  • F-score (harmonic average)
  • MUC-5 (1993),
  • Financial news, microelectronics
  • English, Japanese

29
Historical framework
MUC Evolution
  • MUC-6 (1995),
  • financial news
  • subtasks: NE, coreference
  • tasks: TE (template element), ST (scenario
    template)
  • MUC-7 (1998),
  • air crashes
  • new task TR (template relation)

30
Historical framework
MUC Evolution
  • MUC-6, MUC-7
  • Partial extractions are discarded

$\text{extracted} = a + b$
$\text{relevant} = a + d$
$\text{recall} = \frac{a}{a + d}$
$\text{precision} = \frac{a}{a + b}$
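A minimal sketch of the strict scoring used once partial extractions are discarded (recall = a / (a + d), precision = a / (a + b)), combined with the F-score introduced in MUC-4. The weighted form with `beta` is the usual generalisation of the harmonic mean and is an assumption here, since the slides only say "harmonic average"; the counts are hypothetical.

```python
def strict_scores(a: int, b: int, d: int) -> tuple[float, float]:
    """Return (recall, precision) when only exact extractions count.

    a: relevant items extracted, b: extracted but not relevant,
    d: relevant but not extracted.
    """
    recall = a / (a + d) if a + d else 0.0
    precision = a / (a + b) if a + b else 0.0
    return recall, precision


def f_score(recall: float, precision: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (beta=1 weighs both equally)."""
    if recall == 0.0 and precision == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts.
recall, precision = strict_scores(a=6, b=2, d=4)
f1 = f_score(recall, precision)
print(recall, precision, f1)
```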
32
Architecture
General Architecture
  • Hobbs,93
  • Cascade of transducers (or modules) that add
    structure to text and, often, drop out irrelevant
    information by applying rules
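The cascade idea can be sketched as a chain of functions, each taking the output of the previous one. This is an illustrative toy, not the architecture of any particular system: the stage names, the dictionary representation of text units and the trigger-word rule are all assumptions.

```python
from functools import reduce
from typing import Callable

Stage = Callable[[list[dict]], list[dict]]

# Hypothetical rule: a unit is relevant if it contains one of these words.
TRIGGERS = {"bomb", "bombing"}


def tokenize(units: list[dict]) -> list[dict]:
    """Add structure: attach a token list to every unit."""
    return [{**u, "tokens": u["text"].split()} for u in units]


def mark_relevant(units: list[dict]) -> list[dict]:
    """Add structure: flag units that mention a trigger word."""
    return [
        {**u, "relevant": any(t.lower() in TRIGGERS for t in u["tokens"])}
        for u in units
    ]


def drop_irrelevant(units: list[dict]) -> list[dict]:
    """Drop information: keep only the units flagged as relevant."""
    return [u for u in units if u["relevant"]]


def cascade(stages: list[Stage], units: list[dict]) -> list[dict]:
    """Apply the stages in order, feeding each one the previous output."""
    return reduce(lambda acc, stage: stage(acc), stages, units)


units = [
    {"text": "A bomb went off this morning ."},
    {"text": "The weather was mild ."},
]
result = cascade([tokenize, mark_relevant, drop_irrelevant], units)
print(result)
```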

33
Architecture
Traditional Architecture
Document Preprocessing
Conceptual Hierarchy
Pattern Matching
Pattern Base
Postprocess
34
Architecture
Traditional Architecture
Text Control
Lexical Analysis
Conceptual Hierarchy
Syntactic Analysis
Pattern Matching
Pattern Base
Postprocess
35
Architecture
Traditional Architecture
Text Control
Lexical Analysis
Conceptual Hierarchy
Syntactic Analysis
Pattern Matching
Pattern Base
Discourse Analysis
Output Template Generation
Output Format
36
Architecture
Architecture
Text control
  • Filtering relevant documents
  • Guessing the language of the documents
  • Splitting documents into textual zones
  • Filtering relevant zones
  • Splitting text into appropriate units (eg.
    sentences)
  • Filtering relevant units
  • Tokenizing units
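Two of the text control steps above, splitting text into sentences and tokenizing them, can be sketched with regular expressions. The splitting rule (a sentence ends at ".", "!" or "?" followed by whitespace) is a deliberate simplification and an assumption; it would mis-split abbreviations such as "Dr. Smith".

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> list[str]:
    """Return words (hyphenated words stay whole) and single punctuation marks."""
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", sentence)


text = (
    "A bomb went off this morning near a power tower in San Salvador. "
    "No casualties have been reported."
)
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]
print(tokens)
```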

37
Architecture
Architecture
Text control
  • Example

38
Architecture
Architecture
Text control
  • Example

<Sombrero bastante carnoso de 4 a 8 cm , convexo , luego completamente extendido , aplanado y mamelonado , liso , húmedo e higrófano .>
<Esta última condición influye en la variabilidad de su coloración desde canela claro a toda la gama de tostados .>
<Con la edad generalmente palidece sus tonos .>
<Puede confundirse con otras foliotas comestibles , pero alguna especie es amarga .>
<Los aficionados poco experimentados pueden también confundir este género con otros no comestibles , como Hypholoma y Flacemula , también lignícolas.>
39
Architecture
Architecture
Lexical analysis
  • Identifying morpho-syntactic categories and
    semantic categories of words
  • General lexicon
  • Recognizing terminology words
  • Specific dictionaries
  • Recognizing time expressions, quantities,
    abbreviations, ...
  • Expanding abbreviations
  • Lists of abbreviation expansions

40
Architecture
Architecture
Lexical analysis
  • Recognizing and classifying proper nouns (Named
    Entities, NERC)
  • Gazetteers
  • Patterns
  • Dealing with unknown words
  • Dealing with lexical ambiguities
  • POS taggers
  • WSD (???)
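Gazetteer lookup and pattern-based recognition, the two NERC resources listed above, can be sketched as follows. The gazetteer entry, the class names and the four-digit time pattern are hypothetical and only cover the running example.

```python
import re

# Hypothetical gazetteer: token sequences mapped to entity classes.
GAZETTEER = {
    ("San", "Salvador"): "location",
}
# Hypothetical pattern: four-digit 24-hour times such as 0650.
TIME_PATTERN = re.compile(r"^(?:[01]\d|2[0-3])[0-5]\d$")


def tag_entities(tokens: list[str]) -> list[tuple[str, str]]:
    """Return (text, class) pairs found by gazetteer lookup or by pattern."""
    entities = []
    max_len = max(len(key) for key in GAZETTEER)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest gazetteer entry first.
        for n in range(max_len, 0, -1):
            key = tuple(tokens[i:i + n])
            if key in GAZETTEER:
                entities.append((" ".join(key), GAZETTEER[key]))
                i += n
                matched = True
                break
        if matched:
            continue
        if TIME_PATTERN.match(tokens[i]):
            entities.append((tokens[i], "time"))
        i += 1
    return entities


tokens = (
    "blew up a power tower in the northwestern part of San Salvador at 0650 ."
).split()
entities = tag_entities(tokens)
print(entities)
```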

41
Architecture
Architecture
Lexical analysis
  • Example1

Categories marked: time expressions, mushroom names, abbreviations, numbers, morphological parts
<Sombrero bastante carnoso de 4 a 8 cm , convexo , luego completamente extendido , aplanado y mamelonado , liso , húmedo e higrófano .>
<Esta última condición influye en la variabilidad de su coloración desde canela claro a toda la gama de tostados .>
<Con la edad generalmente palidece sus tonos .>
<Puede confundirse con otras foliotas comestibles , pero alguna especie es amarga .>
<Los aficionados poco experimentados pueden también confundir este género con otros no comestibles , como Hypholoma y Flacemula , también lignícolas.>
Depends on the scenario
42
Architecture
Architecture
Lexical analysis
  • Example2

<A bomb went off this morning near a power tower in San Salvador leaving a large part of the city without energy , but no casualties have been reported .>
<According to unofficial sources , the bomb -allegedly detonated by urban guerrilla commandos- blew up a power tower in the northwestern part of San Salvador at 0650 .>
Categories marked: time expressions, locations, organizations, persons
43
Architecture
Architecture
Syntactic analysis
  • Full parsing (Lolita, LaSIE, LaSIE-II)
  • inefficient, sizes of the grammars
  • lack of robustness (out-of-vocabulary words)
  • treebank grammars
  • cascaded grammars
  • Solves some problems related to the tuning and
    incompleteness

44
Architecture
Architecture
Syntactic analysis
  • Partial parsing
  • the most commonly used
  • chunks or phrasal trees (noun phrases, verbal
    phrases, prep phrases, adj phrases, adv phrases)
  • absence of global dependences

45
Architecture
Architecture
Semantic interpretation
  • Compositional semantics
  • full parsing: λ-expressions
  • LaSIE, LaSIE-II
  • Entries with λ-expressions in the lexicons
  • partial parsing: grammatical relations
    (Vilain, 99)
  • output: logical forms

46
Architecture
Architecture
Semantic interpretation
  • Compositional semantics (example 1)

λ(z) λ(y) λ(x) (bombing(x,y,z,bomb,today_morning,power_tower(San_Salvador)))
[Figure: parse tree with nodes s, vp, pp and np over the sentence]
A bomb went off this morning near a power tower
in San Salvador
go_off → λ(t) λ(s) λ(r) λ(z) λ(y) λ(x)
(bombing(x,y,z,r,s,t))
power_tower → λ(x) (power_tower(x))
47
Architecture
Architecture
Semantic interpretation
  • Compositional semantics (example 2)

[Figure: grammatical relations subj, time, place and location_of drawn over the sentence]
A bomb went off this morning near a power tower
in San Salvador
event(bombing , E) subj(bomb , E) time(today_morning , E)
place(power_tower, E) location_of(power_tower, San_Salvador)
48
Architecture
Architecture
Semantic interpretation
  • Pattern matching
  • after partial parsing: SVO dependencies
  • the most widespread approach
  • patterns can be implemented in different ways
  • scenario-driven approach (TE, TR, ST, ...)
  • Output: partial templates

49
Architecture
Architecture
Semantic interpretation
  • Pattern matching (example)

A bomb went off this morning near a power tower
in San Salvador
np(C-instrument) vp(go_off) np(C-time)
near np(C-place) in np(C-location)
→ INSTRUMENT: C-instrument, DATE: C-time, PHIS_TARGET: C-place, LOCATION: C-location
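The rule np(C-instrument) vp(go_off) np(C-time) near np(C-place) in np(C-location) can be sketched as a matcher over chunked, concept-tagged input. The tuple encoding of chunks and rule elements is an assumption made for this illustration; the slides note that patterns can be implemented in different ways.

```python
# Chunked sentence: (chunk type, text or verb lemma, semantic class or None).
chunks = [
    ("np", "a bomb", "instrument"),
    ("vp", "go_off", None),
    ("np", "this morning", "time"),
    ("word", "near", None),
    ("np", "a power tower", "place"),
    ("word", "in", None),
    ("np", "San Salvador", "location"),
]

# Rule elements: (chunk type, required text, required class, slot to fill).
RULE = [
    ("np", None, "instrument", "INSTRUMENT"),
    ("vp", "go_off", None, None),
    ("np", None, "time", "DATE"),
    ("word", "near", None, None),
    ("np", None, "place", "PHIS_TARGET"),
    ("word", "in", None, None),
    ("np", None, "location", "LOCATION"),
]


def match_rule(rule, chunks):
    """Return the filled slots if the rule matches a contiguous run of chunks, else None."""
    for start in range(len(chunks) - len(rule) + 1):
        slots = {}
        for (kind, text, cls, slot), (c_kind, c_text, c_cls) in zip(rule, chunks[start:]):
            if kind != c_kind:
                break
            if text is not None and text != c_text:
                break
            if cls is not None and cls != c_cls:
                break
            if slot:
                slots[slot] = c_text
        else:
            # No element failed: the whole rule matched at this position.
            return slots
    return None


template = match_rule(RULE, chunks)
print(template)
```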
50
Architecture
Architecture
Discourse analysis
  • Inter-sentence analysis
  • Co-reference resolution
  • Ellipsis resolution
  • Alias resolution
  • Traditional semantic interpretation procedures
  • Template merging procedures
  • Inference procedures
  • Open-domain and domain-specific knowledge for
    inferences

51
Architecture
Architecture
Discourse analysis
  • Example

A bomb went off this morning near a power tower
in San Salvador , but no casualties have been
reported
λ(y) λ(x) (bombing(x,y,no_casualties,bomb,today_morning,power_tower(San_Salvador)))
According to unofficial sources , the bomb
-allegedly detonated by urban guerrilla
commandos- blew up a power tower in the
northwestern part of San Salvador at 0650
λ(z) λ(y) (bombing(urban_guerrilla_comandos,y,z,bomb,0650,power_tower(the_northwestern_part_of_San_Salvador)))
52
Architecture
Architecture
Discourse analysis
  • Example

λ(y) λ(x) (bombing(x,y,no_casualties,bomb,today_morning,power_tower(San_Salvador)))
λ(z) λ(y) (bombing(urban_guerrilla_comandos,y,z,bomb,0650,power_tower(the_northwestern_part_of_San_Salvador)))
Unification + inference
λ(y) (bombing(urban_guerrilla_comandos,y,no_casualties,bomb,today_morning,power_tower(San_Salvador)))
Inference (blew_up → destroyed)
bombing(urban_guerrilla_comandos,destroyed,no_casualties,bomb,today_morning,power_tower(San_Salvador))
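The merging step can be sketched as slot-by-slot unification of two partial templates, where `None` plays the role of an unbound variable. The slot names and the `COMPATIBLE` table, which stands in for the inference that "0650" and "today_morning" (or the two target descriptions) refer to the same thing, are assumptions made for this example.

```python
# Hypothetical domain knowledge: pairs of values judged to denote the same
# thing, mapped to the value kept after merging.
COMPATIBLE = {
    frozenset({"today_morning", "0650"}): "today_morning",
    frozenset({
        "power_tower(San_Salvador)",
        "power_tower(the_northwestern_part_of_San_Salvador)",
    }): "power_tower(San_Salvador)",
}


def unify_value(v1, v2):
    """Unify two slot values; None is an unbound variable. Raises ValueError on a clash."""
    if v1 is None:
        return v2
    if v2 is None or v1 == v2:
        return v1
    merged = COMPATIBLE.get(frozenset({v1, v2}))
    if merged is None:
        raise ValueError(f"cannot unify {v1!r} with {v2!r}")
    return merged


def merge_templates(t1: dict, t2: dict) -> dict:
    """Merge two partial templates that share the same slots."""
    return {slot: unify_value(t1[slot], t2[slot]) for slot in t1}


first = {
    "perpetrator": None, "physical_effect": None,
    "human_effect": "no_casualties", "instrument": "bomb",
    "time": "today_morning", "target": "power_tower(San_Salvador)",
}
second = {
    "perpetrator": "urban_guerrilla_comandos", "physical_effect": None,
    "human_effect": None, "instrument": "bomb",
    "time": "0650",
    "target": "power_tower(the_northwestern_part_of_San_Salvador)",
}
merged = merge_templates(first, second)
print(merged)
```

The remaining unbound slot (`physical_effect`) is the one the slide fills with the separate inference from blew_up to destroyed.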
53
Architecture
Architecture
Output template generation
  • Mapping of the extracted pieces onto the desired
    output format
  • Specific inferences
  • Normalization to predefined values of slots
  • Mandatory slots
  • Extracted information that implies different
    slot values

54
Architecture
Architecture
Output template generation
  • Example

bombing(urban_guerrilla_comandos,destroyed,no_casualties,bomb,today_morning,power_tower(San_Salvador))
Today_morning → March_19
No_casualties → no_injuries_or_death
Incident type: bombing
Date: March 19
Location: El Salvador: San Salvador (city)
Perpetrator: urban guerrilla commandos
Physical target: power tower
Human target: -
Effect on physical target: destroyed
Effect on human target: no injury or death
Instrument: bomb
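Output template generation can be sketched as a mapping from the merged event to named slots, with value normalization and a check on mandatory slots. The two normalizations come from the example above; the choice of mandatory slots, the keys of the `event` dictionary and the function name are assumptions.

```python
# Normalizations from the example; a real system would resolve
# "today_morning" against the date of the document.
NORMALIZE = {
    "today_morning": "March 19",
    "no_casualties": "no injury or death",
}
# Assumption: slots that must be filled before a template is emitted.
MANDATORY = ("Incident type", "Date", "Location")


def generate_template(event: dict) -> dict:
    """Map an extracted event onto the output format, normalizing slot values."""
    def norm(value):
        return NORMALIZE.get(value, value)

    template = {
        "Incident type": event["type"],
        "Date": norm(event["time"]),
        "Location": event["location"],
        "Perpetrator": event["perpetrator"],
        "Physical target": event["target"],
        "Human target": event.get("human_target", "-"),
        "Effect on physical target": event["physical_effect"],
        "Effect on human target": norm(event["human_effect"]),
        "Instrument": event["instrument"],
    }
    missing = [slot for slot in MANDATORY if not template[slot]]
    if missing:
        raise ValueError(f"mandatory slots not filled: {missing}")
    return template


event = {
    "type": "bombing",
    "perpetrator": "urban guerrilla commandos",
    "physical_effect": "destroyed",
    "human_effect": "no_casualties",
    "instrument": "bomb",
    "time": "today_morning",
    "target": "power tower",
    "location": "El Salvador: San Salvador (city)",
}
template = generate_template(event)
for slot, value in template.items():
    print(f"{slot}: {value}")
```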
56
Knowledge specific for IE
Characteristics of IE systems
  • Strong dependence on the domain
  • Scenario of extraction
  • Semantics vs. syntax
  • Discourse analysis
  • Strong dependence on the text structure
  • Sublanguages
  • Meta-information
  • Strong dependence on the output format
  • DBs
  • annotations

57
Knowledge specific for IE
Characteristics of IE systems
  • Importance of portability and tuning
  • Importance of knowledge engineering
  • Modularity
  • Basic tasks and specific tasks
  • Use of weak and local knowledge
  • Importance of NL resources
  • MRDs, ontologies, general lexicons, specific
    dictionaries, ...

58
Knowledge specific for IE
Knowledge resources
  • Knowledge more or less stable
  • general lexicon
  • general grammar
  • basic NL processors: segmenters, taggers,
    parsers, ...
  • Domain dependent knowledge
  • Domain specific vocabularies, terminology
  • gazetteers and patterns for NERC
  • IE patterns

59
Knowledge specific for IE
Types of IE patterns
  • Viewpoint 1 type of representation
  • rules

np(C-instrument) vp(go_off) np(C-time)
near np(C-place) in np(C-location)
→ EventINSTRUMENT: C-instrument, EventDATE: C-time, EventPHIS_TARGET: C-place, EventLOCATION: C-location
60
Knowledge specific for IE
Types of IE patterns
  • Viewpoint 1 type of representation
  • statistical models (BNs, HMMs, ME, Hyperplanes,
    ...)

61
Knowledge specific for IE
Types of IE patterns
  • Viewpoint 2 type of values extracted
  • slot filler extraction patterns
  • (the HMM presented before)

62
Knowledge specific for IE
Types of IE patterns
  • Viewpoint 2 type of values extracted
  • slot filler extraction patterns
  • (the HMM presented before)
  • event extraction patterns
  • (the rule presented before)

np(C-instrument) vp(go_off) np(C-time)
near np(C-place) in np(C-location)
→ EventINSTRUMENT: C-instrument, EventDATE: C-time, EventPHIS_TARGET: C-place, EventLOCATION: C-location
64
Knowledge specific for IE
Types of IE patterns
  • Viewpoint 3 number of slot fillers extracted
  • single-slot IE patterns
  • (the HMM presented before)
  • multi-slot IE patterns
  • (both rules presented before)

66
Examples of IE systems
Methodologies (Turmo, 2002)
System      Reference
LaSIE       Gaizauskas et al., 1995
LaSIE-II    Humphreys et al., 1998
LOLITA      Garigliano et al., 1998
CIRCUS      Lehnert et al., 1991
FASTUS      Hobbs et al., 1993
BADGER      Fisher et al., 1995
HASTEN      Krupka, 1995
PROTEUS     Grishman, 1995
ALEMBIC     Aberdeen et al., 1993
PIE         Lin, 1995
TURBIO      Turmo, 2002
PLUM        Weischedel et al., 1995
IE2         Aone et al., 1998
LOUELLA     Childs et al., 1995
SIFT        Miller et al., 1998
Parsing / Semantics / Discourse
indepth understanding
template
merging Chunking Pattern
matching -
semantic Gramm relations
interp interpretation
procedures Partial Parsing pattern
matching Pattern matching
template merging -
syntactico-semantic parsing
67
Examples of IE systems
Knowledge (Turmo, 2002)
System Parsing
Semantics Discourse
LaSIE LaSIE-II LOLITA CIRCUS FASTUS BADGER HASTEN
PROTEUS ALEMBIC TURBIO PIE PLUM IE2 LOUELLA SIFT
Treebank grammar λ-expressions hand-crafted
stratified general grammar General
grammar semantic network
concept nodes (AutoSlog)
hand-crafted IE rules concept
nodes (CRYSTAL) decision trees Phrasal
grammar E-graphs IE
rules (ExDISCO)
hand-crafted gram relations
IE rules (EVIUS) General grammar
hand-crafted IE rules
hand-crafted rules hand-crafted IE rules
decision trees Statistical models for
syntactic-semantic parsing coreference
resolution learned from PTB and on-domain
annotated texts
68
Examples of IE systems
LaSIE-II system
gazetteers
Lexicon
Conceptual hierarchy
Sentence splitter
Gazetteer lookup
Buchart parser
Name matcher
Brill tagger
Tagged morph
Discourse interpreter
Template writer
69
Examples of IE systems
LaSIE-II system
  • Preprocessing
  • NERC preprocess via gazetteers and keyword lists
  • Root form and inflexional suffix for verbs,
    nouns and adjs found in sentences

According_to-adv unofficial-adj sources-n ,
the-det bomb-n allegedly-adv detonateed-v
by-prep urban-adj guerrilla-n commandos-n -
blow_up-v a-det power_tower-n in-prep the-det
northwestern-adj part-n of-prep San Salvador-loc
at-prep 0650
70
Examples of IE systems
LaSIE-II system
  • Syntactico-semantic interpretation
  • bottom-up chart parser
  • cascade of NERC grammars (eg. aircraft, person,
    money, time, timex)

According_to-adv unofficial-adj sources-n ,
the-det bomb-n allegedly-adv detonateed-v
by-prep urban-adj guerrilla-n commandos-n -
blow_up-v a-det power_tower-n in-prep the-det
northwestern part of San Salvador-loc at-prep
0650-time
NE1
NE2
71
Examples of IE systems
LaSIE-II system
  • Syntactico-semantic interpretation
  • bottom-up chart parser
  • cascade of NERC grammars (eg. aircraft, person,
    money, time)
  • cascade of partial grammars (NPs, PPs, complex
    NP, VPs, complex VPs, RelClauses, Sentence)

S(According_to-adv NP(unofficial-adj sources-n)
, NP(the-det bomb-n) allegedly-adv
VP(detonateed-v) PP(by-prep NP(urban-adj
guerrilla-n commandos-n)) - VP(blow_up-v)
NP(a-det power_tower-n) PP(in-prep NP(the-det
NE1-loc)) PP(at-prep NP(NE2-time)))
72
Examples of IE systems
LaSIE-II system
  • Syntactico-semantic interpretation
  • bottom-up chart parser
  • cascade of NERC grammars (eg. aircraft, person,
    money, time)
  • cascade of partial grammars (NPs, PPs, complex
    NP, VPs, complex VPs, RelClauses, Sentence)
  • QLFs (Note the real implementation of QLFs is
    not specified)

Event(E1), detonate(E1,Y,X), urban_guerrilla_comando(X),
bomb(Y), Event(E2), blow_up(E2,Y,Z),
power_tower(Z), location_of(Z,NE1),
time_of(E2,NE2)
73
Examples of IE systems
LaSIE-II system
  • Discourse analysis
  • Name matcher Matches variants of NEs across the
    text
  • Discourse interpreter
  • adds QLF representation to a semantic net
    (links)
  • adds presuppositions
  • coreference resolution

bombing event
implies
Event(E1), detonate(E1,Y,X), urban_guerrilla_comando(X),
bomb(Y), Event(E2), blow_up(E2,Y,Z),
power_tower(Z), location_of(Z,NE1),
time_of(E2,NE2)
implies
isa
location of event
destroy
74
Examples of IE systems
LaSIE-II system
  • Output template generation
  • procedure that writes the templates in the
    desired format

Incident type: bombing
Date: March 19
Location: El Salvador: San Salvador (city)
Perpetrator: urban guerrilla commandos
Physical target: power tower
Human target: -
Effect on physical target: destroyed
Effect on human target: no injury or death
Instrument: bomb
75
Examples of IE systems
PROTEUS system
Lexicon
Chunk grammar
NERC Rules
IE-Rules
Conceptual hierarchy
Format Rules
Inference Rules
Partial parsing
Lexical Analyzer
Coreference resolution
Discourse Analysis
Scenario Patterns
Output generator
NERC
76
Examples of IE systems
PROTEUS system
Preprocessing
According_to-adv unofficial-adj sources-n ,
the-det bomb-n allegedly-adv detonated-v
by-prep urban-adj guerrilla-n commandos-n -
blew_up-v a-det power_tower-n in-prep the-det
northwestern part of San Salvador-loc at-prep
0650-time
NE2
NE1
77
Examples of IE systems
PROTEUS system
  • Syntactico-semantic interpretation
  • basic VP and NP chunks + head_semantics
  • semantics refer to types of slot fillers
    (Conceptual hierarchy)

According_to-adv NP(unofficial-adj sources-n-s1)
, NP(the-det bomb-n-artifact) allegedly-adv
VP(detonated-v-s3) by-prep NP(urban-adj
guerrilla-n commandos-n-person)
VP(blew_up-v-s4) NP(a-det power_tower-n-building)
in-prep NP(NE1-location) at-prep NP(NE2-time)
78
Examples of IE systems
PROTEUS system
  • Syntactico-semantic interpretation
  • basic VP and NP chunks + head_semantics
  • IE-rules for relations (appositions,
    PP-attachments, limited conjunctions)
  • NP(A-person) , B-integer years old , →
    instance(X,person), name_of(X,A), age_of(X,B)
  • NP(A-position) of NP(B-company) →
    instance(X,person), position_of(X,A),
    company_of(X,B)

Real implementation as objects: class person, with slot name = value A and slot age = value B
79
Examples of IE systems
PROTEUS system
  • Syntactico-semantic interpretation
  • basic VP and NP chunks + head_semantics
  • IE-rules for relations (appositions,
    PP-attachments, limited conjunctions)
  • IE-rules for events (PET interface or ExDISCO)
  • NP(A-artifact) v-s4 NP(B-building) →
    instance(E1,s4), instrument_of(E1,A),
    phisical_target_of(E1,B)

According_to-adv NP(unofficial-adj sources-n-s1)
, NP(the-det bomb-n-artifact) allegedly-adv
VP(detonated-v-s3) by-prep NP(urban-adj
guerrilla-n commandos-n-person)
VP(blew_up-v-s4) NP(a-det power_tower-n-building)
in-prep NP(NE1-location) at-prep NP(NE2-time)
80
Examples of IE systems
PROTEUS system
  • Discourse analysis
  • antecedents are found by searching in sequential order
  • constraints
  • instance of a hyperclass
  • same number
  • share arguments

81
Examples of IE systems
PROTEUS system
  • Discourse analysis
  • QLFs + inference rules → more complex QLFs
  • conversion of date expressions.
  • inference of slot values from the QLFs already
    achieved
  • inference of events from others explicitly
    described
  • Fred, the president of Cuban Cigar Corp., was
    appointed vice president of Microsoft
  • implies
  • Fred left the Cuban Cigar Corp.

82
Examples of IE systems
PROTEUS system
  • Output template generation
  • use of rules to build the templates with the
    desired format

83
Examples of IE systems
IE2 system
Custom NameTag
Discourse Module
NetOwl Extractor 3.0
PhraseTag
EventTag
TempGen
Hand-crafted rules
84
Examples of IE systems
IE2 system
  • Preprocessing
  • only NERC
  • SGML-tagged
  • general NE types and subtypes
  • restricted-domain NE types and subtypes

<person id=1>Jeff Bantle</person>, <entity id=2>NASA</entity>'s mission operations
directorate representative for the shuttle flight
85
Examples of IE systems
IE2 system
  • Syntactico-semantic interpretation
  • SGML-tagging of phrases that are values of slots
  • NPs denoting persons (PNP), organizations (ENP),
    artifacts (ANP),
  • local links (location-of, employee-of, owner-of,
    ...)

<person id=1>Jeff Bantle</person>, <PNP affil=2><entity id=2>NASA</entity>'s mission
operations directorate representative for the
shuttle flight</PNP>
86
Examples of IE systems
IE2 system
  • Syntactico-semantic interpretation
  • SGML-tagging of phrases that are values of slots
    in templates
  • NPs
  • local semantic relations (employee-of,
    location-of, product-of, ...)
  • event IE-rules (note the real implementation is
    not specified)
  • Vehicle LaunchN → launch_eventvehicle_info
    Vehicle

<launch_event id=2 vehicle_info=1><ANP> The
<vehicle id=1>Arian 5</vehicle> launch </ANP> was
successfully achieved at 6am
87
Examples of IE systems
IE2 system
  • Discourse analysis
  • Three coreference resolution methods
  • Rule based
  • Machine learning based
  • Hybrid
  • Name alias resolution in addition to that
    performed by NetOwl
  • Definite NPs
  • Singular personal pronouns

<person id=1>Jeff Bantle</person>, <PNP ref=1 affil=2><entity id=2>NASA</entity>'s mission
operations directorate representative for the
shuttle flight</PNP>
88
Examples of IE systems
IE2 system
  • Output template generation
  • Translates SGML output into templates in the
    desired format
  • Solves and normalizes time expressions
  • Performs event merging

89
Examples of IE systems
SIFT system
Output generator
Cross-sentence level
Sentence level
IdentifinderTM
Statistical models
90
Examples of IE systems
SIFT system
  • Preprocessing
  • NERC using an HMM (Bikel et al., 97): Viterbi decoding
    maximizing Pr(W,F,C)
  • each word is tagged with one NE class

start-sentence
person
not-a-name
organization
location
end-sentence
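Viterbi decoding over name-class states can be sketched with a plain HMM. This is a simplification, not the Bikel et al. model, which also conditions on word features: here each state simply emits a word. All probabilities are hypothetical, unnormalized toy values, and `unseen` is an assumed constant score for words a state has never emitted.

```python
import math


def viterbi(words, states, start_p, trans_p, emit_p, unseen=0.01):
    """Return the most likely state sequence for a non-empty word list (log space)."""
    def emit(state, word):
        return math.log(emit_p[state].get(word, unseen))

    best = [{s: math.log(start_p[s]) + emit(s, words[0]) for s in states}]
    back = []
    for word in words[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor for state s at this position.
            prev = max(states, key=lambda p: best[-1][p] + math.log(trans_p[p][s]))
            scores[s] = best[-1][prev] + math.log(trans_p[prev][s]) + emit(s, word)
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


states = ["person", "location", "not-a-name"]
start_p = {s: 1 / 3 for s in states}
# Toy transitions: staying in the same class is twice as likely as switching.
trans_p = {p: {s: 0.5 if p == s else 0.25 for s in states} for p in states}
emit_p = {
    "person": {"Nance": 0.9},
    "location": {"San": 0.45, "Salvador": 0.45},
    "not-a-name": {"visited": 0.9},
}
tags = viterbi("Nance visited San Salvador".split(), states, start_p, trans_p, emit_p)
print(tags)
```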
91
Examples of IE systems
SIFT system
  • Syntactico-semantic interpretation
  • properties of NEs (TE) and relations (TR)
  • generative statistical model (Miller et al., 98,
    00)
  • search for the most likely augmented parse tree
    (bottom-up, chart-based)
  • pruning of low-probability constituents

92
Examples of IE systems
SIFT system
Syntactico-semantic interpretation
per/np
per-desc-r/np
emp-of/pp-lnk
org-ptr/pp
per-r/np per-desc/np
org-r/np
per/nnp , det vbn per-desc/nn to
org/nnp org/nnp ,
Nance , a paid consultant to
ABC News ,
93
Examples of IE systems
SIFT system
  • Syntactico-semantic interpretation
  • relations between NEs across sentences
  • statistical model (Miller et al., 98)
  • classifier of pairs of entities
  • entities in different sentences
  • entities do not take part in local relations
  • their types are compatible with any relation

94
Examples of IE systems
TURBIO system
Partial-tree grammar
Lexicon
NERC Rules
IE-rule set scheduling
IE-Rule set processor
IE-Rule sets
Partial parsing
Lexical Analyzer
controller
NERC
Output generator
95
Examples of IE systems
TURBIO system
  • Preprocessing
  • WordNet synsets, lemmas, POS tags
  • NERC
  • parsed trees of noun, verbal, and adjectival
    phrases

96
Examples of IE systems
TURBIO system
  • Syntactico-semantic interpretation
  • Hypothesis: dependence among relations of NEs
  • Iterative execution of IE-rule sets depending on
    the scheduling
  • Example
  • Scenario: mushroom parts, their possible colors
    and the circumstances by which they are produced
  • There are colors in the documents that are not
    related to any mushroom part, but all colors
    related with a circumstance are colors related to
    mushroom parts.