Title: Graphical models for structure extraction and information integration
1. Graphical models for structure extraction and information integration
- Sunita Sarawagi
- IIT Bombay
- http://www.it.iitb.ac.in/sunita
2. Information Extraction (IE) and Integration
- The Extraction task: given
  - E, a set of structured elements
  - S, an unstructured source
  extract all instances of E from S
- The Integration task: additionally given
  - a database of existing inter-linked entities
  resolve which extracted entities already exist, and insert the appropriate links and entities
- Many versions involving many source types
- Actively researched in varied communities
- Several tools and techniques
- Several commercial applications
3. IE from free-format text
- Classical Named Entity Recognition
- Extract person, location, organization names
"According to Robert Callahan, president of Eastern's flight attendants union, the past practice of Eastern's parent, Houston-based Texas Air Corp., has involved ultimatums to unions to accept the carrier's terms."
- Several applications
- News tracking
- Monitor events
- Bio-informatics
- Protein and Gene names from publications
- Customer care
- Part number, problem description from emails in help centers
4. Text segmentation
4089 Whispering Pines Nobel Drive San Diego CA 92122

P.P. Wangikar, T.P. Graycar, D.A. Estell, D.S. Clark, J.S. Dordick (1993) Protein and Solvent Engineering of Subtilisin BPN' in Nearly Anhydrous Organic Media J. Amer. Chem. Soc. 115, 12231-12237.
5. Information Extraction on the web
6. Personal Information Systems
- Automatically add a BibTeX entry for a paper I download
- Integrate a resume received in email with the candidates database
[Diagram: sources (files, emails, the web) feeding extracted, inter-linked entities: papers, people, projects, resumes]
7. History of approaches
- Manually developed sets of scripts
  - Tedious; lots and lots of special cases
  - Needs continuous refinement as new cases arise
  - Ad hoc ways of combining a varied set of clues
  - Example: wrappers; OK for regular tasks
- Learning-based approaches (lots!)
  - Rule-based (Whisk, Rapier, etc.), '80s
    - Brittle
  - Statistical
    - Generative: HMMs, '90s; intuitive but not too flexible
    - Conditional (flexible feature set), '00s
8. Basic chain model for extraction
  x (1-9):  My | review | of | Fermat's | last | theorem | by | S. | Singh
  y (1-9):  Other | Other | Other | Title | Title | Title | Other | Author | Author
Independent model: each label yi is predicted separately.
9. Features
- The word as-is
- Orthographic word properties
- Capitalized? Digit? Ends-with-dot?
- Part of speech
- Noun?
- Match in a dictionary
- Appears in a dictionary of people names?
- Appears in a list of stop-words?
- Fire these for each label and for
  - the token,
  - tokens up to W positions to the left or right, or
  - a concatenation of tokens.
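To make the feature list concrete, here is a minimal Python sketch of word-level features of the kinds listed above. The dictionaries, feature names, and the token_features helper are illustrative assumptions, not part of any particular toolkit.

```python
# A minimal sketch (not from the slides) of word-level features for a chain model.

PERSON_NAMES = {"singh", "fagin", "vardi"}     # toy dictionary
STOP_WORDS = {"of", "by", "the", "and"}

def token_features(tokens, i):
    """Features for the token at position i, plus a small window of neighbours."""
    w = tokens[i]
    feats = {
        f"word={w.lower()}": 1.0,              # the word as-is
        "is_capitalized": float(w[:1].isupper()),
        "is_digit": float(w.isdigit()),
        "ends_with_dot": float(w.endswith(".")),
        "in_person_dict": float(w.lower() in PERSON_NAMES),
        "is_stop_word": float(w.lower() in STOP_WORDS),
    }
    # Fire the same kinds of features for neighbouring tokens (window of 1).
    for offset in (-1, 1):
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"word[{offset:+d}]={tokens[j].lower()}"] = 1.0
    return feats

print(token_features("My review of Fermat's last theorem by S. Singh".split(), 8))
```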
10. Basic chain model for extraction
  x (1-9):  My | review | of | Fermat's | last | theorem | by | S. | Singh
  y (1-9):  Other | Other | Other | Title | Title | Title | Other | Author | Author
Global conditional model over Pr(y1, y2, ..., y9 | x)
11. Outline
Graphical models
Extraction
- Chain models: basic extraction (word-level)
- Associative Markov Networks: collective labeling
- Dynamic CRFs: two labelings (POS, extraction)
- 2-D CRFs: layout-driven extraction (web)
Integration
- Segmentation models: match with entity databases
- Constrained models: integrating to multiple tables
12. Undirected graphical models
- Joint probability distribution of multiple variables expressed compactly as a graph
[Example graph over y1-y5: y3 is directly dependent on y4; y3 is independent of y1 and y5 given y2 and y4]
- Discrete variables over a finite set of labels, e.g., Author, Title, Other
13. Graphical models: potentials
The joint probability distribution factorizes over the cliques of the graph:
  \Pr(\mathbf{y}) = \frac{1}{Z} \prod_{c \in \mathcal{C}(G)} \psi_c(\mathbf{y}_c)
where \psi_c is the potential function on clique c and Z is the normalizing constant.
14. Graphical models: potentials
Form of potentials: log-linear in numeric features f_k with model parameters \lambda_k,
  \psi_c(\mathbf{y}_c, \mathbf{x}) = \exp\Big(\sum_k \lambda_k f_k(\mathbf{y}_c, \mathbf{x})\Big)
Conditional Random Fields (CRFs) model the probability of a set of labels given the observed variables x:
  \Pr(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp\Big(\sum_c \sum_k \lambda_k f_k(\mathbf{y}_c, \mathbf{x})\Big)
(Lafferty et al., ICML 2001)
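As a concrete illustration of the log-linear form above, the following minimal Python sketch scores a labeling of a chain and normalizes by brute force. The function names, toy features, and weights are assumptions for illustration, not the training or inference code used in the cited work.

```python
import math
from itertools import product

def clique_score(weights, features):
    """sum_k lambda_k * f_k for one clique."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def sequence_score(weights, feature_fn, x, y):
    """Unnormalised log-score: sum over the chain's cliques (label pairs)."""
    prev, total = None, 0.0
    for i, label in enumerate(y):
        total += clique_score(weights, feature_fn(x, i, prev, label))
        prev = label
    return total

def probability(weights, feature_fn, x, y, labels):
    """Pr(y | x) = exp(score(x, y)) / Z(x), with Z(x) computed by brute force."""
    z = sum(math.exp(sequence_score(weights, feature_fn, x, y2))
            for y2 in product(labels, repeat=len(x)))
    return math.exp(sequence_score(weights, feature_fn, x, y)) / z

# Toy feature function and weights.
def feature_fn(x, i, prev, label):
    return {f"word={x[i].lower()}|label={label}": 1.0,
            f"prev={prev}|label={label}": 1.0}

x = ["Fermat's", "last", "theorem"]
weights = {"word=fermat's|label=Title": 2.0, "prev=Title|label=Title": 1.0}
print(probability(weights, feature_fn, x, ["Title"] * 3, labels=["Title", "Other"]))
```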
15. Inference on graphical models
- Probability of an assignment of variables
- Marginal probability of a subset of variables
16. Message passing
- Efficient two-pass dynamic programming algorithm for graphs without cycles
  - Viterbi is a special case for chains
- Cyclic graphs
  - Approximate answer after convergence, or
  - Transform cliques to nodes in a junction tree
- Alternatives to message passing
  - Exploit the structure of the potentials to design special algorithms (two examples in this talk)
  - Upper bound using one or more trees
  - MCMC sampling
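For the chain special case, the two-pass dynamic program is the familiar Viterbi recursion. Below is a minimal sketch, assuming node_score and edge_score are log-potentials supplied by the model; all names and the toy scores are illustrative.

```python
def viterbi(n, labels, node_score, edge_score):
    """Return the highest-scoring label sequence of length n."""
    best = {y: node_score(0, y) for y in labels}   # best score for position 0
    back = []                                      # back-pointers for positions 1..n-1
    for i in range(1, n):
        new_best, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[p] + edge_score(i, p, y))
            new_best[y] = best[prev] + edge_score(i, prev, y) + node_score(i, y)
            pointers[y] = prev
        best = new_best
        back.append(pointers)
    # Backward pass: follow back-pointers from the best final label.
    y = max(labels, key=lambda lab: best[lab])
    path = [y]
    for pointers in reversed(back):
        y = pointers[y]
        path.append(y)
    return path[::-1]

# Toy usage with made-up log-potentials.
labels = ["Title", "Author", "Other"]
tokens = "Fermat's last theorem by S. Singh".split()
node = lambda i, y: (1.0 if y == "Author" and tokens[i][0].isupper() and i > 3
                     else 0.6 if y == "Other" and tokens[i].islower() else 0.0)
edge = lambda i, p, y: 0.5 if p == y else 0.0
print(viterbi(len(tokens), labels, node, edge))
```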
17. Outline
Graphical models
Extraction
- Chain models: basic extraction (word-level)
- Associative Markov Networks: collective labeling
- Dynamic CRFs: two labelings (POS, extraction)
- 2-D CRFs: layout-driven extraction (web)
Integration
- Segmentation models: match with entity databases
- Constrained models: integrating to multiple tables
18. Long-range dependencies
- Extraction with repeated names (Bunescu et al. 2004)
19. Dependency graph
- Assume only word-level matches.
[Graph: word-level labels y1-y8 over text containing "nitric oxide synthase (eNOS)", "synthase", "interaction", and a repeated "eNOS", with extra edges linking repeated words]
- Approximate message passing
- Sample results (Bunescu et al., ACL 2004)
  - Protein names from Medline abstracts: F1 65 → 68
  - Person names, organization names, etc. from news articles: F1 80 → 82
20. Associative Markov Networks
[Graph over labels y1-y8 with only associative edges]
- Consider a simpler graph
  - Binary labels
  - Only associative edges: higher potential when both ends take the same label
- Exact inference in polynomial time via mincut (Greig 1989)
- Multi-class, metric labeling: approximate algorithm with guarantees (Kleinberg 1999)
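Below is a minimal sketch of the min-cut construction for the binary associative case, in the spirit of the Greig et al. result cited above. It assumes the networkx library and toy unary/pairwise costs; the encoding (source side = label 0) is one standard choice, not code from the talk.

```python
import networkx as nx

# Energy to minimise: sum_i cost[i][y_i] + sum_(i,j) w_ij * [y_i != y_j], with w_ij >= 0.

def mincut_labels(unary, pairwise):
    """unary: {node: (cost_if_0, cost_if_1)}; pairwise: {(i, j): w_ij}."""
    G = nx.DiGraph()
    for i, (c0, c1) in unary.items():
        G.add_edge("s", i, capacity=c1)   # cut (paid) if i ends up with label 1
        G.add_edge(i, "t", capacity=c0)   # cut (paid) if i ends up with label 0
    for (i, j), w in pairwise.items():
        G.add_edge(i, j, capacity=w)      # cut (paid) if i and j get different labels
        G.add_edge(j, i, capacity=w)
    _, (source_side, _) = nx.minimum_cut(G, "s", "t")
    return {i: 0 if i in source_side else 1 for i in unary}

# Toy example: node 2 weakly prefers label 1 but is pulled to 0 by its neighbours.
unary = {1: (0.0, 2.0), 2: (1.0, 0.8), 3: (0.0, 2.0)}
pairwise = {(1, 2): 1.5, (2, 3): 1.5}
print(mincut_labels(unary, pairwise))     # expect {1: 0, 2: 0, 3: 0}
```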
21. Factorial CRFs: multiple linked chains
- Several synchronized, inter-dependent tasks
  - POS, noun phrase, entity extraction
- Cascading propagates errors
- Joint models
[Figure: two linked chains over "i saw mr. ray canning at the market": a POS chain w1-w8 and an IE chain y1-y8, with edges between corresponding positions]
22. Inference with multiple chains
- Graph has cycles; exact inference is most likely intractable
- Two alternatives
  - Approximate message passing
  - Upper bound on the marginal (piecewise training): treat each edge potential as an independent training instance
- Results (noun-phrase F1, jointly with POS)
  - Piecewise training: 88 (and faster)
  - Belief propagation: 86
[Figure: combined vs. staged models]
(Sutton et al., ICML 2004; McCallum et al., EMNLP/HLT 2005)
23. Outline
Graphical models
Extraction
- Chain models: basic extraction (word-level)
- Associative Markov Networks: collective labeling
- Dynamic CRFs: two labelings (POS, extraction)
- 2-D CRFs: layout-driven extraction (web)
Integration
- Segmentation models: match with entity databases
- Constrained models: integrating to multiple tables
24. Conventional extraction research
[Pipeline: labeled unstructured text → training → model → data integration]
25. Goals of integration
- Exploit the database to improve extraction
  - The entity might already exist in the database
- Integrate extracted entities, resolving whether each entity is already in the database
  - If existing, create links
  - If not, create a new entry
26. Example text: "R. Fagin and J. Helpern, Belief, awareness, reasoning. In AI 1988 10 also see ..."

Three top-level entity tables (Articles, Journals, Authors) plus a Writes relation; variant rows link to canonical entries. The database is normalized and stores noisy variants.

Articles (Id | Title | Year | Journal | Canonical):
  2 | Update Semantics | 1983 | 10 |

Journals (Id | Name | Canonical):
  10 | ACM TODS |
  17 | AI | 17
  16 | ACM Trans. Databases |

Writes (Article | Author):
  2 | 11
  2 | 2
  2 | 3

Authors (Id | Name | Canonical):
  11 | M Y Vardi |
  2 | J. Ullman | 4
  3 | Ron Fagin | 3
  4 | Jeffrey Ullman | 4
27. Segmentation models (Semi-CRFs)
  t (1-8):  R. | Fagin | and | J. | Helpern | Belief | Awareness | Reasoning
  y (1-8):  Author | Author | Other | Author | Author | Title | Title | Title
Features describe the single word "Fagin".
28. Segmentation models (Semi-CRFs)
Word-level view (features describe the single word "Fagin"):
  t (1-8):  R. | Fagin | and | J. | Helpern | Belief | Awareness | Reasoning
  y (1-8):  Author | Author | Other | Author | Author | Title | Title | Title
Segment-level view (features describe the whole segment from l to u):
  (l,u):  (1,2) | (3,3) | (4,5) | (6,8)
  x:      R. Fagin | and | J. Helpern | Belief Awareness Reasoning
  y:      Author | Other | Author | Title
29. Graphical models for segmentation
[Segment-level graph over y1-y8: edges connect all positions that can fall in the same segment]
- Graph has many cycles
  - Clique size = maximum segment length
- Two kinds of potentials
  - Transition potentials: only across adjacent nodes
  - Segment potentials: require all positions in a segment to have the same label
- ⇒ Exact inference possible in time linear in the maximum segment length (Cohen & Sarawagi 2004)
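A minimal sketch of the segment-level Viterbi recursion that underlies this result is shown below; segment_score is an assumed log-potential over a whole candidate segment given the previous label, and the data structures are illustrative rather than the authors' implementation.

```python
def semicrf_viterbi(n, labels, max_len, segment_score):
    """Best segmentation of x[0:n] into labelled segments of length <= max_len."""
    # best[j][y]: best score of segmenting x[0:j] with the last segment labelled y
    best = [{y: float("-inf") for y in labels} for _ in range(n + 1)]
    back = [{} for _ in range(n + 1)]
    best[0] = {y: 0.0 for y in labels}            # empty prefix
    for j in range(1, n + 1):
        for y in labels:
            for d in range(1, min(max_len, j) + 1):
                i = j - d                          # candidate segment covers x[i:j]
                for prev in labels:
                    s = best[i][prev] + segment_score(i, j, prev if i > 0 else None, y)
                    if s > best[j][y]:
                        best[j][y] = s
                        back[j][y] = (i, prev)
    # Recover the best segmentation by following back-pointers.
    y = max(labels, key=lambda lab: best[n][lab])
    j, segments = n, []
    while j > 0:
        i, prev = back[j][y]
        segments.append((i, j, y))                 # (start, end-exclusive, label)
        j, y = i, prev
    return segments[::-1]
```

The triple loop runs in O(n · max_len · |labels|²) time, i.e., linear in the maximum segment length, as stated above.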
30. Effect of database on extraction performance

Dataset      | Field      | L    | L+DB | % gain
PersonalBib  | author     | 75.7 | 79.5 | 4.9
PersonalBib  | journal    | 33.9 | 50.3 | 48.6
PersonalBib  | title      | 61.0 | 70.3 | 15.1
Address      | city_name  | 72.4 | 76.7 | 6.0
Address      | state_name | 13.9 | 33.2 | 138.5
Address      | zipcode    | 91.6 | 94.3 | 3.0

L: only labeled unstructured data; L+DB: adds similarity to database entities and other DB features.
(from Mansuri et al., ICDE 2006)
31. The same text after extraction and integration: "R. Fagin and J. Helpern, Belief, awareness, reasoning. In AI 1988 10 also see ..."

Extraction: Author: R. Fagin | Author: J. Helpern | Title: Belief,..reasoning | Journal: AI | Year: 1988

Articles (Id | Title | Year | Journal | Canonical):
  2 | Update Semantics | 1983 | 10 |
  7 | Belief, awareness, reasoning | 1988 | 17 |

Journals (Id | Name | Canonical):
  10 | ACM TODS |
  17 | AI | 17
  16 | ACM Trans. Databases | 10

Writes (Article | Author):
  2 | 11
  2 | 2
  2 | 3
  7 | 8
  7 | 9

Authors (Id | Name | Canonical):
  11 | M Y Vardi |
  2 | J. Ullman | 4
  3 | Ron Fagin | 3
  4 | Jeffrey Ullman | 4
  8 | R Fagin | 3
  9 | J Helpern | 8

Integration: match with existing linked entities while respecting all constraints.
32. Example: "CACM 2000, R. Fagin and J. Helpern, Belief, awareness, reasoning in AI"

Only extraction:
  Author: R. Fagin | Author: J. Helpern | Title: Belief,..reasoning | Journal: AI | Year: 2000
Combined extraction + integration:
  Author: R. Fagin | Author: J. Helpern | Title: Belief,..reasoning in AI | Journal: CACM | Year: 2000

Articles (Id | Title | Year | Journal | Canonical):
  2 | Update Semantics | 1983 | 10 |
  7 | Belief, awareness, reasoning | 1988 | 17 |

Year mismatch! (matching the extraction-only interpretation to article 7 conflicts on the year)
33. Combined extraction + matching
- Convert the predicted label into a pair y = (a, r)
  - r is the id of the matching entity; r = 0 means none-of-the-above (a new entry)
[Figure: each segment of "CACM. 2000 Fagin ... Belief Awareness Reasoning In AI" is given a label (Journal, Year, Author, Title) and paired with the id r of its matching database entity]
- Constraints exist on the ids that can be assigned to two segments
34. Constrained models
- Training
  - Ignore constraints, or use max-margin methods that require only MAP estimates
- Application
  - Formulate as a constrained integer programming problem (expensive)
  - Use a general A*-search to find the most likely constrained assignment (a toy sketch follows below)
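The sketch below illustrates the constrained-assignment objective in the simplest possible way, by brute-force enumeration over small candidate sets; a real system would use the integer-programming or A*-search formulations above. The candidate options, scores, and the duplicate-id constraint are toy assumptions.

```python
from itertools import product

def constrained_map(candidates, score, consistent):
    """candidates: per-segment list of (label, id) options;
    score(assignment): joint score; consistent(assignment): constraint check."""
    best, best_score = None, float("-inf")
    for assignment in product(*candidates):
        if not consistent(assignment):
            continue
        s = score(assignment)
        if s > best_score:
            best, best_score = assignment, s
    return best

# Toy example: two author segments, each with scored (label, id) options; the
# constraint forbids linking two different segments to the same (non-new) entity id.
candidates = [[("Author", 3), ("Author", 0)], [("Author", 3), ("Author", 8)]]
local = {("Author", 3): 1.0, ("Author", 0): 0.2, ("Author", 8): 0.7}
score = lambda a: sum(local[c] for c in a)
consistent = lambda a: len({r for (_, r) in a if r != 0}) == len([r for (_, r) in a if r != 0])
print(constrained_map(candidates, score, consistent))   # (("Author", 3), ("Author", 8))
```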
35. Full integration performance

Dataset      | Field      | L    | L+DB | % gain
PersonalBib  | author     | 70.8 | 74.0 | 4.5
PersonalBib  | journal    | 29.6 | 45.5 | 53.6
PersonalBib  | title      | 51.6 | 65.0 | 25.9
Address      | city_name  | 70.1 | 74.6 | 6.4
Address      | state_name | 9.0  | 28.3 | 213.8
Address      | pincode    | 87.8 | 90.7 | 3.3

- L: conventional extraction + matching
- L+DB: the technology presented here
- Much higher accuracies are possible with more training data
(from Mansuri et al., ICDE 2006)
36. What next in data integration?
- Lots to be done in building large-scale, viable data integration systems
- Online collective inference
  - Cannot freeze the database
  - Cannot batch too many inferences
  - Need theoretically sound, practical alternatives to exact, batch inference
- Performance of integration (Chandel et al., ICDE 2006)
- Other operations
  - Data standardization
  - Schema management
37. Probabilistic querying systems
- Integration systems, while improving, cannot be perfect, particularly for domains like the web
- User supervision of each integration result is impossible
- ⇒ Create uncertainty-aware storage and querying engines
- Two enablers
  - Probabilistic database querying engines over generic uncertainty models
  - Conditional graphical models produce well-calibrated probabilities
38. Probabilities in CRFs are well-calibrated
[Calibration plots for Cora citations and Cora headers, each against the ideal diagonal]
Probability of a segmentation ≈ probability that it is correct; e.g., segmentations given probability 0.5 are correct 50% of the time.
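A minimal sketch of the calibration check behind such plots: bucket outputs by predicted probability and compare with the observed fraction correct. The example predictions below are made up for illustration.

```python
from collections import defaultdict

def reliability(predictions, n_bins=10):
    """predictions: iterable of (predicted probability, was the output correct?)."""
    bins = defaultdict(list)
    for prob, correct in predictions:
        bins[min(int(prob * n_bins), n_bins - 1)].append(correct)
    for b in sorted(bins):
        outcomes = bins[b]
        centre = (b + 0.5) / n_bins
        print(f"predicted ~{centre:.2f}  observed {sum(outcomes) / len(outcomes):.2f}"
              f"  ({len(outcomes)} segmentations)")

# Well-calibrated output would show the two columns tracking each other,
# e.g. predictions near 0.5 being correct about half the time.
reliability([(0.9, True), (0.85, True), (0.8, False),
             (0.55, True), (0.45, False), (0.2, False)])
```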
39. Uncertainty in integration systems
[Architecture: unstructured text → model → k alternative entity sets with probabilities p1, p2, ..., pk → probabilistic database system. Very uncertain outputs can be routed back as additional training data. Open question: other, more compact uncertainty models?]
Example queries: select the conference name of article RJ03; find the most cited author.
40. In summary
- Data integration provides scope for several interesting learning problems
- Probabilistic graphical models provide a robust, unified mechanism for exploiting a wide variety of clues and dependencies
- Lots of open research challenges remain in making graphical models work in a practical setting