Title: Graphical models for structure extraction and information integration
1. Graphical models for structure extraction and information integration
- Sunita Sarawagi
- IIT Bombay
- http://www.it.iitb.ac.in/sunita
2. Information Extraction (IE) and Integration
- The Extraction task: given
  - E, a set of structured elements
  - S, an unstructured source
  extract all instances of E from S
- The Integration task: additionally given
  - a database of existing inter-linked entities
  resolve which extracted entities already exist, and insert the appropriate links and entities
- Many versions involving many source types
- Actively researched in varied communities
- Several tools and techniques
- Several commercial applications
3. IE from free-format text
- Classical Named Entity Recognition
- Extract person, location, organization names
"According to Robert Callahan, president of Eastern's flight attendants union, the past practice of Eastern's parent, Houston-based Texas Air Corp., has involved ultimatums to unions to accept the carrier's terms."
- Several applications
- News tracking
- Monitor events
- Bio-informatics
- Protein and Gene names from publications
- Customer care
- Part number, problem description from emails in help centers
4. Text segmentation
4089 Whispering Pines Nobel Drive San Diego CA 92122

P.P. Wangikar, T.P. Graycar, D.A. Estell, D.S. Clark, J.S. Dordick (1993) Protein and Solvent Engineering of Subtilisin BPN' in Nearly Anhydrous Organic Media J. Amer. Chem. Soc. 115, 12231-12237.
5. Information Extraction on the web
6. Personal Information Systems
- Automatically add a BibTeX entry for a paper I download
- Integrate a resume received in email with the candidates database
[Diagram: sources (files, emails, the web) feeding extracted, inter-linked entities: papers, people, projects, resumes]
7. History of approaches
- Manually developed sets of scripts
  - Tedious; lots and lots of special cases
  - Needs continuous refinement as new cases arise
  - Ad hoc ways of combining a varied set of clues
  - Example: wrappers; OK for regular tasks
- Learning-based approaches (lots!)
  - Rule-based (Whisk, Rapier, etc.), '80s
    - Brittle
  - Statistical
    - Generative: HMMs, '90s; intuitive but not too flexible
    - Conditional (flexible feature set), '00s
8. Basic chain model for extraction
  x (1-9):  My | review | of | Fermat's | last | theorem | by | S. | Singh
  y (1-9):  Other | Other | Other | Title | Title | Title | Other | Author | Author
Independent model: each label yi is predicted separately.
9. Features
- The word as-is
- Orthographic word properties
- Capitalized? Digit? Ends-with-dot?
- Part of speech
- Noun?
- Match in a dictionary
- Appears in a dictionary of people names?
- Appears in a list of stop-words?
- Fire these for each label and for
  - the token,
  - tokens up to W positions to the left or right, or
  - a concatenation of tokens.
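To make the feature list concrete, here is a minimal Python sketch of word-level features of the kinds listed above. The dictionaries, feature names, and the token_features helper are illustrative assumptions, not part of any particular toolkit.

```python
# A minimal sketch (not from the slides) of word-level features for a chain model.

PERSON_NAMES = {"singh", "fagin", "vardi"}     # toy dictionary
STOP_WORDS = {"of", "by", "the", "and"}

def token_features(tokens, i):
    """Features for the token at position i, plus a small window of neighbours."""
    w = tokens[i]
    feats = {
        f"word={w.lower()}": 1.0,              # the word as-is
        "is_capitalized": float(w[:1].isupper()),
        "is_digit": float(w.isdigit()),
        "ends_with_dot": float(w.endswith(".")),
        "in_person_dict": float(w.lower() in PERSON_NAMES),
        "is_stop_word": float(w.lower() in STOP_WORDS),
    }
    # Fire the same kinds of features for neighbouring tokens (window of 1).
    for offset in (-1, 1):
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"word[{offset:+d}]={tokens[j].lower()}"] = 1.0
    return feats

print(token_features("My review of Fermat's last theorem by S. Singh".split(), 8))
```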
10. Basic chain model for extraction
  x (1-9):  My | review | of | Fermat's | last | theorem | by | S. | Singh
  y (1-9):  Other | Other | Other | Title | Title | Title | Other | Author | Author
Global conditional model over Pr(y1, y2, ..., y9 | x)
11. Outline
Graphical models
Extraction
- Chain models: basic extraction (word-level)
- Associative Markov Networks: collective labeling
- Dynamic CRFs: two labelings (POS, extraction)
- 2-D CRFs: layout-driven extraction (web)
Integration
- Segmentation models: match with entity databases
- Constrained models: integrating to multiple tables
12. Undirected graphical models
- Joint probability distribution of multiple variables expressed compactly as a graph
[Example graph over y1-y5: y3 is directly dependent on y4; y3 is independent of y1 and y5 given y2 and y4]
- Discrete variables over a finite set of labels, e.g., Author, Title, Other
13. Graphical models: potentials
The joint probability distribution factorizes over the cliques of the graph:
  \Pr(\mathbf{y}) = \frac{1}{Z} \prod_{c \in \mathcal{C}(G)} \psi_c(\mathbf{y}_c)
where \psi_c is the potential function on clique c and Z is the normalizing constant.
14. Graphical models: potentials
Form of potentials: log-linear in numeric features f_k with model parameters \lambda_k,
  \psi_c(\mathbf{y}_c, \mathbf{x}) = \exp\Big(\sum_k \lambda_k f_k(\mathbf{y}_c, \mathbf{x})\Big)
Conditional Random Fields (CRFs) model the probability of a set of labels given the observed variables x:
  \Pr(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp\Big(\sum_c \sum_k \lambda_k f_k(\mathbf{y}_c, \mathbf{x})\Big)
(Lafferty et al., ICML 2001)
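As a concrete illustration of the log-linear form above, the following minimal Python sketch scores a labeling of a chain and normalizes by brute force. The function names, toy features, and weights are assumptions for illustration, not the training or inference code used in the cited work.

```python
import math
from itertools import product

def clique_score(weights, features):
    """sum_k lambda_k * f_k for one clique."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def sequence_score(weights, feature_fn, x, y):
    """Unnormalised log-score: sum over the chain's cliques (label pairs)."""
    prev, total = None, 0.0
    for i, label in enumerate(y):
        total += clique_score(weights, feature_fn(x, i, prev, label))
        prev = label
    return total

def probability(weights, feature_fn, x, y, labels):
    """Pr(y | x) = exp(score(x, y)) / Z(x), with Z(x) computed by brute force."""
    z = sum(math.exp(sequence_score(weights, feature_fn, x, y2))
            for y2 in product(labels, repeat=len(x)))
    return math.exp(sequence_score(weights, feature_fn, x, y)) / z

# Toy feature function and weights.
def feature_fn(x, i, prev, label):
    return {f"word={x[i].lower()}|label={label}": 1.0,
            f"prev={prev}|label={label}": 1.0}

x = ["Fermat's", "last", "theorem"]
weights = {"word=fermat's|label=Title": 2.0, "prev=Title|label=Title": 1.0}
print(probability(weights, feature_fn, x, ["Title"] * 3, labels=["Title", "Other"]))
```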
15. Inference on graphical models
- Probability of an assignment of variables
- Marginal probability of a subset of variables
16. Message passing
- Efficient two-pass dynamic programming algorithm for graphs without cycles
  - Viterbi is a special case for chains
- Cyclic graphs
  - Approximate answer after convergence, or
  - Transform cliques to nodes in a junction tree
- Alternatives to message passing
  - Exploit the structure of the potentials to design special algorithms (two examples in this talk)
  - Upper bound using one or more trees
  - MCMC sampling
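For the chain special case, the two-pass dynamic program is the familiar Viterbi recursion. Below is a minimal sketch, assuming node_score and edge_score are log-potentials supplied by the model; all names and the toy scores are illustrative.

```python
def viterbi(n, labels, node_score, edge_score):
    """Return the highest-scoring label sequence of length n."""
    best = {y: node_score(0, y) for y in labels}   # best score for position 0
    back = []                                      # back-pointers for positions 1..n-1
    for i in range(1, n):
        new_best, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[p] + edge_score(i, p, y))
            new_best[y] = best[prev] + edge_score(i, prev, y) + node_score(i, y)
            pointers[y] = prev
        best = new_best
        back.append(pointers)
    # Backward pass: follow back-pointers from the best final label.
    y = max(labels, key=lambda lab: best[lab])
    path = [y]
    for pointers in reversed(back):
        y = pointers[y]
        path.append(y)
    return path[::-1]

# Toy usage with made-up log-potentials.
labels = ["Title", "Author", "Other"]
tokens = "Fermat's last theorem by S. Singh".split()
node = lambda i, y: (1.0 if y == "Author" and tokens[i][0].isupper() and i > 3
                     else 0.6 if y == "Other" and tokens[i].islower() else 0.0)
edge = lambda i, p, y: 0.5 if p == y else 0.0
print(viterbi(len(tokens), labels, node, edge))
```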
17. Outline
Graphical models
Extraction
- Chain models: basic extraction (word-level)
- Associative Markov Networks: collective labeling
- Dynamic CRFs: two labelings (POS, extraction)
- 2-D CRFs: layout-driven extraction (web)
Integration
- Segmentation models: match with entity databases
- Constrained models: integrating to multiple tables
18. Long-range dependencies
- Extraction with repeated names (Bunescu et al. 2004)
19. Dependency graph
- Assume only word-level matches.
[Graph: word-level labels y1-y8 over text containing "nitric oxide synthase (eNOS)", "synthase", "interaction", and a repeated "eNOS", with extra edges linking repeated words]
- Approximate message passing
- Sample results (Bunescu et al., ACL 2004)
  - Protein names from Medline abstracts: F1 65 → 68
  - Person names, organization names, etc. from news articles: F1 80 → 82
20. Associative Markov Networks
[Graph over labels y1-y8 with only associative edges]
- Consider a simpler graph
  - Binary labels
  - Only associative edges: higher potential when both ends take the same label
- Exact inference in polynomial time via mincut (Greig 1989)
- Multi-class, metric labeling: approximate algorithm with guarantees (Kleinberg 1999)
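Below is a minimal sketch of the min-cut construction for the binary associative case, in the spirit of the Greig et al. result cited above. It assumes the networkx library and toy unary/pairwise costs; the encoding (source side = label 0) is one standard choice, not code from the talk.

```python
import networkx as nx

# Energy to minimise: sum_i cost[i][y_i] + sum_(i,j) w_ij * [y_i != y_j], with w_ij >= 0.

def mincut_labels(unary, pairwise):
    """unary: {node: (cost_if_0, cost_if_1)}; pairwise: {(i, j): w_ij}."""
    G = nx.DiGraph()
    for i, (c0, c1) in unary.items():
        G.add_edge("s", i, capacity=c1)   # cut (paid) if i ends up with label 1
        G.add_edge(i, "t", capacity=c0)   # cut (paid) if i ends up with label 0
    for (i, j), w in pairwise.items():
        G.add_edge(i, j, capacity=w)      # cut (paid) if i and j get different labels
        G.add_edge(j, i, capacity=w)
    _, (source_side, _) = nx.minimum_cut(G, "s", "t")
    return {i: 0 if i in source_side else 1 for i in unary}

# Toy example: node 2 weakly prefers label 1 but is pulled to 0 by its neighbours.
unary = {1: (0.0, 2.0), 2: (1.0, 0.8), 3: (0.0, 2.0)}
pairwise = {(1, 2): 1.5, (2, 3): 1.5}
print(mincut_labels(unary, pairwise))     # expect {1: 0, 2: 0, 3: 0}
```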
21. Factorial CRFs: multiple linked chains
- Several synchronized, inter-dependent tasks
  - POS, noun phrase, entity extraction
- Cascading propagates errors
- Joint models
[Figure: two linked chains over "i saw mr. ray canning at the market": a POS chain w1-w8 and an IE chain y1-y8, with edges between corresponding positions]
22. Inference with multiple chains
- Graph has cycles; exact inference is most likely intractable
- Two alternatives
  - Approximate message passing
  - Upper bound on the marginal (piecewise training): treat each edge potential as an independent training instance
- Results (noun-phrase F1, jointly with POS)
  - Piecewise training: 88 (and faster)
  - Belief propagation: 86
[Figure: combined vs. staged models]
(Sutton et al., ICML 2004; McCallum et al., EMNLP/HLT 2005)
23. Outline
Graphical models
Extraction
- Chain models: basic extraction (word-level)
- Associative Markov Networks: collective labeling
- Dynamic CRFs: two labelings (POS, extraction)
- 2-D CRFs: layout-driven extraction (web)
Integration
- Segmentation models: match with entity databases
- Constrained models: integrating to multiple tables
24. Conventional extraction research
[Pipeline: labeled unstructured text → training → model → data integration]
25. Goals of integration
- Exploit the database to improve extraction
  - The entity might already exist in the database
- Integrate extracted entities, resolving whether each entity is already in the database
  - If existing, create links
  - If not, create a new entry
26. Example text: "R. Fagin and J. Helpern, Belief, awareness, reasoning. In AI 1988 10 also see ..."

Three top-level entity tables (Articles, Journals, Authors) plus a Writes relation; variant rows link to canonical entries. The database is normalized and stores noisy variants.

Articles (Id | Title | Year | Journal | Canonical):
  2 | Update Semantics | 1983 | 10 |

Journals (Id | Name | Canonical):
  10 | ACM TODS |
  17 | AI | 17
  16 | ACM Trans. Databases |

Writes (Article | Author):
  2 | 11
  2 | 2
  2 | 3

Authors (Id | Name | Canonical):
  11 | M Y Vardi |
  2 | J. Ullman | 4
  3 | Ron Fagin | 3
  4 | Jeffrey Ullman | 4
27. Segmentation models (Semi-CRFs)
  t (1-8):  R. | Fagin | and | J. | Helpern | Belief | Awareness | Reasoning
  y (1-8):  Author | Author | Other | Author | Author | Title | Title | Title
Features describe the single word "Fagin".
28. Segmentation models (Semi-CRFs)
Word-level view (features describe the single word "Fagin"):
  t (1-8):  R. | Fagin | and | J. | Helpern | Belief | Awareness | Reasoning
  y (1-8):  Author | Author | Other | Author | Author | Title | Title | Title
Segment-level view (features describe the whole segment from l to u):
  (l,u):  (1,2) | (3,3) | (4,5) | (6,8)
  x:      R. Fagin | and | J. Helpern | Belief Awareness Reasoning
  y:      Author | Other | Author | Title
29. Graphical models for segmentation
[Segment-level graph over y1-y8: edges connect all positions that can fall in the same segment]
- Graph has many cycles
  - Clique size = maximum segment length
- Two kinds of potentials
  - Transition potentials: only across adjacent nodes
  - Segment potentials: require all positions in a segment to have the same label
- ⇒ Exact inference possible in time linear in the maximum segment length (Cohen & Sarawagi 2004)
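A minimal sketch of the segment-level Viterbi recursion that underlies this result is shown below; segment_score is an assumed log-potential over a whole candidate segment given the previous label, and the data structures are illustrative rather than the authors' implementation.

```python
def semicrf_viterbi(n, labels, max_len, segment_score):
    """Best segmentation of x[0:n] into labelled segments of length <= max_len."""
    # best[j][y]: best score of segmenting x[0:j] with the last segment labelled y
    best = [{y: float("-inf") for y in labels} for _ in range(n + 1)]
    back = [{} for _ in range(n + 1)]
    best[0] = {y: 0.0 for y in labels}            # empty prefix
    for j in range(1, n + 1):
        for y in labels:
            for d in range(1, min(max_len, j) + 1):
                i = j - d                          # candidate segment covers x[i:j]
                for prev in labels:
                    s = best[i][prev] + segment_score(i, j, prev if i > 0 else None, y)
                    if s > best[j][y]:
                        best[j][y] = s
                        back[j][y] = (i, prev)
    # Recover the best segmentation by following back-pointers.
    y = max(labels, key=lambda lab: best[n][lab])
    j, segments = n, []
    while j > 0:
        i, prev = back[j][y]
        segments.append((i, j, y))                 # (start, end-exclusive, label)
        j, y = i, prev
    return segments[::-1]
```

The triple loop runs in O(n · max_len · |labels|²) time, i.e., linear in the maximum segment length, as stated above.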
30. Effect of database on extraction performance

Dataset      | Field      | L    | L+DB | % gain
PersonalBib  | author     | 75.7 | 79.5 | 4.9
PersonalBib  | journal    | 33.9 | 50.3 | 48.6
PersonalBib  | title      | 61.0 | 70.3 | 15.1
Address      | city_name  | 72.4 | 76.7 | 6.0
Address      | state_name | 13.9 | 33.2 | 138.5
Address      | zipcode    | 91.6 | 94.3 | 3.0

L: only labeled unstructured data; L+DB: adds similarity to database entities and other DB features.
(from Mansuri et al., ICDE 2006)
31. The same text after extraction and integration: "R. Fagin and J. Helpern, Belief, awareness, reasoning. In AI 1988 10 also see ..."

Extraction: Author: R. Fagin | Author: J. Helpern | Title: Belief,..reasoning | Journal: AI | Year: 1988

Articles (Id | Title | Year | Journal | Canonical):
  2 | Update Semantics | 1983 | 10 |
  7 | Belief, awareness, reasoning | 1988 | 17 |

Journals (Id | Name | Canonical):
  10 | ACM TODS |
  17 | AI | 17
  16 | ACM Trans. Databases | 10

Writes (Article | Author):
  2 | 11
  2 | 2
  2 | 3
  7 | 8
  7 | 9

Authors (Id | Name | Canonical):
  11 | M Y Vardi |
  2 | J. Ullman | 4
  3 | Ron Fagin | 3
  4 | Jeffrey Ullman | 4
  8 | R Fagin | 3
  9 | J Helpern | 8

Integration: match with existing linked entities while respecting all constraints.
32. Example: "CACM 2000, R. Fagin and J. Helpern, Belief, awareness, reasoning in AI"

Only extraction:
  Author: R. Fagin | Author: J. Helpern | Title: Belief,..reasoning | Journal: AI | Year: 2000
Combined extraction + integration:
  Author: R. Fagin | Author: J. Helpern | Title: Belief,..reasoning in AI | Journal: CACM | Year: 2000

Articles (Id | Title | Year | Journal | Canonical):
  2 | Update Semantics | 1983 | 10 |
  7 | Belief, awareness, reasoning | 1988 | 17 |

Year mismatch! (matching the extraction-only interpretation to article 7 conflicts on the year)
33. Combined extraction + matching
- Convert the predicted label into a pair y = (a, r)
  - r is the id of the matching entity; r = 0 means none-of-the-above (a new entry)
[Figure: each segment of "CACM. 2000 Fagin ... Belief Awareness Reasoning In AI" is given a label (Journal, Year, Author, Title) and paired with the id r of its matching database entity]
- Constraints exist on the ids that can be assigned to two segments
34. Constrained models
- Training
  - Ignore constraints, or use max-margin methods that require only MAP estimates
- Application
  - Formulate as a constrained integer programming problem (expensive)
  - Use a general A*-search to find the most likely constrained assignment (a toy sketch follows below)
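The sketch below illustrates the constrained-assignment objective in the simplest possible way, by brute-force enumeration over small candidate sets; a real system would use the integer-programming or A*-search formulations above. The candidate options, scores, and the duplicate-id constraint are toy assumptions.

```python
from itertools import product

def constrained_map(candidates, score, consistent):
    """candidates: per-segment list of (label, id) options;
    score(assignment): joint score; consistent(assignment): constraint check."""
    best, best_score = None, float("-inf")
    for assignment in product(*candidates):
        if not consistent(assignment):
            continue
        s = score(assignment)
        if s > best_score:
            best, best_score = assignment, s
    return best

# Toy example: two author segments, each with scored (label, id) options; the
# constraint forbids linking two different segments to the same (non-new) entity id.
candidates = [[("Author", 3), ("Author", 0)], [("Author", 3), ("Author", 8)]]
local = {("Author", 3): 1.0, ("Author", 0): 0.2, ("Author", 8): 0.7}
score = lambda a: sum(local[c] for c in a)
consistent = lambda a: len({r for (_, r) in a if r != 0}) == len([r for (_, r) in a if r != 0])
print(constrained_map(candidates, score, consistent))   # (("Author", 3), ("Author", 8))
```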
35. Full integration performance

Dataset      | Field      | L    | L+DB | % gain
PersonalBib  | author     | 70.8 | 74.0 | 4.5
PersonalBib  | journal    | 29.6 | 45.5 | 53.6
PersonalBib  | title      | 51.6 | 65.0 | 25.9
Address      | city_name  | 70.1 | 74.6 | 6.4
Address      | state_name | 9.0  | 28.3 | 213.8
Address      | pincode    | 87.8 | 90.7 | 3.3

- L: conventional extraction + matching
- L+DB: the technology presented here
- Much higher accuracies are possible with more training data
(from Mansuri et al., ICDE 2006)
36. What next in data integration?
- Lots to be done in building large-scale, viable data integration systems
- Online collective inference
  - Cannot freeze the database
  - Cannot batch too many inferences
  - Need theoretically sound, practical alternatives to exact, batch inference
- Performance of integration (Chandel et al., ICDE 2006)
- Other operations
  - Data standardization
  - Schema management
37. Probabilistic querying systems
- Integration systems, while improving, cannot be perfect, particularly for domains like the web
- User supervision of each integration result is impossible
- ⇒ Create uncertainty-aware storage and querying engines
- Two enablers
  - Probabilistic database querying engines over generic uncertainty models
  - Conditional graphical models produce well-calibrated probabilities
38. Probabilities in CRFs are well-calibrated
[Calibration plots for Cora citations and Cora headers, each against the ideal diagonal]
Probability of a segmentation ≈ probability that it is correct; e.g., segmentations given probability 0.5 are correct 50% of the time.
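A minimal sketch of the calibration check behind such plots: bucket outputs by predicted probability and compare with the observed fraction correct. The example predictions below are made up for illustration.

```python
from collections import defaultdict

def reliability(predictions, n_bins=10):
    """predictions: iterable of (predicted probability, was the output correct?)."""
    bins = defaultdict(list)
    for prob, correct in predictions:
        bins[min(int(prob * n_bins), n_bins - 1)].append(correct)
    for b in sorted(bins):
        outcomes = bins[b]
        centre = (b + 0.5) / n_bins
        print(f"predicted ~{centre:.2f}  observed {sum(outcomes) / len(outcomes):.2f}"
              f"  ({len(outcomes)} segmentations)")

# Well-calibrated output would show the two columns tracking each other,
# e.g. predictions near 0.5 being correct about half the time.
reliability([(0.9, True), (0.85, True), (0.8, False),
             (0.55, True), (0.45, False), (0.2, False)])
```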
39. Uncertainty in integration systems
[Architecture: unstructured text → model → k alternative entity sets with probabilities p1, p2, ..., pk → probabilistic database system. Very uncertain outputs can be routed back as additional training data. Open question: other, more compact uncertainty models?]
Example queries: select the conference name of article RJ03; find the most cited author.
40. In summary
- Data integration provides scope for several interesting learning problems
- Probabilistic graphical models provide a robust, unified mechanism for exploiting a wide variety of clues and dependencies
- Lots of open research challenges remain in making graphical models work in a practical setting