Graphical models for structure extraction and information integration

1
Graphical models for structure extraction and
information integration
  • Sunita Sarawagi
  • IIT Bombay
  • http://www.it.iitb.ac.in/~sunita

2
Information Extraction (IE) and Integration
  • The Extraction task: Given
  • E: a set of structured elements
  • S: an unstructured source
  • extract all instances of E from S
  • The Integration task: Also given
  • a database of existing inter-linked entities,
  • resolve which extracted entities already exist, and
  • insert appropriate links and entities.
  • Many versions involving many source types
  • Actively researched in varied communities
  • Several tools and techniques
  • Several commercial applications

3
IE from free format text
  • Classical Named Entity Recognition
  • Extract person, location, organization names

According to Robert Callahan, president of
Eastern's flight attendants union, the past
practice of Eastern's parent, Houston-based Texas
Air Corp., has involved ultimatums to unions to
accept the carrier's terms
  • Several applications
  • News tracking
  • Monitor events
  • Bio-informatics
  • Protein and Gene names from publications
  • Customer care
  • Part number, problem description from emails in
    help centers

4
Text segmentation
4089 Whispering Pines Nobel Drive San Diego CA
92122
P.P. Wangikar, T.P. Graycar, D.A. Estell, D.S. Clark, J.S. Dordick (1993) Protein and Solvent Engineering of Subtilisin BPN' in Nearly Anhydrous Organic Media. J. Amer. Chem. Soc. 115, 12231-12237.
5
Information Extraction on the web
6
Personal Information Systems
  • Automatically add a bibtex entry of a paper I
    download
  • Integrate a resume received by email with the candidates' database

[Diagram: personal information sources (Files, Emails, Web) linked to Papers, People, Projects, and Resumes]
7
History of approaches
  • Manually-developed set of scripts
  • Tedious, lots and lots of special cases
  • Needs continuous refinement as new cases arise
  • Ad hoc ways of combining a varied set of clues
  • Example: wrappers; OK for regular tasks
  • Learning-based approaches (lots!)
  • Rule-based (Whisk, Rapier, etc.), 80s
  • Brittle
  • Statistical
  • Generative (HMMs), 90s
  • Intuitive but not too flexible
  • Conditional (flexible feature set), 00s

8
Basic chain model for extraction
Tokens (1-9):  My  review  of  Fermat's  last  theorem  by  S.  Singh
Labels:        Other  Other  Other  Title  Title  Title  Other  Author  Author
[Independent model: one label variable y1 ... y9 per token, with no edges between them]
9
Features
  • The word as-is
  • Orthographic word properties
  • Capitalized? Digit? Ends-with-dot?
  • Part of speech
  • Noun?
  • Match in a dictionary
  • Appears in a dictionary of people names?
  • Appears in a list of stop-words?
  • Fire these for each label and for
  • The token,
  • W tokens to the left or right, or
  • Concatenation of tokens (a feature-extraction sketch follows below).
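To make this feature set concrete, here is a minimal Python sketch of token-level feature extraction (my own illustration, not code from the talk). The dictionary contents, stop-word list, and window size W are made-up placeholders, and the part-of-speech feature is omitted since it would need a tagger.

# Illustrative only: toy lexicons standing in for real dictionaries.
STOPWORDS = {"the", "of", "and", "in", "by"}
PERSON_DICT = {"singh", "fagin", "callahan"}

def token_features(tokens, i, W=1):
    """Features for the token at position i, plus word tests on a +/-W window."""
    w = tokens[i]
    feats = {
        "word=" + w.lower(): 1.0,                  # the word as-is
        "is_capitalized": float(w[:1].isupper()),  # orthographic properties
        "is_digit": float(w.isdigit()),
        "ends_with_dot": float(w.endswith(".")),
        "in_person_dict": float(w.lower() in PERSON_DICT),
        "is_stopword": float(w.lower() in STOPWORDS),
    }
    for off in range(-W, W + 1):                   # neighbouring tokens
        j = i + off
        if off != 0 and 0 <= j < len(tokens):
            feats[f"word[{off:+d}]=" + tokens[j].lower()] = 1.0
    return feats

tokens = "My review of Fermat's last theorem by S. Singh".split()
print(token_features(tokens, 8))                   # features for the token "Singh"

In a CRF these features are then conjoined with the candidate label (and the previous label) before being weighted.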

10
Basic chain model for extraction
Tokens (1-9):  My  review  of  Fermat's  last  theorem  by  S.  Singh
Labels:        Other  Other  Other  Title  Title  Title  Other  Author  Author
[Chain model: label variables y1 ... y9 linked along the chain]
Global conditional model over Pr(y1, y2, ..., y9 | x)
11
Outline
Graphical models
Extraction
  • Chain models - Basic extraction (word-level)
  • Associative Markov Networks - Collective labeling
  • Dynamic CRFs - Two labelings (POS, extraction)
  • 2-D CRFs - Layout-driven extraction (web)

Integration
  • Segmentation models - Match with entity databases
  • Constrained models - Integrating with multiple tables

12
Undirected Graphical models
  • Joint probability distribution of multiple variables expressed compactly as a graph

[Graph over discrete variables y1 ... y5: y3 is directly dependent on y4; y3 is independent of y1 and y5 given y2 and y4]

Discrete variables range over a finite set of labels, e.g., Author, Title, Other
13
Graphical models: potentials
The joint probability distribution factorizes over the cliques of the graph:
\[ \Pr(y_1, \ldots, y_n) \;=\; \frac{1}{Z} \prod_{c \in \mathcal{C}} \psi_c(\mathbf{y}_c) \]
where \(\psi_c\) is the potential function of clique \(c\), \(\mathcal{C}\) is the set of cliques of the graph, and \(Z\) is the normalization constant.
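As a toy numerical illustration of this factorization (my own example, not from the slides), the snippet below enumerates a three-node binary chain, multiplies one potential per clique (here, per edge), and normalizes by Z:

from itertools import product

def psi(a, b):
    """Illustrative edge potential: prefers equal labels on its two endpoints."""
    return 2.0 if a == b else 1.0

edges = [(0, 1), (1, 2)]                  # the cliques of a 3-node chain y1 - y2 - y3

def unnormalized(y):
    score = 1.0
    for i, j in edges:
        score *= psi(y[i], y[j])          # product of clique potentials
    return score

Z = sum(unnormalized(y) for y in product([0, 1], repeat=3))
for y in product([0, 1], repeat=3):
    print(y, unnormalized(y) / Z)         # a proper distribution over labelings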
14
Graphical models: potentials
Form of potentials: Conditional Random Fields (CRFs) model the probability of a set of labels y given an observation x (Lafferty et al., ICML 2001):
\[ \Pr(\mathbf{y} \mid \mathbf{x}) \;=\; \frac{1}{Z(\mathbf{x})} \exp\Big( \sum_{c \in \mathcal{C}} \sum_{k} w_k\, f_k(\mathbf{y}_c, \mathbf{x}, c) \Big) \]
where the \(f_k\) are numeric features, the \(w_k\) are model parameters, and \(\mathbf{x}\) are the observed variables.
15
Inference on graphical models
  • Probability of an assignment of variables
  • Marginal probability of a subset of variables (both written out below)
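Written out explicitly (my notation, assuming the CRF form from the previous slide), the two quantities are
\[ \Pr(\mathbf{y} \mid \mathbf{x}) \ \text{for a full assignment } \mathbf{y}, \qquad \Pr(\mathbf{y}_A \mid \mathbf{x}) \;=\; \sum_{\mathbf{y}_{V \setminus A}} \Pr(\mathbf{y}_A, \mathbf{y}_{V \setminus A} \mid \mathbf{x}) \ \text{for a subset } A \text{ of the variables.} \]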

16
Message passing
  • Efficient two-pass dynamic programming algorithm
    for graphs without cycles
  • Viterbi is a special case for chains (a sketch follows after this list)
  • Cyclic graphs
  • Approximate answer after convergence, or,
  • Transform cliques to nodes in a junction tree
  • Alternatives to message passing
  • Exploit structure of potentials to design special
    algorithms
  • two examples in this talk
  • Upper bound using one or more trees
  • MCMC sampling
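For the chain models used in extraction, the two-pass algorithm specializes to Viterbi. The sketch below is my own minimal version; random numbers stand in for the node and edge potentials that a trained CRF would supply, and the label set is illustrative.

import numpy as np

LABELS = ["Other", "Title", "Author"]

def viterbi(node_score, edge_score):
    """node_score: (T, L) log-potentials; edge_score: (L, L); returns the best label path."""
    T, L = node_score.shape
    delta = np.full((T, L), -np.inf)
    backptr = np.zeros((T, L), dtype=int)
    delta[0] = node_score[0]
    for t in range(1, T):                  # forward pass: best score ending in each label
        for y in range(L):
            scores = delta[t - 1] + edge_score[:, y] + node_score[t, y]
            backptr[t, y] = int(np.argmax(scores))
            delta[t, y] = scores[backptr[t, y]]
    path = [int(np.argmax(delta[T - 1]))]  # backward pass: follow the back-pointers
    for t in range(T - 1, 0, -1):
        path.append(backptr[t, path[-1]])
    return [LABELS[y] for y in reversed(path)]

rng = np.random.default_rng(0)
print(viterbi(rng.normal(size=(9, 3)), rng.normal(size=(3, 3))))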

17
Outline
Graphical models
Extraction
  • Chain models - Basic extraction (word-level)
  • Associative Markov Networks - Collective labeling
  • Dynamic CRFs - Two labelings (POS, extraction)
  • 2-D CRFs - Layout-driven extraction (web)

Integration
  • Segmentation models - Match with entity databases
  • Constrained models - Integrating with multiple tables

18
Long range dependencies
  • Extraction with repeated names (Bunescu et al
    2004)

19
Dependency graph
  • Assume only word-level matches.

[Graph: label nodes y1 ... y8 over the words "nitric oxide synthase (eNOS) ... with ... synthase interaction ... eNOS", with long-range edges connecting repeated words]

  • Approximate message passing
  • Sample results (Bunescu et al ACL 2004)
  • Protein names from Medline abstracts
  • F1 65 → 68
  • Person names, organization names, etc. from news articles
  • F1 80 → 82

20
Associative Markov Networks
[Graph: binary label nodes y1 ... y8 connected only by associative edges]

  • Consider a simpler graph
  • Binary labels
  • Only associative edges
  • Higher potential when both endpoints take the same label

Exact inference in polynomial time via mincut (Greig et al. 1989); a sketch follows below.
Multi-class, metric labeling → approximate algorithm with guarantees (Kleinberg & Tardos 1999)
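Below is a hedged sketch of how the mincut reduction can be set up for the binary associative case, phrased as the equivalent energy-minimization problem and using networkx for the max-flow/min-cut computation; the node costs and edge penalties are made-up numbers, not values from the talk.

import networkx as nx

theta = {0: [0.2, 1.5], 1: [1.0, 0.3], 2: [0.9, 0.4]}   # cost of label 0 / label 1 per node
lam = {(0, 1): 0.8, (1, 2): 0.8}                        # penalty when an edge's endpoints disagree

G = nx.DiGraph()
for i, (c0, c1) in theta.items():
    G.add_edge("s", i, capacity=c1)   # cutting s->i means y_i = 1, so it costs theta_i(1)
    G.add_edge(i, "t", capacity=c0)   # cutting i->t means y_i = 0, so it costs theta_i(0)
for (i, j), w in lam.items():
    G.add_edge(i, j, capacity=w)      # pay w whenever y_i != y_j
    G.add_edge(j, i, capacity=w)

cut_value, (source_side, _) = nx.minimum_cut(G, "s", "t")
labels = {i: (0 if i in source_side else 1) for i in theta}
print("minimum energy:", cut_value, "labeling:", labels)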
21
Factorial CRFs: multiple linked chains
  • Several synchronized, inter-dependent tasks
  • POS, noun phrase, entity extraction
  • Cascading propagates errors
  • Joint models

[Two coupled chains over the sentence "i saw mr. ray canning at the market": a POS chain with nodes w1 ... w8 and an IE chain with nodes y1 ... y8, linked position by position]
22
Inference with multiple chains
  • The graph has cycles; exact inference is most likely intractable
  • Two alternatives
  • Approximate message passing
  • Upper bound marginal (Piecewise training)
  • Treat each edge potential as an independent
    training instance
  • Results (F1) on noun-phrase + POS labeling
  • Piecewise training: 88 (and faster)
  • Belief propagation: 86

[Diagram: combined vs. staged models]
(Sutton et al., ICML 2004; McCallum et al., EMNLP/HLT 2005)
23
Outline
Graphical models
Extraction
  • Chain models - Basic extraction (word-level)
  • Associative Markov Networks - Collective labeling
  • Dynamic CRFs - Two labelings (POS, extraction)
  • 2-D CRFs - Layout-driven extraction (web)

Integration
  • Segmentation models - Match with entity databases
  • Constrained models - Integrating with multiple tables

24
Conventional Extraction Research
[Pipeline: labeled unstructured text → training → model → data integration]
25
Goals of integration
  • Exploit database to improve extraction
  • Entity might exist in the database
  • Integrate extracted entities, resolving whether each entity is already in the database
  • If existing, create links
  • If not existing, create a new entry

26
R. Fagin and J. Helpern, Belief, awareness, reasoning. In AI 1988 10 also see

3 top-level entities (the database is normalized and stores noisy variants; variant rows link to canonical entries):

Articles
Id  Title             Year  Journal  Canonical
2   Update Semantics  1983  10

Journals
Id  Name                  Canonical
10  ACM TODS
17  AI                    17
16  ACM Trans. Databases

Writes
Article  Author
2        11
2        2
2        3

Authors
Id  Name            Canonical
11  M Y Vardi
2   J. Ullman       4
3   Ron Fagin       3
4   Jeffrey Ullman  4
27
Segmentation models (Semi-CRFs)
t:  1  2  3  4  5  6  7  8
x:  R.  Fagin  and  J.  Helpbern  Belief  Awareness  Reasoning
y:  Author  Author  Other  Author  Author  Title  Title  Title

Features describe the single word "Fagin".
28
Segmentation models (Semi-CRFs)
Word-level model:
t:  1  2  3  4  5  6  7  8
x:  R.  Fagin  and  J.  Helpbern  Belief  Awareness  Reasoning
y:  Author  Author  Other  Author  Author  Title  Title  Title
Features describe the single word "Fagin".

Segment-level model: each position belongs to a segment (l, u) carrying one label:
(l=1, u=2)  "R. Fagin"                     Author
(l=3, u=3)  "and"                          Other
(l=4, u=5)  "J. Helpbern"                  Author
(l=6, u=8)  "Belief Awareness Reasoning"   Title
Features describe the whole segment from l to u.
29
Graphical models for segmentation
[Segmentation graph over label nodes y1 ... y8: candidate segments induce cliques]

  • Graph has many cycles
  • Clique size = maximum segment length
  • Two kinds of potentials
  • Transition potentials
  • Only across adjacent nodes
  • Segment potentials
  • Require all positions in a segment to have the same label
  • → exact inference is possible in time linear in the maximum segment length (Cohen & Sarawagi 2004); a sketch follows below
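The claim rests on a semi-Markov dynamic program; a rough sketch of the Viterbi version is given below (my own code, under simplifying assumptions). Segments of length up to MAXLEN are scored as units, so the recursion stays linear in the maximum segment length; the segment-scoring function, name dictionary, and label set are invented placeholders, not the model from the talk.

LABELS, MAXLEN = ["Author", "Other", "Title"], 3
NAMES = {"Fagin", "Helpbern"}             # toy name dictionary

def segment_score(tokens, l, u, y, y_prev):
    """Made-up segment potential; a real Semi-CRF uses learned segment-level features."""
    seg = tokens[l:u + 1]
    if y == "Author":
        return 1.0 if all(w.endswith(".") or w in NAMES for w in seg) else 0.0
    if y == "Other":
        return 1.0 if all(w.islower() for w in seg) else 0.0
    return 0.5 * len(seg) if len(seg) > 1 and all(w[:1].isupper() for w in seg) else 0.0

def semi_viterbi(tokens):
    n, best, back = len(tokens), {}, {}
    for u in range(n):
        for y in LABELS:
            cands = []
            for d in range(1, MAXLEN + 1):          # candidate segment (l..u) of length d
                l = u - d + 1
                if l < 0:
                    break
                if l == 0:
                    cands.append((segment_score(tokens, l, u, y, None), (None, l)))
                else:
                    for py in LABELS:
                        s = best[(l - 1, py)] + segment_score(tokens, l, u, y, py)
                        cands.append((s, (py, l)))
            best[(u, y)], back[(u, y)] = max(cands, key=lambda c: c[0])
    y = max(LABELS, key=lambda lab: best[(n - 1, lab)])
    segs, u = [], n - 1
    while u >= 0:                                   # trace back the best segmentation
        py, l = back[(u, y)]
        segs.append((l, u, y))
        u, y = l - 1, py
    return list(reversed(segs))

print(semi_viterbi("R. Fagin and J. Helpbern Belief Awareness Reasoning".split()))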

30
Effect of database on extraction performance

             Field       L     L+DB  Δ (%)
PersonalBib  author      75.7  79.5  4.9
PersonalBib  journal     33.9  50.3  48.6
PersonalBib  title       61.0  70.3  15.1
Address      city_name   72.4  76.7  6.0
Address      state_name  13.9  33.2  138.5
Address      zipcode     91.6  94.3  3.0

L: only labeled structured data.  L+DB: adds similarity to database entities and other DB features (a sketch of such a feature follows below).
(from Mansuri et al., ICDE 2006)
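As an illustration of what a "similarity to database entities" feature might look like, here is a small sketch (mine, not from the paper) that fires the maximum Jaccard similarity between a candidate segment and the entries of one database column; the journal list is a made-up stand-in for the real table.

journal_db = ["ACM TODS", "ACM Trans. Databases", "AI"]   # illustrative column contents

def jaccard(a, b):
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def db_similarity_feature(segment_text, column=journal_db):
    """How close is this candidate segment to some known journal name?"""
    return max(jaccard(segment_text, entry) for entry in column)

print(db_similarity_feature("ACM Transactions Databases"))   # relatively high
print(db_similarity_feature("Belief Awareness Reasoning"))   # zero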
31
R. Fagin and J. Helpern, Belief, awareness, reasoning. In AI 1988 10 also see

Extraction:
Author: R. Fagin   Author: J. Helpern   Title: Belief,..reasoning   Journal: AI   Year: 1998

Articles
Id  Title                         Year  Journal  Canonical
2   Update Semantics              1983  10
7   Belief, awareness, reasoning  1988  17

Journals
Id  Name                  Canonical
10  ACM TODS
17  AI                    17
16  ACM Trans. Databases  10

Writes
Article  Author
2        11
2        2
2        3
7        8
7        9

Authors
Id  Name            Canonical
11  M Y Vardi
2   J. Ullman       4
3   Ron Fagin       3
4   Jeffrey Ullman  4
8   R Fagin         3
9   J Helpern       8

Integration: match with existing linked entities while respecting all constraints.
32
CACM 2000, R. Fagin and J. Helpern, Belief, awareness, reasoning in AI

Combined extraction + integration vs. only extraction:
Author: R. Fagin   Author: J. Helpern   Title: Belief,..reasoning         Journal: AI    Year: 2000
Author: R. Fagin   Author: J. Helpern   Title: Belief,..reasoning in AI   Journal: CACM  Year: 2000

Articles
Id  Title                         Year  Journal  Canonical
2   Update Semantics              1983  10
7   Belief, awareness, reasoning  1988  17

Year mismatch!
33
Combined extraction + matching
  • Convert the predicted label into a pair y = (a, r): a is the field label and r is the id of the matching entity
  • (r = 0) means none-of-the-above, i.e., a new entry

[Example: the string "CACM. 2000 Fagin Belief Awareness Reasoning In AI" is segmented into (l, u) spans and each segment is assigned a pair y = (label, r) with labels Journal, Year, Author, Title, where r is the id of the matching database entity (r = 0 for a new entry)]

Constraints exist on the ids that can be assigned to two segments.
34
Constrained models
  • Training
  • Ignore constraints or use max-margin methods that
    require only MAP estimates
  • Application
  • Formulate as a constrained integer programming
    problem (expensive)
  • Use a general A-star search to find the most likely constrained assignment (a sketch follows below)
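A minimal sketch of such a best-first (A*-style) search is given below, assuming each segment comes with scored (label, record-id) options and a user-supplied constraint check; the options, scores, and the particular constraint are invented for illustration, not taken from the paper.

import heapq

def best_constrained(options, consistent):
    # Admissible heuristic: the best score still achievable for the remaining segments.
    rest = [0.0] * (len(options) + 1)
    for i in range(len(options) - 1, -1, -1):
        rest[i] = rest[i + 1] + max(s for _, s in options[i])
    heap = [(-rest[0], 0.0, ())]           # (-(score so far + heuristic), score, partial assignment)
    while heap:
        _, score, partial = heapq.heappop(heap)
        i = len(partial)
        if i == len(options):
            return partial, score          # the first complete assignment popped is optimal
        for choice, s in options[i]:
            new = partial + (choice,)
            if consistent(new):            # prune partials that violate the (monotone) constraint
                heapq.heappush(heap, (-(score + s + rest[i + 1]), score + s, new))
    return None, float("-inf")

# Toy usage: two segments, each assigned a (field, record id) pair; the constraint says
# that all segments carrying a nonzero record id must agree on that id.
options = [[(("Journal", 17), 0.9), (("Journal", 16), 0.6)],
           [(("Year", 16), 0.8), (("Year", 17), 0.6)]]
consistent = lambda partial: len({r for _, r in partial if r != 0}) <= 1
print(best_constrained(options, consistent))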

35
Full integration performance

             Field       L     L+DB  Δ (%)
PersonalBib  author      70.8  74.0  4.5
PersonalBib  journal     29.6  45.5  53.6
PersonalBib  title       51.6  65.0  25.9
Address      city_name   70.1  74.6  6.4
Address      state_name  9.0   28.3  213.8
Address      pincode     87.8  90.7  3.3

  • L: conventional extraction + matching
  • L+DB: technology presented here
  • Much higher accuracies possible with more training data

(from Mansuri et al., ICDE 2006)
36
What next in data integration?
  • Lots to be done in building large-scale, viable
    data integration systems
  • Online collective inference
  • Cannot freeze database
  • Cannot batch too many inferences
  • Need theoretically sound, practical alternatives
    to exact, batch inference
  • Performance of integration (Chandel et al, ICDE
    2006)
  • Other operations
  • Data standardization
  • Schema management

37
Probabilistic Querying Systems
  • Integration systems, while improving, cannot be perfect, particularly for domains like the web
  • User supervision of each integration result is impossible
  • → Create uncertainty-aware storage and querying engines
  • Two enablers
  • Probabilistic database querying engines over
    generic uncertainty models
  • Conditional graphical models produce
    well-calibrated probabilities

38
Probabilities in CRFs are well-calibrated
[Calibration plots for Cora citations and Cora headers, each compared against the ideal diagonal]
Probability of a segmentation ≈ probability that it is correct; e.g., segmentations assigned 0.5 probability are correct 50% of the time.
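For reference, here is a sketch of how such a calibration curve can be computed (my own code, with synthetic predictions standing in for the Cora models): bin the predicted probabilities and compare each bin's mean prediction with the empirical fraction of correct predictions.

import numpy as np

def calibration_curve(pred_prob, correct, n_bins=10):
    """pred_prob: confidences in [0, 1]; correct: 0/1 outcomes; returns per-bin pairs."""
    pred_prob, correct = np.asarray(pred_prob), np.asarray(correct)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (pred_prob >= lo) & (pred_prob < hi)
        if mask.any():
            rows.append((pred_prob[mask].mean(), correct[mask].mean()))
    return rows   # for a well-calibrated model the two values track each other (the "Ideal" line)

rng = np.random.default_rng(0)
p = rng.uniform(size=1000)
y = (rng.uniform(size=1000) < p).astype(int)      # synthetic, perfectly calibrated predictions
for mean_p, frac_correct in calibration_curve(p, y):
    print(f"{mean_p:.2f}  {frac_correct:.2f}")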
39
Uncertainty in integration systems
[Diagram: unstructured text → model → alternative entity extractions with probabilities p1, p2, ..., pk → probabilistic database system; very uncertain outputs feed back as additional training data; other, more compact models?]
Example queries: select the conference name of article RJ03; find the most cited author.
40
In summary
  • Data integration provides scope for several
    interesting learning problems
  • Probabilistic graphical models provide a robust, unified mechanism for exploiting a wide variety of clues and dependencies
  • Lots of open research challenges remain in making graphical models work in practical settings
