Information Extraction, Data Mining and Joint Inference

Transcript and Presenter's Notes
1
Information Extraction, Data Mining and Joint Inference
  • Andrew McCallum
  • Computer Science Department
  • University of Massachusetts Amherst

Joint work with Charles Sutton, Aron Culotta,
Xuerui Wang, Ben Wellner, David Mimno, Gideon
Mann.
2
Goal
Mine actionable knowledge from unstructured text.
3
Extracting Job Openings from the Web
4
A Portal for Job Openings
5
Job Openings query: Category = High Tech, Keyword = Java, Location = U.S.
6
Data Mining the Extracted Job Information
7
IE from Research Papers
McCallum et al 99
8
IE from Research Papers
9
Mining Research Papers
Rosen-Zvi, Griffiths, Steyvers, Smyth, 2004
Giles et al
10
IE from Chinese Documents regarding Weather
Department of Terrestrial System, Chinese Academy of Sciences
200k documents, several centuries old:
- Qing Dynasty archives
- memos
- newspaper articles
- diaries
11
What is Information Extraction?
As a family of techniques:
Information Extraction = segmentation + classification + clustering + association
October 14, 2002, 4:00 a.m. PT. For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access." Richard Stallman, founder of the Free Software Foundation, countered saying...
Extracted segments: Microsoft Corporation / CEO / Bill Gates / Microsoft / Gates / Microsoft / Bill Veghte / Microsoft / VP / Richard Stallman / founder / Free Software Foundation
12-13
What is Information Extraction?
(Same example text as above; the four subtasks, segmentation, classification, association, and clustering, are highlighted in turn across slides 11-14.)
14
What is Information Extraction?
(Same example text as above; the final extraction result:)

Extracted entities:
NAME              TITLE     ORGANIZATION
Bill Gates        CEO       Microsoft
Bill Veghte       VP        Microsoft
Richard Stallman  founder   Free Software Foundation
15
From Text to Actionable Knowledge
[Figure: pipeline. Document collection → Spider → Filter → IE (Segment, Classify, Associate, Cluster) → Database → Data Mining (Discover patterns: entity types, links/relations, events) → Actionable knowledge (Prediction, Outlier detection, Decision support).]
16
Problem
  • Combined in serial juxtaposition, IE and DM are unaware of each other's weaknesses and opportunities.
  • DM begins from a populated DB, unaware of where the data came from, or its inherent errors and uncertainties.
  • IE is unaware of emerging patterns and regularities in the DB.
  • The accuracy of both suffers, and significant mining of complex text sources is beyond reach.

17
Solution
Uncertainty Info
[Figure: the same pipeline, now passing Uncertainty Info forward from IE to Data Mining, and Emerging Patterns back from Data Mining to IE: Document collection → Spider → Filter → IE (Segment, Classify, Associate, Cluster) → Database → Data Mining → Actionable knowledge (Prediction, Outlier detection, Decision support).]
18
Solution
Unified Model
[Figure: the same pipeline with IE and Data Mining merged into a single Probabilistic Model: Document collection → Spider → Filter → unified model (Segment, Classify, Associate, Cluster; Discover patterns: entity types, links/relations, events) → Actionable knowledge (Prediction, Outlier detection, Decision support).]
19
Scientific Questions
  • What model structures will capture salient
    dependencies?
  • Will joint inference actually improve accuracy?
  • How to do inference in these large graphical
    models?
  • How to do parameter estimation efficiently in these models, which are built from multiple large components?
  • How to do structure discovery in these models?

21
Outline
  • Examples of IE and Data Mining
  • Motivate Joint Inference
  • Brief introduction to Conditional Random Fields
  • Joint inference examples:
  • Joint Labeling of Cascaded Sequences (Loopy Belief Propagation)
  • Joint Co-reference Resolution (Graph Partitioning)
  • Joint Co-reference with Weighted 1st-order Logic (MCMC)
  • Joint Relation Extraction and Data Mining (Bootstrapping)
  • Ultimate application area: Rexa, a Web portal for researchers
22
(Linear Chain) Conditional Random Fields
Lafferty, McCallum, Pereira 2001
Undirected graphical model, trained to
maximize conditional probability of output
(sequence) given input (sequence)
[Figure: finite state model / graphical model for a linear-chain CRF. Output sequence (FSM states) y_{t-2}, y_{t-1}, y_t, y_{t+1}, y_{t+2} with labels OTHER, PERSON, OTHER, ORG, TITLE; observed input sequence x_{t-2}, x_{t-1}, x_t, x_{t+1}, x_{t+2} over the words "said Jones a Microsoft VP".]
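For reference, the linear-chain CRF defines the conditional probability as

\[
p(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})} \prod_{t=1}^{T} \exp\Big( \sum_{k} \lambda_k\, f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big)
\]

where the f_k are feature functions, the λ_k are learned weights, and Z(x) sums the same product over all possible label sequences.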
23
Table Extraction from Government Reports
Cash receipts from marketings of milk during 1995 at $19.9 billion dollars, was slightly below 1994. Producer returns averaged $12.93 per hundredweight, $0.19 per hundredweight below 1994. Marketings totaled 154 billion pounds, 1 percent above 1994. Marketings include whole milk sold to plants and dealers as well as milk sold directly to consumers.

An estimated 1.56 billion pounds of milk were used on farms where produced, 8 percent less than 1994. Calves were fed 78 percent of this milk with the remainder consumed in producer households.

              Milk Cows and Production of Milk and Milkfat:
                        United States, 1993-95
------------------------------------------------------------------------
                          Production of Milk and Milkfat 2/
          Number      --------------------------------------------------
Year        of           Per Milk Cow       Percentage         Total
         Milk Cows 1/  ------------------  of Fat in All  ---------------
                        Milk     Milkfat   Milk Produced   Milk   Milkfat
------------------------------------------------------------------------
         1,000 Head     --- Pounds ---       Percent      Million Pounds

1993       9,589       15,704      575        3.66       150,582  5,514.4
1994       9,500       16,175      592        3.66       153,664  5,623.7
1995       9,461       16,451      602        3.66       155,644  5,694.3
------------------------------------------------------------------------
1/ Average number during year, excluding heifers not yet fresh.
2/ Excludes milk sucked by calves.
24
Table Extraction from Government Reports
Pinto, McCallum, Wei, Croft, 2003 SIGIR
100 documents from www.fedstats.gov
Labels
CRF
  • Non-Table
  • Table Title
  • Table Header
  • Table Data Row
  • Table Section Data Row
  • Table Footnote
  • ... (12 in all)

(Same report text and table as the previous slide.)
Features
  • Percentage of digit chars
  • Percentage of alpha chars
  • Indented
  • Contains 5 consecutive spaces
  • Whitespace in this line aligns with prev.
  • ...
  • Conjunctions of all previous features, at time offsets {0,0}, {-1,0}, {0,1}, {1,2} (see the sketch below).
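To make the feature list concrete, here is a minimal Python sketch of such per-line features; the function and feature names are hypothetical, not from the paper:

import re

def line_features(line: str, prev: str) -> dict:
    """Toy per-line features of the kind listed above (names hypothetical)."""
    n = max(len(line), 1)
    digits = sum(c.isdigit() for c in line)
    alphas = sum(c.isalpha() for c in line)
    # Column positions where this line and the previous one are both whitespace.
    aligned = sum(1 for a, b in zip(line, prev) if a == " " and b == " ")
    return {
        "pct_digit_chars": digits / n,
        "pct_alpha_chars": alphas / n,
        "indented": line.startswith((" ", "\t")),
        "has_5_consecutive_spaces": "     " in line,
        "whitespace_aligns_with_prev": aligned > 0,
    }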

25
Table Extraction Experimental Results
Pinto, McCallum, Wei, Croft, 2003 SIGIR
                    Line labels        Table segments
                    (percent correct)  (F1)
HMM                 65                 64
Stateless MaxEnt    85                 -
CRF                 95                 92
26
IE from Research Papers
McCallum et al 99
27
IE from Research Papers
Field-level F1:
Hidden Markov Models (HMMs)        75.6   Seymore, McCallum, Rosenfeld, 1999
Support Vector Machines (SVMs)     89.7   Han, Giles, et al, 2003
Conditional Random Fields (CRFs)   93.9   Peng, McCallum, 2004
Error reduction: 40%
28
Named Entity Recognition
CRICKET - MILLNS SIGNS FOR BOLAND CAPE TOWN
1996-08-22 South African provincial side Boland
said on Thursday they had signed Leicestershire
fast bowler David Millns on a one year contract.
Millns, who toured Australia with England A in
1992, replaces former England all-rounder Phillip
DeFreitas as Boland's overseas professional.
Labels   Examples
PER      Yayuk Basuki, Innocent Butare
ORG      3M, KDP, Cleveland
LOC      Cleveland, Nirmal Hriday, The Oval
MISC     Java, Basque, 1,000 Lakes Rally
29
Automatically Induced Features
McCallum Li, 2003, CoNLL
Index   Feature
0       inside-noun-phrase (o_{t-1})
5       stopword (o_t)
20      capitalized (o_{t+1})
75      word=the (o_t)
100     in-person-lexicon (o_{t-1})
200     word=in (o_{t+2})
500     word=Republic (o_{t+1})
711     word=RBI (o_t) & header=BASEBALL
1027    header=CRICKET (o_t) & in-English-county-lexicon (o_t)
1298    company-suffix-word (firstmention_{t+2})
4040    location (o_t) & POS=NNP (o_t) & capitalized (o_t) & stopword (o_{t-1})
4945    moderately-rare-first-name (o_{t-1}) & very-common-last-name (o_t)
4474    word=the (o_{t-2}) & word=of (o_t)
30
Named Entity Extraction Results
McCallum Li, 2003, CoNLL
Method                          F1
HMMs (BBN's Identifinder)       73
CRFs w/out Feature Induction    83
CRFs with Feature Induction     90
  (based on Likelihood Gain)
31
Outline
  • Examples of IE and Data Mining
  • Motivate Joint Inference
  • Brief introduction to Conditional Random Fields
  • Joint inference examples:
  • Joint Labeling of Cascaded Sequences (Loopy Belief Propagation)
  • Joint Co-reference Resolution (Graph Partitioning)
  • Joint Co-reference with Weighted 1st-order Logic (MCMC)
  • Joint Relation Extraction and Data Mining (Bootstrapping)
  • Ultimate application area: Rexa, a Web portal for researchers
32
Jointly labeling cascaded sequences: Factorial CRFs
Sutton, Rohanimanesh, McCallum, ICML 2004
Named-entity tag
Noun-phrase boundaries
Part-of-speech
English words
34
Jointly labeling cascaded sequences: Factorial CRFs
Sutton, Rohanimanesh, McCallum, ICML 2004
Named-entity tag
Noun-phrase boundaries
Part-of-speech
English words
But errors cascade--must be perfect at every
stage to do well.
35
Jointly labeling cascaded sequences: Factorial CRFs
Sutton, Rohanimanesh, McCallum, ICML 2004
Named-entity tag
Noun-phrase boundaries
Part-of-speech
English words
Joint prediction of part-of-speech and noun-phrase boundaries in newswire, matching accuracy with only 50% of the training data.
Inference: Loopy Belief Propagation
36
Outline
  • Examples of IE and Data Mining
  • Motivate Joint Inference
  • Brief introduction to Conditional Random Fields
  • Joint inference examples:
  • Joint Labeling of Cascaded Sequences (Loopy Belief Propagation)
  • Joint Co-reference Resolution (Graph Partitioning)
  • Joint Co-reference with Weighted 1st-order Logic (MCMC)
  • Joint Relation Extraction and Data Mining (Bootstrapping)
  • Ultimate application area: Rexa, a Web portal for researchers
37
Joint co-reference among all pairs: Affinity Matrix CRF
Entity resolution / object correspondence
[Figure: mentions ". . . Mr Powell . . .", ". . . Powell . . .", ". . . she . . ." connected by Y/N co-reference decisions with affinity weights 45, 11, and -99.]
25% reduction in error on co-reference of proper nouns in newswire.
Inference: correlational clustering / graph partitioning
McCallum, Wellner, IJCAI WS 2003, NIPS 2004
Bansal, Blum, Chawla, 2002
38
Joint Co-reference for Multiple Entity Types
Culotta McCallum 2005
People
[Figure: pairwise Y/N co-reference decisions among person mentions "Stuart Russell", "Stuart Russell", "S. Russel".]
39
Joint Co-reference for Multiple Entity Types
Culotta McCallum 2005
People / Organizations
[Figure: adds organization mentions "University of California at Berkeley", "Berkeley", "Berkeley"; Y/N co-reference decisions among people and among organizations are now coupled.]
40
Joint Co-reference for Multiple Entity Types
Culotta McCallum 2005
People / Organizations
[Figure: same coupled model as the previous slide.]
Reduces error by 22%.
41
Joint Co-reference Experimental Results
Culotta McCallum 2005
CiteSeer Dataset: 1500 citations, 900 unique papers, 350 unique venues

                    Paper              Venue
                 indep   joint      indep   joint
constraint       88.9    91.0       79.4    94.1
reinforce        92.2    92.2       56.5    60.1
face             88.2    93.7       80.9    82.8
reason           97.4    97.0       75.6    79.5
Micro Average    91.7    93.4       73.1    79.1
                 error reduction 20%   error reduction 22%
42
4. Joint segmentation and co-reference
Extraction from and matching of research paper citations.
[Figure: graphical model in which each citation string (o) is segmented (s) into citation attributes (c); co-reference decisions (y) among citations are made jointly with segmentation, informed by database field values and world knowledge.]
Example pair of citations:
Laurel, B. Interface Agents: Metaphors with Character, in The Art of Human-Computer Interface Design, B. Laurel (ed), Addison-Wesley, 1990.
Brenda Laurel. Interface Agents: Metaphors with Character, in Laurel, The Art of Human-Computer Interface Design, 355-366, 1990.
35% reduction in co-reference error by using segmentation uncertainty.
6-14% reduction in segmentation error by using co-reference.
Inference: Sparse Generalized Belief Propagation
Wellner, McCallum, Peng, Hay, UAI 2004
see also Marthi, Milch, Russell, 2003; Pal, Sutton, McCallum, 2005
43
Outline
  • Examples of IE and Data Mining
  • Motivate Joint Inference
  • Brief introduction to Conditional Random Fields
  • Joint inference examples:
  • Joint Labeling of Cascaded Sequences (Loopy Belief Propagation)
  • Joint Co-reference Resolution (Graph Partitioning)
  • Joint Co-reference with Weighted 1st-order Logic (MCMC)
  • Joint Relation Extraction and Data Mining (Bootstrapping)
  • Ultimate application area: Rexa, a Web portal for researchers
44
Sometimes pairwise comparisons are not enough.
  • Entities have multiple attributes (name, email, institution, location); need to measure compatibility among them.
  • Having 2 given names is common, but not 4.
  • Need to measure size of the clusters of mentions.
  • ∃ a pair of last-name strings that differ by more than 5?
  • We need measures on hypothesized entities.
  • We need first-order logic (see the sketch below).
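A minimal Python sketch of the difference between pairwise and cluster-wise (first-order) features; all names here are hypothetical, for illustration only:

def edit_distance(a: str, b: str) -> int:
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def pairwise_features(m1: str, m2: str) -> dict:
    # Pairwise features see only two mentions at a time.
    return {"same_last_name": m1.split()[-1] == m2.split()[-1]}

def cluster_features(mentions: list[str]) -> dict:
    # First-order features see a whole hypothesized entity.
    first_names = {m.split()[0] for m in mentions}
    last_names = {m.split()[-1] for m in mentions}
    return {
        "num_distinct_first_names": len(first_names),  # 2 is common, 4 is not
        "cluster_size": len(mentions),
        "exists_last_names_differing_gt_5": any(
            edit_distance(a, b) > 5 for a in last_names for b in last_names
        ),
    }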

45
Toward High-Order Representations: Identity Uncertainty
..Howard Dean..
..H Dean..
..Dean Martin..
..Howard Martin..
..Dino..
..Howard..
47
Pairwise Co-reference Features
Howard Dean
Dean Martin
Howard Martin
48
Cluster-wise (higher-order) Representations
Howard Dean
SamePerson(Howard Dean, Howard Martin,
Dean Martin)?
Dean Martin
Howard Martin
49
Cluster-wise (higher-order) Representations

Dino
Martin
Dean Martin
Howard Dean
Howard Martin
Howie
50
This space complexity is common in first-order
probabilistic models
51
Markov Logic (Weighted 1st-order Logic): Using 1st-order Logic as a Template to Construct a CRF
Richardson, Domingos 2005
ground Markov network
Grounding the Markov network requires space O(n^r),
  n = number of constants,
  r = highest clause arity.
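For reference (standard Markov logic, not spelled out on the slide), the probability the ground network assigns to a world x is

\[
P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i\, n_i(x) \Big)
\]

where w_i is the weight of first-order formula i and n_i(x) is the number of its true groundings in x; enumerating those groundings is what incurs the O(n^r) space above.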
52
How can we perform inference and learning in
models that cannot be grounded?
53
Inference in First-Order Models: SAT Solvers
  • Weighted SAT solvers Kautz et al 1997
  • Requires complete grounding of network
  • LazySAT Singla Domingos 2006
  • Saves memory by only storing clauses that may
    become unsatisfied
  • Still requires exponential time to visit all
    ground clauses at initialization.

54
Inference in First-Order Models: Sampling
  • Gibbs Sampling
  • Difficult to move between high probability
    configurations by changing single variables
  • Although, consider MC-SAT Poon Domingos 06
  • An alternative Metropolis-Hastings sampling
  • Can be extended to partial configurations
  • Only instantiate relevant variables
  • Successfully used in BLOG models Milch et al
    2005
  • Two parts: a proposal distribution and an acceptance distribution.

Culotta McCallum 2006
55
Learning in First-Order Models
  • Sampling
  • Pseudo-likelihood
  • Voted Perceptron
  • We propose
  • Conditional model to rank configurations
  • Intuitive objective function for
    Metropolis-Hastings

56
Contributions
  • Metropolis-Hastings sampling in an undirected
    model with first-order features
  • Discriminative training for Metropolis-Hastings

57
An Undirected Model of Identity Uncertainty
58
Toward High-Order Representations: Identity Uncertainty

Dino
Martin
Dean Martin
Howard Dean
Howard Martin
Howie
59
Model
First-order features
[Figure: mentions clustered into hypothesized entities {Dean Martin, Dino}, {Howard Martin, Howie Martin}, {Howard Dean, Governor Howie}, with within-cluster factors f_w(SamePerson(x)) and between-cluster factors f_b(DifferentPerson(x, x')).]
60
Model
[Figure: the same clustering, showing the factor graph over clusters and cluster pairs.]
61
Model
Z_X: sum over all possible configurations!
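A hedged reconstruction of the form this model takes (the exact factorization is in Culotta, McCallum 2006; the symbols follow slide 59's f_w and f_b):

\[
p(y \mid x) = \frac{1}{Z_X} \prod_{c \in y} \exp\Big(\sum_k \lambda_k\, f^w_k(x_c)\Big) \prod_{c \neq c'} \exp\Big(\sum_l \mu_l\, f^b_l(x_c, x_{c'})\Big)
\]

with f^w within-cluster (SamePerson) features over each hypothesized entity c and f^b between-cluster (DifferentPerson) features.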
62
Proposal Distribution
[Figure: a configuration y over the mentions Dean Martin, Howie Martin, Howard Martin, Dino, shown beside a proposed configuration y'.]
63
Proposal Distribution
[Figure: proposed move y → y', re-partitioning the mentions Dean Martin, Howie Martin, Howard Martin, Dino into new clusters.]
64
Proposal Distribution
[Figure: the same clusterings, illustrating the reverse move y' → y.]
65
Inference with Metropolis-Hastings
  • y' = proposed configuration
  • p(y')/p(y) = likelihood ratio
  • ratio of P(Y|X), so Z_X cancels
  • q(y'|y) = proposal distribution
  • the probability of proposing move y → y'
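For reference, these two ingredients combine in the standard Metropolis-Hastings acceptance probability:

\[
a(y \to y') = \min\Big(1,\; \frac{p(y')\, q(y \mid y')}{p(y)\, q(y' \mid y)}\Big)
\]

Because both configurations share Z_X, the intractable normalizer never needs to be computed.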

66
Learning the Likelihood Ratio
Given a pair of configurations, learn to rank the
better configuration higher.
67
Learning the Likelihood Ratio
S(y) = true evaluation of a configuration (e.g. F1)
68
Sampling Training Examples
  • Run sampler on training data
  • Generate training example for each proposed move
  • Iteratively retrain during sampling

69
Tying Parameters with Proposal Distribution
  • Proposal distribution q(y'|y): a cheap approximation to p(y')
  • Reuse a subset of the parameters of p(y)
  • E.g. in the identity uncertainty model:
  • Sample two clusters
  • Stochastic agglomerative clustering to propose a new configuration (see the sketch below)
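A generic Python sketch of this sample-and-accept loop (a sketch under assumed interfaces, not the paper's code; score and propose are supplied by the model):

import math
import random

def mh_coreference(mentions, score, propose, steps=1000):
    """Metropolis-Hastings over clusterings.

    score(y)   -> unnormalized log-probability of clustering y
    propose(y) -> (y_new, log q(y|y_new) - log q(y_new|y))
    """
    y = [[m] for m in mentions]  # start from singleton clusters
    for _ in range(steps):
        y_new, log_q_ratio = propose(y)
        # log acceptance ratio: [log p(y') - log p(y)] + [log q(y|y') - log q(y'|y)]
        log_a = score(y_new) - score(y) + log_q_ratio
        if random.random() < math.exp(min(0.0, log_a)):
            y = y_new
    return y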

70
Experiments
71
Simplified Model
  • Use only within-cluster factors.
  • Inference with agglomerative clustering

[Figure: clusters {Dean Martin, Dino} and {Howard Martin, Howie Martin}.]
72
Experiments
  • Paper citation coreference
  • Author coreference
  • First-order features
  • All Titles Match
  • Exists Year MisMatch
  • Average String Edit Distance > X
  • Number of mentions

73
Results on Citation Data
Citeseer paper coreference results (pair F1)
Author coreference results (pair F1)
74
Outline
  • Examples of IE and Data Mining
  • Motivate Joint Inference
  • Brief introduction to Conditional Random Fields
  • Joint inference examples:
  • Joint Labeling of Cascaded Sequences (Loopy Belief Propagation)
  • Joint Co-reference Resolution (Graph Partitioning)
  • Joint Co-reference with Weighted 1st-order Logic (MCMC)
  • Joint Relation Extraction and Data Mining (Bootstrapping)
  • Ultimate application area: Rexa, a Web portal for researchers
75
Motivation: Robust Relation Extraction
George W. Bush graduated from Yale
George W. Bush attended Yale
Bill Clinton attended Yale. Fellow alumnus George
W. Bush
Yale is located in New Haven. When George W.
Bush visited
  • Pattern matching
  • Classifier with contextual and external
    features (thesaurus)
  • Relations with sparse, noisy, or complex
    contextual evidence?
  • How to learn predictive relational patterns?
  • Knowledge discovery from text
  • Mine web to discover unknown facts

76
Data
  • 270 Wikipedia articles
  • 1000 paragraphs
  • 4700 relations
  • 52 relation types
  • JobTitle, BirthDay, Friend, Sister, Husband, Employer, Cousin, Competition, Education, ...
  • Targeted for density of relations
  • Bush/Kennedy/Manning/Coppola families and friends

77
(No Transcript)
78
Relation Extraction as
Named-Entity Recognition Classification
  • George W. Bush and his father, George H. W.
    Bush,

79
Relation Extraction as
Named-Entity Recognition Classification
  • Difficulties with this approach:
  • must enumerate all pairs of entities in a document
  • low signal/noise
  • errors in NER
  • if Ford is mislabeled as a company, it won't be part of a brother relation.

80
Relation Extraction as Sequence Labeling
  • George W. Bush
  • the son of George H. W. Bush
  • Most entities are related to subject
  • Folds together NER and relation extraction
  • Models dependency of adjacent relations
  • e.g. "Austrian physicist" → nationality, jobTitle
  • Lots of work on sequence labeling
  • HMMs, CRFs,

81
CRF Features
  • Context words
  • Lexicons
  • cities, states, names, companies
  • Regexp
  • Capitalization, ContainsDigits,
    ContainsPunctuation
  • Part-of-speech
  • Prefixes/suffixes
  • Conjunctions of these within window of size 6

82
Example Features
  • Father: "son of NAME", "father, NAME"
  • Brother: "his/her brother X"
  • Executive: "JOBTITLE of X"
  • Birthday: "born MONTH [0-9]"
  • Boss: "under JOBTITLE X"
  • Competition: "defeating NAME"
  • Award: "awarded DET X", "won DET X"

83
(No Transcript)
84
Mining Relational Features
  • Want to discover database regularities across documents that provide strong evidence of a relation
  • High precision:
  • parent(x,z) ∧ sibling(z,w) ∧ child(w,y) ⇒ cousin(x,y)
  • Low precision:
  • friends tend to attend the same schools

85
Mining Relational Features
  • Generate relational path features from the extracted (or true) database.
  • Paths between entities up to length k (see the sketch below)
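A minimal Python sketch (data shapes and names hypothetical, not the paper's implementation) of enumerating relation paths up to length k:

from collections import defaultdict

def path_features(triples, k=3):
    """Enumerate relation paths up to length k between entity pairs.

    triples: iterable of (subject, relation, object) rows from the
    extracted (or true) database. Returns {(start, end): {path tuples}}.
    """
    graph = defaultdict(list)
    for s, r, o in triples:
        graph[s].append((r, o))
    paths = defaultdict(set)
    # Breadth-first enumeration of relation sequences.
    frontier = [(s, s, ()) for s in list(graph)]
    for _ in range(k):
        next_frontier = []
        for start, node, path in frontier:
            for r, o in graph[node]:
                new_path = path + (r,)
                paths[(start, o)].add(new_path)
                next_frontier.append((start, o, new_path))
        frontier = next_frontier
    return paths

# Toy usage: father's sister's son => cousin
rows = [("GWB", "father", "GHWB"), ("GHWB", "sister", "Nancy"), ("Nancy", "son", "Ellis")]
print(path_features(rows)[("GWB", "Ellis")])  # {('father', 'sister', 'son')}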

86
George W. Bush → his father → George H. W. Bush → his cousin → John Prescott Ellis
George H. W. Bush → his sister → Nancy Ellis Bush
Nancy Ellis Bush → her son → John Prescott Ellis
Cousin = Father's Sister's Son
87
"John Kerry celebrated with Stuart Forbes" → likely a cousin
88
Iterative DB Construction
  • Joseph P. Kennedy, Sr
  • son John F. Kennedy with Rose Fitzgerald

(confidence 0.3)
89
Results
ME = maximum entropy; CRF = conditional random field; RCRF = CRF + mined relational features
90
Examples of Discovered Relational Features
  • Mother: Father → Wife
  • Cousin: Mother → Husband → Nephew
  • Friend: Education → Student
  • Education: Father → Education
  • Boss: Boss → Son
  • MemberOf: Grandfather → MemberOf
  • Competition: PoliticalParty → Member → Competition

91
Outline
  • Examples of IE and Data Mining
  • Motivate Joint Inference
  • Brief introduction to Conditional Random Fields
  • Joint inference examples:
  • Joint Labeling of Cascaded Sequences (Loopy Belief Propagation)
  • Joint Co-reference Resolution (Graph Partitioning)
  • Joint Co-reference with Weighted 1st-order Logic (MCMC)
  • Joint Relation Extraction and Data Mining (Bootstrapping)
  • Ultimate application area: Rexa, a Web portal for researchers
92
Mining our Research Literature
  • Better understand structure of our own research
    area.
  • Structure helps us learn a new field.
  • Aid collaboration
  • Map how ideas travel through social networks of
    researchers.
  • Aids for hiring and finding reviewers!

93
Previous Systems
94
(No Transcript)
95
Previous Systems
[Figure: Research Paper entities connected by Cites relations.]
96
More Entities and Relations
[Figure: richer entity-relation graph extending Research Paper and Cites with Person, Grant, University, Venue, Groups, and Expertise.]
97-114
(No Transcript)
115
Topical Transfer
Mann, Mimno, McCallum, JCDL 2006
Citation counts from one topic to another.
Map producers and consumers
116
Impact Diversity
Topic Diversity: entropy of the distribution of citing topics
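With p(t) the fraction of citations arriving from citing topic t, this is the standard Shannon entropy:

\[
H = -\sum_{t} p(t)\, \log p(t)
\]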
117
Summary
  • Joint inference is needed to avoid cascading errors in information extraction and data mining.
  • Challenge: making inference and learning scale to massive graphical models.
  • Approach: Markov-chain Monte Carlo.
  • Rexa: a new research-paper search engine, mining the interactions in our community.