Title: Information Extraction, Data Mining and Joint Inference
1. Information Extraction, Data Mining and Joint Inference
- Andrew McCallum
- Computer Science Department
- University of Massachusetts Amherst
Joint work with Charles Sutton, Aron Culotta,
Xuerui Wang, Ben Wellner, David Mimno, Gideon
Mann.
2. Goal
Mine actionable knowledge from unstructured text.
3. Extracting Job Openings from the Web
4. A Portal for Job Openings
5. Job Openings (Category: High Tech, Keyword: Java, Location: U.S.)
6. Data Mining the Extracted Job Information
7. IE from Research Papers [McCallum et al. 1999]
8. IE from Research Papers
9. Mining Research Papers [Rosen-Zvi, Griffiths, Steyvers, Smyth 2004; Giles et al.]
10. IE from Chinese Documents regarding Weather
Department of Terrestrial System, Chinese Academy of Sciences
200k documents, several centuries old: Qing Dynasty archives, memos, newspaper articles, diaries
11. What is Information Extraction?
As a family of techniques:
Information Extraction = segmentation + classification + clustering + association
October 14, 2002, 4:00 a.m. PT. For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Today, Microsoft claims to "love" the open-source concept, by which software code is made public to encourage improvement and development by outside programmers. Gates himself says Microsoft will gladly disclose its crown jewels--the coveted code behind the Windows operating system--to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access." Richard Stallman, founder of the Free Software Foundation, countered saying...
Extracted entities:

NAME               TITLE      ORGANIZATION
Bill Gates         CEO        Microsoft Corporation
Bill Veghte        VP         Microsoft
Richard Stallman   founder    Free Software Foundation
15. From Text to Actionable Knowledge
Document collection → Spider → Filter → IE (segment, classify, associate, cluster) → Database → Data Mining (discover patterns: entity types, links/relations, events) → Actionable knowledge (prediction, outlier detection, decision support)
16. Problem
Combined in serial juxtaposition, IE and DM are unaware of each other's weaknesses and opportunities.
- DM begins from a populated DB, unaware of where the data came from, or its inherent errors and uncertainties.
- IE is unaware of emerging patterns and regularities in the DB.
The accuracy of both suffers, and significant mining of complex text sources is beyond reach.
17. Solution
The same pipeline, with two new links: IE passes its uncertainty information forward along with the extracted data, and Data Mining feeds emerging patterns from the database back to IE.
Document collection → Spider → Filter → IE (segment, classify, associate, cluster) ⇄ Database ⇄ Data Mining → Actionable knowledge (prediction, outlier detection, decision support)
18. Solution
Replace the database with a unified probabilistic model spanning both IE (segment, classify, associate, cluster) and Data Mining (discover patterns: entity types, links/relations, events), from document collection to actionable knowledge (prediction, outlier detection, decision support).
19. Scientific Questions
- What model structures will capture salient dependencies?
- Will joint inference actually improve accuracy?
- How to do inference in these large graphical models?
- How to do parameter estimation efficiently in these models, which are built from multiple large components?
- How to do structure discovery in these models?
21. Outline
- Examples of IE and Data Mining
- Motivate Joint Inference
- Brief introduction to Conditional Random Fields
- Joint inference examples:
  - Joint Labeling of Cascaded Sequences (Loopy Belief Propagation)
  - Joint Co-reference Resolution (Graph Partitioning)
  - Joint Co-reference with Weighted 1st-order Logic (MCMC)
  - Joint Relation Extraction and Data Mining (Bootstrapping)
- Ultimate application area: Rexa, a Web portal for researchers
22. (Linear Chain) Conditional Random Fields [Lafferty, McCallum, Pereira 2001]
Undirected graphical model, trained to maximize the conditional probability of the output (label) sequence given the input (observation) sequence.
Finite state / graphical model: FSM states y_1 ... y_T (e.g. OTHER, PERSON, OTHER, ORG, TITLE) over observations x_1 ... x_T (input sequence, e.g. "... said Jones a Microsoft VP ...").
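Written out, the linear-chain CRF defines the conditional probability of a label sequence given the observations, with feature functions f_k and learned weights λ_k:

```latex
p(\mathbf{y}\mid\mathbf{x}) \;=\; \frac{1}{Z(\mathbf{x})}\prod_{t=1}^{T}\exp\Big(\sum_{k}\lambda_k\, f_k(y_{t-1},\, y_t,\, \mathbf{x},\, t)\Big),
\qquad
Z(\mathbf{x}) \;=\; \sum_{\mathbf{y}'}\prod_{t=1}^{T}\exp\Big(\sum_{k}\lambda_k\, f_k(y'_{t-1},\, y'_t,\, \mathbf{x},\, t)\Big)
```

Because the model conditions on x, the features f_k may inspect the entire input sequence, not just the current observation.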
23. Table Extraction from Government Reports
Cash receipts from marketings of milk during 1995, at $19.9 billion, was slightly below 1994. Producer returns averaged $12.93 per hundredweight, $0.19 per hundredweight below 1994. Marketings totaled 154 billion pounds, 1 percent above 1994. Marketings include whole milk sold to plants and dealers as well as milk sold directly to consumers.
An estimated 1.56 billion pounds of milk were used on farms where produced, 8 percent less than 1994. Calves were fed 78 percent of this milk with the remainder consumed in producer households.
Milk Cows and Production of Milk and Milkfat, United States, 1993-95

        Number of      Production of Milk and Milkfat 2/
Year    Milk Cows 1/   Per Milk Cow           Percentage of      Total
        (1,000 Head)   Milk      Milkfat      Fat in All Milk    Milk       Milkfat
                       (Pounds)  (Pounds)     Produced           (Million Pounds)
1993    9,589          15,704    575          3.66               150,582    5,514.4
1994    9,500          16,175    592          3.66               153,664    5,623.7
1995    9,461          16,451    602          3.66               155,644    5,694.3

1/ Average number during year, excluding heifers not yet fresh.
2/ Excludes milk sucked by calves.
24. Table Extraction from Government Reports [Pinto, McCallum, Wei, Croft, SIGIR 2003]
100 documents from www.fedstats.gov
CRF line labels:
- Non-Table
- Table Title
- Table Header
- Table Data Row
- Table Section Data Row
- Table Footnote
- ... (12 in all)
Features:
- Percentage of digit chars
- Percentage of alpha chars
- Indented
- Contains 5 consecutive spaces
- Whitespace in this line aligns with previous line
- ...
- Conjunctions of all previous features, at time offsets {0,0}, {-1,0}, {0,1}, {1,2}
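These line-level features are cheap to compute; a minimal Python sketch (function and feature names are mine, not the paper's exact feature set):

```python
import re

def line_features(line: str, prev: str = "") -> dict:
    """Compute a few of the per-line features used for table extraction.
    Illustrative only; the paper's full set also includes conjunctions
    of these features at several time offsets."""
    n = max(len(line), 1)
    feats = {
        "pct_digit": sum(c.isdigit() for c in line) / n,
        "pct_alpha": sum(c.isalpha() for c in line) / n,
        "indented": line.startswith(" "),
        "five_spaces": "     " in line,
    }
    # Crude whitespace-alignment check: do runs of 2+ spaces start at the
    # same column positions as in the previous line?
    gaps = {m.start() for m in re.finditer(r"\s{2,}", line)}
    prev_gaps = {m.start() for m in re.finditer(r"\s{2,}", prev)}
    feats["aligns_with_prev"] = bool(gaps & prev_gaps)
    return feats
```

On a data row like `  1993   9,589  15,704`, the digit percentage is high and the whitespace columns line up with the header row above it, which is exactly the evidence a table-data-row label needs.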
25. Table Extraction Experimental Results [Pinto, McCallum, Wei, Croft, SIGIR 2003]

                    Line labels (% correct)   Table segments (F1)
HMM                 65                        64
Stateless MaxEnt    85                        -
CRF                 95                        92
26. IE from Research Papers [McCallum et al. 1999]
27. IE from Research Papers

Field-level F1:
Hidden Markov Models (HMMs)        75.6   [Seymore, McCallum, Rosenfeld 1999]
Support Vector Machines (SVMs)     89.7   [Han, Giles, et al. 2003]
Conditional Random Fields (CRFs)   93.9   [Peng, McCallum 2004]

Δ error: 40%
28. Named Entity Recognition
CRICKET - MILLNS SIGNS FOR BOLAND CAPE TOWN
1996-08-22 South African provincial side Boland
said on Thursday they had signed Leicestershire
fast bowler David Millns on a one year contract.
Millns, who toured Australia with England A in
1992, replaces former England all-rounder Phillip
DeFreitas as Boland's overseas professional.
Labels   Examples
PER      Yayuk Basuki, Innocent Butare
ORG      3M, KDP, Cleveland
LOC      Cleveland, Nirmal Hriday, The Oval
MISC     Java, Basque, 1,000 Lakes Rally
29. Automatically Induced Features [McCallum & Li, CoNLL 2003]

Index   Feature
0       inside-noun-phrase (o_t-1)
5       stopword (o_t)
20      capitalized (o_t+1)
75      word=the (o_t)
100     in-person-lexicon (o_t-1)
200     word=in (o_t+2)
500     word=Republic (o_t+1)
711     word=RBI (o_t) & header=BASEBALL
1027    header=CRICKET (o_t) & in-English-county-lexicon (o_t)
1298    company-suffix-word (firstmention_t+2)
4040    location (o_t) & POS=NNP (o_t) & capitalized (o_t) & stopword (o_t-1)
4945    moderately-rare-first-name (o_t-1) & very-common-last-name (o_t)
4474    word=the (o_t-2) & word=of (o_t)
30. Named Entity Extraction Results [McCallum & Li, CoNLL 2003]

Method                             F1
HMMs (BBN's Identifinder)          73
CRFs without Feature Induction     83
CRFs with Feature Induction        90
(feature induction based on likelihood gain)
31. Outline
- Examples of IE and Data Mining
- Motivate Joint Inference
- Brief introduction to Conditional Random Fields
- Joint inference examples:
  - Joint Labeling of Cascaded Sequences (Loopy Belief Propagation)
  - Joint Co-reference Resolution (Graph Partitioning)
  - Joint Co-reference with Weighted 1st-order Logic (MCMC)
  - Joint Relation Extraction and Data Mining (Bootstrapping)
- Ultimate application area: Rexa, a Web portal for researchers
32. Jointly Labeling Cascaded Sequences: Factorial CRFs [Sutton, Khashayar, McCallum, ICML 2004]
Cascaded layers: English words → part-of-speech → noun-phrase boundaries → named-entity tag
But errors cascade: the pipeline must be perfect at every stage to do well.
Joint prediction of part-of-speech and noun-phrase boundaries in newswire matches the cascaded approach's accuracy with only 50% of the training data.
Inference: Loopy Belief Propagation
36. Outline
- Examples of IE and Data Mining
- Motivate Joint Inference
- Brief introduction to Conditional Random Fields
- Joint inference examples:
  - Joint Labeling of Cascaded Sequences (Loopy Belief Propagation)
  - Joint Co-reference Resolution (Graph Partitioning)
  - Joint Co-reference with Weighted 1st-order Logic (MCMC)
  - Joint Relation Extraction and Data Mining (Bootstrapping)
- Ultimate application area: Rexa, a Web portal for researchers
37. Joint Co-reference Among All Pairs: Affinity Matrix CRF
(Entity resolution / object correspondence)
Pairwise Y/N co-reference decisions with affinity scores, e.g.:
  "... Mr Powell ..." / "... Powell ...": 45
  "... Powell ..." / "... she ...": -99
  "... Mr Powell ..." / "... she ...": 11
25% reduction in error on co-reference of proper nouns in newswire.
Inference: correlational clustering / graph partitioning [Bansal, Blum, Chawla 2002]
[McCallum, Wellner, IJCAI WS 2003, NIPS 2004]
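Graph partitioning is what turns the pairwise scores into a consistent clustering. A toy greedy sketch (a simplified stand-in for the correlational-clustering objective; function and variable names are mine):

```python
def greedy_partition(mentions, score):
    """Greedily cluster mentions: put each mention into the existing cluster
    with the highest total affinity if that total is positive, else start a
    new cluster. A simplified stand-in for correlational clustering."""
    clusters = []
    for m in mentions:
        best, best_s = None, 0.0
        for c in clusters:
            s = sum(score(m, other) for other in c)
            if s > best_s:
                best, best_s = c, s
        if best is None:
            clusters.append([m])
        else:
            best.append(m)
    return clusters
```

On the scores above ("Mr Powell"/"Powell": 45, "Mr Powell"/"she": 11, "Powell"/"she": -99), the cluster total 11 + (-99) keeps "she" out even though one pairwise score is positive, which is exactly the kind of inconsistency that independent pairwise decisions cannot avoid.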
40. Joint Co-reference for Multiple Entity Types [Culotta & McCallum 2005]
People: "Stuart Russell", "Stuart Russell", "S. Russel". Organizations: "University of California at Berkeley", "Berkeley", "Berkeley".
Pairwise Y/N decisions within each type are coupled across types, so resolving the people and organization clusterings jointly lets each inform the other.
Reduces error by 22%.
41. Joint Co-reference Experimental Results [Culotta & McCallum 2005]
CiteSeer dataset: 1500 citations, 900 unique papers, 350 unique venues.

                Paper              Venue
                indep    joint     indep    joint
constraint      88.9     91.0      79.4     94.1
reinforce       92.2     92.2      56.5     60.1
face            88.2     93.7      80.9     82.8
reason          97.4     97.0      75.6     79.5
Micro Average   91.7     93.4      73.1     79.1

Δ error: 20% (paper), 22% (venue)
42. Joint Segmentation and Co-reference
Extraction from and matching of research paper citations: jointly model segmentation (s) of observed citation strings (o), citation attributes / database field values (c), and co-reference decisions (y), together with world knowledge.
  Laurel, B. Interface Agents: Metaphors with Character, in The Art of Human-Computer Interface Design, B. Laurel (ed), Addison-Wesley, 1990.
  Brenda Laurel. Interface Agents: Metaphors with Character, in Laurel, The Art of Human-Computer Interface Design, 355-366, 1990.
35% reduction in co-reference error by using segmentation uncertainty.
6-14% reduction in segmentation error by using co-reference.
Inference: Sparse Generalized Belief Propagation
[Wellner, McCallum, Peng, Hay, UAI 2004; see also Marthi, Milch, Russell 2003; Pal, Sutton, McCallum 2005]
43. Outline
- Examples of IE and Data Mining
- Motivate Joint Inference
- Brief introduction to Conditional Random Fields
- Joint inference examples:
  - Joint Labeling of Cascaded Sequences (Loopy Belief Propagation)
  - Joint Co-reference Resolution (Graph Partitioning)
  - Joint Co-reference with Weighted 1st-order Logic (MCMC)
  - Joint Relation Extraction and Data Mining (Bootstrapping)
- Ultimate application area: Rexa, a Web portal for researchers
44. Sometimes pairwise comparisons are not enough.
- Entities have multiple attributes (name, email, institution, location); we need to measure compatibility among them.
- Having 2 given names is common, but not 4.
- Need to measure the size of the clusters of mentions.
- Does there exist a pair of last-name strings that differ by more than 5?
- We need measures on hypothesized entities.
- We need first-order logic.
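Such cluster-level measures are functions of a whole hypothesized entity rather than of a pair; a small illustrative sketch (feature names are mine, not the model's actual predicates):

```python
def cluster_features(mentions):
    """First-order (cluster-level) features that no pairwise model can see.
    Illustrative names; a real model would use many such predicates."""
    first_names = {m.split()[0] for m in mentions}
    return {
        "num_mentions": len(mentions),
        "num_first_names": len(first_names),
        # having 2 given names is common, but not 4
        "many_first_names": len(first_names) > 2,
    }
```

A cluster hypothesized to contain "Howard Dean", "H Dean", "Dean Martin" and "Howard Martin" trips the many-first-names predicate, evidence that the cluster conflates two people, even though every individual pair of mentions shares a name token.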
45. Toward High-Order Representations: Identity Uncertainty
..Howard Dean..
..H Dean..
..Dean Martin..
..Howard Martin..
..Dino..
..Howard..
47. Pairwise Co-reference Features
Howard Dean
Dean Martin
Howard Martin
48. Cluster-wise (Higher-Order) Representations
Howard Dean
SamePerson(Howard Dean, Howard Martin,
Dean Martin)?
Dean Martin
Howard Martin
49. Cluster-wise (Higher-Order) Representations
Dino
Martin
Dean Martin
Howard Dean
Howard Martin
Howie
50. This space complexity is common in first-order probabilistic models.
51. Markov Logic (Weighted 1st-order Logic): Using 1st-order Logic as a Template to Construct a CRF [Richardson & Domingos 2005]
Grounding the Markov network requires space O(n^r), where n is the number of constants and r is the highest clause arity.
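To see why grounding blows up, just count the ground clauses; a tiny sketch (function name is mine, not from the talk):

```python
def num_groundings(n_constants: int, clause_arities: list) -> int:
    """Number of ground clauses when each clause's r logical variables
    each range over n constants: sum of n**r, i.e. O(n**r_max)."""
    return sum(n_constants ** r for r in clause_arities)
```

With only 100 constants, a single arity-3 clause (e.g. a transitivity rule over mentions) already yields a million groundings, which is why inference over the fully ground network is infeasible for realistic coreference problems.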
52. How can we perform inference and learning in models that cannot be grounded?
53. Inference in First-Order Models: SAT Solvers
- Weighted SAT solvers [Kautz et al. 1997]: require complete grounding of the network.
- LazySAT [Singla & Domingos 2006]: saves memory by only storing clauses that may become unsatisfied, but still requires exponential time to visit all ground clauses at initialization.
54. Inference in First-Order Models: Sampling
- Gibbs sampling: difficult to move between high-probability configurations by changing single variables (although, consider MC-SAT [Poon & Domingos 2006]).
- An alternative: Metropolis-Hastings sampling [Culotta & McCallum 2006]
  - Can be extended to partial configurations; only instantiate relevant variables.
  - Successfully used in BLOG models [Milch et al. 2005].
  - Two parts: proposal distribution, acceptance distribution.
55. Learning in First-Order Models
- Sampling
- Pseudo-likelihood
- Voted perceptron
We propose:
- A conditional model to rank configurations
- An intuitive objective function for Metropolis-Hastings
56. Contributions
- Metropolis-Hastings sampling in an undirected model with first-order features
- Discriminative training for Metropolis-Hastings
57. An Undirected Model of Identity Uncertainty
58. Toward High-Order Representations: Identity Uncertainty
Dino
Martin
Dean Martin
Howard Dean
Howard Martin
Howie
59. Model
First-order features over hypothesized clusters such as {Dean Martin, Dino}, {Howard Martin, Howie Martin}, {Howard Dean, Governor Howie}:
- f_w: within-cluster features, SamePerson(x)
- f_b: between-cluster features, DifferentPerson(x, x')
60. Model
Clusters: {Howard Martin, Howie Martin}, {Howard Dean, Governor Howie}, {Dean Martin, Dino}
61. Model
Z_X: a sum over all possible configurations!
62-64. Proposal Distribution
From the current configuration y, e.g. clusters {Dean Martin, Howie Martin} and {Howard Martin, Dino}, propose a modified configuration y' by reassigning mentions between clusters.
65. Inference with Metropolis-Hastings
- y': proposed configuration
- p(y')/p(y): likelihood ratio, a ratio of P(Y|X), so Z_X cancels
- q(y'|y): proposal distribution, the probability of proposing the move y → y'
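A minimal sketch of one such sampling step (names are mine, not the paper's code; log_p is the unnormalized log-probability, which suffices because Z_X cancels in the ratio):

```python
import math
import random

def metropolis_hastings_step(y, propose, log_p):
    """One Metropolis-Hastings step over configurations.
    `propose(y)` returns (y_new, log q(y'|y), log q(y|y'));
    `log_p(y)` is the unnormalized log-probability of a configuration."""
    y_new, log_q_fwd, log_q_bwd = propose(y)
    # acceptance ratio: [p(y')/p(y)] * [q(y|y')/q(y'|y)], in log space
    log_alpha = (log_p(y_new) - log_p(y)) + (log_q_bwd - log_q_fwd)
    if math.log(random.random()) < min(0.0, log_alpha):
        return y_new   # accept the proposed configuration
    return y           # reject: stay at the current configuration
```

Because each move only touches the clusters the proposal changed, only the factors involving those clusters need to be re-evaluated; the model is never grounded in full.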
66. Learning the Likelihood Ratio
Given a pair of configurations, learn to rank the better configuration higher.
67. Learning the Likelihood Ratio
S(Y): true evaluation of a configuration (e.g. F1)
68. Sampling Training Examples
- Run sampler on training data
- Generate training example for each proposed move
- Iteratively retrain during sampling
69. Tying Parameters with the Proposal Distribution
- Proposal distribution q(y'|y): a cheap approximation to p(y')
- Reuse a subset of the parameters of p(y)
- E.g. in the identity uncertainty model: sample two clusters, then run stochastic agglomerative clustering to propose a new configuration
70. Experiments
71. Simplified Model
- Use only within-cluster factors.
- Inference with agglomerative clustering.
Example clusters: {Dean Martin, Dino}, {Howard Martin, Howie Martin}
72. Experiments
- Paper citation coreference
- Author coreference
- First-order features:
  - All Titles Match
  - Exists Year Mismatch
  - Average String Edit Distance > X
  - Number of mentions
73. Results on Citation Data
Citeseer paper coreference results (pairwise F1); author coreference results (pairwise F1).
74. Outline
- Examples of IE and Data Mining
- Motivate Joint Inference
- Brief introduction to Conditional Random Fields
- Joint inference examples:
  - Joint Labeling of Cascaded Sequences (Loopy Belief Propagation)
  - Joint Co-reference Resolution (Graph Partitioning)
  - Joint Co-reference with Weighted 1st-order Logic (MCMC)
  - Joint Relation Extraction and Data Mining (Bootstrapping)
- Ultimate application area: Rexa, a Web portal for researchers
75. Motivation: Robust Relation Extraction
  "George W. Bush graduated from Yale"
  "George W. Bush attended Yale"
  "Bill Clinton attended Yale. Fellow alumnus George W. Bush ..."
  "Yale is located in New Haven. When George W. Bush visited ..."
- Classifier with contextual and external features (thesaurus)
- What about relations with sparse, noisy, or complex contextual evidence?
- How to learn predictive relational patterns?
- Knowledge discovery from text: mine the web to discover unknown facts
76. Data
- 270 Wikipedia articles
- 1000 paragraphs
- 4700 relations
- 52 relation types: JobTitle, BirthDay, Friend, Sister, Husband, Employer, Cousin, Competition, Education, ...
- Targeted for density of relations: Bush/Kennedy/Manning/Coppola families and friends
78. Relation Extraction as Named-Entity Recognition + Classification
  "George W. Bush and his father, George H. W. Bush, ..."
79. Relation Extraction as Named-Entity Recognition + Classification
Difficulties with this approach:
- must enumerate all pairs of entities in the document
- low signal/noise
- errors in NER: if "Ford" is mislabeled as a company, it won't be part of a "brother" relation
80. Relation Extraction as Sequence Labeling
  "George W. Bush ... the son of George H. W. Bush ..."
- Most entities are related to the subject.
- Folds together NER and relation extraction.
- Models dependency of adjacent relations ("Austrian physicist ...": nationality, jobTitle).
- Lots of work on sequence labeling: HMMs, CRFs, ...
81. CRF Features
- Context words
- Lexicons (cities, states, names, companies)
- Regexps (capitalization, ContainsDigits, ContainsPunctuation)
- Part-of-speech
- Prefixes/suffixes
- Conjunctions of these within a window of size 6
82. Example Features
- Father: "son of NAME", "father, NAME"
- Brother: "his/her brother X"
- Executive: "JOBTITLE of X"
- Birthday: "born MONTH [0-9]"
- Boss: "under JOBTITLE X"
- Competition: "defeating NAME"
- Award: "awarded DET X", "won DET X"
84. Mining Relational Features
- Want to discover database regularities across documents that provide strong evidence of a relation.
- High precision: parent(x,z) ∧ sibling(z,w) ∧ child(w,y) ⇒ cousin(x,y)
- Low precision: friends tend to attend the same schools
85. Mining Relational Features
- Generate relational path features from the extracted (or true) database: paths between entities up to length k.
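Path enumeration up to length k can be sketched as a bounded walk over the triple store (illustrative code, not the paper's implementation; inverse edges are marked with ^-1):

```python
def path_features(db, x, y, k=3):
    """Enumerate relation-name paths of length <= k from entity x to entity y
    in a DB of (head, relation, tail) triples. Each path is a candidate
    relational feature such as ('father', 'sister', 'son')."""
    edges = {}
    for h, r, t in db:
        edges.setdefault(h, []).append((r, t))
        edges.setdefault(t, []).append((r + "^-1", h))  # allow inverse steps
    paths = []

    def walk(node, path):
        if node == y and path:
            paths.append(tuple(path))
        if len(path) == k:        # bound the walk; cycles cannot run forever
            return
        for r, nxt in edges.get(node, []):
            walk(nxt, path + [r])

    walk(x, [])
    return paths
```

On the Bush example, the triples father(GWB, GHWB), sister(GHWB, Nancy Ellis Bush) and son(Nancy Ellis Bush, John Prescott Ellis) yield the path (father, sister, son), i.e. the mined cousin pattern.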
86. Example: Cousin = Father's Sister's Son
George W. Bush's father is George H. W. Bush; George H. W. Bush's sister is Nancy Ellis Bush; Nancy Ellis Bush's son is John Prescott Ellis; so John Prescott Ellis is George W. Bush's cousin.
87. "John Kerry celebrated with Stuart Forbes": likely a cousin
88. Iterative DB Construction
- Joseph P. Kennedy, Sr.: son John F. Kennedy, with Rose Fitzgerald (confidence 0.3)
89. Results
(ME = maximum entropy; CRF = conditional random field; RCRF = CRF + mined relational features)
90. Examples of Discovered Relational Features
- Mother: Father → Wife
- Cousin: Mother → Husband → Nephew
- Friend: Education → Student
- Education: Father → Education
- Boss: Boss → Son
- MemberOf: Grandfather → MemberOf
- Competition: PoliticalParty → Member → Competition
91. Outline
- Examples of IE and Data Mining
- Motivate Joint Inference
- Brief introduction to Conditional Random Fields
- Joint inference examples:
  - Joint Labeling of Cascaded Sequences (Loopy Belief Propagation)
  - Joint Co-reference Resolution (Graph Partitioning)
  - Joint Co-reference with Weighted 1st-order Logic (MCMC)
  - Joint Relation Extraction and Data Mining (Bootstrapping)
- Ultimate application area: Rexa, a Web portal for researchers
92. Mining our Research Literature
- Better understand the structure of our own research area.
- Structure helps us learn a new field.
- Aid collaboration.
- Map how ideas travel through social networks of researchers.
- Aids for hiring and finding reviewers!
93-95. Previous Systems
Entities: Research Paper. Relations: Cites.
96. More Entities and Relations
Entities: Research Paper, Person, Grant, University, Venue, Groups. Relations: Cites, Expertise.
115. Topical Transfer [Mann, Mimno, McCallum, JCDL 2006]
Citation counts from one topic to another. Map producers and consumers.
116. Impact Diversity
Topic diversity = entropy of the distribution of citing topics.
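The diversity measure is just Shannon entropy over the citing-topic distribution; a small sketch (function name is mine, not from the talk):

```python
import math

def topic_diversity(citing_counts):
    """Entropy (in bits) of the distribution of citing topics: higher
    entropy means citations come from a broader mix of topics."""
    total = sum(citing_counts)
    probs = [c / total for c in citing_counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)
```

A paper cited equally by four topics scores 2 bits; a paper cited by a single topic scores 0, so the measure separates broadly influential work from narrowly influential work.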
117. Summary
- Joint inference is needed to avoid cascading errors in information extraction and data mining.
- Challenge: making inference and learning scale to massive graphical models; our approach is Markov chain Monte Carlo.
- Rexa: a new research paper search engine, mining the interactions in our community.