Title: Latent Variable Models of Social Networks and Text
1Latent Variable Models of Social Networks and Text
- Andrew McCallum
- Computer Science Department
- University of Massachusetts Amherst
Joint work with Xuerui Wang, Natasha Mohanty, Andres Corrada, Chris Pal, Wei Li, David Mimno and Gideon Mann.
2Social Network in an Email Dataset
3Outline
Social Network Analysis with Topic Models
- Role Discovery (Author-Recipient-Topic Model, ART)
- Group Discovery (Group-Topic Model, GT)
- Enhanced Topic Models
- Time Localized Topics (Topics-over-Time Model, TOT)
- Time Localized Groups (Groups-over-Time Model, GOT)
- Markov Dependencies in Topics (Topical N-Grams Model, TNG)
- Bibliometric Impact Transfer Measures using Topics
Multi-Conditional Mixtures (AAAI 2006)
4Clustering words into topics with Latent Dirichlet Allocation
Blei, Ng, Jordan 2003
Generative process (a mixed membership model), with a running example:
- For each document, sample a distribution over topics, θ (a multinomial over topics). Example: 70% Iraq war, 30% US election.
- For each word in the document, sample a topic, z. Example: Iraq war.
- Sample a word, w, from the topic (a per-topic multinomial over words). Example: bombing.
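The generative process on this slide can be sketched in a few lines. This is only an illustration: the six-word vocabulary, the two topic-word distributions in `phi`, and the Dirichlet parameter `alpha` are made-up values, not taken from the talk.

```python
import numpy as np

def generate_document(n_words, alpha, phi, rng):
    """Sample one document following the LDA generative process."""
    # Per-document distribution over topics (theta).
    theta = rng.dirichlet(alpha)
    # One topic index z per word, drawn from theta.
    topics = rng.choice(len(alpha), size=n_words, p=theta)
    # One word index w per topic, drawn from that topic's multinomial over words.
    words = np.array([rng.choice(phi.shape[1], p=phi[z]) for z in topics])
    return theta, topics, words

# Hypothetical vocabulary and topics, for illustration only.
vocab = ["bombing", "troops", "baghdad", "ballot", "candidate", "poll"]
phi = np.array([
    [0.5, 0.3, 0.2, 0.0, 0.0, 0.0],  # an "Iraq war"-like topic
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],  # a "US election"-like topic
])
alpha = np.array([0.7, 0.3])

rng = np.random.default_rng(0)
theta, topics, words = generate_document(20, alpha, phi, rng)
print([vocab[w] for w in words])
```

Because the two illustrative topics have disjoint vocabularies, every sampled word reveals which topic produced it; real topics overlap, which is what makes inference necessary.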
5Example topics induced from a large collection of text
JOB WORK JOBS CAREER EXPERIENCE EMPLOYMENT OPPORTUNITIES WORKING TRAINING SKILLS CAREERS POSITIONS FIND POSITION FIELD OCCUPATIONS REQUIRE OPPORTUNITY EARN ABLE
SCIENCE STUDY SCIENTISTS SCIENTIFIC KNOWLEDGE WORK RESEARCH CHEMISTRY TECHNOLOGY MANY MATHEMATICS BIOLOGY FIELD PHYSICS LABORATORY STUDIES WORLD SCIENTIST STUDYING SCIENCES
BALL GAME TEAM FOOTBALL BASEBALL PLAYERS PLAY FIELD PLAYER BASKETBALL COACH PLAYED PLAYING HIT TENNIS TEAMS GAMES SPORTS BAT TERRY
FIELD MAGNETIC MAGNET WIRE NEEDLE CURRENT COIL POLES IRON COMPASS LINES CORE ELECTRIC DIRECTION FORCE MAGNETS BE MAGNETISM POLE INDUCED
STORY STORIES TELL CHARACTER CHARACTERS AUTHOR READ TOLD SETTING TALES PLOT TELLING SHORT FICTION ACTION TRUE EVENTS TELLS TALE NOVEL
MIND WORLD DREAM DREAMS THOUGHT IMAGINATION MOMENT THOUGHTS OWN REAL LIFE IMAGINE SENSE CONSCIOUSNESS STRANGE FEELING WHOLE BEING MIGHT HOPE
DISEASE BACTERIA DISEASES GERMS FEVER CAUSE CAUSED SPREAD VIRUSES INFECTION VIRUS MICROORGANISMS PERSON INFECTIOUS COMMON CAUSING SMALLPOX BODY INFECTIONS CERTAIN
WATER FISH SEA SWIM SWIMMING POOL LIKE SHELL SHARK TANK SHELLS SHARKS DIVING DOLPHINS SWAM LONG SEAL DIVE DOLPHIN UNDERWATER
Tenenbaum et al.
7From LDA to Author-Recipient-Topic (ART)
McCallum et al 2005
8Inference and Estimation
- Gibbs Sampling
- Easy to implement
- Reasonably fast
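To give a sense of why Gibbs sampling is described as easy to implement, here is a minimal collapsed Gibbs sampler for plain LDA. It is a sketch of the simpler model, not the ART sampler from the talk (which would also resample a recipient for each word); the hyperparameters `alpha` and `beta`, the iteration count, and the toy corpus are assumptions chosen for illustration.

```python
import numpy as np

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of lists of word ids."""
    rng = np.random.default_rng(seed)
    doc_topic = np.zeros((len(docs), n_topics))    # words in doc d assigned to topic k
    topic_word = np.zeros((n_topics, vocab_size))  # times word w is assigned to topic k
    topic_total = np.zeros(n_topics)               # total words assigned to topic k
    # Random initial topic assignment for every token.
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            doc_topic[d, k] += 1
            topic_word[k, w] += 1
            topic_total[k] += 1
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                doc_topic[d, k] -= 1
                topic_word[k, w] -= 1
                topic_total[k] -= 1
                # Conditional distribution over topics given all other assignments.
                p = (doc_topic[d] + alpha) * (topic_word[:, w] + beta) \
                    / (topic_total + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                # Record the new assignment.
                z[d][i] = k
                doc_topic[d, k] += 1
                topic_word[k, w] += 1
                topic_total[k] += 1
    return z, doc_topic, topic_word

# Toy corpus of word ids (hypothetical data).
docs = [[0, 1, 0, 1, 2], [3, 4, 3, 4, 5], [0, 0, 1, 2], [4, 5, 5, 3]]
z, doc_topic, topic_word = lda_gibbs(docs, n_topics=2, vocab_size=6)
print(doc_topic)
```

The document-topic and topic-word multinomials never appear explicitly: they are integrated out, and only count tables are updated.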
9Enron Email Corpus
- 250k email messages
- 23k people
Date: Wed, 11 Apr 2001 06:56:00 -0700 (PDT)
From: debra.perlingiere_at_enron.com
To: steve.hooser_at_enron.com
Subject: Enron/TransAlta Contract dated Jan 1, 2001
Please see below. Katalin Kiss of TransAlta has requested an electronic copy of our final draft? Are you OK with this? If so, the only version I have is the original draft without revisions.
DP
Debra Perlingiere
Enron North America Corp.
Legal Department
1400 Smith Street, EB 3885
Houston, Texas 77002
dperlin_at_enron.com
10Topics, and prominent senders / receivers discovered by ART
Topic names, by hand
11Topics, and prominent senders / receivers discovered by ART
- Beck: Chief Operations Officer
- Dasovich: Government Relations Executive
- Shapiro: Vice President of Regulatory Affairs
- Steffes: Vice President of Government Affairs
12Comparing Role Discovery
Traditional SNA
Author-Topic
ART
connection strength (A,B)
distribution over recipients
distribution over authored topics
distribution over authored topics
13Comparing Role Discovery: Tracy Geaconne and Dan McCarty
Traditional SNA
Author-Topic
ART
Different roles
Different roles
Similar roles
Geaconne: Secretary. McCarty: Vice President.
14Comparing Role Discovery: Lynn Blair and Kimberly Watson
Traditional SNA
Author-Topic
ART
Very different
Very similar
Different roles
Blair: Gas pipeline logistics. Watson: Pipeline facilities planning.
15McCallum Email Corpus 2004
- January - October 2004
- 23k email messages
- 825 people
From: kate_at_cs.umass.edu
Subject: NIPS and ....
Date: June 14, 2004 2:27:41 PM EDT
To: mccallum_at_cs.umass.edu
There is pertinent stuff on the first yellow folder that is completed either travel or other things, so please sign that first folder anyway. Then, here is the reminder of the things I'm still waiting for:
NIPS registration receipt.
CALO registration receipt.
Thanks,
Kate
16Four most prominent topics in discussions with ____?
18Two most prominent topics in discussions with ____?
20Role-Author-Recipient-Topic Models
21Results with RART: People in Role 3 in Academic Email
- olc: lead Linux sysadmin
- gauthier: sysadmin for CIIR group
- irsystem: mailing list for CIIR sysadmins
- system: mailing list for dept. sysadmins
- allan: Prof., chair of computing committee
- valerie: second Linux sysadmin
- tech: mailing list for dept. hardware
- steve: head of dept. I.T. support
22Roles for allan (James Allan)
- Role 3: I.T. support
- Role 2: Natural Language researcher
Roles for pereira (Fernando Pereira)
- Role 2: Natural Language researcher
- Role 4: SRI CALO project participant
- Role 6: Grant proposal writer
- Role 10: Grant proposal coordinator
- Role 8: Guests at McCallum's house
23ART Roles but not Groups
Traditional SNA
Author-Topic
ART
Not
Not
Block structured
Enron TransWestern Division
25Groups and Topics
- Input
- Observed relations between people
- Attributes on those relations (text, or categorical)
- Output
- Attributes clustered into topics
- Groups of people, varying depending on topic
26Discovering Groups from Observed Set of Relations
Student Roster: Adams, Bennett, Carter, Davis, Edwards, Frederking
Academic Admiration: Acad(A, B), Acad(C, B), Acad(A, D), Acad(C, D), Acad(B, E), Acad(D, E), Acad(B, F), Acad(D, F), Acad(E, A), Acad(F, A), Acad(E, C), Acad(F, C)
Admiration relations among six high school students.
27Adjacency Matrix Representing Relations
[Figure: adjacency matrices of the Academic Admiration relations among the six students. In roster order (A B C D E F) the group labels are G1 G2 G1 G2 G3 G3. With rows and columns reordered as A C B D E F, the labels become G1 G1 G2 G2 G3 G3 and the matrix shows block structure.]
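The effect of reordering can be reproduced directly from the twelve Academic Admiration relations. The sketch below builds the adjacency matrix in roster order and again in group order; the helper name `adjacency` is an illustrative choice.

```python
import numpy as np

students = ["A", "B", "C", "D", "E", "F"]
# Acad(X, Y) relations exactly as listed on the slide.
acad = [("A", "B"), ("C", "B"), ("A", "D"), ("C", "D"),
        ("B", "E"), ("D", "E"), ("B", "F"), ("D", "F"),
        ("E", "A"), ("F", "A"), ("E", "C"), ("F", "C")]

def adjacency(relations, order):
    """Return a 0/1 matrix with rows and columns arranged according to order."""
    index = {name: i for i, name in enumerate(order)}
    matrix = np.zeros((len(order), len(order)), dtype=int)
    for source, target in relations:
        matrix[index[source], index[target]] = 1
    return matrix

roster_matrix = adjacency(acad, students)
grouped_matrix = adjacency(acad, ["A", "C", "B", "D", "E", "F"])
print(roster_matrix)
print(grouped_matrix)
```

In the grouped ordering every relation falls into one of three solid 2x2 blocks (G1 to G2, G2 to G3, G3 to G1), which is the pattern a group model tries to recover when the grouping is unknown.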
28Group Model: Partitioning Entities into Groups
Stochastic Blockstructures for Relations (Nowicki, Snijders 2001)
[Plate diagram labels: Beta, Dirichlet, Multinomial, Binomial. S: number of entities. G: number of groups.]
Enhanced with arbitrary number of groups in Kemp, Griffiths, Tenenbaum 2004
29Two Relations with Different Attributes
Student Roster: Adams, Bennett, Carter, Davis, Edwards, Frederking
Academic Admiration: Acad(A, B), Acad(C, B), Acad(A, D), Acad(C, D), Acad(B, E), Acad(D, E), Acad(B, F), Acad(D, F), Acad(E, A), Acad(F, A), Acad(E, C), Acad(F, C)
Social Admiration: Soci(A, B), Soci(A, D), Soci(A, F), Soci(B, A), Soci(B, C), Soci(B, E), Soci(C, B), Soci(C, D), Soci(C, F), Soci(D, A), Soci(D, C), Soci(D, E), Soci(E, B), Soci(E, D), Soci(E, F), Soci(F, A), Soci(F, C), Soci(F, E)
[Figure: two adjacency matrices. Academic Admiration, ordered A C B D E F, has group labels G1 G1 G2 G2 G3 G3. Social Admiration, ordered A C E B D F, has group labels G1 G1 G1 G2 G2 G2.]
30The Group-Topic Model: Discovering Groups and Topics Simultaneously
Wang, Mohanty, McCallum 2006
[Plate diagram labels: Beta, Uniform, Dirichlet, Multinomial, Dirichlet, Binomial, Multinomial]
31Inference and Estimation
- Gibbs Sampling
- Many r.v.s can be integrated out
- Easy to implement
- Reasonably fast
We assume the relationship is symmetric.
32Dataset 1: U.S. Senate
- 16 years of voting records in the US Senate (1989-2005)
- a Senator may respond Yea or Nay to a resolution
- 3423 resolutions with text attributes (index terms)
- 191 Senators in total across 16 years
S.543
Title: An Act to reform Federal deposit insurance, protect the deposit insurance funds, recapitalize the Bank Insurance Fund, improve supervision and regulation of insured depository institutions, and for other purposes.
Sponsor: Sen Riegle, Donald W., Jr. MI (introduced 3/5/1991)
Cosponsors (2)
Latest Major Action: 12/19/1991 Became Public Law No 102-242.
Index terms: Banks and banking, Accounting, Administrative fees, Cost control, Credit, Deposit insurance, Depressed areas, and other 110 terms
Adams (D-WA), Nay; Akaka (D-HI), Yea; Bentsen (D-TX), Yea; Biden (D-DE), Yea; Bond (R-MO), Yea; Bradley (D-NJ), Nay; Conrad (D-ND), Nay
33Topics Discovered (U.S. Senate)
Mixture of Unigrams
Education | Energy | Military Misc. | Economic
education | energy | government | federal
school | power | military | labor
aid | water | foreign | insurance
children | nuclear | tax | aid
drug | gas | congress | tax
students | petrol | aid | business
elementary | research | law | employee
prevention | pollution | policy | care
Group-Topic Model
Education Domestic | Foreign | Economic | Social Security Medicare
education | foreign | labor | social
school | trade | insurance | security
federal | chemicals | tax | insurance
aid | tariff | congress | medical
government | congress | income | care
tax | drugs | minimum | medicare
energy | communicable | wage | disability
research | diseases | business | assistance
34Groups Discovered (US Senate)
Groups from topic Education Domestic
35Senators Who Change Coalition the Most, Dependent on Topic
e.g. Senator Shelby (D-AL) votes with the Republicans on Economic, with the Democrats on Education Domestic, and with a small group of maverick Republicans on Social Security Medicare
36Dataset 2: The UN General Assembly
- Voting records of the UN General Assembly (1990-2003)
- A country may choose to vote Yes, No or Abstain
- 931 resolutions with text attributes (titles)
- 192 countries in total
- Also experiments later with resolutions from 1960-2003
Vote on Permanent Sovereignty of Palestinian People, 87th plenary meeting
The draft resolution on permanent sovereignty of the Palestinian people in the occupied Palestinian territory, including Jerusalem, and of the Arab population in the occupied Syrian Golan over their natural resources (document A/54/591) was adopted by a recorded vote of 145 in favour to 3 against with 6 abstentions.
In favour: Afghanistan, Argentina, Belgium, Brazil, Canada, China, France, Germany, India, Japan, Mexico, Netherlands, New Zealand, Pakistan, Panama, Russian Federation, South Africa, Spain, Turkey, and other 126 countries.
Against: Israel, Marshall Islands, United States.
Abstain: Australia, Cameroon, Georgia, Kazakhstan, Uzbekistan, Zambia.
37Topics Discovered (UN)
Mixture of Unigrams
Everything Nuclear | Human Rights | Security in Middle East
nuclear | rights | occupied
weapons | human | israel
use | palestine | syria
implementation | situation | security
countries | israel | calls
Group-Topic Model
Nuclear Non-proliferation | Nuclear Arms Race | Human Rights
nuclear | nuclear | rights
states | arms | human
united | prevention | palestine
weapons | race | occupied
nations | space | israel
38Groups Discovered (UN)
The country list for each group is ordered by 2005 GDP (PPP), and only 5 countries are shown for groups that have more than 5 members.
39Do We Get Better Groups with the GT Model?
Baseline Model
- Cluster bills into topics using mixture of unigrams
- Apply group model on topic-specific subsets of bills.
GT Model
- Jointly cluster topics and groups at the same time using the GT model.
Datasets | Avg. AI for Baseline | Avg. AI for GT | p-value
Senate | 0.8198 | 0.8294 | < .01
UN | 0.8548 | 0.8664 | < .01
Agreement Index (AI) measures group cohesion. Higher is better.
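The slide does not define the Agreement Index. The sketch below assumes the definition commonly used in the voting-cohesion literature, where a group's index on one vote is the size of the largest bloc minus half the remaining votes, divided by the total; treat that formula as an assumption about what was computed here.

```python
def agreement_index(yes, no, abstain):
    """Cohesion of one group on one vote: 1.0 if unanimous, 0.0 if split evenly three ways."""
    total = yes + no + abstain
    if total == 0:
        raise ValueError("group cast no votes")
    largest = max(yes, no, abstain)
    # Assumed definition: (largest bloc - half of everyone else) / total.
    return (largest - 0.5 * (total - largest)) / total

print(agreement_index(10, 0, 0))  # unanimous group
print(agreement_index(8, 2, 0))   # mostly cohesive group
print(agreement_index(5, 5, 5))   # evenly split group
```

Averaging this value over all groups and all votes would give a single figure per model, comparable to the table entries.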
40Groups and Topics, Trends over Time (UN)
42Want to Model Trends over Time
- Is prevalence of topic growing or waning?
- Pattern appears only briefly
- Capture its statistics in focused way
- Don't confuse it with patterns elsewhere in time
- How do roles, groups, influence shift over time?
43Topics over Time (TOT)
Wang, McCallum, KDD 2006
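[Plate diagram: a Dirichlet prior over each document's multinomial over topics; a topic index z for each word; z generates the word w (per-topic multinomial over words, with a Dirichlet prior) and the timestamp t (per-topic Beta over time, with a uniform prior). Plates: T topics, Nd words in document d, D documents.]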
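The diagram can be read as a generative process in which each word's topic emits both a word and a timestamp. Below is a minimal sketch, assuming timestamps are normalized to the interval [0, 1]; the parameter values (`alpha`, `phi`, `psi`) are invented for illustration.

```python
import numpy as np

def generate_tot_document(n_words, alpha, phi, psi, rng):
    """Sample topics, words, and timestamps for one document."""
    theta = rng.dirichlet(alpha)                            # multinomial over topics
    z = rng.choice(len(alpha), size=n_words, p=theta)       # topic index per word
    w = np.array([rng.choice(phi.shape[1], p=phi[k]) for k in z])  # word from topic
    t = rng.beta(psi[z, 0], psi[z, 1])                      # timestamp from the topic's Beta
    return z, w, t

# Hypothetical parameters: two topics over a four-word vocabulary.
alpha = np.array([0.5, 0.5])
phi = np.array([[0.4, 0.4, 0.1, 0.1],
                [0.1, 0.1, 0.4, 0.4]])
psi = np.array([[2.0, 8.0],    # topic 0 concentrated early in the time range
                [8.0, 2.0]])   # topic 1 concentrated late in the time range

rng = np.random.default_rng(1)
z, w, t = generate_tot_document(15, alpha, phi, psi, rng)
print(np.round(t, 2))
```

A sharply peaked Beta lets a topic be localized in time, which is how a pattern that appears only briefly can be kept separate from patterns elsewhere in time.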
44State of the Union Address
208 Addresses delivered between January 8, 1790 and January 29, 2002.
- To increase the number of documents, we split the addresses into paragraphs and treated them as documents. One-line paragraphs were excluded. Stopword removal was applied.
- 17156 documents
- 21534 words
- 669,425 tokens
Our scheme of taxation, by means of which this needless surplus is taken from the people and put into the public Treasury, consists of a tariff or duty levied upon importations from abroad and internal-revenue taxes levied upon the consumption of tobacco and spirituous and malt liquors. It must be conceded that none of the things subjected to internal-revenue taxation are, strictly speaking, necessaries. There appears to be no just complaint of this taxation by the consumers of these articles, and there seems to be nothing so well able to bear the burden without hardship to any portion of the people.
1910
45Comparing TOT with LDA
46Sample Topic: Cold War
world nations united states peace free economic military soviet international security strength defense freedom europe force peoples efforts aggression today
47Comparing TOT against LDA
48TOT on 17 years of NIPS proceedings
49Topic Distributions Conditioned on Time
[Figure: topic mass (in vertical height) plotted against time]
50TOT on 17 years of NIPS proceedings
TOT
LDA
51TOT versus LDA on my email
52TOT improves ability to Predict Time
Predicting the year of a State-of-the-Union address.
L1 distance between predicted year and actual year.
53Discovering Group Structure Trends over Time
Group Model without Time
Group Model with Time
[Plate diagram labels: multinomial distribution over groups; group id; timestamp; observed relation; per-group Beta over time; per-group-pair binomial over relation absent / present; G groups.]
55Topics Modeling Phrases
- Topics based only on unigrams are often difficult to interpret
- Topic discovery itself is confused because important meaning / distinctions are carried by phrases.
56Topic Interpretability
LDA: algorithms, algorithm, genetic, problems, efficient
Topical N-grams: genetic algorithms, genetic algorithm, evolutionary computation, evolutionary algorithms, fitness function
57Topical N-gram Model
Wang, McCallum 2005
[Plate diagram: for each word position i = 1, 2, 3, 4, ..., a topic z_i, a uni- / bi-gram status y_i, and a word w_i. Unigram and bigram word distributions are kept per topic (plates T and W). D documents.]
58Features of Topical N-Grams model
- Easily trained by Gibbs sampling
- Can run efficiently on millions of words
- Topic-specific phrase discovery
- "white house" has special meaning as a phrase in the politics topic,
- ... but not in the real estate topic.
59Topic Comparison
LDA
Topical N-grams (2)
Topical N-grams (1)
policy action states actions function reward control agent q-learning optimal goal learning space step environment system problem steps sutton policies
learning optimal reinforcement state problems policy dynamic action programming actions function markov methods decision rl continuous spaces step policies planning
reinforcement learning optimal policy dynamic
programming optimal control function
approximator prioritized sweeping finite-state
controller learning system reinforcement learning
rl function approximators markov decision
problems markov decision processes local
search state-action pair markov decision
process belief states stochastic policy action
selection upright position reinforcement learning
methods
60Topic Comparison
LDA
Topical N-grams (2)
Topical N-grams (1)
motion response direction cells stimulus figure contrast velocity model responses stimuli moving cell intensity population image center tuning complex directions
motion visual field position figure direction fields eye location retina receptive velocity vision moving system flow edge center light local
receptive field spatial frequency temporal
frequency visual motion motion energy tuning
curves horizontal cells motion detection preferred
direction visual processing area mt visual
cortex light intensity directional
selectivity high contrast motion
detectors spatial phase moving stimuli decision
strategy visual stimuli
61Topic Comparison
LDA
Topical N-grams (2)
Topical N-grams (1)
speech word training system recognition hmm speaker performance phoneme acoustic words context systems frame trained sequence phonetic speakers mlp hybrid
word system recognition hmm speech training performance phoneme words context systems frame trained speaker sequence speakers mlp frames segmentation models
speech recognition training data neural
network error rates neural net hidden markov
model feature vectors continuous speech training
procedure continuous speech recognition gamma
filter hidden control speech production neural
nets input representation output layers training
algorithm test set speech frames speaker dependent
63Social Networks in Research Literature
- Better understand structure of our own research area.
- Structure helps us learn a new field.
- Aid collaboration
- Map how ideas travel through social networks of researchers.
- Aids for hiring and finding reviewers!
64Traditional Bibliometrics
- Analyses a small amount of data (e.g. 19 articles from a single issue of a journal)
- Uses journal as a proxy for research topic (but there is no journal for information extraction)
- Uses impact measures almost exclusively based on simple citation counts.
How can we use topic models to create new, interesting impact measures? Can we create a social network of scientific sub-fields?
65Our Data
- Over 1.6 million research papers, gathered as part of Rexa.info portal.
- Cross linked references / citations.
66Previous Systems
68Previous Systems
Cites
Research Paper
69More Entities and Relations
Expertise
Cites
Research Paper
Person
Grant
University
Venue
Groups
81Finding Topics with TNG
Traditional unigram LDA run on 1.6 million titles / abstracts (200 topics) ... select 300k papers on ML, NLP, robotics, vision ... Find 200 TNG topics among those papers.
82Topical Bibliometric Impact Measures
Mann, Mimno, McCallum, 2006
- Topical Citation Counts
- Topical Impact Factors
- Topical Longevity
- Topical Precedence
- Topical Diversity
- Topical Transfer
83Topical Diversity
Can also be measured on particular papers...
84Topical Diversity
Entropy of the topic distribution among papers
that cite this paper (this topic).
Low Diversity
High Diversity
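The entropy measure above can be computed in a few lines. This sketch assumes each citing paper has been assigned a single dominant topic, which is a simplification; the topic labels in the example are placeholders.

```python
import math
from collections import Counter

def topical_diversity(citing_topics):
    """Entropy (in bits) of the topic distribution among citing papers."""
    counts = Counter(citing_topics)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Cited only from within one topic: low diversity.
low = topical_diversity(["IR", "IR", "IR", "IR"])
# Cited evenly from four different topics: high diversity.
high = topical_diversity(["IR", "NLP", "Vision", "ML"])
print(low, high)
```

With soft topic assignments, the same formula would apply to the averaged topic proportions of the citing papers instead of hard counts.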
85Topical Transfer
Transfer from Digital Libraries to other topics
Other topic | Cits | Paper Title
Web Pages | 31 | Trawling the Web for Emerging Cyber-Communities, Kumar, Raghavan,... 1999.
Computer Vision | 14 | On being Undigital with digital cameras extending the dynamic...
Video | 12 | Lessons learned from the creation and deployment of a terabyte digital video
Graphs | 12 | Trawling the Web for Emerging Cyber-Communities
Web Pages | 11 | WebBase a repository of Web pages
86Topical Transfer
Citation counts from one topic to another.
Map producers and consumers
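Counting citations from one topic to another amounts to building a topic-by-topic count table. The sketch below assumes one topic label per paper; the paper ids and the citation list are hypothetical.

```python
from collections import Counter

def topical_transfer(citations, paper_topic):
    """Count citations keyed by (topic of citing paper, topic of cited paper)."""
    counts = Counter()
    for citing, cited in citations:
        counts[(paper_topic[citing], paper_topic[cited])] += 1
    return counts

# Hypothetical papers and citation links (citing paper, cited paper).
paper_topic = {
    "p1": "Digital Libraries",
    "p2": "Web Pages",
    "p3": "Web Pages",
    "p4": "Computer Vision",
}
citations = [("p2", "p1"), ("p3", "p1"), ("p4", "p1"), ("p3", "p2")]

transfer = topical_transfer(citations, paper_topic)
for (citing_topic, cited_topic), n in sorted(transfer.items()):
    print(f"{citing_topic} cites {cited_topic}: {n}")
```

Reading the table by cited topic shows which topics act as producers of ideas; reading it by citing topic shows the consumers.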
94Topical Transfer Through Time
- Can we predict which research topics will be hot at ICML next year?
- ...based on
- the hot topics in neighboring venues last year
- learned neighborhood distances for venue pairs
95How do Ideas Progress Through Social Networks?
Hypothetical Example
AdaBoost
SIGIR (Info. Retrieval)
COLT
ICML
ICCV (Vision)
ACL (NLP)
98How do Conferences Influence Each Other?
- Run LDA on research papers.
- For each year, create an agglomerated topic distribution for a particular conference
- Model the topic distribution of a conference by the topic distributions of related conferences
99Topic Prediction Models
Static Model
Transfer Model
Linear Regression and Ridge Regression Used for
Coefficient Training.
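One way to picture the transfer model is as a regression that predicts a venue's topic proportion this year from the proportions of the same topic at related venues last year. The sketch below fits such coefficients with closed-form ridge regression on synthetic data; the feature layout, the regularization strength `lam`, and the data itself are assumptions, not the setup used in the talk.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam * I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Synthetic data: each row is one (year, topic) pair; each column holds the
# topic's proportion at one neighboring venue in the previous year.
rng = np.random.default_rng(0)
X = rng.random((40, 3))
true_w = np.array([0.6, 0.3, 0.1])  # hypothetical influence of three venues
y = X @ true_w                      # target venue's proportion in the next year

w_ols = fit_ridge(X, y, lam=0.0)    # plain linear regression
w_ridge = fit_ridge(X, y, lam=1.0)  # ridge shrinks the coefficients
print(np.round(w_ols, 3), np.round(w_ridge, 3))
```

The fitted coefficients play the role of per-venue influence weights, and the ridge penalty keeps them stable when few years of data are available.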
100Preliminary Results
[Figure: Mean Squared Prediction Error (smaller is better) of the Transfer Model, by venues used for prediction]
Transfer Model with Ridge Regression is a good predictor
101Estimated Neighborhood Distances
Transfer into NIPS, 1988-1989
- ML: .079
- Neural Computation: .023
- UAI: -0.0035
- PAMI: .0998
- Theoretical CS: .0955
- AI: .032
- AAAI: .082
104Want a topic model with the advantages of CRFs
- Use arbitrary, overlapping features of the input.
- Undirected graphical model, so we don't have to think about avoiding cycles.
- Integrate naturally with our other CRF components.
- Train discriminatively
- Natural semi-supervised training
What does this mean? Topic models are unsupervised!
105Multi-Conditional Mixtures: Latent Variable Models fit by Multi-way Conditional Probability
McCallum, Wang, Pal, 2005; McCallum, Pal, Druck, Wang, 2006
- For clustering structured data, a la Latent Dirichlet Allocation and its successors
- But an undirected model, like the Harmonium (Welling, Rosen-Zvi, Hinton, 2005)
- But trained by a multi-conditional objective: $O = P(A \mid B, C)\, P(B \mid A, C)\, P(C \mid A, B)$, e.g. A, B, C are different modalities
106Objective Functions for Parameter Estimation
Traditional
New, multi-conditional
107Multi-Conditional Learning (Regularization)
McCallum, Pal, Wang, 2006
108Multi-Conditional Mixtures
109Predictive Random Fields: mixture of Gaussians on synthetic data
McCallum, Wang, Pal, 2005
[Figure panels: Data, classify by color; Generatively trained; Multi-Conditional; Conditionally-trained (Jebara 1998)]
110Multi-Conditional Mixtures vs. Harmonium on document retrieval task
McCallum, Wang, Pal, 2005
- Multi-Conditional, multi-way conditionally trained
- Conditionally-trained, to predict class labels
- Harmonium, joint, with class labels and words
- Harmonium, joint with words, no labels
111Multi-Conditional Topics
Strong positive and negative indicators
112Outline
Social Network Analysis with Topic Models
- Role Discovery (Author-Recipient-Topic Model, ART)
- Group Discovery (Group-Topic Model, GT)
- Enhanced Topic Models
- Correlations among Topics (Pachinko Allocation, PAM)
- Time Localized Topics (Topics-over-Time Model, TOT)
- Markov Dependencies in Topics (Topical N-Grams Model, TNG)
- Bibliometric Impact Measures enabled by Topics
Multi-Conditional Mixtures
113Summary
114Topical Precedence
Early-ness
Within a topic, what are the earliest papers
that received more than n citations?
- Information Retrieval
- On Relevance, Probabilistic Indexing and Information Retrieval, Kuhns and Maron (1960)
- Expected Search Length: A Single Measure of Retrieval Effectiveness Based on the Weak Ordering Action of Retrieval Systems, Cooper (1968)
- Relevance feedback in information retrieval, Rocchio (1971)
- Relevance feedback and the optimization of retrieval effectiveness, Salton (1971)
- New experiments in relevance feedback, Ide (1971)
- Automatic Indexing of a Sound Database Using Self-organizing Neural Nets, Feiten and Gunzel (1982)
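The precedence query above (earliest papers in a topic with more than n citations) is a filter followed by a sort. The sketch below uses placeholder paper records with invented citation counts and assumes one topic label per paper.

```python
def topical_precedence(papers, topic, min_citations, top_k=5):
    """Earliest papers in a topic that received more than min_citations citations."""
    eligible = [
        p for p in papers
        if p["topic"] == topic and p["citations"] > min_citations
    ]
    return sorted(eligible, key=lambda p: p["year"])[:top_k]

# Hypothetical records; titles and citation counts are placeholders.
papers = [
    {"title": "paper-a", "topic": "Information Retrieval", "year": 1971, "citations": 40},
    {"title": "paper-b", "topic": "Information Retrieval", "year": 1960, "citations": 12},
    {"title": "paper-c", "topic": "Information Retrieval", "year": 1955, "citations": 3},
    {"title": "paper-d", "topic": "Speech Recognition", "year": 1953, "citations": 25},
]

earliest = topical_precedence(papers, "Information Retrieval", min_citations=10)
print([(p["title"], p["year"]) for p in earliest])
```

The citation threshold keeps very early but little-cited papers (such as the 1955 placeholder) out of the list.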
115Topical Precedence
Early-ness
Within a topic, what are the earliest papers
that received more than n citations?
- Speech Recognition
- Some experiments on the recognition of speech, with one and two ears, E. Colin Cherry (1953)
- Spectrographic study of vowel reduction, B. Lindblom (1963)
- Automatic Lipreading to enhance speech recognition, Eric D. Petajan (1965)
- Effectiveness of linear prediction characteristics of the speech wave for..., B. Atal (1974)
- Automatic Recognition of Speakers from Their Voices, B. Atal (1976)
116Summary