Title: Discovering Latent Structure in Multiple Modalities
1Discovering Latent Structure in Multiple Modalities
- Andrew McCallum
- Computer Science Department
- University of Massachusetts Amherst
Joint work with Xuerui Wang, Natasha Mohanty, Andres Corrada, Chris Pal, Wei Li, Greg Druck.
2Social Network in an Email Dataset
3Social Network in Political Data
Vote similarity in U.S. Senate
Jakulin & Buntine 2005
4Inference and Estimation
- Gibbs Sampling
- Easy to implement
- Reasonably fast
5Enron Email Corpus
- 250k email messages
- 23k people
Date: Wed, 11 Apr 2001 06:56:00 -0700 (PDT)
From: debra.perlingiere_at_enron.com
To: steve.hooser_at_enron.com
Subject: Enron/TransAlta Contract dated Jan 1, 2001
Please see below. Katalin Kiss of TransAlta has requested an electronic copy of our final draft? Are you OK with this? If so, the only version I have is the original draft without revisions.
DP
Debra Perlingiere
Enron North America Corp.
Legal Department
1400 Smith Street, EB 3885
Houston, Texas 77002
dperlin_at_enron.com
6Topics, and prominent senders / receivers discovered by ART
Topic names, by hand
7ART Roles but not Groups
[Figure: three panels (Traditional SNA, Author-Topic, ART), each annotated as "Block structured" or "Not"]
Enron TransWestern Division
8Outline
Social Network Analysis with Topic Models
- Role Discovery (Author-Recipient-Topic Model, ART)
- Group Discovery (Group-Topic Model, GT)
- Enhanced Topic Models
- Correlations among Topics (Pachinko Allocation, PAM)
- Time Localized Topics (Topics-over-Time Model, TOT)
- Markov Dependencies in Topics (Topical N-Grams Model, TNG)
- Bibliometric Impact Measures enabled by Topics
Multi-Conditional Mixtures
9Groups and Topics
- Input
- Observed relations between people
- Attributes on those relations (text, or categorical)
- Output
- Attributes clustered into topics
- Groups of people, varying depending on topic
10Discovering Groups from Observed Set of Relations
Student Roster: Adams, Bennett, Carter, Davis, Edwards, Frederking
Academic Admiration:
Acad(A, B) Acad(C, B) Acad(A, D) Acad(C, D)
Acad(B, E) Acad(D, E) Acad(B, F) Acad(D, F)
Acad(E, A) Acad(F, A) Acad(E, C) Acad(F, C)
Admiration relations among six high school
students.
11Adjacency Matrix Representing Relations
Student Roster: Adams, Bennett, Carter, Davis, Edwards, Frederking
Academic Admiration:
Acad(A, B) Acad(C, B) Acad(A, D) Acad(C, D)
Acad(B, E) Acad(D, E) Acad(B, F) Acad(D, F)
Acad(E, A) Acad(F, A) Acad(E, C) Acad(F, C)
[Figure: adjacency matrix of the Acad relation over students A to F, shown first with rows and columns in roster order A B C D E F (group labels G1 G2 G1 G2 G3 G3), then reordered as A C B D E F (group labels G1 G1 G2 G2 G3 G3) so that members of the same group are adjacent.]
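The reordering on this slide can be reproduced in a few lines. This is a minimal sketch, assuming the twelve Acad relations listed above; the helper name `adjacency` and the use of NumPy are illustrative choices, not part of the talk.

```python
import numpy as np

# Directed "Acad(x, y)" relations from the slide: x admires y.
acad = [("A", "B"), ("C", "B"), ("A", "D"), ("C", "D"),
        ("B", "E"), ("D", "E"), ("B", "F"), ("D", "F"),
        ("E", "A"), ("F", "A"), ("E", "C"), ("F", "C")]

def adjacency(order, relations):
    """Build a 0/1 adjacency matrix with rows and columns in the given order."""
    index = {name: i for i, name in enumerate(order)}
    matrix = np.zeros((len(order), len(order)), dtype=int)
    for source, target in relations:
        matrix[index[source], index[target]] = 1
    return matrix

roster_order = adjacency(["A", "B", "C", "D", "E", "F"], acad)
group_order = adjacency(["A", "C", "B", "D", "E", "F"], acad)
print(roster_order)
print(group_order)
```

With the group ordering, the ones form solid 2x2 blocks (G1 admires G2, G2 admires G3, G3 admires G1), which is the block structure a group model is meant to recover.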
12Group Model: Partitioning Entities into Groups
Stochastic Blockstructures for Relations
Nowicki, Snijders 2001
[Graphical model; distribution labels: Beta, Dirichlet, Multinomial, Binomial]
S: number of entities
G: number of groups
Enhanced with arbitrary number of groups in Kemp, Griffiths, Tenenbaum 2004
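One way to read the distribution labels on this slide is as a generative process: group proportions from a Dirichlet, a group per entity from a multinomial, a relation probability per pair of groups from a Beta, and each binary relation from a binomial. The sketch below follows that reading. The function name, the hyperparameter values, and the single binary relation type are assumptions for illustration, not details taken from the slide.

```python
import numpy as np

def sample_blockmodel(num_entities, num_groups, rng,
                      alpha=1.0, beta_a=1.0, beta_b=1.0):
    """Sample groups and a directed 0/1 relation matrix (hypothetical sketch)."""
    # Group proportions ~ Dirichlet; hyperparameters are assumed values.
    theta = rng.dirichlet(alpha * np.ones(num_groups))
    # One group per entity ~ Multinomial(theta).
    groups = rng.choice(num_groups, size=num_entities, p=theta)
    # Probability of a relation from group g to group h ~ Beta.
    gamma = rng.beta(beta_a, beta_b, size=(num_groups, num_groups))
    # Each relation ~ Binomial(1, gamma[group of i, group of j]).
    probs = gamma[groups[:, None], groups[None, :]]
    relations = rng.binomial(1, probs)
    np.fill_diagonal(relations, 0)  # no self-relations in this sketch
    return groups, gamma, relations

rng = np.random.default_rng(0)
groups, gamma, relations = sample_blockmodel(6, 3, rng)
print(groups)
print(relations)
```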
13Two Relations with Different Attributes
Student Roster: Adams, Bennett, Carter, Davis, Edwards, Frederking
Academic Admiration:
Acad(A, B) Acad(C, B) Acad(A, D) Acad(C, D)
Acad(B, E) Acad(D, E) Acad(B, F) Acad(D, F)
Acad(E, A) Acad(F, A) Acad(E, C) Acad(F, C)
Social Admiration:
Soci(A, B) Soci(A, D) Soci(A, F) Soci(B, A) Soci(B, C) Soci(B, E)
Soci(C, B) Soci(C, D) Soci(C, F) Soci(D, A) Soci(D, C) Soci(D, E)
Soci(E, B) Soci(E, D) Soci(E, F) Soci(F, A) Soci(F, C) Soci(F, E)
[Figure: two adjacency matrices. Academic: rows and columns ordered A C B D E F, group labels G1 G1 G2 G2 G3 G3. Social: rows and columns ordered A C E B D F, group labels G1 G1 G1 G2 G2 G2.]
14The Group-Topic Model: Discovering Groups and Topics Simultaneously
Wang, Mohanty, McCallum 2006
[Graphical model; distribution labels: Beta, Uniform, Dirichlet, Multinomial, Dirichlet, Binomial, Multinomial]
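The distribution labels on this slide suggest a generative story in which each event (for example a resolution) picks a topic, the topic generates the event's words, and the same topic selects which group assignment governs how pairs of entities relate. The following is a rough sketch under that reading only. Which label attaches to which variable, the symmetric 0/1 "agree" relation, the per-event Beta draws, and all names and hyperparameters are assumptions, not a transcription of the published model.

```python
import numpy as np

def sample_group_topic(num_entities, num_groups, num_topics, vocab_size,
                       num_events, words_per_event, rng):
    """Hypothetical generative sketch of a group-topic style model."""
    # Topic-word multinomials and topic-specific group proportions (Dirichlet).
    phi = rng.dirichlet(np.ones(vocab_size), size=num_topics)
    theta = rng.dirichlet(np.ones(num_groups), size=num_topics)
    events = []
    for _ in range(num_events):
        topic = rng.integers(num_topics)  # topic drawn uniformly
        words = rng.choice(vocab_size, size=words_per_event, p=phi[topic])
        # Each entity's group depends on the event's topic.
        groups = rng.choice(num_groups, size=num_entities, p=theta[topic])
        # Agreement probability between each pair of groups (Beta), symmetric.
        gamma = rng.beta(1.0, 1.0, size=(num_groups, num_groups))
        gamma = np.triu(gamma) + np.triu(gamma, 1).T
        # Pairwise agreement (Binomial), mirrored so the relation is symmetric.
        draws = rng.binomial(1, gamma[groups[:, None], groups[None, :]])
        upper = np.triu(draws, 1)
        events.append((topic, words, groups, upper + upper.T))
    return events

rng = np.random.default_rng(0)
events = sample_group_topic(num_entities=6, num_groups=3, num_topics=2,
                            vocab_size=10, num_events=4, words_per_event=5,
                            rng=rng)
topic, words, groups, agree = events[0]
print(topic, words, groups)
print(agree)
```

Because groups are drawn per topic, the same entity can fall into different groups for different topics, which is the behavior the Senate and UN results illustrate.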
15Inference and Estimation
- Gibbs Sampling
- Many r.v.s can be integrated out
- Easy to implement
- Reasonably fast
We assume the relationship is symmetric.
16Dataset 1: U.S. Senate
- 16 years of voting records in the US Senate (1989-2005)
- a Senator may respond Yea or Nay to a resolution
- 3423 resolutions with text attributes (index terms)
- 191 Senators in total across 16 years
S.543
Title: An Act to reform Federal deposit insurance, protect the deposit insurance funds, recapitalize the Bank Insurance Fund, improve supervision and regulation of insured depository institutions, and for other purposes.
Sponsor: Sen Riegle, Donald W., Jr. MI (introduced 3/5/1991) Cosponsors (2)
Latest Major Action: 12/19/1991 Became Public Law No. 102-242.
Index terms: Banks and banking; Accounting; Administrative fees; Cost control; Credit; Deposit insurance; Depressed areas; and other 110 terms
Adams (D-WA), Nay; Akaka (D-HI), Yea; Bentsen (D-TX), Yea; Biden (D-DE), Yea; Bond (R-MO), Yea; Bradley (D-NJ), Nay; Conrad (D-ND), Nay
17Topics Discovered (U.S. Senate)
Mixture of Unigrams
Education     Energy      Military Misc.   Economic
education     energy      government       federal
school        power       military         labor
aid           water       foreign          insurance
children      nuclear     tax              aid
drug          gas         congress         tax
students      petrol      aid              business
elementary    research    law              employee
prevention    pollution   policy           care
Group-Topic Model
Education + Domestic   Foreign        Economic    Social Security + Medicare
education              foreign        labor       social
school                 trade          insurance   security
federal                chemicals      tax         insurance
aid                    tariff         congress    medical
government             congress       income      care
tax                    drugs          minimum     medicare
energy                 communicable   wage        disability
research               diseases       business    assistance
18Groups Discovered (US Senate)
Groups from topic Education + Domestic
19Senators Who Change Coalition the most Dependent on Topic
e.g. Senator Shelby (D-AL) votes
- with the Republicans on Economic
- with the Democrats on Education + Domestic
- with a small group of maverick Republicans on Social Security + Medicaid
20Dataset 2: The UN General Assembly
- Voting records of the UN General Assembly (1990-2003)
- A country may choose to vote Yes, No or Abstain
- 931 resolutions with text attributes (titles)
- 192 countries in total
- Also experiments later with resolutions from 1960-2003
Vote on Permanent Sovereignty of Palestinian People, 87th plenary meeting
The draft resolution on permanent sovereignty of the Palestinian people in the occupied Palestinian territory, including Jerusalem, and of the Arab population in the occupied Syrian Golan over their natural resources (document A/54/591) was adopted by a recorded vote of 145 in favour to 3 against with 6 abstentions.
In favour: Afghanistan, Argentina, Belgium, Brazil, Canada, China, France, Germany, India, Japan, Mexico, Netherlands, New Zealand, Pakistan, Panama, Russian Federation, South Africa, Spain, Turkey, and other 126 countries.
Against: Israel, Marshall Islands, United States.
Abstain: Australia, Cameroon, Georgia, Kazakhstan, Uzbekistan, Zambia.
21Topics Discovered (UN)
Mixture of Unigrams
Everything Nuclear   Human Rights   Security in Middle East
nuclear              rights         occupied
weapons              human          israel
use                  palestine      syria
implementation       situation      security
countries            israel         calls
Group-Topic Model
Nuclear Non-proliferation   Nuclear Arms Race   Human Rights
nuclear                     nuclear             rights
states                      arms                human
united                      prevention          palestine
weapons                     race                occupied
nations                     space               israel
22Groups Discovered (UN)
The countries in each group are ordered by their 2005 GDP (PPP), and only 5 countries are shown for groups that have more than 5 members.
23Outline
Discovering Latent Structure in Multiple Modalities
- Groups and Text (Group-Topic Model, GT)
- Nested Correlations (Pachinko Allocation, PAM)
- Time and Text (Topics-over-Time Model, TOT)
- Time and Text with Nested Correlations (PAM-TOT)
- Multi-Conditional Mixtures
24Latent Dirichlet Allocation
Blei, Ng, Jordan, 2003
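[Graphical model (plate notation): Dirichlet prior α, per-document topic distribution θ, topic assignment z, word w, topic-word multinomials φ with prior β; plates over T topics, n words, and N documents.]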
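The talk relies on Gibbs sampling for inference and notes that many random variables can be integrated out. For the LDA baseline on this slide, that idea is commonly realized as a collapsed Gibbs sampler that keeps only topic assignments and count tables. The sketch below is a generic textbook-style version, not code from the talk; the function name, the hyperparameter defaults, and the toy corpus are assumptions.

```python
import numpy as np

def lda_gibbs(docs, num_topics, vocab_size, iters, rng, alpha=0.1, beta=0.01):
    """Collapsed Gibbs sampling for LDA; docs are lists of integer word ids."""
    doc_topic = np.zeros((len(docs), num_topics))
    topic_word = np.zeros((num_topics, vocab_size))
    topic_total = np.zeros(num_topics)
    assign = []
    # Random initialization of topic assignments and count tables.
    for d, doc in enumerate(docs):
        z = rng.integers(num_topics, size=len(doc))
        assign.append(z)
        for w, k in zip(doc, z):
            doc_topic[d, k] += 1
            topic_word[k, w] += 1
            topic_total[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove the current token from the counts.
                doc_topic[d, k] -= 1
                topic_word[k, w] -= 1
                topic_total[k] -= 1
                # Conditional over topics with theta and phi integrated out.
                p = ((doc_topic[d] + alpha) * (topic_word[:, w] + beta)
                     / (topic_total + vocab_size * beta))
                k = rng.choice(num_topics, p=p / p.sum())
                assign[d][i] = k
                doc_topic[d, k] += 1
                topic_word[k, w] += 1
                topic_total[k] += 1
    return assign, doc_topic, topic_word

rng = np.random.default_rng(0)
toy_docs = [[0, 1, 2, 0], [3, 4, 3], [0, 3, 1]]  # made-up word ids
assign, doc_topic, topic_word = lda_gibbs(toy_docs, 2, 5, 20, rng)
print(doc_topic)
```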
25Correlated Topic Model
Blei, Lafferty, 2005
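[Graphical model (plate notation): as in LDA, but the per-document topic proportions are drawn from a logistic normal instead of a Dirichlet; nodes z, w, and topic-word multinomials φ with prior β; plates over T topics, n words, and N documents.]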
Square matrix of pairwise correlations.
26Topic Correlation Representation
7 topics: A, B, C, D, E, F, G
Correlations: {A, B, C, D, E} and {C, D, E, F, G}
[Figure: CTM representation as a matrix of pairwise topic correlations, with rows B to G and columns A to F.]
27Pachinko Machine
28Pachinko Allocation Model
Thanks to Michael Jordan for suggesting the name
Li, McCallum, 2005, 2006
Model structure, not the graphical model
Given directed acyclic graph (DAG): at each interior node, a Dirichlet over its children, and words at leaves.
For each document: Sample a multinomial from each Dirichlet.
For each word in this document: Starting from the root, sample a child from successive nodes, down to a leaf. Generate the word at the leaf.
[Figure: example DAG with interior nodes 11; 21, 22; 31, 32, 33; 41 to 45 above leaves word1 to word8.]
Like a Polya tree, but DAG shaped, with arbitrary
number of children.
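The generative process described on this slide can be sketched directly: one multinomial per interior node per document, then a root-to-leaf walk for each word. The small DAG below, its node names, and the symmetric Dirichlet concentration are made up for illustration; only the procedure follows the slide.

```python
import numpy as np

# Hypothetical DAG: interior nodes map to their children; leaves are words.
dag = {
    "root": ["s1", "s2"],
    "s1": ["t1", "t2"],
    "s2": ["t2", "t3"],  # t2 has two parents, so this is a DAG, not a tree
    "t1": ["word1", "word2", "word3"],
    "t2": ["word3", "word4", "word5"],
    "t3": ["word6", "word7", "word8"],
}

def sample_document(dag, root, num_words, rng, concentration=1.0):
    """Generate one document by repeated root-to-leaf walks through the DAG."""
    # For each document: sample a multinomial from each node's Dirichlet.
    multinomials = {
        node: rng.dirichlet(concentration * np.ones(len(children)))
        for node, children in dag.items()
    }
    words = []
    for _ in range(num_words):
        node = root
        # Sample a child at successive nodes until a leaf (a word) is reached.
        while node in dag:
            children = dag[node]
            node = children[rng.choice(len(children), p=multinomials[node])]
        words.append(node)
    return words

rng = np.random.default_rng(0)
document = sample_document(dag, "root", num_words=10, rng=rng)
print(document)
```

Because the multinomials are resampled per document, two documents can favor different super-nodes and therefore different combinations of lower-level topics.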
29Pachinko Allocation Model
Li, McCallum, 2005
- DAG may have arbitrary structure
- arbitrary depth
- any number of children per node
- sparse connectivity
- edges may skip layers
Model structure, not the graphical model
[Figure: example DAG with interior nodes 11; 21, 22; 31, 32, 33; 41 to 45 above leaves word1 to word8.]
30Pachinko Allocation Model
Li, McCallum, 2005
Model structure, not the graphical model
[Figure: example DAG with interior nodes 11; 21, 22; 31, 32, 33; 41 to 45 above leaves word1 to word8, annotated from top to bottom:]
- Distributions over distributions over topics...
- Distributions over topics; mixtures, representing topic correlations
- Distributions over words (like LDA topics)
Some interior nodes could contain one
multinomial, used for all documents. (i.e. a very
peaked Dirichlet)
31Pachinko Allocation Model
Li, McCallum, 2005
Estimate all these Dirichlets from data.
Estimate model structure from data (number of nodes, and connectivity).
Model structure, not the graphical model
[Figure: example DAG with interior nodes 11; 21, 22; 31, 32, 33; 41 to 45 above leaves word1 to word8.]
32Pachinko Allocation Special Cases
Latent Dirichlet Allocation
[Figure: DAG reduced to a single root node 32 over nodes 41 to 45, above leaves word1 to word8.]
33Pachinko Allocation Special Cases
Hierarchical Latent Dirichlet Allocation (HLDA)
Very low variance Dirichlet at root
Each leaf of the HLDA topic hierarchy has a distribution over nodes on the path to the root.
The HLDA hier.
[Figure: DAG with interior nodes 11; 21 to 24; 31 to 34; 41, 42; 51 above leaves word1 to word8.]
34Pachinko Allocation on a Topic Hierarchy
Combining best of HLDA and Pachinko Allocation
The PAM DAG...
...representing correlations among topic leaves.
The HLDA hier.
[Figure: DAG with interior nodes 00; 11, 12; 21 to 24; 31 to 34; 41, 42; 51 above leaves word1 to word8.]
35Pachinko Allocation Model
... with two layers, no skipping layers, fully-connected from one layer to the next.
[Figure: root 11; super-topics 21, 22, 23; sub-topics 31 to 35 (fixed multinomials); leaves word1 to word8.]
Another special case would select only one
super-topic per document.
36Graphical Models
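Four-level PAM (with fixed multinomials for sub-topics)
LDA
[Graphical models in plate notation, side by side. LDA: nodes α, θ, z, w, φ, β. Four-level PAM: nodes α1, α2, θ2, θ3, z2, z3, w, φ, β. Plates T, n, N in both.]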
37Inference Gibbs Sampling
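[Graphical model (plate notation): four-level PAM with nodes α2, α3, θ2, θ3, z2, z3, w, φ, β and plates T, n, N; z2 and z3 are marked "Jointly sampled".]
Dirichlet parameters α are estimated with moment matching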
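Moment matching for a Dirichlet can be sketched as follows: match the sample mean to α_k / α_0, then solve the variance identity Var[p_k] = m_k (1 - m_k) / (α_0 + 1) for the total concentration α_0. The slide does not say which variant the authors use, so this is a generic version; the function name and the choice to average the per-component estimates of α_0 in log space are assumptions.

```python
import numpy as np

def dirichlet_moment_match(proportions):
    """Estimate Dirichlet parameters from rows of observed proportions."""
    mean = proportions.mean(axis=0)
    var = proportions.var(axis=0)
    # Each component gives an estimate of alpha_0 from
    # Var[p_k] = m_k (1 - m_k) / (alpha_0 + 1).
    alpha0_estimates = mean * (1.0 - mean) / var - 1.0
    # Combine the per-component estimates with a geometric mean (assumed choice).
    alpha0 = np.exp(np.mean(np.log(alpha0_estimates)))
    return mean * alpha0

rng = np.random.default_rng(0)
true_alpha = np.array([2.0, 5.0, 3.0])  # made-up parameters for the demo
samples = rng.dirichlet(true_alpha, size=20000)
estimated_alpha = dirichlet_moment_match(samples)
print(estimated_alpha)
```

In a sampler, the rows of `proportions` would come from normalized per-document counts rather than from synthetic draws; that substitution is left out here to keep the sketch self-contained.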
38Experimental Results
- Topic clarity by human judgement
- Likelihood on held-out data
- Document classification
39Datasets
- Rexa (http://rexa.info/)
- 4000 documents, 278438 word tokens and 25597 unique words.
- NIPS
- 1647 documents, 114142 word tokens and 11708 unique words.
- 20 newsgroup comp5 subset
- 4836 documents, 35567 unique words.
40Topic Correlations
41Example Topics
Topic labels on the slide: "images, motion, eyes"; "motion (some generic)"; "motion"; "eyes"; "images"
LDA 100: motion detection field optical flow sensitive moving functional detect contrast light dimensional intensity computer mt measures occlusion temporal edge real
PAM 100: motion video surface surfaces figure scene camera noisy sequence activation generated analytical pixels measurements assigne advance lated shown closed perceptual
LDA 20: visual model motion field object image images objects fields receptive eye position spatial direction target vision multiple figure orientation location
PAM 100: eye head vor vestibulo oculomotor vestibular vary reflex vi pan rapid semicircular canals responds streams cholinergic rotation topographically detectors ning
PAM 100: image digit faces pixel surface interpolation scene people viewing neighboring sensors patches manifold dataset magnitude transparency rich dynamical amounts tor
42Blind Topic Evaluation
- Randomly select 25 similar pairs of topics generated from PAM and LDA
- 5 people
- Each asked to select the topic in each pair that they find more semantically coherent.
Topic counts
              LDA   PAM
5 votes         0     5
>= 4 votes      3     8
>= 3 votes      9    16
43Examples
PAM (5 votes): control systems robot adaptive environment goal state controller
LDA (0 votes): control systems based adaptive direct con controller change
PAM (4 votes): motion image detection images scene vision texture segmentation
LDA (1 vote): image motion images multiple local generated noisy optical
44Examples
PAM (4 votes): algorithm learning algorithms gradient convergence function stochastic weight
LDA (1 vote): algorithm algorithms gradient convergence stochastic line descent converge
PAM (1 vote): signals source separation eeg sources blind single event
LDA (4 votes): signal signals single time low source temporal processing
45Likelihood Comparison
- Dataset NIPS
- Two sets of experiments
- Varying number of topics
- Different proportions of training data
46Likelihood Comparison
47Likelihood Comparison
- Different proportions of training data
48Likelihood Estimation
- Variational: Perform inference in a simpler model
- (Gibbs sampling) Harmonic mean
- Approximate the marginal probability with the harmonic mean of conditional probabilities
- (Gibbs sampling) Empirical likelihood
- Estimate the distribution based on empirical samples
Diggle & Gratton, 1984
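The harmonic mean option above can be sketched in a few lines. Given the likelihood of the data under each of S Gibbs samples, the marginal probability is approximated by S divided by the sum of the reciprocal likelihoods. The sketch works in log space to avoid underflow; the function name and the toy inputs are assumptions, and the estimator is known to have high variance in practice.

```python
import numpy as np

def log_harmonic_mean(log_likelihoods):
    """Log of the harmonic mean of likelihoods, computed stably in log space."""
    ll = np.asarray(log_likelihoods, dtype=float)
    negated = -ll
    shift = negated.max()
    # log sum_s 1 / p_s, using the log-sum-exp trick.
    log_sum_inverse = shift + np.log(np.exp(negated - shift).sum())
    return np.log(len(ll)) - log_sum_inverse

# Toy check: the harmonic mean of 1 and 1/3 is 2 / (1 + 3) = 0.5.
toy_estimate = log_harmonic_mean([np.log(1.0), np.log(1.0 / 3.0)])
print(np.exp(toy_estimate))
```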
49Empirical Likelihood Estimation
50Document Classification
- 20 newsgroup comp5 subset
- 5-way classification (accuracy in %)
class        docs    LDA     PAM
graphics      243   83.95   86.83
os            239   81.59   84.10
pc            245   83.67   88.16
mac           239   86.61   89.54
windows.x     243   88.07   92.20
total        1209   84.70   87.34
Statistically significant with a p-value < 0.05.
51Outline
Discovering Latent Structure in Multiple Modalities
- Groups and Text (Group-Topic Model, GT)
- Nested Correlations (Pachinko Allocation, PAM)
- Time and Text (Topics-over-Time Model, TOT)
- Time and Text with Nested Correlations (PAM-TOT)
- Multi-Conditional Mixtures
52Want to Model Trends over Time
- Is prevalence of topic growing or waning?
- Pattern appears only briefly
- Capture its statistics in focused way
- Don't confuse it with patterns elsewhere in time
- How do roles, groups, influence shift over time?
53Topics over Time (TOT)
Wang, McCallum 2006
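[Graphical model (plate notation): per-document multinomial over topics with a Dirichlet prior; topic index z; word w drawn from a per-topic multinomial over words (Dirichlet prior); timestamp t drawn from a per-topic Beta over time (uniform prior); plates over T topics, Nd tokens, and D documents.]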
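Read as a generative process, the labels on this slide say that each token gets a topic from the document's multinomial, a word from that topic's multinomial over words, and a timestamp from that topic's Beta over time. The sketch below follows that reading, with timestamps assumed to be normalized to the interval (0, 1). The function name, array layout, and hyperparameter values are illustrative assumptions.

```python
import numpy as np

def sample_tot_document(num_words, phi, psi, rng, alpha=0.5):
    """Sample topics, words, and per-token timestamps for one document.

    phi: (topics x vocabulary) word probabilities per topic.
    psi: (topics x 2) Beta parameters (a, b) per topic.
    """
    num_topics, vocab_size = phi.shape
    theta = rng.dirichlet(alpha * np.ones(num_topics))  # multinomial over topics
    z = rng.choice(num_topics, size=num_words, p=theta)  # topic index per token
    words = np.array([rng.choice(vocab_size, p=phi[k]) for k in z])
    stamps = rng.beta(psi[z, 0], psi[z, 1])  # timestamp per token
    return z, words, stamps

rng = np.random.default_rng(0)
phi = rng.dirichlet(np.ones(6), size=2)    # made-up topic-word multinomials
psi = np.array([[2.0, 8.0], [8.0, 2.0]])   # topic 0 is early, topic 1 is late
z, words, stamps = sample_tot_document(12, phi, psi, rng)
print(z)
print(np.round(stamps, 2))
```

Because time enters through a continuous Beta density per topic, nothing here requires discretizing time or modeling state transitions.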
54Attributes of this Approach to Modeling Time
- Not a Markov model
- No state transitions, or Markov assumption
- Continuous Time
- Time not discretized
- Easily incorporated into other more complex
models with additional modalities.
55State of the Union Address
208 Addresses delivered between January 8, 1790
and January 29, 2002.
- To increase the number of documents, we split the addresses into paragraphs and treated them as documents. One-line paragraphs were excluded. Stop-word removal was applied.
- 17156 documents
- 21534 words
- 669,425 tokens
Our scheme of taxation, by means of which this
needless surplus is taken from the people and put
into the public Treasury, consists of a tariff
or duty levied upon importations from abroad and
internal-revenue taxes levied upon the
consumption of tobacco and spirituous and malt
liquors. It must be conceded that none of the
things subjected to internal-revenue
taxation are, strictly speaking, necessaries.
There appears to be no just complaint of this
taxation by the consumers of these articles, and
there seems to be nothing so well able to bear
the burden without hardship to any portion of the
people.
1910
56Comparing TOT against LDA
57TOT on 17 years of NIPS proceedings
58Topic Distributions Conditioned on Time
topic mass (in vertical height)
time
59TOT on 17 years of NIPS proceedings
TOT
LDA
60TOT versus LDA on my email
61TOT improves ability to Predict Time
Predicting the year of a State-of-the-Union
address.
L1 distance between predicted year and actual
year.
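One plausible way to predict a timestamp with per-topic Beta distributions is to score each candidate time by the summed log Beta density of the document's token-level topic assignments and keep the best candidate; the L1 distance to the true time then measures the error. The slide does not spell out the procedure, so treat this as a sketch under that assumption. All names, parameters, and the toy inputs are hypothetical.

```python
import math

def beta_logpdf(t, a, b):
    """Log density of Beta(a, b) at t in (0, 1)."""
    return ((a - 1) * math.log(t) + (b - 1) * math.log(1 - t)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

def predict_time(topic_assignments, psi, candidates):
    """Pick the candidate time with the highest summed log density."""
    def score(t):
        return sum(beta_logpdf(t, *psi[k]) for k in topic_assignments)
    return max(candidates, key=score)

psi = {0: (2.0, 8.0), 1: (8.0, 2.0)}         # made-up Beta parameters per topic
candidates = [i / 10 for i in range(1, 10)]  # normalized times 0.1 ... 0.9
predicted = predict_time([1, 1, 1, 0], psi, candidates)
l1_error = abs(predicted - 0.8)              # 0.8 is a made-up true time
print(predicted, round(l1_error, 2))
```

For real data the normalized prediction would be mapped back to a calendar year before computing the L1 distance in years.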
62Outline
Discovering Latent Structure in Multiple Modalities
- Groups and Text (Group-Topic Model, GT)
- Nested Correlations (Pachinko Allocation, PAM)
- Time and Text (Topics-over-Time Model, TOT)
- Time and Text with Nested Correlations (PAM-TOT)
- Multi-Conditional Mixtures
63PAM Over Time (PAMTOT)
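[Graphical model (plate notation): the four-level PAM model with timestamp nodes t2 and t3 added; other node labels as extracted: α1, α2, θ2, θ3, z2, z3, w, φ, β; plates T, n, N.]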
64Experimental Results
- Dataset: Rexa subset
- 4454 documents between years 1991 and 2005
- 372936 word tokens
- 21748 unique words
- Topic Examples
- Predict Time
65Topic Examples (1)
PAMTOT
PAM
66Topic Examples (2)
PAMTOT
PAM
67Topic Examples (3)
PAMTOT
PAM
68Predict Time with PAMTOT
          L1 Error   E(L1)   Accuracy
PAMTOT    1.56       1.57    0.29
PAM       5.34       5.30    0.10
70% error reduction
L1 Error: the difference between predicted and true years.
E(L1): average difference between all years and true year, using p(t|.) from the model.
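The quoted error reduction can be checked from the table, assuming it refers to the relative drop in error from PAM to PAMTOT; the variable names below are illustrative.

```python
# Values copied from the table on this slide.
pam_l1, pamtot_l1 = 5.34, 1.56
pam_expected_l1, pamtot_expected_l1 = 5.30, 1.57

# Relative reduction: (baseline - new) / baseline.
l1_reduction = (pam_l1 - pamtot_l1) / pam_l1
expected_l1_reduction = (pam_expected_l1 - pamtot_expected_l1) / pam_expected_l1

print(f"L1 error reduction: {l1_reduction:.1%}")
print(f"E(L1) reduction: {expected_l1_reduction:.1%}")
```

Both ratios come out slightly above 0.70, consistent with the rounded figure on the slide.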
69Outline
Discovering Latent Structure in Multiple Modalities
- Groups and Text (Group-Topic Model, GT)
- Nested Correlations (Pachinko Allocation, PAM)
- Time and Text (Topics-over-Time Model, TOT)
- Time and Text with Nested Correlations (PAM-TOT)
- Multi-Conditional Mixtures
70Want a topic model with the advantages of CRFs
- Use arbitrary, overlapping features of the input.
- Undirected graphical model, so we don't have to think about avoiding cycles.
- Integrate naturally with our other CRF components.
- Train discriminatively
- Natural semi-supervised transfer learning
What does this mean? Topic models are unsupervised!
71Multi-Conditional Mixtures: Latent Variable Models fit by Multi-way Conditional Probability
McCallum, Wang, Pal, 2005; McCallum, Pal, Druck, Wang, 2006
- For clustering structured data, à la Latent Dirichlet Allocation and its successors
- But an undirected model, like the Harmonium (Welling, Rosen-Zvi, Hinton, 2005)
- But trained by a multi-conditional objective: O = P(A|B,C) P(B|A,C) P(C|A,B), e.g. A, B, C are different modalities
See also Minka 2005 TR and Pereira & Gordon 2006 ICML
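To make the objective concrete, the sketch below evaluates log O = log P(A|B,C) + log P(B|A,C) + log P(C|A,B) for a small discrete joint distribution stored as a table. This is only a toy stand-in: the models in the talk are latent-variable undirected models, not explicit tables, and the function name and inputs are assumptions.

```python
import numpy as np

def multi_conditional_log_objective(joint, observations):
    """Sum over observations of the log of each variable's conditional
    given all the others, under a discrete joint table."""
    joint = joint / joint.sum()
    total = 0.0
    for axis in range(joint.ndim):
        # Conditional of the variable on this axis given the remaining ones.
        conditional = joint / joint.sum(axis=axis, keepdims=True)
        total += sum(np.log(conditional[obs]) for obs in observations)
    return total

uniform_joint = np.ones((2, 2, 2))  # toy joint over binary A, B, C
value = multi_conditional_log_objective(uniform_joint, [(0, 1, 1)])
print(value)
```

Under the uniform table every conditional equals 0.5, so one observation contributes 3 log 0.5; a traditional joint-likelihood objective would instead score the same observation with log P(A,B,C).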
72Objective Functions for Parameter Estimation
Traditional
New, multi-conditional
73Multi-Conditional Learning (Regularization)
McCallum, Pal, Wang, 2006
74Multi-Conditional Mixtures
75Predictive Random Fields: mixture of Gaussians on synthetic data
McCallum, Wang, Pal, 2005
Data, classify by color
Generatively trained
Multi-Conditional
Conditionally-trained Jebara 1998
76Topic Words: Strong positive and negative indicators
20 Newsgroups data, two subtopics of
talk.politics.guns
77Multi-Conditional Harmonium
78Multi-Conditional Mixtures vs. Harmonium on document retrieval task
McCallum, Wang, Pal, 2005
Multi-Conditional, multi-way conditionally trained
Conditionally-trained, to predict class labels
Harmonium, joint, with class labels and words
Harmonium, joint with words, no labels
79Outline
Discovering Latent Structure in Multiple Modalities
- Groups and Text (Group-Topic Model, GT)
- Nested Correlations (Pachinko Allocation, PAM)
- Time and Text (Topics-over-Time Model, TOT)
- Time and Text with Nested Correlations (PAM-TOT)
- Multi-Conditional Mixtures
80Summary