Discovering Latent Structure in Multiple Modalities (PowerPoint presentation transcript, 70 slides)

Transcript and Presenter's Notes
1
Discovering Latent Structure in Multiple Modalities
  • Andrew McCallum
  • Computer Science Department
  • University of Massachusetts Amherst

Joint work with Xuerui Wang, Natasha Mohanty, Andres Corrada, Chris Pal, Wei Li, Greg Druck.
2
Social Network in an Email Dataset
3
Social Network in Political Data
Vote similarity in U.S. Senate
[Jakulin, Buntine 2005]
4
Inference and Estimation
  • Gibbs Sampling
  • Easy to implement
  • Reasonably fast

5
Enron Email Corpus
  • 250k email messages
  • 23k people

Date: Wed, 11 Apr 2001 06:56:00 -0700 (PDT)
From: debra.perlingiere@enron.com
To: steve.hooser@enron.com
Subject: Enron/TransAlta Contract dated Jan 1, 2001

Please see below. Katalin Kiss of TransAlta has requested an electronic copy of our final draft? Are you OK with this? If so, the only version I have is the original draft without revisions. DP

Debra Perlingiere
Enron North America Corp., Legal Department
1400 Smith Street, EB 3885, Houston, Texas 77002
dperlin@enron.com
6
Topics, and prominent senders / receivers discovered by ART
Topic names, by hand
7
ART discovers Roles but not Groups
[Figure: adjacency-matrix comparison on the Enron TransWestern Division, with panels for Traditional SNA, Author-Topic, and ART; block structure appears under the SNA ordering but not under the topic-model orderings.]
8
Outline
Social Network Analysis with Topic Models
  • Role Discovery (Author-Recipient-Topic Model,
    ART)
  • Group Discovery (Group-Topic Model, GT)
  • Enhanced Topic Models
  • Correlations among Topics (Pachinko Allocation,
    PAM)
  • Time Localized Topics (Topics-over-Time Model,
    TOT)
  • Markov Dependencies in Topics (Topical N-Grams
    Model, TNG)
  • Bibliometric Impact Measures enabled by Topics

Multi-Conditional Mixtures
9
Groups and Topics
  • Input
  • Observed relations between people
  • Attributes on those relations (text, or
    categorical)
  • Output
  • Attributes clustered into topics
  • Groups of people, varying depending on topic

10
Discovering Groups from Observed Set of Relations
Student Roster: Adams, Bennett, Carter, Davis, Edwards, Frederking
Academic Admiration: Acad(A,B), Acad(C,B), Acad(A,D), Acad(C,D), Acad(B,E), Acad(D,E), Acad(B,F), Acad(D,F), Acad(E,A), Acad(F,A), Acad(E,C), Acad(F,C)
Admiration relations among six high school
students.
11
Adjacency Matrix Representing Relations
Student Roster: Adams, Bennett, Carter, Davis, Edwards, Frederking
Academic Admiration: Acad(A,B), Acad(C,B), Acad(A,D), Acad(C,D), Acad(B,E), Acad(D,E), Acad(B,F), Acad(D,F), Acad(E,A), Acad(F,A), Acad(E,C), Acad(F,C)
[Figure: the 6×6 adjacency matrix over A–F. Under the roster ordering A B C D E F (groups G1 G2 G1 G2 G3 G3) no blocks are visible; reordering as A C B D E F (groups G1 G1 G2 G2 G3 G3) makes the block structure apparent.]
12
Group Model: Partitioning Entities into Groups
Stochastic Blockstructures for Relations [Nowicki, Snijders 2001]
[Model diagram: Dirichlet → Multinomial over group assignments; Beta → Binomial over relations. S: number of entities; G: number of groups.]
Enhanced with an arbitrary number of groups in [Kemp, Griffiths, Tenenbaum 2004]
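The blockmodel above can be sketched generatively. A minimal simulation follows; the entity count, group count, and hyperparameters are illustrative assumptions, not settings from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)
S, G = 6, 3  # S entities, G groups (toy sizes)

# Dirichlet prior over group proportions, then a multinomial
# (categorical) group assignment per entity.
theta = rng.dirichlet(np.ones(G))
g = rng.choice(G, size=S, p=theta)

# Beta prior over each group-pair's link probability; each observed
# relation is then a binomial (Bernoulli) draw given the two groups.
eta = rng.beta(1.0, 1.0, size=(G, G))
R = (rng.random((S, S)) < eta[g[:, None], g[None, :]]).astype(int)

# Sorting entities by group makes the block structure visible.
order = np.argsort(g)
print(R[np.ix_(order, order)])
```

Sorting rows and columns by group assignment is exactly the reordering trick shown on the adjacency-matrix slides.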
13
Two Relations with Different Attributes
Student Roster: Adams, Bennett, Carter, Davis, Edwards, Frederking
Academic Admiration: Acad(A,B), Acad(C,B), Acad(A,D), Acad(C,D), Acad(B,E), Acad(D,E), Acad(B,F), Acad(D,F), Acad(E,A), Acad(F,A), Acad(E,C), Acad(F,C)
Social Admiration: Soci(A,B), Soci(A,D), Soci(A,F), Soci(B,A), Soci(B,C), Soci(B,E), Soci(C,B), Soci(C,D), Soci(C,F), Soci(D,A), Soci(D,C), Soci(D,E), Soci(E,B), Soci(E,D), Soci(E,F), Soci(F,A), Soci(F,C), Soci(F,E)
[Figure: adjacency matrices for the two relations. Academic admiration is block structured under the ordering A C B D E F (groups G1 G1 G2 G2 G3 G3); social admiration is block structured under A C E B D F (groups G1 G1 G1 G2 G2 G2), so the two relations induce different groupings.]
14
The Group-Topic Model: Discovering Groups and Topics Simultaneously
[Wang, Mohanty, McCallum 2006]
[Model diagram: the blockmodel side (Dirichlet → Multinomial group assignments; Beta → Binomial relations) is coupled with a topic-model side (Uniform prior; Dirichlet → Multinomial over words).]
15
Inference and Estimation
  • Gibbs Sampling
  • Many r.v.s can be integrated out
  • Easy to implement
  • Reasonably fast

We assume the relationship is symmetric.
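The collapsed sampler can be illustrated on a toy LDA-style model (not the full Group-Topic sampler; the corpus, topic count, and hyperparameters below are made up): with the multinomial parameters integrated out, each topic assignment is resampled from counts alone.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy corpus (word ids), topic count, and hyperparameters are made up.
docs = [[0, 1, 2, 1], [2, 3, 3, 0], [1, 1, 4, 4]]
V, T = 5, 2
alpha, beta = 0.5, 0.1

z = [[int(rng.integers(T)) for _ in d] for d in docs]
ndt = np.zeros((len(docs), T))  # document-topic counts
ntw = np.zeros((T, V))          # topic-word counts
nt = np.zeros(T)                # topic totals
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        ndt[d, z[d][i]] += 1; ntw[z[d][i], w] += 1; nt[z[d][i]] += 1

for sweep in range(50):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]  # remove the current assignment from the counts
            ndt[d, t] -= 1; ntw[t, w] -= 1; nt[t] -= 1
            # Collapsed conditional: the multinomials are integrated out,
            # leaving only counts (the "many r.v.s integrated out" above).
            p = (ndt[d] + alpha) * (ntw[:, w] + beta) / (nt + V * beta)
            t = int(rng.choice(T, p=p / p.sum()))
            z[d][i] = t
            ndt[d, t] += 1; ntw[t, w] += 1; nt[t] += 1

print(ntw)  # topic-word counts after 50 sweeps
```

The same count-based conditional is why the sampler is "easy to implement" and "reasonably fast": each update touches three count arrays and a T-vector.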
16
Dataset 1: U.S. Senate
  • 16 years of voting records in the US Senate (1989-2005)
  • a Senator may respond Yea or Nay to a resolution
  • 3423 resolutions with text attributes (index terms)
  • 191 Senators in total across 16 years

S.543 Title: An Act to reform Federal deposit insurance, protect the deposit insurance funds, recapitalize the Bank Insurance Fund, improve supervision and regulation of insured depository institutions, and for other purposes.
Sponsor: Sen Riegle, Donald W., Jr. [MI] (introduced 3/5/1991). Cosponsors: 2.
Latest Major Action: 12/19/1991 Became Public Law No: 102-242.
Index terms: Banks and banking, Accounting, Administrative fees, Cost control, Credit, Deposit insurance, Depressed areas, and other 110 terms.
Adams (D-WA), Nay; Akaka (D-HI), Yea; Bentsen (D-TX), Yea; Biden (D-DE), Yea; Bond (R-MO), Yea; Bradley (D-NJ), Nay; Conrad (D-ND), Nay
17
Topics Discovered (U.S. Senate)
Mixture of Unigrams:
| Education  | Energy    | Military Misc. | Economic  |
| education  | energy    | government     | federal   |
| school     | power     | military       | labor     |
| aid        | water     | foreign        | insurance |
| children   | nuclear   | tax            | aid       |
| drug       | gas       | congress       | tax       |
| students   | petrol    | aid            | business  |
| elementary | research  | law            | employee  |
| prevention | pollution | policy         | care      |

Group-Topic Model:
| Education + Domestic | Foreign      | Economic  | Social Security + Medicare |
| education            | foreign      | labor     | social                     |
| school               | trade        | insurance | security                   |
| federal              | chemicals    | tax       | insurance                  |
| aid                  | tariff       | congress  | medical                    |
| government           | congress     | income    | care                       |
| tax                  | drugs        | minimum   | medicare                   |
| energy               | communicable | wage      | disability                 |
| research             | diseases     | business  | assistance                 |
18
Groups Discovered (US Senate)
Groups from the topic Education + Domestic
19
Senators Who Change Coalition the Most, Depending on Topic
e.g. Senator Shelby (D-AL) votes with the Republicans on Economic, with the Democrats on Education + Domestic, and with a small group of maverick Republicans on Social Security + Medicare
20
Dataset 2: The UN General Assembly
  • Voting records of the UN General Assembly (1990 -
    2003)
  • A country may choose to vote Yes, No or Abstain
  • 931 resolutions with text attributes (titles)
  • 192 countries in total
  • Also experiments later with resolutions from
    1960-2003

Vote on Permanent Sovereignty of Palestinian People, 87th plenary meeting. The draft resolution on permanent sovereignty of the Palestinian people in the occupied Palestinian territory, including Jerusalem, and of the Arab population in the occupied Syrian Golan over their natural resources (document A/54/591) was adopted by a recorded vote of 145 in favour to 3 against, with 6 abstentions.
In favour: Afghanistan, Argentina, Belgium, Brazil, Canada, China, France, Germany, India, Japan, Mexico, Netherlands, New Zealand, Pakistan, Panama, Russian Federation, South Africa, Spain, Turkey, and other 126 countries.
Against: Israel, Marshall Islands, United States.
Abstain: Australia, Cameroon, Georgia, Kazakhstan, Uzbekistan, Zambia.
21
Topics Discovered (UN)
Mixture of Unigrams:
| Everything Nuclear | Human Rights | Security in Middle East |
| nuclear            | rights       | occupied                |
| weapons            | human        | israel                  |
| use                | palestine    | syria                   |
| implementation     | situation    | security                |
| countries          | israel       | calls                   |

Group-Topic Model:
| Nuclear Non-proliferation | Nuclear Arms Race | Human Rights |
| nuclear                   | nuclear           | rights       |
| states                    | arms              | human        |
| united                    | prevention        | palestine    |
| weapons                   | race              | occupied     |
| nations                   | space             | israel       |
22
Groups Discovered (UN)
The country lists for each group are ordered by their 2005 GDP (PPP); only 5 countries are shown for groups with more than 5 members.
23
Outline
Discovering Latent Structure in Multiple Modalities
  • Groups + Text (Group-Topic Model, GT)
  • Nested Correlations (Pachinko Allocation, PAM)
  • Time + Text (Topics-over-Time Model, TOT)
  • Time + Text with Nested Correlations (PAM-TOT)
  • Multi-Conditional Mixtures

24
Latent Dirichlet Allocation
Blei, Ng, Jordan, 2003
[Plate diagram: α → θ (per-document topic distribution) → z (topic index) → w (word), with β → φ (T topic-word multinomials); plates over the N documents and the Nd words in each.]
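LDA's generative process can be sketched directly; the sizes below (T, V, N) are arbitrary toy values, not settings from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
T, V, N = 3, 8, 20  # topics, vocabulary size, words per document (toy)

alpha = np.full(T, 0.5)  # Dirichlet prior on document-topic mixtures
beta = np.full(V, 0.1)   # Dirichlet prior on topic-word distributions

phi = rng.dirichlet(beta, size=T)   # one multinomial over words per topic
theta = rng.dirichlet(alpha)        # this document's mixture over topics
zs = rng.choice(T, size=N, p=theta)              # topic index z per word
ws = [int(rng.choice(V, p=phi[z])) for z in zs]  # word w drawn from topic z
print(ws)
```

Because θ is a flat Dirichlet draw, LDA cannot express correlations between topics; that limitation motivates the CTM and PAM models on the next slides.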
25
Correlated Topic Model
Blei, Lafferty, 2005
[Plate diagram: as LDA, but the topic proportions are drawn from a logistic normal with mean μ and covariance Σ instead of a Dirichlet.]
Square matrix of pairwise correlations.
26
Topic Correlation Representation
7 topics: A, B, C, D, E, F, G. Correlations: {A, B, C, D, E} and {C, D, E, F, G}
[Figure: the CTM represents these correlations with pairwise edges among topics A–G.]
27
Pachinko Machine
28
Pachinko Allocation Model
Thanks to Michael Jordan for suggesting the name
Li, McCallum, 2005, 2006
[DAG figure: root θ11; interior nodes θ21, θ22; θ31, θ32, θ33; θ41–θ45; leaves word1–word8. Model structure, not the graphical model.]
Given a directed acyclic graph (DAG): at each interior node, a Dirichlet over its children; words at the leaves.
For each document: sample a multinomial from each Dirichlet.
For each word in this document: starting from the root, sample a child from successive nodes, down to a leaf; generate the word at the leaf.
Like a Polya tree, but DAG-shaped, with an arbitrary number of children.
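The word-generation walk just described can be sketched as follows. The tiny DAG is an illustrative assumption (the real model's structure is learned), with interior nodes holding per-document multinomials over their children and integer word ids at the leaves.

```python
import numpy as np

rng = np.random.default_rng(3)

# Tiny illustrative DAG: interior nodes map to children; leaves are
# integer word ids. Node names here are hypothetical.
children = {"root": ["s1", "s2"], "s1": ["t1", "t2"], "s2": ["t2", "t3"],
            "t1": [0, 1, 2], "t2": [2, 3, 4], "t3": [4, 5]}

# Per document: one multinomial per interior node, drawn from a
# symmetric Dirichlet over that node's children.
multinomials = {n: rng.dirichlet(np.ones(len(c))) for n, c in children.items()}

def sample_word():
    # From the root, sample a child at each node until reaching a leaf.
    node = "root"
    while node in children:
        node = rng.choice(children[node], p=multinomials[node])
    return int(node)

doc = [sample_word() for _ in range(10)]
print(doc)
```

Note that "t2" is reachable from both "s1" and "s2", which is exactly the DAG (rather than tree) property that lets PAM share sub-topics across super-topics.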
29
Pachinko Allocation Model
Li, McCallum, 2005
  • DAG may have arbitrary structure
  • arbitrary depth
  • any number of children per node
  • sparse connectivity
  • edges may skip layers

[DAG figure as on the previous slide. Model structure, not the graphical model.]
30
Pachinko Allocation Model
Li, McCallum, 2005
[DAG figure as before. Model structure, not the graphical model. The root holds distributions over distributions over topics; mid-level nodes hold distributions over topic mixtures, representing topic correlations; the lowest nodes hold distributions over words (like LDA topics) over leaves word1–word8.]
Some interior nodes could contain one multinomial, used for all documents (i.e. a very peaked Dirichlet).
31
Pachinko Allocation Model
Li, McCallum, 2005
[DAG figure as before. Model structure, not the graphical model.]
Estimate all these Dirichlets from data. Estimate model structure from data (number of nodes, and connectivity).
32
Pachinko Allocation Special Cases
Latent Dirichlet Allocation
[Figure: a single interior node over sub-topics θ41–θ45, each a multinomial over word1–word8; this special case recovers LDA.]
33
Pachinko Allocation Special Cases
Hierarchical Latent Dirichlet Allocation (HLDA)
Very low variance Dirichlet at root.
Each leaf of the HLDA topic hierarchy has a distribution over the nodes on its path to the root.
[The HLDA hierarchy: θ11; θ21–θ24; θ31–θ34; θ41, θ42; θ51; leaves word1–word8.]
34
Pachinko Allocation on a Topic Hierarchy
Combining best of HLDA and Pachinko Allocation
[The PAM DAG: root θ00 with children θ11, θ12, representing correlations among topic leaves, layered on top of the HLDA hierarchy (θ21–θ24; θ31–θ34; θ41, θ42; θ51; leaves word1–word8).]
35
Pachinko Allocation Model
... with two layers, no skipping layers, fully-connected from one layer to the next.
[Figure: root θ11 over super-topics θ21–θ23; super-topics over sub-topics θ31–θ35, which are fixed multinomials over word1–word8.]
Another special case would select only one super-topic per document.
36
Graphical Models
Four-level PAM (with fixed multinomials for
sub-topics)
LDA
[Plate diagrams. LDA: α → θ → z → w, with β → φ over the T topics. Four-level PAM: α1, α2 → θ2, θ3 → z2, z3 → w, with fixed multinomials β → φ for the sub-topics.]
37
Inference Gibbs Sampling
[Plate diagram: α2, α3 → θ2, θ3 → z2, z3 (jointly sampled) → w, with β → φ.]
Dirichlet parameters α are estimated with moment matching.
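Moment matching for Dirichlet parameters can be sketched as below. This is one standard scheme (match the sample mean, set the precision from the sample variances), offered as an assumption about the update rather than the talk's exact formula.

```python
import numpy as np

def dirichlet_moment_match(p):
    """Estimate Dirichlet parameters from rows of proportions p.

    For Dirichlet(a) with precision s = sum(a): mean m = a / s and
    var_k = m_k (1 - m_k) / (s + 1), so each component gives an
    estimate of s; we combine them with a geometric mean.
    """
    m = p.mean(axis=0)
    v = p.var(axis=0)
    ok = (v > 0) & (m > 0) & (m < 1)
    s = np.exp(np.mean(np.log(m[ok] * (1 - m[ok]) / v[ok] - 1)))
    return s * m

rng = np.random.default_rng(4)
true_alpha = np.array([2.0, 5.0, 3.0])
samples = rng.dirichlet(true_alpha, size=5000)
print(dirichlet_moment_match(samples))  # roughly recovers [2, 5, 3]
```

Moment matching avoids the iterative fixed-point updates of maximum-likelihood Dirichlet estimation, which is why it pairs well with a sampler that re-estimates hyperparameters every few sweeps.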
38
Experimental Results
  • Topic clarity by human judgement
  • Likelihood on held-out data
  • Document classification

39
Datasets
  • Rexa (http://rexa.info/)
  • 4000 documents, 278438 word tokens and 25597
    unique words.
  • NIPS
  • 1647 documents, 114142 word tokens and 11708
    unique words.
  • 20 newsgroup comp5 subset
  • 4836 documents, 35567 unique words.

40
Topic Correlations
41
Example Topics
Topics about images, motion, eyes; some generic motion topics.
[Panel labels: motion; eyes; images]

LDA 100: motion detection field optical flow sensitive moving functional detect contrast light dimensional intensity computer mt measures occlusion temporal edge real

PAM 100: motion video surface surfaces figure scene camera noisy sequence activation generated analytical pixels measurements assigne advance lated shown closed perceptual

LDA 20: visual model motion field object image images objects fields receptive eye position spatial direction target vision multiple figure orientation location

PAM 100: eye head vor vestibulo oculomotor vestibular vary reflex vi pan rapid semicircular canals responds streams cholinergic rotation topographically detectors ning

PAM 100: image digit faces pixel surface interpolation scene people viewing neighboring sensors patches manifold dataset magnitude transparency rich dynamical amounts tor
42
Blind Topic Evaluation
  • Randomly select 25 similar pairs of topics
    generated from PAM and LDA
  • 5 people
  • Each asked to select the topic in each pair that they find more semantically coherent.

Topic counts:
|           | LDA | PAM |
| 5 votes   | 0   | 5   |
| ≥ 4 votes | 3   | 8   |
| ≥ 3 votes | 9   | 16  |
43
Examples
|        | PAM | LDA |
| Pair 1 | control systems robot adaptive environment goal state controller | control systems based adaptive direct con controller change |
| votes  | 5 | 0 |
| Pair 2 | motion image detection images scene vision texture segmentation | image motion images multiple local generated noisy optical |
| votes  | 4 | 1 |
44
Examples
|        | PAM | LDA |
| Pair 1 | algorithm learning algorithms gradient convergence function stochastic weight | algorithm algorithms gradient convergence stochastic line descent converge |
| votes  | 4 | 1 |
| Pair 2 | signals source separation eeg sources blind single event | signal signals single time low source temporal processing |
| votes  | 1 | 4 |
45
Likelihood Comparison
  • Dataset NIPS
  • Two sets of experiments
  • Varying number of topics
  • Different proportions of training data

46
Likelihood Comparison
  • Varying number of topics

47
Likelihood Comparison
  • Different proportions of training data

48
Likelihood Estimation
  • Variational: perform inference in a simpler model
  • (Gibbs sampling) Harmonic mean: approximate the marginal probability with the harmonic mean of conditional probabilities
  • (Gibbs sampling) Empirical likelihood: estimate the distribution based on empirical samples [Diggle, Gratton 1984]
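The harmonic-mean estimator can be sketched as follows (a generic numerically stable version, not code from the talk): given per-sample conditional log-likelihoods log p(w | z(s)) from Gibbs samples, the marginal p(w) is approximated by their harmonic mean.

```python
import numpy as np

def harmonic_mean_log_evidence(log_liks):
    """Harmonic-mean estimate of log p(w) from Gibbs samples.

    log_liks[s] = log p(w | z^(s)) for posterior sample s; p(w) is
    approximated by the harmonic mean of these conditional
    probabilities. (Famously high-variance, but cheap to compute.)
    """
    x = -np.asarray(log_liks)
    m = x.max()
    # Stable log-mean-exp of the negated log-likelihoods, then negate.
    return float(-(m + np.log(np.mean(np.exp(x - m)))))

print(harmonic_mean_log_evidence([-10.0, -11.0, -10.5]))
```

The max-subtraction keeps the exponentials in range; without it, even modest log-likelihoods underflow to zero.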
49
Empirical Likelihood Estimation






50
Document Classification
  • 20 newsgroup comp5 subset
  • 5-way classification (accuracy in %)

| class     | docs | LDA   | PAM   |
| graphics  | 243  | 83.95 | 86.83 |
| os        | 239  | 81.59 | 84.10 |
| pc        | 245  | 83.67 | 88.16 |
| mac       | 239  | 86.61 | 89.54 |
| windows.x | 243  | 88.07 | 92.20 |
| total     | 1209 | 84.70 | 87.34 |

Statistically significant with a p-value < 0.05.
51
Outline
Discovering Latent Structure in Multiple Modalities
  • Groups + Text (Group-Topic Model, GT)
  • Nested Correlations (Pachinko Allocation, PAM)
  • Time + Text (Topics-over-Time Model, TOT)
  • Time + Text with Nested Correlations (PAM-TOT)
  • Multi-Conditional Mixtures
52
Want to Model Trends over Time
  • Is prevalence of topic growing or waning?
  • Pattern appears only briefly
  • Capture its statistics in focused way
  • Don't confuse it with patterns elsewhere in time
  • How do roles, groups, influence shift over time?

53
Topics over Time (TOT)
Wang, McCallum 2006
[Plate diagram: α (Dirichlet prior) → θ (multinomial over topics) → z (topic index); the topic's multinomial over words φ (Dirichlet prior β) generates word w, and the topic's Beta over time ψ (uniform prior) generates timestamp t; plates T, Nd, D.]
54
Attributes of this Approach to Modeling Time
  • Not a Markov model
  • No state transitions, or Markov assumption
  • Continuous Time
  • Time not discretized
  • Easily incorporated into other more complex
    models with additional modalities.
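TOT's per-token generation can be sketched with toy parameters (the sizes and the two Beta shapes below are illustrative assumptions): each token's topic draws both its word and its timestamp, so topics localize in continuous time with no discretization.

```python
import numpy as np

rng = np.random.default_rng(5)
T, V, N = 2, 6, 30  # topics, vocabulary, tokens (toy sizes)

phi = rng.dirichlet(np.full(V, 0.1), size=T)  # per-topic word multinomials
theta = rng.dirichlet(np.full(T, 0.5))        # document's topic mixture
# Per-topic Beta over normalized time in [0, 1]: topic 0 early, topic 1 late
psi = np.array([[2.0, 8.0], [8.0, 2.0]])

tokens = []
for _ in range(N):
    z = rng.choice(T, p=theta)          # topic index for this token
    w = rng.choice(V, p=phi[z])         # word from the topic's multinomial
    t = rng.beta(psi[z, 0], psi[z, 1])  # timestamp from the topic's Beta
    tokens.append((int(z), int(w), float(t)))
print(tokens[:3])
```

Because the Beta density is continuous on [0, 1], a topic concentrated in a short interval gets sharp statistics instead of being smeared across discrete time bins.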

55
State of the Union Addresses
208 addresses delivered between January 8, 1790 and January 29, 2002.
  • To increase the number of documents, we split the addresses into paragraphs and treated them as documents. One-line paragraphs were excluded. Stopword removal was applied.
  • 17156 documents
  • 21534 words
  • 669,425 tokens
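The preprocessing just described can be sketched as follows; the stopword list and the blank-line paragraph convention are assumptions for illustration, not the exact pipeline used.

```python
# Stopword list and blank-line paragraph convention are assumptions.
STOPWORDS = {"the", "of", "and", "to", "a", "in", "is", "it", "this", "be"}

def preprocess(address_text):
    docs = []
    for para in address_text.split("\n\n"):
        para = para.strip()
        if "\n" not in para:      # exclude one-line paragraphs
            continue
        tokens = [w.lower().strip(".,;:!?") for w in para.split()]
        docs.append([w for w in tokens if w and w not in STOPWORDS])
    return docs

sample = ("Taxes were levied upon imports.\n\n"
          "Our scheme of taxation consists of a tariff\n"
          "levied upon importations from abroad.")
print(preprocess(sample))
```

Splitting into paragraphs turns 208 long addresses into thousands of short documents, which gives the topic model many more document-level mixture observations.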

Our scheme of taxation, by means of which this
needless surplus is taken from the people and put
into the public Treasury, consists of a tariff
or duty levied upon importations from abroad and
internal-revenue taxes levied upon the
consumption of tobacco and spirituous and malt
liquors. It must be conceded that none of the
things subjected to internal-revenue
taxation are, strictly speaking, necessaries.
There appears to be no just complaint of this
taxation by the consumers of these articles, and
there seems to be nothing so well able to bear
the burden without hardship to any portion of the
people.
1910
56
Comparing TOT against LDA
57
TOT on 17 years of NIPS proceedings
58
Topic Distributions Conditioned on Time
[Figure: topic mass (vertical height) as a function of time.]
59
TOT on 17 years of NIPS proceedings
TOT
LDA
60
TOT versus LDA on my email
61
TOT improves ability to Predict Time
Predicting the year of a State-of-the-Union
address.
L1 distance between predicted year and actual
year.
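Time prediction with a TOT-style model can be sketched as an argmax over candidate normalized timestamps; the ψ parameters and token topics below are toy assumptions, not values from the experiment.

```python
import math
import numpy as np

def log_beta_pdf(t, a, b):
    # Log density of Beta(a, b) at t in (0, 1)
    return ((a - 1) * math.log(t) + (b - 1) * math.log(1 - t)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

# Per-topic Beta over normalized time, and the document's sampled token
# topics: toy assumptions for illustration.
psi = [(2.0, 8.0), (8.0, 2.0)]
doc_topics = [0, 0, 0, 1]

# Score each candidate normalized year by the sum of per-token log
# densities and predict the argmax.
candidates = np.linspace(0.01, 0.99, 99)
scores = [sum(log_beta_pdf(t, *psi[z]) for z in doc_topics)
          for t in candidates]
pred = float(candidates[int(np.argmax(scores))])
print(pred)  # argmax lands near 10/32 ≈ 0.31 for these toy numbers
```

Mapping the predicted normalized time back to a calendar year, then taking |predicted − actual|, gives the L1 distance reported on this slide.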
62
Outline
Discovering Latent Structure in Multiple Modalities
  • Groups + Text (Group-Topic Model, GT)
  • Nested Correlations (Pachinko Allocation, PAM)
  • Time + Text (Topics-over-Time Model, TOT)
  • Time + Text with Nested Correlations (PAM-TOT)
  • Multi-Conditional Mixtures
63
PAM Over Time (PAMTOT)
[Plate diagram: as four-level PAM (α1, α2 → θ2, θ3 → z2, z3 → w, with β → φ), plus per-topic Beta distributions ψ generating timestamps t2, t3.]
64
Experimental Results
  • Dataset: Rexa subset
  • 4454 documents between years 1991 and 2005
  • 372936 word tokens
  • 21748 unique words
  • Topic Examples
  • Predict Time

65
Topic Examples (1)
PAMTOT
PAM
66
Topic Examples (2)
PAMTOT
PAM
67
Topic Examples (3)
PAMTOT
PAM
68
Predict Time with PAMTOT
|        | L1 Error | E(L1) | Accuracy |
| PAMTOT | 1.56     | 1.57  | 0.29     |
| PAM    | 5.34     | 5.30  | 0.10     |

70% error reduction.
L1 Error: the difference between predicted and true years. E(L1): the average difference between all years and the true year, using p(t|·) from the model.
69
Outline
Discovering Latent Structure in Multiple Modalities
  • Groups + Text (Group-Topic Model, GT)
  • Nested Correlations (Pachinko Allocation, PAM)
  • Time + Text (Topics-over-Time Model, TOT)
  • Time + Text with Nested Correlations (PAM-TOT)
  • Multi-Conditional Mixtures
70
Want a topic model with the advantages of CRFs
  • Use arbitrary, overlapping features of the input.
  • Undirected graphical model, so we don't have to think about avoiding cycles.
  • Integrate naturally with our other CRF
    components.
  • Train discriminatively
  • Natural semi-supervised transfer learning

What does this mean? Topic models are
unsupervised!
71
Multi-Conditional Mixtures: Latent Variable Models fit by Multi-way Conditional Probability
McCallum, Wang, Pal, 2005; McCallum, Pal, Druck, Wang, 2006
  • For clustering structured data, a la Latent Dirichlet Allocation and its successors
  • But an undirected model, like the Harmonium [Welling, Rosen-Zvi, Hinton, 2005]
  • But trained by a multi-conditional objective: O ∝ P(A|B,C) P(B|A,C) P(C|A,B), e.g. A, B, C are different modalities

See also [Minka 2005 TR] and [Pereira, Gordon, ICML 2006]
72
Objective Functions for Parameter Estimation
Traditional
New, multi-conditional
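The equations on this slide did not survive transcription; a reconstruction consistent with the multi-conditional objective stated on the previous slide (an educated guess at the slide's content, not a verbatim recovery) is:

```latex
% Traditional objectives: joint (generative) or single-conditional
\mathcal{O}_{\text{joint}}(\Theta) = \log P(A, B, C \mid \Theta),
\qquad
\mathcal{O}_{\text{cond}}(\Theta) = \log P(A \mid B, C;\, \Theta)

% New, multi-conditional objective (cf. slide 71)
\mathcal{O}_{\text{MC}}(\Theta) = \log P(A \mid B, C;\, \Theta)
  + \log P(B \mid A, C;\, \Theta) + \log P(C \mid A, B;\, \Theta)
```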
73
Multi-Conditional Learning (Regularization)
McCallum, Pal, Wang, 2006
74
Multi-Conditional Mixtures
75
Predictive Random Fields: mixture of Gaussians on synthetic data
McCallum, Wang, Pal, 2005
[Figure panels: Data (classify by color); Generatively trained; Multi-Conditional; Conditionally-trained [Jebara 1998]]
76
Topic Words: Strong positive and negative indicators
20 Newsgroups data, two subtopics of
talk.politics.guns
77
Multi-Conditional Harmonium
78
Multi-Conditional Mixtures vs. Harmonium on document retrieval task
McCallum, Wang, Pal, 2005
Multi-Conditional,multi-way conditionally trained
Conditionally-trained,to predict class labels
Harmonium, joint,with class labels and words
Harmonium, joint with words, no labels
79
Outline
Discovering Latent Structure in Multiple Modalities
  • Groups + Text (Group-Topic Model, GT)
  • Nested Correlations (Pachinko Allocation, PAM)
  • Time + Text (Topics-over-Time Model, TOT)
  • Time + Text with Nested Correlations (PAM-TOT)
  • Multi-Conditional Mixtures
80
Summary