Title: Hierarchical Topic Models and the Nested Chinese Restaurant Process
1. Hierarchical Topic Models and the Nested Chinese Restaurant Process
- Blei, Griffiths, Jordan, Tenenbaum
- Presented by Rodrigo de Salvo Braz
2. Document classification
- One-class approach: one topic per document, with words generated according to that topic.
- For example, a Naive Bayes model.
3. Document classification
- It is more realistic to assume more than one topic per document.
- Generative model: pick a mixture distribution over K topics and generate words from it.
4. Document classification
- Even more realistic: topics may be organized in a hierarchy (not independent).
- Pick a path from root to leaf in a tree; each node is a topic; sample words from the mixture of topics along the path (see the sketch below).
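A minimal sketch of the path-picking step, assuming a small hand-built topic tree; the node names and the `children` map are hypothetical:

```python
import random

# Hypothetical fixed topic tree: children maps parent -> list of child topics.
children = {
    "root":    ["science", "arts"],
    "science": ["physics", "biology"],
    "arts":    ["music", "painting"],
}

def sample_path(node="root"):
    """Walk from the root to a leaf, choosing a child uniformly at each step."""
    path = [node]
    while node in children:
        node = random.choice(children[node])
        path.append(node)
    return path

print(sample_path())  # e.g. ['root', 'science', 'biology']
```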
5. Dirichlet distribution (DD)
- Distribution over distribution vectors of dimension K: $P(p \mid u) = \frac{1}{Z(u)} \prod_{i=1}^{K} p_i^{u_i - 1}$
- The parameters u act as a prior (pseudo-counts of previous observations).
- A symmetric Dirichlet distribution assumes a uniform prior ($u_i = u_j$ for all i, j).
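A quick illustration of Dirichlet draws with NumPy; the dimension K and the parameter values are arbitrary toy choices:

```python
import numpy as np

rng = np.random.default_rng(0)

K = 5
u = np.full(K, 0.1)              # symmetric Dirichlet: all u_i equal

p = rng.dirichlet(u)             # one distribution vector over K outcomes
print(p, p.sum())                # components are nonnegative and sum to 1

# Small u_i -> sparse draws (mass concentrated on few components);
# large u_i -> draws close to the uniform distribution.
print(rng.dirichlet(np.full(K, 10.0)))
```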
6. Latent Dirichlet Allocation (LDA)
- Generative model of multiple-topic documents.
- Generate a mixture distribution over topics using a Dirichlet distribution.
- For each word, pick a topic according to that distribution and generate the word according to the word distribution for the topic (see the sketch below).
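A toy sketch of this generative process, assuming small hypothetical sizes for the vocabulary, topic count, and document length:

```python
import numpy as np

rng = np.random.default_rng(1)

V, K, n_words = 8, 3, 20          # toy vocabulary size, topics, words per document
alpha = np.full(K, 0.5)           # Dirichlet hyperparameter over topics
beta = rng.dirichlet(np.full(V, 0.1), size=K)   # K topic-word distributions

def generate_document():
    theta = rng.dirichlet(alpha)              # per-document topic mixture
    words = []
    for _ in range(n_words):
        z = rng.choice(K, p=theta)            # pick a topic from the mixture
        w = rng.choice(V, p=beta[z])          # pick a word from that topic
        words.append(w)
    return words

print(generate_document())
```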
7. Latent Dirichlet Allocation (LDA)
[Figure: LDA graphical model. A Dirichlet hyperparameter generates a per-document topic distribution; each of the W words in a document gets a topic drawn from that distribution and is then drawn from one of the K topic-word distributions.]
8-16. Chinese Restaurant Process (CRP)
- Customers 1 through 9 arrive one at a time; each sits at an occupied table with probability proportional to the number of customers already seated there, or at a new table with probability proportional to the concentration parameter (see the sketch below).
- Finally, a data point (a distribution itself) is sampled for each table.
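A small simulation of the seating rule described above; the concentration parameter `gamma` and the customer count are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

def crp_seating(n_customers, gamma=1.0):
    """Seat customers one by one; return the table index of each customer."""
    tables = []                            # tables[k] = customers at table k
    seats = []
    for n in range(n_customers):
        # existing table k: m_k / (n + gamma); new table: gamma / (n + gamma)
        probs = np.array(tables + [gamma], dtype=float) / (n + gamma)
        k = rng.choice(len(probs), p=probs)
        if k == len(tables):
            tables.append(1)               # open a new table
        else:
            tables[k] += 1
        seats.append(k)
    return seats

print(crp_seating(9))  # e.g. [0, 0, 1, 0, 2, 1, 0, 0, 1]
```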
17. Species Sampling Mixture
- Generative model of multiple-topic documents.
- Generate a mixture distribution over topics using a CRP prior.
- Pick a topic according to that distribution and generate words according to the word distribution for the topic (a sketch follows below).
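A sketch of the species sampling view: words are seated by a CRP, and each newly opened table draws a fresh topic, so the number of topics is unbounded rather than fixed at K. All sizes and priors here are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

V, gamma, n_words = 8, 1.0, 20    # toy vocabulary, concentration, document length

def generate_document():
    topics = []                    # one word distribution per occupied table
    counts = []                    # number of words seated at each table
    words = []
    for n in range(n_words):
        probs = np.array(counts + [gamma], dtype=float) / (n + gamma)
        k = rng.choice(len(probs), p=probs)
        if k == len(topics):       # new table: draw a brand-new topic
            topics.append(rng.dirichlet(np.full(V, 0.1)))
            counts.append(1)
        else:
            counts[k] += 1
        words.append(rng.choice(V, p=topics[k]))
    return words

print(generate_document())
```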
18. Species Sampling Mixture
[Figure: graphical model as in LDA, but the topic distribution is generated by a CRP hyperparameter instead of a Dirichlet, so the number of topics is not fixed in advance.]
19. Nested CRP
[Figure: a tree of Chinese restaurants. Six customers are seated at the root restaurant; the groups sharing a table there (e.g. {3, 6} and {1, 2, 4, 5}) move on to distinct restaurants at the next level, where each group is seated again by its own CRP (see the sketch below).]
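A sketch of the nested CRP as a path sampler: every tree node runs its own CRP over its child tables, so each document gets a root-to-leaf path of restaurants. The depth, `gamma`, and the tuple encoding of tree nodes are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)

def ncrp_paths(n_documents, depth=3, gamma=1.0):
    """Each document runs a CRP at every level, yielding a root-to-leaf path."""
    counts = {(): []}              # counts[node] = visit counts of its child tables
    paths = []
    for _ in range(n_documents):
        node = ()
        for _ in range(depth):
            c = counts[node]
            probs = np.array(c + [gamma], dtype=float) / (sum(c) + gamma)
            k = rng.choice(len(probs), p=probs)
            if k == len(c):
                c.append(0)        # open a new table (a new subtree)
            c[k] += 1
            node = node + (k,)
            counts.setdefault(node, [])
        paths.append(node)
    return paths

print(ncrp_paths(6))  # e.g. [(0, 0, 0), (0, 0, 0), (0, 1, 0), (1, 0, 0), ...]
```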
20. Hierarchical LDA (hLDA)
- Generative model of multiple-topic documents.
- Generate a mixture distribution over topics using a nested CRP prior.
- Pick a topic according to that distribution and generate words according to the word distribution for the topic (see the sketch below).
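Putting the pieces together, a toy sketch of hLDA's generative process: a nested-CRP path per document, plus a per-document mixture over the levels of that path. The root topic is omitted here and all sizes, priors, and names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)

V, depth, gamma = 8, 3, 1.0
alpha = np.full(depth, 1.0)        # Dirichlet over the levels of the path
eta = np.full(V, 0.1)              # prior for each topic's word distribution

counts = {(): []}                  # nested-CRP child-table counts per tree node
topics = {}                        # one word distribution per tree node

def sample_path():
    node = ()
    for _ in range(depth):
        c = counts[node]
        probs = np.array(c + [gamma], dtype=float) / (sum(c) + gamma)
        k = rng.choice(len(probs), p=probs)
        if k == len(c):
            c.append(0)
        c[k] += 1
        node = node + (k,)
        counts.setdefault(node, [])
        topics.setdefault(node, rng.dirichlet(eta))  # lazily create the topic
    return node

def generate_document(n_words=20):
    path = sample_path()
    theta = rng.dirichlet(alpha)   # per-document mixture over path levels
    words = []
    for _ in range(n_words):
        level = rng.choice(depth, p=theta)           # pick a level on the path
        node = path[: level + 1]
        words.append(rng.choice(V, p=topics[node]))  # emit a word from that topic
    return words

print(generate_document())
```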
21. hLDA graphical model
22. Artificial data experiment
- 100 documents of 1000 words each, over a 25-term vocabulary.
- Each vertical bar is a topic.
23. CRP prior vs. Bayes factors
24. Predicting the structure
25. NIPS abstracts
26. Comments
- Accommodates growing collections of data.
- Hierarchical organization makes sense, but it is not clear to me why the CRP prior is the best prior for that.
- No mention of running time; maybe it takes a very long time.