Title: Hierarchical Topic Models and the Nested Chinese Restaurant Process
1. Hierarchical Topic Models and the Nested Chinese Restaurant Process
- Blei, Griffiths, Jordan, Tenenbaum
- Presented by Rodrigo de Salvo Braz
2. Document classification
- One-class approach: one topic per document, with words generated according to that topic.
- For example, a Naive Bayes model.
3. Document classification
- It is more realistic to assume more than one topic per document.
- Generative model: pick a mixture distribution over K topics and generate words from it.
4. Document classification
- Even more realistic: topics may be organized in a hierarchy (not independent).
- Pick a path from root to leaf in a tree; each node is a topic; sample words from the mixture of topics along the path (see the sketch below).
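A minimal sketch of the path-picking step, assuming a small hand-built topic tree; the node names and the `children` map are hypothetical:

```python
import random

# Hypothetical fixed topic tree: children maps parent -> list of child topics.
children = {
    "root":    ["science", "arts"],
    "science": ["physics", "biology"],
    "arts":    ["music", "painting"],
}

def sample_path(node="root"):
    """Walk from the root to a leaf, choosing a child uniformly at each step."""
    path = [node]
    while node in children:
        node = random.choice(children[node])
        path.append(node)
    return path

print(sample_path())  # e.g. ['root', 'science', 'biology']
```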
5. Dirichlet distribution (DD)
- Distribution over distribution vectors of dimension K: $P(p \mid u) = \frac{1}{Z(u)} \prod_{i=1}^{K} p_i^{u_i - 1}$
- The parameters u act as a prior (pseudo-counts of previous observations).
- A symmetric Dirichlet distribution assumes a uniform prior ($u_i = u_j$ for all i, j).
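A quick illustration of Dirichlet draws with NumPy; the dimension K and the parameter values are arbitrary toy choices:

```python
import numpy as np

rng = np.random.default_rng(0)

K = 5
u = np.full(K, 0.1)              # symmetric Dirichlet: all u_i equal

p = rng.dirichlet(u)             # one distribution vector over K outcomes
print(p, p.sum())                # components are nonnegative and sum to 1

# Small u_i -> sparse draws (mass concentrated on few components);
# large u_i -> draws close to the uniform distribution.
print(rng.dirichlet(np.full(K, 10.0)))
```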
6. Latent Dirichlet Allocation (LDA)
- Generative model of multiple-topic documents.
- Generate a mixture distribution over topics using a Dirichlet distribution.
- For each word, pick a topic according to that distribution and generate the word according to the word distribution for the topic (see the sketch below).
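A toy sketch of this generative process, assuming small hypothetical sizes for the vocabulary, topic count, and document length:

```python
import numpy as np

rng = np.random.default_rng(1)

V, K, n_words = 8, 3, 20          # toy vocabulary size, topics, words per document
alpha = np.full(K, 0.5)           # Dirichlet hyperparameter over topics
beta = rng.dirichlet(np.full(V, 0.1), size=K)   # K topic-word distributions

def generate_document():
    theta = rng.dirichlet(alpha)              # per-document topic mixture
    words = []
    for _ in range(n_words):
        z = rng.choice(K, p=theta)            # pick a topic from the mixture
        w = rng.choice(V, p=beta[z])          # pick a word from that topic
        words.append(w)
    return words

print(generate_document())
```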
7. Latent Dirichlet Allocation (LDA)
[Figure: LDA graphical model. A Dirichlet hyperparameter generates a per-document topic distribution; each of the W words in a document gets a topic drawn from that distribution and is then drawn from one of the K topic-word distributions.]
8-16. Chinese Restaurant Process (CRP)
- Customers 1 through 9 arrive one at a time; each sits at an occupied table with probability proportional to the number of customers already seated there, or at a new table with probability proportional to the concentration parameter (see the sketch below).
- Finally, a data point (a distribution itself) is sampled for each table.
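A small simulation of the seating rule described above; the concentration parameter `gamma` and the customer count are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

def crp_seating(n_customers, gamma=1.0):
    """Seat customers one by one; return the table index of each customer."""
    tables = []                            # tables[k] = customers at table k
    seats = []
    for n in range(n_customers):
        # existing table k: m_k / (n + gamma); new table: gamma / (n + gamma)
        probs = np.array(tables + [gamma], dtype=float) / (n + gamma)
        k = rng.choice(len(probs), p=probs)
        if k == len(tables):
            tables.append(1)               # open a new table
        else:
            tables[k] += 1
        seats.append(k)
    return seats

print(crp_seating(9))  # e.g. [0, 0, 1, 0, 2, 1, 0, 0, 1]
```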
17. Species Sampling Mixture
- Generative model of multiple-topic documents.
- Generate a mixture distribution over topics using a CRP prior.
- Pick a topic according to that distribution and generate words according to the word distribution for the topic (a sketch follows below).
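A sketch of the species sampling view: words are seated by a CRP, and each newly opened table draws a fresh topic, so the number of topics is unbounded rather than fixed at K. All sizes and priors here are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

V, gamma, n_words = 8, 1.0, 20    # toy vocabulary, concentration, document length

def generate_document():
    topics = []                    # one word distribution per occupied table
    counts = []                    # number of words seated at each table
    words = []
    for n in range(n_words):
        probs = np.array(counts + [gamma], dtype=float) / (n + gamma)
        k = rng.choice(len(probs), p=probs)
        if k == len(topics):       # new table: draw a brand-new topic
            topics.append(rng.dirichlet(np.full(V, 0.1)))
            counts.append(1)
        else:
            counts[k] += 1
        words.append(rng.choice(V, p=topics[k]))
    return words

print(generate_document())
```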
18. Species Sampling Mixture
[Figure: graphical model as in LDA, but the topic distribution is generated by a CRP hyperparameter instead of a Dirichlet, so the number of topics is not fixed in advance.]
19. Nested CRP
[Figure: a tree of Chinese restaurants. Six customers are seated at the root restaurant; the groups sharing a table there (e.g. {3, 6} and {1, 2, 4, 5}) move on to distinct restaurants at the next level, where each group is seated again by its own CRP (see the sketch below).]
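A sketch of the nested CRP as a path sampler: every tree node runs its own CRP over its child tables, so each document gets a root-to-leaf path of restaurants. The depth, `gamma`, and the tuple encoding of tree nodes are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)

def ncrp_paths(n_documents, depth=3, gamma=1.0):
    """Each document runs a CRP at every level, yielding a root-to-leaf path."""
    counts = {(): []}              # counts[node] = visit counts of its child tables
    paths = []
    for _ in range(n_documents):
        node = ()
        for _ in range(depth):
            c = counts[node]
            probs = np.array(c + [gamma], dtype=float) / (sum(c) + gamma)
            k = rng.choice(len(probs), p=probs)
            if k == len(c):
                c.append(0)        # open a new table (a new subtree)
            c[k] += 1
            node = node + (k,)
            counts.setdefault(node, [])
        paths.append(node)
    return paths

print(ncrp_paths(6))  # e.g. [(0, 0, 0), (0, 0, 0), (0, 1, 0), (1, 0, 0), ...]
```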
20. Hierarchical LDA (hLDA)
- Generative model of multiple-topic documents.
- Generate a mixture distribution over topics using a nested CRP prior.
- Pick a topic according to that distribution and generate words according to the word distribution for the topic (see the sketch below).
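Putting the pieces together, a toy sketch of hLDA's generative process: a nested-CRP path per document, plus a per-document mixture over the levels of that path. The root topic is omitted here and all sizes, priors, and names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(5)

V, depth, gamma = 8, 3, 1.0
alpha = np.full(depth, 1.0)        # Dirichlet over the levels of the path
eta = np.full(V, 0.1)              # prior for each topic's word distribution

counts = {(): []}                  # nested-CRP child-table counts per tree node
topics = {}                        # one word distribution per tree node

def sample_path():
    node = ()
    for _ in range(depth):
        c = counts[node]
        probs = np.array(c + [gamma], dtype=float) / (sum(c) + gamma)
        k = rng.choice(len(probs), p=probs)
        if k == len(c):
            c.append(0)
        c[k] += 1
        node = node + (k,)
        counts.setdefault(node, [])
        topics.setdefault(node, rng.dirichlet(eta))  # lazily create the topic
    return node

def generate_document(n_words=20):
    path = sample_path()
    theta = rng.dirichlet(alpha)   # per-document mixture over path levels
    words = []
    for _ in range(n_words):
        level = rng.choice(depth, p=theta)           # pick a level on the path
        node = path[: level + 1]
        words.append(rng.choice(V, p=topics[node]))  # emit a word from that topic
    return words

print(generate_document())
```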
21. hLDA graphical model
22. Artificial data experiment
- 100 documents of 1000 words each, over a 25-term vocabulary.
- Each vertical bar is a topic.
23. CRP prior vs. Bayes factors
24. Predicting the structure
25. NIPS abstracts
26. Comments
- Accommodates growing collections of data.
- Hierarchical organization makes sense, but it is not clear to me why the CRP prior is the best prior for that.
- No mention of running time; maybe it takes a very long time.