Title: Latent Dirichlet allocation
1Latent Dirichlet allocation
2Outline
- Brief introduction
- LDA parameter estimation
4First paper of LDA
- Latent Dirichlet Allocation
- Blei, D. M., Ng, A. Y., and Jordan, M. I.
- Journal of Machine Learning Research, 2003
- Figure
5Why LDA was proposed
- Other latent variable models
- Models
- Unigram
- Mixture of unigrams
- PLSI
- Drawbacks
- Cannot model documents that exhibit multiple topics
- Cannot make predictions on new (unseen) data
- Too many parameters to estimate, which makes estimation intractable
7Usage of LDA
- Topic-word-document distribution
- A result on Wikipedia
- Topic 0: medical health medicine care practice patient training treatment patients
- Topic 1: memory intel processor instruction processors cpu performance instructions
- ...
- Topic 199: distribution probability test random sample variables statistical variable data error
8Usage of LDA (cont)
- The Author-Topic Model for Authors and Documents (UAI, 2004)
9Usage of LDA (cont)
- Learning to Classify Short and Sparse Text & Web with Hidden Topics from Large-scale Data Collections (WWW08)
10Usage of LDA (cont)
- A Latent Dirichlet Model for Unsupervised Entity
Resolution (SIAM06)
11Usage of LDA (cont)
- LDA-Based Document Models for Ad-hoc Retrieval (SIGIR06)
- Topic Based Language Models for Ad Hoc Information Retrieval (Neural networks, 2004)
12Usage of LDA (cont)
- Latent Dirichlet Allocation in Web Spam Filtering (AIRWeb08)
- Probabilistic Models for Discovering E-Communities (WWW06)
- A Mixture Model for Contextual Text Mining (SIGKDD06)
- Latent Friend Mining from Blog Data (SIGKDD06)
13Usage of LDA (cont)
- Finding Scientific Topics (PNAS, 2004)
- Uses Gibbs sampling to estimate the parameters
- Determines the number of topics automatically
- Applied to PNAS data
14Outline
- Brief introduction
- LDA parameter estimation
15Beta distribution
16Beta distribution (cont)
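The formulas on these two slides were not transcribed. As a reference sketch, the standard Beta density is given below; the shape-parameter names a and b are assumed here, not taken from the slides.

```latex
\[
\mathrm{Beta}(x \mid a, b)
= \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\, x^{a-1} (1-x)^{b-1},
\qquad 0 \le x \le 1,\quad a, b > 0,
\]
\[
\mathbb{E}[x] = \frac{a}{a+b}.
\]
```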
17Dirichlet distribution
- Generalizes the Beta distribution from 2 to K dimensions
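A sketch of the standard Dirichlet density, to make the generalization concrete. The symbols theta (a point on the probability simplex) and alpha_1, ..., alpha_K (concentration parameters) are assumed notation.

```latex
\[
\mathrm{Dir}(\theta \mid \alpha_1, \dots, \alpha_K)
= \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)}
  \prod_{k=1}^{K} \theta_k^{\alpha_k - 1},
\qquad \theta_k \ge 0,\quad \sum_{k=1}^{K} \theta_k = 1 .
\]
```

Setting K = 2, theta_1 = x and theta_2 = 1 - x recovers the Beta density with parameters alpha_1 and alpha_2.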
18Conjugate prior distributions
If the likelihood P(X | theta) is a multinomial distribution with parameters theta (a vector), then the conjugate prior for theta is the Dirichlet distribution.
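Conjugacy means the posterior is again a Dirichlet whose parameters are the prior parameters plus the observed counts. A minimal sketch, assuming NumPy; the helper names `dirichlet_posterior` and `dirichlet_mean` are hypothetical and chosen only for illustration.

```python
import numpy as np


def dirichlet_posterior(alpha, counts):
    """Posterior Dirichlet parameters after observing multinomial counts."""
    alpha = np.asarray(alpha, dtype=float)
    counts = np.asarray(counts, dtype=float)
    if alpha.shape != counts.shape:
        raise ValueError("alpha and counts must have the same shape")
    # Dirichlet(alpha) prior + multinomial counts n -> Dirichlet(alpha + n)
    return alpha + counts


def dirichlet_mean(alpha):
    """Mean of a Dirichlet distribution: alpha_k / sum(alpha)."""
    alpha = np.asarray(alpha, dtype=float)
    return alpha / alpha.sum()


prior = [1.0, 1.0, 1.0]   # uniform prior over three outcomes
counts = [5, 2, 0]        # observed outcome counts
posterior = dirichlet_posterior(prior, counts)
print(posterior)                  # [6. 3. 1.]
print(dirichlet_mean(posterior))  # [0.6 0.3 0.1]
```

The outcome that was never observed keeps a nonzero posterior mean, which is the smoothing effect of the prior.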
19Latent Dirichlet allocation
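The model diagram on this slide was not transcribed. The generative process of LDA can be sketched as follows, assuming the smoothed variant (a symmetric Dirichlet prior `beta` on each topic's word distribution) and a fixed document length. The function name `generate_corpus` and all parameter names are hypothetical.

```python
import numpy as np


def generate_corpus(n_docs, doc_len, alpha, beta, vocab_size, seed=0):
    """Sample a synthetic corpus from the (smoothed) LDA generative process."""
    rng = np.random.default_rng(seed)
    n_topics = len(alpha)
    # One word distribution per topic: phi_k ~ Dirichlet(beta, ..., beta)
    phi = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)
    docs, assignments = [], []
    for _ in range(n_docs):
        # Topic mixture of this document: theta_d ~ Dirichlet(alpha)
        theta = rng.dirichlet(alpha)
        # Topic of each word position: z_n ~ Multinomial(theta_d)
        z = rng.choice(n_topics, size=doc_len, p=theta)
        # Each word is drawn from its topic: w_n ~ Multinomial(phi_{z_n})
        w = np.array([rng.choice(vocab_size, p=phi[k]) for k in z])
        docs.append(w)
        assignments.append(z)
    return docs, assignments, phi


docs, assignments, phi = generate_corpus(
    n_docs=3, doc_len=10, alpha=[0.5, 0.5], beta=0.5, vocab_size=20
)
print(docs[0])  # word ids of the first synthetic document
```

Only the word ids in `docs` would be observed in practice; `assignments`, `phi`, and each document's topic mixture are the latent quantities that inference has to recover.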
21Likelihoods
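The equations on this slide were not transcribed. A sketch of the likelihoods for smoothed LDA follows, in assumed notation: theta_d is the topic mixture of document d, phi_k the word distribution of topic k, z_{d,n} and w_{d,n} the topic and word at position n of document d, and alpha, beta the Dirichlet hyperparameters.

```latex
% Complete-data likelihood of the whole corpus
\[
p(\mathbf{w}, \mathbf{z}, \theta, \varphi \mid \alpha, \beta)
= \prod_{k=1}^{K} p(\varphi_k \mid \beta)
  \prod_{d=1}^{D} \Big[ p(\theta_d \mid \alpha)
  \prod_{n=1}^{N_d} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid \varphi_{z_{d,n}}) \Big]
\]
% Likelihood of a single word, and of a document, with topics summed out
\[
p(w_{d,n} = t \mid \theta_d, \varphi) = \sum_{k=1}^{K} \theta_{d,k}\, \varphi_{k,t},
\qquad
p(\mathbf{w}_d \mid \theta_d, \varphi) = \prod_{n=1}^{N_d} \sum_{k=1}^{K} \theta_{d,k}\, \varphi_{k,\,w_{d,n}}
\]
```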
22Inference via Gibbs Sampling
23Collapsed LDA Gibbs Sampler
24Joint distribution
25Joint distribution (cont)
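The derivation on these two slides was not transcribed. In the collapsed sampler, theta and phi are integrated out, and the standard result is sketched below in assumed notation: n_k is the vector of word counts assigned to topic k (length V), n_d is the vector of topic counts in document d (length K), and B is the multivariate Beta function.

```latex
\[
p(\mathbf{w}, \mathbf{z} \mid \alpha, \beta)
= p(\mathbf{w} \mid \mathbf{z}, \beta)\; p(\mathbf{z} \mid \alpha)
= \prod_{k=1}^{K} \frac{B(\mathbf{n}_k + \beta)}{B(\beta)}
  \cdot \prod_{d=1}^{D} \frac{B(\mathbf{n}_d + \alpha)}{B(\alpha)},
\qquad
B(\mathbf{a}) = \frac{\prod_i \Gamma(a_i)}{\Gamma\!\left(\sum_i a_i\right)} .
\]
```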
26Update equation
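The equation on this slide was not transcribed. A sketch of the standard full conditional used by the collapsed Gibbs sampler follows, in assumed notation: word position i holds vocabulary term t in document d, n_{k,t} counts how often term t is assigned to topic k, n_{d,k} counts how many words of document d are assigned to topic k, and the superscript -i means the count excludes position i.

```latex
\[
p(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w})
\;\propto\;
\frac{n_{k,t}^{-i} + \beta_t}{\sum_{v=1}^{V} \left( n_{k,v}^{-i} + \beta_v \right)}
\cdot \left( n_{d,k}^{-i} + \alpha_k \right)
\]
```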
27Multinomial parameters
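The equations on this slide were not transcribed. After sampling, the multinomial parameters are usually read off the count matrices as phi_{k,t} = (n_{k,t} + beta) / (n_k + V beta) and theta_{d,k} = (n_{d,k} + alpha) / (n_d + K alpha). The sketch below puts the sampler and these estimates together, assuming symmetric scalar priors; the function name `lda_gibbs` and its parameters are hypothetical, and this is an illustrative implementation rather than the one used for the slides.

```python
import numpy as np


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA with symmetric priors (illustrative)."""
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))   # topic counts per document
    n_kt = np.zeros((n_topics, vocab_size))  # term counts per topic
    n_k = np.zeros(n_topics)                 # total words per topic

    # Random initial topic assignment for every word position.
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for i, t in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] += 1
            n_kt[k, t] += 1
            n_k[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, t in enumerate(doc):
                # Remove the current assignment of position i from the counts.
                k = z[d][i]
                n_dk[d, k] -= 1
                n_kt[k, t] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = k | z_-i, w), up to a constant.
                p = (n_kt[:, t] + beta) / (n_k + vocab_size * beta) * (n_dk[d] + alpha)
                k = rng.choice(n_topics, p=p / p.sum())
                # Record the newly sampled topic.
                z[d][i] = k
                n_dk[d, k] += 1
                n_kt[k, t] += 1
                n_k[k] += 1

    # Multinomial parameter estimates from the final counts.
    phi = (n_kt + beta) / (n_k[:, None] + vocab_size * beta)
    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + n_topics * alpha)
    return phi, theta, z


# Toy corpus: documents are lists of integer word ids in [0, vocab_size).
toy_docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 1, 3, 4]]
phi, theta, z = lda_gibbs(toy_docs, n_topics=2, vocab_size=6, n_iters=50)
print(theta.round(2))  # per-document topic proportions
```

Estimates from a single final state are noisy; averaging over several well-separated samples after burn-in is a common refinement.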