1
Probabilistic Latent Semantic Analysis
  • Thomas Hofmann
  • Presented by
  • Peter Fankhauser

2
Probabilistic Latent Semantic Analysis
  • Automated document indexing and information
    retrieval
  • Identification of latent classes (aspects) using
    an Expectation Maximization (EM) algorithm
  • Shown to handle
  • Polysemy
  • "Java" can mean the coffee as well as the
    programming language Java
  • "Cricket" is a game as well as an insect
  • Synonymy
  • "computer", "pc", and "desktop" can all mean the
    same thing
  • Has a better statistical foundation than LSA

3
Aspect Model
  • Latent variable model for general co-occurrence
    data
  • Associate each observation (w,d) with a latent
    class variable z ∈ Z = {z_1, ..., z_K}
  • Generative model
  • Select a document d with probability P(d)
  • Pick a latent class z with probability P(z|d)
  • Generate a word w with probability P(w|z)

[Graphical model: d → z → w, with probabilities P(d), P(z|d), P(w|z)]
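A minimal numpy sketch of this generative process (the toy probability tables below are invented for illustration, not taken from the slides):

    import numpy as np

    rng = np.random.default_rng(0)
    P_d = np.array([0.5, 0.3, 0.2])              # P(d), 3 documents
    P_z_d = np.array([[0.7, 0.3],                # P(z|d), rows sum to 1
                      [0.2, 0.8],
                      [0.5, 0.5]])
    P_w_z = np.array([[0.6, 0.3, 0.1],           # P(w|z), rows sum to 1
                      [0.1, 0.2, 0.7]])

    def sample_pair():
        # d ~ P(d), then z ~ P(z|d), then w ~ P(w|z); only (d, w) is observed
        d = rng.choice(3, p=P_d)
        z = rng.choice(2, p=P_z_d[d])
        w = rng.choice(3, p=P_w_z[z])
        return d, w

    print([sample_pair() for _ in range(5)])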
4
PLSA Aspect Model
  • Aspect model
  • Each document is a mixture of K underlying
    (latent) aspects
  • Each aspect is represented by a distribution over
    words, P(w|z)
  • Model fitting with Tempered EM

5
Aspect Model
  • The joint distribution P(d,w) is the sum over all
    aspects z, assuming conditional independence of d
    and w given z:
  • P(d,w) = P(d) Σ_z P(z|d) P(w|z)
           = Σ_z P(z) P(d|z) P(w|z)
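In matrix form this is a rank-K factorization of the D×W joint distribution; a small numpy check with invented toy sizes:

    import numpy as np

    rng = np.random.default_rng(1)
    K, D, W = 2, 3, 4                        # aspects, documents, words (toy)
    P_z = rng.dirichlet(np.ones(K))          # P(z), shape (K,)
    P_d_z = rng.dirichlet(np.ones(D), K)     # P(d|z), shape (K, D)
    P_w_z = rng.dirichlet(np.ones(W), K)     # P(w|z), shape (K, W)

    # P(d,w) = sum_z P(z) P(d|z) P(w|z): a rank-K matrix product
    P_dw = P_d_z.T @ np.diag(P_z) @ P_w_z
    assert np.isclose(P_dw.sum(), 1.0)       # sums to 1, a valid joint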

6
Advantages of this model over Document Clustering
  • Documents are not tied to a single cluster
    (i.e., aspect)
  • Each document is a specific mixture of the
    aspects, with mixing weights P(z|d)
  • This offers more flexibility and produces more
    effective modeling
  • Problem: estimate P(z), P(d|z), P(w|z) given only
    the documents (d) and words (w)
  • Approach: Expectation Maximization (EM)

7
EM Steps
  • E-Step
  • Compute the posterior probabilities of the latent
    variables (and thus the expected complete-data
    likelihood) using the current parameter values
  • M-Step
  • Update the parameters using the posterior
    probabilities calculated in the E-step
  • Find the parameters that maximize the expected
    likelihood function

8
E Step
  • Probability of aspect z given that word w occurs
    in document d, by Bayes' rule:
  • P(z|d,w) = P(z) P(d|z) P(w|z) /
               Σ_z' P(z') P(d|z') P(w|z')

9
M Step
  • All these updates use the P(z|d,w) calculated in
    the E-step; n(d,w) denotes the count of word w in
    document d:
  • P(w|z) ∝ Σ_d n(d,w) P(z|d,w)
  • P(d|z) ∝ Σ_w n(d,w) P(z|d,w)
  • P(z) ∝ Σ_d Σ_w n(d,w) P(z|d,w)
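Putting the E-step and M-step together, a minimal numpy sketch of the full fitting loop (function and array names are my own; the slides give no code):

    import numpy as np

    def plsa_em(n_dw, K, iters=50, seed=0):
        # Fit the aspect model to a document-word count matrix n_dw (D x W)
        # by plain EM; returns P(z), P(d|z), P(w|z).
        rng = np.random.default_rng(seed)
        D, W = n_dw.shape
        P_z = np.full(K, 1.0 / K)
        P_d_z = rng.dirichlet(np.ones(D), K)     # P(d|z), shape (K, D)
        P_w_z = rng.dirichlet(np.ones(W), K)     # P(w|z), shape (K, W)
        for _ in range(iters):
            # E-step: P(z|d,w) ∝ P(z) P(d|z) P(w|z), shape (K, D, W)
            post = P_z[:, None, None] * P_d_z[:, :, None] * P_w_z[:, None, :]
            post /= post.sum(axis=0, keepdims=True)
            # M-step: reweight the posteriors by the observed counts n(d,w)
            weighted = n_dw[None, :, :] * post
            P_w_z = weighted.sum(axis=1)
            P_w_z /= P_w_z.sum(axis=1, keepdims=True)
            P_d_z = weighted.sum(axis=2)
            P_d_z /= P_d_z.sum(axis=1, keepdims=True)
            P_z = weighted.sum(axis=(1, 2))
            P_z /= P_z.sum()
        return P_z, P_d_z, P_w_z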

10
Tempered EM
  • Trade-off between predictive performance on the
    training data and on unseen new data
  • Must prevent the model from overfitting the
    training data
  • Introduce a damping factor β into the E-step,
    raising the likelihood part to the power β (β < 1
    flattens the posteriors):
  • P_β(z|d,w) ∝ P(z) [P(d|z) P(w|z)]^β
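As a sketch of how this changes the plsa_em code above (in Hofmann's scheme β starts at 1 and is decreased whenever held-out performance degrades; the function name here is mine), the E-step becomes:

    def tempered_e_step(P_z, P_d_z, P_w_z, beta):
        # Tempered E-step: raise the likelihood part P(d|z)P(w|z) to the
        # power beta; beta < 1 flattens P(z|d,w) and curbs overfitting.
        post = P_z[:, None, None] * (P_d_z[:, :, None] * P_w_z[:, None, :]) ** beta
        return post / post.sum(axis=0, keepdims=True)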

11
Evaluation: Perplexity Comparison
  • Perplexity = log-averaged inverse probability of
    the words in unseen data
  • Higher probability on held-out data gives lower
    perplexity, and thus indicates better predictions
  • Evaluated on the MED data
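A minimal sketch of this measure (the function name and the clipping constant are my own):

    import numpy as np

    def perplexity(n_dw, P_w_given_d):
        # Perplexity = exp(-sum_{d,w} n(d,w) log P(w|d) / sum_{d,w} n(d,w));
        # P(w|d) = sum_z P(z|d) P(w|z) comes from the fitted model, with
        # P(z|d) for held-out documents obtained by folding in.
        logp = np.log(np.clip(P_w_given_d, 1e-12, None))
        return np.exp(-(n_dw * logp).sum() / n_dw.sum())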

12
Evaluation: Topic Decomposition
  • Abstracts of 1,568 documents
  • Clustered into 128 latent classes
  • Shows the word stems with the highest P(w|z) for
    two aspects that both feature the word "power"
  • "power" aspect 1: astronomy
  • "power" aspect 2: electrical power

13
Evaluation: Polysemy
  • Occurrences of the same word in two different
    contexts are identified, e.g. "segment" in image
    contexts vs. sound contexts

14
Evaluation: Information Retrieval
  • Four standard test collections:
  • MED: 1,033 docs
  • CRAN: 1,400 docs
  • CACM: 3,204 docs
  • CISI: 1,460 docs
  • Reporting only the best results, with K varying
    over 32, 48, 64, 80, 128
  • The combined PLSI model averages the scores across
    all models at the different K values

15
Evaluation: Information Retrieval
  • Cosine similarity is the baseline
  • In LSI, the query vector q is projected into the
    reduced space
  • In PLSI, documents and queries are represented by
    P(z|d) and P(z|q); P(z|q) is obtained by folding
    in: EM iterations in which only P(z|q) is adapted
    while P(w|z) stays fixed (see the sketch below)
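A minimal numpy sketch of this folding-in procedure (the function name and the fixed iteration count are my own choices):

    import numpy as np

    def fold_in(q_w, P_w_z, iters=20):
        # Estimate P(z|q) for an unseen query q by EM, keeping the fitted
        # P(w|z) fixed; q_w holds the query's word counts, shape (W,).
        K = P_w_z.shape[0]
        P_z_q = np.full(K, 1.0 / K)
        for _ in range(iters):
            # E-step: P(z|q,w) ∝ P(z|q) P(w|z)
            post = P_z_q[:, None] * P_w_z
            post /= np.maximum(post.sum(axis=0, keepdims=True), 1e-12)
            # M-step: update only the query's mixing weights P(z|q)
            P_z_q = (q_w[None, :] * post).sum(axis=1)
            P_z_q /= P_z_q.sum()
        return P_z_q

Retrieval can then score each document by, for example, the cosine between P(z|q) and P(z|d).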

16
Evaluation: Precision-Recall Results
17
Comparing PLSA and LSA
  • Both LSA and PLSA perform dimensionality reduction
  • In LSA, by keeping only the K largest singular
    values
  • In PLSA, by having K aspects
  • Comparison to the SVD P = U Σ V^T:
  • U matrix: related to P(d|z) (document to aspect)
  • V matrix: related to P(w|z) (aspect to term)
  • Σ matrix: related to P(z) (aspect strengths)
  • The main difference is how the approximation is
    done
  • PLSA fits a generative model (the aspect model)
    and maximizes its predictive power
  • Selecting the proper value of K is heuristic in
    LSA
  • Statistical model selection can determine the
    optimal K in PLSA

18
Conclusion
  • PLSI consistently outperforms LSI in the
    experiments
  • Precision gains of up to 100% over the baseline
    method in some cases
  • PLSA has a statistical theory (maximum likelihood
    estimation) to support it, and is thus better
    founded than LSA
  • Further reading: Steyvers, M., and Griffiths, T.,
    "Probabilistic Topic Models", in T. Landauer, D.
    McNamara, S. Dennis, and W. Kintsch (eds.),
    Latent Semantic Analysis: A Road to Meaning.
    Lawrence Erlbaum.