Title: Probabilistic Latent Semantic Analysis
1. Probabilistic Latent Semantic Analysis
- Thomas Hofmann
- Presented by Quang Lam Nguyen
- Based on slides by Mummoorthy Murugesan, CS 6901
2. Outline
- Background
- LSA
- PLSA
- Model Fitting
- Basics I: Maximum Likelihood Estimation
- Basics II: EM Algorithm
- Basics III: Overfitting
- Experimental Results
- Conclusion
3. Background (1/2)
Probabilistic Latent Semantic Analysis and Latent Semantic Analysis
- Latent: present but not evident, hidden
- Semantic: meaning
- The hidden meaning of terms and their occurrences in documents
4. Background (2/2)
[Figure: terms plotted in an N-dimensional lexical space versus a K << N dimensional semantic (latent) space. The German examples illustrate polysemy (Kater: hangover vs. tomcat; Bank: bank vs. bench) and synonymy (Auto / Wagen; "Du hast nicht alle Tassen im Schrank" / "Du bist verrückt", both meaning "you are crazy"); in the latent space, related terms such as Sport, Muskelkater, Park, and Einzahlung are grouped by meaning.]
5. The Setting
- Set of N documents
- D = {d_1, ..., d_N}
- Set of M words
- W = {w_1, ..., w_M}
- Set of K latent classes
- Z = {z_1, ..., z_K}
6. Latent Semantic Indexing (1/2)
- Term-document matrix A of size N x M to represent the frequency counts
- Singular Value Decomposition (SVD), as sketched in the code below
- A (N x M) = U (N x N) E (N x M) V^T (M x M)
- Keep only the k largest singular values in E
- A' (N x M) = U (N x k) E (k x k) V^T (k x M)
- A' ≈ A
- Each term is represented by k factors, i.e. a vector in k-dimensional space
- Terms with common meaning are mapped to the same direction
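A minimal sketch of this SVD-based truncation using NumPy on a toy count matrix (the matrix values and the choice k = 2 are illustrative assumptions, not data from the slides):

```python
# LSA via truncated SVD on a small toy term-document count matrix.
import numpy as np

# Toy frequency matrix A (documents x terms); values are made up.
A = np.array([
    [2, 1, 0, 0, 1],
    [1, 2, 0, 1, 0],
    [0, 0, 3, 1, 2],
    [0, 1, 2, 2, 1],
], dtype=float)

k = 2  # number of retained singular values / latent dimensions

# Full SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k approximation: keep only the k largest singular values.
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print("original matrix:\n", A)
print("rank-%d approximation:\n" % k, A_k.round(2))
```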
7. Latent Semantic Indexing (2/2)
- LSI puts documents together even if they don't have common words
- Disadvantages
- The statistical foundation is missing
- PLSA addresses this concern!
8. Probabilistic Latent Semantic Analysis
- Overview
- Aspect Model
- Model fitting with EM and TEM
- Basics I: Maximum Likelihood Estimation
- Basics II: EM Algorithm
- Basics III: Overfitting
9. PLSA Overview
- Automated document indexing and information retrieval
- Identification of latent classes using an Expectation Maximization (EM) algorithm
- Shown to solve
- Polysemy and synonymy
- Has a better statistical foundation than LSA
10. PLSA: Aspect Model (1/3)
- Aspect Model
- A document is a mixture of K underlying (latent) aspects
- Each aspect is represented by a distribution over words P(w|z)
11. Aspect Model (2/3)
- Latent variable model for general co-occurrence data
- Associate each observation (w,d) with a class variable z ∈ Z = {z_1, ..., z_K}
- Generative model for predicting words (see the sketch below)
- Select a document d with probability P(d)
- Pick a latent class z with probability P(z|d)
- Generate a word w with probability P(w|z)
[Graphical model: d -> z -> w, with probabilities P(d), P(z|d), P(w|z)]
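A minimal sketch of this generative process in Python; the document names, vocabulary, and all probability tables below are made-up illustrations, not values from the paper:

```python
# Sample (d, w) pairs from a toy aspect model: select d, pick z | d, generate w | z.
import numpy as np

rng = np.random.default_rng(0)

docs = ["d1", "d2"]
words = ["bank", "deposit", "car", "engine"]

P_d = np.array([0.5, 0.5])                     # P(d)
P_z_given_d = np.array([[0.9, 0.1],            # P(z|d): rows = documents, cols = aspects
                        [0.2, 0.8]])
P_w_given_z = np.array([[0.6, 0.4, 0.0, 0.0],  # P(w|z): rows = aspects, cols = words
                        [0.0, 0.0, 0.5, 0.5]])

def generate_observation():
    """Sample one (d, w) pair following the generative story on the slide."""
    d = rng.choice(len(docs), p=P_d)
    z = rng.choice(P_z_given_d.shape[1], p=P_z_given_d[d])
    w = rng.choice(len(words), p=P_w_given_z[z])
    return docs[d], words[w]

print([generate_observation() for _ in range(5)])
```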
12. Aspect Model (3/3)
- The joint probability model follows from the generative process (see the formulation below)
- d and w are assumed to be conditionally independent given z
- Now we have to compute P(z), P(z|d), and P(w|z). We are given only the documents (d) and the words (w).
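For reference, the joint probability of the aspect model just described can be written in the asymmetric parameterization used on these slides:

```latex
P(d,w) \;=\; P(d)\, P(w \mid d), \qquad P(w \mid d) \;=\; \sum_{z \in Z} P(w \mid z)\, P(z \mid d)
```

and equivalently in the symmetric parameterization:

```latex
P(d,w) \;=\; \sum_{z \in Z} P(z)\, P(d \mid z)\, P(w \mid z)
```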
13. Basics I: Maximum Likelihood Estimation
- A probability model is based on real data
- -> it has to be fit to that data -> model fitting
- Tune the free parameters of the model to provide an optimal fit to real-world data
- Choose parameter values that make the observed data more likely than other values would
- Prerequisite: the correct parameters are known!
14. Basics II: EM Algorithm (1/2)
- Maximum Likelihood Estimation
- BUT the correct parameters are not known
- FOR they depend on unknown (latent) properties!
- Iterative procedure
- 1. Expectation step
- 2. Maximization step
15. Basics II: EM Algorithm (2/2)
- E-step (Expectation)
- The hidden variables are estimated: the expectation of the likelihood function is calculated with the current parameter values
- M-step (Maximization)
- The actual parameters are determined:
- Find the parameters that maximize the expected likelihood function (Maximum Likelihood Estimation)
16. Model Fitting (1/3)
- We have the log-likelihood function of the aspect model (see below), and we need to maximize it.
- Expectation Maximization (EM) is used for this purpose.
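Written out from the aspect model above, the log-likelihood to be maximized is (with n(d,w) denoting the count of word w in document d):

```latex
\mathcal{L} \;=\; \sum_{d \in D} \sum_{w \in W} n(d,w)\, \log P(d,w)
\;=\; \sum_{d \in D} \sum_{w \in W} n(d,w)\, \log \Big[ P(d) \sum_{z \in Z} P(w \mid z)\, P(z \mid d) \Big]
```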
17. E-Step: Model Fitting (2/3)
- It is the probability that a word w occurring in a document d is explained by aspect z
- (obtained by applying Bayes' rule to the aspect model, as written out below)
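In the asymmetric parameterization used on these slides, this posterior follows from Bayes' rule:

```latex
P(z \mid d, w) \;=\; \frac{P(z \mid d)\, P(w \mid z)}{\sum_{z' \in Z} P(z' \mid d)\, P(w \mid z')}
```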
18. M-Step: Model Fitting (3/3)
- All the re-estimation equations (written out below) use P(z|d,w) calculated in the E-step
- The procedure converges to a local maximum of the likelihood function
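In the same asymmetric parameterization, the standard M-step re-estimation equations are (with n(d,w) the count of word w in document d and n(d) the total count of words in d):

```latex
P(w \mid z) \;=\; \frac{\sum_{d} n(d,w)\, P(z \mid d, w)}{\sum_{d} \sum_{w'} n(d,w')\, P(z \mid d, w')}, \qquad
P(z \mid d) \;=\; \frac{\sum_{w} n(d,w)\, P(z \mid d, w)}{n(d)}
```

Putting the E-step and M-step together, a minimal illustrative PLSA fit in Python on a random toy count matrix (the matrix size, the choice K = 3, and the fixed iteration count are assumptions of this sketch, not values from the paper):

```python
# Illustrative PLSA fit via plain EM on a random toy count matrix.
import numpy as np

rng = np.random.default_rng(0)

n_dw = rng.integers(0, 5, size=(8, 12)).astype(float)  # toy counts n(d,w), docs x words
N, M = n_dw.shape
K = 3  # number of latent aspects

# Random initialization of P(z|d) and P(w|z), each row normalized to 1.
P_z_d = rng.random((N, K)); P_z_d /= P_z_d.sum(axis=1, keepdims=True)
P_w_z = rng.random((K, M)); P_w_z /= P_w_z.sum(axis=1, keepdims=True)

for it in range(100):
    # E-step: posterior P(z|d,w) proportional to P(z|d) * P(w|z), shape (N, M, K).
    joint = P_z_d[:, None, :] * P_w_z.T[None, :, :]
    P_z_dw = joint / joint.sum(axis=2, keepdims=True)

    # M-step: re-estimate P(w|z) and P(z|d) from counts weighted by the posterior.
    weighted = n_dw[:, :, None] * P_z_dw              # n(d,w) * P(z|d,w)
    P_w_z = weighted.sum(axis=0).T                    # sum over d, shape (K, M)
    P_w_z /= P_w_z.sum(axis=1, keepdims=True)
    P_z_d = weighted.sum(axis=1)                      # sum over w, shape (N, K)
    P_z_d /= P_z_d.sum(axis=1, keepdims=True)

# Log-likelihood of the fitted model (up to the P(d) term, which is constant here).
P_w_d = P_z_d @ P_w_z                                 # P(w|d), shape (N, M)
loglik = (n_dw * np.log(P_w_d + 1e-12)).sum()
print("log-likelihood:", round(loglik, 2))
```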
19. Basics III: Overfitting
- Trade-off between predictive performance on the training data and on unseen new data
- Actual aim: predict the correct output for UNSEEN data, too -> generalization
- Problem: the model may adjust too much to very specific random features of the training data -> overfitting
- Remedy -> Tempered EM
20. TEM (Tempered EM)
- Introduce a control parameter β
- β starts at the value 1 and is decreased
- Similar to simulated annealing
- β acts as a temperature variable
21. Choosing β
- It controls the trade-off between underfitting and overfitting
- Simple solution: use held-out data (a part of the training data), as sketched below
- Train on the training data with β starting at 1
- Test the model on the held-out data
- If performance improves, continue with the same β
- If there is no improvement, set β := ηβ where η < 1
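A minimal sketch of this β-annealing schedule in Python, assuming hypothetical helpers em_step() (one EM iteration whose E-step is tempered by the current β) and heldout_perplexity(); only the schedule from the slide is spelled out, the helpers are placeholders:

```python
# Beta-annealing control loop for tempered EM (helpers are hypothetical).
def tempered_em(model, train_data, heldout_data, eta=0.9, min_beta=0.6, max_iters=200):
    beta = 1.0
    best = float("inf")
    for _ in range(max_iters):
        model.em_step(train_data, beta=beta)            # tempered EM iteration at current beta
        perp = model.heldout_perplexity(heldout_data)   # evaluate on held-out data
        if perp < best:
            best = perp                                 # improvement: keep the same beta
        else:
            beta *= eta                                 # no improvement: beta := eta * beta, eta < 1
            if beta < min_beta:                         # stop once beta has cooled far enough
                break
    return model
```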
22. Experimental Results
- Perplexity Comparison
- Polysemy
- Information Retrieval
23. Perplexity Comparison (1/2)
- What is perplexity?
- An indicator of the quality of probability models
- The model is less surprised by a test example
- Assigning high probability to the test data gives lower perplexity, and thus better predictions (see the sketch below)
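A minimal sketch of how perplexity could be computed for a fitted aspect model, assuming the P_z_d and P_w_z tables from the EM sketch above and a held-out count matrix n_dw_test; all names here are assumptions of this sketch:

```python
# Perplexity of a fitted PLSA model on held-out counts.
import numpy as np

def perplexity(n_dw_test, P_z_d, P_w_z):
    """exp(-average log-likelihood per observed word occurrence); lower is better."""
    P_w_d = P_z_d @ P_w_z                               # P(w|d) = sum_z P(z|d) P(w|z)
    log_lik = (n_dw_test * np.log(P_w_d + 1e-12)).sum()
    total_counts = n_dw_test.sum()
    return np.exp(-log_lik / total_counts)
```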
24. Perplexity Comparison (2/2)
25. Polysemy
- Occurrences of the term "segment" in two different contexts are identified with different aspects (image vs. sound)
26. Information Retrieval
- For natural language queries, simple term matching does not work effectively
- Ambiguous terms
- Queries for the same information need vary due to personal style
- Latent semantic indexing
- Creates a latent semantic space (hidden meaning)
27. Comparing PLSA and LSA
- LSA and PLSA both perform dimensionality reduction
- In LSA, by keeping only the K largest singular values
- In PLSA, by having K aspects
- Comparison to SVD
- The U matrix corresponds to P(d|z) (document to aspect)
- The V matrix corresponds to P(w|z) (aspect to term)
- The E matrix corresponds to P(z) (aspect strength)
- The main difference is the way the approximation is done
- PLSA generates a model (the aspect model) and maximizes its predictive power
- Selecting the proper value of K is heuristic in LSA
- Statistical model selection can determine the optimal K in PLSA
28. Conclusion
- PLSI consistently outperforms LSI in the experiments
- The precision gain is around 100% compared to the baseline method in some cases
- PLSA has a statistical theory to support it, and is thus better founded than LSA