1
Probabilistic Latent Semantic Analysis
  • Thomas Hofmann
  • Presented by
  • Peter Fankhauser

2
Probabilistic Latent Semantic Analysis
  • Automated document indexing and information
    retrieval
  • Identification of latent classes (aspects) using
    an Expectation Maximization (EM) algorithm
  • Shown to handle
  • Polysemy
  • "Java" can mean the coffee as well as the
    programming language Java
  • "Cricket" is a game as well as an insect
  • Synonymy
  • "computer", "pc", and "desktop" can all mean the
    same thing
  • Has a better statistical foundation than LSA

3
Aspect Model
  • Latent variable model for general co-occurrence
    data
  • Associate each observation (w,d) with a latent
    class variable z ∈ Z = {z_1, ..., z_K}
  • Generative model
  • Select a document d with probability P(d)
  • Pick a latent class z with probability P(z|d)
  • Generate a word w with probability P(w|z)

[Graphical model: d → z → w, with probabilities P(d), P(z|d), P(w|z)]
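A minimal numpy sketch of this generative process (the toy probability tables below are invented for illustration, not taken from the slides):

    import numpy as np

    rng = np.random.default_rng(0)
    P_d = np.array([0.5, 0.3, 0.2])              # P(d), 3 documents
    P_z_d = np.array([[0.7, 0.3],                # P(z|d), rows sum to 1
                      [0.2, 0.8],
                      [0.5, 0.5]])
    P_w_z = np.array([[0.6, 0.3, 0.1],           # P(w|z), rows sum to 1
                      [0.1, 0.2, 0.7]])

    def sample_pair():
        # d ~ P(d), then z ~ P(z|d), then w ~ P(w|z); only (d, w) is observed
        d = rng.choice(3, p=P_d)
        z = rng.choice(2, p=P_z_d[d])
        w = rng.choice(3, p=P_w_z[z])
        return d, w

    print([sample_pair() for _ in range(5)])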
4
PLSA Aspect Model
  • Aspect model
  • Each document is a mixture of K underlying
    (latent) aspects
  • Each aspect is represented by a distribution over
    words, P(w|z)
  • Model fitting with Tempered EM

5
Aspect Model
  • The joint distribution P(d,w) is the sum over all
    aspects z, assuming conditional independence of d
    and w given z:
  • P(d,w) = P(d) Σ_z P(z|d) P(w|z)
           = Σ_z P(z) P(d|z) P(w|z)
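In matrix form this is a rank-K factorization of the D×W joint distribution; a small numpy check with invented toy sizes:

    import numpy as np

    rng = np.random.default_rng(1)
    K, D, W = 2, 3, 4                        # aspects, documents, words (toy)
    P_z = rng.dirichlet(np.ones(K))          # P(z), shape (K,)
    P_d_z = rng.dirichlet(np.ones(D), K)     # P(d|z), shape (K, D)
    P_w_z = rng.dirichlet(np.ones(W), K)     # P(w|z), shape (K, W)

    # P(d,w) = sum_z P(z) P(d|z) P(w|z): a rank-K matrix product
    P_dw = P_d_z.T @ np.diag(P_z) @ P_w_z
    assert np.isclose(P_dw.sum(), 1.0)       # sums to 1, a valid joint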

6
Advantages of this model over Document Clustering
  • Documents are not tied to a single cluster
    (i.e., aspect)
  • Each document is a specific mixture of the
    aspects, with mixing weights P(z|d)
  • This offers more flexibility and produces more
    effective modeling
  • Problem: estimate P(z), P(d|z), P(w|z) given only
    the documents (d) and words (w)
  • Approach: Expectation Maximization (EM)

7
EM Steps
  • E-Step
  • Compute the posterior probabilities of the latent
    variables (and thus the expected complete-data
    likelihood) using the current parameter values
  • M-Step
  • Update the parameters using the posterior
    probabilities calculated in the E-step
  • Find the parameters that maximize the expected
    likelihood function

8
E Step
  • Probability of aspect z given that word w occurs
    in document d, by Bayes' rule:
  • P(z|d,w) = P(z) P(d|z) P(w|z) /
               Σ_z' P(z') P(d|z') P(w|z')

9
M Step
  • All these updates use the P(z|d,w) calculated in
    the E-step; n(d,w) denotes the count of word w in
    document d:
  • P(w|z) ∝ Σ_d n(d,w) P(z|d,w)
  • P(d|z) ∝ Σ_w n(d,w) P(z|d,w)
  • P(z) ∝ Σ_d Σ_w n(d,w) P(z|d,w)
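Putting the E-step and M-step together, a minimal numpy sketch of the full fitting loop (function and array names are my own; the slides give no code):

    import numpy as np

    def plsa_em(n_dw, K, iters=50, seed=0):
        # Fit the aspect model to a document-word count matrix n_dw (D x W)
        # by plain EM; returns P(z), P(d|z), P(w|z).
        rng = np.random.default_rng(seed)
        D, W = n_dw.shape
        P_z = np.full(K, 1.0 / K)
        P_d_z = rng.dirichlet(np.ones(D), K)     # P(d|z), shape (K, D)
        P_w_z = rng.dirichlet(np.ones(W), K)     # P(w|z), shape (K, W)
        for _ in range(iters):
            # E-step: P(z|d,w) ∝ P(z) P(d|z) P(w|z), shape (K, D, W)
            post = P_z[:, None, None] * P_d_z[:, :, None] * P_w_z[:, None, :]
            post /= post.sum(axis=0, keepdims=True)
            # M-step: reweight the posteriors by the observed counts n(d,w)
            weighted = n_dw[None, :, :] * post
            P_w_z = weighted.sum(axis=1)
            P_w_z /= P_w_z.sum(axis=1, keepdims=True)
            P_d_z = weighted.sum(axis=2)
            P_d_z /= P_d_z.sum(axis=1, keepdims=True)
            P_z = weighted.sum(axis=(1, 2))
            P_z /= P_z.sum()
        return P_z, P_d_z, P_w_z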

10
Tempered EM
  • Trade-off between predictive performance on the
    training data and on unseen new data
  • Must prevent the model from overfitting the
    training data
  • Introduce a damping factor β into the E-step,
    raising the likelihood part to the power β (β < 1
    flattens the posteriors):
  • P_β(z|d,w) ∝ P(z) [P(d|z) P(w|z)]^β
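As a sketch of how this changes the plsa_em code above (in Hofmann's scheme β starts at 1 and is decreased whenever held-out performance degrades; the function name here is mine), the E-step becomes:

    def tempered_e_step(P_z, P_d_z, P_w_z, beta):
        # Tempered E-step: raise the likelihood part P(d|z)P(w|z) to the
        # power beta; beta < 1 flattens P(z|d,w) and curbs overfitting.
        post = P_z[:, None, None] * (P_d_z[:, :, None] * P_w_z[:, None, :]) ** beta
        return post / post.sum(axis=0, keepdims=True)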

11
Evaluation: Perplexity Comparison
  • Perplexity = log-averaged inverse probability of
    the words in unseen data
  • Higher probability on held-out data gives lower
    perplexity, and thus indicates better predictions
  • Evaluated on the MED data
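A minimal sketch of this measure (the function name and the clipping constant are my own):

    import numpy as np

    def perplexity(n_dw, P_w_given_d):
        # Perplexity = exp(-sum_{d,w} n(d,w) log P(w|d) / sum_{d,w} n(d,w));
        # P(w|d) = sum_z P(z|d) P(w|z) comes from the fitted model, with
        # P(z|d) for held-out documents obtained by folding in.
        logp = np.log(np.clip(P_w_given_d, 1e-12, None))
        return np.exp(-(n_dw * logp).sum() / n_dw.sum())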

12
Evaluation: Topic Decomposition
  • Abstracts of 1,568 documents
  • Clustered into 128 latent classes
  • Shows the word stems with the highest P(w|z) for
    two aspects that both feature the word "power"
  • "power" aspect 1: astronomy
  • "power" aspect 2: electrical power

13
Evaluation: Polysemy
  • Occurrences of the same word in two different
    contexts are identified, e.g. "segment" in image
    contexts vs. sound contexts

14
Evaluation: Information Retrieval
  • Four standard test collections:
  • MED: 1,033 docs
  • CRAN: 1,400 docs
  • CACM: 3,204 docs
  • CISI: 1,460 docs
  • Reporting only the best results, with K varying
    over 32, 48, 64, 80, 128
  • The combined PLSI model averages the scores across
    all models at the different K values

15
Evaluation: Information Retrieval
  • Cosine similarity is the baseline
  • In LSI, the query vector q is projected into the
    reduced space
  • In PLSI, documents and queries are represented by
    P(z|d) and P(z|q); P(z|q) is obtained by folding
    in: EM iterations in which only P(z|q) is adapted
    while P(w|z) stays fixed (see the sketch below)
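A minimal numpy sketch of this folding-in procedure (the function name and the fixed iteration count are my own choices):

    import numpy as np

    def fold_in(q_w, P_w_z, iters=20):
        # Estimate P(z|q) for an unseen query q by EM, keeping the fitted
        # P(w|z) fixed; q_w holds the query's word counts, shape (W,).
        K = P_w_z.shape[0]
        P_z_q = np.full(K, 1.0 / K)
        for _ in range(iters):
            # E-step: P(z|q,w) ∝ P(z|q) P(w|z)
            post = P_z_q[:, None] * P_w_z
            post /= np.maximum(post.sum(axis=0, keepdims=True), 1e-12)
            # M-step: update only the query's mixing weights P(z|q)
            P_z_q = (q_w[None, :] * post).sum(axis=1)
            P_z_q /= P_z_q.sum()
        return P_z_q

Retrieval can then score each document by, for example, the cosine between P(z|q) and P(z|d).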

16
Evaluation: Precision-Recall Results
17
Comparing PLSA and LSA
  • Both LSA and PLSA perform dimensionality reduction
  • In LSA, by keeping only the K largest singular
    values
  • In PLSA, by having K aspects
  • Comparison to the SVD P = U Σ V^T:
  • U matrix: related to P(d|z) (document to aspect)
  • V matrix: related to P(w|z) (aspect to term)
  • Σ matrix: related to P(z) (aspect strengths)
  • The main difference is how the approximation is
    done
  • PLSA fits a generative model (the aspect model)
    and maximizes its predictive power
  • Selecting the proper value of K is heuristic in
    LSA
  • Statistical model selection can determine the
    optimal K in PLSA

18
Conclusion
  • PLSI consistently outperforms LSI in the
    experiments
  • Precision gains of up to 100% over the baseline
    method in some cases
  • PLSA has a statistical theory (maximum likelihood
    estimation) to support it, and is thus better
    founded than LSA
  • Further reading: Steyvers, M., and Griffiths, T.,
    "Probabilistic Topic Models", in T. Landauer, D.
    McNamara, S. Dennis, and W. Kintsch (eds.),
    Latent Semantic Analysis: A Road to Meaning.
    Lawrence Erlbaum.