1
Bayesian Learning for Latent Semantic Analysis
  • Jen-Tzung Chien, Meng-Sun Wu and Chia-Sheng Wu

Presenter: Hsuan-Sheng Chiu
2
Reference
  • Chia-Sheng Wu, "Bayesian Latent Semantic Analysis for Text Categorization and Information Retrieval," 2005
  • Q. Huo and C.-H. Lee, "On-line adaptive learning of the continuous density hidden Markov model based on approximate recursive Bayes estimate," 1997

3
Outline
  • Introduction
  • PLSA
  • ML (Maximum Likelihood)
  • MAP (Maximum a Posteriori)
  • QB (Quasi-Bayes)
  • Experiments
  • Conclusions

4
Introduction
  • LSA vs. PLSA
  • Linear algebra and probability
  • Semantic space and latent topics
  • Batch learning vs. Incremental learning

5
PLSA
  • PLSA is a general machine learning technique that adopts the
    aspect model to represent co-occurrence data.
  • Topics (hidden variables)
  • Corpus (document-word pairs)

6
PLSA
  • Assume that d_i and w_j are conditionally independent given
    the associated latent topic z_k
  • Joint probability
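In the standard aspect-model notation (with K latent topics z_k), this joint probability factorizes as

    P(d_i, w_j) = P(d_i) \sum_{k=1}^{K} P(w_j \mid z_k) \, P(z_k \mid d_i)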

7
ML PLSA
  • Log likelihood of Y
  • ML estimation
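With n(d_i, w_j) denoting the count of word w_j in document d_i, the PLSA log likelihood of the training data Y takes the standard form

    \log P(Y) = \sum_{i} \sum_{j} n(d_i, w_j) \log P(d_i, w_j)

and ML estimation maximizes this quantity over the parameters P(w_j \mid z_k) and P(z_k \mid d_i).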

8
ML PLSA
  • Maximization

9
ML PLSA
  • Complete data
  • Incomplete data
  • EM (Expectation-Maximization) Algorithm
  • E-step
  • M-step

10
ML PLSA
  • E-Step
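In the standard PLSA E-step, the topic posterior for each co-occurrence pair is

    P(z_k \mid d_i, w_j) = \frac{P(w_j \mid z_k) \, P(z_k \mid d_i)}{\sum_{l=1}^{K} P(w_j \mid z_l) \, P(z_l \mid d_i)}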

11
ML PLSA
  • Auxiliary function
  • And
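In standard EM form, the auxiliary function here is the expected complete-data log likelihood:

    Q(\theta \mid \theta^{(t)}) = \sum_i \sum_j n(d_i, w_j) \sum_k P(z_k \mid d_i, w_j) \log \left[ P(w_j \mid z_k) \, P(z_k \mid d_i) \right]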

12
ML PLSA
  • M-step
  • Lagrange multiplier
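With Lagrange multipliers enforcing the sum-to-one constraints on both parameter sets, the objective becomes (\tau_k and \rho_i are illustrative multiplier symbols):

    H = Q(\theta \mid \theta^{(t)}) + \sum_k \tau_k \Big(1 - \sum_j P(w_j \mid z_k)\Big) + \sum_i \rho_i \Big(1 - \sum_k P(z_k \mid d_i)\Big)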

13
ML PLSA
  • Differentiation
  • New parameter estimation
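Setting the derivatives to zero yields the standard ML-PLSA re-estimates, with n(d_i) = \sum_j n(d_i, w_j):

    P(w_j \mid z_k) = \frac{\sum_i n(d_i, w_j) \, P(z_k \mid d_i, w_j)}{\sum_{j'} \sum_i n(d_i, w_{j'}) \, P(z_k \mid d_i, w_{j'})},
    \qquad
    P(z_k \mid d_i) = \frac{\sum_j n(d_i, w_j) \, P(z_k \mid d_i, w_j)}{n(d_i)}

A minimal NumPy sketch of this EM loop (illustrative only; plsa_em and its arguments are hypothetical names, not the authors' code):

    import numpy as np

    def plsa_em(n, K, iters=50, seed=0):
        """ML-PLSA via EM. n is the (D, W) count matrix n(d_i, w_j)."""
        # Illustrative sketch, not the authors' implementation.
        rng = np.random.default_rng(seed)
        D, W = n.shape
        # Random initialization, normalized over the proper axis
        p_w_z = rng.random((K, W)); p_w_z /= p_w_z.sum(1, keepdims=True)  # P(w_j|z_k)
        p_z_d = rng.random((D, K)); p_z_d /= p_z_d.sum(1, keepdims=True)  # P(z_k|d_i)
        for _ in range(iters):
            # E-step: topic posterior P(z_k|d_i,w_j), shape (D, W, K)
            joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
            post = joint / joint.sum(2, keepdims=True)
            # M-step: expected counts n(d_i,w_j) P(z_k|d_i,w_j)
            ec = n[:, :, None] * post
            p_w_z = ec.sum(0).T                       # sum over documents
            p_w_z /= p_w_z.sum(1, keepdims=True)      # normalize over words
            p_z_d = ec.sum(1)                         # sum over words
            p_z_d /= p_z_d.sum(1, keepdims=True)      # normalize over topics
        return p_w_z, p_z_d

For example, K = 8 would match the MED setup in the experiments below.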

14
MAP PLSA
  • Estimation by maximizing the posterior probability
  • Definition of prior distribution
  • Dirichlet density
  • Prior density

Kronecker delta
Assume P(w_j | z_k) and P(z_k | d_i) are independent
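Assuming independent Dirichlet priors over the two multinomial parameter sets, the prior density takes the standard form

    g(\theta) \propto \prod_{k} \prod_{j} P(w_j \mid z_k)^{\alpha_{jk} - 1} \cdot \prod_{i} \prod_{k} P(z_k \mid d_i)^{\beta_{ik} - 1}

where \alpha_{jk} and \beta_{ik} are the hyperparameters (the symbols \alpha and \beta are illustrative notation).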
15
MAP PLSA
  • Consider prior density
  • Maximum a Posteriori
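The MAP estimate augments the log likelihood with the log prior:

    \theta_{MAP} = \arg\max_{\theta} \left[ \log P(Y \mid \theta) + \log g(\theta) \right]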

16
MAP PLSA
  • E-step
  • expectation
  • Auxiliary function

17
MAP PLSA
  • M-step
  • Lagrange multiplier

18
MAP PLSA
  • Auxiliary function

19
MAP PLSA
  • Differentiation
  • New parameter estimation
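With Dirichlet priors, the hyperparameters enter the re-estimates as pseudo-counts; for example, the standard MAP form of the word-topic update is

    P(w_j \mid z_k) = \frac{\sum_i n(d_i, w_j) \, P(z_k \mid d_i, w_j) + \alpha_{jk} - 1}{\sum_{j'} \left[ \sum_i n(d_i, w_{j'}) \, P(z_k \mid d_i, w_{j'}) + \alpha_{j'k} - 1 \right]}

(using the illustrative \alpha notation introduced above).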

20
QB PLSA
  • An online information system needs to be updated
    continuously.
  • Estimation by maximizing the posterior probability
  • The posterior density is approximated by the closest
    tractable prior density with hyperparameters
  • Compared to MAP PLSA, the key difference in QB PLSA is the
    updating of the hyperparameters.

21
QB PLSA
  • Conjugate prior
  • In Bayesian probability theory, a conjugate prior is a prior
    distribution with the property that the posterior
    distribution belongs to the same family of distributions.
  • A closed-form solution
  • A reproducible prior/posterior pair for incremental learning

22
QB PLSA
  • Hyperparameter α

23
QB PLSA
  • After careful rearrangement, the exponential of the
    posterior expectation function can be expressed in closed form
  • A reproducible prior/posterior pair is generated to build the
    hyperparameter updating mechanism, sketched below
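Because the Dirichlet prior is conjugate to the multinomial, the posterior after each new data block is again Dirichlet, so the hyperparameters can be updated recursively by accumulating expected counts; in the illustrative \alpha notation,

    \alpha_{jk}^{(t+1)} = \alpha_{jk}^{(t)} + \sum_i n^{(t)}(d_i, w_j) \, P(z_k \mid d_i, w_j)

where n^{(t)} are the counts in the t-th adaptation block.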

24
Initial Hyperparameters
  • An open issue in Bayesian learning
  • If the initial prior knowledge is too strong, or after a
    large amount of adaptation data has been incrementally
    processed, new adaptation data usually have only a small
    impact on parameter updates in incremental training.

25
Experiments
  • MED Corpus
  • 1033 medical abstracts with 30 queries
  • 7014 unique terms
  • 433 abstracts for ML training
  • 600 abstracts for MAP or QB training
  • Query subset for testing
  • K = 8
  • Reuters-21578
  • 4270 documents for training
  • 2925 documents for QB learning
  • 2790 documents for testing
  • 13353 unique words
  • 10 categories

26
Experiments
27
Experiments
28
Experiments
29
Conclusions
  • This paper presented an adaptive text modeling and
    classification approach for PLSA-based information systems.
  • Future work
  • Extension of PLSA to bigram or trigram modeling will be
    explored.
  • Application to spoken document classification and retrieval

30
Discriminative Maximum Entropy Language Model for
Speech Recognition
  • Chuang-Hua Chueh, To-Chang Chien and Jen-Tzung
    Chien

Presenter: Hsuan-Sheng Chiu
31
Reference
  • R. Rosenfeld, S. F. Chen and X. Zhu, "Whole-sentence exponential language models: a vehicle for linguistic-statistical integration," 2001
  • W.-H. Tsai, "An Initial Study on Language Model Estimation and Adaptation Techniques for Mandarin Large Vocabulary Continuous Speech Recognition," 2005

32
Outline
  • Introduction
  • Whole-sentence exponential model
  • Discriminative ME language model
  • Experiment
  • Conclusions

33
Introduction
  • Language model
  • Statistical n-gram model
  • Latent semantic language model
  • Structured language model
  • Based on the maximum entropy principle, we can integrate
    different features to establish an optimal probability
    distribution.

34
Whole-Sentence Exponential Model
  • Traditional method
  • Exponential form
  • Usage
  • When used for speech recognition, the model is not suitable
    for the first decoding pass; it should instead be used to
    rescore N-best lists.
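Following Rosenfeld et al. (2001), the whole-sentence exponential model has the form

    P(W) = \frac{1}{Z} \, P_0(W) \exp\Big( \sum_i \lambda_i f_i(W) \Big)

where P_0(W) is a baseline model (e.g., an n-gram), f_i(W) are feature functions of the whole sentence W, \lambda_i are their weights, and Z normalizes the distribution.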

35
Whole-Sentence ME Language Model
  • Expectation of feature function
  • Empirical
  • Actual
  • Constraint
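The ME constraint requires the model expectation of each feature to match its empirical expectation:

    \sum_{W} P(W) \, f_i(W) = \sum_{W} \tilde{P}(W) \, f_i(W)

where \tilde{P} is the empirical distribution over training sentences.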

36
Whole-Sentence ME Language Model
  • To solve the constrained optimization problem

37
GIS algorithm
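The classic GIS update for the feature weights is (assuming the features are scaled so that \sum_i f_i(W) \le C for some constant C):

    \lambda_i^{(t+1)} = \lambda_i^{(t)} + \frac{1}{C} \log \frac{E_{\tilde{P}}[f_i]}{E_{P^{(t)}}[f_i]}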
38
Discriminative ME Language Model
  • In general, ME estimation can be considered as maximum
    likelihood estimation of a log-linear distribution.
  • We propose a discriminative language model based on the
    whole-sentence ME model (DME).

39
Discriminative ME Language Model
  • Acoustic features for ME estimation
  • Sentence-level log-likelihood ratio of competing
    and target sentences
  • Feature weight parameter
  • Namely, we set the feature parameter to one for those speech
    signals observed in the training database, as illustrated
    below
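One plausible form for such a sentence-level acoustic feature is a log-likelihood ratio (illustrative notation, not taken from the slides):

    f_a(X, W) = \log \frac{p(X \mid W)}{p(X \mid W_{comp})}

where X is the speech signal, W the target sentence, and W_{comp} a competing hypothesis.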

40
Discriminative ME Language Model
  • New estimation
  • Upgrade to discriminative linguistic parameters

41
Discriminative ME Language Model
42
Experiment
  • Corpus: TCC300
  • 32 mixtures
  • 12 Mel-frequency cepstral coefficients
  • 1 log-energy, plus first derivatives
  • 4200 sentences for training, 450 for testing
  • Corpus: Academia Sinica CKIP balanced corpus
  • Five million words
  • Vocabulary: 32,909 words

43
Experiment
44
Conclusions
  • A new ME language model integrating linguistic and acoustic
    features for speech recognition
  • The derived ME language model is inherently discriminative.
  • The DME model involves a constrained optimization procedure
    and is powerful for knowledge integration.

45
Relation between DME and MMI
  • MMI criterion
  • Modified MMI criterion
  • Express ME model as ML model
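For reference, the standard MMI criterion over training pairs (X_r, W_r) of speech and transcription is

    F_{MMI} = \sum_r \log \frac{p(X_r \mid W_r) \, P(W_r)}{\sum_{W} p(X_r \mid W) \, P(W)}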

46
Relation between DME and MMI
  • The optimal parameter

47
Relation between DME and MMI

48
Relation between DME and MMI