Generative Models - PowerPoint PPT Presentation

About This Presentation
Title:

Generative Models


Slides: 77
Provided by: rong7
Learn more at: http://www.cse.msu.edu

Transcript and Presenter's Notes

Title: Generative Models


1
Generative Models
  • Rong Jin

2
Statistical Inference
Female: Gaussian distribution N(μ1, σ1)
Pr(male|1.67m), Pr(female|1.67m)
Male: Gaussian distribution N(μ2, σ2)
3
Statistical Inference
Male: Gaussian distribution N(μ1, σ1)
Pr(male|1.67m), Pr(female|1.67m)
Female: Gaussian distribution N(μ2, σ2)
4
Probabilistic Models for Classification Problems
  • Apply statistical inference methods
  • Given training examples (x1, y1), …, (xN, yN)
  • Assume a parametric model
  • Learn the model parameters θ from the training
    examples using the maximum likelihood approach
  • The class of a new instance x is predicted by
    y* = argmax over y of p(y|x; θ)

8
Maximum Likelihood Estimation (MLE)
  • Given training examples
  • Compute the log-likelihood of the data
  • Find the parameters θ that maximize the
    log-likelihood
  • In many cases, the maximizer of the
    log-likelihood has no closed form, so MLE
    requires numerical optimization
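
A single Gaussian is one of the cases where the maximizer does have a closed form, which makes it a convenient illustration. The sketch below is a minimal example under that assumption; the function names and the height values are made up for illustration and are not from the slides.

```python
import math

def gaussian_log_likelihood(data, mu, sigma):
    """Log-likelihood of i.i.d. data under N(mu, sigma^2)."""
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return (-0.5 * n * math.log(2 * math.pi * sigma ** 2)
            - squared_error / (2 * sigma ** 2))

def gaussian_mle(data):
    """Closed-form MLE: sample mean and the 1/N (biased) standard deviation."""
    n = len(data)
    mu = sum(data) / n
    variance = sum((x - mu) ** 2 for x in data) / n
    return mu, math.sqrt(variance)

# Hypothetical heights in metres, used only to exercise the functions.
heights = [1.60, 1.65, 1.70, 1.75, 1.80]
mu_hat, sigma_hat = gaussian_mle(heights)
best = gaussian_log_likelihood(heights, mu_hat, sigma_hat)
```

Any other choice of mean or standard deviation gives a log-likelihood no larger than `best`. For models without such a formula, the same objective would be handed to a numerical optimizer instead.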

9
Maximum Likelihood Estimation (MLE)
  • Given training example
  • Compute log-likelihood of data
  • Find the parameters ? that maximizes the
    log-likelihood
  • In many case, the expression for log-likelihood
    is not closed form and therefore MLE requires
    numerical calculation

10
Probabilistic Models for Classification Problems
  • Apply statistical inference methods
  • Given training examples (x1, y1), …, (xN, yN)
  • Assume a parametric model
  • Learn the model parameters θ from the training
    examples using the maximum likelihood approach
  • The class of a new instance x is predicted by
    y* = argmax over y of p(y|x; θ)

12
Generative Models
  • Most probability distributions are joint
    distributions (i.e., p(x; θ)), not conditional
    distributions (i.e., p(y|x; θ))
  • Using Bayes rule
  • p(y|x; θ) ∝ p(x|y; θ) p(y; θ)
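
Written out with its normalizing constant, the Bayes rule step reads as follows. The sum over y' runs over all classes; since the denominator does not depend on y, the prediction only needs the numerator. The hat notation for the predicted class is an assumption of this note, not the slides' notation.

```latex
p(y \mid x; \theta)
  = \frac{p(x \mid y; \theta)\, p(y; \theta)}
         {\sum_{y'} p(x \mid y'; \theta)\, p(y'; \theta)}
\quad\Longrightarrow\quad
\hat{y} = \arg\max_{y}\; p(x \mid y; \theta)\, p(y; \theta)
```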

13
Generative Models (contd)
  • Treatment of p(x|y; θ)
  • Let y ∈ Y = {1, 2, …, c}
  • Allocate a separate set of parameters for each
    class
  • θ = {θ1, θ2, …, θc}
  • p(x|y; θ) = p(x; θy)
  • Data in different classes have different input
    patterns

14
Generative Models (contd)
  • Parameter space
  • Parameters for distribution: θ1, θ2, …, θc
  • Class priors: p(y=1), p(y=2), …, p(y=c)
  • Learn parameters from training examples using
    MLE
  • Compute log-likelihood
  • Search for the optimal parameters by maximizing
    the log-likelihood
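
The slide's own formula did not survive in this transcript, so the following is a sketch of the standard log-likelihood for this setup. The symbols N (number of training examples), N_k (number of examples in class k) and the pairs (x_i, y_i) are assumptions introduced here.

```latex
\ell(\theta_1, \dots, \theta_c, p)
  = \sum_{i=1}^{N} \Big[ \log p(x_i; \theta_{y_i}) + \log p(y = y_i) \Big],
\qquad
\hat{p}(y = k) = \frac{N_k}{N}
```

Because the sum splits into one group of terms per class, each set of parameters can be fitted using only the examples of its own class, and the maximum likelihood class priors are the class frequencies.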

18
Example
  • Task: predict the gender of individuals based on
    their heights
  • Given
  • 100 height examples of women
  • 100 height examples of men
  • Assume the heights of women and men follow
    different Gaussian distributions

19
Example (contd)
  • Gaussian distribution
  • Parameter space
  • Gaussian distribution for men (μm, σm)
  • Gaussian distribution for women (μw, σw)
  • Class priors pm = p(y = man), pw = p(y = woman)

20
Example (contd)
  • Gaussian distribution
  • Parameter space
  • Gaussian distribution for male (μm, σm)
  • Gaussian distribution for female (μf, σf)
  • Class priors pm = p(y = male), pf = p(y = female)

21
Example (contd)
24
Example (contd)
  • Learn a Gaussian generative model

26
Example (contd)
27
Example (contd)
  • Predict the gender of an individual given his/her
    height
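
The whole height example can be sketched end to end: fit one Gaussian per class by maximum likelihood, estimate the priors from class frequencies, and apply Bayes rule to a new height. The data below are synthetic (drawn with made-up means and spreads), and all function names are hypothetical; this is an illustration, not the original slides' computation.

```python
import math
import random

def fit_gaussian(xs):
    """MLE for a 1-D Gaussian: sample mean and 1/N standard deviation."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return mu, sigma

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def posterior_male(h, male, female, p_m, p_f):
    """Pr(male | h) by Bayes rule; male and female are (mu, sigma) pairs."""
    joint_m = p_m * gaussian_pdf(h, *male)
    joint_f = p_f * gaussian_pdf(h, *female)
    return joint_m / (joint_m + joint_f)

# Synthetic training data: 100 heights per class (assumed values, in metres).
random.seed(0)
men = [random.gauss(1.76, 0.07) for _ in range(100)]
women = [random.gauss(1.63, 0.06) for _ in range(100)]

male_params = fit_gaussian(men)
female_params = fit_gaussian(women)
p_m = len(men) / (len(men) + len(women))   # class prior from frequencies
p_f = 1.0 - p_m

pr_male = posterior_male(1.67, male_params, female_params, p_m, p_f)
prediction = "male" if pr_male > 0.5 else "female"
```

Here `pr_male` plays the role of Pr(male|1.67m), and Pr(female|1.67m) is `1 - pr_male`.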

28
Decision boundary
  • Decision boundary h*
  • Predict female when h < h*
  • Predict male when h > h*
  • Random when h = h*
  • Where is the decision boundary?
  • It depends on the ratio pm/pf
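
To see how the boundary moves with the ratio pm/pf, here is a minimal sketch under two simplifying assumptions that the slide does not state: both classes share one standard deviation σ, and μm > μf. Setting pm N(h; μm, σ) = pf N(h; μf, σ) and solving for h gives the midpoint of the two means plus a shift that depends on log(pf/pm). The numeric values are made up.

```python
import math

def decision_boundary(mu_m, mu_f, sigma, p_m, p_f):
    """Height at which p_m * N(h; mu_m, sigma) equals p_f * N(h; mu_f, sigma).

    Assumes a shared sigma and mu_m > mu_f, so heights above the returned
    value are predicted male and heights below it female.
    """
    midpoint = (mu_m + mu_f) / 2
    shift = sigma ** 2 * math.log(p_f / p_m) / (mu_m - mu_f)
    return midpoint + shift

# Equal priors: the boundary sits halfway between the two means.
equal = decision_boundary(1.76, 1.63, 0.07, 0.5, 0.5)

# More men than women in the population: the boundary moves down,
# so a larger range of heights is predicted male.
more_men = decision_boundary(1.76, 1.63, 0.07, 0.8, 0.2)
```

With unequal variances the equation becomes quadratic in h and can have two solutions, which this sketch deliberately leaves out.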

31
Gaussian Generative Model (II)
  • Inputs contain multiple features
  • Example
  • Task: predict whether an individual is overweight
    based on his/her salary and the number of hours
    spent watching TV
  • Input: (s: salary, h: hours spent watching TV)
  • Output: +1 (overweight), -1 (normal)

32
Multi-variate Gaussian Distribution
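
The formulas on these slides were lost in the transcript. For reference, the standard density of a d-dimensional Gaussian with mean vector μ and covariance matrix Σ, together with its maximum likelihood estimates from N data points x_1, …, x_N, is given below; the indexing symbols are assumptions of this note.

```latex
\mathcal{N}(x; \mu, \Sigma)
  = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}}
    \exp\!\Big( -\tfrac{1}{2} (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \Big),
\qquad
\hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x_i,
\quad
\hat{\Sigma} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{\mu})(x_i - \hat{\mu})^{\top}
```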
37
Properties of Covariance Matrix
  • What if the number of data points N
  • For any vector v, how about vᵀΣv?
  • Positive semi-definite matrix
  • Number of distinct elements in Σ?

38
Gaussian Generative Model (II)
  • Joint distribution p(s,h) for salary (s) and
    hours for watching TV (h)

40
Multi-variate Gaussian Generative Model
  • Input with multiple input features
  • A multi-variate Gaussian distribution for each
    class

41
Improve Multivariate Gaussian Model
  • How could we improve the model's prediction of
    overweight?
  • Multiple modes for each class
  • Introduce more attributes of individuals
  • Location
  • Occupation
  • The number of children
  • House
  • Age

42
Problems with Using Multi-variate Gaussian
Generative Model
  • Σ is a matrix of size d×d and contains d(d+1)/2
    independent variables
  • d = 100: the number of variables in Σ is 5,050
  • d = 1000: the number of variables in Σ is 500,500
  • ⇒ A large parameter space
  • Σ can be singular
  • If N
  • If two features are linearly correlated ⇒ Σ⁻¹
    does not exist
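
The last point can be checked on a tiny case. In the sketch below the second feature is an exact linear function of the first, so the 2×2 covariance matrix has determinant zero and cannot be inverted. Exact fractions are used to avoid floating-point noise; the data and helper names are made up for illustration.

```python
from fractions import Fraction

def covariance_2x2(xs, ys):
    """Maximum likelihood (1/N) covariance matrix of two features."""
    n = len(xs)
    mean_x = Fraction(sum(xs), n)
    mean_y = Fraction(sum(ys), n)
    sxx = sum((x - mean_x) ** 2 for x in xs) / n
    syy = sum((y - mean_y) ** 2 for y in ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    return [[sxx, sxy], [sxy, syy]]

def det_2x2(m):
    """Determinant of a 2x2 matrix; zero means the matrix is singular."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

feature_1 = [1, 2, 3, 4, 5]
feature_2 = [2 * v + 1 for v in feature_1]   # exactly linear in feature_1

cov = covariance_2x2(feature_1, feature_2)
det = det_2x2(cov)   # zero here, so the inverse does not exist
```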

44
Problems with Using Multi-variate Gaussian
Generative Model
  • Diagonalize Σ
  • Feature independence assumption (Naïve Bayes
    assumption)

45
Problems with Using Multi-variate Gaussian
Generative Model
  • Diagonalize Σ
  • Smooth the covariance matrix

46
Overfitting Issue
  • Complex model vs. insufficient training data
  • Example
  • Consider a classification problem with multiple
    inputs
  • 100 input features
  • 5 classes
  • 1000 training examples
  • Total number of parameters for a full Gaussian
    model is
  • 5 class priors ⇒ 5 parameters
  • 5 means ⇒ 500 parameters
  • 5 covariance matrices ⇒ 25,250 parameters
    (5 × 5,050, with d(d+1)/2 entries each)
  • 25,755 parameters ⇒ insufficient training data
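
The tally can be sketched in a few lines, counting d(d+1)/2 free entries per symmetric covariance matrix and, following the slide, one parameter per class prior (strictly, only c-1 of the priors are free). The diagonal variant is included for comparison; both function names are hypothetical.

```python
def full_gaussian_params(d, c):
    """Parameter count for c classes, each with a full d x d covariance."""
    priors = c
    means = c * d
    covariances = c * (d * (d + 1) // 2)   # symmetric matrix: d(d+1)/2 entries
    return priors + means + covariances

def diagonal_gaussian_params(d, c):
    """Parameter count when each covariance is restricted to its diagonal."""
    priors = c
    means = c * d
    variances = c * d
    return priors + means + variances

full = full_gaussian_params(d=100, c=5)
diagonal = diagonal_gaussian_params(d=100, c=5)
```

With 1000 training examples, the full model has far more parameters than data points, while the diagonal model has roughly as many parameters as examples.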

47
Model Complexity Vs. Data
51
Problems with Using Multi-variate Gaussian
Generative Model
  • Diagonalize Σ
  • Feature independence assumption

52
Naïve Bayes Model
  • In general, for any generative model, we have to
    estimate p(x|y; θ)
  • For x in a high-dimensional space, this
    probability is hard to estimate
  • In the Naïve Bayes model, we approximate
    p(x|y; θ) ≈ p(x1|y; θ) p(x2|y; θ) ⋯ p(xd|y; θ)

55
Text Categorization
  • Learn to classify text into predefined
    categories
  • Input x: a document
  • Represented by a vector of word counts
  • Example: (president, 10), (bush, 2), (election,
    5), …
  • Output y: whether the document is about politics
  • +1 for a political document, -1 for a
    non-political document

56
Text Categorization
  • A generative model for text classification (TC)
  • Parameter space
  • p(+) and p(-)
  • p(doc|+; θ), p(doc|-; θ)
  • It is difficult to estimate both p(doc|+; θ) and
    p(doc|-; θ)
  • Typical vocabulary size: 100,000
  • Each document is a vector of 100,000
    attributes!
  • Too many words in a document
  • A Naïve Bayes approach

59
Text Classification
  • A Naïve Bayes approach
  • For a document

60
Text Classification
  • The original parameter space
  • p(+) and p(-)
  • p(doc|+; θ), p(doc|-; θ)
  • Parameter space after Naïve Bayes simplification
  • p(+) and p(-)
  • p(w1|+), p(w2|+), …, p(wn|+)
  • p(w1|-), p(w2|-), …, p(wn|-)

61
Text Classification
  • Learning parameters from training examples
  • Each document
  • Learn parameters using maximum likelihood
    estimation

62
Text Classification
65
Text Classification
  • The optimal solution that maximizes the
    likelihood of training data
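
The slide's formula is not preserved in this transcript. For a multinomial word model, the standard maximum likelihood solution is a set of relative frequencies: the class prior is the fraction of documents in the class, and p(w|class) is the count of w in that class divided by the total word count of the class. The sketch below assumes that model; the function name and the two toy documents are made up.

```python
from collections import Counter

def train_naive_bayes(docs):
    """docs: list of (word_counts, label) pairs with label +1 or -1.

    Returns MLE class priors and per-class word probabilities
    (relative frequencies, with no smoothing).
    """
    docs_per_class = Counter()
    words_per_class = {+1: Counter(), -1: Counter()}
    for word_counts, label in docs:
        docs_per_class[label] += 1
        words_per_class[label].update(word_counts)   # adds the counts

    n_docs = sum(docs_per_class.values())
    priors = {c: docs_per_class[c] / n_docs for c in words_per_class}

    word_probs = {}
    for c, counts in words_per_class.items():
        total = sum(counts.values())
        word_probs[c] = {w: k / total for w, k in counts.items()}
    return priors, word_probs

# Toy training set: one political (+1) and one non-political (-1) document.
docs = [
    ({"president": 10, "bush": 2, "election": 5}, +1),
    ({"game": 4, "score": 3, "election": 1}, -1),
]
priors, word_probs = train_naive_bayes(docs)
```

Note that a word never seen in a class simply has no entry in that class's table, which corresponds to an estimated probability of zero.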

66
Text Classification
Twenty Newsgroups
An Example
67
Text Classification
  • Any problems with the Naïve Bayes text
    classifier?
  • Unseen words
  • If word w never appears in the training documents,
    what is the consequence?
  • If word w is unseen only in the documents of one
    class, what is the consequence?
  • Related to the overfitting problem
  • Any suggestions?
  • Solution: word class approach
  • Introduce word classes T = {t1, t2, …, tm}
  • Compute p(ti|+), p(ti|-)
  • When w has not been seen before, replace p(w|±)
    with p(ti|±)
  • Introduce a prior for word probabilities
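
The consequence of an unseen word, and one common way of putting a prior on word probabilities, can be sketched as follows. Add-alpha (Laplace) smoothing is used here as an assumed example of such a prior; the slides do not say which prior was intended. All names and counts are hypothetical.

```python
import math
from collections import Counter

def smoothed_word_probs(word_counts, vocabulary, alpha=1.0):
    """Add-alpha estimate: (count + alpha) / (total + alpha * |vocabulary|)."""
    total = sum(word_counts.values())
    denominator = total + alpha * len(vocabulary)
    return {w: (word_counts.get(w, 0) + alpha) / denominator
            for w in vocabulary}

def log_likelihood(doc_counts, word_probs):
    """Naive Bayes log p(doc | class); -inf if any word has probability 0."""
    total = 0.0
    for word, count in doc_counts.items():
        p = word_probs.get(word, 0.0)
        if p == 0.0:
            return float("-inf")
        total += count * math.log(p)
    return total

vocabulary = ["election", "game", "president", "score"]
political_counts = Counter({"president": 10, "election": 5})

# Unsmoothed MLE: "game" never occurs in political training documents.
n_words = sum(political_counts.values())
mle_probs = {w: k / n_words for w, k in political_counts.items()}
smooth_probs = smoothed_word_probs(political_counts, vocabulary)

new_doc = {"president": 1, "game": 1}
unsmoothed = log_likelihood(new_doc, mle_probs)     # one unseen word vetoes the class
smoothed = log_likelihood(new_doc, smooth_probs)    # finite after smoothing
```

Without smoothing, a single word unseen in one class drives that class's likelihood to zero regardless of the other words, which is why the problem is related to overfitting.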

68
Naïve Bayes Model
  • This is a terrible approximation

69
Naïve Bayes Model
  • Why use the Naïve Bayes model?
  • We are essentially interested in p(y|x; θ), not
    p(x|y; θ)

72
Naïve Bayes Model
  • The key to the prediction model is not p(x|y; θ)
    itself, but the ratio p(x|y; θ)/p(x|y′; θ)
    between two classes y and y′
  • Although the Naïve Bayes model does a poor job of
    estimating p(x|y; θ), it does a reasonably good
    job of estimating the ratio.

75
The Ratio of Likelihood for Binary Classes
  • Assume that both classes share the same variance


Gaussian generative model is a linear model
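
The derivation on these slides was lost in the transcript. A sketch of the standard one-dimensional argument follows, writing μ₊ and μ₋ for the class means, σ² for the shared variance, and w and b for the resulting slope and intercept; these symbols are assumptions of this note.

```latex
\log \frac{p(x \mid +)\, p(+)}{p(x \mid -)\, p(-)}
  = -\frac{(x - \mu_{+})^{2}}{2\sigma^{2}}
    + \frac{(x - \mu_{-})^{2}}{2\sigma^{2}}
    + \log \frac{p(+)}{p(-)}
  = \underbrace{\frac{\mu_{+} - \mu_{-}}{\sigma^{2}}}_{w}\, x
    \;\underbrace{-\, \frac{\mu_{+}^{2} - \mu_{-}^{2}}{2\sigma^{2}}
    + \log \frac{p(+)}{p(-)}}_{b}
```

The quadratic terms in x cancel because the variance is shared, so the log ratio is linear in x and the decision rule is the sign of wx + b.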
76
Linear Decision Boundary
  • Gaussian generative models: finding a linear
    decision boundary
  • Why not directly estimate the decision boundary?