1
Hierarchical Mixture Models
  • A Probabilistic Analysis
  • Mark Sandler
  • Google Inc.

2
Mixture models: a quick overview
  • Classical problem
  • Many documents on various topics: how do we automatically classify them?
  • Mixture models allow us to formalize the problem
  • Each topic defines a probability distribution over the entire vocabulary
  • Math: (0.1, 0.00, 0.03, ...); Physics: (0.01, 0.3, 0.01, ...)
  • Each document has a quantitative relevance to one or more topics
  • A document is created by repeatedly sampling from its mixture of topics (see the sketch below)
  • Goal: given the documents, reconstruct the underlying topics and each document's relevance to each topic
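
A minimal sketch of this generative process (Python; the vocabulary and all numbers below are made up for illustration, not taken from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary and topic distributions (illustrative only).
vocab = ["algebra", "equation", "generator", "charge", "electron"]
math_topic    = np.array([0.40, 0.50, 0.08, 0.01, 0.01])
physics_topic = np.array([0.01, 0.40, 0.19, 0.20, 0.20])
W = np.column_stack([math_topic, physics_topic])   # terms x topics

p = np.array([0.5, 0.5])    # the document's relevance to (math, physics)
doc_length = 12

# A document is created by repeatedly sampling terms from the mixture W @ p.
document = rng.choice(vocab, size=doc_length, p=W @ p)
print(document)
```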

[Diagram: a document as a sequence of terms sampled from the mixture: Term 1, Term 2, Term 3, Term 2, Term 1]
3
Topical Hierarchy
  • Motivation: there are lots of topics in real data!
  • Topics are not independent
  • A document about math is usually related to science as well
  • A hierarchy lets us encode these dependencies and do lazy evaluation
  • A report on the Tour de France can be classified into sports first, without worrying about where it falls within sports
  • Where do we get the hierarchy from?

4
Our results
  • 1. A two-stage generative model:
  • The topical hierarchy is constructed by an adversarial game
  • We treat the hierarchy as a giant mixture model, and documents are created accordingly
  • 2. Given a part of the hierarchy, we prove that we maintain classification accuracy for documents in the entire hierarchy
  • 3. We design an algorithm which learns the hierarchy from unlabeled data
  • 4. Experimental results

[Diagram: a 'Tour de France' document assigned to a new, not-yet-labeled 'Cycling' topic]
5
Generative model for the topical hierarchy
  • Each topic is a probability distribution over terms, as before
  • There is a base topic which includes all the documents
  • Each new topic is generated from its parent by an adversarial mutation of some (possibly all) frequencies

[Diagram: hierarchy tree. Root: Base topic; children: Science (subtopics physics, math) and Sports (subtopics baseball, hockey)]
6
Generative model for the hierarchy
  • A multistep, adversary-driven random process
  • The adversary first chooses the base topic distribution T_0 (T_0(i) is the frequency of term i in the language)
  • For each parent topic T_p, the adversary:
  • Decides on the number of children
  • For each child, chooses a vector of probability distributions (D_1, ..., D_l, ...)
  • The frequency of term l in the child topic is determined by T_c(l) = T_p(l) + e(l), where e(l) is sampled from D_l
  • The distributions D_l can depend on the already constructed part of the hierarchy
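
A minimal sketch of one mutation step under the simplest reading (additive zero-mean noise; the uniform distribution, the spread value, and the final renormalization are my illustrative assumptions, not the talk's definitions):

```python
import numpy as np

rng = np.random.default_rng(1)

def mutate_topic(parent, spread=0.05):
    # Zero-mean perturbation e(l) for every term; uniform noise is only an
    # illustrative stand-in -- in the model an adversary picks each D_l.
    e = rng.uniform(-spread, spread, size=parent.shape)
    child = np.clip(parent + e, 0.0, None)   # no negative frequencies
    return child / child.sum()               # renormalize to a distribution

base = np.full(10, 0.1)          # toy base topic over a 10-term vocabulary
science = mutate_topic(base)
physics = mutate_topic(science)  # each child is mutated from its parent
print(physics.round(3))
```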
7
Distributions satisfy a few conditions
  • We have T_c(l) = T_p(l) + e(l), where the perturbations e(l) satisfy:
  • Each frequency change has zero expectation: E[e(l)] = 0
  • The new topic is different from the parent
  • No negative frequencies are allowed: e(l) >= -T_p(l)
  • The spread (slope) of each distribution is large (the change in frequencies is not concentrated on just a few terms)

8
Related work reconstructing hierarchies
  • Chinese restaurant process followed by LDA
  • Blei et al., NIPS 2004
  • Learn it from labeled data
  • Toutanova et al., CIKM 2001
  • Cluster-Abstraction model (EM-based local search)
  • Hofmann, IJCAI 1999
  • Bottom-up approach: hierarchical agglomerative clustering (lots of work)

9
Our results
  • 1. A two-stage generative model:
  • The topical hierarchy is constructed by an adversarial game
  • We treat the hierarchy as a giant mixture model, and documents are created accordingly
  • 2. Given a part of the hierarchy, we prove that we maintain classification accuracy for documents in the entire hierarchy
  • 3. We design an algorithm which learns the hierarchy from unlabeled data
  • 4. Experimental results

10
Classification along the path in the tree
  • Suppose we know the path Base -> Science -> Physics
  • Consider a document on the physics of a hockey puck
  • If a document is relevant to a topic that is not in the known part of the hierarchy, then that relevance contributes to the closest node on the path
[Diagram: the same hierarchy tree, with the known path Base topic -> Science -> physics]
11
Algorithm when a path in the hierarchy is known
  • There is a hierarchy, and we know a path in it
  • Treat the path as an instance of a mixture model
  • We can treat this as a single classification problem and solve it
  • How do we classify? Why does it work?
  • Problem: the documents are generated from distributions which are not part of the known mixture

12
Why does it work? Part 1: back to plain mixtures
  • Suppose we know a matrix of topics W
  • Each document is a sample from a mixture of topics: d ≈ Wp
  • We need to compute the underlying mixing coefficients p
  • Classical approach: Naïve Bayes, which gives unclear guarantees
  • Pseudoinverses guarantee that we find the underlying mixing coefficients with small error and high probability

13
Generalized pseudoinverse
  • Generalized pseudoinverses (Kleinberg and Sandler, STOC 2004)
  • Let V be such that VW = I
  • Then E[Vd] = p
  • The error is bounded with high probability
  • The required document length is a function of B, the bound on the entries of V
  • Take-home message:
  • There exists a matrix V such that Vd ≈ p
  • If the topics are linearly independent, then we can guarantee the accuracy of classification
  • See the above paper for more details
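
A hedged numpy illustration (the paper constructs a V with small entries; the Moore-Penrose pseudoinverse below is only a convenient stand-in that also satisfies VW = I when the topics are linearly independent):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy topic matrix W (terms x topics); columns are topic distributions.
W = np.array([[0.40, 0.01],
              [0.50, 0.40],
              [0.08, 0.19],
              [0.01, 0.20],
              [0.01, 0.20]])

p_true = np.array([0.3, 0.7])    # hidden mixing coefficients
n_words = 2000                   # document length
counts = rng.multinomial(n_words, W @ p_true)
d = counts / n_words             # empirical term frequencies

V = np.linalg.pinv(W)            # satisfies V @ W == I (lin. independent topics)
p_hat = V @ d
print(p_hat)                     # close to p_true for long documents
```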

14
Part II: still, why does it work?
  • The mixture model is built from the topics along the known path
  • The documents are generated by topics in different (unknown) parts of the hierarchy
  • Suppose document d is produced using topic T_c (its underlying distribution is T_c)
  • If T_p is a parent of T_c, then E[T_c(i)] = T_p(i), because the expectation of e(i) is 0
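
A math sketch of why this suffices (T_p, T_c, e, and V as on the earlier slides; chaining the two expectations is my reading of the argument, not a verbatim slide):

```latex
\[
  T_c(i) = T_p(i) + e(i), \qquad \mathbb{E}[e(i)] = 0
  \;\Longrightarrow\; \mathbb{E}[T_c(i)] = T_p(i).
\]
% For a document d drawn from T_c, linearity of expectation gives
\[
  \mathbb{E}[V d] = V\,\mathbb{E}[d] = V\,T_c,
  \qquad
  \mathbb{E}[V\,T_c] = V\,T_p ,
\]
% and V T_p is the parent's coordinate vector on the path, since
% VW = I for the topic matrix W built from the path.
```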

15
Reconstructing hierarchy from unlabeled data
  • Construct the base topic: the ambient distribution across all documents
  • This gives the root of the hierarchy tree
  • For each child topic T:
  • Build a co-occurrence matrix from the documents which belong to the topic
  • Choose the column which is furthest away from T (in L1 norm)
  • Classify the documents that belong to the new topic
  • Iterate on topic T until no documents are split out of the parent
  • Iterate the procedure on each child topic (see the sketch below)
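
A hedged Python sketch of this loop (the document representation, the classification rule, and the stopping test are simplified stand-ins, not the paper's exact procedure):

```python
import numpy as np

def col_normalize(M):
    s = M.sum(axis=0, keepdims=True)
    return M / np.where(s == 0, 1, s)

def split_topic(docs, topic, min_split=5):
    """docs: (n_docs, n_terms) term counts; topic: the parent distribution T."""
    children = []
    while len(docs) >= min_split:
        # Co-occurrence matrix of the remaining documents, columns normalized.
        C = col_normalize(docs.T.astype(float) @ docs)
        # Candidate child topic: the column furthest from T in L1 norm.
        candidate = C[:, np.abs(C - topic[:, None]).sum(axis=0).argmax()]
        # Classify: move documents that look closer to the candidate than to T.
        freqs = col_normalize(docs.T.astype(float)).T
        to_child = (np.abs(freqs - candidate).sum(axis=1)
                    < np.abs(freqs - topic).sum(axis=1))
        if to_child.sum() < min_split:   # no documents split out of the parent
            break
        children.append((candidate, docs[to_child]))
        docs = docs[~to_child]           # iterate on the shrunken parent
    return children   # recurse on each child's documents to grow the tree
```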

16
Overview of the rest of the talk
  • 1. A two-stage model:
  • The topic hierarchy is constructed by an adversarial game
  • The topics form a mixture model, and documents are created using the model
  • In this model we show that if we know a part of the hierarchy, we can guarantee classification accuracy along this path
  • We design an algorithm which learns the hierarchy from unlabeled data
  • Experimental results

17
Experiments: abstracts from arXiv
  • 250K abstracts on different areas of physics, with some computer science and math
  • 15 categories total
  • We run our hierarchy reconstruction algorithm to produce individual clusters
  • 76 clusters; overall recall/precision of 70%/70%

18
Experiments: arXiv
19
20 Newsgroups
  • Contains 20 newsgroups on several related topics (computers, electronics, politics, religion, etc.)
  • A relatively small dataset (20K documents)
  • We use our algorithm to build the top-level clusters
  • The clusters coincide with the natural split of the topics

20
Experiments: Newsgroups
21
Conclusions
  • A theoretical framework for analyzing topical hierarchies
  • A natural generative model for constructing a hierarchy of topics
  • The provided algorithms can operate without reconstructing the entire hierarchy
  • An algorithm to reconstruct the topics
  • Questions?
  • (Ask now, or come see poster 13)

22
Pseudoinverse, independence coefficient and such
  • From Kleinberg and Sandler (STOC 2004) and Sandler (KDD 2005)
  • Simple observation: E[Vd] = VWp = p
  • but d is sparse, whereas its expectation Wp is not
  • It can be shown that for any k x N matrix V whose maximal element is bounded by B,
  • the error can be bounded in terms of k, B, and the length of the document (the number of nonzero entries in d)
  • but independently of the total size of the dictionary!
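
A small numpy experiment consistent with that claim (entirely my construction, not from the papers): fix k = 2 topics and the document length, grow the dictionary, and compare the estimation errors.

```python
import numpy as np

rng = np.random.default_rng(3)
p_true = np.array([0.3, 0.7])     # k = 2 hidden mixing coefficients
n_words = 1000                    # fixed document length

for N in (100, 1000, 10000):      # growing dictionary sizes
    W = rng.random((N, 2))
    W /= W.sum(axis=0)            # columns are topic distributions over N terms
    d = rng.multinomial(n_words, W @ p_true) / n_words
    p_hat = np.linalg.pinv(W) @ d # pinv as a stand-in for the bounded-entry V
    print(N, round(np.abs(p_hat - p_true).sum(), 4))  # error vs. dictionary size
```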

23
Thanks!
24
An example
  • Dictionary: algebra, equation, generator, charge, electron
  • Topics: Math^T = (0.4, 0.5, 0.098, 0.01, 0.01); Physics^T = (0.01, 0.4, 0.2, 0.2, 0.2)
  • A typical math document has relevance vector (1, 0)
  • A typical physics document has relevance vector (0, 1)
  • A document related to both math and physics: (0.5, 0.5)
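
The 50/50 document's term distribution follows directly from the slide's numbers; a quick check in Python:

```python
import numpy as np

# Dictionary and topic frequencies copied from this slide.
vocab   = ["algebra", "equation", "generator", "charge", "electron"]
math    = np.array([0.40, 0.50, 0.098, 0.01, 0.01])
physics = np.array([0.01, 0.40, 0.20, 0.20, 0.20])

# A document with relevance vector (0.5, 0.5) samples its terms from
# the 50/50 mixture of the two topic distributions.
mixture = 0.5 * math + 0.5 * physics
for term, freq in zip(vocab, mixture):
    print(f"{term}: {freq:.3f}")
# "equation" dominates (0.450): it is frequent in both math and physics.
```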

[Diagram: example documents shown as sequences of terms sampled from these topics (Algebra, Generator, Equation, Equation, Algebra, ...)]