Title: Hierarchical Mixture Models
1. Hierarchical Mixture Models
- A Probabilistic Analysis
- Mark Sandler
- Google Inc.
2. Mixture models: quick overview
- Classical problem: many documents on various topics; how do we automatically classify them?
- Mixture models allow us to formalize the problem
- Each topic defines a probability distribution over the entire vocabulary, e.g. Math (0.1, 0.00, 0.03, ...), Physics (0.01, 0.3, 0.01, ...)
- Each document has a quantitative relevance to one or more topics
- A document is created by repeatedly sampling from its mixture of topics (see the sketch below)
- Goal: given the documents, reconstruct the underlying topics and each document's relevance to each topic
(Figure: a document depicted as a sequence of sampled terms: Term 1, Term 2, Term 3, ...)
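A minimal sketch of this generative process, assuming hypothetical topic names and term frequencies (not the paper's data):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical vocabulary and topic-term distributions (each row sums to 1).
vocab = ["algebra", "equation", "generator", "charge", "electron"]
topics = np.array([
    [0.40, 0.50, 0.08, 0.01, 0.01],   # "math"
    [0.01, 0.39, 0.20, 0.20, 0.20],   # "physics"
])

def sample_document(relevance, length=20):
    """Sample a document as `length` i.i.d. draws from the mixed term distribution."""
    mixed = relevance @ topics                      # document's term distribution
    word_ids = rng.choice(len(vocab), size=length, p=mixed)
    return [vocab[i] for i in word_ids]

# A document that is half math, half physics.
print(sample_document(np.array([0.5, 0.5])))
```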
3. Topical Hierarchy
- Motivation: there are lots of topics in real data!
- Topics are not independent: a document about math is usually related to science as well
- A hierarchy allows us to encode these dependencies and to do lazy evaluation
- A report on the Tour de France can initially be classified into sports, without worrying about where it falls within sports
- Where do we get the hierarchy from?
4. Our results
- 1. A two-stage generative model: the topical hierarchy is constructed by an adversarial game, and we treat the hierarchy as a giant mixture model from which documents are created
- 2. Given a part of the hierarchy, we prove that classification accuracy is maintained for documents from the entire hierarchy
- 3. An algorithm which learns the hierarchy from unlabeled data
- 4. Experimental results
(Figure: a Tour de France document routed to an unlabeled cycling topic.)
5. Generative model for the topical hierarchy
- Each topic is a probability distribution over terms, as before
- There is a base topic which includes all the documents
- Each new topic is generated from its parent by adversarial mutation of some (possibly all) term frequencies
(Figure: example hierarchy with a base topic at the root, children Science and Sports, and leaves physics, math, baseball, hockey.)
6. Generative model for the hierarchy
- A multistep, adversary-driven random process
- The adversary first chooses the base topic distribution B (B(i) is the frequency of term i in the language)
- For a parent topic T_p the adversary
- decides on the number of children
- for each child l chooses a vector of probability distributions D_l(1), ..., D_l(i), ...
- The frequency of term i in child l is T_l(i) = T_p(i) + e_l(i), where e_l(i) is sampled from D_l(i) (see the sketch below)
- The distributions D_l(i) can depend on the already constructed part of the hierarchy
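A minimal sketch of one mutation step under these assumptions; the uniform zero-mean noise, the clipping to avoid negative frequencies, and the renormalization are illustrative choices, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(1)

def mutate_topic(parent, scale=0.05):
    """Child topic: T_child(i) = parent(i) + e(i), with zero-mean perturbations e(i)."""
    e = rng.uniform(-scale, scale, size=parent.shape)  # E[e(i)] = 0 for each term
    child = np.clip(parent + e, 0.0, None)             # no negative frequencies
    return child / child.sum()                         # keep it a probability distribution

base = np.array([0.25, 0.25, 0.20, 0.15, 0.15])        # hypothetical base topic
science = mutate_topic(base)
physics = mutate_topic(science)
print(np.round(physics, 3))
```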
7. The distributions satisfy a few conditions
- Each frequency change has zero expectation: E[e(i)] = 0
- The new topic is different from the parent
- No negative frequencies are allowed: T_p(i) + e(i) >= 0
- The spread (slope) of each distribution is large, i.e. the change in frequencies is not concentrated on just a few terms (a small check is sketched below)
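A small illustrative check of these conditions on sampled perturbations; the tolerance and the "no term carries more than half of the total change" spread test are placeholder choices, not the paper's constants:

```python
import numpy as np

def check_mutation(parent, samples, atol=1e-2):
    """samples: matrix whose rows are sampled perturbation vectors e."""
    zero_mean = np.allclose(samples.mean(axis=0), 0.0, atol=atol)    # E[e(i)] = 0
    non_negative = np.all(parent + samples >= 0)                     # no negative frequencies
    total_change = np.abs(samples).sum(axis=1)
    distinct = np.all(total_change > 0)                              # child differs from parent
    spread_ok = np.all(np.abs(samples).max(axis=1) <= 0.5 * total_change)  # change not concentrated
    return zero_mean and non_negative and distinct and spread_ok
```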
8. Related work: reconstructing hierarchies
- Chinese restaurant process followed by LDA (Blei et al., NIPS 2004)
- Learn the hierarchy from labeled data (Toutanova et al., CIKM 2001)
- Cluster-Abstraction model, an EM-based local search (Hofmann, IJCAI 1999)
- Bottom-up approach: hierarchical agglomerative clustering (lots of work)
9. Our results
- 1. A two-stage generative model: the topical hierarchy is constructed by an adversarial game, and we treat the hierarchy as a giant mixture model from which documents are created
- 2. Given a part of the hierarchy, we prove that classification accuracy is maintained for documents from the entire hierarchy
- 3. An algorithm which learns the hierarchy from unlabeled data
- 4. Experimental results
10. Classification along a path in the tree
- Suppose we know the path Base -> Science -> Physics
- Consider a document on the physics of a hockey puck
- If the document is relevant to a topic that is not in the known part of the hierarchy, that relevance contributes to the closest node on the path
(Figure: the example hierarchy again: Base topic with children Science and Sports, and leaves physics, math, baseball, hockey.)
11. Algorithm when a path in the hierarchy is known
- There is a hierarchy, and we know a path in it
- Treat the path as an instance of a mixture model
- We can treat this as a single classification problem and solve it (see the sketch below)
- How do we classify? Why does it work?
- Problem: documents are generated from distributions which are not part of the known mixture
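A minimal sketch of treating the known path as a single mixture instance; the topic vectors are hypothetical, and the least-squares fit is a stand-in for the generalized pseudoinverse discussed on the next slides:

```python
import numpy as np

# Hypothetical topics along the path Base -> Science -> Physics,
# one column per topic, rows indexed by a 5-term vocabulary.
W = np.array([
    [0.50, 0.10, 0.10],
    [0.20, 0.50, 0.10],
    [0.10, 0.20, 0.20],
    [0.10, 0.10, 0.30],
    [0.10, 0.10, 0.30],
])

def classify_on_path(doc_term_freq):
    """Estimate the document's mixing weights over the path topics."""
    p, *_ = np.linalg.lstsq(W, doc_term_freq, rcond=None)
    p = np.clip(p, 0, None)
    return p / p.sum()

doc = np.array([0.15, 0.35, 0.20, 0.15, 0.15])   # empirical term frequencies
print(np.round(classify_on_path(doc), 3))
```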
12. Why does it work? Part 1: back to plain mixtures
- Suppose we know the matrix of topics W
- Each document is a sample from a mixture of the topics: d ~ Wp, i.e. E[d] = Wp
- We need to compute the underlying mixing coefficients p
- The classical approach, Naïve Bayes, gives unclear guarantees
- Pseudoinverses guarantee that we find the underlying mixing coefficients with small error and high probability
13. Generalized pseudoinverse
- Generalized pseudoinverses (Kleinberg, S., STOC'04)
- Let V be such that VW = I
- Then E[Vd] = VWp = p
- The error ||Vd - p|| is bounded with high probability
- The required length of a document is a function of B
- Take-home message: there exists a matrix V such that, if the topics are linearly independent, we can guarantee the accuracy of classification (see the numeric sketch below)
- See the above papers for more details
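A numeric sketch of this idea, using numpy's Moore-Penrose pseudoinverse as a stand-in for the generalized pseudoinverse of the paper, with hypothetical topics and mixing coefficients:

```python
import numpy as np

rng = np.random.default_rng(2)

W = np.array([                        # topic matrix, one column per topic
    [0.50, 0.10, 0.10],
    [0.20, 0.50, 0.10],
    [0.10, 0.20, 0.20],
    [0.10, 0.10, 0.30],
    [0.10, 0.10, 0.30],
])
V = np.linalg.pinv(W)                 # V @ W = I because the topics are linearly independent

p_true = np.array([0.2, 0.3, 0.5])    # hidden mixing coefficients
doc_len = 2000
probs = W @ p_true
counts = rng.multinomial(doc_len, probs / probs.sum())
d = counts / doc_len                  # empirical term frequencies of one document

p_hat = V @ d                         # E[V d] = V W p = p
print(np.round(p_hat, 3))             # close to p_true; the error shrinks as doc_len grows
```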
14. Part II: still, why does it work?
- The mixture model is built from the topics along the known path
- The documents, however, are generated by topics in different (unknown) parts of the hierarchy
- Suppose document d is produced using topic T (its underlying distribution is T)
- If T_p is a parent of T, then E[T(i)] = T_p(i), because the expectation of e(i) is 0 (spelled out below)
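Spelling this out along a whole path (notation assumed here: T_0 is the closest known ancestor on the path, T_k the actual topic k levels below it, and e_1(i), ..., e_k(i) the zero-mean frequency changes along the way):

```latex
\[
T_k(i) \;=\; T_0(i) + \sum_{j=1}^{k} e_j(i),
\qquad
\mathbb{E}\bigl[T_k(i)\bigr] \;=\; T_0(i) + \sum_{j=1}^{k} \mathbb{E}\bigl[e_j(i)\bigr] \;=\; T_0(i).
\]
```

So, in expectation, a document generated by a topic below the known path looks like a document generated by its closest ancestor on that path.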
15. Reconstructing the hierarchy from unlabeled data
- Construct the base topic - the ambient distribution across all documents
- This gives the root of the hierarchy tree
- For a topic T:
- build a co-occurrence matrix from the documents which belong to the topic
- choose the column which is furthest away from T (in L1 norm) as a candidate child topic
- classify the documents that belong to the new topic
- iterate on topic T until no more documents are split out of the parent
- Iterate the procedure on each child topic (a sketch follows below)
16. Overview of the rest of the talk
- 1. A two-stage model
- The topic hierarchy is constructed using an adversarial game
- The topics form a mixture model and documents are created using that model
- In this model we show that, if we know a part of the hierarchy, we can guarantee classification accuracy along that path
- We design an algorithm which learns the hierarchy from unlabeled data
- Experimental results
17. Experiments: abstracts from arXiv
- 250K abstracts from different areas of physics, with some computer science and math
- 15 categories total
- We run our hierarchy reconstruction algorithm to produce individual clusters
- 76 clusters, with overall recall / precision of 70 / 70
18. Experiments: arXiv
19. 20 Newsgroups
- Contains 20 newsgroups on several related topics (computers, electronics, politics, religion, etc.)
- Relatively small dataset (20K documents)
- We use our algorithm to build the top-level clusters
- The clusters coincide with the natural split of the topics
20. Experiments: Newsgroups
21. Conclusions
- A theoretical framework for analyzing topical hierarchies
- A natural generative model for constructing a hierarchy of topics
- The algorithms provided can operate without reconstructing the entire hierarchy
- An algorithm to reconstruct the topics
- Questions? (Ask now, or come see poster 13)
22. Pseudoinverse, independence coefficient, and such
- From Kleinberg and S. (STOC'04) and S. (KDD'05)
- Simple observation: E[d] = Wp, but d is sparse whereas Wp is not
- It can be shown that, for any k x N matrix V whose maximal element is bounded, the error can be bounded in terms of k, the maximal element of V, and the length of the document (the number of non-zero entries in d)
- But the bound is independent of the total size of the dictionary!
23. Thanks!
24. An example
- Dictionary: algebra, equation, generator, charge, electron
- Topics: Math = (0.4, 0.5, 0.098, 0.01, 0.01), Physics = (0.01, 0.4, 0.2, 0.2, 0.2)
- A typical math document has relevance vector (1, 0)
- A typical physics document has relevance vector (0, 1)
- A document related to both math and physics: (0.5, 0.5) (a worked calculation follows below)
(Figure: example documents shown as sequences of sampled dictionary terms.)
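A minimal worked version of the mixed-document case, using the topic vectors from this slide as given:

```python
import numpy as np

vocab = ["algebra", "equation", "generator", "charge", "electron"]
math_topic    = np.array([0.4, 0.5, 0.098, 0.01, 0.01])
physics_topic = np.array([0.01, 0.4, 0.2, 0.2, 0.2])

# A document with relevance vector (0.5, 0.5) draws its words
# from the average of the two topic distributions.
mixed = 0.5 * math_topic + 0.5 * physics_topic
for term, freq in zip(vocab, mixed):
    print(f"{term}: {freq:.3f}")   # algebra 0.205, equation 0.450, generator 0.149, ...
```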