Title: Distributional Clustering of Words for Text Classification
1 Distributional Clustering of Words for Text Classification
- L. Douglas Baker
- Andrew Kachites McCallum
- SIGIR98
2 Distributional Clustering
- Word similarity based on class label distribution
- puck and goalie
- team
3 Distributional Clustering
- Clustering words based on class distribution (supervised)
- Similarity between w_t and w_s = similarity between P(C|w_t) and P(C|w_s)
- Information-theoretic measure to calculate similarity between distributions
- Kullback-Leibler divergence to the mean
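As an illustration of the class distributions being compared, here is a minimal sketch that estimates P(C|w) for a word by normalising its per-class occurrence counts. The counts, the class names, and the helper `class_distribution` are hypothetical and chosen only to mirror the puck / goalie / team example; the slides do not say how the distributions were estimated.

```python
def class_distribution(word_class_counts):
    """Return P(C|w) for one word as a dict mapping class -> probability.

    Assumption: P(c|w) is estimated as count(w, c) / sum over c' of count(w, c').
    """
    total = sum(word_class_counts.values())
    return {c: n / total for c, n in word_class_counts.items()}


# Hypothetical occurrence counts per class (not taken from the slides).
counts = {
    "puck":   {"hockey": 90, "baseball": 2,  "autos": 8},
    "goalie": {"hockey": 45, "baseball": 1,  "autos": 4},
    "team":   {"hockey": 50, "baseball": 45, "autos": 5},
}

dists = {word: class_distribution(c) for word, c in counts.items()}

for word, dist in dists.items():
    print(word, dist)
```

With these toy counts, "puck" and "goalie" get the same class distribution even though their raw frequencies differ, while "team" is spread across two classes. That is the sense in which the first two words are similar and the third is not.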
4 Distributional Clustering
Class 8 Autos and Class 9 Motorcycles
6 Kullback-Leibler Divergence
Here,
D is asymmetric, and D → ∞ when P(y) = 0 and P(x) ≠ 0.
Also, D ≥ 0.
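The properties listed on this slide can be checked numerically. The sketch below assumes the standard definition D(p || q) = sum_i p_i log(p_i / q_i), with the usual convention that terms with p_i = 0 contribute nothing. The function name `kl_divergence` and the example distributions are illustrative only.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats.

    p and q are sequences of probabilities over the same events.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # convention: 0 * log(0 / q) = 0
        if qi == 0.0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


# Illustrative distributions over three classes.
p = [0.9, 0.1, 0.0]
q = [0.5, 0.25, 0.25]

print(kl_divergence(p, q))  # finite and non-negative
print(kl_divergence(q, p))  # infinite: q has mass on an event where p is 0
print(kl_divergence(p, p))  # 0.0 for identical distributions
```

The infinite value in the second call is the practical problem with using plain KL divergence to compare sparse class distributions, and is one motivation for averaging the two distributions before comparing them.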
7 Kullback-Leibler Divergence
Where,
Jensen-Shannon divergence is a special case of the symmetrised KL divergence, with P(w_t) = P(w_s) = 0.5.
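The formula on this slide did not survive extraction, so the following is only a sketch of one common form of "KL divergence to the mean": each distribution is compared against the weighted mean of the two, and the two divergences are combined using the same weights. The weighting scheme, the function names, and the default weights are assumptions. With both weights equal to 0.5 the result is the Jensen-Shannon divergence.

```python
import math


def kl(p, q):
    """D(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def kl_to_mean(p_t, p_s, weight_t=0.5, weight_s=0.5):
    """Weighted KL divergence of two distributions to their weighted mean.

    Assumption: weight_t and weight_s play the role of P(w_t) and P(w_s).
    """
    z = weight_t + weight_s
    a, b = weight_t / z, weight_s / z
    mean = [a * x + b * y for x, y in zip(p_t, p_s)]
    # The mean is non-zero wherever either input is, so both terms are finite.
    return a * kl(p_t, mean) + b * kl(p_s, mean)


p = [0.9, 0.1, 0.0]
q = [0.5, 0.25, 0.25]

print(kl_to_mean(p, q))            # finite, unlike D(q || p)
print(kl_to_mean(q, p))            # same value: the measure is symmetric
print(kl_to_mean([1, 0], [0, 1]))  # maximum for equal weights: log(2)
```

Unlike plain KL divergence, this quantity is symmetric in its two arguments and stays finite even when the distributions have zeros in different places.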
8 Clustering Algorithm
Characteristics
- Greedy, aggressive
- Locally optimal
- Hard clustering
- Agglomerative
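The characteristics above can be made concrete with a small sketch of a greedy, agglomerative, hard clustering loop. It is not necessarily the authors' exact procedure. It assumes the words arrive pre-sorted by some relevance score, that at most `num_clusters` clusters are kept at any time, and that the closest pair (by weighted KL divergence to the mean) is merged whenever that limit is exceeded. All names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def kl(p, q):
    """D(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def merge(c1, c2):
    """Hard merge: union of the words, prior-weighted mean of the distributions."""
    z = c1["prior"] + c2["prior"]
    dist = [(c1["prior"] * x + c2["prior"] * y) / z
            for x, y in zip(c1["dist"], c2["dist"])]
    return {"words": c1["words"] + c2["words"], "prior": z, "dist": dist}


def distance(c1, c2):
    """Weighted KL divergence of two clusters to their merged distribution."""
    merged = merge(c1, c2)
    a = c1["prior"] / merged["prior"]
    b = c2["prior"] / merged["prior"]
    return a * kl(c1["dist"], merged["dist"]) + b * kl(c2["dist"], merged["dist"])


def cluster_words(words, num_clusters):
    """Greedy agglomerative clustering of (word, prior, class_dist) triples."""
    clusters = []
    for word, prior, dist in words:
        clusters.append({"words": [word], "prior": prior, "dist": list(dist)})
        if len(clusters) > num_clusters:
            # Greedy step: merge the locally closest pair, never revisit it.
            i, j = min(combinations(range(len(clusters)), 2),
                       key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
            merged = merge(clusters[i], clusters[j])
            clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
            clusters.append(merged)
    return clusters


# Hypothetical (word, P(w), P(C|w)) triples over classes hockey, baseball, autos.
words = [
    ("puck",   0.2, [0.90, 0.02, 0.08]),
    ("goalie", 0.1, [0.88, 0.04, 0.08]),
    ("team",   0.4, [0.50, 0.45, 0.05]),
    ("engine", 0.2, [0.05, 0.05, 0.90]),
    ("car",    0.1, [0.06, 0.04, 0.90]),
]

result = cluster_words(words, num_clusters=3)
for c in result:
    print(c["words"], [round(x, 3) for x in c["dist"]])
```

Each word ends up in exactly one cluster (hard clustering), and each merge is chosen only by looking at the current set of clusters, which is why the result is locally rather than globally optimal.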
9 Experiments
- Dataset
- 20 Newsgroups
- Reuters-21578
- Yahoo Science Hierarchy
- Compared with
- Supervised Latent Semantic Indexing
- Class-based clustering
- Feature selection by mutual information with the class variable
- Feature selection by Markov-blanket method
- Classifier: naive Bayes classifier (NBC)
10 Results
11 Conclusion
- Useful semantic word clusterings
- Higher classification accuracy
- Smaller classification models
- Word clustering vs. feature selection?
- What if the data is
- noisy?
- sparse?