Distributional Clustering of Words for Text Classification


About This Presentation

Description:

Distributional Clustering of Words for Text Classification, by L. Douglas Baker and Andrew Kachites McCallum (SIGIR 98).

Slides: 12

Provided by: nitina6

Learn more at: https://www.public.asu.edu

Transcript and Presenter's Notes

1
Distributional Clustering of Words for Text Classification

L. Douglas Baker
Andrew Kachites McCallum
SIGIR '98

2
Distributional Clustering

Word similarity based on class label distribution
e.g., "puck" and "goalie" vs. "team"
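To make "class label distribution" concrete, here is a minimal Python sketch that estimates $P(C \mid w)$ from per-class word counts by plain relative frequency. The counts, the three-class setup and the name `class_distribution` are hypothetical, chosen only for illustration; the paper's exact estimator (for example, any smoothing) may differ.

```python
# Hypothetical word-by-class occurrence counts, for illustration only.
# Class order: [hockey, baseball, autos]
COUNTS = {
    "puck": [40, 1, 1],
    "goalie": [30, 1, 0],
    "team": [20, 25, 5],
}


def class_distribution(counts):
    """Estimate P(C | w) as the word's per-class counts divided by its total count."""
    total = sum(counts)
    return [c / total for c in counts]


for word, counts in COUNTS.items():
    print(word, [round(p, 2) for p in class_distribution(counts)])
```

In this toy data "puck" and "goalie" put almost all their mass on hockey, while "team" is spread over hockey and baseball. A zero count (here "goalie" under autos) yields a zero probability, which matters for KL divergence; smoothing the counts is a common remedy, but whether and how the paper smooths is not stated on the slides.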

3
Distributional Clustering

Clustering words based on their class distributions (supervised)
Similarity between $w_t$ and $w_s$ is measured as the similarity between $P(C \mid w_t)$ and $P(C \mid w_s)$
An information-theoretic measure calculates the similarity between distributions:
Kullback-Leibler divergence to the mean
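A minimal Python sketch of the KL divergence to the mean, assuming each word's divergence is weighted by its normalised prior, $P(w_t)/(P(w_t)+P(w_s))$ for $w_t$ and likewise for $w_s$, and measured against the correspondingly weighted mean class distribution. The function names and the example distributions are hypothetical.

```python
import math


def kl(p, q):
    """D(p || q) = sum_j p_j * log(p_j / q_j), in nats (0 * log 0 taken as 0)."""
    d = 0.0
    for pj, qj in zip(p, q):
        if pj > 0:
            if qj == 0:
                return math.inf
            d += pj * math.log(pj / qj)
    return d


def kl_to_mean(p_t, p_s, prior_t, prior_s):
    """KL divergence to the mean between two words' class distributions.

    p_t, p_s: P(C | w_t) and P(C | w_s) as lists over the same classes.
    prior_t, prior_s: P(w_t) and P(w_s), assumed positive, used as mixing weights.
    """
    pi_t = prior_t / (prior_t + prior_s)
    pi_s = prior_s / (prior_t + prior_s)
    # Mean distribution P(C | w_t v w_s); with positive weights it is nonzero
    # wherever p_t or p_s is, so neither KL term below is infinite.
    mean = [pi_t * a + pi_s * b for a, b in zip(p_t, p_s)]
    return pi_t * kl(p_t, mean) + pi_s * kl(p_s, mean)


# Hypothetical class distributions over [hockey, baseball, autos].
puck = [0.90, 0.05, 0.05]
goalie = [0.85, 0.10, 0.05]
team = [0.40, 0.50, 0.10]

print(kl_to_mean(puck, goalie, 1.0, 1.0))  # small: similar distributions
print(kl_to_mean(puck, team, 1.0, 1.0))    # larger: dissimilar distributions
```

Unlike plain KL divergence, this measure is symmetric in the two words (swapping the words together with their priors gives the same value) and stays finite even when one word never occurs in a class where the other does, since the mean distribution covers both supports.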

4
Distributional Clustering
Class 8 Autos and Class 9 Motorcycles
5
Distributional Clustering
6
Kullback-Leibler Divergence
$D\big(P(x) \,\|\, P(y)\big) = \sum P(x) \log \frac{P(x)}{P(y)}$
Here, D is asymmetric, and $D \to \infty$ when $P(y) = 0$ and $P(x) \neq 0$.
Also, $D \geq 0$.
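The properties above can be checked numerically with a small Python sketch, using natural logarithms and the convention $0 \log 0 = 0$; the two example distributions are hypothetical.

```python
import math


def kl(p, q):
    """D(p || q) = sum_x p(x) * log(p(x) / q(x)), in nats (0 * log 0 taken as 0)."""
    d = 0.0
    for px, qx in zip(p, q):
        if px > 0:
            if qx == 0:
                return math.inf  # p has mass where q has none
            d += px * math.log(px / qx)
    return d


# Hypothetical distributions over three outcomes.
p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]

print(kl(p, q))  # log 2, about 0.693
print(kl(q, p))  # inf, since q puts mass on the third outcome and p does not
print(kl(p, p))  # 0.0: D is zero when the distributions are equal
```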
7
Kullback-Leibler Divergence
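The KL divergence to the mean of $w_t$ and $w_s$ is
$\frac{P(w_t)}{P(w_t)+P(w_s)} D\big(P(C \mid w_t) \,\|\, P(C \mid w_t \vee w_s)\big) + \frac{P(w_s)}{P(w_t)+P(w_s)} D\big(P(C \mid w_s) \,\|\, P(C \mid w_t \vee w_s)\big)$
where $P(C \mid w_t \vee w_s) = \frac{P(w_t)}{P(w_t)+P(w_s)} P(C \mid w_t) + \frac{P(w_s)}{P(w_t)+P(w_s)} P(C \mid w_s)$.
Jensen-Shannon divergence is the special case of this symmetrised KL divergence with $P(w_t) = P(w_s) = 0.5$.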
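Assuming the KL divergence to the mean weights $w_t$ and $w_s$ by $\pi_t = P(w_t)/(P(w_t)+P(w_s))$ and $\pi_s = P(w_s)/(P(w_t)+P(w_s))$, and compares each word against the $\pi$-weighted mean class distribution, setting the two weights equal recovers the Jensen-Shannon divergence. The symbols $\pi_t$, $\pi_s$ and $M$ are introduced here only for brevity:

$$
\begin{aligned}
\pi_t &= \frac{P(w_t)}{P(w_t)+P(w_s)} = \tfrac{1}{2}, \qquad \pi_s = \frac{P(w_s)}{P(w_t)+P(w_s)} = \tfrac{1}{2},\\
M &= P(C \mid w_t \vee w_s) = \tfrac{1}{2}\,P(C \mid w_t) + \tfrac{1}{2}\,P(C \mid w_s),\\
\pi_t\, D\big(P(C \mid w_t) \,\|\, M\big) + \pi_s\, D\big(P(C \mid w_s) \,\|\, M\big)
  &= \tfrac{1}{2}\, D\big(P(C \mid w_t) \,\|\, M\big) + \tfrac{1}{2}\, D\big(P(C \mid w_s) \,\|\, M\big)\\
  &= \mathrm{JS}\big(P(C \mid w_t),\, P(C \mid w_s)\big).
\end{aligned}
$$

Because $M$ is nonzero wherever either class distribution is, both terms are finite, and with natural logarithms the value lies between 0 and $\ln 2$.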
8
Clustering Algorithm
Characteristics:
- Greedy, aggressive
- Locally optimal
- Hard clustering
- Agglomerative
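These characteristics can be illustrated with a simplified Python sketch: start with one cluster per word, repeatedly merge the pair with the smallest KL divergence to the mean (greedy, hence only locally optimal), give the merged cluster the prior-weighted mean class distribution, and stop at `k` clusters (agglomerative; every word ends up in exactly one cluster, i.e. hard). This is an assumption-laden illustration, not necessarily the paper's exact procedure; for example, it does not model how the original algorithm orders or admits words. All names and data are hypothetical.

```python
import math


def kl(p, q):
    """D(p || q) in nats; 0 * log 0 is taken as 0."""
    d = 0.0
    for pj, qj in zip(p, q):
        if pj > 0:
            if qj == 0:
                return math.inf
            d += pj * math.log(pj / qj)
    return d


def merge_cost(a, b):
    """KL divergence to the mean between clusters a and b, each a (prior, dist) pair."""
    (wa, pa), (wb, pb) = a, b
    ta, tb = wa / (wa + wb), wb / (wa + wb)
    mean = [ta * x + tb * y for x, y in zip(pa, pb)]
    return ta * kl(pa, mean) + tb * kl(pb, mean)


def merge(a, b):
    """Merged cluster: priors add; the class distribution is the prior-weighted mean."""
    (wa, pa), (wb, pb) = a, b
    w = wa + wb
    return w, [(wa * x + wb * y) / w for x, y in zip(pa, pb)]


def agglomerate(words, k):
    """Greedy hard agglomerative clustering down to k >= 1 clusters.

    words: dict mapping word -> (P(w), P(C | w)).
    Returns a list of (members, prior, class_distribution) triples.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [([w], prior, dist) for w, (prior, dist) in words.items()]
    while len(clusters) > k:
        # Greedy step: find the pair whose merge costs least right now.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = merge_cost(clusters[i][1:], clusters[j][1:])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        prior, dist = merge(clusters[i][1:], clusters[j][1:])
        members = clusters[i][0] + clusters[j][0]
        del clusters[j]  # j > i, so index i is unaffected
        clusters[i] = (members, prior, dist)  # hard: each word sits in one cluster
    return clusters


# Hypothetical priors and class distributions over [hockey, baseball, autos].
WORDS = {
    "puck": (0.02, [0.90, 0.05, 0.05]),
    "goalie": (0.01, [0.85, 0.10, 0.05]),
    "engine": (0.02, [0.05, 0.05, 0.90]),
    "car": (0.03, [0.02, 0.08, 0.90]),
}

for members, prior, dist in agglomerate(WORDS, 2):
    print(sorted(members), round(prior, 3))
```

The pair search is quadratic in the number of clusters at every merge, so this naive version only suits small vocabularies.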
9
Experiments

Datasets
- 20 Newsgroups
- Reuters-21578
- Yahoo Science Hierarchy
Compared with
- Supervised Latent Semantic Indexing
- Class-based clustering
- Feature selection by mutual information with the class variable
- Feature selection by the Markov-blanket method
Classifier: naive Bayes classifier (NBC)
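The slide names naive Bayes (NBC) as the classifier. One plausible way to plug word clusters into it, sketched below in Python under stated assumptions, is to replace every word by its cluster id and train a multinomial naive Bayes model over cluster counts with add-one smoothing. The clustering `W2C`, the toy documents and all function names are hypothetical, and the paper's exact setup may differ.

```python
import math
from collections import Counter, defaultdict


def to_clusters(tokens, word2cluster):
    """Replace each token by its cluster id; unclustered tokens are dropped (an assumption)."""
    return [word2cluster[t] for t in tokens if t in word2cluster]


def train_nb(docs, labels, n_clusters):
    """Multinomial naive Bayes over cluster ids, with add-one (Laplace) smoothing."""
    n_docs = Counter(labels)
    counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        counts[y].update(doc)
    model = {}
    for y, nd in n_docs.items():
        total = sum(counts[y].values())
        log_prior = math.log(nd / len(labels))
        log_lik = [math.log((counts[y][c] + 1) / (total + n_clusters))
                   for c in range(n_clusters)]
        model[y] = (log_prior, log_lik)
    return model


def predict(model, doc):
    """Return the class with the highest log posterior (up to a constant)."""
    return max(model, key=lambda y: model[y][0] + sum(model[y][1][c] for c in doc))


# Hypothetical clustering and toy training data.
W2C = {"puck": 0, "goalie": 0, "ice": 0, "engine": 1, "car": 1, "brake": 1}
train = [["puck", "goalie"], ["ice", "puck"], ["engine", "car"], ["brake", "car"]]
labels = ["hockey", "hockey", "autos", "autos"]

model = train_nb([to_clusters(d, W2C) for d in train], labels, n_clusters=2)
print(predict(model, to_clusters(["goalie", "ice", "car"], W2C)))
```

Each class's likelihood table has one entry per cluster rather than one per vocabulary word, which is how clustering would shrink the classification model.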

10
Results
11
Conclusion

Useful semantic word clusterings
Higher classification accuracy
Smaller classification models
Word clustering vs. feature selection?
What if the data is noisy? Sparse?
