Distributional clustering of English words

About This Presentation

Slides: 13

Provided by: www1CsCol

Learn more at: http://www1.cs.columbia.edu

Transcript and Presenter's Notes

1
Distributional clustering of English words

2
Introduction

3
Introduction

Simple tabulation of frequencies
Data sparseness
Hindle proposed smoothing based on clustering
Estimating the likelihood of unseen events from the frequencies of similar events that have been seen
Example: estimating the likelihood of a particular direct object for a verb from the likelihood of that direct object for similar verbs
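
The example on this slide can be sketched in a few lines of Python. This is only an illustration of the general idea (borrowing probability mass from similar verbs), not Hindle's actual procedure. The toy counts, the list of "similar" verbs and their weights are all made-up assumptions.

```python
from collections import Counter

# Hypothetical (verb, direct object) counts, for illustration only.
counts = Counter({
    ("drink", "water"): 4, ("drink", "tea"): 4,
    ("sip", "tea"): 3, ("sip", "coffee"): 1,
    ("gulp", "water"): 2, ("gulp", "coffee"): 2,
})

def mle(obj, verb):
    """Relative-frequency estimate of p(obj | verb); 0.0 for unseen pairs."""
    total = sum(c for (v, _), c in counts.items() if v == verb)
    return counts[(verb, obj)] / total if total else 0.0

def smoothed(obj, verb, similar):
    """Estimate p(obj | verb) as a weighted average over similar verbs.

    `similar` maps each similar verb to a similarity weight; how those
    weights are obtained is left open here (assumed to be given).
    """
    norm = sum(similar.values())
    return sum(w * mle(obj, v) for v, w in similar.items()) / norm

# "drink coffee" never occurs in the toy data, so its raw estimate is zero,
# but verbs assumed to be similar to "drink" have been seen with "coffee".
print(mle("coffee", "drink"))
print(smoothed("coffee", "drink", {"sip": 1.0, "gulp": 1.0}))
```

The raw estimate for the unseen pair is zero, while the smoothed estimate is the average of the estimates for "sip" and "gulp".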

4
Introduction

Hindle's proposal
Words are similar if there is strong statistical evidence that they tend to participate in the same events
This paper
Factor word association tendencies into associations of words to certain hidden classes and associations between the classes themselves
Derive classes directly from data
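
One possible reading of "factoring" word associations through hidden classes is sketched below. It is an assumption for illustration, not necessarily the exact model of the paper: the joint probability of a noun and a verb is written as a sum over class pairs of p(noun | noun class) * p(noun class, verb class) * p(verb | verb class). All class names and probabilities are hypothetical, and here they are fixed by hand rather than derived from data.

```python
# Hypothetical word-to-class associations: p(word | class).
noun_given_class = {"N1": {"water": 0.5, "tea": 0.5}, "N2": {"book": 1.0}}
verb_given_class = {"V1": {"drink": 0.5, "sip": 0.5}, "V2": {"read": 1.0}}

# Hypothetical association between the classes themselves: p(noun class, verb class).
class_joint = {
    ("N1", "V1"): 0.5, ("N1", "V2"): 0.0,
    ("N2", "V1"): 0.0, ("N2", "V2"): 0.5,
}

def joint(noun, verb):
    """p(noun, verb) factored through the hidden classes."""
    return sum(
        p_cc
        * noun_given_class[cn].get(noun, 0.0)
        * verb_given_class[cv].get(verb, 0.0)
        for (cn, cv), p_cc in class_joint.items()
    )

print(joint("water", "sip"))   # words from associated classes
print(joint("book", "drink"))  # words from unassociated classes
```

Under this factoring, a word pair gets nonzero probability whenever the classes of its two words are associated, even if that particular pair was never observed.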

5
Introduction

6
Problem

7
Methodology

Measure of similarity between distributions
Kullback-Leibler distance
This problem
Unsupervised learning: learn the underlying distribution of the data
Objects have no internal structure; the only information is statistics about joint appearance (a kind of supervised learning)
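
For reference, the Kullback-Leibler distance between two discrete distributions p and q is D(p || q) = sum over x of p(x) * log(p(x) / q(x)). A minimal sketch follows. The dictionary representation and the example distributions are assumptions made for illustration.

```python
import math

def kl_divergence(p, q):
    """D(p || q) in nats for distributions given as {event: probability}."""
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue  # terms with p(x) = 0 contribute nothing
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf  # p puts mass where q has none
        total += px * math.log(px / qx)
    return total

# Hypothetical distributions over two events.
p = {"a": 0.5, "b": 0.5}
q = {"a": 0.25, "b": 0.75}

print(kl_divergence(p, p))  # identical distributions
print(kl_divergence(p, q))
print(kl_divergence(q, p))  # differs from the line above: not symmetric
```

Note that the measure is not symmetric and becomes infinite when q assigns zero probability to an event that p can produce, so it is a "distance" only in a loose sense.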

8
Distributional Clustering