Measures of Distributional Similarity


1
Measures of Distributional Similarity
Lillian Lee
Department of Computer Science, Cornell University

Presenter: Cosmin Adrian Bejan

2
Overview

Goal: improve probability estimation for unseen cooccurrences.
Contributions of this paper:
- an empirical comparison of a broad range of similarity measures;
- a classification of similarity functions based on the information that they incorporate;
- a new function for evaluating proxy distributions.

3
Introduction

How can we estimate the conditional cooccurrence probability P(v|n) of an unseen word pair (n, v) drawn from some finite set N × V?
Standard approaches:
- Katz back-off
- Jelinek-Mercer interpolation
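An alternative approach: distance-weighted averaging

P_{SIM}(v|n) = \frac{\sum_{m \in S(n)} sim(n,m) \, P(v|m)}{\sum_{m \in S(n)} sim(n,m)}

where S(n) is a set of candidate similar words and sim(n, m) is a function measuring the similarity between n and m.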
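Distance-weighted averaging estimates P(v|n) for an unseen pair by averaging P(v|m) over the candidate words m in S(n), weighting each by sim(n, m) and normalising by the total weight. The sketch below assumes the conditional distributions are stored as nested dictionaries (cond[m][v] = P(v|m)) and that sim returns a non-negative weight; the function name, the data layout, and the choice to return 0.0 when all weights are zero are illustrative assumptions, not part of the original method.

```python
from typing import Callable, Dict, Iterable

# Assumed layout (hypothetical): cond[m][v] = P(v|m); verbs absent from cond[m] have probability 0.
CondDist = Dict[str, Dict[str, float]]


def distance_weighted_estimate(
    n: str,
    v: str,
    cond: CondDist,
    candidates: Iterable[str],
    sim: Callable[[str, str], float],
) -> float:
    """Estimate P(v|n) as the sim-weighted average of P(v|m) over m in S(n)."""
    numerator = 0.0
    total_weight = 0.0
    for m in candidates:
        w = sim(n, m)  # assumed non-negative
        numerator += w * cond[m].get(v, 0.0)
        total_weight += w
    # Assumption: fall back to 0.0 when S(n) is empty or every weight is zero.
    return numerator / total_weight if total_weight > 0 else 0.0


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    cond = {
        "beer": {"drink": 0.6, "buy": 0.4},
        "wine": {"drink": 0.5, "buy": 0.3, "pour": 0.2},
    }
    weights = {("juice", "beer"): 1.0, ("juice", "wine"): 3.0}
    estimate = distance_weighted_estimate(
        "juice", "pour", cond, ["beer", "wine"], lambda a, b: weights[(a, b)]
    )
    print(round(estimate, 4))  # (1.0 * 0.0 + 3.0 * 0.2) / 4.0 = 0.15
```

Because S(n) and sim are left abstract, any of the similarity functions discussed later can be plugged in, provided distance-like scores are first turned into non-negative weights.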
4
Distributional Similarity Functions
Notation:
- N: set of nouns; V: set of transitive verbs
- (n, v): cooccurrence pair, where n is the head of the direct object of v
- n, m: two nouns whose distributional similarity is to be determined
- q(v) = P(v|n), r(v) = P(v|m)
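The notation q(v) = P(v|n) and r(v) = P(v|m) presupposes conditional distributions estimated from the (n, v) cooccurrence pairs. A minimal sketch, assuming plain maximum-likelihood (relative-frequency) estimates, which the slide does not specify; the function name conditional_distributions is hypothetical:

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple


def conditional_distributions(
    pairs: Iterable[Tuple[str, str]],
) -> Dict[str, Dict[str, float]]:
    """Relative-frequency estimates of P(v|n) from (noun, verb) cooccurrence tokens."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for n, v in pairs:
        counts[n][v] += 1
    dists: Dict[str, Dict[str, float]] = {}
    for n, verb_counts in counts.items():
        total = sum(verb_counts.values())
        dists[n] = {v: c / total for v, c in verb_counts.items()}
    return dists


if __name__ == "__main__":
    # Toy (noun, verb) tokens, invented for illustration.
    pairs = [("beer", "drink"), ("beer", "drink"), ("beer", "buy"), ("wine", "drink")]
    dists = conditional_distributions(pairs)
    q, r = dists["beer"], dists["wine"]  # q(v) = P(v|beer), r(v) = P(v|wine)
    print(q)
    print(r)  # {'drink': 1.0}
```

Under these estimates every unseen (n, v) pair gets probability 0, which is exactly the sparse-data problem that the similarity-based methods are meant to address.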
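Euclidean distance:
euc(q, r) = \sqrt{\sum_{v \in V} (q(v) - r(v))^2}   (1)
L1 norm:
L_1(q, r) = \sum_{v \in V} |q(v) - r(v)|   (2)
Cosine:
cos(q, r) = \frac{\sum_{v \in V} q(v) r(v)}{\sqrt{\sum_{v \in V} q(v)^2} \sqrt{\sum_{v \in V} r(v)^2}}   (3)
Jaccard's coefficient:
Jac(q, r) = \frac{|\{v : q(v) > 0 \text{ and } r(v) > 0\}|}{|\{v : q(v) > 0 \text{ or } r(v) > 0\}|}   (4)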
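The four measures on this slide can be computed directly on sparse distributions. The sketch below assumes each distribution is a dictionary mapping verbs to probabilities, with missing verbs treated as probability 0, and uses the set-based form of Jaccard's coefficient over verbs with non-zero probability. Euclidean distance and the L1 norm are distances (smaller means more similar), whereas cosine and Jaccard's coefficient are similarities (larger means more similar).

```python
import math
from typing import Dict, Set

Dist = Dict[str, float]  # verb -> probability; missing verbs have probability 0


def _support(q: Dist, r: Dist) -> Set[str]:
    """All verbs that appear in either distribution."""
    return set(q) | set(r)


def euclidean(q: Dist, r: Dist) -> float:
    return math.sqrt(sum((q.get(v, 0.0) - r.get(v, 0.0)) ** 2 for v in _support(q, r)))


def l1(q: Dist, r: Dist) -> float:
    return sum(abs(q.get(v, 0.0) - r.get(v, 0.0)) for v in _support(q, r))


def cosine(q: Dist, r: Dist) -> float:
    dot = sum(p * r.get(v, 0.0) for v, p in q.items())
    norm_q = math.sqrt(sum(p * p for p in q.values()))
    norm_r = math.sqrt(sum(p * p for p in r.values()))
    if norm_q == 0.0 or norm_r == 0.0:
        return 0.0  # assumption: an all-zero vector has no direction
    return dot / (norm_q * norm_r)


def jaccard(q: Dist, r: Dist) -> float:
    sq = {v for v, p in q.items() if p > 0}
    sr = {v for v, p in r.items() if p > 0}
    union = sq | sr
    return len(sq & sr) / len(union) if union else 0.0


if __name__ == "__main__":
    q = {"drink": 0.5, "buy": 0.5}
    r = {"drink": 0.5, "pour": 0.5}
    print(euclidean(q, r))  # sqrt(0.5), about 0.7071
    print(l1(q, r))  # 1.0
    print(cosine(q, r))  # about 0.5
    print(jaccard(q, r))  # 0.3333333333333333
```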
5
Distributional Similarity Functions
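Jensen-Shannon divergence:
JS(q, r) = \frac{1}{2} \left[ D(q \| avg_{q,r}) + D(r \| avg_{q,r}) \right], \quad avg_{q,r}(v) = \frac{q(v) + r(v)}{2}   (5)
where D is the Kullback-Leibler divergence:
D(q \| r) = \sum_{v \in V} q(v) \log \frac{q(v)}{r(v)}
Confusion probability:
conf(q, r, P(m)) = P(m) \sum_{v \in V} \frac{q(v) r(v)}{P(v)}   (6)
Kendall's τ:
\tau(q, r) = \frac{\sum_{v_1, v_2} \mathrm{sign}[(q(v_1) - q(v_2))(r(v_1) - r(v_2))]}{2 \binom{|V|}{2}}   (7)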
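A sketch of the slide's remaining measures under the same dictionary representation. Assumptions: natural logarithms, so the Jensen-Shannon divergence lies between 0 and log 2; the confusion probability takes P(m) and the verb marginals P(v) as extra inputs, with P(v) > 0 for every verb it touches; and Kendall's τ sums over unordered verb pairs and divides by C(|V|, 2), which gives the same value as summing over ordered pairs and dividing by twice that count, because the sign term is symmetric in v1 and v2. math.comb requires Python 3.8 or later.

```python
import math
from itertools import combinations
from typing import Dict, Sequence

Dist = Dict[str, float]  # verb -> probability; missing verbs have probability 0


def kl_divergence(q: Dist, r: Dist) -> float:
    """D(q || r) = sum_v q(v) log(q(v) / r(v)); infinite if r(v) = 0 where q(v) > 0."""
    total = 0.0
    for v, p in q.items():
        if p <= 0.0:
            continue  # terms with q(v) = 0 contribute nothing
        rv = r.get(v, 0.0)
        if rv == 0.0:
            return math.inf
        total += p * math.log(p / rv)
    return total


def jensen_shannon(q: Dist, r: Dist) -> float:
    """Average KL divergence of q and r to their midpoint (always finite)."""
    support = set(q) | set(r)
    avg = {v: 0.5 * (q.get(v, 0.0) + r.get(v, 0.0)) for v in support}
    return 0.5 * (kl_divergence(q, avg) + kl_divergence(r, avg))


def confusion_probability(q: Dist, r: Dist, p_m: float, p_v: Dist) -> float:
    """conf(q, r, P(m)) = P(m) * sum_v q(v) r(v) / P(v)."""
    return p_m * sum(p * r[v] / p_v[v] for v, p in q.items() if v in r)


def kendall_tau(q: Dist, r: Dist, verbs: Sequence[str]) -> float:
    """(concordant - discordant verb pairs) / number of unordered pairs."""
    total = 0
    for v1, v2 in combinations(verbs, 2):
        prod = (q.get(v1, 0.0) - q.get(v2, 0.0)) * (r.get(v1, 0.0) - r.get(v2, 0.0))
        total += (prod > 0) - (prod < 0)  # sign of the product; ties add 0
    return total / math.comb(len(verbs), 2)


if __name__ == "__main__":
    print(kl_divergence({"drink": 1.0}, {"buy": 1.0}))  # inf
    print(jensen_shannon({"drink": 1.0}, {"buy": 1.0}))  # log 2, about 0.6931
```

The example shows why the Jensen-Shannon divergence is preferred to raw KL divergence here: two distributions with disjoint support give an infinite KL divergence but a finite JS value.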
6
The Evaluation Method

Evaluation of similarity functions on a binary decision task
Data: verb-object cooccurrence pairs involving the 1000 most frequent nouns
Training/test split: 80% / 20%
Test set:
- discard the pairs occurring in the training data;
- split the remaining pairs into five partitions;
- replace each (n, v1) with an (n, v1, v2) triple such that P(v1) ≈ P(v2).
The task: reconstruct which of (n, v1) and (n, v2) was the original cooccurrence.
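One way to build the (n, v1, v2) test triples is sketched below. The slide only requires P(v1) ≈ P(v2); here that is approximated, as an assumption, by choosing the verb whose training frequency is closest to that of v1, excluding v1 itself and any verb already observed with n, so that (n, v2) is unseen. The function name, the nearest-frequency rule, and the alphabetical tie-breaking are all hypothetical choices.

```python
from typing import Dict, List, Set, Tuple


def make_triples(
    test_pairs: List[Tuple[str, str]],
    verb_freq: Dict[str, int],
    seen: Set[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Pair each test (n, v1) with a confounder v2 of closest frequency (hypothetical rule)."""
    triples = []
    for n, v1 in test_pairs:
        # Candidate confounders: any other verb never observed with n.
        candidates = [v for v in verb_freq if v != v1 and (n, v) not in seen]
        if not candidates:
            continue  # no usable confounder; drop this pair
        # Closest frequency to v1; ties broken alphabetically for determinism.
        v2 = min(candidates, key=lambda v: (abs(verb_freq[v] - verb_freq[v1]), v))
        triples.append((n, v1, v2))
    return triples


if __name__ == "__main__":
    freq = {"drink": 50, "buy": 48, "pour": 10, "eat": 52}  # toy counts
    print(make_triples([("juice", "drink")], freq, {("juice", "drink")}))
```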
Performance on the test set is measured by the error rate:

error = \frac{1}{T} \left( \#\text{incorrect choices} + \frac{\#\text{ties}}{2} \right)

where T is the number of test triple tokens in the set.
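A minimal sketch of the error-rate computation over T test triples. It assumes each triple yields a correct choice, an incorrect choice, or a tie, and that a tie counts as half an error; the string labels and the function name are hypothetical.

```python
from typing import Sequence

OUTCOMES = {"correct", "incorrect", "tie"}


def error_rate(outcomes: Sequence[str]) -> float:
    """Return (#incorrect + #ties / 2) / T for a sequence of per-triple outcomes."""
    if not outcomes:
        raise ValueError("need at least one test triple")
    unknown = set(outcomes) - OUTCOMES
    if unknown:
        raise ValueError(f"unexpected outcome labels: {sorted(unknown)}")
    incorrect = sum(1 for o in outcomes if o == "incorrect")
    ties = sum(1 for o in outcomes if o == "tie")
    return (incorrect + ties / 2) / len(outcomes)


if __name__ == "__main__":
    print(error_rate(["correct", "incorrect", "tie", "correct"]))  # (1 + 0.5) / 4 = 0.375
```

Counting a tie as half an error matches the expected error of breaking the tie by a fair coin flip.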
7
The Evaluation Method

Incorporate the similarity function into a decision rule as follows:
- (n, v1, v2): a test instance, where (n, v1) is the original cooccurrence
- S_{f,k}(n): the k most similar words to n according to f
- evidence E_{f,k}(n, v1) for v1: the number of neighbors m ∈ S_{f,k}(n) such that P(v1|m) > P(v2|m)
- decision rule: choose the verb alternative with the greatest evidence
For two functions f and g, if E_{f,k}(n, v1) > E_{g,k}(n, v1), then the k most similar words according to f are on the whole better predictors than the k most similar words according to g; hence f induces an inherently better similarity ranking for distance-weighted averaging.
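The evidence-based decision rule can be sketched as follows. Assumptions: conditional probabilities are stored as cond[m][v] = P(v|m); the neighbour set S_{f,k}(n) is obtained by ranking candidate nouns with a scoring function f, which must be told whether larger scores mean closer (as for cosine or Jaccard's coefficient) or smaller scores do (as for distances such as L1 or Jensen-Shannon); and equal evidence is reported as a tie (None). All function names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence

CondDist = Dict[str, Dict[str, float]]  # cond[m][v] = P(v|m)


def top_k_neighbors(
    n: str,
    nouns: Sequence[str],
    f: Callable[[str, str], float],
    k: int,
    larger_is_closer: bool = True,
) -> List[str]:
    """S_{f,k}(n): the k nouns (other than n) ranked most similar to n by f."""
    others = [m for m in nouns if m != n]
    others.sort(key=lambda m: f(n, m), reverse=larger_is_closer)
    return others[:k]


def evidence(neighbors: Sequence[str], v1: str, v2: str, cond: CondDist) -> int:
    """E_{f,k}(n, v1): number of neighbours m with P(v1|m) > P(v2|m)."""
    return sum(1 for m in neighbors if cond[m].get(v1, 0.0) > cond[m].get(v2, 0.0))


def decide(neighbors: Sequence[str], v1: str, v2: str, cond: CondDist) -> Optional[str]:
    """Choose the verb alternative with the greater evidence; None signals a tie."""
    e1 = evidence(neighbors, v1, v2, cond)
    e2 = evidence(neighbors, v2, v1, cond)
    if e1 > e2:
        return v1
    if e2 > e1:
        return v2
    return None


if __name__ == "__main__":
    # Toy data and similarity scores to "juice", invented for illustration.
    cond = {
        "beer": {"drink": 0.6, "buy": 0.4},
        "wine": {"drink": 0.5, "buy": 0.5},
        "bread": {"buy": 0.7, "eat": 0.3},
    }
    scores = {"beer": 0.9, "wine": 0.8, "bread": 0.1}
    nouns = ["juice", "beer", "wine", "bread"]
    neighbors = top_k_neighbors("juice", nouns, lambda n, m: scores[m], k=2)
    print(neighbors)  # ['beer', 'wine']
    print(decide(neighbors, "drink", "buy", cond))  # drink
```

Running this rule over every test triple and feeding the outcomes into the error rate gives one number per (f, k) combination, which is what allows different similarity functions to be compared.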