Measures of Distributional Similarity - PowerPoint PPT Presentation


Transcript and Presenter's Notes


1
Measures of Distributional Similarity
Lillian Lee, Department of Computer Science,
Cornell University
  • Presenter: Cosmin Adrian Bejan

2
Overview
  • Goal: improve probability estimation for unseen
    cooccurrences.
  • Contributions of this paper:
      • an empirical comparison of a broad range of
        measures
      • a classification of similarity functions based
        on the information that they incorporate
      • a new function for evaluating proxy
        distributions

3
Introduction
  • How can we estimate the conditional cooccurrence
    probability P(v|n) of an unseen word pair (n,v)
    drawn from some finite set N × V?
  • Standard approaches:
      • Katz back-off method
      • Jelinek-Mercer interpolation method
  • An alternative approach:
      • distance-weighted averaging

    \hat{P}(v|n) = \frac{\sum_{m \in S(n)} sim(n,m) \, P(v|m)}{\sum_{m \in S(n)} sim(n,m)}

where S(n) is a set of candidate similar words
and sim(n,m) is a function of similarity between
n and m.
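
A minimal Python sketch of distance-weighted averaging, assuming the estimate is the similarity-weighted average of P(v|m) over the neighbors m in S(n), normalized by the total similarity. The function name, the dictionary layout, and the toy nouns and numbers are illustrative assumptions, not data from the paper.

```python
def distance_weighted_estimate(v, n, neighbors, sim, cond_prob):
    """Estimate P(v|n) from the neighbors m in S(n), weighting P(v|m) by sim(n, m).

    cond_prob[m][v] holds P(v|m); verbs missing from cond_prob[m] count as 0.
    """
    total_weight = sum(sim(n, m) for m in neighbors)
    if total_weight == 0:
        return 0.0  # no usable neighbors: fall back to zero (an assumption)
    weighted = sum(sim(n, m) * cond_prob[m].get(v, 0.0) for m in neighbors)
    return weighted / total_weight


# Hypothetical toy data: "ale" never occurs with "drink" in training.
cond_prob = {
    "beer": {"drink": 0.6, "brew": 0.4},
    "wine": {"drink": 0.8, "pour": 0.2},
}
toy_sim = {("ale", "beer"): 0.9, ("ale", "wine"): 0.3}

estimate = distance_weighted_estimate(
    "drink", "ale", ["beer", "wine"], lambda n, m: toy_sim[(n, m)], cond_prob
)
```

Here the more similar neighbor "beer" pulls the estimate toward its own P(drink|beer).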
4
Distributional Similarity Functions
Notation:
  • N: set of nouns
  • V: set of transitive verbs
  • (n,v): cooccurrence pair where n is the head of
    the direct object of v
  • n, m: two nouns whose distributional similarity
    is to be determined
  • q(v) = P(v|n), r(v) = P(v|m)
Similarity functions:
  • (1) Euclidean distance
  • (2) L1 norm
  • (3) cosine
  • (4) Jaccard's coefficient
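
The four functions named on this slide can be sketched in Python as follows. The definitions used are the standard textbook ones applied to the verb distributions of two nouns, with Jaccard's coefficient computed over the sets of verbs that have nonzero probability; that reading, the dictionary representation, and the toy distributions are assumptions.

```python
import math


def euclidean(q, r):
    """Square root of the summed squared differences between q(v) and r(v)."""
    verbs = set(q) | set(r)
    return math.sqrt(sum((q.get(v, 0.0) - r.get(v, 0.0)) ** 2 for v in verbs))


def l1_norm(q, r):
    """Sum of absolute differences between q(v) and r(v)."""
    verbs = set(q) | set(r)
    return sum(abs(q.get(v, 0.0) - r.get(v, 0.0)) for v in verbs)


def cosine(q, r):
    """Dot product of q and r divided by the product of their lengths."""
    dot = sum(p * r.get(v, 0.0) for v, p in q.items())
    norm_q = math.sqrt(sum(p * p for p in q.values()))
    norm_r = math.sqrt(sum(p * p for p in r.values()))
    return dot / (norm_q * norm_r) if norm_q and norm_r else 0.0


def jaccard(q, r):
    """Shared verbs divided by all verbs seen with either noun."""
    support_q = {v for v, p in q.items() if p > 0}
    support_r = {v for v, p in r.items() if p > 0}
    union = support_q | support_r
    return len(support_q & support_r) / len(union) if union else 0.0


# Hypothetical verb distributions for two nouns.
q = {"drink": 0.5, "brew": 0.5}
r = {"drink": 0.5, "pour": 0.5}
```

Note that the first two are distances (smaller means more similar), while cosine and Jaccard's coefficient are similarities (larger means more similar).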
5
Distributional Similarity Functions
  • (5) Jensen-Shannon divergence, defined in terms of
    the Kullback-Leibler divergence
  • (6) confusion probability
  • (7) Kendall's τ
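
A hedged Python sketch of the functions on this slide. It assumes the usual definitions: the Kullback-Leibler divergence D(q‖r) = Σ q(v) log(q(v)/r(v)); the Jensen-Shannon divergence as the mean of the KL divergences of q and r to their average; the confusion probability as Σ q(v) r(v) P(m)/P(v); and Kendall's τ as the mean sign of (q(v1) − q(v2))(r(v1) − r(v2)) over verb pairs. The argument names `p_m` and `p_verb` and the toy numbers are illustrative.

```python
import math
from itertools import combinations


def kl_divergence(q, r):
    """D(q || r); infinite when r(v) = 0 for some verb with q(v) > 0."""
    total = 0.0
    for v, qv in q.items():
        if qv > 0:
            rv = r.get(v, 0.0)
            if rv == 0:
                return math.inf
            total += qv * math.log(qv / rv)
    return total


def jensen_shannon(q, r):
    """Mean KL divergence of q and r to their average distribution."""
    verbs = set(q) | set(r)
    avg = {v: 0.5 * (q.get(v, 0.0) + r.get(v, 0.0)) for v in verbs}
    return 0.5 * (kl_divergence(q, avg) + kl_divergence(r, avg))


def confusion(q, r, p_m, p_verb):
    """Sum over shared verbs of q(v) * r(v) * P(m) / P(v)."""
    return sum(q[v] * r[v] * p_m / p_verb[v] for v in q if v in r)


def kendall_tau(q, r, verbs):
    """Mean sign agreement of q and r over unordered verb pairs."""
    pairs = list(combinations(verbs, 2))
    if not pairs:
        return 0.0
    total = 0
    for v1, v2 in pairs:
        prod = (q.get(v1, 0.0) - q.get(v2, 0.0)) * (r.get(v1, 0.0) - r.get(v2, 0.0))
        total += (prod > 0) - (prod < 0)  # sign of the product
    return total / len(pairs)


# Hypothetical verb distributions for two nouns.
q = {"drink": 0.5, "brew": 0.5}
r = {"drink": 0.5, "pour": 0.5}
```

With these toy distributions, D(q‖r) is infinite because r assigns zero probability to "brew", while the Jensen-Shannon divergence stays finite; this is one reason an averaged or smoothed variant is convenient for sparse cooccurrence data.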
6
The Evaluation Method
  • Evaluation of similarity functions on a binary
    decision task
  • Data: verb-object cooccurrence pairs involving
    the 1000 most frequent nouns
  • Training/testing split: 80% / 20%
  • Testing set:
      • discard the pairs occurring in the training data
      • split the remaining pairs into five partitions
      • replace each (n,v1) with a (n,v1,v2) triple such
        that P(v1) ≈ P(v2)
  • The task: reconstruct which of (n,v1) and (n,v2)
    was the original cooccurrence.
  • Test-set performance is measured by the error rate:

    error rate = (1/T) (number of incorrect choices + (number of ties)/2)

where T is the number of test triple tokens in
the set
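
As a small illustration, the error rate can be computed as below, assuming each test triple is scored as correct, incorrect, or a tie, and that a tie counts as half an error. The outcome labels and the function name are hypothetical.

```python
def error_rate(outcomes):
    """Fraction of wrong decisions over T test triples, counting each tie as 1/2."""
    total = len(outcomes)
    if total == 0:
        raise ValueError("need at least one test triple")
    incorrect = outcomes.count("incorrect")
    ties = outcomes.count("tie")
    return (incorrect + ties / 2) / total


# Four hypothetical test triples: one wrong choice and one tie.
rate = error_rate(["correct", "incorrect", "tie", "correct"])
```

Counting ties as half an error matches the expected score of guessing at random between the two verbs.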
7
The Evaluation Method
  • Incorporate the similarity function into a decision
    rule as follows:
      • (n,v1,v2): a test instance
      • S_{f,k}(n): the k most similar words to n
        according to f
      • evidence E_{f,k}(n,v1) for v1: the number of
        neighbors m ∈ S_{f,k}(n) such that
        P(v1|m) > P(v2|m)
      • the decision rule: choose the verb alternative
        with the greatest evidence
  • For two functions f and g, if
    E_{f,k}(n,v1) > E_{g,k}(n,v1), then the k most
    similar words according to f are on the whole
    better predictors than the k most similar words
    according to g; hence f induces an inherently
    better similarity ranking for distance-weighted
    averaging.
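
The decision rule above can be sketched in Python as follows. The sketch assumes that `sim` returns larger values for more similar nouns (a distance function would need to be negated first) and that a tie is reported as `None`; the helper names and toy data are illustrative.

```python
def k_most_similar(n, candidates, sim, k):
    """Return the k candidates most similar to n according to sim."""
    ranked = sorted(
        (m for m in candidates if m != n),
        key=lambda m: sim(n, m),
        reverse=True,
    )
    return ranked[:k]


def evidence(neighbors, v_a, v_b, cond_prob):
    """Count the neighbors m for which P(v_a|m) > P(v_b|m)."""
    return sum(
        1
        for m in neighbors
        if cond_prob[m].get(v_a, 0.0) > cond_prob[m].get(v_b, 0.0)
    )


def decide(n, v1, v2, candidates, sim, k, cond_prob):
    """Choose the verb with the greater evidence; None signals a tie."""
    neighbors = k_most_similar(n, candidates, sim, k)
    e1 = evidence(neighbors, v1, v2, cond_prob)
    e2 = evidence(neighbors, v2, v1, cond_prob)
    if e1 > e2:
        return v1
    if e2 > e1:
        return v2
    return None


# Hypothetical training estimates P(v|m) and similarities to the noun "ale".
cond_prob = {
    "beer": {"drink": 0.6, "brew": 0.4},
    "wine": {"drink": 0.8, "pour": 0.2},
    "rock": {"throw": 0.9, "brew": 0.1},
}
toy_sim = {"beer": 0.9, "wine": 0.3, "rock": 0.1}

choice = decide(
    "ale", "drink", "brew", list(cond_prob), lambda n, m: toy_sim[m], 2, cond_prob
)
```

With k = 2 the neighbors are "beer" and "wine", both of which prefer "drink" over "brew", so "drink" is chosen.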

8
Similarity Metric Performance
9
(No Transcript)
10
The Skew Divergence
  • Remark: it is desirable to have a similarity
    function that focuses on the verbs that cooccur
    with both of the nouns being compared.

α-skew divergence: s_α(q,r) = D(r ‖ α·q + (1 − α)·r), where D is the Kullback-Leibler divergence
  • the skew divergence is asymmetric
  • s_α depends only on the verbs in V_qr.
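
A minimal Python sketch of the α-skew divergence, assuming the definition s_α(q,r) = D(r ‖ α·q + (1 − α)·r) with D the Kullback-Leibler divergence and 0 ≤ α ≤ 1. The dictionary representation and the toy distributions are illustrative assumptions.

```python
import math


def skew_divergence(q, r, alpha):
    """KL divergence of r from the mixture alpha*q + (1 - alpha)*r."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    total = 0.0
    for v, rv in r.items():
        if rv > 0:
            mix = alpha * q.get(v, 0.0) + (1.0 - alpha) * rv
            if mix == 0:
                return math.inf  # only possible when alpha == 1 and q(v) == 0
            total += rv * math.log(rv / mix)
    return total


# Hypothetical verb distributions for two nouns.
q = {"drink": 1.0}
r = {"drink": 0.5, "pour": 0.5}

s_qr = skew_divergence(q, r, 0.5)
s_rq = skew_divergence(r, q, 0.5)
```

Swapping the arguments gives a different value on this toy pair, which illustrates the asymmetry noted above. For α < 1 the mixture is nonzero wherever r is, so the result stays finite even when q assigns zero probability to a verb.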

11
Performance of the Skew Divergence