Frequency Estimates for Statistical Word Similarity Measures

Learn more at: http://www1.cs.columbia.edu

1
Frequency Estimates for Statistical Word
Similarity Measures
Egidio Terra and C.L.A. Clarke
School of Computer Science, University of Waterloo

Presenter: Cosmin Adrian Bejan

2
Introduction

A comparative study of two methods for estimating
the word co-occurrence frequencies required by word
similarity measures, which are used to solve
human-oriented language tests.
Examples of such tests:
determine the best synonym in a set of
alternatives A = {A1, A2, A3, A4} for a specific
target word TW in a context C = {w1, w2, ..., wn} \ {TW}
determine the best synonym when no context is
available
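
The no-context test can be read as an argmax over the alternatives. The sketch below is only an illustration of that reading: the function name best_synonym, the toy words, and the toy scores are all hypothetical and do not come from the presentation; any similarity measure could be plugged in.

```python
from typing import Callable, Sequence


def best_synonym(target: str,
                 alternatives: Sequence[str],
                 similarity: Callable[[str, str], float]) -> str:
    """Return the alternative that scores highest against the target word."""
    return max(alternatives, key=lambda alt: similarity(target, alt))


# Hypothetical scores, invented only to exercise the function.
toy_scores = {
    ("big", "large"): 2.1,
    ("big", "red"): 0.3,
    ("big", "fast"): 0.5,
    ("big", "cold"): -0.2,
}


def toy_similarity(w1: str, w2: str) -> float:
    """Look up a made-up score; unknown pairs score 0.0."""
    return toy_scores.get((w1, w2), 0.0)


choice = best_synonym("big", ["red", "large", "fast", "cold"], toy_similarity)
print(choice)
```

With a context available, the same selection could be run with a similarity function that also takes the context words into account.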

3
Measuring Word Similarity

the notion of co-occurrence of two words can be
depicted by a contingency table
each dimension represents a random discrete
variable Wi with range {wi, ¬wi}
each cell represents a joint frequency,
where Nmax is the maximum number of
co-occurrences.
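
A minimal sketch of how the four cells of such a 2x2 table could be filled from raw counts. It assumes a total of n observation units (windows or documents) standing in for the maximum number of co-occurrences; the function name and the example counts are hypothetical.

```python
def contingency_table(f1: int, f2: int, f12: int, n: int) -> dict:
    """Build a 2x2 table for two words from marginal and joint counts.

    f1, f2 : number of units containing w1, w2
    f12    : number of units containing both words
    n      : total number of units (assumed upper bound on co-occurrences)
    """
    return {
        ("w1", "w2"): f12,                      # both words present
        ("w1", "not w2"): f1 - f12,             # only w1 present
        ("not w1", "w2"): f2 - f12,             # only w2 present
        ("not w1", "not w2"): n - f1 - f2 + f12,  # neither present
    }


# Hypothetical counts, chosen only for illustration.
table = contingency_table(f1=30, f2=20, f12=10, n=1000)
for cell, count in table.items():
    print(cell, count)
```

The four cells always sum to n, which is a quick sanity check on the input counts.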

4
Similarity between two words
Pointwise Mutual Information
χ² test
Likelihood ratio
Average Mutual Information
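
The formulas on this slide did not survive in the transcript. As an illustration of the first measure only, here is the standard textbook definition of pointwise mutual information, PMI(w1, w2) = log2( P(w1, w2) / (P(w1) P(w2)) ); whether the presentation used exactly this form (for example, this log base) is an assumption. The probabilities below are made up.

```python
import math


def pmi(p_w1: float, p_w2: float, p_joint: float) -> float:
    """Pointwise mutual information in bits.

    Returns -inf when the two words never co-occur (p_joint == 0).
    """
    if p_joint <= 0.0:
        return float("-inf")
    return math.log2(p_joint / (p_w1 * p_w2))


# Hypothetical probabilities, for illustration only.
independent = pmi(0.1, 0.2, 0.02)  # joint equals the product of the marginals
associated = pmi(0.1, 0.2, 0.08)   # joint is four times the product
print(independent, associated)
```

A score near zero indicates the words co-occur about as often as chance predicts; larger positive scores indicate stronger association.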
5
Context supported similarity
Cosine of Pointwise Mutual Information
L1 norm
Contextual Average Mutual Information
Contextual Jensen-Shannon Divergence
Pointwise Mutual Information of Multiple Words
6
Window-oriented approach

fw_i: frequency of wi
fw_1,w_2: co-occurrence frequency of w1 and w2
N: size of the corpus in words
P(wi) = fw_i / N
fw_1,w_2 is estimated by the number of windows
where the two words co-occur.
Nwt: number of windows of size t
P(w1, w2) = fw_1,w_2 / Nwt
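
A rough sketch of the window-oriented estimate. It assumes a window of size t slides over the token sequence one position at a time, so a corpus of N tokens yields N - t + 1 windows; the presentation does not say how windows are formed, so this is one possible reading. The function name and the toy sentence are hypothetical.

```python
from typing import List, Tuple


def window_cooccurrence(tokens: List[str], w1: str, w2: str, t: int) -> Tuple[int, int]:
    """Return (windows containing both words, total number of windows)."""
    n_windows = max(len(tokens) - t + 1, 0)
    both = 0
    for start in range(n_windows):
        window = tokens[start:start + t]
        if w1 in window and w2 in window:
            both += 1
    return both, n_windows


# Toy corpus, for illustration only.
tokens = "the cat sat on the mat near the cat".split()
both, n_windows = window_cooccurrence(tokens, "cat", "sat", t=3)

p_cat = tokens.count("cat") / len(tokens)  # P(wi) = fw_i / N
p_joint = both / n_windows                 # P(w1, w2) = fw_1,w_2 / Nwt
print(both, n_windows, p_cat, p_joint)
```

Because overlapping windows can count the same pair of occurrences more than once, the joint estimate depends on the window size t.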

7
Document-oriented approach

dfw_i: document frequency of a word wi, i.e. the
number of documents in which the word
appears.
D: the number of documents
P(wi) = dfw_i / D
dfw_1,w_2: co-occurrence frequency of two words,
i.e. the number of documents where the words
co-occur.
P(w1, w2) = dfw_1,w_2 / D
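
The document-oriented estimates can be sketched the same way, with each document reduced to the set of words it contains. The function name and the four toy documents are hypothetical and serve only to show the counting.

```python
from typing import List, Tuple


def document_estimates(documents: List[List[str]], w1: str, w2: str) -> Tuple[float, float, float]:
    """Return (P(w1), P(w2), P(w1, w2)) estimated from document frequencies."""
    if not documents:
        raise ValueError("at least one document is required")
    vocab_per_doc = [set(doc) for doc in documents]
    d = len(vocab_per_doc)
    df1 = sum(w1 in words for words in vocab_per_doc)
    df2 = sum(w2 in words for words in vocab_per_doc)
    df12 = sum(w1 in words and w2 in words for words in vocab_per_doc)
    return df1 / d, df2 / d, df12 / d


# Toy collection, for illustration only.
docs = [
    "the cat sat".split(),
    "the dog sat".split(),
    "a cat and a dog".split(),
    "birds sing".split(),
]
p_cat, p_dog, p_both = document_estimates(docs, "cat", "dog")
print(p_cat, p_dog, p_both)
```

Unlike the window-oriented count, repeated occurrences of a word inside one document contribute only once here.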

8
Results for TOEFL test set
9
Results for TS1 and context
