Frequency Estimates for Statistical Word Similarity Measures

1
Frequency Estimates for Statistical Word Similarity Measures
Egidio Terra and C.L.A. Clarke, School of Computer Science, University of Waterloo
  • Presenter: Cosmin Adrian Bejan

2
Introduction
  • A comparative study of two methods for estimating
    the word co-occurrence frequencies required by word
    similarity measures, which are used to solve
    human-oriented language tests.
  • Examples of such tests:
      • determine the best synonym from a set of
        alternatives A = {A1, A2, A3, A4} for a specific
        target word TW in a context
        C = {w1, w2, ..., wn} \ {TW}
      • determine the best synonym when no context is
        available

3
Measuring Word Similarity
  • the notion of co-occurrence of two words can be
    depicted by a contingency table
  • each dimension represents a discrete random
    variable Wi with range A = {wi, ¬wi}
  • each cell represents the joint frequency, where
    Nmax is the maximum number of co-occurrences

4
Similarity between two words
  • Pointwise Mutual Information
  • χ² test
  • Likelihood ratio
  • Average Mutual Information
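The equations for these measures did not survive in this transcript. As a rough illustration only, the sketch below computes two of them, pointwise mutual information and Pearson's χ² statistic, from the counts of a 2×2 contingency table. It uses the standard textbook definitions, which are assumed (not confirmed) to match the slides; the function names and the argument layout (`f12`, `f1`, `f2`, `n`) are hypothetical.

```python
import math

def pmi(f12, f1, f2, n):
    """Pointwise mutual information: log2 of P(w1, w2) / (P(w1) * P(w2)).

    f12 is the joint count, f1 and f2 the marginal counts, n the total.
    Assumes all counts are positive.
    """
    return math.log2((f12 / n) / ((f1 / n) * (f2 / n)))

def chi_square(f12, f1, f2, n):
    """Pearson chi-square over the 2x2 table (w1 / not w1) x (w2 / not w2)."""
    observed = [
        [f12, f1 - f12],                # w1 present: with w2, without w2
        [f2 - f12, n - f1 - f2 + f12],  # w1 absent:  with w2, without w2
    ]
    row_totals = [f1, n - f1]
    col_totals = [f2, n - f2]
    total = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of the two words.
            expected = row_totals[i] * col_totals[j] / n
            total += (observed[i][j] - expected) ** 2 / expected
    return total
```

When the joint count equals what independence would predict, both values come out at (approximately) zero; larger values indicate stronger association.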

5
Context supported similarity
  • Cosine of Pointwise Mutual Information
  • L1 norm
  • Contextual Average Mutual Information
  • Contextual Jensen-Shannon Divergence
  • Pointwise Mutual Information of Multiple Words

6
Window-oriented approach
  • f_{w_i}: frequency of w_i
  • f_{w_1,w_2}: co-occurrence frequency of w_1 and w_2
  • N: size of the corpus in words
  • P(w_i) = f_{w_i} / N
  • f_{w_1,w_2} is estimated by the number of windows
    in which the two words co-occur
  • N_{wt}: number of windows of size t
  • P(w_1, w_2) = f_{w_1,w_2} / N_{wt}
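The window-oriented estimates above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slide does not say how windows are laid out, so the code assumes windows of t consecutive tokens that slide one token at a time, and the function name `window_estimates` is hypothetical.

```python
def window_estimates(tokens, w1, w2, t):
    """Return (P(w1), P(w2), P(w1, w2)) for a non-empty token list.

    Assumption: windows are t consecutive tokens, advanced one token
    at a time, so a corpus of N tokens has N - t + 1 windows.
    """
    n = len(tokens)
    p1 = tokens.count(w1) / n   # P(w_i) = f_{w_i} / N
    p2 = tokens.count(w2) / n
    n_windows = max(n - t + 1, 1)
    f12 = 0
    for start in range(n_windows):
        window = tokens[start:start + t]
        # Count each window at most once, however often the words repeat in it.
        if w1 in window and w2 in window:
            f12 += 1
    return p1, p2, f12 / n_windows  # P(w1, w2) = f_{w1,w2} / N_{wt}

# Usage: a toy five-token corpus with windows of size 2.
p1, p2, p12 = window_estimates("a b c a d".split(), "a", "b", 2)
```

Because overlapping windows share tokens, the joint estimate depends on the window size t; a different windowing scheme (for example, disjoint windows) would give different counts.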

7
Document-oriented approach
  • df_{w_i}: document frequency of a word w_i, i.e.
    the number of documents in which the word
    appears
  • D: the number of documents
  • P(w_i) = df_{w_i} / D
  • df_{w_1,w_2}: co-occurrence frequency of two words,
    i.e. the number of documents in which both words
    occur
  • P(w_1, w_2) = df_{w_1,w_2} / D
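The document-oriented estimates above can be sketched in the same way. This is an illustrative sketch under the assumption that each document is given as a list of tokens; the function name `document_estimates` is hypothetical.

```python
def document_estimates(documents, w1, w2):
    """Return (P(w1), P(w2), P(w1, w2)) from document frequencies.

    documents is a non-empty list of token lists (an assumed input format).
    """
    d = len(documents)
    # A set per document: only presence matters, not how often a word occurs.
    doc_sets = [set(doc) for doc in documents]
    df1 = sum(1 for s in doc_sets if w1 in s)
    df2 = sum(1 for s in doc_sets if w2 in s)
    df12 = sum(1 for s in doc_sets if w1 in s and w2 in s)
    return df1 / d, df2 / d, df12 / d

# Usage: four toy documents.
docs = [["a", "b"], ["a"], ["b", "c"], ["c"]]
p1, p2, p12 = document_estimates(docs, "a", "b")
```

Unlike the window-oriented approach, this estimate ignores how far apart the two words are within a document.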

8
Results for TOEFL test set

9
Results for TS1 and context