Asymmetric Word Similarity

Transcript and Presenter's Notes

1
Asymmetric Word Similarity
  • Behrad Assadian
  • Trevor Martin
  • Ben Azvine

2
Introduction
  • An approach to understanding text documents
  • Capture semantics of textual information
  • Matrix of Word Similarity
  • Applicable to a particular domain
  • Use a corpus of textual documents
  • Resolves issues encountered by traditional
    methods
  • Can be used to measure document similarity and
    to cluster documents

3
  • The meaning of an unknown word can be guessed
    from its context
    (P. Pantel, D. Lin)

A bottle of Tezguno is on the table. Everyone
likes Tezguno. Tezguno makes you drunk. We make
Tezguno out of corn.
Using the Distributional Hypothesis, it can be
deduced that Tezguno is a type of alcoholic drink.

4
Asymmetric Word Similarity Matrix
  • Based on identifying frequencies of n-grams of
    context words
  • e.g. c1-x-c2 is represented as x(c1,c2)
  • Consider:

The quick brown fox jumps over the lazy dog.
The quick brown cat jumps onto the active dog.
The slow brown fox jumps onto the quick brown cat.
The quick brown cat leaps over the quick brown fox.

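The context counting described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name context_counts, the lowercasing, and the choice not to let a context span a sentence boundary are all assumptions.

```python
from collections import Counter, defaultdict

# The four example sentences from the slide.
SENTENCES = [
    "The quick brown fox jumps over the lazy dog.",
    "The quick brown cat jumps onto the active dog.",
    "The slow brown fox jumps onto the quick brown cat.",
    "The quick brown cat leaps over the quick brown fox.",
]

def context_counts(sentences):
    """Map each word x to a Counter of (c1, c2) contexts seen as c1-x-c2.

    Hypothetical helper: lowercases words and never lets a context
    cross a sentence boundary (both are assumptions).
    """
    contexts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().rstrip(".").split()
        # Slide a window of three words: left context, word, right context.
        for c1, x, c2 in zip(words, words[1:], words[2:]):
            contexts[x][(c1, c2)] += 1
    return contexts

counts = context_counts(SENTENCES)
print(counts["brown"])
```

For brown this gives three contexts: (quick, cat) seen 3 times, (quick, fox) twice and (slow, fox) once.
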
5
  • Convert frequencies to fuzzy sets
  • Fuzzy set represents context of a word
  • e.g. for brown:
  • (quick,cat): 1, (quick,fox): 0.833,
    (slow,fox): 0.50
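
One conversion that reproduces the membership values above is sketched here. In the four example sentences brown occurs in the contexts (quick,cat), (quick,fox) and (slow,fox) 3, 2 and 1 times. The formula used (rank times probability, plus the sum of all smaller probabilities) is an assumption chosen because it matches the slide's numbers; the slides do not spell out the conversion, and the function name frequencies_to_fuzzy is hypothetical.

```python
def frequencies_to_fuzzy(freqs):
    """Turn context frequencies into fuzzy memberships.

    Assumed rule: with probabilities sorted p1 >= p2 >= ... >= pn,
    membership of the i-th element is i * p_i + sum(p_j for j > i).
    """
    total = sum(freqs.values())
    ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    probs = [n / total for _, n in ranked]
    fuzzy = {}
    for i, (element, _) in enumerate(ranked):
        fuzzy[element] = (i + 1) * probs[i] + sum(probs[i + 1:])
    return fuzzy

# Context counts for "brown" in the four example sentences.
brown_counts = {("quick", "cat"): 3, ("quick", "fox"): 2, ("slow", "fox"): 1}
brown_fuzzy = frequencies_to_fuzzy(brown_counts)
for context, membership in brown_fuzzy.items():
    print(context, round(membership, 3))
```

With this rule the most frequent context always has membership 1, and contexts with equal frequency get equal membership.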

6
  • Mass assignment followed by Semantic Unification
    is carried out
  • Result is given as a single-valued probability
  • For two words w1 and w2:
  • Pr(w1 | w2)
  • degree to which w1 could replace w2
  • Performing every possible semantic unification
    gives the word similarity matrix
  • Many elements will be zero
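
A hedged sketch of these two steps follows. It assumes one common form of point-valued semantic unification: each fuzzy set is turned into nested focal sets with masses equal to the differences between successive memberships, and the result sums m1(A) * m2(B) * |A ∩ B| / |B| over all pairs of focal sets. The slides do not give the exact formula, so this form, the function names, and the toy context labels "x" and "y" are assumptions.

```python
def mass_assignment(fuzzy):
    """Nested focal sets: the top-i elements get mass mu_i - mu_(i+1).

    Assumes the fuzzy set is normalised (largest membership is 1),
    so that the masses sum to 1.
    """
    ranked = sorted(fuzzy.items(), key=lambda kv: kv[1], reverse=True)
    masses = []
    for i, (_, mu) in enumerate(ranked):
        next_mu = ranked[i + 1][1] if i + 1 < len(ranked) else 0.0
        if mu > next_mu:
            focal = frozenset(e for e, _ in ranked[: i + 1])
            masses.append((focal, mu - next_mu))
    return masses

def semantic_unification(fuzzy_w1, fuzzy_w2):
    """Sketch of the degree to which w1 could replace w2."""
    total = 0.0
    for a, m_a in mass_assignment(fuzzy_w1):
        for b, m_b in mass_assignment(fuzzy_w2):
            total += m_a * m_b * len(a & b) / len(b)
    return total

# Toy context sets (hypothetical): w1 has two contexts, w2 only one.
w1 = {"x": 1.0, "y": 0.5}
w2 = {"x": 1.0}
print(semantic_unification(w1, w2))
print(semantic_unification(w2, w1))
```

The two calls return different values, which is the asymmetry the matrix is named for. Two words that share no context give 0, which is why many matrix elements are zero.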

7
Document Clustering
  • Can cluster documents using the AWS matrix
  • Other known methods: Vector Space Model
  • Limitation: string matching
  • Related words such as taxi and cab could be
    treated as unrelated
  • Document similarity matrix
  • Distance between two documents can be
    identified
  • Cluster files around a starting file
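
To make the idea concrete, here is one possible way to turn word similarities into a document similarity and to gather files around a starting file. The slides do not say how document similarity is computed, so everything here is an assumption: the scoring rule (average, over the words of the first document, of the best similarity to any word of the second), the threshold test, the function names, and the similarity values for taxi and cab are all hypothetical.

```python
def word_sim(w1, w2, aws):
    """Look up a word similarity; identical words score 1, unknown pairs 0."""
    if w1 == w2:
        return 1.0
    return aws.get((w1, w2), 0.0)

def doc_similarity(doc_a, doc_b, aws):
    """Average best-match similarity of doc_a's words against doc_b (assumed rule)."""
    words_a, words_b = set(doc_a.split()), set(doc_b.split())
    if not words_a or not words_b:
        return 0.0
    best = [max(word_sim(a, b, aws) for b in words_b) for a in words_a]
    return sum(best) / len(best)

def cluster_around(seed, docs, aws, threshold):
    """Names of documents whose similarity to the seed meets the threshold."""
    return [
        name
        for name, text in docs.items()
        if name != seed and doc_similarity(docs[seed], text, aws) >= threshold
    ]

# Hypothetical similarity values and documents, for illustration only.
aws = {("taxi", "cab"): 0.9, ("cab", "taxi"): 0.8}
docs = {"a": "take a taxi", "b": "take a cab", "c": "bake bread"}
print(doc_similarity(docs["a"], docs["b"], aws))
print(cluster_around("a", docs, aws, threshold=0.5))
```

With pure string matching, documents a and b would share only two of three words; the taxi/cab entry raises their score, which is the limitation of the Vector Space Model that the slide points to.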

8
Results
  • Film Description
  • Reviews of movies
  • Tested using WordNet inspection
  • Identified Synonyms/antonyms
  • Close Hypernyms identified
  • Exhaustive search
  • Total antonyms/synonyms/hypernyms
    that exist but were not identified
  • Hit rates of 67%, 28% and 30%

9
Clustering results
  • Movie corpus reviews
  • Possible to compare clustered results
  • Can set threshold value

10
  • Proposed a method for clustering documents
  • using Asymmetric Word Similarity
  • Results using WordNet prove encouraging
  • Using context to determine semantics can be
    effective
  • Must carry out further comparison with other
    common methods
  • Performance issues for large corpora must be
    addressed