USING WORDNET TO RETRIEVE WORDS FROM THEIR MEANINGS

1
USING WORDNET TO RETRIEVE WORDS FROM THEIR
MEANINGS
  • Ilknur Durgar El-Kahlout and Kemal Oflazer
  • Sabanci University
  • Istanbul, Turkey

2
Problem
  • For a given definition, find the appropriate word
    (or words)
  • A traditional dictionary is of no use
  • From a dictionary, find an appropriate word that
    has a similar definition

3
Examples
  • User definition
  • Akımı ölçmek için kullanılan alet
  • (A device that is used to measure the current)
  • In the dictionary
  • akımölçer: elektrik akımının şiddetini ölçmeye
    yarayan araç, ampermetre
  • (ammeter: a device that measures the intensity of
    electrical current, amperemeter)

4
Applications
  • Computer-assisted language learning
  • Solving crossword puzzles
  • Reverse dictionary

5
Outline
  • Problem statement
  • Meaning-to-Word System (MTW)
  • Our Approach
  • Methods
  • Results
  • Result Summary
  • Conclusion

6
Problem Statement
  • Find the similarity between two definitions
  • Akımı ölçmek için kullanılan alet
  • (A device that is used to measure the current)
  • Elektrik akımının şiddetini ölçmeye yarayan araç,
    ampermetre
  • (a device that measures the intensity of
    electrical current, amperemeter)

7
Meaning-to-Word (MTW)
  • MTW addresses the problem of finding the appropriate
    word (or words) whose meaning matches a
    given definition
  • Two subproblems
  • finding words whose definitions are "similar" to
    the query in some sense
  • ranking the candidate words using a variety of
    methods

8
Information Flow in MTW
User Definition → [query] → Search in Dictionary → [candidates] → Rank Candidates → List of words

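The flow above can be read as a composition of three stages. Below is a minimal Python sketch, assuming each stage is a plain function. The names `run_mtw`, `toy_normalize`, `toy_search`, `toy_rank` and the small English dictionary are hypothetical placeholders, and the word-overlap search and ranking are illustrative rather than the MTW implementation:

```python
from typing import Callable

# Hypothetical type aliases for the three stages (not taken from MTW).
Dictionary = dict[str, str]  # headword -> definition
Normalizer = Callable[[str], list[str]]
Searcher = Callable[[list[str], Dictionary], dict[str, str]]
Ranker = Callable[[list[str], dict[str, str]], list[str]]


def run_mtw(user_definition: str, dictionary: Dictionary,
            normalize: Normalizer, search: Searcher, rank: Ranker) -> list[str]:
    query = normalize(user_definition)      # User Definition -> query
    candidates = search(query, dictionary)  # query -> candidates
    return rank(query, candidates)          # candidates -> list of words


# Toy stage implementations, used only to exercise the pipeline.
def toy_normalize(text: str) -> list[str]:
    return text.lower().split()


def toy_search(query: list[str], dictionary: Dictionary) -> dict[str, str]:
    # Keep every entry whose definition shares at least one word with the query.
    return {w: d for w, d in dictionary.items() if set(query) & set(d.split())}


def toy_rank(query: list[str], candidates: dict[str, str]) -> list[str]:
    # Entries sharing more words with the query come first.
    return sorted(candidates,
                  key=lambda w: -len(set(query) & set(candidates[w].split())))


TOY_DICT = {
    "ammeter": "a device that measures electrical current",
    "thermometer": "a device that measures temperature",
    "rose": "a flower",
}
print(run_mtw("device used to measure the current", TOY_DICT,
              toy_normalize, toy_search, toy_rank))
```

Note that "measure" and "measures" do not match in this toy version. Closing that gap is the job of the normalization (stemming) stage added to the flow later in the talk.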
9
Available Resources
  • Turkish Monolingual Dictionary
  • About 50,000 entries
  • Turkish WordNet
  • About 11,000 synsets

10
Normalization
User Definition → Normalization → [query] → Search in Dictionary → [candidates] → Rank Candidates → List of words

11
Normalization
  • Tokenization
  • Stemming
  • Stop Word Elimination
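A minimal sketch of the three normalization steps, in the order listed above. It assumes a hypothetical toy stem lexicon with longest-prefix matching in place of a real Turkish morphological analyzer, and an illustrative stop-word list; neither is the resource actually used in MTW:

```python
import re

# Illustrative stop-word list (hypothetical, not MTW's list).
STOP_WORDS = {"daha", "hiç", "ve", "bir"}

# Toy stem lexicon standing in for a Turkish morphological analyzer (hypothetical).
STEM_LEXICON = {"önce", "evlen", "kişi", "ak", "akı", "akım", "ölç",
                "iç", "için", "kul", "kullan", "alet"}


def tokenize(text: str) -> list[str]:
    # str.lower() is not Turkish-aware ("I" becomes "i", not "ı"),
    # so a real system would need locale-specific case folding.
    return re.findall(r"\w+", text.lower())


def stem(word: str) -> str:
    # Longest lexicon entry that is a prefix of the word; else the word itself.
    matches = [s for s in STEM_LEXICON if word.startswith(s)]
    return max(matches, key=len) if matches else word


def normalize(text: str) -> list[str]:
    stems = [stem(t) for t in tokenize(text)]
    return [s for s in stems if s not in STOP_WORDS]


print(normalize("daha önce hiç evlenmemiş kişi"))
```

On the query from the Query Processing example this yields önce, evlen, kişi, the three words the talk keeps after removing "daha" and "hiç".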

12
Query Processing
User Definition → [query] → Query Processing → Search in Dictionary → [candidates] → Rank Candidates → List of words

13
Query Processing
  • Subset Generation
  • Search with different sets of words
  • Select informative words from the user's query
  • Query: daha önce hiç evlenmemiş kişi (a person
    who has never been married)
  • önce, evlen, kişi (before, marry, person)
  • evlen, kişi (marry, person); önce, kişi (before,
    person); önce, evlen (before, marry)
  • evlen (marry); önce (before); kişi (person)
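Subset generation as shown above enumerates every non-empty subset of the normalized query words. A minimal sketch using `itertools.combinations`; the function name `generate_subsets` is hypothetical:

```python
from itertools import combinations


def generate_subsets(words: list[str]) -> list[tuple[str, ...]]:
    """Return every non-empty subset of the query words, largest first."""
    subsets: list[tuple[str, ...]] = []
    for size in range(len(words), 0, -1):
        subsets.extend(combinations(words, size))
    return subsets


for subset in generate_subsets(["önce", "evlen", "kişi"]):
    print(subset)
```

A query of n words yields 2^n - 1 subsets, so this is cheap for short user queries but grows quickly for long ones; a practical system would presumably need to limit it.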

14
Query Processing
  • Subset Sorting
  • An unordered list of subsets is insufficient
  • Rank the generated subsets
  • 1) By the number of words
  • önce, evlen, kişi (before, marry, person)
  • evlen, kişi (marry, person)
  • 2) By the sum of the logarithms of word frequencies
  • evlen, kişi (marry, person)
  • önce, kişi (before, person)
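The two sorting criteria can be combined into one sort key. A minimal sketch, assuming that within subsets of equal size a smaller sum of log frequencies (rarer, presumably more informative words) ranks higher; that direction, the function name `sort_subsets`, and the frequency counts are hypothetical:

```python
import math
from itertools import combinations

# Hypothetical corpus counts; a real system would estimate these from a corpus.
FREQ = {"önce": 900, "kişi": 300, "evlen": 20}


def sort_subsets(words: list[str], freq: dict[str, int]) -> list[tuple[str, ...]]:
    subsets = [c for k in range(1, len(words) + 1) for c in combinations(words, k)]

    def key(subset: tuple[str, ...]) -> tuple[int, float]:
        # Criterion 1: larger subsets first.
        # Criterion 2: smaller sum of log frequencies first (assumed direction).
        return (-len(subset), sum(math.log(freq[w]) for w in subset))

    return sorted(subsets, key=key)


for s in sort_subsets(["önce", "evlen", "kişi"], FREQ):
    print(s)
```

With these made-up counts, (evlen, kişi) comes before (önce, kişi), the same relative order as on the slide.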

15
Searching for Meanings
User Definition → [query] → Search in Dictionary → [candidates] → Rank Candidates → List of words

16
Searching for Meanings
  • Two methods
  • Stem Matching
  • Query Expansion (using WordNet)

17
Stem Matching
  • Morphological normalization of words
  • Find dictionary definitions that contain morphological
    variants of the words in the user's definition
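A minimal sketch of stem matching that contrasts the "all stems included" and "longest stem included" settings reported in the results. Assumptions: a hypothetical toy prefix lexicon replaces the real morphological analyzer, and the "beyaz" entry is a made-up definition used only to show how the spurious stem ak (white) produces a false match. The akımölçer definition is the one from the example:

```python
import re

# Toy stem lexicon (hypothetical) standing in for a morphological analyzer.
STEM_LEXICON = {"ak", "akı", "akım", "ölç", "iç", "için", "kul", "kullan",
                "alet", "elektrik", "şiddet", "araç", "ampermetre", "renk"}


def candidate_stems(word: str) -> set[str]:
    # Every lexicon entry that is a prefix of the word counts as a possible
    # stem, so "akımı" yields {"ak", "akı", "akım"}.
    return {s for s in STEM_LEXICON if word.startswith(s)} or {word}


def stems_of(text: str, longest_only: bool = False) -> set[str]:
    stems: set[str] = set()
    for word in re.findall(r"\w+", text.lower()):
        cands = candidate_stems(word)
        # "longest stem" keeps one analysis per word; "all stems" keeps every one.
        stems |= {max(cands, key=len)} if longest_only else cands
    return stems


def stem_match(query: str, dictionary: dict[str, str],
               longest_only: bool = False) -> dict[str, set[str]]:
    q = stems_of(query, longest_only)
    hits = {}
    for headword, definition in dictionary.items():
        shared = q & stems_of(definition, longest_only)
        if shared:
            hits[headword] = shared
    return hits


DICTIONARY = {
    "akımölçer": "elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre",
    "beyaz": "ak renkte olan",  # hypothetical entry for illustration only
}
QUERY = "akımı ölçmek için kullanılan alet"
print(stem_match(QUERY, DICTIONARY))
print(stem_match(QUERY, DICTIONARY, longest_only=True))
```

With all stems, "beyaz" is retrieved through ak (white), one of the noisy stems discussed under Drawbacks; keeping only the longest stem drops it. Neither setting connects alet in the query with araç in the definition.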

18
Stem Matching (Ex.)
(A device that is used to measure the current)
akımı ölçmek için kullanılan alet
  • Candidate stems: ak (white), ölç (measure), için (to),
    kullan (use), alet (device), akım (current),
    iç (drink), kul (slave), akı (flux)
  • Colored stems are the matching ones

19
Stem Matching
  • (A device that is used to measure the current)
  • akımı ölçmek için kullanılan alet
  • elektrik akımının şiddetini ölçmeye yarayan araç,
    ampermetre
  • (a device that measures the intensity of
    electrical current, amperemeter)

21
Stem Matching
  • Drawbacks
  • Generates noisy stems
  • ilim (science, my city) → ilim (science), il
    (city)
  • Conflates two words with very different meanings
    into the same stem
  • ilim (science, my city), ilde (in the city) →
    il (city)
  • Cannot find relations between words with similar meanings
  • kimse (someone), kişi (person)
  • bölüm (part), kısım (portion)

22
Using Query Expansion
  • Two different approaches
  • Expand the query with related words (synonyms,
    specializations, generalizations)
  • Expand the query with the relevant answers
    retrieved by the unexpanded query
  • MTW uses WordNet synonyms
  • besin, gıda (food,
    nourishment)
  • iyileş, düzel (to get better)
    / iyileş, geliş (to improve)
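A minimal sketch of synonym-based query expansion. It assumes a hypothetical in-memory list of synsets built from the synonym pairs on these slides (a stand-in for Turkish WordNet) and stem sets written out by hand for the ammeter example. It expands only with synonyms, as MTW does, and only one step deep:

```python
# Toy synsets modeled on the slide examples (hypothetical stand-in for WordNet).
SYNSETS = [
    {"ak", "beyaz"},
    {"kullan", "faydalan", "yararlan"},
    {"alet", "araç", "gereç"},
    {"kul", "köle"},
    {"besin", "gıda"},
]


def expand(stems: set[str], synsets: list[set[str]]) -> set[str]:
    """Add the synonyms of each query stem (one step, not transitive)."""
    expanded = set(stems)
    for stem in stems:
        for synset in synsets:
            if stem in synset:
                expanded |= synset
    return expanded


query = {"akım", "ölç", "için", "kullan", "alet"}
definition = {"elektrik", "akım", "şiddet", "ölç", "yarayan", "araç", "ampermetre"}
print(sorted(query & definition))
print(sorted(expand(query, SYNSETS) & definition))
```

Without expansion only akım and ölç match; expanding alet with its synonyms adds araç, which is the extra match the Query Expansion example is about.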

23
Query Expansion (Ex.)
(A device that is used to measure the current)
akımı ölçmek için kullanılan alet
  • Candidate stems: ak (white), ölç (measure), için (to),
    kullan (use), alet (device), akım (current),
    iç (drink), kul (slave), akı (flux)
  • WordNet synonyms: ak → beyaz; kullan → faydalan,
    yararlan; alet → araç, gereç; akım → debi, akış;
    kul → köle

24
Query Expansion (Ex.)
  • (A device that is used to measure the current)
  • akımı ölçmek için kullanılan alet
  • elektrik akımının şiddetini ölçmeye yarayan araç,
    ampermetre
  • (a device that measures the intensity of
    electrical current, amperemeter)

26
Ranking
User Definition → [query] → Search in Dictionary → [candidates] → Rank Candidates → List of words

27
Ranking
  • Very important part of MTW
  • Having the right answer in the retrieved set is
    not enough
  • The aim is to have the right answer near the top of
    the retrieved set (e.g., within the top 50 answers)

28
Ranking
  • Simple but effective methods
  • Number of matched words
  • Subset informativeness - frequency of words in
    the subset
  • Ratio of number of matched words to the number of
    words in the candidate dictionary definition
  • Longest Common Subsequence - order of the matched
    words
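The slide lists the ranking criteria but not how they are combined. A minimal sketch, assuming (hypothetically) that they are compared lexicographically: first the number of matched words, then the ratio of matched words to definition length, then the longest common subsequence (LCS) length as a measure of word order. Subset informativeness is left out. The entries `cand_a`, `cand_b`, `cand_c` are hypothetical stem lists; the akımölçer list holds the stems of the example definition:

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def score(query: list[str], definition: list[str]) -> tuple[int, float, int]:
    matched = len(set(query) & set(definition))  # number of matched words
    ratio = matched / len(definition)            # matched / definition length
    order = lcs_length(query, definition)        # order of the matched words
    return matched, ratio, order


def rank(query: list[str], candidates: dict[str, list[str]]) -> list[str]:
    # Assumption: criteria compared lexicographically, best first.
    return sorted(candidates, key=lambda w: score(query, candidates[w]), reverse=True)


QUERY = ["akım", "ölç", "için", "kullan", "alet"]
CANDIDATES = {
    "akımölçer": ["elektrik", "akım", "şiddet", "ölç", "yarayan", "araç", "ampermetre"],
    "cand_a": ["ölç", "akım", "alet"],  # hypothetical
    "cand_b": ["akım", "ölç"],          # hypothetical
    "cand_c": ["ölç", "akım"],          # hypothetical
}
print(rank(QUERY, CANDIDATES))
```

Here cand_b and cand_c tie on the first two criteria, and the LCS puts cand_b first because its matched words appear in the same order as in the query.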

29
Some Statistics
  • Training sets
  • 50 queries from users
  • 50 queries from a dictionary
  • Test sets
  • 50 queries from users
  • 50 queries from a separate dictionary

                            Test set 1 (user)  Training set 1  Test set 2 (dict.)  Training set 2
Number of queries           50                 50              50                  50
Avg. number of query words  5.66               4.64            9.24                13.98
Max. number of query words  17                 12              23                  45
Min. number of query words  2                  1               1                   6

30
Stem Matching
all stems included
Rank        Test set 1   Training set 1   Test set 2   Training set 2
1-10        13 (26%)     18 (36%)         45 (90%)     41 (82%)
11-50       7 (14%)      12 (24%)         2 (4%)       5 (10%)
>50         19 (38%)     10 (20%)         3 (6%)       4 (8%)
Not found   11 (22%)     10 (20%)         0 (0%)       0 (0%)
Low top-10 rates for user queries, but very high
rates for dictionary queries

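The result tables count, for each query set, where the correct word appears in the ranked list. A small sketch of that bookkeeping, assuming the rank of the correct answer is known for each query (None when it is not retrieved). The function name `bucket_ranks` is hypothetical:

```python
from typing import Optional


def bucket_ranks(ranks: list[Optional[int]]) -> dict[str, tuple[int, float]]:
    """Count correct-answer ranks in the tables' bins; None means not found."""
    bins = {"1-10": 0, "11-50": 0, ">50": 0, "Not found": 0}
    for r in ranks:
        if r is None:
            bins["Not found"] += 1
        elif r <= 10:
            bins["1-10"] += 1
        elif r <= 50:
            bins["11-50"] += 1
        else:
            bins[">50"] += 1
    n = len(ranks)
    # Each bin as (count, percentage of all queries).
    return {label: (count, 100 * count / n) for label, count in bins.items()}


print(bucket_ranks([1, 3, 12, 60, None]))
```

With 50 queries per set, each query is worth 2%, which is why every percentage in these tables is twice its count.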
31
Stem Matching
longest stem included (heuristics)
Rank        Test set 1   Training set 1   Test set 2   Training set 2
1-10        14 (28%)     21 (42%)         46 (92%)     43 (86%)
11-50       5 (10%)      9 (18%)          1 (2%)       5 (10%)
>50         18 (36%)     9 (18%)          3 (6%)       2 (4%)
Not found   13 (26%)     11 (22%)         0 (0%)       0 (0%)
Improvement on user queries and slightly better
performance on dictionary queries

32
Query Expansion (WordNet)
all stems included
Rank        Test set 1   Training set 1   Test set 2   Training set 2
1-10        14 (28%)     24 (48%)         45 (90%)     41 (82%)
11-50       9 (18%)      9 (18%)          2 (4%)       5 (10%)
>50         18 (36%)     12 (24%)         3 (6%)       4 (8%)
Not found   9 (18%)      5 (10%)          0 (0%)       0 (0%)
Better results on user queries; no change on
dictionary queries

33
Query Expansion (WordNet)
longest stem included (heuristics)
Rank        Test set 1   Training set 1   Test set 2   Training set 2
1-10        14 (28%)     24 (48%)         41 (82%)     39 (78%)
11-50       6 (12%)      8 (16%)          5 (10%)      6 (12%)
>50         21 (42%)     13 (26%)         1 (2%)       5 (10%)
Not found   9 (18%)      5 (10%)          0 (0%)       0 (0%)
Better performance than longest-stem matching
on user queries, but worse performance on
dictionary queries

34
Result Summary
  • Stem Matching (longest stem included)
  • 60% success on real user queries
  • 96% success on dictionary queries
  • Query Expansion (all stems included)
  • 68% success on real user queries
  • 92% success on dictionary queries

35
Conclusion
  • We have implemented a Meaning-to-Word system
    for Turkish
  • Results on unseen data are reasonably satisfactory
  • Query expansion performs better, although it
    cannot find the words for all queries
  • 68% of real user queries and 90% of dictionary
    queries are found in the first 50 results

36
THANK YOU!