Title: USING WORDNET TO RETRIEVE WORDS FROM THEIR MEANINGS
- Ilknur Durgar El-Kahlout and Kemal Oflazer
- Sabanci University
- Istanbul, Turkey
2. Problem
- For a given definition, find the appropriate word (or words)
- A traditional dictionary is of no use
- From a dictionary, find an appropriate word that has a similar definition
3. Examples
- User definition
- Akimi ölçmek için kullanilan alet
- (A device that is used to measure the current)
- In the dictionary
- akimölçer: elektrik akiminin siddetini ölçmeye yarayan araç, ampermetre
- (ammeter: a device that measures the intensity of electrical current, amperemeter)
4. Applications
- Computer-assisted language learning
- Solving crossword puzzles
- Reverse dictionary
5. Outline
- Problem statement
- Meaning-to-Word System (MTW)
- Our Approach
- Methods
- Results
- Result Summary
- Conclusion
6. Problem Statement
- Find the similarity between two definitions
- Akimi ölçmek için kullanilan alet
- (A device that is used to measure the current)
- Elektrik akiminin siddetini ölçmeye yarayan araç, ampermetre
- (a device that measures the intensity of electrical current, amperemeter)
7. Meaning-to-Word (MTW)
- Addresses the problem of finding the appropriate word (or words) whose meaning matches the given definition
- Two subproblems
- Finding words whose definitions are "similar" to the query in some sense
- Ranking the candidate words in a variety of ways
8. Information Flow in MTW
User Definition -> (query) -> Search in Dictionary -> (candidates) -> Rank Candidates -> List of words
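The flow above can be sketched as a small driver function. This is only an illustrative skeleton: `meaning_to_word`, `search`, `rank`, and `top_k` are hypothetical names, and the two stages are passed in as callables because the slides do not specify their interfaces.

```python
def meaning_to_word(definition, search, rank, top_k=50):
    """Run the two MTW stages in order: search the dictionary, then rank.

    `search` maps a user definition to candidate words; `rank` orders the
    candidates for that definition. Both are placeholders (assumptions).
    """
    candidates = search(definition)
    ranked = rank(definition, candidates)
    return ranked[:top_k]
```

Any search and ranking strategy can be plugged in, as long as `rank` returns the candidates in best-first order.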
9. Available Resources
- Turkish Monolingual Dictionary
- About 50,000 entries
- Turkish WordNet
- About 11,000 synsets
10. Normalization
User Definition -> Normalization -> (query) -> Search in Dictionary -> (candidates) -> Rank Candidates -> List of words
11. Normalization
- Tokenization
- Stemming
- Stop Word Elimination
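A minimal sketch of these three steps, in the order listed. The stop-word set and the stem lexicon below are toy, hypothetical stand-ins (a real system for Turkish would rely on a morphological analyzer), and the function names are illustrative only.

```python
import re

# Hypothetical toy resources, for illustration only.
STOP_WORDS = {"daha", "hiç"}
STEM_LEXICON = {
    "evlenmemis": ["evlen"],
    "akimi": ["akim", "aki", "ak"],
}

def tokenize(text):
    """Split lowercased text into word tokens."""
    return re.findall(r"\w+", text.lower())

def normalize(text):
    """Tokenize, stem, then drop stop words.

    Returns one list of stem alternatives per surviving token; tokens
    missing from the toy lexicon are kept unchanged.
    """
    result = []
    for token in tokenize(text):
        stems = STEM_LEXICON.get(token, [token])
        stems = [s for s in stems if s not in STOP_WORDS]
        if stems:
            result.append(stems)
    return result

print(normalize("daha önce hiç evlenmemis kisi"))
# → [['önce'], ['evlen'], ['kisi']]
```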
12. Query Processing
User Definition -> (query) -> Query Processing -> Search in Dictionary -> (candidates) -> Rank Candidates -> List of words
13. Query Processing
- Subset Generation
- Search with different sets of words
- Select informative words from the user's query
- Query: daha önce hiç evlenmemis kisi (a person who has never been married)
- {önce, evlen, kisi} (before, marry, person)
- {evlen, kisi}, {önce, kisi}, {önce, evlen} (marry, person), (before, person), (before, marry)
- {evlen}, {önce}, {kisi} (marry), (before), (person)
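Subset generation as in the example above can be sketched with `itertools.combinations`. The function name `generate_subsets` is hypothetical, and emitting larger subsets first is an assumption made for readability.

```python
from itertools import combinations

def generate_subsets(stems):
    """Return every non-empty subset of the query stems, largest first."""
    subsets = []
    for size in range(len(stems), 0, -1):
        subsets.extend(combinations(stems, size))
    return subsets

for subset in generate_subsets(["önce", "evlen", "kisi"]):
    print(subset)
```

For a query with n stems this yields 2^n - 1 subsets (7 for the three stems here), so long queries may need some pruning.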
14. Query Processing
- Subset Sorting
- An unordered list of subsets is insufficient
- Rank the generated subsets
- 1) By the number of words
- {önce, evlen, kisi} (before, marry, person)
- {evlen, kisi} (marry, person)
- 2) By the sum of the logarithms of the word frequencies
- {evlen, kisi} (marry, person)
- {önce, kisi} (before, person)
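The two sorting criteria can be combined in a single sort key, sketched below. The frequency counts are invented for illustration, and treating a lower sum of log-frequencies (rarer words) as more informative is an assumption; the slides do not state the direction.

```python
import math

# Hypothetical frequencies of each stem in the dictionary definitions.
FREQ = {"önce": 900, "evlen": 40, "kisi": 600}

def sort_subsets(subsets, freq):
    """Sort by size (largest first), then by sum of log-frequencies (lowest first)."""
    def key(subset):
        log_sum = sum(math.log(freq[stem]) for stem in subset)
        return (-len(subset), log_sum)
    return sorted(subsets, key=key)

subsets = [("önce", "kisi"), ("evlen", "kisi"), ("önce", "evlen", "kisi")]
print(sort_subsets(subsets, FREQ))
# → [('önce', 'evlen', 'kisi'), ('evlen', 'kisi'), ('önce', 'kisi')]
```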
15. Searching for Meanings
User Definition -> (query) -> Search in Dictionary -> (candidates) -> Rank Candidates -> List of words
16. Searching for Meanings
- Two methods
- Stem Matching
- Query Expansion (using WordNet)
17. Stem Matching
- Morphological normalization of words
- Find meanings that contain morphological variants
of the original definition
18. Stem Matching (Ex.)
Query: akimi ölçmek için kullanilan alet (A device that is used to measure the current)
Candidate stems for each query word:
- akimi: akim (current), aki (flux), ak (white)
- ölçmek: ölç (measure)
- için: için (to), iç (drink)
- kullanilan: kullan (use), kul (slave)
- alet: alet (device)
The matching stems are shown in color on the slide
19. Stem Matching
- (A device that is used to measure the current)
- akimi ölçmek için kullanilan alet
- elektrik akiminin siddetini ölçmeye yarayan araç, ampermetre
- (a device that measures the intensity of electrical current, amperemeter)
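A minimal sketch of stem matching on this example. The dictionary layout (headword mapped to the set of stems in its definition) and the stems listed for the akimölçer definition are assumptions made for illustration.

```python
def stem_match(query_stems, dictionary):
    """Return headwords whose definition shares a stem with the query.

    `dictionary` maps each headword to the set of stems in its definition
    (a hypothetical layout). The result maps headword -> matched stems.
    """
    query = set(query_stems)
    hits = {}
    for headword, definition_stems in dictionary.items():
        matched = query & definition_stems
        if matched:
            hits[headword] = matched
    return hits

# All candidate stems of "akimi ölçmek için kullanilan alet".
query = ["akim", "aki", "ak", "ölç", "için", "iç", "kullan", "kul", "alet"]
dictionary = {
    "akimölçer": {"elektrik", "akim", "siddet", "ölç", "yara", "araç", "ampermetre"},
}
print(stem_match(query, dictionary))
```

Here only akim and ölç match; alet (device) and araç (device) are unrelated at the stem level.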
21. Stem Matching
- Drawbacks
- Generates noisy stems
- ilim (science, my city) -> ilim (science), il (city)
- Conflates two words with very different meanings to the same stem
- ilim (science, my city), ilde (in the city) -> il (city)
- Cannot find relations between similar words
- kimse (someone), kisi (person)
- bölüm (part), kisim (portion)
22. Using Query Expansion
- Two different approaches
- Expand the query with relations (synonyms, specializations, generalizations)
- Expand the query with the relevant answers of the unexpanded query
- WordNet synonyms are used in MTW
- besin, gida (food, nourishment)
- iyiles, düzel (to get better) / iyiles, gelis (to improve)
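23. Query Expansion (Ex.)
Query: akimi ölçmek için kullanilan alet (A device that is used to measure the current)
Candidate stems for each query word, with the synonyms added by expansion:
- akimi: akim (current); aki (flux) + debi, akis; ak (white) + beyaz
- ölçmek: ölç (measure)
- için: için (to); iç (drink)
- kullanilan: kullan (use) + faydalan, yararlan; kul (slave) + köle
- alet: alet (device) + araç, gereç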
24. Query Expansion (Ex.)
- (A device that is used to measure the current)
- akimi ölçmek için kullanilan alet
- elektrik akiminin siddetini ölçmeye yarayan araç, ampermetre
- (a device that measures the intensity of electrical current, amperemeter)
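A minimal sketch of synonym-based expansion on this example. The synonym table is a hand-written stand-in for WordNet lookups, and its stem-to-synonym pairing is read off the example slide, so treat it as an assumption; `expand_query` is a hypothetical name.

```python
# Hypothetical synonym table standing in for Turkish WordNet synsets.
SYNONYMS = {
    "ak": ["beyaz"],
    "kullan": ["faydalan", "yararlan"],
    "alet": ["araç", "gereç"],
    "aki": ["debi", "akis"],
    "kul": ["köle"],
}

def expand_query(stems, synonyms):
    """Return the query stems together with all of their listed synonyms."""
    expanded = set(stems)
    for stem in stems:
        expanded.update(synonyms.get(stem, []))
    return expanded

query = ["akim", "aki", "ak", "ölç", "için", "iç", "kullan", "kul", "alet"]
definition = {"elektrik", "akim", "siddet", "ölç", "yara", "araç", "ampermetre"}
print(expand_query(query, SYNONYMS) & definition)
```

With expansion, araç in the dictionary definition is also matched (through alet), in addition to akim and ölç.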
26. Ranking
User Definition -> (query) -> Search in Dictionary -> (candidates) -> Rank Candidates -> List of words
27. Ranking
- Very important part of MTW
- Having the right answer in the retrieved set is not enough
- The aim is to have the right answer near the top of the retrieved set (e.g., within the top 50 answers)
28. Ranking
- Simple but effective methods
- Number of matched words
- Subset informativeness: frequency of the words in the subset
- Ratio of the number of matched words to the number of words in the candidate dictionary definition
- Longest Common Subsequence: order of the matched words
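Three of the ranking signals above (number of matched words, match ratio, longest common subsequence) can be sketched as follows. Combining them into one tuple compared lexicographically is an assumption made for illustration; the slides do not say how the signals are weighted. Subset informativeness is left out here.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two stem sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def score(query, definition):
    """Return (matched words, matched/definition-length ratio, LCS length)."""
    matched = len(set(query) & set(definition))
    ratio = matched / len(definition) if definition else 0.0
    return (matched, ratio, lcs_length(query, definition))

def rank(query, candidates):
    """Order headwords best-first; `candidates` maps headword -> definition stems."""
    return sorted(candidates, key=lambda w: score(query, candidates[w]), reverse=True)

query = ["akim", "ölç", "alet"]
candidates = {
    "saat": ["zaman", "ölç"],
    "akimölçer": ["elektrik", "akim", "siddet", "ölç", "araç"],
}
print(rank(query, candidates))
# → ['akimölçer', 'saat']
```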
29. Some Statistics
- Training sets
- 50 queries from users
- 50 queries from a dictionary
- Test sets
- 50 queries from users
- 50 queries from a separate dictionary

                       Test set 1 (user)  Training set 1  Test set 2 (dict.)  Training set 2
# of queries           50                 50              50                  50
Avg. # of query words  5.66               4.64            9.24                13.98
Max. # of query words  17                 12              23                  45
Min. # of query words  2                  1               1                   6
30. Stem Matching
all stems included

Rank       Test set 1  Training set 1  Test set 2  Training set 2
1-10       13 (26%)    18 (36%)        45 (90%)    41 (82%)
11-50      7 (14%)     12 (24%)        2 (4%)      5 (10%)
>50        19 (38%)    10 (20%)        3 (6%)      4 (8%)
Not found  11 (22%)    10 (20%)        0 (0%)      0 (0%)

Low top-10 results for user queries, but very high results for dictionary queries
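The figures in parentheses are each count as a share of the 50 queries in a set. A small sketch of that arithmetic, which also adds up the share found within the first 50 results (the bucket labels and the `summarize` helper are hypothetical):

```python
def summarize(counts, total=50):
    """Convert rank-bucket counts to percentages and a top-50 share."""
    pct = {bucket: 100 * n / total for bucket, n in counts.items()}
    top50 = pct["1-10"] + pct["11-50"]
    return pct, top50

# Test set 1 (user queries), stem matching with all stems included.
counts = {"1-10": 13, "11-50": 7, ">50": 19, "Not found": 11}
pct, top50 = summarize(counts)
print(pct["1-10"], top50)
# → 26.0 40.0
```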
31. Stem Matching
longest stem included (heuristics)

Rank       Test set 1  Training set 1  Test set 2  Training set 2
1-10       14 (28%)    21 (42%)        46 (92%)    43 (86%)
11-50      5 (10%)     9 (18%)         1 (2%)      5 (10%)
>50        18 (36%)    9 (18%)         3 (6%)      2 (4%)
Not found  13 (26%)    11 (22%)        0 (0%)      0 (0%)

Improvement in user queries, slightly better performance in dictionary queries
32. Query Expansion (WordNet)
all stems included

Rank       Test set 1  Training set 1  Test set 2  Training set 2
1-10       14 (28%)    24 (48%)        45 (90%)    41 (82%)
11-50      9 (18%)     9 (18%)         2 (4%)      5 (10%)
>50        18 (36%)    12 (24%)        3 (6%)      4 (8%)
Not found  9 (18%)     5 (10%)         0 (0%)      0 (0%)

Better results in user queries, no change in dictionary queries
33. Query Expansion (WordNet)
longest stem included (heuristics)

Rank       Test set 1  Training set 1  Test set 2  Training set 2
1-10       14 (28%)    24 (48%)        41 (82%)    39 (78%)
11-50      6 (12%)     8 (16%)         5 (10%)     6 (12%)
>50        21 (42%)    13 (26%)        1 (2%)      5 (10%)
Not found  9 (18%)     5 (10%)         0 (0%)      0 (0%)

Better performance than longest stem matching in user queries, but worse performance in dictionary queries
34. Result Summary
- Stem Matching (longest stem included)
- 60% success in real user queries
- 96% success in dictionary queries
- Query Expansion (all stems included)
- 68% success in real user queries
- 92% success in dictionary queries
35. Conclusion
- We have implemented a Meaning-to-Word system for Turkish
- Results on unseen data are rather satisfactory
- Query expansion is better
- However, it cannot find the words for all queries
- 68% of real user queries and 90% of dictionary queries are found in the first 50 results
36. THANK YOU!