Title: Overview of Peter D. Turney
1 Overview of Peter D. Turney's Work on Similarity
2 Similarity
- Attributional similarity (2001 - 2003)
- the degree to which two words are synonymous
- also known as
- Semantic relatedness and semantic association
- Relational similarity (2005 - 2008)
- the degree to which two relations are analogous
3 Objective evaluation of the approaches by:
- Attributional similarity
- 80 TOEFL synonym questions
- Relational similarity
- 374 SAT analogy questions
4 2001: Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL
- In Proceedings of the 12th European Conference on Machine Learning, pages 491-502, Springer, Berlin, 2001.
5 1 Introduction
- Synonymy can be detected statistically: words with similar meanings tend to appear in similar contexts
- so synonymy can be measured through co-occurrence
- "a word is characterized by the company it keeps" (Firth)
6 1 Introduction: idea
- Given a problem word and candidate answers choice1, choice2, ..., choicen
- compute score(choice_i) for each choice and select the highest-scoring one
- uses Pointwise Mutual Information (PMI) to analyze statistical data collected by Information Retrieval (IR)
7 2 Formula
- Score 1: co-occurrence counted with the AND operator
- Score 2: co-occurrence counted with the NEAR operator (within ten words)
8 2 Formula (cont.)
- Score 3: filters out antonym pairs, e.g. big vs. small, which often co-occur with negation
- Score 4: takes the question's context into account
- a context word helps select the intended word sense
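The first two scores above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `hits` function stands in for live search-engine queries, and the hit counts are invented.

```python
# Sketch of PMI-IR scoring (scores 1 and 2 above). `hits` is a stand-in
# for a search engine's hit count; the counts below are invented.
def score1(problem, choice, hits):
    # Score 1: co-occurrence counted with the AND operator
    return hits(f"{problem} AND {choice}") / hits(choice)

def score2(problem, choice, hits):
    # Score 2: the NEAR operator (co-occurrence within ten words)
    return hits(f"{problem} NEAR {choice}") / hits(choice)

# Toy hit counts for illustration only
counts = {"big AND large": 500, "big NEAR large": 300, "large": 10000}
hits = lambda q: counts.get(q, 1)

print(score1("big", "large", hits))  # 0.05
print(score2("big", "large", hits))  # 0.03
```

Dividing by `hits(choice)` penalizes choices that are simply frequent everywhere, which is the PMI idea: co-occurrence relative to baseline frequency.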
9 3 Experiments
- Compare with
- LSA (Latent Semantic Analysis)
- a 61,000 × 30,473 word-by-document matrix X, built from an encyclopedia corpus
- reduced with SVD
- elements: tf-idf weights
- similarity: cosine of the angle between word vectors
- evaluated on the TOEFL questions
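The LSA pipeline just described (tf-idf matrix, truncated SVD, cosine similarity) can be sketched on a toy matrix. The 3 × 3 matrix below is invented for illustration; the real matrix is 61,000 × 30,473 reduced to 300 dimensions.

```python
import numpy as np

# Minimal LSA sketch: tf-idf matrix -> truncated SVD -> cosine similarity.
X = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 0.0],
              [1.0, 0.0, 2.0]])  # pretend these are tf-idf weights

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                      # keep only the top-k singular values
W = U[:, :k] * s[:k]       # word vectors in the k-dimensional latent space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words 0 and 2 appear in the same contexts, so their latent vectors
# end up (nearly) parallel; word 1 does not.
print(cosine(W[0], W[2]))  # close to 1.0
```

Truncating the SVD is what gives LSA its "smoothing": words that never co-occur directly can still end up close in the latent space.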
10 Dataset
- 80 TOEFL questions
- 50 ESL questions
11 3 Experiments: PMI-IR vs. LSA
- Running time
- PMI-IR issues live search queries
- about 2 s/query, 8 queries per question
- LSA requires heavy preprocessing
- reducing the 61,000 × 30,473 matrix to 61,000 × 300 with SVD is computationally expensive on a UNIX workstation
12 3 Experiments: results
- 80 TOEFL questions, 50 ESL questions
- PMI-IR: 73.75% (59/80), 74% (37/50)
- average human test-taker: 64.5% (51.6/80)
- LSA: 64.4% (51.5/80)
- PMI-IR wins by about 10 points
- Why?
- the NEAR operator, and a smaller chunk size
- LSA: 64.4%
- PMI-IR with AND: 62.5%
- PMI-IR with NEAR: 72.5%
13 4 Conclusion
- PMI-IR combines PMI with IR
- a simple, unsupervised corpus-based approach that outperforms LSA on TOEFL
- PMI
- easy to compute from co-occurrence counts
- depends on the coverage of the underlying corpus
14 2003: Combining Independent Modules in Lexical Multiple-Choice Problems
- In RANLP-03, pages 482-489, Borovets, Bulgaria
- (RANLP: Recent Advances in Natural Language Processing)
15 1 Introduction
- There are several approaches to natural language problems
- no single approach is best for all problem instances
- why not combine them?
16 1 Introduction
- two main contributions
- introduces and evaluates several new modules
- for answering multiple-choice synonym questions and analogy questions
- 3 merging rules
- presents a novel product rule
- compares it with 2 similar existing merging rules
17 2 Merging rules: notation
- The parameter of the rules: the weight vector w
- p_{h,i,j} > 0 is the probability assigned by module i to choice j of instance h
- i indexes modules, 1 ≤ i ≤ n
- h indexes instances, 1 ≤ h ≤ m
- j indexes choices, 1 ≤ j ≤ k
- D_{h,j}(w) is the probability assigned by the merging rule to choice j of training instance h when the weights are set to w
- 1 ≤ a(h) ≤ k is the correct answer for instance h
18 2 Merging rules: existing
- mixture rule (very common)
- a weighted arithmetic average of the module probabilities
- logarithmic rule
- a weighted geometric average
19 2 Merging rules: the novel product rule
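The three rules can be sketched as follows. This is a reconstruction from the descriptions above, not the paper's code; in particular, the exact form of the product rule (mixing each module's distribution with the uniform distribution over the k choices) is an assumption here.

```python
import math

# p[i][j]: probability module i assigns to choice j; w[i]: module weight.
def mixture(p, w):
    # weighted arithmetic average, then normalize
    scores = [sum(wi * pi[j] for wi, pi in zip(w, p)) for j in range(len(p[0]))]
    total = sum(scores)
    return [s / total for s in scores]

def logarithmic(p, w):
    # weighted geometric average: exp of the weighted sum of logs
    scores = [math.exp(sum(wi * math.log(pi[j]) for wi, pi in zip(w, p)))
              for j in range(len(p[0]))]
    total = sum(scores)
    return [s / total for s in scores]

def product(p, w):
    # assumed form: each module is mixed with the uniform distribution 1/k
    k = len(p[0])
    scores = [math.prod(wi * pi[j] + (1 - wi) / k for wi, pi in zip(w, p))
              for j in range(k)]
    total = sum(scores)
    return [s / total for s in scores]

# Two modules, three choices (toy numbers)
p = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]]
w = [1.0, 0.5]
print(mixture(p, w))
print(logarithmic(p, w))
print(product(p, w))
```

All three agree on the top choice here; the paper's point is that they differ in how sharply they concentrate probability on it.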
20 3 Synonyms: dataset
- a training set of 431 4-choice synonym questions
- randomly divided into 331 training questions and 100 testing questions
- optimize w on the training set
21 3 Synonyms: modules
- LSA
- PMI-IR
- Thesaurus
- queries Wordsmyth (www.wordsmyth.net)
- creates synonym lists for both the stem and the choices
- scores them by their overlap
- Connector
- uses summary pages from querying Google with a pair of words
- weighted sum of
- the number of times the words appear separated by a connecting symbol or word (e.g. "means", "defined", "equals", "synonym", or whitespace)
- the number of times "dictionary" or "thesaurus" appears
22 3 Synonyms: combined results
- the 3 rules' accuracies are nearly identical
- the product and logarithmic rules assign higher probabilities to correct answers
- as evidenced by the mean likelihood
23 3 Synonyms: comparison with other approaches
24 4 Analogies: dataset
- 374 5-choice instances
- randomly split into 274 training instances and 100 testing instances
- E.g. cat:meow
- (a) mouse:scamper
- (b) bird:peck
- (c) dog:bark
- (d) horse:groom
- (e) lion:scratch
25 4 Analogies: modules
- Phrase vectors
- create a vector r to represent the relationship between X and Y
- phrases built with 128 patterns
- e.g. "X for Y", "Y with X", "X in the Y", "Y on X"
- query a search engine and record the number of hits
- measure similarity by cosine
- Thesaurus paths (WordNet)
- degree of similarity between paths
26 4 Analogies: modules (cont.)
- Lexical relation modules
- a set of more specific modules using WordNet
- 9 modules, each checking one relationship
- Synonym, Antonym, Hypernym, Hyponym, Meronym:substance, Meronym:part, Meronym:member, Holonym:substance, Holonym:member
- check the stem first, then the choices
- Similarity
- makes use of definitions
- Similarity:dict uses dictionary.com
- Similarity:wordsmyth uses wordsmyth.net
- given A:B::C:D, combines sim(A, C) and sim(B, D)
27 (No transcript)
28 5 Conclusion
- applied three trained merging rules to TOEFL questions
- accuracy: 97.5%
- provided first results on a challenging analogy task, with a set of novel modules that use both lexical databases and statistical information
- accuracy: 45%
- the popular mixture rule was consistently weaker than the logarithmic and product rules at assigning high probabilities to correct answers
29 State of the art (accuracy, %)

                    LSA    HUMAN   PMI-IR (2001)   HYBRID (2003)
Synonym questions   64.4   64.5    73.75           97.5

                    HYBRID  HUMAN
Analogies           45      57
30 2005: Corpus-based Learning of Analogies and Semantic Relations
- IJCAI 2005
- Proceedings of the Nineteenth International Joint
Conference on Artificial Intelligence, Edinburgh,
Scotland, UK, July 30-August 5, 2005.
31 1 Introduction
- Verbal analogy: VSM
- A:B :: C:D
- The novelty of the paper is the application of the VSM to measure the similarity between relationships
- Noun-modifier pair relations: supervised nearest neighbour algorithm
- Dataset: Nastase and Szpakowicz (2003), 600 noun-modifier pairs
32 1 Introduction: examples
- Analogy
- Noun-modifier pair relations
- laser printer
- relation: instrument
33 2 Solving Analogy Problems
- assign scores to candidate analogies A:B::C:D
- for multiple-choice questions, guess the highest-scoring choice
- Sim(R1, R2)
- the difficulty is that R1 and R2 are implicit
- attempt to learn R1 and R2 using unsupervised learning from a very large corpus
34 2 Solving Analogy Problems: Vector Space Model
- create vectors r1 and r2 that represent features of R1 and R2
- measure the similarity of R1 and R2 by the cosine of the angle θ between r1 and r2
35 2 Solving Analogy Problems: building the vectors
- Generate a vector for each word pair
- joining terms
- "X for Y", "Y with X", "X in the Y", "Y on X"
- vector
- (log(hits_1), log(hits_2), ..., log(hits_128))
- [Diagram: word pair A:B → phrases from 64 joining terms → search → hit counts → log → vector]
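The vector-building step above can be sketched as follows. Only 4 of the paper's 64 joining terms are used, and `hits` is a stand-in for querying a search engine; the "+1" inside the log is an assumption to avoid log(0).

```python
import math

# Map a word pair A:B to a vector of log hit-counts over joining-term
# phrases (both orders, as in the paper's 64 x 2 = 128 patterns).
JOINING_TERMS = ["for", "with", "in the", "on"]  # 64 terms in the paper

def pair_vector(a, b, hits):
    phrases = [f"{a} {t} {b}" for t in JOINING_TERMS] + \
              [f"{b} {t} {a}" for t in JOINING_TERMS]
    return [math.log(hits(ph) + 1) for ph in phrases]  # +1: assumed smoothing

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv)

# Invented hit counts for illustration only
toy_hits = lambda ph: {"mason for stone": 50, "stone for mason": 5}.get(ph, 0)
r1 = pair_vector("mason", "stone", toy_hits)
r2 = pair_vector("carpenter", "wood",
                 lambda ph: 40 if ph == "carpenter for wood" else 0)
print(cosine(r1, r2))  # high: both pairs co-occur mainly with "for"
```

Two word pairs whose relations are expressed by the same joining terms end up with similar vectors, which is exactly what the cosine then measures.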
36 2 Solving Analogy Problems: experiment
37 2 Solving Analogy Problems: experiment (cont.)
38 3 Noun-Modifier Semantic Relations
- first attempt to classify semantic relations without a lexicon
39 30 Semantic Relations in the training data
40 3 Noun-Modifier Semantic Relations: algorithm
- nearest neighbour supervised learning
- nearest neighbour measured by cosine
- cosine(training pair, testing pair)
- vectors of 128 elements, same joining terms as before
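The nearest-neighbour step above is simple enough to sketch directly: a testing pair gets the class of the training pair with the highest cosine. The vectors and labels below are invented for illustration.

```python
import math

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv)

def nearest_neighbour(test_vec, training):
    # training: list of (vector, class_label); return the label of the
    # training pair most similar (highest cosine) to the testing pair
    return max(training, key=lambda t: cosine(test_vec, t[0]))[1]

# Toy 3-element vectors standing in for the 128-element pattern vectors
training = [([1.0, 0.0, 2.0], "instrument"),
            ([0.0, 3.0, 0.0], "purpose")]
print(nearest_neighbour([2.0, 0.1, 3.0], training))  # "instrument"
```

A single nearest neighbour with leave-one-out evaluation needs no parameter fitting at all, which is why it is a natural baseline here.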
41 3 Noun-Modifier Semantic Relations: experiment for the 30 classes
42 30 Semantic Relations
- F, when precision and recall are balanced
- 26.5%
- F for random guessing
- 3.3%
- much better than random guessing
- but still much room for improvement
- 30 classes is hard
- too many possibilities for confusing classes
- try 5 classes instead
- group classes together
43 5 Semantic Relations
44 F for the 5 Classes
45 5 Semantic Relations
- F, when precision and recall are balanced
- 43.2%
- F for random guessing
- 20.0%
- better than random guessing
- better than 30 classes
- 26.5%
- but still room for improvement
46 Execution Time
- the experiments presented here required 76,800 queries to AltaVista
- 600 word pairs
- 128 queries per word pair
- 76,800 queries
- as a courtesy to AltaVista, a five-second delay was inserted between queries
- processing 76,800 queries took about five days
47 Conclusion
- The cosine metric in the VSM is used to
- solve analogies
- classify semantic relations
- It performs much better than random guessing, but below human levels
48 State of the art

accuracy (%)   HYBRID (2003)  VSM (2005)  HUMAN
Analogies      45             47          57

F-measure (%)              VSM (2005)
Noun-Modifier (5 classes)  43.2
49 2006a: Similarity of Semantic Relations
- Computational Linguistics, 32(3):379-416.
50 1 Introduction
- Latent Relational Analysis (LRA)
- LRA extends the VSM approach of Turney and Littman (2005) in three ways
- the connecting patterns are derived automatically from the corpus, instead of using a fixed set of patterns
- Singular Value Decomposition (SVD) is used to smooth the frequency data
- automatically generated synonyms are used to explore variations of the word pairs
51 2 A short description of LRA
- Generate a vector for each word pair
- [Diagram: word pair A:B → alternate pairs A':B' from automatically generated synonyms → patterns derived from the corpus (instead of 64 fixed joining terms) → search → hit counts → log of hits → SVD → calculate avg(cosine)]
52 3 Experiment: Word Analogy Questions, baseline LRA
- matrix: 17,232 × 8,000, density 5.8%
- time required: 209:49:36, about 9 days
- performance
53 Experiment: Word Analogy Questions, LRA vs. VSM
- Corpus size
- AltaVista: 5 × 10^11 English words
- WMTS: 5 × 10^10 English words
54 Experiment: Word Analogy Questions, varying the parameters
55 Experiment: Word Analogy Questions, ablation experiments
- no SVD: not significant, but may be significant with more word pairs
- no synonyms: recall drops
- neither: recall drops
- the drop to VSM performance is significant
56 Experiments with Noun-Modifier Relations
- Dataset
- 600 noun-modifier pairs, hand-labeled with 30 classes of semantic relations
- Algorithm
- baseline: LRA with a single nearest neighbour
- LRA provides the distance (nearness) measure
57 (No transcript)
58 Discussion
- For Word Analogy Questions
- performance is not yet adequate for practical application
- speed is an issue
- For noun-modifier classification
- more hand-labeled data would help, but it's expensive
- the choice of classification scheme for the semantic relations matters
- Hybrid approach
- combine the corpus-based approach of LRA with the lexicon-based approach of Veale (2004)
59 Conclusion of 2006a
- LRA extends the VSM (2005) in three ways
- patterns are derived automatically
- SVD is used to smooth and compress the data
- automatically generated synonyms are used to explore variations of the word pairs
60 State of the art

accuracy (%)   HYBRID (2003)  VSM (2005)  LRA (2006a)  HUMAN
Analogies      45             47          56.8         57

F-measure (%)              VSM (2005)  LRA (2006a)
Noun-Modifier (5 classes)  43.2        54.6
61 2006b: Expressing Implicit Semantic Relations without Supervision
62 Introduction
- Hearst (1992): pattern → X:Y
- the pattern "Y such as the X" can be used to mine large text corpora for hypernym-hyponym pairs
- search using the pattern "Y such as the X": if we find the string "bird such as the ostrich", we can infer that "ostrich" is a hyponym of "bird"
- Here we consider the inverse of this problem: X:Y → pattern
- can we mine a large text corpus for patterns that express the implicit relations between X and Y?
63 Introduction (cont.)
- Discovering high-quality patterns
- Pertinence: a measure of quality
- reliable for mining further word pairs with the same semantic relations
64 2 Pertinence
- the first formal measure of quality for text-mining patterns
- given a set of word pairs
- and a set of patterns
- P_i is pertinent to X_j:Y_j
- if highly typical word pairs X_k:Y_k for the pattern P_i tend to be relationally similar to X_j:Y_j
- pertinence tends to be highest with unambiguous patterns
65 2 Pertinence: definition
- f_{k,i} is the number of occurrences in a corpus of the word pair X_k:Y_k with the pattern P_i
- the frequencies are smoothed before estimating probabilities
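Putting the pieces above together, the definition has the following reconstructed form; the exact smoothing scheme applied to the frequencies $f_{k,i}$ is left as an assumption here.

```latex
\mathrm{pertinence}(X_j{:}Y_j,\, P_i)
  = \sum_{k} p(X_k{:}Y_k \mid P_i)\;
            \mathrm{sim}_r(X_j{:}Y_j,\; X_k{:}Y_k)
\qquad
p(X_k{:}Y_k \mid P_i) = \frac{f_{k,i}}{\sum_{k'} f_{k',i}}
```

Here $\mathrm{sim}_r$ is the relational similarity (the cosine described in the related-work slide), so a pattern is pertinent to a pair exactly when the pairs it typically occurs with are relationally similar to that pair.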
66 3 Related Work
- Hearst (1992)
- describes a method for finding patterns like "Y such as the X"
- but her method requires human judgment
- Riloff and Jones (1999)
- use a mutual bootstrapping technique that can find patterns automatically
- but the bootstrapping requires an initial seed of manually chosen examples
- other work all requires training examples or initial seed patterns for each relation
67 3 Related Work (cont.)
- Turney (2006a): LRA
- maps each pair X:Y to a high-dimensional vector v, then calculates the cosine
- pertinence is based on it
- a limitation
- the semantic content of the vectors is difficult to interpret
68 The Algorithm
- 1. Find phrases
- 2. Generate patterns
- note pattern frequency (TF)
- a local frequency count
- 3. Count pair frequency
- a global frequency count (DF)
- 4. Map pairs to rows
- both for X_j:Y_j and Y_j:X_j
- 5. Map patterns to columns
- drop all patterns with a pair frequency less than 10
- 1,706,845 distinct patterns → 42,032 patterns
69 The Algorithm (cont.)
- 6. Build a sparse matrix
- elements are frequencies
- 7. Apply log and entropy weighting
- gives more weight to patterns that vary substantially in frequency across pairs
- 8. Apply SVD
- 9. Calculate cosines
- 10. Calculate conditional probabilities
- for every word pair and every pattern
- 11. Calculate pertinence
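Step 7 can be sketched as follows. This is a common log-entropy weighting, reconstructed from the one-line description above; computing the entropy per pattern (column) is an assumption, as is the exact normalization.

```python
import math

# Log-entropy weighting: element (i, j) becomes log(f_ij + 1) scaled by
# (1 - H_j / H_max), where H_j is the entropy of pattern j's frequency
# distribution across pairs. Patterns that are uniform across all pairs
# carry no information and get weight near 0.
def log_entropy(matrix):
    n_rows, n_cols = len(matrix), len(matrix[0])
    weighted = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        col = [matrix[i][j] for i in range(n_rows)]
        total = sum(col)
        ent = 0.0
        for f in col:
            if f > 0 and total > 0:
                p = f / total
                ent -= p * math.log(p)
        max_ent = math.log(n_rows)
        weight = 1.0 - (ent / max_ent if max_ent > 0 else 0.0)
        for i in range(n_rows):
            weighted[i][j] = math.log(matrix[i][j] + 1) * weight
    return weighted

# Toy matrix: pattern 0 occurs only with pair 0; pattern 1 is uniform
m = [[10, 1], [0, 1], [0, 1]]
w = log_entropy(m)
print(w)  # the uniform second column is weighted down to (nearly) zero
```

The log dampens raw frequency differences, and the entropy factor is what realizes "more weight to patterns that vary substantially in frequency".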
70 The Algorithm: flow
- [Diagram: word pairs → a ranked pattern list per pair (pattern list 1, ..., pattern list n) → pertinence]
71 5 Experiments with Word Analogies
- Dataset
- 374 college-level multiple-choice word analogies, taken from the SAT test
- 6 × 374 = 2,244 pairs
- 4,194 rows, 84,064 columns
- the sparse matrix density is 0.91%
- Score = (rank_stem + rank_choice) / 2
72 (No transcript)
73 - the four highest-ranking patterns for the stem and solution of the first example
74 - the top five pairs matching the pattern "Y such as the X"
75 Comparing with other measures
76 Experiments with Noun-Modifiers
77 Method and Result
- Method
- a single nearest neighbour algorithm with leave-one-out cross-validation
- the distance between two noun-modifier pairs is measured by the average rank of their best shared pattern
- Result
78 More
- For the 5 general classes
79 Comparing with other measures
80 Discussion
- Time
- word analogies: 5 hours, vs. 5 days (2005) and 9 days (2006a)
- noun-modifiers: 9 hours
- the majority of the time was spent searching
- Performance
- near the level of the average senior high school student (54.6% vs. 57%)
- for applications such as building a thesaurus, lexicon, or ontology, this level of performance suggests that the algorithm could assist, but not replace, a human expert
81 Conclusion
- LRA is a black box
- the main contribution of this paper is the idea of pertinence
- use it to find patterns that express the implicit semantic relations between two words
82 State of the art

accuracy (%)   HYBRID (2003)  VSM (2005)  LRA (2006a)  Pertinence (2006b)  HUMAN
Analogies      45             47          56.8         55.7                57

F-measure (%)              VSM (2005)  LRA (2006a)  Pertinence (2006b)
Noun-Modifier (5 classes)  43.2        54.6         50.2
83 2008: A Uniform Approach to Analogies, Synonyms, Antonyms, and Associations
- Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), August 2008, Manchester, UK, pages 905-912
84 1 Introduction
- Many kinds of semantic relations hold between words
- we restrict our attention to
- analogous
- synonymous
- antonymous
- associated
- As far as we know, the algorithm proposed here is the first attempt to deal with all four tasks using a uniform approach.
85 1 Introduction: idea
- Reduce each task to analogy
- Synonymous
- X:Y is analogous to the pair levied:imposed
- Antonymous
- X:Y is analogous to the pair black:white
- Associated
- X:Y is analogous to the pair doctor:hospital
86 1 Introduction: Why not WordNet?
- WordNet contains all of the needed relations, but
- a corpus-based algorithm is BETTER than a lexicon
- answering 374 multiple-choice SAT analogy questions
- WordNet (Veale, 2004): 43%
- corpus-based (Turney, 2006a): 56%
- less human labor
- easy to extend to other languages
87 1 Introduction: experiments
- SAT (college entrance test)
- TOEFL
- ESL
- a set of word pairs labeled similar, associated, and both, developed for experiments in cognitive psychology
88 2 Algorithm: PairClass
- view the task of recognizing word analogies as a problem of classifying word pairs
- a standard classification problem for supervised machine learning
89 2 Algorithm: resources
- Corpus
- 5 × 10^10 words, consisting of web pages gathered by a web crawler (Charles L. A. Clarke, 2003)
- Wumpus
- an efficient search engine for passage retrieval from large corpora (http://www.wumpus-search.org/)
- built to study issues that arise when indexing dynamic text collections in multi-user environments
90 2 Algorithm: PairClass steps
- Input: word pairs, split into a training set and a testing set, e.g. mason:stone
- Step 1: generate morphological variations
- e.g. masons:stones
- Step 2: search a large corpus for all phrases containing the pair
- phrases of the form [0 to 1 words] X [0 to 3 words] Y [0 to 1 words]
- e.g. "the mason cut the stone with"
- Step 3: generate patterns from each phrase
- e.g. "the X cut Y with"
- 2^(n-2) patterns per phrase
- Step 4: reduce the number of patterns
- keep the top kN patterns, k = 20
- Step 5: generate feature vectors
- Step 6: apply a standard supervised learning algorithm
- SMO with an RBF kernel (Weka)
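The 2^(n-2) count in step 3 comes from optionally replacing each word other than X and Y with a wildcard. A minimal sketch (the wildcard symbol "*" is an assumption; the paper's pattern syntax may differ):

```python
from itertools import product

# From a retrieved n-word phrase, generate the 2^(n-2) patterns obtained
# by keeping X and Y fixed and replacing each other word with "*" or not.
def generate_patterns(phrase_words, x, y):
    slots = [[w] if w in (x, y) else [w, "*"] for w in phrase_words]
    return {" ".join(combo) for combo in product(*slots)}

phrase = ["the", "mason", "cut", "the", "stone", "with"]
pats = generate_patterns(phrase, "mason", "stone")
print(len(pats))  # 6 words, 2 fixed -> 2^4 = 16 patterns
```

With n up to 6 (X, Y, and 0-3 intervening plus flanking words), each phrase yields at most 16 patterns, so the pattern space stays manageable before the top-kN cut in step 4.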
91 PairClass vs. LRA (Turney, 2006a)
- PairClass does not use a lexicon to find synonyms for the input word pairs
- a pure corpus-based algorithm can handle synonyms without a lexicon
- PairClass uses a support vector machine (SVM) instead of a nearest neighbour (NN) learning algorithm
- PairClass does not use SVD to smooth the feature vectors
- it has been our experience that SVD is not necessary with SVMs
92
- Measure of similarity
- PairClass: probability estimates, more useful
- Turney (2006a): cosine
- the automatically generated patterns are slightly more general
- PairClass: [0 to 1 words] X [0 to 3 words] Y [0 to 1 words]
- Turney (2006a): X [0 to 3 words] Y
- the morphological processing in PairClass (Minnen et al., 2001) is more sophisticated than in Turney (2006a)
93 3 Experiment: SAT Analogies
- uses a set of 374 multiple-choice questions from the SAT college entrance exam
- E.g.
- treated as a binary classification problem
94 3 Experiment: SAT Analogies (cont.)
- 1st DIFFICULTY: no negative examples
- the training set consists of one positive example (the stem pair) and the testing set consists of five unlabeled examples (the five choice pairs)
- Solution
- randomly choose one of the other 373 questions to be a negative example
- use PairClass to estimate the probability that each testing example is positive, and guess the testing example with the highest probability
95 (No transcript)
96 3 Experiment: SAT Analogies (cont.)
- 2nd DIFFICULTY
- the algorithm is very unstable, for lack of examples
- Solution
- to increase stability, repeat the learning process 10 times, using a different randomly chosen negative training example each time
- average the 10 probabilities
97 PairClass: accuracy of 52.1%
98 3 Experiment: TOEFL Synonyms
- Recognizing synonyms
- a set of 80 multiple-choice synonym questions from the TOEFL
99 - View it as a binary classification problem
100 3 Experiment: TOEFL Synonyms (cont.)
- 80 questions → 80 positive and 240 negative pairs
- apply PairClass using ten-fold cross-validation
- in each random fold, 90% of the pairs are used for training and 10% for testing
- for each fold, the model learned from the training set is used to assign probabilities to the pairs in the testing set
- the folds are non-overlapping, so they cover the whole dataset
- choose the choice with the highest probability
76.1
1023 Experiment Synonyms and Antonyms
- a set of 136 ESL practice questions
1033 Experiment Synonyms and Antonyms
- By patterns hand-coded
- Lin et al. (2003)
- two patterns, from X to Y and either X or Y
. - Antonyms they occasionally appear in a large
corpus in one of these two patterns - Synonyms very rare to appear in these patterns.
- PairClass
- automatically
1043 Experiment Synonyms and Antonyms
- RESULT
- PairClass ten-fold cross-validation
- accuracy of 75.0 (ten-fold cross-validation)
- Baseline
- accuracy of 65.4 (Always guessing the majority
class) - NO COMPARISON
105 3 Experiment: Similar, Associated, and Both
- Lund et al. (1995) evaluated their corpus-based algorithm for measuring word similarity with word pairs labeled similar, associated, or both
- these 144 labeled pairs were originally created for cognitive psychology experiments with human subjects
106 3 Experiment: Similar, Associated, and Both (cont.)
- Lund et al. (1995)
- did not measure accuracy
- showed that their algorithm's similarity scores were correlated with the response times of human subjects in priming tests
- PairClass with ten-fold cross-validation
- accuracy of 77.1%
- Baseline
- guessing the majority class, or random guessing: 33.3%
- since the three classes are of equal size
107 3 Experiment: summary
- For the first two experiments
- PairClass is not the best
- but it performs competitively
- For the second two experiments
- PairClass performs significantly above the baselines
108 State of the art

Year   Method     Type          Synonym (%)  Analogy (%)
2001   PMI-IR     Corpus-based  73.75
2003   PR         Hybrid        97.50
2005   VSM        Corpus-based               47.1
2006a  LRA        Corpus-based               56.1
2006b  PERT       Corpus-based               53.5
2008   PairClass  Corpus-based  76.1         52.1
       HUMAN                    64.5         57.0
109 Thanks! o_0