Overview of Peter D. Turney - PowerPoint PPT Presentation

About This Presentation
Title:

Overview of Peter D. Turney

Description:

Synonym, Antonym, Hypernym, Hyponym, Meronym:substance, Meronym:part, Meronym: ... 'Y such as the X' can be used to mine large text corpora for hypernym-hyponym ... – PowerPoint PPT presentation


Transcript and Presenter's Notes

Title: Overview of Peter D. Turney


1
Overview of Peter D. Turney's Work on Similarity
  • From 2001-2008

2
similarity
  • Attributional similarity (2001 - 2003)
  • the degree to which two words are synonymous
  • also known as
  • Semantic relatedness and semantic association
  • Relational similarity (2005 - 2008)
  • the degree to which two relations are analogous

3
Objective evaluation of the approaches
  • Attributional similarity
  • 80 TOEFL synonym questions
  • Relational similarity
  • 374 SAT analogy questions

4
2001: Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL
  • In Proceedings of the 12th European Conference on Machine Learning, pages 491-502, Springer, Berlin, 2001.

5
1 Introduction
  • Motivation
  • the degree of synonymy between two words can be estimated from statistics on the contexts they share
  • key idea: co-occurrence
  • "a word is characterized by the company it keeps" (Firth)

6
1 Introduction: idea
  • given a problem word and alternative choices choice1, choice2, …, choicen
  • compute score(choicei) for each choicei and guess the choice with the highest score
  • uses Pointwise Mutual Information (PMI)
  • to analyze statistical data collected by
    Information Retrieval (IR).

7
2 Formulas
  • Score 1: plain co-occurrence, using the AND operator
  • Score 2: the NEAR operator (co-occurrence within a ten-word window)

8
2 Formulas (cont.)
  • Score 3: filters out antonyms (e.g., big vs. small) by excluding hits where "not" occurs near the words
  • Score 4: takes context into account
  • a context word taken from the question is added to the queries (a sketch of scores 1 and 2 follows below)
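A minimal sketch of how scores 1 and 2 can be computed, assuming a hypothetical hits() helper that returns a search engine's document count for a query (AND and NEAR are AltaVista's old query operators; this is an illustration, not Turney's code):

```python
def hits(query: str) -> int:
    """Hypothetical helper: number of documents matching the query."""
    raise NotImplementedError

def score1(problem: str, choice: str) -> float:
    # Score 1: plain co-occurrence, hits(problem AND choice) / hits(choice).
    return hits(f"{problem} AND {choice}") / hits(choice)

def score2(problem: str, choice: str) -> float:
    # Score 2: NEAR restricts co-occurrence to a ten-word window.
    return hits(f"{problem} NEAR {choice}") / hits(choice)

def best_choice(problem: str, choices: list[str]) -> str:
    # Slide 6's idea: score every choice, guess the highest-scoring one.
    return max(choices, key=lambda c: score2(problem, c))
```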

9
3 Experiments
  • Compared with
  • LSA (Latent Semantic Analysis)
  • builds a 61,000 × 30,473 term-document matrix X from an encyclopedia corpus
  • reduces its dimensionality with SVD
  • Element: tf-idf weight
  • Similarity: cosine
  • evaluated on the TOEFL questions

10
  • Dataset
  • 80 TOEFL questions
  • 50 ESL questions

11
3 Experiments: PMI-IR vs. LSA
  • Speed
  • PMI-IR queries a web search engine
  • about 2 s per query, 8 queries per question
  • LSA needs heavy preprocessing
  • reducing the 61,000 × 30,473 matrix to 61,000 × 300 with SVD takes hours on a UNIX workstation

12
3 Experiments
  • On the 80 TOEFL questions and 50 ESL questions
  • PMI-IR: 73.75% (59/80) and 74% (37/50)
  • Human average: 64.5% (51.6/80)
  • LSA: 64.4% (51.5/80)
  • Conclusion: PMI-IR wins by about 10 points
  • Analysis
  • the NEAR operator matters: a smaller chunk size helps
  • LSA: 64.4%
  • PMI-IR with AND: 62.5%
  • PMI-IR with NEAR: 72.5%

13
4 Conclusion
  • Combines PMI with IR
  • search-engine hit counts supply the co-occurrence statistics
  • PMI
  • a simple, unsupervised measure of association
  • its strength comes from the enormous corpus (the Web) behind the search engine

14
2003: Combining Independent Modules in Lexical Multiple-Choice Problems
  • In RANLP-03, pages 482-489, Borovets, Bulgaria
  • (RANLP: Recent Advances in Natural Language Processing)

15
1 Introduction
  • There are several approaches to natural language problems
  • no single approach is best for all problem instances
  • how about combining them?

16
1 Introduction
  • Two main contributions
  • introduces and evaluates several new modules
  • for answering multiple-choice synonym questions and analogy questions
  • 3 merging rules
  • presents a novel product rule
  • compares it with 2 similar existing merging rules

17
2 Merging rules: the parameters
  • the parameter of the rules: w
  • p_{h,i,j} > 0 is the probability assigned by module i to choice j of instance h
  • i indexes modules: 1 ≤ i ≤ n
  • h indexes instances: 1 ≤ h ≤ m
  • j indexes choices: 1 ≤ j ≤ k
  • D_{h,w}(j) is the probability assigned by the merging rule to choice j of training instance h when the weights are set to w
  • 1 ≤ a(h) ≤ k is the correct answer for instance h

18
2 Merging rules: existing
  • mixture rule: very common (a weighted average of the modules' probabilities)
  • logarithmic rule (a weighted geometric combination)

19
2 Merging rules: novel
  • product rule (see the sketch below)
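A hedged sketch of the three rules: p is an (n modules × k choices) array holding the module probabilities p_{h,i,j} for one instance h, w is the weight vector, and renormalizing over the k choices is my assumption about how D_{h,w} is made a probability:

```python
import numpy as np

def mixture(p: np.ndarray, w: np.ndarray) -> np.ndarray:
    # Weighted arithmetic average of the modules' probabilities.
    d = w @ p
    return d / d.sum()

def logarithmic(p: np.ndarray, w: np.ndarray) -> np.ndarray:
    # Weighted geometric combination: prod_i p[i] ** w[i] (needs p > 0).
    d = np.exp(w @ np.log(p))
    return d / d.sum()

def product(p: np.ndarray, w: np.ndarray) -> np.ndarray:
    # Novel product rule: each module is smoothed toward the uniform
    # distribution 1/k before the product over modules is taken.
    k = p.shape[1]
    d = np.prod(w[:, None] * p + (1 - w[:, None]) / k, axis=0)
    return d / d.sum()
```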

20
3 Synonym dataset
  • a training set of 431 4-choice synonym questions
  • randomly divided them into 331 training questions
    and 100 testing questions.
  • Optimize w with the training set

21
3 Synonym Modules
  • LSA
  • PMI-IR
  • Thesaurus
  • queries Wordsmyth (www.wordsmyth.net)
  • create synonym lists for both the stem and the choices
  • score them by their overlap
  • Connector
  • used summary pages from querying Google with a
    pair of words
  • Weighted sum of
  • the number of times the words appear separated by a symbol such as "/", "(", or a comma,
  • or by means, defined, equals, synonym, or whitespace, and
  • the number of times dictionary or thesaurus appear

22
3 Synonym combine results
  • the three rules' accuracies are nearly identical
  • the product and logarithmic rules assign higher probabilities to correct answers
  • as evidenced by the mean likelihood

23
3 Synonym compare with other approaches
24
4 Analogies dataset
  • 374 5-choice instances
  • randomly split the collection into 274 training
    instances and 100 testing instances.
  • E.g., cat:meow
  • (a) mouse:scamper
  • (b) bird:peck
  • (c) dog:bark
  • (d) horse:groom
  • (e) lion:scratch

25
4 Analogies modules
  • Phrase vectors
  • Create a vector r to represent the relationship between X and Y
  • Phrases with 128 patterns
  • E.g., "X for Y", "Y with X", "X in the Y", "Y on X"
  • Query and record the number of hits
  • Measure by cosine
  • Thesaurus paths (WordNet)
  • degree of similarity between paths

26
4 Analogies combine results
  • Lexical relation modules
  • a set of more specific modules using the WordNet
  • 9 modules; each checks one relationship
  • Synonym, Antonym, Hypernym, Hyponym, Meronym:substance, Meronym:part, Meronym:member, Holonym:substance, Holonym:member
  • Check the stem first, then the choices
  • Similarity
  • Make use of definition
  • Similaritydict uses dictionary.com and
  • Similaritywordsmyth uses wordsmyth.net
  • Given A:B::C:D, the similarity score combines sim(A, C) and sim(B, D)

27
(No Transcript)
28
5 Conclusion
  • applied three trained merging rules to TOEFL
    questions
  • Accuracy: 97.5%
  • provided first results on a challenging analogy
    task with a set of novel modules that use both
    lexical databases and statistical information.
  • Accuracy: 45%
  • the popular mixture rule was consistently weaker
    than the logarithmic and product rules at
    assigning high probabilities to correct answers.

29
State of the art (accuracy)

                     LSA     HUMAN   PMI-IR (2001)   HYBRID (2003)
Synonym questions    64.4%   64.5%   73.75%          97.5%

                     HYBRID (2003)   HUMAN
Analogies            45%             57%
30
2005: Corpus-based Learning of Analogies and Semantic Relations
  • IJCAI 2005
  • Proceedings of the Nineteenth International Joint
    Conference on Artificial Intelligence, Edinburgh,
    Scotland, UK, July 30-August 5, 2005.

31
1 Introduction
  • Verbal analogy: VSM
  • A:B :: C:D
  • The novelty of the paper is the application of the VSM to measure the similarity between relationships.
  • Noun-modifier pair relations: supervised nearest-neighbour algorithm
  • Dataset: Nastase and Szpakowicz (2003), 600 noun-modifier pairs.

32
1 Introduction: examples
  • Analogy
  • Noun-modifier pair relations
  • laser printer
  • Relation: instrument

33
2 Solving Analogy Problems
  • assign scores to candidate analogies A:B::C:D
  • For multiple-choice questions, guess highest
    scoring choice
  • Sim(R1, R2)
  • difficulty is that R1 and R2 are implicit
  • attempt to learn R1 and R2 using unsupervised
    learning from a very large corpus

34
2 Solving Analogy Problems Vector Space Model
  • create vectors, r1 and r2, that represent
    features of R1 and R2
  • measure the similarity of R1 and R2 by the cosine of the angle θ between r1 and r2

35
2 Solving Analogy Problems: generating pair vectors
  • Generate a vector for each word pair
  • Joining terms
  • "X for Y", "Y with X", "X in the Y", "Y on X"
  • vector
  • (log(hits_1), log(hits_2), …, log(hits_128)) (see the sketch below)

[Diagram: word pair A:B → phrases built from 64 joining terms, each in both word orders → search → hit counts → log → 128-element vector]
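A minimal sketch of the pair-vector construction and the cosine comparison, again assuming a hypothetical hits() counter; only 4 of the 64 joining terms are listed, and the +1 inside the log is my smoothing assumption to avoid log(0):

```python
import math

JOINING_TERMS = ["for", "with", "in the", "on"]  # 4 of the 64, for illustration

def hits(phrase: str) -> int:
    """Hypothetical helper: search-engine hit count for a quoted phrase."""
    raise NotImplementedError

def pair_vector(x: str, y: str) -> list[float]:
    # Each joining term is tried in both word orders, so 64 terms
    # produce 128 phrases and hence a 128-element vector of log hits.
    phrases = ([f'"{x} {t} {y}"' for t in JOINING_TERMS]
               + [f'"{y} {t} {x}"' for t in JOINING_TERMS])
    return [math.log(hits(p) + 1) for p in phrases]

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```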
36
2 Solving Analogy Problems experiment
37
2 Solving Analogy Problems experiment
38
3 Noun-Modifier Semantic Relations
  • First attempt to classify semantic relations
    without a lexicon.

39
30 Semantic Relations of training data
40
3 Noun-Modifier Semantic Relations: algorithm
  • supervised learning: single nearest neighbour
  • nearness measure: cosine
  • cosine(training-pair vector, testing-pair vector)
  • vectors of 128 elements, same joining terms as before (see the sketch below)
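A sketch of the single-nearest-neighbour step, reusing pair_vector and cosine from the earlier sketch:

```python
def classify(test_pair, training_pairs, labels):
    # Label the test pair with the class of the training pair whose
    # 128-element vector has the highest cosine with the test vector.
    v = pair_vector(*test_pair)
    sims = [cosine(v, pair_vector(*p)) for p in training_pairs]
    return labels[max(range(len(sims)), key=sims.__getitem__)]
```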

41
3 Noun-Modifier Semantic Relations: Experiment for the 30 Classes
42
30 Semantic Relations
  • F, when precision and recall are balanced
  • 26.5%
  • F for random guessing
  • 3.3%
  • much better than random guessing
  • but still much room for improvement
  • 30 classes is hard
  • too many possibilities for confusing classes
  • try 5 classes instead
  • group classes together

43
5 Semantic Relations
44
F for the 5 Classes
45
5 Semantic Relations
  • F, when precision and recall are balanced
  • 43.2%
  • F for random guessing
  • 20.0%
  • better than random guessing
  • better than the 30 classes
  • 26.5%
  • but still room for improvement

46
Execution Time
  • experiments presented here required 76,800
    queries to AltaVista
  • 600 word pairs
  • 128 queries per word pair
  • 76,800 queries
  • as a courtesy to AltaVista, inserted a five-second delay between queries
  • processing 76,800 queries took about five days

47
Conclusion
  • The cosine metric in the VSM was used to
  • solve analogies
  • classify semantic relations
  • It performs much better than random guessing, but below human levels.

48
State of the art

accuracy     HYBRID (2003)   VSM (2005)   HUMAN
Analogies    45%             47%          57%

F-measure                    VSM (2005)
Noun-Modifier (5 classes)    43.2%
49
2006a: Similarity of Semantic Relations
  • Computational Linguistics, 32(3):379-416.

50
1 Introduction
  • Latent Relational Analysis (LRA)
  • LRA extends the VSM approach of Turney and
    Littman (2005) in three ways
  • The connecting patterns are derived automatically
    from the corpus, instead of using a fixed set of
    patterns.
  • Singular Value Decomposition (SVD) is used to
    smooth the frequency data.
  • automatically generated synonyms are used to
    explore variations of the word pairs.

51
2 A short description of LRA: pipeline
  • Generate a vector for each word pair

[Diagram: word pair A:B, plus variants built from automatically generated synonyms → search phrases from 64 joining terms → hit counts → log(hits) → patterns derived from the corpus → sparse matrix → SVD → average cosine over the variants]
52
3 Experiment: Word Analogy Questions, baseline LRA
  • Matrix: 17,232 × 8,000, density of 5.8%
  • Time required: 209:49:36, about 9 days
  • Performance

53
Experiment: Word Analogy Questions, LRA vs. VSM
  • Corpus size
  • AltaVista: 5 × 10^11 English words
  • WMTS: 5 × 10^10 English words

54
Experiment Word Analogy Questions Varying the
Parameters
55
Experiment: Word Analogy Questions, Ablation Experiments
  • No SVD: not significant, but maybe significant with more word pairs
  • No synonyms: recall drops
  • Neither: recall drops
  • The drop to the plain VSM is significant

56
Experiments with Noun-Modifier Relations
  • Dataset
  • 600 noun-modifier pairs, hand-labeled with 30 classes of semantic relations
  • Algorithm
  • Baseline: LRA with a single nearest neighbour
  • LRA supplies the distance (nearness) measure

57
(No Transcript)
58
Discussion
  • For word analogy questions
  • performance is not yet adequate for practical applications
  • speed
  • For noun-modifier classification
  • more hand-labeled data would help, but it's expensive
  • the choice of classification scheme for the semantic relations is open
  • Hybrid approach
  • combine the corpus-based approach of LRA with the lexicon-based approach of Veale (2004)

59
Conclusion of 2006a
  • LRA extends the VSM (2005):
  • Patterns are derived automatically
  • SVD is used to smooth and compress data.
  • automatically generated synonyms are used to
    explore variations of the word pairs.

60
State of the art

accuracy     HYBRID (2003)   VSM (2005)   LRA (2006a)   HUMAN
Analogies    45%             47%          56.8%         57%

F-measure                    VSM (2005)   LRA (2006a)
Noun-Modifier (5 classes)    43.2%        54.6%
61
2006b: Expressing Implicit Semantic Relations without Supervision
  • Coling/ACL-06

62
Introduction
  • Hearst (1992): pattern → X:Y
  • the pattern "Y such as the X" can be used to mine large text corpora for hypernym-hyponym pairs
  • search using the pattern "Y such as the X" and find the string "bird such as the ostrich"; then we can infer that "ostrich" is a hyponym of "bird"
  • Here we consider the inverse of this problem: X:Y → pattern
  • can we mine a large text corpus for patterns that express the implicit relations between X and Y?

63
Introduction
  • Discovering high-quality patterns
  • Pertinence: a measure of pattern quality
  • pertinent patterns are reliable for mining further word pairs with the same semantic relation

64
2 Pertinence
  • the first formal measure of quality for text-mining patterns
  • given a set of word pairs
  • and a set of patterns
  • P_i is pertinent to X_j:Y_j
  • if word pairs X_k:Y_k that are highly typical of the pattern P_i tend to be relationally similar to X_j:Y_j
  • pertinence tends to be highest with unambiguous patterns

65
2 Pertinence: definition
  • f_{k,i} is the number of occurrences in a corpus of the word pair X_k:Y_k with the pattern P_i
  • Smoothing is applied to the raw counts (the reconstructed formulas follow below)
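A hedged reconstruction of the two formulas in consistent notation; the tilde marks the smoothed counts, and the exact smoothing scheme is not recoverable from this slide:

\[ p(X_k{:}Y_k \mid P_i) = \frac{\tilde{f}_{k,i}}{\sum_{k'=1}^{n} \tilde{f}_{k',i}} \]

\[ \mathrm{pertinence}(X_j{:}Y_j, P_i) = \sum_{k=1}^{n} p(X_k{:}Y_k \mid P_i) \cdot \mathrm{sim}(X_j{:}Y_j, X_k{:}Y_k) \]

where sim is the relational similarity between the two pairs (the LRA cosine), so a pattern is pertinent exactly when its most typical pairs are relationally similar to X_j:Y_j.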
66
3 Related Work
  • Hearst (1992)
  • describes a method for finding patterns like "Y such as the X"
  • but her method requires human judgment.
  • Riloff and Jones (1999)
  • use a mutual bootstrapping technique that can
    find patterns automatically
  • but the bootstrapping requires an initial seed of
    manually chosen examples.
  • Other works all require training examples or
    initial seed patterns for each relation

67
3 Related Work
  • Turney (2006a): LRA
  • maps each pair X:Y to a high-dimensional vector v, then calculates the cosine
  • Pertinence builds on this machinery
  • A limitation
  • the semantic content of the vectors is difficult
    to interpret

68
The Algorithm
  • 1. Find phrases
  • 2. Generate patterns
  • note the pattern frequency (TF)
  • a local frequency count
  • 3. Count pair frequency
  • it's a global frequency count (DF)
  • 4. Map pairs to rows
  • both X_j:Y_j and Y_j:X_j
  • 5. Map patterns to columns
  • drop all patterns with a pair frequency less than 10
  • 1,706,845 distinct patterns → 42,032 patterns kept

69
The Algorithm (cont.)
  • 6. Build a sparse matrix
  • each element is a frequency
  • 7. Apply log and entropy weighting
  • gives more weight to patterns that vary substantially in frequency for each pair
  • 8. Apply SVD
  • 9. Calculate cosines
  • 10. Calculate conditional probabilities
  • for every word pair and every pattern
  • 11. Calculate pertinence
  • (a sketch of steps 7-9 follows below)
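A sketch of steps 7-9 in numpy; the weighting shown is the standard log-entropy scheme from the LSA literature, which I am assuming matches the paper's "log and entropy" transformation, and k = 300 is an assumed dimensionality:

```python
import numpy as np

def log_entropy(F: np.ndarray) -> np.ndarray:
    # F is a (pairs x patterns) frequency matrix. Columns whose
    # frequencies vary a lot across pairs (low entropy) get weight
    # near 1; columns spread evenly over all pairs get weight near 0.
    n = F.shape[0]
    P = F / np.maximum(F.sum(axis=0, keepdims=True), 1e-12)
    H = -(P * np.log(np.where(P > 0, P, 1.0))).sum(axis=0)
    g = 1.0 - H / np.log(n)
    return np.log(F + 1.0) * g

def svd_smooth(W: np.ndarray, k: int = 300) -> np.ndarray:
    # Step 8: keep the top k singular components.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k] * s[:k]

def cosines(X: np.ndarray) -> np.ndarray:
    # Step 9: pairwise cosine similarities between the row vectors.
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    return Xn @ Xn.T
```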

70
The Algorithm: output

[Diagram: each word pair is mapped to a ranked list of patterns (pair 1 → pattern list 1, …, pair n → pattern list n)]
71
5 Experiments with Word Analogies
  • Dataset
  • 374 college-level multiple-choice word analogies, taken from the SAT test
  • 6 × 374 = 2,244 word pairs
  • matrix: 4,194 rows × 84,064 columns
  • sparse matrix density: 0.91%

Score = (rank_stem + rank_choice) / 2
72
(No Transcript)
73
  • the four highest-ranking patterns for the stem and solution of the first example

74
  • the top five pairs match the pattern "Y such as the X".

75
Comparing with other measures
76
Experiments with Noun-Modifiers
77
Method and Result
  • Method
  • A single nearest neighbour algorithm with
    leave-one-out cross-validation.
  • The distance between two noun-modifier pairs is
    measured by the average rank of their best
    shared pattern.
  • Result

78
More
  • For the 5 general classes

79
Comparing with other measures
80
Discussion
  • Time
  • word analogies: 5 hours, vs. 5 days (2005) and 9 days (2006a)
  • noun-modifiers: 9 hours
  • the majority of the time was spent searching
  • Performance
  • near the level of the average senior high school student (54.6% vs. 57%)
  • For applications such as building a thesaurus,
    lexicon, or ontology, this level of
    performance suggests that our algorithm could
    assist, but not replace, a human expert.

81
Conclusion
  • LRA is a black box
  • The main contribution of this paper is the idea
    of pertinence
  • use it to find patterns that express the
    implicit semantic relations between two words.

82
State of the art

accuracy     HYBRID (2003)   VSM (2005)   LRA (2006a)   pertinence (2006b)   HUMAN
Analogies    45%             47%          56.8%         55.7%                57%

F-measure                    VSM (2005)   LRA (2006a)   pertinence (2006b)
Noun-Modifier (5 classes)    43.2%        54.6%         50.2%
83
2008: A Uniform Approach to Analogies, Synonyms, Antonyms, and Associations
  • Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), August 2008, Manchester, UK, pages 905-912

84
1 Introduction
  • There are many semantic relations between word pairs; we restrict our attention to four
  • analogous
  • synonymous
  • antonymous
  • associated
  • As far as we know, the algorithm proposed here is the first attempt to deal with all four tasks using a uniform approach.

85
1 Introduction: idea
  • Analogous
  • Synonymous
  • X:Y is analogous to the pair levied:imposed
  • Antonymous
  • X:Y is analogous to the pair black:white
  • Associated
  • X:Y is analogous to the pair doctor:hospital

86
1 Introduction: Why not WordNet?
  • WordNet contains all of the needed relations,
  • but a corpus-based algorithm is BETTER than a lexicon-based one
  • answering 374 multiple-choice SAT analogy questions
  • WordNet (Veale, 2004): 43%
  • corpus-based (Turney, 2006a): 56%
  • less human labor
  • easy to extend to other languages

87
1 Introduction experiments
  • SAT college entrance test
  • TOEFL
  • ESL
  • a set of word pairs that are labeled similar,
    associated, and both, developed for experiments
    in cognitive psychology

88
2 Algorithm PairClass
  • view the task of recognizing word analogies
  • as a problem of classifying word pairs
  • standard classification problem for supervised
    machine learning

89
2 Algorithm: resources
  • Corpus
  • 5 × 10^10 words, consisting of web pages gathered by a web crawler (Clarke, Charles L.A., 2003)
  • Wumpus
  • an efficient search engine for passage retrieval from large corpora (http://www.wumpus-search.org/)
  • built to study issues that arise in indexing dynamic text collections in multi-user environments

90
2 Algorithm: the PairClass pipeline
  • Input: labeled training pairs and unlabeled testing pairs, e.g. mason:stone
  • Step 1: generate morphological variations (mason:stone → masons:stones)
  • Step 2: search a large corpus for all phrases of the form [0 to 1 words] X [0 to 3 words] Y [0 to 1 words], e.g. "the mason cut the stone with"
  • Step 3: generate patterns from each phrase, e.g. "the X cut * Y with", "X * the Y"
  • an n-word phrase yields 2^(n-2) patterns (see the sketch below)
  • Step 4: reduce the number of patterns: keep the top k·N patterns, with k = 20 and N the number of input pairs
  • Step 5: generate feature vectors
  • Step 6: apply a standard supervised learning algorithm: Weka's SMO SVM with an RBF kernel
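A sketch of step 3's wildcard pattern generation: every token other than X and Y either stays or becomes "*", so an n-token phrase yields 2^(n-2) patterns (the helper name is mine, not the paper's):

```python
from itertools import product

def patterns(phrase: str, x: str, y: str) -> list[str]:
    toks = phrase.split()
    xi, yi = toks.index(x), toks.index(y)
    slots = [i for i in range(len(toks)) if i not in (xi, yi)]
    out = []
    # One pattern per subset of the non-X/Y tokens turned into "*".
    for mask in product([False, True], repeat=len(slots)):
        t = list(toks)
        t[xi], t[yi] = "X", "Y"
        for wild, i in zip(mask, slots):
            if wild:
                t[i] = "*"
        out.append(" ".join(t))
    return out

# patterns("the mason cut the stone with", "mason", "stone") yields
# 2**4 = 16 patterns, including "the X cut * Y with" and "* X * * Y *".
```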
91
PairClass vs. LRA (Turney, 2006a)
  • PairClass does not use a lexicon to find synonyms
    for the input word pairs.
  • a pure corpus-based algorithm can handle synonyms
    without a lexicon.
  • PairClass uses a support vector machine (SVM)
    instead of a nearest neighbour (NN) learning
    algorithm.
  • PairClass does not use SVD to smooth the feature
    vectors.
  • It has been our experience that SVD is not
    necessary with SVMs.

92
  • Measure of similarity
  • PairClass: probability estimates, more useful
  • Turney (2006a): cosine
  • The automatically generated patterns are slightly more general
  • PairClass: [0 to 1 words] X [0 to 3 words] Y [0 to 1 words]
  • Turney (2006a): X [0 to 3 words] Y
  • The morphological processing in PairClass (Minnen et al., 2001) is more sophisticated than in Turney (2006a).

93
3 Experiment SAT Analogies
  • use a set of 374 multiple-choice questions from
    the SAT college entrance exam.
  • Eg.

a binary classification problem
94
3 Experiment SAT Analogies
  • 1st DIFFICULTY: no negative examples
  • the training set consists of one positive example (the stem pair) and the testing set consists of five unlabeled examples (the five choice pairs)
  • Solution
  • randomly choose the stem pair of one of the other 373 questions as a negative example
  • use PairClass to estimate the probability that each testing example is positive, and guess the testing example with the highest probability

95
(No Transcript)
96
3 Experiment SAT Analogies
  • 2nd DIFFICULTY
  • the algorithm is very unstable, for lack of examples
  • Solution
  • to increase stability, repeat the learning process 10 times, using a different randomly chosen negative training example each time
  • average the 10 probabilities (see the sketch below)
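A sketch of the combined workaround for both difficulties, assuming a hypothetical train_and_score(positives, negatives, candidates) helper that trains PairClass's classifier and returns one probability per candidate pair:

```python
import random

def solve_sat_question(stem_pair, choice_pairs, other_stem_pairs,
                       train_and_score, runs: int = 10) -> int:
    # One positive example is too little: train 10 times, each with a
    # different random negative example, and sum the probabilities
    # (the argmax of the sums equals the argmax of the averages).
    totals = [0.0] * len(choice_pairs)
    for _ in range(runs):
        negative = random.choice(other_stem_pairs)
        probs = train_and_score([stem_pair], [negative], choice_pairs)
        totals = [t + p for t, p in zip(totals, probs)]
    # Guess the choice with the highest average probability.
    return max(range(len(totals)), key=totals.__getitem__)
```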

97
PairClass: accuracy of 52.1%
98
3 Experiment TOEFL Synonyms
  • Recognizing synonyms
  • a set of 80 multiple-choice synonym questions
    from the TOEFL

99
  • View it as a binary classification problem

100
3 Experiment: TOEFL Synonyms
  • 80 questions give 80 positive and 240 negative examples
  • apply PairClass using ten-fold cross-validation
  • in each random fold, 90% of the pairs are used for training and 10% for testing
  • for each fold, the model learned from the training set is used to assign probabilities to the pairs in the testing set
  • the folds are non-overlapping, so together they cover the whole dataset (see the fold sketch below)
  • choose the choice with the highest probability
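A minimal sketch of the fold construction; each pair lands in the test split of exactly one fold, which is what makes the ten folds non-overlapping and exhaustive:

```python
import random

def ten_folds(n_pairs: int, seed: int = 0) -> list[list[int]]:
    # Shuffle once, then deal the indices round-robin into 10 folds.
    idx = list(range(n_pairs))
    random.Random(seed).shuffle(idx)
    return [idx[f::10] for f in range(10)]

# For each fold f: train on the other nine folds (90% of the pairs),
# then assign probabilities to the pairs in fold f (the held-out 10%).
```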

101
PairClass: accuracy of 76.1%
102
3 Experiment Synonyms and Antonyms
  • a set of 136 ESL practice questions

103
3 Experiment: Synonyms and Antonyms
  • By hand-coded patterns
  • Lin et al. (2003)
  • two patterns: "from X to Y" and "either X or Y"
  • antonyms occasionally appear in a large corpus in one of these two patterns
  • synonyms appear in these patterns only very rarely
  • PairClass
  • finds its patterns automatically

104
3 Experiment Synonyms and Antonyms
  • RESULT
  • PairClass: accuracy of 75.0% (ten-fold cross-validation)
  • Baseline: accuracy of 65.4% (always guessing the majority class)
  • no prior results to compare against

105
3 Experiment Similar, Associated, and Both
  • Lund et al. (1995) evaluated their corpus-based
    algorithm for measuring word similarity with word
    pairs that were labeled similar, associated, or
    both.
  • These 144 labeled pairs were originally created
    for cognitive psychology experiments with human
    subjects

106
3 Experiment Similar, Associated, and Both
  • Lund et al. (1995)
  • did not measure accuracy
  • showed that their algorithm's similarity scores were correlated with the response times of human subjects in priming tests
  • PairClass, with ten-fold cross-validation
  • accuracy of 77.1%
  • Baseline
  • guessing the majority class, or guessing randomly: 33.3%
  • since the three classes are of equal size

107
3 Experiment summary
  • For the first two experiments
  • PairClass is not the best,
  • but it performs competitively
  • For the last two experiments,
  • PairClass performs significantly above the baselines.

108
State of the art
YEAR    Method      Type           Synonym   Analogy
2001    PMI-IR      corpus-based   73.75%
2003    PR          hybrid         97.50%
2005    VSM         corpus-based             47.1%
2006a   LRA         corpus-based             56.1%
2006b   PERT        corpus-based             53.5%
2008    PairClass   corpus-based   76.1%     52.1%
        HUMAN                      64.5%     57.0%
109
Thank you! o_0
  • Any Questions?