Title: Word Sense Disambiguation
1. Word Sense Disambiguation

German Rigau i Claramunt
http://www.lsi.upc.es/rigau
TALP Research Center
Departament de Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya

2. WSD: Outline
- Setting
- Unsupervised WSD systems
- Supervised WSD systems
- Using the Web and EWN for WSD
3. Using the Web and EWN for WSD: Setting
- Word Sense Disambiguation
- is the problem of assigning the appropriate meaning (sense) to a given word in a text
- WSD is perhaps the great open problem at the lexical level of NLP (Resnik & Yarowsky 97)
- WSD resolution would allow
- acquisition of subcategorisation structure for parsing
- improving existing Information Retrieval
- Machine Translation
- Natural Language Understanding
4. Using the Web and EWN for WSD: Setting
- Example
- Senses (WordNet 1.5)
- age 1: the length of time something (or someone) has existed ("his age was 71"; "it was replaced because of its age")
- age 2: a historic period ("the Victorian age"; "we live in a litigious age")
- DSO Corpus examples (Ng 96)
- He was mad about stars at the >> age 1 << of nine.
- About 20,000 years ago the last ice >> age 2 << ended.
5. Using the Web and EWN for WSD: Setting
- Knowledge-Driven WSD (Unsupervised)
- knowledge-based WSD
- 100% coverage
- 55% accuracy (SensEval-1)
- No Training Process
- Large-scale lexical knowledge resources
- WordNet
- MRDs
- Thesauri
6. Using the Web and EWN for WSD: Setting
- Data-Driven WSD (Supervised)
- corpus-based WSD
- statistical WSD
- Machine-Learning WSD
- no full coverage
- 75% accuracy (SensEval-1)
- Training Process
- learning from large amounts of sense-annotated corpora
- (Ng 97): an estimated effort of 16 person-years per language
7. Unsupervised Word Sense Disambiguation Systems

German Rigau i Claramunt
http://www.lsi.upc.es/rigau
TALP Research Center
Departament de Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya
8. Unsupervised WSD Systems: Outline
- Setting
- Knowledge-driven WSD methods
- MRDs
- Thesauri + Corpus
- LKBs
- LKBs: Conceptual Distance
- LKBs: Conceptual Density
- LKBs + Corpus
- Experiments: Genus Sense Disambiguation
- Future Work
9. Unsupervised WSD Systems: Setting
- Knowledge-Driven (Unsupervised)
- No need of large annotated corpora
- Tested on unrestricted domains (words and senses)
- (-) Worse results
10. Unsupervised WSD Systems: MRDs
- Lesk Method (see the sketch below)
- (Lesk 86)
- counting word overlap between the context and the dictionary senses of the word
- (Cowie et al. 92)
- simulated annealing for overcoming the combinatorial explosion, using LDOCE
- (Wilks & Stevenson 97)
- simulated annealing
- 57% accuracy at the sense level
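A minimal Python sketch of the Lesk overlap heuristic; the toy sense inventory (built from the WordNet glosses of slide 4) and the test sentence are illustrative, not the LDOCE data cited above:

    # Lesk (86): pick the sense whose dictionary definition shares
    # the most words with the context.
    def lesk(context_words, sense_definitions):
        context = set(w.lower() for w in context_words)
        best_sense, best_overlap = None, -1
        for sense, definition in sense_definitions.items():
            overlap = len(context & set(definition.lower().split()))
            if overlap > best_overlap:
                best_sense, best_overlap = sense, overlap
        return best_sense

    senses = {
        "age 1": "the length of time something or someone has existed",
        "age 2": "a historic period",
    }
    print(lesk("we live in a litigious historic period".split(), senses))  # age 2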
11. Unsupervised WSD Systems: MRDs
- Co-occurrence Word Vectors
- (Wilks et al. 93)
- word-context vectors from LDOCE
- testing a large set of relatedness functions
- 13 senses of the word bank
- 45% accuracy
- (Rigau et al. 97)
- (Noun) Genus Sense Disambiguation
- 60% accuracy
12. Unsupervised WSD Systems: MRDs
Example: strongest co-occurrence pairs for "queso" (cheese) out of 371,616 connections.
(The columns appear to be: association score, a secondary score, pair count, word pair, and the individual word frequencies.)

 11.8004  9.8  16  elaborado    queso      35  113
 10.8938  8.0  23  pasta        queso     178  113
 10.4846  7.5  25  leche        queso     274  113
 10.2483  9.2  13  oveja        queso      45  113
  9.1513  7.6  16  queso        sabor     113  160
  7.4956  8.3   8  queso        tortilla  113   51
  6.7732  7.5   8  queso        vaca      113   84
  6.5830  6.1  12  maíz         queso     347  113
  6.2208  8.9   5  queso        suero     113   21
  6.1509  8.8   5  mantequilla  queso      22  113
  6.1474  7.9   6  compacta     queso      50  113
  5.9918  7.7   6  picante      queso      55  113
  5.9002  9.8   4  manchego     queso       9  113
  5.6805  7.3   6  cabra        queso      75  113
  5.6300  5.9   9  pan          queso     287  113
13. Unsupervised WSD Systems: Thesauri + Corpus
- (Yarowsky 92)
- uses Roget's Thesaurus categories to partition Grolier's Encyclopedia
- 1042 categories
- 92% accuracy for 12 polysemous words
- (Yarowsky 95)
- seed words
- (Liddy & Paik 92)
- subject-code correlation matrix
- 122 LDOCE semantic codes
- 166 sentences of the Wall Street Journal
- 89% correct subject codes
14. Unsupervised WSD Systems: LKBs Conceptual Distance
- (Rada et al. 92)
- length of the shortest path
- (Sussna 93)
- (Agirre et al. 94)
- (Rigau 94; Rigau et al. 95, 97; Atserias et al. 97)
- length of the shortest path (formulas below)
- specificity of the concepts
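In LaTeX notation, a sketch of the two measures mentioned here; the depth-weighted variant follows my reading of (Agirre et al. 94; Rigau et al. 95):

    % Shortest-path conceptual distance between two concepts:
    \mathrm{dist}(c_1, c_2) = \min_{\text{paths from } c_1 \text{ to } c_2} \mathrm{length}(\text{path})

    % Variant weighting each node on the path by its specificity (depth),
    % taken over all sense pairs of the two words:
    \mathrm{dist}(w_1, w_2) = \min_{c_i \in w_1,\; c_j \in w_2}
        \sum_{c_k \in \mathrm{path}(c_i, c_j)} \frac{1}{\mathrm{depth}(c_k)}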
15. Unsupervised WSD Systems: LKBs Conceptual Density
16. Unsupervised WSD Systems: LKBs Conceptual Density
- (Agirre & Rigau 95, 96), formula sketched below
- length of the shortest path
- the depth in the hierarchy
- concepts in a dense part of the hierarchy are relatively closer than those in a sparser region
- the measure should be independent of the number of concepts involved
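For reference, the basic Conceptual Density formula of (Agirre & Rigau 96) for a subhierarchy rooted at concept c containing m senses of the context words (the paper further tunes the exponent for smoothing):

    CD(c, m) = \frac{\sum_{i=0}^{m-1} \mathrm{nhyp}^{\,i}}{\mathrm{descendants}_c}

where nhyp is the mean number of hyponyms per node under c and descendants_c the total number of concepts below c.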
17. Unsupervised WSD Systems: LKBs + Corpus
- (Resnik 95)
- Information Content (formulas below)
- (Richardson et al. 94)
- (Jiang & Conrath 97)
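A sketch of the corpus-based measures behind these references, in LaTeX notation:

    % Information content of a concept, from corpus frequencies (Resnik 95):
    IC(c) = -\log P(c)

    % Similarity as the IC of the most informative common subsumer:
    \mathrm{sim}(c_1, c_2) = \max_{c \,\in\, S(c_1, c_2)} IC(c)

    % (Jiang & Conrath 97) distance, combining both concepts with their
    % lowest common subsumer:
    \mathrm{dist}(c_1, c_2) = IC(c_1) + IC(c_2) - 2\, IC(\mathrm{lcs}(c_1, c_2))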
18. Unsupervised WSD Systems: Experiments on Genus Sense Disambiguation
- Unsupervised WSD
- Unrestricted WSD (100% coverage)
- Eight heuristics (McRoy 92)
- Combining several lexical resources
- Combining several methods
19. Unsupervised WSD Systems: Experiments on Genus Sense Disambiguation
- 0) Monosemous Genus Term
- 1) Entry Sense Ordering
- 2) Explicit Semantic Domain
- 3) Word Matching (Lesk 86)
- 4) Simple Concordance
- 5) Co-occurrence Word Vectors
- 6) Semantic Vectors
- 7) Conceptual Distance
20. Unsupervised WSD Systems: Experiments on Genus Sense Disambiguation
21. Unsupervised WSD Systems: Experiments on Genus Sense Disambiguation
- Knowledge provided by each heuristic
22. Supervised Word Sense Disambiguation Systems

German Rigau i Claramunt
http://www.lsi.upc.es/rigau
TALP Research Center
Departament de Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya
23. WSD using ML algorithms: Outline
- Setting
- Methodology
- Machine Learning algorithms
- Naive Bayes (Mooney 98)
- Snow (Dagan et al. 97)
- Exemplar-based (Ng 97)
- LazyBoosting (Escudero et al. 00)
- Experimental Results
- Naive Bayes vs. Exemplar Based
- Portability and Tuning of Supervised WSD
- Future Work
24. WSD using ML algorithms: Setting
- Data-Driven (Supervised)
- Better results
- (-) Need of large corpora
- knowledge acquisition bottleneck (Gale et al. 93; Ng 97)
- (-) Tested on limited domains (words and senses)
25. WSD using ML algorithms: Setting
- Current research lines for opening the bottleneck
- Design of efficient example sampling methods (Engelson & Dagan 96; Fujii et al. 98)
- Use of WordNet and the Web to automatically obtain examples (Leacock et al. 98; Mihalcea & Moldovan 99)
- Use of unsupervised methods for estimating parameters (Pedersen & Bruce 98)
26. WSD using ML algorithms: Setting
- Contradictory Previous Work
- (Mooney 98)
- Student's t-test of significance
- n-fold cross-validation
- (-) Only the word "line", with 4,149 examples and 6 senses (Leacock et al. 93)
- (-) Neither parameter setting nor algorithm tuning
- (Ng 97)
- Large corpora (192,800 occurrences of 191 words)
- (-) Direct test (no n-fold cross-validation)
- (-) Small set of features
27. WSD using ML algorithms: Outline
- Setting
- Methodology
- Machine Learning algorithms
- Naive Bayes (Mooney 98)
- Snow (Dagan et al. 97)
- Exemplar-based (Ng 97)
- LazyBoosting (Escudero et al. 00)
- Experimental Results
- Naive Bayes vs. Exemplar Based
- Portability and Tuning of Supervised WSD
- Future Work
28. WSD using ML algorithms: Methodology
- Main goals
- Study supervised methods for WSD
- Use them with examples automatically extracted from the Web using WordNet
- Rigorous direct comparisons
- Supervised WSD Methods
- Naive Bayes
- state-of-the-art accuracy (Mooney 98)
- Snow
- from Text Categorization (Dagan et al. 97)
- Exemplar-based
- state-of-the-art accuracy (Ng 97)
- Boosting
- from Text Categorization (Schapire & Singer, to appear; Escudero, Màrquez & Rigau 2000)
29. WSD using ML algorithms: Methodology
- Evaluation (Dietterich 98), as sketched below
- 10-fold cross-validation
- Student's t-test of significance
- Data
- DSO corpus from the LDC (Ng 96)
- 192,800 occurrences of 191 words (121 nouns, 70 verbs)
- avg. number of senses: 7.2 (N), 12.6 (V), 9.2 (all)
- WSJ Corpus (Corpus A)
- Brown Corpus (Corpus B)
- Sets of attributes
- Set A (Ng 97)
- small set of features
- no broad-context attributes
- Set B (Ng 96)
- large set of features
- broad-context attributes
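A hedged Python sketch of this evaluation protocol; `evaluate_a` and `evaluate_b` are dummy stand-ins for training and testing two WSD systems on one fold:

    import numpy as np
    from scipy.stats import ttest_rel

    def kfold_accuracies(evaluate, n_examples, k=10, seed=0):
        """evaluate(train_idx, test_idx) -> accuracy on the held-out fold."""
        idx = np.random.default_rng(seed).permutation(n_examples)
        folds = np.array_split(idx, k)
        accs = []
        for i, test in enumerate(folds):
            train = np.concatenate([f for j, f in enumerate(folds) if j != i])
            accs.append(evaluate(train, test))
        return accs

    # Dummy stand-ins; a real system would train on `train`, test on `test`.
    rng = np.random.default_rng(1)
    evaluate_a = lambda train, test: 0.66 + rng.normal(0, 0.01)  # e.g. NB
    evaluate_b = lambda train, test: 0.68 + rng.normal(0, 0.01)  # e.g. EB

    acc_a = kfold_accuracies(evaluate_a, n_examples=1000)
    acc_b = kfold_accuracies(evaluate_b, n_examples=1000)
    t, p = ttest_rel(acc_a, acc_b)  # paired t-test over the 10 folds
    print(f"t = {t:.2f}, p = {p:.3f}")  # difference significant if p < 0.05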
30. WSD using ML algorithms: Outline
- Setting
- Methodology
- Machine Learning algorithms
- Naive Bayes (Mooney 98)
- Snow (Dagan et al. 97)
- Exemplar-based (Ng 97)
- LazyBoosting (Escudero et al. 00)
- Experimental Results
- Naive Bayes vs. Exemplar Based
- Portability and Tuning of Supervised WSD
- Future Work
31. WSD using ML algorithms: Naive Bayes
- Based on Bayes' theorem (Duda & Hart 73), as in the formula below
- Frequencies used as probability estimates
- Assumes independence of the example features
- Smoothing technique (Ng 97)
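The standard decision rule, for the sense candidates s of word w and context features f_1 .. f_n (the exact smoothing of zero counts follows Ng 97):

    \hat{s} = \arg\max_{s \,\in\, \mathrm{senses}(w)} P(s) \prod_{i=1}^{n} P(f_i \mid s),
    \qquad
    P(s) = \frac{\mathrm{count}(s)}{N}, \quad
    P(f_i \mid s) = \frac{\mathrm{count}(f_i, s)}{\mathrm{count}(s)}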
32. WSD using ML algorithms: Exemplar-based WSD
- k-NN approach (Ng 96; Ng 97), as in the sketch below
- Distances
- Hamming
- Modified Value Difference Metric, MVDM (Cost & Salzberg 93)
- Variants
- example weighting
- attribute weighting (RLM 91)
(figure: k-NN classification example with k = 3)
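A minimal sketch of the exemplar-based classifier with Hamming distance; the MVDM and weighting variants above would replace or modulate `hamming`. The data structures are illustrative:

    from collections import Counter

    def hamming(a, b):
        """Hamming distance between two equal-length feature vectors."""
        return sum(x != y for x, y in zip(a, b))

    def knn_classify(example, memory, k=3):
        """memory: (feature_vector, sense) pairs stored verbatim."""
        nearest = sorted(memory, key=lambda m: hamming(example, m[0]))[:k]
        return Counter(sense for _, sense in nearest).most_common(1)[0][0]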
33. WSD using ML algorithms: Snow
- Snow (Golding & Roth 99)
- Sparse Network of Winnows
- on-line learning system
- Winnow (Littlestone 88), as in the sketch below
- linear threshold unit
- mistake-driven (updates only when the predicted class is wrong)
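A sketch of one on-line step of a single Winnow node (Snow keeps one such node per sense); the promotion factor alpha and threshold theta are illustrative values:

    def winnow_step(weights, active_features, label, alpha=2.0, theta=1.0):
        """Predict from active features only; update weights only on a mistake."""
        score = sum(weights.get(f, 1.0) for f in active_features)
        predicted = score >= theta
        if predicted != label:                  # mistake-driven
            factor = alpha if label else 1.0 / alpha
            for f in active_features:           # multiplicative promote/demote
                weights[f] = weights.get(f, 1.0) * factor
        return predicted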
34. WSD using ML algorithms: Snow
(figure: Snow architecture, with one Winnow node per sense feeding a MAX decision; active features such as w-1 = "average", w+2 = "42", w+1 = "of", w+2 = "nuclear" come from contexts like "... an average <age_1> of 42 ..." and "... in this <age_2> of nuclear ...")
35. WSD using ML algorithms: Boosting
- AdaBoost.MH (Freund & Schapire 00)
- combines many simple weak classifiers (hypotheses)
- weak classifiers are trained sequentially
- each iteration concentrates on the most difficult cases
- Results: better than NB and EB
- (-) Problem: computational complexity
- time and space grow linearly with the number of examples
- Solution: LazyBoosting! (see the sketch below)
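A hedged sketch of the LazyBoosting idea: at each boosting round, the weak learner scores only a random fraction of the attributes instead of all of them. The weak-rule form here (predict a sense from the presence of a single attribute) is a simplification:

    import random

    def lazy_weak_rule(examples, weights, attributes, fraction=0.10):
        """examples: (feature_set, sense) pairs; weights: parallel floats."""
        sampled = random.sample(attributes,
                                max(1, int(fraction * len(attributes))))
        senses = {s for _, s in examples}
        best, best_err = None, float("inf")
        for attr in sampled:                    # only ~10% of the attributes
            for sense in senses:
                # weak rule: predict `sense` iff `attr` occurs in the context
                err = sum(w for (feats, s), w in zip(examples, weights)
                          if (attr in feats) != (s == sense))
                if err < best_err:
                    best, best_err = (attr, sense), err
        return best, best_err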
36. WSD using ML algorithms: Outline
- Setting
- Methodology
- Machine Learning algorithms
- Naive Bayes (Mooney 98)
- Snow (Dagan et al. 97)
- Exemplar-based (Ng 97)
- LazyBoosting (Escudero et al. 00)
- Experimental Results
- Naive Bayes vs. Exemplar Based
- Portability and Tuning of Supervised WSD
- Future Work
37. WSD using ML algorithms: Experimental Results (LazyBoosting)
- Features from Set A (Ng 97): w-2, w-1, w1, w2, (w-2, w-1), (w-1, w1), (w1, w2)
- 15 reference words (10 N, 5 V)

Averages per word:
              senses  examples  attributes
 nouns (121)     8.6      1040        3978
 verbs (70)     17.9      1266        4432
 total (191)    12.1      1115        4150

Accuracy (%):
              MFS   NB    EB1   EB15  AB750  ABSC
 nouns (121)  57.4  71.7  65.8  71.1  73.5   73.4
 verbs (70)   46.6  57.6  51.1  58.1  59.3   59.1
 total (191)  53.3  66.4  60.2  66.2  68.1   68.0
38. WSD using ML algorithms: Experimental Results (LazyBoosting)
- Accelerating the weak learner
- Reducing the feature space
- Frequency filtering (Freq)
- discard features occurring fewer than N times
- Local frequency filtering (LFreq)
- select the N most frequent features of each sense
- RLM ranking (López de Mántaras 91)
- select the N most relevant features
- Reducing the number of attributes examined
- LazyBoosting
- a small proportion of attributes is randomly selected at each iteration
39. WSD using ML algorithms: Experimental Results (LazyBoosting)
- Accelerating the weak learner
- all methods perform quite well
- many irrelevant attributes in the domain
- LFreq is slightly better than Freq
- RLM performs better than LFreq and Freq
- LazyBoosting is better than all the other methods
- acceptable performance exploring only 1% of the attributes when looking for a weak rule
- exploring 10% achieves the same performance as 100%
- 7 times faster!
40. WSD using ML algorithms: Experimental Results (LazyBoosting)
- 7 features from Set A (Ng 97): w-2, w-1, w1, w2, (w-2, w-1), (w-1, w1), (w1, w2)
- 15 reference words (10 N, 5 V)

Averages per word:
              senses  examples  attributes
 nouns (121)     8.6      1040        3978
 verbs (70)     17.9      1266        4432
 total (191)    12.1      1115        4150

Accuracy (%):
              MFS   NB    EB15  LB10SC
 nouns (121)  56.4  68.7  68.0  70.8
 verbs (70)   46.7  64.8  64.9  67.5
 total (191)  52.3  67.1  66.7  69.5
41. WSD using ML algorithms: Experimental Results (NB vs EB)
- Experiments on Set A with 15 words
- Results
- Conclusions
- NB and EB are better than MFS
- k-NN performs better with k > 1
- the variants improve basic EB
- the MVDM metric (Cost & Salzberg) is better than the Hamming distance
- EB performs better than NB
42. WSD using ML algorithms: Experimental Results (NB vs EB)
- Experiments on Set B with 15 words
- Results
- What happened?
- problem with the binary representation of the broad-context attributes
- examples are represented with sparse vectors (5,000 positions)
- any two examples coincide in the majority of values
- this biases the similarity measure in favour of shorter sentences
- Related work clarified
- (Mooney 98): poor results of the k-NN algorithm
- (Ng 96; Ng 97): lower results of a system with a large number of attributes
43. WSD using ML algorithms: Experimental Results (NB vs EB)
- Improving both methods, NB and EB (Escudero et al. 00b)
- use only positive information (see the sketch below)
- treat the broad-context attributes as multivalued attributes
- the similarity S between two values has to be redefined accordingly
- this representation allows a very computationally efficient implementation
- Positive Naive Bayes (PNB)
- Positive Exemplar-based (PEB)
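A sketch of the positive-information representation under my reading of this slide: each broad context is kept as the set of words that actually occur, so absent-absent agreements no longer count and the stored vectors stay small:

    def positive_similarity(context_a, context_b):
        """Overlap of *present* words; one plausible reading of the
        redefined similarity S over multivalued attributes."""
        return len(set(context_a) & set(context_b))

    # Two short contexts no longer look alike just because both lack
    # most of the 5,000 possible words:
    positive_similarity(["party", "election", "vote"],
                        ["match", "team", "vote"])   # -> 1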
44. WSD using ML algorithms: Experimental Results (NB vs EB)
- Experiments on Set B with 15 words
- Results
- Conclusions
- PEB improves the accuracy of EB by 12.2 points
- PEB is higher than on Set A, except for PEBh,10,e,a
- PNB is at least as accurate as NB
- the positive approach greatly increases the efficiency of the algorithms (80 times faster for NB, 15 times for EB)
- PEB accuracy is higher than PNB's
45. WSD using ML algorithms: Experimental Results (NB vs EB)
- Global Results (191 words)
- Conclusions
- On Set A, the best option is Exemplar-based with the MVDM metric
- On Set B, the best option is Exemplar-based with Hamming distance and example weighting
- the MVDM metric has higher accuracy but is currently computationally prohibitive
- Positive Exemplar-based allows adding unordered contextual attributes with an accuracy improvement
- positive information greatly improves efficiency
46. WSD using ML algorithms: Experimental Results (Portability)
- 15 features from Set A (Ng 96): p-3, p-2, p-1, p1, p2, p3, w-1, w1, (w-2, w-1), (w-1, w1), (w1, w2), (w-3, w-2, w-1), (w-2, w-1, w1), (w-1, w1, w2), (w1, w2, w3)
- 21 reference words (13 N, 8 V)
- DSO Corpus
- Wall Street Journal (Corpus A)
- Brown Corpus (Corpus B)
- 7 combinations of training-test sets
- AB-AB, AB-A, AB-B
- A-A, B-B, A-B, B-A
- forcing the number of examples of corpora A and B to be the same (reducing the size to the smallest)
47. WSD using ML algorithms: Experimental Results (Portability)
First Experiment (% accuracy):

 Method  AB-AB  AB-A  AB-B
 MFC     46.6   53.0  39.2
 NB      61.6   67.3  55.9
 EB      63.0   69.0  57.0
 Snow    60.1   65.6  56.3
 LB      66.3   71.8  60.9

 Method  A-A   B-B   A-B   B-A
 MFC     56.0  45.5  36.4  38.7
 NB      65.9  56.8  41.4  47.7
 EB      69.0  57.4  45.3  51.1
 Snow    67.1  56.1  44.1  49.8
 LB      71.3  59.0  47.1  52.0
48. WSD using ML algorithms: Experimental Results (Portability)
- Conclusions of the First Experiment
- LazyBoosting outperforms all the other methods in all cases
- the knowledge acquired from a single corpus almost covers the knowledge obtained by combining both corpora
- very disappointing results!
- Looking at Kappa values
- NB is most similar to MFC
- LB is most similar to DSO
- LB is most dissimilar to MFC
49. WSD using ML algorithms: Experimental Results (Portability)
- Second Experiment
- adding tuning material
- BA-A, AB-B, A-A, B-B
- ranging from 10% to 50% (the remaining 50% for test)
- For NB, EB and Snow it is not worth keeping the original corpus
- LB shows a moderate (but consistent) improvement when retaining the original training set
50. WSD using ML algorithms: Experimental Results (Portability)
- Third Experiment
- Two main reasons
- corpora A and B have very different distributions of senses
- examples from corpora A and B contain different information
- New sense-balanced corpus
- forcing the number of examples of each sense of corpora A and B to be the same (reducing the size to the smallest)
51. WSD using ML algorithms: Experimental Results (Portability)
Third Experiment (% accuracy):

 Method  AB-AB  AB-A  AB-B
 MFC     48.6   48.6  48.5
 LB      64.4   66.2  62.5

 Method  A-A   B-B   A-B   B-A
 MFC     48.6  48.5  48.7  48.7
 LB      65.2  61.7  56.1  58.0

- Even when the same distribution of senses is preserved between training and test examples, portability is not guaranteed!
52. WSD using ML algorithms: Outline
- Setting
- Methodology
- Machine Learning algorithms
- Naive Bayes (Mooney 98)
- Snow (Dagan et al. 97)
- Exemplar-based (Ng 97)
- LazyBoosting (Escudero et al. 00)
- Experimental Results
- Naive Bayes vs. Exemplar Based
- Portability and Tuning of Supervised WSD
- Future Work
53. WSD using ML algorithms: Future Work
- Other methods (SVMs, DLs, ...)
- Other corpora (SemCor, Senseval, Bruce, ...)
- Comparison with unsupervised methods
- Combination of classifiers
- Search for the optimum set of features for each method
- Try new sets of features (semantic features, ...)
- The 3 research lines for solving the knowledge acquisition bottleneck
- Other tagsets (synsets, semantic fields, base concepts, groups of synsets, ...)
54. Using the Web and EuroWordNet for Word Sense Disambiguation

German Rigau i Claramunt
http://www.lsi.upc.es/rigau
TALP Research Center
Departament de Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya
55. Using the Web and EWN for WSD: Outline
- Setting
- Exploiting EWN Semantic Relations
- Collecting a training corpus from the Web
56. Using the Web and EWN for WSD: Setting
- Our approach
- unsupervised
- automatically obtain training corpora
- using the Web or on-line corpora
- to feed a supervised ML WSD system
57. Using the Web and EWN for WSD: Outline
- Setting
- Exploiting EWN Semantic Relations
- Collecting a training corpus from the Web
58. Using the Web and EWN for WSD: Exploiting EWN Semantic Relations
- WordNet
- WordNet is organized conceptually
- 123,497 content words
- 11,514 polysemous
- 99,642 synsets

wine, vino -- (fermented juice (of grapes especially))
  => sake, saki -- (Japanese beverage from fermented rice ...)
  => vintage -- (a season's yield of wine from a vineyard)
  => red wine -- (wine having a red color derived from skins ...)
    => Pinot noir -- (dry red California table wine ...)
    => claret, red Bordeaux -- (dry red Bordeaux or Bordeaux-like wine)
      => Saint Emilion -- (full-bodied red wine from ...)
    => Chianti -- (dry red Italian table wine from the Chianti ...)
    => Cabernet, Cabernet Sauvignon -- (superior Bordeaux-type red wine)
    => Rioja -- (dry red table wine from the Rioja ...)
    => zinfandel -- (dry fruity red wine from California)
59. Using the Web and EWN for WSD: Exploiting EWN Semantic Relations

 SR          PoS   Examples
 Synonymy    Noun  coche, automóvil ('car')
             Verb  salir, pasear
             Adj   feliz, contento ('happy')
             Adv   duramente, severamente ('harshly, severely')
 Hyponymy    Noun  coche ('car') -> vehículo ('vehicle')
 Meronymy    Noun  motor ('engine') -> coche ('car')
 Troponymy   Verb  marchar ('to march') -> caminar ('to walk')
 Entailment  Verb  roncar ('to snore') -> dormir ('to sleep')
60. Using the Web and EWN for WSD: Exploiting EWN Semantic Relations
61. Using the Web and EWN for WSD: Exploiting EWN Semantic Relations
partido 1 ('political party'):
  Todos los partidos piden reformas legales para TV3.
  La derecha planea agruparse en un partido.
  El diputado reiteró que ni él ni UDC, como partido, han recibido dinero de Pellerols.
partido 2 ('match, game'):
  Pero España puso al partido intensidad, ritmo y coraje.
  El seleccionador cree que el partido de hoy contra Italia dará la medida de España.
  El Racing no gana en su campo desde hace seis partidos.
62. Using the Web and EWN for WSD: Exploiting EWN Semantic Relations
partido 1 ('political party'):
  No negociaremos nunca con un partido político que sea partidario de la independencia de Taiwan.
  Una vez más es noticia la desviación de fondos destinados a la formación ocupacional hacia la financiación de un partido político.
  Estas leyes fueron votadas gracias a un consenso general de los partidos políticos.
partido 2 ('match, game'):
  Rivera pide el soporte de la afición para encarrilar las semifinales.
  Sólo el equipo de Valero Ribera puede sentenciar una semifinal como lo hizo ayer en un Palau Blaugrana completamente entregado.
  El Racing ganó los cuartos de final en su campo.
63. Using the Web and EWN for WSD: Exploiting EWN Semantic Relations
- 11,514 polysemous words
- 1 sense

         synonym  brother  father  daughter  grandchild
 1 step     2095     8903    3894       759         116
 2 step                 3    1331        16           3
 3 step                        512
 4 step                        147
 5 step                         43
 total      2905     8906    5927       775         119
64. Using the Web and EWN for WSD: Exploiting EWN Semantic Relations
- 11,514 polysemous words
- 2 senses

         synonym  brother  father  daughter  grandchild
 1 step      479     6988     584       408          87
 2 step                24      97         8           2
 3 step                         9
 4 step                         3
 total       479     7012     693       417          89
65. Using the Web and EWN for WSD: Exploiting EWN Semantic Relations
- 11,514 polysemous words
- 3 senses

         synonym  brother  father  daughter  grandchild
 1 step      108     5640      76       239          59
 2 step                22                  6           1
 total       108     5662      76       245          60
66. Using the Web and EWN for WSD: Exploiting EWN Semantic Relations
- 11,514 polysemous words
- 1 sense
- (SB, SD, SBD, SBDF, SBDFC presumably denote cumulative combinations of the relations above: Synonym, Brother, Daughter, Father, grandChild)

         SB    SD    SBD   SBDF   SBDFC
 1 step  8903  3461  9257  10284  10284
 2 step     3    34   188   1068   1068
 3 step           2    30    137    137
 4 step                 4     19     19
 total   8906  3487  9479  11508  11508
67. Using the Web and EWN for WSD: Exploiting EWN Semantic Relations
- 11,514 polysemous words
- 2 senses

         SB    SD    SBD   SBDF   SBDFC
 1 step  7580  1282  8048   8891   8899
 2 step   281    16   461   1196   1213
 3 step    11     1    33    264    245
 4 step                 2     80     74
 5 step                       13     13
 6 step                        2      2
 total   7872  1299  8544  10446  10446
68. Using the Web and EWN for WSD: Exploiting EWN Semantic Relations
- 11,514 polysemous words
- 3 senses

         SB    SD    SBD   SBDF   SBDFC
 1 step  6116   568  6691   7657   7673
 2 step   274     5   482   1030   1039
 3 step     5          46    295    311
 4 step                 7     91     78
 5 step                 1     28     12
 6 step                        3      3
 total   6395   573  7230   9104   9113
69. Using the Web and EWN for WSD: Outline
- Setting
- Exploiting EWN Semantic Relations
- Collecting a training corpus from the Web
70. Using the Web and EWN for WSD: Collecting a training corpus from the Web
- (Mihalcea & Moldovan 99), see the sketch below
- search engine: AltaVista
- complex queries
- synonyms
- definitions
- 120 word senses
- 91% precision
- Example
- <grow, raise, farm, produce> (cultivate by growing)
- query: cultivate NEAR growing AND (grow OR raise OR farm OR produce)
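A sketch of how such a query could be assembled from a synset's synonyms and gloss keywords; the function and its inputs are illustrative, with operators as in the AltaVista example above:

    def build_query(gloss_keywords, synonyms):
        """Boolean query in the style of (Mihalcea & Moldovan 99)."""
        near_part = " NEAR ".join(gloss_keywords)
        or_part = " OR ".join(synonyms)
        return f"{near_part} AND ({or_part})"

    print(build_query(["cultivate", "growing"],
                      ["grow", "raise", "farm", "produce"]))
    # cultivate NEAR growing AND (grow OR raise OR farm OR produce)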