Title: Word Sense Disambiguation at Senseval-II
1. Word Sense Disambiguation at Senseval-II
- Bernardo Magnini, Carlo Strapparava
- Giovanni Pezzulo and Alfio Gliozzo
- ITC-irst, Centro per la Ricerca Scientifica e Tecnologica - Povo (Trento) - Italy
- magnini, strappa, pezzulo, gliozzo_at_itc.it
2. Outline
- Word Sense Disambiguation (WSD)
- Definition of the task
- Methodological issues
- Senseval-II Experience
- Overview
- Systems description
3. Word Sense Disambiguation (WSD)
WSD is the process of deciding the meaning of a word in its context.
- The problem derives from lexical ambiguity:
- he cashed a check at the bank
- he sat on the bank of the river and watched the currents
The same word can assume different senses depending on the particular context in which it occurs.
4. WSD: Preliminary observations
- All the senses of a word are collected in a dictionary.
- Evaluating a WSD program is a process of comparing human and system answers.
- Most common words have more than one meaning (Zipf's law), so most of the terms in a text are polysemous.
5. Uses of WSD systems
- Many NLP applications could be improved by a good WSD module
- Examples:
- Machine Translation
- Information Retrieval / Question Answering
6. The WSD Problem
- Choose the sense repository (meanings are represented in different ways in different dictionaries)
- Elaborate WSD procedures and systems
- Evaluate the systems' results
7. Machine Readable Dictionaries (MRD)
- Provide sense repertories for disambiguation systems
- Different dictionaries present different sense distinctions for the same word (granularity)
- Some algorithms use information taken from dictionaries
- The dictionaries most used for WSD are WordNet, LDOCE, and Hector
8. Choosing the right sense
- he cashed a check at the bank

Fine-grained dictionary (WordNet):
1. depository financial institution, bank, banking concern, banking company
2. bank -- (sloping land (especially the slope beside a body of water))
3. bank -- (a supply or stock held in reserve for future use (especially in emergencies))
4. bank, bank building
5. bank -- (an arrangement of similar objects in a row or in tiers)
6. savings bank, coin bank, money box, bank
7. bank -- (a long ridge or pile)
8. bank -- (the funds held by a gambling house or the dealer in some gambling games)
9. bank, cant, camber
10. bank -- (a flight maneuver; aircraft tips laterally about its longitudinal axis)

Coarse-grained dictionary (WordNet Domains):
1. ECONOMY (institution or place where money can be saved)
2. GEOGRAPHY (the sloping land beside a body of water)
3. FACTOTUM (an arrangement of similar objects in a row or in tiers)
4. ARCHITECTURE (a slope in the turn of a road)
5. TRANSPORT (a flight maneuver)
9. Evaluation of WSD systems
- Consists of a comparison between system and human answers
- Human answers are collected in an annotated corpus (Gold Standard)
- Precision and Recall can be used
- A baseline and an upper bound can be fixed
10. Corpora
- Large collections of texts
- Sense annotated
- SemCor (200,000), DSO (192,000 semantically annotated occurrences of 121 nouns and 70 verbs), Senseval training data (8,699 texts for 73 words), TAL-treebank (80,000)
- Difficult and expensive to build
- Non annotated
- Brown, LOB, Reuters
- Available in large quantity
- Uses for WSD
- To evaluate systems (gold standard)
- Learning
11. Gold Standard Datasets
A corpus manually sense-tagged with respect to a given dictionary
- Requirements
- Sense selections must be made independently by more than one person using the same dictionary; in cases of disagreement a supervisor is called in to decide.
- Inter-Tagger Agreement must be high enough (more than 80%)
12. Inter-Tagger Agreement (ITA)
- People often disagree on the sense to be assigned to a corpus instance of a word
- ITA can be evaluated if more than one person made the sense selection on the same text
- It is the percentage of identical choices made by the annotators
- It can also be evaluated using the Kappa measure (sketched below)
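To make the Kappa measure concrete, here is a minimal Python sketch (not from the original slides) of Cohen's kappa for two annotators; the tag sequences in the example are invented.

from collections import Counter

def cohen_kappa(tags_a, tags_b):
    """Cohen's kappa for two annotators' sense tags (illustrative sketch)."""
    n = len(tags_a)
    # Observed agreement: fraction of instances tagged identically.
    p_o = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    # Chance agreement, estimated from each annotator's tag distribution.
    dist_a, dist_b = Counter(tags_a), Counter(tags_b)
    p_e = sum(dist_a[t] * dist_b[t] for t in dist_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Invented tags for five instances of "bank" by two annotators.
a = ["bank1", "bank2", "bank1", "bank1", "bank2"]
b = ["bank1", "bank2", "bank2", "bank1", "bank2"]
print(round(cohen_kappa(a, b), 2))  # 0.62 (raw agreement would be 0.80)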
13. Precision and Recall for WSD
- Precision = Good / (Good + Bad)
- Recall = Good / (Good + Bad + Null)
- where Good is the number of correct answers provided by the system
- Bad is the number of wrong answers provided by the system
- Null is the number of cases in which the system doesn't provide any answer
- Note: many systems provide multiple senses for a single instance of a word, so variants of the measures shown can be used.
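A minimal sketch of the two measures, using the Good/Bad/Null counts defined above (the counts in the example are invented):

def wsd_scores(good, bad, null):
    """Precision and recall from the answer counts defined on the slide."""
    precision = good / (good + bad)        # quality of the answers given
    recall = good / (good + bad + null)    # coverage over all instances
    return precision, recall

# Invented example: 70 correct, 20 wrong, 10 unanswered instances.
p, r = wsd_scores(70, 20, 10)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.78 recall=0.70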
14. Classification of WSD systems
- Unsupervised
- Knowledge based (WordNet, dictionaries)
- Learning from non-annotated corpora
- Supervised
- Learning from sense-annotated corpora (e.g. SemCor, DSO, TAL-treebank, and the Senseval training data)
Many systems use mixed techniques to improve their results.
15. Baselines for a WSD system
- Very simple (naive) WSD procedures
- Used to measure the improvement in a WSD system's performance
- Represent the lower bound of a WSD system's accuracy
- Examples
- Unsupervised: Random, Simple Lesk
- Supervised: Most Frequent, Lesk-plus-corpus
16. Lesk's algorithm (1986)
- Simple (unsupervised)
- Choose the sense whose dictionary definition and example texts have the most words in common with the words around the instance to be disambiguated (see the sketch below).
- Plus corpus (supervised)
- As Simple Lesk, but also considers the words contained in the tagged training data.
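A minimal sketch of Simple Lesk, assuming a toy sense inventory of glosses; real implementations look glosses up in a full dictionary such as WordNet and usually filter stopwords.

def simple_lesk(context_words, sense_glosses):
    """Simple Lesk (sketch): pick the sense whose gloss shares the most
    words with the context."""
    context = {w.lower() for w in context_words}
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_glosses.items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Toy sense inventory for "bank" (glosses abridged from WordNet).
glosses = {
    "bank1": "a depository financial institution that accepts deposits",
    "bank2": "sloping land especially the slope beside a body of water",
}
sentence = "he sat on the bank of the river and watched the water"
print(simple_lesk(sentence.split(), glosses))  # bank2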
17. Is ITA the upper bound for the accuracy of WSD systems?
- "If a second human agrees with a first only 80% of the time, then it is not clear what it means to say that a program was more than 80% accurate" (Kilgarriff, 1998)
- The debate is still open
- ITA defines the upper bound of how well a computer program can perform (Kilgarriff)
- Computers could work better than humans (Wilks)
- If a WSD system achieves a recall higher than ITA, then either the system or the task itself is wrongly designed (our opinion).
18. Outline
- Word Sense Disambiguation (WSD)
- Definition of the task
- Methodological issues
- Senseval-II Experience
- Overview
- Description of some systems
19. SENSEVAL goals
- Provide a common framework to compare WSD systems
- Standardise the task (especially the evaluation procedures)
- Build and distribute new lexical resources (dictionaries and sense-tagged corpora)
"There are now many computer programs for automatically determining the sense of a word in context (Word Sense Disambiguation or WSD). The purpose of Senseval is to evaluate the strengths and weaknesses of such programs with respect to different words, different varieties of language, and different languages." -- from http://www.sle.sharp.co.uk/senseval2
20. SENSEVAL History
- ACL-SIGLEX workshop (1997)
- Yarowsky and Resnik paper
- SENSEVAL-I (1998)
- Lexical Sample for English, French, and Italian
- SENSEVAL-II (Toulouse, 2001)
- Lexical Sample and All Words
- Organization: Kilgarriff (Brighton)
- SENSEVAL-III (???)
- Senseval workshop (ACL 2002)
21. WSD at SENSEVAL-II
- Choosing the right sense for a word among those of WordNet

Sense 1: horse, Equus caballus -- (solid-hoofed herbivorous quadruped domesticated since prehistoric times)
Sense 2: horse -- (a padded gymnastic apparatus on legs)
Sense 3: cavalry, horse cavalry, horse -- (troops trained to fight on horseback; "500 horse led the attack")
Sense 4: sawhorse, horse, sawbuck, buck -- (a framework for holding wood that is being sawed)
Sense 5: knight, horse -- (a chessman in the shape of a horse's head; can move two squares horizontally and one vertically (or vice versa))
Sense 6: heroin, diacetyl morphine, H, horse, junk, scag, shit, smack -- (a morphine derivative)

Corton has been involved in the design, manufacture and installation of horse stalls and horse-related equipment like external doors, shutters and accessories.
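As a side note, a sense listing like the one above can be reproduced with NLTK's WordNet interface; this sketch assumes NLTK and its WordNet data are installed, and the modern WordNet version will differ slightly from the special Senseval-2 release of WordNet 1.7.

from nltk.corpus import wordnet as wn

# Print the noun senses of "horse" in WordNet's sense order.
for i, synset in enumerate(wn.synsets("horse", pos=wn.NOUN), start=1):
    lemmas = ", ".join(lemma.name() for lemma in synset.lemmas())
    print(f"Sense {i}: {lemmas} -- ({synset.definition()})")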
22. SENSEVAL-II Schedule
23. SENSEVAL-II Tasks
- All Words (without training data): Czech, Dutch, English, Estonian
- Lexical Sample (with training data): Basque, Chinese, Danish, English, Italian, Japanese, Korean, Spanish, Swedish
24. English SENSEVAL-II
- Organization: Martha Palmer (UPENN)
- Gold standard: 2 annotators and 1 supervisor (Fellbaum)
- Interchange data format: XML
- Sense repository: WordNet 1.7 (special Senseval release)
- Competitors
- All Words: 11 systems
- Lexical Sample: 16 systems
25. English All Words
- Data: 3 texts for a total of 1,770 words
- Average polysemy: 6.5
- Example: (part of) Text 1

The art of change-ringing is peculiar to the English and, like most English peculiarities, unintelligible to the rest of the world. -- Dorothy L. Sayers, "The Nine Tailors"

ASLACTON, England -- Of all scenes that evoke rural England, this is one of the loveliest: an ancient stone church stands amid the fields, the sound of bells cascading from its tower, calling the faithful to evensong. The parishioners of St. Michael and All Angels stop to chat at the church door, as members here always have.
26. English All Words Systems
- Supervised (5)
- S. Sebastian (decision lists on SemCor)
- UCLA (SemCor, Semantic Distance and Density, AltaVista for frequency)
- Sinequa (SemCor and Semantic Classes)
- Antwerp (SemCor, Memory Based Learning)
- Moldovan (SemCor plus an additional sense-tagged corpus, heuristics)
- Unsupervised (6)
- UNED (relevance matrix over the Project Gutenberg corpus)
- Illinois (Lexical Proximity)
- Malaysia (MTD, Machine Tractable Dictionary)
- Litkowsky (New Oxford Dictionary and Contextual Clues)
- Sheffield (anaphora and the WN hierarchy)
- IRST (WordNet Domains)
27. Fine and coarse grained senses
- Fine-grained: the answers are compared to the senses from the Gold Standard.
- Coarse-grained: the answers are mapped to coarse-grained senses and compared to the Gold Standard tags, also mapped to coarse-grained senses.
- Example: groups for the verb "to use"
- GROUP 1
- use1: use, utilize, utilise, apply, employ -- (put into service)
- use3: use -- (seek or achieve an end)
- use5: practice, apply, use -- (avail oneself to)
- GROUP 2
- use2: use -- (take or consume (regularly))
- use4: use, expend -- (use up, consume fully ...)
- GROUP 3
- use6: use -- (habitually do something)
28. (No Transcript)
29. Lexical Sample
- Data: 8,699 texts for 73 words
- Average WN polysemy: 9.22
- Training data: 8,166 instances (average 118 per word)
- Baseline (commonest sense): 0.47 precision
- Baseline (Lesk): 0.51 precision
30. Lexical Sample
Example: "to leave"

<instance id="leave.130">
  <context>
    I 'd been seeing Johnnie almost a year now, but I still didn't want
    to <head>leave</head> him for five whole days.
  </context>
</instance>

<instance id="leave.157">
  <context>
    And he saw them all as he walked up and down. At two that morning, he
    was still walking -- up and down Peony, up and down the veranda, up and
    down the silent, moonlit beach. Finally, in desperation, he opened the
    refrigerator, filched her hand lotion, and <head>left</head> a note.
  </context>
</instance>
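As a hedged illustration of how such instances might be consumed, the sketch below parses one instance with Python's ElementTree; the real Senseval-2 format has additional attributes and wrapping elements not shown here.

import xml.etree.ElementTree as ET

# One instance, re-typed from the slide (whitespace simplified).
xml = ('<instance id="leave.130"><context>'
       "I'd been seeing Johnnie almost a year now, but I still didn't "
       'want to <head>leave</head> him for five whole days.'
       '</context></instance>')

instance = ET.fromstring(xml)
head = instance.find("context/head")
print(instance.get("id"))  # leave.130
print(head.text)           # leave -- the word to disambiguate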
31. English Lexical Sample Systems
- Unsupervised (5): Sunderland, UNED, Illinois, Litkowsky, ITRI
- Supervised (12): S. Sebastian, Sinequa, Manning, Pedersen, Korea, Yarowsky, Resnik, Pennsylvania, Barcelona, Moldovan, Alicante, IRST
32. Supervised Techniques
- Algorithms
- Decision Lists
- Boosting
- Domain Driven Disambiguation
- ...
- Features
- Lexical Context
- Words
- Morphological roots
- Syntactic Context
- POS bigrams/trigrams
- Semantic Context
- Domains
- ...
33. Decision Lists 1/2
- Training: lexical context of n words
- Example: the word "bank"
- bank1: depository financial institution ...
- bank2: sloping land ...
- ...
34. Decision Lists 2/2
- The pieces of evidence most strongly indicative of a particular pattern get the largest log-likelihood (strongest and most reliable evidence)
- The log-likelihood of each piece of evidence takes into account positive and negative examples
- Classification of new examples: the highest line in the list that matches the given context decides the sense (sketched below)
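To make the classification rule concrete, here is a minimal decision-list sketch for a two-sense word; the features, counts, and smoothing constant are illustrative, not the actual Senseval systems' settings.

import math

def build_decision_list(training, smoothing=0.1):
    """Train a decision list for a two-sense word (sketch). `training` is a
    list of (feature_set, sense) pairs; each rule is ranked by the absolute
    log-likelihood ratio of the two senses given its feature."""
    counts = {}                              # feature -> {sense: count}
    for features, sense in training:
        for f in features:
            counts.setdefault(f, {}).setdefault(sense, 0)
            counts[f][sense] += 1
    s1, s2 = sorted({sense for _, sense in training})
    rules = []
    for f, by_sense in counts.items():
        llr = math.log((by_sense.get(s1, 0) + smoothing) /
                       (by_sense.get(s2, 0) + smoothing))
        rules.append((abs(llr), f, s1 if llr > 0 else s2))
    return sorted(rules, reverse=True)       # strongest evidence first

def classify(features, rules, default):
    # The highest line in the list that matches the context decides the sense.
    for _, f, sense in rules:
        if f in features:
            return sense
    return default

training = [({"money", "cashed"}, "bank1"),
            ({"money", "deposit"}, "bank1"),
            ({"river", "water"}, "bank2")]
rules = build_decision_list(training)
print(classify({"river", "slope"}, rules, default="bank1"))  # bank2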
35. Boosting 1/2
- Combine many simple and moderately accurate Weak Classifiers (WCs)
- Train WCs sequentially, each focusing on the examples that were most difficult to classify for the preceding WCs
- Examples of WCs
- preceding_word = "house"
- domain = "sport"
- ...
36. Boosting 2/2
- WCi is trained and tested on the whole corpus
- Each <word, synset> pair is given an importance weight h depending on how difficult it was for WC1, ..., WCi to classify
- WCi+1 is tuned to classify the worst <word, synset> pairs correctly and is tested on the whole corpus
- so h is updated at each step
At the end all the WCs are combined into a single rule, the combined hypothesis; each WC is weighted according to its effectiveness in the tests (see the schematic sketch below).
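A schematic AdaBoost loop in the spirit of this slide; the weak classifiers are arbitrary feature tests, and the exponential re-weighting is the standard textbook update, not necessarily the exact variant used by the Senseval systems.

import math

def adaboost(examples, weak_learners, rounds):
    """Schematic AdaBoost: each round picks the weak classifier with the
    lowest weighted error, then re-weights the examples it got wrong.
    `examples` is a list of (x, y) pairs with labels y in {+1, -1}."""
    n = len(examples)
    weights = [1.0 / n] * n        # importance weight per example ("h" on the slide)
    combined = []                  # (alpha, weak_classifier) pairs
    for _ in range(rounds):
        # Weighted error of each candidate weak classifier.
        errs = [(sum(w for w, (x, y) in zip(weights, examples) if wc(x) != y), wc)
                for wc in weak_learners]
        err, wc = min(errs, key=lambda t: t[0])
        if err >= 0.5:             # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))
        combined.append((alpha, wc))
        # Increase the weight of misclassified examples, decrease the rest.
        weights = [w * math.exp(alpha if wc(x) != y else -alpha)
                   for w, (x, y) in zip(weights, examples)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return lambda x: 1 if sum(a * wc(x) for a, wc in combined) >= 0 else -1

# Toy usage: "bank" as ECONOMY (+1) vs. GEOGRAPHY (-1), features = context words.
examples = [({"money"}, 1), ({"check", "money"}, 1),
            ({"river"}, -1), ({"water", "river"}, -1)]
weak_learners = [lambda x, f=f: 1 if f in x else -1 for f in ("money", "river")]
clf = adaboost(examples, weak_learners, rounds=3)
print(clf({"money", "deposit"}))  # 1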
37. Domain Driven Disambiguation 1/3
- Comparison between
- the domain(s) of each synset of a word
- the domain(s) of the context where the word appears
- Domain information is collected in Domain Vectors having 41 dimensions (one for each domain label)
- We build
- Text Vectors
- Synset Vectors
- and we compare them using scalar products
38. Domain Driven Disambiguation 2/3
Example:
- bank1: depository financial institution ... (score against the text vector: 1.731878)
- bank2: sloping land ... (score against the text vector: 0.06185)
- TEXT: "He cashed a check at the bank"
- The magnitude of a Synset Vector is proportional to its frequency (in SemCor or in other training data)
- The direction is indicative of the contribution of the domain(s)
(A toy version of this comparison is sketched below.)
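A toy version of the scalar-product comparison, shrunk from 41 domains to 3; all vector values are invented, but the winning sense matches the slide's example.

DOMAINS = ("economy", "geography", "transport")  # 41 labels in the real resource

def dot(u, v):
    # Scalar product used to compare a synset vector with the text vector.
    return sum(a * b for a, b in zip(u, v))

# Invented synset vectors for "bank": the magnitude stands for sense
# frequency, the direction for domain relevance, as described on the slide.
synsets = {
    "bank1 (financial institution)": [1.7, 0.1, 0.0],
    "bank2 (sloping land)":          [0.0, 0.6, 0.0],
}

# Text vector for "He cashed a check at the bank", skewed toward ECONOMY.
text = [1.0, 0.1, 0.0]

scores = {sense: dot(vec, text) for sense, vec in synsets.items()}
print(max(scores, key=scores.get))  # bank1 (financial institution)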
39. Domain Driven Disambiguation 3/3
- Obtaining Text Vectors
- a text categorisation technique based on the WordNet Domains resource
- Obtaining Synset Vectors
- from training data
- from manual annotation (WordNet Domains)
40. (No Transcript)
41. Discussion of IRST Results
- Domain Driven Disambiguation cannot be successfully applied to words that do not carry relevant domain information, for instance:
- factotum words (i.e. words with many generic senses, e.g. the verb "to be")
- words whose senses have domains that are far from the relevant ones in the context
- In these cases the system gives no answer; this explains the low recall
42. IRST Results at SENSEVAL-II
43. (No Transcript)
44. Word Sense Disambiguation at Senseval-II
- Bernardo Magnini, Carlo Strapparava
- Giovanni Pezzulo and Alfio Gliozzo
- ITC-irst, Centro per la Ricerca Scientifica e Tecnologica - Povo (Trento) - Italy
- magnini, strappa, pezzulo, gliozzo_at_itc.it
45. Domain Driven Disambiguation
- Semantic domains play an important role in the disambiguation process
- Underlying assumption:
- Knowing in advance the relevant semantic domain(s) of a text makes word sense disambiguation easier
46. Domain Information 1/5
From the plush Connolly hide leather sofa and chairs in the living room to the Bang and Olufsen stereo, and remote control television complete with video, you're surrounded by the HIGHEST QUALITY. The inlaid chequerboard top of the coffee table houses all kind of games, including backgammon, chess and Scrabble. You'll also find a selection of books, from Queen Victoria's Highland journals, to the very latest bestselling thriller. The dinner table and chairs are elegant yet comfortable, and you can be assured of the finest tableware and crystal for meals at home.

Senses of "chair":
1. Furniture: chair -- (a seat for one person)
2. University: professorship, chair -- (the position of professor)
3. Administration: president, chairman, chairwoman, chair, chairperson
4. Law: electric chair, chair, death chair, hot seat

Domains active in the passage: Furniture, Play, Literature
47. Domain Information 2/5
(Same passage as above; the original slide highlights the words relevant to one domain.)
48. Domain Information 3/5
(Same passage, with a different domain's words highlighted.)
49. Domain Information 4/5
(Same passage, with a different domain's words highlighted.)
50. Domain Information 5/5
(Same passage, with a different domain's words highlighted.)
51. Domain Information Sources
- Annotated WordNet (WordNet Domains)
- ontology-based (according to the WordNet hierarchical structure)
- focused on technical senses (e.g. "believe")
- Categorised corpora
- word clustering reflects the words' distribution over texts
- focused on common use
52. WordNet Domains
- Integrates taxonomic and domain-oriented information
- Cross-hierarchy relations
- doctor#2: Medicine --> person#1
- hospital#1: Medicine --> location#1
- Cross-category relations: operate#3: Medicine
- Cross-language information
53. Polysemy Reduction
[Figure: polysemy-reduction chart; the recoverable domain labels include PUBLISHING, RELIGION, THEATER, COMMERCE, FACTOTUM]
54. Semantic Domains Organization
- 250 domain labels collected from dictionaries
- Four-level hierarchy (Dewey Decimal Classification)
- 41 basic domains used for Senseval
55. WordNet Domains Statistics 1/2
56. WordNet Domains Statistics 2/2
57. Domain Overlapping
[Figure: overlapping domains Alimentation and Medicine, covering words such as supermarket, recipe, restaurant, cooking, food, eating, fork, kitchen, drinking, diet, bulimia, hospital, illness, doctor]
58. (Annotator's comment on grouping the senses of the verb "to use", cf. slide 27)
"This was another difficult verb to group, possibly even more difficult than 'match' (and thankfully less polysemous!). The problem with grouping 'use' -- and I remember encountering this in my tagging -- is that the various senses of 'use' sort of shade off into one another, so that the boundaries are fuzzy even for verbs. In fact, of all the verbs I tagged, this one is the murkiest. Ultimately, these groups are almost artificial. This is true for any grouping assignment really, but in this case the artifice is worn on the sleeve.

GROUP ONE. If the sense seemed to be fairly explicit about the existence of an inherent function or purpose, I grouped it here. This ended up being the general all-purpose when-in-doubt-tag-to-this-sense sense (that would be sense 1), as well as the specific exploitative sense where the subject is using the direct object to further his own advantage (sense 3) and the sense which refers to using more abstract sorts of principles (sense 5).

GROUP TWO. If the sense seemed to imply that the thing being used was a commodity, and that it was being consumed, I put it here. This ended up being the drug addict sense (sense 2) and the deplete sense (sense 5).

SENSE 6. I love sense 6. Since it is really only an aspectual marker, you can't group it with anything, and no matter how lumpy you might be, you can't argue that it should be grouped anywhere."