Transcript and Presenter's Notes

Title: CS4705


1
CS4705
  • Relationships among Words, Semantic Roles, and
    Word-Sense Disambiguation

2
Today
  • Lexical Relations
  • Wordnet
  • Semantic Role
  • Review Semantic Roles
  • Selectional Restrictions
  • Selectional Association
  • Word-Sense Disambiguation
  • Supervised
  • Unsupervised
  • Evaluation

3
Lexical Relations
  • Semantic networks: used to represent lexical
    relationships
  • e.g. WordNet (George Miller et al.)
  • Most widely used hierarchically organized lexical
    database for English
  • Synset: a set of synonyms, a dictionary-style
    definition (or gloss), and some example uses
    --> a concept
  • Databases for nouns, verbs, and modifiers
  • Applications can traverse the network to find
    synonyms, antonyms, hypernyms, and hyponyms (a
    short NLTK sketch follows this list)
  • Available for download or online use
  • http://www.cogsci.princeton.edu/wn
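A minimal sketch of traversing these relations with NLTK's WordNet
interface; nltk and the WordNet data are assumed to be installed, and
the printed synsets are only illustrative:

  from nltk.corpus import wordnet as wn   # requires nltk.download('wordnet')

  # Each synset bundles synonyms, a gloss (definition), and example uses
  for syn in wn.synsets('bank'):
      print(syn.name(), '-', syn.definition())

  dog = wn.synset('dog.n.01')
  print([l.name() for l in dog.lemmas()])   # synonyms in the synset
  print(dog.hypernyms())                    # more general concepts
  print(dog.hyponyms()[:5])                 # more specific concepts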

4
Homonymy
  • Homonyms: words with the same form (orthography
    and pronunciation) but different, unrelated
    meanings, or senses
  • A bank1 holds investments in a custodial account
    in the client's name.
  • As agriculture is burgeoning on the east bank2,
    the river will shrink even more.

5
http://www.etymonline.com/
  • bank1 "financial institution," 1474, from either
    O.It. banca or M.Fr. banque (itself from the
    O.It. term), both meaning "table" (the notion is
    of the moneylender's exchange table), from a Gmc.
    source (cf. O.H.G. bank "bench") see bank (2).
    The verb meaning "to put confidence in" (U.S.
    colloquial) is attested from 1884. Bank holiday
    is from 1871, though the tradition is as old as
    the Bank of England. Bankroll (v.) "to finance"
    is 1920s. To cry all the way to the bank was
    coined 1956 by flamboyant pianist Liberace, after
    a Madison Square Garden concert that was packed
    with patrons but panned by critics.
  • bank2 "earthen incline, edge of a river," c.1200,
    probably in O.E., from O.N. banki, from P.Gmc.
    bangkon "slope," cognate with P.Gmc. bankiz
    "shelf."

6
Related Phenomena
  • Homophones (same pron/different orth)
  • Read/red
  • Homographs (same orth/different pron)
  • Bass/bass

7
Polysemy
  • Words with multiple but related meanings
  • They rarely serve red meat.
  • He served as U.S. ambassador.
  • He might have served his time in prison.
  • idea bank, sperm bank, blood bank, bank bank
  • Can the two candidate senses be conjoined?
  • ?He served his time and as ambassador to Norway.
  • Same etymology
  • Often a domain-dependent specialization

8
Synonymy
  • Substitutability: different words, same meaning
  • Old/aged, pretty/attractive, food/sustenance,
    money
  • How big is that plane? How large is that plane?
  • How big are you? How large are you?
  • What makes words substitutable and not?
  • Polysemy (large vs. old sense)
  • Register: He's really cheap / ?He's really
    parsimonious.
  • Collocational constraints
  • roast beef, ?baked beef
  • economy fare, ?economy price

9
How could we find Synonyms and Collocations
automatically?
  • Synonyms: identify words appearing frequently in
    similar contexts (a distributional sketch follows
    this list)
  • Blast victims were helped by civic-minded
    passersby.
  • Public-spirited passersby came to the aid of this
    bombing victim.
  • Collocations: identify synonyms or closely
    related words that do and don't appear in similar
    contexts
  • Flu victims, flu sufferers vs. ?Cold victims,
    cold sufferers
  • Roast turkey vs. Baked turkey
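One way to make this concrete, as a sketch only: represent each word by
counts of the words in its contexts and compare the vectors with cosine
similarity. The corpus format, window size, and helper names below are
assumptions.

  from collections import Counter
  import math

  def context_vector(word, sentences, window=3):
      """Count words occurring within +/- window of the target word."""
      vec = Counter()
      for sent in sentences:                      # sentences: lists of tokens
          for i, tok in enumerate(sent):
              if tok == word:
                  left = sent[max(0, i - window):i]
                  right = sent[i + 1:i + 1 + window]
                  for ctx in left + right:
                      vec[ctx] += 1
      return vec

  def cosine(v1, v2):
      dot = sum(v1[w] * v2[w] for w in v1 if w in v2)
      norm1 = math.sqrt(sum(c * c for c in v1.values()))
      norm2 = math.sqrt(sum(c * c for c in v2.values()))
      return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

  # High cosine over many contexts -> synonym candidates; words that
  # co-occur with the target far more often than chance -> collocations.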

10
Hyponymy
  • General: hypernym (superordinate)
  • dog is a hypernym of poodle
  • Test: That is a poodle implies that is a dog
  • Specific: hyponym (think underneath)
  • poodle is a hyponym of dog
  • Test: That is a poodle implies that is a dog
  • Ontology: a set of domain objects
  • Taxonomy: a specification of the relations between
    those objects
  • Object hierarchy: a structured hierarchy that
    supports feature inheritance (e.g. poodle
    inherits some properties of dog)

11
Tropes, or Figures of Speech
  • Metaphor: one entity is given the attributes of
    another (tenor/vehicle/ground)
  • Life is a bowl of cherries. Don't take it
    serious.
  • We are the eyelids of defeated caves. ??
  • GM killed the Fiero. (conventional metaphor:
    corporation as person)
  • Metonymy: one entity used to stand for another
    (replacive)
  • GM killed the Fiero.
  • The ham sandwich wants his check. (deferred
    reference)
  • Both extend an existing sense to a new meaning
  • Metaphor: a completely different concept
  • Metonymy: related concepts

12
Sum
  • Many definable word relations useful to NLP in
    different ways
  • Homonymy, polysemy, synonymy, hypernymy
  • Homography, homophony
  • Metaphor, metonymy
  • Collocations
  • Resources available to aid in processing
  • WordNet, FrameNet, online dictionaries, ...
  • A Huge Problem for NLP?

13
Ambiguity and Word Sense Disambiguation
  • Recall: for semantic attachment approaches, what
    happens when a given lexeme has multiple
    meanings?
  • Flies (V) vs. flies (N)
  • He robbed the bank. He sat on the bank.
  • How do we determine the correct sense of the
    word?
  • Machine Learning
  • Supervised methods
  • Lightly supervised and Unsupervised Methods
  • Bootstrapping
  • Dictionary-based techniques
  • Selectional Association

14
Supervised WSD
  • Approaches
  • Tag a corpus with correct senses of particular
    words (lexical sample) or all words (all-words
    task)
  • E.g. SENSEVAL corpora
  • Lexical sample
  • Extract features which might predict word sense
  • POS? Word identity? Punctuation after? Previous
    word? Its POS?
  • Use a machine learning algorithm to produce a
    classifier which can predict the senses of one
    word or many
  • All-words
  • Use a semantic concordance: each open-class word
    labeled with a sense from a dictionary or thesaurus

15
  • E.g. SemCor (Brown Corpus), tagged with WordNet
    senses

16
What Features Are Useful?
  • Words are known by the company they keep
  • How much company do we need to look at?
  • What do we need to know about the friends?
  • POS, lemmas/stems/syntactic categories, ...
  • Collocations: words that frequently appear with
    the target, identified from large corpora
  • federal government, honor code, baked potato
  • Position is key
  • Bag-of-words: words that appear somewhere in a
    context window
  • I want to play a musical instrument so I chose
    the bass.
  • Ordering/proximity not critical (a
    feature-extraction sketch follows this list)
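A sketch of how both feature types might be extracted for one target
occurrence; the tokenization, window sizes, and feature names are
assumptions, not a fixed standard:

  def wsd_features(tokens, i, window=2, bow_window=10):
      """Collocational (position-specific) + bag-of-words features for tokens[i]."""
      feats = {}
      # Collocational: identity of the words at fixed offsets around the target
      for offset in range(-window, window + 1):
          if offset == 0:
              continue
          j = i + offset
          feats['w%+d' % offset] = tokens[j] if 0 <= j < len(tokens) else '<pad>'
      # Bag-of-words: unordered presence of words anywhere in a wider window
      for tok in tokens[max(0, i - bow_window):i + bow_window + 1]:
          if tok != tokens[i]:
              feats['bow(%s)' % tok] = True
      return feats

  sent = "I want to play a musical instrument so I chose the bass".split()
  print(wsd_features(sent, sent.index('bass')))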

17
  • Punctuation, capitalization, formatting

18
Rule Induction Learners and WSD
  • Given a feature vector of values for the
    independent variables associated with each
    observation in the training set
  • Top-down greedy search driven by information
    gain: how much will the entropy of the (remaining)
    data be reduced if we split on this feature?
  • Produce a set of rules that perform best on the
    training data, e.g.
  • bank2 if w-1 = river & pos = NP & src = Fishing
    News
  • Easy-to-understand results, but many passes to
    reach each decision; susceptible to over-fitting
    (a decision-tree sketch follows this list)
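A hedged stand-in using scikit-learn's decision tree, which performs
the same greedy, entropy-driven splitting over feature vectors (the
feature dicts and labels below are toy placeholders; a true rule
learner such as Ripper would emit rules directly):

  from sklearn.feature_extraction import DictVectorizer
  from sklearn.tree import DecisionTreeClassifier

  # One feature dict per sense-tagged occurrence of 'bank'
  X_dicts = [{'w-1': 'river', 'pos': 'NN', 'src': 'FishingNews'},
             {'w-1': 'central', 'pos': 'NN', 'src': 'WSJ'},
             {'w-1': 'river', 'pos': 'NN', 'src': 'FishingNews'},
             {'w-1': 'the', 'pos': 'NN', 'src': 'WSJ'}]
  y = ['bank2', 'bank1', 'bank2', 'bank1']

  vec = DictVectorizer()
  X = vec.fit_transform(X_dicts)

  clf = DecisionTreeClassifier(criterion='entropy')   # split on information gain
  clf.fit(X, y)
  print(clf.predict(vec.transform([{'w-1': 'river', 'src': 'FishingNews'}])))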

19
Naïve Bayes
  • ŝ = argmax_{s ∈ S} p(s|V) = argmax_{s ∈ S}
    p(V|s) p(s) / p(V)
  • where s is one of the senses S possible for a
    word w and V is the input vector of feature values
    for w
  • Assume the features are independent, so the
    probability of V given s is the product of the
    probabilities of each feature given s:
    p(V|s) = Π_j p(v_j|s)
  • p(V) is the same for any s
  • Then ŝ = argmax_{s ∈ S} p(s) Π_j p(v_j|s)
20
  • How do we estimate p(s) and p(v_j|s)?
  • p(s_i) is the maximum likelihood estimate from a
    sense-tagged corpus (count(s_i, w_j)/count(w_j)):
    how likely is bank to mean 'financial
    institution' over all instances of bank?
  • p(v_j|s) is the maximum likelihood estimate of
    each feature given a candidate sense
    (count(v_j, s)/count(s)): how likely is the
    previous word to be 'river' when the sense of
    bank is 'financial institution'?
  • Calculate p(s) Π_j p(v_j|s) for each possible
    sense and take the highest-scoring sense as the
    most likely choice (a minimal implementation
    sketch follows this list)
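A minimal sketch of the estimation and decision rule above, assuming a
hypothetical sense-tagged training set of (feature list, sense) pairs
for one target word; add-one smoothing is an addition to avoid zero
probabilities:

  import math
  from collections import Counter, defaultdict

  class NaiveBayesWSD:
      def __init__(self):
          self.sense_counts = Counter()                 # count(s, w)
          self.feat_counts = defaultdict(Counter)       # count(v_j, s)

      def train(self, examples):
          """examples: iterable of (feature_list, sense) pairs."""
          for feats, sense in examples:
              self.sense_counts[sense] += 1
              for f in feats:
                  self.feat_counts[sense][f] += 1

      def classify(self, feats):
          total = sum(self.sense_counts.values())
          vocab = {f for c in self.feat_counts.values() for f in c}
          best, best_score = None, float('-inf')
          for s, c in self.sense_counts.items():
              score = math.log(c / total)               # log p(s)
              for f in feats:                           # + sum_j log p(v_j|s)
                  score += math.log((self.feat_counts[s][f] + 1)
                                    / (c + len(vocab)))  # add-one smoothing
              if score > best_score:
                  best, best_score = s, score
          return best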

21
Decision List Classifiers
  • Transparent
  • Like case statements, applying tests to the input
    in turn
  • fish within window --> bass1
  • striped bass --> bass1
  • guitar within window --> bass2
  • bass player --> bass1
  • Yarowsky ('96)'s approach orders the tests by
    their individual accuracy on the entire training
    set, based on a log-likelihood ratio (see the
    sketch below)
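A sketch of building such a list by scoring each test with a smoothed
log-likelihood ratio and sorting; the test representation (strings that
either fire or don't for an occurrence) and the smoothing constant are
assumptions:

  import math
  from collections import Counter, defaultdict

  def build_decision_list(examples, alpha=0.1):
      """examples: (set_of_tests_that_fire, sense) pairs for one target word."""
      fire = defaultdict(Counter)               # test -> sense counts
      for tests, sense in examples:
          for t in tests:
              fire[t][sense] += 1
      rules = []
      for t, counts in fire.items():
          top = counts.most_common(1)[0][0]
          other = sum(c for s, c in counts.items() if s != top)
          llr = math.log((counts[top] + alpha) / (other + alpha))  # smoothed ratio
          rules.append((llr, t, top))
      return sorted(rules, reverse=True)        # strongest test first

  def classify(rules, tests, default_sense):
      for _, t, sense in rules:
          if t in tests:                        # first matching test wins
              return sense
      return default_sense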

22
Lightly Supervised Methods: Bootstrapping
  • Bootstrapping I
  • Start with a few labeled instances of the target
    item as seeds to train an initial classifier, C
  • Use high-confidence classifications of C on
    unlabeled data as additional training data
  • Iterate (a self-training sketch follows this list)
  • Bootstrapping II
  • Start with sentences containing words strongly
    associated with each sense (e.g. 'sea' and 'music'
    for bass), chosen intuitively, from a corpus, or
    from dictionary entries, and label those
    automatically
  • One Sense per Discourse hypothesis
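A sketch of the Bootstrapping I loop under stated assumptions:
train_classifier is any trainer over labeled (features, sense) pairs,
and the returned model exposes a hypothetical
most_likely(features) -> (sense, confidence) method:

  def bootstrap(seed_examples, unlabeled, train_classifier,
                threshold=0.9, rounds=10):
      """Yarowsky-style self-training sketch."""
      labeled = list(seed_examples)              # e.g. 'fish' contexts as bass-fish seeds
      for _ in range(rounds):
          model = train_classifier(labeled)
          newly_labeled, remaining = [], []
          for feats in unlabeled:
              sense, conf = model.most_likely(feats)     # assumed interface
              if conf >= threshold:
                  newly_labeled.append((feats, sense))   # promote confident guesses
              else:
                  remaining.append(feats)
          if not newly_labeled:
              break                              # nothing confident left to add
          labeled.extend(newly_labeled)
          unlabeled = remaining
      return train_classifier(labeled)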

23
Dictionary Approaches
  • Problem of scale for all ML approaches: building
    a classifier for each word with multiple senses
  • Machine-readable dictionaries with senses
    identified and examples
  • Simplified Lesk
  • Retrieve all content words occurring in the
    context of the target (e.g. Sailors love to fish
    for bass.)
  • Compute the overlap with the sense definitions of
    the target's entry
  • bass1: a musical instrument
  • bass2: a type of fish that lives in the sea

24
  • bass1 /beɪs/ Music.
  • adjective 1. low in pitch; of the lowest pitch or
    range: a bass voice; a bass instrument. 2. of or
    pertaining to the lowest part in harmonic music.
    noun 3. the bass part. 4. a bass voice, singer, or
    instrument. 5. double bass.
  • Origin: 1400-50; late ME, var. of base2 with ss
    of basso
  • bass2 /bæs/
  • noun, plural (especially collectively) bass,
    (especially referring to two or more kinds or
    species) basses. 1. any of numerous edible,
    spiny-finned, freshwater or marine fishes of the
    families Serranidae and Centrarchidae.
    2. (originally) the European perch, Perca
    fluviatilis.
  • Origin: 1375-1425; late ME bas, earlier bærs, OE
    bærs (with loss of r before s, as in ass2, passel,
    etc.); c. D baars, G Barsch, OSw agh-borre

25
  • Choose the sense with the most content-word
    overlap
  • Original Lesk
  • Compare the dictionary entries of all content
    words in the context with the entries for each
    sense
  • But... dictionary entries are short
  • Expand with the entries of related words that
    appear in the original entry
  • Corpus Lesk: if a tagged corpus is available,
    collect all the words appearing in the context of
    each sense of the target word
  • e.g. all words appearing in sentences with bass1
    are added to the signature for bass1
  • Weight each by the frequency of occurrence of the
    word with that sense tagged in the corpus (over
    e.g. all senses of bass) to capture how
    discriminating a word is for the target word's
    senses
  • Corpus Lesk performs best of all the Lesk
    approaches (a Simplified Lesk sketch follows this
    list)
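A sketch of Simplified Lesk over NLTK's WordNet glosses and examples;
stopword handling is omitted for brevity and the function name is ours:

  from nltk.corpus import wordnet as wn

  def simplified_lesk(target, context_words):
      """Pick the sense whose gloss + examples overlap most with the context."""
      context = {w.lower() for w in context_words}
      best, best_overlap = None, -1
      for sense in wn.synsets(target):
          signature = set(sense.definition().lower().split())
          for ex in sense.examples():
              signature |= set(ex.lower().split())
          overlap = len(signature & context)
          if overlap > best_overlap:
              best, best_overlap = sense, overlap
      return best

  print(simplified_lesk('bass', "Sailors love to fish for bass".split()))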

26
Disambiguation via Selectional Restrictions
  • Verbs are known by the company they keep
  • Different verbs select for different thematic
    roles
  • wash the dishes (takes washable-thing as patient)
  • serve delicious dishes (takes food-type as
    patient)
  • Method: another semantic attachment in the grammar
  • Semantic attachment rules are applied as
    sentences are syntactically parsed, e.g.
  • VP --> V NP
  • V --> serve <theme>, theme = food-type
  • Selectional restriction violation --> no parse

27
  • But this means we must
  • Write selectional restrictions for each sense of
    each predicate or use FrameNet
  • Serve alone has 15 verb senses
  • Obtain hierarchical type information about each
    argument (using WordNet; see the sketch after
    this list)
  • How many hypernyms does dish have?
  • How many words are hyponyms of dish?
  • But also
  • Sometimes selectional restrictions don't restrict
    enough (Which dishes do you like?)
  • Sometimes they restrict too much (Eat dirt, worm!
    I'll eat my hat!)
  • Can we take a statistical approach?
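A sketch of checking a selectional restriction by walking the WordNet
hypernym paths of the argument's noun senses; the restriction table and
the synset chosen for the constraint (food.n.01) are toy assumptions:

  from nltk.corpus import wordnet as wn

  # Toy restriction: the theme of 'serve' (the food sense) must be a kind of food
  RESTRICTIONS = {'serve': wn.synset('food.n.01')}

  def satisfies_restriction(verb, arg):
      required = RESTRICTIONS[verb]
      for sense in wn.synsets(arg, pos=wn.NOUN):
          # hypernym_paths() lists chains from this sense up to the root
          for path in sense.hypernym_paths():
              if required in path:
                  return True, sense      # some sense of arg is a kind of food
      return False, None

  print(satisfies_restriction('serve', 'dish'))    # some dish sense is food
  print(satisfies_restriction('serve', 'idea'))    # violation -> no parse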

28
Selectional Association (Resnik 97)
  • Selectional preference strength: how much does a
    predicate tell us about the word class of its
    argument?
  • George is a monster, George cooked a steak
  • S_R(v): how different is p(c), the probability
    that any direct object will be a member of some
    class c, from p(c|v), the probability that a
    direct object of this specific verb will fall
    into that class?
  • Estimate conditional probabilities of word senses
    from a parsed corpus, counting how often each
    predicate occurs with an object argument
  • e.g. How likely is dish to be an object of
    served?
  • Jane served/V the dish/Obj
  • Then estimate the strength of association between
    each predicate and the super-class (hypernym) of
    the argument in Wordnet

29
  • E.g. For each object x of serve (e.g. ragout,
    Mary, dish)
  • Look up all of x's hypernym classes in WordNet
    (e.g. dish is-a piece of crockery, dish is-a food
    item, ragout is-a food item, Mary is-a person)
  • Distribute the credit for each of x's senses
    occurring with serve among all the hypernym
    classes to which x belongs (1/n for n classes;
    see the sketch below)
  • Pr(c|v) is estimated as count(c,v)/count(v)
  • Why does this work?
  • Ambiguous words have many superordinate classes
  • John served food/the dish/tuna/curry
  • The most common sense across all objects of the
    verb should eventually dominate the likelihood
    score
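A sketch of the credit-distribution step for estimating Pr(c|v) from
parsed (verb, direct-object) pairs; the class inventory comes from
WordNet hypernym paths, and spreading credit over the full set of
ancestor classes is a simplification of Resnik's formulation:

  from collections import Counter, defaultdict
  from nltk.corpus import wordnet as wn

  def class_counts(verb_object_pairs):
      """Return count(c, v) with each object's credit split over its classes."""
      counts = defaultdict(Counter)        # verb -> credit per hypernym class
      verb_totals = Counter()              # count(v)
      for verb, obj in verb_object_pairs:
          senses = wn.synsets(obj, pos=wn.NOUN)
          if not senses:
              continue
          verb_totals[verb] += 1
          # every hypernym class any sense of obj belongs to
          classes = {c for s in senses for path in s.hypernym_paths() for c in path}
          for c in classes:
              counts[verb][c] += 1.0 / len(classes)   # distribute the credit
      return counts, verb_totals

  def p_class_given_verb(counts, verb_totals, c, verb):
      return counts[verb][c] / verb_totals[verb]      # Pr(c|v) ~ count(c,v)/count(v)

  counts, totals = class_counts([('serve', 'ragout'), ('serve', 'dish'),
                                 ('serve', 'curry'), ('serve', 'Mary')])
  print(p_class_given_verb(counts, totals, wn.synset('food.n.01'), 'serve'))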

30
  • How can we use this in WSD?
  • Choose the class (sense) of the direct object
    with the highest probability, given the verb
  • Mary served the dish proudly.
  • Results
  • Baselines
  • Random choice of word sense: 26.8%
  • Choose the most frequent sense (N.B. requires a
    sense-labeled training corpus): 58.2%
  • Resnik's method: 44% correct, from a corpus with
    only predicate/argument relations labeled

31
Evaluating WSD
  • In vivo (end-to-end/task-based/extrinsic) vs. in
    vitro (stand-alone/intrinsic) evaluation: embedded
    in some task (parsing? Q/A? IVR system?) vs.
    application-independent
  • In vitro metrics: classification accuracy on a
    held-out test set, or precision/recall/F-measure
    if not all instances must be labeled (a metrics
    sketch follows this list)
  • Baseline
  • Most frequent sense?
  • Lesk algorithms
  • Ceiling: human annotator agreement
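A sketch of the in vitro metrics, where predictions may be None for
instances the system declines to label; the gold and predicted sense
lists are placeholders:

  def accuracy(gold, predicted):
      return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

  def precision_recall_f(gold, predicted):
      """Precision/recall/F-measure when the system may abstain (None)."""
      attempted = [(g, p) for g, p in zip(gold, predicted) if p is not None]
      correct = sum(g == p for g, p in attempted)
      precision = correct / len(attempted) if attempted else 0.0
      recall = correct / len(gold)
      f = (2 * precision * recall / (precision + recall)
           if precision + recall else 0.0)
      return precision, recall, f

  # Compare against the most-frequent-sense baseline and the
  # human-agreement ceiling on the same held-out test set.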

32
Summing Up
  • Word relations: how can we identify different
    types?
  • Disambiguating among word senses
  • Next time: Ch. 17, 3-5