Transcript and Presenter's Notes

Title: CS4705


1
CS4705
  • Relationships among Words, Semantic Roles, and
    Word-Sense Disambiguation

2
Today
  • Lexical Relations
  • Wordnet
  • Semantic Role
  • Review Semantic Roles
  • Selectional Restrictions
  • Selectional Association
  • Word-Sense Disambiguation
  • Supervised
  • Unsupervised
  • Evaluation

3
Lexical Relations
  • Semantic networks: used to represent lexical
    relationships
  • e.g. WordNet (George Miller et al.)
  • Most widely used hierarchically organized lexical
    database for English
  • Synset: a set of synonyms, a dictionary-style
    definition (or gloss), and some example uses
    --> a concept
  • Databases for nouns, verbs, and modifiers
  • Applications can traverse the network to find
    synonyms, antonyms, hypernyms, and hyponyms (a
    short NLTK sketch follows this list)
  • Available for download or online use
  • http://www.cogsci.princeton.edu/wn
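A minimal sketch of traversing these relations with NLTK's WordNet
interface; nltk and the WordNet data are assumed to be installed, and
the printed synsets are only illustrative:

  from nltk.corpus import wordnet as wn   # requires nltk.download('wordnet')

  # Each synset bundles synonyms, a gloss (definition), and example uses
  for syn in wn.synsets('bank'):
      print(syn.name(), '-', syn.definition())

  dog = wn.synset('dog.n.01')
  print([l.name() for l in dog.lemmas()])   # synonyms in the synset
  print(dog.hypernyms())                    # more general concepts
  print(dog.hyponyms()[:5])                 # more specific concepts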

4
Homonymy
  • Homonyms: words with the same form (orthography
    and pronunciation) but different, unrelated
    meanings, or senses
  • A bank1 holds investments in a custodial account
    in the client's name.
  • As agriculture is burgeoning on the east bank2,
    the river will shrink even more.

5
http://www.etymonline.com/
  • bank1 "financial institution," 1474, from either
    O.It. banca or M.Fr. banque (itself from the
    O.It. term), both meaning "table" (the notion is
    of the moneylender's exchange table), from a Gmc.
    source (cf. O.H.G. bank "bench") see bank (2).
    The verb meaning "to put confidence in" (U.S.
    colloquial) is attested from 1884. Bank holiday
    is from 1871, though the tradition is as old as
    the Bank of England. Bankroll (v.) "to finance"
    is 1920s. To cry all the way to the bank was
    coined 1956 by flamboyant pianist Liberace, after
    a Madison Square Garden concert that was packed
    with patrons but panned by critics.
  • bank2 "earthen incline, edge of a river," c.1200,
    probably in O.E., from O.N. banki, from P.Gmc.
    bangkon "slope," cognate with P.Gmc. bankiz
    "shelf."

6
Related Phenomena
  • Homophones (same pron/different orth)
  • Read/red
  • Homographs (same orth/different pron)
  • Bass/bass

7
Polysemy
  • Words with multiple but related meanings
  • They rarely serve red meat.
  • He served as U.S. ambassador.
  • He might have served his time in prison.
  • idea bank, sperm bank, blood bank, bank bank
  • Can the two candidate senses be conjoined?
  • ?He served his time and as ambassador to Norway.
  • Same etymology
  • Often a domain-dependent specialization

8
Synonymy
  • Substitutability: different words, same meaning
  • Old/aged, pretty/attractive, food/sustenance,
    money
  • How big is that plane? How large is that plane?
  • How big are you? How large are you?
  • What makes words substitutable and not?
  • Polysemy (large vs. old sense)
  • Register: He's really cheap / ?He's really
    parsimonious.
  • Collocational constraints
  • roast beef, ?baked beef
  • economy fare, ?economy price

9
How could we find Synonyms and Collocations
automatically?
  • Synonyms: identify words appearing frequently in
    similar contexts (a distributional sketch follows
    this list)
  • Blast victims were helped by civic-minded
    passersby.
  • Public-spirited passersby came to the aid of this
    bombing victim.
  • Collocations: identify synonyms or closely
    related words that do and don't appear in similar
    contexts
  • Flu victims, flu sufferers vs. ?Cold victims,
    cold sufferers
  • Roast turkey vs. Baked turkey
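One way to make this concrete, as a sketch only: represent each word by
counts of the words in its contexts and compare the vectors with cosine
similarity. The corpus format, window size, and helper names below are
assumptions.

  from collections import Counter
  import math

  def context_vector(word, sentences, window=3):
      """Count words occurring within +/- window of the target word."""
      vec = Counter()
      for sent in sentences:                      # sentences: lists of tokens
          for i, tok in enumerate(sent):
              if tok == word:
                  left = sent[max(0, i - window):i]
                  right = sent[i + 1:i + 1 + window]
                  for ctx in left + right:
                      vec[ctx] += 1
      return vec

  def cosine(v1, v2):
      dot = sum(v1[w] * v2[w] for w in v1 if w in v2)
      norm1 = math.sqrt(sum(c * c for c in v1.values()))
      norm2 = math.sqrt(sum(c * c for c in v2.values()))
      return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

  # High cosine over many contexts -> synonym candidates; words that
  # co-occur with the target far more often than chance -> collocations.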

10
Hyponymy
  • General: hypernym (superordinate)
  • dog is a hypernym of poodle
  • Test: That is a poodle implies that is a dog
  • Specific: hyponym (think underneath)
  • poodle is a hyponym of dog
  • Test: That is a poodle implies that is a dog
  • Ontology: a set of domain objects
  • Taxonomy: a specification of the relations between
    those objects
  • Object hierarchy: a structured hierarchy that
    supports feature inheritance (e.g. poodle
    inherits some properties of dog)

11
Tropes, or Figures of Speech
  • Metaphor: one entity is given the attributes of
    another (tenor/vehicle/ground)
  • Life is a bowl of cherries. Don't take it
    serious.
  • We are the eyelids of defeated caves. ??
  • GM killed the Fiero. (conventional metaphor:
    corporation as person)
  • Metonymy: one entity used to stand for another
    (replacive)
  • GM killed the Fiero.
  • The ham sandwich wants his check. (deferred
    reference)
  • Both extend an existing sense to a new meaning
  • Metaphor: a completely different concept
  • Metonymy: related concepts

12
Sum
  • Many definable word relations useful to NLP in
    different ways
  • Homonymy, polysemy, synonymy, hypernymy
  • Homography, homophony
  • Metaphor, metonymy
  • Collocations
  • Resources available to aid in processing
  • WordNet, FrameNet, online dictionaries, ...
  • A Huge Problem for NLP?

13
Ambiguity and Word Sense Disambiguation
  • Recall: for semantic attachment approaches, what
    happens when a given lexeme has multiple
    meanings?
  • Flies (V) vs. flies (N)
  • He robbed the bank. He sat on the bank.
  • How do we determine the correct sense of the
    word?
  • Machine Learning
  • Supervised methods
  • Lightly supervised and Unsupervised Methods
  • Bootstrapping
  • Dictionary-based techniques
  • Selectional Association

14
Supervised WSD
  • Approaches
  • Tag a corpus with correct senses of particular
    words (lexical sample) or all words (all-words
    task)
  • E.g. SENSEVAL corpora
  • Lexical sample
  • Extract features which might predict word sense
  • POS? Word identity? Punctuation after? Previous
    word? Its POS?
  • Use a machine learning algorithm to produce a
    classifier which can predict the senses of one
    word or many
  • All-words
  • Use a semantic concordance: each open-class word
    labeled with a sense from a dictionary or thesaurus

15
  • E.g. SemCor (Brown Corpus), tagged with WordNet
    senses

16
What Features Are Useful?
  • Words are known by the company they keep
  • How much company do we need to look at?
  • What do we need to know about the friends?
  • POS, lemmas/stems/syntactic categories, ...
  • Collocations: words that frequently appear with
    the target, identified from large corpora
  • federal government, honor code, baked potato
  • Position is key
  • Bag-of-words: words that appear somewhere in a
    context window
  • I want to play a musical instrument so I chose
    the bass.
  • Ordering/proximity not critical (a
    feature-extraction sketch follows this list)
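A sketch of how both feature types might be extracted for one target
occurrence; the tokenization, window sizes, and feature names are
assumptions, not a fixed standard:

  def wsd_features(tokens, i, window=2, bow_window=10):
      """Collocational (position-specific) + bag-of-words features for tokens[i]."""
      feats = {}
      # Collocational: identity of the words at fixed offsets around the target
      for offset in range(-window, window + 1):
          if offset == 0:
              continue
          j = i + offset
          feats['w%+d' % offset] = tokens[j] if 0 <= j < len(tokens) else '<pad>'
      # Bag-of-words: unordered presence of words anywhere in a wider window
      for tok in tokens[max(0, i - bow_window):i + bow_window + 1]:
          if tok != tokens[i]:
              feats['bow(%s)' % tok] = True
      return feats

  sent = "I want to play a musical instrument so I chose the bass".split()
  print(wsd_features(sent, sent.index('bass')))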

17
  • Punctuation, capitalization, formatting

18
Rule Induction Learners and WSD
  • Given a feature vector of values for the
    independent variables associated with each
    observation in the training set
  • Top-down greedy search driven by information
    gain: how much will the entropy of the (remaining)
    data be reduced if we split on this feature?
  • Produce a set of rules that perform best on the
    training data, e.g.
  • bank2 if w-1 = river & pos = NP & src = Fishing
    News
  • Easy-to-understand results, but many passes to
    reach each decision; susceptible to over-fitting
    (a decision-tree sketch follows this list)
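A hedged stand-in using scikit-learn's decision tree, which performs
the same greedy, entropy-driven splitting over feature vectors (the
feature dicts and labels below are toy placeholders; a true rule
learner such as Ripper would emit rules directly):

  from sklearn.feature_extraction import DictVectorizer
  from sklearn.tree import DecisionTreeClassifier

  # One feature dict per sense-tagged occurrence of 'bank'
  X_dicts = [{'w-1': 'river', 'pos': 'NN', 'src': 'FishingNews'},
             {'w-1': 'central', 'pos': 'NN', 'src': 'WSJ'},
             {'w-1': 'river', 'pos': 'NN', 'src': 'FishingNews'},
             {'w-1': 'the', 'pos': 'NN', 'src': 'WSJ'}]
  y = ['bank2', 'bank1', 'bank2', 'bank1']

  vec = DictVectorizer()
  X = vec.fit_transform(X_dicts)

  clf = DecisionTreeClassifier(criterion='entropy')   # split on information gain
  clf.fit(X, y)
  print(clf.predict(vec.transform([{'w-1': 'river', 'src': 'FishingNews'}])))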

19
Naïve Bayes
  • ŝ = argmax_{s ∈ S} p(s|V) = argmax_{s ∈ S}
    p(V|s) p(s) / p(V)
  • where s is one of the senses S possible for a
    word w and V is the input vector of feature values
    for w
  • Assume the features are independent, so the
    probability of V given s is the product of the
    probabilities of each feature given s:
    p(V|s) = Π_j p(v_j|s)
  • p(V) is the same for any s
  • Then ŝ = argmax_{s ∈ S} p(s) Π_j p(v_j|s)
20
  • How do we estimate p(s) and p(v_j|s)?
  • p(s_i) is the maximum likelihood estimate from a
    sense-tagged corpus (count(s_i, w_j)/count(w_j)):
    how likely is bank to mean 'financial
    institution' over all instances of bank?
  • p(v_j|s) is the maximum likelihood estimate of
    each feature given a candidate sense
    (count(v_j, s)/count(s)): how likely is the
    previous word to be 'river' when the sense of
    bank is 'financial institution'?
  • Calculate p(s) Π_j p(v_j|s) for each possible
    sense and take the highest-scoring sense as the
    most likely choice (a minimal implementation
    sketch follows this list)
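A minimal sketch of the estimation and decision rule above, assuming a
hypothetical sense-tagged training set of (feature list, sense) pairs
for one target word; add-one smoothing is an addition to avoid zero
probabilities:

  import math
  from collections import Counter, defaultdict

  class NaiveBayesWSD:
      def __init__(self):
          self.sense_counts = Counter()                 # count(s, w)
          self.feat_counts = defaultdict(Counter)       # count(v_j, s)

      def train(self, examples):
          """examples: iterable of (feature_list, sense) pairs."""
          for feats, sense in examples:
              self.sense_counts[sense] += 1
              for f in feats:
                  self.feat_counts[sense][f] += 1

      def classify(self, feats):
          total = sum(self.sense_counts.values())
          vocab = {f for c in self.feat_counts.values() for f in c}
          best, best_score = None, float('-inf')
          for s, c in self.sense_counts.items():
              score = math.log(c / total)               # log p(s)
              for f in feats:                           # + sum_j log p(v_j|s)
                  score += math.log((self.feat_counts[s][f] + 1)
                                    / (c + len(vocab)))  # add-one smoothing
              if score > best_score:
                  best, best_score = s, score
          return best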

21
Decision List Classifiers
  • Transparent
  • Like case statements, applying tests to the input
    in turn
  • fish within window --> bass1
  • striped bass --> bass1
  • guitar within window --> bass2
  • bass player --> bass1
  • Yarowsky ('96)'s approach orders the tests by
    their individual accuracy on the entire training
    set, based on a log-likelihood ratio (see the
    sketch below)
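A sketch of building such a list by scoring each test with a smoothed
log-likelihood ratio and sorting; the test representation (strings that
either fire or don't for an occurrence) and the smoothing constant are
assumptions:

  import math
  from collections import Counter, defaultdict

  def build_decision_list(examples, alpha=0.1):
      """examples: (set_of_tests_that_fire, sense) pairs for one target word."""
      fire = defaultdict(Counter)               # test -> sense counts
      for tests, sense in examples:
          for t in tests:
              fire[t][sense] += 1
      rules = []
      for t, counts in fire.items():
          top = counts.most_common(1)[0][0]
          other = sum(c for s, c in counts.items() if s != top)
          llr = math.log((counts[top] + alpha) / (other + alpha))  # smoothed ratio
          rules.append((llr, t, top))
      return sorted(rules, reverse=True)        # strongest test first

  def classify(rules, tests, default_sense):
      for _, t, sense in rules:
          if t in tests:                        # first matching test wins
              return sense
      return default_sense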

22
Lightly Supervised Methods: Bootstrapping
  • Bootstrapping I
  • Start with a few labeled instances of the target
    item as seeds to train an initial classifier, C
  • Use high-confidence classifications of C on
    unlabeled data as additional training data
  • Iterate (a self-training sketch follows this list)
  • Bootstrapping II
  • Start with sentences containing words strongly
    associated with each sense (e.g. 'sea' and 'music'
    for bass), chosen intuitively, from a corpus, or
    from dictionary entries, and label those
    automatically
  • One Sense per Discourse hypothesis
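A sketch of the Bootstrapping I loop under stated assumptions:
train_classifier is any trainer over labeled (features, sense) pairs,
and the returned model exposes a hypothetical
most_likely(features) -> (sense, confidence) method:

  def bootstrap(seed_examples, unlabeled, train_classifier,
                threshold=0.9, rounds=10):
      """Yarowsky-style self-training sketch."""
      labeled = list(seed_examples)              # e.g. 'fish' contexts as bass-fish seeds
      for _ in range(rounds):
          model = train_classifier(labeled)
          newly_labeled, remaining = [], []
          for feats in unlabeled:
              sense, conf = model.most_likely(feats)     # assumed interface
              if conf >= threshold:
                  newly_labeled.append((feats, sense))   # promote confident guesses
              else:
                  remaining.append(feats)
          if not newly_labeled:
              break                              # nothing confident left to add
          labeled.extend(newly_labeled)
          unlabeled = remaining
      return train_classifier(labeled)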

23
Dictionary Approaches
  • Problem of scale for all ML approaches: building
    a classifier for each word with multiple senses
  • Machine-readable dictionaries with senses
    identified and examples
  • Simplified Lesk
  • Retrieve all content words occurring in the
    context of the target (e.g. Sailors love to fish
    for bass.)
  • Compute the overlap with the sense definitions of
    the target's entry
  • bass1: a musical instrument
  • bass2: a type of fish that lives in the sea

24
  • bass1 /beɪs/ Music.
  • adjective 1. low in pitch; of the lowest pitch or
    range: a bass voice; a bass instrument. 2. of or
    pertaining to the lowest part in harmonic music.
    noun 3. the bass part. 4. a bass voice, singer, or
    instrument. 5. double bass.
  • Origin: 1400-50; late ME, var. of base2 with ss
    of basso
  • bass2 /bæs/
  • noun, plural (especially collectively) bass,
    (especially referring to two or more kinds or
    species) basses. 1. any of numerous edible,
    spiny-finned, freshwater or marine fishes of the
    families Serranidae and Centrarchidae.
    2. (originally) the European perch, Perca
    fluviatilis.
  • Origin: 1375-1425; late ME bas, earlier bærs, OE
    bærs (with loss of r before s, as in ass2, passel,
    etc.); c. D baars, G Barsch, OSw agh-borre

25
  • Choose the sense with the most content-word
    overlap
  • Original Lesk
  • Compare the dictionary entries of all content
    words in the context with the entries for each
    sense
  • But... dictionary entries are short
  • Expand with the entries of related words that
    appear in the original entry
  • Corpus Lesk: if a tagged corpus is available,
    collect all the words appearing in the context of
    each sense of the target word
  • e.g. all words appearing in sentences with bass1
    are added to the signature for bass1
  • Weight each by the frequency of occurrence of the
    word with that sense tagged in the corpus (over
    e.g. all senses of bass) to capture how
    discriminating a word is for the target word's
    senses
  • Corpus Lesk performs best of all the Lesk
    approaches (a Simplified Lesk sketch follows this
    list)
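A sketch of Simplified Lesk over NLTK's WordNet glosses and examples;
stopword handling is omitted for brevity and the function name is ours:

  from nltk.corpus import wordnet as wn

  def simplified_lesk(target, context_words):
      """Pick the sense whose gloss + examples overlap most with the context."""
      context = {w.lower() for w in context_words}
      best, best_overlap = None, -1
      for sense in wn.synsets(target):
          signature = set(sense.definition().lower().split())
          for ex in sense.examples():
              signature |= set(ex.lower().split())
          overlap = len(signature & context)
          if overlap > best_overlap:
              best, best_overlap = sense, overlap
      return best

  print(simplified_lesk('bass', "Sailors love to fish for bass".split()))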

26
Disambiguation via Selectional Restrictions
  • Verbs are known by the company they keep
  • Different verbs select for different thematic
    roles
  • wash the dishes (takes washable-thing as patient)
  • serve delicious dishes (takes food-type as
    patient)
  • Method: another semantic attachment in the grammar
  • Semantic attachment rules are applied as
    sentences are syntactically parsed, e.g.
  • VP --> V NP
  • V --> serve <theme>, theme = food-type
  • Selectional restriction violation --> no parse

27
  • But this means we must
  • Write selectional restrictions for each sense of
    each predicate or use FrameNet
  • Serve alone has 15 verb senses
  • Obtain hierarchical type information about each
    argument (using WordNet; see the sketch after
    this list)
  • How many hypernyms does dish have?
  • How many words are hyponyms of dish?
  • But also
  • Sometimes selectional restrictions don't restrict
    enough (Which dishes do you like?)
  • Sometimes they restrict too much (Eat dirt, worm!
    I'll eat my hat!)
  • Can we take a statistical approach?
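A sketch of checking a selectional restriction by walking the WordNet
hypernym paths of the argument's noun senses; the restriction table and
the synset chosen for the constraint (food.n.01) are toy assumptions:

  from nltk.corpus import wordnet as wn

  # Toy restriction: the theme of 'serve' (the food sense) must be a kind of food
  RESTRICTIONS = {'serve': wn.synset('food.n.01')}

  def satisfies_restriction(verb, arg):
      required = RESTRICTIONS[verb]
      for sense in wn.synsets(arg, pos=wn.NOUN):
          # hypernym_paths() lists chains from this sense up to the root
          for path in sense.hypernym_paths():
              if required in path:
                  return True, sense      # some sense of arg is a kind of food
      return False, None

  print(satisfies_restriction('serve', 'dish'))    # some dish sense is food
  print(satisfies_restriction('serve', 'idea'))    # violation -> no parse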

28
Selectional Association (Resnik 97)
  • Selectional preference strength: how much does a
    predicate tell us about the word class of its
    argument?
  • George is a monster, George cooked a steak
  • S_R(v): how different is p(c), the probability
    that any direct object will be a member of some
    class c, from p(c|v), the probability that a
    direct object of this specific verb will fall
    into that class?
  • Estimate conditional probabilities of word senses
    from a parsed corpus, counting how often each
    predicate occurs with an object argument
  • e.g. How likely is dish to be an object of
    served?
  • Jane served/V the dish/Obj
  • Then estimate the strength of association between
    each predicate and the super-class (hypernym) of
    the argument in Wordnet

29
  • E.g. For each object x of serve (e.g. ragout,
    Mary, dish)
  • Look up all of x's hypernym classes in WordNet
    (e.g. dish is-a piece of crockery, dish is-a food
    item, ragout is-a food item, Mary is-a person)
  • Distribute the credit for each of x's senses
    occurring with serve among all the hypernym
    classes to which x belongs (1/n for n classes;
    see the sketch below)
  • Pr(c|v) is estimated as count(c,v)/count(v)
  • Why does this work?
  • Ambiguous words have many superordinate classes
  • John served food/the dish/tuna/curry
  • The most common sense across all objects of the
    verb should eventually dominate the likelihood
    score
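A sketch of the credit-distribution step for estimating Pr(c|v) from
parsed (verb, direct-object) pairs; the class inventory comes from
WordNet hypernym paths, and spreading credit over the full set of
ancestor classes is a simplification of Resnik's formulation:

  from collections import Counter, defaultdict
  from nltk.corpus import wordnet as wn

  def class_counts(verb_object_pairs):
      """Return count(c, v) with each object's credit split over its classes."""
      counts = defaultdict(Counter)        # verb -> credit per hypernym class
      verb_totals = Counter()              # count(v)
      for verb, obj in verb_object_pairs:
          senses = wn.synsets(obj, pos=wn.NOUN)
          if not senses:
              continue
          verb_totals[verb] += 1
          # every hypernym class any sense of obj belongs to
          classes = {c for s in senses for path in s.hypernym_paths() for c in path}
          for c in classes:
              counts[verb][c] += 1.0 / len(classes)   # distribute the credit
      return counts, verb_totals

  def p_class_given_verb(counts, verb_totals, c, verb):
      return counts[verb][c] / verb_totals[verb]      # Pr(c|v) ~ count(c,v)/count(v)

  counts, totals = class_counts([('serve', 'ragout'), ('serve', 'dish'),
                                 ('serve', 'curry'), ('serve', 'Mary')])
  print(p_class_given_verb(counts, totals, wn.synset('food.n.01'), 'serve'))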

30
  • How can we use this in WSD?
  • Choose the class (sense) of the direct object
    with the highest probability, given the verb
  • Mary served the dish proudly.
  • Results
  • Baselines
  • Random choice of word sense: 26.8%
  • Choose the most frequent sense (N.B. requires a
    sense-labeled training corpus): 58.2%
  • Resnik's method: 44% correct, from a corpus with
    only predicate/argument relations labeled

31
Evaluating WSD
  • In vivo (end-to-end/task-based/extrinsic) vs. in
    vitro (stand-alone/intrinsic) evaluation: embedded
    in some task (parsing? Q/A? IVR system?) vs.
    application-independent
  • In vitro metrics: classification accuracy on a
    held-out test set, or precision/recall/F-measure
    if not all instances must be labeled (a metrics
    sketch follows this list)
  • Baseline
  • Most frequent sense?
  • Lesk algorithms
  • Ceiling: human annotator agreement
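A sketch of the in vitro metrics, where predictions may be None for
instances the system declines to label; the gold and predicted sense
lists are placeholders:

  def accuracy(gold, predicted):
      return sum(g == p for g, p in zip(gold, predicted)) / len(gold)

  def precision_recall_f(gold, predicted):
      """Precision/recall/F-measure when the system may abstain (None)."""
      attempted = [(g, p) for g, p in zip(gold, predicted) if p is not None]
      correct = sum(g == p for g, p in attempted)
      precision = correct / len(attempted) if attempted else 0.0
      recall = correct / len(gold)
      f = (2 * precision * recall / (precision + recall)
           if precision + recall else 0.0)
      return precision, recall, f

  # Compare against the most-frequent-sense baseline and the
  # human-agreement ceiling on the same held-out test set.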

32
Summing Up
  • Word relations: how can we identify different
    types?
  • Disambiguating among word senses
  • Next time: Ch. 17, 3-5