1
Chapter 11: Lexical Acquisition
2
Lecture Overview
  • Methodological Issues: Evaluation Measures
  • Verb Subcategorization
  • the syntactic means by which verbs express their arguments
  • Attachment Ambiguity
  • The children ate the cake with their hands.
  • The children ate the cake with blue icing.
  • Selectional Preferences
  • the semantic categorization of a verb's arguments
  • Semantic Similarity (refer to IR course)
  • semantic similarity between words

3
Lexicon
  • That part of the grammar of a language which
    includes the lexical entries for all the words
    and/or morphemes in the language and which may
    also include various other information, depending
    on the particular theory of grammar

(Examples covered in this chapter: quantitative information, PP attachment.)
4
Evaluation Measures
  • Precision and Recall
  • F Measure
  • Fallout
  • Receiver Operating Characteristic (ROC) Curve
  • Shows how different levels of fallout (false
    positives as a proportion of all non-targeted
    events) influence recall or sensitivity (true
    positives as a proportion of all targeted events)

5
precision = tp / (tp + fp)
recall = tp / (tp + fn)
tp = true positives, tn = true negatives,
fp = false positives (type I errors), fn = false negatives (type II errors)
6
Precision and Recall versus Accuracy and Error
accuracy = (tp + tn) / (tp + tn + fp + fn)
error = (fp + fn) / (tp + tn + fp + fn)
When tn is huge, it dwarfs all the other numbers.
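A minimal sketch of these measures in Python; the counts in the example call are made-up illustration values:

```python
def evaluation_measures(tp, fp, fn, tn, beta=1.0):
    """Evaluation measures computed from a 2x2 contingency table."""
    precision = tp / (tp + fp)            # correct among selected items
    recall = tp / (tp + fn)               # selected among target items
    fallout = fp / (fp + tn)              # false positives among non-target items
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    error = (fp + fn) / (tp + fp + fn + tn)
    # F measure: weighted harmonic mean of precision and recall
    f = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
    return {"precision": precision, "recall": recall, "fallout": fallout,
            "accuracy": accuracy, "error": error, "f": f}

print(evaluation_measures(tp=40, fp=10, fn=20, tn=930))
```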
7
Verb Subcategorization I
  • Verbs express their semantic arguments using
    different syntactic means. A particular set of
    syntactic categories that a verb can appear with
    is called a subcategorization frame.

8
Verb Subcategorization I
  • Most dictionaries do not contain information on
    subcategorization frames.
  • Brent's (1993) subcategorization frame learner
    tries to decide, based on corpus evidence, whether
    verb v takes frame f. It works in two steps.

9
Verb Subcategorization II
  • Brent's Lerner system
  • Cues: Define a regular pattern of words and
    syntactic categories which indicates the presence
    of the frame with high certainty. For a
    particular cue c_j we define a probability of
    error ε_j that indicates how likely we are to make
    a mistake if we assign frame f to verb v based on
    cue c_j.
  • Hypothesis Testing: Define the null hypothesis
    H0 as "the frame is not appropriate for the
    verb". Reject this hypothesis if the cue c_j
    indicates with high probability that H0 is wrong.

10
Example
  • Cues
  • Regular pattern for the subcategorization frame "NP NP":
    (OBJ | SUBJ_OBJ | CAP) (PUNC | CC)
    where OBJ = object pronouns (e.g., me, him),
    SUBJ_OBJ = pronouns that can be subject or object (e.g., it, you),
    and CAP = any capitalized word.
  • greet-V Peter-CAP ,-PUNC (correct match)
  • I came Thursday, before the storm started. (false match)
  • Null hypothesis testing
  • Verb v_i occurs a total of n times in the corpus,
    and there are m ≤ n occurrences with a cue for frame f_j.
  • Reject the null hypothesis H0 that v_i does not
    permit f_j with the following probability of error:
    p_E = Σ_{r=m}^{n} C(n, r) ε_j^r (1 − ε_j)^(n−r)
    where ε_j is the error rate for cue c_j and m is the
    number of times that v_i occurs with cue c_j.
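A sketch of Brent's binomial hypothesis test in Python; the counts m and n, the cue error rate eps, and the rejection threshold alpha in the example are illustrative assumptions:

```python
from math import comb

def error_probability(m, n, eps):
    """P(cue fires m or more times in n occurrences of the verb)
    under H0: the verb does not permit the frame, so the cue
    fires only in error, with rate eps."""
    return sum(comb(n, r) * eps**r * (1 - eps)**(n - r) for r in range(m, n + 1))

def takes_frame(m, n, eps, alpha=0.02):
    """Reject H0 when the probability of rejecting it in error is below alpha."""
    return error_probability(m, n, eps) < alpha

# e.g., verb seen n=80 times, cue for the frame seen m=5 times, cue error rate 2%
print(takes_frame(m=5, n=80, eps=0.02))
```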
11
Verb Subcategorization III
  • Brent's system does well at precision, but not
    well at recall.
  • Manning's (1993) system addresses this problem by
    using a tagger and running the cue detection on
    the output of the tagger.
  • Manning's method can learn a large number of
    subcategorization frames, even those that have
    only low-reliability cues.
  • Manning's results are still low, and one way to
    improve them is to use prior knowledge.

12
PCFG prefers parses that use common constructions
  • Sue bought a plant with Jane.
  • Sue bought a plant with yellow leaves.
  • Syntactic information alone is insufficient.
  • Simple Methods for Prepositional Phrases
  • PP-attachment problem: in the pattern
    "verb np1 (prep np2)", make the attachment
    decision between the verb and np1.
13
  • Assumption:
    P(A | prep, verb, np1, np2, w) ≈ P(A | prep, verb, np1, np2)
  • w = the words in the text outside of "verb np1 (prep np2)"
  • A = a random variable representing the attachment decision
  • V(A) = {verb, np1}, the possible values of A
14
  • Counterexample:
  • Fred saw a movie with Arnold Schwarzenegger.
  • Further assumption (simplification), where
    noun1 (noun2) is the head of np1 (np2):
    P(A | prep, verb, np1, np2) ≈ P(A | prep, verb, noun1, noun2)
  • total parameters ≈ 10^13:
    |prep| × |verb| × |noun| × |noun|
15
  • Further simplification: keep statistics on the
    preposition's propensity to attach to the verb
    versus its propensity to attach to the noun:
    P(A = noun | prep, verb, noun1) vs.
    P(A = verb | prep, verb, noun1)
  • total parameters: |prep| × |verb| × |noun1|

Alternatives to reduce the number of parameters:
(1) Condition probabilities on fewer things.
(2) Condition probabilities on more general things.
16
Attachment Ambiguity
  • The preference bias for low attachment in the
    parse tree is formalized by Hindle and Rooth (1993).
  • The model asks the following questions:
  • VA_p: Is there a PP headed by p and following the
    verb v which attaches to v (VA_p = 1) or not (VA_p = 0)?
  • NA_p: Is there a PP headed by p and following the
    noun n which attaches to n (NA_p = 1) or not (NA_p = 0)?

17
Assume conditional independence of the two
attachments: whether the verb (noun) is modified by
a PP is independent of whether the noun (verb) is.
  • Determine the attachment of a PP that immediately
    follows an object noun, i.e., compute the
    probability of NA_p = 1.
  • In order for the first PP headed by the
    preposition p to attach to the verb, both
    VA_p = 1 and NA_p = 0 must hold.

18
Likelihood ratio λ:
    λ(v, n, p) = log2 [ P(VA_p = 1 | v) P(NA_p = 0 | n) / P(NA_p = 1 | n) ]
  • Verb attachment: large positive value of λ
  • Noun attachment: large negative value of λ
  • Undecidable: λ close to zero

Maximum likelihood estimation of P(VA_p = 1 | v) and P(NA_p = 1 | n):
    P(VA_p = 1 | v) = C(v, p) / C(v)
    P(NA_p = 1 | n) = C(n, p) / C(n)
where C(v) and C(n) are the numbers of occurrences of v and n,
and C(v, p) and C(n, p) are the numbers of times that p
attaches to v and to n.
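A sketch of the likelihood-ratio computation in Python; the counts in the example call are invented for illustration:

```python
from math import log2

def lambda_score(c_v, c_vp, c_n, c_np):
    """Hindle & Rooth style likelihood ratio for PP attachment.
    c_v, c_n: occurrence counts of verb v and noun n;
    c_vp, c_np: counts of preposition p attaching to v and to n."""
    p_va = c_vp / c_v                    # P(VA_p = 1 | v)
    p_na = c_np / c_n                    # P(NA_p = 1 | n)
    return log2(p_va * (1 - p_na) / p_na)

# made-up counts for "send ... into" vs. "soldiers into"
lam = lambda_score(c_v=1742, c_vp=86, c_n=612, c_np=2)
print("verb attachment" if lam > 0 else "noun attachment", lam)
```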
19
  • Estimation of PP attachment counts (a code sketch
    follows this list):
  • 1. If a noun is followed by a PP with no preceding
    verb, increment C(prep attached to noun).
    [Sure Noun Attach]
    E.g., a noun in subject position or preverbal position.
  • 2. If a passive verb is followed by a PP other
    than a by-phrase, increment C(prep attached to verb).
    [Sure Verb Attach 1]
    E.g., The dog was hit on the leg.
  • 3. If a PP follows both a noun phrase and a verb
    but the noun phrase is a pronoun, increment
    C(prep attached to verb). [Sure Verb Attach 2]
    E.g., Sue saw him in the park.
  • 4. If a PP follows both a noun and a verb, see if
    the probabilities based on the attachments decided
    by 1-3 greatly favor one or the other attachment
    (e.g., λ > 2.0 → VA, λ < −2.0 → NA). [Ambiguous Attach 1]
  • 5. Otherwise, increment both attachment counters by 0.5.
    [Ambiguous Attach 2]
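A minimal sketch of this count-bootstrapping procedure; the pre-extracted example format and its field names are assumptions, and lam is a scoring function like the likelihood ratio above:

```python
from collections import Counter

def bootstrap_counts(examples, lam=None, threshold=2.0):
    """examples: iterable of dicts such as
    {'verb': 'hit', 'noun': 'dog', 'prep': 'on',
     'noun_before_verb': False, 'passive': True, 'pronoun_object': False}"""
    c_vp, c_np = Counter(), Counter()
    for ex in examples:
        v, n, p = ex["verb"], ex["noun"], ex["prep"]
        if ex["noun_before_verb"]:               # 1. Sure Noun Attach
            c_np[(n, p)] += 1
        elif ex["passive"] and p != "by":        # 2. Sure Verb Attach 1
            c_vp[(v, p)] += 1
        elif ex["pronoun_object"]:               # 3. Sure Verb Attach 2
            c_vp[(v, p)] += 1
        elif lam is not None and abs(lam(v, n, p)) > threshold:
            if lam(v, n, p) > 0:                 # 4. Ambiguous Attach 1
                c_vp[(v, p)] += 1
            else:
                c_np[(n, p)] += 1
        else:                                    # 5. Ambiguous Attach 2
            c_vp[(v, p)] += 0.5
            c_np[(n, p)] += 0.5
    return c_vp, c_np

c_vp, c_np = bootstrap_counts([
    {"verb": "hit", "noun": "dog", "prep": "on",
     "noun_before_verb": False, "passive": True, "pronoun_object": False},
])
print(c_vp, c_np)
```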
20
Example
Moscow sent more than 100,000 soldiers into Afghanistan.
(attachment to the verb is much more likely)
21
                        Choose Noun   Choose Verb   Percent Correct   Total
Choose Noun (baseline)       889             0              64.0       889
2 human judges               551           338              85.7       889
Program                      565           354              78.1       919?
Program > 95%                407           201              85.0       608
Human > 95%                  416           192              86.9       608

No significant difference between the program and the human
judges on the cases where each is more than 95% sure.

Sparse data is a major cause of the difference
between the human judges' performance and that of
the program.
22
  • Using Semantic Information
  • Condition on semantic tags of the verb and noun.
  • Example semantic tags:
    Artist, Jane, plumber, Ted → (human)
    Friday, June, yesterday → (time)
    hammer, leaves, pot → (object)
  • Sue bought a plant with Jane. → (human) → attach to VP
  • Sue bought a plant with yellow leaves. → (object) → attach to NP
23
General Remarks on PP Attachment
  • There are some limitations to the method by
    Hindle and Rooth:
  • Sometimes information other than v, n, and p is
    useful.
  • There are other types of PP attachment than the
    basic case of a PP immediately after an NP
    object.
  • There are other types of attachments altogether:
    N N N or V N P. The Hindle and Rooth formalism
    is more difficult to apply in these cases because
    of data sparsity.
  • In certain cases, there is attachment
    indeterminacy.

24
  • Relative-Clause Attachment
  • Non-restrictive relative clauses:
  • (1) Fred awarded a prize to the dog and Bill, who
    trained it.
  • (2) Fred awarded a prize to Sue and Fran, who
    sang a great song.
  • Restrictive relative clauses:
  • (3) Fred awarded a prize for penmanship that was
    worth $50.50.
  • (4) Fred awarded a prize for the dog that ran the
    fastest.
25
(No Transcript)
26
  • Determine the referent of the relative pronoun
  • Determine the noun phrase to which it is attached.
  • The dog that e ate the cookie (subject)
  • The dog that Sue bought e (object)
  • The dog that Fred gave the biscuit to e (NP of PP)
  • Where does the head noun fit into the relative
    clause?
  • Sue ate a cookie that was baked e by Fran.
  • Assumption:
  • The noun phrase serves as the subject of the
    relative clause. (Fisher & Riloff)

27
  • (1) Collect subject-verb and verb-object pairs.
    (training part)
  • (2) Compute the t-score. (testing part)
  • t-score > 0.10 → significant
  • For possible attachment points x, y, z:
    P(relative clause attaches to x | main verb of clause = v)
    > P(relative clause attaches to y | main verb of clause = v)
    ⟺ P(x = subject/object | v) > P(y = subject/object | v)

28
  • 1. If the probabilities for all attachments were
    not significant, then the attachment was left
    unresolved (see the sketch after this list).
  • 2. If there was at least one significant score,
    the highest was chosen.
  • 3. If there was a tie, then the rightmost
    attachment was chosen.
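A sketch of this decision procedure; the (position, t_score) input format is an assumption:

```python
def choose_attachment(candidates, threshold=0.10):
    """candidates: list of (position, t_score) pairs, ordered left to right.
    Returns the chosen attachment position, or None if unresolved."""
    significant = [(pos, t) for pos, t in candidates if t > threshold]
    if not significant:
        return None                  # 1. nothing significant: leave unresolved
    best = max(t for _, t in significant)
    tied = [pos for pos, t in significant if t == best]
    return tied[-1]                  # 2./3. highest score; rightmost on a tie

print(choose_attachment([("x", 0.05), ("y", 0.31), ("z", 0.31)]))  # -> z
```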

29
[Figure: results with small training data, comparing semantic tags,
fixed-topic semantic tags, and general semantic tags]
30
  • Uniform use of lexical / semantic information
  • Generalize to other ambiguities: noun-noun,
    adjective-noun
  • E.g., "song bird feeder kit" vs.
    "metal bird feeder kit": the modifier attaches at
    different levels of the NP, roughly
    [[[song bird] feeder] kit] vs. [metal [[bird feeder] kit]]
31
Selectional Preferences I
  • Most verbs prefer arguments of a particular type.
    Such regularities are called selectional
    preferences or selectional restrictions.
  • eat → object: (food item)
  • think → subject: (people)
  • bark → subject: (dog)
  • Selectional preferences are useful for a couple
    of reasons:
  • If a word is missing from our machine-readable
    dictionary, aspects of its meaning can be
    inferred from selectional restrictions.
  • Susan had never eaten a fresh durian before.
    (→ food item)
  • Selectional preferences can be used to rank
    different parses of a sentence.

32
Selectional Preferences II
  • Resnik's (1993, 1996) idea for selectional
    preferences uses the notions of selectional
    preference strength and selectional association.
  • Selectional preference strength S(v) measures
    how strongly the verb constrains its direct
    object:

    S(v) = D( P(C|v) || P(C) ) = Σ_c P(c|v) log [ P(c|v) / P(c) ]

    where P(C) is the overall probability
    distribution of noun classes and P(C|v) is the
    probability distribution of noun classes (of the
    head noun) in the direct object position of v.
  • The same approach applies to other relations:
    verb-subject, verb-direct object,
    verb-prepositional phrase, adjective-noun, noun-noun.
33
Selectional Preferences II
  • S(v) is defined as the KL divergence between the
    prior distribution of direct objects (for verbs
    in general) and the distribution of direct
    objects of the verb we are trying to
    characterize.
  • We make two assumptions in this model: 1) only the
    head noun of the object is considered; 2) rather
    than dealing with individual nouns, we look at
    classes of nouns.

34
Selectional Preferences III
  • The selectional association A(v, c) between a verb
    and a class is defined as the proportion of the
    overall preference strength S(v) contributed by
    that class:

    A(v, c) = P(c|v) log [ P(c|v) / P(c) ] / S(v)
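A sketch of both quantities in Python; the toy class distributions are made-up numbers:

```python
from math import log2

def preference_strength(p_c, p_c_given_v):
    """S(v): KL divergence between P(C|v) and the prior P(C) over noun classes."""
    return sum(p_c_given_v[c] * log2(p_c_given_v[c] / p_c[c])
               for c in p_c_given_v if p_c_given_v[c] > 0)

def selectional_association(c, p_c, p_c_given_v):
    """A(v, c): the share of S(v) contributed by class c."""
    contribution = p_c_given_v[c] * log2(p_c_given_v[c] / p_c[c])
    return contribution / preference_strength(p_c, p_c_given_v)

# toy distributions over three noun classes (made-up numbers)
p_c = {"food": 0.2, "people": 0.5, "action": 0.3}
p_c_given_eat = {"food": 0.85, "people": 0.10, "action": 0.05}
print(preference_strength(p_c, p_c_given_eat))
print(selectional_association("food", p_c, p_c_given_eat))
```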

35
Selectional Preferences III
  • There is also a rule for assigning association
    strengths to nouns as opposed to noun classes. If
    a noun is in a single class, then its association
    strength is that of its class. If it belongs to
    several classes, then its association strength is
    that of the class with the highest association
    strength.
  • Finally, there is a rule for estimating the
    probability that a direct object in noun class c
    occurs given a verb v.
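A sketch of these two rules; splitting each noun's count evenly among its classes is one common way to estimate P(c|v), given here as an assumption, and the data are toy values:

```python
def estimate_p_c_given_v(verb_noun_counts, noun_classes):
    """Estimate P(c|v): split each observed direct object's count evenly
    among the classes the noun belongs to, then normalize."""
    counts, total = {}, 0.0
    for noun, cnt in verb_noun_counts.items():
        classes = noun_classes[noun]
        for c in classes:
            counts[c] = counts.get(c, 0.0) + cnt / len(classes)
        total += cnt
    return {c: k / total for c, k in counts.items()}

def noun_association(noun, noun_classes, assoc):
    """A noun's association strength: that of its class if it has only one,
    else the maximum over all classes it belongs to."""
    return max(assoc[c] for c in noun_classes[noun])

# toy data: head-noun counts in the direct object position of a verb
noun_classes = {"chair": ["furniture", "people"], "cake": ["food"]}
print(estimate_p_c_given_v({"chair": 2, "cake": 6}, noun_classes))
# assoc maps classes to A(v, c) values computed as on the previous slide
print(noun_association("chair", noun_classes, {"furniture": 0.1, "people": 0.9}))
```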

36
Susan interrupted the chair.
"chair": furniture vs. people (i.e., chairperson)
A(interrupt, people) >> A(interrupt, furniture)
→ A(interrupt, chair) = A(interrupt, people)
37
  • eat prefers food items → very specific
  • A(eat, food) = 1.08
  • see has a uniform distribution → no selectional
    preference
  • A(see, people) = A(see, furniture) = A(see, food) = A(see, action) = 0
  • find disprefers action items → less specific
  • A(find, action) = −0.13

38
[Table: typical objects (high selectional association) vs.
atypical objects (low or negative association)]
39
Semantic Similarity I
  • Text understanding and information retrieval could
    benefit greatly from a system able to acquire
    meaning.
  • Meaning acquisition is not possible at this
    point, so people focus on assessing the semantic
    similarity between a new word and already known
    words.
  • Semantic similarity is not as intuitive and clear
    a notion as we may first think: synonymy? same
    semantic domain? contextual interchangeability?
  • Vector space versus probabilistic measures

40
Semantic Similarity II: Vector Space Measures
  • Words can be represented in different spaces:
    document space, word space, and modifier space.
  • Similarity measures for binary vectors: matching
    coefficient, Dice coefficient, Jaccard (or
    Tanimoto) coefficient, overlap coefficient, and
    cosine.
  • Similarity measures for real-valued vector
    spaces: cosine, Euclidean distance, normalized
    correlation coefficient.
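A sketch of a few of these measures in Python; binary vectors are represented as sets of active dimensions, real-valued vectors as lists, and the example data are invented:

```python
from math import sqrt

def dice(x, y):
    """Dice coefficient for binary vectors given as sets of dimensions."""
    return 2 * len(x & y) / (len(x) + len(y))

def jaccard(x, y):
    return len(x & y) / len(x | y)

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

print(dice({"eat", "drink", "cook"}, {"eat", "cook", "bake"}))
print(jaccard({"eat", "drink", "cook"}, {"eat", "cook", "bake"}))
print(cosine([2, 0, 1, 3], [1, 1, 0, 2]))
```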

41
Semantic Similarity III: Probabilistic Measures
  • The problem with vector-space-based measures is
    that, aside from the cosine, they operate on
    binary data. The cosine, on the other hand,
    assumes a Euclidean space, which is not well
    motivated when dealing with word counts.
  • A better way of viewing word counts is to
    represent them as probability distributions.
  • Then we can compare two probability
    distributions using the following measures: KL
    divergence, information radius (IRad), and the L1 norm.
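A sketch of the three measures; the distributions p and q are toy values:

```python
from math import log2

def kl(p, q):
    """KL divergence D(p || q); assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def irad(p, q):
    """Information radius: total divergence to the average distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return kl(p, m) + kl(q, m)

def l1(p, q):
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl(p, q), irad(p, q), l1(p, q))
```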