Investigating adjective denotation and collocation

1
Investigating adjective denotation and collocation
  • Ann Copestake
  • Computer Laboratory,
  • University of Cambridge

2
Outline
  • introduction: compositional semantics, GL and
    semantic space models; denotation and collocation
  • distribution of magnitude adjectives
  • hypotheses about adjective denotation and
    collocation
  • semi-productivity

3
Themes
  • semi-productivity: extending paper in GL 2001 to
    phrases
  • statistical and symbolic models interacting
  • generation as well as analysis
  • computational account

4
Different branches of computational semantics
  • compositional semantics: capture syntax, (some)
    closed-class words and (some) morphology
      • every x [dog(x) -> bark(x)]
      • large coverage grammars as testbed for GL
        (constructions, composition, underspecification)
  • lexical semantics, e.g.,
      • GL (interacts with compositional semantics)
      • WordNet
      • meaning postulates etc.
  • semantic space models, e.g.,
      • LSA
      • Schütze (1995)
      • Lin (multiple papers), Padó and Lapata (2003)

5
semantic spaces
  • acquired from corpora
  • generally, collect vectors of words which
    co-occur with the target
  • more sophisticated models incorporate syntactic
    relationships

      dog  bark  house  cat
dog     -     1      0    0
bark    1     -      0    0
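
The vector collection step can be sketched roughly as follows. This is a minimal illustration, assuming a tokenised toy corpus and a symmetric context window; the function name cooccurrence_vectors, the window size and the two example sentences are assumptions made for the sketch and are not from the talk.

```python
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """For each target word, count the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors

# Hypothetical two-sentence corpus, chosen only to illustrate the counting.
corpus = [["the", "dog", "bark"], ["the", "cat", "house"]]
vectors = cooccurrence_vectors(corpus, window=1)
print(vectors["dog"]["bark"], vectors["dog"]["cat"])
```

Each row of the resulting table is the vector for one target word; models that use syntactic relationships would replace the window test with a check on a parsed dependency.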

6
Semantic space models and compositional semantics?
  • do spaces correspond to predicates in
    compositional semantics? e.g., bark
  • attractions
      • automatic acquisition
      • similarity metrics, priming
      • fuzziness, meaning variation, sense clustering
      • statistical approximation to real world
        knowledge? (but fallacy with parse selection
        techniques)
  • problems
      • classical lexical semantic relations (hyponymy
        etc.) aren't captured well
      • can't do inference
      • sensitivity to domain/corpus
      • role of collocation?

7
Denotation assumptions
  • Truth-conditional, logically formalisable (in
    principle), refers to real world (extension)
  • Not necessarily decomposable: natural kinds (dog'
    = canis familiaris), natural predicates
  • Naive physics, biology, etc.
  • Computationally: specification of meaning that
    interfaces with non-linguistic components
  • Selectional restrictions?
      • bark(x) -> dog(x) or seal(x) or ...

8
Collocation assumptions
  • Significant co-occurrences of words in
    syntactically interesting relationships
      • syntactically interesting: for this talk,
        attributive adjectives and the nouns they
        immediately precede
      • significant: statistically significant (but on
        what assumptions about baseline?)
  • Compositional, no idiosyncratic syntax etc. (as
    opposed to multiword expression)
  • About language rather than the real world
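
One common way to make "significant co-occurrence" concrete is pointwise mutual information (PMI), which compares the observed pair frequency with what independence would predict. The sketch below is only an illustration of how the choice of baseline enters: it treats the small adjective-noun table from the BNC slide later in the talk as if it were the whole sample, which is an assumption of the sketch and not a claim about how the talk measured significance. The function name pmi is hypothetical.

```python
import math

# Counts copied from the "Some adjective-noun frequencies in the BNC" slide.
NOUNS = ["number", "proportion", "quality", "problem", "part", "winds", "rain"]
COUNTS = {
    "large": [1790, 404, 0, 10, 533, 0, 0],
    "high": [92, 501, 799, 0, 3, 90, 0],
    "big": [11, 1, 0, 79, 79, 3, 1],
    "heavy": [0, 0, 1, 0, 1, 2, 198],
}

def pmi(adjective, noun):
    """PMI in bits against an independence baseline; None for an unseen pair."""
    column = NOUNS.index(noun)
    joint = COUNTS[adjective][column]
    if joint == 0:
        return None
    # Assumption: the table itself is the whole sample, so all totals come from it.
    total = sum(sum(row) for row in COUNTS.values())
    adjective_total = sum(COUNTS[adjective])
    noun_total = sum(row[column] for row in COUNTS.values())
    return math.log2(joint * total / (adjective_total * noun_total))

print(pmi("heavy", "rain"), pmi("big", "rain"), pmi("large", "rain"))
```

Under this baseline heavy rain scores well above zero and big rain below zero. A different baseline (for example, all adjective-noun pairs in the corpus) would change the numbers, which is the point of the question about baseline assumptions.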

9
Collocation versus denotation
  • Whether an unusually frequent word pair is a
    collocation or not depends on assumptions about
    denotation: fix denotation to investigate
    collocation
  • Empirically: investigations using WordNet synsets
    (Pearce, 2001)
  • Anti-collocation: words that might be expected to
    go together and tend not to
      • e.g., ? flawless behaviour (Cruse, 1986), ? big
        rain (unless explained by denotation)
  • e.g., buy house is predictable on basis of
    denotation, shake fist is not
10
Collocation and denotation investigations
  • can this notion of collocation be made precise,
    empirically testable?
  • assumptions about denotation determine whether
    something is a collocation
  • semantic space models will include collocational
    effects
  • initial, very preliminary, investigations with
    magnitude adjectives
      • attributive adjectives: can get corpus data
        without parsing
      • only one argument to consider
11
Distribution of magnitude adjectives: summary
  • some very frequent adjectives have
    magnitude-related meanings (e.g., heavy, high,
    big, large)
      • basic meaning with simple concrete entities
      • extended meaning with abstract nouns,
        non-concrete physical entities (high taxation,
        heavy rain)
      • extended uses more common than basic
      • not all magnitude adjectives, e.g., tall
  • nouns tend to occur with a limited subset of
    these extended adjectives
  • some apparent semantic groupings of nouns which
    go with particular adjectives, but not easily
    specified

12
Some adjective-noun frequencies in the BNC
       number  proportion  quality  problem  part  winds  rain
large    1790         404        0       10   533      0     0
high       92         501      799        0     3     90     0
big        11           1        0       79    79      3     1
heavy       0           0        1        0     1      2   198

13
Grammaticality judgments
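[Table: grammaticality judgments for large, high, big and heavy against the nouns number, proportion, quality, problem, part, winds and rain; the individual judgment marks did not survive in this transcript]

14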
More examples
        importance  success  majority  number  proportion  quality  role  problem  part  winds  support  rain
great          310      360       382     172           9       11     3       44    71      0       22     0
large            1        1       112    1790         404        0    13       10   533      0        1     0
high             8        0         0      92         501      799     1        0     3     90        2     0
major           62       60         0       0           7        0   272      356   408      1        8     0
big              0       40         5      11           1        0     3       79    79      3        1     1
strong           0        0         2       0           0        1     8        0     3    132      147     0
heavy            0        0         1       0           0        1     0        0     1      2        4   198

15
Judgments
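[Table: grammaticality judgments for great, large, high, major, big, strong and heavy against the nouns importance, success, majority, number, proportion, quality, role, problem, part, winds, support and rain; the individual judgment marks did not survive in this transcript]

16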
Distribution
  • Investigated the distribution of heavy, high,
    big, large, strong, great, major with the most
    common co-occurring nouns in the BNC
  • Nouns tend to occur with up to three of these
    adjectives with high frequency and low or zero
    frequency with the rest
  • My intuitive grammaticality judgments correlate
    with these frequencies, but allow for some unseen
    combinations and disallow a few observed but very
    infrequent ones
  • big, major and great are grammatical with many
    nouns (but not frequent with most), strong and
    heavy are ungrammatical with most nouns, high and
    large intermediate
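
The kind of check behind these observations can be sketched with the counts from the "More examples" slide. This is a rough illustration only: the function names and the threshold of 50 for "high frequency" are assumptions of the sketch, and the number of adjectives that pass is sensitive to that threshold.

```python
# Counts copied from the "More examples" slide (BNC adjective-noun frequencies).
NOUNS = ["importance", "success", "majority", "number", "proportion", "quality",
         "role", "problem", "part", "winds", "support", "rain"]
COUNTS = {
    "great":  [310, 360, 382, 172, 9, 11, 3, 44, 71, 0, 22, 0],
    "large":  [1, 1, 112, 1790, 404, 0, 13, 10, 533, 0, 1, 0],
    "high":   [8, 0, 0, 92, 501, 799, 1, 0, 3, 90, 2, 0],
    "major":  [62, 60, 0, 0, 7, 0, 272, 356, 408, 1, 8, 0],
    "big":    [0, 40, 5, 11, 1, 0, 3, 79, 79, 3, 1, 1],
    "strong": [0, 0, 2, 0, 0, 1, 8, 0, 3, 132, 147, 0],
    "heavy":  [0, 0, 1, 0, 0, 1, 0, 0, 1, 2, 4, 198],
}

def frequent_adjectives(noun, threshold=50):
    """Adjectives whose count with `noun` reaches the (assumed) threshold."""
    column = NOUNS.index(noun)
    return [adj for adj, row in COUNTS.items() if row[column] >= threshold]

def preferred_adjective(noun):
    """The single most frequent adjective for `noun` in the table."""
    column = NOUNS.index(noun)
    return max(COUNTS, key=lambda adj: COUNTS[adj][column])

print(frequent_adjectives("number"))   # ['great', 'large', 'high']
print(frequent_adjectives("rain"))     # ['heavy']
print(preferred_adjective("winds"))    # strong
```

With this threshold most nouns in the table have one to three frequent adjectives, although part has four (great, large, major, big), so any threshold-based summary should be read with the cut-off in mind.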

17
heavy groupings?
  • magnitude: dew, rainstorm, downpour, rain,
    rainfall, snowfall, fall, snow, shower; frost,
    spindrift; clouds, mist, fog; flow, flooding,
    bleeding, period, traffic; demands, reliance,
    workload, responsibility, emphasis, dependence;
    irony, sarcasm, criticism; infestation, soiling;
    loss, price, cost, expenditure, taxation, fine,
    penalty, damages, investment; punishment,
    sentence; fire, bombardment, casualties, defeat,
    fighting; burden, load, weight, pressure; crop;
    advertising; use, drinking
  • magnitude of verb: drinker, smoker
  • magnitude related? odour, perfume, scent, smell,
    whiff; lunch; sea, surf, swell

18
high groupings?
  • magnitude: esteem, status, regard, reputation,
    standing, calibre, value, priority; grade,
    quality, level; proportion, degree, incidence,
    frequency, number, prevalence, percentage;
    volume, speed, voltage, pressure, concentration,
    density, performance, temperature, energy,
    resolution, dose, wind; risk, cost, price, rate,
    inflation, tax, taxation, mortality, turnover,
    wage, income, productivity, unemployment, demand
  • magnitude of verb: earner

19
heavy and high
  • 50 nouns in BNC with the extended magnitude use
    of heavy with frequency 10 or more
  • 160 such nouns with high
  • Only 9 such nouns with both adjectives: price,
    pressure, investment, demand, rainfall, cost,
    costs, concentration, taxation

20
Basic adjective denotation
  • with simple concrete objects
  • high(x) => zdim(x) > norm(zdim, type(x), c)
  • heavy(x) => wt(x) > norm(wt, type(x), c)
  • where zdim is distance on vertical, wt is weight
    (measure functions, MF)
  • norm(MF, class, context) is some standard for MF
    for class in context
  • (high also requires selectional restriction:
    not animate)
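
The basic denotations can be read as a comparison between a measure function and a contextual norm. The sketch below is a toy rendering under stated assumptions: the Entity class, the NORMS table and its values are hypothetical, and the slide's conditions are treated as simple tests rather than full definitions.

```python
from dataclasses import dataclass

@dataclass
class Entity:
    kind: str              # plays the role of type(x)
    zdim: float = 0.0      # vertical extent (hypothetical units: metres)
    wt: float = 0.0        # weight (hypothetical units: kilograms)
    animate: bool = False

# Hypothetical standards: (measure function, class, context) -> norm value.
NORMS = {
    ("zdim", "wall", "garden"): 1.5,
    ("wt", "suitcase", "airline"): 23.0,
}

def norm(mf, kind, context):
    """norm(MF, class, context): some standard for MF for class in context."""
    return NORMS[(mf, kind, context)]

def high(x, context):
    # zdim(x) > norm(zdim, type(x), c), plus the restriction: not animate
    return (not x.animate) and x.zdim > norm("zdim", x.kind, context)

def heavy(x, context):
    # wt(x) > norm(wt, type(x), c)
    return x.wt > norm("wt", x.kind, context)

wall = Entity(kind="wall", zdim=2.0)
suitcase = Entity(kind="suitcase", wt=20.0)
print(high(wall, "garden"), heavy(suitcase, "airline"))
```

The same object can come out high in one context and not in another simply by changing the norm, which is what the context argument c is for.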

21
Metaphor
  • Different metaphors for different nouns (cf.
    Lakoff et al.)
      • high: nouns measured with an upright scale,
        e.g., temperature: temperature is rising
      • heavy: nouns metaphorically like burden, e.g.,
        workload: her workload is weighing on her
  • Empirical account of distribution?
  • predictability of noun classes? high volume?
    high and heavy taxation
  • adjective denotation for inference etc.? via
    literal denotation?
  • Discussed again at end of talk

22
Possible empirical accounts of distribution
  • Difference in denotation between extended uses
    of adjectives
  • Grammaticized selectional restrictions/preferences
  • Lexical selection
      • stipulate Magn function with nouns (Meaning-Text
        Theory)
  • Semi-productivity / collocation
      • plus semantic back-off

23
Computational semantics perspective
  • Require workable account of denotation: not too
    difficult to acquire, not over-specific
  • Require account of distribution for generation
  • Robustness and completeness
  • Can't assume pragmatics / real world knowledge
    does the difficult bits!

24
Denotation account of distribution
  • Denotation of adjective simply prevents it being
    possible with the noun.
  • heavy and high have different denotations
      • heavy(x) => MF(x) > norm(MF, type(x), c) and
        (precipitation(x) or cost(x) or flow(x) or
        consumption(x) ...)
      • (where rain(x) -> precipitation(x) and so on)
  • But: messy disjunction or multiple senses,
    open-ended, unlikely to be tractable.
      • e.g., heavy shower only for rain sense, not
        bathroom sense
  • Not falsifiable, but no motivation other than
    distribution.
  • Dictionary definitions can be seen as doing this
    (informally), but none account for observed
    distribution.

25
Selectional restrictions and distribution
  • Assume the adjectives have the same denotation
  • Distribution via features in the lexicon
      • e.g., literal high selects for ANIMATE false
      • approach used in the LinGO ERG for in/on in
        temporal expressions
      • grammaticized, so doesn't need to be determined
        by denotation (though assume consistency)
      • can utilise qualia structure
  • Problem: can't find a reasonable set of
    cross-cutting features!
  • Stipulative approach possible, but unattractive.
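
A feature-based account of this kind can be sketched as a lookup from adjectives to required noun features. Only the ANIMATE false restriction on literal high comes from the talk; the noun entries, the dictionary names and the function licensed are hypothetical, and the sketch makes no attempt to solve the problem of finding a good feature set.

```python
# Hypothetical noun lexicon with grammaticized features.
NOUN_FEATURES = {
    "wall": {"ANIMATE": False},
    "giraffe": {"ANIMATE": True},
}

# Features each adjective selects for; literal high selects for ANIMATE false.
ADJECTIVE_SELECTS = {
    "high": {"ANIMATE": False},
}

def licensed(adjective, noun):
    """True if the noun carries every feature value the adjective selects for."""
    required = ADJECTIVE_SELECTS.get(adjective, {})
    features = NOUN_FEATURES.get(noun, {})
    # A noun with no entry for a required feature is treated as not licensed.
    return all(features.get(name) == value for name, value in required.items())

print(licensed("high", "wall"), licensed("high", "giraffe"))
```

Extending this to the extended uses would need one feature per adjective-noun grouping, which is the stipulative outcome the slide describes as unattractive.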

26
Lexical selection
  • MTT approach
      • noun specifies its Magn adjective
      • in Mel'čuk and Polguère (1987), Magn is a
        function, but could modify to make it a set, or
        vary meanings
  • stipulative: if we're going to do this, why not
    use a corpus directly?

27
Collocational account of distribution
  • all the adjectives share a denotation
    corresponding to magnitude (more details later);
    distribution differences due to collocation, soft
    rather than hard constraints
  • linguistically
      • adjective-noun combination is semi-productive
      • denotation and syntax allow heavy esteem etc., but
        speakers are sensitive to frequencies, prefer
        more frequent phrases with same meaning
      • cf. morphology and sense extension: Briscoe and
        Copestake (1999)
      • blocking (but weaker than with morphology)
      • anti-collocations as reflection of
        semi-productivity

28
Collocational account of distribution
  • computationally, fits with some current practice
      • filter adjective-noun realisations according to
        n-grams (statistical generation, e.g., Langkilde
        and Knight)
      • use of co-occurrences in WSD
      • back-off techniques
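
The n-gram filtering idea can be sketched as ranking candidate adjective-noun realisations by bigram count. This is a minimal illustration, not the method of any particular generator: the counts are copied from the BNC table earlier in the talk, while the function name rank_realisations and the decision to drop unseen pairs are assumptions of the sketch.

```python
# (adjective, noun) bigram counts copied from the BNC table earlier in the talk.
BIGRAMS = {
    ("heavy", "rain"): 198, ("big", "rain"): 1,
    ("strong", "winds"): 132, ("high", "winds"): 90, ("big", "winds"): 3,
    ("heavy", "winds"): 2, ("major", "winds"): 1,
}

# Candidates assumed to share a single magnitude denotation.
MAGNITUDE_ADJECTIVES = ["great", "large", "high", "major", "big", "strong", "heavy"]

def rank_realisations(noun, candidates=MAGNITUDE_ADJECTIVES):
    """Order attested adjective-noun phrases by bigram count, most frequent first."""
    scored = [(BIGRAMS.get((adj, noun), 0), adj) for adj in candidates]
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps candidate order on ties
    return [f"{adj} {noun}" for count, adj in scored if count > 0]

print(rank_realisations("rain"))   # ['heavy rain', 'big rain']
print(rank_realisations("winds"))
```

Ranking rather than rejecting keeps the constraint soft: low-frequency phrases remain available, they are just dispreferred. Nouns with no attested pairs return an empty list here and would need a back-off step.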

29
Collocational vs denotational differences
[Figure: diagram relating heavy, high and low, with the labels "Denotation difference" and "Collocation difference"]

30
Back-off and analogy
  • back-off: decision for infrequent noun with no
    corpus evidence for specific magnitude adjective
  • based on productivity of adjective: number of
    nouns it occurs with
      • default to big
  • back-off also sensitive to word clusters
      • e.g., heavy spindrift because spindrift is
        semantically similar to snow
  • semantic space models, i.e., group according to
    distribution with other words
      • hence, adjective has some correlation with
        semantics of the noun
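
The back-off decision can be sketched as a three-step lookup: direct corpus evidence, then semantically similar nouns, then a productive default. The adjective choices for rain, snow and winds follow the talk's data and groupings, and the spindrift and snow pairing and the default big are from this slide; the dictionary and function names are hypothetical, and a real neighbour list would come from a semantic space model rather than being written by hand.

```python
# Preferred magnitude adjective for nouns with corpus evidence (from the talk's data).
ATTESTED = {"rain": "heavy", "snow": "heavy", "winds": "strong"}

# Hand-written stand-in for semantic-space neighbours of infrequent nouns.
NEIGHBOURS = {"spindrift": ["snow"]}

# Default chosen for productivity: the slide defaults to big.
DEFAULT = "big"

def choose_magnitude_adjective(noun):
    """Pick an adjective by direct evidence, then by analogy, then by default."""
    if noun in ATTESTED:
        return ATTESTED[noun]
    for neighbour in NEIGHBOURS.get(noun, []):
        if neighbour in ATTESTED:
            return ATTESTED[neighbour]
    return DEFAULT

print(choose_magnitude_adjective("rain"))       # heavy
print(choose_magnitude_adjective("spindrift"))  # heavy
print(choose_magnitude_adjective("gadget"))     # big
```

Because the analogy step goes through distributional neighbours, the chosen adjective ends up correlated with the noun's semantics without that correlation being stated as a rule.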

31
Metaphor again
  • extended metaphor idea is consistent with idea
    that clusters for back-off are based on semantic
    space
      • words cluster according to how they co-occur
      • e.g., high words cluster with rise words?
  • but this doesn't require that we interpret high
    literally and then coerce

32
More details: denotation of extended adjective uses
  • mass, e.g., rain, and some plural, e.g.,
    casualties
      • cf. much, many
  • inherent measure, e.g., grade, percentage, fine
  • other, e.g., rainstorm, defeat, bombardment
      • attribute in qualia has Magn: heavy rainstorm
        equivalent to storm with heavy rain
      • also heavy drinker etc.

33
More details
  • Different uses cross-cut adjective distinction
    and domain categories
  • Want to have single extended sense and some form
    of co-composition
  • Further complications: nouns with temporal
    duration
      • heavy rain not the same as persistent rain
      • heavy fighting but heavy drinking
  • how much of this do we have to encode
    specifically?

34
Connotation
  • heavy often has negative connotations
  • heavy fine but not ? heavy reward etc
  • heavy taxation versus high taxation
  • consistent with the semantic cluster / extended
    metaphor idea

35
Necessary experiments
  • None of this is tested yet!
  • Specify denotation, check for accuracy
  • Implement semi-productivity model with back-off
  • Determine predictability of adjective based on
    noun alone
  • Extension to other adjectives? Magnitude
    adjectives may be more lexical than others.

36
Conclusions
  • Testing collocational account of distribution
    requires fixing denotation
  • Magnitude adjectives: assume same denotation
      • more complex denotations would need different
        experiments
  • Semi-productivity at the phrasal level
  • Back-off account is crucial

37
Some final comments
  • denotation, selectional restriction, collocation:
    choice between mechanisms?
  • n-grams for language models for speech recognition
  • variants of semantic space models that are less
    sensitive to collocation effects?
      • can we remove collocation?