Automatic Methods to Detect the Compositionality of Multiwords

Provided by: dianamcc

1
Automatic Methods to Detect the Compositionality
of Multiwords
2
Outline
  1. What we want to cover
  2. Why we do it
  3. A survey of current methods
  4. Approaches to evaluation
  5. Comparison of some of the results
  6. Conclusions
  7. Directions for the future

3
Compositionality, non-compositionality and
decomposability
  • Compositionality: the meaning of the phrase is
    a function of the meaning of the parts

  • Non-compositionality: the meaning of the phrase
    is not a function of the meaning of the parts

  • Decomposability: the meaning of the phrase can
    be ascribed to its parts
  • Idiosyncratically decomposable: spill the beans,
    let the cat out of the bag
  • Simply decomposable: traffic light, car park

4
Correlation (or confusion) of compositionality
  • with productivity
  • with statistical frequency of occurrence

5
Motivation
  • Any requirement for semantic interpretation will
    require handling of non-compositional multiwords
    in order to arrive at the correct interpretation
  • e.g. She kicked the bucket
  • Associated syntactic behaviour is needed for
    parsing
  • e.g. blow up the houses of parliament
  • Important for lexical acquisition
  • e.g. eat hot dog
  • Associated non-productive and syntactic
    behaviour important for generation
  • e.g. Wine and dine

6
Methods: the main categories
  • Statistical: p(see, red) / (p(see) p(red))
  • Translations: see red <-> aberrear
  • Dictionaries: listings, semantic
    codes and semantic relationships
  • Substitutions: see red, see yellow,
    see blue
  • Distributional: see: look, perceive, gaze;
    red: yellow, orange, blue

7
Statistical Methods
  • Statistical measures
  • e.g. pointwise mutual information (PMI)
  • Venkatapathy and Joshi (2006): useful for
    alignment
  • Syntactic flexibility
  • Fazly and Stevenson (2006) (verb-noun compounds)
  • idiomatic nature is reflected in syntactic behaviour
  • (passivization, determiner type and
    pluralization)
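
The PMI measure named above compares the observed probability of a word pair with the probability expected if the two words were independent, i.e. the log of p(see, red) / (p(see) p(red)). A minimal Python sketch follows; the function name and all counts are invented for illustration and do not come from any corpus used in the cited work.

```python
import math

def pmi(pair_count, w1_count, w2_count, total):
    """Pointwise mutual information (base 2) of a word pair.

    All counts are assumed to be drawn from the same sample of
    `total` events; probabilities are plain relative frequencies.
    """
    p_pair = pair_count / total
    p_w1 = w1_count / total
    p_w2 = w2_count / total
    return math.log2(p_pair / (p_w1 * p_w2))

# Hypothetical counts, for illustration only.
TOTAL = 1_000_000
score = pmi(pair_count=50, w1_count=2_000, w2_count=1_000, total=TOTAL)
print(f"PMI(see, red) = {score:.2f}")
```

A pair that co-occurs exactly as often as independence predicts scores 0; larger positive values indicate a stronger association, which is evidence of collocation rather than a direct measure of non-compositionality.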

8
Translations
  • Melamed (1997): "non-compositional compounds";
    statistical comparison of translation models i)
    with concatenated words ii) with separate words
  • Mukerjee et al. (2006): Hindi-English parallel
    corpora used for detecting Hindi complex
    predicates
  • Venkatapathy and Joshi (2006): compositionality
    (PMI) used for alignment
  • Translations between one word and many are not
    necessarily non-compositional
  • e.g. swimming pool (piscine), video tape (video)
  • Nevertheless, very useful for finding collocations
    for a language pair
  • Villada Moirón and Tiedemann (2006): diversity of
    translations for an expression; overlap between
    the meaning of the expression, obtained from its
    translations, and the meanings of its component words

9
Substitution Methods
baggage, luggage
  • Pearce (2001): anti-collocations using WordNet
    synonyms
  • e.g. emotional baggage vs emotional luggage
  • Lin (1999): PMI; 95% significant difference
    between phrase and phrase with close substitute.
    Close substitutes found from an automatically
    generated thesaurus (Lin, 1998)
  • e.g. see: gaze, look, perceive
  • Lexical fixedness: Fazly and Stevenson (2006)
    (verb-noun compounds), as Lin (1999) but using the
    difference between the PMI of the target and the
    average PMI of the set of substitutes
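
The lexical fixedness idea in the last bullet can be sketched as the target's PMI minus the mean PMI of its substitution variants. This follows the slide's wording only; the function name and the PMI values are hypothetical, and the published measure may normalise the difference in ways not shown here.

```python
from statistics import mean

def fixedness(target_pmi, substitute_pmis):
    """Target PMI minus the average PMI of its substitution variants."""
    if not substitute_pmis:
        raise ValueError("need at least one substitute")
    return target_pmi - mean(substitute_pmis)

# Hypothetical PMI values, for illustration only. The variants are built
# by swapping "see" for its thesaurus neighbours gaze, look and perceive.
target_pmi = 6.0  # "see red"
variant_pmis = {"gaze red": 1.0, "look red": 2.0, "perceive red": 0.0}

print(fixedness(target_pmi, list(variant_pmis.values())))
```

A large positive difference means the phrase is much more strongly associated than its near-synonymous variants, which is the signal substitution methods rely on.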

10
Dictionary methods
  • Recognition of idiomatic tokens in a Japanese
    corpus using syntactic evidence and information
    in an idiom dictionary (Hashimoto et al. 2006)
  • Using hierarchical information in WordNet to
    model decomposability for evaluation (Baldwin et
    al. 2003)
  • Piao et al. (2006): lexical resource (Lancaster
    Semantic Lexicon) used to compare the meaning of a
    listed multiword to that of its component words.
    Semantic distance is measured using the semantic
    tags given in the lexicon

11
Substitution Methods (contd.)
  • What is being captured?
  • Bannard et al (2003) and Baldwin et al (2003)
    argue that these methods capture
    non-productivity, (simple decomposable
    collocations)
  • NB Pearce (2001) is explicitly targeting
    collocations rather than compositionality
  • Fazly and Stevenson (2006) acknowledge the
    partial relationship (compositionality and
    lexical fixedness) but the relationship exists
    nevertheless

12
Selectional Preference Models
  • Bannard (2002): verb particle data, eat up
    <object> vs eat <object>
  • Models (Li and Abe, 1995) acquired using corpus
    data and WordNet
  • Current work (McCarthy): prototypical
    selectional preference models acquired using
    corpus data and an automatically generated
    thesaurus
  • (Lin, 1998; see later)
  • e.g. drink <object> vs drink tea
  • e.g. throw <object> vs throw light

13
Distributional Approaches: Latent Semantic
Analysis
Contexts of dog
context   frequency
bark      50
animal    30
food      10
water     5
drink     3
bath      1
14
Distributional Approaches: Latent Semantic
Analysis
15
Distributional Approaches: Thesaurus creation
  • Example: dog, hot and hot dog
  • Corpus contexts:
  • feed the dog, keep dogs, keep cats, stroke cats,
    feed the horse,
  • hot water, cold water, hot milk, warm milk,
    boiling milk, hot weather
  • eat the sandwich, eat the hot dog, cook the hot
    dog, serve the burger
  • Resulting thesaurus entries:
  • dog: cat, animal, pet, horse
  • hot: cold, warm, boiling, mild
  • hot dog: hamburger, sandwich, pizza

16
Distributional Approaches
  • Schone and Jurafsky (2001): LSA; weighted sum of
    vectors for component words compared to MWE
    candidate
  • Baldwin et al. (2003): decomposability (simple vs
    non- or idiosyncratic) of noun-noun compounds and
    verb particle constructions. Compared with vectors
    of constituent words in isolation
  • Bannard et al. (2003): compare LSA with Lin (1999)
    on verb particle constructions
  • Katz and Giesbrecht (2006): token analysis
    for 1 example, "ins Wasser fallen". Compare
    literal and compositional vectors for this
    example. Type based experiment with composed
    vectors where constituent words have occurred in
    isolation.
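
The comparison of a composed vector with the vector of the multiword candidate can be sketched as below. This is a simplification under stated assumptions: raw co-occurrence counts stand in for LSA vectors (no dimensionality reduction is performed), the weights are arbitrary, and only the `dog` counts come from the context table on slide 13; the `hot` and `hot_dog` vectors are invented.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def weighted_sum(vectors, weights):
    """Compose component-word vectors as a weighted sum."""
    out = {}
    for vec, w in zip(vectors, weights):
        for k, x in vec.items():
            out[k] = out.get(k, 0.0) + w * x
    return out

# Counts for "dog" are taken from the slide 13 table.
dog = {"bark": 50, "animal": 30, "food": 10, "water": 5, "drink": 3, "bath": 1}
# Hypothetical counts, for illustration only.
hot = {"water": 40, "weather": 20, "bath": 10, "drink": 5}
hot_dog = {"eat": 30, "food": 25, "mustard": 10, "drink": 5}

composed = weighted_sum([hot, dog], [0.5, 0.5])
similarity = cosine(composed, hot_dog)
print(f"cosine(hot + dog, hot dog) = {similarity:.3f}")
```

A low similarity between the composed vector and the phrase vector is taken as a sign that the phrase is used in contexts its parts do not predict, i.e. that it may be non-compositional.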

17
Distributional Methods
  • McCarthy et al. (2003) look at overlap of similar
    words (neighbours) in a distributional thesaurus
    for verb e.g. climb compared to verb and particle
    construction e.g. climb down

Various other measures, including number of
neighbours in the phrasal set with the same
particle, (minus the number having the same
particle in the simplex verb neighbours)
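
Two of the neighbour-based measures described above can be sketched as follows. The neighbour lists are invented, the function names are hypothetical, and multiword neighbours are assumed to be stored as space-separated strings ending in the particle; the original thesaurus format is not given in the slides.

```python
def top_overlap(simplex_neighbours, phrasal_neighbours, x):
    """Number of words shared by the top-x neighbours of verb and phrasal verb."""
    return len(set(simplex_neighbours[:x]) & set(phrasal_neighbours[:x]))

def has_particle(neighbour, particle):
    """True if the neighbour is a multiword item ending in the given particle."""
    words = neighbour.split()
    return len(words) > 1 and words[-1] == particle

def same_particle_minus_simplex(simplex_neighbours, phrasal_neighbours, particle, x):
    """Neighbours with the same particle in the phrasal set, minus those
    with that particle among the simplex verb's neighbours."""
    in_phrasal = sum(has_particle(n, particle) for n in phrasal_neighbours[:x])
    in_simplex = sum(has_particle(n, particle) for n in simplex_neighbours[:x])
    return in_phrasal - in_simplex

# Hypothetical ranked neighbour lists, for illustration only.
climb = ["ascend", "walk", "jump", "descend", "go up"]
climb_down = ["descend", "back down", "walk", "step down", "jump"]

print(top_overlap(climb, climb_down, 5))
print(same_particle_minus_simplex(climb, climb_down, "down", 5))
```

High overlap suggests the phrasal verb behaves like its simplex verb (more compositional), while the particle-based count looks at how much of the phrasal verb's neighbourhood shares its particle.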
18
Combining approaches
  • Venkatapathy and Joshi (2005)
  • frequency
  • PMI
  • substitution based on Lin (1999)
  • distributed frequency of object,
  • distributed frequency of object with dissimilar
    verbs
  • LSA similarity of V-O with verbal form of O
  • LSA dissimilarity of V-O with V
  • All combined with SVM ranking

19
Method: Selectional preferences using a
distributional thesaurus (McCarthy)
  • Is the argument prototypical for this predicate
    and argument relationship?
  • E.g. eat my hat
  • Like substitution methods, but not explicitly
    looking for a substitute
  • Verb direct objects
  • e.g. eat: meal 5, dinner 5, tea 6, lunch 10, food 6,
    sandwich 3, duck 1, cheese 2, hat 3
  • food: sandwich, cheese, meat, duck
  • meal: dinner, lunch, tea, supper
  • clothing: shirt, belt, hat, trousers

20
Methods for evaluation: token based
  • Hashimoto et al. (2006): 300 example sentences of
    100 idioms; information from a dictionary used for
    discrimination
  • Katz and Giesbrecht (2006): 67 occurrences of 1
    idiom (ins Wasser fallen)
  • literal and idiomatic readings have orthogonal
    LSA vectors
  • Compare individual token vectors to these

21
Methods for evaluation: type based
  • Dictionary
  • Schone and Jurafsky (2001), Fazly and Stevenson
    (2006)
  • Using is-a links (hyponymy)
  • Baldwin et al. (2003), WordNet
  • Manual verification
  • Lin (1999)
  • Web as validation
  • Villavicencio (2005)
  • Hayes et al. (2005)
  • Compositionality judgements
  • Contribution from constituents (Bannard, 2002),
    (Bannard et al. 2003)
  • Along a continuum (McCarthy et al. 2003),
    (Venkatapathy and Joshi, 2005)

22
Some results: Compositionality judgements on a
continuum
  • McCarthy et al. (2003): 111 phrasal verb versus
    verb constructions
  • (0-10)
  • 3 native English speakers, highly significant
    Kendall coefficient of concordance
  • Venkatapathy and Joshi (2005): 765 verb object
    pairs (1-6)
  • 2 fluent English speakers, Spearman's rank
    correlation coefficient
  • Good level of agreement

Example items: carry out, cloud over, climb up; change hands, take interest, announce plan
23
Results: McCarthy et al. datasets
Measure     rs      Z score   p under H0
Overlap
  X = 30    0.166   1.74      0.04
  X = 50    0.136   1.43      0.08
OverlapS
  X = 30    0.306   3.21      < 0.0007
  X = 50    0.303   3.18      < 0.0007
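
The Z scores and p values reported for the Spearman coefficients are consistent with the large-sample approximation Z = rs * sqrt(n - 1) for n = 111 items and a one-tailed normal test. This is an inference from the figures, not something the slides state, so the sketch below is a plausibility check rather than a description of the authors' procedure; the function names are hypothetical.

```python
import math

def z_from_spearman(rs, n):
    """Large-sample normal approximation for Spearman's rs."""
    return rs * math.sqrt(n - 1)

def one_tailed_p(z):
    """Upper-tail probability of a standard normal variate."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# First Overlap row: rs = 0.166, with n = 111 items (slide 22).
z = z_from_spearman(0.166, 111)
p = one_tailed_p(z)
print(f"Z = {z:.2f}, p = {p:.2f}")  # prints "Z = 1.74, p = 0.04"
```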
24
Results: McCarthy et al. datasets
X = 500                statistic      value    Z score   p under H0
sameparticle           rs             0.414    4.34      < 0.00003
sameparticle-simplex   rs             0.49     5.17      < 0.00003
simplexasneighbour     Mann Whitney            0.950     0.171
simplexrank            rs             -0.115   -1.21     0.113
simplexscore           rs             0.052    0.54      0.295
Piao et al. (2006), semantic lexicon (79/116): rs = 0.354, p = 0.001357
25
Correlation of McCarthy et al (2003) human
rankings with statistics and dictionaries
Measure         statistic      value    Z score   p under H0
LLR             rs             -0.168   -1.76     0.0392
χ²              rs             -0.213   -2.22     0.0139
MI              rs             -0.248   -2.60     0.0047
Phrasal freq    rs             -0.096   -1.01     0.156
Simplex freq    rs             0.092    0.96      0.169
WordNet         Mann Whitney            2.39      0.0084
ANLT phrasals   Mann Whitney            3.03      0.0012
26
Correlation of measures with man-made resources
(Mann Whitney Z scores)
Measure                In WordNet   In ANLT phrasals
PMI                    -2.61        -4.53
sameparticle-simplex   3.71         4.59
27
Results with Venkatapathy and Joshi (2005) dataset
Feature                                                    Correlation
1) Frequency (BNC)                                         .129
2) PMI                                                     .203
3) Distributed frequency of object                         .111
4) Distributed frequency of object with dissimilar verbs   .139
5) LSA dissimilarity of V-O with V                         .139
6) LSA similarity of V-O with verbal form of O             .300
7) Lin (1999) substitution                                 .210
Ranking SVM function (using 1-7)                           .448
McCarthy 1/pref score (638/765)                            -.403
28
Conclusions
  • Purpose of task should match method and
    evaluation
  • Evaluation is tricky
  • Decisions are not clear cut
  • Statistical measures and substitution methods
    may be useful, though capturing behaviour that
    correlates with compositionality
  • Distributional approaches promising for
    languages without resources
  • Selectional preferences may add useful
    information, alongside other measures

29
Future
  • Address tokens as well as types
  • Tokens on a continuum
  • Error analysis
  • Separating non-decomposable from
    idiosyncratically decomposable
  • Detecting what multiwords mean, distributional
    approaches might be promising in this respect
  • kick the bucket --- die
  • share datasets!!!

30
References
  • Baldwin, Timothy, Colin Bannard, Takaaki Tanaka
    and Dominic Widdows (2003) An Empirical Model of
    Multiword Expression Decomposability. In
    Proceedings of the ACL Workshop on Multiword
    Expressions: Analysis, Acquisition and Treatment,
    Sapporo, Japan, pp. 89-96.
  • Bannard, Colin (2002) Statistical Techniques for
    Automatically Inferring the Semantics of
    Verb-Particle Constructions. LinGO Working Paper
    No. 2002-06.
    http://lingo.stanford.edu/pubs/WP-2002-06.pdf
  • Bannard, Colin, Timothy Baldwin and Alex
    Lascarides (2003) A Statistical Approach to the
    Semantics of Verb-Particles. In Proceedings of
    the ACL Workshop on Multiword Expressions:
    Analysis, Acquisition and Treatment, Sapporo,
    Japan, pp. 65-72.
  • Fazly, Afsaneh and Suzanne Stevenson (2006)
    Automatically constructing a lexicon of verb
    phrase idiomatic combinations. In Proceedings of
    the 11th Conference of the European Chapter of
    the Association for Computational Linguistics
    (EACL), pp. 337-344, Trento, Italy.
  • Hayes, Jer, Nuno Seco and Tony Veale (2005)
    Creative discovery in the lexical validation gap.
    Computer Speech and Language, 19(4):513-523.
  • Hashimoto, Chikara, Satoshi Sato and Takehito
    Utsuro (2006) Japanese Idiom Recognition:
    Drawing a Line between Literal and Idiomatic
    Meanings. In Proceedings of the COLING/ACL 2006
    Main Conference Poster Sessions, pp. 353-360,
    Sydney, Australia.
  • Katz, Graham and Eugenie Giesbrecht (2006)
    Automatic Identification of Non-Compositional
    Multi-Word Expressions using Latent Semantic
    Analysis. In Proceedings of the ACL Workshop on
    Multiword Expressions: Identifying and Exploiting
    Underlying Properties, Sydney, Australia.
  • Lin, Dekang (1998) Automatic Retrieval and
    Clustering of Similar Words. In
    Proceedings of the 17th International Conference on
    Computational Linguistics and the 36th Annual
    Meeting of the Association for Computational
    Linguistics, Montreal, Canada.
  • Lin, Dekang (1999) Automatic Identification of
    Non-Compositional Phrases. In Proceedings of
    ACL-99, pp. 317-324, University of Maryland,
    College Park, Maryland.
  • Melamed, I. Dan (1997) Automatic Discovery of
    Non-Compositional Compounds in Parallel Data. In
    Proceedings of the 2nd Conference on Empirical
    Methods in Natural Language Processing (EMNLP),
    Providence, RI.

31
References continued
  • McCarthy, Diana, Bill Keller and John Carroll
    (2003) Detecting a Continuum of Compositionality
    in Phrasal Verbs. In Proceedings of the
    ACL-SIGLEX Workshop on Multiword Expressions:
    Analysis, Acquisition and Treatment, Sapporo,
    Japan.
  • Mukerjee, Amitabha, Ankit Soni and Achla M. Raina
    (2006) Detecting Complex Predicates in Hindi
    using POS Projection across Parallel Corpora. In
    Proceedings of the ACL Workshop on Multiword
    Expressions: Identifying and Exploiting
    Underlying Properties, pp. 28-35, Sydney, Australia.
  • Pearce, Darren (2001) Synonymy in Collocation
    Extraction. In WordNet and Other Lexical
    Resources: Applications, Extensions and
    Customizations (NAACL 2001 Workshop), pp. 41-46,
    June 2001, Carnegie Mellon University,
    Pittsburgh.
  • Piao, Scott S.L., Paul Rayson, Olga Mudraya,
    Andrew Wilson and Roger Garside (2006) Measuring
    MWE Compositionality Using Semantic Annotation. In
    Proceedings of the ACL Workshop on Multiword
    Expressions: Identifying and Exploiting
    Underlying Properties, Sydney, Australia, pp. 28-35.
  • Schone, Patrick and Daniel Jurafsky (2001) Is
    Knowledge-Free Induction of Multiword Unit
    Dictionary Headwords a Solved Problem? In
    Proceedings of Empirical Methods in Natural
    Language Processing, Pittsburgh, PA.
  • Venkatapathy, Sriram and Aravind K. Joshi (2005)
    Measuring the relative compositionality of
    verb-noun (V-N) collocations by integrating
    features. In Proceedings of HLT/EMNLP, Vancouver.
  • Villada Moirón, Begoña and Joerg Tiedemann
    (2006) Identifying idiomatic expressions using
    automatic word-alignment. In Proceedings of the
    EACL Workshop on Multiword Expressions in a
    Multilingual Context, Trento, Italy.
  • Villavicencio, A. (2005) The availability of
    verb-particle constructions in lexical resources:
    How much is enough? Computer Speech and
    Language, 19(4).