Title: Automatic Methods to Detect the Compositionality of Multiwords
Outline
- What we want to cover
- Why we do it
- A survey of current methods
- Approaches to evaluation
- Comparison of some of the results
- Conclusions
- Directions for the future
Compositionality, non-compositionality and decomposability
- Compositionality: the meaning of the phrase is a function of the meanings of its parts
- Non-compositionality: the meaning of the phrase is not a function of the meanings of its parts
- Decomposability: the meaning of the phrase can be ascribed to its parts
  - Idiosyncratic: spill the beans, let the cat out of the bag
  - Simple: traffic light, car park
Correlation (or confusion) of compositionality
- with productivity
- with statistical frequency of occurrence
Motivation
- Any task requiring semantic interpretation must handle non-compositional multiwords to arrive at the correct interpretation
  - e.g. She kicked the bucket
- Their syntactic behaviour is needed for parsing
  - e.g. blow up the houses of parliament
- Important for lexical acquisition
  - e.g. eat hot dog
- Their non-productive and syntactic behaviour is important for generation
  - e.g. wine and dine
Methods: the main categories
- Statistical: p(see, red) / (p(see) p(red))
- Translations: see red <-> aberrear
- Dictionaries: listings, semantic codes and semantic relationships
- Substitutions: see red, see yellow, see blue
- Distributional: see: look, perceive, gaze; red: yellow, orange, blue
Statistical Methods
- Statistical measures
  - e.g. pointwise mutual information
  - Venkatapathy and Joshi (2006): useful for alignment
- Syntactic flexibility
  - Fazly and Stevenson (2006) (verb-noun compounds)
  - idiomatic nature is reflected in restricted syntactic variation (passivization, determiner type and pluralization)
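The ratio p(see, red) / (p(see) p(red)) is usually reported on a log scale as pointwise mutual information (PMI). A minimal sketch that computes it from raw counts, assuming maximum-likelihood probability estimates over one shared total and base-2 logarithms; the function name `pmi` and the counts in the usage line are illustrative only, not taken from any of the cited papers.

```python
import math

def pmi(pair_count, count_x, count_y, total):
    """Pointwise mutual information log2(p(x,y) / (p(x) p(y))).

    Probabilities are maximum-likelihood estimates (count / total), so the
    ratio simplifies to pair_count * total / (count_x * count_y).
    """
    if pair_count == 0:
        raise ValueError("PMI is undefined for a pair that never co-occurs")
    return math.log2(pair_count * total / (count_x * count_y))

# Invented counts: "see red" 50 times, "see" 2,000, "red" 1,000, in 1,000,000 tokens.
print(round(pmi(50, 2000, 1000, 1_000_000), 3))  # 4.644
```

A value near 0 means the pair co-occurs about as often as chance predicts; high values flag strong association, which is not the same thing as non-compositionality.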
Translations
- Melamed (1997): non-compositional compounds; statistical comparison of translation models i) with concatenated words, ii) with separate words
- Mukerjee et al. (2006): Hindi-English parallel corpora used for detecting Hindi complex predicates
- Venkatapathy and Joshi (2006): compositionality (PMI) used for alignment
- One-to-many translations are not necessarily non-compositional
  - e.g. swimming pool (piscine), video tape (video)
- Nevertheless, very useful for finding collocations for a language pair
- Villada Moirón and Tiedemann (2006): diversity of translations for an expression; overlap between the meaning of the expression obtained from its translations and the meanings of its component words
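One way to turn "diversity of translations" into a number is the entropy of the distribution of target-language strings that an expression's occurrences are aligned to. A sketch under that assumption; it is not claimed to be Villada Moirón and Tiedemann's exact formulation, the word alignment is assumed to have been produced elsewhere, and the French strings in the usage line are invented.

```python
import math
from collections import Counter

def translational_entropy(aligned_translations):
    """Entropy (bits) of the translations aligned to one source expression.

    `aligned_translations` holds one target-language string per aligned
    occurrence. 0.0 means every occurrence was translated the same way;
    larger values mean more diverse translations.
    """
    counts = Counter(aligned_translations)
    total = sum(counts.values())
    # p * log2(1/p) summed over the distinct translations
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Invented alignments for four occurrences of "spill the beans".
print(translational_entropy(["vendre la mèche", "vendre la mèche",
                             "révéler", "renverser les haricots"]))  # 1.5
```

How this entropy should be combined with the meaning-overlap measure on the slide is left open here.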
Substitution Methods
- Pearce (2001): anti-collocations found using WordNet synonyms
  - e.g. emotional baggage vs emotional luggage
- Lin (1999): PMI; a significant difference (95% level) between the phrase and the phrase with a close substitute
  - close substitutes found in an automatically generated thesaurus (Lin, 1998)
  - e.g. see: gaze, look, perceive
- Lexical fixedness: Fazly and Stevenson (2006) (verb-noun compounds), as Lin (1999) but using the difference between the PMI of the target and the average PMI of the set of substitutes
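The lexical fixedness idea compares the association strength of the target with that of its lexical variants. A minimal sketch of the "difference from the average substitute PMI" described above, with an optional z-score variant; the (pair_count, count_x, count_y, total) tuple layout is an assumption, substitutes that never occur are simply skipped (smoothing would be an alternative), Lin's significance test is not implemented, and all counts are invented.

```python
import math
from statistics import mean, stdev

def pmi(pair_count, count_x, count_y, total):
    """log2(p(x,y) / (p(x) p(y))) from maximum-likelihood counts."""
    return math.log2(pair_count * total / (count_x * count_y))

def lexical_fixedness(target, substitutes, standardise=False):
    """PMI of the target minus the mean PMI of its substitutes.

    `target` and each substitute are (pair_count, count_x, count_y, total)
    tuples. Substitutes with a zero pair count are skipped because their PMI
    is undefined. With standardise=True the difference is divided by the
    standard deviation of the substitutes' PMI values (a z-score).
    """
    target_pmi = pmi(*target)
    sub_pmis = [pmi(*s) for s in substitutes if s[0] > 0]
    if not sub_pmis:
        raise ValueError("no observed substitutes to compare against")
    diff = target_pmi - mean(sub_pmis)
    if standardise:
        if len(sub_pmis) < 2:
            raise ValueError("need at least two observed substitutes for a z-score")
        spread = stdev(sub_pmis)
        if spread == 0:
            raise ValueError("substitute PMI values have zero spread")
        return diff / spread
    return diff

# Invented counts: target "see red" and three substitutes ("see yellow", ...).
target = (50, 2000, 1000, 1_000_000)
subs = [(2, 2000, 1000, 1_000_000), (4, 2000, 1000, 1_000_000), (0, 2000, 1200, 1_000_000)]
print(round(lexical_fixedness(target, subs), 3))  # 4.144
```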
Dictionary methods
- Hashimoto et al. (2006): recognition of idiomatic tokens in a Japanese corpus using syntactic evidence and information in an idiom dictionary
- Baldwin et al. (2003): hierarchical information in WordNet used to model decomposability for evaluation
- Piao et al. (2006): a lexical resource (the Lancaster Semantic Lexicon) used to compare the meaning of a listed multiword with that of its component words; semantic distance is measured using the semantic tags given in the lexicon
Substitution Methods (continued)
- What is being captured?
- Bannard et al. (2003) and Baldwin et al. (2003) argue that these methods capture non-productivity (simple decomposable collocations)
- NB Pearce (2001) explicitly targets collocations rather than compositionality
- Fazly and Stevenson (2006) acknowledge that the relationship between compositionality and lexical fixedness is only partial, but it exists nevertheless
Selectional Preference Models
- Bannard (2002): verb-particle data, eat up <object> vs eat <object>
- Li and Abe (1995): models acquired using corpus data and WordNet
- Current work (McCarthy): prototypical selectional preference models acquired using corpus data and an automatically generated thesaurus (Lin, 1998; see later)
  - e.g. drink <object> vs drink tea
  - e.g. throw <object> vs throw light
Distributional Approaches: Latent Semantic Analysis
Contexts of dog
  context   frequency
  bark      50
  animal    30
  food      10
  water     5
  drink     3
  bath      1
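Context-frequency tables like the one for dog can be gathered by counting the words that occur near each occurrence of the target. A sketch assuming a symmetric window over plain word tokens; the slide does not say how its contexts were defined, and the window size, the function name and the tiny corpus are illustrative assumptions. LSA would then build a word-by-context matrix from such counts and reduce it with a truncated singular value decomposition; that step is not shown.

```python
from collections import Counter

def context_counts(tokens, target, window=2):
    """Count words occurring within `window` positions either side of `target`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the target occurrence itself
                counts[tokens[j]] += 1
    return counts

# Tiny invented corpus; a real table like the one above needs a large corpus.
tokens = "the dog began to bark and the dog drank some water".split()
print(context_counts(tokens, "dog", window=2).most_common(3))
```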
Distributional Approaches: Thesaurus creation
- Example: dog, hot and hot dog
- Contexts
  - feed the dog, keep dogs, keep cats, stroke cats, feed the horse, ...
  - hot water, cold water, hot milk, warm milk, boiling milk, hot weather
  - eat the sandwich, eat the hot dog, cook the hot dog, serve the burger
- Thesaurus neighbours
  - dog: cat, animal, pet, horse
  - hot: cold, warm, boiling, mild
  - hot dog: hamburger, sandwich, pizza
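Entries such as dog: cat, animal, pet, horse come from ranking words by the similarity of their contexts. A sketch of a Lin (1998)-style measure as we understand it: the weight of the shared features, counted for both words, divided by the total weight of both words' features. It assumes features are (relation, word) pairs whose positive mutual-information weights have already been computed; the relation labels and weights are invented.

```python
def lin_similarity(features_a, features_b):
    """Lin (1998)-style similarity between two words.

    Each argument maps a feature, e.g. ("obj-of", "feed"), to that word's
    positive mutual-information weight for the feature. The score is the
    weight of the shared features (counted for both words) divided by the
    total weight of both words' features, so it lies in [0, 1].
    """
    shared = features_a.keys() & features_b.keys()
    numerator = sum(features_a[f] + features_b[f] for f in shared)
    denominator = sum(features_a.values()) + sum(features_b.values())
    return numerator / denominator if denominator else 0.0

# Invented feature weights echoing "feed the dog", "keep dogs", "stroke cats".
dog = {("obj-of", "feed"): 2.0, ("obj-of", "keep"): 1.5}
cat = {("obj-of", "keep"): 1.0, ("obj-of", "stroke"): 2.5}
print(lin_similarity(dog, cat))
```

Ranking every other word by this score and keeping the top few gives a thesaurus entry.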
Distributional Approaches
- Schone and Jurafsky (2001): LSA; weighted sum of the vectors for the component words compared with the vector for the MWE candidate
- Baldwin et al. (2003): decomposability (simple vs non- or idiosyncratically decomposable) of noun-noun compounds and verb-particle constructions; compared the vector of the expression with the vectors of its constituent words in isolation
- Bannard et al. (2003): compare LSA with Lin (1999) on verb-particle constructions
- Katz and Giesbrecht (2006): token analysis for one example, "ins Wasser fallen"; compare literal and idiomatic vectors for this example. Type-based experiment with composed vectors, where the constituent words have occurred in isolation
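The composition test above asks how close the vector of the whole candidate is to a (weighted) combination of its parts' vectors. A sketch assuming the vectors already exist (e.g. from LSA) as equal-length lists of floats and using cosine similarity; equal weights by default, the function names, and the toy three-dimensional vectors are all illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def composed_vector(component_vectors, weights):
    """Weighted sum of the component-word vectors, dimension by dimension."""
    return [sum(w * vec[k] for w, vec in zip(weights, component_vectors))
            for k in range(len(component_vectors[0]))]

def composition_similarity(mwe_vector, component_vectors, weights=None):
    """Cosine between the MWE vector and the weighted sum of its parts.

    Low values suggest the MWE's contexts are not predictable from its
    parts, i.e. a candidate for non-compositionality.
    """
    if weights is None:
        weights = [1.0] * len(component_vectors)
    return cosine(mwe_vector, composed_vector(component_vectors, weights))

# Invented 3-dimensional vectors for "hot", "dog" and "hot dog".
hot, dog, hot_dog = [0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]
print(round(composition_similarity(hot_dog, [hot, dog]), 3))
```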
Distributional Methods
- McCarthy et al. (2003) look at the overlap of similar words (neighbours) in a distributional thesaurus for a verb, e.g. climb, compared with those for the verb-particle construction, e.g. climb down
- Various other measures, including the number of neighbours in the phrasal verb's set with the same particle (minus the number having the same particle among the simplex verb's neighbours)
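The overlap and same-particle measures can be read as set and count operations over ranked neighbour lists. A sketch of our reading of the slide: neighbour lists are assumed to be ranked outputs of a distributional thesaurus, phrasal verbs are assumed to be written as "verb particle" strings, X (the number of top neighbours considered) is a parameter, and the neighbour lists in the usage lines are invented.

```python
def particle(entry):
    """Return the particle of a 'verb particle' string, or None for a simplex verb."""
    parts = entry.split()
    return parts[-1] if len(parts) > 1 else None

def overlap(phrasal_neighbours, simplex_neighbours, top_x=30):
    """Number of words shared by the top-X neighbours of the phrasal and simplex verbs."""
    return len(set(phrasal_neighbours[:top_x]) & set(simplex_neighbours[:top_x]))

def same_particle(neighbours, target_particle, top_x=500):
    """Number of top-X neighbours that are phrasal verbs with the given particle."""
    return sum(1 for n in neighbours[:top_x] if particle(n) == target_particle)

def same_particle_minus_simplex(phrasal_verb, phrasal_neighbours,
                                simplex_neighbours, top_x=500):
    """Same-particle count for the phrasal verb minus that for its simplex verb."""
    p = particle(phrasal_verb)
    return (same_particle(phrasal_neighbours, p, top_x)
            - same_particle(simplex_neighbours, p, top_x))

# Invented, heavily truncated neighbour lists.
climb_down = ["descend", "go down", "climb up", "step down", "scramble"]
climb = ["ascend", "scale", "clamber", "go down", "scramble"]
print(overlap(climb_down, climb),
      same_particle_minus_simplex("climb down", climb_down, climb))  # 2 1
```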
18Combining approaches
- Venkatapathy and Joshi (2005)
- frequency
- PMI
- substitution based on Lin (1999)
- distributed frequency of object
- distributed frequency of object with dissimilar verbs
- LSA similarity of V-O with verbal form of O
- LSA dissimilarity of V-O with V
- All combined with SVM ranking
Method: Selectional Preferences using a distributional thesaurus (McCarthy)
- Is the argument prototypical for this predicate and argument relationship?
  - e.g. eat my hat
- Like substitution methods, but not explicitly looking for a substitute
- Verb direct objects
  - e.g. eat: meal 5, dinner 5, tea 6, lunch 10, food 6, sandwich 3, duck 1, cheese 2, ... hat 3
- Thesaurus neighbours
  - food: sandwich, cheese, meat, duck
  - meal: dinner, lunch, tea, supper
  - clothing: shirt, belt, hat, trousers
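A rough way to ask whether an argument is prototypical is to see how much of the verb's observed object frequency falls in the argument's thesaurus class. The sketch below is only a simplified proxy, not McCarthy's actual model: it assumes each class is the head word plus the neighbours listed on this slide, treats the slide's eat counts as raw frequencies, and the names `preference_score`, `EAT_OBJECTS` and `CLASSES` are hypothetical.

```python
# Object counts for "eat" and thesaurus classes, as listed on the slide.
EAT_OBJECTS = {"meal": 5, "dinner": 5, "tea": 6, "lunch": 10, "food": 6,
               "sandwich": 3, "duck": 1, "cheese": 2, "hat": 3}
CLASSES = {
    "food": {"food", "sandwich", "cheese", "meat", "duck"},
    "meal": {"meal", "dinner", "lunch", "tea", "supper"},
    "clothing": {"clothing", "shirt", "belt", "hat", "trousers"},
}

def preference_score(verb_objects, argument, classes):
    """Share of the verb's object frequency that falls in the argument's class(es).

    If the argument belongs to no class, only its own count is used.
    """
    total = sum(verb_objects.values())
    members = set().union(*(c for c in classes.values() if argument in c)) or {argument}
    return sum(verb_objects.get(w, 0) for w in members) / total

for arg in ("lunch", "cheese", "hat"):
    print(arg, round(preference_score(EAT_OBJECTS, arg, CLASSES), 3))
# lunch 0.634, cheese 0.293, hat 0.073
```

The low score for hat is the kind of signal that marks eat my hat as a non-prototypical, possibly idiomatic, combination.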
Methods for evaluation: token based
- Hashimoto et al. (2006): 300 example sentences of 100 idioms; information from the dictionary used for discrimination
- Katz and Giesbrecht (2006): 67 occurrences of one idiom (ins Wasser fallen)
  - literal and idiomatic readings have orthogonal LSA vectors
  - individual token vectors are compared with these
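The token-based set-up compares each occurrence's vector with a literal and an idiomatic reference vector. A minimal nearest-vector sketch, assuming the reference vectors are given (e.g. averaged from annotated occurrences) and that cosine similarity is the comparison; the function name `classify_token` and the two-dimensional vectors are invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def classify_token(token_vector, literal_vector, idiomatic_vector):
    """Label an occurrence by whichever reference vector it is closer to (ties go to literal)."""
    literal_sim = cosine(token_vector, literal_vector)
    idiomatic_sim = cosine(token_vector, idiomatic_vector)
    return "literal" if literal_sim >= idiomatic_sim else "idiomatic"

# Invented vectors: literal and idiomatic readings are orthogonal, as on the slide.
literal, idiomatic = [1.0, 0.0], [0.0, 1.0]
print(classify_token([0.8, 0.3], literal, idiomatic))  # literal
print(classify_token([0.2, 0.9], literal, idiomatic))  # idiomatic
```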
Methods for evaluation: type based
- Dictionary
  - Schone and Jurafsky (2001), Fazly and Stevenson (2006)
- Using is-a links (hyponymy)
  - Baldwin et al. (2003), WordNet
- Manual verification
  - Lin (1999)
- Web as validation
  - Villavicencio (2005)
  - Hayes et al. (2005)
- Compositionality judgements
  - Contribution from constituents (Bannard, 2002; Bannard et al., 2003)
  - Along a continuum (McCarthy et al., 2003; Venkatapathy and Joshi, 2005)
Some results: Compositionality Judgements on a Continuum
- McCarthy et al. (2003): 111 phrasal verbs judged against their simplex verbs, on a 0-10 scale
  - 3 native English speakers; highly significant Kendall coefficient of concordance
- Venkatapathy and Joshi (2005): 765 verb-object pairs, on a 1-6 scale
  - 2 fluent English speakers; Spearman's rank correlation coefficient
- Good level of agreement
Example items: carry out, cloud over, climb up; change hands, take interest, announce plan
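Agreement among several judges scoring the same items can be measured with Kendall's coefficient of concordance W. A sketch of the textbook formula W = 12S / (m²(n³ - n)), where m judges rank n items and S is the sum of squared deviations of each item's rank total from its mean; scores are converted to ranks with ties averaged, but the tie correction to W is omitted, which is an assumption worth revisiting for 0-10 scales with many ties. The ratings are invented.

```python
def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kendalls_w(ratings):
    """Kendall's W for `ratings`: one list per judge, each scoring the same n items."""
    m, n = len(ratings), len(ratings[0])
    ranked = [average_ranks(r) for r in ratings]
    totals = [sum(r[item] for r in ranked) for item in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))

ratings = [[7, 2, 9, 4], [8, 1, 9, 5], [4, 3, 10, 6]]  # invented: 3 judges x 4 items
print(round(kendalls_w(ratings), 3))  # 0.911
```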
Results: McCarthy et al. datasets
  measure    X    rs      Z score   p under H0
  Overlap    30   0.166   1.74      0.04
  Overlap    50   0.136   1.43      0.08
  OverlapS   30   0.306   3.21      < 0.0007
  OverlapS   50   0.303   3.18      < 0.0007
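The Z scores in these tables appear consistent with the large-sample approximation Z = rs·√(n - 1): with n = 111 phrasal verbs, rs = 0.166 gives 0.166 × √110 ≈ 1.74, and the one-tailed normal p-value for 1.74 is about 0.04. A sketch under that assumption, computing Spearman's rs as the Pearson correlation of tie-averaged ranks; the function names are illustrative.

```python
import math

def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson correlation of the (tie-averaged) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)

def z_score(rs, n):
    """Normal approximation for Spearman's rs computed over n items."""
    return rs * math.sqrt(n - 1)

def one_tailed_p(z):
    """Upper-tail probability of a standard normal variable, P(Z >= z)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = z_score(0.166, 111)
print(round(z, 2), round(one_tailed_p(z), 2))  # 1.74 0.04
```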
Results: McCarthy et al. datasets (X = 500)
  measure                statistic      Z score   p under H0
  sameparticle           rs = 0.414     4.34      < 0.00003
  sameparticle-simplex   rs = 0.49      5.17      < 0.00003
  simplexasneighbour     Mann-Whitney   0.950     0.171
  simplexrank            rs = -0.115    -1.21     0.113
  simplexscore           rs = 0.052     0.54      0.295
Piao et al. (2006), semantic lexicon (79/116): rs = 0.354, p = 0.001357
Correlation of McCarthy et al. (2003) human rankings with statistics and dictionaries
  measure         statistic      Z score   p under H0
  LLR             rs = -0.168    -1.76     0.0392
  χ²              rs = -0.213    -2.22     0.0139
  MI              rs = -0.248    -2.60     0.0047
  Phrasal freq    rs = -0.096    -1.01     0.156
  Simplex freq    rs = 0.092     0.96      0.169
  WordNet         Mann-Whitney   2.39      0.0084
  ANLT phrasals   Mann-Whitney   3.03      0.0012
Correlation of measures with man-made resources (Mann-Whitney Z scores)
  measure                In WordNet   In ANLT phrasals
  PMI                    -2.61        -4.53
  sameparticle-simplex   3.71         4.59
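The Mann-Whitney Z scores compare a measure's values for items that are listed in a resource (e.g. WordNet) with those that are not. A sketch of the U statistic with its normal approximation, without a tie correction (an assumption); here a positive Z means the first group tends to have higher values, which may not match the sign convention behind the table. The scores in the usage lines are invented.

```python
import math

def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_z(group_a, group_b):
    """Return (U, Z) for group_a versus group_b using the normal approximation."""
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for group_a
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, (u - mean_u) / sd_u

listed = [0.9, 0.8, 0.75, 0.6]         # invented measure values for items in the resource
unlisted = [0.5, 0.7, 0.3, 0.2, 0.4]   # invented values for items not in the resource
print(mann_whitney_z(listed, unlisted))
```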
Results with the Venkatapathy and Joshi (2005) dataset
  feature                                                    correlation
  1) Frequency (BNC)                                         .129
  2) PMI                                                     .203
  3) Distributed frequency of object                         .111
  4) Distributed frequency of object with dissimilar verbs   .139
  5) LSA dissimilarity of V-O with V                         .139
  6) LSA similarity of V-O with verbal form of O             .300
  7) Lin (1999) substitution                                 .210
  Ranking SVM function (using 1-7)                           .448
  McCarthy 1/pref score (638/765)                            -.403
Conclusions
- The purpose of the task should match the method and the evaluation
- Evaluation is tricky
  - Decisions are not clear cut
- Statistical measures and substitution methods may be useful, though they capture behaviour that correlates with compositionality rather than compositionality itself
- Distributional approaches are promising for languages without resources
- Selectional preferences may add useful information alongside other measures
Future
- Address tokens as well as types
  - Tokens on a continuum
- Error analysis
- Separating non-decomposable from idiosyncratically decomposable
- Detecting what multiwords mean; distributional approaches might be promising in this respect
  - kick the bucket -> die
- Share datasets!
References
- Baldwin, Timothy, Colin Bannard, Takaaki Tanaka and Dominic Widdows (2003) An Empirical Model of Multiword Expression Decomposability. In Proceedings of the ACL Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, Sapporo, Japan, pp. 89-96.
- Bannard, Colin (2002) Statistical Techniques for Automatically Inferring the Semantics of Verb-Particle Constructions. LinGO Working Paper No. 2002-06. http://lingo.stanford.edu/pubs/WP-2002-06.pdf
- Bannard, Colin, Timothy Baldwin and Alex Lascarides (2003) A Statistical Approach to the Semantics of Verb-Particles. In Proceedings of the ACL Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, Sapporo, Japan, pp. 65-72.
- Fazly, Afsaneh and Suzanne Stevenson (2006) Automatically constructing a lexicon of verb phrase idiomatic combinations. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL), Trento, Italy, pp. 337-344.
- Hayes, Jer, Nuno Seco and Tony Veale (2005) Creative discovery in the lexical validation gap. Computer Speech and Language, 19(4):513-523.
- Hashimoto, Chikara, Satoshi Sato and Takehito Utsuro (2006) Japanese Idiom Recognition: Drawing a Line between Literal and Idiomatic Meanings. In Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, Sydney, Australia, pp. 353-360.
- Katz, Graham and Eugenie Giesbrecht (2006) Automatic Identification of Non-Compositional Multi-Word Expressions using Latent Semantic Analysis. In Proceedings of the ACL Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties, Sydney, Australia.
- Lin, Dekang (1998) Automatic Retrieval and Clustering of Similar Words. In Proceedings of the 17th International Conference on Computational Linguistics and the 36th Annual Meeting of the Association for Computational Linguistics, Montreal, Canada.
- Lin, Dekang (1999) Automatic Identification of Non-Compositional Phrases. In Proceedings of ACL-99, University of Maryland, College Park, Maryland, pp. 317-324.
- Melamed, I. Dan (1997) Automatic Discovery of Non-Compositional Compounds in Parallel Data. In Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing (EMNLP), Providence, RI.
References (continued)
- McCarthy, Diana, Bill Keller and John Carroll (2003) Detecting a Continuum of Compositionality in Phrasal Verbs. In Proceedings of the ACL-SIGLEX Workshop on Multiword Expressions: Analysis, Acquisition and Treatment, Sapporo, Japan.
- Mukerjee, Amitabha, Ankit Soni and Achla M. Raina (2006) Detecting Complex Predicates in Hindi using POS Projection across Parallel Corpora. In Proceedings of the ACL Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties, Sydney, Australia, pp. 28-35.
- Pearce, Darren (2001) Synonymy in Collocation Extraction. In WordNet and Other Lexical Resources: Applications, Extensions and Customizations (NAACL 2001 Workshop), Carnegie Mellon University, Pittsburgh, June 2001, pp. 41-46.
- Piao, Scott S.L., Paul Rayson, Olga Mudraya, Andrew Wilson and Roger Garside (2006) Measuring MWE Compositionality Using Semantic Annotation. In Proceedings of the ACL Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties, Sydney, Australia, pp. 28-35.
- Schone, Patrick and Daniel Jurafsky (2001) Is Knowledge-Free Induction of Multiword Unit Dictionary Headwords a Solved Problem? In Proceedings of Empirical Methods in Natural Language Processing, Pittsburgh, PA.
- Venkatapathy, Sriram and Aravind K. Joshi (2005) Measuring the relative compositionality of verb-noun (V-N) collocations by integrating features. In Proceedings of HLT/EMNLP, Vancouver.
- Villada Moirón, Begoña and Joerg Tiedemann (2006) Identifying idiomatic expressions using automatic word-alignment. In Proceedings of the EACL Workshop on Multiword Expressions in a Multilingual Context, Trento, Italy.
- Villavicencio, A. (2005) The availability of verb-particle constructions in lexical resources: How much is enough? Computer Speech and Language, 19(4).