Title: Evaluation of Association Measures
We want to
- identify the practical feasibility of a given association measure (AM) for identifying collocations, depending on
  - the type of collocation
  - the corpus (domain, size)
  - high-frequency versus low-frequency data
- compare the outcomes of different association measures
- We have
  - collocation candidates, ranked differently by each AM
- We need
  - true collocation data for comparison, e.g.
    - collocation lexica
    - a list of the true collocations occurring in the extraction corpus
Problems and Inconveniences
- Using collocation lexica for evaluation
  - does not tell us how well an AM worked on a particular corpus; it only tells us that
    - some of the reference collocations also occur in our base data, and
    - the AM has found them
Problems and Inconveniences
- Using a list of the true collocations occurring in the extraction corpus
  - requires a good deal of hand-annotation
  - requires objective criteria for distinguishing collocational from non-collocational word combinations in our candidate list
Our Approach
- Evaluation of lexical association measures (AMs) against a manually identified reference corpus of true collocations (true positives, TPs)
- Evaluation based on the full reference set
- Precise, linguistically motivated definition of TPs
- Evaluation of results based on precision and recall graphs
For Further Discussion
- Significance testing for AMs is an important but still open question
- There is potential for fine-tuning AMs to a specific data set and a particular type of collocation to be extracted (Krenn, Evert 2001)
Data
- Extraction corpora
  - newspaper: 8 million words, Frankfurter Rundschau Corpus (ECI Multilingual Corpus 1)
  - newsgroups: 10 million words, FLAG corpus (LT-DFKI)
Data
- Base data
  - list of PP-verb pairs, i.e. (PN,V)-combinations
- Collocation types
  - support verb constructions (FVG)
  - figurative expressions (figur)
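The AMs compared in this evaluation are computed from a 2x2 contingency table for each (PN,V) pair. Here is a minimal sketch of how such a table can be built. It assumes that every candidate comes with its joint frequency f, the marginal frequencies f1 (of the PP) and f2 (of the verb), and the sample size N. The names `Contingency` and `from_marginals` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contingency:
    """Observed 2x2 contingency table for a candidate pair (u, v)."""

    o11: int  # f(u, v): joint frequency
    o12: int  # f(u, not v)
    o21: int  # f(not u, v)
    o22: int  # f(not u, not v)

    @property
    def n(self) -> int:
        """Sample size N (number of pair tokens in the base data)."""
        return self.o11 + self.o12 + self.o21 + self.o22

    def expected(self) -> tuple[float, float, float, float]:
        """Expected frequencies E_ij = R_i * C_j / N under independence."""
        r1, r2 = self.o11 + self.o12, self.o21 + self.o22
        c1, c2 = self.o11 + self.o21, self.o12 + self.o22
        n = self.n
        return (r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n)


def from_marginals(f: int, f1: int, f2: int, n: int) -> Contingency:
    """Build the table from joint frequency f, marginals f1 = f(u), f2 = f(v), and N."""
    return Contingency(o11=f, o12=f1 - f, o21=f2 - f, o22=n - f1 - f2 + f)


table = from_marginals(f=10, f1=50, f2=40, n=1000)
print(table)             # Contingency(o11=10, o12=40, o21=30, o22=920)
print(table.expected())  # (2.0, 48.0, 38.0, 912.0)
```

Every measure discussed later (MI, t-score, log-likelihood, Fisher's test) is a function of these observed cells and their expected counterparts.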
Examples
Support Verb Constructions (FVG)
- verb-object collocations
  - function as predicates
  - can be paraphrased by main verbs
  - NP-verb or PP-verb
- verbal collocate (function verb / light verb / support verb)
  - main verb
  - conveys Aktionsart and causativity
Support Verb Constructions (FVG)
- nominal collocate
  - abstract noun
  - often deverbal or deadjectival
  - contributes the core meaning
- (prepositional collocate)
- verbal and nominal collocates together determine the argument structure of the collocation
FVG Examples
pred. phrase    verb      Aktionsart    caus.  translation
in Betrieb      gehen     inchoative    -      go into operation
                nehmen    inchoative    +      put into operation
                setzen    inchoative    +      start up
                sein      neutral       -      be running
                bleiben   continuative  -      keep on running
                lassen    continuative  +      keep (sth) running
FVG Examples
pred. phrase    verb      Aktionsart    caus.  translation
ausser Betrieb  gehen     terminative   -      go out of service
                nehmen    terminative   +      take out of service
                setzen    terminative   +      stop
                sein      neutral       -      be out of order
                bleiben   continuative  -      stay out of order
                lassen    continuative  +      keep out of order
Figurative Expressions (figur)
- not restricted to NP/PP-verb combinations
- figurative reinterpretation of the literal meaning is required (e.g., unter die Haut gehen (get under one's skin))
- nouns: concrete
- verbs: often a causative/non-causative alternation, e.g., auf Eis legen (put on ice) vs. auf Eis liegen (be on ice)
Decision Tree: FVG versus figur
Frequency Distributions
Combination of Properties in the Candidate Lists
Evaluation Procedure
- source corpus
- candidate list
- significance list (SL)
- N-best lists (total: 1280 TPs)
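Once the TPs are known, precision and recall for n-best lists can be read directly off a significance list. The sketch below assumes that the SL is a sequence of candidates sorted by descending AM score and that the reference set is given as a set of candidates. Recall is taken relative to all TPs in the candidate list (1280 in the data above). The function name is hypothetical.

```python
from collections.abc import Hashable, Iterable, Sequence


def nbest_precision_recall(
    ranked: Sequence[Hashable],
    true_positives: Iterable[Hashable],
    sizes: Iterable[int],
) -> list[tuple[int, float, float]]:
    """Return (n, precision, recall) for the n highest-ranked candidates, for each n in sizes."""
    tps = set(true_positives)
    total_tps = sum(1 for c in ranked if c in tps)  # TPs contained in the candidate list
    results = []
    for n in sizes:
        top = ranked[: min(n, len(ranked))]
        hits = sum(1 for c in top if c in tps)
        precision = hits / len(top) if top else 0.0
        recall = hits / total_tps if total_tps else 0.0
        results.append((n, precision, recall))
    return results


sl = ["a", "b", "c", "d", "e", "f", "g", "h"]
reference = {"a", "c", "d", "h"}
for n, p, r in nbest_precision_recall(sl, reference, [2, 4, 8]):
    print(n, p, r)  # 2 0.5 0.25 / 4 0.75 0.75 / 8 0.5 1.0
```

Evaluating many values of n and plotting precision (or recall) against n yields precision and recall graphs.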
Precision Graph: PNV, full forms
Baseline: Random Selection
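The random-selection baseline is the precision that a ranking carrying no information would achieve. On average, any n-best list drawn at random contains TPs in the same proportion as the whole candidate list. The sketch below computes this value and checks it by simulation. The function names and the toy counts are hypothetical.

```python
import random


def baseline_precision(n_tps: int, n_candidates: int) -> float:
    """Expected n-best precision of a random ranking (independent of n)."""
    return n_tps / n_candidates


def simulated_random_precision(
    n_tps: int, n_candidates: int, n_best: int, trials: int = 2000, seed: int = 0
) -> float:
    """Average precision of n-best lists drawn at random, without replacement."""
    rng = random.Random(seed)
    labels = [True] * n_tps + [False] * (n_candidates - n_tps)  # True marks a TP
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(labels, n_best)
        total += sum(sample) / n_best
    return total / trials


print(baseline_precision(25, 100))              # 0.25
print(simulated_random_precision(25, 100, 20))  # close to 0.25
```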
Precision Graphs
Recall Graphs
Precision/Recall
Precision Graphs: Newspaper, FVG and figur
Precision Graphs: Newspaper, FVG
Precision Graphs: AdjN
Precision/Recall: AdjN
Frequency Layers: AdjN Data
- f ≥ 5
- 2 ≤ f < 5
Frequency Layers: PNV Data
- f ≥ 10
- 3 ≤ f < 5
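Splitting the candidates into frequency layers lets each AM be evaluated separately on high- and low-frequency data. The sketch below assumes that each layer is a half-open interval lower ≤ f < upper. The layer labels follow the AdjN thresholds above, and the function name is hypothetical.

```python
import math
from collections import defaultdict


def split_into_layers(candidates, layers):
    """Group (candidate, f) pairs into frequency layers.

    layers maps a label to (lower, upper), meaning lower <= f < upper.
    Candidates that fall into no layer (e.g. below every threshold) are dropped.
    """
    grouped = defaultdict(list)
    for cand, f in candidates:
        for label, (lower, upper) in layers.items():
            if lower <= f < upper:
                grouped[label].append(cand)
                break
    return dict(grouped)


ADJN_LAYERS = {"f >= 5": (5, math.inf), "2 <= f < 5": (2, 5)}

pairs = [("a", 7), ("b", 3), ("c", 1), ("d", 5), ("e", 2)]
print(split_into_layers(pairs, ADJN_LAYERS))
# {'f >= 5': ['a', 'd'], '2 <= f < 5': ['b', 'e']}
```

Each layer can then be ranked and evaluated like the full candidate list.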
Lemmas vs. Word Forms (PNV)
- lemmas, f ≥ 3
- word forms, f ≥ 3
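Lemmatisation pools the frequencies of inflected forms, so the same cut-off (f ≥ 3) admits more candidates for lemmas than for word forms. The sketch below shows that pooling. The lookup table stands in for a real lemmatiser and, like the function names, is hypothetical.

```python
from collections import Counter

# Toy lemma lookup (hypothetical); a real system would use a morphological analyser.
LEMMAS = {"setzt": "setzen", "setzte": "setzen", "nimmt": "nehmen"}


def lemmatize(form: str) -> str:
    """Map a word form to its lemma; unknown forms are returned unchanged."""
    return LEMMAS.get(form, form)


def aggregate_by_lemma(form_counts: Counter) -> Counter:
    """Sum (PP, verb) word-form pair frequencies into lemma pair frequencies."""
    lemma_counts = Counter()
    for (pp, verb), f in form_counts.items():
        lemma_counts[(lemmatize(pp), lemmatize(verb))] += f
    return lemma_counts


forms = Counter({
    ("in Betrieb", "setzt"): 2,
    ("in Betrieb", "setzte"): 1,
    ("in Betrieb", "setzen"): 1,
    ("in Betrieb", "nimmt"): 2,
})
lemmas = aggregate_by_lemma(forms)
print([k for k, f in forms.items() if f >= 3])   # []
print([k for k, f in lemmas.items() if f >= 3])  # [('in Betrieb', 'setzen')]
```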
Text Type and Domain (PNV)
- newsgroup discussions
- newspaper
- comparison for non-lemmatised candidates
The MI Mystery (FVG)
- region of high "local precision" for 4.0 < MI < 7.5
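For reference, MI is taken here to be pointwise mutual information as commonly used for collocation extraction: MI = log2(O11 / E11), with E11 = f1 · f2 / N. The sketch below computes it and tests whether a score falls in the interval 4.0 < MI < 7.5. The function names and counts are hypothetical.

```python
import math


def mutual_information(f: int, f1: int, f2: int, n: int) -> float:
    """Pointwise MI: log2(O11 / E11), where E11 = f1 * f2 / n."""
    expected = f1 * f2 / n
    return math.log2(f / expected)


def in_high_precision_region(mi: float, low: float = 4.0, high: float = 7.5) -> bool:
    """True if low < MI < high (the region of high local precision for FVG)."""
    return low < mi < high


print(round(mutual_information(10, 50, 40, 1000), 2))     # 2.32 -> outside the region
print(round(mutual_information(32, 100, 100, 50000), 2))  # 7.32 -> inside the region
print(round(mutual_information(1, 1, 1, 8_000_000), 2))   # 22.93: one co-occurrence of two words seen once each
```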
Further Particularities of the Newspaper Data
- candidates with MI > 7.5 are more frequent than expected under the independence assumption
  - but there are very few FVG among them
- the data do not support the common argument against MI, namely that it overestimates the association of candidates with low joint and marginal frequencies
Optimized MI
- |MI - 5.75|, i.e. the distance from the midpoint of the interval 4.0 < MI < 7.5
- accounts for the concentration of FVG
  - in the interval 4.0 < MI < 7.5
  - in the newspaper test data
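The sketch below shows such a reordering. It assumes that the optimized score is the absolute distance |MI - 5.75| (5.75 is the midpoint of 4.0 and 7.5) and that candidates closer to 5.75 rank higher. Both this reading and the function name are assumptions, not a confirmed definition.

```python
def optimized_mi_ranking(scored, centre=5.75):
    """Rank candidates by |MI - centre|, smallest distance first.

    scored: iterable of (candidate, mi) pairs.
    Assumption: 'optimized MI' means distance from the centre of 4.0 < MI < 7.5.
    """
    return [cand for cand, mi in sorted(scored, key=lambda item: abs(item[1] - centre))]


scored = [("a", 9.0), ("b", 5.5), ("c", 3.0), ("d", 6.5)]
print(optimized_mi_ranking(scored))  # ['b', 'd', 'c', 'a']
```

Without the absolute value, MI - 5.75 would rank candidates exactly as MI itself does, which is why a distance is assumed here.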
Summary of Results
- Best measures
  - t-score and frequency are best for identifying PP-verb collocations (FVG, figur)
  - log-likelihood, t-score, Fisher's exact test, and binomial and multinomial p-values work well for AdjN
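Two of the measures named above are sketched here with their common textbook definitions: t-score = (O11 - E11) / sqrt(O11), and log-likelihood G² = 2 Σ O_ij ln(O_ij / E_ij). The exact variants used in the evaluation may differ. Fisher's exact test and the binomial/multinomial p-values are omitted. The function names are hypothetical.

```python
import math


def expected_frequencies(o11, o12, o21, o22):
    """E_ij = R_i * C_j / N for a 2x2 contingency table."""
    n = o11 + o12 + o21 + o22
    r1, r2 = o11 + o12, o21 + o22
    c1, c2 = o11 + o21, o12 + o22
    return (r1 * c1 / n, r1 * c2 / n, r2 * c1 / n, r2 * c2 / n)


def t_score(o11, o12, o21, o22):
    """(O11 - E11) / sqrt(O11); requires O11 > 0."""
    e11 = expected_frequencies(o11, o12, o21, o22)[0]
    return (o11 - e11) / math.sqrt(o11)


def log_likelihood(o11, o12, o21, o22):
    """G^2 = 2 * sum(O * ln(O / E)); cells with O = 0 contribute 0."""
    observed = (o11, o12, o21, o22)
    expected = expected_frequencies(*observed)
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


print(round(t_score(10, 40, 30, 920), 2))         # 2.53
print(round(log_likelihood(10, 40, 30, 920), 2))  # 19.49
```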
Summary of Results
- Reproducibility of results across text types
  - precision results from the newsgroup data are comparable to those from the newspaper data
  - strong evidence that identical classes of collocations are similarly distributed in different types of corpora
Summary of Results
- AMs differ in their suitability for identifying particular collocation types
  - (PN,V)-candidates with high MI scores are less likely to be FVG
  - log-likelihood is not well suited for identifying FVG
    - but is better suited for identifying figur
Summary of Results
- Experimental results based either on a small number of best-scoring candidates or on no more than the first 50 candidates of the significance lists (SLs) are unreliable
Conclusion on AMs
- Optimal results
  - do not necessarily follow from statistical arguments,
  - but from tuning on a particular data set
Vast Land: Lowest-Frequency Data
- lowest-frequency data (hapax legomena, dis legomena, ...) are a serious challenge for all statistical approaches
  - typical solution: cut-off thresholds
- Evert/Krenn used cut-off thresholds in the evaluation to reduce manual annotation work
  - need to estimate the number of TPs among the excluded lowest-frequency candidates
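One way to estimate the number of TPs among the excluded candidates, without annotating all of them, is to annotate a simple random sample and extrapolate the sample proportion. The sketch below implements that technique: a standard sampling estimate with a normal-approximation interval, not necessarily the procedure Evert/Krenn used. `is_tp` stands in for manual annotation and, like the function name, is hypothetical.

```python
import math
import random


def estimate_excluded_tps(excluded, is_tp, sample_size, seed=0):
    """Estimate how many of the excluded candidates are TPs.

    Returns (point estimate, (lower, upper)); the bounds come from an approximate
    95% normal-approximation interval on the sample proportion. The finite-population
    correction is ignored, so the interval is slightly conservative.
    """
    rng = random.Random(seed)
    sample = rng.sample(excluded, sample_size)  # simple random sample, no replacement
    p = sum(1 for c in sample if is_tp(c)) / sample_size
    se = math.sqrt(p * (1 - p) / sample_size)
    n = len(excluded)
    lower, upper = max(0.0, p - 1.96 * se), min(1.0, p + 1.96 * se)
    return p * n, (lower * n, upper * n)


# Toy data: 10,000 excluded candidates, every tenth one a TP.
candidates = list(range(10_000))
estimate, (low, high) = estimate_excluded_tps(candidates, lambda c: c % 10 == 0, 500)
print(round(estimate), round(low), round(high))
```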