Evaluation of Association Measures

Provided by: brig154
1
Evaluation of Association Measures
2
We want to
  • identify the practical feasibility of a certain
    AM for identifying collocations
    • for which types of collocation
    • for which corpora (domain, size)
    • for high-frequency versus low-frequency data
  • compare the outcomes of different association
    measures

3
  • We have
    • differently ranked collocation candidates
  • We need
    • true collocation data for comparison, e.g.
      • collocation lexica
      • a list of true collocations occurring in the
        extraction corpus

4
Problems / Inconveniences
  • using collocation lexica for evaluation
    • will not tell us how well an AM worked on a
      particular corpus
    • it only tells us that
      • some of the reference collocations also occur
        in our base data and
      • the AM has found them

5
Problems / Inconveniences
  • using a list of true collocations occurring in
    the extraction corpus
    • requires a good deal of hand-annotation
    • requires objective criteria for the distinction
      of collocational and noncollocational word
      combinations in our candidate list

6
Our Approach
  • Evaluation of lexical association measures (AMs)
    against a manually identified reference corpus
    of true collocations (TPs)
  • Evaluation based on the full reference set
  • Precise, linguistically motivated definition of
    TPs
  • The evaluation of results based on recall and
    precision graphs

7
For Further Discussion
  • Testing for significance of AMs is an important
    but still open question
  • There is a potential for fine-tuning of AMs
    given a specific data set and a particular type
    of collocations to be extracted (Krenn, Evert
    2001)

8
  • Evaluation Experiments

9
Data
  • Extraction corpora
    • newspaper: 8 million words, Frankfurter Rundschau
      Corpus (ECI Multilingual Corpus 1)
    • newsgroup: 10 million words, FLAG corpus (LT-DFKI)

10
Data
  • Base data
    • list of PP-verb pairs, i.e. (PN,V)-combinations
  • Collocation types
    • support verb constructions (FVG, German
      Funktionsverbgefüge)
    • figurative expressions (figur)

11
Examples
12
Support Verb Constructions (FVG)
  • verb-object collocation
  • function as predicates
  • can be paraphrased by main verbs
  • NP-verb or PP-verb
  • verbal collocate (function verb / light verb /
    support verb)
  • main verb
  • conveys Aktionsart and causativity

13
Support Verb Constructions (FVG)
  • nominal collocate
  • abstract noun
  • often de-verbal or de-adjectival
  • contributes the core meaning
  • (prepositional collocate)
  • verbal and nominal collocate together determine
    the argument structure of the collocation

14
FVG Examples
pred. phrase    verb     Aktionsart  caus  translation
in Betrieb      gehen    incho       -     go into operation
                nehmen   incho       +     put into operation
                setzen   incho       +     start up
                sein     neutral     -     be running
                bleiben  contin      -     keep on running
                lassen   contin      +     keep (sth) running
15
FVG Examples
pred. phrase    verb     Aktionsart  caus  translation
ausser Betrieb  gehen    termin      -     go out of service
                nehmen   termin      +     take out of service
                setzen   termin      +     stop
                sein     neutral     -     be out of order
                bleiben  contin      -     stay out of order
                lassen   contin      +     keep out of order
16
Figurative Expressions (figur)
  • not restricted to NP/PP-verb
  • figurative reinterpretation of literal meaning
    required, e.g., unter die Haut gehen
    (get under one's skin)
  • nouns: concrete
  • verbs: often causative-noncausative alternation,
    e.g., auf Eis legen (put on ice) vs. auf Eis
    liegen (be on ice)

17
Decision Tree: FVG versus figur
18
Frequency Distributions
19
Frequency Distributions
20
Frequency Distributions
21
Combination of Properties in the Candidate Lists
22
Evaluation Procedure
Source Corpus
candidate list
23
Evaluation Procedure
significance list
24
Evaluation Procedure: N-best Lists
  • precision: 11/20 = 55%

total: 1280 TPs
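
The n-best evaluation can be sketched in a few lines of Python. The candidate names and the hand-made TP set below are hypothetical; only the figures 11/20 and 1280 are taken from the slide. Recall is computed against the full reference set, as described under "Our Approach".

```python
def n_best_scores(ranked, true_positives, n, total_tps=None):
    """Precision and recall of the n highest-ranked candidates."""
    if total_tps is None:
        total_tps = len(true_positives)
    # Count how many of the top-n candidates are annotated as TPs.
    hits = sum(1 for cand in ranked[:n] if cand in true_positives)
    return hits / n, hits / total_tps

# Hypothetical 20-best list in which 11 candidates are true collocations.
ranked = [f"cand{i}" for i in range(20)]
tps = {f"cand{i}" for i in range(0, 20, 2)} | {"cand1"}  # 10 + 1 = 11 TPs

precision, recall = n_best_scores(ranked, tps, n=20, total_tps=1280)
print(f"precision = {precision:.0%}, recall = {recall:.2%}")
```

Plotting these two values for every n gives the precision and recall graphs shown on the following slides.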
25
Precision Graph: PNV full forms
26
Baseline: Random Selection
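
For a random selection, the expected precision of any n-best list equals the proportion of TPs in the whole candidate list, so the baseline is a horizontal line in the precision graph. The sketch below checks this by simulation; the list size and number of TPs are made-up values, not figures from the experiments.

```python
import random

def baseline_precision(num_tps, num_candidates):
    """Expected precision when candidates are picked at random."""
    return num_tps / num_candidates

def simulated_precision(num_tps, num_candidates, n, trials=2000, seed=0):
    """Average precision of random n-best lists over many trials."""
    rng = random.Random(seed)
    labels = [True] * num_tps + [False] * (num_candidates - num_tps)
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(labels, n)  # draw n candidates without replacement
        total += sum(sample) / n
    return total / trials

# Hypothetical candidate list: 100 TPs among 1000 candidates.
expected = baseline_precision(100, 1000)
observed = simulated_precision(100, 1000, n=50)
print(f"expected {expected:.3f}, simulated {observed:.3f}")
```

An association measure is only useful where its precision curve lies clearly above this line.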
27
Precision Graphs
28
Precision Graphs
29
Precision Graphs
30
Recall Graphs
31
Precision/Recall
32
Precision Graphs: Newspaper, FVG and figur
33
Precision Graphs: Newspaper
  • FVG
  • figur

34
Precision Graphs: AdjN
35
Precision Graphs: AdjN
36
Precision/Recall: AdjN
37
Frequency Layers: AdjN Data
f ≥ 5
2 ≤ f < 5
38
Frequency Layers: PNV Data
f ≥ 10
3 ≤ f < 5
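
Splitting a candidate list into frequency layers such as f ≥ 10 or 3 ≤ f < 5 is a simple filter on the co-occurrence frequency. A minimal sketch follows; the pair names and counts are hypothetical placeholders.

```python
def frequency_layer(candidates, lower, upper=None):
    """Keep candidates with lower <= f, and f < upper if upper is given."""
    return {
        pair: f
        for pair, f in candidates.items()
        if f >= lower and (upper is None or f < upper)
    }

# Hypothetical co-occurrence frequencies of (PN,V) candidates.
candidates = {"pair_a": 12, "pair_b": 4, "pair_c": 3, "pair_d": 1, "pair_e": 10}

high_layer = frequency_layer(candidates, lower=10)           # f >= 10
low_layer = frequency_layer(candidates, lower=3, upper=5)    # 3 <= f < 5
print(sorted(high_layer), sorted(low_layer))
```

Each layer can then be ranked and evaluated separately with the same precision and recall procedure.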
39
Lemmas vs. Word Forms (PNV)
lemmas, f ≥ 3
word forms, f ≥ 3
40
Text Type and Domain (PNV)
newsgroup discussions
newspaper
comparison for non-lemmatised candidates
41
The MI Mystery (FVG)
region of high "local precision" for 4.0 < MI < 7.5
42
Further particularities of the newspaper data
  • candidates with MI > 7.5 are more frequent than
    expected under the independence assumption
  • but there are very few FVG among them
  • the data do not support the usual argument
    against MI, namely that it overestimates
    candidates with low-frequency joint and
    marginal distributions
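
The slides do not spell out the MI formula. The sketch below assumes the pointwise definition commonly used in collocation extraction, MI = log2(O / E), where O is the observed joint frequency and E = f1 * f2 / N is the frequency expected under independence. All counts are invented for illustration.

```python
import math

def expected_frequency(f1, f2, n):
    """Joint frequency expected if the two words were independent."""
    return f1 * f2 / n

def mutual_information(o, f1, f2, n):
    """Pointwise MI: log2 of observed over expected joint frequency."""
    return math.log2(o / expected_frequency(f1, f2, n))

# Hypothetical pair: seen 16 times, each word 100 times, 10,000 pairs in total.
mi_frequent = mutual_information(16, 100, 100, 10_000)

# A pair of two hapax legomena that occur together exactly once.
mi_hapax = mutual_information(1, 1, 1, 10_000)

print(f"frequent pair: {mi_frequent:.2f}, hapax pair: {mi_hapax:.2f}")
```

The hapax pair receives a far higher score than the well-attested pair, which is the overestimation that the argument against MI refers to.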

43
Optimized MI
  • MI - 5.75
  • accounts for the FVG concentration
    in the region 4.0 < MI < 7.5
    of the newspaper test data
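
The slide gives only the expression "MI - 5.75". One plausible reading, offered here purely as an assumption, is that candidates are ranked by how close their MI score lies to 5.75, the midpoint of the interval from 4.0 to 7.5. The names and scores below are hypothetical.

```python
def rank_by_distance(scored, centre=5.75):
    """Sort (name, MI) pairs by |MI - centre|, closest first (assumed reading)."""
    return sorted(scored, key=lambda item: abs(item[1] - centre))

# Hypothetical candidates with made-up MI scores.
scored = [("cand_a", 9.2), ("cand_b", 5.5), ("cand_c", 4.1), ("cand_d", 7.0)]

ranked = rank_by_distance(scored)
print([name for name, _ in ranked])
```

Under this reading the measure is tuned to one data set: the centre value reflects where FVG happen to cluster in the newspaper data rather than any statistical principle.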

44
Summary of Results
  • Best measures
  • t-score / frequency are best for identifying
    PP-verb collocations (FVG, figur)
  • log-likelihood, t-score, Fisher, binomial and
    multinomial p-value work well for AdjN
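
For reference, two of the measures named above can be computed from the joint frequency, the two marginal frequencies and the sample size. The sketch uses the standard textbook definitions (t-score = (O - E) / sqrt(O), log-likelihood G2 = 2 * sum of O * ln(O / E) over the 2x2 contingency table); the slides do not state which exact variants were used, and the counts are invented.

```python
import math

def t_score(o11, f1, f2, n):
    """t-score: (observed - expected) / sqrt(observed)."""
    e11 = f1 * f2 / n
    return (o11 - e11) / math.sqrt(o11)

def log_likelihood(o11, f1, f2, n):
    """Log-likelihood ratio G2 over the 2x2 contingency table."""
    observed = [[o11, f1 - o11], [f2 - o11, n - f1 - f2 + o11]]
    rows = [f1, n - f1]
    cols = [f2, n - f2]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            e = rows[i] * cols[j] / n
            if o > 0:  # a cell with zero count contributes nothing
                g2 += o * math.log(o / e)
    return 2 * g2

# Hypothetical pair: 16 co-occurrences, marginals 100 and 100, N = 10,000.
t = t_score(16, 100, 100, 10_000)
g2 = log_likelihood(16, 100, 100, 10_000)
print(f"t-score = {t:.2f}, G2 = {g2:.2f}")
```

Ranking by plain frequency simply sorts candidates by the joint count, so it needs no formula of its own.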

45
Summary of Results
  • Reproducibility of results for different text
    types
  • Precision results from newsgroup data comparable
    to newspaper data
  • Strong evidence that identical classes of
    collocations are similarly distributed in
    different types of corpora

46
Summary of Results
  • Differences in suitability of AMs to identify
    particular collocation types
  • (PN,V)-candidates with high MI score are less
    likely to be FVG
  • Log-likelihood not well suited for identifying
    FVG
  • but better suited for identifying figur

47
Summary of Results
  • Experimental results based either on a small
    number of best-scoring candidates or on more than
    the first 50% of the significance lists (SLs) are
    unreliable

48
Conclusion on AMs
  • Optimal results
  • do not necessarily come from a statistical
    discussion
  • but
  • from tuning on a particular data set

49
Vast Land: Lowest-frequency Data
  • lowest-frequency data (hapax legomena, dis
    legomena, ...) are a serious challenge for all
    statistical approaches
  • typical solution: cut-off thresholds
  • Evert/Krenn used cut-off thresholds in evaluation
    to reduce manual annotation work
  • need to estimate number of TPs among excluded
    lowest-frequency candidates
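
The slides do not say how this estimate is to be obtained. One common approach, assumed here, is to annotate a random sample of the excluded candidates and extrapolate the TP proportion to the whole excluded set, with a normal-approximation confidence interval. The sample and population sizes are invented.

```python
import math

def estimate_tps(sample_labels, population_size, z=1.96):
    """Extrapolate the TP count from an annotated random sample.

    Returns (estimate, lower, upper); the bounds use a normal
    approximation to the binomial with no finite-population correction.
    """
    k = len(sample_labels)
    p = sum(sample_labels) / k
    se = math.sqrt(p * (1 - p) / k)
    low = max(0.0, p - z * se)
    high = min(1.0, p + z * se)
    return p * population_size, low * population_size, high * population_size

# Hypothetical annotation: 20 TPs in a sample of 500 excluded candidates,
# drawn from 50,000 excluded lowest-frequency candidates.
sample = [True] * 20 + [False] * 480
estimate, lower, upper = estimate_tps(sample, population_size=50_000)
print(f"about {estimate:.0f} TPs (95% interval {lower:.0f} to {upper:.0f})")
```

The interval is wide when TPs are rare in the sample, so the sample size has to be weighed against the annotation effort the cut-off was meant to save.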