The Significance of Result Differences

Transcript and Presenter's Notes

Title: The Significance of Result Differences


1
The Significance of Result Differences
2
Why Significance Tests?
  • everybody knows we have to test the significance
    of our results
  • but do we really?
  • evaluation results are valid only for
    • data from a specific corpus
    • extracted with specific methods
    • for a particular type of collocations
    • according to the intuitions of one particular
      annotator (or two)

3
Why Significance Tests?
  • significance tests are about generalisations
  • basic question: "If we repeated the evaluation
    experiment (on similar data), would we get the
    same results?"
  • influence of source corpus, domain, collocation
    type and definition, annotation guidelines, ...

4
Evaluation of Association Measures
5
Evaluation of Association Measures
6
A Different Perspective
  • pair types are described by contingency tables
    (O11, O12, O21, O22) → coordinates in 4-D space
  • O22 is redundant because
    O11 + O12 + O21 + O22 = N
  • can also describe a pair type by joint and marginal
    frequencies (f, f1, f2) → coordinates in 3-D space
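This bookkeeping can be sketched in a few lines of Python (the frequencies below are made up for illustration): rebuilding the full table and the expected frequency E11 from (f, f1, f2) and N shows why O22 carries no extra information.

```python
# Rebuild the full contingency table (O11, O12, O21, O22) from the joint
# frequency f, marginal frequencies f1, f2, and sample size N.
def contingency_table(f, f1, f2, N):
    O11 = f                 # both words occur together
    O12 = f1 - f            # first word occurs, second does not
    O21 = f2 - f            # second word occurs, first does not
    O22 = N - f1 - f2 + f   # neither word occurs
    return O11, O12, O21, O22

def expected_e11(f1, f2, N):
    # expected joint frequency under independence
    return f1 * f2 / N

# hypothetical example: f = 30, f1 = 500, f2 = 600, N = 100000
table = contingency_table(30, 500, 600, 100000)
# O22 is redundant: the four cells always sum to N
assert sum(table) == 100000
print(table, expected_e11(500, 600, 100000))  # (30, 470, 570, 98930) 3.0
```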

7
A Different Perspective
  • data set = cloud of points in three-dimensional
    space
  • visualisation is "challenging"
  • many association measures depend on O11 and E11
    only (MI, gmean, t-score, binomial)
  • projection to (O11, E11) → coordinates in 2-D
    space (ignoring the ratio f1 / f2)

8
The Parameter Space of Collocation Candidates
9
The Parameter Space of Collocation Candidates
10
The Parameter Space of Collocation Candidates
11
The Parameter Space of Collocation Candidates
12
The Parameter Space of Collocation Candidates
13
N-best Lists in Parameter Space
  • N-best list for an AM includes all pair types
    where score ≥ c (threshold c obtained from the
    data)
  • score ≥ c describes a subset of the parameter space
  • for a sound association measure the isoline
    {score = c} is a lower boundary (because scores
    should increase with O11 for a fixed value of E11)
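The N-best construction can be sketched as follows (Python; the MI scoring function follows MI = log2(O11 / E11), and the candidate data is invented for illustration):

```python
import math

def mi_score(o11, e11):
    # MI depends on O11 and E11 only, and increases with O11 for fixed E11
    return math.log2(o11 / e11)

def n_best(candidates, score, n):
    """Return the n highest-scoring candidates.  The threshold c is the
    score of the n-th item, so the list equals {candidate : score >= c}."""
    ranked = sorted(candidates, key=score, reverse=True)
    return ranked[:n]

# hypothetical (O11, E11) coordinates of five pair types
data = [(30, 3.0), (12, 8.0), (50, 1.5), (7, 6.9), (21, 2.2)]
top3 = n_best(data, lambda p: mi_score(*p), 3)
c = mi_score(*top3[-1])                    # threshold obtained from the data
assert all(mi_score(*p) >= c for p in top3)
print(top3)  # [(50, 1.5), (30, 3.0), (21, 2.2)]
```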

14
N-Best Isolines in the Parameter Space
MI
15
N-Best Isolines in the Parameter Space
MI
16
N-Best Isolines in the Parameter Space
t-score
17
N-Best Isolines in the Parameter Space
t-score
18
95% Confidence Interval
19
99% Confidence Interval
20
95% Confidence Interval
21
Comparing Precision Values
  • number of TPs and FPs for 1000-best lists

22
McNemar's Test
  • 2×2 table: TP in 1000-best list vs. not in
    1000-best list, cross-classified for the two AMs
  • ideally: all TPs in the 1000-best list (possible!)
  • H0: differences between the AMs are random

23
McNemar's Test
  • 2×2 table: in 1000-best list vs. not in
    1000-best list
  • > mcnemar.test(tbl)
  • p-value < 0.001 → highly significant
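The R call above (with its default continuity correction) can be reproduced by a stdlib-only Python sketch; the discordant counts b and c below are invented for illustration:

```python
import math

def mcnemar_p(b, c):
    """Continuity-corrected McNemar test (same statistic as R's default
    mcnemar.test).  b and c are the discordant cells of the 2x2 table:
    TPs found by one AM's 1000-best list but not by the other's, and
    vice versa.  Returns the p-value for 1 degree of freedom."""
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # survival function of chi-squared with 1 df: sf(x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(stat / 2))

# hypothetical discordant counts for two association measures
p = mcnemar_p(b=211, c=142)
print(p < 0.001)  # True -> highly significant, reject H0
```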

24
Significant Differences
25
Significant Differences
26
Significant Differences
27
Lowest-Frequency Data Samples
  • Too much data for full manual evaluation → random
    samples
  • AdjN data
    • 965 pairs with f = 1 (15% sample)
    • manually identified 31 TPs (3.2%)
  • PNV data
    • 983 pairs with f < 3 (0.35% sample)
    • manually identified 6 TPs (0.6%)

28
Lowest-Frequency Data Samples
  • Estimate proportion p of TPs among all
    lowest-frequency data
  • Confidence set from binomial test
  • AdjN: 31 TPs among 965 items
    • p ≤ 5% with 99% confidence
    • at most ≈ 320 TPs
  • PNV: 6 TPs among 983 items
    • p ≤ 1.5% with 99% confidence
    • there might still be ≈ 4,200 TPs !!
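These bounds come from inverting an exact binomial test (Clopper-Pearson). A stdlib-only Python sketch that reproduces the two upper bounds by bisection on the binomial CDF:

```python
import math

def binom_cdf(k, n, p):
    # P(X <= k) for X ~ Binomial(n, p)
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def binom_upper_bound(k, n, conf=0.99, tol=1e-7):
    """One-sided upper confidence bound for the success probability:
    the p at which observing <= k successes in n trials becomes as
    unlikely as 1 - conf (exact Clopper-Pearson, solved by bisection)."""
    lo, hi = k / n, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > 1 - conf:
            lo = mid   # still plausible, bound lies higher
        else:
            hi = mid
    return hi

# AdjN: 31 TPs among 965 sampled items -> p <= 5% with 99% confidence
assert binom_upper_bound(31, 965) < 0.05
# PNV: 6 TPs among 983 sampled items -> p <= 1.5% with 99% confidence
assert binom_upper_bound(6, 983) < 0.015
```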

29
N-best Lists for Lowest-Frequency Data
  • evaluate 10,000-best lists
  • to reduce manual annotation work, take a 10%
    sample from each list (i.e. 1,000 candidates for
    each AM)
  • precision graphs for N-best lists
    • up to N = 10,000 for the PNV data
  • 95% confidence estimates for precision of the
    best-performing AM (from binomial test)

30
Random Sample Evaluation
31
Random Sample Evaluation
32
Random Sample Evaluation