METEOR, M-BLEU and M-TER: Flexible Matching

Slides: 14
Provided by: csC76
Learn more at: http://www.cs.cmu.edu

1
METEOR, M-BLEU and M-TER: Flexible Matching
Parameter Tuning for MT Evaluation
  • Alon Lavie and Abhaya Agarwal
  • Language Technologies Institute
  • Carnegie Mellon University

2
METEOR
  • Originally developed in 2005 as an automatic
    metric designed for higher correlation with human
    judgments at the sentence level
  • Main ingredients:
  • Extended matching between translation and
    reference
  • Unigram precision and recall, combined into a
    parameterized F-measure
  • Reordering penalty
  • Parameters can be tuned to optimize correlation
    with human judgments
  • Our previous work established improved
    correlation with human judgments (compared with
    BLEU and other metrics)
  • Not biased against non-statistical MT systems
  • Only metric that correctly ranked NIST MT Eval-06
    Arabic systems
  • Was used as primary metric in DARPA TRANSTAC and
    ET-07 Evaluations, one of several metrics in NIST
    MT Eval, IWSLT, WMT
  • Main change in latest versions (v0.6, v0.7,
    METEOR-ranking)
  • Retuning of free parameters optimizing on
    different criteria and data sets

3
METEOR Flexible Word Matching
  • Words in reference translation and MT hypothesis
    are matched using a series of modules with
    increasingly loose criteria of matching
  • Exact match
  • Porter Stemmer
  • WordNet-based synonymy
  • A word-to-word alignment for the sentence pair is
    computed using these word level matchings
  • NP-hard in general, uses fast approximate search
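
The staged matching described on this slide can be sketched as follows. This is a minimal illustration, not the METEOR matcher itself: `toy_stem` and `TOY_SYNONYMS` are hypothetical stand-ins for the Porter stemmer and WordNet, and the greedy left-to-right linking is an assumed simplification of the approximate search the slide mentions.

```python
from typing import List, Tuple

# Hypothetical toy synonym table; a real system would query WordNet.
TOY_SYNONYMS = {"car": {"automobile"}, "automobile": {"car"}}


def toy_stem(word: str) -> str:
    """Crude suffix stripper standing in for the Porter stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def exact(h: str, r: str) -> bool:
    return h == r


def stem(h: str, r: str) -> bool:
    return toy_stem(h) == toy_stem(r)


def synonym(h: str, r: str) -> bool:
    return r in TOY_SYNONYMS.get(h, set())


# Modules are applied in order, from the strictest criterion to the loosest.
MODULES = [("exact", exact), ("stem", stem), ("synonym", synonym)]


def align(hyp: List[str], ref: List[str]) -> List[Tuple[int, int, str]]:
    """Greedy alignment: every word is linked at most once, and a later
    (looser) module only sees words that earlier modules left unmatched."""
    used_h, used_r = set(), set()
    links = []
    for name, match in MODULES:
        for i, h in enumerate(hyp):
            if i in used_h:
                continue
            for j, r in enumerate(ref):
                if j not in used_r and match(h, r):
                    links.append((i, j, name))
                    used_h.add(i)
                    used_r.add(j)
                    break
    return sorted(links)
```

For the made-up pair "he parked the car" / "he parks the automobile", `align` links "he" and "the" by exact match, "parked"/"parks" by stem, and "car"/"automobile" by synonym.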

4-7
Alignment Example
  • [Four slides of word-alignment diagrams; the
    images are not included in this transcript]

8
METEOR Score Computation
  • Weighted combination of the unigram precision and
    recall
  • A fragmentation penalty to address fluency
  • A chunk is a monotonic sequence of aligned
    words
  • Final Score
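
The formula itself did not survive in this transcript. The sketch below follows the form given in the published METEOR descriptions: a parameterized harmonic mean of precision and recall, multiplied by one minus a fragmentation penalty. It assumes a chunk is a run of links that is contiguous and in the same order in both sentences. The parameter names `alpha`, `beta`, `gamma` and their defaults (the values usually quoted for the original 2005 version) are illustrative assumptions, not values taken from these slides.

```python
from typing import List, Tuple


def count_chunks(links: List[Tuple[int, int]]) -> int:
    """Count maximal runs of (hyp_index, ref_index) links that are adjacent
    and identically ordered in both hypothesis and reference."""
    chunks = 0
    prev = None
    for h, r in sorted(links):
        if prev is None or h != prev[0] + 1 or r != prev[1] + 1:
            chunks += 1
        prev = (h, r)
    return chunks


def meteor_score(
    links: List[Tuple[int, int]],
    hyp_len: int,
    ref_len: int,
    alpha: float = 0.9,   # assumed: balance between precision and recall
    beta: float = 3.0,    # assumed: exponent of the fragmentation penalty
    gamma: float = 0.5,   # assumed: maximum weight of the penalty
) -> float:
    m = len(links)
    if m == 0:
        return 0.0
    precision = m / hyp_len
    recall = m / ref_len
    f_mean = precision * recall / (alpha * precision + (1 - alpha) * recall)
    penalty = gamma * (count_chunks(links) / m) ** beta
    return (1 - penalty) * f_mean
```

With all four words of a four-word hypothesis aligned in order to a four-word reference, there is a single chunk, so the penalty is small and the score stays close to 1.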

9
METEOR Parameter Tuning
  • The 3 free parameters in the metric are tuned to
    obtain maximum correlation with human judgements.
  • Since the ranges of the parameters are bounded,
    we perform an exhaustive search
  • METEOR v0.6, released in 2007, was tuned to
    obtain optimal correlations with human adequacy
    and fluency judgements; it is used as a baseline
    metric for MetricsMATR
  • Additional versions of METEOR submitted:
  • METEOR v0.7: parameters re-tuned for adequacy on
    data released for MetricsMATR
  • METEOR-ranking: parameters re-tuned for ranking
    judgements on ranking data released for
    MetricsMATR
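
An exhaustive search over three bounded parameters can be sketched as a plain grid search. Everything concrete here is an assumption made for illustration: the per-segment statistics tuple, the parameter names, the grid ranges, and the use of Pearson correlation as the objective (the slides do not say which correlation statistic was maximized).

```python
import itertools
import math
from typing import List, Optional, Sequence, Tuple

# Assumed per-segment statistics: (matches, hyp_len, ref_len, chunks)
Stats = Tuple[int, int, int, int]


def score(stats: Stats, alpha: float, beta: float, gamma: float) -> float:
    m, hyp_len, ref_len, chunks = stats
    if m == 0:
        return 0.0
    p, r = m / hyp_len, m / ref_len
    f_mean = p * r / (alpha * p + (1 - alpha) * r)
    return (1 - gamma * (chunks / m) ** beta) * f_mean


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / math.sqrt(vx * vy)


def frange(start: float, stop: float, step: float) -> List[float]:
    """Inclusive range of evenly spaced floats."""
    n = int(round((stop - start) / step))
    return [round(start + i * step, 10) for i in range(n + 1)]


def grid_search(
    data: List[Stats],
    human: List[float],
    alphas: Sequence[float],
    betas: Sequence[float],
    gammas: Sequence[float],
) -> Tuple[float, Optional[Tuple[float, float, float]]]:
    """Try every parameter combination and keep the one whose metric
    scores correlate best with the human judgements."""
    best_r, best_params = -2.0, None
    for a, b, g in itertools.product(alphas, betas, gammas):
        r = pearson([score(s, a, b, g) for s in data], human)
        if r > best_r:
            best_r, best_params = r, (a, b, g)
    return best_r, best_params
```

A call such as `grid_search(data, human, frange(0, 1, 0.05), frange(0, 3, 0.05), frange(0, 1, 0.05))` evaluates every grid point once, which stays cheap because the alignment statistics are computed a single time up front.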

10
Computing Ranking Correlation
  • Optimization criterion: maximize the total number
    of correct binary classifications based on metric
    score across all translation pairs and all
    training sentences
  • Equivalent translations are ignored
  • Possible alternative method:
  • Create full rankings for each sentence and
    maximize average Spearman ranking correlation
  • We tried both ways, no significant differences in
    performance
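
The pairwise criterion can be sketched as a simple count. The data layout is an assumption: each sentence is a list of `(metric_score, human_rank)` pairs, a lower rank means a better translation, and a tie in metric score on a pair that humans distinguish is counted as incorrect.

```python
from itertools import combinations
from typing import List, Tuple

# Assumed layout: one (metric_score, human_rank) pair per candidate translation.
Candidate = Tuple[float, int]


def pairwise_agreement(sentences: List[List[Candidate]]) -> Tuple[int, int]:
    """Return (correct, total) over all translation pairs of all sentences."""
    correct = total = 0
    for candidates in sentences:
        for (s1, r1), (s2, r2) in combinations(candidates, 2):
            if r1 == r2:
                continue  # translations judged equivalent are ignored
            total += 1
            human_prefers_first = r1 < r2
            # Correct only when the metric strictly prefers the same side.
            if s1 != s2 and (s1 > s2) == human_prefers_first:
                correct += 1
    return correct, total
```

Tuning for ranking then amounts to choosing the parameter setting that maximizes the first element of the returned pair.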

11
Flexible Matching for BLEU, TER
  • The flexible matching in METEOR can be used to
    extend any metric that is based on word overlap
    between translation and reference(s)
  • Compute the alignment between reference and
    hypothesis using the METEOR matcher
  • Create a targeted reference by substituting
    words in the reference with their matched
    equivalents from the translation hypothesis
  • Compute any metric (BLEU, TER, etc.) with the new
    targeted references
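
The three steps above can be sketched as follows. The `links` argument is assumed to be a list of `(hyp_index, ref_index)` pairs produced by some matcher, and `clipped_ngram_matches` is a hypothetical helper that only counts n-gram overlap, the quantity BLEU-style metrics build on; it is not a full BLEU or TER implementation.

```python
from collections import Counter
from typing import List, Tuple


def targeted_reference(
    hyp: List[str], ref: List[str], links: List[Tuple[int, int]]
) -> List[str]:
    """Copy the reference, replacing each aligned reference word with the
    hypothesis word it was matched to; unaligned words stay unchanged."""
    targeted = list(ref)
    for h, r in links:
        targeted[r] = hyp[h]
    return targeted


def clipped_ngram_matches(hyp: List[str], ref: List[str], n: int) -> int:
    """Number of hypothesis n-grams also found in the reference, with each
    n-gram's count clipped to its count in the reference."""
    hyp_counts = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_counts = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    return sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
```

For the made-up pair "he parked the car" / "yesterday he parks the automobile", the original reference shares no bigram with the hypothesis, while the targeted reference "yesterday he parked the car" shares three, so stem and synonym matches are no longer penalized by the downstream metric.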

12
Discussion and Future Work
  • METEOR has consistently demonstrated
    comparatively high levels of correlation with
    human judgements in multiple evaluations in
    recent years
  • Simple and relatively fast to compute
  • Some minor issues with using it in MERT are
    being resolved
  • More sophisticated paraphrase detection methods
    (multi-word correspondences, such as compounds in
    German)
  • Weights for different words or POSs
  • Incorporation of additional syntactic and
    semantic features such as dependency links

13
Download Information
  • All versions of METEOR and also a stand-alone
    version of the word alignment matcher can be
    freely downloaded from our website
  • http://www.cs.cmu.edu/~alavie/METEOR/