Evaluation: State-of-the-art and future actions - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Evaluation: State-of-the-art and future actions


1
Evaluation: State-of-the-art and future actions
  • Bente Maegaard
  • CST, University of Copenhagen
  • bente@cst.dk

2
Evaluation at LREC
  • More than 150 papers were submitted to the
    Evaluation track, both Written and Spoken
  • This is a significant rise compared to previous
    years
  • Evaluation as a field is attracting increasing
    interest.
  • Many papers discuss evaluation methodology; the
    field is still under development, and the
    answers to some of the methodological questions
    are not yet known.
  • An example: MT
  • Automatic evaluation
  • Evaluation in Context (task-based,
    function-based)

3
Evaluation: Written
  • Parsing evaluation: 6
  • Semantics, sense: 6
  • Evaluation methodologies: 7
  • Time annotation: 9
  • MT: 13
  • Annotation, alignment, morph.: 15
  • Lexica, tools: 21
  • QA, IR, IE, summarisation, authoring: 25
  • Total: 102
  • Note: These figures may contain papers that were
    originally in other tracks.

4
Discussion: MT evaluation
  • MT evaluation since 1965
  • Van Slype: adequacy, fluency, fidelity, ...
  • Human evaluation: expensive, time-consuming,
    problems with counting errors; is it objective?
  • Formalising human evaluation, adding e.g.
    grammaticality
  • Another measure: cost of post-editing (objective)
  • Automatic evaluation: BLEU (Papineni et al.
    2001), with various modifications. Establishing
    the reference translations is expensive; after
    that, evaluation is cheap and fast (a minimal
    sketch follows below).
  • However, research shows that this automatic
    method does not correlate well with human
    evaluation, nor with the cost of post-editing,
    etc.
  • Automatic statistical evaluation can probably be
    used to evaluate MT for gisting, but it cannot
    be used for MT for publishing.
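
To make the BLEU bullet concrete, here is a minimal sketch of
modified n-gram precision combined with a brevity penalty, in the
spirit of Papineni et al. 2001. This is my own simplified
illustration, not code from the presentation; it assumes a single
reference per sentence, uniform weights, and no smoothing (real BLEU
is computed over a whole corpus and treats zero n-gram matches more
gracefully).

import math
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of the token list, as tuples.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    # Geometric mean of modified n-gram precisions times a brevity penalty.
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # "Modified" precision: clip each n-gram count at its reference
        # count, so repeating a correct word cannot inflate the score.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # unsmoothed: one empty precision zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: candidates shorter than the reference are penalised.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)

ref = "the cat sat on the mat"
print(bleu("the cat sat on the mat", ref))          # 1.0, identical
print(bleu("the cat is on the mat", ref, max_n=2))  # about 0.71

The sketch also shows the cost profile noted above: producing the
reference is the expensive, human step; once it exists, scoring any
number of candidate translations is cheap and fast.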

5
Digression: Metrics that do not work
  • Why is it so difficult to evaluate MT?
  • Because there is more than one correct answer.
  • And because answers may be more or less correct.
  • Measures like WER are not relevant for MT (see
    the sketch after this list).
  • Methods that rely on the translation having a
    specific number of words break down whenever the
    translation is not the same length as the
    reference.
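
The word-order point can be shown with a small sketch (again my own
illustration, not from the slides): word-level WER is the Levenshtein
edit distance to a single reference divided by the reference length,
so a perfectly acceptable translation in a different word order still
scores badly.

def wer(candidate, reference):
    # Word error rate: word-level edit distance / reference length.
    c, r = candidate.split(), reference.split()
    # dp[i][j] = edits needed to turn the first i reference words
    # into the first j candidate words.
    dp = [[0] * (len(c) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(c) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(c) + 1):
            sub = 0 if r[i - 1] == c[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution/match
    return dp[len(r)][len(c)] / len(r)

ref = "tomorrow the committee will discuss the report"
print(wer("tomorrow the committee will discuss the report", ref))  # 0.0
# An equally correct translation with different word order:
print(wer("the committee will discuss the report tomorrow", ref))  # ~0.29

A WER of 0.29 for a fully correct translation is exactly why such
measures, designed for speech recognition output with one fixed word
order, do not transfer to MT.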

6
Generic Contextual Quality Model (GCQM)
  • Popescu-Belis et al., LREC 2006
  • Building on the same thinking as the FEMTI
    taxonomy
  • One can only evaluate a system in the context in
    which it will be used.
  • Quality workshop 27/5: task-based,
    function-based evaluation (Huang, Take)
  • Karen Sparck-Jones: the set-up
  • So the understanding that a system can only
    reasonably be evaluated with respect to a
    specific task is accepted.
  • Domain-specific vs. general purpose MT

7
What do we need? When?
  • What?
  • In the field of MT evaluation we need more
    experiments in order to establish a methodology.
  • The French CESTA campaign (Hamon et al., LREC
    2006) is a good example.
  • So we need international cooperation for the
    infrastructure; in the first instance this
    cooperation should lead to reliable metrics for
    MT evaluation. Later on, it may be used for
    actually measuring MT systems' performance.
  • (Of course not only MT!)
  • When?
  • As soon as possible.
  • Start with methodology, for each application
  • Move on to doing evaluation
  • Goal: in 2011 we can reliably evaluate MT, and
    other applications!