1
Automatic Evaluation
Philipp Koehn
Computer Science and Artificial Intelligence Lab
Massachusetts Institute of Technology
2
Automatic Evaluation
  • Why automatic evaluation metrics?
      • Manual evaluation is too slow
      • Evaluation on large test sets reveals minor improvements
      • Automatic tuning to improve machine translation performance
  • History
      • Word Error Rate
      • BLEU since 2002
  • BLEU in short: overlap with reference translations
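Word Error Rate, listed above as a predecessor of BLEU, is commonly computed as the word-level edit distance (substitutions, deletions, insertions) between the output and the reference, divided by the number of reference words. The following is a minimal sketch, assuming whitespace tokenization; the function name `word_error_rate` is our own illustration, not something from the slides.

```python
def word_error_rate(output: str, reference: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    hyp, ref = output.split(), reference.split()
    # dist[i][j] = edits needed to turn the first i reference words
    # into the first j output words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j output words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
    return dist[len(ref)][len(hyp)] / len(ref)

reference = "the gunman was shot to death by the police ."
print(word_error_rate("the gunman was shot dead by the police .", reference))  # 0.2
```

Here "to death" becomes "dead" (one substitution plus one deletion), so 2 edits over 10 reference words. Because the alignment is strictly left to right, a reasonable reordered translation such as "police killed the gunman ." is charged many edits.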

3
BLEU in Action
Reference translation: the gunman was shot to death by the police .
  1. the gunman was police kill .
  2. wounded police jaya of
  3. the gunman was shot dead by the police .
  4. the gunman arrested by police kill .
  5. the gunmen were killed .
  6. the gunman was shot to death by the police .
  7. gunmen were killed by police <SUB>0 <SUB>0
  8. al by the police .
  9. the ringer is killed by the police .
  10. police killed the gunman .
What is the best translation?
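One way to start answering that question is to count, for n = 1 to 4, how many n-grams of a candidate also occur in the reference, crediting each n-gram at most as often as it appears in the reference (BLEU's clipping). The sketch below assumes whitespace tokenization and scores candidates 1, 3 and 10 from the list; `ngrams` and `ngram_precision` are illustrative names.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_precision(candidate, reference, n):
    """Return (matched, total) n-gram counts; matches are clipped to reference counts."""
    cand_counts = ngrams(candidate.split(), n)
    ref_counts = ngrams(reference.split(), n)
    matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    return matched, sum(cand_counts.values())

reference = "the gunman was shot to death by the police ."
candidates = [
    "the gunman was police kill .",              # candidate 1
    "the gunman was shot dead by the police .",  # candidate 3
    "police killed the gunman .",                # candidate 10
]
for cand in candidates:
    stats = [ngram_precision(cand, reference, n) for n in range(1, 5)]
    print(cand, " ".join(f"{m}/{t}" for m, t in stats))
```

For candidate 3 this gives 8/9, 6/8, 4/7 and 2/6 matches for 1- to 4-grams; candidate 10 has no matching 3-grams or 4-grams.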
4
BLEU in Action
Reference translation: the gunman was shot to death by the police .
  1. the gunman was police kill .
  2. wounded police jaya of
  3. the gunman was shot dead by the police .
  4. the gunman arrested by police kill .
  5. the gunmen were killed .
  6. the gunman was shot to death by the police .
  7. gunmen were killed by police <SUB>0 <SUB>0
  8. al by the police .
  9. the ringer is killed by the police .
  10. police killed the gunman .
  • green: 4-gram match (good!)
  • cyan: 3-gram match
  • blue: 2-gram match
  • purple: 1-gram match
  • red: word not matched (bad!)
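The colors do not survive in a text transcript. Assuming each word is colored by the longest matching n-gram (up to 4) that contains it, the labels can be recomputed with a short sketch; `match_levels` is a hypothetical helper, and for simplicity it ignores count clipping.

```python
def match_levels(candidate, reference, max_n=4):
    """For each candidate word, the longest n (<= max_n) such that some n-gram
    covering that word also occurs in the reference; 0 means unmatched."""
    cand, ref = candidate.split(), reference.split()
    ref_ngrams = {tuple(ref[i:i + n])
                  for n in range(1, max_n + 1)
                  for i in range(len(ref) - n + 1)}
    levels = [0] * len(cand)
    for n in range(1, max_n + 1):
        for i in range(len(cand) - n + 1):
            if tuple(cand[i:i + n]) in ref_ngrams:
                for k in range(i, i + n):  # every word in a matching n-gram
                    levels[k] = max(levels[k], n)
    return levels

COLORS = {4: "green", 3: "cyan", 2: "blue", 1: "purple", 0: "red"}
reference = "the gunman was shot to death by the police ."
candidate = "the gunman was police kill ."
for word, level in zip(candidate.split(), match_levels(candidate, reference)):
    print(f"{word}: {COLORS[level]}")
```

For candidate 1, "the gunman was" comes out cyan (a 3-gram match), "police" and "." purple, and "kill" red.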
5
(No Transcript)
6
DARPA MT Evaluation Corpus: 11 Human Translations of 100 Chinese News Articles
  • At least 12 people were killed in the battle last week.
  • Last week 's fight took at least 12 lives.
  • The fighting last week killed at least 12.
  • The battle of last week killed at least 12 persons.
  • At least 12 people lost their lives in last week 's fighting.
  • At least 12 persons died in the fighting last week.
  • At least 12 died in the battle last week.
  • At least 12 people were killed in the fighting last week.
  • During last week 's fighting , at least 12 people died.
  • Last week at least twelve people died in the fighting.
  • Last week 's fighting took the lives of twelve people.
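With several human translations, an n-gram in the output counts as correct if it appears in any reference, but under the standard BLEU clipping rule it is credited at most as many times as it occurs in a single reference. A sketch under those assumptions, using two of the references above (lowercased, final period split off) and an invented candidate; the function names are illustrative.

```python
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_precision(candidate, references, n):
    """Return (matched, total) n-gram counts for one candidate against several
    references; each n-gram is credited at most max-over-references times."""
    cand_counts = ngrams(candidate.split(), n)
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref.split(), n).items():
            max_ref[gram] = max(max_ref[gram], count)
    matched = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return matched, sum(cand_counts.values())

refs = [
    "at least 12 people were killed in the battle last week .",
    "at least 12 persons died in the fighting last week .",
]
candidate = "at least 12 people died last week ."  # invented system output
print(clipped_precision(candidate, refs[:1], 1))  # (7, 8): "died" unmatched
print(clipped_precision(candidate, refs, 1))      # (8, 8)
print(clipped_precision(candidate, refs, 2))      # (5, 7)
```

Adding the second reference lets "died" match, so extra references make the metric more tolerant of legitimate variation in wording.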
7
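BLEU in Theory
  • How many n-grams in the output match n-grams in the reference?
      • Usually 1-grams to 4-grams
  • Length penalty (brevity penalty, BP) to ensure that the output is of similar length: output that is too short is penalized by BP, output that is too long already loses precision
  • $\text{BLEU} = \text{BP} \cdot \exp\left(w_1 \log p_1 + \dots + w_4 \log p_4\right)$, typically with uniform weights $w_n = 1/4$
  • $p_n = \dfrac{\#\,\text{correct } n\text{-grams}}{\#\,n\text{-grams in output}}$, where an n-gram counts as correct at most as often as it occurs in the reference
  • $\text{BP} = \min\left(1, \exp\left(1 - \dfrac{\text{length}_\text{reference}}{\text{length}_\text{output}}\right)\right)$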
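These formulas fit into a short corpus-level sketch. Assumptions: whitespace tokenization, one reference per output sentence, uniform weights $w_n = 1/4$, and match and length counts pooled over the whole test set before the precisions and the brevity penalty are computed. Function names are illustrative.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(candidates, references, max_n=4):
    """BLEU = BP * exp(sum_n w_n log p_n) with w_n = 1/max_n,
    one reference per candidate, counts pooled over the test set."""
    matched, total = [0] * max_n, [0] * max_n
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        c_tok, r_tok = cand.split(), ref.split()
        cand_len += len(c_tok)
        ref_len += len(r_tok)
        for n in range(1, max_n + 1):
            c_counts, r_counts = ngrams(c_tok, n), ngrams(r_tok, n)
            # clipped matches: credit each n-gram at most as often as in the reference
            matched[n - 1] += sum(min(c, r_counts[g]) for g, c in c_counts.items())
            total[n - 1] += sum(c_counts.values())
    if min(matched) == 0:
        return 0.0  # some p_n is 0, so the geometric mean is 0
    log_avg = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    bp = min(1.0, math.exp(1 - ref_len / cand_len))  # brevity penalty
    return bp * math.exp(log_avg)

reference = "the gunman was shot to death by the police ."
print(corpus_bleu(["the gunman was shot dead by the police ."], [reference]))  # ≈ 0.534
```

For this output the precisions are 8/9, 6/8, 4/7 and 2/6, and the output is one word shorter than the 10-word reference, so BP = exp(1 - 10/9) ≈ 0.89.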

8
BLEU Tends to Predict Human Judgments
(variant of BLEU)
Slide from G. Doddington (NIST)
9
Developing with BLEU
  • Track improvements, quit dead ends early

10
Optimize Systems for BLEU
Learning algorithm for directly reducing translation error → big improvements in quality.
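The slide does not name the learning algorithm. As a much-simplified stand-in (not necessarily the method the slide refers to), the sketch below tunes one feature weight by brute-force grid search: for each candidate weight it picks the highest-scoring hypothesis from every n-best list and keeps the weight whose selections give the highest corpus BLEU. The two features, their values and the n-best list are invented for illustration; `tune_weight` is a hypothetical name.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(candidates, references, max_n=4):
    """Corpus BLEU with one reference per sentence and uniform weights."""
    matched, total = [0] * max_n, [0] * max_n
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        c_tok, r_tok = cand.split(), ref.split()
        cand_len, ref_len = cand_len + len(c_tok), ref_len + len(r_tok)
        for n in range(1, max_n + 1):
            c_counts, r_counts = ngrams(c_tok, n), ngrams(r_tok, n)
            matched[n - 1] += sum(min(c, r_counts[g]) for g, c in c_counts.items())
            total[n - 1] += sum(c_counts.values())
    if min(matched) == 0:
        return 0.0
    log_avg = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    return min(1.0, math.exp(1 - ref_len / cand_len)) * math.exp(log_avg)

def tune_weight(nbest_lists, references, grid):
    """Grid-search lam in score = f[0] + lam * f[1]; return the (lam, BLEU)
    pair whose 1-best selections maximize corpus BLEU."""
    best_lam, best_bleu = None, -1.0
    for lam in grid:
        chosen = [max(nbest, key=lambda hyp: hyp[1][0] + lam * hyp[1][1])[0]
                  for nbest in nbest_lists]
        score = corpus_bleu(chosen, references)
        if score > best_bleu:  # strict '>' keeps the first best weight on ties
            best_lam, best_bleu = lam, score
    return best_lam, best_bleu

# Hypothetical n-best list: (translation, (feature_1, feature_2)) pairs.
references = ["the gunman was shot to death by the police ."]
nbest_lists = [[
    ("the gunman was shot to death by the police .", (-5.0, -2.0)),
    ("police killed the gunman .", (-1.0, -4.0)),
]]
grid = [0.5, 1.0, 1.5, 2.5, 3.0]
print(tune_weight(nbest_lists, references, grid))  # (2.5, 1.0)
```

A fixed grid over one weight only illustrates the idea of choosing weights by their effect on BLEU; practical tuners adjust many weights at once and typically search more cleverly than a grid.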
11
Criticisms of BLEU
  • Not sensitive to global syntactic structure
  • Some words are more important than others (e.g., "not" vs. "the")
  • Score by itself is not very meaningful (is
    0.34 good?)
  • ... but does this matter?
  • ... can it be fixed?
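To illustrate the "not" vs. "the" criticism, the sketch computes sentence-level BLEU (uniform weights, brevity penalty, one reference) for two damaged versions of a made-up reference sentence: one drops "not", reversing the meaning, the other drops "the". Because every token counts the same, both receive an identical score. Names and sentences are illustrative.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """BLEU for one candidate/reference pair (uniform weights, brevity penalty)."""
    cand, ref = candidate.split(), reference.split()
    log_sum = 0.0
    for n in range(1, max_n + 1):
        c_counts, r_counts = ngrams(cand, n), ngrams(ref, n)
        matched = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        total = sum(c_counts.values())
        if matched == 0:
            return 0.0  # a zero precision makes the geometric mean 0
        log_sum += math.log(matched / total) / max_n
    bp = min(1.0, math.exp(1 - len(ref) / len(cand)))
    return bp * math.exp(log_sum)

reference = "police did not kill the gunman ."  # made-up example
drop_not = "police did kill the gunman ."      # meaning reversed
drop_the = "police did not kill gunman ."      # minor grammatical slip
print(bleu(drop_not, reference), bleu(drop_the, reference))
```

Both candidates have precisions 6/6, 4/5, 2/4 and 1/3 and the same length, so both print the same score (about 0.51).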

12
Is BLEU perfect?
  • A very useful tool at this point
  • Some caveats
      • Only makes sense for large test sets (thousands of sentences)
      • BLEU does not work for single sentences
  • Problems with BLEU have to be demonstrated by lack of correlation with human judgements
      • Nobody cares about anecdotal criticism
  • Can BLEU be improved? There is a lot of work in MT evaluation...
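Why BLEU breaks down on single sentences can be seen from the geometric mean: if any n-gram order has no matches, the product of precisions is 0, so the score is 0 however good the other orders are. The self-contained sketch below (whitespace tokens, one reference per sentence, uniform weights, illustrative names) scores candidate 10 from the "BLEU in Action" slide on its own and then pooled with candidate 3.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(candidates, references, max_n=4):
    """Corpus BLEU: counts are pooled over all sentences before the mean."""
    matched, total = [0] * max_n, [0] * max_n
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        c_tok, r_tok = cand.split(), ref.split()
        cand_len, ref_len = cand_len + len(c_tok), ref_len + len(r_tok)
        for n in range(1, max_n + 1):
            c_counts, r_counts = ngrams(c_tok, n), ngrams(r_tok, n)
            matched[n - 1] += sum(min(c, r_counts[g]) for g, c in c_counts.items())
            total[n - 1] += sum(c_counts.values())
    if min(matched) == 0:
        return 0.0  # some n-gram order has no matches at all
    log_avg = sum(math.log(m / t) for m, t in zip(matched, total)) / max_n
    return min(1.0, math.exp(1 - ref_len / cand_len)) * math.exp(log_avg)

reference = "the gunman was shot to death by the police ."
cand3 = "the gunman was shot dead by the police ."
cand10 = "police killed the gunman ."
print(corpus_bleu([cand10], [reference]))                    # 0.0: no 3-gram matches
print(corpus_bleu([cand3, cand10], [reference, reference]))  # ≈ 0.31
```

Candidate 10 is a reasonable translation yet scores 0 on its own; pooling statistics over many sentences avoids this, which fits the advice to use large test sets. Sentence-level variants of BLEU usually add smoothing for the same reason.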