Provided by: dcu

1
The Link between Controlled Language and
Post-Editing
  • An Empirical Investigation of Technical, Temporal
    and Cognitive Effort
  • Sharon O'Brien, CTTS/SALIS

2
Overview
  • Research Parameters
  • Temporal Effort
  • Technical Effort
  • Cognitive Effort
  • Conclusions

3
Definition of Controlled Language (CL)
  • "an explicitly defined restriction of a natural
    language that specifies constraints on lexicon,
    grammar, and style."
  • (Huijsen, 1998: 2)

4
Motivation: In a Nutshell
  • Can the introduction of CL rules really improve
    MT output such that post-editing effort is
    reduced?

5
Machine Translatability
  • One of the main goals of CL
  • The notion of translatability is based on
    so-called "translatability indicators" where the
    occurrence of such an indicator in the text is
    considered to have a negative effect on the
    quality of machine translation. The fewer
    translatability indicators, the better suited the
    text is to translation using MT.
  • (Underwood and Jongejan 2001: 363)

6
Machine Translatability
  • Negative Translatability Indicators (NTIs for short)
  • Examples (for English as SL):
      • Long noun phrases
      • Passive voice
      • Ungrammatical constructs
      • Use of slang
  • Use of NTI list (Bernth/Gdaniec 2001)
  • Use of term 'minimal NTI'
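
As a rough illustration of how a few of the NTIs named in this presentation could be flagged automatically, here is a minimal Python sketch. The patterns, the `find_ntis` helper and the three-word threshold for a short segment are assumptions made for this example; they are not the rules of EasyEnglishAnalyzer, Sunproof or the Bernth/Gdaniec list.

```python
import re

# Illustrative patterns only (assumptions for this sketch, not a real CL rule set).
NTI_PATTERNS = {
    "use of (s) for plural": re.compile(r"\w\(s\)"),
    "slash as separator": re.compile(r"\w/\w"),
    # Only a few common contractions written with a straight apostrophe.
    "contraction": re.compile(r"\b\w+(?:n't|'re|'ve|'ll)\b"),
    # Any parenthesised text except the "(s)" plural marker handled above.
    "parentheses": re.compile(r"\((?!s\))[^)]*\)"),
}

SHORT_SEGMENT_MAX_WORDS = 3  # hypothetical threshold for "short segment"


def find_ntis(segment: str) -> list[str]:
    """Return the names of the illustrative NTIs detected in one source segment."""
    found = [name for name, pattern in NTI_PATTERNS.items() if pattern.search(segment)]
    if len(segment.split()) <= SHORT_SEGMENT_MAX_WORDS:
        found.append("short segment")
    return found


if __name__ == "__main__":
    print(find_ntis("Save the document(s)."))
    print(find_ntis("The editor contains a menu and a toolbar."))
```

Under these assumptions, the first sentence is flagged for the (s) plural and for being a short segment, while the second is flagged for nothing.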

7
Research Design
  • SL: English; TL: German
  • Text Type: User Manual (1,777 words)
  • Users: 12 Professional Translators
  • Tools: IBM WebSphere, Translog, IBM's
    EasyEnglishAnalyzer, Sun Microsystems' Sunproof
  • Place of Data Capture: IBM Stuttgart

8
Methodology
  • Edit SL text to create two sentence types:
      • S(nti): sentences with known negative
        translatability indicators
      • S(min-nti): sentences where all listed NTIs had
        been removed
  • 9 subjects post-editing (P1-P9)
  • 3 subjects translating (T1-T3)
  • First-pass exercise, no QA

9
Temporal Effort
  • Post-Editing vs. Translation
  • median words per minute

10
Temporal Effort (2)
  • Post-Editing vs. Translation
  • median processing speed
  • Processing speed is the total number of source
    words in each segment divided by the total
    processing time for that segment
  • i.e. words processed per second
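
The processing-speed definition above can be sketched in a few lines of Python. The function names and the sample figures are invented for illustration; they are not data from the study.

```python
from statistics import median


def processing_speed(source_words: int, processing_seconds: float) -> float:
    """Source words in a segment divided by the processing time for that segment."""
    if processing_seconds <= 0:
        raise ValueError("processing time must be positive")
    return source_words / processing_seconds


def median_processing_speed(segments: list[tuple[int, float]]) -> float:
    """Median words per second over (source_words, processing_seconds) pairs."""
    return median(processing_speed(words, seconds) for words, seconds in segments)


if __name__ == "__main__":
    # Made-up segments: (source word count, seconds spent on the segment).
    sample = [(10, 20.0), (8, 8.0), (12, 48.0)]
    print(median_processing_speed(sample))  # speeds are 0.5, 1.0 and 0.25
```

Using the median rather than the mean keeps one unusually slow or fast segment from dominating the comparison.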

11
Median Processing Speed
  • S(nti) vs. S(min-nti)

12
Temporal Effort Conclusions
  • The post-editing task was completed faster than
    the translation task.
  • First-pass exercise/No QA
  • The median processing speeds for S(min-nti)
    segments were significantly higher than those for
    S(nti) segments.
  • So, from a temporal point of view, it seems that
    the introduction of CL benefits turnaround times

13
Technical Effort
  • Measured using Translog
  • Keyboarding
  • Deletions, insertions, cuts, pastes
  • Dictionary Look-Up Activity

14
Translog

15
Sample Linear Repetition File

16
Keyboarding Median Measurements

17
Keyboarding Median Measurements
  • Small difference between the two segment types,
    but statistically significant for
    insertions/deletions
  • Cutting and pasting very limited even though
    post-editors recycled whole chunks of text

18
Use of the Translog Dictionary
  • Training and practice prior to task
  • All users reported being comfortable with the
    feature

19
Data on Dictionary Usage

20
Possible Explanations?
  • Subjects not as familiar with feature as they
    reported
  • Subjects felt it was unnecessary to use
    dictionary
  • Subjects used to having terms suggested on-screen
    with TM/Terminology tool
  • Subjects lost faith in the feature when they
    encountered problems

21
Conclusions on Technical Effort
  • S(min-nti) segments require significantly fewer
    deletions and insertions than S(nti) segments.
  • Cutting and pasting was a very rare activity for
    both segment types.
  • Dictionary searches were uncommon during this
    study. When they were carried out, the search
    facility was frequently used incorrectly.

22
Technical/Temporal Combined
  • Results on technical post-editing effort add to
    the evidence presented above on temporal
    post-editing effort and further support the
    claim that the elimination of NTIs from a segment
    can reduce post-editing effort.

23
Cognitive Effort
  • Potential Methodologies
  • Think-aloud protocols (TAP) (rejected)
  • Pause Analysis
  • Choice Network Analysis
  • Eye tracking (unavailable at the time)

24
Pause Behaviour
  • No discernible correlations between pause
    behaviour and post-editing activity
  • Pause analysis rejected

25
Cognitive Effort
  • Choice Network Analysis

26
Choice Network Analysis
  • Choice Network Analysis compares the renditions
    of a single string of translation by multiple
    translators in order to propose a network of
    choices that theoretically represents the
    cognitive model available to any translator for
    translating that string. The technique is
    favoured over the think-aloud method, which is
    acknowledged as not being able to access
    automaticized processes.
  • (Campbell, 2000: 215)
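
One simplified way to picture the comparison of renditions is to align each subject's output chunk by chunk and count the distinct choices at each position. The sketch below does only that: the chunk alignment, the `choice_network` and `branching` helpers, and the sample post-edits are all assumptions made for illustration, not the procedure or data of the study.

```python
from collections import Counter


def choice_network(renditions_by_subject: dict[str, list[str]]) -> list[Counter]:
    """For each aligned chunk position, count how often each rendition was chosen."""
    chunk_lists = list(renditions_by_subject.values())
    if not chunk_lists:
        return []
    length = len(chunk_lists[0])
    if any(len(chunks) != length for chunks in chunk_lists):
        raise ValueError("all renditions must be split into the same number of chunks")
    return [Counter(chunks[i] for chunks in chunk_lists) for i in range(length)]


def branching(network: list[Counter]) -> list[int]:
    """Number of distinct choices at each position (a rough proxy, by assumption)."""
    return [len(node) for node in network]


if __name__ == "__main__":
    # Invented post-edits of one MT segment, pre-split into two aligned chunks.
    renditions = {
        "P1": ["Sichern Sie", "das Dokument"],
        "P2": ["Speichern Sie", "die Dokumente"],
        "P3": ["Sichern Sie", "die Dokumente"],
    }
    print(branching(choice_network(renditions)))
```

A position where every subject keeps the same chunk yields a single node, while a position with several competing renditions yields a wider branch.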

27
Example Sentence with NTIs
  • ST: Save the document(s).
  • Raw MT output: Sichern Sie das Dokument(s).
  • NTIs for this sentence:
      • Short segment
      • Use of (s) for plural
28
(No Transcript)

29
Example Sentence with minimal NTIs
  • ST: The editor contains a menu and a toolbar.
  • Raw MT output: Der Editor enthält ein Menü und eine
    Symbolleiste.

30
(No Transcript)

31
NTIs and Cognitive Effort
  • Using CNA as a guide, NTIs were categorised into:
      • High impact on post-editing effort: 50% or
        more of the occurrences of the NTI resulted in
        post-editing by two or more post-editors
      • Moderate impact on post-editing effort:
        between 31% and 49% of occurrences
      • Low impact on post-editing effort: 30% or
        fewer of occurrences
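
Reading the thresholds on this slide as percentages of an NTI's occurrences (an assumption), the three-way categorisation can be sketched as follows. The function name and the handling of values that fall between the stated bands (for example 30.5%) are choices made for this example.

```python
def impact_category(edited_occurrences: int, total_occurrences: int) -> str:
    """Classify an NTI by the share of its occurrences that were post-edited.

    `edited_occurrences` counts occurrences that resulted in post-editing by
    two or more post-editors; `total_occurrences` counts all occurrences.
    """
    if total_occurrences <= 0:
        raise ValueError("total_occurrences must be positive")
    if not 0 <= edited_occurrences <= total_occurrences:
        raise ValueError("edited_occurrences must lie between 0 and the total")
    share = 100 * edited_occurrences / total_occurrences
    if share >= 50:
        return "high"
    if share > 30:  # assumption: anything above 30% and below 50% is moderate
        return "moderate"
    return "low"


if __name__ == "__main__":
    print(impact_category(5, 10))  # 50% of occurrences
    print(impact_category(4, 10))  # 40% of occurrences
    print(impact_category(3, 10))  # 30% of occurrences
```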

32
Correlating Measurements
  • By combining data on temporal, technical and
    cognitive effort: High impact NTIs
  • Use of the gerund
  • Proper nouns
  • Problematic punctuation
  • Ungrammatical constructs
  • Use of (s) for plural
  • Non-finite verbs
  • Incomplete syntactic unit
  • Long NP
  • Short segment

33
Correlating Measurements
  • Moderate impact NTIs
  • Multiple coordinators
  • Passive voice
  • Personal pronouns
  • Use of a slash as a separator
  • Ambiguous scope in coordination
  • Parentheses

34
Correlating Measurements
  • Low impact NTIs
  • Abbreviations
  • Demonstrative pronouns
  • Missing 'in order to'
  • Contractions

35
Conclusion
  • Within the limited scope of this research, we now
    have empirical evidence to support the assertion
    that controlling the input to MT leads to lower
    post-editing effort.
  • The elimination of some NTIs can have a higher
    impact than the elimination of others.
  • Is it worth having a relatively high number of CL
    rules?
  • Even if we remove known NTIs, MT engines are
    still likely to produce some errors and
    post-editors are still likely to post-edit.

36
Questions?