Human Judgements in Parallel Treebank Alignment

1
Human Judgements in Parallel Treebank Alignment
  • Martin Volk, Torsten Marek, Yvonne Samuelsson
  • University of Zurich and Stockholm University
  • volk_at_cl.uzh.ch

2
English Syntax Tree
5
DE-EN Alignment
6
SMULTRON
  • Stockholm MULtilingual TReebank
  • 1000 sentences in 3 languages (DE-EN-SV)
  • 500 from Jostein Gaarder's Sophie's World (7,500 tokens, 14 tokens/sentence) and
  • 500 from Economy texts (11,000 tokens, 22 tokens/sentence)
      • ABB Quarterly report
      • Rainforest Alliance Banana Certification Program
      • SEB Annual report
  • Released January 2008: www.ling.su.se/dali/research/smultron/index.htm

7
German Annotation
8
German sentence flat annotation
9
German sentence deepened
10
English Annotation
11
English Syntax Tree
12
English annotation
  • Follows the Penn Treebank guidelines
  • Slower annotation because of
      • insertion of traces
      • secondary edges
      • deeper trees

14
Tree Alignment
15
  • Sentence alignment
  • Word alignment
      • input for Statistical MT
  • Phrase alignment
      • linguistically motivated phrases
      • input for Example-based MT

16
Alignment Example
17
Tools for Parallel Treebanks
  • creating and editing trees
      • from mono-lingual treebanks
      • PoS-taggers, chunkers, editor, tree-enricher
  • aligning phrases
      • use of word alignment tools
      • tree alignment editor → Stockholm TreeAligner
  • searching across languages
      • TIGER-Search for parallel treebanks → Stockholm TreeAligner

18
Guidelines for Alignment
  1. Align words and phrases that represent the same meaning and could serve as translation units in an MT system.
  2. Align as many words and phrases as possible.
  3. Distinguish between exact and approximate alignments.
  4. 1:n word / phrase alignments are allowed, but not m:n word / phrase alignments.
  5. m:n sentence alignments are allowed.
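
Guideline 4 lends itself to an automatic check. The following is a minimal sketch, not part of the original slides: it assumes that alignments are available as (source_id, target_id) pairs, and the function name and the node IDs are hypothetical. A link is flagged when both of its ends take part in more than one link, which is exactly the case where a group of links is neither 1:n nor n:1.

```python
from collections import defaultdict

def find_mn_violations(links):
    """Return the links that make an alignment group m:n.

    `links` is an iterable of (source_id, target_id) pairs (assumed format).
    """
    links = list(links)
    src_to_tgt = defaultdict(set)
    tgt_to_src = defaultdict(set)
    for s, t in links:
        src_to_tgt[s].add(t)
        tgt_to_src[t].add(s)
    # A 1:n or n:1 group is a star: every link has at least one end
    # that occurs in no other link. Anything else is m:n.
    return sorted(
        (s, t) for s, t in set(links)
        if len(src_to_tgt[s]) > 1 and len(tgt_to_src[t]) > 1
    )

# Hypothetical node IDs: "a" is aligned to two nodes, and "y" is
# also aligned to "b", so the link (a, y) turns the group into 2:2.
print(find_mn_violations([("a", "x"), ("a", "y"), ("b", "y")]))
```

A check of this kind only covers word and phrase links; sentence alignments are exempt under guideline 5.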

19
Examples
  • Do not align
      • die Verwunderung über das Leben
      • their astonishment at the world
  • Do align
      • was für eine seltsame Welt
      • what an extraordinary world

20
Specific rules
  • a pronoun in one language shall never be aligned
    with a full noun in the other
  • names are aligned regardless of spelling, unless
    the name is changed (fiction)
  • ignore number/case but not voice
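
The pronoun rule can be approximated with part-of-speech tags. This is a rough sketch under stated assumptions: the German side carries STTS tags and the English side Penn Treebank tags, the tag subsets below are a simplification chosen for illustration, and the function name is hypothetical.

```python
# Simplified tag subsets (assumption, not an exhaustive list).
DE_PRONOUN_TAGS = {"PPER", "PRF", "PDS", "PIS", "PRELS"}  # STTS
DE_NOUN_TAGS = {"NN", "NE"}                               # STTS
EN_PRONOUN_TAGS = {"PRP", "WP"}                           # Penn Treebank
EN_NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}               # Penn Treebank

def violates_pronoun_rule(de_tag, en_tag):
    """True if a word link pairs a pronoun with a full noun."""
    return (
        (de_tag in DE_PRONOUN_TAGS and en_tag in EN_NOUN_TAGS)
        or (de_tag in DE_NOUN_TAGS and en_tag in EN_PRONOUN_TAGS)
    )

print(violates_pronoun_rule("PPER", "NN"))   # pronoun vs. noun
print(violates_pronoun_rule("PPER", "PRP"))  # pronoun vs. pronoun
```

Such a check would only flag candidates for manual review, since tagging errors produce false alarms.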

21
Exact vs approximate alignment
  • best vs. second-best translation
  • an acronym in one language shall be aligned as approximate (fuzzy) with a spelled-out term in the other
      • PT vs. Power Technologies
  • difficult distinctions
      • einer der ersten Tage im Mai vs. early May

22
Related Research
  • Blinker project (Melamed)
  • Prague Czech-English Treebank
  • Example-based MT in Dublin
  • Linköping English-Swedish Treebank

23
Experiment
  • 12 students were asked to align 20 DE-EN tree pairs
      • 10 tree pairs from Sophie's World
      • 10 tree pairs from Economy text
  • advanced CL students
  • received
      • short introduction
      • the written guidelines

24
Gold Standard Alignment (DE-EN)
                     word - word          phrase - phrase
                     exact    approx.     exact    approx.
10 sent. Sophie      75       3           46       12
  total              78                   58
10 sent. Econ        159      19          62       9
  total              178                  71
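
The totals in the gold standard table can be re-derived from the exact and approximate counts. The sketch below copies the counts from the slide; the dictionary layout and function names are illustrative assumptions.

```python
# Gold standard alignment counts (DE-EN), copied from the table above.
GOLD = {
    "Sophie": {"word": {"exact": 75, "approx": 3},
               "phrase": {"exact": 46, "approx": 12}},
    "Econ": {"word": {"exact": 159, "approx": 19},
             "phrase": {"exact": 62, "approx": 9}},
}

def level_totals(part):
    """Total alignments per level (word, phrase) for one text part."""
    return {level: sum(counts.values()) for level, counts in GOLD[part].items()}

def approx_share(part):
    """Fraction of all gold alignments in a part that are approximate."""
    approx = sum(counts["approx"] for counts in GOLD[part].values())
    total = sum(sum(counts.values()) for counts in GOLD[part].values())
    return approx / total

for part in GOLD:
    print(part, level_totals(part), f"{approx_share(part):.1%} approximate")
```

The sums reproduce the totals on the slide (78 and 58 for Sophie, 178 and 71 for Econ), and by this calculation roughly 11% of the gold alignments in each part are approximate.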
25
Experiment Results
  • The students created a huge variety in number of alignments
      • Sophie part from 47 to 125 (ø 94.3)
      • Econ part from 62 to 259 (ø 186.9)
  • → the 3 students with the lowest numbers were non-native speakers of German
  • → 1 student had misunderstood the task

26
Experiment Results
  • The remaining 8 students had a high overlap with the gold standard
  • Recall
      • Sophie part from 48 to 81 (ø 68.7)
      • Econ part from 66 to 89 (ø 75.5)
  • Precision
      • Sophie part from 81 to 97 (ø 89.1)
      • Econ part from 78 to 94 (ø 88.2)
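
Precision and recall against a gold standard are commonly computed over sets of alignment links. The slides do not state how the overlap was counted (for example, whether the exact/approximate label had to match), so the sketch below makes a simplifying assumption: a link is a (source_id, target_id) pair and counts as correct if the same pair occurs in the gold standard. The IDs in the example are hypothetical.

```python
def precision_recall(student_links, gold_links):
    """Precision and recall of a student's links against the gold links."""
    student, gold = set(student_links), set(gold_links)
    correct = len(student & gold)
    precision = correct / len(student) if student else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Toy data: the student found 2 of the 4 gold links and added 1 wrong link.
gold = [(1, 1), (2, 2), (3, 3), (4, 4)]
student = [(1, 1), (2, 2), (5, 5)]
p, r = precision_recall(student, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

A student who aligns cautiously tends to score high on precision and lower on recall, which matches the pattern reported on this slide.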

27
Discrepancies
  • students sometimes aligned a word (or some words) with a node
      • e.g. the word natürlich to the phrase of course
  • students sometimes aligned a German verb group with a single verb form in English
      • e.g. ist zurückzuführen vs. reflecting

28
Discrepancies
  • based on different grammatical forms
  • a definite singular NP in German with an indefinite plural NP in English
      • der Umsatz vs. revenues
  • a German genitive NP with a PP in English
      • der beiden Divisionen vs. of the two divisions

29
Missed by all students
  • alignment of German word to empty token in English
      • wenn sie die Hand ausstreckte vs. herself shaking hands

31
Conclusions
  1. Our alignment guidelines are sufficient for a core of clear alignment decisions.
  2. Needed:
      • Better alignment rules with concrete examples.
      • Better support tools (consistency checking).
  3. The distinction between exact alignment and approximate alignment is very tricky.

32
Thank You for Your Attention!
  • Questions???

33
Applications of Parallel Treebanks
  • For the Translator
      • corpus for translation studies
      • search tools needed
  • For the Computational Linguist
      • input for Example-based Machine Translation
      • evaluation corpus for word, phrase or clause alignment
      • training corpus for transfer rules

34
Alignment Example
35
Parallel Treebanking
  • DE sentence → ANNOTATE: PoS tagger (STTS), Chunker (TIGER) → flat DE tree → Deepening → DE tree
  • SV sentence → PoS tagger (SUC), STTS conversion → ANNOTATE: Chunker (SWE-TIGER) → flat SV tree → Deepening, Back conv. → SV tree
  • DE tree + SV tree → phrase alignment