MT with Limited Resources: Approaches and Results
1
MT with Limited Resources: Approaches and Results
  • Ralf Brown, Stephan Vogel, Alon Lavie, Lori
    Levin, Jaime Carbonell
  • Students: Christian Monson, Erik Peterson,
    Kathrin Probst, Ashish Venugopal, Ying Zhang
  • Carnegie Mellon University

2
RADD: CMU's "incubator" for MT technologies
  • Multiple Techniques
    • Statistical
    • Example-Based
    • Transfer-Rule
  • Common Pre-Processing
    • Segmentation
    • Conversion of numbers to Arabic numerals
    • Translation of month names to English month names
  • Multi-Engine combinations

3
Statistical MT
  • The major improvement for the June evaluation was
    phrase-to-phrase alignments.
  • Performance (NIST score):
    • ME-Compatible SMT: 5.7354
    • Full SMT: 6.1361
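The slide does not spell out how the phrase-to-phrase alignments are obtained. A common way to get phrase pairs from word alignments is to extract every span pair consistent with the alignment; the sketch below illustrates that standard technique under stated assumptions (the function name and alignment format are inventions here, not the CMU system's API):

```python
def extract_phrase_pairs(src, tgt, alignment, max_len=4):
    """Extract phrase pairs consistent with a word alignment.

    alignment: set of (i, j) pairs linking src[i] to tgt[j].
    A span pair is consistent if no alignment link crosses the
    span boundary on either side.
    """
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # target positions linked to the source span src[i1..i2]
            linked = {j for (i, j) in alignment if i1 <= i <= i2}
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            # consistency: nothing in tgt[j1..j2] links outside src[i1..i2]
            if all(i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2):
                pairs.append((" ".join(src[i1:i2 + 1]),
                              " ".join(tgt[j1:j2 + 1])))
    return pairs
```

For the aligned pair "das Haus" / "the house" with links {(0,0), (1,1)}, this yields the pairs ("das", "the"), ("Haus", "house"), and ("das Haus", "the house").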

4
Example-Based MT
  • Given an indexed training corpus:
  • find phrases in the corpus which occur in the
    input to be translated,
  • retrieve the sentence pairs containing matches,
    and
  • perform a word-level alignment to determine
    translations.
  • Our "standard" EBMT system is actually a
    multi-engine combination of phrasal EBMT, the LDC
    lexicon, and a statistical dictionary extracted
    from the training text.
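The index-and-retrieve steps above can be sketched in a few lines. This is a hedged illustration only; the index structure, function names, and whitespace tokenization are assumptions, not the actual CMU EBMT implementation:

```python
from collections import defaultdict

def build_index(bitext, max_n=4):
    """Index every source n-gram (up to max_n words) to the
    sentence pairs that contain it."""
    index = defaultdict(list)
    for pair_id, (src, tgt) in enumerate(bitext):
        words = src.split()
        for i in range(len(words)):
            for n in range(1, max_n + 1):
                if i + n <= len(words):
                    index[" ".join(words[i:i + n])].append(pair_id)
    return index

def matching_pairs(index, bitext, input_sentence, max_n=4):
    """Find input phrases that occur in the corpus and retrieve
    the sentence pairs containing them."""
    words = input_sentence.split()
    hits = {}
    for i in range(len(words)):
        for n in range(max_n, 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in index:
                for pair_id in index[phrase]:
                    hits.setdefault(phrase, set()).add(pair_id)
                break  # keep only the longest match starting here
    return {p: [bitext[i] for i in ids] for p, ids in hits.items()}
```

The remaining step, word-level alignment within the retrieved pairs to pin down the translation of the matched phrase, is where the statistical dictionary mentioned above would come in.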

5
Example-Based MT (2)
  • Inexact matching: a phrase can match even if one
    of the words inside the phrase differs, provided
    that the dictionary can provide a more-or-less
    unambiguous translation for the unmatched word.
  • "More-or-less unambiguous" means that the
    translation with the second-highest frequency has
    frequency less than THRESHOLD × the highest
    frequency.
  • We experimentally determined the best threshold
    to be 0.55.
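The ambiguity test reduces to a single comparison. A minimal sketch, assuming translations arrive as a dictionary of frequencies (the interface and function name are inventions for illustration):

```python
def unambiguous_translation(freqs, threshold=0.55):
    """Return the top translation only if it is 'more-or-less
    unambiguous': the second-highest frequency must be below
    threshold * highest frequency (0.55 per the slides)."""
    ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) == 1:
        return ranked[0][0]          # only one candidate: trivially unambiguous
    (best, f1), (_, f2) = ranked[0], ranked[1]
    return best if f2 < threshold * f1 else None
```

For example, frequencies {"house": 10, "home": 4} pass the test (4 < 0.55 × 10), while {"house": 10, "home": 7} fail it, so the inexact match would be rejected.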

6
Transfer Rule MT
  • Manually developed transfer rules for translation
    with our newly developed Transfer Engine
    (71 hours of development time),
  • and a transfer lexicon automatically derived from
    the LDC 10k-word lexicon.
  • A language model selects from among ambiguous
    translations.
  • Performance (NIST score): XFER+LM 4.8404

7
Multi-Engine MT
  • Hypothesis: by combining multiple translation
    methods, we can mitigate weaknesses and enhance
    strengths of individual methods.
  • Each engine generates whatever partial
    translations it can and assigns an approximate
    quality score.
  • The partial translations are then combined into a
    lattice, and a trigram model of the output
    language (plus other scoring heuristics) is used
    to select the best path through the lattice.
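The lattice search can be sketched as a simple left-to-right dynamic program. This is a hedged illustration under assumptions: the edge format, the additive combination of engine score and LM score, and the `lm_score` callback standing in for the trigram model are all inventions here, not the CMU multi-engine system:

```python
def best_path(lattice, length, lm_score, lm_weight=1.0):
    """Select the best-scoring path through a translation lattice.

    lattice: edges (start, end, words, engine_logprob) over input
    positions 0..length, with start < end.
    lm_score(prev_words, words): stand-in for the trigram model.
    """
    # best[i] = (score, output-words-so-far) of the best partial
    # path covering input positions 0..i
    best = {0: (0.0, [])}
    for i in range(length):
        if i not in best:
            continue
        score, words = best[i]
        for (s, e, w, q) in lattice:
            if s != i:
                continue
            cand = score + q + lm_weight * lm_score(words, w)
            if e not in best or cand > best[e][0]:
                best[e] = (cand, words + w)
    return best.get(length, (float("-inf"), []))[1]
```

With competing edges ("the", score -1.0) and ("a", score -1.5) over the same span, the higher-scoring edge survives and the returned path reads off the combined translation.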

8
Multi-Engine MT Results
  • Most combinations outperform the individual
    engines.
  • We submitted two combinations and the engines
    they combined for official scores, with the
    results (NIST) shown below.

        PhrEBMT  3.9668      PhrEBMT  3.9668
        SMT      5.7354      XFER     4.8404
        Combo    5.9524      Combo    5.2170

  • Additionally, we see the effect of combining
    lexica with phrasal EBMT:

        PhrEBMT      3.9668
        PhrEBMT+lex  5.2883

9
CMU Small Data Results
  • Official results were submitted with a segmentor
    trained on the full LDC word list (same as the
    large data track).
  • We retrained our segmentor with only the 10K
    small-track dictionary and the words from the
    100K-word Chinese treebank, and re-evaluated our
    latest systems.
  • Results with the new (small) segmentation are
    reported in parentheses.

10
CMU Small Data Results SMT
  • Results with the full segmentor and with the
    re-trained small segmentor for different versions
    of our SMT system

11
Learning Transfer-Rules for Languages with
Limited Resources
  • Rationale
  • Large bilingual corpora not available
  • Bilingual native informant(s) can translate and
    align a small pre-designed elicitation corpus
    using an elicitation tool
  • Elicitation corpus designed to be typologically
    comprehensive and compositional
  • Transfer-rule engine and new learning approach
    support acquisition of generalized transfer-rules
    from the data

12
AVENUE Transfer
13
Sample Transfer Rule
  • Rules contain necessary information for analysis,
    transfer and generation
  • Unification equations used to build source,
    target feature structures
  • The rule transfers Chinese questions (formed by
    appending the particle MA) into English questions
  • {S,2}                           Rule ID
  • S::S [NP VP MA] -> [AUX NP VP]  Source and target production rules
  • (
  •   (x1::y2)                      Source NP aligns with target NP
  •   (x2::y3)                      Source VP aligns with target VP
  •   ((x0 subj) = x1)              Build the source feature structure
  •   ((x0 subj case) = nom)
  •   ((x0 act) = quest)
  •   (x0 = x2)
  •   ((y1 form) = do)              Set inserted constituent AUX's base form to "do"
  •   ((y3 vform) = c inf)          Constrain verb to infinitive form
  •   ((y1 agr) = (y2 agr))         Enforce agreement between "do" and subject
  • )

14
Transfer Overview
  • The AVENUE translation engine was developed
    internally and follows a three-step transfer
    approach:
  • Analysis
  • Transfer
  • Generation
  • The engine can be run with manually developed
    transfer rules as a stand-alone system, or operate
    as part of our larger rule-learning system.

15
RADD Transfer Development
  • Total Chinese-specific rule and lexicon
    development time: 71 hours
  • Small and Large Tracks used the same transfer
    rules but different-sized lexicons (10K vs. 50K)
  • Rule development by a bilingual speaker with
    linguistic background, based upon manual
    evaluation of training data and personal
    grammatical knowledge.
  • Development concentrated on translating noun
    phrases and structures where Chinese and English
    word order differed

16
Analysis
  • Analysis uses a unification-based chart parser to
    find the input sentence's grammatical structure.
  • Different possible analyses and transfer paths
    are all efficiently packed together in a packed
    forest for later use.

17
Transfer
  • Transfer rules manipulate the parse tree(s)
    created during analysis.
  • Constituents (such as noun and verb phrases) can
    be reordered, inserted, or deleted.
  • Words are translated using a transfer lexicon.
  • For sentences without a complete parse, transfer
    occurs on the longest sub-parses found during
    analysis.

18
Generation
  • During generation, the engine checks that the
    target language tree from transfer satisfies
    target language constraints (e.g. subject-verb
    agreement in English)
  • Finally, the target sentence is read from the
    leaves of the target tree and returned.

19
Rule Learning - Overview
  • Goal: Acquisition of Syntactic Transfer Rules
  • 1) Flat Seed Generation: produce rules from
    word-aligned sentence pairs, abstracted only to
    the POS level; no syntactic structure
  • 2) Add compositional structure to a seed rule by
    exploiting previously learned rules
  • 3) Seeded Version Space Learning: group seed rules
    by constituent sequences and alignments; seed
    rules form the s-boundary of the version space;
    generalize with validation

20
Flat Seed Generation
  • Create a seed rule that is specific to the
    sentence pair, but abstracted to the POS level.
    Use SL information (e.g. parses) and any TL
    information. E.g.:
  • The highly qualified applicant visited the
    company.
  • Der äußerst qualifizierte Bewerber besuchte die
    Firma.
  • ((1,1),(2,2),(3,3),(4,4),(5,5),(6,6))

S::S [det adv adj n v det n] -> [det adv adj n v det n]
(alignments: (x1::y1) (x2::y2) (x3::y3) (x4::y4) (x5::y5)
             (x6::y6) (x7::y7)
 constraints: ((x1 def) = +) ((x4 agr) = 3-sing)
              ((x5 tense) = past) ...
              ((y1 def) = +) ((y3 case) = nom) ((y4 agr) = 3sg))
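Generating the flat part of such a seed rule is mostly string assembly from the POS tags and the word alignment. A hedged sketch: the function name is invented, the emitted text only loosely imitates the rule notation, and the (learned) feature constraints are omitted:

```python
def flat_seed_rule(src_pos, tgt_pos, alignment):
    """Build a flat (non-compositional) seed rule skeleton from a
    POS-tagged, word-aligned sentence pair.

    src_pos, tgt_pos: lists of POS tags for the two sentences.
    alignment: 1-based (src_index, tgt_index) pairs.
    """
    lhs = " ".join(src_pos)
    rhs = " ".join(tgt_pos)
    aligns = " ".join(f"(x{i}::y{j})" for i, j in alignment)
    return f"S::S [{lhs}] -> [{rhs}]\n(alignments: {aligns})"
```

Applied to the English/German pair above (seven tags on each side, monotone alignment), this reproduces the constituent sequences and alignment list of the seed rule; the feature constraints would be filled in from source analysis and target-language information.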
21
Compositionality
  • If there is a previously learned rule that can
    account for part of the sentence, adjust the seed
    rule to reflect this compositional element.
  • Adjust constituent sequences, alignments, and
    constraints; add context constraints (from
    possible translations) and remove unnecessary ones.

Seed rule:
  S::S [det adv adj n v det n] -> [det adv adj n v det n]
  (alignments: (x1::y1) (x2::y2) (x3::y3) (x4::y4) (x5::y5)
               (x6::y6) (x7::y7)
   constraints: ((x1 def) = +) ((x4 agr) = 3-sing)
                ((x5 tense) = past) ...
                ((y1 def) = +) ((y4 agr) = 3sg))

Previously learned rule:
  NP::NP [det adv adj n] -> [det adv adj n]
  ((x1::y1) ... ((y4 agr) = (x4 agr)) ...)

Composed rule:
  S::S [NP v det n] -> [NP v det n]
  (alignments: (x1::y1) (x2::y2) (x3::y3) (x4::y4) (x5::y5)
               (x6::y6) (x7::y7)
   constraints: ((x5 tense) = past) ...
                ((y1 def) = +) ((y1 case) = nom) ((y1 agr) = 3sg))
22
Seeded Version Space Learning
  • (Diagram: version space of seed rules over constituent
    sequences such as [NP v det n] and [NP VP])
  • 1. Group seed rules into version spaces as above.
  • 2. Make use of the partial order of rules in a
    version space. The partial order is defined via
    the f-structures satisfying the constraints.
  • 3. Generalize in the space by repeated merging of
    rules:
    • Deletion of a constraint
    • Moving value constraints to agreement
      constraints, e.g.
      ((x1 num) = pl), ((x3 num) = pl) →
      ((x1 num) = (x3 num))
  • 4. Check the translation power of the generalized
    rules against the sentence pairs.
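The value-to-agreement merge in step 3 can be illustrated directly. A hedged sketch under assumptions: constraints are modeled as ((variable, feature), value) tuples purely for illustration, not in the AVENUE rule format:

```python
def merge_value_constraints(constraints):
    """Generalization operator: replace two value constraints with
    the same feature and value on different variables, e.g.
    ((x1 num) = pl) and ((x3 num) = pl), by a single agreement
    constraint ((x1 num) = (x3 num))."""
    merged, used = [], set()
    items = list(constraints)
    for a in range(len(items)):
        if a in used:
            continue
        (var_a, feat_a), val_a = items[a]
        for b in range(a + 1, len(items)):
            if b in used:
                continue
            (var_b, feat_b), val_b = items[b]
            if feat_a == feat_b and val_a == val_b:
                # same feature, same value on two variables -> agreement
                merged.append(((var_a, feat_a), (var_b, feat_b)))
                used.update({a, b})
                break
        else:
            # no partner found: keep the constraint as-is
            merged.append(items[a])
            used.add(a)
    return merged
```

The merged rule is strictly more general: it no longer requires plural number, only that the two constituents agree, which is exactly the kind of step whose translation power step 4 then validates.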


23
Future Work
  • Baseline evaluation
  • Adjust generalization step size
  • Revisit generalization operators
  • Introduce specialization operators to retract
    from overgeneralizations (including seed rules)
  • Learn from an unstructured bilingual corpus
  • Evaluate merges to pick the optimal one at any
    step, based on cross-validation or the number of
    sentences a rule can translate