Knowledge-based Machine Translation (KBMT) - PowerPoint PPT Presentation

1
Knowledge-based Machine Translation (KBMT)
  • 11-682/15-482
  • Introduction to IR, NLP, MT and Speech
  • November 16, 2004

2
Approaches to MT: The Vauquois MT Triangle
[Figure: the Vauquois triangle for "Mi chiamo Alon Lavie" → "My name is Alon Lavie". The Direct path maps the Italian sentence straight to the English one. The Transfer path maps the Italian parse tree (s (vp accusative_pronoun chiamare proper_name)) to the English parse tree (s (np possessive_pronoun name) (vp be proper_name)). The Interlingua path goes up through Analysis to the representation give-information+personal-data (name=alon_lavie) and down through Generation.]
3
KBMT Analysis and Generation
  • Analysis
  • Morphological analysis (word-level) and POS
    tagging
  • Syntactic analysis and disambiguation (produce a
    syntactic parse tree)
  • Semantic analysis and disambiguation (produce a
    logical-form representation)
  • Map to the language-independent Interlingua
  • Generation
  • Generate the semantic representation in the TL
  • Sentence Planning: generate the syntactic
    structure and lexical selections for concepts
  • Surface-form realization: generate the correct
    forms of words
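The analysis and generation stages above can be sketched end-to-end as a toy pipeline. Everything here is illustrative: the keyword-spotting "analysis" and the output templates stand in for real morphological, syntactic, and semantic components, and the interlingua frame follows the give-information+personal-data example from the Vauquois-triangle slide.

```python
# A minimal sketch of the KBMT pipeline, assuming a tiny invented domain.
# All stage functions are hypothetical placeholders; a real system would
# use a morphological analyzer, a parser, and a semantic mapper.

def analyze(sentence):
    """SL sentence -> interlingua frame (toy: keyword spotting)."""
    tokens = sentence.lower().rstrip(".").split()
    if "name" in tokens or "chiamo" in tokens:
        return {"act": "give-information+personal-data",
                "name": tokens[-2] + "_" + tokens[-1]}
    raise ValueError("out of (toy) domain")

def generate(interlingua, target):
    """Interlingua frame -> TL sentence (toy templates)."""
    first, last = interlingua["name"].split("_")
    templates = {"en": "My name is {0} {1}",
                 "it": "Mi chiamo {0} {1}"}
    return templates[target].format(first.capitalize(), last.capitalize())

frame = analyze("Mi chiamo Alon Lavie")
print(generate(frame, "en"))  # My name is Alon Lavie
```

The key property is that the frame is language-independent: the same `frame` generates into English or Italian without any Italian-to-English transfer rule.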

4
Transfer Approaches
  • Syntactic Transfer
  • Analyze the SL input sentence into its syntactic
    structure (parse tree)
  • Transfer the SL parse tree to a TL parse tree
    (various formalisms for specifying mappings)
  • Generate the TL sentence from the TL parse tree
  • Semantic Transfer
  • Analyze the SL input into a language-specific
    semantic representation (i.e., logical form)
  • Transfer the SL semantic representation to a TL
    semantic representation
  • Generate the syntactic structure and then the
    surface sentence in the TL

5
Transfer Approaches
  • Advantages and Disadvantages
  • Syntactic Transfer
  • No need for semantic analysis and generation
  • Syntactic structures are general, not
    domain-specific → less domain-dependent, can
    handle open domains
  • Requires a word translation lexicon
  • Semantic Transfer
  • Requires deeper analysis and generation, and a
    symbolic representation of concepts and
    predicates → difficult to construct for open or
    unlimited domains
  • Can better handle non-compositional meaning
    structures → can be more accurate
  • No word translation lexicon: generate in the TL
    directly from symbolic concepts

6
Interlingua KBMT
  • The obvious deep Artificial Intelligence
    approach
  • Analyze the source language into a detailed
    symbolic representation of its meaning
  • Generate this meaning in the target language
  • Interlingua: one single meaning representation
    for all languages
  • Nice in theory, but extremely difficult in
    practice

7
What is an interlingua?
  • Representation of meaning or speaker intention.
  • Sentences that are equivalent for the translation
    task have the same interlingua representation.
  • The room costs 100 Euros per night.
  • The room is 100 Euros per night.
  • The price of the room is 100 Euros per night.
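The equivalence idea can be made concrete with a toy mapper that sends all three paraphrases above to a single frame. The frame fields and the matching rule are invented for illustration; a real analyzer would do this via parsing and semantic mapping.

```python
# Toy illustration: translation-equivalent paraphrases share one
# interlingua frame. The frame structure here is an assumption, not
# the actual NESPOLE!/KANT representation.

FRAME = {"act": "give-information+price+room",
         "price": {"quantity": 100, "currency": "euro", "per": "night"}}

def to_interlingua(sentence):
    """Map a sentence to its frame (toy: substring matching)."""
    s = sentence.lower()
    if "room" in s and "100 euros per night" in s:
        return FRAME
    return None  # out of (toy) domain

paraphrases = ["The room costs 100 Euros per night.",
               "The room is 100 Euros per night.",
               "The price of the room is 100 Euros per night."]
assert all(to_interlingua(p) == FRAME for p in paraphrases)
```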

8
The Interlingua KBMT approach
  • With an interlingua, need only N parsers/
    generators instead of N² transfer systems

[Figure: six languages L1–L6 arranged around a central interlingua; each language connects to the interlingua through one analyzer and one generator, instead of connecting pairwise to every other language.]
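The component-count arithmetic behind this diagram, under the usual assumptions (one transfer system per ordered language pair, versus one analyzer plus one generator per language):

```python
# Direct/transfer MT needs one system per ordered language pair;
# an interlingua needs one analyzer and one generator per language.

def transfer_systems(n):
    """Ordered pairs, no self-translation: N * (N - 1)."""
    return n * (n - 1)

def interlingua_components(n):
    """N analyzers + N generators."""
    return 2 * n

for n in (3, 6, 10):
    print(n, transfer_systems(n), interlingua_components(n))
# For the 6 languages in the diagram: 30 transfer systems vs 12 components.
```

The gap grows quadratically, which is why adding a language to an interlingua system costs two grammars, not 2(N−1) transfer systems.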
9
Advantages of Interlingua
  • Add a new language easily
  • get all-ways translation to all previous
    languages by adding one grammar for analysis and
    one grammar for generation
  • Mono-lingual development teams.
  • Paraphrase
  • Generate a new source language sentence from the
    interlingua so that the user can confirm the
    meaning

10
Disadvantages of Interlingua
  • Meaning is arbitrarily deep.
  • What level of detail do you stop at?
  • If it is too simple, meaning will be lost in
    translation.
  • If it is too complex, analysis and generation
    will be too difficult.
  • Should be applicable to all languages.
  • Human development time.

11
KBMT: KANT, KANTOO, CATALYST
  • Deep knowledge-based framework, with a symbolic
    interlingua as the intermediate representation
  • Syntactic and semantic analysis into an
    unambiguous, detailed symbolic representation of
    meaning, using unification grammars and
    transformation mappers
  • Generation into the target language using
    unification grammars and transformation mappers
  • First large-scale multi-lingual interlingua-based
    MT system deployed commercially
  • CATALYST at Caterpillar: high-quality translation
    of documentation manuals for heavy equipment
  • English (source) to French, Spanish, German
    (target)
  • Limited domains and controlled English input
  • Minor amounts of post-editing

12
Interlingua-based Speech-to-Speech MT
  • Evolution from JANUS/C-STAR systems to NESPOLE!,
    LingWear, BABYLON
  • Early 1990s: first prototype system that fully
    performed speech-to-speech translation (very
    limited domain)
  • Interlingua-based, but with shallow task-oriented
    representations
  • "we have single and double rooms available" →
    give-information+availability
    (room-type=single, double)
  • Semantic Grammars for analysis and generation
  • Multiple languages: English, German, French,
    Italian, Japanese, Korean, and others
  • Most active work: portable speech translation on
    small devices, Arabic/English and Thai/English

13
Design Principles of the NESPOLE! Interchange
Format
  • Suitable for task-oriented dialogue
  • Based on the speaker's intent, not literal
    meaning
  • "Can you pass the salt?" is represented only as a
    request for the hearer to perform an action, not
    as a question about the hearer's ability
  • Abstract away from the peculiarities of any
    particular language
  • Resolve translation mismatches

14
Speech Acts: Speaker Intention vs. Literal Meaning
  • Can you pass the salt?
  • Literal meaning: the speaker asks for information
    about the hearer's ability.
  • Speaker intention: the speaker requests the
    hearer to perform an action.

15
Domain Actions: Extended, Domain-Specific Speech
Acts
  • give-information+existence+body-state
  • It hurts.
  • give-information+onset+body-object
  • The rash started three days ago.
  • request-information+availability+room
  • Are there any rooms available?
  • request-information+personal-data
  • What is your name?
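These labels have a regular shape: a speech act, followed by '+'-joined concepts, plus optional key=value arguments. A minimal parser for that shape (the real Interchange Format syntax is richer; this sketch is an approximation):

```python
# Parse a domain-action label into speech act, concept chain, and
# arguments. Assumes '+'-joined concepts and 'key=value' arguments in
# parentheses, a simplification of the NESPOLE! Interchange Format.

def parse_domain_action(da):
    if "(" in da:
        head, _, args = da.partition("(")
        args = args.rstrip(")")
        arguments = dict(a.split("=") for a in args.split(",") if a)
    else:
        head, arguments = da, {}
    speech_act, *concepts = head.strip().split("+")
    return {"speech-act": speech_act,
            "concepts": concepts,
            "args": arguments}

da = parse_domain_action("request-information+availability+room")
print(da["speech-act"], da["concepts"])
# request-information ['availability', 'room']
```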

16
Formulaic Utterances
  • Good night.
  • tisbaH cala xEr
  • (literal gloss: "waking up on good")
  • Romanization of Arabic from CallHome Egypt

17
Same intention, different syntax
  • rigly bitiwgacny
  • my leg hurts
  • candy wagac fE rigly
  • I have pain in my leg
  • rigly bitiClimny
  • my leg hurts
  • fE wagac fE rigly
  • there is pain in my leg
  • rigly bitinqaH calya
  • my leg bothers on me
  • Romanization of Arabic from CallHome Egypt.

18
Language Neutrality
  • Comes from representing speaker intention rather
    than literal meaning for formulaic and
    task-oriented sentences.
  • "How about ...?" → suggestion
  • "Why don't you ...?" → suggestion
  • "Could you tell me ...?" → request-info
  • "I was wondering ..." → request-info

19
AVENUE Transfer-based MT
  • A new approach for automatically acquiring
    syntactic MT transfer rules from small amounts of
    elicited, translated, and word-aligned data
  • Specifically designed to bootstrap MT for
    languages for which only limited amounts of
    electronic resources are available (particularly
    indigenous minority languages)
  • Use machine learning techniques to generalize
    transfer rules from specific translated examples
  • Combine with decoding techniques from SMT to
    produce the best translation of new input from a
    lattice of translation segments
  • Languages: Hebrew, Hindi, Mapudungun, Quechua
  • Most active work: designing a typologically
    comprehensive elicitation corpus, advanced
    techniques for automatic rule learning, improved
    decoding, and rule refinement via user interaction

20
Transfer Rule Formalism
SL: the old man  TL: ha-ish ha-zaqen

NP::NP : [DET ADJ N] -> [DET N DET ADJ]
(
  (X1::Y1) (X1::Y3)
  (X2::Y4) (X3::Y2)
  ((X1 AGR) = 3-SING)
  ((X1 DEF) = DEF)
  ((X3 AGR) = 3-SING)
  ((X3 COUNT) = +)
  ((Y1 DEF) = DEF)
  ((Y3 DEF) = DEF)
  ((Y2 AGR) = 3-SING)
  ((Y2 GENDER) = (Y4 GENDER))
)
  • Type information
  • Part-of-speech/constituent information
  • Alignments
  • x-side constraints
  • y-side constraints
  • xy-constraints,
    e.g. ((Y1 AGR) = (X1 AGR))
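For illustration, such a rule can be held as plain data. The field names below are my own choice, not part of the AVENUE formalism; the example values follow the English–Hebrew NP rule shown above.

```python
# One way to represent a transfer rule as a data structure: rule type,
# x-side (SL) and y-side (TL) constituent sequences, constituent
# alignments, and feature constraints.

from dataclasses import dataclass, field

@dataclass
class TransferRule:
    rule_type: tuple          # e.g. ("NP", "NP")
    x_side: list              # SL constituent sequence
    y_side: list              # TL constituent sequence
    alignments: list          # (x_index, y_index) pairs, 1-based
    constraints: list = field(default_factory=list)

np_rule = TransferRule(
    rule_type=("NP", "NP"),
    x_side=["DET", "ADJ", "N"],
    y_side=["DET", "N", "DET", "ADJ"],
    alignments=[(1, 1), (1, 3), (2, 4), (3, 2)],
    constraints=[(("X1", "AGR"), "3-SING"),            # value constraint
                 (("Y2", "GENDER"), ("Y4", "GENDER"))])  # agreement constraint

print(len(np_rule.alignments))  # 4
```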

21
Transfer Rule Formalism (II)
SL: the old man  TL: ha-ish ha-zaqen

NP::NP : [DET ADJ N] -> [DET N DET ADJ]
(
  (X1::Y1) (X1::Y3)
  (X2::Y4) (X3::Y2)
  ((X1 AGR) = 3-SING)
  ((X1 DEF) = DEF)
  ((X3 AGR) = 3-SING)
  ((X3 COUNT) = +)
  ((Y1 DEF) = DEF)
  ((Y3 DEF) = DEF)
  ((Y2 AGR) = 3-SING)
  ((Y2 GENDER) = (Y4 GENDER))
)
  • Value constraints
  • Agreement constraints

22
The Transfer Engine
23
Rule Learning - Overview
  • Goal: Acquire Syntactic Transfer Rules
  • Use available knowledge from the source side
    (grammatical structure)
  • Three steps:
  • Flat Seed Generation: first guesses at transfer
    rules, with flat syntactic structure
  • Compositionality: use previously learned rules to
    add hierarchical structure
  • Seeded Version Space Learning: refine rules by
    learning appropriate feature constraints

24
Transfer with Strong Decoding
25
Learning Transfer-Rules for Languages with
Limited Resources
  • Rationale
  • Large bilingual corpora not available
  • Bilingual native informant(s) can translate and
    align a small pre-designed elicitation corpus,
    using elicitation tool
  • Elicitation corpus designed to be typologically
    comprehensive and compositional
  • Transfer-rule engine and new learning approach
    support acquisition of generalized transfer-rules
    from the data

26
Why Machine Translation for Minority and
Indigenous Languages?
  • Commercial MT is economically feasible for only a
    handful of major languages with large resources
    (corpora, human developers)
  • Is there hope for MT for languages with limited
    resources?
  • Benefits include:
  • Better government access to indigenous
    communities (epidemics, crop failures, etc.)
  • Better participation by indigenous communities in
    information-rich activities (health care,
    education, government) without giving up their
    languages
  • Language preservation
  • Civilian and military applications (disaster
    relief)

27
English-Hindi Example
28
English-Chinese Example
29
Spanish-Mapudungun Example
30
English-Arabic Example
31
The Elicitation Corpus
  • Translated and aligned by a bilingual informant
  • Corpus consists of linguistically diverse
    constructions
  • Based on the elicitation and documentation work
    of field linguists (e.g. Comrie 1977, Bouquiaux
    1992)
  • Organized compositionally: elicit simple
    structures first, then use them as building
    blocks
  • Goal: minimize size, maximize linguistic coverage

32
Flat Seed Rule Generation
33
Flat Seed Generation
  • Create a transfer rule that is specific to the
    sentence pair, but abstracted to the POS level;
    no syntactic structure.
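A sketch of this step, assuming the POS tags and the word alignment are given as input (in the real system they come from the elicitation tool and analysis of the source side):

```python
# Flat seed generation: from a POS-tagged, word-aligned sentence pair,
# emit one flat rule at the POS level with no internal structure.
# The representation (a plain dict) is illustrative.

def flat_seed_rule(src_pos, tgt_pos, alignment):
    """alignment: list of (src_index, tgt_index) pairs, 1-based."""
    return {"type": ("S", "S"),
            "x_side": src_pos,          # flat constituent sequence
            "y_side": tgt_pos,          # flat constituent sequence
            "alignments": sorted(alignment),
            "constraints": []}          # filled in by later learning

# e.g. "the old man" -> "ha-ish ha-zaqen", abstracted to POS
rule = flat_seed_rule(["DET", "ADJ", "N"],
                      ["DET", "N", "DET", "ADJ"],
                      [(1, 1), (1, 3), (2, 4), (3, 2)])
print(rule["x_side"], "->", rule["y_side"])
```

Such a seed rule translates exactly the sentence pair it came from; the compositionality and version-space steps described next generalize it.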

34
Compositionality
35
Compositionality - Overview
  • Traverse the c-structure of the English sentence,
    add compositional structure for translatable
    chunks
  • Adjust constituent sequences, alignments
  • Remove unnecessary constraints, i.e. those that
    are contained in the lower-level rule

36
Seeded Version Space Learning
37
Seeded Version Space Learning: Overview
  • Goal: add appropriate feature constraints to the
    acquired rules
  • Methodology:
  • Preserve the general structural transfer
  • Learn specific feature constraints from the
    example set
  • Seed rules are grouped into clusters of similar
    transfer structure (type, constituent sequences,
    alignments)
  • Each cluster forms a version space: a partially
    ordered hypothesis space with a specific and a
    general boundary
  • The seed rules in a group form the specific
    boundary of the version space
  • The general boundary is the (implicit) transfer
    rule with the same type, constituent sequences,
    and alignments, but no feature constraints
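The clustering step can be sketched by keying each seed rule on its structural signature; rules sharing a signature land in one version space. The dict-based rule representation is the same illustrative one used for the seed rules above.

```python
# Group seed rules into version spaces by (type, constituent
# sequences, alignments). Rules in one group differ only in their
# feature constraints: the group is the specific boundary, and the
# same structure with no constraints is the general boundary.

from collections import defaultdict

def signature(rule):
    return (rule["type"],
            tuple(rule["x_side"]),
            tuple(rule["y_side"]),
            tuple(sorted(rule["alignments"])))

def group_into_version_spaces(seed_rules):
    spaces = defaultdict(list)
    for r in seed_rules:
        spaces[signature(r)].append(r)
    return spaces

r1 = {"type": ("NP", "NP"), "x_side": ["DET", "N"],
      "y_side": ["N", "DET"], "alignments": [(1, 2), (2, 1)],
      "constraints": [("x2 num", "pl")]}
r2 = dict(r1, constraints=[("x2 num", "sg")])  # same structure
spaces = group_into_version_spaces([r1, r2])
print(len(spaces))  # 1
```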

38
Seeded Version Space Learning: Generalization
  • The partial order of the version space:
  • Definition: a transfer rule tr1 is strictly more
    general than another transfer rule tr2 if all
    f-structures that are satisfied by tr2 are also
    satisfied by tr1.
  • Generalize rules by merging them:
  • Deletion of a constraint
  • Raising two value constraints to an agreement
    constraint, e.g.
    ((x1 num) = pl), ((x3 num) = pl) →
    ((x1 num) = (x3 num))
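The two generalization moves, on constraints represented as (path, value) pairs (a simplification of the full f-structure constraint language):

```python
# Generalization moves for version-space learning: drop a constraint,
# or raise two value constraints with the same value to a single
# agreement constraint. Paths are tuples like ("x1", "num").

def delete_constraint(constraints, path):
    """More general rule: the constraint on `path` is dropped."""
    return [c for c in constraints if c[0] != path]

def raise_to_agreement(constraints, path_a, path_b):
    """Replace two matching value constraints by one agreement
    constraint, represented as a (path, path) pair."""
    vals = dict(constraints)
    va, vb = vals.get(path_a), vals.get(path_b)
    if va is not None and va == vb:
        kept = [c for c in constraints if c[0] not in (path_a, path_b)]
        return kept + [(path_a, path_b)]
    return constraints  # not applicable: leave unchanged

cs = [(("x1", "num"), "pl"), (("x3", "num"), "pl")]
print(raise_to_agreement(cs, ("x1", "num"), ("x3", "num")))
# [(('x1', 'num'), ('x3', 'num'))]
```

Both moves only ever enlarge the set of f-structures a rule accepts, which is what makes them generalizations in the partial order defined above.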

39
Seeded Version Space Learning
  1. Group seed rules into version spaces as above.
  2. Make use of the partial order of rules in the
     version space; the partial order is defined via
     the f-structures satisfying the constraints.
  3. Generalize in the space by repeated merging of
     rules:
     - Deletion of a constraint
     - Moving value constraints to agreement
       constraints, e.g.
       ((x1 num) = pl), ((x3 num) = pl) →
       ((x1 num) = (x3 num))
  4. Check the translation power of the generalized
     rules against the sentence pairs.

40
Seeded Version Space Learning: The Search
  • The Seeded Version Space algorithm itself is the
    repeated generalization of rules by merging
  • A merge is successful if the set of sentences
    that can correctly be translated with the merged
    rule is a superset of the union of the sets that
    can be translated with the unmerged rules (i.e.,
    check the power of the rule)
  • Merge until no more successful merges remain
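The search can be sketched as a greedy loop, with `translatable()` standing in for actually running the transfer engine over the example set; both callbacks are assumptions of this sketch, not AVENUE APIs.

```python
# Greedy merge search: repeatedly merge rule pairs, accepting a merge
# only if the merged rule still translates every sentence the two
# unmerged rules could translate (the "check power of rule" test).

def greedy_merge(rules, translatable, merge):
    """rules: list of rules; translatable(rule) -> set of sentence ids
    the rule correctly translates; merge(r1, r2) -> merged rule or None."""
    changed = True
    while changed:
        changed = False
        for i in range(len(rules)):
            for j in range(i + 1, len(rules)):
                m = merge(rules[i], rules[j])
                if m is None:
                    continue  # rules not mergeable
                covered = translatable(rules[i]) | translatable(rules[j])
                if translatable(m) >= covered:  # merge is successful
                    rules = [r for k, r in enumerate(rules)
                             if k not in (i, j)] + [m]
                    changed = True
                    break
            if changed:
                break
    return rules

# Toy usage: rules *are* the sentence sets they translate.
merged = greedy_merge([frozenset({1}), frozenset({2})],
                      translatable=lambda r: set(r),
                      merge=lambda a, b: a | b)
print(merged)  # [frozenset({1, 2})]
```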

41
Seeded VSL: Some Open Issues
  • Three types of constraints:
  • X-side: constrain applicability of the rule
  • Y-side: assist in generation
  • X-Y: transfer features from SL to TL
  • Which of the three types improves translation
    performance?
  • Use rules without features to populate the
    lattice; the decoder will select the best
    translation
  • Learn only X-Y constraints, based on a list of
    universal projecting features
  • Other notions of version spaces of feature
    constraints
  • Current feature learning is specific to rules
    that have identical transfer components
  • An important issue during transfer is
    disambiguating among rules that have the same SL
    side but different TL sides: can we learn
    effective constraints for this?

42
Examples of Learned Rules (Hindi-to-English)
43
Manual Transfer Rules: Hindi Example

;; PASSIVE OF SIMPLE PAST (NO AUX) WITH LIGHT VERB
;; passive of 43 (7b)
{VP,28}
VP::VP : [V V V] -> [Aux V]
(
  (X1::Y2)
  ((x1 form) = root)
  ((x2 type) =c light)
  ((x2 form) = part)
  ((x2 aspect) = perf)
  ((x3 lexwx) = 'jAnA')
  ((x3 form) = part)
  ((x3 aspect) = perf)
  (x0 = x1)
  ((y1 lex) = be)
  ((y1 tense) = past)
  ((y1 agr num) = (x3 agr num))
  ((y1 agr pers) = (x3 agr pers))
  ((y2 form) = part)
)
44
Manual Transfer Rules: Example

[Figure: parse trees for Hindi "jIvana ke eka aXyAya" (N1 jIvana, P ke, Adj eka, N aXyAya) and English "one chapter of life", illustrating the NP/PP reordering.]

NP1 ke NP2 -> NP2 of NP1
Ex: jIvana ke eka aXyAya
    life   of (one) chapter
    -> a chapter of life

{NP,12}
NP::NP : [PP NP1] -> [NP1 PP]
(
  (X1::Y2) (X2::Y1)
  ((x2 lexwx) = 'kA')
)

{NP,13}
NP::NP : [NP1] -> [NP1]
(
  (X1::Y1)
)

{PP,12}
PP::PP : [NP Postp] -> [Prep NP]
(
  (X1::Y2) (X2::Y1)
)