CSA4050: Advanced Techniques in NLP

Machine Translation II, Jan 2005

Provided by: MikeR2


1
CSA4050: Advanced Techniques in NLP
  • Machine Translation II
      • Direct MT
      • Transfer MT
      • Interlingual MT

2
History: Pre-ALPAC
  • 1952: first MT conference (MIT)
  • 1954: Georgetown system (word-for-word based)
    successfully translated 49 Russian sentences
  • 1954-1965: much investment in a brute-force
    empirical approach: crude word-for-word
    techniques with limited reshuffling of the output
  • 1966: the ALPAC (Automatic Language Processing
    Advisory Committee) report concludes that research
    funds should be directed into more fundamental
    linguistic research

3
History: Post-ALPAC
  • 1965-1970
      • Operational systems approach: SYSTRAN
        (eventually became the basis for Babel Fish)
      • University centres established in Grenoble
        (CETA), Montreal and Saarbrücken
  • 1970-1990: systems developed on the basis of
    linguistic and non-linguistic representations
      • Ariane (Dependency Grammar)
      • TAUM METEO (Metamorphosis Grammars)
      • EUROTRA (multilingual intermediate
        representations)
      • ROSETTA (Landsbergen): interlingua-based
      • BSO (Witkam): Esperanto as interlingua
  • 1990 onwards: data-driven translation systems

4
MT Methods
  • MT
      • Direct MT
      • Rule-Based MT
          • Transfer
          • Interlingua
      • Data-Driven MT
          • Example-Based MT (EBMT)
          • Statistical MT (SMT)

5
Basic Architecture: Direct Translation
  • Basic idea:
      • language-pair specific
      • no intermediate representation: pipeline
        architecture
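A minimal sketch of the pipeline idea, assuming a toy English→French pair: stages (local adjective-noun reordering, then word-for-word dictionary lookup) are applied one after another to a flat word list, with no intermediate representation. The lexicon, the `POS` tags and all function names are hypothetical illustrations, not taken from the slides.

```python
from typing import Callable, List

# Hypothetical toy resources for one language pair (English -> French).
POS = {"the": "DET", "black": "ADJ", "cat": "NOUN", "sleeps": "VERB"}
EN_FR = {"the": "le", "black": "noir", "cat": "chat", "sleeps": "dort"}

Stage = Callable[[List[str]], List[str]]

def reorder_adj_noun(words: List[str]) -> List[str]:
    """Local reordering: English ADJ NOUN becomes French NOUN ADJ."""
    out, i = [], 0
    while i < len(words):
        if i + 1 < len(words) and POS.get(words[i]) == "ADJ" and POS.get(words[i + 1]) == "NOUN":
            out += [words[i + 1], words[i]]
            i += 2
        else:
            out.append(words[i])
            i += 1
    return out

def lookup(words: List[str]) -> List[str]:
    """Word-for-word dictionary lookup; unknown words pass through unchanged."""
    return [EN_FR.get(w, w) for w in words]

def translate(text: str, stages: List[Stage]) -> str:
    words = text.lower().split()  # crude tokenisation
    for stage in stages:          # each stage maps a word list to a word list
        words = stage(words)
    return " ".join(words)

print(translate("The black cat sleeps", [reorder_adj_noun, lookup]))  # le chat noir dort
```

The reordering stage runs on the English words before lookup; both the order of the stages and every rule inside them are specific to this one language pair.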

6
Staged Direct MT (En/Jp)
7
Direct Translation: Advantages
  • Exploits the fact that certain potential
    ambiguities can be left unresolved, e.g. wall:
    German Wand/Mauer, Italian parete/muro; between
    German and Italian the distinction carries over
    word for word.
  • Designers can concentrate more on special cases
    where languages differ.
  • Minimal resources necessary: a cheap bilingual
    dictionary and rudimentary knowledge of the target
    language suffice.
  • Translation memories are a (successful and much
    used) development of this approach.
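Translation memories reuse stored sentence pairs instead of translating from scratch. A minimal sketch, assuming a tiny in-memory store and using Python's `difflib.SequenceMatcher.ratio()` as a stand-in fuzzy-match score (real TM tools use their own scoring); the store contents, `tm_lookup` and its `threshold` parameter are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional, Tuple

# Hypothetical translation memory: English segments and their stored French translations.
TM = {
    "Press the red button.": "Appuyez sur le bouton rouge.",
    "Close the door.": "Fermez la porte.",
}

def tm_lookup(segment: str, threshold: float = 0.7) -> Optional[Tuple[str, str, float]]:
    """Return (stored source, stored target, similarity) for the closest entry,
    or None if even the closest entry scores below the threshold."""
    best = None
    for src, tgt in TM.items():
        score = SequenceMatcher(None, segment.lower(), src.lower()).ratio()
        if best is None or score > best[2]:
            best = (src, tgt, score)
    return best if best is not None and best[2] >= threshold else None

# A near match retrieves the "red button" pair for a translator to post-edit.
print(tm_lookup("Press the green button."))
```

An exact match scores 1.0 and can be reused as is; a fuzzy match is offered as a starting point, which is why a TM needs no analysis of the sentence at all.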

8
Direct Translation: Disadvantages
  • Computationally naive
      • basic model: word-for-word translation plus
        local reordering (e.g. to handle adjective-noun
        order)
  • Linguistically naive
      • no analysis of the internal structure of the
        input, especially with respect to the
        grammatical relationships between the main
        parts of sentences
      • no generalisation: everything is handled on a
        case-by-case basis
  • Generally, poor translation
      • except in simple cases where there is a high
        degree of isomorphism between source and target
        sentences
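To see why isomorphism matters, here is a minimal sketch of pure word-for-word translation from German to English, assuming a four-word hypothetical dictionary. German places the past participle at the end of the clause, so the output simply inherits the German word order.

```python
# Hypothetical German -> English dictionary with one equivalent per word.
DE_EN = {"ich": "I", "habe": "have", "ihn": "him", "gesehen": "seen"}

def word_for_word(sentence: str) -> str:
    """Replace each word in place: no analysis, no reordering."""
    return " ".join(DE_EN.get(w, w) for w in sentence.lower().split())

print(word_for_word("Ich habe ihn gesehen"))  # I have him seen
```

Correct English is "I have seen him"; producing it requires knowing that habe ... gesehen form one verb group, which is exactly the structural analysis a direct system does not perform.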

9
Transfer Model of MT
  • To overcome language differences, first build a
    more abstract representation of the input.
  • The translation process proper (called transfer)
    operates at the level of this representation.
  • This architecture assumes:
      • analysis via some kind of parsing process;
      • synthesis via some kind of generation process.

10
Basic Architecture: Transfer Model
[Figure: source text → analysis → source representation → transfer → target representation → generation → target text]
11
Transfer Rules
  • In general there are two kinds of transfer rule:
  • Structural transfer rules: these deal with
    differences in syntactic structure.
  • Lexical transfer rules: these deal with
    cross-lingual mappings at the level of words and
    fixed phrases.

12
Structural Transfer Rule
NP_s(Adj_s, Noun_s) → NP_t(Noun_t, Adj_t)   (s = source, t = target)
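A minimal sketch of the transfer architecture built around this rule, assuming a toy English→French noun phrase: analysis builds a small NP tree, the structural rule NP_s(Adj_s, Noun_s) → NP_t(Noun_t, Adj_t) reorders the children, lexical transfer swaps the words, and generation reads off the leaves. `Node`, the toy dictionaries and all function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: str                                   # category: "NP", "Det", "Adj", "Noun"
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                   # only leaves carry a word

# Hypothetical toy resources for English -> French.
EN_POS = {"a": "Det", "white": "Adj", "horse": "Noun"}
EN_FR = {"a": "un", "white": "blanc", "horse": "cheval"}

def analyse(text: str) -> Node:
    """Analysis: build a flat NP tree (assumes the input is a single NP)."""
    return Node("NP", [Node(EN_POS[w], word=w) for w in text.lower().split()])

def structural_transfer(np: Node) -> Node:
    """NP_s(Adj_s, Noun_s) -> NP_t(Noun_t, Adj_t): put the noun before its adjective."""
    kids, out, i = np.children, [], 0
    while i < len(kids):
        if i + 1 < len(kids) and kids[i].label == "Adj" and kids[i + 1].label == "Noun":
            out += [kids[i + 1], kids[i]]
            i += 2
        else:
            out.append(kids[i])
            i += 1
    return Node(np.label, out)

def lexical_transfer(node: Node) -> Node:
    """Replace each source word by its dictionary equivalent; structure is untouched."""
    if node.word is not None:
        return Node(node.label, word=EN_FR[node.word])
    return Node(node.label, [lexical_transfer(c) for c in node.children])

def generate(node: Node) -> str:
    """Generation: read the leaves off left to right."""
    if node.word is not None:
        return node.word
    return " ".join(generate(c) for c in node.children)

print(generate(lexical_transfer(structural_transfer(analyse("a white horse")))))  # un cheval blanc
```

Unlike the direct pipeline, the rule mentions categories rather than words, so it applies to any Adj-Noun pair the analyser recognises. A real rule would still need exceptions, since some French adjectives such as grand or petit normally precede the noun.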
13
existential-there-sentence: there was an old man gardening
  • delete initial there
  • make gardening modify NP
  • reverse order of NP/modifier
intermediate-representation-2: gardening an old man was
  • lexical transfer
japanese-s: niwa no teire o suru ojiisan ita
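The steps on this slide can be replayed with a minimal sketch, assuming a hypothetical analyser that only handles the pattern "there + verb + determiner + noun phrase + -ing modifier" and a four-entry English→Japanese lexicon built from the slide's own output. It is meant to make the sequence of transformations concrete, not to generalise.

```python
# Hypothetical lexicon taken from the slide's example; None marks words that are dropped.
LEX_EN_JA = {
    "gardening": "niwa no teire o suru",
    "old man": "ojiisan",
    "was": "ita",
    "an": None,  # Japanese has no articles
}

def analyse(sentence: str) -> dict:
    """Delete initial 'there' and make the -ing word a modifier of the NP."""
    tokens = sentence.split()
    if tokens[0] != "there":
        raise ValueError("only existential-there sentences are handled")
    verb, det, *head, modifier = tokens[1:]
    return {"verb": verb, "np": {"det": det, "head": " ".join(head), "mod": modifier}}

def structural_transfer(rep: dict) -> list:
    """Reverse NP/modifier order and leave the verb last (intermediate-representation-2)."""
    np = rep["np"]
    return [np["mod"], np["det"], np["head"], rep["verb"]]

def lexical_transfer(units: list) -> str:
    """Look up each unit and drop those with no Japanese counterpart."""
    out = [LEX_EN_JA[u] for u in units]
    return " ".join(w for w in out if w is not None)

inter = structural_transfer(analyse("there was an old man gardening"))
print(" ".join(inter))          # gardening an old man was
print(lexical_transfer(inter))  # niwa no teire o suru ojiisan ita
```

Note that "old man" is treated as one lexical unit, since it maps to the single Japanese word ojiisan.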
14
More Structural Transfer Rules
15
Lexical Transfer
  • Easy cases are based on bilingual dictionary
    lookup.
  • Resolution of ambiguities may require further
    knowledge:
      • know → savoir
      • know → connaître
  • Not necessarily word for word:
      • Schimmel → white horse
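Both cases can be sketched in a few lines, assuming hypothetical entries and a deliberately crude context rule: know followed by a clause or wh-word gives savoir, otherwise connaître (French given as infinitives), and Schimmel is a one-to-many entry. The rule and all names are illustrative assumptions, not the slides' method.

```python
# Hypothetical context cue for the ambiguous English entry "know".
CLAUSE_STARTERS = {"that", "how", "what", "where", "when", "why", "whether", "if"}

def translate_know(next_word: str) -> str:
    """Crude heuristic: 'know' + clause or wh-word -> savoir; 'know' + noun phrase -> connaître."""
    return "savoir" if next_word.lower() in CLAUSE_STARTERS else "connaître"

# Hypothetical German -> English entry: one source word, two target words.
DE_EN = {"Schimmel": ["white", "horse"]}

def lookup_de_en(word: str) -> list:
    return DE_EN.get(word, [word])  # unknown words pass through unchanged

print(translate_know("that"))              # savoir
print(translate_know("Paris"))             # connaître
print(" ".join(lookup_de_en("Schimmel")))  # white horse
```

The next-word test only stands in for the "further knowledge" the slide mentions; a real system would need at least a parse of the object of know.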

16
Transfer Model
  • The degree of generalisation depends upon the
    depth of the representation.
  • The deeper the representation, the harder it is
    to do analysis or generation.
  • The shallower the representation, the larger the
    transfer component.
  • Where does ambiguity get resolved?
  • The number of bilingual components can get large.

17
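Interlingual Translation: The Vauquois Triangle
[Figure: the Vauquois triangle. Analysis rises from the source text and generation descends to the target text; depth of representation increases towards the interlingua at the apex.]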
18
Interlingual Translation
  • The transfer model requires different transfer
    rules for each language pair.
  • This means much work for a multilingual system.
  • The interlingual approach eliminates transfer
    altogether by creating a language-independent
    canonical form known as an interlingua.
  • Various logic-based schemes have been used to
    represent such forms.
  • Other approaches use attribute/value matrices
    called feature structures.
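The cost difference can be made concrete by counting components for n languages, assuming one analysis and one generation module per language and one transfer module per ordered language pair (each translation direction counted separately):

```latex
\underbrace{n}_{\text{analysis}} + \underbrace{n}_{\text{generation}} + \underbrace{n(n-1)}_{\text{transfer}} = n^2 + n
\quad\text{vs.}\quad
\underbrace{n}_{\text{analysis}} + \underbrace{n}_{\text{generation}} = 2n
\qquad (n = 9:\ 90 \text{ vs. } 18)
```

The quadratic transfer term is the N² problem that the interlingual approach avoids.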

19
Possible Feature Structure for "There was an old man gardening"
  event          gardening
  agent          [ type           man
                   number         sg
                   definiteness   indef ]
  aspect         progressive
  tense          past
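A feature structure maps naturally onto nested dictionaries. A minimal sketch, assuming the attribute names above and a hypothetical toy English generator that covers only singular, progressive structures, shows how generation reads the interlingua rather than the source sentence. The attributes as recovered here do not record old or the existential there, so the output is a paraphrase.

```python
# The feature structure above as nested dicts (attribute -> atomic value or sub-structure).
FS = {
    "event": "gardening",
    "agent": {"type": "man", "number": "sg", "definiteness": "indef"},
    "aspect": "progressive",
    "tense": "past",
}

def get_path(fs: dict, path: list):
    """Follow a path of attributes, e.g. ["agent", "number"]."""
    for attr in path:
        fs = fs[attr]
    return fs

def generate_en(fs: dict) -> str:
    """Hypothetical toy English generator for this one shape of structure."""
    agent = fs["agent"]
    if agent["number"] != "sg" or fs["aspect"] != "progressive":
        raise NotImplementedError("sketch covers singular, progressive only")
    det = {"indef": "a", "def": "the"}[agent["definiteness"]]
    be = {"past": "was", "present": "is"}[fs["tense"]]
    # The structure already stores the event in its -ing form.
    return f"{det} {agent['type']} {be} {fs['event']}"

print(get_path(FS, ["agent", "number"]))  # sg
print(generate_en(FS))                    # a man was gardening
```

A Japanese generator would read the same structure; only the generation side differs per language.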
20
Ontological Issues
  • The designer of an interlingua has a very
    difficult task.
  • What is the appropriate inventory of attributes
    and values?
  • Clearly, the choice has radical effects on the
    ability of the system to translate faithfully.
  • For instance, to handle the muro/parete
    distinction, the internal/external characteristic
    of the wall would have to be encoded.

21
Feature Structure for muro
  word           muro
  syntax         [ POS class      noun
                   type           count ]
  field          buildings
  semantics      [ type           structural
                   position       external ]
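To see how such entries resolve the muro/parete distinction at generation time, here is a minimal sketch assuming two hypothetical Italian lexicon entries modelled on the muro structure, with parete differing only in its position value. The generator picks the entry whose semantics agree with every attribute of the interlingual concept.

```python
# Hypothetical Italian entries modelled on the feature structure for "muro".
ITALIAN_WALLS = [
    {"word": "muro", "semantics": {"type": "structural", "position": "external"}},
    {"word": "parete", "semantics": {"type": "structural", "position": "internal"}},
]

def select_word(concept: dict) -> str:
    """Return the first entry whose semantics match every attribute of the concept."""
    for entry in ITALIAN_WALLS:
        sem = entry["semantics"]
        if all(sem.get(k) == v for k, v in concept.items()):
            return entry["word"]
    raise LookupError(f"no Italian word for {concept}")

print(select_word({"type": "structural", "position": "external"}))  # muro
print(select_word({"type": "structural", "position": "internal"}))  # parete
```

If analysis of English wall leaves position unset, both entries match and this sketch silently returns the first; that is why the interlingua forces analysis to encode the internal/external characteristic.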
22
Interlingual Approach: Pros and Cons
  • Pros
      • Portable (avoids the N² problem)
      • Because the representation is normalised,
        structural transformations are simpler to
        state.
      • Explanatory adequacy
  • Cons
      • Difficult to deal with terms on a primitive
        level: universals?
      • Must decompose and reassemble concepts
      • Useful information lost (paraphrase)
      • In practice, works best in small domains.