Dependency Trees and Machine Translation

1
Dependency Trees and Machine Translation
  • Vamshi Ambati
  • Vamshi@cs.cmu.edu
  • Spring 2008 Adv MT Seminar
  • 02 April 2008

2
Today
  • Introduction
  • Dependency formalism
  • Syntax in Machine Translation
  • Dependency Tree based Machine Translation
  • By projection
  • By synchronous modeling
  • Conclusion and Future

3
Today
  • Introduction
  • Dependency formalism
  • Syntax in Machine Translation
  • Dependency Tree based Machine Translation
  • By projection
  • By synchronous modeling
  • Conclusion and Future

4
Dependency Trees
Phrase Structure Trees
John gave Mary an apple
5
Dependency Trees
Phrase Structure Trees: Labels
[Tree figure: S, VP and NP nodes over John/N gave/V Mary/N an/DT apple/N]
6
Dependency Trees
Head Percolation
  • Usually done deterministically
  • Assuming one head per phrase
[Tree figure: the heads 'gave' (for S and VP) and 'apple' (for NP) percolate up, over 'John gave Mary an apple']
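The deterministic head-percolation step can be sketched as follows; the head table and tree encoding are illustrative assumptions, not from the slides:

```python
# Toy head-percolation sketch. HEAD_RULES is a hypothetical, minimal
# head table; real systems use much richer per-category rules.
HEAD_RULES = {"S": "VP", "VP": "V", "NP": "N"}

def to_dependencies(node, deps=None):
    """Return (lexical_head, deps) for a phrase-structure node.

    Leaves are (pos, word) pairs; internal nodes are (label, children).
    deps accumulates (head_word, dependent_word) arcs.
    """
    if deps is None:
        deps = []
    label, rest = node
    if isinstance(rest, str):                 # leaf: (pos, word)
        return rest, deps
    heads = [to_dependencies(child, deps)[0] for child in rest]
    # one head per phrase, chosen deterministically by category
    head_idx = next((i for i, c in enumerate(rest)
                     if c[0] == HEAD_RULES.get(label)), 0)
    for i, h in enumerate(heads):
        if i != head_idx:
            deps.append((heads[head_idx], h))
    return heads[head_idx], deps

tree = ("S", [("NP", [("N", "John")]),
              ("VP", [("V", "gave"),
                      ("NP", [("N", "Mary")]),
                      ("NP", [("DT", "an"), ("N", "apple")])])])
head, deps = to_dependencies(tree)   # head: 'gave'
```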
7
Dependency Trees
[Dependency tree: 'gave' heads 'John', 'Mary' and 'apple'; 'apple' heads 'an']
8
Dependency Trees
[The same dependency tree, drawn over the words in sentence order: John gave Mary an apple]
9
Dependency Trees Basics
[Figure: an arc labelled SUBJ (labels optional) between 'John' and 'gave']
  • Child / Dependent / Modifier
  • Parent / Governor / Head / Modified
  • The direction of arrows can be head-to-child or
    child-to-head (has to be mentioned)

10
Dependency Trees Basics
  • Properties
  • Every word has a single head/parent
  • Except for the root
  • Completely connected tree
  • Acyclic
  • If wi → wj, then never wj → wi
  • Variants
  • Projective: no crossing between dependencies
  • If wi → wj, then for all k between i and j,
    either wk → wi or wk → wj holds
  • Non-projective: allows crossings between
    dependencies
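Projectivity can be checked mechanically: no two dependency arcs may cross. A minimal sketch, assuming heads are given as a list of parent indices (root = -1):

```python
def is_projective(heads):
    """Check projectivity of a dependency tree.

    heads[i] is the index of word i's head; the root has head -1.
    A tree is projective iff no two dependency arcs cross.
    """
    arcs = [(min(i, h), max(i, h)) for i, h in enumerate(heads) if h >= 0]
    for (a, b) in arcs:
        for (c, d) in arcs:
            # Arcs cross iff exactly one endpoint of one arc lies
            # strictly inside the other arc's span.
            if a < c < b < d:
                return False
    return True
```

For example, `[1, -1, 3, 1, 3]` (a small projective tree rooted at word 1) passes, while `[2, 3, -1, 2]` contains the crossing arcs (0,2) and (1,3) and fails.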

11
Projective dependency tree
[Figure: a projective dependency tree; the example sentence includes 'ounces']
Projectiveness: all the words in between ultimately depend on either 'was' or '.'
Example credit Yuji Matsumoto, NAIST, Japan
12
Non-projective dependency tree
Direction of edges: from a parent to the children.
Note: phrases thus extracted, which are united by
dependencies, could be discontinuous.
Example from R. McDonald and F. Pereira, EACL 2006.
13
Dependency Grammar (DG) in the Grammar Formalism
Timeline
  • Panini (2600 years ago, India) recognised,
    distinguished and classified semantic, syntactic
    and morphological dependencies (Bharati, Natural
    Language Processing)
  • The Arabic grammarians (1200 years ago, Iraq)
    recognised government and syntactic dependency
    structure, (The Foundations of Grammar - Owens)
  • The Latin grammarians (800 years ago) recognised
    'determination' and dependency structures -
    Percival, "Reflections on the History of
    Dependency Notions"
  • Lucien Tesniere (1930s, France) developed a
    relatively formal and sophisticated theory of DG
    grammar for use in schools
  • PSG, CCG etc were around the same time in early
    20th century

Source: ESSLLI 2000 Tutorial on Dependency
Grammars
14
Dependency Trees: some phenomena
  • DG has been widely accepted as a variant of PSG,
    but it is not strongly equivalent
  • Constituents are implicit in a DepTree and can be
    derived
  • Relations are explicit and can be labelled
    although optional
  • No explicit non-terminal nodes, which means no
    unary productions too
  • Can handle discontinuous phrases too
  • Known problems with Coordination and Gerunds

15
Phrase structure vs Dependency
  • Phrase structure suitable to languages with
  • rather fixed word order patterns
  • clear constituency structures
  • English etc
  • Dependency structure suitable to languages with
  • greater freedom of word order
  • order is controlled more by pragmatic than by
    syntactic factors
  • Slavonic (Czech, Polish) and some Romance
    (Italian, Spanish, etc.)

16
Today
  • Introduction
  • Dependency formalism
  • Syntax in Machine Translation
  • Dependency Tree based Machine Translation
  • By projection
  • By synchronous modeling
  • Conclusion and Future

17
Phrasal SMT discussion
  • Advantages
  • Do not have to compose translations unnecessarily
  • Local re-ordering captured in phrases
  • Already specific to the domain and capture
    context locally
  • Disadvantages
  • Specificity and no generalization
  • Discontiguous phrases not considered
  • Global reordering
  • Estimation problems (long vs short phrases)
  • Cannot model phenomena across phrases
  • Limitations
  • Phrase sizes (how much before I run into out of
    memory?)
  • Corpus availability makes it feasible only for
    certain language pairs

18
Syntax in MT Many Representations
  • Word-level MT: no syntax
  • SMT: phrases / contiguous sequences
  • Hierarchical SMT: pseudo-syntax
  • Syntax-based SMT: constituent
  • Syntax-based SMT: CCG
  • Syntax-based SMT: LFG
  • Syntax-based SMT: dependency

19
Syntax in MT Many ways of incorporation
  • Pre-processing
  • Reordering input
  • Reordered training corpus
  • Translation models
  • Syntactically informed alignment models
  • Better distortion models
  • Language Models
  • Syntactic language models
  • Syntax motivated models
  • Post-processing
  • Nbest list reranking with syntactic information
  • Translation correction: case marker / TAM
    correction
  • True casing etc?
  • Multi combinations with Syntactic backbones?

20
Syntax based SMT discussion
  • Inversion Transduction Grammar (Wu 96)
  • Very constrained form of syntax: one
    non-terminal
  • Some expressive limitations
  • Not linguistically motivated
  • Effectively learns preferences for flip/no-flip
  • Generative Tree-to-String (Yamada & Knight 2001)
  • Expressiveness (last week presentation)
  • No discontiguous phrases
  • Multitext grammars (Melamed 2003)
  • Formalized, but MT work yet to be realized
  • Hierarchical MT (Chiang 2005)
  • Linguistic generalizations
  • Handles discontiguous phrases recursively
  • Estimation problems, and the phrase table grows
    even larger
  • Across phrase boundary modeling

21
Syntax in MT and Dependency Trees
  • Source side tree is provided
  • Target side is obtained by projection
  • Problem of Isomorphism between trees
  • head-switching
  • empty-dep / extra-dep

[Diagram: Tree and String - source-side syntax tree over Se; the target Sf is a string]
[Diagram: Tree and Tree - syntax trees on both the source and target sides]
Source side tree is provided; target side tree is
provided. Ideally non-isomorphic trees should be
modeled too.
22
Today
  • Introduction
  • Dependency formalism
  • Syntax in Machine Translation
  • Dependency Tree based Machine Translation
  • By projection
  • By synchronous modeling
  • Conclusion and Future

23
Dependency Tree based Machine Translation
  • By projection
  • Fox 2002
  • Dekang Lin 2004
  • Quirk et al 2004, Quirk et al 2006, Menezes et al
    2007
  • By synchronous modeling
  • Alshawi et al 2001
  • Jason Eisner 2003
  • Fox 2005
  • Ding and Marcu 2005

24
Phrasal Cohesion and Statistical Machine
Translation (Heidi Fox, EMNLP 2002)
  • English-French Corpus was used
  • En-Fr are similar
  • For phrase structure trees -
  • Head Crossings involve head constituent of the
    phrase with its modifier spans
  • Modifier Crossings involve only spans of modifier
    constituents
  • For dependency trees
  • Head crossings: the span of a child crosses the
    span of its parent
  • Modifier crossings: same as above
  • Dependency structures show cohesive nature across
    translation

25
A Path-based Transfer Model (Dekang Lin, 2004)
  • Input
  • Word-aligned
  • Source parsed
  • Syntax translation model
  • Set of paths in source tree
  • Extract connected target path
  • Generalization of paths to POS
  • Modeling
  • Relative likelihood
  • Smoothing factor for noise
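The basic transfer units are paths in the source dependency tree. One way to picture path extraction is below: a path runs from one word up to the lowest common ancestor and down to the other word. This is a simplified sketch (indices as paths, no labels or generalization), not Lin's actual implementation:

```python
def tree_paths(heads):
    """Enumerate dependency-tree paths between every pair of words.

    heads[i] is the head index of word i (root: -1). A path is the
    node sequence from one word up to the lowest common ancestor
    and down to the other word.
    """
    def to_root(i):
        chain = [i]
        while heads[chain[-1]] >= 0:
            chain.append(heads[chain[-1]])
        return chain

    paths = []
    n = len(heads)
    for i in range(n):
        for j in range(i + 1, n):
            up_i, up_j = to_root(i), to_root(j)
            common = next(k for k in up_i if k in up_j)  # lowest common ancestor
            paths.append(up_i[:up_i.index(common) + 1] +
                         list(reversed(up_j[:up_j.index(common)])))
    return paths
```

For the three-word tree `heads = [1, -1, 1]` this yields the paths [0,1], [0,1,2] and [1,2].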

26
A Path-based Transfer Model (Dekang Lin, 2004)
  • Decoding
  • Parse input and extract all paths, extract target
    paths
  • Find a set of transfer rules
  • Cover the entire source tree
  • Can be consistently merged
  • Lexicalized rule preferred
  • Future work?
  • Word ordering is addressed
  • Transfer rules from the same sentence follow the
    order in the sentence
  • Only one example of a path: follow the order in the rule
  • Many examples: pick the relative distance from the head
  • Highest probability
  • Dynamic Programming
  • Min-set cover problem applied to trees

27
A Path-based Transfer Model (Dekang Lin, 2004)
  • Evaluation
  • English-French 1.2M
  • Source parsed by Minipar
  • 1755 test set
  • 5 to 15 words long sentences
  • Compared to Koehn's results from the 2003 paper
  • No Language Model or extra generation module
  • Order defined by paths is linear
  • Some heuristics to maintain linearity
  • Generalization of paths (transfer rules)
    quadratic vs. exponential
  • Direct Correspondence Approach (DCA) is violated
    when translation divergences exist
  • Very Naïve notion of reordering and merge
    conflict resolution

28
Dependency Treelet Translation (Quirk et al., ACL
2004/05/06)
  • Project dependencies from source to target via
    word alignment
  • One-one: project the dependency to the aligned words
  • Many-one: nothing to do, as the projected word is the
    head
  • One-many: project to the right-most, and the rest are
    attached to it
  • Reattachment of modifiers to lowest possible node
    that preserves target word order
  • Treelet extraction
  • All subtrees on source until a particular limit,
    and the corresponding target fragment which is
    connected
  • MLE for scoring
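The one-one and one-many projection heuristics above can be sketched as follows. This is a simplified sketch: the many-one case and the reattachment step are omitted, and all names are illustrative:

```python
from collections import defaultdict

def project_dependencies(src_heads, alignment, tgt_len):
    """Project source dependencies onto the target via word alignment.

    src_heads[i]: head index of source word i (root: -1).
    alignment: iterable of (src, tgt) index pairs.
    Returns tgt_heads, with -1 for the root and unaligned words.
    """
    links = defaultdict(list)
    for s, t in alignment:
        links[s].append(t)
    # one-many: the right-most aligned target word represents the source word
    rep = {s: max(ts) for s, ts in links.items()}
    tgt_heads = [-1] * tgt_len
    for s, ts in links.items():
        for t in ts:
            if t != rep[s]:
                tgt_heads[t] = rep[s]   # remaining target words attach to it
    for s in links:
        h = src_heads[s]
        if h >= 0 and h in rep:
            tgt_heads[rep[s]] = rep[h]  # one-one: copy the source arc
    return tgt_heads
```

For a two-word source with heads `[1, -1]` and alignment `[(0,0), (1,1), (1,2)]`, target word 1 attaches to word 2 (the right-most projection) and word 0 attaches to word 2 via the copied source arc.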

29
Dependency Treelet Translation (Quirk et al., ACL
2004/05/06)
[Figure: treelet pair for 'tired men and dogs' ↔ 'hommes et chiens fatigués', illustrating treelets with missing roots]
30
Dependency Treelet Translation (Quirk et al., ACL
2004/05/06)
  • Translation Model
  • Trained from the aligned projected corpus
  • Log-linear with feature functions
  • Channel Model
  • Treelet Prob
  • Lexical Prob
  • Order Model
  • Head relative
  • Swap model
  • Target Model
  • Target language model
  • Bigram Agreement model (opt)
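The log-linear combination of these models amounts to a weighted sum of log feature values. A minimal sketch; the feature names, probabilities and weights below are purely illustrative:

```python
import math

def loglinear_score(features, weights):
    """Log-linear model: score = sum_k w_k * log f_k(s, t)."""
    return sum(weights[k] * math.log(p) for k, p in features.items())

# Hypothetical feature values for one candidate translation.
candidate = {"channel": 0.01, "order": 0.2, "target_lm": 0.05}
weights = {"channel": 1.0, "order": 0.5, "target_lm": 0.8}
score = loglinear_score(candidate, weights)   # higher is better
```

In practice the weights would be tuned, e.g. toward BLEU, as the evaluation slide below mentions.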

31
Dependency Treelet Translation (Quirk et al., ACL
2004/05/06)
  • Decoding (Step by step)
  • Input is a dependency analyzed source
  • Challenge is that left-right may not work when
    starting with a Tree
  • Obtain best target tree combining the models
  • Exhaustive search using DP
  • Translate bottom up, from a given subtree (ITG)
  • For each head node extract all matching treelets
    x_i
  • For each uncovered subtrees extract all matching
    treelets y_i
  • Try all insertions of y_i into slots in x_i
  • Ordering model ranks all the re-ordering
    possibilities for the modifiers

32
Dependency Treelet Translation (Quirk et al., ACL
2004/05/06)
  • Decoding Optimizations
  • Duplicate translation check: reuse
  • Nbest list (only maintain top best candidates)
  • Early pruning before reordering (channel model)
  • Greedy reordering (pick best one and move on)
  • Variable n-best size (dynamically reduce n with
    increasing uncovered subtrees)
  • Deterministic pruning of treelets based on MLE
    (allowing the decoder to try more reorderings)
  • A* decoding
  • Estimate the cost of an uncovered node reordering
    instead of computing it exactly
  • Heuristics for optimistic estimates for each of
    the models

33
Dependency Treelet Translation (Quirk et al., ACL
2004/05/06)
  • Evaluation
  • Eng-French
  • 1.5M parallel Microsoft technical documentation
  • NLPWIN parsed on Eng side
  • GIZA trained
  • Target LM French side of parallel data
  • Tuned on 250 sentences for MaxBLEU
  • Tested on 10K unseen
  • 1 Reference

34
Improvements to Treelet Translation
  • Dependency Order Templates (ACL 2007)
  • Improve Generality in Translation
  • Learn un-lexicalised order templates
  • Only use at runtime for restricting search space
    in reordering
  • Minimal Translation Units (HLT NAACL 2005)
  • Bilingual n-gram channel model (Banchs et al.
    2005)
  • M = <m1, m2>
  • m1 = <si, tj>
  • Instead of conditioning on the surface-adjacent
    MTU, they condition on the head-word chain

35
Dependency Tree based Machine Translation
  • By projection
  • Fox 2002
  • Dekang Lin 2004
  • Quirk et al 2004, Quirk et al 2006, Menezes et al
    2007
  • By synchronous modeling
  • Alshawi et al 2001
  • Jason Eisner 2003
  • Ding and Marcu 2005
  • Fox 2005

36
Learning Dependency Translation Models as
Collections of Finite-State Head Transducers
(Alshawi et al., 2001)
  • Head transducers variant
  • Middle-out string transduction vs. left-right
  • Can be used in a hierarchical fashion, if you
    consider input/output for non-head transitions as
    strings rather than words
  • Dependency transduction model

May not always be a dependency model in the
conventional sense (transitions may have empty
input/output)
37
Learning Dependency Translation Models as
Collections of Finite-State Head Transducers
(Alshawi et al., 2001)
  • Training: given unaligned bitext
  • Compute co-occurrence statistics at the word level
  • Find a hierarchical synchronous alignment driven
    by cost function
  • Construct a set of head transducers that explain
    the alignment
  • Calculate the transition weights by MLE
  • Decoding
  • Similar to CKY or Chart Parsing, but middle-out
  • Given input, find the best applications of
    transducers
  • A derivation spanning the entire input means it
    has probably found the best dependencies for
    source and target
  • Else string together the most probable partial
    hypotheses to form a tree
  • Pick the target tree with the lowest cost and read
    off the string

38
Learning Dependency Translation Models as
Collections of Finite-State Head Transducers
(Alshawi et al., 2001)
  • Evaluation
  • Eng-Spanish (ATIS data: 13,966 train, 1,185
    test)
  • Eng-Japanese (transcribed speech data: 12,226
    train, 3,253 test)
  • Discussion
  • Language agnostic, direction agnostic
  • Induced dependency tree may not be syntactically
    motivated, but suited to translation
  • Application of transducers is done locally, and
    so less context information
  • A single transducer tries to do everything,
    training may have sparsity problems

39
Learning Non-isomorphic Tree Mappings for MT
(Jason Eisner, 2003)
  • Non-Isomorphism not just due to language
    divergences but free translation
  • A version of Tree Substitution Grammar
  • To learn from unaligned non-isomorphic trees
  • Generalization is statistical rather than based
    on linguistic minimalism
  • Expressive with empty string insertions
  • Formulate for both PSG and DG
  • Translation model
  • Joint model P(Ts, Tt, A)
  • Alignment
  • Decoding
  • Training
  • Factorization helps
  • Reconstruct all derivations for a tree by
    efficient tree parsing algorithm for TSG
  • EM as an efficient inside-outside training on all
    derivations
  • Decoding
  • Chart Parsing to create a forest of derivations
    for input tree
  • Maximize over probability of derivations
  • The 1-best derivation/parse is the syntactic alignment

Examples: 1. 'Kids kiss Sam quite often'  2. 'Lots of
kids give kisses to Sam'
40
Machine Translation Using Probabilistic
Synchronous Dependency Insertion Grammars
(Ding and Marcu, 2005)
  • SDIG
  • Like STAG, STIG for phrase structures
  • Basic units are elementary trees
  • Handles non-isomorphism at sub-tree level
  • Cross-lingual inconsistencies are handled if they
    appear within basic units
  • Crossing-dependency
  • Broken-dependency

41
Machine Translation Using Probabilistic
Synchronous Dependency Insertion Grammars
(Ding and Marcu, 2005)
  • Induction of SDIG for MT as Synchronous
    hierarchical tree partitioning
  • Train IBM Model 1 scores for the bitext
  • For each category of node, starting with NP:
  • Perform synchronous tree partitioning operations
  • Compute Prob of word pair (ei,fi) where operation
    can be performed
  • Heuristic functions (Graphical model) guide the
    partitioning

42
Machine Translation Using Probabilistic
Synchronous Dependency Insertion Grammars
(Ding and Marcu, 2005)
  • Translation
  • Decoding for MT
  • Translation is obtained by
  • maximizing over all possible derivations of the
    source tree
  • translation of the elementary trees
  • Analogous to HMM (Emission and Transition probs
    with elementary trees)
  • Decoding is similar to a Viterbi-style algorithm
    on the tree
  • Hooks
  • Augmenting corpus by singleton ETs from Model1
  • Smoothing probabilities
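The HMM analogy above can be sketched as a Viterbi-style maximization over the tree, where `emit` and `trans` are illustrative stand-ins for the elementary-tree translation and attachment probabilities (both hypothetical, not the SDIG parameterization itself):

```python
def viterbi_tree(node, emit, trans):
    """Viterbi-style decoding over a tree (sketch of the HMM analogy).

    node = (states, children): states are candidate target elementary
    trees for this source node; children are subtree nodes.
    emit(s) and trans(parent_s, child_s) stand in for the model's
    emission and transition probabilities.
    Returns the best achievable probability for each candidate state.
    """
    states, children = node
    child_tables = [viterbi_tree(c, emit, trans) for c in children]
    best = {}
    for s in states:
        p = emit(s)
        for table in child_tables:
            # independently pick the best child state under this parent
            p *= max(table[cs] * trans(s, cs) for cs in table)
        best[s] = p
    return best

# Toy run: a root with candidates "a"/"b" and one child with candidate "x".
emit = lambda s: {"a": 0.5, "b": 0.4, "x": 0.9}[s]
trans = lambda p, c: 0.7 if p == "a" else 0.2
best = viterbi_tree((["a", "b"], [(["x"], [])]), emit, trans)
```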

43
Machine Translation Using Probabilistic
Synchronous Dependency Insertion Grammars
(Ding and Marcu, 2005)
  • Evaluation
  • Chinese-English system
  • Dan Bikel's parser for both Cn and En, trained
    from parallel treebanks
  • Test with 4 refs
  • Compared with
  • GIZA trained
  • ISI Rewrite Decoder
  • NIST increased 97%
  • BLEU increased 27%
  • Reordering ignored for now

44
Dependency Based Statistical MT Fox 2005
  • Czech-English parallel corpus (Penn TB and Prague
    TB)
  • Morphological processing and tecto-grammatical
    conversion for Czech trees
  • No processing for English trees
  • Alignment of subtrees via IBM Model 4 scores
  • followed by structural modification of trees to
    suit the alignment (KEEP, SPLIT, BUD)
  • Translation Model

45
Dependency Based Statistical MT Fox 2005
  • Decoding
  • Best-first decoder
  • Process given Czech input to dependency tree and
    translate each node independently
  • For each node
  • Choose head position
  • Generate the English POS sequence
  • Generate the feature list
  • Perform structural mutations
  • Syntax Language Model
  • Takes as input a forest of phrase structures
  • Invert decoder forest output (dep tree nodes)
    into phrase structures
  • Reordering is entirely left to LM
  • Evaluation
  • Work in progress
  • Proposed to use BLEU

46
Today
  • Introduction
  • Dependency formalism
  • Syntax in Machine Translation
  • Dependency Tree based Machine Translation
  • By projection
  • By synchronous modeling
  • Conclusion and Future

47
Conclusion
  • The good -
  • Easy to work with
  • Cohesive during projection
  • Builds well on top of existing PBSMT (Effective
    combination of lexicalization and syntax)
  • Supports modeling a target even with crossing
    phrase boundaries
  • Gracefully degrades over new domains
  • The bad
  • Reordering is not crucial, but expensive
  • Lots of hooks for decoding
  • Generalization explodes space
  • The not so good
  • Current approaches require a dependency tree on
    source side and a strong model for the target side

48
What Next
  • 1 year
  • Better scoring and estimation in syntactic
    translation models
  • Does improvement in dependency parse quality
    translate directly (Chris Quirk et al. 2006)?
    What about the MST parser, etc.?
  • Better Word-Alignment and effect on model
  • Incorporating labeled dependencies. Will it help?
  • Factored Dependency Tree based models
  • Approximate subtree matching and Parsing
    algorithms
  • 3-5 years
  • Decoding Algorithms and the Target-Ordering
    problem
  • Discriminative approaches to MT are catching up.
    How can syntax be incorporated into such a
    framework?
  • Better syntactic language models based on the
    dependency formalisms
  • Semantics in Translation (Are DepTrees the first
    step?)
  • Fusion of Dependency and Constituent approaches
    (LFG style)
  • Joint Modeling approaches (Eisner 03, Smith 06 QS
    Grammar)
  • Taking MT to other applications like
    Cross-lingual Retrieval and QA which already use
    DepFormalisms

49
Thanks to
  • Lori Levin, for discussion on the dependency tree
    formalism
  • Amr Ahmed, for discussion and separation of work
  • The respective authors of the papers, for some of
    the graphic images I liberally used in the slides

50
Questions
  • Thanks

51
DG Variants
  • Case Grammar (Anderson)
  • Daughter-Dependency Theory (Hudson)
  • Dependency Unification Grammar (Hellwig)
  • Functional-Generative Description (Sgall)
  • Lexicase (Starosta)
  • Meaning-Text Model (Mel'cuk)
  • Metataxis (Schubert)
  • Unification Dependency Grammar (Maxwell)
  • Constraint Dependency Grammar (Maruyama)

52
Motivation Questions
  • 1. How is dependency analysis used in Syntax MT?
    How do the algorithms vary if only the source
    side of analysis is present?
  • 2. How do the decoding and transfer phases adapt
    when using dependency analysis? What algorithms
    exist and what is the complexity analysis?
  • 3. How does dependency based syntax incorporation
    in MT, compare with other grammar formalisms like
    the phrase structure grammar?
  • 4. Is there a class of languages which yield
    better to dependency analysis vs. other analysis?
  • 5. Dependency analysis being close to semantics,
    does it help MT produce better results?

53
Other Papers
  • Quasi-Synchronous Grammars for Soft Syntactic
    Projection (David Smith and Jason Eisner, 2007)
  • Automatic Learning of Parallel Dependency Treelet
    Pairs (Yuan Ding and Martha Palmer, 2004)
  • Dependency vs. Constituents for Tree-Based
    Alignment (Dan Gildea, 2003)
  • My compilation:
    http://kathmandu.lti.cs.cmu.edu:8080/wiki/index.php/AMTSchedule