Title: Linguistics 187/287 Week 6
1. Linguistics 187/287 Week 6
Generation, Term-Rewrite System, Machine Translation
- Martin Forst, Ron Kaplan, and Tracy King
2. Generation
- Parsing: string to analysis
- Generation: analysis to string
- What type of input?
- How to generate
3. Why generate?
- Machine translation
- Lang1 string → Lang1 f-structure → Lang2 f-structure → Lang2 string
- Sentence condensation
- Long string → f-structure → smaller f-structure → new string
- Question answering
- Production of NL reports
- State of machine or process
- Explanation of logical deduction
- Grammar debugging
4. F-structures as input
- Use f-structures as input to the generator
- May parse sentences that shouldn't be generated
- May want to constrain the number of generated options
- Input f-structure may be underspecified
5. XLE generator
- Use the same grammar for parsing and generation
- Advantages
- maintainability
- write rules and lexicons once
- But
- special generation tokenizer
- different OT ranking
6. Generation tokenizer/morphology
- White space
- Parsing: multiple white spaces become a single TB
- John appears. → John TB appears TB . TB
- Generation: a single TB becomes a single space (or nothing)
- John TB appears TB . TB → John appears. / John appears .
- Suppress variant forms
- Parse both favor and favour
- Generate only one
7. Morphconfig for parsing and generation
- STANDARD ENGLISH MORPHOLOGY (1.0)
- TOKENIZE
- P!eng.tok.parse.fst G!eng.tok.gen.fst
- ANALYZE
- eng.infl-morph.fst G!amerbritfilter.fst
- G!amergen.fst
- ----
8. Reversing the parsing grammar
- The parsing grammar can be used directly as a generator
- Adapt the grammar with a special OT ranking: GENOPTIMALITYORDER
- Why do this?
- parse ungrammatical input
- have too many options
9. Ungrammatical input
- Linguistically ungrammatical
- They walks.
- They ate banana.
- Stylistically ungrammatical
- No ending punctuation: They appear
- Superfluous commas: John, and Mary appear.
- Shallow markup: [NP John and Mary] appear.
10. Too many options
- All the generated options can be linguistically valid, but too many for applications
- Occurs when more than one string has the same, legitimate f-structure
- PP placement
- In the morning I left. / I left in the morning.
11. Using the Gen OT ranking
- Generally much simpler than in the parsing direction
- Usually only use standard marks and NOGOOD
- no marks, no STOPPOINT
- Can have a few marks that are shared by several constructions
- one or two for dispreferred
- one or two for preferred
12. Example: Prefer initial PP
- S --> (PP: @ADJUNCT @(OT-MARK GenGood))
-       NP: @SUBJ
-       VP.
- VP --> V
-        (NP: @OBJ)
-        (PP: @ADJUNCT).
- GENOPTIMALITYORDER NOGOOD GenGood.
- parse: they appear in the morning.
- generate without OT: In the morning they appear. / They appear in the morning.
- with OT: In the morning they appear.
13. Debugging the generator
- When generating from an f-structure produced by the same grammar, XLE should always generate
- Unless:
- OT marks block the only possible string
- something is wrong with the tokenizer/morphology
- regenerate-morphemes: if this gets a string, the tokenizer/morphology is not the problem
- Hard to debug: XLE has robustness features to help
14. Underspecified Input
- F-structures provided by applications are not perfect
- may be missing features
- may have extra features
- may simply not match the grammar coverage
- Missing and extra features are often systematic
- specify in XLE which features can be added and deleted
- Not matching the grammar is a more serious problem
15. Adding features
- English to French translation
- English nouns have no gender
- French nouns need gender
- Solution: have XLE add gender
- the French morphology will control the value
- Specify additions in xlerc
- set-gen-adds add "GEND"
- can add multiple features
- set-gen-adds add "GEND CASE PCASE"
- XLE will optionally insert the feature
Note: Unconstrained additions make generation undecidable
16. Example
The cat sleeps. → Le chat dort.
Input f-structure:
- PRED 'dormir<SUBJ>'
- SUBJ [ PRED 'chat', NUM sg, SPEC def ]
- TENSE present
With GEND added in generation:
- PRED 'dormir<SUBJ>'
- SUBJ [ PRED 'chat', NUM sg, GEND masc, SPEC def ]
- TENSE present
17. Deleting features
- French to English translation
- delete the GEND feature
- Specify deletions in xlerc
- set-gen-adds remove "GEND"
- can remove multiple features
- set-gen-adds remove "GEND CASE PCASE"
- XLE obligatorily removes the features
- no GEND feature will remain in the f-structure
- if a feature takes an f-structure value, that f-structure is also removed
18. Changing values
- If the values of a feature do not match between the input f-structure and the grammar
- delete the feature and then add it
- Example: case assignment in translation
- set-gen-adds remove "CASE"
- set-gen-adds add "CASE"
- allows dative case in the input to become accusative
- e.g., an exceptional case-marking verb in the input language but regular case in the output language (see the xlerc sketch below)
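A minimal xlerc sketch of this remove-then-add pattern; the particular feature combination is illustrative, not prescribed by the slides:
  # xlerc sketch: strip the source CASE values, then let the target grammar
  # re-assign CASE (and supply GEND, as in the earlier French example)
  set-gen-adds remove "CASE"
  set-gen-adds add "CASE GEND"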
19. Generation for Debugging
- Checking for grammar and lexicon errors
- create-generator english.lfg
- reports ill-formed rules, templates, feature declarations, lexical entries
- Checking for ill-formed sentences that can be parsed
- parse a sentence
- see if all the results are legitimate strings
- regenerate they appear. (see the command sketch below)
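A minimal sketch of this debugging loop from the XLE command line, assuming the grammar file english.lfg named above; the test sentence is only an example:
  create-generator english.lfg
  # parse a sentence, then check what the same grammar generates for it
  parse {they appear.}
  regenerate {they appear.}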
20. Rewriting/Transfer System
21. Why a Rewrite System
- Grammars produce c-/f-structure output
- Applications may need to manipulate this
- Remove features
- Rearrange features
- Continue linguistic analysis (semantics, knowledge representation next week)
- XLE has a general-purpose rewrite system (aka "transfer" or "xfr" system)
22. Sample Uses of the Rewrite System
- Sentence condensation
- Machine translation
- Mapping to logic for knowledge representation and reasoning
- Tutoring systems
23. What does the system do?
- Input: a set of "facts"
- Apply a set of ordered rules to the facts
- this gradually changes the set of input facts
- Output: a new set of facts
- The rewrite system uses the same ambiguity management as XLE
- can efficiently rewrite packed structures, maintaining the packing
24. Example F-structure Facts
- PERS(var(1),3)
- PRED(var(1),girl)
- CASE(var(1),nom)
- NTYPE(var(1),common)
- NUM(var(1),pl)
- SUBJ(var(0),var(1))
- PRED(var(0),laugh)
- TNS-ASP(var(0),var(2))
- TENSE(var(2),pres)
- arg(var(0),1,var(1))
- lex_id(var(0),1)
- lex_id(var(1),0)
- F-structures get var(n) identifiers
- Special arg facts
- lex_id for each PRED
- Facts have two arguments (except arg)
- The rewrite system allows for any number of arguments
- (the facts above are rendered as an attribute-value matrix below)
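A rough rendering of the facts above as an attribute-value matrix; the arg and lex_id facts are argument-structure and bookkeeping information and are not shown inside the AVM (a sketch, not XLE output):
  var(0): [ PRED  'laugh'        "arg(var(0),1,var(1)) marks var(1) as its first argument"
            SUBJ  var(1): [ PRED 'girl', PERS 3, NUM pl, CASE nom, NTYPE common ]
            TNS-ASP var(2): [ TENSE pres ] ]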
25. Rule format
- Obligatory rule: LHS ==> RHS.
- Optional rule: LHS ?=> RHS.
- Unresourced fact: |- clause.
- LHS
- clause: match and delete
- +clause: match and keep
- -LHS: negation (don't have the fact)
- LHS, LHS: conjunction
- ( LHS | LHS ): disjunction
- ProcedureCall: procedural attachment
- RHS
- clause: replacement facts
- 0: empty set of replacement facts
- stop: abandon the analysis
- (a toy rule using several of these constructs is sketched below)
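A toy pair of rules illustrating several of the constructs above, written in the same notation as the example rules on the following slides; the facts and values are purely illustrative:
  "keep the NTYPE fact (+), require that no definite SPEC is present (-), add one"
  +NTYPE(F, common), -SPEC(F, def)
  ==> SPEC(F, def).

  "optionally (?=>) delete a TNS-ASP fact; 0 is the empty replacement"
  TNS-ASP(F, T)
  ?=> 0.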
26. Example rules
Input facts:
  PERS(var(1),3), PRED(var(1),girl), CASE(var(1),nom),
  NTYPE(var(1),common), NUM(var(1),pl), SUBJ(var(0),var(1)),
  PRED(var(0),laugh), TNS-ASP(var(0),var(2)), TENSE(var(2),pres),
  arg(var(0),1,var(1)), lex_id(var(0),1), lex_id(var(1),0)

Rule file:
  "PRS (1.0)"
  grammar toy_rules.

  "obligatorily add a determiner if there is a noun with no spec"
  NTYPE(F,), -SPEC(F,)
  ==> SPEC(F,def).

  "optionally make plural nouns singular; this will split the choice space"
  NUM(F, pl)
  ?=> NUM(F, sg).
27. Example Obligatory Rule
Input facts:
  PERS(var(1),3), PRED(var(1),girl), CASE(var(1),nom),
  NTYPE(var(1),common), NUM(var(1),pl), SUBJ(var(0),var(1)),
  PRED(var(0),laugh), TNS-ASP(var(0),var(2)), TENSE(var(2),pres),
  arg(var(0),1,var(1)), lex_id(var(0),1), lex_id(var(1),0)

  "obligatorily add a determiner if there is a noun with no spec"
  NTYPE(F,), -SPEC(F,)
  ==> SPEC(F,def).

Output facts: all the input facts plus SPEC(var(1),def)
28. Example Optional Rule
  "optionally make plural nouns singular; this will split the choice space"
  NUM(F, pl)
  ?=> NUM(F, sg).

Input facts:
  PERS(var(1),3), PRED(var(1),girl), CASE(var(1),nom),
  NTYPE(var(1),common), NUM(var(1),pl), SPEC(var(1),def),
  SUBJ(var(0),var(1)), PRED(var(0),laugh), TNS-ASP(var(0),var(2)),
  TENSE(var(2),pres), arg(var(0),1,var(1)), lex_id(var(0),1), lex_id(var(1),0)

Output facts: all the input facts plus a choice split
  A1: NUM(var(1),pl)
  A2: NUM(var(1),sg)
29. Output of the example rules
- Output is a packed f-structure
- Generation gives two sets of strings
- The girls laugh. / The girls laugh! / The girls laugh
- The girl laughs. / The girl laughs! / The girl laughs
30. Manipulating sets
- Sets are represented with an in_set feature
- He laughs in the park with the telescope
- ADJUNCT(var(0),var(2))
- in_set(var(4),var(2))
- in_set(var(5),var(2))
- PRED(var(4),in)
- PRED(var(5),with)
- Might want to optionally remove adjuncts
- but not negation
31. Example Adjunct Deletion Rules
- "optionally remove member of adjunct set"
- ADJUNCT(, AdjSet), in_set(Adj, AdjSet),
- -PRED(Adj, not)
- ?=> 0.
- "obligatorily remove adjunct with nothing in it"
- ADJUNCT(, Adj), -in_set(, Adj)
- ==> 0.
Generated strings:
  He laughs with the telescope in the park.
  He laughs in the park with the telescope.
  He laughs with the telescope.
  He laughs in the park.
  He laughs.
32. Manipulating PREDs
- Changing the value of a PRED is easy
- PRED(F,girl) ==> PRED(F,boy).
- Changing the argument structure is trickier
- Make any changes to the grammatical functions
- Make the arg facts correlate with these
33. Example Passive Rule
- "make actives passive:
   make the subject NULL, make the object the subject,
   put in features"
- SUBJ( Verb, Subj), arg( Verb, Num, Subj),
- OBJ( Verb, Obj), CASE( Obj, acc)
- ==>
- SUBJ( Verb, Obj), arg( Verb, Num, NULL), CASE( Obj, nom),
- PASSIVE( Verb, ), VFORM( Verb, pass).
the girls saw the monkeys → The monkeys were seen.
in the park the girls saw the monkeys → In the park the monkeys were seen.
34. Templates and Macros
- Rules can be encoded as templates
- n2n(Eng,Frn)
-   PRED(F,Eng), NTYPE(F,)
-   ==> PRED(F,Frn).
- @n2n(man, homme).
- @n2n(woman, femme).
- Macros encode groups of clauses/facts
- sg_noun(F)
-   NTYPE(F,), NUM(F,sg).
- @sg_noun(F), -SPEC(F)
-   ==> SPEC(F,def).
- (the first template call is expanded below)
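For concreteness, the first template call above is equivalent to writing out the following rule by hand; this is a mechanical expansion of the template on this slide, shown only as a sketch:
  "expansion of @n2n(man, homme)"
  PRED(F,man), NTYPE(F,)
  ==> PRED(F,homme).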
35. Unresourced Facts
- Facts can be stipulated in the rules and referred to
- Often used as a lexicon of information not encoded in the f-structure
- For example, a list of days and months for manipulating dates
- |- day(Monday). |- day(Tuesday). etc.
- |- month(January). |- month(February). etc.
- PRED(F,Pred), ( day(Pred) | month(Pred) ) ==> ... (see the sketch below)
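The rule above ends at the arrow on the slide; a hypothetical completion is sketched below. The DATE-WORD attribute on the right-hand side is invented for illustration only, and the PRED fact is kept with + so that it is not consumed:
  "flag predicates that name a day or a month (DATE-WORD is hypothetical)"
  +PRED(F,Pred), ( day(Pred) | month(Pred) )
  ==> DATE-WORD(F,+).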
36. Rule Ordering
- Rewrite rules are ordered (unlike LFG syntax rules, but like finite-state rules)
- Output of rule1 is input to rule2
- Output of rule2 is input to rule3
- This allows for feeding and bleeding
- Feeding: insert facts used by later rules
- Bleeding: remove facts needed by later rules
- Can make debugging challenging
37. Example of Rule Feeding
- Early Rule: Insert SPEC on nouns
- NTYPE(F,), -SPEC(F,) ==> SPEC(F, def).
- Later Rule: Allow plural nouns to become singular only if they have a specifier (to avoid bad count nouns)
- NUM(F,pl), SPEC(F,) ==> NUM(F,sg).
38. Example of Rule Bleeding
- Early Rule: Turn actives into passives (simplified)
- SUBJ(F,S), OBJ(F,O) ==> SUBJ(F,O), PASSIVE(F,).
- Later Rule: Impersonalize actives
- SUBJ(F,), -PASSIVE(F,) ==> SUBJ(F,S), PRED(S,they), PERS(S,3), NUM(S,pl).
- will apply to intransitives and verbs with (X)COMPs, but not transitives (worked through below)
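A worked sketch of how the bleeding plays out on a transitive clause, using the fact notation of the earlier slides; the variable names f, s, o are illustrative:
  input facts:           SUBJ(f,s), OBJ(f,o), PRED(f,see), ...
  after the early rule:  SUBJ(f,o), PASSIVE(f,), PRED(f,see), ...
                         "the OBJ fact was consumed and PASSIVE was added"
  the later rule:        SUBJ(F,), -PASSIVE(F,) ==> ...
                         "no longer applies: its -PASSIVE condition fails,
                          so transitives are never impersonalized (bleeding)"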
39. Debugging
- XLE command line: tdbg
- steps through the rules, stating how they apply

  Rule 1: (NTYPE(F,A)), -(SPEC(F,B)) ==> SPEC(F,def)
  File ~thking/courses/ling187/hws/thk.pl, lines 4-10
  Rule 1 matches (2) NTYPE(var(1),common)
  1 --> SPEC(var(1),def)

  Rule 2: NUM(F,pl) ?=> NUM(F,sg)
  File ~thking/courses/ling187/hws/thk.pl, lines 11-17
  Rule 2 matches 3 NUM(var(1),pl)
  1 --> NUM(var(1),sg)

  Rule 5: SUBJ(Verb,Subj), arg(Verb,Num,Subj), OBJ(Verb,Obj), CASE(Obj,acc)
          ==> SUBJ(Verb,Obj), arg(Verb,Num,NULL), CASE(Obj,nom), PASSIVE(Verb,), VFORM(Verb,pass)
  File ~thking/courses/ling187/hws/thk.pl, lines 28-37
  Rule does not apply

  Input sentence: girls laughed
40. Running the Rewrite System
- create-transfer: adds the menu items
- load-transfer-rules FILE: loads rules from the file
- the f-structure window, under commands, has:
- transfer: prints the output of the rules in the XLE window
- translate: runs the output through the generator
- Need to do (where the path is XLEPATH/lib):
- setenv LD_LIBRARY_PATH /afs/ir.stanford.edu/data/linguistics/XLE/SunOS/lib
- (a short session sketch follows below)
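A short sketch of this setup, assuming a hypothetical rule file toy_rules.pl; the environment variable is set in the shell before XLE is started:
  # in the shell (csh), before starting xle:
  #   setenv LD_LIBRARY_PATH /afs/ir.stanford.edu/data/linguistics/XLE/SunOS/lib
  # then, inside xle (or in xlerc):
  create-transfer
  load-transfer-rules toy_rules.pl
  parse {the girls laugh.}
  # now use commands > transfer or commands > translate in the f-structure window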
41. Rewrite Summary
- The XLE rewrite system lets you manipulate the output of parsing
- Creates versions of the output suitable for applications
- Can involve significant reprocessing
- Rules are ordered
- Ambiguity management is as with parsing
42. Grammatical Machine Translation
- Stefan Riezler and John Maxwell
43. Translation System
[System diagram: a German string is parsed with the German LFG grammar into f-structures; transfer rules (with lots of statistics) map them to English f-structures; XLE generation with the English LFG grammar produces the output string]
44. Transfer-Rule Induction from Aligned Bilingual Corpora
- Use standard techniques to find many-to-many candidate word alignments in source-target sentence pairs
- Parse the source and target sentences using the LFG grammars for German and English
- Select the most similar f-structures in source and target
- Define many-to-many correspondences between substructures of the f-structures based on the many-to-many word alignment
- Extract primitive transfer rules directly from aligned f-structure units
- Create the powerset of possible combinations of basic rules and filter according to contiguity and type-matching constraints
45. Induction
- Example sentences: Dafür bin ich zutiefst dankbar. / I have a deep appreciation for that.
- Many-to-many word alignment
- Dafür{6,7} bin{2} ich{1} zutiefst{3,4,5} dankbar{5}
- F-structure alignment
46. Extracting Primitive Transfer Rules
- Rule (1) maps lexical predicates
- Rule (2) maps lexical predicates and interprets the subj-to-subj link as an indication to map the subj of the source with this predicate into the subject of the target, and the xcomp of the source into the object of the target
- X1, X2, X3, ... are variables for f-structures

  (1) PRED(X1, ich) ==> PRED(X1, I)

  (2) PRED(X1, sein),
      SUBJ(X1, X2),
      XCOMP(X1, X3)
      ==>
      PRED(X1, have),
      SUBJ(X1, X2),
      OBJ(X1, X3)
47. Extracting Complex Transfer Rules
- Complex rules are created by taking all combinations of primitive rules and filtering
- (4) zutiefst dankbar sein ==> have a deep appreciation
- (5) zutiefst dankbar dafür sein ==> have a deep appreciation for that
- (6) ich bin zutiefst dankbar dafür ==> I have a deep appreciation for that
48. Transfer Contiguity Constraint
- Transfer contiguity constraint:
- The source and target f-structures each have to be connected
- F-structures in the transfer source can only be aligned with f-structures in the transfer target, and vice versa
- Analogous to the constraint on contiguous and alignment-consistent phrases in phrase-based SMT
- Prevents extraction of a rule that would translate dankbar directly into appreciation, since appreciation is also aligned to zutiefst
- Transfer contiguity allows learning idioms like es gibt - there is from configurations that are local in the f-structure but non-local in the string, e.g., es scheint zu geben - there seems to be
49. Linguistic Filters on Transfer Rules
- Morphological stemming of PRED values
- (Optional) filtering of f-structure snippets based on the consistency of linguistic categories
- Extraction of a snippet that translates zutiefst dankbar into a deep appreciation maps incompatible categories (adjectival and nominal), though it is valid in a string-based world
- The translation of sein to have might be discarded because of the adjectival vs. nominal types of their arguments
- The larger rule mapping zutiefst dankbar sein to have a deep appreciation is ok since the verbal types match
50. Transfer
- Parallel application of transfer rules in a non-deterministic fashion
- Unlike the XLE ordered-rule rewrite system
- Each fact must be transferred by exactly one rule
- A default rule transfers any fact as itself
- Transfer works on a chart, using the parser's unification mechanism for consistency checking
- Selection of the most probable transfer output is done by beam decoding on the transfer chart
51. Generation
- Bi-directionality allows us to use the same grammar for parsing the training data and for generation in the translation application
- The generator has to be fault-tolerant in cases where the transfer system operates on a FRAGMENT parse or produces non-valid f-structures from valid input f-structures
- Robust generation from unknown (e.g., untranslated) predicates and from unknown f-structures
52. Robust Generation
- Generation from unknown predicates
- The unknown German word Hunde is analyzed by the German grammar to extract the stem (e.g., PRED Hund, NUM pl) and is then inflected using English default morphology (Hunds)
- Generation from unknown constructions
- A default grammar that allows any attribute to be generated in any order is mixed in as a suboptimal option in the standard English grammar; e.g., if the SUBJ cannot be generated as a sentence-initial NP, it will be generated in any position as any category
- an extension/combination of set-gen-adds and OT ranking
53. Statistical Models
- Log-probability of source-to-target transfer rules, where the probability r(e|f) of a rule that transfers source snippet f into target snippet e is estimated by relative frequency
- Log-probability of target-to-source transfer rules, estimated by relative frequency
54. Statistical Models, cont.
- Log-probability of lexical translations l(e|f) from source to target snippets, estimated from the Viterbi alignment a between source word positions i = 1, ..., n and target word positions j = 1, ..., m for stems fi and ej in snippets f and e, with relative word-translation frequencies t(ej|fi) (see the formula below)
- Log-probability of lexical translations from target to source snippets
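A reconstruction of what this lexical-translation score presumably looks like; it follows the standard lexical weighting of Koehn et al. (2003), which the wording above matches, so the exact form should be treated as an assumption rather than a quotation from the paper:
  l(e \mid f, a) \;=\; \prod_{j=1}^{m} \frac{1}{\lvert \{\, i \mid (i,j) \in a \,\} \rvert} \sum_{(i,j) \in a} t(e_j \mid f_i)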
55. Statistical Models, cont.
- Number of transfer rules
- Number of transfer rules with frequency 1
- Number of default transfer rules
- Log-probability of strings of predicates from the root to the frontier of the target f-structure, estimated from predicate trigrams in English f-structures
- Number of predicates in the target f-structure
- Number of constituent movements during generation, based on the original order of the head predicates of the constituents
56. Statistical Models, cont.
- Number of generation repairs
- Log-probability of the target string as computed by a trigram language model
- Number of words in the target string
57. Experimental Evaluation
- Experimental setup
- German-to-English on the Europarl parallel corpus (Koehn 02)
- Training and evaluation on sentences of length 5-15, for quick experimental turnaround
- Resulting in a training set of 163,141 sentences, a development set of 1,967 sentences, and a test set of 1,755 sentences (as used in Koehn et al. HLT03)
- Improved bidirectional word alignment based on GIZA++ (Och et al. EMNLP99)
- LFG grammars for German and English (Butt et al. COLING02; Riezler et al. ACL02)
- SRI trigram language model (Stolcke 02)
- Comparison with PHARAOH (Koehn et al. HLT03) and IBM Model 4 as produced by GIZA++ (Och et al. EMNLP99)
58. Experimental Evaluation, cont.
- Around 700,000 transfer rules extracted from f-structures chosen by a dependency similarity measure
- The system operates on n-best lists of parses (n=1), transferred f-structures (n=10), and generated strings (n=1,000)
- Selection of the most probable translations in two steps
- Most probable f-structure by beam search (n=20) on the transfer chart using features 1-10
- Most probable string selected from the strings generated from the selected n-best f-structures using features 11-13
- Feature weights for the modules trained by MER on 750 in-coverage sentences of the development set
59. Automatic Evaluation
- NIST scores (ignoring punctuation); Approximate Randomization for significance testing (see above)
- 44% in coverage of the grammars; 51% FRAGMENT parses and/or generation repair; 5% timeouts
- In coverage: the difference between LFG and PHARAOH is not significant
- Suboptimal robustness techniques decrease overall quality
60. Manual Evaluation
- Closer look at in-coverage examples
- Random selection of 500 in-coverage examples
- Two independent judges indicated a preference for LFG or PHARAOH, or equality, in a blind test
- Separate evaluation under the criteria of grammaticality/fluency and translational/semantic adequacy
- Significance assessed by Approximate Randomization via stratified shuffling of preference ratings between systems
61. Manual Evaluation
- Result: differences on agreed-on ratings are statistically significant at p < 0.0001
- Net improvement in translational adequacy on agreed-on examples is 11.4% on 500 sentences (57/500), amounting to a 5% overall improvement in the hybrid system (44% of 11.4%)
- Net improvement in grammaticality on agreed-on examples is 15.4% on 500 sentences, amounting to a 6.7% overall improvement in the hybrid system
62. Examples: LFG > PHARAOH
- src: in diesem fall werde ich meine verantwortung wahrnehmen
- ref: then i will exercise my responsibility
- LFG: in this case i accept my responsibility
- P: in this case i shall my responsibilities
- src: die politische stabilität hängt ab von der besserung der lebensbedingungen
- ref: political stability depends upon the improvement of living conditions
- LFG: the political stability hinges on the recovery the conditions
- P: the political stability is rejects the recovery of the living conditions
63. Examples: PHARAOH > LFG
- src: das ist schon eine seltsame vorstellung von gleichheit
- ref: a strange notion of equality
- LFG: equality that is even a strange idea
- P: this is already a strange idea of equality
- src: frau präsidentin ich beglückwünsche herrn nicholson zu seinem ausgezeichneten bericht
- ref: madam president I congratulate mr nicholson on his excellent report
- LFG: madam president I congratulate mister nicholson on his report excellented
- P: madam president I congratulate mr nicholson for his excellent report
64. Discussion
- High percentage of out-of-coverage examples
- Accumulation of 2 × 20% error rates in parsing the training data
- Errors in rule extraction
- Together these result in ill-formed transfer rules causing a high number of generation failures/repairs
- Propagation of errors through the system, also for in-coverage examples
- Error analysis: 69% transfer errors, 10% due to parse errors
- Discrepancy between the NIST and manual evaluations
- Suboptimal integration of the generator, making training and translation with large n-best lists infeasible
- Language and distortion models applied after generation
65. Conclusion
- Integration of a grammar-based generator into a dependency-based SMT system achieves state-of-the-art NIST scores and improved grammaticality and adequacy on in-coverage examples
- Possibility of a hybrid system, since it is determinable when sentences are in the coverage of the system
66. Grammatical Machine Translation II
- Ji Fang, Martin Forst, John Maxwell, and Michael Tepper
67. Overview of different approaches to MT
68. Limitations of string-based approaches
- Transfer rules/correspondences of little generality
- Problems with long-distance dependencies
- Perform less well for morphologically rich (target) languages
- N-gram LM-based disambiguation seems to have leveled out
69. Limitations of string-based approaches - little generality
- From Europarl: Das tut mir leid. / I'm sorry about that.
- Google (SMT): I'm sorry. Perfect!
- But: As soon as the input changes a bit, we get garbage.
- Das tut ihr leid. (She is sorry about that.) → It does their suffering.
- Der Tod deines Vaters tut mir leid. (I am sorry about the death of your father.) → The death of your father I am sorry.
- Der Tod deines Vaters tut ihnen leid. (They are sorry about the death of your father.) → The death of your father is doing them sorry.
70. Limitations of string-based approaches - problems with LDDs
- From Europarl: Dies stellt eine der großen Herausforderungen für die französische Präsidentschaft dar. / This is one of the major issues of the French Presidency.
- Google (SMT): This is one of the major challenges for the French presidency represents.
- The particle verb is identified and translated correctly
- But the two verbs → ungrammatical; they seem to be too far apart to be filtered out by the LM
71. Limitations of string-based approaches - rich morphology
- Language pairs involving morphologically rich languages, e.g., Finnish, are hard
From Koehn (2005, MT Summit)
72. Limitations of string-based approaches - rich morphology
- Morphologically rich, free word order languages, e.g. German, are particularly hard as target languages.
Again from Koehn (2005, MT Summit)
73. Limitations of string-based approaches - n-gram LMs
- Even for morphologically poor languages, improving n-gram LMs becomes increasingly expensive.
- Adding data helps improve translation quality (BLEU scores), but not enough.
- Assuming the best improvement rate observed in Brants et al. (2007), 400 million times the available data would be needed to attain human translation quality by LM improvement alone.
74. Limitations of string-based approaches - n-gram LMs
- Best improvement rate: 0.7 BLEU points (BP) per doubling of the data
- Would need 40 more doublings to obtain human translation quality (42 + 0.7 × 40 = 70)
- Necessary training data in tokens: 1e22 (1e10 × 2^40 ≈ 1e22)
- 4e8 times the current English Web (estimate) (2.5e13 × 4e8 = 1e22)
- From Brants et al. (2007)
- (the calculation is worked out below)
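The back-of-the-envelope calculation behind these numbers, reconstructed from the figures on the slide; the 42-BLEU starting point and the 1e10-token baseline are read off the slide, not independently verified:
  k \approx \frac{70 - 42}{0.7} = 40 \ \text{doublings}, \qquad 10^{10} \cdot 2^{40} \approx 10^{22} \ \text{tokens}, \qquad \frac{10^{22}}{2.5 \cdot 10^{13}} = 4 \cdot 10^{8} \ \text{times the current English Web}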
75. Limitations of bitext-based approaches
- Generally available bitexts are limited in size and specialized in genre
- Parliament proceedings
- UN texts
- Judiciary texts (from multilingual countries)
- → Makes it hard to repurpose bitext-based systems to new genres
- Induced transfer rules/correspondences are often of mediocre quality
- Loose translations
- Bad alignments
76. Limitations of bitext-based approaches - availability and quality
- Readily available bitexts are limited in size and specialized in genre
- Approaches to auto-extracting bitexts from the web exist.
- Additional data help to some degree, but then the effect levels out.
- Still a genre bias in bitexts, despite automatic acquisition?
- Still more general problems with alignment quality etc.?
77. Limitations of bitext-based approaches - availability and quality
- Much more data needed to attain human translation quality
- Logarithmic gains (at best) by adding bitext data
- From Munteanu & Marcu (2005)
- Base line: 100K - 95M English words
- Mid line (auto): 90K - 2.1M
- Top line (oracle): 90K - 2.1M
78. Context-Based MT / Meaningful Machines
- Combines example-based MT (EBMT) and SMT
- Very large (target) language model; a large amount of monolingual text is required
- No transfer statistics, thus no parallel text required
- The translation lexicon is developed semi-automatically (i.e. hand-validated)
- The lexicon has slotted phrase pairs (like EBMT), e.g. NP1 biss ins Gras. / NP1 bit the dust.
79. Context-Based MT / Meaningful Machines - pros
- A high-quality translation lexicon seems to allow for
- Easier repurposing of the system(s) to new genres
- Better translation quality
From Carbonell (2006)
80. Context-Based MT / Meaningful Machines - cons
- Works really well for English-Spanish. How about other language pairs?
- Same problems with n-gram LMs as traditional SMT; probably affects pairs involving a morphologically rich (target) language particularly badly.
- How much manual labor is involved in the development of the translation lexicon?
- Computationally expensive
81. Grammatical Machine Translation
- Syntactic transfer-based approach
- Parsing and generation identical/similar between GMT I and GMT II
[MT pyramid diagram: parse the source and score f-structures → transfer and score target f-structures via f-structure transfer rules → generate and pick the best realization; string-level statistical methods sit at the base of the pyramid]
82. Grammatical Machine Translation: GMT I vs. GMT II
- GMT I
- Transfer rules induced from parsed bitexts
- Target f-structures ranked using individual transfer rule statistics
- GMT II
- Transfer rules induced from a manually/semi-automatically constructed phrase lexicon
- Target f-structures ranked using monolingually trained bilexical dependency statistics and general transfer rule statistics
83. GMT II
- Where do the transfer rules come from?
- Where do statistics/machine learning come in?
[Annotated pyramid diagram:
- Transfer rules: induced from manually/semi-automatically compiled phrase pairs with slots; potentially, but not necessarily, from bitexts
- Parse source, score f-structures: log-linear model trained on a syntactically annotated monolingual corpus
- Transfer, score target f-structures: log-linear model trained on bitext data; includes the score from the parse ranking model and very general transfer features
- Generate, pick best realization: log-linear model trained on bitext data; includes the scores from the other two models and the features/score of the monolingually trained model for realization ranking]
84. GMT II - The phrase dictionary
- Contains phrase pairs with slot categories (Ddeff, Ddef, NP1nom, NP1, etc.) that allow for well-formed phrases without being included in the induced rules
- Currently hand-written
- Will hopefully be compiled (semi-)automatically from bilingual dictionaries
- Bitexts might also be used; how exactly remains to be defined.
85. GMT II - Rule induction from the phrase dictionary
- Sub-f-structures of slot variables are not included
- FS attributes can be defined as irrelevant for translation, e.g. CASE (in both en and de) and GEND (in de). Attributes so defined are never included in induced rules.
- set-gen-adds remove CASE GEND
- FS attributes can be defined as remove_equal_features. Attributes defined as such are not included in induced rules when they are equal.
- set remove_equal_features NUM OBJ OBL-AG PASSIVE SUBJ TENSE
- → more general rules
86. GMT II - Rule induction from the phrase dictionary (noun)
- Ddeff Verfassung / Ddef constitution
- PRED(X1, Verfassung),
- NTYPE(X1, Z2),
- NSEM(Z2, Z3),
- COMMON(Z3, count),
- NSYN(Z2, common)
- ==>
- PRED(X1, constitution),
- NTYPE(X1, Z4),
- NSYN(Z4, common).
87. GMT II - Rule induction from the phrase dictionary (adjective)
- europäische / European
- PRED(X1, europäisch)
- ==>
- PRED(X1, European).
- To accommodate certain non-parallelisms with respect to the SUBJs of adjectives etc., a special mechanism removes the SUBJs of non-verbs and makes them addable in generation.
88. GMT II - Rule induction from the phrase dictionary (verb)
- NP1nom koordiniert NP2acc. / NP1 coordinates NP2.
- PRED(X1, koordinieren),
- arg(X1, 1, A2),
- arg(X1, 2, A3),
- VTYPE(X1, main)
- ==>
- PRED(X1, coordinate),
- arg(X1, 1, A2),
- arg(X1, 2, A3),
- VTYPE(X1, main).
89. GMT II - Rule induction (argument switching)
- NP1nom tut NP2dat leid. / NP2 is sorry about NP1.
- PRED(X1, leidtun),
- SUBJ(X1, A2),
- OBJ-TH(X1, A3),
- VTYPE(X1, main)
- ==>
- PRED(X1,be),
- SUBJ(X1,A3),
- XCOMP-PRED(X1,Z1),
- PRED(Z1, sorry),
- OBL(Z1,Z2),
- PRED(Z2,about),
- OBJ(Z2,A2),
- VTYPE(X1,copular).
90. GMT II - Rule induction (head switching)
- Ich versuche nur, mich jeder Demagogie zu enthalten. / It is just that I am trying not to indulge in demagoguery.
- NP1nom Vfin nur. / It is just that NP1 Vs.
- ADJUNCT(X1,Z2), in_set(X3,Z2),
  PRED(X3,nur), ADV-TYPE(X3,unspec)
- ==>
- PRED(Z4,be), SUBJ(Z4,X3), NTYPE(X3,Z5),
  NSYN(Z5,pronoun), GEND-SEM(Z5,nonhuman),
  HUMAN(Z5,-), NUM(Z5,sg), PERS(Z5,3),
  PRON-FORM(Z5,it), PRON-TYPE(Z5,expl_),
  arg(Z4,1,Z6), PRED(Z6, just), SUBJ(Z6,Z7),
  arg(Z6,1,A1), COMP-FORM(A1,that),
  COMP(Z6,A1), nonarg(Z6,1,Z7),
  ATYPE(Z6,predicative), DEGREE(Z6, positive),
  nonarg(Z4,1,X3), TNS-ASP(Z4,Z8),
  MOOD(Z8,indicative), TENSE(Z8, pres),
  XCOMP-PRED(Z4,Z6), CLAUSE-TYPE(Z4,decl),
  PASSIVE(Z4,-), VTYPE(A2,copular).
91. GMT II - Rule induction (more on head switching)
- In addition to rewriting terms, the system re-attaches the rewritten f-structure if necessary. Here, this might be the case for X1.
- ADJUNCT(X1,Z2), in_set(X3,Z2),
  PRED(X3,nur), ADV-TYPE(X3,unspec)
- ==>
- PRED(Z4,be), SUBJ(Z4,X3), NTYPE(X3,Z5),
  NSYN(Z5,pronoun), GEND-SEM(Z5,nonhuman),
  HUMAN(Z5,-), NUM(Z5,sg), PERS(Z5,3),
  PRON-FORM(Z5,it), PRON-TYPE(Z5,expl_),
  arg(Z4,1,Z6), PRED(Z6, just), SUBJ(Z6,Z7),
  arg(Z6,1,A1), COMP-FORM(A1,that),
  COMP(Z6,A1), nonarg(Z6,1,Z7),
  ATYPE(Z6,predicative), DEGREE(Z6, positive),
  nonarg(Z4,1,X3), TNS-ASP(Z4,Z8),
  MOOD(Z8,indicative), TENSE(Z8, pres),
  XCOMP-PRED(Z4,Z6), CLAUSE-TYPE(Z4,decl),
  PASSIVE(Z4,-), VTYPE(A2,copular).
92. GMT II - Pros and cons of rule induction from a phrase dictionary
- Development of phrase pairs can be carried out by someone with little knowledge of the grammar and transfer system; manual development of transfer rules would require experts (for boring, repetitive labor).
- Phrase pairs can remain stable while the grammars keep evolving. Since transfer rules are induced fully automatically, they can easily be kept in sync with the grammars.
- Induced rules are of much higher quality than rules induced from parsed bitexts (GMT I).
- Although there is hope that phrase pairs can be constructed semi-automatically from bilingual dictionaries, it is not yet clear to what extent this can be automated.
- If rule induction from parsed bitexts can be improved, the two approaches might well be complementary.
93. Lessons Learned for Parallel Grammar Development
- Absence of a feature like PERF +/- is not equivalent to PERF -.
- FS-internal features should not say anything about the function of the FS
- Example: PRON-TYPE poss instead of PRON-TYPE pers
- Compounds should be analyzed similarly, whether spelt together (de) or apart (en)
- Possible with SMOR
- Very hard or even impossible with DMOR
94. Absence of PERF ≠ PERF -
95. No function info in FS-internal features
- I think NP1 Vs. / In my opinion NP1 Vs.
96. Parallel analysis of compounds
97. More Lessons Learned for Parallel Grammar Development
- ParGram needs to agree on a parallel PRED value for (personal) pronouns
- We need an "interlingua" for numbers, clock times, dates, etc.
- Guessers should analyze (composite) names similarly
98. Parallel PRED values for (personal) pronouns
- Otherwise the number of rules we have to learn for them explodes.
- de-en: pro/er → he, pro/er → it, pro/sie → she, pro/sie → it, pro/es → it, pro/es → he, pro/es → she
- Also, the PRED-NUM-PERS combination may make no sense! Result: a lot of generator effort for nothing
- en-de: he → pro/er, she → pro/sie, it → pro/es, it → pro/er, it → pro/sie, ...
99. Interlingua for numbers, clock times, dates, etc.
- We cannot possibly learn transfer rules for all dates.
100. Guessed (composite) names
We cannot possibly learn transfer rules for all proper names in this world.
101. And Yet More Lessons Learned for Grammar Development
- Reflexive pronouns: PERS and NUM agreement should be ensured via inside-out function application, e.g. ((SUBJ ↑) PERS) = (↑ PERS).
- Semantically relevant features should not be hidden in CHECK
102. Reflexive pronouns
- Introduce their own values for PERS and NUM
- Overgeneration: Ich wasche sich.
- NUM ambiguity for (frequent) sich
- Less generalization possible in transfer rules for inherently reflexive verbs - 6 rules necessary instead of 1.
103. Reflexive pronouns
104. Semantically relevant features in CHECK
- sie → they
- Sie → you (formal)
- Since CHECK features are not used for translation, the distinction between sie and Sie is lost.
105. Planned experiments - Motivation
- We do not have the resources to develop a general-purpose phrase dictionary in the short or medium term.
- Nevertheless, we want to get an idea of how well our new approach may scale.
106. Planned Experiments 1
- Manually develop a phrase dictionary for a few hundred Europarl sentences
- Train the target FS ranking model and realization ranking model on those sentences
- Evaluate the output in terms of BLEU, NIST and manually
- Can we make this new idea work under ideal conditions? It seems we can.
107. Planned Experiments 2
- Manually develop a phrase dictionary for a few hundred Europarl sentences
- Use a bilingual dictionary to add possible phrase pairs that may distract the system
- Train the target FS ranking model and realization ranking model on those sentences
- Evaluate the output in terms of BLEU, NIST and manually
- How well can our system deal with the distractors?
108. Planned Experiments 3
- Manually develop a phrase dictionary for a few hundred Europarl sentences
- Use a bilingual dictionary to add possible phrase pairs that may distract the system
- Degrade the phrase dictionary at various levels of severity
- Take out a certain percentage of phrase pairs
- Shorter phrases may be penalized less than longer ones
- Train the target FS ranking model and realization ranking model on those sentences
- Evaluate the output in terms of BLEU, NIST and manually
- How good or bad is the output of the system when the bilingual phrase dictionary lacks coverage?
109. Main Remaining Challenges
- Get a comprehensive and high-quality dictionary of phrase pairs
- Get more and better (i.e. more normalized and parallel) analyses from the grammars
- Improve the ranking models, in particular on the source side
- Improve the generation behavior of the grammars - so far, grammar development has mostly been parsing-oriented.
- Efficiency, in particular on the generation side, i.a. packed transfer and generation