AMTEXT: Extraction-based MT for Arabic


Slides: 22

Provided by: AlonL


Transcript and Presenter's Notes


1
AMTEXT: Extraction-based MT for Arabic

Faculty: Alon Lavie, Jaime Carbonell
Students and Staff: Laura Kieras, Peter Jansen
Informant: Loubna El Abadi

2
Goals and Approach

Analysts are often looking for limited, concrete information within the text, so full MT may not be necessary
Alternative: rather than full MT followed by extraction, first extract and then translate only the extracted information
But how do we extract just the relevant parts in the source language?
AMTEXT approach:
Learn extraction patterns and their translations from small amounts of human-translated and aligned data
Combine with broad-coverage Named-Entity translation lexicons
System output: translation of the extracted information and a structured representation

3
AMTEXT Extraction-based MT
[Figure: system architecture]
Learning Module: Word-aligned elicited data -> Transfer Rules
Run-Time Extract/Transfer System: Source Text -> Partial Parser / Transfer Engine -> Post-processor / Extractor -> Extracted Target Text, Filled Template
Resources: NE Translation Lexicon, Word Translation Lexicon
Example transfer rule: S::S : NE-P pagash et NE-P TE -> NE-P met with NE-P TE ((X1::Y1) (X4::Y4) (X5::Y5))
4
Elicitation Example
5
Learning Extraction Translation Patterns

Elicited example:
Sharon nifgash hayom im bush
Sharon met with Bush today
After generalization:
<PERSON> <MEET-V> <TE> im <PERSON>
<PERSON> <MEET-V> with <PERSON> <TE>
Resulting learned pattern rule:
S::S : PERSON MEET-V TE im PERSON -> PERSON MEET-V with PERSON TE
( (X1::Y1)
(X2::Y2)
(X3::Y5)
(X5::Y4))
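
The generalization step on this slide can be pictured with a small sketch. It is only an illustration under stated assumptions, not the AMTEXT learning module: the class lexicons `SRC_CLASSES` and `TGT_CLASSES`, the function names, and the hand-written word alignment are all hypothetical. Tokens with a known class are replaced by their label, and only alignments between two generalized slots are kept as X::Y constraints.

```python
# Hypothetical class lexicons (not from the slides): surface token -> label.
SRC_CLASSES = {"sharon": "PERSON", "bush": "PERSON",
               "nifgash": "MEET-V", "hayom": "TE"}
TGT_CLASSES = {"sharon": "PERSON", "bush": "PERSON",
               "met": "MEET-V", "today": "TE"}


def generalize(tokens, classes):
    """Replace tokens that have a known class by the label; keep the rest literal."""
    return [classes.get(tok.lower(), tok) for tok in tokens]


def learn_pattern(src_tokens, tgt_tokens, alignment):
    """Return (source pattern, target pattern, slot alignments).

    alignment is a list of 1-based (source index, target index) word links.
    Links that touch a literal word (e.g. im/with) are dropped.
    """
    src = generalize(src_tokens, SRC_CLASSES)
    tgt = generalize(tgt_tokens, TGT_CLASSES)
    slots = [(x, y) for x, y in alignment
             if src_tokens[x - 1].lower() in SRC_CLASSES
             and tgt_tokens[y - 1].lower() in TGT_CLASSES]
    return src, tgt, slots


src_pattern, tgt_pattern, slots = learn_pattern(
    "Sharon nifgash hayom im bush".split(),
    "Sharon met with Bush today".split(),
    [(1, 1), (2, 2), (3, 5), (4, 3), (5, 4)],  # assumed word alignment
)
print(" ".join(src_pattern), "->", " ".join(tgt_pattern))
print(" ".join(f"(X{x}::Y{y})" for x, y in slots))
```

With the assumed alignment, the surviving slot links are the same four constraints listed in the learned rule on the slide.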

6
Transfer Rule Formalism
SL: the old man, TL: ha-ish ha-zaqen
NP::NP : DET ADJ N -> DET N DET ADJ
(
(X1::Y1)
(X1::Y3)
(X2::Y4)
(X3::Y2)
((X1 AGR) 3-SING)
((X1 DEF) DEF)
((X3 AGR) 3-SING)
((X3 COUNT) )
((Y1 DEF) DEF)
((Y3 DEF) DEF)
((Y2 AGR) 3-SING)
((Y2 GENDER) (Y4 GENDER))
)

Type information
Part-of-speech/constituent information
Alignments
x-side constraints
y-side constraints
xy-constraints,
e.g. ((Y1 AGR) (X1 AGR))
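
To make the role of the alignments concrete, here is a minimal sketch of how the NP rule above could be held in a data structure and used to reorder a phrase. It is an assumption-laden illustration, not the transfer engine: the `TransferRule` class, the toy `LEXICON`, and the helper names are hypothetical, and the x-side, y-side and xy feature constraints are ignored entirely.

```python
from dataclasses import dataclass


@dataclass
class TransferRule:
    src_type: str
    tgt_type: str
    src_pattern: list   # source-side constituent sequence
    tgt_pattern: list   # target-side constituent sequence
    alignments: list    # 1-based (x, y) pairs, as in (X1::Y1)


NP_RULE = TransferRule(
    "NP", "NP",
    ["DET", "ADJ", "N"],
    ["DET", "N", "DET", "ADJ"],
    [(1, 1), (1, 3), (2, 4), (3, 2)],
)

# Hypothetical toy lexicon for the slide's example phrase.
LEXICON = {"the": "ha-", "old": "zaqen", "man": "ish"}


def apply_rule(rule, src_words):
    """Fill each target position from the source word it is aligned to."""
    if len(src_words) != len(rule.src_pattern):
        raise ValueError("phrase length does not match the source pattern")
    y_to_x = {y: x for x, y in rule.alignments}
    return [LEXICON[src_words[y_to_x[y] - 1]]
            for y in range(1, len(rule.tgt_pattern) + 1)]


def detokenize(tokens):
    """Attach prefix tokens ending in '-' to the following word."""
    text = ""
    for tok in tokens:
        text += tok if tok.endswith("-") else tok + " "
    return text.strip()


print(detokenize(apply_rule(NP_RULE, ["the", "old", "man"])))
```

Note that X1 is aligned to both Y1 and Y3, which is how the single English determiner ends up repeated on the target side.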

7
The Transfer Engine
8
Partial Parsing

Input: full text in the foreign language
Output: translation of extracted/matched text
Goal: extract by effectively matching transfer rules with the full text
Identify/parse NEs and words in restricted vocabulary
Identify transfer-rule (source-side) patterns
Transfer Engine produces a complete lattice of transfer translations

Example: Sharon, meluve b-sar ha-xuc shalom, yipagesh im bush hayom
Labels marked in the example: NE-P, NE-P, NE-P, TE
Extracted translation: Sharon will meet with Bush today
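
One way to picture source-side matching over a full sentence is an in-order match that skips unmatched material. The sketch below is a deliberate simplification and not the AMTEXT partial parser or transfer engine, which builds a lattice instead. The `LABELS` lexicon, the `PATTERN` list (written by hand for this one sentence) and the function names are hypothetical.

```python
import re

# Hypothetical labels for the restricted vocabulary and named entities.
LABELS = {"sharon": "NE-P", "shalom": "NE-P", "bush": "NE-P",
          "yipagesh": "MEET-V", "hayom": "TE"}

# Hypothetical source-side pattern; "im" is a literal word.
PATTERN = ["NE-P", "MEET-V", "im", "NE-P", "TE"]


def tokenize(text):
    """Split on whitespace and commas."""
    return re.findall(r"[^\s,]+", text)


def match_with_skips(tokens, pattern):
    """Return token indices matching the pattern in order, or None.

    Tokens between matched elements are skipped, so extra material
    does not block the match.
    """
    matched, pos = [], 0
    for element in pattern:
        while pos < len(tokens):
            tok = tokens[pos].lower()
            pos += 1
            if LABELS.get(tok, tok) == element:
                matched.append(pos - 1)
                break
        else:  # ran out of tokens before finding this element
            return None
    return matched


tokens = tokenize("Sharon, meluve b-sar ha-xuc shalom, yipagesh im bush hayom")
hit = match_with_skips(tokens, PATTERN)
print([tokens[i] for i in hit])
```

Here the appositive between the commas is skipped and the match covers Sharon, yipagesh, im, bush and hayom.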
9
Post Processing

Translation Selection Module: select the most complete and coherent translation from the lattice based on scoring heuristics
Structure Extraction: extract translated entities from the pattern and display them in a structured table format
Output Display: Perl scripts construct an HTML page for displaying complete translation results

10
Translation Selection Module Features

Goal: a scoring function that can identify the most likely best match
Lattice arc features from the transfer engine:
matched range of source
matched parts of target
transfer score
partial parse

11
Lattice Example

Arafat to meet Peres in Brussels on Monday
ErfAt yltqy byryz msAA AlAvnyn fy brwksl
(1 1 "Arafat" 3 "ErfAt" "(PNAME,0 "Arafat")")
(2 2 "will meet with" 3 "yltqy" "(MEET-V,5 "will meet with")")
(3 3 "Peres" 3 "byryz" "(PNAME,1 "Peres")")
(1 3 "Arafat will meet with Peres" 3 "ErfAt yltqy byryz" "((S,11 (PERSON,1 (PNAME,0 "Arafat") ) (MEET-V,5 "will meet with") (PERSON,1 (PNAME,1 "Peres") ) ) )")
(4 4 "msAA" 3 "msAA" "(UNK,0 "msAA")")
(5 5 "Monday" 3 "AlAvnyn" "(DAY,0 "Monday")")
(4 5 "on Monday" 2.9 "msAA AlAvnyn" "((TE,4 (LITERAL "on")(DAY,0 "Monday") ) )")
(1 5 "Arafat will meet with Peres on Monday" 3.2 "ErfAt yltqy byryz msAA AlAvnyn" "((S,9 (PERSON,1 (PNAME,0 "Arafat") ) (MEET-V,5 "will meet with") (PERSON,1 (PNAME,1 "Peres") ) (TE,4 (LITERAL "on")(DAY,0 "Monday") ) ) )")
(1 5 "Arafat will meet with Peres Monday" 3.1 "ErfAt yltqy byryz msAA AlAvnyn" "((S,9 (PERSON,1 (PNAME,0 "Arafat") ) (MEET-V,5 "will meet with") (PERSON,1 (PNAME,1 "Peres") ) (TE,5 (DAY,0 "Monday") ) ) )")
(6 6 "fy" 3 "fy" "(UNK,2 "fy")")
(7 7 "Brussels" 3 "brwksl" "(PLACE,0 "Brussels")")

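Each lattice arc in the example reads as start position, end position, target text, score, source text, and transfer structure. A minimal sketch of pulling those fields apart follows. It assumes one arc per line with the fields in exactly that order and with unescaped inner quotes, as in this transcript; the `Arc` tuple, `ARC_RE` and `parse_arc` are hypothetical names, not part of the system.

```python
import re
from typing import NamedTuple


class Arc(NamedTuple):
    start: int
    end: int
    target: str
    score: float
    source: str
    structure: str


# target and source are matched lazily; the structure takes the rest,
# because it contains nested quotes of its own.
ARC_RE = re.compile(
    r'^\((\d+) (\d+) "(.*?)" (\d+(?:\.\d+)?) "(.*?)" "(.*)"\)$'
)


def parse_arc(line):
    m = ARC_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a lattice arc: {line!r}")
    start, end, target, score, source, structure = m.groups()
    return Arc(int(start), int(end), target, float(score), source, structure)


arc = parse_arc(
    '(4 5 "on Monday" 2.9 "msAA AlAvnyn" '
    '"((TE,4 (LITERAL "on")(DAY,0 "Monday") ) )")'
)
print(arc.start, arc.end, arc.target, arc.score, arc.source)
```

The lazy/greedy split is a design choice forced by the format: without escaping, only the position of the score and the final `")` reliably delimit the fields.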
12
Example Extracting Features

1 5 -> length (tokens) of source segment (ar) (1)
"Arafat will meet with Peres Monday" -> length of translated segment (2)
3.1 -> transfer engine score (3)
"ErfAt yltqy byryz msAA AlAvnyn" (tokens 1 2 3 4 5) -> length of source segment (4)
"((S,9 (PERSON,1 (PNAME,0 "Arafat") ) (MEET-V,5 "will meet with") (PERSON,1 (PNAME,1 "Peres") ) (TE,5 (DAY,0 "Monday") ) ) )" -> transfer structure: full frame (S) or not? (5)
Secondary feature (6): relative length of (2) over (4). The smaller the ratio, the more concise the source-language match (less extraneous material, i.e. less chance of mistranslation).
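
The six features above can be computed directly from one arc. The sketch below is one reading of the slide, not the system's code: the function name and dictionary keys are hypothetical, lengths are counted in whitespace-separated tokens, and "full frame" is taken to mean that the transfer structure is rooted in S.

```python
def extract_features(start, end, target, score, source, structure):
    """Compute features (1) to (6) for a single lattice arc."""
    target_len = len(target.split())
    source_len = len(source.split())
    return {
        "source_span": end - start + 1,        # (1) from the arc range
        "target_len": target_len,              # (2)
        "transfer_score": score,               # (3)
        "source_len": source_len,              # (4) from the source string
        # (5) assumed test: structure begins with an S node
        "full_frame": structure.lstrip("( ").startswith("S,"),
        "length_ratio": target_len / source_len,  # (6) = (2) over (4)
    }


features = extract_features(
    1, 5,
    "Arafat will meet with Peres Monday",
    3.1,
    "ErfAt yltqy byryz msAA AlAvnyn",
    '((S,9 (PERSON,1 (PNAME,0 "Arafat") ) (MEET-V,5 "will meet with") '
    '(PERSON,1 (PNAME,1 "Peres") ) (TE,5 (DAY,0 "Monday") ) ) )',
)
print(features)
```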

13
Selecting Best Translation
For each parse Pj in the lattice, calculate a score Sj based on features fi with weight coefficients wi (the formula appears on the original slide).
Weights wi are trained by hill climbing (training set / manual reference parse).
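
The scoring formula itself is not reproduced in this transcript. Purely as an assumption, the sketch below uses a weighted linear combination, Sj = sum over i of wi * fi(Pj), and a simple coordinate-wise hill climber that accepts a weight change only when accuracy against the reference parses improves. The function names, step size and toy training data are all hypothetical.

```python
def score(features, weights):
    """Assumed form of the score: weighted sum of feature values."""
    return sum(w * f for w, f in zip(weights, features))


def select_best(lattice, weights):
    """Index of the highest-scoring parse (first one wins ties)."""
    return max(range(len(lattice)), key=lambda j: score(lattice[j], weights))


def accuracy(training, weights):
    """Fraction of lattices where the selected parse is the reference parse."""
    hits = sum(select_best(lat, weights) == ref for lat, ref in training)
    return hits / len(training)


def hill_climb(training, weights, step=0.5, max_iters=100):
    """Nudge one weight at a time; keep a change only if accuracy improves."""
    weights = list(weights)
    best = accuracy(training, weights)
    for _ in range(max_iters):
        improved = False
        for i in range(len(weights)):
            for delta in (step, -step):
                candidate = list(weights)
                candidate[i] += delta
                acc = accuracy(training, candidate)
                if acc > best:
                    weights, best, improved = candidate, acc, True
        if not improved:
            break
    return weights, best


# Toy data: each parse is [transfer score, full-frame flag];
# the second element of each pair is the index of the reference parse.
training = [
    ([[3.0, 0], [2.9, 1]], 1),
    ([[3.1, 1], [3.2, 0]], 0),
]
trained, acc = hill_climb(training, [1.0, 0.0])
print(trained, acc)
```

On this toy set the climber learns to give the full-frame flag a positive weight, which is enough to prefer the complete parse over a slightly higher-scoring partial one.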
14
Proof-of-Concept System

Arabic-to-English
Newswire text (available from TIDES)
Very limited set of actions (X meet Y)
Limited collection of translation patterns
<Person-NE> <meet-verb> <Person-NE> <LOC> <TE>
Limited vocabulary and NE lexicon

15
System Development

Training corpus of 535 short sentences translated and aligned by a bilingual informant:
258 simple meeting sentences
120 Temporal Expressions
105 Location Expressions
52 Title Expressions
Translation lexicon of Named Entities (person names, organizations and locations) converted from Fei Huang's NE translation/transliteration work
Pattern generalizations semi-automatically learned from the training data
Patterns manually enhanced with skipping markers
Initial system integrated
Development with informant on a 74-sentence development set

16
Resulting System

Transfer Grammar contains:
21 transfer pattern rules
12 Meet Verb rules
4/17/11/17 Person/TE/LOC/PTitle high-level rules
Transfer Lexicon contains 3070 entries (mostly names and locations)
Estimated development effort/time:
20 hours with informant
50 hours of lexical and rule development

17
Evaluation

Development set of 74 sentences
Test set of 76 unseen sentences with meeting information
Identified the subset of each set on which meeting patterns could potentially apply ("Good"):
53 development sentences
44 test sentences

18
Evaluation

Translation-based:
Unigram token-based retrieval metrics: precision / recall / F1
Entity-based:
Recall for each role in the meeting frame (V, P1, P2, LOC and TE)
Partial recall credit for partial matches
Partial credit (50%) for P1/P2 role interchange
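
Both kinds of metric are easy to state in code. The sketch below is an illustration under assumptions, not the evaluation scripts used for the slides: unigram overlap is counted as a clipped multiset intersection, the role-interchange credit is taken to be 0.5, and partial credit for partial matches is not modeled. All function names and the example frames are hypothetical.

```python
from collections import Counter


def unigram_prf(hypothesis, reference):
    """Unigram precision, recall and F1 with clipped token counts."""
    hyp = Counter(hypothesis.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((hyp & ref).values())
    precision = overlap / sum(hyp.values()) if hyp else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


ROLES = ("V", "P1", "P2", "LOC", "TE")
SWAP = {"P1": "P2", "P2": "P1"}


def role_recall(hyp_frame, ref_frame, swap_credit=0.5):
    """Per-role recall credit; swapped P1/P2 fillers earn swap_credit (assumed 0.5)."""
    credit = {}
    for role in ROLES:
        if role not in ref_frame:
            continue
        if hyp_frame.get(role) == ref_frame[role]:
            credit[role] = 1.0
        elif role in SWAP and hyp_frame.get(SWAP[role]) == ref_frame[role]:
            credit[role] = swap_credit
        else:
            credit[role] = 0.0
    return credit


p, r, f1 = unigram_prf("Arafat will meet with Peres Monday",
                       "Arafat will meet with Peres on Monday")
print(round(p, 3), round(r, 3), round(f1, 3))

ref = {"V": "meet", "P1": "Arafat", "P2": "Peres", "TE": "Monday"}
hyp = {"V": "meet", "P1": "Peres", "P2": "Arafat"}
print(role_recall(hyp, ref))
```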

19
Evaluation Results
20
Demonstration

http://www-2.cs.cmu.edu/afs/cs/user/alavie/Avenue/tmp/demo20sep/met.dev.htm

21
Conclusions

Attractive methodology for joint extraction and translation of Essential Elements of Information (EEIs) from full foreign-language texts
Rapid development: circumvents the need for developing high-quality full MT or high-quality IE technology for the foreign source language
Effective use of bilingual informants
Main open question: scalability
Can this methodology be effective with much broader and more complex types of extracted EEIs?
Is automatic learning of generalized patterns feasible and effective in such more complex scenarios?
Can the selection heuristics effectively cope with the vast amounts of ambiguity expected in a large-scale system?