AMTEXT: Extraction-based MT for Arabic

Description:

Laura Kieras, Peter Jansen. Informant: Loubna El Abadi. Sep 21, ... Sharon, meluve b-sar ha-xuc shalom, yipagesh im bush hayom. Sharon will meet with Bush today ...

Provided by: AlonL

1
AMTEXT: Extraction-based MT for Arabic
  • Faculty: Alon Lavie, Jaime Carbonell
  • Students and Staff: Laura Kieras, Peter Jansen
  • Informant: Loubna El Abadi

2
Goals and Approach
  • Analysts are often looking for limited, concrete
    information within the text, so full MT may not be
    necessary
  • Alternative: rather than full MT followed by
    extraction, first extract and then translate only
    the extracted information
  • AMTEXT approach:
  • Learn extraction patterns and their translations
    from small amounts of human-translated and
    aligned data
  • Combine with broad-coverage Named-Entity
    translation lexicons
  • System output: translation of the extracted
    information in a structured representation

3
AMTEXT: Extraction-based MT
[Diagram: system architecture. Box labels:]
  • Word-aligned elicited data
  • Source Text
  • Learning Module
  • Run-Time Extract-Transfer System
  • Transfer Rules
  • Filled Template
  • Partial Parser and Transfer Engine
  • Extracted Target Text
  • Post-processor and Extractor
  • NE Translation Lexicon
  • Word Translation Lexicon
Example transfer rule shown in the diagram:
S::S NE-P pagash et NE-P TE -> NE-P met with NE-P TE
((X1::Y1) (X4::Y4) (X5::Y5))
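
The example transfer rule in the diagram pairs a source-side pattern with a target-side pattern and lists which source positions feed which target positions. The Python sketch below illustrates how such alignments might be applied; it is not the project's transfer engine. It assumes the alignments are written as X1::Y1 (source position 1 fills target position 1), and the function names and the filler dictionary are hypothetical.

```python
import re

def parse_alignments(spec):
    """Parse '((X1::Y1) (X4::Y4) (X5::Y5))' into {target_pos: source_pos} (1-based)."""
    return {int(y): int(x) for x, y in re.findall(r"X(\d+)::Y(\d+)", spec)}

def apply_rule(target_side, alignments, source_fillers):
    """Build the target string for one matched rule.

    Aligned target positions copy the (already translated) constituent from
    the aligned source position; unaligned positions are literal target words.
    """
    out = []
    for pos, symbol in enumerate(target_side, start=1):
        if pos in alignments:
            out.append(source_fillers[alignments[pos]])
        else:
            out.append(symbol)
    return " ".join(out)

# Target side of the rule from the diagram.
target_side = ["NE-P", "met", "with", "NE-P", "TE"]
alignments = parse_alignments("((X1::Y1) (X4::Y4) (X5::Y5))")

# Hypothetical fillers: English translations of the constituents found at
# source positions 1, 4 and 5 (e.g. looked up in an NE translation lexicon).
fillers = {1: "Sharon", 4: "Bush", 5: "today"}

print(apply_rule(target_side, alignments, fillers))  # Sharon met with Bush today
```

Source positions 2 and 3 (pagash, et) have no alignment, so their translation comes entirely from the literal words "met with" on the target side of the rule.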
4
Elicitation Example
5
Partial Parsing
  • Input: full text in the foreign language
  • Output: translation of extracted/matched text
  • Goal: extract by effectively matching transfer
    rules with the full text
  • Identify/parse NEs and words in restricted
    vocabulary
  • Identify transfer-rule (source-side) patterns
  • Handle expected high levels of ambiguity

Sharon, meluve b-sar ha-xuc shalom, yipagesh im bush hayom
[Figure labels over the sentence: NE-P, NE-P, NE-P, TE]
Sharon will meet with Bush today
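
The steps on this slide (label NEs and restricted-vocabulary words, then find a rule's source-side pattern in the full text) can be sketched in Python as follows. This is a toy illustration under stated assumptions, not the actual partial parser: the lexicons are hypothetical, the pattern "NE-P yipagesh im NE-P TE" is made up by analogy with the example sentence, and the matcher greedily skips unmatched words, which ignores the ambiguity the slide warns about.

```python
# Hypothetical toy lexicons; the real NE and word lexicons are not shown in the slides.
NE_PERSONS = {"sharon", "shalom", "bush"}
TIME_EXPRESSIONS = {"hayom"}

def tag(tokens):
    """Label each token as NE-P, TE, or (for any other word) the word itself."""
    tagged = []
    for tok in tokens:
        key = tok.lower().strip(",")
        if key in NE_PERSONS:
            tagged.append(("NE-P", key))
        elif key in TIME_EXPRESSIONS:
            tagged.append(("TE", key))
        else:
            tagged.append((key, key))
    return tagged

def match_with_gaps(pattern, tagged):
    """Return the token indices that match the pattern in order, skipping
    unmatched tokens in between; return None if the pattern cannot be matched."""
    hits, i = [], 0
    for label in pattern:
        while i < len(tagged) and tagged[i][0] != label:
            i += 1
        if i == len(tagged):
            return None
        hits.append(i)
        i += 1
    return hits

tokens = "Sharon, meluve b-sar ha-xuc shalom, yipagesh im bush hayom".split()
pattern = ["NE-P", "yipagesh", "im", "NE-P", "TE"]  # hypothetical source-side pattern
print(match_with_gaps(pattern, tag(tokens)))  # [0, 5, 6, 7, 8]
```

The matched indices pick out Sharon, yipagesh, im, bush and hayom, and skip the intervening phrase "meluve b-sar ha-xuc shalom", which agrees with the English output on the slide. A real matcher would have to consider alternative matches (for example, starting from the second NE-P) rather than taking the first one found.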
6
Proof-of-Concept System
  • Funded by small year-0 ITIC/REFLEX
  • Arabic-to-English
  • Newswire text (available from TIDES)
  • Limited set of actions (X meet Y)
  • Limited translation patterns
  • <Person-NE> <meet-verb> <Person-NE> <LOC> <TE>
  • Limited vocabulary and NE lexicon
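
One way to picture the structured output for the "X meet Y" pattern is a small record with one field per slot. The Python sketch below is an assumption about what a filled template could look like; the field names, the MeetTemplate class, and the choice to treat the location and time slots as optional are all hypothetical and not taken from the slides.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MeetTemplate:
    """Hypothetical filled template for the 'X meet Y' action."""
    person_1: str
    person_2: str
    location: Optional[str] = None
    time: Optional[str] = None

def fill_template(slots):
    """Fill a MeetTemplate from (slot_name, english_text) pairs in pattern order."""
    persons = [text for name, text in slots if name == "Person-NE"]
    if len(persons) != 2:
        raise ValueError("expected exactly two Person-NE slots")
    optional = {name: text for name, text in slots if name in ("LOC", "TE")}
    return MeetTemplate(persons[0], persons[1], optional.get("LOC"), optional.get("TE"))

# Slots for the example sentence, which has no location.
slots = [("Person-NE", "Sharon"), ("meet-verb", "will meet with"),
         ("Person-NE", "Bush"), ("TE", "today")]
print(fill_template(slots))
```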

7
Demonstration
  • http://www-2.cs.cmu.edu/afs/cs/user/alavie/Avenue/tmp/demo20sep/met.dev.htm

8
Integration: Technical Issues
  • Components:
  • Converter of Arabic to Darwish representation
    and pre-processor (scripts)
  • Transfer Engine (C/C++)
  • Post-processor and extractor (Perl scripts)
  • Input: Arabic text in UTF-8
  • Output: formatted HTML page
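
The input and output ends of this pipeline (UTF-8 text in, an HTML page out) can be sketched as below. The slides say the real components are scripts, a C/C++ engine and Perl post-processing; Python is used here only for illustration, and the render_page function, the two-column table layout, and the sample strings are all hypothetical.

```python
import html

def render_page(pairs):
    """Render (source_text, extracted_translation) pairs as a simple HTML table."""
    rows = "\n".join(
        '<tr><td dir="rtl">{}</td><td>{}</td></tr>'.format(
            html.escape(src), html.escape(tgt))
        for src, tgt in pairs
    )
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>AMTEXT output</title></head>\n'
        "<body><table>\n" + rows + "\n</table></body></html>\n"
    )

# Stand-in for bytes read from a UTF-8 input file.
raw = "شارون".encode("utf-8")
source_text = raw.decode("utf-8")

page = render_page([(source_text, "Sharon <NE-P>")])
print(page)
```

Escaping with html.escape keeps markup characters in the extracted text (such as the angle brackets around slot labels) from being interpreted as HTML tags.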