Named Entities in Domain Unlimited Speech Translation - PowerPoint PPT Presentation

About This Presentation

Slides: 8
Provided by: AlonL
Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes



1
Named Entities in Domain Unlimited Speech Translation
  • Alex Waibel, Stephan Vogel, Tanja Schultz
  • Carnegie Mellon University, Interactive Systems Labs

2
Objective
  • Extraction and Translation of Arabic Named Entities from Speech
  • Problem
    • How do we do Domain-Unlimited Speech Translation?
    • What do we do with Named Entities in Speech?
      • Named Entities are Typically OoVs (Out-of-Vocabulary Words)
        → the Recognizer will Replace Them with a WRONG Word
        → the Named Entity is Unlikely to be Handled Right
    • Translation of Named Entities → Named Entities are Frequently not in the Lexicon

3
Approach: Speech Translation
  • Piggy-Back on STR-DUST (NSF-ITR Project)
    • Speech Translation on Domain Unlimited Speech Tasks
  • Approach
    • Recognition: Statistical Speech Recognition
    • Consolidation: Statistical Reduction and Extraction
    • Translation: Statistical MT
  • Opportunity
    • Cascade of Statistical Source-Channel Models
    • Integration and Optimization
    • Combine and Compute Joint Models
    • Working with Errors: Lattices to Communicate between Modules
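To make "combine and compute joint models" concrete, here is a minimal sketch of why passing more than one recognizer hypothesis between modules can help. It assumes each stage returns scored hypotheses, with an n-best list standing in for a lattice, and it scores a path by simply adding the per-stage log-probabilities. The functions `joint_decode` and `one_best_decode` and all scores are hypothetical illustrations, not the system's actual decoder.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical hypothesis type: (text, log-probability).
Hyp = Tuple[str, float]

def joint_decode(asr_nbest: List[Hyp],
                 mt_options: Dict[str, List[Hyp]]) -> Tuple[str, str, float]:
    """Pick the (source, target) pair with the highest combined log-score.

    Sketch only: an n-best list stands in for a recognition lattice, and the
    joint score is the sum of the per-stage log-probabilities.
    """
    best = ("", "", -math.inf)
    for src, asr_logp in asr_nbest:
        for tgt, mt_logp in mt_options.get(src, []):
            score = asr_logp + mt_logp  # product of stage probabilities in log space
            if score > best[2]:
                best = (src, tgt, score)
    return best

def one_best_decode(asr_nbest: List[Hyp],
                    mt_options: Dict[str, List[Hyp]]) -> Tuple[str, str, float]:
    """Conventional cascade: commit to the top recognizer hypothesis first."""
    src, asr_logp = max(asr_nbest, key=lambda h: h[1])
    tgt, mt_logp = max(mt_options[src], key=lambda h: h[1])
    return src, tgt, asr_logp + mt_logp

if __name__ == "__main__":
    asr = [("reco_a", math.log(0.6)), ("reco_b", math.log(0.4))]
    mt = {"reco_a": [("trans_a", math.log(0.2))],
          "reco_b": [("trans_b", math.log(0.9))]}
    print(one_best_decode(asr, mt)[:2])  # ('reco_a', 'trans_a'): 0.6 * 0.2 = 0.12
    print(joint_decode(asr, mt)[:2])     # ('reco_b', 'trans_b'): 0.4 * 0.9 = 0.36
```

In this toy case the 1-best cascade locks in the recognizer's top guess, while the joint search prefers the second recognition hypothesis because its translation is far more probable.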

4
Approach: Named Entities
  • Two-Pass Decoding Strategy
  • OoVs in Speech
    • Recover Named Entity in Dictionary
      • Identify Relevant Names from Very Large Name Lists
      • Search for Relevant New Names on the Internet
      • Insert Named Entities in Dictionary, Iterate
    • New Word Model
      • Model Unseen Words by a New-Word Model
      • Assign Named Entity Tag to New Word
  • Bi-Lingual Named Entity Tagging
    • Recover Named Entity
      • Identify Relevant Names from Translation Output
      • IR (Information Retrieval) of Relevant Texts in Target Language
      • Use Transliteration Model to Update Lexicon
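The dictionary-recovery loop (identify relevant names from a very large name list, insert them into the dictionary, decode again) can be sketched as follows. This is a hypothetical illustration: string similarity on romanized words via Python's `difflib` stands in for the acoustic or phonetic matching a real recognizer would need, and the helper names, the 0.7 cutoff, and the toy data are all assumptions.

```python
import difflib
from typing import Iterable, List, Set

def recover_names(first_pass: List[str], lexicon: Set[str],
                  name_list: Iterable[str], cutoff: float = 0.7) -> Set[str]:
    """Names from a large name list that resemble first-pass words but are
    missing from the recognition dictionary (hypothetical helper).

    String similarity on romanized forms stands in for a phonetic match.
    """
    names = list(name_list)
    found: Set[str] = set()
    for word in first_pass:
        # Best-scoring candidate above the cutoff, if any.
        for match in difflib.get_close_matches(word, names, n=1, cutoff=cutoff):
            if match not in lexicon:
                found.add(match)
    return found

def expand_lexicon(first_pass: List[str], lexicon: Set[str],
                   name_list: Iterable[str]) -> Set[str]:
    """Dictionary for the second decoding pass: old entries plus recovered names."""
    return lexicon | recover_names(first_pass, lexicon, name_list)

if __name__ == "__main__":
    # Toy first pass: the OoV name "hafz" came out as the in-vocabulary word "half".
    first_pass = ["half", "in", "baghdad"]
    lexicon = {"half", "in", "baghdad"}
    names = ["hafz", "baghdad", "basra"]
    print(sorted(expand_lexicon(first_pass, lexicon, names)))
```

A second decoding pass with the expanded dictionary can now output "hafz"; iterating repeats the lookup on the new hypothesis.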

5
Input/Output
  • Input
    • Speech in source language (Arabic)
    • Text in source language (Arabic)
  • Output
    • English translation of transcript
    • English translation of extracted entities

Example (figure)
  • Reco: recognizer output for an Arabic sentence [Arabic script not recoverable]
  • NE Search and Translation
    • Name: Abu Hafz
    • Organization: al-Qaida
    • Location: Baghdad
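The input/output specification and the example entity labels suggest an output record along these lines. This is a hypothetical sketch: the class names `NamedEntity` and `TranslationOutput`, the field names, and the tag strings are illustrative assumptions, not the system's actual format.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class NamedEntity:
    english: str  # English translation or transliteration of the entity
    tag: str      # entity type; "Name", "Organization", "Location" are assumed labels

@dataclass
class TranslationOutput:
    transcript_en: str  # English translation of the (recognized) transcript
    entities: List[NamedEntity] = field(default_factory=list)

    def by_tag(self, tag: str) -> List[str]:
        """English forms of all extracted entities carrying the given tag."""
        return [e.english for e in self.entities if e.tag == tag]

if __name__ == "__main__":
    out = TranslationOutput(
        transcript_en="<English translation of the transcript>",  # placeholder text
        entities=[NamedEntity("Abu Hafz", "Name"),
                  NamedEntity("al-Qaida", "Organization"),
                  NamedEntity("Baghdad", "Location")],
    )
    print(out.by_tag("Location"))  # ['Baghdad']
```

Keeping the entity list separate from the transcript translation mirrors the two output streams, so entity translations can be evaluated on their own.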
6
Evaluation
  • Correct Named Entity Detection
    • Word Correct from Arabic Speech
    • NE-Tag Correct from Arabic Transcript
  • Correct Translation
    • Of Output Text (NIST, BLEU)
    • Of Output Named Entities
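The slides do not define how "Word Correct" is scored. Assuming the usual definition, the fraction of reference words that are matched, (N - S - D) / N after a minimum-edit-distance alignment, a minimal sketch looks like this. The function name `word_correct` is hypothetical, and unit costs for substitutions, insertions and deletions are a simplifying assumption (scoring tools often weight them differently).

```python
from typing import List

def word_correct(ref: List[str], hyp: List[str]) -> float:
    """Fraction of reference words correctly recognized, (N - S - D) / N,
    using a unit-cost Levenshtein alignment (simplifying assumption)."""
    n, m = len(ref), len(hyp)
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    # Backtrace along one optimal alignment, counting hits (identical aligned words).
    i, j, hits = n, m, 0
    while i > 0 and j > 0:
        if ref[i - 1] == hyp[j - 1] and d[i][j] == d[i - 1][j - 1]:
            hits += 1
            i, j = i - 1, j - 1
        elif d[i][j] == d[i - 1][j - 1] + 1:
            i, j = i - 1, j - 1  # substitution
        elif d[i][j] == d[i - 1][j] + 1:
            i -= 1               # deletion
        else:
            j -= 1               # insertion
    return hits / n if n else 0.0

if __name__ == "__main__":
    ref = "abu hafz was in baghdad".split()
    print(word_correct(ref, "abu half was in baghdad".split()))  # 0.8
```

Insertions do not lower this figure, which is why it is usually reported alongside word accuracy or error rate.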

7
First Results: NE Translation (Chinese)
Test data: 887 sentences

  System        Small track NIST score   Large track NIST score
  baseline      6.57                     7.82
  Offline NE    6.61                     7.87
  Online NE     6.87                     7.98
  1. Online NE translation improves the NIST score on both tracks
  2. Online NE translation works better on uncommon NEs and gives a larger improvement than offline NE translation
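The size of each improvement follows directly from the table. This short sketch only does that arithmetic; the dictionary name `NIST` and the helper `gains` are illustrative.

```python
from typing import Dict, Tuple

# NIST scores from the table above: (small track, large track), 887 test sentences.
NIST: Dict[str, Tuple[float, float]] = {
    "baseline": (6.57, 7.82),
    "Offline NE": (6.61, 7.87),
    "Online NE": (6.87, 7.98),
}

def gains(scores: Dict[str, Tuple[float, float]],
          base: str = "baseline") -> Dict[str, Tuple[float, float]]:
    """Absolute NIST gain over the baseline per track, rounded to 2 decimals."""
    b_small, b_large = scores[base]
    return {name: (round(small - b_small, 2), round(large - b_large, 2))
            for name, (small, large) in scores.items() if name != base}

if __name__ == "__main__":
    for name, (small, large) in gains(NIST).items():
        print(f"{name}: small {small:+.2f}, large {large:+.2f}")
```

Offline NE translation gains +0.04 (small) and +0.05 (large); online NE translation gains +0.30 and +0.16, so on the small track the online gain is several times the offline one.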