Forest Driven Rule-based Spoken Language Translation
KAMATANI Satoshi, FURIHATA Kentaro and CHINO Tetsuro
Corporate Research & Development Center, Toshiba Corporation
1, Komukai Toshiba-cho, Saiwai-ku, Kawasaki, 212-8583, Japan
e-mail: satoshi.kamatani_at_toshiba.co.jp
CJNLP 2006: The 6th China-Japan Natural Language Processing Joint Research Promotion Conference
  • 1. Introduction
  • Towards realization of Spoken language
    translation (SLT)
  • Major machine translation methods
  • Rule based machine translation (RBMT)
  • Example based machine translation (EBMT)
  • Stochastic machine translation (SMT)
  • SLT based on RBMT
  • Market-proven translation quality
  • Comparatively easy to respond to requests for
    modification
  • Beneficial use of translation rules accumulated
    over many years

2.2.4 Forest dependency analysis
  • Co-occurrence Probability Model
  • Learned by the EM-algorithm-based method proposed by
    Torisawa et al. (2001)
  • h: head word (modified word)
  • m: modifier
  • rel: relation between the head word and the
    modifier
  • C: set of hidden co-occurrence classes
  • Co-occurrence Preference
  • INT(x): a function that rounds x down to an integer
  • INT ( average of whole co-occurrence
    probabilities ) 12
  • EPS
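
The probability formula itself did not survive in this transcript. As a minimal sketch, assuming a standard latent-class factorization in which the head word h and the pair (rel, m) are independent given a hidden class c in C, the co-occurrence probability could be computed as below. The factorization, the function name, and all toy parameter values are assumptions for illustration and are not taken from the slides.

```python
def cooccurrence_probability(h, rel, m, p_class, p_head, p_mod):
    """Sum over hidden classes c of P(c) * P(h | c) * P((rel, m) | c).

    This factorization is an assumed latent-class form, not the
    formula from the original slide (which was lost in extraction).
    """
    return sum(
        p_class[c] * p_head[c].get(h, 0.0) * p_mod[c].get((rel, m), 0.0)
        for c in p_class
    )

# Hypothetical parameters, as an EM procedure might estimate them.
p_class = {"c1": 0.5, "c2": 0.5}
p_head = {
    "c1": {"eat": 0.8, "read": 0.2},
    "c2": {"eat": 0.1, "read": 0.9},
}
p_mod = {
    "c1": {("obj", "soup"): 0.7, ("obj", "book"): 0.3},
    "c2": {("obj", "soup"): 0.2, ("obj", "book"): 0.8},
}

p_soup = cooccurrence_probability("eat", "obj", "soup", p_class, p_head, p_mod)
p_book = cooccurrence_probability("eat", "obj", "book", p_class, p_head, p_mod)
```

With these toy values, "soup" is a more probable object of "eat" than "book", which is the kind of preference the dependency analysis can use to rank competing attachments.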

Figure (learning flow): corpus of co-occurrence triples with counts (Japanese text lost in extraction) -> learning -> co-occurrence model -> round down -> co-occurrence preference.
Example scores from the figure: co-occurrence preference for "mj modifies mp": 1.0; conflict with the clause dependency condition: -0.8.
However, SLT requires robust analysis and a bridge
between spoken and written language.
  • 2. Forest Driven Spoken Language Translation
  • 2.1 Overview
  • It evaluates all possible interpretation
    candidates fairly, without unreasonable
    assumptions.
  • All syntactic and semantic preferences are
    evaluated by one integrated mechanism on the
    forest structures.

Figure (processing flow): Source Language Input -> steps (a) to (e) -> Target Language Output
  • (a) Robust GLR Parsing: the enumeration of all possible
    interpretation candidates, as a syntax forest.
  • (b) Syntactic Pruning
  • (c) Partial forest transfer: the interpretation of
    spoken-language-specific structures and their transfer to
    equivalent but simpler structures.
  • (d) Forest driven dependency analysis: the estimation of the
    optimum interpretation, as the optimum dependency structure.
  • (e) Transfer & Generation: practical use of rich translation
    knowledge.

Example clause rules shown in the figure
  • Clause type cm_ga. Pattern: 0.pos = case marker, 0.surface = ?,
    -1.pos = noun. Active in: modifier. Dependency: co-occurred with
    other cm_ga -0.9; co-occurred with ad_ha -0.5.
  • Clause type minami_B (classified into class B by Minami).
    Pattern: 0.pos = conjunctive marker, 0.surface = ??,
    -1.pos = verb or -1.pos = auxiliary verb. Active in: modified.
    Dependency: co-occurred with other cm_ga -0.8; co-occurred with
    minami_A -1.0.
  • 2.2 Detail of each step
  • 2.2.1 Robust GLR Parsing
  • Robust CFG
  • Designed for fragmentary inputs
  • Captures clause structure
  • Introduces the ill-formedness marker "weak"
  • The GLR parser derives all possible interpretations
    as a syntax forest.
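
To make the idea of a syntax forest concrete, here is a minimal sketch of a packed forest and a routine that enumerates the trees it encodes. The dictionary encoding, the vertex names, and the function name are hypothetical choices for illustration; the slides do not specify the parser's internal data structures.

```python
from itertools import product

# Hypothetical packed-forest encoding: each vertex maps to a list of
# alternative derivations; each derivation is a tuple of child vertices.
# A vertex whose only derivation is the empty tuple is a leaf (a word).
forest = {
    "S":   [("NP", "VP"), ("NP2", "V")],  # two competing analyses of S
    "NP":  [("n1",)],
    "NP2": [("n1", "p")],
    "VP":  [("p", "V")],
    "V":   [("v1",)],
    "n1":  [()],
    "p":   [()],
    "v1":  [()],
}

def enumerate_trees(forest, vertex):
    """Yield every tree rooted at `vertex` as a nested (label, children) tuple."""
    for derivation in forest[vertex]:
        # All subtrees available for each child of this derivation.
        child_options = [list(enumerate_trees(forest, c)) for c in derivation]
        # One tree per combination of child subtrees.
        for children in product(*child_options):
            yield (vertex, children)

trees = list(enumerate_trees(forest, "S"))
```

The forest shares sub-structures such as `n1` and `V` between analyses, so later steps can operate on the compact forest rather than on each enumerated tree.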

  • 3. Evaluation
  • 3.1 Conditions of Experiment
  • We developed a prototype system
  • Evaluation target
  • Japanese-to-English speech translation (travel
    domain, open data)
  • 702 utterances (13.4 characters and 7.2 morphemes on
    average)
  • 3.2 Parsing Accuracy
  • All of the test utterances are accepted by our
    grammar.
  • Reduce the number of parsing ambiguities
  • Improve the ratio of correct dependency (CDR) in
    the forest
  • A t-test confirmed a significant improvement at
    the 0.05 level.

  • 2.2.2 Syntactic Pruning
  • Evaluate syntactic well-formedness and prune
    ill-formed structures
  • Estimate ill-formedness for all interpretations
    in the forest.
  • Assign a penalty to vertices marked "weak".
  • Aggregate the minimum possible penalty in a
    bottom-up manner.
  • Extract the syntactically preferred part.
  • Select the vertices with the lowest penalty
    step by step in a top-down manner.
  • Extract a well-formed structure consisting
    only of preferred vertices.
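
The two passes above can be sketched as follows. This is a minimal illustration under stated assumptions: the forest is a hypothetical dictionary from each vertex to its alternative derivations, every weak-marked vertex costs a penalty of 1, and ties are all kept. None of these details are given in the slides.

```python
# Hypothetical packed forest: vertex -> list of alternative derivations,
# each derivation a tuple of child vertices; [()] marks a leaf.
forest = {
    "S": [("A", "B"), ("W",)],
    "A": [("a",)],
    "B": [("b",)],
    "W": [("a", "b")],
    "a": [()],
    "b": [()],
}
weak = {"W"}  # vertices produced by rules carrying the "weak" marker

def min_penalty(forest, weak, vertex, memo=None):
    """Bottom-up pass: lowest total penalty of any tree rooted at `vertex`."""
    if memo is None:
        memo = {}
    if vertex not in memo:
        own = 1 if vertex in weak else 0  # assumed unit penalty
        memo[vertex] = own + min(
            sum(min_penalty(forest, weak, c, memo) for c in derivation)
            for derivation in forest[vertex]
        )
    return memo[vertex]

def prune(forest, weak, root):
    """Top-down pass: keep only derivations that reach the minimum penalty."""
    memo = {}
    min_penalty(forest, weak, root, memo)
    pruned, stack = {}, [root]
    while stack:
        v = stack.pop()
        if v in pruned:
            continue
        own = 1 if v in weak else 0
        best = memo[v] - own  # best achievable penalty among v's derivations
        kept = [d for d in forest[v] if sum(memo[c] for c in d) == best]
        pruned[v] = kept
        for d in kept:
            stack.extend(d)
    return pruned

pruned = prune(forest, weak, "S")
```

In this toy forest the analysis through the weak vertex `W` is discarded, and the result is still a forest, so remaining ambiguity is passed on to the later steps.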

  • F: a syntax forest
  • t: a syntax tree in F
  • N_F: number of trees t ∈ F
  • D_c: number of correct dependencies
  • D_m: number of dependencies in t that match
    the correct ones
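
The CDR formula itself did not survive in this transcript. One plausible reading of the symbols above, offered as an assumption rather than the authors' stated definition, is the average of D_m / D_c over the N_F trees in the forest. The sketch below computes that quantity; the function name and the encoding of dependencies as (modifier, head) index pairs are hypothetical.

```python
def correct_dependency_ratio(trees, gold):
    """Average of D_m / D_c over all trees in the forest (assumed definition).

    trees: list of sets of (modifier, head) pairs, one set per tree t in F
    gold:  set of correct (modifier, head) pairs
    """
    d_c = len(gold)    # D_c: number of correct dependencies
    n_f = len(trees)   # N_F: number of trees in the forest
    # len(t & gold) is D_m: dependencies in t that match the correct ones.
    return sum(len(t & gold) / d_c for t in trees) / n_f

# Hypothetical forest with two trees over three words.
gold = {(1, 2), (2, 3)}
trees = [
    {(1, 2), (2, 3)},  # fully correct tree
    {(1, 3), (2, 3)},  # one of two dependencies correct
]
cdr = correct_dependency_ratio(trees, gold)
```

Under this reading, pruning incorrect trees from the forest raises the ratio, which matches the stated goal of improving CDR in the forest.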
  • 3.3 Translation Quality
  • Visual evaluation by three people, decided by
    majority vote
  • Evaluate whether or not the translation conveys
    the intention of the original utterance.
  • Evaluate two systems
  • Forest Driven SLT system
  • A commercial MT Engine for written language.
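
The majority decision described above can be sketched as a simple vote count. The judgement values below are invented for illustration and are not results from the experiment; the function name is also hypothetical.

```python
from collections import Counter

def majority_decision(judgements):
    """Return the label chosen by most evaluators (assumes an odd number of binary votes)."""
    return Counter(judgements).most_common(1)[0][0]

# Hypothetical judgements: three evaluators per translation,
# True = the translation conveys the intention of the utterance.
votes = [
    (True, True, False),
    (False, False, True),
    (True, True, True),
]
decisions = [majority_decision(v) for v in votes]
acceptance_rate = sum(decisions) / len(decisions)
```

With three evaluators and a binary judgement, a tie is impossible, so every translation receives a definite decision.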
  • 2.2.3 Partial Forest Transfer
  • Conventional Transfer maps a tree to a tree.
  • It is widely and successfully used in
    conventional MT.
  • It converts the internal expressions based on
    the transfer rules.
  • However, it can handle only one tree structure.
  • Partial Forest Transfer maps a forest to
    a forest directly.
  • It bridges the gap between spoken and written
    languages.
  • It reduces ambiguity in parses.

Transfer and Partial Forest Transfer share the same set of transfer rules!
  • 4. Conclusion
  • Forest driven spoken language translation system
  • Robust and efficient processing for Japanese
    utterances.
  • Beneficial use of the conventional rule-based
    machine translator.
  • Future work
  • Utilization of machine learning for extracting
    clause rules.
  • Evaluation on other language pairs (e.g., Japanese
    to Chinese)