Title: Forest Driven Rule-based Spoken Language Translation
KAMATANI Satoshi, FURIHATA Kentaro and CHINO Tetsuro
Corporate Research & Development Center, Toshiba Corporation
1, Komukai Toshiba-cho, Saiwai-ku, Kawasaki, 212-8583, Japan
e-mail: satoshi.kamatani_at_toshiba.co.jp
CJNLP 2006: The 6th China-Japan Natural Language Processing Joint Research Promotion Conference
- 1. Introduction
- Towards realization of spoken language translation (SLT)
- Major machine translation methods
- Rule-based machine translation (RBMT)
- Example-based machine translation (EBMT)
- Stochastic machine translation (SMT)
- SLT based on RBMT
- Market-proven translation quality
- Comparatively easy to respond to requests for modifications
- Beneficial use of translation rules accumulated over many years
- 2.2.4 Forest dependency analysis
- Co-occurrence Probability Model
- Learned by the EM-algorithm-based method proposed by Torisawa et al. (2001)
- h: head word (modified word)
- m: modifier
- rel: relation between the head word and the modifier
- C: set of hidden co-occurrence classes
- Co-occurrence Preference
- INT(x): a function that rounds x down to an integer
- INT ( average of whole co-occurrence probabilities ) 12 - EPS
[Figure: Corpus (three co-occurrence triples annotated 2, 1 and 3) -> Learning -> Co-occurrence model -> Round down -> Co-occurrence preference. Example scores: co-occurrence preference for "mj modifies mp": 1.0; conflict with the clause dependency condition: -0.8]
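The co-occurrence probability model can be illustrated with a small latent-class sketch. The factorization used here, P(h, m, rel) = sum over c in C of P(c) P(h | c) P(m, rel | c), is an assumption (a common form for EM-trained hidden-class models) and not the equation from the slide. The names `p_class`, `p_head`, `p_mod_rel` and `cooccurrence_probability`, the English words, and all numbers are hypothetical toy values.

```python
# Toy latent-class co-occurrence model (all numbers are made-up placeholders).
# Assumed factorization: P(h, m, rel) = sum_c P(c) * P(h | c) * P(m, rel | c)

p_class = {"c1": 0.6, "c2": 0.4}                      # P(c)
p_head = {                                            # P(h | c)
    "c1": {"eat": 0.7, "read": 0.3},
    "c2": {"eat": 0.1, "read": 0.9},
}
p_mod_rel = {                                         # P(m, rel | c)
    "c1": {("soup", "obj"): 0.8, ("book", "obj"): 0.2},
    "c2": {("soup", "obj"): 0.1, ("book", "obj"): 0.9},
}


def cooccurrence_probability(h, m, rel):
    """Marginalize the hidden class c out of the joint probability."""
    return sum(
        p_class[c] * p_head[c].get(h, 0.0) * p_mod_rel[c].get((m, rel), 0.0)
        for c in p_class
    )
```

In this toy setting, `cooccurrence_probability("read", "book", "obj")` is larger than `cooccurrence_probability("read", "soup", "obj")`, which is the kind of contrast a dependency analyzer can use to prefer one modifier-head attachment over another.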
However, SLT based on RBMT needs to analyze spoken input robustly and to bridge the gap between spoken and written language.
- 2. Forest Driven Spoken Language Translation
- 2.1 Overview
- It realizes a fair evaluation of all possible interpretation candidates without unreasonable suppositions.
- All syntactic/semantic preferences are evaluated by one integrated mechanism on the forest structures.
[Figure: Processing flow from Source Language Input to Target Language Output, drawn over a syntax forest with vertices mi, mj, mk, mo, mp, mq and clause labels minami_B, minami_C, cm_ga]
- (a) Robust GLR Parsing: the enumeration of all possible interpretation candidates (syntax forest).
- (b) Syntactic Pruning
- (c) Partial forest transfer: the interpretation of spoken-language-specific structures and a transfer to an equivalent but simpler structure.
- (d) Forest driven dependency analysis: the estimation of the optimum interpretation (optimum dependency structure).
- (e) Transfer / Generation: practical use of rich translation knowledge.
Clause rule examples shown in the figure:
- Clause type: cm_ga
- Pattern: 0.pos = case marker, 0.surface = ?, -1.pos = noun
- Active in: modifier
- Dependency: co-occurred with other cm_ga: -0.9; co-occurred with ad_ha: -0.5
- Clause type: minami_B (classified into class B by Minami)
- Pattern: 0.pos = conjunctive marker, 0.surface = ??, -1.pos = verb or -1.pos = auxiliary verb
- Active in: modified
- Dependency: co-occurred with other cm_ga: -0.8; co-occurred with minami_A: -1.0
- 2.2 Detail of each step
- 2.2.1 Robust GLR Parsing
- Robust CFG
- Designed for fragmental inputs
- Capture clause structure
- Introduce the ill-formedness marker "weak"
- The GLR parser derives all possible interpretations as a syntax forest.
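To make the notion of a syntax forest concrete, the sketch below stores a packed forest as a mapping from each vertex to its alternative derivations and counts the trees it encodes. The data layout, the toy grammar symbols, and the names `forest` and `count_trees` are illustrative assumptions, not the prototype's data structures.

```python
from functools import lru_cache

# Each vertex maps to a list of alternatives; each alternative is a tuple of
# child vertices. A vertex whose only alternative is empty is a leaf (a word).
forest = {
    "S":   [("NP", "VP1"), ("NP", "VP2")],   # two competing analyses of S
    "VP1": [("V", "N")],
    "VP2": [("V", "N"), ("V",)],             # VP2 is itself ambiguous
    "NP":  [()],
    "V":   [()],
    "N":   [()],
}


@lru_cache(maxsize=None)
def count_trees(vertex):
    """Number of distinct trees rooted at `vertex` that the forest encodes."""
    total = 0
    for children in forest[vertex]:
        ways = 1
        for child in children:
            ways *= count_trees(child)
        total += ways
    return total
```

Sharing sub-structures in this way lets the later steps score every interpretation without enumerating the trees one by one.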
[Figure: example for a Japanese input (Japanese text lost). English words shown: I, the, whose, is, book, restaurant, soup, good, a, books, restaurant, Soup, good]
- 3. Evaluation
- 3.1 Conditions of Experiment
- We developed a prototype system.
- Evaluation target
- Japanese to English speech translation (travel domain, open data)
- 702 utterances (13.4 letters, 7.2 morphemes on average)
- 3.2 Parsing Accuracy
- All of the test utterances are accepted by our grammar.
- Reduce the number of parsing ambiguities
- Improve the ratio of correct dependency (CDR) in the forest
- A t-test showed a significant improvement at the 0.05 significance level.
- 2.2.2 Syntactic Pruning
- Evaluate syntactic well-formedness and prune ill-formed structures.
- Estimate ill-formedness for all interpretations in the forest.
- Assign a penalty to vertices marked "weak".
- Aggregate the minimum possible penalty in a bottom-up manner.
- Extract the syntactically preferred part.
- Select the vertices with the lowest penalty stepwise in a top-down manner.
- Extract a well-formed structure that consists of only preferred vertices.
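The two passes of syntactic pruning can be sketched as follows. This is a minimal illustration under stated assumptions: the forest layout, the constant `WEAK_PENALTY`, the toy vertices, and the function names `min_penalty` and `prune` are hypothetical, and the actual penalty values used by the prototype are not given in the slides.

```python
WEAK_PENALTY = 1  # assumed penalty for each vertex marked "weak"

# vertex -> weak marker and alternative child sequences (toy example)
forest = {
    "S":  {"weak": False, "alts": [("A", "B"), ("A", "Bw")]},
    "A":  {"weak": False, "alts": [()]},
    "B":  {"weak": False, "alts": [("C",), ("Cw",)]},
    "Bw": {"weak": True,  "alts": [("C",)]},
    "C":  {"weak": False, "alts": [()]},
    "Cw": {"weak": True,  "alts": [()]},
}


def min_penalty(vertex, memo):
    """Bottom-up: lowest total penalty of any tree rooted at `vertex`."""
    if vertex not in memo:
        own = WEAK_PENALTY if forest[vertex]["weak"] else 0
        memo[vertex] = own + min(
            sum(min_penalty(child, memo) for child in alt)
            for alt in forest[vertex]["alts"]
        )
    return memo[vertex]


def prune(root):
    """Top-down: keep only the lowest-penalty alternatives below `root`."""
    memo = {}
    min_penalty(root, memo)
    kept, stack = {}, [root]
    while stack:
        vertex = stack.pop()
        if vertex in kept:
            continue
        alts = forest[vertex]["alts"]
        costs = [sum(memo[c] for c in alt) for alt in alts]
        best = min(costs)
        kept[vertex] = [alt for alt, cost in zip(alts, costs) if cost == best]
        for alt in kept[vertex]:
            stack.extend(alt)
    return kept
```

Alternatives that tie for the lowest penalty are all kept, so the result is still a forest and the remaining ambiguity is left to the later steps.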
- Notation for the ratio of correct dependency (CDR)
- F: a syntax forest
- t: a syntax tree in F
- NF: the number of trees t ∈ F
- Dc: the number of correct dependencies
- Dm: the number of dependencies in t that match the correct ones
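The formula that combines these symbols is not preserved in the slide text. One plausible reading, stated here as an assumption, is the average over all trees of the matched fraction: CDR(F) = (1 / NF) * sum over t in F of Dm(t) / Dc. The sketch below computes that quantity; the function name `correct_dependency_ratio` and the (modifier, head) pair encoding are hypothetical.

```python
def correct_dependency_ratio(trees, gold):
    """Average fraction of correct dependencies found per tree (assumed CDR).

    trees: list of sets of (modifier, head) pairs, one set per tree t in F
    gold:  set of correct (modifier, head) pairs
    """
    d_c = len(gold)    # Dc: number of correct dependencies
    n_f = len(trees)   # NF: number of trees in the forest
    # len(t & gold) is Dm for tree t
    return sum(len(t & gold) / d_c for t in trees) / n_f


gold = {("a", "b"), ("b", "c")}
trees = [
    {("a", "c"), ("b", "c")},   # one of two dependencies is correct
    {("a", "b"), ("b", "c")},   # both dependencies are correct
]
```

Under this reading, pruning trees with wrong dependencies out of the forest raises the CDR, which matches the improvement reported in 3.2.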
- 3.3 Translation Quality
- Visual evaluation by 3 persons, decided by majority
- Evaluate whether the translation conveys the intention of the original utterance or not.
- Evaluate two systems
- Forest Driven SLT system
- A commercial MT engine for written language
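The majority decision over three evaluators can be written down in a few lines. This is only a sketch of the aggregation step. The function names and the boolean encoding of a judgment (True when the evaluator finds that the translation conveys the intention) are assumptions.

```python
def majority_decision(votes):
    """True if more than half of the evaluators accept the translation."""
    return 2 * sum(votes) > len(votes)


def acceptance_rate(judgments):
    """Share of utterances accepted by majority; one vote list per utterance."""
    accepted = sum(majority_decision(votes) for votes in judgments)
    return accepted / len(judgments)
```

With three evaluators a tie cannot occur, so every utterance receives a definite accept or reject label.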
- 2.2.3 Partial Forest Transfer
- Transfer converts a tree into a tree.
- It is widely and successfully utilized in conventional MT.
- It converts the internal expressions based on the transfer rules.
- But it can treat only one tree structure.
- Partial Forest Transfer transfers a forest to a forest directly.
- It bridges the gap between spoken and written languages.
- It reduces ambiguity in parses.
- They share the same set of transfer rules!
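The idea that tree transfer and forest transfer share one rule set can be sketched as follows. The rule `drop_filler` (deleting filler vertices, as a stand-in for a spoken-language-specific structure), the tree and forest encodings, and all function names are hypothetical and do not reproduce the authors' rule formalism.

```python
def label_of(node):
    """Tree nodes are (label, children) tuples; forest vertices are strings."""
    return node if isinstance(node, str) else node[0]


def drop_filler(children):
    """Hypothetical transfer rule: delete filler vertices from a child sequence."""
    return tuple(c for c in children if not label_of(c).startswith("FILLER"))


def transfer_tree(node, rule):
    """Conventional transfer: rewrite a single tree top-down."""
    label, children = node
    return (label, tuple(transfer_tree(c, rule) for c in rule(children)))


def transfer_forest(forest, rule):
    """Forest transfer: apply the same rule to every alternative of every vertex."""
    out = {}
    for vertex, alts in forest.items():
        new_alts = []
        for alt in alts:
            new_alt = rule(alt)
            if new_alt not in new_alts:   # identical results are merged
                new_alts.append(new_alt)
        out[vertex] = new_alts
    return out


forest = {
    "S": [("FILLER", "NP", "VP"), ("NP", "VP")],
    "FILLER": [()], "NP": [()], "VP": [()],
}
```

In this toy forest the two alternatives of `S` become identical after the rule is applied and are merged into one, which illustrates how transferring on the forest can reduce ambiguity. The `FILLER` vertex itself stays in the mapping but is no longer reachable from `S`.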
- 4. Conclusion
- Forest driven spoken language translation system
- Robust and efficient processing for Japanese utterances.
- Beneficial use of the conventional rule-based machine translator.
- Future work
- Utilization of machine learning for extracting clause rules.
- Evaluation of other language pairs (e.g. Japanese to Chinese)