Title: Improving Translation Selection using Conceptual Vectors
1 Improving Translation Selection using Conceptual Vectors
- LIM Lian Tze
- Computer Aided Translation Unit
- School of Computer Sciences
- Universiti Sains Malaysia
2 Presentation Overview
- Problem Background and Motivation
- Research Objectives
- Methodology
- Advantages and Contributions
4 Natural Language is Ambiguous
[Figure: the word "bank" with two candidate meanings]
5 Word Sense Disambiguation
- bank1: a financial institution that accepts deposits and channels the money into lending activities
- bank2: sloping land (especially the slope beside a body of water)
- Given
  - a list of meanings/senses of words (dictionaries)
  - input text containing occurrences of ambiguous words
- Assign the correct sense to a particular instance of an ambiguous word in context
- A.k.a. sense-tagging
- Example: withdraw money from the bank... is tagged bank1
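The sense-tagging task above can be illustrated with a minimal sketch. It uses a simplified definition-overlap heuristic (in the spirit of the Lesk algorithm), which is a stand-in chosen for brevity and not the method proposed in this presentation. The stop-word list and the function names are assumptions made for the example.

```python
import re

# Sense inventory copied from the two dictionary definitions on the slide.
SENSES = {
    "bank1": "a financial institution that accepts deposits and channels "
             "the money into lending activities",
    "bank2": "sloping land (especially the slope beside a body of water)",
}

# Tiny illustrative stop-word list (an assumption, not from the slides).
STOP_WORDS = {"a", "an", "the", "of", "to", "from", "into", "and", "that"}


def content_words(text):
    """Lower-case the text and return its non-stop-word tokens as a set."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS


def sense_tag(context, senses=SENSES):
    """Return the sense whose definition shares the most words with the context."""
    context_set = content_words(context)
    return max(senses, key=lambda s: len(context_set & content_words(senses[s])))


print(sense_tag("withdraw money from the bank"))
```

Here the context shares the word "money" with the bank1 definition and nothing with bank2, so bank1 is chosen.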
6 Disambiguation in Machine Translation (1)
- bank1: a financial institution that accepts deposits and channels the money into lending activities (Malay translation: bank)
- bank2: sloping land (especially the slope beside a body of water) (Malay translation: tebing)
- English input: withdraw money from the bank...
- Sense-tag (WSD): withdraw money from the bank1...
- Select translation word
- Malay output: mengeluarkan wang dari bank...
- That worked well
7 Disambiguation in Machine Translation (2)
- circulation6: the spread or transmission of something (as news or money) to a wider group or area
- Malay translations: edaran (money), penyebaran (berita, i.e. news)
- English input: 50 ringgit notes in circulation...
- Sense-tag (WSD): 50 ringgit notes in circulation6...
- Translate
- Malay output: duit kertas 50 ringgit dalam edaran? penyebaran? ...
- That didn't work well
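The contrast between the two slides can be sketched as a lookup from sense tags to candidate translations. The table below is a hypothetical data structure assembled from the examples above, and `translate_sense` is an illustrative helper, not part of the MT system described here.

```python
# Hypothetical sense-to-translation table built from the slide examples.
TRANSLATIONS = {
    "bank1": ["bank"],
    "bank2": ["tebing"],
    "circulation6": ["edaran", "penyebaran"],
}


def translate_sense(sense):
    """Return the single candidate, or None when the sense tag leaves a choice open."""
    candidates = TRANSLATIONS[sense]
    return candidates[0] if len(candidates) == 1 else None


for sense in ("bank1", "circulation6"):
    print(sense, "->", translate_sense(sense), "from", TRANSLATIONS[sense])
```

For bank1 the sense tag is enough to fix the Malay word. For circulation6 a correct sense tag still leaves two candidates, which is the gap that translation selection has to close.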
8 Optimising WSD for MT
[Diagram, after Lee and Kim (2002): Input word, Sense number and Translation word, connected by three "select" steps]
10 Main Objective
- Existing MT system
  - Selects fragments (translation units) from previously translated examples
  - Re-combines selected translation units to produce translation output for new input text
- Improve the translation quality of this MT system by adapting a WSD algorithm specifically for MT purposes
11 Need semantic knowledge about
- Word senses
  - Use dictionary definitions
- Pairs of translation words
  - From bilingual knowledge bank (BKB) made up of pairs of sentences that are translations of each other
  - Corresponding words in each translation sentence pair are explicitly marked
- Need a model to capture semantic knowledge of lexical items
- Conceptual Vectors (Lafourcade 2001)
  - Using a selection of concepts or themes
  - Construct mathematical vectors from concepts
  - Thematic similarity between lexical items: angle between CVs
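As a minimal sketch of "angle between CVs", the snippet below computes the angle between two vectors as the arccosine of their cosine similarity. The three concept axes and all activation values are invented for illustration and do not come from the presentation.

```python
import math


def angular_distance(u, v):
    """Angle in radians between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] so floating-point rounding cannot break acos.
    return math.acos(max(-1.0, min(1.0, dot / norms)))


# Hypothetical concept axes: (MONEY, INFORMATION, WATER).
cv_coin = [0.9, 0.1, 0.0]
cv_banknote = [0.8, 0.2, 0.0]
cv_river = [0.0, 0.1, 0.9]

print(angular_distance(cv_coin, cv_banknote))  # small angle: thematically close
print(angular_distance(cv_coin, cv_river))     # large angle: thematically distant
```

A smaller angle means the two lexical items activate similar concepts. Orthogonal vectors, which share no concepts, are pi/2 apart.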
12 Need to
- Compile CVs for word meanings on 2 levels
  - Word sense (from dictionary)
  - Word/phrase translation unit (from BKB) using data compiled from previous step
- Use compiled information during translation runtime to select correct translation units
14 Brief Outline
[Diagram: system outline in two phases]
- Data Preparation Phase: Dictionary / Lexicon, Word senses, Concept Category Labels (tag), BKB Examples, Translation Unit Profile (word-to-translation level knowledge)
- EBMT Run-time Phase: Input Text (clues); matching, comparison, selection; Translation units; selected translation units; Translated Text
15 Concept Hierarchy Example: GoiTaikei
- noun
  - concrete
    - agent: person, organisation
    - place: facility, region, nature
    - object: animate, inanimate
  - abstract
    - abstract thing: mental state, action
    - event: human activity, phenomenon, natural phenomenon
    - relation: existence, categorisation system, relation, characteristic, state, form, numerical, location, time
16 Definition CVs for Word Senses
- circulation6: the spread or transmission of something (such as news or money) to a wider group or area
- Related concepts: TRANSMISSION_OF_INFORMATION, MONEY, SPREAD_MOVEMENT, INFORMATION
[Charts: activation level against concepts]
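One way to picture a definition CV is as a vector over a fixed concept inventory in which the concepts assigned to a sense are activated. The sketch below is an assumption-laden simplification: it gives every assigned concept the same activation and scales the result to unit length, whereas the charts on the slide suggest graded activation levels. The inventory (including the extra WATER and LAND axes) and the function name are hypothetical.

```python
import math

# Hypothetical concept inventory; the first four labels are from the slide,
# WATER and LAND are added only so that some axes stay inactive.
CONCEPTS = [
    "TRANSMISSION_OF_INFORMATION",
    "MONEY",
    "SPREAD_MOVEMENT",
    "INFORMATION",
    "WATER",
    "LAND",
]


def definition_cv(labels, concepts=CONCEPTS):
    """Activate each labelled concept at 1.0, then scale to unit length."""
    raw = [1.0 if concept in labels else 0.0 for concept in concepts]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw] if norm else raw


cv_circulation6 = definition_cv(
    {"TRANSMISSION_OF_INFORMATION", "MONEY", "SPREAD_MOVEMENT", "INFORMATION"}
)
for concept, activation in zip(CONCEPTS, cv_circulation6):
    print(f"{concept:30s}{activation:.2f}")
```

With four equally activated concepts, each active component is 0.5 after normalisation and the two unrelated axes stay at 0.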
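17 Sense-tagging Translation Examples (English)
Each word is followed by its part of speech; the sense-tagged line adds the sense number.
- M: bilangan/n syiling/n seringgit/n dalam/prep edaran/n.
- E: number/n of/prep one/num_card ringgit/n coins/n in/prep circulation/n.
- E (sense-tagged): number/n2 of/prep one/num_card1 ringgit/n1 coins/n1 in/prep circulation/n6.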
18 CVs of Translation Pairs
- Translation pair s: circulation / peredaran (BKB examples 2299, 2306, 2309)
[Diagram: Vprofile(s) is derived from Vcontext(s) and Vlex_def(s), which are in turn derived from the per-example vectors Vcontext(s, i) and Vlex_def(s, i) of each BKB example i]
BKB Examples
- 2299: The circulation5 of air through the pipes / Peredaran udara melalui paip-paip: Vcontext(s, 2299), Vlex_def(s, 2299)
- 2306: one ringgit coins in circulation6. / syiling seringgit dalam peredaran.: Vcontext(s, 2306), Vlex_def(s, 2306)
- 2309: dollar note withdrawn from circulation6. / Wang kertas ditarik daripada peredaran.: Vcontext(s, 2309), Vlex_def(s, 2309)
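The profile construction can be sketched as follows. The names `v_context`, `v_lex_def` and `profile` mirror the slide's labels, but the combination operator is not recoverable from the slide, so a normalised element-wise sum is used here purely as an assumption. All vector values are invented.

```python
import math


def normalised_sum(vectors):
    """Element-wise sum of equal-length vectors, scaled to unit length."""
    total = [sum(components) for components in zip(*vectors)]
    norm = math.sqrt(sum(x * x for x in total))
    return [x / norm for x in total] if norm else total


def profile(context_vectors, lex_def_vectors):
    """Combine per-example vectors into one profile vector for a translation pair."""
    v_context = normalised_sum(context_vectors)   # aggregate over BKB examples
    v_lex_def = normalised_sum(lex_def_vectors)   # aggregate over BKB examples
    return normalised_sum([v_context, v_lex_def])


# Invented vectors for BKB examples 2299, 2306 and 2309,
# over hypothetical axes (MONEY, INFORMATION, AIR_FLOW).
context = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.1, 0.0]]
lex_def = [[0.0, 0.1, 0.9], [0.8, 0.2, 0.0], [0.8, 0.2, 0.0]]

print(profile(context, lex_def))
```

Normalising at each step keeps any single long example from dominating the profile. Whether that matches the original design is not stated in the slides.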
19 During Translation
[Diagram: the same system outline as slide 14 (Data Preparation Phase and EBMT Run-time Phase)]
20 Some Results
- Translating circulation to Malay
  - edaran or penyebaran
- TS: proposed translation selection using CVs
- BS: baseline strategy, chooses
  - the translation that co-occurs with the same input words (and same structure) as in the BKB
  - or the most frequently occurring translation

Input | Translation chosen by TS | Translation chosen by BS
We will stop the circulation of that magazine. | edaran | penyebaran
We will stop the circulation of that rumour. | penyebaran | penyebaran
We will stop the circulation of that newspaper. | edaran | penyebaran
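The difference between the two strategies can be sketched in a few lines. TS is modelled here as picking the candidate whose profile vector makes the smallest angle with a vector built from the input, which is an assumed reading of "translation selection using CVs". BS is modelled only by its frequency fallback; its co-occurrence rule is not covered. The profiles, input vectors and counts are invented and do not reproduce the table above.

```python
import math
from collections import Counter


def angle(u, v):
    """Angle in radians between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / norms)))


def select_by_cv(input_cv, profiles):
    """TS sketch: choose the translation whose profile is closest in angle."""
    return min(profiles, key=lambda t: angle(input_cv, profiles[t]))


def select_by_frequency(observed_translations):
    """BS sketch (fallback only): choose the most frequent translation."""
    return Counter(observed_translations).most_common(1)[0][0]


# Invented profiles over hypothetical axes (MONEY, INFORMATION).
profiles = {"edaran": [0.9, 0.1], "penyebaran": [0.1, 0.9]}
observed = ["penyebaran", "penyebaran", "edaran"]  # invented BKB counts

money_like_input = [1.0, 0.2]
news_like_input = [0.0, 1.0]

print(select_by_cv(money_like_input, profiles), select_by_frequency(observed))
print(select_by_cv(news_like_input, profiles), select_by_frequency(observed))
```

The frequency baseline returns the same word regardless of the input, while the vector-based choice follows the concepts activated by the input context.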
22 Advantages and Weaknesses
- Pros
  - optimized for EBMT
  - focuses on translation selection, bypassing intermediate WSD at run time
  - handles many-to-many mapping of source word ↔ sense ↔ translation words
  - allows for bi-directional translation with sense-tagging for 1 language
  - mathematical operations on vectors are easy to implement
  - avoids combinatorial effect when there are multiple ambiguous words in the input
- Cons
  - not all ambiguities can be solved using co-occurring concepts
  - does not handle translation selection of function words
  - manual work required in data preparation
23 Research Contributions
- Adaptation of a WSD approach for the specific aim of translation selection
- Proposal of specific guidelines for assigning related concepts for word meanings from dictionaries
- Production of knowledge about word meanings on two levels
  - Word senses as in dictionaries
  - Translations as in parallel text
24 Summary
- WSD can be customized for different NLP applications
  - Different requirements
  - Increase efficiency
- WSD and related tasks based on concepts common to co-occurring word senses can be facilitated using the conceptual vector model
  - Requires a concept category hierarchy and word sense list
  - Concepts related to a word sense are modelled as a mathematical vector
  - Conceptual similarity: angular distance between vectors
- Future work
  - Automating data preparation tasks
  - Investigating suitable weights or normalizing factors during CV manipulation
  - Integration with other WSD or translation selection strategies
25 Future Work
- Automate tagging tasks that are currently done manually
- Investigate different weight values for CVs for different syntactic relations or word classes
- Integrate with other WSD/translation selection tasks
26 Thank You