Improving Translation Selection using Conceptual Vectors

1
Improving Translation Selection using Conceptual
Vectors
  • LIM Lian Tze
  • Computer Aided Translation Unit
  • School of Computer Sciences
  • Universiti Sains Malaysia

2
Presentation Overview
  • Problem Background and Motivation
  • Research Objectives
  • Methodology
  • Advantages and Contributions

4
Natural Language is Ambiguous
  • bank: which sense?
5
Word Sense Disambiguation
  • bank1: a financial institution that accepts deposits and channels the
    money into lending activities
  • bank2: sloping land (especially the slope beside a body of water)
  • Given:
  • a list of meanings/senses of words (dictionaries)
  • input text containing occurrences of ambiguous words
  • Assign the correct sense to each instance of an ambiguous word in
    context
  • A.k.a. sense-tagging
  • Example: "withdraw money from the bank..." is tagged bank1
6
Disambiguation in Machine Translation (1)
  • bank1: a financial institution that accepts deposits and channels the
    money into lending activities (Malay translation: bank)
  • bank2: sloping land (especially the slope beside a body of water)
    (Malay translation: tebing)
  • English input: withdraw money from the bank...
  • Sense-tag (WSD): withdraw money from the bank1...
  • Select translation word
  • Malay output: mengeluarkan wang dari bank...
  • That worked well
7
Disambiguation in Machine Translation (2)
  • circulation6: the spread or transmission of something (as news or
    money) to a wider group or area
  • Malay translations: edaran (for money), penyebaran (for news; Malay
    "berita")
  • English input: 50 ringgit notes in circulation...
  • Sense-tag (WSD): 50 ringgit notes in circulation6...
  • Translate
  • Malay output: duit kertas 50 ringgit dalam edaran? penyebaran? ...
  • That didn't work well: the sense tag alone does not determine which
    translation to use
8
Optimising WSD for MT
[Diagram: "select" steps linking Input word, Sense number and Translation word (Lee and Kim 2002)]
9
Presentation Overview
  • Problem Background and Motivation
  • Research Objectives
  • Methodology
  • Advantages and Contributions

10
Main Objective
  • Existing MT system
  • Selects fragments (translation units) from
    previously translated examples
  • Re-combines selected translation units to produce
    translation output for new input text
  • Improve the translation quality of this MT system
    by adapting a WSD algorithm specifically for MT
    purposes

11
Need semantic knowledge about
  • Word senses: use dictionary definitions
  • Pairs of translation words: from a bilingual knowledge bank (BKB) made
    up of pairs of sentences that are translations of each other;
    corresponding words in each translation sentence pair are explicitly
    marked
  • Need a model to capture semantic knowledge of lexical items:
    Conceptual Vectors (CVs) (Lafourcade 2001)
  • Use a selection of concepts or themes
  • Construct mathematical vectors from the concepts
  • Thematic similarity between lexical items = angle between their CVs
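
Illustrative sketch (not from the slides): the angle between two CVs can be computed as the arccosine of their cosine similarity. The three concept dimensions and all vector values below are invented for illustration, and `angular_distance` is a hypothetical helper name.

```python
import math

def angular_distance(u, v):
    """Angle in radians between two conceptual vectors (0 means identical direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)

# Invented concept dimensions: (MONEY, INFORMATION, NATURE)
bank_financial = [0.9, 0.1, 0.0]
bank_river = [0.0, 0.1, 0.9]
money_context = [1.0, 0.0, 0.0]

closer_to_financial = (
    angular_distance(money_context, bank_financial)
    < angular_distance(money_context, bank_river)
)
```

A smaller angle means the two lexical items activate more of the same concepts; orthogonal vectors (no shared concepts) are a quarter turn apart.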

12
Need to
  • Compile CVs for word meanings on two levels:
  • Word sense (from dictionary)
  • Word/phrase translation unit (from BKB), using the data compiled at
    the word sense level
  • Use the compiled information at translation run time to select the
    correct translation units

13
Presentation Overview
  • Problem Background and Motivation
  • Research Objectives
  • Methodology
  • Advantages and Contributions

14
Brief Outline
[Diagram: system outline spanning a Data Preparation Phase and an EBMT Run-time Phase]
  • Inputs and resources: Input Text; Dictionary / Lexicon (word senses);
    Concept Category Labels; BKB (examples, translation units)
  • Processing steps: tag; clues; matching, comparison, selection
  • Outputs: Translation Unit Profile (word/translation level knowledge);
    selected translation units; Translated Text
15
Concept Hierarchy Example: GoiTaikei
noun
    concrete
        agent: person, organisation
        place: facility, region, nature
        object: animate, inanimate
    abstract
        abstract thing: mental state, action
        event: human activity, phenomenon, natural phenomenon
        relation: existence, categorisation system, relation,
            characteristic, state, form, numerical, location, time
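
Illustrative sketch (not from the slides): one plausible way to turn a concept hierarchy into vector dimensions is to give each leaf concept its own index. Whether the presented system uses leaves only is an assumption here, and the nesting in `HIERARCHY` is a small hand-picked fragment of the labels above; `leaf_concepts`, `DIMENSIONS` and `INDEX` are hypothetical names.

```python
# Fragment of a concept hierarchy as nested dicts; a leaf maps to an empty dict.
HIERARCHY = {
    "noun": {
        "concrete": {
            "agent": {"person": {}, "organisation": {}},
            "object": {"animate": {}, "inanimate": {}},
        },
        "abstract": {
            "abstract thing": {"mental state": {}, "action": {}},
        },
    }
}

def leaf_concepts(tree):
    """Collect leaf concept names in depth-first, left-to-right order."""
    leaves = []
    for name, children in tree.items():
        if children:
            leaves.extend(leaf_concepts(children))
        else:
            leaves.append(name)
    return leaves

# Each leaf concept becomes one dimension of a conceptual vector.
DIMENSIONS = leaf_concepts(HIERARCHY)
INDEX = {name: position for position, name in enumerate(DIMENSIONS)}
```

With this fragment a conceptual vector would have six components, and `INDEX` gives the position to activate for a given concept label.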
16
Definition CVs for Word Senses
  • circulation6: the spread or transmission of something (such as news
    or money) to a wider group or area
  • Related concepts: TRANSMISSION_OF_INFORMATION, MONEY, SPREAD_MOVEMENT,
    INFORMATION
[Charts: activation level plotted against concepts]
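
Illustrative sketch (not from the slides): a definition CV can be pictured as a vector with non-zero activation on the concepts assigned to a sense. The equal activation levels, the unit-length normalisation and the extra NATURE and PLACE dimensions are assumptions made for this example; `definition_cv` is a hypothetical helper.

```python
import math

# Invented, tiny concept inventory; a real one would come from the concept hierarchy.
CONCEPTS = [
    "TRANSMISSION_OF_INFORMATION",
    "MONEY",
    "SPREAD_MOVEMENT",
    "INFORMATION",
    "NATURE",
    "PLACE",
]

def definition_cv(related, concepts=CONCEPTS):
    """Return a unit-length vector with equal activation on each related concept."""
    unknown = set(related) - set(concepts)
    if unknown:
        raise KeyError(f"unknown concepts: {sorted(unknown)}")
    raw = [1.0 if concept in related else 0.0 for concept in concepts]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw] if norm else raw

# Concepts listed on the slide for circulation6.
circulation6 = definition_cv(
    {"TRANSMISSION_OF_INFORMATION", "MONEY", "SPREAD_MOVEMENT", "INFORMATION"}
)
```

Normalising to unit length keeps senses with many assigned concepts from dominating when vectors are later summed or compared by angle.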
17
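Sense-tagging Translation Examples (English)
  • Malay (M), tagged as word/POS: bilangan/n syiling/n seringgit/n
    dalam/prep edaran/n.
  • English (E), tagged as word/POS: number/n of/prep one/num_card
    ringgit/n coins/n in/prep circulation/n.
  • English (E), sense-tagged as word/POS#sense: number/n#2 of/prep
    one/num_card#1 ringgit/n#1 coins/n#1 in/prep circulation/n#6.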
18
CVs of Translation Pairs
  • Translation pair s: circulation / peredaran (BKB examples 2299, 2306,
    2309)
  • Vectors for s: Vprofile(s), Vcontext(s), Vlex_def(s)
  • BKB examples, each contributing Vcontext(s, i) and Vlex_def(s, i):
  • 2299: The circulation5 of air through the pipes. / Peredaran udara
    melalui paip-paip.
  • 2306: one ringgit coins in circulation6. / syiling seringgit dalam
    peredaran.
  • 2309: dollar note withdrawn from circulation6. / Wang kertas ditarik
    daripada peredaran.
[Diagram: per-example vectors Vcontext(s, i) and Vlex_def(s, i) feed into Vcontext(s) and Vlex_def(s), and these into Vprofile(s)]
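
Illustrative sketch (not from the slides): the slide does not preserve how the vectors are combined, so this example assumes a normalised vector sum at every step. The three dimensions and all values are invented; `normalised_sum` and `profile_vector` are hypothetical names.

```python
import math

def normalised_sum(vectors):
    """Component-wise sum of the vectors, rescaled to unit length."""
    total = [sum(column) for column in zip(*vectors)]
    norm = math.sqrt(sum(x * x for x in total))
    return [x / norm for x in total] if norm else total

def profile_vector(examples):
    """examples: list of (v_context, v_lex_def) pairs, one per BKB example."""
    v_context = normalised_sum([context for context, _ in examples])
    v_lex_def = normalised_sum([lex_def for _, lex_def in examples])
    # Assumption: the profile is the normalised sum of the two aggregates.
    return normalised_sum([v_context, v_lex_def])

# Invented dimensions: (MONEY, INFORMATION, AIR_FLOW)
examples = [
    ([0.0, 0.0, 1.0], [0.0, 0.2, 0.8]),  # stands in for BKB example 2299
    ([1.0, 0.0, 0.0], [0.7, 0.7, 0.0]),  # stands in for BKB example 2306
    ([1.0, 0.0, 0.0], [0.7, 0.7, 0.0]),  # stands in for BKB example 2309
]
v_profile = profile_vector(examples)
```

Under this assumption, a translation pair seen mostly in money-related examples ends up with a profile that leans towards the money dimension, while still retaining some weight from its other uses.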
19
During Translation
[Diagram: repeats the system outline shown under "Brief Outline" (slide 14)]
20
Some Results
  • Translating "circulation" to Malay: edaran or penyebaran
  • TS: the proposed translation selection using CVs
  • BS: the baseline strategy, which chooses
  • the translation that co-occurs with the same input words (and the
    same structure) as in the BKB
  • or the most frequently occurring translation

Input                                            | Translation chosen by TS | Translation chosen by BS
We will stop the circulation of that magazine.   | edaran                   | penyebaran
We will stop the circulation of that rumour.     | penyebaran               | penyebaran
We will stop the circulation of that newspaper.  | edaran                   | penyebaran
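
Illustrative sketch (not from the slides): selection with CVs can be pictured as choosing the candidate translation whose profile vector makes the smallest angle with a vector built from the input context. The dimensions and every number below are invented for this example and are not the data behind the results above; `select_translation` is a hypothetical name.

```python
import math

def angle(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, dot / norms)))

def select_translation(context_cv, profiles):
    """Return the candidate whose profile CV is closest in angle to the context CV."""
    return min(profiles, key=lambda word: angle(context_cv, profiles[word]))

# Invented dimensions: (MONEY, PUBLICATION, INFORMATION)
PROFILES = {
    "edaran": [0.7, 0.7, 0.1],
    "penyebaran": [0.1, 0.2, 0.9],
}
magazine_context = [0.0, 1.0, 0.3]
rumour_context = [0.0, 0.0, 1.0]

choice_for_magazine = select_translation(magazine_context, PROFILES)
choice_for_rumour = select_translation(rumour_context, PROFILES)
```

Because the choice is made directly among translation candidates, no separate sense number has to be picked first.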
21
Presentation Overview
  • Problem Background and Motivation
  • Research Objectives
  • Methodology
  • Advantages and Contributions

22
Advantages and Weaknesses
  • Pros:
  • Optimized for EBMT
  • Focuses on translation selection, bypassing intermediate WSD at run
    time
  • Handles many-to-many mapping of source word → sense → translation
    words
  • Allows bi-directional translation with sense-tagging for only one
    language
  • Mathematical operations on vectors are easy to implement
  • Avoids the combinatorial effect when the input contains multiple
    ambiguous words
  • Cons:
  • Not all ambiguities can be solved using co-occurring concepts
  • Does not handle translation selection of function words
  • Manual work is required in data preparation

23
Research Contributions
  • Adaptation of a WSD approach for the specific aim
    of translation selection
  • Proposal of specific guidelines for assigning
    related concepts for word meanings from
    dictionaries
  • Production of knowledge about word meanings on
    two levels
  • Word senses as in dictionaries
  • Translations as in parallel text

24
Summary
  • WSD can be customized for different NLP applications:
  • Different requirements
  • Increased efficiency
  • WSD and related tasks based on concepts common to co-occurring word
    senses can be facilitated using the conceptual vector model:
  • Requires a concept category hierarchy and a word sense list
  • Concepts related to a word sense are modelled as a mathematical
    vector
  • Conceptual similarity = angular distance between vectors
  • Future work
  • Automating data preparation tasks
  • Investigating suitable weights or normalizing
    factors during CV manipulation
  • Integration with other WSD or translation
    selection strategies

25
Future Work
  • Automate tagging tasks that are currently done
    manually
  • Investigate different weight values for CVs for
    different syntactic relations or word classes
  • Integrate with other WSD/translation selection
    tasks

26
Thank You