Title: Machine Translation of Persian Complex Predicates
1. Machine Translation of Persian Complex Predicates
Jan W. Amtrup (Kofax Image Products)
Karine Megerdoomian (MITRE)
2. Complex Predicates
- Predicates composed of more than one grammatical element but behaving as a simple predicate
- Persian Verbal Predicates
  - Consist of preverbal element(s) and a light verb behaving as a single semantic unit
  - Examples (word-for-word gloss → meaning):
    - worry eat → to worry
    - shame pull → to be ashamed
    - comb hit → to comb
    - from hand give → to lose
3. Direct Translation
[Figure: diagram relating Source Language and Target Language via Direct Translation, Transfer, and Interlingua]
4. Why word-for-word translation won't work
- Compositional meaning
  - Gloss: hand of Mani start did pain catching
  - Translation: Mani's hand started hurting
- Ambiguity
  - Gloss: To saying Associated Press number unemployed increase found is
  - Translation: According to the Associated Press, the number of the unemployed has increased.
5. Solutions for Machine Translation
- List each light verb in the lexicon as an atomic unit
6. Problems with the atomic approach
- Intervening elements
  - Gloss: countries-ez islamic requester role increasing United Nations in Iraq became
  - Translation: The Islamic countries requested an increasing United Nations role in Iraq.
- Internal modification
  - Gloss: price oil increase intense-Indef found
  - Translation: The price of oil increased intensely.
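
The internal-modification case can be made concrete with a minimal sketch. It assumes a hypothetical atomic lexicon keyed on adjacent (preverbal element, light verb) pairs and uses the English glosses as stand-ins for the Persian tokens. The names `ATOMIC_LEXICON` and `find_atomic_predicates` are illustrative only, and the unmodified sentence is a constructed variant of the slide example.

```python
# Hypothetical atomic lexicon: each complex predicate is one fixed entry,
# keyed by the adjacent pair (preverbal element, light verb).
# English glosses stand in for the Persian tokens (an assumption for illustration).
ATOMIC_LEXICON = {
    ("increase", "found"): "increase",
    ("comb", "hit"): "comb",
}


def find_atomic_predicates(tokens):
    """Return (position, translation) for every adjacent pair listed in the lexicon."""
    hits = []
    for i in range(len(tokens) - 1):
        pair = (tokens[i], tokens[i + 1])
        if pair in ATOMIC_LEXICON:
            hits.append((i, ATOMIC_LEXICON[pair]))
    return hits


# Constructed variant without a modifier: the two parts are adjacent.
plain = ["price", "oil", "increase", "found"]
# Gloss from the slide: "intense-Indef" separates the two parts.
modified = ["price", "oil", "increase", "intense-Indef", "found"]

print(find_atomic_predicates(plain))
print(find_atomic_predicates(modified))
```

The second call finds nothing: once a modifier sits between the preverbal element and the light verb, a lexicon of fixed units no longer recognizes the predicate.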
7. Solutions for Machine Translation
- List each light verb in the lexicon as an atomic unit
- Treat light verbs as a special case of subcategorization
8. Issues with subcategorization approach
- Productivity
  - Even though the individual parts of a construction are present in the lexicon, the semi-productive creation of novel verbs is missed
  - Examples (glosses): link give, parasite hit, filter become, hack do, download do
- (Lexicon size)
  - Each entry has to be represented
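
Both issues can be illustrated with a small sketch, assuming a hypothetical subcategorization-style lexicon in which each light verb lists the preverbal elements it selects. English glosses stand in for the Persian words, and `SUBCAT_LEXICON`, `translate_static`, and `entries_needed` are invented names for illustration.

```python
# Hypothetical subcategorization-style lexicon: each light verb lists the
# preverbal elements it is known to select, with the resulting translation.
# English glosses stand in for the Persian words (an assumption for illustration).
SUBCAT_LEXICON = {
    "give": {"link": "link"},
    "hit": {"comb": "comb"},
    "do": {"hack": "hack"},
}


def translate_static(preverb, light_verb):
    """Look up a complex predicate; return None when the pair was never listed."""
    return SUBCAT_LEXICON.get(light_verb, {}).get(preverb)


def entries_needed(lexicon):
    """Count the listed (preverb, light verb) pairs: one entry per complex verb."""
    return sum(len(frames) for frames in lexicon.values())


print(translate_static("hack", "do"))      # listed pair
print(translate_static("download", "do"))  # novel pair: the light verb is listed, this combination is not
print(entries_needed(SUBCAT_LEXICON))
```

A newly coined combination returns `None` until someone adds it by hand, and the entry count grows with every complex verb rather than with the number of parts.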
9. Solutions for Machine Translation
- List each light verb in the lexicon as an atomic unit
- Treat light verbs as a special case of subcategorization
- These are static approaches to lexicon architecture
10. Solutions for Machine Translation
- List each light verb in the lexicon as an atomic unit
- Treat light verbs as a special case of subcategorization
- These are static approaches to lexicon architecture
- How about a constructionist approach to the lexicon?
11. Constructionist Lexicon
- Dynamic view of word formation
- Surface words are not atomic units but have internal structure
- Meaning is composed by combining components of words
- This view is predominant in theoretical linguistics, but also attracts attention in computational paradigms (e.g. Fong et al. 2001, Fujita et al. 2004)
12. Some constructions
13. Inchoative
[Figure: tree for "The clothes dried". Gloss of the Persian sentence: clothes dry became. Node labels: Become<dry>, with argument "clothes".]
14. Causative
[Figure: tree for "The heat of the sun dried the clothes". Gloss of the Persian sentence: heat-ez sun clothes OM dry made. Node labels: cause, become<dry>.]
15. Activity
[Figure: tree for "The plane flew to New York". Gloss of the Persian sentence: plane to New York flight did. Node labels: Act, flight<EventiveNoun>, Adj "to New York".]
16. Repetitive Activity
[Figure: tree for "Hossein combed his/her dog". Gloss of the Persian sentence: Hossein dog-his/her OM comb hit. Node labels: Act<repetitive>, With<comb>.]
17. The compositional approach
- Lexicon lists only the necessary parts, not compositions
- Theoretically motivated
- Accounts for novel usage
- Allows modeling of closely related verbs: "the window broke" vs. "John broke the window"
- Ability to handle separable light verb constructions
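
As a rough sketch of what listing only the parts could look like, the code below composes a predicate from a preverbal element and a light verb. All names and the bracket notation are illustrative assumptions rather than the authors' formalism or implementation, and English glosses again stand in for the Persian words.

```python
from dataclasses import dataclass

# All names below are illustrative assumptions, not the authors' formalism.
# English glosses stand in for the Persian words.

# Preverbal elements: gloss -> (category, English verb for that root).
ROOTS = {
    "dry": ("Adjective", "dry"),
    "flight": ("EventiveNoun", "fly"),
    "comb": ("InstrumentNoun", "comb"),
    "download": ("EventiveNoun", "download"),
}

# Light verbs: gloss -> {category of preverbal element: structure template}.
LIGHT_VERBS = {
    "became": {"Adjective": "Become<{root}>"},
    "made": {"Adjective": "Cause(Become<{root}>)"},
    "did": {"EventiveNoun": "Act({root}<EventiveNoun>)"},
    "hit": {"InstrumentNoun": "Act<repetitive>(With<{root}>)"},
}


@dataclass
class Predicate:
    structure: str     # composed semantic structure
    english_verb: str  # target-language verb chosen from the root


def compose(preverb: str, light_verb: str) -> Predicate:
    """Build a predicate from its parts instead of looking up the whole."""
    category, english = ROOTS[preverb]
    template = LIGHT_VERBS[light_verb].get(category)
    if template is None:
        raise ValueError(f"{light_verb!r} does not combine with a {category}")
    return Predicate(template.format(root=preverb), english)


print(compose("dry", "became"))
print(compose("dry", "made"))
print(compose("download", "did"))
```

In this sketch the inchoative and causative readings of "dry" share one root entry and differ only in the light verb, and a novel verb such as "download do" needs a single new root rather than a new complex-verb entry.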
18. Conclusion
- Discussed some lexical issues in Persian MT
- Presented a structure of the lexicon based on linguistic theory
- Presented computational formalization and implementation of the lexical structure of certain Persian light verbs
- Advantages
  - Correct translation without listing each complex verb in the lexicon
  - Smaller vocabulary size
  - Facilitates multilingual translation
  - Interlingua but based on linguistic theory
  - Easy handling of separable light verbs