Title: Using a parallel corpus in translation practice and research
1 Using a parallel corpus in translation practice
and research
- Ana Frankenberg-Garcia
- ana.frankenberg_at_sapo.pt
2 Machine Translation
Using machines to analyse Human Translation
3 The study of human translation
- Traditionally not a hard science
- Difficult to be systematic
But with the technology of corpus linguistics,
things can change
4 What is a corpus?
- large
- text-retrieval software
- specific criteria
- machine-readable
5 Advantages of using corpora to study human
translation
- An enormous amount of translated texts
- Systematic analyses
- Quantifiable results
6 A bi-directional parallel corpus of Portuguese
and English
COMPARA
Project leaders: Ana Frankenberg-Garcia, Diana Santos
Research assistants: Rosário Silva, Susana Inácio
Initial support (1999-2000): FCT (Portugal), ISLA (Lisboa), Oxford University (Language Centre)
Present funding (2001-2006): Linguateca, FCT/POSI (POSI/PLP/43931/2001)
7 COMPARA structure
COMPARA
PT translations
EN translations
EN source texts
PT source texts
8 COMPARA
             Source texts          Translations
English      Original English      Translated English
Portuguese   Original Portuguese   Translated Portuguese
9 COMPARA 8.0 varieties
ENGLISH: UK, US, South Africa
PORTUGUESE: Portugal, Brazil, Angola, Mozambique
Unbalanced distribution!
10 COMPARA 8.0 Publication dates
2002
1997
1988
1914
1880
1837
11 COMPARA 8.0 genre
Published fiction
other genres
EXTENSIBLE
12 COMPARA 8.0 authors
Portuguese writers: Camilo Castelo Branco, Eça de Queirós, José Cardoso Pires, José Saramago, Jorge de Sena, Lídia Jorge, Mário de Carvalho, Sá Carneiro
13 COMPARA 8.0 authors
Brazilian writers: Aluísio Azevedo, Autran Dourado, Chico Buarque, Jô Soares, José de Alencar, Machado de Assis, Manuel Antônio de Almeida, Marcos Rey, Patrícia Melo, Paulo Coelho, Rubem Fonseca
14 COMPARA 8.0 authors
Angolan writers: José Eduardo Agualusa
Mozambican writers: Mia Couto
15 COMPARA 8.0 authors
British writers: David Lodge, Ian McEwan, Julian Barnes, Joseph Conrad, Joanna Trollope, Kazuo Ishiguro, Lewis Carroll, Mary Shelley, Oscar Wilde
16 COMPARA 8.0 authors
American writers: Henry James, Edgar Allan Poe, Richard Zimler
South African writers: Nadine Gordimer
17 Can any text be included in the corpus?
- Only published source texts and translations
- Only English translated directly from Portuguese
- Only Portuguese translated directly from English
- Only human translations!
18 COMPARA 8.0 texts
74 translations
71 source texts (extracts)
19 COMPARA 8.0 size
1,536,269 words in English
1,423,937 words in Portuguese
Largest edited parallel corpus containing
Portuguese
20 COMPARA users and uses
- Language learners - bilingual dictionary with examples
- Language teachers - exercises and tests
- Translators - language equivalents
- Translation lecturers - exercises, problems
- Translation theorists - test translation hypotheses
- Lexicographers - bilingual dictionaries
- Computational linguists - machine translation
Latest statistics: 6000 queries per month
21 COMPARA availability
Free, online
For research and education
22 COMPARA access
www.linguateca.pt/COMPARA/
COMPARA
24 nodded
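The query on this slide (nodded) is a parallel concordance search: look up a word in one language and show each hit beside its aligned translation. A minimal Python sketch of that idea follows. It is not COMPARA's own search engine; the function name `parallel_concordance` and the three sentence pairs are invented for illustration and are not taken from the corpus.

```python
import re
from typing import List, Tuple

# One alignment unit: (source-text segment, translation segment).
AlignedPair = Tuple[str, str]

# Hypothetical aligned pairs, invented for this sketch (not COMPARA data).
SAMPLE_PAIRS: List[AlignedPair] = [
    ("She nodded slowly.", "Ela acenou lentamente com a cabeça."),
    ("He nodded and left.", "Ele fez que sim e saiu."),
    ("They left early.", "Eles saíram cedo."),
]


def parallel_concordance(pairs: List[AlignedPair], pattern: str) -> List[AlignedPair]:
    """Return every aligned pair whose source side matches `pattern`.

    `pattern` is a regular expression, matched as a whole word and
    case-insensitively.
    """
    regex = re.compile(rf"\b(?:{pattern})\b", re.IGNORECASE)
    return [(src, tgt) for src, tgt in pairs if regex.search(src)]


if __name__ == "__main__":
    # Print each matching source segment next to its translation.
    for src, tgt in parallel_concordance(SAMPLE_PAIRS, "nodded"):
        print(f"{src}\t{tgt}")
```

Because the pattern is a regular expression, a query such as `nod(?:s|ded|ding)?` would retrieve all inflected forms in one search.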
27 Studies using COMPARA
- Observing source texts and translations
- Contrasting Portuguese and English
- Comparing translated and untranslated language
- Examining the characteristics of translated texts
28 1. Observing source texts and translations
Improving bilingual dictionaries and machine-translation programs
- Frankenberg-Garcia (2002): nod
- Ribeiro Dias (2005): grande
- Specia et al. (2005): word-sense disambiguation
29 2. Contrasting English and Portuguese
Contrasting original fiction in English and Portuguese
Frankenberg-Garcia (2005)
PT Loan words
EN Loan words
PT Loan languages
EN Loan languages
30 3. Comparing translated and untranslated language
Frequency per 100,000 words in COMPARA 7.0.4
                 translations   source texts   ratio
diferente(s)     30.7           15.4           2 x
simplesmente     15.6           5.1            3 x
end. up          13.5           2.8            4 x
lemma rezar      5.6            12.4           2 x
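The figures on this slide are normalised frequencies: a raw count divided by the size of the subcorpus and scaled to 100,000 words, so that translated and untranslated subcorpora of different sizes can be compared. Below is a small Python sketch of that arithmetic. The helper names `per_100k` and `ratio` are hypothetical; only the two frequencies passed to `ratio` (30.7 and 15.4, the first pair of figures above) come from the slide.

```python
def per_100k(count: int, corpus_size: int) -> float:
    """Occurrences per 100,000 words of running text."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be a positive word count")
    return count / corpus_size * 100_000


def ratio(freq_a: float, freq_b: float) -> float:
    """How many times more frequent A is than B (both already normalised)."""
    if freq_b == 0:
        raise ValueError("freq_b must be non-zero")
    return freq_a / freq_b


if __name__ == "__main__":
    # Frequencies of diferente(s) per 100,000 words, as shown on the slide.
    print(round(ratio(30.7, 15.4), 2))
```

The ratio for diferente(s) comes out at roughly 2, which is what the "2 x" label on the slide summarises.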
31 4. Examining the characteristics of translated texts
Are translations longer than source texts?
Frankenberg-Garcia (2004)
Explicitation Hypothesis
32 Source texts and translations
Source texts: 8 PT authors and 8 EN authors, one 1500-word extract each
Translations: 8 PT translators and 8 EN translators, length of each translation: ?
33 Source texts and translations
[Chart: length of each of the 16 translations (TT) compared with its source text (ST)]
Matched t-test: 95% probability that TT are longer than ST
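A matched (paired) t-test compares each translation with its own source text: it takes the per-pair length differences and asks whether their mean differs from zero. The sketch below computes the t statistic using only the Python standard library. The function name `paired_t` and the word counts in the example are invented for illustration; they are not data from the study.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t(st_lengths: Sequence[float], tt_lengths: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for TT minus ST lengths."""
    if len(st_lengths) != len(tt_lengths):
        raise ValueError("each source text needs exactly one translation")
    if len(st_lengths) < 2:
        raise ValueError("need at least two pairs")
    # Positive difference = translation longer than its source text.
    diffs = [tt - st for st, tt in zip(st_lengths, tt_lengths)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    # Invented word counts for four pairs (the study described here used 16).
    st = [1500, 1500, 1500, 1500]
    tt = [1510, 1520, 1530, 1540]
    t, df = paired_t(st, tt)
    print(f"t = {t:.3f}, df = {df}")
```

With 16 pairs there are 15 degrees of freedom; the one-tailed 5% critical value of t for 15 degrees of freedom is 1.753, so a t statistic above that value would support "TT longer than ST" at the 95% level.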
34 To conclude...
Studies such as these were unthinkable before corpora
Many other studies are possible!
COMPARA is free and available online
Contact us: ana.frankenberg_at_sapo.pt, diana.santos_at_sintef.no