Using a parallel corpus in translation practice and research

1
Using a parallel corpus in translation practice
and research
  • Ana Frankenberg-Garcia
  • ana.frankenberg_at_sapo.pt

2
Machine Translation
Using machines to analyse Human Translation
3
The study of human translation
  • Traditionally not a hard science
  • Difficult to study systematically

But with the technology of corpus linguistics,
things can change
4
What is a corpus?
  • Large
  • Searchable with text-retrieval software
  • Compiled according to specific criteria
  • Machine-readable
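Text-retrieval software usually displays search results as a keyword-in-context (KWIC) concordance. The following is a minimal sketch, assuming a small plain-text sample held in memory; the function `kwic` and the sample sentence are hypothetical, and this is not the software behind COMPARA.

```python
import re

def kwic(text: str, keyword: str, width: int = 3) -> list[str]:
    """Return one keyword-in-context line per hit, with `width` tokens on each side."""
    # Naive tokenisation: runs of word characters (Unicode letters, digits, _) or apostrophes.
    # Punctuation is discarded.
    tokens = re.findall(r"[\w']+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():  # case-insensitive exact match
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Hypothetical sample text, not taken from COMPARA.
sample = "He nodded and smiled. She nodded too, but said nothing."
for line in kwic(sample, "nodded"):
    print(line)
```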
5
Advantages of using corpora to study human
translation
  • An enormous number of translated texts
  • Systematic analyses
  • Quantifiable results

6
A bidirectional parallel corpus of Portuguese and English
COMPARA
  • Project leaders: Ana Frankenberg-Garcia, Diana Santos
  • Research assistants: Rosário Silva, Susana Inácio
  • Initial support (1999-2000): FCT (Portugal), ISLA (Lisboa), Oxford University (Language Centre)
  • Present funding (2001-2006): Linguateca, FCT/POSI (POSI/PLP/43931/2001)
7
COMPARA structure
  • English source texts and their Portuguese translations
  • Portuguese source texts and their English translations
8
COMPARA
              English              Portuguese
Source texts  Original English     Original Portuguese
Translations  Translated English   Translated Portuguese
9
COMPARA 8.0 varieties
Unbalanced distribution!
  • English: UK, US, South Africa
  • Portuguese: Portugal, Brazil, Angola, Mozambique
10
COMPARA 8.0 publication dates
[Chart of publication dates; labels shown: 1837, 1880, 1914, 1988, 1997, 2002]
11
COMPARA 8.0 genre
  • Published fiction
  • Extensible to other genres
12
COMPARA 8.0 authors
Portuguese writers:
Camilo Castelo Branco, Eça de Queirós, José Cardoso Pires, José Saramago, Jorge de Sena, Lídia Jorge, Mário de Carvalho, Sá Carneiro
13
COMPARA 8.0 authors
Brazilian writers:
Aluísio Azevedo, Autran Dourado, Chico Buarque, Jô Soares, José de Alencar, Machado de Assis, Manuel Antônio de Almeida, Marcos Rey, Patrícia Melo, Paulo Coelho, Rubem Fonseca
14
COMPARA 8.0 authors
Angolan writers: José Eduardo Agualusa
Mozambican writers: Mia Couto
15
COMPARA 8.0 authors
British writers:
David Lodge, Ian McEwan, Julian Barnes, Joseph Conrad, Joanna Trollope, Kazuo Ishiguro, Lewis Carroll, Mary Shelley, Oscar Wilde
16
COMPARA 8.0 authors
American writers: Henry James, Edgar Allan Poe, Richard Zimler
South African writers: Nadine Gordimer
17
Can any text be included in the corpus?
  • Only published source texts and translations
  • Only direct translations:
    • English translated directly from Portuguese
    • Portuguese translated directly from English
  • Only human translations!

18
COMPARA 8.0 texts
74 translations
71 source texts (extracts)
19
COMPARA 8.0 size
1,536,269 words in English
1,423,937 words in Portuguese
Largest edited parallel corpus containing
Portuguese
20
COMPARA users and uses
  • Language learners - bilingual dictionary with
    examples
  • Language teachers - exercises and tests
  • Translators - language equivalents
  • Translation lecturers - exercises and problems
  • Translation theorists - test translation
    hypotheses
  • Lexicographers - bilingual dictionaries
  • Computational linguists - machine translation

Latest statistics: 6,000 queries per month
21
COMPARA availability
Free and online, for research and education
22
COMPARA access
www.linguateca.pt/COMPARA/
COMPARA
23-26
[Image-only slides with no transcript; the only text recovered is the word "nodded" on slide 24]
Studies using COMPARA
  1. Observing source texts and translations
  2. Contrasting Portuguese and English
  3. Comparing translated and untranslated language
  4. Examining the characteristics of translated texts

28
1. Observing source texts and translations
  • Improving bilingual dictionaries and machine-translation programs
    • Frankenberg-Garcia (2002): nod
    • Ribeiro Dias (2005): grande
    • Specia et al. (2005): word-sense disambiguation
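Observing source texts alongside their translations amounts to searching one side of an aligned corpus and reading the matching units on the other side. The sketch below assumes a hypothetical in-memory format (`AlignedPair`, with a `direction` field such as "EN>PT") and a hypothetical `search_pairs` helper; COMPARA itself is queried through its web interface, and the sentence pairs are invented for illustration.

```python
import re
from dataclasses import dataclass

@dataclass
class AlignedPair:
    """One aligned unit: a source-text sentence and its translation (hypothetical format)."""
    source: str
    translation: str
    direction: str  # e.g. "EN>PT" (English original) or "PT>EN" (Portuguese original)

def search_pairs(pairs: list[AlignedPair], pattern: str,
                 side: str = "source", direction=None) -> list[AlignedPair]:
    """Return the pairs whose chosen side matches `pattern` (case-insensitive regex)."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for pair in pairs:
        # Optionally restrict the search to one half of the bidirectional corpus.
        if direction is not None and pair.direction != direction:
            continue
        text = pair.source if side == "source" else pair.translation
        if regex.search(text):
            hits.append(pair)
    return hits

# Invented toy examples, not taken from COMPARA.
corpus = [
    AlignedPair("He nodded.", "Ele acenou com a cabeça.", "EN>PT"),
    AlignedPair("She nodded slowly.", "Ela assentiu devagar.", "EN>PT"),
    AlignedPair("He left the room.", "Ele saiu da sala.", "EN>PT"),
]

# Forms of the verb "nod": nod, nods, nodded, nodding.
for pair in search_pairs(corpus, r"\bnod(s|ded|ding)?\b", direction="EN>PT"):
    print(f"{pair.source}  ->  {pair.translation}")
```

Reading the translation side of each hit is what shows the range of equivalents a dictionary or MT system might need to cover.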

29
2. Contrasting English and Portuguese
Contrasting original fiction in English and Portuguese: Frankenberg-Garcia (2005)
[Charts: PT vs. EN loan words; PT vs. EN loan languages]
30
3. Comparing translated and untranslated language
                Translations  Source texts  Ratio
diferente(s)    30.7          15.4          2 x
simplesmente    15.6          5.1           3 x
end up          13.5          2.8           4 x
rezar (lemma)   5.6           12.4          2 x
Frequency per 100,000 words in COMPARA 7.0.4
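Normalising to a fixed base lets sub-corpora of different sizes be compared: frequency per 100,000 words = raw count / number of words × 100,000. A minimal Python sketch; the raw counts and sub-corpus sizes behind the slide are not given, so the call to the hypothetical `per_100k` helper uses made-up numbers, while the ratio loop reuses the normalised values from the table.

```python
def per_100k(count: int, corpus_words: int) -> float:
    """Normalise a raw frequency to occurrences per 100,000 words."""
    return count * 100_000 / corpus_words

# Made-up counts: 42 hits in a 600,000-word sub-corpus.
print(per_100k(42, 600_000))  # 7.0

# Normalised frequencies from the table: (translations, source texts).
table = {
    "diferente(s)": (30.7, 15.4),
    "simplesmente": (15.6, 5.1),
    "end up": (13.5, 2.8),
    "rezar (lemma)": (5.6, 12.4),
}

for item, (tt, st) in table.items():
    # Ratio > 1: more frequent in translations; ratio < 1: less frequent.
    print(f"{item:15} translations/source = {tt / st:.2f}")
```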
31
4. Examining the characteristics of translated
texts
Are translations longer than source texts? (Explicitation Hypothesis)
Frankenberg-Garcia (2004)
32
Source texts: 1,500-word extracts by 8 PT authors and 8 EN authors
Translations: by 8 PT translators and 8 EN translators; length = ?
33
[Diagram: source texts (ST) and translations (TT) compared pair by pair]
Matched t-test: 95% probability that TT is longer than ST
5
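A matched (paired) t-test compares each translation with its own source text rather than pooling the two groups. A minimal sketch using only the Python standard library, assuming lengths are measured in words; `paired_t` is a hypothetical helper and the word counts are made up, not the study's data.

```python
import math
from statistics import mean, stdev

def paired_t(source_lengths, translation_lengths):
    """Matched (paired) t statistic for translation length minus source length."""
    if len(source_lengths) != len(translation_lengths):
        raise ValueError("each translation must be paired with its source text")
    diffs = [tt - st for st, tt in zip(source_lengths, translation_lengths)]
    n = len(diffs)  # needs n >= 2 and non-identical differences (stdev > 0)
    # t = mean difference / standard error of the mean difference
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1  # statistic and degrees of freedom

# Hypothetical word counts for four ST/TT pairs (not data from the study).
st = [1500, 1500, 1500, 1500]
tt = [1560, 1530, 1610, 1490]
t, df = paired_t(st, tt)
print(f"t = {t:.2f}, df = {df}")
```

Here t = mean(d) / (s_d / √n) with n - 1 degrees of freedom, where d is TT length minus ST length. With 16 pairs (df = 15), a one-tailed test at the 5% level would need t above roughly 1.753; if SciPy is available, `scipy.stats.ttest_rel` runs the same test and reports a p-value.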
34
To conclude...
Studies such as these were unthinkable before corpora.
Many other studies are possible!
COMPARA is free and available online.
Contact us: ana.frankenberg_at_sapo.pt, diana.santos_at_sintef.no