Learner corpora, error analysis - PowerPoint PPT Presentation

Provided by: don78
1
Learner corpora, error analysis

2
Learner corpora
3
  • Granger, S. 2003. Error-tagged Learner Corpora
    and CALL: A Promising Synergy. CALICO Journal,
    20(3), 465-480.
  • Pravec, N. 2002. Survey of learner corpora. ICAME
    Journal, 26, 81-114.
  • Tono, Y. 2003. Learner corpora: design,
    development and applications. CL2003 workshop
    paper, 800-809.

4
What is a Learner Corpus?
  • a corpus, or computer textual database, of the
    language produced by foreign language learners
  • G. Leech (1998), Introduction to Granger (ed.),
    Learner English on Computer. London: Addison
    Wesley Longman.

5
learner corpora
  • Computer learner corpora are electronic
    collections of authentic FL/SL textual data
    assembled according to explicit design criteria
    for a particular SLA/FLT purpose. They are
    encoded in a standardised and homogeneous way and
    documented as to their origin.

6
  • There is nothing new in the idea of collecting
    learner data. Both FLT and SLA researchers have
    been collecting learner output for descriptive
    and/or theory-building purposes since the
    disciplines emerged. In view of this, it is
    justified to ask what added value, if any, can be
    gained from using learner corpus data.
  • (Granger 2004: 123f.)

7
  • a new resource for second language acquisition
    (SLA) and foreign language teaching (FLT)
    specialists.
  • especially useful when annotated with the help of
    a standardized system of error tags.

8
  • Learner language differs from native language
    both quantitatively and qualitatively.
  • It displays very different frequencies of words,
    phrases and structures, with some items overused
    and others significantly underused.
  • It is also characterized by a high rate of
    misuse, i.e. orthographic, lexical, and
    grammatical errors.
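
A minimal sketch of how overuse and underuse might be checked for a single word, assuming two tokenised word lists (one learner, one native). The toy sentences, the function names, and the choice of the log-likelihood statistic are illustrative assumptions, not part of the slides.

```python
import math
from collections import Counter

def log_likelihood(a, b, n_a, n_b):
    """Log-likelihood (G2) for an item seen a times in n_a tokens and b times in n_b tokens."""
    expected_a = n_a * (a + b) / (n_a + n_b)
    expected_b = n_b * (a + b) / (n_a + n_b)
    g2 = 0.0
    if a:
        g2 += a * math.log(a / expected_a)
    if b:
        g2 += b * math.log(b / expected_b)
    return 2 * g2

def compare(word, learner, native):
    """Return ('overuse' | 'underuse' | 'same', G2) for one word."""
    n_l, n_n = sum(learner.values()), sum(native.values())
    rel_l, rel_n = learner[word] / n_l, native[word] / n_n
    if rel_l > rel_n:
        direction = "overuse"
    elif rel_l < rel_n:
        direction = "underuse"
    else:
        direction = "same"
    return direction, log_likelihood(learner[word], native[word], n_l, n_n)

# Invented toy data, for illustration only.
learner = Counter("i think it is very very good and i think it is very nice".split())
native = Counter("it seems rather good and i suppose it is quite nice indeed".split())

for w in ("very", "think", "rather"):
    print(w, compare(w, learner, native))
```

With real corpora the counts would come from the full learner and reference corpora, and the G2 value would be compared against a significance threshold before calling an item over- or underused.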

9
  • "The area of linguistic enquiry known as learner
    corpus research ... has created an important
    link between the two previously disparate fields
    of corpus linguistics and foreign/second language
    research. Using the main principles, tools and
    methods from corpus linguistics, it aims to
    provide improved descriptions of learner language
    which can be used for a wide range of purposes in
    foreign/second language acquisition research and
    also to improve foreign language teaching."
    (Granger 2002, 4)

10
Features of Learner Corpora
  • Storing the data in a machine-readable format
  • Computational analysis can be exploited
  • Annotations make corpora even more valuable
  • Finding patterns vs. idiosyncrasies
  • Can be shared with other researchers
  • Can be used for research and teaching/learning.
  • Standard reference

11
  • Hence, it is vital to consult corpora of learner
    data, as well as corpora of data that serve as
    input to learners, before we can get a full
    picture of what learners know and how they come
    to know it.
  • Alan Juffs (2001), in SSLA, p. 312

12
What FLT fields will benefit from computer
learner corpus (CLC) research?
  • curriculum design: use of CLC for selecting and
    sequencing what needs to be taught
  • materials design: use of CLC to improve FLT
    tools (traditional grammars and dictionaries,
    web-based materials, and CALL programs)
  • classroom methodology: use of CLC for data-driven
    learning (DDL) and learning-based exercises
  • (Granger)

13
Mark-up annotation
14
Annotated Learner Data
  • Mark-up: header info, sentence boundaries, etc.
  • Useful for choosing a proper set (portion) of
    files
  • Annotation: POS tagging, error tagging, etc.
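
A small sketch of why header mark-up is useful for choosing a portion of the files. The header fields (`l1`, `level`), the class name, and the sample texts are hypothetical; real corpora define their own header schemes.

```python
from dataclasses import dataclass

@dataclass
class LearnerText:
    text_id: str
    l1: str      # learner's first language (hypothetical header field)
    level: str   # proficiency level (hypothetical header field)
    body: str    # the learner's text itself

def select(corpus, **criteria):
    """Return the texts whose header fields match every criterion."""
    return [t for t in corpus
            if all(getattr(t, field) == value for field, value in criteria.items())]

# Invented sample records, for illustration only.
corpus = [
    LearnerText("t001", "Chinese", "intermediate", "In the past, people are kind to each other."),
    LearnerText("t002", "French", "advanced", "I am agree with this opinion."),
    LearnerText("t003", "Chinese", "advanced", "He suggested me to go."),
]

subset = select(corpus, l1="Chinese", level="advanced")
print([t.text_id for t in subset])
```

Keeping the metadata separate from the text body means the same selection logic works whatever annotation is later added to the body.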

15
Error analysis
16
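  • CA: contrastive analysis
  • EA: error analysis
  • CIA: contrastive interlanguage analysis
  • CEA: computer-aided error analysis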
17
ERROR
CA, EA
CIA, CEA
18
Traditional EA suffers from a number of
limitations
  • Limitation 1: EA is based on heterogeneous
    learner data
  • Limitation 2: EA categories are fuzzy
  • Limitation 3: EA cannot cater for phenomena such
    as avoidance
  • Limitation 4: EA is restricted to what the
    learner cannot do
  • Limitation 5: EA gives a static picture of L2
    learning.
  • (Dagneaux et al. 1998: 164)

19
Criticism of EA
  • Once a very popular enterprise, error analysis
    (EA) is now out of favor with most SLA/FLT
    circles. It has gone down in history as a fuzzy,
    unscientific, and unreliable way of approaching
    learner language.

20
  • mistake: performance
  • error: competence (systematic, would occur in
    similar contexts)
21
  • errors are an integral part of interlanguage and
    are just as worthy of analysis as any other IL
    aspect.
  • an important key to a better understanding of
    the process underlying L2-learning (Ringbom
    1987, p. 69)
  • can still serve as a useful tool and is still
    undertaken (Ellis 1994, p. 20)
  • In particular, a detailed description of learner
    errors cannot but contribute to one essential FLT
    aim: that of helping learners to achieve a high
    level of accuracy in the language.

22
goals of learner corpus research
  • descriptively: find and classify errors, find
    patterns in learner language
  • theoretically: find out about learners'
    hypotheses (interlanguage)
  • improve teaching material

23
learner corpora: what can you do with them?
  • two main types of studies
  • error analysis (EA): qualitative and quantitative
  • contrastive interlanguage analysis (CIA):
    qualitative and quantitative
  • know your data: comparability, ways of counting,
    different distributions

24
what is an error?
  • "A linguistic form, ... which, in the same
    context would in all likelihood not be produced
    by the learner's native speaker counterparts."
    (Lennon 1991, 182)

25
  • structural errors (breaking of a rule),
  • non-structural errors,
  • deviations from some kind of norm ('breaches of
    code')
  • quantitative differences (overuse, underuse)

26
Steps in EA
  • 1. collection of samples of learner language
  • 2. identification of errors
  • 3. description of errors
  • 4. explanation of errors
  • Ellis 1994

27
error tags
  • classification
  • formal kind of error (insertion, deletion, ...)
  • exponent of error (word, phrase, ...)
  • hypothesis about reason (interference with L1,
    principle X not understood, ...)
  • linguistic level (morphology, syntax, ...)
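
The four classification dimensions above could be held in one record per error, which then allows counting along any dimension. This is only a sketch: the class name, field names, and sample values are assumptions chosen to mirror the bullet points, not an existing tagset.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class ErrorTag:
    formal_kind: str   # e.g. insertion, deletion
    exponent: str      # e.g. word, phrase
    hypothesis: str    # e.g. interference with L1
    level: str         # e.g. morphology, syntax

# Invented annotations, for illustration only.
annotations = [
    ErrorTag("deletion", "word", "L1 interference", "syntax"),
    ErrorTag("insertion", "word", "rule not understood", "morphology"),
    ErrorTag("deletion", "phrase", "L1 interference", "syntax"),
]

def count_by(tags, dimension):
    """Count errors along one classification dimension."""
    return Counter(getattr(t, dimension) for t in tags)

print(count_by(annotations, "level"))
print(count_by(annotations, "formal_kind"))
```

Keeping the hypothesis about the reason in its own field keeps the interpretative part of a tag apart from the descriptive parts.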

28
development of error tagsets
  • the development of error tags is difficult
    because
  • there is often more than one possible target
    hypothesis
  • the level of granularity of the tagset depends on
    the research question
  • generic tagsets vs. specific tagsets

29
Error tagging principles (1)
  • FRIDA (Granger et al)

30
Principles in error tagging
  • 1. informative but manageable: it should be
    detailed enough to provide useful information on
    learner errors, but not so detailed that it
    becomes unmanageable for the annotator

31
  • 2. reusable: the categories should be general
    enough to be used for a variety of languages

32
  • 3. flexible: it should allow for addition or
    deletion of tags at the annotation stage and for
    quick and versatile retrieval at the
    post-annotation stage

33
  • 4. consistent: to ensure maximum consistency
    between the annotators, detailed descriptions of
    the error categories and error tagging principles
    should be included in an error tagging manual.
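
Consistency between annotators can also be checked numerically once two people have tagged the same items. The sketch below uses Cohen's kappa, a common agreement measure that the slides themselves do not mention; the label lists are invented.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Proportion of items on which the annotators agree.
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0
    return (observed - expected) / (1 - expected)

# Invented error-domain labels from two annotators, for illustration only.
annotator_1 = ["G", "L", "G", "F"]
annotator_2 = ["G", "L", "L", "F"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))
```

A low value would suggest that the category descriptions in the tagging manual need to be made more precise.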

34
two major descriptive error taxonomies
  • (a) one based on linguistic categories (general
    ones such as morphology, lexis, and grammar and
    more specific ones such as auxiliaries, passives,
    and prepositions) and
  • (b) the other focusing on the way surface
    structures have been altered by learners (e.g.,
    omission, addition, misformation, and
    misordering).

35
three levels of annotation
  • error domain,
  • error category, and
  • word category.
  • These three levels are descriptive rather than
    interpretative.

36
  • The error domain is the most general level: it
    specifies whether the error is formal (i.e.
    orthographic), grammatical, lexical, and so
    forth. Each error domain is subdivided into a
    variable number of error categories.

37
Error Domains and Categories (1)
38
2
39
3
40
  • The lexical domain <L> groups all lexical errors
    due to
  • 1. insufficient knowledge of the conceptual
    (i.e., denotative) meaning of words <SIG>
  • 2. violations of the co-occurrence patterns of
    words. This category covers a wide spectrum from
    restricted collocations to idioms <FIG>
  • 3. violations of the grammatical complementation
    (i.e., valency) patterns of words. This category
    covers the valency of verbs <CPV>, nouns <CPN>,
    adjectives <CPA> and adverbs <CPD>
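
The domain and category codes named on this slide (L, SIG, FIG, CPV, CPN, CPA, CPD) could be stored as a lookup table so that tags are validated at annotation time. The dictionary layout and the `describe` function are assumptions for illustration; only the codes and their glosses come from the slide.

```python
# Category codes of the lexical domain, glossed as on the slide.
LEXICAL_CATEGORIES = {
    "SIG": "conceptual (denotative) meaning of words",
    "FIG": "co-occurrence patterns, from restricted collocations to idioms",
    "CPV": "complementation (valency) of verbs",
    "CPN": "complementation (valency) of nouns",
    "CPA": "complementation (valency) of adjectives",
    "CPD": "complementation (valency) of adverbs",
}

# Only the lexical domain is filled in here; other domains would be added likewise.
DOMAINS = {"L": LEXICAL_CATEGORIES}

def describe(domain, category):
    """Return the gloss for a domain/category pair, or raise on an unknown tag."""
    try:
        return DOMAINS[domain][category]
    except KeyError:
        raise ValueError(f"unknown tag <{domain}><{category}>") from None

print(describe("L", "CPV"))
```

Rejecting unknown codes early is one simple way to support consistent tagging across annotators.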

43
CLEC
  • Chinese Learner English Corpus

44
Error tagging principles (2)
  • CLEC

45
  • 1. Errors are divided into 11 categories: word
    form (fm), verb phrase (vp), noun phrase (np),
    pronoun (pr), adjective phrase (aj), adverb (ad),
    prepositional phrase (pp), conjunction (cj), word
    (wd), collocation (cc), and syntax (sn). Each
    category has numbered types; for cc, for
    example, cc1, cc2, cc3, and so on.

46
  • 2. Some categories have many error types (e.g.
    vp and np, with 9 types), others only a few
    (e.g. cj). In total there are 61 error types.

47
  • 3. Tags are inserted into the text, e.g. In the
    past, people are [vp6, 4-] kind to each other.
    Here vp6 combines the category vp with type
    number 6, and 4- marks the position of the
    error, which concerns the word are.
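
A sketch of how inline tags such as vp6, 4- in the example sentence might be extracted automatically. It assumes that tags are enclosed in square brackets and placed directly after the word they concern; both points, like the function names, are assumptions rather than a documented CLEC format.

```python
import re

# Assumed tag shape: [<two-letter category><type number>, <digits>-<digits>]
TAG = re.compile(r"\[([a-z]{2})(\d+),\s*(\d*)-(\d*)\]")

def parse_tags(sentence):
    """Return one dict per tag: category, type number, position code, tagged word."""
    results = []
    for m in TAG.finditer(sentence):
        preceding = sentence[:m.start()].split()
        results.append({
            "category": m.group(1),
            "type": int(m.group(2)),
            "position": f"{m.group(3)}-{m.group(4)}",
            "word": preceding[-1] if preceding else None,
        })
    return results

def strip_tags(sentence):
    """Remove the tags to recover the learner's original wording."""
    return re.sub(r"\s*" + TAG.pattern, "", sentence)

tagged = "In the past, people are [vp6, 4-] kind to each other"
print(parse_tags(tagged))
print(strip_tags(tagged))
```

Separating the category from the type number makes it possible to retrieve all errors of one category (for instance all vp errors) or only one specific type.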

48
Error tags
Positions of error
Types of error
49
  • 4. Error types can be subdivided further where
    needed: for example, sn8 can be split into sn81,
    sn82, and so on.

63
  • Your further tagging or your own tagging

64
CA to CIA
65
CA, CIA
  • CA: OL <> OL, SL <> TL
  • CIA: NL <> IL, IL <> IL
  • Diagram labels: DIAGNOSTIC, PREDICTIVE, TRANSFER
66
L2 Corpus (English)
L1 Corpus (Chinese)
L1 Corpus (English)
67
LOCNESS
CLEC/ SWECCL