Title: Investigating Chinese Learner English
- Centre for Linguistics and Applied Linguistics,
- Guangdong University of Foreign Studies
- Gui Shichun
Background
- The corpus, the Chinese Learner English Corpus (CLEC), consists of one million words of written compositions by five types of learners: senior middle-school students (ST2), tertiary College English learners at band 4 (ST3) and band 6 (ST4), tertiary English majors in their 1st and 2nd years (ST5), and tertiary English majors in their 3rd and 4th years (ST6).
- The corpus is annotated with grammatical tags (automatically) and error tags (manually).
- It is available at http://www.clal.org.cn/baseinfo/achievement/Achievement1.htm
Areas of Investigation
- Leech (1998) raises two specific questions in connection with the study of learner language:
- What are the particular areas of overuse, underuse and error which native speakers of language A are prone to in learning target language T, as contrasted with native speakers of languages B, C, D . . . ?
- What, in general, is the proportion of non-native target language behaviour (overuse, underuse, error) peculiar to native speakers of language A, as opposed to such behaviour which is shared by all learners of the language, whatever their mother tongue?
- Contrastive studies must be conducted with care, because the corpora under investigation are based on different types of language performance. One of the key issues is to identify context-free variables, e.g. function words and some of the most frequently used notional words. We believe that an annotated learner corpus is useful in the following ways:
- Identifying the words and structures that are typically underused or overused in the learner corpus
- Identifying the kinds of error learners at different levels are likely to commit
- Predicting the language proficiency of the learners
- Providing diagnostic information to both teachers and learners
Comparison of grammatical tags
- In our POS tagging program, we used the same 133 × 133 matrix of tag-transition frequencies and had CLEC tagged automatically. We then compared the grammatical tags of CLEC with those of the Brown and LOB corpora.
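The slides do not describe the tagger's internals. As a rough illustration of how a tag-transition matrix is used in automatic tagging, the sketch below trains a bigram hidden Markov model tagger with add-one smoothing and decodes with the Viterbi algorithm. This is an assumption about the general technique, not the program used for CLEC; the two training sentences and the Brown-style tag names (AT, NNS, VBD) are hypothetical, and a real transition table would have 133 rows and columns rather than three.

```python
import math
from collections import defaultdict

START = "<s>"  # pseudo-tag marking the start of a sentence


def train(tagged_sents):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(lambda: defaultdict(int))  # trans[previous_tag][tag]
    emit = defaultdict(lambda: defaultdict(int))   # emit[tag][word]
    for sent in tagged_sents:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            prev = tag
    return trans, emit


def log_prob(table, given, outcome, n_outcomes):
    """Add-one smoothed log P(outcome | given)."""
    row = table.get(given, {})
    total = sum(row.values())
    return math.log((row.get(outcome, 0) + 1) / (total + n_outcomes))


def viterbi(words, trans, emit):
    """Return the most probable tag sequence for `words`."""
    tags = sorted(emit)
    n_tags = len(tags)
    n_words = len({w for t in tags for w in emit[t]}) + 1  # +1 for unseen words
    # best[i][tag] = (log score of best path ending in tag at position i, previous tag)
    best = [{t: (log_prob(trans, START, t, n_tags)
                 + log_prob(emit, t, words[0].lower(), n_words), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            e = log_prob(emit, t, word.lower(), n_words)
            column[t] = max((best[-1][p][0] + log_prob(trans, p, t, n_tags) + e, p)
                            for p in tags)
        best.append(column)
    # Trace back from the best final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


training = [
    [("the", "AT"), ("learners", "NNS"), ("wrote", "VBD"), ("compositions", "NNS")],
    [("the", "AT"), ("teachers", "NNS"), ("read", "VBD"), ("the", "AT"), ("compositions", "NNS")],
]
trans, emit = train(training)
print(viterbi(["The", "teachers", "wrote", "compositions"], trans, emit))
```

The transition counts in `trans` play the role of the tag-transition matrix: each decoding step combines the score of the previous tag, the transition probability into the current tag, and the probability of the word given that tag.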
- The native corpora (Brown and LOB) are fairly consistent in terms of their grammatical tagging.
- Chinese learners used more pronouns, but fewer determiners, prepositions, and numerals. The greater use of pronouns and the lesser use of numerals reflect differences in subject matter between the learner corpus and the native speaker corpora, because what the learners wrote relates to their personal and school life and activities. The underuse of determiners and prepositions, however, may reflect genuine problems in the learners' writing.
- A further step is to study each category of the tagging scheme in greater detail. Let's look at the use of determiners as an example.
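Because the corpora are not exactly the same size, such comparisons are best made on normalized rates rather than raw counts. A minimal sketch, assuming each corpus is available as a flat list of POS tags and using a small hypothetical mapping from fine-grained tags to coarse word classes (the tag names are illustrative, not the actual 133-tag set):

```python
from collections import Counter

# Hypothetical mapping from fine-grained tags to coarse word classes
# (illustrative Brown-style tags only; not the actual 133-tag set).
COARSE = {
    "AT": "determiner", "DT": "determiner",
    "PPS": "pronoun", "PPO": "pronoun",
    "IN": "preposition",
    "CD": "numeral",
}


def class_rates(tags, per=1000):
    """Frequency of each coarse class per `per` tokens."""
    if not tags:
        return {}
    counts = Counter(COARSE.get(tag, "other") for tag in tags)
    return {cls: n * per / len(tags) for cls, n in counts.items()}


def compare(learner_tags, native_tags, classes):
    """Pair the learner rate with the native rate for each class of interest."""
    learner, native = class_rates(learner_tags), class_rates(native_tags)
    return {cls: (learner.get(cls, 0.0), native.get(cls, 0.0)) for cls in classes}


print(compare(["PPS", "VBD", "AT", "NN"], ["AT", "NN", "IN", "AT"],
              ["determiner", "pronoun", "preposition"]))
```

Each pair reads (learner rate, native rate); a learner rate well below the native rate flags a candidate area of underuse, which a significance test can then confirm.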
Some observations of Chinese learners' use of determiners
- Chinese learners used fewer determiners, but the total frequencies for ST6 learners were closer to those of the native speakers.
- Chinese learners used fewer articles; 'the', 'no', 'a' and 'any' were the most underused determiners.
- A tendency can be observed: the more proficient the learners, the closer their use of some determiners is to that of native speakers, for example quite, rather, half, all, both, these, those, many, much, next, former, and other. The last five are post-determiners, which were used much more often by native speakers and can be considered discourse markers. We hypothesize that they can be used for text identification, for example in the automatic scoring of learners' compositions.
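To illustrate how the post-determiners might serve as scoring features, here is a minimal sketch that computes a per-1,000-token rate for each of the five words and the city-block distance of a learner text from a native reference profile. The distance measure and the idea of a fixed native profile are assumptions for illustration only; the source does not specify a scoring method. Counting raw word forms, rather than tagged determiner uses, also overcounts words like much that can act as adverbs.

```python
import re

# The five post-determiners named above.
POST_DETERMINERS = ("many", "much", "next", "former", "other")


def profile(text, per=1000):
    """Rate per `per` word tokens of each post-determiner word form."""
    tokens = re.findall(r"[a-z]+", text.lower())
    n = len(tokens) or 1  # avoid dividing by zero on empty input
    return {w: tokens.count(w) * per / n for w in POST_DETERMINERS}


def distance_to_native(learner, native):
    """City-block distance between two profiles; smaller means closer to native use."""
    return sum(abs(learner[w] - native[w]) for w in POST_DETERMINERS)


essay = profile("Many students read other books and much more.")
print(essay)
```

In practice the native profile would be computed from a reference corpus such as Brown or LOB, and the distance would be one feature among many in a scoring model.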
Underuse of the learner lexicon
- The most frequently used lexical items are more or less context-free, which makes them a suitable starting point for our analysis. They include:
- most function words, such as determiners and prepositions
- some of the modal or auxiliary verbs
- some polysemous words, such as go, make, take, great, risk, etc.
- some pronouns, especially personal pronouns
Overuse of Modal Verbs (can, may, must, should)
- As observed by Biber et al., can (ability or permission) and must (logical necessity) are much more common in conversation, so the overuse of can and must can be considered an indication of Chinese learners' writing style: they make no distinction between the stylistic conventions of spoken and written English. The materials of CET learners were collected mostly from CET test papers, yet these learners displayed the greatest number of uses of can, must, and should. Chinese learners tend to write the way they speak, even though they may not be well versed in speaking, as indicated by their underuse of could, have to, had better and have got to.
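Counting these items requires matching multiword expressions such as have got to as well as single words. A minimal sketch, assuming plain-text input and simple lowercase word tokenization; it counts exact word forms only (so has to or had to are not matched, and the noun can is not excluded), and the rate per 10,000 tokens is an illustrative choice:

```python
import re

# Single-word and multiword modal expressions discussed above.
MODALS = ("can", "could", "may", "must", "should", "have to", "had better", "have got to")


def modal_rates(text, per=10_000):
    """Occurrences of each modal expression per `per` word tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    n = len(tokens) or 1
    rates = {}
    for modal in MODALS:
        parts = modal.split()
        k = len(parts)
        # Slide a window of k tokens over the text and count exact matches.
        hits = sum(1 for i in range(len(tokens) - k + 1) if tokens[i:i + k] == parts)
        rates[modal] = hits * per / n
    return rates


print(modal_rates("We can go. We must go. We have to go."))
```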
Comparing keyness of CLEC and FLOB
- Using the KeyWords tool in WordSmith Tools, we can identify the words underused in the learner corpus in terms of keyness, which is computed with the classic chi-square test of significance with Yates' correction for a 2 × 2 table. For a better estimate of keyness, Ted Dunning's log-likelihood test is used when contrasting long texts or a whole genre against the reference corpus. The higher the chi-square value, the greater the difference between the frequencies in the two corpora under observation.
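Both statistics can be computed from the same 2 × 2 contingency table (rows: the two corpora; columns: occurrences of the word versus all other tokens). The sketch below illustrates the textbook formulas and is not WordSmith's own code; clamping the Yates-corrected difference at zero and using the four-cell form of Dunning's G² are choices made here, so WordSmith's figures may differ slightly.

```python
import math


def _table(freq1, size1, freq2, size2):
    """2 x 2 table: rows = corpora, columns = (the word, all other tokens)."""
    return [[freq1, size1 - freq1], [freq2, size2 - freq2]]


def chi_square_yates(freq1, size1, freq2, size2):
    """Pearson chi-square with Yates' continuity correction."""
    (a, b), (c, d) = _table(freq1, size1, freq2, size2)
    n = a + b + c + d
    # Clamp at zero so the correction never inflates a near-zero difference.
    numerator = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator if denominator else 0.0


def log_likelihood(freq1, size1, freq2, size2):
    """Dunning's G2 = 2 * sum(O * ln(O / E)) over the four cells."""
    table = _table(freq1, size1, freq2, size2)
    n = size1 + size2
    rows = [size1, size2]
    cols = [freq1 + freq2, n - freq1 - freq2]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            expected = rows[i] * cols[j] / n
            if observed > 0:  # cells with O = 0 contribute nothing
                g2 += observed * math.log(observed / expected)
    return 2 * g2


# A word occurring 100 times in a 10,000-token corpus and 50 times in another.
print(chi_square_yates(100, 10_000, 50, 10_000), log_likelihood(100, 10_000, 50, 10_000))
```

Both values are non-negative and measure only the size of the difference; to tell underuse from overuse, compare the relative frequencies freq1 / size1 and freq2 / size2.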
Fewer third person pronouns
- This is the result of transfer from Chinese, because in modern colloquial Chinese the third person pronouns do not mark gender. There is no underuse of first and second person pronouns. ST3-4 learners show wider discrepancies because their compositions are mainly thematic writing.
Fewer passive voice constructions
- This is shown by the underuse of 'been' and 'by', and partially by that of 'was' and 'were'. The ST3-4 group and the ST5-6 group seem to follow the same tendency.
Fewer relative clauses
- Chinese learners tend to use fewer relative clauses, as shown by the underuse of wh-words. The discrepancies for ST5-6 are smaller, showing that these learners are closer to the native speakers.
Contrastive analysis of risk and its synonyms across several corpora
Using more danger than risk
- While the frequencies of risk, danger, threat, and hazard are fairly consistent across the native speaker corpora, the performance of Chinese learners is quite different.
- Danger is a more generic term. The following errors result from the generic use of danger:
- Fake furniture brings danger to people. (Better: It is risky buying fake furniture.)
- Water is facing the danger of shortage. (Better: We are facing the threat of water shortage.)
- The learners' knowledge of risk is quite limited. They know how to use take the risk (8 occurrences), at the risk (3) and to risk (6), whereas native speakers say avoid / carry / eliminate / ignore / increase / involve / give / reduce / run / worth / lack of + the risk, and conventional / maximum / no / some / suicide / own / unnecessary / hazard / with / without + risk.
- Chinese learners do not seem to know how to use high risk, which is used quite often in the native speaker corpora.
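Collocate lists like these are typically gathered from concordances. A minimal sketch, assuming plain text and lowercase word tokens, that separates words occurring immediately before the risk (e.g. take, run, reduce) from words immediately before bare risk (e.g. high, no, to). It matches the word form risk only, so risks and risky are ignored; a real study would work from a concordancer over the tagged corpora.

```python
import re
from collections import Counter


def risk_patterns(text):
    """Count left neighbours of 'the risk' and of bare 'risk'."""
    tokens = re.findall(r"[a-z]+", text.lower())
    before_the_risk = Counter()  # e.g. take / run / reduce + the risk
    before_risk = Counter()      # e.g. high / no / some + risk
    for i, tok in enumerate(tokens):
        if tok != "risk" or i == 0:
            continue
        if tokens[i - 1] == "the":
            if i >= 2:
                before_the_risk[tokens[i - 2]] += 1
        else:
            before_risk[tokens[i - 1]] += 1
    return before_the_risk, before_risk


print(risk_patterns("They take the risk. Smokers run a high risk. We want to reduce the risk."))
```

Running the same function over the learner and native corpora and comparing the two Counters gives lists like the ones above.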
Analyzing learner errors: the cognitive model
- We use error as a cover term for all the ways of being wrong as a foreign language (FL) learner. Errors result from uncertainty in language performance, and various kinds of uncertainty can be traced back to cognition:
- False analogy: books, news → knowledges, informations
- Incomplete application of rules: development → advantagement
- Redundancy: it was a three-story-tall building
- Overgeneralization: entered the classroom → returned the classroom
- Verbal behaviour (errors as well as linguistic structures) can be considered an emergent process, resulting from the competition of cues.
- To set up our cognitive framework of error analysis, we use only those errors whose frequencies are well above 1% of the total. There are 21 error types altogether.
- Errors can be divided into several levels that correspond to the processes of lexicalization → syntacticalization → relexicalization (Skehan, 1998).
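The frequency filter used to select error types for the framework can be sketched as follows, assuming error-tag counts have already been tallied into a dictionary. The error-type names and counts in the example are hypothetical, and since the source says only "well above 1%", the threshold is left as a parameter:

```python
def frequent_error_types(counts, threshold=0.01):
    """Error types whose share of all errors exceeds `threshold`, most frequent first."""
    total = sum(counts.values())
    if total == 0:
        return []
    kept = [(etype, n) for etype, n in counts.items() if n / total > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical counts, for illustration only.
print(frequent_error_types({"spelling": 500, "agreement": 300, "collocation": 195, "rare_type": 5}))
```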
- Lexical-perceptual level. These errors are also known as substance errors (James, 1998); MacWhinney defines this as the level that involves the acquisition of basic lexical structures in small areas of cortex called local maps. They are related to perceptual representations, especially to memory, as in memory failure or memory distortion. Typically these errors can be identified at the single-word level, as with
- spelling or number errors (great → graet, information → informations),
- or by looking at the word's close neighbors, as with
- absence of articles or prepositions (the moon is ___ brightest; I dressed myself in ___ hurry; I sat back ___ my chair).
- Lexico-grammatical (or lexical-grammatical) level. These errors reflect misconceptions of the target language system. When looking at our learners' errors, it is very difficult to separate grammar and lexis into distinct categories, because grammar does not exist on its own. James classifies them as text-level errors, whereas MacWhinney calls this the level that involves the interaction between lexical structures in terms of lexical groups. Typically these errors can be identified at the inter-word level, by looking at the word and its neighbors:
- wrong part of speech (POS errors: It is not difficulty that we can find)
- wrong word (substitution errors: If you match difficult problem; People take (pay) more attention to it)
- wrong collocates (They must listen to the lesson more carefully.)
- verb agreement (People argues that euthanasia or mercy killing is humane.)
- reference (My aunt came to my home with his son.)
- Syntactic level. These errors can be identified in a broader context, at the sentence level. James calls them discourse-level errors, but we propose to reserve the term discourse for a further, higher level. L2 learners often produce grammatical sentences that nevertheless sound foreign (Pawley and Syder, 1983). MacWhinney defines this as the level that involves the processing of syntactic information across longer neural distances in functional neural circuits. Syntactic errors range from
- capitalization (he learned English and Russian and Wrote the Civil War in France.) and
- punctuation (When playing football or basketball. You might be using 400 calories an hour.), to
- run-on sentences (If I am not famous, it doesn't matter, I don't mind this.),
- fragmentary sentences (As they do more exercises and often think deeply.) and
- structural deficiency (During I spent my holidays in Beijing about ten years ago,).
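Assigning error types to these three levels and tallying them per level could be sketched as below. The labels and the one-to-one mapping are hypothetical simplifications drawn from the examples above, not the actual CLEC error tags; the general remarks that close these slides point out that the same error type can in practice occur at more than one level.

```python
from collections import Counter

# Hypothetical assignment of error types to the three levels described above.
LEVEL_OF = {
    "spelling": "lexical-perceptual",
    "number": "lexical-perceptual",
    "missing_article_or_preposition": "lexical-perceptual",
    "part_of_speech": "lexico-grammatical",
    "substitution": "lexico-grammatical",
    "collocation": "lexico-grammatical",
    "agreement": "lexico-grammatical",
    "reference": "lexico-grammatical",
    "capitalization": "syntactic",
    "punctuation": "syntactic",
    "run_on": "syntactic",
    "fragment": "syntactic",
    "structural_deficiency": "syntactic",
}


def errors_by_level(error_tags):
    """Tally error-type labels by level; labels not in LEVEL_OF count as 'unassigned'."""
    return Counter(LEVEL_OF.get(tag, "unassigned") for tag in error_tags)


print(errors_by_level(["spelling", "agreement", "run_on", "spelling", "unknown_tag"]))
```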
- Figure 9: The Cognitive Model
- Confirmatory factor analysis conducted with LISREL 8.50 clearly shows that there are three factors, corresponding to the three categories defined above. Path analysis shows that all the parameters (λ values) of the hypothesized paths are significant except for run-on sentences.
λ = 0.28, insignificant
Correspondence Analysis (learner types by error types)
Some General Remarks
- On the whole, identifying errors at three levels seems to work well in our cognitive model. So far we have not covered errors at the discourse level, because:
- it is difficult to set down standards for native-like selection as defined by Pawley and Syder (1983), and
- it is even more difficult for Chinese error markers to observe those standards.
- The grouping of errors is not as clear-cut as we had thought. Very often the same type of error can be put into different categories, or can occur at all three levels, depending on the situation. Our grouping can therefore only follow the main tendency.
- At every level, language transfer seems to play an important role. This is because adult learners have already set up their (more complete) L1 linguistic system and are in the process of setting up another, still rather incomplete, one. When these mature learners want to express complex thoughts, they often fall back on the linguistic system that is more familiar to them.
- The occurrence of errors depends very much on the writing task and on the learners' certainty of fulfilling it, so error counts may not indicate the learners' language proficiency. CET learners tend to commit more lexico-grammatical errors because their data were collected mainly from CET compositions.
- This points to the need to include more learner data, so that we have a more balanced collection of error types for further investigation.
Thanks!
Your comments are welcome!