Title: Qiufang Wen
1Chinese learner corpora and second language
research
The 2006 International Symposium of
Computer-Assisted Language Learning, June 2-4,
2006, Beijing
- Qiufang Wen
- The National Research Center for Foreign Language
Education, BFSU
2Topics to be addressed
- English corpora of Chinese learners
- Corpus-based studies on English learners in mainland China
- Several corpus-based studies on English learners' interlanguage by myself or together with my colleagues
- Advantages and disadvantages of corpus-based studies on the interlanguage
3Topic One
- English corpora of Chinese learners
4- Chinese Learner English Corpus (CLEC)
- College Learners Spoken English Corpus (COLSEC)
- Spoken and Written Corpus of Chinese Learners (SWECCL)
- Version 1
- Version 2 (under construction)
- Bilingual Corpus of Chinese English Learners (BICCEL) (under construction)
5 1. Chinese Learner English Corpus (CLEC) by Gui and Yang in 2003
- Written corpus: 1 million words
- Timed and untimed compositions
- Levels of proficiency
- Middle school students
- Non-English major (Band 4)
- Non-English major (Band 6)
- English majors (Band 4)
- English majors (Band 8)
- Error-tagged
6Two Types of English Learners in University
[Diagram: English majors and non-English majors, each shown across Years 1-4, with test levels labelled Band 2, Band 4, Band 6 and Band 8]
7 2. College Learners Spoken English Corpus (COLSEC) by Yang and Wei in 2005
- Tokens: 0.7 million
- Source: National spoken English test for non-English majors
- Test items
- Teacher-student conversation
- Student-student discussion
- Teacher-student discussion
- Data format: written transcripts
8 3. Spoken and Written Corpus of Chinese Learners (SWECCL) by Wen, Wang and Liang in 2005 (Version 1)
SWECCL
WECCL
SECCL
1.18 million
1.46 million
9Spoken (SECCL)
- Source of data
- National spoken English test 1996-2002
- Second-year English majors
- Data format
- Digital sounds as well as transcripts of the speeches
10National spoken English test for English majors
Band 4
- Test format
- Test in a lab
- The number of testees annually
- 2006: more than 16,000
- Expected to reach 50,000 in the future
- Scoring procedures
- A random sample (30-35 tapes)
- Two raters scoring one tape independently
11- Number of subjects
- 6 groups from each year (1996-2002)
- 42 groups (30-35 students each), about 1,400 students
- About 230 hours of speech
- Testing items
12Testing items
13The structure of SECCL
[Tree diagram; node labels: SECCL, Text, Sound files, Tagged, Raw, Whole, Task, Task A, Task B, Task C, Special, Article, Past Tense, Year, 1996-2002]
14The written component
Written
Year 1
Year 2
Year 3
Year 4
15The written component
- Source of data
- Timed compositions in class (40 minutes, no fewer than 300 words)
- Take-home compositions (no word limit)
- Types of compositions
- Argumentative (a list of topics provided)
- Narrative
16SWECCL in 2007 (Version 2)
SWECCL
- WECCL: two million
- SECCL: two million
17SECCL (Version 2)
- 2003-2006 National Spoken English Test for second-year English majors (Band 4)
- 2000-2006 National Spoken English Test for 4th-year English majors, Band 8 (Task 3)
- Longitudinal data (2001-2004)
18Spoken (Band 8)
- Testing item (Task C)
- Make a comment on a given topic
- Data format
- Digital sounds as well as transcripts of the
speeches
19Spoken (Longitudinal)
- 72 students 56 students
- 40 hours of speech
20Tasks
- Reading aloud
- Retelling a story
- Talking on a given topic (Narrative)
- Talking on a given topic (argumentative)
- Conversation (Role play)
- Discussion on a given topic
21 4. Bilingual Corpus of Chinese English Learners (BICCEL)
BICCEL
- Spoken
- E-C: 0.5 million
- C-E: 0.5 million
- Written
- E-C: 0.5 million
- C-E: 0.5 million
22Spoken component of BICCEL
- National Oral English test Band 8
- The 4th year English majors
- Interpreting from English to Chinese (Task A)
- Interpreting from Chinese to English (Task B)
- 2001-2005: 1,100 testees
23Written component of BICCEL
- Source of data: in-class assignments
- E-C and C-E translation
- Across the 3rd and 4th years
- 30 universities across the country
24Topic Two
- A brief review of corpus-based studies on Chinese
learner English
25Sources
- China National Knowledge Infrastructure (CNKI) (online journals)
- Digital dissertation database
26Corpus-based studies in mainland China
27Research areas
28Conferences and workshop
- The International Conference on Corpus Linguistics, 25-27 October 2003
- The First National Symposium on Corpus Linguistics and ELT Education, 11-13 October 2004
- Workshop on the Use of Corpus in Teaching and Research, 17-19 March 2006
29Topic Three
- Several corpus-based studies on English learners' interlanguage by myself or together with my colleagues
30Study One
- Features of oral style in English compositions of advanced Chinese EFL learners
- Wen, Q. F., Ding, Y. R., and Wang, W. Y. 2003. Foreign Language Teaching and Research (4), 268-274.
31Study Two
- A Study on Frequency Adverbs Used by Advanced English Learners in China
- Wen, Q. F., and Ding, Y. R. 2004. Modern Foreign Languages (2), 141-147.
32Study Three
- An analysis of English majors' abstracting abilities through their English compositions
- Wen, Q. F., and Liu, R. Q. 2006. Foreign Languages (2)
33Study Four
- A longitudinal study on the developmental features of speaking vocabulary by English majors in mainland China
- Wen, Q. F. 2006. Foreign Language Teaching and Research (3).
34Study Five
- A comparison of developmental features of speaking and writing vocabulary by English majors
- Wen, Q. F. 2006. Foreign Languages and Foreign Language Teaching (4)
35Study Six
- Patterns of change in speaking vocabulary
development by English majors
36Study Two
- A Study on Frequency Adverbs Used by Advanced English Learners in China
- Wen, Q. F., and Ding, Y. R. 2004. Modern Foreign Languages (2), 141-147.
37Frequency Adverbs
- Adverbs used for describing how often something happens
- e.g. never, sometimes, usually, always
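To compare corpora of different sizes, raw counts of frequency adverbs are usually converted to a rate per million tokens. The following is a minimal sketch of that step, not the procedure used in the study: the tokenizer is a simple regular expression, the word list is only the four examples named on this slide, and the function name `per_million` and the sample sentence are made up for illustration.

```python
import re
from collections import Counter

# Illustrative subset only: the four examples from the slide,
# not the full list of twenty adverbs.
FREQUENCY_ADVERBS = {"never", "sometimes", "usually", "always"}


def per_million(text, targets=FREQUENCY_ADVERBS):
    """Return (token_count, {adverb: occurrences per million tokens})."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return 0, {}
    counts = Counter(t for t in tokens if t in targets)
    rates = {w: counts[w] * 1_000_000 / total for w in sorted(targets)}
    return total, rates


# Hypothetical sample text, invented for this example.
sample = "I always get up early. Sometimes I am late, but I never skip class."
total, rates = per_million(sample)
print(total, rates)
```

On a real corpus the same function would be applied to each corpus file in turn, so that learner and native-speaker rates are on the same scale before they are compared.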
38Top Twenty Frequency Adverbs
- Most frequently used by native speakers according to the analyses of the British National Corpus (BNC) by Leech, Rayson and Wilson (2001)
39Top Twenty Frequency Adverbs (TTFAs)
40Common features
- All high-frequency words
- Different frequencies in speech and writing
except sometimes and twice (Leech et al. 2001)
41A comparison of TTFAs in speech and writing
- The overall difference
- TTFAs are more likely to occur in writing than in speech.
- The specific differences
- Speech: never, always, ever, normally
- Neutral: sometimes, twice
- Writing: 14 words
42Previous corpus-based studies
- e.g. Altenberg and Granger, 2001; Cobb, 2002; Ringbom, 1998; Wen, Ting, Wang, 2003
- Conflicting finding one: overuse vs. underuse
43Examples
- Overuse high-frequency words in writing (Cobb, 2001)
- Overuse modal verbs (Aijmer, 2002)
- Underuse adverbial connectors (Altenberg and Tapper, 1998)
- No study on frequency adverbs
44Conflicting finding two
- Tend to use written style features in their speech
- Tend to use a mixed register in either speech or writing
- Tend to use oral style features in their writing
- Did not compare the use of high-frequency words in speech with that in writing
45General purposes of this study
- Whether Chinese EFL learners simply overuse the TTFAs or overuse some while underusing others
- Whether they use the TTFAs similarly or differently when their speech is compared with their writing
46Research questions
- Do they overuse or underuse the TTFAs differently between speech and writing?
- Do they differ more from native speakers in writing or in speaking with regard to the use of the TTFAs?
- Do they demonstrate a similar pattern of writing-speaking difference as native speakers in the use of the TTFAs?
47Data for analysis
48Data analysis
- Four comparisons
- Learners' speech and native speakers' speech: SECCL vs. BNCS
- Learners' writing and native speakers' writing: CLEC vs. BNCW
- Difference between learners' and native speakers' speech and difference between learners' and native speakers' writing: SECCL vs. BNCS and CLEC vs. BNCW
- Difference between learners' speech and writing and difference between native speakers' speech and writing: SECCL vs. CLEC and BNCS vs. BNCW
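Each of these comparisons comes down to asking whether a word is significantly more or less frequent in one corpus than in another. The slides do not say which statistic was used, so the sketch below swaps in the log-likelihood ratio that is widely used for corpus comparison, as an assumption. The function names and all counts in the usage lines are hypothetical and are not results from SECCL, CLEC or the BNC.

```python
import math


def log_likelihood(a, n1, b, n2):
    """Log-likelihood ratio for a word occurring a times in a corpus of
    n1 tokens and b times in a corpus of n2 tokens."""
    # Expected counts if the word were equally frequent in both corpora.
    e1 = n1 * (a + b) / (n1 + n2)
    e2 = n2 * (a + b) / (n1 + n2)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def classify(a, n1, b, n2, critical=3.84):
    """Label the learner corpus (a, n1) against the reference corpus (b, n2).
    3.84 is the chi-square critical value for p < 0.05 with 1 degree of freedom."""
    if log_likelihood(a, n1, b, n2) < critical:
        return "no significant difference"
    return "overuse" if a / n1 > b / n2 else "underuse"


# Hypothetical counts in two corpora of one million tokens each.
print(classify(500, 1_000_000, 100, 1_000_000))
print(classify(100, 1_000_000, 500, 1_000_000))
print(classify(100, 1_000_000, 100, 1_000_000))
```

The same two functions cover all four comparisons: only the pair of corpora passed in changes (for example learner speech against native speech, or learner speech against learner writing).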
49Results(1)
- TTFA use in the learners' spoken corpus (SECCL)
50Results(2)
- TTFA use in the learners' written corpus (CLEC)
51Results(3)
- Comparison of learners' speech with their writing in TTFA use (Overuse)
52Results(3)
53Results(3)
- Comparison (identical or similar)
54Results(4)
- Speaking-writing differences in TTFA use in the
CEMIC and the BNC
55Results(4)
- Speaking-writing differences in TTFA use in the
CEMIC and the BNC
56Summary (1)
- English majors in China tend to overuse and
underuse certain TTFAs in their speech and
writing. The overuse tendency is stronger than
the underuse tendency in both speech and writing.
57Summary (2)
- The overuse tendency is more marked in their
speech than in their writing while the underuse
tendency is also slightly stronger in speech than
in writing. Some of the overused or underused
TTFAs in speech are the same as those in writing
but others are different.
58Summary (3)
- Chinese English majors demonstrate a pattern of
speaking-writing difference that is opposite to
that shown in the native speakers' corpus: they
tend to use more TTFAs in their speech than in
their writing, while native speakers tend to use
more TTFAs in their writing than in their speech.
This suggests that Chinese EFL learners use TTFAs
without awareness of their register differences.
59Possible reasons
- Limited vocabulary (Table 1b)
- Use them as time buyers
- Without equivalents readily available in Chinese
60Topic Four
- Advantages and disadvantages of corpus-based
studies on SLA
61Advantage One
- A large sample stored electronically and open to the public
- Validity and reliability (replicable)
- Possible for a diachronic study
62Advantage Two
- Using computer software such as WordSmith
- Effectiveness and efficiency
63Advantage Three
- Understand the learner language from a different perspective
- Correct vs. incorrect
- More acceptable vs. less acceptable
- Frequency
- Overuse
- Underuse
- Non-use
64Disadvantages
65Closing Remark
- The number of researchers increasing
- Constructing different types of corpora
- Carrying out corpus-based studies
- Findings useful for textbook writers as well as
for practitioners