Title: ESP Materials
Wednesday, June 25, 2008, 3.00-3.50 pm
Building W5C, Room 221, Macquarie University
ESP Materials Derived from a Web-based Corpus
Presentation Outline
- Background and rationale
- Research questions
- Research methodology
- Data analysis and findings
- Discussion
Background of the Study
The study is being conducted:
- under the supervision of Assoc. Prof. David Hall
- in consultation with Prof. Pam Peters
Rationale of the Study
4 Main Considerations for ESP Materials Development
- 1. Limited availability of relevant ESP textbooks
  - Although specialized texts in ICT are abundant, they are too difficult for EFL students to be used directly in ESP classes without modification or support.
  - Hence the need for teacher-designed materials in ESP teaching.
- 2. Differences in students' background knowledge
  - ICT students
    - possess some specialized knowledge and skills to design hardware and software.
    - need English to communicate their knowledge in academic and professional contexts.
  - Non-ICT students
    - have little knowledge of ICT.
    - need ICT knowledge as computer users.
    - need to learn both basic ICT concepts and English to communicate in business companies or organizations.
  - Different learning needs: the same level of English, but different levels of specialized knowledge
  - Hence the need for different specialized contents to facilitate ESP learning
- 3. Insufficiency of EFL students' lexical knowledge
  - Undergraduate students in EFL countries, e.g. in Thailand (Supatranont, 2005), Oman (Cobb and Horst, 2001), and Indonesia (Nurweni and Read, 1999), were found to have limited lexical knowledge and to be less proficient in English than expected of students at university level.
  - In Supatranont's study (2005), the lexical knowledge of RMUTL students was found to be below the lexical threshold for academic study.
  - With a limited vocabulary of academic words, students cannot cope well with specialized texts, because the most frequent words in these texts consist of academic and sub-technical words (Mudraya, 2006).
  - Academic and technical words should therefore be integrated as the main vocabulary components of the language input.
- The lexical threshold for academic study is composed of two wordlists (Nation, 2001; Coxhead and Nation, 2001; Cobb and Horst, 2001; Nation and Waring, 1997):
  - General Service List (GSL): 2,000 high-frequency words (West, 1953)
  - Academic Word List (AWL): 570 academic words (Coxhead, 1998)
- Academic vocabulary in this study is based on the GSL and AWL (downloaded from http://www.uefap.com/vocab/vocfram.htm).
- 4. Typical problems of Thai students
  - Differences in verb use between English and Thai:
    - English is a time-oriented language, signifying time concepts with various tense forms through different verb inflections and auxiliaries.
    - Thai is a tenseless language (Baker, 2002), without verb inflections and auxiliaries; it uses context or adverbs of time to convey the time concept, and the passive form is less common.
  - As a result, Thai students make frequent errors in using verb tenses and passive forms in English.
Objectives of the Study
- To identify high-frequency language items in ICT specialized texts by focusing on five lexical and syntactic areas:
  - academic words
  - technical words
  - collocations
  - verb tenses
  - verb usage in the passive form
- To obtain a set of language input for designing course materials for teaching English for ICT to non-ICT EFL students, using a corpus-based method.
Research Questions
Research Methodology
Text Selection
- Texts were selected exclusively from web-based tutorials in ICT.
- The authors are mostly lecturers at universities and tutorial centers.
- 5 topics concerning fundamental ICT knowledge:
  - Computer hardware
  - Operating systems and graphical user interfaces (OS and GUIs)
  - Basic application software
  - Multimedia software
  - Internet software
- 3 text types: articles, manuals and advertisements (of hardware)
Total files: 230
Text length: 1,500-2,000 words per article; 700-1,000 words per manual; 200-500 words per advertisement
Total words: 287,478
Size: 287,478 words
Text files: 230 files
Word types: 6,064 word types
Medium: Written
Language: Texts written in English
Authorship: Texts written by experts in academic institutions, tutorial centers or manufacturers
Contents: Fundamental knowledge of ICT
Text topics: 5 topics: (1) Computer hardware; (2) Operating systems and graphical user interfaces (OS and GUIs); (3) Basic application software; (4) Multimedia software; (5) Internet software
Text types: Articles (passages including definitions and descriptions); Manuals (instructions for operating hardware and software); Advertisements (details of the features and quality of computer hardware products)
To compile the corpus, the selected texts are:
- converted into text files (.txt extension);
- marked up with text documentation: topic, author, text type, word number, and source;
- annotated with POS tags, using the trial service of CLAWS, developed by UCREL at Lancaster University, UK (available at http://ucrel.lancs.ac.uk/annotation.html#POS).
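The conversion and markup steps above can be sketched in Python. This is a minimal illustration, not the study's actual procedure: `make_header` and `write_marked_up_file` are hypothetical helpers, the tag layout simply copies the sample markup that follows, and the word count is a plain whitespace split, which is an assumption. POS tagging is not reproduced here, since the study used the CLAWS trial service for it.

```python
from pathlib import Path


def make_header(topic: str, author: str, text_type: str, text: str, source: str) -> str:
    """Build a documentation header in the layout of the sample file.

    Hypothetical helper: the tag layout mirrors the sample markup, and the
    word count is a simple whitespace split (an assumption).
    """
    n_words = len(text.split())
    return (
        f"<fileDesc> <topic> <{topic}> </topic> "
        f"<expert><{author}> </expert> "
        f"<texttype> <{text_type}></texttype> "
        f"<wordsnum> <{n_words}words> </wordsnum> "
        f"<source> <{source}> </source> </fileDesc>"
    )


def write_marked_up_file(path: Path, topic: str, author: str, text_type: str,
                         text: str, source: str) -> None:
    """Save the header followed by the running text as a .txt file."""
    header = make_header(topic, author, text_type, text, source)
    path.with_suffix(".txt").write_text(header + "\n" + text, encoding="utf-8")


if __name__ == "__main__":
    body = "An operating system is a group of programs."
    print(make_header("OSGUIs1", "Research and IT", "article", body,
                      "https://example.org/os"))
```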
Sample text with markup and POS tagging:
<fileDesc> <topic> <OSGUIs1> </topic>
<expert><Research and IT, Calvin College, MI, USA> </expert>
<texttype> <article></texttype>
<wordsnum> <1574words> </wordsnum>
<source> <http://www.calvin.edu/rbobeldy/tutorials/os/basi> </source> </fileDesc>
What_DTQ
is_VBZ an_AT0 Operating_NN1 System_NN1 ?_? An_AT0
operating_NN1 system_NN1 is_VBZ a_AT0 group_NN1
of_PRF programs_NN2 that_CJT manage_VVB all_DT0
activities_NN2 on_PRP the_AT0 computer_NN1 ._.
When_CJS you_PNP turn_VVB on_PRP a_AT0
computer_NN1 ,_, the_AT0 operating_NN1 system_NN1
programs_NN2 run_VVB and_CJC check_VVB to_TO0
be_VBI sure_AJ0 all_DT0 the_AT0 parts_NN2 of_PRF
the_AT0 computer_NN1 are_VBB functioning_VVG
properly_AV0 ._. Once_AV0 loaded_VVN ,_, the_AT0
operating_NN1 system_NN1 manages_VVZ all_DT0
activities_NN2 on_PRP the_AT0 computer_NN1
and_CJC the_AT0 interactions_NN2 with_PRP
input_NN1 (_( keyboard_NN1 ,_, mouse_NN1 ,_,
etc._AV0 )_) and_CJC output_NN1 devices_NN2 (_(
printers_NN2 ,_, monitors_NN2 ,_, etc_AV0 ._. )_)
._. If_CJS you_PNP run_VVB a_AT0 program_NN1
like_PRP Microsoft_NP0 Word_NN1 ,_, the_AT0
operating_NN1 system_NN1 is_VBZ actually_AV0
managing_VVG how_AVQ you_PNP interact_VVB
with_PRP Word_NN1 _ how_AVQ you_PNP tell_VVB
it_PNP what_DTQ font_NN1 to_TO0 use_VVI ,_,
what_DTQ margins_NN2 you_PNP want_VVB ,_, and_CJC
how_AVQ Word_NN1 prints_NN2 to_PRP the_AT0
printer_NN1 ._. An_AT0 operating_NN1 system_NN1
manages_VVZ all_DT0 _ input_NN1 -_- getting_VVG
information_NN1 into_PRP the_AT0 computer_NN1
from_PRP an_AT0 external_AJ0 source_NN1
such_PRP21 as_PRP22 the_AT0 keyboard_NN1 ,_,
a_AT0 mouse_NN1 ,_, a_AT0 scanner_NN1 ,_, or_CJC
a_AT0 disk_NN1 ._. processing_NN1 -_- after_PRP
receiving_VVG input_NN1 ,_, the_AT0 computer_NN1
manipulates_VVZ or_CJC alters_VVZ the_AT0
data_NN0 ._.
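In the tagged sample, each token takes the form word_TAG (CLAWS horizontal output with C5 tags). A minimal Python sketch of reading such a file back into (word, tag) pairs, assuming the tag always follows the last underscore; the function name `parse_tagged` is hypothetical:

```python
from collections import Counter


def parse_tagged(text: str) -> list[tuple[str, str]]:
    """Split CLAWS horizontal output (word_TAG tokens) into (word, tag) pairs.

    Splitting on the last underscore keeps tokens such as "etc._AV0" or
    "such_PRP21" intact; a bare "_" left over from extraction has neither a
    word nor a tag and is skipped.
    """
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("_")
        if sep and word and tag:
            pairs.append((word, tag))
    return pairs


sample = ("An_AT0 operating_NN1 system_NN1 is_VBZ a_AT0 group_NN1 "
          "of_PRF programs_NN2 that_CJT manage_VVB all_DT0 activities_NN2")
pairs = parse_tagged(sample)
tag_counts = Counter(tag for _, tag in pairs)  # how often each POS tag occurs
print(tag_counts.most_common(3))
```

Counting tags this way is the basis for the tense and passive analyses in Questions 4 and 5.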
- Analysis tool: WordSmith Tools version 5.0
  - Developed by Mike Scott (2007)
  - University of Liverpool, UK
  - www.lexically.net/wordsmith/index.html
- According to Bowker and Pearson (2002), Hunston (2002), and Scott (2001):
  - To establish the keyness of words, the frequency wordlist of a corpus should be compared with that of a larger reference corpus.
  - With the log-likelihood formula, unusually frequent or infrequent words can be identified by their keyness and the significance of the difference (p value), i.e.:
    - words with positive keyness occur unusually more often than in the reference corpus;
    - words with negative keyness occur unusually less often than in the reference corpus.
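The log-likelihood comparison can be sketched with the standard log-likelihood (G²) statistic used in keyword analysis. For a word occurring a times in a study corpus of c tokens and b times in a reference corpus of d tokens, the expected frequencies are E1 = c(a+b)/(c+d) and E2 = d(a+b)/(c+d), and LL = 2(a ln(a/E1) + b ln(b/E2)). The sign convention (negative for negative keyness) and the p-value conversion via a chi-square distribution with one degree of freedom are assumptions for illustration; WordSmith's own computation may differ in detail, and the example frequencies are hypothetical.

```python
import math


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Signed log-likelihood keyness of one word.

    a: frequency in the study corpus,     c: study corpus size (tokens)
    b: frequency in the reference corpus, d: reference corpus size (tokens)
    The result is negative when the word is relatively less frequent in the
    study corpus (negative keyness) and positive otherwise.
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:  # a zero frequency contributes nothing (0 * ln 0 taken as 0)
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    ll *= 2
    return ll if a / c >= b / d else -ll


def p_value(ll: float) -> float:
    """Upper-tail p value of |LL| under a chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(abs(ll) / 2))


# Hypothetical frequencies: 120 hits in a 287,478-word corpus versus
# 900 hits in a 100-million-word reference corpus.
ll = log_likelihood(120, 900, 287_478, 100_000_000)
print(round(ll, 2), p_value(ll))
```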
- Reference corpus: British National Corpus (BNC)
  - A general corpus of 100 million words
  - Samples of written and spoken language from a wide range of sources
  - BNC website: http://www.natcorp.ox.ac.uk
- In the present study, the BNC wordlist is taken from WordSmith Tools.
Data Analysis and Findings
The method of analysis is adapted from the suggestions of Bowker and Pearson (2002) and Scott (2001).
- The method and findings are described according to the research questions:
  1. What are high-frequency academic words in ICT specialized texts?
  2. What are high-frequency technical words in ICT specialized texts?
  3. What are high-frequency collocations in ICT specialized texts?
  4. What are high-frequency verb tenses in ICT specialized texts?
  5. What are high-frequency usages of verbs in a passive form in ICT specialized texts?
Question 1: What are high-frequency academic words in ICT specialized texts?
1.1 Download the GSL and AWL wordlists from the website of the University of Hertfordshire, UK, at http://www.uefap.com/vocab/vocfram.htm, and use these words as academic word candidates:
- 1,937 GSL headwords
- 570 AWL headwords
1.2 Build a wordlist of the EICT Corpus, resulting in a total of 6,064 word types.
1.3 Use the academic word candidates to mark all GSL and AWL words in the corpus, then lemmatize them, resulting in 941 headwords of academic word candidates with at least 5 occurrences.
(Screenshot: wordlist sorted in alphabetical order)
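Steps 1.2 and 1.3 can be approximated in a few lines of Python. This is a sketch only: `HEADWORDS` is a hypothetical, tiny stand-in for the GSL/AWL word families (each form mapped to its headword), lemmatization is reduced to that lookup, and the default cut-off of 5 occurrences follows the threshold above. The study itself carried out these steps in WordSmith Tools.

```python
from collections import Counter

# Hypothetical, tiny stand-in for the GSL/AWL word families: each form is
# mapped to its headword. The real lists have 1,937 GSL and 570 AWL headwords.
HEADWORDS = {
    "access": "access", "accessed": "access", "accessing": "access",
    "feature": "feature", "features": "feature", "featured": "feature",
    "identify": "identify", "identified": "identify", "identifies": "identify",
}


def academic_candidates(tokens: list[str], headwords: dict[str, str],
                        min_freq: int = 5) -> dict[str, int]:
    """Mark list words, group them under their headword, keep frequent ones."""
    counts = Counter(headwords[t.lower()] for t in tokens if t.lower() in headwords)
    return {hw: n for hw, n in counts.items() if n >= min_freq}


tokens = "Users can access files that are accessed often".split()
print(academic_candidates(tokens, HEADWORDS, min_freq=2))  # {'access': 2}
```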
Finding 1
From the 941 words, 343 words with at least 5 occurrences, positive keyness, and a significant difference emerged as high-frequency academic words.
(Screenshots: function words excluded; wordlist sorted in alphabetical order; wordlist sorted according to keyness)
- All 343 high-frequency academic words can be classified into 2 groups:
  1. 246 academic words
     - e.g. access, compute, illustrate, indicate, identify, manipulate, term, category, feature, occurrence, symbol etc.
  2. 97 semi-technical words
     2.1 Words with technical senses or particular meanings
       - e.g. burn, drive, refresh, card, domain, engine, memory, field, application, character, Word, document, window etc.
     2.2 Words in mathematics, geometric shapes and diagrams
       - e.g. add, multiply, divide, axis, table, row, degree etc.
     2.3 Simple words frequently used as commands or methods
       - e.g. edit, enable, paste, shift, help, enter, drag, drop etc.
Question 2: What are high-frequency technical words in ICT specialized texts?
Similarly to the method for Question 1:
2.1 Build a word frequency list of the whole EICT Corpus.
2.2 Exclude all function words and the high-frequency academic words from Finding 1.
2.3 Lemmatize the remaining words, resulting in 938 headwords.
2.4 Keep only words with at least 5 occurrences and technical meanings.
2.5 Compare the resulting wordlist with the BNC wordlist, using log likelihood with a maximum p value of 0.000001.
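The filtering in steps 2.2-2.5 can be sketched as below, assuming each candidate's frequency and signed log-likelihood keyness have already been computed (for example by WordSmith). The cut-off 23.93 is the chi-square critical value for p = 0.000001 at one degree of freedom; requiring LL at or above it also guarantees positive keyness. The manual judgment of "technical meaning" in step 2.4 is not modelled, and `select_key_words` and the sample values are hypothetical.

```python
# Chi-square critical value for p = 0.000001 at 1 degree of freedom.
CRITICAL_LL = 23.93


def select_key_words(candidates: dict[str, tuple[int, float]], excluded: set[str],
                     min_freq: int = 5, critical: float = CRITICAL_LL) -> list[str]:
    """Keep frequent, significantly key words and sort them by keyness.

    candidates: headword -> (frequency in the corpus, signed LL keyness)
    excluded:   function words plus the academic words from Finding 1
    """
    kept = [w for w, (freq, ll) in candidates.items()
            if w not in excluded and freq >= min_freq and ll >= critical]
    return sorted(kept, key=lambda w: candidates[w][1], reverse=True)


# Hypothetical frequencies and keyness values, for illustration only.
candidates = {
    "cache": (40, 300.0),
    "the": (9000, 50.0),
    "cursor": (3, 40.0),
    "pixel": (25, 120.0),
    "table": (60, 10.0),
}
print(select_key_words(candidates, excluded={"the"}))  # ['cache', 'pixel']
```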
Finding 2
From the 938 words, 267 words with at least 5 occurrences, positive keyness, and a significant difference were selected as high-frequency technical words.
(Screenshot: wordlist sorted according to keyness)
All 267 resulting words are classified into 5 groups:
1. 106 words with particular meanings (different from their general meanings), e.g. cache, cookies, bus, port, bitmap, chip, cursor, pixel etc.
2. 61 abbreviations, acronyms, and extensions, e.g. ASCII, WYSIWYG, ALU, ROM, RAM, OS, RGB, ESC, ALT, txt, doc, gif, wav, http, html, www etc.
3. 70 words concerning programs, commands and keys, e.g. spreadsheet, database, notepad, wizard, Telnet, Apple, backspace, alternate, tab, deselect, browse, redo etc.
4. 17 words in mathematics, geometric shapes and diagrams, e.g. equation, ellipse, polygon, cell, column, intersection etc.
Question 3: What are high-frequency collocations in ICT specialized texts?
3.2 On the cluster tab, select only the 2- to 5-word clusters with technical meanings and frequent use.
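What the cluster tab lists can be approximated by counting every contiguous sequence of 2 to 5 words. This minimal sketch is a simplification: unlike WordSmith, it does not stop clusters at sentence boundaries or punctuation, and the selection by technical meaning or frequent use remains a manual step. The function name `clusters` is hypothetical.

```python
from collections import Counter


def clusters(tokens: list[str], min_n: int = 2, max_n: int = 5) -> Counter:
    """Count every contiguous word sequence of min_n to max_n tokens."""
    counts: Counter = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


text = "the operating system manages the operating system"
tokens = text.lower().split()
for cluster, freq in clusters(tokens).most_common(3):
    print(freq, cluster)
```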
3.3 Compute the relation values on the collocate tab.
(Screenshot: collocates sorted according to the relation value)
3.4 Select only the collocations with at least 5 occurrences, an MI score of at least 5.000, and distribution in at least 3 text files.
- Collocations with technical meanings
  - e.g. QWERTY layout, recycle bin, peripheral device, operating system (OS), uniform resource locator (URL), hypertext markup language (html)
- Collocations with frequent use
  - e.g. refer to, (be) referred to as, (be) concerned with, consist of, conform to, such as, in order, in order to
(Note: on-going analysis)
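The three filters in step 3.4 can be illustrated with the textbook pointwise mutual information score, MI = log2(f(n,c) × N / (f(n) × f(c))), where f(n,c) is the co-occurrence frequency of node and collocate, f(n) and f(c) are their individual frequencies, and N is the corpus size in tokens. WordSmith's implementation may adjust this formula (for example for the collocation span), so treat the sketch as an approximation. The function names and the example counts are hypothetical.

```python
import math


def mutual_information(f_nc: int, f_n: int, f_c: int, n_tokens: int) -> float:
    """Pointwise MI (log base 2) of a node-collocate pair."""
    return math.log2(f_nc * n_tokens / (f_n * f_c))


def keep_collocation(f_nc: int, f_n: int, f_c: int, n_tokens: int, n_files: int,
                     min_freq: int = 5, min_mi: float = 5.0,
                     min_files: int = 3) -> bool:
    """Apply the three filters of step 3.4: frequency, MI score and spread."""
    return (f_nc >= min_freq
            and n_files >= min_files
            and mutual_information(f_nc, f_n, f_c, n_tokens) >= min_mi)


# Hypothetical counts for "recycle bin" in a 287,478-word corpus.
print(keep_collocation(f_nc=12, f_n=14, f_c=13, n_tokens=287_478, n_files=6))  # True
```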
Question 4: What are high-frequency verb tenses in ICT specialized texts?
4.1 Using the tagged corpus, produce concordances of verbs by their tags, e.g.:
- For the simple present:
  - VVB: base form of a lexical verb (except the infinitive), e.g. take, leave
  - VVZ: -s form of a lexical verb, e.g. takes, leaves
- For the continuous tense:
  - VVG: -ing form of a lexical verb, e.g. taking, leaving
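Producing tag-based concordances can be sketched as a simple KWIC (key word in context) listing over the word_TAG tokens. This is a minimal illustration under the assumption that the tag follows the last underscore; `parse_tagged` and `concordance` are hypothetical names. Note that VVB also covers imperatives and VVG also covers gerunds and participial modifiers, which is why the lines still need the manual check in step 4.2.

```python
SIMPLE_PRESENT = {"VVB", "VVZ"}  # C5 tags listed for the simple present
CONTINUOUS = {"VVG"}             # C5 tag listed for the continuous tense


def parse_tagged(text: str) -> list[tuple[str, str]]:
    """Split word_TAG tokens on the last underscore, skipping debris."""
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("_")
        if sep and word and tag:
            pairs.append((word, tag))
    return pairs


def concordance(pairs: list[tuple[str, str]], tags: set[str], width: int = 4) -> list[str]:
    """Return one KWIC line for every token whose tag is in `tags`."""
    lines = []
    for i, (word, tag) in enumerate(pairs):
        if tag in tags:
            left = " ".join(w for w, _ in pairs[max(0, i - width):i])
            right = " ".join(w for w, _ in pairs[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}")
    return lines


pairs = parse_tagged("the_AT0 operating_NN1 system_NN1 manages_VVZ all_DT0 activities_NN2")
print(concordance(pairs, SIMPLE_PRESENT, width=2))
```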
(Screenshot: sample tagged concordance lines of "allow" in the simple present)
4.2 Before counting the frequency, check whether each concordance line actually belongs to the tense being studied.
4.3 Compare the frequency and the dispersion value of each tense.
(Screenshot: sample dispersion of the simple present, which is spread uniformly across all subcorpora)
(Note: on-going analysis)
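WordSmith reports its own dispersion value with its dispersion plots. As a stand-in, the sketch below computes Juilland's D, a widely used dispersion measure that is not necessarily the one WordSmith uses, over the five topic subcorpora. Frequencies are normalised by subcorpus size so that unequal subcorpora can be compared; D is 1 for a perfectly even spread and 0 when all occurrences fall in one subcorpus. The function name, counts and sizes are hypothetical.

```python
import math
import statistics


def juilland_d(freqs: list[int], sizes: list[int]) -> float:
    """Juilland's D: 1 = spread perfectly evenly, 0 = all in one subcorpus."""
    if len(freqs) != len(sizes) or len(freqs) < 2:
        raise ValueError("need matching frequency and size lists for 2+ parts")
    rel = [f / s for f, s in zip(freqs, sizes)]  # frequency per token in each part
    mean = statistics.fmean(rel)
    if mean == 0:
        return 0.0  # item absent everywhere: no meaningful dispersion
    variation = statistics.pstdev(rel) / mean  # coefficient of variation
    return 1 - variation / math.sqrt(len(rel) - 1)


# Hypothetical simple-present counts and sizes (tokens) of the five topic subcorpora.
freqs = [2400, 2100, 2600, 1900, 2300]
sizes = [60_000, 55_000, 62_000, 50_000, 60_478]
print(round(juilland_d(freqs, sizes), 3))
```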
Question 5: What are high-frequency usages of verbs in the passive form in ICT specialized texts?
Similarly to the method for Question 4:
5.1 Using the tagged corpus, produce concordances of past participles, i.e. VVN: past participle of a lexical verb, e.g. taken, left.
5.2 Classify the usages of past participles as parts of tenses or as modifiers.
5.3 Compare the frequency and dispersion of each usage.
(Note: on-going analysis)
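A rough first pass at step 5.2 can be automated from the C5 tags before the lines are checked by hand. This heuristic is an assumption, not the study's procedure: it looks back from each VVN, skipping adverbs (AV0) and "not" (XX0), and labels the participle "passive" after a form of BE, "perfect" after a form of HAVE, and "modifier" otherwise. It will misclassify cases such as coordinated participles ("is opened and saved"), so manual checking remains necessary. All names are hypothetical.

```python
BE_TAGS = {"VBB", "VBD", "VBG", "VBI", "VBN", "VBZ"}    # forms of BE (C5 tagset)
HAVE_TAGS = {"VHB", "VHD", "VHG", "VHI", "VHN", "VHZ"}  # forms of HAVE
SKIP_TAGS = {"AV0", "XX0"}  # adverbs and "not" may sit between auxiliary and verb


def parse_tagged(text: str) -> list[tuple[str, str]]:
    """Split word_TAG tokens on the last underscore, skipping debris."""
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("_")
        if sep and word and tag:
            pairs.append((word, tag))
    return pairs


def classify_vvn(pairs: list[tuple[str, str]], i: int) -> str:
    """Heuristically label the VVN at position i."""
    j = i - 1
    while j >= 0 and pairs[j][1] in SKIP_TAGS:
        j -= 1
    if j >= 0 and pairs[j][1] in BE_TAGS:
        return "passive"
    if j >= 0 and pairs[j][1] in HAVE_TAGS:
        return "perfect"
    return "modifier"


def classify_all(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Classify every past participle (VVN) in a tagged text."""
    return [(w, classify_vvn(pairs, i)) for i, (w, t) in enumerate(pairs) if t == "VVN"]


text = ("The_AT0 file_NN1 is_VBZ automatically_AV0 saved_VVN ._. "
        "Once_AV0 loaded_VVN ,_, it_PNP has_VHZ opened_VVN")
print(classify_all(parse_tagged(text)))
```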
Discussion
- Significance of the study
  - Provides an overall description of the language of English for ICT.
  - Provides a clear language-learning goal that serves particular learning needs.
In materials design, the teacher knows which language items should be the focus of lessons and which ones are already known by the students. Apart from typical teaching materials, the corpus itself can also be a valuable learning resource: giving students direct access to the corpus can promote data-driven learning.
References
Nation, P. (2001). Learning Vocabulary in Another Language. Cambridge: Cambridge University Press.
Nation, P. and Waring, R. (1997). Vocabulary size, text coverage and word lists. In Schmitt, N. and McCarthy, M. (eds.), Vocabulary: Description, Acquisition and Pedagogy (pp. 6-19). Cambridge: Cambridge University Press.
Nurweni, A. and Read, J. (1999). The English vocabulary knowledge of Indonesian university students. English for Specific Purposes, 18(2), pp. 161-175. Elsevier Science.
Scott, M. (2001). Comparing corpora and identifying key words, collocations, frequency distributions through the WordSmith Tools suite of computer programs. In Ghadessy, M., Henry, A., and Roseberry, R.L. (eds.), Small Corpus Studies and ELT: Theory and Practice (pp. 47-67). US: John Benjamins Publishing.
Scott, M. (2007). WordSmith Tools version 5.0. Oxford University Press. Available at http://www.lexically.net/wordsmith/index.html.
Supatranont, P. (2005a). Classroom concordancing: Increasing vocabulary size for academic reading. KOTESOL Proceedings 2005, pp. 35-44. South Korea.
Supatranont, P. (2005b). A Comparison of the Effects of the Concordance-based and the Conventional Teaching Methods on Engineering Students' English Vocabulary Learning. Online Ph.D. Dissertation, Program of English as an International Language, Chulalongkorn University, Thailand. Available at http://www.arts.chula.ac.th/ling/thesis/Pisamai2548.pdf.
West, M. (1953). A General Service List of English Words. London: Longman, Green and Company.
Any Questions?