Title: Text Encoding Issues
1. Text Encoding Issues
- The British Academic Written English (BAWE) project
- Corpus Linguistics
- University of Birmingham
- July 16th, 2005
2. Assessed student writing
Which theoretical approach has best helped you
make sense of The Waste Land and why?
Case Study of the white-throated capuchin monkey
(Cebus capucinus)
Would you agree that subordination was inscribed
into the life of a domestic servant?
Explore the significance of the chat show genre
as contributor to the project of feminist
heterosexual politics
Information Systems Development
Critical Commentary p180, from "Le jour je
m'égarais..." to "le démon de mon coeur".
The expenditure of National Lottery funds on the
arts in Britain cannot be convincingly defended.
Discuss.
3. Assessed student writing
4. Text Encoding Issues
- General issues
  - A first stage of BAWE mark-up
  - Dimensions
  - Interactive tagging
- Specific questions
  - Text hierarchy
  - Formulae
5. A first stage of BAWE mark-up
- shift in document format: DOC → XML (TEI standard)
- formatting: preserve information
- automatic vs. manual steps of annotation
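A minimal sketch of what the automatic part of this step could look like in Python. It assumes a hypothetical extractor has already turned each Word paragraph into runs of (text, formatting) pairs; the function then wraps the paragraphs in TEI-style <text>/<body>/<p> elements and keeps formatting as <hi rend="..."> rather than discarding it. The input format and the name paragraphs_to_tei are illustrative assumptions, not the BAWE tool chain.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hypothetical input format: each paragraph is a list of (text, rend) runs,
# as an assumed DOC-to-text extractor might deliver them; rend is None for
# unformatted text, otherwise a value such as "italic" or "underline".
Run = Tuple[str, Optional[str]]


def paragraphs_to_tei(paragraphs: List[List[Run]]) -> ET.Element:
    """Wrap paragraphs in <text><body><p>, keeping formatting as <hi rend>."""
    text = ET.Element("text")
    body = ET.SubElement(text, "body")
    for runs in paragraphs:
        p = ET.SubElement(body, "p")
        last = None  # most recent <hi>; plain text after it becomes its tail
        for chunk, rend in runs:
            if rend is None:
                if last is None:
                    p.text = (p.text or "") + chunk
                else:
                    last.tail = (last.tail or "") + chunk
            else:
                last = ET.SubElement(p, "hi", rend=rend)
                last.text = chunk
    return text


doc = [[("Case Study of the white-throated capuchin monkey (", None),
        ("Cebus capucinus", "italic"),
        (")", None)]]
print(ET.tostring(paragraphs_to_tei(doc), encoding="unicode"))
```

Everything this function does is mechanical, which is why it suits the automatic stage; decisions such as where the front matter ends are left for manual annotation.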
6. Dimensions of mark-up
- Text hierarchy
  - front, body, back
  - sections
  - paragraphs
  - s-units
- Text flow
  - highlighting
  - lists
  - figures
  - tables
  - formulae
  - block quotes
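To make the hierarchy dimension concrete, here is a small Python sketch using the standard-library ElementTree module. It assumes a made-up, TEI-style fragment in which sections are <div>, paragraphs <p> and s-units <s>, and reports the chain of hierarchy elements enclosing each s-unit. Text-flow tags such as <hi> inside an s-unit are read through rather than treated as structure. The sample and the name s_unit_paths are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment using the hierarchy levels named on the slide:
# front/body/back, sections (<div>), paragraphs (<p>) and s-units (<s>).
SAMPLE = """<text>
  <front><docTitle><titlePart type="main">A title</titlePart></docTitle></front>
  <body>
    <div><p><s>First s-unit.</s> <s>Second s-unit.</s></p></div>
    <div><p><s>Third s-unit.</s></p></div>
  </body>
  <back><div><p><s>Appendix s-unit.</s></p></div></back>
</text>"""


def s_unit_paths(elem, path=()):
    """Yield (path, text) for every <s>, where path joins the tags of the
    enclosing hierarchy elements, e.g. 'text/body/div/p'."""
    if elem.tag == "s":
        # itertext() also collects text inside inline (text-flow) tags like <hi>
        yield "/".join(path), "".join(elem.itertext())
        return
    for child in elem:
        yield from s_unit_paths(child, path + (elem.tag,))


root = ET.fromstring(SAMPLE)
for where, sentence in s_unit_paths(root):
    print(where, "|", sentence)
```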
7. Interactive tagging
- Tagging by clicking
  - graphical interface
  - quick tagging
  - reduce errors
  - impose coherence
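One way a click-based tagger could reduce errors and impose coherence is to accept only those tags that are legal at the current position. The Python sketch below assumes a deliberately tiny, hypothetical rule table (it is not the TEI schema and not the actual BAWE tool) and reports every parent/child pair that breaks it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, much-simplified rule table: which child elements a tagging
# interface would offer inside each parent. This is NOT the TEI schema.
ALLOWED_CHILDREN = {
    "text": {"front", "body", "back"},
    "front": {"titlePage", "docTitle"},
    "titlePage": {"docTitle", "figure"},
    "docTitle": {"titlePart"},
    "titlePart": {"hi"},
    "body": {"div", "p"},
    "back": {"div", "p"},
    "div": {"head", "div", "p"},
    "p": {"s", "formula", "figure"},
    "s": {"hi"},
    "hi": {"hi"},
}


def check_nesting(elem):
    """Return 'parent > child' strings for every pair not in the rule table."""
    problems = []
    allowed = ALLOWED_CHILDREN.get(elem.tag, set())
    for child in elem:
        if child.tag not in allowed:
            problems.append(f"{elem.tag} > {child.tag}")
        problems.extend(check_nesting(child))  # recurse into the subtree
    return problems


good = ET.fromstring("<text><body><p><s>Fine.</s></p></body></text>")
bad = ET.fromstring("<text><body><s>Stray s-unit.</s></body></text>")
print(check_nesting(good))  # []
print(check_nesting(bad))   # ['body > s']
```

In an interactive tool the same table could be used the other way round, to grey out tags that may not be inserted at the clicked position.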
8. Interactive tagging
9. What goes into <front> vs. <body>?
- Example: the first pages of two assignments
10. Encoding of example pages
Anthropology vs. English Studies assignments
- Anthropology:
    <front>
      <titlePage>
        <docTitle>
          <titlePart type="main">Case Study of the white-throated capuchin monkey (<hi rend="italic">Cebus capucinus</hi>)</titlePart>
          <titlePart>xxx</titlePart>
        </docTitle>
        <figure id="BAWE_3016a-pic1"/>
      </titlePage>
    </front>
    <body>
- English Studies:
    <front>
      <docTitle>
        <titlePart type="main" rend="underline">Discuss the handling of the discourses of religion and the effects of religious and ethical change in the Victorian period</titlePart>
      </docTitle>
    </front>
    <body>
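The English Studies assignment has no separate title page, so its <docTitle> sits directly in <front>, while the Anthropology one wraps it in <titlePage> together with the picture. A minimal Python sketch, assuming only the two fragments on this slide, suggests that the main title can still be retrieved the same way from both, because ElementTree's iter() searches all descendants. main_title is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# The two <front> fragments from the slide ("xxx" is kept as it appears there).
ANTHROPOLOGY = """<front><titlePage><docTitle>
<titlePart type="main">Case Study of the white-throated capuchin monkey (<hi rend="italic">Cebus capucinus</hi>)</titlePart>
<titlePart>xxx</titlePart>
</docTitle><figure id="BAWE_3016a-pic1"/></titlePage></front>"""

ENGLISH = """<front><docTitle>
<titlePart type="main" rend="underline">Discuss the handling of the discourses of religion and the effects of religious and ethical change in the Victorian period</titlePart>
</docTitle></front>"""


def main_title(front_xml: str) -> Optional[str]:
    """Return the text of the first main titlePart, or None if there is none."""
    front = ET.fromstring(front_xml)
    # iter() walks all descendants, so an intervening <titlePage> is irrelevant
    for part in front.iter("titlePart"):
        if part.get("type") == "main":
            # itertext() includes the italic species name inside <hi>
            return "".join(part.itertext())
    return None


print(main_title(ANTHROPOLOGY))
print(main_title(ENGLISH))
```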
11. Formulae
- equations (and all kinds of variations of "=")
- chemical formulae
- arithmetic expressions
- logical expressions
- expressions following some other discipline-specific formalism (e.g. computer code, phonetic transcription, etc.)
- a part ("term") of any of these (if it is not natural language)
12. Insert empty <formula> tag
- anything that has been inserted with the MS formula editor (appears as a "field")
- any complex formal expression, i.e. one that cannot be represented as a simple sequence of characters (e.g. fraction, square root)
- 0
- I(?s)
- Q
- any formal expression separated typographically
from running text (new paragraph)
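The placeholder approach can be sketched in Python as follows. The sketch assumes that an earlier (manual or automatic) step has already labelled each paragraph as ordinary text or as a typographically separate formula. The labels, the name encode_paragraphs and the id scheme (document id + "-form" + running number, inferred from ids such as EC0001-form2 in the BAWE example) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def encode_paragraphs(doc_id: str, paragraphs: List[Tuple[str, str]]):
    """Build <body> from (kind, content) pairs, kind being "text" or "formula".

    Each formula paragraph becomes an empty <formula> element inside <p>;
    the formula's original content is returned separately, keyed by its id.
    """
    body = ET.Element("body")
    set_aside: Dict[str, str] = {}
    count = 0
    for kind, content in paragraphs:
        p = ET.SubElement(body, "p")
        if kind == "formula":
            count += 1
            form_id = f"{doc_id}-form{count}"  # hypothetical id scheme
            ET.SubElement(p, "formula", notation="", id=form_id)  # empty tag
            set_aside[form_id] = content
        else:
            p.text = content
    return body, set_aside


body, aside = encode_paragraphs("EC0001", [
    ("text", "... as follows"),
    ("formula", "S = E S*"),  # stand-in for an MS formula editor field
    ("text", "The next paragraph."),
])
print(ET.tostring(body, encoding="unicode"))
print(aside)
```

Keeping the formula content outside the corpus text means the placeholder still records where a formula occurred without passing formula notation to tools that expect natural language.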
13. Example
- ... The slope of the yield curve can be analysed by looking at the spread between the long-term and the one-period, short-term interest rate, denoted as $S_t^n = R_t^n - r_t$. If we manipulate equation 1, the yield spread, $S_t^n$, can be written as the expectation of a weighted average of future changes in short-term interest rates as follows
- $$S_t^n = E_t\, S_t^{n*}$$
- $$S_t^{n*} = \frac{1}{n}\left[(n-1)\,\Delta r_{t+1} + (n-2)\,\Delta r_{t+2} + \cdots + \Delta r_{t+n-1}\right] \qquad (2)$$
- <p><s>...</s> <s>The slope of the yield curve can be analysed by looking at the spread between the long-term and the one-period, short-term interest rate, denoted as S<hi rend="italic"><hi rend="sup">n</hi><hi rend="sub">t</hi></hi> = R<hi rend="italic"><hi rend="sup">n</hi><hi rend="sub">t</hi></hi> - r<hi rend="italic"><hi rend="sub">t</hi></hi>.</s> <s>If we manipulate equation 1, the yield spread, S<hi rend="italic"><hi rend="sup">n</hi><hi rend="sub">t</hi></hi>, can be written as the expectation of a weighted average of future changes in short-term interest rates as follows</s></p>
- <p><formula notation="" id="EC0001-form2"/></p>
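Because the sub- and superscripts are kept as nested <hi rend="..."> elements and the formula leaves an id-bearing placeholder, both can be recovered from the mark-up. A short Python sketch over an abridged copy of this paragraph; the flatten helper and its ^{...}/_{...} output convention are illustrative choices, not part of the BAWE scheme.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the mark-up; HI is the italic n/t superscript and
# subscript pair that follows S and R.
HI = '<hi rend="italic"><hi rend="sup">n</hi><hi rend="sub">t</hi></hi>'
SAMPLE = (
    "<body><p><s>The spread is denoted as "
    f'S{HI} = R{HI} - r<hi rend="italic"><hi rend="sub">t</hi></hi>.</s></p>'
    '<p><formula notation="" id="EC0001-form2"/></p></body>'
)


def flatten(elem):
    """Rebuild the text of elem, writing sup/sub highlighting as ^{...}/_{...}."""
    inner = (elem.text or "") + "".join(
        flatten(child) + (child.tail or "") for child in elem
    )
    if elem.tag == "hi" and elem.get("rend") == "sup":
        return "^{" + inner + "}"
    if elem.tag == "hi" and elem.get("rend") == "sub":
        return "_{" + inner + "}"
    return inner  # other tags (italic <hi>, <s>, ...) contribute only their text


body = ET.fromstring(SAMPLE)
sentences = [flatten(s) for s in body.iter("s")]
formula_ids = [f.get("id") for f in body.iter("formula")]
print(sentences)    # ['The spread is denoted as S^{n}_{t} = R^{n}_{t} - r_{t}.']
print(formula_ids)  # ['EC0001-form2']
```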
14. Principles of mark-up
- Keep the structure of the document as close to the original as possible
- Mark up elements relevant to our research
- Mark-up should be cost-effective
15. Text Encoding Issues
- Signe Oksefjell Ebeling
- sebeling_at_brookes.ac.uk
- Alois Heuboeck
- a.heuboeck_at_reading.ac.uk