Title: Scripts, Layout and Segmental Awareness
1Scripts, Layout and Segmental Awareness
- Richard Sproat
- SALA 25
- September 15-18, 2005
- University of Illinois at Urbana-Champaign
2Overview
- Computational model of script layout
- Application to Brahmi-derived scripts
- Implications for phonemic awareness
- Are readers of Indian scripts aware of phonemes?
- A computational model of scriptal influence on
phonemic awareness
- Further issues: phonology or writing?
3Computational Theory of Writing Systems
- Relation between writing and language is a
regular relation (in the sense of formal language
theory)
- Writing tends to represent a consistent level of
linguistic representation
- Glyphs are combined via a small set of
two-dimensional catenators
4Two-Dimensional Catenators
95% of Chinese characters ever invented consist
of a semantic and a phonetic component
10Brahmi-Derived Indic Scripts
11Properties of Indic Scripts
- Glyphs are arranged into orthographic syllables,
called aksara: -CV-CV-CV-
- Within each aksara:
  - consonant sequences are first composed together
  using various script/glyph-dependent catenators
  - then vowels are arranged around the consonant
  glyphs using various catenators
- Word-initial vowels are written in a full form
- A consonant (sequence) written with no vowel
symbol is understood to have an inherent vowel
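The grouping into aksara described above can be sketched in a few lines. This is only an illustration under stated assumptions: the input is a list of romanized phone symbols, the `VOWELS` inventory is a hypothetical placeholder, and the function name `aksaras` is invented here. It groups each run of consonants with the vowel that follows it and leaves any word-final consonants as a vowelless group.

```python
# Hypothetical vowel inventory for a romanized transcription (an assumption,
# not taken from the slides).
VOWELS = {"a", "aa", "i", "ii", "u", "uu", "e", "o"}


def aksaras(phones):
    """Group a sequence of phone symbols into orthographic syllables.

    Each group is a (possibly empty) consonant sequence closed by a vowel.
    Trailing consonants with no following vowel form a final group.
    """
    groups, current = [], []
    for phone in phones:
        current.append(phone)
        if phone in VOWELS:
            # A vowel closes the current orthographic syllable.
            groups.append(current)
            current = []
    if current:
        # Word-final consonants that are not followed by any vowel.
        groups.append(current)
    return groups


print(aksaras(list("rakta")))
print(aksaras(list("masjid")))
```

Note that the grouping is orthographic rather than phonological: for /masjid/ the consonants s and j land in the same group even though a phonological syllable boundary falls between them.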
12Devanagari Vowels
13Kannada Diacritic Vowels
14(No Transcript)
15(No Transcript)
16g(x): graphical expression of phone x
gr(x): reduced form
17Script Index Feature Vectors
expressed, catenator, reduced, fused, complex, transparent
18Summary of Formal Treatment
- The theory explicitly treats Indic writing
systems as segmental
- At an abstract level, symbols are just catenated
together; the particular mode of catenation is
only an issue of rendering.
- Cf. text transmission standards such as Unicode.
- But do Indic writing systems behave segmentally?
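The Unicode comparison can be checked directly. In Devanagari the short /i/ vowel sign is drawn to the left of its consonant, yet the encoded text stores the consonant first and the vowel sign second, so the left placement is purely a rendering matter. The sketch below uses Python's standard `unicodedata` module; the helper name `codepoint_names` is invented for illustration.

```python
import unicodedata


def codepoint_names(text):
    """Return the Unicode character names of a string, in storage order."""
    return [unicodedata.name(ch) for ch in text]


# The aksara "ki": consonant KA (U+0915) followed by vowel sign I (U+093F).
ki = "\u0915\u093F"
print(codepoint_names(ki))
```

The stored sequence is segmental and in phonological order, regardless of where the vowel sign ends up on the page.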
19Alphabets and Segmental Awareness
- A claim: readers of non-alphabetic writing
systems have no conscious awareness of segments
- Investigations of language use suggest that many
speakers do not divide words into phonological
segments unless they have received explicit
instruction in such segmentation comparable to
that involved in teaching an alphabetic writing
system (Faber, 1992)
- According to Faber, only Western alphabets, which
represent both vowels and consonants inline,
count as alphabetic
- Indic scripts are not alphabetic, so readers
should not have segmental awareness
20Faber's Criteria
- Faber classifies scripts according to two main
criteria:
- Are all segments represented?
- Are all segments represented linearly, with vowels
and consonants on a par (versus with some being
diacritics)?
21Faber's Classification of Scripts
Korean
22Ethiopic (Ge'ez)
23Is Segmental Awareness a Byproduct of Literacy in
an Alphabetic Script?
- Recently literate Portuguese speakers outperform
illiterates on phonemic segmentation
- Japanese school children are less able to perform
segmental manipulation tasks than their American
counterparts
- Chinese readers who have been exposed to the
pinyin transliteration system outperform Chinese
readers who have not had this exposure.
- Conclusion: literacy per se is not sufficient for
phonemic awareness to develop. One needs an
alphabet.
24Segmental Awareness in Korean (Sohn, 1987)
Vowel switching
o
a
This is not expected on Faber's account
25Segmental Awareness in Indian Languages
- Padakannaya (2000) tested awareness of syllables
and phonemes
- Syllable manipulation (rhyme recognition,
syllable deletion, syllable reversal): even illiterate
speakers can handle these.
- Phoneme manipulation (phoneme oddity, phoneme deletion,
phoneme reversal): these cause problems for
readers of non-alphabetic writing systems.
- Compared sighted children, who learned the
Kannada script, with blind children, who learned a
purely alphabetic Kannada Braille.
- Blind children consistently outperformed sighted
children on segmental manipulation tasks.
26Phoneme Reversal
Kids start learning English
27Phoneme Awareness and Graphic Prominence
- Phonemic awareness in Kannada and other Indic
writing systems is affected by how noticeable
the components are (Padakannaya et al., 1993);
this varies cross-scriptally.
- Thus, Hindi speakers find it hard to treat
anusvara and repha as separate segments.
- But this is easy for Kannada speakers
28Diacritics Cross-Scriptally
- In Devanagari, anusvara is a diacritic
- Also find it easier to delete /y/ in [...]
than /r/ in [...]
- Diacritics are less salient than non-diacritics
in other scripts, e.g. work of van Heuven (2002)
for Dutch
- Errors in placement of diaeresis (e.g. Bedouïen,
Bedouin) have no effect on word recognition,
unlike errors in letters, which have a
significant effect.
- But diaeresis is required according to the Dutch
spelling conventions: without the diaeresis,
Bedouien should be pronounced b?duj? rather
than (correct) beduin
29Phonemic Awareness and
- Hindi speakers find it easier to delete /d/ in
doshii than they do /n/ in nadii
- Vaid and Gupta (2002) show that (inline) /i/ in
Devanagari seems to be treated as a separate
segment in reading.
30Vaid & Gupta (2002): Evidence for Devanagari as
an Alphabet
- Studied naming latencies in Hindi-speaking adults
and naming errors in Hindi-speaking children for
words containing short /i/.
- Single C: /tilak/
- Heterosyllabic C: /masjid/
- If Devanagari is a syllabary, then misordering should
only cause problems if the C sequence contains a
phonological syllable boundary (syllable-delimited
view).
- If Devanagari is an alphabet, then both /tilak/ and
/masjid/ should cause problems
(phoneme-delimited view).
- Both /tilak/ and /masjid/ show slower naming and
higher error rates than forms not including short
/i/.
- This is consistent with Devanagari being an
alphabet.
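The logic of the two competing predictions can be laid out as a small truth table. The sketch below is only a restatement of the argument on this slide, not Vaid and Gupta's analysis: the function and variable names are invented, and the boolean flags simply encode whether the consonant sequence preceding short /i/ contains a phonological syllable boundary (no for /tilak/, yes for /masjid/).

```python
def predicted_difficulty(view, crosses_syllable_boundary):
    """Does a given view predict that the short /i/ misordering is costly?"""
    if view == "phoneme-delimited":
        # Any mismatch between written and spoken segment order is costly.
        return True
    if view == "syllable-delimited":
        # Only costly when the consonant sequence spans a syllable boundary.
        return crosses_syllable_boundary
    raise ValueError(f"unknown view: {view!r}")


# Whether the consonant sequence before short /i/ spans a syllable boundary.
CROSSES_BOUNDARY = {"tilak": False, "masjid": True}

# Reported outcome: both word types showed slower naming and more errors.
OBSERVED_DIFFICULT = {"tilak": True, "masjid": True}


def consistent_views():
    """Return the views whose predictions match the observed pattern."""
    views = ("syllable-delimited", "phoneme-delimited")
    return [
        view
        for view in views
        if all(
            predicted_difficulty(view, crosses) == OBSERVED_DIFFICULT[word]
            for word, crosses in CROSSES_BOUNDARY.items()
        )
    ]


print(consistent_views())
```

The deciding case is /tilak/: the syllable-delimited view predicts no difficulty there, so only the phoneme-delimited view matches both observations.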
31Vaid & Gupta's Results: Naming
32Vaid & Gupta's Results: Errors
33Kannada Reduced Consonants
- Padakannaya suggests an explanation for why
deleting [...] in [...] should be
harder than deleting the [...].
- He notes that in cases where there is an explicit
vowel, this is generally ligatured with the
[...].
- So the [...] is more opaque than the [...]
- This is not wholly satisfactory
34Proposed Model
- The ease/difficulty with which a segment is
available for conscious manipulation is directly
related to two factors:
  - The visual prominence of the graphemic
  representation of the segment
  - The complexity of the editing operations involved
  in transforming the graphic form of the stimulus
  into the graphic form of the response
- How to compute edit distance?
35An Alternative Explanation Edit Operations
36Edit Operations: rakta → rata
- Delete
- Move up to inline position
- Change into full form glyph
37Edit Operations: rakta → raka
38Edit Operations: rakti → rati
- Delete
- Move up to inline position
- Change into full form glyph, linking with
39Korean Vowel Switching
hobak (pumpkin)
habok
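The vowel switch is easy to state over Hangul because each syllable block decomposes arithmetically into an initial consonant, a vowel and an optional final consonant. The sketch below uses the standard Unicode composition formula for precomposed Hangul syllables; the function names are invented here, and the spelling 호박 for hobak is an assumption, since the slide gives only the romanization.

```python
HANGUL_BASE = 0xAC00   # first precomposed Hangul syllable
NUM_INITIALS = 19
NUM_VOWELS = 21
NUM_FINALS = 28        # includes the "no final consonant" slot


def decompose(syllable):
    """Split one precomposed syllable into (initial, vowel, final) indices."""
    index = ord(syllable) - HANGUL_BASE
    if not 0 <= index < NUM_INITIALS * NUM_VOWELS * NUM_FINALS:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    initial, rest = divmod(index, NUM_VOWELS * NUM_FINALS)
    vowel, final = divmod(rest, NUM_FINALS)
    return initial, vowel, final


def compose(initial, vowel, final):
    """Build a precomposed syllable from (initial, vowel, final) indices."""
    return chr(HANGUL_BASE + (initial * NUM_VOWELS + vowel) * NUM_FINALS + final)


def switch_vowels(word):
    """Swap the vowels of a two-syllable word, keeping all consonants in place."""
    (i1, v1, f1), (i2, v2, f2) = (decompose(s) for s in word)
    return compose(i1, v2, f1) + compose(i2, v1, f2)


print(switch_vowels("호박"))  # hobak with its two vowels exchanged
```

Only the vowel index changes in each block, which mirrors how little graphical editing the task requires in this script.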
40Formal Model
- Cost of an edit operation is given by three components:
  - Movement cost
  - Deletion cost
  - Substitution cost
- We could hope to quantify the coefficients by regression
against real psycholinguistic data
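One way to make the idea concrete is a weighted sum over the edit operations needed to turn the stimulus into the response. The sketch below is an assumption throughout: the slide's own formula is not reproduced here, the weights are placeholders rather than fitted values, and `EditOp`, `PLACEHOLDER_WEIGHTS` and `edit_cost` are hypothetical names. The `magnitude` field stands in for whatever scales an operation, such as glyph prominence for deletion or glyph dissimilarity for substitution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditOp:
    """One graphical edit: 'delete', 'move' or 'substitute'."""
    kind: str
    magnitude: float = 1.0  # hypothetical scaling factor for this operation


# Placeholder weights, NOT estimated from data. In the proposal these would
# be fitted by regression against psycholinguistic measurements.
PLACEHOLDER_WEIGHTS = {"delete": 1.0, "move": 1.0, "substitute": 1.0}


def edit_cost(ops, weights=PLACEHOLDER_WEIGHTS):
    """Total cost of an edit script as a weighted sum over its operations."""
    return sum(weights[op.kind] * op.magnitude for op in ops)


# rakta -> rata as listed on the earlier slide: delete one glyph, move
# another up to the inline position, change it into its full form.
rakta_to_rata = [EditOp("delete"), EditOp("move"), EditOp("substitute")]
print(edit_cost(rakta_to_rata))
```

With fitted weights, the same function would yield graded predictions of how hard each manipulation task is in a given script.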
41Prominence and Similarity
- Need some measure of what it means to be a
diacritic
- Also need a measure of similarity to quantify the
cost of substituting one glyph form for another
42Similarity Metric for Glyphs
- 26 subjects took part in a web-based survey
- Task was to rate pairs of glyphs on a 5-point
scale of similarity
  - Least similar: 1
  - Most similar: 5
- 153 pairs of glyphs were judged from 3 scripts:
Devanagari, Kannada and Malayalam
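Ratings of this kind could feed the substitution cost by averaging each pair's scores and rescaling. The sketch below assumes a simple linear mapping from the 1 to 5 similarity scale onto a cost between 0 (most similar) and 1 (least similar). That mapping, the function name `substitution_costs` and the input layout are all assumptions, not the method used in the survey.

```python
from collections import defaultdict
from statistics import mean


def substitution_costs(ratings):
    """Turn similarity ratings into per-pair substitution costs.

    ratings: iterable of (glyph_a, glyph_b, score), score on the 1..5 scale.
    Returns a dict mapping an unordered glyph pair to a cost in [0, 1].
    """
    scores_by_pair = defaultdict(list)
    for glyph_a, glyph_b, score in ratings:
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range: {score}")
        # frozenset makes the pair order-independent: (a, b) == (b, a).
        scores_by_pair[frozenset((glyph_a, glyph_b))].append(score)
    # Assumed linear rescaling: mean rating 5 -> cost 0, mean rating 1 -> cost 1.
    return {
        pair: (5 - mean(scores)) / 4
        for pair, scores in scores_by_pair.items()
    }
```

A call such as `substitution_costs([("A", "B", 5), ("B", "A", 3)])`, with placeholder glyph labels and made-up scores, averages the two ratings to 4 and returns a cost of 0.25 for that pair.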
43Some Dissimilar Glyphs
44Some Similar Glyphs
45Are we really talking about phonology?
- Are people's judgments of the number of sounds in
a word influenced by:
  - Number of phonemes?
  - Number of letters?
- Answer seems to be that both are relevant
(Scholes, 1993)
46How many sounds in a word?
- Scholes gave explicit instructions
- at has 2 sounds
- cat has 3 sounds
- Used a verification test to make sure people had
mastered the task
47Results
48So
- No question that judgments about segments are
influenced by spellings of words
- But speakers still have some sense of the
underlying phonological structure
- In Indian languages, we might assume that
speakers' knowledge of phonemes is influenced by
the layout of symbols, but tests of phonemic
awareness are at least in part targeting
phonological knowledge.
- Explanation of phonemic awareness behavior seems
to lie in understanding the graphical properties
of the scripts involved.