Title: LSA 369 Writing Systems Week 3
1 LSA 369 Writing Systems Week 3
- Richard Sproat
- URL: http://catarina.ai.uiuc.edu/LSA270/
2 Piazza Minerva obelisk
3 Wahibra's cartouche
4 Kircher's magic lantern
5 Kircher's megaphone
6 Rosetta stone
7 Various Greek names
8 Another cartouche
9 And another
10 Consider the ten most frequent words of English (from the
Agence France Presse English newswire)
http://catarina.ai.uiuc.edu/L408/answer.html
11 Linear B
13 History
- Arthur Evans discovered the first Linear A and Linear B tablets at Knossos, starting in 1900
- Linear B dates from around 1450 BC
- Evans was convinced that neither Linear A nor Linear B could be Greek
- Oddly, he came close to the opposite (correct) conclusion for Linear B when he decoded po-lo as a probable word for horse (Gk. polos, En. foal)
- But Carl Blegen's discovery in 1939 of Linear B tablets on the Greek mainland brought that assumption into question
14 Decipherment
M. G. F. Ventris, "Introducing the Minoan Language," American Journal of Archaeology, Vol. 44, No. 4 (Oct.-Dec. 1940), pp. 494-520.
Indeed, Ventris resisted the idea that Linear B was Greek almost up to the time of his eventual decipherment in 1952.
15 Stages of decipherment
ru-ki-to, ru-ki-ti-jo, ru-ki-ti-ja (Luktos)
a-mi-ni-so, a-mi-ni-si-jo, a-mi-ni-si-ja (Amnisos)
- Kober's triplets
- Ventris's grid
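As a rough illustration of how the triplets feed the grid, the sketch below compares the base form of each name with one inflected form and reports the sign that changes. Signs that alternate in the same slot of one triplet plausibly share a consonant, and signs that sit in the same slot across triplets plausibly share a vowel. The transliterations are used only as labels for the signs, and the function name `bridging_pair` is a hypothetical helper, not something from the lecture.

```python
# Kober's triplets as lists of sign labels (transliterations used only as labels).
triplets = {
    "Luktos": [["ru", "ki", "to"],
               ["ru", "ki", "ti", "jo"],
               ["ru", "ki", "ti", "ja"]],
    "Amnisos": [["a", "mi", "ni", "so"],
                ["a", "mi", "ni", "si", "jo"],
                ["a", "mi", "ni", "si", "ja"]],
}

def bridging_pair(base, inflected):
    """Return (base sign, inflected sign) at the first slot where the forms differ."""
    for a, b in zip(base, inflected):
        if a != b:
            return a, b
    return None

# One pair per triplet: signs that alternate in the same slot (same consonant).
pairs = {name: bridging_pair(forms[0], forms[1]) for name, forms in triplets.items()}

# Reading across triplets: signs in the same column (same vowel).
same_consonant = list(pairs.values())
same_vowel = list(zip(*pairs.values()))

print(same_consonant)
print(same_vowel)
```

Each pair is one row or column constraint of the kind a grid accumulates; no sound values are needed to derive them.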
16 Linear B examples
17 Confirmation
- The phonology of many words corresponded to what was suspected for Greek from the relevant period
- wa-na-ka (wanaks, later anax "ruler")
- i-qo (iqqwos, later hippos "horse")
- No definite articles
- Confirmation from new finds by Blegen
- ti-ri-po-de
- qe-to-ro-we (qwetrowes → tetr-)
18 Completeness of the decipherment
20 The Phaistos Disk
- Discovered July 3, 1908, by the Italian excavation team at Phaistos (Φαιστός), Crete, headed by Luigi Pernier
- Found in a set of buildings off to the northwest end of the Phaistos palace site
- A tablet in Minoan Linear A was found nearby
- Thought to date from roughly 1800 BC (middle of the late Minoan bronze age)
21 The text
- 241 tokens with 45 distinct glyphs
- Glyphs are all pictographic images of animals, people, various objects
- Text is on both sides of the disk in a spiral working from the outside
- The Phaistos Disk is the world's first known printed document
- Text is broken into 61 (31/30) regions separated by vertical bars
- There is no other artifact known to be written in the same script
23 Decipherments
- There have been well over 20 published decipherments. Some of the proposed languages:
- Greek (most common)
- Basque
- Sanskrit
- Chinese (!)
- One published argument that it is pseudowriting
- A couple of suggestions that it was a calendar
- A few published arguments that it's a fake
- John Chadwick described the Disk as a permanent thorn in the flesh of Minoan epigraphists, and considered it to be undecipherable.
24 The Disk in the popular press
1984: National Geographic honors Fischer with an all-expenses-paid trip to Washington from Germany for his decipherment of the Disk
25 The nature of the script
- Most would-be decipherers have assumed the script is more or less of the same type as Linear A and Linear B: a V/CV syllabary
- More on how these work momentarily
- Arguments are based on
- the apparent number of symbols in the inscription, and
- the putative relationships with other scripts of the region
26 Text of the disk, with Evans's glyph numbers
Side A
02-12-13-01-18/ 24-40-12 29-45-07/
29-29-34 02-12-04-40-33 27-45-07-12 27-44-08
02-12-06-18-? 31-26-35 02-12-41-19-35 01-41-40-07
02-12-32-23-38/ 39-11 02-27-25-10-23-18 28-01/
02-12-31-26/ 02-12-27-27-35-37-21 33-23
02-12-31-26/ 02-27-25-10-23-18 28-01/
02-12-31-26/ 02-12-27-14-32-18-27 06-18-17-19
31-26-12 02-12-13-01 23-19-35/ 10-03-38
02-12-27-27-35-37-21 13-01 10-03-38
Side B
02-12-22-40-07 27-45-07-35 02-37-23-05/
22-25-27 33-24-20-12 16-23-18-43/ 13-01-39-33
15-07-13-01-18 22-37-42-25 07-24-40-35
02-26-36-40 27-25-38-01 29-24-24-20-35 16-14-18
29-33-01 06-35-32-39-33 02-09-27-01 29-36-07-08/
29-08-13 29-45-07/ 22-29-36-07-08/ 27-34-23-25
07-18-35 07-45-07/ 07-23-18-24 22-29-36-07-08/
09-30-39-18-07 02-06-35-23-07 29-34-23-25 45-07/
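Counts such as the number of tokens and distinct glyphs can be checked mechanically from a transcription like the one above. The following is a minimal sketch run on the first seven sign groups of Side A only. It assumes that a trailing "/" is an annotation rather than a glyph and that "?" marks an unreadable sign to be skipped; `parse_words` is a hypothetical helper name.

```python
from collections import Counter

def parse_words(transcription):
    """Split a transcription into sign groups, each a list of glyph codes."""
    words = []
    for chunk in transcription.split():
        chunk = chunk.rstrip("/")  # assumption: trailing "/" is a mark, not a glyph
        signs = [s for s in chunk.split("-") if s and s != "?"]  # skip unreadable signs
        if signs:
            words.append(signs)
    return words

# First seven sign groups of Side A, copied from the transcription above.
excerpt = ("02-12-13-01-18/ 24-40-12 29-45-07/ 29-29-34 "
           "02-12-04-40-33 27-45-07-12 27-44-08")

words = parse_words(excerpt)
counts = Counter(sign for word in words for sign in word)
n_tokens = sum(counts.values())   # total sign occurrences
n_types = len(counts)             # distinct glyphs

print(len(words), n_tokens, n_types, counts.most_common(3))
```

On this excerpt the sketch finds 7 sign groups, 26 tokens and 16 distinct glyphs, with glyph 12 the most frequent (4 occurrences).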
27 The Disk: Fischer's decipherment (Greek)
"Hear ye, Cretans and Greeks my great, my quick!
Hear ye, Danaidans, the great the worthy! Hear
ye, all blacks, and hear ye, Pudaan and Libyan
immigrants! Hear ye, waters, yea earth Hellas
faces battle with the Carians. Hear ye all! Hear
ye, Gods of the Fleet, aye hear yea all faces
battle with the Carians. Hear ye all! Hear ye,
the multitudes of black people and all! Hear ye,
lords, yea freemen To Naxos! Hear ye, Lords of
the Fleet To Naxos!"
28 Faucounau's decipherment (as Ionic Greek)
29 Faucounau's arguments
- No need of a second disk to confirm the solution
- "...we initially thought that it would be possible only once a second disk was available, to which the phonetic values found could be applied. But the number, and above all the quality, of the proofs that we have been able to discover over the years lead us today to consider the unearthing of a second disk superfluous."
- Internal proofs
- Coincidence of the sounds derived for the forms via statistical analysis and the acrophonic principle
- Evidence based on corrected typos in the disk
- External proofs
- General arguments to the effect that the Ionians could have been in Crete at the time
ka from kare "head" or maybe Kar "Carian" (Philistine)
30 La lamproie (the lamprey)
X ka s la lae yi to
- "One can still clearly see, on the enlarged photos, the animal's tail with its scales."
- Why a lamprey? Many more obvious words like lampas "lamp"
- And there's a slight problem with the biology
31 Glyphset
32 Autodecipherment: Knight's proposal
- Assume a standard source-channel language modeling approach
- Script form is the observation S
- Language model L over sequences of phonemes in the target language is the source
- Noisy channel C is the spelling rules mapping between the language and the script
- The decoding problem is to find the optimal solution in S ∘ C ∘ L
- Use Expectation-Maximization to solve this problem
- Solution will have the lowest cross-entropy
- Actually this is an old idea, dating back to the early work on HMMs at the IDA, and Shannon's work on codebreaking
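One conventional way to write the source-channel setup above is sketched below. The notation is an assumption, not taken from the slides: P(L) stands for the language-model probability of a phoneme sequence L, P(S | L) for the channel probability of the observed script S, and N for the number of symbols in S.

```latex
\hat{L} \;=\; \arg\max_{L} P(L \mid S) \;=\; \arg\max_{L} P(L)\, P(S \mid L)
\qquad
H(S) \;=\; -\frac{1}{N} \log_2 \sum_{L} P(L)\, P(S \mid L)
```

Under this reading, Expectation-Maximization adjusts the channel parameters to raise the total probability of S, which is the same as lowering the cross-entropy H(S).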
33 Example: Spanish
- Target ancient text is the first page of Don Quijote
- Language model is built over phonetically transcribed medical text
- Initial channel model allows any sound to map to any character with equal probability
- Task is to learn the weights on the mappings
- Final result decodes 96% of the sounds
- 99% phoneme accuracy for Japanese kana
- 22% syllable accuracy for Chinese
- Has also shown that you can crack substitution ciphers, and find the correct language among a couple of handfuls of candidate languages
34 Issues
- In most real cases we don't necessarily know the underlying language
- or the form of the underlying language
- Ancient scripts often encode phonological information in complex ways
- Many scripts are mixed: they encode both sound and aspects of the meaning
35 Application to Linear B: Could Ventris have been replaced by a computer?
- Tri-syllable language model built on 1.6 million words of Greek from Hesiod to the Hellenistic period
- Mapped back to guesses about Mycenaean forms using two kinds of rules
- Historical reconstruction
- Phonological simplification
- Mycenaean data is 12,730 syllables from Ventris & Chadwick, Documents in Mycenaean Greek
- Removed all ideograms, uncertain cases, and phrases that contained the syllables pte, nwa
Phonological simplification rules:
Loss of most voicing and aspiration distinctions:
bP -> p
T -> t
d -> dt
Kg -> k
Final consonant deletion:
Cons -> <epsilon> / __ <eos>
Son -> <epsilon> / __ Cons
s -> <epsilon> / __ Obs
Right-to-left:
<epsilon> -> e / Obs _ Cons e
<epsilon> -> a / Obs _ Cons a
<epsilon> -> i / Obs _ Cons i
<epsilon> -> o / Obs _ Cons o
<epsilon> -> u / Obs _ Cons u
?ess??? ????a??? me se ne wa te na i o ????t?
µet? t?? ????? we ke re to me ta to wa ro
Historical reconstruction rules:
t -> q / __ e
p -> q / __ (ioua)
<epsilon> -> w / <bos> __ Vow
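To make the simplification rules more concrete, here is a small sketch that applies a subset of them to romanized input. Several choices are assumptions made only for illustration: capital P, T, K stand for the aspirated stops (following the rule notation), d is left unchanged, only one final consonant is deleted, l, r, m, n are taken to be the sonorants, and the historical reconstruction rules are not applied. The function name `simplify` is hypothetical.

```python
VOWELS = set("aeiou")
SONORANTS = set("lrmn")                                     # assumed sonorant class
MERGE = {"b": "p", "P": "p", "T": "t", "K": "k", "g": "k"}  # voicing/aspiration loss

def is_cons(ch):
    return ch not in VOWELS

def is_obs(ch):
    return is_cons(ch) and ch not in SONORANTS

def simplify(word):
    """Apply merger, deletion and copy-vowel epenthesis rules to one word."""
    s = [MERGE.get(ch, ch) for ch in word]
    # Final consonant deletion (a single consonant, by assumption).
    if s and is_cons(s[-1]):
        s.pop()
    # Delete a sonorant before a consonant, and s before an obstruent.
    out = []
    for i, ch in enumerate(s):
        nxt = s[i + 1] if i + 1 < len(s) else None
        if nxt is not None and ch in SONORANTS and is_cons(nxt):
            continue
        if nxt is not None and ch == "s" and is_obs(nxt):
            continue
        out.append(ch)
    # Right-to-left epenthesis: copy the following vowel into Obs _ Cons V.
    i = len(out) - 3
    while i >= 0:
        if is_obs(out[i]) and is_cons(out[i + 1]) and out[i + 2] in VOWELS:
            out.insert(i + 1, out[i + 2])
        i -= 1
    return "".join(out)

for w in ["tripodes", "Krusos", "sperma"]:
    print(w, "->", simplify(w))
```

Scanning from the right lets a vowel inserted in one cluster feed the rule for a cluster further left. On the inputs shown, the sketch yields tiripode, kuruso and pema, all of which break cleanly into V/CV syllables.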
36 Procedure
- Randomly select a mapping between truth (Ventris/Chadwick syllables) and some permutation of the syllables
- E.g. map to → ka, pe → ro, nu → mu
- 61 syllables, so 61! possible permutations
- Measure the cross-entropy between the language model and the resulting permuted text
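The procedure can be sketched on toy data as follows. To keep the example short, a bigram model with add-one smoothing stands in for the tri-syllable model described earlier, and the three-syllable inventory and training text are invented for illustration. All function names are hypothetical.

```python
import math
import random
from collections import Counter

def train_bigram(syllables):
    """Return prob(prev, cur): add-one-smoothed bigram probability."""
    vocab_size = len(set(syllables))
    contexts = Counter(syllables[:-1])
    bigrams = Counter(zip(syllables, syllables[1:]))
    def prob(prev, cur):
        return (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
    return prob

def cross_entropy(prob, text):
    """Average negative log2 probability per bigram of the text."""
    pairs = list(zip(text, text[1:]))
    return -sum(math.log2(prob(p, c)) for p, c in pairs) / len(pairs)

def random_relabeling(inventory, rng):
    """Pick a random permutation of the syllable inventory as a mapping."""
    shuffled = inventory[:]
    rng.shuffle(shuffled)
    return dict(zip(inventory, shuffled))

# Toy stand-ins for the language-model training data and the "true" text.
inventory = ["ko", "no", "so"]
prob = train_bigram(inventory * 10)
truth = inventory * 2

# A fixed non-identity relabeling, and a random one as in the procedure.
swap = {"ko": "ko", "no": "so", "so": "no"}
h_truth = cross_entropy(prob, truth)
h_swap = cross_entropy(prob, [swap[s] for s in truth])
mapping = random_relabeling(inventory, random.Random(0))
h_random = cross_entropy(prob, [mapping[s] for s in truth])

n_permutations = math.factorial(61)  # size of the search space for 61 syllables
print(h_truth, h_swap, h_random, f"{n_permutations:.3e}")
```

On this toy data the true text scores a lower cross-entropy than the swapped relabeling, which is the effect the procedure measures across many random permutations.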
37 Results
[Figure labels: truth, optimal; p < 1.69 x 10^-9 (5.52 s.d.); 8.58 x 10^75]
38 Best cross-entropy
to to e ka o e ro i ke ta ko u jo o te we a
te na qe re ro ja ke pa me ta re ra pe pe ko me ra
Substitute the most common syllable according to the language model training data for the most common in the Ventris/Chadwick corpus, the second most common for the second most common, and so forth
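The frequency-rank substitution just described can be sketched in a few lines. The token lists below are invented placeholders, and `rank_substitution` is a hypothetical name. Ties in frequency are broken by first appearance, which is an arbitrary choice.

```python
from collections import Counter

def rank_order(tokens):
    """Distinct tokens from most to least frequent (ties: first seen first)."""
    return [tok for tok, _ in Counter(tokens).most_common()]

def rank_substitution(cipher_tokens, training_tokens):
    """Map the k-th most common cipher token to the k-th most common training token."""
    mapping = dict(zip(rank_order(cipher_tokens), rank_order(training_tokens)))
    decoded = [mapping.get(tok, "?") for tok in cipher_tokens]  # "?" if ranks run out
    return decoded, mapping

# Placeholder data: training syllables and an unknown text with opaque sign labels.
training = "to to to e e ka".split()
unknown = "x1 x2 x1 x3 x1 x2".split()

decoded, mapping = rank_substitution(unknown, training)
print(mapping)
print(" ".join(decoded))
```

This baseline uses only unigram frequencies, so it ignores the ordering of syllables that the cross-entropy measurement is sensitive to.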