LSA 369 Writing Systems Week 3 - PowerPoint PPT Presentation

1
LSA 369: Writing Systems, Week 3
  • Richard Sproat
  • URL: http://catarina.ai.uiuc.edu/LSA270/

2
Piazza Minerva obelisk
3
Wahibra's cartouche
4
Kircher's magic lantern
5
Kircher's megaphone
6
Rosetta Stone
7
Various Greek names
8
Another cartouche
9
And another
10
Consider the ten most frequent words of English (from
the Agence France Presse English newswire):
http://catarina.ai.uiuc.edu/L408/answer.html
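The exercise starts from a frequency-ranked word list. Below is a minimal sketch for producing such a list from any plain-text corpus. The AFP newswire is not reproduced here, so the sample sentence is hypothetical, and the tokenization rule (lowercased runs of letters and apostrophes) is an assumption.

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most frequent lowercased word forms in `text`."""
    # Assumed tokenization: runs of ASCII letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word, _ in Counter(words).most_common(n)]


if __name__ == "__main__":
    # Hypothetical sample text standing in for a newswire corpus.
    sample = "The minister said the talks would resume, and the ministry agreed."
    print(top_words(sample, 3))
```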
11
Linear B
12
(No Transcript)
13
History
  • Arthur Evans discovered the first Linear A and
    Linear B tablets at Knossos starting in 1900
  • Linear B dates from around 1450 BC
  • Evans was convinced that neither Linear A nor
    Linear B could be Greek
  • Oddly, he came close to the opposite (correct)
    conclusion for Linear B when he decoded po-lo
    as a probable word for 'horse' (Gk. polos, En.
    foal)
  • But Carl Blegen's discovery in 1939 of Linear B
    tablets on the Greek mainland called that
    assumption into question

14
Decipherment
M. G. F. Ventris, "Introducing the Minoan Language,"
American Journal of Archaeology, Vol. 44, No. 4
(Oct.-Dec. 1940), pp. 494-520
Indeed, Ventris resisted the idea that Linear B
was Greek almost up to the time of his eventual
decipherment in 1952.
15
Stages of decipherment
ru-ki-to      ru-ki-ti-jo      ru-ki-ti-ja      Luktos
a-mi-ni-so    a-mi-ni-si-jo    a-mi-ni-si-ja    Amnisos
  • Kober's triplets
  • Ventris's grid
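The triplets above support a simple inference: signs that alternate in the same slot of related forms (place name vs. derived adjective, masculine vs. feminine ending) were taken to share a consonant, i.e. to sit in one row of the grid. The toy sketch below mechanizes that step. For readability it uses the sound values assigned after the decipherment; Ventris at this stage worked only with sign numbers, and the function names are hypothetical.

```python
# Spellings use post-decipherment values purely for readability.
TRIPLETS = {
    "Luktos": ["ru-ki-to", "ru-ki-ti-jo", "ru-ki-ti-ja"],
    "Amnisos": ["a-mi-ni-so", "a-mi-ni-si-jo", "a-mi-ni-si-ja"],
}


def alternating_signs(forms):
    """Signs found at the first slot where the related forms stop agreeing."""
    signs = [form.split("-") for form in forms]
    i = 0
    while all(i < len(s) for s in signs) and len({s[i] for s in signs}) == 1:
        i += 1
    return sorted({s[i] for s in signs if i < len(s)})


def grid_rows(triplets):
    """Hypothesis: each alternating set shares a consonant (one grid row)."""
    rows = []
    for forms in triplets.values():
        for group in (alternating_signs(forms),       # base vs. derived forms
                      alternating_signs(forms[1:])):  # masc. vs. fem. ending
            if group not in rows:
                rows.append(group)
    return rows


if __name__ == "__main__":
    print(grid_rows(TRIPLETS))
```

Each resulting group (ti/to, ja/jo, si/so) differs only in its vowel, which is exactly the pattern a consonant-by-vowel grid is meant to capture.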

16
Linear B examples
17
Confirmation
  • The phonology of many words corresponded to what
    was suspected for Greek from the relevant period
  • wa-na-ka (wanaks, later anax 'ruler')
  • i-qo (iqqwos, later hippos 'horse')
  • No definite articles
  • Confirmation from new finds by Blegen

ti-ri-po-de
qe-to-ro-we
qwetrowes → tetr-
18
Completeness of the decipherment
19
(No Transcript)
20
The Phaistos Disk
  • Discovered July 3, 1908, by the Italian
    excavation team at Phaistos (Φαιστός),
    Crete, headed by Luigi Pernier
  • Found in a group of buildings off the northwest
    end of the Phaistos palace site
  • A tablet in Minoan Linear A was found nearby
  • Thought to date from roughly 1800 BC (Middle or
    Late Minoan Bronze Age)

21
The text
  • 241 tokens with 45 distinct glyphs
  • Glyphs are all pictographic images of animals,
    people, and various objects
  • Text is on both sides of the disk, in a spiral
    running from the outer edge inward
  • The Phaistos Disk is the world's first known
    printed document
  • Text is divided into 61 regions (31 on side A,
    30 on side B), separated by vertical bars
  • There is no other artifact known to be written in
    the same script

22
(No Transcript)
23
Decipherments
  • There have been well over 20 published
    decipherments. Some of the proposed languages:
      • Greek (most common)
      • Basque
      • Sanskrit
      • Chinese (!)
  • One published argument that it is pseudowriting
  • A couple of suggestions that it was a calendar
  • A few published arguments that it's a fake
  • John Chadwick described the Disk as "a permanent
    thorn in the flesh of Minoan epigraphists" and
    considered it undecipherable.

24
The Disk in the popular press
1984: National Geographic honors Fischer with an
all-expenses-paid trip from Germany to Washington
for his decipherment of the Disk
25
The nature of the script
  • Most would-be decipherers have assumed the script
    is more or less of the same type as Linear A and
    Linear B: a V/CV syllabary
  • More on how these work momentarily
  • Arguments are based on:
      • the apparent number of symbols in the
        inscription, and
      • the putative relationships with other scripts
        of the region
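The "apparent number of symbols" argument compares a sign inventory with typical inventory sizes, but 45 distinct glyphs in 241 tokens is only a sample, so the full inventory may be larger. The sketch below swaps in the Chao1 richness estimator (borrowed from ecology, not used on the slides) to estimate seen plus unseen signs. The size bands in `rough_script_type` are illustrative assumptions, not a standard classification.

```python
from collections import Counter


def chao1(tokens):
    """Chao1 estimate of the total number of distinct signs, seen and unseen."""
    freq_of_freq = Counter(Counter(tokens).values())
    s_obs = len(set(tokens))
    f1 = freq_of_freq.get(1, 0)  # signs seen exactly once
    f2 = freq_of_freq.get(2, 0)  # signs seen exactly twice
    if f2 == 0:
        # Bias-corrected form, used when no sign occurs exactly twice.
        return s_obs + f1 * (f1 - 1) / 2
    return s_obs + f1 * f1 / (2 * f2)


def rough_script_type(n_signs):
    """Illustrative inventory-size bands; the cut-offs are assumptions."""
    if n_signs < 40:
        return "alphabet/abjad"
    if n_signs <= 100:
        return "syllabary"
    return "logo-syllabic"


if __name__ == "__main__":
    toy = ["a", "a", "b", "c", "c", "d"]  # hypothetical token list
    estimate = chao1(toy)
    print(estimate, rough_script_type(45))
```

Feeding in the disk's glyph tokens instead of the toy list gives an estimate that is at least the observed 45, which is why such inventory arguments are usually stated as lower bounds.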

26
Text of the disk, with Evans's glyph numbers
Side A:
02-12-13-01-18/ 24-40-12 29-45-07/
29-29-34 02-12-04-40-33 27-45-07-12 27-44-08
02-12-06-18-? 31-26-35 02-12-41-19-35 01-41-40-07
02-12-32-23-38/ 39-11 02-27-25-10-23-18 28-01/
02-12-31-26/ 02-12-27-27-35-37-21 33-23
02-12-31-26/ 02-27-25-10-23-18 28-01/
02-12-31-26/ 02-12-27-14-32-18-27 06-18-17-19
31-26-12 02-12-13-01 23-19-35/ 10-03-38
02-12-27-27-35-37-21 13-01 10-03-38
Side B:
02-12-22-40-07 27-45-07-35 02-37-23-05/
22-25-27 33-24-20-12 16-23-18-43/ 13-01-39-33
15-07-13-01-18 22-37-42-25 07-24-40-35
02-26-36-40 27-25-38-01 29-24-24-20-35 16-14-18
29-33-01 06-35-32-39-33 02-09-27-01 29-36-07-08/
29-08-13 29-45-07/ 22-29-36-07-08/ 27-34-23-25
07-18-35 07-45-07/ 07-23-18-24 22-29-36-07-08/
09-30-39-18-07 02-06-35-23-07 29-34-23-25 45-07/
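As a cross-check on the counts given for the disk (241 tokens, 45 distinct glyphs, 61 regions split 31/30), the transcription can be tallied mechanically. The sketch assumes that whitespace separates regions, "-" separates glyphs, "/" is a mark attached to a region (stripped before counting), and "?" is an illegible sign excluded from the token count. Under those assumptions the tallies come out as 31 + 30 regions, 241 legible tokens and 45 glyph types.

```python
# Transcription copied from the slide (Evans's glyph numbers).
SIDE_A = """
02-12-13-01-18/ 24-40-12 29-45-07/
29-29-34 02-12-04-40-33 27-45-07-12 27-44-08
02-12-06-18-? 31-26-35 02-12-41-19-35 01-41-40-07
02-12-32-23-38/ 39-11 02-27-25-10-23-18 28-01/
02-12-31-26/ 02-12-27-27-35-37-21 33-23
02-12-31-26/ 02-27-25-10-23-18 28-01/
02-12-31-26/ 02-12-27-14-32-18-27 06-18-17-19
31-26-12 02-12-13-01 23-19-35/ 10-03-38
02-12-27-27-35-37-21 13-01 10-03-38
"""
SIDE_B = """
02-12-22-40-07 27-45-07-35 02-37-23-05/
22-25-27 33-24-20-12 16-23-18-43/ 13-01-39-33
15-07-13-01-18 22-37-42-25 07-24-40-35
02-26-36-40 27-25-38-01 29-24-24-20-35 16-14-18
29-33-01 06-35-32-39-33 02-09-27-01 29-36-07-08/
29-08-13 29-45-07/ 22-29-36-07-08/ 27-34-23-25
07-18-35 07-45-07/ 07-23-18-24 22-29-36-07-08/
09-30-39-18-07 02-06-35-23-07 29-34-23-25 45-07/
"""


def regions(side):
    """Split one side into regions (lists of glyph numbers), dropping '/'."""
    return [group.rstrip("/").split("-") for group in side.split()]


def disk_stats(*sides):
    glyphs = [g for side in sides for region in regions(side) for g in region]
    legible = [g for g in glyphs if g != "?"]  # '?' = illegible sign
    return {
        "regions": [len(regions(side)) for side in sides],
        "tokens": len(legible),
        "illegible": len(glyphs) - len(legible),
        "types": len(set(legible)),
    }


if __name__ == "__main__":
    print(disk_stats(SIDE_A, SIDE_B))
```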
27
The Disk: Fischer's decipherment (Greek)
"Hear ye, Cretans and Greeks my great, my quick!
Hear ye, Danaidans, the great the worthy! Hear
ye, all blacks, and hear ye, Pudaan and Libyan
immigrants! Hear ye, waters, yea earth Hellas
faces battle with the Carians. Hear ye all! Hear
ye, Gods of the Fleet, aye hear yea all faces
battle with the Carians. Hear ye all! Hear ye,
the multitudes of black people and all! Hear ye,
lords, yea freemen To Naxos! Hear ye, Lords of
the Fleet To Naxos!"
28
Faucounau's decipherment (as Ionic Greek)
29
Faucounau's arguments
  • No need for a second disk to confirm the solution
      • "...at first we thought that this would only
        be possible once a second disk was available,
        to which the phonetic values we had found could
        be applied. But the number, and above all the
        quality, of the proofs we have been able to
        discover over the years now lead us to consider
        the unearthing of a second disk superfluous."
  • Internal proofs
      • Coincidence of the sounds derived for the forms
        via statistical analysis and via the acrophonic
        principle
      • Evidence based on corrected typos on the disk
  • External proofs
      • General arguments to the effect that the
        Ionians could have been in Crete at the time

ka from kare 'head', or maybe Kar 'Carian'
(Philistine)
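The "coincidence" argument compares two independent sources of sound values: the acrophonic reading of each picture (the first syllable of the word it is taken to depict, as in ka from kare 'head') and values derived statistically. The sketch below shows that comparison in its simplest form. The sign names and readings are hypothetical, and this is an illustration of the kind of check described, not Faucounau's actual procedure.

```python
def acrophonic_value(word):
    """Acrophonic principle: read a sign as the first syllable of its word."""
    return word.split("-")[0]


def agreement(acrophonic, statistical):
    """Share of signs whose acrophonic and statistical values coincide."""
    shared = [sign for sign in acrophonic if sign in statistical]
    if not shared:
        return 0.0
    return sum(acrophonic[s] == statistical[s] for s in shared) / len(shared)


if __name__ == "__main__":
    # Hypothetical sign names and word choices, for illustration only.
    pictures = {"HEAD": "ka-re", "LAMP": "la-pa", "SIGN_3": "to-ko"}
    acro = {sign: acrophonic_value(word) for sign, word in pictures.items()}
    stat = {"HEAD": "ka", "LAMP": "la", "SIGN_3": "ti"}
    print(acro, agreement(acro, stat))
```

Because the decipherer chooses which word each picture depicts, a high agreement score is only as convincing as those word choices.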
30
The lamprey
[Figure: glyphs labeled X, ka, s, la, lae, yi, to]
  • "On the enlarged photos, one can still clearly
    see the animal's tail with its scales."
  • Why a lamprey? There are many more obvious words,
    such as lampas 'lamp'
  • And there's a slight problem with the biology:
    lampreys have no scales
31
Glyphset
32
Autodecipherment: Knight's proposal
  • Assume a standard source-channel language
    modeling approach
  • The script form is the observation S
  • A language model L over sequences of phonemes in
    the target language is the source
  • The noisy channel C is the set of spelling rules
    mapping between the language and the script
  • The decoding problem is to find the optimal
    solution in S ∘ C ∘ L
  • Use Expectation-Maximization (EM) to solve this
    problem
  • The solution will have the lowest cross-entropy
  • Actually this is an old idea, dating back to the
    early work on HMMs at the IDA and to Shannon's
    work on codebreaking
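A toy, character-level version of the source-channel setup can make the moving parts concrete. In this sketch a fixed bigram model over plain symbols plays the role of the source L, a table P(cipher symbol | plain symbol) plays the channel C, and EM (forward-backward) re-estimates only the channel. Letters stand in for phonemes, the data are hypothetical, and this is not Knight's actual system. The channel starts near uniform with a small random perturbation, an assumption made here to break symmetry.

```python
import math
import random
from collections import Counter


def bigram_lm(text, symbols, k=0.1):
    """Add-k smoothed bigram model P(b | a); '#' is the start context."""
    counts = Counter(zip("#" + text, text))
    lm = {}
    for a in ["#"] + symbols:
        total = sum(counts[(a, b)] for b in symbols) + k * len(symbols)
        lm[a] = {b: (counts[(a, b)] + k) / total for b in symbols}
    return lm


def forward_backward(cipher, lm, chan, plain):
    """Scaled forward-backward: per-position posteriors and log-likelihood."""
    fwd, scales = [], []
    for t, c in enumerate(cipher):
        if t == 0:
            a = {p: lm["#"][p] * chan[p][c] for p in plain}
        else:
            a = {p: sum(fwd[-1][q] * lm[q][p] for q in plain) * chan[p][c]
                 for p in plain}
        s = sum(a.values())
        scales.append(s)
        fwd.append({p: v / s for p, v in a.items()})
    bwd = [None] * len(cipher)
    bwd[-1] = {p: 1.0 for p in plain}
    for t in range(len(cipher) - 2, -1, -1):
        c = cipher[t + 1]
        bwd[t] = {q: sum(lm[q][p] * chan[p][c] * bwd[t + 1][p] for p in plain)
                  / scales[t + 1] for q in plain}
    post = [{p: fwd[t][p] * bwd[t][p] for p in plain} for t in range(len(cipher))]
    return post, sum(math.log(s) for s in scales)


def em_decipher(cipher, lm, plain, iters=15, seed=0):
    """Learn P(cipher symbol | plain symbol) by EM with the source model fixed."""
    rng = random.Random(seed)
    cipher_syms = sorted(set(cipher))
    chan = {}
    for p in plain:
        w = {c: 1.0 + 0.1 * rng.random() for c in cipher_syms}
        z = sum(w.values())
        chan[p] = {c: v / z for c, v in w.items()}
    history = []
    for _ in range(iters):
        post, loglik = forward_backward(cipher, lm, chan, plain)  # E-step
        history.append(loglik)
        counts = {p: Counter() for p in plain}
        for t, c in enumerate(cipher):
            for p in plain:
                counts[p][c] += post[t][p]
        for p in plain:  # M-step: renormalize expected counts
            z = sum(counts[p].values())
            chan[p] = {c: counts[p][c] / z for c in cipher_syms}
    post, _ = forward_backward(cipher, lm, chan, plain)
    decoded = "".join(max(pt, key=pt.get) for pt in post)
    return decoded, history


if __name__ == "__main__":
    plain_text = "the cat sat on the mat and the rat sat on the hat"
    symbols = sorted(set(plain_text))
    lm = bigram_lm(plain_text, symbols)
    key = str.maketrans("acehmnorst ", "QWERTYUIOPZ")  # hypothetical cipher key
    cipher = "the rat sat on the cat".translate(key)
    decoded, history = em_decipher(cipher, lm, symbols)
    print(decoded)
    print([round(h, 2) for h in history])
```

EM guarantees only that the log-likelihood never decreases; on a text this short it may settle on a wrong key, which is one reason real experiments use much larger source models and texts.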

33
Example: Spanish
  • The target "ancient" text is the first page of
    Don Quijote
  • The language model is built over phonetically
    transcribed medical text
  • The initial channel model allows any sound to map
    to any character with equal probability
  • The task is to learn the weights on the mappings
  • The final result decodes 96% of the sounds
  • 99% phoneme accuracy for Japanese kana
  • 22% syllable accuracy for Chinese
  • The approach has also been shown to crack
    substitution ciphers, and to find the correct
    language among a couple of handfuls of candidate
    languages
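The language-identification result rests on the same criterion as the decipherment: prefer the hypothesis with the lowest cross-entropy. The simplified sketch below scores an already-readable text under one character-bigram model per candidate language and picks the minimum; the real setup deciphers under each candidate model first. The two tiny training corpora are hypothetical stand-ins.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def bigram_model(text, alphabet, k=0.1):
    """Add-k smoothed character bigram model; '#' marks the start of text."""
    pairs = Counter(zip("#" + text, text))
    contexts = Counter("#" + text[:-1])
    v = len(alphabet)
    return lambda a, b: (pairs[(a, b)] + k) / (contexts[a] + k * v)


def cross_entropy(text, model):
    """Bits per character of `text` under a bigram model."""
    return -sum(math.log2(model(a, b)) for a, b in zip("#" + text, text)) / len(text)


def identify_language(text, corpora, alphabet=ALPHABET):
    """Return the candidate with the lowest cross-entropy, plus all scores."""
    scores = {name: cross_entropy(text, bigram_model(corpus, alphabet))
              for name, corpus in corpora.items()}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    corpora = {  # hypothetical training snippets
        "english": "the cat sat on the mat and the dog sat on the log",
        "spanish": "el gato se sienta en la alfombra y el perro en el tronco",
    }
    print(identify_language("the dog sat on the mat", corpora))
```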

34
Issues
  • In most real cases we don't necessarily know the
    underlying language
  • or the form of the underlying language
  • Ancient scripts often encode phonological
    information in complex ways
  • Many scripts are mixed: they encode both sound
    and aspects of the meaning

35
Application to Linear B: Could Ventris have been
replaced by a computer?
Loss of most voicing and aspiration distinctions:
  b, P → p    T → t    d → dt    K, g → k
Final consonant deletion:
  Cons → ε / __ <eos>
  Son → ε / __ Cons
  s → ε / __ Obs
Vowel epenthesis, applied right-to-left:
  ε → e / Obs __ Cons e
  ε → a / Obs __ Cons a
  ε → i / Obs __ Cons i
  ε → o / Obs __ Cons o
  ε → u / Obs __ Cons u
  • Tri-syllable language model built on 1.6 million
    words of Greek from Hesiod to the Hellenistic
    period
  • Mapped back to guesses about Mycenaean forms
    using two kinds of rules:
      • Historical reconstruction
      • Phonological simplification
  • The Mycenaean data is 12,730 syllables from
    Ventris and Chadwick's Documents in Mycenaean Greek
  • Removed all ideograms, uncertain cases, and
    phrases containing the syllables pte and nwa

?ess??? ????a??? me se ne wa te na i o ????t?
µet? t?? ????? we ke re to me ta to wa ro
t → q / __ e    p → q / __ (i, o, u, a)
ε → w / <bos> __ Vow
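Several of the simplification rules above can be applied mechanically to show how a Greek form is turned into a Linear B-style spelling. The sketch below is a hypothetical reimplementation of a subset only. It follows the slide's notation (uppercase P, T, K for aspirates, q for the labiovelar), omits the unclear "d" rule and the historical-reconstruction rules, and the rule ordering is an assumption. It reproduces ti-ri-po-de (tripodes), qe-to-ro-we (qwetrowes, written "qetrowes" here), ko-no-so and pa-ka-na (phasgana 'swords').

```python
VOWELS = set("aeiou")
SONORANTS = set("mnlrwj")
OBSTRUENTS = set("ptkqdsz")
CONSONANTS = SONORANTS | OBSTRUENTS

# Loss of voicing/aspiration distinctions (P, T, K = aspirates, as on the slide).
MERGERS = {"b": "p", "P": "p", "T": "t", "K": "k", "g": "k"}


def simplify(word):
    segs = [MERGERS.get(ch, ch) for ch in word]
    # Final consonant deletion: Cons -> eps / __ <eos>
    if segs and segs[-1] in CONSONANTS:
        segs.pop()
    # Son -> eps / __ Cons and s -> eps / __ Obs (applied simultaneously)
    kept = []
    for i, s in enumerate(segs):
        nxt = segs[i + 1] if i + 1 < len(segs) else None
        if s in SONORANTS and nxt in CONSONANTS:
            continue
        if s == "s" and nxt in OBSTRUENTS:
            continue
        kept.append(s)
    segs = kept
    # Epenthesis, right to left: eps -> V / Obs __ Cons V
    for i in range(len(segs) - 3, -1, -1):
        if (segs[i] in OBSTRUENTS and segs[i + 1] in CONSONANTS
                and segs[i + 2] in VOWELS):
            segs.insert(i + 1, segs[i + 2])
    return segs


def syllabify(segs):
    """Group segments into (C)V syllables joined by hyphens."""
    sylls, onset = [], ""
    for s in segs:
        if s in VOWELS:
            sylls.append(onset + s)
            onset = ""
        else:
            onset += s
    if onset:  # leftover consonants attach to the last syllable
        if sylls:
            sylls[-1] += onset
        else:
            sylls.append(onset)
    return "-".join(sylls)


if __name__ == "__main__":
    for w in ["tripodes", "qetrowes", "knosos", "Pasgana"]:
        print(w, "->", syllabify(simplify(w)))
```

Running the epenthesis loop from right to left lets an inserted copy vowel feed the next insertion to its left, which is presumably why the slide marks those rules as right-to-left.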
36
Procedure
  • Randomly select a mapping between the truth
    (Ventris/Chadwick syllables) and some permutation
    of the syllables
  • E.g., map to → ka, pe → ro, nu → mu
  • 61 syllables, so 61! possible permutations
  • Measure the cross-entropy between the language
    model and the resulting permuted text
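The procedure above amounts to a permutation test: score the true transcription, score many random relabelings of its syllables, and ask how far the truth sits from the random scores. A minimal sketch under stated assumptions follows. It uses an add-k syllable bigram model (the slides use a tri-syllable model), and the toy corpus of Linear B words is hypothetical and also stands in for the language-model training data.

```python
import math
import random
import statistics
from collections import Counter


def train_bigram(seqs, vocab, k=0.1):
    """Add-k smoothed syllable bigram model; '<s>' starts each sequence."""
    pairs, contexts = Counter(), Counter()
    for seq in seqs:
        for a, b in zip(["<s>"] + seq, seq):
            pairs[(a, b)] += 1
            contexts[a] += 1
    v = len(vocab)
    return lambda a, b: (pairs[(a, b)] + k) / (contexts[a] + k * v)


def cross_entropy(seqs, prob):
    """Average bits per syllable of `seqs` under the model `prob`."""
    total, n = 0.0, 0
    for seq in seqs:
        for a, b in zip(["<s>"] + seq, seq):
            total -= math.log2(prob(a, b))
            n += 1
    return total / n


def permutation_test(corpus, prob, vocab, trials=1000, seed=0):
    """Compare the true text with randomly relabeled versions of it."""
    rng = random.Random(seed)
    truth = cross_entropy(corpus, prob)
    scores = []
    for _ in range(trials):
        shuffled = list(vocab)
        rng.shuffle(shuffled)
        relabel = dict(zip(vocab, shuffled))  # one random syllable bijection
        permuted = [[relabel[s] for s in seq] for seq in corpus]
        scores.append(cross_entropy(permuted, prob))
    mean, sd = statistics.mean(scores), statistics.stdev(scores)
    z = (mean - truth) / sd if sd > 0 else 0.0  # s.d. below the random mean
    frac = sum(s <= truth for s in scores) / trials  # empirical p-value
    return truth, z, frac


if __name__ == "__main__":
    words = ["ko-no-so", "a-mi-ni-so", "ti-ri-po-de", "qe-to-ro-we", "wa-na-ka"]
    corpus = [w.split("-") for w in words]
    vocab = sorted({s for seq in corpus for s in seq})
    print(permutation_test(corpus, train_bigram(corpus, vocab), vocab, trials=200))
```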

37
Results
p < 1.69 × 10^-8 (5.52 s.d.)    8.58 × 10^75
[Figure: plot with points labeled "truth" and "optimal"]
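Assuming the cross-entropies of random permutations are roughly normally distributed, the number of standard deviations can be converted to a one-sided tail probability, and multiplying that probability by 61! estimates how many of the possible permutations would score at least as well. The sketch below shows that a 5.52 s.d. result corresponds to a tail of about 1.7 × 10^-8, and that this times 61! is roughly 8.6 × 10^75.

```python
import math


def normal_upper_tail(z):
    """One-sided P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    p = normal_upper_tail(5.52)
    n_perms = math.factorial(61)  # permutations of 61 syllables
    print(f"p = {p:.3g}")
    print(f"61! = {n_perms:.3g}; p * 61! = {p * n_perms:.3g}")
```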
38
Best cross-entropy
to to e ka o e ro i ke ta ko u jo o te we a
te na qe re ro ja ke pa me ta re ra pe pe ko me ra
Substitute the most common syllable in the language
model training data for the most common syllable in
the Ventris/Chadwick corpus, the second most common
for the second most common, and so forth
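The rank-matching substitution just described can be sketched in a few lines: rank syllables by frequency in each data set and pair them off by rank. Breaking ties alphabetically is an assumption made here so that the mapping is deterministic; the toy data in the usage lines are hypothetical.

```python
from collections import Counter


def ranked(seqs):
    """Syllables from most to least frequent; ties broken alphabetically."""
    counts = Counter(s for seq in seqs for s in seq)
    return [s for s, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


def frequency_rank_mapping(corpus, lm_data):
    """Map the k-th most frequent corpus syllable to the k-th most frequent LM syllable."""
    return dict(zip(ranked(corpus), ranked(lm_data)))


def apply_mapping(corpus, mapping):
    """Rewrite every syllable through the mapping (unmapped ones unchanged)."""
    return [[mapping.get(s, s) for s in seq] for seq in corpus]


if __name__ == "__main__":
    corpus = [["x", "y", "x"], ["z", "x", "y"]]           # hypothetical "script"
    lm_data = [["ta", "ka"], ["ta", "ro", "ta", "ka"]]    # hypothetical LM data
    mapping = frequency_rank_mapping(corpus, lm_data)
    print(mapping, apply_mapping(corpus, mapping))
```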