Title: Indus Script: Search for Grammar [1]
1 Indus Script: Search for Grammar [1]
- Nisha Yadav
- Tata Institute of Fundamental Research
- Collaborators: Mayank Vahia, Iravatham Mahadevan, Hrishikesh Joglekar
[1] Lecture given at a two-day seminar on "The Indus Script: Problems and Prospects", Chennai
2 Contents
- Indus Script - An Overview
- Various Approaches
- Our Approach
- Dataset
- Preliminary Analysis
- Analysis 1: Check against random order
- Analysis 2: Positional analysis of frequent sign combinations
- Text Beginners and Text Enders
- Segmentation of Indus Texts
- Summary
Note: In the lecture, unless specified otherwise, all text examples are from Mahadevan 1977 and all images are from Parpola's UNESCO volumes of Indus seals.
3 1) Indus Script: An Overview
4 Indus Valley Civilization
From Mahadevan, 1977
5 Indus Script: Pointers to understand
- The Indus script is one of the few scripts that defy decipherment.
- Inscriptions are found only on small objects like seals.
- The inscriptions are very brief; the average length is 4-5 signs.
- There are only 417 signs in the script as per Mahadevan's Concordance (1977).
- The script is pictographic, with signs showing humans, fish, etc.
- Signs are modified by joining or by adding strokes, and many signs appear to be combinations of other simple signs.
- The direction of the script is variable (mostly right to left, 83% of the time).
- In general the seals are 1 to 2 square inches in size.
- There are no bilingual texts to aid decipherment.
6 Direction indicators of the script
- Cramping or overflow of signs at the left end
- Orientation of asymmetric signs
- Sequence of frequent combinations of signs
- Split sequences
A split sequence indicating direction
7 Scale of a typical seal
For the most part, seals are between 1 inch and 2 inches square.
From Professor John C. Huntington's ppt
8 SEAL and SEAL IMPRESSION (two examples shown)
From Professor John C. Huntington's ppt
9 Specimens of Indus Texts on different objects
(Table columns: Text No., Text)
From Mahadevan, 1977
10 INDUS INSCRIPTIONS
Courtesy www.harappa.com
11 Indus Script Signs (1 to 110)
From Mahadevan, 1977
12 2) Various Approaches
13 Indus Script
- Scientists from a variety of disciplines have attempted to read the Indus script, with no clear answer.
- Various attempts so far include:
- I. Mahadevan's analytical work: creation of the first published Concordance (1977)
- Gift Siromoney's statistical work
- A. Parpola's comparison with Dravidian
- The Russian group's comparison with Dravidian
- Subbarayappa's interpretation as pure numerals
- S. R. Rao's interpretation as Vedic literature
- Others (Ref. Possehl, 1996)
14 3) Our Approach
15
- We make no assumption about the content or meaning of the script.
- Our first emphasis is to attempt to WRITE IN THE SCRIPT RATHER THAN READ.
- We search for rules of writing without assigning meanings or interpretations.
- We ignore variation due to the archaeological context of sites, stratigraphy and type of objects.
16 4) Dataset
17 Dataset
- An unambiguous data subset (EBUDS) was created for analysis of the grammar of Indus writing, from the original electronic dataset of Mahadevan (1977), partially modified as M80.
- EBUDS (Extended Basic Unique Dataset) excludes:
- All ambiguous lines
- All texts from sides having multiple lines
- All duplicates (keeping a single occurrence of each)
- Thus, EBUDS consists of 1548 lines of text, with 7000 sign occurrences.
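The three exclusion rules above can be sketched as a simple filter. This is only an illustration under stated assumptions: the record layout (`TextRecord`, with the hypothetical fields `ambiguous` and `lines_on_side`) and the toy sign numbers are invented here and do not reflect the actual format of the electronic concordance.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TextRecord:
    signs: tuple        # sign numbers in reading order (toy values below)
    ambiguous: bool     # hypothetical flag: the line has a doubtful reading
    lines_on_side: int  # hypothetical field: number of lines on the same side

def build_unique_subset(records):
    """Drop ambiguous lines and multi-line sides, then keep one copy of each text."""
    seen = set()
    subset = []
    for rec in records:
        if rec.ambiguous or rec.lines_on_side > 1:
            continue
        if rec.signs in seen:
            continue  # duplicate: the first occurrence is already kept
        seen.add(rec.signs)
        subset.append(rec.signs)
    return subset

# Toy records, not real Indus texts.
records = [
    TextRecord((1, 2, 3), False, 1),
    TextRecord((1, 2, 3), False, 1),   # duplicate of the first record
    TextRecord((4, 5), True, 1),       # ambiguous line
    TextRecord((6, 7), False, 2),      # side with multiple lines
    TextRecord((8, 9, 10), False, 1),
]
subset = build_unique_subset(records)
print(subset)
```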
18 5) Preliminary Analysis
19 Frequency distribution of Indus Signs
- Only 67 signs (16% of the total number of signs) account for over 80% of the writing.
20 Conclusions from Preliminary Analysis
- The frequency distribution of the signs in EBUDS is consistent with M77.
- The manner of choosing the data set has not changed the pattern of occurrence of the various signs, and the results are consistent with the analysis of M77.
- Only 67 signs (16% of the total number of signs) account for over 80% of the writing.
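A coverage figure of this kind can be computed by ranking signs by frequency and accumulating their counts until the target share is reached. The sketch below is a minimal illustration on toy data; the function name `signs_needed_for_coverage` and the letter "signs" are assumptions made for the example, not part of the original analysis.

```python
from collections import Counter

def signs_needed_for_coverage(texts, coverage=0.8):
    """Return (number of top-ranked signs needed, number of distinct signs)."""
    counts = Counter(sign for text in texts for sign in text)
    total = sum(counts.values())
    running = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        running += n
        if running >= coverage * total:
            return rank, len(counts)
    return len(counts), len(counts)

# Toy texts: one very frequent sign and a few rare ones.
toy_texts = [("a", "b", "a"), ("a", "c"), ("a", "b", "d"), ("a", "a", "a")]
needed, distinct = signs_needed_for_coverage(toy_texts)
print(needed, "of", distinct, "signs cover at least 80% of occurrences")
```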
21 6) Analysis 1: Check against Random Order
22 Methodology
- We take the 1548 unique texts (7000 signs) present in EBUDS.
- We randomise the order in which the signs appear, keeping the frequency of each sign as in EBUDS.
- We split this long random string (of 7000 signs) into texts of 1 to 14 signs, as in EBUDS.
- We create 10 such random databases.
- We then compare the frequency of their sign pairs, triplets etc. with the genuine Indus database (EBUDS) to check whether Indus texts have any significant sequencing.
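The methodology above can be sketched as follows. This is a minimal illustration, not the code used in the study: the helper names are invented, the toy corpus stands in for EBUDS, and the sketch assumes that "as in EBUDS" means each random text keeps the length of the corresponding genuine text.

```python
import random
from collections import Counter

def ngram_counts(texts, n):
    """Count every run of n adjacent signs within each text."""
    counts = Counter()
    for text in texts:
        for i in range(len(text) - n + 1):
            counts[tuple(text[i:i + n])] += 1
    return counts

def randomised_copy(texts, rng):
    """Shuffle all sign occurrences, then cut them into the original text lengths."""
    pool = [sign for text in texts for sign in text]
    rng.shuffle(pool)  # sign frequencies are unchanged, only the order is random
    out, start = [], 0
    for text in texts:
        out.append(pool[start:start + len(text)])
        start += len(text)
    return out

# Toy stand-in for the genuine corpus: the pair (1, 2) opens every text.
genuine = [[1, 2, filler] for filler in range(3, 23)]
rng = random.Random(0)
random_sets = [randomised_copy(genuine, rng) for _ in range(10)]

genuine_top = max(ngram_counts(genuine, 2).values())
random_tops = [max(ngram_counts(r, 2).values()) for r in random_sets]
print("most frequent pair, genuine:", genuine_top)
print("most frequent pair, random sets:", random_tops)
```

The same comparison can be repeated with `n=3` and `n=4` for triplets and quadruplets.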
23 Comparison of EBUDS with Random Datasets
24 Result of Analysis 1
25 Conclusions from Analysis 1
- Strings of 2, 3 and 4 signs appear with a frequency far higher than expected by random chance.
- The signs are ordered in a specific manner.
- It is therefore justifiable to state that Indus texts followed certain rules and hence conveyed something meaningful.
26 7) Analysis 2: Positional Analysis of Frequent Sign Combinations
27 Positional Analysis of Frequent Two-sign Combinations
28 Positional Analysis of Frequent Three-sign Combinations
29 Positional Analysis of Frequent Four-sign Combinations
30 Conclusions from Positional Analysis
- The most frequent two-sign, three-sign and four-sign combinations appear at fixed positions.
- The exact location varies from combination to combination.
- However, frequently occurring two-sign, three-sign and four-sign combinations may be incomplete, except of course when they occur as solo texts.
- Two-sign, three-sign and four-sign combinations that are complete typically have one of the text-enders (mostly sign 342 or sign 211) at the end. This is confirmed by the solo occurrences of such texts.
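A positional profile of a sign combination can be tallied as sketched below. The four position labels, the function name `position_profile` and the toy texts are assumptions for illustration; the sketch also assumes each text is stored in reading order, so that index 0 is the first sign read.

```python
from collections import Counter

def position_profile(texts, combo):
    """Tally where a sign combination occurs: solo, start, middle or end of a text."""
    combo = tuple(combo)
    n = len(combo)
    profile = Counter()
    for text in texts:
        text = tuple(text)
        for i in range(len(text) - n + 1):
            if text[i:i + n] != combo:
                continue
            if len(text) == n:
                profile["solo"] += 1      # the combination is the whole text
            elif i == 0:
                profile["start"] += 1
            elif i + n == len(text):
                profile["end"] += 1
            else:
                profile["middle"] += 1
    return profile

# Toy texts, not real Indus texts.
toy_texts = [
    ("p", "q"),
    ("a", "p", "q"),
    ("b", "c", "p", "q"),
    ("p", "q", "d"),
    ("a", "p", "q", "d"),
]
profile = position_profile(toy_texts, ("p", "q"))
print(dict(profile))
```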
31 8) Text Beginners and Text Enders
32 Indus Text Beginners and Enders
33
- Consider an Indus text with signs G F E D C B A
- (In order of their statistical significance)
Frequent Text Beginners
Frequent Text Enders
34 Specimens of Indus Texts illustrating syntactical patterns
From Mahadevan (1986)
35 Conclusions for Indus Script
- There are well-defined text-enders, though text-beginners are not as well defined.
- The distribution of signs within the strings seems to follow specific rules. The ordering is far more significant than would arise by chance.
- This indicates the existence of patterns and rules that remain to be worked out.
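One simple way to see whether enders are better defined than beginners is to count the first and last sign of every text and compare how concentrated each tally is. The sketch below uses the share of the single most frequent sign as a rough concentration measure; that measure, the function names and the toy texts are assumptions for illustration and are not the statistic used in the lecture.

```python
from collections import Counter

def beginners_and_enders(texts, min_length=2):
    """Count which signs open and close texts (texts assumed to be in reading order)."""
    beginners, enders = Counter(), Counter()
    for text in texts:
        if len(text) < min_length:
            continue  # a one-sign text is both beginner and ender, so skip it
        beginners[text[0]] += 1
        enders[text[-1]] += 1
    return beginners, enders

def top_share(counter):
    """Return the most frequent sign and its share of all counted positions."""
    if not counter:
        return None, 0.0
    sign, n = counter.most_common(1)[0]
    return sign, n / sum(counter.values())

# Toy texts: most end in "E", while the opening sign varies.
toy_texts = [
    ("a", "x", "E"),
    ("b", "y", "E"),
    ("c", "E"),
    ("d", "z", "E"),
    ("a", "w", "F"),
]
beginners, enders = beginners_and_enders(toy_texts)
top_beginner, beginner_share = top_share(beginners)
top_ender, ender_share = top_share(enders)
print(top_beginner, beginner_share, top_ender, ender_share)
```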
36 9) Segmentation of Indus Texts
37 Segmentation Approach
- Various methods can be used for segmenting an Indus text, namely:
- Comparing texts
- Using frequent combinations of signs
- Using pair frequencies
- Using single signs (Enders, Beginners, Auxiliary Enders)
- These methods overlap, so we select an approach that takes the effect of each of them into consideration.
- A cumulative method based on statistically significant units is thus formulated.
38 Segmentation using Comparison of Texts
A longer text can be shown to consist of two or more shorter texts that occur as complete texts elsewhere, which indicates the segment boundaries.
From Mahadevan (1986)
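Segmentation by comparison of texts can be sketched as a search for a way to tile a longer text with shorter texts attested elsewhere in the corpus. The dynamic-programming approach and the name `split_into_known_texts` are choices made for this illustration, and the letter "signs" are toy data.

```python
def split_into_known_texts(text, known_texts):
    """Return one split of text into attested shorter texts, or None if there is none."""
    text = tuple(text)
    # The text itself does not count as evidence for its own segmentation.
    known = {tuple(t) for t in known_texts} - {text}
    # best[i] holds one segmentation of text[:i], or None if none was found.
    best = [None] * (len(text) + 1)
    best[0] = []
    for end in range(1, len(text) + 1):
        for start in range(end):
            if best[start] is not None and text[start:end] in known:
                best[end] = best[start] + [text[start:end]]
                break
    return best[len(text)]

# Toy corpus of complete texts, not real Indus texts.
corpus = [("a", "b"), ("c", "d", "e"), ("f",)]
segments = split_into_known_texts(("a", "b", "c", "d", "e"), corpus)
print(segments)
```

A text for which no such tiling exists returns `None` and would need one of the other segmentation methods.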
39 Segmentation using Frequent Sign Combinations
A few highly frequent signs form stable combinations with other signs. These sign combinations can then be treated as separate segments.
From Mahadevan (1986)
40 Segmentation using Pair Frequencies
Comparing the frequencies of successive adjacent pairs reveals boundaries at the weakest junctions.
From Mahadevan (1986)
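The pair-frequency idea can be sketched by scoring every junction in a text with the corpus frequency of the adjacent pair that spans it, and cutting where that score is lowest. The function names and the toy corpus are assumptions for illustration; the sketch makes a single cut and, on ties, takes the first weakest junction.

```python
from collections import Counter

def pair_frequencies(texts):
    """Count every adjacent sign pair across the corpus."""
    counts = Counter()
    for text in texts:
        for left, right in zip(text, text[1:]):
            counts[(left, right)] += 1
    return counts

def split_at_weakest_junction(text, pair_counts):
    """Cut a text once, at the adjacent pair that is rarest in the corpus."""
    text = list(text)
    if len(text) < 3:
        return [text]  # too short to split into meaningful parts
    strengths = [pair_counts[(a, b)] for a, b in zip(text, text[1:])]
    cut = strengths.index(min(strengths)) + 1
    return [text[:cut], text[cut:]]

# Toy corpus: "a b" and "c d" are frequent pairs, "b c" is rare.
corpus = [("a", "b"), ("a", "b"), ("a", "b", "c", "d"), ("c", "d"), ("c", "d")]
pair_counts = pair_frequencies(corpus)
segments = split_at_weakest_junction(("a", "b", "c", "d"), pair_counts)
print(segments)
```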
41 Segmentation Process
Percent of texts split (for texts of 5 or more signs)
INDUS TEXT
- Look for pair, triplet and quad texts successively: 55% split
- Look for frequent 4, 3 and 2 sign combinations successively: 77% split
- Look for Enders, Beginners and Auxiliary Enders successively: 88% split
TEXT SEGMENTS
42 Segment Length vs. Segment Frequency in EBUDS before and after segmentation
43 EBUDS before and after segmentation
(Panels: EBUDS before Segmentation; EBUDS after Segmentation)
44 A Few Examples of Segmentation
45 Conclusions from Segmentation
- It is possible to segment 88% of Indus texts of length 5 and above into segments of length 4 and below, by using statistically significant signs and their combinations in addition to all the texts of length 2, 3 and 4.
- Many frequent sign combinations make their appearance as independent texts.
- After segmentation, the Indus texts can be viewed as permutations of identifiable units (segments) of 2, 3 or 4 signs.
- The identifiable units may or may not be standalone (or complete) pieces of information.
46 10) Summary
47 Summary
- The writing is highly ordered.
- The typical length of information-containing units is 2, 3 or at most 4 signs.
- However, they are not always complete enough to exist as standalone pieces of text.
- This suggests a more complex grammar in the writing, where information units need proper beginners or enders.
- The present study shows that Indus writing seems to have a specific ordering, as would be expected if sophisticated information is coded. This is consistent with the general level of sophistication associated with the Indus culture.
48 End