Quantitative approaches to language change

1
Quantitative approaches to language change
  • Søren Wichmann
  • MPI-EVA and Leiden University

2
Overview
  • 1. Automated lexicostatistics: tools and methods
  • 2. Automated lexicostatistics: results
  • 3. Using typological databases for historical
    linguistic research

3
Automated lexicostatistics: methods
  • Lexicostatistics was invented in the early 1950s
  • Recent renaissance due to two new developments:
  • Phylogenies can be established more meaningfully
    using modern computational methods developed by
    bioinformaticians
  • Subjective determinations of cognacy can be
    replaced by an objective, automated method

4
Traditional lexicostatistics
1st step: determine cognates on a standard list
Meaning  Cocopa  Diegeño  Cognate?
fire     a?á     ?aw      yes
nose     ixú     xú       yes
one      ?ít     ?axínk   no
Etc.     ...     ...      ...
5
2nd step: build a matrix of percent similarities
(invented example)
            Cocopa  Diegeño  Hualapai  Havasupai  ...
Cocopa         0      90       77        80
Diegeño       90       0       87        75
Hualapai      77      87        0        72
Havasupai     80      75       72         0
3rd step: find a graphic way of expressing the similarities
and interpret this as a phylogeny
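Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the function name and the cognate class labels "A"/"B" are made up here, and the data are just the three rows shown on the cognate slide, so the resulting percentage is not a real figure for these languages.

```python
from itertools import combinations

def similarity_matrix(cognate_classes):
    """Percent of shared meanings judged cognate, for every language pair.

    cognate_classes: language -> {meaning: cognate class label}.
    Two words count as cognate when they carry the same label.
    """
    sims = {}
    for a, b in combinations(cognate_classes, 2):
        shared = [m for m in cognate_classes[a] if m in cognate_classes[b]]
        same = sum(cognate_classes[a][m] == cognate_classes[b][m] for m in shared)
        sims[(a, b)] = 100.0 * same / len(shared)
    return sims

# Hypothetical encoding of the yes/yes/no judgments for fire, nose, one.
judgments = {
    "Cocopa":  {"fire": "A", "nose": "A", "one": "A"},
    "Diegeño": {"fire": "A", "nose": "A", "one": "B"},
}
sims = similarity_matrix(judgments)  # 2 of 3 items cognate
```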
6
Fragment of matrix of similarities among Salishan
languages from Swadesh (1950)
7
Salish relations, after Swadesh (1950)
8
UPGMA tree produced in SplitsTree (UPGMA =
Unweighted Pair Group Method with Arithmetic mean)
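For readers who want to see what UPGMA does, here is a minimal sketch of the textbook procedure. It is not SplitsTree's implementation. The input is the invented four-language matrix from the earlier slide, turned into distances as 100 minus the similarity, which is an assumption made for the example.

```python
from itertools import combinations

def upgma(labels, dist):
    """Cluster by UPGMA.

    dist maps frozenset({a, b}) -> distance between leaves a and b.
    Returns the merges as (left, right, height), where height is the
    distance from the new node down to each leaf below it.
    """
    sizes = {label: 1 for label in labels}  # cluster -> number of leaves
    d = dict(dist)
    merges = []
    while len(sizes) > 1:
        # Join the two clusters with the smallest average distance.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2
        na, nb = sizes.pop(a), sizes.pop(b)
        new = (a, b)
        for c in sizes:
            # Size-weighted mean = arithmetic mean over all leaf pairs.
            d[frozenset((new, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        sizes[new] = na + nb
        merges.append((a, b, height))
    return merges

labels = ["Cocopa", "Diegeño", "Hualapai", "Havasupai"]
similarity = {
    ("Cocopa", "Diegeño"): 90, ("Cocopa", "Hualapai"): 77,
    ("Cocopa", "Havasupai"): 80, ("Diegeño", "Hualapai"): 87,
    ("Diegeño", "Havasupai"): 75, ("Hualapai", "Havasupai"): 72,
}
dist = {frozenset(pair): 100 - s for pair, s in similarity.items()}
merges = upgma(labels, dist)
```

The first merge joins Cocopa and Diegeño at height 5, half of their distance of 10.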
10
Tools for producing a tree from a similarity
matrix
  • Convert the similarity matrix to a distance
    matrix using a spreadsheet such as Excel
  • Prepare an input file for your preferred
    phylogenetic software using an editor such as
    TextPad (free from www.textpad.com)
  • Run the data using phylogenetic software;
    SplitsTree can be recommended (free from
    www.splitstree.org)
  • Choose the most appropriate algorithm (Neighbour
    Joining recommended for distance data)
  • Prepare your tree for presentation using a
    tool such as the Tree Explorer of MEGA
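The first bullet, converting similarities to distances, can also be done in a script instead of a spreadsheet. The sketch below assumes the simplest conversion, distance = 100 minus percent similarity; the slides do not state which formula was used. The matrix is the invented four-language example.

```python
def to_distances(similarities, max_similarity=100.0):
    """Turn a square percent-similarity matrix into a distance matrix.

    Assumes distance = max_similarity - similarity, with 0 on the diagonal.
    """
    return [
        [0.0 if i == j else max_similarity - s for j, s in enumerate(row)]
        for i, row in enumerate(similarities)
    ]

# Invented example: Cocopa, Diegeño, Hualapai, Havasupai.
similarities = [
    [0, 90, 77, 80],
    [90, 0, 87, 75],
    [77, 87, 0, 72],
    [80, 75, 72, 0],
]
distances = to_distances(similarities)
```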

11
Preparing the input file
  • Look at the example files that come with
    SplitsTree and imitate them, for instance this one:

13
#nexus
BEGIN Taxa;
DIMENSIONS ntax=30;
TAXLABELS
BellaCoola
Comox
etc. ...
;
END;
BEGIN distances;
DIMENSIONS ntax=30;
FORMAT triangle=LOWER diagonal labels missing=?;
MATRIX
BellaCoola 0
Comox 80 0
etc. ...
;
END;
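Instead of typing the file by hand, the same layout can be generated. The helper below is a hypothetical sketch that mirrors the block structure of the example file; the exact punctuation should be checked against the example files shipped with SplitsTree. Labels are kept to plain ASCII as a precaution.

```python
def nexus_distances(labels, dist):
    """Return a NEXUS text with a Taxa block and a lower-triangular
    distances block (diagonal included), one labelled row per taxon."""
    n = len(labels)
    lines = ["#nexus", "BEGIN Taxa;", f"DIMENSIONS ntax={n};", "TAXLABELS"]
    lines += labels
    lines += [
        ";", "END;",
        "BEGIN distances;", f"DIMENSIONS ntax={n};",
        "FORMAT triangle=LOWER diagonal labels missing=?;",
        "MATRIX",
    ]
    for i, label in enumerate(labels):
        # Row i holds the distances to taxa 0..i, ending on the diagonal.
        row = " ".join(f"{dist[i][j]:g}" for j in range(i + 1))
        lines.append(f"{label} {row}")
    lines += [";", "END;"]
    return "\n".join(lines) + "\n"

# Distances taken as 100 minus the invented similarities (an assumption).
example_labels = ["Cocopa", "Diegueno", "Hualapai", "Havasupai"]
example_dist = [
    [0, 10, 23, 20],
    [10, 0, 13, 25],
    [23, 13, 0, 28],
    [20, 25, 28, 0],
]
nexus_text = nexus_distances(example_labels, example_dist)
```

The returned string can be written to a file with `open(path, "w").write(nexus_text)` and then opened in the phylogenetic software.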
14
Let's do this using TextPad
15
  • Now we produce a tree from the data
  • Let's do that using SplitsTree,
  • and let's look at different algorithms
  • and features of the program

16
Illustrating the difference between UPGMA and
Neighbour Joining
17
UPGMA assumes that all members of a cluster have
undergone the same amount of change
18
Neighbour Joining doesn't make this assumption
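A small made-up case shows the difference. Take a hypothetical tree ((A,B),(C,D)) in which B and D have changed four times as much as A and C (branch lengths 1, 4, 1, 4 and an internal branch of 1). Picking the smallest raw distance, as UPGMA does in its first step, joins A with C. The standard Neighbour Joining criterion, which corrects each distance by how far the two taxa are from everything else, joins the true neighbours. The names and numbers below are invented for the illustration.

```python
from itertools import combinations

def closest_pair(labels, d):
    """Pair with the smallest raw distance (UPGMA's first choice)."""
    return min(combinations(labels, 2), key=lambda p: d[frozenset(p)])

def nj_first_pair(labels, d):
    """Pair minimising the Neighbour Joining criterion
    Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b),
    where r(x) is the sum of distances from x to all other taxa."""
    n = len(labels)
    total = {
        x: sum(d[frozenset((x, y))] for y in labels if y != x) for x in labels
    }

    def q(pair):
        a, b = pair
        return (n - 2) * d[frozenset(pair)] - total[a] - total[b]

    return min(combinations(labels, 2), key=q)

taxa = ["A", "B", "C", "D"]
# Path lengths on the hypothetical tree described above.
tree_dist = {
    frozenset(("A", "B")): 5, frozenset(("C", "D")): 5,
    frozenset(("A", "C")): 3, frozenset(("A", "D")): 6,
    frozenset(("B", "C")): 6, frozenset(("B", "D")): 9,
}
raw_choice = closest_pair(taxa, tree_dist)   # ("A", "C")
nj_choice = nj_first_pair(taxa, tree_dist)   # ("A", "B")
```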
19
Comparing the two trees
20
  • Now we prepare our tree for presentation
  • Let's do that using MEGA

21
Automating the similarity measure
Levenshtein distance (LD): the minimum number of
steps (substitutions, insertions or deletions) that
it takes to get from one word to another
Germ. Zunge → Eng. tongue
  tsuŋə → tuŋə (substitution)
  tuŋə → tʌŋə (substitution)
  tʌŋə → tʌŋ (deletion)
Or tongue → Zunge
  tʌŋ → tʌŋə (insertion)
  tʌŋə → tuŋə (substitution)
  tuŋə → tsuŋə (substitution)
3 steps, so LD = 3
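The count can be checked with the standard dynamic-programming computation of Levenshtein distance. In the sketch below words are lists of segments, and the affricate ts is treated as a single segment so that ts → t counts as one substitution. That segmentation and the exact phonetic symbols are assumptions made for the example.

```python
def levenshtein(a, b):
    """Minimum number of substitutions, insertions and deletions
    needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]

zunge = ["ts", "u", "ŋ", "ə"]   # assumed segmentation of Germ. Zunge
tongue = ["t", "ʌ", "ŋ"]        # assumed segmentation of Eng. tongue
ld = levenshtein(zunge, tongue)  # 3
```

The function also accepts plain strings, in which case every character is one segment.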

22
  • There are more sophisticated versions where the
    phonetic distance between segments is taken into
    account, but operating with such fine distinctions
    only becomes relevant for minute dialectology.
  • People who have been using the more refined
    approach:
  • John Nerbonne and Johan Heeringa (dialectologists,
    Groningen)
  • Michael Cysouw's course
  • People who have been using raw LDs:
  • Serva and Petroni (physicists, Italy)
  • Myself and colleagues

23
Weighting Levenshtein distances
  • Serva and Petroni (2008): divide by the lengths of
    the strings compared. Takes into account that
    LDs grow with word length
  • Colleagues and I: divide by the length of the
    longest string compared and then divide by the
    average of LDs among words in Swadesh lists with
    different meanings. Takes into account typical
    word lengths of the languages compared and
    accidental similarity due to similarities in
    phonological inventories
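One possible reading of the second weighting scheme, as a sketch: normalise each LD by the length of the longer word, average this over word pairs with the same meaning, and divide by the same average taken over pairs with different meanings. The function names and the toy word lists are invented, and the slide leaves details open (for instance whether the division is done per word pair or per language pair), so this should not be taken as the exact published procedure.

```python
def levenshtein(a, b):
    """Standard Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]

def normalised_ld(a, b):
    """LD divided by the length of the longer word."""
    return levenshtein(a, b) / max(len(a), len(b))

def doubly_normalised_ld(list_a, list_b):
    """Mean normalised LD for same-meaning pairs, divided by the mean
    normalised LD for different-meaning pairs (the chance baseline).
    Each list maps meaning -> word; only shared meanings are used.
    Needs at least two shared meanings and a non-zero baseline."""
    meanings = sorted(set(list_a) & set(list_b))
    same = [normalised_ld(list_a[m], list_b[m]) for m in meanings]
    different = [
        normalised_ld(list_a[m], list_b[n])
        for m in meanings for n in meanings if m != n
    ]
    return (sum(same) / len(same)) / (sum(different) / len(different))

# Made-up word lists, not real language data.
toy_a = {"fire": "abc", "nose": "xyz"}
toy_b = {"fire": "abd", "nose": "xyz"}
toy_distance = doubly_normalised_ld(toy_a, toy_b)
```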
24
Comparing results for a test set: Mixe-Zoquean
languages (Mexico)
Tree based on shared phonological innovations
(data from Wichmann 1995)
Tree based on automated lexicostatistics (using
Levenshtein distances)
25
So results are similar
  • Disadvantages of the automated method:
  • blind to anything but lexical evidence
  • not always accurate
  • has a shallower limit of application than the
    comparative method
  • Advantages:
  • extremely quick
  • consistent and objective
  • provides information on the amount of change,
    and therefore a time perspective

27
So why not apply the automated method to all the
world's languages and see what happens?
  • Tomorrow: the Automated Similarity Judgment
    Program, recent history and state of the art