Title: Typology: Language Sampling
1 Typology: Language Sampling
Anna Siewierska
Dik Bakker
2 Empirical Cycle
[Figure: the empirical cycle. Labels: PROVISIONAL; Definition; Categories C1 C3; Hypotheses; DATA; TEST]
7 Overview
- Collecting language data
- Why a sample?
- Types of biases in samples
- Two strategies
- Samples in typological literature
- The DV method
9 Data collecting
Languages of the world: n ≈ 7000
SAMPLE (50-500)
11 Data collecting
- Why not all languages in our database?
- Too many
- Only <1000 well described (grammar)
- <2000 partial sketch
- Not (always) necessary
- Sometimes even wrong
- Impossible even in principle
12 All Languages impossible
- Extant languages: 7000
- Extinct languages: 500 (Ruhlen 1991)
- Latin, Cl. Greek, Gothic, Hebrew, Hittite, ...
- Cl. Turkic, Cl. Tibetan, Archaic Chinese, ...
- Manx, Cornish, ...
- Problem? No native speaker intuitions
- Illinois, Mohican, Massachusett, Carolina, ...
- X1, X2, X3, ..., Xn
21 All Languages impossible
- Extant languages: 7000
- Extinct languages: 500 (Ruhlen 1991)
- X1, X2, X3, ..., Xn: ????
22 All Languages impossible
- Homo sapiens: 200,000 BP
- Great Leap Forward: 40,000 BP
- Average n of lgs: 6000
- Diachronic change: 1000 years (assumed lifespan of a language)
- X lgs = (40,000 / 1000) × 6000 = 240,000
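The back-of-the-envelope estimate on this slide can be reproduced in a few lines of Python. This is only a sketch: the constant names are ours, and the figures are the slide's rough working assumptions rather than measured values.

```python
# Rough estimate of the number of undocumented extinct languages (X1 ... Xn).
# All figures are the slide's working assumptions, not measurements.
YEARS_OF_LANGUAGE = 40_000      # since the "Great Leap Forward" (BP)
YEARS_PER_REPLACEMENT = 1_000   # assumed time for a language to become a new one
AVERAGE_LANGUAGES = 6_000       # assumed average number of languages at any time

# Number of successive "generations" of languages in the period.
generations = YEARS_OF_LANGUAGE // YEARS_PER_REPLACEMENT

# Each generation is assumed to contain the average number of languages.
unknown_languages = generations * AVERAGE_LANGUAGES
print(unknown_languages)  # 240000
```

Changing any of the three constants shows how sensitive the 240,000 figure is to the assumptions behind it.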
23 All Languages impossible
- Extant languages: 7000
- Extinct languages: 500
- X1, X2, X3, ..., Xn: 240,000
- Human languages: 247,500
- Extant + extinct: 3.0% of all human languages
25 All Languages impossible
- Extant, documented: 1500
- Extinct languages: 500
- X1, X2, X3, ..., Xn: 240,000
- Human languages: 247,500
- 0.6% of all human languages, spoken anno 2000
- Typology: Universals of Human Language
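The figures 3.0 and 0.6 on these slides read naturally as percentages of the estimated total of 247,500 human languages. The sketch below checks that reading; the function and variable names are ours, and the inputs are the slides' estimates.

```python
# Share of all (estimated) human languages available to typologists.
EXTANT = 7_000             # extant languages
EXTINCT_KNOWN = 500        # known extinct languages (Ruhlen 1991)
UNKNOWN = 240_000          # estimated undocumented extinct languages
DOCUMENTED_EXTANT = 1_500  # extant languages with usable documentation

total = EXTANT + EXTINCT_KNOWN + UNKNOWN


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal."""
    return round(100 * part / whole, 1)


known_share = share(EXTANT + EXTINCT_KNOWN, total)
documented_share = share(DOCUMENTED_EXTANT, total)
print(total, known_share, documented_share)  # 247500 3.0 0.6
```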
30 All Languages impossible
- Extant, documented: 1500
- Extinct, documented: <100
- X1, X2, X3, ..., Xn: 240,000
- Human languages: 247,500
- From languages spoken anno 2000 to Human Language: Uniformitarianism (Lass 1997)
34 All Languages impossible
- Extant, documented: 1500
- Extinct languages: 500
- X1, X2, X3, ..., Xn: 240,000
- Human languages: 247,500
- 0.6% of all human languages, spoken anno 2000
- Typology: Variety among human languages
36 Variety: rare types
- Clicks (only in one family: Khoisan, 30 lgs)
- Active nominal marking (Pomo, Laz)
- Opposite person hierarchy Acc-Erg (Tib.-Burm.)
- Tripartite agreement on ditransitives
- Syntactic ergativity (Aus, Maya)
- Adverbial agreement with focal (Aus, Cauc)
- OSV main clause order (S.Am)
- N.B. combination of (rare) features (cf. Greenberg)
- → Rara et Rarissima
39 Data collecting
- Why not all languages in our database?
- Too many
- Only <1000 well described (grammar)
- <2000 partial sketch
- Not (always) necessary
- Sometimes even wrong
- Impossible even in principle
- Problematic for variety
- Possibly not for universality
42 Too many languages
Samples in the typological literature
- Greenberg (1963), Word order: 30
- Hawkins (1983), Word order: 225
- Tomlin (1986), Word order: 402
- Nichols (1992), Head/Dep marking: 174
- Bybee (1994), Tense/Aspect/Mood: 76
- Siewierska & Bakker (1990-), Pers.Agr.: 450
- Dryer (1985-), Word order: 1200
- Typical PhD project (1 person, 3 years): 50-100
43 Data collecting
- Why not all languages in our database?
- Too many → sample inevitable
- Only <1000 well described (grammar)
- <2000 partial sketch
- Not (always) necessary
- Sometimes even wrong
47 Lack of material
- Bibliographic bias
- (very) old
- scarce
- theory specific (Tagmemics, GG)
- restricted to phonology and morphology
- biased selection of the world's languages
48 Lack of material
- Further types of bias
- Genetic
- Indo-European, Ugric, Bantu
- Australian, Amerindian, Papuan
- ...
51 Lack of material
- Further types of bias
- Genetic
- Areal
- Sprachbund: Balkan
- Circum-Baltic
- C. America
- S.E. Asia
- ...
53 Lack of material
- Further types of bias
- Genetic
- Areal
- Typological
- Parametric variables (Hawkins 1983)
- Adposition: Prep → Dem, Num, Adj, Gen, Rel, N (NP)
- PrepNounModHierarchy:
- Prep → ((NDem OR NNum → NA) AND (NA → NGen) AND (NGen → NRel))
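The implicational hierarchy on this slide can be checked mechanically for any language whose noun-modifier orders are known. The sketch below assumes that "NDem OR NNum → NA" is scoped as (NDem OR NNum) → NA; the function names and the three feature sets are hypothetical illustrations, not data from Hawkins (1983).

```python
def implies(p: bool, q: bool) -> bool:
    """Material implication: p -> q."""
    return (not p) or q


def satisfies_prnmh(prep: bool, n_dem: bool, n_num: bool,
                    n_a: bool, n_gen: bool, n_rel: bool) -> bool:
    """Check Prep -> ((NDem or NNum -> NA) and (NA -> NGen) and (NGen -> NRel)).

    Each flag is True when the noun precedes the modifier in question
    (e.g. n_gen=True means noun-genitive order).
    """
    consequent = (
        implies(n_dem or n_num, n_a)
        and implies(n_a, n_gen)
        and implies(n_gen, n_rel)
    )
    return implies(prep, consequent)


# Hypothetical feature sets, for illustration only.
consistent = satisfies_prnmh(prep=True, n_dem=False, n_num=False,
                             n_a=True, n_gen=True, n_rel=True)
violating = satisfies_prnmh(prep=True, n_dem=True, n_num=False,
                            n_a=False, n_gen=True, n_rel=True)
postpositional = satisfies_prnmh(prep=False, n_dem=True, n_num=True,
                                 n_a=False, n_gen=False, n_rel=False)
print(consistent, violating, postpositional)  # True False True
```

A postpositional language satisfies the statement trivially, because the hierarchy only constrains prepositional languages.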
60 Lack of material
- Further types of bias
- Genetic
- Areal
- Typological
- Cultural
- Linguistic relativity (Sapir-Whorf)
- Lucy (1992): count nouns vs classifiers, counting tasks
63 Lack of material
- Further types of bias
- Genetic
- Areal
- Typological
- Cultural
- Community size
- Small → high genetic drift (Kimura 1983)
- Also linguistic drift? (Dahl: hunter/gatherer)
- N.B. OSV/OVS only in < 3000 languages
67 Lack of material
- Further types of bias
- Genetic
- Areal
- Typological
- Cultural
- Community size
- Language contact
- Borrowed phenomenon measured twice
- BUT contact may also create new types
71 Data collecting
- Why not all languages in our database?
- Only <1000 well described (grammar)
- <2000 partial sketch
- (bibliographical bias)
- Cater for biases by stratifying for the relevant dimensions
- Not (always) necessary
- Sometimes even wrong
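Stratifying for a relevant dimension can be sketched as follows: group the candidate languages by the dimension (here, family) and draw from every group. This is a generic stratified draw under our own assumptions, not the procedure of any particular study. The function name is ours, and the language list simply reuses family and language labels that appear elsewhere in these slides.

```python
import random
from collections import defaultdict


def stratified_sample(languages, key, per_stratum, seed=0):
    """Pick up to `per_stratum` languages from each stratum defined by `key`."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for lang in languages:
        strata[key(lang)].append(lang)
    sample = []
    for stratum in sorted(strata):
        members = strata[stratum]
        # Never ask for more languages than the stratum contains.
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


# (language, family) pairs; illustrative only.
LANGUAGES = [
    ("Hebrew", "Afro-Asiatic"), ("Arabic", "Afro-Asiatic"),
    ("Quechua", "Amerindian"), ("Guarani", "Amerindian"),
    ("Georgian", "Caucasian"), ("Chechen", "Caucasian"),
    ("Kannada", "Dravidian"), ("Tamil", "Dravidian"),
]

sample = stratified_sample(LANGUAGES, key=lambda lang: lang[1], per_stratum=1)
print(sample)
```

To stratify for several dimensions at once, `key` can return a tuple, for example (family, area).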
74 Small is beautiful
- A good sample may be better than a large sample
- Sample type and size depend on the goal of the project
- Establish the probability of a language type (e.g. prepositional vs postpositional) → Probability sample
- Explore the existing variety on a certain dimension (e.g. case systems, combination of order patterns) → Variety sample
75 Small is beautiful
- 1. Probability sample
- Only independent cases
- Control for
- - genetic relations
- - language contact
- But relative stability of relevant variables
- - Reflexive passive (Romance vs Slavic)
76 Small is beautiful
Samples in the typological literature
- Greenberg (1963), Word order: 30
- Hawkins (1983), Word order: 225
- Tomlin (1986), Word order: 402
- Nichols (1992), Head/Dep marking: 174
- Bybee (1994), Tense/Aspect/Mood: 76
- Siewierska & Bakker (1990-), Pers.Agr.: 450
- Dryer (1985-), Word order: 1200
probab
77 Large may be better
- 2. Variety sample
- Maximum (all?) different cases
- Cater for
- - variation in genetic/areal groups
- - typically cyclical
- - stop when no new cases found
- Research parameters typically unknown!
78 Probability vs Variety
- Probability sample
- relatively small (30-150)
- may be too large (double cases)
- Variety sample
- relatively large (> 200)
- cannot be too large (just superfluous)
79 Sampling in the literature
Introductions to Typology
- Comrie (1981): pp. 9-12 (4 pages)
- Croft (1990): pp. 18-26 (9 pages)
- Whaley (1997): pp. 36-43 (8 pages)
- Song (2001): pp. 17-38 (22 pages)
80 Probability sampling
- Bell (1978)
- genetic, areal and typological bias
- 478 genetic groups (> 3000 year depth)
- per family: n of lgs proportional to n of groups
- problems
- sample < 478: selection
- small families disappear
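The problem that small families disappear follows directly from proportional allocation with rounding. The sketch below illustrates it with hypothetical group counts (families A to D are invented, not Bell's data), and the function name is ours.

```python
def proportional_allocation(groups_per_family: dict[str, int],
                            sample_size: int) -> dict[str, int]:
    """Give each family a share of the sample proportional to its number of groups."""
    total_groups = sum(groups_per_family.values())
    return {
        family: round(sample_size * n_groups / total_groups)
        for family, n_groups in groups_per_family.items()
    }


# Hypothetical numbers of genetic groups per family.
GROUPS = {"A": 60, "B": 30, "C": 9, "D": 1}

allocation = proportional_allocation(GROUPS, sample_size=20)
print(allocation)  # {'A': 12, 'B': 6, 'C': 2, 'D': 0}
```

Family D, with a single group, receives no language at all once the sample is smaller than the number of groups. Plain rounding also does not guarantee that the shares add up to the sample size in general, which a real procedure would have to handle.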
81 Probability sampling
- Perkins (1980)
- Bell, stratified for culture (Murdock 1967)
- 50 languages with optimal genetic and cultural distance
- good for probability, too small for variety
82 Probability sampling
- Dryer (1989)
- Bell, but
- 322 established genera, 3500-4000 years deep
- variable values established per genus, not per language (mainly stable, else the most frequent)
- 5 macro-areas, counting genera per area
83 Probability sampling
→ SOV > SVO
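Counting genera rather than languages, per area, can be sketched as below. The records are hypothetical and use only two areas for brevity, whereas the method described on the slides uses five macro-areas. The function names are ours, and the rule "take the most frequent value in the genus" follows the slide's wording.

```python
from collections import Counter, defaultdict


def genus_values(records):
    """Map (area, genus) to the most frequent value among its languages."""
    by_genus = defaultdict(Counter)
    for area, genus, value in records:
        by_genus[(area, genus)][value] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in by_genus.items()}


def genera_per_area(records):
    """Count, for each area, how many genera show each value."""
    counts = defaultdict(Counter)
    for (area, _genus), value in genus_values(records).items():
        counts[area][value] += 1
    return counts


# Hypothetical (area, genus, basic word order) records, one per language.
RECORDS = [
    ("Africa", "g1", "SOV"), ("Africa", "g1", "SOV"), ("Africa", "g1", "SVO"),
    ("Africa", "g2", "SVO"),
    ("Africa", "g3", "SOV"),
    ("Eurasia", "g4", "SOV"),
    ("Eurasia", "g5", "SOV"),
    ("Eurasia", "g6", "SVO"),
]

counts = genera_per_area(RECORDS)
# A preference counts only if it holds in every area.
sov_preferred = all(c["SOV"] > c["SVO"] for c in counts.values())
print(dict(counts["Africa"]), dict(counts["Eurasia"]), sov_preferred)
```

Genus g1 contributes a single SOV data point even though it contains three languages, which is the point of counting genera instead of languages.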
84 Probability sampling
- Good for universal preferences on stable variables
- Unclear how to generalize to other types of sampling, with languages central
85 Variety sampling
Characteristics
- Create variety samples of any size
- Free choice of classification used (Gen/Ar/Typ)
- Stratification on other parameter (Gen Ar/Typ)
- Generate new samples, evaluate existing samples
- Fully formalized and computer implemented
86 Variety sampling
- Central idea
- classifications express linguistic (dis)similarities between languages
- established on the basis of expert knowledge
- subject to cyclical improvement and refinement
- best starting point for explorative research into variation among languages
87 Variety sampling
[Figure: four example families with two languages each: Afro-Asiatic (HBR, ARB), Amerindian (QUE, GUA), Caucasian (GEO, CHE), Dravidian (KAN, TAM)]
- Minimum sample: 1 language per family
- Select language with the best description (for the purpose)
- Includes all ISOLATES: Basque, Burushaski, Ket, Nahali, ...
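The minimum sample (one language per family, choosing the best-described one, with isolates counting as one-member families) can be sketched as follows. The description scores are invented for illustration, and the function name is ours. The language codes are the ones shown on the slides.

```python
def basic_sample(languages):
    """Return one language per family: the one with the highest description score."""
    best = {}
    for name, family, score in languages:
        if family not in best or score > best[family][1]:
            best[family] = (name, score)
    return {family: name for family, (name, _score) in best.items()}


# (language, family, description score); scores are hypothetical.
CANDIDATES = [
    ("HBR", "Afro-Asiatic", 3), ("ARB", "Afro-Asiatic", 5),
    ("QUE", "Amerindian", 4), ("GUA", "Amerindian", 2),
    ("GEO", "Caucasian", 5), ("CHE", "Caucasian", 3),
    ("KAN", "Dravidian", 2), ("TAM", "Dravidian", 4),
    ("Basque", "Basque", 5),  # an isolate forms its own family
]

chosen = basic_sample(CANDIDATES)
print(chosen)
```

What counts as the best description depends on the research question, so in practice the score would be assigned per project.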
93 Variety sampling
[Figure: example families Afro-Asiatic, Amerindian, Caucasian, Dravidian]
- Minimum sample: 1 language per family
- Ruhlen (1991): 27
- Ethnologue (2005): 120
- Basic Sample
- Murdock (1967): 50
94 Variety sampling
[Figure: example family trees (Afro-Asiatic, Amerindian, Caucasian, Dravidian) labelled DV3, DV6, DV2]
- Extending the Basic Sample to preferred size
- e.g. extend Ruhlen-based sample from 27 → 50
- KEY: relative complexity of family tree
95 Variety sampling
[Figure: example family trees (Afro-Asiatic, Amerindian, Caucasian, Dravidian) labelled DV3, DV6, DV2]
- Adjusting DV values to full tree structure
- Recursively down the trees
- Lower levels contribute relatively less to DV
96 Variety sampling
[Figure: example family trees (Afro-Asiatic, Amerindian, Caucasian, Dravidian); labels 3, 6, 2 and DV55.5, DV178.4, DV8.5]
Formula for weight per level:
$$C_k = C_{k-1} + (N_k - N_{k-1}) \cdot \frac{MAX - (k-1)}{MAX}$$
See Rijkhoff & Bakker (1998)
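One plausible reading of the weight formula on this slide is C_k = C_(k-1) + (N_k - N_(k-1)) × (MAX - (k-1)) / MAX, where N_k is the number of nodes at level k of a family tree and MAX is the maximum depth considered. The sketch below computes the per-level values under that reading. The starting values C_0 = 0 and N_0 = 0 are our assumption, as are the function name and the example tree. How the C_k values are combined into the final DV is not shown on the slide, so it is not computed here; see Rijkhoff & Bakker (1998).

```python
def level_contributions(nodes_per_level, max_depth):
    """Return [C_1, C_2, ...] for a tree with the given node counts per level.

    Applies C_k = C_{k-1} + (N_k - N_{k-1}) * (MAX - (k-1)) / MAX,
    starting from C_0 = 0 and N_0 = 0 (an assumption of this sketch).
    """
    contributions = []
    prev_c, prev_n = 0.0, 0
    for k, n_k in enumerate(nodes_per_level, start=1):
        # Deeper levels get a smaller weight, so they add relatively less.
        weight = (max_depth - (k - 1)) / max_depth
        c_k = prev_c + (n_k - prev_n) * weight
        contributions.append(c_k)
        prev_c, prev_n = c_k, n_k
    return contributions


# Hypothetical tree: 3 nodes at level 1, 5 at level 2, 9 at level 3; MAX = 4.
values = level_contributions([3, 5, 9], max_depth=4)
print(values)  # [3.0, 4.5, 6.5]
```

At level 1 the weight is 1, so the first value equals the number of top-level branches. The four new nodes at level 3 add only 2.0, in line with the remark that lower levels contribute relatively less.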
99 Variety sampling
[Figure: example family trees (Afro-Asiatic, Amerindian, Caucasian, Dravidian) labelled DV55.5, DV178.4, DV8.5]
- Computer program
- Number of lgs per family given sample size
[Figure: RUHLEN (1991); values shown: 5.9, 3.3, 6.1]
- Optimal distribution over subbranches (maximum distance → maximum variety)
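The step "number of lgs per family given sample size" can be approximated as follows. The slides do not spell out the program's exact rule, so this sketch makes two assumptions of its own: every family first receives one language, and the remaining slots are shared out in proportion to DV using largest remainders. The function name is ours. The DV values are those shown for four families in the program output later in the deck, and the sample size of 10 is arbitrary.

```python
def allocate_by_dv(dv: dict[str, float], sample_size: int) -> dict[str, int]:
    """Give every family 1 language, then share the rest in proportion to DV."""
    if sample_size < len(dv):
        raise ValueError("sample size is smaller than the number of families")
    allocation = {family: 1 for family in dv}
    remaining = sample_size - len(dv)
    total_dv = sum(dv.values())
    quotas = {family: remaining * value / total_dv for family, value in dv.items()}
    # Whole-number part of each quota first.
    for family, quota in quotas.items():
        allocation[family] += int(quota)
    # Hand out the slots that are left to the largest fractional remainders.
    leftover = sample_size - sum(allocation.values())
    by_remainder = sorted(quotas, key=lambda f: quotas[f] - int(quotas[f]), reverse=True)
    for family in by_remainder[:leftover]:
        allocation[family] += 1
    return allocation


DV = {"Afro-Asiatic": 55.53, "Amerind": 178.44, "Na-Dene": 9.44, "Basque": 1.00}

allocation = allocate_by_dv(DV, sample_size=10)
print(allocation)  # {'Afro-Asiatic': 2, 'Amerind': 6, 'Na-Dene': 1, 'Basque': 1}
```

The same function could be applied recursively to the daughters of each family to distribute its share over subbranches, given DV values for the subtrees.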
111 Variety sampling
[Figure: tree for Amerind (51 / 854), zooming in on Andean (3 / 30), with branches NORTH, SOUTH, AYMA, QUECH, CAHUA, URA]
115 Variety sampling output
Typical output
Classification: Ruhlen91
Criterion 1: Diversity Value dynamic/global/average
Sample size: 100 (1.90% of 5273)
Afro-Asiatic (55.53/6/258)         6
Altaic (15.07/2/62)                2
Amerind (178.44/6/854)            18
Australian (67.58/30/262)          7
Na-Dene (9.44/2/41)                1
Niger-Kordofanian (90.38/2/1068)   9
Basque (1.00/0/0)                  1
Etruscan (1.00/0/0)                1
Gilyak (1.00/0/0)                  1
122 Variety sampling output
Classification: Ruhlen91
Criterion 1: Diversity Value dynamic/global/average
Sample size: 100 (1.90% of 5273)
Niger-Kordofanian (2/1068)                   9
  Niger-Congo (2/1036)                       8
    Niger-Congo Proper (2/1007)              7
      Central Niger-Congo (2/961)            6
        South Central Niger-Congo (3/755)    3
          Eastern (9/703)                    1
          Western (2/47)                     1
          Ijo-Defaka (2/5)                   1
        North Central Niger-Congo (4/206)    3
      West Atlantic (3/46)                   1
    Mande (3/29)                             1
  Kordofanian (2/32)                         1
130 Variety sampling
Side effect of large (variety) sample: hidden diachrony
131 Variety sampling
- Problems
- works only on tree-shaped classifications
- time depth in genetic trees unbalanced
- not good for probability samples
- Creoles? Extinct languages?
132 Round off
Two Sample Strategies
1. Probability sample
- relatively small
- control for Gen/Ar/Typ bias
2. Variety sample
- relatively large
- may be stratified for bias parameters
- may have diachronic dimension
136 Round off
Sample Types
1. Probability sample
2. Variety sample
3. Random sample: when bias is unimportant
4. Convenience sample: when bibliographical constraints kick in ...
138 ?