Inferring human demographic history from DNA sequence data

1
Inferring human demographic history from DNA
sequence data
  • Apr. 28, 2009
  • J. Wall
  • Institute for Human Genetics, UCSF

2
Standard model of human evolution
3
Standard model of human evolution (Origin and
spread of genus Homo)
2–2.5 Mya
4
Standard model of human evolution (Origin and
spread of genus Homo)
1.6–1.8 Mya
5
Standard model of human evolution (Origin and
spread of genus Homo)
0.8–1.0 Mya
6
Standard model of human evolution: Origin and
spread of modern humans
150–200 Kya
7
Standard model of human evolution: Origin and
spread of modern humans
100 Kya
8
Standard model of human evolution: Origin and
spread of modern humans
40–60 Kya
9
Standard model of human evolution: Origin and
spread of modern humans
15–30 Kya
10
Estimating demographic parameters
  • How can we turn this qualitative scenario into
    an explicit, quantitative model?
  • How can we choose a model that is both
    biologically feasible and computationally
    tractable?
  • How do we estimate parameters and quantify
    uncertainty in parameter estimates?

11
Estimating demographic parameters
  • Calculating full likelihoods (under realistic
    models including recombination) is
    computationally infeasible
  • So, compromises need to be made if one is
    interested in parameter estimation

12
African populations
10 populations, 229 individuals
13
African populations
Mandenka (Bantu), Biaka (Pygmies), San (Bushmen)
61 autosomal loci, 350 Kb of sequence data
14
A simple model of African population history
[Figure: two-population model for Mandenka and Biaka (or San), with parameters T, g1, g2 and m]
15
Estimation method
  • We use a composite-likelihood method (cf. Plagnol
    and Wall 2006) that uses information from the
    joint frequency spectrum, such as:
  • Numbers of segregating sites
  • Numbers of shared and fixed differences
  • Tajima's D
  • FST
  • Fu and Li's D
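
As an illustration of two of the statistics listed above, here is a minimal sketch that computes the number of segregating sites, the mean pairwise difference, and Tajima's D from aligned sequences. It uses the standard Tajima (1989) constants. The function name `tajimas_d` and the toy alignment are made up for this example and are not taken from the talk.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(seqs):
    """Return (S, pi, D) for a list of equal-length aligned sequences."""
    n = len(seqs)
    # S: number of alignment columns with more than one allele.
    seg = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    # pi: mean number of differences over all pairs of sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    if seg == 0:
        return seg, pi, 0.0  # D is undefined without variation; report 0 here
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    d = (pi - seg / a1) / sqrt(e1 * seg + e2 * seg * (seg - 1))
    return seg, pi, d

# Toy alignment (invented for illustration only).
toy = ["AAAA", "AAAT", "AATT", "ATTT"]
seg_sites, mean_diff, d_stat = tajimas_d(toy)
```

In the talk's setting these values would be computed per locus and per population, alongside the joint statistics between populations.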

20
Estimating likelihoods
[Figure: polymorphisms in Pop1 and Pop2, divided into Pop 1 private polymorphisms, Pop 2 private polymorphisms, and shared polymorphisms]
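
The partition of polymorphisms between two populations can be sketched as follows. This is a minimal illustration that assumes biallelic sites and aligned sequences of equal length. The function name `classify_sites` and the toy data are hypothetical, and fixed differences are counted as well because they appear in the list of summary statistics.

```python
def classify_sites(pop1, pop2):
    """Count private, shared and fixed sites for two aligned samples."""
    counts = {"private1": 0, "private2": 0, "shared": 0, "fixed": 0}
    for col1, col2 in zip(zip(*pop1), zip(*pop2)):
        alleles1, alleles2 = set(col1), set(col2)
        poly1, poly2 = len(alleles1) > 1, len(alleles2) > 1
        if poly1 and poly2:
            counts["shared"] += 1      # segregating in both populations
        elif poly1:
            counts["private1"] += 1    # segregating only in population 1
        elif poly2:
            counts["private2"] += 1    # segregating only in population 2
        elif alleles1 != alleles2:
            counts["fixed"] += 1       # each population fixed for a different allele
    return counts

# Toy samples (invented): two sequences per population, five sites.
toy_pop1 = ["AAAAC", "ATAAG"]
toy_pop2 = ["AAGTC", "AAATG"]
site_counts = classify_sites(toy_pop1, toy_pop2)
```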
22
Estimating likelihoods
  • We assume these other statistics are multivariate
    normal.
  • Then, we run simulations to estimate the means
    and the covariance matrix.
  • This accounts (in a crude way) for dependencies
    across different summary statistics.
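
A minimal sketch of this step, in plain Python: estimate the mean vector and covariance matrix from simulated summary statistics, then evaluate the multivariate normal log-density at the observed values. The helper names (`mean_and_cov`, `cholesky`, `mvn_logpdf`) are made up for this example, and the simulations themselves are assumed to come from a coalescent simulator that is not shown.

```python
import math

def mean_and_cov(sims):
    """Sample mean vector and covariance matrix of simulated statistics."""
    n, k = len(sims), len(sims[0])
    mean = [sum(row[j] for row in sims) / n for j in range(k)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in sims) / (n - 1)
            for j in range(k)] for i in range(k)]
    return mean, cov

def cholesky(a):
    """Lower-triangular factor of a symmetric positive-definite matrix."""
    k = len(a)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def mvn_logpdf(x, mean, cov):
    """Log-density of a multivariate normal at x."""
    k = len(x)
    low = cholesky(cov)
    # Forward substitution: solve low * z = (x - mean).
    z = []
    for i in range(k):
        s = x[i] - mean[i] - sum(low[i][m] * z[m] for m in range(i))
        z.append(s / low[i][i])
    log_det = 2 * sum(math.log(low[i][i]) for i in range(k))
    return -0.5 * (k * math.log(2 * math.pi) + log_det + sum(v * v for v in z))
```

For one parameter combination, `mean_and_cov` would be applied to the statistics from many simulated data sets, and `mvn_logpdf` evaluated at the observed statistics gives that part of the log-likelihood.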

23
Composite likelihood
  • We form a composite likelihood by assuming these
    two classes of summary statistics are independent
    from each other
  • We estimate the (composite)-likelihood over a
    grid of values of g1, g2, T and M and tabulate
    the MLE.
  • We also use standard asymptotic assumptions to
    estimate confidence intervals
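
The grid search and interval step can be sketched as below. Two assumptions are made here that the slides do not spell out: the interval is read off a profile log-likelihood, and the cutoff is a drop of 1.92 log-likelihood units (half the 95% point of a chi-square with 1 degree of freedom). `toy_loglik` is an invented stand-in for the composite log-likelihood, and its numbers are not the talk's results.

```python
from itertools import product

def grid_mle(loglik, grid):
    """Evaluate loglik on every grid point; return (best params, best value, table)."""
    names = list(grid)
    table = {}
    for values in product(*(grid[name] for name in names)):
        table[values] = loglik(dict(zip(names, values)))
    best = max(table, key=table.get)
    return dict(zip(names, best)), table[best], table

def profile_interval(table, names, param, drop=1.92):
    """Range of `param` whose profile log-likelihood is within `drop` of the maximum."""
    idx = names.index(param)
    best_ll = max(table.values())
    profile = {}
    for values, ll in table.items():
        v = values[idx]
        profile[v] = max(ll, profile.get(v, float("-inf")))
    kept = [v for v, ll in profile.items() if best_ll - ll <= drop]
    return min(kept), max(kept)

def toy_loglik(p):
    # Hypothetical smooth surface peaking at T = 450, M = 10.
    return -((p["T"] - 450) ** 2) / (2 * 100 ** 2) - ((p["M"] - 10) ** 2) / 2

grid = {"T": list(range(250, 700, 50)), "M": [8, 9, 10, 11, 12]}
best, best_ll, table = grid_mle(toy_loglik, grid)
t_interval = profile_interval(table, list(grid), "T")
```

The resolution of the grid limits the resolution of both the estimate and the interval, which is one reason interval endpoints from a grid search tend to fall on round values.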

24
Estimates (with 95% CIs)
  Parameter    Man-Bia          Man-San
  g1 (000s)    0 (0–3.8)        0 (0–3.8)
  g2 (000s)    4 (0–7.9)        2 (0–11)
  T (000s)     450 (300–640)    100 (77–550)
  M (= 4Nm)    10 (8.4–12)      3 (2.2–4)

26
Fit of the null model
  • How well does the demographic null model fit the
    patterns of genetic variation found in the actual
    data?
  • Quite well. The model accurately reproduces both
    the summary statistics used in the original fitting
    (e.g., Tajima's D in each population) and other
    aspects of the data (e.g., estimates of ρ = 4Nr).

29
Population growth
[Figure: population size versus time]
Spread of agriculture and animal husbandry?
31
Ancestral structure in Africa
  • At face value, these results suggest that
    population structure within Africa is old, and
    predates the migration of modern humans out of
    Africa.
  • Is there any evidence for additional (unknown)
    ancient population structure within Africa?

32
Model of ancestral structure
[Figure: the two-population model for Mandenka and Biaka (or San), with parameters T, g1, g2 and m, extended with an archaic human population]
33
Standard model of human evolution: Origin and
spread of modern humans
100 Kya
34
Admixture mapping
Modern human DNA
Neandertal DNA
38
Admixture mapping
Modern human DNA
Neandertal DNA
Orange chunks are 10–100 Kb in length
39
Genealogy with archaic ancestry
time
Modern humans
Archaic humans
present
40
Genealogy without archaic ancestry
time
Modern humans
Archaic humans
present
41
Our main questions
  • What pattern does archaic ancestry produce in DNA
    sequence polymorphism data (from extant humans)?
  • How can we use data to estimate the contribution
    of archaic humans to the modern gene pool (c)?
  • How can we use data to test whether c > 0?

42
Genealogy with archaic ancestry (Mutations added)
time
Modern humans
Archaic humans
present
44
Patterns in DNA sequence data
  • Sequence 1 A T C C A C A G C T G
  • Sequence 2 A G C C A C G G C T G
  • Sequence 3 T G C G G T A A C C T
  • Sequence 4 A G C C A C A G C T G
  • Sequence 5 T G T G G T A A C C T
  • Sequence 6 A G C C A T A G A T G
  • Sequence 7 A G C C A T A G A T G

46
Patterns in DNA sequence data
  • Sequence 1 A T C C A C A G C T G
  • Sequence 2 A G C C A C G G C T G
  • Sequence 3 T G C G G T A A C C T
  • Sequence 4 A G C C A C A G C T G
  • Sequence 5 T G T G G T A A C C T
  • Sequence 6 A G C C A T A G A T G
  • Sequence 7 A G C C A T A G A T G

We call the sites in red congruent sites: these
are sites inferred to be on the same branch of an
unrooted tree.
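
Which sites were highlighted cannot be read off the text alone, but the idea can be sketched under one assumption: two biallelic sites are congruent when they split the sample into the same two groups, which is what mutations on the same branch of an unrooted tree produce. The function name `congruent_groups` is made up for this example; the sequences are the seven shown above.

```python
from collections import defaultdict

SEQS = [
    "ATCCACAGCTG",  # Sequence 1
    "AGCCACGGCTG",  # Sequence 2
    "TGCGGTAACCT",  # Sequence 3
    "AGCCACAGCTG",  # Sequence 4
    "TGTGGTAACCT",  # Sequence 5
    "AGCCATAGATG",  # Sequence 6
    "AGCCATAGATG",  # Sequence 7
]

def congruent_groups(seqs):
    """Map each sample split shared by 2+ sites to the 1-based site positions."""
    groups = defaultdict(list)
    for pos, col in enumerate(zip(*seqs), start=1):
        if len(set(col)) != 2:
            continue  # skip monomorphic or multi-allelic sites
        # Describe the split by the sequences that differ from sequence 1,
        # so the same split always gets the same key.
        side = frozenset(i for i, base in enumerate(col, start=1) if base != col[0])
        groups[side].append(pos)
    return {side: sites for side, sites in groups.items() if len(sites) > 1}

groups = congruent_groups(SEQS)
```

Under this definition, sites 1, 4, 5, 8, 10 and 11 all separate sequences 3 and 5 from the rest of the sample, and no other split is supported by more than one site.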
47
Linkage disequilibrium (LD)
  • LD is the nonrandom association of alleles at
    different sites.

  Low LD (high recombination)    High LD (low recombination)
  A C                            A C
  A T                            A C
  A C                            A C
  A T                            A C
  G C                            G T
  G T                            G T
  G C                            G T
  G T                            G T
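
To put numbers on the two haplotype sets above, here is a small sketch using r², one common measure of LD (the slides do not say which measure the talk uses). The function name `r_squared` is made up for this example.

```python
def r_squared(haplotypes):
    """r^2 between two biallelic sites, given two-character haplotype strings."""
    n = len(haplotypes)
    a, b = haplotypes[0][0], haplotypes[0][1]   # reference allele at each site
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h == a + b for h in haplotypes) / n
    d = p_ab - p_a * p_b                        # coefficient of disequilibrium D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

LOW_LD = ["AC", "AT", "AC", "AT", "GC", "GT", "GC", "GT"]
HIGH_LD = ["AC", "AC", "AC", "AC", "GT", "GT", "GT", "GT"]

low = r_squared(LOW_LD)    # 0.0: alleles at the two sites are independent
high = r_squared(HIGH_LD)  # 1.0: the allele at one site predicts the other
```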
48
Measuring congruence
  • To measure the level of congruence in SNP data
    from larger regions, we define a score function S
    over sets of sites (i_1, ..., i_k).
  • S(i_1, ..., i_k) is built from pairwise scores
    S(i_j, i_{j+1}), each a function of both congruence
    (or near congruence) and the physical distance
    between sites i_j and i_{j+1}.
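
The formulas on this slide are incomplete in the text, so the following is only a sketch under stated assumptions: the score of an ordered set of sites is taken to be the sum of the pairwise scores of consecutive sites, and the statistic for a region is the maximum of that sum over all subsets of sites. The pairwise score `pair_score` and its numeric values are hypothetical and chosen only to show the shape of the computation.

```python
def pair_score(site_a, site_b):
    """Hypothetical pairwise score for two sites given as (position_bp, split)."""
    (pos_a, split_a), (pos_b, split_b) = site_a, site_b
    mismatches = len(split_a ^ split_b)   # sequences on which the two splits disagree
    if mismatches == 0:
        return 1.0 + (pos_b - pos_a) / 1000   # congruent: reward grows with distance (kb)
    if mismatches == 1:
        return -1.0                           # nearly congruent: small penalty
    return float("-inf")                      # incongruent: never placed in the same set

def max_subset_score(sites):
    """Maximum, over subsets of sites, of the summed consecutive pairwise scores."""
    sites = sorted(sites, key=lambda s: s[0])
    best = [0.0] * len(sites)   # best[j]: top score of any subset whose last site is j
    for j in range(len(sites)):
        for i in range(j):
            best[j] = max(best[j], best[i] + pair_score(sites[i], sites[j]))
    return max(best, default=0.0)

# Toy region (invented): three sites share one split, one site has another.
toy_sites = [
    (0, frozenset({3, 5})),
    (2000, frozenset({3, 5})),
    (3000, frozenset({6, 7})),
    (9000, frozenset({3, 5})),
]
score = max_subset_score(toy_sites)
```

The dynamic programme over sites sorted by position avoids enumerating all subsets: each site only needs the best score of a subset ending at an earlier site.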

49
An example
50
An example (CHRNA4)
52
An example (CHRNA4)
How often is S from simulations greater than or
equal to the S value from the actual data?
p = 0.025
53
S is sensitive to ancient admixture
54
General approach
  • We use the model parameters estimated before
    (growth rates, migration rate, split time) as a
    demographic null model.
  • Is our null model sufficient to explain the
    patterns of LD in the data?
  • We test this by comparing the observed S values
    with the distribution of S values calculated
    from data simulated under the null model.

56
Distribution of p-values (Mandenka and San)
[Histogram: frequency versus p-value]
Global p-value = 2.5 × 10^-5
57
Estimating ancient admixture rates
The global p-values for S are highly significant
in every population that we've studied! If we
estimate the ancient admixture rate in our
(composite)-likelihood framework, we can exclude
the hypothesis of no ancient admixture for all
populations studied.
58
A region on chromosome 4
59
A region on chromosome 4
19 mutations (from 6 Kb of sequence) separate 3
Biaka sequences from all of the other sequences
in our sample. Simulations suggest this cannot
be caused by recent population structure
(p < 10^-3). This corresponds to isolation lasting
1.5 million years!
60
Possible explanations
  • Isolation followed by later mixing is a recurrent
    feature of human population history
  • Mixing between archaic humans and modern humans
    happened at least once prior to the exodus of
    modern humans out of Africa
  • Some other feature of population structure is
    unaccounted for in our simple models

61
Acknowledgments
  • Collaborators
  • Mike Hammer (U. of Arizona)
  • Vincent Plagnol (Cambridge University)
  • Samples
  • Fondation Jean Dausset (CEPH)
  • Y Chromosome Consortium (YCC)
  • Funding
  • National Science Foundation
  • National Institutes of Health