Title: Cryptic Variation in the Human mutation rate


1
Cryptic Variation in the Human mutation rate
Alan Hodgkinson, Adam Eyre-Walker, Manolis Ladoukakis
2
Variation in the mutation rate
  • Between different chromosomes
  • Between regions on chromosomes
  • Neighbouring nucleotides

3
Simple context effects
Hwang and Green (2004) PNAS 101: 13994-14001
6
Cryptic Variation
  • Remote context
  • AGTCGGTTACCGTGACGTTGAACGTGT
  • Degenerate context
  • AGTCGGTTACCGTGYSRGYGAACGTGT
  • No context / Complex context

8
Our approach to the problem
  • Search for SNPs in human sequences that also
    have a SNP in the orthologous position in chimp.

Do we see more coincident SNPs than expected by
chance?
13
The method
  • Extract all human SNPs from dbSNP and construct
    a BLAST database on a chromosome by chromosome
    basis.
  • Extract all chimp SNPs from dbSNP with 50bp
    either side of SNP.
  • BLAST chimp SNPs against human database.
  • Extract results above a certain level of
    homology where there is a SNP on both sequences
    and reduce to 40bp either side of central
    position.
  • Repeat the analysis both including and excluding
    CpG effects.
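
The windowing step above might be sketched as follows. This is a minimal sketch, not the authors' pipeline: it assumes each BLAST hit has already been reduced to a gap-free alignment with the chimp SNP at a known index, and the names `trim_window` and `is_coincident` are hypothetical.

```python
FLANK = 40  # bases kept either side of the central SNP (81 bp in total)

def trim_window(seq, snp_index, flank=FLANK):
    """Return the 2*flank+1 window centred on snp_index, or None if the
    sequence does not extend far enough on either side."""
    start, end = snp_index - flank, snp_index + flank + 1
    if start < 0 or end > len(seq):
        return None
    return seq[start:end]

def is_coincident(human_snp_positions, chimp_snp_index):
    """True when a human SNP sits at the aligned position of the chimp SNP.
    Assumes a gap-free alignment, so indices correspond one to one."""
    return chimp_snp_index in human_snp_positions

# Toy example: a 101 bp chimp query (50 bp flanks) with its SNP at index 50.
chimp_query = "A" * 50 + "Y" + "C" * 50
window = trim_window(chimp_query, 50)
```

Real BLAST hits can contain gaps, in which case the positions would need to be mapped through the alignment before comparing them.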

15
Results
  • 1.5 million chimp SNPs.
  • 310,000 81bp alignments containing a human and
    chimp SNP.
  • Observe the number of coincident SNPs.
  • Calculate the expected number, taking into
    account the effects of neighbouring nucleotides.
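
One way to picture the expected count: if human and chimp SNPs arise independently once the neighbouring nucleotides are fixed, the expected number of coincident SNPs is the sum, over contexts, of sites × P(human SNP) × P(chimp SNP). The slides do not give the exact calculation, so the sketch below is an assumption. The triplet keys, the made-up counts and the function name `expected_coincident` are illustrative only.

```python
def expected_coincident(sites, human_snps, chimp_snps):
    """sites, human_snps, chimp_snps: dicts keyed by triplet context,
    giving the number of aligned sites and the number of those sites
    carrying a human or a chimp SNP."""
    total = 0.0
    for context, n in sites.items():
        if n == 0:
            continue
        p_human = human_snps.get(context, 0) / n
        p_chimp = chimp_snps.get(context, 0) / n
        total += n * p_human * p_chimp
    return total

# Made-up counts, for illustration only (not data from the talk).
sites = {"ACG": 1000, "ATA": 9000}
human = {"ACG": 100, "ATA": 90}
chimp = {"ACG": 50, "ATA": 90}

exp_context = expected_coincident(sites, human, chimp)
# Same totals, but ignoring context altogether:
exp_uniform = sum(human.values()) * sum(chimp.values()) / sum(sites.values())
```

With these toy numbers the context-aware expectation is about 5.9, against 2.66 when context is ignored. Leaving out a known hypermutable context therefore understates the expectation and inflates the observed/expected ratio.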

16
Results
        Obs    Exp   Ratio
All     11571  6592  1.76 (1.72,1.79)
No-CpG  5028   2533  1.98 (1.93,2.04)
17
Results
C/T G/A C/A G/T C/G A/T
C/T 1.91 1.04 1.19 1.21 0.96
G/A 1.83 1.24 1.02 1.14 1.40
C/A 1.23 1.08 4.81 1.28 1.39
G/T 1.15 1.38 4.95 1.27 0.77
C/G 1.09 1.14 1.24 1.40 2.79
A/T 0.94 1.06 1.79 0.99 15.43
19
Alternative Explanations
  • Bias in the Method
  • Selection
  • Ancestral Polymorphism
  • Paralogous SNPs

20
Methodological Bias
  • Simulated data with same density of human and
    chimp SNPs as dbSNP under different divergence
    and mutation patterns.
  • Method worked well under realistic conditions.
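
A null simulation of this kind could look like the sketch below. It covers only the simplest case (uniform mutation rate, no divergence between the sequences), and the SNP counts are hypothetical rather than the dbSNP densities used in the talk.

```python
import random

def simulate_coincident(n_sites, n_human, n_chimp, rng):
    """Place SNPs uniformly at random in each species and count the
    sites that are polymorphic in both."""
    human = set(rng.sample(range(n_sites), n_human))
    chimp = set(rng.sample(range(n_sites), n_chimp))
    return len(human & chimp)

rng = random.Random(1)  # fixed seed so the run is repeatable
n_sites, n_human, n_chimp = 100_000, 2_000, 1_500

replicates = [simulate_coincident(n_sites, n_human, n_chimp, rng)
              for _ in range(200)]
mean_obs = sum(replicates) / len(replicates)
expected = n_human * n_chimp / n_sites  # 30 when the rate is uniform
ratio = mean_obs / expected
```

Under this null model the observed/expected ratio should sit close to 1. A method that returned a ratio well above 1 on such data would be biased.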

22
Methodological Bias
All sites (HG)
Div  Obs   Exp   Ratio  95% CI
0    839   812   1.033  (0.963,1.103)
1    2419  2316  1.040  (1.003,1.086)
2    681   685   0.995  (0.920,1.069)
Non-CpG sites (HG)
Div  Obs   Exp   Ratio  95% CI
0    401   428   0.936  (0.844,1.028)
1    1182  1228  0.963  (0.908,1.018)
2    374   400   0.935  (0.840,1.030)
23
Alternative Explanations
  • Bias in the method
  • Selection
  • Ancestral Polymorphism
  • Paralogous SNPs

25
Selection
  • Areas of low SNP density result in clustering

[Diagram: human and chimp sequences]
Apparent excess of coincident SNPs
26
Selection
  • No clustering

27
Alternative Explanations
  • Bias in the method
  • Selection
  • Ancestral Polymorphism
  • Paralogous SNPs

29
Ancestral Polymorphism
  • SNP inherited from common ancestor of chimp and
    human

Increase in coincident SNPs
30
Ancestral Polymorphism
  • Expect observed/expected ratio to be same for
    all transitions

C/T G/A C/A G/T C/G A/T
C/T 1.91 1.04 1.19 1.21 0.96
G/A 1.83 1.24 1.02 1.14 1.40
C/A 1.23 1.08 4.81 1.28 1.39
G/T 1.15 1.38 4.95 1.27 0.77
C/G 1.09 1.14 1.24 1.40 2.79
A/T 0.94 1.06 1.79 0.99 15.43
32
Ancestral Polymorphism
  • Repeated initial analysis with macaque data.
  • Humans and macaques split 23-24 million years
    ago, so we expect there to be no shared
    polymorphisms.

        Obs  Exp  Ratio
All     77   47   1.64 (1.27,2.00)
No-CpG  34   23   1.51 (1.001,2.02)
33
Alternative Explanations
  • Bias in the method
  • Selection
  • Ancestral Polymorphism
  • Paralogous SNPs

36
Paralogous SNPs
  • Excess of coincident SNPs a consequence of
    artifactual SNPs called as a result of
    substitutions in paralogous regions.
  • Musumeci et al. (2010): 8.32% of human variation
    in dbSNP may be due to paralogy.

AGCTGCACGT (T/A) CGGCATCCAA   SNP
AGCTGCACGT   T   CGGCATCCAA   Chromosome 1
AGCTGCACGT   A   CGGCATCCAA   Chromosome 7
Artifactual SNP
38
Paralogous SNPs
AGCTGCACGT (T/A) CGGCATCCAA    AGCTGCACGT T CGGCATCCAA

AGCTGCACGT (T/A) CGGCATCCAA    AGCTGCACGT T CGGCATCCAA
                               AGCTGCACGT A CGGCATCCAA

3.6% of coincident SNPs are possibly a
consequence of paralogous sequences
39
Alternative Explanations
  • Bias in the method
  • Selection
  • Ancestral Polymorphism
  • Paralogous SNPs

Cryptic variation in the mutation rate
40
Context Analysis
  • 4517 sequences containing non-CpG coincident
    SNPs flanked by 200bp.
  • Tabulate triplet frequencies at each position in
    surrounding sequences.
  • Test whether the proportions of triplets we
    observe at each position are significantly
    different from the proportions in the sequences
    as a whole.
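
The tabulate-and-test idea can be sketched as below. The slides do not name the test that was used, so a plain Pearson chi-square statistic is assumed here, and the three short sequences are toy data standing in for the 4517 flanked sequences. All function names are hypothetical.

```python
from collections import Counter

def triplet_counts_at(seqs, pos):
    """Count the triplets starting at index pos across all sequences."""
    return Counter(s[pos:pos + 3] for s in seqs if len(s) >= pos + 3)

def overall_triplet_counts(seqs):
    """Count every overlapping triplet in every sequence."""
    counts = Counter()
    for s in seqs:
        counts.update(s[i:i + 3] for i in range(len(s) - 2))
    return counts

def chi_square(observed, background):
    """Pearson chi-square of the observed triplet counts against the
    proportions implied by the background counts."""
    n_obs = sum(observed.values())
    n_bg = sum(background.values())
    stat = 0.0
    for triplet, bg in background.items():
        expected = n_obs * bg / n_bg
        stat += (observed.get(triplet, 0) - expected) ** 2 / expected
    return stat

seqs = ["ACGTACGTAC", "ACGTTCGTAC", "TCGTACGAAC"]  # toy data
background = overall_triplet_counts(seqs)
stats = [chi_square(triplet_counts_at(seqs, p), background)
         for p in range(len(seqs[0]) - 2)]
```

A position with an unusually large statistic would point to a sequence context associated with coincident SNPs. With hundreds of positions tested, some correction for multiple testing would be needed before calling any of them significant.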

42
Context Analysis
  • Coincident SNP in central position

No obvious context surrounding coincident SNPs
45
Genomic Distribution
  • Tallied the number of coincident SNPs per Mb.
  • 3.91 coincident SNPs per Mb.
  • 1.68 non-CpG coincident SNPs per Mb.
  • If randomly distributed, expect a Poisson
    distribution with μ = σ² = 3.91.
  • Observed σ² = 13.27 (p < 0.001), so sampling variance
    explains approximately 30% of total variance.
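
The arithmetic behind the sampling-variance figure is short enough to write out. For a Poisson variable the variance equals the mean, so the share of the observed variance attributable to sampling is mean / observed variance. The helper `dispersion` is a hypothetical addition showing how the same quantity would be computed from raw per-Mb counts.

```python
mean_per_mb = 3.91    # coincident SNPs per Mb (from the slide)
observed_var = 13.27  # observed variance across Mb windows (from the slide)

poisson_var = mean_per_mb                      # Poisson: variance = mean
dispersion_index = observed_var / poisson_var  # > 1 means over-dispersed
sampling_share = poisson_var / observed_var

def dispersion(counts):
    """Variance-to-mean ratio of per-window counts; about 1 for
    Poisson-distributed data."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean
```

With the slide's figures the sampling share comes out at roughly 0.29, in line with the quoted figure of approximately 30, and the variance is about 3.4 times the Poisson expectation.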

46
Genomic Distribution
Feature                 r       r²      p
SNP density             0.256   0.0655  <0.001
Distance to Telomere    -0.022  0.0004  0.226
Distance to Centromere  0.011   0.0001  0.565
Recombination Rate      0.107   0.0114  <0.001
Nucleosome Association  0.004   0.0000  0.832
Gene Density            -0.022  0.0004  0.230
GC content              -0.006  0.0000  0.741
49
Genomic Distribution
  • SNP densities must drive coincident SNP
    densities to a certain extent as approximately
    half of coincident SNPs are created by chance
    alone.
  • Recombination rate positively correlated with
    SNP density (r = 0.242, p < 0.001).
  • Partial correlation controlling for SNP density:
    r = 0.048, p = 0.011.
  • SNP densities explain 6.5% of the variance,
    recombination rate explains 0.2% of the variance
    of coincident SNPs.
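
The slides do not say how the partial correlation was computed. As a check, the standard first-order partial correlation formula, fed with the three pairwise correlations quoted in the talk, gives a value close to the reported one. The variable names are illustrative.

```python
from math import sqrt

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

r_coinc_recomb = 0.107  # coincident SNP density vs recombination rate
r_coinc_snp = 0.256     # coincident SNP density vs SNP density
r_recomb_snp = 0.242    # recombination rate vs SNP density

r_partial = partial_corr(r_coinc_recomb, r_coinc_snp, r_recomb_snp)
variance_explained = r_partial ** 2
```

This yields a partial correlation of about 0.048, whose square is about 0.002. So once SNP density is controlled for, recombination rate accounts for roughly 0.2 percent of the variance.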

50
Genomic Distribution
Feature                 r       r²      p
Coincident SNP Density  0.256   0.0655  <0.001
Distance to Telomere    -0.171  0.0292  <0.001
Distance to Centromere  -0.047  0.0022  0.012
Recombination Rate      0.234   0.0548  <0.001
Nucleosome Association  0.187   0.0350  <0.001
Gene Density            0.064   0.0041  0.001
GC content              0.184   0.0339  <0.001
52
Quantification
  • Use Log-normal distribution of relative mutation
    rates due to cryptic variation.
  • Model the number of coincident SNPs under the
    effects of cryptic variation.
  • Incorporate effects of divergence.

What level of variation in the log-normal
distribution explains our results?
53
Log-normal model
Fastest 5% of sites mutate 16.4 times faster
than slowest 5% of sites.
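
To see how the spread of a log-normal distribution translates into a fast-versus-slow ratio, here is a small sketch. The slides give neither the fitted parameter nor the exact definition of the ratio, so two possible readings are shown, and both are assumptions: the ratio of the 95th to the 5th percentile, and the ratio of the mean rate in the top 5% to the mean rate in the bottom 5%. The function names are hypothetical.

```python
from math import exp, log
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z95 = STD_NORMAL.inv_cdf(0.95)  # about 1.645

def quantile_ratio(sigma):
    """95th percentile rate divided by 5th percentile rate."""
    return exp(2 * Z95 * sigma)

def tail_mean_ratio(sigma):
    """Mean rate of the fastest 5% of sites divided by the mean rate of
    the slowest 5% (partial expectations of the log-normal)."""
    return STD_NORMAL.cdf(sigma - Z95) / STD_NORMAL.cdf(-Z95 - sigma)

def sigma_for_quantile_ratio(ratio):
    """Invert quantile_ratio: the sigma that yields a given ratio."""
    return log(ratio) / (2 * Z95)

# Under the percentile reading only, a 16.4-fold ratio corresponds to:
sigma = sigma_for_quantile_ratio(16.4)
```

Under the percentile reading, a 16.4-fold ratio corresponds to a log-scale standard deviation of about 0.85. The tail-mean reading would need a smaller value for the same ratio, so this figure should not be taken as the fitted parameter from the talk.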
58
Summary
  • Cryptic variation in the mutation rate.
  • No obvious context surrounding coincident SNPs.
  • Variation is truly cryptic.
  • Genomic distribution of coincident SNPs is
    over-dispersed
  • Variation in mutation rate is substantial.

59
Acknowledgments
  • BBSRC
  • People

Manolis Ladoukakis
Adam Eyre-Walker