Haplotype Blocks - PowerPoint PPT Presentation


Slides: 39

Provided by: pola4

Learn more at: https://www.stat.rice.edu



1
Haplotype Blocks

An Overview
A. Polanski
Department of Statistics
Rice University

2
Key Papers

N. Patil et al., (2001), Blocks of Limited
Haplotype Diversity Revealed by High-Resolution
Scanning of Human Chromosome 21, Science, vol.
294, pp. 1719-1723
N. Wang et al., (2002), Distribution of
Recombination Crossovers and the Origin of
Haplotype Blocks: The Interplay of Population
History, Recombination, and Mutation, Am. J. Hum.
Genet., vol. 71, pp. 1227-1234
K. Zhang et al., (2002), A Dynamic Programming
Algorithm for Haplotype Block Partitioning, PNAS,
vol. 99, pp. 7335-7339

3
Supplementary Papers

R. Hudson, N. Kaplan, (1985), Statistical
Properties of the Number of Recombination Events
in the History of a Sample of DNA Sequences,
Genetics, vol. 111, pp. 147-164
R. Hudson, (2002), Generating Samples under a
Wright-Fisher Neutral Model of Genetic Variation,
Bioinformatics, vol. 18, pp. 337-338
D. Reich et al., (2001), Linkage Disequilibrium
in the Human Genome, Nature, vol. 411, pp. 199-204

4
What are Haplotype Blocks?

Haplotype block: a sequence of contiguous
markers on DNA, homogeneous according to some
criterion
Markers: Single Nucleotide Polymorphisms (SNPs)

5
Data (Patil et al. 2001)

Chromosome 21
Physically separated the two copies of chromosome
21 using a rodent-human somatic cell hybrid
technique
Sample of 20 copies of chromosome 21 (32397439
bases)
Found 35989 SNPs

6
Fig. 2 from (Patil et al. 2001)
7
SNP no i
01000000000000000000100000000000000100001110000000
00100000001001000000001001000000000000000000001000
000001101000010101010 0000000010000000000010000000
00010010000100000000000000101100100100101000100100
0000000010010001011000000001101010010101010 000000
00010001000101100010100000000101000110000000000101
00000000000100000100110000011101001000000110000110
001000100011010 0000000000000100010010001010000000
01010001100000000001010000000000010000010011000001
1101001000000110000110001000100011010 000000001000
00000000100001000001001000000000000000000010010010
01001010001001000000000010010001011000000001100100
000000000 0010000000100001000010010000000000010000
01100000000001010000000010010011010001000000001000
0001001000001001110100000000000 000000001000000000
00100001001101001000000000000000000010010010010010
10001001000000000010010001011000000001100100000000
000 1000100000000000000001000001000101000000000000
00000100000000100100000100100100000010000000010000
1000000001101010010101010 000000000000100000001000
00000000000100000110000000000000000010010000000010
01000000000000000000001000000001101000010101010 00
00000010000000000010000100000100100000000000000000
00100100110100101000100100000000001001000101100000
0001100100000000000 100010000000000000000100000100
01010000000000000000010000000010010000000010010000
00000000000000001000010001101010010101010 00001000
00000000100001000000000101000000000000000000000000
00100101000000100100000000000000000000100000000110
1000010101010 100010000000000000000100000100010100
00000000000000010000000010010000010010010000001000
00000100001000000001101010010101010 00000001001000
00000010010000000000011000011010000000010100000010
10010010010001001000001010000100100000100111010000
0000000 100010000000001000000100000100010100000000
00000000010000000010010000010010010000001000000001
00001000000001101010010101010 00000000001000000000
10010000000000010000011000000000010100000000100100
10010001000000001000000100100000100111010100000000
1 000000000010000000001001000000000001000001101000
00000101000000101001001001000100100000100000010010
01001001110100000000000 00010010000100000010001000
00001010000000011001111110000000110000000000000010
011101010000001010100100000000001000001011110 0000
10000000000010000100000000010100000000000000000000
00000010010100000010010000000000000000000010000000
01101000010101010 00010100000000000010000000000000
10000010011101000010000000100000000000000010010001
010000001000100100100000001000001011010

20 rows (one per chromosome copy)
SNP number i = 1, 2, ..., 35989
8
Problems
9
How do we determine boundaries between blocks?

Average value of the standardized coefficient of
linkage disequilibrium is greater than some
threshold (Wang et al. 2002, Reich et al. 2001)
Infer sites in the sample of DNA sequences where
recombination events happened in the past history
(Wang et al. 2002, Hudson, 2002)
Chromosome coverage: minimum number of SNPs to
account for the majority of haplotypes (Patil et al.
2001, Zhang et al. 2002)
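
The first criterion can be illustrated with a small sketch. It assumes that the "standardized coefficient of linkage disequilibrium" is Lewontin's D' and that SNPs are biallelic and coded 0/1; the function names and the averaging over all SNP pairs in a window are illustrative assumptions, not taken from the cited papers.

```python
from itertools import combinations


def d_prime(snp1, snp2):
    """Lewontin's D' for two biallelic SNPs coded 0/1 (one entry per chromosome)."""
    n = len(snp1)
    p1 = sum(snp1) / n                      # frequency of allele 1 at SNP1
    p2 = sum(snp2) / n                      # frequency of allele 1 at SNP2
    p11 = sum(1 for a, b in zip(snp1, snp2) if a == 1 and b == 1) / n
    d = p11 - p1 * p2                       # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p1 * (1 - p2), (1 - p1) * p2)
    else:
        d_max = min(p1 * p2, (1 - p1) * (1 - p2))
    return 0.0 if d_max == 0 else d / d_max  # monomorphic SNPs give 0 by convention here


def mean_abs_d_prime(haplotypes):
    """Average |D'| over all SNP pairs; haplotypes is a list of 0/1 lists."""
    columns = list(zip(*haplotypes))
    pairs = list(combinations(columns, 2))
    return sum(abs(d_prime(a, b)) for a, b in pairs) / len(pairs)
```

A candidate block would then be accepted when `mean_abs_d_prime` of its haplotypes exceeds a chosen threshold.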

10
What evolutionary forces are responsible for
haplotype block formation?

Mutation
Genetic drift
Recombination
Recombination hot spots

11
Methods
12
Method 1 (Wang et al. 2002)
Infer sites in the sample of DNA sequences where
recombination events happened in the past
history
13
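Three-gamete condition

Consider a pair of SNPs, SNP1 and SNP2. If there
was no recombination between SNP1 and SNP2 (and no
recurrent mutation), they must satisfy the
three-gamete condition: at most three of the four
possible gametes are observed

[Diagram: starting from the ancestral haplotype AC, the mutation A→G at SNP1 produces GC, and the mutation C→T at SNP2 then produces GT]

SNP1 SNP2
A    C
G    C
G    T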
14
Four gamete test (Hudson and Kaplan, 1985)

If we see all four gametes at SNP1 and SNP2 (4GT)

SNP1 SNP2
A    C
G    C
G    T
A    T

then there must have been a recombination event
between these sites in their past history
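
A minimal sketch of the test for one pair of SNPs, assuming biallelic sites with no missing data; the function name is hypothetical.

```python
def four_gamete_test(snp1, snp2):
    """Return True if all four gametes are observed at a pair of biallelic SNPs.

    snp1 and snp2 hold the alleles at the two sites, one entry per chromosome.
    """
    gametes = set(zip(snp1, snp2))   # distinct (allele at SNP1, allele at SNP2) pairs
    return len(gametes) == 4


# The gametes from the slide: AC, GC, GT, AT -> all four are present.
print(four_gamete_test("AGGA", "CCTT"))
# Only AC, GC, GT -> the three-gamete condition holds.
print(four_gamete_test("AGG", "CCT"))
```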
15
Array of pairwise 4GT test results

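Hudson and Kaplan, 1985

\[
D = [d_{ij}], \qquad
d_{ij} =
\begin{cases}
0, & \text{if there are fewer than 4 gametes at SNPs } i \text{ and } j \\
1, & \text{if there are 4 gametes at SNPs } i \text{ and } j
\end{cases}
\]
What is the minimal number of recombinations that
could explain the observed data? Statistic FR
(Hudson and Kaplan, 1985)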
16
Fig. 1 from Wang et al., 2002
[Figure: the matrix D, with Block 1, Block 2 and Block 3 marked]
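
The matrix D and a block partition derived from it can be sketched as follows. The left-to-right rule used here (close the current block as soon as a new SNP shows four gametes with any SNP already in the block) is an assumption made for illustration and may differ in detail from the procedure of Wang et al.; the function names are hypothetical.

```python
def four_gamete_matrix(haplotypes):
    """d[i][j] = 1 if SNPs i and j show all four gametes, else 0."""
    columns = list(zip(*haplotypes))          # one tuple of alleles per SNP
    m = len(columns)
    d = [[0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            if len(set(zip(columns[i], columns[j]))) == 4:
                d[i][j] = d[j][i] = 1
    return d


def blocks_from_matrix(d):
    """Split SNPs 0..m-1 into maximal runs with no four-gamete pair inside a run."""
    if not d:
        return []
    blocks, start = [], 0
    for j in range(1, len(d)):
        if any(d[i][j] for i in range(start, j)):
            blocks.append((start, j - 1))     # inclusive, 0-based SNP indices
            start = j
    blocks.append((start, len(d) - 1))
    return blocks


haplotypes = ["000", "001", "110", "111"]     # toy data: 4 chromosomes, 3 SNPs
d = four_gamete_matrix(haplotypes)
print(d)
print(blocks_from_matrix(d))
```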
17
Wang et al., 2002 - Study

R. Hudson's program for simulating genealogies
with mutation, drift and recombination under
various demographic scenarios
Study of the dependence of average block lengths
on different factors
Comparison of simulation results to data from
Patil et al., 2001

18
Dependence of average lengths of blocks on
recombination frequency
19
... on sample size
20
... on mutation intensity
21
Comparison to data from Patil et al. 2001

Compute distribution of haplotype block lengths
in the data from Patil et al. 2001
Try to tune parameters θ and R to obtain a similar
distribution in the simulations

22
Failed
23
Try a mixture of two different recombination
frequencies - better
24
Method 2 (Patil et al. 2001)
Chromosome coverage: minimum number of SNPs to
account for the majority of haplotypes
25
Fig. 2 from (Patil et al. 2001)
26
Problem formulation

Define block boundaries to minimize the number of
SNPs that distinguish at least α percent of the
haplotypes in each block

27
Common haplotypes

Those represented more than once in the block

28
Condition

Common haplotypes must constitute at least α = 80
percent of all haplotypes in the block
Blocks that do not satisfy this are not allowed
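
The condition can be checked with a short sketch. It assumes complete haplotypes (no missing alleles) given as strings restricted to the block; the function names and the default threshold argument are illustrative.

```python
from collections import Counter


def common_haplotype_share(block_haplotypes):
    """Fraction of chromosomes whose haplotype occurs more than once in the block."""
    counts = Counter(block_haplotypes)
    in_common = sum(c for c in counts.values() if c > 1)
    return in_common / len(block_haplotypes)


def block_is_allowed(block_haplotypes, alpha=0.8):
    """True if common haplotypes make up at least a fraction alpha of the sample."""
    return common_haplotype_share(block_haplotypes) >= alpha


# Toy block: 9 of 10 chromosomes carry a haplotype that is seen more than once.
block = ["010"] * 6 + ["110"] * 3 + ["011"]
print(common_haplotype_share(block), block_is_allowed(block))
```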

29
Fragment of Fig. 2 from Patil et al., 2001
30
Notation

B: block, defined as a set of consecutive SNP numbers,
e.g., B = {45, 46, ..., 50}, or B = {i, i+1, ..., j}
L(B): length of the block (number of SNPs)
f(B): minimum number of SNPs required to
distinguish common haplotypes
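
For small blocks, f(B) can be computed by brute force, as in the sketch below. The exhaustive search over SNP subsets and the convention that f(B) = 0 when there are fewer than two distinct common haplotypes are assumptions made for illustration; the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def min_distinguishing_snps(block_haplotypes):
    """f(B): fewest SNPs whose alleles tell all common haplotypes apart."""
    counts = Counter(block_haplotypes)
    common = [h for h, c in counts.items() if c > 1]   # distinct common haplotypes
    if len(common) < 2:
        return 0                      # assumption: nothing needs to be distinguished
    n_snps = len(common[0])
    for k in range(1, n_snps + 1):    # try subsets of increasing size
        for subset in combinations(range(n_snps), k):
            patterns = {tuple(h[s] for s in subset) for h in common}
            if len(patterns) == len(common):
                return k
    return n_snps                     # distinct haplotypes always differ on the full set


block = ["000", "000", "011", "011", "101", "101", "110"]
print(len(block[0]), min_distinguishing_snps(block))   # L(B) and f(B)
```

The search is exponential in the block length, so it is only practical for short blocks.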

31
Greedy Solution
[Figure: the SNP data matrix from slide 7, with Start and End markers delimiting the current candidate block B]
0. Fix Start = End
1. Increment End
2. Compute ratio L(B)/f(B)
3. Stop at max
4. Go to 0
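
The steps above can be sketched as a left-to-right scan. This is one reading of the slide, not necessarily the published procedure of Patil et al.: for a fixed Start, every admissible End is tried, the block with the largest L(B)/f(B) is kept, and the scan restarts after it. The helper names, the brute-force f, and the use of max(1, f) to keep the ratio defined are assumptions.

```python
from collections import Counter
from itertools import combinations


def common_haplotypes(segment):
    """Distinct haplotypes seen more than once, and their share of the sample."""
    counts = Counter(segment)
    common = [h for h, c in counts.items() if c > 1]
    share = sum(counts[h] for h in common) / len(segment)
    return common, share


def f(common):
    """Fewest SNPs that tell all common haplotypes apart (brute force)."""
    if len(common) < 2:
        return 0
    n = len(common[0])
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            if len({tuple(h[c] for c in cols) for h in common}) == len(common):
                return k
    return n


def greedy_blocks(haplotypes, alpha=0.8):
    """Return blocks as inclusive, 0-based (start, end) SNP index pairs."""
    n_snps = len(haplotypes[0])
    blocks, start = [], 0
    while start < n_snps:
        best_end, best_ratio = start, 0.0
        for end in range(start, n_snps):
            segment = [h[start:end + 1] for h in haplotypes]
            common, share = common_haplotypes(segment)
            if share < alpha:
                break                 # coverage cannot increase as the block grows
            ratio = (end - start + 1) / max(1, f(common))
            if ratio > best_ratio:    # ties keep the shorter block
                best_end, best_ratio = end, ratio
        blocks.append((start, best_end))
        start = best_end + 1
    return blocks


print(greedy_blocks(["0000", "0000", "1100", "1100", "1111", "1111"]))
```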
32
Results

4563 representative SNPs (13%)
4135 blocks

33
Method 3 (Zhang et al. 2002)

Solves the same problem of 80% chromosome
coverage, but using the better method of dynamic
programming

34
Dynamic programming solution
[Figure: the SNP data matrix from slide 7, with SNPs 1, 2, ..., i partitioned into blocks B1(i), B2(i), B3(i)]

Optimal partition of SNPs 1, 2, ..., i
Assume that for all i = 1, 2, ..., j-1 we know the
optimal block partition, B1(i), B2(i), ..., Bk(i),
that minimizes the total number of representative
SNPs, f(B1(i)) + f(B2(i)) + ... + f(Bk(i))
35
Bellman's equation
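
The equation itself did not survive in this transcript. A recursion consistent with the notation of the earlier slides, stated here as an assumption, is S(0) = 0 and S(j) = min over i ≤ j of S(i-1) + f({i, ..., j}), where the minimum runs over allowed blocks and S(j) is the smallest total number of representative SNPs for SNPs 1, ..., j. The sketch below implements this recursion; `f` and `allowed` are supplied by the caller, and the toy cost and admissibility rules in the example are hypothetical.

```python
import math


def optimal_partition(n_snps, f, allowed):
    """Dynamic programming over block end points (1-based, inclusive indices).

    f(i, j)       -> number of representative SNPs for the block {i, ..., j}
    allowed(i, j) -> True if the block {i, ..., j} satisfies the coverage condition
    Returns (minimal total cost, list of blocks).
    """
    S = [0] + [math.inf] * n_snps      # S[j]: best cost for SNPs 1..j
    back = [0] * (n_snps + 1)          # back[j]: start of the last block in that optimum
    for j in range(1, n_snps + 1):
        for i in range(1, j + 1):
            if allowed(i, j) and S[i - 1] + f(i, j) < S[j]:
                S[j] = S[i - 1] + f(i, j)
                back[j] = i
    if math.isinf(S[n_snps]):
        raise ValueError("no admissible partition exists")
    blocks, j = [], n_snps
    while j > 0:                       # walk the back-pointers to recover the blocks
        blocks.append((back[j], j))
        j = back[j] - 1
    return S[n_snps], blocks[::-1]


# Hypothetical toy rules: blocks of up to 3 SNPs are allowed and each costs 1 SNP.
cost, blocks = optimal_partition(5, lambda i, j: 1, lambda i, j: j - i + 1 <= 3)
print(cost, blocks)
```

With n SNPs the loops evaluate f and allowed for O(n^2) candidate blocks, in contrast to the single left-to-right pass of the greedy method.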
36
Results