Title: Comparative Analysis of Recombination Hotspots in Humans and Chimpanzees
1Comparative Analysis of Recombination Hotspots in
Humans and Chimpanzees
Instructor: Yao-Ting Huang
Bioinformatics Laboratory, Department of Computer
Science and Information Engineering, National Chung
Cheng University.
2Recombination Hotspots
- Recombination events (crossovers) are known to cluster in short 1-2 kb regions of the human genome called recombination hotspots.
- The recombination rates in these hotspots are at least 10 times higher than the background rate.
- Recombination hotspots occur every 60-200 kb.
[Figure labels: recombination hotspots, haplotype blocks, chromosome]
3Recombination Patterns
- The variation in recombination rates across the human genome is thought to result from evolutionary forces (Nature, 2006).
- Genes involved in DNA and RNA metabolism are often located within haplotype blocks.
- Genes involved in immune responses and neurophysiological processes are often located in regions of recombination hotspots.
4Methods for Measuring Recombination Rates
- Collection of large pedigrees
- Kong et al. A high-resolution recombination map of the human genome, Nature Genetics, 2002.
- Sperm typing
- Clark et al. Combining sperm typing and linkage disequilibrium analyses reveals differences in selective pressures or recombination rates across human populations, Genetics, 2007.
- However, these approaches are not practical for detecting hotspots on a genome-wide scale.
5Inference of Recombination
- Alternatively, recombination rates can be inferred from linkage disequilibrium (LD) among single nucleotide polymorphisms (SNPs).
- We estimate the amount of recombination needed to produce the observed LD under a given model.
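As a rough illustration of the quantity being modeled, the following sketch computes two common pairwise LD statistics, D and r², from phased two-SNP haplotypes. The haplotype encoding ("A"/"a" for the first SNP, "B"/"b" for the second) and the function name `ld_stats` are assumptions made for this example; it does not reproduce the model-based estimation described above.

```python
from collections import Counter


def ld_stats(haplotypes):
    """Return (D, r2) for two-SNP haplotypes such as "AB", "Ab", "aB", "ab".

    The letter encoding is an illustrative assumption: "A"/"a" are the two
    alleles of the first SNP and "B"/"b" those of the second.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    # Allele frequencies of "A" and "B", and frequency of the "AB" haplotype.
    p_a = sum(c for h, c in counts.items() if h[0] == "A") / n
    p_b = sum(c for h, c in counts.items() if h[1] == "B") / n
    p_ab = counts["AB"] / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Hypothetical samples: complete LD versus no LD.
tight = ["AB"] * 5 + ["ab"] * 5
loose = ["AB", "Ab", "aB", "ab"] * 3
print(ld_stats(tight))
print(ld_stats(loose))
```

High r² between nearby SNPs suggests little historical recombination between them, while r² near zero is consistent with frequent recombination.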
6An Example of Phylogenetic Network Implying
Recombination
7Related Work on Inferring Recombination Hotspots
in Humans
- Hudson, R. R. and Kaplan, N. L. Statistical properties of the number of recombination events in the history of a sample of DNA sequences. Genetics, 1985.
- Myers, S. R. and Griffiths, R. C. Bounds on the minimum number of recombination events in a sample history. Genetics, 2003.
- Li, N. and Stephens, M. Modeling linkage disequilibrium and identifying recombination hotspots using single nucleotide polymorphisms. Genetics, 2003.
- Bafna, V. and Bansal, V. The number of recombination events in a sample history: conflict graph and lower bounds. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2004.
- Crawford, D. C., et al. Evidence for substantial fine-scale variation in recombination rates across the human genome. Nature Genetics, 2004.
8Recombination Patterns Across Different Species
- A larger-scale study of recombination patterns in humans and chimpanzees.
- Ptak, S. E., et al. Fine-scale recombination patterns differ between chimpanzees and humans, Nature Genetics, 2005.
- A total of 14 Mb of sequence in human and chimpanzee is examined.
9Data Collection
- Data collection in humans:
- 14 Mb spanning two regions on chromosome 21.
- 23 Africans, 24 Han Chinese, and 24 European Americans.
- 11,642 SNPs genotyped, with an average spacing of 1.2 kb.
- Data collection in chimpanzees:
- 14 Mb of homologous regions (on chromosome 22).
- 8 central chimpanzees, 5 of which are wild-born.
- 30,611 SNPs discovered, with an average spacing of 440 bp.
10Division into Windows
- The 14 Mb regions are divided into 70-kb windows, with a 20-kb overlap between adjacent windows.
- SNPs are discarded if the minor allele frequency is < 5% in humans or if they are singletons in chimpanzees.
- SNPs with more than 75% missing data are discarded.
- Only windows with ≥ 20 SNPs are considered.
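The windowing and filtering steps can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and the default thresholds (5% minor allele frequency, 75% missing data, at least 20 SNPs per window) are this sketch's reading of the slide, exposed as parameters so they can be changed.

```python
from bisect import bisect_left


def make_windows(region_length, size=70_000, overlap=20_000):
    """Tile [0, region_length) with windows of `size` bp that overlap by `overlap` bp."""
    step = size - overlap
    windows = []
    start = 0
    while start < region_length:
        end = min(start + size, region_length)
        windows.append((start, end))
        if end == region_length:
            break
        start += step
    return windows


def keep_snp(maf, missing_fraction, min_maf=0.05, max_missing=0.75):
    """Human-style filter (assumed thresholds): drop rare or mostly missing SNPs."""
    return maf >= min_maf and missing_fraction <= max_missing


def usable_windows(windows, snp_positions, min_snps=20):
    """Keep windows [start, end) that contain at least `min_snps` SNPs."""
    positions = sorted(snp_positions)
    kept = []
    for start, end in windows:
        n = bisect_left(positions, end) - bisect_left(positions, start)
        if n >= min_snps:
            kept.append((start, end))
    return kept


# Hypothetical 170-kb region with one SNP per kb in the first 70 kb only.
windows = make_windows(170_000)
snps = range(0, 70_000, 1_000)
print(windows)
print(usable_windows(windows, snps))
```

With a 70-kb window and 20-kb overlap, consecutive windows start 50 kb apart, so a hotspot near one window's edge falls nearer the middle of a neighbouring window.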
11Estimation of Recombination Rates
- Assuming each window contains only one hotspot, PHASE 2.1.1 is used to estimate:
- the background recombination rate,
- the location of the hotspot, and
- the relative intensity of the hotspot recombination rate (λ).
- λ = 1: no recombination rate variation.
- λ > 10: potential recombination hotspot.
- Li and Stephens, Modeling linkage disequilibrium and identifying recombination hotspots using single nucleotide polymorphisms, Genetics, 2003.
12Conserved Hotspots in Human and Chimp
- We want to compare the distribution of recombination hotspots in humans and chimpanzees.
13Mapping Hotspots between Human and Chimpanzee
- Each chimpanzee SNP, together with its surrounding 24 bases, is BLAST-aligned to the human genome.
- Only SNPs with a unique perfect match are kept.
- The two chimpanzee SNPs flanking the edges of the hotspot are taken.
- The average distance between the locations of these two SNPs in the chimpanzee and human genomes is computed.
- This average distance is added to the chimpanzee genome position to determine the location of the chimpanzee hotspot in the human genome.
[Figure labels: human, chimpanzee, hotspot in chimp, remapped hotspot in human]
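The remapping step can be sketched in a few lines. The function name `remap_hotspot` and the coordinates below are hypothetical; the sketch assumes each flanking SNP is given as a (chimpanzee position, human position) pair obtained from the BLAST alignment.

```python
def remap_hotspot(chimp_hotspot, flanking_snps):
    """Shift a chimpanzee hotspot into human coordinates.

    chimp_hotspot: (start, end) in chimpanzee coordinates.
    flanking_snps: (chimp_pos, human_pos) pairs for the SNPs flanking the hotspot.
    """
    # Per-SNP offset between the two genomes, then the average offset.
    offsets = [human - chimp for chimp, human in flanking_snps]
    mean_offset = sum(offsets) / len(offsets)
    start, end = chimp_hotspot
    return start + mean_offset, end + mean_offset


# Hypothetical coordinates: the two flanking SNPs are shifted by 500 and 600 bp.
print(remap_hotspot((1000, 3000), [(900, 1400), (3100, 3700)]))  # (1550.0, 3550.0)
```

Averaging the two offsets is a simple way to absorb small insertions or deletions between the flanking SNPs without aligning the whole hotspot.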
14Conserved Hotspots in Human and Chimp
- Only 3 of the 39 chimp hotspots (8%) overlap with hotspots in human.
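Counting shared hotspots reduces to an interval-overlap check once both sets are in the same coordinate system. The sketch below assumes half-open (start, end) intervals and uses hypothetical coordinates; the last line simply reproduces the arithmetic behind the reported percentage.

```python
def overlaps(a, b):
    """True if half-open intervals a = (start, end) and b = (start, end) intersect."""
    return a[0] < b[1] and b[0] < a[1]


def count_shared(chimp_hotspots, human_hotspots):
    """Number of chimpanzee hotspots that overlap at least one human hotspot."""
    return sum(
        any(overlaps(c, h) for h in human_hotspots) for c in chimp_hotspots
    )


# Hypothetical remapped chimpanzee hotspots and human hotspots.
chimp = [(0, 10), (20, 30), (50, 60)]
human = [(5, 8), (30, 40)]
print(count_shared(chimp, human))

# The reported figure: 3 shared out of 39 chimpanzee hotspots.
print(f"{3 / 39:.1%}")  # 7.7%
```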
15Simulation Validation
- Even if all chimpanzee hotspots were present in humans, we might not detect them all as shared.
- For each hotspot in chimpanzee, they simulate a corresponding hotspot in human.
- The hotspot position is drawn uniformly between 10 kb and 60 kb within a 70-kb window, using the same recombination rate.
- In 100 simulations, an average of 72% of these simulated hotspots are detected as shared, well above the observed 8%.
- We can reject the hypothesis that all hotspots in the chimpanzee are also present in the human (P < 0.01).
16Simulation Validation
- In each simulation we can compute the fraction of hotspots detected as shared.
- Simulation 1: 0.68
- Simulation 2: 0.9
- Simulation 3: 0.72
- ...
- Simulation 100: 0.45
- These values will approximately follow a normal distribution if we generate sufficiently many simulated data sets.
17The Normal Distribution
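- A continuous random variable X is said to have a normal distribution if its p.d.f. is
  $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty,$
- and this is commonly denoted by $X \sim N(\mu, \sigma^2)$.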
19Simulation Validation
- In each simulation we can compute the fraction of hotspots detected as shared.
- Simulation 1: 0.68
- Simulation 2: 0.9
- Simulation 3: 0.72
- ...
- Simulation 100: 0.45
- We then compute the mean and variance of these data.
- Mean = 0.72, variance = 1.2.
20Linear Transformation of the Normal Distribution
- We transform the observed value (8%) into the standard normal distribution.
- The expected value and variance of a $N(\mu, \sigma^2)$ distribution are $\mu$ and $\sigma^2$, respectively.
- $Z = (Y - \mu)/\sigma$ is $N(0,1)$ if $Y$ is $N(\mu, \sigma^2)$.
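The standardization can be sketched as follows. The list of simulated fractions is hypothetical (only a handful of illustrative values, not the 100 simulation results), and the function names are assumptions for this example. The standard normal CDF is obtained from the error function in the Python standard library.

```python
import math
from statistics import mean, pstdev


def normal_cdf(z):
    """CDF of the standard normal distribution N(0, 1)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lower_tail_p(observed, simulated):
    """Standardize `observed` against the simulated values; return (z, P(Z <= z))."""
    mu = mean(simulated)
    sigma = pstdev(simulated)  # population standard deviation of the simulations
    z = (observed - mu) / sigma
    return z, normal_cdf(z)


# Hypothetical fractions of hotspots detected as shared in a few simulations.
simulated = [0.68, 0.90, 0.72, 0.45, 0.80, 0.75, 0.66, 0.78]
z, p = lower_tail_p(0.08, simulated)
print(f"z = {z:.2f}, one-sided P = {p:.2e}")
```

A strongly negative z means the observed fraction lies far below what the simulations produce, which is the basis for rejecting the hypothesis that all chimpanzee hotspots are also present in humans.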
23Simulation Validation
- Can we reject the hypothesis that hotspots are distributed independently in human and chimpanzee?
- They randomly redistributed the human hotspots, preserving their lengths, and determined how many chimp hotspots still overlapped with them.
- In 1000 simulations, an average of 13% of human hotspots still overlap with chimp hotspots.
- We cannot reject this hypothesis (P = 0.9, compared with the observed 8%).
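A random-placement test of this kind can be sketched as below. Several details are assumptions made for illustration: hotspots are placed uniformly over a single linear region, overlap is measured as the fraction of chimpanzee hotspots hit, and the P value is taken as the share of simulations with at least as much overlap as observed. The function names and coordinates are hypothetical.

```python
import random


def overlaps(a, b):
    """True if half-open intervals a and b intersect."""
    return a[0] < b[1] and b[0] < a[1]


def shared_fraction(chimp, human):
    """Fraction of chimpanzee hotspots overlapping at least one human hotspot."""
    hits = sum(any(overlaps(c, h) for h in human) for c in chimp)
    return hits / len(chimp)


def random_placement_test(chimp, human, region_length, observed, n_sim=1000, seed=0):
    """Re-place human hotspots uniformly at random, keeping their lengths.

    Returns (mean simulated overlap fraction, share of simulations whose
    overlap fraction is >= observed).
    """
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_sim):
        shuffled = []
        for start, end in human:
            length = end - start
            new_start = rng.uniform(0, region_length - length)
            shuffled.append((new_start, new_start + length))
        fractions.append(shared_fraction(chimp, shuffled))
    p_value = sum(f >= observed for f in fractions) / n_sim
    return sum(fractions) / n_sim, p_value


# Hypothetical hotspots in a 1-Mb region.
chimp = [(100_000, 102_000), (400_000, 402_000), (800_000, 802_000)]
human = [(101_000, 103_000), (600_000, 602_000)]
mean_frac, p = random_placement_test(chimp, human, 1_000_000, shared_fraction(chimp, human))
print(mean_frac, p)
```

A large P value under this scheme means random placement produces at least the observed overlap most of the time, so the observed sharing gives no evidence against independence.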
24Conservation of Total Recombination Rates
- The total recombination rate (background rate plus hotspot rate) may still be conserved between the two species.
- They measure the correlation of total recombination rates in 210 pairs of 50-kb windows.
- The result indicates that total recombination rates are poorly correlated between the two species.
- r² = 0.216, P = 0.002.
- 100 simulated pairs of windows with the same rate suggest that r² should be 0.69, well above 0.216.
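The r² statistic here is the squared Pearson correlation between paired window rates. A minimal sketch follows; the function name and the rate values are hypothetical and serve only to show the computation.

```python
def r_squared(x, y):
    """Squared Pearson correlation coefficient between paired samples x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical recombination rates for four paired 50-kb windows.
human_rates = [1.0, 2.0, 3.0, 4.0]
chimp_rates = [1.0, -1.0, 1.0, -1.0]
print(r_squared(human_rates, chimp_rates))
```

An r² near 1 would indicate that windows with high rates in one species also have high rates in the other; the reported 0.216 is far below the value expected if the rates were identical.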
25Conservation of Background Recombination Rates
- They examined the correlation of background recombination rates in the same pairs of 50-kb windows.
- The result indicates that background recombination rates are also poorly correlated between the two species.
- r² = 0.276, P < 0.001.
- 100 simulated pairs of windows with the same rate suggest that r² should be 0.77.
26Concluding Remarks
- Recombination hotspots are not conserved between humans and chimpanzees.
- Overall recombination rates are only weakly conserved between humans and chimpanzees.