Title: Genetics of Complex Diseases
1Human Genetic Variation
- Genetics of Complex Diseases
2Challenges
3Challenge 2: Correcting genotyping errors
- How can we detect genotyping errors?
- Hardy-Weinberg Equilibrium
- If we have mother-father-child trios, we can check
Mendelian consistency.
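Both checks can be sketched in a few lines. This is a minimal illustration, assuming a biallelic SNP summarized by its three genotype counts (for Hardy-Weinberg) and genotypes written as two-character strings such as "AG" (for trios). The function names and example values are hypothetical, not taken from the slides.

```python
from math import erfc, sqrt

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test (1 d.f.) against Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = erfc(sqrt(chi2 / 2))       # upper tail of chi-square with 1 d.f.
    return chi2, p_value

def mendelian_consistent(mother, father, child):
    """True if the child can receive one allele from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)

# A SNP with no heterozygotes at all is a strong hint of a genotyping problem.
print(hwe_chi_square(50, 0, 50))
# A "GG" child is impossible if the mother is "AA".
print(mendelian_consistent("AA", "AG", "GG"))
```

A SNP that fails either check in many samples is usually dropped before association testing; the p-value threshold for doing so is a study-level choice.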
4Challenge 3: Population Substructure
- Imagine that all the cases are collected from
Africa, and all the controls are from Europe.
- Many association signals are going to be found
- The vast majority of them are false
Why?
Different evolutionary forces: drift, selection,
mutation, migration, population bottlenecks.
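A small worked example shows why such signals are false. It is a sketch under stated assumptions: a hypothetical SNP with no effect on disease whose allele frequency is 0.6 in one population and 0.4 in the other, tested with a standard allelic chi-square on a 2x2 table. The counts are made up for illustration.

```python
def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 d.f.) for a 2x2 table of allele counts."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    num = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    den = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
           * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    return num / den

# 1000 cases (2000 alleles) all from population 1 (frequency 0.6),
# 1000 controls all from population 2 (frequency 0.4).
confounded = allelic_chi_square(1200, 800, 800, 1200)

# Same SNP, but cases and controls are each drawn half from each
# population, so the expected frequency is 0.5 in both groups.
matched = allelic_chi_square(1000, 1000, 1000, 1000)

print(confounded, matched)  # 160.0 0.0
```

A statistic of 160 is far beyond the 3.84 cutoff for p = 0.05 at 1 d.f., even though the SNP has nothing to do with the disease. The signal comes entirely from ancestry.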
5Shaping Genetic Variation
- Mutations add to genetic variation
- Natural selection controls the frequency of
certain traits and alleles
- Genetic drift
6Ancestral population
7Ancestral population
- Migration
8Ancestral population
- Genetic drift
- Different allele frequencies
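How drift alone produces different allele frequencies can be illustrated with a Wright-Fisher simulation. This is a minimal sketch, assuming pure drift (no selection, mutation, or migration) and a constant population size. The population size, number of generations, and seeds are arbitrary illustrative choices.

```python
import random

def wright_fisher(p0, pop_size, generations, seed):
    """Allele-frequency trajectory of one population under pure drift."""
    rng = random.Random(seed)
    n_alleles = 2 * pop_size             # diploid population
    freqs = [p0]
    for _ in range(generations):
        # Each allele copy in the next generation is drawn at random
        # from the current generation's allele pool.
        copies = sum(rng.random() < freqs[-1] for _ in range(n_alleles))
        freqs.append(copies / n_alleles)
    return freqs

# Two populations split from one ancestral population (frequency 0.5)
# and then drift independently.
pop_1 = wright_fisher(0.5, pop_size=100, generations=200, seed=1)
pop_2 = wright_fisher(0.5, pop_size=100, generations=200, seed=2)
print(f"after 200 generations: {pop_1[-1]:.2f} vs {pop_2[-1]:.2f}")
```

Smaller populations drift faster, which is one reason a bottleneck leaves a visible mark on allele frequencies.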
9Population Substructure
- Imagine that all the cases are collected from
Africa, and all the controls are from Europe.
- Many association signals are going to be found
- The vast majority of them are false
What can we do about it?
10Ancestry Inference
- To what extent can population structure be
detected from SNP data?
- What can we learn from these inferences?
- Can we build the tree of life?
- How do we analyze complex (mixed) populations?
Novembre et al., Nature, 2008
11Principal Component Analysis
- Dimensionality reduction
- Based on linear algebra
- Intuition: find the most important features of
the data.
12Principal Component Analysis
Plot the data on the one-dimensional line along
which the spread is maximized.
13Principal Component Analysis
- In our case, we want to look at two dimensions at
a time.
- The original data points have many dimensions:
each SNP corresponds to one dimension.
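The idea can be sketched with NumPy on toy data. The example assumes genotypes are coded as 0/1/2 allele counts in an individuals-by-SNPs matrix and uses a plain SVD of the centered matrix. The two simulated populations, their allele frequencies (0.1 and 0.9), and the function name are illustrative assumptions, not HapMap data or any specific tool's method.

```python
import numpy as np

def top_principal_components(genotypes, k=2):
    """Project individuals onto the k directions of largest spread.

    genotypes: (individuals x SNPs) array of 0/1/2 allele counts.
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)               # center each SNP (one dimension per SNP)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * S[:k]              # coordinates of each individual

# Toy data: two populations with very different allele frequencies.
rng = np.random.default_rng(0)
n_per_pop, n_snps = 20, 200
pop_a = rng.binomial(2, 0.1, size=(n_per_pop, n_snps))
pop_b = rng.binomial(2, 0.9, size=(n_per_pop, n_snps))

coords = top_principal_components(np.vstack([pop_a, pop_b]), k=2)
print(coords[:3])    # first individuals of population A
print(coords[-3:])   # last individuals of population B
```

Plotting the first column against the second gives the usual two-dimensional PCA picture. In this toy setup the first component alone separates the two groups.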
14Data Available
15 International consortium that aims to genotype
the genomes of 270 individuals from four different
populations.
16- Launched in 2002.
- First phase (2005)
  - 1 million SNPs for 270 individuals from four
  populations
- Second phase (2007)
  - 3.1 million SNPs for 270 individuals from four
  populations
- Third phase (ongoing)
  - More than 1 million SNPs for 1115 individuals
  across 11 populations
17HapMap Populations
MKK
LWK
YRI
GIH
ASW
JPT
MEX
CHB
CHD
CEU
TSI
18HapMap PCA 1-2
19HapMap PCA 1-3
20HapMap PCA 1,2,4
21Lessons from the HapMap
- African populations have higher genetic diversity
than other populations
- Evidence for bottlenecks or founder effects in the
other populations
- Evidence for the out-of-Africa theory
- HapMap was used to detect
  - Common deletions across the genome
  - Regions under selection
  - Recombination rates, hotspots
  - Associations of SNPs with disease
22Example: detection of deletions using SNPs
Conrad et al., Nature Genetics, 2006
23Example: detection of deletions using SNPs
- Conrad et al. applied the method to the HapMap
data and found:
- Typical individuals have roughly 30-50 deletions
larger than 5kb (500kb-750kb total sequence
length).
- Deletions tend to be gene-poor.
- The deletions detected in the HapMap span 267
known and predicted genes.
- Deletions were found to be related to different
conditions such as schizophrenia (Stefansson et
al., 2008), lupus glomerulonephritis (Aitman et
al., Nature, 2006), and others.
24Distribution of deletion length
Conrad et al., Nature Genetics, 2006
25Significant Region
- Why do we have differences between data1 and
data2?
- How come so many SNPs seem to be associated in
this region?
- Maybe there are multiple causal SNPs?
- Or maybe there are correlations between the
SNPs?
26Linkage Disequilibrium
27Linkage Disequilibrium
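Correlation between two SNPs is commonly summarized by the r^2 statistic. The sketch below assumes phased haplotypes written as strings of 0/1 alleles and that both sites are polymorphic (otherwise the denominator is zero). The function name and the toy haplotypes are illustrative.

```python
def ld_r2(haplotypes, i, j):
    """r^2 between sites i and j, from phased haplotypes coded as 0/1 strings."""
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n
    p_b = sum(h[j] == "1" for h in haplotypes) / n
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ab - p_a * p_b                 # deviation from independence
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# The two alleles always travel together: perfect correlation.
print(ld_r2(["11", "11", "00", "00"], 0, 1))  # 1.0
# All four combinations are equally common: no correlation.
print(ld_r2(["11", "10", "01", "00"], 0, 1))  # 0.0
```

When r^2 is close to 1, a non-causal SNP shows nearly the same association signal as the causal SNP next to it, which is one explanation for a whole region appearing significant.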
28Haplotypes vs. Genotypes
Haplotypes
ATCCGA AGACGC
- Cost-effective genotyping technology gives
genotypes, not haplotypes.
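What is lost when only genotypes are observed can be shown on the two haplotypes above. This is a minimal sketch in which a genotype is modelled as the unordered pair of alleles at each site. The helper names are hypothetical.

```python
from itertools import product

def genotype(h1, h2):
    """Unordered alleles at each site: which haplotype carried which is lost."""
    return [frozenset((a, b)) for a, b in zip(h1, h2)]

def complement(site, allele):
    """The other allele at a heterozygous site, or the same one if homozygous."""
    others = site - {allele}
    return next(iter(others)) if others else allele

def compatible_haplotype_pairs(geno):
    """All unordered haplotype pairs that would produce this genotype."""
    pairs = set()
    for choice in product(*[sorted(site) for site in geno]):
        h1 = "".join(choice)
        h2 = "".join(complement(site, a) for site, a in zip(geno, choice))
        pairs.add(frozenset((h1, h2)))
    return pairs

geno = genotype("ATCCGA", "AGACGC")
pairs = compatible_haplotype_pairs(geno)
print(len(pairs))  # 4
for pair in sorted(sorted(p) for p in pairs):
    print(pair)
```

The genotype has three heterozygous sites, so 2^(3-1) = 4 haplotype pairs explain it equally well. Recovering the true pair from genotypes is the phasing problem.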
29Haplotypes cluster naturally
30Haplotypes cluster naturally