Title: Genetics of Complex Diseases
1Human Genetic Variation
- Genetics of Complex Diseases
2The Human Genome Project - Goals
- Determine the sequences of the 3 billion base
pairs that make up human DNA
3The Human Genome Project - Goals
- Determine the sequences of the 3 billion chemical
base pairs that make up human DNA
- Improve tools for data analysis
4The Human Genome Project
What we are announcing today is that we have
reached a milestone, that is, covering the genome
in a working draft of the human sequence.
But our work previously has shown that having
one genetic code is important, but it's not all
that useful.
I would be willing to make a prediction that
within 10 years, we will have the potential of
offering any of you the opportunity to find out
what particular genetic conditions you may be at
increased risk for.
Washington, DC, June 26, 2000
5The Vision of Personalized Medicine
Genetic and epigenetic variants, together with
measurable environmental/behavioral factors, would
be used for personalized treatment and diagnosis.
6Example Warfarin
An anticoagulant drug, useful in the prevention
of thrombosis.
7Example Warfarin
Warfarin was originally used as rat poison.
Optimal dose varies across the population.
Genetic variants (VKORC1 and CYP2C9) affect the
variation of the personalized optimal dose.
8Association Studies
- Studying complex diseases by comparing cases
to controls
9Where should we look first?
SNP: Single Nucleotide Polymorphism
person 1 ...AAGCTAAATTTG...
person 2 ...AAGCTAAGTTTG...
person 3 ...AAGCTAAGTTTG...
person 4 ...AAGCTAAATTTG...
person 5 ...AAGCTAAGTTTG...
Most common SNPs have only two possible alleles.
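As a minimal sketch, the five example sequences above can be scanned column by column to locate the polymorphic position and count its alleles. The function name `find_snps` is illustrative and not from the slides, and the sketch assumes the sequences are already aligned and of equal length.

```python
from collections import Counter

# The five example sequences from the slide (flanking dots omitted).
sequences = [
    "AAGCTAAATTTG",  # person 1
    "AAGCTAAGTTTG",  # person 2
    "AAGCTAAGTTTG",  # person 3
    "AAGCTAAATTTG",  # person 4
    "AAGCTAAGTTTG",  # person 5
]

def find_snps(seqs):
    """Return {position: Counter of alleles} for every column that varies.

    Assumes all sequences are aligned and have the same length.
    """
    snps = {}
    for pos, column in enumerate(zip(*seqs)):
        counts = Counter(column)
        if len(counts) > 1:  # more than one allele observed at this position
            snps[pos] = counts
    return snps

snps = find_snps(sequences)
for pos, counts in snps.items():
    print(f"position {pos} (0-based): {dict(counts)}")
```

On these five sequences the scan reports a single variable site with two alleles, A (two people) and G (three people), which is what a bi-allelic SNP looks like in raw sequence data.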
10Disease Association Studies
Cases
AGAGCAGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCGGTAGAGCCGTGAGATCGACATGATAGCC
AGAGCCGTCGACATGTATAGTCTACATGAGATCGACATGAGATCGGTAGAGCAGTGAGATCGACATGATAGTC
AGAGCAGTCGACAGGTATAGTCTACATGAGATCGACATGAGATCGGTAGAGCCGTGAGATCGACATGATAGCC
AGAGCAGTCGACAGGTATAGCCTACATGAGATCAACATGAGATCGGTAGAGCAGTGAGATCGACATGATAGCC
AGAGCCGTCGACATGTATAGCCTACATGAGATCGACATGAGATCGGTAGAGCCGTGAGATCAACATGATAGCC
AGAGCCGTCGACATGTATAGCCTACATGAGATCGACATGAGATCGGTAGAGCAGTGAGATCAACATGATAGCC
AGAGCCGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCGGTAGAGCAGTGAGATCAACATGATAGTC
AGAGCAGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCGACATGATAGCC
Associated SNP
Controls
AGAGCAGTCGACATGTATAGTCTACATGAGATCGACATGAGATCGGTAGAGCAGTGAGATCAACATGATAGCC
AGAGCAGTCGACATGTATAGTCTACATGAGATCAACATGAGATCTGTAGAGCCGTGAGATCGACATGATAGCC
AGAGCAGTCGACATGTATAGCCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCAACATGATAGCC
AGAGCCGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCGACATGATAGTC
AGAGCCGTCGACAGGTATAGTCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCAACATGATAGCC
AGAGCAGTCGACAGGTATAGTCTACATGAGATCGACATGAGATCTGTAGAGCAGTGAGATCGACATGATAGCC
AGAGCCGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCGACATGATAGCC
AGAGCCGTCGACAGGTATAGTCTACATGAGATCAACATGAGATCTGTAGAGCAGTGAGATCGACATGATAGTC
11Genotyping technology
AGACTAACC. ACGAATCCT. GGACTTACC. GCACAACCT. GG
GATTAAC.
DNA
12Genotype technologies
- The cost of genotyping technologies has dropped
dramatically in the last decade.
- Genotyping one SNP for one individual cost more
than $1 at the beginning of the decade.
- The price is now 0.03 cents.
- Exponential growth: doubling every 10 months.
- Faster than Moore's law (doubling every 18
months).
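The two doubling times can be compared over a decade with a few lines of arithmetic. This is only a sketch: `fold_change` is an illustrative helper, and the price check assumes the starting price of "more than 1" means roughly one dollar (100 cents) per genotype.

```python
import math

def fold_change(months, doubling_months):
    """Fold improvement after `months` if performance doubles every `doubling_months`."""
    return 2 ** (months / doubling_months)

decade = 120  # months

genotyping = fold_change(decade, 10)  # doubling every 10 months
moore = fold_change(decade, 18)       # Moore's law: doubling every 18 months
print(f"genotyping: {genotyping:.0f}x, Moore's law: {moore:.0f}x")

# Assumption: the price fell from about 100 cents to 0.03 cents per genotype.
price_ratio = 100 / 0.03
implied_doubling = decade / math.log2(price_ratio)
print(f"price ratio: {price_ratio:.0f}x, implied doubling time: {implied_doubling:.1f} months")
```

Under that assumption the price drop corresponds to a doubling time of a little over 10 months, in line with the figure quoted on the slide, while an 18-month doubling time gives only about a hundredfold change over the same decade.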
13Public Genotype Data Growth
14Association Studies
- Genetic variants such as Single Nucleotide
Polymorphisms (SNPs) are tested for association
with the trait.
15Published Genome-Wide Associations through 6/2009
439 published GWA at p < 5 x 10^-8
NHGRI GWA Catalog, www.genome.gov/GWAStudies
17Preliminary Definitions
- SNP: single nucleotide polymorphism. A genetic
variant which may carry different alleles for
different individuals.
- Most SNPs are bi-allelic: there are only two
observed alleles in the population.
- Risk allele: the allele which is more common in
cases than in controls (denoted R).
- Nonrisk allele: the allele which is more common
in the controls (denoted N).
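A small sketch of how the R and N labels could be assigned from allele counts follows. The function names and the counts are hypothetical, and the sketch assumes a bi-allelic SNP with two allele copies counted per individual.

```python
def allele_frequencies(counts):
    """Convert allele counts, e.g. {"G": 70, "T": 30}, to frequencies."""
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def risk_and_nonrisk(case_counts, control_counts):
    """Label the two alleles of a bi-allelic SNP as (R, N).

    R is the allele that is more frequent in cases than in controls.
    If the frequencies are equal, the labels are arbitrary.
    """
    case_freq = allele_frequencies(case_counts)
    control_freq = allele_frequencies(control_counts)
    a, b = sorted(case_counts)  # assumes exactly two alleles
    if case_freq[a] - control_freq.get(a, 0.0) > 0:
        return a, b
    return b, a

# Hypothetical allele counts, for illustration only.
cases = {"G": 70, "T": 30}
controls = {"G": 40, "T": 60}
print(risk_and_nonrisk(cases, controls))
```

Here G is labelled R because its frequency is 0.7 in cases and 0.4 in controls; swapping the two groups swaps the labels.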
18Other Structural Variants
Inversion
Deletion
Copy number variant
19Chance or Real Association?
Cases
AGAGCAGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCGGTAGAGCCGTGAGATCGACATGATAGCC
AGAGCCGTCGACATGTATAGTCTACATGAGATCGACATGAGATCGGTAGAGCAGTGAGATCGACATGATAGTC
AGAGCAGTCGACAGGTATAGTCTACATGAGATCGACATGAGATCGGTAGAGCCGTGAGATCGACATGATAGCC
AGAGCAGTCGACAGGTATAGCCTACATGAGATCAACATGAGATCGGTAGAGCAGTGAGATCGACATGATAGCC
AGAGCCGTCGACATGTATAGCCTACATGAGATCGACATGAGATCGGTAGAGCCGTGAGATCAACATGATAGCC
AGAGCCGTCGACATGTATAGCCTACATGAGATCGACATGAGATCGGTAGAGCAGTGAGATCAACATGATAGCC
AGAGCCGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCGGTAGAGCAGTGAGATCAACATGATAGTC
AGAGCAGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCGACATGATAGCC
Associated SNP
Controls
AGAGCAGTCGACATGTATAGTCTACATGAGATCGACATGAGATCGGTAGAGCAGTGAGATCAACATGATAGCC
AGAGCAGTCGACATGTATAGTCTACATGAGATCAACATGAGATCTGTAGAGCCGTGAGATCGACATGATAGCC
AGAGCAGTCGACATGTATAGCCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCAACATGATAGCC
AGAGCCGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCGACATGATAGTC
AGAGCCGTCGACAGGTATAGTCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCAACATGATAGCC
AGAGCAGTCGACAGGTATAGTCTACATGAGATCGACATGAGATCTGTAGAGCAGTGAGATCGACATGATAGCC
AGAGCCGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCGACATGATAGCC
AGAGCCGTCGACAGGTATAGTCTACATGAGATCAACATGAGATCTGTAGAGCAGTGAGATCGACATGATAGTC
20Hypothesis testing
- We want to distinguish between two hypotheses:
- Null hypothesis: the allele frequency in the
cases and the controls is the same (the SNP has
nothing to do with the disease).
- Alternative hypothesis: the allele frequency in
the cases and in the controls is different (the
SNP is correlated with the disease).
- Intuitively, we want to ask how likely the
observed data would be if the null hypothesis
were true.
21How does it work?
- For every SNP we can construct a contingency
table.
- From the table we construct a statistic T.
- The probability that, under the null hypothesis,
we get T or a larger value is the p-value.
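The slide does not preserve the formula for the statistic, so the following is only a sketch that assumes T is the Pearson chi-square statistic of a 2x2 allele-count table with 1 degree of freedom. The function names and the counts are hypothetical, and every row and column total is assumed to be non-zero.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 table [[a, b], [c, d]].

    Rows: cases, controls. Columns: counts of allele R, allele N.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if allele frequency were equal in both groups.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

# Hypothetical allele counts, for illustration only.
table = [[60, 40],   # cases:    R, N
         [40, 60]]   # controls: R, N
T = chi_square_2x2(table)
print(f"T = {T:.2f}, p-value = {p_value_1df(T):.4f}")
```

For this table every expected count is 50, so T = 8 and the p-value is a little under 0.005. The closed form with `math.erfc` holds only for 1 degree of freedom; a larger table would need a general chi-square tail function.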
22Example
- For every SNP we can construct a contingency
table.
- T = 0.02.
- The p-value is 0.8875 (about an 88% chance of
getting T > 0.02).
23Example
- For every SNP we can construct a contingency
table.
- T = 11.11.
- The p-value is low: 0.001 = 10^-3.
24Example
- For every SNP we can construct a contingency
table.
- T = 83.33.
- The p-value is extremely low: 10^-19.
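The three T values in these examples can be turned into p-values directly. The sketch below assumes T follows a chi-square distribution with 1 degree of freedom under the null hypothesis, which the slides do not state explicitly but which agrees with the p-values they quote.

```python
import math

def p_value_1df(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

# T values taken from the three examples above.
for T in (0.02, 11.11, 83.33):
    print(f"T = {T:>5}: p-value = {p_value_1df(T):.3g}")
```

The results are roughly 0.89 for T = 0.02, just under 10^-3 for T = 11.11, and below 10^-19 for T = 83.33.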
25Results Manhattan Plots
26Challenges
27Challenge 1 Corrections of multiple testing
- In a typical Genome-Wide Association Study
(GWAS), we test millions of SNPs.
- If we set the p-value threshold for each test to
0.05, then by chance alone we expect about 5% of
the SNPs to appear associated with the disease.
- This needs to be corrected. Different statistical
methods are used.
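One widely used correction is the Bonferroni method, shown here as a sketch; the slides do not say which method they have in mind. The helper names and the figure of one million SNPs are illustrative.

```python
def expected_false_positives(num_tests, alpha):
    """Expected number of unassociated SNPs that pass a per-test threshold alpha."""
    return num_tests * alpha

def bonferroni_threshold(num_tests, family_alpha=0.05):
    """Per-test threshold that keeps the family-wise error rate at family_alpha."""
    return family_alpha / num_tests

num_snps = 1_000_000  # illustrative number of tested SNPs
print(expected_false_positives(num_snps, 0.05))
print(bonferroni_threshold(num_snps))
```

With one million tests, an uncorrected 0.05 threshold yields about 50,000 chance findings, while the Bonferroni threshold comes out at about 5 x 10^-8, the same cutoff quoted for the published genome-wide associations. Bonferroni is conservative when SNPs are correlated, which is one reason other methods are also used.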
28Challenge 2 Correcting genotyping errors
- How can we detect genotyping errors?
- Hardy-Weinberg Equilibrium
- If we have Mother-father-child trios we can check
Mendelian consistency.
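Both checks can be sketched in a few lines. The function names and data are hypothetical; the Hardy-Weinberg check assumes a bi-allelic SNP and uses a chi-square test with 1 degree of freedom, and the trio check assumes genotypes are written as two-character strings.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of genotype counts against Hardy-Weinberg proportions.

    Returns (statistic, p_value) for genotypes AA, AB, BB.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return stat, math.erfc(math.sqrt(stat / 2))

def mendelian_consistent(mother, father, child):
    """True if the child's two alleles can be inherited one from each parent."""
    c1, c2 = child
    return (c1 in mother and c2 in father) or (c2 in mother and c1 in father)

# Hypothetical data, for illustration only.
print(hwe_chi_square(25, 50, 25))  # counts in exact Hardy-Weinberg proportions
print(hwe_chi_square(50, 0, 50))   # no heterozygotes at all
print(mendelian_consistent("AA", "GG", "AG"))
print(mendelian_consistent("AA", "GG", "AA"))
```

A SNP whose genotype counts deviate strongly from Hardy-Weinberg proportions, such as the second example with no heterozygotes, or a trio in which the child carries an allele neither parent could have passed on, is a candidate genotyping error.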