Title: Disease Gene Discovery Using Linkage Analysis
- Scott Williams, Ph.D.
- Center for Human Genetics Research
- Ph.D. Program in Human Genetics
Disease Gene Discovery is a Young, but Successful, Science
- 1966 MIM
- 1487 known genetic traits
- 0.7% mapped
- 1973 HGM-I
- Several chromosomes with NO genes at all!
Disease Gene Discovery is an Exploding Science
- 2003 OMIM
- 14,340 established genetic loci
- 2205 disease genes mapped
- 15% mapped
- 2003 NCBI
- 26,115 genes mapped to all chromosomes
- Average 8.6 genes/Mb (range 3.22-26.73)
- Average 116 kb/gene (range 37-310)
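The 2003 figures can be cross-checked with simple arithmetic. Below is a minimal sketch that assumes a genome size of about 3,000 Mb, the figure used for the "haystack" comparison. The quoted averages may have been computed per chromosome or with a slightly different genome size, so the genome-wide values here differ a little.

```python
# Cross-check of the 2003 OMIM/NCBI figures quoted on the slide.
# Assumption: total genome size of ~3,000 Mb.
GENOME_MB = 3_000

established_loci = 14_340
disease_genes_mapped = 2_205
genes_mapped = 26_115

percent_mapped = 100 * disease_genes_mapped / established_loci
genes_per_mb = genes_mapped / GENOME_MB
kb_per_gene = GENOME_MB * 1_000 / genes_mapped

print(f"disease genes mapped / established loci: {percent_mapped:.1f}%")  # 15.4%
print(f"{genes_per_mb:.1f} genes/Mb, {kb_per_gene:.0f} kb/gene")          # 8.7, 115
```

The result of about 15.4% agrees with the "15% mapped" figure. About 8.7 genes/Mb and 115 kb/gene are close to the quoted averages.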
- Haystack: 640 cubic yards ≈ human genome: 3,000 Mb
- Needle: 1/100 cubic inch ≈ one base pair: $1 \times 10^{-6}$ Mb
It really is like finding a needle in a haystack! (And a very BIG haystack, at that.)
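The analogy works because the two size ratios nearly match. Here is a minimal sketch that checks this using only the numbers on the slide, taking $1 \times 10^{-6}$ Mb as one base pair.

```python
# Compare the haystack analogy with the genome: both are ~3 billion to 1.
CUBIC_INCHES_PER_CUBIC_YARD = 36 ** 3  # 46,656

haystack_in3 = 640 * CUBIC_INCHES_PER_CUBIC_YARD
needle_in3 = 1 / 100
genome_mb = 3_000
target_mb = 1e-6  # one base pair

haystack_ratio = haystack_in3 / needle_in3
genome_ratio = genome_mb / target_mb

print(f"haystack/needle:  {haystack_ratio:.2e}")  # 2.99e+09
print(f"genome/base pair: {genome_ratio:.2e}")    # 3.00e+09
```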
CLASSES OF HUMAN GENETIC DISEASE
- Diseases of simple genetic architecture
- Inheritance of the trait in a family follows a recognizable pattern
- One gene per family
- Often called Mendelian disease
- Causative gene
What is the pattern of inheritance in this family?
CLASSES OF HUMAN GENETIC DISEASE
- Diseases of complex genetic architecture
- No clear pattern of inheritance
- Moderate to strong evidence of being inherited
- Common in the population: cancer, heart disease, dementia, asthma, etc.
- Involves many genes, or genes and environment
- Susceptibility genes
Designing a Genetic Study
- Define the phenotype
- Can sub-phenotypes be defined (genetic heterogeneity)? For example, early vs. late age of onset
- Genetic heterogeneity
- Locus: different genes give similar, but possibly non-identical, phenotypes
- Allelic: different mutations within a gene give similar, but possibly non-identical, phenotypes
Prior to Designing a Genetic Study
- Is the phenotype genetic?
- Is there a measurable heritable component?
- Mode of inheritance?
- Is the disease dominant or recessive?
- Autosomal or sex-linked?
Defining the Genetic Component
- Twin studies: monozygotic vs. dizygotic twins
- Adoption studies: sibs raised apart
- Monozygotic twins vs. dizygotic twins
- Monozygotic twins should be more similar than dizygotic twins
- Twins raised apart share traits if the traits are genetic
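A common way to turn the MZ vs. DZ comparison into a number is Falconer's approximation, which does not appear on the slide: $h^2 \approx 2(r_{MZ} - r_{DZ})$. MZ twins share all of their genes and DZ twins share half on average. Doubling the difference in twin correlations therefore approximates the additive genetic share of trait variance. This is a minimal sketch, and the correlations are made-up illustrative values, not study data.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Falconer's approximation: h^2 = 2 * (r_MZ - r_DZ).

    r_mz and r_dz are trait correlations within MZ and DZ twin pairs.
    """
    if not (-1.0 <= r_mz <= 1.0 and -1.0 <= r_dz <= 1.0):
        raise ValueError("correlations must lie in [-1, 1]")
    return 2.0 * (r_mz - r_dz)


# Hypothetical twin correlations for a quantitative trait
h2 = falconer_heritability(r_mz=0.8, r_dz=0.5)
print(f"estimated heritability: {h2:.2f}")  # 0.60
```

If MZ and DZ twins are equally similar, the estimate is zero. That matches the slide's logic that a genetic trait should make MZ twins more alike.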
If genetic, two approaches:
- Random screen: linkage approach
- Candidate gene approach
Approaches to Disease Gene Discovery
- Mendelian disease serves as the paradigm
- Clean mode of inheritance: a single-gene defect causes disease
- First successes with Mendelian diseases
- Huntington Disease (1984-1993)
- Cystic Fibrosis (1986-1989)
- Developed the paradigm for future work
- Initial localization by mapping to marker loci randomly distributed throughout the genome
Linkage Analysis
- Linkage analysis in humans is based on counting recombinants and non-recombinants, similar to the process in experimental animals (e.g., mouse, fly). However, in humans we face additional challenges:
- long generation time
- inability to control matings
- inability to control study participation
- inability to dictate key exposures and environmental conditions
- small family size
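When phase is known, each informative meiosis can be scored as recombinant (R) or non-recombinant (NR). Counting then gives the simplest estimate of the recombination fraction, $\hat{\theta} = R/(R+NR)$. This is a minimal sketch, and the function name and counts are hypothetical.

```python
def recombination_fraction(recombinants: int, nonrecombinants: int) -> float:
    """Estimate theta as the proportion of scored meioses that are recombinant.

    Assumes phase is known, so every informative meiosis is scored R or NR.
    """
    total = recombinants + nonrecombinants
    if total == 0:
        raise ValueError("need at least one informative meiosis")
    return recombinants / total


# e.g. 1 recombinant among 10 informative meioses
theta_hat = recombination_fraction(recombinants=1, nonrecombinants=9)
print(theta_hat)  # 0.1
```

With small human families, counts like these are tiny. That is one reason the likelihood and lod-score machinery is used to weigh the evidence instead of a raw proportion.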
Linkage Analysis in Humans
- What is the likelihood of obtaining the observed data given linkage vs. non-linkage between the marker and the disease-causing locus?
- The closer the linkage, the less likely a recombinant
- Unlinked: 50% recombination ($\theta = 0.5$)
- Null hypothesis: the marker is unlinked to the disease gene
Assigning Genotype From Phenotype
[Pedigree figure. Disease-locus genotypes, row by row: NA, NN / NA, NN / NA, NN, NN, NA / NN, NA, NN. A = abnormal allele, N = normal allele]
Definitions
- Linkage: the co-segregation of two or more loci
- Marker locus with alleles 1 or 2, and dominant disease allele D
[Pedigree figure. Marker genotypes, row by row: 12, 22 / 12, 22 / 12, 22, 22, 12 / 22, 12, 22]
Phase of Marker and Disease Locus
- Phase: the pattern of inheritance of alleles at different loci from a parent
- Can be known if the grandparents are known
Definitions
[Pedigree figure with the same marker genotypes as above, now showing the affected parent's phase as 1D/2d: marker allele 1 is on the same chromosome as disease allele D, and allele 2 is with normal allele d]
No recombination
Recombination
[Figure: recombinant chromosomes]
Phase of Marker and Disease Locus
- Phase: the pattern of inheritance of alleles at different loci from parents
- If the alleles come together from one parent, they are in phase
- Knowing the phase, we can determine the probability, or likelihood, of the family structure given linkage or no linkage
Definitions
- Knowing the phase, we can determine the probability, or likelihood, of the family structure
[Pedigree figure: marker genotypes as above, affected parent's phase 1D/2d, and each of the seven informative offspring meioses labeled $(1-\theta)$]
What is the chance that the marker and disease locus are co-inherited?
Likelihood in this family is $(1-\theta)^7$. If $\theta = 0$, then the likelihood is 1.
- Requirement: the marker must be heterozygous in the individual with the KEY meiosis
- For a rare dominant disease, the affected parent is the key individual
[Pedigree figure. Marker genotypes, row by row: 12, 22 / 22, 12 / 22, 22, 22, 22 / 22, 22, 22]
- Requirement: the marker must be heterozygous in the individual with the KEY meiosis
- For a rare dominant disease, the affected parent is the key individual
- For a recessive disease with one affected parent, the unaffected parent is the key individual
Evidence for Linkage
- What is the likelihood of linkage vs. no linkage in a given pedigree?
- Substitute into the likelihood equation the recombination fraction ($\theta$) vs. 0.5 for no linkage
$$\text{ODDS} = \frac{\text{Likelihood of data if marker locus and disease locus are linked at recombination fraction } \theta}{\text{Likelihood of data if loci are unlinked, i.e. recombination fraction } = 0.50}$$
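Likelihood Analysis
The likelihood ratio (LR) is constructed as
$$LR = \frac{L(\text{pedigree} \mid \theta = x)}{L(\text{pedigree} \mid \theta = 0.50)}, \quad \text{where } x \le 0.49.$$
In our example, $LR = (1-\theta)^7 / 0.5^7$.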
Assessing the Chance of Linkage Between a Marker and Disease Locus
- Keep the denominator constant and substitute in different values of $\theta$
- The value of $\theta$ that gives the largest ratio is the best estimator of the genetic distance between the marker and disease locus, because it has the best odds
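The procedure above can be sketched as a grid search. The denominator $L(\text{pedigree} \mid \theta = 0.50)$ stays fixed while $\theta$ varies from 0.00 to 0.49. This is a minimal sketch that assumes phase-known data, where the likelihood is $\theta^{R}(1-\theta)^{NR}$. The function names and the 0.01 grid step are illustrative choices.

```python
def likelihood(theta: float, recombinants: int, nonrecombinants: int) -> float:
    """Phase-known likelihood of the pedigree: theta^R * (1 - theta)^NR."""
    return theta ** recombinants * (1 - theta) ** nonrecombinants


def likelihood_ratio(theta: float, recombinants: int, nonrecombinants: int) -> float:
    """L(pedigree | theta) / L(pedigree | 0.50); the denominator is constant."""
    return (likelihood(theta, recombinants, nonrecombinants)
            / likelihood(0.5, recombinants, nonrecombinants))


# Slide example: 7 meioses, no recombinants -> LR = (1 - theta)^7 / 0.5^7
grid = [i / 100 for i in range(50)]  # theta = 0.00, 0.01, ..., 0.49
best_theta = max(grid, key=lambda t: likelihood_ratio(t, 0, 7))
print(best_theta, likelihood_ratio(best_theta, 0, 7))  # 0.0 128.0
```

With no recombinants, the odds are largest at $\theta = 0$, where $LR = 2^7 = 128$.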
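LOD (Log of the Odds) Analysis
$$LR = \frac{L(\text{pedigree} \mid \theta = x)}{L(\text{pedigree} \mid \theta = 0.50)}, \quad \text{where } x \le 0.49$$
In our example, $LR = (1-\theta)^7 / 0.5^7$.
Take the log of this ratio: $z(\theta) = \log_{10} LR = 7\log_{10}(1-\theta) - 7\log_{10}(0.50)$.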
WHY LOGARITHMS?
- Note that the numbers for the likelihoods can be very small
- Inheritance of disease: 0.000015578
- Inheritance of disease and marker: 0.000000009347
- These are very small numbers and hard to look at (too many 0s!)
- Logs eliminate that problem
- $\log_{10}(0.000015578) = -4.81$
- $\log_{10}(0.000000009347) = -8.03$
- When including data from many pedigrees, add the logs
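Independent pedigrees multiply their likelihood ratios, so their lod scores add. This minimal sketch reproduces the two logarithms on the slide and then checks the additivity. The per-pedigree likelihood ratios are hypothetical.

```python
import math

# Likelihoods quoted on the slide
p_disease = 0.000015578
p_disease_and_marker = 0.000000009347

print(round(math.log10(p_disease), 2))             # -4.81
print(round(math.log10(p_disease_and_marker), 2))  # -8.03

# Hypothetical likelihood ratios from three independent pedigrees
pedigree_lrs = [128.0, 40.0, 0.5]
combined_lod = sum(math.log10(lr) for lr in pedigree_lrs)

# Adding logs is the same as taking the log of the product of the ratios
assert math.isclose(combined_lod, math.log10(math.prod(pedigree_lrs)))
print(round(combined_lod, 2))  # 3.41
```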
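Linkage Phase Known: Dominant Disease Model
[Three-generation pedigree (generations I-III). Marker genotypes: generation I: 11, 22; generation II: 12, 22; generation III (ten offspring, numbered 1-10), as listed: 22, 22, 12, 12, 12, 22, 22, 22, 22, 12. Marker allele 2 and disease allele D are in phase.]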
Linkage Phase Known: Dominant Disease Model
[Same pedigree, with each of the ten offspring scored as non-recombinant (NR, probability $1-\theta$) or recombinant (R, probability $\theta$): 9 NR and 1 R.]
Likelihood $= \theta(1-\theta)^9$
Linkage Phase Known: Dominant Disease Model
[Same pedigree: 9 non-recombinants (NR) and 1 recombinant (R).]
Ratio is $\theta(1-\theta)^9 / (0.5)^{10}$
$$z(\theta) = \log_{10}\theta + 9\log_{10}(1-\theta) - 10\log_{10}(0.50)$$

θ       0.01   0.05   0.10   0.15   0.20   0.30   0.40
z(θ)    0.97   1.51   1.60   1.55   1.44   1.09   0.62
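The lod table can be reproduced directly from $z(\theta)$. This minimal sketch evaluates the formula at the tabulated values of $\theta$ for the phase-known pedigree with 1 recombinant in 10 meioses.

```python
import math


def lod(theta: float) -> float:
    """z(theta) = log10(theta) + 9*log10(1 - theta) - 10*log10(0.50)."""
    return math.log10(theta) + 9 * math.log10(1 - theta) - 10 * math.log10(0.5)


thetas = [0.01, 0.05, 0.10, 0.15, 0.20, 0.30, 0.40]
for t in thetas:
    print(f"theta = {t:.2f}  z = {lod(t):.2f}")
```

The maximum falls at $\theta = 0.10$ (1 recombinant out of 10), with $z = 1.60$. The lod is undefined at $\theta = 0$ here because one recombinant was observed.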
Evaluating Lod Scores
A) If $z(\theta) \ge 3.0$, conclude significant evidence for linkage ($10^3$, i.e. 1000:1 odds in favor of linkage).
B) If $z(\theta) \le -2.0$, conclude significant evidence for non-linkage ($10^{-2}$, i.e. 100:1 odds against linkage).
C) If $-2.0 < z(\theta) < 3.0$, collect more data.
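The three rules translate directly into a small classifier. This is a minimal sketch, and the function name is hypothetical.

```python
def interpret_lod(z: float) -> str:
    """Apply the conventional lod-score thresholds (3.0 and -2.0)."""
    if z >= 3.0:
        return "significant evidence for linkage (odds >= 1000:1)"
    if z <= -2.0:
        return "significant evidence against linkage (odds >= 100:1 against)"
    return "inconclusive: collect more data"


# The maximum lod in the phase-known dominant example is about 1.60
print(interpret_lod(1.60))  # inconclusive: collect more data
```

Because lods from independent pedigrees add, an inconclusive family can still contribute to a combined score that crosses 3.0.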
Some Factors that can Affect Linkage Analysis
- Misspecification of parameters
- Genetic model (e.g., dominant vs. recessive)
- Frequency of sporadic cases (phenocopies)
- Scoring errors
- Incorrect trait phenotype
- Incorrect marker genotype
- Incorrect family relationships
- Linkage heterogeneity
Role of Heterogeneity Among Families Collected
- Genetic
- Different inheritance patterns for the same trait
- Retinitis pigmentosa
- 6 X-linked loci
- 12 autosomal dominant (AD) loci
- 8 autosomal recessive (AR) loci
- Locus
- Different genes leading to the same trait
- Breast cancer
- Allelic
- Different alleles at the same locus leading to different phenotypes
- FGFR3
- ACHONDROPLASIA
- THANATOPHORIC DYSPLASIA
- CROUZON SYNDROME WITH ACANTHOSIS NIGRICANS
Locus Heterogeneity
Breast Cancer Mapping
MULTIPOINT LINKAGE ANALYSIS
- Uses multiple markers together
- Uses (or generates) multiple estimates of $\theta$
- Can provide good estimates of location
- Very sensitive to:
- Incorrect specification of marker order
- Genotyping errors
- Locus heterogeneity
Multipoint Lod Score
[Figure: marker loci A, B and C, with recombination fractions $\theta_1$ and $\theta_2$]
Exercise
- Analyze as a dominant disease
- Analyze as a recessive disease
Association Studies of Disease Using Unrelated Samples
Goal of a Case-Control Association Study
- Identify genes and/or alleles that cause or predispose to disease
Association Studies
- Detected by a differential distribution of markers in the case and control groups
- A risk-increasing allele/genotype will be more common in the disease group
- A risk-decreasing allele/genotype will be more common in the normal/control group
Causative Allele
- Sickle cell disease
- β chain: glutamate-6-valine
[Figure: individuals sampled from the population, each labeled SC or N]
Causative Allele: Allelic Association
Causative Allele: Genotypic Association (Recessive Model)
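Test for Allelic Association
$$\chi^2 = \frac{(AD - BC)^2 N}{(A+B)(C+D)(A+C)(B+D)}$$
where A, B, C and D are the four cell counts of the 2 × 2 table of allele counts (A, B in cases; C, D in controls) and $N = A + B + C + D$.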
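The allelic test can be computed with the standard library alone. This is a minimal sketch with an assumed cell layout: the first row holds case alleles (A = allele 1, B = allele 2) and the second row holds control alleles (C, D). The p-value uses the 1-degree-of-freedom identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$. The counts are hypothetical.

```python
import math


def allelic_chi_square(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """2x2 chi-square: (AD - BC)^2 N / ((A+B)(C+D)(A+C)(B+D)).

    Assumed layout: row 1 = case alleles (a, b), row 2 = control alleles (c, d).
    Returns (chi2, p) for 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = (a * d - b * c) ** 2 * n / denom
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical allele counts: 100 case alleles and 100 control alleles
chi2, p = allelic_chi_square(a=30, b=70, c=10, d=90)
print(f"chi2 = {chi2:.2f}, p = {p:.2g}")
```

Allele 1 is three times as frequent in cases here (30% vs. 10%), which gives $\chi^2 = 12.5$.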
Susceptibility Alleles
- True of most common diseases with genetic risk
- Cancers, heart disease, asthma, diabetes, etc.
- The association is not complete, even if you know the mode of inheritance (and usually you do not)
- Penetrance of alleles is incomplete
- May increase or decrease risk, but not absolutely
- e.g., HIV susceptibility and CCR5 variants
How to Study Genetics of Disease Using Association
- As with linkage:
- Characterize the phenotype
- Collect samples and clinical data
Two Basic Approaches
- Candidate gene
- Base selection of genes on known biological function
- e.g., angiotensinogen for blood pressure and hypertension
- Genome scan
- Assume no knowledge and scan markers across the entire genome
- 10,000,000 validated SNPs in the human genome
Association Studies
- Whole genome association (Not hypothesis driven)
- Use random markers throughout genome
- Similar to linkage in that no bias assumed
- Candidate gene association (Hypothesis driven)
- Choose genes on the basis of known
physiology/function
Genome Scan Association
- Similar to a linkage analysis scan
- Do not use a priori knowledge; scan markers throughout the genome and look for significant associations
- Can do thousands to millions of markers
- Technology: 1 cent per genotype
Direct or Indirect Association
- Direct
- Identify and study the functional variant
- Indirect
- Study a variant in linkage disequilibrium with the functional variant
Why can't you be sure that an association identifies your gene?
- Linkage/linkage disequilibrium (LD)
- What is LD?
- Nonrandom association between markers
- e.g., SNP1 has alleles a or c, with f(a) = 0.5; SNP2 has alleles t or g, with f(t) = 0.3
- Expect a and t together 15% of the time if in equilibrium ($0.5 \times 0.3 = 0.15$)
- What if SNP1 is next to a disease marker, such that the presence of allele a marks the disease phenotype?
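The SNP1/SNP2 example can be extended to quantify LD. This minimal sketch uses the standard measures $D = p_{at} - p_a p_t$ and $r^2 = D^2 / [p_a(1-p_a)\,p_t(1-p_t)]$, which are not defined on the slide. The allele frequencies come from the slide. The observed haplotype frequency of 0.25 is a hypothetical value.

```python
# Linkage disequilibrium between SNP1 (a/c) and SNP2 (t/g).
f_a = 0.5  # frequency of allele a at SNP1 (from the slide)
f_t = 0.3  # frequency of allele t at SNP2 (from the slide)

expected_at = f_a * f_t  # a-t haplotype frequency expected at equilibrium
print(f"expected a-t haplotype frequency: {expected_at:.2f}")  # 0.15

# Hypothetical observed a-t haplotype frequency
observed_at = 0.25
d = observed_at - expected_at                       # D = p_at - p_a * p_t
r2 = d ** 2 / (f_a * (1 - f_a) * f_t * (1 - f_t))  # r^2 = D^2 / (p_a q_a p_t q_t)
print(f"D = {d:.2f}, r^2 = {r2:.2f}")  # D = 0.10, r^2 = 0.19
```

A nonzero D means allele a travels with allele t more often than chance predicts. This is why an associated SNP may only mark, rather than be, the causal variant.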
Linkage Disequilibrium
Hispanic
Loehmueller et al., in press
261 kb
Haines et al., 2005