Title: Linkage and LOD score
Manuel AR Ferreira
Massachusetts General Hospital
Harvard Medical School
Boston
Egmond, 2006
Outline
1. Aim
2. The Human Genome
3. Principles of Linkage Analysis
4. Parametric Linkage Analysis
Practical
5. Nonparametric Linkage Analysis
1. Aim
QTL mapping
LOCALIZE and then IDENTIFY a locus that regulates a trait (QTL).
QTL: a nucleotide or sequence of nucleotides with variation in the population, with different variants associated with different trait levels.
For a heritable trait...
Linkage: localize the region of the genome where a QTL that regulates the trait is likely to be harboured.
Family-specific phenomenon: affected individuals in a family share the same ancestral predisposing DNA segment at a given QTL.
Association: identify a QTL that regulates the trait.
Population-specific phenomenon: affected individuals in a population share the same ancestral predisposing DNA segment at a given QTL.
2. Human Genome
DNA structure
A DNA molecule is a linear backbone of alternating sugar residues and phosphate groups.
Attached to carbon atom 1 of each sugar is a nitrogenous base: A, C, G or T.
Two DNA strands are held together in anti-parallel fashion by hydrogen bonds between bases (Watson-Crick rules): the antiparallel double helix.
A gene is a segment of DNA which is transcribed to give a protein or RNA product.
Only one strand is read during gene transcription.
Nucleotide = 1 phosphate group + 1 sugar + 1 base
DNA polymorphisms
Microsatellites: >100,000. Many alleles, e.g. (CA)n repeats; very informative, evenly distributed, easily automated.
SNPs: 11,961,761 (11 Sept 06). Most with 2 alleles (up to 4); not very informative, evenly distributed, easily automated.
DNA organization
[Figure: chr1 with loci A and B followed through mitosis. Two haploid gametes (22 + 1 chromosomes each) form a one-cell diploid zygote, 2 x (22 + 1), which passes through G1, S and M phases to give a diploid zygote of >1 cell.]
DNA recombination
[Figure: meiosis in a diploid gamete precursor cell, 2 x (22 + 1), shown for loci A and B on chr1. The haploid gamete precursors give haploid gametes (22 + 1) that are either recombinant (R) or non-recombinant (NR) for A and B.]
DNA recombination between linked loci
[Figure: the same meiosis for two linked loci A and B. Diploid gamete precursor, 2 x (22 + 1), to haploid gamete precursors to haploid gametes (22 + 1); all gametes shown are non-recombinant (NR).]
Human Genome - summary
DNA is a linear sequence of nucleotides partitioned into 23 chromosomes. There are two copies of each chromosome (2 x 22 autosomes + XY), from paternal and maternal origins. During meiosis in gamete precursors, recombination can occur between maternal and paternal homologs.
Recombination fraction between loci A and B (θ): proportion of gametes produced that are recombinant for A and B.
If A and B are very far apart: 50% R, 50% NR, so θ = 0.5.
If A and B are very close together: <50% R, so 0 ≤ θ < 0.5.
Recombination fraction (θ) can be converted to genetic distance (cM).
Haldane: e.g. θ = 0.17, cM = 20.8
Kosambi: e.g. θ = 0.17, cM = 17.7
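The two conversions quoted above can be reproduced with the standard Haldane and Kosambi map functions. The sketch below is illustrative only; the function names are assumptions, not part of the slides.

```python
import math

def haldane_cm(theta):
    """Haldane map distance in cM (assumes no crossover interference)."""
    if not 0 <= theta < 0.5:
        raise ValueError("theta must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * theta)

def kosambi_cm(theta):
    """Kosambi map distance in cM (allows for some interference)."""
    if not 0 <= theta < 0.5:
        raise ValueError("theta must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * theta) / (1.0 - 2.0 * theta))

print(round(haldane_cm(0.17), 1))  # 20.8
print(round(kosambi_cm(0.17), 1))  # 17.7
```

Both functions give 0 cM at θ = 0 and grow without bound as θ approaches 0.5, which is why θ = 0.5 is excluded.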
3. Principles of Linkage Analysis
Linkage Analysis requires genetic markers
[Figure: three panels, each showing a chromosome with markers M1, M2, ..., Mn and the recombination fraction (θ) between each marker and the trait locus Q. θ values range from 0.5 down to .15 and .1.]
Linkage Analysis: Parametric vs. Nonparametric
[Figure: diagram linking marker (M), trait locus (Q) and phenotype (Phe). Labels: Gene, Chromosome, Recombination, Mode of inheritance, Correlation, Genetic factors, Environmental factors, A, D, C, E.]
Adapted from Weiss & Terwilliger 2000
4. Parametric Linkage Analysis
Linkage with informative, phase-known meioses
Markers M1..6; trait locus alleles Q1,2. Autosomal dominant, Q1 is the predisposing allele.
Estimate θ between M and Q.
Grandparents: M2M5 Q2Q2 and M1M6 Q1Q?
Parents: M1M2 Q1Q2 (informative, phase known: M1Q1/M2Q2) and M3M4 Q2Q2
Offspring: M1Q1/M3Q2, M2Q2/M3Q2, M1Q1/M4Q2, M1Q1/M4Q2, M2Q2/M4Q2, M2Q1/M3Q2
Gametes of the informative parent: NR = M1Q1, M2Q2; R = M1Q2, M2Q1
θ_MQ = 1/6 = 0.17 (20.8 cM)
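The count behind θ = 1/6 can be checked mechanically. The sketch below takes the haplotype each offspring received from the informative parent (read off the pedigree above) and classifies it against the known phase M1Q1/M2Q2; the variable and function names are illustrative assumptions.

```python
# Phase of the informative parent, known from the grandparents: M1Q1 / M2Q2.
parental_haplotypes = {("M1", "Q1"), ("M2", "Q2")}

# Haplotype each of the six offspring received from the informative parent.
transmitted = [
    ("M1", "Q1"), ("M2", "Q2"), ("M1", "Q1"),
    ("M1", "Q1"), ("M2", "Q2"), ("M2", "Q1"),
]

def count_recombinants(gametes, parental):
    """Return (R, NR): gametes that differ from / match a parental haplotype."""
    nr = sum(1 for g in gametes if g in parental)
    return len(gametes) - nr, nr

r, nr = count_recombinants(transmitted, parental_haplotypes)
theta_hat = r / (r + nr)
print(r, nr, round(theta_hat, 2))  # 1 5 0.17
```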
Linkage with informative, phase-unknown meioses
Grandparents: Q2Q2 and Q1Q?
Parents: M1M2 Q1Q2 (informative, phase unknown: M1Q1/M2Q2 or M1Q2/M2Q1) and M3M4 Q2Q2
Offspring: M1Q1/M3Q2, M2Q2/M3Q2, M1Q1/M4Q2, M1Q1/M4Q2, M2Q2/M4Q2, M2Q1/M3Q2
Gamete | N | Phase M1Q1/M2Q2 (class, P) | Phase M1Q2/M2Q1 (class, P)
M1Q1 | 3 | NR, 1-θ | R, θ
M2Q2 | 2 | NR, 1-θ | R, θ
M1Q2 | 0 | R, θ | NR, 1-θ
M2Q1 | 1 | R, θ | NR, 1-θ
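When phase is unknown, the usual approach is to average the likelihood over the two possible phases, each with prior probability 1/2. The sketch below does this for the gamete counts in the table (3, 2, 0, 1). It follows the table's convention of writing gamete probabilities as θ and 1-θ, dropping the constant factor of 1/2 per gamete, which cancels in any likelihood ratio. Function names are assumptions.

```python
def likelihood_phase_known(theta, r, nr):
    """Likelihood of r recombinant and nr non-recombinant gametes."""
    return theta ** r * (1.0 - theta) ** nr

def likelihood_phase_unknown(theta, n_a, n_b):
    """Average over both phases.

    n_a: gametes that are NR under phase 1 (and R under phase 2).
    n_b: gametes that are R under phase 1 (and NR under phase 2).
    """
    phase1 = likelihood_phase_known(theta, r=n_b, nr=n_a)
    phase2 = likelihood_phase_known(theta, r=n_a, nr=n_b)
    return 0.5 * phase1 + 0.5 * phase2

# Counts from the table: M1Q1 = 3 and M2Q2 = 2; M1Q2 = 0 and M2Q1 = 1.
n_a, n_b = 3 + 2, 0 + 1
for theta in (0.1, 1 / 6, 0.3, 0.5):
    print(round(theta, 3), likelihood_phase_unknown(theta, n_a, n_b))
```

At θ = 0.5 both phases give the same value, 0.5 to the power of the number of gametes, which is the null likelihood used in a LOD score.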
Parametric LOD score calculation
Overall LOD score for a given θ is the sum of all family LOD scores at θ
e.g. LOD = 3 for θ = 0.28
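As a minimal sketch of this calculation: the LOD score of a family is log10 of the likelihood at θ divided by the likelihood at θ = 0.5, and family LODs add. The code uses the standard likelihood θ^R (1-θ)^NR for phase-known families and the two-phase average for phase-unknown ones. The two "families" below simply reuse the 1 R / 5 NR counts from the earlier pedigree as an illustration; all names are assumptions.

```python
import math

def family_likelihood(theta, r, nr, phase_known=True):
    """Likelihood of r recombinants and nr non-recombinants in one family."""
    direct = theta ** r * (1 - theta) ** nr
    if phase_known:
        return direct
    swapped = theta ** nr * (1 - theta) ** r  # the alternative phase
    return 0.5 * (direct + swapped)

def family_lod(theta, r, nr, phase_known=True):
    """log10 likelihood ratio of theta against free recombination (0.5)."""
    return math.log10(family_likelihood(theta, r, nr, phase_known)
                      / family_likelihood(0.5, r, nr, phase_known))

def total_lod(theta, families):
    """Overall LOD at theta: the sum of the family LODs."""
    return sum(family_lod(theta, *fam) for fam in families)

families = [
    (1, 5, True),   # 1 R, 5 NR, phase known
    (1, 5, False),  # same counts, phase unknown
]
for theta in (0.05, 1 / 6, 0.3, 0.5):
    print(round(theta, 2), round(total_lod(theta, families), 3))
```

The phase-unknown family contributes less evidence than the phase-known one at the same θ, and the total is exactly 0 at θ = 0.5.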
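Parametric Linkage Analysis - summary
[Figure: markers M1, M2, ..., Mn along a chromosome with the estimated θ between each marker and the trait locus Q, ranging from 0.5 down to .1.]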
For each marker, estimate the θ that yields the highest LOD score across all families.
This θ (and the LOD) will depend upon the mode of inheritance (MOI) assumed: the MOI determines the genotype at the trait locus Q and thus determines the number of meioses which are recombinant or nonrecombinant. Limited to Mendelian diseases.
Markers with a significant parametric LOD score (>3) are said to be linked to the trait locus with recombination fraction θ.
Practical: what is the most likely θ between M and Q?
Parents: M1M2 Q1Q1 and M3M4 Q1Q2
Offspring: M2M3 Q1Q1, M1M4 Q1Q2, M1M4 Q1Q1, M2M4 Q1Q2, M2M4 Q1Q2
1. Identify informative individual with offspring in the pedigree
2. Reconstruct possible phases of that individual and of all offspring
3. Classify the gametes that individual produces as R or NR
4. Count the number of R and NR gametes effectively produced
5. Express
6. Express LOD score
Practical: answers
1. Identify informative individual with offspring in the pedigree: M3M4 Q1Q2 (the other parent is M1M2 Q1Q1)
2. Reconstruct possible phases of that individual and all offspring: M3Q1/M4Q2 or M3Q2/M4Q1; offspring M2Q1/M3Q1, M1Q1/M4Q2, M1Q1/M4Q1, M2Q1/M4Q2, M2Q1/M4Q2
3. Classify the gametes that individual produces as R or NR
4. Count the number of R and NR gametes effectively produced
Gamete | N | Phase M3Q1/M4Q2 (class, P) | Phase M3Q2/M4Q1 (class, P)
M3Q1 | 1 | NR, 1-θ | R, θ
M4Q2 | 3 | NR, 1-θ | R, θ
M3Q2 | 0 | R, θ | NR, 1-θ
M4Q1 | 1 | R, θ | NR, 1-θ
5. Express
6. Express LOD score
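One way to finish the practical numerically is a grid search: evaluate the phase-unknown LOD over a range of θ values and keep the maximum. This is a sketch using the counts from the answers (1, 3, 0, 1), not necessarily the route intended in the session; the helper name and the grid step are assumptions.

```python
import math

def lod_phase_unknown(theta, n_a, n_b):
    """LOD when n_a gametes are NR under one phase and n_b under the other."""
    lik = 0.5 * (theta ** n_b * (1 - theta) ** n_a
                 + theta ** n_a * (1 - theta) ** n_b)
    return math.log10(lik / 0.5 ** (n_a + n_b))

# Gamete counts from the answers: M3Q1 = 1, M4Q2 = 3, M3Q2 = 0, M4Q1 = 1.
n_a, n_b = 1 + 3, 0 + 1

# theta = 0 is skipped: with one recombinant the likelihood there is 0.
grid = [i / 1000 for i in range(1, 501)]
best_theta = max(grid, key=lambda t: lod_phase_unknown(t, n_a, n_b))
best_lod = lod_phase_unknown(best_theta, n_a, n_b)
print(best_theta, round(best_lod, 3))
```

The maximum falls close to θ = 0.2 with a LOD far below 3, as expected from a single family with five phase-unknown meioses.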
5. Nonparametric Linkage Analysis
Approach
Parametric: marker locus genotypes and trait locus genotypes (the latter inferred from phenotype according to a specific disease model). Parameter of interest: θ between marker and trait loci.
Nonparametric: marker locus genotypes and phenotype. If a trait locus truly regulates the expression of a phenotype, then two relatives with similar phenotypes should have similar genotypes at a marker in the vicinity of the trait locus, and vice-versa. Interest: correlation between phenotypic similarity and marker genotypic similarity.
No need to specify mode of inheritance, allele frequencies, etc.
Phenotypic similarity between relatives
Squared trait differences
Squared trait sums
Trait cross-product
Trait variance-covariance matrix
Affection concordance
[Figure: trait values of two relatives, axes T1 and T2]
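The first three measures can be written down directly for a pair of relatives with trait values T1 and T2. The sketch below assumes the sums and cross-product are taken on mean-centred traits (a common convention, stated here as an assumption); the function names and the example values are illustrative.

```python
def squared_difference(t1, t2):
    """(T1 - T2)^2: large when the relatives are phenotypically dissimilar."""
    return (t1 - t2) ** 2

def squared_sum(t1, t2, mean=0.0):
    """((T1 - mean) + (T2 - mean))^2 on mean-centred traits."""
    return ((t1 - mean) + (t2 - mean)) ** 2

def cross_product(t1, t2, mean=0.0):
    """(T1 - mean)(T2 - mean): positive when both deviate in the same direction."""
    return (t1 - mean) * (t2 - mean)

t1, t2 = 1.5, -0.5  # hypothetical centred trait values for one pair
print(squared_difference(t1, t2), squared_sum(t1, t2), cross_product(t1, t2))
```

For centred traits the three are linked by cross-product = (squared sum - squared difference) / 4, so the cross-product combines the information in the other two.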
Genotypic similarity between relatives
IBS: alleles shared Identical By State "look the same" and may have the same DNA sequence, but they are not necessarily derived from a known common ancestor.
IBD: alleles shared Identical By Descent are a copy of the same ancestral allele.
[Figure: parents M1M2 (Q1Q2) and M3M3 (Q3Q4); both offspring are M1M3 at the marker, with Q1Q3 and Q1Q4 at the linked locus. The sibs share 2 alleles IBS but 1 allele IBD. Inheritance vector (M): 0 0 0 1.]
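Genotypic similarity between relatives
Number of alleles shared IBD, proportion of alleles shared IBD and inheritance vector (M) for three sib pairs:
Sib pair (marker; linked locus) | Inheritance vector (M) | Number IBD | Proportion IBD
M1M3 (Q1Q3) and M2M3 (Q2Q4) | 0 0 1 1 | 0 | 0
M1M3 (Q1Q3) and M1M3 (Q1Q4) | 0 0 0 1 | 1 | 0.5
M1M3 (Q1Q3) and M1M3 (Q1Q3) | 0 0 0 0 | 2 | 1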
Genotypic similarity between relatives
[Figure: panels A, B, C and D]
2^(2n)
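IBD sharing for a sib pair can be read directly from an inheritance vector. The sketch below assumes a particular encoding, (paternal sib 1, maternal sib 1, paternal sib 2, maternal sib 2), where each entry is 0 or 1 for which of that parent's two alleles was transmitted; this encoding and the function names are assumptions. It also enumerates all 2^(2n) vectors for n = 2 non-founders.

```python
from itertools import product

def ibd_count(vector):
    """Number of alleles a sib pair shares IBD under the assumed encoding."""
    pat1, mat1, pat2, mat2 = vector
    return int(pat1 == pat2) + int(mat1 == mat2)

def ibd_proportion(vector):
    """Proportion of alleles shared IBD: 0, 0.5 or 1."""
    return ibd_count(vector) / 2

for v in [(0, 0, 1, 1), (0, 0, 0, 1), (0, 0, 0, 0)]:
    print(v, ibd_count(v), ibd_proportion(v))

# All inheritance vectors for a sib pair (n = 2 non-founders): 2^(2n) = 16.
all_vectors = list(product((0, 1), repeat=4))
sharing = [sum(1 for v in all_vectors if ibd_count(v) == k) for k in (0, 1, 2)]
print(len(all_vectors), sharing)  # 16 [4, 8, 4]
```

With all 16 vectors equally likely, the counts 4, 8, 4 give the familiar prior sib-pair IBD probabilities of 1/4, 1/2 and 1/4.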
Statistics that incorporate both phenotypic and genotypic similarities
[Figure: phenotypic similarity plotted against genotypic similarity (0, 0.5, 1)]
Haseman-Elston regression: quantitative traits
[Figure: phenotypic (dis)similarity regressed on genotypic similarity (0, 0.5, 1); labels b and c]
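In the original Haseman-Elston approach, the squared trait difference of each sib pair is regressed on the proportion of alleles the pair shares IBD at the marker; a negative slope suggests linkage. The sketch below fits that line by ordinary least squares on a small made-up data set. The numbers are hypothetical and only illustrate the mechanics; names are assumptions.

```python
def ols(x, y):
    """Simple least-squares fit y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical sib pairs: proportion IBD and squared trait difference.
pi_hat = [0.0, 0.0, 0.5, 0.5, 0.5, 0.5, 1.0, 1.0]
sq_diff = [4.1, 3.5, 2.2, 2.9, 1.8, 2.5, 0.9, 1.3]

slope, intercept = ols(pi_hat, sq_diff)
print(round(slope, 2), round(intercept, 2))  # -2.7 3.75
```

Pairs sharing more alleles IBD have smaller squared differences here, so the fitted slope is negative, the pattern expected near a trait locus.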
VC ML method: quantitative and categorical traits
[Figure: genotypic similarity (0, 0.5, 1) with models H1 and H0 compared]
e.g. LOD = 3
Genome-wide linkage analysis (e.g. VC)
Individual LOD scores can be expressed as P values (pointwise)
LOD | Chi-sq (n-df) | P value
2.1 | 9.67 (LOD x 4.6) | 0.0009
[Figure: LOD score profile; labels: True positive, Type I error, k, LOD]
Theoretical (Lander & Kruglyak 1995): LOD = 3.6, Chi-sq = 16.7, P = 0.000022
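The conversion in the table can be sketched as follows. The factor 4.6 is 2 ln(10). The P value below assumes 1 degree of freedom and a one-sided test (half the chi-square tail), an assumption that reproduces the 0.0009 quoted for LOD = 2.1. Function names are illustrative.

```python
import math

def lod_to_chisq(lod):
    """Chi-square equivalent of a LOD score: 2 * ln(10) * LOD (about 4.6 * LOD)."""
    return 2.0 * math.log(10.0) * lod

def chisq_to_pointwise_p(chisq):
    """One-sided pointwise P value, assuming 1 df: half the chi-square tail."""
    return 0.5 * math.erfc(math.sqrt(chisq / 2.0))

chisq = lod_to_chisq(2.1)
print(round(chisq, 2), round(chisq_to_pointwise_p(chisq), 4))  # 9.67 0.0009
```

The erfc identity holds only for 1 df; statistics with more degrees of freedom would need a general chi-square survival function.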
Nonparametric Linkage Analysis - summary
No need to specify mode of inheritance.
Models phenotypic and genotypic similarity of relatives.
Requires an expression of phenotypic similarity and the calculation of IBD.
HE and VC are the most popular statistics used for linkage of quantitative traits.
Other statistics are available, especially for affection traits.