Title: How many genes? Mapping mouse traits, cont.
- Lecture 2B, Statistics 246
- January 22, 2004
2. Let's estimate the recombination fraction r
between D12Mit51 and D12Mit132
132 \ 51     A     H     B   Total
A           26    10     0      36
H           10    46     9      65
B            0     5    23      28
Total       36    61    32     129
2-locus genotypes at D12Mit51 and D12Mit132: 129
offspring from H × H, where A × B → H.
3. Estimation of r
- First note that we can't simply count
recombinants. Why? Because recombination can
occur in the paternal or the maternal meiosis, or
both, and all we see are the genotypes of the
offspring. In most cases, the parental origin of
the recombination can be inferred, but not in
every case.
- Denote the two markers by 1 and 2, the NOD
alleles by a, and the B6 alleles by b. Then the parental
haplotypes are a1a2 on one chromosome and b1b2
on the other. Each parent passes on a1a2 with
probability (1-r)/2, and similarly for b1b2,
while passing on each of the recombinant
haplotypes a1b2 and b1a2 with probability r/2.
- In practice, recombinations have slightly
different frequencies in male and female meioses,
but we ignore this refinement.
4. Probabilities of parentally transmitted haplotype
combinations (×4)
- Haplotype combinations resulting from
crossing doubly heterozygous parents, each a1/b1
at locus 1 and a2/b2 at locus 2. This table is
for coupling: the parental haplotypes are a1a2
and b1b2, i.e. the mother and father are both
a1a2/b1b2.
- Here P and M denote the paternally and
maternally transmitted haplotypes, respectively.
P \ M    a1a2       a1b2       b1a2       b1b2
a1a2     (1-r)^2    r(1-r)     r(1-r)     (1-r)^2
a1b2     r(1-r)     r^2        r^2        r(1-r)
b1a2     r(1-r)     r^2        r^2        r(1-r)
b1b2     (1-r)^2    r(1-r)     r(1-r)     (1-r)^2
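The Punnett square can be rebuilt from the single-parent transmission probabilities. The following is a minimal sketch, assuming both parents are in coupling phase (a1a2/b1b2) and share the same r; the function names are illustrative and not part of the lecture. Exact fractions are used so that each entry, multiplied by 4, can be compared directly with the tabulated term.

```python
from fractions import Fraction

HAPLOTYPES = ["a1a2", "a1b2", "b1a2", "b1b2"]

def transmission_probs(r):
    """Probability that one coupling-phase parent transmits each haplotype."""
    return {
        "a1a2": (1 - r) / 2,  # parental haplotype
        "a1b2": r / 2,        # recombinant haplotype
        "b1a2": r / 2,        # recombinant haplotype
        "b1b2": (1 - r) / 2,  # parental haplotype
    }

def punnett_square(r):
    """Joint probability of each (paternal, maternal) haplotype pair.

    The two meioses are treated as independent, so each cell is a product.
    """
    p = transmission_probs(r)
    return {(pat, mat): p[pat] * p[mat] for pat in HAPLOTYPES for mat in HAPLOTYPES}

if __name__ == "__main__":
    r = Fraction(1, 10)  # an arbitrary illustrative value of r
    square = punnett_square(r)
    for pat in HAPLOTYPES:
        # Print the row scaled by 4, as in the slide's table
        print(pat, [str(4 * square[(pat, mat)]) for mat in HAPLOTYPES])
```

The 16 cells sum to 1 for any r, which is a quick check that no haplotype pair has been left out.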
5. From the Punnett square to the table of
2-locus genotype probabilities
- Terms in the Punnett square can be summed
to build up a table of probabilities for the 9
different 2-locus genotypes.
- For example, we observe A (a1/a1) at locus 1
and H (a2/b2) at locus 2 if and only if the
paternally and maternally transmitted haplotypes are the
pairs (a1a2, a1b2) or (a1b2, a1a2), and this occurs
with a combined probability of 2r(1-r)/4.
- The other terms are built up similarly, the
most complex case being the 2-locus genotype HH,
where 4 different terms need to be considered,
corresponding to the fact that a double
heterozygote can result from 4 different
combinations of parental or recombinant
haplotypes.
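The summation described above can be carried out mechanically. This is a sketch only: haplotypes are encoded here as two-letter strings (for example "ab" means allele a at locus 1 and allele b at locus 2), which is an encoding chosen for this example rather than the lecture's notation.

```python
from collections import defaultdict
from fractions import Fraction

def genotype(allele_pat, allele_mat):
    """Single-locus genotype code (A, H or B) from the two transmitted alleles."""
    if allele_pat == allele_mat:
        return "A" if allele_pat == "a" else "B"
    return "H"

def two_locus_probs(r):
    """Sum the 16 Punnett-square terms into the 9 two-locus genotype probabilities."""
    hap_prob = {"aa": (1 - r) / 2, "ab": r / 2, "ba": r / 2, "bb": (1 - r) / 2}
    probs = defaultdict(int)
    for pat, p_pat in hap_prob.items():
        for mat, p_mat in hap_prob.items():
            g1 = genotype(pat[0], mat[0])  # genotype at locus 1
            g2 = genotype(pat[1], mat[1])  # genotype at locus 2
            probs[(g1, g2)] += p_pat * p_mat
    return dict(probs)

if __name__ == "__main__":
    r = Fraction(1, 5)  # an arbitrary illustrative value of r
    for cell, p in sorted(two_locus_probs(r).items()):
        print(cell, p)
```

The HH entry collects four terms, two from pairs of parental haplotypes and two from pairs of recombinant haplotypes, so it comes out as (2r^2 + 2(1-r)^2)/4.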
6. Probabilities of 2-locus genotypes (×4)
L1 \ L2   A          H                    B
A         (1-r)^2    2r(1-r)              r^2
H         2r(1-r)    2r^2 + 2(1-r)^2      2r(1-r)
B         r^2        2r(1-r)              (1-r)^2
Looking at this table, we see that the number of
recombinations can be inferred (though not the
parent in which each occurred) in all but the HH
case. We can almost count recombinants.
7. Estimation of r, cont.
- Using the table of probabilities we can write
down a log-likelihood function for any set of
2-locus frequencies.
- Label the cells of the table 1, ..., 9, and denote
the corresponding probabilities by $p_1(r), \ldots, p_9(r)$
and the frequencies by $n_1, \ldots, n_9$. Then the
log-likelihood for the resulting multinomial
model is
$$\log L(r) = \sum_{i=1}^{9} n_i \log p_i(r).$$
- The parameter r is then estimated by maximizing
this function, and an approximate standard error
or confidence interval is obtained using the Fisher
information or the asymptotic chi-square
approximation.
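As an illustration, the log-likelihood can be maximized numerically for the D12Mit51 / D12Mit132 counts. This sketch makes several assumptions that are not in the lecture: cells are labeled 1 to 9 row by row in the genotype table, the search is restricted to 0 < r <= 1/2, the maximizer is a golden-section search (one choice among many, and it presumes a single peak on that interval), and the standard error comes from a numerical second difference standing in for the observed information.

```python
import math

# Counts from the D12Mit51 / D12Mit132 table, cells 1..9 read row by row
COUNTS = [26, 10, 0, 10, 46, 9, 0, 5, 23]

def cell_probs(r):
    """Probabilities p_1(r), ..., p_9(r): the tabulated terms divided by 4."""
    p0 = (1 - r) ** 2 / 4                       # no recombinant haplotypes
    p1 = 2 * r * (1 - r) / 4                    # exactly one recombinant haplotype
    p2 = r ** 2 / 4                             # two recombinant haplotypes
    phh = (2 * r ** 2 + 2 * (1 - r) ** 2) / 4   # HH: zero or two, not distinguishable
    return [p0, p1, p2, p1, phh, p1, p2, p1, p0]

def log_lik(r, counts=COUNTS):
    """Multinomial log-likelihood, up to a constant not involving r."""
    return sum(n * math.log(p) for n, p in zip(counts, cell_probs(r)) if n > 0)

def maximize(f, lo=1e-6, hi=0.5, tol=1e-10):
    """Golden-section search for a maximum; assumes f is unimodal on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c                 # maximum lies in [a, d]
            c = b - g * (b - a)
        else:
            a, c = c, d                 # maximum lies in [c, b]
            d = a + g * (b - a)
    return (a + b) / 2

r_hat = maximize(log_lik)

# Observed information via a central second difference, then an approximate SE
h = 1e-4
info = -(log_lik(r_hat + h) - 2 * log_lik(r_hat) + log_lik(r_hat - h)) / h ** 2
se = 1 / math.sqrt(info)

if __name__ == "__main__":
    print("r_hat =", r_hat, "approx. SE =", se)
```

Because the table of probabilities is symmetric, it does not matter which marker indexes the rows and which the columns.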
8. A frill: the M-step of an EM-algorithm
- The function log L(r) can be maximized in a
number of ways, but in general there is no closed
form expression for the maximum likelihood
estimate $\hat{r}$. If we were able to decompose the
count $n_5$ of HHs into the $n_{5P}$ that are pairs of
parental haplotypes and the $n_{5R}$ that are pairs of
recombinant haplotypes, with probabilities proportional
to $(1-r)^2$ and $r^2$, respectively, the recombinant
haplotypes could then be counted directly and the MLE
would be
$$\hat{r} = \frac{2(n_3 + n_7 + n_{5R}) + n_2 + n_4 + n_6 + n_8}{2n},$$
where $n = n_1 + \cdots + n_9$ is the total number of offspring.
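A short derivation of why counting recombinants gives the MLE once the HH count is split. It assumes the cells are labeled 1 to 9 row by row in the genotype table (so cell 5 is HH); the symbols R, for the total number of recombinant haplotypes among the 2n transmitted, and L_c, for the complete-data likelihood, are introduced here for convenience.

```latex
\begin{align*}
R &= 2(n_3 + n_7 + n_{5R}) + n_2 + n_4 + n_6 + n_8, \\
\log L_c(r) &= R \log r + (2n - R)\log(1 - r) + \text{const}, \\
\frac{d}{dr}\log L_c(r) &= \frac{R}{r} - \frac{2n - R}{1 - r} = 0
  \;\Longrightarrow\; \hat{r} = \frac{R}{2n}.
\end{align*}
```

Each offspring in a cell with probability proportional to r^2 contributes two recombinant haplotypes, each in a cell proportional to 2r(1-r) contributes one, and each in a cell proportional to (1-r)^2 contributes none, which is where the exponents R and 2n - R come from.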
9. The E-step
- In general we don't know $n_{5R}$, but can estimate
it using the following formula:
$$\hat{n}_{5R} = n_5 \, \frac{r^2}{r^2 + (1-r)^2}.$$
- In practice, we need a value of r to begin
with. We use it in the formula above to estimate $n_{5R}$,
then get the next value of $\hat{r}$ from the M-step, and
then iterate.
- Exercise: Prove the above formula, and that
the iteration is an instance of the EM-algorithm.
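The iteration can be sketched in a few lines. This example assumes the E-step takes the form n5R = n5 r^2 / (r^2 + (1-r)^2), which is what the split of the HH cell into 2r^2 and 2(1-r)^2 terms suggests, and that cells are labeled 1 to 9 row by row. The starting value, tolerance, and function names are choices made for this illustration. The counts are those of the D12Mit51 / D12Mit132 table.

```python
# Counts from the D12Mit51 / D12Mit132 table, cells 1..9 read row by row
COUNTS = [26, 10, 0, 10, 46, 9, 0, 5, 23]

def e_step(r, n5):
    """Expected number of HH offspring carrying two recombinant haplotypes."""
    return n5 * r ** 2 / (r ** 2 + (1 - r) ** 2)

def m_step(counts, n5r):
    """Recombinant haplotypes counted directly, out of 2n transmitted haplotypes."""
    n1, n2, n3, n4, n5, n6, n7, n8, n9 = counts
    n = sum(counts)
    return (2 * (n3 + n7 + n5r) + n2 + n4 + n6 + n8) / (2 * n)

def em(counts, r=0.25, tol=1e-10, max_iter=1000):
    """Alternate E- and M-steps until successive values of r agree to within tol."""
    for _ in range(max_iter):
        r_new = m_step(counts, e_step(r, counts[4]))
        if abs(r_new - r) < tol:
            return r_new
        r = r_new
    return r

r_hat = em(COUNTS)

if __name__ == "__main__":
    print("EM estimate of r:", r_hat)
```

At convergence the estimate is a fixed point of the combined E- and M-step, so it should agree with a direct numerical maximization of log L(r).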
10. 2-locus genotype frequencies for D12Mit132 and
D13Mit6
132 \ 6      A     H     B   Total
A           10    21     7      38
H           15    29    17      61
B            5    21     6      32
Total       30    71    30     131
Exercise: Estimate r for these two loci. Is it
different from 1/2?