Title: Reconstruction on trees and Phylogeny 2
1 Reconstruction on trees and Phylogeny 2
Elchanan Mossel, U.C. Berkeley, mossel_at_stat.berkeley.edu, http://www.cs.berkeley.edu/mossel/
Supported by Microsoft Research and the Miller Institute
2 Reconstruction on Ising-CFN model
- We study the reconstruction problem for the Ising-CFN model on regular trees.
3 Markov models on trees
- Finite set $A$ of information values.
- Tree $T = (V, E)$ rooted at $r$.
- Each vertex $v \in V$ has information $\sigma_v \in A$.
- Each edge $e = (v, u)$, where $v$ is the parent of $u$, has a mutation matrix $M^e$ of size $|A| \times |A|$:
- $M^{(v,u)}_{i,j} = P[\sigma_u = j \mid \sigma_v = i]$.
- For each character $\sigma$, we are given $\sigma_{\partial T} = (\sigma_v)_{v \in \partial T}$,
- where $\partial T$ is the boundary of the tree.
- We will focus on the Ising-CFN model.
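A minimal Python sketch of sampling from such a model, assuming the tree is given as a dict of child lists and each edge carries its own row-stochastic matrix. The names `sample_tree_markov` and `symmetric_matrix`, and the two-state flip probability `eps`, are hypothetical and introduced only for illustration.

```python
import random

def sample_tree_markov(children, root, M, root_dist, rng):
    """Sample a state for every vertex of a rooted tree.

    children:  dict vertex -> list of child vertices
    M:         dict (parent, child) -> matrix with M[e][i][j] = P[child = j | parent = i]
    root_dist: list of probabilities for the root state
    """
    state = {root: rng.choices(range(len(root_dist)), weights=root_dist)[0]}
    stack = [root]
    while stack:
        v = stack.pop()
        for u in children.get(v, []):
            row = M[(v, u)][state[v]]          # row of the mutation matrix for the parent's state
            state[u] = rng.choices(range(len(row)), weights=row)[0]
            stack.append(u)
    return state

def symmetric_matrix(eps):
    """Two-state symmetric mutation matrix: flip with probability eps (an assumed parametrization)."""
    return [[1 - eps, eps], [eps, 1 - eps]]
```

The boundary data for one character is then the restriction of the returned dict to the leaves, for example `{v: s for v, s in state.items() if v not in children}`.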
4 Statistical physics
- Statistical physics is a sub-field of mathematical physics where we study complex systems with simple microscopic interactions.
- The Ising model on a graph is a probability measure (Gibbs distribution) on the space of configurations $\sigma$ from vertices to $\{-1, 1\}$ such that
- $P[\sigma] \propto \exp\left(\sum_{(v,w) \in E} \sigma(v)\sigma(w)/T\right)$.
- Traditionally studied on cubes in $\mathbb{Z}^d$.
[Figure: The Ising model on a 200 x 200 grid]
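For a very small graph the Gibbs distribution can be written out by brute-force enumeration. The sketch below assumes the graph is given as a vertex list and an edge list; `ising_gibbs` is a hypothetical helper name, and the approach is only feasible for a handful of vertices.

```python
import itertools
import math

def ising_gibbs(vertices, edges, T):
    """Return {configuration tuple: probability} for the Ising model at temperature T."""
    weights = {}
    for spins in itertools.product((-1, 1), repeat=len(vertices)):
        s = dict(zip(vertices, spins))
        interaction = sum(s[v] * s[w] for v, w in edges)   # sum over edges of s(v) s(w)
        weights[spins] = math.exp(interaction / T)
    Z = sum(weights.values())                              # normalizing constant
    return {spins: w / Z for spins, w in weights.items()}
```

On a single edge, `ising_gibbs(["v", "w"], [("v", "w")], 1.0)` puts more mass on the two aligned configurations than on the two misaligned ones.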
5 Statistical physics on trees
- The Ising model on the binary tree can be defined as follows:
- Set $\sigma_r$, the root spin, to be +/- with probability 1/2.
- For all pairs of (parent, child) $(v, w)$, set $\sigma_w = \sigma_v$ with probability $\theta$; otherwise $\sigma_w$ = +/- with probability 1/2.
- This is exactly the CFN model.
- Studied in statistical physics: Spitzer 75, Higuchi 77, Bleher-Ruiz-Zagrebnov 95, Evans-Kenyon-Peres-Schulman 2000, Ioffe 99, M 98, Haggstrom-M 2000, Kenyon-M-Peres 2001, Martinelli-Sinclair-Weitz 2003, Martine 2003.
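The copy-or-resample rule above can be simulated level by level. This is a sketch under the assumption of a complete b-ary tree; `broadcast_cfn` is a hypothetical name and `theta` stands for the copying probability.

```python
import random

def broadcast_cfn(b, n, theta, root, rng):
    """Return the spins at level n of the b-ary tree, given the root spin (+1 or -1)."""
    level = [root]
    for _ in range(n):
        nxt = []
        for s in level:
            for _child in range(b):
                if rng.random() < theta:
                    nxt.append(s)                     # copy the parent's spin
                else:
                    nxt.append(rng.choice((-1, 1)))   # fresh uniform spin
        level = nxt
    return level
```

Under this rule each level-n spin agrees in expectation with the root by a factor `theta ** n`, which a Monte Carlo average over many calls should roughly reproduce.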
6 Reconstruction solvability
- Let $T$ be an infinite rooted tree and $T_n$ denote the first $n$ levels of $T$.
- We say that the reconstruction problem is solvable if one of the following equivalent conditions holds:
- $\exists \pi$ s.t. ($\forall$ non-degenerate $\pi$) $\lim_{n \to \infty} I(X_0, X_n) > 0$, where $\pi$ is the distribution of $X_0$, $I(X_0, X_n) = H(X_0) + H(X_n) - H(X_0, X_n)$, and $H$ is the entropy operator, $H(X) = -\sum_x P[X = x] \log_2 P[X = x]$.
- $\exists i, j$ s.t. $\lim_{n \to \infty} |P_n^i - P_n^j| > 0$, where $P_n^j$ denotes the distribution of $X_n$ conditional on $X_0 = j$.
- If $X_0$ has the uniform distribution, then $\liminf_{n \to \infty} \Delta_n > 1/m$, where $m = |A|$ and $\Delta_n$ is the probability of correct reconstruction of $X_0$ given $X_n$.
- $\exists \pi$ ($\forall$ non-degenerate $\pi$) $\liminf_{n \to \infty} \mathrm{Var}[E[X_0 \mid X_n]] > 0$.
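For the two-state copy-or-resample model on a small complete b-ary tree, the first three quantities can be computed exactly by enumerating all level-n configurations. The sketch assumes a uniform root and measures the distance between the two conditional laws as half the L1 distance; `leaf_likelihoods` and `solvability_quantities` are hypothetical names, and the enumeration is exponential in b^n.

```python
import itertools
import math

def leaf_likelihoods(leaves, b, n, theta):
    """Return (P[leaves | root = +1], P[leaves | root = -1]) on the depth-n b-ary tree."""
    if n == 0:
        return (1.0, 0.0) if leaves[0] == 1 else (0.0, 1.0)
    same = (1 + theta) / 2                 # P[child spin equals parent spin]
    size = b ** (n - 1)                    # leaves below each child of the root
    p_plus = p_minus = 1.0
    for k in range(b):
        cp, cm = leaf_likelihoods(leaves[k * size:(k + 1) * size], b, n - 1, theta)
        p_plus *= same * cp + (1 - same) * cm
        p_minus *= same * cm + (1 - same) * cp
    return p_plus, p_minus

def solvability_quantities(b, n, theta):
    """Return (I(X_0, X_n) in bits, total variation distance, probability of correct guess)."""
    info = tv = delta = 0.0
    for leaves in itertools.product((1, -1), repeat=b ** n):
        pp, pm = leaf_likelihoods(leaves, b, n, theta)
        marginal = 0.5 * (pp + pm)
        for p in (pp, pm):
            if p > 0:
                info += 0.5 * p * math.log2(p / marginal)
        tv += 0.5 * abs(pp - pm)
        delta += 0.5 * max(pp, pm)         # maximum-likelihood guess of the root
    return info, tv, delta
```

With two states and a uniform root, the returned success probability equals 1/2 plus half the returned distance, so the second and third conditions visibly move together.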
7 The Ising model on the 3-regular tree
[Figure: mutual information $H(\sigma_\partial) + H(\sigma_r) - H(\sigma_r, \sigma_\partial)$]
8 Reconstruction for the CFN model
- Thm: The reconstruction problem for the Ising model on the (b+1)-regular tree is solvable if and only if $b\theta^2 > 1$.
- Easy direction: Higuchi 77 proved that a certain reconstruction algorithm works when $b\theta^2 > 1$.
- Higuchi's argument extends to general chains and general trees.
- Will also show an argument from M98 useful for phylogeny.
- Hard direction (95): non-reconstruction.
- 6 different proofs!
- All involve some magic.
- None extends to other Markov models.
- Will follow a coupling proof (Martinelli-Sinclair-Weitz).
9 Non-reconstruction - Coupling down
- Copying rule. For $i = +, -$:
- $P[i \to i] = \theta$.
- $P[i \to \text{Uniform}] = 1 - \theta$.
- Continuing down the tree, non-coupled elements form a branching process with parameter $\theta$.
- If $b\theta \le 1$, the branching process dies $\Rightarrow$ coupling.
- More generally, at level $n$, the expected number of uncoupled sites is $b^n \theta^n$.
- (Doesn't work all the way to $b\theta^2 = 1$.)
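The branching process of uncoupled sites can be simulated directly. In this sketch each uncoupled site has b children and each child stays uncoupled independently with probability theta; `uncoupled_counts` is a hypothetical name.

```python
import random

def uncoupled_counts(b, theta, n, rng):
    """Return the number of uncoupled sites at levels 0..n, starting from an uncoupled root."""
    counts = [1]
    for _ in range(n):
        parents = counts[-1]
        # Each of the b * parents children copies its (uncoupled) parent with probability theta.
        counts.append(sum(rng.random() < theta for _ in range(b * parents)))
    return counts
```

Averaging the last entry over many runs should come out close to `(b * theta) ** n`.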
10 Non-reconstruction - Coupling up
- We try to couple two configurations which differ at level $n$ so that they agree at the root.
- First consider the case where they differ at exactly one site.
[Figure: vertices u and v with +/- spins]
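- Lemma (Mossel-Kenyon-Peres): Among all boundary conditions $\eta$, $E_\eta[\sigma_u \mid \sigma_v = 1] - E_\eta[\sigma_u \mid \sigma_v = -1]$ is maximized for the free boundary.
- $\Rightarrow P[\text{not coupling at } u] \le \theta$.
- $\Rightarrow P[\text{not coupling at the root}] \le \theta^n$.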
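11 Coupling up: path coupling
- We got that if $\eta$ and $\eta'$ are two boundary conditions which differ in one position at level $n$, then
- $|E_\eta[\sigma(\rho)] - E_{\eta'}[\sigma(\rho)]| \le 2\theta^n$, where $\rho$ is the root.
- $\Rightarrow$ if $\eta$ and $\eta'$ are two boundary conditions which differ at $k$ sites, then
- $|E_\eta[\sigma(\rho)] - E_{\eta'}[\sigma(\rho)]| \le 2k\theta^n$.
- Pf: If $\eta$ and $\eta'$ differ at $k$ sites, then we can find a sequence $\eta = \eta(0), \eta(1), \dots, \eta(k) = \eta'$ such that $\eta(i)$ and $\eta(i+1)$ differ in exactly one site.
- $|E_\eta[\sigma(\rho)] - E_{\eta'}[\sigma(\rho)]|$
- $\le \sum_{i=1}^{k} |E_{\eta(i)}[\sigma(\rho)] - E_{\eta(i-1)}[\sigma(\rho)]| \le 2k\theta^n$.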
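12 Non-reconstruction for $b\theta^2 < 1$
- Fix $\theta$ such that $b\theta^2 < 1$.
- We will show that $E_+[E[\sigma(\rho) \mid \eta_+]] - E_-[E[\sigma(\rho) \mid \eta_-]] \to 0$,
- where $\eta_\pm$ = boundary conditions conditioned on $\sigma(\rho) = \pm$.
- Let $(\eta_+, \eta_-)$ be given by the down coupling.
- Let $K(\eta_+, \eta_-)$ = number of disagreements between $\eta_+, \eta_-$.
- $|E_+[E[\sigma(\rho) \mid \eta_+]] - E_-[E[\sigma(\rho) \mid \eta_-]]|$
- $\le E_{+,-}\left[|E[\sigma(\rho) \mid \eta_+] - E[\sigma(\rho) \mid \eta_-]|\right]$
- $\le E_{+,-}[2 K(\eta_+, \eta_-) \theta^n] = 2\theta^n E_{+,-}[K(\eta_+, \eta_-)]$ (up coupling).
- $\le 2\theta^n b^n \theta^n$ (down coupling)
- $= 2(b\theta^2)^n \to 0$ exp. fast in $n$.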
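A small numeric illustration of how the two couplings combine, with hypothetical helper names. For b = 2 and theta = 0.6 the down coupling alone does not die out (b * theta = 1.2), yet the combined bound still tends to zero because b * theta^2 = 0.72.

```python
def expected_uncoupled(b, theta, n):
    """Down coupling: expected number of disagreements at level n, b^n theta^n."""
    return (b * theta) ** n

def root_gap_bound(b, theta, n):
    """Up coupling applied to the down-coupled pair: 2 theta^n * b^n theta^n = 2 (b theta^2)^n."""
    return 2 * theta ** n * expected_uncoupled(b, theta, n)
```

Evaluating both functions at increasing `n` for these parameters shows the expected number of disagreements growing while the bound on the root gap shrinks geometrically.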
13 Where we stopped
- Thm: The reconstruction problem for the Ising model on the (b+1)-regular tree is solvable if and only if $b\theta^2 > 1$.
- We showed that if $b\theta^2 < 1$, it is impossible to reconstruct (hard direction).
- We now show that if $b\theta^2 > 1$, we can reconstruct.
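14 Reconstruction via majority
- Fix $\theta$ such that $b\theta^2 > 1$.
- Let $X = X_n$ = (number of +) - (number of -) at level $n$.
- We claim that $X_n$ is a good estimator of $\sigma(\rho)$.
- $E_+[X_n] = b^n\theta^n$, $E_-[X_n] = -b^n\theta^n$.
- We show that $E_\pm[X_n^2] \le c\,(E_\pm[X_n])^2 = c\,b^{2n}\theta^{2n}$.
- Let $f = f_n$ ($g = g_n$) be the density of the + (-) measure with respect to some reference measure $\mu$.
- $2b^n\theta^n = E_+[X] - E_-[X] = \int X (f - g)\, d\mu$
- $= \int X (f^{1/2} - g^{1/2})(f^{1/2} + g^{1/2})\, d\mu$
- $\le \left(\int X^2 (f^{1/2} + g^{1/2})^2\, d\mu\right)^{1/2} \left(\int (f^{1/2} - g^{1/2})^2\, d\mu\right)^{1/2}$
- $\le \left(4\int X^2 f\, d\mu + 4\int X^2 g\, d\mu\right)^{1/2} \left(\int |f - g|\, d\mu\right)^{1/2}$
- $\le (8c\,b^{2n}\theta^{2n})^{1/2} (D_{TV}(+,-))^{1/2}$, taking $D_{TV}(+,-) = \int |f - g|\, d\mu$.
- Squaring both sides gives $D_{TV}(+,-) \ge 1/(2c) > 0$ for all $n$.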
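The majority estimator can be checked by simulation. This is a sketch assuming a complete b-ary tree and the copy-or-resample rule; `level_spins`, `majority_estimate`, and `majority_accuracy` are hypothetical names, and ties are broken toward +1 arbitrarily.

```python
import random

def level_spins(b, n, theta, root, rng):
    """Spins at level n given the root spin: copy with probability theta, else resample."""
    level = [root]
    for _ in range(n):
        level = [s if rng.random() < theta else rng.choice((-1, 1))
                 for s in level for _child in range(b)]
    return level

def majority_estimate(spins):
    """Sign of X_n = (number of +) - (number of -); ties go to +1."""
    return 1 if sum(spins) >= 0 else -1

def majority_accuracy(b, n, theta, trials, rng):
    """Fraction of trials in which the majority of level n matches the root."""
    hits = 0
    for _ in range(trials):
        root = rng.choice((-1, 1))
        hits += majority_estimate(level_spins(b, n, theta, root, rng)) == root
    return hits / trials
```

For parameters with b * theta^2 > 1, such as b = 3 and theta = 0.8, the measured accuracy should stay bounded away from 1/2 as `n` grows.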
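15 Bounds on the second moment
- Write $X_n = \sum_v \sigma(v)$, where the sum is over all $v$ in level $n$.
- $E[X_n^2] = \sum_{v,w} E[\sigma(v)\sigma(w)]$.
- For each edge, with prob. $\theta$ the two end points are the same and with prob. $1 - \theta$ the two end points are independent.
- If there is a red edge (one whose end points are independent) on the path between $v$ and $w$, then $E[\sigma(v)\sigma(w)] = 0$.
[Figure: two panels, each showing vertices v and w]
- Otherwise, $\sigma(v) = \sigma(w)$.
- $E[\sigma(v)\sigma(w)] = \theta^{d(v,w)}$.
- $E[X_n^2] = b^n \left(1 + \sum_{i=1}^{n} (b^i - b^{i-1})\theta^{2i}\right)$
- $= b^n \left(1 + (b-1)\theta^2 \sum_{i=0}^{n-1} b^i\theta^{2i}\right)$
- $= O(b^{2n}\theta^{2n})$ iff $b\theta^2 > 1$.
[Figure: vertex v with counts 1, 2, 4]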
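The closed form can be checked against a direct sum over all pairs of level-n vertices. The sketch labels level-n vertices of the b-ary tree by their digit paths from the root, so that the distance between two of them is twice the number of levels below their common ancestor; the function names are hypothetical.

```python
import itertools

def second_moment_bruteforce(b, n, theta):
    """Sum of theta^d(v, w) over all ordered pairs (v, w) at level n."""
    leaves = list(itertools.product(range(b), repeat=n))
    total = 0.0
    for v in leaves:
        for w in leaves:
            common = 0
            while common < n and v[common] == w[common]:
                common += 1                          # depth of the common ancestor
            total += theta ** (2 * (n - common))     # d(v, w) = 2 (n - common)
    return total

def second_moment_formula(b, n, theta):
    """b^n (1 + sum_{i=1}^n (b^i - b^(i-1)) theta^(2i))."""
    return b ** n * (1 + sum((b ** i - b ** (i - 1)) * theta ** (2 * i)
                             for i in range(1, n + 1)))
```

Dividing either value by `(b * theta) ** (2 * n)` for growing `n` shows the ratio staying bounded when b * theta^2 > 1 and blowing up otherwise.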
16 Remarks on the second moment
- The Kamea/Higuchi argument is very robust.
- Works for general trees when $br(T)\,\theta^2 > 1$.
- Works for general Markov chains, where $\theta$ = 2nd eigenvalue of $M$ (M-Peres 2002).
- Kesten-Stigum (1966!) proved that for all Markov chains:
- If $b\theta^2 > 1$, then the limiting law of the count depends on the root.
- If $b\theta^2 < 1$, then the limiting law is normal for all root values.
- M-Peres (2002): count reconstruction is impossible if $b\theta^2 < 1$.
17 Recursive reconstruction for Ising models
- An alternative proof for reconstruction for $b\theta^2 > 1$ (M98).
- Advantage: Works also when we have only a lower bound on $\theta$. Majority doesn't.
- Blue edges have $\theta_1$, black $\theta_2$, $\theta_1 < \theta_2 = 1$.
- $\mathrm{Maj}(\sigma_\partial)$ = Maj of black tree.
- Maj of black tree = $\sigma_v$.
- $\sigma_v$ and $\sigma_\rho$ have exp. small correlation.
- Phylogeny reconstruction: given bounds.
[Figure: tree with blue and black edges; vertex v marked]
- Instead we will use recursive-majority.
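A minimal sketch of one form of recursive majority, assuming the number of leaves is a power of b and taking the majority one level at a time. The block depth used in the talk is not given in the source, so depth 1 is an assumption; `recursive_majority` is a hypothetical name and ties (possible only for even b) go to -1.

```python
def majority(spins):
    """Majority vote of +1/-1 spins; ties go to -1."""
    return 1 if sum(spins) > 0 else -1

def recursive_majority(leaves, b):
    """Replace each block of b sibling spins by its majority, level by level, up to the root."""
    level = list(leaves)
    while len(level) > 1:
        level = [majority(level[i:i + b]) for i in range(0, len(level), b)]
    return level[0]
```

The two estimators can disagree: on `[1, 1, -1, 1, 1, -1, -1, -1, -1]` with b = 3, plain majority over all nine leaves returns -1, while `recursive_majority` returns +1 because two of the three subtrees vote +1.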