Provided by: hein88
1
Approaches to Data Analysis
Data: GTCAT, GTTGGT, GTCA, CTCA
Alignment by parsimony, similarity, optimisation:
GT-CAT
GTTGGT
GT-CA-
CT-CA-
Ideal practice: 1-phase analysis (data -> statistics).
Actual practice: 2-phase analysis (data -> alignment -> statistics).
2
Origins of Statistical Alignment
Bishop & Thompson 1986; Thorne, Kishino & Felsenstein 1991
Challenges to Statistical Alignment
  • Understanding the Basic Model
  • Speed of the Basic Algorithm
  • Analyzing Many Sequences: Multiple Statistical Alignment
  • Realistic Models
The Biological Problems: Phylogeny, Molecular Evolution, Alignment, Homology Testing, More
3
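Thorne-Kishino-Felsenstein (1991) Process
T = 0:  A C G
T = t:  - - - ...
$\lambda < \mu$
$P(s) = (1 - \lambda/\mu)\,(\lambda/\mu)^{l}\,\pi_A^{\#A} \cdots \pi_T^{\#T}$, where $l = \mathrm{length}(s)$ and $\#A$ is the number of A's in s.
Time reversible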
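The equilibrium probability of a sequence on this slide is a geometric length term times the product of letter frequencies. A minimal sketch, assuming hypothetical uniform nucleotide frequencies (the names `equilibrium_prob`, `expected_length` and `UNIFORM` are illustrative, not from the slides):

```python
import math

def equilibrium_prob(seq, lam, mu, pi):
    """P(s) = (1 - lam/mu) * (lam/mu)**len(s) * product of pi[c] over the letters of s."""
    if not 0.0 < lam < mu:
        raise ValueError("need 0 < lam < mu for a proper length distribution")
    r = lam / mu
    return (1.0 - r) * r ** len(seq) * math.prod(pi[c] for c in seq)

def expected_length(lam, mu):
    """Mean of the geometric length distribution: (lam/mu) / (1 - lam/mu)."""
    return lam / (mu - lam)

# Hypothetical uniform nucleotide frequencies, for illustration only.
UNIFORM = {c: 0.25 for c in "ACGT"}

if __name__ == "__main__":
    print(equilibrium_prob("ACG", 0.03, 0.04, UNIFORM))
    print(expected_length(0.03, 0.04))
```

The condition λ < μ is what keeps the geometric length distribution normalisable; the code rejects other parameter values rather than returning a meaningless number.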
4
The invasion of the immortal link (From Hein, Wiuf, Knudsen, Møller & Wibling 2000)
5
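Time reversibility
$P_{i,j}(t)$ = probability that i has evolved into j after time t.
$\pi(i)$ = probability of i after infinitely long time (the equilibrium distribution).
$\pi(i)\,P_{i,j}(t) = \pi(j)\,P_{j,i}(t)$
[Figure: ancestor a evolving into s1 over time t1 and into s2 over time t2, equivalent to s1 evolving into s2 over time t1 + t2]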
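Time reversibility can be checked numerically for any concrete substitution model. The sketch below assumes a Felsenstein-81-style model with hypothetical frequencies `PI` (neither is specified on the slide). It verifies detailed balance and the consequence that summing over an unknown ancestor equals evolving s1 directly into s2 over time t1 + t2:

```python
import math

# Hypothetical equilibrium frequencies, for illustration only.
PI = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}

def f81(i, j, t, pi=PI):
    """Assumed F81-style transition probability P_{i,j}(t) with unit rate."""
    decay = math.exp(-t)
    return decay * (i == j) + (1.0 - decay) * pi[j]

def is_reversible(t, pi=PI, tol=1e-12):
    """Detailed balance: pi(i) P_{i,j}(t) == pi(j) P_{j,i}(t) for all i, j."""
    return all(
        abs(pi[i] * f81(i, j, t, pi) - pi[j] * f81(j, i, t, pi)) < tol
        for i in pi for j in pi
    )

def via_ancestor(s1, s2, t1, t2, pi=PI):
    """Sum over the unknown ancestral letter a of pi(a) P_{a,s1}(t1) P_{a,s2}(t2)."""
    return sum(pi[a] * f81(a, s1, t1, pi) * f81(a, s2, t2, pi) for a in pi)

def direct(s1, s2, t1, t2, pi=PI):
    """pi(s1) P_{s1,s2}(t1 + t2): s1 evolves straight into s2."""
    return pi[s1] * f81(s1, s2, t1 + t2, pi)

if __name__ == "__main__":
    print(is_reversible(0.7))
    print(via_ancestor("A", "G", 0.3, 0.5), direct("A", "G", 0.3, 0.5))
```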
6
Two kinds of alignment
Optimisation (here Parsimony): Shortest Path
C T G A G G
G T - - G C
Statistical: Probability and Sum over all Paths
C T G A G G
G T - - G C
[Figure: alignment path grids with CTGAGG on one axis and GTGC on the other]
7
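λ & μ into Alignment Blocks
A. Amino Acids Ignored
Ancestral link survives and leaves k descendants (k >= 1, itself included):
$p_k(t) = e^{-\mu t}\,[1-\lambda\beta(t)]\,[\lambda\beta(t)]^{k-1}$
Ancestral link dies and leaves k descendants (k >= 1):
$p'_k(t) = [1-e^{-\mu t}-\mu\beta(t)]\,[1-\lambda\beta(t)]\,[\lambda\beta(t)]^{k-1}$
Ancestral link dies and leaves no descendants:
$p'_0(t) = \mu\beta(t)$
Immortal link leaves k descendants (k >= 0):
$p''_k(t) = [1-\lambda\beta(t)]\,[\lambda\beta(t)]^{k}$
$\beta(t) = \dfrac{1-e^{(\lambda-\mu)t}}{\mu-\lambda e^{(\lambda-\mu)t}}$
B. Amino Acids Considered
T - - -
R Q S W      $P_t(T \to R)\,\pi_Q \cdots \pi_W\,p_4(t)$
T - - - -
- R Q S W    $\pi_R\,\pi_Q \cdots \pi_W\,p'_4(t)$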
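The block probabilities for part A can be computed directly. A minimal sketch, assuming the standard TKF91 form β(t) = (1 - e^{(λ-μ)t}) / (μ - λe^{(λ-μ)t}) and λ < μ; the function names are illustrative only:

```python
import math

def beta(t, lam, mu):
    """beta(t) in its standard TKF91 form (assumes lam < mu)."""
    e = math.exp((lam - mu) * t)
    return (1.0 - e) / (mu - lam * e)

def p_survive(k, t, lam, mu):
    """p_k(t): ancestral link survives, k >= 1 descendants including itself."""
    lb = lam * beta(t, lam, mu)
    return math.exp(-mu * t) * (1.0 - lb) * lb ** (k - 1)

def p_die(k, t, lam, mu):
    """p'_k(t): ancestral link dies, leaving k >= 0 descendants."""
    b = beta(t, lam, mu)
    if k == 0:
        return mu * b
    lb = lam * b
    return (1.0 - math.exp(-mu * t) - mu * b) * (1.0 - lb) * lb ** (k - 1)

def p_immortal(k, t, lam, mu):
    """p''_k(t): immortal link leaves k >= 0 descendants besides itself."""
    lb = lam * beta(t, lam, mu)
    return (1.0 - lb) * lb ** k

if __name__ == "__main__":
    t, lam, mu = 1.0, 0.3, 0.4
    mortal = sum(p_survive(k, t, lam, mu) for k in range(1, 200))
    mortal += sum(p_die(k, t, lam, mu) for k in range(0, 200))
    print(mortal, sum(p_immortal(k, t, lam, mu) for k in range(200)))
```

As a sanity check, the fates of a mortal link (survive or die, with any number of descendants) should sum to 1, and so should the immortal-link probabilities.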
8
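Illustration of single equation.
[Figure: flows into and out of the state with k descendants: births at rates λ(k-1) and λk, deaths at rates μk and μ(k+1), connecting $p'_{k-1}$, $p'_k$ and $p'_{k+1}$, plus death of the ancestral link at rate μ from $p_{k+1}$]
$\Delta p'_k = \Delta t\,[\lambda(k-1)\,p'_{k-1} + \mu(k+1)\,p'_{k+1} - (\lambda+\mu)k\,p'_k + \mu\,p_{k+1}]$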
9
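Diff. Equations for p-functions
Ancestral link survives:
$\Delta p_k = \Delta t\,[\lambda(k-1)\,p_{k-1} + \mu k\,p_{k+1} - (\lambda+\mu)k\,p_k]$
Ancestral link dies:
$\Delta p'_k = \Delta t\,[\lambda(k-1)\,p'_{k-1} + \mu(k+1)\,p'_{k+1} - (\lambda+\mu)k\,p'_k + \mu\,p_{k+1}]$
Immortal link (k counts descendants besides the immortal link):
$\Delta p''_k = \Delta t\,[\lambda k\,p''_{k-1} + \mu(k+1)\,p''_{k+1} - ((k+1)\lambda + \mu k)\,p''_k]$
Initial Conditions: at t = 0 all probabilities are 0 except $p_1(0) = 1$ and $p''_0(0) = 1$; in particular $p'_0(0) = 0$.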
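The closed-form p-functions can be tested against these differential equations with a finite-difference derivative. The sketch below assumes the standard TKF91 closed forms, with the immortal-link index k counting descendants besides the immortal link; the helper names are hypothetical:

```python
import math

def p_funcs(t, lam, mu):
    """Return functions k -> p_k(t), p'_k(t), p''_k(t); each is 0 outside its range."""
    e = math.exp((lam - mu) * t)
    b = (1.0 - e) / (mu - lam * e)
    lb, surv = lam * b, math.exp(-mu * t)

    def p(k):       # ancestral link survives, k >= 1
        return surv * (1 - lb) * lb ** (k - 1) if k >= 1 else 0.0

    def p_dead(k):  # ancestral link dies, k >= 0
        if k < 0:
            return 0.0
        return mu * b if k == 0 else (1 - surv - mu * b) * (1 - lb) * lb ** (k - 1)

    def p_imm(k):   # immortal link, k >= 0
        return (1 - lb) * lb ** k if k >= 0 else 0.0

    return p, p_dead, p_imm

def rhs(k, t, lam, mu):
    """Right-hand sides of the three differential equations at index k."""
    p, pd, pim = p_funcs(t, lam, mu)
    d_p = lam * (k - 1) * p(k - 1) + mu * k * p(k + 1) - (lam + mu) * k * p(k)
    d_pd = (lam * (k - 1) * pd(k - 1) + mu * (k + 1) * pd(k + 1)
            - (lam + mu) * k * pd(k) + mu * p(k + 1))
    d_pim = (lam * k * pim(k - 1) + mu * (k + 1) * pim(k + 1)
             - ((k + 1) * lam + mu * k) * pim(k))
    return d_p, d_pd, d_pim

def numeric_derivative(k, t, lam, mu, h=1e-5):
    """Central finite differences of the three closed forms at index k."""
    hi, lo = p_funcs(t + h, lam, mu), p_funcs(t - h, lam, mu)
    return tuple((f_hi(k) - f_lo(k)) / (2 * h) for f_hi, f_lo in zip(hi, lo))

if __name__ == "__main__":
    for k in range(4):
        print(k, rhs(k, 1.0, 0.3, 0.4), numeric_derivative(k, 1.0, 0.3, 0.4))
```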
10
Basic Pairwise Recursion (O(length^3))
[Figure: the fate of the last ancestral link s1[i]. Survives: it accounts for the last 1..j links of s2 (j cases). Dies: it accounts for the last 0..j links of s2 (j+1 cases).]
11
Basic Pairwise Recursion (O(length^3))
[Figure: dynamic programming table; cell (i,j) is computed from the cells (i-1,j), (i-1,j-1), ..., (i-1,j-k), ... of row i-1]
Initial condition: $p''_j\,\pi(s2[1:j])$
12
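Fundamental Pairwise Recursion.
Notation: $s1_i$ is the first i letters of s1, $s1[i]$ is its i-th letter, and $f(a,b)$ is the probability that letter a is substituted by letter b.
$P(s1_i \to s2_j) = p'_0\,P(s1_{i-1} \to s2_j) + \sum_{k=1}^{j} [p_k\,f(s1[i],s2[j-k+1]) + p'_k\,\pi(s2[j-k+1])]\,\pi(s2[j-k+2:j])\,P(s1_{i-1} \to s2_{j-k})$
Initial Condition: $P(s1_0 \to s2_j) = p''_j\,\pi(s2[1:j])$
Probability of observation: $P(s1,s2) = P(s1)\,P(s1 \to s2)$
Simplification:
$R_{i,j} = [p_1\,f(s1[i],s2[j]) + p'_1\,\pi(s2[j])]\,P(s1_{i-1} \to s2_{j-1}) + \lambda\beta\,\pi(s2[j])\,R_{i,j-1}$
$P(s1_i \to s2_j) = R_{i,j} + p'_0\,P(s1_{i-1} \to s2_j)$
Eliminating R:
$P(s1_i \to s2_j) = p'_0\,P(s1_{i-1} \to s2_j) + \lambda\beta\,\pi(s2[j])\,P(s1_i \to s2_{j-1}) + [p_1\,f(s1[i],s2[j]) + p'_1\,\pi(s2[j]) - \lambda\beta\,p'_0\,\pi(s2[j])]\,P(s1_{i-1} \to s2_{j-1})$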
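The cubic recursion sums, for each ancestral link, over the number of descendants in its block. The sketch below is one way to write it, assuming an F81-style substitution model and hypothetical frequencies `PI` (neither comes from the slides); it is an illustration, not the authors' implementation. Time reversibility gives a handy correctness check: P(s1)P(s1 -> s2) should equal P(s2)P(s2 -> s1).

```python
import math

# Hypothetical equilibrium frequencies, for illustration only.
PI = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}

def subst(a, b, t):
    """Assumed F81-style substitution probability f(a, b) with unit rate."""
    decay = math.exp(-t)
    return decay * (a == b) + (1.0 - decay) * PI[b]

def p_functions(t, lam, mu):
    """Return (lb, p_1, p'_1, p'_0, p''_0); p_k = p_1 * lb**(k-1), and likewise p'_k, p''_k."""
    e = math.exp((lam - mu) * t)
    b = (1.0 - e) / (mu - lam * e)
    lb = lam * b
    p1 = math.exp(-mu * t) * (1.0 - lb)
    p1_dead = (1.0 - math.exp(-mu * t) - mu * b) * (1.0 - lb)
    return lb, p1, p1_dead, mu * b, 1.0 - lb

def transition_prob(s1, s2, t, lam, mu):
    """P(s1 -> s2) by the cubic recursion over block sizes k."""
    lb, p1, p1d, p0d, p0i = p_functions(t, lam, mu)
    n, m = len(s1), len(s2)
    P = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Row 0: the immortal link alone produces the first j letters of s2.
    for j in range(m + 1):
        P[0][j] = p0i * lb ** j * math.prod(PI[c] for c in s2[:j])
    for i in range(1, n + 1):
        for j in range(m + 1):
            total = p0d * P[i - 1][j]      # link i dies without descendants
            tail = 1.0                     # pi of the letters after the first of the block
            for k in range(1, j + 1):
                b = s2[j - k]              # first letter of the block (1-based s2[j-k+1])
                head = p1 * subst(s1[i - 1], b, t) + p1d * PI[b]
                total += head * lb ** (k - 1) * tail * P[i - 1][j - k]
                tail *= PI[b]
            P[i][j] = total
    return P[n][m]

def equilibrium_prob(s, lam, mu):
    """P(s) = (1 - lam/mu) (lam/mu)^len(s) * product of letter frequencies."""
    r = lam / mu
    return (1.0 - r) * r ** len(s) * math.prod(PI[c] for c in s)

def joint_prob(s1, s2, t, lam, mu):
    """P(s1, s2) = P(s1) P(s1 -> s2)."""
    return equilibrium_prob(s1, lam, mu) * transition_prob(s1, s2, t, lam, mu)

if __name__ == "__main__":
    print(joint_prob("CTGAGG", "GTGC", 0.5, 0.3, 0.4))
    print(joint_prob("GTGC", "CTGAGG", 0.5, 0.3, 0.4))
```

For long sequences the products underflow; a practical version would work in log space or rescale rows, which is left out here to keep the recursion visible.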
13
Geometric Like Offspring Number
$p_k(t) = e^{-\mu t}\,[1-\lambda\beta(t)]\,[\lambda\beta(t)]^{k-1}$
$p'_k(t) = [1-e^{-\mu t}-\mu\beta(t)]\,[1-\lambda\beta(t)]\,[\lambda\beta(t)]^{k-1}$
$p'_0(t) = \mu\beta(t)$
Alternative traversal
Die forward in time. Give birth backwards. Trace leftmost unfinished branch. After one survivor, branch lengths with birth possibility are always t.
14
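Quadratic Recursion
[Figure: cell (i,j) is computed from the cells (i-1,j), (i-1,j-1) and (i,j-1)]
Two state recursion
$R_{i,j} = [p_1\,f(s1[i],s2[j]) + p'_1\,\pi(s2[j])]\,P(s1_{i-1} \to s2_{j-1}) + \lambda\beta\,\pi(s2[j])\,R_{i,j-1}$
$P(s1_i \to s2_j) = R_{i,j} + p'_0\,P(s1_{i-1} \to s2_j)$
One state recursion
$P(s1_i \to s2_j) = p'_0\,P(s1_{i-1} \to s2_j) + \lambda\beta\,\pi(s2[j])\,P(s1_i \to s2_{j-1}) + [p_1\,f(s1[i],s2[j]) + p'_1\,\pi(s2[j]) - \lambda\beta\,p'_0\,\pi(s2[j])]\,P(s1_{i-1} \to s2_{j-1})$
1. Summation, Maximization and Sampling of Alignments.
2. For more sequences: Ancestral Sequences & Alignments.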
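The one state recursion needs only three neighbouring cells per entry, so it runs in quadratic time. The sketch below compares it with the cubic sum over block sizes on a small example. The F81-style substitution model and the frequencies `PI` are assumptions made for illustration:

```python
import math

# Hypothetical equilibrium frequencies, for illustration only.
PI = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}

def subst(a, b, t):
    """Assumed F81-style substitution probability f(a, b) with unit rate."""
    decay = math.exp(-t)
    return decay * (a == b) + (1.0 - decay) * PI[b]

def p_functions(t, lam, mu):
    """Return (lb, p_1, p'_1, p'_0, p''_0) with lb = lam * beta(t)."""
    e = math.exp((lam - mu) * t)
    b = (1.0 - e) / (mu - lam * e)
    lb = lam * b
    p1 = math.exp(-mu * t) * (1.0 - lb)
    p1_dead = (1.0 - math.exp(-mu * t) - mu * b) * (1.0 - lb)
    return lb, p1, p1_dead, mu * b, 1.0 - lb

def first_row(s2, lb, p0i):
    """P(s1_0 -> s2_j) = p''_j * pi(s2[1:j]) for j = 0..len(s2)."""
    row = [p0i]
    for c in s2:
        row.append(row[-1] * lb * PI[c])
    return row

def cubic(s1, s2, t, lam, mu):
    """Reference: explicit sum over block sizes k for every cell."""
    lb, p1, p1d, p0d, p0i = p_functions(t, lam, mu)
    prev = first_row(s2, lb, p0i)
    for a in s1:
        cur = []
        for j in range(len(s2) + 1):
            total, tail = p0d * prev[j], 1.0
            for k in range(1, j + 1):
                b = s2[j - k]
                total += (p1 * subst(a, b, t) + p1d * PI[b]) * lb ** (k - 1) * tail * prev[j - k]
                tail *= PI[b]
            cur.append(total)
        prev = cur
    return prev[-1]

def quadratic(s1, s2, t, lam, mu):
    """One state recursion: each cell uses (i-1,j), (i,j-1) and (i-1,j-1)."""
    lb, p1, p1d, p0d, p0i = p_functions(t, lam, mu)
    prev = first_row(s2, lb, p0i)
    for a in s1:
        cur = [p0d * prev[0]]
        for j in range(1, len(s2) + 1):
            b = s2[j - 1]
            diag = p1 * subst(a, b, t) + p1d * PI[b] - lb * p0d * PI[b]
            cur.append(p0d * prev[j] + lb * PI[b] * cur[j - 1] + diag * prev[j - 1])
        prev = cur
    return prev[-1]

if __name__ == "__main__":
    print(cubic("CTGAGG", "GTGC", 0.5, 0.3, 0.4))
    print(quadratic("CTGAGG", "GTGC", 0.5, 0.3, 0.4))
```

Both functions keep only two rows of the table, which is enough for summation; maximization or sampling of alignments would need the full table for traceback.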
15
Likelihood Surface (From Hein, Wiuf, Knudsen, Møller & Wibling 2000)
16
α-globin (141) and β-globin (146) (From Hein, Wiuf, Knudsen, Møller & Wibling 2000)
430.108   -log(α-globin)
327.320   -log(α-globin -> β-globin)
730.428   -log(α-globin, β-globin) = -log(l(sumalign))
λt = 0.0371805 +/- 0.0135899
μt = 0.0374396 +/- 0.0136846
st = 0.91701 +/- 0.119556
E(Length) = 143.499
E(Insertions, Deletions) = 5.37255
E(Substitutions) = 131.59
Maximum contributing alignment:
V-LSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS--H---GSAQVKGHGKKVADALT
VHLTPEEKSAVTALWGKV--NVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFS

NAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
DGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
Ratio l(maxalign)/l(sumalign) = 0.00565064
17
Likelihood Surface (From Hein, Wiuf, Knudsen, Møller & Wibling 2000)
18
Homology Test
$W_{i,j} = -\ln\left(\pi_i\,P^{2.5}_{i,j} / (\pi_i\,\pi_j)\right)$
D(s1,s2) for the real pair is evaluated in the distribution of D(s1,s2) for random pairs.
Real:    s1  ATWYFCAK-AC
         s2  ETWYKCALLAD
Random:  s1  ATWYFC-AKAC
         s2  LTAYKADCWLE
This test:
1. Tests the competing hypotheses that 2 sequences are 2.5 events apart versus infinitely far apart.
2. It only handles substitutions correctly. The rationale for indel costs is more arbitrary.
3. It samples in $(\pi_i \pi_j)$ by permuting the order of amino acids in the second sequence, i.e. it draws without replacement (a hypergeometric distribution).
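The permutation step in point 3 can be sketched as follows. The distance here is plain unit-cost edit distance, used as a stand-in for the slide's D with costs $W_{i,j}$, which is an assumption made to keep the example short; `permutation_p_value` and its parameters are hypothetical names.

```python
import random

def edit_distance(a, b):
    """Unit-cost edit distance (stand-in for an alignment distance with costs W_ij)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]

def permutation_p_value(s1, s2, distance=edit_distance, n_perm=200, seed=0):
    """Fraction of shuffles of s2 that are at least as close to s1 as the real s2.

    Shuffling keeps the amino acid composition of s2 fixed, i.e. it samples
    without replacement.
    """
    rng = random.Random(seed)
    observed = distance(s1, s2)
    letters = list(s2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(letters)
        if distance(s1, "".join(letters)) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

if __name__ == "__main__":
    print(permutation_p_value("ATWYFCAKAC", "ETWYKCALLAD"))
```

A small p-value means few shuffled sequences align as well as the real one, which is evidence for homology under this test.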
19
α-, myoglobin homology test (From Hein, Wiuf, Knudsen, Møller & Wibling 2000)
20
Algorithm for alignment on star tree (O(length^6)) (Steel & Hein, 2001)
[Figure: star tree with ancestor a and leaves s1, s2, s3; sequences shown: ACGC, TT GT, ACG GT; label (λ/μ)]
21
Binary Tree Problem
[Figure: binary tree with internal nodes a1, a2 and leaves s1, s2, s3, s4; leaf sequences shown: TGA, ACCT, GTT, ACG]
22
Binary Tree Problem
  • The problem would be simpler if:
  • i. The ancestral sequences & their alignment were known.
  • ii. The alignment of ancestral alignment columns to leaf sequences was known.

A Markov chain generating ancestral alignments can solve the problem!
23
Markov Chains Generating the p-functions
Ancestral Sequence Generator
p'' function generator
p and p' function generator
[Figure: state diagrams of the generators, ending in state E, with transition probabilities λβ, 1 - λβ, μβ and 1 - μβ]
24
Generating Ancestral Alignments.
[Table residue: transition probabilities of the Markov chain generating the alignment of a1 and a2, ending in state E. Entries recoverable from the rows:]
$\lambda\beta$
$(\lambda/\mu)\,(1-\lambda\beta)\,e^{-\mu}$
$(\lambda/\mu)\,(1-\lambda\beta)\,(1-e^{-\mu})$
$(1-\lambda/\mu)\,(1-\lambda\beta)$
25
The Basic Recursion
Remove 1st step: recursion
Remove last step: recursion
[Figure: path from start state S to end state E]
26
4-Sequence Recursion II First Step Removal
Pa(Sk) Epifixes (Sk1l) starting in given MC
starts in a.
Pa(Sk)
Where P(kS i,H??)
F(kSi,H)
27
Example 4 globins
log-likelihood: -1593.223
28
Example 4 globins
29
O(l^k) algorithm for k sequences
Two Approaches:
  • Use geometric tails of p-functions & suitable rearrangements.
  • Make ancestral Markov Chain for the leaves as well.
30
Contrasting Probability & Distance Recursions
Probability: O(l^(2k)); O(l^k) possible
Distance (Sankoff, 1973): O(l^k)
[Figure: one alignment column for 4 sequences: A, C, -, A]
15 cases
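The count of 15 is consistent with 2^4 - 1: in a distance recursion over k sequences, the last alignment column holds a residue or a gap in each sequence, and the all-gap column is excluded. This reading of the slide is an assumption; the sketch below only enumerates those cases:

```python
from itertools import product

def column_cases(k):
    """All last-column patterns for k sequences: 1 = residue, 0 = gap, all-gap excluded."""
    return [c for c in product((0, 1), repeat=k) if any(c)]

if __name__ == "__main__":
    cases = column_cases(4)
    print(len(cases))
    # The column A, C, -, A from the slide corresponds to the pattern (1, 1, 0, 1).
    print((1, 1, 0, 1) in cases)
```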
31
k ancestral sequence Markov Chain
State Space: all connected tuples, plus the end state E
[Figure: two example states on the tree of ancestral nodes a1 --- a2 --- a3, with a4 and a5 attached]
32
k ancestral sequences: 2 Problems
1. Ambiguous Indel/Alignment relationship.
[Figure: ancestor a with leaves s1 and s2; two different indel histories giving the same alignment]
2. Grandchildren before younger siblings.
[Figure: alignment columns of a, a1 and a2]
33
Transition Probabilities between two k-ancestral states
[Figure: tree with nodes numbered 0 to 7 and the corresponding alignment column states]
34
Gibbs Samplers for Statistical Alignment
Holmes & Bruno (2001): Sampling ancestors to pairs.
Jensen & Hein (subm.): Sampling nodes adjacent to triples. Slower basic operation, faster mixing.
35
Work in Progress & Plans
  • State Reduction (Lunter, Song, Hein & Miklos)
  • Longer Insertion-Deletions (Miklos, Lunter, Holmes)
  • Heterogeneity along Sequence (Skou, Hein, ..): HMM/SCFG like?
  • Acceleration & Implementation (Lunter & Song)
  • MCMC Methods (Ledet Jensen, Holmes, ...)
36
Statistical Alignment Summary
Motivation for statistical alignment
  • i. Data is sequences - not alignment!
  • ii. The focus on alignments is exaggerated!
Progress
  • Major accelerations for pairwise/multiple statistical alignment
  • Longer insertion-deletion models
Challenges ahead
  • Position heterogeneity: HMM & SCFG analogues.
  • Algorithms for large data sets (>5 sequences): MCMC.
  • Local alignment version
  • Software ???
37
Acknowledgements (www.stats.ox.ac.uk/hein)
Pairwise (with Knudsen, Wiuf, Møller, Wibling): Simpler recursion. Computational acceleration.
Multiple: Star Tree (with M. Steel). Binary Tree (with C. Storm, Jens Ledet, Lunter, Miklos, Song, Holmes, ..). Gibbs Multiple Alignment (with Jens Ledet).
Articles & Manuscripts
1. Hein, J.J., C. Wiuf, B. Knudsen, M. Møller and G. Wibling (2000) Statistical Alignment: Computational Properties, Homology Testing and Goodness-of-Fit. J. Molecular Biology 302: 265-279.
2. Hein, J.J. (2001) A generalisation of the Thorne-Kishino-Felsenstein model of Statistical Alignment to k sequences related by a binary tree. Pac. Symp. Biocompu. 2001, p. 179-190 (eds R.B. Altman et al.).
3. Steel, M. & J.J. Hein (2001) A generalisation of the Thorne-Kishino-Felsenstein model of Statistical Alignment to k sequences related by a star tree. Letters in Applied Mathematics.
4. Hein, J.J., J.L. Jensen & C. Pedersen (2002) Algorithms for Multiple Statistical Alignment. (submitted to PNAS)
5. Jensen, J.L. & J.J. Hein (2002) A Gibbs Sampler for Multiple Statistical Alignment. (submitted, Statistical Journal)
6. Lunter, Song, Miklos & Hein (2002) (in press, J. Com. Biol.)
7. Lunter, Song & Hein (2003) (in prep.)
8. Miklos, Lunter & Holmes (2002) (in press, MBE)
9. Miklos, I. & Z. Toroczkai (2001) An improved model for statistical alignment. In WABI 2001, Lecture Notes in Computer Science (O. Gascuel & B.M.E. Moret, eds) 2149: 1-10. Springer, Berlin.
10. Miklos, I. (2002) An improved algorithm for statistical alignment of sequences related by a star tree. Bul. Math. Biol. 64: 771-779.
11. Miklos, I. (2002) Algorithm for statistical alignment of sequences derived from a Poisson sequence length distribution. Disc. Appl. Math., accepted.
12. Holmes, I. & W. Bruno (2001) Evolutionary HMMs: A Bayesian Approach to Multiple Alignment. Bioinformatics 17(9): 803-20.