Title: Untangling Molecular Evolution
1Untangling Molecular Evolution
- Andrew Meade
- A.Meade_at_Reading.ac.uk
2Molecular Data
- Human Genome
- Finished 2003
- 13 years, finished 2 years ahead of schedule.
- 3 billion base pairs, cost $2.7 billion
- 483 completely sequenced genomes (2006)
- X Prize: sequence 100 human genomes in 10 days for $10 million.
3Molecular Data Pancreatic Ribonuclease
4What is a phylogeny?
- Representation of evolution
- Inferred from data via a model.
- Data is normally a genetic element (such as a gene), taken from a number of species.
- Allows us to infer the past processes of evolution without observing them.
6Uses of phylogeny
- Spread of diseases, H5N1, HIV.
- Protein-Protein Interaction
- Predicting changes in Protein structure.
- Information about molecular evolution
7Human Influenza (Flu) Virus
(Figure labels: 1997, 10, 1984)
9The Chicken And The Egg
80 million years
Amniotic Egg 330 million years
10The true tree is unknown
- Data is only available for living species
- Evolution has been going on for a long time (4 billion years)
- Evolution is very complex
11There are lots of trees
Number of Possible Phylogenetic Trees
Species  Number of Trees
50       27529213532835651545259729751524430639300973035816196098326553772152587890625
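The count above is consistent with the standard formula for rooted, fully bifurcating trees on n labelled species, (2n - 3)!!. A minimal sketch, assuming that formula is what the slide tabulates (the function name is illustrative, not from the slides):

```python
def rooted_tree_count(n_species: int) -> int:
    """Number of rooted, bifurcating, labelled trees: (2n - 3)!!."""
    if n_species < 2:
        return 1
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3.
    for odd in range(3, 2 * n_species - 2, 2):
        count *= odd
    return count


if __name__ == "__main__":
    for n in (3, 4, 5, 10, 50):
        print(n, rooted_tree_count(n))
```

Python integers have arbitrary precision, so the 77-digit value for 50 species is computed exactly rather than as a floating-point approximation.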
12MCMC
- A sample of trees is used.
- Trees are sampled in proportion to their probability.
- Not looking for the best / most probable tree.
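Sampling in proportion to probability can be illustrated with a toy Metropolis sampler. This is a sketch under strong simplifying assumptions, not the tree sampler used in the talk: four integer states stand in for trees, the weights are hypothetical unnormalised posterior probabilities, and the proposal is uniform over all states.

```python
import random
from collections import Counter


def metropolis_sample(weights, n_steps, seed=0):
    """Return visit frequencies of a Metropolis chain over state indices."""
    rng = random.Random(seed)
    state = 0
    visits = Counter()
    for _ in range(n_steps):
        # Symmetric proposal: pick any state uniformly at random.
        proposal = rng.randrange(len(weights))
        # Accept with probability min(1, w_proposal / w_current).
        if rng.random() < weights[proposal] / weights[state]:
            state = proposal
        visits[state] += 1
    return [visits[i] / n_steps for i in range(len(weights))]


if __name__ == "__main__":
    # Hypothetical unnormalised weights for four "trees".
    toy_weights = [1.0, 2.0, 3.0, 4.0]
    print(metropolis_sample(toy_weights, 100_000))
```

With enough steps the visit frequencies approach 0.1, 0.2, 0.3 and 0.4, so less probable states are still visited, just less often. Real tree space is far too large for a uniform proposal; practical samplers propose local rearrangements instead.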
13Where
- P(D | T_i, t, m) is the probability of the sequence data D given tree T_i
- t is a vector of branch lengths
- m is a vector of model parameters
- P(t) is the prior probability of t
- P(m) is the prior probability of m
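The equation these definitions belong to did not survive in the transcript. A standard form of the posterior probability of a tree that is consistent with them is sketched below. The symbols D (the sequence data), T_i (tree i) and the tree prior P(T_i) are assumptions; t and m follow the definitions above.

```latex
P(T_i \mid D) =
  \frac{P(T_i) \int_t \int_m P(D \mid T_i, t, m)\, P(t)\, P(m)\, dm\, dt}
       {\sum_j P(T_j) \int_t \int_m P(D \mid T_j, t, m)\, P(t)\, P(m)\, dm\, dt}
```

The sum in the denominator runs over every possible tree, which is why the posterior is sampled with MCMC rather than computed directly.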
14MCMC properties
- Guaranteed to sample all trees in the search space, but only as time goes to infinity.
- Guaranteed to sample trees in proportion to their probability, but only at convergence.
15MCMC Sampling
16Log Likelihood against Iteration
- Burn-in
- Convergence: sampling from the stationary distribution
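Samples drawn during burn-in are normally dropped before the chain is summarised. A minimal sketch, assuming the burn-in is specified as a fixed fraction of the run; the 0.25 default and the function name are illustrative choices, not values from the talk:

```python
def discard_burn_in(trace, burn_in_fraction=0.25):
    """Return the part of an MCMC trace after the burn-in phase."""
    if not 0.0 <= burn_in_fraction < 1.0:
        raise ValueError("burn_in_fraction must be in [0, 1)")
    start = int(len(trace) * burn_in_fraction)
    return trace[start:]


if __name__ == "__main__":
    # Made-up log-likelihood trace: a steep climb, then a plateau.
    toy_trace = [-900.0, -700.0, -560.0, -505.0, -501.0, -500.0, -502.0, -499.0]
    kept = discard_burn_in(toy_trace, burn_in_fraction=0.5)
    print(kept, sum(kept) / len(kept))
```

In practice the cut-off is chosen by inspecting the log likelihood trace, since a fixed fraction is only a rule of thumb.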
17Posterior distribution of likelihoods
19Computational Time
20Parallel algorithm
Node 1
Node 3
Node 2
21Algorithm Scaling
- 1 processor: 130 days
- 60 processors: 4 days
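The two timings above imply a speedup and a parallel efficiency, which can be worked out directly. A small sketch using the standard definitions (speedup is serial time over parallel time, efficiency is speedup per processor); the function name is illustrative:

```python
def speedup_and_efficiency(serial_days, parallel_days, n_processors):
    """Return (speedup, efficiency) for a parallel run."""
    speedup = serial_days / parallel_days
    efficiency = speedup / n_processors
    return speedup, efficiency


if __name__ == "__main__":
    s, e = speedup_and_efficiency(130, 4, 60)
    print(f"speedup {s:.1f}x, efficiency {e:.0%}")
```

For the figures on the slide this gives a speedup of 32.5 and an efficiency of about 54%, so the scaling is useful but well short of linear.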
22Estimating dinosaur genome properties
ln Genome size (pg)
ln Osteocyte cell size (µm³)
24The effect of speciation on molecular evolution
- Each speciation event makes some contribution to path length
- Path length accumulates as a function of time
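The two statements above can be combined into one linear model of root-to-tip path length. The symbols are assumptions, not taken from the slides: x is the path length, n is the number of speciation events (nodes) along the path, β is the contribution of each speciation event, and g is the gradual contribution that accumulates with time.

```latex
x = \beta n + g
```

Under this sketch a punctuational effect corresponds to β > 0: paths that pass through more speciation events are longer than time alone would predict.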
25How many data sets show evidence of a punctuational effect?
- 35 of the 100 data sets showed significant punctuational effects
- Significantly more common in plants and fungi than animals
- 10,000 molecules studied
27Protein Networks
Genes in the human genome
- 1999: 100,000
- 2002: 65,000 to 75,000
- 2007: 20,000 to 25,000 (19,599 protein-coding genes confirmed)
28Eukaryote protein-interaction network
(Figure: yeast protein-interaction network (MIPS), with groups labelled yeast, fungal pathogens and animals)
29Changes in Gene networks
(Figure legend: yeast, fungal pathogens, animals; retained link, acquired link)
30Areas of computer science interest
- Search / Optimisation
- Distributed computation / parallelisation
- Visualisation / user interfaces
- Data mining
31Acknowledgments
- Mark Pagel, Chris Venditti and Daniel Barker - Computation Biology
- Vassil Alexandrov, Christian Weihrauch and Ashish Thandavan - ACET
- Chris Organ, Andrew Shedlock, Scott Edwards - Harvard University
32Convergence of a Markov chain sampling a phylogenetic tree of n = 500 tips using an alignment of n = 4400 nucleotides
(Figure: log-likelihood against iteration number)
NB: 99% of the increase in likelihood occurs in the first 2.8% of the run; 0.07% change in the final 2 million iterations.