Title: BIN6002 project report
1BIN6002 project report
Phylogenomics based on a collection of
mitochondrial proteins
- By Tetsu Ishii and Tan Wang
- June 25, 2004
3Our project task
- We are given a dataset of mt genome sequences from over 100 species.
- Through phylogenetic analysis (a phylogenomic approach), we want to place two species, Nuclearia and Emiliania, in the previously established phylogenetic tree.
4Our dataset
103 mitochondrial and α-proteobacterial genomes from different species
5Which species and which genes were included in our analysis?
- The cob and cox1 genes were used for a quick phylogenetic analysis with distance methods; species with long branches were eliminated.
- Some species were eliminated simply because they were not of interest to us.
- In total, 35 species, including the outgroup, were selected.
- A common set of 11 genes was selected because these genes are present in all remaining species. The 11 genes were concatenated and used in the phylogenetic analysis.
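The gene selection and concatenation step can be sketched as follows. This is only a minimal illustration, not the script used in the project: the species names, the toy sequences, the atp9 entry, and both function names are hypothetical, and each gene is assumed to be already aligned across species.

```python
def shared_genes(alignments, species):
    """Return the genes that are present in every selected species."""
    return sorted(
        gene for gene, aln in alignments.items()
        if all(sp in aln for sp in species)
    )


def concatenate(alignments, species, genes):
    """Join the aligned sequences of the chosen genes, in a fixed gene order."""
    return {sp: "".join(alignments[g][sp] for g in genes) for sp in species}


# Toy data (hypothetical): gene -> {species -> aligned protein sequence}.
toy_alignments = {
    "cob":  {"sp_A": "MTN-", "sp_B": "MTNI", "sp_C": "MSNI"},
    "cox1": {"sp_A": "MFA",  "sp_B": "MFV",  "sp_C": "MFA"},
    "atp9": {"sp_A": "ML",   "sp_B": "ML"},  # missing in sp_C, so excluded
}
toy_species = ["sp_A", "sp_B", "sp_C"]

genes = shared_genes(toy_alignments, toy_species)
supermatrix = concatenate(toy_alignments, toy_species, genes)
for sp, seq in supermatrix.items():
    print(sp, seq)
```

Keeping the gene order fixed (here, sorted by name) ensures that every species contributes the same columns to the concatenated alignment.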
6The five steps in phylogenetic analysis
1. Sequence data
2. Align sequences (phylogenetic signal? evolutionary processes?)
3. Distance methods or character-based methods; distance calculation (which model?)
4. Choose a method: ML (phyml, Puzzle), MP (protpars), MB, distance (Bionj). Weighting (sites, changes)? Model? Then calculate or estimate the best-fit tree
5. Test phylogenetic reliability
Modified from Hillis et al. (1993). Methods in Enzymology 224, 456-487
7Which methods should we choose?
- We simply tried all 5 programs that we could access:
- Maximum likelihood: phyml, puzzle
- Distance method: Bionj
- Maximum parsimony: phylip (protpars)
- Bayesian inference: MrBayes
8Which model should we choose?
- Models are described in terms of the tendency of one amino acid/base to change to another, site-to-site rate variation, and composition.
- Don't assume a model. Rather, find a model that fits your data.
- We tried JTT both with and without a gamma distribution.
9Two example models
Site rate heterogeneity was modeled with a gamma distribution
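To illustrate what gamma-distributed site rates look like, here is a small sketch that draws per-site relative rates from a gamma distribution with mean 1. The shape values 0.5 and 5.0, the number of sites, and the function name are illustrative assumptions, not values estimated from our data; phylogenetic programs typically use a discretized version of this distribution rather than random draws.

```python
import random
import statistics


def sample_site_rates(alpha, n_sites, seed=0):
    """Draw relative rates from Gamma(shape=alpha, scale=1/alpha), so the mean is 1."""
    rng = random.Random(seed)
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_sites)]


# A small shape parameter means strong rate heterogeneity (variance = 1/alpha);
# a large one approaches equal rates across sites.
for alpha in (0.5, 5.0):
    rates = sample_site_rates(alpha, 10000)
    print(
        f"alpha={alpha}: mean={statistics.mean(rates):.2f}, "
        f"variance={statistics.variance(rates):.2f}"
    )
```

With a small shape parameter most sites evolve slowly and a few evolve very fast, which is the pattern a model without gamma cannot capture.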
10Is the JTT model with gamma distribution better than JTT with no gamma for these data (puzzle)?
Model            ln likelihood    δ
JTT w/gamma      -76158.97
JTT w/o gamma    -82133.12        5974.15
2δ = 2 (ln L w/gamma - ln L w/o gamma)
P ≈ 0
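The comparison on this slide is a likelihood ratio test, which can be reproduced from the two reported ln likelihood values. The sketch below assumes that the gamma model adds exactly one free parameter (the shape), so the statistic is compared with a chi-square distribution with 1 degree of freedom; the function name is hypothetical, and the closed form used for the p-value holds only for 1 degree of freedom.

```python
import math


def likelihood_ratio_test(lnl_simple, lnl_complex):
    """Return (2 * delta, p-value) for nested models differing by one parameter.

    Assumption: chi-square with 1 degree of freedom, for which the
    upper-tail probability is erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2.0 * (lnl_complex - lnl_simple))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# ln likelihood values reported by puzzle for our data.
stat, p_value = likelihood_ratio_test(lnl_simple=-82133.12, lnl_complex=-76158.97)
print(f"2 * delta = {stat:.2f}, p = {p_value:.3g}")
```

The statistic is so large that the p-value underflows to zero in floating point, which agrees with the conclusion that JTT with gamma fits these data much better.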
11puzzle
JTT w/gamma
ln likelihood -76158.97
12puzzle
JTT w/o gamma
ln likelihood -82133.12
13Distance method, JTT w/gamma
[Figure: phylogenetic tree with branch support values. Labeled groups: α-Proteobacteria, Rhodophytes, Plants, Chlorophytes, Jakobids, Haptophyceae, Cercozoa, Stramenopile, Mycetozoa, Holozoa, Fungi.]
14Maximum likelihood, JTT w/gamma
[Figure: phylogenetic tree with branch support values. Labeled groups: Cercozoa, Stramenopile, Haptophyceae, Plants, Chlorophytes, Jakobids (labeled twice), Rhodophytes, Mycetozoa, Holozoa, Fungi, α-Proteobacteria.]
15The conclusion
- With the phylogenomic approach, we were able to place Nuclearia within Holozoa and Emiliania within Haptophyceae.
- We had a few difficulties in analyzing the large sequence dataset: Perl problems, limited time, project organization, etc.