Title: Phylogenetic Analysis in implementation way
1Phylogenetic Analysis -in implementation way
2Abstract
- Concepts of Phylogenetic Tree
- Two Categories of Phylogenetic Tree
- Character State Matrix
- Distance Matrix
- Tools Introduction
3Definition of Phylogenetic Tree
- What is a phylogenetic tree?
- A diagram that presents how species relate to one another in terms of common ancestors.
- How do we construct a phylogenetic tree?
- Generally, we don't have enough data about the distinct ancestors of present-day species.
- Most phylogenetic trees are hypotheses!
4Common Phylogenetic Tree Terminology
(Figure: a rooted tree with tips labelled A, B, C, D, and E.)
- A, B, C, D, E: represent the TAXA (genes, populations, species, etc.) used to infer the phylogeny
- Branches or lineages
- Internal nodes or divergence points: represent hypothetical ancestors of the taxa
- Ancestral node or ROOT of the tree
5Two Categories of Input Data for Phylogenetic Trees
- Comparative characters
- Such as beak shape, number of fingers, or the presence or absence of a feature; the resulting matrix is called the character state matrix.
- Numerical distance data
- Distances between objects; the resulting matrix is called the distance matrix.
6Convergence (parallel evolution)
- Two or more objects have the same states for the same characters.
- Ex: birds and insects both have the power to fly; these two objects share a state but are not genetically close.
- In a tree, convergence events should not happen, or their number should be minimized.
7Types of data used in phylogenetic inference
Character-based methods: use the aligned characters, such as DNA or protein sequences, directly during tree inference.
Taxa        Characters
Species A   ATGGCTATTCTTATAGTACG
Species B   ATCGCTAGTCTTATATTACA
Species C   TTCACTAGACCTGTGGTCCA
Species D   TTGACCAGACCTGTGGTCCG
Species E   TTGACCAGTTCTCTAGTTCG
Distance-based methods: transform the sequence data into pairwise distances (dissimilarities), and then use the matrix during tree building.
            A      B      C      D      E
Species A   ----   0.20   0.50   0.45   0.40
Species B   0.23   ----   0.40   0.55   0.50
Species C   0.87   0.59   ----   0.15   0.40
Species D   0.73   1.12   0.17   ----   0.25
Species E   0.59   0.89   0.61   0.31   ----
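The step from aligned characters to a distance matrix can be sketched as follows. This is a minimal illustration, assuming the simplest possible distance (the observed fraction of differing sites, often called the p-distance); the function names are hypothetical and not part of any tool named in these slides. With the five sequences from this slide, it appears to reproduce the values in the upper-right half of the matrix.

```python
from itertools import combinations

# Aligned sequences copied from the slide.
SEQS = {
    "A": "ATGGCTATTCTTATAGTACG",
    "B": "ATCGCTAGTCTTATATTACA",
    "C": "TTCACTAGACCTGTGGTCCA",
    "D": "TTGACCAGACCTGTGGTCCG",
    "E": "TTGACCAGTTCTCTAGTTCG",
}


def p_distance(s1, s2):
    """Fraction of aligned sites at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def distance_matrix(seqs):
    """Return {(name1, name2): distance} for every unordered pair of taxa."""
    return {(x, y): p_distance(seqs[x], seqs[y]) for x, y in combinations(seqs, 2)}


if __name__ == "__main__":
    for (x, y), d in distance_matrix(SEQS).items():
        print(f"{x}-{y}: {d:.2f}")
```

For example, A and B differ at 4 of 20 sites, which gives 0.20. The larger values in the lower-left half of the matrix are not produced by this sketch; they look like distances corrected under a substitution model.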
8Comparison between Character Methods and Distance Methods
- Pros of distance methods
- Very fast
- Cons of distance methods
- Sequence information is reduced to a single number
- Provide only one tree topology
- Dependent on the model of evolution used
9Two cases showing that phylogenetic analysis is helpful
- Coronaviruses
- From "Characterization of a Novel Coronavirus Associated with Severe Acute Respiratory Syndrome", Science, 1 May 2003.
- HIV
- From "Molecular epidemiology of HIV transmission in a dental practice", Science, 22 May 1992.
10The Florida dentist case
- A dentist from Florida appeared to have infected 7 of his patients. The dentist died, and the patients claimed insurance money.
- Samples
- The dentist, 7 patients, and controls (HIV-positive people from the same area).
- env gene
- HIV isolates from 5 patients and the dentist's strain clustered together with sufficient bootstrap support (80).
- Two patients had different virus strains.
- Conclusion
- The dentist infected 5 of his patients.
- The insurance company reached a settlement with the patients.
12Classification of coronaviruses
- There are three groups of coronaviruses: groups 1 and 2 contain only mammalian viruses, while group 3 contains only avian viruses.
- Classified into distinct species by
- host range
- antigenic relationships
- genomic organization
13Membrane Spanning
14Phylogenetic Analysis 1
15Phylogenetic Analysis 2
18There are three possible unrooted trees for four taxa (A, B, C, D)
Phylogenetic tree building (or inference) methods are aimed at discovering which of the possible unrooted trees is "correct". We would like this to be the true biological tree, that is, one that accurately represents the evolutionary history of the taxa. However, we must settle for discovering the computationally correct or optimal tree for the phylogenetic method of choice.
19The number of unrooted trees increases in a greater than exponential manner with the number of taxa
Number of unrooted trees for N taxa = (2N-5)! / [2^(N-3) (N-3)!]
20Inferring evolutionary relationships between the taxa requires rooting the tree
To root a tree mentally, imagine that the tree is made of string. Grab the string at the root and tug on it until the ends of the string (the taxa) fall opposite the root.
Note that in this rooted tree, taxon A is no more closely related to taxon B than it is to C or D.
21Now, try it again with the root at another position
(Figure: the unrooted tree for taxa A, B, C, and D with the root placed on a different branch, next to the resulting rooted tree.)
Note that in this rooted tree, taxon A is most closely related to taxon B, and together they are equally distantly related to taxa C and D.
22Each unrooted tree theoretically can be rooted anywhere along any of its branches
(Figure: example unrooted trees for four taxa A-D, five taxa A-E, and six taxa A-F.)
Taxa   Unrooted trees   x   Roots   =   Rooted trees
3      1                    3           3
4      3                    5           15
5      15                   7           105
6      105                  9           945
7      945                  11          10,395
8      10,395               13          135,135
9      135,135              15          2,027,025
...
30     8.69 x 10^36         57          4.95 x 10^38
Number of unrooted trees for N taxa = (2N-5)! / [2^(N-3) (N-3)!]
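The counts on this slide can be checked with a few lines of code. The sketch below assumes strictly bifurcating trees with labelled taxa and uses the standard product of odd numbers 3 x 5 x ... x (2N-5) for unrooted trees; an unrooted tree with N taxa has 2N-3 branches, and each branch is one possible root position. The function names are illustrative.

```python
def unrooted_trees(n):
    """Number of unrooted bifurcating trees for n >= 3 labelled taxa."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5 (empty product for n = 3).
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


def root_positions(n):
    """An unrooted bifurcating tree with n taxa has 2n-3 branches to root on."""
    return 2 * n - 3


def rooted_trees(n):
    """Each unrooted tree can be rooted on any one of its branches."""
    return unrooted_trees(n) * root_positions(n)


if __name__ == "__main__":
    for n in range(3, 10):
        print(n, unrooted_trees(n), root_positions(n), rooted_trees(n))
```

Because Python integers have arbitrary precision, the same functions can be called for much larger numbers of taxa without overflow.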
24Building a phylogenetic tree
- Phylogenetic analysis should be conceived as a search for a correct model.
- Presumable
- Particular
- Rationality
- Explanation
25Categories of Phylogenetic Tree Methods
- Distance-based
- UPGMA
- Neighbor Joining
- Fitch-Margoliash
- Minimum Evolution
- Least Squares
26Establish by UPGMA
Unweighted Pair Group Method with Arithmetic Mean
27Establish by UPGMA (Cont.)
Compute the difference (distance) matrix and look for the minimal distance.
(Figure: ATCC and ATGC are joined, with branch lengths 0.5 and 0.5.)
28Establish by UPGMA (Cont.)
29Establish by UPGMA (Cont.)
(Figure: TTCG and TCGG are joined.)
30Establish by UPGMA (Cont.)
31Establish by UPGMA (Cont.)
(Figure: tree with branch labels 1.5 and 1.5.)
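The UPGMA walk-through on these slides can be sketched in code. This is a minimal illustration, not the presenter's implementation: it assumes the distance between two sequences is simply the number of differing positions, and it reports each merge together with the height of the new node (half the average distance between the two clusters). The four sequences are the ones shown on the slides; the function names are hypothetical.

```python
from itertools import combinations


def hamming(s1, s2):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(s1, s2))


def upgma(seqs):
    """Cluster with UPGMA; return merges as (cluster1, cluster2, node_height)."""
    # Each cluster is a tuple of member names; start with one cluster per taxon.
    clusters = [(name,) for name in seqs]
    # Pairwise distances between the original sequences.
    d = {frozenset((a, b)): hamming(seqs[a], seqs[b])
         for a, b in combinations(seqs, 2)}

    def avg(c1, c2):
        # Arithmetic mean over all pairs of members, one from each cluster.
        total = sum(d[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        # Join the two clusters with the smallest average distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((c1, c2, avg(c1, c2) / 2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


if __name__ == "__main__":
    seqs = {s: s for s in ("ATCC", "ATGC", "TTCG", "TCGG")}
    for left, right, height in upgma(seqs):
        print(left, right, height)
```

Under these assumptions, ATCC and ATGC are joined first at height 0.5, TTCG and TCGG next at height 1.0, and the two pairs are joined last at height 1.5, which is consistent with the 0.5 and 1.5 labels on the slides.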
32Four steps in phylogenetic data analysis
- Alignment
- Building the data model
- Extracting a phylogenetic dataset
- Determining the substitution model
- models of heterogeneity
- Which model to use
- Tree building
- Tree evaluation
33Alignment - Building the data model
- How much computer dependence?
- Manually? Optimally?
- Are phylogenetic criteria preferred?
- Explicitly?
- Alignment parameter estimation
- Parameters should vary dynamically with divergence.
- Which alignment procedure is best?
- Hard to say unless the actual tree relationships are known beforehand.
- Mathematical optimization and analysis structure
- It is not yet clear that statistical models can determine the models.
34Alignment - Extraction of a phylogenetic data set
- One of the most important steps in phylogenetic tree analysis, because it produces the data set.
- Be careful when deleting ambiguously aligned regions and when inserting or deleting gaps.
- Try slightly modified alignments to determine how ambiguous regions in the alignment affect the result.
35Determining the Substitution Model
- The substitution model should be given the same emphasis as alignment and tree building!
- Which substitution model to use?
- The fewer the parameters, the better, because every parameter estimate has an associated variance.
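As a concrete example of a model with very few parameters, the sketch below applies the Jukes-Cantor correction, which assumes a single substitution rate shared by all nucleotide changes. The choice of Jukes-Cantor here is an illustration only (the slides list it later as a DNADIST option); the function name is hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor distance for an observed proportion p of differing sites.

    d = -(3/4) * ln(1 - (4/3) * p), defined only for 0 <= p < 0.75.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)


if __name__ == "__main__":
    for p in (0.05, 0.20, 0.50):
        print(f"p = {p:.2f} -> d = {jukes_cantor(p):.4f}")
```

The corrected distance is always at least as large as the observed proportion, and the gap widens as sequences diverge, because multiple substitutions at the same site become more likely.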
39Molecular phylogenetic tree building methods
There are many phylogenetic methods available today, each having strengths and weaknesses. Most can be classified as follows:
40Categories of Phylogenetic Tree Methods
- Distance-based
- UPGMA
- Neighbor Joining
- Fitch-Margoliash
- Minimum Evolution
- Least Squares
41Establish by N.J
42Establish by N.J (Cont.)
43Establish by N.J (Cont.) Step1
- Calculate the net divergence r(i) for each OTU from all other OTUs.
- r(A) = 5 + 4 + 7 + 6 + 8 = 30
- r(B) = 5 + 7 + 10 + 9 + 11 = 42
- r(C) = 32
- r(D) = 38
- r(E) = 34
- r(F) = 44
44Establish by N.J (Cont.) Step2
- Calculate a new distance matrix using, for each pair of OTUs, the formula M(ij) = d(ij) - [r(i) + r(j)] / (N - 2)
45Establish by N.J (Cont.) Step2
46Establish by N.J (Cont.) Step3
- Choose as neighbors those two OTUs for which M(ij) is the smallest. Then calculate the branch lengths from the new internal node U to the external OTUs A and B.
47Establish by N.J (Cont.) Step4
- define new distances from U to each other
terminal node
48Establish by N.J (Cont.) Step4
49Establish by N.J (Cont.) Step5
- N = N - 1 = 5
- The entire procedure is repeated starting at step 1.
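One round of the neighbor-joining procedure (steps 1 to 4) can be sketched as follows. The distance matrix itself did not survive in this transcript, so the matrix `D` below is an assumption: it was chosen so that its row sums match the net divergences listed in step 1 (30, 42, 32, 38, 34, 44). The formulas are the standard neighbor-joining ones, and the function names are hypothetical.

```python
# Assumed distance matrix (not in the transcript); row sums match r(A)..r(F).
D = {
    ("A", "B"): 5, ("A", "C"): 4, ("A", "D"): 7, ("A", "E"): 6, ("A", "F"): 8,
    ("B", "C"): 7, ("B", "D"): 10, ("B", "E"): 9, ("B", "F"): 11,
    ("C", "D"): 7, ("C", "E"): 6, ("C", "F"): 8,
    ("D", "E"): 5, ("D", "F"): 9,
    ("E", "F"): 8,
}
OTUS = ["A", "B", "C", "D", "E", "F"]


def dist(i, j):
    """Symmetric lookup into D, with zero distance from an OTU to itself."""
    return 0 if i == j else D.get((i, j), D.get((j, i)))


def net_divergence(otus):
    """Step 1: r(i) is the sum of distances from i to all other OTUs."""
    return {i: sum(dist(i, j) for j in otus) for i in otus}


def rate_corrected(otus, r):
    """Step 2: M(ij) = d(ij) - [r(i) + r(j)] / (N - 2)."""
    n = len(otus)
    return {(i, j): dist(i, j) - (r[i] + r[j]) / (n - 2)
            for a, i in enumerate(otus) for j in otus[a + 1:]}


def join_step(otus):
    """Steps 3 and 4: pick neighbors, branch lengths to U, distances from U."""
    n = len(otus)
    r = net_divergence(otus)
    m = rate_corrected(otus, r)
    i, j = min(m, key=m.get)  # on ties, the first pair in iteration order wins
    s_i = dist(i, j) / 2 + (r[i] - r[j]) / (2 * (n - 2))
    s_j = dist(i, j) - s_i
    to_u = {k: (dist(i, k) + dist(j, k) - dist(i, j)) / 2
            for k in otus if k not in (i, j)}
    return (i, j), (s_i, s_j), to_u


if __name__ == "__main__":
    print(net_divergence(OTUS))
    print(join_step(OTUS))
```

With this assumed matrix, the pairs A-B and D-E tie at M = -13; the sketch takes A-B because it comes first, giving branch lengths 1 and 4 from U to A and B. After this round, U replaces A and B, N drops from 6 to 5, and the procedure repeats.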
51Use of FM-algorithm for three sequences
Distance from A to B: a + b = 22 (1)
Distance from A to C: a + c = 39 (2)
Distance from B to C: b + c = 41 (3)
Subtract (3) from (2): a - b = -2 (4)
Add (1) and (4): 2a = 20, so a = 10
From (1) and (2): b = 12, c = 29
52Tree showing the relationship among three sequences A, B, and C
This calculation finds that the branch lengths of A and B from their common ancestor are not the same.
Under this calculation and model, A and B are diverging at different rates of evolution.
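The three-sequence calculation is a small linear system and can be written as a short helper. In the sketch below, the B to C distance is taken as 41, the value implied by the stated results b = 12 and c = 29; the function name is hypothetical.

```python
def three_point(d_ab, d_ac, d_bc):
    """Solve a + b = d_ab, a + c = d_ac, b + c = d_bc for branch lengths."""
    a = (d_ab + d_ac - d_bc) / 2
    b = d_ab - a
    c = d_ac - a
    return a, b, c


if __name__ == "__main__":
    a, b, c = three_point(22, 39, 41)
    print(f"a = {a}, b = {b}, c = {c}")
```

The same helper applies at every later stage of the Fitch-Margoliash procedure, where one of the three "sequences" is a composite of several taxa and its distances are averages.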
53Use of FM-algorithm for five sequences
54Use of FM-algorithm for five sequences
The most closely related sequences given in the distance table are D and E. A new table is made with the remaining sequences combined.
55Use of FM-algorithm for five sequences
56Tree showing relationships among sequences A-E
57Steps followed by the Fitch-Margoliash algorithm for phylogenetic analysis
- Find the most closely related pair of sequences.
- Treat the rest of the sequences as a single composite sequence.
- Calculate the branch lengths as in the above example with three sequences.
- Calculate the average distances between the joined pair (for example AB) and the other sequences, and make a new distance table.
- Identify the next pair of most closely related sequences.
- When necessary, calculate the lengths of intermediate branches.
- Repeat the entire procedure starting with all possible pairs.
- Calculate the predicted distances between each pair of sequences.
59Construction of ME
The tree with the shortest sum of branch lengths (the overall tree length) is chosen as the best tree.
60Construction of ME (Cont.)
62Categories of Phylogenetic Tree Methods
- Character-based
- Maximum Parsimony
- Maximum Likelihood
63Maximum parsimony method
- Prefers the tree that requires the minimum number of mutational changes.
- Pros
- Does not reduce the sequence information to a single number
- Evaluates different tree topologies
- Cons
- Slow for large data sets
- Sensitive to unequal rates of evolution
- Gives only the topology, not branch lengths
64Steps in building maximum parsimony tree
- Investigate all possible tree topologies
- Reconstruct ancestral sequences
- Choose topology with smallest number of steps
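65There are 5 substitutions
66(Figure: tree with sequence labels ACTGA, ATTGA, ATTGA, ATTGA, GTGGA, GTGAC and branch changes 1, 0, 0, 2, 4.)
There are 7 substitutions
67(Figure: tree with sequence labels ACTGA, ATTGA, ATTGA, ATTGA, GTGAC, GTGGC and branch changes 1, 0, 0, 4, 2.)
There are 7 substitutions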
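The three parsimony steps (enumerate topologies, reconstruct ancestral states, pick the smallest number of changes) can be sketched with Fitch's small-parsimony algorithm. This is an illustration under assumptions: the four leaf sequences are taken from the figures on these slides, the taxon names `s1` to `s4` and the three topologies are hypothetical, and each tree is rooted on its internal branch purely for the computation (the count does not depend on where the root is placed).

```python
def site_cost(node, seqs, site):
    """Return (possible ancestral states, substitutions) for one column."""
    if isinstance(node, str):  # leaf: the observed base at this site
        return {seqs[node][site]}, 0
    (left, lcost), (right, rcost) = (site_cost(child, seqs, site) for child in node)
    common = left & right
    if common:  # children agree on at least one state: no change needed here
        return common, lcost + rcost
    return left | right, lcost + rcost + 1  # disagreement costs one change


def fitch_score(tree, seqs):
    """Minimum number of substitutions required by a rooted binary tree."""
    length = len(next(iter(seqs.values())))
    return sum(site_cost(tree, seqs, s)[1] for s in range(length))


SEQS = {"s1": "ACTGA", "s2": "ATTGA", "s3": "GTGGA", "s4": "GTGAC"}

# The three possible unrooted topologies for four taxa.
TOPOLOGIES = [
    (("s1", "s2"), ("s3", "s4")),
    (("s1", "s3"), ("s2", "s4")),
    (("s1", "s4"), ("s2", "s3")),
]

if __name__ == "__main__":
    for tree in TOPOLOGIES:
        print(tree, fitch_score(tree, SEQS))
```

Under these assumptions the first topology needs 5 substitutions and the other two need 7 each, which agrees with the counts quoted on the slides, so the first topology would be chosen as the most parsimonious tree.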
68Maximum Parsimony Method
Branch and Bound !
69Maximum Parsimony Method (Cont.)
70Maximum Parsimony Method (Cont.)
71Maximum Parsimony Method (Cont.)
72Maximum Parsimony Method (Cont.)
74Tree Evaluation
75Tools Introduction
- http://evolution.genetics.washington.edu/phylip/software.html
- Website listing phylogenetic tree software
- http://www.tigr.org/tigr-scripts/CMR2/webmum/mumplot
- The Whole Genome Alignment Tool
76PHYLIP
- Phylogeny Inference Package (PHYLIP)
- Consists of about 30 programs that cover most aspects of phylogenetic tree analysis
- Free and available for a wide variety of computer platforms (DOS, Mac, Unix)
- A command-line program without a GUI
77Sequence data in FASTA format file
(Figure: PHYLIP workflow.)
- READSEQ: produces the input file in PHYLIP format
- DNADIST. Options: K2P, Jin and Nei, maximum likelihood, Jukes-Cantor
- PROTDIST. Options: PAM matrix, Kimura, categories model
- NEIGHBOR. Options: Neighbor-Joining, UPGMA, randomize input order
- CONSENSE. Options: outgroup, rooting
- View trees with standard text editors, or the treefile with Treetool or TreeView
78Sequence data in FASTA file
(Figure: PHYLIP workflow.)
- readseq
- DNADIST. Options: K2P, Jin and Nei, maximum likelihood, Jukes-Cantor
- PROTDIST. Options: PAM matrix, Kimura, categories model
- SEQBOOT. Options: bootstrap, jackknife, permute
- NEIGHBOR. Options: Neighbor-Joining, UPGMA, randomize input order
- CONSENSE. Options: outgroup, rooting
- View tree with standard text editor
79PHYLIP input file
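The input file shown on this slide did not survive in the transcript. As a rough guide, the sketch below writes an alignment in the classic sequential PHYLIP layout: a header line with the number of taxa and the number of characters, then one line per taxon with the name padded to 10 characters followed by the sequence. The function name is hypothetical, and details such as the interleaved variant or header spacing are not covered.

```python
def to_phylip(seqs):
    """Format {name: aligned sequence} as a sequential PHYLIP string."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    lines = [f"{len(seqs)} {lengths.pop()}"]
    for name, seq in seqs.items():
        # Classic PHYLIP reserves exactly 10 characters for the taxon name,
        # so longer names are truncated and shorter ones are space-padded.
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    alignment = {
        "SpeciesA": "ATGGCTATTCTTATAGTACG",
        "SpeciesB": "ATCGCTAGTCTTATATTACA",
    }
    print(to_phylip(alignment), end="")
```

Truncating names to 10 characters can make two taxa collide, so in practice it is safer to choose short unique names before writing the file.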
80PROTDIST output
87Other References in This Report
- Molecular Evolution and Phylogenetics, Masatoshi Nei and Sudhir Kumar
- Phylogenetic analysis, Caro-Beth Stewart
- Introduction to Bioinformatics, Arthur M. Lesk
88THE END