Phylogenetic Analysis in implementation way - PowerPoint PPT Presentation

Slides: 89
Provided by: gbaCsie

Transcript and Presenter's Notes

1
Phylogenetic Analysis -in implementation way
2
Abstract
  • Concepts of Phylogenetic Tree
  • Two Categories of Phylogenetic Tree
  • Character State Matrix
  • Distance Matrix
  • Tools Introduction

3
Definition of Phylogenetic Tree
  • What is a phylogenetic tree?
  • It presents how species relate to one another in
    terms of common ancestors.
  • How do we construct a phylogenetic tree?
  • Generally, we don't have enough data about the
    distinct ancestors of present-day species.
  • Most phylogenetic trees are hypotheses!

4
Common Phylogenetic Tree Terminology
  • Branches or lineages
  • A, B, C, D, E: represent the taxa (genes, populations,
    species, etc.) used to infer the phylogeny
  • Internal nodes or divergence points: represent
    hypothetical ancestors of the taxa
  • Ancestral node, or root, of the tree
5
Two Categories of Input Data for
Phylogenetic Trees
  • Comparative characters
  • Such as beak shape, number of fingers, or presence
    or absence of a feature; the resulting table is
    called the character state matrix.
  • Numerical distance data
  • Distances between objects; the resulting matrix is
    called the distance matrix.

6
Convergence (parallel evolution)
  • Two or more objects have the same states for the
    same characters without being closely related.
  • Ex: birds and insects can both fly, so these two
    objects share a state but are not
    genetically close.
  • In the inferred tree, convergence events should not
    occur, or their number should be minimized.

7
Types of data used in phylogenetic inference

Character-based methods: Use the aligned characters, such as DNA or
protein sequences, directly during tree inference.

Taxa        Characters
Species A   ATGGCTATTCTTATAGTACG
Species B   ATCGCTAGTCTTATATTACA
Species C   TTCACTAGACCTGTGGTCCA
Species D   TTGACCAGACCTGTGGTCCG
Species E   TTGACCAGTTCTCTAGTTCG

Distance-based methods: Transform the sequence data into pair-wise
distances (dissimilarities), and then use the matrix during tree building.

            A      B      C      D      E
Species A   ----   0.20   0.50   0.45   0.40
Species B   0.23   ----   0.40   0.55   0.50
Species C   0.87   0.59   ----   0.15   0.40
Species D   0.73   1.12   0.17   ----   0.25
Species E   0.59   0.89   0.61   0.31   ----
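As a small illustration of how aligned characters are turned into pair-wise distances, the sketch below computes the proportion of differing sites (the p-distance) for the five species on this slide. The choice of p-distance is an assumption made for the sketch; it reproduces the values above the diagonal of the distance matrix (for example 0.20 for A and B), while the values below the diagonal appear to come from a corrected distance that is not reproduced here. The function names are hypothetical.

```python
from itertools import combinations

# Aligned sequences copied from the slide.
SEQS = {
    "A": "ATGGCTATTCTTATAGTACG",
    "B": "ATCGCTAGTCTTATATTACA",
    "C": "TTCACTAGACCTGTGGTCCA",
    "D": "TTGACCAGACCTGTGGTCCG",
    "E": "TTGACCAGTTCTCTAGTTCG",
}


def p_distance(s, t):
    """Proportion of aligned sites at which two sequences differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s, t)) / len(s)


def distance_matrix(seqs):
    """Return {(name1, name2): p-distance} for every unordered pair."""
    return {(x, y): p_distance(seqs[x], seqs[y])
            for x, y in combinations(sorted(seqs), 2)}


for (x, y), dist in distance_matrix(SEQS).items():
    print(f"{x}-{y}: {dist:.2f}")
```

Once the matrix is built, the original characters are no longer consulted, which is why distance methods are fast but lose sequence information.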
8
Comparison between Character Methods and Distance
Methods
  • Pros of distance methods
  • Very fast
  • Cons of distance methods
  • Sequence information is reduced to a single number
  • Provide only one tree topology
  • Dependent on the model of evolution used

9
Two cases showing that phylogenetic analysis is helpful
  • Coronaviruses
  • From "Characterization of a Novel Coronavirus
    Associated with Severe Acute Respiratory Syndrome",
    Science, 1 May 2003.
  • HIV
  • From "Molecular epidemiology of HIV transmission
    in a dental practice", Science, 22 May 1992.

10
The Florida dentist case
  • A dentist from Florida seems to have infected 7
    of his patients. The dentist died and the
    patients claimed insurance money.
  • Samples
  • Dentist, 7 patients, and controls (HIV-positive
    people from the same area).
  • env gene
  • HIV isolates from 5 patients and the dentist's
    strain clustered together with sufficient
    bootstrap support (80).
  • Two patients had different virus strains.
  • Conclusion
  • The dentist infected 5 of his patients.
  • The insurance company made a deal with the
    patients.

11
(No Transcript)
12
Classifications of Corona-viruses
  • There are three groups of coronaviruses: groups
    1 and 2 contain only mammalian viruses, while
    group 3 contains only avian viruses.
  • Classified into distinct species by
  • host range
  • antigenic relationships
  • genomic organization

13
Membrane Spanning
14
Phylogenetic Analysis 1
15
Phylogenetic Analysis 2
16
(No Transcript)
17
(No Transcript)
18
There are three possible unrooted trees for four
taxa (A, B, C, D)
Phylogenetic tree building (or inference) methods
are aimed at discovering which of the possible
unrooted trees is "correct". We would like this
to be the true biological tree, that is, one
that accurately represents the evolutionary
history of the taxa. However, we must settle for
discovering the computationally correct or
optimal tree for the phylogenetic method of
choice.
19
The number of unrooted trees increases in a
greater-than-exponential manner with the number of
taxa
Number of unrooted trees for N taxa = (2N - 5)! / [2^(N - 3) (N - 3)!]
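The growth in the number of trees can be checked with a short calculation. The sketch below assumes strictly bifurcating trees and uses the standard count of (2N - 5)!! = 1 x 3 x 5 x ... x (2N - 5) unrooted trees for N taxa; each unrooted tree has 2N - 3 branches on which a root can be placed. The function names are hypothetical.

```python
def unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees: 1 * 3 * 5 * ... * (2N - 5)."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


def rooted_trees(n_taxa):
    """Each unrooted tree can be rooted on any of its 2N - 3 branches."""
    return unrooted_trees(n_taxa) * (2 * n_taxa - 3)


for n in range(3, 10):
    print(n, unrooted_trees(n), rooted_trees(n))
```

Python integers have arbitrary precision, so the same functions also work for larger numbers of taxa, where exhaustive search quickly becomes impractical.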
20
Inferring evolutionary relationships between the
taxa requires rooting the tree
To root a tree mentally, imagine that the tree is
made of string. Grab the string at the root
and tug on it until the ends of the string (the
taxa) fall opposite the root
Note that in this rooted tree, taxon A is no more
closely related to taxon B than it is to C or D.
21
Now, try it again with the root at another
position
[Figure: unrooted tree of taxa A, B, C, and D with the root placed on a different branch, and the resulting rooted tree]
Note that in this rooted tree, taxon A is most
closely related to taxon B, and together they are
equally distantly related to taxa C and D.
22
Each unrooted tree theoretically can be rooted
anywhere along any of its branches

Taxa   Unrooted trees   x Roots   Rooted trees
3      1                3         3
4      3                5         15
5      15               7         105
6      105              9         945
7      945              11        10,395
8      10,395           13        135,135
9      135,135          15        2,027,025
...
30     8.69 x 10^36     57        4.95 x 10^38
23
(No Transcript)
24
Building a phylogenetic tree
  • Phylogenetic analysis should be conceived as a
    search for a correct model.
  • Presumable
  • Particular
  • Rationality
  • Explanation

25
Categories of Phylogenetic Tree Methods
  • Distance-based
  • UPGMA
  • Neighbor Joining
  • Fitch-Margoliash
  • Minimum Evolution
  • Least Squares

26
Establish by UPGMA
Unweighted Pair Group Method with Arithmetic Mean
27
Establish by UPGMA (Cont.)
Find the minimal distance in the difference matrix
and join that pair first.
[Figure: ATCC and ATGC joined, with branch lengths 0.5 and 0.5]
28
Establish by UPGMA (Cont.)
29
Establish by UPGMA (Cont.)
[Figure: TTCG and TCGG joined]
30
Establish by UPGMA (Cont.)
31
Establish by UPGMA (Cont.)
[Figure: final tree, with branch labels 1.5 and 1.5]
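The UPGMA steps on the preceding slides can be sketched in a few lines. This is a minimal sketch, not the presenter's implementation: it assumes the distance between two sequences is the number of differing sites (Hamming distance), which reproduces the 0.5 and 1.5 values shown for the four sequences ATCC, ATGC, TTCG, and TCGG. The function names are hypothetical.

```python
from itertools import combinations


def hamming(s, t):
    """Number of sites at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(s, t))


def upgma(seqs):
    """Cluster sequences by UPGMA.

    Returns (tree, merges): tree is a nested tuple of names, and merges
    lists (left, right, node_height) in the order clusters were joined.
    """
    # Map each current cluster (a name or a nested tuple) to its leaves.
    clusters = {name: (name,) for name in seqs}
    merges = []

    def avg_dist(c1, c2):
        # Unweighted arithmetic mean over all leaf pairs across clusters.
        pairs = [(x, y) for x in clusters[c1] for y in clusters[c2]]
        return sum(hamming(seqs[x], seqs[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        # Join the two clusters with the minimal average distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg_dist(*p))
        dist = avg_dist(c1, c2)
        members = clusters.pop(c1) + clusters.pop(c2)
        clusters[(c1, c2)] = members
        merges.append((c1, c2, dist / 2))  # new node sits halfway
    (tree,) = clusters
    return tree, merges


SEQS = {s: s for s in ["ATCC", "ATGC", "TTCG", "TCGG"]}
tree, merges = upgma(SEQS)
print(tree)
for left, right, height in merges:
    print(left, right, height)
```

Because every node is placed halfway between the clusters it joins, UPGMA implicitly assumes that all lineages evolve at the same rate.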
32
Four steps in phylogenetic data analysis
  • Alignment
    • Building the data model
    • Extracting a phylogenetic dataset
  • Determining the substitution model
    • Models of heterogeneity
    • Which model to use
  • Tree building
  • Tree evaluation

33
Alignment - Building the data model
  • How much computer dependence?
  • manually ? optimally ?
  • Phylogenetic criteria preferred
  • explicitly ?
  • Alignment parameter estimation
  • parameters should vary dynamically with
    divergence
  • Which alignment procedure is best?
  • This cannot be judged unless the actual tree
    relationships are known beforehand.
  • Mathematical optimization and analysis structure
  • statistical models is not yet clear that can
    determine models

34
Alignment -Extraction of a phylogenetic data set
  • One of the most important steps in p-tree
    analysis because it produces the data set.
  • Be careful when deleting ambiguously aligned
    regions and inserting or deleting gaps.
  • Use slightly modified alignments to determine how
    ambiguous regions in the alignment affect the result.

35
Determining the Substitution Model
  • The substitution model should be given the same
    emphasis as alignment and tree building !
  • Which substitution model to use?
  • The fewer the parameters the better. This is
    because every parameter estimate has an
    associated variance.

36
(No Transcript)
37
(No Transcript)
38
(No Transcript)
39
Molecular phylogenetic tree building methods
There are many phylogenetic methods available
today, each having strengths and weaknesses.
Most can be classified as follows
40
Categories of Phylogenetic Tree Methods
  • Distance-based
  • UPGMA
  • Neighbor Joining
  • Fitch-Margoliash
  • Minimum Evolution
  • Least Squares

41
Establish by N.J
42
Establish by N.J (Cont.)
43
Establish by N.J (Cont.) Step1
  • Calculate the net divergence r (i) for each OTU
    from all other OTUs.
  • r(A) = 5 + 4 + 7 + 6 + 8 = 30
  • r(B) = 5 + 7 + 10 + 9 + 11 = 42
  • r(C) = 32
  • r(D) = 38
  • r(E) = 34
  • r(F) = 44

44
Establish by N.J (Cont.) Step2
  • Calculate a new distance matrix using, for each
    pair of OTUs, the formula
    M(ij) = d(ij) - [r(i) + r(j)] / (N - 2)

45
Establish by N.J (Cont.) Step2
46
Establish by N.J (Cont.) Step3
  • Choose as neighbors those two OTUs for which M(ij)
    is the smallest. Now we calculate the branch
    lengths from the internal node U to the external
    OTUs A and B.

47
Establish by N.J (Cont.) Step4
  • Define new distances from U to each other
    terminal node

48
Establish by N.J (Cont.) Step4
49
Establish by N.J (Cont.) Step5
  • N = N - 1 = 5
  • The entire procedure is repeated starting at step
    1
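One pass through steps 1 to 4 can be sketched as follows. Several things here are assumptions, because the distance matrix and the formulas on these slides did not survive in the transcript: the six-OTU distances are hypothetical values chosen so that the net divergences match r(A) = 30 through r(F) = 44, and the formulas for M(ij), the branch lengths, and the new distances are the standard neighbor-joining ones. The function name `nj_step` is also hypothetical.

```python
def nj_step(dist, otus):
    """Run one neighbor-joining iteration.

    dist maps frozenset({i, j}) to the distance between OTUs i and j.
    Returns a dict with the intermediate quantities of steps 1 to 4.
    """
    n = len(otus)

    def d(i, j):
        return dist[frozenset((i, j))]

    # Step 1: net divergence r(i) of each OTU from all others.
    r = {i: sum(d(i, j) for j in otus if j != i) for i in otus}

    # Step 2: M(ij) = d(ij) - [r(i) + r(j)] / (N - 2) for each pair.
    m = {(i, j): d(i, j) - (r[i] + r[j]) / (n - 2)
         for pos, i in enumerate(otus) for j in otus[pos + 1:]}

    # Step 3: the pair with the smallest M(ij) become neighbors
    # (ties are broken by taking the first pair encountered).
    i, j = min(m, key=m.get)
    s_iu = d(i, j) / 2 + (r[i] - r[j]) / (2 * (n - 2))
    s_ju = d(i, j) - s_iu

    # Step 4: distances from the new node U to every other OTU.
    u = i + j
    rest = [k for k in otus if k not in (i, j)]
    new_dist = {frozenset((a, b)): d(a, b)
                for pos, a in enumerate(rest) for b in rest[pos + 1:]}
    for k in rest:
        new_dist[frozenset((u, k))] = (d(i, k) + d(j, k) - d(i, j)) / 2

    return {"r": r, "M": m, "pair": (i, j), "branches": (s_iu, s_ju),
            "dist": new_dist, "otus": rest + [u]}


# Hypothetical pairwise distances (assumed, chosen to match the r values).
PAIRS = {"AB": 5, "AC": 4, "AD": 7, "AE": 6, "AF": 8,
         "BC": 7, "BD": 10, "BE": 9, "BF": 11,
         "CD": 7, "CE": 6, "CF": 8, "DE": 5, "DF": 9, "EF": 8}
DIST = {frozenset(pair): value for pair, value in PAIRS.items()}

step = nj_step(DIST, list("ABCDEF"))
print(step["pair"], step["branches"], step["otus"])
```

Step 5 then amounts to calling `nj_step(step["dist"], step["otus"])` again with the reduced set of five OTUs, and repeating until the tree is fully resolved.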

50
Categories of Phylogenetic Tree Methods
  • Distance-based
  • UPGMA
  • Neighbor Joining
  • Fitch-Margoliash
  • Minimum Evolution
  • Least Squares

51
Use of the FM algorithm for three sequences
Distance from A to B: a + b = 22 (1)
Distance from A to C: a + c = 39 (2)
Distance from B to C: b + c = 41 (3)
Subtract (3) from (2): a - b = -2 (4)
Add (1) and (4): 2a = 20, so a = 10
From (1) and (2): b = 12, c = 29
52
Tree showing relationships among three sequences
A, B, and C.
This calculation finds that the branch lengths of
A and B from their common ancestor are not the
same.
A and B are diverging at different rates of
evolution according to this calculation and model
Use of the FM algorithm for five sequences
54
Use of the FM algorithm for five sequences
The most closely related sequences given in the
distance table are D and E. A new table is made
with the remaining sequences combined.
55
Use of the FM algorithm for five sequences
56
Tree showing relationships among sequences A-E
57
Steps followed by the Fitch-Margoliash algorithm for
phylogenetic analysis
  • Find the most closely related pair of sequences,
    for example A and B.
  • Treat the rest of the sequences as a single
    composite sequence.
  • Calculate the branch lengths as in the above
    example with three sequences.
  • Treat A and B as a composite sequence AB, calculate
    the average distances between AB and each of the
    other sequences, and make a new distance table.
  • Identify the next pair of most closely related
    sequences.
  • When necessary, calculate the lengths of
    intermediate branches.
  • Repeat the entire procedure starting with all
    possible pairs.
  • Calculate the predicted distances between each
    pair of sequences.

58
Categories of Phylogenetic Tree Methods
  • Distance-based
  • UPGMA
  • Neighbor Joining
  • Fitch-Margoliash
  • Minimum Evolution
  • Least Squares

59
Construction of ME
The tree with the shortest sum of branch
lengths (or overall tree length) is chosen as the
best tree.
60
Construction of ME (Cont.)
61
Categories of Phylogenetic Tree Methods
  • Distance-based
  • UPGMA
  • Neighbor Joining
  • Fitch-Margoliash
  • Minimum Evolution
  • Least Squares

62
Categories of Phylogenetic Tree Methods
  • Character-based
  • Maximum Parsimony
  • Maximum Likelihood

63
Maximum parsimony method
  • Chooses the tree that requires the minimum number
    of mutational changes.
  • Pros
  • Does not reduce the sequence information to a
    single number
  • Evaluates different tree topologies
  • Cons
  • Slow for large data sets
  • Sensitive to unequal rates of evolution
  • Gives only the topology, not branch lengths

64
Steps in building maximum parsimony tree
  • Investigate all possible tree topologies
  • Reconstruct ancestral sequences
  • Choose topology with smallest number of steps
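The three steps above can be sketched for four taxa, where only three topologies exist. The slides do not name a counting method, so this sketch swaps in Fitch's algorithm, which counts the minimum number of substitutions per site without storing full ancestral sequences. The sequences are those of species A to D from the earlier character matrix, and the function names are hypothetical.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, substitutions) for one site."""
    if isinstance(tree, str):              # leaf: observed state, no cost
        return {states[tree]}, 0
    left, right = tree
    lset, lcost = fitch_site(left, states)
    rset, rcost = fitch_site(right, states)
    common = lset & rset
    if common:                             # children agree: no change
        return common, lcost + rcost
    return lset | rset, lcost + rcost + 1  # disagree: one substitution


def parsimony_score(tree, seqs):
    """Total number of substitutions the tree requires over all sites."""
    length = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in seqs.items()})[1]
        for i in range(length)
    )


SEQS = {
    "A": "ATGGCTATTCTTATAGTACG",
    "B": "ATCGCTAGTCTTATATTACA",
    "C": "TTCACTAGACCTGTGGTCCA",
    "D": "TTGACCAGACCTGTGGTCCG",
}

# The three possible topologies for four taxa, written as nested pairs.
TOPOLOGIES = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]

scores = {tree: parsimony_score(tree, SEQS) for tree in TOPOLOGIES}
best = min(scores, key=scores.get)
print(scores)
print("most parsimonious:", best)
```

Exhaustive evaluation like this is only feasible for a handful of taxa; for larger data sets the search has to be pruned, for example with branch and bound.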

65
There are 5 substitutions
66
[Figure: tree with sequences ACTGA, ATTGA, ATTGA, ATTGA, GTGGA, and GTGAC at its nodes and branch changes 1, 0, 0, 2, 4]
There are 7 substitutions
67
[Figure: tree with sequences ACTGA, ATTGA, ATTGA, ATTGA, GTGAC, and GTGGC at its nodes and branch changes 1, 0, 0, 4, 2]
There are 7 substitutions
68
Maximum Parsimony Method
Branch and Bound !
69
Maximum Parsimony Method (Cont.)
70
Maximum Parsimony Method (Cont.)
71
Maximum Parsimony Method (Cont.)
72
Maximum Parsimony Method (Cont.)
73
Categories of Phylogenetic Tree Methods
  • Character-based
  • Maximum Parsimony
  • Maximum Likelihood

74
Tree Evaluation
75
Tools Introduction
  • http://evolution.genetics.washington.edu/phylip/software.html
  • Comprehensive website for phylogenetic tree software
  • http://www.tigr.org/tigr-scripts/CMR2/webmum/mumplot
  • The Whole Genome Alignment Tool

76
PHYLIP
  • Phylogeny Inference Package (PHYLIP)
  • Consists of about 30 programs that cover most
    aspects of p-tree analysis
  • Free and available for a wide variety of computer
    platforms (DOS, Mac, Unix)
  • A command-line program without a GUI

77
Sequence data in FASTA format file
READSEQ
Input file in PHYLIP format
DNADIST Options: K2P, Jin and Nei, maximum likelihood, Jukes-Cantor
PROTDIST Options: PAM matrix, Kimura, categories model
NEIGHBOR Options: Neighbor-Joining, UPGMA, randomize input order
CONSENSE Options: outgroup, rooting
View trees with standard text editors, or open the treefile with Treetool
or TreeView
78
Sequence data in FASTA file
readseq
DNADIST Options: K2P, Jin and Nei, maximum likelihood, Jukes-Cantor
PROTDIST Options: PAM matrix, Kimura, categories model
SEQBOOT Options: bootstrap, jackknife, permute
NEIGHBOR Options: Neighbor-Joining, UPGMA, randomize input order
CONSENSE Options: outgroup, rooting
View tree with standard text editor
79
PHYLIP input file
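The contents of this slide are not in the transcript. As a rough illustration of the conversion that READSEQ performs in the workflow above, the sketch below turns aligned FASTA text into the classic sequential PHYLIP layout: a first line with the number of sequences and the number of sites, then one line per sequence with the name padded to 10 characters. It is a stand-in written for this note, not READSEQ itself, and the function name and sample species names are hypothetical.

```python
def fasta_to_phylip(fasta_text):
    """Convert aligned FASTA text to sequential PHYLIP format."""
    records, name = {}, None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line

    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # Header line: number of sequences, then number of sites.
    out = [f"{len(records):>5} {lengths.pop():>5}"]
    for name, seq in records.items():
        # Names are truncated or padded to exactly 10 characters.
        out.append(f"{name[:10]:<10}{seq}")
    return "\n".join(out) + "\n"


EXAMPLE = """>SpeciesA
ATGGCTATTC
TTATAGTACG
>SpeciesB
ATCGCTAGTCTTATATTACA
"""

print(fasta_to_phylip(EXAMPLE), end="")
```

Names longer than 10 characters are truncated here, so distinct sequences should have names that differ within their first 10 characters.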
80
PROTDIST output
81
(No Transcript)
82
(No Transcript)
83
(No Transcript)
84
(No Transcript)
85
(No Transcript)
86
(No Transcript)
87
Other References Used in This Presentation
  • Molecular Evolution and Phylogenetics
  • Masatoshi Nei and Sudhir Kumar
  • Phylogenetic Analysis
  • Caro-Beth Stewart
  • Introduction to Bioinformatics
  • Arthur M. Lesk
88
THE END
  • Thank you for your attention