Intro to Phylogenetic Trees Lecture 5

1
Intro to Phylogenetic Trees, Lecture 5
Sections 7.1, 7.2 in Durbin et al.; Chapter 17 in
Gusfield. Slides by Shlomo Moran. Slight
modifications by Benny Chor.
2
Evolution
  • Evolution of new organisms is driven by:
  • Diversity: different individuals carry different variants of
    the same basic blueprint
  • Mutations: the DNA sequence can be changed by single
    base changes, deletion/insertion of DNA segments,
    etc.
  • Selection bias

3
The Tree of Life
Source Alberts et al
4
Tree of life: a better picture
After Ernst Haeckel, 1891
5
Primate evolution
A phylogeny, also called a phylogenetic tree, is a
tree that describes the sequence of speciation
events that led to the formation of a set of
current-day species.
6
Historical Note
  • Until the mid-1950s, phylogenies were constructed by
    experts based on their opinion (subjective
    criteria)
  • Since then, the focus has been on objective criteria for
    constructing phylogenetic trees
  • Thousands of articles in the last decades
  • Important for many aspects of biology:
  • Classification
  • Understanding biological mechanisms

7
Morphological vs. Molecular
  • Classical phylogenetic analysis: morphological
    features (number of legs, lengths of legs, etc.)
  • Modern biological methods allow the use of molecular
    features:
  • Gene sequences
  • Protein sequences
  • Analysis is based on homologous sequences (e.g.,
    globins) in different species

8
Morphological topology
(Based on McKenna and Bell, 1997)
Archonta
Ungulata
9
From sequences to a phylogenetic tree
Rat      QEPGGLVVPPTDA
Rabbit   QEPGGMVVPPTDA
Gorilla  QEPGGLVVPPTDA
Cat      REPGGLVVPPTEG
There are many possible types of sequences to use
(e.g., mitochondrial vs. nuclear proteins).
10
Mitochondrial topology
(Based on Pupko et al.)
11
Nuclear topology
(Based on Pupko et al. slide)
(tree by Madsen)
12
Theory of Evolution
  • Basic idea:
  • Speciation events lead to the creation of different
    species.
  • Speciation is caused by physical separation into
    groups where different genetic variants become
    dominant.
  • Any two species share a (possibly distant) common
    ancestor.

13
Phylogenetic trees
  • Leaves: current-day species
  • Internal nodes: hypothetical most recent common ancestors
  • Edge lengths: time from one speciation to the
    next

14
Types of Trees
  • A natural model to consider is that of rooted
    trees

Common Ancestor
15
Types of trees
  • An unrooted tree represents the same phylogeny
    without the root node

Depending on the model, data from current-day
species does not distinguish between different
placements of the root.
16
Rooted versus unrooted trees
[Figure: an unrooted tree with leaves a, b, c]
Represents all three rooted trees
17
Positioning Roots in Unrooted Trees
  • We can estimate the position of the root by
    introducing an outgroup: a set of species that
    are definitely distant from all the species of
    interest

[Figure: tree on Aardvark, Bison, Chimp, Dog, Elephant and Falcon, with the proposed root marked]
18
Type of Data
  • Distance-based:
  • Input is a matrix of distances between species
  • Can be the fraction of residues on which they disagree,
    or the alignment score between them, or ...
  • Character-based:
  • Examine each character (e.g., residue) separately
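
As a rough illustration of the distance-based input, the sketch below computes the "fraction of residues they disagree on" for the four aligned fragments from the earlier slide. The function name `p_distance` and the dictionary layout are assumptions made for this example, not part of the lecture.

```python
from itertools import combinations

# Aligned fragments from the "From sequences to a phylogenetic tree" slide.
seqs = {
    "Rat":     "QEPGGLVVPPTDA",
    "Rabbit":  "QEPGGMVVPPTDA",
    "Gorilla": "QEPGGLVVPPTDA",
    "Cat":     "REPGGLVVPPTEG",
}

def p_distance(s, t):
    """Fraction of aligned positions at which s and t differ."""
    if len(s) != len(t):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(s, t)) / len(s)

# Pairwise distances, keyed by (species, species) in insertion order.
dist = {(u, v): p_distance(seqs[u], seqs[v]) for u, v in combinations(seqs, 2)}

for (u, v), d in dist.items():
    print(f"{u:9s}{v:9s}{d:.3f}")
```

On these short fragments Rat and Gorilla are identical, so their distance is 0; real analyses use much longer alignments.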

19
Two Methods of Tree Construction
  • Distance based: a weighted tree that realizes the
    distances between the objects.
  • Character based: a tree that optimizes an
    objective function based on all characters in
    the input sequences (major methods are parsimony
    and likelihood).

We start with distance-based methods, considering
the following question: given a set of species
(leaves in a supposed tree) and distances
between them, construct a phylogeny which best
fits the distances.
20
Exact solution: Additive sets
  • Given a set M of L objects with an $L \times L$ distance
    matrix:
  • $d(i,i) = 0$, and for $i \ne j$, $d(i,j) > 0$
  • $d(i,j) = d(j,i)$.
  • For all i,j,k it holds that $d(i,k) \le
    d(i,j) + d(j,k)$.
  • Can we construct a weighted tree which realizes
    these distances?
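
The three requirements on the distance matrix can be checked mechanically. The following is a minimal sketch, assuming the matrix is given as a full square list of lists; the name `is_metric` and the tolerance parameter are choices made for this example.

```python
def is_metric(d, tol=1e-9):
    """Check zero diagonal, positivity, symmetry and the triangle inequality."""
    n = len(d)
    for i in range(n):
        if abs(d[i][i]) > tol:
            return False                      # d(i,i) must be 0
        for j in range(n):
            if i != j and d[i][j] <= 0:
                return False                  # d(i,j) > 0 for i != j
            if abs(d[i][j] - d[j][i]) > tol:
                return False                  # symmetry
            for k in range(n):
                if d[i][k] > d[i][j] + d[j][k] + tol:
                    return False              # triangle inequality
    return True

# The four-object matrix from the "How about four objects?" slide (order i, j, k, l).
four = [
    [0, 2, 2, 2],
    [2, 0, 2, 2],
    [2, 2, 0, 3],
    [2, 2, 3, 0],
]
# A matrix that breaks the triangle inequality: 5 > 1 + 1.
bad = [
    [0, 1, 5],
    [1, 0, 1],
    [5, 1, 0],
]
print(is_metric(four), is_metric(bad))
```

Passing this check is necessary but not sufficient for a tree to exist, which is the point of the additivity discussion that follows.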

21
Additive sets (cont)
  • We say that the set M with L objects is additive
    if there is a tree T, L of whose nodes correspond
    to the L objects, with positive weights on the
    edges, such that for all i,j, $d(i,j) = d_T(i,j)$,
    the length of the path from i to j in T.
  • Note: Sometimes the tree is required to be
    binary, and then the edge weights are required to
    be non-negative.
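
To make $d_T(i,j)$ concrete, here is a small sketch that stores a weighted tree as nested dictionaries and sums the edge weights along the unique path between two nodes. The representation, the helper names `add_edge` and `tree_distance`, and the example weights are all assumptions for illustration.

```python
def add_edge(adj, u, v, w):
    """Insert an undirected weighted edge into an adjacency dictionary."""
    adj.setdefault(u, {})[v] = w
    adj.setdefault(v, {})[u] = w

def tree_distance(adj, u, v):
    """Length of the unique path from u to v in a weighted tree."""
    stack = [(u, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == v:
            return dist
        for nxt, w in adj[node].items():
            if nxt != parent:              # never walk back along the same edge
                stack.append((nxt, node, dist + w))
    raise ValueError("v is not reachable from u")

# Hypothetical tree: leaves i, j, k joined to one internal node m.
adj = {}
add_edge(adj, "i", "m", 1.0)
add_edge(adj, "j", "m", 2.0)
add_edge(adj, "k", "m", 3.0)

print(tree_distance(adj, "i", "j"))
print(tree_distance(adj, "j", "k"))
```

A set is additive exactly when some such tree reproduces every entry of the distance matrix through `tree_distance`.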

22
Sets of three objects are always additive
  • For $L = 3$: There is always a (unique) tree with one
    internal node.

Thus
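
For three objects i, j, k, the edge lengths from the single internal node can be found by solving the three path equations, e.g. (length to i) + (length to j) = d(i,j). The sketch below does this; the function name `three_point_tree` and the return order are assumptions for this example, and the formulas are the standard solution of that linear system rather than a transcription of the slide.

```python
def three_point_tree(d_ij, d_ik, d_jk):
    """Edge lengths from the internal node to i, j and k, in that order."""
    to_i = (d_ij + d_ik - d_jk) / 2
    to_j = (d_ij + d_jk - d_ik) / 2
    to_k = (d_ik + d_jk - d_ij) / 2
    return to_i, to_j, to_k

to_i, to_j, to_k = three_point_tree(3, 4, 5)
print(to_i, to_j, to_k)

# The tree realizes the input distances.
assert to_i + to_j == 3 and to_i + to_k == 4 and to_j + to_k == 5
```

The triangle inequality guarantees that none of the three lengths is negative.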
23
How about four objects?
  • $L = 4$: Not all sets with 4 objects are additive
  • e.g., there is no tree which realizes the distances
    below.

     i  j  k  l
  i  0  2  2  2
  j     0  2  2
  k        0  3
  l           0
24
The Four Points Condition
  • Theorem: A set M of L objects is additive iff any
    subset of four objects can be labeled i,j,k,l so
    that
  • $d(i,k) + d(j,l) = d(i,l) + d(k,j) \ge d(i,j) +
    d(k,l)$
  • We call $\{\{i,j\},\{k,l\}\}$ the split of $\{i,j,k,l\}$.

Proof. Additivity $\Rightarrow$ 4 Points Condition: By the
figure...
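
The four points condition says that, among the three ways of pairing up four objects, the two largest pair sums are equal. A minimal sketch of that test follows, assuming a full square matrix that already satisfies the metric requirements; the names `four_point_ok` and `is_additive` are made up for this example.

```python
from itertools import combinations

def four_point_ok(d, i, j, k, l, tol=1e-9):
    """True if the two largest of the three pair sums coincide."""
    sums = sorted([d[i][j] + d[k][l],
                   d[i][k] + d[j][l],
                   d[i][l] + d[j][k]])
    return abs(sums[2] - sums[1]) <= tol

def is_additive(d):
    """Apply the four points condition to every subset of four objects."""
    return all(four_point_ok(d, *quad)
               for quad in combinations(range(len(d)), 4))

# The non-additive example from the "How about four objects?" slide.
not_additive = [
    [0, 2, 2, 2],
    [2, 0, 2, 2],
    [2, 2, 0, 3],
    [2, 2, 3, 0],
]
# Distances read off a hypothetical tree with a=1, b=2, c=3, f=4, y=1.
additive = [
    [0, 3, 5, 6],
    [3, 0, 6, 7],
    [5, 6, 0, 7],
    [6, 7, 7, 0],
]
print(is_additive(not_additive), is_additive(additive))
```

In the first matrix the pair sums are 5, 4 and 4, so the largest one is unmatched and the condition fails.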
25
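4P Condition $\Rightarrow$ Additivity
  • Induction on the number of objects, L.
  • For $L \le 3$ the condition is empty and a tree
    exists.
  • Consider $L = 4$.
  • $B = d(i,k) + d(j,l) = d(i,l) + d(j,k) \ge d(i,j) +
    d(k,l) = A$

Let $y = (B - A)/2 \ge 0$. Then the tree should look
as follows. We have to find the distances a, b, c
and f.
[Figure: leaves i and j joined to internal vertex m by edges of length a and b; leaves k and l joined to internal vertex n by edges of length c and f; edge (m,n) of length y]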
26
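Tree construction for $L = 4$
  • Construct the tree from the given distances as
    follows:
  • Construct a tree for i, j, k, with internal
    vertex m
  • Add vertex n, with $d(m,n) = y$
  • Add edge (n,l), with $c + f = d(k,l)$

[Figure: vertex n is placed on the path from m to k at distance y from m, and l is attached to n by an edge of length f]
Remains to prove: $d(i,l) = d_T(i,l)$ and $d(j,l) =
d_T(j,l)$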
27
Proof for $L = 4$
By the 4 points condition and the definition of
y: $d(i,l) = d(i,j) + d(k,l) + 2y - d(k,j) = a + y +
f = d_T(i,l)$ (the middle equality holds since
d(i,j), d(k,l) and d(k,j) are realized by the
tree). $d(j,l) = d_T(j,l)$ is proved similarly.
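
The construction for four objects can be traced numerically. The sketch below assumes the objects are already labeled so that $\{\{i,j\},\{k,l\}\}$ is the split, builds the three-object tree for i, j, k, places n at distance y from m on the path to k, and hangs l off n. The function name `quartet_tree` and the sample matrix are assumptions for this example.

```python
def quartet_tree(d, i, j, k, l):
    """Return edge lengths (a, b, c, f, y), assuming {{i,j},{k,l}} is the split."""
    A = d[i][j] + d[k][l]
    B = d[i][k] + d[j][l]
    y = (B - A) / 2                              # length of the edge (m, n)
    # Tree for i, j, k with internal vertex m.
    a = (d[i][j] + d[i][k] - d[j][k]) / 2        # edge (i, m)
    b = (d[i][j] + d[j][k] - d[i][k]) / 2        # edge (j, m)
    m_to_k = (d[i][k] + d[j][k] - d[i][j]) / 2   # path m..k, equal to y + c
    c = m_to_k - y                               # edge (n, k)
    f = d[k][l] - c                              # edge (n, l), from c + f = d(k,l)
    return a, b, c, f, y

# Hypothetical additive matrix, objects in the order i, j, k, l.
d = [
    [0, 3, 5, 6],
    [3, 0, 6, 7],
    [5, 6, 0, 7],
    [6, 7, 7, 0],
]
a, b, c, f, y = quartet_tree(d, 0, 1, 2, 3)
print(a, b, c, f, y)

# The two distances the proof still has to account for.
assert a + y + f == d[0][3]    # d_T(i,l) = d(i,l)
assert b + y + f == d[1][3]    # d_T(j,l) = d(j,l)
```

Only d(i,j), d(i,k), d(j,k) and d(k,l) are used to build the tree; the final assertions hold because the four points condition forces the remaining two entries.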
28
Induction step for $L > 4$
  • Remove object L from the set
  • By induction, there is a tree, T', for
    $\{1,2,\ldots,L-1\}$.
  • For each pair of labeled nodes (i,j) in T', let
    $a_{ij}$, $b_{ij}$, $c_{ij}$ be defined by the following figure

29
Induction step
  • Pick i and j that minimize $c_{ij}$.
  • T is constructed by adding L (and possibly $m_{ij}$)
    to T', as in the figure. Then $d(i,L) = d_T(i,L)$
    and $d(j,L) = d_T(j,L)$
  • Remains to prove: For each $k \ne i,j$: $d(k,L) =
    d_T(k,L)$.
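
The figure defining $a_{ij}$, $b_{ij}$, $c_{ij}$ is not preserved in this transcript. The sketch below assumes the usual reading: they are the three edge lengths of the tree on i, j and L, so that $c_{ij}$ is the distance from L to its attachment point $m_{ij}$ on the path from i to j. Under that assumption, choosing the pair with minimal $c_{ij}$ looks like this; the names `attachment` and `best_pair` are made up for the example.

```python
from itertools import combinations

def attachment(d, i, j, L):
    """Assumed definition: edge lengths of the three-object tree on i, j, L."""
    a = (d[i][j] + d[i][L] - d[j][L]) / 2    # from i to the attachment point
    b = (d[i][j] + d[j][L] - d[i][L]) / 2    # from j to the attachment point
    c = (d[i][L] + d[j][L] - d[i][j]) / 2    # from L to the attachment point
    return a, b, c

def best_pair(d, L):
    """Pair (i, j) among objects 0..L-1 that minimizes c_ij."""
    return min(combinations(range(L), 2),
               key=lambda p: attachment(d, p[0], p[1], L)[2])

# Hypothetical additive matrix; the last object (index 3) plays the role of L.
d = [
    [0, 3, 5, 6],
    [3, 0, 6, 7],
    [5, 6, 0, 7],
    [6, 7, 7, 0],
]
i, j = best_pair(d, 3)
print((i, j), attachment(d, i, j, 3))
```

Ties are possible; `min` simply returns the first minimizing pair it encounters.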

30
Induction step (cont.)
  • Let $k \ne i,j$ be an arbitrary node in T', and let n
    be the branching point of k in the path from i to
    j.
  • By the minimality of $c_{ij}$, $\{\{i,j\},\{k,L\}\}$ is not a
    split of $\{i,j,k,L\}$. So assume WLOG that
    $\{\{i,L\},\{j,k\}\}$ is a split of $\{i,j,k,L\}$.

31
Induction step (end)
  • Since $\{\{i,L\},\{j,k\}\}$ is a split, by the 4 points
    condition:
  • $d(L,k) = d(i,k) + d(L,j) - d(i,j)$
  • $d(i,k) = d_T(i,k)$ and $d(i,j) = d_T(i,j)$ by
    induction, and
  • $d(L,j) = d_T(L,j)$ by the construction.
  • Hence $d(L,k) = d_T(L,k)$.
  • QED

32
Dangers of Paralogs
  • If we happen to consider genes 1A, 2B, and 3A of
    species 1,2,3, we get a wrong tree that does not
    represent the phylogeny of the host species of
    the given sequences because duplication does not
    create new species.

[Figure: a gene duplication followed by speciation events (S), giving gene copies 1A, 2A, 3A and 1B, 2B, 3B in species 1, 2, 3]
In the sequel we assume all given sequences are
orthologs.