TuTh - PowerPoint PPT Presentation

Slides: 29
Provided by: Biol2
1
Fall 2009 MB437/537 3 credits
Molecular Evolution / ADVANCES IN Molecular Evolution
What are the latest theories on the origins of life?
What are genome sequencing projects teaching us about evolutionary complexity?
What are the bioethical implications of your future research?
LUCA
From the Big Bang to Bioinformatics and Beyond
  • Tu/Th
  • 11:00 AM - 12:15 PM
  • LEWIS HALL 110

Teach Evolution! Learn Science!
Professor Marcie McClure marsmcclure_at_gmail.com
2
MOLECULAR EVOLUTION MB437
ADVANCES IN MOLECULAR EVOLUTION MB537
SYLLABUS
Lecture 1   9/1/09    Comments. Organization. Introduction.
Lecture 2   9/3/09    Evolution: the Big Picture
Lecture 3   9/8/09    The BIG BANG and formation of the elements necessary for life.
Lecture 4   9/10/09   Biogenesis I: The primitive earth and the prebiotic soup.
Lecture 5   9/15/09   Biogenesis II: Self-assembly, Energetics and the Protocell.
Lecture 6   9/17/09   Biogenesis III: More on protocellular formation.
Lecture 7   9/22/09   Biogenesis IV: Protein or Nucleic Acids first? RNA or DNA?
Lecture 8   9/24/09   The RNA world, the three Domains of life, and LUCA or LUCC.
Lecture 9   9/29/09   Origin of the Genetic Code and more on LUCC
Lecture 10  10/01/09  Last Day of LUCA; begin Genomes: Content and Architecture. Chap 8
            10/6/09   open discussion
Lecture 11  10/8/09   Mutation: nucleotide substitutions and amino acid replacements. Chap 1, 3
Lecture 12  10/13/09  Methods: Analyzing sequences, rates/patterns. Chap 1, 3-4
Lecture 13  10/15/09  Molecular Clock and Molecular Phylogeny I: History. Chap 5
Lecture 14  10/20/09  Molecular Phylogeny I: terms, definitions, and limits. Chap 5
Lecture 15  10/22/09  Molecular Phylogeny II: Determining a phylogenetic tree and Bayesian trees
Lecture 16  10/27/09  Molecular Phylogeny III: The dance of the Genome and Genome Trees.
Lecture 17  10/29/09  Deviation from Tree-like behavior: horizontal transmission of information.
Lecture 19  11/3/09   EXAM
            11/5/09   Convergent Evolution: the antifreeze story.
Lecture 20  11/10/09  Evolution of Viruses.
Lecture 21  11/12/09  Retroid Agents: eukaryotic hosts and disease states.
Lecture 22  11/17/09  Do viral RNA polymerases share ancestry?
Lecture 23  11/19/09  Bioethics of the Human Genome Project / Introduction to Bioinformatics.
Lecture 24  11/24/09  open discussion
            11/25-27/09  THANKSGIVING HOLIDAY
Lecture 26  12/1/09   Death
Lecture 27  12/3/09   Prion
Lecture 28  12/8/09   Flu
Lecture 29  12/9/09   Carbon
BIG BANG | PRIMORDIAL SOUP | LUCA | BASIC MOLECULAR EVOLUTION ANALYSIS | SPECIAL TOPICS | BIOETHICS
3
Basic Phylogenetic Inference Methods
1) In distance matrix methods, evolutionary
distance is computed by counting all the nucleic
acid or protein substitutions for all pairwise
relationships of a multiple alignment. UPGMA
(unweighted pair group method with arithmetic
means, along with weighted and various other
offshoots) employs a sequential clustering
algorithm. The distance values are identified in
order of similarity and a tree is built in a
stepwise manner. The most closely related
sequences are joined by a node, then the next
most closely related sequences are added, etc.
As the connected set of sequences accumulates,
it is treated as a composite set.
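
The clustering steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used by any particular package: the function name `upgma`, the nested-tuple tree representation, and the `frozenset`-keyed distance table are all assumptions made for the example, and branch lengths are omitted.

```python
from itertools import combinations

def upgma(names, dist):
    """Cluster taxa with UPGMA and return the join order as nested tuples.

    names: list of taxon labels (hypothetical input format).
    dist:  dict mapping frozenset({a, b}) -> pairwise distance.
    """
    # Each active cluster maps to the number of leaves it contains.
    clusters = {name: 1 for name in names}
    d = dict(dist)
    while len(clusters) > 1:
        # Join the two closest clusters (ties go to the first pair found).
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        na, nb = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # The composite set's distance to every other cluster is the
        # size-weighted average, i.e. the mean over all leaf pairs.
        for c in clusters:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        clusters[merged] = na + nb
    return next(iter(clusters))

# Made-up distances for four taxa.
example = {
    frozenset(("A", "B")): 2, frozenset(("A", "C")): 6,
    frozenset(("B", "C")): 6, frozenset(("A", "D")): 10,
    frozenset(("B", "D")): 10, frozenset(("C", "D")): 10,
}
tree = upgma(["A", "B", "C", "D"], example)
```

A and B are joined first, C is then added to the (A, B) composite, and D joins last.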

2) Maximum parsimony methods try to find the most
efficient path between two evolutionary states.
This approach is based on finding the minimum
number of mutations to explain the differences
among the sequences. An initial tree topology is
specified and each position in the sequence is
examined in support of each tree. All reasonable
topologies are examined until a tree with
minimal numbers of changes is chosen and
designated the best tree.
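
One common way to count the minimum number of changes a fixed topology requires is Fitch's algorithm. The slide does not name a specific algorithm, so using Fitch here is an assumption, and the function names and the toy alignment are hypothetical.

```python
def fitch_score(tree, states):
    """Minimum number of changes for one alignment column on a fixed tree.

    tree:   nested 2-tuples of leaf names, e.g. (("A", "B"), ("C", "D")).
    states: dict mapping leaf name -> character observed at this site.
    """
    def visit(node):
        if isinstance(node, str):        # leaf: its state set is fixed
            return {states[node]}, 0
        (ls, lc), (rs, rc) = visit(node[0]), visit(node[1])
        common = ls & rs
        if common:                       # children agree: no new change
            return common, lc + rc
        return ls | rs, lc + rc + 1      # children disagree: one change
    return visit(tree)[1]

def tree_length(tree, alignment):
    """Sum the per-site Fitch scores over every column of the alignment."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    return sum(
        fitch_score(tree, {n: alignment[n][i] for n in names})
        for i in range(n_sites)
    )

# Toy alignment (made up) scored on two candidate topologies.
aln = {"A": "AAG", "B": "AAG", "C": "CAG", "D": "CAT"}
t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
lengths = {"t1": tree_length(t1, aln), "t2": tree_length(t2, aln)}
```

The topology with the smaller length (here `t1`) would be preferred under parsimony.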
3) Maximum likelihood is similar to the maximum
parsimony approach in that it checks every
reasonable tree topology and examines the
support for each tree by every sequence. The
best tree is the one which maximizes the
probability of the sequences having been
generated by the pathway specified by the tree.
This is a computationally intensive method that is
supported by the PHYLIP package of Felsenstein.
4
Strengths and weaknesses of these approaches.
1) a) UPGMA: rate constancy should hold.
b) Neighbor-joining (additive method): not good
for multiple hits or very distant relationships.
c) MP is best because this method only
calculates the shortest path.
  • For distant sequences, parallelism will be
    underestimated.
2) If rates vary significantly these methods are
not robust.
5
Rooting trees
1) The out-group must not be too distant or
distance estimates will be unreliable.
Example: if mammals, then use marsupials as an
out-group; birds can only be used if the gene is
highly conserved.
2) The out-group must have an external measure that
guarantees that it is really an out-group.
3) Multiple out-groups can increase the reliability
of the distance estimate if they are not too
far out.
6
A phylogenetic tree is a hypothesis of how
current-day, extant OTUs, whatever they
may be, evolved from a common ancestor.
  • A hypothesis must be tested.

7
Assessing tree reliability
There are two basic questions
a) Which parts of this tree are statistically
robust?
b) Is this tree statistically better than other
trees?
Trees are statistically inferred; therefore,
trees and their component parts should be assessed in
a statistical manner.
What is bootstrapping versus jackknifing?
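
As a rough answer to the question above: in common phylogenetic usage, bootstrapping resamples alignment columns with replacement to build a pseudo-alignment of the original length, while jackknifing deletes a fraction of the columns without replacement. The sketch below shows only the resampling step. The function names and the 50% retention default are assumptions for illustration, and tree building on each replicate is left out.

```python
import random

def bootstrap_columns(n_sites, rng):
    """Draw n_sites column indices WITH replacement (duplicates allowed)."""
    return [rng.randrange(n_sites) for _ in range(n_sites)]

def jackknife_columns(n_sites, rng, keep=0.5):
    """Keep a random subset of columns WITHOUT replacement."""
    k = max(1, int(n_sites * keep))
    return sorted(rng.sample(range(n_sites), k))

def resample(alignment, columns):
    """Build a pseudo-alignment from the chosen column indices."""
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }

rng = random.Random(0)                      # fixed seed for repeatability
aln = {"A": "ACGTACGTAC", "B": "ACGAACGTTC"}
boot = resample(aln, bootstrap_columns(10, rng))
jack = resample(aln, jackknife_columns(10, rng))
```

Each replicate would then be fed to the same tree-building method, and the support for a branch is the fraction of replicate trees that contain it.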
8
Phylogenetic Reconstruction
1) Rates and patterns
2) Terms, definitions and limits
3) How to determine a phylogenetic tree
4) Improvements to Trees
9
Problems in Phylogenetic Reconstruction
1) Lack of significant phylogenetic signal
2) Unequal rates of change
3) The well-known long-branch attraction (LBA)
4) Lack of accounting for the co-evolution of
sites
5) Multiple genomic and gene duplications and
subsequent loss
6) Multiple horizontal exchanges of genetic
information
10
Some ways to improve the accuracy of
reconstruction.
1) Increasing sample size, i.e., addition of more
taxa
a) evolutionary rates: if very low or high,
adding more taxa will not help
b) sequence length: if sequences are short, more will help
c) topology to be reconstructed
i. Sparsely sampled ML trees suffer from LBA;
additional taxa help with this problem.
ii. Parsimony is less efficient than ML in
improving topology with the use of additional taxa.
2) Increasing sequence length: if sequences are
shorter than 500 bases, the benefits of
increasing sequence length outweigh those of adding
more taxa, using any method developed thus far
3) Incorporating knowledge of co-evolving sites
11
Bottom line from simulation work through 2002:
Doubling the number of randomly sampled taxa (33
to 66) caused the same reduction in phylogenetic
error as doubling the sequence length (1K to 2K)
as long as the rates of change/site were not
extremely low or high.
12
Software for nucleic acid/protein substitution
analysis: Felsenstein's PHYLIP and Kumar's MEGA.
They can make trees by parsimony, distance matrix,
and maximum likelihood.
  • To overcome the computational intractability of
    exhaustive searches for more than 9 OTUs, several
    approaches are available, e.g., branch-and-bound,
    stepwise addition, and branch swapping.
  • Tree topologies are routinely evaluated by
    bootstrapping. This provides statistical
    confidence, or lack thereof, for the tree topology.
  • When more than one highly probable tree is
    observed, a consensus tree can be constructed.
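
To see why exhaustive searches become intractable so quickly, it helps to count the candidate topologies. The number of distinct unrooted, strictly bifurcating trees for n taxa is the double factorial (2n-5)!!, a standard combinatorial result. The helper name below is hypothetical.

```python
def unrooted_tree_count(n_taxa):
    """Number of unrooted bifurcating topologies: 3 * 5 * ... * (2n - 5)."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):   # k = 3, 5, ..., 2n - 5
        count *= k
    return count

counts = {n: unrooted_tree_count(n) for n in (4, 9, 10, 20)}
```

Nine taxa already give 135,135 topologies and ten give 2,027,025, which is why heuristics such as stepwise addition and branch swapping are used instead of scoring every tree.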
13
Assumptions of the models
  • Changes in different copies of genes are
    independent.
  • Changes at each site are independent.
  • All sites change at the same rate.
  • All bases are equally frequent.

REALITY
1) The same gene in different organisms in the
same environment may well change in a similar
manner (parallelism)
2) Gene products are three dimensional objects
with both short and long range interactions.
Some sites certainly do not change independently
of others.
3) Functionally important sites are more
highly constrained; therefore, not all sites
change at the same rate.
4) Bases are not equally distributed in the
genome.
Even though these inconsistencies exist, for the
most part, these models work rather well for
sequences which have not diverged too much.
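
Equal base frequencies, a single rate for all sites, and independent sites are the assumptions of the simplest substitution model, Jukes-Cantor (JC69). The slides do not name a model, so using JC69 here is an assumption, and the function names are hypothetical. The sketch shows how the model corrects an observed proportion of differences p for multiple hits, d = -(3/4) ln(1 - 4p/3), and why it breaks down for highly diverged sequences.

```python
import math

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def jukes_cantor(p):
    """JC69 distance (substitutions per site) from the observed proportion p."""
    if p >= 0.75:
        # Random DNA sequences are expected to differ at 3/4 of sites,
        # so the correction is undefined at or beyond saturation.
        raise ValueError("sequences too diverged for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)

p = p_distance("ACGTACGT", "ACGTACGA")   # 1 difference in 8 sites
d_small = jukes_cantor(p)
d_large = jukes_cantor(0.3)
```

For small p the corrected distance is close to p, while at p = 0.3 it is already about 0.38, and the correction grows without bound as p approaches 0.75.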
14
Problems in Phylogenetic Reconstruction
1) Lack of significant phylogenetic signal
2) Unequal rates of change
3) The well-known long-branch attraction (LBA)
4) Lack of accounting for the co-evolution of
sites
5) Multiple genomic and gene duplications and
subsequent loss
6) Multiple horizontal exchanges of genetic
information
16
What is a phylogenetic HMM?
A Bayesian inference of phylogenetic
reconstruction.
What is Bayesian Inference?
Bayesian inference has always been controversial.
18
Posterior = likelihood x prior / marginal
likelihood
p(H|D) = p(D|H) p(H) / p(D)
H = the hypothesis
D = the data
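
A small numeric sketch of the rule above, treating three candidate tree topologies as the hypotheses H. The prior and likelihood values are made up purely for illustration, and the function name is hypothetical.

```python
def posterior(prior, likelihood):
    """Apply Bayes' rule over a discrete set of hypotheses.

    prior:      dict hypothesis -> p(H)
    likelihood: dict hypothesis -> p(D|H)
    Returns a dict hypothesis -> p(H|D).
    """
    # Marginal likelihood p(D): sum of p(D|H) p(H) over all hypotheses.
    marginal = sum(prior[h] * likelihood[h] for h in prior)
    return {h: prior[h] * likelihood[h] / marginal for h in prior}

# Flat prior: no topology is favoured before the data are seen.
prior = {"tree1": 1 / 3, "tree2": 1 / 3, "tree3": 1 / 3}
# Hypothetical probabilities of the data under each topology.
likelihood = {"tree1": 0.02, "tree2": 0.01, "tree3": 0.01}
post = posterior(prior, likelihood)
```

With a flat prior the posterior simply follows the likelihoods (0.5, 0.25, 0.25 here). A different prior would shift these values, and that dependence is the source of the controversy over Bayesian inference.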
19
Bayesian Inference is a different way of thinking
about probability.
Bayesian inference is a subjective interpretation
of probability.
When the probability of an occurrence is unknown,
an opinion can be expressed about what is unknown
as a prior probability.
What is a prior probability? It is the
probability distribution describing how much
weight an observer places on each belief
without knowledge of the data.
After observing data, one can alter an
opinion about the values assigned in the prior
probability. This new probability distribution,
called the posterior distribution, is
calculated by Bayes' rule.
All of the observer's knowledge about the prior
distribution is contained in the posterior
distribution, and statistical inferences are made
by summarizing this distribution.
Bayes' rule turns prior probabilities into
posterior probabilities. Posterior probabilities
incorporate what was observed in the data.
So what is so controversial about Bayesian
inference?
20
There is no agreement on how much weight
should be placed on beliefs and opinions about
unknown events.
Furthermore, there is the issue of whether or not
a prior probability on an unknown event can even
exist.
This is a philosophical question, not a scientific
one.
21
The Bayesian approach to phylogenetic
reconstruction tries to fit the data to a model
but it uses a prior distribution of the values
of what is believed to be a good tree.
What are we trying to infer here?
22
The problem of so-called Phylo-HMMs is that they
have to assume a prior distribution of what the
tree is without any information about it.
A Phylo-HMM makes a phylogeny for each column of
a multiple alignment based on the prior
knowledge of the relationship of the nucleotides
of the previous column.
An exact algorithm to do this calculation is
computationally intractable. So implementations of
Phylo-HMMs must use approximate approaches.
Even these approximate Phylo-HMMs are very
computationally intensive.
23
A Phylo-HMM does not assume a constant rate of
change.
So all of this boils down to whether or not
Bayesian inference or Phylo-HMMs perform better than
classic methods.
24
Bayesian tree of Retroposons
Comparing BI to NJ
Is this a fair comparison?
25
Preliminary observations: consensus trees
generated with a mixed amino acid model and an
eight-category gamma rate distribution produced high
posterior probabilities for a number of
incorrect internodes, even after hundreds of thousands of
iterations and apparent convergence.
Biologically Impossible Relationships
This issue has been attributed to internal branch
lengths that are too close for MrBayes3.1 to
correctly resolve.
These results are in contrast to the other
methods (NJ, ME, MP and UPGMA), which give
biologically supported internodes with lower
bootstrap support values.
The MrBayes3.1 documentation suggests the use of
topology constraints that allow the incorporation
of prior biological knowledge regarding those
highly related sequences that should branch
together.
26
[Figure: labels recovered from the image are "82" and "Outgroup clade".]
27
How is a tree topology inconsistent with Biology?
Inconsistent with the fossil record
Oops, what about horizontal transfer?
Other trees
What do we know that can validate the topology of
a tree?
Data
Percent Identity
Similarity or Distance scores
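
Percent identity is the simplest of these data-derived checks. A minimal sketch follows, assuming two sequences taken from the same alignment. Skipping gapped columns is one convention among several (others count gaps as mismatches), and the function name is hypothetical.

```python
def percent_identity(a, b):
    """Percent of identical residues over columns where neither has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

identity = percent_identity("ACGT-A", "ACGA-A")   # 4 of 5 compared sites match
```

Sequences that group together on a tree but share unusually low identity, or the reverse, are worth a second look.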
28
WHY DO YOU WANT TO RECONSTRUCT THE EVOLUTIONARY
HISTORY OF YOUR SEQUENCES?
A phylogenetic graph is a great way to convey
information
Best approximation we have