Title: BASIC OUTLINE OF CLASS
1 BASIC OUTLINE OF CLASS
- 1) Speculation and research on the origin of life.
- 2) The RNA world as an intermediary to the DNA world: generalities, history and current
- A) What was the RNA world like
- B) Current three domain view
- C) LUCA
- 1) RNA to DNA, David Penny
- 2) Universal Proteins Woese/Olsen, Koonin
- 3) Phylogenetics Forterre
- What is missing in talking about RNA > DNA > protein?
- 3) Genome Content and Architecture
- A) Size and the C-value paradox
- B) Types of DNA
- 4) Mutation
- A) types of changes
- B) rates and patterns
- 5) Phylogenetic Reconstruction
- A) Term, definitions and limits
- B) How to determine a phylogenetic tree
- C) Improvements and Extensions to Genome Trees
2 A phylogenetic tree is a hypothesis of how present-day (extant) OTUs (operational taxonomic units), whatever they may be, evolved from a common ancestor.
- A hypothesis must be tested.
3 Nomenclature of phylogenetic reconstruction
- Internal and external nodes
- Rooted and unrooted
- Scaled and unscaled
- Topology
- Branch lengths
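The terms above can be made concrete with a small data structure. This is a minimal sketch, not part of the original slides: the `Node` class, the helper names and the toy tree with OTUs A, B and C are all hypothetical. External nodes are the leaves (OTUs), internal nodes are the inferred ancestors, the topology is the branching pattern alone, and a scaled tree also carries branch lengths.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of a rooted tree; `length` is the branch leading to this node."""
    name: Optional[str] = None   # external nodes carry an OTU name
    length: float = 0.0          # branch length, used only in scaled trees
    children: List["Node"] = field(default_factory=list)

    def is_external(self) -> bool:
        return not self.children


def external_nodes(node):
    """Names of all external nodes (leaves) below `node`, left to right."""
    if node.is_external():
        return [node.name]
    names = []
    for child in node.children:
        names.extend(external_nodes(child))
    return names


def count_internal(node):
    """Number of internal nodes in the subtree, counting `node` itself."""
    if node.is_external():
        return 0
    return 1 + sum(count_internal(child) for child in node.children)


def to_newick(node, scaled=True, _root=True):
    """Write the topology, with branch lengths when `scaled` is True."""
    if node.is_external():
        text = node.name
    else:
        inner = ",".join(to_newick(c, scaled, False) for c in node.children)
        text = "(" + inner + ")"
    if scaled and not _root:
        text += f":{node.length}"
    return text + ";" if _root else text


# Hypothetical rooted tree: A and B are sister OTUs, C joins at the root.
tree = Node(children=[
    Node(children=[Node("A", 0.1), Node("B", 0.2)], length=0.3),
    Node("C", 0.5),
])

print(external_nodes(tree))           # external nodes
print(count_internal(tree))           # internal nodes, root included
print(to_newick(tree, scaled=False))  # topology only (unscaled)
print(to_newick(tree))                # topology plus branch lengths (scaled)
```

The parenthesized output is the widely used Newick notation. An unrooted tree would drop the choice of root; this sketch only models the rooted case.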
7 Now that you HAVE a multiple alignment of biosequences:
- Count the differences between each pair (distance or similarity).
- Weight the differences according to a specific model?
- Compare the values.
- Now reconstruct the history of the biosequences. WHAT? HOW?
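Counting the differences between each pair can be sketched as follows. This is an illustration under stated assumptions, not the lecturer's code: the toy alignment and the function names are hypothetical, the distance is the unweighted proportion of differing sites (often called the p-distance), and columns with a gap in either sequence are skipped.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites, ignoring columns gapped in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def distance_matrix(alignment):
    """Symmetric pairwise distances as a dict keyed by (name_a, name_b)."""
    names = list(alignment)
    dist = {(name, name): 0.0 for name in names}
    for a, b in combinations(names, 2):
        d = p_distance(alignment[a], alignment[b])
        dist[(a, b)] = dist[(b, a)] = d
    return dist


# Hypothetical toy alignment of three sequences, ten sites each.
alignment = {
    "seq1": "ACGTACGTAC",
    "seq2": "ACGTACGTAA",
    "seq3": "ACGAACCTAA",
}

dist = distance_matrix(alignment)
for pair in combinations(alignment, 2):
    print(pair, dist[pair])
```

Here seq1 and seq2 differ at one site in ten, seq2 and seq3 at two, and seq1 and seq3 at three. Weighting the raw counts by a substitution model would be a separate step applied to these values.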
11 Groups in a phylogenetic tree are
- 1) Monophyletic: all taxa originate from a common ancestor, and the grouping includes the ancestor and all of its descendants
- 2) Paraphyletic: all taxa originate from a common ancestor, and the grouping includes the ancestor but not all of its descendants
- 3) Polyphyletic: the grouping does not include the common ancestor of its members
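The monophyly definition can be tested mechanically on a rooted tree. The sketch below is hypothetical throughout (function names, the nested-tuple representation, and the toy taxa): a group of OTUs is reported as monophyletic when some node's descendants are exactly that group. Telling paraphyletic from polyphyletic groups additionally requires deciding whether the common ancestor is counted as a member, which this sketch does not attempt.

```python
def leaves(tree):
    """Set of leaf names under a tree written as nested tuples of strings."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def is_monophyletic(tree, group):
    """True if some node in `tree` has exactly `group` as its descendants."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Hypothetical rooted tree: ((human, chimp), (mouse, rat)), chicken
tree = ((("human", "chimp"), ("mouse", "rat")), "chicken")

print(is_monophyletic(tree, {"human", "chimp"}))                  # a clade
print(is_monophyletic(tree, {"human", "chimp", "mouse", "rat"}))  # a clade
print(is_monophyletic(tree, {"human", "chimp", "mouse"}))         # leaves out rat
print(is_monophyletic(tree, {"human", "mouse"}))                  # not a clade
```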
12 There are two basic data types
- A) Character data
- B) Distance data
13 Character Data
Character (char) data provide information about an OTU.
- 1) A char can assume one of two or more mutually exclusive states. Example of an independent char: length of nose.
- 2) Quantitative chars are continuous: they are measured on a continuous scale.
- 3) Qualitative chars are discrete and can be assigned two or more values:
- a) binary: only one of two char states is possible
- b) multistate: three or more states are possible
Example: in sequences, the qualitative multistate chars are the positions in the sequence, while the actual nucleotide (1 of 4) or amino acid (1 of 20) at each position is the char state.
14 Explicit assumptions about char data evolution
- a) the number of discrete steps required to change one char state into another
- b) the probability with which a change may occur
- c) chars are unordered if changing from one state to another takes only one step. Example: nucleotides, where it takes only one step to change from any of the four to any other
- d) chars are ordered if it takes intermediate steps to change one state into another
- e) chars are partially ordered when the number of steps varies for different pairwise combinations of char states. Example: amino acids
- f) most discrete characters in molecular evolution are reversible
- g) plesiomorphy is a primitive or ancestral char state
- h) apomorphy is a derived char state that is evolutionarily novel compared to the ancestral state
- i) homoplasy is a char state that has arisen independently in several lineages
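The difference between unordered and ordered chars in points c) and d) can be written as a step matrix, that is, the number of steps charged for each change of state. The sketch below is illustrative only: the function names and the three-state size char are hypothetical, and both matrices are symmetric, which matches the reversibility assumption in point f).

```python
def unordered_cost(state_a, state_b):
    """Unordered char: any change between two different states is one step."""
    return 0 if state_a == state_b else 1


def ordered_cost(state_a, state_b, order):
    """Ordered char: a change must pass through every intermediate state."""
    return abs(order.index(state_a) - order.index(state_b))


def step_matrix(states, cost):
    """Steps required for every ordered pair of states."""
    return {(a, b): cost(a, b) for a in states for b in states}


# Nucleotides treated as unordered: one step between any two of the four.
nucleotides = "ACGT"
nuc_steps = step_matrix(nucleotides, unordered_cost)

# Hypothetical ordered char: small -> medium -> large.
sizes = ["small", "medium", "large"]
size_steps = step_matrix(sizes, lambda a, b: ordered_cost(a, b, sizes))

print(nuc_steps[("A", "T")])            # one step
print(size_steps[("small", "large")])   # two steps, via "medium"
```

A partially ordered char, as in point e), would use a matrix whose entries differ from pair to pair without following a single linear order.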
15 Distance Data
Distance data provide quantitative information about similarity/dissimilarity.
- 1) Distance data cannot be converted to char data, but char data can be converted into distance data. A sequence string, which is char data, is not of use on its own, BUT
- 2) given some measure of difference/similarity between pairs of sequences, many methods exist that use these values to infer phylogenetic relationships
16 Assumptions of the models
- Changes in different copies of genes are independent.
- Changes at each site are independent.
- All sites change at the same rate.
- All bases are equally frequent.
- REALITY
- 1) The same gene in different organisms in the same environment may well change in a similar manner (parallelism).
- 2) Gene products are three-dimensional objects with both short- and long-range interactions. Some sites certainly do not change independently of others.
- 3) Functionally important sites are more highly constrained; therefore, not all sites change at the same rate.
- 4) Bases are not equally distributed in the genome.
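One widely used model that makes exactly these simplifying assumptions (independent sites, one rate for all sites, equal base frequencies) is the Jukes-Cantor model. It is not named in the slides, so treat its appearance here as an illustrative assumption. With $p$ the observed proportion of differing sites between two sequences, the model estimates the number of substitutions per site $d$, correcting for multiple changes at the same site:

```latex
\[
  d = -\frac{3}{4}\,\ln\!\left(1 - \frac{4}{3}\,p\right),
  \qquad 0 \le p < \frac{3}{4}
\]
% Worked example with an assumed observed proportion p = 0.10:
\[
  d = -\frac{3}{4}\,\ln\!\left(1 - \frac{4}{3}\,(0.10)\right)
    = -\frac{3}{4}\,\ln(0.8667)
    \approx 0.107
\]
```

The corrected distance is slightly larger than the observed 0.10, and the gap widens as $p$ grows. When the REALITY points hold, for example when rates differ among sites, this correction is itself only an approximation.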
17 Basic Phylogenetic Inference Methods
- 1) In distance matrix methods, evolutionary distance is computed by counting all the nucleic acid or protein substitutions for all pairwise relationships of a multiple alignment. UPGMA (unweighted/weighted pair group method with arithmetic means, and various offshoots) employs a sequential clustering algorithm. The distance values are ranked in order of similarity and a tree is built in a stepwise manner. The most closely related sequences are joined by a node, and then the next most closely related sequences are added, etc. As the connected set of sequences accumulates, it is treated as a composite set.
- 2) Maximum parsimony methods try to find the most efficient path between two evolutionary states. This approach is based on finding the minimum number of mutations needed to explain the differences among the sequences. An initial tree topology is specified and each position in the sequence is examined in support of each tree. All reasonable topologies are examined until a tree with the minimal number of changes is chosen and designated the best tree.
- 3) Maximum likelihood is similar to the maximum parsimony approach in that it checks every reasonable tree topology and examines the support for each tree by every sequence. The best tree is the one which maximizes the probability of the sequences having been generated by that tree.
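The sequential clustering described for UPGMA in item 1 can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name, the nested-tuple output and the toy distances are hypothetical. At each step the closest pair of clusters is joined by a node, and the new composite's distance to every other cluster is the size-weighted mean of its two parts.

```python
from itertools import combinations


def upgma(names, dist):
    """Cluster OTUs by UPGMA.

    names: list of OTU labels
    dist:  dict with a distance for every unordered pair, keyed (a, b) or (b, a)
    Returns nested tuples (left_subtree, right_subtree, node_height).
    """
    # Each cluster id maps to (subtree, number of OTUs in the cluster).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    d = {}
    for (i, a), (j, b) in combinations(enumerate(names), 2):
        d[(i, j)] = dist[(a, b)] if (a, b) in dist else dist[(b, a)]
    next_id = len(names)
    while len(clusters) > 1:
        # Join the pair of clusters with the smallest distance.
        i, j = min(combinations(sorted(clusters), 2), key=lambda pair: d[pair])
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # The composite's distance to each remaining cluster is the
        # size-weighted arithmetic mean of its two parts' distances.
        for k in clusters:
            d_ik = d[(min(i, k), max(i, k))]
            d_jk = d[(min(j, k), max(j, k))]
            d[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        # The joining node sits at half the distance between the two clusters.
        clusters[next_id] = ((tree_i, tree_j, d[(i, j)] / 2), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical distances for four OTUs.
names = ["A", "B", "C", "D"]
dist = {
    ("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 4,
    ("A", "D"): 6, ("B", "D"): 6, ("C", "D"): 6,
}
tree = upgma(names, dist)
print(tree)
```

With these distances, A and B join first, C joins that pair next, and D joins last. The node heights rise steadily toward the root because UPGMA places every OTU at the same distance from the root.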
18 Evolutionary Informative Sites
For example, to infer an MP tree:
- 1) identify all informative sites
- 2) sum the changes over all informative sites for each tree
- 3) the tree with the fewest changes to account for the observed data is the MP tree
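Step 1 can be sketched as below, using the standard parsimony definition, which the slide does not spell out: a site is informative when at least two different states each occur in at least two sequences. The function name and the four toy sequences are hypothetical.

```python
from collections import Counter


def informative_sites(sequences, gap="-"):
    """0-based indices of parsimony-informative columns in an alignment."""
    sites = []
    for col, column in enumerate(zip(*sequences)):
        counts = Counter(state for state in column if state != gap)
        # Informative: at least two states, each seen in at least two sequences.
        shared_states = sum(1 for n in counts.values() if n >= 2)
        if shared_states >= 2:
            sites.append(col)
    return sites


# Hypothetical alignment of four sequences, nine sites each.
sequences = [
    "AAGAGTGCA",
    "AGCCGTGCG",
    "AGATATCCA",
    "AGAGATCCG",
]

print(informative_sites(sequences))
```

Invariant columns and columns where only one sequence differs are skipped, because they require the same number of changes on every topology and so cannot favor one tree over another.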
19 Rooting trees
- 1) The out-group must not be too distant, or distance estimates will be unreliable. Example: if mammals, then use marsupials as an out-group; birds can only be used if the gene is highly conserved.
- 2) The out-group must have an external measure that guarantees that it is really an out-group.
- 3) Multiple out-groups can increase the reliability of the distance estimate if they are not too far.
20 Assessing tree reliability
Trees are statistically inferred; therefore, trees and their component parts should be assessed in a statistical manner. There are two basic questions:
- a) Which parts of this tree are statistically robust?
- b) Is this tree statistically better than other trees?
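One common way to approach question a) is the bootstrap. The slide does not name it, so this is an assumed technique shown for illustration. Alignment columns are resampled with replacement, a tree is inferred from each pseudo-replicate, and the fraction of replicates that recover a given grouping is taken as its support. The sketch below covers only the resampling step; the function names and toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the same length."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[c] for c in columns)
        for name, seq in alignment.items()
    }


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Generate pseudo-replicate alignments from one seeded random generator."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Hypothetical toy alignment: three sequences, six sites.
toy = {"seq1": "ACGTAC", "seq2": "ACGTAA", "seq3": "ACGAAC"}

replicates = bootstrap_replicates(toy, n_replicates=5, seed=1)
print(len(replicates), "replicates;", "first:", replicates[0])
```

Each replicate keeps whole columns intact, so the sequences stay aligned to one another; only the choice and multiplicity of sites varies.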
21 Problems in phylogenetic reconstruction: strengths and weaknesses
- a) UPGMA-type methods: rate consistency should hold
- b) additive methods (neighbor-joining) are not good for multiple hits or very distant relationships
- c) MP methods are best because they only calculate the shortest path. Downside: for distant sequences, parallelism will be underestimated. If rates vary significantly, these methods are not robust, due to long-branch attraction, which is random similarity arising from long periods of divergence among some members of a clade