Title: Measuring genetic change
1Measuring genetic change
- Level 3 Molecular Evolution and Bioinformatics
- Jim Provan
Page and Holmes Section 5.2
2Types of substitution
3Types of substitution (continued)
- Multiple substitutions can greatly obscure actual
evolutionary history, particularly in cases where
there have been many mutations, i.e. over long
evolutionary time scales
- Final three examples have serious implications
for inference of evolutionary history
- Similarity inherited from an ancestor is called
homology
- Independently acquired similarity is called
homoplasy
- All tree-building methods rely on sufficient
levels of homology
4Types of substitution (continued)
- Substitutions that exchange a purine for another
purine or a pyrimidine for another pyrimidine are
called transitions
- Substitutions that exchange a purine for a
pyrimidine or vice-versa are called transversions
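The two definitions above can be written as a short check. This is a minimal sketch, assuming DNA bases only (A and G are purines, C and T are pyrimidines); the function name classify_substitution is illustrative and not part of the course material.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(x: str, y: str) -> str:
    """Return 'identical', 'transition' or 'transversion' for two DNA bases."""
    x, y = x.upper(), y.upper()
    valid = PURINES | PYRIMIDINES
    if x not in valid or y not in valid:
        raise ValueError("bases must be one of A, C, G, T")
    if x == y:
        return "identical"
    # Same chemical class on both sides (purine/purine or
    # pyrimidine/pyrimidine) means a transition; otherwise a transversion.
    same_class = (x in PURINES) == (y in PURINES)
    return "transition" if same_class else "transversion"
```

Of the twelve possible changes between different bases, four are transitions (A to G, G to A, C to T, T to C) and eight are transversions.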
5Measuring evolutionary change
- Simplest measure is to count the number of sites
that differ
- Poor measure
- Some sites may undergo repeated substitutions
- As sequences diverge, measure becomes less
accurate
- Saturation occurs: most sites that are changing
have changed before
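The simple count described above is often reported as a proportion of differing sites (the p-distance). A minimal sketch, assuming two pre-aligned sequences of equal length and skipping any column that is not A, C, G or T in both; the function name p_distance is illustrative.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Ignore gaps and ambiguity codes: only unambiguous bases are compared.
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared
```

A site that changed A to G and back to A counts as no difference, and a site that changed A to G to T counts as one, which is why this measure underestimates the true amount of change as sequences diverge.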
6Correction of observed sequence differences
7A general framework of sequence evolution models
f = [fA fC fG fT]
8The Jukes-Cantor (JC) model
- Assumes that all four bases have equal
frequencies and that all substitutions are
equally likely
f = [¼ ¼ ¼ ¼]
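Under these assumptions the observed proportion of differing sites, p, can be corrected with the standard Jukes-Cantor formula d = -(3/4) ln(1 - (4/3) p). The sketch below is illustrative only; the function name jc_distance is an assumption, and p is expected to be supplied by the caller.

```python
import math


def jc_distance(p: float) -> float:
    """Jukes-Cantor corrected distance (substitutions per site) from p."""
    # The correction is undefined at p >= 0.75, the level of difference
    # expected between two unrelated sequences with equal base frequencies.
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For small p the corrected distance is close to p itself; the correction grows as p approaches 0.75.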
9Kimura's 2-parameter model (K2P)
- Takes into account different frequencies of
transitions vs. transversions
f = [¼ ¼ ¼ ¼]
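The standard K2P distance treats the two classes of change separately: with P the proportion of sites showing a transition and Q the proportion showing a transversion, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). A minimal sketch, with the function name k2p_distance chosen for illustration:

```python
import math


def k2p_distance(P: float, Q: float) -> float:
    """Kimura 2-parameter distance from transition (P) and transversion (Q)
    proportions, both expressed as fractions of all compared sites."""
    a = 1.0 - 2.0 * P - Q
    b = 1.0 - 2.0 * Q
    if P < 0 or Q < 0 or a <= 0 or b <= 0:
        raise ValueError("P and Q are outside the range where K2P is defined")
    return -0.5 * math.log(a) - 0.25 * math.log(b)
```

When transitions make up exactly one third of the observed differences (P = Q/2), the result is the same as the Jukes-Cantor distance for p = P + Q, as expected for a model that reduces to JC when the two rates are equal.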
10Felsenstein (1981) (F81)
- Takes into account differences in base
composition
- Percentage (G + C) can range from 25 to 75
- F81 model allows the frequencies of the four
nucleotides to be different
- Does not allow for variation between genes/species
f = [πA πC πG πT]
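A sketch of how base composition enters the correction. The frequencies are estimated from the data, and the F81 distance is usually written d = -B ln(1 - p/B) with B = 1 - (πA² + πC² + πG² + πT²). The function names are illustrative, and estimating the frequencies from a single sequence is a simplifying assumption.

```python
import math
from collections import Counter


def base_frequencies(seq: str) -> dict:
    """Observed frequencies of A, C, G and T, ignoring other characters."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: counts[b] / total for b in "ACGT"}


def gc_percent(seq: str) -> float:
    """Percentage (G + C) of a sequence."""
    freqs = base_frequencies(seq)
    return 100.0 * (freqs["G"] + freqs["C"])


def f81_distance(p: float, freqs: dict) -> float:
    """F81 corrected distance from the proportion of differing sites p."""
    B = 1.0 - sum(f * f for f in freqs.values())
    if not 0.0 <= p < B:
        raise ValueError("p must satisfy 0 <= p < B")
    return -B * math.log(1.0 - p / B)
```

With all four frequencies equal to ¼, B is ¾ and the expression reduces to the Jukes-Cantor correction.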
11Hasegawa, Kishino and Yano (1985) (HKY85)
- Essentially merges the K2P and F81 models to
allow transitions and transversions to occur at
different rates as well as allowing base
frequencies to vary
f = [πA πC πG πT]
12General reversible model (REV)
- Most general model - each substitution has its
own probability
f = [πA πC πG πT]
- By constraining a-f it is possible to generate
all the other models
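The nesting of the models can be sketched by building the REV rate matrix and then constraining it. The assignment of a-f to base pairs used here (a: A-C, b: A-G, c: A-T, d: C-G, e: C-T, f: G-T) is a common convention and is an assumption, since the slide's matrix is not reproduced in the text; the names rev_rate_matrix and constrained, and the symbols alpha and beta for the transition and transversion rates, are also illustrative.

```python
BASES = "ACGT"
# Assumed mapping of the six REV parameters to unordered base pairs.
PAIR_PARAM = {
    ("A", "C"): "a", ("A", "G"): "b", ("A", "T"): "c",
    ("C", "G"): "d", ("C", "T"): "e", ("G", "T"): "f",
}


def rev_rate_matrix(rates: dict, freqs: dict) -> list:
    """Rate matrix with off-diagonal entry (x -> y) = rate(x, y) * freq(y)."""
    Q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(BASES):
        for j, y in enumerate(BASES):
            if i == j:
                continue
            key = PAIR_PARAM.get((x, y)) or PAIR_PARAM[(y, x)]
            Q[i][j] = rates[key] * freqs[y]
        # Diagonal is set so that each row sums to zero.
        Q[i][i] = -sum(Q[i])
    return Q


def constrained(alpha: float, beta: float) -> dict:
    """Transitions (b, e) share rate alpha; transversions share rate beta."""
    return {"a": beta, "b": alpha, "c": beta,
            "d": beta, "e": alpha, "f": beta}


EQUAL = {b: 0.25 for b in BASES}
UNEQUAL = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}  # hypothetical values

jc = rev_rate_matrix(constrained(1.0, 1.0), EQUAL)       # JC
k2p = rev_rate_matrix(constrained(2.0, 1.0), EQUAL)      # K2P
f81 = rev_rate_matrix(constrained(1.0, 1.0), UNEQUAL)    # F81
hky85 = rev_rate_matrix(constrained(2.0, 1.0), UNEQUAL)  # HKY85
```

Leaving all six rates free, together with unequal frequencies, gives the full REV model.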
13Comparing the models
JC: πA = πC = πG = πT; α = β
HKY85: πA ≠ πC ≠ πG ≠ πT; α ≠ β
14Comparing the models (continued)
15Assumptions: independence
- Assumes that change at one site has no effect on
other sites
- RNA stem-loop structures are a good example of
where this assumption fails
- Substitution may result in mismatched bases and
decreased stem stability
- Compensatory change may occur to restore
Watson-Crick base pairing
16Assumptions: base composition
- Assumption that base composition is at
equilibrium and that it is similar across all
taxa studied
- In example opposite, trees inferred using models
which do not allow for this will not group
Thermus and Deinococcus
17Assumptions: variation in substitution rate
across sites
- Not all sites are equally likely to undergo a
substitution
- Functional constraints
- Pseudogenes have lost all function and can evolve
freely
- Changes at fourfold degenerate sites do not alter
the amino acid sequence of the protein
- Non-degenerate sites are highly constrained
18Assumptions: variation in substitution rate
across sites (continued)
- More rapidly evolving sequence shows most
divergence initially but soon saturates
- Sequence A actually appears to be more rapidly
evolving
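The effect can be illustrated with a small calculation. This is a sketch under stated assumptions, not the data behind the slide: variable sites are taken to evolve under the Jukes-Cantor model, the remaining sites are taken to be completely invariant, and the rates and fractions below are hypothetical.

```python
import math


def expected_difference(t: float, rate: float, variable_fraction: float) -> float:
    """Expected proportion of differing sites after time t.

    rate is the number of substitutions per variable site per unit time
    separating the two sequences; invariant sites never differ.
    """
    d = rate * t
    return variable_fraction * 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


# Hypothetical sequences: one fast but with only half its sites free to
# vary, one slow but with most sites free to vary.
for t in (0.1, 1.0, 5.0):
    fast = expected_difference(t, rate=2.0, variable_fraction=0.5)
    slow = expected_difference(t, rate=0.5, variable_fraction=0.8)
    print(f"t={t}: fast={fast:.3f} slow={slow:.3f}")
```

With these values the fast sequence shows more divergence at first but levels off near 0.375, while the slow sequence keeps accumulating differences and overtakes it, so observed divergence alone would suggest the wrong sequence is evolving more rapidly.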