Title: Errors in sequence replication
1 Errors in sequence replication
Shown to be an important limitation for early life by Eigen (1971). Treatment here follows Eigen, McCaskill & Schuster (1988).
Sequence space: the space of all possible sequences in which molecular evolution takes place, e.g. all possible RNA sequences made of A, C, G, U.
Master sequence: a high-fitness, fast-replicating sequence, e.g. a self-replicating ribozyme in the RNA world, or a virus genome in a modern cell.
Replication is not perfect: error (mutation) rate u per base, fidelity q = 1 - u.
For sequence length L, overall fidelity is q^L (probability of no mistakes).
Can the master sequence survive? Mutation versus selection.
2 Simplest sequence space is binary: two symbols, 1 and 0. A hypercube shows the mutational pathways between sequences.
Suppose 000000 represents the master sequence. Number of 1-mutant neighbours = L. Number of 2-mutant neighbours = L(L-1)/2, etc. For large L, back mutation is unlikely.
Hamming distance measures how far apart sequences are: d(i,k) = number of mutations required to get from sequence i to sequence k.
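The neighbour counts and the Hamming distance can be checked with a few lines of Python. This is a minimal sketch; the function names are chosen here for illustration, and the `alphabet_size` argument generalises the binary count to larger alphabets by the factor (alphabet_size - 1)^d.

```python
from math import comb

def hamming(seq_i: str, seq_k: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(seq_i) != len(seq_k):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(seq_i, seq_k))

def n_mutant_neighbours(L: int, d: int, alphabet_size: int = 2) -> int:
    """Number of sequences at Hamming distance d from a given sequence of length L."""
    return comb(L, d) * (alphabet_size - 1) ** d

master = "000000"
print(hamming(master, "010010"))    # 2
print(n_mutant_neighbours(6, 1))    # 6  (= L)
print(n_mutant_neighbours(6, 2))    # 15 (= L(L-1)/2)
```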
3 Replicating sequences with errors: general equations
A_i = replication rate of sequence i
D_i = degradation rate of sequence i
Q_ii = q^L = probability that a copy of i is also i (no mistakes)
Q_ik = probability that replication of k produces i
- Number of symbols in the alphabet: 4 for RNA, 2 for binary sequences
- Balance of accurate reproduction of i and degradation: W_ii = A_i Q_ii - D_i
- Rate at which i is produced by inaccurate replication of k: W_ik = A_k Q_ik
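As an illustration of how Q_ik and W_ik could be computed, the sketch below assumes the usual uniform-error model, in which each base is copied correctly with probability q and each of the other symbols is equally likely otherwise, so that Q_ik = q^(L-d) ((1-q)/(kappa-1))^d with d = d(i,k). The symbol kappa for the alphabet size and the function names are assumptions made for this example.

```python
from itertools import product

def mutation_prob(seq_i, seq_k, q, alphabet_size=2):
    """Q_ik: probability that copying seq_k yields seq_i (uniform-error assumption)."""
    L = len(seq_k)
    d = sum(a != b for a, b in zip(seq_i, seq_k))      # Hamming distance d(i,k)
    per_wrong_symbol = (1.0 - q) / (alphabet_size - 1)
    return q ** (L - d) * per_wrong_symbol ** d

def production_matrix(seqs, A, D, q, alphabet_size=2):
    """W[i][k] = A_k * Q_ik, with D_i subtracted on the diagonal."""
    W = []
    for i, seq_i in enumerate(seqs):
        row = []
        for k, seq_k in enumerate(seqs):
            w = A[k] * mutation_prob(seq_i, seq_k, q, alphabet_size)
            if i == k:
                w -= D[i]
            row.append(w)
        W.append(row)
    return W

# All 8 binary sequences of length 3.
seqs = ["".join(bits) for bits in product("01", repeat=3)]
q = 0.9
# Copies of "000" must land somewhere in sequence space, so this sums to 1.
col_sum = sum(mutation_prob(s, "000", q) for s in seqs)
print(col_sum)
```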
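4 x_i = relative concentration of sequence i. The mean excess production is the population average of A_k - D_k.
The final term of the rate equation (mean excess production multiplied by x_i) is required to keep the total concentration fixed. Think of it as dilution in a chemostat or competition for resources.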
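A form of the rate equation consistent with the definitions of W_ii and W_ik above is the standard quasispecies equation. The symbol \bar{E} for the mean excess production is an assumption about notation.

```latex
\frac{dx_i}{dt} = W_{ii}\,x_i + \sum_{k \neq i} W_{ik}\,x_k - \bar{E}\,x_i,
\qquad
\bar{E} = \sum_k (A_k - D_k)\,x_k
```

Summing over i and using \sum_i Q_{ik} = 1 gives \sum_i dx_i/dt = 0 when \sum_i x_i = 1, which is the sense in which the final term keeps the total concentration fixed.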
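Easy case: sequence 0 is the master sequence (better than the rest), with replication rate A_0. All the others are the same, with replication rate A_1 (A_1 < A_0). All sequences have the same degradation rate D.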
5 Forget back mutation when L >> 1.
Error threshold occurs when x_0 → 0. For the master sequence to survive, we must have q^L > A_1/A_0.
6 Example with L = 50, A_0 = 10, A_1 = 1: the error threshold is at q^50 = 0.1, i.e. q = 0.955, error rate u = 1 - q = 0.045.
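The numbers in this example can be reproduced with a short calculation. The sketch assumes the survival condition q^L > A_1/A_0 implied by the example, and a steady-state master fraction x_0 = (A_0 q^L - A_1)/(A_0 - A_1), which follows from the quasispecies equation when back mutation is neglected and all degradation rates are equal. Function names are illustrative.

```python
def threshold_fidelity(L, A0, A1):
    """Per-base fidelity q at which q**L equals A1/A0 (the error threshold)."""
    return (A1 / A0) ** (1.0 / L)

def master_fraction(q, L, A0, A1):
    """Steady-state x_0 with back mutation neglected; zero beyond the threshold."""
    x0 = (A0 * q ** L - A1) / (A0 - A1)
    return max(x0, 0.0)

q_c = threshold_fidelity(50, 10.0, 1.0)
print(round(q_c, 3), round(1 - q_c, 3))   # 0.955 0.045
```

With perfect copying (q = 1) the master fraction is 1, and it falls to 0 as q drops to the threshold value.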
7 Implications of the error threshold 1
If L >> 1 and u << 1, then q^L ≈ exp(-uL), so the master sequence survives when uL < ln(A_0/A_1).
If L is fixed, there is a maximum error rate at which the master sequence survives. If u is fixed, there is a maximum length that can survive. The maximum is of order 1 mutation per whole genome replication.
Eigen paradox: longer molecules will likely be better replicators, but longer molecules require better accuracy... How could the first replicators overcome this?
8 Implications of the error threshold 2
Viruses are present as quasispecies: mutations occur within a patient, so it is difficult to get effective drugs.
Mutation rates per base: RNA viruses u = 10^-4 to 10^-6; DNA viruses 10^-7 to 10^-8; DNA genomes in cells 10^-9 to 10^-10.
Viruses are limited in size by their mutation rate. Maybe a high rate is advantageous too...
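To see how the mutation rate limits genome size, the threshold condition (1-u)^L > A_1/A_0 can be solved for L. The sketch below uses a superiority A_0/A_1 = 10, borrowed from the earlier example purely for illustration; real values are not given here, and the resulting lengths scale only logarithmically with that choice.

```python
from math import log, log1p

def max_length(u, superiority):
    """Largest L for which (1 - u)**L still exceeds 1/superiority."""
    return log(superiority) / -log1p(-u)

# Mutation rates taken from the upper end of each range quoted above.
for label, u in [("RNA virus", 1e-4), ("DNA virus", 1e-7), ("cellular DNA genome", 1e-9)]:
    print(f"{label}: u = {u:g}, maximum length of order {max_length(u, 10.0):.1e}")
```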
10 Bag of Genes model
Simulations by Gocmanac & Higgs, based on the Stochastic Corrector model of Szathmáry & Demeter. A cell requires K (= 3) types of gene and is alive if it has at least one of each. Lattice with one cell per site. Pick a cell. Duplicate one gene in the cell at random. If the cell reaches maximum size M, it divides, with its genes split at random between the two daughters. Put one daughter in the parent position and one on a neighbouring site (overgrowth/competition).
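The growth and division rules can be sketched without the lattice to see why division size matters. This is not the authors' simulation code. It assumes each gene copy goes to either daughter with probability 1/2 and that a cell starts with one copy of each gene, neither of which is specified above. All function names are illustrative.

```python
import random

K = 3  # number of essential gene types (value from the text)

def is_viable(cell):
    """A cell is alive if it holds at least one copy of every gene type."""
    return all(count >= 1 for count in cell)

def grow(cell, rng):
    """Duplicate one gene, chosen uniformly among the gene copies in the cell."""
    gene = rng.choices(range(len(cell)), weights=cell)[0]
    cell[gene] += 1

def divide(cell, rng):
    """Assign each gene copy independently to one of two daughters (assumed rule)."""
    a = [0] * len(cell)
    b = [0] * len(cell)
    for gene, count in enumerate(cell):
        for _ in range(count):
            if rng.random() < 0.5:
                a[gene] += 1
            else:
                b[gene] += 1
    return a, b

def viable_daughter_fraction(M, trials=20000, seed=0):
    """Monte Carlo estimate of the fraction of daughters that are viable."""
    rng = random.Random(seed)
    viable = 0
    for _ in range(trials):
        cell = [1] * K              # assumed starting state: one copy of each gene
        while sum(cell) < M:
            grow(cell, rng)
        for daughter in divide(cell, rng):
            viable += is_viable(daughter)
    return viable / (2 * trials)

for M in (6, 14, 40):
    print(M, viable_daughter_fraction(M))
```

Larger M gives each daughter more gene copies, so the estimated viable fraction rises with M, in line with the behaviour described for the full lattice model.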
11 Simplest Case
- M = 14
- Inviable cells (white)
- Viable cells (black)
- Fraction of viable cells increases with M.
- Whole population dies if M too small.
12 Parasites / Selfish genes
- A parasite occasionally arises by mutation
- Parasites (pink) replicate faster within the cell
- They do not contribute to cell survival
- They increase the likelihood of creating inviable cells
13 Parasites created at low rate
Colour key: black = no parasites; brown = at least one parasite; pink = more than half the genes are parasites.
Panels: M = 14 and M = 40. Clusters of infected cells are related.
14 Fraction of viable cells as a function of M
If M is heritable, cells evolve to an optimal division size.
15 Add Horizontal Gene Transfer between neighbours
Same parameters as the M = 40 example before. Many more infected cells. Many more inviable cells. The range of parameters in which the population survives is reduced (see previous slide).
Infected cells (brown/pink) invade healthy cells (black). Highly infected cells (pink) die, leaving empty space (white). Healthy cells grow into empty space.
16 Significance of horizontal gene transfer
Phylogenetic trees derived from different genes are not always the same. Different strains of the same bacterial species differ in gene content (e.g. E. coli). HGT may have been more frequent in the past. Maybe there is no tree? (Doolittle) Alternative view: there is a core of genes that follows the organismal tree, and there is some HGT superimposed on this.
If HGT was very frequent in early evolution, there would be no separate species, only a communally evolving network of cells (Woese). HGT aids the spread of new beneficial genes, but it also makes parasites much more serious. Which effect is more important?
17 General problem in evolutionary biology: Evolution of Cooperation
Prisoner's dilemma. Strategies: C (cooperate) or D (defect = cheat). Stories: prisoners, superpowers, chimpanzees...
Matrix shows payoff to Me:
            You: C            You: D
Me: C       R (reward)        S (sucker)
Me: D       T (temptation)    P (punishment)
with T > R > P > S.
Game theory logic says always defect. D is a Nash equilibrium. Evolutionary game theory: strategies multiply in proportion to their payoffs. The population evolves towards 100% defectors. D is an Evolutionarily Stable Strategy (ESS).
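The claim that a well-mixed population evolves towards 100% defectors can be illustrated with replicator dynamics, in which a strategy's frequency grows in proportion to its payoff advantage over the population mean. The payoff values T = 5, R = 3, P = 1, S = 0 are conventional illustrative numbers and do not come from this lecture. The Euler time step is likewise an arbitrary choice.

```python
def replicator_step(x_c, payoff, dt=0.01):
    """One Euler step of replicator dynamics for the cooperator fraction x_c."""
    R, S, T, P = payoff["R"], payoff["S"], payoff["T"], payoff["P"]
    f_c = R * x_c + S * (1.0 - x_c)      # mean payoff to a cooperator
    f_d = T * x_c + P * (1.0 - x_c)      # mean payoff to a defector
    f_mean = x_c * f_c + (1.0 - x_c) * f_d
    return x_c + dt * x_c * (f_c - f_mean)

def evolve(x_c, payoff, steps=20000, dt=0.01):
    """Iterate the dynamics and return the final cooperator fraction."""
    for _ in range(steps):
        x_c = replicator_step(x_c, payoff, dt)
    return x_c

payoff = {"T": 5.0, "R": 3.0, "P": 1.0, "S": 0.0}   # assumed values with T > R > P > S
print(evolve(0.99, payoff))   # cooperator fraction after starting from 99% cooperators
```

Because T > R and P > S, a defector always earns more than a cooperator facing the same population, so the cooperator fraction declines from any starting point below 1.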
18 Spatial Prisoner's Dilemma (Nowak & May, 1992)
Each individual plays against its 8 lattice neighbours and a copy of itself. Get a score for each individual. At the next timestep, each site is occupied by a copy of the best-scoring site in its neighbourhood.
Clusters of Cs survive. Hooray! Cs have relatives on neighbouring sites. Cs would die out if randomly mixed.
Colour key: blue = is C, was C; green = is C, was D; red = is D, was D; yellow = is D, was C.
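One update of the lattice game can be written in a few lines. The sketch assumes the simplified payoffs R = 1, T = b, S = P = 0, periodic boundaries, and ties resolved in favour of the current occupant. These details are assumptions made for this example, and the function names are illustrative.

```python
def play(me, other, b):
    """Payoff to `me`: C against C gives 1, D against C gives b, otherwise 0."""
    if me == "C":
        return 1.0 if other == "C" else 0.0
    return b if other == "C" else 0.0

def neighbourhood(i, j, n):
    """The site itself first, then its 8 neighbours (periodic boundaries)."""
    offsets = [(0, 0)] + [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
                          if (di, dj) != (0, 0)]
    return [((i + di) % n, (j + dj) % n) for di, dj in offsets]

def step(grid, b):
    """Score every site, then copy the best-scoring strategy in each neighbourhood."""
    n = len(grid)
    score = [[sum(play(grid[i][j], grid[a][c], b) for a, c in neighbourhood(i, j, n))
              for j in range(n)] for i in range(n)]
    new = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # max() returns the first maximum, so ties keep the current occupant.
            best = max(neighbourhood(i, j, n), key=lambda s: score[s[0]][s[1]])
            new[i][j] = grid[best[0]][best[1]]
    return new

n = 7
grid = [["C"] * n for _ in range(n)]
grid[3][3] = "D"                    # a single defector among cooperators
grid = step(grid, b=1.9)
print(sum(row.count("D") for row in grid))   # 9
```

The lone defector scores 8b = 15.2 against its cooperating neighbours, more than any cooperator's maximum of 9, so it takes over its whole 3 x 3 neighbourhood in one step. Iterating `step` from a random initial grid is how one would look for surviving clusters of cooperators.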
19 Qβ experiments (Spiegelman)
Qβ is a bacteriophage. Virus RNA length about 4000. Codes for a replicase protein. Works in vitro.
Serial transfer: add virus RNA to the first tube; allow it to replicate, then transfer a small amount to the next tube; analyse sequences in the last tube.
Sequences evolve towards short lengths (about 200). These are rapidly replicated but cannot function as viruses. Function is lost because it is not necessary. Spiegelman's monster.
20 Cooperation between replicators: Szabó et al. (2002)
Sequences of four letters A, B, C, D:
A controls template efficiency t(nA)
B controls replicase efficiency r(nB)
C controls replicase fidelity f(nC)
D is a dummy
Each site on the lattice is empty or contains one sequence. Pick a site at random. The sequence on this site decays with a certain probability, or it is replicated by a neighbour with a certain probability. Replication rate = rt/n (t = template efficiency of the sequence being copied, r = replicase efficiency of the neighbour, n = total length of the sequence being copied).
21 Fidelity for point mutations (like q in the Eigen model) is the f value of the neighbour. Insertions and deletions that change the length are also allowed. Sequences cannot copy themselves: they need to cooperate with neighbours (reciprocal altruism).
Evolution toward longer sequences. nA, nB, nC are all selected, but not nD.
22 Length distribution is bimodal. Long sequences are cooperators: they have good fidelity, replicase and template ability. Short sequences are defectors (selfish genes): they mostly have template ability only, like the Qβ monsters.
23 Begin with the previous case, then turn on diffusion: n goes down.
The spatial arrangement is important. Sequences
replicate neighbours. There are clusters of
cooperators. If you allow mixing (diffusion on
lattice) then defectors win. Total length goes
down. Efficient replicators cannot evolve.
24 We have seen two ways in which replicators might cooperate and avoid being overrun by selfish genes.
- Compartments, as in the stochastic corrector mechanism. Cells contain groups of moderate numbers of sequences. Random segregation creates variation between cells. Evolution selects those with fewer parasites.
- Spatial distribution, as in lattice models. Clusters of cooperators can survive even if they would not survive in a freely mixed system.
25 Autocatalytic sets
f1, f2, ... are food molecules supplied from outside; a, b, c, ... are other molecules made by metabolism. White circles are reactions. Dashed arrows indicate catalysis.
Reflexively autocatalytic: each molecule can be built from other molecules in the set and the food molecules, and each formation reaction is catalyzed by other molecules in the set.
Constructively autocatalytic: the set can be built up sequentially via catalyzed reactions starting from the food set.
Definitions from Mossel and Steel (2005) J. Theor. Biol.
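The constructive definition lends itself to a direct check: keep firing any reaction whose reactants and at least one catalyst are already available, starting from the food set, and see whether every reaction eventually fires. The sketch below is one reading of that definition, not code from the cited paper. The reaction encoding and the two toy networks are invented for illustration.

```python
def constructively_autocatalytic(food, reactions):
    """True if all reactions can fire in some order, starting from the food set,
    each with its reactants and at least one catalyst already available."""
    available = set(food)
    pending = list(reactions)
    progress = True
    while pending and progress:
        progress = False
        for rxn in list(pending):
            reactants, products, catalysts = rxn
            if set(reactants) <= available and set(catalysts) & available:
                available |= set(products)
                pending.remove(rxn)
                progress = True
    return not pending

food = {"f1", "f2"}
# Each reaction is (reactants, products, catalysts).
bootstrapped = [
    (("f1", "f2"), ("a",), ("f1",)),   # catalysed by a food molecule
    (("a", "f2"), ("b",), ("a",)),     # catalysed by a, which is already made
]
mutual = [
    (("f1", "f2"), ("a",), ("b",)),    # needs b as catalyst
    (("f1", "a"), ("b",), ("a",)),     # needs a as reactant and catalyst
]
print(constructively_autocatalytic(food, bootstrapped))  # True
print(constructively_autocatalytic(food, mutual))        # False
```

The second network satisfies the reflexive definition, since a and b are each made from the set plus food by a reaction catalysed within the set. It cannot get started from food alone, which is the distinction the two definitions draw.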
26 Kauffman, Origins of Order (1993)
Food set A and B. Consider all possible polymerization reactions. Suppose all reactions have some chance of being catalyzed by another polymer (red arrows). If you consider a large enough set, it will always contain an autocatalytic subset. Argues for metabolism first. The set is autocatalytic, not a single polymer. No reaction rates, concentrations or thermodynamics are included.
27 Lipid world model: Segré and Lancet (2000)
Model for self-assembled lipid micelles/vesicles.
Molecules catalyze entry and exit of other
molecules into the assembly. Shows compositional
heredity and natural selection without a genome.