Title: Genome Rearrangements
1Genome Rearrangements
2Genome Rearrangements
- We have seen how differences in genes at the
sequence level can be used to infer evolutionary
relations among species - Differences in sequences in (one or more) genes
resulted from point mutations (insert, delete,
substitute) - These are not the only type of changes that can
occur in the genome
3Genome Rearrangements
- Repair of broken chromosomes is an important
process - Mistakes can occur, however
- Mistakes can also occur during crossover
- These mistakes cause changes in gene order
- A large piece of chromosome can be moved or
copied to another location - It can also move from one chromosome to another
- We call these movements genome rearrangments
4Crossover
5Chromosome Repair
6Genome Rearrangements
- These have important (usually fatal) consequences
for the organism and its evolution - Alignments do not capture genome rearrangments
- Two species may have nearly the same gene
sequences, but in a different order (why would
the two species then be different?)
7Genome Rearrangements
- We need some other way to compare entire genomes
(i.e. compare at a higher level) - Rather than simple point mutations a genome is
obtained from another by a number of a special
kind of rearrangements Reversals - Use the number of reversals needed to transform
one genome into another to measure evolutionary
distance
8The Method
- Use combinatorial optimization techniques in an
attempt to infer a most economical sequence of
rearrangement operations to account for
differences among the genomes - Compare with character-based methods for
phylogenetics (parsimony)
9Reversals
- Consider the genome of a species as a sequence of
blocks - A block is some sequence of the genome (possibly
containing more than one gene) transcribed as a
unit - Blocks are oriented since they can be transcribed
from either strand of DNA - Give homologous blocks the same label
10Reversals
- Relation between chloroplast genomes of alfalfa
and garden pea
11Reversals
- Reversal operation for oriented blocks
- Inverts the order of affected blocks and changes
their orientation (arrow) - Affects a contiguous segment of blocks
- What sequence of reversal operations could have
changed alfalfa into garden pea? - Would like to have a polynomial time algorithm to
find the shortest sequence
12(No Transcript)
13Genome Comparison vs. Gene Comparison
- In the late 1980s, J. Palmer and his colleagues
studied the mitochondrial genomes of cabbage and
turnips - The gene sequences are very similar (some genes
are 99 equal) - Gene order, however, differs dramatically
- Genome rearrangements are now considered to be a
common mode of molecular evolution
14Genome Comparison vs. Gene Comparison
- Extreme conservation of genes on X chromosomes
across mammalian species provides an opportunity
to study the evolutionary history of X chromosome
independently of the rest of the genomes - According to Ohnos law, the gene content of X
chromosome has barely changed throughout
mammalian development in the last 125 million
years. - However, the order of genes on X chromosomes has
been disrupted several times.
15Human and Mouse X Chromosomes
16Human and Mouse X Chromosomes
-4 -6 1 7
2 -3 5 8
3 2 7 -1
6 4 5 8
1 7 2 -3
6 4 5 8
1 2 7 -3
6 4 5 8
1
2 7 3
-6 4 5 8
1 2 -5 -4
-3 6 7 8
1 2 3 4
5 6 7 8
17Genome Comparison vs. Gene Comparison
- The traditional molecular evolutionary technique
is a gene comparison to construct a phylogenetic
tree - In the cabbage and turnip case this is hardly
suitable, since rate of point mutations in their
mitochondrial genes is so low that their genes
are almost identical - Genome comparison (i.e. comparison of gene
orders) is the method of choice in the case of
very slowly evolving genomes - Another area is the case where genomes evolve
very rapidly (genes not very similar)
18Genome Comparison
- Only about (178?39) genome rearrangements have
happened since human and mouse diverged 80
million years ago - Mouse and human genomes can be viewed as a
collection of about 200 fragments which are
shuffled in mice as compared to humans - A comparative mouse-human genetic map gives the
position of a human gene given the location of a
related mouse gene
19Man-Mouse Comparative Physical Map
20Definitions
- A signed permutation a over the set of labels L
1, 2, , n is a permutation such that a(i) a
or a, where a ÃŽ L - Example 3, 2, 1 is a signed permutation over
L 1, 2, 3 - Note that no label may appear twice in the
permutation - A reversal i,j is an operation that transforms
one signed permutation into another, reversing
the order or a contiguous portion and flipping
the signs
21Definitions
- a ai,j a(1), , a(i 1), a(j), , a(i),
a(j 1), , a(n) - We are interested in the problem of sorting by
reversals Given two signed permutations a and b,
find the minimum number of reversals r1, , rt
that will transform a into b - a r1rt b - The reversal distance db(a) t
22Definitions
- Note that the reversal operation does not
directly correspond to the biological operations
(inversion, translocation, fission, fusion) - Given a and b, can we always transform a into b
using only the reversal operation? If so, how
many reversals are required in the worst case?
23Breakpoints
- A breakpoint is a point between consecutive
labels in the initial permutation that must
necessarily be separated by at least one reversal
to reach the target permutation - The two consecutive labels are not consecutive in
the target, or their orientations are not the
same in a relative sense
24Breakpoints
- To formalize the idea of breakpoint, we introduce
the extended version of a - Let a a(1), , a(n)
- Then the extended version of a is (L, a(1), ,
a(n), R) - For example let extended a be (L, 2, 3, 1, 6,
5, 4, R) and let extended b be (L, 1, 2, 3,
4, 5, 6, R) - The breakpoints are (L,2), (2,3), (3,1),
(1,6), (6,5), (4,R)
25Breakpoints
- The number of breakpoints of a permutation a is
denoted by b(a) - In the example, 6
- Can you characterize the situations where L is
involved in a breakpoint? When R is involved in a
breakpoint?
26A Lower Bound
- A reversal can remove at most two breakpoints
- Cuts the permutation in exactly two places
- So, if ar1 rt b then
- b(a) b(ar1) 2
- b(ar1) b(ar1r2) 2
-
- b(ar1rt-1) b(ar1rt) 2
- So b(a) 2t. If t d(a), b(a)/2 d(a)
27Reality and Desire Diagram
- The lower bound found is not very tight
- We can derive a better l.b. based on a structure
called the reality-desire diagram of a
permutation with respect to another - To draw the diagram, we will represent a with
the tuple (-a a) and -a with the tuple (a -a) - The orientation is given by the rightmost member
of the tuple
28Reality and Desire Diagram
- A permutation is a sequence of adjacent tuples
- a 3, 2, 1, 4, 5 can be represented as
L---(3 3)---(2 2)---(1 1)---(4 4)---(5
5)---R - b L---(1 1)---(2 2)---(3 3)---(4
4)---(5 5)---R
29Reality and Desire Diagram
- Now we will draw a graph to represent a (L, 3,
-2, -1, 4, -5, R) - The reality diagram
30Reality and Desire Diagram
- Suppose that b is the identity (L, 1, 2, 3,
4, 5) - We will add desire edges to the previous graph to
represent b
L -3 3 2 -2 1 -1
-4 4 5 -5 R
31Reality and Desire Diagram
- a is the reality
- b is what is desired
- The diagram (a multigraph) shows both reality and
desire - Call it RD(a)
- We can rearrange the nodes of the graph to make
it easier to understand
32Reality and Desire Diagram
L
R
Desire
Reality
33Properties of RD(a)
- Each vertex has degree 2
- Each node is incident to one edge from A, the set
of reality edges, and B, the set of desire edges - The connected components of the graph are
alternating cycles (edges alternate between
reality - blue - and desire - red) - Each cycle has an even number of edges, half
reality and half desire
34Properties of RD(a)
- The number of cycles of RD(a) is denoted by cb(a)
- Note that cb(b) n 1 since b has no
breakpoints - All cycles are two parallel edges between the
same pair of nodes - We have 2n 2 nodes, so n 1 cycles
- This is the only permutation for which cb(a) 1
35Properties of RD(a)
- So transforming a into b can be seen as
transforming RD(a) into a graph with as many
cycles as possible - n 1 - Now we need to see how a reversal affects the
cycles of RD(a) - Note that a reversal is characterized by the two
points where it cuts the current permutation,
which each correspond to a reality edge
36Reversals and RD(a)
- Let r be a reversal defined by two reality edges
(s,t) and (u,v), then RD(ar) differs from RD(a)
as follows - Reality edges (s,t) and (u,v) are replaced by
(s,u) and (t,v) - Vertices u, , t are reversed
- Desire edges remain unchanged
- See example on following slide
37Example
Some nodes/edges omitted
L
L
R
R
38Orientation of Cycles
- How many cycles are affected by a reversal?
- First we define convergent and divergent edges
- Two reality edges on the same cycle converge if
they are traversed in the same direction
(clockwise or counterclockwise on the circle in
the diagram) on the cycle - Otherwise they diverge
39Orientation of Cycles
L
Convergent (3,2) (-1,-4) Divergent (L,-3)
(3,2)
R
40Reversals and Cycles
- Let r be a reversal acting on two reality edges e
and f - If e and f belong to different cycles, c(ar)
c(a) 1 - If e and f belong to the same cycle and converge,
c(ar) c(a) - If e and f belong to the same cycle and diverge,
c(ar) c(a) 1
41First Case
- If e and f belong to different cycles, c(ar)
c(a) 1
42Second Case
- If e and f belong to the same cycle and converge,
c(ar) c(a)
43Third Case
- If e and f belong to the same cycle and diverge,
c(ar) c(a) 1
44Reversals and Cycles
- Note that the number of cycles changes by at most
one with each reversal - Use that to find another lower bound for reversal
distance - Suppose we have ar1r2rt b we know that c(b)
n 1 and we have - c(ar1) - c(a) ? 1
- c(ar1r2) - c(ar1) ? 1
-
- c(ar1r2rt) - c(ar1r2rt-1) ? 1
- Adding and cancelling terms we get
- n 1 - c(a) ? t
- If r1r2rt is optimal then t d(a), n 1 - c(a)
? d(a)
45Interleaving Graph
- This new lower bound is better than the old one -
b(a)/2 - For most signed permutations, it is close to the
actual distance, however it does not always work
(we cant always choose two divergent edges) - We can classify the cycles of RD(a) as good or
bad - A cycle is good if it has two divergent reality
edges - Otherwise it is bad
46Interleaving Graph
- The classification only applies to proper cycles
(those with at least four edges) - Those with three edges dont need to be touched
since reality desire - If we have only good cycles in a permutation,
then the lower bound previously given is an
equality - We sort, increasing the number of cycles by one
per reversal
47Interleaving Graph
- If a desire edge from one cycle crosses some
desire edge from another cycle we say that the
two cycles interleave - Interleaved cycles allow us to change a bad cycle
into a good one while breaking another cycle - This good cycle can then broken in the next step
- To find interleaving cycles, we construct an
interleaving graph
48Interleaving Graph
49Interleaving Graph
- Nodes in the interleaving graph are cycles
- Edge between two nodes if the cycles interleave
- The connected components of the graph are called
bad components if they consist entirely of bad
cycles - Component otherwise is a good component
50Interleaving Graph
- What is the interleaving graph of the previous
example? - Suppose that F and C are good cycles.
- Which components of the interleaving graph are
good and which are bad?
51Sorting Good Components
- We need to choose two divergent edges in the same
cycle to define a reversal that increases the
number of cycles - Example
- A reversal characterized by two divergent edges
of the same cycle is a sorting reversal if and
only if it does not lead to the creation of bad
components
52Bad Components
- Using this criterion to sort all of the good
components, we must now sort the bad ones - Give a hierarchy of bad components
- We say a component B separates components A and C
if all chords in RD(a) that link a terminal in A
to a terminal in C cross a desire edge of B
53Diagram with no Good Components
54Bad Components
- Reversal through reality edges in different
components A and C will result in every component
B that separates A and C being twisted - A bad component becomes good when twisted
- A good component can stay good or become bad when
twisted - So twist only when no good components
55Hierarchy of Bad Components
- A hurdle is a bad component that does not
separate any other two bad components - If a bad component separates others, then it is a
nonhurdle - A hurdle A protects a nonhurdle B when removal of
A would cause B to become a hurdle - B is protected by A when every time B separates
two bad components, A is one of them
56Hierarchy of Bad Components
- A hurdle A is called a superhurdle if it protects
some other nonhurdle B - Otherwise it is called a simple hurdle
Bad Components
Nonhurdles
Hurdles
Simple hurdles
Super hurdles
57Fortress
- A signed permutation a is called a fortress iff
RD(a) has an odd number of hurdles and all of
them are super hurdles
58Reversal Distance
- The reversal distance of oriented permutations is
given by - d(a) n 1 - c(a) h(a) f(a)
- c(a) - number of cycles (proper and non)
- h(a) - number of hurdles
- f(a) - a a fortress? (1 else 0)
- n 1 - c(a) good components and bad components
which become good during sort - h(a) - bad components require extra reversal
- f(a) - extra reversal for fortress
59Algorithm
- If we dont have a good cycle we must use either
a reversal on two convergent edges or a reversal
on edges in different cycles - In first case, number of cycles is constant
- In second case, number of cycles decreases by one
- Choose case one on a hurdle
- Transforms bad component into good
- Number of cycles remains constant
60Algorithm
- Getting rid of a non-hurdle doesnt change the
number of hurdles or fortress status, so distance
remains the same - If we reverse a superhurdle, the nonhurdle it
protects becomes a hurdle so h remains constant - Call reversal on some cycle in a hurdle hurdle
cutting
61Algorithm
- In order not to increase f(a), use hurdle cutting
only when h(a) is odd - Using reversal on edges in two different cycles
increases c(a) - However d(a) will decrease if we can decrease
h(a) by two - Choose edges from two different hurdles - this is
called hurdle merging - The two hurdles as well as any nonhurdle
separating them become good components
62Algorithm
- We have to be careful that hurdle merging doesnt
transform a nonhurdle into a hurdle - A and B are called opposite hurdles when we find
the same number of hurdles walking the circle
clockwise from A to B as we do walking
counterclockwise - This can only happen if h(a) is even
- Choosing opposite hurdles, we dont turn a
nonhurdle into a hurdle
63Algorithm
- To avoid creating a fortress where we dont have
one, we choose the opposite hurdles when they
exist - If h(a) is odd and we have a simple hurdle, do
hurdle cutting to avoid fortress - If neither case if possible, we already have a
fortress so f(a) doesnt increase with any hurdle
merging
64Algorithm
Algorithm Sorting Reversal input distinct
permutations a and b output a sorting reversal
for a with target b if there is a good component
in RDb(a) then pick 2 divergent edges e and f
in this component, making sure the
corresponding reversal does not create any bad
components return the reversal characterized by
e and f else if h(a) is even then return
merging of two opposite hurdles else if
h(a) is odd and there is a simple hurdle
then return a reversal cutting this hurdle
else // fortress return merging of
any two hurdles