Title: Genome Rearrangements
1. Genome Rearrangements
2. Quick Review
- Looking at evolutionary change through reversals
- Find the shortest possible series of reversals that transforms genome A into genome B
- This problem has been shown to be NP-hard
3. Oriented Blocks
1 2 3 4 5
1 2 5 4 3
1 2 5 3 4
5 2 1 3 4
4. Unoriented Blocks
- Orientation of the blocks in the genomes is unknown
2 1 3 7 5 4 8 6
1 2 3 4 5 6 7 8
5. Definitions
- unoriented permutation: a mapping from {1, 2, ..., n} to a set L of n labels
- reversal: an operation that reverses the order of a segment of consecutive labels
6. Definitions (cont.)
- reversal distance: if ρ1, ρ2, ..., ρt is a shortest series of reversals such that αρ1ρ2...ρt = β, then t is the reversal distance of α with respect to β, denoted d_β(α)
7. Example 1
The figure shows two chromosomes with homologous blocks:
2 1 3 7 5 4 8 6
1 2 3 4 5 6 7 8
- Assign labels 1 through 8 to the blocks in the lower chromosome
- Transfer the labels to the upper chromosome, giving equal labels to homologous blocks
- We obtain a starting permutation in the upper chromosome; our goal is to sort it into the lower one, the identity
8. Example 1 (cont.)
- 2 1 3 7 5 4 8 6
- 1 2 3 7 5 4 8 6
- 1 2 3 4 5 7 8 6
- 1 2 3 4 5 7 6 8
- 1 2 3 4 5 6 7 8
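The four steps above can be replayed mechanically. The following is a minimal sketch in Python, assuming 0-based inclusive positions; the helper name `reverse_segment` and the position pairs are illustrative choices, not something taken from the slides.

```python
def reverse_segment(perm, i, j):
    """Return a copy of perm with positions i..j (0-based, inclusive) reversed."""
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]


start = [2, 1, 3, 7, 5, 4, 8, 6]
# Position pairs chosen to reproduce the four reversals listed above.
steps = [(0, 1), (3, 5), (6, 7), (5, 6)]

current = start
for i, j in steps:
    current = reverse_segment(current, i, j)
    print(current)
```

Each printed line corresponds to one row of the example, ending at the identity.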
9. Best Solution?
- How do we know that this is the shortest series of reversals?
- To decide what the reversal distance should be, we look at the breakpoints
10. Breakpoints
- A breakpoint of an unoriented permutation α is a pair of labels adjacent in α but not in the target.
- When the target is the identity, this means adjacent labels that are not consecutive.
11. Example 2
- Assume the identity is the target
- Breakpoints with oriented blocks
- L 5 2 1 3 4 R
- Breakpoints with unoriented blocks
- L 5 2 1 3 4 R
12. Example 2 (cont.)
- b(α) denotes the number of breakpoints of α
- A reversal can remove at most two breakpoints, hence d(α) ≥ ⌈b(α) / 2⌉, where d(α) is the reversal distance
- Using this rule, d(α) ≥ 4 for the permutation of Example 1, which has 7 breakpoints
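The breakpoint count and the lower bound that follows from it are easy to compute. The sketch below assumes labels 1..n, the identity as target, and a frame with L = 0 and R = n + 1; the function names are hypothetical.

```python
def breakpoints(perm):
    """Count breakpoints of an unoriented permutation of 1..n (target: identity)."""
    framed = [0] + list(perm) + [len(perm) + 1]  # L = 0, R = n + 1
    # Two neighbours form a breakpoint when their labels are not consecutive.
    return sum(1 for x, y in zip(framed, framed[1:]) if abs(x - y) != 1)


def lower_bound(perm):
    """A reversal removes at most two breakpoints, so d >= ceil(b / 2)."""
    return (breakpoints(perm) + 1) // 2


for perm in ([5, 2, 1, 3, 4], [2, 1, 3, 7, 5, 4, 8, 6]):
    print(perm, breakpoints(perm), lower_bound(perm))
```

For L 5 2 1 3 4 R this counts 4 breakpoints (bound 2); for the permutation of Example 1 it counts 7 (bound 4).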
13. Strips
- L 4 5 3 2 1 R
- If two adjacent labels do not form a breakpoint, they must be of the form
- x (x + 1)
- or
- x (x - 1)
14. Strips (cont.)
- strip: a sequence of consecutive labels surrounded by breakpoints but with no internal breakpoints
- Two types of strips: increasing and decreasing
15. Special Rules
- A single label surrounded by breakpoints is a strip that is both increasing and decreasing
- L and R are always considered part of an increasing strip, even if they are by themselves
- L and R are considered a single element for the purpose of defining strips. If 0, 1, ... is a strip and ..., n, n + 1 is a strip, we consider these two sequences as a single strip, linked by the common element L = R.
16. Example 3
- L 1 2 8 7 3 5 6 4 R
- Strips
- increasing: (R, L, 1, 2), (5, 6)
- decreasing: (8, 7)
- both: (3), (4)
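The strip decomposition of Example 3 can be reproduced with a short sketch. It assumes L = 0 and R = n + 1, and for simplicity it does not merge the strips containing L and R into one, as the special rules would; the names `strips` and `kind` are hypothetical.

```python
def strips(perm):
    """Split L, perm, R into maximal runs of consecutive labels (strips)."""
    framed = [0] + list(perm) + [len(perm) + 1]  # L = 0, R = n + 1
    result = [[framed[0]]]
    for label in framed[1:]:
        if abs(label - result[-1][-1]) == 1:
            result[-1].append(label)   # no breakpoint: extend the current strip
        else:
            result.append([label])     # breakpoint: start a new strip
    return result


def kind(strip, n):
    """Classify a strip using the special rules for L, R and single labels."""
    if strip[0] == 0 or strip[-1] == n + 1:
        return "increasing"            # strips containing L or R
    if len(strip) == 1:
        return "both"
    return "increasing" if strip[0] < strip[1] else "decreasing"


perm = [1, 2, 8, 7, 3, 5, 6, 4]
for s in strips(perm):
    print(s, kind(s, len(perm)))
```

Here the strip [0, 1, 2] and the strip [9] together play the role of (R, L, 1, 2) in the slide.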
17. Theorem 1
- If label k belongs to a decreasing strip and k - 1 belongs to an increasing strip, then there is a reversal that removes at least one breakpoint
- L 4 5 2 3 1 7 6 R (k = 6, k - 1 = 5)
18. Proof
- Labels k - 1 and k must belong to different strips, since only single elements are both increasing and decreasing.
- This implies that each one is the last element in its strip (each is followed by a breakpoint).
19. Proof (cont.)
- Two possible schemes (dots mark breakpoints)
- ... (k - 1) . ... k . ...
- ... k . ... (k - 1) . ...
- Reversing the segment between the two breakpoints brings k and k - 1 together, reducing the number of breakpoints by at least one.
20. Example 4
- L 4 5 2 3 1 7 6 R (k - 1 = 5, k = 6)
- Reversing the segment 2 3 1 7 6 between the two breakpoints gives
- L 4 5 6 7 1 3 2 R
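The reversal used in Theorem 1 can be sketched as follows: locate k - 1 and k, cut after each of them, and reverse what lies between the cuts. The function name `unite_with_predecessor` is hypothetical, and the frame L = 0, R = n + 1 is assumed.

```python
def breakpoints(framed):
    """Count neighbouring pairs whose labels are not consecutive."""
    return sum(1 for x, y in zip(framed, framed[1:]) if abs(x - y) != 1)


def unite_with_predecessor(perm, k):
    """Apply the reversal that cuts after k - 1 and after k."""
    framed = [0] + list(perm) + [len(perm) + 1]  # L = 0, R = n + 1
    i, j = sorted((framed.index(k - 1), framed.index(k)))
    # Cutting after positions i and j reverses the slice framed[i+1 .. j].
    framed[i + 1:j + 1] = framed[i + 1:j + 1][::-1]
    return framed[1:-1]


before = [4, 5, 2, 3, 1, 7, 6]
after = unite_with_predecessor(before, 6)
print(after)
print(breakpoints([0] + before + [8]), breakpoints([0] + after + [8]))
```

Sorting the two positions handles both schemes of the proof (k - 1 to the left of k, or to its right) with the same slice.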
21. Observations
- All permutations have at least one increasing strip (the one containing L or R)
- Not every permutation has a decreasing strip
- If there is a decreasing strip, the previous proof shows that there is a breakpoint-removing reversal: take k to be the smallest label in a decreasing strip, so that k - 1 must lie in an increasing strip
22. Theorem 2
- If label k belongs to a decreasing strip and k + 1 belongs to an increasing strip, then there is a reversal that removes at least one breakpoint.
- L 5 4 2 3 1 6 7 R (k = 5, k + 1 = 6)
23. Proof
- Two possible schemes (dots mark breakpoints)
- ... . (k + 1) ... . k ...
- ... . k ... . (k + 1) ...
- Reversing the segment between the two breakpoints brings k and k + 1 together, reducing the number of breakpoints by at least one.
24. Example 5
- L 5 4 2 3 1 6 7 R (k = 5, k + 1 = 6)
- Reversing the segment 5 4 2 3 1 between the two breakpoints gives
- L 1 3 2 4 5 6 7 R
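The reversal of Theorem 2 is the mirror image: here the cuts fall before k and before k + 1. A minimal sketch, with the hypothetical name `unite_with_successor` and the frame L = 0, R = n + 1 assumed:

```python
def unite_with_successor(perm, k):
    """Apply the reversal that cuts before k and before k + 1."""
    framed = [0] + list(perm) + [len(perm) + 1]  # L = 0, R = n + 1
    i, j = sorted((framed.index(k), framed.index(k + 1)))
    # Cutting before positions i and j reverses the slice framed[i .. j-1].
    framed[i:j] = framed[i:j][::-1]
    return framed[1:-1]


print(unite_with_successor([5, 4, 2, 3, 1, 6, 7], 5))
```

In this particular example the reversal happens to remove two breakpoints, which is consistent with the theorem's guarantee of at least one.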
25. The Result
- The two proofs just explained show that, as long as we have decreasing strips, we can always reduce the number of breakpoints.
- Notice that this also applies to single-element strips
- What about when there are no decreasing strips?
26. Theorem 3
- Let α be a permutation with a decreasing strip. If all reversals that remove breakpoints from α leave no decreasing strips, then there is a reversal that removes two breakpoints from α.
27. Proof
- Let k be the smallest label involved in a decreasing strip.
- ρ is the reversal uniting k and k - 1
- k - 1 must be to the left of k, otherwise ρ leaves a decreasing strip.
- ... (k - 1) . ... k . ...
28. Proof (cont.)
- Let l be the largest label involved in a decreasing strip.
- σ is the reversal uniting l and l + 1
- l + 1 must be to the right of l, otherwise σ leaves a decreasing strip
- ... . l ... . (l + 1) ...
29. Proof (cont.)
- Observe that k must be inside the interval reversed by σ, otherwise σ would leave k's decreasing strip intact.
- Likewise, l must belong to the interval of ρ
- ... (k - 1) . l ... k . (l + 1) ...
30. Proof (cont.)
- ... (k - 1) . l ... k . (l + 1) ...
- We can see that ρ = σ must hold
- The reversal removes two breakpoints because k is united with k - 1 and l is united with l + 1
31. Example 6
- L 7 8 3 5 4 6 1 2 R
- Reversals that remove breakpoints
- L 7 8 3 5 4 6 1 2 R (k - 1 = 3, l = 5, k = 4, l + 1 = 6)
- L 7 8 3 4 5 6 1 2 R
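For small permutations, reversals that remove two breakpoints can also be found by brute force, which gives a quick check of Example 6. This is only an illustrative sketch (the function names are hypothetical, positions are 0-based and inclusive), not the constructive argument of the proof.

```python
def breakpoints(perm):
    """Count breakpoints against the identity, with L = 0 and R = n + 1."""
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for x, y in zip(framed, framed[1:]) if abs(x - y) != 1)


def reversals_removing(perm, amount):
    """All position pairs (i, j) whose reversal removes exactly `amount` breakpoints."""
    base = breakpoints(perm)
    found = []
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            candidate = perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]
            if base - breakpoints(candidate) == amount:
                found.append((i, j))
    return found


perm = [7, 8, 3, 5, 4, 6, 1, 2]
print(reversals_removing(perm, 2))
```

The pair (3, 4), which reverses the segment 5 4, is among the results.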
32. Sorting a Permutation
- We can use an algorithm that sorts a permutation using at most 2 d(α) reversals (that is, at most twice as many reversals as the minimum possible)
- The algorithm assumes that the target is the identity (1, 2, 3, 4, ...)
33. General Idea
- A main loop looks at the current permutation and selects the best possible reversal to apply
- Update the current permutation and report the reversal applied
- The loop stops when the current permutation is the identity
34. Choosing the Reversal σ
- If there is a decreasing strip, look for a reversal that reduces the number of breakpoints and leaves a decreasing strip.
- If no such reversal exists, there is a reversal that encompasses all the decreasing strips and removes two breakpoints.
- If there are no decreasing strips, select a reversal that cuts two breakpoints.
35. Sorting Algorithm
L 1 2 . 8 7 . 3 . 5 6 . 4 . R
- list ← empty
- k ← 3
- ρ ← (8 7 3)
- αρ = L 1 2 3 . 7 8 . 5 6 . 4 . R
- α ← αρ
- list ← (8 7 3)
- k ← 4
- ρ ← (7 8 5 6 4)
- αρ = L 1 2 3 4 . 6 5 . 8 7 . R
- α ← αρ
- list ← (8 7 3), (7 8 5 6 4)
- k ← 5
- ρ ← (6 5)
- αρ = L 1 2 3 4 5 6 . 8 7 . R
- α ← αρ
- list ← (8 7 3), (7 8 5 6 4), (6 5)
- k ← 7
- ρ ← (8 7)
- αρ = L 1 2 3 4 5 6 7 8 R
- α ← αρ
- list ← (8 7 3), (7 8 5 6 4), (6 5), (8 7)
Algorithm Sorting Unoriented Permutation
  input: permutation α
  output: series of reversals that sort α
  list ← empty
  while α ≠ I do
    if α has a decreasing strip then
      k ← smallest label in a decreasing strip
      ρ ← reversal that cuts after k and after k - 1
      if αρ has no decreasing strip then
        l ← largest label in a decreasing strip
        ρ ← reversal that cuts before l and before l + 1
    else
      ρ ← reversal that cuts the first two breakpoints
    α ← αρ
    list ← list + ρ
  return list
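The pseudocode above can be turned into a short runnable sketch. The version below is one possible reading, not a reference implementation: it assumes labels 1..n with L = 0 and R = n + 1, treats a single label other than L and R as a decreasing strip (as the worked traces in these slides do), and records each reversal as the segment it reverses. All function names are hypothetical.

```python
def decreasing_labels(a):
    """Labels of the framed permutation `a` that lie in decreasing strips.

    Assumption: single labels other than L and R count as decreasing;
    strips containing L (0) or R (n + 1) count as increasing.
    """
    last = len(a) - 1
    labels, start = set(), 0
    for p in range(1, len(a) + 1):
        if p == len(a) or abs(a[p] - a[p - 1]) != 1:  # strip a[start:p] ends here
            strip = a[start:p]
            touches_frame = strip[0] == 0 or strip[-1] == last
            if not touches_frame and (len(strip) == 1 or strip[0] > strip[1]):
                labels.update(strip)
            start = p
    return labels


def reverse(a, i, j):
    """Copy of `a` with positions i..j (inclusive) reversed."""
    return a[:i] + a[i:j + 1][::-1] + a[j + 1:]


def sort_unoriented(perm):
    """Return the list of reversed segments that sorts `perm` into the identity."""
    n = len(perm)
    a = [0] + list(perm) + [n + 1]
    identity = list(range(n + 2))
    applied = []
    while a != identity:
        dec = decreasing_labels(a)
        if dec:
            k = min(dec)
            lo, hi = sorted((a.index(k - 1), a.index(k)))
            i, j = lo + 1, hi                    # cut after k and after k - 1
            if not decreasing_labels(reverse(a, i, j)):
                largest = max(dec)
                lo, hi = sorted((a.index(largest), a.index(largest + 1)))
                i, j = lo, hi - 1                # cut before l and before l + 1
        else:
            cuts = [p for p in range(n + 1) if abs(a[p] - a[p + 1]) != 1]
            i, j = cuts[0] + 1, cuts[1]          # strip between the first two breakpoints
        applied.append(a[i:j + 1])
        a = reverse(a, i, j)
    return applied


print(sort_unoriented([1, 2, 8, 7, 3, 5, 6, 4]))
print(sort_unoriented([2, 1, 3, 7, 5, 4, 8, 6]))
```

On the two permutations traced in these slides, this sketch produces the same lists of reversals as the traces.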
36. Another Example
L . 2 1 . 3 . 7 . 5 4 . 8 . 6 . R
- list ← empty
- k ← 1
- ρ ← (2 1)
- αρ = L 1 2 3 . 7 . 5 4 . 8 . 6 . R
- α ← αρ
- list ← (2 1)
- k ← 4
- ρ ← (7 5 4)
- αρ = L 1 2 3 4 5 . 7 8 . 6 . R
- α ← αρ
- list ← (2 1), (7 5 4)
- k ← 6
- ρ ← (7 8 6)
- αρ = L 1 2 3 4 5 6 . 8 7 . R
- α ← αρ
- list ← (2 1), (7 5 4), (7 8 6)
- k ← 7
- ρ ← (8 7)
- αρ = L 1 2 3 4 5 6 7 8 R
- α ← αρ
- list ← (2 1), (7 5 4), (7 8 6), (8 7)
37. But is it Optimal?
- It has been shown that
- d(α) ≥ ⌈b(α) / 2⌉
- For the previous example
- b(α) = 7
- d(α) ≥ 4
- Although the algorithm produces the optimal result in this instance (four reversals), it is not guaranteed to do so. The algorithm may produce a list containing more reversals than are actually necessary to solve the problem.
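For very small permutations, the exact reversal distance can be computed by exhaustive search and compared with the breakpoint bound or with the output of an approximation algorithm. The sketch below uses a plain breadth-first search over all permutations reachable by reversals; it is exponential in n and is meant only as a checking aid. The function name is hypothetical.

```python
from collections import deque


def reversal_distance(perm):
    """Exact reversal distance to the sorted order, by breadth-first search."""
    start = tuple(perm)
    target = tuple(sorted(perm))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        # Try every reversal of a segment of length at least two.
        for i in range(len(cur)):
            for j in range(i + 1, len(cur)):
                nxt = cur[:i] + cur[i:j + 1][::-1] + cur[j + 1:]
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    return None  # not reached: every permutation can be sorted by reversals


print(reversal_distance([2, 1, 3, 7, 5, 4, 8, 6]))
print(reversal_distance([5, 2, 1, 3, 4]))
```

The second call illustrates that the breakpoint bound is not always attained: L 5 2 1 3 4 R has 4 breakpoints, so the bound is 2, yet no single reversal removes two of them.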
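38. Theorem 4
- The number of iterations in algorithm Sorting Unoriented Permutation is less than or equal to the number of breakpoints in the initial permutation
39. Proof
- We must prove that, on average, each iteration removes at least one breakpoint.
- This holds because an iteration removes 0 breakpoints only when there is no decreasing strip, which after the start happens only immediately after an iteration that removed 2, keeping the average of 1 breakpoint per iteration intact.
- If the initial permutation has no decreasing strip, the first iteration may also remove 0 breakpoints; this is offset by the final iteration, which always removes 2, since no permutation has exactly one breakpoint.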