Genome Rearrangement: Sorting by Reversals

1
Genome Rearrangement: SORTING BY REVERSALS
Ankur Jain, Hoda Mokhtar. CS290I, Spring 2003
2
Comparative Genomics
The practice of analyzing and comparing the
genetic material of different species for the
purpose of studying evolution, the function of
genes and inherited diseases. Chromosome
breakage and mistakes in repair, along with a
number of other processes, give rise to changes
in gene order.  These have important
consequences for the evolution of species. 
3
Problem Definition
  • During biological evolution, inter- and
    intra-chromosomal  exchanges of chromosomal
    fragments disrupt the order of genes on a 
    chromosome.
  • The genome rearrangements approach is the use
    of combinatorial optimization techniques to
    infer a sequence of rearrangement events that
    accounts for the differences among the genomes.

4
Outline
  • Problem definition
  • Genome Comparison
  • Possible chromosomal changes
  • Sorting by reversals - Previous work
  • - Definitions
  • - Duality Theorem
  • Our technique - Bit Vector Method
  • - Experimental results - Synthetic
    datasets
  • - Real datasets
  • - Breakpoints Technique
  • Conclusions and Future work

5
Genome Comparison
  • In the late 1980s, a remarkable and novel pattern
    of evolutionary change was discovered in plant
    organelles.
  • Jeffrey Palmer and his colleagues compared the
    mitochondrial genomes of cabbage and turnip,
    which are very closely related. The molecules, which
    are almost identical in gene sequence, differ
    dramatically in gene order. [Sridhar, Pevzner
    1995]
  • This discovery and many other studies proved that
    genome rearrangements represent a common mode of
    molecular evolution.

6
Cabbage and Turnip
Gene orientation
7
Single Chromosome Operations
  • Reversal: A section of a chromosome is excised,
    reversed in orientation, and re-inserted.
  • (a b c1 c2 c3 c4 d e -> a b -c4 -c3 -c2 -c1 d e)
  • Transposition: A section of a chromosome is
    excised and inserted at a new position in the
    chromosome, without changing orientation. (a b c d
    -> c d a b)
  • Inverted transposition: Exactly like
    transposition, except that the transposed segment
    changes orientation. (a b c d -> -c -d a b)
  • Gene duplication: A section of a chromosome is
    duplicated, so that multiple copies exist of
    every gene in that section.
  • (a b c -> a b c b, a b c -> a b b c)
  • Gene loss: A section of a chromosome is excised
    and lost.
  • (a b c -> a c)
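
The single-chromosome operations can be sketched on lists of signed gene numbers. This is only an illustration under stated assumptions, not the presenters' code: the function names and the 0-based, inclusive index convention are hypothetical choices made for this sketch.

```python
from typing import List


def reversal(genes: List[int], i: int, j: int) -> List[int]:
    """Reverse the segment genes[i..j] (inclusive) and flip every sign in it."""
    segment = [-g for g in reversed(genes[i:j + 1])]
    return genes[:i] + segment + genes[j + 1:]


def transposition(genes: List[int], i: int, j: int, k: int) -> List[int]:
    """Cut genes[i..j] and re-insert it, unchanged, before index k of the rest."""
    segment = genes[i:j + 1]
    rest = genes[:i] + genes[j + 1:]
    return rest[:k] + segment + rest[k:]


def gene_loss(genes: List[int], i: int, j: int) -> List[int]:
    """Excise genes[i..j] and drop it."""
    return genes[:i] + genes[j + 1:]


# a b c1 c2 c3 c4 d e, encoded as 1..8: reversing c1..c4 negates and reverses them.
print(reversal([1, 2, 3, 4, 5, 6, 7, 8], 2, 5))
# a b c d -> c d a b
print(transposition([1, 2, 3, 4], 2, 3, 0))
```

Returning new lists instead of mutating the input keeps each operation easy to compose when simulating a sequence of rearrangement events.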

8
Operations on 2 Chromosomes
  • Translocation: The end of one chromosome is
    broken and attached to the end of another
    chromosome.
  • Fusion: Two chromosomes merge.
  • Fission: One chromosome splits up into two
    chromosomes.

9
Genomic Sorting Problem
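  • Given genomes $\pi$ and $\sigma$, the genomic
    sorting problem is to find a series of reversals
    $\rho_1, \rho_2, \ldots, \rho_t$ such that
    $\pi \cdot \rho_1 \cdot \rho_2 \cdots \rho_t = \sigma$ and
    t is minimal.
  • We call t the genomic distance between $\pi$ and $\sigma$.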
10
Sorting by Reversals
  • Genome rearrangements can be modelled by the
    combinatorial problem of sorting by reversals.

11
Sorting by Reversals (Cont.)
Minimum Sorting by Reversals: Given a permutation
$\pi$, what is the shortest sequence ($\rho_1 \rho_2 \ldots \rho_t$) of
reversals that sorts $\pi$? The complexity was long
open; the problem is NP-hard [Caprara 97].
Minimum Signed Sorting by Reversals: Given a signed permutation
$\pi$, what is the shortest sequence ($\rho_1 \rho_2 \ldots \rho_t$) of
reversals that sorts $\pi$? Solvable in polynomial
time.
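
For very short signed permutations, the minimum number of reversals can be checked by exhaustive search. The sketch below is a brute-force breadth-first search, not one of the polynomial algorithms cited in this talk; the function name is hypothetical, and the state space grows as 2^n · n!, so it is only practical for n up to about 8.

```python
from collections import deque
from typing import Tuple


def reversal_distance_bfs(perm: Tuple[int, ...]) -> int:
    """Exact signed reversal distance to the identity, by brute-force BFS."""
    n = len(perm)
    target = tuple(range(1, n + 1))
    start = tuple(perm)
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        current, dist = queue.popleft()
        for i in range(n):
            for j in range(i, n):
                # Reverse current[i..j] and flip the signs of that segment.
                segment = tuple(-x for x in reversed(current[i:j + 1]))
                nxt = current[:i] + segment + current[j + 1:]
                if nxt == target:
                    return dist + 1
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    raise ValueError("input is not a signed permutation of 1..n")


print(reversal_distance_bfs((2, 1)))
```

Because BFS visits states in order of increasing distance, the first time the identity is reached gives the minimum. Such a checker is mainly useful as a reference when testing a faster heuristic on small inputs.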
12
Sorting of Signed Permutations
  • "Transforming Cabbage into Turnip", Hannenhalli,
    S., and Pevzner, P. 95 - Polynomial algorithm
    for sorting signed permutations by reversals.
  • "A Very Elementary Presentation of the
    Hannenhalli-Pevzner Theory", A. Bergeron 95 -
    Polynomial algorithm for sorting signed
    permutations, efficiently implemented using bit
    vectors.
  • "Experiments in Computing Sequences of Reversals",
    A. Bergeron and F. Strasbourg 95 - Polynomial
    algorithm for sorting signed permutations.
  • "Fast Sorting by Reversal", Berman, P., and
    Hannenhalli, S. 96 - Exploits a few
    combinatorial properties of the cycle graph of a
    permutation to provide a polynomial algorithm.
  • "A Faster and Simpler Algorithm for Sorting Signed
    Permutations by Reversals", Kaplan, H., Shamir,
    R., and Tarjan, R. 99 - O(n^2) using hurdles,
    cycles and fortresses.
  • "A Linear-Time Algorithm for Computing Inversion
    Distance between Signed Permutations with an
    Experimental Study", Moret and Yan 00 -
    Computes the reversal distance (without actually
    sorting) in O(n) time. Computes the connected
    components using a stack rather than Union-Find.
    Hannenhalli-Pevzner 96 (GRAPPA program)

13
Outline
  • Problem definition
  • Genome Comparison
  • Possible chromosomal changes
  • Sorting by reversals - Previous work
  • - Definitions
  • Our technique - Bit Vector Method
  • Experimental results - Synthetic
    datasets
  • - Real datasets
  • - Breakpoints Technique
  • Conclusions and Future work

14
What is a Permutation?
  • Permutation ($\pi$): an ordered arrangement of the
    set {1, 2, ..., n}
  • Signed permutation ($\pi$): a permutation whose
    elements are oriented; a reversal switches element
    orientation
  • 3 -4 7 -6 1 -5 2
  • $\rho$(7, -5): 3 -4 5 -1 6 -7 2

15
BreakPoint
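  • Let $i \sim j$ if $|i - j| = 1$. Extend a
    permutation $\pi = \pi_1 \pi_2 \ldots \pi_n$ by adding
    $\pi_0 = 0$ and $\pi_{n+1} = n + 1$.
  • We call a pair of elements $(\pi_i, \pi_{i+1})$,
    $0 \le i \le n$,
  • of $\pi$ an adjacency if $\pi_i \sim \pi_{i+1}$,
  • and a breakpoint if it is not
    ($\pi_i \not\sim \pi_{i+1}$).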
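
The breakpoint definition translates directly into a few lines of code. This is a minimal sketch for the unsigned definition given on this slide; the function name is a hypothetical choice.

```python
from typing import List


def count_breakpoints(perm: List[int]) -> int:
    """Count breakpoints of an unsigned permutation of 1..n."""
    n = len(perm)
    # Extend the permutation with pi_0 = 0 and pi_{n+1} = n + 1.
    extended = [0] + list(perm) + [n + 1]
    # A neighbouring pair is an adjacency when its values differ by exactly 1;
    # every other neighbouring pair is a breakpoint.
    return sum(1 for a, b in zip(extended, extended[1:]) if abs(a - b) != 1)


print(count_breakpoints([3, 1, 6, 5, 2, 4]))
```

In the extended sequence 0 3 1 6 5 2 4 7, only the pair (6, 5) is an adjacency, so the other six neighbouring pairs are breakpoints. The identity permutation has none.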


16
What is breakpoint graph?
The breakpoint graph of a permutation $\pi$
is an edge-colored graph $G(\pi)$
with 2n+2 vertices.
We join vertices $\pi_i$ and $\pi_{i+1}$
by a black edge.
We join vertices $\pi_i$ and $\pi_j$
by a gray edge if $\pi_i \sim \pi_j$.
17
Breakpoint graph signed case
Straight edges - every other pair of consecutive
elements
Curved edges - every other pair of
consecutive integers
Every connected component of the graph is a cycle
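
Because every vertex has exactly one straight (black) and one curved (gray) edge, the cycles can be counted by alternately following the two edge types. The sketch below assumes the unsigned image that the transcript describes on the "Constructing the Bit Matrix" slide (+x becomes 2x-1, 2x and -x becomes 2x, 2x-1, framed by 0 and 2n+1). The function name is hypothetical.

```python
from typing import Dict, List


def count_cycles(perm: List[int]) -> int:
    """Count cycles in the breakpoint graph of a signed permutation of 1..n."""
    n = len(perm)
    # Unsigned image: +x -> (2x-1, 2x), -x -> (2x, 2x-1), framed by 0 and 2n+1.
    seq = [0]
    for x in perm:
        seq += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    seq.append(2 * n + 1)
    # Black (straight) edges join positions 2k and 2k+1 of the sequence.
    black: Dict[int, int] = {}
    for k in range(0, len(seq), 2):
        a, b = seq[k], seq[k + 1]
        black[a], black[b] = b, a
    # Gray (curved) edges join the integers 2k and 2k+1, i.e. v and v ^ 1.
    unvisited = set(seq)
    cycles = 0
    while unvisited:
        v = unvisited.pop()
        cycles += 1
        while True:
            w = black[v]          # follow the black edge
            unvisited.discard(w)
            v = w ^ 1             # follow the gray edge
            if v not in unvisited:
                break             # back at the start of this cycle
            unvisited.remove(v)
    return cycles


print(count_cycles([3, 1, 6, 5, -2, 4]))
```

For the identity permutation every black edge coincides with a gray edge, giving n + 1 two-vertex cycles. For 3 1 6 5 -2 4 the graph has two cycles, so n + 1 - c = 7 - 2 = 5.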
18
Correlation between the breakpoints and reversal
distance
  • A correlation exists between the reversal distance
    and the number of breakpoints
  • Sorting by reversals corresponds to eliminating
    breakpoints
  • Every reversal can eliminate at most 2
    breakpoints [Shamir, 95]
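
Since sorting must eliminate every breakpoint and one reversal removes at most two, the breakpoint count gives a lower bound on the reversal distance. Writing $b(\pi)$ for the number of breakpoints and $d(\pi)$ for the reversal distance (notation assumed here), the bound can be stated as:

```latex
d(\pi) \;\ge\; \left\lceil \frac{b(\pi)}{2} \right\rceil
```

For example, a permutation with 7 breakpoints needs at least 4 reversals.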
19
Outline
  • Problem definition
  • Genome Comparison
  • Possible chromosomal changes
  • Sorting by reversals - Previous work
  • - Definitions
  • - Duality Theorem (Hurdles !!)
  • Our technique - Vector-Method
  • Experimental results - Synthetic
    datasets
  • - Real datasets
  • -Breakpoints Technique
  • Conclusions and Future work

20
Hurdle
Hurdle - an unoriented component whose elements
are consecutive
Simple hurdle - a hurdle whose
deletion decreases the number of hurdles
Super hurdle - a hurdle that is not simple
21
Duality Theorem for Sorting Signed Permutations
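Hannenhalli and Pevzner, 1995.
For every signed permutation $\pi$,
$$d(\pi) = \begin{cases} b(\pi) - c(\pi) + h(\pi) + 1 & \text{if } \pi \text{ is a fortress} \\ b(\pi) - c(\pi) + h(\pi) & \text{otherwise} \end{cases}$$
where $b(\pi)$ is the number of breakpoints, $c(\pi)$ the number of cycles and $h(\pi)$ the number of hurdles.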
22
Safe reversal
[Figure: breakpoint graphs labeled c = 3, h = 1 and c = 5, h = 2]
23
Outline
  • Problem definition
  • Genome Comparison
  • Possible chromosomal changes
  • Sorting by reversals - Previous work
  • - Definitions
  • - Duality Theorem (Hurdles !!)
  • Our technique - Bit Vector Method
  • Experimental results - Synthetic
    datasets
  • - Real datasets
  • - Breakpoints Technique
  • Conclusions and Future work

24
Our Approach
  • Finding hurdles and fortresses in a graph is
    difficult and expensive [Kaplan, H., Shamir, R.,
    and Tarjan, R. 99].
  • Use the oriented sort to remove the oriented
    components in a graph, and then apply the
    breakpoint approach to perform the remaining
    reversals.
  • We used the bit-vector approach to perform the
    oriented sort.

25
Oriented Sort
  • Among the several candidates, choose a
    safe reversal, that is, a reversal that decreases
    the reversal distance.
  • Theorem: The reversal that maximizes the number
    of oriented vertices is safe. [A. Bergeron 95]

26
Basic Sorting oriented pair
  • An oriented pair $(\pi_i, \pi_j)$ is a pair of consecutive
    integers, that is, $|\pi_i| - |\pi_j| = \pm 1$,
  • with opposite signs
  • Example:
  • (0 3 1 6 5 -2 4 7)
  • Oriented pairs are (1, -2) and (3, -2)
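
Finding oriented pairs is a direct scan over all pairs of positions. The sketch below is a simple quadratic version (not the bit-vector implementation used in the talk), with a hypothetical function name. It takes the permutation already framed by 0 and n+1 and treats 0 as positive.

```python
from typing import List, Tuple


def oriented_pairs(ext: List[int]) -> List[Tuple[int, int]]:
    """Index pairs (i, j), i < j, of consecutive integers with opposite signs."""
    pairs = []
    for i in range(len(ext)):
        for j in range(i + 1, len(ext)):
            consecutive = abs(abs(ext[i]) - abs(ext[j])) == 1
            opposite = (ext[i] < 0) != (ext[j] < 0)
            if consecutive and opposite:
                pairs.append((i, j))
    return pairs


ext = [0, 3, 1, 6, 5, -2, 4, 7]
print([(ext[i], ext[j]) for i, j in oriented_pairs(ext)])
```

On the permutation (0 3 1 6 5 -2 4 7) this reports the pairs (3, -2) and (1, -2).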

27
Reversal score
The score of a reversal is the number of oriented pairs in the
permutation that results from it. Example:
( 0 3 1 6 5 -2 4 7 )
Pair (1, -2): ( 0 3 1 2 -5 -6 4 7 ), score 2
Pair (3, -2): ( 0 -5 -6 -1 -3 -2 4 7 ), score 4
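
The score can be computed by applying the reversal induced by an oriented pair and counting the oriented pairs that remain. The sketch assumes the rule stated on the "Oriented reversals" slide: a pair at positions i < j induces the reversal of positions i..j-1 when the two values sum to +1, and of positions i+1..j when they sum to -1. All function names are hypothetical.

```python
from typing import List


def count_oriented_pairs(ext: List[int]) -> int:
    """Number of pairs of consecutive integers that carry opposite signs."""
    return sum(
        1
        for i in range(len(ext))
        for j in range(i + 1, len(ext))
        if abs(abs(ext[i]) - abs(ext[j])) == 1 and (ext[i] < 0) != (ext[j] < 0)
    )


def induced_reversal(ext: List[int], i: int, j: int) -> List[int]:
    """Apply the reversal induced by the oriented pair at positions i < j."""
    lo, hi = (i, j - 1) if ext[i] + ext[j] == 1 else (i + 1, j)
    return ext[:lo] + [-x for x in reversed(ext[lo:hi + 1])] + ext[hi + 1:]


def score(ext: List[int], i: int, j: int) -> int:
    """Oriented pairs left after the reversal induced by positions i < j."""
    return count_oriented_pairs(induced_reversal(ext, i, j))


ext = [0, 3, 1, 6, 5, -2, 4, 7]
print(score(ext, 2, 5), score(ext, 1, 5))  # pairs (1, -2) and (3, -2)
```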
28
Algorithm
  • As long as $\pi$ has an oriented pair, choose the
    oriented reversal that has maximal score.
  • Each line shows the permutation and the oriented pair applied next:

( 0 3 1 6 5 -2 4 7 ) (3, -2)
( 0 -5 -6 -1 -3 -2 4 7 ) (-3, 4)
( 0 -5 -6 -1 2 3 4 7 ) (-1, 2)
( 0 -5 -6 1 2 3 4 7 ) (-6, 7)
( 0 -5 -4 -3 -2 -1 6 7 ) (-5, 6)
( 0 1 2 3 4 5 6 7 )
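
The greedy loop can be sketched as follows. It is a plain quadratic-per-step illustration rather than the bit-vector implementation described in the talk. The function name and the tie-breaking rule (the first candidate with maximal score wins) are assumptions, so intermediate permutations may differ from the trace above even when the number of reversals is the same.

```python
from typing import List, Tuple


def oriented_sort(perm: List[int]) -> Tuple[List[int], int]:
    """Greedy oriented sort: returns (final framed permutation, reversals used)."""
    ext = [0] + list(perm) + [len(perm) + 1]   # frame with 0 and n + 1

    def pairs(p: List[int]) -> List[Tuple[int, int]]:
        # Index pairs i < j holding consecutive integers with opposite signs.
        return [(i, j)
                for i in range(len(p)) for j in range(i + 1, len(p))
                if abs(abs(p[i]) - abs(p[j])) == 1 and (p[i] < 0) != (p[j] < 0)]

    def induced(p: List[int], i: int, j: int) -> List[int]:
        # rho(i, j-1) when p[i] + p[j] == +1, otherwise rho(i+1, j).
        lo, hi = (i, j - 1) if p[i] + p[j] == 1 else (i + 1, j)
        return p[:lo] + [-x for x in reversed(p[lo:hi + 1])] + p[hi + 1:]

    steps = 0
    while True:
        candidates = pairs(ext)
        if not candidates:
            return ext, steps      # sorted, or stuck with no oriented pair
        # Keep the result with the highest score (oriented pairs remaining).
        ext = max((induced(ext, i, j) for i, j in candidates),
                  key=lambda p: len(pairs(p)))
        steps += 1


print(oriented_sort([3, 1, 6, 5, -2, 4]))
print(oriented_sort([2, 1]))
```

The second call shows the limitation: the all-positive permutation 2 1 has no oriented pair, so the loop stops immediately without sorting it and another technique has to take over.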
29
Oriented edge
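Let $(\pi_i, \pi_j)$
be a gray edge incident to
black edges $(\pi_k, \pi_i)$
and $(\pi_j, \pi_l)$.
Then $(\pi_i, \pi_j)$
is oriented if and only if
$i - k = j - l$.
Example: Edge 20-21 is oriented (it contains 3, an odd number,
of vertices). i = 20, j = 21, k = 22, l = 23; i - k = -2,
j - l = -2.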
Bergeron
Pevzner
30
Oriented reversals
  • The reversal induced by an oriented pair $(\pi_i, \pi_j)$ is

$\rho(i, j-1)$, if $\pi_i + \pi_j = +1$, and
$\rho(i+1, j)$, if $\pi_i + \pi_j = -1$.
Reversals that create consecutive integers are
always induced by oriented pairs. Such
reversals are called oriented reversals.
Example: The pair (1, -2) induces the
reversal (0 3 1 6 5 -2 4 7) -> (0 3 1 2 -5 -6 4 7)
31
Interleaving Graph
Two components are adjacent if they
overlap but neither of them
contains the other.
32
Constructing the Bit Matrix
Consider the sequence P = 3 1 6 5 -2 4 7.
Represent each element P_i by the pair
2|P_i|-1, 2|P_i| if P_i is positive, and
2|P_i|, 2|P_i|-1 otherwise (P_i is negative):
P:   3    1     6      5    -2    4     7
0   5 6   1 2   11 12   9 10   4 3   7 8   13 14   15
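
The mapping from a signed sequence to its unsigned image can be sketched in a few lines. The function name is hypothetical, and the sketch covers only the encoding step, not the construction of the bit matrix itself.

```python
from typing import List


def unsigned_image(perm: List[int]) -> List[int]:
    """Map +x to (2x-1, 2x) and -x to (2x, 2x-1), framed by 0 and 2n+1."""
    image = [0]
    for x in perm:
        if x > 0:
            image.extend([2 * x - 1, 2 * x])
        else:
            image.extend([-2 * x, -2 * x - 1])
    image.append(2 * len(perm) + 1)
    return image


print(unsigned_image([3, 1, 6, 5, -2, 4, 7]))
```

A negative element appears as a descending pair (for example -2 becomes 4 3), which is how the sign survives in the unsigned sequence.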
33
The Algorithm
Step 1. Select the vertex v_i with the maximum
score and perform the associated operations; repeat until
the parity of all the vertices is zero.
Step 2. If the sequence is not completely
sorted, apply the breakpoint technique to
complete the sorting.
34
Outline
  • Problem definition
  • Genome Comparison
  • Possible chromosomal changes
  • Sorting by reversals - Previous work
  • - Definitions
  • - Duality Theorem (Hurdles !!)
  • Our technique - Bit Vector Method
  • Experimental results - Synthetic
    datasets
  • - Real datasets
  • - Breakpoints Technique
  • Conclusions and Future work

35
Experimental Settings
  • 1- Synthetic Datasets
  • Generated random signed permutations of different
    lengths and evolution rates using the GRIMM
    permutation generation module
  • 2- Real Datasets
  • Used GRAPPA test sets for different species of
    Campanulaceae (flowering plants)
  • MGR (multiple genome rearrangement) human-mouse
    gene order data
  • Genome.org: herpes viruses that affect humans

36
Experiment 1 - Synthetic
1- Generated files of random permutations of
different lengths (50, 100, 200, 400, 800, 1600),
each file with 50 permutations.
2- We computed the number of correctly sorted permutations.
3- Evolution rate varies: 20, 30, 40.
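
A comparable synthetic input can be produced by applying random reversals to the identity. This is not the GRIMM generation module. It is a hypothetical stand-in that assumes "evolution rate" means the number of random reversals applied, and the function and parameter names are invented for the sketch.

```python
import random
from typing import List, Optional


def random_signed_permutation(n: int, num_reversals: int,
                              seed: Optional[int] = None) -> List[int]:
    """Apply num_reversals random signed reversals to the identity 1..n."""
    rng = random.Random(seed)        # a seed makes the test file reproducible
    perm = list(range(1, n + 1))
    for _ in range(num_reversals):
        i = rng.randrange(n)         # segment start
        j = rng.randrange(i, n)      # segment end, inclusive
        perm[i:j + 1] = [-x for x in reversed(perm[i:j + 1])]
    return perm


# One "file" of 50 permutations of length 50 at rate 20 (assumed meaning).
dataset = [random_signed_permutation(50, 20, seed=k) for k in range(50)]
print(len(dataset), len(dataset[0]))
```

By construction, the number of reversals applied is an upper bound on the true reversal distance of each generated permutation.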
37
Experiment 2 - Synthetic
1- Generated files of random permutations of
different lengths (50, 100, 200, 400, 800, 1600),
each file with 50 permutations.
2- We computed the time needed to obtain the correctly sorted
permutations.
3- Evolution rate varies: 20, 30, 40.
38
Experiment 3 - Synthetic
1- Generated files of random permutations of
length 1000.
2- We computed the time needed to
obtain the correctly sorted permutations.
3- Evolution rate varies in increments of 100.
Observation: A saturation state is reached as the
evolution rate approaches 1000.
39
Experiment 1 - Real
Considered Herpes simplex virus (HSV),
Epstein-Barr virus (EBV), and Cytomegalovirus
(CMV) gene orders (Hannenhalli et al. 1995) as
well as the identity gene order.
Observations: Our reversal results matched
those obtained in the optimal evolutionary scenario
recovered by MGR-MEDIAN.
40
Experiment 2 - Real
1- Considered Campanulaceae species.
2- Obtained reversals for Cyananthus (11 reversals), Triodanis
(13 reversals), and Symphyandra (12 reversals)
versus Tobacco, but failed to sort Platycodon,
Legousia and Codonopsis.
Observation: The ones we sorted were sorted with the same number of reversals
as GRIMM.
41
Experiment 3 - Real
1- Considered Human-Mouse gene order from MGR
12 13 14 15 -9 -8 -7 -6 47 48 -46 -45 -44 -11 -10
-58 -57 -56 92 93 -95 -94 -21 -20 -5 -4 -3 -2 -1
34 35 41 42 43 36 37 38 -64 -63 61 62 65 66 67 68
90 91 -55 -54 51 52 53 39 40 -60 -59 -77 -76 -19
-18 16 17 -97 -96 -75 -74 -73 24 25 78 79 -83 -82
-81 -80 84 85 86 87 -28 -27 -26 22 23 98 99 69 70
-72 -71 -33 -32 -31 -30 -29 88 89 -50 -49 -105
-104 106 107 108 114 115 -117 -116 -103 -102 109
110 111 112 113 -101 -100 118 119 120 121 122 123
(mouse genome; human is the identity)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
35 36 37 38 71 72 73 74 75 76 77 78 79 80 81 82
83 84 85 86 87 88 89 90 91 92 93 39 40 41 42 43
44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69 70 94 95 96 97 98
99 100 101 102 103 104 105 106 107 108 109 110
111 112 113 114 115 116 117 118 119 120 121 122
123 124
Identity
GRIMM sorts the permutation in 41 reversals
42
Conclusions
  • We implemented a technique that integrates the
    bit-matrix oriented sorting technique
    with the greedy breakpoint reversal technique.
  • The proposed technique was tested on both real
    and synthetic data and was able to sort the signed
    permutations in a fair number of the test cases.
  • We think that such an integration can yield good
    results, besides being a simple and relatively fast
    technique.
  • However, the oriented sort algorithm fails to
    sort permutations that have hurdles; in those
    cases we have to apply the breakpoint approach.

43
Future Work
  • We think that the technique we
    implemented can provide good results, and
    that further experiments can strengthen our claim.
  • We started implementing the algorithm proposed
    in [Kaplan, H., Shamir, R., and Tarjan, R. 99] but
    did not succeed in completing the implementation. We
    think that having this technique implemented
    under the same conditions as ours can provide a
    good source of comparative results and give
    better confidence in what we propose.
  • Applying the technique to different datasets,
    including exon order rather than gene order.
  • Considering different species and trying to
    compute the reversal distance and use it to confirm
    phylogenetic trees.

44
Oriented Pairs
  • An oriented pair $(\pi_i, \pi_j)$ is a pair of
    consecutive integers, that is, $|\pi_i| - |\pi_j| = \pm 1$,
  • with opposite signs
  • Example:
  • (0 3 1 6 5 -2 4 7)
  • Oriented pairs are (1, -2) and (3, -2)
45
Reversal Distance Estimation
This reversal distance estimate is very inaccurate.
Bafna and Pevzner (1996) showed that another
hidden parameter, hurdles, estimates the reversal
distance with much greater accuracy.
46
Proper reversal
Given a permutation $\pi$ and an arbitrary reversal $\rho$,
denote $\Delta c = c(\pi\rho) - c(\pi)$
(increase in the size of cycle decomposition).
Then for every permutation $\pi$
and reversal $\rho$, $\Delta c \le 1$.
We call a reversal proper if $\Delta c = 1$.
47
Oriented pairs
  • Oriented pairs are useful because they indicate
    reversals that create consecutive elements of the
    permutation.
  • Example
  • The pair (1, -2) induces the reversal
  • (0 3 1 6 5 -2 4 7)
  • (0 3 1 2 -5 -6 4 7)