Title: Genome Rearrangements, Synteny, and Comparative Mapping
1Genome Rearrangements, Synteny, and Comparative
Mapping
- CSCI 4830 Algorithms for Molecular Biology
- Debra S. Goldberg
2Turnip vs Cabbage
- Share a recent common ancestor
- Look and taste different
3Turnip vs Cabbage
- Comparing mtDNA gene sequences yields no evolutionary information
- 99% similarity between genes
- These nearly identical gene sequences differed in gene order
- This study helped pave the way to analyzing genome rearrangements in molecular evolution
4Turnip vs Cabbage Different mtDNA Gene Order
5Turnip vs Cabbage Different mtDNA Gene Order
6Turnip vs Cabbage Different mtDNA Gene Order
7Turnip vs Cabbage Different mtDNA Gene Order
8Turnip vs Cabbage Gene Order Comparison
- Evolution is manifested as the divergence in gene
order
9Transforming Cabbage into Turnip
10Reversals
[Figure: conserved blocks 1, 2, ..., 10 before and after a reversal of blocks 4 to 8]
1, 2, 3, -8, -7, -6, -5, -4, 9, 10
- Blocks represent conserved genes.
- In the course of evolution or in a clinical context, blocks 1, ..., 10 could be misread as 1, 2, 3, -8, -7, -6, -5, -4, 9, 10.
11Types of Mutations
12Types of Rearrangements
Inversion/Reversal
1 2 3 4 5 6 → 1 2 -5 -4 -3 6
13Types of Rearrangements
Translocation
1 2 3 | 4 5 6 → 1 2 6 | 4 5 3 (two chromosomes exchange their end blocks)
14Types of Rearrangements
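Fusion / Fission
[Figure: fusion joins two chromosomes carrying blocks 1 to 6 into the single chromosome 1 2 3 4 5 6; fission is the reverse operation]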
15Genome rearrangements
Mouse (X chrom.)
Unknown ancestor 75 million years ago
Human (X chrom.)
- What are the similarity blocks and how to find them?
- What is the architecture of the ancestral genome?
- What is the evolutionary scenario for transforming one genome into the other?
16Why do we care?
17SKY (spectral karyotyping)
18Robertsonian Translocation
19Robertsonian Translocation
- Translocation of chromosomes 13 and 14
- No net gain or loss of genetic material: normal phenotype
- Increased risk for an abnormal child or spontaneous pregnancy loss
20Philadelphia Chromosome
21Philadelphia Chromosome
- A translocation between chromosomes 9 and 22 (part of 22 is attached to 9)
- Seen in about 90% of patients with chronic myelogenous leukemia (CML)
22Colon cancer
23Colon Cancer
24Comparative maps
25Waardenburg's Syndrome: Mouse Provides Insight into Human Genetic Disorder
- Characterized by pigmentary abnormalities
- The implicated gene was linked to human chromosome 2
- It was not clear where exactly on chromosome 2 the gene lies
26Waardenburg's syndrome and splotch mice
- A breed of mice (with the splotch gene) had similar symptoms caused by the same type of gene as in humans
- Scientists identified the location of the gene responsible for the disorder in mice
- Finding the gene in mice gives clues to where the same gene is located in humans
27Reversals Example
- p = 1 2 3 4 5 6 7 8
- r(3,5): 1 2 5 4 3 6 7 8
28Reversals Example
- p = 1 2 3 4 5 6 7 8
- r(3,5): 1 2 5 4 3 6 7 8
- r(5,6): 1 2 5 4 6 3 7 8
29Reversals and Gene Orders
- Gene order is represented by a permutation p:
- $p = p_1 \cdots p_{i-1}\, p_i\, p_{i+1} \cdots p_{j-1}\, p_j\, p_{j+1} \cdots p_n$
- Reversal r(i, j) reverses (flips) the elements from i to j in p:
- $p \cdot r(i,j) = p_1 \cdots p_{i-1}\, p_j\, p_{j-1} \cdots p_{i+1}\, p_i\, p_{j+1} \cdots p_n$
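The definition of r(i, j) can be sketched in a few lines of Python. This is only an illustration: the function name `apply_reversal` is hypothetical, and positions are taken to be 1-based and inclusive, as on the slide.

```python
def apply_reversal(p, i, j):
    """Return p . r(i, j): a copy of p with positions i..j (1-based, inclusive) reversed."""
    if not 1 <= i <= j <= len(p):
        raise ValueError("need 1 <= i <= j <= len(p)")
    return p[:i - 1] + p[i - 1:j][::-1] + p[j:]


p = [1, 2, 3, 4, 5, 6, 7, 8]
step1 = apply_reversal(p, 3, 5)      # flips the elements 3 4 5
step2 = apply_reversal(step1, 5, 6)  # flips the elements at positions 5 and 6
print(step1)
print(step2)
```

The two calls reproduce the reversals example: 1 2 5 4 3 6 7 8, then 1 2 5 4 6 3 7 8.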
30Reversal Distance Problem
- Goal: Given two permutations, find the shortest series of reversals that transforms one into the other
- Input: Permutations p and s
- Output: A series of reversals r1, ..., rt transforming p into s, such that t is minimum
- t: reversal distance between p and s
- d(p, s): smallest possible value of t, given p and s
31Sorting By Reversals Problem
- Goal: Given a permutation, find a shortest series of reversals that transforms it into the identity permutation (1 2 ... n)
- Input: Permutation p
- Output: A series of reversals r1, ..., rt transforming p into the identity permutation such that t is minimum
- min t = d(p): reversal distance of p
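For very small permutations, d(p) can be found by brute force. The sketch below is not an algorithm from these slides: it is a plain breadth-first search over all permutations reachable by reversals, so it is only practical for short inputs (the search space has n! states). The name `reversal_distance` is hypothetical.

```python
from collections import deque


def reversal_distance(p):
    """Exact reversal distance of an unsigned permutation, by breadth-first search."""
    start, target = tuple(p), tuple(sorted(p))
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        perm, dist = queue.popleft()
        if perm == target:
            return dist
        n = len(perm)
        # Try every reversal of a segment with at least two elements.
        for i in range(n):
            for j in range(i + 1, n):
                nxt = perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    raise ValueError("unreachable: every permutation can be sorted by reversals")


print(reversal_distance([6, 1, 2, 3, 4, 5]))
```

Because breadth-first search visits states in order of distance, the first time the identity is reached gives the minimum number of reversals.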
32Sorting by reversals: Example, 5 steps
33Sorting by reversals: Example, 4 steps
What is the reversal distance for this
permutation? Can it be sorted in 3 steps?
34Pancake Flipping Problem
- The chef prepares an unordered stack of pancakes of different sizes
- The waiter wants to sort (rearrange) them, smallest on top, largest at the bottom
- He does it by flipping over several from the top, repeating this as many times as necessary
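One simple way the waiter could proceed is sketched below. It is an illustrative strategy, not one given on the slide: bring the largest unsorted pancake to the top, then flip it down into place. Index 0 is the top of the stack, sizes are assumed distinct, and the name `pancake_sort` is hypothetical.

```python
def pancake_sort(stack):
    """Sort a stack (index 0 = top) using only flips of the top k pancakes.

    Returns the sorted stack and the list of flip sizes k that were used.
    """
    stack = list(stack)
    flips = []
    for size in range(len(stack), 1, -1):
        largest = stack.index(max(stack[:size]))
        if largest == size - 1:
            continue  # already in its final place
        if largest != 0:
            # Bring the largest unsorted pancake to the top.
            stack[:largest + 1] = stack[:largest + 1][::-1]
            flips.append(largest + 1)
        # Flip it down to the bottom of the unsorted part.
        stack[:size] = stack[:size][::-1]
        flips.append(size)
    return stack, flips


sorted_stack, flips = pancake_sort([3, 1, 4, 2])
print(sorted_stack, flips)
```

Each round uses at most two flips, so this strategy is simple rather than minimal. Every flip is a reversal that starts at position 1.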
35Sorting By Reversals A Greedy Algorithm
- Unsigned permutations
- Example: permutation p = 1 2 3 6 4 5
- First three elements are already in order
- prefix(p) = length of the already sorted prefix
- prefix(p) = 3
- Idea: increase prefix(p) at every step
36Greedy Algorithm An Example
- Doing so, p can be sorted:
- 1 2 3 6 4 5
- 1 2 3 4 6 5
- 1 2 3 4 5 6
- Number of steps to sort a permutation of length n is at most (n - 1)
37Greedy Algorithm Pseudocode
- SimpleReversalSort(p)
- 1 for i ← 1 to n - 1
- 2   j ← position of element i in p (i.e., p_j = i)
- 3   if j ≠ i
- 4     p ← p · r(i, j)
- 5     output p
- 6   if p is the identity permutation
- 7     return
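The pseudocode translates almost line for line into Python. The sketch below assumes p is a permutation of 1..n and records each intermediate permutation instead of printing it. The name `simple_reversal_sort` is hypothetical.

```python
def simple_reversal_sort(p):
    """Greedy sort by reversals; returns the permutation after each reversal."""
    p = list(p)
    n = len(p)
    steps = []
    for i in range(1, n):              # i = 1 .. n - 1
        j = p.index(i) + 1             # 1-based position of element i
        if j != i:
            p[i - 1:j] = p[i - 1:j][::-1]   # p <- p . r(i, j)
            steps.append(list(p))
        if p == sorted(p):             # identity permutation reached
            break
    return steps


for step in simple_reversal_sort([1, 2, 3, 6, 4, 5]):
    print(step)
```

On 1 2 3 6 4 5 this takes two reversals, matching the greedy example.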
38Analyzing SimpleReversalSort
- SimpleReversalSort does not guarantee the smallest number of reversals and takes five steps on p = 6 1 2 3 4 5:
- Step 1: 1 6 2 3 4 5
- Step 2: 1 2 6 3 4 5
- Step 3: 1 2 3 6 4 5
- Step 4: 1 2 3 4 6 5
- Step 5: 1 2 3 4 5 6
39Analyzing SimpleReversalSort (contd)
- But it can be sorted in two steps:
- p = 6 1 2 3 4 5
- Step 1: 5 4 3 2 1 6
- Step 2: 1 2 3 4 5 6
- So, SimpleReversalSort(p) is not optimal
- Optimal algorithms are unknown for many problems; approximation algorithms are used instead
40Approximation Algorithms
- These algorithms find approximate solutions rather than optimal solutions
- The approximation ratio of an algorithm A on input p is A(p) / OPT(p), where
- A(p) = solution produced by algorithm A
- OPT(p) = optimal solution of the problem
41Approximation Ratio / Performance Guarantee
- Approximation ratio (performance guarantee) of algorithm A: the worst approximation ratio over all inputs of size n
- For an algorithm A that minimizes the objective function (minimization algorithm): $\max_{|p| = n} A(p) / OPT(p)$
- For a maximization algorithm: $\min_{|p| = n} A(p) / OPT(p)$
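As a worked instance, take SimpleReversalSort on p = 6 1 2 3 4 5 from the earlier slides. It uses five reversals, while two suffice, and no single reversal sorts p, so two is optimal. The ratio on this input is:

```latex
\frac{A(p)}{OPT(p)} = \frac{5}{2} = 2.5
```

Since the performance guarantee is the worst case over all inputs of a given size, this one input already shows that the guarantee of SimpleReversalSort for n = 6 is at least 2.5.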
42Adjacencies and Breakpoints
- $p = p_1 p_2 p_3 \cdots p_{n-1} p_n$
- A pair of elements $p_i$ and $p_{i+1}$ is adjacent if $p_{i+1} = p_i \pm 1$
- For example: p = 1 9 3 4 7 8 2 6 5
- (3, 4), (7, 8) and (6, 5) are adjacent pairs
43Breakpoints An Example
- There is a breakpoint between any neighboring elements that are non-consecutive
- p = 1 9 3 4 7 8 2 6 5
- Pairs (1,9), (9,3), (4,7), (8,2) and (2,6) form breakpoints of permutation p
- b(p): number of breakpoints in permutation p
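44Adjacency & Breakpoints
- An adjacency: a pair of neighboring elements that are consecutive
- A breakpoint: a pair of neighboring elements that are not consecutive
- Extend p with $p_0 = 0$ and $p_{n+1} = n + 1$
- p = 5 6 2 1 3 4
- Extended: 0 5 6 2 1 3 4 7
- Adjacencies: (5,6), (2,1), (3,4)
- Breakpoints: (0,5), (6,2), (1,3), (4,7)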
45Extending Permutations
- Add $p_0 = 0$ and $p_{n+1} = n + 1$ at the ends of p
- Example: p = 1 9 3 4 7 8 2 6 5
- Extending with 0 and 10: p = 0 1 9 3 4 7 8 2 6 5 10
- Note: a new breakpoint, (5,10), was created after extending
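Counting breakpoints on the extended permutation is easy to check by machine. A minimal sketch, with hypothetical helper names `extend` and `breakpoints`, assuming p is a permutation of 1..n:

```python
def extend(p):
    """Add p_0 = 0 and p_{n+1} = n + 1 at the ends of p."""
    return [0] + list(p) + [len(p) + 1]


def breakpoints(p):
    """Neighboring pairs of the extended permutation that are not consecutive."""
    e = extend(p)
    return [(a, b) for a, b in zip(e, e[1:]) if abs(a - b) != 1]


print(breakpoints([1, 9, 3, 4, 7, 8, 2, 6, 5]))
print(len(breakpoints([5, 6, 2, 1, 3, 4])))
```

For 1 9 3 4 7 8 2 6 5 the list contains the five breakpoints of the unextended permutation plus the new pair (5, 10).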
46Reversal Distance and Breakpoints
- Each reversal eliminates at most 2 breakpoints.
- p = 2 3 1 4 6 5
- 0 2 3 1 4 6 5 7, b(p) = 5
- 0 1 3 2 4 6 5 7, b(p) = 4
- 0 1 2 3 4 6 5 7, b(p) = 2
- 0 1 2 3 4 5 6 7, b(p) = 0
- This implies: reversal distance ≥ number of breakpoints / 2
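The bound follows because the identity permutation has no breakpoints and a single reversal removes at most two. Written out for the example above, where b(p) = 5 and d(p) must be an integer:

```latex
d(p) \ge \left\lceil \frac{b(p)}{2} \right\rceil = \left\lceil \frac{5}{2} \right\rceil = 3
```

The three reversals shown in the example meet this bound, so d(p) = 3 for p = 2 3 1 4 6 5.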
47Sorting By Reversals A Better Greedy Algorithm
- BreakPointReversalSort(p)
- 1 while b(p) > 0
- 2   Among all possible reversals, choose reversal r minimizing b(p · r)
- 3   p ← p · r(i, j)
- 4   output p
- 5 return
- Problem: this algorithm may work forever
48Strips
- Strip: an interval between two consecutive breakpoints in a permutation
- Decreasing strip: elements in decreasing order (e.g. 6 5)
- Increasing strip: elements in increasing order (e.g. 7 8)
- 0 1 9 4 3 7 8 2 5 6 10
- Single-element strips are considered decreasing, except that the strips 0 and n + 1 are increasing
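The strip decomposition can be sketched as follows. The function name `strips` is hypothetical, and the input is assumed to be an already extended permutation (starting with 0 and ending with n + 1).

```python
def strips(extended):
    """Split an extended permutation into strips and label each one's direction."""
    runs = []
    current = [extended[0]]
    for x in extended[1:]:
        if abs(x - current[-1]) == 1:
            current.append(x)      # still inside the same strip
        else:
            runs.append(current)   # a breakpoint ends the strip
            current = [x]
    runs.append(current)

    ends = (extended[0], extended[-1])   # the elements 0 and n + 1
    labelled = []
    for run in runs:
        if len(run) > 1:
            kind = "increasing" if run[1] > run[0] else "decreasing"
        else:
            # Single elements count as decreasing, except 0 and n + 1.
            kind = "increasing" if run[0] in ends else "decreasing"
        labelled.append((run, kind))
    return labelled


for run, kind in strips([0, 1, 9, 4, 3, 7, 8, 2, 5, 6, 10]):
    print(run, kind)
```

For the example permutation the decreasing strips are 9, 4 3 and 2. All other strips are increasing.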
49Reducing the Number of Breakpoints
- Theorem 1:
- If permutation p contains at least one decreasing strip, then there exists a reversal r which decreases the number of breakpoints (i.e. b(p · r) < b(p))
50Things To Consider
- For p = 1 4 6 5 7 8 3 2:
- 0 1 4 6 5 7 8 3 2 9, b(p) = 5
- Choose the decreasing strip with the smallest element k in p (k = 2 in this case)
53Things To Consider (contd)
- For p = 1 4 6 5 7 8 3 2:
- 0 1 4 6 5 7 8 3 2 9, b(p) = 5
- Choose the decreasing strip with the smallest element k in p (k = 2 in this case)
- Find k - 1 in the permutation
- Reverse the segment between k and k - 1: 0 1 2 3 8 7 5 6 4 9, b(p) = 4
54Reducing the Number of Breakpoints Again
- If there is no decreasing strip, there may be no reversal r that reduces the number of breakpoints (i.e. b(p · r) ≥ b(p) for any reversal r).
- By reversing an increasing strip (the number of breakpoints stays unchanged), we create a decreasing strip for the next step. Then the number of breakpoints will be reduced in the next step (Theorem 1).
55Things To Consider (contd)
- There are no decreasing strips in p, for
- p = 0 1 2 5 6 7 3 4 8, b(p) = 3
- p · r(6,7) = 0 1 2 5 6 7 4 3 8, b(p · r(6,7)) = 3
- r(6,7) flips the increasing strip 3 4 and does not change the number of breakpoints
- r(6,7) creates a decreasing strip, thus guaranteeing that the next step will decrease the number of breakpoints.
56ImprovedBreakpointReversalSort
- ImprovedBreakpointReversalSort(p)
- 1 while b(p) > 0
- 2   if p has a decreasing strip
- 3     Among all possible reversals, choose reversal r that minimizes b(p · r)
- 4   else
- 5     Choose a reversal r that flips an increasing strip in p
- 6   p ← p · r
- 7   output p
- 8 return
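A direct, unoptimized Python sketch of this pseudocode is given below. It tries every reversal by brute force, which is cubic work per step, and all function names are hypothetical. When no decreasing strip exists it flips the first interior increasing strip, which is one valid choice among several.

```python
def breakpoints(e):
    """Number of non-consecutive neighboring pairs in an extended permutation."""
    return sum(abs(a - b) != 1 for a, b in zip(e, e[1:]))


def strip_bounds(e):
    """(start, end, is_decreasing) for each strip of the extended permutation e."""
    bounds, start = [], 0
    for k in range(1, len(e) + 1):
        if k == len(e) or abs(e[k] - e[k - 1]) != 1:
            end = k - 1
            if end > start:
                dec = e[start + 1] < e[start]
            else:
                # Single elements are decreasing, except 0 and n + 1.
                dec = start not in (0, len(e) - 1)
            bounds.append((start, end, dec))
            start = k
    return bounds


def flip(e, i, j):
    """Reverse positions i..j (0-based, inclusive) of e."""
    return e[:i] + e[i:j + 1][::-1] + e[j + 1:]


def improved_breakpoint_reversal_sort(p):
    """Return the permutation after each reversal, without the added 0 and n + 1."""
    e = [0] + list(p) + [len(p) + 1]
    last = len(e) - 2                    # last interior position
    history = []
    while breakpoints(e) > 0:
        bounds = strip_bounds(e)
        if any(dec for _, _, dec in bounds):
            # Choose the reversal that leaves the fewest breakpoints.
            e = min((flip(e, i, j)
                     for i in range(1, last + 1)
                     for j in range(i + 1, last + 1)),
                    key=breakpoints)
        else:
            # Flip the first increasing strip that avoids 0 and n + 1.
            i, j, _ = next(b for b in bounds if b[0] >= 1 and b[1] <= last)
            e = flip(e, i, j)
        history.append(e[1:-1])
    return history


for step in improved_breakpoint_reversal_sort([1, 2, 5, 6, 7, 3, 4]):
    print(step)
```

The loop terminates because, by Theorem 1, the best reversal strictly reduces b(p) whenever a decreasing strip exists, and flipping an increasing strip always creates one.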
57Performance Guarantee
- ImprovedBreakPointReversalSort is an approximation algorithm with a performance guarantee of at most 4
- It eliminates at least one breakpoint in every two steps: at most 2b(p) steps
- Approximation ratio: 2b(p) / d(p)
- An optimal algorithm eliminates at most 2 breakpoints in every step: d(p) ≥ b(p) / 2
- Performance guarantee: (2b(p) / d(p)) ≤ 2b(p) / (b(p) / 2) = 4
58Signed Permutations
- Up to this point, reversal sort algorithms sorted unsigned permutations
- But genes have directions, so we should consider signed permutations
p = 1 -2 -3 4 -5
59Signed Permutation
- Genes are directed fragments of DNA
- Genes in the same position but with different orientations do not have the same gene order
- These two permutations are not equivalent gene sequences
1 2 3 4 5
-1 2 -3 -4 -5
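For signed permutations a reversal also flips the orientation of every block it moves. A minimal sketch, with the hypothetical name `signed_reversal` and 1-based inclusive positions:

```python
def signed_reversal(p, i, j):
    """Reverse positions i..j (1-based, inclusive) and negate each reversed element."""
    if not 1 <= i <= j <= len(p):
        raise ValueError("need 1 <= i <= j <= len(p)")
    return p[:i - 1] + [-x for x in reversed(p[i - 1:j])] + p[j:]


blocks = list(range(1, 11))
print(signed_reversal(blocks, 4, 8))
print(signed_reversal([1, 2, 3, 4, 5, 6], 3, 5))
```

The first call reproduces the block order 1, 2, 3, -8, -7, -6, -5, -4, 9, 10 from the Reversals slide. The second reproduces the inversion example 1 2 -5 -4 -3 6.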
60Signed permutations are easier!
- Polynomial time (optimal) algorithm is known
61Genome rearrangements
Mouse (X chrom.)
Unknown ancestor 75 million years ago
Human (X chrom.)
- What are the similarity blocks and how to find them?
- What is the architecture of the ancestral genome?
- What is the evolutionary scenario for transforming one genome into the other?
63Comparative maps
64A brief history
- Chromosome comparisons
- no information about genes
- 1920s Sturtevant, Weinstein
- Today many organisms, many uses
- Humans
- primates, mouse, cat, dog, zebrafish, ...
- Alzheimer, cancers, diabetes, obesity, ...
65Why construct comparative maps?
- Identify and isolate genes
- Crops: drought resistance, yield, nutrition, ...
- Human: disease genes, drug response, ...
- Infer ancestral relationships
- Discover principles of evolution
- Chromosome
- Gene family
- key to understanding the human genome
66Map construction
Go from this
to this
Maize 1 (target), Rice (base) Wilson et al.
Genetics 1999
67Why automate?
- Time consuming, laborious
- Needs to be redone frequently
- Codify a common set of principles
- Nadeau and Sankoff warn of arbitrary nature of
comparative map construction
68Input/Output
- Input
- genetic maps of 2 species
- marker/gene correspondences (homologs)
- Output
- a comparative map
- homeologies identified
69A natural model?
Maize 1 (target), Rice (base) Wilson et al.
Genetics 1999
70Scoring
10L
3L
71Assumptions
- Accept published marker order
- All linkage groups of base are unique
- Simplistic homeology criteria
- At least one homeologous region
72A natural model?
73Dynamic programming
- $l_i$: location of homolog to marker i
- $S_{i,a}$: penalty (score) for an optimal labeling of the submap from marker i to the end, when labeling begins with label a
[Figure: markers 1 ... i ... n, with marker i labeled a]
74Recurrence relation
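- $S_{n,a} = m\,\delta(a, l_n)$
- $S_{i,a} = m\,\delta(a, l_i) + \min_{b \in L} \big( S_{i+1,b} + s\,\delta(a,b) \big)$
- Here $\delta(a,b)$ is 0 if a = b and 1 otherwise; m is the mismatched-marker penalty and s the segment-boundary penalty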
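The linear-model recurrence can be sketched as a right-to-left dynamic program. Several things here are assumptions made for illustration. The distance term is read as a 0/1 indicator (0 when the two labels agree, 1 otherwise). The penalty values m = 1 and s = 2 are arbitrary. The name `label_linear` and the marker locations in the example are hypothetical.

```python
def label_linear(locations, m=1, s=2):
    """Best labeling under the linear model.

    locations[i] is l_i, the location of the homolog of marker i.
    m is the mismatch penalty and s the segment-boundary penalty (assumed values).
    Returns (score, list of labels, one per marker).
    """
    labels = sorted(set(locations))
    n = len(locations)

    def delta(a, b):
        return 0 if a == b else 1

    # S[i][a]: best score for markers i..n-1 when marker i is labeled a.
    S = [dict() for _ in range(n)]
    nxt = [dict() for _ in range(n)]   # label chosen for marker i + 1
    for a in labels:
        S[n - 1][a] = m * delta(a, locations[n - 1])
    for i in range(n - 2, -1, -1):
        for a in labels:
            best_b = min(labels, key=lambda b: S[i + 1][b] + s * delta(a, b))
            S[i][a] = (m * delta(a, locations[i])
                       + S[i + 1][best_b] + s * delta(a, best_b))
            nxt[i][a] = best_b

    a = min(labels, key=lambda lab: S[0][lab])
    score, labeling = S[0][a], [a]
    for i in range(n - 1):
        a = nxt[i][a]
        labeling.append(a)
    return score, labeling


score, labeling = label_linear(["7L", "2L", "7L", "7L", "9L", "9L", "9L"])
print(score, labeling)
```

With these assumed penalties the isolated 2L marker is absorbed into the 7L segment as a mismatch, because opening and closing a separate segment would cost more.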
75Problem with linear model
76The stack model
d
c
c
f
e
b
b
b
a
- Segment at top of the stack can be
- pushed (remembered), later popped
- replaced
- Push and replace cost s -- pop is free.
77Scoring
Markers (with location of homolog): uaz265a (7L), isu136 (2L), isu151 (7L), rz509b (7L), cdo59c (7L), rz698c (9L), bcd1087a (9L), rz206b (9L), bcd1088c (9L), csu40 (3S), cdo786a (9L), csu154 (7L), isu113a (7L), csu17 (7L), cdo337 (3L), rz530a (7L)
Segment labels: 7L, 9L, 7L
78Dynamic programming
- $S_{i,j,a}$: score for an optimal labeling of
- the submap from marker i to marker j
- when labeling begins with label a, i.e., marker i is labeled a
[Figure: markers 1 ... i ... j ... n, with marker i labeled a]
79Recurrence relation
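- $S_{i,i,a} = m\,\delta(a, l_i)$
- $S_{i,j,a} = \min \{\, m\,\delta(a, l_i) + \min_{b \in L} ( S_{i+1,j,b} + s\,\delta(a,b) ),\ \min_{i<k<j} ( S_{i,k,a} + S_{k+1,j,a} ) \,\}$
[Figure: markers 1 ... i ... k+1 ... j ... n, with markers i and k+1 both labeled a]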
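The stack-model recurrence can be sketched with memoized recursion. As before, the assumptions are illustrative: the distance term is read as a 0/1 indicator, the penalties m = 1 and s = 2 are arbitrary, and the name `stack_score` and the example locations are hypothetical.

```python
from functools import lru_cache


def stack_score(locations, m=1, s=2):
    """Best labeling score under the stack model (returning to an earlier label is free)."""
    locations = tuple(locations)
    labels = sorted(set(locations))
    n = len(locations)

    def delta(a, b):
        return 0 if a == b else 1

    @lru_cache(maxsize=None)
    def S(i, j, a):
        """Best score for markers i..j (0-based) when marker i is labeled a."""
        mismatch = m * delta(a, locations[i])
        if i == j:
            return mismatch
        # Case 1: label marker i, then continue with any label b.
        best = mismatch + min(S(i + 1, j, b) + s * delta(a, b) for b in labels)
        # Case 2: split so that label a resumes at marker k + 1 at no cost.
        for k in range(i + 1, j):
            best = min(best, S(i, k, a) + S(k + 1, j, a))
        return best

    return min(S(0, n - 1, a) for a in labels)


print(stack_score(["7L", "7L", "9L", "9L", "9L", "7L", "7L"]))
```

In this example the 9L run is pushed once (cost s = 2) and popped for free. A linear labeling of the same markers would pay either two segment boundaries (cost 4) or three mismatches (cost 3).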
80Advantage output similar to experts
- Maize 6 (target),
- Rice (base)
81Advantage proposes testable hypotheses
- New relations predicted; greater-resolution maps confirm
Maize 7 (target), Rice (base)
Ahn-Tanksley 93; Ahn-Tanksley data; Wilson et al. 99
82Advantage infers evolutionary events
Wilson et al.
Maize 1 (target) Rice (base)
83Problem Incomplete input
- Gene order not always fully resolved.
- Co-located genes can be ordered to give most
parsimonious labeling.
[Figure: ten co-located markers, all at map position 33.0]
84The reordering algorithm
- Uses a compression scheme
- Within a megalocus, group genes by location of related gene.
- Order these groups
- First, last groups interact with nearby genes
- Any ordering of internal groups is equally parsimonious
85The reordering algorithm
86The reordering algorithm
87Definitions
- δ extended to distance to a set A of labels: δ(a, A) =
- 0 if a ∈ A,
- 1 otherwise
- $l_i$: set of labels matching markers in megalocus i
- S: set of megalocus start nodes
88Definitions
- p(i,a,b) gives the total mismatched marker and segment boundary penalties attributed to hidden markers
- i is the index for the megalocus
- a and b are labels for the megalocus ends
- Do any markers in megalocus match a, b?
- No: don't penalize in recurrence and p(i,a,b)
89Recurrence relation
$S_{i,j,a} = \min \{\, m\,\delta(a, l_i) + \min_{b \in L} ( S_{i+1,j,b} + s\,\delta(a,b) + p(i,a,b) ),\ \min_{i<k<j,\ k \in S} ( S_{i,k,a} + S_{k+1,j,a} ) \,\}$
90Results Fewer mismatches
Mouse 5 (target) Human (base)
91Results Mismatches placed between segments
Mouse 8 (target) Human (base)
92Results Detects new segments
Mouse 13 (target) Human (base)
93Summary
- Global view
- Finds optimal comparative map
- Arranges markers in most parsimonious way
- Biologically meaningful results
- Robust
- not species-specific
- high/low resolution, genetic/physical maps
- stable to errors in marker order