Genome Rearrangements, Synteny, and Comparative Mapping

1
Genome Rearrangements, Synteny, and Comparative
Mapping
  • CSCI 4830 Algorithms for Molecular Biology
  • Debra S. Goldberg

2
Turnip vs Cabbage
  • Share a recent common ancestor
  • Look and taste different

3
Turnip vs Cabbage
  • Comparing mtDNA gene sequences yields no
    evolutionary information
  • 99% similarity between genes
  • These surprisingly identical gene sequences
    differed in gene order
  • This study helped pave the way to analyzing
    genome rearrangements in molecular evolution

4
Turnip vs Cabbage: Different mtDNA Gene Order
  • Gene order comparison

8
Turnip vs Cabbage: Gene Order Comparison
  • Evolution is manifested as the divergence in gene
    order

9
Transforming Cabbage into Turnip
10
Reversals
[Figure: a genome drawn as blocks 1 to 10; after a reversal the block order reads as follows]
1, 2, 3, -8, -7, -6, -5, -4, 9, 10
  • Blocks represent conserved genes.
  • In the course of evolution or in a clinical
    context, blocks 1, ..., 10 could be misread as 1, 2,
    3, -8, -7, -6, -5, -4, 9, 10.
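The signed block order on this slide can be reproduced with a few lines of Python. This is a minimal sketch, not code from the lecture: the function name `signed_reversal` and the 1-based, inclusive position convention are assumptions made for illustration.

```python
def signed_reversal(blocks, i, j):
    """Reverse blocks[i..j] (1-based, inclusive) and flip the sign of each block."""
    segment = [-b for b in reversed(blocks[i - 1:j])]
    return blocks[:i - 1] + segment + blocks[j:]


genome = list(range(1, 11))            # blocks 1, ..., 10
print(signed_reversal(genome, 4, 8))   # [1, 2, 3, -8, -7, -6, -5, -4, 9, 10]
```

The sign flip models the change of orientation that each block undergoes when the segment containing it is inverted.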

11
Types of Mutations
12
Types of Rearrangements
Inversion/Reversal
1 2 3 4 5 6
1 2 -5 -4 -3 6
13
Types of Rearrangements
Translocation
(1 2 3) (4 5 6)
(1 2 6) (4 5 3)
14
Types of Rearrangements
Fusion
1 2 3 4 5 6
1 2 3 4 5 6
Fission
15
Genome rearrangements
Mouse (X chrom.)
Unknown ancestor 75 million years ago
Human (X chrom.)
  • What are the similarity blocks and how to find
    them?
  • What is the architecture of the ancestral genome?
  • What is the evolutionary scenario for
    transforming one genome into the other?

16
Why do we care?
17
SKY (spectral karyotyping)
18
Robertsonian Translocation
19
Robertsonian Translocation
  • Translocation of chromosomes 13 and 14
  • No net gain or loss of genetic material, so the
    phenotype is normal.
  • Increased risk for an abnormal child or
    spontaneous pregnancy loss

20
Philadelphia Chromosome
21
Philadelphia Chromosome
  • A translocation between chromosomes 9 and 22
    (part of 22 is attached to 9)
  • Seen in about 90% of patients with chronic
    myelogenous leukemia (CML)

22
Colon cancer
23
Colon Cancer
24
Comparative maps
25
Waardenburg's Syndrome: Mouse Provides Insight
into Human Genetic Disorder
  • Characterized by pigmentary dysplasia
  • The implicated gene was linked to human chromosome 2
  • It was not clear where exactly on chromosome 2 it lies

26
Waardenburg's syndrome and splotch mice
  • A breed of mice (with splotch gene) had similar
    symptoms caused by the same type of gene as in
    humans
  • Scientists identified location of gene
    responsible for disorder in mice
  • Finding the gene in mice gives clues to where
    same gene is located in humans

28
Reversals Example
  • p = 1 2 3 4 5 6 7 8

  • p · r(3,5) = 1 2 5 4 3 6 7 8
  • p · r(3,5) · r(5,6) = 1 2 5 4 6 3 7 8

29
Reversals and Gene Orders
  • Gene order represented by permutation p
  • $p = p_1 \dots p_{i-1}\; p_i\, p_{i+1} \dots p_{j-1}\, p_j\; p_{j+1} \dots p_n$
  • $p \cdot r(i,j) = p_1 \dots p_{i-1}\; p_j\, p_{j-1} \dots p_{i+1}\, p_i\; p_{j+1} \dots p_n$
  • Reversal r(i, j) reverses (flips) the elements
    from position i to position j in p
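The definition of r(i, j) translates directly into list slicing. The sketch below assumes 1-based, inclusive positions as in the slides; the name `reversal` is chosen here for illustration.

```python
def reversal(p, i, j):
    """Return a copy of p with positions i..j (1-based, inclusive) reversed."""
    if not 1 <= i <= j <= len(p):
        raise ValueError("need 1 <= i <= j <= len(p)")
    return p[:i - 1] + p[i - 1:j][::-1] + p[j:]


p = [1, 2, 3, 4, 5, 6, 7, 8]
step1 = reversal(p, 3, 5)       # [1, 2, 5, 4, 3, 6, 7, 8]
step2 = reversal(step1, 5, 6)   # [1, 2, 5, 4, 6, 3, 7, 8]
print(step1, step2)
```

Applying the same reversal twice restores the original permutation, so every reversal is its own inverse.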

30
Reversal Distance Problem
  • Goal: Given two permutations, find the shortest
    series of reversals that transforms one into the other
  • Input: Permutations p and s
  • Output: A series of reversals r_1, ..., r_t
    transforming p into s, such that t is minimum
  • t: reversal distance between p and s
  • d(p, s): smallest possible value of t, given p
    and s

31
Sorting By Reversals Problem
  • Goal: Given a permutation, find a shortest series
    of reversals that transforms it into the identity
    permutation (1 2 ... n)
  • Input: Permutation p
  • Output: A series of reversals r_1, ..., r_t
    transforming p into the identity permutation such
    that t is minimum
  • min t = d(p): reversal distance of p

32
Sorting by reversals: Example (5 steps)
33
Sorting by reversals: Example (4 steps)
What is the reversal distance for this
permutation? Can it be sorted in 3 steps?
34
Pancake Flipping Problem
  • Chef prepares unordered stack of pancakes of
    different sizes
  • The waiter wants to sort (rearrange) them,
    smallest on top, largest at bottom
  • He does it by flipping over several from the top,
    repeating this as many times as necessary
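The waiter's procedure can be sketched as a sort that uses only prefix reversals. The strategy below (bring the largest unsorted pancake to the top, then flip it down into place) is one simple approach and is not claimed to use the fewest flips; the function name and the convention that index 0 is the top of the stack are assumptions.

```python
def pancake_sort(stack):
    """Sort by prefix flips; stack[0] is the top. Returns (sorted_stack, flip_sizes)."""
    stack = list(stack)
    flips = []
    for size in range(len(stack), 1, -1):
        biggest = stack.index(max(stack[:size]))   # largest pancake not yet placed
        if biggest == size - 1:
            continue                               # already in its final position
        if biggest != 0:
            stack[:biggest + 1] = stack[biggest::-1]   # flip it to the top
            flips.append(biggest + 1)
        stack[:size] = stack[size - 1::-1]             # flip it down to position size
        flips.append(size)
    return stack, flips


print(pancake_sort([3, 1, 4, 2]))   # ([1, 2, 3, 4], [3, 4, 2, 3])
```

Each pancake needs at most two flips and the last one falls into place by itself, so this strategy uses at most 2(n - 1) flips for n pancakes.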

35
Sorting By Reversals: A Greedy Algorithm
  • Unsigned permutations
  • Example: permutation p = 1 2 3 6 4 5
  • First three elements are already in order
  • prefix(p) = length of the already sorted prefix
  • prefix(p) = 3
  • Idea: increase prefix(p) at every step

36
Greedy Algorithm: An Example
  • Doing so, p can be sorted:
  • 1 2 3 6 4 5
  • 1 2 3 4 6 5
  • 1 2 3 4 5 6
  • Number of steps to sort a permutation of length n
    is at most (n - 1)

37
Greedy Algorithm: Pseudocode
  • SimpleReversalSort(p)
  • 1 for i ← 1 to n - 1
  • 2   j ← position of element i in p (i.e., p_j = i)
  • 3   if j ≠ i
  • 4     p ← p · r(i, j)
  • 5     output p
  • 6   if p is the identity permutation
  • 7     return
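A runnable rendering of the pseudocode, as a sketch: it returns the list of intermediate permutations instead of printing them, and the function name and return type are choices made here rather than part of the lecture.

```python
def simple_reversal_sort(p):
    """Greedy sort that extends the sorted prefix by one element per reversal.

    Returns the permutation after each reversal that was applied.
    """
    p = list(p)
    n = len(p)
    identity = list(range(1, n + 1))
    steps = []
    for i in range(1, n):                 # i = 1 .. n-1
        j = p.index(i) + 1                # 1-based position of element i
        if j != i:
            p[i - 1:j] = p[i - 1:j][::-1]  # apply r(i, j)
            steps.append(list(p))
        if p == identity:
            break
    return steps


for perm in simple_reversal_sort([1, 2, 3, 6, 4, 5]):
    print(perm)
# [1, 2, 3, 4, 6, 5]
# [1, 2, 3, 4, 5, 6]
```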

38
Analyzing SimpleReversalSort
  • SimpleReversalSort does not guarantee the
    smallest number of reversals: it takes five steps
    on p = 6 1 2 3 4 5
  • Step 1: 1 6 2 3 4 5
  • Step 2: 1 2 6 3 4 5
  • Step 3: 1 2 3 6 4 5
  • Step 4: 1 2 3 4 6 5
  • Step 5: 1 2 3 4 5 6

39
Analyzing SimpleReversalSort (contd)
  • But it can be sorted in two steps:
  • p = 6 1 2 3 4 5
  • Step 1: 5 4 3 2 1 6
  • Step 2: 1 2 3 4 5 6
  • So, SimpleReversalSort(p) is not optimal
  • Optimal algorithms are unknown for many problems;
    approximation algorithms are used instead
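For very short permutations the exact reversal distance can be checked by brute force. The sketch below uses a breadth-first search over all permutations reachable by reversals; it is only practical for small n (the search space has n! states) and is not an algorithm from the lecture.

```python
from collections import deque


def reversal_distance(p):
    """Exact unsigned reversal distance of p to the identity, by breadth-first search."""
    start = tuple(p)
    target = tuple(sorted(p))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        for i in range(len(cur)):
            for j in range(i + 1, len(cur)):
                nxt = cur[:i] + cur[i:j + 1][::-1] + cur[j + 1:]
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    return None


print(reversal_distance([6, 1, 2, 3, 4, 5]))   # 2
```

The result confirms that two reversals are both sufficient and necessary for 6 1 2 3 4 5, compared with the five used by the greedy method.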

40
Approximation Algorithms
  • These algorithms find approximate solutions
    rather than optimal solutions
  • The approximation ratio of an algorithm A on
    input p is
  • A(p) / OPT(p)
  • where
  • A(p): solution produced by algorithm A
  • OPT(p): optimal solution of the problem

41
Approximation Ratio / Performance Guarantee
  • Approximation ratio (performance guarantee) of
    algorithm A: the worst approximation ratio over all
    inputs of size n
  • For an algorithm A that minimizes the objective function
    (minimization algorithm):
  • max_{|p| = n} A(p) / OPT(p)
  • For a maximization algorithm:
  • min_{|p| = n} A(p) / OPT(p)

42
Adjacencies and Breakpoints
  • p = p_1 p_2 p_3 ... p_{n-1} p_n
  • A pair of elements p_i and p_{i+1} are adjacent
    if
  • p_{i+1} = p_i ± 1
  • For example:
  • p = 1 9 3 4 7 8 2 6 5
  • (3, 4), (7, 8) and (6, 5) are adjacent pairs

43
Breakpoints: An Example
  • There is a breakpoint between any two neighboring
    elements that are not consecutive
  • p = 1 9 3 4 7 8 2 6 5
  • Pairs (1,9), (9,3), (4,7), (8,2) and (2,6) form
    breakpoints of permutation p
  • b(p): number of breakpoints in permutation p

44
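Adjacency & Breakpoints
  • An adjacency: a pair of neighboring elements that are consecutive
  • A breakpoint: a pair of neighboring elements that are not consecutive

Extend p with p_0 = 0 and p_{n+1} = n + 1
p = 5 6 2 1 3 4
0 5 6 2 1 3 4 7
adjacencies: (5, 6), (2, 1), (3, 4)
breakpoints: (0, 5), (6, 2), (1, 3), (4, 7)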
45
Extending Permutations
  • Add p_0 = 0 and p_{n+1} = n + 1 at the ends of p
  • Example:

p = 1 9 3 4 7 8 2 6 5
Extending with 0 and 10:
p = 0 1 9 3 4 7 8 2 6 5 10
Note: A new breakpoint was created after extending
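Counting breakpoints is a single pass over neighboring pairs. The helper names below (`extend`, `breakpoints`) and the `extended` flag are illustrative choices, not part of the lecture.

```python
def extend(p):
    """Add 0 in front of p and n+1 behind it."""
    return [0] + list(p) + [len(p) + 1]


def breakpoints(p, extended=True):
    """List the neighboring pairs whose values are not consecutive."""
    seq = extend(p) if extended else list(p)
    return [(a, b) for a, b in zip(seq, seq[1:]) if abs(a - b) != 1]


p = [1, 9, 3, 4, 7, 8, 2, 6, 5]
print(breakpoints(p, extended=False))  # [(1, 9), (9, 3), (4, 7), (8, 2), (2, 6)]
print(breakpoints(p))                  # the same pairs plus (5, 10)
```

Extending p leaves (0, 1) as an adjacency but adds the breakpoint (5, 10), which is the new breakpoint noted on the slide.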
46
Reversal Distance and Breakpoints
  • Each reversal eliminates at most 2 breakpoints.
  • p = 2 3 1 4 6 5
  • 0 2 3 1 4 6 5 7   b(p) = 5
  • 0 1 3 2 4 6 5 7   b(p) = 4
  • 0 1 2 3 4 6 5 7   b(p) = 2
  • 0 1 2 3 4 5 6 7   b(p) = 0

This implies: reversal distance ≥ (number of breakpoints) / 2
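The bound can be spelled out as a short derivation. It uses only the claim that one reversal removes at most two breakpoints and the fact that the extended identity permutation has no breakpoints; here t denotes the length of any sequence of reversals that sorts p.

```latex
\begin{align*}
b(p \cdot r) &\ge b(p) - 2 && \text{for every reversal } r \\
0 = b(p \cdot r_1 \cdots r_t) &\ge b(p) - 2t && \text{after } t \text{ reversals that sort } p \\
d(p) = \min t &\ge \frac{b(p)}{2}
\end{align*}
```

For p = 2 3 1 4 6 5, b(p) = 5 gives d(p) ≥ 3 after rounding up, so the three reversals shown are optimal for this permutation.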
47
Sorting By Reversals: A Better Greedy Algorithm
  • BreakPointReversalSort(p)
  • 1 while b(p) > 0
  • 2   Among all possible reversals, choose
        reversal r minimizing b(p · r)
  • 3   p ← p · r(i, j)
  • 4   output p
  • 5 return

Problem: this algorithm may run forever
48
Strips
  • Strip: an interval between two consecutive
    breakpoints in a permutation
  • Decreasing strip: elements in decreasing order
    (e.g. 6 5)
  • Increasing strip: elements in increasing order
    (e.g. 7 8)
  • 0 1 9 4 3 7 8 2 5 6 10
  • Single-element strips are considered decreasing, except
    that the strips containing 0 and n+1 are always increasing
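The strip decomposition can be sketched as follows, using the convention stated above for single-element strips. The function name `strips` and the returned (elements, direction) pairs are assumptions made for illustration.

```python
def strips(p):
    """Split the extended permutation into strips and classify each one."""
    n = len(p)
    ext = [0] + list(p) + [n + 1]
    runs, current = [], [ext[0]]
    for a, b in zip(ext, ext[1:]):
        if abs(a - b) == 1:
            current.append(b)      # still inside the same strip
        else:
            runs.append(current)   # breakpoint: close the current strip
            current = [b]
    runs.append(current)

    result = []
    for run in runs:
        if len(run) == 1:
            # singletons are decreasing, unless they are the end markers 0 or n+1
            increasing = run[0] in (0, n + 1)
        else:
            increasing = run[1] > run[0]
        result.append((run, "increasing" if increasing else "decreasing"))
    return result


for run, kind in strips([1, 9, 4, 3, 7, 8, 2, 5, 6]):
    print(run, kind)
```

For the example 0 1 9 4 3 7 8 2 5 6 10 this reports the strips 0 1, 7 8, 5 6 and 10 as increasing and 9, 4 3 and 2 as decreasing.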

49
Reducing the Number of Breakpoints
  • Theorem 1
  • If permutation p contains at least one
    decreasing strip, then there exists a reversal r
    which decreases the number of breakpoints (i.e.
    b(p · r) < b(p))

53
Things To Consider (contd)
  • For p = 1 4 6 5 7 8 3 2
  • 0 1 4 6 5 7 8 3 2 9   b(p) = 5
  • Choose the decreasing strip with the smallest element
    k in p (k = 2 in this case)
  • Find k - 1 in the permutation
  • Reverse the segment between k - 1 and k:
    0 1 2 3 8 7 5 6 4 9   b(p) = 4
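The construction on this slide can be sketched in code: collect the elements of all decreasing strips, take the smallest one k, and reverse the segment that brings k next to k - 1. The function name `theorem1_reversal` is hypothetical, and the input is assumed to be an extended permutation that has at least one decreasing strip.

```python
def theorem1_reversal(ext):
    """Apply the breakpoint-reducing reversal from Theorem 1 to an extended permutation."""
    last = len(ext) - 1
    decreasing_elements = []
    start = 0
    for k in range(1, len(ext) + 1):
        # a strip ends at the end of the list or at a breakpoint
        if k == len(ext) or abs(ext[k] - ext[k - 1]) != 1:
            run = ext[start:k]
            if len(run) == 1:
                is_dec = 0 < start < last      # singletons, except the end markers
            else:
                is_dec = run[1] < run[0]
            if is_dec:
                decreasing_elements.extend(run)
            start = k
    k_val = min(decreasing_elements)
    pos_k, pos_prev = ext.index(k_val), ext.index(k_val - 1)
    if pos_prev < pos_k:
        lo, hi = pos_prev + 1, pos_k           # k - 1 lies to the left of k
    else:
        lo, hi = pos_k + 1, pos_prev           # k - 1 lies to the right of k
    return ext[:lo] + ext[lo:hi + 1][::-1] + ext[hi + 1:]


print(theorem1_reversal([0, 1, 4, 6, 5, 7, 8, 3, 2, 9]))
# [0, 1, 2, 3, 8, 7, 5, 6, 4, 9]
```

In both cases the reversal places k directly beside k - 1, which turns one breakpoint into an adjacency.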

54
Reducing the Number of Breakpoints Again
  • If there is no decreasing strip, there may be no
    reversal r that reduces the number of
    breakpoints (i.e. b(p · r) ≥ b(p) for any
    reversal r).
  • By reversing an increasing strip (the number of
    breakpoints stays unchanged), we create a
    decreasing strip for the next step. Then the
    number of breakpoints will be reduced in the next
    step (Theorem 1).

55
Things To Consider (contd)
  • There are no decreasing strips in p, for
  • p = 0 1 2 5 6 7 3 4 8   b(p) = 3
  • p · r(6,7) = 0 1 2 5 6 7 4 3 8   b(p) = 3
  • r(6,7) does not change the number of breakpoints
  • r(6,7) creates a decreasing strip, thus
    guaranteeing that the next step will decrease the
    number of breakpoints.

56
ImprovedBreakpointReversalSort
  • ImprovedBreakpointReversalSort(p)
  • 1 while b(p) > 0
  • 2   if p has a decreasing strip
  • 3     Among all possible reversals, choose the reversal
          r that minimizes b(p · r)
  • 4   else
  • 5     Choose a reversal r that flips an
          increasing strip in p
  • 6   p ← p · r
  • 7   output p
  • 8 return
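A runnable sketch of the pseudocode follows. Several details are choices made here rather than taken from the lecture: the search in step 3 tries every reversal of interior positions, ties are broken by taking the first minimum, and step 5 flips the first increasing strip that does not contain 0 or n+1.

```python
def count_breakpoints(ext):
    """Number of neighboring pairs in ext whose values are not consecutive."""
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def strip_bounds(ext):
    """(start, end) index pairs of the strips of an extended permutation."""
    bounds, start = [], 0
    for k in range(1, len(ext)):
        if abs(ext[k] - ext[k - 1]) != 1:
            bounds.append((start, k - 1))
            start = k
    bounds.append((start, len(ext) - 1))
    return bounds


def is_decreasing(ext, start, end):
    if start == end:
        # single elements count as decreasing, except the end markers 0 and n+1
        return 0 < start < len(ext) - 1
    return ext[start + 1] < ext[start]


def flip(ext, i, j):
    """Reverse ext[i..j] (0-based indices into the extended permutation)."""
    return ext[:i] + ext[i:j + 1][::-1] + ext[j + 1:]


def improved_breakpoint_reversal_sort(p):
    """Return the permutation after each reversal until p is sorted."""
    ext = [0] + list(p) + [len(p) + 1]
    last = len(ext) - 1
    history = []
    while count_breakpoints(ext) > 0:
        bounds = strip_bounds(ext)
        if any(is_decreasing(ext, s, e) for s, e in bounds):
            candidates = [flip(ext, i, j)
                          for i in range(1, last)
                          for j in range(i + 1, last)]
            ext = min(candidates, key=count_breakpoints)
        else:
            s, e = next((s, e) for s, e in bounds if s > 0 and e < last)
            ext = flip(ext, s, e)
        history.append(ext[1:-1])
    return history


for perm in improved_breakpoint_reversal_sort([1, 4, 6, 5, 7, 8, 3, 2]):
    print(perm)
```

The exhaustive search in step 3 examines O(n^2) reversals per iteration, which is fine for illustration but not tuned for long permutations.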

57
Performance Guarantee
  • ImprovedBreakPointReversalSort is an
    approximation algorithm with a performance
    guarantee of at most 4
  • It eliminates at least one breakpoint in every
    two steps, so it takes at most 2b(p) steps
  • Approximation ratio: 2b(p) / d(p)
  • An optimal algorithm eliminates at most 2
    breakpoints in every step: d(p) ≥ b(p) / 2
  • Performance guarantee:
  • $\frac{2b(p)}{d(p)} \le \frac{2b(p)}{b(p)/2} = 4$

58
Signed Permutations
  • Up to this point, reversal sort algorithms sorted
    unsigned permutations
  • But genes have directions so we should consider
    signed permutations

p = 1 -2 -3 4 -5
59
Signed Permutation
  • Genes are directed fragments of DNA
  • Genes in the same position but different
    orientations do not have same gene order
  • These two permutations are not equivalent gene
    sequences

1 2 3 4 5
-1 2 -3 -4 -5
60
Signed permutations are easier!
  • Polynomial time (optimal) algorithm is known

61
Genome rearrangements
Mouse (X chrom.)
Unknown ancestor 75 million years ago
Human (X chrom.)
  • What are the similarity blocks and how to find
    them?
  • What is the architecture of the ancestral genome?
  • What is the evolutionary scenario for
    transforming one genome into the other?

63
Comparative maps
64
A brief history
  • Chromosome comparisons
  • no information about genes
  • 1920s: Sturtevant, Weinstein
  • Today: many organisms, many uses
  • Humans
  • primates, mouse, cat, dog, zebrafish, ...
  • Alzheimer, cancers, diabetes, obesity, ...

65
Why construct comparative maps?
  • Identify and isolate genes
  • Crops: drought resistance, yield, nutrition, ...
  • Human: disease genes, drug response, ...
  • Infer ancestral relationships
  • Discover principles of evolution
  • Chromosome
  • Gene family
  • key to understanding the human genome

66
Map construction
Go from this
to this
Maize 1 (target), Rice (base) Wilson et al.
Genetics 1999
67
Why automate?
  • Time consuming, laborious
  • Needs to be redone frequently
  • Codify a common set of principles
  • Nadeau and Sankoff warn of arbitrary nature of
    comparative map construction

68
Input/Output
  • Input
  • genetic maps of 2 species
  • marker/gene correspondences (homologs)
  • Output
  • a comparative map
  • homeologies identified

69
A natural model?
Maize 1 (target), Rice (base) Wilson et al.
Genetics 1999
70
Scoring
10L
3L
71
Assumptions
  • Accept published marker order
  • All linkage groups of base are unique
  • Simplistic homeology criteria
  • At least one homeologous region

72
A natural model?
73
Dynamic programming
  • l_i: location of the homolog of marker i
  • S[i,a]: penalty (score) for an optimal labeling
    of the submap from marker i to the end, when
    labeling begins with label a

a 1 ... i ... n
74
Recurrence relation
  • $S[n,a] = m\,\delta(a, l_n)$
  • $S[i,a] = m\,\delta(a, l_i) + \min_{b \in L}\bigl(S[i+1,b] + s\,\delta(a,b)\bigr)$
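The linear-model recurrence can be sketched as a right-to-left dynamic program. Everything concrete below is an assumption made for illustration: the distance term is treated as a 0/1 mismatch indicator, each marker has a single homolog location, the default penalties are m = 1 and s = 2, and the marker locations in the example are invented.

```python
def linear_labeling(locations, labels, m=1, s=2):
    """Return (score, labeling) minimizing m * mismatches + s * label changes."""
    n = len(locations)

    def delta(x, y):
        return 0 if x == y else 1

    # S[i][a]: best score for markers i..n-1 when marker i is labeled a
    S = [dict() for _ in range(n)]
    choice = [dict() for _ in range(n)]
    for a in labels:
        S[n - 1][a] = m * delta(a, locations[n - 1])
    for i in range(n - 2, -1, -1):
        for a in labels:
            b = min(labels, key=lambda b: S[i + 1][b] + s * delta(a, b))
            choice[i][a] = b
            S[i][a] = m * delta(a, locations[i]) + S[i + 1][b] + s * delta(a, b)

    # pick the best starting label, then follow the stored choices
    a = min(labels, key=lambda a: S[0][a])
    score, labeling = S[0][a], [a]
    for i in range(n - 1):
        a = choice[i][a]
        labeling.append(a)
    return score, labeling


locations = ["7L", "2L", "7L", "7L", "9L", "9L", "9L"]   # hypothetical homolog locations
print(linear_labeling(locations, ["2L", "7L", "9L"]))
# (3, ['7L', '7L', '7L', '7L', '9L', '9L', '9L'])
```

In this example the isolated 2L marker is absorbed as a single mismatch (cost m) because opening and closing a separate segment for it would cost 2s.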
75
Problem with linear model
  • s 2

76
The stack model
[Figure: a stack of segment labels; labels shown: d, c, c, f, e, b, b, b, a]
  • Segment at top of the stack can be
  • pushed (remembered), later popped
  • replaced
  • Push and replace cost s -- pop is free.

77
Scoring
uaz265a (7L), isu136 (2L), isu151 (7L), rz509b (7L), cdo59c (7L),
rz698c (9L), bcd1087a (9L), rz206b (9L), bcd1088c (9L),
csu40 (3S), cdo786a (9L), csu154 (7L), isu113a (7L), csu17 (7L),
cdo337 (3L), rz530a (7L)
Labels shown: 7L, 9L, 7L
78
Dynamic programming
  • S[i,j,a]: score for an optimal labeling of the
    submap from marker i to marker j
    when labeling begins with label a, i.e.,
    marker i is labeled a

a 1 ... i ... j ... n
79
Recurrence relation
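  • $S[i,i,a] = m\,\delta(a, l_i)$
  • $S[i,j,a] = \min\Bigl\{\, m\,\delta(a, l_i) + \min_{b \in L}\bigl(S[i+1,j,b] + s\,\delta(a,b)\bigr),\ \min_{i<k<j}\bigl(S[i,k,a] + S[k+1,j,a]\bigr) \Bigr\}$

[Figure: markers 1 ... i ... k+1 ... j ... n; label a marked twice]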
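The stack-model recurrence can be sketched with memoized recursion over (i, j, a). As before, the concrete choices are assumptions for illustration: the distance term is a 0/1 mismatch indicator, each marker has a single homolog location, the default penalties are m = 1 and s = 2, and the example data are invented.

```python
from functools import lru_cache


def stack_score(locations, labels, m=1, s=2):
    """Best stack-model score for labeling all markers (0-based indices)."""
    n = len(locations)

    def delta(x, y):
        return 0 if x == y else 1

    @lru_cache(maxsize=None)
    def S(i, j, a):
        base = m * delta(a, locations[i])
        if i == j:
            return base
        # first case: marker i+1 starts with label b, paying s if b differs from a
        extend = base + min(S(i + 1, j, b) + s * delta(a, b) for b in labels)
        # second case: the labeling returns to a at marker k+1 at no extra cost
        split = min((S(i, k, a) + S(k + 1, j, a) for k in range(i + 1, j)),
                    default=float("inf"))
        return min(extend, split)

    return min(S(0, n - 1, a) for a in labels)


locations = ["7L", "7L", "9L", "9L", "9L", "7L", "7L"]   # hypothetical homolog locations
print(stack_score(locations, ["7L", "9L"]))   # 2
```

With these penalties the inserted 9L run costs a single s, since returning to 7L is a free pop. A model that charges s for every label change would pay 3 on the same input, by labeling everything 7L and accepting three mismatches.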
80
Advantage: output similar to experts
  • Maize 6 (target),
  • Rice (base)

81
Advantage: proposes testable hypotheses
  • New relations predicted; greater-resolution maps
    confirm

Maize 7 (target), Rice (base)
Ahn-Tanksley 93; Ahn-Tanksley data; Wilson et al. 99
82
Advantage: infers evolutionary events
Wilson et al.
Maize 1 (target) Rice (base)
83
Problem: Incomplete input
  • Gene order not always fully resolved.
  • Co-located genes can be ordered to give most
    parsimonious labeling.

[Figure: ten markers, all listed at position 33.0]
84
The reordering algorithm
  • Uses a compression scheme
  • Within a megalocus, group genes by location of
    related gene.
  • Order these groups
  • First, last groups interact with nearby genes
  • Any ordering of internal groups is equally
    parsimonious

85
The reordering algorithm
86
The reordering algorithm
87
Definitions
  • δ extended to the distance to a set A of labels:
  • δ(a, A) = 0 if a ∈ A,
  • 1 otherwise
  • l_i: set of labels matching markers in
    megalocus i
  • S: set of megalocus start nodes
88
Definitions
  • p(i,a,b) gives the total mismatched-marker and
    segment-boundary penalties attributed to hidden
    markers
  • i is the index of the megalocus
  • a and b are the labels for the megalocus ends
  • Do any markers in the megalocus match a, b?
  • No: don't penalize in recurrence and p(i,a,b)

89
Recurrence relation
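  • $S[i,i,a] = m\,\delta(a, l_i)$
  • $S[i,j,a] = \min\Bigl\{\, m\,\delta(a, l_i) + \min_{b \in L}\bigl(S[i+1,j,b] + s\,\delta(a,b) + p(i,a,b)\bigr),\ \min_{i<k<j,\ k \in S}\bigl(S[i,k,a] + S[k+1,j,a]\bigr) \Bigr\}$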
90
Results: Fewer mismatches
  • stack reordering

Mouse 5 (target) Human (base)
91
Results: Mismatches placed between segments
  • stack reordering

Mouse 8 (target) Human (base)
92
Results: Detects new segments
  • stack reordering

Mouse 13 (target) Human (base)
93
Summary
  • Global view
  • Finds optimal comparative map
  • Arranges markers in most parsimonious way
  • Biologically meaningful results
  • Robust
  • not species-specific
  • high/low resolution, genetic/physical maps
  • stable to errors in marker order