Title: Alignment of Whole Genomes: Algorithms
1 Alignment of Whole Genomes: Algorithms & Tools
- Michael Brudno
- Department of Computer Science
- University of Toronto
- CBW 02/15/06
2 The Human Genome
ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCC
ACGGCCACCGCTGCCCTGCCCCTGGAGGGTGGCCCCACCGGCCGAGACAG
CGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCTCCTGACTTT
CCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCA
TAGGAGAGGAAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCC
CAGCAATCCGCGCGCCGGGACAGAATGCCCTGCAGGAACTTCTTCTGGAA
GACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGT
TTAATTACAGACCTGAACTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGG
ACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGT
GGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGACA
GAATGCCTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAA
ACCTCACCCATGGGAATGCTCACGCATTTAATTACAGACCTGAAAGGAGA
GGAAGCTCGGGAGGTGG
3 Basic Biology
- DNA (4 residues, double-stranded)
- RNA (4 residues, single-stranded)
- Protein (20 amino acids)
- Amino-acid code: a triplet of RNA bases (a codon) codes for 1 amino acid
[Figure: gene structure with E, P, exon and UTR labels]
4 The Human Genome
5 Complete DNA Sequences
Nearly 200 complete genomes have been sequenced
6 Complete DNA Sequences
[Figure: several genome sequences stacked side by side; each is a near-identical copy of the human sequence with small edits such as inserted segments and substitutions]
7 Evolution
8 Conservation Implies Function
[Figure legend: Gene, Exon, CNS (conserved non-coding sequence), Other Conserved]
Dubchak, Brudno et al 2000
9 Whole Genome Similarity Map
10 Map of Rearrangements
11 Alignment of a Syntenic Area
12
seq1 84790-84840  -AGGACTTCTTCCTTTCACCATATACAAAAATCAACCCAAGATTGATACAATACTTTAAT
seq2 13770-13820  CAGGATCGACGTCTCTCACCTTATACAAAAATCAACTGAAGATGCATCACAGACTTTAA-

seq1 84850-84900  GTAAAATAAAAAACTGTAAAACTCTAGAAGAAAATGTAGAAACA-----CCATCTGGACA
seq2 13830-13880  ---AAACTGAAACCATAACAATTCTAGAAGTAAACATTGAAAAAAAAACCCTTCTAGACA

seq1 84910-84960  TCAGCCTGGGCAGAGAATTTATGACTAAGTCCTCAAAAGTAATTTCAACAAAAATAAACA
seq2 13890-13920  TTTGCTTAGGCAAAGACTTCATGACAAAGAATGCAAAAGCAG------------------
Local Area of Similarity
13
seq1  AAATAAAAAACTGTAAAACTCTAGAAGAAAATGTAGAAACA-----CCATCTGGACA
seq2  AAACTGAAACCATAACAATTCTAGAAGTAAACATTGAAAAAAAAACCCTTCTAGACA
How Similar are these Sequences?
14 Edit Distance Model
- Minimal weighted sum of insertions, deletions & mutations required to transform one string into another:
    AGGCACA--CA          AGGCACACA
    A--CACATTCA    or    ACACATTCA
Levenshtein 1966
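A minimal Python sketch of the edit distance model as a dynamic program. It assumes unit costs by default (Levenshtein's setting); the optional `ins`, `dele` and `sub` parameters are hypothetical names standing in for the "weighted" costs on the slide.

```python
def edit_distance(a: str, b: str, ins: int = 1, dele: int = 1, sub: int = 1) -> int:
    """Minimal weighted number of insertions, deletions and mutations turning a into b."""
    n, m = len(a), len(b)
    # D[i][j] = cheapest way to transform a[:i] into b[:j]
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * dele          # delete every character of a[:i]
    for j in range(1, m + 1):
        D[0][j] = j * ins           # insert every character of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mutation = 0 if a[i - 1] == b[j - 1] else sub
            D[i][j] = min(D[i - 1][j - 1] + mutation,  # match or mutation
                          D[i - 1][j] + dele,          # deletion from a
                          D[i][j - 1] + ins)           # insertion into a
    return D[n][m]
```

Each cell depends only on its three neighbours, so the table is filled in O(nm) time.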
15 Edit Distance Model
- Let's figure out how to compute the sequence of events that gives the highest score
- http://meetings.cshl.edu/tgac/tgac/flash/DynamicProgramming.swf
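Finding the highest-scoring sequence of events can be sketched as a global (Needleman-Wunsch style) alignment with a traceback. The scores here (match +1, mismatch -1, gap -2) are illustrative assumptions, not values from the talk.

```python
def global_align(x: str, y: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (best score, aligned x, aligned y) using linear gap penalties."""
    def sub(a: str, b: str) -> int:
        return match if a == b else mismatch

    n, m = len(x), len(y)
    # F[i][j] = best score of aligning x[:i] with y[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(x[i - 1], y[j - 1]),  # match or mutation
                          F[i - 1][j] + gap,                         # gap in y
                          F[i][j - 1] + gap)                         # gap in x
    # Traceback: from (n, m), repeatedly step to a neighbour that explains F[i][j]
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))
```

For example, `global_align("ACGT", "AGT")` returns `(1, "ACGT", "A-GT")`: three matches and one gap.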
16
seq1 84790-84840  -AGGACTTCTTCCTTTCACCATATACAAAAATCAACCCAAGATTGATACAATACTTTAAT
seq2 13770-13820  CAGGATCGACGTCTCTCACCTTATACAAAAATCAACTGAAGATGCATCACAGACTTTAA-

seq1 84850-84900  GTAAAATAAAAAACTGTAAAACTCTAGAAGAAAATGTAGAAACA-----CCATCTGGACA
seq2 13830-13880  ---AAACTGAAACCATAACAATTCTAGAAGTAAACATTGAAAAAAAAACCCTTCTAGACA

seq1 84910-84960  TCAGCCTGGGCAGAGAATTTATGACTAAGTCCTCAAAAGTAATTTCAACAAAAATAAACA
seq2 13890-13920  TTTGCTTAGGCAAAGACTTCATGACAAAGAATGCAAAAGCAG------------------
Local Area of Similarity (Var-mers)
17 Local Alignment
- Modify the global recurrence so that $F(i,j) \gets \max(F(i,j), 0)$
- Return all paths with a position $(i,j)$ where $F(i,j) > C$
- Time $O(n^2)$ for two sequences, $O(n^k)$ for $k$ sequences
Smith & Waterman 1981
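A minimal Smith-Waterman style sketch of the slide's recipe: the global recurrence floored at 0, then a traceback from every cell with $F(i,j) > C$ until the score returns to 0. The scoring values and function names are illustrative assumptions.

```python
def _trace(F, x, y, i, j, match, mismatch, gap):
    """Walk back from (i, j) until the local score drops to 0."""
    ax, ay = [], []
    while i > 0 and j > 0 and F[i][j] > 0:
        s = match if x[i - 1] == y[j - 1] else mismatch
        if F[i][j] == F[i - 1][j - 1] + s:
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif F[i][j] == F[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return "".join(reversed(ax)), "".join(reversed(ay))

def local_align(x: str, y: str, C: float, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Return (score, aligned x, aligned y) for every cell whose local score exceeds C."""
    n, m = len(x), len(y)
    # Row and column 0 stay 0, so a local alignment may start anywhere
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            # Global recurrence, then F(i, j) <- max(F(i, j), 0)
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap, 0)
    hits = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if F[i][j] > C:
                hits.append((F[i][j],) + _trace(F, x, y, i, j, match, mismatch, gap))
    return hits
```

Because every cell above C is reported, a long strong alignment yields several overlapping hits; a practical tool would keep only non-overlapping maxima.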
18 Heuristic Local Alignment
- BLAST (Altschul et al 1990)
- FASTA (Pearson 1987)
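BLAST and FASTA both avoid the full quadratic table by first finding short exact matches and extending only around them. The sketch below illustrates that seed-and-extend idea under stated assumptions: exact k-mer seeds, ungapped extension in both directions, and an X-drop stopping rule. The values of `k`, `xdrop`, `min_score` and all function names are hypothetical; this is not either tool's actual algorithm.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

def kmer_seeds(x: str, y: str, k: int) -> List[Tuple[int, int]]:
    """All (i, j) with x[i:i+k] == y[j:j+k], found through a k-mer index of x."""
    index = defaultdict(list)
    for i in range(len(x) - k + 1):
        index[x[i:i + k]].append(i)
    return [(i, j) for j in range(len(y) - k + 1) for i in index.get(y[j:j + k], [])]

def _xdrop(pairs: Iterable[Tuple[str, str]], match: int, mismatch: int, xdrop: int) -> Tuple[int, int]:
    """Ungapped extension; stop once the score falls more than xdrop below its best."""
    best, best_len, s = 0, 0, 0
    for steps, (a, b) in enumerate(pairs, start=1):
        s += match if a == b else mismatch
        if s > best:
            best, best_len = s, steps
        elif best - s > xdrop:
            break
    return best, best_len

def extend_seed(x, y, i, j, k, match=1, mismatch=-1, xdrop=3):
    """Extend the seed x[i:i+k] == y[j:j+k] right and left; return score and half-open spans."""
    gain_r, len_r = _xdrop(zip(x[i + k:], y[j + k:]), match, mismatch, xdrop)
    gain_l, len_l = _xdrop(zip(reversed(x[:i]), reversed(y[:j])), match, mismatch, xdrop)
    score = k * match + gain_r + gain_l
    return score, (i - len_l, i + k + len_r), (j - len_l, j + k + len_r)

def seed_and_extend(x: str, y: str, k: int = 4, min_score: int = 6):
    """Distinct ungapped hits scoring at least min_score, best first."""
    hits = set()
    for i, j in kmer_seeds(x, y, k):
        hit = extend_seed(x, y, i, j, k)
        if hit[0] >= min_score:
            hits.add(hit)
    return sorted(hits, reverse=True)
```

Several seeds inside the same similar region extend to the same hit, which is why the hits are collected in a set.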
19 Alignment of a Syntenic Area (LAGAN)
20 Global Alignment
AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGA
AGTGACCTGGGAAGACCCTGACCCTGGGTCACAAAACTC
21 LAGAN 1. FIND Local Alignments
- Find Local Alignments
- Chain Local Alignments
- Restricted DP
Brudno, Do et al 2003
22 LAGAN 2. CHAIN Local Alignments
- Find Local Alignments
- Chain Local Alignments
- Restricted DP
Brudno, Do et al 2003
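A sketch of the chaining step, assuming each local alignment is reduced to a box `(x_start, x_end, y_start, y_end, score)` and that a chain must move strictly forward in both sequences. This is a plain O(n²) dynamic program chosen for clarity; faster sparse methods are used in practice, and the names here are hypothetical.

```python
from typing import List, Tuple

# Hypothetical anchor representation: (x_start, x_end, y_start, y_end, score)
Anchor = Tuple[int, int, int, int, float]

def chain_anchors(anchors: List[Anchor]) -> Tuple[float, List[Anchor]]:
    """Highest-scoring chain of anchors that increases in both sequences."""
    if not anchors:
        return 0.0, []
    a = sorted(anchors, key=lambda t: (t[0], t[2]))
    best = [t[4] for t in a]   # best chain score ending at anchor k
    prev = [-1] * len(a)       # predecessor of anchor k in that chain
    for k in range(len(a)):
        for p in range(k):
            # p may precede k only if it ends before k starts in both sequences
            if a[p][1] < a[k][0] and a[p][3] < a[k][2] and best[p] + a[k][4] > best[k]:
                best[k] = best[p] + a[k][4]
                prev[k] = p
    k = max(range(len(a)), key=best.__getitem__)
    total, chain = best[k], []
    while k != -1:
        chain.append(a[k])
        k = prev[k]
    return total, chain[::-1]
```

An anchor that conflicts with the main diagonal (for example one far off in the second sequence) is simply left out of the chain, even if it scores well on its own.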
23 LAGAN 3. Restricted DP
- Find Local Alignments
- Chain Local Alignments
- Restricted DP
Brudno, Do et al 2003
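The restricted DP fills only cells near the chained anchors. Below is a sketch under simplifying assumptions: anchors are ungapped diagonals `(x_start, x_end, y_start, y_end, score)`, the region is a band of half-width `w` along each anchor plus padded rectangles between consecutive anchors, and scores are illustrative. The region shape and all names are hypothetical, not LAGAN's exact construction.

```python
NEG = float("-inf")

def allowed_cells(chain, n, m, w):
    """Hypothetical DP region: padded rectangles between consecutive anchors
    plus a diagonal band of half-width w along each anchor."""
    ok = set()
    corners = [(0, 0)]
    for xs, xe, ys, ye, _ in chain:
        corners += [(xs, ys), (xe, ye)]
    corners.append((n, m))
    # Pair the origin or each anchor end with the next anchor start or the final cell
    for (x0, y0), (x1, y1) in zip(corners[0::2], corners[1::2]):
        for i in range(max(0, x0 - w), min(n, x1 + w) + 1):
            for j in range(max(0, y0 - w), min(m, y1 + w) + 1):
                ok.add((i, j))
    for xs, xe, ys, _, _ in chain:
        for i in range(xs, xe + 1):
            d = ys + (i - xs)  # the anchor's diagonal cell in row i
            for j in range(max(0, d - w), min(m, d + w) + 1):
                ok.add((i, j))
    return ok

def restricted_global_score(x, y, chain, w=2, match=1, mismatch=-1, gap=-2):
    """Global alignment score computed only inside the allowed region."""
    n, m = len(x), len(y)
    ok = allowed_cells(chain, n, m, w)
    F = [[NEG] * (m + 1) for _ in range(n + 1)]  # cells outside the region stay -inf
    F[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if (i, j) == (0, 0) or (i, j) not in ok:
                continue
            best = NEG
            if i > 0 and j > 0:
                best = F[i - 1][j - 1] + (match if x[i - 1] == y[j - 1] else mismatch)
            if i > 0:
                best = max(best, F[i - 1][j] + gap)
            if j > 0:
                best = max(best, F[i][j - 1] + gap)
            F[i][j] = best
    return F[n][m], len(ok)
```

The second return value counts the cells actually filled, which shows the saving relative to the full (n+1) x (m+1) table.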
24 MLAGAN 1. Progressive Alignment
Human
Baboon
Mouse
Rat
- Given N sequences, phylogenetic tree
- Align pairwise, in order of the tree (LAGAN)
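Aligning "in order of the tree" amounts to a post-order walk: align the two children of an internal node first, then align the resulting profiles at the parent. A small sketch, assuming the tree ((Human, Baboon), (Mouse, Rat)) written as nested tuples; the function only records the order of pairwise steps where a real pipeline would call a pairwise aligner such as LAGAN.

```python
from typing import List, Tuple, Union

# A tree is either a leaf (species name) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]

def progressive_order(tree: Tree, steps: List[str]) -> str:
    """Post-order walk: return the label of the subtree's profile and record
    each pairwise alignment step in `steps`."""
    if isinstance(tree, str):
        return tree
    left = progressive_order(tree[0], steps)
    right = progressive_order(tree[1], steps)
    steps.append(f"align {left} with {right}")
    return f"({left}/{right})"
```

With the four species above, the steps are Human with Baboon, Mouse with Rat, and finally the two profiles with each other; the `X/Y` labels mirror the notation used for multi-anchoring.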
25 MLAGAN 2. Multi-anchoring
To anchor the (X/Y) and (Z) alignments:
[Figure: panels X vs Z, Y vs Z, and X/Y vs Z]
26 Cystic Fibrosis (CFTR), 12 species
Chicken
Zebrafish
Cow
Pig
Chimp
Human
Dog
Rat
Fugufish
Cat
Baboon
Mouse
- Human sequence length 1.8 Mb
- Total genomic sequence 13 Mb
27 CFTR (cont'd)
Method   Species            Max memory (Mb)   Time (sec)   Exons aligned
LAGAN    Mammals            90                550          99.7
LAGAN    Chicken & Fishes   90                862          84
MLAGAN   Mammals            -                 -            99.8
MLAGAN   Chicken & Fishes   670               4547         91
28 Evolution Over a Chromosome
Cooper, Brudno et al 2004
29 How Many Genomes?
Cooper, Brudno et al 2003
30 Fish-Human Comparison (Woolfe et al 2004)
Highly conserved vertebrate non-coding elements
direct tissue-specific reporter gene expression
31 Phylo-VISTA Visualization
Shah, Brudno, et al 2004
32 Summary of LAGAN Alignment
- It is possible to build megabase-long multiple alignments for dozens of sequences
- The alignments are accurate at aligning major biological functional areas
- These alignments can be used to better understand evolution and regulation
33 Map of Rearrangements (Shuffle-LAGAN)
34 Evolution at the DNA level
SEQUENCE EDITS (deletion, mutation)
ACGGTGCAGTTACCA
AC----CAGTCACCA
REARRANGEMENTS
- Inversion
- Translocation
- Duplication
35 Local & Global Alignment
Global
Local
36 Glocal Alignment Problem
- Find the least-cost transformation of one sequence into another using new operations:
- Sequence edits
- Inversions
- Translocations
- Duplications
- Combinations of above
37 Shuffle-LAGAN
A glocal aligner for long DNA sequences
Brudno, Malde et al 2003
38 S-LAGAN: Find Local Alignments
- Find Local Alignments
- Build Rough Homology Map
- Globally Align Consistent Parts
39 S-LAGAN: Build Homology Map
- Find Local Alignments
- Build Rough Homology Map
- Globally Align Consistent Parts
40 Building the Homology Map
- Chain (using Eppstein-Galil): each alignment gets a score which is the MAX over 4 possible chains. Penalties are affine (event and distance components).
- Penalties:
a) regular
b) translocation
c) inversion
d) inverted translocation
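A hedged O(n²) sketch of the scoring idea: each local alignment keeps the best score over all predecessors and the four link types, minus an affine penalty (a fixed event cost per link type plus a per-base distance cost). The classification rules, penalty values and names are assumptions made for illustration; S-LAGAN itself uses the sparse Eppstein-Galil technique rather than this quadratic loop.

```python
from typing import List, NamedTuple

class Anchor(NamedTuple):
    x: int        # start in the base genome
    y: int        # start in the second genome
    length: int
    strand: int   # +1 forward, -1 reverse complement
    score: float

# Hypothetical affine penalties: event cost per link type + cost per base of distance
EVENT = {"regular": 0.0, "translocation": 30.0, "inversion": 30.0,
         "inverted translocation": 40.0}
PER_BASE = 0.01

def link_type(p: Anchor, k: Anchor) -> str:
    """Classify how anchor k continues a chain ending at anchor p."""
    if p.strand == 1:
        in_order = k.y >= p.y + p.length
    else:
        in_order = k.y + k.length <= p.y
    if p.strand == k.strand:
        return "regular" if in_order else "translocation"
    return "inversion" if in_order else "inverted translocation"

def chain_scores(anchors: List[Anchor]) -> List[float]:
    """Best chain score ending at each anchor (in base-genome order),
    taking the MAX over all predecessors and the 4 link types."""
    a = sorted(anchors)
    best = [t.score for t in a]
    for k in range(len(a)):
        for p in range(k):
            if a[p].x + a[p].length > a[k].x:
                continue  # chains only need to be monotonic in the base genome
            p_y_end = a[p].y + a[p].length if a[p].strand == 1 else a[p].y
            dist = (a[k].x - a[p].x - a[p].length) + abs(a[k].y - p_y_end)
            cand = best[p] + a[k].score - EVENT[link_type(a[p], a[k])] - PER_BASE * dist
            best[k] = max(best[k], cand)
    return best
```

Requiring order only in the base genome is what lets the second genome jump around (translocations) or flip strand (inversions) at a price.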
41 S-LAGAN: Build Homology Map
- Find Local Alignments
- Build Rough Homology Map
- Globally Align Consistent Parts
42 S-LAGAN: Global Alignment
- Find Local Alignments
- Build Rough Homology Map
- Globally Align Consistent Parts
43 S-LAGAN Results (CFTR)
Local
Glocal
44 S-LAGAN Results (CFTR)
Hum/Mus
Hum/Rat
45 S-LAGAN results (IGF cluster)
46 S-LAGAN results (HOX)
- 12 paralogous genes
- Conserved order in mammals
47 S-LAGAN results (HOX)
- 12 paralogous genes
- Conserved order in mammals
48 Rearrangements in Human v. Mouse
- Some conclusions:
- Rearrangements come in all sizes
- Duplications are less conserved than other rearranged regions
- Half of the exons not alignable by LAGAN are aligned by S-LAGAN
49 Whole Genome Similarity Map
50 Handling Chromosomes & Symmetry
- Problems
- S-LAGAN is meant to run on two sequences
- S-LAGAN is not symmetric (it has a base genome)
- Solutions
- Switch penalty
- Super-monotonic maps
Sundararajan, Brudno et al 2004; Brudno, Kislyuk et al, unpublished
51 Handling Chromosomes: Switch Penalty
[Figure: base chromosome against Chr 1, Chr 2, Chr 3 and Chr 4, with a switch penalty]
52 Supermap Algorithm
[Figure legend: duplication, inversion, translocation]
- Build 1-monotonic maps with both base genomes (cyan & pink)
53 Supermap Algorithm
[Figure legend: duplication, inversion, translocation]
- Build 1-monotonic maps with both base genomes (cyan & pink)
54 Supermap Algorithm
[Figure legend: duplication, inversion, translocation]
- Build 1-monotonic maps with both base genomes (cyan & pink)
- Whenever the maps agree, join them (blue)
55 Supermap Algorithm
[Figure legend: duplication, inversion, translocation]
- Build 1-monotonic maps with both base genomes (cyan & pink)
- Whenever the maps agree, join them (blue)
- Syntenic areas are those with a degree of 1
56 Human/Mouse Rearrangement Map
57 Human Genome Alignment Results
- Compared with the previous tandem local/global approach:
- 2-fold speedup
- Sensitivity of exon alignment unchanged in human/mouse, improved in human/chicken
- 9-fold reduction in the number of mapped syntenic segments in human/mouse, and a 2-fold reduction in human/chicken
58 VISTA Genome Browser
http://pipeline.lbl.gov
Brudno, Poliakov et al 2004
59 Recap
60
seq1 84790-84840  -AGGACTTCTTCCTTTCACCATATACAAAAATCAACCCAAGATTGATACAATACTTTAAT
seq2 13770-13820  CAGGATCGACGTCTCTCACCTTATACAAAAATCAACTGAAGATGCATCACAGACTTTAA-

seq1 84850-84900  GTAAAATAAAAAACTGTAAAACTCTAGAAGAAAATGTAGAAACA-----CCATCTGGACA
seq2 13830-13880  ---AAACTGAAACCATAACAATTCTAGAAGTAAACATTGAAAAAAAAACCCTTCTAGACA

seq1 84910-84960  TCAGCCTGGGCAGAGAATTTATGACTAAGTCCTCAAAAGTAATTTCAACAAAAATAAACA
seq2 13890-13920  TTTGCTTAGGCAAAGACTTCATGACAAAGAATGCAAAAGCAG------------------
Local alignment (var-mers)
61 Global Alignment (LAGAN)
62 Rearrangements (S-LAGAN)
63 Whole Genome Similarity Map
64 Is Sequence Alignment Solved?
65 Progressive Alignment
- We want an alignment that reflects all species equally.
- In the phylogenetic tree:
- Leaves are real genomes
- Internal nodes are ancestors
- At every step we do an alignment corresponding to some internal node, in effect building an estimate of the ancestor's genome.
Human
Baboon
Mouse
Rat
66 Which Organism Should be the Base?
[Figure legend: duplication, inversion, translocation]
- After Supermap we have 1 or 2 alternatives at the end of each syntenic area
- Whenever there are 2 alternatives, at most one of the two corresponds to the ancestor
67 What is similar to the Ancestor?
68 What is similar to the Ancestor?
- Ask the Outgroup on the phylogenetic tree!
69 What is similar to the Ancestor?
- Ask the Outgroup on the phylogenetic tree!
[Figure: outgroup alignment scores C1, U and C2]
70 What is similar to the Ancestor?
- Ask the Outgroup on the phylogenetic tree!
- $S = \frac{U - \max(C_1, C_2)}{\min(C_1, C_2)}$
[Figure: outgroup alignment scores C1, U and C2; edge weight $-\log(1-S)$]
71 What is similar to the Ancestor?
- Ask the Outgroup on the phylogenetic tree!
- $S = \frac{U - \max(C_1, C_2)}{\min(C_1, C_2)}$
[Figure: outgroup alignment scores C1, U and C2; edge weight $-\log(1-S)$]
72 What is similar to the Ancestor?
- Ask the Outgroup on the phylogenetic tree!
- $S = \frac{U - \max(C_1, C_2)}{\min(C_1, C_2)}$
- Find a set of connections such that there is at most 1 way in and 1 way out of each syntenic area
[Figure: edge weights $-\log(1-S_1)$ and $-\log(1-S_2)$]
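A small sketch of the edge scoring on these slides. It assumes U is the outgroup alignment score of the two syntenic areas joined, and C1, C2 the outgroup scores of each area alone; this is a reading of the figure labels rather than something the slides spell out. Clamping S into [0, 1) is an added assumption that keeps $-\log(1-S)$ finite, and the function names are hypothetical.

```python
import math

def join_support(U: float, C1: float, C2: float, eps: float = 1e-9) -> float:
    """S = (U - max(C1, C2)) / min(C1, C2), clamped to [0, 1 - eps]; needs min(C1, C2) > 0."""
    S = (U - max(C1, C2)) / min(C1, C2)
    return min(max(S, 0.0), 1.0 - eps)

def edge_weight(U: float, C1: float, C2: float) -> float:
    """Edge weight -log(1 - S): near 0 when joining adds nothing, large when U ~ C1 + C2."""
    return -math.log(1.0 - join_support(U, C1, C2))
```

With C1 = 100 and C2 = 50, a joined score U = 125 gives S = 0.5 and a weight of log 2; U = 100 (the join explains nothing beyond the better area) gives weight 0.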
73 Transforming the plot into a graph
[Figure legend: duplication, inversion, translocation]
74 Reduction to Matching
- Solve maximum weighted matching for connected components (colored edges)
75 Reduction to Matching
- Solve maximum weighted matching for connected components (colored edges)
- Any edge present in the solution joins syntenic regions, giving us the ancestral ordering
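For a small connected component, an exact maximum weighted matching can be sketched by exhaustive search: each edge is either skipped or taken if neither endpoint is already used. Node names such as `"A.end"` and `"B.start"` (the ends of syntenic areas) are hypothetical. The search is exponential, so it only illustrates the reduction; larger components need a real matching or LP solver.

```python
from typing import FrozenSet, List, Tuple

Edge = Tuple[str, str, float]  # (node, node, weight)

def max_weight_matching(edges: List[Edge]) -> Tuple[float, List[Edge]]:
    """Exact maximum-weight matching by exhaustive search (exponential time)."""
    def best(i: int, used: FrozenSet[str]) -> Tuple[float, List[Edge]]:
        if i == len(edges):
            return 0.0, []
        # Option 1: leave edge i out of the matching
        skip_w, skip_m = best(i + 1, used)
        u, v, w = edges[i]
        if u in used or v in used or w <= 0:
            return skip_w, skip_m
        # Option 2: take edge i, so both endpoints become unavailable
        take_w, take_m = best(i + 1, used | {u, v})
        take_w += w
        if take_w > skip_w:
            return take_w, [edges[i]] + take_m
        return skip_w, skip_m

    return best(0, frozenset())
```

Because every node appears in at most one chosen edge, each end of a syntenic area gets at most one connection, which is the "at most 1 way in and 1 way out" constraint.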
76 Back to the sequence
77 Back to the sequence
78 How well does it work?
- It takes 20 minutes to churn through mouse/rat (using human as the outgroup); most of the time is spent calculating scores
79 How well does it work?
- It takes 20 minutes to churn through mouse/rat (using human as the outgroup); most of the time is spent calculating scores
- The resulting alignment (to human) is visually better
80 How well does it work?
[Figure panels: Old, New]
81 How well does it work?
- LP_Solve takes 20 minutes to churn through mouse/rat (using human as the outgroup); most of the time is spent calculating scores
- The resulting alignment (to human) is visually better
- The new alignments are significantly shorter, but have higher coverage
82 Ancestral Alignment (HMR)
- Used Berkeley Genome Pipeline
- Human genome aligned to mouse & rat
- Conservation criteria from Waterston, et al
83 Future Work
- Verify accuracy of ancestral reconstruction
- Do we actually get the ancestral sequence, or something that is easy to align?
- Build a multi-alignment of all mammals and flies
- Molecular Evolution
- We have (for the first time) a dating for the various rearrangement events (they are mapped to a particular branch of the tree). Does the event contribute to evolutionary rate?
84 Overall Conclusions
- Sequence comparison shows evolution
- Evolution is key to understanding the Human Genome
- Computational Biology is data-driven
- Follow the data
85 Acknowledgments
- Stanford
- Serafim Batzoglou
- Arend Sidow
- Gregory Cooper
- Chuong (Tom) Do
- Kerrin Small
- Mukund Sundararajan
- Berkeley
- Gene Myers
- Inna Dubchak
- Alexander Poliakov
- Göttingen
- Burkhard Morgenstern
http://lagan.stanford.edu
http://pipeline.lbl.gov