Whole Genome Alignment - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Whole Genome Alignment

BMI/CS 776
www.biostat.wisc.edu/craven/776.html
Mark Craven
craven_at_biostat.wisc.edu
February 2002

2
Announcements

talk of interest today: Divergence Time and Evolutionary Rate Estimation with Multilocus Data
Jeffrey Thorne, North Carolina State University
4:00pm, 1221 Computer Sciences
guest lectures next week:
Prof. Christina Kendziorski on quantitative trait loci (QTL) mapping
Prof. Rich Maclin on keyphrase extraction to annotate high-throughput experiments
reading for the week of 2/25: Chapter 3 of Durbin et al.

3
Whole Genome Alignment: Task Definition

Given:
a pair of genomes (or other very large scale sequences)
a method for scoring the similarity of a pair of characters
Do:
construct a global alignment: identify matches between genomes as well as various non-match features

4
E. coli Whole Genome Alignment
Perna et al., Nature 2001

5
Why Not Use Standard DP Methods?

size of sequences being compared: memory, run-time issues
features accounted for:
standard alignment: point mutations, insertions, deletions
whole genome alignment: also transpositions, differences in tandem repeats, etc.
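
The memory issue can be made concrete with a little arithmetic. The sketch below is illustrative only: the helper name `dp_table_bytes`, the 4 bytes per cell, and the genome sizes of about 5 Mbp each (roughly the scale of a bacterial genome such as E. coli) are assumptions, not figures from the slides.

```python
def dp_table_bytes(len_a: int, len_b: int, bytes_per_cell: int = 4) -> int:
    """Bytes needed to hold a full (len_a + 1) x (len_b + 1) DP score matrix."""
    return (len_a + 1) * (len_b + 1) * bytes_per_cell


# Assumed, illustrative genome sizes: about 5 million bases each.
n = m = 5_000_000
tib = dp_table_bytes(n, m) / 2**40
print(f"full DP table: roughly {tib:.0f} TiB")
```

Under these assumptions the full quadratic table needs on the order of a hundred terabytes, which is why a standard dynamic-programming alignment is impractical at whole-genome scale.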

6
The MUMmer System

Delcher et al., Nucleic Acids Research, 1999
given genomes A and B:
1. find all maximal, unique, matching subsequences (MUMs)
2. extract the longest possible set of matches that occur in the same order in both genomes
3. close the gaps
4. output the alignment

7
Features Identified by MUMmer

single nucleotide polymorphisms (SNPs)
regions of divergence (> 1 SNP)
large inserts
repeats
tandem repeats: two or more adjacent, approximate copies of a DNA pattern

8
Step 1: MUM Decomposition

maximal unique match (MUM):
occurs exactly once in both genomes A and B
not contained in any longer MUM

[Figure label: mismatches]

key insight: a significantly long MUM is almost certain to be part of the global alignment
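
The MUM definition can be checked directly on toy inputs. The following is a brute-force sketch only: it is quadratic or worse and is not how MUMmer finds MUMs (MUMmer uses a suffix tree). The function names and the `min_len` parameter are hypothetical, and positions are 0-based.

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, including overlapping ones."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def brute_force_mums(a: str, b: str, min_len: int = 1):
    """Return (start_in_a, start_in_b, length) for every MUM of a and b."""
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            # Skip pairs that could be extended to the left: not maximal.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend the match to the right as far as possible.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k < min_len:
                continue
            match = a[i:i + k]
            # Keep the match only if it is unique in both sequences.
            if count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                mums.append((i, j, k))
    return mums


print(brute_force_mums("acat", "acaa"))  # [(0, 0, 3)], the match "aca"
```

The single-base matches on "a" are rejected because "a" occurs more than once in each sequence, and "ca" is rejected because it can be extended to the left.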

9
Suffix Trees

the key idea in identifying MUMs is to build a suffix tree for genomes A and B

each internal node represents a repeated sequence
Figure from Delcher et al., Nucleic Acids Research 27, 1999

10
MUMs and Suffix Trees

add suffixes for both genomes A and B to tree
label each leaf node with genome it represents

[Figure: suffix tree containing the suffixes of Genome A = ccacg and Genome B = cct; each leaf is labeled with its genome and the start position of its suffix, e.g. (A, 1) and (B, 1)]

11
MUMs and Suffix Trees

a unique match: an internal node with exactly 2 children, both leaf nodes, from different genomes
but these matches are not necessarily maximal

[Figure: the same suffix tree for Genome A = ccacg and Genome B = cct; the internal node for "cc", whose two leaf children are (A, 1) and (B, 1), represents a unique match]

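The unique-match test can be tried on the slide's example with a deliberately naive structure. This sketch builds an uncompressed suffix trie (quadratic space), not the linear-size suffix tree MUMmer uses. The class and function names and the terminator characters `$` and `#` are assumptions made for illustration. A node counts as a unique match when exactly two suffixes pass through it, one from each genome, and they branch apart there.

```python
class Node:
    def __init__(self):
        self.children = {}   # next character -> Node
        self.suffixes = []   # (genome label, 1-based start) of suffixes through this node


def build_trie(genomes):
    """genomes: list of (label, sequence, unique terminator character)."""
    root = Node()
    for label, seq, end in genomes:
        text = seq + end
        for start in range(len(seq)):
            node = root
            for ch in text[start:]:
                node = node.children.setdefault(ch, Node())
                node.suffixes.append((label, start + 1))
    return root


def unique_match_nodes(root):
    """Return (matched string, [suffix labels]) for every unique-match node."""
    found, stack = [], [("", root)]
    while stack:
        path, node = stack.pop()
        labels = {label for label, _ in node.suffixes}
        if path and len(node.children) == 2 and len(node.suffixes) == 2 and len(labels) == 2:
            found.append((path, sorted(node.suffixes)))
        for ch, child in node.children.items():
            stack.append((path + ch, child))
    return found


trie = build_trie([("A", "ccacg", "$"), ("B", "cct", "#")])
print(unique_match_nodes(trie))  # [('cc', [('A', 1), ('B', 1)])]
```

The result agrees with the figure: "cc" is the only string that occurs exactly once in each genome.
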
12
MUMs and Suffix Trees

to identify maximal matches, compare the suffixes following unique match nodes

[Figure: suffix tree for Genome A = acat and Genome B = acaa; the suffixes following these two match nodes are the same]

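For the example genomes acat and acaa, both "aca" and "ca" are unique matches, but only "aca" is maximal. One simple way to filter such candidates, assumed here for illustration rather than taken from the slides, is to compare the characters immediately preceding the two match positions. The function name is hypothetical and positions are 0-based.

```python
def is_left_maximal(a: str, b: str, i: int, j: int) -> bool:
    """True if the match starting at a[i] and b[j] cannot be extended to the left."""
    return i == 0 or j == 0 or a[i - 1] != b[j - 1]


a, b = "acat", "acaa"
# Unique matches as (string, start in a, start in b).
candidates = [("aca", 0, 0), ("ca", 1, 1)]
maximal = [c for c in candidates if is_left_maximal(a, b, c[1], c[2])]
print(maximal)  # [('aca', 0, 0)]
```

The candidate "ca" is dropped because both occurrences are preceded by "a", so it is contained in the longer match "aca".
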
13
Suffix Trees

can build in linear time (in lengths of genomes)
can identify all MUMs in linear time (one scan of tree)
space complexity is linear (exactly one leaf and at most one internal node for each base)
main parameter of system: length of shortest MUM that should be identified (20-50 bp here)

14
Step 2: Find Longest Subsequence

sort MUMs according to position in genome A
solve variation of Longest Increasing Subsequence (LIS) problem to find sequences in ascending order in both genomes
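
The basic idea can be sketched with the ordinary LIS problem: sort the MUMs by position in genome A, then find a longest increasing subsequence of their positions in genome B. The code below is the textbook O(k log k) LIS with predecessor links, not MUMmer's variant; the function name and the example coordinates are made up for illustration.

```python
from bisect import bisect_left


def longest_increasing_subsequence(values):
    """Return the indices of one longest strictly increasing subsequence."""
    tail_idx = []   # tail_idx[k]: index of the smallest tail of a run of length k + 1
    tail_val = []   # the corresponding values, kept sorted for bisect
    prev = [-1] * len(values)
    for idx, v in enumerate(values):
        k = bisect_left(tail_val, v)
        if k == len(tail_val):
            tail_idx.append(idx)
            tail_val.append(v)
        else:
            tail_idx[k] = idx
            tail_val[k] = v
        prev[idx] = tail_idx[k - 1] if k > 0 else -1
    # Walk the predecessor links back from the end of the longest run.
    out, idx = [], tail_idx[-1] if tail_idx else -1
    while idx != -1:
        out.append(idx)
        idx = prev[idx]
    return out[::-1]


# Hypothetical MUMs as (position in A, position in B).
mums = sorted([(5, 10), (15, 20), (25, 70), (35, 30), (45, 40), (55, 60)])
keep = longest_increasing_subsequence([pos_b for _, pos_b in mums])
chain = [mums[i] for i in keep]
print(chain)
```

In this example the MUM at (25, 70) is out of order in genome B and is left out of the chain.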

Figure from Delcher et al., Nucleic Acids Research 27, 1999

15
Finding Longest Subsequence

unlike ordinary LIS problems, MUMmer takes into account:
lengths of sequences represented by MUMs
overlaps
requires O(k log k) time, where k is the number of MUMs
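
To show how lengths and overlaps change the answer, here is a simple quadratic dynamic program over MUMs given as (start in A, start in B, length). It is a sketch under assumptions: the scoring rule (sum of MUM lengths minus the overlap between neighbours) and the function name `best_chain` are illustrative choices, and this O(k^2) version is slower than the method MUMmer actually uses.

```python
def best_chain(mums):
    """Return (score, chain) for the highest-scoring ordered chain of MUMs."""
    mums = sorted(mums)
    if not mums:
        return 0, []
    k = len(mums)
    score = [length for _, _, length in mums]  # best chain score ending at each MUM
    prev = [-1] * k
    for j in range(k):
        aj, bj, lj = mums[j]
        for i in range(j):
            ai, bi, li = mums[i]
            if ai >= aj or bi >= bj:
                continue  # MUM i does not precede MUM j in both genomes
            # Bases of MUM j already covered by MUM i in either genome.
            overlap = max(0, ai + li - aj, bi + li - bj)
            gain = lj - overlap
            if gain > 0 and score[i] + gain > score[j]:
                score[j] = score[i] + gain
                prev[j] = i
    end = max(range(k), key=score.__getitem__)
    total, chain = score[end], []
    while end != -1:
        chain.append(mums[end])
        end = prev[end]
    return total, chain[::-1]


total, chain = best_chain([(0, 0, 30), (40, 100, 60), (50, 40, 25), (80, 70, 25)])
print(total, chain)  # 90 [(0, 0, 30), (40, 100, 60)]
```

Here a chain of two long MUMs (90 bases) beats a chain of three shorter ones (80 bases), which a count-based LIS would have preferred.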

16
Types of Gaps in a MUM Alignment
Figure from Delcher et al., Nucleic Acids Research 27, 1999

17
Step 3: Close the Gaps

SNPs:
between MUMs: trivial to detect
otherwise: handle like repeats
inserts:
transpositions (subsequences that were deleted from one location and inserted elsewhere): look for out-of-sequence MUMs
simple insertions: trivial to detect
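
The "trivial to detect" cases follow from the coordinates of two consecutive MUMs in the chain. The sketch below is an assumption-laden illustration, not MUMmer's code: the function name, the category strings, and the (start in A, start in B, length) tuple layout are all hypothetical.

```python
def classify_gap(prev_mum, next_mum):
    """Classify the gap between two consecutive chained MUMs."""
    gap_a = next_mum[0] - (prev_mum[0] + prev_mum[2])  # unmatched bases in A
    gap_b = next_mum[1] - (prev_mum[1] + prev_mum[2])  # unmatched bases in B
    if gap_a < 0 or gap_b < 0:
        return "repeat (overlapping MUMs)"
    if gap_a == 0 and gap_b == 0:
        return "contiguous"
    if gap_a == 1 and gap_b == 1:
        return "SNP"
    if gap_a == 0:
        return "insert in B"
    if gap_b == 0:
        return "insert in A"
    return "polymorphic region"


print(classify_gap((0, 0, 20), (21, 21, 30)))  # SNP
print(classify_gap((0, 0, 20), (20, 35, 30)))  # insert in B
print(classify_gap((0, 0, 20), (40, 45, 10)))  # polymorphic region
print(classify_gap((0, 0, 20), (15, 18, 10)))  # repeat (overlapping MUMs)
```

A single unmatched base on each side must be a mismatch, because otherwise the two MUMs would have been one longer match. Transpositions are not covered here; they would be looked for among the MUMs left out of the chain.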

18
Step 3: Close the Gaps

polymorphic regions:
short ones: align them with dynamic programming method
long ones: call MUMmer recursively w/ reduced min MUM length
repeats:
detected by overlapping MUMs
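
For short polymorphic regions, a standard global alignment is affordable. The following is a minimal Needleman-Wunsch sketch with a linear gap penalty; the scoring values (match 1, mismatch -1, gap -1) and the function name are assumptions, and the slides do not say which dynamic programming variant MUMmer applies.

```python
def needleman_wunsch(x, y, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_x, aligned_y) for a global alignment of x and y."""
    n, m = len(x), len(y)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if x[i - 1] == y[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    ax, ay, i, j = [], [], n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and x[i - 1] == y[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return score[n][m], "".join(reversed(ax)), "".join(reversed(ay))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

Because the table is quadratic in the region length, this only suits the short gaps; longer regions are handled by the recursive call described above.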

Figure from Delcher et al., Nucleic Acids Research 27, 1999
