1
Whole Genome Alignment
  • Adam Phillippy
  • Fall 2008

2
Motivation
3
Breast cancer cell lines
4
Goal of WGA
  • For two genomes, A and B, find a mapping from
    each position in A to its corresponding position
    in B

5
Global vs. Local alignments
  • Global pairwise alignment
  • ...AAGCTTGGCTTAGCTGCTAGGGTAGGCTTGGG...
  • ...AAGCTGGGCTTAGTTGCTAG..TAGGCTTTGG...
  • Whole genome alignment

6
Alignment Visualization
7
Global visualization
8
WGA visualization
  • How can we visualize whole genome alignments?
  • With an alignment dot plot
  • N x M matrix
  • Let i = position in genome A
  • Let j = position in genome B
  • Fill cell (i,j) if A_i shows similarity to B_j
  • A perfect alignment between A and B would
    completely fill the positive diagonal
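
The dot plot construction above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular tool draws its plots: "similarity" is assumed here to mean an exact shared k-mer, and the names dot_plot, render and the parameter k are hypothetical.

```python
def dot_plot(a, b, k=3):
    """Return an N x M grid of booleans for sequences a (rows) and b (columns).

    Assumption: cell (i, j) is filled when it lies inside an exact k-mer
    shared by a and b on the same diagonal.
    """
    n, m = len(a), len(b)
    grid = [[False] * m for _ in range(n)]

    # Index the start positions of every k-mer in b.
    index = {}
    for j in range(m - k + 1):
        index.setdefault(b[j:j + k], []).append(j)

    # For each k-mer in a, mark the matching diagonal segment in the grid.
    for i in range(n - k + 1):
        for j in index.get(a[i:i + k], []):
            for offset in range(k):
                grid[i + offset][j + offset] = True
    return grid


def render(grid):
    """Draw the grid as text: '*' for a filled cell, '.' for an empty one."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in grid)


if __name__ == "__main__":
    print(render(dot_plot("AAGCTTGGCT", "AAGCTTGGCT")))
```

Comparing a sequence with itself fills the main diagonal; rearrangements such as inversions or translocations would show up as segments off that diagonal.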

9
Translocation
Inversion
Insertion
10
Drosophila shuffling
11
(No Transcript)
12
Alignment Algorithms
13
Alignment tools
  • Whole genome alignment
  • MUMmer (nucmer)
  • Developed, supported and available at CBCB
  • LAGAN, AVID
  • VISTA identity plots
  • Multiple genome alignment
  • MGA, MLAGAN, DIALIGN, MAVID
  • Multiple alignment
  • Muscle, ClustalW
  • Local sequence alignment
  • BLAST, FASTA, Vmatch

open source
14
MUMmer
  • Maximal Unique Matcher (MUM)
  • match
  • exact match of a minimum length
  • maximal
  • cannot be extended in either direction without a
    mismatch
  • unique
  • occurs only once in both sequences (MUM)
  • occurs only once in a single sequence (MAM)
  • occurs one or more times in either sequence (MEM)
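
The three properties above (match, maximal, unique) can be checked directly with a brute-force sketch. This is only meant to make the definitions concrete: it runs in quadratic time or worse and is not the suffix-tree search MUMmer uses. The function names and the (r_start, q_start, length) tuple layout are assumptions for illustration.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def maximal_matches(r, q, min_len):
    """Return all maximal exact matches (MEMs) as (r_start, q_start, length)."""
    matches = []
    for i in range(len(r)):
        for j in range(len(q)):
            if r[i] != q[j]:
                continue
            # Skip starts that could be extended to the left.
            if i > 0 and j > 0 and r[i - 1] == q[j - 1]:
                continue
            # Extend to the right until the first mismatch or sequence end.
            length = 0
            while (i + length < len(r) and j + length < len(q)
                   and r[i + length] == q[j + length]):
                length += 1
            if length >= min_len:
                matches.append((i, j, length))
    return matches


def mums(r, q, min_len):
    """Keep only the MEMs whose text occurs exactly once in r and once in q."""
    return [(i, j, n) for (i, j, n) in maximal_matches(r, q, min_len)
            if count_occurrences(r, r[i:i + n]) == 1
            and count_occurrences(q, q[j:j + n]) == 1]


if __name__ == "__main__":
    print(mums("GATTACAGG", "CCATTACAT", 4))
```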

15
Fee Fi Fo Fum, is it a MAM, MEM or MUM?
MUM: maximal unique match
MAM: maximal almost-unique match
MEM: maximal exact match
R
Q
16
Seed and extend with nucmer
  • How can we make bigger matches?
  • Find MUMs
  • using a suffix tree
  • Cluster MUMs
  • using size, gap and distance parameters
  • Extend clusters
  • using modified Smith-Waterman algorithm

17
Seed and extend
FIND all MUMs
CLUSTER consistent MUMs
EXTEND alignments
R
Q
18
Matching
  • Suffix tree
  • O(n) construction
  • O(n) space
  • O(n+m) Longest common substring
  • O(1) Lowest common ancestor
  • O(1) Longest common suffix-prefix
  • O(n+k) Find all k maximal repeats
  • O(n+m+k) Find all k maximal matches

19
Suffix Tree for atgtgtgtc
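
Since the tree figure is not in the transcript, the sketch below lists the same information in a simpler form: the sorted suffixes of atgtgtgtc. A sorted suffix list is swapped in for the suffix tree here, and the construction is naive (far slower than the linear-time tree), so treat it as an illustration only. Prefixes shared by neighbouring suffixes correspond to internal nodes of the tree, that is, to repeated substrings. The function names are hypothetical.

```python
def suffix_array(text):
    """Return suffix start positions in lexicographic order (naive construction)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def common_prefix_length(x, y):
    """Length of the longest common prefix of x and y."""
    n = 0
    while n < len(x) and n < len(y) and x[n] == y[n]:
        n += 1
    return n


def longest_repeat(text):
    """Longest substring that occurs at least twice (occurrences may overlap)."""
    sa = suffix_array(text)
    best = ""
    # Any repeated substring is a shared prefix of two adjacent sorted suffixes.
    for prev, cur in zip(sa, sa[1:]):
        n = common_prefix_length(text[prev:], text[cur:])
        if n > len(best):
            best = text[cur:cur + n]
    return best


if __name__ == "__main__":
    text = "atgtgtgtc"
    for start in suffix_array(text):
        print(start, text[start:])
    print("longest repeat:", longest_repeat(text))
```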
20
Clustering
cluster length = Σ m_i
gap distance = C
indel factor = |B - A| / B or |B - A|
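
A rough sketch of how the three clustering parameters could be applied. The figure that defines A, B and C is not in the transcript, so the definitions here are assumptions: A and B are taken to be the distances between two adjacent matches in the reference and the query, and the gap limit is applied to both. The greedy single pass is a simplification, not nucmer's actual clustering code, and all names and thresholds are hypothetical.

```python
from typing import NamedTuple


class Match(NamedTuple):
    r: int        # start position in the reference
    q: int        # start position in the query
    length: int   # match length


def cluster_matches(matches, min_cluster, max_gap, max_indel):
    """Group matches into clusters and keep the clusters that are long enough."""
    clusters, current = [], []
    for m in sorted(matches):
        if current:
            prev = current[-1]
            a = m.r - (prev.r + prev.length)   # assumed A: gap in the reference
            b = m.q - (prev.q + prev.length)   # assumed B: gap in the query
            consistent = (0 <= a <= max_gap and 0 <= b <= max_gap
                          and abs(b - a) <= max_indel)
            if not consistent:
                clusters.append(current)
                current = []
        current.append(m)
    if current:
        clusters.append(current)
    # Cluster length is the sum of the match lengths it contains.
    return [c for c in clusters if sum(m.length for m in c) >= min_cluster]


if __name__ == "__main__":
    found = [Match(0, 0, 20), Match(25, 26, 30), Match(500, 100, 40), Match(60, 61, 15)]
    for cluster in cluster_matches(found, min_cluster=50, max_gap=90, max_indel=5):
        print(cluster)
```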
21
Extending
R
score 70
Q
22
Banded alignment
0
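
The idea of banding can be illustrated with a small dynamic-programming sketch that fills only the cells within a fixed distance of the main diagonal. For simplicity it computes unit-cost edit distance rather than the modified Smith-Waterman scoring mentioned earlier in the deck, so it shows the banding technique only. The function name and the band parameter are hypothetical.

```python
def banded_edit_distance(a, b, band):
    """Edit distance between a and b using only cells with |i - j| <= band.

    Returns None when the length difference alone exceeds the band.
    The result equals the true edit distance whenever that distance is at
    most `band`; otherwise it is only an upper bound.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        return None
    inf = float("inf")
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = {j: j for j in range(0, min(m, band) + 1)}
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - band), min(m, i + band) + 1):
            if j == 0:
                cur[j] = i
                continue
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev.get(j - 1, inf) + cost,  # match or substitution
                         prev.get(j, inf) + 1,         # gap in b
                         cur.get(j - 1, inf) + 1)      # gap in a
        prev = cur
    return prev.get(m)


if __name__ == "__main__":
    print(banded_edit_distance("AAGCTTGG", "AAGCTGGG", 2))
```

Restricting the table to a band of width 2 * band + 1 reduces the work from the full N x M table to roughly N times the band width, which is the reason banding suits the extension of closely related sequences.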
23
Usage Examples
24
WGA example with nucmer
  • Yersinia pestis CO92 vs. Yersinia pestis KIM
  • High nucleotide similarity, 99.86%
  • Two strains of the same species
  • Extensive genome shuffling
  • Global alignment will not work
  • Highly repetitive
  • Many local alignments

25
COMMAND: whole genome alignment
  • nucmer --maxmatch CO92.fasta KIM.fasta
  • --maxmatch Find maximal exact matches (MEMs)
  • delta-filter -m out.delta > out.delta.m
  • -m Many-to-many mapping
  • show-coords -r out.delta.m > out.coords
  • -r Sort alignments by reference position
  • mummerplot --large --fat out.delta.m
  • --large Large plot
  • --fat Nice layout for multi-fasta files
  • --x11 Default, draw using x11 (--postscript,
    --png)
  • requires gnuplot
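
The four steps can also be driven from a small script. The sketch below only assembles the command lines from this slide and prints them by default. It assumes the MUMmer programs are on the PATH, that nucmer writes out.delta under its default output prefix, and that the filtered delta is saved as out.delta.m, the name the later commands read. The helper names wga_commands and run are hypothetical.

```python
import shlex
import subprocess


def wga_commands(reference, query):
    """Return (argv, stdout_file) pairs for the four steps on this slide."""
    return [
        (["nucmer", "--maxmatch", reference, query], None),
        (["delta-filter", "-m", "out.delta"], "out.delta.m"),
        (["show-coords", "-r", "out.delta.m"], "out.coords"),
        (["mummerplot", "--large", "--fat", "out.delta.m"], None),
    ]


def run(steps, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for argv, redirect in steps:
        if dry_run:
            line = shlex.join(argv)
            print(line if redirect is None else f"{line} > {redirect}")
        elif redirect is None:
            subprocess.run(argv, check=True)
        else:
            # Emulate the shell's "> file" redirection.
            with open(redirect, "w") as handle:
                subprocess.run(argv, check=True, stdout=handle)


if __name__ == "__main__":
    run(wga_commands("CO92.fasta", "KIM.fasta"))
```

Calling run(..., dry_run=False) would execute the tools in order and stop at the first failure, because check=True raises an error on a non-zero exit status.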

26
(No Transcript)
27
COMMAND: assembly comparison
  • nucmer --maxmatch ASM1.fasta ASM2.fasta
  • --maxmatch Find maximal exact matches (MEMs)
  • ASM1 Multi-fasta file of contigs from assembly 1
  • delta-filter -m out.delta > out.delta.m
  • -m Many-to-many mapping
  • show-coords -rcl out.delta.m > out.coords
  • -r Sort alignments by reference position
  • -c Alignment coverage as a % of contig length
  • -l Length of aligned contig
  • mummerplot --large --fat out.delta.m
  • --large Large plot
  • --fat Nice layout for multi-fasta files
  • --x11 Default, draw using x11 (--postscript,
    --png)
  • requires gnuplot

28
Assembly comparison
9kb insertion
5kb translocation
29
Comparative assembly
  • Assembly
  • Overlap, layout, consensus
  • Orient and place sequencing reads
  • Using overlaps and mate-pair information
  • Scaffolding
  • Order and orient draft contigs
  • Using mate-pair information and experimental
    validation
  • Comparative assembly and scaffolding
  • Map, layout, consensus
  • Orient and place reads and contigs
  • Using a reference genome and alignment mapping
  • AMOScmp

30
It's easier when you know the answer
31
Comparative assembly caveats
Truth
Assembly
A
B
A
B
A
B
32
Human vs. Human
33
(No Transcript)
34
Human vs. human
35
Human indels
36
References
  • Documentation
  • http://mummer.sourceforge.net
  • publication listing
  • http://mummer.sourceforge.net/manual
  • documentation
  • http://mummer.sourceforge.net/examples
  • walkthroughs
  • Email
  • mummer-help_at_lists.sourceforge.net
  • amp_at_umiacs.umd.edu

37
Acknowledgements
Art Delcher
Steven Salzberg
Mihai Pop
Mike Schatz
Stefan Kurtz