Title: Exon prediction by Genomic Sequence alignment Burkhard Mo
1 Burkhard Morgenstern, Institut für Mikrobiologie und Genetik
Foundations of Bioinformatics (Grundlagen der Bioinformatik): Multiple Sequence Alignment, June 2007
2 Progressive Alignment
- Most popular approach to (global) multiple sequence alignment: progressive alignment
- In use since the mid-1980s: Feng/Doolittle, Higgins/Sharp, Taylor
4 Progressive Alignment
- WCEAQTKNGQGWVPSNYITPVN
- WWRLNDKEGYVPRNLLGLYP
- AVVIQDNSDIKVVPKAKIIRD
- YAVESEAHPGSFQPVAALERIN
- WLNYNETTGERGDFPGTYVEYIGRKKISP
- Guide tree
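A guide tree fixes the order in which sequences (and later groups of sequences) are merged. The sketch below builds such a tree by average-linkage clustering (UPGMA), which is a simpler stand-in chosen for illustration and not the method of any specific program. The labels `s1` to `s4` and all distance values are made up.

```python
from itertools import combinations

def upgma(names, dist):
    """Build a guide tree by average-linkage clustering (UPGMA).

    names: list of sequence labels
    dist:  dict mapping frozenset({a, b}) -> distance between labels a and b
    Returns nested tuples describing the merge order.
    """
    # Each cluster is (tree, members); members are the original labels.
    clusters = [(n, (n,)) for n in names]

    def avg(c1, c2):
        # Average distance over all leaf pairs across the two clusters.
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg(*p))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0]

# Hypothetical distances, for illustration only.
names = ["s1", "s2", "s3", "s4"]
dist = {
    frozenset(("s1", "s2")): 0.2,
    frozenset(("s3", "s4")): 0.3,
    frozenset(("s1", "s3")): 0.8,
    frozenset(("s1", "s4")): 0.8,
    frozenset(("s2", "s3")): 0.7,
    frozenset(("s2", "s4")): 0.9,
}
print(upgma(names, dist))  # (('s1', 's2'), ('s3', 's4'))
```

The nested tuples read from the inside out: `s1` and `s2` are aligned first, then `s3` and `s4`, and finally the two resulting groups are aligned to each other.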
5 Progressive Alignment
- WCEAQTKNGQGWVPSNYITPVN
- WW--RLNDKEGYVPRNLLGLYP-
- AVVIQDNSDIKVVP--KAKIIRD
- YAVESEASFQPVAALERIN
- WLNYNEERGDFPGTYVEYIGRKKISP
- Profile alignment: once a gap, always a gap
6 Progressive Alignment
- WCEAQTKNGQGWVPSNYITPVN
- WW--RLNDKEGYVPRNLLGLYP-
- AVVIQDNSDIKVVP--KAKIIRD
- YAVESEASVQ--PVAALERIN------
- WLN-YNEERGDFPGTYVEYIGRKKISP
- Profile alignment: once a gap, always a gap
7 Progressive Alignment
- WCEAQTKNGQGWVPSNYITPVN-
- WW--RLNDKEGYVPRNLLGLYP-
- AVVIQDNSDIKVVP--KAKIIRD
- YAVESEASVQ--PVAALERIN------
- WLN-YNEERGDFPGTYVEYIGRKKISP
- Profile alignment: once a gap, always a gap
8 Progressive Alignment
- WCEAQTKNGQGWVPSNYITPVN--------
- WW--RLNDKEGYVPRNLLGLYP--------
- AVVIQDNSDIKVVP--KAKIIRD-------
- YAVESEA---SVQ--PVAALERIN------
- WLN-YNE---ERGDFPGTYVEYIGRKKISP
- Profile alignment: once a gap, always a gap
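"Once a gap, always a gap" means that merging two profiles only inserts whole gap columns into each of them; gaps already present are never removed or moved. One way to see this is to project the five-row alignment above onto each sub-group and drop the columns that are empty within that group: the earlier sub-alignments reappear unchanged. The helper name `project` is an assumption made for this sketch.

```python
def project(rows):
    """Drop every column that contains only gaps in the given rows."""
    keep = [col for col in zip(*rows) if any(ch != "-" for ch in col)]
    return ["".join(col[i] for col in keep) for i in range(len(rows))]

# Final alignment as shown on the slide.
final = [
    "WCEAQTKNGQGWVPSNYITPVN--------",
    "WW--RLNDKEGYVPRNLLGLYP--------",
    "AVVIQDNSDIKVVP--KAKIIRD-------",
    "YAVESEA---SVQ--PVAALERIN------",
    "WLN-YNE---ERGDFPGTYVEYIGRKKISP",
]

# The first three rows and the last two rows were aligned as separate profiles.
for row in project(final[:3]):
    print(row)
for row in project(final[3:]):
    print(row)
```

The first group comes back as the three-row profile with a single trailing gap column, and the second group comes back without the three-column gap after `YAVESEA` / `WLN-YNE`, which was only added when the two profiles were merged.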
9 Progressive Alignment
- WCEAQTKNGQGWVPSNYITPVN--------
- WW--RLNDKEGYVPRNLLGLYP--------
- AVVIQDNSDIKVVP--KAKIIRD-------
- YAVESEA---SVQ--PVAALERIN------
- WLN-YNE---ERGDFPGTYVEYIGRKKISP
- Most important implementation: CLUSTAL W
10 Progressive Alignment
- CLUSTAL W: Thompson et al., 1994 (17,000 citations)
- Pairwise distances defined as 1 minus the fraction of identical residues
- Calculate unrooted tree with Neighbor Joining
- Define root as central position in tree
- Define sequence weights based on tree
- Gap penalties calculated based on various parameters
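The distance in the second bullet can be sketched as follows. The function takes two rows that are already aligned to each other and counts identities over the columns where both rows have a residue; that choice of denominator is an assumption of this sketch (other conventions exist), and the function name is hypothetical.

```python
def identity_distance(row_a, row_b):
    """Return 1 - (identical residues / columns without a gap in either row)."""
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not pairs:
        return 1.0  # no residue is aligned to a residue: treat as maximally distant
    matches = sum(a == b for a, b in pairs)
    return 1.0 - matches / len(pairs)

def distance_matrix(rows):
    """Symmetric matrix of pairwise identity distances."""
    n = len(rows)
    return [[identity_distance(rows[i], rows[j]) for j in range(n)] for i in range(n)]

# Toy pairwise-aligned rows, for illustration only.
print(identity_distance("E-YENS", "ERYENS"))  # 0.0
print(distance_matrix(["E-YENS", "ERYENS", "ERYA-S"]))
```

A matrix of this kind is the input that a tree-building step such as Neighbor Joining would consume.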
11 Tools for multiple sequence alignment
- Problems with traditional approach:
- Results depend on gap penalty
- Heuristic guide tree determines the alignment, yet the alignment is then used for phylogeny reconstruction
- Algorithm produces global alignments
12 Tools for multiple sequence alignment
- Problems with traditional approach:
- But: many sequence families share only local similarity
- E.g. sequences share one conserved motif
13 Local sequence alignment
EYENS
ERYENS
ERYAS
Find common motif in sequences; ignore the rest
15 Local sequence alignment
E-YENS
ERYENS
ERYA-S
Find common motif in sequences; ignore the rest
Local alignment
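For two sequences, "find the common motif, ignore the rest" is what the Smith-Waterman algorithm does. The sketch below is a minimal version with linear gap costs; the scoring values (match +2, mismatch -1, gap -1) are arbitrary assumptions chosen for illustration, and the flanking residues in the first example are made up.

```python
def smith_waterman(s, t, match=2, mismatch=-1, gap=-1):
    """Return (score, aligned_s, aligned_t) for one best local alignment."""
    rows, cols = len(s) + 1, len(t) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            # Scores never drop below zero: a local alignment may start anywhere.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    out_s, out_t = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
        if H[i][j] == diag:
            out_s.append(s[i - 1])
            out_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_s.append(s[i - 1])
            out_t.append("-")
            i -= 1
        else:
            out_s.append("-")
            out_t.append(t[j - 1])
            j -= 1
    return best, "".join(reversed(out_s)), "".join(reversed(out_t))

# Shared motif YENS embedded in unrelated flanks: only the motif is reported.
print(smith_waterman("GGGYENSAAA", "CCYENSTT"))  # (8, 'YENS', 'YENS')
# First two sequences from the slide.
print(smith_waterman("EYENS", "ERYENS"))         # (9, 'E-YENS', 'ERYENS')
```

This covers the pairwise case only; aligning a single motif across three or more sequences needs a multiple-alignment method on top of it.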
16 Local sequence alignment
Traditional alignment approaches: either global or local methods!
17 New question: sequence families with multiple local similarities
Neither local nor global methods applicable
18 New question: sequence families with multiple local similarities
Alignment possible if order conserved
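The condition "order conserved" can be made concrete for a pair of sequences. In the sketch below each local similarity is a gap-free segment pair `(start_a, start_b, length)`; a set of such segments fits into one alignment only if the segments appear in the same order in both sequences and do not overlap. The representation and the function name are assumptions of this sketch, not the data model of any particular program.

```python
def is_collinear(segments):
    """True if all segment pairs can be part of one pairwise alignment.

    segments: list of (start_a, start_b, length) gap-free matches.
    """
    ordered = sorted(segments)  # sort by start position in sequence A
    for (a1, b1, n1), (a2, b2, n2) in zip(ordered, ordered[1:]):
        # The next segment must start after the previous one ends, in BOTH sequences.
        if a1 + n1 > a2 or b1 + n1 > b2:
            return False
    return True

print(is_collinear([(0, 5, 4), (10, 12, 6)]))   # True: same order in both sequences
print(is_collinear([(0, 12, 4), (10, 5, 6)]))   # False: the two similarities cross
```

Checking neighbouring segments is enough here because the ordering condition is transitive once the list is sorted. With more than two sequences the pairwise checks are no longer sufficient on their own, since segments from different sequence pairs can contradict each other.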
19-48 The DIALIGN approach
[Slides 19 to 48 are figure-only builds; the only surviving text is the annotation "Consistency!" on slide 30.]
49 T-COFFEE
C. Notredame, D. Higgins, J. Heringa (2000), T-Coffee: A novel method for fast and accurate multiple sequence alignment, J. Mol. Biol.
Problem: progressive alignment can go wrong if mistakes are made at an early stage.
Example:
50 T-COFFEE
SeqA GARFIELD THE LAST FAT CAT
SeqB GARFIELD THE FAST CAT
SeqC GARFIELD THE VERY FAST CAT
SeqD THE FAT CAT
53 T-COFFEE
- Idea: consider different pairwise alignments (local and global)
- Check how these alignments support each other
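One way to quantify "support" is a triplet-style extension of a library of pairwise residue matches: the weight for aligning residue x with residue y is raised whenever some residue z of a third sequence is aligned to both. The sketch below follows that idea in simplified form; the data layout, the function name and all weights are assumptions for illustration, not the actual T-Coffee implementation.

```python
from collections import defaultdict

def extend_library(primary):
    """primary: {((seq, pos), (seq, pos)): weight}, each residue pair listed once.

    Returns {frozenset({x, y}): extended weight}.
    """
    # Symmetric lookup: neighbours[x][y] = weight of aligning residue x with residue y.
    neighbours = defaultdict(dict)
    for (x, y), w in primary.items():
        neighbours[x][y] = w
        neighbours[y][x] = w

    extended = defaultdict(int)
    for (x, y), w in primary.items():
        extended[frozenset((x, y))] += w  # direct evidence
    for z, linked in neighbours.items():
        items = list(linked.items())
        for idx, (x, wx) in enumerate(items):
            for y, wy in items[idx + 1:]:
                if x[0] != y[0]:  # x and y must come from different sequences
                    # Indirect evidence through z counts with the weaker of the two links.
                    extended[frozenset((x, y))] += min(wx, wy)
    return dict(extended)

# Hypothetical library: residue 0 of A, B and C are mutually aligned;
# the match between A1 and B2 has no third-sequence support.
primary = {
    (("A", 0), ("B", 0)): 100,
    (("A", 0), ("C", 0)): 80,
    (("C", 0), ("B", 0)): 60,
    (("A", 1), ("B", 2)): 50,
}
ext = extend_library(primary)
print(ext[frozenset((("A", 0), ("B", 0)))])  # 160
print(ext[frozenset((("A", 1), ("B", 2)))])  # 50
```

Matches that are confirmed by other pairwise alignments gain weight, while isolated matches keep only their original weight, so a later alignment step that maximises these weights is less likely to follow a spurious pairwise similarity.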
56 T-COFFEE
- T-COFFEE:
- Less sensitive to spurious pairwise similarities
- Can handle local homologies better than CLUSTAL
57 Evaluation of multi-alignment methods
- Alignment evaluation by comparison to trusted benchmark alignments.
- True alignment known by information about structure or evolution.
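A common way to compare a program's output with a trusted reference is a sum-of-pairs style score: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The sketch below scores all columns of the reference; restricting the comparison to selected regions, as benchmark suites may do, is not modelled here. Function names are assumptions of this sketch.

```python
from itertools import combinations

def aligned_pairs(rows):
    """Set of residue pairs ((seq, pos), (seq, pos)) that share a column."""
    counters = [0] * len(rows)  # next residue index per sequence
    pairs = set()
    for column in zip(*rows):
        present = []
        for seq, ch in enumerate(column):
            if ch != "-":
                present.append((seq, counters[seq]))
                counters[seq] += 1
        pairs.update(combinations(present, 2))
    return pairs

def sp_score(test_rows, reference_rows):
    """Fraction of reference residue pairs reproduced by the test alignment."""
    ref = aligned_pairs(reference_rows)
    if not ref:
        return 0.0
    return len(ref & aligned_pairs(test_rows)) / len(ref)

reference = ["E-YENS", "ERYENS", "ERYA-S"]  # treated as the trusted alignment
test = ["EYENS-", "ERYENS", "ERYAS-"]       # same sequences, left-justified without internal gaps
print(sp_score(reference, reference))  # 1.0
print(sp_score(test, reference))       # 0.5
```

Both alignments must contain the same sequences in the same row order, since residues are identified by row index and position within the ungapped sequence.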
58 Evaluation of multi-alignment methods
- For protein alignment:
- M. McClure et al. (1994)
- 4 protein families, known functional sites
- J. Thompson et al. (1999)
- Benchmark database, 130 known 3D structures (BAliBASE)
- T. Lassmann & E. Sonnhammer (2002)
- BAliBASE and simulated evolution (ROSE)
62 Evaluation of multi-alignment methods
1aboA  1  .NLFVALYDfvasgdntlsitkGEKLRVLgynhn..............gE
1ycsB  1  kGVIYALWDyepqnddelpmkeGDCMTIIhrede............deiE
1pht   1  gYQYRALYDykkereedidlhlGDILTVNkgslvalgfsdgqearpeeiG
1ihvA  1  .NFRVYYRDsrd......pvwkGPAKLLWkg.................eG
1vie   1  .drvrkksga.........awqGQIVGWYctnlt.............peG

1aboA  36 WCEAQt..kngqGWVPSNYITPVN......
1ycsB  39 WWWARl..ndkeGYVPRNLLGLYP......
1pht   51 WLNGYnettgerGDFPGTYVEYIGrkkisp
1ihvA  27 AVVIQd..nsdiKVVPRRKAKIIRd.....
1vie   28 YAVESeahpgsvQIYPVAALERIN......

Key: alpha helix RED, beta strand GREEN, core blocks UNDERSCORE
BAliBASE Reference alignments
63 Evaluation of multi-alignment methods
- 5 categories of benchmark sequences (globally related, internal gaps, end gaps)
- CLUSTAL W, PRRP perform well on globally related sequences, DIALIGN superior for local similarities
- Conclusion: no single best multi-alignment program!
64 Evaluation of multi-alignment methods
- T. Lassmann & E. Sonnhammer (2002)
- BAliBASE and simulated evolution (ROSE)
66 Result: DIALIGN best for distantly related sequences, T-COFFEE best for closely related sequences