Title:
1 Bioinformatics
- "Nothing in Biology makes sense except in the light of evolution" (Theodosius Dobzhansky, 1900-1975)
- "Nothing in bioinformatics makes sense except in the light of Biology"
2 Evolution
- Three requirements:
- Template structure providing stability (DNA)
- Copying mechanism (meiosis)
- Mechanism providing variation (mutations, insertions and deletions, crossing-over, etc.)
3 Evolution
- Ancestral sequence: ABCD
- Descendant 1: ACCD (B → C, mutation)
- Descendant 2: ABD (C → ø, deletion)
- Pairwise alignment of the two descendants:
  ACCD      ACCD
  AB-D  or  A-BD
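4 Evolution
- Ancestral sequence: ABCD
- Descendant 1: ACCD (B → C, mutation)
- Descendant 2: ABD (C → ø, deletion)
- Pairwise alignment of the two descendants:
  ACCD      ACCD
  AB-D  or  A-BD
- True alignment: ACCD over AB-D, because the B → C mutation sits in column 2 and the deleted C in column 3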
5 Example: Pairwise sequence alignment needs sense of evolution
Global dynamic programming
[Figure: search matrix for MDAGSTVILCFVG against MDAASTILCGS; labels: Evolution, Amino Acid Exchange Matrix, Search matrix, Gap penalties (open, extension)]
Resulting alignment:
MDAGSTVILCFVG-
MDAAST-ILC--GS
6 Sequence alignment: History
- 1970 Needleman-Wunsch: global pair-wise alignment
- 1981 Smith-Waterman: local pair-wise alignment
- 1984 Hogeweg-Hesper: progressive multiple alignment
- 1989 Lipman-Altschul-Kececioglu: simultaneous multiple alignment
- 1994 Hidden Markov Models (HMM) for multiple alignment
- 1996 Iterative strategies for progressive multiple alignment revived
- 1997 PSI-Blast (PSSM)
7 Pair-wise alignment
T D W V T A L K
T D W L - - I K
Combinatorial explosion
- 1 gap in 1 sequence: n+1 possibilities
- 2 gaps in 1 sequence: (n+1)n
- 3 gaps in 1 sequence: (n+1)n(n-1), etc.
$\binom{2n}{n} = \frac{(2n)!}{(n!)^2} \approx \frac{2^{2n}}{\sqrt{\pi n}}$
- 2 sequences of 300 a.a.: ~$10^{179}$ alignments
- 2 sequences of 1000 a.a.: ~$10^{600}$ alignments!
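The size of the binomial count above can be checked numerically. The following is a minimal sketch, assuming Python 3.8+ for `math.comb`; the function name `count_alignments_log10` is hypothetical and chosen only for illustration. It returns the exact order of magnitude of the binomial coefficient together with the base-10 logarithm of the Stirling-type estimate.

```python
import math


def count_alignments_log10(n):
    """Order of magnitude of C(2n, n) for two sequences of length n.

    Returns a pair: the exact power of ten (digits minus one) and the
    base-10 logarithm of the estimate 2^(2n) / sqrt(pi * n).
    """
    exact = math.comb(2 * n, n)           # exact big-integer binomial
    exact_exponent = len(str(exact)) - 1  # floor(log10(exact))
    estimate_log10 = 2 * n * math.log10(2) - 0.5 * math.log10(math.pi * n)
    return exact_exponent, estimate_log10


exponent, estimate = count_alignments_log10(1000)
print(f"C(2000, 1000) is of order 10^{exponent}; estimate 10^{estimate:.1f}")
```

Even for short proteins the count is far beyond exhaustive enumeration, which is the motivation for dynamic programming.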
8 A protein sequence alignment
MSTGAVLIY--TSILIKECHAMPAGNE-----
---GGILLFHRTHELIKESHAMANDEGGSNNS
A DNA sequence alignment
attcgttggcaaatcgcccctatccggccttaa
attt---ggcggatcg-cctctacgggcc----
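9 Dynamic programming: Scoring alignments
$S_{a,b} = \sum_{l} s(a_l, b_l) - \sum_{\text{gaps}} gp(k)$
$gp(k) = p_i + k \cdot p_e$ (affine gap penalties)
$p_i$ and $p_e$ are the penalties for gap initialisation and extension, respectively; $k$ is the length of the gap.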
10 Dynamic programming: Scoring alignments
T D W V T A L K
T D W L - - I K
Amino Acid Exchange Matrix (20×20)
Affine gap penalties (open, extension): 10, 1
Score = s(T,T) + s(D,D) + s(W,W) + s(V,L) - Po - 2Px + s(L,I) + s(K,K)
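The score of this example alignment can be worked through in code. The sketch below is illustrative only. The exchange values are the PAM250 entries for the six residue pairs that occur in the alignment. The gap penalties (open 10, extension 1) are an assumption read off the slide labels. The names `PAM250_SUBSET`, `exchange` and `score_alignment` are hypothetical.

```python
# PAM250 values for the residue pairs that occur in this example only.
PAM250_SUBSET = {
    ("T", "T"): 3, ("D", "D"): 4, ("W", "W"): 17,
    ("V", "L"): 2, ("L", "I"): 2, ("K", "K"): 5,
}


def exchange(a, b):
    """Symmetric lookup; raises KeyError for pairs outside the subset."""
    if (a, b) in PAM250_SUBSET:
        return PAM250_SUBSET[(a, b)]
    return PAM250_SUBSET[(b, a)]


def score_alignment(top, bottom, p_open, p_ext):
    """Sum exchange values and subtract p_open + k * p_ext per gap of length k."""
    score = 0
    gap_side, gap_len = None, 0
    for a, b in zip(top, bottom):
        side = "top" if a == "-" else "bottom" if b == "-" else None
        if side != gap_side and gap_len:
            score -= p_open + gap_len * p_ext  # close the previous gap
            gap_len = 0
        gap_side = side
        if side is None:
            score += exchange(a, b)
        else:
            gap_len += 1
    if gap_len:  # a gap that runs to the end of the alignment
        score -= p_open + gap_len * p_ext
    return score


# (3 + 4 + 17 + 2 + 2 + 5) - (10 + 2 * 1) = 21
print(score_alignment("TDWVTALK", "TDWL--IK", p_open=10, p_ext=1))
```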
11 Amino acid exchange matrices
20×20
How do we get one? And how do we get associated gap penalties?
First systematic method to derive a.a. exchange matrices: Margaret Dayhoff et al. (1978), Atlas of Protein Sequence and Structure
12 PAM250 matrix: amino acid exchange matrix (log odds)
A   2
R  -2   6
N   0   0   2
D   0  -1   2   4
C  -2  -4  -4  -5  12
Q   0   1   1   2  -5   4
E   0  -1   1   3  -5   2   4
G   1  -3   0   1  -3  -1   0   5
H  -1   2   2   1  -3   3   1  -2   6
I  -1  -2  -2  -2  -2  -2  -2  -3  -2   5
L  -2  -3  -3  -4  -6  -2  -3  -4  -2   2   6
K  -1   3   1   0  -5   1   0  -2   0  -2  -3   5
M  -1   0  -2  -3  -5  -1  -2  -3  -2   2   4   0   6
F  -4  -4  -4  -6  -4  -5  -5  -5  -2   1   2  -5   0   9
P   1   0  -1  -1  -3   0  -1  -1   0  -2  -3  -1  -2  -5   6
S   1   0   1   0   0  -1   0   1  -1  -1  -3   0  -2  -3   1   2
T   1  -1   0   0  -2  -1   0   0  -1   0  -2   0  -1  -3   0   1   3
W  -6   2  -4  -7  -8  -5  -7  -7  -3  -5  -2  -3  -4   0  -6  -2  -5  17
Y  -3  -4  -2  -4   0  -4  -4  -5   0  -1  -1  -4  -2   7  -5  -3  -3   0  10
V   0  -2  -2  -2  -2  -2  -2  -1  -2   4   2  -2   2  -1  -1  -1   0  -6  -2   4
B   0  -1   2   3  -4   1   2   0   1  -2  -3   1  -2  -5  -1   0   0  -5  -3  -2   2
Z   0   0   1   3  -5   3   3  -1   2  -2  -3   0  -2  -5   0   0  -1  -6  -4  -2   2   3
    A   R   N   D   C   Q   E   G   H   I   L   K   M   F   P   S   T   W   Y   V   B   Z
Positive exchange values denote mutations that are more likely than randomly expected, while negative numbers correspond to mutations that are avoided compared to the randomly expected situation.
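The sign convention of a log-odds value can be illustrated with a few lines of code. This is a sketch under stated assumptions: the scaling (10 times the base-10 logarithm, rounded to an integer) is assumed here, and all frequencies in the usage lines are made-up numbers for illustration, not values from the Dayhoff data. The function name `log_odds_score` is hypothetical.

```python
import math


def log_odds_score(observed_pair_freq, freq_a, freq_b, scale=10):
    """Log-odds of seeing residues a and b aligned versus chance pairing.

    observed_pair_freq: how often the pair is seen aligned (hypothetical input)
    freq_a, freq_b:     background frequencies of the two residues
    scale:              assumed scaling factor before rounding
    """
    expected = freq_a * freq_b  # chance of pairing a with b at random
    return round(scale * math.log10(observed_pair_freq / expected))


# Made-up frequencies, chosen only to show the sign of the result.
print(log_odds_score(0.02, 0.1, 0.1))   # pair seen twice as often as expected: positive
print(log_odds_score(0.005, 0.1, 0.1))  # pair seen half as often as expected: negative
```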
13 Pairwise sequence alignment
Global dynamic programming
[Figure: search matrix for MDAGSTVILCFVG against MDAASTILCGS; labels: Evolution, Amino Acid Exchange Matrix, Search matrix, Gap penalties (open, extension)]
Resulting alignment:
MDAGSTVILCFVG-
MDAAST-ILC--GS
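14 Global dynamic programming
[Figure: cell (i, j) of the search matrix, with row i-1 and column j-1 marked]
$$S_{i,j} = s_{i,j} + \max \begin{cases} \max_{0<x<i-1} \left( S_{x,j-1} - P_i - (i-x-1) P_x \right) \\ S_{i-1,j-1} \\ \max_{0<y<j-1} \left( S_{i-1,y} - P_i - (j-y-1) P_x \right) \end{cases}$$
($P_i$: gap initialisation penalty, $P_x$: gap extension penalty)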
15 Global dynamic programming
16 Global dynamic programming
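Global dynamic programming with affine gap penalties can be sketched as follows. This is not the slide's exact indexing. It is an equivalent, commonly used formulation in which each cell holds the best score of aligning two prefixes and every possible gap length ending at that cell is tried. The toy scoring function `simple_score` and all parameter values are assumptions for illustration; in practice an exchange matrix such as PAM250 would supply the scores.

```python
def simple_score(a, b):
    """Toy exchange function (assumption): +2 for identity, -1 otherwise."""
    return 2 if a == b else -1


def global_align_score(a, b, score, p_open, p_ext):
    """Best global alignment score with gap cost gp(k) = p_open + k * p_ext."""
    def gp(k):
        return p_open + k * p_ext

    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -gp(i)  # prefix of a aligned to one leading gap
    for j in range(1, m + 1):
        F[0][j] = -gp(j)  # prefix of b aligned to one leading gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = F[i - 1][j - 1] + score(a[i - 1], b[j - 1])
            for k in range(1, i + 1):  # gap of length k in b
                best = max(best, F[i - k][j] - gp(k))
            for k in range(1, j + 1):  # gap of length k in a
                best = max(best, F[i][j - k] - gp(k))
            F[i][j] = best
    return F[n][m]


print(global_align_score("ACGT", "AGT", simple_score, p_open=2, p_ext=1))
```

Trying every gap length costs cubic time. The point of the sketch is to mirror the recurrence, not to be fast.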
17 Pairwise alignment
- Global alignment: all gaps are penalised
- Semi-global alignment: N- and C-terminal gaps (end-gaps) are not penalised
MSTGAVLIY--TS-----
---GGILLFHRTSGTSNS
(the leading --- and the trailing ----- are end-gaps)
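The difference between global and semi-global alignment comes down to the borders of the matrix. In the sketch below the first row and column stay at zero (free leading end-gaps), and the result is the maximum over the last row and last column (free trailing end-gaps). The toy scoring function and the parameter values are assumptions for illustration, and the function names are hypothetical.

```python
def simple_score(a, b):
    """Toy exchange function (assumption): +2 for identity, -1 otherwise."""
    return 2 if a == b else -1


def semi_global_score(a, b, score, p_open, p_ext):
    """Best alignment score when end-gaps are free and internal gaps cost
    gp(k) = p_open + k * p_ext."""
    def gp(k):
        return p_open + k * p_ext

    n, m = len(a), len(b)
    # Row 0 and column 0 remain 0: leading end-gaps are not penalised.
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = F[i - 1][j - 1] + score(a[i - 1], b[j - 1])
            for k in range(1, i + 1):  # internal gap of length k in b
                best = max(best, F[i - k][j] - gp(k))
            for k in range(1, j + 1):  # internal gap of length k in a
                best = max(best, F[i][j - k] - gp(k))
            F[i][j] = best
    # Trailing end-gaps are free: take the best cell on the last row or column.
    return max(max(F[n]), max(row[m] for row in F))


# The short sequence sits inside the long one; the overhangs cost nothing.
print(semi_global_score("ACGT", "TTACGTTT", simple_score, p_open=2, p_ext=1))
```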
18 Local dynamic programming (Smith-Waterman, 1981)
[Figure: search matrix for LCFVMLAGSTVIVGTR against EDASTILCGS; labels: Negative numbers, Amino Acid Exchange Matrix, Search matrix, Gap penalties (open, extension)]
Resulting local alignment:
AGSTVIVG
A-STILCG
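19 Local dynamic programming (Smith-Waterman, 1981)
[Figure: cell (i, j) of the search matrix, with row i-1 and column j-1 marked]
$$S_{i,j} = \max \begin{cases} s_{i,j} + \max_{0<x<i-1} \left( S_{x,j-1} - P_i - (i-x-1) P_x \right) \\ s_{i,j} + S_{i-1,j-1} \\ s_{i,j} + \max_{0<y<j-1} \left( S_{i-1,y} - P_i - (j-y-1) P_x \right) \\ 0 \end{cases}$$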
20 Local dynamic programming
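Local dynamic programming differs from the global case in two ways: every cell is floored at zero, and the result is the highest value anywhere in the matrix. The sketch below uses an equivalent prefix-based formulation rather than the slide's exact indexing. The toy scoring function, which must be able to return negative numbers, and the parameter values are assumptions for illustration.

```python
def simple_score(a, b):
    """Toy exchange function (assumption): +2 for identity, -1 otherwise."""
    return 2 if a == b else -1


def local_align_score(a, b, score, p_open, p_ext):
    """Best local alignment score with gap cost gp(k) = p_open + k * p_ext."""
    def gp(k):
        return p_open + k * p_ext

    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    best_overall = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # The 0 lets an alignment start afresh at any cell.
            best = max(0, F[i - 1][j - 1] + score(a[i - 1], b[j - 1]))
            for k in range(1, i + 1):  # gap of length k in b
                best = max(best, F[i - k][j] - gp(k))
            for k in range(1, j + 1):  # gap of length k in a
                best = max(best, F[i][j - k] - gp(k))
            F[i][j] = best
            best_overall = max(best_overall, best)
    return best_overall


# Only the shared core "ACG" contributes to the score.
print(local_align_score("TTTACGTTT", "GGACGGG", simple_score, p_open=2, p_ext=1))
```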
21 Dot plots
- Way of representing (visualising) sequence similarity without doing dynamic programming (DP)
- Make same matrix, but locally represent sequence similarity by averaging using a window
- See Lesk's book, pp. 167-171
22 Comparing two sequences
We want to be able to choose the best alignment between two sequences. A simple method of finding similarities between two sequences is to use dot plots. The first sequence to be compared is assigned to the horizontal axis and the second is assigned to the vertical axis.
23 Dot plots can be filtered by window approaches (to calculate running averages) and by applying a threshold.
They can identify insertions, deletions and inversions.
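A windowed, thresholded dot plot can be sketched in a few lines. The version below is illustrative only: it scores identities (an exchange matrix could be used instead), anchors the window at its first position rather than centring it, and uses the hypothetical names `dot_plot` and `render`. The first sequence runs along the horizontal axis and the second along the vertical axis, as described above.

```python
def dot_plot(seq_x, seq_y, window=1, threshold=1.0):
    """Boolean grid: grid[j][i] is True when the window starting at
    seq_x[i] and seq_y[j] has a fraction of identities >= threshold."""
    grid = []
    for j in range(len(seq_y) - window + 1):      # vertical axis: seq_y
        row = []
        for i in range(len(seq_x) - window + 1):  # horizontal axis: seq_x
            matches = sum(seq_x[i + k] == seq_y[j + k] for k in range(window))
            row.append(matches / window >= threshold)
        grid.append(row)
    return grid


def render(grid):
    """Text rendering: '*' for a dot, '.' for an empty cell."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in grid)


# Unfiltered plot of a sequence against itself: the main diagonal lights up.
print(render(dot_plot("ACGTACGT", "ACGTACGT")))
print()
# A window of 4 with a 75% threshold removes isolated chance matches.
print(render(dot_plot("ACGTACGT", "ACGTACGT", window=4, threshold=0.75)))
```

In such a plot a shifted diagonal points to an insertion or deletion. Detecting inversions would additionally require comparing against the reversed (for DNA, reverse-complemented) sequence, which this sketch does not do.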