Title: Bioinformatics
1Bioinformatics NSF Summer School 2003
Z. Luthey-Schulten, UIUC
2Sequence-Sequence Alignment
- Smith-Waterman
- Needleman-Wunsch
Sequence-Structure Alignment
3Sequence Alignment Dynamic Programming
number of possible alignments
Seq. 1: a1 a2 a3 - - a4 a5 ... an
Seq. 2: c1 - c2 c3 c4 c5 - cm
Smith-Waterman alignment algorithm
Score Matrix H Traceback
AWGHE AW--HE
4Smith-Waterman Local Alignment Score Matrix
AWGHE AW--HE
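The score matrix H and the traceback named on these slides can be sketched in a few lines. This is a minimal illustration, not the scheme used in the lecture: the `match`, `mismatch`, and linear `gap` values are assumptions standing in for a substitution matrix such as BLOSUM, and the function name `smith_waterman` is hypothetical.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment; returns (best_score, aligned_a, aligned_b).

    Scores are illustrative assumptions, not values from the slides.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                    # start a new local alignment
                          H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,    # gap in b
                          H[i][j - 1] + gap)    # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Traceback from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("AAAGGG", "TTTGGGCCC"))
```

The zero in the `max` is what makes the alignment local: a poorly scoring prefix is dropped rather than carried forward.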
5Blosum 40 Substitution Matrix
6Protein Structural Relationships
Can protein structural relationships help us to
understand evolutionary dynamics? Is there a
connection between evolutionary events and
changes in protein structure? What is the
effect of gene duplication, horizontal gene
transfer, and other evolutionary mechanisms on
protein shape?
Substitution
Indel
Domain Insertion
O'Donoghue and Luthey-Schulten, UIUC 2003
7Sequence Alignment Dynamic Programming
number of possible alignments
Seq. 1: a1 a2 a3 - - a4 a5 ... an
Seq. 2: c1 - c2 c3 c4 c5 - cm
Needleman-Wunsch alignment algorithm
Score Matrix H Traceback
??? Tutorial Wd
8Needleman-Wunsch Global Alignment
Similarity Values
Initialization of Gap Penalties
http://www.dkfz-heidelberg.de/tbi/bioinfo/PracticalSection/AliApplet/index.html
9Filling out the Score Matrix H
http://www.dkfz-heidelberg.de/tbi/bioinfo/PracticalSection/AliApplet/index.html
10Traceback and Alignment
The Alignment
Traceback (blue) from optimal score
http://www.dkfz-heidelberg.de/tbi/bioinfo/PracticalSection/AliApplet/index.html
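The three Needleman-Wunsch steps on slides 8 to 10 (initializing the gap penalties, filling out the score matrix H, and tracing back from the final cell) can be sketched as follows. The similarity values (`match`, `mismatch`) and the linear `gap` penalty are assumptions chosen for illustration; they are not the values used in the applet or the tutorial.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment; returns (score, aligned_a, aligned_b).

    Scores are illustrative assumptions, not values from the slides.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]

    # Initialization: the first row and column hold accumulated gap penalties.
    for i in range(1, rows):
        H[i][0] = i * gap
    for j in range(1, cols):
        H[0][j] = j * gap

    # Fill: each cell takes the best of the three possible predecessors.
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)

    # Traceback: walk from the bottom-right cell back to the origin.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return H[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))
```

Unlike the local variant, there is no zero floor and the traceback always starts at the bottom-right cell, so both sequences are aligned end to end.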
11Energy Landscape Theory of Structure Prediction
12Protein Structure Prediction
1-D protein sequence
3-D protein structure
Ab Initio protein folding
SISSIRVKSKRIQLG.
Sequence Alignment
SISSRVKSKRIQLGLNQAELAQKV------GTTQ QFANEFKVRRIKL
GYTQTNVGEALAAVHGS
Target protein of unknown structure
Homologous/analogous protein of known structure
Sequence Alignment & the Energy Function
E = Ematch + Egap
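Reading the energy function on this slide as a sum of a match term and a gap term, the two contributions can be tallied column by column for a given alignment. The per-column energies below (`e_match`, `e_mismatch`, `e_gap`) are hypothetical numbers chosen only to make the sketch runnable; the slide does not specify them.

```python
def alignment_energy(aligned_a, aligned_b, e_match=-1.0, e_mismatch=0.5, e_gap=1.0):
    """Return (E_match, E_gap, E_total) for two gapped strings of equal length.

    All three per-column energies are illustrative assumptions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    energy_match = 0.0
    energy_gap = 0.0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            energy_gap += e_gap          # column contains a gap
        elif x == y:
            energy_match += e_match      # identical residues lower the energy
        else:
            energy_match += e_mismatch   # substituted residues raise it
    return energy_match, energy_gap, energy_match + energy_gap


print(alignment_energy("ACGT", "A-GT"))
```

With the sign convention assumed here, a better alignment has a lower energy, which is the mirror image of maximizing an alignment score.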
13Threading Sequence-Structure Alignment
Scaffold structure
Target sequence
threading alignment between target and scaffold
A1
A3
A2
A4
A5
Threading Energy Function
R. Goldstein, Z. Luthey-Schulten, P. Wolynes
(1992, PNAS)
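The slide does not reproduce the threading energy function itself. As a rough illustration of the idea of scoring a target sequence placed on a scaffold structure, the sketch below assumes a simple pairwise contact form plus a flat gap penalty. The function name, the `pair_energy` table, and the `gap_energy` value are all hypothetical, and this is not the published energy function.

```python
def threading_energy(target, alignment, contacts, pair_energy, gap_energy=1.0):
    """Energy of a target sequence threaded onto a scaffold (assumed form).

    target      : target sequence, e.g. "ACD"
    alignment   : list indexed by scaffold position; each entry is an index
                  into `target`, or None where the scaffold position is unmatched
    contacts    : pairs (p, q) of scaffold positions that are in contact
    pair_energy : dict {(res1, res2): energy}, keys stored in sorted order
    """
    energy = 0.0
    for p, q in contacts:
        i, j = alignment[p], alignment[q]
        if i is None or j is None:
            continue  # a contact involving an unmatched position contributes nothing
        key = tuple(sorted((target[i], target[j])))
        energy += pair_energy.get(key, 0.0)  # unknown pairs are treated as neutral
    # Flat penalty for every scaffold position left without a target residue.
    energy += gap_energy * sum(1 for i in alignment if i is None)
    return energy


pairs = {("A", "C"): -1.0, ("A", "D"): 0.5}
print(threading_energy("ACD", [0, None, 1, 2], [(0, 2), (0, 3), (1, 2)], pairs))
```

The point of the sketch is only that the energy depends on which target residues end up spatial neighbors in the scaffold, not on sequence similarity alone.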
14Gap Penalties
Distribution of Gaps
Insertion
Deletion
Sequence-Structure Gap Energy
target
scaffold
Bulge
R. Goldstein, Z. Luthey-Schulten, P. Wolynes
(1994) Proc. 27th Annu. Hawaii Int. Conf. Sys. Sci.: 306.
15Similarity Measures
Sequence Identity: fraction of identically matched residues
Q, Structural Identity: fraction of native contacts
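Both similarity measures reduce to simple fractions, sketched below. Two choices here are assumptions rather than definitions from the slide: sequence identity is normalized by the number of aligned (gap-free) columns, and Q is computed from plain sets of residue-index pairs without any distance weighting.

```python
def sequence_identity(aligned_a, aligned_b):
    """Fraction of identical residues among the gap-free aligned columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def fraction_native_contacts(native_contacts, model_contacts):
    """Q: fraction of native contacts that are also present in the model."""
    native = {tuple(sorted(c)) for c in native_contacts}
    model = {tuple(sorted(c)) for c in model_contacts}
    if not native:
        return 0.0
    return len(native & model) / len(native)


print(sequence_identity("ACGT", "AGGA"))
print(fraction_native_contacts([(1, 5), (2, 6), (3, 7), (4, 8)],
                               [(5, 1), (2, 6), (9, 10)]))
```

Contacts are stored as sorted pairs so that (1, 5) and (5, 1) count as the same contact.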
16<δEs/ΔE>
2ΔE
δEs
17Homology Modeling - Threading
18Results from CASP5 CM/FR
The prediction is never better than the
scaffold. Threading Energy function requires
improvement.
19You are now entering the twilight zone of
sequence identity. We need profiles!
Watch for Bioinformants!!!
20Profiles Evolution Revisited
- "What molecular sequences taught us in the 1960s
was that the genealogical history of an organism
is written to one extent or another into the
sequences of each of its genes, an insight that
became the central tenet of a new discipline,
molecular evolution." Woese (PNAS, 2000)
Pauling (1965)
21Universal Tree
The Universal Phylogenetic Tree inferred from
comparative analyses of rRNA sequences
Woese (PNAS, 1990)
22Horizontal Gene Transfer
O'Donoghue and Luthey-Schulten, UIUC 2003
23Multiple Sequence Alignments
- "The aminoacyl-tRNA synthetases, perhaps better
than any other molecules in the cell, epitomize
the current situation and help to understand
(the effects) of HGT." Woese (PNAS, 2000; MMBR,
2000)
24Standard Dogma Molecular Biology
- DNA -> RNA -> Proteins
- Role of AARS?
- Charging of t-RNA
25NCBI 3D
26LeuRS Canonical Tree
Woese, Olsen (UIUC), Ibba (Panum Inst.), Soll
(Yale) Micro. Mol. Biol. Rev. March 2000.
27D,N Sequence Phylogenetic Trees
Woese, Olsen (UIUC), Ibba (Panum Inst.), Soll
(Yale) Micro. Mol. Biol. Rev. March 2000.
28Fold Motifs of AARSs
O'Donoghue and Luthey-Schulten, UIUC 2003
29Structure Conserved More than Sequence
Structural Overlap of Class II AARS
Conserved helices
Conserved sheets
30Subset of Class II Structural Tree
O'Donoghue and Luthey-Schulten, UIUC 2003
32Novel Evolutionary Connections from Sequence and
Structure
Canonical Pattern D E F L W Y
Canonical Pattern A B I H P M
Gemini K1 K2 C S G N Q
Basal Canonical V T A R
B
A
No canonical pattern Horizontal transfer
after B-AE split.
O'Donoghue, Luthey-Schulten, UIUC 2003
Woese, Olsen (UIUC), Ibba (Panum Inst.), Soll
(Yale) Micro. Mol. Biol. Rev. March 2000.
33Gap Distribution Functions
Spatial Gap Distribution Function
Length Gap Distribution Function
l, gap length (residues)
rij, spatial gap distance (Å)
B. Qian & R. Goldstein (2001) Proteins 45: 102.
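A length gap distribution of the kind named on this slide can be estimated empirically by counting runs of gap characters in a set of aligned sequences. The sketch below covers only the length distribution over l. The spatial distribution over rij would also need 3-D coordinates, which are not modeled here. The function name and the `-` gap symbol are assumptions.

```python
import re
from collections import Counter


def gap_length_distribution(aligned_seqs):
    """Return {gap_length: relative frequency} over all gap runs found."""
    counts = Counter()
    for seq in aligned_seqs:
        # Each maximal run of '-' characters is counted as one gap of that length.
        for run in re.findall(r"-+", seq):
            counts[len(run)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {length: n / total for length, n in sorted(counts.items())}


print(gap_length_distribution(["AC--GT-A", "A---CGTA"]))
```

A distribution estimated this way could then be turned into a length-dependent gap penalty, for example by taking its negative logarithm. That last step is a common convention and not something stated on the slide.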
34Structural Alignment Methods
- PDB - Structural Neighbors CE (Bourne)
- Stamp - Russell
35Multiple Structural Alignments
- STAMP
- Initial Alignment
- Multiple Sequence alignment
- Rigid Body Scan
- Refine Initial Alignment Produce Multiple
Structural Alignment
- Dynamic Programming (Smith-Waterman) through P matrix gives optimal set of equivalent residues.
- This set is used to re-superpose the two chains. Then iterate until alignment score is unchanged.
- This procedure is performed for all pairs.
R. Russell, G. Barton (1992) Proteins 14 309.
36Multiple Structural Alignments
- STAMP (cont'd)
- Refine Initial Alignment Produce Multiple
Structural Alignment
Alignment score
- Multiple Alignment
- Create a dendrogram using the alignment score.
- Successively align groups of proteins (from branch tips to root).
- When 2 or more sequences are in a group, then average coordinates are used.
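The dendrogram step can be illustrated with a small clustering routine that turns pairwise alignment scores into an order in which groups are merged, from branch tips to root. This sketch assumes average linkage over the pairwise scores, which may differ from the clustering rule STAMP actually applies. The function name and the input format are hypothetical.

```python
def merge_order(names, score):
    """Return the list of merges as (group_a, group_b, average_score).

    names : list of structure identifiers
    score : dict {(name_i, name_j): pairwise alignment score}; higher is more similar
    """
    def pair_score(x, y):
        # Scores are symmetric, so accept the pair in either order.
        return score.get((x, y), score.get((y, x)))

    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                vals = [pair_score(x, y) for x in clusters[p] for y in clusters[q]]
                avg = sum(vals) / len(vals)
                if best is None or avg > best[0]:
                    best = (avg, p, q)
        avg, p, q = best
        merges.append((clusters[p], clusters[q], avg))
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [merged]
    return merges


scores = {("A", "B"): 9.0, ("A", "C"): 2.0, ("B", "C"): 4.0}
for group_a, group_b, avg in merge_order(["A", "B", "C"], scores):
    print(group_a, group_b, avg)
```

Each merge in the returned list corresponds to one alignment of two groups. In the procedure described above, a group with two or more members would be represented by its averaged coordinates at that point.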
37Stamp Output/Secondary Structure
O'Donoghue and Luthey-Schulten, UIUC 2003
38Stamp Output/Clustal Format
O'Donoghue and Luthey-Schulten, UIUC 2003
39Examples of Useful Web Tools
- Genomes Sequence and Gene Information
- Domain Architecture
- Multiple Sequence Alignments
- Phylogenetic Trees
- Structural Databases
- Hidden Markov Methods
40NCBI Genomes
41Charging the tRNA
Woese, Olsen (UIUC), Ibba (Panum Inst.), Soll
(Yale) Micro. Mol. Biol. Rev. March 2000.
42NCBI 3D
43Report from SWISS-PROT
44PFAM Report
46Sequence Dendrogram from Clustal
Luthey-Schulten, UIUC 2003
47Phylogenetic Tree in Tutorial
Pogorelov and Luthey-Schulten, UIUC 2003
49Alignment in MOE
50Alignment in MOE
51Transmembrane Proteins - HMM
Example: Bacteriorhodopsin (Anurag Sethi, UIUC)
52Stamp Profile
Sethi and Luthey-Schulten, UIUC 2003
54HMMer Profile-Profile Alignment
Sethi and Luthey-Schulten, UIUC 2003
55Clustal Profile-Profile Alignment
Sethi and Luthey-Schulten, 2003
56Structure Prediction Modeller 6.2/Hmmer
Sethi and Luthey-Schulten, UIUC 2003
Modeller 6.2 A. Sali, et al.
57Acknowledgements
- Felix Autenrieth
- Barry Isralewitz
- Patrick O'Donoghue
- Taras Pogorelov
- Anurag Sethi