Title: Bioinformatics and Computational Molecular Biology
1Bioinformatics and Computational Molecular Biology
Geoff Barton
http://www.compbio.dundee.ac.uk
2Practical Tutorial
- Dr David Martin: practical tutorial on the use of the PyMOL molecular graphics software.
- In this lecture I will show lots of protein structures. Use www.ebi.ac.uk/msd to find them, and/or the SCOP domains database (find it with Google).
3Similarities in Proteins
- Lecture 1
- Overview of data in molecular biology
- Protein modelling
- Similarities of Protein Sequence, Structure,
Function
4Introduction to Sequence Comparison
- Lecture 2
- Why compare sequences?
- Methods for sequence comparison/alignment.
- Multiple alignment
- Database searching - FASTA/BLAST
- Iterative searching - PSI-BLAST
5Practical/WWW references
- Organised by Drs Martin
- Good preparation would be to look at http://www.ebi.ac.uk/Tools and http://www.ncbi.nlm.nih.gov
- Look at BLAST and FASTA on these sites as well as database access facilities.
6Traditional biological research
Analysis: Reading. Talking. Thinking.
Hypothesis!
Private Data: Past experiments. Lab notebooks. Group discussions.
Public Data: Journals. Conferences.
Experiment: Design. Execution.
Publish!
7Bioinformatics/Computational Biology and biological research
Analysis: Reading. Talking. Thinking. Computational analysis. Software development.
Private Data: Past experiments. Lab notebooks. Group discussions. DNA sequences, protein sequences, genetic maps, transcripts, 3D structures, proteomics results, SNP data, etc.
Public Data: Journals. Conferences. DNA sequences, protein sequences, genetic maps, transcripts, 3D structures, proteomics results, SNP data, etc.
Hypothesis! Computer aided.
Experiment: Design. Execution. Computational experiments. Simulation.
Publish! Database submission. Database management.
8EMBL Nucleotide Sequence Database Growth (to 2nd
Oct 2006)
Taken from www.ebi.ac.uk
9Protein Sequences
- Approx. 3,500,000 known for all species (Oct. 2006).
- 25,000 for human (not counting splice variants and post-translational modifications).
10Protein 3D Structures
- Approx. 39,000 known (much duplication).
11Biological data in context
12Overview of Biological Hierarchy...
Molecular Levels:
- DNA
- RNA
- Protein Sequence
- Protein 3D structure
- Molecular function
13Technology and data in biology
Expression Data (Transcriptomics): Which of the genes are switched on in which cells/tissues, and when? What are the effects of drugs and disease on expression patterns? DNA chip technology.
Hierarchy diagram:
- Ecosystem: many different organisms
- Population: group of the same type of organism
- Family: group with known common lineage
- Whole organism: animal, plant, etc.
- Tissue/organ: brain, heart, lungs, blood, ...
- Cell: nerve, muscle, etc.
- Organelle: nucleus, mitochondria, etc.
- Nucleus
- Chromosome
- Gene
- DNA
- RNA
- Protein Sequence
- Protein 3D structure
- Molecular function
14Technology and data in biology
Protein Expression Data (Proteomics): Which proteins are being produced in which cells/tissues, and when? Which modified forms are present? What are the effects of drugs and disease on these patterns? 2D gels, mass spectrometry.
Hierarchy diagram: Ecosystem, Population, Family, Whole organism, Tissue/organ, Cell, Organelle, Nucleus, Chromosome, Gene, DNA, RNA, Protein Sequence, Protein 3D structure, Molecular function.
15Technology and data in biology
Protein 3D Structure - the bridge to chemistry (Structural Genomics): What is the atomic level structure of the protein? What other molecules does it interact with? What small molecules - potential drugs - does it interact with? What are the effects of point mutations on the structure? X-ray crystallography, NMR spectroscopy, single particle, cryo-electron microscopy.
Hierarchy diagram: Ecosystem, Population, Family, Whole organism, Tissue/organ, Cell, Organelle, Nucleus, Chromosome, Gene, DNA, RNA, Protein Sequence, Protein 3D structure, Molecular function.
16Overview of Biological Hierarchy...
Macroscopic Levels:
- Ecosystem: many different organisms
- Population: group of the same type of organism
- Family: group with known common lineage
- Whole organism: animal, plant, etc.
- Tissue/organ: brain, heart, lungs, blood, ...
- Cell: nerve, muscle, etc.
- Organelle: nucleus, mitochondria, etc.
- Nucleus
- Chromosome
- Gene
Remaining levels on the diagram: DNA, RNA, Protein Sequence, Protein 3D structure, Molecular function.
17Biology is now a data intensive science
- To do good science, you need to know how to use
(and not abuse) computational tools.
18Protein Structure Prediction
- Homology modelling
- Relies on the fact that similarity of sequence
implies similarity of 3D structure.
19?
Lysozyme (1lz1)
α-lactalbumin (1alc)
Imagine we don't know the 3D structure of α-lactalbumin, but we do know its amino acid sequence and that of lysozyme.
20?
Lysozyme (1lz1)
α-lactalbumin (1alc)
37.7% identity, Z = 17.6
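The identity figure quoted on this slide is a percentage of matching residues in a pairwise alignment. The slides do not say which columns were counted, so the following is only a minimal sketch under one common convention: count identical residues over columns where neither sequence has a gap. The function name `percent_identity` and the toy fragments are hypothetical and are not real lysozyme or α-lactalbumin data.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where both sequences have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # gapped columns are not counted (an assumption of this sketch)
        compared += 1
        if res_a.upper() == res_b.upper():
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy aligned fragments, invented for illustration only.
toy_a = "KVFER-CELA"
toy_b = "KQFTKGCELS"
print(round(percent_identity(toy_a, toy_b), 1))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so a quoted percentage is only comparable when the convention is known.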
21Protein structure prediction (Homology Modelling)
- Align sequence of protein of unknown structure to sequence of protein of known structure.
- In conserved core of protein, substitute the amino acid types into the known structure.
- Deal with loops between the core elements of structure.
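The three steps above can be illustrated at the sequence level. This is a rough sketch, not a modelling program: given a pairwise alignment, it lists which target residue would be substituted at each template position, which target residues have no template equivalent (loops to build), and which template positions are deleted in the target. The function name `thread_onto_template` and the toy alignment are hypothetical.

```python
from typing import List, Tuple


def thread_onto_template(
    target_aln: str, template_aln: str, gap: str = "-"
) -> Tuple[List[Tuple[int, str, str]], List[Tuple[int, str]], List[int]]:
    """Split an alignment into substitutions, insertions and deletions.

    Template positions are 1-based and count only template residues.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    substitutions = []  # (template position, template residue, target residue)
    insertions = []     # (preceding template position, target residue)
    deletions = []      # template positions with no target residue
    position = 0
    for tgt, tpl in zip(target_aln, template_aln):
        if tpl != gap:
            position += 1
        if tgt != gap and tpl != gap:
            substitutions.append((position, tpl, tgt))
        elif tgt != gap:
            insertions.append((position, tgt))  # needs loop building
        elif tpl != gap:
            deletions.append(position)          # template residue to remove
    return substitutions, insertions, deletions


# Toy alignment, invented for illustration only.
subs, ins, dels = thread_onto_template("AC-DEFG", "ACWD--G")
print(subs)
print(ins)
print(dels)
```

Real homology modelling then works with 3D coordinates: side chains are rebuilt on the template backbone, and the inserted and deleted regions are remodelled as loops.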
22Lysozyme (1lz1)
α-lactalbumin (1alc)
37.7% identity, Z = 17.6
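The slides do not say how the Z value was obtained. One common way to express alignment significance is to compare the observed alignment score with scores obtained after shuffling one sequence, and report the difference in standard deviation units. The sketch below assumes that approach, with a deliberately simple scoring scheme (match 1, mismatch 0, gap -1) in place of a real substitution matrix. The names `global_score` and `z_score` are hypothetical.

```python
import random
import statistics


def global_score(a: str, b: str, match: int = 1, mismatch: int = 0, gap: int = -1) -> int:
    """Score of the best global alignment (Needleman-Wunsch, linear gap penalty)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def z_score(a: str, b: str, n_shuffles: int = 100, seed: int = 0) -> float:
    """(observed score - mean shuffled score) / standard deviation of shuffled scores."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = global_score(a, b)
    residues = list(b)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # keeps composition, destroys residue order
        background.append(global_score(a, "".join(residues)))
    mean = statistics.mean(background)
    sd = statistics.stdev(background)
    if sd == 0:
        raise ValueError("shuffled scores have zero spread; Z is undefined")
    return (observed - mean) / sd


# Toy example: a sequence compared with itself scores far above its shuffles.
toy = "ACDEFGHIKLMNPQRSTVWY"
print(z_score(toy, toy) > 3)
```

Shuffling preserves amino acid composition and length, so a high Z suggests that the similarity comes from the order of residues, not from composition alone.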
23Protein structure prediction (Homology modelling)
- Problems
- Need protein of known structure that is similar in sequence.
- Building loops where there are deletions.
- Verifying model.
- Key is getting a good alignment in the first place.
- Bad alignment => bad model.
24Good alignment on its own can
- Identify key residues (absolutely conserved)
- Identify likely protein core (conserved hydrophobic residues)
- Help predict protein secondary structure (not this lecture).
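The first two points above can be sketched as a scan over the columns of a multiple alignment. This is an illustration only: the hydrophobic residue set, the function name `classify_columns` and the toy alignment are assumptions of the sketch, and columns containing gaps are simply skipped.

```python
from typing import List, Sequence, Tuple

# Assumed hydrophobic set; different classifications draw this line differently.
HYDROPHOBIC = frozenset("AVILMFWC")


def classify_columns(alignment: Sequence[str], gap: str = "-") -> Tuple[List[int], List[int]]:
    """Return 1-based column numbers that are (identical, conserved hydrophobic)."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length sequences")
    identical = []
    hydrophobic = []
    for number, column in enumerate(zip(*alignment), start=1):
        residues = set(column)
        if gap in residues:
            continue  # gapped columns are ignored in this sketch
        if len(residues) == 1:
            identical.append(number)    # absolutely conserved: candidate key residue
        if residues <= HYDROPHOBIC:
            hydrophobic.append(number)  # all hydrophobic: candidate core position
    return identical, hydrophobic


# Toy alignment, invented for illustration only.
toy_alignment = ["GWLKC", "GWIRC", "GWVKC"]
print(classify_columns(toy_alignment))
```

A column can appear in both lists, as an invariant hydrophobic residue is both absolutely conserved and a likely core position.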
25Sequence alignment is a fundamental technique in
molecular biology.
- May predict proteins of common function even when no 3D structure is known.
- May be used to predict 3D structure and so help understanding of mutants.
- Some examples of where this is right and wrong...
26Prediction of structure and function by
similarity to known sequences and structures
Assumption is that similar sequence implies
similar structure and function. But what do we
mean by similar? Does similarity of sequence
really imply similarity of function?
27Protein Sequence/Structure/Function Network
28Protein Sequence/Structure/Function Network
Sequence: Similar / Different
3D Structure: Similar / Different
Function: Similar / Different
29Similar Sequence, Similar Structure, Similar Function.
e.g. Trypsin-like serine proteinases: same fold, same catalytic mechanism, but DIFFERENT specificity.
e.g. Immunoglobulin variable domains: same fold, similar binding function, but DIFFERENT specificity.
True of all examples: similarities only give clues to function; differences in specificity can be regarded as differences of function.
30Immunoglobulin Variable Domains
e.g. see 1a2y
31Tryptophan at core of Ig variable domain
32Protein Sequence/Structure/Function Network
Sequence: Similar / Different
3D Structure: Similar / Different
Function: Similar / Different
33Lysozyme (1lz1)
α-lactalbumin (1alc)
37.7% identity, Z = 17.6
34ε-crystallin / L-Lactate Dehydrogenase
35Protein Sequence/Structure/Function Network
Sequence: Similar / Different
3D Structure: Similar / Different
Function: Similar / Different
36Trypsin (3ptn)
Subtilisin (2sec)
37Subtilisin (2sec)
Trypsin (3ptn)
38Trypsin (3ptn): His-57, Asp-102, Ser-195
Subtilisin (2sec): Asp-32, His-64, Ser-221
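The residue numbers on this slide show that the same three catalytic residue types occur in a different order along the two sequences. A small sketch makes the comparison explicit by sorting each triad by sequence position; the residue numbers are those given on the slide, and the helper name `sequence_order` is hypothetical.

```python
# Catalytic triad residue numbers as listed on the slide.
TRIADS = {
    "Trypsin (3ptn)": {"His": 57, "Asp": 102, "Ser": 195},
    "Subtilisin (2sec)": {"Asp": 32, "His": 64, "Ser": 221},
}


def sequence_order(triad: dict) -> list:
    """Residue types sorted by their position along the sequence."""
    return [name for name, _ in sorted(triad.items(), key=lambda item: item[1])]


for protein, triad in TRIADS.items():
    print(protein, sequence_order(triad))
```

Trypsin gives His, Asp, Ser while subtilisin gives Asp, His, Ser, so the two triads cannot be lined up by a simple sequence alignment even though the residue types are the same.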
39Protein Sequence/Structure/Function Network
Sequence: Similar / Different
3D Structure: Similar / Different
Function: Similar / Different
40Nature 398, 84-90, 1999
PDB 1b47
41 11% sequence ID, RMSD 1.47 Å over 70 residues
PDB 1b47
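The RMSD quoted on this slide measures how far apart equivalent atoms are after two structures have been superposed. As a minimal sketch, assuming the coordinates are already superposed and paired (real tools first find the optimal superposition), the root-mean-square deviation can be computed as below. The function name `rmsd` and the toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root-mean-square deviation between paired, already-superposed coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates in angstroms, invented for illustration only.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
print(rmsd(a, b))
```

The number of residues compared matters as much as the value itself, which is why the slide reports the RMSD together with the 70 equivalent residues.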
42Protein Sequence/Structure/Function Network
Sequence: Similar / Different
3D Structure: Similar / Different
Function: Similar / Different
43PDB 2ptk
PDB 1bia
Russell, R. B. and Barton, G. J. (1993), "An
SH2-SH3 Domain hybrid", Nature, 364, 765.
44PDB 1bas
PDB 2aai
45Matthews, S., et al. (1994), "The p17 Matrix
Protein from HIV-1 is Structurally Similar to
Interferon-gamma", Nature, 370, 666-668.
46Protein Sequence/Structure/Function Network
Sequence: Similar / Different
3D Structure: Similar / Different
Function: Similar / Different
Does this ever happen?
47HIV Reverse Transcriptase (RT)
48HIV Reverse Transcriptase (RT)
49HIV Reverse Transcriptase (RT) - domain linkers
50Protein Sequence and Structural Similarity
51Protein Sequence and Structural Similarity
52Barton, G. J. et al. (1992), "Human Platelet Derived Endothelial Cell Growth Factor is Homologous to E. coli Thymidine Phosphorylase", Prot. Sci., 1, 688-690.
53Protein Sequence and Structural Similarity
54Barton, G. J., Cohen, P. T. C. and Barford, D. (1994), "Conservation Analysis and Structure Prediction of the Protein Serine/Threonine Phosphatases: Sequence Similarity with Diadenosine Tetra-phosphatase from E. coli Suggests Homology to the Protein Phosphatases", Eur. J. Biochem., 220, 225-237.
55Protein Sequence and Structural Similarity
56Russell, R. B. and Barton, G. J. (1993), "An
SH2-SH3 Domain hybrid", Nature, 364, 765.
57Reading material for this lecture
- This lecture itself.
- PDFs for Barton papers: www.compbio.dundee.ac.uk/ftp/pdf/
- Database statistics: http://www.ebi.ac.uk/embl/
- Meng, W., Sawasdikosol, S., Burakoff, S. J. and Eck, M. J. (1999), "Structure of the amino-terminal domain of Cbl complexed to its binding site on ZAP-70 kinase", Nature, 398, 84-90 (04 March 1999). Available on-line at www.nature.com (search for ZAP-70 kinase; republished in December on-line).
- Kuriyan, J. and Darnell, J. E. (1999), "Protein recognition: An SH2 domain in disguise", Nature, 398, 22-25 (04 March 1999). News and views article for the above paper.
- Russell, R. B. and Barton, G. J. (1993), "An SH2-SH3 Domain hybrid", Nature, 364, 765.
- Matthews, S., et al. (1994), "The p17 Matrix Protein from HIV-1 is Structurally Similar to Interferon-gamma", Nature, 370, 666-668.
- Barton, G. J., Cohen, P. T. C. and Barford, D. (1994), "Conservation Analysis and Structure Prediction of the Protein Serine/Threonine Phosphatases: Sequence Similarity with Diadenosine Tetra-phosphatase from E. coli Suggests Homology to the Protein Phosphatases", Eur. J. Biochem., 220, 225-237.
58The end of Lecture 1
- Lecture 2 will be on sequence comparison methods.