Title: Lecture 6.3: From DNA to Protein
1Lecture 6.3 From DNA to Protein
- Dr. Joanne Fox
- Day 6 Saturday February 21st, 2004
- 13:45 to 15:15
2From DNA to Protein
3Objectives
- Review protein sequence features and databases
- Review the structural diversity of amino acids and protein sequences
- Highlight several physicochemical and structural features which can be calculated from protein sequences
- Show how proteomics utilizes methods and techniques for measuring, comparing and assessing protein features
4Outline
- Protein sequence features
- Databases of protein sequences
- Basics of protein structure
- 1° structure, prediction of Mw and pI
- 2° structure, prediction methods
- 3° structure, methods for predicting folds
- Proteomics
- Current methods
- Cutting edge technology
5Amino Acids
- The general formula for an amino acid
- R is commonly one of 20 different side chains
- At pH 7 both the amino and carboxyl groups are ionized
[Figure: general amino acid structure, labeled amino group, alpha carbon, carboxyl group and side chain group]
6Peptide Bonds
- Amino acids are joined together by an amide linkage called a peptide bond.
- The two bonds on either side of the rigid planar peptide unit exhibit a high degree of rotation
[Figure: polypeptide backbone, labeled peptide bonds and the bonds where rotation occurs]
7Families of Amino Acids
- The common amino acids are grouped according to whether their side chains are:
- acidic: D, E
- basic: K, R, H
- uncharged polar: N, Q, S, T, Y
- nonpolar: G, A, V, L, I, P, F, M, W, C
- Hydrophilic amino acids (uncharged polar) are usually on the outside of a protein, whereas nonpolar residues cluster on the inside of the protein
- Basic or acidic amino acids are very polar and are generally found on the outside of protein molecules
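The grouping above lends itself to a quick composition check. The following is a minimal Python sketch that reports what fraction of a sequence falls into each family; the family table is copied from the slide, while the function name `family_fractions` and the example peptide are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Side-chain families exactly as grouped on the slide.
FAMILIES = {
    "acidic": "DE",
    "basic": "KRH",
    "uncharged polar": "NQSTY",
    "nonpolar": "GAVLIPFMWC",
}

# Reverse lookup: one-letter code -> family name.
FAMILY_OF = {aa: name for name, letters in FAMILIES.items() for aa in letters}


def family_fractions(sequence):
    """Return the fraction of residues in each side-chain family."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    # A non-standard letter raises KeyError rather than being silently skipped.
    counts = Counter(FAMILY_OF[aa] for aa in sequence)
    return {name: counts[name] / len(sequence) for name in FAMILIES}


if __name__ == "__main__":
    # Hypothetical peptide, used only to show the call.
    for family, fraction in family_fractions("MKDEVLSTAG").items():
        print(f"{family:16s} {fraction:.2f}")
```

A high nonpolar fraction by itself says nothing about where those residues sit in the fold; it is only a first-pass summary of composition.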
8Protein Sequence Features
- Proteins exhibit far more sequence and chemical complexity than DNA or RNA
- Properties and structure are defined by the sequence and side chains of their constituent amino acids
- The engines of life
- >95% of all drugs target proteins
- Favorite topic of the post-genomic era
9Protein Sequence Databases
- Where does protein sequence information reside?
- Entrez Cross Database Search
- http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi
- Swissprot and TrEMBL
- http://ca.expasy.org/sprot/
- PIR
- http://pir.georgetown.edu/
- As of December 2003, all of this information is integrated into a unified protein database called Uniprot.
- Uniprot
- http://www.pir.uniprot.org/
10Entrez Cross Database Search
- Protein sequence database gives access to translated protein sequences from Genbank/EMBL/DDBJ
- Complete set of deduced protein sequences
- Redundancy problem
11Swissprot and TrEMBL
- Swissprot is an expert curated database
- Function, domain structure, post-translational modifications, variants, reactions, similarities
- TrEMBL (translated EMBL)
- Computer annotated supplement to Swissprot
12PIR: Protein Information Resource
- Annotated database which includes protein family
classification information
13The Uniprot Knowledgebase
- Contains all of the information in Swiss-Prot,
TrEMBL, and PIR. This new unified database was
launched in December 2003.
14Basics of Protein Structure
- Primary
- Secondary
- Tertiary
15Molecular Weight
- Quick formula: Mw ≈ 110 Da × number of residues
- Accurate determination of mass by mass spectrometry
- Tools exist for accurately calculating mass of peptides based on amino acid composition
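To make the two estimates concrete, here is a minimal Python sketch that compares the quick 110 Da per residue rule with a composition-based sum. The residue masses are standard average masses in daltons; the function names `quick_mass` and `composition_mass` are hypothetical, and the sketch ignores post-translational modifications and disulfide bonds.

```python
# Average residue masses in Da (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water is added back for the free N- and C-termini


def quick_mass(sequence):
    """Quick estimate from the slide: 110 Da per residue."""
    return 110.0 * len(sequence)


def composition_mass(sequence):
    """Average mass from amino acid composition (unmodified chain only)."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER_MASS


if __name__ == "__main__":
    peptide = "MKDEVLSTAG"  # hypothetical peptide, for illustration only
    print(f"quick:       {quick_mass(peptide):.1f} Da")
    print(f"composition: {composition_mass(peptide):.1f} Da")
```

The quick rule works best for long sequences of typical composition; short or compositionally biased peptides can deviate noticeably from it.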
16Molecular Weight and Proteomics
[Figure: 2-D gel and QTOF mass spectrometry]
17Isoelectric Point
- The pH at which a protein has a net charge of 0
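One common way to estimate pI from sequence is to model each ionizable group with the Henderson-Hasselbalch equation and search for the pH where the summed charge is zero. The Python sketch below does this by bisection. The pKa values are approximate textbook-style numbers and are an assumption here: different tools use different pKa sets, and real proteins shift these values through their local environment. The names `net_charge` and `isoelectric_point` are hypothetical.

```python
# Approximate pKa values (assumed for illustration; tools differ in the set they use).
PKA_POSITIVE = {"N_TERM": 9.69, "K": 10.5, "R": 12.4, "H": 6.0}
PKA_NEGATIVE = {"C_TERM": 2.34, "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}


def net_charge(sequence, ph):
    """Net charge of an unmodified linear chain at the given pH."""
    # Free termini: one protonatable amino group, one deprotonatable carboxyl group.
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE["N_TERM"]))
    charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE["C_TERM"] - ph))
    for aa in sequence.upper():
        if aa in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(sequence, tolerance=1e-4):
    """Bisection on pH 0..14; net charge decreases monotonically with pH."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive, so the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    peptide = "MKDEVLSTAG"  # hypothetical peptide, for illustration only
    print(f"estimated pI: {isoelectric_point(peptide):.2f}")
```

Bisection is used because the charge curve is monotonic, so no derivative or starting guess is needed.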
18Basics of Protein Structure
- Primary
- Secondary
- Tertiary
19Common Secondary Structure Elements
20Common Secondary Structure Elements
21Secondary Structure: Phi and Psi Angles Defined
- Rotational constraints emerge from interactions with bulky groups (i.e., side chains).
- Phi and Psi angles define the secondary structure adopted by a protein.
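Phi and psi are dihedral angles, so they can be computed from four consecutive backbone atom positions. The following Python sketch shows the calculation: phi uses C(i-1), N(i), CA(i), C(i), and psi uses N(i), CA(i), C(i), N(i+1). The function names are hypothetical and the coordinates are assumed to be plain (x, y, z) tuples, for example as read from a structure file.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond (cis = 0, trans = 180)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone (phi, psi) of one residue from five backbone atom positions."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


if __name__ == "__main__":
    # Toy coordinates (not from a real structure): a 90 degree twist.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Plotting the resulting (phi, psi) pairs for every residue of a structure gives its Ramachandran plot.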
22Ramachandran Plot
23Supersecondary Structure
24Secondary Structure and Protein Folding
- Understanding the forces of hydrophobicity
- Hydrogen bonds can form with polar side chains on the outside of the protein
- The hydrophobic core contains nonpolar side chains
[Figure: an unfolded or partially folded polypeptide, with polar and nonpolar side chains labeled, next to the folded conformation]
25Hydrophobicity is a property which can be calculated for protein sequences
- Hydrophobicity Scales
- Used to calculate hydrophobicity
- Based on experimental evidence indicating hydrophobic/hydrophilic properties of each aa
- Solubility, Stability, Location and/or Globularity of protein sequences can be predicted
26Hydrophobicity Profile
- Moving segment approach
- Correlation of this technique with 3D structure
[Figure: hydrophobicity profile; score (hydrophobic to hydrophilic) plotted along the protein sequence from NH2 to COOH, with exterior and interior residues marked]
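The moving segment approach can be sketched in a few lines of Python: average a per-residue hydrophobicity value over a sliding window and assign the mean to the window's center. The slide does not name a scale, so the Kyte-Doolittle hydropathy scale is used here as an assumption; the window length of 7 and the function name `hydropathy_profile` are also illustrative choices.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the slide does not name one).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=7):
    """Return (center_index, mean_score) for every full window along the sequence."""
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    half = window // 2
    profile = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        profile.append((start + half, mean))  # 0-based index of the window center
    return profile


if __name__ == "__main__":
    # Hypothetical sequence: a hydrophobic stretch followed by a charged one.
    for index, score in hydropathy_profile("IIVLLAAKDEKRDE"):
        print(index, round(score, 2))
```

On this scale positive means hydrophobic, so sustained positive stretches suggest buried or membrane-spanning segments, and negative stretches suggest exposed ones.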
27The α-helix is a common secondary structure element
- A helical wheel is a representation of the 3D structure of the α-helix.
- Projection of aa side chains onto a plane perpendicular to axis of helix
- Hydrophobic arcs stabilize helical interactions
- Amphipathic helices are common
[Figure: helical wheel with acidic and nonpolar residues labeled]
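A helical wheel can be computed directly: an ideal α-helix has about 3.6 residues per turn, so each residue is rotated 100 degrees from the previous one. The Python sketch below assigns wheel angles and then measures how tightly the nonpolar residues cluster on one face. The clustering measure (`nonpolar_clustering`, a mean resultant length between 0 and 1) is a simple illustrative statistic assumed for this example, not a published hydrophobic moment; the nonpolar set is the one listed earlier in the lecture.

```python
import math

DEGREES_PER_RESIDUE = 100.0      # 360 degrees / 3.6 residues per turn
NONPOLAR = set("GAVLIPFMWC")     # nonpolar family as listed in the lecture


def helical_wheel(sequence):
    """Return (residue, angle in degrees) for each position on the wheel."""
    return [(aa, (i * DEGREES_PER_RESIDUE) % 360.0)
            for i, aa in enumerate(sequence.upper())]


def nonpolar_clustering(sequence):
    """Mean resultant length of nonpolar positions: 1.0 means all on one spoke."""
    angles = [math.radians(angle) for aa, angle in helical_wheel(sequence)
              if aa in NONPOLAR]
    if not angles:
        return 0.0
    x = sum(math.cos(a) for a in angles)
    y = sum(math.sin(a) for a in angles)
    return math.hypot(x, y) / len(angles)


if __name__ == "__main__":
    # Hypothetical peptide, for illustration only.
    for residue, angle in helical_wheel("LKELLKKLLEELLKK"):
        print(residue, angle)
    print(round(nonpolar_clustering("LKELLKKLLEELLKK"), 2))
```

A value near 1 indicates a single hydrophobic arc, as expected for an amphipathic helix; a value near 0 means the nonpolar residues are spread around the wheel.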
28Secondary Structure Prediction
- The presence of secondary structure elements can be predicted.
- Current algorithms rely on:
- statistics (Chou-Fasman, GOR)
- homology or nearest neighbor comparisons (Levin)
- physico-chemical properties (Lim, Eisenberg)
- pattern matching (Cohen, Rooman)
- neural networks (Qian and Sejnowski, Karplus)
- evolutionary methods (Barton, Niemann)
- and combined approaches (Rost, Levin, Argos)
29Chou-Fasman Algorithm
1. Assign each residue a Pa, Pb, Pc value
2. Take a window of 7 residues and calculate a window-averaged value for all Pa, Pb, Pc
3. Assign the average value for each of the secondary structures to the middle residue
4. Move down one residue and repeat steps 2 through 3 until finished
5. Scan and assign SS to the highest P/residue
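The windowed procedure above can be sketched in Python as follows. This follows the simplified steps listed on the slide only; it is not the full published Chou-Fasman method with its nucleation and extension rules. The propensity table `TOY_PROPENSITIES` holds placeholder values for three residues, invented for illustration, and is not the published statistics. Shrinking the window at the chain ends is also an assumption, since the slide does not say how the ends are handled.

```python
# Placeholder (Pa, Pb, Pc) values for illustration only; NOT the published table.
TOY_PROPENSITIES = {
    "A": (1.4, 0.8, 0.7),
    "V": (1.0, 1.7, 0.6),
    "G": (0.6, 0.8, 1.5),
}
STATES = "HEC"  # helix, strand, coil, in the same order as (Pa, Pb, Pc)


def predict_secondary_structure(sequence, propensities, window=7):
    """Assign each residue the state with the highest window-averaged propensity."""
    sequence = sequence.upper()
    half = window // 2
    prediction = []
    for i in range(len(sequence)):
        # Window centered on residue i, truncated at the chain ends (assumption).
        segment = sequence[max(0, i - half):min(len(sequence), i + half + 1)]
        averages = [sum(propensities[aa][k] for aa in segment) / len(segment)
                    for k in range(3)]
        best = max(range(3), key=lambda k: averages[k])
        prediction.append(STATES[best])
    return "".join(prediction)


if __name__ == "__main__":
    # Hypothetical sequence built only from residues present in the toy table.
    print(predict_secondary_structure("AAAAAAAGGGGVVVVVVV", TOY_PROPENSITIES))
```

Passing the table as an argument keeps the algorithm separate from the statistics, so a complete 20-residue table can be substituted without changing the code.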
30Chou-Fasman Statistics
31The PhD Approach
PRFILE...
32The PhD Algorithm
- Search the SWISS-PROT database and select high scoring homologues
- Create a sequence profile from the resulting multiple alignment
- Include global sequence info in the profile
- Input the profile into a trained two-layer neural network to predict the structure and to clean up the prediction
33Predicting via Neural Nets PSSM
- PHDhtm
- http://www.embl-heidelberg.de/predictprotein/
- TMAP
- http://www.mbb.ki.se/tmap/index.html
- TMPred
- http://www.ch.embnet.org/software/TMPRED_form.html
34Prediction Performance
35Best of the Best
- PredictProtein-PHD (72%)
- http://cubic.bioc.columbia.edu/predictprotein
- Jpred (73-75%)
- http://www.compbio.dundee.ac.uk/www-jpred/
- PREDATOR (75%)
- http://www.hgmp.mrc.ac.uk/Registered/Option/predator.html
- PSIpred (77%)
- http://bioinf.cs.ucl.ac.uk/psipred/
36Basics of Protein Structure
- Primary
- Secondary
- Tertiary
37Tertiary Structure
38Protein Structure Databases
- Where does protein structural information reside?
- PDB
- http://www.rcsb.org/pdb/
- MMDB
- http://www.ncbi.nlm.nih.gov/Structure/
- FSSP
- http://www.ebi.ac.uk/dali/fssp/
- SCOP
- http://scop.mrc-lmb.cam.ac.uk/scop/
- CATH
- http://www.biochem.ucl.ac.uk/bsm/cath_new/
39Structural Proteomics
- Aim to delineate total repertoire of protein folds
- Provide 3D portraits for all proteins in an organism
- Goal: Use structure to infer function.
- Compare structure of unknown protein to known set of structures
- More sensitive than primary sequence comparisons
40The Protein Fold Universe
How Big Is It? 500? 2000? 10000? ∞?
41Structures in PDB
- PDB: 19860 structures (Jan 03), 23997 structures (Jan 04)
- "structural genomics" search: 156 structures (Jan 03), 478 structures (Jan 04)
42Structural Proteomics
[Figure: plot of Sequences vs. Structures, y-axis from 0 to 100000]
43Unique folds in PDB
44Prediction Methods for 3D structure
- Intermediate Steps
- Predict secondary structure
- Calculate solvent accessibility
- Methods for 3D structure prediction based on:
- Threading, Homology Modeling or Fold recognition
- Similarity in amino acid sequence implies similar structure/function
- Ab Initio Techniques
- Numerical methods designed to simulate the structure and dynamics of macromolecules
45Proteomics
- The study of the expression, location, interaction, function and structure of all the proteins in a given cell or organism
- Expressional Proteomics
- Functional Proteomics
- Structural Proteomics
46Proteomics
- Expressional Proteomics
- 2D or Capillary Electrophoresis, protein chips
- Mass Spectrometry, Laser induced fluorescence
- Functional Proteomics
- Mass Spectrometry, micro-assays, protein chips
- Yeast or Bacterial 2-hybrid systems
- Structural Proteomics
- High throughput X-ray crystallography
- High throughput NMR spectroscopy
472D Gel Principles
SDS PAGE
48Mass Spec Principles
[Figure: mass spectrometer schematic: Sample, Ionizer, Mass Filter, Detector]
49Ionization Methods
[Figure: MALDI (370 nm UV laser, cyano-hydroxy cinnamic acid matrix) and ESI (fluid with no salt, gold tip needle)]
50Protein ID Protocol
51Computational Tools for Protein Identification
- PeptIdent
- http://us.expasy.org/tools/peptident.html
- Mascot
- http://www.matrixscience.com/search_form_select.html
- ProteinProspector
- http://prospector.ucsf.edu/
- MOWSE
- http://srs.hgmp.mrc.ac.uk/cgi-bin/mowse
- PeptideSearch
- http://www.mann.embl-heidelberg.de/GroupPages/PageLink/peptidesearchpage.html
- AACompSim/AACompIdent
- http://www.expasy.ch/tools
Covered in Lab 6.4
52Proteomics
- Human proteome estimated to contain 500,000 proteins
- The next big wave in bioinformatics
- How to deal with so much data?
- How to link structure to function to sequence?
- How to show or store temporal and spatial data?
- How to use it in drug discovery and development?
Proteomics Workshop, July 19 to 24, 2004, Calgary, Alberta
53The Cutting Edge of Proteomics
- Evolution of Proteomes
- Structural Genomics
- Quantitative Mass Spectrometry and Protein Chip Technology
- Chemical Proteomics
- Proteome Scale Analysis of Networks, i.e., signal transduction, Y2H experiments
54Global Proteome Interaction Mapping in C. elegans
Science, 23 January 2004, 303: 540
see also Science, 7 January 2000, 287: 116
55Yeast Two Hybrid (Y2H) on the genomic scale
- Global interaction map of C. elegans
- Use proteome as bait in Y2H experiment
- Detect all pairwise interactions
- Create global protein-protein interaction network
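Turning pairwise Y2H hits into a network is mostly bookkeeping. The Python sketch below builds an undirected interaction map from (bait, prey) pairs and ranks proteins by their number of partners. The protein identifiers and function names are hypothetical, and treating a bait-prey hit as symmetric is an assumption; real screens also need filtering for false positives, which is not shown.

```python
from collections import defaultdict


def build_interaction_network(pairs):
    """Build an undirected network: protein -> set of interaction partners."""
    network = defaultdict(set)
    for bait, prey in pairs:
        # A bait-prey hit is treated as a symmetric interaction (assumption).
        network[bait].add(prey)
        network[prey].add(bait)
    return dict(network)


def degree_ranking(network):
    """Return (partner_count, protein) tuples, most connected first."""
    return sorted(((len(partners), protein)
                   for protein, partners in network.items()), reverse=True)


if __name__ == "__main__":
    # Hypothetical hits; duplicates and reciprocal hits collapse into one edge.
    hits = [("P1", "P2"), ("P2", "P3"), ("P2", "P1"), ("P4", "P2")]
    network = build_interaction_network(hits)
    for count, protein in degree_ranking(network):
        print(protein, count, sorted(network[protein]))
```

Storing partners in sets means a pair seen in both orientations, bait-prey and prey-bait, is counted once.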
56Protein-Protein Interaction Networks
57DNA vs Protein Chip Technology
- DNA microtechnology
- Can successfully read 1000s of side by side measurements of RNA levels
- BUT RNA ≠ protein function
- Protein Microarray Technology
- Goal: develop protein chip with proteins in active state.
- Proteins more challenging to prepare than DNA/RNA
- Protein functionality depends on state, modifications, binding partners, localization etc.
58Protein Chip - Methods
- Attachment Methods
- Diffusion
- Adsorption
- nitrocellulose
- Covalent Crosslinking
- Reactive surfaces
- Affinity Attachment
- Affinity tags
59Protein Chip - Applications
- Antibody Chip
- Detect Ag-Ab interactions
- Protein Chip
- Protein-protein
- Protein-drug
- Enzyme-substrate
- Ligand Chip
- And more.
60Protein Chips
61Summary
- Protein sequences, and subsequently protein sequence databases, are much more complex than DNA
- Prediction of protein structure is a complex problem at both the 2D and 3D levels
- Proteomics initiatives based on different technologies are making inroads into the study of protein structure and function on a global level