Title: CS 177 Proteins part 1: Structure-function relationships
1CS 177 Proteins part 1
Structure-function relationships
Review of protein structures Need for
analyses of protein structures Sources of
protein structure information Computational
Modeling
2Need for analyses of protein structures
A protein performs metabolic, structural, or
regulatory functions in a cell. Cellular
biochemistry works based on interactions between
3-D molecular structures
The 3-D structure of a protein determines its
function
Therefore, the relationship of sequence to
function is primarily concerned with
understanding the 3-D folding of proteins and
inferring protein functions from these 3-D
structures (e.g. binding sites, catalytic
activities, interactions with other molecules)
The study of protein structure is not only of
fundamental scientific interest in terms of
understanding biochemical processes, but also
produces very valuable practical benefits
Medicine: The understanding of enzyme function allows the design of new and improved drugs
Agriculture: Therapeutic proteins and drugs for veterinary purposes and for treatment of plant diseases
Industry: Protein engineering has potential for the synthesis of enzymes to carry out various industrial processes on a mass scale
3Need for analyses of protein structures
Protein 3-D structure has direct medical
implications: an incorrectly folded protein will
not function properly. Examples:
4Need for analyses of protein structures
Examples of diseases associated with protein
misfolding (cont.)
- Alzheimer's disease: New studies indicate that
Alzheimer's disease may be caused by small clumps
of wrongly folded proteins. Scientists have found
that misfolded amyloid beta protein molecules
hinder memory processes in rat brains by blocking
synapses
References
1. Walsh, D. M. et al. Naturally secreted oligomers of amyloid β protein potently inhibit hippocampal long-term potentiation in vivo. Nature, 416, 535-539 (2002).
2. Bucciantini, M. et al. Inherent toxicity of aggregates implies a common mechanism for protein misfolding diseases. Nature, 416, 507-511 (2002).
CT scan of the brain of an Alzheimer's patient
showing widespread destruction (pink) of brain
tissue (green)
5Need for analyses of protein structures
Examples of diseases associated with protein
misfolding (cont.)
- Transmissible Spongiform Encephalopathies
(TSEs) (such as mad cow disease or the human
version, Creutzfeldt-Jakob disease):
The infectious agent is probably a small misfolded
protein called a prion. Prions occur naturally
in the brain, with unknown function. Infectious
prions can cause correctly folded proteins to
misfold. Domino effect: large numbers of
misfolded prions cause neural degeneration
- Other non-infectious brain diseases such as
Parkinson's, Huntington's, and Lou Gehrig's disease.
6Sources of protein structure information
3-D macromolecular structures are stored in databases
The most important database: the Protein Data Bank (PDB)
The PDB is maintained by the Research Collaboratory for Structural Bioinformatics (RCSB) and can be accessed at three different sites (plus a number of mirror sites outside the USA):
- http://rcsb.rutgers.edu/pdb (Rutgers University)
- http://www.rcsb.org/pdb/ (San Diego Supercomputer Center)
- http://tcsb.nist.gov/pdb/ (National Institute of Standards and Technology)
It is the very first bioinformatics database ever built
8The Protein Data Bank (PDB)
PDB: 20,254 structures (4 March 2003)
SwissProt: 122,564 entries (5 March 2003)
Ratio: 1:6 (structure of more than 83% of
proteins still unknown)
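The ratio on this slide can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes every PDB structure and every SwissProt entry stands for one distinct protein, which overstates structural coverage because many PDB entries describe the same protein.

```python
# Counts quoted on the slide (March 2003).
pdb_structures = 20_254
swissprot_entries = 122_564

# Assumption: one PDB structure per SwissProt entry at best.
fraction_known = pdb_structures / swissprot_entries
sequences_per_structure = swissprot_entries / pdb_structures
percent_unknown = 100 * (1 - fraction_known)

print(f"about 1 structure per {sequences_per_structure:.1f} sequences")
print(f"{percent_unknown:.1f}% of sequences have no known structure")
```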
9Sources of protein structure information
Experimental structure determination
In practice, most biomolecular structures (>99% of structures in PDB) are determined using three techniques:
- X-ray crystallography (low to very high resolution). Problem: requires crystals; it is difficult to crystallize proteins while maintaining their native conformation; not all proteins can be crystallized
- Nuclear magnetic resonance (NMR) spectroscopy of proteins in solution (medium to high resolution). Problem: works only with small and medium-size proteins (50% of proteins cannot be studied with this method); requires high solubility
- Electron microscopy and crystallography (low to medium resolution). Problem: (still) relatively low resolution
Experimental methods are still very time
consuming and expensive. In most cases the
experimental data will contain errors and/or be
incomplete. Thus the initial model needs to be
refined and rebuilt
10Sources of protein structure information
Computational Modeling
Researchers have been working for decades to
develop procedures for predicting protein
structure that are not so time consuming and not
hindered by size and solubility constraints. As
protein sequences are encoded in DNA, it should
in principle be possible to
translate a gene sequence into an amino acid
sequence, and to predict the three-dimensional
structure of the resulting chain from this amino
acid sequence
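The first half of that idea, translating a gene sequence into an amino acid sequence, is the easy part. A minimal sketch, assuming the standard genetic code, an intron-free coding sequence read from its first base, and a hypothetical helper named `translate`:

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]  # KeyError on non-ACGT letters
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGAAATTTGGGTAA"))
```

The second half, going from the amino acid chain to a 3-D structure, is the hard problem the remaining slides deal with.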
11Some common terminology used in homology modeling
12Computational modeling
Gene finding
Identification of protein coding regions within DNA sequences (ORFs)
This is one of the single biggest challenges facing the bioinformatics specialists working on Genome Projects
Existing software is only about 90% accurate in predicting genes in large stretches of genomic DNA
The problem gets worse in eukaryotic genomes because of the common occurrence of pseudogenes that are highly similar to real genes, but are not transcribed
13Computational modeling
How to find genes?
Similarity search against the expressed sequence tag (EST) database (e.g. dbEST)
Translation and similarity search against the protein databanks (e.g. SWISS-PROT and GenPept)
- automatic translate and search functions are implemented in BLASTX and TFASTA
- if a protein (or EST sequence) matches, it can be aligned with the unknown genomic sequence: start and stop codons should line up nicely and the introns should be obvious
- a small error rate remains
If there are no handy template sequences in the databanks, one must rely on knowledge of the DNA code
- the translation initiation site is generally an ATG codon; transcription usually starts about 30 bp downstream from a TATA-box sequence (TATAAA or some close approximation)
- a graphic map of all 6 reading frames can be produced to search for a long ORF
- several software packages are available that map ORFs (e.g. FRAMES, GeneWorks, MacVector, DNA Strider, GRAIL, ORF finder, DNA translation, BCM GeneFinder)
- problem: none of those programs is perfect; errors will occur
- confirming evidence can be collected by looking for regulatory sequences (promoters, enhancers, transcription factor binding sites; also known as signal sequences) that generally occur near ORFs. Several databases for signal sequences are available (e.g. TransFac) and several software tools make use of these databases (e.g. Signal Scan, FindPatterns)
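The "map all 6 reading frames and look for a long one" step can be sketched as follows. This is a toy illustration, not how the packages listed above work: it assumes an intron-free sequence containing only A, C, G and T, defines an ORF as ATG up to the next in-frame stop codon, and uses hypothetical function names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the reverse complement strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def orfs_in_frame(dna, frame):
    """Yield (start, end) for each ATG...stop ORF in one reading frame."""
    start = None
    for i in range(frame, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            yield start, i + 3
            start = None


def longest_orf(dna):
    """Scan all six frames; return (orf_sequence, strand, frame)."""
    dna = dna.upper()
    best = ("", None, None)
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            for start, end in orfs_in_frame(seq, frame):
                if end - start > len(best[0]):
                    best = (seq[start:end], strand, frame)
    return best


print(longest_orf("CCATGAAATTTGGGTAACC"))
```

Real gene finders add much more (splice-site models, codon usage statistics, regulatory signals), which is part of why none of them is perfect.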
14Computational modeling
How to predict the protein structure?
Ab initio prediction of protein structure from sequence: not yet.
Problem: the information contained in protein structures lies essentially in the conformational torsion angles. Even if we only assume that every amino-acid residue has three such torsion angles, and that each of these three can only assume one of three "ideal" values (e.g., 60, 180 and -60 degrees), this still leaves us with 27 possible conformations per residue.
For a typical 200-amino acid protein, this would give 27^200 (roughly 1.87 x 10^286) possible conformations!
Q: Can't we just generate all these
conformations, calculate their energy, and
see which conformation has the lowest
energy?
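The size of that search space is easy to check with exact integer arithmetic. The sketch below also estimates how long exhaustive enumeration would take; the rate of 10^12 conformations per second is a purely hypothetical assumption chosen to be generous.

```python
import math

# Three torsion angles per residue, three "ideal" values each.
states_per_residue = 3 ** 3
residues = 200
conformations = states_per_residue ** residues  # exact integer 27**200

# Scientific notation: mantissa x 10^exponent.
exponent = len(str(conformations)) - 1
mantissa = conformations / 10 ** exponent
print(f"27^200 is roughly {mantissa:.2f} x 10^{exponent}")

# Hypothetical assumption: 1e12 conformations evaluated per second.
log10_rate = 12
log10_seconds_per_year = math.log10(3600 * 24 * 365)
log10_years = math.log10(conformations) - log10_rate - log10_seconds_per_year
print(f"exhaustive search would take on the order of 10^{log10_years:.0f} years")
```

Even under that assumption the enumeration is far out of reach, which is why the following slides turn to homology modeling instead.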
15Computational modeling
Solution: homology modeling
Homology (comparative) modeling attempts to
predict structure on the strength of a protein's
sequence similarity to another protein of known
structure
Basic idea: a significant alignment of the query
sequence with a target sequence from PDB is
evidence that the query sequence has a similar
3-D structure (current threshold: 40% sequence
identity). Then multiple sequence alignment and
pattern analysis can be used to predict the
structure of the protein
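The 40% threshold refers to the sequence identity of an alignment. A minimal sketch of that calculation, assuming the two sequences are already aligned with "-" as the gap character and that identity is counted over gap-free columns only (other conventions exist); the function name and the example sequences are made up for illustration.

```python
def percent_identity(aligned_query, aligned_target):
    """Percent identical residues over the gap-free alignment columns."""
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (q, t)
        for q, t in zip(aligned_query, aligned_target)
        if q != "-" and t != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(q == t for q, t in columns)
    return 100.0 * matches / len(columns)


IDENTITY_THRESHOLD = 40.0  # threshold quoted on the slide

identity = percent_identity("MKT-AYIAK", "MKTLAYLAR")
print(identity, identity >= IDENTITY_THRESHOLD)
```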
16Computational modeling
Flow chart for protein structure prediction (from
Mount, 2001)
17Computational modeling
Protein sequence - partial or full sequences
predicted through gene finding
18Computational modeling
Database similarity search - sequence is used as
a query in a database similarity search against
proteins in PDB
19Computational modeling
- Does the sequence align with a protein of known structure?
  - Yes: if the database similarity search reveals a significant alignment between the query sequence and a PDB target sequence, the alignment can be used to position the amino acids of the query sequence in the same approximate 3-D structure
  - No: proceed to protein family analysis
20Computational modeling
- Protein family analysis/relationship to known structure
  - Family (structural context): structures that have a significant level of structural similarity but not necessarily significant sequence similarity
  - the goal is to exploit these structure-sequence relationships; two questions: 1) is the new protein a member of a family, 2) does the family have a predicted structural fold?
  - analyze sequence for family-specific profiles and patterns (available databases: 3D-Ali, 3D-PSSM, BLOCKS, eMOTIF, INTERPRO, Pfam)
  - if the family analysis reveals that the query protein is a member of a family with a predicted structural fold, multiple alignment can be used for structural modeling
21Computational modeling
- Protein family analysis/relationship to known structure
  - if the family analysis is unsuccessful, proceed to structural analyses
22Computational modeling
- Structural analysis
  - several different types of analyses to infer structural information
  - presence of small amino acid motifs in a protein can be an indicator of a biochemical function associated with a particular structure. Motifs are available from the Prosite catalog
  - spacing and arrangement of amino acids (e.g. hydrophobic amino acids) provide important structural clues that can be used for modeling
  - certain amino acid combinations can occur in certain types of secondary structure
  - These structural analyses can provide clues as to the presence of active sites and regions of secondary structure. This information can help to identify a new protein as a member of a known structural class
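Scanning a sequence for a small motif can be sketched by converting a PROSITE-style pattern into a regular expression. The converter below is a simplified assumption that handles only the basic syntax (x, [..], {..}, repeat counts in parentheses, and the < and > anchors); the example pattern N-{P}-[ST]-{P} is the PROSITE N-glycosylation site, and the test sequence is made up.

```python
import re


def prosite_to_regex(pattern):
    """Convert a basic PROSITE pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P} -> [^P]
        element = element.replace("(", "{").replace(")", "}")   # x(2,4) -> x{2,4}
        element = element.replace("x", ".")                     # x = any residue
        element = element.replace("<", "^").replace(">", "$")   # terminal anchors
        parts.append(element)
    return "".join(parts)


def find_motif(pattern, sequence):
    """Return (position, matched_text) for all hits, overlaps included."""
    regex = prosite_to_regex(pattern)
    # A lookahead lets overlapping occurrences be reported as well.
    return [
        (m.start(), m.group(1))
        for m in re.finditer(f"(?=({regex}))", sequence)
    ]


print(find_motif("N-{P}-[ST]-{P}", "MKNGSAANPTQ"))
```

Short motifs like this one match by chance fairly often, so a hit is a clue to follow up rather than proof of function.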
23Computational modeling
- 3-D structural analysis in lab
- proteins that fail to show any relationship to
proteins of known structure are candidates for
structural analyses (X-ray crystallography, NMR).
There are about 600 known fold families, and new
structures are frequently found to have an already
known structural fold. Accordingly, protein
families with no relatives of known structure may
represent a novel fold
24Computational modeling summary
Partial or full sequences predicted through gene finding
Similarity search against proteins in PDB
Find structures that have a significant level of structural similarity (but not necessarily significant sequence similarity)
Alignment can be used to position the amino acids of the query sequence in the same approximate 3-D structure
If member of a family with a predicted structural fold, multiple alignment can be used for structural modeling
Infer structural information (e.g. presence of small amino acid motifs; spacing and arrangement of amino acids; certain typical amino acid combinations associated with certain types of secondary structure) that can provide clues as to the presence of active sites and regions of secondary structure
Structural analyses in the lab (X-ray crystallography, NMR)
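The order of decisions summarized on this slide can be written down as a small function. This is only one reading of the flow chart; the function name, the boolean inputs and the returned labels are all hypothetical.

```python
def next_step(aligns_with_pdb, family_has_fold, structural_class_found):
    """Pick the modeling route, following the order of the flow chart."""
    if aligns_with_pdb:
        # Significant alignment with a PDB sequence.
        return "position residues using the pairwise alignment"
    if family_has_fold:
        # Member of a family with a predicted structural fold.
        return "model from the multiple alignment of the family"
    if structural_class_found:
        # Motifs, residue spacing or secondary structure gave a clue.
        return "assign to a known structural class"
    # No relationship to any known structure.
    return "structural analysis in the lab (X-ray crystallography, NMR)"


print(next_step(False, True, False))
```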
25Computational modeling summary
How to predict the protein structure?
Ab initio prediction of protein structure from
sequence
Homology (comparative) modeling attempts to
predict structure on the strength of a
protein's sequence similarity to another protein
of known structure
Experimental structure determination
26Computational modeling summary
27Computational modeling
Viewing protein structures
A number of molecular viewers are freely available and run on most computer platforms and operating systems
Examples:
- Cn3D 4.0 (stand-alone)
- Rasmol (stand-alone)
- Chime (Web-browser based, built on Rasmol)
- Swiss 3D viewer Spdbv (stand-alone)
All these viewers can use the PDB identification code or the structural file from PDB
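The structural file these viewers read is, in the classic PDB flat-file format, plain text with fixed-width columns. A minimal sketch of reading one ATOM record, assuming the standard column layout; the sample record is made up and is assembled field by field so the column widths stay visible.

```python
# One made-up ATOM record, built from its fixed-width fields.
SAMPLE_ATOM = (
    "ATOM  "      # columns 1-6: record name
    "    1"       # columns 7-11: atom serial number
    " "
    " N  "        # columns 13-16: atom name
    " "           # column 17: alternate location
    "MET"         # columns 18-20: residue name
    " "
    "A"           # column 22: chain identifier
    "   1"        # columns 23-26: residue sequence number
    "    "        # column 27 (insertion code) and 28-30 (blank)
    "  27.340"    # columns 31-38: x in Angstroms
    "  24.430"    # columns 39-46: y
    "   2.614"    # columns 47-54: z
)


def parse_atom_line(line):
    """Extract the main fields from one ATOM record."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


print(parse_atom_line(SAMPLE_ATOM))
```

A viewer does essentially this for every ATOM line and then draws the resulting coordinates.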