Title: PowerPoint Presentation
1Protein Structure Predictions
Molecular Biology for the Environment An EU-US
Course in Environmental Biotechnology
Amalia Muñoz, CNB - CSIC
2http://www.rcsb.org/pdb
3 PROTEIN STRUCTURE PREDICTION
KESAAAKFER QHMDSGNSPS SSSNYCNLMM CCRKMTQGKC
KPVNTFVHES LADVKAVCSQ KKVTCKNGQT NCYQSKSTMR
DASV
4Secondary Structure Predictions
Predictions in which a one-letter or one-number
code is assigned to each amino acid, correlating
with one of its characteristics (1D characteristics).
http://www.envbiocourse.rutgers.edu/eu-us/
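A minimal sketch of what a 1D prediction looks like: one code letter is produced for every residue of the sequence. The three-class grouping (hydrophobic, charged, polar) and the names `HYDROPHOBIC`, `CHARGED` and `encode_1d` are illustrative assumptions and do not come from any of the servers in this course.

```python
# Illustrative residue grouping (an assumption for this sketch, not a published scale).
HYDROPHOBIC = set("AVLIMFWC")
CHARGED = set("DEKR")

def encode_1d(sequence: str) -> str:
    """Return a string of the same length with one class letter per residue."""
    codes = []
    for aa in sequence.upper():
        if aa in HYDROPHOBIC:
            codes.append("h")   # hydrophobic
        elif aa in CHARGED:
            codes.append("c")   # charged
        else:
            codes.append("p")   # everything else is labelled polar here
    return "".join(codes)

# First ten residues of the example sequence shown on slide 3.
print(encode_1d("KESAAAKFER"))
```

The output string has exactly one character per residue, which is the property that makes a feature "1D".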
5- Key Antecedents
- 1951. Pauling & Corey suggest patterns of local conformation such as alpha helices and beta sheets.
- 1957. Szent-Györgyi & Cohen connected the amount of some of the amino acids with the alpha-helix content.
- 1960. Blout, Fasman et al.; 1962. Blout expanded this idea to both alpha helices and beta sheets, and to all the amino acids.
- 1960. Kendrew et al. and Perutz et al. characterized the first protein structures: myoglobin and hemoglobin.
6- Database searches
- generation of multiple sequence alignments (MaxHom)
- detection of functional motifs (PROSITE)
- detection of composition bias (SEG)
- detection of protein domains (PRODOM)
- Fold recognition by prediction-based threading (TOPITS)
- Predictions of
- secondary structure (PHDsec and PROFsec)
- residue solvent accessibility (PHDacc and PROFacc)
- transmembrane helix location and topology (PHDhtm, PHDtopology)
- protein globularity (GLOBE)
- coiled-coil regions (COILS)
- cysteine bonds (CYSPRED)
7 1D
KESAAAKFER QHMDSGNSPS SSSNYCNLMM CCRKMTQGKC KPVNTFVHES
HHHHHHH HH SSTT T HHHHHH HHTT SSSS SEEEEE S
LADVKAVCSQ KKVTCKNGQT NCYQSKSTMR ITDCRETGSS KYPNCAYKTT
HHHHHGGGGS EEE TTS S EEE SSEEE EEEEEE TTT BTTB EEEE
QVEKHIIVAC GGKPSVPVHF DASV
EEEEEEEEEE ETTTTEE EE EE
Secondary Structure Predictions
The secondary structure of a protein (alpha, beta, loop) can be
predicted from its amino acid sequence. The
secondary structure is generally assigned from
non-local interactions, that is, from the
H-bonding profile between the CO and NH groups of the
protein backbone.
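The assignment strings on this slide use DSSP-style letters (H, G, E, B, T, S), whereas prediction servers usually report three states. The sketch below shows one common, but not universal, reduction; the mapping table and the name `reduce_to_three_states` are assumptions for illustration.

```python
# One common reduction of DSSP states to three classes (an assumption of this sketch):
# H, G, I -> helix (H); E, B -> strand (E); anything else, including blanks -> loop (L).
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}

def reduce_to_three_states(dssp: str) -> str:
    """Map every character of a DSSP string to H, E or L."""
    return "".join(DSSP_TO_3.get(state, "L") for state in dssp)

print(reduce_to_three_states("HHHHGGGGS EEE TTS"))
```

Other reductions exist (for example, treating G as loop), and the choice changes the accuracy figures a method reports.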
8- Available Servers
- PHDsec: neural network that applies multiple alignments. Reliability 70%.
- Jpred2: two neural networks and evolutionary information (PsiBlast). Ver. 2 combines results from 4 networks (JNet, NSSP, Predator, PHD).
- PROF: based on multiple alignments in addition to other characteristics of the amino acids taken from databases. Reliability 70%.
- PSIpred: uses PsiBlast profiles (filtering the results) and neural networks (it combines the results from several prediction methods). Reliability > 76%.
- SAM-T99: neural network and multiple alignment profiles improved through the use of hidden Markov models.
- SSpro: recurrent bidirectional neural networks (using small, fixed windows that allow the use of the whole sequence as input).
9Most of these methods apply either neural
networks or other algorithms that are trained
with proteins of known structure. In some cases
additional information from multiple alignments
is also considered in the predictions.
Scheme for PHD Protein Prediction Methods. Rost et
al. (1997) J. Mol. Biol. 270: 471-480
10- Advantages and Problems
- Advantages
- Reliability (3-state predictions) > 70%
- Reliability for betas alphas loops
- Problems
- bad alignment >> wrong predictions
- long-range interactions >> problems differentiating alphas and betas
- problems evaluating unusual proteins
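Three-state reliability is commonly reported as a per-residue accuracy (often called Q3): the percentage of residues whose predicted state matches the observed one. The sketch below assumes the figures quoted on these slides are of that kind; the function name `q3` and the example strings are hypothetical.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)

# Made-up ten-residue example: two positions disagree.
print(q3("HHHHLLEEEL", "HHHLLLEEEE"))
```

A single overall percentage hides per-class differences, so it is often computed separately for helix, strand and loop residues as well.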
11Example of the output for the PHD server
12Example of the output for the PHD server
13Other Features of Secondary Structure to Predict
14 Solvent Accessibility
Accessibility to the solvent is of interest for the modeling of a
sequence. The most detailed method, which evaluates the
volume each residue exposes to the solvent,
was developed by Connolly and is implemented
in DSSP. (Output: accessibility < 16, buried;
≥ 16, exposed.)
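The two-state output described here amounts to a threshold on the per-residue accessibility value. A minimal sketch, assuming the cutoff of 16 stated on the slide and using hypothetical labels `b` and `e`:

```python
def two_state_accessibility(values, threshold=16.0):
    """Label each residue 'b' (buried) or 'e' (exposed) from its accessibility value."""
    return "".join("b" if acc < threshold else "e" for acc in values)

# Made-up accessibility values for six residues.
print(two_state_accessibility([0, 4, 16, 49, 9, 81]))
```

The threshold is a parameter, so the same function can be reused if a server applies a different cutoff.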
15- Available Servers
- PHD
- PROFphd
- JPred2
- PHD and PROFphd (from PredictProtein) apply neural networks and multiple alignment information.
- These servers provide numeric values for the accessibility (matrix with values of 0, 1, 4, 9, 16, 25, 36, 49, 64, 81).
- JPred2 uses PsiBlast profiles as input for the neural networks and returns two-state values (buried or exposed).
16- Transmembrane Proteins
- One of the biggest challenges of proteomics is the determination of the structure of transmembrane proteins (difficult to crystallize and to determine by NMR).
- There are two main groups of transmembrane proteins
- those that place alpha helices in the membrane and,
- those forming pores made of beta barrels (porin type).
- So far there are no publicly available servers to predict the second type of transmembrane proteins (because of the lack of experimental information). However, the situation is quite different for the first type.
- The 3D structure of these proteins can be determined, knowing the precise position of their helices, just by checking all the possible conformations.
17- Available Servers
- MEMSAT: uses dynamic programming based on statistical preferences
- TMAP: uses statistical preferences and alignment profiles
- PHD: combines neural networks and evolutionary information within dynamic programming to optimize predictions
- DAS: optimizes the use of hydrophobic profiles
- SOSUI: combines hydrophobic preferences and amphipathicity profiles
- TMHMM: the most advanced and reliable method. It uses statistical information and a hidden Markov model to optimize predictions
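Several of these servers start from a hydrophobicity profile along the sequence. The sketch below is not the algorithm of any server listed; it is a plain sliding-window average using the Kyte-Doolittle hydropathy scale, with a window of 19 residues and a cutoff of 1.6 taken as assumed defaults. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence.

    Raises KeyError for letters outside the 20 standard amino acids.
    """
    values = [KD[aa] for aa in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

def candidate_segments(sequence, window=19, cutoff=1.6):
    """0-based start positions of windows whose mean hydropathy exceeds the cutoff."""
    profile = hydropathy_profile(sequence, window)
    return [i for i, score in enumerate(profile) if score > cutoff]

# Made-up sequence: a leucine stretch flanked by charged residues.
print(candidate_segments("KKKK" + "L" * 19 + "DDDD"))
```

A profile like this only flags candidate helices; the servers above add statistics, alignments or hidden Markov models to decide helix boundaries and topology.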
18- Other topology features
- protein globularity (GLOBE)
- http://cubic.bioc.columbia.edu/predictprotein
- coiled-coil regions (COILS)
- http://www.ch.embnet.org/software/COILS_form.html
- cysteine bonds (CYSPRED)
- http://prion.biocomp.unibo.it/cyspred.html
- EXAMPLE OF PredictProtein SERVER OUTPUT
19- Post-translational Modifications
- ExPASy Proteomics tools
- http://www.expasy.ch/tools/
- PSORT: signal peptides and localization
- TargetP: subcellular localization
- SignalP: signal peptides
- ChloroP: chloroplast transit peptides
- MITOPROT: mitochondrial targeting sequences
- Predotar: mitochondrial and plastid targeting sequences
- NetOGlyc: O-glycosylation sites for mammals
- NDictyOGlyc: GlcNAc O-glycosylation sites for Dictyostelium
- YinOYang: O-beta-GlcNAc attachment sites for eukaryotes
- big-PI Predictor: GPI modification sites (glycosylphosphatidylinositol)
- DGPI: GPI binding and cleavage sites
- NetPhos: phosphorylation sites (Ser, Thr, Tyr) for eukaryotes
20- EVA: EValuation of Automatic servers (B. Rost)
- Continuously and automatically analyses protein structure prediction servers in real time, based on known structures (it is not a metaserver)
- Methods covered
- Predictions 1D (secondary structure, solvent accessibility)
- Predictions 2D (inter-residue distances)
- Predictions 3D (homology modelling)
- Predictions 3D (threading methods restricted to search for homologies among sequences)
- Prediction of novel folds
21EVA EValuation of Automatic Servers (B. Rost)
22Tertiary Structure Predictions
PROTEIN STRUCTURE PREDICTION
KESAAAKFER QHMDSGNSPS SSSNYCNLMM CCRKMTQGKC
KPVNTFVHES LADVKAVCSQ KKVTCKNGQT NCYQSKSTMR
DASV
24- Go to
- http://www.envbiocourse.rutgers.edu/eu-us/
- Enter and then select
- Additional tools
- Links
- Programs & servers on protein modeling
25Example for Protein Structure Predictions: XylR-A
XylR is the central regulator of the toluene
degradation pathway in Pseudomonas sp. XylR
activates the Pu promoter in response to m-xylene
and p-xylene and belongs to the class of
regulators known generically as the NtrC family
of prokaryotic enhancer-binding proteins.
Regulators of this kind activate, at a distance,
promoters dependent on the alternative sigma
factor σ54 and are generally composed of four
separate domains.
26Fasta Format
>XylR-A
MSLTYKPKMQHEDMQDLSSQIRFVAAEGKIWLGEQRMLVMQL
STLASFRREIISLIGVERAKGFFLRLGYQSGLMDAELARKLRPAMREEEV
FLAGPQLYALKGMVKVRLLTMDIAIRDGRFNVEAEWIDSFEVDICRTELG
LMNEPVCWTVLGYASGYGSAFMGRRIIFQETSCRGCGDDKCLIVGKTAEE
WGDVSSFEAYFKSDPIVDE
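In FASTA format a header line starting with `>` names the record and the following lines hold the sequence, which may be wrapped over several lines. A minimal reader can be sketched as follows; the function name `parse_fasta` is hypothetical and the example uses only the first two sequence lines of the record above.

```python
def parse_fasta(text):
    """Return a dict mapping each record identifier to its full sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""   # identifier = first word of the header
            records[name] = []
        elif name is not None:
            records[name].append(line)           # sequence lines are joined later
    return {key: "".join(parts) for key, parts in records.items()}

example = """>XylR-A
MSLTYKPKMQHEDMQDLSSQIRFVAAEGKIWLGEQRMLVMQL
STLASFRREIISLIGVERAKGFFLRLGYQSGLMDAELARKLRPAMREEEV
"""
seqs = parse_fasta(example)
print(seqs["XylR-A"][:10], len(seqs["XylR-A"]))
```

Most servers in this course accept the sequence either in this format or as plain one-letter text without the header.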
27Select appropriate filters
28Using one of the alignment programs: Clustal-W
http://www2.ebi.ac.uk/clustalw/
29Searching for motifs and domains: Pfam / DART /
PRINTS / SMART / Blocks-Prints / InterPro /
ProDom
http://www.sanger.ac.uk/Software/Pfam/
31Using one of the alignment programs: Clustal-W
http://www2.ebi.ac.uk/clustalw/
34Modelling by Threading
36 3D-PSSM threading server
http://www.sbg.bio.ic.ac.uk/3dpssm/
38
query___Seq MSLTYKPKMQ HEDMQDLSSQ IRFVAAEGKI WLGEQRMLVM QLSTLASFRR
d1gesa1_Seq K..HYDYIAI GGGSGGIASI NRAAMYGQKC ALIEAKELGG TCVNVGCVPK

query___Seq EIISLIG..V ERAKGFFLRL GYQ....... SGLMDAELAR KLRPAMREEE
d1gesa1_Seq KVMWHAAQIR EAIHMYGPDY GFDTTINKFN WETLIASRTA YIDRIHTSYE

query___Seq VFLAGPQLYA LKGMVK.... .......VRL LTMDIAIRDG RFNVEAEWID
d1gesa1_Seq NVLGKNNVDV IKGFARFVDA KTLEVNGETI TADHILIATG GRPSHPREPA

query___Seq SFEVDICRTE LGLMNEPVCW TVLGYASGYG SAFMGRRIIF QETSCRGCGD
d1gesa1_Seq NDNINL..EA AGVKTNE... .......... .....KGYIV VDKYQN.TNI

query___Seq DKCLIVGKTA EEWGDVSSFE AYFK...... ....SDPIVD E
d1gesa1_Seq EGIYAVGDNT GAVELTPVAV AAGRRLSERL FNNKPDEHLD .
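A quick way to judge a threading alignment like the one above is the percentage of identical residues over the aligned (non-gap) columns. The sketch below is a hypothetical helper, not part of 3D-PSSM; it assumes gaps are written as `.` or `-` and that spaces are only column separators.

```python
def percent_identity(query: str, template: str, gap_chars: str = ".-") -> float:
    """Percent identity over columns where neither row has a gap."""
    query = query.replace(" ", "")
    template = template.replace(" ", "")
    if len(query) != len(template):
        raise ValueError("aligned rows must have the same length")
    aligned = [(q, t) for q, t in zip(query, template)
               if q not in gap_chars and t not in gap_chars]
    if not aligned:
        return 0.0
    identical = sum(q == t for q, t in aligned)
    return 100.0 * identical / len(aligned)

# First block of the alignment above.
q = "MSLTYKPKMQ HEDMQDLSSQ IRFVAAEGKI WLGEQRMLVM QLSTLASFRR"
t = "K..HYDYIAI GGGSGGIASI NRAAMYGQKC ALIEAKELGG TCVNVGCVPK"
print(round(percent_identity(q, t), 1))
```

Threading hits often have low sequence identity, which is why the fold-recognition score rather than identity is used to rank them.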
3D coordinates of the models
PFRMAT TS
TARGET XylR-A
REMARK After reevaluating this model, it was selected from those offered by 3DPSSM
SCORE 1
MODEL 1
PARENT 1a8h
ATOM 2896 CA GLU 12 36.767 9.288 64.728
ATOM 2904 CA ASP 13 34.886 7.822 61.752
ATOM 2912 CA MET 14 38.030 6.755 59.907
ATOM 2920 CA GLN 15 40.967 6.651 62.303
ATOM 2924 CA ASP 16 39.163 5.046 65.233
ATOM 2932 CA LEU 17 37.431 2.462 63.036
ATOM 2940 CA SER 18 40.852 1.452 61.699
ATOM 2947 CA SER 19 42.365 1.348 65.196
ATOM 2956 CA GLN 20 39.310 -0.573 66.432
ATOM 2967 CA ILE 21 39.408 -3.120 63.604
ATOM 2974 CA ARG 22 43.122 -3.728 64.118
ATOM 3006 CA PHE 23 44.663 -9.425 66.871
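A simple sanity check on a CA-only model is the distance between successive CA atoms, which is about 3.8 Å for residues that are adjacent in a chain. The sketch below assumes the simplified whitespace-separated layout shown on this slide (`ATOM serial CA residue number x y z`), not the fixed-column PDB format; the function names are hypothetical and `math.dist` requires Python 3.8 or later.

```python
import math

def parse_ca_records(text):
    """Parse 'ATOM serial CA resname resnum x y z' lines into (resnum, (x, y, z)) tuples."""
    atoms = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 8 and fields[0] == "ATOM" and fields[2] == "CA":
            resnum = int(fields[4])
            xyz = tuple(float(v) for v in fields[5:8])
            atoms.append((resnum, xyz))
    return atoms

def consecutive_ca_distances(atoms):
    """Distance (in the units of the coordinates) between each pair of successive CA atoms."""
    return [
        (a[0], b[0], math.dist(a[1], b[1]))
        for a, b in zip(atoms, atoms[1:])
    ]

# First three records of the model above.
model = """ATOM 2896 CA GLU 12 36.767 9.288 64.728
ATOM 2904 CA ASP 13 34.886 7.822 61.752
ATOM 2912 CA MET 14 38.030 6.755 59.907
"""
for first, second, d in consecutive_ca_distances(parse_ca_records(model)):
    print(first, second, round(d, 2))
```

A distance well above 3.8 Å between residues with consecutive numbers usually points to a gap in the template alignment or a problem in the model.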
39Threadlize
40Similar analyses were carried out for several of
the homologues. The best hit for all the
sequences studied when all the additional
information available was included (secondary
structure, binding site, solvent exposure, ...)
was selected as a template for the modeling of
our query sequence (XylR-A). The selected
sequence is 1vid
41FSSP http://www2.emb-ebi.ac.uk/dali/fssp
44Using one of the alignment programs: Clustal-W
http://www2.ebi.ac.uk/clustalw/
45Modelling by Homology Methods
48Example of the output for three of these
prediction servers. Comparison with the
experimental values
Sequence for an SH3 domain. The experimentally
observed secondary structure was calculated using
DSSP. Reliability levels: CF 59%, GORIII
65% and PHD 72% (CF and GOR values higher than
average). Reliability index within the range
0-9. For Rel. values > 4 the prediction is
correct.
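The reliability index can be used to keep only the confidently predicted positions. A minimal sketch, assuming the prediction and its per-residue reliability digits (0-9) are given as two strings of equal length; the function name, the `.` mask character and the example strings are hypothetical.

```python
def mask_low_reliability(prediction: str, reliability: str, cutoff: int = 4) -> str:
    """Keep predicted states whose reliability digit is above the cutoff; mask the rest with '.'."""
    if len(prediction) != len(reliability):
        raise ValueError("prediction and reliability strings must have equal length")
    return "".join(
        state if int(rel) > cutoff else "."
        for state, rel in zip(prediction, reliability)
    )

# Made-up prediction and reliability strings.
print(mask_low_reliability("LLEEEEELLHHHH", "9853467212789"))
```

Raising the cutoff trades coverage for accuracy: fewer residues are kept, but those kept are more likely to be right.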
49Signal Peptides http://www.cbs.dtu.dk/services/SignalP/
Prediction of the existence and location of
cleavage sites of signal peptides
50Signal Peptides http://www.cbs.dtu.dk/services/SignalP/