Presentación de PowerPoint - PowerPoint PPT Presentation

About This Presentation
Title:

Presentación de PowerPoint

Description:

An EU-US Course in Environmental Biotechnology. H. influenza. http://www.rcsb.org/pdb ... Identities = 183/305 (60%), Positives = 233/305 (76%), Gaps = 1/305 (0 ... – PowerPoint PPT presentation

Slides: 51
Provided by: Ama84

Transcript and Presenter's Notes


1
Protein Structure Predictions
Molecular Biology for the Environment: An EU-US
Course in Environmental Biotechnology
Amalia Muñoz, CNB - CSIC
2
http://www.rcsb.org/pdb
3
PROTEIN STRUCTURE PREDICTION
KESAAAKFER QHMDSGNSPS SSSNYCNLMM CCRKMTQGKC
KPVNTFVHES LADVKAVCSQ KKVTCKNGQT NCYQSKSTMR
DASV
4
Secondary Structure Predictions
Predictions in which a one-letter or one-digit
code is assigned to each amino acid and
correlates with one of its characteristics
(1D characteristics).
http://www.envbiocourse.rutgers.edu/eu-us/
5
  • Key Antecedents
  • 1951. Pauling & Corey suggest patterns of local
    conformation such as alpha helices and beta
    sheets.
  • 1957. Szent-Györgyi & Cohen connected the amount
    of some of the amino acids with the alpha-helix
    content.
  • 1960. Blout, Fasman et al., and 1962, Blout,
    expanded this idea to both alpha helices and
    beta sheets and to all the amino acids.
  • 1960. Kendrew et al. and Perutz et al.
    characterized the first protein structures:
    myoglobin and hemoglobin.

6
  • Database searches
  • generation of multiple sequence alignments (MaxHom)
  • detection of functional motifs (PROSITE)
  • detection of composition-bias (SEG)
  • detection of protein domains (PRODOM)
  • Fold recognition by prediction-based threading
    (TOPITS)
  • Predictions of
  • secondary structure (PHDsec, and PROFsec)
  • residue solvent accessibility (PHDacc, and
    PROFacc)
  • transmembrane helix location and topology
    (PHDhtm, PHDtopology)
  • protein globularity (GLOBE)
  • coiled-coil regions (COILS)
  • cysteine bonds (CYSPRED)

7
1D
KESAAAKFER QHMDSGNSPS SSSNYCNLMM CCRKMTQGKC KPVNTFVHES
HHHHHHH HH SSTT T HHHHHH HHTT SSSS SEEEEE S
LADVKAVCSQ KKVTCKNGQT NCYQSKSTMR ITDCRETGSS KYPNCAYKTT
HHHHHGGGGS EEE TTS S EEE SSEEE EEEEEE TTT BTTB EEEE
QVEKHIIVAC GGKPSVPVHF DASV
EEEEEEEEEE ETTTTEE EE EE
Secondary Structure Predictions
The secondary structure of a protein (alpha, beta,
loop) can be predicted from its amino acid
sequence. The secondary structure is generally
assigned from non-local interactions, that is,
from the H-bonding profile between the CO and NH
groups of the protein backbone.
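
The structure string on this slide uses DSSP-style codes (H, G, E, B, T, S), while the prediction servers discussed next report three states. A minimal sketch of one common reduction from eight states to three follows. The particular mapping (G and I counted as helix, B counted as strand, everything else as loop) is an assumption made for illustration; other groupings are also in use.

```python
# One common (assumed) reduction of DSSP's eight states to three.
DSSP_TO_3STATE = {
    "H": "H", "G": "H", "I": "H",  # alpha, 3-10 and pi helices -> helix
    "E": "E", "B": "E",            # extended strand, isolated bridge -> strand
    "T": "L", "S": "L", " ": "L",  # turn, bend, unassigned -> loop
}


def reduce_to_three_states(dssp: str) -> str:
    """Map a DSSP string to H (helix), E (strand) or L (loop) per residue."""
    # Any code not listed above is treated as loop.
    return "".join(DSSP_TO_3STATE.get(code, "L") for code in dssp)


if __name__ == "__main__":
    print(reduce_to_three_states("HHHHGGGGS EEE TTS"))  # HHHHHHHHLLEEELLLL
```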
8
  • Available Servers
  • PHDsec: neural network that applies multiple
    alignments. Reliability 70%.
  • Jpred2: two neural networks and evolutionary
    information (PsiBlast). Ver. 2 combines results
    from 4 networks (JNet, NSSP, Predator, PHD).
  • PROF: based on multiple alignments in addition to
    other characteristics of these amino acids from
    databases. Reliability 70%.
  • PSIpred: uses PsiBlast profiles (filtering the
    results) and neural networks (it combines the
    results from several prediction methods).
    Reliability > 76%.
  • SAM-T99: neural network and multiple alignment
    profiles improved through the use of "Hidden
    Markov" models.
  • SSpro: recurrent bidirectional neural networks
    (using small, fixed windows that allow the
    use of the whole sequence as input).

9
Most of these methods apply either neural
networks or other algorithms that are trained
with proteins of known structure. In some cases
additional information from multiple alignments
is also considered in the predictions.
Scheme for PHD Protein Prediction Methods: Rost et
al. (1997) J. Mol. Biol. 270: 471-480
10
  • Advantages and Problems
  • Advantages
  • Reliability (3-state predictions) > 70%
  • Reliability for betas alphas loops
  • Problems
  • bad alignment >> wrong predictions
  • long-range interactions >> problems
    differentiating alphas and betas
  • problems evaluating unusual proteins
11
Example of the output for the PHD server
12
Example of the output for the PHD server
13
Other Features of Secondary Structure to Predict
14
Solvent Accessibility
Accessibility to the solvent is of interest for
the modeling of a sequence. The most detailed
method, which evaluates the volume of each residue
exposed to the solvent, was developed by Connolly
and is implemented in DSSP. (Output: accessibility
< 16, buried; accessibility ≥ 16, exposed.)
15
  • Available Servers
  • PHD
  • PROFphd
  • JPred2
  • PHD and PROFphd (from PredictProtein) apply
    neural networks and multiple alignment information.
  • These servers provide numeric values for the
    accessibility (matrix with values of 0, 1, 4, 9,
    16, 25, 36, 49, 64, 81).
  • JPred2 uses PsiBlast profiles as input for the
    neural networks and returns two-state values
    (buried or exposed).
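
The ten values listed above are the squares of 0 to 9, so an accessibility value can be binned into one of ten states by taking an integer square root. The sketch below shows that binning together with the two-state split at 16 mentioned on the solvent accessibility slide. Treating the input as a relative accessibility in percent, and the function names, are assumptions made for illustration.

```python
import math

# The ten state values from the slide: 0, 1, 4, ..., 81.
STATE_VALUES = [n * n for n in range(10)]


def to_ten_state(accessibility: float) -> int:
    """Return the largest state n (0-9) with n*n <= accessibility."""
    if accessibility < 0:
        raise ValueError("accessibility must be non-negative")
    return min(9, math.isqrt(int(accessibility)))


def to_two_state(accessibility: float, threshold: float = 16) -> str:
    """Split into buried/exposed at the threshold quoted on the slide."""
    return "exposed" if accessibility >= threshold else "buried"


if __name__ == "__main__":
    for value in (0, 15, 16, 50, 100):
        print(value, to_ten_state(value), to_two_state(value))
```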

16
  • Transmembrane Proteins
  • One of the biggest challenges of proteomics is
    the determination of the structure of
    transmembrane proteins (difficult to crystallize
    and to determine by NMR).
  • There are two main groups of transmembrane
    proteins:
  • those that span the membrane with alpha helices, and
  • those forming pores made of beta barrels (porin
    type).
  • So far there are no publicly available servers
    to predict the second type of transmembrane
    proteins (because of the lack of experimental
    information). However, the situation is quite
    different for the first type.
  • The 3D structure of these proteins can be
    determined, knowing the precise position of their
    helices, just by checking all the possible
    conformations.

17
  • Available Servers
  • MEMSAT: uses dynamic programming based on
    statistical preferences
  • TMAP: uses statistical preferences and alignment
    profiles
  • PHD: combines neural networks and evolutionary
    information within dynamic programming to optimize
    predictions
  • DAS: optimizes the use of hydrophobic profiles
  • SOSUI: combines hydrophobic preferences and
    amphipathicity profiles
  • TMHMM: the most advanced and reliable method. It
    uses statistical information and a "Hidden Markov"
    model to optimize predictions
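
Several of the servers above start from a hydrophobicity profile along the sequence. The sketch below is not the algorithm of any listed server; it is a plain sliding-window average over the Kyte-Doolittle hydropathy scale, shown only to make the idea of a hydrophobic profile concrete. The window of 19 residues and the cutoff of 1.6 are conventional choices for this scale and are assumptions here, as are the function names.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 19) -> list:
    """Mean hydropathy of every window; raises KeyError on non-standard letters."""
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    if window > len(scores):
        return []
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_segments(sequence: str, window: int = 19, cutoff: float = 1.6) -> list:
    """Merge overlapping windows whose mean is >= cutoff into (start, end) spans.

    Spans are 0-based and half-open, so sequence[start:end] is the segment.
    """
    spans = []
    for i, mean in enumerate(hydropathy_profile(sequence, window)):
        if mean >= cutoff:
            if spans and i <= spans[-1][1]:
                spans[-1] = (spans[-1][0], i + window)
            else:
                spans.append((i, i + window))
    return spans


if __name__ == "__main__":
    toy = "KKKKK" + "L" * 20 + "DDDDD"  # hypothetical toy sequence
    print(candidate_segments(toy))
```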

18
  • Other topology features
  • protein globularity (GLOBE)
  • http://cubic.bioc.columbia.edu/predictprotein
  • coiled-coil regions (COILS)
  • http://www.ch.embnet.org/software/COILS_form.html
  • cysteine bonds (CYSPRED)
  • http://prion.biocomp.unibo.it/cyspred.html
  • EXAMPLE OF PredictProtein SERVER OUTPUT

19
  • Post-translational Modifications
  • ExPASy Proteomics tools
  • http://www.expasy.ch/tools/
  • PSORT: signal peptides and localization
  • TargetP: subcellular localization
  • SignalP: signal peptides
  • ChloroP: chloroplast peptides
  • MITOPROT: mitochondrial target sequences
  • Predotar: mitochondrial and plastid target
    sequences
  • NetOGlyc: O-glycosylation sites for mammals
  • NDictyOGlyc: GlcNAc O-glycosylation sites for
    Dictyostelium
  • YinOYang: O-beta-GlcNAc binding sites for
    eukaryotes
  • big-PI Predictor: GPI modification sites
    (glycosylphosphatidylinositol)
  • DGPI: GPI binding and cleavage sites
  • NetPhos: phosphorylation sites (Ser, Thr, Tyr)
    for eukaryotes

20
  • EVA EValuation of Automatic servers (B. Rost)
  • Continuously and automatically analyses protein
    structure prediction servers in real time and
    based on known structures (It is not a
    metaserver)
  • Methods covered
  • Predictions 1D (secondary structure, solvent
    accessibility)
  • Predictions 2D (inter-residue distances)
  • Predictions 3D (homology modelling)
  • Predictions 3D (threading methods restricted to
    search for homologies among sequences)
  • Prediction of novel folds

21
EVA EValuation of Automatic Servers (B. Rost)
22
Tertiary Structure Predictions
PROTEIN STRUCTURE PREDICTION
KESAAAKFER QHMDSGNSPS SSSNYCNLMM CCRKMTQGKC
KPVNTFVHES LADVKAVCSQ KKVTCKNGQT NCYQSKSTMR
DASV
23
(No Transcript)
24
  • GO to
  • http://www.envbiocourse.rutgers.edu/eu-us/
  • Enter and then select
  • Additional tools
  • Links
  • Programs & servers on protein modeling

25
Example for Protein Structure Predictions: XylR-A
XylR is the central regulator of the toluene
degradation pathway in Pseudomonas sp. XylR
activates the Pu promoter in response to m-xylene
and p-xylene and belongs to the class of
regulators known generically as the NtrC family
of prokaryotic enhancer-binding proteins.
Regulators of this kind activate, at a distance,
promoters dependent on the alternative sigma
factor σ54 and are generally composed of four
separate domains.
26
FASTA Format
>XylR-A
MSLTYKPKMQHEDMQDLSSQIRFVAAEGKIWLGEQRMLVMQL
STLASFRREIISLIGVERAKGFFLRLGYQSGLMDAELARKLRPAMREEEV
FLAGPQLYALKGMVKVRLLTMDIAIRDGRFNVEAEWIDSFEVDICRTELG
LMNEPVCWTVLGYASGYGSAFMGRRIIFQETSCRGCGDDKCLIVGKTAEE
WGDVSSFEAYFKSDPIVDE
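
A FASTA record is a header line starting with ">" followed by one or more sequence lines. Before submitting a sequence to the servers in this course, it can be useful to read it back and check its length. The sketch below is a minimal reader written for illustration; the function name and the choice to key records by the first word of the header are assumptions.

```python
def parse_fasta(text: str) -> dict:
    """Return {record name: sequence} for FASTA-formatted text."""
    chunks = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word after '>' as the record name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


if __name__ == "__main__":
    example = """>XylR-A
MSLTYKPKMQHEDMQDLSSQIRFVAAEGKIWLGEQRMLVMQL
STLASFRREIISLIGVERAKGFFLRLGYQSGLMDAELARKLRPAMREEEV
"""
    for record, sequence in parse_fasta(example).items():
        print(record, len(sequence), sequence[:10])
```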
27
Select appropriate filters
28
Using one of the alignment programs: Clustal-W
http://www2.ebi.ac.uk/clustalw/
29
Searching for motifs and domains: Pfam / DART /
PRINTS / SMART / Blocks-Prints / InterPro /
ProDom
http://www.sanger.ac.uk/Software/Pfam/
30
(No Transcript)
31
Using one of the alignment programs: Clustal-W
http://www2.ebi.ac.uk/clustalw/
32
(No Transcript)
33
(No Transcript)
34
Modelling by Threading
35
(No Transcript)
36
3D-PSSM threading server
http://www.sbg.bio.ic.ac.uk/3dpssm/
37
(No Transcript)
38
query___Seq MSLTYKPKMQ HEDMQDLSSQ IRFVAAEGKI WLGEQRMLVM QLSTLASFRR
d1gesa1_Seq K..HYDYIAI GGGSGGIASI NRAAMYGQKC ALIEAKELGG TCVNVGCVPK

query___Seq EIISLIG..V ERAKGFFLRL GYQ....... SGLMDAELAR KLRPAMREEE
d1gesa1_Seq KVMWHAAQIR EAIHMYGPDY GFDTTINKFN WETLIASRTA YIDRIHTSYE

query___Seq VFLAGPQLYA LKGMVK.... .......VRL LTMDIAIRDG RFNVEAEWID
d1gesa1_Seq NVLGKNNVDV IKGFARFVDA KTLEVNGETI TADHILIATG GRPSHPREPA

query___Seq SFEVDICRTE LGLMNEPVCW TVLGYASGYG SAFMGRRIIF QETSCRGCGD
d1gesa1_Seq NDNINL..EA AGVKTNE... .......... .....KGYIV VDKYQN.TNI

query___Seq DKCLIVGKTA EEWGDVSSFE AYFK...... ....SDPIVD E
d1gesa1_Seq EGIYAVGDNT GAVELTPVAV AAGRRLSERL FNNKPDEHLD .
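
A threading alignment like the one above can be summarised by counting aligned, identical and gapped columns. The sketch below does this for two aligned strings in which "." (or "-") marks a gap and spaces are only block separators. Using the number of aligned, ungapped columns as the denominator for percent identity is one of several conventions and is an assumption here, as is the function name.

```python
def alignment_stats(query: str, template: str, gap_chars: str = ".-") -> dict:
    """Count aligned, identical and gapped columns of a pairwise alignment."""
    q = query.replace(" ", "")
    t = template.replace(" ", "")
    if len(q) != len(t):
        raise ValueError("aligned strings must have the same number of columns")
    aligned = identical = gaps = 0
    for a, b in zip(q, t):
        if a in gap_chars or b in gap_chars:
            gaps += 1
        else:
            aligned += 1
            identical += a == b
    percent = 100.0 * identical / aligned if aligned else 0.0
    return {"aligned": aligned, "identical": identical,
            "gaps": gaps, "percent_identity": percent}


if __name__ == "__main__":
    # First block of the alignment shown above.
    query = "MSLTYKPKMQ HEDMQDLSSQ IRFVAAEGKI WLGEQRMLVM QLSTLASFRR"
    template = "K..HYDYIAI GGGSGGIASI NRAAMYGQKC ALIEAKELGG TCVNVGCVPK"
    print(alignment_stats(query, template))
```

Low sequence identity is expected for a threading hit, since fold recognition is used precisely when sequence similarity alone does not reveal a template.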
3D coordinates of the models
PFRMAT TS
TARGET XylR-A
REMARK After reevaluating this model, it was selected from those offered by 3DPSSM
SCORE 1
MODEL 1
PARENT 1a8h
ATOM 2896 CA GLU 12 36.767  9.288 64.728
ATOM 2904 CA ASP 13 34.886  7.822 61.752
ATOM 2912 CA MET 14 38.030  6.755 59.907
ATOM 2920 CA GLN 15 40.967  6.651 62.303
ATOM 2924 CA ASP 16 39.163  5.046 65.233
ATOM 2932 CA LEU 17 37.431  2.462 63.036
ATOM 2940 CA SER 18 40.852  1.452 61.699
ATOM 2947 CA SER 19 42.365  1.348 65.196
ATOM 2956 CA GLN 20 39.310 -0.573 66.432
ATOM 2967 CA ILE 21 39.408 -3.120 63.604
ATOM 2974 CA ARG 22 43.122 -3.728 64.118
ATOM 3006 CA PHE 23 44.663 -9.425 66.871
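
A quick sanity check on a C-alpha-only model is the distance between consecutive CA atoms, which is close to 3.8 Å for adjacent residues in a continuous chain. The sketch below parses records of the form shown above and computes those distances. It assumes whitespace-separated fields (ATOM, serial, atom name, residue name, residue number, x, y, z) as they appear in this transcript, not the fixed-column layout of a full PDB file; the function names are illustrative.

```python
import math


def parse_ca_records(text: str) -> list:
    """Return (residue name, residue number, (x, y, z)) for each CA record."""
    atoms = []
    for line in text.splitlines():
        fields = line.split()
        # Expected fields: ATOM serial CA resname resnum x y z
        if len(fields) == 8 and fields[0] == "ATOM" and fields[2] == "CA":
            x, y, z = (float(value) for value in fields[5:8])
            atoms.append((fields[3], int(fields[4]), (x, y, z)))
    return atoms


def consecutive_distances(atoms: list) -> list:
    """Distance in angstroms between each CA atom and the next one listed."""
    return [math.dist(a[2], b[2]) for a, b in zip(atoms, atoms[1:])]


if __name__ == "__main__":
    records = """ATOM 2896 CA GLU 12 36.767 9.288 64.728
ATOM 2904 CA ASP 13 34.886 7.822 61.752
ATOM 2912 CA MET 14 38.030 6.755 59.907
"""
    atoms = parse_ca_records(records)
    for (name, number, _), distance in zip(atoms, consecutive_distances(atoms)):
        print(f"{name}{number} -> next CA: {distance:.2f} A")
```

A distance far from 3.8 Å between residues with consecutive numbers suggests a break or an error in the model at that position.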
39
Threadlize
40
Similar analyses were carried out for several of
the homologues. The best hit for all the
sequences studied, when all the additional
information available was included (secondary
structure, binding site, solvent exposure, ...),
was selected as the template for modeling
our query sequence (XylR-A). The selected
structure is 1vid.
41
FSSP http://www2.emb-ebi.ac.uk/dali/fssp
42
(No Transcript)
43
(No Transcript)
44
Using one of the alignment programs: Clustal-W
http://www2.ebi.ac.uk/clustalw/
45
Modelling by Homology Methods
46
(No Transcript)
47
(No Transcript)
48
Example of the output for three of these
prediction servers. Comparison with the
experimental values
Sequence for an SH3 domain. The experimentally
observed secondary structure was calculated using
DSSP. Reliability levels: CF 59%, GORIII 65%
and PHD 72% (CF and GOR values higher than
average). Reliability index within the range
0-9. For Rel. values > 4 the prediction is
correct.
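
The per-residue reliability index (0-9) can be used to keep only the confident part of a prediction. The sketch below reports the accuracy over residues whose index is above a cutoff, together with the fraction of the sequence those residues cover. It assumes the index is given as a string of digits aligned with the prediction, as in the server output; the function name and return format are illustrative.

```python
def accuracy_above_reliability(predicted: str, observed: str,
                               reliability: str, cutoff: int = 4) -> tuple:
    """Return (accuracy %, coverage %) for residues with reliability > cutoff."""
    if not (len(predicted) == len(observed) == len(reliability)):
        raise ValueError("all three strings must have the same length")
    kept = [(p, o) for p, o, r in zip(predicted, observed, reliability)
            if int(r) > cutoff]
    coverage = 100.0 * len(kept) / len(predicted) if predicted else 0.0
    accuracy = 100.0 * sum(p == o for p, o in kept) / len(kept) if kept else 0.0
    return accuracy, coverage


if __name__ == "__main__":
    # Hypothetical five-residue example.
    print(accuracy_above_reliability("HHEEL", "HHELL", "97325"))
```

Raising the cutoff generally trades coverage for accuracy, so both numbers are worth reporting together.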
49
Signal Peptides
http://www.cbs.dtu.dk/services/SignalP/
Prediction of the existence and location of
cleavage sites of signal peptides
50
Signal Peptides
http://www.cbs.dtu.dk/services/SignalP/