PREDICTING PROTEIN SECONDARY STRUCTURE USING ARTIFICIAL NEURAL NETWORKS

1
PREDICTING PROTEIN SECONDARY STRUCTURE USING
ARTIFICIAL NEURAL NETWORKS
  • Sudhakar Reddy
  • Patrick Shih
  • Chrissy Oriol
  • Lydia Shih

2
Proteins And Secondary Structure
Sudhakar Reddy
3
Project Goals
  • To predict the secondary structure of a protein
    using artificial neural networks.

4
STRUCTURES
  • Primary structure: the linear arrangement of amino
    acid (a.a) residues that constitute the
    polypeptide chain.

5
SECONDARY STRUCTURE
  • Localized organization of parts of a polypeptide
    chain, through hydrogen bonds between different
    residues.
  • Without any stabilizing interactions, a
    polypeptide assumes a random coil structure.
  • When stabilizing hydrogen bonds form, the
    polypeptide backbone folds periodically into one
    of the following geometric arrangements:
  • ALPHA HELIX
  • BETA SHEET
  • U-TURNS


6
ALPHA HELIX
  • The polypeptide backbone is folded into a spiral
    that is held in place by hydrogen bonds between
    backbone oxygen atoms and hydrogen atoms.
  • The carbonyl oxygen of each peptide bond is
    hydrogen bonded to the amide hydrogen of the a.a
    4 residues toward the C-terminus.
  • Each alpha helix has 3.6 a.a per turn.
  • Side chains point outward from the backbone.
  • The hydrophobic/hydrophilic quality of the helix is
    determined entirely by the side chains, because the polar
    groups of the peptide backbone are already
    involved in H-bonding within the helix and thus
    cannot affect its hydrophobicity or hydrophilicity.

7
ALPHA HELIX
8
THE BETA SHEET
  • Consists of laterally packed beta strands
  • Each beta strand is a short (5-8 residues),
    nearly fully extended polypeptide chain
  • Hydrogen bonding between backbone atoms in
    adjacent beta strands, within either the same or
    different polypeptide chains, forms a beta sheet.
  • Orientation can be either parallel or
    anti-parallel. In both arrangements side chains
    project from both faces of the sheet.

9
THE BETA SHEET
10
THE BETA SHEET
11
TURNS
  • Composed of 3-4 residues, turns are compact, U-shaped
    secondary structures stabilized by H-bonds
    between their end residues.
  • Located on the surface of the protein, forming a
    sharp bend that redirects the polypeptide
    backbone back toward the interior.
  • Glycine and proline are commonly present.
  • Without these turns, a protein would be large,
    extended and loosely packed.

12
TURNS
13
MOTIFS
  • Motifs are regular combinations of secondary
    structures.
  • Coiled-coil motif
  • Helix-loop-helix (Ca)
  • Zinc finger motif

14
COILED-COIL MOTIF
15
HELIX-LOOP-HELIX (CA)
16
ZINC-FINGER MOTIF
17
FUTURE
  • Protein structure identification is key to
    understanding biological function and its role in
    health and disease
  • Characterizing a protein structure is helpful in the
    development of new agents and devices to treat
    disease
  • The challenge of unraveling the structure lies in
    developing methods for accurately and reliably
    understanding the relationship between structure and function
  • Most current protein structures have been
    characterized by NMR and X-ray diffraction
  • Revolution in sequencing studies: the sequence database
    keeps growing, but only 3000 structures are known

18
ADVANTAGE
  • Because very few conformations of a protein are possible
    and structure and sequence are directly related
    to each other, we can unravel the secondary
    structure by developing an efficient algorithm
    that compares new sequences with the ones already
    available, and apply the results in the health care industry.

19
WHY SECONDARY STRUCTURE?
  • Prediction of secondary structure is an
    essential intermediate step on the way to
    predicting the full 3-D structure of a protein
  • If the secondary structure of a protein is known,
    it is possible to derive a comparatively small
    number of possible tertiary structures using
    knowledge about the ways that secondary
    structural elements pack

20
Artificial Neural Network (ANN)
Peichung Shih
21
Biological Neural Network
22
Artificial Neural Network
  • X1k: input from X1; X2k: input from X2
  • W1k: weight of X1; W2k: weight of X2
  • X0k: bias term; W0k: weight of the bias term
  • qk: output of node k
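The node diagram is not preserved in the transcript, but the legend above suggests that each node combines its weighted inputs with a weighted bias term. A minimal Python sketch of that idea follows. The sigmoid activation and the weight values are assumptions made for illustration; the slide does not say which activation function was used.

```python
import math

def node_output(x1, x2, w1, w2, w0, x0=1.0):
    """Output of one node: activation of the weighted inputs plus weighted bias."""
    net = w0 * x0 + w1 * x1 + w2 * x2    # weighted sum; bias input x0 fixed at 1
    return 1.0 / (1.0 + math.exp(-net))  # sigmoid activation (an assumption)

# Hypothetical weights, chosen only for illustration
print(node_output(x1=1.0, x2=0.0, w1=0.5, w2=-0.3, w0=0.1))
```

With all inputs and the bias weight at zero the sigmoid returns 0.5, which is a quick sanity check on the function.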
23
Artificial Neural Network - Example
(Figure not preserved; surviving labels: 7, 1, "Output 1")
24
Paradigms of ANN - Overview
  • Perceptron
  • Adaline and Madaline
  • Backpropagation (BP)

25
Paradigms of ANN - Feedforward
26
Paradigms of ANN - Feedback
27
Paradigms of ANN - Supervised
28
Paradigms of ANN - Unsupervised
29
Paradigms of ANN - Overview
  • Perceptron
  • Adaline and Madaline
  • Backpropagation (BP)

30
  • Perceptron
  • One of the earliest learning networks, proposed by
    Rosenblatt in the late 1950s.

RULE: $net = w_1 I_1 + w_2 I_2$; if $net > \Theta$ then output $o = 1$,
otherwise $o = 0$.
MODEL
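The perceptron rule on this slide (weighted sum of two inputs compared against a threshold) can be written in a few lines of Python. This is only a sketch: the function name is made up, and the weights and threshold in the usage line are hypothetical values that happen to realize a logical AND.

```python
def perceptron(i1, i2, w1, w2, theta):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    net = w1 * i1 + w2 * i2
    return 1 if net > theta else 0

# Hypothetical parameters: w1 = w2 = 1 and threshold 1.5 behave like AND
outputs = [perceptron(a, b, 1, 1, 1.5) for a in (0, 1) for b in (0, 1)]
print(outputs)  # [0, 0, 0, 1]
```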

31
  • Perceptron Example: AND Operation

Initial Network
W W 1
32
  • Perceptron Example: AND Operation

33
  • Perceptron Example: AND Operation

34
  • Perceptron Example: AND Operation

35
  • Perceptron Example: AND Operation

36
  • Perceptron Example: AND Operation

37
  • Perceptron Example: AND Operation

38
  • Perceptron Example: AND Operation

39
  • Perceptron Example: AND Operation
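
The step-by-step figures for the AND example are not preserved in the transcript. As a rough illustration of how a perceptron can learn AND, here is a sketch of the standard perceptron learning rule. The zero initial weights, the learning rate of 1, and the use of a bias weight in place of an explicit threshold are all assumptions, not the values shown on the original slides.

```python
def train_perceptron(samples, rate=1, max_epochs=20):
    """Perceptron learning rule; returns (bias, w1, w2)."""
    w0, w1, w2 = 0, 0, 0  # assumed starting point: all weights zero
    for _ in range(max_epochs):
        errors = 0
        for (i1, i2), target in samples:
            out = 1 if w0 + w1 * i1 + w2 * i2 > 0 else 0
            err = target - out
            if err:
                # Move each weight toward the target in proportion to its input
                w0 += rate * err
                w1 += rate * err * i1
                w2 += rate * err * i2
                errors += 1
        if errors == 0:  # a full pass with no mistakes: training is done
            break
    return w0, w1, w2

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w0, w1, w2 = train_perceptron(AND)
predictions = [1 if w0 + w1 * a + w2 * b > 0 else 0 for (a, b), _ in AND]
print(predictions)  # [0, 0, 0, 1]
```

Because AND is linearly separable, the rule settles on a correct set of weights after a handful of passes over the four input pairs.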

40
  • Hidden Layer

41
  • Hidden Layer

42
  • Hidden Layer

(Figure: network with a hidden layer; surviving labels in reading order: 0.5, 1, 1, -2, 1, 1, 1.5, then six 1s)
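The hidden-layer figure is lost, but its surviving labels (0.5, 1, 1, -2, 1, 1, 1.5) are consistent with the classic two-layer threshold network for XOR, a function that a single perceptron cannot represent. The sketch below assumes that reading: a hidden node with threshold 1.5, an output node with threshold 0.5, and a weight of -2 from the hidden node to the output. The wiring is an interpretation, not something the transcript confirms.

```python
def step(net, threshold):
    """Threshold unit: fires when the net input exceeds the threshold."""
    return 1 if net > threshold else 0

def xor_network(x1, x2):
    # Hidden node fires only when both inputs are on (assumed threshold 1.5)
    hidden = step(1 * x1 + 1 * x2, 1.5)
    # Output node sums the inputs and is inhibited by the hidden node (weight -2)
    return step(1 * x1 + 1 * x2 - 2 * hidden, 0.5)

table = [xor_network(a, b) for a in (0, 1) for b in (0, 1)]
print(table)  # [0, 1, 1, 0]
```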
43
How Many Hidden Nodes?
We have indicated the number of layers needed.
However, no indication is provided as to the
optimal number of nodes per layer. There is no
formal method to determine this optimal number;
typically, one uses trial and error.
44
Hidden Units   Q3 (%)
0              62.50
5              61.60
10             61.50
15             62.60
20             62.30
30             62.50
40             62.70
60             61.40
45
JNET AND JPRED
CHRISSY ORIOL
46
JNET
  • Multiple Alignment
  • Neural Network
  • Consensus of methods

47
TRAINING AND TESTS
  • Training set: 480 proteins (1996 PDB)
  • Test set: 406 proteins (2000 PDB)
  • Blind test
  • 7-fold cross validation test

48
MULTIPLE ALIGNMENTS
49
ALIGNMENTS
  • Multiple sequence alignment constructed
  • Generation of profiles
  • Frequency counts of each residue / total
    residues in the column (expressed as a percentage)
  • Each residue scored by its value from BLOSUM62,
    with the scores averaged over the number
    of sequences in that column
  • Profile HMM generated by HMMER2
  • PSI-BLAST (Position Specific Iterative Basic
    Local Alignment Search Tool)
  • Frequency of residue
  • PSSM (Position Specific Scoring Matrix)
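
The frequency profile described above (count of each residue divided by the total residues in the column, as a percentage) can be sketched as follows. The function name is hypothetical, and excluding gap characters ("-" and ".") from the column total is an assumption; the slide does not say how gaps were counted. The three short rows are the first five columns of YourSeq, YK68_ARCFU and Y691_METJA from the alignment shown later in the deck.

```python
from collections import Counter

def frequency_profile(alignment):
    """Per-column residue frequencies (percent) for equal-length aligned rows."""
    profile = []
    for column in zip(*alignment):
        residues = [c for c in column if c not in "-."]  # skip gaps (assumption)
        counts = Counter(residues)
        total = len(residues)
        profile.append({aa: 100.0 * n / total for aa, n in counts.items()})
    return profile

rows = ["MRQQL", "MRRQV", "..ALL"]
profile = frequency_profile(rows)
print(profile[0])  # {'M': 100.0}
print(profile[3])  # Q in two of three rows, L in one
```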

50
HMM PROFILE
  • Uses
  • Statistical descriptions of a sequence family's
    consensus
  • Position-specific scores for residues,
    insertions and deletions
  • Profiles
  • Captures important information about the degree
    of conservation at different positions
  • Varying degree to which gaps and insertions and
    deletions are permitted

51
PSI-BLAST PROFILE
  • Full-length sequences from the initial PSI-BLAST
    search are extracted from the database and ordered
    by p-value
  • Align a and b
  • Align c to the profile of a and b
  • Iterate the addition of each sequence from the PSI-BLAST
    search until all are aligned
  • Remove gaps in a, and the columns below those gaps,
    to form a restrained profile which better
    represents sequence a
  • Result: an alignment profile based on the query sequence to
    be predicted
52
PSI-BLAST PROFILE
  • Iterative
  • Low-complexity sequences can pollute the search
    profile
  • Database filtered to mask out:
  • Low-complexity sequences (SEG)
  • Coiled-coil regions (HELIXFILT)
  • Transmembrane helices (HELIXFILT)

53
NEURAL NETWORK
54
NEURAL NETWORK
  • Two neural networks used
  • 1st
  • Sliding window of 17 residues
  • 9 hidden nodes
  • 3 outputs
  • 2nd
  • Sliding window of 19 residues
  • 9 hidden nodes
  • 3 outputs
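
To make the sliding-window idea concrete, the sketch below cuts a sequence into one 17-residue window per position, centred on the residue being predicted. Padding the chain ends with "-" is an assumption, as is feeding raw residue letters; the networks described here take alignment profiles as input, and the slide does not say how chain ends were handled.

```python
def sliding_windows(seq, size=17, pad="-"):
    """One window per residue, centred on that residue; ends padded (assumption)."""
    half = size // 2
    padded = pad * half + seq + pad * half
    return [padded[i:i + size] for i in range(len(seq))]

query = "MRQQLEMQKKQIMMQILTPE"  # first 20 residues of the query used in the deck
windows = sliding_windows(query)
print(windows[0])   # --------MRQQLEMQK
print(windows[10])  # fully inside the sequence, no padding
```

Passing `size=19` gives the window used by the second network.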

55
CONSENSUS COMBINATION OF PREDICTION METHODS
56
CONSENSUS COMBINATION OF PREDICTION METHODS
  • Jury agreement (identical predictions by all
    methods): Q3 = 82%
  • No jury: Q3 = 76.4%
  • Trained another neural network
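
A small sketch of the jury idea: where every method predicts the same state, that state is taken; everywhere else the position is flagged as "no jury" for a further predictor to resolve. The "*" placeholder and the three toy prediction strings are hypothetical, and the sketch does not include the additional neural network mentioned above.

```python
def jury(predictions):
    """Column-wise consensus: keep the state if all methods agree, else flag it."""
    consensus = []
    for column in zip(*predictions):
        agreed = len(set(column)) == 1
        consensus.append(column[0] if agreed else "*")  # "*" = no jury (placeholder)
    return "".join(consensus)

# Toy predictions from three hypothetical methods
result = jury(["HHHE-", "HHEE-", "HHHE-"])
print(result)  # HH*E-
```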

57
ASSESSMENT OF ACCURACY
Segment Overlap
$\text{Confidence} = 10 \times (out_{max} - out_{next})$
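The confidence on this slide is ten times the gap between the largest and the second-largest network output. A short sketch, using made-up output activations for the three states; whether the value is then rounded or truncated to a single digit is not stated here, so the sketch returns it unrounded.

```python
def confidence(outputs):
    """10 x (largest output - second largest output) for one residue."""
    out_max, out_next = sorted(outputs, reverse=True)[:2]
    return 10 * (out_max - out_next)

# Hypothetical activations for helix, strand and coil at one position
value = confidence([0.80, 0.15, 0.05])
print(value)  # about 6.5
```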
58
RIBONUCLEASE A
KEY: H = helix; E = strand; B = buried
residue; - = exposed residue; [symbol lost] = no jury
59
JNET OUTPUT
YourSeq    MRQQLEMQKKQIMMQILTPEARSRLANLRLTRPDFVEQIELQLIQLAQMGRVRSKITDEQLKELLKRVAGKKREIKISRK
YA60_PYRHO ERALIEAQIQAILRKILTPEARERLARVKLVRPELARQVELILVQLYQAGQITERIDDAKLKRILAQIEAKRREFRIKW.
TF19_HUMAN ..KHREAEMRSILAQVLDQSARARLSNLALVKPEKTKAVENYLIQMARYGQLSEKVSEQGLIEILKKVSQQEKTTTVKFN
Q9VUZ8     ..MRAQEEMKSILSQVLDQQARARLNTLKVSKPEKAQMFENMVIRMAQMGQVRGKLDDAQFVSILESVNAQQSKSSVKYD
YRGK_CAEEL ARAENQETAKGMISQILDQAAMQRLSNLAVAKPEKAQMVEAALINMARRGQLSGKMTDDGLKALMERVSAQQKATSVKFD
Y691_METJA ..ALLEAEMQALLRKILTPEARERLERIRLARPEFAEAVEVQLIQLAQLGRLPIPLSDEDFKALLERISALKRKREIKIV
YK68_ARCFU MRRQVEAQKKAILRAILEPEAKERLSRLKLAHPEIAEAVENQLIYLAQAGRIQSKITDKMLVEILKRVQPKKRETRIIRK
YF69_SCHPO ..QEVQDEMRNLLSQILEHPARDRLRRIALVRKDRAEAVEELLLRMAKTGQISHKISEPELIELLEKISGEKRNETKIVI
YMW4_YEAST .AGGGENSAPAAIANFLEPQALERLSRVALVRRDRAQAVETYLKKLIATNNVTHKITEAEIVSILNGIAKQQNNSKIIFE

           1---------11--------21--------31--------41--------51--------61--------71--------
OrigSeq    MRQQLEMQKKQIMMQILTPEARSRLANLRLTRPDFVEQIELQLIQLAQMGRVRSKITDEQLKELLKRVAGKKREIKISRK

jalign     --HHHHHHHHHHHHHHHHHHHHHHHHHHHH---HHHHHHHHHHHHHHHH--------HHHHHHHHHHHHHH----EE---
jfreq      -HHHHHHHHHHHHHHHHHHHHHHHHHHHHH---HHHHHHHHHHHHHHHH--------HHHHHHHHHHHH----EEEEE--
jhmm       -HHHHHHHHHHHHHHH---HHHHHHHHHHH----HHHHHHHHHHHHHHH--------HHHHHHHHHHHHHH---EEEEE-
jnet       -HHHHHHHHHHHHHHHHHHHHHHHHHHHHH---HHHHHHHHHHHHHHHH--------HHHHHHHHHHHH-----EEEEE-
jpssm      --HHHHHHHHHHHHHH--HHHHHHH-HEEEE---HHHHHHHHHHHHHHH--------HHHHHHHHHHHH-----EEE---

Jpred      -HHHHHHHHHHHHHHHHHHHHHHHHHHHHH---HHHHHHHHHHHHHHHH--------HHHHHHHHHHHH-----EEEE--

MCoil      --------------------------------------------------------------------------------
MCoilDI    --------------------------------------------------------------------------------
MCoilTRI   --------------------------------------------------------------------------------
Lupas 21   --------------------------------------------------------------------------------
Lupas 14   --------------------------------------------------------------------------------
Lupas 28   --------------------------------------------------------------------------------

Jnet_25    ---BB---B--BBB-BB---B--BB--B-BB---BB-BBB-BBB-BB-BB-B---B----BB-BB--B--------B---
Jnet_5     -----------BB--B----B---B--B----------B---B--B--------------B--BB---------------
Jnet_0     --------------------------------------B---B--B--------------B-------------------
Jnet Rel   79889998888998643697888849188454657899999999988626987657778999999986007883747728
60
JPRED SERVER: Consensus web server
  • JNET: default method
  • PREDATOR
  • Method focused on predicting hydrogen-bonded
    residues
  • PHD - PredictProtein
  • Neural network using multiple sequence alignment
    profiles

61
JPRED SERVER cont.
  • NNSSP: nearest-neighbor secondary structure prediction
  • DSC: Discrimination of protein Secondary
    structure Class
  • Divides secondary structure
    prediction into basic concepts, then
    uses simple linear statistical
    methods to combine the concepts for prediction
  • ZPRED
  • Physicochemical information
  • MULPRED
  • Single-sequence method combination

62
YourSeq    MRQQLEMQKKQIMMQILTPEARSRLANLRLTRPDFVEQIELQLIQLAQMGRVRSKITDEQLKELLKRVAGKKREIKISRK
YA60_PYRHO ERALIEAQIQAILRKILTPEARERLARVKLVRPELARQVELILVQLYQAGQITERIDDAKLKRILAQIEAKRREFRIKW.
TF19_HUMAN ..KHREAEMRSILAQVLDQSARARLSNLALVKPEKTKAVENYLIQMARYGQLSEKVSEQGLIEILKKVSQQEKTTTVKFN
Q9VUZ8     ..MRAQEEMKSILSQVLDQQARARLNTLKVSKPEKAQMFENMVIRMAQMGQVRGKLDDAQFVSILESVNAQQSKSSVKYD
YRGK_CAEEL ARAENQETAKGMISQILDQAAMQRLSNLAVAKPEKAQMVEAALINMARRGQLSGKMTDDGLKALMERVSAQQKATSVKFD
Y691_METJA ..ALLEAEMQALLRKILTPEARERLERIRLARPEFAEAVEVQLIQLAQLGRLPIPLSDEDFKALLERISALKRKREIKIV
YK68_ARCFU MRRQVEAQKKAILRAILEPEAKERLSRLKLAHPEIAEAVENQLIYLAQAGRIQSKITDKMLVEILKRVQPKKRETRIIRK
YF69_SCHPO ..QEVQDEMRNLLSQILEHPARDRLRRIALVRKDRAEAVEELLLRMAKTGQISHKISEPELIELLEKISGEKRNETKIVI
YMW4_YEAST .AGGGENSAPAAIANFLEPQALERLSRVALVRRDRAQAVETYLKKLIATNNVTHKITEAEIVSILNGIAKQQNNSKIIFE
consv      --3-273433568336-522-43--25838573836556-2384484316682-37581274298238323542-3422-

           1---------11--------21--------31--------41--------51--------61--------71--------
OrigSeq    MRQQLEMQKKQIMMQILTPEARSRLANLRLTRPDFVEQIELQLIQLAQMGRVRSKITDEQLKELLKRVAGKKREIKISRK

jalign     --HHHHHHHHHHHHHHHHHHHHHHHHHHHH---HHHHHHHHHHHHHHHH--------HHHHHHHHHHHHHH----EE---
jfreq      -HHHHHHHHHHHHHHHHHHHHHHHHHHHHH---HHHHHHHHHHHHHHHH--------HHHHHHHHHHHH----EEEEE--
jhmm       -HHHHHHHHHHHHHHH---HHHHHHHHHHH----HHHHHHHHHHHHHHH--------HHHHHHHHHHHHHH---EEEEE-
jnet       -HHHHHHHHHHHHHHHHHHHHHHHHHHHHH---HHHHHHHHHHHHHHHH--------HHHHHHHHHHHH-----EEEEE-
jpssm      --HHHHHHHHHHHHHH--HHHHHHH-HEEEE---HHHHHHHHHHHHHHH--------HHHHHHHHHHHH-----EEE---
mul        --HHHHHHHHHHHHHHHHH--HHHHHHHH-H--HHHHHHHHHHHHHH----------HHHHHHHHHHHHHHH--H-EEE-
nnssp      HHHHHHHHHHHHHHHHHHHHHHHHHHHHHH--HHHHHHHHHHHHHHHHH--------HHHHHHHHHHHHH-----EEEEE
phd        ---HHHHHHHHHHHHHHHHHHHHHHHHHHH--HHHHHHHHHHHHHHHHH--------HHHHHHHHHHHHHH----EEE--
pred       ---HHHHHHHHHHHHHHHHHHHHHHHHHHHHH-HHHHHHHHHHHHHHH-------HHHHHHHHHHHHHHHHHHHHH----
zpred      --HHHHHHHHHHHHHEHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH-EE----HHHHHHHHHHHHHHHHH---EE--

Jpred      -HHHHHHHHHHHHHHHHHHHHHHHHHHHHH---HHHHHHHHHHHHHHHH--------HHHHHHHHHHHHH----EEEE--

PHDHtm     --------------------------------------------------------------------------------
MCoil      --------------------------------------------------------------------------------
MCoilDI    --------------------------------------------------------------------------------
MCoilTRI   --------------------------------------------------------------------------------
Lupas 21   --------------------------------------------------------------------------------
Lupas 14   --------------------------------------------------------------------------------
Lupas 28   --------------------------------------------------------------------------------

PHDacc     ----B---B-BBBBBBB---B---BB-B-BB----B-BB-BBBB-BB-BB-B---B----B--BB--B------B-B-U-
Jnet_25    ---BB---B--BBB-BB---B--BB--B-BB---BB-BBB-BBB-BB-BB-B---B----BB-BB--B--------B---
Jnet_5     -----------BB--B----B---B--B----------B---B--B--------------B--BB---------------
Jnet_0     --------------------------------------B---B--B--------------B-------------------

PHD Rel    97527999999999999899999999986315269999999999999964332235649999999999962356225319
Pred Rel   00777700999990990609990999886606668099999999009677787757768989909999957077777000
Jnet Rel   79889998888998643697888849188454657899999999988626987657778999999986007883747728
63
Accuracy Evaluation
  • By Liang-Yu Shih

64
Methods
  • Per-residue accuracy
  • Q3 measurement: the traditional way
  • Matthews correlation coefficient
  • Per-segment accuracy
  • SOV measurement (CASP2)
  • Subcategorizing the incorrect predictions
  • Over-prediction: alpha/beta predicted where the observed state is coil
  • Under-prediction: coil predicted where the observed state is alpha/beta
  • Wrong prediction: alpha predicted where the observed state is beta, or vice
    versa

65
How to measure Q3
  • Q index
  • Qhelix, Qstrand and Qcoil for a single
    conformational state:
  • $Q_i = \frac{\text{number of residues correctly predicted in state } i}{\text{number of residues observed in state } i} \times 100$
  • Q3 for all three states:
  • $Q_3 = \frac{\text{number of residues correctly predicted}}{\text{number of all residues}} \times 100$
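
As a worked illustration of these per-residue scores, the sketch below computes Q3 and the per-state Q values for the observed and predicted strings used on the SOV example slide later in the deck. The function name and the choice to return `None` for a state with no observed residues are illustrative choices, not part of the original method description.

```python
def q_scores(observed, predicted, states="HEC"):
    """Q3 plus per-state Q_i, all as percentages."""
    assert len(observed) == len(predicted)
    pairs = list(zip(observed, predicted))
    correct = sum(o == p for o, p in pairs)
    scores = {"Q3": 100.0 * correct / len(observed)}
    for s in states:
        n_obs = observed.count(s)                # residues observed in state s
        n_ok = sum(o == p == s for o, p in pairs)  # correctly predicted in state s
        scores["Q" + s] = 100.0 * n_ok / n_obs if n_obs else None
    return scores

observed = "CCEEECCCCCCEEEEEECCC"
predicted = "CCCCCCCEEEEECCCEECCC"
scores = q_scores(observed, predicted)
print(scores["Q3"])  # 50.0: 10 of 20 residues match
print(scores["QE"])  # 3 of 9 observed strand residues are predicted as strand
```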

66
How to measure Matthews correlation coefficients
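The formula from this slide is not preserved in the transcript. The sketch below uses the standard definition of the Matthews correlation coefficient for a single state, computed from true/false positives and negatives; it is assumed, not confirmed, that the slide showed the same form. The test strings are the ones from the SOV example slide.

```python
import math

def matthews(observed, predicted, state):
    """Matthews correlation coefficient for one state (standard definition)."""
    tp = tn = fp = fn = 0
    for o, p in zip(observed, predicted):
        if o == state and p == state:
            tp += 1
        elif o != state and p != state:
            tn += 1
        elif p == state:
            fp += 1  # predicted in the state but observed elsewhere
        else:
            fn += 1  # observed in the state but predicted elsewhere
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

c_e = matthews("CCEEECCCCCCEEEEEECCC", "CCCCCCCEEEEECCCEECCC", "E")
print(round(c_e, 3))  # slightly negative: the strand prediction is no better than chance
```

The coefficient ranges from -1 to 1, with 1 for a perfect prediction of the state and values near 0 for a prediction uncorrelated with the observation.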
67
Problems in per-residue accuracy
  • It does not reflect 3D structure.
  • Example: assigning the entire myoglobin chain as
    a single helix gives a Q3 score of 80%.
  • Conformational variation is observed at secondary
    structure segment ends.
  • Example: a prediction can have a low Q3 value yet still predict the fold
    well.

68
Q: What is a good measure? A: A structurally
oriented measure
  • A structurally oriented measure considers the
    following:
  • Type and position of secondary structure segments
    rather than a per-residue assignment of
    conformational state.
  • Natural variation of segment boundaries among
    families of homologous proteins.

69
How to measure SOV

70
SOV Example
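  • Observed (S1): CCEEECCCCCCEEEEEECCC
  • Predicted (S2): CCCCCCCEEEEECCCEECCC
  • Minov: length of the actual overlap of segments s1 and s2
  • Maxov: total extent over which either s1 or s2 has a residue in the state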

71
SOV Example Cont.
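  • Sov(E)

(Figure: observed sequence EEECCCCCCEEEEEE with its strand segments labeled S(E))
$\frac{minov(s_1, s_2) + \delta(s_1, s_2)}{maxov(s_1, s_2)}$
$\delta(s_1, s_2) = \min\{(10 - 1);\ 1;\ 15/2;\ 10/2\}$
$\delta(s_1, s_2) = \min\{(6 - 2);\ 2;\ 15/2;\ 10/2\}$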
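A compact sketch of the segment overlap score for one state, applied to the observed and predicted strings of this example. It follows the published definition by Zemla et al. (reference 4 in this deck), in which delta is capped by the integer half-lengths of the two overlapping segments themselves; the intermediate delta values it produces may therefore differ from the numbers printed on the slide. Function names are illustrative.

```python
from itertools import groupby

def segments(ss, state):
    """Inclusive (start, end) index pairs for each run of `state` in `ss`."""
    segs, pos = [], 0
    for ch, run in groupby(ss):
        n = len(list(run))
        if ch == state:
            segs.append((pos, pos + n - 1))
        pos += n
    return segs

def sov_state(observed, predicted, state):
    """Segment overlap (percent) for a single conformational state."""
    total, norm = 0.0, 0
    pred_segs = segments(predicted, state)
    for s1 in segments(observed, state):
        len1 = s1[1] - s1[0] + 1
        overlapping = [s2 for s2 in pred_segs if s2[0] <= s1[1] and s1[0] <= s2[1]]
        if not overlapping:
            norm += len1  # unmatched observed segments still count in the normalizer
            continue
        for s2 in overlapping:
            len2 = s2[1] - s2[0] + 1
            minov = min(s1[1], s2[1]) - max(s1[0], s2[0]) + 1
            maxov = max(s1[1], s2[1]) - min(s1[0], s2[0]) + 1
            delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
            total += (minov + delta) / maxov * len1
            norm += len1
    return 100.0 * total / norm if norm else 0.0

sov_e = sov_state("CCEEECCCCCCEEEEEECCC", "CCCCCCCEEEEECCCEECCC", "E")
print(round(sov_e, 1))  # 28.0
```

Here the second observed strand overlaps two predicted strands (minov/maxov of 1/10 and 2/6), while the first observed strand overlaps nothing and only enlarges the normalizer.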
72
Evaluation-Step 1(query sequence)
  • Hypothetical Protein
  • MRQQLEMQKKQIMMQILTPEARSRLANLRLTRPDFVEQIELQLIQLAQMGR
    VRSKITDEQLKELLKRVAGKKREIKISRK
  • 80 residues
  • Methanothermobacter thermautotrophicus
  • Structure solved by NMR
  • Christendat,D., et al. Nat. Struct. Biol. 7 (10),
    903-909 (2000)

73
Evaluation-Step 2 (programs)
 
74
Servers
  • APSSP: http://imtech.ernet.in/raghava/apssp/
  • JPred: http://jura.ebi.ac.uk:8888/
  • PHD: http://cubic.bioc.columbia.edu/predictprotein
  • PROFsec: http://cubic.bioc.columbia.edu/predictprotein
  • PSIpred: http://insulin.brunel.ac.uk/psiform.html
  • SAM-T99sec: http://www.cse.ucsc.edu/research/compbio/HMM-apps/T99-query.html

75
Evaluation-Step 3
  • Conversion of DSSP secondary structure from 8
    states to 3 states


H = alpha helix; E = beta strand; L = coil (others)
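The conversion table on this slide is not preserved, and several 8-to-3 reduction schemes are in use. The sketch below assumes one common choice: DSSP codes H, G and I map to H; E and B map to E; everything else (including blanks) maps to L. Treat the mapping as an assumption, not as the scheme the presenters used.

```python
# Assumed reduction: helices (H, G, I) -> H, strands (E, B) -> E, all else -> L
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}

def reduce_dssp(dssp):
    """Collapse an 8-state DSSP string to the three states H, E and L."""
    return "".join(DSSP_TO_3.get(code, "L") for code in dssp)

reduced = reduce_dssp("HGIEBTS -")
print(reduced)  # HHHEELLLL
```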
76
Evaluation-Step 4
  • First column: protein sequence (AA) in one-letter
    code
  • Second column: observed (OSEC) secondary
    structure
  • Third column: predicted (PSEC) secondary
    structure
  • http://predictioncenter.llnl.gov/local/sov/sov.html

77
Evaluation-Result
 
78
EVA: Evaluation of Automatic protein structure
prediction
http://cubic.bioc.columbia.edu/eva/sec/graph/common3.jpg
79
Conclusion
  • Jpred pioneered the methods that give high
    Q3 and SOV scores.
  • Secondary structure prediction using a jury of
    neural networks is one of the best methods.

80
REFERENCES
  1. Cuff JA, Clamp ME, Siddiqui AS, Finlay M, Barton
     GJ. Jpred: a consensus secondary structure
     prediction server. Bioinformatics 1998;14:892-893.
  2. Cuff JA, Barton GJ. Evaluation and
     improvement of multiple sequence methods for
     protein secondary structure prediction.
     Proteins: Structure, Function, and Genetics
     1999;34:508-519.
  3. Cuff JA, Barton GJ. Application of
     multiple sequence alignment profiles to improve
     protein secondary structure prediction.
     Proteins: Structure, Function, and Genetics
     2000;40:502-511.
  4. Zemla et al. A modified definition of Sov,
     a segment-based measure for protein secondary
     structure prediction assessment. Proteins
     1999;34:220-223.
  5. Defay T, Cohen F. Evaluation of current
     techniques for ab initio protein structure
     prediction. Proteins 1995;23:431-445.
  6. Barton GJ. Protein secondary structure
     prediction. Curr Opin Struct Biol 1995;5:372-376.
  7. Schulz GE. A critical evaluation of methods for
     prediction of secondary structures.
     Ann Rev Biophys Chem 1988;17:1-21.
  8. Zhu Z-Y. A new approach to the evaluation of
     protein secondary structure predictions at the
     level of the elements of secondary structure.
     Protein Eng 1995;8:103-108.