Protein structure prediction - PowerPoint PPT Presentation

1 / 61
About This Presentation
Title:

Protein structure prediction

Description:

PSI-PRED secondary structure prediction based on PSIBLAST ... Strand(E), Helix(H) and Coil(C) AA: RLMPHIKRSAIPVNHGQCRWEDNVDERTNCMIQYVLIMRD ... – PowerPoint PPT presentation

Number of Views:109
Avg rating:3.0/5.0
Slides: 62
Provided by: lir2
Category:

less

Transcript and Presenter's Notes

Title: Protein structure prediction


1
Protein structure prediction
Einat Granot Liron Atedgi
2
Protein folding
  • Protein folding determined by AA sequence
  • Why knowing the folding is importance ?
  • Determine its functionality
  • Find distant evolutionary relationship
  • Design drugs

3
Protein structures
  • Primary structure
  • Secondary structure
  • Tertiary structure

4
Two prediction methods
  • PSI-PRED secondary structure prediction based
    on PSIBLAST
  • GenTHREADER tertiary structure prediction

Were developed by the group of David
T.Jones,University of Warwick
5
Methods general format
  • Sequence
  • Alignment
  • Additional
  • data

Neuron networks
Structure prediction
6

Neuron networks
7
Neuron networks
Output
Numerical inputs
Units
Why do we call it neuron network ?
Every unit performs weighted calculation
8
Neuron network hidden layer
with the increasing number of added layers the
mean square error is lower
Hidden layer
9
Neuron networks training
  • Network connections and weights determined by
    training process
  • Training performs by samples of input and
    expected output.
  • The learning algorithm is called back propagation

10
Network training testing
  • After training we perform testing
  • Training and testing groups must be chosen very
    carefully
  • What problems can arise ?
  • Insufficient training or testing
  • Testing group may be biased

11
Neuron networks is a black-box
  • The specific algorithm ofa working neuron
    networkis not known
  • Its hard to deduce new biological principles
    about the solved problem

12
PSI-PRED Secondary structure prediction
13
Secondary structure prediction
  • In DSSP 8 secondary structures categories
  • In PSI-PRED were joined into 3Strand(E),
    Helix(H) and Coil(C)

AA RLMPHIKRSAIPVNHGQCRWEDNVDERTNCMIQYVLIMRD Pre
d CCCCCHHHCCCCCCEEEEEECCCCCCHHHHEEEEEECCCC
14
PSI-PRED
  • sequence alignment (Find homologous)
  • Create protein profile
  • Insert to first neuron network
  • Insert to second neuron network
  • Final prediction

15
sequence alignment
  • Finding homologous for target protein using
    PSI-BLAST
  • Reminder ? What is PSI-BLAST?
  • Position Specific Iterated Blast,giving
    output to PSSM.

16
PSI-BLAST Pros Cons
  • Pros
  • Sensitive to distant homologous
  • Reliable
  • Accessible from every workstation
  • Cons
  • Sensitive to distant homologous - Result might be
    biased
  • Sensitive to repetitive sequences

17
Solving PSI-BLAST problems
  • A special DB of 340,000 sequences was constructed
    for PSI-PRED
  • This DB contains only unique and unrepetitive
    sequences

18
PSI-PRED
  • sequence alignment (Find homologous)
  • Create protein profile
  • Insert to first neuron network
  • Insert to second neuron network
  • Final prediction

19
Create protein profile
  • PSI-PRED uses the PSSM from PSI-BLAST produced
    after 3 iteration
  • This matrix is processed by transformation
  • f(x) , so the final values are
  • between 0 to 1

20
PSSM Output of PSI-BLAST
Transformation
21
Create protein profile
  • The matrix size is M x 20, when M is the sequence
    length
  • Addition column is added which defined the N/C
    terminus -gt M x 21 matrix

22
PSI-PRED
  • sequence alignment (Find homologous)
  • Create protein profile
  • Insert to first neuron network
  • Insert to second neuron network
  • Final prediction

23
Networks training testing
  • 187 proteins were selected according to CATH and
    PSI-BLAST
  • CATH filters proteins according to their folding
    domains configuration (T-level)
  • This considered to be a strict selection

24
First neuron network
Every time, a sequence of 15 AA long is inserted
into the first network
E H C
0.2 0.5 0.8
0.1 0.2 0.9
0.8 0.4 0.9
0.2 0.5 0.3
0.7 0.8 0.3
0.2 0.1 0.5
0.3 0.8 0.4
0.9 0.8 0.1
0.2 0.2 0.6
0.7 0.3 0.6
0.3 0.6 0.8
0.3 0.1 0.8
0.7 0.4 0.4
0.6 0.6 0.1
0.4 0.3 0.6
The output is a matrix 15 x 3
25
PSI-PRED
  • sequence alignment (Find homologous)
  • Create protein profile
  • Insert to first neuron network
  • Insert to second neuron network
  • Final prediction

26
Second neuron network
The input for the 2nd network is the output from
the 1st one
N/C E H C
0 0.2 0.5 0.8
0 0.1 0.2 0.9
0 0.8 0.4 0.9
0 0.2 0.5 0.3
0 0.7 0.8 0.3
0 0.2 0.1 0.5
0 0.3 0.8 0.4
0 0.9 0.8 0.1
0 0.2 0.2 0.6
0 0.7 0.3 0.6
0 0.3 0.6 0.8
0 0.3 0.1 0.8
0 0.7 0.4 0.4
0 0.6 0.6 0.1
1 0.4 0.3 0.6
E H C
0.1 0.5 0.9
0.2 0.1 0.9
0.3 0.4 0.8
0.2 0.7 0.1
0.5 0.8 0.2
0.2 0.7 0.3
0.2 0.9 0.3
0.6 0.8 0.3
0.1 0.3 0.7
0.4 0.2 0.8
0.3 0.5 0.9
0.3 0.2 0.8
0.9 0.2 0.1
0.8 0.3 0.1
0.8 0.3 0.4
Again, another column is added, indicates the N/C
terminus
27
Why do we need a second network?
  • Lets examine a possible prediction from
  • the 1st network
  • What is the problem with this prediction ?

Seq VLFLNDNLDDVVIGRPKRTYTAITL Pred
EEEECCCCHHHCCCHCCCEEEECC
A single AA helix does not exist
The 2nd network maintains the coherency between
adjacent AA and improves the accuracy
28
PSI-PRED
  • sequence alignment (Find homologous)
  • Create protein profile
  • Insert to first neuron network
  • Insert to second neuron network
  • Final prediction

29
Final prediction
Image of prediction
Degree Of confidence
Target sequence
Secondary structure
30
PSI-PRED evaluation
  • CASP Critical Assessment of technique for
    protein Structure Prediction experiments
  • At CASP3 PSI-PRED achieved the best results from
    all other methods participated

31
PSI-PRED evaluation
Q3 average PSI-PRED - 76.3 JPRED
72.4 DSC - 67.3
Q3 score percentage of AA predicted correctly
32

Reasons for success
  • The use of PSI-BLAST
  • More sensitive (iterative algorithm)
  • More accurate (pairwise local alignments)
  • Usage of neuron networks
  • Strict selection for training testing

33
Possible improvements
  • Larger data bases (training alignment)
  • Combinations with other methods (JPRED)
  • Predict more than 3 secondary structure

34
Bring out the food
35
GenTHREADER Tertiary structure Prediction

36
Threading methods
  • Trying to thread a target AA sequence on a
    template 3D structure

N
S
Q
M
V
D
L
I
R
E
R
A
Q
T
V
L
C
N
K
37
Templates collection
  • Target sequence is compared against a collection
    of sequences with known folding
  • The collection was taken from Brookhaven Protein
    Data Bank and includes unique sequences

38
GenTHREADER
  • Sequence alignment
  • Calculate threading potential
  • Insert to neuron network
  • Final prediction

39
Sequence alignment
  • The target sequence is aligned against each of
    the templates twice
  • Target profile against template sequence
  • Target sequence against template profile
  • The best result is taken

40
Creating a profile
  • Steps for creating a profile
  • Alignment against OWL DB(A DB for coding
    sequences)
  • Selection of sequences with E-Value lower than
    0.01
  • Constructing a profile using BLOSUM50

41
Creating a profile
A L M P H I K R S A I P V N H G Y V I M Q C R W E
D N S T K V
42
GenTHREADER
  • Sequence alignment
  • Calculate threading potential
  • Insert to neuron network
  • Final prediction

43
Calculate threading potential
  • Threading potential includes
  • pairwise potential
  • solvation potential

44
Pairwise potential
  • Potential for interaction between two AA
  • Considerate analysis of known structure and
    favorable energy configuration
  • Lower pairwise potential indicates a favorable
    state

45
Solvation potential
  • Calculated per AA and proportional to its degree
    of burial
  • Degree of burial (DOB) The num of other AA
    located in a radius of 10Ã…
  • Hydrophobic acids - a high DOB is preferred
  • Hydrophilic acids - a low DOB is preferred

46
GenTHREADER
  • Sequence alignment
  • Calculate threading potential
  • Insert to neuron network
  • Final prediction

47
Insert to neuron network
  • Prediction is very complex therefore a neuron
    network is used

48
Neuron network
  • Again, the 6 input parameters were converted to
    values between 0 1 using the function f(x)
  • The output is a value between 0 -1 showing the
    confidence of the match

49
Network training testing
  • The network was trained using pairs of proteins
    with known folding patterns
  • Again the training and testing sets were
    separated to avoid bias

50
GenTHREADER
  • Sequence alignment
  • Calculate threading potential
  • Insert to neuron network
  • Final prediction

51
Final prediction
  • Example for GenTHREADER results

52
GenTHREADER evaluation
  • Evaluated using Fischer benchmarking, 68 hard for
    predictions proteins pairs
  • 73.5 were properly detectedBest method - 76.5,
    Other methods 50-60
  • For the unrecognized proteins the scores were
    less than 0.5

53
Genome analysis example
  • As an example for GenTHREADER efficiency
    Mycoplasma Genitalium genome was analyzed
  • M.Genitalium is the smallest known bacterial
    genome contains 468 ORFs

54
Genome analyzing example
In one day the whole genome was analyzed
Confidence categories
Distribution of protein domain architecture
55
Genome analyzing example
GenTHREADER succeed to predict 46 from the
ORFs. For comparison, in other methods Fischer
Eisenberg 22 Using PSI-BLAST OWL 30
Huynen 38
56
Genome analyzing example
  • While analyzing M.Genitalium genome ORF MG353 was
    assigned to 1HUE
  • (Histone like protein)
  • ORF MG353 function wasnot known

57
Genome analyzing example
  • The results show high similarity in both
    secondary structure and functional AA

58
GenTHREADER advantages
  • Fast
  • No need for human intervention
  • Can distinguish false positive in high accuracy

59
Possible improvements
  • Improvement of sequence alignment (PSI-BLAST)
  • Additional input of sequence features (for
    example, secondary structure)
  • Larger DB of known folding proteins
  • Combination with other methods

Already exists (mGenTHREADER)
60
Dont try this at home
  • www.psipred.net
  • The web is for both GenTHREADER PSI-PRED

61
References
  • Jones DT. Protein secondary structure prediction
    based on position-specific matrices. J Mol Biol.
    1999 292195-202
  • Jones DT. GenTHREADER an efficient and reliable
    protein fold recognition method for genomic
    sequences. J Mol Biol. 1999 287797-815
Write a Comment
User Comments (0)
About PowerShow.com