Protein Secondary Structures - PowerPoint PPT Presentation

About This Presentation
Title:

Protein Secondary Structures

Description:

Protein Secondary Structures Assignment and prediction Secondary Structure Elements Use of secondary structure Classification of protein structures Definition of ... – PowerPoint PPT presentation

Slides: 38
Provided by: Claus78

Transcript and Presenter's Notes



1
Protein Secondary Structures
  • Assignment and prediction

2
Secondary Structure Elements
[Figure: protein structure with secondary structure elements labelled β-strand, helix, bend and turn]
3
Use of secondary structure
  • Classification of protein structures
  • Definition of loops/core
  • Use in fold recognition methods
  • Improvement of alignments
  • Definition of domain boundaries

4
Classification of secondary structure
  • Defining features
    • Dihedral angles
    • Hydrogen bonds
    • Geometry
  • Assigned manually by crystallographers or
  • Automatic
    • DSSP (Kabsch & Sander, 1983)
    • STRIDE (Frishman & Argos, 1995)
    • Continuum (Andersen et al., 2002)

5
Dihedral Angles
From http://www.imb-jena.de
phi - dihedral angle about the N-Calpha bond
psi - dihedral angle about the Calpha-C bond
omega - dihedral angle about the C-N (peptide) bond
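Here is a minimal sketch, using only the Python standard library, of how a dihedral angle can be computed from four atom positions. For phi the atoms are C(i-1), N, Calpha and C. For psi they are N, Calpha, C and N(i+1). The helper names are illustrative and do not come from any particular package.

```python
import math

Vec = tuple[float, float, float]

def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def scale(a: Vec, k: float) -> Vec:
    return (a[0] * k, a[1] * k, a[2] * k)

def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle in degrees (-180..180) about the p1-p2 bond."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    u1 = scale(b1, 1.0 / math.sqrt(dot(b1, b1)))  # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond
    v = sub(b0, scale(u1, dot(b0, u1)))
    w = sub(b2, scale(u1, dot(b2, u1)))
    x = dot(v, w)
    y = dot(cross(u1, v), w)
    return math.degrees(math.atan2(y, x))

# Toy geometry: p3 is rotated 90 degrees about the p1-p2 axis relative to p0
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)), 1))  # -90.0
```

A cis arrangement gives 0 degrees and a trans arrangement gives 180 degrees. Omega, the peptide bond, is close to 180 in both tables that follow.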
6
Helices
                          phi (deg)  psi (deg)  H-bond pattern
right-handed alpha-helix      -57.8      -47.0  i+4
pi-helix                      -57.1      -69.7  i+5
3-10 helix                    -74.0       -4.0  i+3
(omega is 180 deg in all cases)
From http://www.imb-jena.de
7
Beta Strands
             phi (deg)  psi (deg)  omega (deg)
beta strand       -120        120          180
Hydrogen bond patterns in beta sheets. Here a four-stranded beta sheet
is drawn schematically; it contains three antiparallel strands and one
parallel strand. Hydrogen bonds are indicated with red lines
(antiparallel strands) and green lines (parallel strands) connecting
the hydrogen and the acceptor oxygen.
From http://broccoli.mfn.ki.se/pps_course_96/
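The ideal angles from the two tables above can label a residue by its nearest reference conformation. This is only a rough toy. DSSP and STRIDE do not assign structure this way, because both also use hydrogen bonds. The dictionary and function names are hypothetical.

```python
import math

# Ideal (phi, psi) pairs in degrees, from the helix and beta-strand tables above
IDEAL = {
    "alpha-helix": (-57.8, -47.0),
    "pi-helix": (-57.1, -69.7),
    "3-10 helix": (-74.0, -4.0),
    "beta strand": (-120.0, 120.0),
}

def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (handles wrap-around)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)

def nearest_conformation(phi: float, psi: float) -> str:
    """Return the ideal conformation whose (phi, psi) lies closest to the given angles."""
    def distance(name: str) -> float:
        ref_phi, ref_psi = IDEAL[name]
        return math.hypot(angle_diff(phi, ref_phi), angle_diff(psi, ref_psi))
    return min(IDEAL, key=distance)

print(nearest_conformation(-60.0, -45.0))    # alpha-helix
print(nearest_conformation(-139.1, 156.9))   # beta strand
```

The second call uses the phi/psi of a residue that DSSP marks as E in the example output below.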
8
Secondary Structure Elements
[Figure: protein structure with secondary structure elements labelled β-strand, helix, bend and turn]
9
Secondary Structure Type Descriptions
  • H  alpha helix
  • B  residue in isolated beta-bridge
  • E  extended strand, participates in beta ladder
  • G  3-helix (3/10 helix)
  • I  5-helix (pi helix)
  • T  hydrogen-bonded turn
  • S  bend
10
Automatic assignment programs
  • DSSP (http://www.cmbi.kun.nl/gv/dssp/)
  • Continuum (http://cubic.bioc.columbia.edu/services/DSSPcont/)
  • STRIDE (http://www.hgmp.mrc.ac.uk/Registered/Option/stride.html)

DSSP output (excerpt):
  # RESIDUE AA STRUCTURE BP1  BP2  ACC   N-H-->O  O-->H-N  N-H-->O  O-->H-N     TCO  KAPPA  ALPHA    PHI    PSI  X-CA  Y-CA  Z-CA
  1    4 A  E              0    0  205    0, 0.0   2,-0.3   0, 0.0   0, 0.0   0.000  360.0  360.0  360.0  113.5   5.7  42.2  25.1
  2    5 A  H  -           0    0  127    2, 0.0   2,-0.4  21, 0.0  21, 0.0  -0.987  360.0 -152.8 -149.1  154.0   9.4  41.3  24.7
  3    6 A  V  -           0    0   66   -2,-0.3  21,-2.6   2, 0.0   2,-0.5  -0.995    4.6 -170.2 -134.3  126.3  11.5  38.4  23.5
  4    7 A  I  E -A       23   0A  106   -2,-0.4   2,-0.4  19,-0.2  19,-0.2  -0.976   13.9 -170.8 -114.8  126.6  15.0  37.6  24.5
  5    8 A  I  E -A       22   0A   74   17,-2.8  17,-2.8  -2,-0.5   2,-0.9  -0.972   20.8 -158.4 -125.4  129.1  16.6  34.9  22.4
  6    9 A  Q  E -A       21   0A   86   -2,-0.4   2,-0.4  15,-0.2  15,-0.2  -0.910   29.5 -170.4  -98.9  106.4  19.9  33.0  23.0
  7   10 A  A  E +A       20   0A   18   13,-2.5  13,-2.5  -2,-0.9   2,-0.3  -0.852   11.5  172.8 -108.1  141.7  20.7  31.8  19.5
  8   11 A  E  E +A       19   0A   63   -2,-0.4   2,-0.3  11,-0.2  11,-0.2  -0.933    4.4  175.4 -139.1  156.9  23.4  29.4  18.4
  9   12 A  F  E -A       18   0A   31    9,-1.5   9,-1.8  -2,-0.3   2,-0.4  -0.967   13.3 -160.9 -160.6  151.3  24.4  27.6  15.3
 10   13 A  Y  E -A       17   0A   36   -2,-0.3   2,-0.4   7,-0.2   7,-0.2  -0.994   16.5 -156.0 -136.8  132.1  27.2  25.3  14.1
 11   14 A  L  E >> -A    16   0A   24    5,-3.2   4,-1.7  -2,-0.4   5,-1.3  -0.929   11.7 -122.6 -120.0  133.5  28.0  24.8  10.4
 12   15 A  N  T 45S+      0    0   54   -2,-0.4  -2, 0.0   2,-0.2   0, 0.0  -0.884   84.3    9.0 -113.8  150.9  29.7  22.0   8.6
 13   16 A  P  T 45S+      0    0  114    0, 0.0  -1,-0.2   0, 0.0  -2, 0.0  -0.963  125.4   60.5  -86.5    8.5  32.0  21.6   6.8
 14   17 A  D  T 45S-      0    0   66    2,-0.1  -2,-0.2   1,-0.1   3,-0.1   0.752   89.3 -146.2  -64.6  -23.0  33.0  25.2   7.6
 15   18 A  Q  T <5 +      0    0  132   -4,-1.7   2,-0.3   1,-0.2  -3,-0.2   0.936   51.1  134.1   52.9   50.0  33.3  24.2  11.2
 16   19 A  S  E < +A     11   0A   44   -5,-1.3  -5,-3.2   2, 0.0   2,-0.3  -0.877   28.9  174.9 -124.8  156.8  32.1  27.7  12.3
 17   20 A  G  E -A       10   0A   28   -2,-0.3   2,-0.3  -7,-0.2  -7,-0.2  -0.893   15.9 -146.5 -151.0 -178.9  29.6  28.7  14.8
 18   21 A  E  E -A        9   0A   14   -9,-1.8  -9,-1.5  -2,-0.3   2,-0.4  -0.979    5.0 -169.6 -158.6  146.0  28.0  31.5  16.7
 19   22 A  F  E +A        8   0A    3   12,-0.4  12,-2.3  -2,-0.3   2,-0.3  -0.982   27.8  149.2 -139.1  120.3  26.5  32.2  20.1
 20   23 A  M  E -AB       7  30A    0  -13,-2.5 -13,-2.5  -2,-0.4   2,-0.4  -0.983   39.7 -127.8 -152.1  161.6  24.5  35.4  20.6
 21   24 A  F  E -AB       6  29A   45    8,-2.4   7,-2.9  -2,-0.3   8,-1.0  -0.934   23.9 -164.1 -112.5  137.7  21.7  37.0  22.6
 22   25 A  D  E -AB       5  27A    6  -17,-2.8 -17,-2.8  -2,-0.4   2,-0.5  -0.948    6.9 -165.0 -123.7  138.3  18.9  38.9  20.8
 23   26 A  F  E > S-AB    4  26A   76    3,-3.5   3,-2.1  -2,-0.4 -19,-0.2  -0.947   78.4  -27.2 -127.3  111.5  16.4  41.3  22.3
 24   27 A  D  T 3 S-      0    0   74  -21,-2.6 -20,-0.1  -2,-0.5  -1,-0.1   0.904  128.9  -46.6   50.4   45.0  13.4  42.1  20.2
 25   28 A  G  T 3 S+      0    0   20  -22,-0.3   2,-0.4   1,-0.2  -1,-0.3   0.291  118.8  109.3   84.7  -11.1  15.4  41.4  17.0
 26   29 A  D  E < S-B    23   0A  114   -3,-2.1  -3,-3.5 109, 0.0   2,-0.3  -0.822   71.8 -114.7 -103.1  140.3  18.4  43.4  18.1
 27   30 A  E  E -B       22   0A    8   -2,-0.4  -5,-0.3  -5,-0.2   3,-0.1  -0.525   24.9 -177.7  -74.1  127.5  21.8  41.8  19.1
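The per-residue STRUCTURE letters in output like this can be collapsed into secondary structure elements, as the sketch below shows. The input string was typed in by hand from the first letter of the STRUCTURE column above; residues 4-6 have no letter and are left blank. The sketch does not parse a real DSSP file by column position.

```python
from itertools import groupby

def segments(ss: str, first_residue: int = 1) -> list[tuple[str, int, int]]:
    """Collapse a per-residue secondary structure string into (code, first, last) runs.

    Blank positions (no assigned structure) are skipped but still counted.
    """
    runs = []
    pos = first_residue
    for code, group in groupby(ss):
        length = len(list(group))
        if code != " ":
            runs.append((code, pos, pos + length - 1))
        pos += length
    return runs

# STRUCTURE letters of residues 4-30 in the excerpt above
ss = "   " + "E" * 8 + "T" * 4 + "E" * 8 + "TT" + "EE"
print(segments(ss, first_residue=4))
# [('E', 7, 14), ('T', 15, 18), ('E', 19, 26), ('T', 27, 28), ('E', 29, 30)]
```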
11
Secondary Structure Prediction
  • What to predict?
  • All 8 types or pool types into groups

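Q3: pool the eight types into three groups, H (helix), E (strand) and C (coil)
  • H  α-helix
  • B  residue in isolated β-bridge
  • E  extended strand, participates in β-ladder
  • G  3-helix (3/10 helix)
  • I  5-helix (π-helix)
  • T  hydrogen-bonded turn
  • S  bend
  • C/.  random coil
Grouping schemes: Straight HEC, CASP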
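Below is a minimal sketch of a three-state reduction. It maps H, G and I to H, maps E and B to E, and maps everything else to C. This grouping is one commonly used convention and is an assumption here. It is not necessarily the same as the "Straight HEC" or "CASP" schemes named on the slide.

```python
# Assumed 8 -> 3 grouping; codes not listed (T, S, blank, '.') become coil.
EIGHT_TO_THREE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}

def to_three_state(dssp: str) -> str:
    """Map each DSSP secondary structure code to H, E or C."""
    return "".join(EIGHT_TO_THREE.get(code, "C") for code in dssp)

print(to_three_state("HHHGGGTT EEEEBS"))  # HHHHHHCCCEEEEEC
```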
12
Secondary Structure Prediction
  • Simple alignments
    • Align to a close homolog for which the structure has been
      experimentally solved.
  • Heuristic methods (e.g., Chou-Fasman, 1974)
    • Apply scores for each amino acid and sum them up over a window.
  • Neural networks (different inputs)
    • Raw sequence (late 80s)
    • BLOSUM matrix (e.g., PHD, early 90s)
    • Position-specific alignment profiles (e.g., PSIPRED, late 90s)
    • Multiple networks balloting, probability conversion, output
      expansion (Petersen et al., 2000).

13
Improvement of accuracy
Year  Method              Accuracy (%)
1974  Chou & Fasman       50-53
1978  Garnier             63
1987  Zvelebil            66
1988  Qian & Sejnowski    64.3
1993  Rost & Sander       70.8-72.0
1997  Frishman & Argos    < 75
1999  Cuff & Barton       72.9
1999  Jones               76.5
2000  Petersen et al.     77.9
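The figures above are presumably per-residue three-state accuracies (Q3), meaning the percentage of residues whose H/E/C state is predicted correctly. A minimal sketch of that measure follows; the example strings are made up.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label (H/E/C) is predicted correctly."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

print(q3("HHHHCCEEEC", "HHHCCCEEEE"))  # 80.0
```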
14
Simple Alignments
  • Solved structure of a homolog to query is needed
  • Homologous proteins have 88% identical (3-state) secondary
    structure
  • If no close homologue can be identified, alignments will give
    almost random results
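Here is a sketch of the simple-alignment approach. It copies the known secondary structure of the solved homolog onto the query through a pairwise alignment. The alignment is assumed to be computed already, with gaps written as '-'. Labelling query residues that face a template gap as C is an arbitrary choice made here, and the example strings are invented.

```python
def transfer_ss(query_aln: str, template_aln: str, template_ss: str) -> str:
    """Copy the template's secondary structure onto the query via a pairwise alignment.

    template_ss has one character per ungapped template residue.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("alignment rows must have equal length")
    result = []
    t = 0  # index into the ungapped template sequence
    for q_char, t_char in zip(query_aln, template_aln):
        if t_char != "-":
            ss = template_ss[t]
            t += 1
        else:
            ss = "C"  # query residue opposite a template gap (assumption)
        if q_char != "-":
            result.append(ss)
    return "".join(result)

print(transfer_ss("IKEEHV-IQ", "IKDEHVLIQ", "CEEEEECHH"))  # CEEEEEHH
```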

15
Amino acid preferences in α-helix
16
Amino acid preferences in β-strand
17
Amino acid preferences in coil
18
Chou-Fasman
Name  P(α)  P(β)  P(turn)   f(i)  f(i+1)  f(i+2)  f(i+3)
Ala    142    83       66  0.060   0.076   0.035   0.058
Arg     98    93       95  0.070   0.106   0.099   0.085
Asp    101    54      146  0.147   0.110   0.179   0.081
Asn     67    89      156  0.161   0.083   0.191   0.091
Cys     70   119      119  0.149   0.050   0.117   0.128
Glu    151    37       74  0.056   0.060   0.077   0.064
Gln    111   110       98  0.074   0.098   0.037   0.098
Gly     57    75      156  0.102   0.085   0.190   0.152
His    100    87       95  0.140   0.047   0.093   0.054
Ile    108   160       47  0.043   0.034   0.013   0.056
Leu    121   130       59  0.061   0.025   0.036   0.070
Lys    114    74      101  0.055   0.115   0.072   0.095
Met    145   105       60  0.068   0.082   0.014   0.055
Phe    113   138       60  0.059   0.041   0.065   0.065
Pro     57    55      152  0.102   0.301   0.034   0.068
Ser     77    75      143  0.120   0.139   0.125   0.106
Thr     83   119       96  0.086   0.108   0.065   0.079
Trp    108   137       96  0.077   0.013   0.064   0.167
Tyr     69   147      114  0.082   0.065   0.114   0.125
Val    106   170       50  0.062   0.048   0.028   0.053
19
Chou-Fasman
1. Assign all of the residues in the peptide the appropriate set of
   parameters.
2. Scan through the peptide and identify regions where 4 out of 6
   contiguous residues have P(α-helix) > 100. That region is declared
   an α-helix. Extend the helix in both directions until a set of four
   contiguous residues with an average P(α-helix) < 100 is reached;
   that is declared the end of the helix. If the segment defined by
   this procedure is longer than 5 residues and the average
   P(α-helix) > P(β-sheet) for that segment, the segment can be
   assigned as a helix.
3. Repeat this procedure to locate all of the helical regions in the
   sequence.
4. Scan through the peptide and identify a region where 3 out of 5 of
   the residues have P(β-sheet) > 100. That region is declared a
   β-sheet. Extend the sheet in both directions until a set of four
   contiguous residues with an average P(β-sheet) < 100 is reached;
   that is declared the end of the β-sheet. Any segment of the region
   located by this procedure is assigned as a β-sheet if the average
   P(β-sheet) > 105 and the average P(β-sheet) > P(α-helix) for that
   region.
5. Any region containing overlapping α-helical and β-sheet assignments
   is taken to be helical if the average P(α-helix) > P(β-sheet) for
   that region, and a β-sheet if the average P(β-sheet) > P(α-helix).
6. To identify a bend at residue number j, calculate
      p(t) = f(j) × f(j+1) × f(j+2) × f(j+3)
   where f(j) is taken from the f(i) column for residue j, f(j+1) from
   the f(i+1) column for residue j+1, f(j+2) from the f(i+2) column for
   residue j+2, and f(j+3) from the f(i+3) column for residue j+3.
   If (1) p(t) > 0.000075, (2) the average P(turn) in the tetrapeptide
   is > 1.00 (i.e. > 100 on the scale of the table on the previous
   slide), and (3) the averages for the tetrapeptide obey
   P(α-helix) < P(turn) > P(β-sheet), then a β-turn is predicted at
   that location.
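The helix steps (2-3) and the turn test (6) can be sketched as follows, using the parameters from the table on the previous slide. This is one possible reading of the rules, not a reference implementation. "Extend until four residues average below 100" is interpreted as "keep adding a residue while the four-residue window ending (or starting) at it averages at least 100". The sheet steps (4-5) are left out to keep the sketch short.

```python
# Chou-Fasman parameters from the table (one-letter codes):
# (P(alpha), P(beta), P(turn), f(i), f(i+1), f(i+2), f(i+3))
CF = {
    "A": (142, 83, 66, 0.060, 0.076, 0.035, 0.058),
    "R": (98, 93, 95, 0.070, 0.106, 0.099, 0.085),
    "D": (101, 54, 146, 0.147, 0.110, 0.179, 0.081),
    "N": (67, 89, 156, 0.161, 0.083, 0.191, 0.091),
    "C": (70, 119, 119, 0.149, 0.050, 0.117, 0.128),
    "E": (151, 37, 74, 0.056, 0.060, 0.077, 0.064),
    "Q": (111, 110, 98, 0.074, 0.098, 0.037, 0.098),
    "G": (57, 75, 156, 0.102, 0.085, 0.190, 0.152),
    "H": (100, 87, 95, 0.140, 0.047, 0.093, 0.054),
    "I": (108, 160, 47, 0.043, 0.034, 0.013, 0.056),
    "L": (121, 130, 59, 0.061, 0.025, 0.036, 0.070),
    "K": (114, 74, 101, 0.055, 0.115, 0.072, 0.095),
    "M": (145, 105, 60, 0.068, 0.082, 0.014, 0.055),
    "F": (113, 138, 60, 0.059, 0.041, 0.065, 0.065),
    "P": (57, 55, 152, 0.102, 0.301, 0.034, 0.068),
    "S": (77, 75, 143, 0.120, 0.139, 0.125, 0.106),
    "T": (83, 119, 96, 0.086, 0.108, 0.065, 0.079),
    "W": (108, 137, 96, 0.077, 0.013, 0.064, 0.167),
    "Y": (69, 147, 114, 0.082, 0.065, 0.114, 0.125),
    "V": (106, 170, 50, 0.062, 0.048, 0.028, 0.053),
}

def mean(values) -> float:
    values = list(values)
    return sum(values) / len(values)

def predict_helices(seq: str) -> str:
    """Steps 2-3: nucleate on 6-residue windows with >= 4 helix formers, then extend."""
    pa = [CF[aa][0] for aa in seq]
    pb = [CF[aa][1] for aa in seq]
    n = len(seq)
    helix = [False] * n
    for s in range(n - 5):
        if sum(p > 100 for p in pa[s:s + 6]) < 4:
            continue
        start, end = s, s + 6  # half-open interval [start, end)
        # Extend right while the tetrapeptide ending at the new residue averages >= 100
        while end < n and mean(pa[end - 3:end + 1]) >= 100:
            end += 1
        # Extend left while the tetrapeptide starting at the new residue averages >= 100
        while start > 0 and mean(pa[start - 1:start + 3]) >= 100:
            start -= 1
        if end - start > 5 and mean(pa[start:end]) > mean(pb[start:end]):
            for i in range(start, end):
                helix[i] = True
    return "".join("H" if h else "-" for h in helix)

def is_beta_turn(seq: str, j: int) -> bool:
    """Step 6: test for a beta-turn at residues j..j+3."""
    tetra = seq[j:j + 4]
    p_t = CF[tetra[0]][3] * CF[tetra[1]][4] * CF[tetra[2]][5] * CF[tetra[3]][6]
    avg_a = mean(CF[aa][0] for aa in tetra)
    avg_b = mean(CF[aa][1] for aa in tetra)
    avg_t = mean(CF[aa][2] for aa in tetra)
    # P(turn) > 1.00 on the original scale corresponds to > 100 on the table's scale
    return p_t > 0.000075 and avg_t > 100 and avg_t > avg_a and avg_t > avg_b

print(predict_helices("GGGGAEELLKKGGGG"))  # --HHHHHHHHHHH--
print(is_beta_turn("NPDG", 0))             # True
```

The glycines at positions 3-4 and 12-13 end up inside the predicted helix. This shows how coarse the window rules are, which fits the low accuracy noted on the next slide.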
20
Chou-Fasman
  • Generally applicable
  • Works for sequences with no solved homologs
  • But: low accuracy

21
Neural Networks
  • Benefits
    • Generally applicable
    • Can capture higher-order correlations
    • Inputs other than sequence information
  • Drawbacks
    • Need a lot of data (different solved structures). However, these
      do exist today (nearly 2000 solved structures with low sequence
      identity).
    • Complex methods with several pitfalls.

22
Architecture
[Figure: network architecture; a window over the sequence IKEEHVIIQAEFYLNPDQSGEF.. feeds the input layer, which is connected by weights to a hidden layer and an output layer (H, E, C)]
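Here is a minimal sketch of the forward pass of such a window network. A window of residues, each encoded as 20 inputs, feeds a sigmoid hidden layer, which feeds three sigmoid outputs (H, E, C). The window of 17 and the 50 hidden units come from the architectures listed later in the deck. The weights are random and untrained, so the outputs only illustrate the shapes involved.

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in: int, n_out: int, rng: random.Random) -> list[list[float]]:
    """Random weights: one row per output neuron; the last entry of each row is the bias."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_out)]

def layer(x: list[float], weights: list[list[float]]) -> list[float]:
    """Fully connected layer with sigmoid activation (zip stops before the bias entry)."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + row[-1]) for row in weights]

def predict(x: list[float], hidden_w, output_w) -> dict[str, float]:
    hidden = layer(x, hidden_w)
    h, e, c = layer(hidden, output_w)
    return {"H": h, "E": e, "C": c}

WINDOW, N_INPUTS_PER_RESIDUE, N_HIDDEN = 17, 20, 50
rng = random.Random(0)
hidden_w = init_layer(WINDOW * N_INPUTS_PER_RESIDUE, N_HIDDEN, rng)  # 340 inputs -> 50
output_w = init_layer(N_HIDDEN, 3, rng)                              # 50 -> H, E, C

x = [0.0] * (WINDOW * N_INPUTS_PER_RESIDUE)  # placeholder for an encoded window
print(predict(x, hidden_w, output_w))
```

In practice the weights would be trained on residues of known structure. The prediction for a residue is the output with the highest activity.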
23
Sparse encoding
Input neuron:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
A              1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
R              0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
N              0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
D              0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
C              0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
Q              0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0  0
E              0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0  0  0  0  0
24
Input Layer
[Figure: input layer for the window IKEEHVIIQAE; the residue E is sparse-encoded as 20 input neurons, all 0 except neuron 7, which is 1]
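The sparse (one-hot) input from the last two slides can be sketched as below. The neuron order follows the table (A R N D C Q E G H I L K M F P S T W Y V). Window positions beyond the chain ends are left as all zeros, as in the figure. The function name is illustrative.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # neuron order used in the sparse encoding table

def sparse_encode_window(seq: str, center: int, window: int) -> list[int]:
    """One-hot encode the `window` residues centred on seq[center] (window should be odd)."""
    half = window // 2
    inputs: list[int] = []
    for pos in range(center - half, center + half + 1):
        one_hot = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(seq):  # positions outside the chain stay all zero
            one_hot[AMINO_ACIDS.index(seq[pos])] = 1
        inputs.extend(one_hot)
    return inputs

vec = sparse_encode_window("IKEEHVIIQAEFYLNPDQSGEF", center=2, window=5)
print(len(vec))                 # 100 inputs = 5 positions x 20 neurons
print(vec[40:60].index(1) + 1)  # 7: the central E switches on neuron 7
```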
25
BLOSUM 62
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X  *
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0 -2 -1  0 -4
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3 -1  0 -1 -4
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3  3  0 -1 -4
D -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3  4  1 -1 -4
C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1 -3 -3 -2 -4
Q -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2  0  3 -1 -4
E -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2  1  4 -1 -4
G  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3 -1 -2 -1 -4
H -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3  0  0 -1 -4
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3 -3 -3 -1 -4
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1 -4 -3 -1 -4
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2  0  1 -1 -4
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1 -3 -1 -1 -4
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1 -3 -3 -1 -4
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2 -2 -1 -2 -4
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2  0  0  0 -4
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0 -1 -1  0 -4
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3 -4 -3 -2 -4
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1 -3 -2 -1 -4
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4 -3 -2 -1 -4
26
Input Layer
[Figure: input layer for the same window with BLOSUM 62 encoding; the residue E is represented by its BLOSUM 62 row: -1 0 0 2 -4 2 5 -2 0 -3 -3 1 -2 -3 -1 0 -1 -3 -2 -2]
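BLOSUM encoding replaces each one-hot block with the residue's BLOSUM 62 row. The sketch copies only the rows it needs from the BLOSUM 62 table (first 20 columns, A R N D ... Y V order). Padding positions outside the chain with zeros is an assumption.

```python
# BLOSUM 62 rows copied from the table (first 20 columns only)
BLOSUM62_ROWS = {
    "I": [-1, -3, -3, -3, -1, -3, -3, -4, -3, 4, 2, -3, 1, 0, -3, -2, -1, -3, -1, 3],
    "K": [-1, 2, 0, -1, -3, 1, 1, -2, -1, -3, -2, 5, -1, -3, -1, 0, -1, -3, -2, -2],
    "E": [-1, 0, 0, 2, -4, 2, 5, -2, 0, -3, -3, 1, -2, -3, -1, 0, -1, -3, -2, -2],
    "H": [-2, 0, 1, -1, -3, 0, 0, -2, 8, -3, -3, -1, -2, -1, -2, -1, -2, -2, 2, -3],
}

def blosum_encode_window(seq: str, center: int, window: int,
                         rows: dict[str, list[int]] = BLOSUM62_ROWS) -> list[int]:
    """Encode each residue in the window by its BLOSUM 62 row (20 inputs per position)."""
    half = window // 2
    inputs: list[int] = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(seq):
            inputs.extend(rows[seq[pos]])
        else:  # outside the chain: zero padding (assumption)
            inputs.extend([0] * 20)
    return inputs

vec = blosum_encode_window("IKEEH", center=2, window=5)
print(vec[40:60])  # [-1, 0, 0, 2, -4, 2, 5, -2, 0, -3, -3, 1, -2, -3, -1, 0, -1, -3, -2, -2]
```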
27
Secondary networks(Structure-to-Structure)
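[Figure: structure-to-structure network; a window over the sequence IKEEHVIIQAEFYLNPDQSGEF.. whose per-residue H/E/C values from the first network feed the input layer, which is connected by weights to a hidden layer and an output layer (H, E, C)]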
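The input to the structure-to-structure network is a window over the first network's per-residue outputs rather than over amino acids. With the window of 17 listed later in the deck, that gives 17 × 3 = 51 inputs. Below is a minimal sketch of building that input. The probabilities are invented, and zero padding at the chain ends is an assumption.

```python
def structure_window(probs: list[tuple[float, float, float]], center: int,
                     window: int) -> list[float]:
    """Concatenate the first network's (H, E, C) outputs over a window around `center`."""
    half = window // 2
    inputs: list[float] = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(probs):
            inputs.extend(probs[pos])
        else:  # outside the chain: zero padding (assumption)
            inputs.extend((0.0, 0.0, 0.0))
    return inputs

first_network = [(0.7, 0.1, 0.2), (0.6, 0.2, 0.2), (0.2, 0.1, 0.7)]
print(structure_window(first_network, center=0, window=3))
# [0.0, 0.0, 0.0, 0.7, 0.1, 0.2, 0.6, 0.2, 0.2]
```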
28
PHD method (Rost and Sander)
  • Combine neural networks with sequence profiles
  • 6-8 percentage points increase in prediction accuracy over
    standard neural networks
  • Use a second-layer structure-to-structure network to filter
    predictions
  • Jury of predictors
  • Set up as a mail server

29
PSI-Pred (Jones, DT)
  • Use alignments from iterative sequence searches
    (PSI-Blast) as input to a neural network
  • Better predictions due to better sequence
    profiles
  • Available as a stand-alone program and via the web

30
Position-specific scoring matrices (BLAST profiles)
      A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
 1 I -2 -4 -5 -5 -2 -4 -4 -5 -5  6  0 -4  0 -2 -4 -4 -2 -4 -3  4
 2 K -1 -1 -2 -2 -3 -1  3 -3 -2 -2 -3  4 -2 -4 -3  1  1 -4 -3  2
 3 E  5 -3 -3 -3 -3  3  1 -2 -3 -3 -3 -2 -2 -4 -3 -1 -2 -4 -3  1
 4 E -4 -3  2  5 -6  1  5 -4 -3 -6 -6 -2 -5 -6 -4 -2 -3 -6 -5 -5
 5 H -4  2  1  1 -5  1 -2 -4  9 -5 -2 -3 -4 -4 -5 -3 -4 -5  1 -5
 6 V -3  0 -4 -5 -4 -4 -2 -3 -5  1 -2  1  0  1 -4 -3  3 -5 -3  5
 7 I  0 -2 -4  1 -4 -2 -4 -4 -5  1  0 -2  0  2 -5  1 -1 -5 -3  4
 8 I -3  0 -5 -5 -4 -2 -5 -6  1  2  4 -4 -1  0 -5 -2  0 -3  5 -1
 9 Q -2 -3 -2 -3 -5  4 -1  3  5 -5 -3 -3 -4 -2 -4  2 -1 -4  2 -2
10 A  2 -4 -4 -3  2 -3 -1 -4 -2  1 -1 -4 -3 -4  1  2  3 -5 -1  1
11 E -1  3  1  1 -1  0  1 -4 -3 -1 -3  0  3 -5  4 -1 -3 -6 -3 -1
12 F -3 -5 -5 -5 -4 -4 -4 -1 -1  1  1 -5  2  5 -1 -4 -4 -3  5  2
13 Y  3 -5 -5 -6  3 -4 -5 -2 -1  0 -4 -5 -3  3 -5 -2 -2 -2  7  1
14 L -1 -3 -4 -2  1  5  1 -1 -1 -1  1 -3 -3  1 -5 -1 -1 -2  3 -2
15 N -1 -4  4  1  5 -3 -4  2 -4 -4 -4 -3 -2 -4 -5  2  0 -5  0  0
16 P -2  4 -4 -4 -5  0 -3  3  2 -5 -4  0 -4 -3  0  1 -2 -1  5 -3
17 D -3 -2  1  5 -6 -2  2  2 -1 -2 -2 -3 -5 -4 -5 -1  2 -6 -3 -4
31
Benchmarking secondary structure predictions
  • CASP
    • Critical Assessment of Structure Predictions
    • Sequences from about-to-be-solved structures are given to groups
      who submit their predictions before the structure is published
  • EVA
    • Newly solved structures are sent to prediction servers.

32
EVA results (Rost et al., 2001)
  • PROFphd 77.0
  • PSIPRED 76.8
  • SAM-T99sec 76.1
  • SSpro 76.0
  • Jpred2 75.5
  • PHD 71.7
  • Cubic.columbia.edu/eva

33
Several different architectures
  • Sequence-to-structure
    • Window sizes 15, 17, 19 and 21
    • Hidden units 50 and 75
    • 10-fold cross-validation → 4 window sizes × 2 hidden-layer sizes
      × 10 folds = 80 predictions
  • Structure-to-structure
    • Window size 17
    • Hidden units 40
    • 10-fold cross-validation → 800 predictions

34
Balloting procedure
  • Confidence of a per-residue prediction
    • P(highest) - P(second highest)
    • e.g. H 0.80, E 0.05, C 0.15 → conf. = 0.80 - 0.15 = 0.65
  • Mean per-chain confidence for all 800 predictions
    • Calculate mean and standard deviation
    • Average the per-chain predictions with Z > 2
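The balloting steps can be sketched as follows. Two details are assumptions made here, not details from the slide: the sketch uses the population standard deviation, and it falls back to a plain average when no prediction has Z > 2.

```python
import statistics

def confidence(probs: dict[str, float]) -> float:
    """Per-residue confidence: P(highest) - P(second highest)."""
    top, second = sorted(probs.values(), reverse=True)[:2]
    return top - second

def ballot(predictions: list[list[dict[str, float]]]) -> list[dict[str, float]]:
    """Average the per-chain predictions whose mean confidence has Z > 2.

    `predictions` holds one entry per network, each a list of per-residue
    {"H": p, "E": p, "C": p} dicts for the same chain.
    """
    chain_conf = [statistics.mean(confidence(r) for r in pred) for pred in predictions]
    mu = statistics.mean(chain_conf)
    sd = statistics.pstdev(chain_conf)  # population SD (assumption)
    selected = [p for p, c in zip(predictions, chain_conf)
                if sd > 0 and (c - mu) / sd > 2]
    if not selected:  # fallback (assumption): average all predictions
        selected = predictions
    n_residues = len(predictions[0])
    return [{k: statistics.mean(p[i][k] for p in selected) for k in "HEC"}
            for i in range(n_residues)]

print(round(confidence({"H": 0.80, "E": 0.05, "C": 0.15}), 2))  # 0.65
```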

35
Activities to probabilities
[Figure: conversion of network output activities to probabilities; helix and strand activities are network outputs, while coil probabilities are calculated (coil conversion)]
36
Links to servers
  • Database of links
    • http://mmtsb.scripps.edu/cgi-bin/renderrelres?protmodel
  • ProfPHD
    • http://cubic.bioc.columbia.edu/
  • PSIPRED
    • http://bioinf.cs.ucl.ac.uk/psipred/
  • JPred
    • www.compbio.dundee.ac.uk/Software/JPred/jpred.html

37
Practical Conclusion
  • If you need a secondary structure prediction use
    one of the newer ones such as
  • ProfPHD,
  • PSIPRED, and
  • JPred
  • And not one of the older ones such as
  • Chou-Fasman, and
  • Garnier