Title: SSASS: Single Spectrum ASSignment of Protein Backbone Chemical Shifts
1. SSASS: Single Spectrum ASSignment of Protein Backbone Chemical Shifts
- Shan Sundararaj
- November 19, 2004
2. NMR Analysis of Protein Structure
Data Collection (days)
Data Processing (hours to days)
Sequential Chemical Shift Assignment
NOESY assignment
Structure Building
3. NMR Structure Determination
- Assign chemical shifts using HSQC and 3D experiments
- Use assigned chemical shifts to assign NOESY data
- Use distance constraints from NOESY data to assemble the structure
4. NMR Structure Determination
[Figure: Assigned Spectra → Assigned NOESY → 3D Structure; spectra shown include HSQC, HNCO, HNCA, HNHA, CBCA(CO)NH, HNCACB, CA(CO)NH, HA(CO)NH, etc.; panels labelled 50 ms, 100 ms, 250 ms]
5. Can We Do It Even Faster?
HNCACB
- Collect just 1 HNCACB spectrum
- Determine structure without NOEs
Single Spectrum Spectroscopy
6. Chemical Shift Assignment
- Currently a slow, tedious process
- Several multidimensional spectra
  - HSQC-TOCSY, CBCA(CO)NH, HNCA, etc.
  - Processing and peak-picking for each spectrum
- Use HSQC to select 15N and 1H shifts, then use other experiments to pick out likely spin systems
- Assemble spin systems based on connectivity in various experiments → assign chemical shifts to residues
7. Another Way?
- Chemical shifts themselves are directly influenced by structure
- Several tools infer chemical shifts from structure or homology
- Manual assignment can be automated
8. SHIFTY
- Searches RefDB for homologous proteins already assigned by NMR
- Higher sequence homology → more accurate prediction of chemical shifts
9. SHIFTX
- Given a structure or model of a protein, predicts
chemical shifts based on local structure of each
residue
10. SimPred
- Predicts chemical shifts based on secondary
structure prediction
11. Automated Assignment
- Can use predicted chemical shift assignments as a guide to make assignments from real spectra
- What is the best method to get enough real experimental data to make assignments?
- Use a single HNCACB experiment to generate all the shifts and connectivity information (Single Spectrum ASSignment: SSASS!)
12. HNCACB Experiment
- First 2 dimensions are amide proton/nitrogen
- Third dimension contains 13C chemical shifts
  - Cα and Cβ resonances of a residue AND of the residue before it
13. HNCACB Connectivity
- Each amide nitrogen couples more strongly to its own Cα than to the Cα of the previous residue (higher intensity)
- Cα and Cβ correlations have opposite signs (i.e. Cα has +ve intensity, Cβ has -ve)
- Each spin system should have 4 C shifts: CAi, CBi, CAi-1, CBi-1
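A minimal Python sketch of how the sign and intensity rules above could label one spin system. It assumes Cα peaks are positive and Cβ peaks negative (the actual sign depends on phasing) and that a residue's own peaks are stronger than those of residue i-1; `Peak` and `label_spin_system` are hypothetical names, not part of SSASS. Systems containing or following a glycine (no Cβ) have fewer than four peaks, so anything other than two positive plus two negative peaks is simply flagged for refinement.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    c: float          # 13C shift (ppm)
    intensity: float  # signed peak height


def label_spin_system(peaks):
    """Label the HNCACB peaks that share one amide (1H, 15N) position.

    Assumes CA peaks are positive, CB peaks negative, and the residue's own
    peaks are stronger than those of residue i-1.
    """
    # Strongest first within each sign class
    ca = sorted((p for p in peaks if p.intensity > 0), key=lambda p: abs(p.intensity), reverse=True)
    cb = sorted((p for p in peaks if p.intensity < 0), key=lambda p: abs(p.intensity), reverse=True)
    if len(ca) != 2 or len(cb) != 2:
        raise ValueError("not a clean 2 + 2 system; needs refinement")
    return {"CAi": ca[0].c, "CAi-1": ca[1].c, "CBi": cb[0].c, "CBi-1": cb[1].c}


# Shifts of residue E from the assignment table, with made-up intensities
print(label_spin_system([Peak(56.4, 1.0), Peak(32.6, -0.8), Peak(61.2, 0.5), Peak(68.2, -0.4)]))
# {'CAi': 56.4, 'CAi-1': 61.2, 'CBi': 32.6, 'CBi-1': 68.2}
```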
14. HNCACB Correlations
http://www-bioc.rice.edu/mev/spectra2.html
15. HNCACB Correlations
[Figure: HNCACB strips for residue i (N 128.27 ppm) and residue i-1 (N 124.52 ppm)]
16. Chemical Shift Assignment
Res  HN    NH     CA    CB    CA-1  CB-1
G    7.91  105.2  45.1  0.0   55.3  33.1
T    8.43  111.3  61.2  68.3  45.0  0.0
E    8.21  114.9  56.4  32.6  61.2  68.2
K    7.92  122.4  52.6  36.9  56.5  32.7
I    7.59  128.1  60.5  41.8  52.6  36.9
N    8.61  124.5  50.8  38.7  60.3  41.9
M    8.92  128.3  56.4  36.5  50.8  38.7
R    8.46  111.9  58.1  38.2  56.7  0.0
(HN = amide 1H, NH = amide 15N; all shifts in ppm; 0.0 = no peak observed)
17. SSASS Algorithm
- Two initial, parallel steps
  - Predict what the true chemical shifts might be
  - Process the peak list to build spin systems
- Make the best (most complete) assignment of spin systems to the protein sequence
18. SSASS Algorithm
[Flowchart: INPUT → PROCESS SPINS and PREDICT (in parallel) → MAKE BLOCKS → ASSIGN/SCORE → ASSIGNMENT]
19. SSASS Algorithm
Protein sequence:
- 1) PsiPred prediction
- 2) Homodeller model structure
- 3) Predict chemical shifts
Peak file:
- 1) Filter shifts (peak selection)
- 2) Form and refine spin systems
- 3) Form blocks of spins
Then:
- 1) Score and assign the highest-scoring blocks
- 2) Fill in blanks with singleton systems
Output: protein with assigned backbone chemical shifts
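One way to read "assign highest-scoring blocks, then fill in blanks with singleton systems" is as a greedy pass over scored candidates, sketched below. The greedy strategy, the `(score, start, spins)` candidate format and the function names are assumptions for illustration; the slides do not say how SSASS resolves conflicts between overlapping blocks.

```python
def place(candidates, assignment, used):
    """Greedily place (score, start, spins) candidates, best score first.

    A candidate is skipped if it runs off the sequence, overlaps a residue that
    is already assigned, or reuses a spin system that is already placed.
    """
    for score, start, spins in sorted(candidates, key=lambda c: c[0], reverse=True):
        end = start + len(spins)
        if start < 0 or end > len(assignment):
            continue
        if any(assignment[p] is not None for p in range(start, end)) or used.intersection(spins):
            continue
        for p, s in zip(range(start, end), spins):
            assignment[p] = s
        used.update(spins)


def assign(blocks, singletons, n_residues):
    assignment = [None] * n_residues  # one slot per residue; None = unassigned
    used = set()
    place(blocks, assignment, used)      # 1) highest-scoring multi-spin blocks
    place(singletons, assignment, used)  # 2) fill remaining blanks with single systems
    return assignment


blocks = [(0.9, 0, ["a", "b"]), (0.7, 1, ["c", "d"])]
singletons = [(0.4, 2, ["c"]), (0.3, 3, ["a"]), (0.2, 3, ["e"])]
print(assign(blocks, singletons, 4))  # ['a', 'b', 'c', 'e']
```

The second block is rejected because it overlaps residue 1, and singleton "a" is rejected because that spin system is already placed, so residue 3 is filled by "e".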
20. Input
- Input to SSASS:
  - the sequence of the protein
  - an automatically picked peak list from an HNCACB experiment (from VNMR, NMRView, NMRPipe, SPARKY, etc.)
- Manually picked lists can be used as well, of course
21. Predict
Starting from the sequence:
- >35% identity to a BMRB entry? → SHIFTY
- >95% identity to a PDB entry, or model provided? → SHIFTX
- >35% identity to a PDB entry? → Homodeller model, then SHIFTX
- <35% identity to a PDB entry? → PsiPred, then SimPred
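The decision tree above can be written as a small function. Identities are taken as percent sequence identity, and checking the conditions in the order listed on the slide is an assumption; `choose_predictor` is a hypothetical name. Exactly 35% identity falls through to the PsiPred/SimPred route here, since the slide only gives >35 and <35.

```python
def choose_predictor(bmrb_identity, pdb_identity, model_provided=False):
    """Pick a shift-prediction route; identities are percent sequence identity."""
    if bmrb_identity > 35:
        return "SHIFTY"                # homologue already assigned in BMRB
    if pdb_identity > 95 or model_provided:
        return "SHIFTX"                # structure (or user model) available directly
    if pdb_identity > 35:
        return "Homodeller -> SHIFTX"  # build a homology model, then predict
    return "PsiPred -> SimPred"        # fall back to secondary-structure prediction


print(choose_predictor(10, 60))  # Homodeller -> SHIFTX
```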
22. Peak List Processing
- The initial peak list should ideally be picked at a low threshold cutoff (slightly noisy data are more likely to contain more real peaks)
- All peaks are sorted by the 15N dimension, then the 1H dimension
- A filtering script (shifts) then removes spurious peaks (phasing and spectral artifacts, duplicated peaks, solvent peaks, side-chain traces)
- Not needed if peaks are manually picked
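A sketch of the sort-then-filter step, assuming each peak is a (1H, 15N, 13C, intensity) tuple. The windows and duplicate tolerances below are hypothetical placeholders; the slide does not give the rules the filtering script actually uses, and this sketch only handles solvent-region, out-of-range and duplicate peaks.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    h: float          # 1H (ppm)
    n: float          # 15N (ppm)
    c: float          # 13C (ppm)
    intensity: float  # signed peak height


# Hypothetical windows (ppm), not taken from SSASS
H_WINDOW = (6.0, 11.0)      # amide 1H region; excludes the water line near 4.7 ppm
C_WINDOW = (10.0, 75.0)     # rough CA/CB range
DUP_TOL = (0.02, 0.2, 0.2)  # 1H, 15N, 13C tolerances for calling two peaks duplicates


def filter_peaks(peaks):
    peaks = sorted(peaks, key=lambda p: (p.n, p.h))  # sort by 15N, then 1H
    kept = []
    for p in peaks:
        if not (H_WINDOW[0] <= p.h <= H_WINDOW[1] and C_WINDOW[0] <= p.c <= C_WINDOW[1]):
            continue  # solvent or out-of-range peak
        if any(abs(p.h - q.h) <= DUP_TOL[0] and abs(p.n - q.n) <= DUP_TOL[1]
               and abs(p.c - q.c) <= DUP_TOL[2] for q in kept):
            continue  # near-duplicate of a peak already kept (first one in sort order wins)
        kept.append(p)
    return kept


raw = [Peak(8.43, 111.3, 61.2, 1.0), Peak(7.91, 105.2, 45.1, 0.9),
       Peak(7.91, 105.25, 45.15, 0.3), Peak(4.70, 115.0, 60.0, 5.0),
       Peak(8.21, 114.9, 170.0, 0.2)]
print([p.n for p in filter_peaks(raw)])  # [105.2, 111.3]
```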
23. Spin System Formation
- Remaining peaks are examined iteratively to build spin systems, starting with the most intense peak (spin) and stopping when 110% of the expected number of systems have been found
- NOTE: spin systems may be missing peaks, have overlapped peaks, or be overlapped systems
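A greedy sketch of this step: take the most intense unused peak as a seed, collect every unused peak at the same amide (1H, 15N) position, and stop once 110% of the expected number of systems exist. The amide tolerances and function name are hypothetical, and grouping purely by amide position is an assumption about how SSASS decides which peaks belong together.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    h: float          # 1H (ppm)
    n: float          # 15N (ppm)
    c: float          # 13C (ppm)
    intensity: float  # signed peak height


HN_TOL = (0.03, 0.3)  # hypothetical 1H / 15N tolerances (ppm)


def form_spin_systems(peaks, n_expected):
    """Greedy grouping of peaks into spin systems by shared amide position."""
    remaining = sorted(peaks, key=lambda p: abs(p.intensity), reverse=True)
    limit = (11 * n_expected + 9) // 10  # ceil(1.1 * n_expected) in integer arithmetic
    systems = []
    while remaining and len(systems) < limit:
        seed = remaining[0]  # most intense peak not yet used
        members = [p for p in remaining
                   if abs(p.h - seed.h) <= HN_TOL[0] and abs(p.n - seed.n) <= HN_TOL[1]]
        remaining = [p for p in remaining if p not in members]
        systems.append(members)
    return systems


peaks = [Peak(8.43, 111.3, 61.2, 1.0), Peak(8.43, 111.3, 68.3, -0.8), Peak(8.44, 111.3, 45.0, 0.4),
         Peak(8.21, 114.9, 56.4, 0.9), Peak(8.21, 114.9, 32.6, -0.7),
         Peak(8.22, 114.9, 61.2, 0.3), Peak(8.21, 114.8, 68.2, -0.2)]
print([len(s) for s in form_spin_systems(peaks, 2)])  # [3, 4]
```

Integer arithmetic is used for the 110% cutoff because `math.ceil(1.1 * 10)` gives 12 due to floating-point rounding.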
25. Spin System Refinement
- Systems that can't unambiguously be broken down into CAi, CBi, CAi-1 and CBi-1 are combinatorially expanded into a list of alternate systems
  - If the two CA peaks are of similar intensity, two systems are made
  - If more than 4 peaks are found, all possible systems of 4 are made
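The two refinement rules can be sketched with `itertools.combinations`. This assumes CA peaks are positive and CB peaks negative; `alternative_systems` and the `similar_ratio` cutoff for "similar intensity" are hypothetical, and swapping the CB pair together with the CA pair is an assumption about how the alternate system is built.

```python
from itertools import combinations


def alternative_systems(peaks, similar_ratio=0.8):
    """Enumerate candidate labellings for one amide position.

    peaks: list of (c_shift, signed_intensity). Assumes CA > 0 and CB < 0.
    """
    # More than 4 peaks: try every subset of 4
    groups = combinations(peaks, 4) if len(peaks) > 4 else [tuple(peaks)]
    out = []
    for four in groups:
        ca = sorted((p for p in four if p[1] > 0), key=lambda p: p[1], reverse=True)
        cb = sorted((p for p in four if p[1] < 0), key=lambda p: p[1])  # most negative first
        if len(ca) != 2 or len(cb) != 2:
            continue  # this subset cannot be a CAi/CBi/CAi-1/CBi-1 system
        out.append({"CAi": ca[0][0], "CAi-1": ca[1][0], "CBi": cb[0][0], "CBi-1": cb[1][0]})
        if ca[1][1] >= similar_ratio * ca[0][1]:
            # CA intensities too close to call: also keep the labelling with i and i-1 swapped
            out.append({"CAi": ca[1][0], "CAi-1": ca[0][0], "CBi": cb[1][0], "CBi-1": cb[0][0]})
    return out


print(len(alternative_systems([(56.4, 1.0), (61.2, 0.9), (32.6, -0.8), (68.2, -0.5)])))  # 2
```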
26. Spin System → Amino Acid
- Assign the likelihood that each system corresponds to each amino-acid type
- Use distributions of known amino-acid chemical shifts (calculated separately for coil, helix and strand)
  - we assume normal distributions
- e.g. CA 46.0 → glycine
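Under the normal-distribution assumption, a residue type's likelihood is a product of Gaussian densities, computed below as a sum of log-densities. The means and standard deviations are rough, illustrative values for a handful of residue types, not the coil/helix/strand-specific distributions SSASS uses, and the fixed penalty for a Cβ present/absent mismatch is a hypothetical choice. With these values a CA of 46.0 ppm and no CB is most likely glycine, as on the slide.

```python
import math

# Illustrative (CA mean, CA sd, CB mean, CB sd) in ppm; None = no CB (glycine).
# Approximate averages for this sketch only, NOT the tables used by SSASS.
REF = {
    "G": (45.3, 1.3, None, None),
    "A": (53.1, 2.0, 19.0, 1.8),
    "S": (58.7, 2.1, 63.8, 1.5),
    "T": (62.2, 2.6, 69.7, 1.6),
    "N": (53.5, 1.9, 38.7, 1.7),
}
CB_MISMATCH = -10.0  # hypothetical log-penalty when CB presence/absence disagrees


def log_normal(x, mu, sd):
    """Log of the normal probability density N(x; mu, sd)."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def residue_log_likelihoods(ca, cb=None):
    scores = {}
    for aa, (mu_ca, sd_ca, mu_cb, sd_cb) in REF.items():
        score = log_normal(ca, mu_ca, sd_ca)
        if mu_cb is not None and cb is not None:
            score += log_normal(cb, mu_cb, sd_cb)
        elif (mu_cb is None) != (cb is None):
            score += CB_MISMATCH  # e.g. a CB observed for glycine, or missing for alanine
        scores[aa] = score
    return scores


scores = residue_log_likelihoods(46.0)
print(max(scores, key=scores.get))  # G
```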
27. Connectivity
- Use the information in CAi-1 and CBi-1 to build blocks of possibly connected spin systems
- The algorithm uses a breadth-first search of connectivity with pruning (i.e. all possible blocks of any length are considered, unless the level of connectivity drops below a certain threshold)
- 100s to 1000s to 100,000s of possible blocks
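A breadth-first sketch of block building, using the CA/CB values from the Chemical Shift Assignment table (slide 16) with 0.0 treated as a missing peak. Here a link between two spin systems either passes a fixed tolerance or is pruned; the 0.5 ppm tolerance and the function names are assumptions, since the slides do not give SSASS's actual connectivity threshold.

```python
from collections import deque

TOL = 0.5  # hypothetical match tolerance (ppm)

# (CA, CB, CA-1, CB-1) for the eight rows of the slide 16 table; None = 0.0 (no peak)
SPINS = [
    (45.1, None, 55.3, 33.1),
    (61.2, 68.3, 45.0, None),
    (56.4, 32.6, 61.2, 68.2),
    (52.6, 36.9, 56.5, 32.7),
    (60.5, 41.8, 52.6, 36.9),
    (50.8, 38.7, 60.3, 41.9),
    (56.4, 36.5, 50.8, 38.7),
    (58.1, 38.2, 56.7, None),
]


def links(prev, nxt):
    """True if nxt's CA-1/CB-1 match prev's CA/CB; a missing CB is not compared."""
    if abs(prev[0] - nxt[2]) > TOL:
        return False
    if prev[1] is not None and nxt[3] is not None and abs(prev[1] - nxt[3]) > TOL:
        return False
    return True


def build_blocks(spins):
    """Breadth-first enumeration of all blocks (paths) of linked spin systems.

    Extensions that fail `links` are pruned, so they are never queued.
    """
    succ = {i: [j for j in range(len(spins)) if j != i and links(spins[i], spins[j])]
            for i in range(len(spins))}
    blocks = []
    queue = deque([i] for i in range(len(spins)))  # every spin system starts a block
    while queue:
        block = queue.popleft()
        blocks.append(block)
        for j in succ[block[-1]]:
            if j not in block:  # each spin system at most once per block
                queue.append(block + [j])
    return blocks


blocks = build_blocks(SPINS)
print(max(blocks, key=len))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Because the last row has no CB-1 peak, its CA-1 (56.7) matches both spin 2 and spin 6 (CA 56.4), so the search branches; the longest block is still the full chain in table order.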
28. Chemical Shift Assignment
- Each block is aligned to the sequence and a fitness score is calculated for it
- Score based on:
  - RMSD to the predicted shift for each residue
  - Length of the block being evaluated
  - Connectivity within the block
  - Intensity of spin systems
  - Fit of shifts to sequence and 2° structure
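A sketch of aligning one block to the predicted shifts: slide the block along the predicted positions, compute the CA/CB RMSD at each offset, and combine it with a length term. The (CA, CB) values are read from the Block Alignment slide (slide 29); the fitness formula and `LENGTH_WEIGHT` are hypothetical, and the connectivity, intensity and secondary-structure terms listed above are omitted.

```python
import math

# (CA, CB) in ppm from the Block Alignment slide; None = no peak / no CB (glycine)
BLOCK = [(45.1, None), (61.2, 68.3), (56.4, 32.6), (52.6, 36.9),
         (59.5, 31.1), (55.1, 17.8), (55.8, 34.6), (58.1, 38.2)]
PREDICTED = [(55.3, 41.2), (64.3, 33.7), (44.9, None), (62.2, 65.1),
             (57.4, 33.9), (53.1, 38.5), (59.1, 31.8), (53.6, 18.3),
             (55.4, 34.6), (58.1, 38.5), (53.9, 32.2), (54.8, 37.4)]

LENGTH_WEIGHT = 0.5  # hypothetical weight rewarding longer blocks


def block_rmsd(block, predicted, offset):
    """RMSD (ppm) between observed and predicted shifts with the block placed at `offset`.

    Pairs where either value is missing are skipped.
    """
    sq, n = 0.0, 0
    for obs, pred in zip(block, predicted[offset:offset + len(block)]):
        for o, p in zip(obs, pred):
            if o is not None and p is not None:
                sq += (o - p) ** 2
                n += 1
    return math.sqrt(sq / n) if n else float("inf")


def best_alignment(block, predicted):
    """Try every offset; return (fitness, offset) with the highest fitness."""
    scored = []
    for off in range(len(predicted) - len(block) + 1):
        fitness = LENGTH_WEIGHT * len(block) - block_rmsd(block, predicted, off)
        scored.append((fitness, off))
    return max(scored)


print(best_alignment(BLOCK, PREDICTED)[1])  # 2
```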
29. Block Alignment
Sequence segment: . . N P G T E I L A K L E L Q G . .
Secondary structure: C C C C H H H H H H H H C C
Predicted shifts (HN, NH, CA, CB; ppm):
8.23  115.2  55.3  41.2
9.78  121.8  64.3  33.7
7.77  103.2  44.9  0.0
8.02  109.2  62.2  65.1
8.56  117.7  57.4  33.9
7.49  124.9  53.1  38.5
7.52  128.2  59.1  31.8
8.44  127.8  53.6  18.3
7.77  131.1  55.4  34.6
8.28  109.9  58.1  38.5
8.01  115.3  53.9  32.2
8.64  122.4  54.8  37.4
Block:
HN    NH     CA    CB    CA-1  CB-1
7.91  105.2  45.1  0.0   0.0   0.0
8.43  111.3  61.2  68.3  45.0  0.0
8.21  114.9  56.4  32.6  61.2  68.2
7.92  122.4  52.6  36.9  56.5  32.7
7.59  128.1  59.5  31.1  52.6  36.9
8.98  125.1  55.1  17.8  59.4  31.1
7.54  130.5  55.8  34.6  0.0   17.7
8.46  111.9  58.1  38.2  55.8  0.0
30. Output: Best Assignment!
31. Evaluation
- Used a test set of 6 proteins of varying secondary structure:
  - MT0807
  - MT0895
  - pointed domain of ets-1
  - CDC4
  - UBC-13
  - mms2
32. [Figure panels: MT0807 (85 AA), MT0895 (77 AA), Ets-1 (110 AA), Ubc-13 (151 AA), Cdc4 (141 AA), Mms2 (145 AA)]
33. Spin System Analysis
34. Accuracy (No Homology)
35. Accuracy (Using SHIFTX/Y)
36. Effect of Homology Modeling
37. Effect of Homology Modeling
38. Current State
- Performance is best on smaller proteins (<100 residues)
- Performance suffers greatly when connectivity information is lacking in the HNCACB
- Performance improves when results from homology are included
39. One Spectrum Not Enough?
- Can also handle input from HNCA and CBCA(CO)NH experiments (+ HNCO)
- More proteins to test:
  - Troponin C (90 res)
  - GA binding protein (91 res)
  - Carnobacteriocin immunity protein (111 res)
  - TEM-1 β-lactamase (263 res)
40. Future
- Improve rating of spin systems (relative intensity of peaks, order of peaks, overlapped peaks)
- Improve the assignment process (build and assign blocks of the most intense peaks first, repeat assignment at lower match thresholds, improve the search algorithm)
- Optimize assignment scoring (best mix of homology prediction, secondary structure, spin connectivity, spin intensity)
- Confidence score, explanation
- Allow more user control (manually set some assignments and re-assign the protein)
41. Acknowledgements
- Dr. Hassan Monzavi
- Haiyan Zhang
- Trent Bjorndahl
- G. Amegbey
- Nelson Young
- Questions?