Title: How to identify peptides
1. How to identify peptides
Gustavo de Souza, IMM, OUS
October 2013
2. Peptides or Proteins?
3. Bottom-up Proteomics
4. 2DE-based approach
5. Peptide Mass Fingerprinting
MALDI (Matrix-Assisted Laser Desorption/Ionization)
6. Peptide Mass Fingerprinting
[Figure: mass spectrum; axes: intensity vs. m/z]
7. MS/MS
8. MS/MS
[Figure: peak labeled 899.013]
9. Fragmentation
Nomenclature for peptide sequence ions
Collision-Induced Dissociation (CID): $[M + nH]^{n+} + \mathrm{N_2} \rightarrow b,\ y$
Electron Capture Dissociation (ECD): $[M + nH]^{n+} + e^{-} \rightarrow [M + nH]^{(n-1)+\bullet} \rightarrow c,\ z$
10. Fragmentation
Roepstorff-Fohlman-Biemann nomenclature
11. Fragmentation
12 aa
b ions
y ions
12. MS/MS of a peptide
[Figure: annotated MS/MS spectrum of the peptide VPTVDVSVVDLTVK. Labeled peaks: b3 to b13 and y2 to y13; the y13 peak is additionally marked "P".]
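The b and y ion labels in such a spectrum correspond to masses that can be computed directly from the sequence. The following is a minimal sketch, assuming standard monoisotopic residue masses, singly charged fragments, and no modifications; the function names are illustrative and not taken from any search engine.

```python
# Monoisotopic residue masses in Da (standard values, unmodified residues).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, added once per intact peptide or y ion
PROTON = 1.007276   # charge carrier for a singly charged ion


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def fragment_ions(sequence):
    """Return (b, y) lists of singly charged m/z values.

    b[i-1] is the b_i ion (first i residues),
    y[i-1] is the y_i ion (last i residues).
    """
    b, y = [], []
    running = 0.0
    for aa in sequence[:-1]:            # N-terminal fragments
        running += RESIDUE_MASS[aa]
        b.append(running + PROTON)
    running = 0.0
    for aa in reversed(sequence[1:]):   # C-terminal fragments
        running += RESIDUE_MASS[aa]
        y.append(running + WATER + PROTON)
    return b, y


b_ions, y_ions = fragment_ions("VPTVDVSVVDLTVK")
for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i:<2} {b:10.4f}    y{i:<2} {y:10.4f}")
```

For a peptide of n residues, each b_i and its complementary y_(n-i) add up to the peptide mass plus two protons, which is a convenient sanity check when annotating a spectrum by hand.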
13. How to Identify MS/MS
Steen and Mann, 2004.
Peptide Sequence Tags
Autocorrelation
Probability-based matching
14. Submitting to Search
15. How does identification happen?
Your data
Step 1: Which theoretical peptides have the same
mass as the observed ion?
Step 2: Of those, which one has the most
similar fragmentation pattern?
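The two steps can be sketched in a few lines. This is only an illustration under stated assumptions: the candidate list, its masses, and its fragment values are made-up toy data, and a simple shared-peak count stands in for the scoring a real search engine would use.

```python
def within_ppm(observed, theoretical, tol_ppm):
    """True if the observed mass is within tol_ppm of the theoretical mass."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


def shared_peaks(observed_peaks, theoretical_peaks, tol_da):
    """Count theoretical fragments that have an observed peak within tol_da."""
    return sum(
        any(abs(o - t) <= tol_da for o in observed_peaks)
        for t in theoretical_peaks
    )


def best_match(precursor_mass, observed_peaks, candidates,
               tol_ppm=10.0, frag_tol_da=0.5):
    # Step 1: keep candidates whose mass matches the observed precursor.
    step1 = [c for c in candidates
             if within_ppm(precursor_mass, c["mass"], tol_ppm)]
    if not step1:
        return None
    # Step 2: among those, pick the most similar fragmentation pattern.
    return max(step1, key=lambda c: shared_peaks(
        observed_peaks, c["fragments"], frag_tol_da))


# Hypothetical toy data, not real peptides.
candidates = [
    {"name": "A", "mass": 1000.5000, "fragments": [175.1, 288.2, 401.3, 514.4]},
    {"name": "B", "mass": 1000.5050, "fragments": [147.1, 260.2, 401.3, 530.4]},
    {"name": "C", "mass": 1002.5000, "fragments": [175.1, 288.2, 401.3, 514.4]},
]
observed = [147.1, 260.3, 401.2, 530.5, 600.0]
best = best_match(1000.5020, observed, candidates)
print(best["name"])
```

Candidate C is discarded in step 1 because its mass is 2 Da off; A and B both pass the mass filter, and B wins in step 2 because more of its fragments are found in the observed peak list.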
16. High mass accuracy: what is it good for?
All theoretical tryptic peptide masses from the human IPI database
Example: tryptic HSP-70 peptide ELEEIVQPIISK, mass 1396.7813 Da

Instrument   Mass accuracy   Calibration   # of tryptic peptides for m/z 1396.7813
LTQ          500             Ext.          344
QSTAR        20 ppm          Ext.          52
QSTAR        10 ppm          Int.          33
LTQ-FT       2 ppm           Ext.          11
LTQ-FT       1 ppm           Ext-SIM       9
LTQ-FT       0.5 ppm         Int.          3
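To see why the number of candidate peptides shrinks so quickly, it helps to convert each mass accuracy into an absolute window around the example mass. A small sketch of that arithmetic follows; it assumes the unitless 500 entry is also in ppm, which the source does not state.

```python
def ppm_to_da(mass, ppm):
    """Absolute half-width (Da) of a +/- ppm window around a mass."""
    return mass * ppm / 1e6


mass = 1396.7813  # ELEEIVQPIISK, from the example above
# 500 is assumed to be in ppm like the other entries.
for ppm in (500, 20, 10, 2, 1, 0.5):
    delta = ppm_to_da(mass, ppm)
    print(f"{ppm:>5} ppm -> +/- {delta:.4f} Da "
          f"({mass - delta:.4f} to {mass + delta:.4f})")
```

At 500 ppm the window is about 0.7 Da wide on each side, while at 0.5 ppm it is below 0.001 Da, so far fewer theoretical peptides can fall inside it.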
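17. Defining the Search Space
18. The Search Space
[Figure: peptides 1 to 6 and their combinations for 0, 1, and 2 missed cleavages (mcl)]
0 mcl: 1, 2, 3, 4, 5, 6
1 mcl: 1, 2, 3, 4, 5, 6, 1/2, 2/3, 3/4, 4/5, 5/6
2 mcl: 1, 2, 3, 4, 5, 6, 1/2, 2/3, 3/4, 4/5, 5/6, 1/2/3, 2/3/4, 3/4/5, 4/5/6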
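The growth of the search space with missed cleavages can be reproduced with a short in-silico digest. This is a sketch under assumptions: the cleavage rule used is the common trypsin rule (cut after K or R unless followed by P), and the protein sequence is a made-up string chosen to give six fragments.

```python
def cleave(sequence):
    """Split a sequence after K or R, except when the next residue is P."""
    fragments, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


def peptides_with_missed_cleavages(fragments, max_mcl):
    """Join up to max_mcl + 1 consecutive fragments into candidate peptides."""
    peptides = []
    for n_joined in range(1, max_mcl + 2):
        for i in range(len(fragments) - n_joined + 1):
            peptides.append("".join(fragments[i:i + n_joined]))
    return peptides


# Hypothetical sequence giving six fragments, as in the figure.
fragments = cleave("AAKCCRDDKEEKFFRGG")
for mcl in (0, 1, 2):
    n = len(peptides_with_missed_cleavages(fragments, mcl))
    print(f"{mcl} mcl: {n} theoretical peptides")
```

For six fragments this gives 6, 11, and 15 theoretical peptides for 0, 1, and 2 missed cleavages, matching the combinations listed above.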
19. Importance of Search Space Size
A search tool does not identify a peptide. It only
reports the theoretical sequence that is statistically
the best match to the experimental data. If
you increase the size of the database, or of the
search space, too much, the false-positive
rate also increases.
20. Defining FDRs
Steen and Mann, 2004
21. MOWSE
The chance that two peptides with different sequences
have approximately the same Mr and similar MS/MS spectra.
More variables included in the search -> higher
chance of random matches -> higher MOWSE score
threshold
- Parameters that can modify the MOWSE calculation
- Database size
- MMD (measured mass deviation)
- Number of PTMs chosen
- Data quality.
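The link between search space and score threshold can be made concrete with a simplified model. The sketch below assumes a score defined as -10 log10(P), where P is the probability of a random match, and a significance level p divided by the number of candidate peptides that fall inside the mass window. This follows the commonly quoted form of a probability-based identity threshold; the exact calculation in a given search engine may differ.

```python
import math


def score_threshold(n_candidates, p=0.05):
    """Score needed for significance p when n_candidates compete (assumed model)."""
    return -10.0 * math.log10(p / n_candidates)


# More candidates (larger database, wider mass window, more PTMs)
# push the required score up.
for n in (3, 11, 52, 344, 10_000):
    print(f"{n:>6} candidates -> threshold {score_threshold(n):.1f}")
```

Under this model, every tenfold increase in the number of candidates raises the threshold by 10 score units, which is why each extra variable in the search makes a given match harder to call significant.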
22. Example of MMD issue
- Mycoplasma sp. sample (Munich 2006)
- Database had 700 entries
- Average mass accuracy of the data: 0.7 ppm
- MMD used during search: 3 ppm.
23. Strategies to Visualize FDRs
Peng et al. (2003). Evaluation of multidimensional
chromatography coupled with tandem mass
spectrometry (LC/LC-MS/MS) for large-scale
protein analysis: the yeast proteome. J Prot Res
2, 43-50.
Reversed database sequence
24. False positive identification using reversed
database
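The reversed-database idea can be sketched as follows. The `REV_` prefix, the toy protein entries, and the hit list are hypothetical; the FDR estimate used here (reversed hits divided by forward hits above the threshold) is one common convention, and other formulas are also in use.

```python
def reverse_database(proteins):
    """Build reversed (decoy) entries; 'REV_' is an illustrative prefix."""
    return {"REV_" + name: seq[::-1] for name, seq in proteins.items()}


def estimate_fdr(hits, threshold):
    """Estimate FDR as reversed hits / forward hits at or above threshold.

    hits: list of (protein_id, score) tuples from a search against
    the forward and reversed sequences together.
    """
    passing = [pid for pid, score in hits if score >= threshold]
    decoys = sum(pid.startswith("REV_") for pid in passing)
    targets = len(passing) - decoys
    return decoys / targets if targets else 0.0


# Hypothetical database and search results.
forward = {"P1": "MKTAYIAK", "P2": "MSLLPR"}
combined = {**forward, **reverse_database(forward)}

hits = [("P1", 55), ("P2", 48), ("REV_P3", 31),
        ("P4", 35), ("P5", 29), ("REV_P6", 20)]
for threshold in (30, 40):
    print(f"threshold {threshold}: FDR ~ {estimate_fdr(hits, threshold):.2f}")
```

Because a reversed sequence cannot be a true identification, every reversed hit above the threshold is a known false positive, and its frequency gives an estimate of how many forward hits are likely to be random as well.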
25. Typical Result
26. How to Validate the Data
Are there any reversed-hit proteins with 2
peptides above the MOWSE score threshold?
- No: all proteins identified with 2 peptides scoring above the
p<0.05 threshold are good.
- Yes: repeat the Mascot search with
more stringent parameters.
What about 1-hit wonders? (Proteins identified
with only 1 peptide)
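The decision rule above can be written down as a small check. This is a sketch only: the input layout (a dictionary of peptide scores per protein), the `REV_` prefix for reversed entries, and the numeric threshold are assumptions made for illustration.

```python
def validate(protein_hits, threshold):
    """Apply the two-peptide rule to a set of protein hits.

    protein_hits: dict mapping protein_id -> list of peptide scores.
    Reversed entries are assumed to carry a 'REV_' prefix.
    Returns (decision, accepted_proteins).
    """
    def n_significant(scores):
        return sum(score >= threshold for score in scores)

    two_peptide = [pid for pid, scores in protein_hits.items()
                   if n_significant(scores) >= 2]
    if any(pid.startswith("REV_") for pid in two_peptide):
        # A reversed protein passed the rule: the settings are too loose.
        return "repeat search with more stringent parameters", []
    return "accept", two_peptide


# Hypothetical results.
hits = {
    "P1": [52, 47, 33],
    "P2": [41, 12],        # a 1-hit wonder at this threshold
    "REV_P3": [38],
}
decision, accepted = validate(hits, threshold=30)
print(decision, accepted)
```

Proteins with a single significant peptide are left out by this rule, which is why 1-hit wonders need a separate decision.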
27. How to Validate the Data
Basically, the idea is to play around with the
statistics to make your result more reliable.
28. Take home message
- Data quality (mass accuracy) and a well-defined
search space are key for reliable peptide
identification
- Reliable identification is a balance between
asking enough and not asking too much (be careful
when trying to get as many IDs as you can!)
29. PTMs
Gustavo de Souza, IMM, OUS
October 2013
30. PTMs in biology
31. PTMs in biology
32. Complexity of Protein Samples in Eukaryotes
Modifications are specific to a group of amino
acids
33. What difference to expect at MS level?
Larsen MR et al, 2006.
34. Defining the Search Space
35. PTM abundance in a cell
[Figure: number of peptides vs. abundance level, comparing total peptides in a sample with modified peptides; differences from 10^2 to 10^4]
36. PTM abundance in a cell
37. Stable vs. Labile PTMs
Larsen MR et al, 2006.
38. Neutral loss
Boersema PJ et al, 2009.
39. Identifying Labile PTMs
Larsen MR et al, 2006.
40. HCD fragmentation
Larsen MR et al, 2006.
41. Status of PTM coverage
Lemeer and Heck, 2009.
42. Status of PTM coverage
Derouiche A et al, 2012.
43. Take home message
- Depending on the PTM, identification can be very
easy or very hard
- It depends on stability under fragmentation and
abundance in the sample
- ID improvement was mostly driven by
instrumentation improvements (sensitivity etc.)