Title: Methodology for MALDI-TOF data analysis
1 Methodology for MALDI-TOF data analysis
Once the peptide mass data for an individual protein
have been obtained from MALDI-TOF, the next step is
to identify the protein through a database
search. Matching the query against the database
depends on many parameters, which need to be
optimized to arrive at the best fit.
- Related LOs: Protein database search tools
- Prior viewing: IDD 26 Spot picking, IDD 28
In-solution digestion, IDD 29 Matrix preparation
for MALDI analysis, IDD 31 MALDI-TOF data
analysis
- Future viewing: IDD 43 MALDI molecular weight
application, IDD 44 MALDI PTM application
- Course name: MALDI-TOF data analysis
- Level (UG/PG): PG
- Author(s): Dinesh Raghu, Vinayak Pachapur
- Mentor: Dr. Sanjeeva Srivastava
The contents of this ppt are licensed under the
Creative Commons Attribution-NonCommercial-ShareAlike
2.5 India license.
2 Definitions and Keywords
- 1. Peptide Mass Fingerprinting (PMF): A protein
identification method that compares the peptide
mass values of a protein analyte to a database of
known proteins to arrive at its probable identity
in the form of the best fit.
- 2. Spectrum from MALDI analysis: The peptide
fragments generated by proteolytic digestion
are analyzed by MALDI-TOF, and the data generated
are represented in the form of a spectrum. The
spectrum data can be used to search online sequence
databases.
- 3. Online search: Several open-source databases
are available online. The protein data are updated
on a regular basis, which allows analysis of the MS
spectrum generated.
- 4. Open shareware for PMF: Database search
algorithms that compare experimental
peptide masses against theoretical
peptide masses calculated by applying cleavage
rules to large primary sequence protein
databases. The open shareware has the
following fields, which need to be entered by the
user for the search:
- Name and Email: Used to identify the search
entry and to e-mail the results page if the
connection is lost, without requiring re-entry
of data.
- Search Title: Used to identify and label the search
entry; it typically includes the name of the
protein whose information is required.
- Databases: The primary sequence protein
databases, including NCBInr and SwissProt, against
which the query is run. A contaminants database is
also recommended to eliminate contaminants such
as keratin, trypsin and BSA.
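The idea of screening out contaminants can also be tried on a peak list before it is submitted. The sketch below is only an illustration and not part of the search engine: `remove_contaminant_peaks` is a hypothetical helper, and the two masses are approximate, commonly quoted values for porcine trypsin autolysis peaks that should be checked against your own reference list.

```python
def remove_contaminant_peaks(peaks, contaminants, tol_da=0.2):
    """Split peaks into (kept, removed); removed peaks lie within tol_da of a contaminant."""
    kept, removed = [], []
    for mz in peaks:
        if any(abs(mz - c) <= tol_da for c in contaminants):
            removed.append(mz)
        else:
            kept.append(mz)
    return kept, removed


# Assumption: approximate MH+ values of trypsin autolysis peaks (verify before use).
TRYPSIN_AUTOLYSIS = [842.51, 2211.10]

# Hypothetical peak list used only to exercise the function.
kept, removed = remove_contaminant_peaks(
    [703.40, 842.50, 944.53, 2211.12], TRYPSIN_AUTOLYSIS
)
print("kept:", kept)
print("removed:", removed)
```

A keratin or BSA list could be passed in the same way through the `contaminants` argument.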
3 Definitions and Keywords
- Taxonomy: Limits the search query to a
particular species or a group of species.
- Enzyme: The enzyme used to digest the analyte during
sample preparation, before its mass spectrometric
analysis. The most popular one is trypsin; if any
other enzyme is used, its site specificity is
expected to be equal to or better than that of trypsin.
- Missed Cleavages Allowed: Partial digestion
during trypsinolysis of the analyte protein, which
leaves one or two arginine or lysine sites
uncleaved, is a common phenomenon and needs to be
accounted for when searching against calculated
peptide masses.
- Modifications: During sample preparation for mass
spectrometric analysis of proteins, the mass of
specific residues might change, for example through
oxidation of methionine or carboxymethylation of
cysteine. To account for these mass changes, the
algorithm allows two types of modifications to be
pre-selected: fixed and variable.
- Fixed Modifications: Modifications that are
applied universally across the database to
account for the change in mass of specific residue(s).
The most common fixed modification is
carboxymethylation of cysteine, which replaces the
cysteine residue mass with 161 Da.
- Variable Modifications: Mass changes that are
suspected to occur during sample handling. They are
accounted for by increasing the number of primary
sequences compared against the experimental masses.
The most common variable modification is the
oxidation of methionine residues in the analyte
protein.
- Protein Mass: Mass of the intact protein, in the form
of a contiguous stretch that includes all matched
peptides. If the mass is unknown, this parameter can
be left empty and the mass will remain
unrestricted.
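The enzyme, missed cleavage and modification settings defined above can be pictured with a small in-silico digest. This is a minimal sketch under stated assumptions, not the search engine's own code: the function names and the test sequence are hypothetical, trypsin is assumed to cleave after K or R except before P, and the mass shifts are the standard monoisotopic values for carboxymethylation (+58.00548 Da) and oxidation (+15.99491 Da).

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
CARBOXYMETHYL = 58.00548  # fixed modification, applied to every C
OXIDATION = 15.99491      # variable modification, optional on each M


def tryptic_digest(sequence, missed_cleavages=1):
    """Cleave after K or R (not before P) and join neighbours for missed cleavages.

    Splitting on a zero-width pattern needs Python 3.7 or later.
    """
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def peptide_masses(peptide):
    """Neutral monoisotopic masses (Mr) for 0..n oxidized methionines."""
    base = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    base += CARBOXYMETHYL * peptide.count("C")
    return [round(base + k * OXIDATION, 4) for k in range(peptide.count("M") + 1)]


# Hypothetical sequence, used only to exercise the functions.
for pep in tryptic_digest("MCKAGRFGEAVWFK", missed_cleavages=1):
    print(pep, peptide_masses(pep))
```

Each extra variable modification adds candidate masses per peptide, which is why variable modifications enlarge the search space while fixed ones do not.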
4 Definitions and Keywords
- Peptide Tolerance: A parameter associated
with the accuracy and resolution of the mass
spectrometer. It sets the error window allowed
between experimental and calculated peptide masses.
- Mass Values: Specifies the charge state of the
analyte masses being examined by Peptide Mass
Fingerprinting, i.e. MH+, M-H-, or neutral
values (Mr).
- Monoisotopic Mass vs Average Mass Value:
Depending upon the mass accuracy of the
spectrometer, the experimental masses used
to identify the analyte by Peptide Mass
Fingerprinting are chosen to be either
monoisotopic masses or average masses. The
selection of monoisotopic
mass rests upon the ability of the instrument to
resolve isotopes and accurately determine peak
mass. Average mass is the sum of the
abundance-weighted masses of all isotopes, while
the monoisotopic mass is the sum of the masses of the
most abundant isotope of each element. If the
instrument has insufficient mass resolution
combined with a poor signal-to-noise
ratio, the experimental peptide masses
must be entered as average masses to give
better identification.
- 5. Best fit and Score histogram: The best fit
is defined as the primary identification of the
analyte protein made by the database search
algorithm. It represents either the exact protein
being analyzed or the protein with the closest
primary sequence homology, usually with an
equivalent function in a related species. The
score histogram depicts the distribution of
protein scores for all the hits obtained by the
query.
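The Mass Values and Peptide Tolerance settings amount to simple arithmetic, sketched below. The function names are hypothetical and the example numbers are illustrative; only singly charged values are handled, and the proton mass of 1.007276 Da is the one assumed constant.

```python
PROTON = 1.007276  # mass of a proton in Da


def to_neutral_mass(value, mass_type="MH+"):
    """Convert a singly charged ion mass to the neutral mass Mr."""
    if mass_type == "MH+":
        return value - PROTON
    if mass_type == "M-H-":
        return value + PROTON
    if mass_type == "Mr":
        return value
    raise ValueError(f"unknown mass type: {mass_type}")


def within_tolerance(experimental, calculated, tol, unit="Da"):
    """True if two masses agree within tol, expressed in Da or in ppm."""
    if unit == "Da":
        window = tol
    elif unit == "ppm":
        window = calculated * tol / 1e6
    else:
        raise ValueError(f"unsupported unit: {unit}")
    return abs(experimental - calculated) <= window


mr = to_neutral_mass(983.4985, "MH+")
print(round(mr, 4))
print(within_tolerance(982.4950, 982.4912, 10, "ppm"))
print(within_tolerance(982.4950, 982.4912, 2, "ppm"))
```

A ppm window scales with the mass, whereas a Da window is constant across the spectrum; which suits the data better depends on the instrument.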
5 Learning objectives
- After interacting with this learning object, the
learner will be able to:
- Handle the database search tools
- Set up the search parameters
- Analyze the result output from the database
- Assess the troubleshooting steps involved in the
experiments.
6 Master Layout
Parameter settings (Slides 8-19)
Data output (Slides 20-21)
Data analysis (Slides 22-37)
Display the steps along with images next to them.
7 Step 1
T1 Data input
S.no Mass/Charge
1 393
2 703
3 816
4 944
5 1598
6 1602
Take the user through the slides of IDD 31
MALDI-TOF data analysis; animate the spectrum
data being converted and saved into an Excel
sheet. Show the Excel sheet ready for the
database search.
For the data analysis, the user can keep the data in
an Excel sheet and cut, copy and paste the values into
the search window, or upload the saved
Excel sheet during the database search.
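Preparing the pasted values can be sketched as follows. This is a hypothetical helper, not a Matrix Science tool: it assumes each row ends with the mass/charge value (as in the S.no / Mass/Charge table on this slide), that columns are separated by whitespace or commas, and that one value per line is an acceptable query layout.

```python
def parse_peak_list(text):
    """Extract the last column of each row as a float, skipping non-numeric rows."""
    peaks = []
    for line in text.splitlines():
        fields = line.replace(",", " ").split()  # assumes no decimal commas
        if not fields:
            continue
        try:
            peaks.append(float(fields[-1]))
        except ValueError:
            continue  # header row such as 'S.no Mass/Charge'
    return peaks


pasted = """S.no Mass/Charge
1 393
2 703
3 816
4 944
5 1598
6 1602"""

peaks = parse_peak_list(pasted)
query = "\n".join(str(mz) for mz in peaks)  # one mass per line
print(query)
```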
8 Step 2
T2 Parameter settings
9 Step 2
T2 Parameter settings
Let the user open a browser window. Instruct the user to
type www.matrixscience.com and press Enter.
Display the Matrix Science window as in the
previous slide; please re-draw the image. Let the
user have control over the icons for selection.
Instruct the user to click on Peptide Mass
Fingerprint.
Matrix Science provides the online database search
engine. To start the data analysis for
peptide masses, select Peptide Mass
Fingerprint.
10 Step 2
T2 Parameter settings
11 Step 2
T2 Parameter settings
Display the Peptide Mass Fingerprint form as in the
previous slide. Let the user have control over
entering the details, selecting from the
drop-down options, ticking the
check boxes, and using the Browse, Start Search
and Reset buttons. Redraw the figure from the previous
slide.
The user must enter the details and select the
parameters for the best fit, depending on the
sample type and source.
12 Step 2
T2 Parameter settings
Instruct the user to provide the details, such as user
name, Email and Search title, as shown above.
Let the user have full control to type in the details.
The user details are needed so that, if there is a
network problem during the search, the results
can be e-mailed to the user.
13 Step 2
T2 Parameter settings
Instruct the user to select from the drop-down options
for Database(s), Allow up to ... missed cleavages
and Enzyme. Animate and provide the options (as
shown in the slide) in a drop-down list for the user to
select. When the user selects the parameters, audio
narration must start.
Databases: The primary sequence protein
databases, including NCBInr and SwissProt, against
which the query will run. Enzyme: The enzyme used during
sample preparation of the analyte before its mass
spectrometric analysis. Missed Cleavages Allowed:
Partial digestion during
trypsinolysis of the analyte protein, leaving one or two
arginine or lysine sites uncleaved.
14 Step 2
T2 Parameter settings
Instruct the user to select from the drop-down options
for Taxonomy. Animate and provide the options in a
drop-down list for the user to select. When the user selects
the parameters, audio narration must start.
Taxonomy: Limits the search query to the
particular species or group of species to
which the sample belongs.
15 Step 2
T2 Parameter settings
Fixed Modifications: Modifications that are
applied universally across the database to
account for the change in mass of specific residue(s).
Variable Modifications: Mass changes
suspected to occur during sample handling,
accounted for by increasing the number of primary
sequences compared against the experimental masses.
Instruct the user to select from the drop-down options
for Fixed and Variable modifications, and to use the arrow
buttons to add the selected options from the
drop-down menu (as shown in the figure). Animate
and provide the options in a drop-down list for the user
to select. When the user selects the parameters,
audio narration must start.
16 Step 2
T2 Parameter settings
Instruct the user to type a value for Protein mass and
select a peptide tolerance (±) in
Da, mmu, % or ppm. Once this selection is done, let the
user select the button for Mass values as shown in the
figure. Then let the user select the button for
Monoisotopic (as shown in the figure). Animate and
provide the options in a drop-down list for the user to
select. When the user selects the parameters, audio
narration must start.
The parameters can be changed depending upon the user's
needs. Protein Mass: Mass of the intact protein, in the
form of a contiguous stretch that includes all
matched peptides. Mass Values: Specifies the
charge state of the analyte masses being examined.
Monoisotopic Mass vs Average Mass Value:
Depending upon the mass accuracy of the
spectrometer, the experimental masses used
to identify the analyte by Peptide Mass
Fingerprinting are chosen to be either
monoisotopic masses or average masses.
17 Step 2
T2 Parameter settings
S.no Mass/Charge
1 393
2 703
3 816
4 944
5 1598
6 1602
Instruct the user to upload the peptide mass data
file by clicking on the Browse button and opening
the saved file. Alternatively, the user can copy and paste
the peptide mass data into the Query window.
Animate the drop-down options for Report top hits and a
check box for Decoy. Animate as shown in the
figure and provide the options in a drop-down list for
the user to select. When the user selects the parameters,
audio narration must start.
The user needs to enter the data or browse for and
upload the file. For the report to be displayed,
the user can select the number of hits from the drop-down
menu. The Decoy option repeats the search with the same
parameters against a randomized (decoy) version of the
database, which helps estimate the number of false
positive matches.
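The decoy idea can be sketched offline. The example below is an assumption-laden illustration and not a description of how the search engine builds its decoy: it reverses each target sequence, which is one common way to generate decoy entries, and the accession and sequence are hypothetical.

```python
def make_decoy(database):
    """Build decoy entries by reversing every target sequence."""
    return {f"DECOY_{acc}": seq[::-1] for acc, seq in database.items()}


def false_discovery_rate(target_hits, decoy_hits):
    """Estimate the false discovery rate as decoy hits / target hits at one threshold."""
    return decoy_hits / target_hits if target_hits else 0.0


targets = {"P_EXAMPLE": "MKTAYFGEAVWFK"}  # hypothetical entry
decoys = make_decoy(targets)
print(decoys)
print(false_discovery_rate(target_hits=100, decoy_hits=2))
```

Because the decoy sequences have the same composition as the targets but should not occur in the sample, any hit against them is taken as a measure of random matching.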
18 Step 2
T2 Parameter settings
After the user uploads the peptide mass data
(ASCII/MGF files), highlight the Start Search button
for the user to click. In case the user would like to
change the parameters, highlight the Reset Form
button. Animate as shown in the figure. When the user
selects the parameters, audio narration must
start.
Once all the parameters are set and the mass data are
uploaded, the user can click the Start Search button.
The database search then begins, based on the
parameters that were set.
19 Step 3
Section 1
Section 2
Section 3
20 Step 3
Section 1 shows the parameters set by the user for
the search. Section 2: Mascot Score
Histogram. The number of protein hits is plotted
against score. Section 3: Summary
report. Matched proteins from the database, with
the details of important parameters, can be
displayed in concise format or protein
format, and the user can also export the data.
Display the figure from the previous slide.
Highlight the Mascot search window with its 3
sections as shown in the figure. Section 1:
Parameter details. Section 2: Mascot Score
Histogram. Section 3: Summary report. Instruct the
user to go through each section with audio
narration. If any of the parameters are not set
properly, the user can go back, select the
required parameters and run the search again.
21 Step 4
If the range set for a search parameter is too
large or too small, the software returns an error
message. Depending on the error message, the user needs
to change the setting and run the search
again, as in the example shown above.
Display the figure from the previous slide.
Section 1: Parameter details. Animate a few
parameters with a red highlight as errors and
instruct the user to carry out the search again after
changing the parameters.
22 Step 4
Section 2: Mascot Score Histogram. The number
of protein hits and their scores are displayed
on the graph.
Display Section 2, the Mascot Score Histogram. It is
display only; the user cannot make any changes in this
section.
23 Step 4
Section 3: Concise protein summary report.
Matched proteins from the database are listed
concisely with name, mass, score and other
important parameters. Detailed information for an
individual protein can
be obtained by clicking on the blue link for that
protein.
Display the Section 3 Summary report. Instruct the
user to select Format As Concise protein summary with
p-value < 0.5 and Max. number of hits 1-10.
Animate and provide the options in a drop-down
menu. When the user selects concise protein summary and
clicks on Format As, display as in the above figure,
with 5 proteins listed with their name, mass, score,
expect value and matches. Redraw the above
figure. Instruct the user to click on a protein name
(link highlighted in blue) for more information.
24 Step 4
The protein view shows how the query peptides match
the protein sequence in the database:
the sequence, the region where the match
occurs, and the expected and calculated
values of each query peptide.
Display the protein view in another window, as
shown above. Display the protein sequence,
highlight a few letters within the sequence in bold
red as matched peptides, and display the
information for each matched peptide: start and
end, observed, mass (expected), mass
(calculated), delta value, missed cleavages and the
sequence.
25 Step 4
In the protein summary report, the index displays very
concise details, followed by details like those in the
protein view. The matching of the query peptides to
the protein sequence in the database is shown:
the sequence, the region where the match
occurs, and the expected and calculated
values of each query peptide.
Display the Section 3 Summary report. Instruct the
user to select Format As Protein Summary with
p-value < 0.5 and Max. number of hits 1-10.
Animate and provide the options in a drop-down
menu. When the user selects protein summary and
clicks on Format As, display as in the above
figure, with 5 protein entries in the index showing
accession, mass, score and description. In the
results list, display as above with accession,
mass, score, expect, matches and details like those in the
protein view from the previous slide.
26 Step 4
Display the Section 3 Summary report. Instruct the
user to select Format As Export search results
with p-value < 0.5 and Max. number of hits
1-10. Animate and provide the options in a drop-down
menu. When the user selects export search results
and clicks on Format As, display as in the above
figure. The selection options for Export
search results, search information, protein hit
information and peptide information must all be
displayed for the user to choose from.
In export search results, the user has all the
options for choosing which data to store, according to
the parameters selected. The results can be
exported and saved, and the data can later be used
for pathway analysis and literature study. If
the user has performed MS/MS on a particular protein
spot, the search parameters are the same, with
a few more parameters included for the best
fit.
27 Step 4
Re-draw the image with the options. Added
parameters for MS/MS.
28 Step 4
The MS/MS data analysis shareware has some extra
inputs, such as Quantitation, MS/MS tolerance,
peptide charge and instrument, in addition to
the fields for PMF. The remaining parameters are
similar to those of Peptide Mass Fingerprint.
Let the user open a browser window. Instruct the user
to type www.matrixscience.com and press
Enter. Display the Matrix Science window as in
slide 8; please re-draw the image. Let the user have
control over the icons for selection. Instruct the
user to click on MS/MS Ions Search. Display the
image from the previous slide. Each of the fields
must be filled in as shown, with some requiring
selection under user control.
29 Step 4
Provide the user with the options for Quantitation as
shown above. Let the user go
through the options and make a
choice of their own.
The Quantitation option specifies the type of
quantitation process carried out before the MS analysis.
It applies to quantitation of the extracted protein,
peptide, or a mixture of both. Examples include
iTRAQ and Tandem Mass Tags for fixed mass/charge
values; ICAT and ICPL for precursors within a single
data set; SILAC for fragment ion peaks; and XICs for
precursors in multiple data sets. For more
information on iTRAQ, ICAT, ICPL and SILAC, refer to
their respective IDDs.
30 Step 4
Provide the user with the options for Instrument as
shown above. Let the user go
through the options and make a
choice of their own.
The Instrument option lists the
instruments that can be used to acquire the data.
31 Step 4
All the parameters entered must be relevant to the
sample in order to get the best hit from the database
search.
Let the user provide the required information for the
parameters and click the Start Search button
to run the search.
32 Step 4
Tandem MS protein analysis is used to obtain
protein identities from each of the sequenced
peptides. The results page begins with a list of
probable protein identities and their respective
sources. The score histogram provides details
similar to those of the PMF analysis, with the probability
distribution displayed graphically. The
green shaded region indicates matches that
have a greater than 5% chance of being random, while
the red peak indicates that the chance of a
random match is less than 5%.
Display the Mascot search result window as in the
figure. It must be zoomed in to clearly
depict the report as shown. The red box must
appear at the region indicated, along with the
blue arrow.
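The 5% boundary can be related to a score through the way Mascot reports scores, namely as -10 x log10(P), where P is the probability that the observed match is a random event. The sketch below only performs that conversion; the function names are hypothetical, and it does not model how the significance threshold drawn on the histogram also depends on the number of database entries searched.

```python
import math


def score_to_probability(score):
    """Probability P that a match is random, from a score of -10*log10(P)."""
    return 10 ** (-score / 10)


def probability_to_score(p):
    """Score corresponding to a random-match probability p."""
    return -10 * math.log10(p)


print(score_to_probability(20))
print(round(probability_to_score(0.05), 2))
```

Every 10 points of score therefore corresponds to a tenfold drop in the probability of a random match.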
33 Step 4
First show the computer with the screen
displaying the search results. It must be
zoomed in to clearly depict the report as
shown. The green highlight boxes must then appear
with their labels. The user must be allowed to click
on these highlighted regions. Clicking on
protein information must redirect the user to steps
4 (a) and (b), while peptide information must
redirect the user to steps 4 (c) and (d).
The summary report lists all the protein matches
obtained from the database search with their
respective molecular weight, protein score,
source organism and details of each of the
fragmented peptides. Further information about
any of the protein sequences can be obtained by
clicking on the corresponding protein link. Data
on the fragmentation pattern of each peptide
can also be obtained by clicking on the
peptide link indicated by the query number.
34 Part 3, Step 4 (a)
Protein information: data analysis and
interpretation
The protein score is the sum of the highest ion
scores for each sequence, with duplicate matches
excluded. In this example, a score above 67 is
considered significant.
Predicted mass of the protein.
Predicted isoelectric point of the protein.
All peptides are displayed, with matching peptides
indicated in red.
Indicates the percentage of matching peptides.
Show all the text output. Next show the green
highlighted boxes one at a time, with the
corresponding dialogue box appearing for each of
the highlighted regions. The results on the next
slide must also be displayed along with this page.
The protein view obtained on selecting a
particular protein link is very similar to the
protein view observed in PMF. It provides details
of the protein score, molecular weight,
isoelectric point, sequence coverage of the
protein, etc. Protein scores above 67 are
considered significant here, and the greater the
percentage sequence coverage, the greater the number
of matching peptides for that particular protein.
All sequences are displayed, with the matching
sequences indicated in red.
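Sequence coverage, as referred to above, is the percentage of residues in the protein that fall inside at least one matched peptide. A minimal sketch, assuming 1-based inclusive start and end positions like those listed in the protein view; the function name and the example ranges are hypothetical.

```python
def sequence_coverage(protein_length, matched_ranges):
    """Percent of residues covered by matched peptides (1-based, inclusive ranges)."""
    covered = set()
    for start, end in matched_ranges:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / protein_length


# Overlapping peptides are counted once per residue.
coverage = sequence_coverage(100, [(1, 10), (5, 20)])
print(coverage)
```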
35 Part 3, Step 4 (b)
Protein information: data analysis and
interpretation
Indicates the score of each ion fragment; used for
calculating the protein score.
Indicates the beginning and end of each peptide.
Observed mass/charge value.
Experimental molecular weight.
Calculated molecular weight.
Sequence of the peptide fragment.
Information about each of the matched peptides is
also displayed. The start and end amino acid
positions, calculated and experimental molecular
weights, number of missed tryptic cleavages,
sequence of each peptide fragment and their
corresponding ion scores are shown. The highest
ion scores are used for computing the final
protein score.
Show all the text output. Next show the green
highlighted boxes one at a time, with the
corresponding dialogue box appearing for each of
the highlighted regions.
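The protein score rule stated on these slides (sum of the highest ion score per sequence, duplicates excluded) can be written out as below. This follows the slide's description literally and is not claimed to reproduce every correction the search engine may apply; the function name and the ion scores are hypothetical.

```python
def protein_score(peptide_matches):
    """Sum the highest ion score for each distinct peptide sequence."""
    best = {}
    for sequence, ion_score in peptide_matches:
        best[sequence] = max(ion_score, best.get(sequence, float("-inf")))
    return sum(best.values())


# The duplicate match to FGEAVWFK contributes only its best score (55).
score = protein_score([("FGEAVWFK", 40), ("FGEAVWFK", 55), ("AGR", 20)])
print(score)
```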
36 Part 3, Step 4 (d)
Peptide information: data analysis and
interpretation
Peptide sequence whose fragmentation pattern is
shown.
Range values for the x-axis, which can be modified
by the user to zoom in or out of the
graphical representation.
In tandem MS (MS/MS), each selected peptide ion
undergoes a round of fragmentation, and the
fragments pass through the second mass analyzer
before reaching the detector. This provides a
significantly larger amount of information about each
peptide. It can be viewed by clicking on the
peptide links provided in the summary report. The
fragmentation pattern is displayed graphically and
can be zoomed as required
by adjusting the x-axis plot values.
Show all the text output. Next show the green
highlighted boxes one at a time, with the
corresponding dialogue box appearing for each of
the highlighted regions.
37 Part 3, Step 4 (e)
Peptide information: data analysis and
interpretation
Mass of the peptide fragment displayed.
Amino acid sequence obtained through computation
using y-ion and b-ion values.
b-ions: Ions formed with the charge retained on the
N-terminus.
y-ions: Ions formed with the positive charge retained
on the C-terminus.
b2 (205.0972) - b1 (148.0757) ≈ 57.0214 → G
y7 (836.4301) - y6 (779.4087) = 57.0214 → G
#  Immon     a         a0        b         b0        Seq  y         y*        y0        #
1  120.0808  120.0808            148.0757            F                                  8
2  30.0338   177.1022            205.0972            G    836.4301  819.4036  818.4196  7
3  102.0550  306.1448  288.1343  334.1397  316.1292  E    779.4087  762.3821  761.3981  6
4  44.0495   377.1819  359.1714  405.1769  387.1663  A    650.3661  633.3395            5
5  72.0808   476.2504  458.2398  504.2453  486.2347  V    579.3289  562.3024            4
6  159.0917  662.3297  644.3191  690.3246  672.3140  W    480.2605  463.2340            3
7  120.0808  809.3981  791.3875  837.3930  819.3824  F    294.1812  277.1547            2
8  101.1073                                          K    147.1128  130.0863            1
b7 (837.3930) - b6 (690.3246) = 147.0684 → F
y2 (294.1812) - y1 (147.1128) = 147.0684 → F
At low collision energy, each peptide is
cleaved at the amide bond, which can result in the
formation of two types of ions: the y-ion and the
b-ion. In y-ions, the positive charge is retained
on the C-terminus of the peptide ion, while in
b-ions the charge is retained on the N-terminus.
These ion masses can be used to compute the amino
acid sequence by calculating the mass difference
between consecutive ions. Each mass difference
value corresponds to a particular amino acid,
which can be obtained from a standard information
table. The y-ion series and the b-ion series run
opposite to each other, as indicated in the
example above.
Show all the text output. Next show the green
highlighted boxes one at a time, with the
corresponding dialogue box appearing for each of
the highlighted regions.
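The mass-difference reasoning can be reproduced for the example peptide FGEAVWFK. The sketch below is illustrative only: the function names are hypothetical, only singly charged ions are considered, and the residue table is limited to the residues of this one peptide (monoisotopic masses).

```python
PROTON = 1.00728
WATER = 18.01056
RESIDUE_MASS = {  # monoisotopic residue masses for the example peptide only
    "F": 147.06841, "G": 57.02146, "E": 129.04259, "A": 71.03711,
    "V": 99.06841, "W": 186.07931, "K": 128.09496,
}


def b_ions(peptide):
    """b1..b(n-1): cumulative residue masses from the N-terminus plus a proton."""
    ions, total = [], PROTON
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        ions.append(round(total, 4))
    return ions


def y_ions(peptide):
    """y1..y(n-1): cumulative residue masses from the C-terminus plus water and a proton."""
    ions, total = [], WATER + PROTON
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        ions.append(round(total, 4))
    return ions


def residue_for(delta, tol=0.01):
    """Return the residue whose mass matches a difference between consecutive ions."""
    for aa, mass in RESIDUE_MASS.items():
        if abs(mass - delta) <= tol:
            return aa
    return None


b = b_ions("FGEAVWFK")
y = y_ions("FGEAVWFK")
print(residue_for(b[1] - b[0]))  # difference b2 - b1
print(residue_for(y[1] - y[0]))  # difference y2 - y1
```

Reading the b series forwards and the y series backwards yields the same sequence from opposite ends, which is the cross-check the slide describes.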
38
Slide 8-19
Slide 22-37
Slide 20-21
Slide 6-7
Slide 1-5
Tab 01
Tab 02
Tab 03
Tab 04
Tab 05
Tab 06
Introduction
Name of the section/stage
Animation area: In slide 13, for a given set of sequence
data, let the user perform the search within the
databases and compare the results. In slide 17,
provide a mass/charge list for the user to identify
and remove the contaminant peaks.
Interactivity area
Instructions/ Working area
Credits
39 Questionnaire
APPENDIX 1
1. Which one of these is common across all mass
spectrometry-based proteomics experiments?
A) Liquid chromatography
B) Proteolysis
C) 2-D gel electrophoresis
D) Isoelectric focusing
Answer: B) Proteolysis
2. Peptide Mass Fingerprinting (PMF) is defined
as:
A) Finding the best fit for peptides
identified by fragmentation.
B) Finding the best fit for a protein by
sequencing in a triple quadrupole analyzer.
C) Finding fingerprints of proteins on 2-DE gels.
D) Finding the best fit for masses of peptides
identified by MALDI-TOF.
Answer: D) Finding the best fit for
masses of peptides identified by MALDI-TOF.
40 Questionnaire
APPENDIX 1
3. Which one of these mass values represents a
protein/peptide ion?
A) M-H-
B) M-H
C) MH+
D) MH-
Answer: C) MH+
4. The average mass of which of the following
amino acids corresponds to 87.0782?
A) Serine
B) Glycine
C) Alanine
D) Glutamine
Answer: A) Serine
41 Questionnaire
APPENDIX 1
5. What is the wavelength of the laser used for ionization?
A) 337 nm
B) 437 nm
C) 537 nm
D) 637 nm
Answer: A) 337 nm
42 APPENDIX 2
- Links for further reading
- http://www.matrixscience.com/search_form_select.html
- 1. Henzel, W.J., Watanabe, C., Stults, J.T. (2003).
Protein identification: the origins of peptide
mass fingerprinting. J Am Soc Mass Spectrom,
14(9), pp. 931-42.
- 2. Nesvizhskii, A.I., Vitek, O., Aebersold, R.
(2007). Analysis and validation of proteomic data
generated by tandem mass spectrometry.
Nat Methods, 4(10), pp. 787-97.
- 3. Deutsch, E.W., Lam, H., Aebersold, R. (2008).
Data analysis and bioinformatics tools for tandem
mass spectrometry in proteomics. Physiol
Genomics, 33(1), pp. 18-25.
- 4. Yates, J.R. (1998). Mass spectrometry and the
age of the proteome. J Mass Spectrom, 33(1), pp. 1-19.
- Books
- Proteomics: A Cold Spring Harbor Laboratory
Course Manual by Andrew J L and Joshua L, 2009.
43 APPENDIX 3
Summary
A powerful search engine is required for
matching the query to the protein
sequences present in the database to establish protein
identity. Well-chosen parameters help the user to
define the range of the search and obtain the
best protein hit. The user
needs some knowledge of the sample
background to set the respective parameters.
For a confident identification, at least 5 peptide
masses from the query must match the database
sequence.
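The "at least 5 matching peptide masses" rule of thumb can be expressed as a simple count. This is a sketch with hypothetical function names and made-up masses; the 0.2 Da tolerance is an arbitrary illustration, and a real search engine scores matches probabilistically rather than by a bare count.

```python
def count_matches(experimental, theoretical, tol_da=0.2):
    """Count experimental masses lying within tol_da of any theoretical mass."""
    return sum(
        any(abs(e - t) <= tol_da for t in theoretical) for e in experimental
    )


def is_confident(experimental, theoretical, min_matches=5, tol_da=0.2):
    """Apply the rule of thumb: at least min_matches peptide masses must match."""
    return count_matches(experimental, theoretical, tol_da) >= min_matches


# Hypothetical values used only to exercise the functions.
theoretical = [500.0, 600.0, 700.0, 800.0, 900.0, 1000.0]
experimental = [500.1, 600.05, 700.0, 799.9, 900.15, 1234.5]
print(count_matches(experimental, theoretical))
print(is_confident(experimental, theoretical))
```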