Title: Mass Spectrometry-Based Methods for Protein Identification
1. Mass Spectrometry-Based Methods for Protein Identification
Joseph A. Loo
Department of Biological Chemistry, David Geffen School of Medicine
Department of Chemistry and Biochemistry
University of California, Los Angeles, CA, USA
2. Genomics and Proteomics: Characterizing many genes and gene products simultaneously
3. Proteomics Aids Biological Research
[Figure labels: biology; complex protein mixture; protein separation; mass spectrometry; protein identification, protein modification, protein abundance]
4. Proteomics - What is it?
- An assay to systematically analyze the diverse properties of proteins
- Biological processes are dynamic
- A quantitative comparison of states is required
- The study of protein expression and function on a genome scale
- Purpose: Examine altered gene expression pathways in disease states and under different environmental conditions
The completion of the human genome has provided
researchers with the blueprint for life, and
proteomics offers scientists the means for
analyzing the expressed genome.
5. Genome to Proteome
[Figure: dsDNA (gene) -> transcription -> mRNA -> translation -> protein (H2N ... COOH), with the protein analyzed by mass spectrometry]
Example protein sequence (residue number at left):
  1 MTDLKASSLRALKLMDLTTLNDDDTDEKVIALCHQAKTPVGNTAAICIYP
 51 RFIPIARKTLKEQGTPEIRIATVTNFPHGNDDIDIALAETRAAIAYGADE
101 VDVVFPYRALMAGNEQVGFDLVKACKEACAAANVLLKVIIETGELKDEAL
151 IRKASEISI
6. Approaches for Protein Identification
What is this protein?
- Molecular weight
- Isoelectric point
- Amino acid composition
- Other physical/chemical characteristics
- Partial or complete amino acid sequence
  - Edman (N-terminal sequence), if the N-terminus is not blocked
  - C-terminal sequence (not commonly performed)
- Mass spectrometry-measured information
7. Protein Identification by Mass Spectrometry
[Figure: 2-D gel electrophoresis (MW x 10^3 from 10 to 150 versus pI from 4.5 to 6.5) and a mass spectrum (m/z 500 to 3000) with labeled peaks at 717, 1089, 1272, 1401, 1547, 1700, 1857, 2384, and 2791]
Peptide mass fingerprint by MALDI-TOF or LC-ESI-MS. Additional sequence information can be obtained by MS/MS.
8. Mass Spectrometry: A method to weigh molecules
A simple measurement of mass is used to confirm
the identity of a molecule, but it can be used
for much more
9. Mass Spectrometer for Proteomics
Pre-Separation
Ion Source
Liquid Chromatography
10. The Nobel Prize in Chemistry 2002
"for the development of methods for
identification and structure analyses of
biological macromolecules"
"for their development of soft desorption
ionisation methods for mass spectrometric
analyses of biological macromolecules"
11. Electrospray: Generation of aerosols and droplets
12. Electrospray Ionization (ESI)
- Multiple charging
- More charges for larger molecules
- MW range > 150 kDa
- Liquid introduction of analyte
- Interface with liquid separation methods, e.g., liquid chromatography
- Tandem mass spectrometry (MS/MS) for protein sequencing
[Figure: ESI source at high voltage producing highly charged droplets that enter the MS; example spectrum, mass/charge (m/z) 500 to 1100, with peaks labeled by charge state from 14 to 22]
13. ESI-MS of Large Proteins
[Figure: ESI-MS (Q-TOF) at pH 7.5 showing a distribution of multiply charged molecules, (M+13H)13+ through (M+16H)16+, with labeled peaks at m/z 3102, 3323, and 3543]
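The charge on a peak in such a distribution can be worked out from two neighboring peaks, because adjacent members of the series differ by exactly one proton. The sketch below is an illustration only: it assumes that the peaks labeled 3323 and 3102 on this slide are adjacent charge states and that every charge is carried by a proton. The function names are hypothetical.

```python
PROTON = 1.007276  # mass of a proton in Da

def charge_from_adjacent_peaks(mz_high, mz_low):
    """Charge of the ion at mz_high, assuming the ion at mz_low
    is the same molecule carrying exactly one more proton."""
    return round((mz_low - PROTON) / (mz_high - mz_low))

def neutral_mass(mz, z):
    """Neutral molecular mass M of an (M+zH)z+ ion."""
    return z * (mz - PROTON)

z = charge_from_adjacent_peaks(3323.0, 3102.0)
mass = neutral_mass(3323.0, z)
print(z, round(mass))  # 14 46508
```

The peak values on the slide are rounded, so the computed mass is only as good as those labels.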
14. History of Electrospray Ionization
- Malcolm Dole demonstrated the production of intact oligomers of polystyrene up to MW 500,000
  - mass analysis of large ions was problematic
- John Fenn (Yale University)
  - Chemical engineer, expert in supersonic molecular beams
  - Began work on electrospray in 1981
  - Adapted ESI to operate on a more conventional mass spectrometer
  - Recognized that multiply charged ions were produced by ESI
  - Reduced the m/z range required
15. Electrospray process
10^6 charges for a 30 micron droplet
- Analyte dissolved in a suitable solvent flows through a small diameter capillary tube
- Liquid in the presence of a high electric field generates a fine mist or aerosol spray of highly charged droplets
16. Matrix-assisted Laser Desorption/Ionization (MALDI)
[Figure: MALDI source with a Time-of-Flight (TOF) analyzer; the laser strikes the sample held at high voltage, and ions m1, m2, m3 cross the drift region to the detector]
17. MALDI Mass Spectrometry of Large Proteins
[Figure: MALDI-MS of rat MVP, intensity versus m/z (about 30,000 to 130,000), showing (M+H)+ and (M+2H)2+ ions; labeled values 36446, 48658, 58309, 97430, and 98563]
18. MALDI
- Developed by Tanaka (Japan) and Hillenkamp/Karas (Germany)
- Peptide/protein analyte of interest is co-crystallized on the MALDI target plate with an appropriate matrix
  - small, highly conjugated organic molecules which strongly absorb energy at a particular wavelength
- Energy is transferred to analyte indirectly, inducing desorption from target surface
- Analyte is ionized by gas-phase proton transfer (perhaps from ionized matrix molecules)
[Figure: pulsed laser light strikes the sample and matrix on the sample stage or target (20 kV); peptide/protein ions are desorbed from the matrix]
19. MALDI matrices
Matrices for 337 nm irradiation:
- 2,5-dihydroxybenzoic acid (DHB): peptides and proteins
- 4-hydroxy-α-cyanocinnamic acid (alpha-cyano or 4-HCCA): peptides
- 3,5-dimethoxy-4-hydroxycinnamic acid (sinapinic acid): proteins
20. MALDI
- 337 nm irradiation is provided by a nitrogen (N2) laser
- The target plate is inserted into the high vacuum region of the source and the sample is irradiated with a laser pulse. The matrix absorbs the laser energy and transfers energy to the analyte molecule. The molecules are desorbed and ionized during this stage of the process.
- MALDI is most commonly interfaced to a time-of-flight (TOF) mass spectrometer.
21. R. Aebersold and M. Mann, Nature (2003), 422, 198-207.
22. Time-of-Flight Mass Spectrometer
[Figure: linear TOF analyzer; ions of mass m1, m2, m3 leave the high-voltage source with velocities v1, v2, v3 and cross the drift region (L) to the detector]
Principle of Operation of Linear TOF
A time-of-flight mass spectrometer measures the mass-dependent time it takes ions of different masses to move from the ion source to the detector. This requires that the starting time (the time at which the ions leave the ion source) is well-defined. Recall that the kinetic energy of a singly charged ion is
\[ \tfrac{1}{2} m v^2 = eV \]
where v is ion velocity, m is mass, e is charge on electron, and V is the accelerating potential (voltage). The ion velocity, v, is also the length of the flight path, L, divided by the flight time, t:
\[ v = \frac{L}{t} \]
Substituting this expression for v into the kinetic energy relation, we can derive the working equation for the time-of-flight mass spectrometer:
\[ m = \frac{2eV}{L^2}\, t^2 \]
mass is proportional to (time)^2
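As a rough numerical check of the time-of-flight relation, the sketch below computes flight times from the energy balance (kinetic energy equals charge times accelerating voltage) and inverts the relation to recover mass. The 20 kV accelerating voltage is taken from the MALDI slide; the 1 m drift length and the function names are assumptions made only for illustration.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
DALTON = 1.66053906660e-27   # kilograms per dalton

def flight_time(mass_da, voltage, length_m, z=1):
    """Flight time in seconds from (1/2) m v^2 = z e V and v = L / t."""
    mass_kg = mass_da * DALTON
    return length_m * math.sqrt(mass_kg / (2 * z * E_CHARGE * voltage))

def mass_from_time(t, voltage, length_m, z=1):
    """Invert the relation: m = 2 z e V t^2 / L^2, returned in Da."""
    return 2 * z * E_CHARGE * voltage * t ** 2 / length_m ** 2 / DALTON

# Assumed geometry: 1 m drift region, 20 kV accelerating voltage.
t1 = flight_time(1000.0, 20_000, 1.0)
t4 = flight_time(4000.0, 20_000, 1.0)
print(f"{t1 * 1e6:.1f} us, {t4 * 1e6:.1f} us, ratio {t4 / t1:.2f}")
```

Under these assumptions a 1000 Da ion arrives after roughly 16 microseconds, and an ion four times heavier takes twice as long, which is the (time)^2 dependence stated on the slide.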
23. Approaches for Protein Sequencing and Identification
Top Down: MS/MS of the intact protein
MIRERICACVLALGMLTGFTHAFGSKDAAADGKPLVVTTIGMIADAVKNI
AQGDVHLKGLMGPGVDPHLYTATAGDVEWLGNADLILYNGLHLETKMGEV
FSKLRGSRLVVAVSETIPVSQRLSLEEAEFDPHVWFDVKLWSYSVKAVYE
SLCKLLPGKTREFTQRYQAYQQQLDKLDAYVRRKAQSLPAERRVLVTAHD
AFGYFSRAYGFEVKGLQGVSTASEASAHDMQELAAFIAQRKLPAIFIESS
IPHKNVEALRDAVQARGHVVQIGGELFSDAMGDAGTSEGTYVGMVTHNID
TIVAALAR
Bottom Up: enzymatic or chemical degradation, then MS/MS
24. Identification of proteins from gels
- Proteins are separated first by high resolution two-dimensional polyacrylamide gel electrophoresis and then stained. At this point, to identify an individual protein spot or a set of spots, several options can be considered by the researcher, depending on availability of techniques.
- For protein spots that appear to be relatively abundant (e.g., more than 1 pmol), traditional protein characterization methods may be employed.
- Methods such as amino acid analysis and Edman sequencing can be used to provide necessary protein identification information. With 2-DE, approximate molecular weight and isoelectric point characteristics are provided. Augmented with information on amino acid composition and/or amino-terminal sequence, a confident identification can be obtained.
- The sensitivity gains of using MS allow for the identification of proteins below the 1 pmol level and in many cases in the femtomole regime.
26. Protein Cleavage
- For the application of mass spectrometry for protein identification, the protein bands/spots from a 2-D gel are excised and are exposed to a highly specific enzymatic cleavage reagent (e.g., trypsin cleaves on the C-terminal side of arginine and lysine residues). The resulting tryptic fragments are extracted from the gel slice and are then subjected to MS methods. One of the major barriers to high throughput in the proteomic approach to protein identification is the in-gel proteolytic digestion and subsequent extraction of the proteolytic peptides from the gel. Common protocols for this process are often long and labor intensive.
[Figure: protein digestion robot]
27. Protein cleavage - proteolysis and chemical methods
28. Mass spectrometry-based protein identification
- A mass spectrum of the resulting digest products produces a peptide map or a peptide fingerprint.
- The measured masses can be compared to theoretical peptide maps derived from database sequences for identification. There are a few choices of mass analysis that can be selected from this point, depending on available instrumentation and other factors. The resulting peptide fragments can be subjected to MALDI-MS or ESI-MS analysis.
- A small aliquot of the digest solution can be directly analyzed by MALDI-MS to obtain a peptide map. The resulting sequence coverage (relative to the entire protein sequence) displayed from the total number of tryptic peptides observed in the MALDI mass spectrum can be quite high, i.e., greater than 80% of the sequence, although it can vary considerably depending on the protein, sample amount, etc. The measured molecular weights of the peptide fragments along with the specificity of the enzyme employed can be searched and compared against protein sequence databases using a number of computer searching routines available on the Internet.
29. Protein identification from peptide fragments
[Figure: the experimental path (protein -> tryptic peptides -> mass spectrum) is compared with the theoretical path (protein sequence -> theoretical tryptic peptides -> theoretical mass spectrum)]
Theoretical tryptic peptides: SEMHIKHYTTK, ILGFR, EEGDSCPLK, QWDDSK, ILVAVADK, LLEYEEK, ILLFNSAK, YLLDESSTYK, LMHDDSV
Protein sequence:
SEMHIKHYTTKILGFREEGDSCPLKQWDDSKILVAVADKLLEYEEKILLF
NSAKYLLDESSTYKLMHDDSV
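The step from protein sequence to theoretical tryptic peptides can be sketched in a few lines. This is a minimal illustration that applies only the rule stated earlier in the deck (cleavage on the C-terminal side of arginine and lysine) and allows no missed cleavages; the function name is hypothetical, and real search engines apply further rules.

```python
def tryptic_digest(sequence):
    """Split a protein sequence after every K or R (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # C-terminal peptide that does not end in K or R
        peptides.append(sequence[start:])
    return peptides

# Example sequence from this slide.
protein = ("SEMHIKHYTTKILGFREEGDSCPLKQWDDSKILVAVADKLLEYEEK"
           "ILLFNSAKYLLDESSTYKLMHDDSV")
peptides = tryptic_digest(protein)
print(peptides)
```

Applied strictly, the rule also cleaves after the first lysine, giving SEMHIK and HYTTK where the slide shows the single peptide SEMHIKHYTTK, which corresponds to one missed cleavage.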
30. MALDI-MS of tryptic peptides
[Figure: MALDI mass spectrum, m/z 1000 to 3000; all peaks are (M+H)+; one peak is labeled as trypsin autolysis]
Labeled peaks (m/z): 1116.67, 1247.70, 1287.73, 1375.76, 1424.85, 1505.77, 1574.20, 1665.89, 1811.85, 1849.12, 2005.07, 2476.21, 2550.52, 2719.48
Protein sequence:
ARIIVVTSGK GGVGKTTSSA AIATGLAQKG KKTVVIDFDI
GLRNLDLIMG CERRVVYDFV NVIQGDATLN QALIKDKRTE
NLYILPASQT RDKDALTREG VAKVLDDLKA MDFEFIVCDS
PAGIETGALM ALYFADEAII TTNPEVSSVR DSDRILGILA
SKSRRAENGE EPIKEHLLLT RYNPGRVSRG DMLSMEDVLE
ILRIKLVGVI PEDQSVLRAS NQGEPVILDI NADAGKAYAD
TVERLLGER PFRFIEEEKK GFLKRLFGG
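A peptide mass fingerprint comparison can be sketched by computing the theoretical (M+H)+ value of each candidate peptide and looking for an observed peak within a tolerance. In the sketch below, the observed peaks are the labeled values from this slide, and the first two candidates are tryptic peptides read from the sequence above. The third candidate, the 0.1 Da tolerance, and the function names are assumptions for illustration. The residue masses are standard monoisotopic values.

```python
MONO = {  # monoisotopic amino acid residue masses, Da
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728

def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated peptide, (M+H)+."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON

def match_peaks(peptides, observed, tol=0.1):
    """Return (peptide, theoretical m/z, observed m/z) within tol Da."""
    matches = []
    for pep in peptides:
        theo = mh_plus(pep)
        for mz in observed:
            if abs(mz - theo) <= tol:
                matches.append((pep, round(theo, 2), mz))
    return matches

observed = [1116.67, 1247.70, 1287.73, 1375.76, 1424.85, 1505.77, 1574.20,
            1665.89, 1811.85, 1849.12, 2005.07, 2476.21, 2550.52, 2719.48]
candidates = ["TENLYILPASQTR", "LVGVIPEDQSVLR", "ILGILASK"]
hits = match_peaks(candidates, observed)
for hit in hits:
    print(hit)
```

The first two candidates land within a few hundredths of a dalton of labeled peaks, while the third falls below the plotted m/z range and finds no match. A real search repeats this comparison for every protein in a database and scores the result.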
31. ESI-MS and LC-MS for protein identification
- An approach for peptide mapping similar to MALDI-MS uses ESI-MS. A peptide map can be obtained by analysis of the peptide mixture by ESI-MS. An advantage of ESI is its ease of coupling to separation methodologies such as HPLC. Thus, alternatively, to reduce the complexity of the mixture, the peptides can be separated by HPLC with subsequent mass measurement by on-line ESI-MS. The measured masses can be compared to sequence databases.
[Figure: LC-MS with ESI. Chromatogram (time in min) with peaks at 6.2, 6.8, 7.7, 8.4, 8.9, 9.4, and 9.8; mass spectrum (m/z) showing (M+2H)2+ at m/z 965.3 (MW 1928.6 Da) and a peak at m/z 629.0]
32. LC-MS/MS for protein identification
- An improvement in throughput of the overall method can be obtained by performing LC-MS/MS in the data-dependent mode. As full scan mass spectra are acquired continuously in LC-MS mode, any ion detected with a signal intensity above a pre-defined threshold will trigger the mass spectrometer to switch over to MS/MS mode. Thus, the mass spectrometer switches back and forth between MS mode (molecular mass information) and MS/MS mode (sequence information) in a single LC run. The data-dependent scanning capability can dramatically increase the capacity and throughput for protein identification.
[Figure: LC-MS chromatogram (time in min; peaks at 6.8, 8.4, 9.4, and 9.8), the LC-MS spectrum showing (M+2H)2+ at m/z 965.3 and a peak at m/z 629.0, and the LC-MS/MS spectrum with b- and y-type product ions (b3 to b8, y4 to y14; labeled m/z values 668.4, 838.5, 1261.4, 1374.5, 1474.4)]
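The threshold trigger described for data-dependent mode can be sketched as a selection step applied to each full scan. The m/z values below appear on this slide, but the intensities, the threshold, the `top_n` cap, and the function name are all invented for illustration; instrument control software uses its own rules.

```python
def select_precursors(survey_scan, threshold, top_n=3):
    """Pick ions for MS/MS: intensity at or above threshold,
    most intense first, at most top_n per full scan."""
    above = [(inten, mz) for mz, inten in survey_scan if inten >= threshold]
    above.sort(reverse=True)
    return [mz for inten, mz in above[:top_n]]

# (m/z, intensity) pairs; intensities are hypothetical.
scan = [(629.0, 4.0e5), (668.4, 2.0e4), (965.3, 9.0e5), (838.5, 5.0e4)]
targets = select_precursors(scan, threshold=1.0e5)
print(targets)  # [965.3, 629.0]
```

Each selected m/z would then be isolated and fragmented before the instrument returns to full-scan mode.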
33. Peptide sequencing by mass spectrometry
- Peptide molecules are fragmented by collisionally activated dissociation (CAD)
  - collisions with neutral background gas molecules (nitrogen, argon, etc.)
  - typically dissociate by cleavage of the -CO-NH- bond
[Figure: peptide drawn from N-term. to C-term., showing N-terminal product ions]
34. Peptide sequencing by mass spectrometry
- Ideally, one can measure the spacings between product ion peaks to deduce the sequence
  - if each amide bond dissociates with equal probability
  - if only a single amide bond fragments for each molecule
  - if only C-terminal or N-terminal product ions are formed
- In reality, this is not the case
[Figure: C-terminal product ions]
35. Nomenclature for MS Sequencing of Peptides
Klaus Biemann, MIT
[Figure: peptide backbone H2N-C-C-N-C-C-N-C-C-N-C-COOH; N-terminal fragments are labeled b1, b2, b3 and C-terminal fragments y3, y2, y1; the subscript denotes the number of residues contained in the product ion]
36. Nomenclature for MS Sequencing of Peptides
- Low-energy collisions promote fragmentation of a peptide primarily along the peptide backbone
- Peptide fragmentation which maintains the charge on the C terminus is designated a y-ion
- Fragmentation which maintains the charge on the N terminus is designated a b-ion
- Low-energy collisions: ion trap, QQQ, QTOF, FT-ICR
- High-energy collisions: TOF-TOF
  - cleavage of amino acid side chain bonds (d-ion and w-ion)
  - differentiate Leu vs. Ile
37. Peptide Sequencing by Mass Spectrometry
Peptide: LVDKVIGITNEEAISTAR (Cysteine Synthase A)
MS/MS of 2+ charged tryptic peptides (often) yields 1+ charged product ions (but 2+ charged products can be observed as well). A mixture of b-ions and y-ions is present.
[Figure: MS/MS spectrum, relative abundance versus m/z, with b-ions b3 to b17 and y-ions y4 to y14 assigned; labeled m/z values 555.4, 668.4, 838.5, 990.5, 1091.5, 1261.4, 1374.5, 1474.4]
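The b- and y-ion series for the peptide on this slide can be computed directly from residue masses. The sketch below assumes singly charged fragments and standard monoisotopic residue masses; the function names are hypothetical. A b-ion is the sum of its N-terminal residues plus a proton, and a y-ion is the sum of its C-terminal residues plus water and a proton.

```python
MONO = {  # monoisotopic amino acid residue masses, Da
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728

def b_ions(peptide):
    """Singly charged b-ion m/z values, b1 .. b(n-1)."""
    total, out = 0.0, []
    for aa in peptide[:-1]:
        total += MONO[aa]
        out.append(total + PROTON)
    return out

def y_ions(peptide):
    """Singly charged y-ion m/z values, y1 .. y(n-1)."""
    total, out = 0.0, []
    for aa in reversed(peptide[1:]):
        total += MONO[aa]
        out.append(total + WATER + PROTON)
    return out

peptide = "LVDKVIGITNEEAISTAR"
b = b_ions(peptide)
y = y_ions(peptide)
print(round(b[5], 1), round(y[9], 1))  # b6 and y10: 668.4 1091.5
```

The computed b6 and y10 values agree with the 668.4 and 1091.5 labels in the spectrum. The spacing between consecutive ions of one series equals a residue mass, which is how the sequence is read off.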
38. Computer-based Sequence Searching Strategies
- A list of experimentally determined masses is compared to lists of computer-generated theoretical masses prepared from a database of protein primary sequences. With the current exponential growth in the generation of genomic data, these databases are expanding every day.
- There are typically three types of search strategies employed:
  - searching with peptide fingerprint data
  - searching with sequence data
  - searching with raw MS/MS data
- One limiting factor that must be considered for all of the approaches is that they can only identify proteins that have already been identified and reside within an available database, or that are highly homologous to one that resides in the database.
39. Searching with Peptide Fingerprints
- The majority of the available search engines allow one to define certain experimental parameters to optimize a particular search:
  - Minimum number of peptides to be matched
  - Allowable mass error
  - Monoisotopic versus average mass data
  - Mass range of starting protein
  - Type of protease used for digestion
  - Information about potential protein modification, such as N- and C-terminal modification, carboxymethylation, oxidized methionines, etc.
40. Searching with Peptide Fingerprints
- Most protein databases contain primary sequence information only
- Any shift in mass incorporated into the primary sequence as a result of post-translational modification will result in an experimental mass that is in disagreement with the theoretical mass.
- Modifications such as glycation and phosphorylation can result in missed identifications.
- A single amino acid substitution can shift the mass of a peptide to such a degree that even a protein with a great deal of homology with another in the database cannot be identified.
41. Searching with Peptide Fingerprints
- A number of factors affect the utility of peptide fingerprinting.
- The greater the experimental mass accuracy, the narrower you can set your search tolerances, thereby increasing your confidence in the match, and decreasing the number of "false positive" responses.
- A common practice used to increase mass accuracy in peptide fingerprinting is to employ an autolysis fragment from the proteolytic enzyme as an internal standard to calibrate a MALDI mass spectrum.
- Peptide fingerprinting is also amenable to the identification of proteins in complex mixtures.
- Peptides generated from the digest of a protein mixture will simply return two or more results that are a "good" fit.
- Peptides that are "left over" in a peptide fingerprint after the identification of one component can be resubmitted for the possible identification of another component.
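The link between mass accuracy and search tolerance can be made concrete by expressing the mass error in parts per million. In the sketch below, the measured value is one of the MALDI peaks shown earlier in the deck, while the theoretical value and both tolerances are illustrative assumptions; the function names are hypothetical.

```python
def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6

def within_tolerance(measured, theoretical, tol_ppm):
    """True if the measured mass lies inside the search window."""
    return abs(ppm_error(measured, theoretical)) <= tol_ppm

measured = 1505.77        # peak from the MALDI spectrum earlier in the deck
theoretical = 1505.7958   # assumed theoretical (M+H)+ of a candidate peptide

error = ppm_error(measured, theoretical)
print(f"{error:.1f} ppm")
print(within_tolerance(measured, theoretical, 50))  # wide window
print(within_tolerance(measured, theoretical, 10))  # narrow window
```

With an error of about 17 ppm, the match is accepted in a 50 ppm window but rejected in a 10 ppm window. Internal calibration matters for this reason: it reduces the measurement error so that a narrower window can be used without losing true matches.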
42. Web addresses of some representative internet resources for protein identification from mass spectrometry data
43. Mascot
- MOWSE, among the first programs for identifying proteins by peptide mass fingerprinting, was developed out of a collaboration between Imperial Cancer Research Fund (ICRF) and SERC Daresbury Laboratory, UK.
- The name chosen was an acronym of Molecular Weight Search. The MOWSE databases were fully indexed so as to allow very rapid searching and retrieval of sequence data. Subsequently, the software was further developed and renamed Mascot.
- Licensed and distributed by Matrix Science Ltd.
- Specialized tools include Peptide Mass Fingerprint, Sequence Query, and MS/MS Ion Search.
- Search output is Web-based.
- Good visual representation of search quality (graphical probability chart).
- Simple graphical user interface.
- Reports MOWSE scores as a quantitative measure of search quality.
44. Mowse Scoring
- Rather than just counting the number of matching peptides, Mowse uses empirically determined factors to assign a statistical weight to each individual peptide match. (Pappin et al., Rapid identification of proteins by peptide-mass fingerprinting. Curr. Biol. 3:327, 1993.)
- Scoring scheme assigns more weight to matches of higher molecular weight peptides (more discriminating).
- Compensates for the non-random distribution of fragment molecular weights in proteins of different sizes.
- Was the first protein identification program to recognize that the relative abundance of peptides of a given length in a proteolytic digest depends on the lengths of both peptide and protein.
- Developed for MALDI peptide mass fingerprinting.
- Probability-Based Mowse
  - Mascot incorporates a probability-based enhanced Mowse algorithm, described in Perkins et al. (Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20:3551-3567, 1999).
  - A simple rule can be used to judge whether a result is significant or not.
  - Different types of matching (peptide masses and fragment ions) can be combined in a single search.
45. Databases
- Three components are required for database searching support of proteomics: MALDI or MS/MS data, the algorithms used to search protein databases with the MALDI or MS/MS data, and the protein databases themselves.
- The protein databases can be as small as one protein, can be large, public domain databases of all known and predicted proteins, or may be predicted open reading frames based on genomic sequence.
- A major challenge for database searching is that these protein databases are constantly changing, making database search results potentially obsolete as new entries are added that better fit the MALDI or MS data.
- Even as genomes are completed there is still flux as new coding regions are identified and novel mechanisms of increased translational complexity are better understood, such as alternative splice products, RNA editing, and ribosome slippage leading to novel, unexpected translation products.
46. Databases
- NCBI non-redundant (NCBInr)
  - Non-redundant database from the National Center for Biotechnology Information for use with their search tools BLAST and Entrez, comprised of translated sequences from the GenBank/EMBL/DDBJ consortium, SwissProt, Protein Information Resource (PIR), and Brookhaven Protein Data Bank (PDB).
  - New releases are published bimonthly while updates occur daily.
- OWL
  - OWL is comprised of Swiss-Prot, PIR, translated GenBank, and NRL-3D (PDB). All sequences are compared to Swiss-Prot to remove identical and trivially different sequences. Has not been updated since May, 1999.
- SWISSPROT
  - While SwissProt contains only a subset of proteins, the proteins in this database are much better annotated and the sequences are much more reliable than those available in any other database.
- MSDB
  - Comprehensive, non-identical protein sequence database maintained by the Proteomics Department at the Hammersmith Campus of Imperial College London. Designed specifically for MS applications.
47. Databases
- EST Clusters (dbEST)
  - Division of GenBank that contains "single-pass" cDNA sequences, or Expressed Sequence Tags (ESTs), from a number of organisms.
  - ESTs are relatively short, usually 3' end sequences from isolated mRNA.
  - ESTs tend to be highly redundant and the sequence is of much lower quality than that from other sources. An advantage to using these ESTs is that they represent only expressed sequences (no introns) and include alternative splice variants; their length, redundancy, and low quality are far improved by using clustered ESTs, such as the Compugen clusters.
  - The EST database has some redundancy because it contains all possible combinations of alternative splice products, and so it can be very large (and slow to search).
  - During a Mascot search, the nucleic acid sequences are translated in all six reading frames. dbEST is a very large database, and is divided into three sections: EST_human, EST_mouse, and EST_others. Even so, searches of these databases take far longer than a search of one of the non-redundant protein databases. You should only search an EST database if a search of a protein database has failed to find a match.
48. MALDI-MS peptide fingerprint (tryptic digest of a single protein)
[Figure: MALDI mass spectrum, m/z 1000 to 3000; all peaks are (M+H)+; one peak is labeled as trypsin autolysis]
Labeled peaks (m/z): 1116.67, 1247.70, 1287.73, 1375.76, 1424.85, 1505.77, 1574.20, 1665.89, 1811.85, 1849.12, 2005.07, 2476.21, 2550.52, 2719.48
49. Mascot (Matrix Science) for peptide mass fingerprints
enter peak list
50. Mascot (Matrix Science) for peptide mass fingerprints
possible identification
51. Mascot (Matrix Science) for peptide mass fingerprints
get more info on probable proteins
list of all possible matches
52. Mascot (Matrix Science) for peptide mass fingerprints
53. Mascot (Matrix Science) for peptide mass fingerprints
tryptic peptides that matched
peptides that did not match
54. Mascot (Matrix Science) for peptide mass fingerprints
tryptic peptides in protein sequence
better mass accuracy improves identification process
55. LC-MS/MS for protein identification
- To provide further confirmation of the identification, if a tandem mass spectrometer (MS/MS) is available, peptide ions can be dissociated in the mass spectrometer to provide direct sequence information. Product ions from an MS/MS spectrum can be compared to available sequences using powerful software tools.
- For a single sample, LC-MS/MS analysis includes two discrete steps: (a) LC-MS peptide mapping to identify peptide ions from the digestion mixture and to deduce their molecular weights, and (b) LC-MS/MS of the previously detected peptides to obtain sequence information for protein identification.
56. Automated LC-MS/MS and database searching
- Current mass spectral technology permits the generation of MS/MS data at an unprecedented rate. Prior to the generation of powerful computer-based database searching strategies, the largest bottleneck in protein identification was the manual interpretation of this MS/MS data to extract the sequence information. Today, many computer-based search strategies that employ MS/MS data require no operator interpretation at all.
- Analogous to the approach described for peptide fingerprinting, these programs take the individual protein entries in a database and electronically "digest" them to generate a list of theoretical peptides for each protein.
- However, in the use of MS/MS data, these theoretical peptides are further manipulated to generate a second level of lists which contain theoretical fragment ion masses that would be generated in the MS/MS experiment for each theoretical peptide.
57. Automated LC-MS/MS and database searching
- These programs simply compare the list of experimentally determined fragment ion masses from the MS/MS experiment of the peptide of interest with the theoretical fragment ion masses generated by the computer program.
- The recent advent of data-dependent scanning functions has permitted the unattended acquisition of MS/MS data. An example of a raw MS/MS data searching program that takes particular advantage of this ability is SEQUEST.
- SEQUEST will input the data from a data-dependent LC/MS chromatogram and automatically strip out all of the MS/MS information for each individual peak, and submit it for database searching using the strategy discussed above.
- Each peak is treated as a separate data file, making it especially useful for the on-line separation and identification of individual components in a protein mixture.
- SEQUEST cross-correlates uninterpreted MS/MS mass spectra of peptides with sequences from protein/nucleotide databases. The software can analyze a single spectrum or an entire LC-MS/MS peptide map.
- No user interpretation of MS/MS spectra is involved.
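The comparison of experimental and theoretical fragment masses can be illustrated with a deliberately simple score that counts how many theoretical fragments find an experimental peak within a tolerance. This is not SEQUEST's cross-correlation, only a shared-peak count used as a stand-in. The experimental values are the labeled peaks from the LVDKVIGITNEEAISTAR spectrum earlier in the deck. The theoretical list for that peptide was computed from standard monoisotopic residue masses. The decoy entry, the 0.5 Da tolerance, and the function names are hypothetical.

```python
def shared_peak_count(experimental, theoretical, tol=0.5):
    """Number of theoretical fragments matched by an experimental peak."""
    return sum(
        any(abs(e - t) <= tol for e in experimental) for t in theoretical
    )

def rank_candidates(experimental, candidates, tol=0.5):
    """Score every candidate peptide and sort best first."""
    scored = [
        (shared_peak_count(experimental, frags, tol), name)
        for name, frags in candidates.items()
    ]
    return sorted(scored, reverse=True)

experimental = [555.4, 668.4, 838.5, 990.5, 1091.5, 1261.4, 1374.5, 1474.4]
candidates = {
    # b5, b6, b8, y9, y10, y12, y13, y14 of the peptide (computed values)
    "LVDKVIGITNEEAISTAR": [555.35, 668.43, 838.54, 990.49,
                           1091.53, 1261.64, 1374.72, 1473.79],
    # made-up fragment list standing in for an unrelated database peptide
    "hypothetical decoy": [500.3, 613.4, 700.4, 990.9, 1100.6],
}
ranking = rank_candidates(experimental, candidates)
print(ranking)
```

The correct peptide matches seven of its eight listed fragments while the decoy matches one by chance. Production search engines replace this count with statistically grounded scores, but the underlying comparison is the same.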
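58. Match?
[Figure: workflow from proteolytic digest through HPLC-MS (chromatogram, time 25 to 60 min; labeled retention times 29.02, 33.59, 38.27, 41.63, 46.75, 49.20, 54.28, 60.16, 62.59) and MS/MS (spectrum, m/z up to 1000) to experimental fragment masses (labeled m/z 212.1, 325.2, 410.3, 456.7, 475.3, 588.3, 701.3, 815.4, 851.5, 912.5), which are tested for a match]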
59. Direct identification of proteins using mass spectrometry
- Removes the requirement to separate proteins by electrophoresis, etc.
- MudPIT: multidimensional protein identification technology, or Shotgun approach
- Protein lysate is digested with trypsin
- The peptide mixture is loaded onto a strong cation exchange (SCX) column (to separate on the basis of charge). A discrete fraction of peptides is displaced from the SCX column using a salt step gradient to a reversed-phase (RP) column (to separate on the basis of hydrophobicity).
- This fraction is eluted from the RP column into the MS. This iterative process is repeated, obtaining the fragmentation patterns of peptides in the original peptide mixture.
- MS/MS spectra are used to identify the proteins in the original protein complex.
Link et al. Nature Biotechnology 17, 676 (1999)
60. Large-scale analysis of the yeast proteome by MudPIT
- Yates and coworkers, Nature Biotech. (2001) 19, 242-247
- Assigned 5,540 peptides to MS spectra, leading to the identification of 1,484 proteins from the S. cerevisiae proteome
- Of 6,216 ORFs in the yeast genome, 83% have CAI values between 0 and 0.20 (i.e., predicted to be present at low levels) (Fig. A)
- MudPIT data: 791 or 53.3% of the proteins identified have a CAI of <0.2 (1.7 peptides per protein) (Fig. B)
- Number of peptides per protein increases with increasing CAI (Fig. C)
61. Approaches for Protein Sequencing and Identification
Top Down: MS/MS of the intact protein
MIRERICACVLALGMLTGFTHAFGSKDAAADGKPLVVTTIGMIADAVKNI
AQGDVHLKGLMGPGVDPHLYTATAGDVEWLGNADLILYNGLHLETKMGEV
FSKLRGSRLVVAVSETIPVSQRLSLEEAEFDPHVWFDVKLWSYSVKAVYE
SLCKLLPGKTREFTQRYQAYQQQLDKLDAYVRRKAQSLPAERRVLVTAHD
AFGYFSRAYGFEVKGLQGVSTASEASAHDMQELAAFIAQRKLPAIFIESS
IPHKNVEALRDAVQARGHVVQIGGELFSDAMGDAGTSEGTYVGMVTHNID
TIVAALAR
The molecular mass of an intact protein defines the native covalent state of a gene's product, including the effects of post-transcriptional/translational modifications, and associated heterogeneity, that are modulated by the actions of other gene products. Moreover, the fragmentation pattern from large proteins can generate sufficient information for identification from sequence databases, particularly when combined with accurate mass measurements of both the intact molecule and its product ions.
62. In-Source Decay (ISD) for Protein Sequencing
- Peptides and large proteins can be fragmented by ISD
- Fragmentation occurs in the MALDI ion source
  - not generally well controlled
- Reflectron TOF not necessary (linear TOF sufficient to measure product ions)
- Complete sequence information not present, but extensive stretches of sequence from the N- and/or C-termini observed
[Figure: cut within a peptide sequence ...TIDE...]
63. MALDI-ISD-TOF Mass Spectrometry of Proteins
[Figure: MALDI-ISD-TOF spectra of Protein 1 and Protein 2, m/z 5000 to 25000, with labeled regions: sequence information (in-source decay) and molecular weight information]
64. MALDI-ISD-TOF Mass Spectrometry of Proteins
- ISD generally yields c-ions or z-ions
Sequence information from the N-terminus: A L Y G G E G F E D H R L K/Q E D PW?