Title: Proteomics
1. Proteomics & Mass Spectrometry
- Nathan Edwards
- Center for Bioinformatics and Computational Biology
2. Outline
- Proteomics
- Mass Spectrometry
- Protein Identification
- Peptide Mass Fingerprint
- Tandem Mass Spectrometry
3. Proteomics
- Proteins are the machines that drive much of biology
- Genes are merely the recipe
- The direct characterization of a sample's proteins en masse
- What proteins are present?
- How much of each protein is present?
4. Systems Biology
- Establish relationships by
- Choosing related samples,
- Global characterization, and
- Comparison.
5. Samples
- Healthy / Diseased
- Cancerous / Benign
- Drug resistant / Drug susceptible
- Bound / Unbound
- Tissue specific
- Cellular location specific
- Mitochondria, Membrane
6. 2D Gel Electrophoresis
- Protein separation
- Molecular weight (MW)
- Isoelectric point (pI)
- Staining
- Bird's-eye view of protein abundance
7. 2D Gel Electrophoresis
[Figure: example 2D gel, from Bécamel et al., Biol. Proced. Online 2002;4:94-104.]
8. Paradigm Shift
- Traditional protein chemistry assay methods struggle to establish identity.
- Identity requires
- Specificity of measurement (precision)
- Mass spectrometry
- A reference for comparison (to turn a measurement into an identity)
- Protein sequence databases
9. Mass Spectrometer
- Time-of-flight (TOF)
- Quadrupole
- Ion trap
- MALDI
- Electrospray ionization (ESI)
10. Mass Spectrometer (MALDI-TOF)
[Figure: MALDI-TOF schematic. Labeled parts: UV laser (337 nm); analyte/matrix on a grounded backing plate; pulse voltage; extraction grid at source voltage -Vs (source region of length s); field-free drift zone of length D (E_d = 0); detector grid at -Vs; microchannel plate detector.]
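The field-free drift zone is what turns flight time into mass. A minimal sketch, assuming an idealized instrument in which every ion of charge z leaves the source with kinetic energy z·e·Vs and then coasts at constant speed over the drift length D (time spent in the source region of length s and the initial velocity spread are ignored): t = D·sqrt(m / (2·z·e·Vs)), so m/z grows with t². The function names and the 20 kV / 1 m settings are illustrative assumptions, not values from the slide.

```python
import math

# Physical constants (CODATA 2018 values).
ELEMENTARY_CHARGE = 1.602176634e-19   # coulombs
DALTON = 1.66053906660e-27            # kilograms per dalton


def flight_time(mz: float, accel_voltage: float, drift_length: float) -> float:
    """Seconds for an ion of the given m/z (Da per charge) to cross the drift zone.

    accel_voltage is the magnitude |Vs|; the source-region time is ignored.
    """
    mass_per_charge = mz * DALTON / ELEMENTARY_CHARGE          # kg per coulomb
    speed = math.sqrt(2.0 * accel_voltage / mass_per_charge)   # from z*e*V = m*v^2/2
    return drift_length / speed


def mz_from_time(t: float, accel_voltage: float, drift_length: float) -> float:
    """Invert flight_time: m/(z*e) = 2*V*t^2 / D^2, returned in Da per charge."""
    mass_per_charge = 2.0 * accel_voltage * t ** 2 / drift_length ** 2
    return mass_per_charge * ELEMENTARY_CHARGE / DALTON


if __name__ == "__main__":
    # Hypothetical settings: 20 kV acceleration, 1 m drift tube.
    for mz in (500.0, 1000.0, 2000.0):
        t = flight_time(mz, 20_000.0, 1.0)
        print(f"m/z {mz:7.1f} -> {t * 1e6:6.2f} us")
```

Because t scales with the square root of m/z, quadrupling m/z only doubles the flight time, which is why real instruments calibrate against known masses rather than relying on the ideal formula.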
11. Mass Spectrum
12. Mass is fundamental
13. Peptide Mass Fingerprint
Cut out 2D gel spot
14. Peptide Mass Fingerprint
Trypsin digest
15. Peptide Mass Fingerprint
MS
16. Peptide Mass Fingerprint
17. Peptide Mass Fingerprint
- Trypsin digestion enzyme
- Highly specific
- Cuts after K or R, except when followed by P
- Protein sequence from sequence database
- In silico digest
- Mass computation
- For each protein sequence in turn
- Compare computer-generated masses with the observed spectrum
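The in silico side of the fingerprint fits in a few lines. This is a minimal illustration, not any particular tool: it applies the cleavage rule above (after K or R, not before P) with no missed cleavages, computes each peptide's monoisotopic [M+H]+ from standard residue masses, and reports which peptides fall within a tolerance of some observed peak. The function names and the 0.05 Da default tolerance are assumptions; real tools also handle missed cleavages, modifications, and statistical scoring.

```python
# Monoisotopic residue masses in daltons (unmodified residues).
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565   # added once per peptide for the termini (H on N, OH on C)
PROTON = 1.007276   # charge carrier of a singly protonated [M+H]+ ion


def trypsin_digest(protein: str) -> list[str]:
    """Split after every K or R that is not followed by P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):  # C-terminal peptide that does not end in K/R
        peptides.append(protein[start:])
    return peptides


def mh_plus(peptide: str) -> float:
    """Monoisotopic [M+H]+ of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def fingerprint_matches(protein: str, observed: list[float], tol: float = 0.05):
    """Return (peptide, [M+H]+) pairs lying within tol Da of some observed peak."""
    hits = []
    for peptide in trypsin_digest(protein):
        mass = mh_plus(peptide)
        if any(abs(mass - peak) <= tol for peak in observed):
            hits.append((peptide, mass))
    return hits
```

Searching a database is then a loop over protein sequences, ranking each protein by how many observed peaks its digest explains.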
18. Protein Sequence
- Myoglobin - Plains zebra
GLSDGEWQQV LNVWGKVEAD
IAGHGQEVLI RLFTGHPETL EKFDKFKHLK TEAEMKASED
LKKHGTVVLT ALGGILKKKG HHEAELKPLA QSHATKHKIP
IKYLEFISDA IIHVLHSKHP GDFGADAQGA MTKALELFRN
DIAAKYKELG FQG
19. Protein Sequence
20. Peptide Masses ([M+H]+, monoisotopic)
- 1815.90 GLSDGEWQQVLNVWGK
- 1606.85 VEADIAGHGQEVLIR
- 1271.66 LFTGHPETLEK
- 1378.83 HGTVVLTALGGILK
- 1982.05 KGHHEAELKPLAQSHATK
- 1853.95 GHHEAELKPLAQSHATK
- 1885.02 YLEFISDAIIHVLHSK
- 1502.66 HPGDFGADAQGAMTK
- 748.43 ALELFR
21. Peptide Mass Fingerprint
[Figure: MS spectrum with peaks labeled by tryptic peptide: YLEFISDAIIHVLHSK, GHHEAELKPLAQSHATK, GLSDGEWQQVLNVWGK, HPGDFGADAQGAMTK, VEADIAGHGQEVLIR, HGTVVLTALGGILK, KGHHEAELKPLAQSHATK, ALELFR, LFTGHPETLEK]
22. Mass Spectrometry
- Strengths
- Precise molecular weight
- Fragmentation
- Automated
- Weaknesses
- Best for a few molecules at a time
- Best for small molecules
- Mass-to-charge ratio, not mass
- Intensity ≠ abundance
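Because the instrument reports m/z, the same peptide appears at different positions depending on how many protons it carries. A small sketch of the conversion, assuming protonation is the only source of charge: for neutral monoisotopic mass M and charge z, m/z = (M + z·1.007276) / z. The function names are illustrative.

```python
PROTON = 1.007276  # proton mass in daltons


def mz_from_mass(neutral_mass: float, charge: int) -> float:
    """m/z of the [M + zH]^z+ ion for a neutral monoisotopic mass M."""
    return (neutral_mass + charge * PROTON) / charge


def mass_from_mz(mz: float, charge: int) -> float:
    """Neutral mass M recovered from an observed m/z and an assumed charge z."""
    return mz * charge - charge * PROTON


if __name__ == "__main__":
    m = 1165.550  # approximate neutral mass of SGFLEEDELK, the example peptide in these slides
    for z in (1, 2, 3):
        print(z, round(mz_from_mass(m, z), 3))
```

The charge has to be known or guessed before the mass is known, which is exactly the ambiguity the "mass-to-charge ratio, not mass" bullet points at.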
23. Sample Preparation for MS/MS
24. Single-Stage MS
[Figure: MS]
25. Tandem Mass Spectrometry (MS/MS)
[Figure: precursor selection]
26. Tandem Mass Spectrometry (MS/MS)
[Figure: precursor selection, collision-induced dissociation (CID), MS/MS]
27. Peptide Fragmentation
Peptides consist of amino acids arranged in a linear backbone:
(N-terminus) H-[NH-CH(R_{i-1})-CO]-[NH-CH(R_i)-CO]-[NH-CH(R_{i+1})-CO]-OH (C-terminus)
Each bracketed NH-CH(R)-CO unit is one amino-acid residue (residues i-1, i, and i+1 shown), with side chain R.
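28. Peptide Fragmentation
29. Peptide Fragmentation
[Figure: backbone segment -HN-CH(R_i)-CO-NH-CH(R_{i+1})-CO-NH-; cleaving the peptide bond after residue i+1 yields the N-terminal b_{i+1} ion and the C-terminal y_{n-i-1} ion.]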
30. Peptide Fragmentation
Peptide S-G-F-L-E-E-D-E-L-K
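31. Peptide Fragmentation
[Figure: MS/MS spectrum of SGFLEEDELK, relative intensity (0-100) vs. m/z (0-1000), annotated with the theoretical fragment ladders below]
Residue   b ion m/z   y ion m/z
S             88        1166
G            145        1080
F            292        1022
L            405         875
E            534         762
E            663         633
D            778         504
E            907         389
L           1020         260
K           1166         147
Each b value is the N-terminal fragment ending at that residue; each y value is the C-terminal fragment starting at it. The two 1166 entries correspond to the intact peptide ([M+H]+).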
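The fragment ladder is simple arithmetic on residue masses. A sketch under common conventions (monoisotopic residue masses, singly charged ions): b_i is the sum of the first i residues plus a proton, and y_j is the sum of the last j residues plus water plus a proton. Small differences from the figure's integer labels (e.g. 1022.5 vs. 1022) are rounding. The function names are illustrative, and the residue table covers only the example peptide.

```python
# Monoisotopic residue masses (Da) for the residues of SGFLEEDELK only; extend as needed.
RESIDUE_MASS = {
    "G": 57.021464, "S": 87.032028, "L": 113.084064, "D": 115.026943,
    "K": 128.094963, "E": 129.042593, "F": 147.068414,
}
WATER = 18.010565
PROTON = 1.007276


def b_ions(peptide: str) -> list[float]:
    """Singly charged b1 .. b(n-1): N-terminal fragments."""
    ions, total = [], PROTON
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        ions.append(total)
    return ions


def y_ions(peptide: str) -> list[float]:
    """Singly charged y1 .. y(n-1): C-terminal fragments, grown from the C-terminus."""
    ions, total = [], WATER + PROTON
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        ions.append(total)
    return ions


def precursor_mh(peptide: str) -> float:
    """[M+H]+ of the intact peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


if __name__ == "__main__":
    pep = "SGFLEEDELK"
    print("b:", [round(m, 2) for m in b_ions(pep)])
    print("y:", [round(m, 2) for m in y_ions(pep)])
    print("[M+H]+:", round(precursor_mh(pep), 2))
```

A handy consistency check: b_i + y_{n-i} = [M+H]+ + 1.007 for every i, since the two complementary fragments together contain the whole peptide and two protons.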
32. Peptide Fragmentation
[Figure: the same spectrum with the observed peaks labeled by ion type: b3-b9 and y2-y9]
33. Peptide Identification
- Given
- The mass of the precursor ion, and
- The MS/MS spectrum
- Output
- The amino-acid sequence of the peptide
34. Peptide Identification
- Two paradigms
- De novo interpretation
- Sequence database search
35-41. De Novo Interpretation
[Figures; slides 39 and 41 from Lu and Chen (2003), J. Comput. Biol. 10(1)]
42. De Novo Interpretation
- Find good paths in the spectrum graph
- Can't use the same peak twice
- Simple peptide fragmentation model
- Usually many apparently good solutions
- Some amino acids have identical or near-identical masses (I/L, K/Q)!
- Best de novo interpretation may have no biological relevance
- Identifies relatively few peptides in high-throughput workflows
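A toy version of the spectrum-graph idea, under strong simplifying assumptions: every peak is treated as a singly charged b ion (no y ions, so the "same peak twice" problem never arises), two virtual nodes are added for the N-terminus (a bare proton) and for the full-length b ion ([M+H]+ minus water), and an edge joins two nodes whose mass difference matches a residue mass. The longest path from the N- to the C-terminal node spells the sequence. This illustrates the graph formulation only; it is not the Lu and Chen algorithm, and the function name and 0.02 Da tolerance are hypothetical.

```python
# Monoisotopic residue masses (Da). Isoleucine is omitted: it has exactly the
# same mass as leucine, so a reported "L" means "L or I".
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "N": 114.042927, "D": 115.026943, "Q": 128.058578, "K": 128.094963,
    "E": 129.042593, "M": 131.040485, "H": 137.058912, "F": 147.068414,
    "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
PROTON = 1.007276
WATER = 18.010565


def de_novo_b_ladder(peaks, precursor_mh, tol=0.02):
    """Longest residue path through a b-ion spectrum graph, or None if no path.

    Nodes: observed peaks plus two virtual nodes, the N-terminus (a proton)
    and the full-length b ion (precursor [M+H]+ minus water).
    Edges: node pairs whose mass difference matches a residue within tol.
    """
    start_mass, end_mass = PROTON, precursor_mh - WATER
    nodes = sorted({start_mass, end_mass, *peaks})
    start, end = nodes.index(start_mass), nodes.index(end_mass)

    # Edges only go from lighter to heavier nodes, so the graph is a DAG and
    # best[i] (longest residue string from node i to the end node) can be
    # filled in from the heaviest node downward.
    best = [None] * len(nodes)
    best[end] = ""
    for i in range(end - 1, start - 1, -1):
        for j in range(i + 1, end + 1):
            if best[j] is None:
                continue
            gap = nodes[j] - nodes[i]
            for aa, mass in RESIDUE_MASS.items():
                if abs(gap - mass) <= tol:
                    candidate = aa + best[j]
                    if best[i] is None or len(candidate) > len(best[i]):
                        best[i] = candidate
    return best[start]
```

With the complete b-ion ladder of SGFLEEDELK (b1 to b9) and its precursor, this returns the full sequence; deleting a single peak such as b4 leaves a gap no single residue can bridge, and the function returns None. That fragility under missing peaks is one reason de novo interpretation identifies relatively few peptides.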
43. Sequence Database Search
- Compares peptides from a protein sequence database with spectra
- Filter peptide candidates by
- Precursor mass
- Digest motif
- Score each peptide against spectrum
- Generate all possible peptide fragments
- Match putative fragments with peaks
- Score and rank
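The filter-then-score loop can be sketched with the simplest possible score: the number of observed peaks explained by a candidate's b and y ions (a shared-peak count). That score is an assumption for illustration; production engines such as Mascot use probability-based scores instead. Candidates are passed in directly (in practice they come from digesting the database), and the function names and tolerances are hypothetical.

```python
# Monoisotopic residue masses in daltons (unmodified residues).
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565
PROTON = 1.007276


def mh_plus(peptide: str) -> float:
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def fragment_ions(peptide: str) -> list[float]:
    """Singly charged b1..b(n-1) and y1..y(n-1) ions."""
    ions, b, y = [], PROTON, WATER + PROTON
    for aa in peptide[:-1]:
        b += RESIDUE_MASS[aa]
        ions.append(b)
    for aa in reversed(peptide[1:]):
        y += RESIDUE_MASS[aa]
        ions.append(y)
    return ions


def shared_peaks(peptide: str, peaks: list[float], tol: float) -> int:
    """Count observed peaks within tol of any theoretical b or y ion."""
    theoretical = fragment_ions(peptide)
    return sum(1 for p in peaks if any(abs(p - t) <= tol for t in theoretical))


def search(candidates, precursor, peaks, precursor_tol=0.05, fragment_tol=0.5):
    """Keep candidates matching the precursor [M+H]+, then rank by shared peaks."""
    scored = [
        (shared_peaks(pep, peaks, fragment_tol), pep)
        for pep in candidates
        if abs(mh_plus(pep) - precursor) <= precursor_tol
    ]
    return sorted(scored, reverse=True)
```

Run against the labeled peaks of the SGFLEEDELK spectrum (b3-b9, y2-y9) with a 0.6 Da fragment tolerance, SGFLEEDELK explains all 15 peaks, while the rearranged SGFLEEDEKL passes the precursor filter (same composition) but explains only 14, and an unrelated myoglobin peptide is removed by the precursor filter. The narrow margin shows why better fragmentation models matter.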
44. Peptide Fragmentation
[Figure: MS/MS spectrum (relative intensity 0-100 vs. m/z 0-1000) shown with the candidate peptide S-G-F-L-E-E-D-E-L-K]
45. Peptide Fragmentation
[Figure: the candidate's theoretical b- and y-ion m/z values (as on slide 31) overlaid on the spectrum]
46. Peptide Fragmentation
[Figure: theoretical ions matched to observed peaks, labeled b3-b9 and y2-y9]
47. Sequence Database Search
- Sequence fills in gaps in the spectrum
- All candidates have biological relevance
- Practical for high-throughput peptide identification
- Correct peptide might be missing from database!
48. Peptide Candidate Filtering
- Digestion enzyme: trypsin
- Cuts just after K or R unless followed by a P.
- Must allow for missed cleavage sites
- Average peptide length about 10-15 amino-acids
49. Peptide Candidate Filtering
>ALBU_HUMAN
MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDL
GEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAK
No missed cleavage sites
MK WVTFISLLFLFSSAYSR GVFR R DAHK SEVAHR FK DLGEENFK
ALVLIAFAQYLQQCPFEDHVK LVNEVTEFAK
50. Peptide Candidate Filtering
>ALBU_HUMAN
MKWVTFISLLFLFSSAYSRGVFRRDAHKSEVAHRFKDL
GEENFKALVLIAFAQYLQQCPFEDHVKLVNEVTEFAK
One missed cleavage site
MKWVTFISLLFLFSSAYSR WVTFISLLFLFSSAYSRGVFR GVFRR RDAHK
DAHKSEVAHR SEVAHRFK FKDLGEENFK DLGEENFKALVLIAFAQYLQQCPFEDHVK
ALVLIAFAQYLQQCPFEDHVKLVNEVTEFAK
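Allowing missed cleavages amounts to joining runs of consecutive fully cleaved fragments. A sketch, assuming the same K/R-not-before-P rule and a caller-chosen maximum number of missed sites (the function names are illustrative); applied to the albumin fragment above, it reproduces the peptides listed on slides 49 and 50.

```python
def cleave(protein: str) -> list[str]:
    """Fully tryptic fragments: split after K or R unless the next residue is P."""
    fragments, start = [], 0
    for i, residue in enumerate(protein):
        # protein[i + 1:i + 2] is "" past the end, so a final K/R still cleaves.
        if residue in "KR" and protein[i + 1:i + 2] != "P":
            fragments.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        fragments.append(protein[start:])
    return fragments


def digest(protein: str, max_missed: int = 1) -> list[str]:
    """All peptides spanning 0..max_missed missed cleavage sites, fewest first."""
    fragments = cleave(protein)
    peptides = []
    for missed in range(max_missed + 1):
        # A peptide with `missed` missed sites joins missed + 1 adjacent fragments.
        for i in range(len(fragments) - missed):
            peptides.append("".join(fragments[i:i + missed + 1]))
    return peptides
```

Each extra allowed missed site adds roughly one more candidate per fragment, so the allowance is usually kept small to limit the search space.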
51. Peptide Scoring
- Peptide fragments vary based on
- The instrument
- The peptide's amino-acid sequence
- The peptide's charge state
- Etc.
- Search engines model peptide fragmentation to various degrees.
- Speed vs. sensitivity tradeoff
- y-ions and b-ions occur most frequently
52. Mascot Search Engine
53. Mascot MS/MS Ions Search
54-63. Mascot MS/MS Search Results
64. Summary
- Protein identification by mass spectrometry is a key element of proteomics and systems biology.
- Mass spectrometry and sequence databases together represent a huge leap for protein (bio-)chemistry.
- Sample preparation, instruments, and algorithms are still maturing; much work remains to be done.
65. Further Reading
- Matrix Science (Mascot) Web Site
- www.matrixscience.com
- Seattle Proteome Center (ISB)
- www.proteomecenter.org
- Proteomic Mass Spectrometry Lab at The Scripps Research Institute
- fields.scripps.edu
- UCSF ProteinProspector
- prospector.ucsf.edu