Provided by: bin1

Title: Tandem MS


1
Part II.
  • Tandem MS

2
Mass Analyzer (2): Quadrupole
  • Mass filter: a complete spectrum is obtained by
    scanning the whole mass range.
  • Ions that are not selected at a given moment are lost.
  • Mass range: 10-4,000 Da
3
Hybrid Quadrupole/Time-of-Flight (Q-TOF) MS

[Figure: Q-TOF schematic showing Q1 (selection), Q2 (collision), the pusher, the TOF analyzer with reflectron, and the detector]
4
Electrospray MS and MS/MS of Proteins
5
Sample Preparation
[Figure: tissue → gel → fraction; add trypsin → peptides (e.g. GTDIMR, MPSER, PAK) → HPLC → to MS/MS]
6
Tandem Mass Spectrometer
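[Figure: QTOF tandem mass spectrometer used for peptide sequencing. Peptide ions from ESI (e.g. MPSER, PAK) enter the quadrupole mass analyzer, which selects a parent ion; the parent ion breaks up by collision into fragment ions (e.g. P, PA, AK, K), which pass through the TOF mass analyzer to the detector.]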
7
How Does a Peptide Fragment?
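For a peptide A1A2A3A4:
$$m(y_1) = 19 + m(A_4), \quad m(y_2) = 19 + m(A_4) + m(A_3), \quad m(y_3) = 19 + m(A_4) + m(A_3) + m(A_2)$$
$$m(b_1) = 1 + m(A_1), \quad m(b_2) = 1 + m(A_1) + m(A_2), \quad m(b_3) = 1 + m(A_1) + m(A_2) + m(A_3)$$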
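A minimal Python sketch of the b- and y-ion formulas on this slide, assuming singly charged ions and standard monoisotopic residue masses: the slide's integer offsets 1 and 19 become the proton mass (1.00728 Da) and water plus a proton (19.01784 Da). The function names `b_ions` and `y_ions` are illustrative, not taken from any tool.

```python
# Standard monoisotopic residue masses in Da (I and L are isobaric).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728   # the "1" in m(b_i) = 1 + ...
WATER = 18.01056   # water + proton is the "19" in m(y_i) = 19 + ...


def b_ions(peptide: str) -> list:
    """Singly charged b_1 .. b_(n-1), built up from the N-terminus."""
    masses, total = [], PROTON
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


def y_ions(peptide: str) -> list:
    """Singly charged y_1 .. y_(n-1), built up from the C-terminus."""
    masses, total = [], WATER + PROTON
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


if __name__ == "__main__":
    print("b", [round(m, 3) for m in b_ions("LGER")])
    print("y", [round(m, 3) for m in y_ions("LGER")])
```

For LGER, the peptide used on the next slides, each b-ion and its complementary y-ion (b1 with y3, b2 with y2, b3 with y1) add up to the full residue mass plus water and two protons, which is a quick consistency check on any ladder.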
8
How MS/MS corresponds to peptide
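[Figure: peptide LGER. Read from the N-terminus, the b-ions b1, b2, b3 form a ladder along the m/z axis; read from the C-terminus, the y-ions y1, y2, y3 form a second ladder]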
9
Put both together
[Figure: the b-ion and y-ion ladders of LGER combined in one spectrum along the m/z axis]
In practice, a spectrum contains many peaks other than
b and y peaks, and many b and y peaks may be missing.
10
Matching Sequence with Spectrum
11
[Figure: peptide sequence LGSSEVEQVQLVVDGVK → tandem mass spectrometry → MS/MS spectrum]
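One simple way to quantify how well a sequence matches a spectrum is to count how many of its theoretical b- and y-ions land near an observed peak. The sketch below assumes that shared-peak count as the score, a 0.5 Da tolerance and singly charged ions; it is only an illustration, not the scoring function of Mascot, Sequest or PEAKS, and the function names are hypothetical.

```python
from bisect import bisect_left

# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728
WATER = 18.01056


def theoretical_ions(peptide):
    """Singly charged b-ions and y-ions of `peptide` (masses in Da)."""
    total = sum(RESIDUE_MASS[aa] for aa in peptide)
    ions, prefix = [], 0.0
    for aa in peptide[:-1]:
        prefix += RESIDUE_MASS[aa]
        ions.append(prefix + PROTON)                  # b-ion of this prefix
        ions.append(total - prefix + WATER + PROTON)  # y-ion of the remaining suffix
    return ions


def shared_peak_count(peptide, peaks, tol=0.5):
    """How many theoretical ions have an observed peak within `tol` Da."""
    peaks = sorted(peaks)
    matched = 0
    for ion in theoretical_ions(peptide):
        i = bisect_left(peaks, ion - tol)   # first peak >= ion - tol
        if i < len(peaks) and peaks[i] <= ion + tol:
            matched += 1
    return matched


if __name__ == "__main__":
    spectrum = [114.09, 171.11, 175.12, 250.0, 300.16, 304.16, 361.18]
    for candidate in ("LGER", "GLER"):
        print(candidate, shared_peak_count(candidate, spectrum))
```

On this toy peak list LGER explains all six ions it predicts, while the swapped sequence GLER explains only four.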
12
Database Search Methods
  • Mascot
    • Matrix Science
    • General software
  • Sequest
    • John Yates et al.
    • Distributed by Thermo Finnigan
    • Works for Thermo's LTQ
  • PEAKS
    • Bin Ma et al.
    • Distributed by Bioinformatics Solutions Inc.
    • General software
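A generic sketch of the first step of a database search: digest each protein in silico and keep the peptides whose mass matches the precursor. It assumes fully tryptic cleavage (after K or R, but not before P) with no missed cleavages, a neutral precursor mass and a 0.02 Da tolerance; the toy database and function names are hypothetical. The surviving candidates would then be scored against the spectrum by whatever scoring function the search engine uses.

```python
import re

# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_peptides(protein):
    """In silico trypsin digest: cut after K or R unless the next residue is P.

    Assumes full cleavage (no missed cleavages).
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic peptide mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def candidate_peptides(database, precursor_mass, tol=0.02):
    """(protein id, peptide) pairs whose mass matches the precursor within `tol` Da."""
    return [
        (pid, pep)
        for pid, seq in database.items()
        for pep in tryptic_peptides(seq)
        if abs(peptide_mass(pep) - precursor_mass) <= tol
    ]


if __name__ == "__main__":
    db = {"PROT1": "MPSERGTDIMRPAKLGER"}   # hypothetical toy database
    print(tryptic_peptides(db["PROT1"]))
    print(candidate_peptides(db, 473.2598))
```

The R in GTDIMR is followed by P, so that site is not cut and GTDIMRPAK stays one peptide.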

13
Mascot
14
PEAKS
15
De Novo Sequencing
  • De Novo Sequencing (Dancik et al., JCB
    6:327-342)
  • Given a spectrum and a mass value M, compute a
    sequence P such that m(P) = M and the matching
    score is maximized.
  • We take the matching score of P to be the sum of
    the scores of the matched peaks.

16
Spectrum Graph Approach
  • Convert the peak list to a graph. A peptide
    sequence corresponds to a path in the graph.
  • Bartels (1990), Biomed. Environ. Mass Spectrom.
    19:363-368.
  • Taylor and Johnson (1997), Rapid Comm. Mass Spec.
    11:1067-1075. (Lutefisk)
  • Dancik et al. (1999), JCB 6:327-342.
  • Chen et al. (2001), JCB 8:325-337.
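A minimal spectrum-graph sketch, assuming every peak is a singly charged b-ion and a 0.02 Da tolerance; real implementations also add the y-ion reading of each peak, which is where the y-b overlap problem comes from. Among the paths from mass 0 to the full residue mass, it keeps the one through the most peaks. This illustrates the idea only and is not the algorithm of any paper cited here; the function names are hypothetical.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728


def spectrum_graph(peaks, total_residue_mass, tol=0.02):
    """Nodes are prefix residue masses: 0, each peak minus a proton, and the
    total residue mass. Edge i -> j is labelled with an amino acid whose
    mass matches nodes[j] - nodes[i] within `tol` Da."""
    nodes = sorted({0.0, total_residue_mass, *(p - PROTON for p in peaks)})
    edges = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            gap = nodes[j] - nodes[i]
            for aa, mass in RESIDUE_MASS.items():
                if abs(gap - mass) <= tol:
                    edges[i].append((j, aa))
                    break  # keep one label per edge (L rather than I)
    return nodes, edges


def best_path(nodes, edges):
    """Sequence along the path from the first to the last node that visits
    the most nodes, or None if no path exists."""
    n = len(nodes)
    length = [-1] * n      # -1: not reachable from the N-terminus
    back = [None] * n
    length[0] = 0
    for i in range(n):     # nodes are sorted by mass, a topological order
        if length[i] < 0:
            continue
        for j, aa in edges[i]:
            if length[i] + 1 > length[j]:
                length[j], back[j] = length[i] + 1, (i, aa)
    if length[-1] < 0:
        return None        # a missing ion broke every path
    residues, k = [], n - 1
    while k != 0:
        k, aa = back[k]
        residues.append(aa)
    return "".join(reversed(residues))


if __name__ == "__main__":
    b_peaks = [114.09134, 171.1128, 300.15539]   # b1, b2, b3 of LGER
    print(best_path(*spectrum_graph(b_peaks, 455.24922)))
```

With all three b-ions of LGER it returns LGER. Dropping b2 gives LWR, because m(GE) ≈ m(W) within the tolerance; dropping b1 leaves no path at all, so a single missing ion breaks the path.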

17
Difficulties
  • The spectrum graph approach has difficulty
    handling errors:
    • Missing ions break a path.
    • Too many peaks within a small error tolerance
      create too many edges connecting to the same
      peak (reducing efficiency).
    • Error accumulation.
    • A peak may be used as both a y-ion and a b-ion.
  • It is still possible to solve these problems
    under the spectrum graph scheme.
    • E.g., the y-b overlap problem has been addressed
      by Dancik et al. (1999) and Chen et al. (2001).
    • But things get complicated.
  • Reliable signal preprocessing is required.

18
PEAKS approach
  • It is more natural and easier to handle
    errors and noise.
  • Less dependent on signal preprocessing.
  • Solves the missing-ion and y-b overlap problems
    naturally.
  • Showed great success on real-life lab data.
  • Has been licensed by tens of research labs in
    the public and private sectors.

19
A simplified case: counting only y-ions
20
The Score of a Suffix
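[Figure: a suffix Q of the peptide and the y-ions y1, y2, y3 it determines; each y-ion mass includes the offset 19]
Let Q be a suffix of the peptide. Q determines some
y-ions. score(Q) is the sum of the scores of those
y-ions of Q.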
21
Recursive Computation of DP(m)
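[Figure: a suffix Q = aQ' whose first residue is a; its y-ion has mass 19 + m(Q)]
Let DP(m) be the maximum of score(Q) over all suffixes Q
with m(Q) = m, and let s(x) be the score of a y-ion at
mass x. Suppose Q = aQ' is such that DP(m) = score(Q).
Then Q' must itself be an optimal suffix of its mass, so
$$\mathrm{score}(Q) = s(19 + m) + \mathrm{score}(Q') = s(19 + m) + DP(m - m(a)).$$
We do not know a? Try every amino acid:
$$DP(m) = s(19 + m) + \max_{a} DP\bigl(m - m(a)\bigr).$$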
22
Dynamic Programming
  • For m from 0 to M, compute DP(m) with the
    recursive computation of DP(m).
  • Backtrack to determine the optimal peptide.
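A sketch of the y-ion-only dynamic program on an integer mass grid. It assumes nominal residue masses (so I/L and K/Q collapse to L and K), the offset 19 from the fragmentation formulas, and a peak list given as integer y-ion m/z mapped to a score; trying heavier residues first is an arbitrary tie-break chosen here. This is the simplified case on these slides, not the full PEAKS algorithm.

```python
import math

# Nominal (integer) residue masses in Da. I/L (113) and K/Q (128) are
# indistinguishable at this resolution, so only L and K are kept.
RESIDUE = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "N": 114, "D": 115, "K": 128, "E": 129, "M": 131,
    "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
Y_OFFSET = 19  # m(y) = 19 + residue mass of the suffix

# Heavier residues are tried first, so ties favour fewer residues
# (e.g. K rather than GA for a suffix mass of 128).
ORDER = sorted(RESIDUE.items(), key=lambda item: -item[1])


def de_novo_y_only(peak_scores: dict, total: int) -> tuple:
    """Best (score, peptide) whose residue masses sum to `total`.

    peak_scores maps an integer y-ion m/z to its score (absent = 0).
    dp[m] is DP(m): the best score of any suffix with residue mass m.
    """
    dp = [-math.inf] * (total + 1)
    back = [""] * (total + 1)   # first residue of the best suffix of mass m
    dp[0] = 0.0                 # the empty suffix determines no y-ion
    for m in range(1, total + 1):
        for aa, mass in ORDER:  # we do not know a: try every residue
            if mass <= m and dp[m - mass] > dp[m]:
                dp[m], back[m] = dp[m - mass], aa
        if back[m]:             # reachable: add the y-ion of this suffix
            dp[m] += peak_scores.get(Y_OFFSET + m, 0.0)
    if not back[total]:
        return -math.inf, ""
    residues, m = [], total
    while m > 0:                # backtracking, N-terminus first
        residues.append(back[m])
        m -= RESIDUE[back[m]]
    return dp[total], "".join(residues)


if __name__ == "__main__":
    # y1, y2, y3 of PAK; residue masses 97 + 71 + 128 = 296
    print(de_novo_y_only({147: 1.0, 218: 1.0, 315: 1.0}, 296))
```

Here the three y-ions of PAK (m/z 147, 218, 315) are recovered with score 3.0. With a lighter-first order the same score is reached by PAGA instead, since m(GA) = m(K) at nominal mass; this is the kind of ambiguity the full algorithm has to resolve with more evidence.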

23
PEAKS: The Software
24
Comparison
  • LCQ data (ion trap instrument)
    • Generously provided by Dr. Richard Johnson. 144
      spectra.
  • Micromass Q-Tof data
    • Measured in UWO's Protein ID lab. 61 spectra.
  • Sciex Q-Star data
    • Provided by U. Victoria's Genome BC Proteomics
      Centre. 13 good/okay spectra.

25
PEAKS vs. Lutefisk
  • Completely correct sequences
    • 38/144 vs. 15/144
  • Correct amino acids
    • 1067/1702 vs. 767/1702
  • Partially correct sequences with 5 or more
    contiguous correct amino acids
    • 94/144 vs. 64/144

26
PEAKS vs. Micromass PLGS
  • Completely correct sequences
    • 23/61 vs. 7/61
  • Correct amino acids
    • 559/764 vs. 232/764
  • Partially correct sequences with 5 or more
    contiguous correct amino acids
    • 50/61 vs. 24/61

27
PEAKS vs. Sciex BioAnalyst
  • Completely correct sequences
    • 7/13 vs. 1/13
  • Correct amino acids
    • 115/150 vs. 86/150
  • Partially correct sequences with 5 or more
    contiguous correct amino acids
    • 12/13 vs. 7/13

28
Post Translational Modification (PTM)
29
PTM
  • PTMs are important to the functions of proteins.
  • More than 500 types of PTMs are included in
    the Unimod PTM database.
  • For example, reversible phosphorylation of
    proteins is an important regulatory mechanism.
    Many enzymes are switched "on" or "off" by
    phosphorylation and dephosphorylation, through
    the structural change caused by the PTM.

30
Phosphorylation
Monoisotopic mass change (HPO3): +79.966 Da
[Figure: S, T and Y and their phosphorylated forms pS, pT and pY]
31
PTM increases complexity
  • Most protein databases do not have PTM
    information. Therefore, when a PTM is present, one
    has to try different PTM possibilities to match a
    peptide with a spectrum.
  • For peptide LGSSEVTMVYLK, if only phosphorylation
    is considered, there are 16 possibilities: each of
    the four S/T/Y sites is either phosphorylated or
    not, giving 2^4 = 16.
  • What if there are 10 possible PTM sites?
  • PTMs of this type are called variable PTMs.
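A small sketch of this combinatorics, assuming phosphorylation is the only variable PTM and that every S, T and Y is a candidate site; writing a modified residue in lower case is just a convention chosen here. Four candidate sites give 2^4 = 16 forms of the peptide, and ten sites already give 2^10 = 1024, each of which a database search would have to try.

```python
from itertools import combinations

PHOSPHO = 79.966     # monoisotopic mass change of one phosphate (HPO3), Da
SITES = set("STY")   # residues assumed to carry the variable PTM


def phospho_variants(peptide):
    """Every way to phosphorylate a subset of the S/T/Y residues.

    Returns (variant, mass shift) pairs; a modified residue is written in
    lower case, e.g. "LGsSEVTMVYLK" has one phosphate on the first S.
    """
    sites = [i for i, aa in enumerate(peptide) if aa in SITES]
    variants = []
    for k in range(len(sites) + 1):
        for chosen in combinations(sites, k):
            chars = list(peptide)
            for i in chosen:
                chars[i] = chars[i].lower()
            variants.append(("".join(chars), k * PHOSPHO))
    return variants


if __name__ == "__main__":
    print(len(phospho_variants("LGSSEVTMVYLK")))  # four candidate sites
    print(len(phospho_variants("S" * 10)))        # ten candidate sites
```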

32
Fixed PTM
  • Some PTMs are known to be present all the time.
  • These are called fixed PTMs.
  • Oxidation of M: mass +16.
    • It happens automatically in air, so people
      often make sure that all of the M are oxidized.
  • Carboxamidomethyl cysteine (CamC): mass +57.02.
    • It is added intentionally so that disulphide
      bonds, once broken, cannot re-form.
  • Fixed PTMs are easier to handle.

33
Variable PTMs in DB Search and De Novo
  • For DB search, one has to try different
    combinations.
  • For de novo, each variable PTM is like adding a
    new amino acid.
  • For example, if pS, pT and pY are variable, then
    instead of 20 characters in the alphabet, we
    have 23.
  • But too many variable PTMs will reduce the
    accuracy of de novo sequencing.
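To see why each extra letter costs accuracy, one can count how many residue strings fit a given mass. This sketch assumes nominal integer masses, a nominal phosphate shift of 80 Da, and pS, pT and pY as three extra letters; it counts strings with the recursion count(m) = sum over letters a of count(m - m(a)). More strings of the right mass means more wrong candidates competing with the correct one.

```python
# Nominal residue masses of the 20 standard amino acids (Da).
STANDARD = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
PHOSPHO = 80  # nominal mass of HPO3 (79.966 Da monoisotopic)

# Variable phosphorylation treated as three extra "amino acids".
EXTENDED = {**STANDARD, "pS": 87 + PHOSPHO, "pT": 101 + PHOSPHO, "pY": 163 + PHOSPHO}


def count_sequences(alphabet: dict, total: int) -> int:
    """Number of residue strings over `alphabet` whose masses sum to `total`."""
    count = [0] * (total + 1)
    count[0] = 1  # the empty string
    for m in range(1, total + 1):
        count[m] = sum(count[m - w] for w in alphabet.values() if w <= m)
    return count[total]


if __name__ == "__main__":
    for total in (500, 1000):
        print(total, count_sequences(STANDARD, total), count_sequences(EXTENDED, total))
```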

34
Peptide Identification vs. Protein Identification
35
Common procedure for protein ID
[Figure: proteins → digestion → MS/MS → peptide sequencing → protein ID]
36
Problems
  • A peptide may appear in several proteins.
  • A protein family may share many peptides.
    • Usually only one of them is true.
  • A protein may be supported by only one peptide or
    two weak peptides: is it a true or a false positive?
    • The "one-hit wonder".

37
Estimate False Positives
  • Suppose you have a score for each identified
    protein. You want to choose a score threshold T.
    • Score > T → positive (keep)
    • Score < T → negative (discard)
  • It is important to estimate the false positive
    rate for each given result.
  • False positive rate
    • In statistics, FPR = false positives / all
      actual negatives.
    • We care more about false positives / results
      reported as positives (usually called the false
      discovery rate, FDR).

The two definitions are different!
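A small illustration of why the two definitions differ, using made-up counts (hypothetical numbers, not from these slides). The denominator of the first is every protein that should have been rejected; the denominator of the second is every protein that was actually reported.

```python
def false_positive_rate(fp, tn):
    """Statistical FPR: false positives among all actual negatives (FP + TN)."""
    return fp / (fp + tn)


def false_discovery_rate(fp, tp):
    """False positives among all results reported as positive (FP + TP)."""
    return fp / (fp + tp)


if __name__ == "__main__":
    # Hypothetical counts: 500 proteins reported (490 right, 10 wrong) while
    # 9,990 absent proteins were correctly discarded.
    print(false_positive_rate(fp=10, tn=9990))
    print(false_discovery_rate(fp=10, tp=490))
```

The same 10 wrong proteins give an FPR of 0.1% but an FDR of 2%, which is why a large database can make the statistical FPR look deceptively small.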
38
Decoy Database Method
  • Choose a decoy database,
    • for example, the reversed database.
    • Anything found in this database is false.
  • Search the real database and the decoy database
    separately.
  • For the same T, if there are x proteins in the
    decoy database with score > T, then there are
    perhaps x false proteins in the real database
    with score > T.
  • Example: threshold T,
    • the real db has 497 proteins with score > T,
    • the decoy db has 7 proteins with score > T,
    • so the false positive rate is 7/497 ≈ 1.4%.
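The estimate on this slide is a single division. A sketch, assuming one score per protein from each search and a reversed sequence database as the decoy; the `REV_` prefix and the function names are arbitrary choices made here. The example scores are constructed to reproduce the 7/497 figure.

```python
def reversed_decoy(database):
    """Decoy database: every protein sequence reversed, ids prefixed with REV_."""
    return {"REV_" + pid: seq[::-1] for pid, seq in database.items()}


def estimated_fdr(target_scores, decoy_scores, threshold):
    """Decoy proteins above T divided by real-database proteins above T."""
    kept = sum(score > threshold for score in target_scores)
    decoy_hits = sum(score > threshold for score in decoy_scores)
    return decoy_hits / kept if kept else 0.0  # nothing kept, nothing to estimate


if __name__ == "__main__":
    real = [10.0] * 497 + [1.0] * 100    # 497 real proteins above T = 5
    decoy = [10.0] * 7 + [1.0] * 300     # 7 decoy proteins above T = 5
    print(round(estimated_fdr(real, decoy, threshold=5.0), 4))
```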

39
Problems
  • Only works for large datasets.
    • Not statistically significant when the dataset
      is small.
  • Does not care how many proteins are actually
    kept.
    • Keeping only true results is not our only
      goal; we also want to keep as many true
      results as possible.
  • A decoy database is only good for validation and
    cannot substitute for a good scoring method.

40
SPIDER: Listen to Both Parties!
  • The solution when there is no protein database
    and no perfect MS/MS.

41
[Figure: de novo sequencing with PEAKS (Ma et al., Rapid Comm. Mass Spec. 2003) gives EISGNEVR, with "SI" highlighted; homology search with PatternHunter (Ma, Tromp and Li, Bioinformatics 2002) finds ESIGSEVR; SPIDER (Han, Ma and Zhang, JBCB 2005) uses both]
42
Two purposes of our research
  • Given a de novo sequence with errors, find a
    homolog of the real sequence (searching).
  • Using the de novo sequence and the homolog as
    input, compute the real sequence (sequencing).

43
Listen to both sides and you will be enlightened;
heed only one side and you will be benighted.
[Figure: de novo sequence LSCFAV, homolog DACFKAV, and EACFAV]
44
Homology mutations
  • Sequence alignment
  • Also called edit distance

EACF-AVQR
DACFKAV-R
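For reference, unit-cost edit distance is a classic dynamic program over the two sequences. This sketch counts substitutions, insertions and deletions with cost 1 each; a homology search would normally replace the 0/1 substitution cost with a substitution matrix such as BLOSUM62. For the pair EACFAVQR and DACFKAVR it gives 3 (one substitution and two gaps), the same as the alignment on this slide.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        cur = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb))) # match or substitute
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("EACFAVQR", "DACFKAVR"))
```

Keeping only the previous row makes the memory linear in the length of the second sequence; recovering the alignment itself would require keeping the full table for backtracking.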
45
Common de novo sequencing errors
AN, NA, GAG: same-mass replacements
(m(AN) = m(NA) = m(GAG), since m(GG) = m(N))
46
Two exercises
Exercise 1:
(de novo) X = LSCFAV
(real)    Y = SLCFAV
(homolog) Z = SLCF-V
Exercise 2:
(de novo) X = LSCFV
(real)    Y = EACFV
(homolog) Z = DACFV
Scoring matrix: BLOSUM62; m(LS) ≈ m(EA) ≈ 200.1 mu
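Both exercises hinge on residue blocks that have (almost) the same mass. A sketch for checking that, assuming standard monoisotopic residue masses and an illustrative 0.05 Da tolerance; the right tolerance depends on the instrument, and the function names are hypothetical.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def block_mass(block):
    """Sum of the residue masses of a block of amino acids (Da)."""
    return sum(RESIDUE_MASS[aa] for aa in block)


def same_mass(block1, block2, tol=0.05):
    """True if the two blocks cannot be told apart within `tol` Da."""
    return abs(block_mass(block1) - block_mass(block2)) <= tol


if __name__ == "__main__":
    print(round(block_mass("LS"), 3), round(block_mass("EA"), 3))
    print(same_mass("LS", "EA"), same_mass("AN", "GAG"), same_mass("LS", "EV"))
```

m(LS) and m(EA) differ by about 0.036 Da, so both round to 200.1; AN and GAG agree to about 0.00001 Da, which is why such replacements are common de novo errors.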
47
More formally
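  • Let $d_s(X,Y)$ be the cost of sequencing errors
    between X and Y, and $d_h(Y,Z)$ the cost of
    homology mutations between Y and Z.
  • Sequencing: given de novo sequence X and homolog Z,
    find Y such that $d_s(X,Y) + d_h(Y,Z)$ is minimized.
  • Let $d(X,Z) = \min_Y \left[ d_s(X,Y) + d_h(Y,Z) \right]$.
  • Searching: search a database for Z such that
    $d(X,Z)$ is minimized.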

48
How to compute $d_s(X,Y)$
  • X and Y can easily be aligned together (according
    to mass).
  • For each erroneous mass block with mass $m_i$,
    define the cost to be
  • Define

49
How to compute d(X,Z)
  • A multiple alignment can be built from alignments
    (X,Y) and (Y,Z).
  • Lemma
  • Dynamic Programming!
  • Let

50
Four cases of the last block
[Figure: cases (A), (B), (C), and the case with no sequencing error]
D(i,j) is the minimum of the four cases.
51
How to compute
52
Three cases of the alignment
[Figure: alignment cases (1), (2) and (3)]
53
The algorithm for computing
  • 1. for m from 0 to m(X) step ?
    • for i from 0 to |Z|
      • for j from i to |Z|

Time complexity
54
The algorithm for computing d(X,Z) and Y
  • 1. for i from 1 to |X|
    • for j from 1 to |Z|
  • 2. Output D(|X|,|Z|) as d(X,Z).
  • 3. Backtrack to get the best middle sequence Y.

Time complexity
Total time complexity
55
Experiment
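[Figure: 28 ALBU_BOVIN spectra; PEAKS de novo results, 13 correct and 15 partially correct (e.g. EAEGNEVR); SPIDER homologs for all 28 (e.g. ESIGSEVR); 24 correct constructed sequences (e.g. ESIGNEVR)]
  • 28 spectra from ALBU_BOVIN.
  • PEAKS de novo sequencing gives 13 correct and 15
    partially correct sequences.
  • SPIDER found good peptide homologs in the human
    protein DB for all of them.
  • 24 correct peptide sequences were constructed.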
56
Two exemplary results
[Figure annotations: sequencing errors (X vs. Y) and homology mutations (Y vs. Z)]
(de novo) X: FVE<RDG>LVTDTLK
(real)    Y: FVE VTK LVTD LT K
(homolog) Z: FAE<VSK>LVTDLTK

(de novo) X: CCQW DAEACAF<NN><PG>K
(real)    Y: CCK AD DAEAC FA VE GP K
(homolog) Z: CCKADDKETCFA<EE><GK>K
57
Four modes in SPIDER
  • Homology mode
  • Non-gapped homology mode
    • Assumes sequencing errors and homology
      mutations do not overlap.
  • Segment match mode
    • Assumes no homology mutations.
  • Exact match mode
    • Assumes no sequencing errors or homology
      mutations.

58
Experiment
  • 144 ion trap MS/MS spectra (lower-quality
    spectra).
  • The proteins are all in Swiss-Prot but not in
    the human database.
  • PEAKS 2.0 was used for de novo sequencing.
  • SPIDER searched the Swiss-Prot and human
    databases, respectively.

59
People like SPIDER
  • Best Paper Award at CSB2004
  • Some random emails we received:
    • "I'm a big SPIDER fan!" (Shinichi Iwamoto,
      Shimadzu Corporation)
    • "The results I've been getting have been
      consistently very good. Thank you for this great
      piece of software!" (Jason W. H. Wong, University
      of Oxford)
    • "Your software is by far the fastest and more
      user-friendly I have found." (Juan Luis,
      University of Georgia)
    • "I plan to teach SPIDER in my Advanced
      Bioinformatics class. I wonder if your powerpoint
      slides are available?" (Pavel Pevzner, Ronald R.
      Taylor Professor of Computer Science, UCSD)
  • Included in PEAKS as both a separate tool and an
    intermediate step in protein candidate
    generation.
  • The best is yet to come:
    • People have just started using the de novo
      homology approach.