Title: Overview
New Developments for Open-Source Shotgun
Proteomics Analysis with the Trans-Proteomic
Pipeline
Joshua Tasman1, Luis Mendoza1, David
Shteynberg1, James Eddes2, Ning Zhang1, Nichole
King1, Chee-Hong Wong3, Brian Pratt4, Patrick
Pedrioli2, Henry Lam1, Eric Deutsch1, Jimmy Eng5,
Xiao-jun Li6, Alexey Nesvizhskii7, Andrew
Keller8, and Ruedi Aebersold2
1Institute for Systems Biology, Seattle, WA
2Institute for Molecular Systems Biology (ETH), Zurich, Switzerland
3Bioinformatics Institute, Singapore
4Insilicos LLC, Seattle, WA
5University of Washington, Seattle, WA
6Homestead Clinical, Seattle, WA
7Department of Pathology, University of Michigan, Ann Arbor, MI
8Rosetta Biosoftware, Seattle, WA
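[Figure: End-to-End MS/MS Proteomics Analysis with the TPP. Workflow diagram; boxes include spectra information (mzXML/mzML/mzData document), spectral search engine results file, pepXML document, and protXML document.]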
Overview
- We have developed a complete, end-to-end data
analysis pipeline that provides automated,
reliable, consistent, and objective analysis of
high-throughput quantitative LC-MS/MS data from
multiple data sources using multiple search
engines.
- The Trans-Proteomic Pipeline (TPP) is a
complete, mature suite of software tools for MS
data representation, MS data visualization,
peptide identification and validation, protein
identification, quantification, and annotation,
data storage and mining, and biological
inference.
- The TPP has been adopted throughout the
international proteomics community and is in use
at many prominent academic and corporate labs.
- We present an overview of the TPP and describe
newly available functionality. All software tools
are freely available under an open-source
software license at tools.proteomecenter.org.
- MS/MS data conversion from proprietary (vendor)
to open formats
  - Choice of common open formats: mzXML (SPC/ISB) or
  mzML (HUPO PSI, SPC/ISB, and others; see flagship
  poster 001)
  - Converters for Thermo Xcalibur (.raw), Waters
  MassLynx (.raw directory), ABI/MDS Analyst (.wiff),
  Agilent MassHunter (.d directory), and others
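One practical benefit of converting to an open format is that the data can be read with general-purpose tools. The following is a minimal sketch, not part of the TPP, that counts spectra by MS level in an mzML document using only the Python standard library. The embedded sample document is a hand-written, heavily abbreviated stand-in for a real converted file, and the function name `count_spectra_by_ms_level` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# XML namespace used by mzML documents (HUPO PSI).
MZML_NS = {"mz": "http://psi.hupo.org/ms/mzml"}
# PSI-MS controlled-vocabulary accession for the term "ms level".
MS_LEVEL_ACCESSION = "MS:1000511"

# Hand-written, abbreviated stand-in for a converted file (illustration only;
# a real mzML file carries many more required elements and the peak data).
SAMPLE_MZML = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <run id="example_run">
    <spectrumList count="3">
      <spectrum index="0" id="scan=1" defaultArrayLength="0">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="1"/>
      </spectrum>
      <spectrum index="1" id="scan=2" defaultArrayLength="0">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
      </spectrum>
      <spectrum index="2" id="scan=3" defaultArrayLength="0">
        <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
      </spectrum>
    </spectrumList>
  </run>
</mzML>
"""


def count_spectra_by_ms_level(mzml_text):
    """Return a dict mapping MS level (1 = MS, 2 = MS/MS, ...) to spectrum count."""
    root = ET.fromstring(mzml_text)
    counts = Counter()
    for spectrum in root.iterfind(".//mz:spectrum", MZML_NS):
        # The MS level is recorded as a cvParam directly under each spectrum.
        for param in spectrum.iterfind("mz:cvParam", MZML_NS):
            if param.get("accession") == MS_LEVEL_ACCESSION:
                counts[int(param.get("value"))] += 1
    return dict(counts)


if __name__ == "__main__":
    print(count_spectra_by_ms_level(SAMPLE_MZML))
```

For production-size files, a streaming parser (for example `xml.etree.ElementTree.iterparse`) would be a more memory-friendly choice than loading the whole document.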
Downstream analysis with other TPP-compatible SPC
tools
- Data storage and mining with PeptideAtlas and
SBEAMS (Systems Biology Experiment Analysis
Management System)
  - Data products of the TPP analysis pipeline are
  imported into the database
  - Data exploration, annotation, and correlation
  with other experiments can all be managed
  - Interface allows flexible analysis of the data
  across multiple experiments
Introduction
High-throughput LC-MS/MS is capable of
simultaneously identifying and quantifying
thousands of proteins in a complex sample. The
consistent and objective analysis of the large
amounts of data obtained is challenging and
time-consuming. Over the past 5 years, we have
developed and refined a data analysis pipeline
that facilitates and standardizes such
analysis. The Trans-Proteomic Pipeline (TPP) is
an open-source software package with
well-established community acceptance. The TPP
provides a completely free, open-source
proteomics analysis solution, spanning:
- conversion of raw MS/MS data to open formats and
standards
- support for searching MS/MS spectra with various
search engines, including the bundled X!Tandem
engine (www.thegpm.org) as well as Sequest,
Mascot, Phenyx, OMSSA, and others
- conversion of search engine results to a uniform
open format
- statistical validation of peptide
identifications with PeptideProphet
- statistically validated protein identification
with ProteinProphet
- quantitative proteomics (SILAC, ICAT, iTRAQ,
etc.) with XPRESS, ASAPRatio, and Libra
- tools for visualization of and interaction with
results
Here we present recent updates to the software
tools that improve analysis functionality and
user experience.
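To give a feel for how peptide-level probabilities can feed protein-level validation, here is a deliberately simplified sketch. It assumes each peptide identification is independent and combines the peptide probabilities into the probability that at least one identification supporting the protein is correct. The function name `protein_probability` is hypothetical, and this is not the ProteinProphet implementation, which additionally deals with issues such as peptides shared between several proteins.

```python
from math import prod


def protein_probability(peptide_probabilities):
    """Probability that at least one supporting peptide identification is correct.

    Simplifying assumption: the peptide identifications are independent, so the
    chance that all of them are wrong is the product of (1 - p_i).
    """
    for p in peptide_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return 1.0 - prod(1.0 - p for p in peptide_probabilities)


if __name__ == "__main__":
    # Hypothetical peptide probabilities for a single protein.
    print(protein_probability([0.9, 0.5]))
```

Under this assumption, several moderately confident peptides can together support a protein more strongly than any one of them does alone, while a protein with no supporting peptides receives a probability of zero.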
- Spectral search engine output conversion to open
formats
  - Supports most common commercial and open-source
  data formats
  - Sequest, Mascot, X!Tandem, SpectraST, Phenyx, and
  others
Additional visualization, statistical analysis,
and exploration tools enabling investigation of
biological meaning and significance with
Gaggle-compatible tools such as Cytoscape
(network visualization), the stats package R, and
the PIPE (Protein Inference and Property Explorer)
Methods
- The TPP is constantly improved with new
functionality. Highlights of major recent
developments include:
  - Build system improvements and native Windows
  distribution. Insilicos had previously released
  their own version of the TPP (the "IPP"). In
  order to combine efforts more efficiently,
  Insilicos has integrated their customizations
  into the main TPP distribution. The TPP build
  system has been improved to allow a native
  Windows distribution, bringing significant
  performance improvements as well as easier
  installation.
  - Implementation of vendor raw MS/MS-to-mzML data
  converters and full support for parsing mzML
  input throughout the TPP.
  - PeptideProphet, the TPP module for peptide ID
  validation, has been updated with additional
  modeling capabilities that compare observed
  retention time with the calculated hydrophobicity
  of the purported peptide. Additionally, a
  high-mass-accuracy model improves discrimination
  of IDs with data from newer instruments. Decoy
  database entries can now be used in distribution
  modeling. A semi-parametric distribution model
  allows better discrimination of true and
  false-positive results.
  - Inclusion of X!Tandem (from the GPM project) for
  a complete, end-to-end MS/MS searching and
  validation solution.
  - Upcoming multi-experiment data integration with
  iProphet (see Poster TPU 669).
  - Spectral library searching with SpectraST.
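The idea of using decoy database entries in distribution modeling can be illustrated with a toy two-component mixture. In the sketch below, scores of decoy hits (known to be incorrect) are used to estimate the distribution of incorrect identifications, and Bayes' rule then gives the probability that a target hit with a given score is correct. Everything here is an assumption made for illustration: both distributions are modeled as Gaussians, the positive-distribution parameters and the prior are simply given, and all scores are invented. This is not the PeptideProphet model, only a sketch of the general technique.

```python
from math import exp, pi, sqrt
from statistics import mean, stdev


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))


def posterior_correct(score, prior_correct, pos_params, neg_params):
    """Bayes' rule for a two-component mixture of correct and incorrect IDs.

    pos_params and neg_params are (mean, standard deviation) tuples for the
    score distributions of correct and incorrect identifications.
    """
    pos = prior_correct * normal_pdf(score, *pos_params)
    neg = (1.0 - prior_correct) * normal_pdf(score, *neg_params)
    return pos / (pos + neg)


# Hypothetical search scores of decoy hits; decoys are incorrect by
# construction, so they are used to estimate the negative distribution.
DECOY_SCORES = [0.8, 1.1, 1.3, 0.9, 1.4, 1.0]
NEG_PARAMS = (mean(DECOY_SCORES), stdev(DECOY_SCORES))

# Assumed (not fitted) parameters for correct identifications and the prior.
POS_PARAMS = (4.0, 1.0)
PRIOR_CORRECT = 0.3

if __name__ == "__main__":
    for score in (1.0, 2.5, 4.0):
        p = posterior_correct(score, PRIOR_CORRECT, POS_PARAMS, NEG_PARAMS)
        print(f"score={score:.1f}  P(correct)={p:.4f}")
```

In a real analysis the positive distribution and the prior would be fitted to the data rather than fixed by hand, and the distribution shapes would be chosen to match the search engine's score.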
Results and Conclusions
Improvements from the Insilicos TPP version (IPP)
have been merged into the TPP, and the build
system has been improved to allow native Windows
deployment. Significant speed improvements have
already been seen from moving away from a
distribution based on a Unix emulation layer
(Cygwin). Discrimination of true from
false-positive peptide IDs has been improved
through the addition of the decoy, retention
time, and high-mass-accuracy PeptideProphet
models, as well as through the use of a
semi-parametric distribution for describing
peptide population distributions. The open-source
search engine X!Tandem is now bundled with the
TPP, allowing us to provide a complete and free
solution for proteomics analysis. Work has also
begun on integrating the open-source OMSSA search
engine.
This work is performed under the Seattle Proteome
Center, supported by NHLBI contract No.
N01-HV-28179. We would also like to thank all
Aebersold Lab and external developers who have
contributed to this project.