Title: Essential Software for Analyzing the Proteome
1. Essential Software for Analyzing the Proteome
- Proteome and Ingenuity Pathways Analysis
- 6/14/2007
- Yannick Pouliot, PhD
- Bioresearch Informationist
- Lane Medical Library & Knowledge Management Center
- lanebioresearch_at_stanford.edu
2. Housekeeping
- Did you bring your laptop?
  - Yes: plug in the power supply, boot it, and put it to sleep
  - No: find a seat next to an available laptop
- Enable pop-ups in your browser
3. Quick Survey
- Faculty/Postdoc/Grad student/Staff?
- Existing users of
  - Proteome?
  - IPA?
4. Contents
- Overview of Proteome and Ingenuity Pathways Analysis
  - Important features
- Demo (IPA)
- User queries (IPA)
- Comparison of the two systems
- Objectives
  - Putting Proteome and Ingenuity Pathways Analysis (IPA) in context
  - Valuable features you should know about
    - Focus on enabling interpretation of experiment data
  - When to use them
  - Strengths and weaknesses
5. What are Proteome and IPA?
- Both tools are knowledge bases (KBs) of curated biomedical content useful for
  - Querying the literature in a robust manner
    - particularly for visualizing complex information
  - Analyzing users' experiment data in the context of what is known (IPA)
- Used following data acquisition, clean-up, and statistical analysis
6. Content Generation Through Extraction, Indexing and Curation
- Both Proteome and IPA provide much of their value by applying rigorous curation and indexing to the data they provide
  - Systematic extraction of information from a source (more later)
  - Making it truly usable by encoding the information → enforcing structure
  - Compensating for weaknesses in source data
    - E.g., applying a controlled vocabulary to compensate for the wording of the original text
- NCBI and NLM also provide somewhat similar curation, but to a lesser extent
7. Proteome Access
- LaneConnex → Bioresearch Portal → Proteome (provides proxy)
- Retriever User Guide: log into Proteome to access the guide
8. Accessing Proteome
- User-unlimited site license purchased by Lane Library (thanks, Prof. Butte)
- Need to obtain (free) user login
- Browser configuration
  - Works with IE 6/7
  - Firefox 2.0/1.5: requires manual installation
    - has minor problems with SVG viewer
  - Must enable pop-ups
  - Must install Adobe SVG Viewer to run the visualizer
    - listed on the BioKnowledge Retriever page when you log in
- Numerous browser issues with the latest browsers
  - Internet Explorer 7: BioKnowledge Workspace (SVG) works when within the Stanford network
  - Firefox 2.0: SVG viewer requires manual installation
9. Proteome Content Generation
- Extractions from scientific literature
  - Expert curation
  - DB updated weekly
- Extractions from public databases
  - GO, OMIM, Ensembl, etc.
- No support for researcher data
  - E.g.,
    - Expression datasets
    - "My pathways"
  - → Advantage: Ingenuity
10. Proteome Data Types
- Protein physical properties
  - Sequence, isoelectric point, molecular mass, transmembrane domain(s), structure, protein domains, alternative splice forms
- Gene ontology classification
  - GO Molecular function
  - GO Biological process
  - GO Cellular component
- Protein binding
- Interaction data
  - Protein-protein
  - Protein complexes
  - Other
- Expression pattern
  - Organ/Tissue/Cell type/Tumor type
- Orthology data
  - Gene families
- Related proteins
  - Classified by species
  - BLAST results available, starting with summary view
- Disease association
11. Species Covered by Proteome
- Homo sapiens
- Rattus norvegicus
- Mus musculus
- Caenorhabditis elegans
- Saccharomyces cerevisiae
- Schizosaccharomyces pombe
- Large number of pathogenic fungi
- Overall, >200 species: much greater range than IPA
  - → Advantage: Proteome
12. Proteome Data Additions Since 2006
13. Proteome Querying Using Quick Search
- Searching with
  - Protein and gene IDs from all major databases
    - Wild cards and Boolean operators are supported, e.g.
      - TP5. searches for TP50 to TP59
      - TP5* searches for TP5<anything of any length>
  - Protein name
    - E.g., TP53
    - Searches protein names and descriptions
  - Keyword
    - Finds keywords in title lines, annotations, GO biological process, GO molecular function, and mutant phenotype properties
  - Disease, including MeSH IDs
    - Within the Disease tab, search for D010190 (Pancreatic Neoplasms; see NLM's MeSH Browser)
- Using a list of identifiers
  - Click the Input button and submit a list of IDs (see list of allowable ID types)
- Using a sequence
  - BLAST using protein sequence (longer sequences)
  - Peptide-optimized BLAST search
Supported IDs
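The wildcard behavior described for Quick Search can be imitated locally, for example to pre-filter a list of symbols before pasting it into a query. The following is a minimal sketch, not Proteome's implementation. It assumes that `.` stands for exactly one character and that `*` stands for any run of characters; the `*` symbol in particular is an assumption, and the function names and sample symbols are hypothetical.

```python
import re


def quick_search_pattern(query: str) -> re.Pattern:
    """Translate a Quick Search-style wildcard query into a compiled regex.

    Assumed semantics (not taken from Proteome documentation):
      '.' matches exactly one character
      '*' matches any run of characters, including none
    Every other character is matched literally, ignoring case.
    """
    parts = []
    for ch in query:
        if ch == ".":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def filter_ids(query: str, identifiers: list[str]) -> list[str]:
    """Return the identifiers that match the whole wildcard query."""
    pattern = quick_search_pattern(query)
    return [name for name in identifiers if pattern.fullmatch(name)]


# Illustrative symbols only; this is not a Proteome result set.
symbols = ["TP53", "TP5", "TP53BP1", "TP63", "TP5X"]
print(filter_ids("TP5.", symbols))
print(filter_ids("TP5*", symbols))
```

Using `fullmatch` rather than `search` keeps `TP5.` from matching longer names such as TP53BP1, which is the distinction the two wildcard examples draw.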
14. Accessing Proteome
lane.stanford.edu → bioresearch
15. Main Proteome Results Page (BioKnowledge Retriever)
- Page structure: single page, intra-page navigation buttons
16. Example of Summary Views for Sequence Alignment Properties
17. Bulk Querying
- Can query for proteins using a list of identifiers
  - Gene IDs from most genome databases
  - Protein IDs from most proteome databases
  - PubMed IDs
    - E.g., retrieve all proteins associated with a given paper identified by its PMID
    - Example: "Isolation of differentially expressed cDNAs from p53-dependent apoptotic cells: activation of the human homologue of the Drosophila peroxidasin gene", Horikoshi et al., 1999.
      - PMID 10441517
      - Querying with PMID 10441517 returns 9 proteins (5 more than Entrez, not counting TP53)
- Note: results are not identical to what Entrez provides
  - Why?
    - Any protein that is referenced in a paper in a meaningful way comes up when querying with that paper's PMID
    - Not just a listing of the discovery of a gene/protein
- IPA can query with lists of gene symbols, chemical names or drug names only → Advantage: Proteome
details
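To see how two resources differ for the same PMID, the two result lists can be compared as sets. This is a minimal sketch under stated assumptions: the gene symbols are placeholders rather than the actual results for PMID 10441517, and `compare_hits` is a hypothetical helper, not part of Proteome or Entrez.

```python
def compare_hits(source_a: set[str], source_b: set[str]) -> dict[str, set[str]]:
    """Split two result sets into shared and source-specific entries."""
    return {
        "shared": source_a & source_b,
        "only_in_a": source_a - source_b,
        "only_in_b": source_b - source_a,
    }


# Placeholder symbols; substitute the lists exported from each resource.
proteome_hits = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
entrez_hits = {"GENE_A", "GENE_B"}

report = compare_hits(proteome_hits, entrez_hits)
for label, genes in report.items():
    print(label, sorted(genes))
```

Entries that appear only in the first set are the ones worth checking against the paper, since they reflect proteins referenced in the text rather than only those whose discovery it reports.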
18. Proteome Querying for Protein Properties
- E.g., querying on protein size in amino acid (AA) residues
- Not supported in IPA
  - → Advantage: Proteome
Help with properties queries
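A property query of this kind amounts to filtering protein records on a computed attribute. The sketch below illustrates the idea on local data, assuming each record carries a one-letter amino acid sequence. The `ProteinRecord` class, the helper name, and the toy sequences are all hypothetical and do not reflect how Proteome stores or queries its data.

```python
from dataclasses import dataclass


@dataclass
class ProteinRecord:
    name: str
    sequence: str  # one-letter amino acid codes

    @property
    def length(self) -> int:
        """Protein size in amino acid residues."""
        return len(self.sequence)


def proteins_in_size_range(records, min_residues, max_residues):
    """Keep records whose length falls within the inclusive range."""
    return [r for r in records if min_residues <= r.length <= max_residues]


# Toy sequences for illustration; they are not real proteins.
records = [
    ProteinRecord("toy_short", "MKT"),
    ProteinRecord("toy_medium", "MKTAYIAKQR"),
    ProteinRecord("toy_long", "MKTAYIAKQRQISFVKSHFSRQ"),
]

hits = proteins_in_size_range(records, 5, 15)
print([r.name for r in hits])
```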
19. Visualizing Protein Information
- BioKnowledge Workspace is used to visualize data (Adobe SVG Viewer)
- A maximum of 100 proteins can be loaded at one time (more can be loaded using the Interactions menu)
20. Exploring the Knowledge Space
Similar to IPA Applet
- C: Curation. Summary of all curated data
- I: Interactions. Summary of proteins that have been curated as being associated with this protein
- N: Names. Curated names for the protein
- P: Proteins. Proteins that comprise the node → circles are orthology groups, not proteins in a given genome
Interactions
- Red line: manually curated connection
- Within the line
  - Red circle: association only
  - Green arrowhead: regulatory interaction
  - Yellow arrowhead: modification interaction
21. Example: Interesting Data Mining Using Proteome
- These proteins all have something to do with methotrexate.
- Q: There are few listed interactions between proteins, yet they are all linked by methotrexate.
  - Does this mean something?
- A: Quite possibly
22. Proteome Offers Nice Phenotypic Querying
23. Where To Get Help With Proteome
- Lane
  - FAQs
  - Bioresearch Informationist (Dr. Yannick Pouliot)
- Proteome's Help button
24. Ingenuity Pathways Analysis (IPA)
https://analysis.ingenuity.com/pa
Disclaimer: IPA is really about analyzing the proteome via the proxy of the transcriptome
25. Accessing IPA
- Access is through a Web browser and a Java app
- Central server resides at Ingenuity (Redwood City)
  - All data lives on their server
    - including your data if you store anything
- 1 floating seat license usable by anyone at Stanford
  - License provided by CMGM Bioinformatics Resource
  - Must subscribe to CMGM
  - Need to obtain (free) user login
- No need to be on the Stanford network
  - Start from lane.stanford.edu for proxy access to literature
- Individual license: 8K; lab licenses available.
- Browser requirements/setup
  - Internet Explorer 6/7
  - Firefox 1.0 and later
  - Safari 1.3.2, 2.0.3
  - Java must be installed (installation link provides details)
  - Pop-ups must be enabled
26. IPA Species Coverage
- Far fewer than Proteome
  - Homo sapiens
  - Rattus norvegicus
  - Mus musculus
- → Advantage: Proteome
27. IPA Content Generation
- Similar to Proteome's
- Extractions from scientific literature
  - Expert extraction: objective, protocol-based finding extraction by Ingenuity-certified, PhD-level scientists.
  - >1.3 million findings, structured using
    - >584,000 unique biological concepts
- Extractions from public databases
  - e.g., EntrezGene, OMIM, GO, KEGG, LIGAND
- Individual Knowledge Component
  - Expression or other numerical data upload
  - My Pathways
  - Sharing & Collaboration
This material is derived from an Ingenuity presentation
28. Unique Aspects of IPA Knowledge Base
- Canonical Pathways
  - Pathways substantially conserved across the species supported by IPA
  - Types
    - 57 metabolic
    - 47 cell signaling
  - More than 6,000 gene concepts represented in one or more canonical pathways
  - Ongoing curation of additional canonical pathways
- Toxicologically relevant gene lists
- Properties relevant to biomarker identification
29. IPA Data Types
- Positive/negative data usually provided
- Protein family, domain information → Proteome interface stronger here
- Physical properties not given directly (?) → Proteome content stronger here
- Homologous proteins not provided
- Ontological classifications → not queryable: can filter but not query
  - Similar to GO: Molecular processes/Cellular processes/Organismal processes
- Mutant data → not queryable → Proteome interface stronger here. Includes:
  - Knockout data
  - Deletions
  - Missense
  - Types of mutations: dominant, loss-of-function, effect on protein structure
- Experiment types → not queryable → Proteome interface stronger here
  - E.g., electrophoretic mobility shift assay, pulldown assay
- Interaction data (can filter but not query)
  - Protein-protein, including formation of protein complexes (association)
  - Protein-small molecule (e.g., drug, natural product)
  - Protein-cell or tissue (e.g., nuclear matrix)
- Expression pattern → not queryable
  - Organ/Tissue/Cell type/Tumor type
Overall, Proteome is more queryable
30. IPA Workflows
- IPA is built around a small number of workflows that relate genes to activities, pathways, locations and diseases
  - Searching for gene function, disease information, chemistry-related data (including drugs)
  - Gene expression analysis
    - Identifying the functional significance of patterns of gene expression
    - Overlaying expression data onto canonical pathways
  - NEW 2007: Molecular toxicology characterization of a dataset
  - NEW 2007: Biomarker characterization of a dataset (filtering)
31. Accessing IPA
32. Working in IPA
Project manager
33. IPA Neighborhood Explorer
- Similar to Proteome's BioKnowledge Workspace
Controls size and contents of neighborhood
34. Visualizing Gene Neighborhoods (genes that have some type of interaction with each other)
DEMO
Conceptually similar to Proteome's BioKnowledge Workspace
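The idea of a neighborhood whose size can be controlled can be illustrated with a breadth-first walk over an interaction list. This is a sketch of the general concept only and makes no claim about how IPA's Neighborhood Explorer or Proteome's BioKnowledge Workspace is implemented. The edge list and the `neighborhood` function are hypothetical.

```python
from collections import deque


def neighborhood(edges, seed, max_distance):
    """Return {node: distance} for nodes within max_distance steps of seed.

    Interactions are treated as undirected for the purpose of this sketch.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if distance[node] == max_distance:
            continue  # do not expand beyond the requested radius
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


# Placeholder interaction list, not curated data.
edges = [("SEED", "A"), ("SEED", "B"), ("A", "C"), ("C", "D")]
print(neighborhood(edges, "SEED", 1))
print(neighborhood(edges, "SEED", 2))
```

Raising `max_distance` grows the neighborhood one interaction step at a time, which is roughly what a size control on a neighborhood view does.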
35. Querying IPA
Another way to view search results: as a network of interactions.
Proteins and chemicals involved
DHFR (the target for methotrexate)
Terms are hyperlinked in species-specific page
Sorted by number of findings
36. Example Use Case: Visualizing Activation/Inhibition Interactions Upstream from TP53
Iconography
37. Overlaying Features Into Neighborhood Explorer
- E.g., disease information
- details
- Functional information
38. Uploading Experiment Data
- Why?
  - IPA can create networks based on users' gene expression data (or other kinds of measurement)
  - Looks for biases in network composition that indicate likely functions associated with regulated genes
  - Not covered extensively here (> 2 hours)
  - Ingenuity provides excellent tutorials and training videos
- What's needed?
  - Excel files formatted according to IPA specification
    - Very easy format (new in v5)
    - Templates can be downloaded
  - Not just for experiment data: any numerical data will do
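Preparing data for upload usually comes down to producing a simple table of identifiers and numerical values. The sketch below writes such a table as tab-delimited text that Excel can open. The column headers, the two-column layout, and the tab delimiter are assumptions made for illustration; the downloadable IPA templates define the actual format, and `build_upload_table` is a hypothetical helper.

```python
import csv
import io


def build_upload_table(measurements: dict[str, float]) -> str:
    """Render identifier/value pairs as tab-delimited text.

    The header names "ID" and "Value" are placeholders, not IPA's
    required column names.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["ID", "Value"])
    for identifier in sorted(measurements):
        writer.writerow([identifier, f"{measurements[identifier]:.3f}"])
    return buffer.getvalue()


# Placeholder identifiers and values, e.g. log ratios from an experiment.
table = build_upload_table({"GENE_B": -1.5, "GENE_A": 2.25})
print(table)
```

The returned text can be written to a `.txt` file and opened in Excel, then adjusted to match the template before upload.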
39. Example IPA Use Case: Understanding drug action by understanding the biological processes associated with gene networks that are correlated with drug treatment
- Raponi, M. et al. (2004) Microarray analysis reveals genetic pathways modulated by tipifarnib in acute myeloid leukemia, BMC Cancer 4:56
- Experiment involves treatment with tipifarnib, an inhibitor of farnesyl protein transferase used to inhibit ras.
- Tipifarnib is an orphan drug that shows clinical response in adults with refractory or relapsed acute leukemias.
- Goal: identify genetic markers and pathways that are regulated by tipifarnib in acute myeloid leukemia (AML)
- Method: ascertain tipifarnib-mediated gene expression changes in 3 AML cell lines and bone marrow samples from two patients with AML using a cDNA microarray containing 7,000 human genes.
40. Results: Identifying the networks impacted by tipifarnib, and what this means to the cell
- For down-regulated genes
  - 23 genes that were down-regulated in patient leukemic cells and AML cell lines were analyzed by IPA.
  - The major network that was found to be significantly down-regulated is associated with proliferation (p-value 10^-10).
  - Shaded genes are those identified by microarray analysis; non-shaded genes are those associated with the regulated genes based on prior network analysis
- For up-regulated genes
  - Twenty-nine genes that were up-regulated were also analyzed by IPA
  - Two networks were found to be significantly up-regulated
    - One network is associated with apoptosis (p-value 10^-10)
    - Another network is associated with immunity (p-value 10^-7).
  - Shaded genes are those identified by microarray analysis; non-shaded genes are those associated with the regulated genes based on prior network analysis
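A p-value attached to a functional association expresses how unlikely the observed overlap would be by chance. One common way to compute such a value is a hypergeometric over-representation test, sketched below. This is offered as an illustration of the general idea; the slides do not state how IPA computes its p-values, and the annotated-gene count and overlap used in the example are hypothetical. Only the array size (7,000 genes) and the list size (23 genes) come from the study described above.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int,
                       selected: int, overlap: int) -> float:
    """Probability of seeing at least `overlap` annotated genes by chance.

    population: genes measured (e.g. genes on the array)
    annotated:  genes in the population carrying the function of interest
    selected:   genes in the regulated list
    overlap:    regulated genes that carry the function
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# 7,000 and 23 are from the study; 500 and 10 are made-up illustration values.
p = enrichment_p_value(population=7000, annotated=500, selected=23, overlap=10)
print(f"p = {p:.2e}")
```

A small value means the regulated list contains more genes with that function than random sampling from the array would be expected to produce.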
41. Exporting
- Images
  - Snapshot tool
- Data: Excel or txt
  - Lots of limitations
- E-mail
  - Can e-mail dynamic networks
  - Even to non-IPA users
42. IPA Student Exercise
- Log into IPA
- Query for gene DHFR and drug methotrexate using Advanced query
- Select DHFR
- Bring up the Gene View (gene report)
- Select Human
- Bring up Neighborhood Explorer
- Use the Path Explorer function to filter
43. Where To Get Help
- Lane FAQs
- StanfordIPA Google User Group
- Ingenuity support: very good
  - support_at_ingenuity.com
44. In Conclusion: Comparing Proteome and IPA
- Proteome
  - More protein-oriented
  - Provides more information regarding protein physical properties (including sequence) and species distribution
  - Easier to query rapidly (no need to launch Java client)
  - Database is more queryable
  - Does not analyze user data
- IPA
  - More oriented toward networks and pathways
  - More oriented toward function and interactions between gene products
  - Better at in-depth functional analysis
  - Can merge user data with their data
  - Can analyze user data to propose likely functional significance
  - Can store user data
    - Useful for remote collaborations
45. Yannick Pouliot, PhD, lanebioresearch_at_stanford.edu
47. Proteome vs Ingenuity Summary
48. Comparing Results: Proteins Associated with a Publication
- Proteins associated with PMID 10441517
49. Resources: IPA Iconography
50. Resources: IPA Supporting Materials
- Training videos
  - Require loading the Webex player
51. Proteome Accepted IDs for List Querying
52. IPA Supported Platforms
53. IPA Invitation: Project Sharing
54. Interactive Canonical Pathways
55. IPA: Overlaying Expression Values On Top of Canonical Pathways
57. Visualizing a Network Associated with a Set of Expression Values
59. New in 2007