1
Essential Software for Analyzing the Proteome
  • Proteome and Ingenuity Pathways Analysis
  • 6/14/2007
  • Yannick Pouliot, PhD
  • Bioresearch Informationist
  • Lane Medical Library Knowledge Management
    Center
  • lanebioresearch_at_stanford.edu

2
Housekeeping
  • Did you bring your laptop?
  • Yes: plug in the power supply, boot it and put it
    to sleep
  • No: find a seat next to an available laptop
  • Enable popups in your browser

3
Quick Survey
  • Faculty/Postdoc/Grad student/Staff?
  • Existing users of
  • Proteome?
  • IPA?

4
Contents
  • Overview of Proteome and Ingenuity Pathways
    Analysis
  • Important features
  • Demo (IPA)
  • User queries (IPA)
  • Comparison of the two systems
  • Objectives
  • Putting Proteome and Ingenuity Pathways Analysis
    (IPA) in context
  • Valuable features you should know about
  • Focus on enabling interpretation of experiment
    data
  • When to use them
  • Strengths and weaknesses

5
What are Proteome and IPA?
  • Both tools are knowledge bases (KBs) of curated
    biomedical content useful for
  • Querying the literature in a robust manner
  • particularly for visualizing complex
    information
  • Analyzing users' experiment data in the context
    of what is known (IPA)
  • Used following data acquisition, clean-up and
    statistical analysis

6
Content Generation Through Extraction, Indexing
and Curation
  • Both Proteome and IPA provide much of their value
    by applying rigorous curation and indexing of the
    data they provide
  • Systematic extraction of information from a
    source (more later)
  • Making it truly usable by encoding the
    information → enforcing structure
  • Compensating for weaknesses in source data
  • E.g., applying controlled vocabulary to compensate
    for inconsistent wording in the original text
  • NCBI and NLM also provide somewhat similar
    curation, but less so

7
Proteome Access: LaneConnex → Bioresearch Portal
→ Proteome (provides proxy)
Retriever User Guide: log into Proteome to access the guide
8
Accessing Proteome
  • User-unlimited site license purchased by Lane
    Library (thanks, Prof. Butte)
  • Need to obtain (free) user login
  • Browser configuration
  • Works with IE 6/7
  • Firefox 1.5/2.0: requires manual installation
  • has minor problems with SVG viewer
  • Must enable pop-ups
  • Must install Adobe SVG Viewer to run visualizer
  • listed on BioKnowledge Retriever page when you
    log in
  • Numerous browser issues with latest browsers
  • Internet Explorer 7: BioKnowledge Workspace (SVG)
    works when within Stanford network
  • Firefox 2.0: SVG viewer requires manual
    installation

9
Proteome Content Generation
  • Extractions from scientific literature
  • Expert curation
  • DB updated weekly
  • Extractions from public databases
  • GO, OMIM, Ensembl, etc.
  • No support for researcher data
  • E.g.,
  • Expression datasets
  • "My Pathways"
  • → Advantage: Ingenuity

10
Proteome Data Types
  • Protein physical properties
  • Sequence, isoelectric point, molecular mass,
    transmembrane domain(s), structure, protein
    domains, alternative splice forms
  • Gene ontology classification
  • GO Molecular function
  • GO Biological process
  • GO Cellular component
  • Protein binding
  • Interaction data
  • Protein-protein
  • Protein complexes
  • Other
  • Expression pattern
  • Organ/Tissue/Cell type/Tumor type
  • Orthology data
  • Gene families
  • Related proteins
  • Classified by species
  • BLAST results available, starting with summary
    view
  • Disease association

11
Species Covered by Proteome
  • Homo sapiens
  • Rattus norvegicus
  • Mus musculus
  • Caenorhabditis elegans
  • Saccharomyces cerevisiae
  • Schizosaccharomyces pombe
  • Large number of pathogenic fungi
  • Overall, >200 species: much greater range than
    IPA
  • → Advantage: Proteome

12
Proteome Data Additions Since 2006
13
Proteome Querying Using Quick Search
  • Searching with
  • Protein and gene IDs from all major databases
  • Wild cards and Boolean operators are supported,
    e.g.
  • TP5. searches for TP50 through TP59
  • TP5* searches for TP5<anything of any length>
  • Protein name
  • E.g., TP53
  • Searches protein names and descriptions
  • Keyword
  • Finds keywords in title lines, annotations, GO
    biological process, GO molecular function, and
    mutant phenotype properties
  • Disease, including MeSH IDs
  • Within Disease tab, search for D010190
    (Pancreatic Neoplasms; see NLM's MeSH Browser)
  • Using a list of identifiers
  • Click Input button and submit a list of IDs (see
    list of allowable ID types)
  • Using a sequence
  • BLAST using protein sequence (longer sequences)
  • Peptide-optimized BLAST search
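
The wildcard behaviour in the Quick Search examples can be sketched in a few lines of Python. This is only an illustration of the matching semantics, not Proteome's implementation: it assumes "." stands for exactly one character and "*" for any run of characters (the "*" symbol is an assumption), and the function names and the sample identifier list are made up for the example.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a query where '.' = exactly one character and
    '*' = any run of characters (assumed syntax) into a regex."""
    parts = []
    for ch in pattern:
        if ch == ".":
            parts.append(".")      # exactly one character
        elif ch == "*":
            parts.append(".*")     # zero or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def search(pattern, names):
    """Return the names that match the whole wildcard pattern."""
    rx = wildcard_to_regex(pattern)
    return [n for n in names if rx.fullmatch(n)]

# Sample identifiers chosen only to exercise the patterns.
names = ["TP53", "TP5", "TP53BP1", "TP63", "TP53I3"]
print(search("TP5.", names))   # ['TP53']
print(search("TP5*", names))   # ['TP53', 'TP5', 'TP53BP1', 'TP53I3']
```

Matching is anchored to the whole identifier, so "TP5." does not pick up longer names such as TP53BP1.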

Supported IDs
14
Accessing Proteome
lane.stanford.edu → bioresearch
15
Main Proteome Results Page (BioKnowledge
Retriever)
  • Page structure: single page, intra-page
    navigation buttons

16
Example of Summary Views for Sequence Alignment
Properties
17
Bulk Querying
  • Can query for proteins using a list of
    identifiers
  • gene IDs from most genome databases
  • protein IDs from most proteome databases
  • PubMed IDs
  • E.g., retrieve all proteins associated with a
    given paper identified by its PMID
  • Example: "Isolation of differentially expressed
    cDNAs from p53-dependent apoptotic cells:
    activation of the human homologue of the
    Drosophila peroxidasin gene", Horikoshi et al.,
    1999.
  • PMID 10441517
  • Querying with PMID 10441517 returns 9 proteins (5
    more than Entrez, not counting TP53)
  • Note: results are not identical to what Entrez
    provides
  • Why?
  • Any protein that is referenced in a paper in a
    meaningful way comes up when querying with that
    paper's PMID
  • Not just a listing of the discovery of a
    gene/protein
  • IPA can query with lists of gene symbols,
    chemical names or drug names only → Advantage:
    Proteome
details
18
Proteome Querying for Protein Properties
  • E.g., querying on protein size in AA residues
  • Not supported in IPA
  • → Advantage: Proteome
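
As a rough picture of what a property query such as "protein size in AA residues" amounts to, here is a minimal Python sketch that filters a local list of records by sequence length. The record type, function name and toy sequences are all hypothetical; Proteome runs such queries on its own database through its web interface.

```python
from dataclasses import dataclass

@dataclass
class ProteinRecord:
    name: str
    sequence: str  # one-letter amino-acid codes

def by_length(records, min_aa=0, max_aa=None):
    """Keep records whose length in residues lies in [min_aa, max_aa]."""
    hits = []
    for rec in records:
        n = len(rec.sequence)
        if n >= min_aa and (max_aa is None or n <= max_aa):
            hits.append(rec)
    return hits

# Toy records, not real proteins.
records = [
    ProteinRecord("toy_short", "MKT"),
    ProteinRecord("toy_medium", "MKTAYIAKQR"),
    ProteinRecord("toy_long", "M" * 400),
]
print([r.name for r in by_length(records, min_aa=5, max_aa=100)])  # ['toy_medium']
```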

Help with properties queries
19
Visualizing Protein Information
  • BioKnowledge Workspace is used to visualize data
    (Adobe SVG viewer)
  • Maximum 100 proteins can be loaded at one time
    (more can be loaded using the Interactions menu)

20
Exploring the Knowledge Space
Similar to IPA Applet
  • C (Curation): summary of all curated data
  • I (Interactions): summary of proteins that have
    been curated as being associated with this protein
  • N (Names): curated names for protein
  • P (Proteins): proteins that comprise the node →
    circles are orthology groups, not proteins in a
    given genome
  • Interactions
  • Red line: manually curated connection
  • Within the line:
  • Red circle: association only
  • Green arrowhead: regulatory interaction
  • Yellow arrowhead: modification interaction
21
Example Interesting Data Mining Using Proteome
  • These proteins all have something to do with
    methotrexate.
  • Q: There are few listed interactions between
    proteins, yet they are all linked by
    methotrexate.
  • Does this mean something?
  • A: Quite possibly

22
Proteome Offers Nice Phenotypic Querying
23
Where To Get Help With Proteome
  • Lane
  • FAQs
  • Bioresearch Informationist (Dr. Yannick Pouliot)
  • Proteome's Help button

24
Ingenuity Pathways Analysis (IPA)
https://analysis.ingenuity.com/pa
Disclaimer: IPA is really about analyzing the
proteome via the proxy of the transcriptome
25
Accessing IPA
  • Access is through Web browser and Java app
  • Central server resides at Ingenuity (Redwood
    City)
  • All data lives on their server
  • including your data if you store anything
  • 1 floating seat license usable by anyone at
    Stanford
  • License provided by CMGM Bioinformatics Resource
  • Must subscribe to CMGM
  • Need to obtain (free) user login
  • No need to be on Stanford network
  • start from lane.stanford.edu for proxy access to
    literature
  • Individual license: 8K; lab licenses available
  • Browser requirements/set up
  • Internet Explorer 6/7
  • Firefox 1.0 and subsequent
  • Safari 1.3.2 2.0.3
  • Java must be installed (installation link
    provides details)
  • Pop-ups must be enabled

26
IPA Species Coverage
  • Far fewer than Proteome
  • Homo sapiens
  • Rattus norvegicus
  • Mus musculus
  • → Advantage: Proteome

27
IPA Content Generation
  • Similar to Proteome's
  • Extractions from scientific literature
  • Expert extraction: objective, protocol-based
    finding extraction by Ingenuity-certified,
    PhD-level scientists
  • >1.3 million findings, structured using
  • >584,000 unique biological concepts
  • Extractions from public databases
  • e.g. EntrezGene, OMIM, GO, KEGG, LIGAND
  • Individual Knowledge Component
  • Expression or other numerical data upload
  • My Pathways
  • Sharing and collaboration

This material derived from an Ingenuity
presentation
28
Unique Aspects of IPA Knowledge Base
  • Canonical Pathways
  • Pathways substantially conserved across the
    species supported by IPA
  • Types
  • 57 metabolic
  • 47 cell signaling
  • More than 6,000 gene concepts represented in one
    or more canonical pathways
  • On-going curation of additional canonical
    pathways
  • Toxicologically-relevant gene lists
  • Properties relevant to biomarker identification

29
IPA Data Types
  • Positive/negative data usually provided
  • Protein family, domain information → Proteome
    interface stronger here
  • Physical properties not given directly (?) →
    Proteome content stronger here
  • Homologous proteins not provided
  • Ontological classifications → not queryable (can
    filter but not query)
  • Similar to GO: Molecular processes/Cellular
    processes/Organismal processes
  • Mutant data → not queryable → Proteome
    interface stronger here. Includes:
  • Knockout data
  • Deletions
  • Missense
  • Types of mutations: dominant, loss-of-function,
    effect on protein structure
  • Experiment types → not queryable → Proteome
    interface stronger here
  • E.g., electrophoretic mobility shift assay,
    pulldown assay
  • Interaction data (can filter but not query)
  • Protein-protein, including formation of protein
    complexes (association)
  • Protein-small molecule (e.g., drug, natural
    product)
  • Protein-cell or tissue (e.g., nuclear matrix)
  • Expression pattern → not queryable
  • Organ/Tissue/Cell type/Tumor type

Overall, Proteome is more queryable
30
IPA Workflows
  • IPA is built around a small number of workflows
    that relate genes to activities, pathways,
    locations and diseases
  • Searching for gene function, disease
    information, chemistry-related data (including
    drugs)
  • Gene expression analysis
  • Identifying the functional significance of
    patterns of gene expression
  • Overlaying of expression data onto canonical
    pathways
  • NEW 2007: Molecular toxicology characterization
    of a dataset
  • NEW 2007: Biomarker characterization of a dataset
    (filtering)

31
Accessing IPA
32
Working in IPA
Project manager
33
IPA Neighborhood Explorer
  • Similar to Proteome's BioKnowledge Workspace

Controls size and contents of neighborhood
34
Visualizing Gene Neighborhoods (genes that
have some type of interaction with each other)
DEMO
Conceptually similar to Proteome's BioKnowledge
Workspace
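
The idea of a gene neighborhood whose size can be dialed up or down can be sketched as a breadth-first walk over an interaction graph, stopping after a chosen number of hops. This is a generic illustration, not how IPA or Proteome build their views: the function name is made up, and apart from the DHFR and methotrexate link mentioned in this deck, the node names (GENE_A, GENE_B, GENE_C) are placeholders.

```python
from collections import deque

def neighborhood(graph, start, max_hops):
    """Return {node: hop distance} for nodes within max_hops of start."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the requested radius
        for nb in graph.get(node, ()):
            if nb not in seen:
                seen[nb] = seen[node] + 1
                queue.append(nb)
    return seen

# Undirected toy interaction list; GENE_* names are placeholders.
edges = [("DHFR", "methotrexate"), ("DHFR", "GENE_A"),
         ("GENE_A", "GENE_B"), ("GENE_B", "GENE_C")]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

print(sorted(neighborhood(graph, "DHFR", 1)))  # ['DHFR', 'GENE_A', 'methotrexate']
print(sorted(neighborhood(graph, "DHFR", 2)))  # ['DHFR', 'GENE_A', 'GENE_B', 'methotrexate']
```

Raising the hop limit corresponds to growing the neighborhood; filtering the edge list by interaction type before the walk would correspond to controlling its contents.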
35
Querying IPA
  • Another way to view search results: as a network
    of interactions
  • Proteins and chemicals involved
  • DHFR (the target for methotrexate)
  • Terms are hyperlinked in species-specific page
  • Sorted by number of findings
36
Example Use Case: Visualizing Activation/Inhibition
Interactions Upstream from TP53
Iconography
37
Overlaying Features Into Neighborhood Explorer
  • E.g., disease information
  • details
  • Functional information

38
Uploading Experiment Data
  • Why?
  • IPA can create networks based on users' gene
    expression data (or other kinds of measurement)
  • Looks for biases in network composition that
    indicate likely functions associated with
    regulated genes
  • Not covered extensively here (> 2 hours)
  • Ingenuity provides excellent tutorials and
    training videos
  • What's needed?
  • Excel files formatted according to IPA
    specification
  • Very easy format (new in v5)
  • Templates can be downloaded
  • Not just for experiment data: any numerical data
    will do
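
The "biases in network composition" idea above is, in general terms, an over-representation question: are genes with a given function more common in the regulated set than chance would predict? A common way to score this is a right-tailed hypergeometric test, sketched below in Python. This is a generic illustration and is not presented as IPA's actual scoring method; the function name and all numbers in the usage lines are invented.

```python
from math import comb

def enrichment_p_value(total_genes, genes_in_category, list_size, hits):
    """Right-tailed hypergeometric test: probability of seeing at least
    `hits` category genes when drawing `list_size` genes at random from
    `total_genes`, of which `genes_in_category` belong to the category."""
    upper = min(genes_in_category, list_size)
    denom = comb(total_genes, list_size)
    tail = sum(
        comb(genes_in_category, k)
        * comb(total_genes - genes_in_category, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / denom

# Invented numbers: 1000 genes measured, 50 annotated to some function,
# 20 genes in the regulated list, 5 of which carry that annotation.
p = enrichment_p_value(1000, 50, 20, 5)
print(f"p = {p:.2e}")
```

A small p-value means the function is over-represented in the regulated list relative to a random draw of the same size.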

39
Example IPA Use Case: Understanding drug action
by understanding the biological processes
associated with gene networks that are correlated
with drug treatment
  • Raponi, M. et al. (2004) Microarray analysis
    reveals genetic pathways modulated by tipifarnib
    in acute myeloid leukemia. BMC Cancer 4:56
  • Experiment involves treatment with tipifarnib, an
    inhibitor of farnesyl protein transferase used to
    inhibit ras.
  • Tipifarnib is an orphan drug that shows clinical
    response in adults with refractory or relapsed
    acute leukemias.
  • Goal: identify genetic markers and pathways that
    are regulated by tipifarnib in acute myeloid
    leukemia (AML)
  • Method: ascertain tipifarnib-mediated gene
    expression changes in 3 AML cell lines and bone
    marrow samples from two patients with AML using a
    cDNA microarray containing 7,000 human genes.

40
Results: Identifying the networks impacted by
tipifarnib, and what this means to the cell
  • For down-regulated genes
  • 23 genes that were down-regulated in patient
    leukemic cells and AML cell lines were analyzed
    by IPA.
  • The major network that was found to be
    significantly down-regulated is associated with
    proliferation (p ~ 10^-10).
  • Shaded genes are those identified by microarray
    analysis; non-shaded genes are those associated
    with the regulated genes based on prior network
    analysis
  • For up-regulated genes
  • 29 genes that were up-regulated were also
    analyzed by IPA
  • Two networks were found to be significantly
    up-regulated
  • One network is associated with apoptosis
    (p ~ 10^-10)
  • Another network is associated with immunity
    (p ~ 10^-7).
  • Shaded genes are those identified by microarray
    analysis; non-shaded genes are those associated
    with the regulated genes based on prior network
    analysis

41
Exporting
  • Images
  • Snapshot tool
  • Data: Excel or txt
  • Lots of limitations
  • E-mail
  • Can e-mail dynamic networks
  • Even to non-IPA users

42
IPA Student Exercise
  • Log into IPA
  • Query for gene DHFR and drug methotrexate
    using Advanced query
  • Select DHFR
  • Bring up the Gene View (gene report)
  • Select Human
  • Bring up Neighborhood Explorer
  • Use the Path Explorer function to filter

43
Where To Get Help
  • Lane FAQs
  • StanfordIPA Google User Group
  • Ingenuity support: very good
  • support_at_ingenuity.com

44
In Conclusion: Comparing Proteome and IPA
  • Proteome
  • More protein-oriented
  • Provides more information regarding protein
    physical properties (including sequence), species
    distribution
  • Easier to query rapidly (no need to launch Java
    client)
  • Database is more queryable
  • Does not analyze user data
  • IPA
  • More oriented toward networks and pathways
  • More oriented toward function and interactions
    between gene products
  • Better at in-depth functional analysis
  • Can merge user data with their data
  • Can analyze user data to propose likely
    functional significance
  • Can store user data
  • Useful for remote collaborations

45
Yannick Pouliot, PhD lanebioresearch_at_stanford.edu
47
Proteome vs Ingenuity Summary
48
Comparing Results: Proteins Associated with a
Publication
  • Proteins associated with PMID 10441517

49
Resources: IPA Iconography
50
Resources: IPA Supporting Materials
  • Training videos
  • Require loading Webex player

51
Proteome: Accepted IDs for List Querying
52
IPA Supported Platforms
53
IPA Invitation Project Sharing
54
Interactive Canonical Pathways
55
IPA: Overlaying Expression Values On Top of
Canonical Pathways
57
Visualizing a Network Associated with a Set of
Expression Values
59
New in 2007