Title: Identifying functional subnetworks in large-scale datasets
1Identifying functional subnetworks in large-scale datasets
- Benno Schwikowski
- Institut Pasteur Systems Biology Group
- http://systemsbiology.fr
2The three levels of this talk
- Discovery of pathways active in HepC infection
- Cytoscape plug-ins
- Cytoscape platform
3Hepatitis C infection
- One person out of 30 is infected
- No vaccine exists
- In 20% of chronic infections, liver fibrosis and cirrhosis
- Frequently requires liver transplants
4Studying HepC infection: mRNA changes
- 50% of transplant livers become re-infected with Hepatitis C
- Study expression of 7000 genes in re-infected livers after transplantation
- 1-24 months post-transplant
- Samples in 3-6 month intervals
- 28 biopsies from 11 patients
- Mixture of hepatocytes, hepatic stellate cells, Kupffer cells, various types of blood cells
- Compare against pre-transplant reference pool
5Result of mRNA expression analysis
- Most genes (5968 of 7000) were significantly under- or overexpressed in one or more experiments
- High patient-to-patient variation
6Our approach
- Construct seed network among known molecular players
- Expand seed network to include differentially expressed genes
- Identify putative pathways by the Active Modules approach
7Seed network
8InteractionFetcher plug-in
- Purpose
- Dynamically retrieves remote information for selected nodes
- From SQL database
- Requests data via XML-RPC protocol
- Currently implemented types
- Protein/gene synonyms
- Orthologs
- Sequences (DNA, protein, DNA upstream)
- Gene, protein,
- Interactions/associations
- Options
- Cross-species queries
- Ortholog information from Homologene
- Inferred interactions (interologs)
- Interactive links to Source Web pages
- 100% open-source (client and server)
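The "inferred interactions (interologs)" option can be pictured as follows: an interaction known in one species is transferred to another species when both partners have orthologs there. The sketch below is only an illustration under stated assumptions. The function name `infer_interologs`, the in-memory dictionaries, and the `HUMAN_*` identifiers are hypothetical and are not taken from the plug-in.

```python
from itertools import product


def infer_interologs(source_interactions, orthologs):
    """Transfer interactions across species via ortholog mappings.

    source_interactions: iterable of (gene_a, gene_b) pairs in the source species.
    orthologs: dict mapping a source-species gene to a set of target-species genes.
    Returns a set of frozensets, each an inferred (undirected) target interaction.
    """
    inferred = set()
    for a, b in source_interactions:
        # Every combination of orthologs of a and b yields one candidate pair.
        for ta, tb in product(orthologs.get(a, ()), orthologs.get(b, ())):
            if ta != tb:  # skip self-interactions caused by a shared ortholog
                inferred.add(frozenset((ta, tb)))
    return inferred


# Hypothetical toy data: two yeast interactions, made-up human orthologs.
yeast_interactions = [("YDR216W", "YIL056W"), ("YDR216W", "YKR042W")]
yeast_to_human = {"YDR216W": {"HUMAN_A"}, "YIL056W": {"HUMAN_B", "HUMAN_C"}}

for pair in sorted(sorted(p) for p in infer_interologs(yeast_interactions, yeast_to_human)):
    print(pair)
```

The second yeast interaction yields nothing here because one partner has no ortholog in the toy mapping. Dropping self-interactions is a design choice of this sketch.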
92. Expand seed network
- Purpose
- Bring significantly up-/downregulated genes into the picture
- Approach
- Add interactions with differentially expressed genes (in silico pull-down)
- Use BIND, HPRD databases
- Only human-curated interactions
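The expansion step can be sketched as a simple filter over curated interactions: keep edges within the seed network, and add any edge that connects a seed node to a differentially expressed gene. This is a minimal illustration, not the InteractionFetcher implementation. The function name and the toy identifiers (`S1`, `G1`, ...) are hypothetical.

```python
def expand_seed_network(seed_nodes, curated_interactions, differentially_expressed):
    """Add curated interactions that link a seed node to a differentially expressed gene.

    Returns (nodes, edges) of the expanded network.
    """
    seed = set(seed_nodes)
    de = set(differentially_expressed)
    nodes = set(seed)
    edges = set()
    for a, b in curated_interactions:
        # Keep edges inside the seed network, plus edges that "pull down"
        # a differentially expressed partner of a seed node.
        if (a in seed and (b in seed or b in de)) or (b in seed and a in de):
            nodes.update((a, b))
            edges.add((a, b))
    return nodes, edges


# Hypothetical toy data.
seed = {"S1", "S2"}
interactions = [("S1", "S2"), ("S1", "G1"), ("G2", "S2"), ("G1", "G3"), ("S1", "G4")]
regulated = {"G1", "G2", "G3"}

nodes, edges = expand_seed_network(seed, interactions, regulated)
print(sorted(nodes))
print(sorted(edges))
```

In the toy data, `G3` is differentially expressed but has no direct interaction with a seed node, and `G4` interacts with a seed node but is not differentially expressed, so neither is added.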
10Network after InteractionFetcher expansion
11Identifying putative pathways: Why clustering can be problematic
- Many clustering methods are not model-based → significance of clusters is unclear
- Any given cluster may not be supported by all experiments (noise problem)
- Clusters tend to contain unrelated genes with vaguely similar profiles
12The three levels of this talk
- Discovery of pathways active in HepC infection
- Cytoscape plug-ins
- Cytoscape platform
13How can the clustering issues be addressed? The ActiveModules Plug-in
- Define up-/downregulated on the basis of a well-defined statistical model
- Also derive clusters from only some of the input experiments
- Use additional evidence to focus on plausible clusters → protein interactions
14Interaction networks
Schwikowski, Uetz, Fields, Nature Biotechnology (2000)
15Modular organization of interaction networks
16A lot of interaction data is becoming available
- Databases on...
- Protein-protein interactions
- Protein-DNA interactions
- Genetic interactions
- Metabolic pathways
- Cell signaling pathways, similarity
relationships, literature-based relationships
17Multi-criteria detection of modules
1. Interaction network between genes/proteins
2. Differential gene/protein abundances/activities (figure: matrix with axes Genes and Experiments)
18Scoring a module candidate
Perturbations / conditions
$P_z = 1 - \Phi(z_{A(j)})$
Rank adjustment: binomial summation
$r_{A(j)} = \Phi^{-1}(1 - p_{A(j)})$
m: total number of conditions; j: size of subset of conditions
Ideker, Ozier, Schwikowski, Siegel (2002), Bioinformatics 18, S233-240
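The scoring on this slide can be sketched in a few lines. The slide names the binomial summation without spelling it out. The sketch assumes it is the probability that at least j of the m conditions exceed the j-th highest module score, and that a module's per-condition score is the sum of its gene z-scores divided by the square root of the module size. Both are assumptions based on the cited paper, not on the slide text. The function names are hypothetical, and the paper's calibration against random gene sets is omitted.

```python
from math import comb, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: Phi is .cdf, Phi^-1 is .inv_cdf


def aggregate_z(gene_z_scores):
    """Aggregate z-score of a module in one condition: sum of z-scores over sqrt(k)."""
    k = len(gene_z_scores)
    return sum(gene_z_scores) / sqrt(k)


def rank_adjusted_score(condition_scores):
    """Score a module across m conditions.

    condition_scores: aggregate z-score of the module in each condition.
    Returns (best r_A(j), j), where j is the size of the best condition subset.
    """
    m = len(condition_scores)
    ranked = sorted(condition_scores, reverse=True)  # z_A(1) >= ... >= z_A(m)
    best_score, best_j = float("-inf"), 0
    for j, z in enumerate(ranked, start=1):
        p_z = 1.0 - STD_NORMAL.cdf(z)  # P_z = 1 - Phi(z_A(j))
        # Assumed binomial summation: at least j of m conditions exceed z_A(j).
        p_a = sum(comb(m, h) * p_z**h * (1.0 - p_z) ** (m - h) for h in range(j, m + 1))
        # Clamp so the inverse CDF stays finite at the numerical extremes.
        p_a = min(max(p_a, 1e-300), 1.0 - 1e-12)
        # r_A(j) = Phi^-1(1 - p_A(j)), computed as -Phi^-1(p_A(j)) by symmetry.
        r = -STD_NORMAL.inv_cdf(p_a)
        if r > best_score:
            best_score, best_j = r, j
    return best_score, best_j


# Hypothetical module that is strongly active in two of four conditions.
score, subset_size = rank_adjusted_score([3.0, 3.0, 0.0, 0.0])
print(round(score, 2), subset_size)
```

Taking the maximum over j lets a module score well even when it is active in only some of the conditions, which is the point of deriving clusters from subsets of the experiments.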
19Pathways in Rosetta's compendium (300 conditions)
20The three levels of this talk
- Discovery of pathways active in HepC infection
- Cytoscape plug-ins
- Cytoscape platform
21Active Modules plug-in applied to HCV re-infection data
- Iterative application results in four significant, highly overlapping subnetworks
- Repeat analysis only retaining late-active re-infection experiments
- Eliminates pathways activated by transplant operation
- Cutoff: 8 months
22Which observations can we make locally?
Network after InteractionFetcher expansion
Bold: Differentially regulated subnetwork
Red/Green: Late-active subnetwork
23Cytotalk plug-in
- Analysis, using the Cytotalk plug-in and R, of overrepresentation of genes in Gene Ontology classes
- Cytotalk enables interactive communication with
- C/C++ programs
- Java processes
- Python
- UNIX shell scripts
- R, R scripts
- Can be run on same machine or any other Internet-connected machine
- Can function as Cytoscape plug-in
- 100% open-source
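Overrepresentation of a Gene Ontology class in a subnetwork is commonly assessed with a hypergeometric upper-tail test. The slides do not say which test was run in R, so the choice of test here is an assumption, and Python is used only for illustration. The function name and the example counts (other than the 7000 genes measured) are hypothetical.

```python
from math import comb


def overrepresentation_p_value(population_size, class_size, sample_size, hits):
    """Hypergeometric upper tail: P(X >= hits).

    population_size: number of genes measured overall
    class_size: genes annotated to the Gene Ontology class
    sample_size: genes in the subnetwork
    hits: subnetwork genes that fall in the class
    """
    total = comb(population_size, sample_size)
    upper = min(class_size, sample_size)
    tail = sum(
        comb(class_size, x) * comb(population_size - class_size, sample_size - x)
        for x in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 7000 genes, a class of 100 genes,
# a 50-gene subnetwork with 5 genes in the class.
print(overrepresentation_p_value(7000, 100, 50, 5))
```

When many classes are tested at once, a multiple-testing correction would normally be applied to the resulting p-values.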
24The three levels of this talk
- Discovery of pathways active in HepC infection
- Cytoscape plug-ins
- Cytoscape platform
25Some Network Visualization Tools
- Pajek - Slovenia
- Osprey - SLRI, Toronto
- VisANT - BU
- Biolayout - EBI
- GraphViz
- PowerPoint
- Others
- Cytoscape (only open-source biology)
26Cytoscape
27Cytoscape Basic Concepts
- Objects visualized as nodes
- Relationships visualized as edges
- Attributes (name, sequence, source,...)
- Mapping attributes → drawing, customizable through visual mapper
28Cytoscape file formats
Sample interaction file
- YDR216W pd YIL056W
- YDR216W pd YKR042W
- YDR216W pd YGL096W
- YDR216W pd YDR077W
- ...
Sample expression file
GENE DESC exp0.sig exp1.sig exp0.sig exp1.sig
GENE0 G0 0.0 0.0 23.2 11.5
GENE1 G1 0.0 0.0 34.6 5.2
GENE2 G2 0.0 0.0 10.0 28.0
GENE3 G3 0.0 0.0 1.64 4.77
...
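The interaction file format on this slide is simple enough to parse by hand: each line holds a source node, an interaction type, and one or more target nodes, separated by whitespace. The sketch below reads the sample lines from the slide. The function names are hypothetical. The reading of `pd` as a protein-DNA interaction follows common Cytoscape usage and is an assumption, since the slide does not define it.

```python
from collections import defaultdict

SAMPLE_SIF = """\
YDR216W pd YIL056W
YDR216W pd YKR042W
YDR216W pd YGL096W
YDR216W pd YDR077W
"""


def parse_sif(text):
    """Parse 'source interaction_type target [target ...]' lines into edges."""
    edges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank lines and isolated nodes
        source, interaction, *targets = fields
        edges.extend((source, interaction, target) for target in targets)
    return edges


def neighbors(edges):
    """Group target nodes by source node."""
    adjacency = defaultdict(list)
    for source, _interaction, target in edges:
        adjacency[source].append(target)
    return dict(adjacency)


edges = parse_sif(SAMPLE_SIF)
print(len(edges), "edges")
print(neighbors(edges))
```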
29Cytoscape
- Display
- gene and protein expression
- protein interactions (physical and non-physical)
- protein classifications
- Analysis plug-in modules
- http://www.cytoscape.org/
- Java: platform-independent, web-start
- 100% open-source
30Visual Styles
Display gene expression as clear text
31Visual Styles
Map expression values to node colors using
a continuous mapper
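The idea of a continuous mapper can be illustrated with linear interpolation between two colors. This is a sketch of the concept only, not Cytoscape's implementation. The function name, the green-to-red default, and the value range are assumptions.

```python
def continuous_color_mapper(value, low, high, low_color=(0, 255, 0), high_color=(255, 0, 0)):
    """Linearly interpolate an RGB color for a value, clamped to [low, high]."""
    if high <= low:
        raise ValueError("high must be greater than low")
    # Position of the value within the range, from 0.0 (low) to 1.0 (high).
    t = (min(max(value, low), high) - low) / (high - low)
    return tuple(round(a + t * (b - a)) for a, b in zip(low_color, high_color))


# Hypothetical expression values mapped onto a green-to-red scale.
for expression in (-2.0, 0.0, 1.0, 5.0):
    print(expression, continuous_color_mapper(expression, low=-2.0, high=2.0))
```

Values outside the range are clamped to the end colors, so outliers do not distort the color scale for the rest of the network.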
32Visual Styles
Expression data mapped to node colors
33Multidimensional attributes
Cytoscape, pre-release plug-in. Data from Ideker et al., Science (2001)
34Layout
- 16 algorithms available through plug-ins
- Zooming, hide/show, alignment
35yFiles Circular
37Cytoscape Core: Differences to most other approaches
- Emphasis on data analysis integration
- No built-in semantics (added by plug-ins)
- Very simple concepts
- Human-readable input formats
- Extensibility
38Cytoscape extensibility
- Core: 100% open-source Java
- Plug-in API
- Plug-ins are independently licensed
- Just need to do the biology
- Template code samples
Plug-in
39Biomodules plug-in
Prinz S, Avila-Campillo I, Aldridge C, Srinivasan A, Dimitrov K, Siegel AF, and Galitski T. Genome Res. 2004; 14: 380-390
40Cytoscape Plugins
Modules in Complex Networks: Iliana Avila-Campillo, Tim Galitski
Discovering Regulatory and Signaling Circuits in Molecular Interaction Networks: Trey Ideker, Owen Ozier, Benno Schwikowski, Andrew Siegel
Data Integration in Juvenile Diabetes Research: Marta Janer, Paul Shannon
A network motif sampler: David Reiss, Benno Schwikowski
41Cytoscape Core Features
- Visualize and lay out networks
- Display network data using visual styles
- Easily organize multiple networks
- Bird's-eye view navigation of large networks
- Supports SIF and GML, molecular profiling formats, node/edge attributes
- Functional annotation from GO and KEGG
- Metanode support (hierarchical groupings)
- Extensible through plugins (20 developed)
42Baliga et al., Genome Research, June 2004
43Collaborators: HCV
- Institute for Systems Biology, Seattle, WA
- David Reiss
- Iliana Avila-Campillo
- Vesteinn Thorsson
- Tim Galitski
45Collaborators: Cytoscape
- ISB: Leroy Hood, Rowan Christmas
- Agilent Technologies
- Unilever PLC
- Long-term funding from NIH and participating institutions
- UCSD: Trey Ideker, Chris Workman
- Memorial Sloan-Kettering Cancer Center: Chris Sander, Gary Bader, Ethan Cerami
- Pasteur: Melissa Cline, Andrea Splendiani, Tero Aittokallio
46Shannon, P., et al. (2003). Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res 13, 2498-504.
47Collaborators: Active Networks
- Trey Ideker
- Owen Ozier
- Andrew Siegel
- Richard Karp
49Levels of Biological Information
DNA, mRNA, Protein, Pathways, Networks, Cells, Tissues, Organs, Individuals, Populations, Ecologies