Title: Cerebral: Visualizing Multiple Experimental Conditions on a Graph with Biological Context
1Cerebral: Visualizing Multiple Experimental Conditions on a Graph with Biological Context
- M.Sc Thesis Presentation
- Aaron Barsky
- Supervisor: Tamara Munzner
- May 21st, 2008
2Cerebral
- Collaboration with systems biologists
- Hancock innate immunity laboratory
- Integrate
- System model (Graph)
- Experimental measurements
3Outline
- Cellular systems biology
- Design decisions
- Related work
- Constrained simulated annealing graph layout
- Interactive data exploration video
- Conclusions / Future work
4Biomolecular interactions are selective
- Cell densely packed with biomolecules
- Interactions rare
- Model interactions as a graph
Image from Nature Publishing group
5Systems biology model
- Graph G = (V, E)
- V: proteins, genes, DNA, RNA, tRNA, etc.
- E: interacting molecules
- Reactions, information exchange, controls
6Graph summarizes extensive lab work
- Graphs extracted from database
- Each edge summarizes experimental evidence
- TIRAP: an adapter molecule in the Toll signaling pathway. Horng T, Barton GM, Medzhitov R.
- Mal (MyD88-adapter-like) is required for Toll-like receptor-4 signal transduction. Fitzgerald KA, Palsson-McDermott EM, Bowie AG, Jefferies CA, Mansell AS, Brady G, Brint E, Dunne A, Gray P, Harte MT, McMurray D, Smith DE, Sims JE, Bird TA, O'Neill LA.
7Models are dynamic
- No official summary source
- Changes with each publication
- Exploration of diagrams useful
- Choose scope to manage complexity
- Reactions associated with a biomolecule
- Reactions associated with a system
- Reactions associated with an organism
8TLR4 context (E = 74, V = 54)
9Immune system context (E = 1263, V = 760)
10Immune system context (E = 1263, V = 760)
11Human cell (E = 50,000, V = 10,000)
12Model interprets experiments. Experiments refine model.
- Systems biologists
- Conduct experiments on cells
- Interpret results in current model
- Propose modifications to the model
13Microarray experiments measure gene expression level
- Cells express genes to create proteins
- Proteins are specialized tools
- Associate real value with each graph node
Image: www.immb.forth.gr
14LL-37 Example
- Suspect LL-37 helps reduce inflammation
- Drug not part of model
- Conduct experiment
- Treat some cells with LL-37
- Controls: untreated
- Expose all cells to bacteria
- Measure gene expression over 4 time points
15Experiment results
- Context: TLR4 immune response
16Cerebral
17Thesis contributions
- Cerebral: interactive exploration tool
- Views multiple experimental conditions
- In context of graph model
- Facilitates comparison between pairs of conditions
- Graph layout algorithm
- Uses biological meta-data
18Outline
- Cellular systems biology
- Design decisions
- Related work
- Constrained simulated annealing graph layout
- Interactive data exploration video
- Conclusions / Future work
19Many visualization options
- Input
- Graph G = (V, E)
- Descriptive meta-data for each v in V
- Labels, biological attributes
- Sets of experimental results
- Multiple float values associated with each v in V
- Output
- Graphical representation
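The input described on this slide can be sketched as a small data structure. This is a minimal illustration, not Cerebral's actual data model: the class names, field names, the condition key `treated_t1`, and the expression values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                       # descriptive meta-data: display label
    localization: str                # biological attribute, e.g. "nucleus"
    function: str = ""               # biological attribute, optional
    # One float per experimental condition (hypothetical condition names).
    expression: dict = field(default_factory=dict)


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)   # label -> Node
    edges: set = field(default_factory=set)     # undirected pairs

    def add_node(self, node):
        self.nodes[node.label] = node

    def add_edge(self, a, b):
        if a not in self.nodes or b not in self.nodes:
            raise KeyError("both endpoints must already be nodes")
        # frozenset makes (a, b) and (b, a) the same undirected edge
        self.edges.add(frozenset((a, b)))

    def values_for(self, condition):
        """Return label -> measurement for one experimental condition."""
        return {label: n.expression[condition]
                for label, n in self.nodes.items()
                if condition in n.expression}
```

Keeping the measurements on the node, keyed by condition, makes it straightforward to colour one graph copy per condition or to read off one value per axis.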
20Our choices
- Graph layout guided by biological meta-data
- Small multiple views for experimental conditions
- Parallel coordinates for a measurement-driven view
21Traditional graph layout
- Given graph G = (V, E)
- Create layout in 2D plane
Circular (Six and Tollis, 1999)
Force-directed (Fruchterman and Reingold, 1991)
Hierarchical (Sugiyama 1989)
22Good layout criteria
- Short edges
- Minimal edge crossings
- Minimal node-edge overlap
- Compactness
- Symmetry
- Empirical Evaluation of Aesthetics-based Graph Layout (Purchase, 2002)
- Many criteria NP-hard
23Biologists found existing layouts unsuitable
- Generic layout criteria form unexpected groupings
- "That's weird. Why is that transcription factor beside that cell surface protein?"
- Biologists want graph layout to encode biological structure
24Biological cells divided by membranes
Image courtesy of Dr.G Weaver
- Interactions generally occur within a compartment
- Crossing membranes interesting
25Hand-drawn diagrams
- Cellular location encoded spatially
26Cerebral spatial encoding
- Similar to hand-drawn
- Spatial position reveals
- Location in cell
- Function
27Small multiple views for experimental conditions
- One graph instance per condition
- Each graph coloured according to the condition
- Tufte, 1990
28Animation over time
29Visual memory poor
- Matthew Plumlee and Colin Ware. Zooming versus multiple window interfaces: Cognitive costs of visual comparisons. Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems, 13(2):179-209, 2006.
- Barbara Tversky, Julie Bauer Morrison, and Mireille Betrancourt. Animation: can it facilitate? International Journal of Human-Computer Studies, 57(4):247-262, 2002.
30Embedded glyphs
- Embed multiple conditions as a chart in the node
- Good detail in local view
- Westenberg 2008
31Glyphs invisible in global view
32Saraiya study
- Purvi Saraiya, Peter Lee, and Chris North. Visualization of graphs with associated timeseries data, 2005.
- Compared 4 interfaces for analyzing expression data in a graph
- Animated coloured nodes outperformed glyphs
- Multiple linked views improve accuracy
- Aim to do better by distributing over space vs. over time
33Parallel coordinates for a measurement-driven view
- Each experimental condition is an axis
- Each node in the graph is a line
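The mapping on this slide (one axis per condition, one polyline per node) can be sketched as follows. This is a minimal illustration under assumptions: the function name is hypothetical, and scaling each axis independently to [0, 1] is one common choice, not necessarily what Cerebral does.

```python
def parallel_coordinates(values, conditions):
    """Turn per-node measurements into polylines.

    values:     node -> {condition: float}
    conditions: ordered list of condition names, one axis each
    Returns node -> list of (axis_index, scaled_value) points.
    """
    # Per-axis range, used to scale every axis to [0, 1] (an assumption).
    lo = {c: min(v[c] for v in values.values()) for c in conditions}
    hi = {c: max(v[c] for v in values.values()) for c in conditions}

    lines = {}
    for node, measured in values.items():
        points = []
        for x, c in enumerate(conditions):
            span = hi[c] - lo[c]
            # A constant axis has no spread; place its points mid-axis.
            y = (measured[c] - lo[c]) / span if span else 0.5
            points.append((x, y))
        lines[node] = points
    return lines
```

Nodes whose polylines run close together across all axes have similar expression patterns, which is what the clustering on the following slides looks for.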
34Clusters indicate similar function?
- Data driven hypothesis
- Same pattern of gene expression, same role in cell
- Parallel coordinates alone untrustworthy
- Data noisy
- Different clustering algorithm, different results
35Linked graph and clustering aid exploration
36Outline
- Cellular systems biology
- Design decisions
- Related work
- Constrained simulated annealing graph layout
- Interactive data exploration video
- Conclusions / Future work
37Related Work
- Systems biology graph viewers
- Constrained graph layout
38Systems biology graph visualization systems
- Cytoscape (Shannon et al. 2003)
- VisANT (Hu et al. 2004)
- GeneSpring (Silicon Genetics)
- GenMAPP (Dahlquist et al. 2002)
- Graph layout without biological context
- Overlay only a single condition at a time
VisANT (Hu et al. 2004)
39Multiple coordinated views
- Multiple linked views of experiment data
- HCE (Seo and Shneiderman 2002)
- Spotfire (TIBCO Spotfire)
- No graph view
40Constrained graph drawing
- Force directed
- Contain with repulsive walls
- Graph drawing by force directed placement (Fruchterman and Reingold, 1991)
- A constrained, force-directed layout algorithm for biological pathways (Genc and Dogrusoz 2004)
- Force balancing a challenge
- Parameter tweaking
- Brittle
41Quadratic programming
- Numerical approaches
- Constrained graph layout (He, Marriott 1998)
- IPSep-CoLa: An incremental procedure for separation constraint layout of graphs (Dwyer, Marriott 2006)
- Handles separation constraints
- Ideal edge length parameter
- Requires tweaking
- Does not optimize edge crossings
42Graph layout with simulated annealing
- Simple and flexible
- Drawing graphs nicely using simulated annealing (Davidson and Harel 1996)
- Automatic drawing of biological networks using cross cost and subcomponent data (Kato and Nagasaki 2005)
- Historically slow: O(V^3)
- Our system, Cerebral: expected O(E√V)
43Outline
- Cellular systems biology
- Design decisions
- Related work
- Constrained simulated annealing graph layout
- Interactive data exploration video
- Conclusions / Future work
44Graph layout constraints
- Restrict node placement to band according to subcellular localization
- Cluster activated proteins by function
- Optimize edge length, crossings, etc.
45Simulated annealing search
- Choose a random graph layout
- Repeat until cool
- Repeat O(N) times
- Move a random node to a new position
- Score new position with evaluation function
- If improved
- accept change
- Else
- accept change with a probability below 1 that shrinks as the temperature drops
- Reduce temperature
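The search loop above can be sketched in a few lines. This is a minimal sketch, not Cerebral's implementation: the function and parameter names are hypothetical, the Metropolis rule exp(-delta / T) and geometric cooling are standard simulated annealing choices assumed here, and the defaults of 50 cooling cycles and 30 moves per node follow the figures quoted on slide 47. The sketch does not stop two nodes from sharing a position; that is left to the cost function.

```python
import math
import random


def anneal(nodes, positions, cost, temperature=1.0, cooling=0.9,
           cycles=50, moves_per_node=30, rng=None):
    """Return (layout, cost) after a simulated annealing search.

    nodes:     list of node ids
    positions: list of candidate positions
    cost:      function(layout dict) -> number, lower is better
    """
    rng = rng or random.Random(0)
    # Choose a random starting layout.
    layout = {n: rng.choice(positions) for n in nodes}
    current = cost(layout)

    for _ in range(cycles):                           # repeat until cool
        for _ in range(moves_per_node * len(nodes)):  # O(N) moves per cycle
            node = rng.choice(nodes)
            old = layout[node]
            layout[node] = rng.choice(positions)      # move a random node
            new = cost(layout)                        # score the new layout
            delta = new - current
            # Accept improvements; accept worse layouts with a probability
            # that shrinks as the temperature drops (assumed Metropolis rule).
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current = new
            else:
                layout[node] = old                    # reject: undo the move
        temperature *= cooling                        # reduce temperature
    return layout, current
```

Because the cost function is called once per candidate move, it sits in the innermost loop and dominates the running time.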
46Adapting SA to constraints
- Hard constraints
- Layer by subcellular localization
- Soft constraints
- Minimize
- Edge length
- Edge-edge crossings
- Node-edge crossings
- Distance to biologically similar neighbours
Extracellular
Plasma membrane
Cytoplasm
Nucleus
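One way to honour a hard constraint is to offer only legal positions to the search, so the constraint can never be violated, while soft constraints are folded into a weighted score. The sketch below assumes this approach; the layer order comes from the slide, but the function names, the number of rows per band, and the weighted-sum form are illustrative assumptions.

```python
# Layer order as listed on the slide, top to bottom.
LAYERS = ["extracellular", "plasma membrane", "cytoplasm", "nucleus"]


def allowed_positions(localization, columns, rows_per_band=3):
    """Hard constraint: grid positions inside the band for one layer."""
    band = LAYERS.index(localization)
    first_row = band * rows_per_band
    return [(x, y)
            for y in range(first_row, first_row + rows_per_band)
            for x in range(columns)]


def soft_cost(layout, terms):
    """Soft constraints: weighted sum of penalty terms to minimize.

    terms: list of (weight, function(layout) -> number), for example
    edge length, crossing estimate, distance to similar neighbours.
    """
    return sum(weight * term(layout) for weight, term in terms)
```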
47Soft constraint violation evaluation is frequent
- Must be efficient
- Innermost loop
- 50 cooling cycles × 30N nodes
- 1500N evaluations
48Discretization key to efficiency
- Limit node positions to grid centers
- Uniform grid (Akman et al., 1989)
49Clean, regular layouts
50No node overlaps
- Overlaps in dense areas of force directed algorithms
- Cerebral: impossible by construction
- No cost to evaluate
51Calculations with L1 distance
- Manhattan or L1 distance
- Cheap, integer only arithmetic
- Measures
- Edge length
- Distance to functional neighbours
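With nodes restricted to integer grid cells, the L1 measure needs only subtraction, absolute value, and addition. A minimal sketch, with hypothetical function names:

```python
def l1_distance(a, b):
    """Manhattan distance between two grid cells given as (x, y) integers."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def total_edge_length(layout, edges):
    """Sum of L1 edge lengths; layout maps node -> (x, y) grid cell."""
    return sum(l1_distance(layout[u], layout[v]) for u, v in edges)
```

The same distance function can score the distance to functional neighbours by summing over pairs of functionally similar nodes instead of over edges.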
52High speed edge crossing count estimation
- Edge crossing count could be very expensive
- Count everything: O(E^2) or O(E log E)
- Count just the moved node: O(deg(n) E)
- Over 98% of unoptimized time
- Cerebral: good estimate
- O(√V)
- Integer only
53Quickly find cells with modified Bresenham's algorithm
- Modified Bresenham's
- Green: classic
- Purple: additional corners
- Update grid cell each time edge is moved
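The idea of visiting every grid cell an edge passes through, rather than only the cells classic Bresenham would plot, can be sketched with an integer-only grid walk. This is a stand-in under assumptions, not necessarily the modification used in Cerebral: it steps one cell at a time along x or y, and steps diagonally only when the segment passes exactly through a cell corner. The function name is hypothetical.

```python
def cells_on_segment(start, end):
    """Grid cells crossed by the segment between two cell centers.

    start, end: (x, y) integer grid cells. Uses integer arithmetic only.
    """
    x, y = start
    dx, dy = end[0] - x, end[1] - y
    nx, ny = abs(dx), abs(dy)
    step_x = 1 if dx > 0 else -1
    step_y = 1 if dy > 0 else -1

    cells = [(x, y)]
    ix = iy = 0
    while ix < nx or iy < ny:
        # Compares (0.5 + ix) / nx with (0.5 + iy) / ny without division:
        # which cell boundary does the segment reach first?
        decision = (1 + 2 * ix) * ny - (1 + 2 * iy) * nx
        if decision == 0:        # exactly through a corner: diagonal step
            x += step_x
            y += step_y
            ix += 1
            iy += 1
        elif decision < 0:       # vertical boundary first: step in x
            x += step_x
            ix += 1
        else:                    # horizontal boundary first: step in y
            y += step_y
            iy += 1
        cells.append((x, y))
    return cells
```

For the segment from (0, 0) to (2, 1) this walk yields four cells, one more than a classic Bresenham line, because it includes the extra cell the segment clips on its way up.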
54Track edges in each grid cell
- Add up edges in cells a line passes through
- No expensive line intersection tests
- Upper bound estimate
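The per-cell bookkeeping can be sketched as a counter keyed by grid cell. This is an illustration under assumptions: the class and method names are hypothetical, and each edge is represented simply by the list of cells it passes through.

```python
from collections import Counter


class EdgeGrid:
    """Tracks how many edges pass through each grid cell."""

    def __init__(self):
        self.count = Counter()

    def add(self, cells):
        """Register an edge given the cells it passes through."""
        for cell in cells:
            self.count[cell] += 1

    def remove(self, cells):
        """Unregister an edge, e.g. before one of its endpoints moves."""
        for cell in cells:
            self.count[cell] -= 1

    def estimate(self, cells):
        """Crossing estimate for a candidate edge not yet registered.

        Adds up the edges already present in each cell the candidate
        passes through. No line intersection test is performed, and an
        edge sharing several cells is counted several times, so the
        result can exceed the true crossing count.
        """
        return sum(self.count[cell] for cell in cells)
```

Moving a node then means removing its incident edges from the grid, estimating the cost of the new positions, and adding the edges back wherever they end up.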
55High angular resolution aids reading of edge crossings
- Exact intersection test: 3 crossings
- Approx. intersection test: 12 crossings
56Cerebral TLR4 (E = 74, V = 57) Time: 2.9 sec
57Force-directed TLR4 Time: < 1 sec
58IPSep-CoLa Time: 1.3 sec
59Cerebral innate immunity (E = 1263, V = 760) Time: 62 sec
60Force-directed Time: 64 sec
61IPSep-CoLa Time: 296 sec
62Cerebral graph layout
- Shows subcellular localization through spatial positioning
- Groups response proteins by biological function
- Requires no user specified parameters
- Runs in expected time O(E√V)
- A few minutes for 1000 nodes and edges
63Outline
- Cellular systems biology
- Design decisions
- Related work
- Constrained simulated annealing graph layout
- Interactive data exploration
- Conclusions / Future work
64Usable, but complex
65Video
66Outline
- Cellular systems biology
- Design decisions
- Related work
- Constrained simulated annealing graph layout
- Interactive data exploration video
- Conclusions / Future work
67Released in two stages
- Cerebral 1.0 (2007)
- Biologically based graph drawing only
- Announced in a Bioinformatics Application Note
- Cerebral 2.0 (now)
- Multiple experiment viewing with small multiple
and parallel coordinate views
68Implemented as a plugin for Cytoscape
- Cytoscape (Shannon et al. 2003)
- Open source systems biology tool
- Provides model/attribute management
- Replaced standard graph renderer
- Biologically based graph layout
- Added small multiple and parallel coordinate views
69Biologists are using Cerebral
- Published Cerebral-created diagrams to communicate results
- Matthew D Dyer, T. M Murali, and Bruno W Sobral. The landscape of human proteins interacting with viruses and other pathogens. PLoS Pathogens, 4(2):e32, 2008.
70
- Liqun He et al. The glomerular transcriptome and a predicted protein-protein interaction network. Journal of the American Society of Nephrology, 19(2):260-268, 2008.
71InnateDB links to Cerebral
- Integrated as a visualization system for InnateDB
72Future Work
73Visual scaling
- Human cell (V = 10,000, E = 50,000) Time: 6199 sec
74Support Organelles
- Membrane-bound regions within Cytoplasm
- Mitochondria
- Lysosomes
- Vesicles
Mitochondria
Vesicles
Cytoplasm
75Biology-specific clustering
- Replace k-means with biology-specific clustering algorithm
76Indicate data-mining bias
- Exploration without hypothesis
- Is pattern a signal?
- Measure significance of pattern given
- Graph size
- Connectivity of members
- Number of experiments
77Conclusion
- Cerebral
- Visualizes experimental data from multiple conditions simultaneously
- Allows interactive exploration of the data
- Uses biological meta-data to guide the graph layout
78Acknowledgements
- Funding
- Agilent Technologies
- Robert Kincaid
- Hancock lab members
- Jennifer Gardy, David Lynn, Bob Hancock
- Supervisor
- Tamara Munzner
- Information visualization group
- Stephen Ingram, Peter McLachlan, Dan Archambault, Heidi Lam, James Slack, and Ciaran Llachlan Leavitt
79Yeast cell cycle data
- Measure gene expression levels at 24 time points in yeast
- Through a cell division cycle
- Observe
84Principles of perception
- Group objects by colour/shape/size
- Group by rows, not columns
- Automatically by visual system
- Information Visualization: Perception for Design. Ware (2004)
85Spatial position overrules colour/shape/size
groupings
- Automatically view 3 groups of mixed objects
- With effort can group by
- Size
- Shape
- Colour
86Connectedness even stronger than position
- Each group would be perceived differently without
the connecting line
- Information Visualization: Perception for Design. Ware (2004)