Title: GTL Facilities Computing
- Infrastructure for 21st-Century Systems Biology
Ed Uberbacher (ORNL) and Mike Colvin (LLNL)
2. The Ultimate Goal Is to Provide Predictive Models of Microbes
This goal drives the data collection and computing strategy.
- Experimental
  - Complete datasets
  - Quantitative measurements
  - Comprehensive physical characterization
  - Protein expression and interactions
  - Spatial distributions
  - Process kinetics
- Computational
  - Automated data analysis and validation
  - Automated integration of diverse data sets
  - Human- and computer-accessible databases
  - Molecular-, pathway-, and cell-level simulations
The goals require a new synergy between computing and biology.
3. GTL Biology Paradigm: Integrated Large-Scale Experiment-Computing Cycles
[Cycle diagram; stages: Real-Time Analysis; Large-Scale Data Sets; Design or Revise Models; Experiment; Simulate and Generate Hypotheses]
4. Facility I: Production and Characterization of Proteins
Estimating Microbial Genome Capability
- Computational Analysis
  - Genome analysis of genes, proteins, and operons
  - Metabolic pathway analysis from reference data
  - Protein machine estimates from PM reference data
- Knowledge Captured
  - Initial annotation of the genome
  - Initial perceptions of pathways and processes
  - Recognized machines, function, and homology
  - Novel proteins/machines (including prioritization)
  - Production conditions and experience
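Facility I lists genome analysis of operons among its computational tasks. One common first-pass heuristic, shown here only as a sketch and not as the facility's method, groups adjacent genes that lie on the same strand and are separated by short intergenic gaps. The `Gene` record, the `predict_operons` function, and the 50 bp default gap are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    name: str
    start: int   # 1-based start coordinate (hypothetical annotation record)
    end: int     # inclusive end coordinate
    strand: str  # "+" or "-"


def predict_operons(genes: List[Gene], max_gap: int = 50) -> List[List[str]]:
    """Group consecutive same-strand genes whose intergenic gap is <= max_gap.

    max_gap is an illustrative assumption; real predictors calibrate it per genome.
    Overlapping genes give a negative gap and are therefore kept together.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    operons: List[List[str]] = []
    current: List[Gene] = []
    for gene in ordered:
        if current:
            prev = current[-1]
            gap = gene.start - prev.end - 1
            if gene.strand == prev.strand and gap <= max_gap:
                current.append(gene)  # extend the current candidate operon
                continue
            operons.append([g.name for g in current])  # close the previous group
        current = [gene]
    if current:
        operons.append([g.name for g in current])
    return operons


if __name__ == "__main__":
    genes = [
        Gene("geneA", 100, 900, "+"),
        Gene("geneB", 920, 1500, "+"),
        Gene("geneC", 1700, 2300, "+"),
        Gene("geneD", 2320, 2900, "-"),
    ]
    print(predict_operons(genes))
```

Distance alone is a weak signal; in practice it is usually combined with conserved gene order across related genomes and with expression data of the kind Facility II produces.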
5. Facility II: Whole Proteome Analysis
Modeling Proteome Expression, Regulation, and Pathways
- Analysis and Modeling
  - Mass spectrometry expression analysis
  - Metabolic and regulatory pathway/network analysis and modeling
- Knowledge Captured
  - Expression data and conditions
  - Novel pathways and processes
  - Functional inferences about novel proteins/machines
  - Genome super-annotation: regulation, function, and processes (deep knowledge about cellular subsystems)
6. Regulatory Gene Network Model for Endomesoderm Specification (Eric Davidson)
[Figure: gene regulatory network diagram; surviving label: Skeletogenic]
7. Facility III: Characterization and Imaging of Molecular Machines
Exploring Molecular Machine Geometry and Dynamics
- Computational Analysis, Modeling, and Simulation
  - Image analysis/cryoelectron microscopy
  - Protein interaction analysis/mass spec
  - Machine geometry and docking modeling
  - Machine biophysical dynamic simulation
- Knowledge Captured
  - Machine composition, organization, geometry, assembly, and disassembly
  - Component docking and dynamic simulations of machines
8. Example of Combined Experiment and Modeling to Understand a Multiprotein Complex: DNA Clamps
Clamp-Loading Mechanisms
- Homology modeling: Venclovas et al., Prot. Sci. 11, 2403 (2002)
- Electron microscopy: Mayanagi et al., J. Struct. Biol. 134, 35 (2001)
- Atomic force microscopy: Shiomi et al., PNAS 97, 14127 (2002)
- Classical molecular dynamics: Jeruzalmi et al., Cell 106, 417 (2001)
- Mechanistic model based on physical and biochemical data: Jeruzalmi et al., Cell 106, 429 (2001)
9. Facility IV: Analysis and Modeling of Cellular Systems
Simulating Cell and Community Dynamics
- Analysis, Modeling, and Simulation
  - Couple knowledge of pathways, networks, and machines to generate an understanding of cellular and multicellular systems
  - Metabolism, regulation, and machine simulation
  - Cell and multicell modeling and flux visualization
- Knowledge Captured
  - Cell and community measurement data sets
  - Protein machine assembly time-course data sets
  - Dynamic models and simulations of cell processes
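Dynamic models of cell processes, as listed for Facility IV, are often written as systems of ordinary differential equations. A minimal sketch, assuming a hypothetical two-step pathway S → I → P with mass-action rate constants `k1` and `k2` (arbitrary illustrative values, not measured parameters), integrated with a fixed-step classical fourth-order Runge-Kutta (RK4) scheme using only the standard library:

```python
import math
from typing import Callable, List

Vector = List[float]  # concentrations [S, I, P] of the hypothetical pathway


def pathway_rates(y: Vector, k1: float, k2: float) -> Vector:
    """Mass-action rates for the hypothetical pathway S -> I -> P."""
    s, i, _p = y
    v1 = k1 * s  # flux S -> I
    v2 = k2 * i  # flux I -> P
    return [-v1, v1 - v2, v2]


def rk4_step(f: Callable[[Vector], Vector], y: Vector, dt: float) -> Vector:
    """Advance the state y by one classical fourth-order Runge-Kutta step."""
    a = f(y)
    b = f([yj + 0.5 * dt * aj for yj, aj in zip(y, a)])
    c = f([yj + 0.5 * dt * bj for yj, bj in zip(y, b)])
    d = f([yj + dt * cj for yj, cj in zip(y, c)])
    return [
        yj + dt / 6.0 * (aj + 2.0 * bj + 2.0 * cj + dj)
        for yj, aj, bj, cj, dj in zip(y, a, b, c, d)
    ]


def simulate(y0: Vector, k1: float, k2: float, t_end: float, dt: float) -> List[Vector]:
    """Return the state at t = 0, dt, 2*dt, ..., t_end."""
    def f(y: Vector) -> Vector:
        return pathway_rates(y, k1, k2)

    y = list(y0)
    trajectory = [y]
    for _ in range(round(t_end / dt)):
        y = rk4_step(f, y, dt)
        trajectory.append(y)
    return trajectory


if __name__ == "__main__":
    traj = simulate([1.0, 0.0, 0.0], k1=0.5, k2=0.2, t_end=10.0, dt=0.01)
    s, i, p = traj[-1]
    print(f"numerical S={s:.6f}  I={i:.6f}  P={p:.6f}")
    print(f"analytic  S={math.exp(-0.5 * 10.0):.6f}")
```

Because this toy pathway has a closed-form solution, S(t) = S(0)e^(-k1 t), the numerical result can be checked directly. Total mass S + I + P is also conserved, which is a quick sanity check for any mass-action model without inflow or outflow.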
10. Centrally Planned Analysis and Modeling Tool Libraries
- Facility 1: genome annotation; regulatory element and operon identification; metabolic pathway analysis
- Facility 2: mass spec data analysis; expression analysis and clustering; metabolic and regulatory network modeling
- Facility 3: image analysis; mass spec analysis; protein/machine modeling; docking and molecular dynamics
- Facility 4: metabolic simulation; regulatory simulation; cell modeling and simulations
- Collect and manage software
  - Maintain current versions
  - Ensure hardware compatibility
  - User interfaces
  - Documentation
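Expression analysis and clustering, listed for Facility 2, usually means grouping genes whose expression profiles across conditions rise and fall together. A minimal sketch, assuming made-up profiles and using Pearson correlation with single-linkage grouping at a fixed threshold (one simple choice among many; hierarchical and k-means clustering are common alternatives). The function names and the 0.9 threshold are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # flat profile: correlation undefined, treat as unrelated
    return cov / (sx * sy)


def cluster_profiles(profiles: Dict[str, Sequence[float]],
                     threshold: float = 0.9) -> List[List[str]]:
    """Single-linkage clustering: genes share a cluster if a chain of pairs has r >= threshold."""
    names = sorted(profiles)
    parent = {name: name for name in names}  # union-find forest

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            if pearson(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


if __name__ == "__main__":
    profiles = {
        "geneA": [1.0, 2.0, 3.0, 4.0],
        "geneB": [2.0, 4.0, 6.0, 8.0],
        "geneC": [4.0, 3.0, 2.0, 1.0],
        "geneD": [8.0, 6.0, 4.0, 2.0],
    }
    print(cluster_profiles(profiles))
```

Single linkage can chain loosely related genes together through intermediates, so the threshold strongly shapes the result; that trade-off is one reason the slide groups clustering with network modeling rather than treating it as an end in itself.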
11. GTL Facilities Will Require High-Performance Computing for Both Capacity and Capability
[Figure: genome sequence fragments threaded onto structural templates to find the best match]
- Capacity, e.g., high-throughput protein structure predictions
- Capability, e.g., large-scale biophysical simulations
  - Large size and timescale classical simulations
  - Highly accurate quantum mechanical simulations
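The capacity example on this slide, high-throughput structure prediction by finding the best-matching template, can be sketched with a simplified stand-in: ranking templates by a Needleman-Wunsch global sequence alignment score. Real threading scores sequence-structure compatibility against template folds, so this illustrates only the best-match search. The scoring parameters, function names, and template sequences are hypothetical.

```python
from typing import Dict, Tuple


def global_alignment_score(a: str, b: str, match: int = 2,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Keeps only two rows of the dynamic-programming matrix (O(len(b)) memory).
    """
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def best_template(query: str, templates: Dict[str, str]) -> Tuple[str, int]:
    """Return (template name, score) for the highest-scoring template."""
    scores = {name: global_alignment_score(query, seq) for name, seq in templates.items()}
    best = max(scores, key=lambda name: scores[name])  # first template wins ties
    return best, scores[best]


if __name__ == "__main__":
    templates = {"templateX": "MKTAYIAKQR", "templateY": "MKTAHIAKQR"}
    print(best_template("MKTAYIAKQR", templates))
```

Each query is scored independently, so a genome's worth of queries can be distributed across many processors. That independence is what makes this a capacity workload, in contrast to a single tightly coupled capability simulation.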
12. GTL High-Performance Computing Roadmap
[Figure: chart of required computing power in teraflops (1 TF, 10 TF, 100 TF, 1000 TF) against biological complexity, with the level of current U.S. computing marked. Applications plotted: comparative genomics; genome-scale protein threading; constrained rigid docking; constraint-based flexible docking; cell, pathway, and network simulation; community metabolic, regulatory, and signaling simulations; molecular machine classical simulation; protein machine interactions; molecule-based cell simulation.]
13. Swimming in Data: An Exploding Need to Capture and Manipulate Data
- Across scales of space and time
- Petabytes of data
- From acquisition through refinement, reduction, and deposition
14. Central Database Planning
- Data Repositories
  - Genomes, annotation, and community genomes
  - Expression data and proteome composition
  - Metabolite and flux data
  - Metabolic pathways and kinetic parameters
  - Protein interactions
  - Protein machines repository: machine composition, function, homology, models
  - Image data repository
  - Regulatory network data and models
  - Cell models repository
- Integrated or integrable
- Requires development of a cross-facility approach
[Figure: linked data types: microbial genomes, phylogeny, regulatory networks, protein domains, pathways, community genomes, metabolic models, proteomics, regulatory elements, literature, expression, protein machines, protein structure]
15. The GTL Knowledge Base: Integration of Large Datasets Is a Precursor to Predictive Modeling
- The GTL knowledge base will change how information about microbes reaches the community
- Models and simulations will be online
- We will know more and more about the systems in each successive microbe