Title: Russ Miller
1Enabling Collaborative Science Through Grid
Technology
- Russ Miller
- Director, Center for Computational Research
- UB Distinguished Professor, Computer Science and Engineering
- Senior Research Scientist, Hauptman-Woodward Medical Inst.
Top 10 Worldwide Supercomputing Center -
www.gapcon.com
2Outline
- Bioinformatics in Buffalo
- Supercomputing in Buffalo
- Grid Computing
- Grid Computing in Buffalo
- Shake-and-Bake Computational Crystallography
- ECCE Computational Chemistry
3Biomedical Advances
- PSA Test (screen for Prostate Cancer)
- Avonex Interferon Treatment for Multiple Sclerosis
- Artificial Blood
- Nicorette Gum
- Fetal Viability Test
- Implantable Pacemaker
- Edible Vaccine for Hepatitis C
- Timed-Release Insulin Therapy
- Anti-Arrhythmia Therapy
- Tarantula venom
- Direct Methods Structure Determination
- Listed on Top Ten Algorithms of the 20th Century
- Vancomycin
- Gramicidin A
- High Throughput Crystallization Method Patented
- NIH National Genomics Center Northeast Consortium
- Howard Hughes Medical Institute Center for Genomics and Proteomics
4Bioinformatics in Buffalo: A $290M Initiative
- UB Center for Advanced Bioengineering and Biomedical Technologies
- $1M/yr NYS
- Med Tech for Product Dev and Commer.
- Center: Disease Modeling and Therapy Discovery
- UB, HWI, RPCI, Kaleida
- $15.3M NYS
- Software, device development, and drug therapies
- Buffalo Center of Excellence in Bioinformatics
- UB, HWI, RPCI
- $61M NYS
- $10M Federal Government
- $151M Corporate Funding
- UB Faculty Funding: $64M
5Partnerships
- Lead Partners: SUNY-Buffalo, Hauptman-Woodward
Medical Research Institute, Roswell Park Cancer
Institute
- Corporate Partners: Amersham Pharmacia, AT&T,
Beckman Coulter, BioPharma Ireland, Bristol Myers
Squibb, Confederation of Indian Industries, Dell,
General Electric, Human Genome Sciences, HP,
Immco, InforMax, Invitrogen, Pfizer
Pharmaceutical, Q-Chem, Sloan Foundation, SGI,
Stryker, Sun, 3M, Veridian, Wyeth Lederle,
Zeptometrix
6Experimental Facilities I
- Molecular Targeting Laboratory
- Screen 30-50K compounds every 3 months
- Apply compound to cell (different genes treated with fluorescent markers)
- Rapidly identify effect on specific gene expression pathways
- Gene Expression Laboratory
- High-throughput microarray and gene chip
- Discover new genes, their functions, and pathways
- Proteomics and Molecular Kinetics Lab
- Identify molecular targets found in Gene Expression Lab
- Disease Modeling Laboratory
- In vivo testing (flies, mice, baboons)
- Gene targeting and genetic mapping facilities
7Experimental Facilities II
- Bioengineering Support Laboratory
- Capabilities in photonics and nano-tech research
- E.g., handheld devices to test for diseases
- Protein Scale-Up and Purification
- High-Throughput Robotic Combinatorial Chemistry/Parallel Synthetic Chemistry Capabilities
- Drugs created robotically and tested for interaction with target protein
- Rapid identification of a large number of potential drugs
- Public Health and Molecular Pathology
- Tissue repositories, disease gene maps, medical informatics
- High-Throughput Search Process for Structural Biology
- Tests 1536 chemical cocktails to determine effective parameters for crystallization
8SUNY-B 2002-03 Snapshot
- Personnel
- Hired Jeff Skolnick as Director (7/02)
- Brought 13 additional staff to Buffalo
- Authorized to hire 10 additional research groups
- Hired Norma Nowak as co-Director (4/03)
- Authorized to hire 10 additional research groups
- Additional members TBD
- External Funding ($0)
- Applications submitted
- Deliverables
- Six (6) scientific papers
- Resources
- Building
- 6TF → 10TF Compute Cluster
9Center for Computational Research
- High-Performance Computing and High-End Visualization
- 110 Research Groups in 27 Depts
- 25 Companies and Institutions
- Sample Areas
- Urban Visualization and Simulation
- Computational Chemistry
- Ground Water Modeling
- Geophysical Mass Flows
- Networked Multimedia
- Medical Imaging
- Training
- Workshops and Courses
- Degree Programs
10CCR 1999-2003 Snapshot
- Personnel
- 18 State-Supported Staff
- 2 Grant-Supported Staff
- External Funding
- $111M External Funding
- $13.5M as lead
- $97.5M in support
- $41.8M Vendor Donations
- Deliverables
- 350 Publications
- Software, Media, Algorithms, Consulting,
Training, CPU Cycles, etc.
Raptor Image
11Computational Resources (9TF)
- SGI Origin3800
- 64 Processors (400 MHz)
- 32 GB RAM 400 GB Disk
- IBM RS/6000 SP
- 78 Processors
- 26 GB RAM 640 GB Disk
- Sun Microsystems Cluster
- 48 Sun Ultra 5s (333MHz)
- 16 Dual Sunblades (750MHz)
- 30 GB RAM, Myrinet
- SGI Intel Linux Cluster
- 150 PIII Processors (1 GHz)
- 75 GB RAM, 2.5 TB Disk Storage
- Apex Bioinformatics System
- Sun V880 (3), 6800, 280R (2), PIIIs
- Sun 3960 7 TB Disk Storage
- HP/Compaq SAN
- 25 TB Disk 250 TB Tape
- Dell Linux Cluster - #22 on Top500
- 600 P4 Processors (2.4 GHz)
- 600 GB RAM, 40 TB Disk, Myrinet
- Dell Linux Cluster - #187 on Top500
- 4036 Processors (PIII 1.2 GHz)
- 2TB RAM 160TB Disk 16TB SN
UBCOEB System
12Sample Computational Research
- Computational Chemistry (King, Kofke, Coppens, Furlani, Tilson, Lund, Swihart, Ruckenstein, Garvey)
- Algorithm development and simulations
- Groundwater Flow Modeling (Rabideau, Jankovic, Becker, Flewelling)
- Predict contaminant flow in groundwater and possible migration into streams and lakes
- Geophysical Mass Flows (Patra, Sheridan, Pitman, Bursik, Jones, Winer)
- Study of geophysical mass flows for risk assessment of lava flows and mudslides
- Bioinformatics (Zhou, Miller, Hu, Szyperski, NIH Consortium, HWI)
- Protein Folding: computer simulations to understand the 3D structure of proteins
- Structural Biology and Pharmacology
- Computational Fluid Dynamics (Madnia, DesJardin, Lordi, Taulbee)
- Modeling turbulent flows and combustion to improve design of chemical reactors, turbine engines, and airplanes
- Physics (Jones, Sen)
- Many-body phenomena in condensed matter physics
- Chemical Reactions (Mountziaris)
- Molecular Simulation (Errington)
13Visualization Resources
- Fakespace ImmersaDesk R2
- Portable 3D Device
- Tiled-Display Wall
- 20 NEC projectors, 15.7M pixels
- Screen is 11×7
- Dell PCs with Myrinet2000
- Access Grid Node
- Group-to-Group Communication
- Commodity components
- SGI Reality Center 3300W
- Dual Barcos on 8×4 screen
- VREX VR-4200 Stereo Imaging Projector
- Portable projector works with PC
14Sample Visualization Areas
- Computational Science (Patra, Sheridan, Becker, Flewelling, Baker, Miller, Pitman)
- Simulation and modeling
- Urban Visualization and Simulation (CCR)
- Public projects involving urban planning
- Medical Imaging (Hoffmann, Bakshi, Glick, Miletich, Baker)
- Tools for pre-operative planning and predictive disease analysis
- Geographic Information Systems (CCR, Bisantz, Llinas, Kesavadas, Green)
- Parallel data sourcing software
- Historical Reenactments (Paley, Kesavadas, More)
- Faithful representations of previously existing scenarios
- Multimedia Presentations (Anstey, Pape)
- Networked, interactive, 3D activities
15 3D Medical Visualization App
- Collaboration with Children's Hospital
- Leading miniature access surgery center
- Application reads data output from a CT Scan
- Visualize multiple surfaces and volumes
- Export images, movies or CAD representation of
model
16Multiple Sclerosis Project
- Collaboration with Buffalo Neuroimaging Analysis Center (BNAC)
- Developers of Avonex, drug of choice for treatment of MS
- MS Project examines patients and compares scans to healthy volunteers
17Multiple Sclerosis Project
- Compare caudate nuclei between MS patients and healthy controls
- Looking for size as well as structure changes
- Localized deformities
- Spacing between halves
- Able to see correlation between disease
progression and physical structure changes
18Grid Computing 2003
DISCOM SinRG APGrid IPG
19Grid Computing Overview
Thanks to Mark Ellisman
Advanced Visualization
Data Acquisition
Analysis
Computational Resources
Imaging Instruments
Large-Scale Databases
- Coordinate Computing Resources, People, Instruments in Dynamic Geographically-Distributed Multi-Institutional Environment
- Treat Computing Resources like Commodities
- Compute cycles, data storage, instruments
- Human communication environments
- No Central Control; No Trust
20Computational Grids and Electric Power Grids
- Similarities/Goals of CG and EPG
- Ubiquitous
- Consumer is comfortable with lack of knowledge of details
- Differences Between CG and EPG
- Wider spectrum of performance services
- Access governed by more complicated issues
- Security
- Performance
- Socio-political factors
21Growth of Data and Load vs. Moore's Law
Courtesy of Rick Stevens
[Figure: growth of genome data and computational load (ESTs, Human Genome, Combinatorial Chemistry, Pharmacogenomics, Metabolic Pathways) plotted against Moore's Law, 1990-2010]
22Biomedical Data: High Complexity and Large Scale
Courtesy of Rick Stevens
[Figure: layers of biomedical data, with item counts ranging from hundreds of thousands to billions]
- DNA sequences, alignments (e.g. ...atcgaattccaggcgtcacattctcaattcca...)
- Proteins: sequence, 2º structure, 3º structure (e.g. MPMILGYWDIRGLAHAIRLLLEYTDSSYEEKKYT...)
- Genetics and Maps: linkage, cytogenetic, clone-based
- ESTs: expression patterns, large-scale screens
- Protein-Protein Interactions: metabolism, pathways, receptor-ligand, 4º structure
- Polymorphism and Variants: genetic variants, individual patients, epidemiology
- Physiology: cellular biology, biochemistry, neurobiology, endocrinology, etc.
23Computational Motivation
Courtesy of Rick Stevens
24A Short History of the Grid
- Grand Challenge Problems (1980s)
- NSF and DOE initiatives
- Science is a team sport
- Initiate multi-resource projects involving computation, instruments, visualization, data
- Evolution of Related Communities
- Parallel computation
- Address resource limitations
- Networking
- Gigabit testbed program
- Investigate potential testbed network architectures
- Explore usefulness for end-users
CASA Gigabit Testbed (1990s)
25The Globus Project (Ian Foster and Carl Kesselman)
The Grid as a Layered Set of Services
- Globus model focuses on providing key Grid services
- Resource access and management
- GridFTP
- Information Service
- Security services
- Authentication
- Authorization
- Policy
- Delegation
- Network reservation, monitoring, control
26Extensible TeraGrid Facility (ETF)
[Figure: ETF sites linked through LA and Chicago hubs over the Extensible Backplane Network (30-40 Gb/s links). Figure courtesy of Rob Pennington, NCSA]
- ANL (Visualization): 1.25 TF IA-64, 96 Viz nodes, 20 TB Storage
- Caltech (Data collection analysis): 0.4 TF IA-64, IA32 Datawulf, 80 TB Storage
- NCSA (Compute Intensive): 10 TF IA-64, 128 large memory nodes, 230 TB Disk Storage, GPFS and data mining
- SDSC (Data Intensive): 4 TF IA-64, DB2 and Oracle Servers, 500 TB Disk Storage, 6 PB Tape Storage, 1.1 TF Power4
- PSC (Compute Intensive): 6 TF EV68, 71 TB Storage, 0.3 TF EV7 shared-memory, 150 TB Storage Server
27Enabling the Grid
- Internet is Infrastructure
- Increased network bandwidth and advanced services
- Advances in Storage Capacity
- Terabyte costs less than $5,000
- Internet-Aware Instruments
- Increased Availability of Compute Resources
- Clusters, supercomputers, storage, visualization devices
- Advances in Application Concepts
- Computational science: simulation and modeling
- Collaborative environments: large and varied teams
- Grids Today
- Moving towards production; focus on middleware
28X-Ray Crystallography
- Objective: Provide a 3-D mapping of the atoms in a crystal.
- Procedure
- Isolate a single crystal.
- Perform the X-Ray diffraction experiment.
- Determine molecular structure that agrees with diffraction data.
29X-Ray Data and Corresponding Molecular Structure
Underlying atomic arrangement is related to the
reflections by a 3-D Fourier transform.
Reciprocal or Phase Space
Real Space
- Phases lost during the crystallographic experiment.
- Phase Problem: Determine phases of the reflections.
30Shake-and-Bake Method: Dual-Space Refinement
[Figure: Shake-and-Bake flowchart. Trial structures (or trial phases) enter a cycle: FFT, then phase refinement by the tangent formula or parameter shift ("Shake"), then inverse FFT and density modification by peak picking or LDE ("Bake"); the cycle repeats until solutions are identified.]
31Phasing and Structure Size
32Ph8755 SnB Histogram
Atoms: 74, Phases: 740, Space Group: P1, Triples: 7,400
Trials: 100
Cycles: 40
Rmin range: 0.243 - 0.429
33Grid-Based SnB: Objectives
- Install Grid-Enabled Version of SnB
- Job Submission and Monitoring over Internet
- SnB Output Stored in Database
- SnB Output Mined through Internet-Based Integrated Querying Tool
- Serve as Template for Chem-Grid and Bio-Grid
- Experience with Globus and Related Tools
34Proof of Concept
- Combine CCR's Heterogeneous Compute Platforms into a Grid
- Client/Server Configurations
- Rapid Prototype 4Q02 (not Globus)
- Develop a user interface to monitor system
- Dynamic HTML Grid Interface
- Key Features for Proof of Concept
- Load Balancing
- Fault Tolerance
- Result and Grid Statistics
35Client/Server Configuration
Grid Server
Type 1
Type 3
Type 2
36Internet Grid Console
- Dynamic HTML Grid Status
- Grid Server Information
- Date/Completion Time
- Parallel Run Time/Serial Run Time/Speedup
- Trial Result Rate (Trial/Minute)
- Shows Configured Platform Information Dynamically
- Platform Type/Name/Picture
- Status: Idle/Working/Offline
- Resources: Nodes/Total Process/Available Process/Running Process
- Shows Job Status Dynamically
- Trials: Total Number/Amount Processed
- Platform Server State: Block Queue/Float/Race
- Result: Figure of Merit Histogram
37Grid Server Console (Vancomycin)
38Status Report
- Grid Portal
- Access control lists, security groups
- User attributes, history, proxies
- Managed through MySQL database
- Distributed data grid
- Globus
- Version 2.2.4 installed and in production
- Metacomputing Directory Services (MDS) stored in MySQL
- Eliminates need for LDAP
- Condor and Condor-G
- Used for resource management and grid job submissions
39Red queue color indicates that there are
currently running or queued jobs.
40ECCE Grid at CCR
- Import Scientific Information
- Application independent input
- ECCE automatically formats for target application (Gaussian98, NWChem)
- Computing at CCR
- 881 available CPUs (>2.5 TFlops)
- (Xeon, P3, Power3, R12K)
- Uniform access to all platforms via ECCE job launcher
- Chemical Analysis
- Full complement of visual tools for understanding data/publication quality graphics
- Computational Chemistry
- Relativistic effects/Heavy elements
- Algorithm development
- Theoretical physical chemistry
- Structural/Systems Biology
- Protein structure
- Enzyme catalysis
- Chemical Engineering
- Condensed phases/Mixed phase predictions
- Catalysis
- Geology, Pharmacology, Medical School
42BioGrids
Genomics is powering the new biology, but
Computing is in the driver's seat.
BioGrids provide scalable computing so that
biologists can focus on biology.
- EUROGRID BioGRID
- Asia Pacific BioGRID
- NC BioGrid
- Bioinformatics Research Network
- Osaka University Biogrid
- Indiana University BioArchive BioGrid
43Contact Information
- miller_at_buffalo.edu
- www.ccr.buffalo.edu
44Acknowledgments
- Mark Green
- Steve Gallo
- Jason Rappleye
- Jeff Tilson
- Martins Innus
- Betty Capaldi
- Bruce Holm
- Janet Penksa
- George DeTitta
- Herb Hauptman
- Charles Weeks
- Steve Potter
- Rohit Bakshi
- Philip Glick
45Protein Folding
- Ability of proteins to perform biological function is attributed to their 3-D structure.
- Protein folding problem refers to the challenge of predicting 3-D structure from amino-acid sequence.
- Solving the protein folding problem will impact drug design.
46Protein Dynamics
- Dynamics of Hemoglobin (Example)
- 50 Days of Processing on 16 Processors (800 CPU Days)
- Key
- White: Heme Groups
- Red: Phe97
- Red: Oxygen (in the subunit at bottom)
- Green: His 69 and 101
- Blue: Tyr 72
- Cyan (Ball): Water Molecules
- Yellow: Helix E/F
- Interest
- Flip of the Phe97 ring at top
- Water movement around Phe97
- Heme-heme relative movement
47Academic Programs
- Bachelor's/Master's Program in Bioinformatics
- Related Disciplines
- Chemical Biology
- Computational Chemistry
- Environmental Analysis (Sloan Support)
- Medical Informatics (Sloan Support)
- Advanced Degrees under Development
- Pharmacometrics, Biophotonics
- UB-HWI Department of Structural Biology
- Complementary Degrees
- Canisius College, Niagara University
48Support (2001-2002)
- New York State: $61M
- Federal: $3.1M
- Competitive Grants: $53M
- Proteomics: $1.5M
- Disease Pathogens and Physiology: $27M
- Drug Discovery: $6M
- Genomic and Proteomic Infra.: $1.8M
- Genomics: $4.7M
- Information Technology: $12.3M
- Corporate: $135M
- Foundation: $3.5M
49Confocal Microscopy
- 3D Reconstruction of an Oral Epithelial Cell
- Translucent White Surface Represents the Cell Membrane
- Reddish Surface Represents Groups of Bacteria
50Bioinformatics
- The creation and development of advanced information and computational technologies to solve problems in biology.
- The use of advanced computational resources and techniques to analyze data generated by the Human Genome Project to improve medical treatment.
- Precise sequence of 30K human genes has been mapped
- Critical to elucidate the function of each gene.
- Leads to greater understanding of human development.
- Potential to treat many diseases, including AIDS, cancer, MS, and Alzheimer's, and provide personalized treatment.
- From Human Genome
- Locate genes (tens of thousands in human body)
- Determine what protein a gene regulates (millions of proteins in body)
- Determine structure
- Determine protein function
- Devise drugs to block or enhance protein function
51Children's Hospital CT
- 3D Reconstruction of CT Dataset
- Created with the Visualization Toolkit (VTK) on a Linux Workstation
- 3D Isosurface Clearly Shows Structure that is Nearly Impossible to Determine from 2D Slices
52Miniature Access Surgery
53Molecular Structure Determination
- SnB Software by UB/HWI
- Top Algorithms of the Century
- Critical to Rational Drug Design
- Important Link in Structural Biology
- Current Effort
- Grid
- Collaboratory
- Intelligent Learning
54Animal Models and Preclinical Toxicology
55Antibiotics and Supercomputers
- Vancomycin solved with SnB (UB/HWI)
- SnB: Top Algorithms of the Century
- Antibiotic of Last Resort
- Original molecular structure required 5 months
- (Re)solved in a single day on CCR's supercomputers
- Current Efforts: Grid, Collaboratory, Intelligent Learning
Result: New, better drugs in shorter time
56Photograph of Crystal
57Useful Relationships for Multiple Trial Phasing
Tangent Formula
Parameter Shift Optimization
58Structure of SnB
SnB
Process Trials
Histogram
Visualization
59Vancomycin Crystal Structure Views (courtesy of P. Loll and P. Axelsen)
60Computing Platforms
- Workstations
- SGI, Sun, DEC/Alpha
- Linux
- Parallel Computers
- Cray T3D/E, TMC CM-5, IBM SP2
- HP-Convex Exemplar
- SGI Origin2/3000 Onyx 2/3
- IBM SP heterogeneous
- Linux Clusters
- Sun Cluster
- Condor Flock
- Computational Grid
62Vancomycin Crystal (courtesy of P. Loll)
63The Diffraction Pattern
- Experiment yields
- reflections
- associated intensities
- Phase angles are lost in experiment.
64The Phase Problem
- Experiment yields
- reflections
- associated intensities
- Phase angles are lost in experiment.
- Underlying atomic arrangement is related to the reflections by a 3-D Fourier transform.
- Phase Problem: determine the set of phases corresponding to the reflections.
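A toy illustration of why the lost phases matter, offered as a minimal sketch under stated assumptions: a hypothetical 1-D "crystal" with three equal point atoms, and helper names (`structure_factors`, `density`) that are illustrative and not taken from SnB. The Fourier magnitudes stand in for the measured intensities. A Fourier synthesis with the true phases peaks at an atom, while the same magnitudes with the phases discarded peak at the origin instead.

```python
import cmath
import math

def structure_factors(positions, h_max):
    """F(h) = sum_j exp(2*pi*i*h*x_j) for h = 1..h_max (equal point atoms)."""
    return {h: sum(cmath.exp(2j * math.pi * h * x) for x in positions)
            for h in range(1, h_max + 1)}

def density(magnitudes, phases, grid):
    """Fourier synthesis at `grid` evenly spaced points in [0, 1)."""
    rho = []
    for k in range(grid):
        x = k / grid
        rho.append(sum(magnitudes[h] * math.cos(phases[h] - 2 * math.pi * h * x)
                       for h in magnitudes))
    return rho

atoms = [0.10, 0.35, 0.60]   # hypothetical fractional coordinates
F = structure_factors(atoms, h_max=10)

mags = {h: abs(f) for h, f in F.items()}                  # kept by the experiment
true_phases = {h: cmath.phase(f) for h, f in F.items()}   # lost by the experiment
zero_phases = {h: 0.0 for h in F}                         # phases discarded

grid = 200
with_phases = density(mags, true_phases, grid)
without_phases = density(mags, zero_phases, grid)

# Location of the highest peak in each synthesized map.
peak_true = with_phases.index(max(with_phases)) / grid
peak_zero = without_phases.index(max(without_phases)) / grid
print("highest peak with true phases:", peak_true)
print("highest peak with phases discarded:", peak_zero)
```

In a real structure determination the phases are unknown, which is why trial phases have to be generated and refined rather than read off the data.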
65Extensible TeraGrid Facility (ETF)
[Figure: ETF sites (ANL Visualization, Caltech Data collection analysis, NCSA Compute-Intensive, SDSC Data-Intensive, PSC Heterogeneity) joined by the Extensible Backplane Network via LA and Chicago hubs, with a 30 Gb/s extension to PSC. Figure courtesy of NSF]
Resource labels in the figure:
- 1.25 TF IA-64, 48 Visualization nodes, 20 TB Storage
- 0.4 TF IA-64, IA32 Datawulf, 80 TB Storage
- 0.4 TF EV7 shmem, 50 TB Storage, Storage Server
- 6 TF EV68, 70 TB Storage
- 2.1 TF IA-64, 128 lg-mem nodes, 110 TB Storage
- 8 TF IA-64, 300 TB Storage
- 5 TF IA-64, 300 TB Storage
- 280 TB Storage, DB2 Server, 1.1 TF Power4
66TeraGrid: 13.6 TF, 6.8 TB memory, 79 TB internal disk, 576 TB network disk
[Figure: TeraGrid site and network diagram]
- ANL: 1 TF, 0.25 TB Memory, 25 TB disk
- Caltech: 0.5 TF, 0.4 TB Memory, 86 TB disk
- NCSA: 6+2 TF, 4 TB Memory, 240 TB disk
- SDSC: 4.1 TF, 2 TB Memory, 225 TB SAN
- Chicago and LA DTF Core Switch/Routers: Cisco 65xx Catalyst Switch (256 Gb/s Crossbar), Juniper M160
- External networks: vBNS, Abilene, Calren, ESnet, HSCC, MREN, NTON, Starlight (OC-3, OC-12, OC-12 ATM, OC-48, GbE links)
- Existing systems shown: 574p IA-32 Chiba City, 256p HP X-Class, 128p HP V2500, 92p IA-32, 128p Origin, 15xxp Origin, 1024p IA-32, 320p IA-64, 1176p IBM SP 1.7 TFLOPs Blue Horizon, 2 x Sun E10K, Sun Server
- Storage and interconnect: HPSS, HPSS 300 TB, UniTree, Myrinet
- Other labels: Extreme Blk Diamond, HR Display VR Facilities
67Grids Form the Basis of a National Information
Infrastructure
August 9, 2001: NSF Awarded $53,000,000 to SDSC/NPACI and NCSA/Alliance for TeraGrid
- TeraGrid will provide in aggregate
- 13.6 trillion calculations per second
- Over 600 trillion bytes of immediately accessible data
- 40 gigabit per second network speed
- Provide a new paradigm for data-oriented computing
- Critical for disaster response, genomics, environmental modeling, etc.
68- PIs: Berman, Foster, Messina, Reed, Stevens
- Sites: SDSC/UCSD, Caltech, NCSA/UIUC, ANL
- Partners: IBM, Intel, Qwest, Sun, Myricom, Oracle and others
- Cool Things about the TeraGrid
- Big data, simulation, modeling
- Grid computing, Globus, portals, middleware
- Clusters, Linux
- Usability, impact, production facility
- TeraGrid Software Environment
- Linux
- Basic and Core Globus Services
- Advanced Services
- Data Services
- Over 0.6 Petabytes of on-line disk will provide ultimate environment for data-oriented computation
- Linux environment provides more direct path from development on lab cluster to performance on high-end platform
69Visualization Resources
- Fakespace ImmersaDesk R2
- Portable 3D Device
- VREX VR-4200 Stereo Imaging Projector
- Portable projector works with PC
- Tiled-Display Wall
- 20 NEC projectors, Dell PCs, Myrinet, 15.7M pixels
- Access Grid Node
- Group-to-Group Communication
- Commodity components
- SGI Reality Center 3300W
- Dual Barcos on 8×4 screen
70Status of Grid Services
- Core Grid Services have been Deployed in Large-Scale Testbeds
- Availability of these Services is Enabling Tool and Application Development Projects
- Major Challenges Remain
- Advance reservation, policy, accounting
- End-to-end application adaptation (events?)
- Integration with commodity technology
- Grid Forum: http://www.gridforum.org
71Conventional Direct Methods
72Ph8755 Trace of SnB Solution
Atoms 74
Space Group P1
SnB Cycles 40
73Vancomycin
- Interferes with formation of bacterial walls
- Last line of defense against deadly streptococcal and staphylococcal bacteria strains
- Vancomycin resistance exists (Michigan)
- Can't just synthesize variants and test
- Need structure-based approach to predict
- Solution with SnB (Shake-and-Bake)
- Pat Loll
- George Sheldrick
74End-to-End Factors
- Cross Science Collaboration
- Multiple Physical and Cultural Communities
- Open and New Technologies
- Broadly Accessible
- Flexible and Extensible
- Useful to all Scientists and Engineers
- Contains a Broad Variety of Technologies
75DTF/ETF Driver Applications
- Genomics
- National Virtual Observatory
- National Ecological Observatory Network
- National Earthquake Engineering Simulation
- Neuroscience Imaging
- Laser Interferometer Gravitational Wave
Observatory
76New Results Possible on ETF
- Biomedical Informatics Research Network (BIRN)
- Evolving reference set of brains
- Essential data for developing therapies for neurological disorders (Multiple Sclerosis, Alzheimer's)
- Pre-TeraGrid
- One PET or MRI lab
- Small patient base
- 4 TB collection
- Post-TeraGrid
- Many collaborating labs
- Larger population sample
- 400 TB data collection
- More brains, higher resolution
- Multiple scale data integration and analysis
77Client/Server Configurations
- Three Main Types
- Type 1: Standard Cluster Configuration
- Represents most CCR platforms (same IP subnet)
- Type 2: Firewall Cluster Configuration
- Represents remote firewall protected platforms (different IP subnets)
- Type 3: Heterogeneous OS Cluster Configuration
- Represents different OS architectures combined into one internally IP addressed cluster platform
78Type 1 Configuration
Grid Server
Type 1
- Standard Cluster Configuration
- Grid Server communicates with Relay Server that has a public IP address
- Relay Server communicates with Platform Server that only has an internal IP address
- Platform Server communicates with Node Servers that process Grid Server tasks
- All Nodes are of the same OS architecture (Linux, AIX, Solaris, etc.), but processor class may be different (PIII-Xeon, Pwr2-Pwr3, Ultra3-Ultra2i, etc.)
79Type 2 Configuration
Grid Server
Type 2
- Firewall Cluster Configuration
- Grid Server communicates with Platform Server that only has an internal IP address through SSH tunnels on a firewall
- Platform Server communicates with Node Servers that process Grid Server tasks
- All Nodes are of the same OS architecture (Linux, AIX, Solaris, etc.), but processor class may be different (PIII-Xeon, Pwr2-Pwr3, Ultra3-Ultra2i, etc.)
80Type 3 Configuration
- Heterogeneous OS Cluster Configuration
- Grid Server communicates with Relay Server that has a public IP address
- Relay Server communicates with Platform Server that only has an internal IP address
- Platform Server communicates with Node Servers that process Grid Server tasks
- Nodes can have different OS architecture (Linux-Alpha, etc.), and processor class may also be different (PIII-Xeon, Pwr2-Pwr3, etc.)
Grid Server
Type 3
81Grid Server Console (Vancomycin)
82Load Balancing Stages
- Three Stages of Load Balance Implemented
- Block Queues
- 85% of the Shake-and-Bake trials are distributed based on the Cluster Platform speed
- A block of Shake-and-Bake trials is reserved for each Platform Server in the Grid
- Float
- 15% of the Shake-and-Bake trials are reserved for dynamic load balancing
- Wrapup
- When the Float queue trials have been completed, unfinished trials are distributed to idle Node Servers
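The block/float split can be sketched as follows. This is a minimal illustration, not the CCR implementation: the function name `split_trials` is hypothetical, the per-platform load factors (fractions summing to 1) are assumed to be known already, and sending rounding leftovers to the Float queue is an assumption of this sketch.

```python
def split_trials(total_trials, load_factors, block_fraction=0.85):
    """Split trials into per-platform Block queues plus a shared Float queue.

    load_factors maps platform name -> fraction of the block work (sums to 1).
    Returns (block_sizes, float_size).
    """
    block_total = round(total_trials * block_fraction)
    # Each platform reserves its share of the block trials, rounded down.
    block_sizes = {name: int(block_total * factor)
                   for name, factor in load_factors.items()}
    # Everything not reserved in a block (the remaining ~15% plus rounding
    # leftovers, by assumption) is left for dynamic load balancing.
    float_size = total_trials - sum(block_sizes.values())
    return block_sizes, float_size


# Hypothetical two-platform grid with 1000 trials.
blocks, floating = split_trials(1000, {"fast_cluster": 0.75, "slow_cluster": 0.25})
print(blocks, floating)
```

Keeping a shared Float queue means a platform that finishes its block early keeps working instead of waiting on slower platforms.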
83Grid Server Console (Vancomycin)
84Load Balancing: Block Queue
- Block Queue Determination
- A Platform Server's speed is determined by timing one trial on the cluster platform's node servers
- Grid Server starts the timer when it requests the trial to be done and stops the timer when the trial solution has been received
- A Harmonic Mean is calculated from all of the respective node servers' solution times, which determines the platform speed
- The number of processes for each cluster platform is then used to determine an appropriate platform load factor
- Platform load factor = (# processes + (# processes × (avg platform speed - platform speed) / avg platform speed)) / platform speed adjusted # of processes
- The load factors range from 0 to 1 and the sum of all load factors is 1
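A minimal sketch of the block-queue calculation, assuming the load factor is each platform's speed-adjusted process count divided by the total over all platforms (one reading of the formula above that keeps the factors between 0 and 1 and summing to 1). "Speed" here is the timed seconds per trial, so a smaller value means a faster platform. The function names and the clamp at zero are assumptions of this sketch, not part of the original system.

```python
from statistics import harmonic_mean

def platform_speed(node_times):
    """Harmonic mean of the per-node solution times (seconds) for one trial."""
    return harmonic_mean(node_times)

def load_factors(platforms):
    """platforms maps name -> (number_of_processes, platform_speed_seconds)."""
    avg = sum(speed for _, speed in platforms.values()) / len(platforms)
    adjusted = {}
    for name, (procs, speed) in platforms.items():
        # Platforms faster than average (smaller time) gain effective processes;
        # slower ones lose them. Clamping at zero is an assumption made here so
        # that a very slow platform cannot receive a negative share.
        adjusted[name] = max(0.0, procs + procs * (avg - speed) / avg)
    total = sum(adjusted.values())
    return {name: value / total for name, value in adjusted.items()}


# Hypothetical timings: two platforms with 10 processes each.
factors = load_factors({
    "fast_cluster": (10, platform_speed([1.0, 1.0])),
    "slow_cluster": (10, platform_speed([3.0, 3.0])),
})
print(factors)
```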
85Grid Server Console (Vancomycin)
86Load Balancing: Float Queue
- Float Queue
- 15% of the total trials
- When a platform has no more trials to distribute to its node servers, the Block Queue is set at 100% complete
- This platform is then at Float status and will receive the trial numbers to process directly from the Grid Server one at a time as node servers become idle
- All other platforms that reach this level will also receive their trial numbers directly from the Grid Server until the Float queue has been completed
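The Float stage can be sketched as a shared queue on the Grid Server that hands out one trial number per request, but only to platforms whose Block queue is finished. The class and method names below are hypothetical and the sketch ignores networking entirely.

```python
from collections import deque

class FloatQueue:
    """Shared queue of trial numbers handed out one at a time."""

    def __init__(self, trial_numbers):
        self._trials = deque(trial_numbers)
        self.assigned = {}  # trial number -> platform that received it

    def request(self, platform, block_remaining):
        """Return the next trial number for an idle node, or None.

        A platform only reaches Float status once its own Block queue
        is 100% complete (block_remaining == 0).
        """
        if block_remaining > 0 or not self._trials:
            return None
        trial = self._trials.popleft()
        self.assigned[trial] = platform
        return trial

    @property
    def complete(self):
        return not self._trials


# Hypothetical run: trials 851-853 form the Float queue.
fq = FloatQueue(range(851, 854))
first = fq.request("fast_cluster", block_remaining=5)   # still in Block stage
second = fq.request("fast_cluster", block_remaining=0)
third = fq.request("slow_cluster", block_remaining=0)
fourth = fq.request("fast_cluster", block_remaining=0)
```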
87Grid Server Console (Vancomycin)
88Load Balancing: Wrapup
- Wrapup
- Distributing unfinished trials to idle node servers
- When a platform has completed its Block queue and the Float queue has also been completed, the Grid Server determines which trial results have not been received and distributes them again one at a time
- There are several possibilities for why trials did not complete
- A much slower platform has not yet finished its Block queue trials and is still working on them
- The trial result was lost during transmission
- The platform that processed the trial has gone offline or lost network connectivity without transmitting the trial result
- The Grid Server will continue to distribute all unfinished trials to idle node servers as long as a valid result has not been received
- The first trial result received by the Grid Server is accepted and any subsequent trial results received by the Grid Server are flagged duplicate and ignored
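The wrapup bookkeeping described above can be sketched as follows, with hypothetical names (`WrapupQueue`, `next_trial`, `record`). Unfinished trials are re-issued round-robin until a result arrives, the first result for a trial wins, and later results for the same trial are counted as duplicates and ignored.

```python
from collections import deque

class WrapupQueue:
    """Re-issue unfinished trials and accept only the first result for each."""

    def __init__(self, unfinished_trials):
        self._pending = deque(unfinished_trials)
        self.results = {}
        self.duplicates = 0

    def next_trial(self):
        """Return an unfinished trial for an idle node server, or None."""
        while self._pending:
            trial = self._pending.popleft()
            if trial not in self.results:
                # Keep it in rotation until a valid result is received.
                self._pending.append(trial)
                return trial
        return None

    def record(self, trial, result):
        """Store the first result for a trial; flag later ones as duplicates."""
        if trial in self.results:
            self.duplicates += 1
            return False
        self.results[trial] = result
        return True


# Hypothetical wrapup of three trials whose results never arrived.
wq = WrapupQueue([1, 2, 3])
issued = [wq.next_trial(), wq.next_trial()]          # trials 1 and 2 re-issued
accepted = wq.record(1, "result from platform A")
late_copy = wq.record(1, "result from platform B")   # arrives second
issued += [wq.next_trial(), wq.next_trial()]         # 3, then 2 again (1 is done)
wq.record(2, "r2")
wq.record(3, "r3")
```

Because every idle node server can be handed any unfinished trial, this stage keeps making progress as long as at least one node server is still responding.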
89Grid Server Console (Vancomycin)
90Load Balancing: Fault Tolerance
- Fault Tolerance
- A platform has gone offline or lost network connectivity
- The grid-enabled SnB implementation is extremely fault tolerant
- The requested SnB trial results will be completed automatically even if all but one platform fail
- One node server has the ability to
- Complete its platform Block Queue
- Process the Float Queue
- Process all other failed platform Block Queues
- Process failed trials from its own platform Block Queue
- Automatically!
91Grid Server Console (Vancomycin)
92Grid Server Console (ILED)
93 Buffalo Center of Excellence in Bioinformatics
- Act as a research, development, education, and economic resource for industries based on bioinformatics, including information technology, biotech, and pharmaceuticals.
- Combine state-of-the-art computational facilities with high-throughput experimental facilities to enable the development of new medical treatments.
- Develop and exploit new algorithms for data
acquisition, storage, management, and
transmission.
94Life Sciences Complex (Buffalo-Niagara Medical Campus)
Training interspersed throughout 3 buildings
HWI: $20M to replace old building (not shown)
- UB: $52M CoE in Bioinformatics
- Research and business partners
- 225 employees and business associates
- 150,000 sq ft: 50% labs, 50% computational facilities
- RPCI: $60M Pharmacology/Genetics
- 60 PIs and 200 support staff
- 170,000 sq ft: 85% rsrch labs, 15% spprt