Title: HPC Asia 2001
1The Computing Continuum
- HPC Asia 2001
- Gold Coast, Queensland
- Australia
- Sid Karin
- skarin_at_sdsc.edu
2The thing about change is that things will be
different afterwards. Alan McMahon
ANYTHING
You Are Here
TIME
3Internet Hosts (000s) 1989-2006
Courtesy Vint Cerf, MCI WorldCom
300 Million users
4Peak Speed of the World's Fastest Supercomputers
Log Peak speed (flops)
Year installed
5Things Are About to Get Very Interesting
6A Continuum is Emerging
- All information is becoming digital
- Computing is becoming
- Ubiquitous used everywhere
- Continuous interconnected
- Pervasive invisible to the user
7The Continuum Has Many Dimensions
- Performance
- Collaboration
- Integration
- Location
- Function
8 1976: The Supercomputing Island
Performance
Today: A Continuum
NUMBER OF MACHINES
PERFORMANCE
9Human Performance is Finite
Performance
- An individual can absorb about one gigabyte of
information per second (1 GB/s).
- Most of this information is visual.
10Human Performance is Constant
Performance
HUMAN PERFORMANCE
TIME
11The Continuum Has Many Dimensions
- Performance
- Collaboration
- Integration
- Location
- Function
12Collaboration
A New Direction
In the past: Isolation
Now: Collaboration
13Assumption
- Any reasonably educated scientist or engineer can
write a useful (Fortran) program and use it
effectively.
Reality
Teams.
14The Continuum Has Many Dimensions
- Performance
- Collaboration
- Integration
- Location
- Function
15Low-End Computing...
Integration
16High-end Computing
Integration
17Supercomputers Give Us an Early View of the Mass
Market Future
18UMASS Web server on a chip: born 10 AM, 14 July
1999
- TCP/IP code itself fits in about 256 bytes
(12-bit)
- PIC 12C509A, running at 4 MHz
- 24LC256 i2c EEPROM
- HTTP 1.0 and RFC 1122 compliant
- eternity.cs.umass.edu
- 9080/index0.html
Courtesy Vint Cerf, MCI WorldCom
19Gradually Our Bodies Will Move On-Line
Israeli Video Pill Wireless Colonoscopy
21The Wireless Internet Will Improve the Safety of
California's 25,000 Bridges
New Bay Bridge Tower with Lateral Shear Links
Cal-(IT)2 Will Develop and Install Wireless
Sensor Arrays Linking Orange and San Diego County
Bridges to Crisis Management Control Rooms
Combined Efforts of UCI and UCSD
22New Mode of Visualization
- Network-accessible TeleManufacturing
- 3-D hardcopy for visualization
- Used by many disciplines
- Molecules to Hurricanes
- Death Valley to Venus
- Riemann Zeta Function to Ozone Hole
23Changing How Science is Done
- Collect data from digital libraries,
laboratories, and observation
- Analyze the data with models run on the grid
- Visualize and share data over the Web
- Publish results in a digital library
24The Computational Science Continuum
Data-intensive computing (mining)
Experiment
Data-intensive computing (assimilation)
Numerically intensive computing
Simulation
25The Continuum Has Many Dimensions
- Performance
- Collaboration
- Integration
- Location
- Function
26HPWREN implementation and agenda, Jan 2001
Santa Margarita Ecological Reserve
Sky Oaks ecological Field Stations
Pala Reservation
Palomar Mtn.
Pala Mtn.
Rincon Res
La Jolla Reservation
Various Earthquake Sensors
North Peak
HWB
Mt. Woodson
DB
Stephenson Peak
MLO/SDSU Observatory
UCSD
27Mapping the Net's Terra Incognita
Nature Web Matters, 1/7/99. Science 10/16/98
28 Gilder's vs. Moore's Law
[Chart: log growth over time. WAN/MAN bandwidth doubles every 3-6 months; processor performance doubles every 18 months. Axis ticks: 100, 10,000, 1M.]
Courtesy Greg Papadopoulos, Sun Microsystems
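The two doubling rates on this chart (2x every 3-6 months for WAN/MAN bandwidth, 2x every 18 months for processor performance) compound very differently. A minimal sketch of that arithmetic follows; the three-year horizon and the function name are illustrative assumptions, not values from the slide.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Multiplicative growth after `years` for a quantity that
    doubles every `doubling_months` months."""
    return 2.0 ** (years * 12.0 / doubling_months)


YEARS = 3  # illustrative horizon (assumption, not from the slide)

processor = growth_factor(YEARS, 18)       # 2x every 18 months
bandwidth_slow = growth_factor(YEARS, 6)   # 2x every 6 months
bandwidth_fast = growth_factor(YEARS, 3)   # 2x every 3 months

print(f"Processor performance after {YEARS} years: {processor:.0f}x")
print(f"Bandwidth after {YEARS} years: {bandwidth_slow:.0f}x to {bandwidth_fast:.0f}x")
```

Under these assumptions the gap between the two curves widens from a factor of 16 to a factor of 1024 within three years, which is the point the log-scale chart makes visually.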
29The Continuum Has Many Dimensions
- Performance
- Collaboration
- Integration
- Location
- Function
30From Entertainment to Science
Function
- Entertainment
- Education
- Communication
- Public Policy
- Business
- Engineering
- Science
31Courtesy Greg Papadopoulos, Sun Microsystems
32Here is an interesting system...
- >40,000 Processors
- >4 Terabytes of RAM
- >1 million simultaneous users
- 7x24 global operation
- >$1B/yr operating expense
What is it?
Courtesy Greg Papadopoulos, Sun Microsystems
33Answer
Courtesy Greg Papadopoulos, Sun Microsystems
34Protein Folding Landscapes in a Distributed
Environment
- Andrew Grimshaw - UVA
- Katherine Holcomb - UVA
- Anand Natrajan - UVA
- Charles Brooks III - TSRI
- Mike Crowley - TSRI
NPACI-NET - A Persistent Infrastructure
35Computing on the Grid: NPACI-net CHARMM-Legion
- One month calculation reduced to 36 hours
- IBM Blue Horizon (SDSC) on 512 processors
- HP V2500 (Caltech) on all 128 processors
- Sun Enterprise 10000 (SDSC) 32 processors
- IBM systems (Michigan and Texas) totaling 56
processors
- Centurion Alpha (Virginia) 32 processors
"Back at my PC user interface, I didn't know what
architecture my job was running on" Mike
Crowley, TSRI
Protein L Folded According to CHARMM
Protein L Unfolded
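The figures on this slide can be tallied with a few lines of arithmetic. This is only a sketch: the 30-day month is an assumption, and the slide does not say what machine the one-month baseline ran on, so the ratio below is a reduction in wall-clock time, not a parallel-efficiency measurement.

```python
# Processor counts as listed on the slide.
processors = {
    "IBM Blue Horizon (SDSC)": 512,
    "HP V2500 (Caltech)": 128,
    "Sun Enterprise 10000 (SDSC)": 32,
    "IBM systems (Michigan and Texas)": 56,
    "Centurion Alpha (Virginia)": 32,
}
total_processors = sum(processors.values())

HOURS_PER_MONTH = 30 * 24  # assumption: "one month" taken as 30 days
GRID_HOURS = 36            # from the slide

time_reduction = HOURS_PER_MONTH / GRID_HOURS

print(f"Total processors across sites: {total_processors}")
print(f"Wall-clock reduction: about {time_reduction:.0f}x")
```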
36TELESCIENCE: Remote Access for Grid-based
Computing, Data Acquisition, Distributed Data
Storage
Mark Ellisman, UCSD; Carl Kesselman, USC; Fran
Berman, UCSD; Rich Wolski, U.Tenn; Project
Manager: Steve Peltier, UCSD
[Diagram labels: Data Acquisition; Advanced Visualization, Analysis; Network; Computational Resources; Imaging Instruments; Large-Scale Databases]
37Monte Carlo Cellular Microphysiology on the Grid
Berman, Casanova (UCSD); Sejnowski, Bartol
(Salk); Dongarra, Wolski (UTenn); Ellisman (UCSD)
38The Protein Data Bank
- World's single scientific resource for depositing
and searching protein structures
- Protein structure data growing very fast
- 14,700 structures in PDB today
- est. 35,000 by the year 2005
- Vital to the advancement of biological sciences;
funding from NSF, DOE, NIH
- Working towards a digital continuum from primary
data to final scientific publication
- Rutgers (Helen Berman), SDSC (Phil Bourne), and
NIST
From Ban, et al., Science, Aug 11 2000,
905-920: The large ribosomal subunit of the cell's
protein-producing factory
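The growth estimate on this slide (14,700 structures today, an estimated 35,000 by 2005) implies a compound annual growth rate. A minimal sketch of that calculation follows; taking "today" to mean 2001, the year of the talk, is an assumption.

```python
START_YEAR, START_COUNT = 2001, 14_700  # "today"; the year is an assumption
END_YEAR, END_COUNT = 2005, 35_000      # estimate quoted on the slide

years = END_YEAR - START_YEAR
annual_growth = (END_COUNT / START_COUNT) ** (1 / years) - 1

print(f"Implied growth: about {annual_growth:.0%} per year over {years} years")
```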
39New Approaches to Understanding Protein Folding
- New algorithms for protein-protein 3D comparison
- 24,000 Cray T3E hours to compute all x all
- New understanding of protein structure space
- Recognition of residues key to the folding
process
- Implications for protein engineering and
synthesis
40Web Interface to Grid Computing: The NPACI
GridPort Architecture
802.11 Wireless
- Interactive Access to
- State of Computer
- Job Status
- Application Codes
41MICE: Transparent Supercomputing
- Molecular Interactive Collaborative Environment
- Gallery allows researchers, students to search
for, visualize, and manipulate molecular
structures
- Integrates key SDSC technological strengths
- Biological databases
- Transparent supercomputing
- Web-based Virtual Reality Modeling Language
42Alliance for Cellular Signaling
www.cellularsignaling.org
"The glue grants is an exemplar flagship
initiative that is refocusing the approach to
biology research," says Shankar Subramaniam
- Goal: Understand the relationships between sets
of inputs and outputs in signaling cells that
vary both temporally and spatially and how cells
interpret signals in a context-dependent manner
- NIH and others: grant of $60M over 5 years
- Lead: Al Gilman, U Texas SW Medical Center
- Director of two (of eight) laboratories: Shankar
Subramaniam, SDSC/UCSD
- Data Acquisition and Dissemination
- Bioinformatics
"Are you ready for the revolution?" Nature, 2/15/01
"Shape of things to come: large data sets arising
from genome projects demand new skills of
biologists."
43Brain Mapping
- One brain: a lot of data
- At full-color, micrometer resolution, one brain
fills 4.5 petabytes.
- Mapping the brain will help understand memory,
consciousness, sleep, aging
- Insight about brain structure-function
relationships in health and disease
- Federating geographically distributed collections
and tools for data exploration, comparison, and
simulation
- Involves all NPACI technology thrusts.
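The 4.5 petabyte figure can be reproduced with a back-of-envelope estimate. The sketch below is one set of assumptions that yields that number, not necessarily how the slide's figure was derived: a brain volume of 1,500 cubic centimeters and 3 bytes (24-bit color) per cubic-micrometer voxel are both assumptions.

```python
BRAIN_VOLUME_CM3 = 1500    # assumption: approximate adult brain volume
BYTES_PER_VOXEL = 3        # assumption: 24-bit "full color"
UM3_PER_CM3 = 10_000 ** 3  # 1 cm = 10,000 micrometers


def brain_bytes(resolution_um: float) -> float:
    """Storage for one brain sampled on a cubic grid of the given spacing."""
    voxels = BRAIN_VOLUME_CM3 * UM3_PER_CM3 / resolution_um ** 3
    return voxels * BYTES_PER_VOXEL


petabytes_at_1um = brain_bytes(1) / 1e15
terabytes_at_10um = brain_bytes(10) / 1e12

print(f"1 micrometer resolution: {petabytes_at_1um:.1f} PB")
print(f"10 micrometer resolution: {terabytes_at_10um:.1f} TB")
```

Because storage scales with the cube of the resolution, moving from 10 micrometers to 1 micrometer multiplies the data volume by 1,000.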
44Biological-scale Modeling
- Biodiversity Species Workshop
- Web interface to modeling tools
- Provides geographic, climate, and other base data
- Species Analyst
- Compiles data from on-line museum collections
- Scientists can focus on the scientific questions
- NPACI partners: U Kansas, U New Mexico, and
SDSC
Predicted distribution of the mountain trogon.
Data points (pins) are from 14 museum
collections.
45Looking out for San Diego's Regional Ecology
- Unique partnership
- 31 federal, state, regional, and local agencies
- John Helly, et al., SDSC
- Combines technologies and multi-agency data
- Sensing, analysis, VRML
- Physical, chemical, and biological data
- Web-based tool for science and public policy
46Digital Galaxy
- Collaboration with Hayden Planetarium
- American Museum of Natural History
- Support from NASA
- MPIRE Galaxy Renderer
- Scalable volume visualization
- Linked to database of astronomical objects
- Produces translucent, filament-like objects
- An artificial nebula, modeled after a planetary
nebula
Viewing the Orion Nebula
47The Digital Sky
- Billions of objects can be detected with optical,
infrared, and radio telescopes
- Tens of terabytes of image and catalog data
- Digital Sky federating four sky surveys to allow
multi-wavelength studies across the data sets
- DPOSS, 2MASS, NVSS, FIRST
- Tom Prince, Caltech, leading federation effort
- Uses MIX, SDSC SRB, and NPACI mass storage systems
A globular cluster from the DPOSS archive. Such
clusters provide a minimum age for the universe.
Image by Thomas Handley, Caltech.
48August 9, 2001: NSF Awards $53,000,000 to NPACI
and the Alliance for TeraGrid
49TeraGrid and the PACI Program
- TeraGrid will form a major focus for PACI program
activities
- Considerable infrastructure investment required
- HW
- SW
- People
- Infrastructure must be persistent
- It takes years to assemble and integrate the
components
- Development of a National Grid must be a
partnership between the research community and
the funding leadership
TeraGrid will fundamentally change the way
large-scale science is done
50TeraGrid
PIs: Berman/SDSC, Reed/NCSA
Partners: IBM, Intel, Qwest, Myricom, Sun, and
others
- Key foci of the TeraGrid
- Big data, simulation, modeling
- Grid computing, Globus, portals, middleware
- Clusters, Linux
- Usability, impact, production facility
51TeraGrid: 13.6 TF, 6.8 TB memory, 79 TB internal
disk, 576 TB network disk
[Diagram: TeraGrid site and network layout]
Sites:
- ANL: 1 TF, 0.25 TB memory, 25 TB disk
- Caltech: 0.5 TF, 0.4 TB memory, 86 TB disk
- NCSA: 8 TF, 4 TB memory, 240 TB disk
- SDSC: 4.1 TF, 2 TB memory, 225 TB SAN
Systems shown: 574p IA-32 Chiba City; 256p HP X-Class; 128p Origin; 128p HP V2500; 92p IA-32; 1024p IA-32 and 320p IA-64; 1176p IBM SP 1.7 TFLOPs Blue Horizon; 15xxp Origin; Sun E10K; Sun Server; HR Display and VR Facilities
Storage: HPSS, UniTree
Switching and interconnect: Chicago and LA DTF Core Switch/Routers; Cisco 65xx Catalyst Switch (256 Gb/s Crossbar); Juniper M160; Extreme Blk Diamond; Myrinet
Link types: OC-48, OC-12, OC-12 ATM, OC-3, GbE
External networks: NTON, Calren, ESnet, HSCC, MREN/Abilene, Starlight, vBNS, Abilene
52SDSC node configured to be best site for
data-oriented computing in the world
Argonne 1 TF 0.25 TB Memory 25 TB disk
Caltech 0.5 TF 0.4 TB Memory 86 TB disk
TeraGrid Backbone (40 Gbps)
vBNS Abilene Calren ESnet
NCSA 8 TF 4 TB Memory 240 TB disk
HPSS
Myrinet Clos Spine
Sun
SDSC 4.1 TFLOPS 2 TB Memory 25 TB internal
disk 225 TB network disk
Blue Horizon
Sun E10K
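The per-site figures quoted on the TeraGrid slides can be summed and compared with the headline totals. The sketch below only adds up the numbers as they appear on the slides; the dictionary layout and key names are illustrative.

```python
# Per-site figures as quoted on the slides.
sites = {
    "ANL/Argonne": {"tf": 1.0, "mem_tb": 0.25, "disk_tb": 25},
    "Caltech":     {"tf": 0.5, "mem_tb": 0.4,  "disk_tb": 86},
    "NCSA":        {"tf": 8.0, "mem_tb": 4.0,  "disk_tb": 240},
    "SDSC":        {"tf": 4.1, "mem_tb": 2.0,  "disk_tb": 225},
}

total_tf = sum(s["tf"] for s in sites.values())
total_mem_tb = sum(s["mem_tb"] for s in sites.values())
total_disk_tb = sum(s["disk_tb"] for s in sites.values())

print(f"Peak: {total_tf:.1f} TF")
print(f"Memory: {total_mem_tb:.2f} TB")
print(f"Disk: {total_disk_tb} TB")
```

The peak and disk sums agree with the 13.6 TF and 576 figures in the slide 51 title. The memory sum comes to 6.65 TB against a headline of 6.8 TB; the slides do not explain the difference.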
53New Results Possible on TeraGrid
- Biomedical Informatics Research Network, BIRN
(NIH Proposal)
- Evolving reference set of brains provides
essential data for developing therapies for
neurological disorders (Multiple Sclerosis,
Alzheimer's disease).
- Pre-TeraGrid
- One lab
- Small patient base
- 4 TB collection
- Post-TeraGrid
- Tens of collaborating labs
- Larger population sample
- 400 TB data collection: more brains, higher
resolution
- Multiple scale data integration and analysis
54EACH BRAIN REPRESENTS A LOT OF DATA
AND COMPARISONS MUST BE MADE BETWEEN MANY
We need to get to one micron to know the location of
every cell. We're just now starting to get to
10 microns; the TeraGrid will help get us there
and further.
55Growing the TeraGrid to a PetaGrid
EU Grid, NASA IPG, Data Grid, Science Grid, PACI
Grids, Pacific Rim Grids
Sensor nets, wireless throwaway end
devices, personal digital assistants
56The Pacific Rim Initiative: A First Step Towards
the PetaGrid
- Today, SDSC is delighted to announce a new
initiative among Pacific Rim countries for the
purpose of developing a formal Grid
collaboration.
- The SDSC Pacific Rim Initiative will assist the
US, Japan, and other Pacific Rim countries to:
- Co-develop Grid-enabled applications
- Deploy needed infrastructure to allow data,
computing and other resource sharing
57SDSC/NPACI provides a US anchor for international
relationships on the Pacific Rim
- The San Diego Supercomputer Center is a national
facility whose goal is to promote research,
development, and infrastructure at the cutting
edge of computational science and computer
science.
- The National Partnership for Advanced
Computational Infrastructure (NPACI) is SDSC's
largest grant, which supports the development of
cutting-edge infrastructure for computational
science.
- SDSC is the Leading Edge Site for the national
NPACI partnership
58Our Newest Pacific Rim Partner: Titech/GSIC
- We are honored to have signed a Memorandum of
Understanding between Titech/GSIC and SDSC
formalizing our collaboration in the area of
grid-based and distributed computing, including:
- development of applications and systems software
- access to facilities for evaluating systems
- joint education, training, and workshops
59The Pacific Rim Initiative will enable scientific
discoveries on a world-wide scale
We have only just begun to imagine what is
possible
60This is just the beginning...
ANYTHING
You Are Here
TIME