Title: Introduction to SDSC and NPACI
1 Introduction to SDSC and NPACI
- NSF visit
- May 23, 2002
- Dr. Fran Berman
- Director, NPACI and SDSC
- Professor, CSE Department, UCSD
2 Welcome to SDSC and NPACI
- SDSC is the Leading Edge Site for the National
Partnership for Advanced Computational
Infrastructure (NPACI)
- UCSD's San Diego Supercomputer Center is a
national treasure whose mission is to develop
and use technology to advance science.
3 Information Infrastructure is a First-Class Tool
for Science Today
4 Science as a team sport
- Large-scale, high-end science in America today is
largely collaborative
- Both technical and social engineering challenges
- Information technology has become a fundamental
tool for high-end science
5 The leading edge has become increasingly more
complex
- Current and future information infrastructure
will capitalize on HPC technology trends
- Proliferation of resources (everyone has
computers)
- Success of the Internet (remote access, portals,
virtualization)
- Globalization
- Open source movement
- Data and large-scale apps as drivers
1969
6 Computing Today: NSF's Cyberinfrastructure
7 Science that Makes a Difference
- SDSC's mission is to develop and use technology
to advance science.
- NPACI enables the scale and synergy required to
develop the next generation of advances in
technology, science, and engineering.
Broad Impact
Individual Impact
Deep Impact
8 Focus for activities: SDSC Programs
Networking
9 New Networking Program
10 CISE Programs at SDSC
- ACIR
- NPACI (Berman, 170M)
- TeraGrid (Berman, 53M total)
- Virtual Instruments ITR (Berman, 2.5M)
- GriPhyN ITR (Moore, 500k)
- EIA
- I2T/Dig. Gov (Baru, 700k)
- Web-Based IT/Dig Gov (Gupta, 275k)
- Bio-QuBIC (Gupta, 112k)
- ANIR
- CAIDA (Claffy, 3.2M)
- NLANR/MOAT (Braun, 3M)
- HPWREN (Braun, 2.3M)
- IEC Repository (Claffy, 700k)
- NMI GRIDS Center (Papadopoulos, 600k)
- IIS
- GeoGrids ITR (Baru, 500k)
- InterLib (Moore, 250k)
- 3D Knowledge KDI (Bailey, 73k)
SDSC PI
SDSC co-PI or sub
11 Other NSF Projects at SDSC
- BIO/DBI
- Protein Data Bank (Bourne, 4M)
- CCMS (Ten Eyck, 1M)
- Biology Workbench (Subramaniam, 700k)
- Structure/Mutation DB (Bourne, 550k)
- QM Bio Framework (Baldridge/Bourne, 500k)
- Support for Plant Genome Program (Gribskov,
100k)
- Plant Phosphorylation (Gribskov, 941k)
- Biomolecular Database for Education (Bourne,
188k)
- Palmer LTER Info Mgt (100k)
- BIO/Other
- LTER Network Office (270k)
- Knowledge Network for Biocomplexity KDI (728k)
- DLI/NSDL/UCSB (500k)
- NSF/Other
- PRAGMA 2002-03 (SBE/INT, 104k)
- Wireless Networks/Env. Mgt (ITR/OCE, 1.75M)
- SCEC (ITR/GEO, 10M total)
- NVO (ITR/AST, 10M total)
- iVDGL (ITR/PHY, 13.6M total)
- IPBIR (440k)
- Bridging Dig Lib/Archives (ITR/EHR, 300k)
SDSC PI
SDSC co-PI or sub
12 Other Agency Funding at SDSC
- NIH
- National Biomedical Computation Resource
- Protein Data Bank
- Alliance for Cell Signaling
- Biomedical Informatics Research Network
- Joint Center for Structural Genomics
- Human Embryo Digital Library
- NARA, NHPRC
- Library of Congress
- NASA
- DARPA
- DOE
- DOD HPCMP
- HHS
- NSA
- Industry Partners
13 Partnerships with UC help integrate SDSC locally
and regionally
14 NPACI: A National Partnership
- NPACI is developing the human, software, data,
networking and computational infrastructure to
address critical science and technology needs
- 48 Partner Institutions
- 21 states: CA, TX, MI, MD, KN, NM, MS, MT, NY,
OH, OR, NJ, TN, VA, MO, WI, CO, AZ, MA, WA, PA
- Italy, Australia, Spain, Sweden
- Hundreds of participating users, developers,
scientists, students
15 Meeting the Goals of 1996 NPACI Proposal
- Create a distributed, national metacomputing
infrastructure to benefit the national community
in science and engineering
- Infrastructure will integrate data and
computation
- Deploy Teraflops-scale resources to solve
problems at the forefront of numerically
intensive computational science and engineering
- Extend metacomputing environment to enable
data-intensive computing
- Integrate computational science and engineering
education activities into the infrastructure
- Pursue collaborative projects to advance
computing technology with vendors, including the
earth resources and automotive industries
16 NPACI Is Organized by Thrusts: Goal of all
thrusts is to develop and deploy infrastructure
EOT
APPLICATIONS: Molecular Science, Neuroscience, Earth Systems
Science, Engineering
TECHNOLOGIES: Grid Computing, Programming Tools and
Environments, Data-Intensive Computing, Interaction
Environments
RESOURCES
17 Deep Impact
- First-principles simulations to improve materials
in lithium batteries
- Ceder (MIT), Physical Review B
- Potential to make cell phone, laptop, other
batteries lighter, more powerful
- Microtubule properties: two orders of magnitude
increase in atoms simulated
- McCammon, et al., UCSD, PNAS
- Largest simulations to date of colliding galaxies
-- 1.25 million interacting particles
- Hernquist (Harvard-Smithsonian), Dubinski (U
Toronto), Astrophysical Journal
18 Broad-based Impact -- Access
Selected NPACI Software Deployment
NPACI HPC Resources
More than 800 new user logins created in the last year
More than 1200 users used allocated time in the last year
More than 7,200 total publications and theses citing
NPACI/SDSC resources
More than 14.6M hits to NPACI and SDSC web servers on
average per month
More than 9.5M service units allocated in the last year
NPACI Archives
More than 600 Terabytes of Data at Resource Partners
Sample data collections: Digital Sky, Human Brain
Project, Biology Workbench, LIGO, ESA, PDB/JCSG,
Digital Embryo, BIRN, NARA, CDL, NSDL
DataCutter
Almost every Grid initiative is using
NPACI-supported technologies including BIRN,
NVO, SCEC, GriPhyN, iVDGL, MCell, TeraGrid
19 Individual Impact
- Girls are Great!
- Collaboration with Girl Scouts
- Begun in San Diego, duplicated in Houston
- Has reached 10,000 girls in two cities
- In just FY01, EOT-PACI partners
- Reached 4,000 K-16 students, 1,400 K-12 teachers,
1,000 govt employees, MSI faculty and
researchers
- Hosted and sponsored 55 workshops
- Created 60 online resources
- Authored and featured in 25 publications
- Made more than 100 conference presentations
- Implemented 3 undergrad courses
20 NPACI Resources are distributed
U Michigan
UC Berkeley
Caltech
SDSC
U Texas
21 NPACI Vision: Software Roadmap
- Goal: Deliver SW which is usable and useful, and
promotes maximal use of the underlying resources
- SW development efforts focus on
- Minimizing the impact of heterogeneity (e.g. SRB,
NWS, Globus)
- Coordination across remote resources (e.g. APST,
NetSolve, Globus)
- Usability (e.g. NPACI Rocks, mySRB, GridPort,
HotPage)
- Greater access to high-end and geographically
distributed resources (e.g. TeraGrid, portals,
NPACI HotPage, GridPort)
- Maximizing the impact of the resources (e.g.,
NMI, Catalina scheduler, Scalable Vis Toolkits)
22 NPACI Grid modeled after evolving Community Grid
Model
- Roll your own software but agree on interfaces,
service architecture, standards
23 The Beginning of Cyberinfrastructure: PACI and
TeraGrid
August 9, 2001: NSF awarded $53,000,000 to
SDSC/NPACI and NCSA/Alliance for TeraGrid
- TeraGrid will provide in aggregate
- 13.6 trillion calculations per second
- Over 600 trillion bytes of immediately accessible
data
- 40 gigabit per second network speed
- TeraGrid will provide a new paradigm for
data-oriented computing
- Critical for disaster response, genomics,
environmental modeling,
24 TeraGrid will be the first high-end
production-level grid
TeraGrid will enable new application paradigms
- TeraGrid Software Environment
- Linux
- Basic and Core Globus Services
- Advanced Services
- Data Services
- Over 0.6 petabytes of on-line disk will provide
ultimate environment for data-oriented
computation
- Linux environment provides more direct path from
development on lab cluster to performance on
high-end platform
25 New Science Advances with TeraGrid
- Biomedical Informatics Research Network (BIRN)
- Evolving reference set of brains provides
essential data for developing therapies for
neurological disorders (Multiple Sclerosis,
Alzheimer's disease).
- NIH building on NSF investment in NPACI DICE,
Interaction Environments, Neuroscience, and Grid
activities
- Pre-TeraGrid
- One lab, small patient base
- 4 TB collection
- With TeraGrid
- Tens of collaborating labs
- Larger population sample
- 400 TB data collection: more brains, higher
resolution
- Multiple-scale data integration, analysis
26 Evaluating NPACI
27 Requirements for Successful Infrastructure
- Large-scale: inclusive of current programs,
community-wide efforts, globalization
- Usable: stable, persistent, dependable,
accessible
- Evolutionary
- Must allow for evolution of user requirements,
software infrastructure
- Must allow for graceful addition of new hardware
and resources
- Must allow for new users, new paradigms and new
communities
- Realistic with respect to costs
- Cost of human infrastructure (development of
community of professionals with multidisciplinary
science and large-scale infrastructure
development skills), cost of adding and
supporting new sites and users, cost of
developing portals and interfaces, etc.
28 Major Advantages of SDSC/NPACI for NSF
- SDSC/NPACI enable the scale and synergy required
to develop the next generation of advances in
technology, science, and engineering.
- NSF/SDSC investment in large-scale infrastructure
has enabled leadership and world-class programs
in Biology, Data-oriented Computing, etc.
- SDSC is a mature resource -- would take many
years to build a leading facility of this size
and capability from scratch
- SDSC provides a venue for large-scale projects
which cannot be done in traditional academic
departmental environments
- E.g. Protein Data Bank, Cooperative Association
for Internet Data Analysis, Alliance for Cell
Signaling, etc.
- NSF/NPACI facilities provide a magnet for the
world's leading computational scientists and
technologists
29 Challenges to Building and Deploying Information
Infrastructure (1)
- Technology Challenges
- Cutting-edge hardware and Grid software are not ready
out of the box; a production facility is time- and
labor-intensive
- Resource heterogeneity provides challenges for
design, deployment, interoperability, performance
- Policy Challenges
- NRAC allocations focus on single-system cycles.
Need to allocate data and Grid access, as well as
access for global collaborations
- Difficult to promote big jobs, interactive usage,
grid usage, and data-oriented usage
simultaneously
- Usability Challenges
- Last mile efforts to develop usable software are
critical but labor-intensive; researchers and
students are not rewarded for this
- New users, communities, and requirements must be
accommodated by infrastructure
30 Challenges in Building and Deploying Information
Infrastructure (2)
- Fiscal Challenges
- Staff support critical for
- Continuity, integration, scaling of
infrastructure
- Development of community codes,
portals, services, tools
- User support
- However, budget has essentially remained flat
(effectively a reduction for staff)
- High-performance equipment needs high-performance
support
31 Major Opportunities for PACI
- NSF has an opportunity to build a world-class
cyberinfrastructure that puts the US in a
leadership position for the next decade and more
- This is at a time when the science and
engineering community requires information
infrastructure for the next generation of
advances
- This is at a time when the private sector can
help accelerate the development and integration
of grid technologies
- This is happening at a time when the US is
leading world efforts in development of
information infrastructure
32 Major Risks for PACI
- Being too timid: cyberinfrastructure needs
commitment and resources to fully flower
- Ignoring the research side: many problems beyond
integration
- Letting other countries get ahead of the
technology curve, e.g. Japan
- Overselling the vision without sufficient support
to back it up
- Instability risks quality: hard to retain good
people when budget stays flat and future is
uncertain
33 Impact Being Recognized
- "Today I can tell you that we're on the verge of
a major technological breakthrough for the
long-term preservation of computer generated
records of the Federal Government.
Research-and-development work done for us by the
San Diego Supercomputer Center indicates that a
practical Electronic Records Archives may be in
sight."
- John Carlin, Archivist of the United States,
National Archives and Records Administration
- Presentation to Subcommittee on Management,
Information, and Technology, Committee on
Government Reform, 28 March 2000
34 Overview of the day
- 8:30-8:40am Welcome to UCSD -- UCSD Administration
- 8:40-9:40 Introduction to NPACI and SDSC -- Fran Berman
- 9:40-10:00 Overview of NPACI Program -- Richard Moore
- 10:00-10:30 Tour of Visualization Lab (and break) -- Mike Bailey et al.
- 10:30-11:00 NPACI Hardware Resources -- Wayne Pfeiffer
- 11:00-11:35 NPACI Software Roadmap -- Carl Kesselman
- 11:35am-noon Resource Allocation: Users, Process, Metrics -- Nancy Wilkins-Diehr
- 12:00-12:15pm Lunch
- 12:15-12:45 Question and Answer Session (SDSC Auditorium)
- 12:45-1:30 Tour of Machine Room -- Phil Andrews
- 1:30-1:55 Education, Outreach and Training -- Rozeanne Steckler, Kris Stewart
- 1:55-2:20 NPACI Technology, Applications, Alpha Projects -- Peter Taylor
- 2:20-3:10 Examples of Key Scientific Accomplishments
- 2:20 Protein Data Bank -- Phil Bourne
- 2:35 Dynamics of Molecular Recognition -- Andrew McCammon
- 2:50 Storage Resource Broker Related -- Reagan Moore
- 3:00 Information Integration in the Sciences -- Chaitan Baru