Introduction to SDSC and NPACI - PowerPoint PPT Presentation

Provided by: Andr246

1
Introduction to SDSC and NPACI
  • NSF visit
  • May 23, 2002
  • Dr. Fran Berman
  • Director, NPACI and SDSC
  • Professor, CSE Department, UCSD

2
Welcome to SDSC and NPACI
  • SDSC is the Leading Edge Site for the National
    Partnership for Advanced Computational
    Infrastructure (NPACI)
  • UCSD's San Diego Supercomputer Center is a
    national treasure whose mission is to develop
    and use technology to advance science.

3
Information Infrastructure is a First-Class Tool
for Science Today
4
Science as a team sport
  • Large-scale, high-end science in America today is
    largely collaborative
  • Both technical and social engineering challenges
  • Information technology has become a fundamental
    tool for high-end science

5
The leading edge has become increasingly complex
  • Current and future information infrastructure
    will capitalize on HPC technology trends
  • Proliferation of resources (everyone has
    computers)
  • Success of the Internet (remote access, portals,
    virtualization)
  • Globalization
  • Open source movement
  • Data and large-scale apps as drivers

6
Computing Today: NSF's Cyberinfrastructure
7
Science that Makes a Difference
  • SDSC's mission is to develop and use technology
    to advance science.
  • NPACI enables the scale and synergy required to
    develop the next generation of advances in
    technology, science, and engineering.

Broad Impact
Individual Impact
Deep Impact
8
Focus for activities: SDSC Programs
Networking
9
New Networking Program
10
CISE Programs at SDSC
  • ACIR
      • NPACI (Berman, $170M)
      • TeraGrid (Berman, $53M total)
      • Virtual Instruments ITR (Berman, $2.5M)
      • GriPhyN ITR (Moore, $500k)
  • EIA
      • I2T/Dig. Gov (Baru, $700k)
      • Web-Based IT/Dig Gov (Gupta, $275k)
      • Bio-QuBIC (Gupta, $112k)
  • ANIR
      • CAIDA (Claffy, $3.2M)
      • NLANR/MOAT (Braun, $3M)
      • HPWREN (Braun, $2.3M)
      • IEC Repository (Claffy, $700k)
      • NMI GRIDS Center (Papadopoulos, $600k)
  • IIS
      • GeoGrids ITR (Baru, $500k)
      • InterLib (Moore, $250k)
      • 3D Knowledge KDI (Bailey, $73k)

SDSC PI
SDSC co-PI or sub
11
Other NSF Projects at SDSC
  • BIO/DBI
      • Protein Data Bank (Bourne, $4M)
      • CCMS (Ten Eyck, $1M)
      • Biology Workbench (Subramaniam, $700k)
      • Structure/Mutation DB (Bourne, $550k)
      • QM Bio Framework (Baldridge/Bourne, $500k)
      • Support for Plant Genome Program (Gribskov,
        $100k)
      • Plant Phosphorylation (Gribskov, $941k)
      • Biomolecular Database for Education (Bourne,
        $188k)
      • Palmer LTER Info Mgt ($100k)
  • BIO/Other
      • LTER Network Office ($270k)
      • Knowledge Network for Biocomplexity KDI ($728k)
      • DLI/NSDL/UCSB ($500k)
  • NSF/Other
      • PRAGMA 2002-03 (SBE/INT, $104k)
      • Wireless Networks/Env. Mgt (ITR/OCE, $1.75M)
      • SCEC (ITR/GEO, $10M total)
      • NVO (ITR/AST, $10M total)
      • iVDGL (ITR/PHY, $13.6M total)
      • IPBIR ($440k)
      • Bridging Dig Lib/Archives (ITR/EHR, $300k)

SDSC PI
SDSC co-PI or sub
12
Other Agency Funding at SDSC
  • NIH
      • National Biomedical Computation Resource
      • Protein Data Bank
      • Alliance for Cell Signaling
      • Biomedical Informatics Research Network
      • Joint Center for Structural Genomics
      • Human Embryo Digital Library
  • NARA, NHPRC
  • Library of Congress
  • NASA
  • DARPA
  • DOE
  • DOD HPCMP
  • HHS
  • NSA
  • Industry Partners

13
Partnerships with UC help integrate SDSC locally
and regionally
14
NPACI: A National Partnership
  • NPACI is developing the human, software, data,
    networking and computational infrastructure to
    address critical science and technology needs
  • 48 Partner Institutions
  • 21 states: CA, TX, MI, MD, KN, NM, MS, MT, NY,
    OH, OR, NJ, TN, VA, MO, WI, CO, AZ, MA, WA, PA
  • Italy, Australia, Spain, Sweden
  • Hundreds of participating users, developers,
    scientists, students

15
Meeting the Goals of the 1996 NPACI Proposal
  • Create a distributed, national metacomputing
    infrastructure to benefit the national community
    in science and engineering
  • Infrastructure will integrate data and
    computation
  • Deploy Teraflops-scale resources to solve
    problems at the forefront of numerically
    intensive computational science and engineering
  • Extend metacomputing environment to enable
    data-intensive computing
  • Integrate computational science and engineering
    education activities into the infrastructure
  • Pursue collaborative projects to advance
    computing technology with vendors, including the
    earth resources and automotive industries

16
NPACI Is Organized by Thrusts: Goal of all
thrusts is to develop and deploy infrastructure
  • EOT
  • APPLICATIONS: Molecular Science, Neuroscience,
    Earth Systems Science, Engineering
  • TECHNOLOGIES: Grid Computing, Programming Tools
    and Environments, Data-Intensive Computing,
    Interaction Environments
  • RESOURCES
17
Deep Impact
  • First-principles simulations to improve materials
    in lithium batteries
      • Ceder (MIT), Physical Review B
      • Potential to make cell phone, laptop, other
        batteries lighter, more powerful
  • Microtubule properties: two-orders-of-magnitude
    increase in atoms simulated
      • McCammon, et al., UCSD, PNAS
  • Largest simulations to date of colliding galaxies
    -- 1.25 million interacting particles
      • Hernquist (Harvard-Smithsonian), Dubinski (U
        Toronto), Astrophysical Journal

18
Broad-based Impact -- Access
Selected NPACI Software Deployment
NPACI HPC Resources
  • > 800 new user logins created in the last year
  • > 1200 users used allocated time in the last
    year
  • > 7,200 total publications and theses citing
    NPACI/SDSC resources
  • > 14.6M hits to NPACI and SDSC web servers on
    average per month
  • > 9.5M service units allocated in the last year
NPACI Archives
  • > 600 Terabytes of Data at Resource Partners
  • Sample data collections: Digital Sky, Human Brain
    Project, Biology Workbench, LIGO, ESA, PDB/JCSG,
    Digital Embryo, BIRN, NARA, CDL, NSDL
DataCutter
Almost every Grid initiative is using
NPACI-supported technologies, including BIRN,
NVO, SCEC, GriPhyN, iVDGL, MCell, TeraGrid
19
Individual Impact
  • Girls are Great!
      • Collaboration with Girl Scouts
      • Begun in San Diego, duplicated in Houston
      • Has reached 10,000 girls in two cities
  • In just FY01, EOT-PACI partners:
      • Reached 4,000 K-16 students, 1,400 K-12
        teachers, 1,000 govt employees, MSI faculty
        and researchers
      • Hosted and sponsored 55 workshops
      • Created 60 online resources
      • Authored and featured in 25 publications
      • Made more than 100 conference presentations
      • Implemented 3 undergrad courses

20
NPACI Resources are distributed
U Michigan
UC Berkeley
Caltech
SDSC
U Texas
21
NPACI Vision: Software Roadmap
  • Goal: Deliver SW that is usable and useful, and
    promotes maximal use of the underlying resources
  • SW development efforts focus on:
      • Minimizing the impact of heterogeneity (e.g.,
        SRB, NWS, Globus)
      • Coordination across remote resources (e.g.,
        APST, NetSolve, Globus)
      • Usability (e.g., NPACI Rocks, mySRB, GridPort,
        HotPage)
      • Greater access to high-end and geographically
        distributed resources (e.g., TeraGrid, portals,
        NPACI HotPage, GridPort)
      • Maximizing the impact of the resources (e.g.,
        NMI, Catalina scheduler, Scalable Vis Toolkits)

22
NPACI Grid modeled after evolving Community Grid
Model
  • Roll your own software but agree on interfaces,
    service architecture, standards

23
The Beginning of Cyberinfrastructure: PACI and
TeraGrid
August 9, 2001: NSF awarded $53,000,000 to
SDSC/NPACI and NCSA/Alliance for TeraGrid
  • TeraGrid will provide in aggregate:
      • 13.6 trillion calculations per second
      • Over 600 trillion bytes of immediately
        accessible data
      • 40 gigabit per second network speed
  • TeraGrid will provide a new paradigm for
    data-oriented computing
      • Critical for disaster response, genomics,
        environmental modeling,
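
To give a feel for how the three aggregate figures above relate to one another, here is a rough back-of-the-envelope sketch. It is not from the presentation: it assumes decimal units, a fully utilized 40 Gb/s link, and no protocol overhead, and the function names are purely illustrative.

```python
# Back-of-the-envelope arithmetic on the aggregate TeraGrid figures quoted above.
# Assumptions (not stated on the slide): decimal units, a fully utilized
# link, and no protocol overhead.

DATA_BYTES = 600 * 10**12         # "over 600 trillion bytes" of accessible data
NETWORK_BITS_PER_S = 40 * 10**9   # "40 gigabit per second" network speed
CALCS_PER_S = 13.6e12             # "13.6 trillion calculations per second"


def transfer_seconds(num_bytes: int, bits_per_second: int) -> float:
    """Idealized time to move num_bytes across a link of the given speed."""
    return num_bytes * 8 / bits_per_second


def calcs_per_byte_moved(calcs_per_second: float, bits_per_second: int) -> float:
    """Calculations available for each byte the network delivers at full rate."""
    bytes_per_second = bits_per_second / 8
    return calcs_per_second / bytes_per_second


if __name__ == "__main__":
    seconds = transfer_seconds(DATA_BYTES, NETWORK_BITS_PER_S)
    ratio = calcs_per_byte_moved(CALCS_PER_S, NETWORK_BITS_PER_S)
    print(f"Moving the full collection once: {seconds / 3600:.1f} hours")
    print(f"Calculations per byte moved: {ratio:.0f}")
```

Under these idealized assumptions, even the full network needs more than a day to move the whole collection once, which is one way to read the slide's emphasis on data-oriented computing.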

24
TeraGrid will be the first high-end
production-level grid. TeraGrid will enable new
application paradigms.
  • TeraGrid Software Environment
      • Linux
      • Basic and Core Globus Services
      • Advanced Services
      • Data Services
  • Over 0.6 petabytes of on-line disk will provide
    the ultimate environment for data-oriented
    computation
  • Linux environment provides a more direct path
    from development on a lab cluster to performance
    on a high-end platform

25
New Science Advances with TeraGrid
  • Biomedical Informatics Research Network (BIRN)
      • Evolving reference set of brains provides
        essential data for developing therapies for
        neurological disorders (Multiple Sclerosis,
        Alzheimer's disease).
      • NIH building on NSF investment in NPACI DICE,
        Interaction Environments, Neuroscience, and
        Grid activities
  • Pre-TeraGrid
      • One lab, small patient base
      • 4 TB collection
  • With TeraGrid
      • Tens of collaborating labs
      • Larger population sample
      • 400 TB data collection: more brains, higher
        resolution
      • Multiple-scale data integration, analysis

26
Evaluating NPACI
27
Requirements for Successful Infrastructure
  • Large-scale: inclusive of current programs,
    community-wide efforts, globalization
  • Usable: stable, persistent, dependable,
    accessible
  • Evolutionary
      • Must allow for evolution of user requirements,
        software infrastructure
      • Must allow for graceful addition of new
        hardware and resources
      • Must allow for new users, new paradigms and
        new communities
  • Realistic with respect to costs
      • Cost of human infrastructure (development of
        community of professionals with
        multidisciplinary science and large-scale
        infrastructure development skills), cost of
        adding and supporting new sites and users,
        cost of developing portals and interfaces, etc.

28
Major Advantages of SDSC/NPACI for NSF
  • SDSC/NPACI enable the scale and synergy required
    to develop the next generation of advances in
    technology, science, and engineering.
  • NSF/SDSC investment in large-scale infrastructure
    has enabled leadership and world-class programs
    in Biology, Data-oriented Computing, etc.
  • SDSC is a mature resource -- would take many
    years to build a leading facility of this size
    and capability from scratch
  • SDSC provides a venue for large-scale projects
    which cannot be done in traditional academic
    departmental environments
      • E.g., Protein Data Bank, Cooperative
        Association for Internet Data Analysis,
        Alliance for Cell Signaling, etc.
  • NSF/NPACI facilities provide a magnet for the
    world's leading computational scientists and
    technologists

29
Challenges to Building and Deploying Information
Infrastructure (1)
  • Technology Challenges
      • Cutting-edge hardware and Grid software are
        not ready out of the box; a production facility
        is time- and labor-intensive
      • Resource heterogeneity provides challenges for
        design, deployment, interoperability,
        performance
  • Policy Challenges
      • NRAC allocations focus on single-system cycles.
        Need to allocate data and Grid access, as well
        as access for global collaborations
      • Difficult to promote big jobs, interactive
        usage, grid usage, and data-oriented usage
        simultaneously
  • Usability Challenges
      • "Last mile" efforts to develop usable software
        are critical but labor-intensive; researchers
        and students are not rewarded for this
      • New users, communities, and requirements must
        be accommodated by infrastructure

30
Challenges in Building and Deploying Information
Infrastructure (2)
  • Fiscal Challenges
      • Staff support is critical for:
          • Continuity, integration, scaling of
            infrastructure
          • Development of community codes, portals,
            services, tools
          • User support
      • However, the budget has essentially remained
        flat (effectively a reduction for staff)
      • High-performance equipment needs
        high-performance support

31
Major Opportunities for PACI
  • NSF has an opportunity to build a world-class
    cyberinfrastructure that puts the US in a
    leadership position for the next decade and more
  • This is at a time when the science and
    engineering community requires information
    infrastructure for the next generation of
    advances
  • This is at a time when the private sector can
    help accelerate the development and integration
    of grid technologies
  • This is happening at a time when the US is
    leading world efforts in development of
    information infrastructure

32
Major Risks for PACI
  • Being too timid: cyberinfrastructure needs
    commitment and resources to fully flower
  • Ignoring the research side: many problems beyond
    integration
  • Letting other countries get ahead of the
    technology curve (e.g., Japan)
  • Overselling the vision without sufficient support
    to back it up
  • Instability risks quality: hard to retain good
    people when budget stays flat and future is
    uncertain

33
Impact Being Recognized
  • "Today I can tell you that we're on the verge of
    a major technological breakthrough for the
    long-term preservation of computer generated
    records of the Federal Government.
    Research-and-development work done for us by the
    San Diego Supercomputer Center indicates that a
    practical Electronic Records Archives may be in
    sight."
  • John Carlin, Archivist of the United States,
    National Archives and Records Administration.
    Presentation to Subcommittee on Management,
    Information, and Technology, Committee on
    Government Reform, 28 March 2000

34
Overview of the day
  • 8:30-8:40am: Welcome to UCSD (UCSD
    Administration)
  • 8:40-9:40: Introduction to NPACI and SDSC (Fran
    Berman)
  • 9:40-10:00: Overview of NPACI Program (Richard
    Moore)
  • 10:00-10:30: Tour of Visualization Lab (and
    break) (Mike Bailey et al.)
  • 10:30-11:00: NPACI Hardware Resources (Wayne
    Pfeiffer)
  • 11:00-11:35: NPACI Software Roadmap (Carl
    Kesselman)
  • 11:35am-noon: Resource Allocation: Users,
    Process, Metrics (Nancy Wilkins-Diehr)
  • 12:00-12:15pm: Lunch
  • 12:15-12:45: Question and Answer Session (SDSC
    Auditorium)
  • 12:45-1:30: Tour of Machine Room (Phil Andrews)
  • 1:30-1:55: Education, Outreach and Training
    (Rozeanne Steckler, Kris Stewart)
  • 1:55-2:20: NPACI Technology, Applications,
    Alpha Projects (Peter Taylor)
  • 2:20-3:10: Examples of Key Scientific
    Accomplishments
      • 2:20 Protein Data Bank (Phil Bourne)
      • 2:35 Dynamics of Molecular Recognition
        (Andrew McCammon)
      • 2:50 Storage Resource Broker Related (Reagan
        Moore)
      • 3:00 Information Integration in the Sciences
        (Chaitan Baru)