NCAR's Response to upcoming OCI Solicitations

1
NCAR's Response to upcoming OCI Solicitations
  • Richard Loft
  • SCD Deputy Director for R&D

2
Outline
  • NSF Cyberinfrastructure Strategy (Track-1 and
    Track-2)
  • NCAR's generic strategy for NSFXX-625s (Track-2)
  • NCAR response to NSF05-625
  • NSF Petascale Initiative Strategy
  • NCAR response to NSF Petascale Initiative

3
NSF's Cyberinfrastructure Strategy
  • NSF's HPC acquisition strategy (through FY10)
    has three tracks.
  • Track 1: high end, O(1 PFLOPS sustained).
  • Track 2: mid-level systems, O(100 TFLOPS), via
    NSFXX-625.
  • First instance (NSF05-625) submitted Feb 10,
    2006.
  • Next instances due:
  • November 30, 2006
  • November 30, 2007
  • November 30, 2008
  • Track 3: typical university HPC, O(1-10 TFLOPS).
  • The purpose of the Track-1 system will be to
    achieve revolutionary advancement and
    breakthroughs in science and engineering.

4
Solicitation NSF05-625: Towards a Petascale
Computing Environment for Science and Engineering
  • Award: September 2006.
  • System in production by May 31, 2007.
  • $30,000,000 or $15,000,000.
  • Operating costs funded under separate action.
  • The RP (Resource Provider) serves the broad
    science community - open access.
  • Allocations by LRAC/MRAC or their successors.
  • Two 10 Gb/s TeraGrid links.

5
NCAR's Overall NSFXX-625 Strategy
  • Leverage NCAR/SCD expertise in production HPC.
  • Get a production system:
  • No white-box Linux solutions.
  • Stay on the path to usable petascale systems.
  • NCAR is a TeraGrid outsider and must address two
    areas:
  • Leverage experience with general scientific
    users.
  • Lack of Grid consulting experience.
  • Emphasize, but don't overemphasize,
    geosciences.
  • In proposing, NCAR has a facility problem:
  • Minimize costs - power, administrative staff,
    level of support.
  • Creative plan for remote user support and
    education.

6
NSF05-625 Partners
  • Facility Partner
  • End-to-End System Supplier
  • User Support Network:
  • NCAR Consulting Service Group
  • University partners

7
NSF05-625 Facility Partner
  • The NCAR ML facility is FULL after ICESS.
  • Key points:
  • A new datacenter is needed whether NCAR wins the
    NSF05-625 solicitation or not.
  • Because of the short timeline, a new datacenter
    never factors into the strategy for NSFXX-625.
  • Identified a colocation facility.
  • Facility features:
  • Local (Denver-Boulder area)
  • State-of-the-art, high-availability center
  • Currently 4 x 2 MW generators of power available
  • Familiar with large-scale deployments
  • Dark fibre readily available (good connectivity)

8
NSF05-625 Supercomputer System Details
  • Two systems: capability and capacity.
  • 80 TFLOPS combined.
  • Robotic tape storage system: 12 PB.

9
NCAR NSF05-625 User Support Plan
  • Largest potential differentiator in the proposal -
    let's do something unique!
  • The system will be used by the generic scientist;
    the support plan must:
  • Be extensible to domains other than geoscience.
  • Address grid user support.
  • The strategy leverages the OSCER-led IGERT
    proposal:
  • Combine teaching of computational science with
    user support.
  • Embed application support expertise in key
    institutions.
  • Build education and training materials through
    university partnerships.

10
Track-1 System Background
  • Source of funds: the Presidential Innovation
    Initiative announced in the SOTU.
  • Performance goal: 1 PFLOPS sustained on
    interesting problems.
  • Science goal: breakthroughs.
  • Use model: 12 research teams per year, using the
    whole system for days or weeks at a time.
  • Capability system: large everything, fault
    tolerant.
  • Single system in one location.
  • Not a requirement that the machine be upgradable.

11
Track-1 Project Parameters
  • Funds: $200M over 4 years, starting FY07.
  • Single award.
  • Money is for an end-to-end system (as in -625).
  • Not intended to fund the facility.
  • Release of funds tied to meeting hardware and
    software milestones.
  • Deployment stages:
  • Simulator
  • Prototype
  • Petascale system operates FY10-FY15.
  • Operations for FY10-15 funded separately.

12
Two-Stage Award Process Timeline
  • Solicitation out May, 2006 (???)
  • HPCS down-select June, 2006
  • Preliminary Proposal due August, 2006
  • Down-selection (invitation to 3-4 teams to write
    a Full Proposal)
  • Full Proposal due January, 2007
  • Site visits Spring, 2007
  • Award Sep, 2007

13
NSF's view of the problem
  • NSF recognizes the facility (power, cooling,
    space) challenge of this system.
  • Therefore NSF welcomes collaborative approaches:
  • University + Federal Lab
  • University + commercial data center
  • University + State Government
  • University consortium
  • NSF recognizes that applications will need
    significant modification to run on this system.
  • User support plan:
  • Expects the proposer to discuss needs in this
    area with experts in key applications areas.

14
The Cards in NCAR's Hand
  • NCAR:
  • Is a leader in making the case that geoscience
    grand challenge problems need petascale
    computing.
  • Itself has many grand challenge problems to
    offer.
  • Has experience at large processor counts.
  • Has recently connected to the TeraGrid, and is
    moving towards becoming a full-fledged Resource
    Provider.

15
NCAR Response Options
  • Do Nothing
  • Focus on Petascale Geoscience Applications
  • Partner with a lead institution or consortium
  • Lead a Track-1 proposal

16
NCAR Response Options
  • Do Nothing
  • Focus on Petascale Geoscience Applications
  • Partner with a lead institution or consortium
  • Lead a Track-1 proposal

17
Questions, Comments?
18
The Relationship Between OCI's Roadmap and
NCAR's Datacenter Project
  • Richard Loft
  • SCD Deputy Director for R&D

19
Projected CCSM Computing Requirements Exceed
Moore's Law
Thanks to Jeff Kiehl/Bill Collins
20
NSF's Cyberinfrastructure Strategy
  • NSF's HPC acquisition strategy (through FY10)
    has three tracks.
  • Track 1: high end, O(1 PFLOPS sustained).
  • Track 2: mid-level systems, O(100 TFLOPS), via
    NSFXX-625.
  • First instance (NSF05-625) submitted Feb 10,
    2006.
  • Next instances due:
  • November 30, 2006
  • November 30, 2007
  • November 30, 2008
  • Track 3: typical university HPC, O(1-10 TFLOPS).
  • The purpose of the Track-1 system will be to
    achieve revolutionary advancement and
    breakthroughs in science and engineering.

21
NCAR strategic goals
  • NCAR will stay in the top echelon of geoscience
    computing centers.
  • NCAR's immediate strategic goal is to be a
    Track-2 center.
  • To do this, NCAR must be integrated with NSF's
    cyberinfrastructure plans.
  • This means both connecting to and ultimately
    operating within the TeraGrid framework.
  • The TeraGrid is evolving, so this is a moving
    target.

22
NCAR new facility
  • The NCAR ML facility is FULL after ICESS.
  • Key points:
  • A new datacenter is needed whether NCAR wins the
    NSF05-625 solicitation or not.
  • Because of the short timeline, a new datacenter
    never factors into the strategy for NSFXX-625.
  • Right now, we can't handle even a modest budget
    augmentation for computing with the current
    facility.

23
Mesa Lab is full after the ICESS procurement
  • ICESS: Integrated Computing Environment for
    Scientific Simulation.
  • We're sitting at 980 kW right now.
  • Deinstall of bluesky will give us back 450 kW.
  • This leaves about 600 kW of headroom.
  • The ICESS procurement is expected to deliver a
    system with a maximum power requirement of
    500-600 kW.
  • This is not enough to house $15M-$30M of
    equipment from NSF05-625, for example (see the
    power-budget sketch after this list).
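A minimal Python sketch of the power arithmetic above (illustrative only; the 1.2 MW Mesa Lab ceiling is taken from the next slide, and the ICESS draw is taken at the top of its quoted 500-600 kW range):

  # Rough Mesa Lab power-budget check using the numbers in this deck.
  MESA_LAB_CAPACITY_KW = 1200   # max power at the Mesa Lab (next slide)
  CURRENT_LOAD_KW = 980         # current draw
  BLUESKY_RETURN_KW = 450       # recovered when bluesky is deinstalled
  ICESS_MAX_KW = 600            # high end of the expected ICESS requirement

  headroom = MESA_LAB_CAPACITY_KW - (CURRENT_LOAD_KW - BLUESKY_RETURN_KW)
  remaining_after_icess = headroom - ICESS_MAX_KW

  print(f"Headroom after bluesky deinstall: ~{headroom} kW")
  print(f"Remaining after ICESS installation: ~{remaining_after_icess} kW")

The simple subtraction gives roughly the "about 600 kW" of headroom cited on the slide, and shows how little is left for NSF05-625 equipment once ICESS lands.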

24
We're fast running out of power
Max power at the Mesa Lab is 1.2 MW!
25
Preparing for the Petascale
  • Richard Loft
  • SCD Deputy Director for R&D

26
What to expect in HEC?
  • Much more parallelism.
  • A good deal of uncertainty regarding node
    architectures.
  • Many threads per node.
  • Continued ubiquity of Linux/Intel systems.
  • There will be vector systems.
  • Emergence of exotic architectures.
  • The largest (petascale) systems are likely to
    have special features:
  • Power-aware design (small memory?)
  • Fault-tolerant design features
  • Light-weight compute node kernels
  • Custom networks

27
Top 500: Speed of Supercomputers vs. Time
28
Top 500: Number of Processors vs. Time
29
HEC in 2010
  • Based on history, we should expect 4K-8K CPU
    systems to be commonplace by the end of the
    decade.
  • The largest systems on the Top500 list should be
    1-10 PFLOPS.
  • Parallelism in the largest system: an estimate
    for 2010 (see the sketch after this list).
  • Assuming a clock speed of 5 GHz, a dual-FMA CPU
    delivers 20 GFLOPS peak.
  • 1 PFLOPS peak: about 50K CPUs.
  • 10 PFLOPS peak: about 500K CPUs.
  • Large vector systems (if they exist) will still
    be highly parallel.
  • To justify using the largest systems, applications
    must use a sizable fraction of the resource.
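A minimal Python sketch of the parallelism estimate above, assuming (as the slide does) a 5 GHz clock and a CPU with two FMA units, each fused multiply-add counting as 2 flops:

  # CPU-count estimate for a 2010-era petascale system.
  CLOCK_HZ = 5e9         # assumed clock speed: 5 GHz
  FLOPS_PER_CYCLE = 4    # two FMA units x 2 flops per fused multiply-add

  peak_per_cpu = CLOCK_HZ * FLOPS_PER_CYCLE   # 20 GFLOPS peak per CPU

  for target_pflops in (1, 10):
      cpus_needed = target_pflops * 1e15 / peak_per_cpu
      print(f"{target_pflops} PFLOPS peak -> ~{cpus_needed:,.0f} CPUs")
  # 1 PFLOPS peak  -> ~50,000 CPUs
  # 10 PFLOPS peak -> ~500,000 CPUs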

30
Range of Plausible Architectures 2010
  • Power issues will slow rate of increase in clock
    frequency.
  • This will drive the trend towards massive
    parallelism.
  • All scalar systems will have multiple CPUs per
    socket (chip).
  • Currently 2 CPUs per socket; by 2008, 4 CPUs per
    socket will be commonplace.
  • 2010 scalar architectures will likely continue
    this trend; 8 CPUs per socket are possible - the
    Cell chip already has 8 synergistic processors.
  • The key unknown is which cluster-on-a-chip
    architecture will be most effective.
  • Vector systems will be around, but at what
    price?
  • Wildcards:
  • Impact of the DARPA HPCS program
  • Exotics: FPGAs, PIMs, GPUs

31
How to make science staff aware of coming changes?
  • NCAR must develop a science-driven plan for
    exploiting petascale systems at the end of the
    decade.
  • Briefed the NCAR Director, DD, and the CISL and
    ESSL Directors.
  • Meetings (SEWG at CCSM Breckenridge).
  • Organizing NSF workshops on petascale geoscience
    benchmarking, scheduled in DC (June 1-2) and at
    NCAR (TBD).
  • Have initiated internal petascale discussions
  • CGD-SCD joint meetings
  • Peta_ccsm mail list.
  • Peta_ccsm Swiki site.
  • Through activities like this, NCAR should take a
    leadership role.

32
What must be done to secure resources to improve
scalability?
  • We must help ourselves.
  • Invest judiciously in computational science where
    possible.
  • Leverage application development partnerships
    (SciDAC, etc.)
  • Write proposals.
  • Support for applications development for the
    Track-1 system can be built into an NCAR
    partnership deal.
  • NSF has indicated an independent funding track
    for applications. NCAR should aggressively pursue
    those funding sources.
  • New ideas can help - e.g., POP (see the
    following slides).

33
POP: space-filling curve partition for 8
processors
Credit: John Dennis, SCD
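To illustrate the space-filling-curve idea behind this result, here is a hedged Python sketch (not the actual POP code; morton_key and sfc_partition are made-up names for this example). Blocks are ordered along a Z-order (Morton) curve, land-only blocks are dropped, and the curve is cut into contiguous pieces so each processor gets a compact, balanced set of ocean blocks:

  # Illustrative space-filling-curve (SFC) partitioning sketch.
  def morton_key(i: int, j: int, bits: int = 16) -> int:
      """Interleave the bits of (i, j): the block's position on a Z-order curve."""
      key = 0
      for b in range(bits):
          key |= ((i >> b) & 1) << (2 * b) | ((j >> b) & 1) << (2 * b + 1)
      return key

  def sfc_partition(ocean_blocks, nprocs):
      """Assign ocean blocks (a list of (i, j) tuples) to nprocs processors."""
      ordered = sorted(ocean_blocks, key=lambda ij: morton_key(*ij))
      chunk = -(-len(ordered) // nprocs)   # ceiling division: blocks per processor
      return [ordered[p * chunk:(p + 1) * chunk] for p in range(nprocs)]

  # Toy 8x8 block grid with a rectangular "continent" of land-only blocks removed.
  land = {(i, j) for i in range(3, 6) for j in range(2, 5)}
  ocean = [(i, j) for i in range(8) for j in range(8) if (i, j) not in land]
  for rank, blocks in enumerate(sfc_partition(ocean, 8)):
      print(f"processor {rank}: {len(blocks)} blocks")

Ordering along the curve keeps each processor's blocks spatially clustered while skipping land, which is the intent of the SFC decomposition shown on the slide above.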
34
POP 1/10 Degree BG/L Improvements
35
POP 1/10 Degree performance
BG/L SFC improvement
36
Questions, Comments?
37
Top 500 Processor Types: Intel taking over
Today Intel is inside 2/3 of the Top500 machines
38
(No Transcript)
39
The commodity onslaught
  • The Linux/Intel cluster is taking over the
    Top500.
  • Linux has not yet penetrated the major weather,
    ocean, and climate centers, for several reasons:
  • System maturity (SCD experience)
  • Scalability of dominant commodity interconnects
  • Combinatorics (Linux flavor, processor,
    interconnect, compiler)
  • But it affects NCAR indirectly because:
  • Ubiquity means opportunity.
  • Universities are deploying them.
  • NCAR must rethink the services provided to the
    universities.
  • Puts strain on all community software development
    activities.