Title: NERSC Status and Plans
1 - NERSC Status and Plans
- for the NERSC User Group Meeting, February 22, 2001
- BILL KRAMER
- DEPUTY DIVISION DIRECTOR
- DEPARTMENT HEAD, HIGH PERFORMANCE COMPUTING DEPARTMENT
- kramer_at_nersc.gov
- 510-486-7577
2 - Agenda
- Update on NERSC activities
- IBM SP Phase 2 status and plans
- NERSC-4 plans
- NERSC-2 decommissioning
3 - ACTIVITIES AND ACCOMPLISHMENTS
4 - NERSC Facility Mission
To provide reliable, high-quality, state-of-the-art computing resources and client support in a timely manner independent of client location, while wisely advancing the state of computational and computer science.
5 - 2001 GOALS
- PROVIDE RELIABLE AND TIMELY SERVICE
  - Systems: Gross Availability, Scheduled Availability, MTBF/MTBI, MTTR (see the sketch after this slide)
  - Services: Responsiveness, Timeliness, Accuracy, Proactivity
- DEVELOP INNOVATIVE APPROACHES TO ASSIST THE CLIENT COMMUNITY IN EFFECTIVELY USING NERSC SYSTEMS
- DEVELOP AND IMPLEMENT WAYS TO TRANSFER RESEARCH PRODUCTS AND KNOWLEDGE INTO PRODUCTION SYSTEMS AT NERSC AND ELSEWHERE
- NEVER BE A BOTTLENECK TO MOVING NEW TECHNOLOGY INTO SERVICE
- ENSURE ALL NEW TECHNOLOGY AND CHANGES IMPROVE (OR AT LEAST DO NOT DIMINISH) SERVICE TO OUR CLIENTS
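To make the reliability metrics above concrete, here is a minimal sketch (using made-up outage records, not NERSC data) of how gross availability, scheduled availability, MTBF/MTBI, and MTTR relate:

```python
# Minimal sketch of the reliability metrics named above, using hypothetical
# outage records. Each outage is (hours_down, scheduled); scheduled outages
# are planned maintenance, unscheduled ones count against MTBF/MTTR.
period_hours = 30 * 24          # a 30-day reporting period
outages = [
    (4.0, True),                # scheduled maintenance window
    (1.5, False),               # unscheduled interrupt
    (0.5, False),               # unscheduled interrupt
]

total_down = sum(h for h, _ in outages)
unsched = [h for h, sched in outages if not sched]
sched_down = total_down - sum(unsched)

gross_availability = 1.0 - total_down / period_hours
# Scheduled availability excludes planned maintenance from the denominator.
scheduled_availability = 1.0 - sum(unsched) / (period_hours - sched_down)
mtbf = (period_hours - total_down) / len(unsched)   # mean time between interrupts
mttr = sum(unsched) / len(unsched)                  # mean time to repair

print(f"gross availability:     {gross_availability:.1%}")
print(f"scheduled availability: {scheduled_availability:.1%}")
print(f"MTBF: {mtbf:.0f} h, MTTR: {mttr:.1f} h")
```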
6 - GOALS (CONT.)
- NERSC AND LBNL WILL BE A LEADER IN LARGE SCALE SYSTEMS MANAGEMENT SERVICES
- EXPORT KNOWLEDGE, EXPERIENCE, AND TECHNOLOGY DEVELOPED AT NERSC, PARTICULARLY TO AND WITHIN NERSC CLIENT SITES
- NERSC WILL BE ABLE TO THRIVE AND IMPROVE IN AN ENVIRONMENT WHERE CHANGE IS THE NORM
- IMPROVE THE EFFECTIVENESS OF NERSC STAFF BY IMPROVING INFRASTRUCTURE, CARING FOR STAFF, ENCOURAGING PROFESSIONALISM AND PROFESSIONAL IMPROVEMENT
[Diagram: goal themes supporting the mission - new technology, timely information, innovative assistance, reliable service, technology transfer, success for clients and facility, consistent service system architecture, large scale leadership, staff effectiveness, change, research flow, wise integration]
7 - Major Accomplishments Since Last Meeting (June 2000)
- IBM SP placed into full service April 4, 2000 (more later)
- Augmented the allocations by 1M hours in FY 2000
- Contributed to 11M PE hours in FY 2000, more than doubling the FY 2000 allocation
- SP is fully utilized
- Moved entire facility to Oakland (more later)
- Completed the second PAC allocation process with lessons learned from the first year
8 - Activities and Accomplishments
- Improved Mass Storage System
- Upgraded HPSS
- New versions of HSI
- Implementing Gigabit Ethernet
- Two STK robots added
- Replaced 3490 with 9840 tape drives
- Higher density and higher speed tape drives
- Formed Network and Security Group
- Succeeded in external reviews
- Policy Board
- SCAC
9 - Activities and Accomplishments
- Implemented new accounting system, NIM
- Old system was
- Difficult to maintain
- Difficult to integrate to new system
- Limited by 32 bits
- Not Y2K compliant
- New system
- Web focused
- Available database software
- Works for any type of system
- Thrived in a state of increased security
- Open Model
- Audits, tests
10 - 2000 Activities and Accomplishments
- NERSC firmly established as a leader in system evaluation
- Effective System Performance (ESP) recognized as a major step in system evaluation and is influencing a number of sites and vendors
- Sustained System Performance measures (an illustrative sketch follows this slide)
- Initiated a formal benchmarking effort, the NERSC Application Performance Simulation Suite (NAPs), which may become the next widely recognized parallel evaluation suite
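As an illustration only (not necessarily NERSC's exact SSP formula), one common way to form a sustained-system-performance figure of merit is to aggregate measured per-processor application rates and scale by the machine's compute processor count. The benchmark names and rates below are hypothetical; the CPU count is the Phase 2a figure from slide 23.

```python
# Illustrative sketch of a "sustained system performance" style metric:
# geometric mean of per-processor application benchmark rates, scaled by the
# number of parallel application CPUs. Benchmark values are hypothetical.
from math import prod

# measured per-processor performance in Gflop/s for each application benchmark
per_cpu_gflops = {"app_a": 0.12, "app_b": 0.09, "app_c": 0.15}

compute_cpus = 2144   # Phase 2a parallel application CPUs (slide 23)

geo_mean = prod(per_cpu_gflops.values()) ** (1.0 / len(per_cpu_gflops))
ssp_estimate = geo_mean * compute_cpus   # sustained Gflop/s across the machine

print(f"geometric mean per CPU: {geo_mean:.3f} Gflop/s")
print(f"sustained system performance estimate: {ssp_estimate:.0f} Gflop/s")
```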
11 - Activities and Accomplishments
- Formed the NERSC Cluster team to investigate the impact of SMP commodity clusters for high performance, parallel computing and to assure the most effective use of division resources related to cluster computing
  - Coordinates all NERSC Division cluster computing activities (research, development, advanced prototypes, pre-production, production, and user support)
- Initiated a formal procurement for a mid-range cluster
  - In consultation with DOE, decided not to award as part of NERSC program activities
12 - NERSC Division
- DIVISION ADMINISTRATOR / FINANCIAL MANAGER: WILLIAM FORTNEY
- CHIEF TECHNOLOGIST: DAVID BAILEY
- DISTRIBUTED SYSTEMS DEPARTMENT: WILLIAM JOHNSTON, Department Head; DEB AGARWAL, Deputy
- HIGH PERFORMANCE COMPUTING DEPARTMENT: WILLIAM KRAMER, Department Head
- HIGH PERFORMANCE COMPUTING RESEARCH DEPARTMENT: ROBERT LUCAS, Department Head
- APPLIED NUMERICAL ALGORITHMS: PHIL COLELLA
- IMAGING & COLLABORATIVE COMPUTING: BAHRAM PARVIN
- COLLABORATORIES: DEB AGARWAL
- ADVANCED SYSTEMS: TAMMY WELCOME
- HENP COMPUTING: DAVID QUARRIE
- CENTER FOR BIOINFORMATICS & COMPUTATIONAL GENOMICS: MANFRED ZORN
- DATA INTENSIVE DIST. COMPUTING: BRIAN TIERNEY (CERN), WILLIAM JOHNSTON (acting)
- COMPUTATIONAL SYSTEMS: JIM CRAW
- MASS STORAGE: NANCY MEYER
- SCIENTIFIC COMPUTING: ESMOND NG
- DISTRIBUTED SECURITY RESEARCH: MARY THOMPSON
- COMPUTER OPERATIONS & NETWORKING SUPPORT: WILLIAM HARRIS
- USER SERVICES: FRANCESCA VERDIER
- CENTER FOR COMPUTATIONAL SCIENCE & ENGR.: JOHN BELL
- SCIENTIFIC DATA MANAGEMENT: ARIE SHOSHANI
- SCIENTIFIC DATA MGMT RESEARCH: ARIE SHOSHANI
- NETWORKING: WILLIAM JOHNSTON (acting)
- FUTURE INFRASTRUCTURE, NETWORKING & SECURITY: HOWARD WALTER
- FUTURE TECHNOLOGIES: ROBERT LUCAS (acting)
- VISUALIZATION: ROBERT LUCAS (acting)
Rev 02/01/01
13 - HIGH PERFORMANCE COMPUTING DEPARTMENT (WILLIAM KRAMER, Department Head)
- USER SERVICES (FRANCESCA VERDIER): Mikhail Avrekh, Harsh Anand, Majdi Baddourah, Jonathan Carter, Tom DeBoni, Jed Donnelley, Therese Enright, Richard Gerber, Frank Hale, John McCarthy, R.K. Owen, Iwona Sakrejda, David Skinner, Michael Stewart (C), David Turner, Karen Zukor
- ADVANCED SYSTEMS (TAMMY WELCOME): Greg Butler, Thomas Davis, Adrian Wong
- COMPUTATIONAL SYSTEMS (JAMES CRAW): Terrence Brewer (C), Scott Burrow (I), Tina Butler, Shane Canon, Nicholas Cardo, Stephan Chan, William Contento (C), Bryan Hardy (C), Stephen Luzmoor (C), Ron Mertes (I), Kenneth Okikawa, David Paul, Robert Thurman (C), Cary Whitney
- COMPUTER OPERATIONS & NETWORKING SUPPORT (WILLIAM HARRIS): Clayton Bagwell Jr., Elizabeth Bautista, Richard Beard, Del Black, Aaron Garrett, Mark Heer, Russell Huie, Ian Kaufman, Yulok Lam, Steven Lowe, Anita Newkirk, Robert Neylan, Alex Ubungen
- MASS STORAGE (NANCY MEYER): Harvard Holmes, Wayne Hurlbert, Nancy Johnston, Rick Un (V)
- HENP COMPUTING (DAVID QUARRIE; CRAIG TULL, Deputy): Paolo Calafiura, Christopher Day, Igor Gaponenko, Charles Leggett (P), Massimo Marino, Akbar Mokhtarani, Simon Patton
- FUTURE INFRASTRUCTURE, NETWORKING & SECURITY (HOWARD WALTER): Eli Dart, Brent Draney, Stephen Lau
Key: (C) Cray, (FB) Faculty UC Berkeley, (FD) Faculty UC Davis, (G) Graduate Student Research Assistant, (I) IBM, (M) Mathematical Sciences Research Institute, (MS) Masters Student, (P) Postdoctoral Researcher, (SA) Student Assistant, (V) Visitor
On leave to CERN
Rev 02/01/01
14 - HIGH PERFORMANCE COMPUTING RESEARCH DEPARTMENT (ROBERT LUCAS, Department Head)
- APPLIED NUMERICAL ALGORITHMS (PHILLIP COLELLA): Susan Graham (FB), Anton Kast, Peter McCorquodale (P), Brian Van Straalen, Daniel Graves, Daniel Martin (P), Greg Miller (FD)
- IMAGING & COLLABORATIVE COMPUTING (BAHRAM PARVIN): Hui H. Chan (MS), Gerald Fontenay, Sonia Sachs, Qing Yang, Ge Cong (V), Masoud Nikravesh (V), John Taylor
- SCIENTIFIC COMPUTING (ESMOND NG): Julian Borrill, Xiaofeng He (V), Jodi Lamoureux (P), Lin-Wang Wang, Andrew Canning, Yun He, Sherry Li, Michael Wehner (V), Chris Ding, Parry Husbands (P), Osni Marques, Chao Yang, Tony Drummond, Niels Jensen (FD), Peter Nugent, Woo-Sun Yang (P), Ricardo da Silva (V), Plamen Koev (G), David Raczkowski (P)
- CENTER FOR BIOINFORMATICS & COMPUTATIONAL GENOMICS (MANFRED ZORN): Donn Davy, Inna Dubchak, Sylvia Spengler
- SCIENTIFIC DATA MANAGEMENT (ARIE SHOSHANI): Carl Anderson, Andreas Mueller, Ekow Etoo, M. Shinkarsky (SA), Mary Anderson, Vijaya Natarajan, Elaheh Pourabbas (V), Alexander Sim, Junmin Gu, Frank Olken, Arie Segev (FB), John Wu, Jinbaek Kim (G)
- CENTER FOR COMPUTATIONAL SCIENCE & ENGINEERING (JOHN BELL): Ann Almgren, William Crutchfield, Michael Lijewski, Charles Rendleman, Vincent Beckner, Marcus Day
- FUTURE TECHNOLOGIES (ROBERT LUCAS, acting): David Culler (FB), Paul Hargrove, Eric Roman, Michael Welcome, James Demmel (FB), Leonid Oliker, Erich Stromeier, Katherine Yelick (FB)
- VISUALIZATION (ROBERT LUCAS, acting): Edward Bethel, James Hoffman (M), Terry Ligocki, Soon Tee Teoh (G), James Chen (G), David Hoffman (M), John Shalf, Gunther Weber (G), Bernd Hamann (FD), Oliver Kreylos (G)
Key: (FB) Faculty UC Berkeley, (FD) Faculty UC Davis, (G) Graduate Student Research Assistant, (M) Mathematical Sciences Research Institute, (MS) Masters Student, (P) Postdoctoral Researcher, (S) SGI, (SA) Student Assistant, (V) Visitor
Life Sciences Div.; On Assignment to NSF
Rev 02/01/01
15 - FY00 MPP Users/Usage by Discipline
16 - FY00 PVP Users/Usage by Discipline
17 - NERSC FY00 MPP Usage by Site
18 - NERSC FY00 PVP Usage by Site
19 - FY00 MPP Users/Usage by Institution Type
20 - FY00 PVP Users/Usage by Institution Type
21 - NERSC System Architecture
[Diagram: FDDI / Ethernet 10/100/Gigabit network connecting the Remote Visualization Server, Max Strat, Symbolic Manipulation Server, IBM and STK robots, DPSS, PDSF, Research Cluster, IBM SP NERSC-3 (604 processors / 304 GB memory), CRI T3E 900 (644 PEs / 256 MB each), CRI SV1, Millennium, IBM SP NERSC-3 Phase 2a (2532 processors / 1824 GB memory), LBNL Cluster, and Vis Lab]
22 - Current Systems
23 - Major Systems
- MPP
  - IBM SP Phase 2a
    - 158 16-way SMP nodes
    - 2144 parallel application CPUs / 12 GB per node
    - 20 TB shared GPFS
    - 11,712 GB swap space - local to nodes
    - 8.6 TB of temporary scratch space
    - 7.7 TB of permanent home space
    - 4-20 GB home quotas
    - 240 Mbps aggregate I/O - measured from user nodes (6 HiPPI, 2 GE, 1 ATM)
  - T3E-900 LC with 696 PEs - UNICOS/mk
    - 644 application PEs / 256 MB per PE
    - 383 GB of swap space - 582 GB checkpoint file system
    - 1.5 TB /usr/tmp temporary scratch space - 1 TB permanent home space
    - 7-25 GB home quotas, DMF managed
    - 35 MBps aggregate I/O measured from user nodes (2 HiPPI, 2 FDDI)
    - 1.0 TB local /usr/tmp
- Serial
  - PVP - three J90 SV-1 systems running UNICOS
    - 64 CPUs total / 8 GB of memory per system (24 GB total)
    - 1.0 TB local /usr/tmp
- PDSF - Linux cluster
  - 281 IA-32 CPUs
  - 3 Linux and 3 Solaris file servers
  - DPSS integration
  - 7.5 TB aggregate disk space
  - 4 striped Fast Ethernet connections to HPSS
- LBNL mid-range cluster
  - 160 IA-32 CPUs
  - Linux with enhancements
  - 1 TB aggregate disk space
  - Myrinet 2000 interconnect
  - Gigabit Ethernet connections to HPSS
- Storage
  - HPSS
    - 8 STK tape libraries
    - 3490 tape drives
    - 7.4 TB of cache disk
    - 20 HiPPI interconnects, 12 FDDI connections, 2 GE connections
    - Total capacity 960 TB; 160 TB in use
  - HPSS - Probe
24 - T3E Utilization: 95% Gross Utilization
[Chart annotations: allocation starvation; full scheduling functionality; 4.4% improvement per month; checkpointing - start of capability jobs; allocation starvation; systems merged]
25 - SP Utilization
- In the 80-85% range, which is above original expectations for the first year
- More variation than T3E
26 - T3E Job Size
More than 70% of the jobs are large
27 - SP Job Size
Full size jobs are more than 10% of usage
60% of the jobs are > ¼ the maximum size
28 - Storage: HPSS
29 - NERSC Network Architecture
30 - CONTINUE NETWORK IMPROVEMENTS
31 - LBNL Oakland Scientific Facility
32 - Oakland Facility
- 20,000 sf computer room, 7,000 sf office space
- 16,000 sf of computer space built out
- NERSC occupying 12,000 sf
- Ten-year lease with 3 five-year options
- $10.5M computer room construction costs
- Option for additional 20,000 sf computer room
33 - LBNL Oakland Scientific Facility
Move accomplished between Oct 26 and Nov 4
System: Scheduled / Actual
- SP: 10/27 9 am / no outage
- T3E: 11/3 10 am / 11/3 3 am
- SV1s: 11/3 10 am / 11/2 3 pm
- HPSS: 11/3 10 am / 10/31 9:30 am
- PDSF: 11/6 10 am / 11/2 11 am
- Other systems: 11/3 10 am / 11/1 8 am
34 - Computer Room Layout
Up to 20,000 sf of computer space
Direct ESnet node at OC12
35 - 2000 Activities and Accomplishments
- PDSF Upgrade in conjunction with building move
36 - 2000 Activities and Accomplishments
- netCDF parallel support developed by NERSC staff for the Cray T3E
  - A similar effort is being planned to port netCDF to the IBM SP platform
- Communication for clusters: M-VIA and MVICH (see the sketch after this slide)
  - M-VIA and MVICH are VIA-based software for low-latency, high-bandwidth, inter-process communication
  - M-VIA is a modular implementation of the VIA standard for Linux
  - MVICH is an MPICH-based implementation of MPI for VIA
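Because MVICH implements the standard MPI interface, ordinary MPI codes run on it unchanged. The sketch below is illustrative only: a minimal ping-pong latency test of the kind used to evaluate low-latency communication layers, shown with the mpi4py binding for brevity (any MPI implementation runs the same calls).

```python
# Minimal ping-pong sketch of a latency measurement, the kind of test that
# motivates VIA-based MPI layers such as MVICH. Illustrative only.
from mpi4py import MPI
import time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
reps = 1000
buf = bytearray(8)              # tiny message to expose latency, not bandwidth

comm.Barrier()
start = time.perf_counter()
for _ in range(reps):
    if rank == 0:
        comm.Send(buf, dest=1)      # rank 0 sends, then waits for the echo
        comm.Recv(buf, source=1)
    elif rank == 1:
        comm.Recv(buf, source=0)    # rank 1 echoes every message back
        comm.Send(buf, dest=0)
elapsed = time.perf_counter() - start

if rank == 0:
    # one round trip is two messages, so half the round-trip time ~ latency
    print(f"half round-trip latency: {elapsed / reps / 2 * 1e6:.1f} microseconds")
```

Run with two ranks, e.g. `mpiexec -n 2 python pingpong.py`.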
37 - FY 2000 User Survey Results
- Areas of most importance to users
  - Available hardware (cycles)
  - Overall running of the center
  - Network access to NERSC
  - Allocations process
- Highest satisfaction (score > 6.4)
  - Problem reporting/consulting services (timely response, quality, follow-up)
  - Training
  - Uptime (SP and T3E)
  - Fortran (T3E and PVP)
- Lowest satisfaction (score < 4.5)
  - PVP batch wait time
  - T3E batch wait time
- Largest increases in satisfaction from FY 1999
  - PVP cluster (we introduced interactive SV1 services)
  - HPSS performance
  - Hardware management and configuration (we monitor and improve this continuously)
  - HPCF website (all areas are continuously improved, with a special focus on topics highlighted as needing improvement in the surveys)
  - T3E Fortran compilers
38 - Client Comments from Survey
- "Very responsive consulting staff that makes the user feel that his problem, and its solution, is important to NERSC"
- "Provide excellent computing resources with high reliability and ease of use."
- "The announcement managing and web-support is very professional."
- "Manages large simulations and data. The oodles of scratch space on mcurie and gseaborg help me process large amounts of data in one go."
- "NERSC has been the most stable supercomputer center in the country particularly with the migration from the T3E to the IBM SP."
- "Makes supercomputing easy."
39 - NERSC 3 Phase 2a/b
40 - Result: NERSC-3 Phase 2a
- System built and configured
  - Started factory tests 12/13
  - Expect delivery 1/5
- Undergoing acceptance testing
- General production April 2001
- What is different that needs testing
  - New processors
  - New nodes, new memory system
  - New switch fabric
  - New operating system
  - New parallel file system software
41 - IBM Configuration
                                      Phase 1          Phase 2a/b
Compute nodes                         256              134
Processors                            256 x 2 = 512    134 x 16 = 2144
Networking nodes                      8                2
Interactive nodes                     8                2
GPFS nodes                            16               16
Service nodes                         16               4
Total nodes (CPUs)                    304 (604)        158 (2528)
Total memory (compute nodes)          256 GB           1.6 TB
Total global disk (user accessible)   10 TB            20 TB
Peak (compute nodes)                  409.6 GF         3.2 TF
Peak (all nodes)                      486.4 GF         3.8 TF
Sustained System Perf.                33 GF            235 GF / 280 GF*
Production dates                      April 1999       April 2001 / Oct 2001
* Minimum - may increase due to the sustained system performance measure
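A quick arithmetic check of the Phase 2a/b figures above, as a minimal sketch (the per-CPU peak rate is derived from the table, not stated on the slide):

```python
# Quick arithmetic check of the IBM configuration table above.
phase2a_compute_nodes = 134
cpus_per_node = 16
peak_compute_tf = 3.2            # TF, from the table (compute nodes)

cpus = phase2a_compute_nodes * cpus_per_node
print(f"compute CPUs: {cpus}")                     # 2144, matches the table
print(f"implied peak per CPU: {peak_compute_tf * 1e3 / cpus:.2f} Gflop/s")

# Total nodes: 134 compute + 2 networking + 2 interactive + 16 GPFS + 4 service
total_nodes = 134 + 2 + 2 + 16 + 4
print(f"total nodes: {total_nodes}")               # 158, matches the table
```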
42 - What has been completed
- 6 nodes added to the configuration
- Memory per node increased to 12 GB for 140 compute nodes
  - Loan of full memory for Phase 2
- System installed and braced
- Switch adapters and memory added to system
- System configuration
- Security audit
- System testing for many functions
- Benchmarks being run and problems being diagnosed
43 - Current Issues
- Failures of two benchmarks need to be resolved
  - Best case: indicates broken hardware, likely with the switch adapters
  - Worst case: indicates fundamental design and load issues
- Variation
  - Loading and switch contention
- Remaining tests
  - Throughput, ESP
  - Full system tests
  - I/O
  - Functionality
44 - General Schedule
- Complete testing - TBD based on problem correction
- Production configuration set up
  - 3rd party s/w, local tools, queues, etc.
- Availability test
- Add early users 10 days after successful testing is complete
- Gradually add other users - complete 40 days after successful testing
- Shut down Phase 1 10 days after system is open to all users
- Move 10 TB of disk space - configuration will require Phase 2 downtime
- Upgrade to Phase 2b in late summer / early fall
45 - NERSC-3 Sustained System Performance Projections
- Estimates the amount of scientific computation that can really be delivered
- Depends on delivery of Phase 2b functionality
- The higher the last number is, the better, since the system remains at NERSC for 4 more years
[Chart annotations: Test/Config, Acceptance, etc.; Software lags hardware]
46 - NERSC Computational Power vs. Moore's Law
47 - NERSC 4
48 - NERSC-4
- NERSC 4 IS ALREADY ON OUR MINDS
- PLAN IS FOR FY 2003 INSTALLATION
- PROCUREMENT PLANS BEING FORMULATED
- EXPERIMENTATION AND EVALUATION OF VENDORS IS STARTING
  - ESP, ARCHITECTURES, BRIEFINGS
- CLUSTER EVALUATION EFFORTS
- USER REQUIREMENTS DOCUMENT (GREENBOOK) IMPORTANT
49 - How Big Can NERSC-4 Be?
- Assume a delivery in FY 2003
- Assume no other space is used in Oakland until NERSC-4
- Assume cost is not an issue (at least for now)
- Assume technology still progresses
- ASCI will have a 30 Tflop/s system running for over 2 years
50 - How Close Is 100 Tflop/s?
- Available gross space in Oakland is 3,000 sf without major changes
- Assume it is 70% usable
  - The rest goes to air handlers, columns, etc.
  - That gives 3,000 sf of space for racks
- IBM system used for estimates
  - Other vendors are similar
- Each processor is 1.5 GHz, to yield 6 Gflop/s
- An SMP node is made up of 32 processors
- 2 nodes in a frame
  - 64 processors in a frame = 384 Gflop/s per frame
- Frames are 32-36" wide and 48" deep
  - Service clearance of 3 feet in front and back (which can overlap)
  - 3 ft by 7 ft is 21 sf per frame
51 - Practical System Peak
- Rack distribution
  - 60% of racks are for CPUs
    - 90% are user/computation nodes
    - 10% are system support nodes
  - 20% of racks are for switch fabric
  - 20% of racks for disks
- 5,400 sf / 21 sf per frame = 257 frames
- 277 nodes that are directly used by computation
  - 8,870 CPUs for computation
  - System total is 9,856 CPUs (308 nodes)
- Practical system peak is 53 Tflop/s (arithmetic sketched after this slide)
  - 0.192 Tflop/s per node x 277 nodes
  - Some other places would claim 60 Tflop/s
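A minimal sketch of the frame/node/CPU arithmetic from the last two slides. All inputs come from the slides; small differences (e.g. 8,864 vs. the deck's 8,870 CPUs) reflect rounding in the original.

```python
# Sketch of the sizing arithmetic on slides 50-51, using the slides' numbers.
floor_sf = 5400                 # usable rack floor space used on slide 51
sf_per_frame = 21               # 3 ft x 7 ft footprint including clearance
frames = floor_sf // sf_per_frame          # 257 frames

cpu_frames = int(frames * 0.60)            # 60% of racks hold CPUs
nodes = cpu_frames * 2                     # 2 SMP nodes per frame -> ~308 nodes
compute_nodes = int(nodes * 0.90)          # 90% user/computation nodes -> ~277
cpus = compute_nodes * 32                  # 32 processors per node (~8,864)

peak_per_node_tf = 32 * 6 / 1000           # 32 CPUs x 6 Gflop/s = 0.192 Tflop/s
practical_peak_tf = compute_nodes * peak_per_node_tf

print(f"frames: {frames}, compute nodes: {compute_nodes}, CPUs: {cpus}")
print(f"practical peak: {practical_peak_tf:.0f} Tflop/s")                   # ~53
print(f"peak over all {nodes} nodes: {nodes * peak_per_node_tf:.0f} Tflop/s")  # ~59
```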
52 - How Much Use Will It Be?
- Sustained vs. peak performance (see the sketch after this slide)
  - Class A codes on the T3E sampled at 11%
  - LSMS
    - 44% of peak on T3E
    - So far 60% of peak on Phase 2a (maybe more)
- Efficiency
  - T3E runs at a 30-day average of about 95%
  - SP runs at a 30-day average of about 80%
  - Additional functionality still planned
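A minimal sketch combining the sustained-vs-peak and efficiency figures above into a delivered-performance estimate; the 53 Tflop/s practical peak is from slide 51, and the other ratios are the examples given on this slide.

```python
# Sketch: delivered performance = practical peak x sustained fraction x utilization.
practical_peak_tf = 53.0        # from slide 51

scenarios = {
    # (fraction of peak sustained by applications, 30-day system utilization)
    "typical code (11% of peak), SP-like 80% utilization": (0.11, 0.80),
    "LSMS-like code (44% of peak), T3E-like 95% utilization": (0.44, 0.95),
}

for name, (sustained_frac, utilization) in scenarios.items():
    delivered = practical_peak_tf * sustained_frac * utilization
    print(f"{name}: ~{delivered:.1f} Tflop/s delivered")
```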
53 - How Much Will It Cost?
- Current cost for a balanced system is about $7.8M per Tflop/s
- Aggressive
  - Cost should drop by a factor of 4
  - $1-2M per Tflop/s
  - Many assumptions
- Conservative
  - $3.5M per Tflop/s
- Added costs to install, operate, and balance the facility are 20%
- The full cost is $140M to $250M (see the sketch after this slide)
- Too bad
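A minimal sketch of where the $140M-$250M range comes from, applying the per-Tflop/s costs and 20% facility adder above to the roughly 60 Tflop/s peak from slide 51:

```python
# Sketch of the cost range quoted above. The 20% adder covers installation,
# operation, and balancing the facility.
peak_tf = 60.0                  # approximate peak from slide 51
adder = 1.20                    # 20% added costs

for label, cost_per_tf_m in [("aggressive", 2.0), ("conservative", 3.5)]:
    total_m = peak_tf * cost_per_tf_m * adder
    print(f"{label}: ${total_m:.0f}M")   # ~$144M and ~$252M, i.e. $140M-$250M
```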
54 - The Real Strategy
- Traditional strategy within existing NERSC Program funding: acquire new computational capability every three years
  - 3 to 4 times the capability of existing systems
- Early, commercial, balanced systems with focus on:
  - stable programming environment
  - mature system management tools
  - good sustained-to-peak performance ratio
- Total value of $25M-$30M
  - About $9-10M/yr. using lease-to-own
- Have two generations in service at a time
  - e.g., T3E and IBM SP
- Phased introduction if technology indicates
- Balance other system architecture components
55 - Necessary Steps
- 1) Accumulate and evaluate benchmark candidates
- 2) Create a draft benchmark suite and run it on several systems
- 3) Create the draft benchmark rules
- 4) Set basic goals and options for procurement and then create a draft RFP document
- 5) Conduct market surveys (vendor briefings, intelligence gathering, etc.) - we do this after the first items so we can be looking for the right information and can also tell the vendors what to expect. It is often the case that we have to "market" to the vendors on why they should be bidding, since it costs them a lot.
- 6) Evaluate alternatives and options for the RFP and tests. This is also where we do a technology schedule (what is available when) and estimate prices - price/performance, etc.
- 7) Refine RFP and benchmark rules for final release
- 8) Go through reviews
- 9) Release RFP
- 10) Answer questions from vendors
- 11) Get responses - evaluate
- 12) Determine best value - present results and get concurrence
- 13) Prepare to negotiate
- 14) Negotiate
- 15) Put contract package together
- 16) Get concurrence and approval
- 17) Vendor builds the system
- 18) Factory test
- 19) Vendor delivers it
56 - Rough Schedule
- Goal: NERSC-4 installation in the first half of CY 2003
- Vendor responses (11) in early CY 2002
- Award in late summer/fall of CY 2002
  - This is necessary in order to assure delivery and acceptance (22) in FY 2003
- A lot of work and long lead times (for example, we have to account for review and approval times, 90 days for vendors to craft responses, time to negotiate, ...)
- NERSC staff kick-off meeting the first week of March
  - Some planning work has already been done
57 - NERSC-2 Decommissioning
- RETIRING NERSC 2 IS ALREADY ON OUR MINDS
- IF POSSIBLE WE WOULD LIKE TO KEEP NERSC 2 IN SERVICE UNTIL 6 MONTHS BEFORE NERSC 4 INSTALLATION
  - Therefore, expect retirement at the end of FY 2002
- It is risky to assume there will be a viable vector replacement
- A team is working to determine possible paths for traditional vector users
  - Report due in early summer
58 - SUMMARY
- NERSC does an exceptionally effective job delivering services to DOE and other researchers
- NERSC has made significant upgrades this year that position it well for future growth and continued excellence
- NERSC has a well-mapped strategy for the next several years