Title: Maui High Performance Computing Center
Maui High Performance Computing Center
Open System Support: An AFRL, MHPCC, and UH Collaboration
December 18, 2007
Mike McCraney, MHPCC Operations Director
Agenda
- MHPCC Background and History
- Open System Description
- Scheduled and Unscheduled Maintenance
- Application Process
- Additional Information Required
- Summary and Q/A
An AFRL Center
- An Air Force Research Laboratory Center
- Operational since 1993
- Managed by the University of Hawaii
- Subcontractor Partners: SAIC / Boeing
- A DoD High Performance Computing Modernization Program (HPCMP) Distributed Center
- Task Order Contract: Maximum Estimated Ordering Value $181,000,000
- Performance Dependent: 10 Years
  - 4-Year Base Period with 2, 3-Year Term Awards
A DoD HPCMP Distributed Center
Director, Defense Research and Engineering
DUSD (Science and Technology)
High Performance Computing Modernization Program
- Distributed Centers
- Allocated Distributed Centers
- Army High Performance Computing Research Center (AHPCRC)
- Arctic Region Supercomputing Center (ARSC)
- Maui High Performance Computing Center (MHPCC)
- Space and Missile Defense Command (SMDC)
- Dedicated Distributed Centers
- ATC
- AFWA
- AEDC
- AFRL/IF
- Eglin
- FNMOC
- JFCOM/J9
- Major Shared Resource Centers
- Aeronautical Systems Center (ASC)
- Army Research Laboratory (ARL)
- Engineer Research and Development Center (ERDC)
- Naval Oceanographic Office (NAVO)
- NAWC-AD
- NAWC-CD
- NUWC
- RTTC
- SIMAF
- SSCSD
- WSMR
MHPCC HPC History
- 1994 - IBM P2SC Typhoon Installed
- 1996 - 2000 IBM P2SC
- 2000 - IBM P3 Tempest Installed
- 2001 - IBM Netfinity Huinalu Installed
- 2002 - IBM P2SC Typhoon Retired
- 2002 - IBM P4 Tempest Installed
- 2004 - LNXi Evolocity II Koa Installed
- 2005 - Cray XD1 Hoku Installed
- 2006 - IBM P3 Tempest Retired
- 2007 - IBM P4 Tempest Reassigned
Hurricane Configuration Summary
Current Hurricane Configuration
- Eight 32-processor/32GB nodes (IBM P690 Power4)
- Jobs may be scheduled across nodes for a total of 288p
- Shared memory jobs can span up to 32p and 32GB
- 10TB shared disk available to all nodes
- LoadLeveler scheduling
- One job per node: 32p chunks can only support 8 simultaneous jobs
- Issues
  - Old technology, reaching end of life, upgradability issues
  - Cost prohibitive: constant power consumption, $400,000 annual power cost
Dell Configuration Summary
Proposed Shark Configuration
- 40 4-processor/8GB nodes (Intel 3.0GHz Dual Core Woodcrest processors)
- Jobs may be scheduled across nodes for a total of 160p
- Shared memory jobs can span up to 8p and 16GB (see the C/OpenMP sketch after this list)
- 10TB shared disk available to all nodes
- LSF scheduler
- One job per node: 8p chunks can support up to 40 simultaneous jobs
Features/Issues
- Shared use as open system and TDS (test and development system)
- Much lower power cost (Intel power management)
- System already maintained and in use
- System covered 24x7 (UPS, generator)
- Possible short-notice downtime
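As a rough illustration of the node-level shared-memory case noted above (jobs spanning up to 8p and 16GB on a single Shark node), here is a minimal C/OpenMP sketch. The loop, thread count, and compiler invocation are illustrative assumptions about a typical setup, not documented MHPCC procedure; the Intel 9.1 compiler listed on the Shark Software slide supports OpenMP via -openmp, while the GNU 3.4.6 compilers do not.

    /* Minimal single-node shared-memory (OpenMP) sketch. Assumed build:
     *     icc -openmp -o omp_sketch omp_sketch.c
     * OMP_NUM_THREADS would normally be set to 8 or fewer so the job stays
     * within one node's processors and memory. */
    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        const long n = 1000000;
        double sum = 0.0;
        long i;

        /* All threads share the node's memory; the reduction combines their
         * partial sums at the end of the loop. */
        #pragma omp parallel for reduction(+:sum)
        for (i = 0; i < n; i++) {
            sum += (double)i;
        }

        printf("max threads: %d, sum = %.0f\n", omp_get_max_threads(), sum);
        return 0;
    }

Jobs that need more than one node's processors would instead be distributed with MPI over the InfiniBand fabric (see the Shark Software slide).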
Jaws Architecture
- Cisco 6500 Core
- Head Node
  - Head node for system administration
  - Build nodes
  - Running parallel tools (pdsh, pdcp, etc.)
  - SSH communications between nodes
- Localized InfiniBand network
- Private Ethernet
- Dell Remote Access Controllers (private Ethernet)
  - Remote power on/off
  - Temperature reporting
  - Operability status
  - Alarms
- 10 blades per chassis
- CFS Lustre filesystem
  - Shared access (see the sketch after this list)
  - High performance
  - Uses the InfiniBand fabric
- Network links (diagram labels): Gig-E to nodes with 10 Gig-E uplinks (40 nodes per uplink), Fibre Channel, Cisco InfiniBand (copper)
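Because the Lustre filesystem above is shared across all nodes, a process on any node can create files that are immediately visible everywhere. The small C sketch below writes a per-node marker file into a shared directory; the mount point /lustre/scratch is a hypothetical example path, not the documented MHPCC layout.

    /* Tiny shared-filesystem sketch: each node that runs this writes a marker
     * file into a directory on the shared Lustre filesystem, where it is then
     * visible from every other node. "/lustre/scratch" is a hypothetical path. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char host[256];
        char path[512];
        FILE *fp;

        if (gethostname(host, sizeof(host)) != 0) {
            perror("gethostname");
            return 1;
        }

        /* One file per node under the shared directory. */
        snprintf(path, sizeof(path), "/lustre/scratch/marker.%s", host);

        fp = fopen(path, "w");
        if (fp == NULL) {
            perror("fopen");
            return 1;
        }
        fprintf(fp, "written from %s\n", host);
        fclose(fp);
        return 0;
    }

A tool such as pdsh can launch a program like this (or any command) on many nodes at once, which is how the head node's parallel tools are typically used.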
Shark Software
- Systems Software
- Red Hat Enterprise Linux v4 (2.6.9 kernel)
- InfiniBand: Cisco software stack
- MVAPICH / MPICH 1.2.7 over IB library (see the MPI example after this list)
- GNU 3.4.6 C/C++/Fortran
- Intel 9.1 C/C++/Fortran
- Platform LSF HPC 6.2
- Platform Rocks
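Given the MVAPICH/MPICH 1.2.7 stack and the GNU/Intel compilers listed above, a cross-node job would normally be an MPI program; a minimal C example is sketched below. The mpicc wrapper and the launch commands mentioned in the comments are the generic MPICH/MVAPICH tools and are assumptions about the local environment rather than documented site procedure.

    /* Minimal MPI sketch for a job spanning Shark nodes over InfiniBand.
     * Assumed build with the MVAPICH/MPICH stack:
     *     mpicc -o hello_mpi hello_mpi.c
     * Launching (mpirun/mpirun_rsh directly, or through the LSF scheduler)
     * depends on the local configuration. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, len;
        char host[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's rank */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total ranks in the job */
        MPI_Get_processor_name(host, &len);    /* node this rank landed on */

        printf("rank %d of %d on %s\n", rank, size, host);

        MPI_Finalize();
        return 0;
    }

Under Platform LSF, a binary like this would typically be submitted with bsub and a requested processor count; queue names and any MPI integration scripts are site-specific.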
Maintenance Schedule
- Current
  - 2:00pm - 4:00pm
  - 2nd and 4th Thursday (as necessary)
  - Check website (mhpcc.hpc.mil) for maintenance notices
- New Proposed Schedule
  - 8:00am - 5:00pm
  - 2nd and 4th Wednesdays (as necessary)
  - Check website for maintenance notices
  - Maintenance is only taken on scheduled systems
  - Check on Mondays before submitting jobs
Account Applications and Documentation
- Contact the Helpdesk or website for application information
- Documentation Needed
  - Account names, systems, special requirements
  - Project title, nature of work, accessibility of code
  - Nationality of applicant
  - Collaborative relevance with AFRL
- New Requirements
  - Case File information
  - For use in AFRL research collaboration
  - Future AFRL applicability
  - Intellectual property shared with AFRL
- Annual Account Renewals
  - September 30 is the final day of the fiscal year
Summary
- Anticipated migration to Shark
  - Should be more productive and able to support a wide range of jobs
  - Cutting-edge technology
  - Cost savings over Hurricane ($400,000 annual)
- Stay tuned for the timeline: likely end of January / early February
Mahalo