Title: Production Linux Capacity Computing at Los Alamos
1 Production Linux Capacity Computing at Los Alamos
- Steven R. Shaw, CCN-7
- High Performance Computing Systems
- Computing, Communications, and Networking Division
2 Topics
- Vision and goals
- Methodology
- Clustermatic and other components
- Current systems
- Lightning
- Flash
- Configuration management
- Operational opportunities
- Lessons learned
- Current and future work
- Questions
3 Our Capacity Vision
- The capacity vision, to meet the requirements of programs within reasonable resources, is to consolidate architectures, leverage commodity computing and Linux open source software, and standardize deployment over capacity systems. (Cheryl Wampler, ASC PI Meeting, March 1-4, 2004)
4 Goals for Production Linux Capacity Computing
- Respond to the need for additional capacity computing
- Provide stability and continuity for the user community
- Lower integration and operational costs by leveraging internal resources and open source software
- Use repeatable processes and automation to deploy new capacity quickly and to efficiently operate and maintain existing systems
5 Goals (continued)
- Provide more compute cycles to users by making systems easier to build and manage; do more with available resources
- Move toward a separate common file system, not tied to specific platforms
- Also move toward a separate, standards-based, scalable I/O network to file systems, archival storage, and other services
6 Methodology
- CCN Division took on the role of system integrator
- Successful collaborative relationships were established with CCS-1 (LA-MPI, Science Appliance), CCN-8 (Panasas FS, compilers, and tools), CCN-5 (network integration), and third-party software suppliers
- Built upon our Linux cluster experience from Pink and other systems
7 Pink Configuration
- 64 dual-processor I/O nodes
- 958 dual-processor production computing nodes
- GigE network
- Myrinet
- 1 dual-processor BProc master node
- 2 dual-processor front-end nodes
- Panasas Global FS
- LANL Yellow (soon Turquoise) network
- Open NFS servers
8 Science Appliance
- The key software in a Science Appliance is a suite that LANL developed called "Clustermatic"
- Clustermatic can completely control a cluster, from the BIOS up to a high-level programming environment
- It features the Beowulf Distributed Process Space (BProc), LinuxBIOS, and a variety of other open-source kernel modifications, utilities, and libraries
- Very quick node boot times
- Cluster boot and upgrade in minutes
- Manageable nodes from power-on
- Single system image for the entire cluster
- Quick process migration
9 Clustermatic Awards
- Research and Development Magazine's 2004 R&D 100 award:
- "Clustermatic is a revolutionary software suite for managing, monitoring, administering and operating clusters on network-connected computers running as a high-performance system. Clustermatic increases reliability and efficiency, decreases node autonomy, simplifies computer programming, reduces administration costs, and minimizes a user's reliance on unpredictable software, enabling commodity-based cluster networks to compete with the higher-cost supercomputers."
- The Clustermatic system was awarded the Excellence in Cluster Technology Award for Open Source Cluster Solutions at the 2004 ClusterWorld Conference and Exposition in April 2004.
10 Clustermatic Components
- A traditional cluster is built by replicating a complete system software environment on every node.
- In a Science Appliance (Clustermatic system), we have master nodes and slave nodes, but only the master nodes have a fully configured system.
- The slave nodes run a minimal software stack consisting of LinuxBIOS, Linux, and BProc.
- Culture change for users: not every tool and library exists on the slave nodes.
11 Clustermatic Components
- Most importantly, BProc enables a distributed process space across the nodes of the cluster: all user processes running on the slave nodes appear as processes running on the master node.
- Users create processes on the master node, and the system migrates them to the slave nodes.
- Standard input, output, and error streams are redirected to the master node.
- (Diagram: slave node processes appearing in the master node's process space.)
- Processes remain visible and controllable on the master.
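- A minimal C sketch follows, assuming the libbproc programming interface (sys/bproc.h) that ships with Clustermatic; the function names bproc_rfork() and bproc_currnode() and the -lbproc link flag come from that library's documentation, not from this presentation, so treat this as illustrative rather than authoritative.

    /*
     * Illustrative BProc process-migration sketch (assumes Clustermatic's
     * libbproc; compile on the master node, e.g. gcc rfork_demo.c -lbproc).
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>
    #include <sys/bproc.h>              /* bproc_rfork(), bproc_currnode() */

    int main(int argc, char **argv)
    {
        int node = (argc > 1) ? atoi(argv[1]) : 0;   /* target slave node */
        pid_t pid;

        /* Like fork(), but the child is migrated to the requested slave
         * node while remaining in the master node's process space, so
         * ps and kill on the master still see and control it. */
        pid = bproc_rfork(node);
        if (pid < 0) {
            perror("bproc_rfork");
            return 1;
        }

        if (pid == 0) {
            /* Child: now running on the slave node; stdout is redirected
             * back to the master node by BProc. */
            printf("child %d running on node %d\n",
                   (int)getpid(), bproc_currnode());
            return 0;
        }

        /* Parent: still on the master node; reap the migrated child. */
        waitpid(pid, NULL, 0);
        printf("parent on node %d reaped child %d\n",
               bproc_currnode(), (int)pid);
        return 0;
    }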
12 Other Key Components
- Panasas file system
- LA-MPI (see the MPI sketch after this list)
- User environment similar to other Los Alamos systems
- HPSS
- LSF
- TotalView
- HPC toolkit
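- Because LA-MPI implements the standard MPI interface, ordinary MPI codes need no source changes to run on these systems. The sketch below is a generic MPI program of that kind, not code from the presentation; the compile wrapper and launch command are site-specific and omitted.

    /* Generic MPI example: each rank reports where it is running.
     * Under BProc, every rank's output is forwarded to the master node. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, len;
        char host[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(host, &len);

        printf("rank %d of %d on %s\n", rank, size, host);

        MPI_Finalize();
        return 0;
    }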
13 Science Appliance Systems at LANL
- Lightning, Pink, Grendels, Flash, TLC
- MPI and LSF are BProc-integrated.
- Result: LANL Science Appliance systems are easy to use but are different from other LANL systems.
14 Los Alamos Platforms
15 Lightning Capacity System Overview (last week)
- System Hardware
- 1408 dual-processor LNXI AMD Opteron nodes (11.26 TeraOps peak, 5.6 TB memory)
- One Arima Rio Works HDAMA system board with AMD 8111 and 8131 chipsets
- Two 2.0 GHz 64-bit processors with 1 MB L2 cache/node
- Four GB of memory/node
- One 120-GB disk drive/node
- One ICEBOX controller/node for hardware monitoring
- Scalable to 2048 nodes (scalable design plans for interconnect)
- Myrinet interconnect (latency 7 usec, bandwidth 250 MB/sec); see the ping-pong sketch after the software list below
- Gigabit copper network to network services such as NFS and Panasas
- A copper-based 10/100 network for system monitoring, system reboot, etc.
- System Software
- Linux
- Clustermatic software
- Beoboot, LinuxBIOS, BProc, Supermon
- Compilers
- Message passing - LA-MPI
- Debugging - TotalView
- Archival storage - HPSS
- Resource management - Load Sharing Facility (LSF)
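- The Myrinet latency and bandwidth figures quoted above are the kind of numbers a simple two-rank MPI ping-pong test reports. The sketch below is a generic version of such a test; the message size and repetition count are arbitrary illustrative choices, not values from the slide.

    /* Generic MPI ping-pong sketch for estimating point-to-point latency
     * and bandwidth between ranks 0 and 1 (run with at least two ranks). */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define REPS  1000
    #define BYTES (1 << 20)              /* 1 MiB messages for bandwidth */

    static void pingpong(char *buf, int count, int rank)
    {
        MPI_Status st;
        if (rank == 0) {
            MPI_Send(buf, count, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, count, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &st);
        } else if (rank == 1) {
            MPI_Recv(buf, count, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &st);
            MPI_Send(buf, count, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
        }
    }

    int main(int argc, char **argv)
    {
        int rank, i;
        double t0;
        char *buf = malloc(BYTES);

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Latency: half the round-trip time of zero-byte messages. */
        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < REPS; i++)
            pingpong(buf, 0, rank);
        if (rank == 0)
            printf("latency   ~ %.1f usec\n",
                   (MPI_Wtime() - t0) / (2.0 * REPS) * 1e6);

        /* Bandwidth: bytes moved per second with large messages
         * (each iteration moves BYTES in each direction). */
        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (i = 0; i < REPS; i++)
            pingpong(buf, BYTES, rank);
        if (rank == 0)
            printf("bandwidth ~ %.0f MB/sec\n",
                   2.0 * REPS * (double)BYTES / (MPI_Wtime() - t0) / 1e6);

        free(buf);
        MPI_Finalize();
        return 0;
    }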
16 Lightning Integration and Deployment
- Timeline chart spanning August 2003 through November 2004; recoverable milestones:
- Contract signed mid-July 2003
- System delivered
- Integration/acceptance test
- Beta mode
- Limited availability
- General availability
- Laboratory standdown
- Secure environment
- Level 2 milestone (SCS): Lightning user environment
- Level 2 milestone (PC): November 25, 2005
- DP Award of Excellence for the integration effort
- Linpack run: 8.051 TF 64-bit Linpack, #6 on Top500
17 Lightning last week
18 Linux Production and Development Environment Model
- Production segments
- Development environments
- Support and system functions
19 Flash Timeline
- Assemble hardware: 11/17-11/19/04
- Stabilize hardware: 11/20-11/24
- Acceptance testing complete: 12/1
- Software install: 12/2-12/17
- 88 person-hours
- First I/O node system on Opteron
- Panasas and network setup in parallel
- Friendly users on 12/19
20 Configuration Management
- Philosophy
- All maintenance and installation is done within the configuration management system
- Motivation
- Do more with available resources
- Automation is key
- Expertise is encoded
- Automated systems are consistent and tireless
- Prevent errors and mitigate consequences
- Avoid creating error-likely situations
- Correlate effect with cause
- Manual actions reduce the capacity to respond
21 Configuration Management
- A framework for automating, to the fullest extent possible and in a cross-platform, common fashion, the configuration of a product
- Differentiate products at major boundaries that make sense (OS, Linux version, BProc or not, chip architecture, unique service, etc.)
- Databases become the documentation
22 Configuration Management Culture Change
- The database is pointless if the system diverges from its description due to actions taken outside the database
- All changes, even those that are temporary or for debugging, must be done using our configuration management tools
23 Configuration Management Tools
- Rsync: high-confidence mirroring of files
- Systemimager: installation, replication, and disaster recovery of the core system
- Cfengine: rule-based files for installation and configuration actions
- Systemimager provides the body; cfengine creates the soul
24 More Configuration Management Tools
- Revision Control System (RCS): track origin and history
- Annotated history within the cfengine database
- RPM (Red Hat Package Manager)
- Deterministic, verifiable, removable
- Culture change for some of our suppliers
25 Configuration Management Automation and Discipline
- Leads to systems that:
- Are more predictable (behavior can be ascertained from the database)
- Are more scalable (copies are easier)
- Are better documented
- Are easier to debug
- Are easier to repair
- Enables us to accomplish more with our available resources
26 Operational Opportunities
- Hardware maintenance
- The field-replaceable unit is the node
- Rapid boot time dramatically shortens the time to repair
- Use operations staff for hands-on maintenance; the vendor becomes a parts supplier and second-tier support
- Repair the node during prime time and burn it in, maintaining a supply of tested spares
- Increased job content and satisfaction for operators
27 Operational Opportunities
- Automated interrupt reporting
- When a node becomes interrupted, the HPC operators are notified by email and a GUI display
- Event-driven notification
- A record for the interrupt is generated automatically in the Remedy database, and its status is left open awaiting problem resolution
- When a node is returned to service, the Remedy ticket is automatically updated with the time
- In many cases the cause of the interrupt and the associated error message are captured in the ticket
- Results in more complete and accurate information
28 Lessons Learned
- Integration issues
- Be sure your suppliers understand your production support needs and are committed
- Remember: you own the complete support chain
- Culture change issues
- Users shift from "every tool everywhere" to a more deterministic model
- Be willing to negotiate the "rightweight" system
- Administrators: configuration management discipline
- Software suppliers: conform to the configuration management requirements
- BProc master node loading
29 Current and Future Work
- Lightning
- Integrate 256 additional nodes
- Reconfigure GigE and implement I/O nodes
- Increase Panasas to 200 TB
- 8Gb on all nodes
- Lightning and Flash
- 64-bit Linux 2.6, BProc V4
- POSIX threads
- OpenMPI
- PScalBB (scalable and available I/O network design)
30 Thanks and Questions
- My thanks to the following people for providing and helping with content:
- Harvey Wasserman, CCN-7
- Dave Neal, Jerry DeLapp, and Daryl Grunau, CCN-9
- Ron Minnich, CCS-1
- Cheryl Wampler, PADNWP
- Thank you for your attention, and now for your questions.