Planned Machines: ASCI Purple, ALC and M

Provided by: marks117

Learn more at: https://www.cs.sandia.gov

1
Planned Machines: ASCI Purple, ALC and M&IC MCR

Presented to SOS7
Mark Seager
seager@llnl.gov
925-423-3141
ICCD ADH for Advanced Technology
Lawrence Livermore National Laboratory

This work was performed under the auspices of the
U.S. Department of Energy by the University of
California, Lawrence Livermore National
Laboratory under Contract No. W-7405-Eng-48.
2
Q1: What is unique in structure and function of
your machine?

Purple's unique structure is fat SMPs with 16
rails of Federation interconnect
MCR/ALC's unique structure is the shared global
file system
However, the most important point is that
applications are highly mobile between Purple,
MCR/ALC, White, Q and other clusters of SMP
systems

3
Purple's unique structure is fat SMPs with 16
rails of interconnect
Fibre Channel 2 I/O Network

16 Federation links per SMP in four switch planes
System Data and Control Networks

191 Parallel Batch/Interactive/Visualization Nodes

Purple System
100 TF/s, 30-45 TF/s delivered on sPPM/UMT2000
50 TB memory, 2.0 PB of disk @ 108 GB/s delivered
197 x 64-way Armada SMP w/ 16 Federation Links
4 Login/network nodes
Login/network nodes for login/NFS
8x10 Gb/s for parallel FTP on each Login
All external networking is 1-10 Gb/s Ethernet
Clustered I/O services for cluster wide file
system
Fibre Channel 2 I/O attach does not extend

Programming/Usage Model
Application launch over all compute nodes up to
8,192 tasks
1 MPI task/CPU and Shared Memory, full 64-bit
support
Scalable MPI (MPI_Allreduce, buffer space)
Likely usage
multiple MPI tasks/node with 4-16 OpenMP threads/MPI task
Single STDIO interface
Parallel I/O to single file, multiple serial I/O
(1 file/MPI task)
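
As a rough illustration of the likely usage above, the sketch below enumerates how one 64-way SMP node could be divided into MPI tasks running 4-16 OpenMP threads each. The function name, the choice of thread counts (4, 8, 16), and the assumption that every CPU hosts exactly one thread are illustrative and do not come from the slides.

```python
CPUS_PER_NODE = 64  # "64-way" SMP, from the Purple system description


def hybrid_layouts(cpus_per_node=CPUS_PER_NODE, thread_counts=(4, 8, 16)):
    """Map OpenMP threads per MPI task -> MPI tasks per node.

    Assumes one thread per CPU and no idle CPUs (an illustrative
    assumption, not something the slides state).
    """
    layouts = {}
    for threads in thread_counts:
        if cpus_per_node % threads != 0:
            continue  # skip thread counts that do not tile the node evenly
        layouts[threads] = cpus_per_node // threads
    return layouts


if __name__ == "__main__":
    for threads, tasks in hybrid_layouts().items():
        print(f"{threads} OpenMP threads/task -> {tasks} MPI tasks/node")
    # Nodes covered by an 8,192-task launch at 1 MPI task/CPU
    print(8192 // CPUS_PER_NODE, "nodes at 1 MPI task/CPU")
```

Under these assumptions, an 8,192-task launch at 1 MPI task/CPU spans 128 of the 64-way nodes.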

4
Unique feature of ALC/MCR is Lustre Lite shared
file system
Cluster wide file system leverages DOE/NNSA ASCI
PathForward Open Source Lustre development
5
Q2: What characterizes your applications?
Examples are intensities of message passing,
memory utilization, computing, I/O, and data.

Applications characterized as multi-physics
package simulations
All applications compute/comms intensive
Each package pushes performance envelope along a
different dimension
Some packages are MPI latency dominated
Some packages are MPI BW dominated
Memory BW is a critical factor, but expensive
memory subsystems don't perform much better than
commodity ones
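
One common way to picture the difference between latency-dominated and BW-dominated packages is a simple latency-plus-bandwidth cost model for a single message. The sketch below is not from the slides: the model, the function names, and the interconnect numbers (5 microseconds, 1 GB/s) are hypothetical and chosen only to show where the crossover falls.

```python
def message_time(nbytes, latency_s, bandwidth_bytes_per_s):
    """Time for one message under a latency + size/bandwidth model."""
    return latency_s + nbytes / bandwidth_bytes_per_s


def crossover_bytes(latency_s, bandwidth_bytes_per_s):
    """Message size at which the latency and bandwidth terms are equal."""
    return latency_s * bandwidth_bytes_per_s


if __name__ == "__main__":
    # Hypothetical interconnect parameters, for illustration only
    latency = 5e-6    # seconds per message
    bandwidth = 1e9   # bytes per second
    n_cross = crossover_bytes(latency, bandwidth)
    print(f"crossover at about {n_cross:.0f} bytes")
    for size in (8, 1_000_000):
        t = message_time(size, latency, bandwidth)
        kind = "latency" if size < n_cross else "bandwidth"
        print(f"{size} bytes: {t:.2e} s ({kind} dominated)")
```

In this model, a package that sends many small messages (well below the crossover size) is limited by latency, while one that sends few large messages is limited by bandwidth.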

6
Q3: What prior experience guided you to this
choice?

Mission and Applications
Budgets
Politics
Delivered performance
Balanced risk and cost performance

7
Strategic Approach: straddle multiple curves to
balance risk and opportunity of new disruptive
technologies

Three complementary curves
Delivers to today's stockpile's demanding needs
Production environment
For must-have deliverables now
Delivers transition for next generation
Near production environment
Provides cycles for science
Provides cycles for stockpile
Leading to next generation production systems
These are the capacity systems in a strategic
capacity/capability mix
Delivers affordable path to petaFLOP/s
Research environment, leading transition to
petaflop systems?
Are there other paths to a breakthrough regime by
2006-7?

Any given technology curve is ultimately limited
by Moore's Law
Straddle strategy for stability and preeminence
[Figure: performance versus time, from today to FY05, for the technology curves below]
Curves: Mainframes (RIP); Vendor integrated SMP Cluster (IBM SP, HP SC); IA32/IA64/AMD Linux; Cell-Based (IBM BG/L)
Cost labels: 10M/TF (White), 7M/TF (Q), 2M/TF (Purple C), 1.2M/TF (MCR), 500K/TF, 170K/TF
8
Q4. Other than your own machine, for your needs
what are the best and worst machines? And, why?

Clusters of SMPs with a full node OS make system
administration and programming much easier, but
scalability is an issue
Vectors suck
10x potential speed-up from vectorization on Cray
YMP class machines yielded only 1.5-2x in
delivered performance boost to stockpile codes
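
One possible reading of these numbers uses Amdahl's law, which the slides do not mention. The sketch below assumes the vectorized portion of a code ran at the full 10x and solves for the fraction of run time that must have been vectorized to yield a 1.5-2x overall boost. The function names are illustrative.

```python
def vectorized_fraction(overall_speedup, vector_speedup=10.0):
    """Solve Amdahl's law for the vectorized fraction f.

    overall_speedup = 1 / ((1 - f) + f / vector_speedup)
    """
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / vector_speedup)


def amdahl_speedup(fraction, vector_speedup=10.0):
    """Overall speedup when `fraction` of the run time speeds up by vector_speedup."""
    return 1.0 / ((1.0 - fraction) + fraction / vector_speedup)


if __name__ == "__main__":
    for observed in (1.5, 2.0):
        f = vectorized_fraction(observed)
        print(f"{observed}x overall implies about {f:.0%} of run time vectorized")
```

Under that assumption, a 1.5-2x delivered boost corresponds to only about 37-56% of the original run time benefiting from vectorization.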
