Title: Planned Machines: ASCI Purple, ALC and M
1. Planned Machines: ASCI Purple, ALC and M&IC MCR
- Presented to SOS7
- Mark Seager
- seager@llnl.gov
- 925-423-3141
- ICCD ADH for Advanced Technology
- Lawrence Livermore National Laboratory
This work was performed under the auspices of the
U.S. Department of Energy by the University of
California, Lawrence Livermore National
Laboratory under Contract No. W-7405-Eng-48.
2. Q1: What is unique in structure and function of your machine?
- Purple's unique structure is fat SMPs with 16 rails of Federation interconnect
- MCR/ALC's unique structure is the shared global file system
- However, the most important point is that applications are highly mobile between Purple, MCR/ALC, White, Q and other clusters of SMP systems
3. Purple's unique structure is fat SMPs with 16 rails of interconnect
Fibre Channel 2 I/O Network
16 Federation links per SMP in four switch planes
System Data and Control Networks
191 Parallel Batch/Interactive/Visualization Nodes
- Purple System
- 100 TF/s, with 30-45 TF/s delivered on sPPM/UMT2000
- 50 TB memory, 2.0 PB of disk at 108 GB/s delivered
- 197 x 64-way Armada SMP w/ 16 Federation links
- 4 Login/network nodes
- Login/network nodes for login/NFS
- 8x10 Gb/s for parallel FTP on each Login
- All external networking is 1-10 Gb/s Ethernet
- Clustered I/O services for cluster-wide file system
- Fibre Channel 2 I/O attach does not extend
- Programming/Usage Model
- Application launch over all compute nodes, up to 8,192 tasks
- 1 MPI task/CPU and shared memory, full 64b support
- Scalable MPI (MPI_Allreduce, buffer space)
- Likely usage
- Multiple MPI tasks/node with 4-16 OpenMP threads per MPI task
- Single STDIO interface
- Parallel I/O to single file, multiple serial I/O (1 file/MPI task)
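The hybrid usage model above comes down to simple arithmetic on the 64-way nodes. The following is a minimal sketch, assuming every CPU runs exactly one thread and that the thread count divides the node evenly; the function names `hybrid_layout` and `within_launch_limit` are hypothetical helpers, not part of any Purple tooling, and the node counts in the usage loop are illustrative.

```python
# Figures from the slide: 64-way SMP nodes, launches of up to 8,192 tasks.
CPUS_PER_NODE = 64
MAX_TASKS = 8192


def hybrid_layout(nodes, threads_per_task, cpus_per_node=CPUS_PER_NODE):
    """Return (MPI tasks per node, total MPI tasks), one thread per CPU.

    Hypothetical helper: assumes threads_per_task divides the node evenly.
    """
    if threads_per_task < 1 or cpus_per_node % threads_per_task != 0:
        raise ValueError("threads_per_task must divide cpus_per_node evenly")
    tasks_per_node = cpus_per_node // threads_per_task
    return tasks_per_node, nodes * tasks_per_node


def within_launch_limit(total_tasks, max_tasks=MAX_TASKS):
    """True if the job fits under the stated task-launch limit."""
    return total_tasks <= max_tasks


if __name__ == "__main__":
    # (nodes, OpenMP threads per MPI task); node counts are illustrative.
    for nodes, threads in [(128, 1), (191, 4), (191, 16)]:
        per_node, total = hybrid_layout(nodes, threads)
        print(nodes, threads, per_node, total, within_launch_limit(total))
```

Under these assumptions, a pure-MPI job (1 task/CPU) reaches the 8,192-task limit at 128 nodes, while 4-16 threads per task keeps a job spanning 191 nodes well under it.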
4. Unique feature of ALC/MCR is the Lustre Lite shared file system
Cluster-wide file system leverages DOE/NNSA ASCI PathForward Open Source Lustre development
5. Q2: What characterizes your applications? Examples are intensities of message passing, memory utilization, computing, I/O, and data.
- Applications characterized as multi-physics package simulations
- All applications compute/comms intensive
- Each package pushes performance envelope along a different dimension
- Some packages are MPI latency dominated
- Some packages are MPI BW dominated
- Memory BW is a critical factor, but expensive memory subsystems don't perform much better than commodity ones
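One common way to picture the split between latency-dominated and bandwidth-dominated packages is the simple latency-plus-bandwidth message cost model, T(n) = latency + n / bandwidth. The sketch below is an illustration of that model only; it is not taken from the slides, and the latency and bandwidth values used in the usage lines are hypothetical placeholders, not measurements of any machine named here.

```python
def message_time(nbytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time to send one message under the latency + size/bandwidth model."""
    return latency_s + nbytes / bandwidth_bytes_per_s


def crossover_bytes(latency_s, bandwidth_bytes_per_s):
    """Message size at which the latency term equals the transfer term."""
    return latency_s * bandwidth_bytes_per_s


def regime(nbytes, latency_s, bandwidth_bytes_per_s):
    """Classify a message size as 'latency' or 'bandwidth' dominated."""
    if nbytes < crossover_bytes(latency_s, bandwidth_bytes_per_s):
        return "latency"
    return "bandwidth"


if __name__ == "__main__":
    # Hypothetical interconnect: 5 microseconds latency, 1 GB/s bandwidth.
    lat, bw = 5e-6, 1e9
    for size in (64, 4096, 1_000_000):
        print(size, regime(size, lat, bw), message_time(size, lat, bw))
```

A package that exchanges many small messages sits on the latency side of the crossover, while one that moves large buffers sits on the bandwidth side, which is why a single interconnect figure does not predict performance for every package.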
6. Q3: What prior experience guided you to this choice?
- Mission and Applications
- Budgets
- Politics
- Delivered performance
- Balanced risk and cost performance
7. Strategic Approach: straddle multiple curves to balance risk and opportunity of new disruptive technologies
- Three complementary curves
- Delivers to today's stockpile's demanding needs
- Production environment
- For "must have" deliverables now
- Delivers transition for next generation
- Near production environment
- Provides cycles for science
- Provides cycles for stockpile
- Leading to next generation production systems
- These are the capacity systems in a strategic capacity/capability mix
- Delivers affordable path to petaFLOP/s
- Research environment, leading transition to petaflop systems?
- Are there other paths to a breakthrough regime by 2006-7?
Any given technology curve is ultimately limited by Moore's Law
[Figure: Straddle strategy for stability and preeminence. Axes: Performance vs. Time (Today to FY05). Curves: Mainframes (RIP); Vendor integrated SMP Cluster (IBM SP, HP SC); IA32/IA64/AMD Linux; Cell-Based (IBM BG/L). Cost labels: $10M/TF (White), $7M/TF (Q), $2M/TF (Purple C), $1.2M/TF (MCR), $500K/TF, $170K/TF.]
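The cost labels in the figure can be compared directly. The sketch below is a small worked calculation, assuming the labels are dollars per TF (the currency is an assumption) and using only the four labels that name a machine; the 100 TF system size and the helper names `system_cost` and `cost_ratio` are hypothetical and for illustration only.

```python
# Cost per TF as labeled in the figure (assumed to be US dollars).
COST_PER_TF = {
    "White": 10_000_000,
    "Q": 7_000_000,
    "Purple C": 2_000_000,
    "MCR": 1_200_000,
}


def system_cost(machine, teraflops):
    """Cost of a hypothetical system of the given size at that machine's cost per TF."""
    return COST_PER_TF[machine] * teraflops


def cost_ratio(baseline, other):
    """How many times cheaper per TF 'other' is than 'baseline'."""
    return COST_PER_TF[baseline] / COST_PER_TF[other]


if __name__ == "__main__":
    for name in COST_PER_TF:
        print(name, system_cost(name, 100), round(cost_ratio("White", name), 2))
```

Read this way, each step along or across the curves changes the price of a fixed amount of capability by a multiple rather than a few percent, which is the motivation for straddling more than one curve.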
8. Q4: Other than your own machine, for your needs what are the best and worst machines? And why?
- Clusters of SMPs with full node OS make system administration and programming much easier, but scalability is an issue
- Vectors suck
- 10x potential speed-up from vectorization on Cray Y-MP class machines yielded only a 1.5-2x delivered performance boost to stockpile codes
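One way to read the 10x versus 1.5-2x gap is through Amdahl's law, which is an interpretation added here and not something the slide states. The sketch below assumes the 10x figure applies only to the vectorizable fraction of run time and solves for the fraction implied by the delivered speed-up; the function names are hypothetical.

```python
def amdahl_speedup(fraction, factor):
    """Overall speed-up when 'fraction' of run time is accelerated by 'factor'."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def implied_fraction(observed_speedup, factor):
    """Fraction of run time that must be accelerated to see 'observed_speedup'."""
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / factor)


if __name__ == "__main__":
    # 10x potential, 1.5x to 2x delivered (figures from the slide).
    for observed in (1.5, 2.0):
        f = implied_fraction(observed, 10.0)
        print(observed, round(f, 3), round(amdahl_speedup(f, 10.0), 3))
```

Under that assumption, a 1.5-2x delivered boost corresponds to only about 37% to 56% of the run time benefiting from vectorization.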