Title: SGI Contributions to Supercomputing by 2010
2SGI Contributions to Supercomputing by 2010
- Steve Reinhardt
- Director of Engineering
- spr_at_sgi.com
3Supercomputing Aspects of SGI
Visualization
- Visual Area Networking (VAN)
- Deliver images wherever the users are
- Enable collaboration
HPC
- Scalable servers and superclusters
- SGI Origin family
- SGI Altix 3000 family
- SGI NUMAflex
Data Access
- Deliver data wherever the users are
- CXFS/WAN demo at SC02
- Each server reads directly, at channel speeds
- Largest installed configuration: 0.5 PB
NOTE: No enterprise references
4SGI in HPC
- Memory is unifying theme
- globally addressable up to O(PB)
- incorporating varied processing types
- latency (-> 500 ns for 10KP)
- bandwidth, in bytes per flop (BF): local stride-1 BF -> 2.0; local gather/scatter BF 0.5-1.0; remote bisection BW BF -> 0.3
- Sustained performance
- differentiated scaling (latency and bandwidth)
- better memory interface
- new synchronization substrate
- Raise the level of programming abstraction
- UPC/CAF (near-term)
- parallel Matlab (radical)
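The bandwidth targets on this slide are given as bytes per flop (BF). As a rough illustration, the sketch below converts a BF target into the memory bandwidth a processor would need to sustain it. The function names are hypothetical, and the 4 GF peak rate is an assumption borrowed from the STREAM Triad slide in this deck, not a stated design point for these targets.

```python
def required_bandwidth_gbs(peak_gflops: float, bytes_per_flop: float) -> float:
    """Bandwidth (GB/s) needed to sustain a bytes-per-flop ratio at a given peak rate."""
    return peak_gflops * bytes_per_flop


# BF targets as read from the slide; single values are stored as (low, high) pairs
# so the gather/scatter range can be handled the same way.
TARGETS_BF = {
    "local stride-1": (2.0, 2.0),
    "local gather/scatter": (0.5, 1.0),
    "remote bisection": (0.3, 0.3),
}

# Assumption for illustration only: a 4 GFLOP/s processor.
PEAK_GFLOPS = 4.0


def bandwidth_table(peak_gflops: float) -> dict:
    """Map each access pattern to the (low, high) bandwidth in GB/s it implies."""
    return {
        name: (
            required_bandwidth_gbs(peak_gflops, low),
            required_bandwidth_gbs(peak_gflops, high),
        )
        for name, (low, high) in TARGETS_BF.items()
    }


if __name__ == "__main__":
    for name, (low, high) in bandwidth_table(PEAK_GFLOPS).items():
        print(f"{name}: {low:.1f} to {high:.1f} GB/s per processor")
```

The point of the arithmetic is that each BF target scales linearly with processor peak rate, so faster processors raise the bandwidth the memory system must deliver.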
5SGI in HPC
- SGI Origin family
- MIPS processors, Irix OS
- exploit low power consumption, ISA control
- SGI Altix family
- IPF (Itanium Processor Family) processors, Linux OS
- exploit SGI interconnect, with industry-standard ISA and rapid OS maturation
6Balancing High Innovation and Profitability
(Chart axes: Profitability, low to high; Differentiation, low to high)
7System / Component Differentiation
(Chart categories: Processor, Memory, Interconnect, OS, System Cost, System Value)
8Ideal Differentiation
(Chart categories: Processor, Memory, Interconnect, OS, System Cost, System Value)
9SGI Origin series
(Chart categories: Processor, Memory, Interconnect, OS, System Cost, System Value)
10Quadrics cluster
(Chart categories: Processor, Memory, Interconnect, OS, System Cost, System Value)
11IBM SP3 system
(Chart categories: Processor, Memory, Interconnect, OS, System Cost, System Value)
12SGI Altix system
(Chart categories: Processor, Memory, Interconnect, OS, System Cost, System Value)
13STREAM Triad Results
- World-record result for a µP-based system; fourth overall
- 0.8 BF (6.4 GB/s shared by 2 x 4 GF processors)
- Single kernel NUMA placement support in Linux
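For readers unfamiliar with the benchmark, the sketch below shows the Triad kernel and the arithmetic behind the 0.8 BF figure on this slide. It is a minimal illustration in plain Python, not the STREAM benchmark code itself, and the helper names are hypothetical. The per-iteration counts assume 8-byte doubles, with two loads and one store per element.

```python
def triad(a, b, c, q):
    """Triad kernel: a[i] = b[i] + q * c[i] for every element."""
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def bytes_per_flop(bandwidth_gbs: float, gflops_per_cpu: float, cpus: int) -> float:
    """Machine balance: memory bandwidth divided by aggregate peak flop rate."""
    return bandwidth_gbs / (gflops_per_cpu * cpus)


# Figures from the slide: 6.4 GB/s shared by two 4 GF processors.
MACHINE_BALANCE = bytes_per_flop(6.4, 4.0, 2)

# What the kernel asks for per iteration, assuming 8-byte doubles:
# two loads (b[i], c[i]) plus one store (a[i]), and one multiply plus one add.
TRIAD_BYTES_PER_ITER = 3 * 8
TRIAD_FLOPS_PER_ITER = 2
KERNEL_DEMAND = TRIAD_BYTES_PER_ITER / TRIAD_FLOPS_PER_ITER

if __name__ == "__main__":
    a = [0.0] * 4
    triad(a, [1.0] * 4, [2.0] * 4, 3.0)
    print("triad result:", a)
    print(f"machine balance: {MACHINE_BALANCE:.1f} bytes/flop")
    print(f"kernel demand:   {KERNEL_DEMAND:.1f} bytes/flop")
```

Because the kernel's demand in bytes per flop far exceeds the machine balance, Triad performance is governed by memory bandwidth rather than by peak flop rate, which is why the slide reports it as a memory-system result.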
14Interconnect Scaling
MPI bandwidth versus distance (MB/s)
Coming soon
15Altix 3000 Throughput Performance
Throughput of 4 jobs, each 8P, crash application
System: Altix 3000, 32P, 64GB, XVM, TP900
Individual jobs in the throughput mix are between 0.4% and 1.8% slower than the standalone case
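A throughput comparison of this kind can be sketched as below. The code reads the slide's slowdown figures as percentages, which is an assumption, and every timing in it is a made-up placeholder rather than a measured result. The function name is hypothetical.

```python
def slowdown_percent(standalone_seconds: float, in_mix_seconds: float) -> float:
    """Percent by which a job runs slower inside the mix than when run alone."""
    return (in_mix_seconds / standalone_seconds - 1.0) * 100.0


# Configuration from the slide: 4 jobs of 8 processors each on a 32-processor system.
JOBS = 4
PROCESSORS_PER_JOB = 8
SYSTEM_PROCESSORS = 32

# Hypothetical timings in seconds, for illustration only (not from the source).
STANDALONE_SECONDS = 1000.0
IN_MIX_SECONDS = [1004.0, 1009.0, 1013.0, 1018.0]

if __name__ == "__main__":
    # The mix fills the machine exactly, so no job shares a processor with another.
    assert JOBS * PROCESSORS_PER_JOB == SYSTEM_PROCESSORS
    for job, seconds in enumerate(IN_MIX_SECONDS, start=1):
        pct = slowdown_percent(STANDALONE_SECONDS, seconds)
        print(f"job {job}: {pct:.1f}% slower than standalone")
```

With the machine fully but not over-subscribed, any slowdown in the mix comes from contention for shared resources such as memory, interconnect, and I/O rather than from processor sharing.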
16Summary: SGI for HPC
- Long-term directions
- Memory: globally addressable, high BW, low latency
- Strong delivered performance
- differentiated scaling (latency and bandwidth)
- better memory interface
- new synchronization substrate
- Raise the level of programming abstraction
- UPC/CAF (near-term), parallel Matlab (radical)
- Near-term deliverables
- Altix 3000 system
- distinguished performance
- rapidly maturing Open Source software base