Title: Jacquard:
1 Jacquard Architecture and Application Performance Overview
NERSC Users Group, October 2005
2 Outline
- An engineering level overview of the HW and SW that make up jacquard.
  - CPUs
  - Memory
  - OS
  - Interconnect
- Will use seaborg as a point of reference.
3 seaborg.nersc.gov (review?)
[Diagram: Seaborg, 380 x 16 way SMP NHII Node, Colony Switch; labels: CSS0, CSS1, crossbar, HPSS]
Resource        Speed     Bytes
Registers       3 ns      256 B
L1 Cache        5 ns      32 KB
L2 Cache        45 ns     8 MB
Main Memory     300 ns    16 GB
Remote Memory   19 us     7 TB
GPFS            10 ms     50 TB
HPSS            5 s       9 PB
- 6080 dedicated CPUs, 96 shared login CPUs
- Hierarchy of caching, speeds
- Bottleneck determined by first depleted resource
4 jacquard.nersc.gov basics
[Diagram: Jacquard, 320 x 2 way Opteron node, InfiniBand Switch; labels: HT, IB, HPSS]
Resource        Speed       Bytes
Registers       0.5 ns      2 KB
L1 Cache        1.5 ns      64 KB
L2 Cache        45 ns       1 MB
Main Memory     70-117 ns   6 GB
Remote Memory   5 us        2 TB
GPFS            10 ms       15 TB
HPSS            5 s         9 PB
- 640 dedicated CPUs, 8 shared login CPUs
- Smaller caches, HT, Really Fast
- SMP? NUMA? SUMO.
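The two hierarchy tables can be turned into a small lookup that illustrates the "first depleted resource" idea from the seaborg slide: given a working-set size, find the first level large enough to hold it. This is only a sketch. The function name `first_level_that_fits` and the power-of-1024 units are assumptions of this example, the jacquard main-memory latency uses the low end of the 70-117 ns range, and real performance depends on access pattern as well as capacity.

```python
# Units are taken as powers of 1024 (an assumption; the slides do not say).
KB, MB, GB, TB, PB = (1024 ** i for i in range(1, 6))

# (level, latency in seconds, capacity in bytes), copied from the slides.
SEABORG = [
    ("Registers", 3e-9, 256),
    ("L1 Cache", 5e-9, 32 * KB),
    ("L2 Cache", 45e-9, 8 * MB),
    ("Main Memory", 300e-9, 16 * GB),
    ("Remote Memory", 19e-6, 7 * TB),
    ("GPFS", 10e-3, 50 * TB),
    ("HPSS", 5.0, 9 * PB),
]
JACQUARD = [
    ("Registers", 0.5e-9, 2 * KB),
    ("L1 Cache", 1.5e-9, 64 * KB),
    ("L2 Cache", 45e-9, 1 * MB),
    ("Main Memory", 70e-9, 6 * GB),   # low end of the 70-117 ns range
    ("Remote Memory", 5e-6, 2 * TB),
    ("GPFS", 10e-3, 15 * TB),
    ("HPSS", 5.0, 9 * PB),
]


def first_level_that_fits(hierarchy, working_set_bytes):
    """Return (level name, latency in s) of the first level that holds the data."""
    for name, latency_s, capacity_bytes in hierarchy:
        if working_set_bytes <= capacity_bytes:
            return name, latency_s
    raise ValueError("working set exceeds every level of the hierarchy")


# Hypothetical working-set sizes, chosen only to show where the machines differ.
for size in (16 * KB, 4 * MB, 10 * GB):
    print(size, first_level_that_fits(SEABORG, size),
          first_level_that_fits(JACQUARD, size))
```

A 4 MB working set, for instance, still fits in seaborg's 8 MB L2 cache but spills to main memory on jacquard, which is one way to read the "smaller caches" bullet.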
5 Opteron Block Diagram: Not strictly SMP
[Diagram labels: SDRAM, SDRAM, Switch, I/O]
1 TLB per CPU: 1K entries, 4K pages -> 4 MB coverage
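The TLB coverage figure is just entries times page size. The sketch below reproduces that arithmetic and checks whether a contiguous array stays within the TLB's reach. The helper names (`tlb_reach_bytes`, `pages_touched`, `fits_in_tlb`) and the example array size are assumptions made for illustration, and the array is assumed to be page-aligned.

```python
ENTRIES = 1024          # "1K entries" from the slide
PAGE_BYTES = 4 * 1024   # "4K pages" from the slide


def tlb_reach_bytes(entries=ENTRIES, page_bytes=PAGE_BYTES):
    """Memory that can be addressed without a TLB miss: entries x page size."""
    return entries * page_bytes


def pages_touched(array_bytes, page_bytes=PAGE_BYTES):
    """Pages spanned by a contiguous, page-aligned array (ceiling division)."""
    return -(-array_bytes // page_bytes)


def fits_in_tlb(array_bytes, entries=ENTRIES, page_bytes=PAGE_BYTES):
    """True if every page of the array can have a TLB entry at the same time."""
    return pages_touched(array_bytes, page_bytes) <= entries


print(tlb_reach_bytes() // 2**20, "MB of TLB coverage")

# Hypothetical example: a 1000 x 1000 matrix of 8-byte doubles.
matrix_bytes = 1000 * 1000 * 8
print(pages_touched(matrix_bytes), "pages;", "fits" if fits_in_tlb(matrix_bytes) else "does not fit")
```

Under these assumptions, data sets larger than about 4 MB cannot be fully mapped by the TLB at once, so strided sweeps over them can incur TLB misses on top of cache misses.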
6 HyperTransport: Good Stuff
Little conflict between data movement and computation
7 SMP size and memory contention
Why is Jacquard 2 way SMP?
Jacquard's numbers: 1 task 100, 2 tasks 98
8 Flops @ 2.2 GHz
- Peak Theoretical Flops
  - Double (64 bit) floats: 1 add, 1 mult: 2.2 GFlop/s
  - Single (32 bit) floats: 2 add, 2 mult: 4.4 GFlop/s
- Peak Realized Flops
  - Double (64 bit) floats: 1.9 GFlop/s
  - Single (32 bit) floats: 3.4 GFlop/s
- Your Flops?
  - Walltime is more important than flops
  - For a known algorithm flops are a sanity check
Memory BW: 4 GB/sec per CPU
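To make the "flops are a sanity check" point concrete, here is a minimal sketch that converts a known operation count and a measured walltime into a flop rate and compares it with the slide's figures, taken as given. The function names, the choice of dense matrix multiply (2n^3 floating-point operations) as the "known algorithm", and the 2.0 s walltime are all assumptions for illustration, not measurements from jacquard.

```python
PEAK_DOUBLE_GFLOPS = 2.2       # peak theoretical, double precision (from the slide)
REALIZED_DOUBLE_GFLOPS = 1.9   # peak realized, double precision (from the slide)


def matmul_flops(n):
    """Floating-point operations in a dense n x n matrix multiply: 2 * n^3."""
    return 2 * n ** 3


def achieved_gflops(flop_count, walltime_s):
    """Flop rate in GFlop/s for a known operation count and a measured walltime."""
    if walltime_s <= 0:
        raise ValueError("walltime must be positive")
    return flop_count / walltime_s / 1e9


def fraction_of(rate_gflops, reference_gflops):
    """Achieved rate as a fraction of a reference rate."""
    return rate_gflops / reference_gflops


# Hypothetical measurement: a 1000 x 1000 multiply that took 2.0 s of walltime.
rate = achieved_gflops(matmul_flops(1000), 2.0)
print(f"{rate:.2f} GFlop/s, "
      f"{fraction_of(rate, PEAK_DOUBLE_GFLOPS):.0%} of theoretical peak, "
      f"{fraction_of(rate, REALIZED_DOUBLE_GFLOPS):.0%} of realized peak")
```

A rate far below the realized peak suggests the code is limited by something other than arithmetic, such as memory bandwidth. A rate above the theoretical peak means the operation count or the timing is wrong.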
9 MPI Bandwidth: seaborg
10 MPI Bandwidth: Jacquard
11 Linux for AIX Users
- Linux and AIX are more similar than different
- Linux is not as good as AIX at keeping processes scheduled on the same CPU -> processor affinity work.
- Linux has easy interfaces to architectural and process performance information: /proc/cpuinfo, /proc/self, etc.
- AIX MPI is in /usr/{bin,lib}; Linux MPI is in modules
- Linux doesn't need -bmaxdata!
- Little vs. Big Endian
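Three of the points above lend themselves to a short sketch: reading /proc/cpuinfo, pinning a process to a CPU, and decoding binary data written on a big-endian machine (seaborg's POWER CPUs) from a little-endian one (jacquard's Opterons). The helper names `count_cpus`, `pin_to_cpu`, and `read_doubles` are hypothetical. `os.sched_setaffinity` is a Linux-only call in Python's standard library, and this is one possible way to do it rather than NERSC's recommended procedure.

```python
import os
import struct


def count_cpus(path="/proc/cpuinfo"):
    """Count 'processor' entries in /proc/cpuinfo; None if the file is absent."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return sum(1 for line in f if line.startswith("processor"))


def pin_to_cpu(cpu):
    """Pin the calling process to one CPU. Returns False where unsupported."""
    if not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(0, {cpu})  # pid 0 means the calling process
    return True


def read_doubles(raw, byte_order=">"):
    """Decode packed 64-bit floats; '>' is big endian, '<' is little endian."""
    count = len(raw) // 8
    return struct.unpack(f"{byte_order}{count}d", raw)


print("CPUs listed in /proc/cpuinfo:", count_cpus())

# Bytes as a big-endian machine would write them, decoded both ways.
big_endian_bytes = struct.pack(">2d", 1.0, 2.5)
print(read_doubles(big_endian_bytes, ">"))  # correct byte order
print(read_doubles(big_endian_bytes, "<"))  # wrong byte order gives garbage values
```

Unformatted binary files moved from seaborg to jacquard need this kind of explicit byte-order handling, or a compiler or runtime option that performs the conversion.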
12 Conclusions
- The underlying HW technologies (HT, IB, etc.) are quite promising. Opteron systems are delivering great price/performance.
- Still working some SDRAM, OS, and SW issues.
- What's useful to you? Let us know.