Jacquard: - PowerPoint PPT Presentation

Jacquard: Architecture and Application Performance Overview NERSC Users Group October 2005 Outline An engineering level overview of the HW and SW that make up ... – PowerPoint PPT presentation

Provided by: LBNLP3
Learn more at: https://www.nersc.gov

Transcript and Presenter's Notes

Title: Jacquard:


1
Jacquard: Architecture and Application Performance Overview
NERSC Users Group, October 2005
2
Outline
  • An engineering-level overview of the HW and SW
    that make up Jacquard.
  • CPUs
  • Memory
  • OS
  • Interconnect
  • Will use Seaborg as a point of reference.

3
seaborg.nersc.gov (review?)
[Diagram labels: Seaborg; 380 x 16-way SMP NHII node; Colony switch; CSS0; CSS1; crossbar]

Resource        Speed     Bytes
Registers       3 ns      256 B
L1 Cache        5 ns      32 KB
L2 Cache        45 ns     8 MB
Main Memory     300 ns    16 GB
Remote Memory   19 us     7 TB
GPFS            10 ms     50 TB
HPSS            5 s       9 PB

  • 6080 dedicated CPUs, 96 shared login CPUs
  • Hierarchy of caching, speeds
  • Bottleneck determined by first depleted resource

4
jacquard.nersc.gov basics
[Diagram labels: Jacquard; 320 x 2-way Opteron node; InfiniBand switch; HT; IB]

Resource        Speed       Bytes
Registers       0.5 ns      2 KB
L1 Cache        1.5 ns      64 KB
L2 Cache        45 ns       1 MB
Main Memory     70-117 ns   6 GB
Remote Memory   5 us        2 TB
GPFS            10 ms       15 TB
HPSS            5 s         9 PB

  • 640 dedicated CPUs, 8 shared login CPUs
  • Smaller caches, HT, Really Fast
  • SMP? NUMA? SUMO.
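
The Seaborg and Jacquard hierarchy tables can be turned into a small lookup that reports the first level a given working set fits in, which is one way to read "bottleneck determined by first depleted resource". This is a minimal sketch, not NERSC code: the function name `first_level_that_fits` is hypothetical, binary units (1 KB = 1024 bytes) are assumed, and Jacquard's main memory uses the upper end of the quoted 70-117 ns range.

```python
# Latency (seconds) and capacity (bytes) per level, copied from the slides.
# Assumption: binary units, i.e. 1 KB = 1024 bytes.
KB, MB, GB, TB, PB = (1024 ** i for i in range(1, 6))

SEABORG = [
    ("Registers", 3e-9, 256),
    ("L1 Cache", 5e-9, 32 * KB),
    ("L2 Cache", 45e-9, 8 * MB),
    ("Main Memory", 300e-9, 16 * GB),
    ("Remote Memory", 19e-6, 7 * TB),
    ("GPFS", 10e-3, 50 * TB),
    ("HPSS", 5.0, 9 * PB),
]

JACQUARD = [
    ("Registers", 0.5e-9, 2 * KB),
    ("L1 Cache", 1.5e-9, 64 * KB),
    ("L2 Cache", 45e-9, 1 * MB),
    ("Main Memory", 117e-9, 6 * GB),  # assumption: upper end of 70-117 ns
    ("Remote Memory", 5e-6, 2 * TB),
    ("GPFS", 10e-3, 15 * TB),
    ("HPSS", 5.0, 9 * PB),
]


def first_level_that_fits(hierarchy, working_set_bytes):
    """Return (level name, latency in seconds) of the first level that holds the working set."""
    for name, latency_s, capacity_bytes in hierarchy:
        if working_set_bytes <= capacity_bytes:
            return name, latency_s
    raise ValueError("working set exceeds every level in the table")


if __name__ == "__main__":
    for label, machine in (("seaborg", SEABORG), ("jacquard", JACQUARD)):
        print(label, first_level_that_fits(machine, 4 * MB))
```

With these numbers, a 4 MB working set still fits in Seaborg's L2 cache but spills to main memory on Jacquard, which is the "smaller caches" trade-off in concrete terms.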

5
Opteron Block Diagram: Not strictly SMP
[Diagram labels: SDRAM; SDRAM; Switch, I/O]
1 TLB per CPU: 1K entries x 4 KB pages -> 4 MB coverage
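
The TLB coverage figure is just entries times page size. The sketch below spells out that arithmetic, assuming "1K" means 1024 entries and "4K" means 4096-byte pages; the helper names are hypothetical.

```python
def tlb_coverage_bytes(entries, page_bytes):
    """Memory addressable without a TLB miss: one page per TLB entry."""
    return entries * page_bytes


def exceeds_tlb_reach(working_set_bytes, entries=1024, page_bytes=4096):
    """True if the working set is larger than what the TLB can map at once."""
    return working_set_bytes > tlb_coverage_bytes(entries, page_bytes)


if __name__ == "__main__":
    coverage = tlb_coverage_bytes(1024, 4096)
    print(coverage // (1024 * 1024), "MB")
```

A working set larger than this reach can incur TLB misses even when the data itself is resident in memory.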
6
HyperTransport: Good Stuff
Little conflict between data movement and computation
7
SMP size and memory contention
Why is Jacquard 2 way SMP?
Jacquard's numbers: 1 task 100, 2 tasks 98
8
Flops @ 2.2 GHz
  • Peak Theoretical Flops
    • Double (64-bit) floats: 1 add, 1 mult; 2.2 GFlop/s
    • Single (32-bit) floats: 2 add, 2 mult; 4.4 GFlop/s
  • Peak Realized Flops
    • Double (64-bit) floats: 1.9 GFlop/s
    • Single (32-bit) floats: 3.4 GFlop/s
  • Your Flops?
    • Walltime is more important than flops
    • For a known algorithm, flops are a sanity check
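
One way to use flops as a sanity check is to divide an algorithm's known operation count by the measured walltime and compare the result with the realized peak quoted on the slide. The sketch below does this for a naive dense matrix multiply. The function names and the 2.0 s walltime are illustrative assumptions, not measurements from Jacquard.

```python
# Realized peak rates from the slide, in GFlop/s per CPU.
PEAK_REALIZED_GFLOPS = {"double": 1.9, "single": 3.4}


def matmul_flops(n):
    """Approximate flop count of a naive n x n matrix multiply: n**3 multiplies plus n**3 adds."""
    return 2 * n ** 3


def achieved_gflops(flop_count, walltime_s):
    """Flop rate in GFlop/s for a run that took walltime_s seconds."""
    return flop_count / walltime_s / 1e9


def fraction_of_realized_peak(gflops, precision="double"):
    """Achieved rate as a fraction of the slide's realized peak."""
    return gflops / PEAK_REALIZED_GFLOPS[precision]


if __name__ == "__main__":
    # Hypothetical run: a 1000 x 1000 double-precision multiply taking 2.0 s.
    rate = achieved_gflops(matmul_flops(1000), 2.0)
    print(f"{rate:.2f} GFlop/s, {fraction_of_realized_peak(rate):.0%} of realized peak")
```

A rate far below the realized peak suggests the code is limited by something other than arithmetic, such as memory bandwidth; walltime remains the figure that matters.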

Memory BW: 4 GB/sec per CPU
9
MPI Bandwidth: Seaborg
10
MPI Bandwidth: Jacquard
11
Linux for AIX Users
  • Linux and AIX are more similar than different
  • Linux is not as good as AIX at keeping processes
    scheduled on the same CPU -> processor affinity
    work.
  • Linux has easy interfaces to architectural and
    process performance information: /proc/cpuinfo,
    /proc/self, etc.
  • AIX MPI is in /usr/bin and /usr/lib; Linux MPI is
    in modules
  • Linux doesn't need -bmaxdata!
  • Little vs. big endian
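
Three of the points above (processor affinity, /proc/cpuinfo, and byte order) can be illustrated from user space. This is a minimal sketch using only the Python standard library, not the tooling used on Jacquard. The helper names are hypothetical, `os.sched_setaffinity` is available on Linux only, and the "processor" line prefix matches the x86 layout of /proc/cpuinfo.

```python
import os
import struct
import sys


def pin_to_cpu(cpu):
    """Pin the calling process to a single CPU (Linux only) and return the new CPU set."""
    os.sched_setaffinity(0, {cpu})  # pid 0 means the calling process
    return os.sched_getaffinity(0)


def count_processors(cpuinfo_path="/proc/cpuinfo"):
    """Count logical CPUs by counting 'processor' entries in /proc/cpuinfo."""
    with open(cpuinfo_path) as f:
        return sum(1 for line in f if line.startswith("processor"))


def pack_double(value, byteorder):
    """Encode a 64-bit float with an explicit byte order ('big' or 'little')."""
    fmt = ">d" if byteorder == "big" else "<d"
    return struct.pack(fmt, value)


if __name__ == "__main__":
    print("native byte order:", sys.byteorder)
    print("1.0 big-endian:   ", pack_double(1.0, "big").hex())
    print("1.0 little-endian:", pack_double(1.0, "little").hex())
    if hasattr(os, "sched_setaffinity"):
        print("pinned to:", pin_to_cpu(min(os.sched_getaffinity(0))))
    if os.path.exists("/proc/cpuinfo"):
        print("processors:", count_processors())
```

Unformatted binary files written on a big-endian AIX system generally need their bytes swapped before being read on a little-endian Opteron system, and vice versa; writing with an explicit byte order avoids the ambiguity.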

12
Conclusions
  • The underlying HW technologies (HT, IB, etc.) are
    quite promising. Opteron systems are delivering
    great price/performance.
  • Still working on some SDRAM, OS, and SW issues.
  • What's useful to you? Let us know.