MITgcm - PowerPoint PPT Presentation

Provided by: chr1214

Transcript and Presenter's Notes

Title: MITgcm


1
MITgcm
History
2
MITgcm Family Tree
3
MITgcm
Algorithm and applications
4
MITgcm UV
  • Ultra-Versatile Implementation

5
Target Compute Environments
  • IBM, SGI, Sun, Intel et al., Digital, HP:
    SMP and also clustered SMP
  • T3E (SGI/CRAY)
  • Single- and multi-processor vector: NEC SX-4,
    CRAY C90 (SGI)

6
Goals
  • Useful today
    • Good performance on current-generation machines
    • TAMC compatible
  • With a future
    • Practical route to a teraflop/s
    • Practical route to a desktop gigaflop/s

7
Challenges
  • Cache blocking v. long vectors.
  • Isolating and minimizing communication/synchronization primitives.
  • OS idiosyncrasies.
  • Varying degrees of compiler capability.

8
Technologies
  • Vector processing.
  • Caches, deep memory hierarchy.
  • MPI.
  • HPF.
  • Multi-threading.
  • Network interfaces: SCI, Memory Channel,
    GigaRing, Arctic.

9
Cache and vector
  • Voodoo numbers

[Figure: tile layout. The domain is split into nSx × nSy tiles of sNx × sNy points, each surrounded by overlaps of width OLx and OLy.]
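The labels in the diagram above are MITgcm's tile-size parameters. Each tile holds sNx × sNy interior points with overlap widths OLx and OLy, and the domain has nSx × nSy tiles. A minimal Python sketch of the index ranges these imply is shown below. It assumes a single process and 1-based global indices as in the Fortran code. The `TileLayout` class is a hypothetical helper, not part of MITgcm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileLayout:
    """Hypothetical single-process tile layout (illustrative, not MITgcm code).

    sNx, sNy: interior points per tile; OLx, OLy: overlap widths;
    nSx, nSy: number of tiles in x and y.
    """
    sNx: int
    sNy: int
    OLx: int
    OLy: int
    nSx: int
    nSy: int

    @property
    def Nx(self) -> int:
        return self.sNx * self.nSx

    @property
    def Ny(self) -> int:
        return self.sNy * self.nSy

    def interior(self, bi: int, bj: int):
        """Global 1-based (first, last) x and y ranges owned by tile (bi, bj), 1-based."""
        x0 = (bi - 1) * self.sNx + 1
        y0 = (bj - 1) * self.sNy + 1
        return (x0, x0 + self.sNx - 1), (y0, y0 + self.sNy - 1)

    def with_overlap(self, bi: int, bj: int):
        """Interior ranges widened by the overlaps; these may fall outside the
        global domain and are then filled by neighbours or boundary conditions."""
        (x0, x1), (y0, y1) = self.interior(bi, bj)
        return (x0 - self.OLx, x1 + self.OLx), (y0 - self.OLy, y1 + self.OLy)

    def tile_shape(self):
        """Local array extent, i.e. Fortran bounds (1-OLx:sNx+OLx, 1-OLy:sNy+OLy)."""
        return (self.sNx + 2 * self.OLx, self.sNy + 2 * self.OLy)
```

With the illustrative values `TileLayout(sNx=4, sNy=3, OLx=2, OLy=1, nSx=2, nSy=2)`, tile (2, 1) owns x = 5..8 and y = 1..3. Its local array also covers x = 3..10 and y = 0..4.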
10
Vector mode
  • Strips, or one whole domain, i.e. a four-processor
    example.

[Figure: strip decomposition of the domain; labels sNy, sNx, Nx.]
11
Cache, deep memory mode
  • Block the domain.

[Figure: domain divided into blocks; labels sNx, sNy.]
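The main difference between the vector mode and the cache mode is loop ordering. The Python sketch below uses hypothetical function names. It applies the same sequence of pointwise operations in two ways: as full-width sweeps of the domain, or block by block. In the block-by-block version, each sNx × sNy block is reused by every operation while it is still in cache. Python itself will not show any cache effect. The sketch only illustrates that, for pointwise work, both orderings compute the same result.

```python
from typing import Callable, List

Field = List[List[float]]        # field[j][i]: j is the row (y), i the column (x)
Op = Callable[[float], float]    # a pointwise operation


def vector_mode(field: Field, ops: List[Op]) -> Field:
    """Each operation sweeps the whole domain before the next one starts
    (long inner loops, suited to vector hardware)."""
    out = [row[:] for row in field]
    for op in ops:
        for j, row in enumerate(out):
            out[j] = [op(v) for v in row]
    return out


def cache_mode(field: Field, ops: List[Op], sNx: int, sNy: int) -> Field:
    """All operations are applied to one sNx x sNy block before moving on,
    so the block can stay resident in cache between operations."""
    out = [row[:] for row in field]
    ny, nx = len(out), len(out[0])
    for j0 in range(0, ny, sNy):
        for i0 in range(0, nx, sNx):
            for op in ops:
                for j in range(j0, min(j0 + sNy, ny)):
                    for i in range(i0, min(i0 + sNx, nx)):
                        out[j][i] = op(out[j][i])
    return out
```

Stencil operations are different: a block then also needs values from its neighbours, and that is the role of the overlap regions.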
12
What about the algorithm?
  • We know it vectorizes
  • Can it be blocked? Let's hope so!

13
MITgcm UV structure
Range 1:sNx, ..

Can be done block by block
Fill overlaps
Don't need any long vector sweeps
Range 1-OLx:sNx+OLx, ..
Depends on alg. and problem!
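One reading of this structure is as follows. Once the overlaps are filled, a tile can compute on the extended range 1-OLx:sNx+OLx without further communication, and each stencil pass consumes one overlap point per side. The Python sketch below is a minimal 1-D illustration under that assumption. It uses a periodic domain and a 3-point average as a stand-in for the model's operators, and all function names are hypothetical.

```python
from typing import List


def smooth(a: List[float]) -> List[float]:
    """3-point average; the result is one point shorter at each end."""
    return [(a[i - 1] + a[i] + a[i + 1]) / 3.0 for i in range(1, len(a) - 1)]


def smooth_periodic(u: List[float]) -> List[float]:
    """Reference: the same 3-point average over the whole periodic domain."""
    n = len(u)
    return [(u[i - 1] + u[i] + u[(i + 1) % n]) / 3.0 for i in range(n)]


def fill_overlaps(u: List[float], sNx: int, OLx: int, bi: int) -> List[float]:
    """Copy tile bi's interior plus OLx overlap points per side (periodic wrap).
    Local index k corresponds to global index bi*sNx - OLx + k."""
    n = len(u)
    start = bi * sNx - OLx
    return [u[(start + k) % n] for k in range(sNx + 2 * OLx)]


def tiled_smooth(u: List[float], sNx: int, OLx: int, passes: int) -> List[float]:
    """Apply `passes` smoothing passes tile by tile after a single overlap fill."""
    if len(u) % sNx != 0:
        raise ValueError("domain length must be a multiple of sNx")
    if passes > OLx:
        raise ValueError("each pass consumes one overlap point per side")
    out: List[float] = []
    for bi in range(len(u) // sNx):
        a = fill_overlaps(u, sNx, OLx, bi)   # the only communication step
        for _ in range(passes):
            a = smooth(a)                    # no exchange between passes
        trim = OLx - passes                  # overlap points still left per side
        out.extend(a[trim:trim + sNx])
    return out
```

With OLx = 2, two passes can run after a single exchange. A wider overlap trades extra redundant computation at tile edges for fewer communication points.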

14
Communication
  • Minimize comm. points
  • Keep at high level, not in compute primitives.
  • Overlap with computation (needs hardware and OS
    support to have an effect!).
  • Multi-threaded and/or multi-process (MPI).

15
MITgcm UV communication
Send Gs
Update overlaps
Receive Gs
Depends on alg. and problem!
Send and receive ps
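The send / update / receive sequence above is a split-phase exchange:

1. Communication is started.
2. Work that does not depend on overlap data proceeds.
3. The receive is completed only when the overlap values are needed.

The Python sketch below mimics that ordering. It uses a worker thread from `concurrent.futures` as a stand-in for non-blocking sends and receives, and the function names are hypothetical. As slide 14 notes, real overlap needs hardware and OS support; CPython threads will not provide it for this workload, so only the ordering is illustrated.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def exchange_edges(tiles: List[List[float]]) -> List[Tuple[float, float]]:
    """Hypothetical exchange: (left-neighbour edge, right-neighbour edge)
    for each tile of a periodic 1-D row of tiles."""
    n = len(tiles)
    return [(tiles[(b - 1) % n][-1], tiles[(b + 1) % n][0]) for b in range(n)]


def step_overlapped(tiles: List[List[float]]) -> List[List[float]]:
    """One 3-point-average step in which interior work overlaps the
    (emulated) exchange. Each tile must hold at least two points."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(exchange_edges, tiles)   # start exchange ("send")
        # Interior points need no overlap data, so compute them meanwhile.
        interiors = [[(t[i - 1] + t[i] + t[i + 1]) / 3.0 for i in range(1, len(t) - 1)]
                     for t in tiles]
        edges = pending.result()                       # complete exchange ("receive")
    out = []
    for t, inner, (left, right) in zip(tiles, interiors, edges):
        first = (left + t[0] + t[1]) / 3.0             # needs the left neighbour
        last = (t[-2] + t[-1] + right) / 3.0           # needs the right neighbour
        out.append([first] + inner + [last])
    return out
```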
16
MPI and shared memory
  • Repeat domain in each process
  • Shared-memory copies → messaging calls

[Figure: domain repeated in each process; labels sNy, Nx.]
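Keeping communication at a high level means one exchange call can be backed either by shared-memory copies or by messaging calls. The Python sketch below shows two interchangeable backends for a periodic 1-D row of tiles. Here `queue.Queue` merely stands in for message passing; it is not MPI, and the function names are hypothetical.

```python
import queue
from typing import List, Tuple

Halo = Tuple[float, float]   # (value from left neighbour, value from right neighbour)


def exchange_shared(tiles: List[List[float]]) -> List[Halo]:
    """Shared-memory style: read the neighbours' edge values directly (plain copies)."""
    n = len(tiles)
    return [(tiles[(b - 1) % n][-1], tiles[(b + 1) % n][0]) for b in range(n)]


def exchange_messages(tiles: List[List[float]]) -> List[Halo]:
    """Message-passing style: every tile sends its edges, then receives from its
    neighbours. One mailbox per (destination, side) stands in for send/recv."""
    n = len(tiles)
    from_left = [queue.Queue() for _ in range(n)]
    from_right = [queue.Queue() for _ in range(n)]
    for b, t in enumerate(tiles):                  # send phase
        from_left[(b + 1) % n].put(t[-1])          # my right edge: left halo of tile b+1
        from_right[(b - 1) % n].put(t[0])          # my left edge: right halo of tile b-1
    return [(from_left[b].get(), from_right[b].get()) for b in range(n)]  # receive phase
```

Because both functions return the same overlap values, the compute code that consumes them does not need to know which backend ran. That separation is what lets the same model source target threads, MPI, or a mix.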
17
Exploiting network interface (NI) innovation
  • Ongoing collaborations
  • T3E production hardware
  • HP, Sun, Digital - semi-production
  • Intel, IBM experimental
  • Rapidly evolving field
  • MITgcm UV can exploit it

18
Compiler and OS maturity
  • F77 v. F90
    • F77 is universally OK
  • On SMP, predictable performance needs batch
    execution in a private environment.
    • Not always configured that way.
  • Virtual memory
    • Makes cache speedup hard to predict

19
Is UV really Ugly Version?
  • Example code.

20
Some HP Numbers.1
Forward Code
  • Per proc. grid size 64x32x20
  • 100 Mflop/s per proc.
  • Number of procs 16
  • Total problem size 244x132x20
  • Total performance 1.6 GFlop/s
  • Time per block per time step 0.2 secs

21
Some HP Numbers.2
Inverter
  • Per proc. grid size 64x32
  • 200 Mflop/s per proc.
  • Number of procs 16
  • Total problem size 244x132
  • Total performance 3.2 GFlop/s
  • Time per block per time step 0.2 secs
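On the two HP slides, each aggregate figure is the per-processor rate multiplied by the processor count. The small Python sketch below checks that arithmetic. It also derives a rough implied amount of work per grid point per time step, under an assumption the slides do not state: that the per-processor rate is sustained for the whole 0.2 s step.

```python
def aggregate_gflops(mflops_per_proc: float, nprocs: int) -> float:
    """Total rate in GFlop/s from a per-processor rate in Mflop/s."""
    return mflops_per_proc * nprocs / 1000.0


def flops_per_point_per_step(mflops_per_proc: float, secs_per_step: float,
                             points_per_proc: int) -> float:
    """Rough work per grid point per time step, ASSUMING the per-processor
    rate is sustained for the whole step (an estimate, not a measurement)."""
    return mflops_per_proc * 1e6 * secs_per_step / points_per_proc


# Forward code: 64x32x20 points per processor, 100 Mflop/s each, 16 processors.
forward_total = aggregate_gflops(100, 16)
forward_work = flops_per_point_per_step(100, 0.2, 64 * 32 * 20)
# Inverter: 64x32 points per processor, 200 Mflop/s each, 16 processors.
inverter_total = aggregate_gflops(200, 16)

print(f"forward:  {forward_total:.1f} GFlop/s, ~{forward_work:.0f} flops/point/step")
print(f"inverter: {inverter_total:.1f} GFlop/s")
```

The totals come out at 1.6 and 3.2 GFlop/s, matching the slides. For the forward code, the implied work is roughly 490 flops per grid point per time step.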

22
Outstanding Issues 1
  • Base code debugging and testing.
  • Parameterizations
    • Mixed layer
    • Eddy mixing
  • I/O, pre- and post-processing.
  • Diagnostics.
  • SPP customization
    • Communication primitives
    • Solver
  • TAMC compilation.

23
Outstanding Issues 2
  • Per-platform customizations
    • Pipelined slices!
    • T3E TAMC tape
    • Scientific libraries for solver

24
Conclusion
  • Parallel computing has historically been a field
    whose promise has been characterized by
    hyperbole, but whose development has been
    defined by pragmatism.

HYPERBOLE - Combine the best of MITgcm Classic
and MITgcm UV. PRAGMATISM - We want
performance today. - and - A UV-style
implementation is the most likely teraflop/s model. A UV-style
implementation is the most likely route to a gigaflop/s
desktop.
25
MITgcm UV
  • Ultra-Versatile redesign and implementation of
    the MITgcm algorithm
  • Implementation that can exploit
    • Wildfire shared memory
  • Cache friendly, not vector dominated
  • Toward a coupled ocean-atmosphere model.