1
Optimising HiGEM on scalar machines: HPCx
  • Simon Wilson
  • CGAM/Met Office
  • Thanks to Paul Burton and Richard Hill

2
Introduction
  • Why optimise?
  • Complex and high resolution models require
    significant computer resources.
  • The more efficiently these resources are used,
    the more science is possible.
  • Model development and testing will be quicker.

3
The Unified Model (UM) (1)
  • HiGEM is part of the Unified Model (UM), a suite
    of atmospheric and oceanic numerical modelling
    software developed by the UK Met Office.
  • The UM is a complete modelling infrastructure:
  • Ancillary generation
  • Model configuration
  • Model compilation
  • Run control
  • Archiving
  • Comprehensive user interface
  • Installed on at least 10 operating systems with
    10 Fortran compilers, in 100 institutions, and
    used by hundreds of scientists.

4
The Unified Model (UM) (2)
  • It can be used in many configurations and
    resolutions:
  • UK Forecast model
  • Climate models (HadCM3/HadGEM/HiGEM)
  • Atmosphere only models/Ocean only models
  • Regional models (HadRM3/PRECIS)
  • Windows screensaver (climateprediction.net)
  • It uses gcom, an interface library for
    inter-processor communication, which can be
    configured for MPI, PVM or Cray SHMEM for
    portability (a minimal sketch of such a layer
    follows this list).
  • It has been run on 1 to 512 processors.
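To make the portability-layer idea concrete, here is a minimal
Fortran sketch of an MPI-backed global sum in the gcom style. The
module and routine names are illustrative assumptions, not the
actual gcom API; only the MPI build is shown.

module comm_layer
  ! Hypothetical gcom-style wrapper: the model calls one generic
  ! routine and the MPI/PVM/SHMEM choice is hidden behind it.
  implicit none
  include 'mpif.h'
contains
  subroutine gcl_rsum(n, field, istat)
    ! Sum a real field element-wise across all processors.
    integer, intent(in)    :: n
    real,    intent(inout) :: field(n)
    integer, intent(out)   :: istat
    real :: global(n)
    call MPI_Allreduce(field, global, n, MPI_REAL, MPI_SUM, &
                       MPI_COMM_WORLD, istat)
    field = global
  end subroutine gcl_rsum
end module comm_layer

Swapping the body of gcl_rsum (and its siblings) for PVM or SHMEM
calls is all a port requires; the model code above this layer is
unchanged.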

5
The Unified Model (UM) (3)
  • Choice of approximately 200 sub-models.
  • New sub-models constantly being developed.
  • Forecast and climate models share much of the
    same code.
  • In total the UM has well over a million lines of
    code. The HiGEM executable is built from over
    500,000 lines of Fortran.
  • First used in 1992.
  • A ported version of the UM is available on CD-ROM.

6
(Figure slide: no transcript available)
7
HPCx
  • In parallel with the Earth Simulator runs, HiGEM
    is being run on HPCx.
  • 50 IBM p690 Regatta nodes, each with 32
    processors.
  • SMP architecture, same as SX-6.
  • 10.8 Tflops peak, 6 Tflops sustained.
  • Scalar machine: instructions operate on single
    data elements, unlike the SX-6, where vector
    instructions are pipelined over whole arrays.
  • Based at the UK's CCLRC Daresbury Laboratory.

8
HiGEM configuration
  • N144 atmosphere (288x217): 2D decomposition, but
    usually run 1D on the NEC for vectorisation.
  • 1/3 degree ocean (1082x540): 1D decomposition
    only (a sketch of such a row decomposition
    follows this list).
  • On the HPCx the initial HiGEM configuration is a
    direct copy of an Earth Simulator experiment,
    with one alteration, and minimal changes for
    compiler compatibility.
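For illustration, a minimal sketch (not UM code) of a regular 1D
row decomposition of the 540 ocean rows; the processor count of
128 matches the runs below but is otherwise an arbitrary choice
here.

program row_decomp
  implicit none
  integer, parameter :: nrows = 540        ! 1/3-degree ocean rows
  integer :: nproc, me, base, extra, my_rows, first_row

  nproc = 128                              ! illustrative PE count
  base  = nrows / nproc                    ! rows everyone gets
  extra = mod(nrows, nproc)                ! leftover rows
  do me = 0, nproc - 1
    ! The first 'extra' PEs get one additional row each.
    my_rows   = base + merge(1, 0, me < extra)
    first_row = me*base + min(me, extra) + 1
    print '(a,i4,a,i4,a,i4)', 'PE ', me, ': rows ', first_row, &
          ' to ', first_row + my_rows - 1
  end do
end program row_decomp

With 540 rows over 128 processors, each PE receives only 4 or 5
rows.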

9
HiGEM Optimisation
  • At the Met Office there is an ongoing NEC
    optimisation project, concentrating on the
    forecast model and HadGEM (N96).
  • Very successful: a 50% speed-up over the initial
    configuration ported from the T3E.
  • It only considered single-node and two-node
    performance.
  • Different machines have different optimisation
    issues:
  • Scalar vs vector
  • I/O
  • Interconnects
  • Memory speed and size
  • Number of processors

10
HPCx Optimisation
  • Initial HiGEM configuration is from the Earth
    Simulator.
  • Unoptimised results were very poor: 2.6 model
    months/day on 128 processors (4 nodes), 2.9 on
    256.
  • New timer software provides a graphical
    representation of the time spent in each code
    section and in the gcom routines (a minimal
    timer of this kind is sketched after this list).
  • The timer does slow the model by 5%, but doesn't
    interfere with relative timings.
  • All runs are for two days on 128 processors.
  • All plots have same format, but different
    vertical scales.
  • For this study, only ocean optimisations were
    investigated, as the ocean dominates on the HPCx
    (80% of runtime).
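As noted in the list above, the timings come from new timer
software. A minimal sketch of how such per-section timing can work
(the real UM timer is more elaborate; all names here are
illustrative):

module section_timer
  ! Bracket each code section with timer_start/timer_stop and
  ! accumulate wall-clock time per named section.
  implicit none
  integer, parameter :: max_sections = 50
  character(len=32)  :: names(max_sections)
  real :: total(max_sections) = 0.0        ! accumulated seconds
  real :: t0(max_sections)                 ! start time per section
  integer :: nsec = 0
contains
  integer function section_id(name)
    ! Find (or register) a section by name.
    character(len=*), intent(in) :: name
    integer :: i
    do i = 1, nsec
      if (names(i) == name) then
        section_id = i
        return
      end if
    end do
    nsec = nsec + 1
    names(nsec) = name
    section_id = nsec
  end function section_id

  subroutine timer_start(name)
    character(len=*), intent(in) :: name
    integer :: id, ticks, rate
    id = section_id(name)
    call system_clock(ticks, rate)
    t0(id) = real(ticks) / real(rate)
  end subroutine timer_start

  subroutine timer_stop(name)
    character(len=*), intent(in) :: name
    integer :: id, ticks, rate
    id = section_id(name)
    call system_clock(ticks, rate)
    total(id) = total(id) + real(ticks)/real(rate) - t0(id)
  end subroutine timer_stop
end module section_timer

Each section is then bracketed as call timer_start('ocean_filter')
... call timer_stop('ocean_filter') (a hypothetical section name),
and the per-section totals drive the plots on the slides that
follow.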

11
No optimisations
12
Ocean row times
13
Non-regular row decomposition
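A hedged sketch of the idea behind the non-regular decomposition:
using the measured cost per row (the 'Ocean row times' plot), deal
out contiguous blocks of rows so that each processor gets roughly
equal total cost rather than an equal number of rows. All names
are illustrative, and it assumes many more rows than processors,
so every processor receives at least one row.

subroutine balance_rows(nrows, nproc, cost, first_row, last_row)
  implicit none
  integer, intent(in)  :: nrows, nproc
  real,    intent(in)  :: cost(nrows)      ! measured time per row
  integer, intent(out) :: first_row(nproc), last_row(nproc)
  real    :: target, acc
  integer :: pe, row

  target = sum(cost) / real(nproc)         ! ideal cost per PE
  pe = 1
  first_row(1) = 1
  acc = 0.0
  do row = 1, nrows
    acc = acc + cost(row)
    if (acc >= target .and. pe < nproc .and. row < nrows) then
      last_row(pe) = row                   ! close this PE's block
      pe = pe + 1
      first_row(pe) = row + 1              ! next PE starts here
      acc = 0.0
    end if
  end do
  last_row(nproc) = nrows                  ! last PE takes the rest
end subroutine balance_rows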
14
Optimised filter
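Slide 16 attributes the filter gain to replacing an NEC-specific
matrix multiplier with a BLAS equivalent. A minimal sketch of such
a swap, assuming the filter core is a dense matrix product (the
wrapper and array names are illustrative):

subroutine filter_multiply(m, n, k, a, b, c)
  ! Compute C = A*B with the portable BLAS routine DGEMM instead
  ! of a vendor-specific multiply; on the HPCx the call is
  ! typically serviced by the system's tuned BLAS (IBM's ESSL).
  implicit none
  integer, intent(in) :: m, n, k
  double precision, intent(in)  :: a(m, k), b(k, n)
  double precision, intent(out) :: c(m, n)
  external dgemm

  ! C := 1.0*A*B + 0.0*C (no transposes; leading dims = row counts)
  call dgemm('N', 'N', m, n, k, 1.0d0, a, m, b, k, 0.0d0, c, m)
end subroutine filter_multiply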
15
Fast global sum
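One plausible reading of the fast global sum, consistent with the
portability/repeatability remark in Summary (1): a reproducible
sum gathers the partial sums to one processor and adds them in a
fixed order, while the fast version combines them with a single
MPI reduction, trading bit-reproducibility for speed. The wrapper
below is an illustrative sketch, not the actual UM routine.

subroutine fast_global_sum(local, global, comm)
  ! Combine per-processor partial sums with one MPI_Allreduce.
  ! Faster than a gather-and-ordered-sum, but no longer
  ! bit-for-bit reproducible across different processor counts.
  implicit none
  include 'mpif.h'
  double precision, intent(in)  :: local   ! this PE's partial sum
  double precision, intent(out) :: global  ! sum over all PEs
  integer, intent(in)  :: comm
  integer :: ierr
  call MPI_Allreduce(local, global, 1, MPI_DOUBLE_PRECISION, &
                     MPI_SUM, comm, ierr)
end subroutine fast_global_sum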
16
Results for 256 processors
  • Redistribution of rows over processors: -18%
  • Replace NEC-specific matrix multiplier with BLAS
    equivalent in the filter: -6%
  • Fast global sum: -32%
  • Total time reduction: 56%
  • Speed changed from 2.9 to 6.6 model months/day,
    a 128% speed increase.
  • For 128 processors, 2.6 to 4.3 model months/day,
    a 62% speed increase.

17
Summary (1)
  • HiGEM is just one configuration of the Met
    Office's Unified Model (UM), which runs on many
    computer systems, in many configurations.
  • HiGEM is complex, and it is the first time that
    the UM has been used for global coupled climate
    runs at high resolutions.
  • The need to use portable and repeatable code
    means that the UM can be inefficient and scale
    poorly.
  • Sub-model performance differs significantly
    between machines: I/O and diagnostics routines
    are problematic on the Earth Simulator, while
    the ocean is slower on the HPCx.

18
Summary (2)
  • Although load balancing and scalability in the
    ocean has been improved on the HPCx,
    computationally the ocean model is slow compared
    to the Earth Simulator.
  • The atmosphere model on the HPCx scales well and
    runs relatively quickly.
  • The complexity of the UM means that it is
    difficult to predict performance on different
    machines.
  • Presently, the maximum speed on ES and HPCx is
    roughly the same, 7 model months/day.

19
Future Work
  • Suitability of HPCx optimisations currently being
    tested on the Earth Simulator.
  • Port new timer software to NEC SX-6.
  • Further ocean optimisation on HPCx.
  • Task parallelism/Replacement for gcom.