Title: End-to-end data management capabilities in the GPSC
End-to-end data management capabilities in the GPSC and CPES SciDACs: Achievements and Plans
- SDM AHM
- December 11, 2006
- Scott A. Klasky
- End-to-End Task Lead
- Scientific Computing Group
- ORNL
Outline
- Overview of GPSC activities.
- The GTC and GEM codes.
- On the path to petascale computing.
- Data Management Challenges for GTC.
- Overview of CPES activities.
- The XGC and M3D codes.
- Code Coupling.
- Workflow Solutions.
- ORNL's end-to-end activities.
- Asynchronous I/O.
- Dashboard Efforts.
It's all about the enabling technologies
- It's all about the features which lead us to scientific discovery!
- It's all about the data.
- Applications drive; enabling technologies respond. (D. Keyes)
GPSC gyrokinetic PIC codes used for studying microturbulence in the plasma core
- GTC (Z. Lin et al., Science 281, p. 1835, 1998)
  - Intrinsically global 3D nonlinear gyrokinetic PIC code
  - All calculations done in real space
  - Scales to > 30,000 processors
  - Delta-f method
  - Recently upgraded to fully electromagnetic
- GEM (Y. Chen and S. Parker, JCP, in press 2006)
  - Fully electromagnetic nonlinear delta-f code
  - Split-weight scheme implementation of kinetic electrons
  - Multi-species
  - Uses Fourier decomposition of the fields in the toroidal and poloidal directions (wedge code)
- What about PIC noise?
  - "It is now generally agreed that these ITG simulations are not being influenced by particle noise. Noise effects on ETG turbulence remain under study but are beginning to seem of diminishing relevance." (PSACI-PAC)
GTC code performance
[Chart: historical prediction of GTC data production on the Cray XT3]
- Output will increase because of:
  - Asynchronous, metadata-rich I/O.
  - Workflow automation.
  - More analysis services in the workflow.
GTC: Towards a Predictive Capability for ITER Plasmas
- Petascale science
  - Investigate important physics problems for ITER plasmas, namely the effect of size and isotope scaling on core turbulence and transport (heat, particle, and momentum).
  - These studies will focus on the principal causes of turbulent transport in tokamaks, for example electron and ion temperature gradient (ETG and ITG) drift instabilities, collisionless and collisional (dissipative) trapped electron modes (CTEM and DTEM), and ways to mitigate these phenomena.
Impact: How does turbulence cause heat, particles, and momentum to escape from plasmas?
- Investigation of ITER confinement properties is required: a dramatic step from 10 MW for 1 second to the projected 500 MW for 400 seconds.
- The race is on to improve predictive capability before ITER comes on line (projected 2015).
- A more realistic assessment of ignition margins requires more accurate calculations of steady-state temperature and density profiles for ions, electrons, and helium ash.
- The success of ITER depends in part on its ability to operate in a gyroBohm scaling regime, which must be demonstrated computationally.
- Key for ITER is a fundamental understanding of the effect of deuterium-tritium isotope presence (isotope scaling) on turbulence.
Calculation Details
- Turbulent transport studies will be carried out using the present GTC code, which uses a grid on the scale of the ion gyroradius.
- The electron particle transport physics requires incorporating the electron skin depth into the code for the TEM physics; this scale can be an order of magnitude smaller than the ion gyroradius.
- A 10,000 x 10,000 x 100 grid and 1 trillion particles (100 particles/cell) are estimated to be needed (700 TB per scalar field, 25 TB of particle data per time step).
- For the 250 TF machine, a 2D domain decomposition (DD) for an electrostatic simulation of an ITER-size machine (a/rho > 1000) with kinetic electrons is necessary.
(W. Lee)
GTC Data Management Issues
- Problem: move data from NERSC to ORNL, and then to PPPL, as the data is being generated.
  - Transfer from NERSC to ORNL: 3,000 timesteps, 800 GB, within the simulation run (34 hours).
  - Convert each file to an HDF5 file.
  - Archive files in 4 GB chunks to HPSS at ORNL.
  - Move a portion of the HDF5 files to PPPL.
- Solution (Norbert Podhorszki): a Watch -> Transfer -> Convert -> Archive pipeline; a minimal sketch follows below.
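The sketch below shows the watch/transfer/convert steps of that pipeline as a plain polling script. The paths, host names, and converter command are hypothetical stand-ins; in the real system these steps are Kepler actors.

```python
"""Minimal sketch of the watch/transfer/convert steps, assuming
hypothetical paths, hosts, and a stand-in converter command; the real
pipeline implements these as Kepler actors."""
import glob, os, subprocess, time

WATCH_DIR   = "/scratch/gtc/run042"                 # hypothetical output directory
REMOTE_DEST = "user@ewok.ccs.ornl.gov:/tmp/gtc/"    # hypothetical transfer target

def new_files(seen):
    """Return output files that have appeared since the last poll."""
    found = set(glob.glob(os.path.join(WATCH_DIR, "*.bin")))
    fresh = sorted(found - seen)
    seen |= found
    return fresh

def transfer(path):
    """Push one file to the analysis machine (scp here; bbcp is similar)."""
    subprocess.run(["scp", path, REMOTE_DEST], check=True)

def convert(path):
    """Convert the binary output to HDF5 on the cheaper resource.
    'bin2h5' is a stand-in name for whatever converter is used."""
    subprocess.run(["bin2h5", path, path + ".h5"], check=True)

if __name__ == "__main__":
    seen = set()
    while True:
        for f in new_files(seen):
            transfer(f)
            convert(f)
        time.sleep(30)   # poll every 30 s, as a FileWatcher-style actor would
```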
GTC Data Management Achievements
- In the process of removing:
  - ASCII output.
  - HDF5 output.
  - NetCDF output.
- Replacing with:
  - Binary (parallel) I/O with metadata tags.
  - Conversion to HDF5 during the simulation on a cheaper resource.
  - One XML file to describe all files output by GTC (see the sketch below).
  - Only work with one file from the entire simulation.
  - Large buffered writes.
  - Asynchronous I/O when it becomes available.
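As a rough illustration of the "one XML file describes everything" idea, the sketch below parses a descriptor and converts the raw binary arrays it lists into HDF5 with metadata attributes. The element names, file names, and dtype are assumptions, not the actual GTC schema.

```python
"""Sketch: a single XML descriptor drives the binary-to-HDF5 conversion.
The element/attribute names and file layout below are illustrative
assumptions, not the actual GTC schema."""
import numpy as np
import h5py
import xml.etree.ElementTree as ET

DESCRIPTOR = """
<gtc-output run="shot042">
  <variable name="potential" file="phi_00100.bin"
            dtype="float64" dims="100 64 32" units="e*phi/Te"/>
</gtc-output>
"""

root = ET.fromstring(DESCRIPTOR)
with h5py.File("gtc_00100.h5", "w") as h5:
    for var in root.findall("variable"):
        shape = tuple(int(n) for n in var.get("dims").split())
        data = np.fromfile(var.get("file"), dtype=var.get("dtype")).reshape(shape)
        dset = h5.create_dataset(var.get("name"), data=data)
        dset.attrs["units"] = var.get("units")   # carry the metadata tags along
        dset.attrs["run"] = root.get("run")
```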
The data-in-transit problem
- Particle data needs to be examined occasionally.
  - 1 trillion particles is roughly 25 TB/hour (demand: < 2% I/O overhead).
  - Need 356 GB/sec to handle the burst (7 GB/sec aggregate): 25 TB/hour is about 7 GB/s sustained, but absorbing each hourly dump within 2% of the hour (roughly 72 seconds) requires hundreds of GB/s.
  - We can't store all of this data: (2.3 PB/simulation) x 12 simulations/year is about 25 PB.
  - Need to analyze on the fly rather than save all of the data to permanent storage; analyze on another system.
- Scalar data needs to be analyzed during the simulation.
  - Computational experiments are too costly to let a simulation run and ignore it; estimated cost is about $500K per simulation on a Pflop machine.
  - GTC is already at 0.5M CPU hours per simulation, approaching 3M CPU hours on a 250 Tflop system.
- Need to compare new simulations with older simulations and experimental data.
  - Metadata needs to be stored in databases.
Workflow: simulation monitoring
- Images are generated from the workflow.
  - The user sets the viewing angles and min/max, and the workflow then produces the images (a minimal sketch follows below).
- Still need to put this into everyday use.
- Really need to identify features as the simulation is running.
- Trace features back to earlier timesteps once they are known (where are they born?).
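A minimal sketch of the per-timestep image generation such a monitoring step might perform. The file names, the dataset read, and the fixed min/max are assumptions; the real workflow uses its own visualization services.

```python
"""Minimal image-generation sketch for simulation monitoring; file
names, the dataset read, and the fixed color limits are assumptions."""
import glob
import h5py
import matplotlib
matplotlib.use("Agg")            # render off-screen on the analysis cluster
import matplotlib.pyplot as plt

VMIN, VMAX = -0.05, 0.05         # user-chosen min/max so all frames share one scale

for fname in sorted(glob.glob("gtc_*.h5")):      # hypothetical converted outputs
    with h5py.File(fname, "r") as h5:
        phi = h5["potential"][0, :, :]           # one plane of the field, for example
    fig, ax = plt.subplots()
    im = ax.pcolormesh(phi, vmin=VMIN, vmax=VMAX, cmap="RdBu_r")
    fig.colorbar(im, ax=ax, label="potential")
    ax.set_title(fname)
    fig.savefig(fname.replace(".h5", ".png"), dpi=100)   # image for the dashboard
    plt.close(fig)
```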
5D Data Analysis (1)
- It is common in fusion to look at puncture plots (2D).
- To glean insight, we need to be able to detect features.
- We need a temporal perspective, involving the grouping of similar items, to possibly identify interesting new plasma structures (within this 5D phase space) at different stages of the simulations.
[Figure: 2D phase space]
5D Data Analysis (2)
- Our turbulence covers the global volume, as opposed to some isolated (local) regions.
- The spectral representation of the turbulence evolves in time by moving to longer wavelengths.
- Understanding the key nonlinear dynamics involves extracting relevant information about particle behavior from the data sets.
  - The trajectories of these particles are followed self-consistently in phase space: tracking of spatial coordinates and velocities.
- The self-consistent interaction between the fields and the particles is most important when viewed in velocity space, because particles of specific velocities resonate with waves in the plasma to transfer energy.
- Structures in velocity space could potentially be used in the future development of multi-resolution compression methods.
(W. Tang)
Data Management Challenge
- A new discovery was made by Z. Lin in large ETG calculations: we were able to see radial flow across individual eddies.
- The challenge:
  - Track the flow across the individual eddies and give statistical measurements of the flow velocity.
  - Use local eddy-motion density (PCA) to examine the data; a minimal PCA sketch follows below.
  - Hard problem for lots of reasons!
(Ostrouchov, ORNL)
[Figure: the decomposition shows transient wave components in time]
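A minimal PCA sketch, assuming the eddy-motion data has already been reduced to a timesteps-by-gridpoints snapshot matrix; the variable names and the synthetic data are placeholders, not the actual analysis code.

```python
"""Minimal PCA sketch on a (timesteps x grid points) snapshot matrix;
the data and array sizes below are synthetic placeholders."""
import numpy as np

def pca_modes(X, n_modes=3):
    """Return the leading spatial modes, their time amplitudes, and the
    fraction of variance each captures, via an SVD of the centered data."""
    Xc = X - X.mean(axis=0)                       # remove the temporal mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    modes = Vt[:n_modes]                          # dominant spatial structures
    amplitudes = U[:, :n_modes] * s[:n_modes]     # how each mode evolves in time
    explained = s[:n_modes] ** 2 / np.sum(s ** 2)
    return modes, amplitudes, explained

# Synthetic stand-in for local eddy-motion density: 200 timesteps, 1024 points.
X = np.random.rand(200, 1024)
modes, amps, frac = pca_modes(X)
print("variance explained by the leading modes:", frac)
```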
Physics in the tokamak plasma edge
- Plasma turbulence
- Turbulence suppression (H-mode)
- Edge localized modes and the ELM cycle
- Density and temperature pedestal
- Divertor and separatrix geometry
- Plasma rotation
- Neutral collisions
[Figures: diverted magnetic field; edge turbulence in NSTX (at 100,000 frames/s)]
XGC code
- XGC-0 self-consistently includes:
  - 5D ion neoclassical dynamics, realistic magnetic geometry, and wall shape
  - Conserving plasma collisions (Monte Carlo)
  - 4D Monte Carlo neutral atoms with a recycling coefficient
  - Conserving MC collisions, ion orbit loss, self-consistent Er
  - Neutral beam source, magnetic ripple, heat flux from the core
- XGC-1 includes:
  - Particle source from neutral ionization
  - Full-f ions, electrons, and neutrals
  - Gyrokinetic Poisson equation for the neoclassical and turbulent electric field
  - Full-f electron kinetics for neoclassical physics
  - Adiabatic electrons for electrostatic turbulence
  - General 2D field solver in a dynamically evolving 3D B field
Neoclassical potential and flow of edge plasma from XGC1
[Figures: electric potential; parallel flow and particle positions]
XGC-MHD coupling plan
- Phase 0: Simple coupling with M3D and NIMROD. XGC-0 grows the pedestal along the neoclassical root; MHD checks instability and crashes the pedestal.
- Phase 0: Simple coupling with M3D and NIMROD, the same with XGC-1 and XGC-2.
- Phase 1: Kinetic coupling. MHD performs the crash; XGC supplies closure information to MHD during the crash.
- Phase 2: Advanced coupling. XGC performs the crash; M3D supplies the B-field crash information to XGC during the crash.
(Blue: developed; red: to be developed.)
XGC-M3D code coupling: code coupling framework with Kepler
[Diagram: XGC runs on the Cray XT3; the end-to-end system (160p, with M3D on 64p) hosts the monitoring routines; a 40 Gb/s link feeds data replication, user monitoring, data archiving, and post-processing, with ubiquitous and transparent data access via logistical networking.]
Code Coupling Framework
[Diagram: XGC1, R2D0, M3D-OMP, and M3D-MPP coupled through Lustre; data moved with bbcp first, then Portals with sockets.]
- Necessary steps for initial completion:
  - R2D0 and M3D-OMP become a service.
  - M3D-MPP is launched from Kepler once M3D-OMP returns a failure condition (a control-loop sketch follows below).
  - XGC1 stops when M3D-MPP is launched.
  - Get this incorporated into Kepler.
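The sketch below expresses that launch/stop logic as a plain Python control loop rather than Kepler actors; the commands, batch script, profile path, and stop-file convention are all hypothetical stand-ins.

```python
"""Sketch of the coupling control logic as a plain loop; the commands,
paths, batch script, and stop-file convention are hypothetical."""
import pathlib
import subprocess
import time

PROFILE   = "/lustre/scratch/coupling/xgc1_profile.dat"   # written by XGC1
STOP_FILE = pathlib.Path("/lustre/scratch/coupling/xgc1.stop")

def m3d_omp_ok(profile):
    """Run the M3D-OMP service check on the latest XGC1 profile."""
    return subprocess.run(["m3d_omp_check", profile]).returncode == 0  # stand-in

def launch_m3d_mpp(profile):
    """Submit the parallel M3D-MPP crash calculation to the batch system."""
    subprocess.run(["qsub", "-v", f"PROFILE={profile}", "run_m3d_mpp.pbs"])

while True:
    if not m3d_omp_ok(PROFILE):      # M3D-OMP reports a failure condition
        launch_m3d_mpp(PROFILE)      # hand the crash phase to M3D-MPP
        STOP_FILE.touch()            # XGC1 watches for this file and stops
        break
    time.sleep(60)                   # re-check as each new XGC1 profile appears
```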
Kepler workflow framework
- Kepler is developed by the SDM Center.
  - Kepler is an adaptation of the UC Berkeley tool Ptolemy.
  - Workflows can be composed of sub-workflows.
  - Uses an event-based director-and-actors methodology.
- Features in Kepler relevant to CPES:
  - Launching components (ssh, command line).
  - Execution logger: keeps track of runs.
  - Data movement: SABUL, GridFTP, Logistical Networking (future), data streaming (future).
Original view of the CPES workflow (a typical scenario)
[Diagram: the Kepler workflow engine iterates over timesteps: run the simulation, move the files for each timestep, analyze the timestep, and visualize the analyzed data. Software components: Simulation Program (MPI), SRM Data Mover, Analysis Program, CPES VIS tool. Hardware/OS: Seaborg (NERSC) with disk cache, Ewok (ORNL) with disk cache, HPSS (ORNL).]
- What's wrong with this picture?
What's wrong with this picture?
- Scientists running simulations will NOT use Kepler to schedule jobs on supercomputers.
  - Concern about dependency on another system.
  - But we need to track when files are generated so Kepler can move them.
  - Need a FileWatcher actor in Kepler.
- ORNL permits only One-Time-Password (OTP) logins.
  - Need an OTP login actor in Kepler.
- Only SSH can be used to invoke jobs, including data copying.
  - Cannot use GridFTP (it requires GSI security support at all sites).
  - Need an ssh-based DataMover actor in Kepler (scp, bbcp, ...).
- HPSS does not like a large number of small files.
  - Need an actor in Kepler to tar files before archiving (a bundling sketch follows below).
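A minimal sketch of what the tarring and archiving actors do: bundle many small timestep files into a few large tar files and push those to HPSS. The paths, bundle size, and the use of "hsi put" are assumptions for illustration.

```python
"""Sketch of tar-before-archive: bundle many small files into a few
large tar files for HPSS.  Paths, the bundle size, and the use of
'hsi put' are assumptions for illustration."""
import glob
import os
import subprocess
import tarfile

SMALL_FILES  = sorted(glob.glob("/scratch/cpes/run042/*.dat"))  # hypothetical outputs
BUNDLE_BYTES = 4 * 1024**3                                      # aim for ~4 GB bundles

bundle_id, size, members = 0, 0, []
for path in SMALL_FILES:
    members.append(path)
    size += os.path.getsize(path)
    if size >= BUNDLE_BYTES or path == SMALL_FILES[-1]:
        name = f"cpes_bundle_{bundle_id:03d}.tar"
        with tarfile.open(name, "w") as tar:
            for m in members:
                tar.add(m, arcname=os.path.basename(m))
        # Archive one large bundle instead of thousands of small files:
        subprocess.run(["hsi", "put", name])
        bundle_id, size, members = bundle_id + 1, 0, []
```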
New actors in the CPES workflow to overcome these problems
[Diagram: the Kepler workflow engine starts two independent processes: (1) the simulation program (MPI) on Seaborg (NERSC), and (2) a chain of new actors: an OTP login actor (log in at ORNL), a FileWatcher actor (detect when files are generated), an scp file-copier actor (move files), a tarring actor (tar files), and a local archiving actor (archive files). Software components sit on disk caches at Seaborg (NERSC) and Ewok (ORNL), with HPSS at ORNL.]
Future SDM work in CPES
- Workflow automation of the coupling problem.
  - Critical for code debugging.
  - Necessary to track provenance so coupling experiments can be replayed.
  - Question: do we stream data or write files?
- Dashboard for monitoring the simulation.
- Fast SRM movement of data between NERSC and ORNL.
Asynchronous petascale I/O for data in transit
- High-performance I/O:
  - Asynchronous.
  - Managed buffers (a minimal buffering sketch follows after the diagram).
  - Respects firewall constraints.
  - Enables dynamic control with flexible MxN operations.
  - Transforms data using a shared-space framework (Seine).
[Diagram: Seine layered architecture: user applications on top of the Seine coupling framework interface (alongside other programming paradigms), then shared-space management and load balancing, the directory and storage layers, the communication layer (buffer management), and the operating system.]
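The sketch below illustrates only the overlap idea behind asynchronous, buffer-managed output: the compute loop hands each output buffer to a background writer and keeps computing. It is a Python stand-in with assumed file names; the actual implementation is RDMA-based, not file-based.

```python
"""Illustration of asynchronous, buffer-managed output: computation
continues while a background thread drains the buffers.  File names are
assumptions; the real implementation is RDMA-based, not file-based."""
import queue
import threading
import numpy as np

outgoing = queue.Queue(maxsize=2)      # at most two managed buffers in flight

def writer():
    """Drain buffers in the background (stand-in for RDMA to I/O nodes)."""
    while True:
        step, buf = outgoing.get()
        if buf is None:
            break
        buf.tofile(f"field_{step:05d}.bin")   # or stream to the analysis machine
        outgoing.task_done()

threading.Thread(target=writer, daemon=True).start()

for step in range(10):                        # stand-in for the simulation loop
    field = np.random.rand(256, 256)          # "computed" data for this step
    outgoing.put((step, field.copy()))        # blocks only if both buffers are busy
    # ... keep computing the next step while the writer drains the queue ...

outgoing.join()                               # wait for the outstanding buffers
outgoing.put((None, None))                    # tell the writer thread to exit
```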
Current Status: Asynchronous I/O
- Currently working on the XT3 development machine (rizzo.ccs.ornl.gov).
- The current implementation is based on an RDMA approach.
- Current benchmarks indicate roughly 0.1% overhead while writing 14 TB/hour on jaguar.ccs.ornl.gov.
- Looking at changes in the ORNL infrastructure to deal with these issues.
- Roughly 10% of the machine will be carved off for real-time analysis (100 Tflops for real-time analysis with TB/s bandwidth).
SDM/ORNL Dashboard: Current Status
- Step 1:
  - Monitor ORNL and NERSC machines.
  - Log in: https://ewok-web.ccs.ornl.gov/dev/rbarreto/SDMP/WebContent/SdmpApp/rosehome.php
  - Uses OTP.
  - Working to pull out users' jobs.
- The workflow will need to move data to the ewok web disk.
  - JPEG and XML (metadata) files; a minimal sketch follows below.
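A minimal sketch of the metadata the workflow could drop next to each image on the web disk so the dashboard can display it; the XML schema and paths are hypothetical, not the actual dashboard format.

```python
"""Sketch of per-timestep metadata written next to each image for the
dashboard; the XML schema and destination path are hypothetical."""
import xml.etree.ElementTree as ET
import numpy as np

def publish(step, field, image_path, dest="."):
    """Write a small XML record (image name, min/max) for one timestep."""
    meta = ET.Element("timestep", attrib={"index": str(step)})
    ET.SubElement(meta, "image").text = image_path
    ET.SubElement(meta, "min").text = f"{field.min():.6g}"
    ET.SubElement(meta, "max").text = f"{field.max():.6g}"
    ET.ElementTree(meta).write(f"{dest}/step_{step:05d}.xml")

# Stand-in data; in the workflow this follows the image-generation step.
publish(100, np.random.rand(64, 64), "step_00100.jpg")
```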
Dashboard: future
- Current and old simulations will be accessible on the webpage.
- The schema for a simulation will be determined by the XML file the simulation produces.
- Pictures and simple metadata (min/max) are displayed on the webpage.
- Later we will allow users to control their simulations.
The End-to-End Framework
[Diagram: the end-to-end framework connects Applications (metadata-rich output from components), Applied Math, VIZ/Dashboard, Workflow Automation, and Data Monitoring, built on CCA, SRM, Logistical Networking (LN), and asynchronous NxM streaming.]
Plans
- Incorporate workflow automation into everyday work.
- Incorporate visualization services into the workflow.
- Incorporate asynchronous I/O (data streaming) techniques.
- Unify the schema across the fusion SciDAC PIC codes.
- Further develop workflow automation for code coupling.
  - Will need dual-channel Kepler actors to understand data streams.
  - Will need certificates to handle OTP with workflow systems.
- Autonomics in workflow automation.
  - Easy to use for non-developers!
- Dashboard:
  - Simulation monitoring (via a push method) available at the end of Q2 2007.
  - Simulation control!