Adaptive Grid-enabled SIMOX Simulation on Japan-US Grid Testbed

Slides: 19
Provided by: yos447
Transcript and Presenter's Notes

1
Adaptive Grid-enabled SIMOX Simulation on
Japan-US Grid Testbed
  • Yoshio Tanaka, Hiroshi Takemiya, Satoshi
    Sekiguchi (AIST, Japan)
  • Shuji Ogata (Nagoya Institute of Technology,
    Japan)
  • Rajiv K. Kalia, Aiichiro Nakano, Priya
    Vashishta (University of Southern California)

2
Hybrid QM/MD Simulation
  • Enabling large-scale simulation with quantum
    accuracy
  • Combining classical MD simulation with QM
    simulation
  • MD simulation
      • Simulates the behavior of atoms in the entire
        region
      • Based on classical MD using an empirical
        inter-atomic potential
  • QM simulation
      • Modifies the energy calculated by the MD
        simulation only in the regions of interest
      • Based on density functional theory (DFT)

3
QM/MD simulation over the Pacific at SC2004
  • Figure: MD client connected through Ninf-G to
    P32 (512 CPU) x 2, F32 (256 CPU), and
    TCS (512 CPU) at PSC
  • Total number of CPUs: 1792
  • Simulated system: corrosion of silicon under
    stress (with close-up view)

4
Lessons Learned and Next Steps
  • It is practically difficult to occupy a single
    large-scale system for a few weeks.
  • How can we run the simulation for a long time?
  • Faults (e.g. HDD crashes, network outages) cannot
    be avoided.
  • We do not want manual restarts. The simulation
    should be capable of automatic recovery from
    faults.
  • How can the simulation recover from faults?
  • Our latest adaptive QM/MD simulation allows the
    problem size of embedded QM simulations to change
    automatically during the simulation.
  • This requires the number of processors and
    clusters to change dynamically.

5
Objectives
  • Develop a flexible, robust, and efficient
    Grid-enabled simulation.
      • Flexible -- allow dynamic resource
        allocation/migration,
      • Robust -- detect errors and recover from faults
        automatically for long runs, and
      • Efficient -- manage thousands of CPUs.
  • Verify our strategy through large-scale
    experiments.
      • Implemented a Grid-enabled SIMOX (Separation by
        Implanted Oxygen) simulation
      • Ran the simulation on the Japan-US Grid testbed
        for a few weeks.

6
Implementation using Ninf-G
  • What is Ninf-G?
      • A reference implementation of the GridRPC API
        (GGF proposed recommendation)
  • Ninf-G includes
      • C/C++ and Java APIs, libraries for software
        development
      • IDL compiler for stub generation
      • Shell scripts to
          • compile client programs
          • build and publish remote libraries
      • Sample programs and manual documents
  • Ninf-G is developed using Globus C and Java APIs
  • Two major versions
      • Version 4 (Ninf-G4)
          • Works with GT4 WS GRAM as well as Pre-WS GRAM
          • Has an interface for working with other Grid
            middleware (e.g. Unicore)
          • The latest version is 4.1.0 (in NMI R9)
      • Version 2 (Ninf-G2)
          • Works with GT2 and Pre-WS GRAM in GT3, GT4
          • The latest version is 2.4.3
          • Included in NMI Release 8 (the first non-U.S.
            software)

7
Architecture of Ninf-G
  • Figure: Ninf-G architecture diagram. Recoverable
    labels:
  • Client side: Client, grpc_function_handle_init(),
    grpc_call()
  • Server side: IDL file, Numerical Library,
    IDL Compiler (Generate), Remote Library
    Executable, Interface Information LDIF File
  • Globus components: GRAM (jobmanager pbs/sge/lsf),
    GRIS/GIIS (retrieve), Globus-IO
  • Messages: Interface Request/Reply

8
Algorithm and Implementation
  • Algorithm (flowchart labels, in order)
      • Initial set-up
      • MD part: calculate MD forces of the QM+MD
        regions
      • MD part to QM part: data of QM atoms
      • QM part: calculate QM force of the QM region
        (box repeated for each QM region)
      • MD part: calculate MD forces of the QM region
      • QM part to MD part: QM forces
      • MD part: update atomic positions and velocities
  • Implementation
      • Split into an MD part and a QM part
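
The flowchart labels above suggest that MD forces are computed everywhere and then corrected inside each QM region by the difference between the QM and MD results. A minimal Python sketch of one such time step follows, assuming that additive correction. The force functions `md_force` and `qm_force` are toy stand-ins (hypothetical, one-dimensional, per-atom), not the empirical potential or DFT code used in the experiment, and the integrator and time step are arbitrary choices.

```python
def md_force(x):
    """Toy stand-in for the empirical inter-atomic potential (assumption)."""
    return -1.0 * x


def qm_force(x):
    """Toy stand-in for a DFT force calculation (assumption)."""
    return -1.2 * x


def hybrid_forces(positions, qm_regions):
    """MD forces for all atoms, corrected by (QM - MD) inside each QM region."""
    forces = [md_force(x) for x in positions]      # MD forces of the QM+MD regions
    for region in qm_regions:                      # one QM calculation per region
        for i in region:                           # data of QM atoms
            forces[i] += qm_force(positions[i]) - md_force(positions[i])
    return forces


def step(positions, velocities, qm_regions, dt=0.1, mass=1.0):
    """Update velocities, then positions, from the hybrid forces."""
    forces = hybrid_forces(positions, qm_regions)
    velocities = [v + dt * f / mass for v, f in zip(velocities, forces)]
    positions = [x + dt * v for x, v in zip(positions, velocities)]
    return positions, velocities


positions = [1.0, 2.0, 3.0]
velocities = [0.0, 0.0, 0.0]
qm_regions = [[0], [2]]          # atoms 0 and 2 each form a QM region
print(hybrid_forces(positions, qm_regions))
positions, velocities = step(positions, velocities, qm_regions)
```

In the experiment each QM region would be a separate remote calculation, so the loop over `qm_regions` is where the work would be farmed out to different clusters.
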
9
SIMOX (Separation by Implanted Oxygen)
  • A technique to fabricate a microstructure
    consisting of a Si surface layer on a thin SiO2
    insulator
  • Enables devices with higher speed and lower power
    consumption
  • This technology has advantages for portable
    products, such as laptops, hand-held devices, and
    other applications that depend on battery power.
  • Further advancing SIMOX technology to fabricate
    ultra-fine-scale SOI structures requires
    understanding how the initial velocity and
    incident position of the implanted oxygen affect
    the oxidation processes.

10
SIMOX simulation on the Grid
  • Simulate SIMOX by implanting five oxygen atoms
    with initial velocities much smaller than the
    usual values.
  • The incident positions of the oxygen atoms
    relative to the surface crystalline structure of
    Si differ.
  • 5 QM regions are initially defined
  • The size and number of QM regions change during
    the simulation
  • 0.11 million atoms in total
  • The results will demonstrate the sensitivity of
    the process to the incident position of the
    oxygen atom when its implantation velocity is
    small.

11
Testbed for the experiment
  • AIST Super Clusters
      • P32 (2144 CPUs), M64 (528 CPUs), F32 (536 CPUs)
  • TeraGrid Clusters
      • PSC clusters (3000 CPUs), NCSA clusters (1774
        CPUs)
  • USC Clusters
      • USC (7280 CPUs)
  • Japan Clusters
      • U-Tokyo (386 CPUs), TITECH (512 CPUs)

12
Result of the experiment
  • Experiment time: 18.97 days
  • Simulation steps: 270 (54 fs)
  • Longest continuous simulation: 4.76 days
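
The figures on this slide imply a simulated time per step and an average wall-clock cost per step. The short Python calculation below derives both. It uses only the numbers stated above; the derived quantities are back-of-the-envelope values, not results reported by the authors.

```python
experiment_days = 18.97    # total experiment time
steps = 270                # simulation steps completed
simulated_fs = 54.0        # simulated physical time in femtoseconds
longest_run_days = 4.76    # longest continuous simulation

fs_per_step = simulated_fs / steps                    # simulated time per step
hours_per_step = experiment_days * 24 / steps         # average wall-clock time per step
longest_run_share = longest_run_days / experiment_days

print(f"{fs_per_step:.2f} fs per step")
print(f"{hours_per_step:.2f} wall-clock hours per step")
print(f"{longest_run_share:.0%} of the experiment in the longest run")
```

This works out to roughly 0.2 fs of simulated time and about 1.7 wall-clock hours per step, with the longest uninterrupted run covering about a quarter of the experiment.
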

13
Flexibility
  • Expanding/dividing the regions of QM simulation
    every 5 time steps
  • Number of QM atoms gradually increased from 62 to
    341
  • Number of migrations of QM simulations was 244
  • Number of CPUs used for QM simulation increased
    from 10 to 708

14
Robustness
  • Many kinds of errors
  • Queue was not activated
  • Failed to start MPI programs
  • Exceeding a quota limit
  • Our application succeeded in detecting errors and
    continuing simulation using other clusters
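
Detecting an error and continuing on another cluster can be sketched as a retry-then-migrate loop. The Python below is a minimal illustration under stated assumptions, not the authors' implementation: `ClusterError`, `run_with_migration`, `fake_qm_task`, and the cluster names are all hypothetical, and a real client would raise the error from a failed or timed-out remote call.

```python
class ClusterError(Exception):
    """Hypothetical error for a failed or timed-out remote QM calculation."""


def run_with_migration(task, clusters, max_retries=2):
    """Try each cluster in order; retry a few times, then migrate to the next."""
    log = []
    for cluster in clusters:
        for attempt in range(1, max_retries + 1):
            try:
                result = task(cluster)
            except ClusterError as err:
                log.append((cluster, attempt, str(err)))   # record the fault
                continue                                   # retry on the same cluster
            log.append((cluster, attempt, "ok"))
            return result, log
        # retries exhausted: fall through and migrate to the next cluster
    raise RuntimeError("QM calculation failed on every cluster")


def fake_qm_task(cluster):
    """Stand-in for a remote QM calculation; cluster-a always fails."""
    if cluster == "cluster-a":
        raise ClusterError("queue was not activated")
    return f"QM forces from {cluster}"


result, log = run_with_migration(fake_qm_task, ["cluster-a", "cluster-b"])
print(result)
for entry in log:
    print(entry)
```

The `max_retries` parameter is the knob the Efficiency slide refers to: too few retries cause needless migrations, too many waste time on a cluster that will not recover.
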

  • Figure legend: intentional migration,
    unintentional migration, reservation finished

15
Efficiency
  • Communication time between QM and MD is
    negligible
      • Computation time of QM: 1 hour
      • Communication time between QM and MD: < 1 min
  • Execution efficiency was limited to about 60%.
  • Main causes
      • Load imbalance among QM simulations
      • Multiple QM regions assigned to a single
        cluster
      • Cost of fault detection and recovery
          • Not easy to find an appropriate timeout value
            and number of retries
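
To illustrate how load imbalance alone lowers efficiency, the Python sketch below assumes every QM region must finish before the MD part can advance, so all reserved CPUs wait for the slowest region. The function name, CPU counts, and run times are hypothetical and chosen only for the example. This is not the metric the authors used to obtain their figure.

```python
def parallel_efficiency(cpu_counts, run_times):
    """Busy CPU time divided by reserved CPU time when all regions
    wait for the slowest one in each step."""
    slowest = max(run_times)
    busy = sum(n * t for n, t in zip(cpu_counts, run_times))
    reserved = sum(cpu_counts) * slowest
    return busy / reserved


# Three QM regions with different sizes and run times (made-up numbers).
cpu_counts = [64, 64, 128]       # CPUs assigned to each QM region
run_times = [30.0, 45.0, 60.0]   # minutes per QM calculation
efficiency = parallel_efficiency(cpu_counts, run_times)
print(f"efficiency = {efficiency:.4f}")
```

With perfectly balanced run times the function returns 1.0; fault detection and recovery time would lower the value further but is not modeled here.
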

16
Summary
  • We verified that our strategy for long runs is a
    practical approach
  • Continue the simulation by migrating from the
    current cluster to another one, either
    intentionally or unintentionally.
  • We verified that programming with GridRPC and MPI
    can implement a real Grid-enabled application
  • Dynamic resource allocation / migration
  • Recovery from faults
  • Managing hundreds of CPUs on distributed sites

17
Summary (cont'd)
  • The problem was heterogeneity
  • NOT in hardware or OS
  • Heterogeneity exists in the finer details of the
    system configuration
  • AGW in PSC
  • Strict firewall in USC
  • Max wall clock time for batch jobs
  • Disk quota limit
  • Ninf-G could adapt to some of these issues, but
    not to others
  • We needed to ask for special (manual) operations
    for our experiments, and still encountered
    problems.
  • We were given a special (dedicated) queue
  • We needed help with unexpected errors (jobs were
    not activated)
  • Easier operation for cross-site reservation is
    desired

18
Acknowledgements
  • Resource Providers
  • TeraGrid
  • Especially the helpdesk admins at PSC and NCSA
  • USC
  • TITECH and U. Tokyo
  • This work at AIST was partially supported by JST
    (Japan Science and Technology Foundation)
  • This work at USC was partially supported by
    AFOSR-DURINT, ARL-MURI, DOE, and NSF.