Title: MPJ: The second generation MPI for Java
1MPJ: The second generation MPI for Java
- Aamir Shafi
- 26th April, 2005
- Distributed Systems Group
- http://dsg.port.ac.uk
2People
- Aamir Shafi
- Bryan Carpenter
- Open Middleware Infrastructure Institute (OMII)
- Mark Baker
3Presentation outline
- Introduction
- Design and implementation of MPJ
- The runtime infrastructure
- Implementation issues
- Conclusion
4Introduction
- MPI was introduced in June 1994 as a standard message passing API for parallel scientific computing
- Language bindings for C, C++, and Fortran
- Java Grande Message Passing Workgroup defined Java bindings in 1998
- Previous efforts follow two approaches
- JNI approach
- Pure Java approach
- Remote Method Invocation (RMI)
- Sockets
5Introduction Pure Java approach
- RMI
- Meant for client server applications
- Java Sockets
- Java New I/O package
- Adds non-blocking I/O to the Java language
- Direct Buffers
- Allocated in native OS memory; the JVM attempts to provide faster I/O
- Communication performance
- Comparison of Java NIO and C Netpipe drivers
- Java performs similarly to C on Fast Ethernet
- A very naïve comparison
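As a minimal sketch of the direct-buffer idea above (not MPJ code; the class name `DirectBufferSketch` and the helper `roundTrip` are made up for illustration), the following allocates a direct `ByteBuffer`, writes a few ints, and reads them back in the order a channel-based receiver would see them.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Illustrative only: not part of MPJ.
class DirectBufferSketch {

    // Packs the given ints into a direct buffer and unpacks them again.
    static int[] roundTrip(int[] values) {
        // allocateDirect requests memory outside the garbage-collected heap.
        ByteBuffer buf = ByteBuffer.allocateDirect(values.length * Integer.BYTES);
        for (int v : values) {
            buf.putInt(v);
        }
        buf.flip(); // switch the buffer from writing to reading
        int[] out = new int[values.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getInt();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("direct: " + ByteBuffer.allocateDirect(8).isDirect());
        System.out.println(Arrays.toString(roundTrip(new int[] {1, 2, 3})));
    }
}
```

In a real device the same buffer would typically be handed to a `SocketChannel` for writing and reading rather than unpacked locally.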
6- The latency is 250 microseconds
- Beyond 1 KB, the latency starts increasing due to fragmentation of packets
- Netpipe is a simple single-threaded benchmark
7- Max throughput is 90 Mbps
- It would be great if MPJ, with all its complexities, could reach 80 Mbps
8 Introduction JNI approach
- Importance of JNI cannot be ignored
- Where Java fails, JNI makes it work
- HPC communication hardware has continued to advance
- Network latency has been reduced to a couple of microseconds
- Pure Java looks like an impractical solution
- In the presence of Myrinet, no application developer/user would opt for Fast Ethernet
- Cons
- Not in keeping with the Java philosophy of write once, run anywhere
9Introduction
- For Java messaging
- There is no one-size-fits-all approach
- Portability and high performance are often contradictory requirements
- Portability: pure Java
- High performance: JNI
- The choice between portability and high performance is best left to application developers
- The challenging issue is how to manage these contradictory requirements
- How to provide a flexible mechanism that lets applications swap communication protocols?
10Presentation outline
- Introduction
- Design and implementation
- The runtime infrastructure
- Implementation issues
- Conclusion
11Design
- Aims
- Support swapping various communication devices
- Two device levels
- The MPJ Device level (mpjdev)
- Separates native MPI device from all other devices
- The native MPI device is a special case
- Possible to cut through and make use of the native implementation of advanced MPI features
- The xdev Device level (xdev)
- gmdev: xdev based on the GM 2.x comms library
- niodev: xdev based on the Java NIO API
- smpdev: xdev based on the Threads API
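One way to picture device swapping is a small interface with interchangeable implementations chosen by name. The sketch below is only an illustration under that assumption: `Device`, `LoopbackDevice`, and `newInstance` are hypothetical names, not the actual mpjdev or xdev API, and the loopback device merely stands in for a real transport.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.function.Supplier;

// Illustrative only: the real xdev interface may look quite different.
class DeviceSketch {

    // Hypothetical device abstraction.
    interface Device {
        String name();
        void send(byte[] message);
        byte[] recv();
    }

    // Placeholder transport that queues messages in memory.
    static class LoopbackDevice implements Device {
        private final String name;
        private final Queue<byte[]> queue = new ArrayDeque<>();

        LoopbackDevice(String name) { this.name = name; }
        public String name() { return name; }
        public void send(byte[] message) { queue.add(message.clone()); }
        public byte[] recv() { return queue.remove(); }
    }

    private static final Map<String, Supplier<Device>> REGISTRY = new HashMap<>();
    static {
        // Device names come from the slide; the implementations are placeholders.
        REGISTRY.put("niodev", () -> new LoopbackDevice("niodev"));
        REGISTRY.put("smpdev", () -> new LoopbackDevice("smpdev"));
        REGISTRY.put("gmdev", () -> new LoopbackDevice("gmdev"));
    }

    // Returns a fresh device for the given name, or fails for unknown names.
    static Device newInstance(String deviceName) {
        Supplier<Device> supplier = REGISTRY.get(deviceName);
        if (supplier == null) {
            throw new IllegalArgumentException("Unknown device: " + deviceName);
        }
        return supplier.get();
    }

    public static void main(String[] args) {
        Device dev = newInstance(args.length > 0 ? args[0] : "niodev");
        dev.send("hello".getBytes());
        System.out.println(dev.name() + " received " + new String(dev.recv()));
    }
}
```

The calling code depends only on the interface, so changing the device name is the only change needed to switch transports.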
12MPJ design
13Implementation
- Point to point communications
- Collective communications
- Groups, communicators, and contexts
- Derived datatypes
- Vector, Indexed, Contiguous, and Struct
- Explicit packing and unpacking
- Process Topologies
- Cartesian
- Graph
- Possible to cut through to the native MPI implementation
- As of today, three methods (Dims_create, Cancel, and Wtick) are left unimplemented
14Presentation outline
- Introduction
- Design and implementation
- The runtime infrastructure
- Implementation issues
- Conclusion
15The runtime infrastructure
- All MPI libraries face the task of bootstrapping MPI processes over networked computers
- RSH/SSH based scripts are the most common
- LAM/MPI daemons and runtime system work on UNIX based OSes
- No version of LAM for Windows
- MPICH has recently introduced SMPD (Super Multi Purpose Daemon)
- According to docs
- Works on Linux and Windows
- Difficult (if not impossible) to interface with Java
16Runtime MPJDaemon and MPJStarter modules
- Consists of two modules
- The daemon that runs on compute nodes (MPJDaemon)
- The starter module that runs on head nodes (MPJStarter)
- Installing MPJDaemon on compute nodes
- RSH/SSH based scripts can easily install the daemon on UNIX based OSes
- Could be installed as a service (/etc/init.d)
- Two files are required to install as a service on Windows
17Runtime MPJDaemon on UNIX based OSes
- MPJ_HOME/bin/mpjdaemon is an rc shell script that starts and stops the daemon
- Installation as an app
- cd MPJ_HOME/bin
- ./mpjdaemon start
- Could use an RSH/SSH script to install on a whole UNIX cluster
- Installation as a service
- cp MPJ_HOME/bin/mpjdaemon /etc/init.d
- Adding to the default runlevel
- rc-update add mpjdaemon default (Gentoo Linux)
- /etc/init.d/mpjdaemon start/stop/status
18Runtime MPJDaemon on Windows
- cd MPJ_HOME/bin
- InstallMPJDaemon-NT.bat
- This bat file installs the daemon as a service
19Runtime MPJDaemon as services
- Apache Commons Daemon
- The source bundle does not even compile
- The project is no longer active
- Spent a week trying to make it work on Windows
- Gave up!
- Java Service Wrapper
- Simple and does what it says
- Support available for almost all platforms (wherever you can run Java)
- Distributed under MIT License
- Redistribute without any restrictions
20Runtime JMX MM
- Claims monitoring and management of Java apps
- Start Java app with following switch
- -Dcom.sun.management.jmxremote
- Run jconsole
- Possible to connect to remote and local JVMs
- Useful if application is an MBean
- Application attributes can be read and set remotely
- Possibility
- MPJDaemon could be operated remotely
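The MBean idea can be sketched with the standard `javax.management` API. In the sketch below, the bean `DaemonInfo`, its `MaxProcesses` attribute, and the object name are invented for illustration; they are not taken from MPJDaemon. The attribute is set and read through the MBean server, which is the same path a remote management client would use.

```java
import java.lang.management.ManagementFactory;
import javax.management.Attribute;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Illustrative only: a made-up bean, not the MPJDaemon management interface.
class JmxSketch {

    // Standard MBean convention: interface name = class name + "MBean".
    public interface DaemonInfoMBean {
        int getMaxProcesses();
        void setMaxProcesses(int n);
    }

    public static class DaemonInfo implements DaemonInfoMBean {
        private volatile int maxProcesses = 2;
        public int getMaxProcesses() { return maxProcesses; }
        public void setMaxProcesses(int n) { maxProcesses = n; }
    }

    // Registers the bean, sets the attribute via the MBean server, and reads it back.
    static int setAndGet(String objectName, int newValue) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName(objectName);
            server.registerMBean(new DaemonInfo(), name);
            try {
                server.setAttribute(name, new Attribute("MaxProcesses", newValue));
                return (Integer) server.getAttribute(name, "MaxProcesses");
            } finally {
                server.unregisterMBean(name); // keep the sketch repeatable
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(setAndGet("example.mpj:type=DaemonInfo", 4));
    }
}
```

A long-running process that registers a bean this way on the platform MBean server would expose it to tools such as jconsole.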
21JMX MM Connection GUI
22JMX MM Connection summary
23JMX MM JVM memory
24JMX MM JVM threads
25JMX MM JVM info.
26Runtime Dynamic class loading(1)
- The application (parallel program) and MPJ library are dynamically loaded into the daemon JVM
- No need to copy jar files
- No shared file system assumption
- MPJStarter starts a lightweight HTTP server (Jetty), which serves the jar file containing the parallel program
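Loading a program from a jar served over HTTP can be approximated with the JDK's `URLClassLoader`. This is a sketch under that assumption, not the MPJ loader itself: `RemoteJarSketch`, `invokeMain`, the nested `Echo` class, and the URL in the usage comment are all hypothetical.

```java
import java.lang.reflect.Method;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;

// Illustrative only: not the MPJ class loader.
class RemoteJarSketch {

    // Tiny local target so the reflection step can be tried without a server.
    public static class Echo {
        public static volatile String last;
        public static void main(String[] args) {
            last = String.join(" ", args);
        }
    }

    // Looks up the named class through the given loader and calls its static main.
    static void invokeMain(ClassLoader loader, String className, String[] args) throws Exception {
        Class<?> cls = Class.forName(className, true, loader);
        Method main = cls.getMethod("main", String[].class);
        main.invoke(null, (Object) args); // cast so the array is passed as one argument
    }

    // Usage (hypothetical URL): java RemoteJarSketch http://headnode:8080/himpj.jar HiMPJ
    public static void main(String[] argv) throws Exception {
        if (argv.length < 2) {
            System.err.println("usage: RemoteJarSketch <jar-url> <main-class> [args...]");
            return;
        }
        URL jarUrl = URI.create(argv[0]).toURL();
        // The loader fetches classes from the jar URL on demand.
        try (URLClassLoader loader = new URLClassLoader(new URL[] {jarUrl})) {
            invokeMain(loader, argv[1], Arrays.copyOfRange(argv, 2, argv.length));
        }
    }
}
```

Because the classes are fetched by URL, the node running this code needs neither a local copy of the jar nor a shared file system.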
27Runtime Dynamic class loading(2)
- For example, HiMPJ.java is a parallel program
- Requires mpj.jar to compile and run
- Bundle it into a jar file, specifying a manifest file with a Class-Path attribute pointing to mpj.jar
- Write the manifest file
- Manifest-Version: 1.0
- Main-Class: HiMPJ
- Class-Path: mpj.jar
- jar cfm himpj.jar manifest HiMPJ.class
- Copy it to MPJ_HOME/lib directory
- Executing MPJStarter
- cd MPJ_HOME/bin
- starter.sh/bat 2 himpj.jar ../lib xdev niodev
- JarClassLoader will load himpj.jar and mpj.jar into the daemon's JVM
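The three manifest entries above can also be produced programmatically with `java.util.jar.Manifest`, which is a convenient way to check the expected header spelling. This is a small sketch; the class name `ManifestSketch` and the helper `manifestFor` are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Illustrative only: generates the manifest text described on the slide.
class ManifestSketch {

    // Builds manifest text for a jar whose entry point depends on another jar.
    static String manifestFor(String mainClass, String classPath) {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // The Manifest javadoc requires Manifest-Version to be set before write().
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);
        attrs.put(Attributes.Name.CLASS_PATH, classPath);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            manifest.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(manifestFor("HiMPJ", "mpj.jar"));
    }
}
```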
28Presentation outline
- Introduction
- Design and implementation
- The runtime infrastructure
- Implementation issues
- Conclusion
29Issue 1 Shared memory device
- Based on Java Threads API
- Each thread is an MPI process
- Communicates with other threads by sending messages
- All threads run in the same JVM
- Cannot have static variables in the parallel program
- Static variables within the MPJ library require synchronized access
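The thread-per-process model can be illustrated with two threads that exchange messages through queues. This is only a sketch of the idea, not the smpdev implementation: the per-rank mailboxes and the name `pingPong` are assumptions made for the example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative only: not the smpdev implementation.
class ThreadRanksSketch {

    // Runs two "processes" as threads: rank 0 sends, rank 1 receives and replies.
    static String pingPong(String payload) {
        // One mailbox per rank; a rank receives by taking from its own mailbox.
        BlockingQueue<String> mailbox0 = new LinkedBlockingQueue<>();
        BlockingQueue<String> mailbox1 = new LinkedBlockingQueue<>();
        String[] reply = new String[1];

        Thread rank0 = new Thread(() -> {
            try {
                mailbox1.put(payload);        // send to rank 1
                reply[0] = mailbox0.take();   // block until the reply arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread rank1 = new Thread(() -> {
            try {
                String msg = mailbox1.take(); // receive from rank 0
                mailbox0.put("echo:" + msg);  // reply to rank 0
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        rank0.start();
        rank1.start();
        try {
            rank0.join();
            rank1.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return reply[0]; // safe to read after join()
    }

    public static void main(String[] args) {
        System.out.println(pingPong("hello"));
    }
}
```

All shared state here lives in the queues, which are thread-safe; any additional static state shared by the threads would need its own synchronization, as the slide notes.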
30Issue 2 Synchronization problems with threads in smpdev
- Each MPJDaemon is assigned a number of processes to be executed
- In case of smpdev, all processes run on the same machine
- MPJDaemon loads the parallel program
- JarClassLoader.loadClass(parallelProgramName)
- Once loaded, the program is started as follows
- JarClassLoader.invokeClass(pClass, args)
31Issue 2 Synchronization problems with threads in smpdev
- For example, MPJStarter requests MPJDaemons to start 2 processes (threads)
- MPJDaemon starts two threads, each of which first loads and then starts the program
- Processes (threads) started in this way do not share static variables and cannot synchronize
- In order to share static variables and synchronize on them, the class should be loaded just once and executed N times
- It was implemented in this way because niodev requires the exact opposite behaviour: no sharing of static variables
- Currently, the user specifies which device should be used
- In case of niodev, the loading is done twice
- In case of smpdev, the loading is done only once
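The "load once, execute N times" case can be sketched with plain reflection. The names `LoadOnceSketch`, `Program`, and `runInThreads` are hypothetical, and the nested `Program` class stands in for a user's parallel program; the slides do not show how the daemon does this.

```java
import java.lang.reflect.Method;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: not the MPJDaemon code.
class LoadOnceSketch {

    // Stand-in for a parallel program that keeps state in a static variable.
    public static class Program {
        public static final AtomicInteger STARTED = new AtomicInteger();
        public static void main(String[] args) {
            STARTED.incrementAndGet();
        }
    }

    // Loads the class once, then invokes its main method from n threads.
    static int runInThreads(int n) {
        try {
            Class<?> cls = Class.forName(Program.class.getName());
            Method main = cls.getMethod("main", String[].class);
            Thread[] threads = new Thread[n];
            for (int i = 0; i < n; i++) {
                threads[i] = new Thread(() -> {
                    try {
                        main.invoke(null, (Object) new String[0]);
                    } catch (ReflectiveOperationException e) {
                        throw new IllegalStateException(e);
                    }
                });
                threads[i].start();
            }
            for (Thread t : threads) {
                t.join();
            }
            // Every thread incremented the same static counter.
            return Program.STARTED.get();
        } catch (ReflectiveOperationException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("started so far: " + runInThreads(2));
    }
}
```

If each thread instead loaded the class through its own class loader, each copy of the class would have its own static counter, which matches the no-sharing behaviour described for niodev.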
32Issue 3 cygwin
- If running MPJ on Cygwin
- chmod o+w MPJ_HOME/logs
- chmod a+x MPJ_HOME/lib/*.dll
- Is MPJDaemon a Windows service, or a Linux service on Cygwin?
33(Future) Issue 4 Specifying multiple devices
- Currently, only one device can be specified
- Either niodev or smpdev will be selected as the primary comms device
- But for SMP clusters, it would be ideal
- To use smpdev within an SMP node
- To use niodev/gmdev for internode comms
34(Future) Issue 5 Starting MPJ with native MPI device
- mpiJava/native MPI device uses mpirun to bootstrap MPI processes
- To bring it in line with other devices, the native MPI device will have to be started by the MPJ runtime infrastructure
35Issue 6 Multiple users running MPJDaemons at the same time
- Install daemons as an app
- Agree on the port numbers
36Presentation outline
- Introduction
- Design and Implementation
- The runtime infrastructure
- Implementation Issues
- Conclusion
37Summary
- The key issue for Java messaging is not debating the pure Java or JNI approach
- But providing a flexible mechanism to swap various comm protocols
- MPJ has a pluggable architecture
- We are implementing niodev, gmdev, smpdev, and the native MPI device
- MPJ runtime infrastructure allows bootstrapping MPI processes across various platforms
- MPJDaemons can be installed as a native OS service
38Conclusions
- We are slowly but surely moving towards the first release of MPJ, the next generation of MPI for Java
- Current Status
- Unit Testing
- MPJ follows the same API as mpiJava
- Parallel applications built on top of mpiJava will work with MPJ
- There are some differences in the API
- Bsend, and explicit packing/unpacking -- see release docs for more details
- Arguably, the first MPI library for Java that implements the actual messaging in pure Java
39Questions
?