Title: Efficient Communication Libraries on Clusters for Java Codes
1 Towards High Performance Cluster Communication in Java: The Java Fast Sockets Approach
Guillermo L. Taboada
taboada_at_udc.es
ACET Seminars, Autumn 2006, University of Reading
2 Outline
- Introduction
- Designing a High Performance Java Socket Solution
- Implementing Efficient Java Sockets on Clusters
- Performance Evaluation
- Conclusions
3 Introduction
Introduction Design
Implementation Evaluation
Conclusions
- Growing interest in clusters (more computing power at lower cost)
- A growing solution
- Java (and HPC Java) on clusters
- Challenge: scalable performance for Java on clusters
- Network performance is scalable
- Java middleware is less efficient than native code, so Java does not scale in performance
- High performance networks are either unsupported or supported with poor performance
- Ways of support
- IP emulations
- High performance sockets
- Interconnection networks (SCI, GbE, Myrinet, IB)
4 Introduction
- Interconnection networks
- Together with their associated software libraries, they play a key role in high performance clustering technology
- Different technologies
- Gigabit and 10 Gigabit Ethernet
- Myrinet, Myrinet 2000, Myri-10G (10 Gb Myrinet and 10 GbE)
- Scalable Coherent Interface (SCI)
- InfiniBand
- QsNet, Giganet, Quadrics, GSN
- HIPPI
- Low hardware latencies (1.3-30 µs)
- High bandwidths (> 1 Gbps)
5 Introduction
- SCI (Scalable Coherent Interface)
- IEEE standard 1596-1992
- Implemented as a PCI(-X) NIC
- High performance
- Latency: 1.42 µs (theoretical)
- Bandwidth: 5333 Mbps (bidirectional)
- Point-to-point topologies: 1D (ring), 2D (2D torus), 3D
- Usually without a switch (small clusters)
6 Introduction
- SCI cluster example (2D torus 4x4)
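The 4x4 2D torus can be made concrete with a small sketch that computes each node's four neighbours. It assumes row-major node numbering (id = row * k + col), which is only an illustration; real SCI adapters are addressed through their own node IDs. `Torus2D` is a hypothetical helper, not part of any SCI library.

```java
import java.util.List;

/** Hypothetical helper: neighbours of a node in a k x k 2D torus. */
public final class Torus2D {
    private final int k; // nodes per dimension, e.g. 4 for a 4x4 torus

    public Torus2D(int k) {
        if (k < 2) throw new IllegalArgumentException("k must be >= 2");
        this.k = k;
    }

    /** Node ids are assumed to be numbered row-major: id = row * k + col. */
    public List<Integer> neighbours(int id) {
        if (id < 0 || id >= k * k) throw new IllegalArgumentException("node id out of range");
        int row = id / k, col = id % k;
        // Wrap-around links close every row and every column into a ring.
        int east  = row * k + (col + 1) % k;
        int west  = row * k + (col - 1 + k) % k;
        int north = ((row - 1 + k) % k) * k + col;
        int south = ((row + 1) % k) * k + col;
        return List.of(east, west, north, south);
    }

    public static void main(String[] args) {
        Torus2D t = new Torus2D(4);
        System.out.println(t.neighbours(0)); // [1, 3, 12, 4]
        System.out.println(t.neighbours(5)); // [6, 4, 1, 9]
    }
}
```

In a torus every node has the same degree (four links in 2D), which is why such clusters can be built without a central switch.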
7 Introduction
- SCI IP (IP emulation): ScaIP
- SISCI (Software Infrastructure for SCI)
- SMI (Shared Memory Interface)
- SCI-MPICH, ScaMPI: MPI implementations
- SCI-SOCKET: high performance socket implementation
8 Introduction
- Myrinet
- Most popular technology for high-end clusters
- Delivers high performance
- Latency: 1.3 µs (theoretical)
- Bandwidth: 512, 1280, 2000 and 10000 Mbps
- Highly scalable (large, efficient switching technology)
- Lots of developments
- Communication libraries
- Low level: GM, MX
- Message-passing: MPICH-GM, MPICH-MX
- Sockets: Sockets-GM, Sockets-MX
9 Introduction
[Figure: Sockets-MX concept (http://www.myri.com/myrinet/performance/Sockets-MX/socketsmx-concept.png)]
10 Introduction
- Java communications on clusters
- Java's portability means that, for networking, the JDK supports only the widespread TCP/IP stack
- IP emulations can be used, but they have performance issues
- SCIP, ScaIP, IPoGM, IPoMX, IPoIB
- Emerging high performance socket implementations for cluster interconnects
- SCI-SOCKET
- Sockets-MX, Sockets-GM (Myrinet)
- Socket Direct Protocol (InfiniBand)
- Sockets over VIA (SOVIA)
11 Introduction
- Java communications on high performance cluster interconnects
- Myrinet
- KaRMI/GM (JavaParty, Univ. Karlsruhe)
- Manta/LFC/Panda/Ibis (Vrije Universiteit, Amsterdam)
- RMIX (Myrinet)
- mpiJava over MPICH-GM/MPICH-MX
- SCI
- Still waiting: no Java solution yet
- My research motivation is to fill the efficiency gap between Java and high-speed interconnects
- Getting the most out of the interconnects' capabilities in Java, which can be done by supporting high performance socket libraries in Java
12 Introduction
- Previous work
- Non-blocking communication support
- Java NIO (New I/O)
- Improves scalability; essential in client/server applications
- Message-passing in Java
- mpiJava: wrapper to a native MPI implementation that supports non-blocking communications
- MPJ Express: Java message-passing system with an NIO device
- MPJ/Ibis: Java message-passing system with non-blocking support through multithreading
- High performance network support
- Almost entirely focused on Myrinet
- Solutions based on ad hoc protocols, poorly maintained and with numerous layers
- Numerous libraries, which increase the communication overhead
13 Introduction
JAVA FAST SOCKETS
14 Introduction
- Solution: Java Fast Sockets (JFS)
- First high performance Java Sockets implementation
- Support for high performance network libraries
- Through native libraries: SCI Sockets on SCI and native sockets on MX
- Implements a widely used API (Java Sockets), which performs better than RMI
- Avoids the use of IP emulations (TCP/IP is a less efficient protocol, designed for error-prone environments and with several layers)
- Fewer libraries involved, hence lower communication overhead
15 Implementing Efficient Java Communication Libraries on Clusters
- Java Fast Sockets (JFS) implements the Java Sockets API in an efficient and portable way, through
- A general pure Java solution
- Specific solutions that access native communication libraries (SCI Sockets)
- A fail-over approach applied to the selection of libraries: the system tries to use highly efficient native communication libraries and, if this is not possible, falls back to the general pure Java solution
- User transparency
- JFSFactory is set as the default socket factory in a small launcher application, using Socket.setSocketImplFactory()
- The launcher then invokes the application's main method through reflection; from then on, all socket communications use JFS
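The fail-over selection and the transparent launcher can be sketched as follows. This is a minimal sketch under stated assumptions: the native library names `jfs_sci` and `jfs_mx` and the factory class `jfs.JFSFactory` are hypothetical placeholders, not the real JFS artifacts; only `System.loadLibrary`, `Socket.setSocketImplFactory` and the reflection calls are standard JDK API.

```java
import java.io.IOException;
import java.lang.reflect.Method;
import java.net.Socket;
import java.net.SocketImplFactory;
import java.util.Arrays;

/** Hypothetical launcher: fail-over library selection, factory setup, reflective main(). */
public final class JFSLauncher {
    // Hypothetical names, in order of preference (assumptions, not real JFS artifacts).
    static final String[] NATIVE_LIBS = {"jfs_sci", "jfs_mx"};
    static final String FACTORY_CLASS = "jfs.JFSFactory";

    /** Fail-over: returns the first native library that loads, or null for pure Java. */
    static String selectNativeLibrary(String... candidates) {
        for (String lib : candidates) {
            try {
                System.loadLibrary(lib);
                return lib;
            } catch (UnsatisfiedLinkError | SecurityException e) {
                // Library missing or not loadable on this node: try the next one.
            }
        }
        return null; // no native library available: use the general pure Java solution
    }

    /** Installs the factory if its class is on the classpath; otherwise keeps the JDK default. */
    static boolean installFactory(String className) {
        try {
            Class<?> cls = Class.forName(className);
            SocketImplFactory fac = (SocketImplFactory) cls.getDeclaredConstructor().newInstance();
            Socket.setSocketImplFactory(fac); // the factory can be set only once per JVM
            return true;
        } catch (ReflectiveOperationException | ClassCastException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java JFSLauncher <MainClass> [args...]");
            System.exit(1);
        }
        String lib = selectNativeLibrary(NATIVE_LIBS);
        boolean installed = installFactory(FACTORY_CLASS);
        System.err.println("native library: " + (lib != null ? lib : "none (pure Java)")
                + ", factory installed: " + installed);

        // Run the unmodified application; if a factory was installed, every new
        // client Socket it creates obtains its SocketImpl from that factory.
        Method main = Class.forName(args[0]).getMethod("main", String[].class);
        main.invoke(null, (Object) Arrays.copyOfRange(args, 1, args.length));
    }
}
```

Usage, with a hypothetical application class: `java JFSLauncher com.example.App arg1 arg2`. Note that `Socket.setSocketImplFactory()` has been deprecated since JDK 17, although it still works.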
16 Implementing Efficient Java Communication Libraries on Clusters
- Sun Java Sockets implementation (Sun's JRE)
- Only supports the TCP/IP stack as communication library
- Performs unnecessary copies
- Does not implement the communication optimization method setPerformancePreferences(), related to
- Latency reduction
- Bandwidth maximization
- Using Java NIO sockets is more complex
- Use of socket channels, selectors, buffers, etc.
- Establishment of connections
- The communications of existing socket-based applications must be redesigned
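To illustrate the extra machinery NIO requires compared with `socket.getOutputStream().write(msg)`, the sketch below performs a single echo round trip over loopback with channels and buffers. It stays in blocking mode and uses no Selector; a real non-blocking design would add a Selector and per-connection state. `NioEcho` is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: one echo round trip with NIO channels in blocking mode. */
public final class NioEcho {

    /** Sends msg to a local echo thread and returns the bytes it sends back. */
    public static byte[] roundTrip(byte[] msg) throws IOException, InterruptedException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Connection establishment: bind to an ephemeral loopback port.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));

            Thread echo = new Thread(() -> {
                try (SocketChannel peer = server.accept()) {
                    ByteBuffer buf = ByteBuffer.allocateDirect(msg.length);
                    while (buf.hasRemaining() && peer.read(buf) >= 0) {
                        // keep reading until the buffer is full or the peer closes
                    }
                    buf.flip(); // switch the buffer from filling to draining
                    while (buf.hasRemaining()) {
                        peer.write(buf); // write() may send only part of the buffer
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            echo.start();

            try (SocketChannel ch = SocketChannel.open(server.getLocalAddress())) {
                ByteBuffer out = ByteBuffer.allocateDirect(msg.length);
                out.put(msg);
                out.flip();
                while (out.hasRemaining()) {
                    ch.write(out);
                }
                ByteBuffer in = ByteBuffer.allocateDirect(msg.length);
                while (in.hasRemaining() && ch.read(in) >= 0) {
                    // same partial-read loop on the client side
                }
                in.flip();
                byte[] result = new byte[in.remaining()];
                in.get(result);
                echo.join();
                return result;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] reply = roundTrip("hello JFS".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new String(reply, StandardCharsets.US_ASCII)); // hello JFS
    }
}
```

Every step here (binding, buffer allocation, `flip()`, partial-write and partial-read loops) is hidden behind the stream API in classic Java Sockets, which is why porting socket-based applications to NIO means redesigning them.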
17 Implementing Efficient Java Communication Libraries on Clusters
Class diagram (<<uses>> relations): ObjectOutputStream, OutputStream, SocketOutputStream, JFSOutputStream, ByteArrayOutputStream
- SocketOutputStream: writes to a native socket and performs several copies (JNI Get<Type>ArrayRegion). Solution: avoid the extra copies by using JNI GetPrimitiveArrayCritical
- JFSOutputStream: avoids extra copies and implements new I/O functionality: writing arrays of primitive types and native copies through NIO
- ByteArrayOutputStream: buffers data for sending; beneficial when sending long messages, or when small messages have a high cost
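The JFSOutputStream idea of writing a primitive array without per-element serialization can be approximated in pure Java. This is a sketch under assumptions: `PrimitiveArrayOutputStream` and `writeDoubles` are hypothetical names, and the staging buffer is a heap ByteBuffer, whereas the slide indicates that JFS performs these copies natively (NIO/JNI).

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

/** Hypothetical stream (not the JFS class): bulk writes of double[] without ObjectOutputStream. */
public class PrimitiveArrayOutputStream extends FilterOutputStream {
    private ByteBuffer buf = ByteBuffer.allocate(0); // reusable staging buffer, big-endian

    public PrimitiveArrayOutputStream(OutputStream out) {
        super(out);
    }

    /** Writes data as raw big-endian doubles with one bulk copy and one write call. */
    public void writeDoubles(double[] data) throws IOException {
        int bytes = data.length * Double.BYTES;
        if (buf.capacity() < bytes) {
            buf = ByteBuffer.allocate(bytes); // grow only when a larger array arrives
        }
        buf.clear();
        buf.asDoubleBuffer().put(data);   // single bulk copy double[] -> byte[]
        out.write(buf.array(), 0, bytes); // single write to the underlying stream
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len); // FilterOutputStream's default would write byte by byte
    }
}
```

Because ByteBuffer and DataInputStream both default to big-endian order, the receiver can read the values back with `readDouble()`; avoiding ObjectOutputStream also avoids its per-stream headers and type descriptors.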
18 Implementing Efficient Java Communication Libraries on Clusters
- Default scenario in Sun's Java Sockets communication
[Figure: default scenario. On the sending and receiving sides the data passes between the application's byte[] data and a byte[] buf in the garbage-collectable JVM heap, a char JVM_buffer in the native sockets implementation and the driver's char driver_buffer, then across the network. Legend: deserialization, copy.]
19 Implementing Efficient Java Communication Libraries on Clusters
- JFS communication using a Java NIO direct ByteBuffer
[Figure: JFS communication using a Java NIO direct ByteBuffer. Elements on each side: byte[] data and byte[] buf in the garbage-collectable JVM heap, a direct ByteBuffer, a char JVM_buffer in the native sockets implementation and the driver's char driver_buffer, connected through the network. Legend: deserialization, copy.]
20 Implementing Efficient Java Communication Libraries on Clusters
- Optimized JFS communication (zero-copy)
[Figure: optimized (zero-copy) JFS communication. The data to send and to receive is held in a direct ByteBuffer and exchanged with the driver's char driver_buffer in the native sockets implementation; no heap byte[] buffer or JVM_buffer appears. Legend: deserialization, copy.]
21 Implementing Efficient Java Communication Libraries on Clusters
- SCI issues
- Only IPv4 is supported, while Java defaults to IPv6
- Myrinet issues
- Some calls to Sockets-MX are faulty
- Ethernet issues
- Different protocol boundaries: the general solution is used, optimized only by reducing the number of copies
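A common workaround for an IPv4-only transport such as the SCI case above is the standard JVM property `java.net.preferIPv4Stack=true`, normally passed on the command line because it is read when the networking classes initialise. Whether JFS relied on this property is not stated in the slides. The sketch also filters resolved addresses down to IPv4; `Ipv4Only` is a hypothetical helper.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper for IPv4-only interconnects. */
public final class Ipv4Only {

    /** Returns the first IPv4 address of host, or throws if it only resolves to IPv6. */
    public static Inet4Address firstIpv4(String host) throws UnknownHostException {
        for (InetAddress a : InetAddress.getAllByName(host)) {
            if (a instanceof Inet4Address) {
                return (Inet4Address) a;
            }
        }
        throw new UnknownHostException(host + " has no IPv4 address");
    }

    public static void main(String[] args) throws UnknownHostException {
        // Run as: java -Djava.net.preferIPv4Stack=true Ipv4Only [host]
        System.out.println("preferIPv4Stack = " + System.getProperty("java.net.preferIPv4Stack"));
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        System.out.println(firstIpv4(host).getHostAddress());
    }
}
```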
22 Performance Evaluation
- Experimental configuration
- Pentium 4 Xeon at 2.8 GHz, 2 GB memory (Hyper-Threading disabled)
- SCI (Dolphin), GbE (Marvell 88E8050), Myrinet 2000
- Java: Sun JVM 1.5.0_05
- gcc 3.4.4
- Libraries
- mpiJava 1.2.5 over MPICH 1.2.5
- SCI-SOCKET 3.0.3
- DIS 3.0.3 (IRM/SISCI/SCILib/Mbox)
- Linux CentOS 4, kernel 2.6.9
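The conclusions report latency and throughput gains; a common way to obtain such figures is a ping-pong test. The sketch below is an assumption about methodology, not the authors' benchmark: `PingPong` is a hypothetical class, it uses plain JDK sockets over loopback (so it measures only local stack overhead), and it does no JIT warm-up.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

/** Hypothetical ping-pong sketch over plain java.net sockets (loopback only). */
public final class PingPong {

    /** Mean one-way latency in microseconds for messages of the given size. */
    public static double oneWayLatencyMicros(int size, int iters) throws IOException, InterruptedException {
        if (size <= 0 || iters <= 0) throw new IllegalArgumentException("size and iters must be > 0");
        byte[] msg = new byte[size];
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread pong = new Thread(() -> {
                try (Socket s = server.accept()) {
                    s.setTcpNoDelay(true);
                    DataInputStream in = new DataInputStream(s.getInputStream());
                    OutputStream out = s.getOutputStream();
                    byte[] buf = new byte[size];
                    for (int i = 0; i < iters; i++) {
                        in.readFully(buf); // wait for the complete ping
                        out.write(buf);    // and send it straight back
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            pong.start();

            try (Socket s = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort())) {
                s.setTcpNoDelay(true); // disable Nagle's algorithm so small messages go out at once
                DataInputStream in = new DataInputStream(s.getInputStream());
                OutputStream out = s.getOutputStream();
                long start = System.nanoTime();
                for (int i = 0; i < iters; i++) {
                    out.write(msg);
                    in.readFully(msg);
                }
                long elapsed = System.nanoTime() - start;
                pong.join();
                // Each iteration is one round trip; one-way latency is half of it (ns -> us).
                return elapsed / (2.0 * iters) / 1_000.0;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int size = 1024;
        double latencyUs = oneWayLatencyMicros(size, 1000);
        // bits per microsecond is numerically equal to Mbit/s
        double throughputMbps = size * 8 / latencyUs;
        System.out.printf("%d bytes: latency %.2f us, throughput %.1f Mbps%n", size, latencyUs, throughputMbps);
    }
}
```

Running the same program once with the default socket implementation and once through a launcher that installs another factory would compare both implementations without modifying the benchmark.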
23-34 Performance Evaluation
[Figures: performance evaluation charts]
35 Conclusions
- Java Fast Sockets (JFS), a high performance Java Sockets implementation, supports System Area Networks through native code, delivering high performance to pure Java libraries and applications
- Java Sockets communications on clusters can significantly improve their performance by using this library
- Latency: up to 84% reduction
- Throughput: up to 120% increase