Transcript and Presenter's Notes

Title: GridMPI


1
GridMPI
  • Yutaka Ishikawa
  • University of Tokyo
  • and
  • Grid Technology Research Center at AIST

2
Background
  • SCore Cluster System Software
  • Real World Computing Partnership (1992-2001)
  • Funded by the Ministry of Economy, Trade and
    Industry (METI)

High Performance Communication Libraries
  • PMv2: 11.0 usec round-trip time, 240 MB/s
    bandwidth
  • MPICH-SCore MPI Library: 24.4 usec round-trip
    time, 228 MB/s bandwidth
  • PM/Ethernet Network Trunking: utilizing more than
    one NIC
Global Operating System
  • SCore-D: single/multi-user environment, gang
    scheduling, checkpoint and restart
Parallel Programming Language
  • MPC++: multi-thread template library
Shared Memory Programming Support
  • Omni OpenMP on SCASH
3
RWC SCore III
  • Host
  • NEC Express Servers
  • Dual Pentium III 933 MHz
  • 512 Mbytes of Main Memory
  • Number of Hosts
  • 512 Hosts (1,024 Processors)
  • Networks
  • Myrinet-2000 (2 Gbps + 2 Gbps)
  • 2 Ethernet Links
  • Linpack Result
  • 618.3 Gflops

This was the world's fastest PC cluster as of
August 2001
4
TOP 500 as of December 2002
  • HELICS, rank 64th (825.0 Gflops),
  • 512 Athlons 1.4GHz, Myrinet-2000
  • Heidelberg University IWR,
    http://helics.iwr.uni-heidelberg.de/
  • Presto III, rank 68th (760.2 Gflops),
  • 512 Athlons 1.6GHz, Myrinet-2000
  • GSIC, Tokyo Institute of Technology,
    http://www.gsic.titech.ac.jp/
  • Magi Cluster, rank 86th (654.0 Gflops),
  • 1040 Pentium III 933MHz, Myrinet-2000
  • CBRC-TACC/AIST, http://www.cbrc.jp/magi/
  • RWC SCore III, rank 90th (618.3 Gflops),
  • 1024 Pentium III 933MHz, Myrinet-2000
  • RWCP, http://www.rwcp.or.jp

5
SCore Users
  • Japan
  • Universities
  • University of Tokyo, Tokyo Institute of
    Technology, University of Tsukuba,
  • Industries
  • Japanese car manufacturing companies use it on
    their production lines
  • UK
  • Oxford University
  • Warwick University
  • Germany
  • University of Bonn
  • University of Heidelberg
  • University of Tuebingen

Streamline Computing Ltd.: SCore integration
business
6
PC Cluster Consortium
http://www.pccluster.org
  • Purpose
  • Contribution to the PC cluster market, through
    the development, maintenance, and promotion of
    cluster system software based on the SCore
    cluster system software and the Omni OpenMP
    compiler, developed at RWCP.
  • Members
  • Japanese companies
  • NEC, Fujitsu, Hitachi, HP Japan, IBM Japan,
    Intel Japan, AMD Japan,
  • Research Institutes
  • Tokyo Institute of Technology GSIC, Riken
  • Individuals

7
Lessons Learned
  • A new MPI implementation is needed
  • It is tough to change/modify existing MPI
    implementations
  • New MPI implementation
  • Open implementation in addition to open source
  • Customizable to implement a new protocol
  • A new transport implementation is needed
  • The PM library is not built on top of the IP
    protocol
  • Not acceptable in the Grid environment
  • The current TCP/IP implementation (BSD and Linux)
    does not perform well in a large-latency
    environment (see the socket-tuning sketch below)
  • Mismatch between socket API and MPI communication
    model
  • The TCP/IP protocol itself is not the issue; its
    implementation is
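The large-latency point can be made concrete. What follows is a minimal sketch, not GridMPI code: the per-socket tuning a TCP-based MPI transport typically applies on a wide-area path. The helper name and the buffer size (about one bandwidth-delay product, here assuming a hypothetical 10 Gbps link with a 2 ms round-trip time) are illustrative assumptions, not values taken from the slides.

/* Illustrative sketch (not GridMPI code): per-socket tuning that a
 * TCP-based MPI transport might apply on a high-latency path.  The
 * buffer size assumes a hypothetical 10 Gbps link with a 2 ms RTT
 * (bandwidth-delay product of roughly 2.5 MB). */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

static int tune_wan_socket(int fd)
{
    int buf = 2500000;  /* roughly the bandwidth-delay product */
    int one = 1;

    /* Large send/receive buffers keep the pipe full despite the RTT. */
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &buf, sizeof(buf)) < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &buf, sizeof(buf)) < 0)
        return -1;

    /* The MPI library already coalesces small messages, so disable
     * Nagle's algorithm to avoid added latency on short sends. */
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}

Such per-socket tuning only goes so far with the kernel stacks of the time, which is one motivation for the new TCP/IP implementation planned on the Approach slide.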

8
Lessons Learned
  • Mismatch between socket API and MPI communication
    model

MPI_Irecv(buf, MPI_ANY_SOURCE, MPI_ANY_TAG, ...)
MPI_Irecv(buf, 1, 2, ...)
MPI_Irecv(buf, 1, MPI_ANY_TAG, ...)
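These wildcard receives show the mismatch: an MPI receive matches on (source, tag) across all peers, while a socket read is tied to a single connection. The sketch below is not GridMPI code and its helper names are invented; it only illustrates the polling and matching loop a socket-based MPI transport ends up implementing itself.

/* Hypothetical sketch (not GridMPI code): MPI_ANY_SOURCE / MPI_ANY_TAG
 * cannot be mapped onto a single read(), so a TCP-based transport
 * polls every peer connection and does its own (source, tag) matching. */
#include <poll.h>
#include <sys/socket.h>

struct msg_header { int source; int tag; int len; };

/* A receive matches if each field is equal or a wildcard (-1 here). */
static int matches(const struct msg_header *h, int source, int tag)
{
    return (source == -1 || source == h->source) &&
           (tag    == -1 || tag    == h->tag);
}

static int read_header(int fd, struct msg_header *h)
{
    return recv(fd, h, sizeof(*h), MSG_WAITALL) == sizeof(*h) ? 0 : -1;
}

/* Block until some peer delivers a header matching the posted receive. */
static int wait_for_match(struct pollfd *peers, int npeers,
                          int source, int tag, struct msg_header *out)
{
    for (;;) {
        if (poll(peers, npeers, -1) < 0)
            return -1;
        for (int i = 0; i < npeers; i++) {
            if (!(peers[i].revents & POLLIN))
                continue;
            if (read_header(peers[i].fd, out) < 0)
                return -1;
            if (matches(out, source, tag))
                return i;   /* index of the matching peer */
            /* A real library would queue this non-matching message as
             * "unexpected"; that bookkeeping is omitted in the sketch. */
        }
    }
}

The per-message matching and the unexpected-message queue are bookkeeping the socket API pushes onto the library; that is the mismatch this slide points to.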
9
GridMPI
  • Latency-aware MPI implementation
  • Applications are developed on a small cluster
    located at a lab
  • Production runs take place in the Grid environment

(Figure: a lab cluster used for application development, connected over the Internet to data resources in the Grid environment.)
10
Is It Feasible?
  • Is it feasible to run non-EP (Embarrassingly
    Parallel) applications on Grid-connected
    clusters?
  • NO for long-distance networks
  • YES for metropolitan- or campus-area networks
  • Example: Greater Tokyo area
  • Diameter: 100-300 km (60-200 miles)
  • Latency: 1-2 ms one-way
  • Bandwidth: 1-10 Gbps or more
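A back-of-the-envelope check makes the YES plausible: the amount of data that must be kept in flight on such a link (the bandwidth-delay product) stays in the megabyte range. The program below simply plugs the slide's numbers into that formula.

/* Back-of-the-envelope check using the slide's figures: bytes that
 * must be in flight to fill a metropolitan-area link, i.e. the
 * bandwidth-delay product (bandwidth x round-trip time). */
#include <stdio.h>

int main(void)
{
    double gbps[]   = { 1.0, 10.0 };  /* link bandwidth in Gbit/s     */
    double rtt_ms[] = { 2.0, 4.0 };   /* 1-2 ms one-way => 2-4 ms RTT */

    for (int b = 0; b < 2; b++)
        for (int r = 0; r < 2; r++) {
            double bytes = gbps[b] * 1e9 / 8.0 * rtt_ms[r] * 1e-3;
            printf("%5.1f Gbps, %3.1f ms RTT -> %5.2f MB in flight\n",
                   gbps[b], rtt_ms[r], bytes / 1e6);
        }
    return 0;
}

Even the worst case here (10 Gbps, 4 ms RTT) needs only about 5 MB of buffering per connection, which stays manageable; on a long-distance link with tens of milliseconds of latency the same product grows by an order of magnitude, matching the NO above.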

11
Experimental Environment
Node
Node
Cluster 1
Cluster 2
Node
Node
16 nodes
16 nodes
Delay 0.5ms, 1.0ms, 1.5ms, 2.0ms, 10ms
1Gbps Ethernet
1Gbps Ethernet
Node
Nodes
Router PC (NIST Net)
Router PC
Cluster Nodes
12
NAS Parallel Benchmark Results
CG (Class B)
LU (Class B)
MG (Class B)
Scalability Relative to 16 node MPICH-SCore with
no delay case
  • Speed up 1.2 to twice
  • Memory usage twice

13
Approach
  • Latency-aware Communication Facility
  • New TCP/IP Implementation
  • New socket API
  • Additional feature for MPI
  • New communication protocol at the MPI
    implementation level
  • Message routing
  • Dynamic collective communication path

14
GridMPI Software Architecture
  • MPI Core
  • Providing MPI features: Communicator, Group, and
    Topology
  • Providing MPI communication facilities
    implemented using Grid ADI
  • RPIM (Remote Process Invocation Mechanism)
  • Abstraction of remote process invocation
    mechanisms
  • IMPI
  • Interoperable MPI specification
  • Grid ADI
  • Abstraction of communication facilities
  • LACT (Latency-Aware Communication Topology)
  • Transparency of latency and network topology
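The slides name these layers but not their interfaces, so the following is a purely hypothetical sketch of what an abstract device layer in the spirit of Grid ADI could look like: the MPI core dispatches through a per-transport table of functions, so the intra-cluster path, wide-area TCP, and IMPI are interchangeable underneath it. Every name in the sketch (grid_adi_ops, route_to, the three transport tables) is invented for illustration.

/* Hypothetical sketch only: the real Grid ADI interface is not shown
 * in the slides.  The idea illustrated is an abstract device layer in
 * which the MPI core calls through a per-transport function table. */
typedef struct grid_adi_ops {
    int  (*init)(void);
    int  (*isend)(int dest, int tag, const void *buf, int len, void **req);
    int  (*irecv)(int src,  int tag, void *buf, int len, void **req);
    int  (*test)(void *req, int *done);
    void (*finalize)(void);
} grid_adi_ops_t;

/* Invented transport tables: intra-cluster, wide-area TCP, and IMPI. */
extern const grid_adi_ops_t local_ops;
extern const grid_adi_ops_t tcp_ops;
extern const grid_adi_ops_t impi_ops;

/* Pick a transport per destination rank.  A real implementation would
 * consult a process map built at startup; the cutoff here is made up. */
static const grid_adi_ops_t *route_to(int dest_rank)
{
    return (dest_rank < 16) ? &local_ops : &tcp_ops;
}

With such a table, the MPI core can implement communicators, groups, and topologies once and leave wire-level details to each transport, which matches how this slide divides the responsibilities.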

15
LACT (Latency-Aware Communication Topology)
  • Takes network bandwidth and latency into account
  • Message routing using point-to-point
    communication
  • Independent of IP routing
  • Aggregating data in collective communication
  • Communication patterns matched to the network
    topology

16
LACT (Latency-Aware Communication Topology)
An Example: Reduction
  • Takes network bandwidth and latency into account
  • Message routing using point-to-point
    communication
  • Independent of IP routing
  • Aggregating data in collective communication
  • Communication patterns matched to the network
    topology

(Figure: an example reduction over four clusters A, B, C, and D, connected by one 10 Gbps / 1 ms link, two 1 Gbps / 0.5 ms links, and one 100 Mbps / 2 ms link.)
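To make the reduction example concrete, here is a sketch in plain MPI, not GridMPI's LACT code, of the latency-aware idea: reduce inside each cluster over its fast links first, then let one leader per cluster cross the slow wide-area links. The function name and the assumption that each process already knows its cluster id (my_cluster) are illustrative.

/* Not GridMPI code: a plain-MPI illustration of the latency-aware idea
 * behind LACT.  Each cluster reduces locally over its fast links, then
 * only one leader per cluster crosses the slow wide-area links.
 * `my_cluster` is a nonnegative cluster id assumed to be known, e.g.
 * from a host map (hypothetical). */
#include <mpi.h>

double latency_aware_sum(double local_value, int my_cluster)
{
    int world_rank, intra_rank;
    double cluster_sum = 0.0, total = 0.0;
    MPI_Comm intra, leaders;

    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Step 1: reduce inside each cluster (low-latency links). */
    MPI_Comm_split(MPI_COMM_WORLD, my_cluster, world_rank, &intra);
    MPI_Reduce(&local_value, &cluster_sum, 1, MPI_DOUBLE, MPI_SUM, 0, intra);
    MPI_Comm_rank(intra, &intra_rank);

    /* Step 2: only cluster leaders exchange one value over the WAN. */
    MPI_Comm_split(MPI_COMM_WORLD,
                   intra_rank == 0 ? 0 : MPI_UNDEFINED, world_rank, &leaders);
    if (leaders != MPI_COMM_NULL) {
        MPI_Reduce(&cluster_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, leaders);
        MPI_Comm_free(&leaders);
    }
    MPI_Comm_free(&intra);

    /* The result is valid only on the global root's cluster leader. */
    return total;
}

Over the topology in the figure, this sends at most one value per cluster across the 100 Mbps / 2 ms link instead of one per process, which is the effect LACT aims for by choosing communication patterns from the network topology.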
17
Schedule
  • Current
  • The first GridMPI implementation
  • A part of MPI-1 and IMPI
  • NAS parallel benchmarks run
  • FY 2003
  • GridMPI version 0.1
  • MPI-1 and IMPI
  • Prototype of new TCP/IP implementation
  • Prototype of a LACT implementation
  • FY 2004
  • GridMPI version 0.5
  • MPI-2
  • New TCP/IP implementation
  • LACT implementation
  • OGSA interface
  • Vendor MPI
  • FY 2005
  • GridMPI version 1.0