Parallel Computing on Wide-Area Clusters: the Albatross Project

1
Parallel Computing on Wide-Area Clusters: the Albatross Project
Henri Bal
Vrije Universiteit Amsterdam, Faculty of Sciences
  • Aske Plaat
  • Thilo Kielmann
  • Jason Maassen
  • Rob van Nieuwpoort
  • Ronald Veldema

2
Introduction
  • Cluster computing is becoming popular
    • Excellent price/performance ratio
    • Fast commodity networks
  • Next step: wide-area cluster computing
    • Use multiple clusters for a single application
    • A form of metacomputing
  • Challenges
    • Software infrastructure (e.g., Legion, Globus)
    • Parallel applications that can tolerate
      WAN latencies

3
Albatross project
  • Study applications and programming environments
    for wide-area parallel systems
  • Basic assumption: a wide-area system is
    hierarchical
    • Connect clusters, not individual workstations
  • General approach
    • Optimize applications to exploit the hierarchical
      structure → most communication is local

4
Outline
  • Experimental system and programming environments
  • Application-level optimizations
  • Performance analysis
  • Wide-area optimized programming environments

5
Distributed ASCI Supercomputer (DAS)
  • Four clusters connected by 6 Mb/s ATM: VU (128
    nodes), UvA (24), Leiden (24), Delft (24)
  • Node configuration: 200 MHz Pentium Pro, 64-128 MB
    memory, 2.5 GB local disk, Myrinet LAN, Fast
    Ethernet LAN, RedHat Linux 2.0.36
6
Programming environments
  • Existing libraries/languages expose the
    hierarchical structure:
    • Number of clusters
    • Mapping of CPUs to clusters
  • Panda library
    • Point-to-point communication
    • Group communication
    • Multithreading

[Figure: software stack. Java, Orca, and MPI run on top of the Panda
library; Panda runs on LFC (Myrinet) and on TCP/IP (ATM).]
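
The talk does not show Panda's interface. Purely as a hypothetical
illustration of the three facilities listed above (the names below are
invented for this sketch, not Panda's real API), such a layer might
expose something like:

```java
// Hypothetical sketch only: Panda's actual API is not shown in the
// talk. These names merely illustrate the three listed facilities.
interface PandaLike {
    // Point-to-point communication: deliver a message to one node
    void send(int destNode, byte[] message);

    // Group communication: deliver a message to all group members
    void broadcast(int groupId, byte[] message);

    // Multithreading: run a task on its own thread
    Thread spawn(Runnable task);
}
```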
7
Example: Java
  • Remote Method Invocation (RMI)
    • Simple, transparent, object-oriented, RPC-like
      communication primitive (see the sketch below)
  • Problem: RMI performance
    • JDK RMI on Myrinet is a factor of 40 slower than
      C-RPC (1228 vs. 30 µs)
  • Manta: a high-performance Java system [PPoPP'99]
    • Native (static) compilation: source → executable
    • Fast RMI protocol between Manta nodes
    • JDK-style protocol to interoperate with JVMs
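
The slides contain no code; as a minimal, hypothetical sketch of the
JDK-style RMI model (the Counter interface is illustrative, not from
the talk), a remote call is written exactly like a local one:

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

// Remote interface: any method here can be invoked across the network.
interface Counter extends Remote {
    int increment() throws RemoteException;
}

// Server-side implementation; extending UnicastRemoteObject exports
// the object so it is callable through an automatically generated stub.
class CounterImpl extends UnicastRemoteObject implements Counter {
    private int value = 0;

    CounterImpl() throws RemoteException { super(); }

    public synchronized int increment() throws RemoteException {
        return ++value;
    }
}
```

A client obtains a stub (e.g., via java.rmi.Naming.lookup) and calls
counter.increment() as if the object were local; this transparency is
what makes the cost of the underlying protocol easy to overlook.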

8
JDK versus Manta
[Chart: RMI latency of JDK versus Manta on a 200 MHz Pentium Pro with
Myrinet; JDK 1.1.4 interpreter, 1 object as parameter.]
9
Manta on wide-area DAS
  • 2 orders of magnitude between intra-cluster (LAN)
    and inter-cluster (WAN) communication performance
  • Application-level optimizations [JavaGrande'99]
    • Minimize WAN overhead

10
Example: SOR
  • Red/black Successive Overrelaxation
  • Neighbor communication, using RMI
  • Problem: nodes at cluster boundaries
  • Overlap wide-area communication with computation
    • RMI is synchronous → use multithreading (see the
      sketch below)

[Figure: time lines for CPUs 1-3 in Cluster 1 and CPUs 4-6 in
Cluster 2; an intra-cluster RMI takes about 50 µs, an inter-cluster
RMI about 5600 µs.]
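
A minimal sketch of the multithreading trick above, assuming a
hypothetical Neighbor RMI interface for the boundary-row exchange (the
names are illustrative; this is not Manta's actual SOR code):

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical RMI interface for swapping a boundary row with the
// neighbor on the other side of the WAN link.
interface Neighbor extends Remote {
    double[] exchangeRow(double[] row) throws RemoteException;
}

// Since the RMI itself is synchronous, a helper thread performs it
// while the main thread keeps updating interior rows.
class BoundaryExchange extends Thread {
    private final Neighbor remote;
    private final double[] outgoing;
    volatile double[] incoming;  // set once the remote call returns

    BoundaryExchange(Neighbor remote, double[] outgoing) {
        this.remote = remote;
        this.outgoing = outgoing;
    }

    public void run() {
        try {
            incoming = remote.exchangeRow(outgoing);  // blocking WAN RMI
        } catch (RemoteException e) {
            throw new RuntimeException(e);
        }
    }
}

// Per SOR iteration, roughly:
//   BoundaryExchange ex = new BoundaryExchange(stub, boundaryRow);
//   ex.start();              // WAN RMI proceeds in the background
//   updateInteriorRows();    // computation that needs no remote data
//   ex.join();               // then update the rows at the boundary
```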
11
Wide-area optimizations
12
Performance of Java applications
  • Wide-area DAS system: 4 clusters of 10 CPUs
  • Sensitivity to wide-area latency and bandwidth
  • See [HPCA'99]

13
Discussion
  • Optimized applications obtain good speedups
    • Reduce wide-area communication, or hide its
      latency
  • Java RMI is easy to use, but some optimizations
    are awkward to express
    • Lack of asynchronous communication and broadcast
  • The RMI model does not help exploit the
    hierarchical structure of wide-area systems
  • Need a wide-area optimized programming environment

14
MagPIe: wide-area collective communication
  • Collective communication among many processors
    • e.g., multicast, all-to-all, scatter, gather,
      reduction
  • MagPIe: MPI's collective operations optimized for
    hierarchical wide-area systems [PPoPP'99]
  • Transparent to the application programmer

15
Spanning-tree broadcast
[Figure: broadcast trees spanning Clusters 1-4, comparing the two
approaches below.]
  • MPICH (WAN-unaware)
    • Wide-area latency is chained
    • Data is sent multiple times over the same WAN link
  • MagPIe (WAN-optimized)
    • Each sender-receiver path contains at most 1
      WAN link
    • No data item travels multiple times to the same
      cluster (see the schedule sketch below)
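
A runnable sketch of the scheduling idea (not MagPIe's actual
implementation; the cluster layout is invented): the root sends once
over the WAN to one coordinator per remote cluster, and each
coordinator forwards the message only over its own LAN.

```java
import java.util.*;

// Prints a two-level broadcast schedule in which every root-to-receiver
// path crosses at most one WAN link and each message enters each
// cluster exactly once.
public class TwoLevelBroadcast {
    public static void main(String[] args) {
        // clusterOf[rank] = cluster id; e.g., 4 clusters of 3 CPUs
        int[] clusterOf = {0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3};
        int root = 0;

        // Pick one coordinator per cluster; the root leads its own.
        Map<Integer, Integer> coordinator = new LinkedHashMap<>();
        coordinator.put(clusterOf[root], root);
        for (int rank = 0; rank < clusterOf.length; rank++) {
            coordinator.putIfAbsent(clusterOf[rank], rank);
        }

        // Phase 1 (WAN): root -> coordinator of each other cluster
        for (int c : coordinator.values()) {
            if (c != root) {
                System.out.println("WAN: " + root + " -> " + c);
            }
        }

        // Phase 2 (LAN): each coordinator -> the rest of its cluster
        for (int rank = 0; rank < clusterOf.length; rank++) {
            int c = coordinator.get(clusterOf[rank]);
            if (rank != c) {
                System.out.println("LAN: " + c + " -> " + rank);
            }
        }
    }
}
```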

16
MagPIe results
  • MagPIe's collective operations are wide-area
    optimal, except for non-associative reduction
  • Operations are up to 10 times faster than MPICH
  • Factor 2-3 speedup over MPICH for some
    (unmodified) MPI applications

17
Conclusions
  • Wide-area parallel programming is feasible for
    many applications
  • Exploit hierarchical structure of wide-area
    systems to minimize WAN overhead
  • Programming systems should take hierarchical
    structure of wide-area systems into account