Title: A Performance Comparison of DSM, PVM, and MPI
1. A Performance Comparison of DSM, PVM, and MPI
- Paul Werstein
- Mark Pethick
- Zhiyi Huang
2. Introduction
Relatively little is known about the performance
of Distributed Shared Memory systems compared to
Message Passing systems. We compare the
performance of the TreadMarks DSM system with two
popular message passing systems: MPICH (MPI) and
PVM.
3. Introduction
Three applications are compared: Mergesort,
Mandelbrot Set Generation, and a Backpropagation
Neural Network. Each application represents a
different class of problem.
4. TreadMarks DSM
- Provides locks and barriers as primitives.
- Uses Lazy Release Consistency.
- Granularity of sharing is a page.
- Creates page differentials to avoid the false
sharing effect.
- Version 1.0.3.3 (a usage sketch of the lock and barrier primitives follows).
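As a rough illustration of how the lock and barrier primitives are used, here is a minimal TreadMarks-style sketch in C. It is not code from the paper; it assumes the commonly documented Tmk_* API (Tmk_startup, Tmk_malloc, Tmk_distribute, Tmk_barrier, Tmk_lock_acquire/Tmk_lock_release), and the exact names and signatures may differ in version 1.0.3.3.

/* Illustrative TreadMarks-style sketch (not the paper's code): a shared
 * counter protected by lock 0, with barriers separating phases.  Assumes
 * the commonly documented Tmk_* API. */
#include <stdio.h>
#include "Tmk.h"

int *shared_sum;                        /* pointer into the DSM shared heap */

int main(int argc, char **argv)
{
    Tmk_startup(argc, argv);            /* join the distributed shared memory */

    if (Tmk_proc_id == 0) {
        shared_sum = (int *)Tmk_malloc(sizeof(int));
        *shared_sum = 0;
        Tmk_distribute(&shared_sum, sizeof(shared_sum)); /* publish the pointer */
    }
    Tmk_barrier(0);                     /* every process now sees shared_sum */

    Tmk_lock_acquire(0);                /* lazy release consistency: updates   */
    *shared_sum += Tmk_proc_id;         /* propagate at acquire/release points */
    Tmk_lock_release(0);

    Tmk_barrier(0);
    if (Tmk_proc_id == 0)
        printf("sum = %d\n", *shared_sum);

    Tmk_exit(0);
    return 0;
}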
5. Parallel Virtual Machine
- Provides concept of a virtual parallel machine.
- Exists as a daemon on each node.
- Inter-process communication is mediated by the
daemons (see the example below).
- Designed for flexibility.
- Version 3.4.3.
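To make the daemon-mediated model concrete, here is a minimal PVM 3 sketch (our illustration, not the paper's code). It assumes a single binary registered under the hypothetical task name "worker": the master sends an integer to a spawned worker and receives the doubled value back, with every message routed through the pvmd daemons.

/* Minimal PVM 3 sketch: master spawns one worker, sends an integer, and
 * receives the doubled value.  The task name "worker" is a placeholder. */
#include <stdio.h>
#include "pvm3.h"

int main(void)
{
    pvm_mytid();                        /* enrol with the local pvmd daemon */
    int parent = pvm_parent();

    if (parent == PvmNoParent) {        /* master side */
        int wtid, value = 21, reply;
        pvm_spawn("worker", NULL, PvmTaskDefault, "", 1, &wtid);

        pvm_initsend(PvmDataDefault);   /* XDR-encoded send buffer */
        pvm_pkint(&value, 1, 1);
        pvm_send(wtid, 1);              /* routed via the daemons */

        pvm_recv(wtid, 2);
        pvm_upkint(&reply, 1, 1);
        printf("reply = %d\n", reply);
    } else {                            /* worker side */
        int value;
        pvm_recv(parent, 1);
        pvm_upkint(&value, 1, 1);
        value *= 2;
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&value, 1, 1);
        pvm_send(parent, 2);
    }

    pvm_exit();
    return 0;
}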
6. MPICH - MPI
- Standard interface for developing message passing
applications.
- Primary design goal is performance.
- Primarily defines communication primitives (see the example below).
- MPICH is a reference implementation of the MPI standard.
- Version 1.2.4.
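For comparison, an MPI program of the same scale needs only the standard primitives. This short example (ours, not from the paper) sums the ranks with a collective reduction.

/* Minimal MPI sketch: each rank contributes its rank number and rank 0
 * prints the sum, using only standard MPI communication primitives. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Collective reduction across all processes in MPI_COMM_WORLD. */
    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of ranks 0..%d = %d\n", size - 1, sum);

    MPI_Finalize();
    return 0;
}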
7. System
- 32-node Linux cluster.
- 800 MHz Pentium with 256 MB RAM.
- Red Hat 7.2.
- 100 Mbit Ethernet.
- Results determined for 1, 2, 4, 8, 16, 24, and 32
processes.
8. Mergesort
- Parallelisation strategy used is divide and conquer.
- Synchronisation between pairs of nodes (see the sketch after this list).
- Loosely synchronous class of problem.
- Coarse-grained synchronisation.
- Irregular synchronisation points.
- Alternating phases of computation and
communication.
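One way to picture the pairwise synchronisation is the sketch below (our illustration, written against MPI rather than any particular system in the paper): each process sorts its local chunk, then in log2(P) steps the higher-ranked node of each pair ships its data to the lower-ranked node, which merges, giving the alternating computation and communication phases listed above.

/* Illustrative pairwise (divide and conquer) merge phase.  'local' must be
 * heap-allocated because it is replaced by larger merged arrays; on return,
 * rank 0 holds the fully sorted data. */
#include <stdlib.h>
#include <mpi.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Merge two sorted arrays into a newly allocated array. */
static int *merge(const int *a, int na, const int *b, int nb)
{
    int *out = malloc((na + nb) * sizeof(int));
    int i = 0, j = 0, k = 0;
    while (i < na && j < nb) out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < na) out[k++] = a[i++];
    while (j < nb) out[k++] = b[j++];
    return out;
}

int *parallel_mergesort(int *local, int n, int rank, int nprocs)
{
    MPI_Status st;

    qsort(local, n, sizeof(int), cmp_int);        /* local computation phase */

    for (int step = 1; step < nprocs; step *= 2) {
        if (rank % (2 * step) == 0) {             /* lower-ranked node: merge */
            int partner = rank + step, recv_n;
            if (partner >= nprocs) continue;
            MPI_Recv(&recv_n, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, &st);
            int *buf = malloc(recv_n * sizeof(int));
            MPI_Recv(buf, recv_n, MPI_INT, partner, 1, MPI_COMM_WORLD, &st);
            int *merged = merge(local, n, buf, recv_n);
            free(buf);
            free(local);
            local = merged;
            n += recv_n;
        } else {                                  /* higher-ranked node: send, then stop */
            int partner = rank - step;
            MPI_Send(&n, 1, MPI_INT, partner, 0, MPI_COMM_WORLD);
            MPI_Send(local, n, MPI_INT, partner, 1, MPI_COMM_WORLD);
            break;
        }
    }
    return local;
}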
9. Mergesort Results (1)
10. Mergesort Results (2)
11. Mandelbrot Set
- Strategy used is Data Partitioning.
- A work pool is used because the computation time of
sections differs.
- Work pool size > 2 × number of processes.
- Embarrassingly parallel class of problem.
- May involve complex computation, but there is
very little communication.
- Gives an indication of performance under ideal
conditions (see the sketch below).
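The work pool can be pictured as a master/worker loop. The example below is our illustration in MPI with arbitrary image dimensions, not the paper's code: the master (rank 0) hands out one row at a time, so rows that need deep iteration do not stall faster workers.

/* Illustrative MPI work-pool sketch for Mandelbrot set generation.  WIDTH,
 * HEIGHT, the iteration limit, and the mapped region of the complex plane
 * are arbitrary choices. */
#include <stdio.h>
#include <mpi.h>

#define WIDTH    1024
#define HEIGHT   1024
#define TAG_WORK 1
#define TAG_DONE 2

/* Iteration counts for one image row (the "complex computation" part). */
static void compute_row(int row, int counts[WIDTH])
{
    for (int col = 0; col < WIDTH; col++) {
        double cr = -2.0 + 3.0 * col / WIDTH;   /* map pixel to complex plane */
        double ci = -1.5 + 3.0 * row / HEIGHT;
        double zr = 0.0, zi = 0.0;
        int it = 0;
        while (zr * zr + zi * zi < 4.0 && it < 1000) {
            double t = zr * zr - zi * zi + cr;
            zi = 2.0 * zr * zi + ci;
            zr = t;
            it++;
        }
        counts[col] = it;
    }
}

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                            /* master: manage the work pool */
        static int image[HEIGHT][WIDTH];
        int next = 0, active = 0;
        MPI_Status st;

        for (int w = 1; w < size; w++) {        /* hand out the initial rows */
            if (next < HEIGHT) {
                MPI_Send(&next, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
                next++; active++;
            } else {
                MPI_Send(&next, 1, MPI_INT, w, TAG_DONE, MPI_COMM_WORLD);
            }
        }
        while (active > 0) {                    /* collect results, refill pool */
            int row;
            MPI_Recv(&row, 1, MPI_INT, MPI_ANY_SOURCE, TAG_WORK, MPI_COMM_WORLD, &st);
            MPI_Recv(image[row], WIDTH, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                     MPI_COMM_WORLD, &st);
            if (next < HEIGHT) {
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
                next++;
            } else {
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_DONE, MPI_COMM_WORLD);
                active--;
            }
        }
        printf("computed %d rows\n", next);
    } else {                                    /* worker: request/compute loop */
        int row, counts[WIDTH];
        MPI_Status st;
        for (;;) {
            MPI_Recv(&row, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_DONE) break;
            compute_row(row, counts);
            MPI_Send(&row, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
            MPI_Send(counts, WIDTH, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}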
12. Mandelbrot Set Results
13. Neural Network (1)
- Strategy is Data Partitioning.
- Each processor trains the network on a subsection
of the data set.
- Weight changes are summed and applied at the end of each
epoch (see the sketch below).
- Requires large data sets to be effective.
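The end-of-epoch step maps naturally onto a global sum. The sketch below is our illustration (NWEIGHTS and train_on_slice are placeholders, and it uses MPI_Allreduce for brevity): each process accumulates weight deltas over its slice of the training set, the deltas are summed across all processes, and the summed change is applied once per epoch.

/* Illustrative per-epoch synchronisation for data-partitioned
 * backpropagation training (placeholder names, not the paper's code). */
#include <string.h>
#include <mpi.h>

#define NWEIGHTS 4096                         /* assumed network size */

/* Stub: real code would run backpropagation over the local data slice and
 * accumulate the resulting weight changes into delta. */
static void train_on_slice(const double *weights, double *delta)
{
    (void)weights;
    (void)delta;
}

static void train_epochs(double *weights, int nepochs)
{
    double local_delta[NWEIGHTS], global_delta[NWEIGHTS];

    for (int e = 0; e < nepochs; e++) {
        memset(local_delta, 0, sizeof(local_delta));
        train_on_slice(weights, local_delta); /* compute phase: local slice only */

        /* Regular synchronisation point: sum every process's deltas. */
        MPI_Allreduce(local_delta, global_delta, NWEIGHTS,
                      MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        for (int i = 0; i < NWEIGHTS; i++)    /* apply once per epoch */
            weights[i] += global_delta[i];
    }
}

In use, train_epochs would be called on every process between MPI_Init and MPI_Finalize, with each process holding an identical copy of the weights so that the summed update keeps them in step.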
14. Neural Network (2)
- Synchronous class problem.
- Characterised by an algorithm that carries out the
same operation on all points in the data set.
- Synchronisation occurs at regular points.
- Often applies to problems that use data
partitioning.
- A large number of problems appear to belong to
the synchronous class.
15. Neural Network Results (1)
16. Neural Network Results (2)
17. Neural Network Results (3)
18. Conclusion
- In general, the performance of DSM is poorer than
that of MPICH or PVM.
- Main reasons identified are:
  - The increased use of memory associated with the
    creation of page differentials.
  - The false sharing effect due to the granularity of
    sharing.
  - Differential accumulation in the gather
    operation.