mpiP Evaluation Report

1
mpiP Evaluation Report
  • Hans Sherburne
  • Adam Leko
  • UPC Group
  • HCS Research Laboratory
  • University of Florida

2
Basic Information
  • Name
  • mpiP
  • Developer
  • Jeffrey Vetter (ORNL), Chris Chambreau (LLNL)
  • Current version
  • mpiP v2.8
  • Website
  • http://www.llnl.gov/CASC/mpip/
  • Contacts
  • Jeffrey Vetter, vetterjs@ornl.gov
  • Chris Chambreau, chcham@llnl.gov

3
mpiP: Lightweight, Scalable MPI Profiling
  • mpiP is a simple, lightweight profiling tool
  • Gathers information through the MPI profiling
    layer (see the interposition sketch after this
    list)
  • Probably not a good candidate to be extended for
    UPC or SHMEM
  • Supports many platforms running Linux, Tru64,
    AIX, UNICOS, IBM BG/L
  • Very simple to use, and the output file is very
    easy to understand
  • Provides statistics for the top twenty MPI
    callsites, ranked by time spent in the call and
    by total size of messages sent; also provides
    statistics for MPI I/O
  • Callsite traceback depth is configurable,
    allowing the user to differentiate between and
    examine the behavior of routines that wrap MPI
    calls
  • An mpiP viewer, Mpipview, is available as part of
    Tool Gear
  • Some of its functionality is exposed to
    developers through an API: stack walking,
    address-to-source translation, symbol demangling,
    timing routines, and accessing the name of the
    executable
  • These functions might be useful if source-code
    correlation is to be included in a UPC or SHMEM
    tool
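
Since mpiP gathers everything through the MPI profiling layer, the
mechanism is easy to sketch. The following is a minimal hand-written
illustration of PMPI interposition, not mpiP's actual source; the
counters and the per-task report are illustrative only:

    /* Minimal MPI profiling-layer interposition in C.  Every MPI
       implementation supplies PMPI_ entry points, so a wrapper linked
       ahead of the MPI library can intercept a call, record
       statistics, and forward to the real routine. */
    #include <mpi.h>
    #include <stdio.h>

    static double send_time  = 0.0;  /* time accumulated in MPI_Send */
    static long   send_calls = 0;    /* number of intercepted calls  */

    int MPI_Send(const void *buf, int count, MPI_Datatype type,
                 int dest, int tag, MPI_Comm comm)
    {
        double t0 = PMPI_Wtime();
        int rc = PMPI_Send(buf, count, type, dest, tag, comm);
        send_time += PMPI_Wtime() - t0;
        send_calls++;
        return rc;
    }

    int MPI_Finalize(void)
    {
        int rank;
        PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
        fprintf(stderr, "task %d: %ld sends, %.3f s in MPI_Send\n",
                rank, send_calls, send_time);
        return PMPI_Finalize();
    }

Because the interception happens at link time, the profiled
application needs no source changes, only relinking; it also explains
why the approach does not transfer to UPC or SHMEM, which have no
equivalent standard profiling layer.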

4
What is mpiP Useful For?
  • The data collected by mpiP is useful for
    analyzing the scalability of parallel
    applications.
  • By examining the aggregate time and rank
    correlation of the time spent in each MPI call
    versus the total time spent in MPI calls while
    increasing the number of tasks, one can locate
    flaws in load balancing and algorithm design
    (a post-processing sketch follows this list).
  • This technique is described in [1]: J. Vetter
    and M. McCracken, "Statistical Scalability
    Analysis of Communication Operations in
    Distributed Applications," PPoPP 2001.
  • The figures that follow are courtesy of [1].
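
mpiP reports only the per-run numbers, so the correlation itself is
computed by hand (see the next slide). A minimal post-processing
sketch, using made-up timings from four hypothetical runs, might look
like this:

    /* Spearman rank correlation of one callsite's MPI time against
       total MPI time across runs at increasing task counts, in the
       spirit of the analysis in [1].  Not part of mpiP; all numbers
       are invented.  With no ties,
       rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)). */
    #include <stdio.h>

    /* rank[i] = position of v[i] in ascending order (no ties) */
    static void rank_values(const double *v, int n, int *rank)
    {
        for (int i = 0; i < n; i++) {
            rank[i] = 0;
            for (int j = 0; j < n; j++)
                if (v[j] < v[i]) rank[i]++;
        }
    }

    static double spearman(const double *x, const double *y, int n)
    {
        int rx[16], ry[16];            /* enough for n <= 16 runs */
        double sum_d2 = 0.0;
        rank_values(x, n, rx);
        rank_values(y, n, ry);
        for (int i = 0; i < n; i++) {
            double d = rx[i] - ry[i];
            sum_d2 += d * d;
        }
        return 1.0 - 6.0 * sum_d2 / (n * ((double)n * n - 1.0));
    }

    int main(void)
    {
        /* hypothetical mpiP results at 2, 4, 8, and 16 tasks */
        double callsite[]  = {  1.2,  2.9,  6.5, 14.8 };
        double total_mpi[] = { 10.1, 13.0, 19.7, 33.2 };
        printf("rho = %.3f\n", spearman(callsite, total_mpi, 4));
        return 0;  /* rho near 1 flags a poorly scaling callsite */
    }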

5
The Downside
  • mpiP does provide the measurements of aggregate
    callsite time and total MPI call time necessary
    for computing the rank correlation coefficient
  • mpiP does NOT automate the process of computing
    the rank correlation, which must use data from
    multiple experiments
  • Equations for calculating the coefficients of
    correlation (linear and rank), courtesy of [1],
    are reproduced below
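
The equation images on this slide do not survive in the transcript.
In standard notation (which may differ cosmetically from [1]), for
paired samples x_i, y_i, i = 1..n:

    % linear (Pearson) correlation coefficient
    r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
             {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
              \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

    % rank (Spearman) correlation coefficient, where d_i is the
    % difference between the ranks of x_i and y_i (no ties)
    \rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}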

6
Partial Sample of mpiP Output
7
Information Provided by mpiP
  • Information is displayed in terms of tasks and
    callsites, which correspond to machines and to
    MPI calls in the source code, arranged in the
    following sections:
  • Time per task (AppTime, MPITime, MPI%)
  • Location of each callsite in source code
    (callsite, line, parent function, MPI call)
  • Aggregate time per callsite, top twenty (time,
    app%, MPI%, variance)
  • Aggregate sent-message size per callsite, top
    twenty (count, total, avg., MPI%)
  • Time statistics per callsite per task, all (max,
    min, mean, app%, MPI%)
  • Sent-message size statistics per callsite per
    task, all (count, max, min, mean, sum)
  • I/O statistics per callsite per task, all (count,
    max, min, mean, sum)

8
mpiP Overhead
9
Source Code Correlation in Mpipview
10
Bottleneck Identification Test Suite
  • Testing metric: what did the profile data tell
    us?
  • CAMEL: TOSS-UP
  • Profile showed that MPI time is a small
    percentage of overall application time
  • Profile reveals some imbalance in the amount of
    time spent in certain calls, but doesn't help the
    user understand the cause
  • Profile does not provide information about what
    occurs when execution is not in MPI calls.
  • Difficult to grasp overall program behavior from
    profiling information alone
  • NAS LU: TOSS-UP
  • Profile reveals that MPI function calls consume
    a significant portion of application time
  • Profile reveals some imbalance in the amount of
    time spent in certain calls, but doesn't help the
    user understand the cause
  • Profile does not provide information about what
    occurs when execution is not in MPI calls.
  • Difficult to grasp overall program behavior from
    profiling information alone

11
Bottleneck Identification Test Suite (2)
  • Big message: PASSED
  • Profile clearly shows that Send and Recv dominate
    the application time
  • Profile shows a large number of bytes transferred
  • Diffuse procedure: FAIL
  • Profile showed a large amount of time spent in
    barrier
  • Time is diffused across processes
  • Profile does not show that in each barrier a
    single process is always delaying completion
  • Hot procedure: FAIL
  • No profile output, because there are no MPI calls
    (other than setup and teardown)
  • Intensive server: PASSED
  • Profile showed one process spent very little time
    in MPI calls, while the remaining processes spent
    nearly all their time in Recvs
  • Profile showed one process sent an order of
    magnitude more data than the others, and spent
    far more time in Send
  • Ping pong: PASSED
  • Profile showed time spent in MPI function calls
    dominated the total application time
  • Profile showed an excessive number of Sends and
    Recvs with little load imbalance
  • Random barrier: PASSED
  • Profile shows that the majority of execution time
    is spent in Barrier, called by processes not
    holding the potato
  • Small messages: PASSED
  • Profile clearly shows a single process spends
    almost all of the total application time in Recv,
    and receives an excessive number of messages sent
    by all the other processes
  • System time: FAIL
  • No profile output, because there are no MPI calls
    (other than setup and teardown)
  • Wrong way: TOSS-UP
  • One process spends most of the execution time in
    sends; the other spends most of it in receives
  • Profile does not reveal the improperly ordered
    communication pattern

12
Evaluation (1)
  • Available metrics: 1/5
  • Only provides a handful of statistics about the
    time, message size, and frequency of MPI calls
  • No hardware counter support
  • Cost: free, 5/5
  • Documentation quality: 4/5
  • Though brief (a single webpage), the
    documentation adequately covers installation and
    available functionality
  • Extensibility: 2/5
  • mpiP is designed around the MPI profiling layer,
    so it would not be readily adapted to UPC or
    SHMEM and would be of little use there
  • The source code correlation functions work well
  • Filtering and aggregation: 2/5
  • mpiP was designed to be lightweight, and presents
    statistics for the top twenty callsites
  • Output size grows with number of tasks (machines)
  • Hardware support: 5/5
  • 64-bit Linux (Itanium and Opteron), IBM SP (AIX),
    AlphaServer (Tru64), Cray X1, Cray XD1, SGI
    Altix, IBM BlueGene/L
  • Heterogeneity support: 0/5 (not supported)

13
Evaluation (2)
  • Installation: 5/5
  • About as easy as you could expect
  • Interoperability: 1/5
  • mpiP has its own output format
  • Learning curve: 4/5
  • Easy to use
  • Simple statistics are easily understood
  • Manual overhead: 1/5
  • All MPI calls are automatically instrumented when
    linking against the mpiP library
  • No way to turn tracing on/off in places without
    relinking
  • Measurement accuracy: 4/5
  • CAMEL overhead: 5%
  • Correctness of programs is not affected
  • Overhead is low (less than 7% for all test-suite
    programs)

14
Evaluation (3)
  • Multiple executions: 0/5 (not supported)
  • Multiple analyses/views: 2/5
  • Statistics regarding MPI calls are displayed in
    output file
  • Source-code location to callsite correlation is
    provided by Mpipview
  • Performance bottleneck identification: 2.5/5
  • No automatic methods supported
  • Some bottlenecks could be deduced by examining
    gathered statistics
  • Lack of trace information makes some bottlenecks
    impossible to detect
  • Profiling/tracing support: 2/5
  • Only supports profiling
  • Profiling can be enabled for specific regions of
    code by editing the source (see the sketch after
    this list)
  • Turning profiling on/off requires recompilation
  • (the documentation gives a runtime environment
    variable for deactivating profiling, and it is
    acknowledged in the profile output file when set,
    but profiling is not actually disabled)
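
The source edit in question uses the standard MPI_Pcontrol hook;
mpiP's documentation describes honoring level 0 (profiling off) and
level 1 (profiling on), though the exact levels supported in v2.8 are
an assumption here:

    /* Sketch: restricting mpiP profiling to one region of interest.
       MPI_Pcontrol is standard MPI; a PMPI-based tool may interpret
       its level argument as it sees fit. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        MPI_Pcontrol(0);   /* suppress profiling of setup traffic */
        /* ... setup communication not of interest ... */

        MPI_Pcontrol(1);   /* profile only the phase of interest  */
        /* ... communication to be measured ... */
        MPI_Pcontrol(0);

        MPI_Finalize();    /* mpiP writes its report at finalize  */
        return 0;
    }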

15
Evaluation (4)
  • Response time: 3/5
  • No results until after the run
  • Quickly assembles its report at the end of the
    run
  • Searching: 0/5 (not supported)
  • Software support: 3/5
  • Supports C, C++, Fortran
  • Supports a large number of compilers
  • Tied closely to MPI applications
  • Source code correlation: 4/5
  • Source-code line numbers are provided for each
    MPI callsite in the output file
  • Automatic source code correlation provided by
    Mpipview
  • System stability: 5/5
  • mpiP and Mpipview work very reliably
  • Technical support: 5/5
  • Co-author Chris Chambreau responded quickly and
    provided good information, allowing us to correct
    a problem with one of our benchmark apps