mpiP Evaluation Report

1
mpiP Evaluation Report
  • Hans Sherburne
  • Adam Leko
  • UPC Group
  • HCS Research Laboratory
  • University of Florida

2
Basic Information
  • Name
  • mpiP
  • Developer
  • Jeffrey Vetter (ORNL), Chris Chambreau (LLNL)
  • Current version
  • mpiP v2.8
  • Website
  • http://www.llnl.gov/CASC/mpip/
  • Contacts
  • Jeffrey Vetter, vetterjs@ornl.gov
  • Chris Chambreau, chcham@llnl.gov

3
mpiP: Lightweight, Scalable MPI Profiling
  • mpiP is a simple, lightweight profiling tool
  • Gathers information through the MPI profiling
    layer (see the interposition sketch after this
    list)
  • Probably not a good candidate to be extended for
    UPC or SHMEM
  • Supports many platforms running Linux, Tru64,
    AIX, UNICOS, IBM BG/L
  • Very simple to use, and the output file is very
    easy to understand
  • Provides statistics for the top twenty MPI
    callsites, ranked by time spent in the call and
    by total size of messages sent; also provides
    statistics for MPI I/O
  • Callsite traceback depth is configurable,
    allowing the user to differentiate between and
    examine the behavior of routines that wrap MPI
    calls
  • An mpiP viewer, Mpipview, is available as part of
    Tool Gear
  • Some of its functionality is exposed to
    developers through an API: stack walking,
    address-to-source translation, symbol demangling,
    timing routines, and accessing the name of the
    executable
  • These functions might be useful if source-code
    correlation is to be included in a UPC or SHMEM
    tool
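
Since mpiP gathers everything through the MPI profiling layer, the
mechanism is easy to sketch. The following is a minimal hand-written
illustration of PMPI interposition, not mpiP's actual source; the
counters and the per-task report are illustrative only:

    /* Minimal MPI profiling-layer interposition in C.  Every MPI
       implementation supplies PMPI_ entry points, so a wrapper linked
       ahead of the MPI library can intercept a call, record
       statistics, and forward to the real routine. */
    #include <mpi.h>
    #include <stdio.h>

    static double send_time  = 0.0;  /* time accumulated in MPI_Send */
    static long   send_calls = 0;    /* number of intercepted calls  */

    int MPI_Send(const void *buf, int count, MPI_Datatype type,
                 int dest, int tag, MPI_Comm comm)
    {
        double t0 = PMPI_Wtime();
        int rc = PMPI_Send(buf, count, type, dest, tag, comm);
        send_time += PMPI_Wtime() - t0;
        send_calls++;
        return rc;
    }

    int MPI_Finalize(void)
    {
        int rank;
        PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
        fprintf(stderr, "task %d: %ld sends, %.3f s in MPI_Send\n",
                rank, send_calls, send_time);
        return PMPI_Finalize();
    }

Because the interception happens at link time, the profiled
application needs no source changes, only relinking; it also explains
why the approach does not transfer to UPC or SHMEM, which have no
equivalent standard profiling layer.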

4
What is mpiP Useful For?
  • The data collected by mpiP is useful for
    analyzing the scalability of parallel
    applications.
  • By examining the aggregate time and rank
    correlation of the time spent in each MPI call
    versus the total time spent in MPI calls while
    increasing the number of tasks, one can locate
    flaws in load balancing and algorithm design
    (a post-processing sketch follows this list).
  • This technique is described in [1]: J. Vetter
    and M. McCracken, "Statistical Scalability
    Analysis of Communication Operations in
    Distributed Applications," PPoPP 2001.
  • The figures that follow are courtesy of [1].
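
mpiP reports only the per-run numbers, so the correlation itself is
computed by hand (see the next slide). A minimal post-processing
sketch, using made-up timings from four hypothetical runs, might look
like this:

    /* Spearman rank correlation of one callsite's MPI time against
       total MPI time across runs at increasing task counts, in the
       spirit of the analysis in [1].  Not part of mpiP; all numbers
       are invented.  With no ties,
       rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)). */
    #include <stdio.h>

    /* rank[i] = position of v[i] in ascending order (no ties) */
    static void rank_values(const double *v, int n, int *rank)
    {
        for (int i = 0; i < n; i++) {
            rank[i] = 0;
            for (int j = 0; j < n; j++)
                if (v[j] < v[i]) rank[i]++;
        }
    }

    static double spearman(const double *x, const double *y, int n)
    {
        int rx[16], ry[16];            /* enough for n <= 16 runs */
        double sum_d2 = 0.0;
        rank_values(x, n, rx);
        rank_values(y, n, ry);
        for (int i = 0; i < n; i++) {
            double d = rx[i] - ry[i];
            sum_d2 += d * d;
        }
        return 1.0 - 6.0 * sum_d2 / (n * ((double)n * n - 1.0));
    }

    int main(void)
    {
        /* hypothetical mpiP results at 2, 4, 8, and 16 tasks */
        double callsite[]  = {  1.2,  2.9,  6.5, 14.8 };
        double total_mpi[] = { 10.1, 13.0, 19.7, 33.2 };
        printf("rho = %.3f\n", spearman(callsite, total_mpi, 4));
        return 0;  /* rho near 1 flags a poorly scaling callsite */
    }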

5
The Downside
  • mpiP does provide the measurements of aggregate
    callsite time and total MPI call time necessary
    for computing the rank correlation coefficient
  • mpiP does NOT automate the process of computing
    the rank correlation, which must use data from
    multiple experiments
  • Equations for calculating the coefficients of
    correlation (linear and rank), courtesy of [1],
    are reproduced below
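
The equation images on this slide do not survive in the transcript.
In standard notation (which may differ cosmetically from [1]), for
paired samples x_i, y_i, i = 1..n:

    % linear (Pearson) correlation coefficient
    r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
             {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,
              \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

    % rank (Spearman) correlation coefficient, where d_i is the
    % difference between the ranks of x_i and y_i (no ties)
    \rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}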

6
Partial Sample of mpiP Output
7
Information Provided by mpiP
  • Information is displayed in terms of tasks and
    callsites, which correspond to machines and to
    MPI calls in the source code, arranged in the
    following sections:
  • Time per task (AppTime, MPITime, MPI%)
  • Location of each callsite in source code
    (callsite, line, parent function, MPI call)
  • Aggregate time per callsite, top twenty (time,
    app%, MPI%, variance)
  • Aggregate sent-message size per callsite, top
    twenty (count, total, avg., MPI%)
  • Time statistics per callsite per task, all (max,
    min, mean, app%, MPI%)
  • Sent-message size statistics per callsite per
    task, all (count, max, min, mean, sum)
  • I/O statistics per callsite per task, all (count,
    max, min, mean, sum)

8
mpiP Overhead
9
Source Code Correlation in Mpipview
10
Bottleneck Identification Test Suite
  • Testing metric: what did the profile data tell
    us?
  • CAMEL: TOSS-UP
  • Profile showed that MPI time is a small
    percentage of overall application time
  • Profile reveals some imbalance in the amount of
    time spent in certain calls, but doesn't help the
    user understand the cause
  • Profile does not provide information about what
    occurs when execution is not in MPI calls.
  • Difficult to grasp overall program behavior from
    profiling information alone
  • NAS LU: TOSS-UP
  • Profile reveals that MPI function calls consume
    a significant portion of application time
  • Profile reveals some imbalance in the amount of
    time spent in certain calls, but doesn't help the
    user understand the cause
  • Profile does not provide information about what
    occurs when execution is not in MPI calls.
  • Difficult to grasp overall program behavior from
    profiling information alone

11
Bottleneck Identification Test Suite (2)
  • Big message: PASSED
  • Profile clearly shows that Send and Recv dominate
    the application time
  • Profile shows a large number of bytes transferred
  • Diffuse procedure: FAIL
  • Profile showed a large amount of time spent in
    barrier
  • Time is diffused across processes
  • Profile does not show that in each barrier a
    single process is always delaying completion
  • Hot procedure: FAIL
  • No profile output, because there are no MPI calls
    (other than setup and teardown)
  • Intensive server: PASSED
  • Profile showed one process spent very little time
    in MPI calls, while the remaining processes spent
    nearly all their time in Recvs
  • Profile showed one process sent an order of
    magnitude more data than the others, and spent
    far more time in Send
  • Ping pong: PASSED
  • Profile showed time spent in MPI function calls
    dominated the total application time
  • Profile showed an excessive number of Sends and
    Recvs with little load imbalance
  • Random barrier: PASSED
  • Profile shows that the majority of execution time
    is spent in Barrier, called by processes not
    holding the potato
  • Small messages: PASSED
  • Profile clearly shows a single process spends
    almost all of the total application time in Recv,
    and receives an excessive number of messages sent
    by all the other processes
  • System time: FAIL
  • No profile output, because there are no MPI calls
    (other than setup and teardown)
  • Wrong way: TOSS-UP
  • One process spends most of the execution time in
    sends; the other spends most of it in receives
  • Profile does not reveal the improperly ordered
    communication pattern

12
Evaluation (1)
  • Available metrics: 1/5
  • Only provides a handful of statistics about the
    time, message size, and frequency of MPI calls
  • No hardware counter support
  • Cost: free, 5/5
  • Documentation quality: 4/5
  • Though brief (a single webpage), the
    documentation adequately covers installation and
    available functionality
  • Extensibility: 2/5
  • mpiP is designed around the MPI profiling layer,
    so it would not be readily adapted to UPC or
    SHMEM and would be of little use there
  • The source code correlation functions work well
  • Filtering and aggregation: 2/5
  • mpiP was designed to be lightweight, and presents
    statistics for the top twenty callsites
  • Output size grows with number of tasks (machines)
  • Hardware support: 5/5
  • 64-bit Linux (Itanium and Opteron), IBM SP (AIX),
    AlphaServer (Tru64), Cray X1, Cray XD1, SGI
    Altix, IBM BlueGene/L
  • Heterogeneity support: 0/5 (not supported)

13
Evaluation (2)
  • Installation: 5/5
  • About as easy as you could expect
  • Interoperability: 1/5
  • mpiP has its own output format
  • Learning curve: 4/5
  • Easy to use
  • Simple statistics are easily understood
  • Manual overhead: 1/5
  • All MPI calls are automatically instrumented when
    linking against the mpiP library
  • No way to turn tracing on/off in places without
    relinking
  • Measurement accuracy: 4/5
  • CAMEL overhead: 5%
  • Correctness of programs is not affected
  • Overhead is low (less than 7% for all test-suite
    programs)

14
Evaluation (3)
  • Multiple executions: 0/5 (not supported)
  • Multiple analyses/views: 2/5
  • Statistics regarding MPI calls are displayed in
    output file
  • Source-code location to callsite correlation is
    provided by Mpipview
  • Performance bottleneck identification: 2.5/5
  • No automatic methods supported
  • Some bottlenecks could be deduced by examining
    gathered statistics
  • Lack of trace information makes some bottlenecks
    impossible to detect
  • Profiling/tracing support: 2/5
  • Only supports profiling
  • Profiling can be enabled for specific regions of
    code by editing the source (see the sketch after
    this list)
  • Turning profiling on/off requires recompilation
  • (the documentation gives a runtime environment
    variable for deactivating profiling, and it is
    acknowledged in the profile output file when set,
    but profiling is not actually disabled)
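
The source edit in question uses the standard MPI_Pcontrol hook;
mpiP's documentation describes honoring level 0 (profiling off) and
level 1 (profiling on), though the exact levels supported in v2.8 are
an assumption here:

    /* Sketch: restricting mpiP profiling to one region of interest.
       MPI_Pcontrol is standard MPI; a PMPI-based tool may interpret
       its level argument as it sees fit. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        MPI_Pcontrol(0);   /* suppress profiling of setup traffic */
        /* ... setup communication not of interest ... */

        MPI_Pcontrol(1);   /* profile only the phase of interest  */
        /* ... communication to be measured ... */
        MPI_Pcontrol(0);

        MPI_Finalize();    /* mpiP writes its report at finalize  */
        return 0;
    }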

15
Evaluation (4)
  • Response time: 3/5
  • No results until after the run
  • Quickly assembles its report at the end of the
    run
  • Searching: 0/5 (not supported)
  • Software support: 3/5
  • Supports C, C++, Fortran
  • Supports a large number of compilers
  • Tied closely to MPI applications
  • Source code correlation: 4/5
  • Source-code line numbers are provided for each
    MPI callsite in the output file
  • Automatic source code correlation provided by
    Mpipview
  • System stability: 5/5
  • mpiP and Mpipview work very reliably
  • Technical support: 5/5
  • Co-author Chris Chambreau responded quickly and
    provided good information, allowing us to correct
    a problem with one of our benchmark apps