Parallel Performance Wizard: Introduction

1
Parallel Performance Wizard: Introduction
  • Professor Alan D. George, Principal Investigator
  • Mr. Hung-Hsun Su, Sr. Research Assistant
  • Mr. Adam Leko, Sr. Research Assistant
  • Mr. Bryan Golden, Research Assistant
  • Mr. Hans Sherburne, Research Assistant
  • Mr. Max Billingsley, Research Assistant
  • Mr. Josh Hartman, Undergraduate Volunteer
  • HCS Research Laboratory
  • University of Florida

2
Outline
  • Motivations and Objectives
  • Background
  • Framework Key Features
  • Phase II Tasks & Schedules
  • Today's Schedule

3
Motivations and Objectives
  • Motivations
  • UPC/SHMEM program does not yield the expected
    performance. Why?
  • Due to complexity of parallel computing,
    difficult to determine without tools for
    performance analysis and optimization
  • Discouraging for users, new & old; few options
    for shared-memory computing in UPC and SHMEM
    communities
  • Objectives
  • Research topics relating to performance analysis
  • Develop framework for a performance analysis tool
  • Design with both performance and user
    productivity in mind
  • Develop a performance analysis tool for UPC and
    SHMEM

4
Need for Performance Analysis
  • Performance analysis of sequential applications
    can be challenging
  • Performance analysis of explicitly communicating
    parallel applications is significantly more
    difficult
  • Mainly due to increase in number of processing
    nodes
  • Performance analysis of implicitly communicating
    parallel applications is even more difficult
  • Non-blocking, one-sided communication is tricky
    to track and analyze accurately

5
Background - SHMEM
  • SHared MEMory library
  • Based on SPMD model
  • Available for C / Fortran
  • Available for servers and clusters
  • Easier to program than MPI
  • Hybrid programming model
  • Traits of message passing
  • Explicit communication, replication and
    synchronization
  • Need to give remote data location (processing
    element ID)
  • Traits of shared memory
  • Provides logically shared memory system view
  • Non-blocking, one-sided communication
  • Lower latency, higher bandwidth communication
  • PSHMEM available for some implementations
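
As a rough illustration of the hybrid traits listed above, the Python sketch below models a one-sided put: the caller names the target processing element (PE) explicitly, and the target does nothing to receive the data. This is a toy single-process model, not SHMEM itself; `ToySymmetricArray`, `put`, and `get` are hypothetical names chosen for this example.

```python
class ToySymmetricArray:
    """Toy model of a symmetric array: every PE owns a same-sized local copy."""

    def __init__(self, num_pes, length):
        if num_pes < 1 or length < 1:
            raise ValueError("num_pes and length must be positive")
        # One private buffer per processing element (PE), all the same length.
        self.buffers = [[0] * length for _ in range(num_pes)]

    def _buffer(self, pe, offset, count):
        # Validate the PE ID and the requested range before touching memory.
        if not 0 <= pe < len(self.buffers):
            raise IndexError("no such PE: %d" % pe)
        buf = self.buffers[pe]
        if offset < 0 or count < 0 or offset + count > len(buf):
            raise IndexError("range falls outside the symmetric array")
        return buf

    def put(self, target_pe, offset, values):
        """One-sided write: the caller names the target PE; the target is passive."""
        values = list(values)
        buf = self._buffer(target_pe, offset, len(values))
        buf[offset:offset + len(values)] = values

    def get(self, source_pe, offset, count):
        """One-sided read from the named PE's copy of the array."""
        buf = self._buffer(source_pe, offset, count)
        return list(buf[offset:offset + count])


if __name__ == "__main__":
    array = ToySymmetricArray(num_pes=4, length=8)
    array.put(target_pe=2, offset=1, values=[7, 8, 9])  # only PE 2's copy changes
    print(array.get(source_pe=2, offset=0, count=5))
    print(array.get(source_pe=0, offset=0, count=5))
```

Because the target PE never posts a matching receive, nothing on the target side marks when the data arrived, which hints at why such transfers are hard for a tool to track.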

6
Background - UPC
  • Unified Parallel C (UPC)
  • Partitioned global address space (GAS) parallel programming language
  • Common and familiar syntax and semantics for
    parallel C with simple extensions to ANSI C
  • Many implementations
  • Open source: Berkeley UPC, Michigan UPC, GCC-UPC
  • Proprietary: HP-UPC, Cray-UPC
  • Easier to program than MPI, software more
    scalable
  • With hand-tuning, UPC performance compares
    favorably with MPI

7
Background - Performance Analysis
  • Three general performance analysis approaches
  • Analytical modeling
  • Mostly predictive methods
  • Could also be used in conjunction with
    experimental performance measurement
  • Pros: easy to use, fast, can be performed without
    running the program
  • Cons: usually not very accurate
  • Simulation
  • Pros: allows performance estimation of program
    with various system architectures
  • Cons: slow, not generally applicable for regular
    UPC/SHMEM users
  • Experimental performance measurement
  • Strategy used by most modern performance analysis
    tools (PATs)
  • Uses actual event measurement to perform analysis
  • Pros: most accurate
  • Cons: can be time-consuming

PAT: Performance Analysis Tool
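
To make the analytical-modeling approach concrete, here is a minimal sketch based on a simple latency-plus-bandwidth cost model, T(n) = latency + n / bandwidth. The choice of model, the function name `predicted_transfer_time`, and the numbers in the usage example are illustrative assumptions, not taken from the presentation.

```python
def predicted_transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Predict one transfer's duration from a latency + size/bandwidth model.

    This is a prediction only: no program is run and nothing is measured.
    """
    if message_bytes < 0:
        raise ValueError("message size cannot be negative")
    if latency_s < 0 or bandwidth_bytes_per_s <= 0:
        raise ValueError("latency must be >= 0 and bandwidth must be > 0")
    return latency_s + message_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    # Assumed (made-up) network parameters: 5 microseconds, 1 GB/s.
    for size in (8, 1024, 1_000_000):
        t = predicted_transfer_time(size, latency_s=5e-6, bandwidth_bytes_per_s=1e9)
        print(f"{size:>9} bytes -> {t * 1e6:.3f} microseconds (predicted)")
```

The model is cheap to evaluate, which matches the "fast, no program run" advantage, but it ignores contention, caching, and overlap of communication with computation, which is one reason such predictions tend to be inaccurate.
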
8
Background - Experimental Performance Measurement Stages
  • Instrumentation: user-assisted or automatic
    insertion of instrumentation code
  • Measurement: actual measuring stage
  • Analysis: data analysis toward bottleneck
    detection & resolution
  • Presentation: display of analyzed data to user,
    deals directly with user
  • Optimization: process of finding and resolving
    bottlenecks
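
The first four stages above can be sketched in a few lines of Python. This is a minimal single-process illustration, not the tool's design: the decorator `instrument`, the global `TRACE`, and the helpers `analyze` and `present` are hypothetical names, and `time.sleep` stands in for communication. The optimization stage is left to the user, who acts on the presented data.

```python
import functools
import time
from collections import defaultdict

TRACE = []  # measurement output: (function name, start time, end time)


def instrument(func):
    """Instrumentation: wrap a function so every call is timed and recorded."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Measurement: append one event per call, even if the call raised.
            TRACE.append((func.__name__, start, time.perf_counter()))
    return wrapper


def analyze(trace):
    """Analysis: reduce raw events to (call count, total seconds) per function."""
    totals = defaultdict(lambda: [0, 0.0])
    for name, start, end in trace:
        totals[name][0] += 1
        totals[name][1] += end - start
    return {name: (calls, seconds) for name, (calls, seconds) in totals.items()}


def present(summary):
    """Presentation: one text line per function, most expensive first."""
    rows = sorted(summary.items(), key=lambda item: item[1][1], reverse=True)
    return [f"{name}: {calls} call(s), {seconds:.6f} s"
            for name, (calls, seconds) in rows]


@instrument
def compute(n):
    return sum(i * i for i in range(n))


@instrument
def exchange():
    time.sleep(0.01)  # stand-in for a communication call


if __name__ == "__main__":
    compute(1000)
    exchange()
    exchange()
    for line in present(analyze(TRACE)):
        print(line)
```

Wrapping by hand with a decorator corresponds to user-assisted instrumentation; an automatic scheme would insert the same kind of wrapper without the user editing the source.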

9
Framework
10
Key Features
  • Semi-automatic source-level instrumentation as
    default
  • Only P module and part of I module are visible to
    user
  • PAPI will be used
  • Tracing mode as default with profiling support
  • Post-mortem data processing and analysis
  • Analyses: load balancing, scalability, memory
    system
  • Visualizations: timeline display, speedup chart,
    call-tree graph, communication volume graph,
    memory access graph, profiling table
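
As a hedged sketch of post-mortem processing, the Python below takes a finished trace and derives a profile table, a load-balance figure, and a speedup value from it. The event layout `(thread, region, start, end)`, the function names, and the max-over-mean imbalance metric are assumptions made for illustration; the presentation does not specify how these analyses are computed.

```python
from collections import defaultdict


def profile_from_trace(events):
    """Collapse a trace into a profile: total time per region, all threads."""
    totals = defaultdict(float)
    for _thread, region, start, end in events:
        totals[region] += end - start
    return dict(totals)


def busy_time_per_thread(events):
    """Total recorded time per thread, used for the load-balance check."""
    totals = defaultdict(float)
    for thread, _region, start, end in events:
        totals[thread] += end - start
    return dict(totals)


def load_imbalance(busy_times):
    """Max busy time divided by mean busy time; 1.0 means perfectly balanced."""
    times = list(busy_times)
    if not times:
        raise ValueError("need at least one thread")
    mean = sum(times) / len(times)
    if mean <= 0:
        raise ValueError("mean busy time must be positive")
    return max(times) / mean


def speedup(serial_seconds, parallel_seconds):
    """Classic speedup: serial run time over parallel run time."""
    if serial_seconds <= 0 or parallel_seconds <= 0:
        raise ValueError("run times must be positive")
    return serial_seconds / parallel_seconds


if __name__ == "__main__":
    # Synthetic trace for two threads; the timings are made up.
    trace = [
        (0, "compute", 0.0, 4.0),
        (0, "put", 4.0, 5.0),
        (1, "compute", 0.0, 2.0),
        (1, "put", 2.0, 3.0),
    ]
    print(profile_from_trace(trace))
    print(load_imbalance(busy_time_per_thread(trace).values()))
    print(speedup(serial_seconds=8.0, parallel_seconds=5.0))
```

The trace keeps every timestamped event, so a timeline can be drawn from it, while the profile keeps only the totals. That is why a tool that traces by default can still offer profiling, but not the other way around.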

11
Tasks & Schedule
12
Discussion Topic: Target Platforms
  • Our current platform list: changes needed?
  • Open
  • Quadrics SHMEM on Opterons RHEL4 (qsnet)
  • Berkeley UPC on Opterons RHEL4 (iba)
  • Proprietary
  • Cray UPC on X1E (src. inst)
  • Cray SHMEM on X1E

13
Today's Schedule
  • 09:00 - 09:30 AM Project overview
  • 09:30 - 10:15 AM Instrumentation (I) module
    presentation
  • 10:15 - 10:30 AM BREAK
  • 10:30 - 11:15 AM Measurement (M) module
    presentation
  • 11:15 - 11:45 AM I&M-modules demo
  • 11:45 - 13:00 PM LUNCH
  • 13:00 - 13:45 PM Analysis (A) module
    presentation
  • 13:45 - 14:00 PM A-module demo
  • 14:00 - 14:45 PM Presentation (P) module
    presentation
  • 14:45 - 15:00 PM P-module demo
  • 15:00 - 15:15 PM BREAK
  • 15:15 - 16:00 PM Wrap-up & planning discussion