Title: Performance Evaluation of Parallel Programming Models
1. Performance Evaluation of Parallel Programming Models
Thesis Defense: Achal Prabhakar
2. Contents
- Architectures and Programming Models
- Problem Statement
- Current State / Project Goals
- SKaMPI Framework
- Our Work
- OpenMP Measurements
- Hybrid MPI-OpenMP Measurements
- Status
- Publications
- Questions
3. Architectures and Models
- Distributed Memory
  - Distinct processing elements, each with local memory and private data ownership.
- Shared Memory
  - Common memory shared among the processors.
  - Data is local to the node and ownership is shared.
- Hybrid
  - Combination of the above two.
4. Architectures and Models (contd.)
- Message Passing Model
  - All communication between the PEs is through explicit message passing.
- Shared Memory Model
  - Communication occurs via direct reads/writes to the shared memory.
- Hybrid Model
  - Communication uses reads/writes within the node and explicit message passing across nodes.
5. Problem Statement
- The question for an application writer is:
  - "Which model is best suited for a particular application?"
- Four parameters are most significant:
  - Programming Effort
  - Resource Utilisation
  - Scalability
  - Comparative cost of equivalent operations
8. Benchmark Suites
- MPI Benchmarks
  - ParkBench, Pallas, GENESIS, NAS Parallel
- OpenMP Benchmarks
  - EPCC low-level benchmarks
  - OpenMP version of NPB, etc.
- Hybrid
  - NONE AVAILABLE!!
9. Desirable!
- Integrated
  - One benchmark suite that integrates the three programming models.
- Comprehensive
  - Includes all the significant measurements from all the models.
- Efficient
  - No one wants to spend precious computer time on running benchmarks!
- Extensible
  - Easy addition of future measurements and programming models.
10. The SKaMPI Framework
- SKaMPI: Special Karlsruhe MPI Benchmark
- A framework for making measurements and presenting results in a flexible and modular manner.
- Features:
  - Flexible user interface via a parameter file with a simple, extensible grammar.
  - Automatic control of measurement error.
  - Automatic selection and refinement of parameters.
  - Concatenation of results from multiple runs.
  - Most important: easily extensible.
11. Our Work
- Incorporating the significant low-level measurements into the framework, covering:
  - MPI
  - OpenMP
  - Pthreads
  - OpenMP-MPI Hybrid
- Provide interpretation of the results.
- Correlate measurements from different models to provide a basis for comparisons.
12. OpenMP Measurements Example
- Lock/Unlock Measurement
- Two possible methods to measure.
- EPCC method:
  - OVERHEAD = ACTUAL - REFERENCE

      ACTUAL:
        start_time
        for (i = 1 to N)
          lock(); WORK; unlock()
        end_time

      REFERENCE:
        start_time
        for (i = 1 to N)
          WORK
        end_time

  - This measures the time that the master spent in the lock routine.
  - If lock fairness is not guaranteed, it will give wrong results.
13. OpenMP Measurements Example (contd.)
- Lock/Unlock Measurement
- Our method:
  - OVERHEAD = ACTUAL - REFERENCE

      ACTUAL:
        start_time
        for (i = 1 to N)
          lock(); WORK; unlock()
        BARRIER
        end_time

      REFERENCE:
        start_time
        for (i = 1 to N)
          WORK
        BARRIER
        end_time

  - Measures the total time that all the threads spent in the lock.
  - Does not rely on lock fairness.
  - On the SUN machine, the EPCC method shows constant overhead while our method gives linear overhead, as is expected.
14. Hybrid Measurements Example
- OpenMP construct with MPI communication.
- Potential for computation-communication overlap.
- Consider the combined OpenMP-MPI reduce.
  - The potential overlap: while the master is busy doing the MPI reduce, the other threads are executing the first phase of the OpenMP barrier.
- Other Measurements:
  - MPI_Isend / OpenMP work / MPI_Recv
  - MPI_Barrier / OpenMP barrier

      #pragma omp parallel
      {
        #pragma omp for reduction(+:omp_sum)
        for (i = 0; i < N; i++)
          omp_sum += i;
        #pragma omp master
          MPI_Allreduce(...);
        #pragma omp barrier
      }
15. Status
- OpenMP measurements are in place
- MPI measurements already implemented
- Several OpenMP-MPI tests implemented
16. Publications
- Achal Prabhakar, Vladimir Getov, Barbara Chapman. Performance Comparisons of Basic OpenMP Constructs. High Performance Computing: 4th International Symposium, ISHPC 2002, Kansai Science City, Japan, May 15-17, 2002, Proceedings.
- Achal Prabhakar, Barbara Chapman, Frederic Bregier, Amit Patil. Achieving Performance under OpenMP on ccNUMA and Software Distributed Shared Memory Systems. Special Issue of Concurrency: Practice and Experience 12.
- Achal Prabhakar, Barbara Chapman, Amit Patil. Performance Oriented Programming for NUMA Architectures. Proceedings of WOMPAT 2001.
- Achal Prabhakar, Barbara Chapman, Frederic Bregier, Amit Patil. Extensions to OpenMP for Multiple Memory Hierarchy Architectures. RENPAR 2001, Paris.
- Barbara Chapman, Oscar Hernandez, Amit Patil, Achal Prabhakar. Program Development Environment for OpenMP Programs on ccNUMA Architectures. Proc. Large Scale Scientific Computations 2001, Sozopol.
- Achal Prabhakar, Vladimir Getov. Performance Evaluation of Hybrid Parallel Programming Paradigms. Workshop on Performance Analysis and Distributed Computing, August 19-23, 2002, Dagstuhl, Germany.
17. Questions