Atomic Sections - PowerPoint PPT Presentation

About This Presentation

Title:

Atomic Sections


Slides: 12

Provided by: capsl

Learn more at: https://www.capsl.udel.edu


Transcript and Presenter's Notes

1
Atomic Sections: A Design and Evaluation Study under OpenMP-XN
Presenters: Joseph Bryant Manzano Franco, Yuan Zhang, Guang R. Gao
Computer Architecture and Parallel Systems Laboratory, University of Delaware
In collaboration with Kemal Ebcioglu and Vivek Sarkar, X10 Team, IBM
2
Context: IBM PERCS/X10 project

DARPA HPCS program: Phase 2 focuses on evaluating new technologies for productivity and performance.

PERCS Programming Tools: performance-guided parallelization and transformation, static and dynamic checking, and separation of concerns, all integrated into a single development environment (Eclipse).
Atomic Section: Milestone 4 under OpenMP-XN

Focusing only on extensions of the familiar SPMD model is essential to the PERCS programming model.
Facilitate comparative studies on a large body of OpenMP code.
Permit an early study to begin before the X10 project is underway.
Permit risky ideas to be studied in a more focused context.
The algorithms developed can become a basis for future implementation and integration under the X10 framework.

PERCS Programming Model (layered diagram, top to bottom)
OpenMP, MPI
Static and dynamic compilers for base languages with programming model extensions
Mature languages: C/C++, Fortran, Java
Emerging languages: UPC, StreamIt
Experimental language: X10
Language runtime, dynamic compilation, continuous optimization
PERCS System Software (K42, vHype)
PERCS System Hardware
3
Major Goals

We show how OpenMP can be extended with the concept of atomic sections.
We develop a methodology for implementing analyzable atomic sections that includes three steps: (1) identify the consistency list for an atomic section; (2) assign locks to concurrent atomic sections to expose maximum parallelism at minimum lock cost; (3) place fine-grained synchronization.
We develop an OpenMP-XN prototype implementation framework, and report the results of our experiments on a selected set of benchmarks together with their analysis.
We have conducted a productivity study that shows, via examples, how OpenMP-XN with analyzable atomic sections can improve programming productivity, measured by time to the first correct implementation.

4
Random Access
The one-dimensional array is passed by reference.
Initialize the table.
Start a parallel region, and specify the shared and private data.
Initialize ran for each thread with a random number seeded by the thread id and scheduling information.
Each thread begins to execute some iterations of the for loop.
The atomic section synchronizes accesses to the shared table by making its operations atomic and mutually exclusive.
Atomic section (AS): a section of code that is intended to be executed atomically, and mutually exclusive with other conflicting atomic operations.
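The code shown on this slide is not preserved in the transcript. The annotations above can be sketched roughly as follows, under several assumptions: the OpenMP-XN atomic section syntax is not shown here, so a standard `#pragma omp critical` stands in for it; `next_random` is a hypothetical xorshift helper; and the seed uses only the thread id, not scheduling information.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: one xorshift64 step for the per-thread random stream. */
static uint64_t next_random(uint64_t x) {
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return x;
}

/* The one-dimensional table is passed by reference.
   table_size must be a power of two so that masking yields a valid index. */
void random_access(uint64_t *table, size_t table_size, long num_updates) {
    assert(table_size != 0 && (table_size & (table_size - 1)) == 0);

    /* Initialize the table. */
    for (size_t i = 0; i < table_size; i++)
        table[i] = (uint64_t)i;

    /* Start a parallel region: the table is shared, ran is private
       because it is declared inside the region. */
    #pragma omp parallel shared(table, table_size, num_updates)
    {
        uint64_t ran = 0x9E3779B97F4A7C15ULL; /* nonzero base seed */
        #ifdef _OPENMP
        ran ^= (uint64_t)omp_get_thread_num() + 1; /* seed by thread id */
        #endif

        /* Each thread executes some iterations of the loop. */
        #pragma omp for
        for (long i = 0; i < num_updates; i++) {
            ran = next_random(ran);
            size_t idx = (size_t)(ran & (uint64_t)(table_size - 1));

            /* Stand-in for the atomic section: the update of the shared
               table is made mutually exclusive. */
            #pragma omp critical
            {
                table[idx] ^= ran;
            }
        }
    }
}
```

Without OpenMP enabled the pragmas are ignored and the function runs serially, which is convenient for checking the logic before measuring anything.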
5
Atomic Section

A section of code that is intended to be
executed atomically, and mutually exclusive with
other conflicting atomic operations.

6
OpenMP-XN Runtime Model
7
Atomic Section Implementation
Note: The five-step process is generated automatically by the OpenMP-XN compiler. High-productivity programmers need not know about the lock assignment and data replacement.
An atomic section is implemented as a five-step process: (1) acquire lock; (2) refresh; (3) computation; (4) write-back; (5) release lock.
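As a rough illustration of the five steps, the sketch below hand-writes what the slide says the compiler generates. Everything named here is an assumption: `account_t` and `deposit` are hypothetical, a C11 `atomic_flag` spinlock stands in for whatever lock the OpenMP-XN runtime uses, and refresh and write-back are modeled simply as copying the shared value into and out of a local working copy.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared object: one lock guarding one shared value. */
typedef struct {
    atomic_flag lock;
    long balance; /* shared data touched by the atomic section */
} account_t;

static void acquire_lock(atomic_flag *l) {
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;
}

static void release_lock(atomic_flag *l) {
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Body of one atomic section, expanded into the five steps. */
long deposit(account_t *acct, long amount) {
    acquire_lock(&acct->lock);   /* (1) acquire lock */
    long local = acct->balance;  /* (2) refresh: fetch current shared value */
    local += amount;             /* (3) computation on the local copy */
    acct->balance = local;       /* (4) write-back: publish the result */
    release_lock(&acct->lock);   /* (5) release lock */
    return local;
}
```

The programmer would write only the computation in step (3); the other four steps are the part the note above describes as hidden from high-productivity programmers.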
8
Atomic Section Implementation (cont'd)

Assumptions:
No nested atomic sections
No nested parallel regions
A Three-Step Approach:
Consistency List Analysis (CLA): Given an OpenMP-XN program, analyze each atomic section and identify the shared data that might be read or written within that atomic section.
Lock Refinement and Assignment (LRA): Given an OpenMP-XN program, assign one or more locks to guard the entrance of each atomic section, so that any pair of concurrent atomic sections that might access the same shared data will be guarded by the same lock.
Generation of Consistency Actions (GCA): Generate refresh and write_back operations in atomic sections so that the number of these operations executed at run time is minimized.
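The lock assignment step can be illustrated with a deliberately simplified sketch; it is not the algorithm used in OpenMP-XN, which the slides do not spell out. Assumptions: each consistency list is a bitmask over shared variables, every pair of atomic sections is treated as potentially concurrent, and each section gets exactly one lock. Sections whose lists overlap, directly or transitively, are merged with a union-find structure and share a lock. `assign_locks` and `MAX_SECTIONS` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SECTIONS 32

/* Union-find root lookup with path halving. */
static int find_root(int *parent, int i) {
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    return i;
}

/* consistency[i]: bitmask of shared variables that atomic section i might
   read or write (the output of consistency list analysis).
   lock_of[i]: receives the lock id guarding section i.
   Returns the number of distinct locks used. */
int assign_locks(const uint32_t *consistency, int n, int *lock_of) {
    int parent[MAX_SECTIONS];
    assert(n >= 0 && n <= MAX_SECTIONS);

    for (int i = 0; i < n; i++)
        parent[i] = i;

    /* Merge every pair of sections that might touch the same shared data. */
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            if (consistency[i] & consistency[j]) {
                int ri = find_root(parent, i);
                int rj = find_root(parent, j);
                parent[rj] = ri;
            }
        }
    }

    /* Give each merged group one lock id. */
    int next_lock = 0;
    for (int i = 0; i < n; i++)
        lock_of[i] = -1;
    for (int i = 0; i < n; i++) {
        int r = find_root(parent, i);
        if (lock_of[r] < 0)
            lock_of[r] = next_lock++;
        lock_of[i] = lock_of[r];
    }
    return next_lock;
}
```

For example, consistency lists `{0x1, 0x2, 0x3, 0x8}` yield two locks: the third section overlaps both of the first two, so those three share a lock, while the fourth touches unrelated data and can run in parallel under its own lock.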

9
OpenMP-XN Experimental Testbed Structure (based
on Omni)
10
Experiments

A preliminary implementation of AS in OpenMP-XN
has been completed and tested
Performance analysis based on OpenMP-XN on
DARPA HPCS benchmarks and other benchmarks is in
progress
A productivity study on AS has been conducted

Benchmarks (name: description. Source)
Micro-Benchmarks: A set of small benchmarks to test the performance of atomic sections. Source: Delaware internal benchmarks.
Delaware Banker: A simple simulator of bank transactions, implemented in parallel. Source: Delaware internal benchmarks.
TAMMP: Toy Another Molecular Mechanics Program. Kernel of the SPEC OMP molecular dynamics benchmark, ammp.
Random Access: Random access benchmark modified to run under OpenMP-XN. Source: HPC Challenge (modified version).
Radix Sort: Implementation of the parallel integer radix sort algorithm. Source: IBM MAMBO benchmarks.
Gram-Schmidt Orthonormalization: Composed of dot products derived from IBM benchmarks. Source: developed based on the MAMBO/HPCS Inner-Product program.
11
Preliminary Experimental Results

The current OpenMP-XN platform permits users to collect or derive:
Execution Time
Speedup Curves
Performance Statistics
Cache consistency traffic
Cache misses
Number of memory operations
CPU cycles of each computation unit in program

Case Study
Test bed: Sun UltraSPARC III, 4 CPUs, 400 MHz
Benchmark: Random Access
Problem size: 2^14
Compare atomic sections with critical sections.
With the right architectural support, atomic sections will not introduce performance overhead.
The Delaware tool chain can help with more in-depth performance studies; the following is one interesting observation.

Preliminary Performance Observations
On conventional hardware platforms, the memory wall (especially cache consistency traffic) is a bottleneck for performance improvement.
Realizing the performance potential of atomic sections requires architectural innovations.

Number of snoops, averaged over 240 runs
Critical Section: 2027.6
Atomic Section: 1185.6
Observation: It appears that the OpenMP-XN runtime model based on atomic sections reduces the number of coherence transactions considerably for this example, compared to standard OpenMP critical sections. However, more study is needed for further explanation and exploration.
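To put the two snoop counts side by side, the relative reduction works out as below. This is plain arithmetic on the reported averages, not an additional measurement.

```latex
\[
\frac{2027.6 - 1185.6}{2027.6} = \frac{842.0}{2027.6} \approx 0.415
\]
```

That is, roughly 41.5% fewer snoops with atomic sections than with critical sections on this benchmark and test bed.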
Future results will be obtained from PowerPC
systems (work already in progress)
