Title: 205CSC316 High Performance Computing
1- DATA PARALLELISM: High Performance Fortran
- In this section we will:
- Outline the concept of data parallelism
- Examine the compiler directives for data alignment
- Examine the compiler directives for data distribution
2- Data Parallelism
- A single thread of program control operating over a large set of data elements.
- SPMD (Single Program Multiple Data)
- The same operations are applied in asynchronous fashion to different sets of data.
- Performance depends on the mapping of data to processors, which determines the communication necessary for accessing non-local data.
- Problem decomposition for SPMD computation means specifying a data decomposition, which is then executed in parallel.
- The basic means of specifying data decomposition is the array structure.
3- Data Parallelism
- Extensions for data decomposition
- i) how arrays should be aligned with respect to one another -> problem mapping
- ii) how arrays should be distributed onto an actual machine -> machine mapping
5- High Performance Fortran
- Coalition of users and vendors established January 1992; draft standard Spring 1993.
- Standard for parallel programming on a wide range of machines, based on Fortran 90.
- HPF directives allow the user to advise the compiler on the allocation of data objects to processor memories.
- The mapping is performed in two stages:
- i) The group of data objects is aligned relative to one another using the ALIGN or REALIGN directives; this describes the interaction between data objects.
- ii) The group of aligned objects is then mapped onto a set of abstract processors using the DISTRIBUTE or REDISTRIBUTE directives.
- The abstract processors are then mapped onto the real processors; this mapping is done by the compiler in a system-dependent manner.
6- Data Alignment
- Sometimes it is convenient to express a desired distribution for an array by describing its relationship to some other array -> alignment of arrays with respect to each other.
- Provides a mapping between the data components that are to be manipulated together.
- Enables the compiler to assign array elements to the same processor, reducing the data transfers that would otherwise be necessary for accessing non-local data.
- Alignments can be based on complete or partial array dimensions and can involve contiguous or non-contiguous elements.
- Independent of the underlying machine architecture.
7- Data Alignment
- Direct alignment: the elements concerned are explicitly aligned with respect to each other.
- Indirect alignment: template arrays provide an abstract indexing space.
- A number of arrays can be aligned with respect to the same template.
- Templates facilitate modularity and increased portability -> a compiler for a different architecture is required only to adapt its code generation to map templates onto the actual system processor configuration.
8- Processor Declarations
- The processor configuration of the distributed system needs to be conveyed to the compilation system so that the compiler can generate appropriate SPMD code.
- The PROCESSORS directive specifies the shape of the grid of abstract processors.
- A directive is introduced by the characters !HPF$ in the first columns of a statement.
      PARAMETER (P1=10, P2=100)
      !HPF$ PROCESSORS DM(P1, P2)
- This identifies DM as a two-dimensional, 1000-processor structure.
9- Processor Declarations
- Other processor declarations allow a declared set of processors to be viewed as having different shapes.
      !HPF$ PROCESSORS DM(P1, P2), DMC(P1*P2)
      !HPF$ VIEW OF DM :: DMC
- The VIEW directive designates the processor arrays DM and DMC to be equivalent, that is, they refer to the same set of processors.
- The processor array declarations do not imply any actual underlying hardware interconnection topology.
- They are primarily an aid to algorithm design, and their scope is limited to the program unit in which they are declared.
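The equivalence between the two processor shapes can be pictured as two index schemes over the same 1000 processors. The Python sketch below assumes that the grid DM(P1, P2) and the linear array DMC(P1*P2) are linked in Fortran column-major order; that ordering and the helper names are assumptions made for illustration, not something the slides specify.

```python
P1, P2 = 10, 100  # grid extents giving the 1000-processor structure


def dm_to_dmc(i, j, p1=P1):
    """Map 1-based grid coordinates DM(i, j) to a 1-based linear index DMC(k).

    Hypothetical helper; assumes column-major (Fortran) ordering.
    """
    return (j - 1) * p1 + i


def dmc_to_dm(k, p1=P1):
    """Inverse mapping: linear index DMC(k) back to grid coordinates (i, j)."""
    return ((k - 1) % p1 + 1, (k - 1) // p1 + 1)


# First and last processors under both views.
print(dm_to_dmc(1, 1), dm_to_dmc(P1, P2))
```

Every grid coordinate corresponds to exactly one linear index, so either view names the same processor.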
10- Data Alignments
- Templates are declared using directives that appear only in the declaration part:
      !HPF$ TEMPLATE T(N)
      !HPF$ TEMPLATE, DIMENSION(N,N) :: TT
- A template provides an index space for alignment without actually allocating memory.
- Data arrays are mapped to templates, templates are mapped to processor grids, and grids are mapped to physical processors.
Identical Alignment
- Identical alignment occurs when there is an exact match of the array indices to be aligned.
      !HPF$ TEMPLATE T(N)
      INTEGER X(N), Y(N)
      !HPF$ ALIGN WITH T :: X, Y
- The statements effect the following identical alignments:
      X(1) to Y(1)
      X(2) to Y(2)
      ...
      X(N) to Y(N)
11- Non-identical Alignment
- An inexact match of the array indices to be aligned.
      !HPF$ TEMPLATE T(N+1)
      REAL X(N), Y(N)
      !HPF$ ALIGN X(I) WITH T(I)
      !HPF$ ALIGN Y(I) WITH T(I+1)
- The template has N+1 elements so that Y(N) has a target position.
- Non-identical alignments occur between the elements of vectors X and Y because of the offset added in the decomposition T:
      X(2) to Y(1)
      X(3) to Y(2)
      X(4) to Y(3)
      ...
      X(N) to Y(N-1)
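The effect of the offset can be checked by listing which elements of X and Y share a template position. This is a minimal Python sketch that assumes the offset on Y is +1 and uses a small illustrative N; the function names are hypothetical.

```python
N = 5  # small illustrative size (assumption)


def template_cell_x(i):
    """Template position of X(i): X is aligned without offset."""
    return i


def template_cell_y(i):
    """Template position of Y(i): Y is shifted by one position."""
    return i + 1


# Pairs (i, j) such that X(i) and Y(j) land on the same template position.
pairs = [(i, j)
         for i in range(1, N + 1)
         for j in range(1, N + 1)
         if template_cell_x(i) == template_cell_y(j)]
print(pairs)
```

X(1) has no partner in Y, and Y(N) has no partner in X, which is why the listed alignments run from X(2) to X(N).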
12- Example
- Assume we wish to align 4 smaller arrays (NW, NE, SW, SE) of size N x N with the four corners of a larger array (WORLD) of size (N+1) x (N+1), BUT without allocating the larger space, i.e. a single array which spans the entire index space of interest.
      !HPF$ TEMPLATE, DIMENSION(N+1,N+1) :: WORLD
      REAL, DIMENSION(N,N) :: NW, NE, SW, SE
      !HPF$ ALIGN NW(i,j) WITH WORLD(i, j)
      !HPF$ ALIGN NE(i,j) WITH WORLD(i, j+1)
      !HPF$ ALIGN SW(i,j) WITH WORLD(i+1, j)
      !HPF$ ALIGN SE(i,j) WITH WORLD(i+1, j+1)
- Other forms of integer subscript expressions may be used, resulting in irregular array alignments.
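The four corner alignments can be modelled as fixed offsets into the template. The Python sketch below assumes offsets of 0 or 1 in each dimension, as implied by a template one element larger than the arrays in each direction; N is kept small and the helper name is hypothetical.

```python
N = 3  # illustrative array extent (assumption)

# Offset (row, column) applied to each array's indices inside WORLD.
OFFSETS = {"NW": (0, 0), "NE": (0, 1), "SW": (1, 0), "SE": (1, 1)}


def world_cell(name, i, j):
    """Template position of element (i, j) of the named array."""
    di, dj = OFFSETS[name]
    return (i + di, j + dj)


# Together the four arrays touch every position of the template.
covered = {world_cell(name, i, j)
           for name in OFFSETS
           for i in range(1, N + 1)
           for j in range(1, N + 1)}
print(len(covered), (N + 1) ** 2)
```

Only the four N x N arrays occupy memory; the template merely supplies the shared index space.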
13- Permutation Alignment
- Permutation in the alignment of arrays occurs when the order of the array indices used for alignment is changed.
- Permutations are expressed within the alignment specifications, as in the following declarations, which effect an array transposition.
      !HPF$ TEMPLATE TT(N, N)
      INTEGER XX(N, N), YY(N, N)
      !HPF$ ALIGN XX(I, J) WITH TT(I, J)
      !HPF$ ALIGN YY(J, I) WITH TT(I, J)
- The elements of matrix XX are aligned with the elements of the transposed matrix YY. This results in the following alignments:
      XX(1,1) to YY(1,1)    XX(2,2) to YY(2,2)    XX(3,3) to YY(3,3)
      XX(2,1) to YY(1,2)    XX(3,2) to YY(2,3)    XX(4,3) to YY(3,4)
      XX(3,1) to YY(1,3)    XX(4,2) to YY(2,4)    XX(5,3) to YY(3,5)
      ...                   ...                   ...
      XX(N,1) to YY(1,N)    XX(N,2) to YY(2,N)    XX(N,3) to YY(3,N)
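The transposition can be confirmed by searching for the element of YY that shares a template position with a given element of XX. This is a minimal Python sketch with hypothetical helper names and a small illustrative N.

```python
N = 3  # illustrative extent (assumption)


def tt_cell_xx(i, j):
    """XX(i, j) sits at template position (i, j)."""
    return (i, j)


def tt_cell_yy(a, b):
    """YY(a, b) sits at template position (b, a): its indices are swapped."""
    return (b, a)


def yy_partner(i, j):
    """Indices (a, b) of the YY element on the same position as XX(i, j)."""
    for a in range(1, N + 1):
        for b in range(1, N + 1):
            if tt_cell_yy(a, b) == tt_cell_xx(i, j):
                return (a, b)
    return None


print(yy_partner(2, 1), yy_partner(3, 2))
```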
14- Collapsed Alignment
- Complete array dimensions can be mapped onto a single position when the arrays involved have non-identical dimension sizes; this corresponds to the collapse of a dimension of the larger array.
      !HPF$ TEMPLATE T(N)
      INTEGER XX(N,N), Y(N)
      !HPF$ ALIGN XX(I,J) WITH T(I)
      !HPF$ ALIGN Y(I) WITH T(I)
- Alignments between matrix XX and vector Y are effected such that the elements of each row of XX are collapsed and aligned with the single element of Y whose index corresponds to the row number:
      XX(1,1), XX(1,2), XX(1,3), ... XX(1,N) to Y(1)
      XX(2,1), XX(2,2), XX(2,3), ... XX(2,N) to Y(2)
      XX(3,1), XX(3,2), XX(3,3), ... XX(3,N) to Y(3)
      ...
      XX(N,1), XX(N,2), XX(N,3), ... XX(N,N) to Y(N)
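Collapsing a dimension amounts to dropping that index from the mapping. The Python sketch below groups the elements of XX by the template position they land on; the names are hypothetical and N is kept small for illustration.

```python
N = 4  # illustrative extent (assumption)


def t_cell_xx(i, j):
    """XX(i, j) maps to template position i: the column index j is dropped."""
    return i


def t_cell_y(i):
    """Y(i) maps to template position i."""
    return i


# Group the elements of XX by the template position they share.
groups = {}
for i in range(1, N + 1):
    for j in range(1, N + 1):
        groups.setdefault(t_cell_xx(i, j), []).append((i, j))

for cell, members in groups.items():
    print("T(%d):" % cell, members)
```

Each group holds one complete row of XX and sits at the same position as the matching element of Y.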
15- Another example
      REAL B(4,3), M(4)
      M = B(:, 1) + B(:, 2) + B(:, 3)
- What alignment lets this sum require no communication?
- Collapse the second dimension of the data array B, so that all the elements required for a particular element of the vector sum are on the same processor.
- Align the rows of B with the elements of M, where the result of the sum will be stored.
- The ALIGN directive allows for both of these requirements with the following syntax:
      !HPF$ ALIGN B(i, *) WITH M(i)
- If the data objects are aligned in this way, the calculation generates no communication.
- The actual distribution of B depends on the distribution of M.
- If the distribution of M changes, the alignment is not affected: the relationship between them is fixed.
16- Embedding Alignment
- Arrays may be aligned with arrays of higher dimension, whereby the lower-dimensional array is mapped onto specific dimensions of the larger array; this corresponds to embedding the smaller array in the larger array.
      !HPF$ TEMPLATE TT(N,N)
      INTEGER X(N), YY(N,N)
      !HPF$ ALIGN YY(I,J) WITH TT(I,J)
      !HPF$ ALIGN X(J) WITH TT(5,J)
- Alignments occur between vector X and matrix YY such that the elements of X are aligned only with the elements of row 5 of YY.
- This corresponds to embedding vector X into row 5 of matrix YY, resulting in the following alignments:
      X(1) to YY(5,1)
      X(2) to YY(5,2)
      X(3) to YY(5,3)
      ...
      X(N) to YY(5,N)
17- Replication Alignment
- Array elements may be duplicated and aligned with a set of elements from another array; this represents alignment by replication.
      !HPF$ TEMPLATE TT(N,N)
      DIMENSION X(N), YY(N,N)
      !HPF$ ALIGN X(I) WITH TT(I,*)
      !HPF$ ALIGN YY(I,J) WITH TT(I,J)
- The elements of vector X are replicated and aligned with the elements of every column of matrix YY, resulting in the following alignments:
      X(1) to YY(1,1)    X(2) to YY(2,1)    ...    X(N) to YY(N,1)
      X(1) to YY(1,2)    X(2) to YY(2,2)    ...    X(N) to YY(N,2)
      X(1) to YY(1,3)    X(2) to YY(2,3)    ...    X(N) to YY(N,3)
      ...
      X(1) to YY(1,N)    X(2) to YY(2,N)    ...    X(N) to YY(N,N)
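Replication differs from the other alignments in that one array element occupies several template positions. The Python sketch below models this with a set of positions per element of X; it assumes YY is aligned element for element with the template, and all names are hypothetical.

```python
N = 3  # illustrative extent (assumption)


def tt_cells_x(i):
    """Template positions holding a copy of X(i): one per column of row i."""
    return {(i, j) for j in range(1, N + 1)}


def tt_cell_yy(i, j):
    """Template position of YY(i, j); identity alignment is assumed here."""
    return (i, j)


def x_copy_for(i, j):
    """Index k such that a copy of X(k) shares a position with YY(i, j)."""
    cell = tt_cell_yy(i, j)
    return next(k for k in range(1, N + 1) if cell in tt_cells_x(k))


print([x_copy_for(2, j) for j in range(1, N + 1)])
```

Every element in row i of YY finds a local copy of X(i), at the cost of storing N copies of each element of X.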
18- Data Distribution
- Each processor has a local memory, so large arrays must be distributed across many different processors.
- Arrays can be distributed in different ways depending on how they are to be used.
- We want to choose the distribution which maximises the ratio of local work to communication.
DISTRIBUTE Directive
- Compiler directive to specify the type of distribution to use.
- Specifies the distribution for each dimension of the array.
      !HPF$ DISTRIBUTE a(distribution)
  or
      !HPF$ DISTRIBUTE (distribution) :: a, b
- distribution is a comma-separated list of the distributions for each array dimension.
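To make the ratio of local work to communication concrete, the Python sketch below counts non-local reads for a hypothetical nearest-neighbour update in which each element reads its left and right neighbours. It compares two layouts of a 12-element array over 4 processors: contiguous blocks and round-robin. The workload, sizes and function names are assumptions chosen for illustration.

```python
N, P = 12, 4  # array size and processor count (illustrative assumptions)


def block_owner(i):
    """Owner of element i when contiguous blocks of N // P elements are used."""
    return (i - 1) // (N // P) + 1  # assumes P divides N


def cyclic_owner(i):
    """Owner of element i when elements are dealt out round-robin."""
    return (i - 1) % P + 1


def remote_reads(owner):
    """Count neighbour reads that cross a processor boundary."""
    count = 0
    for i in range(1, N + 1):
        for nb in (i - 1, i + 1):
            if 1 <= nb <= N and owner(nb) != owner(i):
                count += 1
    return count


print("block :", remote_reads(block_owner))
print("cyclic:", remote_reads(cyclic_owner))
```

For this access pattern the contiguous layout needs far fewer non-local reads; a different access pattern could favour a different layout.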
19- Regular distributions: Block distribution
- Allocation of blocks of array elements amongst the processors, based on the array size and the number of processors available.
- The mapping of evenly sized groups of array elements to processors or templates.
      !HPF$ PROCESSORS P(4)
      !HPF$ TEMPLATE T(12)
      !HPF$ DISTRIBUTE T(BLOCK)
- The template T is divided into evenly sized blocks for assignment to the processor array, as indicated by the identifier BLOCK. The following distribution is effected:
      Processor   1      2      3      4
                  T(1)   T(4)   T(7)   T(10)
                  T(2)   T(5)   T(8)   T(11)
                  T(3)   T(6)   T(9)   T(12)
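The block layout can be reproduced with a one-line owner function. This Python sketch assumes a block size of ceiling(N/P) and 1-based numbering of elements and processors; the function name is hypothetical.

```python
def block_owner(i, n, p):
    """Processor (1-based) owning element i of an n-element BLOCK distribution."""
    block_size = -(-n // p)  # integer ceiling of n / p
    return (i - 1) // block_size + 1


# Elements of T(12) held by each of the 4 processors.
layout = {q: [i for i in range(1, 13) if block_owner(i, 12, 4) == q]
          for q in range(1, 5)}
for q, elems in layout.items():
    print("Processor", q, elems)
```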
20- Regular distributions: Cyclic distribution
- Round-robin allocation of array elements amongst the available processors.
      !HPF$ DISTRIBUTE T(CYCLIC)
- The elements of template T are assigned cyclically to the processor array, as indicated by the identifier CYCLIC. The following distribution is effected:
      Processor   1      2      3      4
                  T(1)   T(2)   T(3)   T(4)
                  T(5)   T(6)   T(7)   T(8)
                  T(9)   T(10)  T(11)  T(12)
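The round-robin rule is a modulo operation on the element index. A minimal Python sketch, assuming 1-based numbering and a hypothetical function name:

```python
def cyclic_owner(i, p):
    """Processor (1-based) owning element i when elements are dealt round-robin."""
    return (i - 1) % p + 1


# Elements of T(12) held by each of the 4 processors.
layout = {q: [i for i in range(1, 13) if cyclic_owner(i, 4) == q]
          for q in range(1, 5)}
for q, elems in layout.items():
    print("Processor", q, elems)
```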
21- Block-cyclic distribution
- The array elements are divided into evenly sized blocks, and the blocks are then allocated to processors in round-robin fashion.
      !HPF$ DISTRIBUTE T(CYCLIC(2))
- The following distribution is effected:
      Processor   1      2      3      4
                  T(1)   T(3)   T(5)   T(7)
                  T(2)   T(4)   T(6)   T(8)
                  T(9)   T(11)
                  T(10)  T(12)
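Block-cyclic allocation combines the two previous rules: first find the block an element belongs to, then deal the blocks out round-robin. The Python sketch below assumes 1-based numbering; the function name and the parameter k (the block size given in parentheses) are hypothetical.

```python
def block_cyclic_owner(i, p, k):
    """Processor (1-based) owning element i for blocks of size k dealt cyclically."""
    block_index = (i - 1) // k  # 0-based index of the block containing element i
    return block_index % p + 1


# Elements of T(12) held by each of the 4 processors with blocks of size 2.
layout = {q: [i for i in range(1, 13) if block_cyclic_owner(i, 4, 2) == q]
          for q in range(1, 5)}
for q, elems in layout.items():
    print("Processor", q, elems)
```

With k = 1 the rule reduces to the plain cyclic allocation.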
22- Example: Two Dimensions
Processors: 4
- i)
      DIMENSION BB(8,8)
      !HPF$ DISTRIBUTE BB(BLOCK, *)
  * marks a dimension that is kept complete, i.e. not distributed. For a BLOCK distribution the block size is ceiling(N/P), where P is the number of processors in the dimension and N is the dimension size. Rows are allocated in blocks of size 2.
- ii)
      !HPF$ DISTRIBUTE BB(*, BLOCK)
  Columns are allocated in blocks of two.
- iii)
      !HPF$ DISTRIBUTE BB(CYCLIC, *)
  Rows are allocated cyclically, one at a time.
23- Example: Two Dimensions
- iv)
      !HPF$ DISTRIBUTE BB(*, CYCLIC)
  Columns are allocated cyclically, one at a time.
- v) Processors: (2, 2)
      !HPF$ DISTRIBUTE BB(BLOCK, BLOCK)
  Rows are allocated in blocks of size 4, and columns in blocks of size 4.
- vi)
      !HPF$ DISTRIBUTE BB(BLOCK, CYCLIC)
  Rows are allocated in two groups (group 1: rows 1 2 3 4; group 2: rows 5 6 7 8), then columns cyclically, one at a time.
- vii)
      !HPF$ DISTRIBUTE BB(CYCLIC, BLOCK)
  Columns are allocated in two groups (group 1: columns 1 2 3 4; group 2: columns 5 6 7 8), then rows cyclically.
- viii)
      !HPF$ DISTRIBUTE BB(CYCLIC, CYCLIC)
  Rows are cyclic by element; columns are cyclic by element.
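The two-dimensional cases can be explored by applying a one-dimensional rule to each index independently. The Python sketch below returns the processor-grid coordinates that own BB(r, c); it models an undistributed dimension as a processor extent of 1, which is a modelling convenience rather than HPF syntax, and all names are hypothetical.

```python
def dim_owner(idx, n, p, fmt):
    """Owner coordinate (1-based) of index idx along one dimension of extent n."""
    if fmt == "*":
        return 1  # dimension not distributed: a single owner coordinate
    if fmt == "BLOCK":
        block_size = -(-n // p)  # integer ceiling of n / p
        return (idx - 1) // block_size + 1
    if fmt == "CYCLIC":
        return (idx - 1) % p + 1
    raise ValueError("unknown distribution format: " + fmt)


def owner(r, c, fmts, grid, n=8):
    """Processor-grid coordinates owning element (r, c) of an n x n array."""
    return tuple(dim_owner(x, n, p, f) for x, p, f in zip((r, c), grid, fmts))


# Case i): 4 processors, rows in blocks of 2, columns kept whole.
print(owner(3, 5, ("BLOCK", "*"), (4, 1)))
# Case vi): 2 x 2 processors, rows in two blocks, columns dealt cyclically.
print(owner(5, 2, ("BLOCK", "CYCLIC"), (2, 2)))
```

Changing the format pair and the grid shape reproduces the other cases.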