Title: Data Parallelism in High Performance Fortran
1. Administrative Stuff
- Location change for the lecture on Friday, March 2:
- Education Building, Room 2 (one lecture only).
2. CS 320 / ECE 392 / CSE 302
Data Parallelism in High Performance Fortran
Department of Computer Science, University of Illinois at Urbana-Champaign
3. Contents
- High Performance Fortran
- Parallelism constructs
  - FORALL
  - PURE functions
  - INDEPENDENT
- Data Distribution Directives
  - ALIGN
  - DISTRIBUTE
  - TEMPLATE
  - PROCESSORS
4. References
- HPF specification (v2.0), available online at
  http://dacnet.rice.edu/Depts/CRPC/HPFF/versions/hpf2/hpf-v20/index.html
- Includes material from documentation, slides, and papers on HPF at Rice University.
5. What is HPF?
- HPF is a standard for data-parallel programming.
- Extends Fortran-77 or Fortran-90 (in theory also C, though this is not used in practice).
6. Principle of HPF
- Extend a sequential language with data distribution directives that specify on which processor a certain part of an array should reside.
- The source program is written as for a single processor.
- The compiler then produces:
- a data-parallel program (SPMD),
- the communication between the processes.
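- The flavor of the approach, as a minimal sketch (assuming a BLOCK distribution; the directive syntax is defined later in these slides):
  REAL a(1000)
  !HPF$ DISTRIBUTE a(BLOCK)   ! hint: spread a in equal blocks over the processors
  a = a + 1.0                 ! the compiler turns this whole-array statement into
                              ! SPMD code plus any needed communication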
7. What the Standard Says
- Can be used with both Fortran-77 and Fortran-90.
- Distribution directives are just a hint; the compiler can ignore them.
- HPF can be used on both shared-memory and distributed-memory hardware platforms.
8. In Reality
- HPF is always used with Fortran-90.
- Distribution directives are a must.
- HPF is used on both shared-memory and distributed-memory platforms.
- But the truth is that the language was really meant for distributed-memory platforms.
9. HPF: Additional Expressions of Parallelism
- FORALL (data-parallel) array assignment.
- PURE functions.
- The INDEPENDENT construct.
10. FORALL Array Assignment
- FORALL(subscript = lower_bound : upper_bound : stride, mask) array-assignment
- Executes all iterations of the subscript loop in parallel for the given set of indices where the mask is true.
- May have multiple dimensions.
- Same semantics as Fortran-90 array assignment: first compute the right-hand side, then assign to the left-hand side (illustrated in the sketch after this list).
- Only one assignment to a particular element is allowed (not checked by the compiler!).
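- A small illustration of the evaluate-then-assign semantics (a minimal sketch, assuming a real array X of size n):
  ! FORALL evaluates every right-hand side before performing any
  ! assignment, so each X(i) receives the OLD value of X(i-1):
  FORALL(i=2:n) X(i) = X(i-1)
  ! A sequential DO loop instead propagates X(1) into all elements:
  DO i = 2, n
    X(i) = X(i-1)
  ENDDO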
11. Examples
- Example 1:
- do i = 1, 100
-   X(i,i) = 0.0
- enddo
- becomes
- FORALL(i=1:100) X(i,i) = 0.0
- Example 2:
- FORALL(i=1:50) D(i) = E(2*i-1) + E(2*i)
12. Examples
- A multiple-dimension example with use of the mask option.
- Set all the elements of X above the diagonal to the sum of their indices:
- FORALL(i=1:100, j=1:100, i<j) X(i,j) = i+j
13. PURE functions/subroutines
- Defined to be side-effect free, so they can be executed concurrently.
- Example: if nitns() is declared as a PURE function, then
- FORALL(i=1:M, j=1:N) mandel(i,j) = nitns(CMPLX(.1*i, .1*j))
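- What such a function might look like (a sketch only; the name nitns comes from the slide, but the Mandelbrot-style body, the iteration cap of 100, and the escape radius of 2.0 are illustrative assumptions):
  PURE INTEGER FUNCTION nitns(c)
    COMPLEX, INTENT(IN) :: c   ! PURE requires INTENT(IN) dummy arguments
    COMPLEX :: z
    INTEGER :: n
    z = c
    n = 0
    ! count iterations of z = z*z + c until divergence or the cap
    DO WHILE (ABS(z) < 2.0 .AND. n < 100)
      z = z*z + c
      n = n + 1
    ENDDO
    nitns = n                  ! no side effects, so safe inside FORALL
  END FUNCTION nitns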
14. The INDEPENDENT Clause
- !HPF$ INDEPENDENT
- DO ...
-   ...
- ENDDO
- Specifies that the iterations of the loop can be executed in any order (concurrently).
15. Examples
- !HPF$ INDEPENDENT
- DO i = 1, 100
-   DO j = 1, 100
-     IF (i.NE.j) A(i,j) = 1.0
-     IF (i.EQ.j) A(i,j) = 0.0
-   ENDDO
- ENDDO
16. Examples: Nesting
- !HPF$ INDEPENDENT
- DO i = 1, 100
-   !HPF$ INDEPENDENT
-   DO j = 1, 100
-     IF (i.NE.j) A(i,j) = 1.0
-     IF (i.EQ.j) A(i,j) = 0.0
-   ENDDO
- ENDDO
17. HPF/Fortran-90 Matrix Multiply
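- In Fortran-90 itself, the whole computation can be written as a single data-parallel intrinsic call (a minimal sketch of the Fortran-90 form):
  C = MATMUL(A, B)   ! intrinsic whole-array matrix multiply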
18. HPF Matrix Multiply
- C = 0.0
- do k = 1, n
-   FORALL(i=1:n, j=1:n)
-     C(i,j) = C(i,j) + A(i,k) * B(k,j)
-   END FORALL
- enddo
- (The k loop stays sequential: every iteration of k updates the same element C(i,j), which a FORALL may assign only once.)
19. HPF Matrix Multiply
- !HPF$ INDEPENDENT
- DO i = 1, n
-   DO j = 1, n
-     C(i,j) = 0.0
-     DO k = 1, n
-       C(i,j) = C(i,j) + A(i,k) * B(k,j)
-     ENDDO
-   ENDDO
- ENDDO
20. HPF Matrix Multiply
- !HPF$ INDEPENDENT
- DO i = 1, n
-   !HPF$ INDEPENDENT
-   DO j = 1, n
-     C(i,j) = 0.0
-     DO k = 1, n
-       C(i,j) = C(i,j) + A(i,k) * B(k,j)
-     ENDDO
-   ENDDO
- ENDDO
21. PROCESSORS Directive
- Declares abstract processor arrangements (single processors or processor arrays):
  !HPF$ PROCESSORS p(4), q(NUMBER_OF_PROCESSORS()/2, 2)
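- A distribution can name an arrangement explicitly with the ONTO clause (a minimal sketch; the array a is an illustrative assumption):
  REAL a(100)
  !HPF$ PROCESSORS p(4)
  !HPF$ DISTRIBUTE a(BLOCK) ONTO p   ! a(1:25) on p(1), a(26:50) on p(2), ...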
22. ALIGN Directive
- Relates elements of an array to those of another array or template, such that the aligned elements are stored on the same processor(s):
  REAL a(4), b(4), c(8), a2(4,4), b2(4,4)
  !HPF$ ALIGN a(:) WITH b(:)
  !HPF$ ALIGN a(:) WITH c(2:8:2)
  !HPF$ ALIGN a(:) WITH c(4:1:-1)
  !HPF$ ALIGN a(:) WITH b2(*,:)
  !HPF$ ALIGN a2(:,*) WITH b(:)
  !HPF$ ALIGN a2(i,j) WITH b2(j,i)
23. TEMPLATE Directive
- Declares an abstract scalar or array:
- no storage allocated
- used just for data alignment and distribution: data objects can be aligned with templates, and templates can be distributed (see the sketch below).
  !HPF$ TEMPLATE t(10), t2(10,10), u(m*n)
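- How TEMPLATE, ALIGN, and DISTRIBUTE combine (a sketch; the arrays and sizes are illustrative assumptions):
  REAL a(10,10), b(10)
  !HPF$ TEMPLATE t2(10,10)
  !HPF$ ALIGN a(:,:) WITH t2(:,:)    ! a lives wherever t2 lives
  !HPF$ ALIGN b(:) WITH t2(:,1)      ! b follows the first column of t2
  !HPF$ DISTRIBUTE t2(BLOCK,BLOCK)   ! distributing t2 places a and b too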
24. Data Distributions
- HPF provides data distribution directives to specify which processor owns what data.
- Owner-computes rule: the owner of the data does the computation on the data.
- Goal: improve locality, reduce communication, and improve performance.
25. Data Distribution Definition
- !HPF$ DISTRIBUTE <array> <distribution>
- <distribution> (in each dimension):
- * : no distribution
- BLOCK
- BLOCK(k): k is the block size, default n/p
- CYCLIC
- CYCLIC(k): k is the cycle size, default 1
- An array without distribution is replicated (see the small worked example below).
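- A worked example, assuming two abstract processors p1 and p2 (the three directives are alternatives for the same array, shown side by side for comparison):
  REAL x(8)
  !HPF$ DISTRIBUTE x(BLOCK)       ! p1: x(1:4)           p2: x(5:8)
  !HPF$ DISTRIBUTE x(CYCLIC)      ! p1: x(1,3,5,7)       p2: x(2,4,6,8)
  !HPF$ DISTRIBUTE x(CYCLIC(2))   ! p1: x(1:2), x(5:6)   p2: x(3:4), x(7:8)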
26. Data Distribution Examples
- !HPF$ DISTRIBUTE A(BLOCK,BLOCK)
27. Data Distribution Examples
- !HPF$ DISTRIBUTE A(BLOCK,*)
28. Data Distribution Examples
- !HPF$ DISTRIBUTE A(*,BLOCK)
29. Data Distribution Examples
- !HPF$ DISTRIBUTE A(*,CYCLIC)
30. Data Distribution Examples
- !HPF$ DISTRIBUTE A(*,CYCLIC(2))
31. Data Distribution Examples
- !HPF$ DISTRIBUTE A(BLOCK,CYCLIC)
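- For instance, combining BLOCK and CYCLIC on a 2x2 processor arrangement (a sketch; the 4x4 array and the arrangement are illustrative assumptions):
  REAL a(4,4)
  !HPF$ PROCESSORS p(2,2)
  !HPF$ DISTRIBUTE a(BLOCK,CYCLIC) ONTO p
  ! rows 1:2 go to the first processor row, rows 3:4 to the second;
  ! odd columns go to the first processor column, even columns to the second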
32. Difference between OpenMP and HPF
- In OpenMP, the user specifies the distribution of iterations.
- Data travels to the processor executing the iteration.
- In HPF, the user specifies the distribution of data.
- Computation is done by the processor owning the data.
33. HPF Matrix Multiply
- !HPF$ DISTRIBUTE C(BLOCK,*)
- <standard matrix multiply code>
- Leads to the same computation as the OpenMP expression of matrix multiply: each processor computes a contiguous set of rows.
34. HPF Matrix Multiply
- !HPF$ DISTRIBUTE C(*,BLOCK)
- <standard matrix multiply code>
- Would cause each processor to compute a contiguous set of columns.
35. HPF Matrix Multiply
- !HPF$ DISTRIBUTE C(BLOCK,BLOCK)
- <standard matrix multiply code>
- Each processor computes a rectangular sub-array of the result.
36. Gaussian elimination
- (without pivoting)
- for( i=0; i<n; i++ )
-   for( j=i+1; j<n; j++ )
-     for( k=i+1; k<n; k++ )
-       a[j][k] = a[j][k] - a[j][i]*a[i][k] / a[i][i];
- The for-j loop is the outermost parallelizable loop.
37. OpenMP Gauss
- for( i=0; i<n; i++ ) {
-   #pragma omp parallel for private(k)
-   for( j=i+1; j<n; j++ )
-     for( k=i+1; k<n; k++ )
-       a[j][k] = a[j][k] - a[j][i]*a[i][k] / a[i][i];
- }
38. HPF Gauss
- !HPF$ DISTRIBUTE A(CYCLIC,*)
- DO i = 1, n
-   !HPF$ INDEPENDENT
-   DO j = i+1, n
-     DO k = i+1, n
-       A(j,k) = A(j,k) - A(j,i)*A(i,k)/A(i,i)
-     ENDDO
-   ENDDO
- ENDDO
39. Difference with OpenMP Gauss
- In HPF, the cyclic distribution of A is useful for load balance.
- In OpenMP, block scheduling of the iterations suffices (because the iterations are re-distributed at each new pragma).
40. Difference with OpenMP Gauss
- In HPF, each processor keeps working on the same data/rows (owner computes).
- In OpenMP, data/rows move between processors.
- HPF is potentially more efficient (increased locality; more about this later).
41. How an HPF compiler works
- Parallelization is based on the Fortran-90 and HPF concurrency constructs.
- Assign data to processors based on the distributions.
- Compute data on the owning processor.
- Move other data necessary for the computation to that processor.
42. Hard Part of an HPF Compiler
- Communication optimization.
- Avoid lots of small messages; optimize towards few large messages.
- Absolutely critical to good performance.
43. Performance impact of distribution
- Back to matrix multiply:
- !HPF$ DISTRIBUTE C(BLOCK,*)
- Causes C to be row-distributed, and A and B to be replicated.
- No communication.
44. Performance impact of distribution
- Back to matrix multiply:
- !HPF$ DISTRIBUTE C(BLOCK,*), A(BLOCK,*)
- Causes C and A to be row-distributed, and B to be replicated.
- No communication.
45. Performance impact of distribution
- Back to matrix multiply:
- !HPF$ DISTRIBUTE C(BLOCK,*), A(*,BLOCK)
- Causes C to be row-distributed, A to be column-distributed, and B to be replicated.
- Lots of communication!
46. Performance impact of distribution
- [Figure: C = A x B, with B replicated; annotated to show the part of A that will have to move to processor 0.]
47. Things Can Get Worse
- Sometimes the compiler cannot determine exactly what data needs to be moved:
- B(i) = A(INDEX(i))
- (where INDEX(i) is determined dynamically)
- A conservative estimate needs to be made.
- This often leads to a broadcast of all the data.
- Better methods are known, but they are difficult.
48. Summary
- Data-parallelism features of HPF.
- Comparison with OpenMP.