DATA PARALLEL LANGUAGES Chapter 4b

Slides: 36
Provided by: johnni2
Learn more at: https://www.cs.kent.edu

1
DATA PARALLEL LANGUAGES (Chapter 4b)
  • MultiC
  • Fortran 90 and HPF

2
The MultiC Language
  • References
  • The multiC Programming Language, Preliminary
    Documentation, WaveTracer, PUB-00001-001-00.80,
    Jan. 1991.
  • The multiC Programming Language, User
    Documentation, WaveTracer, PUB-00001-001-1.02,
    June 1992.
  • Note: This presentation is based on the 1991
    manual unless otherwise noted (e.g., "manuals"
    refers to both versions).
  • MultiC is the language used on the WaveTracer and
    the Zephyr SIMD computers.
  • The Zephyr is a second generation WaveTracer, but
    was never commercially available.
  • We were given 10 Zephyrs and several other
    incomplete Zephyrs to use for spare parts.
  • A MultiC++ was designed for their third
    generation computer, but neither was released.
  • Both MultiC and a parallel language designed for
    the MasPar are fairly similar to an earlier
    parallel language called C*.
  • C* was designed by Guy Steele for the Connection
    Machine.
  • All are data parallel and extensions of the C
    language
  • An assembler was also written for the WaveTracer
    (and probably the Zephyr).
  • It was intended for use only by company
    technicians.

3
  • Information about the assembler was released to
    WaveTracer customers on a need-to-know basis.
  • No manual was distributed, but some details were
    recorded in a short report.
  • Professor Potter was given some details needed to
    put the ASC language on the WaveTracer.
  • MultiC is an extension to ANSI C, as documented
    by the following book:
  • The C Programming Language, Second Edition, 1988,
    Kernighan and Ritchie.
  • The WaveTracer computer is called a Data
    Transport Computer (DTC) in the manual:
  • a large amount of data can be moved in parallel
    using interprocessor communications.
  • Primary expected uses for WaveTracer were
    scientific modeling and scientific computation
  • acoustic waves
  • heat flow
  • fluid flow
  • medical imaging
  • molecular modeling
  • neural networks
  • The 3-D applications are supported by a 3D mesh
    on the WaveTracer
  • Done by sampling a finite set of points (nodes)
    in space.

4
WaveTracer Architecture Background
  • Architecture for Zephyr is fairly similar
  • Exceptions will be mentioned whenever known
  • Each board has 4096 bit-serial processors, which
    can be connected in any of the following ways
  • 16x16x16 cube in 3D space
  • 64x64 square in 2D space
  • 4096 array in 1D space
  • The 3D architecture is native on the WT and the
    other networks are supported in hardware using
    primarily the 3D hardware
  • The Zephyr probably has a 2D network and only
    simulates the more expensive 3D network using
    system software.
  • WaveTracer was available in 1, 2, or 4 boards,
    arranged as follows
  • 2 boards were arranged as a 16x32x16 cube
  • one cube stacked on the top of another cube
  • 8192 processors overall

5
WaveTracer Architecture (Cont)
  • Four boards are arranged as a 32x32x16 cube
  • 16,384 processors
  • Arranged as two columns of stacked cubes
  • Computer supports automatic creation of virtual
    processors and network connections to connect
    these virtual processors.
  • If each processor supports k nodes, this slows
    down execution speed by a factor of k
  • Each processor performs each operation k times.
  • Limited by the amount of memory required for each
    virtual node
  • In practice, slowdown is usually less than k
  • The set of virtual processors supported by a
    physical processor is called its territory.
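The slowdown factor k described above can be worked out with a little arithmetic. The sketch below is only illustrative: the helper name `territory_size` and the assumption that the virtual mesh is split into equal blocks along each axis are hypothetical, not taken from the WaveTracer manuals.

```python
import math

def territory_size(problem_shape, board_shape=(16, 16, 16)):
    """Number of virtual nodes k assigned to each physical processor.

    Assumes (hypothetically) that the virtual mesh is split into equal
    blocks along each axis, rounding up when sizes do not divide evenly.
    """
    per_axis = [math.ceil(n / p) for n, p in zip(problem_shape, board_shape)]
    return math.prod(per_axis)

# A 32x32x32 problem on one 16x16x16 board: 2x2x2 nodes per processor.
k = territory_size((32, 32, 32))
print(k)  # 8, so each operation is repeated up to 8 times per processor
```

With k = 8, the worst-case slowdown is a factor of 8, which matches the "each processor performs each operation k times" rule on the slide.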

6
Specifiers for MultiC Variables
  • Any datatype in C except pointers can be declared
    to be a parallel variable using the declaration
    multi
  • This replicates the data object for each
    processor to produce a 1-, 2-, or 3-dimensional
    data object.
  • In a parallel execution, all multi objects must
    have the same dimension.
  • The multi declaration follows the same format as
    ANSI C, e.g.,
  • multi int imag, buffer;
  • The uni declaration is used to declare a scalar
    variable.
  • It is the default and need not be shown.
  • The following are equivalent:
  • uni int ptr;
  • int ptr;
  • Bit Length Variables
  • can be of type uni or multi
  • Allows user to save memory
  • All operations can be performed on these
    bit-length values
  • Example: A 2-color image can be declared by
  • multi unsigned int image:1;
  • and an 8-color image by
  • multi unsigned int picture:3;

7
Some Control Flow Commands
  • For uni type data structures, control flow in
    MultiC is identical to that in ANSI C.
  • The parallel IF-ELSE Statement
  • As in ASC, both the IF and ELSE portions of the
    code are executed.
  • As with ASC, the IF is a mask-setting operation
    rather than a branching command.
  • FORMAT: Same as for C.
  • WARNING: In contrast to ASC, both sets of
    statements are always executed.
  • Even if no responders are active in one part, the
    sequential (uni) commands in that part are executed.
  • Example: count = count + 1;
  • The parallel WHILE statement
  • The format used is
  • while(condition)
  • The repetition continues as long as condition
    is satisfied by one or more responders.
  • Only the responders (i.e., the PEs that satisfied
    condition immediately before this pass through
    the body of the while) are active during the
    execution of the body of the while.
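The masking behaviour of the parallel IF-ELSE and WHILE can be imitated in ordinary sequential code. The following is a minimal sketch, not MultiC: the function names `parallel_if_else` and `parallel_while` are hypothetical, and each list position stands for one PE.

```python
def parallel_if_else(active, cond, then_body, else_body):
    """Run BOTH branches; the condition only selects which PEs take part."""
    then_mask = [a and c for a, c in zip(active, cond)]
    else_mask = [a and not c for a, c in zip(active, cond)]
    then_body(then_mask)   # runs even if then_mask has no responders
    else_body(else_mask)   # runs even if else_mask has no responders

def parallel_while(values, cond, body):
    """Repeat while any active PE still satisfies cond; others drop out."""
    active = [cond(v) for v in values]
    while any(active):
        values = [body(v) if a else v for v, a in zip(values, active)]
        active = [a and cond(v) for a, v in zip(active, values)]
    return values

# if/else: no PE satisfies the condition, yet the scalar counter in the
# "then" part is still incremented once.
counts = {"then": 0, "else": 0}

def then_body(mask):
    counts["then"] += 1

def else_body(mask):
    counts["else"] += 1

parallel_if_else([True] * 4, [False] * 4, then_body, else_body)

# while: each PE halves its value until it drops below 10.
halved = parallel_while([40, 3, 17, 9], lambda v: v >= 10, lambda v: v // 2)
print(counts, halved)
```

The counter example mirrors the slide's warning: a scalar statement such as a count increment runs in both parts regardless of how many responders each part has.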

8
Other Commands
  • Jump Statements
  • goto, return, continue, break
  • These commands are in conflict with structured
    programming and should be used with restraint.
  • Parallel Reduction Operators
  • *=   Accumulative Product
  • /=   Reciprocal Accumulative Product
  • +=   Accumulative Sum
  • -=   Negate then Accumulative Sum
  • &=   Accumulative bitwise AND
  • |=   Accumulative bitwise OR
  • >?=  Accumulative Maximum
  • <?=  Accumulative Minimum
  • Each of the above reduction operations returns a
    uni value and provides a powerful arithmetic
    operation.
  • Each accumulative operation would otherwise
    require one or more ANSI C loop constructs.
  • Example: If A is a multi data type,
  • largest_value = >?= A;
  • smallest_value = <?= A;
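To make the reduction operators listed above concrete, here is a small sequential sketch of what each one computes over the active PEs. It is a simulation only: `reduce_active` is a hypothetical helper, and list positions stand in for PEs.

```python
from functools import reduce
import operator

def reduce_active(op, values, active):
    """Combine the components held by active PEs into one uni (scalar) value.

    Assumes at least one PE is active; the behaviour with no responders
    is not described in the text and is not modelled here.
    """
    selected = [v for v, a in zip(values, active) if a]
    return reduce(op, selected)

A = [3, 9, 4, 7]
active = [True, False, True, True]     # the PE holding 9 is masked off

total = reduce_active(operator.add, A, active)      # accumulative sum
product = reduce_active(operator.mul, A, active)    # accumulative product
bit_and = reduce_active(operator.and_, A, active)   # accumulative bitwise AND
bit_or = reduce_active(operator.or_, A, active)     # accumulative bitwise OR
largest = reduce_active(max, A, active)             # accumulative maximum
smallest = reduce_active(min, A, active)            # accumulative minimum
print(total, product, bit_and, bit_or, largest, smallest)
```

Each call replaces exactly the kind of explicit loop that the slide says would otherwise be needed in ANSI C.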

9
  • Data Replication
  • Example
  • multi int A = 0;
  • -
  • -
  • -
  • A = 2;
  • First statement stores 0 in every A field
    (compile time).
  • Last statement stores 2 in the A field of every
    active PE.
  • Interprocessor Communications
  • Operators have the form
  • [dx; dy; dz]m
  • This operator can shift the components of the
    multi variable m of all active processors along
    one or more coordinate dimensions.
  • Example: A = [-1; 2; 1]B;
  • Causes each active processor to move the data in
    its B field to the A field of the processor at
    the following location:
  • one unit in the negative X direction
  • two units in the positive Y direction
  • one unit in the positive Z direction
  • [Figure: Coordinate Axes]
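The interprocessor communication operator described above can be pictured as every active PE sending its value to the PE at a fixed offset. The sketch below simulates that on a small grid. The function name `shift`, the 4x4x4 grid size, and the choice to drop data that would leave the grid are all assumptions made for illustration; the slides do not say what happens at the mesh boundary.

```python
def shift(B, active, dx, dy=0, dz=0, shape=(4, 4, 4)):
    """Return {destination: value} for the data sent by active PEs.

    Omitted dy/dz default to 0. Sends that would leave the grid are
    dropped, which is an assumption of this sketch.
    """
    received = {}
    for coord, value in B.items():
        if not active.get(coord, False):
            continue                       # inactive PEs do not send
        dest = (coord[0] + dx, coord[1] + dy, coord[2] + dz)
        if all(0 <= c < n for c, n in zip(dest, shape)):
            received[dest] = value
    return received

B = {(1, 0, 0): 5, (2, 1, 1): 9, (3, 3, 3): 4}
active = {(1, 0, 0): True, (2, 1, 1): True, (3, 3, 3): False}

# One unit in -X, two units in +Y, one unit in +Z.
A = shift(B, active, -1, 2, 1)
print(A)
```

Because every PE uses the same offset, all transfers move in the same direction at the same time, which is why the hardware can do this in lock step.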

10
  • Conventions
  • If the value of dz is not specified, it is
    assumed to be 0.
  • If the values of dy and dz are not specified,
    both are assumed to be 0.
  • Example: [x; y]V is the same as [x; y; 0]V
  • Inactive processor actions
  • Does not send its data to another processor
  • Participates in moving the data from other
    processors along.
  • Transmission of data occurs in lock step (SIMD
    fashion) without congestion or buffering.
  • Coordinate Functions
  • Used to return a coordinate for each active
    virtual processor.
  • Format: multi_x(), multi_y(), and multi_z()
  • Example
  • if (multi_x() == 0 && multi_y() == 2 &&
    multi_z() == 1)
  • u = += A;
  • Note that all processors except the one at
    (0,2,1) are inactive within the body of the IF.
  • The accumulated sum of the active components of
    the multivariable A is just the value of the
    component of A at processor (0,2,1)
  • Effect of this example is to store the value in A
    at (0,2,1) in the uni variable u.
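The coordinate-function example above works because a sum over a single active component is just that component. A minimal sequential sketch of the idea follows; the helper `read_component` and the 3x3x3 test data are hypothetical.

```python
# One value per PE, keyed by (x, y, z); the values encode the coordinates
# so the result is easy to check by eye.
A = {(x, y, z): 100 * x + 10 * y + z
     for x in range(3) for y in range(3) for z in range(3)}

def read_component(multi_var, target):
    """Mask off every PE except the one at target, then sum-reduce."""
    mask = {coord: coord == target for coord in multi_var}
    return sum(v for coord, v in multi_var.items() if mask[coord])

u = read_component(A, (0, 2, 1))
print(u)  # 21: the single value held at PE (0, 2, 1)
```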

11
  • If the second command in the example is changed
    to
  • A = u;
  • the effect is to store the contents of the uni
    variable u
  • into multi variable A at location (0,2,1).
  • (See manual pp. 11-13 and 11-14 for more details.)
  • Arrays
  • Multi-pointers are not supported.
  • Cannot have a parallel variable containing a
    pointer in each component of the array.
  • uni pointers to multi-variables are allowed.
  • Array Examples
  • int array_1[10];
  • int array_2[5][5];
  • multi int array_3[5];
  • array_1 is a 1-dimensional standard C array
  • array_2 is a 2-dimensional standard C array
  • array_3 is a 1-dimensional array of multi
    variables
  • MULTI_PERFORM Command
  • Command gives the size of each dimension of all
    multi-values prior to calling for a parallel
    execution.
  • Format

12
  • multi_perform is normally called within the main
    program.
  • Usually calls a subroutine that includes all of
    the
  • parallel work
  • parallel I/O
  • The main program usually includes
  • Opening and closing of files
  • Some of the scalar I/O
  • #define and #include statements
  • When multi_perform is called, it initializes any
    extern and static multi objects
  • In the previous example, multi_perform calls
    func. After func returns, the multi space created
    for it becomes undefined.
  • The perror function is extended to print error
    messages corresponding to errno numbers resulting
    from the execution of MultiC extensions.
  • Has the following format
  • if (multi_perform(func, x, y, z)) perror(argv[0]);
  • See usage in the examples in Appendix A
  • More information on page 11-2 of manual
  • Examples in Manual
  • Many examples in the manual
  • 17 in appendices alone
  • Also stored under exname.mc in the MultiC package

13
The AnyResponder Function
  • Code Segment for Tallying Responders
  • unsigned int n_short, tall;  /* "short" is a C keyword, so it cannot name a variable */
  • multi float height;
  • load_height();  /* assigns value in inches to height */
  • if (height > 70)
  •   tall = += (multi int)1;
  • else
  •   n_short = += (multi int)1;
  • printf("There are %u tall people\n", tall);
  • Comments on Code Segment
  • Note that the construct
  • (multi int)1
  • counts the active PE (i.e., responders).
  • This technique avoids setting up a bit field to
    use to tally active PEs.
  • Instead sets up a temporary multi variable.
  • Can be used to see whether there is at least one
    responder at any given time.
  • Check to see if the resulting sum is positive.
  • Provides technique to define the AnyResponder
    function needed for associative programming
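The tallying trick above amounts to summing a 1 over every active PE. A minimal sketch of the idea, and of an AnyResponder test built on it, is shown below. The names `count_responders` and `any_responder` and the sample heights are illustrative only.

```python
def count_responders(mask):
    """Sum a 1 over every active PE (the accumulative-sum trick)."""
    return sum(1 for m in mask if m)

def any_responder(mask):
    """True when at least one PE is active, i.e. the tally is positive."""
    return count_responders(mask) > 0

height = [72.0, 65.5, 70.0, 74.2]          # inches, one value per PE
tall_mask = [h > 70 for h in height]

tall = count_responders(tall_mask)
n_short = count_responders([not m for m in tall_mask])
print(f"There are {tall} tall people")

giants = any_responder([h > 80 for h in height])
```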

14
Accessing Components from Multi Variables
  • Code from page 11-13 or 11-14 of MultiC manuals
  • #include <multi.h>  /* includes multi library */
  • #include <stdlib.h>
  • #include <stdio.h>
  • void work (void)
  • {
  •   uni int a, b, c, u;
  •   multi int n;
  •   /* Code goes here to assign values to n */
  •   /* Code goes here to assign values to a, b, c */
  •   if (multi_x() == a && multi_y() == b &&
        multi_z() == c)
  •     u = += n;  /* Assigns value of n at PE(a,b,c) to u */
  • }
  • int main (int argc, char *argv[])
  • {
  •   if (multi_perform(work, 7, 7, 7))
  •     perror(argv[0]);
  •   exit(EXIT_SUCCESS);
  • }

15
The oneof and next Functions
  • Function oneof provides a way of selecting one
    out of several active processors
  • Defined in Multi Struct program (A.15) in manual
  • Procedure is essential for associative
    programming.
  • Code for oneof
  • multi unsigned oneof(void):1
  • {
  •   /* Store the coordinate values in multi
      variables x and y */
  •   multi unsigned x = multi_x(),
  •                  y = multi_y(),
  •                  uno:1 = 0;
  •   /* Next select processor with highest
      coordinate value */
  •   if (x == >?= x)
  •     if (y == >?= y)
  •       uno = 1;
  •   return uno;
  • }
  • Note that multi variable uno stores a 1 for
    exactly one processor and all the other
    components of uno store a 0.
  • The function oneof can be used by another
    procedure which is called by multi_perform.
  • An example of oneof being called by another
    procedure is given on pages A46-50 of the
    manuals.
  • Should be useable in the form
  • if(oneof()) / Check to see if an active
    responder exists /

16
  • Preceding procedure assumed a 2D configuration
    of processors with z = 1.
  • If configuration is 3D, the process of selecting
    the coordinates can be continued by also
    selecting the highest z-coordinate.
  • Stepping through the active PEs (i.e., next)
  • Provides the MultiC equivalent of the ASC next
    command
  • An additional one-bit multi integer variable
    called bi (for busy-idle) is needed.
  • First set bi to zero
  • Activate the PEs you wish to step through.
  • Next, have the active PEs write a 1 into bi.
  • Use
  • if(oneof())
  • to restrict the mask to one of the active PEs.
  • Perform all desired operations with active PE.
  • Have active PE set its bi value to 0 and then
    exit the preceding if statement.
  • Use the += (accumulative sum) operator to see
    if any PEs remain to be processed.
  • If so, return to the step above that calls oneof.
  • This return can be implemented using a while loop.
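The oneof selection and the busy/idle stepping procedure can be put together as a short loop. The sketch below simulates them sequentially for a 2D configuration. The function names, the list-of-coordinates representation, and the `visit` callback are assumptions made for illustration.

```python
def oneof(active, coords):
    """Mask with a single True: the active PE with the highest x,
    and among those the highest y (2D configuration)."""
    candidates = [c for c, a in zip(coords, active) if a]
    if not candidates:
        return [False] * len(active)
    max_x = max(x for x, _ in candidates)
    max_y = max(y for x, y in candidates if x == max_x)
    return [a and c == (max_x, max_y) for c, a in zip(coords, active)]

def step_through(active, coords, visit):
    """Visit every active PE once, one at a time (the 'next' pattern)."""
    bi = list(active)              # busy/idle bit: True = still to process
    while any(bi):                 # stands in for the accumulative-sum test
        chosen = oneof(bi, coords)
        pe = chosen.index(True)
        visit(coords[pe])          # work done with exactly one active PE
        bi[pe] = False             # this PE is finished

coords = [(x, y) for x in range(2) for y in range(2)]
active = [True, False, True, True]       # PE (0, 1) is not a responder
order = []
step_through(active, coords, order.append)
print(order)
```

The visiting order follows from the selection rule (highest x first, then highest y); any other deterministic rule would serve equally well for associative programming.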

17
Sequential Printing of Multi Variable Values
  • Example: Print a block of the 2D bit array image.
  • A function select_int is used, which returns
    the value of image at the specified (x,y,z)
    coordinate.
  • The printing occurs in two loops, which
  • increment the value of x from 0 to some
    specified constant.
  • increment the value of y from 0 to some
    specified constant.
  • This example is from page 8-1 of the manuals and
    is used in an example on pgs A16-18 of 1991
    manual and pgs A12-14 of 1992 manual.
  • The select_int function
  • select_int (multi *mptr, int x, int y, int z)
  • /* Here, mptr is a uni pointer to type multi */
  • {
  •   int r;
  •   if (multi_x() == x &&
  •       multi_y() == y &&
  •       multi_z() == z)
  •     /* Restricts scope to the one PE at (x,y,z) */
  •     r = |= *mptr;
  •   /* OR reduction operator transfers binary value
      of multi variable at (x,y,z) to the uni variable */
  •   return r;
  • }

18
  • The two loops to print a block of values of the
    image multi variable.
  • for (y = 0; y < ysize; y++) {
  •   for (x = 0; x < xsize; x++)
  •     printf("%d", select_int(&image, x, y, z));
  •   printf("\n");
  • }
  • Above technique can be adapted to print or read
    multi variables or part of multi variables.
  • Efficient as long as the number of locations
    accessed is small.
  • If I/O operations involve large amounts of
    data, the more efficient data transfer functions
    described in manuals (Chapter 8 and Section 11.2
    and 11.13) should be used.
  • The functions multi_fread and multi_fwrite are
    analogous to fread and fwrite in C. Information
    about them is given on pages 11-1 to 11-4 of the
    manuals.

19
Moving Data between Uni Arrays and Multi Variables
  • The following functions allow the user to move
    data between uni arrays and multi variables
  • multi_from_uni ...
  • multi_to_uni ...
  • The "..." above may be replaced with a data type
    such as
  • char
  • short
  • int
  • long
  • float
  • double
  • cfloat
  • cdouble
  • These functions are illustrated in several of the
    examples.

20
Compiling and Executing Programs on the Zephyr
  • A 4k Zephyr machine is available for use in the
    Parallel and Associative Computing Lab.
  • It is presently connected to a Windows 2003
    Server which supports remote desktop for
    interactive use. However, you may use the
    computer directly at the console while the lab is
    open
  • Visual Studio 2002 has been installed on the
    server. The MultiC language uses a compiler
    wrapper to translate MultiC code into Visual C
    code.
  • Programming the Zephyr on a Windows 2003 system
    is similar to that using command line programming
    tools in UNIX.
  • You can edit your program using Edit or
    Notepad
  • You can compile and create an executable using
    nmake
  • You can execute your program using the Visual
    Studio Command Shell
  • This is a special DOS shell that has extra path,
    include, and library environment variables used
    by the compiler and linker.

21
Compiling and Executing Programs on the Zephyr
  • Log in or use Remote Desktop Connection to
    zserver.cs.kent.edu
  • From Windows XP choose Start | Programs |
    Accessories | Communications | Remote Desktop
    Connection
  • Enter your login name and password and click on
    OK
  • Open a command window and run the DTC Monitor
    program
  • Type dtcmonitor at the command prompt.
  • This is a daemon program that serializes and
    controls executables using the Zephyr.
  • When this is 100% complete, you can then execute
    programs on the Zephyr.
  • You can minimize this command shell.
  • Important: When you are finished, enter CTRL-C to
    end the dtcmonitor.
  • Create a folder on your desktop for programs.
    You can copy the example Zephyr MultiC program
    from D:\Common\zephyrtest to your local folder
    and rename it for your programming assignment.

22
Compiling and Executing Programs on the Zephyr
  • Create or edit your MultiC program using DOS edit
    or Windows Notepad.
  • From the Visual Studio Command Shell type
  • edit anyprog.mc
  • notepad anyprog.mc
  • Make sure that the file extension is .mc
  • Save your work before compiling
  • Modify the makefile template and change the names
    of the MultiC file and object file to those used
    in your programming assignment.
  • Compile and link your program by typing
  • nmake /f anyprog.mak
  • nmake (for the default Makefile)
  • Execute your program by typing the name of your
    executable at the command prompt.
  • When you are finished enter CTRL-C to end the
    dtcmonitor.

23
OMIT FOR PRESENT(Multi-C Recursion)
  • It is possible to write recursive multi
    functions in multiC, but you have to test if
    there are active PEs still working.
  • Consider the following multiC function
  • multi int factorial( multi int n )
  • {
  •   multi int r;
  •   if( n != 1 )
  •     r = (factorial(n-1) * n);
  •   else
  •     r = 1;
  •   return( r );
  • }
  • What happens?

24
OMIT FOR PRESENT (MultiC Recursion Example)
  • Recursion
  • multi int factorial( multi int n )
  • {
  •   multi int r;
  •   /* stop calculating if every component has been
      computed */
  •   if( !(+= (multi int) 1) )
  •     return(( multi int ) 0 );
  •   /* otherwise, continue calculating */
  •   if( n > 1 )
  •     r = factorial( n-1 ) * n;
  •   else
  •     r = 1;
  •   return( r );
  • }
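A sequential simulation helps show why the responder check matters. Because the body of a parallel IF is executed even when no PE is active, the recursive call in the first version is presumably made again and again with an empty mask, so the recursion never bottoms out. The sketch below models the guarded version; the explicit `active` list is an assumption of the simulation (in MultiC the mask is implicit).

```python
def factorial(n, active):
    """Masked recursive factorial; n and the result hold one value per PE."""
    if not any(active):                  # no responders left: stop recursing
        return [0] * len(n)
    more = [a and v > 1 for a, v in zip(active, n)]    # PEs in the IF part
    done = [a and v <= 1 for a, v in zip(active, n)]   # PEs in the ELSE part
    # The call is made unconditionally, as both parts are always executed.
    sub = factorial([v - 1 for v in n], more)
    r = [0] * len(n)
    for pe in range(len(n)):
        if more[pe]:
            r[pe] = sub[pe] * n[pe]
        elif done[pe]:
            r[pe] = 1
    return r

result = factorial([1, 3, 4], [True, True, True])
print(result)
```

Removing the first two lines of the function body reproduces the runaway recursion: the call is made with an all-False mask forever (Python eventually raises RecursionError).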

25
Fortran 90 and HPF (High Performance Fortran)
  • A de facto standard for scientific and
    engineering computations

26
Fortran 90 AND HPF
  • References
  • [19] Ian Foster, Designing and Building Parallel
    Programs, (online copy), chapter 7.
  • [8] Jordan and Alaghband, Fundamentals of
    Parallel Processing, Section 3.6.
  • Recall that data parallelism refers to the
    concurrency that occurs when the same operation
    is executed on some or all elements in a data set.
  • A data parallel program is a sequence of such
    operations.
  • Fortran 90 (or F90) is a data-parallel programming
    language.
  • Some job control algorithms cannot be expressed
    in a data parallel language.
  • F90's array assignment statement and array
    functions can be used to specify certain types of
    data parallel computation.
  • F90 forms the basis of HPF (High Performance
    Fortran), which augments F90 with a small set of
    extensions.
  • In F90 and HPF, the (parallel) data structures
    operated on are restricted to arrays.
  • E.g., data types such as trees, sets, etc. are
    not supported.
  • All array elements must be of the same type.
  • Fortran arrays can have up to 7 dimensions.

27
  • Parallelism in F90 can be expressed explicitly,
    as in the array assignment statement
  • A = B*C   ! A, B, C are arrays
  • Compilers may be able to detect implicit
    parallelism, as in the following example
  • do i = 1,m
  •   do j = 1,n
  •     A(i,j) = B(i,j) * C(i,j)
  •   enddo
  • enddo
  • Parallel execution of the above code depends on
    the fact that the do-loop iterations are
    independent,
  • i.e., one iteration does not write a variable
    that another iteration reads or writes.
  • Compilation can also introduce communications
    operations when the computation mapped to one PE
    requires data mapped to another PE.
  • Communication operations in F90 (and HPF) are
    inferred by the compiler and do not need to be
    specified by the programmer.
  • These are derived by the compiler from the data
    decomposition specified by the programmer.
  • F90 allows a variety of scalar operations (i.e.,
    defined on a single value) to be applied to an
    entire array.

28
  • All of F90's unary and binary operations can be
    applied to arrays as well, as illustrated in the
    examples below
  • real A(10,20), B(10,20), c
  • logical L(10,20)
  • A = B + c
  • A = A + 1.0
  • A = sqrt(A)
  • L = A .EQ. B
  • The function of the mask is handled in F90 by the
    where statement, which has two forms.
  • The first form uses the where to restrict the
    array elements on which an assignment is performed.
  • For example, the following replaces each nonzero
    entry of array x with its reciprocal
  • where (x /= 0) x = 1.0/x
  • The second form of the where is block structured
    and has the form
  • where (mask-expression)
  •   array_assignment
  • elsewhere
  •   array_assignment
  • end where
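Both forms of the where statement can be mimicked with an element-wise masked assignment. The sketch below is a plain-Python illustration, not Fortran; `where_assign` is a hypothetical helper and the sample data are made up.

```python
def where_assign(x, mask, then_fn, else_fn=None):
    """Masked element-wise assignment.

    Elements whose mask is True get then_fn(value); the others get
    else_fn(value) if an 'elsewhere' part is given, or stay unchanged.
    """
    result = []
    for value, selected in zip(x, mask):
        if selected:
            result.append(then_fn(value))
        elif else_fn is not None:
            result.append(else_fn(value))
        else:
            result.append(value)
    return result

x = [2.0, 0.0, 4.0, -0.5]

# First form: reciprocal of each nonzero entry; zeros are left alone,
# so no division by zero is ever attempted.
recip = where_assign(x, [v != 0 for v in x], lambda v: 1.0 / v)

# Block form with an elsewhere part: double positives, zero the rest.
clipped = where_assign(x, [v > 0 for v in x],
                       lambda v: v * 2, lambda v: 0.0)
print(recip, clipped)
```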

29
Some F90 Array Intrinsic Functions
  • The array intrinsic functions below assume that a
    vector version of an array is formed using column
    major ordering
  • Some F90 array intrinsic functions
  • RESHAPE(A,...) converts array A into a new array
    with specified shape and fill
  • PACK(A, MASK, FILL) forms a vector from masked
    elements of A, using FILL as needed.
  • UNPACK(A, MASK, FILL) replaces masked elements
    with elements from FILL vector
  • MERGE(A, B, MASK) returns array of masked A
    entries and unmasked entries of B
  • SPREAD(A, DIM, N) replicates array A, using N
    copies to form a new array of one larger
    dimension
  • CSHIFT(A, SHIFT, DIM) circular shift (rotation)
    of the elements of A along dimension DIM
  • EOSHIFT(A,...) elements of A are shifted off the
    end along a specified dimension, with the vacated
    end values filled from either a specified scalar
    or an array of dimension 1 less than A
  • TRANSPOSE(A) returns transpose of array A.
  • Some array intrinsic functions that perform
    computation
  • MAXVAL(A) returns the maximum value of A
  • MINVAL(A) returns the minimum value of A
  • SUM(A) returns the sum of the element of A
  • PRODUCT(A) product of elements of A
  • MAXLOC(A) indices of max value in A
  • MINLOC(A) indices of min value in A
  • MATMUL(A,B) matrix multiplication AB
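A few of the intrinsics listed above are easy to picture on one-dimensional data. The functions below are rough Python stand-ins for the rank-1 case only; they are written for illustration and ignore the DIM arguments and optional fill vectors of the real Fortran intrinsics.

```python
def cshift(a, shift):
    """Circular shift: a positive shift moves elements toward the front."""
    s = shift % len(a)          # assumes a is non-empty
    return a[s:] + a[:s]

def pack(a, mask):
    """Gather the elements of a whose mask entry is True."""
    return [v for v, m in zip(a, mask) if m]

def merge(a, b, mask):
    """Take a where mask is True and b elsewhere."""
    return [x if m else y for x, y, m in zip(a, b, mask)]

def maxloc(a):
    """1-based position of the first maximum, as Fortran counts it."""
    return a.index(max(a)) + 1

a = [4, 8, 15, 16]
mask = [True, False, True, False]

rotated = cshift(a, 1)
packed = pack(a, mask)
merged = merge(a, [0, 0, 0, 0], mask)
print(rotated, packed, merged, maxloc(a), sum(a), max(a), min(a))
```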

30
The HPF Data Distribution Extension
  • Reference: [19] Ian Foster, Designing and
    Building Parallel Programs, (online copy),
    chapter 7.
  • F90 array expressions specify opportunities for
    parallel execution but provide no control over
    how to perform these so that communication is
    minimized.
  • HPF handling of data distribution involves three
    directives
  • The PROCESSORS directive specifies the shape and
    size of the array of abstract processors.
  • The ALIGN directive is used to align elements of
    different arrays with each other, indicating that
    they should be distributed in the same manner.
  • The DISTRIBUTE directive is used to distribute an
    object (and all objects aligned with it) onto an
    abstract processor array.
  • The data distribution directives can have a major
    impact on a program's performance (but not on the
    results computed), affecting
  • Partitioning of data to processors
  • Agglomeration: considering the value of combining
    tasks to produce fewer, larger tasks.
  • Communications required to coordinate task
    execution.
  • Mapping of tasks to processors

31
HPF Data Distribution (Cont.)
  • Data distribution directives are recommendations
    to an HPF compiler, not instructions.
  • Compiler can ignore them if it determines that
    this will improve performance.
  • PROCESSORS directive
  • Creates an arrangement for abstract processors
    and gives this arrangement a name.
  • Example: !HPF$ PROCESSORS P(4,8)
  • Normally one abstract processor is created for
    each physical processor.
  • There could be more abstract processors than
    physical ones.
  • However, HPF does not specify a way of mapping
    abstract to physical processors.
  • ALIGN Directive
  • Specifies array elements that should, if
    possible, be mapped to the same processor.
  • Operations involving data objects that are
    aligned are likely to be more efficient due to
    reduced communication costs if on same PE.
  • EXAMPLE
  • real B(50), C(50)
  • !HPF$ ALIGN C(:) WITH B(:)

32
HPF Data Distribution (Cont.)
  • ALIGN Directive (cont.)
  • A * can be used to collapse dimensions (i.e.,
    to match one element with many elements).
  • Considerable flexibility is allowed in specifying
    which array elements are to be aligned.
  • Dummy variables can be used for dimensions.
  • Integer formulas can be used to specify offsets.
  • An align statement can be used to specify that
    elements of an array should be replicated over
    certain processors.
  • Costly if replicated arrays are updated often.
  • Increases communication or redundant computation.
  • DISTRIBUTE Directive
  • Indicates how data are to be distributed among
    processor memories.
  • Specifies for each dimension of an array one of
    three ways that the array elements will be
    distributed among the processors
  • *  No distribution
  • BLOCK(n)  Block distribution
  • (default n = N/P)
  • CYCLIC(n)  Cyclic distribution
  • (default n = 1)

33
HPF Data Distribution (Cont.)
  • DISTRIBUTE Directive (cont.)
  • Block distribution divides the items/indices in
    that dimension into equal-sized blocks of size
    N/P.
  • Cyclic distribution maps every Pth index to the
    same processor.
  • Applies not only to the named array but also to
    any array that is aligned to it.
  • The following directives specify a mapping for A
    and for every array aligned with it (here, B).
  • !HPF$ PROCESSORS p(20)
  • real A(100,100), B(100,100), C(100,100)
  • !HPF$ ALIGN B(:,:) WITH A(:,:)
  • !HPF$ DISTRIBUTE A(BLOCK,*) ONTO p
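The difference between block and cyclic distribution comes down to which processor owns a given index. The sketch below computes owners under each scheme. The function names are hypothetical, indices and processor numbers are 0-based here (Fortran arrays are normally 1-based), and the default block size is taken as N/P rounded up.

```python
import math

def block_owner(i, n, p):
    """Owner of index i when n indices are split into p contiguous blocks."""
    block = math.ceil(n / p)        # default BLOCK size
    return i // block

def cyclic_owner(i, p, block=1):
    """Owner of index i when blocks of the given size are dealt out
    round-robin to p processors (block=1 is the default CYCLIC)."""
    return (i // block) % p

# 100 rows distributed BLOCK over 20 processors: 5 consecutive rows each.
first_rows = [block_owner(i, 100, 20) for i in range(7)]

# 6 indices distributed CYCLIC over 3 processors.
dealt = [cyclic_owner(i, 3) for i in range(6)]
print(first_rows, dealt)
```

Block distribution keeps neighbouring indices together, which suits nearest-neighbour computations; cyclic distribution spreads consecutive indices across processors, which tends to balance load when work per index varies.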

34
HPF Concurrency
  • The F90 array assignment statements provide a
    convenient way of specifying data parallel
    operations.
  • However, this does not apply to all data parallel
    operations, as the array on the right-hand side
    must have the same shape as the one on the
    left-hand side.
  • HPF provides two other constructs to exploit data
    parallelism, namely the FORALL statement and the
    INDEPENDENT directive.
  • The FORALL Statement
  • Allows more general assignments to sections of
    an array.
  • General form is
  • FORALL (triplet, ... , triplet, mask) assignment
  • Examples
  • FORALL (i=1:m, j=1:n) X(i,j) = i+j
  • FORALL (i=1:n, j=1:n, i<j) Y(i,j) = 0.0
  • The INDEPENDENT Directive and Do-Loops
  • The INDEPENDENT directive can be used to assert
    that the iterations of a do-loop can be performed
    independently, that is
  • They can be performed in any order
  • They can be performed concurrently
  • The INDEPENDENT directive must immediately
    precede the do-loop that it applies to.
  • Examples of independent and non-independent
    do-loops are given in [19], Foster, pp. 258-9.
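The essential property of FORALL is that every right-hand side is evaluated before any element is stored, so the order in which indices are visited cannot matter. The sketch below imitates that with a dictionary built in one pass. The helper `forall` is hypothetical, and the two uses assume the slide's examples assign i+j to X(i,j) and zero the entries of Y with i less than j.

```python
from itertools import product

def forall(ranges, rhs, mask=None):
    """Evaluate rhs for every selected index tuple before anything is stored."""
    return {idx: rhs(*idx) for idx in product(*ranges)
            if mask is None or mask(*idx)}

m, n = 2, 3

# Every element of an m-by-n array gets the sum of its (1-based) indices.
X = forall([range(1, m + 1), range(1, n + 1)], lambda i, j: i + j)

# Only elements above the diagonal (i < j) of an n-by-n array are zeroed.
Y = {(i, j): 1.0 for i in range(1, n + 1) for j in range(1, n + 1)}
Y.update(forall([range(1, n + 1)] * 2, lambda i, j: 0.0,
                mask=lambda i, j: i < j))
print(X[(2, 3)], Y[(1, 2)], Y[(2, 1)])
```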

35
Additional HPF Comments
  • An HPF program typically consists of a sequence of
    calls to subroutines and functions.
  • The data distribution that is best for a
    subroutine may be different from the data
    distribution used in the calling program.
  • Two possible strategies for handling this
    situation are
  • Specify a local distribution using DISTRIBUTE and
    ALIGN, even if this requires expensive data
    movement on entering.
  • Cost normally occurs on return as well.
  • Use whatever data distribution is used in the
    calling program, even if not optimal. This
    requires use of the INHERIT directive.
  • Both F90 and HPF intrinsic functions (e.g., SUM,
    MAXVAL) combine data from entire arrays and
    involve considerable communication.
  • Some other F90/HPF intrinsic functions such as
    DOT_PRODUCT involve communication cost only if
    their arguments are not aligned.
  • Array operations involving the FORALL statement
    can result in communication if the computation of
    a value for an element A(i) requires data values
    that are not on the same processor (e.g., B(j)).