Designing and Building Parallel Programs - PowerPoint PPT Presentation

Provided by: twanca

Transcript and Presenter's Notes

1
Parallel/Distributed Programming Summary Report
  • Designing and Building Parallel Programs

Prepared by Solange Artie
2
Introduction
  • Parallelism
  • A strategy for performing large complex tasks
    faster.
  • Parallelism is achieved by
  • Breaking up the task into smaller tasks.
  • Assigning the smaller tasks to multiple workers
    to work on simultaneously.
  • Coordinating the workers.
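
The three steps above can be sketched in Python. This is a minimal illustration, assuming a hypothetical job (summing squares); the names partial_sum and parallel_sum_squares are invented for the example. Threads are used only to keep the sketch short; in CPython they show the structure rather than a real speedup for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Work done by one worker: sum of squares over its chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_squares(data, workers=4):
    # 1. Break the task into smaller tasks (one contiguous chunk per worker).
    size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # 2. Assign the smaller tasks to workers that run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # 3. Coordinate the workers: combine their partial results.
    return sum(partials)
```

For example, parallel_sum_squares(list(range(10))) gives the same value as the sequential sum of squares.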

3
Parallel Computers and Computation
  • Attributes of parallel computation
  • Concurrency- the ability to perform many actions
    simultaneously, essential if a program is to
    execute on many processors.
  • Scalability- indicates resilience to increasing
    processor counts, and is equally important, as
    processor counts appear likely to grow in most
    environments.
  • Locality- a high ratio of local to remote memory
    accesses; its importance depends on the ratio of
    remote to local access costs.
  • Modularity- decomposition of complex entities
    into simpler components.

4
Parallel Computers and Computation
  • Multicomputer
  • Consists of a number of von Neumann computers
    (nodes) connected by an interconnection network.
  • A simple and realistic machine model that
    provides a basis for the design of scalable and
    portable parallel programs.
  • Tasks/Channels
  • Simplifies the programming of multicomputers by
    providing abstractions that allow us to discuss
    concurrency, locality, and communication in a
    machine-independent fashion, and by providing a
    basis for the modular construction of parallel
    programs.
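
A rough Python sketch of the task/channel idea follows. It is an analogy only: threads stand in for tasks and a queue.Queue stands in for a channel, and the names producer, consumer, run_pipeline and the DONE marker are hypothetical. The tasks share no state apart from the channel that connects them.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker sent on the channel


def producer(out_channel, n):
    """Task 1: send the integers 0..n-1 on its outgoing channel."""
    for i in range(n):
        out_channel.put(i)
    out_channel.put(DONE)


def consumer(in_channel, results):
    """Task 2: receive messages until DONE and record their sum."""
    total = 0
    while True:
        msg = in_channel.get()  # blocks until a message is available
        if msg is DONE:
            break
        total += msg
    results.append(total)


def run_pipeline(n):
    channel = queue.Queue()  # the only link between the two tasks
    results = []
    tasks = [
        threading.Thread(target=producer, args=(channel, n)),
        threading.Thread(target=consumer, args=(channel, results)),
    ]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()
    return results[0]
```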

5
Designing Parallel Algorithms
  • Steps
  • Partitioning- first partition the problem into
    many small pieces, or tasks.
  • Communication- organize the communication
    required to obtain the data needed for task
    execution.
  • Agglomeration- combine tasks to decrease
    communication and development costs, while
    maintaining flexibility if possible.
  • Mapping- assign tasks to processors so as to
    minimize execution time.
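
The effect of agglomeration and mapping can be illustrated with a small Python sketch. It assumes a hypothetical one-dimensional chain of tasks in which every task exchanges one message with each neighbor per step; the function names are invented for the example, and round-robin is only one possible mapping strategy.

```python
def neighbor_messages(n_tasks):
    """Messages per step in a 1-D chain where each task sends to each neighbor."""
    return 2 * (n_tasks - 1) if n_tasks > 0 else 0


def agglomerate(n_points, n_blocks):
    """Group n_points fine-grained tasks into n_blocks contiguous blocks."""
    base, extra = divmod(n_points, n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        size = base + (1 if b < extra else 0)  # spread the remainder evenly
        blocks.append(range(start, start + size))
        start += size
    return blocks


def map_blocks(blocks, n_procs):
    """Map block index -> processor index, round-robin."""
    return {b: b % n_procs for b in range(len(blocks))}
```

With 16 fine-grained tasks the chain needs 30 messages per step; after agglomeration into 4 blocks only the block boundaries communicate, which leaves 6.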

6
Quantitative Basis for Design
  • Mathematical performance models characterize
  • Execution time
  • Efficiency
  • Scalability
  • How these models are used through parallel
    program design and implementation cycle
  • Characterize the computation and communication
    requirements of a parallel algorithm by building
    simple performance models. These models can be
    used to choose between algorithmic alternatives,
    to identify problem areas in the design, and to
    verify that algorithms meet performance
    requirements.
  • Define the performance models and conduct simple
    experiments to determine unknown parameters like
  • Computation time
  • Communication costs
  • Validate assumptions
  • Compare the performance of the parallel program
    with its performance model. This helps to
    identify implementation errors and to improve
    the quality of the model.
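
A simple performance model of this kind might look like the Python sketch below. Everything in it is an assumption made for illustration: computation is taken to divide evenly over P processors, each message is taken to cost a startup time t_s plus a per-word time t_w, and idle time is ignored. The parameters t_c, t_s and t_w are the unknowns that experiments would supply.

```python
def message_time(t_s, t_w, length):
    """Assumed cost of one message: startup time plus per-word transfer time."""
    return t_s + t_w * length


def execution_time(n, p, t_c, t_s, t_w, msgs_per_proc, msg_len):
    """Modeled time on p processors for a problem of size n."""
    computation = t_c * n / p  # work divides evenly (assumption)
    if p > 1:
        communication = msgs_per_proc * message_time(t_s, t_w, msg_len)
    else:
        communication = 0.0  # a single processor sends no messages
    return computation + communication


def efficiency(n, p, **costs):
    """Fraction of processor time spent on useful work: T_1 / (P * T_P)."""
    t1 = execution_time(n, 1, **costs)
    tp = execution_time(n, p, **costs)
    return t1 / (p * tp)
```

Comparing such predictions with measured run times is one way to carry out the validation step: a large gap points either to an implementation error or to a cost the model leaves out.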

7
Putting Components Together
  • Major Points in Modular Design Techniques
  • The central tenets of modular designs such as
    simple interfaces and information hiding, apply
    in parallel programming just as in sequential
    programming.
  • Data distribution is an important implementation
    detail that, if abstracted out of a module
    interface, can facilitate code reuse.
  • Performance models can be composed, but care must
    be taken to account for communication costs at
    interfaces, overlapping of computation and
    communication and other factors.

8
Compositional C++ (CC++)
  • CC++
  • Provides a small set of extensions to C++ for
    specifying concurrency, locality, communication,
    and mapping.
  • Its constructs can be used to build libraries
    that provide the task and channel abstractions.
  • Provides basic mechanisms that can be used to
    implement a variety of different parallel program
    structures.

9
Fortran M
  • Fortran M
  • Provides a direct and complete implementation of
    the task/channel programming model.
  • Incorporates language constructs for defining
    tasks and channels.
  • It allows mapping decisions to be changed
    independently of other design aspects.

10
High Performance Fortran
  • Chief features of F90's array language and HPF's
    data distribution directives and related constructs
  • An array language comprising array assignments,
    array intrinsics, and (in HPF) FORALL and
    INDEPENDENT constructs is used to reveal the
    fine-grained concurrency inherent in
    data-parallel operations on arrays.
  • Data distribution directives are introduced to
    provide the programmer with control over
    partitioning, agglomeration, and mapping (and
    hence locality).
  • An HPF compiler translates this high-level
    specification into an executable program by
    generating the communication code implied by a
    particular set of data-parallel operations and
    data distribution directives.
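
What a data distribution directive decides can be sketched in Python. This is not HPF syntax; it only mimics, under stated assumptions, how block and cyclic distributions assign the elements of a one-dimensional array of n elements to p processors. Indices are 0-based here and the function names are hypothetical.

```python
def block_owner(i, n, p):
    """Processor owning element i when n elements are split into contiguous blocks."""
    block = -(-n // p)  # ceil(n / p): elements per block
    return i // block


def cyclic_owner(i, p):
    """Processor owning element i when elements are dealt out round-robin."""
    return i % p


def owners(n, p, kind):
    """List the owner of every element for kind 'block' or 'cyclic'."""
    if kind == "block":
        return [block_owner(i, n, p) for i in range(n)]
    if kind == "cyclic":
        return [cyclic_owner(i, p) for i in range(n)]
    raise ValueError("kind must be 'block' or 'cyclic'")
```

Switching kind changes which elements are local to each processor, and therefore which array operations require communication, without touching the computation itself.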

11
High Performance Fortran
  • Pros
  • The most attractive feature of the data-parallel
    approach as exemplified in HPF is that the
    compiler takes on the job of generating
    communication code.
  • Two advantages
  • It allows the programmer to focus on the tasks of
    identifying opportunities for concurrent
    execution and determining efficient partitioning,
    agglomeration, and mapping strategies.
  • It simplifies the task of exploring alternative
    parallel algorithms: in principle, only data
    distribution directives need be changed.
  • Con
  • A problematic feature of HPF is the limited range
    of parallel algorithms that can be expressed in
    HPF and compiled efficiently for large parallel
    computers.

12
Message Passing Interface
  • Parallel algorithm designs developed using the
    techniques that have already been discussed can
    be translated into message-passing programs.
  • Principal features of the message-passing
    programming model
  • A computation consists of a (typically fixed) set
    of heavyweight processes, each with a unique
    identifier (integers 0..P-1).
  • Processes interact by exchanging typed messages,
    by engaging in collective communication
    operations, or by probing for pending messages.
  • Modularity is supported via communicators, which
    allow subprograms to encapsulate communication
    operations and to be combined in sequential and
    parallel compositions.

13
Message Passing Interface
  • Principal features, continued
  • Algorithms developed using the techniques set out
    earlier can be expressed directly if they do
    not create tasks dynamically or place multiple
    tasks on a processor.
  • Algorithms that create tasks dynamically or place
    multiple tasks on a processor can require
    substantial refinement before they can be
    implemented in MPI.
  • Determinism is not guaranteed but can be achieved
    with careful programming.
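
The message-passing model and the point about determinism can be imitated in plain Python. The sketch below does not use MPI or any MPI binding: threads stand in for the P processes, a queue stands in for process 0's incoming messages, and spmd_gather and work are hypothetical names. Messages may arrive in a different order on each run, so the receiver places them by source identifier to keep the result deterministic.

```python
import queue
import threading


def spmd_gather(p, work):
    """Run work(rank) on ranks 0..p-1 and gather the results at rank 0."""
    inbox = queue.Queue()  # stands in for rank 0's incoming messages
    gathered = []

    def process(rank):
        value = work(rank)
        if rank != 0:
            inbox.put((rank, value))  # message tagged with its source rank
            return
        received = {0: value}
        for _ in range(p - 1):
            source, payload = inbox.get()  # arrival order may vary between runs
            received[source] = payload
        # Ordering by source rank makes the outcome independent of arrival order.
        gathered.extend(received[r] for r in range(p))

    threads = [threading.Thread(target=process, args=(rank,)) for rank in range(p)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return gathered
```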

14
Performance Tools
  • Analytic performance model
  • Is an idealization of program behavior that must
    be validated by comparison with empirical
    results.
  • This validation process can reveal deficiencies
    in both the model and the parallel program.
  • Performance analysis
  • Is most effective when guided by an
    understanding rooted in analytic models, and
    models are most accurate when calibrated with
    empirical data.

15
Performance Tools
  • Basic techniques and popular tools for
    collecting, transforming, and analyzing
    performance data
  • Data collection level- profile and counter data
    are easier to obtain and to analyze, while traces
    can show fine detail.
  • Data transformation and visualization levels-
    data reduction techniques are able to reduce raw
    performance data to more meaningful and
    manageable quantities.
  • Data visualization tools- can facilitate the
    navigation of large multidimensional data sets.
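
One simple data reduction is turning a trace into a profile. The Python sketch below assumes a hypothetical trace format, a list of (processor, activity, start, end) records; real tools define their own formats.

```python
from collections import defaultdict


def profile_from_trace(trace):
    """Reduce trace records to total time per (processor, activity)."""
    totals = defaultdict(float)
    for proc, activity, start, end in trace:
        totals[(proc, activity)] += end - start
    return dict(totals)
```

The profile is much smaller than the trace but discards ordering, which is the fine detail that only the trace retains.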

16
Random Numbers
  • The linear congruential method is a commonly used
    sequential random number generator and can be
    adapted for parallel execution.
  • Exemplifies how parallel computation can
    introduce new issues even in apparently simple
    problems.
  • In the case of random numbers, these issues
    include reproducibility, scalability, the
    preservation of randomness, and the greater
    number of random values consumed when executing
    on many processors.
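
One way to adapt a linear congruential generator for parallel use is the leapfrog scheme, sketched below in Python as an illustration rather than as the method the slides have in mind. With P processes, process r takes elements r, r+P, r+2P, ... of the sequential stream, so the combined output reproduces the sequential sequence. The function names are hypothetical and the constants a, c, m are left to the caller.

```python
def lcg(seed, a, c, m, count):
    """Sequential generator: x_{k+1} = (a * x_k + c) mod m."""
    x, out = seed, []
    for _ in range(count):
        x = (a * x + c) % m
        out.append(x)
    return out


def leapfrog_params(a, c, m, p):
    """Return (A, C) so that x -> (A*x + C) mod m equals p base steps."""
    A, C = 1, 0
    for _ in range(p):
        A, C = (a * A) % m, (a * C + c) % m
    return A, C


def leapfrog_stream(seed, a, c, m, rank, p, count):
    """Values rank, rank+p, rank+2p, ... of the sequential stream."""
    if count <= 0:
        return []
    x = lcg(seed, a, c, m, rank + 1)[-1]  # this rank's first value
    A, C = leapfrog_params(a, c, m, p)
    out = [x]
    for _ in range(count - 1):
        x = (A * x + C) % m  # jump ahead p steps at once
        out.append(x)
    return out
```

Because each process jumps ahead by P elements per call, the result does not depend on scheduling, which addresses reproducibility; the other issues listed above still need separate care.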

17
Hypercube Algorithm
  • The hypercube template
  • Is one of the most useful communication
    structures in parallel computing.
  • Allows information to be propagated among P tasks
    in just log P steps.
  • Nearest-neighbor exchange on a two-dimensional
    torus
  • Used to implement finite difference computations,
    matrix multiplication and graph algorithms.
  • Learning to recognize and apply templates such as
    the hypercube and the torus can greatly simplify
    the task of designing and implementing parallel
    programs.
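
The log P behavior of the hypercube template can be simulated sequentially in Python. This is a sketch under stated assumptions: P is a power of two, the combining operation op is commutative and associative, and each list comprehension stands in for one round of simultaneous pairwise exchanges. The name hypercube_allreduce is invented for the example.

```python
def hypercube_allreduce(values, op):
    """Combine one value per task so every task ends with the full result.

    Returns (final values, number of exchange steps).
    """
    p = len(values)
    if p == 0 or p & (p - 1):
        raise ValueError("number of tasks must be a power of two")
    state = list(values)
    steps, bit = 0, 1
    while bit < p:
        # In this step task r exchanges with the neighbor whose id differs in one bit.
        state = [op(state[r], state[r ^ bit]) for r in range(p)]
        bit <<= 1
        steps += 1
    return state, steps
```

With 8 tasks holding 1 through 8 and addition as op, every task holds 36 after 3 steps.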