Transcript and Presenter's Notes

Title: Parallel Algorithms: status and prospects


1
Parallel Algorithms: status and prospects
  • Guo-Liang CHEN
  • Dept. of Computer Science and Technology
  • National High Performance Computing Center at
    Hefei
  • Univ. of Science and Technology of China
  • Hefei, Anhui 230027, P.R. China
  • glchen@ustc.edu.cn
  • http://www.nhpcc.ustc.edu.cn

2
Abstract
  • Parallel algorithms are central to parallel
    processing. In this talk, we first give a brief
    introduction to parallel algorithms. We then
    focus on the issues and directions of parallel
    algorithm research. Lastly, we present the
    existing problems and the new challenges facing
    parallel algorithm research. We argue that
    parallel algorithm research should establish a
    systematic approach of
    "Theory-Design-Implementation-Application" and
    form an integrated methodology of
    "Architecture-Algorithm-Programming". Only in
    this way can parallel algorithm research develop
    continuously and become more realistic.

3
Outline
  • Introduction
  • Research Issues
  • Parallel computation models
  • Design techniques
  • Parallel complexity theory
  • Research Directions
  • Parallel computing
  • Solving problems from applied domains
  • Non-traditional computation modes
  • Existing problems and new challenges

4
Introduction (1)
  • What is a parallel algorithm?
  • Algorithm: a method and procedure to solve a
    given problem.
  • Parallel algorithm: an algorithm in which
    multiple operations are performed simultaneously.
  • Why has parallelism been an interesting topic?
  • The real world is inherently parallel: it is
    natural and straightforward to express something
    about the real world in a parallel way.
  • There are limits to sequential computing
    performance: physical limits such as the speed of
    light.
  • Parallel computation is still likely to be more
    cost-effective for many applications than using
    very expensive high-end uniprocessors.

5
Introduction (2)
  • Why has parallelism not led to widespread use?
  • Conscious human thinking appears to us to be
    sequential.
  • The theory required for parallel algorithms is
    immature and was developed after the technology.
  • The hardware platforms required for parallel
    algorithms are very expensive.
  • Portability is a much more serious issue in
    parallel programming than in sequential
    programming.
  • Why do we need parallel algorithms?
  • To increase computational speed.
  • To increase computational precision (e.g., to
    generate finer meshes).
  • To meet the requirements of real-time computation
    (e.g., weather forecasting).

6
Introduction (3)
  • Classification of parallel algorithms
  • Numerical parallel algorithms (algebraic
    operations: matrix operations, solving systems of
    linear equations, etc.).
  • Non-numerical parallel algorithms (symbolic
    operations: sorting, searching, graph algorithms,
    etc.).
  • Research hierarchy of parallel algorithms
  • Parallel complexity theory (parallelizable
    problems, NC-class problems, P-complete problems,
    lower bounds, etc.).
  • Design and analysis of parallel algorithms
    (efficient parallel algorithms).
  • Implementation of parallel algorithms (hardware
    platforms, software support).

7
Introduction (4)
  • The history of parallel algorithm research
  • During the two decades from the 70s to the 80s,
    parallel algorithm research was very hot; many
    excellent papers, textbooks, and monographs on
    parallel algorithms were published.
  • Since the middle of the 90s, the focus has
    shifted from parallel algorithms to parallel
    computing.
  • New opportunities for parallel algorithm research
  • The dramatic decrease in computer prices and the
    rapid development of communication technology
    make it possible to build a PC cluster by
    ourselves.
  • It is easy to get free software from the Internet
    to support clusters.

8
Research Issues
  • Parallel computation models
  • PRAM
  • APRAM
  • BSP
  • logP
  • MH and UMH
  • Memory-LogP
  • Design techniques
  • Partitioning Principle
  • Divide-and-Conquer Strategy
  • Balanced Trees Method
  • Doubling Techniques
  • Pipelining Techniques
  • Parallel complexity theory
  • NC class
  • P-complete

9
Research Issues: Parallel Computation Models (1)
  • PRAM (Parallel Random Access Machine)
  • SIMD-SM, used for fine-grain parallel computing;
    centralized shared memory; globally synchronized.
  • Advantages
  • Suitable for representing and analyzing the
    complexity of parallel algorithms; simple to use,
    hiding most of the low-level details of the
    parallel computer (communication,
    synchronization, etc.).
  • Disadvantages
  • Unsuitable for MIMD computers; unrealistic to
    neglect issues such as memory contention and
    communication delay.

10
Research Issues: Parallel Computation Models (2)
  • Asynchronous PRAM
  • APRAM or MIMD-SM, used for medium-grain parallel
    computation; centralized shared memory;
    asynchronous operation; communication via
    read/write of shared variables; explicit
    synchronization (barriers, etc.).
  • Computation in APRAM
  • A computation consists of global phases separated
    by barriers; in a phase, all processors execute
    operations asynchronously, and the last
    instruction must be a synchronization
    instruction.
  • Advantages: preserves much of the simplicity of
    PRAM; better programmability; correctness is
    easier to ensure; easy to analyze complexity.
  • Disadvantages: unsuitable for MIMD computers with
    distributed memory.

11
Research Issues: Parallel Computation Models (3)
  • Bulk Synchronous Parallel (BSP) model
  • MIMD-DM; consists of a set of processors,
    send/receive message communication, and a
    mechanism for synchronization.
  • Bulk synchrony: combining messages into one bulk
    transfer, delaying communication.
  • BSP parameters
  • p: number of processors
  • l: barrier synchronization time
  • g: unary packet transmission time (time
    steps/packet) = 1/bandwidth
  • BSP bulk synchronization can reduce the
    difficulty of design and analysis and easily
    ensure the correctness of the algorithm (see the
    cost sketch below).
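
These parameters combine in the standard BSP cost formula: a
superstep that performs at most w local operations and sends or
receives at most h packets per processor costs w + g*h + l. A
minimal sketch in C, assuming this standard formula (the slide
names the parameters but does not spell the formula out):

    #include <stdio.h>

    /* BSP superstep cost: work + communication + barrier. */
    double bsp_superstep_cost(double w, double h, double g, double l) {
        return w + g * h + l;
    }

    int main(void) {
        /* Illustrative numbers only: 1e6 ops of local work,
           h = 1000 packets, g = 4 steps/packet, barrier l = 1e4. */
        printf("T = %.0f time steps\n",
               bsp_superstep_cost(1e6, 1e3, 4.0, 1e4));
        return 0;
    }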

12
Research Issues: Parallel Computation Models (4)
  • LogP model
  • MIMD-DM, point-to-point communication, implicit
    synchronization.
  • Parameters (L, o, g, P)
  • L (network latency), o (communication overhead),
    g (gap = 1/bandwidth), P (number of processors);
    see the cost sketch after this list.
  • Advantages
  • Captures the communication bottleneck of parallel
    computers.
  • Hides details of topology, routing algorithm, and
    network protocol.
  • Applicable to shared-variable, message-passing,
    and data-parallel algorithms.
  • Disadvantages
  • Restricts the network capacity; neglects
    communication congestion.
  • Difficult to describe and design algorithms with.
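
Under LogP, the standard cost estimates (assumed here; the slide
lists only the parameters) are: one small message costs o + L + o,
and a burst of n messages costs roughly o + (n-1)*max(g, o) + L + o,
since successive sends are spaced by the gap g or the overhead o,
whichever is larger. A minimal sketch in C:

    #include <stdio.h>

    /* One small message: send overhead + latency + recv overhead. */
    double logp_one_message(double L, double o) {
        return o + L + o;
    }

    /* n back-to-back messages from one sender to one receiver. */
    double logp_burst(int n, double L, double o, double g) {
        double gap = (g > o) ? g : o;  /* spacing between sends */
        return o + (n - 1) * gap + L + o;
    }

    int main(void) {
        /* Illustrative numbers only (microseconds). */
        printf("1 msg:    %.1f us\n", logp_one_message(5.0, 1.0));
        printf("100 msgs: %.1f us\n", logp_burst(100, 5.0, 1.0, 2.0));
        return 0;
    }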

13
Research Issues: Parallel Computation Models (5)
  • BSP (bulk synch.) -> BSP (subset synch.) -> BSP
    (pairwise synch.) = LogP
  • BSP can simulate LogP with a constant factor, and
    LogP can simulate BSP with at most a logarithmic
    factor.
  • BSP = LogP + Barriers - Overhead
  • BSP offers a more convenient abstraction for
    algorithm design and programming; LogP provides
    better control of machine resources.
  • BSP seems preferable for its greater simplicity,
    portability, and more structured programming
    style.

14
Research Issues: Parallel Computation Models (6)
  • MH (Memory Hierarchy) model
  • A sequential computer's memory is modeled as a
    sequence of memory modules <M0, M1, M2, M3, ...>
    with buses connecting adjacent modules; all buses
    may be active simultaneously; M0 (central
    processor), M1 (cache), M2 (main memory), M3
    (storage).
  • MH is an address-oriented access model; the
    memory access cost function f(a) is monotonically
    increasing, where a is the memory address.
  • The MH model is suitable for sequential memory
    (magnetic tape, etc.).

15
Research Issues: Parallel Computation Models (7)
  • UMH (Uniform Memory Hierarchy) model
  • The UMH model captures performance-relevant
    aspects of the hierarchical nature of computer
    memory; it is a tool for quantifying the
    efficiency of data movement.
  • Memory access cost function f(k), where k is the
    level of the memory hierarchy.
  • An algorithm's memory accesses should be served
    from nearer memory modules whenever possible,
    rather than repeatedly from farther ones.
  • Prefetching operands and overlapping computation
    with memory access operations are encouraged.

16
Research Issues: Parallel Computation Models (8)
  • Memory LogP model
  • This model is based on data movement across a
    memory hierarchy from a source LM to a target LM
    (Local Memory) using point-to-point memory
    communication; it is inspired by LogP and used to
    predict and analyze the latency of memory copy,
    pack, and unpack operations.
  • Communication cost is the sum of memory
    communication and network communication times.
    Memory communication goes from the user's local
    memory to the network buffer; network
    communication goes from network buffer to network
    buffer.
  • Estimating the cost of point-to-point
    communication is similar to the original LogP;
    only the parameters have different meanings.

17
Research Issues: Parallel Computation Models (9)
  • Model Parameters
  • l: effective latency, l = f(d, s), with s (data
    size) and d (access pattern); the cost of data
    transfer for application, middleware, and
    hardware.
  • o: ideal overhead, which is the cost of data
    transfer for middleware and hardware.
  • g: the reciprocal of g corresponds to per-process
    bandwidth; usually o = g.
  • p: number of processors, p = 1 (since only
    point-to-point communication is considered).
  • Cost Function (cost per byte)
  • (om + l) + (Ln/wn) + (om + l), which is similar
    to o + L + o in LogP.
  • om + l -- average cost of packing/unpacking; Ln
    -- word size of network communication; wn -- word
    size of the instruction set.

18
Research Issues: Design Techniques (1)
  • Partitioning
  • Breaking up the given problem into several
    non-overlapping subproblems of almost equal size.
  • Solving these subproblems concurrently (see the
    sketch after this list).
  • Divide and conquer
  • Dividing the problem into several subproblems.
  • Solving the subproblems recursively.
  • Merging the solutions of the subproblems into a
    solution for the original problem.
  • Balanced tree
  • Building a balanced binary tree on the input
    elements.
  • Traversing the tree forward/backward to/from the
    root.
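
A minimal sketch of the partitioning principle in C with OpenMP (an
illustration, not from the slides): each thread sums a contiguous,
non-overlapping block of the array, and the partial sums are
combined. Compile with -fopenmp.

    #include <stdio.h>

    #define N 1000000

    int main(void) {
        static double a[N];
        for (int i = 0; i < N; i++)
            a[i] = 1.0;

        double sum = 0.0;
        /* The iteration space is partitioned among threads; the
           reduction clause combines the per-thread partial sums. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %.0f\n", sum);   /* expect 1000000 */
        return 0;
    }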

19
Research Issues: Design Techniques (2)
  • Pipelining
  • Breaking an algorithm into a sequence of segments
    in which the output of each segment is the input
    of its successor.
  • All segments must produce results at the same
    rate.
  • Doubling
  • Also called pointer jumping or path doubling (see
    the sketch after this list).
  • The computation proceeds by recursive application
    of the calculation, with the distance covered
    doubling in successive steps. After k steps the
    computation has been performed over all elements
    within a distance of 2^k.
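
A minimal sequential simulation of pointer jumping for list ranking
in C (an illustration, not from the slides); on a PRAM, each i-loop
below would be a single parallel step, and the temporary arrays
model the synchronous update:

    #include <stdio.h>

    #define N 8
    #define ROUNDS 3   /* ceil(log2(N)) rounds suffice */

    int main(void) {
        /* next[i]: successor in the list; the tail points to itself. */
        int next[N] = {1, 2, 3, 4, 5, 6, 7, 7};
        int rank[N], nnext[N], nrank[N];
        for (int i = 0; i < N; i++)
            rank[i] = (next[i] == i) ? 0 : 1;

        for (int r = 0; r < ROUNDS; r++) {
            for (int i = 0; i < N; i++) {   /* one parallel step */
                nrank[i] = rank[i] + rank[next[i]];
                nnext[i] = next[next[i]];   /* pointer jump */
            }
            for (int i = 0; i < N; i++) {
                rank[i] = nrank[i];
                next[i] = nnext[i];
            }
        }
        for (int i = 0; i < N; i++)   /* rank[i] = distance to tail */
            printf("rank[%d] = %d\n", i, rank[i]);
        return 0;
    }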

20
Research Issues - Parallel Complexity Theory (1)
  • Nick's Class (NC) problem
  • Definition: a problem is in NC if it can be
    solved in time polylogarithmic in the size of the
    problem using at most a polynomial number of
    processors (restated formally below).
  • Role: the class NC plays the role in parallel
    complexity theory that P plays in sequential
    complexity.
  • P-complete problem
  • Definition: a problem L ∈ P is said to be
    P-complete if every other problem in P can be
    transformed to L in polylogarithmic parallel time
    using a polynomial number of processors.
  • Role: P-completeness plays the role that
    NP-completeness (NPC) plays in sequential
    complexity.
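
The NC definition above can be restated compactly (standard
notation, assumed rather than taken from the slides):

    \[
      \mathrm{NC} \;=\; \bigcup_{k \ge 1} \mathrm{NC}^k,
      \qquad
      \mathrm{NC}^k \;=\; \{\, L \mid L \text{ is decidable in }
      O(\log^k n) \text{ time using } n^{O(1)} \text{ processors} \,\}
    \]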

21
Research Issues - Parallel Complexity Theory (2)
  • Parallelizable problems
  • NC is the class of problems solvable in
    polylogarithmic parallel time using a polynomial
    number of processors.
  • Obviously, any problem in NC is also in P
    (NC ⊆ P), but few believe that every problem in P
    is also in NC (P ⊆ NC).
  • Even if a problem is P-complete, there may be an
    efficient (but not necessarily
    polylogarithmic-time) parallel algorithm for
    solving it (e.g., the maximum flow problem is
    P-complete, but several efficient parallel
    algorithms are known for it).

22
Research Directions
  • Parallel computing
  • Architecture
  • Algorithm
  • Programming
  • Solving problems from applied domains
  • Non-traditional computation modes
  • Neuro-computing
  • Nature inspired computing
  • Molecular computing
  • Quantum computing

23
Research Directions: Parallel Computing (1) Architecture
  • SMP (Symmetric MultiProcessors)
  • MIMD, UMA, medium grain, higher DOP (degree of
    parallelism).
  • Commodity microprocessors with on/off-chip
    caches.
  • A high-speed snoopy bus or crossbar switch.
  • Central shared memory.
  • Symmetric: each processor has equal access to
    shared memory (SM), I/O, and OS services.
  • Unscalable due to the shared memory and bus.

24
Research Directions: Parallel Computing (1) Architecture
  • MPP (Massively Parallel Processors)
  • MIMD, NORMA, medium/large grain.
  • A large number of commodity processors.
  • A customized high-bandwidth, low-latency
    communication network.
  • Physically distributed memory.
  • May or may not have local disks.
  • Synchronized through blocking message-passing
    operations.

25
Research Directions: Parallel Computing (1) Architecture
  • Cluster
  • MIMD, NUMA, coarse grain, distributed memory.
  • Each node of a cluster is a complete computer
    (SMP or PC), sometimes called a headless
    workstation.
  • A low-cost commodity network.
  • There is always a local disk.
  • A complete OS resides on each node, whereas in an
    MPP only a microkernel exists.

26
Research Directions: Parallel Computing (1) Architecture
  • Constellation
  • Constellations are clusters of custom vector
    processors; very expensive.
  • Small/medium collection of fast vector nodes;
    vector operations on vector registers.
  • Large memory; moderate scalability overall and
    very limited scalability in processor count.
  • High-bandwidth pipelined memory access.
  • Global shared memory (PVP), easy programming
    model.

27
Research Directions: Parallel Computing (2) Algorithms
  • Policy: Parallelizing a Sequential Algorithm
  • Method description
  • Detect and exploit any inherent parallelism in an
    existing sequential algorithm.
  • Parallel implementation of the parallelizable
    code segments.
  • Remarks
  • Parallelizing is usually the most useful and
    effective policy.
  • Not all sequential algorithms can be
    parallelized.
  • A good sequential algorithm does not necessarily
    parallelize into a good parallel algorithm.
  • Many sequential numerical algorithms can be
    parallelized directly into effective parallel
    numerical algorithms (see the sketch after this
    list).
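
A minimal sketch of direct parallelization in C with OpenMP (an
illustration, not from the slides): the rows of a matrix-vector
product are independent, so the sequential outer loop parallelizes
with a single directive. Compile with -fopenmp.

    #include <stdio.h>

    #define N 512

    int main(void) {
        static double A[N][N], x[N], y[N];
        for (int i = 0; i < N; i++) {
            x[i] = 1.0;
            for (int j = 0; j < N; j++)
                A[i][j] = 1.0;
        }

        /* The only change from the sequential version is the
           pragma: each thread computes a disjoint set of rows. */
        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            double s = 0.0;
            for (int j = 0; j < N; j++)
                s += A[i][j] * x[j];
            y[i] = s;
        }

        printf("y[0] = %.0f\n", y[0]);   /* expect 512 */
        return 0;
    }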

28
Research Directions: Parallel Computing (2) Algorithms
  • Policy: Designing a New Parallel Algorithm
  • Method description
  • Starting from the description of the given
    problem, we redesign or invent a new parallel
    algorithm without regard to any related
    sequential algorithm.
  • Remarks
  • Investigate the inherent features of the problem.
  • Inventing a new parallel algorithm is challenging
    and creative work.

29
Research Directions: Parallel Computing (2) Algorithms
  • Policy: Borrowing Another Well-Known Algorithm
  • Method description
  • Find the relationship between the problem to be
    solved and a well-known problem.
  • Design a similar algorithm that solves the given
    problem using the well-known algorithm.
  • Remarks
  • This is highly skilled work in which rich,
    practical experience of algorithm design is
    needed.

30
Research Directions: Parallel Computing (2) Algorithms
  • Methods
  • Decomposition
  • Divide-and-Conquer Strategy
  • Randomization
  • Parallel Iteration
  • Pipelining Techniques
  • MultiGrid
  • Conjugate Gradient

31
Research Directions: Parallel Computing (2) Algorithms
  • Procedure (Steps)
  • PCAM algorithm design
  • 4 stages of designing a parallel algorithm (see
    the sketch after this list)
  • P: Partitioning
  • C: Communication
  • A: Agglomeration
  • M: Mapping
  • P & C focus on concurrency and scalability.
  • A & M focus on locality and performance.
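
A minimal PCAM-flavored sketch in C with MPI (an illustration, not
from the slides), computing a global sum: Partitioning gives each
process a block of the index range, Communication and Agglomeration
happen inside MPI_Reduce, and the Mapping of processes to processors
is left to the MPI runtime. Run with e.g. "mpirun -np 4 ./a.out".

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* P: each process sums its own contiguous block. */
        double local = 0.0;
        for (int i = rank * 100; i < (rank + 1) * 100; i++)
            local += i;

        /* C + A: partial sums are combined on rank 0. */
        double global = 0.0;
        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0,
                   MPI_COMM_WORLD);

        if (rank == 0)
            printf("global sum = %.0f\n", global);
        MPI_Finalize();
        return 0;
    }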

32
Research Directions: Parallel Computing (2) Algorithms
  • Procedure (Steps)

33
Research Directions: Parallel Computing (3) Programming
  • Parallel Programming Models
  • Implicit Parallel
  • Sequential programming language; the compiler is
    responsible for automatically converting it into
    parallel code.
  • Data Parallel
  • Emphasizes local computations and data-routing
    operations. It can be implemented on either SIMD
    or SPMD.
  • Shared Variable
  • Native model for PVP, SMP, and DSM. The
    portability of programs is problematic.
  • Message Passing
  • Native model for MPP and clusters. The
    portability of programs is greatly enhanced by
    the PVM and MPI libraries (see the sketch after
    this list).
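
A minimal message-passing sketch in C with MPI (an illustration, not
from the slides): rank 0 sends an integer to rank 1 with the
standard point-to-point calls. Run with at least two processes,
e.g. "mpirun -np 2 ./a.out".

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv) {
        int rank, token = 42;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            /* Blocking send to rank 1 with message tag 0. */
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* Blocking receive from rank 0. */
            MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", token);
        }

        MPI_Finalize();
        return 0;
    }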

34
Research Directions: Parallel Computing (3) Programming
Unified Parallel Programming Model
  • High Abstraction Level
  • Suitable for various distributed- and
    shared-memory parallel architectures
  • Hides the underlying implementation details of
    message passing or synchronization
  • Supports high-abstraction-level parallel
    algorithm design and description
  • High Productivity
  • Supports fast and intuitive mapping from parallel
    algorithms to parallel programs
  • Supports high-performance implementation of
    parallel programs
  • Highly readable parallel programs
  • High Extensibility
  • Can be customized or extended conveniently
  • Can accommodate the needs of various application
    areas

35
Research Directions: Parallel Computing (3) Programming
Unified Parallel Programming Model: Main Layers and Components
  • Core Support Layer
  • GOOMPI: Generic Object-Oriented MPI
  • PMT: Parallel Multi-Thread
  • Core Application Layer
  • Centered on smart parallel and distributed
    abstract data structures
  • Implementation of a highly reusable basic
    parallel algorithm library
  • High-Level Framework Layer
  • Provides extensible parallel algorithmic
    skeletons
  • Supports the research and design of new parallel
    algorithms

36
Research Directions: Parallel Computing (3) Programming
Unified Parallel Programming Model: System Architecture
37
Research Directions: Parallel Computing (3) Programming
  • Parallel Programming Languages
  • ANSI X3H5
  • POSIX Threads
  • OpenMP
  • PVM: Parallel Virtual Machine
  • MPI: Message Passing Interface
  • HPF: High-Performance Fortran

38
Research Directions: Parallel Computing (3) Programming
  • Parallel Programming Environment Tools
  • Parallelizing Compilers
  • SIMDizing: Vectorizing
  • MIMDizing: Parallelizing
  • Performance Analysis
  • Data Collection
  • Data Transformation and Visualization

39
Research Directions: Solving Problems from Applied Domains
  • Computational Science & Engineering (CSE)
  • Computational physics
  • Computational chemistry
  • Computational biology
  • Science and engineering computing requirements
  • Global change
  • Human genome
  • Fluid turbulence
  • Vehicle dynamics
  • Ocean circulation
  • Superconductor modeling
  • Weather forecasting

40
Research Directions: Non-Traditional Computation Modes (1)
  • Neuro-computing: using on the order of 10^12
    neurons to perform parallel and distributed
    processing.
  • Nature-inspired computing: using techniques
    inspired by natural systems, often with the
    unique characteristics of self-adaptation,
    self-organization, and self-learning.
  • Molecular parallel computing: using on the order
    of 10^20 molecules to perform computation with
    spatial parallelism instead of temporal
    parallelism.
  • Quantum computing: using the quantum
    superposition principle to make quantum
    computation very powerful.

41
Research Directions: Non-Traditional Computation Modes (2)
  • Neuro-computing
  • Principles of neural network computing
  • Collective decision
  • Cooperation and competition
  • Learning and self-organization
  • Massively parallel processing, distributed
    memory, analog computation
  • Dynamic evolution
  • Complexity theory of NN computing
  • For any NP-hard problem, even finding an
    approximate solution with a polynomial-size
    network is impossible unless NP = co-NP.
  • In the average case, NNs are likely to be more
    efficient than conventional computers; at least,
    a great many experiments have shown this.
  • For some particular problems, it is possible to
    find an efficient solution with some NN, but the
    learning of an NN is itself a hard problem.

42
Research Directions: Non-Traditional Computation Modes (3)
  • Nature-inspired computing
  • Nature-inspired computation is an emerging
    interdisciplinary area between Computer Science
    and the Natural Sciences (especially the Life
    Sciences).
  • Artificial Neural Networks
  • Inspired by the function of neurons in the brain
  • Genetic Algorithms
  • Inspired by the biological process of evolution
  • Artificial Immune Systems
  • Inspired by the principles of the biological
    immune system
  • Ant Colony System / Swarm Intelligence
  • Inspired by the behaviour of social insects
  • Ecological computation
  • Inspired by the principles of ecosystems

43
Research Directions: Non-Traditional Computation Modes (4)
  • Molecular Computing or DNA Computing
  • In 1994, L. Adleman published a breakthrough
    toward making a general-purpose computer with
    biological molecules (DNA).
  • The Molecular Computation Project (MCP) is an
    attempt to harness the computational power of
    molecules for information processing; in other
    words, it is a trial to develop a general-purpose
    computer with molecules.
  • Ability to compute quickly (Adleman's experiment
    performed at a rate of 100 teraflops, or 100
    trillion floating-point operations per second; by
    comparison, NEC Corporation's Earth Simulator,
    the world's fastest supercomputer, operates at
    approximately 36 teraflops).

44
Research Directions: Non-Traditional Computation Modes (5)
  • Quantum computing
  • Reversible computation (Bennett and Fredkin)
  • Quantum complexity
  • Shor's factorization algorithm (1994)
  • Grover's quantum search algorithm (1997)
  • Some scientists believe that the power of quantum
    computation derives from quantum superposition
    and parallelism rather than from entanglement.
  • Shor's and Grover's quantum algorithms have so
    far been mainly of theoretical interest, as it
    has proved extremely difficult to build a quantum
    computer.

45
Existing Problems and New Challenges (1)
  • Existing problems
  • Purely theoretical parallel algorithm research is
    somewhat sluggish.
  • Some theoretical results on parallel algorithms
    are unrealistic.
  • Parallel software lags behind parallel hardware.
  • Parallel applications are not widespread and
    remain weak.
  • New challenges
  • How to use thousands upon thousands of processors
    efficiently to solve practical problems.
  • How to write, map, schedule, run, and monitor
    enormous numbers of parallel processes.
  • What is the parallel computational model for grid
    computing?

46
Existing Problems and New Challenges (2)
  • What should we do?
  • Establish a systematic approach of
    "Theory-Design-Implementation-Application" for
    parallel algorithm research.
  • Form an integrated methodology of
    "Architecture-Algorithm-Programming" for parallel
    algorithm design.
  • Our contributions
  • Cultivating many students in the parallel
    algorithm area for our country.
  • Publishing a series of parallel computing
    textbooks, including:
  • Parallel Computing: Architecture, Algorithm,
    Programming
  • Design and Analysis of Parallel Algorithms
  • Parallel Computer Architectures
  • Parallel Algorithm Practice

47
(No Transcript)