Title: Parallel Algorithms: Status and Prospects
1 Parallel Algorithms: Status and Prospects
- Guo-Liang CHEN
- Dept. of Computer Science and Technology
- National High Performance Computing Center at Hefei
- Univ. of Science and Technology of China
- Hefei, Anhui, 230027, P.R. China
- glchen@ustc.edu.cn
- http://www.nhpcc.ustc.edu.cn
2 Abstract
- Parallel algorithms are a central problem in parallel processing. In this talk, we first give a brief introduction to parallel algorithms. We then focus on the issues and directions of parallel algorithm research. Lastly, we present the existing problems and the new challenges faced by parallel algorithm research. We argue that parallel algorithm research should establish a systematic approach of Theory-Design-Implementation-Application, and should form an integrated methodology of Architecture-Algorithm-Programming. Only in this way can parallel algorithm research develop continuously and become more realistic.
3 Outline
- Introduction
- Research Issues
- Parallel computation models
- Design techniques
- Parallel complexity theory
- Research directions
- Parallel computing
- Solving problems from applied domains
- Non-traditional computation modes
- Existing problems and new challenges
4 Introduction (1)
- What is a parallel algorithm?
- Algorithm: a method and procedure for solving a given problem.
- Parallel algorithm: an algorithm in which multiple operations are performed simultaneously.
- Why has parallelism been an interesting topic?
- The real world is inherently parallel: it is natural and straightforward to express things about the real world in a parallel way.
- There are limits to sequential computing performance: physical limits such as the speed of light.
- Parallel computation is still likely to be more cost-effective for many applications than using very expensive uniprocessors.
5 Introduction (2)
- Why hasn't parallelism led to widespread use?
- Conscious human thinking appears sequential to us.
- The theory required for parallel algorithms is immature and was developed after the technology.
- The hardware platforms required for parallel algorithms are very expensive.
- Portability is a much more serious issue in parallel programming than in sequential programming.
- Why do we need parallel algorithms?
- To increase computational speed.
- To increase computational precision (e.g., to generate a finer mesh).
- To meet the requirements of real-time computation (e.g., weather forecasting).
6 Introduction (3)
- Classification of parallel algorithms
- Numerical parallel algorithms (algebraic operations: matrix operations, solving systems of linear equations, etc.).
- Non-numerical parallel algorithms (symbolic operations: sorting, searching, graph algorithms, etc.).
- Research hierarchy of parallel algorithms
- Parallel complexity theory (parallelizable problems, NC-class problems, P-complete problems, lower bounds, etc.).
- Design and analysis of parallel algorithms (efficient parallel algorithms).
- Implementation of parallel algorithms (hardware platforms, software support).
7 Introduction (4)
- The history of parallel algorithm research
- During the two decades of the 1970s and 1980s, parallel algorithm research was very active; many excellent papers, textbooks, and monographs on parallel algorithms were published.
- Since the mid-1990s, the focus has shifted from parallel algorithms to parallel computing.
- New opportunities for parallel algorithm research
- The dramatic decrease in computer prices and the rapid development of communication technology make it possible to build a PC cluster ourselves.
- Free software supporting clusters is easy to obtain from the Internet.
8 Research Issues
- Parallel computation models
- PRAM
- APRAM
- BSP
- LogP
- MH and UMH
- Memory-LogP
- Design techniques
- Partitioning Principle
- Divide-and-Conquer Strategy
- Balanced Trees Method
- Doubling Techniques
- Pipelining Techniques
- Parallel complexity theory
- NC class
- P-complete
9 Research Issues: Parallel Computation Models (1)
- PRAM (Parallel Random Access Machine)
- SIMD-SM; used for fine-grain parallel computing; centralized shared memory; globally synchronized.
- Advantages
- Suitable for representing and analyzing the complexity of parallel algorithms; simple to use, hiding most of the low-level details of parallel computers (communication, synchronization, etc.). A concrete PRAM-style schedule is sketched below.
- Disadvantages
- Unsuitable for MIMD computers; unrealistic to neglect the issues of memory contention, communication delay, etc.
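To make the PRAM abstraction concrete, here is a minimal sketch (an illustration added here, not from the talk) of the classic balanced-tree summation schedule: on an EREW PRAM, all additions of one round execute simultaneously on n/2 processors, giving O(log n) time; the rounds are simulated sequentially in this C++ code.

```cpp
// Sketch of the EREW PRAM parallel-sum schedule (balanced tree).
// On a PRAM every iteration of the inner loop is executed by its own
// processor in the same time step, with global synchronization between
// rounds; here the rounds are simulated sequentially.
#include <cstdio>
#include <vector>

long pram_sum(std::vector<long> a) {
    size_t n = a.size();                      // assume n is a power of two
    for (size_t stride = 1; stride < n; stride *= 2)
        for (size_t i = 0; i + stride < n; i += 2 * stride)
            a[i] += a[i + stride];            // one round: independent pairs
    return a[0];
}

int main() {
    std::vector<long> v{1, 2, 3, 4, 5, 6, 7, 8};
    std::printf("sum = %ld\n", pram_sum(v));  // prints sum = 36
    return 0;
}
```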
10 Research Issues: Parallel Computation Models (2)
- Asynchronous PRAM
- APRAM or MIMD-SM; used for medium-grain parallel computation; centralized shared memory; asynchronous operation; communication by reading/writing shared variables; explicit synchronization (barriers, etc.).
- Computation in APRAM
- A computation consists of global phases separated by barriers: within a phase all processors execute operations asynchronously, and the last instruction of a phase must be a synchronization instruction (see the sketch below).
- Advantages: preserves much of the simplicity of PRAM; better programmability; correctness of programs is easier to ensure; complexity is easy to analyze.
- Disadvantages: unsuitable for MIMD computers with distributed memory.
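A minimal sketch of the APRAM phase discipline, assuming a C++20 compiler (std::barrier, std::jthread): the shared vector stands in for the centralized shared memory, and the barrier plays the role of the explicit synchronization instruction that ends each phase.

```cpp
// APRAM-style computation: asynchronous work inside a phase, explicit
// barrier as the last instruction of the phase.
#include <barrier>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const int P = 4;                      // number of asynchronous processors
    std::vector<int> shared(P, 0);        // centralized shared memory
    std::barrier sync(P);                 // explicit synchronization point

    auto worker = [&](int id) {
        shared[id] = id + 1;              // phase 1: asynchronous writes
        sync.arrive_and_wait();           // barrier ends phase 1
        int sum = 0;                      // phase 2: all phase-1 writes are
        for (int x : shared) sum += x;    // now visible to every processor
        if (id == 0) std::printf("sum = %d\n", sum);   // prints sum = 10
    };

    std::vector<std::jthread> threads;
    for (int id = 0; id < P; ++id) threads.emplace_back(worker, id);
    return 0;
}
```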
11 Research Issues: Parallel Computation Models (3)
- Bulk Synchronous Parallel (BSP) model
- MIMD-DM; consists of a set of processors, send/receive message communication, and a mechanism for synchronization.
- Bulk synchrony: messages are combined into bulk transfers, delaying communication.
- BSP parameters
- p: number of processors
- l: barrier synchronization time
- g: unary packet transmission time (time steps/packet) = 1/bandwidth
- BSP bulk synchronization can reduce the difficulty of design and analysis and easily ensure the correctness of algorithms; the standard superstep cost in these parameters is given below.
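For reference, the textbook cost of a single BSP superstep in the parameters above (a standard formula from the BSP literature, not spelled out on the slide):

```latex
T_{\text{superstep}} \;=\; \max_{0 \le i < p} w_i \;+\; g \cdot h \;+\; l
```

where w_i is the local computation performed by processor i, and h is the maximum number of packets any processor sends or receives during the superstep (an h-relation).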
12 Research Issues: Parallel Computation Models (4)
- LogP model
- MIMD-DM; point-to-point communication; implicit synchronization.
- Parameters (LogP)
- L (network latency), o (communication overhead), g (gap = 1/bandwidth), P (number of processors); the resulting point-to-point cost is given below.
- Advantages
- Captures the communication bottleneck of parallel computers.
- Hides details of topology, routing algorithms, and network protocols.
- Applicable to shared-variable, message-passing, and data-parallel algorithms.
- Disadvantages
- Restricts the network capacity and neglects communication congestion.
- Algorithms are difficult to describe and design in this model.
13 Research Issues: Parallel Computation Models (5)
- BSP (bulk synchronization) -> BSP (subset synchronization) -> BSP (pairwise synchronization) = LogP.
- BSP can simulate LogP with a constant factor, and LogP can simulate BSP with at most a logarithmic factor.
- BSP = LogP + Barriers - Overhead.
- BSP offers a more convenient abstraction for the design of algorithms and programs, while LogP provides better control of machine resources.
- BSP seems preferable for its greater simplicity, portability, and more structured programming style.
14 Research Issues: Parallel Computation Models (6)
- MH (Memory Hierarchy) model
- A sequential computer's memory is modeled as a sequence of memory modules <M0, M1, M2, M3, ...> with buses connecting adjacent modules; all buses may be active simultaneously. M0 is the central processor, M1 the cache, M2 main memory, and M3 storage.
- MH is an address-oriented access model: the memory access cost function f(a) is a monotonically increasing function of the memory address a.
- The MH model is suitable for sequential memory (magnetic tape, etc.).
15 Research Issues: Parallel Computation Models (7)
- UMH (Uniform Memory Hierarchy) model
- The UMH model captures performance-relevant aspects of the hierarchical nature of computer memory; it is a tool for quantifying the efficiency of data movement.
- The memory access cost function is f(k), where k is the level of the memory hierarchy.
- An algorithm should avoid repeated accesses to farther memory modules, working out of nearer modules whenever possible (see the blocking sketch below).
- Prefetching operands and overlapping computation with memory access operations are encouraged.
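A minimal illustration of this design advice, assuming a conventional cache hierarchy: the blocked matrix multiplication below reorganizes the loops so that each tile is fetched from the farther module (main memory) once and then reused many times from the nearer module (cache).

```cpp
// Blocked (tiled) matrix multiply: C += A * B for n x n row-major
// matrices. The tile size bs is chosen so that three bs x bs tiles fit
// in cache, turning repeated far-memory accesses into near-memory reuse.
#include <algorithm>
#include <vector>

void matmul_blocked(const std::vector<double>& A,
                    const std::vector<double>& B,
                    std::vector<double>& C, int n, int bs = 64) {
    for (int ii = 0; ii < n; ii += bs)
        for (int kk = 0; kk < n; kk += bs)
            for (int jj = 0; jj < n; jj += bs)
                for (int i = ii; i < std::min(ii + bs, n); ++i)
                    for (int k = kk; k < std::min(kk + bs, n); ++k) {
                        double a = A[i * n + k];   // reused across the j loop
                        for (int j = jj; j < std::min(jj + bs, n); ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```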
16 Research Issues: Parallel Computation Models (8)
- Memory LogP model
- This model is based on data movement across a memory hierarchy from a source LM to a target LM (Local Memory) using point-to-point memory communication; it is inspired by LogP and predicts and analyzes the latency of memory copy, pack, and unpack operations.
- Communication cost is the sum of the memory communication and network communication times. Memory communication moves data from user local memory to the network buffer; network communication moves data from network buffer to network buffer.
- Estimating the cost of point-to-point communication is similar to the original LogP; only the parameters have different meanings.
17 Research Issues: Parallel Computation Models (9)
- Model parameters
- l: effective latency, l = f(d, s), where s is the data size and d the access pattern; the cost of data transfer for application, middleware, and hardware.
- o: ideal overhead, the cost of data transfer for middleware and hardware alone.
- g: the reciprocal of g corresponds to per-process bandwidth; usually o ≤ g.
- P: number of processors; P = 1, since only point-to-point communication is considered.
- Cost function (cost per byte)
- (o_m + l) + (L_n / w_n) + (o_m + l), which is analogous to the o + L + o cost of a LogP message.
- o_m + l: average cost of packing/unpacking; L_n: word size of the network communication; w_n: word size of the instruction set.
18 Research Issues: Design Techniques (1)
- Partitioning
- Break the given problem into several non-overlapping subproblems of almost equal size.
- Solve these subproblems concurrently.
- Divide and conquer (see the sketch after this list)
- Divide the problem into several subproblems.
- Solve the subproblems recursively.
- Merge the solutions of the subproblems into a solution for the original problem.
- Balanced tree
- Build a balanced binary tree on the input elements.
- Traverse the tree forward/backward to/from the root.
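As a concrete divide-and-conquer instance, here is a minimal parallel mergesort sketch (an illustration added here, not from the talk): the halves are solved concurrently with std::async, and the sub-solutions are merged.

```cpp
// Divide and conquer: parallel mergesort. The input is divided in two,
// the halves are sorted concurrently, and the solutions are merged.
#include <algorithm>
#include <cstdio>
#include <future>
#include <vector>

void psort(std::vector<int>::iterator lo, std::vector<int>::iterator hi,
           int depth = 3) {                  // stop spawning below this depth
    if (hi - lo < 2) return;
    if (depth == 0) { std::sort(lo, hi); return; }   // small subproblem
    auto mid = lo + (hi - lo) / 2;
    auto left = std::async(std::launch::async, psort, lo, mid, depth - 1);
    psort(mid, hi, depth - 1);               // solve both halves in parallel
    left.get();
    std::inplace_merge(lo, mid, hi);         // merge the two sub-solutions
}

int main() {
    std::vector<int> v{5, 3, 8, 1, 9, 2, 7, 4, 6, 0};
    psort(v.begin(), v.end());
    for (int x : v) std::printf("%d ", x);   // 0 1 2 3 4 5 6 7 8 9
    std::printf("\n");
    return 0;
}
```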
19 Research Issues: Design Techniques (2)
- Pipelining
- Break an algorithm into a sequence of segments in which the output of each segment is the input of its successor.
- All segments must produce results at the same rate.
- Doubling (see the sketch below)
- Also called pointer jumping or path doubling.
- The computation proceeds by recursive application of the calculation, with the distance covered doubling in successive steps: after k steps the computation has been performed over all elements within a distance of 2^k.
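A minimal sketch of the doubling technique (an illustration added here, not from the talk): list ranking by pointer jumping, where each round doubles the distance spanned by every pointer, so ceil(log2(n)) rounds suffice.

```cpp
// Pointer jumping (doubling) for list ranking: rank[i] becomes the
// distance from node i to the end of the list after ceil(log2(n)) rounds.
#include <cstdio>
#include <vector>

int main() {
    const int n = 8;
    std::vector<int> next(n), rank(n);
    for (int i = 0; i < n; ++i) {             // list 0 -> 1 -> ... -> 7
        next[i] = (i + 1 < n) ? i + 1 : i;    // the tail points to itself
        rank[i] = (i + 1 < n) ? 1 : 0;        // distance to the successor
    }
    for (int step = 1; step < n; step *= 2) {
        // On a PRAM all n updates of a round happen simultaneously;
        // double buffering keeps this simulation faithful to that model.
        std::vector<int> next2 = next, rank2 = rank;
        for (int i = 0; i < n; ++i) {
            rank2[i] = rank[i] + rank[next[i]];  // jump over the successor
            next2[i] = next[next[i]];            // pointer now spans 2*step
        }
        next = next2;
        rank = rank2;
    }
    for (int i = 0; i < n; ++i)
        std::printf("rank[%d] = %d\n", i, rank[i]);  // 7, 6, ..., 0
    return 0;
}
```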
20 Research Issues: Parallel Complexity Theory (1)
- Nick's Class (NC) problems
- Definition: a problem is in NC if it can be solved in time polylogarithmic in the size of the problem using at most a polynomial number of processors.
- Role: the class NC plays the role in parallel complexity theory that P plays in sequential complexity.
- P-complete problems
- Definition: a problem L ∈ P is said to be P-complete if every other problem in P can be transformed to L in polylogarithmic parallel time using a polynomial number of processors.
- Role: P-completeness plays the role that NP-completeness plays in sequential complexity.
21 Research Issues: Parallel Complexity Theory (2)
- Parallelizable problems
- NC is the class of problems solvable in polylogarithmic parallel time using a polynomial number of processors (see the formal statement below).
- Obviously, any problem in NC is also in P (NC ⊆ P), but few believe that P is also contained in NC (P ⊆ NC).
- Even if a problem is P-complete, there may be efficient (though not necessarily polylogarithmic-time) parallel algorithms for solving it. For example, the maximum-flow problem is P-complete, yet several efficient parallel algorithms are known for it.
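In standard notation (a textbook formulation, not shown on the slides), the class and its containment read:

```latex
\mathrm{NC}^k = \{\, L \mid L \text{ is decidable in } O(\log^k n) \text{ parallel time using } n^{O(1)} \text{ processors} \,\}, \qquad
\mathrm{NC} = \bigcup_{k \ge 1} \mathrm{NC}^k \subseteq \mathrm{P}
```

The open question of whether every problem in P is parallelizable is exactly the question of whether P ⊆ NC, i.e. whether NC = P.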
22 Research Directions
- Parallel computing
- Architecture
- Algorithm
- Programming
- Solving problems from applied domains
- Non-traditional computation modes
- Neuro-computing
- Nature-inspired computing
- Molecular computing
- Quantum computing
23 Research Directions: Parallel Computing (1) Architecture
- SMP (Symmetric Multiprocessors)
- MIMD, UMA, medium grain, higher DOP (degree of parallelism).
- Commodity microprocessors with on/off-chip caches.
- A high-speed snoopy bus or crossbar switch.
- Centralized shared memory.
- Symmetric: each processor has equal access to shared memory (SM), I/O, and OS services.
- Unscalable due to the shared memory and bus.
24 Research Directions: Parallel Computing (1) Architecture
- MPP (Massively Parallel Processors)
- MIMD, NORMA, medium/large grain.
- A large number of commodity processors.
- A customized high-bandwidth, low-latency communication network.
- Physically distributed memory.
- May or may not have local disks.
- Synchronized through blocking message-passing operations.
25 Research Directions: Parallel Computing (1) Architecture
- Cluster
- MIMD, NUMA, coarse grain, distributed memory.
- Each node of a cluster is a complete computer (SMP or PC), sometimes called a headless workstation.
- A low-cost commodity network.
- There is always a local disk.
- A complete OS resides on each node, whereas in an MPP only a microkernel exists.
26 Research Directions: Parallel Computing (1) Architecture
- Constellation
- Clusters of custom vector processors; very expensive.
- A small/medium collection of fast vector nodes; vector operations on vector registers.
- Large memory; moderate overall scalability but very limited scalability in processor count.
- High-bandwidth pipelined memory access.
- Global shared memory (PVP); easy programming model.
27 Research Directions: Parallel Computing (2) Algorithms
- Policy: Parallelizing a Sequential Algorithm
- Method description
- Detect and exploit any inherent parallelism in an existing sequential algorithm.
- Parallel implementation of a parallelizable code segment (see the sketch after this list).
- Remarks
- Parallelizing is usually the most useful and effective policy.
- Not all sequential algorithms can be parallelized.
- A good sequential algorithm does not necessarily parallelize into a good parallel algorithm.
- Many sequential numerical algorithms can be parallelized directly into effective parallel numerical algorithms.
28 Research Directions: Parallel Computing (2) Algorithms
- Policy: Designing a New Parallel Algorithm
- Method description
- Based on the description of the given problem, redesign or invent a new parallel algorithm without regard to any related sequential algorithm.
- Remarks
- Investigate the inherent features of the problem.
- Inventing a new parallel algorithm is challenging and creative work.
29 Research Directions: Parallel Computing (2) Algorithms
- Policy: Borrowing Another Well-known Algorithm
- Method description
- Find a relationship between the problem to be solved and a well-known problem.
- Design a similar algorithm that solves the given problem using a well-known algorithm.
- Remarks
- This is highly skilled work, requiring rich practical experience in algorithm design.
30 Research Directions: Parallel Computing (2) Algorithms
- Methods
- Decomposition
- Divide-and-Conquer Strategy
- Randomization
- Parallel Iterative Methods
- Pipelining Techniques
- Multigrid
- Conjugate Gradient
31 Research Directions: Parallel Computing (2) Algorithms
- Procedure (Steps)
- PCAM Algorithm Design
- Four stages of designing a parallel algorithm:
- P: Partitioning
- C: Communication
- A: Agglomeration
- M: Mapping
- P and C focus on concurrency and scalability.
- A and M focus on locality and performance.
32 Research Directions: Parallel Computing (2) Algorithms
33 Research Directions: Parallel Computing (3) Programming
- Parallel Programming Models
- Implicit parallelism
- A sequential programming language; the compiler is responsible for automatically converting the program into parallel code.
- Data parallel
- Emphasizes local computation and data-routing operations; can be implemented on either SIMD or SPMD.
- Shared variable
- The native model for PVP, SMP, and DSM. The portability of programs is problematic.
- Message passing
- The native model for MPP and clusters. The portability of programs is greatly enhanced by the PVM and MPI libraries (see the MPI sketch below).
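A minimal message-passing sketch (an illustration added here, not from the talk): each MPI process passes its rank around a ring; the same source runs unchanged on any platform with an MPI library, which is where the portability comes from.

```cpp
// Message passing with MPI: pass each process's rank around a ring.
// Build with an MPI compiler wrapper (e.g. mpicxx) and run under mpirun.
#include <cstdio>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;
    int token = rank, received = -1;
    // Combined send/receive avoids deadlock with blocking calls.
    MPI_Sendrecv(&token, 1, MPI_INT, right, 0,
                 &received, 1, MPI_INT, left, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    std::printf("process %d received %d from process %d\n",
                rank, received, left);
    MPI_Finalize();
    return 0;
}
```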
34 Research Directions: Parallel Computing (3) Programming
Unified Parallel Programming Model
- High abstraction level
- Suitable for various distributed- and shared-memory parallel architectures
- Hides the underlying implementation details of message passing and synchronization
- Supports high-level parallel algorithm design and description
- High productivity
- Supports fast and intuitive mapping from parallel algorithms to parallel programs
- Supports high-performance implementation of parallel programs
- Highly readable parallel programs
- High extensibility
- Can be customized or extended conveniently
- Can accommodate the needs of various application areas
35 Research Directions: Parallel Computing (3) Programming
Unified Parallel Programming Model: Main Layers and Components
- Core Support Layer
- GOOMPI: Generic Object-Oriented MPI
- PMT: Parallel Multi-Thread
- Core Application Layer
- Centered on smart parallel and distributed abstract data structures
- Implementation of a highly reusable basic parallel algorithm library
- High-level Framework Layer
- Provides extensible parallel algorithmic skeletons
- Supports the research and design of new parallel algorithms
36 Research Directions: Parallel Computing (3) Programming
Unified Parallel Programming Model: System Architecture
37 Research Directions: Parallel Computing (3) Programming
- Parallel Programming Languages
- ANSI X3H5
- POSIX Threads
- OpenMP
- PVM (Parallel Virtual Machine)
- MPI (Message Passing Interface)
- HPF (High-Performance Fortran)
- ...
38 Research Directions: Parallel Computing (3) Programming
- Parallel Programming Environments and Tools
- Parallelizing Compilers
- SIMDizing (Vectorizing)
- MIMDizing (Parallelizing)
- Performance Analysis
- Data Collection
- Data Transformation and Visualization
39 Research Directions: Solving Problems from Applied Domains
- Computational Science and Engineering (CSE)
- Computational physics
- Computational chemistry
- Computational biology
- ...
- Science and Engineering computing requirements
- Global change
- Human Genome
- Fluid turbulence
- Vehicle dynamics
- Ocean circulation
- Superconductor modeling
- Weather forecasting
40 Research Directions: Non-Traditional Computation Modes (1)
- Neuro-computing: using on the order of 10^12 neurons to perform parallel and distributed processing.
- Nature-inspired computing: using mechanisms inspired by natural systems, which often have the unique characteristics of self-adaptation, self-organization, and self-learning.
- Molecular parallel computing: using on the order of 10^20 molecules to perform computation that is parallel in space rather than in time.
- Quantum computing: using the quantum superposition principle to make quantum computation very powerful.
41 Research Directions: Non-Traditional Computation Modes (2)
- Neuro-computing
- Principles of neural network computing
- Collective decision making
- Cooperation and competition
- Learning and self-organization
- Massively parallel processing, distributed memory, analog computation
- Dynamic evolution
- Complexity theory of NN computing
- For any NP-hard problem, even finding an approximate solution with a polynomial-size network is impossible unless NP = co-NP.
- In the average case, NN computing is likely to be more efficient than conventional computers; a great many experiments have at least suggested this.
- For some particular problems it is possible to find an efficient solution with some NN, but the learning (training) of the NN is itself a hard problem.
42 Research Directions: Non-Traditional Computation Modes (3)
- Nature-inspired computing
- Nature-inspired computation is an emerging interdisciplinary area between computer science and the natural sciences (especially the life sciences).
- Artificial Neural Networks
- Inspired by the function of neurons in the brain
- Genetic Algorithms
- Inspired by the biological process of evolution
- Artificial Immune Systems
- Inspired by the principles of the biological immune system
- Ant Colony Systems / Swarm Intelligence
- Inspired by the behaviour of social insects
- Ecological computation
- Inspired by the principles of ecosystems
43 Research Directions: Non-Traditional Computation Modes (4)
- Molecular Computing (DNA Computing)
- In 1994, L. Adleman published a breakthrough toward building a general-purpose computer with biological molecules (DNA).
- The Molecular Computation Project (MCP) is an attempt to harness the computational power of molecules for information processing; in other words, it is an attempt to develop a general-purpose computer based on molecules.
- Ability to compute quickly: Adleman's experiment performed at a rate of 100 teraflops, or 100 trillion floating-point operations per second; by comparison, NEC Corporation's Earth Simulator, the world's fastest supercomputer at the time, operates at approximately 36 teraflops.
44 Research Directions: Non-Traditional Computation Modes (5)
- Quantum computing
- Reversible computation (Bennett and Fredkin)
- Quantum complexity
- Shor's factorization algorithm (1994)
- Grover's quantum search algorithm (1997)
- Some scientists believe that the power of quantum computation derives from quantum superposition and parallelism rather than from entanglement.
- Shor's and Grover's quantum algorithms have so far been mainly of theoretical interest, as it has proved extremely difficult to build a quantum computer.
45 Existing Problems and New Challenges (1)
- Existing problems
- Purely theoretical parallel algorithm research has slowed somewhat.
- Some theoretical results on parallel algorithms are unrealistic.
- Parallel software lags behind parallel hardware.
- Parallel applications are not yet widespread and remain weak.
- New challenges
- How to use thousands upon thousands of processors efficiently to solve practical problems.
- How to write, map, schedule, run, and monitor very large numbers of parallel processes.
- What is the parallel computation model for grid computing?
46 Existing Problems and New Challenges (2)
- What should we do?
- Establish a systematic approach of "Theory-Design-Implementation-Application" for parallel algorithm research.
- Form an integrated methodology of "Architecture-Algorithm-Programming" for parallel algorithm design.
- Our contributions
- Cultivating many students in the parallel algorithm area for our country.
- Publishing a series of parallel computing textbooks, including:
- Parallel Computing: Architecture, Algorithm, Programming
- Design and Analysis of Parallel Algorithms
- Parallel Computer Architectures
- Parallel Algorithm Practice