Today's Software For Tomorrow's Hardware: An Introduction to Parallel Computing

1
Today's Software For Tomorrow's Hardware: An
Introduction to Parallel Computing
  • Rahul S. Sampath
  • May 9th, 2007

2
  • Computational Power Today

3
Floating Point Operations Per Second (FLOPS)
  • Humans doing long division: milliflops (1/1000th
    of one FLOP)
  • Cray-1 supercomputer, 1976 ($8M): 80 MFLOPS
  • Pentium II, 400 MHz: 100 MFLOPS
  • Typical high-end PC today: 1 GFLOPS
  • Sony PlayStation 3, 2006: 2 TFLOPS
  • IBM TRIPS, 2010 (one-chip solution, CPU only):
    1 TFLOPS
  • IBM Blue Gene, < 2010 (with 65,536
    microprocessors): 360 TFLOPS

4
Why do we need more?
  • "DOS addresses only 1 MB of RAM because we cannot
    imagine any application needing more." --
    Microsoft, 1980.
  • "640k ought to be enough for anybody"--Bill
    Gates, 1981.
  • Bottom line: demand for computational power will
    continue to increase.

5
Some Computationally Intensive Applications Today
  • Computer Aided Surgery
  • Medical Imaging
  • MD simulations
  • FEM simulations with > 10^10 unknowns
  • Galaxy formation and evolution
  • 17 million particle Cold Dark Matter Cosmology
    simulation

6
  • Any application that can be scaled up should be
    treated as a computationally intensive
    application.

7
The Need for Parallel Computing
  • Memory (RAM)
  • There is a theoretical limit on the RAM that is
    available on your computer.
  • 32-bit systems: 4 GB (2^32 bytes)
  • 64-bit systems: 16 exabytes (2^64 bytes; see the
    arithmetic after this list)
  • Speed
  • Upgrading microprocessors can't help you anymore
  • FLOPS are not the bottleneck; memory is
  • What we need are more registers
  • Think pre-computing, higher-bandwidth memory
    buses, L2/L3 caches, compiler optimizations,
    assembly language ... asylum?
  • Or
  • Think parallel
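For reference, the address-space arithmetic behind the RAM limits above (this working is added here, not taken from the slides):

  \[
    2^{32}\ \text{bytes} = 4\ \text{GB}, \qquad
    2^{64}\ \text{bytes} \approx 16\ \text{exabytes} \approx 1.8 \times 10^{7}\ \text{TB}
  \]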

8
Hacks
  • If speed is not an issue
  • Is out-of-core implementation an option?
  • Parallel programs can be converted into
    out-of-core implementations easily.

9
  • Parallel Algorithms

10
The Key Questions
  • Why?
  • Memory
  • Speed
  • Both
  • What kind of platform?
  • Shared Memory
  • Distributed Computing
  • Typical size of the application
  • Small (< 32 processors)
  • Medium (32 - 256 processors)
  • Large (> 256 processors)
  • How much time and effort do you want to invest?
  • How many times will the component be used in a
    single execution of the program?

11
Factors to Consider in any Parallel Algorithm
Design
  • Give equal work to all processors at all times
  • Load Balancing
  • Give equal amount of data to all processors
  • Efficient Memory Management
  • Processors should work independently as much as
    possible
  • Minimize communication, especially iterative
    communication
  • If communication is necessary, try to do some
    work in the background as well
  • Overlapping communication and computation (see
    the sketch after this list)
  • Try to keep the sequential part of the parallel
    algorithm as close to the best sequential
    algorithm as possible
  • Optimal Work Algorithm
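A minimal sketch of overlapping communication and computation with non-blocking MPI calls in C; the neighbor ranks, buffer size, and the local update are illustrative assumptions, not part of the original slides:

  #include <mpi.h>

  #define N 1024  /* illustrative message length */

  /* Exchange data with two neighboring ranks while doing independent
     local work in the background. */
  void exchange_and_compute(double *send_buf, double *recv_buf,
                            double *local, int left, int right)
  {
      MPI_Request reqs[2];

      /* Start the exchange, but do not wait for it yet. */
      MPI_Isend(send_buf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[0]);
      MPI_Irecv(recv_buf, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[1]);

      /* Work that does not depend on the incoming data proceeds while
         the messages are in flight. */
      for (int i = 0; i < N; ++i)
          local[i] *= 2.0;

      /* Block only when the received data is actually needed. */
      MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
  }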

12
Difference Between Sequential and Parallel
Algorithms
  • Not all data is accessible at all times
  • All computations must be as localized as possible
  • Can't have random access
  • New dimension to the existing algorithm:
    division of work
  • Which processor does what portion of the work?
    (see the sketch after this list)
  • If communication cannot be avoided
  • How will it be initiated?
  • What type of communication?
  • What are the pre-processing and post-processing
    operations?
  • Order of operations could be very critical for
    performance
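As a concrete example of dividing the work, here is a small C/MPI sketch (problem size and the summation are illustrative assumptions) of a block decomposition: each rank derives its own index range from its rank and the number of processors, works only on that range, and the partial results are combined at the end.

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, size;
      const long n = 1000000;  /* illustrative global problem size */

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      /* Block decomposition: rank r owns indices [start, end).
         The first (n % size) ranks get one extra element. */
      long base  = n / size, extra = n % size;
      long start = rank * base + (rank < extra ? rank : extra);
      long end   = start + base + (rank < extra ? 1 : 0);

      double local_sum = 0.0;
      for (long i = start; i < end; ++i)
          local_sum += (double)i;   /* each rank works on its own portion */

      double global_sum = 0.0;
      MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                 0, MPI_COMM_WORLD);
      if (rank == 0)
          printf("sum = %.0f\n", global_sum);

      MPI_Finalize();
      return 0;
  }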

13
Parallel Algorithm Approaches
  • Data-Parallel Approach
  • Partition the data among the processors
  • Each processor will execute the same set of
    commands
  • Control-Parallel Approach
  • Partition the tasks to be performed among the
    processors
  • Each processor will execute different commands
    (see the sketch after this list)
  • Hybrid Approach
  • Switch between the two approaches at different
    stages of the algorithm
  • Most parallel algorithms fall in this category
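By contrast with the data-parallel sketch above (same commands, each rank on its own slice of the data), a control-parallel sketch branches on the rank so that different processors perform different tasks. The task functions below are hypothetical stubs, not from the slides.

  #include <mpi.h>
  #include <stdio.h>

  /* Hypothetical task stubs; in a real code each would do different work. */
  static void assemble_matrix(void) { printf("assembling matrix\n"); }
  static void generate_mesh(void)   { printf("generating mesh\n"); }
  static void compute_forces(void)  { printf("computing forces\n"); }

  int main(int argc, char **argv)
  {
      int rank;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* Control parallelism: every process runs the same program but
         branches on its rank to perform a different task. */
      if      (rank == 0) assemble_matrix();
      else if (rank == 1) generate_mesh();
      else                compute_forces();

      MPI_Finalize();
      return 0;
  }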

14
Performance Metrics
  • Speedup
  • Overhead
  • Scalability
  • Fixed Size
  • Iso-granular
  • Efficiency
  • Speedup per processor
  • Iso-Efficiency
  • Problem size as a function of p needed to keep
    efficiency constant (see the definitions after
    this list)
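These metrics are usually written as follows (a standard formulation, e.g. in the Grama et al. text cited later, not copied from the slides), with T_1 the best sequential time, T_p the time on p processors, and W the problem size:

  \[
    S(p) = \frac{T_1}{T_p}, \qquad
    T_o(W,p) = p\,T_p - T_1, \qquad
    E(p) = \frac{S(p)}{p} = \frac{T_1}{p\,T_p}
  \]

  Iso-efficiency: the problem size must grow as
  \( W = K\,T_o(W,p) \), with \( K = E/(1-E) \), to hold the
  efficiency \( E \) constant as \( p \) increases.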

15
The Take Home Message
  • A good parallel algorithm is NOT a simple
    extension of the corresponding sequential
    algorithm.
  • What model to use? Problem dependent.
  • e.g., a + b + c + d can be computed as
    (a + b) + (c + d)
  • Not much choice really.
  • It is a big investment, but can really be worth
    it.

16
  • Parallel Programming

17
How does a parallel program work?
  • You request a certain number of processors
  • You set up a communicator
  • Give a unique id (rank) to each processor
  • Every processor executes the same program
  • Inside the program
  • Query for the rank and use it to decide what to do
  • Exchange messages between different processors
    using their ranks
  • In theory, you only need 3 functions: Isend,
    Irecv, and Wait (see the sketch after this list)
  • In practice, you can optimize communication
    depending on the underlying network topology
    (see Message Passing Standards)
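A minimal sketch of that structure in C: initialize MPI, query the rank from the MPI_COMM_WORLD communicator, and pass one message between ranks 0 and 1 with Isend/Irecv/Wait (the tag and payload are illustrative assumptions).

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, value = 0;
      MPI_Request req;

      MPI_Init(&argc, &argv);                 /* sets up MPI_COMM_WORLD */
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* unique id of this processor */

      if (rank == 0) {
          value = 42;                         /* illustrative payload */
          MPI_Isend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
          MPI_Wait(&req, MPI_STATUS_IGNORE);
      } else if (rank == 1) {
          MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
          MPI_Wait(&req, MPI_STATUS_IGNORE);
          printf("rank 1 received %d from rank 0\n", value);
      }

      MPI_Finalize();
      return 0;
  }

With a typical MPI implementation this would be compiled with mpicc and launched with something like mpirun -np 2.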

18
Message Passing Standards
  • The standards define a set of primitive
    communication operations.
  • The vendors implementing the standard on a given
    machine are responsible for optimizing these
    operations for that machine.
  • Popular Standards
  • Message Passing Interface (MPI)
  • OpenMP (Open Multi-Processing)

19
Languages that support MPI
  • Fortran 77
  • C/C++
  • Python
  • MATLAB

20
MPI Implementations
  • MPICH
  • ftp://info.mcs.anl.gov/pub/mpi
  • LAM
  • http://www.mpi.nd.edu/lam/download
  • CHIMP
  • ftp://ftp.epcc.ed.ac.uk/pub/chimp/release
  • WinMPI (Windows)
  • ftp://csftp.unomaha.edu/pub/rewini/WinMPI
  • W32MPI (Windows)
  • http://dsg.dei.uc.pt/wmpi/intro.html

21
Open Source Parallel Software
  • PETSc (Linear and Nonlinear Solvers)
  • http://www-unix.mcs.anl.gov/petsc/petsc-as/
  • ScaLAPACK (Linear Algebra)
  • http://www.netlib.org/scalapack/scalapack_home.html
  • SPRNG (Random Number Generator)
  • http://sprng.cs.fsu.edu/
  • ParaView (Visualization)
  • http://www.paraview.org/HTML/Index.html
  • NAMD (Molecular Dynamics)
  • http://www.ks.uiuc.edu/Research/namd/
  • Charm++ (Parallel Objects)
  • http://charm.cs.uiuc.edu/research/charm/

22
References
  • Parallel Programming with MPI, Peter S. Pacheco
  • Introduction to Parallel Computing, A. Grama,
    A. Gupta, G. Karypis, V. Kumar
  • MPI: The Complete Reference, William Gropp et al.
  • http://www-unix.mcs.anl.gov/mpi/
  • http://www.erc.msstate.edu/mpi
  • http://www.epm.ornl.gov/walker/mpi
  • http://www.erc.msstate.edu/mpi/mpi-faq.html (FAQ)
  • comp.parallel.mpi (newsgroup)
  • http://www.mpi-forum.org (MPI Forum)

23
  • Thank You