CS 221 IT 221 Lecture 22 - PowerPoint PPT Presentation


Transcript and Presenter's Notes



1
CS 221 / IT 221 Lecture 22
  • Distributed and Parallel
  • Programs
  • Dr. Jim Holten

2
Overview
  • What are they?
  • Capabilities!
  • How is it done?
  • Complexities!
  • Issues

3
Distributed Programs
  • Separate processes on multiple machines
  • Use loosely coupled communications
  • May or may not be homogeneous processes

4
Parallel Programs
  • Separate processes on multiple machines
  • Generally use a parallel communications package
    (MPI, etc.)
  • Homogeneous processes, each operating on a
    different part of or another copy of the data
  • Use tightly coupled processors (often SMP, but
    commonly a high-speed IP network)
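
The pattern above can be sketched with Python's multiprocessing standing in for MPI (the worker function and chunking scheme are illustrative, not from the slides):

```python
from multiprocessing import Pool

def partial_sum(chunk):
    # Every worker is homogeneous: the same code runs on a
    # different part of the data.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, nworkers=4):
    # Partition the data into contiguous chunks, one per worker.
    size = (len(data) + nworkers - 1) // nworkers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(nworkers) as pool:
        partials = pool.map(partial_sum, chunks)  # scatter work, gather results
    return sum(partials)                          # reduce the partial results
```

In MPI terms this is a scatter/reduce; a package such as mpi4py applies the same idea across machines rather than local processes.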

5
Capabilities
  • Load and execute MUCH larger programs
  • Bring more compute power to bear on the problem
  • Dramatically reduce wall clock execution time
  • Provide services to large numbers of users

6
Distributed Implementations
  • Client/server systems
  • Database access
  • Browsers and web servers
  • SETI@home
  • The web: a single massively distributed system

7
Client/server systems
  • A server does the specialized or heavy work.
  • A client allows access to the server's data and
    functionality.
  • Generally loosely coupled using TCP/IP sockets.
  • Examples
  • Browser/webserver
  • SQL GUI or remote access library and DB server
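
The client/server split can be sketched with plain TCP sockets; here the server's "specialized work" is just upper-casing text, and both ends run in one process for illustration:

```python
import socket
import threading

def serve_once():
    # Server side: bind to an OS-chosen loopback port and handle one client.
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    def handler():
        conn, _ = srv.accept()
        with conn:
            data = conn.recv(1024)
            conn.sendall(data.upper())  # the server-side "heavy work"
        srv.close()
    threading.Thread(target=handler, daemon=True).start()
    return srv.getsockname()[1]         # port for the client to use

def client_request(port, text):
    # Client side: loosely coupled access over a TCP/IP socket.
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(text.encode())
        sock.shutdown(socket.SHUT_WR)   # tell the server we are done sending
        return sock.recv(1024).decode()
```

A browser/webserver pair works the same way, with HTTP layered on top of the socket.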

8
SETI@home
  • A coordinator system
  • Many worker bee systems (PC screen savers).
  • Coordinator chops up the problem and assigns
    tasks to each worker bee.
  • Worker bees report results back to the
    coordinator and request another task.
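
The coordinator/worker-bee structure can be sketched with a shared task queue; the "task" here (summing a range of integers) stands in for the real signal analysis:

```python
from multiprocessing import Process, Queue

def worker_bee(tasks, results):
    # Each worker takes a task, computes, reports the result back,
    # and asks for another, until it receives the stop sentinel.
    while True:
        task = tasks.get()
        if task is None:
            break
        lo, hi = task
        results.put(sum(range(lo, hi)))

def coordinator(n, nworkers=3, chunk=250):
    # The coordinator chops the problem into tasks and assigns them.
    tasks, results = Queue(), Queue()
    ntasks = 0
    for lo in range(0, n, chunk):
        tasks.put((lo, min(lo + chunk, n)))
        ntasks += 1
    for _ in range(nworkers):
        tasks.put(None)                 # one stop sentinel per worker
    workers = [Process(target=worker_bee, args=(tasks, results))
               for _ in range(nworkers)]
    for w in workers:
        w.start()
    total = sum(results.get() for _ in range(ntasks))
    for w in workers:
        w.join()
    return total
```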

9
The Web
  • Many web servers (hundreds of thousands)
  • Many browsers (millions)
  • Many web servers attach to database servers as
    clients (thousands).
  • The result is a massive distributed system with
    many arbitrary interconnects which are
    continuously changing.
  • Retail web servers are redundant (parallel
    servers) with real time load balancers.

10
Parallel Implementations
  • Extremely large supercomputer-based models
  • Partitions the problem data to fit on the
    available processors
  • Divides algorithm execution time by roughly the
    number of processors
  • Adds some communications overhead

11
Why so large?
  • Good mathematical models applied to fine-grained
    real-world representations
  • Assumes finer resolution gives better
    approximations (in space AND time)
  • Assumes numerical value errors are more easily
    minimized

12
Sample Models
  • Earth atmosphere models: boxes 10 miles per side
    (2500 x 2500 x 10) versus 1 mile per side
    (25000 x 25000 x 100), a 1000x larger problem.
  • Protein folding models: 100,000 atoms is a small
    protein, and every atom interacts with every
    other (10,000,000,000 interactions), and to model
    protein folding this must be stepped through a
    relaxation step thousands of times to see how
    the protein will fold to achieve its rest state.
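
The growth in the atmosphere model above can be checked with a little arithmetic: refining the two horizontal dimensions and the vertical dimension by 10x each multiplies the box count by 1000:

```python
coarse = 2500 * 2500 * 10     # boxes 10 miles per side
fine = 25000 * 25000 * 100    # boxes 1 mile per side
ratio = fine // coarse        # growth factor of the problem size
```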

13
More Sample Models
  • Neural nets: 100 neurons may distinguish among
    only a few binary input combinations. Common
    real-world problems require over 1000 inputs,
    giving over 2^1000 binary-valued combinations.
  • Language meaning models: thousands of words,
    with millions of meaningful combinations, to be
    compared to input text and translated to a more
    standardized meaning reference language.

14
The Model Data
  • Three data behaviors: Partitioned, Copied, or
    Shared
  • The bulk data are generally divvied up among the
    processes.
  • Common constant data are copied to all
    processes.
  • Overlapping data on logical boundaries are shared.

15
Partitioned Data
  • Easiest is multi-dimensional arrays partitioned
    by dimension index ranges.
  • More complex are graph, network, or generalized
    mesh networks that require specifying each
    component that is to go into each processor.
  • Commonly mesh regions are partitioned, then the
    partitioning is projected to the region faces,
    edges, and vertices to determine their
    partitioning and patterns of sharing.
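
The "easiest" case, partitioning by dimension index ranges, is simple to compute. A sketch that splits n array indices into nearly equal contiguous ranges, one per processor:

```python
def partition_ranges(n, nprocs):
    # Split indices 0..n-1 into nprocs contiguous half-open ranges
    # whose sizes differ by at most one element.
    base, extra = divmod(n, nprocs)
    ranges, start = [], 0
    for p in range(nprocs):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

Graphs and meshes, by contrast, need an explicit component-to-processor map rather than a formula like this.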

16
Shared Data
  • Partitioning on mesh regions forces faces,
    edges, and vertices that are between regions on
    the boundaries between different processes to be
    shared.
  • A single vertex, edge, or face may be shared
    among many processes.
  • Shadow regions are regions forced to be shared,
    causing all their faces, edges, and vertices to be
    shared also.

17
Partitioning Criteria for Data
  • Try to balance the loads on the various
    processors.
  • Try to minimize the communications needed between
    any two processors.
  • Try to minimize the overall communications among
    processors.
  • Avoid giving any processor a share too large to
    fit in its available memory.

18
Load Balancing
  • Spread out the amount of data evenly!
  • Spread out the expected computation load evenly!
  • Adaptive meshes change the numbers of vertices,
    edges, faces, and regions as execution continues,
    requiring dynamic load balancing.
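
One common way to spread an uneven computation load is the longest-processing-time greedy heuristic: sort tasks by estimated cost and always give the next task to the least-loaded processor. A sketch (a real dynamic balancer would re-run something like this as the mesh adapts):

```python
import heapq

def balance_loads(task_costs, nprocs):
    # Min-heap of (current load, processor id, assigned tasks).
    heap = [(0.0, p, []) for p in range(nprocs)]
    heapq.heapify(heap)
    for cost in sorted(task_costs, reverse=True):
        load, p, tasks = heapq.heappop(heap)  # least-loaded processor
        tasks.append(cost)
        heapq.heappush(heap, (load + cost, p, tasks))
    return {p: (load, tasks) for load, p, tasks in heap}
```

The heuristic is not optimal (costs [5, 4, 3, 3, 3] on two processors split 8/10 rather than 9/9) but is cheap and close enough for many runs.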

19
Minimizing Communications
  • Estimate communications message counts, sizes,
    and transmission times.
  • May depend on communications interconnect
    configurations.
  • Requires creating data/processor assignments and
    estimating cost, altering the assignments and
    recalculating the estimates, etc. until the
    best (optimal) or an adequate (suboptimal)
    solution is found.
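
The estimate-and-alter loop can be sketched as a local search over data-to-processor assignments: count the edges that cross processor boundaries, then keep any single-vertex move that lowers the count. This toy ignores load balance, which a real partitioner would enforce at the same time:

```python
def comm_cost(assignment, edges):
    # Each edge whose endpoints live on different processors costs
    # one message per computation step.
    return sum(1 for a, b in edges if assignment[a] != assignment[b])

def improve(assignment, edges, nprocs, rounds=10):
    # Repeatedly try moving each vertex to another processor and keep
    # moves that reduce the estimated communication cost.
    best = comm_cost(assignment, edges)
    for _ in range(rounds):
        improved = False
        for v in list(assignment):
            old = assignment[v]
            for p in range(nprocs):
                if p == old:
                    continue
                assignment[v] = p
                cost = comm_cost(assignment, edges)
                if cost < best:
                    best, improved, old = cost, True, p
                else:
                    assignment[v] = old
        if not improved:
            break
    return assignment, best
```

Without a balance constraint, the search happily collapses everything onto one processor; adding one is what makes real partitioning hard.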

20
More Complexities
  • Long-running programs
  • Processor resource failures
  • Communications failures
  • Code failures
  • Recovery
  • Repartitioning for fewer processors
  • Mid-course corrections: dump, reconfigure, restart
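
Dump/reconfigure/restart can be sketched as a periodic checkpoint of the model state; pickle here stands in for the parallel dump files a real model would write:

```python
import pickle

def dump(state, path):
    # Checkpoint the state so a failed run can restart mid-course.
    with open(path, "wb") as f:
        pickle.dump(state, f)

def restart(path):
    with open(path, "rb") as f:
        return pickle.load(f)

def run(steps, state=None, checkpoint="model.ckpt", every=100):
    # Toy long-running loop: checkpoint every `every` steps.
    state = state or {"step": 0, "value": 0.0}
    while state["step"] < steps:
        state["value"] += 1.0          # stand-in for one model timestep
        state["step"] += 1
        if state["step"] % every == 0:
            dump(state, checkpoint)
    return state
```

After a failure, restarting from the last dump loses at most `every` steps of work instead of the whole run.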

21
Issues
  • Reliable code
  • Reliable model run in spite of hardware failures
  • System size versus mean time between failures
    (MTBF) of overall processor collection
  • Repartitioning to continue on fewer resources
  • Moving data between different model partitionings
    (e.g., from 1024 processes to 1022 processes)
  • Parallel model I/O: dump/restart files

22
Goals
  • BIGGER (petabytes)
  • FASTER (petaflops)
  • BETTER (accuracy and reliability)
  • LONGER RUNS (weeks, or continuous)