CS 221 IT 221 Lecture 22 - PowerPoint PPT Presentation


Transcript and Presenter's Notes



1
CS 221 / IT 221 Lecture 22
  • Distributed and Parallel
  • Programs
  • Dr. Jim Holten

2
Overview
  • What are they?
  • Capabilities!
  • How is it done?
  • Complexities!
  • Issues

3
Distributed Programs
  • Separate processes on multiple machines
  • Use loosely coupled communications
  • May or may not be homogeneous processes

4
Parallel Programs
  • Separate processes on multiple machines
  • Generally use a parallel communications package
    (MPI, etc.)
  • Homogeneous processes, each operating on a
    different part of or another copy of the data
  • Use tightly coupled processors (often SMP, but
    commonly a high-speed IP network)
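
The pattern above can be sketched with Python's multiprocessing standing in for MPI (the worker function and chunking scheme are illustrative, not from the slides):

```python
from multiprocessing import Pool

def partial_sum(chunk):
    # Every worker is homogeneous: the same code runs on a
    # different part of the data.
    return sum(x * x for x in chunk)

def parallel_sum_of_squares(data, nworkers=4):
    # Partition the data into contiguous chunks, one per worker.
    size = (len(data) + nworkers - 1) // nworkers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(nworkers) as pool:
        partials = pool.map(partial_sum, chunks)  # scatter work, gather results
    return sum(partials)                          # reduce the partial results
```

In MPI terms this is a scatter/reduce; a package such as mpi4py applies the same idea across machines rather than local processes.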

5
Capabilities
  • Load and execute MUCH larger programs
  • Bring more compute power to bear on the problem
  • Dramatically reduce wall clock execution time
  • Provide services to large numbers of users

6
Distributed Implementations
  • Client/server systems
  • Database access
  • Browsers and web servers
  • SETI@home
  • The web: a single massively distributed system

7
Client/server systems
  • A server does the specialized or heavy work.
  • A client allows access to the server's data and
    functionality.
  • Generally loosely coupled using TCP/IP sockets.
  • Examples
  • Browser/webserver
  • SQL GUI or remote access library and DB server
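
The client/server split can be sketched with plain TCP sockets; here the server's "specialized work" is just upper-casing text, and both ends run in one process for illustration:

```python
import socket
import threading

def serve_once():
    # Server side: bind to an OS-chosen loopback port and handle one client.
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    def handler():
        conn, _ = srv.accept()
        with conn:
            data = conn.recv(1024)
            conn.sendall(data.upper())  # the server-side "heavy work"
        srv.close()
    threading.Thread(target=handler, daemon=True).start()
    return srv.getsockname()[1]         # port for the client to use

def client_request(port, text):
    # Client side: loosely coupled access over a TCP/IP socket.
    with socket.create_connection(("127.0.0.1", port)) as sock:
        sock.sendall(text.encode())
        sock.shutdown(socket.SHUT_WR)   # tell the server we are done sending
        return sock.recv(1024).decode()
```

A browser/webserver pair works the same way, with HTTP layered on top of the socket.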

8
SETI@home
  • A coordinator system
  • Many worker bee systems (PC screen savers).
  • Coordinator chops up the problem and assigns
    tasks to each worker bee.
  • Worker bees report results back to the
    coordinator and request another task.
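
The coordinator/worker-bee structure can be sketched with a shared task queue; the "task" here (summing a range of integers) stands in for the real signal analysis:

```python
from multiprocessing import Process, Queue

def worker_bee(tasks, results):
    # Each worker takes a task, computes, reports the result back,
    # and asks for another, until it receives the stop sentinel.
    while True:
        task = tasks.get()
        if task is None:
            break
        lo, hi = task
        results.put(sum(range(lo, hi)))

def coordinator(n, nworkers=3, chunk=250):
    # The coordinator chops the problem into tasks and assigns them.
    tasks, results = Queue(), Queue()
    ntasks = 0
    for lo in range(0, n, chunk):
        tasks.put((lo, min(lo + chunk, n)))
        ntasks += 1
    for _ in range(nworkers):
        tasks.put(None)                 # one stop sentinel per worker
    workers = [Process(target=worker_bee, args=(tasks, results))
               for _ in range(nworkers)]
    for w in workers:
        w.start()
    total = sum(results.get() for _ in range(ntasks))
    for w in workers:
        w.join()
    return total
```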

9
The Web
  • Many web servers (hundreds of thousands)
  • Many browsers (millions)
  • Many web servers attach to database servers as
    clients (thousands).
  • The result is a massive distributed system with
    many arbitrary interconnects which are
    continuously changing.
  • Retail web servers are redundant (parallel
    servers) with real time load balancers.

10
Parallel Implementations
  • Extremely large supercomputer-based models
  • Partitions the problem data to fit on the
    available processors
  • Divides algorithm execution time by roughly the
    number of processors
  • Adds some communications overhead

11
Why so large?
  • Good mathematical models applied to fine-grained
    real-world representations
  • Assumes finer resolution gives better
    approximations (in space AND time)
  • Assumes numerical value errors are more easily
    minimized

12
Sample Models
  • Earth atmosphere models: boxes 10 miles per side
    (2500 x 2500 x 10) versus 1 mile per side
    (25000 x 25000 x 100), a 1000x larger problem.
  • Protein folding models: 100,000 atoms is a small
    protein, and every atom interacts with every
    other (10,000,000,000 interactions), and to model
    protein folding this must be stepped through a
    relaxation step thousands of times to see how
    the protein will fold to achieve its rest state.
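
The growth in the atmosphere model above can be checked with a little arithmetic: refining the two horizontal dimensions and the vertical dimension by 10x each multiplies the box count by 1000:

```python
coarse = 2500 * 2500 * 10     # boxes 10 miles per side
fine = 25000 * 25000 * 100    # boxes 1 mile per side
ratio = fine // coarse        # growth factor of the problem size
```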

13
More Sample Models
  • Neural nets: 100 neurons may distinguish among
    only a few binary input combinations. Common
    real-world problems require over 1000 inputs,
    giving over 2^1000 binary-valued combinations.
  • Language meaning models: thousands of words,
    with millions of meaningful combinations, to be
    compared to input text and translated to a more
    standardized meaning reference language.

14
The Model Data
  • Three data behaviors: Partitioned, Copied, or
    Shared
  • The bulk data are generally divvied up among the
    processes.
  • Common constant data are copied to all
    processes.
  • Overlapping data on logical boundaries are shared.

15
Partitioned Data
  • Easiest is multi-dimensional arrays partitioned
    by dimension index ranges.
  • More complex are graph, network, or generalized
    mesh networks that require specifying each
    component that is to go into each processor.
  • Commonly mesh regions are partitioned, then the
    partitioning is projected to the region faces,
    edges, and vertices to determine their
    partitioning and patterns of sharing.
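
The "easiest" case, partitioning by dimension index ranges, is simple to compute. A sketch that splits n array indices into nearly equal contiguous ranges, one per processor:

```python
def partition_ranges(n, nprocs):
    # Split indices 0..n-1 into nprocs contiguous half-open ranges
    # whose sizes differ by at most one element.
    base, extra = divmod(n, nprocs)
    ranges, start = [], 0
    for p in range(nprocs):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

Graphs and meshes, by contrast, need an explicit component-to-processor map rather than a formula like this.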

16
Shared Data
  • Partitioning on mesh regions forces faces,
    edges, and vertices that are between regions on
    the boundaries between different processes to be
    shared.
  • A single vertex, edge, or face may be shared
    among many processes.
  • Shadow regions are regions forced to be shared,
    causing all their faces, edges, and vertices to be
    shared also.

17
Partitioning Criteria for Data
  • Try to balance the loads on the various
    processors.
  • Try to minimize the communications needed between
    any two processors.
  • Try to minimize the overall communications among
    processors.
  • Avoid giving any processor a share too large to
    fit in its available memory.

18
Load Balancing
  • Spread out the amount of data evenly!
  • Spread out the expected computation load evenly!
  • Adaptive meshes change the numbers of vertices,
    edges, faces, and regions as execution continues,
    requiring dynamic load balancing.
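
One common way to spread an uneven computation load is the longest-processing-time greedy heuristic: sort tasks by estimated cost and always give the next task to the least-loaded processor. A sketch (a real dynamic balancer would re-run something like this as the mesh adapts):

```python
import heapq

def balance_loads(task_costs, nprocs):
    # Min-heap of (current load, processor id, assigned tasks).
    heap = [(0.0, p, []) for p in range(nprocs)]
    heapq.heapify(heap)
    for cost in sorted(task_costs, reverse=True):
        load, p, tasks = heapq.heappop(heap)  # least-loaded processor
        tasks.append(cost)
        heapq.heappush(heap, (load + cost, p, tasks))
    return {p: (load, tasks) for load, p, tasks in heap}
```

The heuristic is not optimal (costs [5, 4, 3, 3, 3] on two processors split 8/10 rather than 9/9) but is cheap and close enough for many runs.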

19
Minimizing Communications
  • Estimate communications message counts, sizes,
    and transmission times.
  • May depend on communications interconnect
    configurations.
  • Requires creating data/processor assignments and
    estimating cost, altering the assignments and
    recalculating the estimates, etc. until the
    best (optimal) or an adequate (suboptimal)
    solution is found.
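
The estimate-and-alter loop can be sketched as a local search over data-to-processor assignments: count the edges that cross processor boundaries, then keep any single-vertex move that lowers the count. This toy ignores load balance, which a real partitioner would enforce at the same time:

```python
def comm_cost(assignment, edges):
    # Each edge whose endpoints live on different processors costs
    # one message per computation step.
    return sum(1 for a, b in edges if assignment[a] != assignment[b])

def improve(assignment, edges, nprocs, rounds=10):
    # Repeatedly try moving each vertex to another processor and keep
    # moves that reduce the estimated communication cost.
    best = comm_cost(assignment, edges)
    for _ in range(rounds):
        improved = False
        for v in list(assignment):
            old = assignment[v]
            for p in range(nprocs):
                if p == old:
                    continue
                assignment[v] = p
                cost = comm_cost(assignment, edges)
                if cost < best:
                    best, improved, old = cost, True, p
                else:
                    assignment[v] = old
        if not improved:
            break
    return assignment, best
```

Without a balance constraint, the search happily collapses everything onto one processor; adding one is what makes real partitioning hard.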

20
More Complexities
  • Long-running programs
  • Processor resource failures
  • Communications failures
  • Code failures
  • Recovery
  • Repartitioning for fewer processors
  • Mid-course corrections: dump, reconfigure, restart
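
Dump/reconfigure/restart can be sketched as a periodic checkpoint of the model state; pickle here stands in for the parallel dump files a real model would write:

```python
import pickle

def dump(state, path):
    # Checkpoint the state so a failed run can restart mid-course.
    with open(path, "wb") as f:
        pickle.dump(state, f)

def restart(path):
    with open(path, "rb") as f:
        return pickle.load(f)

def run(steps, state=None, checkpoint="model.ckpt", every=100):
    # Toy long-running loop: checkpoint every `every` steps.
    state = state or {"step": 0, "value": 0.0}
    while state["step"] < steps:
        state["value"] += 1.0          # stand-in for one model timestep
        state["step"] += 1
        if state["step"] % every == 0:
            dump(state, checkpoint)
    return state
```

After a failure, restarting from the last dump loses at most `every` steps of work instead of the whole run.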

21
Issues
  • Reliable code
  • Reliable model run in spite of hardware failures
  • System size versus mean time between failures
    (MTBF) of overall processor collection
  • Repartitioning to continue on fewer resources
  • Moving data between different model partitionings
    (e.g., from 1024 processes to 1022 processes)
  • Parallel model I/O: dump/restart files

22
Goals
  • BIGGER (petabytes)
  • FASTER (petaflops)
  • BETTER (accuracy and reliability)
  • LONGER RUNS (weeks, or continuous)