Title: Networks of Workstations
1. Networks of Workstations
- Prabhaker Mateti
- Wright State University
2. Overview
- Parallel computers
- Concurrent computation
- Parallel Methods
- Message Passing
- Distributed Shared Memory
- Programming Tools
- Cluster configurations
3. Granularity of Parallelism
- Fine-Grained Parallelism
- Medium-Grained Parallelism
- Coarse-Grained Parallelism
- NOWs (Networks of Workstations)
4. Fine-Grained Machines
- Tens of thousands of Processors
- Processors
- Slow (bit serial)
- Small (K bits of RAM)
- Distributed Memory
- Interconnection Networks
- Message Passing
- Single Instruction Multiple Data (SIMD)
5. Sample Meshes
- Massively Parallel Processor (MPP)
- TMC CM-2 (Connection Machine)
- MasPar MP-1/2
6. Medium-Grained Machines
- Typical Configurations
- Thousands of processors
- Processors have power between coarse- and fine-grained
- Either shared or distributed memory
- Traditionally Research Machines
- Single Code Multiple Data (SCMD)
7. Medium-Grained Machines
- Example: Cray T3E
- Processors
- DEC Alpha EV5 (600 MFLOPS peak)
- Max of 2048
- Peak Performance 1.2 TFLOPS
- 3-D Torus
- Memory 64 MB - 2 GB per CPU
8. Coarse-Grained Machines
- Typical Configurations
- Hundreds of Processors
- Processors
- Powerful (fast CPUs)
- Large (cache, vectors, multiple fast buses)
- Memory Shared or Distributed-Shared
- Multiple Instruction Multiple Data (MIMD)
9. Coarse-Grained Machines
- SGI Origin 2000
- PEs (MIPS R10000) Max of 128
- Peak Performance 49 Gflops
- Memory 256 GBytes
- Crossbar switches for interconnect
- HP/Convex Exemplar
- PEs (HP PA-RISC 8000) Max of 64
- Peak Performance 46 Gflops
- Memory Max of 64 GBytes
- Distributed crossbar switches for interconnect
10. Networks of Workstations
- Exploit inexpensive Workstations/PCs
- Commodity network
- The NOW becomes a distributed-memory multiprocessor
- Workstations send/receive messages
- C and Fortran programs with PVM, MPI, etc. libraries
- Programs developed on NOWs are portable to supercomputers for production runs
11. Parallel Computing
- Concurrent Computing
- Distributed Computing
- Networked Computing
- Parallel Computing
12. Definition of Parallel
- S1 begins at time b1, ends at e1
- S2 begins at time b2, ends at e2
- S1 || S2
- Begins at min(b1, b2)
- Ends at max(e1, e2)
- Equivalent to S2 || S1
13. Data Dependency
- x = a + b ; y = c + d
- x = a + b || y = c + d
- y = c + d ; x = a + b
- x depends on a and b; y depends on c and d
- Assumed a, b, c, d are independent (see the C sketch below)
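A minimal C sketch of this point, assuming an OpenMP-capable compiler (OpenMP appears later in this deck): because x depends only on a and b, and y only on c and d, the two assignments may run in either order or in parallel.

    #include <stdio.h>

    int main(void) {
        int a = 1, b = 2, c = 3, d = 4;
        int x, y;

        /* x and y have no data dependency on each other,
           so the two assignments may execute in parallel. */
        #pragma omp parallel sections
        {
            #pragma omp section
            x = a + b;              /* depends only on a and b */
            #pragma omp section
            y = c + d;              /* depends only on c and d */
        }

        printf("x = %d, y = %d\n", x, y);
        return 0;
    }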
14. Types of Parallelism
15. Perfect Parallelism
- Also called
- Embarrassingly Parallel
- Result parallel
- Computations that can be subdivided into sets of independent tasks that require little or no communication
- Monte Carlo simulations (see the sketch below)
- F(x, y, z)
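A hedged C sketch of an embarrassingly parallel job, a Monte Carlo estimate of pi: each task works from its own seed and needs no communication until one final combining step. The estimate_pi helper and its LCG constants are illustrative, not from the slides.

    #include <stdio.h>

    /* Illustrative helper (not from the slides): estimate pi from
       independent random samples using a simple 64-bit LCG. */
    static double estimate_pi(long samples, unsigned long long seed) {
        long hits = 0;
        for (long i = 0; i < samples; i++) {
            seed = seed * 6364136223846793005ULL + 1442695040888963407ULL;
            double x = (double)(seed >> 11) / 9007199254740992.0;   /* in [0,1) */
            seed = seed * 6364136223846793005ULL + 1442695040888963407ULL;
            double y = (double)(seed >> 11) / 9007199254740992.0;
            if (x * x + y * y <= 1.0)
                hits++;
        }
        return 4.0 * (double)hits / (double)samples;
    }

    int main(void) {
        /* On a NOW, each workstation would run estimate_pi() with its own
           seed; here the two "tasks" simply run one after the other. */
        double p1 = estimate_pi(1000000, 1);
        double p2 = estimate_pi(1000000, 2);
        printf("pi is approximately %f\n", (p1 + p2) / 2.0);
        return 0;
    }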
16. MW (Manager/Worker) Model
- Manager
- Initiates computation
- Tracks progress
- Handles workers' requests
- Interfaces with user
- Workers
- Spawned and terminated by manager
- Make requests to manager
- Send results to manager
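A minimal manager/worker sketch in C with MPI (one of the libraries named later in this deck); the work itself (squaring an integer) is a placeholder. Rank 0 plays the manager, every other rank a worker. Run it with, e.g., mpirun -np 4 on most MPI installations.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {                          /* manager */
            for (int w = 1; w < size; w++) {      /* hand out one work item each */
                int work = 10 * w;
                MPI_Send(&work, 1, MPI_INT, w, 0, MPI_COMM_WORLD);
            }
            for (int w = 1; w < size; w++) {      /* track progress: collect results */
                int result;
                MPI_Recv(&result, 1, MPI_INT, w, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                printf("manager: worker %d returned %d\n", w, result);
            }
        } else {                                  /* worker */
            int work, result;
            MPI_Recv(&work, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            result = work * work;                 /* placeholder computation */
            MPI_Send(&result, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }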
17. Reduction
- Combine several sub-results into one
- Reduce r1, r2, ..., rn with op
- Becomes r1 op r2 op ... op rn (see the MPI sketch below)
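A short C/MPI sketch of a reduction, assuming MPI is available: each process contributes a sub-result r_i, and MPI_Reduce combines them with op = MPI_SUM, leaving the answer on rank 0.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int r_i = rank + 1;      /* this process's sub-result */
        int total = 0;

        /* total = r_1 op r_2 op ... op r_n, with op = MPI_SUM, on rank 0 */
        MPI_Reduce(&r_i, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("reduced value = %d\n", total);

        MPI_Finalize();
        return 0;
    }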
18. Data Parallelism
- Also called
- Domain Decomposition
- Specialist
- Same operations performed on many data elements simultaneously
- Matrix operations
- Compiling several files
19. Control Parallelism
- Different operations performed simultaneously on different processors
- E.g., simulating a chemical plant: one processor simulates the preprocessing of chemicals, one simulates the reactions in the first batch, another simulates refining the products, etc.
20. Process Communication
- Shared Memory
- Message Passing
21. Shared Memory
- Process A writes to a memory location
- Process B reads from that memory location
- Synchronization is crucial
- Excellent speed
22. Shared Memory
- Needs hardware support
- multi-ported memory
- Atomic operations
- Test-and-Set
- Semaphores
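A C11 sketch of the test-and-set idea: a spin lock built from an atomic flag. The increment() function is illustrative; in a real program several threads on a shared-memory machine would call it concurrently.

    #include <stdatomic.h>
    #include <stdio.h>

    /* Spin lock built on an atomic test-and-set, the kind of
       hardware-supported primitive the slide refers to. */
    static atomic_flag lock = ATOMIC_FLAG_INIT;
    static int shared_counter = 0;

    void increment(void) {
        while (atomic_flag_test_and_set(&lock))   /* test-and-set until we own the lock */
            ;                                     /* busy-wait (spin) */
        shared_counter++;                         /* critical section */
        atomic_flag_clear(&lock);                 /* release the lock */
    }

    int main(void) {
        increment();                              /* single-threaded demo call */
        printf("%d\n", shared_counter);
        return 0;
    }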
23. Shared Memory Semantics: Assumptions
- Global time is available. Discrete increments.
- Shared variable s takes value vi at time ti, i = 0, 1, ...
- Process A: s := v1 at time t1
- Assume no other assignment occurred after t1.
- Process B reads s at time t and gets value v.
24. Shared Memory Semantics
- Value of Shared Variable
- v = v1, if t > t1
- v = v0, if t < t1
- v = ??, if t = t1
- t = t1 ± discrete quantum?
- Next Update of Shared Variable
- Occurs at t2
- t2 = t1 + ?
25. Condition Variables and Semaphores
- Semaphores
- V(s): < s := s + 1 >
- P(s): < when s > 0 do s := s - 1 >
- Condition variables
- C.wait()
- C.signal()
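One concrete realization of P and V, sketched with POSIX semaphores: sem_wait behaves like P(s) and sem_post like V(s). Compile with -pthread on most systems.

    #include <semaphore.h>
    #include <stdio.h>

    int main(void) {
        sem_t s;
        sem_init(&s, 0, 1);   /* unnamed semaphore, initial value 1 */

        sem_wait(&s);         /* P(s): blocks while s == 0, then decrements */
        /* ... critical section ... */
        sem_post(&s);         /* V(s): increments s, possibly waking a waiter */

        sem_destroy(&s);
        return 0;
    }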
26. Distributed Shared Memory
- A common address space that all the computers in the cluster share.
- Difficult to describe semantics.
27. Distributed Shared Memory Issues
- Distributed
- Spatially
- LAN
- WAN
- No global time available
28. Messages
- Messages are sequences of bytes moving between processes
- The sender and receiver must agree on the type/structure of values in the message
- Marshalling of data
29. Message Passing
- Process A sends a data buffer as a message to process B.
- Process B waits for a message from A, and when it arrives copies it into its own local memory.
- No memory shared between A and B.
30. Message Passing
- Obviously,
- Messages cannot be received before they are sent.
- A receiver waits until there is a message.
- Asynchronous
- Sender never blocks, even if infinitely many messages are waiting to be received
- Semi-asynchronous is a practical version of the above, with a large but finite amount of buffering
31. Message Passing: Point to Point
- Q: send(m, P)
- Send message m to process P
- P: recv(x, Q)
- Receive a message from process Q, and place it in variable x
- The message data
- Type of x must match that of m
- As if x := m
32. Broadcast
- One sender, multiple receivers
- Not all receivers may receive at the same time
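A C/MPI broadcast sketch, assuming MPI: rank 0 is the single sender, and every rank ends up with the same value, though not necessarily at the same instant.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, value = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0)
            value = 42;                    /* only the root has the value initially */

        MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
        printf("rank %d now has %d\n", rank, value);

        MPI_Finalize();
        return 0;
    }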
33. Types of Sends
34. Synchronous Message Passing
- Sender blocks until receiver is ready to receive.
- Cannot send messages to self.
- No buffering.
35. Message Passing Speed
- Speed is not so good:
- Sender copies message into system buffers.
- Message travels the network.
- Receiver copies message from system buffers into local memory.
- Special virtual memory techniques help.
36. Message Passing Programming
- Less error-prone compared with shared memory
37. Message Passing Synchronization
- Synchronous MP
- Sender waits until receiver is ready.
- No intermediary buffering
38. Barrier Synchronization
- Processes wait until all arrive
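A C/MPI sketch of a barrier, assuming MPI: no process passes MPI_Barrier until all processes in the communicator have reached it.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        printf("rank %d: before the barrier\n", rank);
        MPI_Barrier(MPI_COMM_WORLD);          /* wait until every process arrives */
        printf("rank %d: after the barrier\n", rank);

        MPI_Finalize();
        return 0;
    }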
39. Parallel Software Development
- Algorithmic conversion by compilers
40. Development of Distributed/Parallel Programs
- New code and algorithms
- Old programs rewritten
- In new languages that have distributed and parallel primitives
- With new libraries
- Parallelize legacy code
41. Conversion of Legacy Software
- Mechanical conversion by software tools
- Reverse engineer its design, and re-code
42. Automatically Parallelizing Compilers
- Compilers analyze programs and parallelize them (usually loops).
- Easy to use, but with limited success.
43. OpenMP on Networks of Workstations
- OpenMP is an API for shared-memory architectures.
- User gives hints as directives to the compiler (see the sketch below).
- http://www.openmp.org
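A minimal OpenMP example in C: the #pragma is the hint, and the compiler generates the shared-memory parallel loop. Compile with an OpenMP flag such as -fopenmp.

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        double a[1000], b[1000];
        for (int i = 0; i < 1000; i++)
            b[i] = i;

        /* the directive tells the compiler the iterations are independent */
        #pragma omp parallel for
        for (int i = 0; i < 1000; i++)
            a[i] = 2.0 * b[i];

        printf("a[999] = %f (threads available: %d)\n",
               a[999], omp_get_max_threads());
        return 0;
    }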
44. Message Passing Libraries
- Programmer is responsible for data distribution, synchronization, and sending and receiving information
- Parallel Virtual Machine (PVM)
- Message Passing Interface (MPI)
- BSP
45. BSP: Bulk Synchronous Parallel Model
- Divides computation into supersteps
- In each superstep a processor can work on local data and send messages.
- At the end of the superstep, a barrier synchronization takes place and all processors receive the messages that were sent in the previous superstep.
- http://www.bsp-worldwide.org/
46. BSP Library
- Small number of subroutines to implement
- process creation,
- remote data access, and
- bulk synchronization.
- Linked to C, Fortran, ... programs (see the sketch below)
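A hedged sketch of one superstep in BSPlib style: local work, a remote put, then bulk synchronization. The calls follow the BSPlib standard (bsp_begin, bsp_put, bsp_sync, ...), but the header name and exact signatures may vary between implementations.

    #include <stdio.h>
    #include "bsp.h"   /* BSPlib header; the name is assumed here */

    int main(void) {
        bsp_begin(bsp_nprocs());          /* process creation: start all processes */

        int p = bsp_nprocs();
        int pid = bsp_pid();
        int local = pid * pid;            /* local computation in this superstep */

        int results[64];                  /* assumes at most 64 processes */
        bsp_push_reg(results, p * (int)sizeof(int));  /* make remotely writable */
        bsp_sync();                       /* registration takes effect next superstep */

        /* remote data access: write our result into process 0's array */
        bsp_put(0, &local, results, pid * (int)sizeof(int), (int)sizeof(int));
        bsp_sync();                       /* bulk synchronization: puts now visible */

        if (pid == 0)
            for (int i = 0; i < p; i++)
                printf("result from %d = %d\n", i, results[i]);

        bsp_pop_reg(results);
        bsp_end();
        return 0;
    }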
47. Parallel Languages
- Shared-memory languages
- Parallel object-oriented languages
- Parallel functional languages
- Concurrent logic languages
48. Tuple Space: Linda
- <v1, v2, ..., vk>
- Atomic primitives
- in(t)
- read(t)
- out(t)
- eval(t)
- Host language, e.g., JavaSpaces
49. Data Parallel Languages
- Data is distributed over the processors as arrays
- Entire arrays are manipulated
- A(1:100) = B(1:100) + C(1:100)
- Compiler generates parallel code
- Fortran 90
- High Performance Fortran (HPF)
50. Parallel Functional Languages
- Erlang: http://www.erlang.org/
- SISAL: http://www.llnl.gov/sisal/
- PCN (Argonne)
51. Clusters
52. Buildings-Full of Workstations
- Distributed OSs have not taken a foothold.
- Powerful personal computers are ubiquitous.
- Mostly idle: more than 90% of the up-time?
- 100 Mb/s LANs are common.
- Windows and Linux are the top two OS in terms of
installed base.
53. Cluster Configurations
- NOW -- Networks of Workstations
- COW -- Clusters of Dedicated Nodes
- Clusters of Come-and-Go Nodes
- Beowulf clusters
54. Beowulf
- Collection of compute nodes
- Full trust in each other
- Login from one node into another without authentication
- Shared file system subtree
- Dedicated
55. Close Cluster Configuration
[Diagram: a closed cluster. Compute nodes and a file server node share a high-speed network; a separate service network connects them to front-end and gateway nodes, and only the gateway nodes reach the external network.]
56. Open Cluster Configuration
[Diagram: an open cluster, with a file server node, compute nodes, and a front-end on a high-speed network that connects to the external network.]
57. Interconnection Network
- Most popular: Fast Ethernet
- Network topologies
- Mesh
- Torus
- Switch vs. Hub
58. Software Components
- Operating System
- Linux, FreeBSD,
- Parallel programming
- PVM, MPI
- Utilities,
- Open source
59. Software Structure of PC Cluster
60. Single System View
- Single system view
- Common filesystem structure viewed from any node
- Common accounts on all nodes
- Single software installation point
- Benefits
- Easy to install and maintain system
- Easy to use for users
61. Installation Steps
- Install Operating system
- Setup a Single System View
- Shared filesystem
- Common accounts
- Single software installation point
- Install parallel programming packages such as MPI, PVM, BSP
- Install utilities, libraries, and applications
62. Linux Installation
- Linux has many distributions: Red Hat, Caldera, SuSE, Debian, ...
- Caldera is easy to install
- All of the above upgrade with RPM package management
- Mandrake and SuSE come with a very complete set of software
63. Clusters with Part-Time Nodes
- Cycle stealing: running jobs that do not belong to the workstation's owner.
- Definition of idleness: e.g., no keyboard and no mouse activity
- Tools/Libraries
- Condor
- PVM
- MPI
64. Migration of Jobs
- Policies
- Immediate-Eviction
- Pause-and-Migrate
- Technical Issues
- Checkpointing: preserving the state of the process so that it can be resumed.
- Migrating from one architecture to another
65. Summary
- Parallel
- computers
- computation
- Parallel Methods
- Communication primitives
- Message Passing
- Distributed Shared Memory
- Programming Tools
- Cluster configurations