Title: High Performance Cluster Computing
1 High Performance Cluster Computing
By Rajkumar Buyya, Monash University, Melbourne
rajkumar@ieee.org
http://www.dgs.monash.edu.au/rajkumar
2 Agenda
- Overview of Computing
- Motivations and Enabling Technologies
- Cluster Architecture and its Components
- Cluster Classifications
- Cluster Middleware
- Single System Image
- Representative Cluster Systems
- Berkeley NOW and Solaris-MC
- Resources and Conclusions
3 Announcement: formation of
- IEEE Task Force on Cluster Computing (TFCC)
- http://www.dgs.monash.edu.au/rajkumar/tfcc/
- http://www.dcs.port.ac.uk/mab/tfcc/
4 Computing Power and Computer Architectures
5 Need for more Computing Power: Grand Challenge Applications
- Solving technology problems using computer modeling, simulation and analysis
- Life Sciences
- Aerospace
- Mechanical Design and Analysis (CAD/CAM)
6 How to Run Applications Faster?
- There are 3 ways to improve performance:
- 1. Work Harder
- 2. Work Smarter
- 3. Get Help
- Computer Analogy
- 1. Use faster hardware, e.g. reduce the time per instruction (clock cycle).
- 2. Use optimized algorithms and techniques.
- 3. Use multiple computers to solve the problem, i.e. increase the number of instructions executed per clock cycle.
7 Sequential Architecture Limitations
- Sequential architectures are reaching physical limits (speed of light, thermodynamics).
- Hardware improvements like pipelining, superscalar execution, etc., are not scalable and require sophisticated compiler technology.
- Vector processing works well only for certain kinds of problems.
8 Why Parallel Processing NOW?
- The technology of parallel processing is mature and can be exploited commercially; there is significant R&D work on the development of tools and environments.
- Significant developments in networking technology are paving the way for heterogeneous computing.
9 History of Parallel Processing
- Parallel processing can be traced to a tablet dated around 100 BC.
- The tablet has 3 calculating positions.
- We infer that the multiple positions were used for reliability and/or speed.
10 Motivating Factors
- The aggregated speed with which complex calculations are carried out by millions of neurons in the human brain is amazing, even though an individual neuron's response is slow (milliseconds) - this demonstrates the feasibility of parallel processing.
11 Taxonomy of Architectures
- Simple classification by Flynn (based on the number of instruction and data streams):
- SISD - conventional
- SIMD - data parallel, vector computing
- MISD - systolic arrays
- MIMD - very general, multiple approaches.
- Current focus is on the MIMD model, using general-purpose processors or multicomputers.
12 MIMD Architecture
[Diagram: processors A, B and C, each with its own instruction stream and its own data input and output streams]
- Unlike SISD and MISD machines, an MIMD computer works asynchronously.
- Shared memory (tightly coupled) MIMD
- Distributed memory (loosely coupled) MIMD
13 Shared Memory MIMD Machine
[Diagram: processors A, B and C connected to a global memory system]
- Communication: the source PE writes data to global memory (GM); the destination PE retrieves it (a small sketch follows below).
- Easy to build; conventional OSes for SISD machines can easily be ported.
- Limitations: reliability and expandability. A memory component or any processor failure affects the whole system.
- Increasing the number of processors leads to memory contention.
- Example: Silicon Graphics supercomputers...
14 Distributed Memory MIMD
[Diagram: processors A, B and C connected by IPC channels]
- Communication: IPC over a high-speed network.
- The network can be configured as a tree, mesh, cube, etc.
- Unlike shared-memory MIMD:
- easily/readily expandable
- highly reliable (any CPU failure does not affect the whole system)
15 Main HPC Architectures...1a
- SISD - mainframes, workstations, PCs.
- SIMD Shared Memory - vector machines, Cray...
- MIMD Shared Memory - Sequent, KSR, Tera, SGI, SUN.
- SIMD Distributed Memory - DAP, TMC CM-2...
- MIMD Distributed Memory - Cray T3D, Intel, Transputers, TMC CM-5, plus recent workstation clusters (IBM SP2, DEC, Sun, HP).
16 Main HPC Architectures...1b
- NOTE: Modern sequential machines are not purely SISD; advanced RISC processors use many concepts from vector and parallel architectures (pipelining, parallel execution of instructions, prefetching of data, etc.) in order to achieve one or more arithmetic operations per clock cycle.
17 Parallel Processing Paradox
- The time required to develop a parallel application for solving a Grand Challenge Application (GCA) equals the half life of parallel supercomputers.
18 The Need for Alternative Supercomputing Resources
- Vast numbers of under-utilised workstations are available to use.
- Huge numbers of unused processor cycles and resources could be put to good use in a wide variety of application areas.
- Reluctance to buy supercomputers due to their cost and short life span.
- Distributed compute resources fit better into today's funding model.
19 Scalable Parallel Computers
20 Design Space of Competing Computer Architectures
21 Towards Inexpensive Supercomputing
- It is Cluster Computing...
- The Commodity Supercomputing!
22 Motivation for using Clusters
- Surveys show that utilisation of the CPU cycles of desktop workstations is typically less than 10%.
- Performance of workstations and PCs is rapidly improving.
- As performance grows, percent utilisation will decrease even further!
- Organisations are reluctant to buy large supercomputers, due to the large expense and short useful life span.
23 Motivation for using Clusters
- The communications bandwidth between workstations is increasing as new networking technologies and protocols are implemented in LANs and WANs.
- Workstation clusters are easier to integrate into existing networks than special parallel computers.
24 Motivation for using Clusters
- The development tools for workstations are more mature than the contrasting proprietary solutions for parallel computers - mainly due to the non-standard nature of many parallel systems.
- Workstation clusters are a cheap and readily available alternative to specialised High Performance Computing (HPC) platforms.
- Use of clusters of workstations as a distributed compute resource is very cost effective - incremental growth of the system!!!
25 Rise and Fall of Computing Technologies
[Figure: technology transitions - Mainframes to Minis (1970), Minis to PCs (1980), PCs to Network Computing (1995)]
26 What is a Cluster?
- Cluster: a collection of nodes connected together
- Network: faster, closer connection than a typical network (LAN)
- Looser connection than a symmetric multiprocessor (SMP)
27 1990s Building Blocks
- Building block: complete computers (HW and SW) shipped in 100,000s - Killer micro, Killer DRAM, Killer disk, Killer OS, Killer packaging, Killer investment
- Interconnecting building blocks => Killer Net
- High bandwidth
- Low latency
- Reliable
- Commodity (ATM, ...)
28 Why Clusters Now?
- Building block is big enough
- Workstation performance is doubling every 18 months.
- Networks are faster
- Higher link bandwidth
- Switch-based networks coming
- Interfaces are simple and fast
- Striped files preferred (RAID)
- Demise of Mainframes, Supercomputers, MPPs
29 Architectural Drivers (cont.)
- Node architecture dominates performance
- processor, cache, bus, and memory
- design and engineering => performance
- Greatest demand for performance is on large systems
- must track the leading edge of technology without lag
- MPP network technology => mainstream
- system area networks
- A complete system on every node is a powerful enabler
- very high-speed I/O, virtual memory, scheduling, ...
30 ...Architectural Drivers
- Clusters can be grown: incremental scalability (up, down, and across)
- Individual node performance can be improved by adding additional resources (new memory blocks/disks)
- New nodes can be added or nodes can be removed
- Clusters of Clusters and Metacomputing
- Complete software tools
- Threads, PVM, MPI, DSM, C, C++, Java, Parallel C++, compilers, debuggers, OS, etc.
- Wide class of applications
- Sequential and grand challenge parallel applications
31 Example Clusters: Berkeley NOW
- 100 Sun UltraSparcs
- 200 disks
- Myrinet SAN
- 160 MB/s
- Fast comm.
- AM, MPI, ...
- Ether/ATM switched external net
- Global OS
- Self Config
32 Basic Components
[Diagram: Sun Ultra 170 node with memory and a Myricom NIC on the I/O bus, connected to Myrinet at 160 MB/s]
33 Massive Cheap Storage Cluster
- Basic unit: 2 PCs double-ending four SCSI chains of 8 disks each
- Currently serving Fine Art at http://www.thinker.org/imagebase/
34 Cluster of SMPs (CLUMPS)
- Four Sun E5000s
- 8 processors
- 4 Myricom NICs each
- Multiprocessor, Multi-NIC, Multi-Protocol
- NPACI => Sun 450s
35 Millennium PC Clumps
- Inexpensive, easy-to-manage cluster
- Replicated in many departments
- Prototype for very large PC cluster
36 Adoption of the Approach
37 So What's So Different?
- Commodity parts?
- Communications Packaging?
- Incremental Scalability?
- Independent Failure?
- Intelligent Network Interfaces?
- Complete System on every node
- virtual memory
- scheduler
- files
- ...
38 OPPORTUNITIES and CHALLENGES
39 Opportunity of Large-scale Computing on NOW
40 Windows of Opportunities
- MPP/DSM
- Compute across multiple systems in parallel.
- Network RAM
- Use idle memory in other nodes: page across other nodes' idle memory.
- Software RAID
- File system supporting parallel I/O and reliability, mass storage.
- Multi-path Communication
- Communicate across multiple networks: Ethernet, ATM, Myrinet.
41 Enabling Technologies
- Efficient communication hardware and software
- Global co-ordination of multiple workstation
Operating Systems
42 Efficient Communication
- The key enabling technology
- Communication overhead components:
- bandwidth
- network latency, and
- processor overhead
- Switched LANs allow bandwidth to scale
- Network latency can be overlapped with computation
- Processor overhead is the real problem - it consumes CPU cycles
43 Efficient Communication (Contd...)
- SS10 connected by Ethernet:
- 456 µs processor overhead
- With ATM:
- 626 µs processor overhead
- Target: MPP communication performance - low latency and scalable bandwidth
- CM-5 user-level network overhead: 5.7 µs (a rough cost-model sketch follows)
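To put these overhead figures in perspective, here is a minimal sketch in C of a simple additive cost model (time = processor overhead + latency + size/bandwidth). The additive form, the 4 KB message size, the 10 µs latency and the Myrinet-class 160 MB/s bandwidth are assumptions for illustration; the 456 µs and 5.7 µs overheads are the figures quoted above.

    /* Illustrative one-way message cost model (assumed additive form):
     *   time = processor_overhead + network_latency + size / bandwidth   */
    #include <stdio.h>

    int main(void) {
        double size_bytes     = 4096.0;   /* assumed message size             */
        double bandwidth      = 160e6;    /* assumed bytes/s (Myrinet-class)  */
        double latency_s      = 10e-6;    /* assumed network latency          */
        double overhead_eth_s = 456e-6;   /* SS10 + Ethernet overhead (slide) */
        double overhead_cm5_s = 5.7e-6;   /* CM-5 user-level overhead (slide) */

        double t_eth = overhead_eth_s + latency_s + size_bytes / bandwidth;
        double t_cm5 = overhead_cm5_s + latency_s + size_bytes / bandwidth;

        printf("kernel/Ethernet-class path: %.1f us\n", t_eth * 1e6);
        printf("user-level (CM-5) path    : %.1f us\n", t_cm5 * 1e6);
        return 0;
    }

Even with identical link bandwidth and latency, the per-message processor overhead dominates the total time, which is exactly the point the slide makes.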
44 Cluster Computer and its Components
45 Clustering Today
- Clustering gained momentum when 3 technologies converged:
- 1. Very high-performance microprocessors
- workstation performance = yesterday's supercomputers
- 2. High-speed communication
- Comm. between cluster nodes => comm. between processors in an SMP.
- 3. Standard tools for parallel/distributed computing and their growing popularity.
46 Cluster Computer Architecture
47 Cluster Components...1a: Nodes
- Multiple High Performance Components
- PCs
- Workstations
- SMPs (CLUMPS)
- Distributed HPC Systems leading to Metacomputing
- They can be based on different architectures and run different operating systems
48 Cluster Components...1b: Processors
- There are many (CISC/RISC/VLIW/Vector...)
- Intel Pentium, Xeon, Merced
- Sun SPARC, UltraSPARC
- HP PA
- IBM RS6000/PowerPC
- SGI MIPS
- Digital Alpha
- Integrating memory, processing and networking into a single chip:
- IRAM (CPU + Mem) (http://iram.cs.berkeley.edu)
- Alpha 21364 (CPU, memory controller, NI)
49 Cluster Components...2: OS
- State of the art OS
- Linux (Beowulf)
- Microsoft NT (Illinois HPVM)
- SUN Solaris (Berkeley NOW)
- IBM AIX (IBM SP2)
- HP UX (Illinois - PANDA)
- Mach (Microkernel based OS) (CMU)
- Cluster operating systems (Solaris MC, SCO UnixWare, MOSIX (an academic project))
- OS gluing layers (Berkeley GLUnix)
50 Cluster Components...3: High Performance Networks
- Ethernet (10 Mbps)
- Fast Ethernet (100 Mbps)
- Gigabit Ethernet (1 Gbps)
- SCI (Dolphin - MPI - 12 microsecond latency)
- ATM
- Myrinet (1.2Gbps)
- Digital Memory Channel
- FDDI
51 Cluster Components...4: Network Interfaces
- Network Interface Card
- Myrinet has NIC
- User-level access support
- The Alpha 21364 processor integrates processing, memory controller, and network interface into a single chip.
52 Cluster Components...5: Communication Software
- Traditional OS-supported facilities (heavyweight due to protocol processing)
- Sockets (TCP/IP), pipes, etc. (a minimal sketch follows this list)
- Lightweight protocols (user level)
- Active Messages (Berkeley)
- Fast Messages (Illinois)
- U-Net (Cornell)
- XTP (Virginia)
- Systems can be built on top of the above protocols
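For contrast with the lightweight user-level protocols listed above, here is a minimal sketch in C of the traditional kernel socket path (TCP/IP); the peer address 192.168.1.10 and port 5000 are placeholders invented for illustration.

    /* Traditional "heavyweight" communication: every byte traverses the
     * kernel's TCP/IP protocol stack via the sockets API.                */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        struct sockaddr_in peer;
        memset(&peer, 0, sizeof peer);
        peer.sin_family = AF_INET;
        peer.sin_port   = htons(5000);                        /* placeholder port */
        inet_pton(AF_INET, "192.168.1.10", &peer.sin_addr);   /* placeholder node */

        if (connect(fd, (struct sockaddr *)&peer, sizeof peer) < 0) {
            perror("connect");
            return 1;
        }

        const char msg[] = "hello from a cluster node";
        write(fd, msg, sizeof msg);   /* protocol processing happens in the kernel */
        close(fd);
        return 0;
    }

User-level protocols such as Active Messages avoid this kernel crossing on every message, which is where most of the processor overhead comes from.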
53 Cluster Components...6a: Cluster Middleware
- Resides between the OS and applications and offers an infrastructure for supporting:
- Single System Image (SSI)
- System Availability (SA)
- SSI makes the collection appear as a single machine (globalised view of system resources), e.g. telnet cluster.myinstitute.edu
- SA - checkpointing and process migration, etc.
54 Cluster Components...6b: Middleware Components
- Hardware
- DEC Memory Channel, DSM (Alewife, DASH), SMP techniques
- OS / Gluing Layers
- Solaris MC, UnixWare, GLUnix
- Applications and Subsystems
- System management and electronic forms
- Runtime systems (software DSM, PFS etc.)
- Resource management and scheduling (RMS)
- CODINE, LSF, PBS, NQS, etc.
55 Cluster Components...7a: Programming Environments
- Threads (PCs, SMPs, NOW..)
- POSIX Threads
- Java Threads
- MPI (a minimal example follows this list)
- Linux, NT, on many supercomputers
- PVM
- Software DSMs (Shmem)
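As a concrete example of the MPI environment mentioned above, here is a minimal sketch of an MPI program in C; it assumes an MPI implementation (e.g. MPICH) is installed on the cluster nodes.

    /* Minimal MPI program: each process reports its rank.  The same code
     * runs unchanged on a workstation cluster or an MPP.                  */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, size;
        MPI_Init(&argc, &argv);                 /* start the MPI runtime      */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's identifier  */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes  */
        printf("process %d of %d reporting\n", rank, size);
        MPI_Finalize();
        return 0;
    }

Such a program is typically compiled with mpicc and launched across the nodes with mpirun -np <N>.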
56 Cluster Components...7b: Development Tools
- Compilers
- C/C++/Java/...
- Parallel Programming with C++ (MIT Press book)
- RAD (rapid application development) tools: GUI-based tools for parallel program modelling
- Debuggers
- Performance Analysis Tools
- Visualization Tools
57 Cluster Components...8: Applications
- Sequential
- Parallel / Distributed (Cluster-aware app.)
- Grand Challenge applications
- Weather Forecasting
- Quantum Chemistry
- Molecular Biology Modeling
- Engineering Analysis (CAD/CAM)
- ...
- PDBs, web servers, data mining
58 Key Operational Benefits of Clustering
- System availability (HA): clusters offer inherent high system availability due to the redundancy of hardware, operating systems, and applications.
- Hardware fault tolerance: redundancy for most system components (e.g. disk RAID), including both hardware and software.
- OS and application reliability: run multiple copies of the OS and applications, and achieve reliability through this redundancy.
- Scalability: add servers to the cluster, add more clusters to the network as the need arises, or add CPUs to an SMP node.
- High performance: running cluster-enabled programs.