Title: Single System Image
1. Single System Image

2. Cluster Computer Architecture
[Figure: layered cluster architecture]
- Parallel Applications / Sequential Applications
- Parallel Programming Environment
- Cluster Middleware (Single System Image and Availability Infrastructure)
- Cluster Interconnection Network/Switch
3. Major Issues in Cluster Design
- Enhanced Performance (performance at low cost)
- Enhanced Availability (failure management)
- Single System Image (look-and-feel of one system)
- Size Scalability (physical and application)
- Fast Communication (networks and protocols)
- Load Balancing (CPU, Net, Memory, Disk)
- Security and Encryption (clusters of clusters)
- Distributed Environment (social issues)
- Manageability (administration and control)
- Programmability (simple API if required)
- Applicability (cluster-aware and non-aware applications)
4. A Typical Cluster Computing Environment
[Figure: layered stack]
- Applications
- PVM / MPI / RSH
- ???
- Hardware/OS
5. The missing link is provided by cluster middleware/underware
[Figure: the stack from slide 4, with the middleware layer below PVM / MPI / RSH]
6. Middleware Design Goals
- Complete Transparency (Manageability)
  - Lets the user see a single cluster system.
  - Single entry point; ftp, telnet, software loading...
- Scalable Performance
  - Easy growth of the cluster.
  - No change of API; automatic load distribution.
- Enhanced Availability
  - Automatic recovery from failures.
  - Employ checkpointing and fault-tolerant technologies.
  - Handle consistency of data when replicated.
7. What is Single System Image (SSI)?
- SSI is the illusion, created by software or hardware, that presents a collection of computing resources as one, more powerful resource.
- SSI makes the cluster appear like a single machine to the user, to applications, and to the network.
8. Benefits of SSI
- Use of system resources is transparent.
- Transparent process migration and load balancing across nodes.
- Improved reliability and higher availability.
- Improved system response time and performance.
- Simplified system management.
- Reduction in the risk of operator errors.
- No need to be aware of the underlying system architecture to use these machines effectively.
9. Desired SSI Services
- Single Entry Point
  - telnet cluster.my_institute.edu
  - telnet node1.cluster.my_institute.edu
- Single File Hierarchy: /proc, NFS, xFS, AFS, etc.
- Single Control Point: management GUI
- Single Virtual Networking
- Single Memory Space: Network RAM / DSM
- Single Job Management: GLUnix, Codine, LSF
- Single GUI: like a workstation/PC windowing environment (it may be Web technology)
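The "single entry point" above means users connect to one cluster name and never pick a node themselves. A minimal sketch of how a front-end might choose which node services a new login, assuming invented node names and load figures (none of this comes from a real SSI system):

```python
# Hypothetical sketch of a single entry point: the front-end resolves the
# cluster name to one node, so users always "telnet cluster.my_institute.edu"
# and the choice of node is hidden. Node records below are illustrative.

def pick_login_node(nodes):
    """Return the least-loaded node to service a new login session."""
    return min(nodes, key=lambda n: n["load"])

nodes = [
    {"name": "node1.cluster.my_institute.edu", "load": 0.82},
    {"name": "node2.cluster.my_institute.edu", "load": 0.15},
    {"name": "node3.cluster.my_institute.edu", "load": 0.47},
]

target = pick_login_node(nodes)
print(target["name"])  # the front-end would forward the session to this node
```

A real implementation would sit behind round-robin DNS or a login dispatcher, but the core decision, which node hides behind the single name, is the same.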
10. Availability Support Functions
- Single I/O Space
  - Any node can access any peripheral or disk device without knowledge of its physical location.
- Single Process Space
  - Any process on any node can create processes with cluster-wide process IDs, and they communicate through signals, pipes, etc., as if they were on a single node.
- Checkpointing and Process Migration
  - Saves process state and intermediate results, in memory or on disk, to support rollback recovery when a node fails; RMS load balancing...
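The checkpoint/rollback idea above can be sketched in a few lines, assuming the "process state" is just a serializable dictionary (a real cluster checkpointer captures full process images; the file name and state fields here are invented for illustration):

```python
# Minimal sketch of checkpoint/rollback recovery: periodically save
# intermediate results to disk so the job can resume after a node failure.
import os
import pickle
import tempfile

def checkpoint(state, path):
    """Persist the current state so another node can resume from it."""
    with open(path, "wb") as f:
        pickle.dump(state, f)

def rollback(path):
    """Restore the last saved state after a node failure."""
    with open(path, "rb") as f:
        return pickle.load(f)

ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt")
state = {"iteration": 0, "partial_sum": 0}
for i in range(1, 6):
    state["iteration"] = i
    state["partial_sum"] += i
    checkpoint(state, ckpt)      # periodic checkpoint

recovered = rollback(ckpt)       # node failed; resume elsewhere
print(recovered)                 # {'iteration': 5, 'partial_sum': 15}
```

Rollback recovery restarts the computation from the last checkpoint rather than from scratch, which is exactly what the slide means by supporting recovery "when a node fails".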
11. SSI Levels
- SSI levels of abstraction
12. SSI at Application and Sub-system Levels
(c) In Search of Clusters
13. SSI at OS Kernel Level
14. SSI at Hardware Level (memory and I/O)
15. SSI Characteristics
- Every SSI has a boundary.
- Single-system support can exist at different levels within a system, one able to be built upon another.
16. SSI Boundaries
[Figure: an SSI boundary around a batch system]
17. Relationship Among Middleware Modules
18. SSI via OS path!
- 1. Build as a layer on top of the existing OS
  - Benefits: makes the system quickly portable, tracks vendor software upgrades, and reduces development time.
  - i.e., new systems can be built quickly by mapping new services onto the functionality provided by the layer beneath, e.g. GLUnix.
- 2. Build SSI at kernel level (a true cluster OS)
  - Good, but can't leverage OS improvements by the vendor.
  - E.g. UnixWare, Solaris-MC, and MOSIX.
19. SSI Systems and Tools
- OS-level SSI
  - SCO NSC UnixWare
  - Solaris-MC
  - MOSIX, ...
- Middleware-level SSI
  - PVM, TreadMarks (DSM), GLUnix, Condor, Codine, Nimrod, ...
- Application-level SSI
  - PARMON, Parallel Oracle, ...
20. SCO NonStop Clusters for UnixWare
http://www.sco.com/products/clustering/
[Figure: cluster diagram showing other nodes]
21. How does NonStop Clusters Work?
- Modular extensions and hooks to provide:
  - Single cluster-wide filesystem view
  - Transparent cluster-wide device access
  - Transparent swap-space sharing
  - Transparent cluster-wide IPC
  - High-performance internode communications
  - Transparent cluster-wide processes, migration, etc.
  - Node-down cleanup and resource failover
  - Transparent cluster-wide parallel TCP/IP networking
  - Application availability
  - Cluster-wide membership and cluster time sync
  - Cluster system administration
  - Load leveling
22. Sun Solaris MC
- Solaris MC: a high-performance operating system for clusters.
- A distributed OS for a multicomputer: a cluster of computing nodes connected by a high-speed interconnect.
- Provides a single system image, making the cluster appear like a single machine to the user, to applications, and to the network.
- Built as a globalization layer on top of the existing Solaris kernel.
- Interesting features:
  - extends the existing Solaris OS
  - preserves existing Solaris ABI/API compliance
  - provides support for high availability
  - uses C++, IDL, and CORBA in the kernel
  - leverages Spring technology
23. Solaris-MC: Solaris for MultiComputers
- Global file system
- Globalized process management
- Globalized networking and I/O
http://www.sun.com/research/solaris-mc/
24. Solaris MC Components
- Object and communication support
- High availability support
- PXFS global distributed file system
- Process management
- Networking
25. MOSIX: Multicomputer OS for UNIX
http://www.mosix.cs.huji.ac.il/ (mosix.org)
- An OS module (layer) that provides applications with the illusion of working on a single system.
- Remote operations are performed like local operations.
- Transparent to the application; the user interface is unchanged.
[Figure: layered stack]
- Application
- PVM / MPI / RSH
- MOSIX
- Hardware/OS
26. Main Tool
Preemptive process migration that can migrate any process, anywhere, anytime.
- Supervised by distributed algorithms that respond on-line to global resource availability, transparently.
- Load balancing: migrates processes from over-loaded to under-loaded nodes.
- Memory ushering: migrates processes from a node that has exhausted its memory, to prevent paging/swapping.
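The two migration policies above can be sketched as a single decision function. This is only an illustration: the node records, thresholds, and the rule that memory ushering takes priority over load balancing are invented for the example, and MOSIX itself uses distributed on-line algorithms rather than a central chooser:

```python
# Illustrative sketch of the migration decisions described above:
# memory ushering (evacuate a node that is out of memory) and load
# balancing (move work from over-loaded to under-loaded nodes).

def choose_migration(nodes, load_gap=0.5, free_mem_min=64):
    """Return (source, target) node names if a migration is worthwhile,
    or None when the cluster is already balanced."""
    busiest = max(nodes, key=lambda n: n["load"])
    idlest = min(nodes, key=lambda n: n["load"])
    # Memory ushering: a node exhausting its memory is evacuated first,
    # to prevent paging/swapping.
    for n in nodes:
        if n["free_mem_mb"] < free_mem_min and n is not idlest:
            return n["name"], idlest["name"]
    # Load balancing: migrate only if the load gap justifies the cost.
    if busiest["load"] - idlest["load"] > load_gap:
        return busiest["name"], idlest["name"]
    return None

nodes = [
    {"name": "n1", "load": 1.9, "free_mem_mb": 512},
    {"name": "n2", "load": 0.2, "free_mem_mb": 900},
    {"name": "n3", "load": 0.8, "free_mem_mb": 16},  # nearly out of memory
]
print(choose_migration(nodes))  # ('n3', 'n2'): memory ushering wins first
```

Note the `load_gap` threshold: without it, tiny load differences would trigger constant back-and-forth migration, so a real balancer only moves a process when the imbalance outweighs the migration cost.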
27. MOSIX for Linux at HUJI
- A scalable cluster configuration:
  - 50 Pentium-II 300 MHz
  - 38 Pentium-Pro 200 MHz (some are SMPs)
  - 16 Pentium-II 400 MHz (some are SMPs)
- Over 12 GB cluster-wide RAM
- Connected by the Myrinet 2.56 Gb/s LAN
- Runs Red Hat 6.0, based on kernel 2.2.7
- Upgrade HW with Intel, SW with Linux
- Download MOSIX:
  - http://www.mosix.cs.huji.ac.il/