Title: Current and Emerging Trends in Cluster Computing
Current and Emerging Trends in Cluster Computing
University of Portsmouth, UK
NAG Annual Symposium, Oxford University, 22nd September 2000
http://www.dcs.port.ac.uk/mab/Talks/
Talk Content
- Background and Overview
- Cluster Architectures
- Cluster Networking
- SSI
- Cluster Tools
- Conclusions
Commodity Cluster Systems
- Bringing high-end computing to a broader problem domain - new markets.
- Order of magnitude price/performance advantage.
- Commodity-enabled: no long development lead times.
- Low vulnerability to vendor-specific decisions - companies are ephemeral, Clusters are forever!!!
Commodity Cluster Systems
- Rapid response to technology tracking.
- User-driven configuration - potential for application-specific ones.
- Industry-wide, non-proprietary software environment.
Cluster Architecture
Beowulf-class Systems
- Cluster of PCs
  - Intel x86
  - DEC Alpha
  - Mac PowerPC.
- Pure mass-market COTS.
- Unix-like OS with source
  - Linux, BSD, Solaris.
- Message-passing programming model
  - MPI, PVM, BSP, homebrews.
- Single-user environments.
- Large science and engineering applications.
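- As a minimal illustration of the message-passing model these systems support, here is a hedged sketch (not taken from the talk) of an MPI program in C in which rank 0 passes a value to rank 1; file and node names are arbitrary.

/* Minimal MPI sketch (illustrative): rank 0 sends a value to rank 1,
 * which prints it.  Build: mpicc ping.c -o ping
 * Run across two cluster nodes: mpirun -np 2 ./ping
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        value = 42;                       /* data to pass to the next node */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 of %d received %d\n", size, value);
    }

    MPI_Finalize();
    return 0;
}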
Decline of Heavy Metal
- No market for high-end computers
  - minimal growth in last five years.
- Extinction
  - KSR, TMC, Intel, Meiko, Cray!?, Maspar, BBN, Convex.
- Must use COTS
  - fabrication costs skyrocketing
  - development lead times too short.
- US Federal Agencies fleeing
  - NSF, DARPA, DOE, NIST.
- Currently no good new IDEAS.
Enabling Drivers
- Drastic reduction in vendor support for HPC.
- Component technologies for PCs match those for workstations (in terms of capability).
- PC-hosted software environments similar in sophistication and robustness to mainframe OSs.
- Low-cost network hardware and software enable balanced PC clusters.
- MPPs establish a low level of expectation.
- Cross-platform parallel programming model (MPI/PVM/HPF).
HPC Architectures - Top 500
Taxonomy
Cluster Computing
Beowulf Accomplishments
- Many Beowulf-class systems installed.
- Experience gained (implementation/apps).
- Many applications (some large) routinely executed on Beowulfs.
- Basic software fairly sophisticated and robust.
- Supports dominant programming/execution paradigm.
- Single most rapidly growing area in HPC.
- Ever larger systems in development (at SNL).
- Now recognised as mainstream.
Overall Hardware Issues
- All necessary components available in the mass market (M2COTS).
- Powerful computational nodes (SMPs).
- Network bandwidth impacts high-volume, communication-intensive applications.
- Network latency impacts random-access (short-message) applications.
- Many applications work well with 1 Mbps/Mflop.
- 10x improvements in bandwidth and latency.
- Price/performance advantage of 10x in many cases.
Technology Drivers
- Reduced recurring costs - approximately 10% of MPPs'.
- Rapid response to technology advances.
- Just-in-place configuration and reconfigurable.
- High reliability if the system is designed properly.
- Easily maintained through low-cost replacement.
- Consistent portable programming model
  - Unix, C, Fortran, message passing.
- Applicable to a wide range of problems and algorithms.
Operating Systems
- Little work on OSs specifically for clusters.
- Turnkey clusters are provided with versions of a company's mainline products.
- Typically there may be some form of SSI integrated into a conventional OS.
- Two variants are encountered:
  - system administration/job-scheduling purposes - middleware that enables each node to deliver the required services
  - kernel-level - e.g., transparent remote device usage, or use of a distributed storage facility that is seen by users as a single standard file system.
Linux
- The most popular OS for clusters is Linux.
- It is free.
- It is open source - anyone is free to customize the kernel to suit their needs.
- It is easy - a large community of users and developers has created an abundance of tools, web sites, and documentation, so that Linux installation and administration is straightforward enough for the typical cluster user.
Examples: Solaris MC
- Sun's multi-computer version of its Solaris OS, called Solaris MC.
- Incorporates some advances made by Sun, including an object-oriented methodology and the use of CORBA IDL in the kernel.
- Consists of a small set of kernel extensions and a middleware library - provides SSI to the level of the device.
- Processes running on one node can access remote devices as if they were local; it also provides a global file system and process space.
Examples: micro-kernels
- Another approach is minimalist: micro-kernels - Exokernel is one such system.
- With this approach, only the minimal amount of system functionality is built into the kernel, allowing the services that are needed to be loaded.
- It maximizes the available physical memory by removing undesired functionality.
- The user can alter the characteristics of a service, e.g., a scheduler specific to a cluster application may be loaded that helps it run more efficiently.
How Much of the OS is Needed?
- This brings up the issue of OS configuration - in particular, why provide a node OS with the ability to provide more services to applications than they are ever likely to use?
- E.g., a user may want to alter the personality of the local OS - "strip down" to a minimalist kernel to maximise the available physical memory and remove undesired functionality.
- Mechanisms to achieve this range from
  - use of a new kernel, to
  - dynamically linking service modules into the kernel.
Networking - Introduction
- One of the key enabling technologies that has established clusters as a dominant force is networking.
- High-performance parallel applications need low-latency, high-bandwidth, and reliable interconnects.
- Existing LAN/WAN technologies/protocols (10/100 Mbps Ethernet, ATM) are not well suited to supporting clusters.
- Hence the birth of System Area Networks (SANs).
Comparison
[Comparison table of SAN interconnects.]
Why Buy a SAN?
- Well, it depends on your application.
- For scientific HPC, Myrinet seems to offer good MBytes/s per $, lots of software, and proven scalability.
- Synfinity, with its best-in-class 1.6 GBytes/s, could be a valuable alternative for small/medium-sized clusters - untried at the moment.
- Windows-based users should give Giganet a try.
- QsNet and ServerNet II are likely the most expensive solutions, but an Alpha-based cluster from Compaq with one of these should be a good number cruncher.
Emerging Technologies
- ATOLL
- VIA
- InfiniBand
- Active Networks
ATOLL, a fully integrated network in a single chip
- New 64/32-bit, 66/33 MHz SAN which aims at a single-chip solution.
- All existing components needed to build a large SAN are integrated into one single chip.
- Includes
  - 4 independent host interfaces
  - 4 network interfaces
  - an 8x8 crossbar.
ATOLL System Configuration
Communications Concerns
- New physical networking technologies are fast
  - Gigabit Ethernet, ServerNet, Myrinet.
- Legacy network protocol implementations are slow
  - system calls
  - multiple data copies.
- Communications gap
  - systems use a fraction of the available performance.
Communications Solutions
- User-level (no kernel) networking.
- Several existing efforts
  - Active Messages (UCB)
  - Fast Messages (UIUC)
  - U-Net (Cornell)
  - BIP (Univ. Lyon, France).
- Standardization: VIA.
- Industry involvement
  - Killer Clusters.
[Diagram: the user process talks directly to the NIC, bypassing the OS.]
VIA
- VIA is a standard that combines many of the best features of various academic projects, and will strongly influence the evolution of cluster computing.
- Although VIA can be used directly for application programming, it is considered by many systems designers to be at too low a level for application programming.
- With VIA, the application must be responsible for allocating some portion of physical memory and using it effectively.
VIA
- It is expected that most OS and middleware vendors will provide an interface to VIA that is suitable for application programming.
- Generally, such an interface comes in the form of a message-passing interface for scientific or parallel programming.
What is VIA?
- Use the kernel for set-up, and get it out of the way for send/receive!
- The Virtual Interface (VI)
  - protected application-to-application channel
  - memory directly accessible by the user process.
- Target environment
  - LANs and SANs at Gigabit speeds
  - no reliability of the underlying media assumed (unlike MP fabrics)
  - errors/drops assumed to be rare - generally fatal to the VI.
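- The sketch below illustrates this set-up/transfer split in C. It is a hedged, hypothetical sketch: the via_* functions are illustrative stubs invented for this example, not the real VIPL API.

/* Hypothetical sketch of the VIA usage pattern described above.  The
 * via_* calls are illustrative stubs (not the real VIPL API); the point
 * is the split between kernel-mediated set-up and user-level transfers. */
#include <stdio.h>
#include <string.h>

typedef int via_vi_t;   /* stands in for a Virtual Interface handle */
typedef struct { void *addr; size_t len; } via_mr_t; /* registered memory */

/* --- Set-up phase: the kernel is involved once per resource ------ */
static via_vi_t via_create_vi(const char *peer) {            /* stub */
    printf("kernel: created protected VI to %s\n", peer);
    return 1;
}
static via_mr_t via_register_memory(void *buf, size_t len) { /* stub */
    printf("kernel: pinned and registered %zu bytes\n", len);
    via_mr_t mr = { buf, len };
    return mr;
}

/* --- Data-transfer phase: user level only, no system calls ------- */
static void via_post_send(via_vi_t vi, via_mr_t *mr) {       /* stub */
    printf("user: posted %zu-byte send descriptor to NIC (VI %d)\n",
           mr->len, vi);
}

int main(void) {
    static char buf[4096];

    /* The kernel sets up the channel and registers memory once... */
    via_vi_t vi = via_create_vi("node2");
    via_mr_t mr = via_register_memory(buf, sizeof buf);

    /* ...then every transfer bypasses it entirely. */
    strcpy(buf, "hello");
    via_post_send(vi, &mr);
    return 0;
}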
InfiniBand - Introduction
- System bus technologies are beginning to reach their limits in terms of speed.
- Common PCI buses can only support up to 133 MBytes/s across all PCI slots, and even with the 64-bit, 66 MHz buses available in high-end PC servers, 533 MBytes/s of shared bandwidth is the most a user can hope for.
InfiniBand - Introduction
- To counter this, a new standard based on switched serial links to device groups and devices is currently in development.
- Called InfiniBand, the standard is actually a merged proposal from two earlier groups: Next Generation I/O (NGIO), led by Intel, Microsoft, and Sun; and Future I/O, supported by Compaq, IBM, and Hewlett-Packard.
InfiniBand - Performance
- A single InfiniBand link operates at 2.5 Gbps, point-to-point in a single direction.
- Bi-directional links offer twice the throughput and can be aggregated into larger pipes of 1 GByte/s (four co-joined links) or 3 GBytes/s (12 links): with 8b/10b encoding, each 2.5 Gbps link carries 2 Gbps of data, so four links deliver 8 Gbps = 1 GByte/s.
- Higher aggregations of links will be possible in the future.
Active Networks
- Traditionally, the function of a network has been to deliver packets from one end-point to another.
- Processing within the network has been limited largely to routing, simple QoS schemes, and congestion control.
- There is considerable interest in pushing other kinds of processing into the network.
- Examples include the transport-level support for wireless links of snoop-TCP, and the application-specific filtering of network firewalls.
Active Networks
- Active networks take this trend to the extreme.
- They allow servers and clients to inject customised programs into the nodes of the network, thus interposing application-specified computation between communicating end-points.
- In this manner, the entire network may be treated as part of the overall system that can be specialised to achieve application efficiency.
Recap
- This whistle-stop tour looked at some of the existing and emerging network technologies that are being used with current clusters.
- Hardware is more advanced than the software that comes with it.
- Software is starting to catch up: VIA and Active Networks are providing the performance and functionality that today's sophisticated applications require.
Single System Image (SSI)
- SSI is the illusion, created by software or hardware, that presents a collection of computing resources as one whole, unified resource.
- SSI makes the cluster appear like a single machine to the user, to applications, and to the network.
Benefits of SSI
- Use of system resources is transparent.
- Transparent process migration and load balancing across nodes.
- Potentially improved reliability and higher availability, system response time and performance.
- Simplified system management.
- Reduction in the risk of operator errors.
- No need to be aware of the underlying system architecture to use these machines effectively.
Desired SSI Services
- Single entry point
  - telnet cluster.my_institute.edu (rather than telnet node1.cluster.my_institute.edu).
- Single file hierarchy: /proc, NFS, xFS, AFS, etc.
- Single control point: management GUI.
- Single memory space - Network RAM/DSM.
- Single job management: Codine, LSF.
- Single GUI: like a workstation/PC windowing environment (it may be Web technology).
Cluster Tools - Introduction
- It is essential that there are numerical libraries and programming tools available to application developers and system maintainers.
- Clusters present a different software environment on which to build libraries and applications, and require a new level of flexibility in the algorithms to achieve adequate levels of performance.
- Many advances and developments in the creation of parallel code and tools for distributed-memory and SMP-based machines.
Cluster Tools
- In most cases, MPI-based libraries and tools will operate on a cluster, but they may not achieve an acceptable level of efficiency or effectiveness on clusters that comprise SMP nodes.
- Little software exists that offers the mixed-mode parallelism of distributed SMPs.
- Need to consider how to create more effective libraries and tools for these hybrid clusters.
Cluster Tools
- A major recent architectural innovation is clusters of shared-memory multi-processors, referred to as Constellations - the architecture of the ASCI machines, and they promise to be the fastest general-purpose machines available for the next few years.
- It is the depth of the memory hierarchy, with its different access primitives and costs at each level, that makes Constellations more challenging to design and use effectively than their SMP and MPP predecessors.
Cluster Tools
- Users need a uniform programming environment that can be used across uni-processors, SMPs, MPPs, and Constellations.
- Currently, each type of machine has a completely different programming model
  - SMPs have dynamic thread libraries with communication through shared memory
  - MPPs have SPMD parallelism with message-passing communication (e.g., MPI)
  - Constellations have the union of these two models, requiring that the user write two different parallel programs for a single application (see the sketch below).
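- As a hedged illustration of that mixed-mode union (not from the talk), the C sketch below combines MPI message passing across nodes with OpenMP threading within each SMP node to sum a distributed array; the file name and array size are arbitrary.

/* Hybrid MPI + OpenMP sketch: MPI ranks span the Constellation's nodes,
 * while OpenMP threads exploit the shared memory within each SMP node.
 * Build (e.g.): mpicc -fopenmp hybrid.c -o hybrid
 */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 1000000          /* elements per MPI rank (arbitrary) */

int main(int argc, char **argv)
{
    int rank;
    static double x[N];
    double local = 0.0, total = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Shared-memory parallelism inside the node: OpenMP threads. */
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < N; i++) {
        x[i] = 1.0;        /* stand-in for real per-node work */
        local += x[i];
    }

    /* Message passing between nodes: MPI reduction to rank 0. */
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("global sum = %.0f\n", total);

    MPI_Finalize();
    return 0;
}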
System Tools
- The lack of good management tools represents a hidden operation cost that is often overlooked.
- There are numerous tools and techniques available for the administration of clusters, but few of these tools ever see the outside of their developers' cluster - basically, they are developed for specific in-house tasks.
- This results in a great deal of duplicated effort among cluster administrators and software developers.
System Tools
- The tools should provide the look and feel of commands issued to a single machine.
- This is accomplished by using lists, or configuration files, to represent the group of machines on which a command will operate (see the sketch below).
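- To make the idea concrete, here is a hedged sketch of such a tool (hypothetical, not an existing utility): a small C program that reads hostnames from a node-list file and runs the same command on every node via ssh, giving a single-machine look and feel. The tool and file names are invented for illustration.

/* Hypothetical cluster-wide command runner: for each hostname in a
 * config file, fork and exec "ssh <host> <cmd>", then wait for all.
 * Usage: ./crun nodes.conf "uptime"
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <node-list> <command>\n", argv[0]);
        return 1;
    }

    FILE *fp = fopen(argv[1], "r");
    if (!fp) { perror(argv[1]); return 1; }

    char host[256];
    int started = 0;
    while (fgets(host, sizeof host, fp)) {
        host[strcspn(host, "\r\n")] = '\0';   /* strip newline */
        if (host[0] == '\0' || host[0] == '#') continue;

        if (fork() == 0) {                    /* child: handle one node */
            execlp("ssh", "ssh", host, argv[2], (char *)NULL);
            perror("ssh");                    /* only reached on failure */
            _exit(127);
        }
        started++;
    }
    fclose(fp);

    while (started-- > 0)                     /* reap all children */
        wait(NULL);
    return 0;
}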
System Tools - Security
- Security inside a cluster, between cluster nodes, is somewhat relaxed for a number of practical reasons.
- Some of these include
  - improved performance
  - ease of programming
  - the fact that all nodes are generally compromised if one cluster node's security is compromised.
- Thus, security from outside the cluster into the cluster is of utmost concern.
System Tools - Scalability
- A user may tolerate an inefficient tool that takes minutes to perform an operation across a cluster of 8 machines, as it is still faster than performing the operation manually 8 times.
- However, that user will most likely find it intolerable to wait over an hour for the same operation to take effect across 128 cluster nodes.
- A further complication is federated clusters - extending even further to wide-area administration.
System Tools - Some Areas
- Move disk images from an image server to clients.
- Copy/move/remove client files.
- Build a bootable diskette to initially boot a new cluster node prior to installation.
- Secure shell (ssh).
- Cluster-wide ps - manipulate cluster-wide processes.
- DHCP, used to allocate IP addresses to machines on a given network - lease an IP to a node.
- Shutdown/reboot individual nodes.
Towards the Future
- 2 Gflop/s peak processors.
- $1000 per processor (already there!).
- 1 Gbps at < $250 per port.
- New backplane performance - InfiniBand!
- Light-weight communications, < 10 µs latency (VIA).
- Optimised math libraries.
- 1 GByte of main memory per node.
- 24 GBytes of disk storage per node.
- De facto standardised middleware.
Million $ Tflop/s
- Today, $3M per peak Tflop/s.
- Before year 2002, $1M per peak Tflop/s.
- Performance efficiency is a serious challenge.
- System integration
  - does vendor support of massive parallelism have to mean massive markup?
- System administration - boring but necessary.
- Maintenance without vendors - how?
- New kind of vendors for support.
- Heterogeneity will become a major aspect.
Summary of Immediate Challenges
- There are more costs than recurring costs.
- A higher level of in-house expertise is required.
- Software environments lag behind vendor offerings.
- Tightly coupled systems are easier to exploit in some cases.
- The Linux model of development scares people.
- Not yet for everyone.
- PC clusters have not achieved maturity.
Conclusions - Future Technology Trends
- Systems On a Chip (SOC) - new Transputers!
- Multi-GHz processors.
- 64-bit processors and applications.
- Gbit DRAM.
- Micro-disks on a board.
- Optical fibre and wave-division multiplexing.
- Very high bandwidth back-planes.
- Low-latency/high-bandwidth COTS switches.
- SMP on a chip.
- Processor In Memory (PIM).
The Future
- Common standards and Open Source software.
- Better
  - tools, utilities, and libraries
  - design with minimal risk to accepted standards.
- Higher degree of portability (standards).
- Wider range and scope of HPC applications.
- Wider acceptance of HPC technologies and techniques in commerce and industry.
- Emerging GRID-based environments.
Ending
- I would like to thank
  - Thomas Sterling for the use of some of the materials used.
- I recommend you monitor TFCC activities: http://www.ieeetfcc.org
- Join TFCC's mailing list.
- Send me a reference to your projects.
- Join in TFCC's efforts (sponsorship, organising meetings, contributing to publications).
- White paper - constructive comments please!
IEEE Computer Society
- Task Force on Cluster Computing (TFCC)
- http://www.ieeetfcc.org