Title: Building a Beowulf: My Perspective and Experience
1. Building a Beowulf: My Perspective and Experience
- Ron Choy
- Lab. for Computer Science
- MIT
2. Outline
- History/Introduction
- Hardware aspects
- Software aspects
- Our class Beowulf
3. The Beginning
- Thomas Sterling and Donald Becker, CESDIS, Goddard Space Flight Center, Greenbelt, MD
- Summer 1994: built an experimental cluster
- Called their cluster Beowulf
4. The First Beowulf
- 16 x 486DX4, 100MHz processors
- 16MB of RAM each, 256MB in total
- Channel bonded Ethernet (2 x 10Mbps)
- Not that different from our Beowulf
5. The First Beowulf (2)
6. Current Beowulfs
- Faster processors, faster interconnect, but the idea remains the same
- Cluster database: http://clusters.top500.org/db/Query.php3
- Top cluster: 1.433 TFLOPS peak
7. Current Beowulfs (2)
8. What is a Beowulf?
- Massively parallel computer built out of COTS
- Runs a free operating system (not Wolfpack, MSCS)
- Connected by high speed interconnect
- Compute nodes are dedicated (not a Network of Workstations)
9. Why Beowulf?
- It's cheap!
- Our Beowulf: 18 processors, 9GB RAM, $15,000
- A Sun Enterprise 250 Server: 2 processors, 2GB RAM, $16,000
- Everything in a Beowulf is open source and open standard - easier to manage/upgrade
10. Essential Components of a Beowulf
- Processors
- Memory
- Interconnect
- Software
11. Processors
- Major vendors: AMD, Intel
- AMD Athlon MP
- Intel Pentium 4
12. Comparisons
- Athlon MPs have more FPUs (3) and a higher peak FLOP rate
- The P4 with the highest clock rate (2.2GHz) beats out the Athlon MP with the highest clock rate (1.733GHz) in real FLOP rate
- Athlon MPs have a higher real FLOP rate per dollar, hence they are more popular
13. Comparisons (2)
- P4 supports the SSE2 instruction set, which performs SIMD operations on double precision data (2 x 64-bit) - see the sketch below
- Athlon MP supports only SSE, for single precision data (4 x 32-bit)
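To make the SSE vs. SSE2 point concrete, here is a minimal sketch (not from the slides) of what the double-precision SIMD path looks like in C. The function name is made up for illustration; it assumes an x86 compiler that provides the <emmintrin.h> intrinsics (e.g. gcc with -msse2), which is exactly what an Athlon MP of that era cannot execute.

    /* Hypothetical sketch: adding two arrays with SSE2 packed-double
     * intrinsics, as supported by the Pentium 4.  The Athlon MP is limited
     * to SSE, i.e. 4 x 32-bit single precision, for this kind of trick. */
    #include <emmintrin.h>

    void add_doubles_sse2(double *a, double *b, double *c, int n)
    {
        int i;
        /* Two 64-bit doubles per instruction (2 x 64-bit = 128 bits). */
        for (i = 0; i + 2 <= n; i += 2) {
            __m128d va = _mm_loadu_pd(&a[i]);
            __m128d vb = _mm_loadu_pd(&b[i]);
            _mm_storeu_pd(&c[i], _mm_add_pd(va, vb));
        }
        for (; i < n; i++)          /* scalar cleanup for odd n */
            c[i] = a[i] + b[i];
    }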
14. Memory
- DDR RAM (double data rate): used mainly by Athlons; P4s can use it as well
- RDRAM (Rambus DRAM): used by P4s
15. Memory Bandwidth
- Good summary: http://www6.tomshardware.com/mainboard/02q1/020311/sis645dx-03.html
- DDR beats out RDRAM in bandwidth, and is also cheaper
16. Interconnect
- The most important component
- Factors to consider:
- Bandwidth
- Latency
- Price
- Software support
17. Ethernet
- Relatively inexpensive, reasonably fast, and very popular
- Developed by Bob Metcalfe and D.R. Boggs at Xerox PARC
- A variety of flavors (10Mbps, 100Mbps, 1Gbps)
18. Pictures of Ethernet Devices
19. Myrinet
- Developed by Myricom
- OS bypass: the network card talks directly to host processes
- Proprietary, but very popular because of its low latency and high bandwidth
- Usually used in high-end clusters
20. Myrinet pictures
21. Comparison

              Fast Ethernet    Gigabit Ethernet    Myrinet
  Latency     120 µs           120 µs              7 µs
  Bandwidth   100Mbps peak     1Gbps peak          1.98Gbps real
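A rough way to compare these numbers is the standard first-order model: transfer time ≈ latency + message size / bandwidth. The sketch below plugs in the table's figures for an arbitrarily chosen 8 KB message; it is an approximation, not a benchmark, and ignores protocol overhead.

    /* First-order model of point-to-point message time:
     *   time = latency + message_size / bandwidth
     * Latency and bandwidth values come from the table above; the 8 KB
     * message size is just an example. */
    #include <stdio.h>

    int main(void)
    {
        const char  *name[]      = { "Fast Ethernet", "Gigabit Ethernet", "Myrinet" };
        const double latency_s[] = { 120e-6, 120e-6, 7e-6 };     /* seconds  */
        const double bw_bps[]    = { 100e6,  1e9,    1.98e9 };   /* bits/sec */
        const double msg_bytes   = 8192.0;

        for (int i = 0; i < 3; i++) {
            double t = latency_s[i] + (msg_bytes * 8.0) / bw_bps[i];
            printf("%-18s about %6.1f microseconds for an 8 KB message\n",
                   name[i], t * 1e6);
        }
        return 0;
    }

Under this model Myrinet wins on both small messages (latency-bound) and large ones (bandwidth-bound), which is why it dominates in latency-sensitive codes.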
22. Cost Comparison
- To equip our Beowulf with:
- Fast Ethernet: $1,700
- Gigabit Ethernet: $5,600
- Myrinet: $17,300
23. How to choose?
- Depends on your application!
- Requires really low latency, e.g. QCD? Myrinet
- Requires high bandwidth and can live with higher latency, e.g. ScaLAPACK? Gigabit Ethernet
- Embarrassingly parallel? Anything
24. What would you gain from a fast interconnect?
- Our cluster: single Fast Ethernet (100Mbps)
- 36.8 GFLOPS peak, HPL 12 GFLOPS
- 32.6% efficiency
- GALAXY: Gigabit Ethernet
- 20 GFLOPS peak, HPL 7 GFLOPS
- 35% efficiency (old, slow TCP/IP stack!)
- HELICS: Myrinet 2000
- 1.4 TFLOPS peak, HPL 864 GFLOPS
- 61.7% efficiency
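The efficiency figures above are simply the measured HPL result divided by the theoretical peak. A tiny sketch of that arithmetic, using only the numbers on this slide:

    /* Efficiency = (HPL result) / (theoretical peak), e.g. 12 / 36.8 = 32.6%. */
    #include <stdio.h>

    int main(void)
    {
        const char  *cluster[] = { "Our cluster", "GALAXY", "HELICS" };
        const double peak[]    = { 36.8, 20.0, 1400.0 };   /* GFLOPS peak     */
        const double hpl[]     = { 12.0,  7.0,  864.0 };   /* GFLOPS from HPL */

        for (int i = 0; i < 3; i++)
            printf("%-12s efficiency = %.1f%%\n", cluster[i],
                   100.0 * hpl[i] / peak[i]);
        return 0;
    }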
25. My experience with hardware
- How long did it take for me to assemble the 9 machines? 8 hours, nonstop
26. Real issue 1 - space
- Getting a Beowulf is great, but do you have the space to put it?
- Often space is at a premium, and a Beowulf is not as dense as traditional supercomputers
- Rackmount? Extra cost! e.g. cabinet $1,500, case for one node $400
27. Real issue 2 - heat management
- The nodes, with all the high powered processors and network cards, run hot
- Especially true for Athlons - they can reach 60°C
- If not properly managed, the heat can cause crashes or even hardware damage!
- Heatsinks/fans - remember to put them in facing the right direction
28. Real issue 3 - power
- Do you have enough power in your room?
- UPS? Surge protection?
- You don't want a thunderstorm to fry your Beowulf!
- In our case we have a managed machine room - lucky
29. Real issue 4 - noise
- Beowulfs are loud. Really loud.
- You don't want it on your desktop. Bad idea.
30. Real issue 5 - cables
- Color-code your cables!
31. Software
- We'll concentrate on the cluster management core
- Three choices:
- Vanilla Linux/FreeBSD
- Free cluster management software (a very patched up Linux)
- Commercial cluster management software (a very, very patched up Linux)
32. The issues
- Beowulfs can get very large (100s of nodes)
- Compute nodes should set themselves up automatically
- Software updates must be automated across all the nodes
- Software coherency is an issue
33. Vanilla Linux
- Most customizable, easiest to make changes
- Easiest to patch
- Harder for someone else to inherit the cluster - a real issue
- Need to know a lot about Linux to set it up properly
34. Free cluster management software
- Oscar: http://oscar.sourceforge.net/
- Rocks: http://rocks.npaci.edu
- MOSIX: http://www.mosix.org/
- (Usually patched) Linux that comes with software for cluster management
- Dramatically reduces the time needed to get things up and running
- Open source, but if something breaks, you have one more piece of software to hack
35. Commercial cluster management
- Scyld (www.scyld.com) - founded by Donald Becker
- Scyld: the father of Beowulf
- Sells a heavily patched Linux distribution for clustering; a free version is available but old
- Based on bProc, which is similar to MOSIX
36. My experience/opinions
- I chose Rocks because I needed the Beowulf up fast, and it's the first cluster management software I came across
- It was a breeze to set up
- But now the pain begins - severe lack of documentation
- I will reinstall everything after the semester is over
37. Software (cont'd)
- Note that I skipped a lot of details, e.g. file system choice (NFS? PVFS?), MPI choice (MPICH? LAM?), and libraries to install - a minimal MPI example is sketched below
- I could talk forever about Beowulfs, but it won't fit in one lecture
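Since the MPI choice came up: MPICH and LAM/MPI both implement the MPI standard, so a minimal program looks the same under either. The build and launch commands below (mpicc, mpirun) are the typical ones but depend on the installation.

    /* Minimal MPI "hello" - the same source compiles under MPICH or LAM/MPI.
     * Typically built with mpicc and launched with mpirun -np <nprocs>.
     * Shown only to illustrate that the MPI choice does not change
     * application code. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id         */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
        printf("Hello from process %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }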
38. Recipe we used for our Beowulf
- Ingredients: $15,000, 3 x 6-packs of Coke, 1 grad student
- 1. Web surf for 1 week, trying to focus on the Beowulf sites; decide on hardware
- 2. Spend 2 days filling in various forms for purchasing and obtaining competitive quotes
- 3. Wait 5 days for the hardware to arrive; meanwhile web surf some more, and enjoy the last few days of free time in a while
39. Recipe (cont'd)
- 4. Lock grad student, hardware (not money), and Coke in an office. Ignore screams. The hardware should be ready after 8 hours.
Office of the future
40. Recipe (cont'd 2)
- 5. Move grad student and hardware to their final destination. By this time the grad student will be emotionally attached to the hardware. This is normal. Have the grad student set up the software. This will take 2 weeks.
41. Our Beowulf
42. Things I would have done differently
- No Rocks; try Oscar or maybe vanilla Linux
- Color-code the cables!
- Try a diskless setup (saves on cost)
- Get a rackmount
43. Design a $30,000 Beowulf
- One node (2 processors, 1GB RAM) costs $1,400, with 4.6 GFLOPS peak
- Should we get:
- 16 nodes, with Fast Ethernet, or
- 8 nodes, with Myrinet?
44. Design (cont'd)
- 16 nodes with Fast Ethernet:
- 73.6 GFLOPS peak
- 23.99 GFLOPS real (using the efficiency of our cluster - see the sketch below)
- 16 GB of RAM
- 8 nodes with Myrinet:
- 36.8 GFLOPS peak
- 22.7 GFLOPS real (using the efficiency of HELICS)
- 8 GB of RAM
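For the record, the projections above are just peak = nodes x 4.6 GFLOPS per node, scaled by an efficiency borrowed from a measured cluster (32.6% for the Fast Ethernet case, 61.7% from HELICS for the Myrinet case). A small sketch reproducing them:

    /* Reproduces the design projections: peak = nodes x 4.6 GFLOPS,
     * "real" = peak x a borrowed HPL efficiency (0.326 or 0.617). */
    #include <stdio.h>

    int main(void)
    {
        const double gflops_per_node = 4.6;

        double peak16 = 16 * gflops_per_node;   /* Fast Ethernet option */
        double peak8  =  8 * gflops_per_node;   /* Myrinet option       */

        printf("16 nodes, Fast Ethernet: %.1f peak, %.2f real GFLOPS\n",
               peak16, peak16 * 0.326);
        printf(" 8 nodes, Myrinet:       %.1f peak, %.2f real GFLOPS\n",
               peak8,  peak8  * 0.617);
        return 0;
    }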
45. Design (cont'd 2)
- The first choice is good if you work on linear algebra applications and require lots of memory
- The second choice is more general purpose