1. Linux Clusters for High-Performance Computing
- Jim Phillips and Tim Skirvin
- Theoretical and Computational Biophysics
- Beckman Institute
2. HPC vs. High-Availability
- There are two major types of Linux clusters:
- High-Performance Computing
- Multiple computers running a single job for increased performance
- High-Availability
- Multiple computers running the same job for increased reliability
- We will be talking about the former!
3. Why Clusters?
- Cheap alternative to big iron
- Local development platform for big iron code
- Built to task (buy only what you need)
- Built from COTS components
- Runs COTS software (Linux/MPI)
- Lower yearly maintenance costs
- Single failure does not take down entire facility
- Re-deploy as desktops or throw away
4. Why Not Clusters?
- Non-parallelizable or tightly coupled application
- Cost of porting large existing codebase too high
- No source code for application
- No local expertise (don't know Unix)
- No vendor hand holding
- Massive I/O or memory requirements
5. Know Your Users
- Who are you building the cluster for?
- Yourself and two grad students?
- Yourself and twenty grad students?
- Your entire department or university?
- Are they clueless, competitive, or malicious?
- How will you allocate resources among them?
- Will they expect an existing infrastructure?
- How well will they tolerate system downtimes?
6. Your Users' Goals
- Do you want increased throughput?
- Large number of queued serial jobs.
- Standard applications, no changes needed.
- Or decreased turnaround time?
- Small number of highly parallel jobs.
- Parallelized applications, changes required.
7. Your Application
- The best benchmark for making decisions is your application running your dataset.
- Designing a cluster is about trade-offs.
- Your application determines your choices.
- No supercomputer runs everything well either.
- Never buy hardware until the application is
parallelized, ported, tested, and debugged.
8. Your Application: Parallel Performance
- How much memory per node?
- How would it scale on an ideal machine?
- How is scaling affected by (see the ping-pong sketch after this list)
- Latency (time needed for small messages)?
- Bandwidth (time per byte for large messages)?
- Multiprocessor nodes?
- How fast do you need to run?
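- The latency and bandwidth questions above are usually answered with a ping-pong microbenchmark between two nodes. A minimal sketch in C with MPI follows; it is illustrative only (the message size and repetition count are arbitrary choices, not numbers from this talk) and assumes an MPI installation with at least two ranks placed on separate nodes.

    /* Rough MPI ping-pong sketch: time round trips between ranks 0 and 1.
       Small messages estimate latency; large messages estimate bandwidth.
       Illustrative only -- run with at least 2 ranks, one per node. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        int rank;
        int reps = 1000;                              /* arbitrary repetition count */
        int bytes = (argc > 1) ? atoi(argv[1]) : 1;   /* 1 byte probes latency */
        char *buf = calloc(bytes, 1);

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double rtt = (MPI_Wtime() - t0) / reps;       /* average round-trip time */

        if (rank == 0) {
            printf("one-way latency ~ %.1f us\n", 0.5e6 * rtt);
            printf("bandwidth       ~ %.1f MB/s\n", 2.0 * bytes / rtt / 1e6);
        }
        free(buf);
        MPI_Finalize();
        return 0;
    }

- Run it once with a 1-byte message and once with a large (e.g. 1 MB) message to get rough latency and bandwidth figures for a candidate interconnect.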
9. Budget
- Figure out how much money you have to spend.
- Don't spend money on problems you won't have.
- Design the system to just run your application.
- Never solve problems you can't afford to have.
- Fast network on 20 nodes or slower on 100?
- Don't buy the hardware until
- The application is ported, tested, and debugged.
- The science is ready to run.
10. Environment
- The cluster needs somewhere to live.
- You won't want it in your office.
- Not even in your grad student's office.
- Cluster needs
- Space (keep the fire marshal happy).
- Power
- Cooling
11. Environment: Power
- Make sure you have enough power.
- Kill-A-Watt
- $30 at ThinkGeek
- 1.3 GHz Athlon draws 183 VA at full load
- Newer systems draw more; measure for yourself!
- More efficient power supplies help
- Wall circuits typically supply about 20 Amps
- Around 12 PCs @ 183 VA max (8-10 for safety)
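- As a rough worked example (assuming a 110-120 V wall circuit, the usual US case; check your own building):
    20 A x 110-120 V = roughly 2,200-2,400 VA per circuit
    2,200 VA / 183 VA per PC = about 12 PCs at the absolute maximum
    Derating to ~80% of the rating for continuous load gives the 8-10 PCs-per-circuit figure.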
12. Environment: Power Factor
- More efficient power supplies do help!
- Always test your power under load.
- W = V x A x PF
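- A short worked example of why power factor matters (the 150 W load and PF values below are illustrative, not measurements from this talk): circuits and UPSes are rated in VA, so a node that needs 150 W of real power draws
    150 W / 0.70 PF = ~214 VA with an old, uncorrected supply
    150 W / 0.98 PF = ~153 VA with a power-factor-corrected supply
  so the better supply lets noticeably more nodes share the same circuit or UPS.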
13. Environment: Uninterruptible Power
- 5 kVA UPS ($3,000)
- Holds 24 PCs @ 183 VA (safely; see the arithmetic check below)
- Will need to work out building power to them
- May not need UPS for all systems, just root node
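- Quick arithmetic check on the 24-PC figure: 24 x 183 VA = 4,392 VA, just under the UPS's 5,000 VA rating (assuming the nominal rating is usable continuously; follow the manufacturer's own derating guidance).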
14. Environment: Cooling
- Building AC will only get you so far
- Make sure you have enough cooling.
- One PC @ 183 VA puts out roughly 600 BTU/hr of heat.
- 1 ton of AC = 12,000 BTU/hr, or about 3,500 Watts
- Can run 20 CPUs per ton of AC
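- The 20-per-ton figure follows directly, assuming the full electrical load ends up as heat (1 W is about 3.4 BTU/hr):
    183 VA x ~3.4 BTU/hr per Watt = roughly 600 BTU/hr per PC
    12,000 BTU/hr per ton / 600 BTU/hr per PC = ~20 PCs per ton of AC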
15. Hardware
- Many important decisions to make
- Keep application performance, users, environment, local expertise, and budget in mind.
- An exercise in systems integration: making many separate components work well as a unit.
- A reliable but slightly slower cluster is better than a fast but non-functioning cluster.
- Always benchmark a demo system first!
16. Hardware: Networking
- Two main options
- Gigabit Ethernet: cheap ($100-200/node), universally supported and tested, cheap commodity switches up to 48 ports.
- 24-port switches seem the best bang-for-buck
- Special interconnects
- Myrinet: very expensive (thousands per node), very low latency, logarithmic cost model for very large clusters.
- InfiniBand: similar, less common, not as well supported.
17. Hardware: Other Components
- Filtered Power (Isobar, Data Shield, etc.)
- Network Cables: buy good ones, you'll save debugging time later
- If a cable is at all questionable, throw it away!
- Power Cables
- Monitor
- Video/Keyboard Cables
18. User Rules of Thumb
- 1-4 users
- Yes, you still want a queueing system.
- Plan ahead to avoid idle time and conflicts.
- 5-20 users
- Put one person in charge of running things.
- Work out a fair-share or reservation system.
- > 20 users
- User documentation and examples are essential.
- Decide who makes resource allocation decisions.
19. Application Rules of Thumb
- 1-2 programs
- Don't pay for anything you won't use.
- Benchmark, benchmark, benchmark!
- Be sure to use your typical data.
- Try different compilers and compiler options.
- > 2 programs
- Select the most standard OS environment.
- Benchmark those that will run the most.
- Consider a specialized cluster for dominant apps
only.
20. Parallelization Rules of Thumb
- Throughput is easy: the app runs as is.
- Turnaround is not.
- Parallel speedup is limited by (see the formula after this list)
- Time spent in non-parallel code.
- Time spent waiting for data from the network.
- Improve serial performance first
- Profile to find most time-consuming functions.
- Try new algorithms, libraries, hand tuning.
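- The first limit above is Amdahl's law. A minimal statement, writing s for the fraction of runtime that stays serial and N for the number of processors:
    speedup(N) = 1 / (s + (1 - s)/N), which can never exceed 1/s
- Example: with s = 0.10 (10% serial), speedup tops out at 10x no matter how many nodes you buy.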
21. Some Details Matter More
- What limiting factor do you hit first?
- Budget?
- Space, power, and cooling?
- Network speed?
- Memory speed?
- Processor speed?
- Expertise?
22. Limited by Budget
- Don't waste money solving problems you can't afford to have right now.
- Regular PCs on shelves (rolling carts)
- Gigabit networking and multiple jobs
- Benchmark performance per dollar.
- The last dollar you spend should be on whatever improves your performance.
- Ask for equipment funds in proposals!
23. Limited by Space
- Benchmark performance per rack
- Consider all combinations of
- Rackmount nodes
- More expensive but no performance loss
- Dual-processor nodes
- Less memory bandwidth per processor
- Dual-core processors
- Less memory bandwidth per core
24. Limited by Power/Cooling
- Benchmark performance per Watt
- Consider
- Opteron or PowerPC rather than Xeon
- Dual-processor nodes
- Dual-core processors
25. Limited by Network Speed
- Benchmark your code at NCSA.
- 10,000 CPU-hours is easy to get.
- Try running one process per node.
- If that works, buy single-processor nodes.
- Try Myrinet.
- If that works, can you run at NCSA?
- Can you run more, smaller jobs?
26. Limited by Serial Performance
- Is it memory performance? Try
- Single-core Opterons
- Single-processor nodes
- Larger cache CPUs
- Lower clock speed CPUs
- Is it really the processor itself? Try
- Higher clock speed CPUs
- Dual-core CPUs
27. Limited by Expertise
- There is no substitute for a local expert.
- Qualifications
- Comfortable with the Unix command line.
- Comfortable with Linux administration.
- Cluster experience if you can get it.
28. System Software
- Linux is just a starting point.
- Operating system
- Libraries: message passing, numerical
- Compilers
- Queuing Systems
- Performance
- Stability
- System security
- Existing infrastructure considerations
29. Scyld Beowulf / Clustermatic
- Single front-end master node
- Fully operational normal Linux installation.
- Bproc patches incorporate slave nodes.
- Severely restricted slave nodes
- Minimum installation, downloaded at boot.
- No daemons, users, logins, scripts, etc.
- No access to NFS servers except for master.
- Highly secure slave nodes as a result
30. Oscar/ROCKS
- Each node is a full Linux install
- Offers access to a file system.
- Software tools help manage these large numbers of machines.
- Still more complicated than only maintaining one master node.
- Better suited for running multiple jobs on a single cluster, vs. one job on the whole cluster.
31. System Software: Compilers
- No point in buying fast hardware just to run poorly performing executables
- Good compilers might provide a 50-150% performance improvement
- May be cheaper to buy a $2,500 compiler license than to buy more compute nodes
- Benchmark your real application with the compiler; get an eval compiler license if necessary
32. System Software: Message Passing Libraries
- Usually dictated by application code
- Choose something that will work well with hardware, OS, and application
- User-space message passing?
- MPI: industry standard, many implementations by many vendors, as well as several free implementations
- Others: Charm++, BIP, Fast Messages
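- To make the portability point concrete, a minimal MPI program is sketched below (not from this talk); the same standard MPI source builds unchanged against free implementations such as MPICH or LAM/MPI, or a vendor MPI.

    /* Minimal MPI example: each rank reports in, then all ranks sum their IDs. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size, sum;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("hello from rank %d of %d\n", rank, size);

        /* Collective operation: add up all rank numbers, result on every node. */
        MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum of ranks = %d\n", sum);

        MPI_Finalize();
        return 0;
    }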
33. System Software: Numerical Libraries
- Can provide a huge performance boost over Numerical Recipes or in-house routines
- Typically hand-optimized for each platform
- When applications spend a large fraction of runtime in library code, it pays to buy a license for a highly tuned library
- Examples: BLAS, FFTW, Interval libraries
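- As a sketch of why a tuned library pays off, the same matrix multiply is shown below as a naive triple loop and as one call into a BLAS with the C interface (e.g. ATLAS); the 1000x1000 size is an arbitrary illustration, not a benchmark from this talk.

    /* Naive triple loop vs. tuned BLAS dgemm for C = A * B (n x n doubles). */
    #include <cblas.h>   /* assumes a BLAS providing the CBLAS interface */
    #include <stdlib.h>

    void naive_dgemm(int n, const double *A, const double *B, double *C) {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                double s = 0.0;
                for (int k = 0; k < n; k++)
                    s += A[i * n + k] * B[k * n + j];
                C[i * n + j] = s;
            }
    }

    int main(void) {
        int n = 1000;
        double *A = calloc((size_t)n * n, sizeof(double));
        double *B = calloc((size_t)n * n, sizeof(double));
        double *C = calloc((size_t)n * n, sizeof(double));

        naive_dgemm(n, A, B, C);   /* hand-written version */

        /* Same operation via the tuned library; typically several times faster. */
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    n, n, n, 1.0, A, n, B, n, 0.0, C, n);

        free(A); free(B); free(C);
        return 0;
    }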
34. System Software: Batch Queueing
- Clusters, although cheaper than big iron, are still expensive, so they should be efficiently utilized
- The use of a batch queueing system can keep a cluster running jobs 24/7
- Things to consider
- Allocation of sub-clusters?
- 1-CPU jobs on SMP nodes?
- Examples: Sun Grid Engine, PBS, LoadLeveler
35. System Software: Operating System
- Any annoying management or reliability issues get hugely multiplied in a cluster environment.
- Plan for security from the outset.
- Clusters have special needs; use something appropriate for the application and hardware.
36. System Software: Install It Yourself
- Don't use the vendor's pre-loaded OS.
- They would love to sell you 100 licenses.
- What happens when you have to reinstall?
- Do you like talking to tech support?
- Are those flashy graphics really useful?
- How many security holes are there?
37. Security Tips
- Restrict physical access to the cluster, if possible.
- Make sure you're involved in all tours, to make sure nobody touches anything.
- If you're on campus, put your clusters into the Fully Closed network group
- Might cause some limitations if you're trying to submit from off-site
- Will cause problems with GLOBUS
- The built-in firewall is your friend!
38. Purchasing Tips: Before You Begin
- Get your budget
- Work out the space, power, and cooling capacities of the room.
- Start talking to vendors early
- But don't commit!
- Don't fall in love with any one vendor until you've looked at them all.
39. Purchasing Tips: Design Notes
- Make sure to order some spare nodes
- Serial nodes and hot-swap spares
- Keep them running to make sure they work.
- If possible, install HDs only in the head node
- State law and UIUC policy require all hard drives to be wiped before disposal.
- It doesn't matter if the drive never stored anything!
- Each drive will take 8-10 hours to wipe.
- Save yourself a world of pain in a few years
- or just give your machines to some other campus
group, and make them worry about it.
40. Purchasing Tips: Get Local Service
- If a node dies, do you want to ship it?
- Two choices
- Local business (Champaign Computer)
- Major vendor (Sun)
- Ask others about responsiveness.
- Design your cluster so that you can still run
jobs if a couple of nodes are down.
41. Purchasing Tips: Dealing with Purchasing
- You will want to put the cluster order on a Purchase Order (PO)
- Do not pay for the cluster until it entirely works.
- Prepare a ten-point letter
- Necessary for all purchases > $25k.
- Examples are available from your business office (or bug us for our examples).
- These aren't difficult to write, but will probably be necessary.
42. Purchasing Tips: The Bid Process
- Any purchase > $28k must go up for bid
- Exception: sole-source vendors
- The number grows every year
- Adds a month or so to the purchase time
- If you can keep the numbers below the magic $28k, do it!
- The bid limit may be leverage for vendors to drop their prices just below the limit; plan accordingly.
- You will get lots of junk bids
- Be very specific about your requirements to keep them away!
43. Purchasing Tips: Working the Bid Process
- Use sole-source vendors where possible.
- This is a major reason why we buy from Sun.
- Check with your purchasing people.
- This won't help you get around the month-long delay, as the item still has to be posted.
- Purchase your clusters in small chunks
- Only works if you're looking at a relatively small cluster.
- Again, you may be able to use this as leverage with your vendor to lower their prices.
44. Purchasing Tips: Receiving Your Equipment
- Let Receiving know that the machines are coming.
- They will take up a lot of space on the loading dock.
- Working with them to save space will earn you good will (and faster turnaround).
- Take your machines out of Receiving's space as soon as reasonably possible.
45. Purchasing Tips: Consolidated Inventory
- Try to convince your Inventory workers to tag each cluster, and not each machine
- It's really going to be running as a cluster anyway (right?).
- This will make life easier on you.
- Repairs are easier when you don't have to worry about inventory stickers.
- This will make life easier for them.
- 3 items to track instead of 72
46. Purchasing Tips: Assembly
- Get extra help for assembly
- It's reasonably fun work
- as long as the assembly line goes fast.
- Demand pizza.
- Test the assembly instructions before you begin
- Nothing is more annoying than having to realign all of the rails after they're all screwed in.
47. Purchasing Tips: Testing and Benchmarking
- Test the cluster before you put it into production!
- Sample jobs, cpuburn
- Look at power consumption
- Test for dead nodes
- Remember: vendors make mistakes!
- Even their demo applications may not work; check for yourself.
48. Case Studies
- The best way to illustrate cluster design is to look at how somebody else has done it.
- The TCB Group has designed four separate Linux clusters in the last six years.
49. 2001 Case Study
- Users
- Many researchers with MD simulations
- Need to supplement time on supercomputers
- Application
- Not memory-bound, runs well on IA32
- Scales to 32 CPUs with 100Mbps Ethernet
- Scales to 100 CPUs with Myrinet
50. 2001 Case Study 2
- Budget
- Initially $20K, eventually grew to $100K
- Environment
- Full machine room, slowly clear out space
- Under-utilized 12kVA UPS, staff electrician
- 3 ton chilled water air conditioner (Liebert)
51. 2001 Case Study 3
- Hardware
- Fastest AMD Athlon CPUs available (1333 MHz).
- Fast CL2 SDRAM, but not DDR.
- Switched 100Mbps Ethernet, Intel EEPro cards.
- Small 40 GB hard drives and CD-ROMs.
- System Software
- Scyld clusters of 32 machines, 1 job/cluster.
- Existing DQS, NIS, NFS, etc. infrastructure.
52. 2003 Case Study
- What changed since 2001
- 50% increase in processor speed
- 50% increase in NAMD serial performance
- Improved stability of SMP Linux kernel
- Inexpensive gigabit cards and 24-port switches
- Nearly full machine room and power supply
- Popularity of compact form factor cases
- Emphasis on interactive MD of small systems
53. 2003 Case Study 2
- Budget
- Initially $65K, eventually grew to $100K
- Environment
- Same general machine room environment
- Additional machine room space is available in the server room
- Just switched to using rack-mount equipment
- Still using the old clusters; don't want to get rid of them entirely
- Need to be more space-conscious
54. 2003 Case Study 3
- Option 1
- Single processor, small form factor nodes.
- Hyperthreaded Pentium 4 processors.
- 32 bit 33 MHz gigabit network cards.
- 24 port gigabit switch (24-processor clusters).
- Problems
- No ECC memory.
- Limited network performance.
- Too small for next-generation video cards.
55. 2003 Case Study 4
- Final decision
- Dual Athlon MP 2600+ in normal cases.
- No hard drives or CD-ROMs.
- 64 bit 66 MHz gigabit network cards.
- 24 port gigabit switch (48-proc clusters).
- Clustermatic OS, boot slaves off of floppy.
- Floppies have proven very unreliable, especially when left in the drives.
- Benefits
- Server class hardware w/ ECC memory.
- Maximum processor count for large simulations.
- Maximum network bandwidth for small simulations.
56. 2003 Case Study 5
- Athlon clusters from 2001 recycled
- 36 nodes outfitted as desktops
- Added video cards, hard drives, extra RAM
- Cost: $300/machine
- Now dead or in 16-node Condor test cluster
- 32 nodes donated to another group
- Remaining nodes move to server room
- 16-node Clustermatic cluster (used by guests)
- 12 spares and build/test boxes for developers
57. 2004 Case Study
- What changed since 2003
- Technologically, not much!
- Space is more of an issue.
- A new machine room has been built for us.
- Vendors are desperate to sell systems at any
price.
58. 2004 Case Study 2
- Budget
- Initially $130K, eventually grew to $180K
- Environment
- New machine room will store the new clusters.
- Two five-ton Liebert air conditioners have been installed.
- There is minimal floor space, enough for four racks of equipment.
59. 2004 Case Study 3
- Final decision
- 72x Sun V60x rack-mount servers.
- Dual 3.06 GHz Intel processors (only slightly faster)
- 2 GB RAM, dual 36 GB HDs, DVD-ROM included in the deal
- Network-bootable gigabit Ethernet built in
- Significantly more stable than any old cluster machine
- 3x 24-port gigabit switches (3x 48-processor clusters)
- 6x serial nodes (identical to the above, also serve as spares)
- Sun Rack 900-38
- 26 systems per rack, plus switch and UPS for head nodes
- Clustermatic 4 on Red Hat 9
60. 2004 Case Study 4
- Benefits
- Improved stability over old clusters.
- Management is significantly easier with Sun servers than PC whiteboxes.
- Network booting of slaves allows lights-off management.
- Systems use up minimal floor space.
- Similar performance to 2003 allows all 6 clusters (3 old + 3 new) to take jobs from a single queue.
- Less likely to run out of memory when running an express queue job.
- Complete machines are easily retasked.
61. For More Information
- http://www.ks.uiuc.edu/Development/Computers/Cluster/
- http://www.ks.uiuc.edu/Training/Workshop/Clusters/
- We will be setting up a Clusters mailing list some time in the next week or two.
- We will also be setting up a Clusters User Group shortly, but that will take some more effort.