Title: Building Beowulfs for High Performance Computing
1. Building Beowulfs for High Performance Computing
- Duncan Grove
- Department of Computer Science
- University of Adelaide
2. Three Computational Paradigms
- Data Parallel
- Regular grid based problems
- Parallelising compilers, eg HPF
- Eg physicists running lattice gauge calculations
- Message Passing
- Unstructured parallel problems.
- MPI, PVM (see the minimal example after this slide)
- Eg chemists running molecular dynamics simulations.
- Task Farming
- High throughput computing - batch jobs
- Queuing systems
- Eg chemists running Gaussian.
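To make the message-passing bullet concrete, here is a minimal sketch in C, assuming an MPI implementation such as MPICH or LAM/MPI is installed (the program is my own illustration, not from the talk):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, msg, i;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id   */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processes */

    if (rank != 0) {
        /* workers: send our rank to the master */
        MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else {
        /* master: collect one message from every worker */
        for (i = 1; i < size; i++) {
            MPI_Recv(&msg, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
            printf("node %d checked in\n", msg);
        }
    }

    MPI_Finalize();
    return 0;
}

Built and launched with something like mpicc hello.c -o hello, then mpirun -np 16 hello.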
3. Anatomy of a Beowulf
- Cluster of networked PCs
- Intel PentiumII or Compaq Alpha
- Switched 100Mbit/s Ethernet or Myrinet
- Linux
- Parallel and batch software support
4. Why build Beowulfs?
- Science
- Some problems take lots of processing
- Many supercomputers are used as batch processing engines
- Traditional supercomputers are wasteful for high throughput computing
- Beowulfs
- Useful computational cycles at the lowest possible price
- Suited to high throughput computing
- Effective at an increasingly large set of parallel problems
5. A Brief Cluster History
- Caltech Prehistory
- Berkeley NOW
- NASA Beowulf
- Stone SouperComputer
- USQ Topcat
- UIUC NT Supercluster
- LANL Avalon
- SNL Cplant
- AU Perseus?
6. Beowulf Wishlist
- Single System Image (SSI)
- Unified process space
- Distributed shared memory
- Distributed file system
- Performance easily extensible
- Just add more bits
- Is fault tolerant
- Is simple to administer and use
7. Current Sophistication?
- Shrinkwrapped solutions or do-it-yourself
- Not much more than a nicely installed network of PCs
- A few kernel hacks to improve performance
- No magical software for making the cluster transparent to the user
- Queuing software and parallel programming software can create the appearance of a more unified machine
8. Stone SouperComputer
9. Iofor
- Learning platform
- Program development
- Simple benchmarking
- Simple performance evaluation of real applications
- Teaching machine
- Money lever
10. iMacwulf
- Student lab by day, Beowulf by night?
- MacOS with Appleseed
- LinuxPPC 4.0, soon LinuxPPC 5.0
- MacOS/X
11. Gigaflop harlotry
- Machine: cost, processors, peak speed
- Cray T3E: 10s of millions, 1084 processors, 1300 Gflop/s
- SGI Origin 2000: 10s of millions, 128 processors, 128 Gflop/s
- IBM SP2: 10s of millions, 512 processors, 400 Gflop/s
- Sun HPC: 1s of millions, 64 processors, 50 Gflop/s
- TMC CM5: 5 million (1992), 128 processors, 20 Gflop/s
- SGI PowerChallenge: 1 million (1995), 20 processors, 20 Gflop/s
- Beowulf cluster (Myrinet): 1 million, 256 processors, 120 Gflop/s
- Beowulf cluster: 300K, 256 processors, 120 Gflop/s
12. The obvious, but important
- In the past
- Commodity processors way behind supercomputer processors
- Commodity networks way, way, way behind supercomputer networks
- In the now
- Commodity processors only just behind supercomputer processors
- Commodity networks still way, way behind supercomputer networks
- More exotic networks still way behind supercomputer networks
- In the future
- Commodity processors will be supercomputer processors
- Will the commodity networks catch up?
13. Hardware possibilities
14. OS possibilities
15. Network technologies and topologies
- So many choices! -> interfaces, cables, switches, hubs, routers
- ATM, Ethernet, fast Ethernet, gigabit Ethernet, FireWire, HiPPI, serial HiPPI, Myrinet, SCI
- Latency, bandwidth, availability, price! VIA?
- Issues: price, performance, price/performance (network), price/performance (entire system). A rough cost model follows this slide.
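As a rough way to compare these options (illustrative numbers of my own, not measurements): the time to move an n-byte message is approximately latency + n/bandwidth. At ~100 µs of end-to-end latency and ~10 MB/s of deliverable bandwidth, typical of MPI over switched fast Ethernet, a 1KB message costs roughly 100 µs + 100 µs = 200 µs; at ~10 µs and ~100 MB/s, in the Myrinet class, the same message costs roughly 20 µs.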
16. Disk subsystems?
- 1) I/O is a problem in parallel systems
- 2) A data server is itself an interesting idea
- Beowulf Bulk Data Server
- cf. slow, expensive tape silos...
- Eg our chemistry Beowulf will have 0.7TB
- Could easily put 50GB of cheap disk per node
- -> 1TB of on-line storage with 20 nodes...
- RAID. Software or hardware?
- Distributed/parallel file systems? NOW
- Home dirs not on compute nodes is a performance hit
- Cache NFS? Coda (open source AFS replacement). A sketch of the NFS setup follows this slide.
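A minimal sketch of the NFS side mentioned above, assuming a hypothetical data server named dataserver exporting home directories to compute nodes named node01..node20:

On dataserver, in /etc/exports:

  /home  node*(rw,no_root_squash)

On each compute node, in /etc/fstab:

  dataserver:/home  /home  nfs  rw,hard,intr  0  0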
17. Advantages of Open Source
- Linux is immature, eg lacking a caching file system, but has good HPC tools.
- Recent announcements
- SGI has released XFS as open source.
- Sun has released its HPC solutions as open source.
- Linux can make use of all of these! Tried and true HPC code comes to free, open source Linux on cheap machines.
18. Perseus
- Machine for chemistry simulations
- Mainly high throughput computing
- In excess of 300K
- 128 nodes, for < 2K per node
- Dual processor PII450
- At least 256MB RAM
- Some nodes up to 1GB
- 6GB local disk each
- 5x24 (2x4) port Intel 100Mbit/s switches
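(Presumably the 0.7TB figure on the disk subsystems slide is just this local disk summed over the cluster: 128 nodes x 6GB ≈ 0.77TB.)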
19. Perseus Initial Phase
- Prototype
- 16 dual processor PII
- 100Mbit/s switched Ethernet
- For sale!
20. Software on Perseus
- Software to support the three computational paradigms
- Data Parallel
- Portland Group HPF
- Message Passing
- MPICH, LAM/MPI, PVM
- High throughput computing
- Condor, GNU Queue (an example submit file follows this slide)
- Gaussian94, Gaussian98
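As an illustration of the high throughput entries above, a Gaussian run could be described to Condor with a small submit file (the paths and file names here are invented):

  universe   = vanilla
  executable = /usr/local/g98/g98
  input      = water.com
  output     = water.log
  error      = water.err
  log        = condor.log
  queue

and handed to the pool with condor_submit. Message-passing codes are launched directly instead, eg mpirun -np 32 ./mycode under MPICH.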
21. Expected performance
- Loki, 1996
- 16 Pentium Pro processors, 10Mbit/s Ethernet
- 3.2 Gflop/s peak, achieved 1.2 real Gflop/s on the Linpack benchmark
- Perseus, 1999
- 256 PentiumII processors, 100Mbit/s Ethernet
- 115 Gflop/s peak (see the arithmetic after this list)
- 40 Gflop/s on the Linpack benchmark?
- Compare with the Top 500!
- Would get us to about 200 currently
- Other Australian machines?
- NEC SX/4 @ BOM at 102
- Sun HPC at 181, 182, 255
- Fujitsu VPP @ ANU at 400
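The peak figures are simple arithmetic, assuming one floating-point result per clock cycle: Perseus's 256 processors at 450 MHz give 256 × 450 MHz ≈ 115 Gflop/s, and Loki's 16 Pentium Pros at 200 MHz give the 3.2 Gflop/s quoted above.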
22. Preliminary performance results
23. Reliability in large systems
- Build it right! Racks and bolts and cable ties.
- Heat going to be a problem?
- Daemon to monitor the cluster (a minimal sketch follows this slide)
- normal stuff
- cpu, network, memory, disk utilisation and performance
- switch performance (SNMP)
- More exotic stuff
- case and cpu fan speeds
- motherboard and cpu temperatures
- More advanced tools? Web interfaces?...
- Kickstart installations
- Monitoring software
- Packaging
- Node job control
- Parallel interactive shell?!
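A minimal sketch of the "normal stuff" half of such a monitoring daemon, assuming Linux's /proc interface; in a real tool the sample would be shipped to a central collector rather than printed locally:

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char line[256];

    for (;;) {
        /* load averages and run-queue info for this node */
        FILE *f = fopen("/proc/loadavg", "r");
        if (f != NULL) {
            if (fgets(line, sizeof(line), f) != NULL)
                printf("loadavg: %s", line);
            fclose(f);
        }
        sleep(60);   /* sample once a minute */
    }
    return 0;
}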
24. Beodoh!
- Load balancing
- Effects of machines' capabilities
- Desktop machines vs. dedicated machines
- Resource allocation
- Scalability - switch fabric limited?
- Task migration, I/O, fault tol., security!
- Break-in on Iofor!
- Upgrading still problematic, eg the latest upgrade. Probably only do this every couple of years
- Maintenance requirements, heterogeneity problems, ownership hurdles
25. Beowhere-now?
- This is mainly an integration problem. We hope to be able to make contributions to...
- System packaging
- Distributed shared memory
- System documentation
- System monitoring and control tools (web)
- Fault tolerance
- Load-balancing
- Performance models
- Traffic monitoring
- Versioning!!!!
- Write a comprehensive, detailed Beowulf HOWTO - everyone else's is BAD.
- Build perseus2
- Real benchmarks: actual applications, production machine, etc.
- Which ones are integration, which ones research?
26. Summary Slide
- Beowulf computing for:
- The current system is for chemists - mainly for high throughput computing: slow networks, queue-managed batch jobs
- They can do parallel in a box with SMP
- The future? Fast networks for their highly parallel problems
27. Top 500
- The top two machines, ASCI Red and ASCI Blue, are custom built: 2.1TF and 1.6TF respectively.
- Rank 3 (T3E) 891 Gflop; rank 10, 510 Gflop; rank 50, 150 Gflop; rank 100, 62 Gflop; rank 200, 40 Gflop; rank 500, 25 Gflop (up from 19 Gflop)
- SGI 182/500, 7/10, 2
- IBM 118/500, 1/10, 8
- Sun 95/100, 0/10, 54
- H/P 39/100, 0/10, 150
- Fujitsu 23/500, 0/10, 26
- NEC 18/500, 0/10, 29
- Hitachi 12/500, 1/10, 4
- Compaq 5/500, 0/10, 49
- Intel 4/500, 1/10, 1
- Self-made 3/500, 0/10
- Cplant 129 54 Gflop/s
- Avalon 160 48 Gflop/s
- Parnass2 362 29 Gflop/s
- Others 1/500, 0/10