Title: Building a scientific research computing environment
1Building a scientific research computing
environment
- Eric Wu, BBN Technologies
- 10/29/2003
3BBN Technologies
- Consulting firm founded by MIT professors and a student in 1948. Leo Beranek (the "B") received the 2002 National Medal of Science
- Located in Cambridge, MA
- Accomplishments
- First ARPAnet
- @ symbol in email
- First router
- Analyzed Nixon Watergate tapes
- My department
- Speech recognition. Transcription, not translation.
- English, Arabic, Japanese
- 150 node network
- http://www.bbn.com
4What should I buy?
Hardware depends on software to realize full
potential
Software depends on hardware to realize full
potential
5Software
- Test speed of software (benchmark)
- Rules for benchmarking
- First rule of benchmarking
- The only benchmark that matters is your code!!!
- SPEC, vendor benchmarks are worthless (my opinion)
- Always try to benchmark before buying a new architecture
- Benchmarking resources
- Your friends
- Web
- Supercomputing centers
- Testdrive.hp.com (Alpha, Pentium, Itanium)
- Buy one
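One way to apply the first rule is to time your own workload instead of relying on published numbers. The sketch below is a minimal Python timing harness, not a full benchmarking tool. The names `best_of` and `sample_workload` are hypothetical, and the workload is only a stand-in for your real code.

```python
import time

def best_of(workload, repeats=3):
    """Run workload() several times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return min(timings)

def sample_workload():
    # Stand-in for your real code (hypothetical); replace with an actual run.
    return sum(i * i for i in range(200_000))

if __name__ == "__main__":
    print(f"best of 3: {best_of(sample_workload):.4f} s")
```

Taking the fastest of several runs reduces noise from other activity on the machine. Run the same harness on each candidate machine to compare them.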
6Software - Benchmarking example
- Performance
- VASP: Alpha is better than Xeon
- CP90: Alpha and Xeon are the same
- Alpha costs 4-5x as much
7Hardware
- Hardware features
- Memory speed
- Interconnects (Front Side Bus)
- Clock speed
- 32 bit vs. 64 bit
- Cache
- Processor architecture
- Understanding hardware can help to understand or
predict speed
8Hardware
[Figure: diagram of two processors]
9Memory and Front Side Bus
- Don't ignore memory and interconnects (FSB)!
- Memory and Front Side Bus (FSB) speed make a difference in performance
- Be careful when vendors are upgrading
- FSB for Xeons lags behind the Pentium 4
- FSB effects on a dual-processor machine
- 1 job (1 free processor) takes 1 hour
- 2 jobs (no free processors) each take 1.25 hours
- Bandwidth limitations!
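The dual-processor numbers above can be turned into a throughput figure. The sketch below uses only the two timings from the slide; the function name `throughput` is hypothetical.

```python
def throughput(jobs, hours_per_job):
    """Jobs completed per hour when `jobs` run side by side, each taking hours_per_job."""
    return jobs / hours_per_job

one_job = throughput(1, 1.0)     # 1 job, 1 free processor
two_jobs = throughput(2, 1.25)   # both processors busy, each job slowed down

# Gain from using the second processor, compared with the ideal 2x.
speedup = two_jobs / one_job
print(f"speedup: {speedup:.2f}x (ideal 2.00x)")
```

With these timings the second processor yields 1.6x the throughput rather than 2x. The shortfall is what the shared memory bus costs.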
10Processor clock speed
- Defined as the rate at which the processor runs (cycles per second)
- Useful only when comparing within an architecture (Pentium to Pentium)
- Useless when comparing across architectures
- For VASP, Alpha 1.25 GHz is 2x as fast as Xeon 2.8 GHz
- For VASP, Itanium 900 MHz is 1.6x as fast as Xeon 2.8 GHz
- Many other factors matter
- Example: instructions per clock cycle (IPC) also matter
- Pentium 4: 2
- Itanium Madison: 6
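A rough way to see why clock speed alone misleads is to multiply it by IPC. The sketch below is illustrative only. The clock speeds and IPC values come from the slide, but pairing the 2.8 GHz Xeon with the Pentium 4 IPC and the 900 MHz Itanium with the Madison IPC is an assumption, and `peak_instructions_per_second` is a hypothetical name.

```python
def peak_instructions_per_second(clock_hz, ipc):
    """Upper bound on instruction rate: cycles per second times instructions per cycle."""
    return clock_hz * ipc

# Values from the slide; the chip/IPC pairing is an assumption for illustration.
xeon = peak_instructions_per_second(2.8e9, 2)      # Pentium 4 class IPC
itanium = peak_instructions_per_second(0.9e9, 6)   # Itanium Madison IPC

print(f"clock ratio (Xeon/Itanium): {2.8e9 / 0.9e9:.2f}")
print(f"peak rate ratio (Xeon/Itanium): {xeon / itanium:.2f}")
```

Under these assumptions a 3x clock advantage shrinks to near parity in peak instruction rate. Even that does not explain the measured 1.6x Itanium advantage on VASP, which is why other factors and real benchmarks matter.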
11Processor 32 bit vs. 64 bit
- Definitions
- 32 bit can store a range of 2^32 integers
- 64 bit can store a range of 2^64 integers
- Does not mean 64 bit is automatically faster or better!
- Advantages of 64 bit
- High memory applications
- Each number points to an address in memory
- 2^32 is about 4x10^9, or 4G
- 2^64 is about 1.8x10^19, or 18 billion G
- 32 bit can access > 4G with OS tricks, but slow
- Applications with large range of numbers
- Scientific computing
- Cryptography
- 32 bit can handle 2^64 with compiler tricks, but slow
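The address-space figures above can be checked directly. This is a small sketch using decimal gigabytes (10^9 bytes) to match the slide's rounding; `addressable_bytes` and `GIGA` are hypothetical names.

```python
def addressable_bytes(bits):
    """Number of distinct byte addresses an unsigned pointer of this width can hold."""
    return 2 ** bits

GIGA = 10 ** 9  # decimal gigabyte, matching the slide's rounding

for bits in (32, 64):
    total = addressable_bytes(bits)
    print(f"{bits} bit: {total:.2e} bytes, about {total // GIGA:,} G")
```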
12Processor Cache
[Figure: processor and cache diagram with links labeled Fast, Slow, Slow]
13Processor Cache
- Cache
- Bypass slow interconnect and memory
- Reduce access time to information
- Reduce bandwidth requirements to memory
- L2 vs L3
- Lower Ln means closer to processor, more potential for improvement
- Effects
- Faster code
- Superlinear speedup in parallel code
- Examples
- Xeon 3.06 GHz: 512k L2, 1MB L3
- Opteron: 1MB L2
- Itanium Madison: 6MB L3
- Alpha: 16 MB L2
[Figure: memory hierarchy: Processor, L2 Cache, L3 Cache, Memory]
14Hardware
- Many processor features can influence speed
- Effect on speed will depend on software
There is no substitute for benchmarking
15Purchasing Strategies
- Don't forget to ask your friends
- How much did they pay?
- Which vendors?
- How reliable?
- Picking vendors
- Know your group
- How many students?
- How many machines?
- Know the differences between vendors
- Vendor A vs. Vendor B
- Hardware repair: on site vs. send it back
- Memory: next-day air replacement vs. send it back
- Diagnosing problems: motherboard lights vs. send it back
- Rack rails: snap in vs. screw in
- Problem rate: 2/16 machines (9%) vs. 5/24 machines (21%)
- Machine cooling: 5 fans vs. 2 fans
- Cost: Vendor A is $550 more per node, 1/6th more!
16Purchasing Strategies
- Beware new hardware
- 3 points of failure: hardware, compiler, software
- Case study 1: Pentium II Xeons (1998) (donation)
- Operating system?
- Windows was slow
- Linux was buggy
- Compilers were new, no standards
- Software (VASP) did not have Pentium support
- Case study 2: Itanium I 600 (2000) on testdrive.hp.com
- Processors were slower than expected
- Intel compiler operated differently on Itanium and Xeon
- Math libraries had bugs (MKL)
- Software (VASP) did not have Itanium support
- Sometimes, it's better to let somebody else be the guinea pig
17Purchasing Strategies - Examples
- Buying Xeons
- Quotation from Vendor A: $4500.
- Quotation from Vendor B: $3000!
- Go back to Vendor A; Vendor A lowers price to $3000
- This is extreme, but you should price shop.
- SW Technologies (http://www.swt.com) gives prices of cheap Xeons.
- Ask your friends what they paid.
- Buying Alphas
- Quotation from Vendor A: $12,500
- Threaten to buy all Xeons!
- New quotation from Vendor A: $11,000
18Parallel computing
- Moore's law is slowing down
Source: http://www.nersc.gov/simon/cs267/
19Parallel computing
- Even with Moore's law, at best we can only double system size every two years (with N scaling)
- Parallel computing
- Advancements in hardware
- SMP machines
- More processors/machine
- Networking of Intel-type machines
- Myrinet
- Gigabit is cheaper
- Advancements in software
- MPICH and LAM are more robust
- Your favorite code is probably parallel now
- Cost
- Usually cheaper (can be 50%). Some costs (cooling, power) usually covered by school or lab
20Parallel computing Hardware
- Networking Hardware
- Fast Ethernet (100 Mbits/s)
- Gigabit (1000 Mbits/s)
- Myrinet
- Quadrics, Infiniband, etc
- Definition of terms
- Latency: time to decide where to send a packet.
- Low latency is good for many small packets
- Bandwidth
- How fast does it transmit?
- Maximum switching capacity
- Maximum volume it can handle (relevant for
gigabit)
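A common first-order way to relate these terms is to estimate transfer time as latency plus message size divided by bandwidth. The sketch below uses that model, which is not from the slides. The link speeds are the slide's figures; the 100 microsecond latency is a hypothetical value applied to both links, and `transfer_time` is a hypothetical name.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bits_per_s):
    """First-order estimate: fixed latency plus size divided by bandwidth."""
    return latency_s + (message_bytes * 8) / bandwidth_bits_per_s

FAST_ETHERNET = 100e6   # 100 Mbits/s, from the slide
GIGABIT = 1000e6        # 1000 Mbits/s, from the slide
LATENCY = 100e-6        # hypothetical 100 microseconds, same for both links

for size in (64, 1_000_000):
    t_fast = transfer_time(size, LATENCY, FAST_ETHERNET)
    t_gig = transfer_time(size, LATENCY, GIGABIT)
    print(f"{size:>9} bytes: fast ethernet {t_fast:.6f} s, gigabit {t_gig:.6f} s")
```

In this model small messages take almost the same time on both links because latency dominates, while large messages approach the 10x bandwidth ratio. That is why low latency matters for codes that send many small packets.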
21Parallel computing Hardware
- Buy a vendor architecture
- 8-16 processors on each machine
- Examples: HP GS160, HP GS320, IBM Power 4
- Advantages
- Less sysadmin
- More reliable
- Easier in every way
- Division of machine into OS partitions (more for businesses)
- Disadvantages
- Cost: $500,000 vs. $50,000-150,000
- Can pay for sysadmins instead
22Parallel computing Hardware
- Gigabit
- Pricing
- Cards are often free (standard)
- Switches are moderately expensive, and falling
- Few ports: cheap. Pricing does not scale well to >60 ports.
- Latency
- Moderate. Depends on switch and packet size
- Be careful of switching capacity!! Make sure to buy a switch that is made for high performance computing, not routing.
- Brands
- Foundry
- Extreme
- Cisco
23Parallel computing Hardware
- Myrinet
- Pricing
- Total: $1100 a port (http://www.myri.com)
- Linear scaling up to 128 ports.
- Latency
- Lowest latency
- Needs setup of drivers (not too bad, but)
- Easy to expand
- Best performance for large number of processors
(at highest price)
24Parallel computing Hardware
- Remember the first rule of benchmarking. Example:
- PWSCF or ABINIT: parallelize over k-points
- Little communication
- Drawback: need a lot of memory, need k-points > processors
- No need for either gigabit or Myrinet
- VASP: parallelize over plane waves
- A lot of communication
- Reduce memory usage
- Gigabit or Myrinet is essential
- Know your code and how you will use it!
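The k-point constraint can be illustrated with a simple work-distribution sketch. This is not how PWSCF or ABINIT implement it. It only shows a round-robin split, under the assumption that each k-point is an independent unit of work, and the function names are hypothetical.

```python
def assign_kpoints(num_kpoints, num_processors):
    """Round-robin assignment: processor p gets k-points p, p + P, p + 2P, ..."""
    return [list(range(p, num_kpoints, num_processors))
            for p in range(num_processors)]

def idle_processors(num_kpoints, num_processors):
    """Processors left with no work when there are fewer k-points than processors."""
    return sum(1 for work in assign_kpoints(num_kpoints, num_processors) if not work)

print(assign_kpoints(10, 4))   # every processor has work
print(idle_processors(3, 8))   # more processors than k-points leaves some idle
```

Each processor works through its own list with no need to talk to the others, which is why a fast network buys little for this style of parallelism.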
25Parallel computing Software
- PVM: Parallel Virtual Machine
- MPI: Message Passing Interface
- LAM: http://www.lam-mpi.org
- Designed for TCP/IP (clusters)
- Performance (?)
- MPICH: http://www-unix.mcs.anl.gov/mpi/mpich/
- Stack architecture flexibility. Not just TCP/IP
- More popular
- Slightly easier to use
- Both MPICH and LAM can coexist. Pick the one
you like.
26Compilers
- Often overlooked
- Compilers can increase speed 10-100%
- Compilers are cost-effective
- Compiler may cost $500
- Cost to increase speed 10-100% can be $200-2000/machine!
- Disadvantages
- Each compiler is different: alter code for each compiler
- Students hate compiling codes
27Compilers gcc (2.95.3, 3.3)
- Available at http://gcc.gnu.org
- Advantages
- Free
- Portable
- Wide base of users
- Newer versions produce fast code
- Disadvantages
- Poor Fortran support
28Compilers Intel
- Available at
- http://www.intel.com/software/products/compilers/flin/noncom.htm
- http://www.intel.com/software/products/compilers/clin/noncom.htm
- Advantages
- Free (academia)
- Wide base of users (more so for Fortran)
- FAST code on Intel chips. Reported fast code for AMD chips
- Disadvantages
- Harder to use (my opinion)
- Character of different versions
- No Red Hat 9 support
29Compilers Portland, and others
- Pricing info at http://www.pgroup.com/pricing/ae.htm
- Advantages
- Works for all platforms
- Robust
- Disadvantages
- Some cost
- Not as fast
- Other compilers
- NAG
- Fujitsu
- Absoft
30Math Libraries
- BLAS/LAPACK
- Intel MKL - http://www.intel.com/software/products/mkl/
- ATLAS - http://math-atlas.sourceforge.net
- K. Goto's BLAS - http://www.cs.utexas.edu/users/flame/goto/
- FFTW (http://www.fftw.org)
- Vendor only
- HP/Compaq cxml
- IBM essl
- SGI scsl
31Disk Storage
- Should be done on a RAID (Redundant Array of Inexpensive Disks)
- RAID configuration provides fault tolerance
- Different types of RAID
- RAID 1 (mirroring): 2 disks (two 100G disks = 100G of data)
- RAID 5: 3 or more disks (three 100G disks = 200G data, four 100G disks = 300G data, etc.)
- Implemented within software or hardware
- Disk type: SCSI or IDE
- May take one day to set up. Can save your hide!!!
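The capacity examples above follow a simple rule: RAID 1 stores one disk's worth of data, and RAID 5 gives up one disk's worth to parity. The sketch below encodes that rule for equal-sized disks; `usable_capacity` is a hypothetical helper and covers only these two levels.

```python
def usable_capacity(level, num_disks, disk_size_g):
    """Usable space in G for equal-sized disks under RAID 1 or RAID 5."""
    if level == 1:
        if num_disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_size_g                     # every disk holds the same copy
    if level == 5:
        if num_disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (num_disks - 1) * disk_size_g   # one disk's worth goes to parity
    raise ValueError("only RAID 1 and RAID 5 are covered in this sketch")

print(usable_capacity(1, 2, 100))  # two 100G disks mirrored
print(usable_capacity(5, 3, 100))  # three 100G disks
print(usable_capacity(5, 4, 100))  # four 100G disks
```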
32What type of RAID should I use?
- Software or Hardware?
- Software RAID is free
- Hardware RAID has better performance (especially with more clients), but costs more. Usually can buy a PCI card and some cables.
- SCSI or IDE disks?
- IDE is cheap
- SCSI is more expensive, but better performance. Most believe better quality.
- SATA disks are another alternative.
- Costs
- Hardware/SCSI can cost 3x more
- Don't forget cost of computer to house disks
- My recommendation
- Hardware/SCSI
- Graduate students hate to do sysadmin tasks.
- Graduate students tend to be lax with sysadmin tasks
- Force your students to delete old files/use gzip
- Hardware/IDE: if you need 100s of G of storage
33Backups
- You're only as good as your last backup
- Ancient computing proverb
- MIT-TSM backups: http://web.mit.edu/is/help/tsm/quickstart.html
- $7.50 a month
- Unlimited storage (rsync) limited only by restore speed
- With scripts, can back up every day
- Disk mirroring with rsync
- Buy a few cheap IDE disks
- Use an old machine
- Tape backups
- You're only as good as your last restore
- Modern computing proverb
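Disk mirroring with rsync can be scripted in a few lines. The sketch below is one possible wrapper, not a complete backup system. The paths are hypothetical, and the helper names `rsync_command` and `mirror` are made up for illustration. The `-a` and `--delete` options are standard rsync flags.

```python
import subprocess

def rsync_command(source_dir, mirror_dir):
    """Build an rsync invocation that mirrors source_dir into mirror_dir."""
    # -a preserves permissions and timestamps; --delete removes files from the
    # mirror that no longer exist in the source. The trailing slash copies the
    # contents of source_dir rather than the directory itself.
    return ["rsync", "-a", "--delete", source_dir.rstrip("/") + "/", mirror_dir]

def mirror(source_dir, mirror_dir):
    """Run the mirror; raises CalledProcessError if rsync reports a failure."""
    subprocess.run(rsync_command(source_dir, mirror_dir), check=True)

if __name__ == "__main__":
    # Hypothetical paths; this prints the command instead of running it.
    print(" ".join(rsync_command("/home/group", "/mnt/backup/group")))
```

A scheduler such as cron could call `mirror` once a day. Because `--delete` also propagates accidental deletions, a mirror like this supplements the other backups listed here rather than replacing them, and it is worth testing a restore from it.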
34Further reading
- 32 vs. 64 bit
- Good article: http://www.arstechnica.com/cpu/03q1/x86-64/x86-64-1.html
- Courses on supercomputers (recommended)
- Berkeley: http://www.nersc.gov/simon/cs267/
- Buffalo: http://www.ccr.buffalo.edu/content/education.htm#courses
- Building a Beowulf
- Ron Choy @ MIT: http://www.mit.edu/people/cly/beowulf.ppt
- ROCKS, automatic install of Beowulf cluster: http://www.x2ca.com/articles/ICCS2003.pdf
- Parallel computing/supercomputing links
- Parascope: http://www.computer.org/parascope/
- Nan's page: http://www.cs.rit.edu/ncs/parallel.html
- Top 500: http://www.top500.org/
35Conclusions
- Hardware understanding can help you make an intelligent decision
- Nothing beats a benchmark of your code
- Don't forget the compiler and math libraries
- Consider your parallel computing options
- Be sure to implement fault-tolerant systems (RAID
and backups)