Title: Cluster Management the IBM Way
1 Cluster Management the IBM Way
- Background
- ScotGRID project
- IBM Management Solutions
- Technologies
  - Boot, Installation, Server, IBM x330 server, KVM, Console Server, RAID, VLAN
- Procedures
  - putting the hardware together
  - installing RedHat
2 ScotGRID
- http://www.scotgrid.ac.uk/
- JREI funded
- ScotGRID is an £800k prototype two-site Tier 2
centre in Scotland for the analysis of data
primarily from the ATLAS and LHCb experiments
at the Large Hadron Collider and from other
experiments. The centre currently consists of a
118 CPU Monte Carlo production facility run by
the Glasgow PPE group and a 5TB datastore and
associated high-performance server run by
Edinburgh Parallel Computing Centre.
3 ScotGRID-Glasgow
4 IBM Management Suites
- Xcat is a set of scripts, tables and goodies written to help install a Beowulf cluster the IBM way
- http://publib-b.boulder.ibm.com/Redbooks.nsf/RedbookAbstracts/sg246623.html?Open
- CSM is an alternative suite of software from IBM to install and manage a cluster of PC servers. It seems to have its origin in the RS/6000 AIX world
- http://publib-b.boulder.ibm.com/Redbooks.nsf/RedbookAbstracts/sg246601.html?Open
- IBM seem to have other management software with titles including words like Tivoli and Director.
5 Booting - PXE
- http://support.intel.com/support/network/adapter/pro100/bootagent/index.htm
- http://syslinux.hackerdojo.com/pxe.php
- ftp://download.intel.com//labs/manage/wfm/download/pxespec.pdf
6 PXE
- PXE is Intel's Preboot eXecution Environment and includes a dhcp/tftp loader in PROM on an Ethernet card. It serves much the same role as MOP on VAXstations and DECstations, doing diskless booting.
- The Xcat solution uses PXE to load PXELINUX - a network variant of SYSLINUX as used on RedHat CDs
7 PXE
- PXELINUX can load a kernel either over the network or from the local disk according to a tftp-ed configuration file.
- IBM suggest a boot order of floppy, CD, net, disk
- In Xcat, network loading is mainly used to initiate kickstart installation and the local disk load is used for normal operation.
- (Re)installation is just a matter of changing the configuration file (see the sketch below).
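For example, a minimal sketch of a per-node PXELINUX configuration file (paths, names and the kickstart location are illustrative, not the actual Xcat-generated content). Switching DEFAULT between the two labels is what flips a node between reinstallation and normal local booting:

  # /tftpboot/pxelinux.cfg/<node's IP in hex> - illustrative sketch
  DEFAULT ks
  TIMEOUT 50

  LABEL ks
    # network load: kernel + initrd over tftp, then kickstart over nfs
    KERNEL vmlinuz
    APPEND initrd=initrd.img ks=nfs:10.0.0.1:/install/ks/node01.ks

  LABEL local
    # normal operation: chain to the local disk
    LOCALBOOT 0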
8 Automated Linux installation
- kickstart
- http://www.lcfg.org/
- http://www.sisuite.org/
- http://www.openclustergroup.org/
9 kickstart
- RedHat's scheme to automate installation
- installation parameters in a file, supplied as
  - a file on the bootnet floppy
  - an nfs-accessible file named with the ip number
  - http ..
- possibility of scripts to customise installation after the RedHat code has finished
- Xcat calculates kickstart files for compute, head and storage nodes from local configuration tables and locally modified templates (sketch below).
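A minimal sketch of the sort of kickstart file involved (all values here are illustrative assumptions, not the actual Xcat templates):

  install
  nfs --server 10.0.0.1 --dir /install/redhat
  lang en_GB
  keyboard uk
  timezone Europe/London
  auth --useshadow --enablemd5
  network --bootproto dhcp
  # root password hash is a placeholder
  rootpw --iscrypted $1$placeholder
  clearpart --all
  part / --size 4096
  part swap --size 512
  reboot

  %packages
  @ Base

  %post
  # site customisation runs here, after the RedHat installer finishes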
10 Servers
- Redundancy - fileservers have dual power supplies, RAID
- error checking - ECC
- reduce chances of undetected data corruption
- CDF had an incident recently that proved to be undetected corruption of data in one system
- mountable in 19 inch rack
- Blades (like PCs as CAMAC Modules) are now supported by xcat
- http://www.pc.ibm.com/uk/xseries/bladecenter.html
11 IBM x330 Server
The x330 server is 1U high (1U = 1¾ inches and a full rack is 42U)
12 (Remote) Support Processors
- Each system has a PowerPC support processor to power the main processor on/off, do monitoring, ...
- Support processors are daisy-chained with RS485 to a card with serial and Ethernet connections
- Remote access via Ethernet and serial with snmp, telnet, http and proprietary protocols
13 KVM Switch
- in a box - ScotGRID has an 8 port KVM switch from Apex (www.apex.com)
- integrated C2T - the IBM x330 compute nodes do not have conventional keyboard, video and mouse connections - they have part of a KVM switch onboard and a pair of connectors to daisy-chain a rack's worth of servers into a distributed KVM switch
14 KVM Switch box
- On screen control by pressing PrintScreen
- >1 port for real K, V and M
- Cascadeable in an organised manner
15 KVM Switch integrated
An adapter cable connects keyboard, video and mouse to the first system and short interconnecting cables daisy-chain the rest of the rack's worth of servers into a distributed KVM switch. Switching is effected by keyboard shortcuts or a button on the front of the server.
16 Console Servers
- conserver package: http://www.conserver.com
- Terminal Servers in reverse
- IBM supplied 4 Equinox ELS16s
- accept incoming telnet
17 Serial Line access
- Terminal Server lines connect to COM1
- talks to the support processor when the server is unbooted
- is /dev/ttyS0 when linux is running
- Conserver establishes telnet to all COM1 ports and multiplexes access to multiple linux clients (sketch below)
- we guess this is important to avoid finding access blocked.
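As a sketch only (conserver.cf syntax differs between conserver versions, so the details here are assumptions): one console entry per node, each pointing at the ELS16 port its COM1 is wired to, and clients then attach by node name.

  # conserver.cf excerpt - node names and TCP ports are made up
  node01:!els1:3001:&:
  node02:!els1:3002:&:

  console node01    # attach to node01's COM1 through conserver
                    # (Ctrl-E c is the conserver escape sequence)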
18 RAID
- remote management of the IBM ServeRAID controllers using RaidMan software
- configuration
- monitoring
- the remote agent component seems to provoke crashes of kernel 2.4.9-34
19 Switches
- IBM supplied 3 x 48 port Cisco Catalyst switches and 1 x 8 port Gigabit switch to interconnect the file servers, other switches and the Campus Backbone
- The switches divide the ports between an Internet-accessible VLAN and a private 10.0.0.0 VLAN for the compute nodes and management units
- The management interface is via the serial port, telnet and http
20 Cisco Catalyst 3500XL Switch
- Sockets on right take GBICs
- ScotGRID-Glasgow has 3 of the 48 port switches and a single unit with 8 GBIC sockets
- Separate VLANs for private 10.0.0.0 and Internet.
(GBIC = GigaBit Interface Converter? Cisco Gigabit Transceiver to various fibres and 1000baseT)
21 ScotGRID-Glasgow
- Masternode and fileserver nodes on left
- Compute nodes in middle and right with headnodes low down
- Daisy-chained KVM and Support Processor network
- Cascaded star-wired Ethernet and serial lines
22 Putting it together
(Diagram: the masternode, storage nodes and head nodes sit on the Internet VLAN; the compute nodes sit on the private 10.0.0.0 VLAN; 100 Mbps and 1000 Mbps links tie the nodes and switches to each other and to the Campus Backbone.)
23 Compute Nodes
(Diagram of a compute node: 2 Pentium III CPUs; eth0 and eth1 ethernet ports; COM1 serial line; keyboard/video/mouse and Console in/out for the daisy-chained C2T KVM; an onboard Support Processor with RS485 in/out for the support-processor chain; and a Remote Support Adapter with its own serial and ethernet connections.)
24 Software
- Configure VLANs
- Install RedHat and xcat manually on masternode
- Tabulate topology
- Update BIOS
- Configure Terminal Servers
25 VLANs
- Cisco boxes use a serial port to start configuration
- Use the Cisco-flavoured RJ45 <-> 9 pin D-type adapter
- cu -l /dev/ttyS0 -s 9600
- set ip address, enable telnet, ...
- ongoing management via telnet/tftp/http (sketch below)
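A sketch of the sort of IOS session involved (the VLAN number, addresses and port choice are illustrative, not ScotGRID's actual configuration):

  switch> enable
  switch# vlan database
  switch(vlan)# vlan 2 name private
  switch(vlan)# exit
  switch# configure terminal
  switch(config)# ! put a compute-node port onto the private VLAN
  switch(config)# interface FastEthernet0/1
  switch(config-if)# switchport access vlan 2
  switch(config-if)# exit
  switch(config)# ! management address and telnet logins
  switch(config)# interface vlan 1
  switch(config-if)# ip address 10.0.0.250 255.0.0.0
  switch(config-if)# exit
  switch(config)# line vty 0 4
  switch(config-line)# password secret
  switch(config-line)# login
  switch(config-line)# end
  switch# copy running-config startup-config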
26 RedHat
- Install everything off CD + errata
- Fail to spot unconfigured mailman rpm - eventually exhausted inodes with 95,671 empty error logs
- /etc/syslog.conf
- 1/10/02 Trick learned from IBM - Ctrl-Alt-F12 displays /dev/tty12:

  *.info;mail.none;news.none;authpriv.none    /dev/tty12
27 Tables
- Cisco3500.tab:

  storage1  cisco1,1
  storage2  cisco1,2
  storage3  cisco1,3
  node01    cisco2,1
  node02    cisco2,2

- Tables describe the connections, ip networking parameters, selectable options
- file of Global Settings for ScotGRID cluster, 21/5/02 DJM:

  # r-series command pathnames - actually ssh equivalents
  rsh      /usr/bin/ssh
  rcp      /usr/bin/scp
  # Absolute location of the SSH Global Known Hosts file that
  # contains the ssh public keys for all the nodes
  gkhfile  /usr/local/xcat/etc/gkh
  # Directory used by the tftp daemon to retrieve 2nd stage boot loaders
  ...
28 BIOS
- Boot floppy and use CD in every box
29 Terminal Servers
- Use serial port to start configuration
- Use the Equinox-flavoured RJ45 <-> 9 pin D-type adapter
- cu -l /dev/ttyS0 -s 9600
- Setup script over /dev/ttyS0
- Ongoing management via telnet
30 Harvest MAC Addresses
- cause compute nodes to remotely boot a kernel and application which send packets out of the ethernet ports and emit the MAC addresses on COM1
- collect MAC addresses from the serial lines or via the Cisco management interface
- use the topology tables and MAC addresses to calculate dhcpd.conf (sketch below)
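A sketch of the sort of dhcpd.conf stanza this produces (the addresses and MAC are placeholders, not the generated file):

  subnet 10.0.0.0 netmask 255.0.0.0 {
    next-server 10.0.0.1;              # masternode's tftp server
    filename "/pxelinux.0";            # PXE 2nd stage boot loader

    host node01 {
      hardware ethernet 00:02:55:00:00:01;   # harvested MAC (placeholder)
      fixed-address 10.0.0.101;
    }
  }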
31 Configure Remote Support Adapters
- The RSA Ethernet <-> RS485 boxes can live in their own box or in a spare PCI slot - as in ScotGRID-Glasgow
- boot a utility in the hosting server and set the ip address
- an xcat script uses the tables to locate and configure the RSA over ethernet
32 Compute Node Installation
- calculate kickstart file
- reboot compute node
- press button
- xcat command: rpower noderange boot
- the last stage of the kickstart post-installation script resets booting to the local disk (see the sketch below)
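Schematically (the hex file name is node01's 10.0.0.101 written as 0A000065; which xcat script actually rewrites the PXELINUX file is version-dependent, so that step is shown by hand):

  # point the node's PXELINUX config back at network installation
  cp /tftpboot/pxelinux.cfg/ks-template /tftpboot/pxelinux.cfg/0A000065

  # power-cycle the node through its support processor
  rpower node01 boot

  # ...kickstart runs; its %post script rewrites 0A000065 to
  # LOCALBOOT so the next boot comes off the local disk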
33 Batch
- OpenPBS: http://www.openpbs.org (a minimal job script is sketched below)
- MAUI scheduler: http://www.supercluster.org
- a plugin scheduler for PBS/WIKI/Loadleveler/Sun's Grid Engine
- able to organise parallel jobs using >1 cpu, perhaps slight overkill for us
- lots of control - per user, per group, per job length, target fair share, time waiting, ...
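For example (the job name, resource requests and executable are illustrative):

  #!/bin/sh
  #PBS -N mc-production
  #PBS -l nodes=1
  #PBS -l walltime=12:00:00
  # run from the directory the job was submitted from
  cd $PBS_O_WORKDIR
  ./run_simulation

submitted and watched with

  qsub job.sh
  qstat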
34 Ongoing Management
- snmp
- mainly from support processors via syslog to warning emails
- psh noderange command (examples below)
- parallel ssh
- party trick: psh compute eject
- xcat rcommands
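Typical psh invocations (the group names depend on the local tables; "all" and "compute" here are assumptions):

  psh all date               # run date on every node
  psh compute 'uname -r'     # check kernel versions across the compute group
  psh node01-node08 uptime   # an explicit noderange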
35 rvitals
  [root@masternode martin]# rvitals node08 all
  node08: CPU 1 Temperature: 21.0 C (69.8 F)
  node08: CPU 2 Temperature: 21.0 C (69.8 F)
  node08: hard shutdown: 85.0 C (185.0 F)
  node08: soft shutdown: 80.0 C (176.0 F)
  node08: warning: 75.0 C (167.0 F)
  node08: warning reset: 63.0 C (145.4 F)
  node08: DASD 1 Temperature: not available
  node08: Ambient Temperature: 16.0 C (60.8 F)
  node08: System Board 5V: 5.06
  node08: System Board 3V: 3.29
  node08: System Board 12V: 11.85
  node08: System Board 2.5V: 2.63
  node08: VRM1: 1.78
  node08: VRM2: 1.78
  node08: Fan 1: 70
  node08: Fan 2: 72
  node08: Fan 3: 78
  node08: Fan 4: 78
  node08: Fan 5: 76
  node08: Fan 6: 75
  node08: Power is on
  node08: System uptime: 3556
  node08: The number of system restarts: 139
  node08: System State: Currently booting the OS, or no transition was reported.