Title: Dolly for system management of PC Linux cluster
1 Dolly for system management of PC Linux cluster
- A. Manabe (CRC, KEK)
- Atsushi.Manabe@kek.jp
- http://corvus.kek.jp/manabe
2 Motivation
- System (software) installation and updating on more than 10 PCs was boring and even hard work for me.
- How hard will installation on over 100 PCs be?
- If installation is very fast, you can easily switch the OS from the old version to a new one, and back again. That is very convenient for testing a brand-new OS version.
- If it is, it is also good for system recovery from HD trouble.
3 An idea and objective
- The installation process:
- First, you install a system on one PC.
- Then you clone its disk image to the other PCs via the network.
- Config files unique to each node, such as the hostname, IP address and so on, are created and overwritten onto each PC's disk.
- Target:
- Installation to very many PCs (100-1000) of almost the same spec.
- Objectives:
- Very fast installation: for example, installation of 100 PCs in 10 min.
- Good scalability against the number of nodes.
- As little human operation as possible; what remains should be done in a centralized way.
4 Dolly and Dolly+
- Dolly
- A Linux application to copy/clone files and/or disk images among many PCs through a network.
- Dolly was originally developed by the CoPs project at ETH (Switzerland) and is free software.
- Dolly+ features
- Transfer/copy of sequential files (no 2 GB size limitation) and/or normal files (optional decompress and untar on the fly) via a TCP/IP network.
- Virtual RING network connection topology.
- Pipelining and a multi-threading mechanism for speed-up.
- A fail recovery mechanism for robust operation.
(continued on the next page)
5 Dolly: how do you start it on Linux?
- Server side (which has the original file):
- dollyS -v -f config_file
- Node side:
- dollyC -v

Config file example:

  iofiles 3
  /dev/hda1 > /tmp/dev/hda1
  /data/file.gz >> /data/file
  boot.tar.Z >> /boot
  server n000.kek.jp
  firstclient n001.kek.jp
  lastclient n020.kek.jp
  client 20
  n001 n002 ... n020
  endconfig

(iofiles: the number of files to transfer; server: the server's name; firstclient/lastclient: the first and last client nodes; client: the number of client nodes, followed by their names; endconfig: end of the config.)

The left side of '>' is the input file on the server; the right side is the output file on the clients. '>' means dolly does not modify the image; '>>' indicates that dolly should cook (decompress, untar, ...) the file according to its file name.
6 Dolly: virtual ring topology
- The server is the host having the original image.
- The physical network connection can be whatever you like.
- Logically, Dolly makes a ring chain of the nodes; its order is specified by dolly's config file.
- Though transfer happens only between two adjacent nodes, it can exploit the maximum performance of a switching network with full-duplex ports.
- Good for a network built from many switches.
(Figure: node PCs attached to network hubs/switches; the physical connections vs. the logical (virtual) ring connection.)
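The pipeline behind the ring can be mimicked with ordinary shell tools: each node stores the stream on its own disk while forwarding the very same bytes to its successor. A toy sketch (pipes stand in for the TCP connections dolly uses between real hosts; all names are made up):

```shell
#!/bin/sh
# Toy model of the ring: server -> node1 -> node2 -> node3.
# Each "node" saves the stream to its disk (tee) while forwarding it
# downstream, so all disks fill at roughly single-transfer speed.
set -e
workdir=$(mktemp -d)
printf 'disk image payload' > "$workdir/master"  # the server's original image

cat "$workdir/master" \
  | tee "$workdir/node1.img" \
  | tee "$workdir/node2.img" \
  > "$workdir/node3.img"                         # the last node only stores

cmp -s "$workdir/master" "$workdir/node2.img" && echo "ring copy OK"
```

In dolly proper the structure is the same, but each hop is a TCP socket and the store-and-forward steps run concurrently on every node.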
7 Other possible network connection models, which are not supported by Dolly
8 (Few) server - (many) clients model
- The server could be a daemon process (you don't need to start it by hand).
- Performance is not scalable against the number of nodes: server bottleneck and network congestion.
Multicasting or broadcasting
- No server bottleneck.
- Gets the maximum performance out of a network whose switch fabrics support multicasting.
- A single node failure does not affect the whole process very much, so it could be robust.
- But a failed node needs re-transfer, so speed is governed by the slowest node, as in the RING topology.
- It uses UDP rather than TCP, so the application must take care of transfer reliability.
9 Cascade topology
- The server bottleneck could be overcome.
- It cannot reach the maximum network performance, but it is better than the many-to-one topology.
- Weak against a node failure: a failure spreads down the cascade as well and is difficult to recover from.
10 Pipelining and multi-threading
(Figure: 3 threads running in parallel.)
11 Fail recovery mechanism
- A single node failure could be a show stopper in a RING (series connection) topology.
- Dolly provides an automatic short-cut mechanism for node problems.
- When a node is in trouble, the upstream node detects it through a sending timeout.
- The upstream node then negotiates with the downstream node to reconnect and re-transfer a file chunk.
- The RING topology makes this easy to implement.
(Figure: a timeout is detected, then the failed node is short-cut.)
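The skip-over logic can be sketched as "try each downstream node in ring order and take the first that answers". A toy sketch (marker files fake liveness, and the node names are made up; dolly itself detects a dead neighbour via a send timeout on its TCP connection):

```shell
#!/bin/sh
# Short-cut toy: pick the first live node in downstream ring order.
alive_dir=$(mktemp -d)

next_alive() {
  for node in "$@"; do
    # stand-in for "TCP connect/send succeeded before the timeout"
    [ -e "$alive_dir/$node" ] && { echo "$node"; return 0; }
  done
  return 1    # the whole downstream chain is dead
}

touch "$alive_dir/n003"        # n001 and n002 have failed
next_alive n001 n002 n003      # prints: n003
```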
12 Re-transfer in short-cutting
(Figure: the file is divided into 4 MB chunks, numbered 1, 2, 3, ... from BOF to EOF. A snapshot of the pipeline: the server is handling chunks 6-9, Node 1 chunks 5-8, Node 2 chunks 5-7; after a short-cut, the chunks the next node has not yet received are re-sent.)
13 Performance (measured and expected)
- Measured performance (see the graph on the following pages!)
- 1 server - 1 node (Pentium III 1 GHz x 2 CPUs)
- ATA100 IDE disk, full-duplex 100Base-TX network
- 2 GB image copy: 8.2 MB/s, elapsed time 230 sec.
- 1 server - 10 nodes (at the moment I have only 11 PCs available for the test)
- All nodes are the same type of hardware as above.
- 2 GB image copy: elapsed time 260 sec (roughly 7.9 MB/s per node, about 79 MB/s in aggregate).
- Thanks to the pipelining mechanism, the elapsed time does not increase much as the number of nodes increases (see the graph on the following pages!)
- 5 min for 1 node, then 10 min for 500 nodes, theoretically.
- The measured time is only for the image cloning; in practice you need about 4 more minutes for the booting process (PXE + kickstart).
(continued on the next page)
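The numbers above fit a simple pipeline model: every node forwards each chunk as soon as it arrives, so the total time is about the one-node time plus one chunk delay per additional node. A back-of-envelope sketch using the figures from this slide (the formula itself is my reading of the intended model, not from the talk):

```shell
#!/bin/sh
# Pipeline model: T(n) ~= image/speed + (n-1) * chunk/speed
image_mb=2048   # 2 GB image (from the slides)
speed=8.2       # measured MB/s on full-duplex 100Base-TX
chunk_mb=4      # dolly's file chunk size

t() { awk -v n="$1" -v img="$image_mb" -v s="$speed" -v c="$chunk_mb" \
      'BEGIN { printf "%.0f\n", img/s + (n - 1)*c/s }'; }

echo "1 node:     $(t 1) s"    # ~250 s, close to the measured 230 s
echo "10 nodes:   $(t 10) s"   # only a few seconds more, as measured (~260 s)
echo "500 nodes:  $(t 500) s"  # ~8 min, consistent with "10 min for 500 nodes"
```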
14 (Graph slide; no transcript.)
15 (Graph slide: measured performance.)
16 How does dolly start after pushing the reset button?
- You set up the kickstart, PXE and DHCP config files and run those servers. Prepare one installed PC. Connect all PCs to the network.
- Push the reset button of all nodes.
- The PXE process starts on all nodes.
- 3.1) PXE asks the DHCP server for its booting process (local boot, kickstart installation or diskless client), its IP address and its hostname.
- The default booting process can be set in the DHCP config file, so keyboard operation on each node is not necessary.
- (Assume you select kickstart installation.)
- 3.2) PXE downloads a small OS kernel with kickstart images (2 MB) by multicast TFTP from the PXE server.
- 3.3) A Linux with a RAM-disk root runs on all nodes.
(Red items: operations you have to do. Blue items: automatic processing.)
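For step 3.1, the per-node IP address, hostname and boot behaviour all come from the DHCP server's config. An ISC dhcpd.conf fragment might look like this (the addresses, MAC and boot file name are made-up examples, not from the talk):

```
subnet 192.168.1.0 netmask 255.255.255.0 {
  next-server 192.168.1.1;           # the TFTP/PXE server
  filename "pxelinux.0";             # boot image the nodes download
  host n001 {
    hardware ethernet 00:11:22:33:44:55;
    fixed-address 192.168.1.11;
    option host-name "n001";
  }
  # ... one host entry per node
}
```

Switching a node between "kickstart installation" and "local boot" then amounts to editing what the server hands out and resetting the node, which keeps all operations centralized.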
(continued on the next page)
17 How does dolly start after pushing the reset button? (2)
- The kickstart process starts.
- 4.1) kickstart gets the IP address and hostname from the DHCP server and keeps them on the local RAM disk.
- 4.2) It makes file systems on the node's disks according to the kickstart config file.
- 4.3) kickstart invokes a post shell script on the nodes.
- 4.3.1) dollyC is run on the nodes.
- Start dollyS on the pre-installed machine.
- 5.1) After all nodes are ready, you start dollyS on the PC which was installed before this process. You can check the nodes' readiness with the ping command.
(continued on the next page)
18 How does dolly start after pushing the reset button? (3)
- The kickstart post shell script continues.
- 4.4) It overwrites the individual host information (IP address, hostname, fstab, ...) from the RAM disk onto the cloned local disk.
- 4.5) It runs LILO.
- Re-configure the DHCP server so that the nodes boot locally.
- Push the reset button of all nodes again.
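Steps 4.3-4.5 together amount to a kickstart %post script along these lines (a sketch only: the file locations and the exact dollyC invocation are assumptions; --nochroot keeps the cloned disk visible at /mnt/sysimage rather than chrooting into it):

```
%post --nochroot
# 4.3.1) receive the disk image over the dolly ring:
dollyC -v
# 4.4) overwrite per-node identity on the cloned disk,
#      using values saved from DHCP in step 4.1:
HOSTNAME=`cat /tmp/hostname`
echo "HOSTNAME=$HOSTNAME" >> /mnt/sysimage/etc/sysconfig/network
# 4.5) install the boot loader into the cloned root:
chroot /mnt/sysimage /sbin/lilo
```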
- PXE (Preboot eXecution Environment)
- http://developer.intel.com/ial/WfM/wfmspecs.htm
- A standard proposed by Intel for network-card firmware for network booting. It requires a PXE-compliant NIC.
- Kickstart (RedHat)
- RedHat's batch installer.
19 Conclusion
- I have developed a fast installation process suitable for a Linux cluster consisting of a massive number of PCs.
- Dolly is disk-image cloning software and is part of the installation process proposed here. Cloning a disk image to 10 PCs takes almost the same time as cloning to 1 PC, and high scalability up to installations of several hundred PCs is expected.
- We are using it for a 16-PC cluster and will use it for a cluster of 70 PCs this September.
- The software is available from http://corvus.kek.jp/manabe/pcf/dolly
- Thank you for reading!