Rocks


1
Rocks
  • University of Michigan
  • MGRID April 2005
  • Federico D. Sacerdoti
  • SDSC Rocks Cluster Group

2
Primary Goal
  • Make clusters easy
  • Target audience: scientists who want a capable
    computational resource in their own lab

A (mad) scientist in need of computing power
3
The Way to Manage Software
  • Not fun to care for and feed a system
  • Codify all configuration
  • Test every software component a priori
  • Code the configuration of services like you code
    applications
  • Test, test, test
  • Takes longer at the outset
  • But we can repeat a config with 100% accuracy
    every time
  • Rocks will win the INSTALLATION OLYMPICS
  • Time to bring a node from bare bones to fully
    functional, with an arbitrary number of services
    and components
  • All compute nodes are automatically installed
  • Critical for scaling in clusters

4
Complexity Hiding
  • Too hard to repeat OS and service configuration
    for N nodes, where N is large. Automate.
  • Rocks is not the first one to do this
  • OSCAR / SystemImager (Some assembly required)
  • Radmin
  • Rocks is unique in its Complexity Hiding
    philosophy
  • All configuration is pre-tested and hidden
  • End user needs computing capabilities, not
    exposure to pipes, wires, and fittings of cluster
    services.
  • Mature industries are all similar
  • Automotive
  • Electrical
  • Civil Engineering (Structures)
  • No such thing as a wall socket administrator
    (Katz 2004)

5
Complexity Hiding
  • Cluster System Administrators still valuable
  • When you want to customize, the structure is
    there for you. Can always raise the hood and
    tinker.
  • Rocks uses unmodified RPMs for the software bits
    and bytes: easy to find, add, and use.
  • Integration and testing of new software is the
    hard part for a SysAdmin. Always has been.
  • We all have to learn how to configure a new
    service (Condor, SGE, Globus, X.509, MyProxy), but
    LEARN IT ONCE.
  • Every command typed to configure a service is
    codified
  • Takes forever, testing is time-consuming, but
    rewards are immeasurable

6
Codify All Configuration
  • How do you configure NTP on Rocks compute nodes?

<post>
<!-- Configure NTP to use an external server -->
<file name="/etc/ntp.conf">
server <var name="Kickstart_PrivateNTPHost"/>
authenticate no
driftfile /var/lib/ntp/drift
</file>

<!-- Force the clock to be set to the server upon reboot -->
/bin/mkdir -p /etc/ntp
<file name="/etc/ntp/step-tickers">
<var name="Kickstart_PrivateNTPHost"/>
</file>

<!-- Force the clock to be set to the server right now -->
/usr/sbin/ntpdate <var name="Kickstart_PrivateNTPHost"/>
/sbin/hwclock --systohc
</post>

ntp-client.xml
7
More Philosophy
  • Use Installation as common mechanism to manage a
    cluster
  • Rocks formats and installs a / partition
  • On initial install (from bare metal)
  • When replacing a dead node
  • Adding new nodes
  • Rocks also uses installation to keep software
    consistent
  • If you catch yourself wondering whether a node's
    software is up-to-date, reinstall!
  • In 10 minutes, all doubt is erased
  • Rocks doesn't attempt to incrementally update
    software

8
Rocks uses hard disks
  • Rocks employs disks on nodes
  • As a performance optimization (compare to NFS
    mounting /)
  • Why not? Disks are free and reliable
  • MTBF of a disk is now higher than that of the
    chassis (1 million hrs vs. 45k hrs)
  • No significant discount from buying nodes without
    disk
  • Less flexible?
  • Use NFS overlay mounts wherever you want (see
    the sketch after this list)
  • Less secure?
  • Use an encrypted filesystem if you need red/black
    modes.
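A minimal sketch of such an overlay mount, assuming the frontend exports a shared application tree at /export/apps and the compute node mounts it at /share/apps (hypothetical paths; the node's local root filesystem stays on its own disk):

  # /etc/fstab entry on a compute node (illustrative only)
  frontend-0:/export/apps   /share/apps   nfs   defaults   0 0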

9
Architecture
  • Rocks Clusters

10
Philosophy
  • Run on heterogeneous, standard, high-volume
    components
  • Use the components that offer the best
    price/performance!
  • Given the track record of General Purpose
    Processors, any other strategy is risky.
  • No stopping at the thermal/frequency wall: dual
    core, quad core
  • Requires a more intelligent installer if your
    hardware is not identical
  • Red Hat fundamentally has this problem (the set of
    worldwide Linux users is maximally heterogeneous)
  • Their ability to discover and configure hardware
    is top-notch, so why not leverage their work?

11
Rocks Hardware Architecture
12
Minimum Components
  • Local hard drive
  • Power
  • Ethernet
  • OS on all nodes (not SSI)
  • i386 (Pentium/Athlon), x86_64 (Opteron/EM64T),
    ia64 (Itanium)
13
Minimum Hardware Requirements
  • Frontend
  • 2 ethernet connections
  • 18 GB disk drive
  • 512 MB memory
  • Compute
  • 1 ethernet connection
  • 18 GB disk drive
  • 512 MB memory
  • Power
  • Ethernet

14
Optional Components
  • High-performance network
  • Myrinet
  • Infiniband (Infinicon or Voltaire)
  • Network-addressable power distribution unit
  • keyboard/video/mouse network not required
  • Non-commodity
  • How do you manage your management network?

15
Storage
  • NFS
  • The frontend exports all home directories (see
    the sketch after this list)
  • Parallel Virtual File System version 1
  • System nodes can be targeted as combined
    compute/PVFS nodes or strictly PVFS nodes
  • Lustre Roll is in development
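A minimal sketch of what that export can look like on the frontend, assuming home directories live under /export/home and a private cluster network of 10.1.0.0/255.255.0.0 (both hypothetical values for this sketch):

  # /etc/exports on the frontend (illustrative only)
  /export/home    10.1.0.0/255.255.0.0(rw,async)

Compute nodes then mount these directories over the private network (note the autofs-client node attached to slave-node in the sample graph later in the deck).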

16
Standard Rocks Storage
  • Exported to compute nodes via NFS

17
Network Attached Storage
  • A NAS box is an embedded NFS appliance

18
Parallel Virtual File System
19
Cluster Software Stack
20
Rocks Rolls
  • Rolls are containers for software packages and
    the configuration scripts for the packages
  • Rolls dissect a monolithic distribution

21
Rolls
  • Think of a roll as a package for a car

22
Rolls: User-Customizable Frontends
  • Rolls are added by the Red Hat installer
  • Software within a roll is added and configured at
    initial installation time

23
Red Hat Installer Modified to Accept Rolls
24
Approach
  • Install a frontend
  • Insert Rocks Base CD
  • Insert Roll CDs (optional components)
  • Answer 7 screens of configuration data
  • Drink coffee (takes about 30 minutes to install)
  • Install compute nodes
  • Login to frontend
  • Execute insert-ethers
  • Boot compute node with Rocks Base CD (or PXE)
  • Insert-ethers discovers nodes
  • Go to step 3 and boot the next compute node (see
    the command sketch after this list)
  • Add user accounts
  • Start computing
  • Optional Rolls
  • Condor
  • Grid (based on NMI R4)
  • Intel (compilers)
  • Java
  • SCE (developed in Thailand)
  • Sun Grid Engine
  • PBS (developed in Norway)
  • Area51 (security monitoring tools)
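The compute-node installation loop above, sketched as frontend commands (node names are illustrative; discovery and installation happen automatically once insert-ethers is running):

  # on the frontend: start the DHCP listener that registers new nodes
  insert-ethers
  # network-boot (PXE) or CD-boot each compute node; insert-ethers
  # discovers it, names it (e.g. compute-0-0), and the automatic
  # install begins; repeat the boot for each remaining node
  # when the nodes are up, add accounts and start computing, e.g.:
  useradd alice          # "alice" is a hypothetical user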

25
Login to Frontend
  • Create ssh public/private key
  • Ask for passphrase
  • These keys are used to securely log in to compute
    nodes without entering a password each time (a
    sketch of the equivalent commands follows this
    list)
  • Execute insert-ethers
  • This utility listens for new compute nodes
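Roughly the same effect can be produced manually with standard OpenSSH commands; this is a sketch, not the exact Rocks first-login script:

  # generate a key pair; you are prompted for a passphrase
  ssh-keygen -t rsa -f ~/.ssh/id_rsa
  # publish the public half so compute nodes (which see the same
  # NFS-mounted home directory) accept the key without a password
  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys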

26
Insert-ethers
  • Used to integrate appliances into the cluster
  • A DHCP listener that registers new nodes

27
Boot a Compute Node in Installation Mode
  • Instruct the node to network boot
  • Network boot forces the compute node to run the
    PXE protocol (Preboot eXecution Environment)
  • Also can use the Rocks Base CD
  • If no CD and no PXE-enabled NIC, can use a boot
    floppy built from Etherboot (http://www.rom-o-matic.net)

28
Insert-ethers Discovers the Node
29
Insert-ethers Status
30
eKV: Ethernet Keyboard and Video
  • Monitor your compute node installation over the
    ethernet network
  • No KVM required!
  • During compute node installation, execute on the
    frontend: ssh -p2200 compute-0-0

31
eKV: View a Console Install via SSH
32
Node Info Stored In A MySQL Database
  • If you know SQL, you can execute powerful
    commands
  • Rocks-supplied command line utilities are tied
    into the database
  • E.g., get the hostname for the bottom 8 nodes of
    each cabinet

cluster-fork --query="select name from nodes where rank<9" hostname
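The same table can also be queried directly with the MySQL client; a sketch, assuming the Rocks database is named "cluster" and is readable by a local "apache" account (treat both names as assumptions for this illustration):

  mysql -u apache cluster -e "select name, rank from nodes where rank < 9"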
33
Cluster Database Backbone
34
Kickstart
  • Red Hat's Kickstart
  • Monolithic flat ASCII file
  • No macro language
  • Requires forking based on site information and
    node type.
  • Rocks XML Kickstart
  • Decompose a kickstart file into nodes and a graph
  • Graph specifies an OO framework
  • Each node specifies a service and its
    configuration
  • Macros and SQL for site configuration
  • Compile a flat kickstart file from a web CGI
    script

35
Kickstart Compile from Graph
Sent to node (http)
Compile (kgen)
36
Sample Node File
<kickstart>

<description>
Enable SSH
</description>

<package>openssh</package>
<package>openssh-clients</package>
<package>openssh-server</package>
<package>openssh-askpass</package>

<post>
<file name="/etc/ssh/ssh_config">
Host *
        CheckHostIP             no
        ForwardX11              yes
        ForwardAgent            yes
        StrictHostKeyChecking   no
        UsePrivilegedPort       no
        FallBackToRsh           no
        Protocol                1,2
</file>

chmod o+rx /root
mkdir /root/.ssh
chmod o+rx /root/.ssh
</post>

</kickstart>
37
Sample Graph File
<?xml version="1.0" standalone="no"?>

<graph>

<description>
Default Graph for Rocks.
</description>

<edge from="base" to="scripting"/>
<edge from="base" to="ssh"/>
<edge from="base" to="ssl"/>
<edge from="base" to="grub" arch="i386,x86_64"/>
<edge from="base" to="elilo" arch="ia64"/>

<edge from="node" to="base"/>
<edge from="node" to="accounting"/>

<edge from="slave-node" to="node"/>
<edge from="slave-node" to="autofs-client"/>
<edge from="slave-node" to="dhcp-client"/>
<edge from="slave-node" to="snmp-server"/>
<edge from="slave-node" to="node-certs"/>

<edge from="compute" to="slave-node"/>

<edge from="master-node" to="node"/>
<edge from="master-node" to="x11"/>

</graph>
38
Kickstart framework
39
Kickstart Graph with Roll (HPC roll merged with the base graph)
40
Compute Node Installation Timeline
41
Available Rolls
  • Area51
  • Tripwire and rootkit detection
  • Condor
  • High-throughput computing grid package
  • IB
  • Infiniband drivers and MPI from Infinicon
  • Intel
  • Compiler and libraries for Intel-based clusters
    (Scalable Systems)
  • Grid
  • NMI packaging of Globus
  • PBS/Maui
  • Job scheduling
  • SCE
  • Scalable cluster environment (Thailand)
  • SGE
  • Job scheduling
  • Viz
  • Easily set up nVidia-based viz clusters
  • Java
  • Java environment
  • RxC
  • Graphical cluster management tool (Scalable
    Systems)
  • Lava
  • Workload management (Platform Computing)
  • IB-Voltaire
  • Infiniband drivers and MPI from Voltaire

42
Futures
43
Rocks 4.0.0
  • Currently in beta
  • Based on RHEL 4.0
  • Kernel v2.6
  • Using CentOS as base operating environment
  • CentOS is a RHEL rebuild
  • When asked for a roll, input stock CentOS CDs
  • Implication
  • Opens the door for using any RHEL-based media
  • Official RHEL bits
  • Other RHEL clones (e.g., Scientific Linux)

44
More Rolls
  • Application-specific rolls
  • Oil and Gas
  • Computational Chemistry
  • Rendering
  • Bioinformatics

45
Largest Known Rocks Clusters
  • Scientific
  • Our bread and butter.
  • Tungsten2 (1,040 CPUs - NCSA)
  • Fermilab Farms (1,500 CPUs, subclustered)
  • Lonestar (1,024 CPUs - TACC)
  • Iceburg (600 CPUs - Stanford)
  • Commercial
  • Niobe Cluster (288 CPUs - AMD Sunnyvale labs)
  • Oil & Gas: rumors of 1000s of nodes (unspecified)
  • Dell as hardware vendor / Platform Computing for
    Rocks support
  • Beginning to see Rocks on RFP requirement lists