Title: NPACI Rocks
Slide 1: NPACI Rocks
- Mason Katz
- San Diego Supercomputer Center
Slide 2: Who is NPACI Rocks?
- Cluster Computing Group at SDSC
- UC Berkeley Millennium Project
  - Provide Ganglia support
- Linux Competency Centre in SCS Enterprise Systems Pte Ltd in Singapore
  - Provide PVFS support
  - Working on user documentation for Rocks 2.2
Slide 3: What sets us apart
- Fully automated cluster deployment
  - Get and burn an ISO CD image from Rocks.npaci.edu
  - Fill out a form to build the initial kickstart file for your first front-end machine
  - Kickstart the naked front-end with the CD and kickstart file
  - Reboot the front-end machine
  - Integrate compute nodes with insert-ethers
  - Ready to go!
- Complete out-of-the-box solution with rational default settings
Slide 4: Who is Using It?
- Growing (and partial) list of users that we know about
  - SDSC, SIO, UCSD (8 clusters, including CMS (GriPhyN) prototype)
  - Caltech
  - Burnham Cancer Institute
  - PNNL (several clusters: small, medium, large)
  - University of Texas
  - University of North Texas
  - Northwestern University
  - University of Hong Kong
  - Compaq (working relationship with their Intel Standard Servers Group)
  - Singapore Bioinformatics Institute
  - Myricom (their internal development cluster)
  - Cray (partnered with Dell)
Slide 5: Motivation
- Care and feeding of a system isn't fun.
- Enable non-cluster experts to run clusters
- Essential to track software updates
  - Open source moves fast!
  - On the order of 3 updates a week for Red Hat
- Essential to track Red Hat releases
  - Feature rot
  - Unplugged security holes
- Run on heterogeneous, standard high-volume components
Slide 6: Philosophy
- All nodes are 100% automatically installed
  - Zero hand configuration
  - Scales very well
- NPACI Rocks is an entire cluster-aware distribution
  - Included packages
    - Full Red Hat release
    - De facto standard cluster packages (MPI, PBS/Maui, etc.)
    - NPACI Rocks packages
Slide 7: More Philosophy
- Use installation as the common mechanism to manage the cluster
- Install when
  - Initially bringing up the cluster
  - Replacing a dead node
  - Adding new nodes
- Also use installation to keep software consistent
  - If you catch yourself wondering whether a node's software is up to date, reinstall!
  - In 10 minutes, all doubt is erased.
Slide 8: Basic Architecture
[Diagram: front-end node(s) on the public Ethernet, with compute nodes behind a Fast-Ethernet switching complex and a gigabit network switching complex; power distribution (network-addressable units as an option)]
Slide 9: Major Components
Slide 10: Configuration Derived from Database
[Diagram: automated node discovery. insert-ethers records node 0 through node N in the MySQL DB; makehosts, makedhcp, and pbs-config-sql then generate /etc/hosts, /etc/dhcpd.conf, and the PBS node list from the database.]
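To make this flow concrete, here is a minimal sketch of the discovery-to-configuration pipeline in Python, with SQLite standing in for the MySQL database. The table layout, column names, and tool behavior are illustrative assumptions, not the actual Rocks 2.2 schema or code.

import sqlite3

# Stand-in for the cluster MySQL database (schema assumed for illustration).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE nodes (name TEXT, mac TEXT, ip TEXT)")

def insert_ether(name, mac, ip):
    # insert-ethers' role: record each newly discovered node in the database.
    db.execute("INSERT INTO nodes VALUES (?, ?, ?)", (name, mac, ip))

insert_ether("compute-0-0", "00:50:8b:aa:bb:cc", "10.255.255.254")
insert_ether("compute-0-1", "00:50:8b:aa:bb:cd", "10.255.255.253")

def makehosts():
    # makehosts' role: regenerate /etc/hosts entries from the database.
    return "\n".join(f"{ip}\t{name}"
                     for name, mac, ip in db.execute("SELECT * FROM nodes"))

def makedhcp():
    # makedhcp's role: regenerate dhcpd.conf host stanzas the same way.
    return "\n".join(
        f"host {name} {{ hardware ethernet {mac}; fixed-address {ip}; }}"
        for name, mac, ip in db.execute("SELECT * FROM nodes"))

print(makehosts())
print(makedhcp())

Because every derived file is regenerated from the database, none of them ever needs to be edited by hand.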
Slide 11: Key Tables - Nodes
Slide 12: Software Installation
[Diagram: a distribution (the collection of all possible software packages, as RPMs) plus the descriptive information needed to configure a node (the kickstart file) together produce the appliances: compute node, I/O server, web server.]
Slide 13: Software Repository
[Same diagram as slide 12, with the software repository highlighted: the collection of all possible software packages (the distribution), delivered as RPMs.]
Slide 14: Rocks-dist
- Distribution builder
  - Builds both Rocks and Red Hat distributions: to rocks-dist they are the same thing
- Version manager
  - Resolves software updates
  - Defaults to the most recent software
  - Can force package versions as needed (see the sketch after this slide)
- Distribution versioning
  - Allows multiple distributions at once
- CD-ROM building
  - Build your own bootable Rocks CD
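A minimal sketch of the version-resolution rule described above: newest wins unless a version is forced. The data format and pinning interface here are assumptions for illustration, not rocks-dist internals.

def resolve(rpms, forced=None):
    # rpms: (name, version) pairs gathered from all source trees,
    # e.g. the Red Hat release, Red Hat updates, and Rocks packages.
    forced = forced or {}
    newest = {}
    for name, version in rpms:
        # Default to the most recent software (simplified comparison;
        # real RPM version ordering is more involved).
        key = tuple(int(x) for x in version.split("."))
        if name not in newest or key > newest[name][0]:
            newest[name] = (key, version)
    chosen = {name: version for name, (key, version) in newest.items()}
    chosen.update(forced)  # a forced version overrides the newest one
    return chosen

rpms = [("openssh", "2.9"), ("openssh", "3.1"), ("kernel", "2.4.9")]
print(resolve(rpms))                             # newest openssh (3.1) wins
print(resolve(rpms, forced={"openssh": "2.9"}))  # pinned back to 2.9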
Slide 15: How we use rocks-dist
Slide 16: How you use rocks-dist
Slide 17: Inheritance
- Rocks
  - Red Hat plus updates
  - Rocks software
- Campus
  - Rocks software
  - Campus changes
- Cluster
  - Campus Rocks
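As a toy illustration of this layering, each tier can be viewed as the tier above with its own packages overlaid. The package names below are invented for the example.

# Hypothetical package sets; later layers override earlier ones.
redhat = {"kernel": "rh-2.4.9", "openssh": "rh-2.9"}
updates = {"openssh": "rh-3.1"}                 # Red Hat updates
rocks_sw = {"rocks-dist": "2.2", "pbs": "2.2"}  # Rocks software
campus_sw = {"site-auth": "1.0"}                # campus changes

rocks = {**redhat, **updates, **rocks_sw}  # Rocks = Red Hat plus updates + Rocks software
campus = {**rocks, **campus_sw}            # Campus = Rocks + campus changes
cluster = dict(campus)                     # Cluster = Campus Rocks
print(cluster)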
Slide 18: Installation Instructions
[Same diagram as slide 12, with the installation instructions highlighted: the descriptive information used to configure a node (the kickstart file).]
Slide 19: Kickstart
- Description-based installation
  - Manage software components, not the bits on the disk
  - The only way to deal with heterogeneous hardware
  - System imaging (aka bit blasting) relies on homogeneity
  - Homogeneous clusters do not exist
- Red Hat's Kickstart
  - Flat ASCII file
  - No macro language
  - Requires forking based on site information and node type
- Rocks XML Kickstart
  - Decompose a kickstart file into nodes and graphs
  - Macros and SQL for site configuration
  - Driven from a web CGI script
Slide 20: XML Kickstart
- Nodes
  - Each describes a single set of functionality
    - ssh
    - apache
  - Kickstart file snippets (XML tags map to kickstart commands)
  - Pull site configuration from the SQL database
  - Over 80 node files in Rocks
- Graph
  - Defines the interconnections for nodes
  - Think OOP or dependencies
  - A single graph file in Rocks
- Graph + Nodes + SQL -> node-specific kickstart file
Slide 21: Sample Node File
<?xml version="1.0" standalone="no"?>
<!DOCTYPE kickstart SYSTEM "@KICKSTART_DTD@" [<!ENTITY ssh "openssh">]>
<kickstart>

<description>
Enable SSH
</description>

<package>&ssh;</package>
<package>&ssh;-clients</package>
<package>&ssh;-server</package>
<package>&ssh;-askpass</package>

<post>
cat &gt; /etc/ssh/ssh_config &lt;&lt; 'EOF'  <!-- default client setup -->
Host *
        ForwardX11 yes
        ForwardAgent yes
EOF

chmod o+rx /root
mkdir /root/.ssh
chmod o+rx /root/.ssh
</post>

</kickstart>
Slide 22: Sample Graph File
<?xml version="1.0" standalone="no"?>
<!DOCTYPE kickstart SYSTEM "@GRAPH_DTD@">
<graph>

<description>
Default Graph for NPACI Rocks.
</description>

<edge from="base"        to="scripting"/>
<edge from="base"        to="ssh"/>
<edge from="base"        to="ssl"/>
<edge from="base"        to="lilo"  arch="i386"/>
<edge from="base"        to="elilo" arch="ia64"/>
<edge from="node"        to="base"  weight="80"/>
<edge from="node"        to="accounting"/>
<edge from="slave-node"  to="node"/>
<edge from="slave-node"  to="nis-client"/>
<edge from="slave-node"  to="autofs-client"/>
<edge from="slave-node"  to="dhcp-client"/>
<edge from="slave-node"  to="snmp-server"/>
<edge from="slave-node"  to="node-certs"/>
<edge from="compute"     to="slave-node"/>
<edge from="compute"     to="usher-server"/>
<edge from="master-node" to="node"/>
<edge from="master-node" to="x11"/>
<edge from="master-node" to="usher-client"/>

</graph>
Slide 23: Kickstart framework
Slide 24: Composition
- Aggregate functionality, e.g. Scripting
  - IsA perl-development
  - IsA python-development
  - IsA tcl-development
Slide 25: Minor Differences
- Specify only the deltas
- Desktop IsA
  - Standalone
- Laptop IsA
  - Standalone
  - Pcmcia
Slide 26: Architecture
- Conditional inheritance
- Annotate edges with target architectures
Slide 27: Payoff - never-before-seen hardware
- Dual Athlon, white box, 20 GB IDE, 3Com Ethernet
- 3:00 PM: in the cardboard box
  - Shook out the loose screws
  - Dropped in a Myrinet card
  - Inserted it into cabinet 0
  - Cabled it up
- 3:25 PM: inserted the NPACI Rocks CD
  - Ran insert-ethers (assigned node name compute-0-24)
- 3:40 PM: ran Linpack
Slide 28: Futures
- Improve monitoring, debugging, and self-diagnosis of cluster-specific software
- Improve documentation!
- Continue tracking Red Hat updates/releases
- Prepare for the InfiniBand interconnect
- Global file systems; I/O is an Achilles' heel of clusters
- Grid tools (development and testing)
  - Globus
  - Grid research tools (APST)
  - GridPort toolkit
- Integration with other SDSC projects
  - SRB
  - MiX (data mediation)
  - Visualization cluster (display wall)
Slide 29: Summary
- Rocks significantly lowers the bar for users to deploy usable compute clusters
- Very simple hardware assumptions
- XML module descriptions allow encapsulation
- Graph interconnection allows appliances to share configuration
- Deltas among appliances are easily visualized
- HTTP transport is scalable in
  - Performance
  - Distance