Title: The Inferno Grid (and the Reading Campus Grid)
1. The Inferno Grid (and the Reading Campus Grid)
- Jon Blower
- Reading e-Science Centre
- Many others from the School of Systems Engineering and IT Services
http://www.resc.rdg.ac.uk (resc_at_rdg.ac.uk)
2. Introduction
- Reading is in the early stages of Campus Grid construction
  - Currently consists of two flocked Condor pools
  - More on this later
- Also experimenting with the Inferno Grid
  - A Condor-like system for pooling ordinary desktops
  - Although (like Condor) it could be used for more than this
- The Inferno Grid is commercial software, but free to the UK e-Science community
- Secure, low-maintenance, firewall-friendly
- Perhaps not (yet) as feature-rich as Condor
3. The Inferno operating system
- The Inferno Grid is based upon the Inferno OS
- Inferno OS is built from the ground up for distributed computing
  - Mature technology with a good pedigree (Bell Labs; Pike, Ritchie)
  - Extremely lightweight (about 1 MB of RAM), so it can run as an emulated application, identically, on multiple platforms (Linux, Windows, etc.)
  - Hence it is a powerful base for Grid middleware
- Everything in Inferno is represented as a file or a set of files
  - cf. /dev/mouse in Unix
- So to create a distributed system, you just have to know how to share files; Inferno uses a protocol called Styx for this (see the sketch below)
- Inferno OS is released under a liberal licence (free and open source) for non-commercial use
- Can run applications in the host OS (Linux, Windows, etc.)
- Secure certificate-based authentication, plus strong encryption, built in at OS level
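To make the file metaphor concrete, here is a minimal Limbo sketch (Inferno's /dev/pointer plays roughly the role that /dev/mouse plays in Unix; a directory mounted from a remote Styx server would be read in exactly the same way):

```
implement Peek;

include "sys.m";
	sys: Sys;
include "draw.m";

Peek: module
{
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

init(nil: ref Draw->Context, nil: list of string)
{
	sys = load Sys Sys->PATH;

	# Devices are ordinary files: just open and read.
	fd := sys->open("/dev/pointer", Sys->OREAD);
	if(fd == nil)
		return;
	buf := array[64] of byte;
	n := sys->read(fd, buf, len buf);
	if(n > 0)
		sys->print("pointer event: %s\n", string buf[0:n]);
}
```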
4. The Inferno Grid
- Built as an application in the Inferno OS
  - Hence it inherits the OS's built-in security and ease of distribution
  - Can run on all platforms that Inferno OS runs on
- Essentially high-throughput computing, cf. Condor
- Created by Vita Nuova (http://www.vitanuova.com)
- Free academic licence, but also used for real:
  - Evotec OAI (speeds up drug discovery): 90% utilisation of machines
  - A major government department modelling disease spread in mammals
  - Another major company (can't say more!)
  - University installations at Reading and York
  - (At AHM2004 an Inferno Grid was created from scratch, easily)
5. (Can also run Inferno native on bare hardware)
- Could write all applications in Limbo (Inferno's own language) and run them on all platforms, guaranteed!
- The software stack, top to bottom:
  - Inferno Grid software (a Limbo program)
  - Inferno OS (virtual OS)
  - Host OS (Windows, Linux, MacOSX, Solaris, FreeBSD)
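To illustrate the "write once in Limbo, run anywhere" point, here is the classic minimal Limbo program: compiled once to Dis bytecode, the same .dis file runs unchanged on every host OS in the bottom layer of the stack.

```
implement Hello;

include "sys.m";
	sys: Sys;
include "draw.m";

Hello: module
{
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

init(nil: ref Draw->Context, nil: list of string)
{
	sys = load Sys Sys->PATH;
	# Compiled once to platform-neutral Dis bytecode;
	# the Inferno virtual machine runs it identically everywhere.
	sys->print("hello from the Inferno Grid\n");
}
```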
6. Inferno Grid system overview
- Matches submitted jobs to the abilities of worker nodes
- The whole show is run by a scheduler machine
- Jobs are ordinary Windows/Linux/Mac executables
- The process is different from that of Condor
  - Unless Condor has changed/is changing
- In Condor, workers run daemon processes that wait for jobs to be sent to them
  - i.e. scheduler-push
  - Requires incoming ports to be open on each worker node
- In the Inferno Grid, workers dial into the scheduler and ask "have you got any work for me?" (see the sketch after this list)
  - i.e. worker-pull, or "labour exchange"
  - No incoming ports need to be open
  - Doesn't poll; uses persistent connections
  - Studies have shown this to be more efficient (not sure which ones ;-)
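A minimal Limbo sketch of the worker-pull idea; this is not the actual Inferno Grid code, and the scheduler address, port and plain-text exchange are invented for illustration:

```
implement Worker;

include "sys.m";
	sys: Sys;
include "draw.m";

Worker: module
{
	init: fn(ctxt: ref Draw->Context, argv: list of string);
};

init(nil: ref Draw->Context, nil: list of string)
{
	sys = load Sys Sys->PATH;

	# The worker dials OUT to the scheduler, so no incoming
	# ports need to be open on the worker's firewall.
	# (Hypothetical address and port.)
	(ok, conn) := sys->dial("tcp!scheduler.example.ac.uk!6666", nil);
	if(ok < 0) {
		sys->print("cannot reach scheduler: %r\n");
		return;
	}

	# Ask for work, then block on the persistent connection
	# until the scheduler replies: no polling.
	sys->fprint(conn.dfd, "have you got any work for me?\n");
	buf := array[1024] of byte;
	n := sys->read(conn.dfd, buf, len buf);
	if(n > 0)
		sys->print("received job: %s\n", string buf[0:n]);
}
```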
7. Architecture
(Diagram: workers in several administrative domains dialling in to a central scheduler)
- Worker firewalls: no incoming ports open; a single outgoing port open (to a fixed, known server)
- Workers can be in different administrative domains
- Workers can connect and disconnect at will
- Job submission is via the supplied GUI; other front-ends could be created (command-line, Web interface)
- Scheduler firewall: a single incoming port open
- The scheduler listens for job submissions and for workers reporting for duty
8. Job Administration
9. Node Administration
10. Pros and Cons
- Pros
  - Easy to install and maintain
  - Good security (see next slide)
  - Industry quality
- Cons
  - Small user base and not-great documentation
    - Hence a learning curve
  - Doesn't have all of Condor's features
    - E.g. migration, the MPI universe, reducing impact on primary users
  - No Globus integration yet
    - But probably not hard to write a JobManager for Inferno?
  - Security mechanism is Inferno's own
    - But we might see other mechanisms in Inferno in future
  - Question over scalability (100s of machines: fine; 1000s: not sure)
  - Inferno Grids don't flock yet
11. Security and impact on primary users
- Only one incoming port on the scheduler needs to be open through the firewall
- Nothing runs as root
- All connections in the Inferno Grid can be authenticated and encrypted
  - Public-key certificates for authentication, plus a variety of encryption algorithms
  - Certificate usage is transparent; the user is not aware it's there
  - Similar to SSL in principle
- Can set up worker nodes to run only certain jobs
  - So arbitrary code can be prevented from running
- Doesn't have all of Condor's options for pausing jobs on a keyboard press, etc.
  - Runs jobs at low priority
  - But could be set up so that workers don't ask for work if they are loaded
    - But what happens to a job that has already started?
12. Other points
- Slow-running tasks are reallocated until the whole job is finished
- Could fairly easily write different front-ends to the Inferno Grid for job submission and monitoring
  - Don't have to use the supplied GUI
  - ReSC's JStyx library could be used to write a Java GUI or JSPs
- In fact, the code base is small enough to make significant customisation realistic
  - Customise worker node behaviour
  - Flocking is probably not hard to do
    - Schedulers could exchange jobs
    - Or workers could know about more than one scheduler
- Inferno OS can be used to create a distributed data store very easily
  - This data store can link directly with the Inferno Grid
- Caveat: we haven't really used this in anger yet!
13. Building an Inferno Grid in this room
- These are conservative estimates (I think):
  - Install the scheduler (Linux machine): 10 minutes
  - Install worker node software (Windows): 2 minutes each
  - Run a toy job and monitor it: within 15 minutes of the start
  - Set up an Inferno Certificate Authority: 1 minute
  - Provide Inferno certificates to all worker nodes: 2 minutes per node
  - Provide Inferno certificates to users/admins: 2 minutes each
- A fully-secured (small) Inferno Grid up and ready in an hour or two
  - If you know what you're doing!! (remember that the docs aren't so good)
14. Reading Campus Grid so far
- A collaboration between the School of Systems Engineering, IT Services and the e-Science Centre
- Haven't had as much time as we'd like to investigate the Inferno Grid
- But we have an embryonic Campus Grid of two flocked Condor pools
  - Although both at Reading, they come under different admin domains
  - Getting them to share data space was challenging, and firewalls caused initial problems
  - (Incidentally, the Inferno Grid had no problems at all crossing the domains)
- A small number of users running MPI and batch jobs
  - Animal and Microbial Sciences, Environmental Systems Science Centre
- Ran a demo project for the University
- A heroic effort at the moment, but we are trying to secure funding
15. Novel features of RCG
- Problem: most machines are Windows, but most people want a *nix environment for scientific programs
- "Diskless Condor"
  - Windows machines reboot into Linux overnight
  - Linux is loaded from a network-shared disk image
  - Uses networked resources only (zero impact on the hard drive)
  - In the morning, each machine reboots back into Windows
- Looking into CoLinux (www.colinux.org)
  - Free VM technology for running Linux under Windows
  - Early days, but the initial look is promising
16. Future work
- Try to get funding!
- The intention is to make the Campus Grid a key part of campus infrastructure
  - IT Services are supportive
- Installation of SRB for a distributed data store
- Add clusters/HPC resources to the Campus Grid
- Working towards NGS compatibility
17. Conclusions
- The Inferno Grid has lots of good points, especially in terms of security and ease of installation and maintenance
  - Should be attractive to IT Services
- We haven't used it in anger yet, but it is used successfully by others (in academia, industry and government)
  - Caveat: these users tend to run a single app (or a small number of apps) rather than general code
- Doesn't have all of Condor's features
- We don't want to fragment effort or become marginalised
  - Would be great to see the good features of Inferno appear in Condor, especially the worker-pull mechanism