Title: The VM deployment process has 3 major steps:
Quality of Life in the Grids: VMs meet Bioinformatics Applications
Daniel Galron (1), Tim Freeman (2), Kate Keahey (3), Stephanie Gato (4), Natalia Maltsev (5), Alex Rodriguez (6), Mike Wilde (7)
(1) The Ohio State University. galron_at_cis.ohio-state.edu
(2) Argonne National Laboratory. tfreeman_at_mcs.anl.gov
(3) Argonne National Laboratory. keahey_at_mcs.anl.gov
(4) Indiana University. sgato_at_cs.indiana.edu
(5) Argonne National Laboratory. maltsev_at_mcs.anl.gov
(6) Argonne National Laboratory. arodri7_at_mcs.anl.gov
(7) Argonne National Laboratory. wilde_at_mcs.anl.gov
A Glossary of Terms
VMM (Virtual Machine Monitor): a third-party tool providing the interface between a Virtual Machine and the host machine. Some examples of VMMs are VMWare and Xen.
- Using VMs has many benefits for scientists running complex applications:
  - Broader resource base: a virtual machine can be pre-configured with a required OS, library signature and application installation, and then deployed on many different nodes independently of each node's configuration.
  - Simplified deployment/distribution: VMs can be used as distribution packages; to duplicate an installation, just copy a VM image.
  - Easy migration capability: an executing VM image can be frozen, transferred to (another) resource, and restarted within milliseconds.
  - Fine-grained resource management: one can confine resource usage within most VM implementations.
  - Enhanced security: VMs provide outstanding isolation, protecting the resource from the user and isolating users from each other.
- Complex applications require customized software configurations; such environments may not be widely available on Grid nodes.
- Installing scientific applications by hand can be arduous, lengthy and error-prone; the ability to amortize this process over many installations would help.
- Providing good isolation of Grid computations is a key security requirement; the currently used mechanism of Unix accounts is not sufficient.
- Providing a vehicle for fine-grained resource usage enforcement is critical for more efficient use of Grid resources, yet such technology is not widely available.
- The ability to migrate or restart applications would be of enormous value in a Grid environment, yet the current Grid frameworks do not support it.
VMManager: Grid service interface that allows a remote client to interact with the VMM.
VMRepository: Grid service which catalogues the VM images of a VO and stores them for retrieval and deployment.
Authorization Service: Grid service which the VMManager and VMRepository services call to check whether a user is authorized to perform the requested operation.
Virtual Machines meet the Grids
Performance Implications
In a nutshell
The performance of applications running on a VM depends on the third-party VMM and on the applications themselves. A purely CPU-bound program will have almost no performance degradation, as all instructions will be executed directly on hardware. Typically, virtual machines intercept privileged instructions (such as I/O), resulting in a performance hit for those instructions, although new methods, such as those implemented by Xen, improve this factor. In our implementation, we experimented with VMWare Workstation and Xen, and in our experience slowdown was never more than 30% and was often less than 5%. (The Xen slowdown was much less than 30%.)
Instead of running Grid software within VMs, we integrated VM deployment into the Grid infrastructure: mapping a client credential to a Unix account was replaced by deploying a VM and starting the client's environment within it.
We implemented the architecture using Globus Toolkit 3.2, an open-source Grid middleware toolkit which provides a framework for resource, data, and security management.
Migration
Describing VM Properties
- Integrating Virtual Machines with Grid technology allows easy migration of applications from one node to another. The steps are as follows:
  - Using Grid software, the client freezes execution of the VM.
  - The client then sends the migrate command to the VMManager, specifying the new host node as a parameter.
  - After checking for the proper authorization, the VM is registered with the new host and a GridFTP call transfers the image.
- In terms of performance, migration is on a par with deployment: it is mainly bound by the length of the transfer. In our tests, we migrated a 2 GB VM image between two identical nodes through a Fast Ethernet connection.
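The migration steps above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the poster's implementation: the `VMManager` class, its `authorize`, `pause` and `resume` methods, the `migrate` function, and the `transfer` callback (standing in for the GridFTP call) are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class VMManager:
    """Hypothetical stand-in for the VMManager Grid service on one host."""
    host: str
    acl: Set[str]                                        # users allowed on this host
    vms: Dict[str, str] = field(default_factory=dict)    # vm_id -> state

    def authorize(self, user: str) -> None:
        if user not in self.acl:
            raise PermissionError(f"{user} is not authorized on {self.host}")

    def pause(self, user: str, vm_id: str) -> None:
        self.authorize(user)
        self.vms[vm_id] = "paused"

    def resume(self, user: str, vm_id: str) -> None:
        self.authorize(user)
        self.vms[vm_id] = "running"


def migrate(user: str, vm_id: str, src: VMManager, dst: VMManager,
            transfer: Callable[[str, str, str], None]) -> None:
    """Freeze the VM, authorize, register on the new host, transfer, resume."""
    src.pause(user, vm_id)                # the client freezes execution of the VM
    dst.authorize(user)                   # authorization check on the new host
    dst.vms[vm_id] = "paused"             # register the VM with the new host
    transfer(vm_id, src.host, dst.host)   # image transfer (GridFTP in the poster)
    del src.vms[vm_id]                    # the old host no longer owns the VM
    dst.resume(user, vm_id)               # restart on the new host
```

If the authorization check on the new host fails, the VM stays paused on the original host, so the caller can resume it there.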
A VM constitutes a virtual workspace configured to meet the requirements of Grid computations. We use an XML Schema to describe various aspects of such a workspace, including virtual hardware (RAM size, disk size, virtual CD-ROM drives, serial ports, parallel ports), installed software including the operating system (e.g. kernel version, distribution type) and library signature, and other properties such as image name and VM owner. Based on those descriptions VMs can be selected, duplicated, or further configured.
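The poster does not reproduce the schema itself, so the following is only a sketch of what a workspace description and a selection test might look like. All element and attribute names (`workspace`, `hardware`, `ram`, `disk`, `software`, `os`, `library`) and all sample values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def build_descriptor(name, owner, ram_mb, disk_gb, distribution, kernel, libraries):
    """Build a workspace descriptor as XML. Element names are illustrative only."""
    ws = ET.Element("workspace", {"name": name, "owner": owner})
    hw = ET.SubElement(ws, "hardware")
    ET.SubElement(hw, "ram", {"unit": "MB"}).text = str(ram_mb)
    ET.SubElement(hw, "disk", {"unit": "GB"}).text = str(disk_gb)
    sw = ET.SubElement(ws, "software")
    ET.SubElement(sw, "os", {"distribution": distribution, "kernel": kernel})
    for lib in libraries:
        ET.SubElement(sw, "library").text = lib
    return ws


def matches(descriptor, min_ram_mb=0, required_libraries=()):
    """Return True if the descriptor satisfies the given selection criteria."""
    ram = int(descriptor.findtext("hardware/ram"))
    libs = {el.text for el in descriptor.findall("software/library")}
    return ram >= min_ram_mb and set(required_libraries) <= libs


# Sample values are made up for the example.
desc = build_descriptor("example-image", "alice", 512, 2,
                        "example-distro", "2.x", ["libexample-1.0"])
print(ET.tostring(desc, encoding="unicode"))
print(matches(desc, min_ram_mb=256, required_libraries=["libexample-1.0"]))
```

A repository could apply a test like `matches` to each stored descriptor to answer a client's query.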
Legend
- VMManager
- VMRepository
The graph to the right shows the proportion of
time taken by the constituents of the deployment
process, measured in seconds. Note that the graph
does not include time for authorization, but
those times are comparable to registration time.
Also, the actual migration time depends on network latency and bandwidth. The pause and resume times depend on the third-party VMM.
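Because the transfer dominates, a rough lower bound on migration time follows from image size and link rate alone. The sketch below applies this to the 2 GB image and Fast Ethernet link used in the tests. It assumes the nominal Fast Ethernet rate of 100 Mbit/s, reads "2 GB" as 2 x 10^9 bytes, and ignores latency and protocol overhead; it is an estimate, not a measured result.

```python
def min_transfer_seconds(image_bytes: float, link_bits_per_second: float) -> float:
    """Ideal transfer time: image size in bits divided by the link rate."""
    return image_bytes * 8 / link_bits_per_second


FAST_ETHERNET_BPS = 100e6   # nominal Fast Ethernet rate, 100 Mbit/s
IMAGE_BYTES = 2e9           # assumption: "2 GB" read as 2 x 10^9 bytes

print(min_transfer_seconds(IMAGE_BYTES, FAST_ETHERNET_BPS))  # 160.0
```

Any real transfer will take longer than this bound, since the link is never used at its full nominal rate.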
VM Deployment
- The VM deployment process has 3 major steps:
  - The client queries the VM repository, sending a list of criteria describing a workspace. The repository returns a list of VM descriptors that match them.
  - The client contacts the VMManager, sending it the descriptor of the VM they want to deploy, along with an identifier and a lifetime for the VM. The VMManager authorizes the request using an access control list.
  - The VM instance is registered with the VMManager and the VM is copied from the VMRepository. The VMManager then interfaces with the VMM on the resource to power on the VM.
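The three steps above can be sketched with in-memory objects. This is an illustration under stated assumptions rather than the Globus-based services themselves: the `Descriptor`, `VMRepository` and `VMManager` classes, their fields, and the `query` and `deploy` methods are hypothetical, and the image copy and the call to the VMM are reduced to a state change.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical, much reduced VM descriptor."""
    image_name: str
    os: str
    ram_mb: int


@dataclass
class VMRepository:
    """Hypothetical in-memory catalogue of VM image descriptors."""
    images: List[Descriptor]

    def query(self, os: str, min_ram_mb: int) -> List[Descriptor]:
        # Return every descriptor that matches the client's criteria.
        return [d for d in self.images if d.os == os and d.ram_mb >= min_ram_mb]


@dataclass
class VMManager:
    """Hypothetical manager: authorizes, registers, copies, and powers on."""
    acl: Set[str]
    instances: Dict[str, dict] = field(default_factory=dict)

    def deploy(self, user, descriptor, instance_id, lifetime_s, repository):
        if user not in self.acl:                  # access control list check
            raise PermissionError(user)
        if descriptor not in repository.images:   # image must exist in the repository
            raise LookupError(descriptor.image_name)
        self.instances[instance_id] = {           # register the VM instance
            "image": descriptor.image_name,       # (the image copy would happen here)
            "lifetime_s": lifetime_s,
            "state": "running",                   # the VMM is asked to power on the VM
        }
        return self.instances[instance_id]
```

A client would call `query` first, pick one of the returned descriptors, and pass it to `deploy` together with an identifier and a lifetime.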
The low-level features of our architecture are detailed in the diagram to the right. The diagram describes four nodes, each running a (potentially different) host OS. Each node is running a VMM and a VMManager Grid Service. On top of that layer run the actual VMs, which are installed with Grid software, allowing them to be run as Grid nodes. The VMs could also be used as independent execution environments, without Grid middleware installed on them; instead, they would run applications directly.
The graph to the right shows the proportion of time taken by the constituents of the deployment process, measured in seconds. The authorization time is not included, but it is comparable to registration time. The dominant factor in overall deployment time is the image transfer, which depends on network latency and bandwidth.
After a scientist has deployed a VM onto the
resource, he may run an application in it. For
this purpose, each of our VMs was configured with
the Globus Toolkit. This picture represents a
scientist running the TOPO program, creating an
image of a transmembrane protein.
How does using VMs help the Bioinformatics
community?
Do VMs fulfill their promise?
Issues or Problems Encountered
- Broader base of resources: Our tests show that this first promise is met. Consider the following situation: a scientist can use a testbed on the DOE Science Grid across several clusters. The scientist has access to 20 Solaris nodes at LBNL, 20 nodes in ANL's Jazz cluster (Linux nodes), and 20 Linux nodes on NERSC's pdsf cluster. If only the Jazz nodes have the necessary configuration to run EMBOSS, it would take a lot more work to get EMBOSS to run on the LBNL and pdsf clusters. If we install EMBOSS on a VM and then run an instance of the VM on each node, we can use all 60 nodes instead of just 20.
- Easier deployment/distribution: Using VMs makes deployment easier and faster. In our tests we experimented with a 2 GB minimal VM image, with the following results:
  - EMBOSS installation: 45 minutes
  - VM deployment on our testbed: 6 minutes 23 seconds
  - Peace of mind (not having to debug installation): priceless!
- Fine-grained resource management: Depending on the implementation, a VM can provide fine-grained resource usage enforcement, which is critical in many scenarios in the Grid.
- Enhanced security: VMs offer enhanced isolation and are therefore a more secure solution for representing user environments.
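The two timings reported above give a feel for how the installation cost is amortized. The sketch below is arithmetic on those figures only. It assumes, for illustration, that a hand installation takes the same 45 minutes on every node, that the 6 minutes 23 seconds applies per node, and that everything runs sequentially; the poster does not state any of this.

```python
INSTALL_S = 45 * 60       # EMBOSS installation by hand: 45 minutes (from the poster)
DEPLOY_S = 6 * 60 + 23    # VM deployment: 6 minutes 23 seconds (from the poster)


def hand_install_total(nodes: int) -> int:
    """Assumption: every node needs its own sequential hand installation."""
    return nodes * INSTALL_S


def vm_total(nodes: int) -> int:
    """Assumption: one installation inside the VM, then one deployment per node."""
    return INSTALL_S + nodes * DEPLOY_S


for n in (1, 20, 60):
    # Totals in minutes: hand installation versus VM-based deployment.
    print(n, hand_install_total(n) / 60, vm_total(n) / 60)
```

Under these assumptions the VM route costs slightly more for a single node and is cheaper from the second node onward.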
When developing the architecture we encountered several important and interesting issues that we had to resolve.
- Clocks: While a VM image is frozen or powered off, the VM's clock does not update. We need a way to update a VM's clock as soon as it is powered on or unpaused.
- IP addresses: We need a way to assign a unique IP address to each new VM instance (i.e. each time a VM is deployed) so that multiple copies of the same VM can be deployed on the same subnet.
- Starting a Grid container: We also need a way to automatically start a Grid container on startup of a VM if we want it to be a full-fledged Grid node, or at least launch a User Hosting Environment (UHE).
We solved these issues by installing a daemon on the VM upon deployment; it sets the IP address of the VM, launches a UHE and, if needed, updates the clock.
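The poster does not say how the daemon obtains its address. One possible approach, sketched here purely as an assumption, is a small pool that hands out one unused host address per VM instance. The `AddressPool` class, its `assign` method, and the sample subnet are hypothetical; only Python's standard `ipaddress` module is used.

```python
import ipaddress
from typing import Dict


class AddressPool:
    """Hands out one unused host address per VM instance (illustrative only)."""

    def __init__(self, cidr: str) -> None:
        # Iterate over the usable host addresses of the subnet.
        self._free = iter(ipaddress.ip_network(cidr).hosts())
        self.leases: Dict[str, object] = {}

    def assign(self, instance_id: str):
        if instance_id in self.leases:        # the same instance keeps its address
            return self.leases[instance_id]
        try:
            address = next(self._free)
        except StopIteration:
            raise RuntimeError("address pool exhausted") from None
        self.leases[instance_id] = address
        return address


pool = AddressPool("192.168.10.0/29")
print(pool.assign("example-vm#1"))   # 192.168.10.1
print(pool.assign("example-vm#2"))   # 192.168.10.2
```

This sketch never reclaims addresses; a real deployment would also need to release a lease when the VM's lifetime expires.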