Virtuoso: Distributed Computing Using Virtual Machines

1
Virtuoso: Distributed Computing Using Virtual
Machines
  • Peter A. Dinda
  • Prescience Lab
  • Department of Computer Science
  • Northwestern University
  • http://plab.cs.northwestern.edu

2
People and Acknowledgements
  • Students
  • Ashish Gupta, Ananth Sundararaj, Dong Lu, Jason
    Skicewicz, Billy Davidson, Andrew Weinrich
  • Collaborators
  • In-Vigo project at University of Florida
  • Renato Figueiredo, Jose Fortes
  • Funder
  • NSF through several awards

3
Outline
  • Motivation
  • Virtuoso Model
  • Virtual networking and remote devices
  • Information services
  • Resource measurement and prediction
  • Resource control
  • Related work
  • Conclusions

R. Figueiredo, P. Dinda, J. Fortes, A Case For
Grid Computing on Virtual Machines, ICDCS 2003
4
  • How do we deliver arbitrary amounts of
    computational power to ordinary people?

5
Distributed and Parallel Computing
  • How do we deliver arbitrary amounts of
    computational power to ordinary people?

Interactive Applications
7
[Diagram of resources at Northwestern]
  • IBM xSeries virtual cluster (64 CPUs), 1 TB RAID
  • Interactivity Environment Cluster, CAVE (90
    CPUs), 8 TB RAID
  • 2 Distributed Optical Testbed clusters: IBM
    xSeries (14-28 CPUs), 1 TB RAID
  • DOT clusters with optical connectivity: IBM
    xSeries (14-28 CPUs), 1 TB RAID; Argonne,
    U. Chicago, IIT, NCSA, others
  • Nortel Optera Metro Edge Optical Router
  • Distributed Optical Testbed (DOT) Private Optical
    Network
8
Grid Computing
  • Flexible, secure, coordinated resource sharing
    among dynamic collections of individuals,
    institutions, and resources
  • I. Foster, C. Kesselman, S. Tuecke, The Anatomy
    of the Grid: Enabling Scalable Virtual
    Organizations, International J. Supercomputer
    Applications, 15(3), 2001
  • Globus, Condor/G, Avaki, EU DataGrid SW,

9
Complexity from User's Perspective
  • Process or job model
  • Lots of complex state: connections, special
    shared libraries, licenses, file descriptors
  • Operating system specificity
  • Perhaps even version-specific
  • Symbolic supercomputer example
  • Need to buy into some Grid API
  • Install and learn complex Grid software

10
Users already know how to deal with this
complexity at another level
11
Complexity from Resource Owner's Perspective
  • Install and learn complex Grid software
  • Deal with local accounts and privileges
  • Associated with global accounts or certificates
  • Protection
  • Support users with different OS, library,
    license, etc. needs

12
Virtual Machines
  • Language-oriented VMs
  • Abstract interpreted machine, JIT Compiler, large
    library
  • Examples: UCSD p-system, Java VM, .NET VM
  • Application-oriented VMs
  • Redirect library calls to appropriate place
  • Example: Entropia VM
  • Virtual servers
  • Kernel makes it appear that a group of processes
    are running on a separate instance of the kernel
  • Examples: Ensim, Virtuozzo, SODA,
  • Virtual machine monitors (VMMs)
  • Raw machine is the abstraction
  • VM represented by a single image
  • Examples: IBM's VM, VMWare, Virtual PC/Server,
    Plex/86, SIMICS, Hypervisor, DesQView/TaskView,
    VM/386

13
VMWare GSX VM
14
Isn't It Going to Be Too Slow?

Application                       Resource              ExecTime (10^3 s)  Overhead
SpecHPC Seismic (serial, medium)  Physical              16.4               N/A
SpecHPC Seismic (serial, medium)  VM, local             16.6               1.2%
SpecHPC Seismic (serial, medium)  VM, Grid virtual FS   16.8               2.0%
SpecHPC Climate (serial, medium)  Physical              9.31               N/A
SpecHPC Climate (serial, medium)  VM, local             9.68               4.0%
SpecHPC Climate (serial, medium)  VM, Grid virtual FS   9.70               4.2%

Small relative virtualization overhead; compute-intensive

Experimental setup
  • Physical: dual Pentium III 933 MHz, 512 MB memory,
    RedHat 7.1, 30 GB disk
  • Virtual: VMWare Workstation 3.0a, 128 MB memory,
    2 GB virtual disk, RedHat 2.0
  • NFS-based grid virtual file system between UFL
    (client) and NWU (server)
15
Isn't It Going To Be Too Slow?
Synthetic benchmark: exponential arrivals of
compute-bound tasks, background load provided by
playback of traces from PSC
Relative overheads < 10%
16
Isn't It Going To Be Too Slow?
  • Virtualized NICs have very similar bandwidth,
    slightly higher latencies
  • J. Sugerman, G. Venkitachalam, B-H Lim,
    Virtualizing I/O Devices on VMware Workstation's
    Hosted Virtual Machine Monitor, USENIX 2001
  • Disk-intensive workloads (kernel build, web
    service): 30% slowdown
  • S. King, G. Dunlap, P. Chen, OS support for
    Virtual Machines, USENIX 2003

17
Virtuoso
  • Approach: Lower level of abstraction
  • Raw machines, not processes
  • Mechanism: Virtual machine monitors
  • Our focus: Middleware support to hide complexity
  • Ordering, instantiation, migration of machines
  • Virtual networking and remote devices
  • Connectivity to remote files, machines
  • Information services
  • Monitoring and prediction
  • Resource control

18
The Virtuoso Model
  • User orders raw machine(s)
  • Specifies hardware and performance
  • Basic software installation available
  • OS, libraries, licenses, etc.
  • Virtuoso creates raw image and returns reference
  • Image contains disk, memory, configuration, etc.
  • User powers up machine
  • Virtuoso chooses provider
  • Information service
  • Virtuoso migrates image to provider
  • Efficient network transfer
  • rsync, demand paging, versioned filesystems
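
The "efficient network transfer" step can be illustrated with a small sketch. It is not rsync itself (rsync uses rolling checksums to find matches at arbitrary offsets); it is a simpler fixed-block comparison, offered only to show why re-sending a mostly unchanged image can be cheap. The block size and all function names are assumptions made for this example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def block_digests(image: bytes) -> list[str]:
    """Hash each fixed-size block of a disk image."""
    return [
        hashlib.sha256(image[i:i + BLOCK_SIZE]).hexdigest()
        for i in range(0, len(image), BLOCK_SIZE)
    ]


def changed_blocks(new_image: bytes, remote_digests: list[str]) -> dict[int, bytes]:
    """Return only the blocks the provider lacks or holds in a stale version."""
    delta = {}
    for index, digest in enumerate(block_digests(new_image)):
        if index >= len(remote_digests) or remote_digests[index] != digest:
            start = index * BLOCK_SIZE
            delta[index] = new_image[start:start + BLOCK_SIZE]
    return delta


def apply_delta(old_image: bytes, delta: dict[int, bytes], new_length: int) -> bytes:
    """Rebuild the new image on the provider from its old copy plus the delta."""
    blocks = [old_image[i:i + BLOCK_SIZE] for i in range(0, len(old_image), BLOCK_SIZE)]
    for index, data in sorted(delta.items()):
        if index < len(blocks):
            blocks[index] = data   # overwrite a stale block
        else:
            blocks.append(data)    # image grew: append new blocks in order
    return b"".join(blocks)[:new_length]
```

Only the digests and the changed blocks cross the network; unchanged blocks are reused from the provider's existing copy.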

19
The Virtuoso Model
  • Provider instantiates machine
  • Virtual networking ties machine back to user's
    home network
  • Remote device support makes user's desktop
    devices available on remote VM
  • Remote display support gives user the console of
    the machine (VNC)
  • Resource control to give user expected
    performance
  • User goes to his network admin to get address,
    routing for his new machine
  • User customizes machine
  • Feeds in CDs, floppies, ftp, up2date, etc.

20
The Virtuoso Model
  • User uses machine
  • Shutdown, hibernate, power-off, throw away
  • Virtuoso continuously monitors and adapts
  • Various mechanisms, all invisible to user
  • Migrating the machine
  • Routing traffic between machines
  • Virtual network topology
  • Predictive scheduling versus reservations
  • Various goals
  • Price
  • Interactivity
  • Information service
  • Resource monitoring and prediction

21
Outline
  • Motivation
  • Virtuoso Model
  • Virtual networking and remote devices
  • Information services
  • Resource measurement and prediction
  • Resource control
  • Related work
  • Conclusions

22
Why Virtual Networking?
  • A machine is suddenly plugged into your network.
    What happens?
  • Does it get an IP address?
  • Is it a routeable address?
  • Does firewall let its traffic through?
  • To any port?

How do we make virtual machine hostile environments
as friendly as the user's LAN?
23
A Layer 2 Virtual Network (VLAN) for the User's
Virtual Machines
  • Why Layer 2?
  • Protocol agnostic
  • Mobility
  • Simple to understand
  • Ubiquity of Ethernet on end-systems
  • What about scaling?
  • Number of VMs limited
  • Hierarchical routing possible because MAC
    addresses can be assigned hierarchically
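
One way to picture hierarchical MAC assignment is the sketch below. The field layout (16-bit site, 16-bit host, 8-bit VM) and the function names are assumptions for illustration, not something the slides specify. The first octet 0x02 marks the address as locally administered and unicast, which is the standard way to avoid colliding with vendor-assigned addresses.

```python
def make_mac(site: int, host: int, vm: int) -> str:
    """Pack site/host/VM numbers into a locally administered unicast MAC."""
    if not (0 <= site < 2**16 and 0 <= host < 2**16 and 0 <= vm < 2**8):
        raise ValueError("field out of range")
    octets = [0x02, site >> 8, site & 0xFF, host >> 8, host & 0xFF, vm]
    return ":".join(f"{octet:02x}" for octet in octets)


def site_of(mac: str) -> int:
    """Recover the site number from the two octets after the prefix."""
    octets = [int(part, 16) for part in mac.split(":")]
    return (octets[1] << 8) | octets[2]


def next_hop(mac: str, site_routes: dict[int, str], default: str) -> str:
    """Route on the site prefix alone, so the table holds one entry per site."""
    return site_routes.get(site_of(mac), default)
```

Because forwarding looks only at the site field, the routing table grows with the number of sites rather than the number of virtual machines.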

24
A Simple Layer 2 Virtual Network
[Diagram labels: Client, Server, VM monitor, SSH,
Remote VM, Virtual NIC, Physical NIC (x2), Hostile
Remote Network, Friendly Local Network]
26
A Simple Layer 2 Virtual Network
[Diagram labels: Client, Server, VM monitor,
Bridged (x2), SSH Tunnel, Remote VM, Virtual NIC,
Physical NIC (x2), Hostile Remote Network, Friendly
Local Network]
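
Carrying Ethernet frames over a byte stream such as an SSH tunnel requires some way to mark frame boundaries, because a stream does not preserve message edges. The slides do not describe the wire format used, so the 2-byte length prefix below is purely an assumption for illustration, as are the names.

```python
import struct

HEADER = struct.Struct("!H")  # assumed wire format: 2-byte big-endian length


def encode_frame(frame: bytes) -> bytes:
    """Prefix one Ethernet frame with its length for transport over a stream."""
    if len(frame) > 0xFFFF:
        raise ValueError("frame too large for a 2-byte length prefix")
    return HEADER.pack(len(frame)) + frame


class FrameDecoder:
    """Reassemble whole frames from arbitrarily chunked stream data."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, data: bytes) -> list[bytes]:
        self._buffer.extend(data)
        frames = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # frame not complete yet; wait for more data
            frames.append(bytes(self._buffer[HEADER.size:end]))
            del self._buffer[:end]
        return frames
```

A bridge on each side would call `encode_frame` on frames captured from its interface and pass whatever `feed` returns to the interface on the other side.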
27
An Overlay Network
  • Bridged instances and their connections form an
    overlay network for routing traffic among virtual
    machines and the user's home network
  • Links can trivially be added or removed

28
Bootstrapping the Virtual Network
  • Star topology always possible
  • TCP session from client must have been possible
  • Better topology may be possible
  • Depends on security at each site
  • Topology may change
  • Virtual machines can migrate
  • Bootstrap to higher layers
  • Virtual filesystems

29
Remote Devices
[Diagram labels: Client, Server, VM monitor,
nbd-server, nbd-client, SSH Tunnel, Remote VM,
Virtual CDROM, Physical CDROM]
Linux Network Block Device driver: /dev/cdrom <->
/dev/nb0 <-> VMWare CD image
30
Extending a Grid Information Service (GIS) to
Support Virtual Machines
  • A GIS contains information about the available
    resources in a grid
  • Hosts, routers, switches, software, etc.
  • URGIS project at Northwestern
  • GIS based on the relational data model
  • Compositional queries (joins) to find collections
    of resources.
  • Find physical machines which can instantiate a
    virtual machine with 1 GB of memory
  • Find sets of four different virtual machines on
    the same network with a total memory between 512
    MB and 1 GB
  • Nondeterministic query extension for scalability

31
The RGIS Design (Per Site)
[Architecture diagram; components and annotations]
  • Web Interface, Scripts, C API
  • External user identification mapped to database
    users and roles
  • Content Delivery Network Interface: for loose
    consistency
  • Update Manager
  • Query Manager and Rewriter
  • Oracle 9i Front End: transactional inserts and
    updates using stored procedures, queries using
    select statements (uses database's access control)
  • Updates encrypted using asymmetric cryptography
    on network. Only those with appropriate keys
    have access
  • RDBMS: use of Oracle is not a requirement of
    approach
  • Oracle 9i Back End (x3): Windows, Linux, Parallel
    Server, etc.
  • Site-to-site (tentative)
  • Schema, type hierarchy, indices, PL/SQL stored
    procedures for each object
32
Motivation for Non-deterministic Queries
  • Queries for compositions of resources easily
    expressed in SQL
  • But such queries can be very expensive to execute
  • However, we typically don't need the entire
    result set, just some rows, and not always the
    same ones
  • And we need them in a bounded amount of time
  • Approach: return random sample of result set

    select h1.insertid, h2.insertid
    from hosts h1, hosts h2
    where h1.os='LINUX' and h2.os='LINUX'
      and h1.mem_mb+h2.mem_mb>=3072

Find 2 hosts with Linux that together have 3 GB
of RAM
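
The compositional query on this slide can be tried against a toy table. The sketch below uses SQLite from Python instead of Oracle, invents four sample rows, and assumes the comparison lost in transcription is `>=` and the missing operator between the two memory columns is `+`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hosts (insertid INTEGER PRIMARY KEY, os TEXT, mem_mb INTEGER)"
)
# Hypothetical sample data, not from the slides.
conn.executemany(
    "INSERT INTO hosts VALUES (?, ?, ?)",
    [(1, "LINUX", 2048), (2, "LINUX", 1024), (3, "WINDOWS", 4096), (4, "LINUX", 512)],
)

QUERY = """
SELECT h1.insertid, h2.insertid
FROM hosts h1, hosts h2
WHERE h1.os = 'LINUX' AND h2.os = 'LINUX'
  AND h1.mem_mb + h2.mem_mb >= 3072
"""

pairs = conn.execute(QUERY).fetchall()
print(sorted(pairs))
```

As written, the self-join also pairs a host with itself and returns each pair in both orders, so a production query would likely add a condition such as `h1.insertid < h2.insertid`. The cost concern on the slide follows from the join: the candidate space grows with the square of the table size for two hosts, and faster for larger compositions.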
33
Implementing non-deterministic queries
    select nondeterministically h1.insertid, h2.insertid
    from hosts h1, hosts h2
    where h1.os='LINUX' and h2.os='LINUX'
      and h1.mem_mb+h2.mem_mb>=3072
    within 2 seconds

Rewritten by the Query Manager and Rewriter as:

    SELECT H1.INSERTID, H2.INSERTID
    FROM HOSTS H1, HOSTS H2, INSERTIDS TEMP_H1, INSERTIDS TEMP_H2
    WHERE (H1.OS='LINUX' AND H2.OS='LINUX'
           AND H1.MEM_MB+H2.MEM_MB>=3072)
      AND (H1.INSERTID=TEMP_H1.INSERTID
           AND TEMP_H1.rand > 982663452.975047
           AND TEMP_H1.rand < 1025613125.93505)
      AND (H2.INSERTID=TEMP_H2.INSERTID
           AND TEMP_H2.rand > 1877769069.94039
           AND TEMP_H2.rand < 1920718742.90039)

Random sample of input tables. Probability of
inclusion determined by time constraint and
server load.
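
The rewrite can be mimicked in a few lines. In the slide's rewritten query both rand windows are about 42,949,672.96 wide, which is 1% of 2^32, so the sketch assumes random keys drawn uniformly from [0, 2^32) and a window whose width is the inclusion probability times that range. How the real system derives the probability from the time constraint and server load is not shown on the slide, so here `fraction` is simply passed in. SQLite stands in for Oracle, and the helper names are hypothetical.

```python
import random
import sqlite3

RAND_MAX = 2**32  # assumed range of the stored random keys


def sample_window(fraction: float, rng: random.Random) -> tuple[float, float]:
    """Pick a random window covering `fraction` of the key range."""
    width = fraction * RAND_MAX
    low = rng.uniform(0, RAND_MAX - width)
    return low, low + width


def build_db(n_hosts: int, rng: random.Random) -> sqlite3.Connection:
    """Create toy hosts and insertids tables with one random key per host."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hosts (insertid INTEGER PRIMARY KEY, os TEXT, mem_mb INTEGER)")
    conn.execute("CREATE TABLE insertids (insertid INTEGER PRIMARY KEY, rand REAL)")
    for i in range(n_hosts):
        conn.execute("INSERT INTO hosts VALUES (?, 'LINUX', ?)",
                     (i, rng.choice([512, 1024, 2048, 4096])))
        conn.execute("INSERT INTO insertids VALUES (?, ?)", (i, rng.uniform(0, RAND_MAX)))
    return conn


REWRITTEN = """
SELECT h1.insertid, h2.insertid
FROM hosts h1, hosts h2, insertids t1, insertids t2
WHERE (h1.os = 'LINUX' AND h2.os = 'LINUX' AND h1.mem_mb + h2.mem_mb >= 3072)
  AND (h1.insertid = t1.insertid AND t1.rand > ? AND t1.rand < ?)
  AND (h2.insertid = t2.insertid AND t2.rand > ? AND t2.rand < ?)
"""


def nondeterministic_pairs(conn, fraction: float, rng: random.Random):
    """Run the query over an independent random sample of each input table."""
    low1, high1 = sample_window(fraction, rng)
    low2, high2 = sample_window(fraction, rng)
    return conn.execute(REWRITTEN, (low1, high1, low2, high2)).fetchall()
```

Each call draws fresh windows, so repeated calls return different subsets of the full answer, and a smaller `fraction` shrinks the join inputs and therefore the work done.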
34
Nondeterministic query performance
Meaningful tradeoff between query processing time
and result set size is possible
  • Query: select two hosts that together have
    >= 3 GB of RAM
  • 500,000-host grid generated by GridG
  • Memory distribution according to Smith study of
    MDS contents
  • Dual Xeon 1 GHz, 2 GB, 240 GB RAID, RGIS2,
    Oracle 9i Enterprise
  • Average of five trials
35
Nondeterministic query performance
Can use tradeoff to control query time
independent of query complexity
  • Query: select n hosts that together have
    >= 3 GB of RAM
  • 500,000-host grid generated by GridG
  • Memory distribution according to Smith study of
    MDS contents
  • Dual Xeon 1 GHz, 2 GB, 240 GB RAID, RGIS2,
    Oracle 9i Enterprise
  • Average of five trials
36
Deadlines
37
Extending a Grid Information Service (GIS) to
Support Virtual Machines
  • Virtual indirection
  • Each RGIS object has a unique id
  • Virtualization table associates unique id of
    virtual resources with unique ids of their
    constituent physical resources
  • Virtual nature of resource is hidden unless query
    explicitly requests it
  • Futures
  • An RGIS object that does not exist yet
  • Futures table of unique ids
  • Future nature of resource hidden unless query
    explicitly requests it
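
A toy schema can make the indirection concrete. Table and column names below are hypothetical, SQLite stands in for the real database, and the sketch reads "hidden" as meaning that virtual and future objects look like ordinary rows unless a query joins the extra tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hosts (id INTEGER PRIMARY KEY, name TEXT, mem_mb INTEGER);
-- Maps the unique id of a virtual resource to a constituent physical one.
CREATE TABLE virtualization (virtual_id INTEGER, physical_id INTEGER);
-- Unique ids of objects that do not exist yet.
CREATE TABLE futures (id INTEGER PRIMARY KEY);
INSERT INTO hosts VALUES (1, 'phys-a', 4096), (2, 'vm-1', 1024), (3, 'vm-planned', 512);
INSERT INTO virtualization VALUES (2, 1), (3, 1);
INSERT INTO futures VALUES (3);
""")


def plain_hosts(conn):
    """Ordinary query: virtual and future hosts look like any other host."""
    return [row[0] for row in conn.execute("SELECT name FROM hosts ORDER BY id")]


def hosts_with_virtual_info(conn):
    """Query that explicitly asks for the physical host and the future flag."""
    return conn.execute("""
        SELECT h.name, p.name, f.id IS NOT NULL
        FROM hosts h
        LEFT JOIN virtualization v ON v.virtual_id = h.id
        LEFT JOIN hosts p ON p.id = v.physical_id
        LEFT JOIN futures f ON f.id = h.id
        ORDER BY h.id
    """).fetchall()
```

Existing queries keep working unchanged because the extra information lives in side tables keyed by the same unique ids.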

38
Extending a Resource Monitoring and Prediction
System to Support Virtual Machines
  • Measuring and predicting dynamic resource
    availability to support adaptation
  • Virtual machine migration
  • Routing on the virtual network
  • Application-level adaptation
  • RPS System at Northwestern
  • Host and network measurements for Unix and
    Windows
  • Emphasis on prediction (wide range of linear and
    nonlinear models) and communication (wide range
    of transports)

39
RPS Toolkit
  • Extensible toolkit for implementing resource
    signal prediction systems (CMU-CS-99-138)
  • Growing: RTA, RTSA, wavelets, GUI, etc.
  • Easy buy-in for users
  • C and sockets (no threads)
  • Prebuilt prediction components
  • Libraries (sensors, time series, communication)

40
Example: Multiscale Network Prediction
  • Large, recent study of predictability
  • Hundreds of NLANR and other traces
  • Mostly WANs
  • Different resolutions
  • Binning and low-pass via wavelets
  • Sweet Spot
  • Predictability often maximized at particular
    resolution

41
Multiresolution Network Prediction
42
Extending a Resource Prediction System to Support
Virtual Machines
  • Goal: monitor physical machine and infer behavior
    inside of virtual machine
  • Current approach: map /proc on physical machine
    to slowdown on resource rate in virtual machine
  • ARX models

43
Resource Control
  • Owner has an interest in controlling how much and
    when compute time is given to a virtual machine
  • Our approach: a language for expressing these
    constraints, and compilation to real-time
    schedules, proportional share, etc.
  • Very early stages. Trying to avoid kernel
    modifications.
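
The slides do not show the constraint language, which they describe as being in very early stages. Purely as an illustration of the idea, the sketch below invents a one-line-per-VM syntax ("name: S ms every P ms"), compiles it to (slice, period) pairs, checks total utilization, and derives proportional-share weights. Everything about the syntax and the function names is hypothetical.

```python
import re
from fractions import Fraction

# Hypothetical syntax: "<vm>: <slice> ms every <period> ms"
RULE = re.compile(r"^\s*(\w+)\s*:\s*(\d+)\s*ms\s+every\s+(\d+)\s*ms\s*$")


def parse_constraints(text):
    """Compile owner constraints into {vm: (slice_ms, period_ms)}."""
    schedule = {}
    for line in text.strip().splitlines():
        match = RULE.match(line)
        if not match:
            raise ValueError(f"cannot parse: {line!r}")
        name = match.group(1)
        slice_ms, period_ms = int(match.group(2)), int(match.group(3))
        if not 0 < slice_ms <= period_ms:
            raise ValueError(f"slice must be positive and fit in the period: {line!r}")
        schedule[name] = (slice_ms, period_ms)
    return schedule


def utilization(schedule):
    """Fraction of one CPU promised in total."""
    return sum(Fraction(s, p) for s, p in schedule.values())


def admissible(schedule):
    """Total utilization of at most 1 is the classic single-CPU bound for
    earliest-deadline-first scheduling of periodic tasks whose deadlines
    equal their periods."""
    return utilization(schedule) <= 1


def proportional_shares(schedule):
    """Alternative target: weights for a proportional-share scheduler."""
    total = utilization(schedule)
    return {name: Fraction(s, p) / total for name, (s, p) in schedule.items()}
```

The same parsed constraints feed either back end, which is the point of compiling from one language to several scheduling mechanisms.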

44
Outline
  • Motivation
  • Virtuoso Model
  • Virtual networking and remote devices
  • Information services
  • Resource measurement and prediction
  • Resource control
  • Related work
  • Conclusions

45
Related Work
  • Collective / Capsule Computing (Stanford)
  • VMM, Migration/caching, Hierarchical image files
  • Denali (U. Washington)
  • Highly scalable VMMs (1000s of VMs per node)
  • CoVirt (U. Michigan)
  • Xenoserver (Cambridge)
  • SODA (Purdue)
  • Virtual Server, fast deployment of services
  • Internet Suspend/Resume (Intel Labs Pittsburgh)
  • Ensim
  • Virtual Server, widely used for web site hosting
  • WFQ-based resource control released into
    open-source Linux kernel
  • Virtuozzo (SWsoft)
  • Ensim competitor
  • Available VMMs: IBM's VM, VMWare, Virtual
    PC/Server, Plex/86, SIMICS, Hypervisor,
    DesQView/TaskView, VM/386

46
Current Status (At Northwestern)
  • Bridged components done
  • Mechanism for virtual networking
  • No policy yet
  • Very preliminary system for acquiring and
    instantiating VMs done
  • RGIS schema extensions done
  • Work In Progress
  • Remote devices (management)
  • Virtual networking (policy adaptation)
  • VM Monitoring using RPS

47
For More Information
  • Prescience Lab (Northwestern University)
  • http://plab.cs.northwestern.edu
  • ACIS (University of Florida)
  • http://acis.ufl.edu