1. Towards a World-Wide Computer: Software Technology for Computational Grids
Williams College
- Carlos Varela, cvarela@cs.rpi.edu
- Department of Computer Science
- Rensselaer Polytechnic Institute
- http://wcl.cs.rpi.edu/
- Graduate Students
- Travis Desell
- Kaoutar El Maghraoui
- Wei-Jen Wang
- April 8, 2005
2. Adaptive Partial Differential Equation Solvers
- Investigators
- J. Flaherty, M. Shephard, B. Szymanski, C. Varela (RPI); J. Teresco (Williams); E. Deelman (ISI-USC)
- Problem Statement
- How to dynamically adapt solutions to PDEs to account for the underlying computing infrastructure?
- Applications/Implications
- Materials fabrication, biomechanics, fluid dynamics, aeronautical design, ecology.
- Approach
- Partition the problem, dynamically map it onto the computing infrastructure, and balance the load.
- Low communication overhead over low-latency connections.
- Software
- Rensselaer Partition Model (RPM)
- Algorithm Oriented Mesh Database (AOMD)
- Dynamic Resource Utilization Model (DRUM)
3. Virtual Surgical Planning
- Investigators
- K. Jansen, M. Shephard (RPI)
- C. Taylor, C. Zarins (Stanford)
- Problem Statement
- How to develop a software framework to enable virtual surgical planning based on real patient data?
- Applications/Implications
- Surgeons will be able to virtually evaluate vascular surgical options based on simulation rather than intuition alone.
- Approach
- A scan of a real patient is processed to extract a solid model and inlet flow waveform.
- The model is discretized and the flow equations solved.
- Multiple alterations to the model are made within an intuitive human-computer interface and evaluated similarly.
- Software
- MEGA (SCOREC discretization toolkit)
- PHASTA (RPI flow solver)
- Funded by NSF-ITR (7/02-7/07)
4. Particle Physics and Bacterial Pathogenicity
- Investigators
- J. Cummings, J. Napolitano (RPI Physics)
- M. Nishiguchi (NMSU Biology), W. Wheeler (AMNH)
- B. Szymanski, C. Varela, J. Flaherty (RPI CS)
- Problem Statement
- Do missing baryons exist? (Sub-atomic particles that have not been observed.)
- How do bacteria evolve? What are the mechanisms of infection and colonization?
- Applications/Implications
- Physics: particle physics, search for missing baryons.
- Biology: origins of bacterial pathogenicity, evolution of species.
- Approach
- Experimental data analysis and simulation.
- Comparison and analysis of complete genome sequences to identify evolutionary patterns.
- Software
- Domain-specific code for parallel computing on homogeneous clusters.
5. Milky Way Origin and Structure
- Investigators
- H. Newberg (RPI Astronomy), J. Teresco (Williams)
- M. Magdon-Ismail, B. Szymanski, C. Varela (RPI CS)
- Problem Statement
- What is the structure and origin of the Milky Way galaxy?
- How to use data from 10,000 square degrees of the north galactic cap, collected in five optical filters over five years by the Sloan Digital Sky Survey?
- Applications/Implications
- Astrophysics: origins and evolution of our galaxy.
- Approach
- Experimental data analysis and simulation.
- Using A stars as tracers of the galactic halo, and photometrically determined metallicities of main-sequence F-K stars, to determine whether the thick disk is chemically distinct from the thin disk and galactic halo of our galaxy.
- Status
- Sequential code that takes multiple days to run on a single node.
6. The Rensselaer Grid
- External networks: Internet2, 155 Mbit
- 694 existing processors + 530 projected processors = 1224 grid processors

Existing Clusters
- CS Clusters
- 168 processors
- 64 dual 2.4 GHz Xeon
- 40 800 MHz xSeries
- Multiscale Cluster
- 172 processors
- 66 dual 2.0 GHz Xeon
- 40 400 MHz Netra X1
- Multipurpose Clusters
- 326 processors
- Biotechnology: 134 P3 processors
- Nanotechnology: 192 processors (Athlon, P4, and P3)
- WCL Cluster
- 28 processors
- 4 dual Sun Blade 100
- 4 single-processor IBM nodes
- 4 quad IBM Power series

Projected Clusters
- Bioscience Cluster
- 160 processors
- 80 dual 2.0 GHz Microway Navion-A Opteron
- Multiscale Cluster
- 160 processors
- 80 dual 2.0 GHz Microway Navion-A Opteron
- CS Cluster
- 82 processors
- 41 dual 2 GHz PowerPC
- Multiscale Cluster
- 128 processors
- 64 dual 2.0 GHz Opteron
7. Map of Rensselaer Grid Clusters
[Campus map showing cluster locations: Nanotech, Multiscale, Bioscience Cluster, CS/WCL, Multipurpose Cluster, CS]
8. TeraGrid
[Diagram: TeraGrid sites (Caltech, Argonne, NCSA/PACI at 10.3 TF and 240 TB, SDSC at 4.1 TF and 225 TB), each with site resources, HPSS or UniTree archival storage, and external networks]
9. Extensible TeraScale Facility (ETF)
[Diagram: the TeraGrid sites (Caltech, Argonne, NCSA/PACI at 10.3 TF and 240 TB, SDSC at 4.1 TF and 225 TB) with the Rensselaer Grid attached, each with site resources, HPSS or UniTree storage, and external networks]
10. Extensible TeraScale Facility (ETF)
[Map of ETF sites, with RPI highlighted]
11. Data Grid for High Energy Physics
Image courtesy Harvey Newman, Caltech
12. iVDGL: International Virtual Data Grid Laboratory
www.ivdgl.org
13. World's Largest Computing Grid (CERN, 3/2005)
www.cern.ch
www.ivdgl.org
14. PlanetLab: An Open Platform for Worldwide Services
- 550 nodes over 261 sites, as of April 2005
www.planet-lab.org
15. Worldwide Computing Software
- Computational Resources and Devices
- Large pool of idle resources available on the Internet
- Heterogeneous platforms
- Networks
- Wide range of latencies/bandwidths
- Dynamic resources
- Different degrees of availability
- Different types of failures
- Research Goals
- Scalability to worldwide execution environments
- Inherent adaptability to environmental changes and resource availability
- Programmability and high performance
- Approach
- Adaptive reflective middleware to trigger automatic reconfiguration of applications
- High-level programming abstractions
16. Actors/SALSA
- Actor Model
- A reasoning framework to model concurrent computations
- Programming abstractions for distributed open systems
- G. Agha, Actors: A Model of Concurrent Computation in Distributed Systems, MIT Press, 1986.
- SALSA
- Simple Actor Language System and Architecture
- An actor-oriented language for mobile and internet computing
- Programming abstractions for internet-based concurrency, distribution, mobility, and coordination
- C. Varela and G. Agha, "Programming Dynamically Reconfigurable Open Systems with SALSA", ACM SIGPLAN Notices, OOPSLA 2001, 36(12), pp. 20-34.
17. SALSA Basics
- Programmers define behaviors for actors.
- Messages are sent asynchronously.
- Messages are modeled as potential method invocations.
- Continuation primitives are used for coordination.
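To make these four points concrete, here is a minimal sketch in the style of the examples that follow (the Cell behavior and its handlers are illustrative, not from the talk):

  module demo;

  behavior Cell {
    int value;

    Cell(int initial) { this.value = initial; }

    // Handlers look like methods but are only invoked via asynchronous sends.
    int get() { return value; }
    void set(int newValue) { this.value = newValue; }

    void act(String[] args) {
      Cell c = new Cell(0);
      // '@' is a continuation: get() runs only after set(1) is processed,
      // and get()'s result reaches println as the token.
      c<-set(1) @ c<-get() @ standardOutput<-println(token);
    }
  }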
18. Actor Creation
- To create an actor locally:
  TravelAgent a = new TravelAgent();
- To create an actor with a specified UAN and UAL:
  TravelAgent a = new TravelAgent() at (uan, ual);
- Another possibility:
  TravelAgent a = new TravelAgent() at (uan);
19. Message Sending
  TravelAgent a = new TravelAgent();
  a<-book( flight );
20. Remote Message Sending
- Obtain a remote actor reference by name:
  TravelAgent a = getReferenceByName("uan://myhost/ta");
  a<-printItinerary();
- Obtain a remote actor reference by location:
  TravelAgent a = getReferenceByLocation("rmsp://myhost/ta1");
  a<-printItinerary();
21. Migration
- Obtain a remote actor reference and migrate the actor:
  TravelAgent a = getReferenceByName("uan://myhost/ta");
  a<-migrate( "rmsp://yourhost/travel" ) @
  a<-printItinerary();
22. Token-Passing Continuation
- Ensures that each message in the expression is sent after the previous message has been processed. Also allows the return value of one message invocation to be used as an argument for a later invocation in the expression.
- Example:
  a1<-m1() @ a2<-m2( token );
- Send m1 to a1; then, after m1 finishes, send its result to a2 with m2.
23. Join Blocks
- Provide a mechanism for synchronizing the processing of a set of messages.
- The set of results is sent along as a token.
- Example:
  Actor[] actors = { searcher0, searcher1, searcher2, searcher3 };
  join actors<-find( phrase ) @
  resultActor<-output( token );
- Send the find( phrase ) message to each actor in actors; then, after all have completed, send the results to resultActor with an output message.
24. Example: Acknowledged Multicast
  join a1<-m1(), a2<-m2(), a3<-m3(), ... @
  cust<-n(token);
25. Lines of Code Comparison
26. First-Class Continuations
- Enable actors to delegate computation to a third party independently of the processing context.
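As an illustrative sketch (the Squarer behavior is hypothetical; the Fibonacci example on the next slide is the real use), an actor can pass currentContinuation so a helper's reply goes straight to the original requester:

  behavior Squarer {
    int square(int x) { return x * x; }

    // compute() delegates to a fresh helper; via currentContinuation the
    // helper's result is delivered directly to compute()'s requester,
    // independently of this actor's own processing context.
    int compute(int x) {
      Squarer helper = new Squarer();
      helper<-square(x) @ currentContinuation;
    }

    void act(String[] args) {
      Squarer s = new Squarer();
      s<-compute(4) @ standardOutput<-println(token);  // prints 16
    }
  }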
27. Fibonacci Example

  module examples.fibonacci;

  behavior Fibonacci {
    int n;

    Fibonacci(int n) { this.n = n; }

    int add(int[] numbers) { return numbers[0] + numbers[1]; }

    int compute() {
      if (n == 0) return 0;
      else if (n < 2) return 1;
      else {
        Fibonacci fib1 = new Fibonacci(n-1);
        Fibonacci fib2 = new Fibonacci(n-2);
        join fib1<-compute(), fib2<-compute() @ add(token) @ currentContinuation;
      }
    }
  }
28. SALSA and Java
- SALSA source files are compiled into Java source files before being compiled into Java byte code.
- SALSA programs may take full advantage of the Java API.
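For instance, handler bodies can call standard Java classes directly. A minimal sketch, assuming SALSA's Java-style import syntax (the Dice behavior and its use of java.util.Random are illustrative, not from the talk):

  module demo;

  import java.util.Random;  // ordinary Java class from the Java API

  behavior Dice {
    void act(String[] args) {
      Random r = new Random();       // plain Java object, not an actor
      int roll = r.nextInt(6) + 1;   // Java API call inside a handler
      standardOutput<-println("Rolled: " + roll);
    }
  }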
29. Hello World Example

  module demo;

  behavior HelloWorld {
    void act( String[] argv ) {
      standardOutput<-print( "Hello" ) @
      standardOutput<-print( "World!" );
    }
  }
30. Hello World Example
- The act( String[] args ) message handler is similar to the main() method in Java and is used to bootstrap SALSA programs.
31Migration Example
behavior Migrate void print()
standardOutputlt-println( "Migrate actor just
migrated here." ) void act( String
args ) if (args.length ! 3)
standardOutputlt-println("Usage java
migration.Migrate ltUANgt ltsrcUALgt
ltdestUALgt") return UAN
uan new UAN(args0) UAL ual new
UAL(args1) Migrate migrateActor
new Migrate() at (uan, ual)
migrateActorlt-print() _at_
migrateActorlt-migrate( args2 ) _at_
migrateActorlt-print()
32. Migration Example
- The program must be given a valid name and locations.
- After remotely creating the actor, the program sends it the print message, migrates it to the second theater, and sends the message again.
33Compilation
java SalsaCompiler demo/Migrate.salsa SALSA
Compiler Version 1.0 Reading from file
demo/Migrate.salsa . . . SALSA Compiler Version
1.0 SALSA program parsed successfully. SALSA
Compiler Version 1.0 SALSA program compiled
successfully. javac demo/Migrate.java java
demo.Migrate Usage java migration.Migrate
ltuangt ltualgt ltualgt
- Compile Migrate.salsa file into Migrate.java.
- Compile Migrate.java file into Migrate.class.
- Execute Migrate
34. Migration Example
[Diagram: two theaters and a UAN server]
- The actor will print "Migrate actor just migrated here." at theater 1, then at theater 2.
35. World Migrating Agent Example
36. Middleware/IOS
- Middleware
- A software layer between distributed applications and operating systems.
- Frees application programmers from directly dealing with distribution issues:
- Heterogeneous hardware/OSs
- Load balancing
- Fault tolerance
- Security
- Quality of service
- Internet Operating System (IOS)
- A decentralized framework for adaptive, scalable execution
- Modular architecture to evaluate different distribution and reconfiguration strategies
- T. Desell, K. El Maghraoui, and C. Varela, "Load Balancing of Autonomous Actors over Dynamic Networks", HICSS-37 Software Technology Track, Hawaii, January 2004, 10 pp.
37. World-Wide Computer Architecture
- SALSA application layer
- Programming language constructs for actor communication, migration, and coordination.
- IOS middleware layer
- A Resource Profiling Component
- Captures information about actor and network topologies and available resources
- A Decision Component
- Makes migration, split/merge, or replication decisions based on profiled information
- A Protocol Component
- Performs communication between nodes in the middleware system
- WWC run-time layer
- Theaters provide runtime support for actor execution and access to local resources
- Pluggable transport, naming, and messaging services
38. Autonomous Actors
- Actors
- Unit of concurrency
- Asynchronous message passing
- State encapsulation
- Universal actors
- Universal names
- Location/theater
- Ability to migrate between theaters
- Autonomous actors
- Performance profiling to improve quality of service
- Autonomous migration to balance computational load
- Split and merge to tune granularity
- Replication to increase fault tolerance
39. Middleware Agents and Load Balancing
- Middleware agents are organized in a virtual network and exchange information periodically:
- New peers join and old peers leave
- Workloads change
- Middleware agents can organize in different topologies, e.g., peer-to-peer (p2p) and cluster-to-cluster (c2c) virtual networks
- IOS's modular architecture enables different load-balancing and profiling strategies, e.g.:
- Random work stealing (RS)
- Actor topology-sensitive work stealing (ATS)
- Network topology-sensitive work stealing (NTS)
- Weighted resource-sensitive work stealing (WRS)
40. Random Work Stealing (RS)
- Loosely based on Cilk's random work stealing
- Lightly loaded theaters periodically send work-steal packets to randomly picked peer theaters
- Actors migrate from highly loaded theaters to lightly loaded theaters
- Simple strategy: no broadcasts required
- Stable strategy: it avoids additional traffic on overloaded networks
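A hypothetical sketch of the idea in the SALSA style used earlier (all names and structure are illustrative, not the actual IOS code):

  behavior TheaterAgent {
    boolean overloaded;

    TheaterAgent(boolean overloaded) { this.overloaded = overloaded; }

    // A steal request: an overloaded peer would migrate actors toward the
    // requester; a lightly loaded one simply ignores it (no broadcasts).
    void steal(TheaterAgent requester) {
      if (overloaded)
        standardOutput<-println("migrating actors to requester");
    }

    void act(String[] args) {
      TheaterAgent busy = new TheaterAgent(true);
      TheaterAgent idle = new TheaterAgent(false);
      // A lightly loaded theater periodically picks one random peer:
      TheaterAgent[] peers = { busy };
      java.util.Random r = new java.util.Random();
      peers[r.nextInt(peers.length)]<-steal(idle);
    }
  }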
41. Actor Topology-Sensitive Work Stealing (ATS)
- An extension of RS that collocates actors that communicate frequently
- The decision agent picks the actor that will minimize inter-theater communication after migration, based on:
- Location of acquaintances
- Profiled communication history
- Tries to minimize the frequency of remote communication, improving overall system throughput
42. Network Topology-Sensitive Work Stealing (NTS)
- An extension of ATS that takes network topology and performance into consideration
- Periodically profiles end-to-end network performance among peer theaters:
- Latency
- Bandwidth
- Tries to minimize the cost of remote communication, improving overall system throughput:
- Tightly coupled actors stay within reasonably low latencies / high bandwidths
- Loosely coupled actors can flow more freely
43. A General Model for Weighted Resource-Sensitive Work Stealing (WRS)
- Given:
- A set of resources, R = {r0, ..., rn}
- A set of actors, A = {a0, ..., an}
- w(r,A) is a weight based on the importance of resource r to the performance of the set of actors A:
- 0 <= w(r,A) <= 1
- Σ_{all r} w(r,A) = 1
- a(r,f) is the amount of resource r available at foreign node f
- u(r,l,A) is the amount of resource r used by actors A at local node l
- M(A,l,f) is the estimated cost of migrating actors A from l to f
- L(A) is the average life expectancy of the set of actors A
- The predicted increase in overall performance, G, gained by migrating A from l to f (normalized so that G <= 1):
- D(r,l,f,A) = (a(r,f) - u(r,l,A)) / (a(r,f) + u(r,l,A))
- G = Σ_{all r} (w(r,A) * D(r,l,f,A)) - M(A,l,f) / (10 * log L(A))
- When work is requested by f, migrate the actor(s) A with the greatest predicted increase in overall performance, if positive.
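In LaTeX notation, and as a small worked example under assumed values (a single resource, cpu, with w(cpu,A) = 1, a(cpu,f) = 3, u(cpu,l,A) = 1, and a migration term M(A,l,f)/(10 log L(A)) = 0.1; these numbers are illustrative only):

  D(\mathit{cpu},l,f,A) = \frac{a(\mathit{cpu},f) - u(\mathit{cpu},l,A)}{a(\mathit{cpu},f) + u(\mathit{cpu},l,A)} = \frac{3 - 1}{3 + 1} = 0.5

  G = \sum_{\text{all } r} w(r,A)\, D(r,l,f,A) - \frac{M(A,l,f)}{10 \log L(A)} = 1 \cdot 0.5 - 0.1 = 0.4

Since G = 0.4 > 0, the actors A would be migrated to f when f requests work.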
44. Preliminary Results
- Application Actor Topologies
- Unconnected
- Sparse
- Tree
- Hypercube
- Middleware Agent Topologies
- Peer-to-peer
- Cluster-to-cluster
- Network Topologies
- Grid-like (set of homogeneous clusters)
- Internet-like (more heterogeneous)
- Migration Policies
- Single Actor
- Actor Groups
- Dynamic Networks
45. Unconnected and Sparse Application Topologies
- Load-balancing experiments use RR, RS, and ATS
46. Tree and Hypercube Application Topologies
- RS and ATS do not add substantial overhead to RR
- ATS performs best in all cases with some interconnectivity
47. Peer-to-Peer Middleware Agent Topology (P2P)
- List of peers, arranged in groups based on latency:
- Local (0-10 ms)
- Regional (11-100 ms)
- National (101-250 ms)
- Global (251+ ms)
- Work-steal requests are:
- Propagated randomly within the closest group until the time-to-live is reached or work is found
- Propagated to progressively farther groups if no work is found
- Peers respond to steal packets when the decision component decides to reconfigure the application based on the performance model
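A hypothetical sketch of the TTL-bounded propagation within one latency group (all names are illustrative, not the actual IOS protocol code):

  behavior GroupPeer {
    GroupPeer[] localGroup;  // peers in the closest latency group
    boolean hasWork;

    GroupPeer(boolean hasWork) { this.hasWork = hasWork; }

    void setGroup(GroupPeer[] group) { this.localGroup = group; }

    void steal(GroupPeer requester, int ttl) {
      if (hasWork) {
        // Here the decision component would evaluate the performance
        // model and possibly migrate actors toward the requester.
        standardOutput<-println("offering work to requester");
      } else if (ttl > 0) {
        // Forward randomly within the group until the TTL expires;
        // on expiry, the requester retries in a farther group.
        java.util.Random r = new java.util.Random();
        localGroup[r.nextInt(localGroup.length)]<-steal(requester, ttl - 1);
      }
    }

    void act(String[] args) {
      GroupPeer busy = new GroupPeer(true);
      GroupPeer idle = new GroupPeer(false);
      GroupPeer[] group = { busy };
      idle<-setGroup(group) @ idle<-steal(idle, 3);
    }
  }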
48. Cluster-to-Cluster Middleware Agent Topology (C2C)
- Hierarchical peer organization:
- Each cluster has a manager
- Each node in a cluster periodically reports profiling information to its manager
- Managers perform intra-cluster load balancing
- Cluster managers form a dynamic peer-to-peer network:
- Managers may join and leave at any time
- Clusters can split and merge depending on network conditions
- Inter-cluster load balancing is based on work stealing, similar to the p2p protocol component
- Clusters are organized dynamically based on latency
49. Physical Network Topologies
- Grid-like Topology
- Relatively homogeneous processors
- Very high-performance networking within clusters (e.g., Myrinet and gigabit Ethernet)
- Networking between clusters is dedicated, with high-bandwidth links (e.g., the Extensible TeraScale Facility)
- Internet-like Topology
- Wider range of processor architectures and operating systems
- Nodes are less reliable
- Networking between nodes can range from low bandwidth and high latency to dedicated fiber-optic links
50. Results for Applications with a High Communication-to-Computation Ratio
51. Results for Applications with a Low Communication-to-Computation Ratio
52. Middleware Agent Topology Evaluation Summary
- Simulation results show that:
- The peer-to-peer protocol generally performs better in Internet-like environments, with the exception of the sparse application topology
- The cluster-to-cluster protocol generally performs better in grid-like environments, with the exception of the unconnected application topology
53. Single vs. Group Migration
54. Dynamic Networks
- Theaters were added and removed dynamically to test scalability:
- During the first half of the experiment, a theater was added every 30 seconds.
- During the second half, a theater was removed every 30 seconds.
- Throughput improves as the number of theaters grows.
55. Actor Distribution in Dynamic Networks
- Both RS and ATS distributed actors evenly across the dynamic network of theaters.
56. Ongoing/Future Work
- Splitting, merging, and replication components
- Profiling memory and storage resources
- Interoperability with existing high-performance messaging implementations (e.g., MPI, OpenMP)
- IOS/MPI project
- Interoperability with the Globus/Open Grid Services Architecture (OGSA)
- Interoperability with Web Services
57. Related Work: Work Stealing, Internet Computing, and P2P Systems
- Work Stealing
- Cilk's runtime system for multithreaded parallel programming
- Cilk's scheduler's work-stealing techniques
- R. D. Blumofe and C. E. Leiserson, "Scheduling Multithreaded Computations by Work Stealing", FOCS '94
- Internet Computing
- SETI@home (Berkeley)
- Folding@home (Stanford)
- P2P Systems
- Distributed storage: Freenet, KaZaA
- File sharing: Napster, Gnutella
- Distributed hash tables: Chord, CAN, Pastry
58. Related Work: Grid/Distributed Computing
- Cluster/Grid Computing Software
- OGSA/Web Services
- Globus (Univa)
- Condor
- Legion
- Network Infrastructure
- PlanetLab
- Distributed Computing Services
- WebOS
- 2K
- Network Weather Service
- Much other work on distributed systems
59. Thank You
- Software freely available at http://wcl.cs.rpi.edu/
60. Using the IOS Middleware
- Start IOS peer servers (a mechanism for peer discovery).
- Start a network of IOS theaters.
- Write your SALSA programs and extend all actors to autonomous actors.
- Bind autonomous actors to theaters (see the sketch below).
- IOS automatically reconfigures the location of actors in the network to improve application performance.
- IOS supports the dynamic addition and removal of theaters.
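The binding step uses only constructs shown earlier (slide 18's creation with at (uan, ual)). A minimal sketch with placeholder UAN/UAL values; the autonomous-actor extension itself is elided, since its API is not shown in this talk:

  behavior Worker {
    void hello() { standardOutput<-println("running where IOS placed me"); }

    void act(String[] args) {
      // Placeholder name and location; a real run would take these as arguments.
      UAN uan = new UAN("uan://myhost/worker");
      UAL ual = new UAL("rmsp://theater1/worker");
      // Bind the actor to a theater at creation; IOS may later migrate it
      // automatically as theaters join, leave, or change load.
      Worker w = new Worker() at (uan, ual);
      w<-hello();
    }
  }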