Title: A Framework for Collaborative Distributed Simulation over the Grid
1A Framework for Collaborative Distributed
Simulation over the Grid
- Stephen John Turner
- Parallel Distributed Computing Centre
- Nanyang Technological University
- Singapore
2Project Funding
SMA Inter-University Project
- Wentong CAI (Nanyang Technological Univ)
- Stephen J TURNER (Nanyang Technological Univ)
- Yong Meng TEO (National Univ of Singapore)
- Rassul AYANI (Royal Institute of Technology, Sweden)
UK e-Science Sister Project
- Georgios THEODOROPOULOS (Univ of Birmingham)
- Brian LOGAN (Univ of Nottingham)
- Stephen J TURNER (Nanyang Technological Univ)
- Wentong CAI (Nanyang Technological Univ)
3Outline
- Background
- Distributed Simulation
- Grid Computing
- Motivation
- Research Challenges
- HLA-based Distributed Simulation
- Grid Services and Service Discovery
- Load Management System
- Grid Enabled HLA/RTI
- Conclusions
4Distributed Simulation
- Provides a way of linking simulation components
(federates) of various types at possibly
different locations to create a common virtual
environment (federation)
5Example Application Areas
- Battlefield Simulation
  - Linking different types of forces at multiple physical locations to create a realistic and complex virtual world
- Supply Chain Simulation
  - Managing material and information flow, from manufacturers through distributors to customers
- Air Traffic Control
  - Simulating airports and airspace sectors to provide faster than real-time simulation for what-if analysis
- Multi-player Internet Games
  - Involving a massive multi-player (10,000) virtual world
6High Level Architecture
7High Level Architecture
- Features of High Level Architecture
  - Each federate has a simulation object model (SOM) defining the data to be shared with other federates, allowing reuse in different federations
  - The federation (set of federates) has a common federation object model (FOM)
  - HLA supports distributed simulations linking the federates of a federation over a LAN or the Internet
  - Time Management can be used to ensure the correct ordering of events
  - HLA is an IEEE (1516) and OMG standard
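The Time Management bullet can be illustrated with a small sketch. This is not the HLA/RTI API: the class `TimestampOrderQueue` and its methods are hypothetical names chosen for illustration. It assumes a conservative scheme in which an event is released only once every federate has promised not to send anything earlier.

```python
import heapq
from itertools import count


class TimestampOrderQueue:
    """Toy timestamp-ordered delivery; an illustration, not an RTI."""

    def __init__(self, federates):
        # Per federate: a promise not to send events earlier than this time.
        self._promised = {name: 0.0 for name in federates}
        self._heap = []
        self._tie = count()  # keeps send order for equal timestamps

    def send(self, sender, timestamp, payload):
        if timestamp < self._promised[sender]:
            raise ValueError("event is earlier than the sender's promise")
        heapq.heappush(self._heap, (timestamp, next(self._tie), sender, payload))

    def advance(self, federate, time):
        """Record that `federate` will send nothing earlier than `time`."""
        self._promised[federate] = max(self._promised[federate], time)

    def deliverable(self):
        """Pop every event that no future event can precede."""
        safe = min(self._promised.values())
        out = []
        while self._heap and self._heap[0][0] <= safe:
            timestamp, _, sender, payload = heapq.heappop(self._heap)
            out.append((timestamp, sender, payload))
        return out


queue = TimestampOrderQueue(["tank", "plane"])
queue.send("tank", 3.0, "fire")
queue.send("plane", 2.0, "take-off")
queue.advance("tank", 5.0)
first = queue.deliverable()    # "plane" has promised nothing yet
queue.advance("plane", 5.0)
second = queue.deliverable()   # both events, in timestamp order
```

The events are sent out of order but released in timestamp order, and only after both federates have advanced past them.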
8Ambassador Paradigm
9Grid Computing
- Grid technology is the next step in the evolution
of computing, enabling new forms of collaboration
through the seamless sharing of distributed
computing and data resources
Communities can share geographically distributed
resources for their common purpose
10Grid Computing
Figure labels: Web Services, Grid Services, OGSA, OGSI, Globus Toolkit
11Motivation
- Collaborative Simulation Development
  - The development of complex simulations usually requires collaborative effort from analysts with different domain knowledge and expertise, possibly at different locations
- Sharing of Computing Resources
  - Simulation systems often require huge computing resources and the participants in the simulation and/or data sets required may also be geographically distributed
12Motivation
- HLA-based Distributed Simulation on the Grid
  - HLA defines a standard for reuse and interoperability
  - Grid technologies enable collaboration and the use of distributed computing resources
- Collaborative
- Distributed
- Complex Multi-dimensional
13Simulation Life Cycle
14Research Challenges
- Service/Model Discovery
  - Based on requirements, suitable component models are selected to form an overall simulation
- Research Issues
  - How are simulation models registered as grid services?
  - How are simulation models discovered?
  - How are the interfaces defined?
  - Are the simulation models HLA compliant?
  - Do they conform to any standard reference models (e.g. HLA-CSPIF)?
15Research Challenges
- Service/Model Composition
  - Checking semantic interoperability between individual component simulation models from different sources
- Research Issues
  - Can the output of one simulation model feed into the input of another?
  - How is the work flow of the configuration described?
  - What are the mechanisms for verifying the correctness of the simulation?
16Research Challenges
- Security
  - Simulation partners should be allowed to specify selective access to their simulation models
- Research Issues
  - Does a user have access to a particular simulation model or data?
  - Can a user selectively share sensitive data with different partners?
  - Does the simulation model originate from a trusted partner?
  - Must the model be executed on a particular resource?
17Research Challenges
- Execution
  - Simulation partners may obtain computing resources from the Grid to supplement their needs
- Research Issues
  - How can the different simulation runs be partitioned onto the available computing resources?
  - What mechanisms should be used for scheduling and load management of simulations on the Grid?
  - What kind of fault tolerance mechanisms are required?
18Simulation Life Cycle
Semantic Interfaces
Resource Management
Workflow
Policies
19HLA-based Distributed Simulation
- Discovery and Composition of Models
- Management of Simulation Execution
20Grid Services and Service Discovery
- 1. Query Index Service for RTI Service handle for federation
- 2. Create RtiExec if necessary and get endpoint used by RtiExec
- 3. Query Index Service for Federate Factory Service handle
- 4. Create Federate Service and Federate Process
- 5. Federate Processes join federation
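The five discovery steps listed above can be walked through with in-memory stand-ins. This is only a sketch: `IndexService`, `RtiService`, `FederateFactoryService`, the registry keys and the endpoint string are all hypothetical names, and no Globus or RTI call is made.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IndexService:
    """Hypothetical in-memory stand-in for a Grid index service."""
    handles: dict = field(default_factory=dict)

    def register(self, name, handle):
        self.handles[name] = handle

    def query(self, name):
        return self.handles.get(name)


@dataclass
class RtiService:
    endpoint_base: str
    rtiexec_endpoint: Optional[str] = None

    def get_or_create_rtiexec(self):
        # Create the RtiExec only if none is running yet, then return its endpoint.
        if self.rtiexec_endpoint is None:
            self.rtiexec_endpoint = self.endpoint_base + "/rtiexec"
        return self.rtiexec_endpoint


class FederateFactoryService:
    def create_federate(self, name, rtiexec_endpoint):
        # Stands in for creating a Federate Service and its Federate Process.
        return {"name": name, "rtiexec": rtiexec_endpoint, "joined": False}


def start_federation(index, federate_names):
    rti = index.query("rti-service")                  # query for RTI Service handle
    endpoint = rti.get_or_create_rtiexec()            # create RtiExec if necessary
    factory = index.query("federate-factory")         # query for factory handle
    federates = [factory.create_federate(name, endpoint) for name in federate_names]
    for federate in federates:                        # federate processes join
        federate["joined"] = True
    return federates


index = IndexService()
index.register("rti-service", RtiService("host-a:9000"))
index.register("federate-factory", FederateFactoryService())
federates = start_federation(index, ["manufacturer", "distributor"])
```

Calling `start_federation` a second time reuses the existing RtiExec endpoint, which is the point of the "if necessary" in the second step.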
21Grid Services and Service Discovery
- 3. Query Index Service for Federate Factory Service handle
- 4. Create Federate Service and Federate Process
- 4a. Federate Service can query Index Service for RtiExec endpoint
- 5. Federate Processes join federation
22Load Management System
- Use Grid software for
  - Authentication,
  - Resource Discovery, Allocation and Monitoring, and
  - Facilitating Federate Migration
23Load Management System
Figure labels: Resource Discovery, Allocation and Monitoring; Globus; Run Time Infrastructure
24Problems
- Developing a Grid-enabled, HLA-based simulation requires a large effort
- Check-pointing and state saving are application dependent and are very difficult in general
- Federate migration may require federation-wide synchronization, an expensive operation
- Messages may be delayed or lost in transit during federate migration
25Objectives
- Develop a framework that allows the modeler to concentrate on the simulation
- Provide an application-independent federate execution model
- Hide details of the HLA/RTI interface and load management realization from the simulation designer
- Make federate state saving easier and more modular and simplify federate migration design
- Achieve dynamic load balancing of HLA-based distributed simulation over a Grid environment
26SimKernel
- Simulation code extended with two interfaces
  - One for communicating with Runtime Infrastructure (RTI)
  - One for communicating with Load Management System (LMS)
27SimKernel
Design
Implementation
Execution
28Federate
- Each federate contains two threads: a simulation thread (SimKernel) and a load management thread (LMClient)
- SimKernel processes simulation events as defined by the user and communicates with RTI
- LMClient works with Load Manager (LM) to perform federate migration
  - receive instruction from LM
  - stop SimKernel
  - get SimKernel execution state
  - transfer SimKernel configuration and execution state
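The LMClient steps above (stop SimKernel, get its execution state, transfer it) might look as follows in a toy form. The classes below only borrow the names SimKernel and LMClient from the slide; their methods, the shape of the state dictionary and the use of plain timestamps as events are assumptions made for illustration, and the LMClient logic is run as plain calls rather than as its own thread.

```python
import threading
import time


class SimKernel(threading.Thread):
    """Hypothetical simulation thread: processes events until told to stop."""

    def __init__(self, events, state=None):
        super().__init__(daemon=True)
        self.pending = list(events)
        self.state = dict(state or {"processed": 0, "clock": 0.0})
        self._stop_requested = threading.Event()

    def run(self):
        while self.pending and not self._stop_requested.is_set():
            timestamp = self.pending.pop(0)
            self.state["clock"] = timestamp
            self.state["processed"] += 1
            time.sleep(0.001)  # stand-in for real event processing

    def stop(self):
        self._stop_requested.set()
        self.join()  # wait until the event in progress has finished

    def snapshot(self):
        # Execution state plus the events that still have to be processed.
        return {"state": dict(self.state), "pending": list(self.pending)}


class LMClient:
    """Hypothetical load management logic attached to one SimKernel."""

    def __init__(self, kernel):
        self.kernel = kernel

    def migrate(self):
        # The call itself plays the role of the instruction from the LM.
        self.kernel.stop()               # stop SimKernel
        return self.kernel.snapshot()    # get state; returning it stands in for the transfer


source = SimKernel(events=[1.0, 2.0, 3.0, 4.0, 5.0])
source.start()
snapshot = LMClient(source).migrate()

# At the destination, a new SimKernel resumes from the transferred state.
restarted = SimKernel(events=snapshot["pending"], state=snapshot["state"])
restarted.start()
restarted.join()
```

However many events the source instance managed to process before it was stopped, the restarted instance finishes the remainder, so no event is processed twice or skipped.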
29Load Manager
- Load Manager
  - Constantly monitors and collects load information of each individual participating computing node
  - Runs load balancing algorithm to determine which federate should migrate from which host to which destination
  - Communicates with the LMClients at both the source and destination hosts until migration succeeds
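The slide does not say which load balancing algorithm is used. As a minimal sketch of the decision the Load Manager has to make (which federate, from which host, to which destination), the function below assumes a simple threshold policy: move one federate from the most loaded host to the least loaded one when the difference is large enough. The function name, the load scale and the threshold value are all illustrative assumptions.

```python
def choose_migration(loads, placement, threshold=0.25):
    """Return (federate, source, destination), or None if no move is needed.

    loads:     host -> load in [0, 1]
    placement: host -> list of federates running on that host
    The threshold policy is an assumption, not the algorithm from the talk.
    """
    hosts_with_federates = [host for host in loads if placement.get(host)]
    if not hosts_with_federates:
        return None
    source = max(hosts_with_federates, key=lambda host: loads[host])
    destination = min(loads, key=lambda host: loads[host])
    if source == destination:
        return None
    if loads[source] - loads[destination] < threshold:
        return None  # imbalance too small to justify a migration
    return placement[source][0], source, destination


loads = {"node-a": 0.9, "node-b": 0.2, "node-c": 0.5}
placement = {"node-a": ["fed1", "fed2"], "node-b": [], "node-c": ["fed3"]}
decision = choose_migration(loads, placement)

balanced = choose_migration(
    {"node-a": 0.5, "node-b": 0.4},
    {"node-a": ["fed1"], "node-b": ["fed2"]},
)
```

Here `decision` names `fed1` as the federate to move from `node-a` to `node-b`, while the nearly balanced second case yields no migration.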
30Migration Approaches
- Federation wide synchronization
  - Federation-Wide Save
  - Federate Migration
  - Federation-Wide Restore
- Costly Operation!
31Migration Approaches
- Communication among federates
- Messages may be lost in transit during migration
Figure labels: publish, subscribe, msg, network, resign, join, unsubscribe
32Our Approach
- We developed an algorithm aiming to
  - Provide transparent migration, and
  - Minimize the migration overhead
- Run two instances of the migrating federate until event integrity is ensured
- No synchronization or FTP communication is required
- Implementation is specific to federates based on SimKernel
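One way to picture "run two instances until event integrity is ensured" is sketched below. It is an assumption-laden toy, not the protocol from the talk: events are plain ids, the old instance keeps receiving after its state is captured, the new instance starts receiving once it has joined, and whatever only the old instance saw is forwarded before the old instance is retired. The function names are hypothetical.

```python
def missing_events(source_received, destination_received):
    """Events the old instance saw that the new instance has not seen."""
    seen = set(destination_received)
    return [event for event in source_received if event not in seen]


def migrate_without_sync(incoming, snapshot_at, join_at):
    """Replay a migration over a list of incoming event ids.

    snapshot_at: index at which the source instance's state is captured.
    join_at:     index from which the destination instance receives events.
    Both instances are alive for the events from join_at onwards.
    """
    source_after_snapshot = incoming[snapshot_at:]   # old instance keeps receiving
    destination_received = incoming[join_at:]        # new instance, after joining
    gap = missing_events(source_after_snapshot, destination_received)
    # Forward the gap first so the restored state sees events in order.
    return gap + destination_received


incoming = ["e1", "e2", "e3", "e4", "e5", "e6"]
replayed = migrate_without_sync(incoming, snapshot_at=2, join_at=4)
```

In this run the state is captured after `e2` and the new instance joins before `e5`, so `e3` and `e4` are the events that would otherwise be lost; after forwarding, the new instance has seen exactly the events that follow the captured state, without pausing the rest of the federation.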
33Federate Migration
Sequence diagram
Participants: migrating federate, LMClient at source, RTI, Load Manager, LMClient at destination, restarting federate
Message labels: resignFederationExec, sendOutgoingEvents, returnStatus, suspend, missingMsg, receivedInteraction, flushQueueRequest, collect, Req_migrate, migrationSucceeded, notifyMissingMsg, returnInformation, requestInformation, pub/sub Interaction, joinFederation, getMsgCount, recvMsgCount, resume, restore, new
Annotation: Latency period
34Experimental Results
35Grid Enabled HLA/RTI
Figure labels: Client 1 to Client n; Grid Network; Federation 1 to Federation m
36Design
Grid Services: indexing, discovery, resource management, monitoring services
Figure labels: Client, Resource, Grid Network, Grid Services, Globus, Proxy, Simulation Code, Proxies Federates, Grid-enabled API, Grid-enabled HLA API, HLA API, RTI on LAN
37Client Proxy Communication
Figure labels: Federate; Proxy; My FedAmb Notification sink; Proxy RTIamb Grid Service; Proxy Fedamb Notification; RTIamb call to Grid Service; Grid Network
38Proxy RTI Communication
Figure labels: Proxy; Proxy RTIamb Grid Service; Proxy Fedamb Notification; FedAmb; RTIamb
39Discussion
- Advantages
  - Avoids firewall issues as client communicates with proxy via grid services
  - Client application code can run on heterogeneous platforms
  - Provides easy migration of client code; proxy does not need to be migrated
- Disadvantages
  - Overhead of communication as all simulation events use grid services
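The trade-off discussed above can be made concrete with a toy proxy. Nothing here is the real HLA or Globus API: `Rti`, `Proxy`, and their methods are hypothetical stand-ins, and the "grid service call" is just a counter. The sketch assumes each remote client is represented at the RTI by a proxy that forwards its calls and queues callbacks for it.

```python
class Rti:
    """Stand-in for an RTI running on the LAN next to the proxies."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def send_interaction(self, sender, payload):
        for callback in self.subscribers:
            callback(sender, payload)


class Proxy:
    """Acts as a federate on behalf of one remote client."""

    def __init__(self, name, rti):
        self.name = name
        self.rti = rti
        self.service_calls = 0    # every client request arrives as a service call
        self.notifications = []   # callbacks waiting to be pushed to the client
        rti.subscribe(self._on_interaction)

    def send_interaction(self, payload):
        # Invoked remotely by the client; the proxy makes the actual RTI call.
        self.service_calls += 1
        self.rti.send_interaction(self.name, payload)

    def _on_interaction(self, sender, payload):
        # Federate ambassador side: queue the callback for the remote client.
        if sender != self.name:
            self.notifications.append((sender, payload))


rti = Rti()
client_1 = Proxy("client-1", rti)
client_n = Proxy("client-n", rti)
client_1.send_interaction("order")
client_1.send_interaction("shipment")
```

Each simulation event costs one service call on the sending side and one notification on the receiving side, which is the communication overhead listed under Disadvantages. The client itself holds no RTI connection, which is why it can be moved without migrating the proxy.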
40Conclusions
- Work Done
  - Developed a simple prototype using Globus for resource discovery, allocation and federate deployment (DS-RT 02)
  - Developed SimKernel framework to allow modeler to concentrate on the simulation, rather than implementation (DS-RT 03)
  - Developed a federate migration protocol without using federation synchronization (ICCS 04)
  - Developed Grid Service and Service Discovery Framework (submitted to DS-RT 04)
41Conclusions
- Future Work
- Service/model discovery
- Service/model composition
- Grid workflow languages
- Grid enabled HLA/RTI
- Performance measurement
- Alternative communication mechanisms
- Migration and fault tolerance
- Integration of sub-projects
- Convert to GT4 (WS-RF)
42Thank you for your attention!
The HLA defines a standard for the construction of large-scale distributed simulations, while Grid technologies enable collaboration and the use of distributed computing resources and facilitate access to geographically distributed data sets.