Title: A Framework for Collaborative Distributed Simulation over the Grid
1 A Framework for Collaborative Distributed Simulation over the Grid
- Stephen John Turner
- Parallel Distributed Computing Centre
- Nanyang Technological University
- Singapore
2 Outline
- Background
- Distributed Simulation
- Grid Computing
- Motivation
- Research Challenges
- HLA-based Distributed Simulation
- Grid Services and Service Discovery
- Load Management System
- Grid Enabled HLA/RTI
- Conclusions
3 Distributed Simulation
- Provides a way of linking simulation components (federates) of various types, possibly at different locations, to create a common virtual environment (federation)
4 Example Application Areas
- Battlefield Simulation
- Linking different types of forces at multiple physical locations to create a realistic and complex virtual world
- Supply Chain Simulation
- Managing material and information flow, from manufacturers through distributors to customers
- Air Traffic Control
- Simulating airports and airspace sectors to provide faster than real-time simulation for what-if analysis
- Multi-player Internet Games
- Involving a massively multi-player (10,000) virtual world
5 High Level Architecture
6 High Level Architecture
- Features of High Level Architecture
- Each federate has a simulation object model (SOM) defining the data to be shared with other federates, allowing reuse in different federations
- The federation (set of federates) has a common federation object model (FOM)
- HLA supports distributed simulations linking the federates of a federation over a LAN or the Internet
- Time Management can be used to ensure the correct ordering of events
- HLA is an IEEE (1516) and OMG standard
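The time management feature above can be illustrated with a minimal sketch of conservative, timestamp-ordered delivery. It assumes each federate reports a logical time and a lookahead, and that an event is safe to deliver once its timestamp is below the smallest time plus lookahead over all federates. The function names and sample data are hypothetical and are not part of the HLA API.

```python
import heapq

def lower_bound(federates):
    """Smallest (logical time + lookahead) over all federates.

    No federate can still produce an event with a timestamp below this value.
    """
    return min(time + lookahead for time, lookahead in federates.values())

def release_safe_events(pending, bound):
    """Pop events with timestamp strictly below the bound, in timestamp order."""
    released = []
    while pending and pending[0][0] < bound:
        released.append(heapq.heappop(pending))
    return released

# Events arrive out of order; the heap keeps the earliest timestamp on top.
pending = []
for event in [(12.0, "C"), (5.0, "A"), (8.0, "B")]:
    heapq.heappush(pending, event)

# Hypothetical federates: name -> (current logical time, lookahead).
federates = {"tank": (6.0, 3.0), "aircraft": (10.0, 1.0)}
bound = lower_bound(federates)
safe = release_safe_events(pending, bound)
print(bound, safe)
```

Event "C" stays queued because a federate could still send something with an earlier timestamp.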
7 Grid Computing
- Grid technology is the next step in the evolution of computing, enabling new forms of collaboration through the seamless sharing of distributed computing and data resources
- Communities can share geographically distributed resources for their common purpose
8 Grid Computing
[Figure labels: Web Services, Grid Services, OGSA, OGSI, Globus Toolkit]
9 Motivation
- Collaborative Simulation Development
- The development of complex simulations usually requires collaborative effort from analysts with different domain knowledge and expertise, possibly at different locations
- Sharing of Computing Resources
- Simulation systems often require huge computing resources, and the participants in the simulation and/or the data sets required may also be geographically distributed
10 Motivation
- HLA-based Distributed Simulation on the Grid
- HLA defines a standard for reuse and interoperability
- Grid technologies enable collaboration and the use of distributed computing resources
- Collaborative
- Distributed
- Complex Multi-dimensional
11 Simulation Life Cycle
12 Research Challenges
- Service/Model Discovery
- Based on requirements, suitable component models are selected to form an overall simulation
- Research Issues
- How are simulation models registered as grid services?
- How are simulation models discovered?
- How are the interfaces defined?
- Are the simulation models HLA compliant?
- Do they conform to any standard reference models (e.g. HLA-CSPIF)?
13 Research Challenges
- Service/Model Composition
- Checking semantic interoperability between individual component simulation models from different sources
- Research Issues
- Can the output of one simulation model feed into the input of another?
- How is the work flow of the configuration described?
- What are the mechanisms for verifying the correctness of the simulation?
14 Research Challenges
- Security
- Simulation partners should be allowed to specify selective access to their simulation models
- Research Issues
- Does a user have access to a particular simulation model or data?
- Can a user selectively share sensitive data with different partners?
- Does the simulation model originate from a trusted partner?
- Must the model be executed on a particular resource?
15 Research Challenges
- Execution
- Simulation partners may obtain computing resources from the Grid to supplement their needs
- Research Issues
- How can the different simulation runs be partitioned onto the available computing resources?
- What mechanisms should be used for scheduling and load management of simulations on the Grid?
- What kinds of fault tolerance mechanisms are required?
16 Simulation Life Cycle
[Figure labels: Semantic Interfaces, Resource Management, Workflow, Policies]
17 HLA-based Distributed Simulation
- Discovery and Composition of Models
- Management of Simulation Execution
18 Grid Services and Service Discovery
- 1. Query Index Service for RTI Service handle for federation
- 2. Create RtiExec if necessary and get endpoint used by RtiExec
- 3. Query Index Service for Federate Factory Service handle
- 4. Create Federate Service and Federate Process
- 5. Federate Processes join federation
19 Grid Services and Service Discovery
- 3. Query Index Service for Federate Factory Service handle
- 4. Create Federate Service and Federate Process
- 4a. Federate Service can query Index Service for RtiExec endpoint
- 5. Federate Processes join federation
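The discovery sequence on the two slides above can be sketched as follows. This is a minimal in-memory illustration, not the Globus Toolkit API: the classes `IndexService`, `RtiService`, `FederateFactoryService` and `FederateProcess`, the registry keys, and the `rtiexec://` endpoint format are all hypothetical stand-ins.

```python
class IndexService:
    """Hypothetical in-memory stand-in for a Grid index service."""
    def __init__(self):
        self._handles = {}

    def register(self, name, handle):
        self._handles[name] = handle

    def query(self, name):
        return self._handles[name]

class RtiService:
    """Creates one RtiExec per federation on demand and returns its endpoint."""
    def __init__(self):
        self._endpoints = {}

    def get_or_create_rtiexec(self, federation):
        if federation not in self._endpoints:
            # Made-up endpoint format, for illustration only.
            self._endpoints[federation] = f"rtiexec://{federation}"
        return self._endpoints[federation]

class FederateProcess:
    def __init__(self, name, endpoint):
        self.name = name
        self.endpoint = endpoint

    def join(self, members):
        members.append(self.name)

class FederateFactoryService:
    def create_federate(self, name, endpoint):
        return FederateProcess(name, endpoint)

def start_federation(index, federation, federate_names):
    # Query the index service for the RTI service handle.
    rti = index.query("RtiService")
    # Create the RtiExec if necessary and obtain its endpoint.
    endpoint = rti.get_or_create_rtiexec(federation)
    # Query the index service for the federate factory handle.
    factory = index.query("FederateFactoryService")
    # Create one federate process per requested federate.
    processes = [factory.create_federate(n, endpoint) for n in federate_names]
    # Each federate process joins the federation.
    members = []
    for process in processes:
        process.join(members)
    return endpoint, members

index = IndexService()
index.register("RtiService", RtiService())
index.register("FederateFactoryService", FederateFactoryService())
endpoint, members = start_federation(index, "demo", ["f1", "f2"])
print(endpoint, members)
```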
20 Load Management System
- Use Grid software for
- Authentication,
- Resource discovery, allocation and monitoring, and
- Facilitating federate migration
21 Load Management System
[Figure labels: Resource Discovery, Allocation, Monitoring; Globus; Run Time Infrastructure]
22 SimKernel
- Simulation code extended with two interfaces
- One for communicating with the Runtime Infrastructure (RTI)
- One for communicating with the Load Management System (LMS)
23 SimKernel
[Figure labels: Design, Implementation, Execution]
24 Federate
- Each federate contains two threads: a simulation thread (SimKernel) and a load management thread (LMClient)
- SimKernel processes simulation events as defined by the user and communicates with the RTI
- LMClient works with the Load Manager (LM) to perform federate migration
- receive instruction from LM
- stop SimKernel
- get SimKernel execution state
- transfer SimKernel configuration and execution state
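The LMClient steps listed above (stop SimKernel, get its execution state, transfer it) can be sketched as follows. This is a simplified illustration under stated assumptions: the simulation state is a plain dictionary, the "transfer" is a `pickle` byte string rather than a network call, and the LMClient is invoked from the main thread instead of running as its own thread. The class and method names echo the slides, but their signatures are hypothetical.

```python
import pickle
import threading
import time

class SimKernel(threading.Thread):
    """Processes queued events one at a time until asked to stop."""
    def __init__(self, events, state=None):
        super().__init__()
        self.pending = list(events)
        self.state = state if state is not None else {"processed": []}
        self._stop_requested = threading.Event()

    def run(self):
        # The stop flag is checked between events, so no event is half-processed.
        while self.pending and not self._stop_requested.is_set():
            self.state["processed"].append(self.pending.pop(0))
            time.sleep(0.001)

    def stop(self):
        self._stop_requested.set()
        self.join()

    def snapshot(self):
        # Serialized configuration and execution state, ready to be transferred.
        return pickle.dumps({"pending": self.pending, "state": self.state})

class LMClient:
    """Handles a migration instruction from the Load Manager."""
    def __init__(self, kernel):
        self.kernel = kernel

    def on_migrate_instruction(self):
        self.kernel.stop()             # stop SimKernel
        return self.kernel.snapshot()  # get state, packaged for transfer

def restore(snapshot_bytes):
    """Rebuild a SimKernel at the destination from transferred state."""
    data = pickle.loads(snapshot_bytes)
    return SimKernel(data["pending"], data["state"])

kernel = SimKernel(range(100))
kernel.start()
payload = LMClient(kernel).on_migrate_instruction()
resumed = restore(payload)
resumed.start()
resumed.join()
print(len(resumed.state["processed"]))
```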
25 Load Manager
- Load Manager
- Constantly monitors and collects load information from each participating computing node
- Runs a load balancing algorithm to determine which federate should migrate from which host to which destination
- Communicates with the LMClients at both the source and destination hosts until migration succeeds
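The slides do not say which load balancing algorithm the Load Manager runs. As a minimal sketch, assuming a simple threshold policy (move one federate from the most loaded host to the least loaded host when the gap is large enough), the decision could look like this. The function name, the load scale of 0 to 1, and the threshold are all hypothetical.

```python
def choose_migration(loads, placement, threshold=0.3):
    """Pick (federate, source_host, destination_host), or None if balanced.

    loads:     host -> load in the range 0..1 (hypothetical metric)
    placement: host -> list of federates currently running there
    """
    # Only hosts that actually run a federate can be a migration source.
    candidates = [host for host in loads if placement.get(host)]
    if not candidates:
        return None
    source = max(candidates, key=lambda host: loads[host])
    destination = min(loads, key=lambda host: loads[host])
    if source == destination or loads[source] - loads[destination] < threshold:
        return None
    return placement[source][0], source, destination

loads = {"a": 0.9, "b": 0.2, "c": 0.5}
placement = {"a": ["f1", "f2"], "b": [], "c": ["f3"]}
decision = choose_migration(loads, placement)
print(decision)
```

A real policy would also weigh migration cost and the communication pattern between federates, which this sketch ignores.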
26 Migration Approaches
- Federation wide synchronization
[Figure: the federates perform a Federation-Wide Save, followed by Federate Migration and a Federation-Wide Restore. Costly operation!]
27 Migration Approaches
- Communication among federates
- Messages may be lost in transit during migration
[Figure labels: publish, subscribe, msg, network, resign, join, subscribe, unsubscribe]
28 Our Approach
- We developed an algorithm aiming to
- Provide transparent migration, and
- Minimize the migration overhead
- Run two instances of the migrating federate until event integrity is ensured
- No synchronization or FTP communication is required
- Implementation is specific to federates based on SimKernel
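The idea of running two instances until event integrity is ensured can be illustrated with a small sketch. It is not the authors' protocol: it assumes every message carries a unique sequence number (a hypothetical field), that the original instance keeps receiving while the restarted instance joins, and that the two queues are merged so that nothing is lost or delivered twice.

```python
def missing_at_destination(source_queue, destination_queue):
    """Messages the source instance received that the destination did not."""
    seen = {seq for seq, _ in destination_queue}
    return [(seq, payload) for seq, payload in source_queue if seq not in seen]

def merge_queues(source_queue, destination_queue):
    """Combine both instances' queues: no loss, no duplicates, ordered by seq."""
    merged = {}
    for seq, payload in source_queue + destination_queue:
        merged.setdefault(seq, payload)
    return [(seq, merged[seq]) for seq in sorted(merged)]

# The source instance saw messages 1..3; the destination joined late and saw 3..4.
source_queue = [(1, "a"), (2, "b"), (3, "c")]
destination_queue = [(3, "c"), (4, "d")]

missing = missing_at_destination(source_queue, destination_queue)
complete = merge_queues(source_queue, destination_queue)
print(missing, complete)
```

Once the destination holds the complete queue, the original instance can resign without a federation-wide save and restore.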
29 Federate Migration
[Sequence diagram. Participants: migrating federate, LMClient at source, Load Manager, RTI, LMClient at destination, restarting federate. Message labels: Req_migrate, requestInformation, returnInformation, new, joinFederation, pub/sub Interaction, getMsgCount, recvMsgCount, missingMsg, notifyMissingMsg, flushQueueRequest, receivedInteraction, sendOutgoingEvents, collect, suspend, returnStatus, restore, resume, resignFederationExec, migrationSucceeded. A latency period is marked.]
30 Experimental Results
31 Grid Enabled HLA/RTI
[Figure: Client 1 through Client n connect over the Grid Network to Federation 1 through Federation m]
32 Design
[Figure labels. Grid Services: indexing, discovery, resource management, monitoring services. Client: Simulation Code, Grid-enabled HLA API, Globus. Grid Network. Resource: Globus, Grid-enabled API, Proxy (Proxies Federates), HLA API, RTI on LAN.]
33 Discussion
- Advantages
- Avoids some firewall issues, as the client communicates with the proxy via grid services
- Client application code can run on heterogeneous platforms
- Provides easy migration of client code, as the proxy does not need to be migrated
- Disadvantages
- Overhead of communication, as all simulation events use grid services
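The client and proxy arrangement discussed above can be sketched as follows. This is a minimal illustration under stated assumptions: the grid service transport is replaced by a direct function call carrying JSON strings, and `LocalRTI`, `Proxy`, `GridEnabledHLAClient` and their methods are hypothetical names, not the HLA or Globus APIs. The round-trip counter makes the communication overhead visible, since every simulation call crosses the service boundary.

```python
import json

class LocalRTI:
    """Hypothetical stand-in for an RTI running on a LAN at the resource side."""
    def __init__(self):
        self.federations = {}
        self.log = []

    def join_federation(self, federate, federation):
        self.federations.setdefault(federation, []).append(federate)
        return len(self.federations[federation])

    def send_interaction(self, federate, name, params):
        self.log.append((federate, name, params))
        return True

class Proxy:
    """Acts as the federate towards the RTI on behalf of a remote client."""
    def __init__(self, rti, federate):
        self.rti = rti
        self.federate = federate

    def handle(self, request):
        message = json.loads(request)
        # Explicit dispatch table: only these operations are exposed.
        operations = {
            "join_federation": self.rti.join_federation,
            "send_interaction": self.rti.send_interaction,
        }
        result = operations[message["method"]](self.federate, **message["args"])
        return json.dumps({"result": result})

class GridEnabledHLAClient:
    """Client-side API that forwards every call through the transport."""
    def __init__(self, transport):
        self._transport = transport
        self.round_trips = 0

    def _call(self, method, **args):
        self.round_trips += 1
        reply = self._transport(json.dumps({"method": method, "args": args}))
        return json.loads(reply)["result"]

    def join_federation(self, federation):
        return self._call("join_federation", federation=federation)

    def send_interaction(self, name, params):
        return self._call("send_interaction", name=name, params=params)

rti = LocalRTI()
proxy = Proxy(rti, federate="tank")
client = GridEnabledHLAClient(transport=proxy.handle)
client.join_federation("demo")
client.send_interaction("Fire", {"x": 1})
print(client.round_trips, rti.log)
```

Because the proxy stays joined to the federation, the client code could be restarted elsewhere and pointed at the same proxy.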
34 Conclusions
- Work Done
- Developed a simple prototype using Globus for resource discovery, allocation and federate deployment (DS-RT 02)
- Developed the SimKernel framework to allow the modeler to concentrate on the simulation rather than the implementation (DS-RT 03)
- Developed a federate migration protocol without using federation synchronization (ICCS 04)
- Developed Grid Service and Service Discovery Framework (submitted to DS-RT 04)
35 Conclusions
- Future Work
- Service/model discovery
- Service/model composition
- Grid workflow languages
- Grid enabled HLA/RTI
- Performance measurement
- Alternative communication mechanisms
- Migration and fault tolerance
- Integration of sub-projects
- Convert to GT4 (WS-RF)
36 Thank you for your attention!
The HLA defines a standard for the construction of large-scale distributed simulations, while Grid technologies enable collaboration and the use of distributed computing resources and facilitate access to geographically distributed data sets.