Title: SPEEDES Session Fall SIW 1999
- High-Performance Computing Division
- Metron Incorporated
- Manager Dr. Jeffrey S. Steinman
- Senior Software Analyst Dr. Ron Van Iwaarden
- September 16, 1999
Topics
- Introduction to SPEEDES
- SPEEDES Communications Library
- Persistence and Checkpoint/Restart
- Data Distribution Management
- HPC-RTI
1. Introduction to SPEEDES
Historical Perspective
1. Late 1980s
2. Early 1990s
3. Late 1990s
4. 2000...
(Timeline graphic: SIMNET, ALSP, TWOS, other RTIs, WG2K, JSIMS, EADTB)
SPEEDES Project Time Line
SPEEDES Project
- Background
  - Developed at NASA's Jet Propulsion Laboratory under DoD contracts in 1990
  - Strategic, Air, and Ballistic Missile Defense Organizations
  - Government-owned and patented software licensed by NASA, maintained and distributed by Metron
- PDES Users Group (Configuration Management Board)
  - Joint Simulation System (JSIMS)
  - Wargame 2000 through the Joint National Test Facility (JNTF)
  - Space and Naval Warfare Systems Command (SPAWAR)
  - New member: Extended Air Defense Test Bed (EADTB)
  - New member: Joint Modeling and Simulation System (JMASS)
The SPEEDES Ten Commandments
1. Thou shalt execute on all platforms and operating systems
2. Thou shalt be optimizable for all communication architectures
3. Thou shalt compile without warnings on all C++ compilers
4. Thou shalt completely scale with low overheads
5. Thou shalt provide logically correct time management
6. Thou shalt support unconstrained object interactions
7. Thou shalt allow interactions with external systems
8. Thou shalt provide fault tolerance
9. Thou shalt have powerful, yet easy to use, modeling constructs
10. Thou shalt permit interoperability within SPEEDES and HLA
SPEEDES Architecture (layered, top to bottom)
- NSS, JSIMS, EADTB, WG2K
- HLA Run-Time Infrastructure / Distributed Simulation Management Services
- SPEEDES Modeling Framework (Events, Processes, Event Handlers, Components, Object Proxies, DDM, Clusters, Persistence, Utilities)
- SPEEDES Event-Processing Engine (Event List Management, State-Saving, Rollbacks, Message Handling)
Network Connectivity (an Example)
SPEEDES Interoperability
(Diagram: High Performance Computer)
Distribution of SPEEDES Software
- SPEEDES Version 0.8: Synchronous Parallel Environment for Emulation and Discrete-Event Simulation
- High Performance Computing Division, Metron Incorporated
- SPEEDES Software Development Team: Jeff Steinman, Jim Brutocao, Jacob Burckhardt, Ron Van Iwaarden, Gary Blank, Kurt Stadsklev, Scott Shupe, Tuan Tran, Mitch Peckham, Guy Berliner
- Software downloadable from the SPEEDES website: www.ca.metsci.com/speedes/
- Software licensing by NASA: (818) 354-7770
- Distribution by Metron Incorporated: (619) 792-8904
- SPEEDES Version 0.8 released September 10, 1999
- PDES Users Group: JNTF, SPAWAR, JSIMS
- Government sponsors: HPCMO/CHSSI, BMDO, NRL, EADTB, DMSO
2. SPEEDES Communications Library
The SpeedesComm Library
- Services
- Heterogeneous data representation
- Performance results
Services Provided by the SpeedesComm Lib.
- General operations
- Starting up the communications
  - int SpComm_StartUp(int nLoc, int nTot, int group, char *miscData)
  - If nTot is 0, nTot is set equal to nLoc
  - group is an additional integer that can separate SpeedesComm executables using the same SpeedesComm interface at the same time
  - miscData can be used to pass any additional required startup information
- Obtaining node information
  - int SpComm_GetNumNodes()
  - int SpComm_GetNodeId()
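A minimal startup sketch built from the prototypes above; the header name and the specific argument values are illustrative assumptions, not part of the SPEEDES distribution.

```cpp
// Minimal startup sketch using the SpComm_* prototypes listed above.
// The header name "SpeedesComm.H" is an assumption for illustration.
#include <cstdio>
#include "SpeedesComm.H"

int main() {
  // One local node; nTot = 0 means "set nTot equal to nLoc";
  // group = 0 keeps this run separate from other SpeedesComm executables.
  SpComm_StartUp(1, 0, 0, 0);

  int nNodes = SpComm_GetNumNodes();  // total nodes in this run
  int myId   = SpComm_GetNodeId();    // this node's id
  std::printf("node %d of %d started\n", myId, nNodes);
  return 0;
}
```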
Services Provided by the SpeedesComm Lib.
- Global operations
- Synchronizations
- void SpComm_BarrierSync()
- void SpComm_EnterFuzzyBarrier()
- int SpComm_ExitFuzzyBarrier()
- Global Sums
- int SpComm_GlobalSum(int value)
- double SpComm_GlobalSum(double value)
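A small sketch combining the barrier with a global sum, using only the calls listed above; the header name and the per-node work count are illustrative assumptions.

```cpp
// Sketch: barrier followed by a global sum of a per-node value.
// The header name "SpeedesComm.H" is an assumption.
#include <cstdio>
#include "SpeedesComm.H"

static int CountLocalWork() { return 10; }   // placeholder per-node value

void ReportTotalWork() {
  int local = CountLocalWork();
  SpComm_BarrierSync();                 // every node reaches this point first
  int total = SpComm_GlobalSum(local);  // every node receives the global sum
  if (SpComm_GetNodeId() == 0)
    std::printf("total work items across all nodes: %d\n", total);
}
```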
Services Provided by the SpeedesComm Lib.
- Global operations Continued
- Global Minimums
- int SpComm_GlobalMin(int value)
- double SpComm_GlobalMin(double value)
- SIMTIME SpComm_GlobalMin(SIMTIME Time)
- Global Maximums
- int SpComm_GlobalMax(int value)
- double SpComm_GlobalMax(double value)
- SIMTIME SpComm_GlobalMax(SIMTIME Time)
Services Provided by the SpeedesComm Lib.
- Asynchronous message passing
- Message types (values can range from 0 to 255)
  - #define N_MESSAGE_TYPES 256
- Destination types
  - Unicast (i.e., a node number)
  - Multicast (subset of all nodes)
  - Broadcast (no destination provided)
Services Provided by the SpeedesComm Lib.
- Asynchronous message passing (continued)
- Sending messages uses overloaded functions
  - void SpComm_Send(int Destination, int Type, int Nbytes, void *Buff)
  - void SpComm_Send(DESTINATION Destination, int Type, int Nbytes, void *Buff)
  - void SpComm_Send(int Type, int Nbytes, void *Buff)
- Receiving messages (an array of message queues holds unread messages)
  - void SpComm_Receive()
  - void SpComm_Receive(int Type, int Nbytes)
  - void SpComm_GetPendingMessage(int Type, int Nbytes)
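A sketch of the three send overloads above (unicast, multicast, broadcast); the header name, message type value, payload, and DESTINATION construction are illustrative assumptions, and only the SpComm_* calls come from the slides.

```cpp
// Sketch of the three asynchronous send overloads: unicast, multicast, broadcast.
// The header name "SpeedesComm.H" is an assumption.
#include "SpeedesComm.H"

const int HELLO_TYPE = 42;           // any value in 0..255

void SendExamples(int destNode, DESTINATION multiDest) {
  char text[] = "hello";
  // Unicast: explicit destination node number.
  SpComm_Send(destNode, HELLO_TYPE, sizeof(text), text);
  // Multicast: DESTINATION object describing a subset of nodes.
  SpComm_Send(multiDest, HELLO_TYPE, sizeof(text), text);
  // Broadcast: no destination argument at all.
  SpComm_Send(HELLO_TYPE, sizeof(text), text);
  // Unread incoming messages accumulate in the per-type message queues and
  // are drained with SpComm_Receive() / SpComm_GetPendingMessage().
}
```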
Services Provided by the SpeedesComm Lib.
- SIMTIME is a generalization of time to support parallel discrete event simulations
  - Includes a double representation of time as well as four tie-breaking fields
- DESTINATION is another class to support multicast messages
  - Flat class (no pointers)
  - Supports at least three methods
    - int GetFirstNode() // returns -1 on failure
    - int GetNextNode() // returns -1 on failure
    - void SetNode(int n)
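A short sketch of walking the node list held by a DESTINATION using the three methods above; how the DESTINATION is populated is application-specific and omitted, and the header name is an assumption.

```cpp
// Sketch: iterating the nodes in a DESTINATION.
// The header name "SpeedesComm.H" is an assumption.
#include <cstdio>
#include "SpeedesComm.H"

void PrintDestinationNodes(DESTINATION &dest) {
  // GetFirstNode()/GetNextNode() return -1 when there are no more nodes.
  for (int node = dest.GetFirstNode(); node != -1; node = dest.GetNextNode())
    std::printf("destination includes node %d\n", node);
}
```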
Services Provided by the SpeedesComm Lib.
- Coordinated message passing
  - Step 1: Nodes pass all outgoing messages to blocking send routines
  - Step 2: Nodes receive messages until a NULL value is returned
    - The NULL value is only returned when the node has received all of its messages
    - Guarantees no coordinated messages from other nodes are in transit
Services Provided by the SpeedesComm Lib.
- Coordinated message passing API
- Sending messages (node-to-node, destination-based multicast, broadcast)
  - void SpComm_BlockingSend(int Destination, int Nbytes, void *Buff)
  - void SpComm_BlockingSend(DESTINATION Destination, int Nbytes, void *Buff)
  - void SpComm_BlockingSend(int Nbytes, void *Buff)
- Receiving messages (a single message queue holds unread messages)
  - void SpComm_CoordinatedReceive(int Nbytes)
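A sketch of the two-step coordinated exchange described above. It assumes, purely for illustration, that SpComm_CoordinatedReceive hands back a pointer to the next unread message (NULL once everything has arrived, as the step 2 description says) and reports its size through its argument; the actual prototype in the SPEEDES headers may differ.

```cpp
// Two-step coordinated exchange (sketch). The receive signature used here is
// an assumption based on the NULL-return behavior described above.
#include <cstdio>
#include "SpeedesComm.H"   // assumed header name

void ExchangeWithNeighbor(int neighborNode, void *outBuf, int outBytes) {
  // Step 1: hand every outgoing message to the blocking send routines.
  SpComm_BlockingSend(neighborNode, outBytes, outBuf);

  // Step 2: receive until NULL, which guarantees no coordinated messages
  // from other nodes are still in transit.
  int nbytes = 0;
  while (void *msg = SpComm_CoordinatedReceive(nbytes)) {
    std::printf("coordinated message of %d bytes received\n", nbytes);
    (void)msg;  // application-specific processing would go here
  }
}
```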
Heterogeneous Data Representations
- NET_INT and NET_FLOAT as C++ objects
  - Eliminates the need for packing and unpacking data in messages
  - Operator overloading is used to hide conversions
- Operate as integers and floats in normal use
  - Assignments work in normal representation
  - Accessors convert on first access if necessary
  - 8-byte alignment is guaranteed
- Access is slower than for normal integers and doubles
  - Modern multi-pipelined, branch-predicting CPUs will optimize this quite well
  - Users can also attempt to minimize the number of accesses of NET types in messages
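A sketch of how a message struct might use NET_INT and NET_FLOAT so the same layout works across machines with different byte orders; the struct, field names, and header are illustrative, while NET_INT/NET_FLOAT and the assignment/accessor behavior come from the bullets above.

```cpp
// Sketch: a message layout using NET_INT / NET_FLOAT so it can be sent
// between heterogeneous machines without explicit packing or unpacking.
// "SpeedesComm.H" and TrackUpdateMsg are illustrative assumptions.
#include "SpeedesComm.H"

struct TrackUpdateMsg {
  NET_INT   trackId;    // behaves like an int in normal use
  NET_FLOAT latitude;   // behaves like a float in normal use
  NET_FLOAT longitude;
};

void FillMessage(TrackUpdateMsg &msg) {
  msg.trackId   = 17;      // assignment works in the normal representation
  msg.latitude  = 32.9;    // any conversion happens on first access
  msg.longitude = -117.2;
  double lat = msg.latitude;  // reads convert back transparently
  (void)lat;
}
```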
Communications Performance
- Several systems were used for benchmarking
  - A system of 8 dual Pentium Pro 200 Linux machines connected with 10Base-T Ethernet
  - A 20-processor SGI Power Challenge (195 MHz R8000 chips)
  - A 64-processor SGI Origin 2000 (250 MHz R12000 chips)
Communications Performance
- TCP/IP performance with the Linux network
Communications Performance
- Shared memory performance using the Origin 2000
Communications Performance
- Reduction time on 62 nodes of the Origin 2000
Conclusions
- SpeedesComm is a reusable parallel communications library that is suitable for most parallel applications
- The shared memory implementation provides high performance
- TCP/IP links high performance computers, workstations, and PCs in a network
- Runs under System V UNIX and Windows NT
3. Persistence and Checkpoint/Restart
Overview
- What is persistence memory management?
- Basic implementation
- How to make a SPEEDES simulation checkpoint-restartable
- Performance results, techniques, and areas for future research
What is Persistence Memory Management?
- Objects exist with pointers to other objects
- Objects are later recreated and pointers are updated
(Diagram: objects A, B, C, and D before and after recreation, with their pointers restored)
Basic Implementation
- Macro-based rather than template-based for portability
- The database records memory ranges rather than actually storing copies of the objects
- Pointers are attached, indicating that they need to be restored on update
- Virtual function table pointers are restored for C++ objects
- At any time, the entire database can be stored as a buffer, compressed, and then written to disk for later reconstruction of the objects
Basic Implementation
(Diagram: persistence database tracking objects A, B, C, and D)
Basic Implementation for Checkpoint/Restart
- Only entity state data and event messages are stored
- Persistence is automatically integrated with rollbackable datatypes in SPEEDES
  - Dynamic memory creation/deletion
  - Smart pointers
  - Container classes
  - Object proxies
- Event messages can also have persistent pointers
Main Rules for Checkpoint/Restart
- Register classes that are dynamically created during initialization
- Always use RB_NEW and RB_DELETE
- Always use rollbackable pointers
- All classes that have virtual functions must inherit from SpPersistenceBaseClass
- Never use reference data members
- Static data members must be reinitialized in constructors
- Provide zero-argument constructors for dynamically created objects
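A minimal sketch of a model class shaped by these rules. SpPersistenceBaseClass, RB_NEW, and RB_DELETE are the names given above; the header name, the Sensor class, and its members are illustrative placeholders, and the exact RB_NEW/RB_DELETE invocation syntax should be taken from the SPEEDES documentation rather than from this sketch.

```cpp
// Illustrative class following the checkpoint/restart rules above.
// "SpPersistence.H" and the Sensor class itself are assumptions.
#include "SpPersistence.H"

class Sensor : public SpPersistenceBaseClass {  // has virtual functions, so it must inherit
 public:
  Sensor() : range(0.0) {}   // zero-argument constructor for dynamically created objects
  virtual ~Sensor() {}
  virtual void Update() {}
 private:
  double range;              // plain value member; never use reference data members
  // Any static data members would be reinitialized in the constructor, pointers
  // to other persistent objects would use SPEEDES rollbackable pointers, and
  // instances would be created and destroyed with RB_NEW / RB_DELETE.
};
```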
Performance Results for Initial Release
- Two demos with low event granularities have been tested
  - A queuing network demo saw a 60% reduction in processing speed
  - A regression test that exercises many of the data structures and object proxies saw a 66% reduction in speed
- Primary overheads
  - Adding/removing messages from the database
  - Free lists could greatly improve performance
  - Users should avoid frequent adds/deletes and use free lists whenever possible
Conclusions
- Simple interface for enabling persistence memory management
- Portable C++ implementation
- Standalone GOTS product that can be reused in any C++ program
4. Data Distribution Management
Need for DDM
- DDM is needed to limit distribution of object proxies
  - Scalability required for memory, messages, and computations
  - Number of entities, sensors, nodes
- WG2K, JSIMS, and EADTB experienced scalability problems without DDM
  - The limit without DDM is about 1,000 entities
  - SPEEDES can currently support 1,000,000 entities with DDM
- HLA Routing Spaces
  - Used as the foundation for SPEEDES DDM
  - Extended to support enumerations and categories
- Range-based Filtering
  - The most common dynamically changing filter requirement
  - Routing space extensions for geographical theaters
  - Range solver determines exactly when targets enter and exit the field of view (see the sketch below)
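One common closed-form way to do this kind of range solving for straight-line motion between way-points (illustrative only; the actual SPEEDES range solver is not shown in these slides): find the times at which the relative separation of two entities equals the sensor range.

```cpp
// Sketch of a range solver for straight-line relative motion: solve
// |dp + dv*t|^2 = R^2, a quadratic in t, for entry and exit times.
#include <cmath>

// Returns true and fills tEnter/tExit if the entities come within range R.
// dp = relative position at t = 0, dv = relative velocity (both 3-D).
bool RangeCrossingTimes(const double dp[3], const double dv[3], double R,
                        double &tEnter, double &tExit) {
  double a = dv[0]*dv[0] + dv[1]*dv[1] + dv[2]*dv[2];
  double b = 2.0 * (dp[0]*dv[0] + dp[1]*dv[1] + dp[2]*dv[2]);
  double c = dp[0]*dp[0] + dp[1]*dp[1] + dp[2]*dp[2] - R*R;
  double disc = b*b - 4.0*a*c;
  if (a == 0.0 || disc < 0.0) return false;   // no relative motion, or never crosses the range
  double s = std::sqrt(disc);
  tEnter = (-b - s) / (2.0 * a);               // time the target enters the field of view
  tExit  = (-b + s) / (2.0 * a);               // time it exits
  return true;
}
```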
Distributed Routing Spaces
- Routing spaces are distributed to provide scalability
  - Reduces bottlenecks for grid overlap computations
- Which dimensions are used for distributing regions can be controlled
  - "Don't care" dimensions should not be distributed
- Routing spaces support multiple resolutions
  - Hierarchical grids are used to support arbitrary sets of resolutions for each dimension
Overall Coordination of DDM in SPEEDES
Distributed Hierarchical Grids
Object Proxies and HLA Services
- Object Management
- Discover/Remove Objects
- Update/Reflect Attributes
- Dynamic Attributes
- Ownership Management
- Two-Way Proxies
- Declaration Management
- Class-based Subscription
- Data Distribution Management
- Range-based filtering
- Routing Spaces
Decomposition of Space into HiGrids
Conceptual Diagram of Routing Spaces
(Diagram: Universe, Space Dimension)
Geographical Filtering Based on Range
Problem: Decomposing Latitude/Longitude
(Diagram: latitude/longitude grid between the North Pole and the Equator)
Latitude Bands for Equal Area Grids
(Figures (a) and (b): latitude bands of angular widths dφ0 through dφ7 between the Equator and the North Pole, with band radius Ri at latitude φi)
Longitude Cells for Equal Area Grids
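The slides present the equal-area construction only as figures. For reference, one standard relation (illustrative, and not necessarily the exact SPEEDES scheme) ties band area and the per-band longitude cell count to latitude:

```latex
% Area of the latitude band between \phi_i and \phi_{i+1} on a sphere of radius R:
A_i = 2\pi R^2 \left( \sin\phi_{i+1} - \sin\phi_i \right)
% With bands of equal angular width, making the number of longitude cells in band i
% proportional to the cosine of its mid-latitude keeps cell areas approximately equal:
n_i \approx \left\lceil\, n_{\mathrm{equator}} \cos\phi_{\mathrm{mid},i} \,\right\rceil
```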
Angular Cones Used for Grid Lookups
Example of a Three-Dimensional HiGrid
The X-SubTree
The Y-SubTree
The Z-SubTree
SPEEDES Components Implement DDM
(Diagram: S_SpSimObj)
Test Scenario
- 1,000,000 entities randomly moving about the globe
  - Great circle trajectories between way-points
  - Each entity has one radar sensor with 100 km range
- Routing space
  - One THEATER dimension covers the entire globe
    - Lat/Lon regions distributed (not Altitude), multiple resolutions
    - Regions automatically coordinated using position and maximum velocity
  - One ENUM dimension with five enumerations
    - Distributed
    - Publishers and subscribers randomly select one value
  - One DIMENSION standard HLA dimension
    - Not distributed, several resolutions
- Experiment
  - Establish the filter to average one detection per entity
Proxy Distribution Statistics
Wall Clock vs. Number of Nodes
Memory Usage vs. Number of Nodes
Maximum Speedup vs. Number of Nodes
Number of Events Processed by Type
Total Processing Time by Event Type
Average Processing Time by Event Type
Next Steps for DDM
- Reduction in overheads
  - Several events appear to have excessive overhead and should be optimized
  - Support destination-based multicasting for scheduling events
  - Refactor the event queue to more efficiently support direct retraction
- Several rollback reduction optimizations
  - Query-Reply optimization will prohibit an object from processing events beyond the time tag of events it is expecting to receive from other objects
  - Automated lazy cancellation will reprocess events when possible
  - Event Reparation will allow events to fix themselves to minimize the effects of straggler messages
- Further testing
  - Attribute updates
  - Scalability test for Magnet (drawing objects together)
  - Apply DDM to interactions
Conclusions
- SPEEDES DDM has achieved its initial goals
  - Scalability
    - Parallel performance
    - Memory
    - Messages
  - Support for multiple-resolution filtering
  - Automated range-based filtering
  - Time management guarantees repeatable results
- DDM is compatible with HLA
  - DDM in SPEEDES applications will work transparently with HLA
  - SPEEDES DDM provides HLA DDM in the HPC-RTI
- 1,000,000 entities!
5. HPC-RTI
SPEEDES HLA
- SPEEDES will support three HLA interoperability strategies
  - I. HLA Gateway
    - Connects SPEEDES to another federation using any RTI
  - II. External HLA RTI interfaces
    - Connects HLA federates to SPEEDES using standard interfaces
  - III. Direct HLA RTI interfaces
    - Provides an RTI for HLA federates on high-performance computers
- Features
  - Portable across all computing platforms
  - Rigorous time management for all services
  - Programmable translations between SOMs and FOMs
I. HLA Gateway
- SPEEDES as a federate in an HLA federation
(Diagram: SOM/FOM File-Driven Translator, SOM/FOM Programmable Translator, SPEEDES)
II. External HLA Interfaces
- HLA interfaces provided to external modules
(Diagram: Federate, Simulation, External HLA I/F, FedStateMgr, SOM/FOM Programmable Translator, Host Router, SPEEDES)
III. Internal HLA Interfaces
- Federate as a SPEEDES node
(Diagram: Federate, FedStateMgr, Simulation Node, Direct HLA I/F, SOM/FOM Programmable Translator, SPEEDES)
HLA Objects
(Diagram: Federate)
Current Status of RTI Development
- Federation Management
  - Create federation, join federation
- Declaration Management
  - Subscribe object class (publication not needed)
  - Subscribe interaction (publication not needed)
- Object Management
  - Register object, update attributes (put on hold for now)
  - Discover object, reflect attributes (testing using a SPEEDES simulation)
  - Send/receive interaction
- Time Management
  - Next event request, time advance grant (other services are easier)
  - All activities between federate and RTI are coordinated in time
  - All internal activities inside the RTI are coordinated in time
RTI Ambassador Progress
RTI Ambassador Progress (continued)
Fed Ambassador Progress
Schedule
- Phase 1
  - Will be released in SPEEDES Version 0.9 (March 2000)
  - External federate capability will be available prior to 0.9 in a minor release
  - Flexible federate translations between SOM and FOM
  - Everything but OM and DDM
- Phase 2
  - Will be released in SPEEDES Version 1.0 (November 2000)
  - Updates will be provided in minor releases
  - Complete HLA interface specification
Conclusions
- SPEEDES-based HPC-RTI focus
  - Performance using high performance computing resources
    - Executes on all HPC architectures
    - Current measurements predict high performance
  - Automatic logical time management across all HLA services
    - Integrates seamlessly with real time
  - Interoperability
    - Between SPEEDES applications (clusters)
    - With HLA federates (HPC-RTI)
    - With federations (gateway)
  - FOM/SOM agility through programmable and file-driven translation