Title: Efficient Optimistic Parallel Simulations Using Reverse Computation
1Efficient Optimistic Parallel Simulations Using
Reverse Computation
- Chris Carothers
- Department of Computer Science
- Rensselaer Polytechnic Institute
- Kalyan Perumalla
- and
- Richard M. Fujimoto
- College of Computing
- Georgia Institute of Technology
2Why Parallel/Distributed Simulation?
- Goal: speed up discrete-event simulation programs using multiple processors
- Enabling technology for
- making intractable simulation models tractable
- turning off-line decision aids into on-line aids for time-critical situation analysis
- DPAT: a distributed simulation success story
- simulation model of the National Airspace
- developed at MITRE using Georgia Tech Time Warp (GTW)
- simulates 50,000 flights in < 1 minute, which used to take 1.5 hours
- web-based user interface
- to be used in the FAA Command Center for on-line "what if" planning
- Parallel/distributed simulation has the potential to improve how "what if" planning strategies are evaluated
3How to Synchronize Distributed Simulations?
Parallel time-stepped simulation: lock-step execution (barrier)
Parallel discrete-event simulation: must allow for sparse, irregular event computations
Problem: events arriving in the past
Solution: Time Warp
[Figure: two timelines for PE 1, PE 2, PE 3 plotted against Virtual Time; legend: processed event, straggler event]
4Time Warp...
Local Control Mechanism: error detection and rollback
(1) undo state Δs (2) cancel sent events
Global Control Mechanism: compute Global Virtual Time (GVT)
collect versions of state / events; perform I/O operations that are < GVT
[Figure: timelines for LP 1, LP 2, LP 3 plotted against Virtual Time, with GVT marked; legend: unprocessed event, processed event, straggler event, committed event]
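The two control mechanisms can be sketched in a few lines of C. This is a minimal illustration and not GTW code: the `LP` struct, the toy `state` counter, and all function names are assumptions made for the example. Rollback undoes every processed event whose timestamp is later than the straggler's; GVT is taken as the minimum over each LP's earliest unprocessed event and the earliest message still in transit; events older than GVT are committed.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROCESSED 64   /* illustrative bound (assumption) */

/* Hypothetical logical process: a toy state plus its processed-event history. */
typedef struct {
    double ts[MAX_PROCESSED];  /* timestamps of processed, uncommitted events */
    size_t n;                  /* number of entries in ts[] */
    double lvt;                /* local virtual time */
    long   state;              /* toy state: count of events applied */
} LP;

/* Process one event optimistically. */
static void lp_process(LP *lp, double ts)
{
    assert(lp->n < MAX_PROCESSED && ts >= lp->lvt);
    lp->state++;               /* forward computation */
    lp->ts[lp->n++] = ts;
    lp->lvt = ts;
}

/* Local control: undo every processed event later than the straggler.
   Returns the number of events rolled back. */
static size_t lp_rollback(LP *lp, double straggler_ts)
{
    size_t undone = 0;
    while (lp->n > 0 && lp->ts[lp->n - 1] > straggler_ts) {
        lp->state--;           /* undo the state change */
        lp->n--;
        undone++;
    }
    lp->lvt = (lp->n > 0) ? lp->ts[lp->n - 1] : 0.0;
    return undone;
}

/* Global control: GVT as the minimum over each LP's earliest unprocessed
   event and the earliest message still in transit. */
static double compute_gvt(const double *min_unprocessed, size_t count,
                          double min_in_transit)
{
    double gvt = min_in_transit;
    size_t i;
    for (i = 0; i < count; i++)
        if (min_unprocessed[i] < gvt) gvt = min_unprocessed[i];
    return gvt;
}

/* Commit (fossil collect) history entries older than GVT.
   Returns the number of events committed. */
static size_t lp_fossil_collect(LP *lp, double gvt)
{
    size_t k = 0, i;
    while (k < lp->n && lp->ts[k] < gvt) k++;
    for (i = k; i < lp->n; i++) lp->ts[i - k] = lp->ts[i];
    lp->n -= k;
    return k;
}
```

A real Time Warp kernel would also cancel the messages sent by the undone events; that step is omitted here to keep the sketch short.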
5Challenge Efficient Implementation?
- Advantages
- automatically finds available parallelism
- makes development easier
- outperforms conservative schemes by a factor of N
- Disadvantages
- Large memory requirements to support rollback operation
- State-saving incurs high overheads for fine-grain event computations
- Time Warp is out of the performance envelope for many applications
Our Solution: Reverse Computation
6Outline...
- Reverse Computation
- Example ATM Multiplexor
- Beneficial Application Properties
- Rules for Automation
- Reversible Random Number Generator
- Experimental Results
- Conclusions
- Future Work
7Our Solution Reverse Computation...
- Use Reverse Computation (RC)
- automatically generate reverse code from model source
- undo by executing reverse code
- Delivers better performance
- negligible overhead for forward computation
- significantly lower memory utilization
8Example ATM Multiplexor
Original (on cell arrival...):
if (qlen < B) { qlen++; delays[qlen]++; } else { lost++; }
[Figure: multiplexor diagram with labels N and B]
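A minimal sketch of how the cell-arrival handler might be split into a forward and a reverse version. It assumes a single control bit, here called b1, records which branch was taken; the `MuxState` struct, the function names, and the small buffer size are illustrative assumptions. The forward handler does the original work plus one bit assignment, and the reverse handler undoes the same statements in the opposite order.

```c
#include <assert.h>

enum { B = 4 };  /* buffer capacity; small illustrative value (assumption) */

/* Model state for one multiplexor. */
typedef struct {
    int qlen;           /* current queue length */
    int lost;           /* cells dropped because the buffer was full */
    int delays[B + 1];  /* delays[q]: arrivals that brought the queue to length q */
} MuxState;

/* Forward handler: original code plus one bit recording the branch taken.
   The returned bit is the only state kept for rollback. */
static int mux_forward(MuxState *s)
{
    int b1;
    if (s->qlen < B) {
        b1 = 1;
        s->qlen++;
        s->delays[s->qlen]++;
    } else {
        b1 = 0;
        s->lost++;
    }
    return b1;
}

/* Reverse handler: undo the forward statements in the opposite order. */
static void mux_reverse(MuxState *s, int b1)
{
    assert(b1 == 0 || b1 == 1);
    if (b1) {
        s->delays[s->qlen]--;
        s->qlen--;
    } else {
        s->lost--;
    }
}
```

Calling `mux_reverse` with the saved bits in last-in, first-out order restores the state exactly, without copying `qlen`, `lost`, or the `delays` array.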
9Gains.
- State size reduction
- from B+2 words to 1 word
- e.g., B = 100 => 100x reduction!
- Negligible overhead in forward computation
- removed from forward computation
- moved to rollback phase
- Result
- significant increase in speed
- significant decrease in memory
- How?...
10Beneficial Application Properties
- 1. Majority of operations are constructive
- e.g., ++, --, etc.
- 2. Size of control state < size of data state
- e.g., size of b1 < size of qlen, sent, lost, etc.
- 3. Perfectly reversible high-level operations
- gleaned from irreversible smaller operations
- e.g., random number generation
11Rules for Automation...
Generation rules, and upper-bounds on bit
requirements for various statement types
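As one illustration of what such a rule could look like for a loop, the sketch below records only the iteration count in the forward pass and replays the inverse of the body that many times in the reverse pass. The `Server` model and function names are hypothetical and are not taken from the rule table; the point is that the loop needs a counter rather than a copy of the data it modifies.

```c
#include <assert.h>

/* Hypothetical model state (assumption for this example). */
typedef struct {
    int budget;  /* remaining service budget */
    int served;  /* number of jobs served */
} Server;

/* Forward: run the loop and record how many iterations executed.
   The body uses only constructive operations (-=, ++). */
static unsigned serve_forward(Server *s, int cost)
{
    unsigned iters = 0;
    assert(cost > 0);
    while (s->budget >= cost) {
        s->budget -= cost;
        s->served++;
        iters++;
    }
    return iters;  /* the only state saved for rollback */
}

/* Reverse: apply the inverse of the body, in opposite statement order,
   exactly iters times. */
static void serve_reverse(Server *s, int cost, unsigned iters)
{
    while (iters-- > 0) {
        s->served--;
        s->budget += cost;
    }
}
```

The number of bits needed for the counter grows with the logarithm of the maximum iteration count, which is typically far smaller than the data touched inside the loop.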
12Destructive Assignment...
- Destructive assignment (DA)
- examples x y x y
- requires all modified bytes to be saved
- Caveat
- reversing technique for DAs can degenerate to traditional incremental state saving
- Good news
- certain collections of DAs are perfectly reversible!
- queueing network models contain collections of easily/perfectly reversible DAs
- queue handling (swap, shift, tree insert/delete, ...)
- statistics collection (increment, decrement, ...)
- random number generation (reversible RNGs)
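The contrast can be sketched as follows. A lone destructive assignment has to save the overwritten value, whereas a swap or a circular shift is built from destructive assignments yet is perfectly reversible as a whole with no saved state. All function names here are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* A lone destructive assignment: the old value must be saved to undo it. */
static int assign_forward(int *x, int y)
{
    int saved = *x;
    *x = y;
    return saved;
}

static void assign_reverse(int *x, int saved)
{
    *x = saved;
}

/* Swap: three destructive assignments, but the operation is its own inverse. */
static void swap_ints(int *a, int *b)
{
    int t = *a;
    *a = *b;
    *b = t;
}

/* Circular shift left by one position. */
static void rotate_left(int *v, size_t n)
{
    size_t i;
    int first;
    assert(n > 0);
    first = v[0];
    for (i = 0; i + 1 < n; i++) v[i] = v[i + 1];
    v[n - 1] = first;
}

/* Circular shift right by one position: the exact inverse of rotate_left. */
static void rotate_right(int *v, size_t n)
{
    size_t i;
    int last;
    assert(n > 0);
    last = v[n - 1];
    for (i = n - 1; i > 0; i--) v[i] = v[i - 1];
    v[0] = last;
}
```

Rolling back a swap means calling `swap_ints` again, and rolling back `rotate_left` means calling `rotate_right`; neither needs a per-event copy of the data.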
13Reversing an RNG?
double RNGGenVal(Generator g)
{
    long k, s;
    double u;
    u = 0.0;

    s = Cg[0][g];
    k = s / 46693;
    s = 45991 * (s - k * 46693) - k * 25884;
    if (s < 0) s = s + 2147483647;
    Cg[0][g] = s;
    u = u + 4.65661287524579692e-10 * s;

    s = Cg[1][g];
    k = s / 10339;
    s = 207707 * (s - k * 10339) - k * 870;
    if (s < 0) s = s + 2147483543;
    Cg[1][g] = s;
    u = u - 4.65661310075985993e-10 * s;
    if (u < 0) u = u + 1.0;

    s = Cg[2][g];
    k = s / 15499;
    s = 138556 * (s - k * 15499) - k * 3979;
    if (s < 0.0) s = s + 2147483423;
    Cg[2][g] = s;
    u = u + 4.65661336096842131e-10 * s;
    if (u >= 1.0) u = u - 1.0;

    s = Cg[3][g];
    k = s / 43218;
    s = 49689 * (s - k * 43218) - k * 24121;
    if (s < 0) s = s + 2147483323;
    Cg[3][g] = s;
    u = u - 4.65661357780891134e-10 * s;
    if (u < 0) u = u + 1.0;

    return (u);
}
Observation: k = s / 46693 is a Destructive Assignment.
Result: RC degrades to classic state-saving. Can we do better?
14RNGs A Higher Level View
The previous RNG is based on the following recurrence:
$x_{i,n} = a_i \, x_{i,n-1} \bmod m_i$
where $x_{i,n}$ is one of the four seed values in the $n$th set, $m_i$ is one of four large primes less than $2^{31}$, and $a_i$ is a primitive root of $m_i$.
Now, the above recurrence is in fact reversible. The inverse of $a_i$ modulo $m_i$ is defined as
$b_i = a_i^{m_i - 2} \bmod m_i$
Using $b_i$, we can generate the reverse recurrence as follows:
$x_{i,n-1} = b_i \, x_{i,n} \bmod m_i$
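A minimal sketch of the forward and reverse recurrence for a single component generator. The constants are read off the first stage of the RNG code on the previous slide (multiplier 45991, modulus 2147483647, since 45991 * 46693 + 25884 = 2147483647); the function and macro names are assumptions for illustration. The inverse multiplier is computed with modular exponentiation, which works because the modulus is prime.

```c
#include <assert.h>
#include <stdint.h>

#define RNG_M 2147483647ULL  /* modulus m of the first component (2^31 - 1) */
#define RNG_A 45991ULL       /* multiplier a of the first component */

/* base^exp mod m by square-and-multiply. All operands stay below 2^31,
   so every product fits in 64 bits. */
static uint64_t pow_mod(uint64_t base, uint64_t exp, uint64_t m)
{
    uint64_t result = 1;
    base %= m;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

/* b = a^(m-2) mod m, the multiplicative inverse of a modulo the prime m. */
static uint64_t rng_inverse_multiplier(void)
{
    return pow_mod(RNG_A, RNG_M - 2, RNG_M);
}

/* Forward step: x_n = a * x_{n-1} mod m. */
static uint64_t rng_forward(uint64_t x)
{
    assert(x > 0 && x < RNG_M);
    return (RNG_A * x) % RNG_M;
}

/* Reverse step: x_{n-1} = b * x_n mod m. */
static uint64_t rng_reverse(uint64_t x, uint64_t b)
{
    assert(x > 0 && x < RNG_M);
    return (b * x) % RNG_M;
}
```

Because b can be computed once at start-up, undoing a draw costs one multiplication and one modulo, and no seed values need to be saved per event.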
15Reverse Code Efficiency...
- Future RNGs may result in even greater savings.
- Consider the MT19937 Generator...
- Has a period of $2^{19937} - 1$
- Uses 2496 bytes for a single generator
- Property...
- Non-reversibility of individual steps does NOT imply that the computation as a whole is not reversible.
- Can we automatically find this higher-level reversibility?
- Other Reversible Structures Include...
- Circular shift operation
- Insertion and deletion operations on trees (i.e., priority queues).
Reverse computation is well-suited for queuing network models!
16Performance Study
17Why the large increase in parallel performance?
[Figure: performance plot; axis label: million events/second]
18Cache Performance...
Faults     TLB         P cache      S cache
SS 12pe    43966018    1283032615   162449694
RC 12pe    11595326    590555715    94771426
19Related Work...
- Reverse computation used in
- low power processors, debugging, garbage collection, database recovery, reliability, etc.
- All previous work either
- prohibits irreversible constructs, or
- uses copy-on-write implementation for every modification (corresponds to incremental state saving)
- Many operate at coarse, virtual page-level
20Contributions
- We identify that
- RC makes Time Warp usable for fine-grain models!
- disproves the previous belief that fine-grain models can't be optimistically simulated efficiently
- less memory consumption, more speed, without extra user effort
- RC generalizes state saving
- e.g., incremental state saving, copy state saving
- For certain data types, RC is more memory efficient than SS
- e.g., priority queues
21Future Work
- Develop state minimization algorithms, by
- State compression: bit size for reversibility < bit size of data variables
- State reuse: same state bits for different statements
- based on liveness, analogous to register allocation
- Complete RC automation algorithm design: avoiding the straightforward incremental state saving approach
- Lossy integer and floating point arithmetic
- Jump statements
- Recursive functions
22Geronimo! System Architecture
[Figure: system architecture; labels: High Performance Simulation Application, Geronimo, distributed compute server, rack-mounted CPUs (not in demonstration), multiprocessor]
Geronimo Features: (1) risky or speculative processing of object computations, (2) reverse computation to support undo operation, (3) Active Code in a combination, heterogeneous, shared-memory, message passing environment...
23Geronimo! Risky Processing...
- Execution Framework
- Objects
- schedule Threads / Tasks
- at some virtual time
- Applications
- discrete-event simulations
- scientific computing applications
CAVEAT: Good performance relies on (cost of recovery × probability of failure) being less than the cost of being safe!
[Figure legend: processed thread, straggler thread, unprocessed thread]
24Geronimo! Efficient Undo
- Traditional approach: State Saving
- save byte-copies of modified items
- high overhead for fine-granularity computations
- memory utilization is large
- need alternative for large-scale, fine-grain simulations
- Our approach: Reverse Computation
- automatically generate reverse code from model source
- utilize reverse code to do rollback
- negligible overhead for forward computation
- significantly lower memory utilization
- joint with Kalyan Perumalla and Richard Fujimoto
Observation: reverse computation treats code as state. This results in a code-state duality. Can we generalize this notion?
25Geronimo! Active Code
- Key idea: allow object methods/code to be dynamically changed during run-time.
- objects can schedule in the future a new method or re-define old methods of other objects and themselves.
- objects can erase/delete methods on themselves or other objects.
- new methods can contain Active Code which can re-specialize itself or other objects.
- work in a heterogeneous environment.
- How is this useful?
- increase performance by allowing the program to consistently execute the common case fast.
- adaptive, perturbation-free monitoring of distributed systems.
- potential for increasing a language's expressive power.
- Our approach?
- Java: no, need higher performance; may be used in the future...
- special compiler: no, can't keep up with changes to microprocessors.
26Geronimo! Active Code Implementation
- Runtime infrastructure
- modifies source code tree
- starts a rebuild of the executable on another existing machine
- uses a system's native compiler
- Re-exec system call
- reloads only the new text or code segment of the new executable
- fixes up old stack to reflect new code changes
- fixes up pointers to functions
- will run in user-space for portability across platforms
- Language preprocessor
- instruments code to support stack and function pointer fix-up
- instruments code to support stack reconstruction and re-start process
27Research Issues
- Software architecture for the heterogeneous, shared-memory, message passing environment.
- Development of distributed algorithms that are fully optimized for this combination environment.
- What language to use for development, C or C++ or both?
- Geronimo! API.
- Active Code Language and Systems Support.
- Mapping relevant application types to this framework
Homework Problem: Can you find specific applications/problems where we can apply Geronimo!?