Title: Techniques for Measuring and Modeling Federation Performance
1. Techniques for Measuring and Modeling Federation Performance
Steven B. Boswell, Ph.D., and Duncan C. Miller, Sc.D.
01E-SIW-063
This work was conducted at MIT Lincoln Laboratory with the sponsorship of DMSO, under Air Force Contract F-19628-00-C-0002.
2001 European Simulation Interoperability Workshop
2. Background
- Modeling the performance of large distributed systems is a long-standing problem
- Symptoms of system overloads often appear at points far from the sources of these overloads
- Many SIMNET/DIS applications experience significant overload problems in large exercises
- Typical solutions involve incorporating specific geographical knowledge into the network
- The HLA rejected this approach to avoid application-specific infrastructure modifications
- Federation developers need tools and techniques for predicting and avoiding overload problems
3. Caveats and Clarifications
- These techniques cannot predict a priori the performance of a Federation whose component behaviors are unknown
- This is an iterative process
- Start with whatever performance data exist from this or a similar Federation
- Use logger data collected during Federation integration
- Instrument the Federates that appear to be the primary factors in determining Federation performance
- Incorporate any specific knowledge gained about Federation behavior
- Use existing tools and data wherever feasible
- Note that these techniques are not specific to HLA Federations
4. Outline
- Transaction-based model of Federation performance (see 01S-SIW-070)
- Transaction categories and processing loads
- Simulation-to-real time ratio as a Figure of Merit
- Linearized model of changes in Federation performance
- Tools and techniques for measuring transaction loads on Federates
- Perturbing transaction rates by recording and playback
- Inferring loads through statistical regression
- Case study of a hypothetical Federation
5. Transaction-based Model of Federation Performance
- Each Federate is responsible for processing a set of transactions (object attribute updates, interactions, etc.) generated by other Federates, as well as its own (internal) processing load.
- Each Federate's processing of these transactions results in the generation of other transactions, which require processing by other Federates.
- Transactions can be characterized in terms of average frequency, size, direct processing required, internal processing generated, etc.
- Transactions can be traced to determine their initiators and responders.
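As a minimal sketch of the characterization step, assuming a logged trace in which every transaction carries a simulated-time stamp, a category label, its initiating and responding Federates, and a size (the `Transaction` record layout and the `characterize` helper are hypothetical, not any logger's API), average frequency and average size per category can be computed as:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """One logged transaction (hypothetical record layout)."""
    t_sim: float    # simulated time at which it was sent, seconds
    category: str   # transaction category, e.g. "update" or "interaction"
    source: int     # initiating Federate
    dest: int       # responding (receiving) Federate
    size_bits: int  # message size in bits


def characterize(log, duration_sim):
    """Return average frequency (per second of simulated time) and average
    size (bits) for each category seen in `log`, which spans `duration_sim`
    seconds of simulated time."""
    counts = defaultdict(int)
    total_bits = defaultdict(int)
    for tx in log:
        counts[tx.category] += 1
        total_bits[tx.category] += tx.size_bits
    return {
        cat: {"rate": counts[cat] / duration_sim,
              "avg_size": total_bits[cat] / counts[cat]}
        for cat in counts
    }


log = [
    Transaction(0.0, "update", 1, 6, 800),
    Transaction(1.0, "update", 2, 6, 1200),
    Transaction(2.0, "interaction", 3, 6, 400),
]
stats = characterize(log, duration_sim=10.0)
```

Direct processing required and internal processing generated are not visible in a traffic log; they have to be measured on the Federates themselves, as the later slides describe.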
6. Transaction Categories and Processing Loads
- Transactions with similar statistical characteristics (average frequency, size, processing, etc.) can be aggregated into categories for modeling purposes.
- The incremental processing load (as a fraction of total processing capacity) on Federate j resulting from one additional transaction of category X per second of real time can be represented as LXj.
- A linear approximation of the processing load (as a fraction of total processing capacity) on Federate j is its baseline load L0j plus this incremental load:

$$L_j \approx L_{0j} + L_{Xj}$$
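Because the approximation is linear, it extends directly to a change of Δr_X category-X transactions per second of real time. A worked example, using hypothetical values chosen only for illustration (L0j = 0.30 and LXj = 0.002 per transaction per second, with Δr_X = 100):

$$L_j \approx L_{0j} + \Delta r_X \, L_{Xj} = 0.30 + 100 \times 0.002 = 0.50$$

Under these assumed values, 100 additional category-X transactions per second would raise Federate j from 30% to about 50% of its processing capacity.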
7. Time Ratio as a Figure of Merit
- For any Federation, the average ratio of simulated time to real time (Rsr) at which the Federation can operate is limited by the effective processing capacity of its most congested Federate
- For a real-time Federation, the achievable Rsr must exceed 1 (1.5 provides a better margin)
- For a time-managed Federation, Rsr is a figure of merit for predicting the possible number of runs per unit time
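One way to make the "most congested Federate" limit concrete, assuming the linearized load model of slide 8 and assuming that the baseline load L0j does not itself scale with Rsr: every Federate must satisfy Lj ≤ 1, which bounds Rsr.

$$L_{0j} + R_{sr} \sum_X \sum_i T_{Xij} \, L_{Xj} \le 1 \quad \Longrightarrow \quad R_{sr} \le \frac{1 - L_{0j}}{\sum_X \sum_i T_{Xij} \, L_{Xj}}$$

$$R_{sr}^{\max} = \min_j \frac{1 - L_{0j}}{\sum_X \sum_i T_{Xij} \, L_{Xj}}$$

The Federate j that attains the minimum is the most congested one. A Federate may degrade before it actually reaches Lj = 1, so this bound is best read as optimistic.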
8. Modeling Changes in Processing Loads
- Under steady-state operating conditions, a linearized approximation of the total processing load on each Federate j from all sources can be represented as:

$$L_j \approx L_{0j} + \sum_X \sum_i R_{sr} \, T_{Xij} \, L_{Xj}$$

Units:
- Lj = total fraction of processing capacity used
- L0j = baseline fraction of processing capacity used
- Rsr = ratio of simulated time to real time
- TXij = transactions of category X from Federate i to Federate j per second of simulated time
- LXj = incremental load on Federate j of one additional category-X transaction per second of real time
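The linearized model can be evaluated directly once L0j, TXij, and LXj have been estimated. This sketch stores them as nested lists (an assumed layout, with `TX[x][i][j]` the category-x rate from Federate i to Federate j and `LX[x][j]` the incremental load on j) and also computes the largest Rsr that keeps every Federate at or below full load. The function names are hypothetical:

```python
def federate_loads(L0, TX, LX, r_sr):
    """Linearized load on each Federate j:
    L0[j] + sum over x, i of r_sr * TX[x][i][j] * LX[x][j]."""
    n = len(L0)
    cats = range(len(TX))
    return [
        L0[j] + sum(r_sr * TX[x][i][j] * LX[x][j] for x in cats for i in range(n))
        for j in range(n)
    ]


def max_time_ratio(L0, TX, LX):
    """Largest r_sr keeping every Federate's linearized load at or below 1.
    Returns (ratio, index of the limiting Federate)."""
    n = len(L0)
    cats = range(len(TX))
    best, limiting = float("inf"), None
    for j in range(n):
        # Load added on Federate j per unit of r_sr.
        per_unit = sum(TX[x][i][j] * LX[x][j] for x in cats for i in range(n))
        if per_unit > 0 and (1.0 - L0[j]) / per_unit < best:
            best, limiting = (1.0 - L0[j]) / per_unit, j
    return best, limiting


# Two Federates, one category (hypothetical numbers):
L0 = [0.2, 0.1]
TX = [[[0.0, 50.0],    # from Federate 0: 50 per simulated second to Federate 1
       [10.0, 0.0]]]   # from Federate 1: 10 per simulated second to Federate 0
LX = [[0.01, 0.004]]
loads = federate_loads(L0, TX, LX, r_sr=1.0)  # about [0.3, 0.3]
ratio, j_max = max_time_ratio(L0, TX, LX)    # about 4.5, limited by Federate 1
```

Nested lists keep the sketch dependency-free; for a large Federation a sparse representation of TXij would be more practical.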
9. Identifying Bandwidth Limitations
- Having determined the Transaction Matrix TXij, we can compute the total incoming and outgoing bandwidth for each Federate:

$$B_j^{(\mathrm{in})} = \sum_X \sum_i R_{sr} \, T_{Xij} \, S_X \qquad B_j^{(\mathrm{out})} = \sum_X \sum_i R_{sr} \, T_{Xji} \, S_X \quad \text{(note transpose)}$$

where SX is the average size of a transaction of category X (in bits). So:

$$B_j^{(\mathrm{in})} + B_j^{(\mathrm{out})} < C_j$$

where Cj is the total capacity of the tail circuit to j.
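The bandwidth check can be sketched the same way, reusing the assumed `TX[x][i][j]` layout; outgoing bandwidth simply sums the transposed entries. The capacity test reads the tail-circuit condition as incoming plus outgoing traffic below Cj, which is an assumption appropriate to a shared circuit. Function names are hypothetical:

```python
def bandwidths(TX, S, r_sr):
    """Incoming and outgoing bandwidth (bits per second of real time) for each
    Federate. TX[x][i][j]: category-x transactions per second of simulated time
    from i to j; S[x]: average size of a category-x transaction in bits."""
    n = len(TX[0])
    cats = range(len(TX))
    b_in = [sum(r_sr * TX[x][i][j] * S[x] for x in cats for i in range(n))
            for j in range(n)]
    b_out = [sum(r_sr * TX[x][j][i] * S[x] for x in cats for i in range(n))  # transpose
             for j in range(n)]
    return b_in, b_out


def overloaded_circuits(TX, S, r_sr, C):
    """Federates whose combined in + out traffic reaches the tail-circuit
    capacity C[j] in bits per second (assumes a shared circuit)."""
    b_in, b_out = bandwidths(TX, S, r_sr)
    return [j for j in range(len(C)) if b_in[j] + b_out[j] >= C[j]]


TX = [[[0.0, 100.0], [20.0, 0.0]]]  # hypothetical rates, one category
S = [1000]                           # hypothetical 1000-bit transactions
b_in, b_out = bandwidths(TX, S, r_sr=1.0)
flagged = overloaded_circuits(TX, S, 1.0, C=[64000, 1544000])
```

Here Federate 0 receives 20,000 bit/s and sends 100,000 bit/s, so its hypothetical 64,000 bit/s circuit is flagged while Federate 1's is not.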
10. Transaction Tagging and Tracing
[Figure: Federates 1 to 6, with transactions arriving at Federate 6 traced to Federates 1, 2, and 3; start sample 133000, end sample 134000]
Ten-minute sample of transactions arriving at Federate 6 during steady-state operation
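The tracing shown in the figure can be sketched as follows, assuming each logged transaction carries a tag identifying its originating Federate; the tuple layout and function names are hypothetical. Counting arrivals at one destination over a ten-minute (600 s) window, broken down by category and originator, gives one column of the transaction matrix; if the timestamps are simulated time, dividing by the window length yields rates per second of simulated time, as TXij requires:

```python
from collections import Counter


def trace_arrivals(records, dest, t_start, t_end):
    """Count transactions arriving at Federate `dest` during [t_start, t_end),
    keyed by (category, originating Federate). Each record is a
    (time, category, origin_tag, dest) tuple (hypothetical layout)."""
    counts = Counter()
    for t, category, origin, d in records:
        if d == dest and t_start <= t < t_end:
            counts[(category, origin)] += 1
    return counts


def to_rates(counts, window):
    """Convert counts to rates per second of the window's time base."""
    return {key: n / window for key, n in counts.items()}


records = [
    (0.0, "update", 1, 6), (1.5, "update", 2, 6), (2.0, "update", 1, 6),
    (3.0, "fire", 3, 6), (4.0, "update", 1, 5), (700.0, "update", 1, 6),
]
counts = trace_arrivals(records, dest=6, t_start=0.0, t_end=600.0)
rates = to_rates(counts, window=600.0)
```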
11. Estimating Incremental Processing Loads (Step 1)
[Figure: Federates 1 to 5 sending transactions to Federate j]
Capture a time-ordered sample of transactions arriving at Federate j during nominal Federation operation
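A sketch of the capture step, assuming each sending Federate's stream is already in time order and that records are plain dictionaries with a time field `t` (a hypothetical layout). The standard-library `heapq.merge` interleaves the streams into one time-ordered sample, which is then stored as JSON lines so it can be replayed later; the helper names are hypothetical:

```python
import heapq
import json


def capture_sample(streams, t_start, t_end):
    """Merge per-sender streams (each already sorted by its "t" field) into one
    time-ordered sample covering [t_start, t_end)."""
    merged = heapq.merge(*streams, key=lambda rec: rec["t"])
    return [rec for rec in merged if t_start <= rec["t"] < t_end]


def to_jsonl(records):
    """Serialize records one JSON object per line, for storage and replay."""
    return "".join(json.dumps(rec, sort_keys=True) + "\n" for rec in records)


def from_jsonl(text):
    """Inverse of to_jsonl."""
    return [json.loads(line) for line in text.splitlines() if line]


from_fed1 = [{"t": 1.0, "src": 1}, {"t": 5.0, "src": 1}]
from_fed2 = [{"t": 2.0, "src": 2}, {"t": 9.0, "src": 2}]
sample = capture_sample([from_fed1, from_fed2], t_start=0.0, t_end=6.0)
```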
12. Estimating Incremental Processing Loads (Step 2)
[Figure: Federate j isolated from Federates 1 to 5, with some transactions removed (labels: "removed", "Added transactions?")]
- Isolate Federate j
- Remove a known fraction of type X transactions
- Measure the change in processing load on j
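Perturbation by subtraction can be sketched as a playback filter that drops a known fraction of one category, followed by a finite-difference estimate of LXj. The dictionary record layout and function names are hypothetical; keeping every k-th category-X transaction removes a fraction 1 - 1/k of them:

```python
def remove_fraction(transactions, category, keep_every):
    """Playback filter: keep only every `keep_every`-th transaction of
    `category` (removing a fraction 1 - 1/keep_every of them) and pass all
    other transactions through unchanged, preserving order."""
    kept, seen = [], 0
    for tx in transactions:
        if tx["category"] == category:
            seen += 1
            if seen % keep_every != 0:
                continue  # drop this one
        kept.append(tx)
    return kept


def incremental_load(load_full, load_reduced, rate_removed):
    """Finite-difference estimate of LXj: change in measured load fraction per
    transaction per second of real time removed."""
    return (load_full - load_reduced) / rate_removed


log = [{"category": c} for c in ["update", "update", "fire", "update", "update"]]
filtered = remove_fraction(log, "update", keep_every=2)   # 2 updates + 1 fire remain
lx_est = incremental_load(0.50, 0.40, rate_removed=50.0)  # about 0.002
```

For example, if Federate j's measured load falls from 0.50 to 0.40 when 50 category-X transactions per second are removed, the estimate is LXj ≈ 0.002 per transaction per second (hypothetical numbers).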
13. Estimating Incremental Processing Loads (Step 3)
[Figure: replayed transactions driving Federate j, with system statistics collected]
- Use system statistics where available (e.g., proctool for Solaris): cheap and accurate, but OS-specific
- Use a cycle sponge (or a similar approach) to infer the effects of changes in transaction rates
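A minimal cycle-sponge sketch, assuming a single processor and that the sponge process is started at the lowest scheduling priority (for example with `os.nice(19)` on Unix) so that it only consumes cycles the Federate leaves idle. The function names are hypothetical. Calibrate on an otherwise idle host, then rerun while the Federate processes replayed transactions:

```python
import time


def sponge_iterations(duration_s):
    """Busy-loop for duration_s seconds of wall-clock time and return how many
    iterations completed. Run at the lowest priority, this measures the CPU
    time that other processes left unused."""
    deadline = time.perf_counter() + duration_s
    n = 0
    while time.perf_counter() < deadline:
        n += 1
    return n


def inferred_load(iters_during_run, iters_idle):
    """Fraction of capacity consumed by other work, inferred from how much less
    the sponge accomplished than during an idle calibration run."""
    return max(0.0, 1.0 - iters_during_run / iters_idle)


idle = sponge_iterations(0.02)  # calibration on an otherwise idle host
# ... start replaying transactions to Federate j, then:
busy = sponge_iterations(0.02)
load = inferred_load(busy, idle)
```

On a multiprocessor host one sponge per processor would be needed, and a Python busy loop is a coarse instrument; a compiled sponge would likely give steadier counts.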
14. Techniques for Varying Transaction Rates
- Straightforward if scenario and message semantics are insensitive to transaction details; can be difficult if interactions are complex
- Logging transactions and filtering on playback allows perturbation by subtraction
- Proxy Federates allow perturbation by addition
- (In either case, avoid semantic faults and scenario inconsistencies that may cause anomalous Federation behavior)
- Run real or proxy Federates with controlled settings
- Use regression techniques that exploit known rate variations within logged data
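The regression technique can be sketched as ordinary least squares over logged intervals: regress each interval's measured load on the per-category transaction rates observed in that interval, so that the intercept estimates L0j and the coefficients estimate LXj. This assumes the linear model holds and that the rates vary enough, and independently enough, across intervals. The helper names are hypothetical, and the solver is a small normal-equations routine so the sketch stays dependency-free:

```python
def solve_least_squares(rows, y):
    """Solve min ||A b - y||^2 through the normal equations (A^T A) b = A^T y,
    using Gauss-Jordan elimination with partial pivoting."""
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    m = [[sum(r[p] * r[q] for r in rows) for q in range(k)]
         + [sum(r[p] * yi for r, yi in zip(rows, y))] for p in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda row: abs(m[row][c]))
        m[c], m[piv] = m[piv], m[c]
        for row in range(k):
            if row != c:
                f = m[row][c] / m[c][c]
                m[row] = [a - f * b for a, b in zip(m[row], m[c])]
    return [m[p][k] / m[p][p] for p in range(k)]


def fit_incremental_loads(rates, loads):
    """rates[t][x]: real-time rate of category x in logged interval t;
    loads[t]: measured load fraction of Federate j in that interval.
    Returns (estimated L0j, [estimated LXj for each category])."""
    rows = [[1.0] + list(r) for r in rates]  # leading 1 gives the intercept
    coef = solve_least_squares(rows, loads)
    return coef[0], coef[1:]
```

With noisy measurements, many more intervals than categories are needed, and a numerical library's least-squares routine would be preferable to the normal equations for ill-conditioned data.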
15. Battlefield Communication Network (BCN)
Tactical Engagement Simulation Radio Propagation Model:
- Giovanelli multiple knife-edge diffraction
- Lee's approximation to the Fresnel integral
- Reflection loss g(θ, ε, σ, terrain profile)
- Propagation loss: R^-4 dependence (low antennas)
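For readers unfamiliar with two of the listed components, the following sketch gives the textbook forms of Lee's piecewise approximation to the knife-edge diffraction gain and of the plane-earth (R^-4) path loss for low antennas. It is not the BCN implementation: Giovanelli's multiple-knife-edge construction and the reflection-loss term g(θ, ε, σ, terrain profile) are omitted, and the function names are hypothetical:

```python
import math


def fresnel_v(h, d1, d2, wavelength):
    """Fresnel-Kirchhoff diffraction parameter for a single knife edge that
    rises h meters above the line of sight, d1 and d2 meters from the two
    antennas."""
    return h * math.sqrt(2.0 * (d1 + d2) / (wavelength * d1 * d2))


def knife_edge_gain_db(v):
    """Lee's piecewise approximation to the knife-edge diffraction gain Gd(v),
    in dB."""
    if v <= -1.0:
        return 0.0
    if v <= 0.0:
        return 20.0 * math.log10(0.5 - 0.62 * v)
    if v <= 1.0:
        return 20.0 * math.log10(0.5 * math.exp(-0.95 * v))
    if v <= 2.4:
        return 20.0 * math.log10(0.4 - math.sqrt(0.1184 - (0.38 - 0.1 * v) ** 2))
    return 20.0 * math.log10(0.225 / v)


def plane_earth_loss_db(d, h_t, h_r):
    """Two-ray (plane-earth) path loss for low antennas, in dB: received power
    falls off as d**-4 and, in this approximation, does not depend on
    frequency. d, h_t, h_r in meters; valid only when d is much larger than
    the antenna heights."""
    return 40.0 * math.log10(d) - 20.0 * math.log10(h_t) - 20.0 * math.log10(h_r)
```

For example, a 1 km path between antennas at 10 m and 1 m gives a plane-earth loss of 40·3 - 20·1 - 0 = 100 dB.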
16. Communications Server Federate Calculates Effective Connectivity
ModSAF (modified to discard unreceived messages)
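A sketch of how effective connectivity might be computed and applied, assuming a matrix of total path losses (in dB) from each radio to each other radio and a single hypothetical maximum allowable loss. Both the threshold model and the function names are illustrative assumptions, not the BCN design:

```python
def connectivity(path_loss_db, max_loss_db):
    """connected[i][j] is True when the total path loss from radio i to
    radio j is within a single (hypothetical) allowable-loss threshold."""
    n = len(path_loss_db)
    return [[i != j and path_loss_db[i][j] <= max_loss_db for j in range(n)]
            for i in range(n)]


def deliver(messages, connected):
    """Keep only (sender, receiver, payload) messages over connected links;
    the rest are discarded, as an unreceived message would be."""
    return [m for m in messages if connected[m[0]][m[1]]]


loss = [[0.0, 120.0],
        [150.0, 0.0]]  # hypothetical losses in dB
conn = connectivity(loss, max_loss_db=130.0)
received = deliver([(0, 1, "report"), (1, 0, "ack")], conn)
```

Computing connectivity centrally and letting each receiver filter keeps the propagation model in one Federate, which is why the receiving simulations need only the small modification noted above.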
17. Case Study Federation Design: Scaling BCN to Theater Level (see paper)
- How will BCN interactions change the workloads of ModSAF Federates?
- How many entities can ModSAF support?
[Figure: four ModSAF Federates, each connected to the RTI]
- How many BCN Federates are required for 20,000 ModSAF entities with 4,000 SINCGARS radios?
- Will more efficient BCN algorithms be required?
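A rough sense of why algorithm efficiency matters at this scale, under the assumption (not stated on the slide) that the communications server evaluates connectivity for every pair of radios: the number of radio pairs grows quadratically with the number of radios.

$$\binom{4000}{2} = \frac{4000 \times 3999}{2} = 7{,}998{,}000 \qquad \binom{1000}{2} = \frac{1000 \times 999}{2} = 499{,}500$$

Under that assumption, quadrupling the radio count from 1,000 to 4,000 multiplies the pair count by about 16.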
18. Baseline BCN Federate
(Loosely coupled: minimal change in loads on ModSAF Federates)
- Use the current BCN Federate as a performance baseline while the new version is being developed
- Log and replay typical transactions, with perturbations
- Measure changes in load for different transaction rates
[Figure: replayed transactions drive the existing BCN Federate; transaction counts and CPU load are measured]
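Once the replayed transactions have been run at several controlled rates, the measured loads can be fitted with a straight line. This sketch (hypothetical function names and data) estimates LXj as the slope and also reports the largest residual, since a growing deviation from the line suggests the linear approximation is breaking down, for instance as the Federate nears saturation:

```python
def slope_fit(rates, loads):
    """Ordinary least-squares line load = intercept + slope * rate.
    The slope estimates LXj; the intercept estimates the load with no
    replayed category-X traffic."""
    n = len(rates)
    mean_r = sum(rates) / n
    mean_l = sum(loads) / n
    sxx = sum((r - mean_r) ** 2 for r in rates)
    sxy = sum((r - mean_r) * (l - mean_l) for r, l in zip(rates, loads))
    slope = sxy / sxx
    return slope, mean_l - slope * mean_r


def max_abs_residual(rates, loads, slope, intercept):
    """Largest deviation of a measurement from the fitted line."""
    return max(abs(l - (intercept + slope * r)) for r, l in zip(rates, loads))


rates = [0.0, 50.0, 100.0, 150.0]  # replayed category-X rates (per second)
loads = [0.20, 0.30, 0.40, 0.50]   # hypothetical CPU load measurements
slope, intercept = slope_fit(rates, loads)  # about 0.002 and 0.2
worst = max_abs_residual(rates, loads, slope, intercept)
```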
19. Measure ModSAF Federates
(Tightly coupled: small perturbations can produce large changes)
Stand-in BCN Federate (e.g., AEgis FedProxy) using a crude radio propagation model with realistic response times
- Conduct a realistic mini-exercise, log data, and scale up transaction rates
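One simple way to scale up logged transaction rates for replay, sketched here as an assumption rather than as how any particular logging tool does it, is time compression: dividing every inter-arrival interval by a factor k multiplies the real-time rate of every category by k. The `send` callback and the function names are hypothetical:

```python
import time


def compress_schedule(log, factor):
    """Return (offset_seconds, record) pairs whose inter-arrival times are the
    logged ones divided by `factor`, multiplying every category's real-time
    rate by `factor`. `log` must be sorted by its "t" field."""
    if not log:
        return []
    t0 = log[0]["t"]
    return [((rec["t"] - t0) / factor, rec) for rec in log]


def replay(schedule, send, clock=time.monotonic, sleep=time.sleep):
    """Call send(record) at each scheduled offset from the start of replay."""
    start = clock()
    for offset, rec in schedule:
        delay = start + offset - clock()
        if delay > 0:
            sleep(delay)
        send(rec)


log = [{"t": 10.0}, {"t": 12.0}, {"t": 16.0}]
schedule = compress_schedule(log, factor=2.0)  # offsets 0.0, 1.0, 3.0
```

Time compression also compresses simulated time, so it is subject to the semantic caveats of slide 14; duplicating transactions under new object identities may be an alternative when that matters.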
20. Off-line Modeling of New Federate
- Use VTC HLAResults or MAK DataLogger to drive the new BCN Federate
- Selectively vary transaction rates to measure incremental loads
[Figure: replayed transactions drive the new BCN Federate; transaction counts and CPU load are measured]
21. Use Predicted Loads to Estimate Computing Requirements
[Figure: estimated configuration of 45 ModSAF Federates and 4 BCN Federates]
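The step from predicted loads to a host count can be sketched with the linearized model, assuming identical Federates that split a workload evenly, a single dominant transaction category, and a target time ratio chosen with headroom (slide 7 suggests 1.5 for real-time operation). The function name and all inputs in the usage line are hypothetical:

```python
import math


def federates_needed(total_rate, lx, l0, r_sr_target):
    """Smallest number n of identical Federates such that each one's linearized
    load, l0 + r_sr_target * (total_rate / n) * lx, stays at or below 1.
    total_rate: transactions per second of simulated time for the whole pool;
    lx: incremental load per transaction per second of real time;
    l0: baseline load fraction of each Federate."""
    per_federate = (1.0 - l0) / (r_sr_target * lx)  # sim-time tx/s one can absorb
    return math.ceil(total_rate / per_federate)


# Hypothetical inputs: 10,000 tx/s, LX = 0.001, L0 = 0.1, target Rsr = 1.5
n = federates_needed(10_000.0, lx=0.001, l0=0.1, r_sr_target=1.5)
```

With these made-up numbers each Federate can absorb about 600 transactions per simulated second, so 17 Federates are needed; an uneven split of entities or radios would raise the count.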
22. Summary and Philosophy
- A crude performance model is better than no model
- It provides a testing framework and diagnostic benchmarks
- Start early, measure often
- Use logged transaction files from a similar Federation as a starting point for performance modeling
- Conduct sensitivity analyses using parameterized models for Federates where no credible data exist
- Focus on results that are important and/or surprising
- Often, 10-20% of transactions represent 80-90% of processing
- Concentrate actual measurements on critical transactions
- Investigate test results that deviate from model predictions
- Postpone fine tuning until the main issues are under control
- Iterate, iterate, iterate
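The observation that a small fraction of transactions often accounts for most of the processing suggests ranking categories by measured load and concentrating measurements on the top few. A sketch, with a hypothetical helper name and made-up loads:

```python
def critical_categories(load_by_category, share=0.8):
    """Smallest list of categories, heaviest first, whose combined processing
    load reaches `share` of the total load."""
    total = sum(load_by_category.values())
    ranked = sorted(load_by_category.items(), key=lambda kv: kv[1], reverse=True)
    chosen, acc = [], 0.0
    for category, load in ranked:
        if acc >= share * total:
            break
        chosen.append(category)
        acc += load
    return chosen


loads = {"entity_state": 50, "fire": 30, "detonation": 10,
         "radio_signal": 5, "transmitter": 5}  # hypothetical loads
critical = critical_categories(loads, share=0.75)
```

Here two of five categories carry 80% of the hypothetical load, so measurement effort would go to those two first.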