Title: Slide 1 of 39
1Further Statistical Issues
Chapter 12
Last revision June 9, 2003
2What Well Do ...
- Random-number generation
- Generating random variates
- Nonstationary Poisson processes
- Variance reduction
- Sequential sampling
- Designing and executing simulation experiments
- Backup material
- Appendix C A Refresher on Probability and
Statistics - Appendix D Arenas Probability Distributions
3Random-Number Generators (RNGs)
- Algorithm to generate independent, identically
distributed draws from the continuous UNIF (0, 1)
distribution - These are called random numbersin simulation
- Basis for generating observations from all other
distributions and random processes - Transform random numbers in a way that depends on
the desired distribution or process (later in
this chapter) - Its essential to have a good RNG
- There are a lot of bad RNGs this is very tricky
- Methods, coding are both tricky
4The Nature of RNGs
- Recursive formula (algorithm)
- Start with a seed (or seed vector)
- Do something weird to the seed to get the next
one - Repeat generate the same sequence
- Will eventually repeat (cycle) want long cycle
length - Not really random as in unpredictable
- Does this matter? Philosophically? Practically?
- Want to design RNGs
- Long cycle length
- Good statistical properties (uniform,
independent) -- tests - Fast
- Streams (subsegments) many and long (for
variance reduction later) - This is not easy!
- Doing something weird isnt enough
5Linear Congruential Generators(LCGs)
- The most common of several different methods
- But not the one in Arena (though its related)
more later - Generate a sequence of integers Z1, Z2, Z3, via
the recursion - Zi (a Zi1 c) (mod m)
- a, c, and m are carefully chosen constants
- Specify a seed Z0 to start off
- mod m means take the remainder of dividing by m
as the next Zi - All the Zis are between 0 and m 1
- Return the ith random number as Ui Zi / m
6Example of a Toy LCG
- Parameters m 63, a 22, c 4, Z0 19
- Zi (22 Zi1 4) (mod 63), seed with Z0 19
- i 22 Zi14 Zi Ui
- 0 19
- 1 422 44 0.6984
- 2 972 27 0.4286
- 3 598 31 0.4921
- 4 686 56 0.8889
-
- 61 158 32 0.5079
- 62 708 15 0.2381
- 63 334 19 0.3016
- 64 422 44 0.6984
- 65 972 27 0.4286
- 66 598 31 0.4921
-
- Cycling will repeat forever
- Cycle length m
- (could be ltlt m depending
- on parameters)
- Pick m BIG
- But that might not be enoughfor good statistical
properties
7Issues with LCGs
- Cycle length lt m
- Typically, m 2.1 billion ( 231 1) or more
- Which used to be a lot more later
- Other parameters chosen so that cycle length m
or m 1 - Statistical properties
- Uniformity, independence
- There are many tests of RNGs
- Empirical tests
- Theoretical tests lattice structure (next
slide ) - Speed, storage both are usually fine
- Must be carefully, cleverly coded BIG integers
- Reproducibility streams (long internal
subsequences) with fixed seeds
8Issues with LCGs (contd.)
- Regularity of LCGs (and other kinds of RNGs)
For the earlier toy LCG - Design RNGs dense lattice in high dimensions
- Other kinds of RNGs longer memory in recursion,
combination of several RNGs
Plot of Ui vs. i
Plot of Ui1 vs. Ui
Random Numbers Fall Mainly in the Planes
George Marsaglia
9The Original (1983) Arena RNG
- LCG with m 231 1 2,147,483,647
- a 75 16,807
- c 0
- Cycle length m 1 2.1 ? 109
- Ten different automatic streams with fixed seeds
- In its day, this was a good, well-tested
generator in an efficient code - But current computer speed make this cycle length
inadequate - Exhaust in 10 minutes on 2 GHz PC if we just
generate - Can get this generator (not recommended)
- Place a Seeds module (Elements panel) in your
model
10The Current (2000) Arena RNG
- Uses some of the same ideas as LCG
- Modulo division, recursive on earlier values
- But is not an LCG
- Combines two separate component generators
- Recursion involves more than just the preceding
value - Combined multiple recursive generator (CMRG)
- An (1403580 An-2 810728 An-3) mod 4294967087
- Bn (527612 Bn-1 1370589 Bn-3) mod 4294944443
- Zn (An Bn) mod 4294967087
- Seed a six-vector of first three Ans, Bns
Two simultaneous recursions
Combine the two
The next random number
11The Current (2000) Arena RNG Properties
- Extremely good statistical properties
- Good uniformity in up to 45-dimensional hypercube
- Cycle length 3.1 ? 1057
- To cycle, all six seeds must match up
- On 2 GHz PC, would take 2.8 ? 1040 millennia to
exhaust - Under Moores law, it will be 216 years until
this generator can be exhausted in a year of
nonstop computing - Only slightly slower than old LCG
- And RNG is usually a minor part of overall
computing time
12The Current (2000) Arena RNG Streams and
Substreams
- Automatic streams and substreams
- 1.8 ? 1019 streams of length 1.7 ? 1038 each
- Each stream further divided into 2.3 ? 1015
substreams of length 7.6 ? 1022 each - 2 GHz PC would take 669 million years to exhaust
a substream - Default stream is 10 (historical reasons)
- Also used for Chance-type Decide module
- To use a different stream, append its number
after a distributions parameters - For example, EXPO(6.7, 4) to use stream 4
- When using multiple replications, Arena
automatically advances to next substream in each
stream for the next replication - Helps synchronize for variance reduction
13Generating Random Variates
- Have Desired input distribution for model
(fitted or specified in some way), and RNG (UNIF
(0, 1)) - Want Transform UNIF (0, 1) random numbers into
draws from the desired input distribution - Method Mathematical transformations of random
numbers to deform them to the desired
distribution - Specific transform depends on desired
distribution - Details in online Help about methods for all
distributions - Do discrete, continuous distributions separately
14Generating from Discrete Distributions
- Example probability mass function
- Divide 0, 1
- Into subintervals of length 0.1, 0.5, 0.4
- Generate U UNIF (0, 1)
- See which subinterval its in
- Return X corresponding value
15Discrete Generation Another View
- Plot cumulative distribution function generate U
and plot on vertical axis read across and down
- Inverting the CDF
- Equivalent to earlier method
16Generating from Continuous Distributions
- Example EXPO (5) distribution
- Density (PDF)
- Distribution (CDF)
- General algorithm (can be rigorously justified)
- 1. Generate a random number U UNIF(0, 1)
- 2. Set U F(X) and solve for X F1(U)
- Solving analytically for X may or may not be
simple (or possible) - Sometimes use numerical approximation to solve
17Generating from Continuous Distributions (contd.)
- Solution for EXPO (5) case
- Set U F(X) 1 eX/5
- eX/5 1 U
- X/5 ln (1 U)
- X 5 ln (1 U)
- Picture (inverting the CDF, as in discrete case)
Intuition (garden hose) More Us will hit F(x)
where its steep This is where the density f(x)
is tallest, and we want a denser distribution of
Xs
18Nonstationary Poisson Processes
- Many systems have externally originating events
affecting them e.g., arrivals of customers - If process is stationary over time, usually
specify a fixed interevent-time distribution - But process could vary markedly in its rate
- Fast-food lunch rush
- Freeway rush hours
- Ignoring nonstationarity can lead to serious
model and output errors - Already seen this automotive repair
shop,Chapter 5
19Nonstationary Poisson Processes Definition
- Usual model nonstationary Poisson process
- Have a rate function l(t)
- Number of events in t1, t2 Poisson with mean
- Issues
- How to estimate rate function?
- Given an estimate, how to generate during
simulation?
l(t)
t
20Nonstationary Poisson Processes Estimating the
Rate Function
- Estimation of the rate function
- Probably the most practical method is piecewise
constant - Decide on a time interval within which rate is
fixed - Estimate from data the (constant) rate during
each interval - Be careful to get the units right
- Other (more complicated) methods exist in the
literature
21Nonstationary Poisson Processes Generation
- Arena has a built-in method to generate, assuming
a piecewise-constant rate function - Arrival Schedule in Create module auto repair
(Model 5-2) - Method is to invert a rate-one stationary Poisson
process against the cumulate rate function L - Similar to inverting CDF for continuous
random-variable generation - Exploits some speed-up possibilities
- Details in Help topic Non-Stationary Exponential
Distribution - Alternative method thinning of a stationary
Poisson process at the peak rate
22Variance Reduction
- Random input Þ random output (RIRO)
- In other words, output has variance
- Higher output variance means less precise results
- Would like to eliminate or reduce output variance
- One (bad) way to eliminate replace all input
random variables by constants (like their mean) - Will get rid of random output, but will also
invalidate model - Thus, best hope is to reduce output variance
- Easy (brute-force) variance reduction just
simulate some more - Terminating additional replications
- Steady-state additional replications or a
longer run
23Variance Reduction (contd.)
- But sometimes can reduce variance without more
runs free lunch (?) - Key unlike physical experiments, can control
randomness in computer-simulation experiments via
manipulating the RNG - Re-use the same random numbers either as they
were, in some opposite sense, or for a similar
but simpler model - Several different variance-reduction techniques
- Classified into categories common random
numbers, antithetic variates, control variates,
indirect estimation, - Usually requires thorough understanding of model,
code - Will look only at common random numbers in detail
24Common Random Numbers (CRN)
- Applies when objective is to compare two (or
more) alternative configurations or models - Interest is in difference(s) of performance
measure(s) across alternatives - Model 7-2 (small mfg. system), total avg. WIP
output two alternatives - A. Base case (as is)
- B. 3.5 increase in business (interarrival-time
mean falls from 13 to 12.56 minutes) - Same run conditions, but change model into Model
12-1 - Remove Output File Total WIP History.dat
- Add entry to Statistic module to compute and save
to a .dat file the total avg. WIP on each
replication
25The Natural Comparison
- Run case A, make the change to get to case B and
run it, then Compare Means via Output Analyzer - Difference is not statistically significant
- Were the runs of A and B statistically
independent? - Did we use the same random numbers running A and
B? - Did we use the same random numbers intelligently?
26CRN Intuition
- Get sharper comparison if you subject all
alternatives to the same conditions - Then observed differences are due to model
differences rather than random differences in the
conditions - Small mfg. system For both A and B runs, cause
- The same parts arrive at the same times
- Be assigned same attributes (job type)
- Have the same process times at each step
- Then observed differences will be attributable to
system differences, not random bounce - There isnt any random bounce
27Synchronization of Random Numbers in CRN
- Generally, get CRN by using the same RNG, seed,
stream(s) for all alternatives - Already are using the same stream, default
stream 10 - But its usage generally gets mixed up across
alternatives - Must use the same random numbers for the same
purposes across the alternatives
synchronization of random-number usage - Usually requires some work, understanding of
model - Usually use different streams in the RNG
- Usually different ways to do this in a given
model - Sometimes cant synchronize completely for
complex models settle for partial
synchronization
28Synchronization of Random Numbers in CRN
(contd.)
- Synchronize by source of randomness (well do)
- Assign stream to each point of variate generation
- Separate random-number faucets extra
parameter in r.v. calls - Model 12-1 14 sources of randomness, separate
stream for each (see book for details), modify
into Model 12-2 - Fairly simple but might not ensure complete
synchronization still usually get some benefit
from this - Synchronize by entity (wont do see Exercises)
- Pre-generate every possible random variate an
entity might need when it arrives, assign to
attributes, used downstream - Better synchronization insurance but uses more
memory - Across replications, RNG automatically goes to
next substream within each stream - Maintains synchronization if alternatives
disagree on number of random numbers used per
stream per replication
29Effect of CRN
Natural Comparison
Synchronized CRN
- CRN was no more computational effort
- Effect here is fairly dramatic, but it will not
always be this strong - Depends on how similar A and B are
30CRN Statistical Issues
- In Output Analyzer, Analyze gt Compare Means
option, have choice of Paired-t or Two-Sample-t
for c.i. test on difference between means - Paired-t subtracts results replication by
replication must use this if doing CRN - Two-Sample-t treats the samples independently
can use this if doing independent sampling, often
better than Paired-t - Mathematical justification for CRN
- Let X output r.v. from alternative A, Y
output from B - Var(X Y) Var(X) Var(Y) 2 Cov(X, Y)
31Other Variance-Reduction Techniques
- For single-system simulation, not comparisons
- Antithetic variates make pairs of runs
- Use Us on first run of pair, 1 Us on
second run of pair - Take average of two runs negatively correlated,
reducing variance of average - Like CRN, must take care to synchronize
- Control variates
- Use internal variate generation to control
results up, down - Indirect estimation
- Simulation estimates something other than what
you want, but related to what you want by a fixed
formula
32Sequential Sampling
- Always try to quantify imprecision in results
- If imprecision is small enough, youre done
- If not, need to do something to increase
precision - Just saw one way variance-reduction techniques
- Obvious way to increase precision keep
simulating one more step at a time, quit when
you achieve desired precision - Terminating models step another replication
- Cannot extend length of replications thats
part of the model - Steady-state models
- step another replication if using truncated
replications, or - step some extension of the run if using batch
means
33Sequential Sampling Terminating Models
- Modify Model 12-2 (small mfg., base case, with
random-number streams) into Model 12-3 - Made 25 replications, get 95 c.i. on expected
average Total WIP as 13.03 1.28 - Suppose the 1.28 is too big want to reduce to
0.5 - Approximate formulas from Sec. 6.3 need 124 or
164 total replications (depending on which
formula) rather than 25 - Instead, just make one more at a time, re-compute
c.i., stop as soon as half-width is less than 0.5 - Trick Arena to keep making more replications
until c.i. half-width lt tolerance 0.5
34Sequential Sampling Terminating Models
(contd.)
- Recall With gt 1 replication, automatically get
cross-replication 95 c.i.s for expected values
in Category Overview report - Related internal Arena variables
- ORUNHALF(Output ID) half-width of 95 c.i.
using completed replications (Output ID Avg
Total WIP) - MREP total number of replications asked for
(initially, MREP Number of Replications in Run
gt Setup gt Replication Parameters) - NREP replication number now in progress ( 1,
2, 3, ) - Use, manipulate these variables
35Sequential Sampling Terminating Models
(contd.)
- Initially set MREP to huge value in Number of
Replications in Run gt Setup gt Replication
Parameters - Keep replicating until we cut off when half-width
0.5 - Add a logic (in a submodel) to sense when done
- Create one control entity at beginning of each
replication - Control entity immediately checks to see if
- NREP 2 This is beginning of 1st or 2nd
replication - ORUNHALF(Avg Total WIP) gt 0.5 c.i. on completed
replications is still too big - In either case, keep going with this replication
(and the next one too) control entity is
Disposed and takes no action - If both conditions are false, Control entity
Assigns MREP NREP to stop after this
replication, and is Disposed
36Sequential Sampling Terminating Models
(contd.)
- Details
- This overshoots required number of replications
by one - In Assign module setting MREP to NREP, have to
select Type Other since MREP is a built-in
Arena variable - Results Stopped with 232 total replications,
yielding half width 0.49699 (barely less than
0.5) - Different from earlier number-of-replications
approximations (theyre just that) - Generalizations
- Precision demands on several outputs
- Relative-width stopping (half-width) / (pt.
estimate) small
37Sequential Sampling Steady-State Models
- If doing truncated-replications approach to
steady-state statistical analysis - Same strategy as above for terminating models
- Warm-up Period specified in Run gt Setup gt
Replication Parameters - Err on the side of too much warmup
- Point-estimator bias is especially dangerous in
sequential sampling - Getting tight c.i. centered in the wrong place
- The tighter the c.i. demand, the worse the
coverage probability
38Sequential Sampling Steady-State Models
(contd.)
- Batch-means approach
- Model 12-4 modification of Model 7-4 (small mfg.
system) want half-width on E(average WIP) to be
lt 1 - Keep extending the run to reduce c.i. half-width
- Use automatic run-time batch-means 95 c.i.s
- Stopping criterion Terminating Condition field
of Run gt Setup gt Replication Parameters - Half-width variables are THALF(Tally ID) or
DHALF(Dstat ID) - For us, condition is DHALF(Total WIP) lt 1
- Remove all other stopping devices from model
- If Insuf or Corr would be returned because of
too little data, half-width variables set to huge
value keep going - Could demand multiple smallness criteria,
relative precision (use TAVG, DAVG variables for
point-estimate denominator)
39Designing and Executing Simulation Experiments
- Think of a simulation model as a convenient
testbed or laboratory for experimentation - Look at different output responses
- Look at effects, interaction of different input
factors - Apply classical experimental-design techniques
- Factorial experiments main effects,
interactions - Fractional-factorial experiments
- Factor-screening designs
- Response-surface methods, metamodels
- CRN is blocking in experimental-design
terminology - Process Analyzer (PAN) provides a convenient way
to carry out a designed experiment - See Chapt. 6 for an example of using PAN for a
factorial experiment