Title: Review of Probability and Statistics in Simulation

1. Review of Probability and Statistics in Simulation

2. In this review
- Use of Probability and Statistics in Simulation
- Random Variables and Probability Distributions
- Discrete and Continuous Random Variables
- Mixed Distributions
- Expectation and Moments
- Covariance
- Sample Mean and Variance
- Data Collection and Analysis
- Properties of a Good Estimator
- Parameter Estimation
- Simulation Data and Output Stochastic Processes
- Two Types of Statistics in Simulation Output
- Distribution Estimation
- Confidence Intervals (CI)
- Run Length and Number of Replications
3. Use of Probability and Statistics in Simulation
- Stochastic systems have variability in their components
- The time required to complete an operation is not fixed/deterministic → needed in the model
- The time between arrivals of customers to a store → needed in input data analysis
- Simulation results are themselves stochastic (estimates of means, variances, etc.) → needed in output analysis
- Conclusion: simulation means dealing with random variables
4. Random Variables and Probability Distributions
- Random Variable (RV): a real number assigned to each outcome of an experiment in the sample space
- Discrete Random Variable: can take only a finite or countably infinite set of values
- e.g., hit or miss (0 or 1) when flipping a coin or shooting a basketball; outcome of throwing a dart (1, 2, ..., 20); number of customers waiting in a queue
- Monte Carlo examples: throw of a die (1, 2, 3, 4, 5, 6); a pair of dice (2, 3, 4, ..., 10, 11, 12)
- Continuous Random Variable: can take on a continuum (uncountably infinite set) of values
- e.g., customer interarrival time
5. Discrete Random Variables
- The probability of each value is specified by a probability mass function, p(x)
- Definition: p(xi) = P(X = xi), where
- P(·) is a function that maps experiment outcomes into real numbers satisfying three axioms:
- (1) 0 ≤ P(E) ≤ 1 for any outcome E
- (2) P(S) = 1, where S is the sample space (all possible values - the certain outcome)
- (3) If E1, E2, E3, ... are mutually exclusive outcomes, then P(E1 ∪ E2 ∪ E3 ∪ ...) = P(E1) + P(E2) + P(E3) + ...
- e.g., throwing a die: S = {1, 2, 3, 4, 5, 6}
- P(1 ∪ 2 ∪ 3 ∪ 4 ∪ 5 ∪ 6) = P(1) + P(2) + P(3) + P(4) + P(5) + P(6) = 1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 1 (the certain outcome)
- X is a random variable that is the outcome of a random experiment, and xi is a specific value of X
6. Discrete Random Variables
- Restrictions/Conditions
- 0 ≤ p(xi) ≤ 1 for all i
- Σ(all i) p(xi) = 1 (the certain outcome)
- An alternative representation of the probability distribution is the cumulative distribution function, F(x)
- Definition: F(x) = P(X ≤ x)
- In terms of the probability mass function: F(x) = Σ(xi ≤ x) p(xi)
- Properties of F(x)
- (1) 0 ≤ F(x) ≤ 1
- (2) F(-∞) = 0
- (3) F(∞) = 1
7. Discrete Random Variables
- Ex: S = {0, 1, 2, 3}, four possible outcomes
- p(0) = 1/8, p(1) = 3/8, p(2) = 3/8, p(3) = 1/8
- Σ(i = 0 to 3) p(xi) = 1/8 + 3/8 + 3/8 + 1/8 = 1

  xi   p(xi)   F(xi)
  0    1/8     0.125
  1    3/8     0.500
  2    3/8     0.875
  3    1/8     1.000

[Figure: bar chart of the pmf p(xi) and step plot of the cdf F(xi) over xi = 0, 1, 2, 3]
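The pmf-to-cdf relationship in this example can be verified with a short script (a sketch; exact fractions are used to avoid rounding):

```python
from fractions import Fraction

# pmf from the slide's example: S = {0, 1, 2, 3}
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# the pmf must sum to 1 (the certain outcome)
assert sum(pmf.values()) == 1

def cdf(x):
    """F(x) = sum of p(xi) for all xi <= x."""
    return float(sum(p for xi, p in pmf.items() if xi <= x))

print([cdf(x) for x in range(4)])  # matches the table: 0.125, 0.5, 0.875, 1.0
```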
8. Discrete Random Variables
- Random numbers in digital computers
- Used to reproduce a discrete distribution in digital computers
- Generated in digital computers as pseudo-random numbers
- Uniformly distributed between 0 and 1: RN ~ UN(0, 1)
- How many distinct values can be represented between adjacent RNs? Depends on the bit capacity of the computer (the largest integer that can be represented)
- e.g., an 8-bit computer: 2^8 = 256 integers

  (i)  Integer (ri)  RN = ri/255  xi (previous ex.)
  1    35            0.1372549    1
  2    219           0.8588235    2
  3    172           0.6745098    2
  4    105           0.4117647    1
  5    1             0.0039216    0
  6    91            0.3568627    1
  ...  ...           ...          ...
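The integer-to-value mapping in the table is an instance of the inverse-transform method: take the smallest x with u ≤ F(x). A minimal sketch (the `CDF` list encodes the previous example's cdf):

```python
# cdf of the earlier discrete example as (value, F(value)) pairs
CDF = [(0, 0.125), (1, 0.500), (2, 0.875), (3, 1.000)]

def sample(u):
    """Inverse transform: return the smallest x with u <= F(x)."""
    for x, Fx in CDF:
        if u <= Fx:
            return x

# 8-bit integers from the slide's table, scaled by 255 to give RNs in [0, 1]
integers = [35, 219, 172, 105, 1, 91]
xs = [sample(r / 255) for r in integers]
print(xs)  # reproduces the slide's xi column: [1, 2, 2, 1, 0, 1]
```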
9. Continuous Random Variables
- Probability density function (pdf), f(x)
- Definition: P(a ≤ X ≤ b) = ∫(from a to b) f(x) dx
- Conditions
- (1) f(x) ≥ 0, and
- (2) ∫(from -∞ to ∞) f(x) dx = 1
- Cumulative distribution function (cdf), F(x)
- Definition: F(x) = ∫(from -∞ to x) f(y) dy = P(X ≤ x)
- Gives the probability that the continuous random variable X assumes a value less than or equal to x

[Figure: pdf f(x) with the shaded area between a and b representing P(a ≤ X ≤ b)]
10. Continuous Random Variables
- Ex: RNs take an uncountably infinite number of possible continuous values between 0 and 1, each with equal probability: UN(0, 1) or UNFRM(0, 1)
- P(0.50 ≤ X ≤ 0.75) = ∫(from 0.50 to 0.75) 1 dX = 0.25

[Figure: pdf f(X) = 1 on (0, 1) with the area between 0.50 and 0.75 shaded, and cdf F(X) = X on (0, 1)]
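The interval probability above can be checked by Monte Carlo sampling (a sketch; the seed and sample size are arbitrary choices):

```python
import random

random.seed(42)

# Monte Carlo estimate of P(0.50 <= X <= 0.75) for X ~ UN(0, 1);
# the exact value from the integral is 0.25
N = 100_000
hits = sum(1 for _ in range(N) if 0.50 <= random.random() <= 0.75)
estimate = hits / N
print(estimate)  # close to 0.25
```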
11. Discrete and Continuous Random Variables - Mixed Distribution
- Ex: Value 1 with 1/3 probability: p(1) = 1/3
- Value 2 with 1/3 probability: p(2) = 1/3
- Any value between 1 and 2 with 1/3 total probability: P(1 < x < 2) = 1/3
- Total = 1
- The cdf jumps from 0 to 1/3 at x = 1, rises linearly as F(x) = 1/3 + (x - 1)/3 for 1 < x < 2 (density f(x) = 1/3 there), reaches 2/3 just below x = 2, and jumps to 1 at x = 2

[Figure: f(X) with probability masses of 1/3 at X = 1 and X = 2 plus a uniform density of 1/3 over (1, 2); F(X) with jumps at X = 1 and X = 2 and a linear ramp from 0.33 to 0.67 between them]
12. Expectation and Moments
- Used to characterize probability distribution functions
- The expectation (expected value) of a random variable x, E[x]:
- E[x] = Σ(all i) xi p(xi) when x is discrete
- E[x] = ∫(all x) x f(x) dx when x is continuous
- In general, the expectation can be taken of a function of x:
- E[x^n] = Σ(all i) xi^n p(xi) when x is discrete
- E[x^n] = ∫(all x) x^n f(x) dx when x is continuous
- The expectation of x^n is defined as the nth moment of the random variable
- The expected value is the special case n = 1; it is thus called the first moment - the mean
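These definitions can be exercised on the discrete example from slide 7 (a sketch; the variance identity Var[x] = E[x²] - E[x]², used below, is the standard shortcut form of the second moment about the mean):

```python
# pmf of the slide-7 example
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

def moment(n):
    """nth moment E[x^n] = sum of xi^n * p(xi) over all i."""
    return sum(x**n * p for x, p in pmf.items())

mean = moment(1)                 # first moment: 1.5
variance = moment(2) - mean**2   # E[x^2] - E[x]^2 = 3.0 - 2.25 = 0.75
print(mean, variance)
```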
13. Expectation and Moments
- A variant of the nth moment is the nth moment of a random variable about the mean: E[(x - E[x])^n]
- Important: the second moment about the mean
- E[(x - E[x])²] = σ² = Var[x]
- where
- Var[x], the variance of x, measures the spread of the probability distribution
- σ is the standard deviation of the random variable
14. Expectation and Moments
- Other higher-order moments serve as measures of the shape of probability distributions
- Skewness - measures whether the distribution is symmetric; in a positively skewed distribution the mean lies above the median and mode, and the reverse holds for negative skew
- Kurtosis - measures flatness or peakedness
- Platykurtic: flat, with short broad tails ("like a platypus," the aquatic mammal in Australia that eats ants)
- Leptokurtic: peaked, with long thin tails ("leaping like kangaroos")

[Figure: positively and negatively skewed distributions showing the relative positions of mode, median, and mean; flat (platykurtic) vs. peaked (leptokurtic) curves]
15. Covariance
- For two random variables x and y:
- Cov[x, y] = E[(x - E[x])(y - E[y])]
- Measures the linear association between x and y (which need not reflect a causal relationship)
- If x and y are independent, then Cov[x, y] = 0 (the converse does not hold in general)
- Formally, x and y are independent when p(y|x) = p(y) for discrete RVs
- and f(y|x) = f(y) for continuous RVs
- A normalized measure of dependence is the correlation coefficient, ρ = Cov[x, y] / (σx σy)
16. Functions of Random Variables and Their Properties
- Properties of the expectation of functions of RVs (k is an arbitrary constant):
- E[x + y] = E[x] + E[y] (x + y is a RV)
- E[kx] = kE[x] (kx is a RV)
- E[x + k] = E[x] + k (x + k is a RV)
- Properties of variances:
- Var[x + y] = Var[x] + Var[y] + 2Cov[x, y]
- If x and y are independent, Var[x + y] = Var[x] + Var[y]
- Var[kx] = k² Var[x]
- Var[x + k] = Var[x]
- Var[kx + ny] = k² Var[x] + n² Var[y] + 2kn Cov[x, y]
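The general variance rule can be checked numerically: the identity holds exactly for empirical (population-form) moments of any paired data set. A sketch with made-up values:

```python
# Numeric check of Var[kx + ny] = k^2 Var[x] + n^2 Var[y] + 2kn Cov[x, y]
# using small made-up paired data sets and population formulas (divisor N)
xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 1.0, 5.0, 4.0]
k, n = 3.0, -2.0

def pvar(v):
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / len(v)

def pcov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

lhs = pvar([k * a + n * b for a, b in zip(xs, ys)])
rhs = k**2 * pvar(xs) + n**2 * pvar(ys) + 2 * k * n * pcov(xs, ys)
print(lhs, rhs)  # the two sides agree
```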
17. Sample Mean and Variance
- For I samples from a probability distribution, the sample mean is the sum of the samples divided by the number of samples: X̄_I = (X1 + X2 + ... + XI) / I
- If the Xi are assumed to be independent and identically distributed (iid), the expected value and variance of the sample mean are:
- E[X̄_I] = (E[X1] + E[X2] + ... + E[XI]) / I = E[X]
- Var[X̄_I] = (Var[X1] + Var[X2] + ... + Var[XI]) / I² = Var[X] / I
- The variance result is only applicable when the Xi are iid
18. Sample Mean and Variance
- The variance of the sample mean of I iid samples is a factor 1/I smaller than the variance of the random variable from which the samples are drawn
- So use a large I to reduce the variance of the sample mean and improve the accuracy in estimating the mean
- Only when the samples are independent! If not, the covariance between samples must also be accounted for
- Simulation results are typically correlated
- Ex: Waiting times of successive customers are correlated because the (i+1)th customer's waiting time depends on the ith customer's waiting (and service) time
- (i+1)th → ith → (i-1)th → ... → 1st (correlated/autocorrelated samples)
- So we cannot estimate the variance of the average waiting time by simply dividing the variance of the waiting time by the number of samples
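The 1/I variance reduction for iid samples can be demonstrated empirically (a sketch; X ~ UN(0, 1) is chosen because its variance, 1/12, is known exactly):

```python
import random

random.seed(0)

# Estimate the variance of the sample mean of I iid UN(0, 1) draws and
# compare with the theoretical value Var[X] / I = (1/12) / I
I = 25
reps = 20_000
means = [sum(random.random() for _ in range(I)) / I for _ in range(reps)]

grand = sum(means) / reps
var_of_mean = sum((m - grand) ** 2 for m in means) / reps
print(var_of_mean, (1 / 12) / I)  # the two values are close
```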
19. Random Sum of Independent Random Variables
- If x1, x2, ..., xk are iid random variables, and k is a discrete random variable independent of the xi, then for the sum y = x1 + x2 + ... + xk
- we have E[y] = E[x] E[k]
- and Var[y] = E[k] Var[x] + Var[k] E²[x]
- Ex: Drive-through bank teller
- Number of transactions of each customer: k
- Time to complete each transaction: xi
- (k and the xi are independent)
- Time to serve each customer: y = x1 + ... + xk
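The random-sum mean formula can be sanity-checked by simulation (a sketch with hypothetical inputs: k uniform on {1, 2, 3} so E[k] = 2, and x ~ UN(0, 1) so E[x] = 0.5):

```python
import random

random.seed(1)

# Monte Carlo check of E[y] = E[x] * E[k] for y = x1 + ... + xk
reps = 50_000
ys = []
for _ in range(reps):
    k = random.randint(1, 3)          # random number of "transactions"
    ys.append(sum(random.random() for _ in range(k)))

mean_y = sum(ys) / reps
print(mean_y)  # should be near E[x] * E[k] = 0.5 * 2 = 1.0
```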
20. Law of Large Numbers and Central Limit Theorem
- Characteristics of X̄_I as the number of samples → ∞
- Law of Large Numbers: as the sample size I → ∞, X̄_I → E[X] (with probability 1)
- The associated result, the weak law of large numbers:
- lim(I→∞) P(|X̄_I - E[X]| > ε) = 0 for any small positive ε
- i.e., the probability that the difference between X̄_I and E[X] exceeds ε approaches 0 as I approaches infinity
- Central Limit Theorem: under certain mild conditions, the distribution of the sum of I independent samples of X approaches the normal distribution as I approaches infinity, regardless of the distribution of X
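A quick numeric sketch of the sum's behavior (this checks only the mean and variance of the sum against the normal approximation's parameters; demonstrating normality of the shape would need a distributional test):

```python
import random

random.seed(9)

# Sums of I iid UN(0, 1) draws: the CLT says these are approximately
# normal with mean I/2 and variance I/12
I, reps = 30, 10_000
sums = [sum(random.random() for _ in range(I)) for _ in range(reps)]

m = sum(sums) / reps
v = sum((s - m) ** 2 for s in sums) / reps
print(m, v)  # near I/2 = 15 and I/12 = 2.5
```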
21. Data Collection and Analysis
- For input data preparation and output analysis
- Descriptive Statistics
- Data Representation: organizing data in the form of probability distributions
22. Data Collection and Analysis
- Ex: Customer waiting times collected
- Raw data (in seconds): 15, 65, 31, 3, 125, ...
- Frequency distribution table: count the frequency of occurrence in cells/ranges

  Waiting Time (sec.)  Number of Customers  Relative Frequency  Cumulative Frequency
  0 - 20               21                   0.122               0.122
  20 - 40              35                   0.203               0.325
  40 - 60 (mode)       42                   0.244               0.569
  60 - 80              35                   0.203               0.772
  80 - 100             19                   0.110               0.882
  100 - 120            10                   0.058               0.940
  > 120                10                   0.058               0.998 ≈ 1
  Total                172                                      ≈ 1

[Figure: histogram of the counts per cell - a discrete, often user-defined, distribution]
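Building such a frequency table is mechanical; a sketch (the raw data here are randomly generated stand-ins, since the slide shows only the first few observations):

```python
import random

random.seed(7)

# Stand-in waiting times (seconds); 172 values, as in the slide's table
data = [random.expovariate(1 / 55) for _ in range(172)]
edges = [0, 20, 40, 60, 80, 100, 120]

counts = [0] * len(edges)  # last cell collects the "> 120" overflow
for t in data:
    for j in range(len(edges) - 1):
        if edges[j] <= t < edges[j + 1]:
            counts[j] += 1
            break
    else:
        counts[-1] += 1

rel = [c / len(data) for c in counts]
print(counts, sum(rel))  # relative frequencies sum to 1
```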
23. Data Collection and Analysis
- Notes:
- Class widths (cell widths) should be of equal length, except possibly the first (which may extend from -∞) and the last (which may extend to ∞)
- No overlapping class intervals - each data point has a unique class assignment
- 5 to 20 classes are normally used (application dependent)
- Carefully choose the first and last classes
- Within each cell (0 - 20, 20 - 40, ...), the distribution may be treated as uniform or otherwise user-defined

[Figure: cumulative frequency plot rising through 0.122, 0.325, 0.569, 0.772, 0.882, 0.940, and 1.000 at the cell boundaries 20, 40, 60, 80, 100, 120, > 120]
24. Data Collection and Analysis
- A stem-and-leaf diagram: a special type of frequency diagram/histogram in which each stem (leading digit) lists its leaves (trailing digits)

  91-100 | 9 | 3 1 0
  81-90  | 8 | 3 7 9 8 6 5
  71-80  | 7 | 7 8 3 4 5 7 8 9 6
  61-70  | 6 | 7 9 3 1 0
  51-60  | 5 | 3 4 8 1
  < 50   | 4 | 7

- n = 28, H (highest) = 93, L (lowest) = 47
- X̄ = 73.64, s(n-1) = 13.38 (σ = 13.14)
- Median = 77
- The row lengths sketch the underlying distribution
25. Parameter Estimation
- Population: mean μ and variance σ² (the unknown parameters)
- Sample: sample mean and sample variance (computed from the data)
- The sample mean and variance are used to estimate the population μ and σ²

[Figure: a sample drawn from the population; its mean and variance serve as estimates of μ and σ²]
26. Formulas for Sample Mean and Sample Variance
- Statistics based on observations:
- Sample mean: X̄_I = (1/I) Σ(i=1 to I) Xi
- Sample variance: s² = Σ(i=1 to I) (Xi - X̄_I)² / (I - 1)
- Statistics for time-persistent variables replace the sum over observations with an integral over time, e.g., Ȳ = (1/T) ∫(0 to T) Y(t) dt
- Another useful statistic is the coefficient of variation, s / X̄_I
- Formally, estimates that specify a single value of the population parameter are called point estimates, while estimates that specify a range of values are called interval estimates
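A direct computation of these observation-based statistics on the five raw waiting times shown in slide 22 (a sketch; note the I - 1 divisor in the sample variance):

```python
# Sample mean, sample variance (divisor I - 1), and coefficient of variation
data = [15.0, 65.0, 31.0, 3.0, 125.0]  # raw waiting times from slide 22

I = len(data)
mean = sum(data) / I                                  # 47.8
svar = sum((x - mean) ** 2 for x in data) / (I - 1)   # 2405.2
cv = svar ** 0.5 / mean                               # s / mean

print(mean, svar, cv)
```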
27. Distribution Estimation
- Use the collected data to identify (fit) the underlying distribution of the population
- Approach:
- Assume the data follow a particular statistical distribution - hypothesis
- Apply one or more goodness-of-fit tests to the sample data - inference (see how parameters are estimated)
- Commonly used tests: the Chi-Square test and the Kolmogorov-Smirnov test
- Judge the outcome of the tests: whether the fit is acceptable under a specified level of statistical significance
28. Four Properties of a Good Estimator (1)
- Unbiasedness
- An unbiased estimator has an expected value equal to the true value of the parameter being estimated, i.e., E[estimator] = population parameter
- For the mean: E[X̄_I] = μ
- E[s²] = σ²
- but E[s] ≠ σ - the expectation of a square root is not, in general, the square root of the expectation
29. Four Properties of a Good Estimator (2a)
- Efficiency
- The most efficient estimator among a group of unbiased estimators is the one with the smallest variance
- Ex: Three estimators' distributions (1, 2, 3, based on samples of the same size)
- 1 and 2: expected value = population parameter (unbiased)
- 3: positively biased
- Variance decreases from 1, to 2, to 3 (3 is the smallest)
- Conclusion: 2 is the most efficient - 3 has the smallest variance but is biased

[Figure: three sampling distributions over the value of the estimator; 1 and 2 centered on the population parameter, 3 shifted to the right]
30. Four Properties of a Good Estimator (2b)
- Efficiency (continued)
- Relative efficiency: since it is difficult to prove that an estimator is the best among all unbiased ones, compare the variances of two candidate estimators
- Ex: Sample mean vs. sample median (for a normal population)
- Variance of the sample mean: σ²/n
- Variance of the sample median: πσ²/2n
- Var[median] / Var[mean] = (πσ²/2n) / (σ²/n) = π/2 ≈ 1.57
- Therefore, the sample median is 1.57 times less efficient than the sample mean
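This relative efficiency can be observed by simulation (a sketch; the π/2 ratio is an asymptotic result for normal populations, so a moderately large odd n is used and the empirical ratio will only be approximate):

```python
import random
import statistics

random.seed(3)

# Compare the sampling variance of the mean and the median of normal samples;
# their ratio should be roughly pi/2 ~ 1.57
n, reps = 101, 5_000
means, medians = [], []
for _ in range(reps):
    sample = [random.gauss(0, 1) for _ in range(n)]
    means.append(sum(sample) / n)
    medians.append(statistics.median(sample))

ratio = statistics.pvariance(medians) / statistics.pvariance(means)
print(ratio)  # roughly 1.57
```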
31. Four Properties of a Good Estimator (3)
- Sufficiency
- A necessary condition for efficiency
- A sufficient estimator uses all the information about the population parameter that the sample can provide - it takes into account each of the sample observations
- Ex: The sample median is not a sufficient estimator, because only the ranking of the observations is used and the distances between adjacent values are ignored
32. Four Properties of a Good Estimator (4)
- Consistency
- A consistent estimator yields estimates that converge in probability to the population parameter being estimated as n (the sample size) becomes larger
- That is, as n → ∞, the estimator becomes unbiased and its variance approaches 0
- Ex: X/n is an unbiased estimator of the population proportion p; moreover, X/n is a consistent estimator of p:
- Variance: Var[X/n] = (1/n²) Var[X] = (1/n²)(npq) = pq/n
- (since X is binomially distributed, with q = 1 - p)
- As n → ∞, pq/n → 0
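The shrinking pq/n variance can be seen empirically (a sketch with a hypothetical p = 0.3, so pq/n is 0.021 at n = 10 and 0.0021 at n = 100):

```python
import random

random.seed(5)

# The sample proportion X/n for a Bernoulli(p) population has variance pq/n,
# so its spread shrinks as n grows - consistency in action
p = 0.3

def proportions(n, reps=2000):
    out = []
    for _ in range(reps):
        x = sum(1 for _ in range(n) if random.random() < p)  # binomial X
        out.append(x / n)
    return out

def var(v):
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / len(v)

v10, v100 = var(proportions(10)), var(proportions(100))
print(v10, v100)  # near pq/n: about 0.021 and 0.0021
```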