Review of Probability and Statistics in Simulation - PowerPoint PPT Presentation
1
Review of Probability and Statistics in Simulation
2
In this review
  • Use of Probability and Statistics in Simulation
  • Random Variables and Probability Distributions
  • Discrete, Continuous, and Mixed (Discrete and
    Continuous) Random Variables
  • Expectation and Moments
  • Covariance
  • Sample Mean and Variance
  • Data Collection and Analysis
  • Properties of a Good Estimator
  • Parameter Estimation
  • --------------------------------------------------
  • Simulation data and output stochastic processes
  • Two Types of Statistics in simulation output
  • Distribution Estimation
  • Confidence Intervals (CI)
  • Run Length and Number of Replications

3
Use of Probability and Statistics in Simulation
  • Stochastic systems have variability in their
    components
  • The time required to complete an operation is not
    fixed/deterministic → needed in the model
  • The time between arrivals of customers to a store
    → needed in input data analysis
  • Simulation results are also stochastic
    (estimates of means, variances, etc.)
    → needed in output analysis
  • Conclusion:
  • simulation means dealing with random variables

4
Random Variables and Probability Distributions
  • Random Variable (RV)
  • A real number assigned to each outcome of an
    experiment in the sample space
  • Discrete Random Variable
  • Can take only a finite or countably infinite
    set of values
  • e.g., hit or miss {0 or 1}: flipping a coin,
    shooting a basketball; outcome of throwing a dart
    {1, 2, ..., 20}; number of customers waiting in a
    queue
  • Monte Carlo simulation: throwing one die
    {1, 2, 3, 4, 5, 6}, a pair of dice
    {2, 3, 4, ..., 10, 11, 12}
  • Continuous Random Variable
  • Can take on a continuum of (infinitely many)
    values
  • e.g., customer interarrival time

5
Discrete Random Variables
  • The probability of each value is specified by a
    probability mass function, p(x)
  • Definition: p(xi) = P(X = xi), where
  • P(·) is a function that maps experiment outcomes
    into real numbers satisfying three axioms:
  • (1) 0 ≤ P(E) ≤ 1 for any outcome E
  • (2) P(S) = 1, where S is the sample space (all
    possible values - the certain outcome)
  • (3) If E1, E2, E3, ... are mutually exclusive
    outcomes,
  • P(E1 ∪ E2 ∪ E3 ∪ ...) = P(E1) + P(E2) +
    P(E3) + ...
  • e.g., throwing a die: S = {1, 2, 3, 4, 5, 6}
  • P(1 ∪ 2 ∪ 3 ∪ 4 ∪ 5 ∪ 6)
  • = P(1) + P(2) + P(3) + P(4) + P(5) + P(6)
  • = 1/6 + 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 1
    (the certain outcome)
  • X is a random variable that is the outcome of a
    random experiment, and xi is a specific value of X

6
Discrete Random Variables
  • Restrictions/Conditions
  • 0 ≤ p(xi) ≤ 1 for all i
  • Σ(all i) p(xi) = 1 (the certain outcome)
  • An alternative representation of the probability
    distribution is the cumulative distribution
    function, F(x)
  • Definition: F(x) = P(X ≤ x)
  • In terms of the probability mass function:
    F(x) = Σ(xi ≤ x) p(xi)
  • Properties of F(x)
  • (1) 0 ≤ F(x) ≤ 1
  • (2) F(-∞) = 0
  • (3) F(∞) = 1

7
Discrete Random Variables
  • Ex: S = {0, 1, 2, 3}, four possible outcomes
  • p(0) = 1/8, p(1) = 3/8, p(2) = 3/8, p(3) = 1/8
  • Σ(i = 0 to 3) p(xi) = 1/8 + 3/8 + 3/8 + 1/8 = 1

[Figure: bar chart of the pmf p(Xi) and step plot of the cdf F(Xi)]

  Xi   p(Xi)   F(Xi)
   0    1/8    0.125
   1    3/8    0.500
   2    3/8    0.875
   3    1/8    1.000
8
Discrete Random Variables
  • Random numbers in digital computers
  • Used to recapture a discrete distribution in
    digital computers
  • Generated in digital computers as pseudo-random
    numbers
  • Uniformly distributed between 0 and 1 -
    RN ~ UN(0, 1)
  • How many distinct RN values can be specified
    (how fine is the spacing between adjacent RNs)?
  • Depends on the bit capacity of the computer
    (the largest integer that can be represented)
  • e.g., an 8-bit computer: 2^8 = 256 integers

    (i)   Integer (ri)   RN = ri/255   xi in previous ex.
     1         35         0.1372549          1
     2        219         0.8588235          2
     3        172         0.6745098          2
     4        105         0.4117647          1
     5          1         0.0039216          0
     6         91         0.3568627          1
    ...       ...            ...            ...
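The table above is the inverse-transform idea: each RN u is mapped to the smallest xi whose cumulative probability F(xi) exceeds u. A minimal sketch using the pmf from the earlier example (the function name is ours, not from the slides):

```python
# Example pmf from the earlier slide: p(0)=1/8, p(1)=3/8, p(2)=3/8, p(3)=1/8
PMF = [(0, 1/8), (1, 3/8), (2, 3/8), (3, 1/8)]

def inverse_transform(u, pmf=PMF):
    """Map a uniform random number u in [0, 1) to a discrete outcome
    by walking the cumulative distribution function."""
    cumulative = 0.0
    for value, prob in pmf:
        cumulative += prob
        if u < cumulative:
            return value
    return pmf[-1][0]  # guard against floating-point round-off

# The table rows above: ri/255 for an 8-bit generator
for ri in (35, 219, 172, 105, 1, 91):
    u = ri / 255
    print(ri, round(u, 7), inverse_transform(u))  # reproduces the xi column
```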

9
Continuous Random Variables
  • Probability density function (pdf), f(x)
  • Definition: P(a ≤ X ≤ b) = ∫(from a to b) f(x) dx
  • Conditions:
  • (1) f(x) ≥ 0, and
  • (2) ∫(from -∞ to ∞) f(x) dx = 1
  • Cumulative distribution function (cdf), F(x)
  • Definition: F(x) = ∫(from -∞ to x) f(y) dy
    = P(X ≤ x)
  • Defines the probability that the continuous
    random variable X assumes a value less than or
    equal to x

[Figure: pdf f(x); the shaded area under the curve between a and b is P(a ≤ X ≤ b)]
10
Continuous Random Variables
  • Ex: RNs take an uncountably infinite number of
    possible continuous values between 0 and 1, all
    with equal probability density

[Figure: UN(0, 1) (UNFRM(0, 1)) distribution; pdf f(X) = 1 on [0, 1] and cdf F(X) = X; shaded area P(0.50 ≤ X ≤ 0.75) = ∫(from 0.50 to 0.75) 1 dX = 0.25]
11
Discrete and Continuous Random Variables -
Mixed Distribution
  • Ex: Value 1 with probability 1/3: p(1) = 1/3
  • Value 2 with probability 1/3: p(2) = 1/3
  • Uniformly between 1 and 2 with probability 1/3:
    P(1 < X < 2) = 1/3
  • Total = 1

[Figure: pdf f(X) with probability masses of 1/3 at X = 1 and X = 2 plus a uniform density of 1/3 over (1, 2); cdf F(X) jumps to 1/3 ≈ 0.33 at X = 1, rises linearly as F(x) = 1/3 + (x - 1)/3 to 2/3 ≈ 0.67 just below X = 2, then jumps to 1.00 at X = 2]
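The mixed distribution can be sampled by first choosing which of the three equal-probability branches applies; a minimal sketch (function names are ours), checked against the closed-form cdf above:

```python
import random

def sample_mixed(rng):
    """Draw from the mixed distribution: mass 1/3 at 1, mass 1/3 at 2,
    and the remaining 1/3 spread uniformly over (1, 2)."""
    u = rng.random()
    if u < 1/3:
        return 1.0
    if u < 2/3:
        return 2.0
    return rng.uniform(1.0, 2.0)

def cdf_mixed(x):
    """Closed-form cdf: jump to 1/3 at x = 1, slope 1/3 on (1, 2),
    jump to 1 at x = 2 (F(x) = 1/3 + (x - 1)/3 on [1, 2))."""
    if x < 1:
        return 0.0
    if x < 2:
        return 1/3 + (x - 1) / 3
    return 1.0

rng = random.Random(42)
samples = [sample_mixed(rng) for _ in range(100_000)]
# Empirical P(X <= 1.5) should be close to F(1.5) = 1/3 + 0.5/3 = 0.5
empirical = sum(s <= 1.5 for s in samples) / len(samples)
print(empirical, cdf_mixed(1.5))
```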
12
Expectation and Moments
  • Used to characterize probability distribution
    functions
  • The expectation (expected value) of a random
    variable x, E[x]:
  • E[x] = Σ(all i) xi p(xi) when x is discrete
  • E[x] = ∫(all x) x f(x) dx when x is continuous
  • In general, can be taken of a function of x:
  • E[x^n] = Σ(all i) xi^n p(xi) when x is discrete
  • E[x^n] = ∫(all x) x^n f(x) dx when x is
    continuous
  • The expectation of x^n is defined as the nth
    moment of the random variable
  • The expected value is the special case n = 1; it
    is thus called the first moment - the mean

13
Expectation and Moments
  • A variant of the nth moment is the nth moment of
    a random variable about the mean:
  • E[(x - E[x])^n]
  • Important: the second moment about the mean,
  • E[(x - E[x])^2] = σ² = Var[x]
  • where
  • Var[x], the variance of x, measures the spread
    of the probability distribution
  • σ is the standard deviation of the random
    variable
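Applied to the pmf of the earlier example, these definitions can be checked directly; the sketch below uses the standard identity Var[x] = E[x²] − (E[x])² (the helper name is ours):

```python
import math

# pmf from the earlier example: p(0)=1/8, p(1)=3/8, p(2)=3/8, p(3)=1/8
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

def moment(pmf, n):
    """nth moment E[x^n] = sum over all i of xi^n * p(xi)."""
    return sum(x**n * p for x, p in pmf.items())

mean = moment(pmf, 1)          # first moment (the mean)
second = moment(pmf, 2)        # second moment
variance = second - mean**2    # E[(x - E[x])^2] = E[x^2] - (E[x])^2
sigma = math.sqrt(variance)    # standard deviation
print(mean, variance, sigma)   # 1.5 0.75 0.866...
```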

14
Expectation and Moments
  • Other higher-order moments measure the shape of
    probability distributions
  • Skewness - measures whether the distribution is
    symmetric
  • Kurtosis - measures flatness or peakedness

[Figure: positively and negatively skewed distributions, with the mode, median, and mean marked; a platykurtic distribution (flat, with short broad tails - "like a platypus," the ant-eating aquatic mammal of Australia) vs. a leptokurtic distribution (peaked, with long thin tails - "leaping like kangaroos")]
15
Covariance
  • For two random variables x and y:
  • Cov[x, y] = E[(x - E[x])(y - E[y])]
  • Measures the linear association between x and y
    (a linear association is not necessarily a causal
    relationship)
  • If x and y are independent, then Cov[x, y] = 0
  • Formally, p(y|x) = p(y) for discrete,
  • f(y|x) = f(y) for continuous
  • Measure of dependence - the correlation
    coefficient, ρ = Cov[x, y] / (σx σy)
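A small sketch of sample covariance and the correlation coefficient ρ (the toy data and function names are ours, not from the slides):

```python
def covariance(xs, ys):
    """Sample covariance: average product of deviations from the means."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Correlation coefficient rho = Cov[x, y] / (sigma_x * sigma_y)."""
    return covariance(xs, ys) / (covariance(xs, xs) * covariance(ys, ys)) ** 0.5

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]       # y = 2x: perfect linear association
zs = [5, 1, 4, 2, 3]        # no clear linear association
print(correlation(xs, ys))  # 1.0 (up to round-off)
print(correlation(xs, zs))
```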

16
Functions of Random Variables and Their Properties
  • Properties of the expectation of functions of RVs
  • E[x + y] = E[x] + E[y]; x + y is a RV
  • E[kx] = k E[x]; kx is a RV
  • E[x + k] = E[x] + k; x + k is a RV
  • where k is an arbitrary constant
  • Properties of variances
  • Var[x + y] = Var[x] + Var[y] + 2 Cov[x, y]
  • If x, y are independent, Var[x + y] = Var[x] +
    Var[y]
  • Var[kx] = k² Var[x]
  • Var[x + k] = Var[x]
  • Var[kx + ny] = k² Var[x] + n² Var[y] +
    2kn Cov[x, y]

17
Sample Mean and Variance
  • For I samples X1, X2, ..., XI from a probability
    distribution, the sample mean is the sum of the
    samples divided by the number of samples:
  • X̄I = (X1 + X2 + ... + XI) / I
  • If the Xi are assumed to be independent and
    identically distributed (iid), the expected value
    and variance of the sample mean are
  • E[X̄I] = (E[X1] + E[X2] + ... + E[XI]) / I = E[X]
  • Var[X̄I] = (Var[X1] + Var[X2] + ... + Var[XI]) / I²
    = Var[X] / I
  • only applicable when the Xi are iid

18
Sample Mean and Variance
  • The variance of the sample mean of I identical
    samples is a factor 1/I smaller than the variance
    of the random variable from which the samples are
    drawn
  • So use a large I to reduce the variance of the
    sample mean and improve the accuracy of the mean
    estimate
  • Only when samples are independent! If not, the
    covariance between samples must also be
    calculated
  • Simulation results are all correlated
  • Ex: Waiting times of successive customers are
    correlated because the (i+1)th customer's waiting
    time depends on the ith customer's waiting (and
    service) time
  • i+1 → i → i-1 → ... → 1 (correlated /
    autocorrelated samples)
  • So the variance of the average waiting time
    cannot be estimated by simply dividing the
    variance of the waiting time by the number of
    samples
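For iid samples the 1/I reduction can be checked directly; the sketch below uses UN(0, 1) draws, for which Var[X] = 1/12 (with correlated simulation output this check would fail, as the slide warns):

```python
import random, statistics

rng = random.Random(7)

def sample_mean(i):
    """Mean of i iid UN(0, 1) draws."""
    return sum(rng.random() for _ in range(i)) / i

# Var[X] = 1/12 for UN(0, 1); Var[sample mean of I draws] should be ~ 1/(12 I)
for i in (1, 4, 16):
    means = [sample_mean(i) for _ in range(20_000)]
    print(i, statistics.variance(means), 1 / (12 * i))
```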

19
Random Sum of Independent Random Variables
  • If x1, x2, ..., xk are iid random variables, and
    k is a discrete random variable independent of
    the xi, then for the sum
  • y = x1 + x2 + ... + xk
  • we have E[y] = E[x] E[k]
  • and Var[y] = E[k] Var[x] + Var[k] E²[x]
  • Ex: Drive-through bank teller
  • Number of transactions of each customer: k
  • Time to complete each transaction: xi
  • (k and the xi are independent)
  • Time to serve each customer: y
20
Law of Large Numbers Central Limit Theorem
  • Characteristics of X̄I as the number of samples → ∞
  • Law of Large Numbers
  • as the sample size I → ∞ (with probability 1),
    X̄ → E[X]
  • The associated result, the weak law of large
    numbers:
  • lim(I→∞) P{|X̄ - E[X]| > ε} = 0 for any small
    positive ε
  • The probability that the difference between X̄
    and E[X] exceeds ε approaches 0 as I approaches
    infinity
  • Central Limit Theorem
  • Under certain mild conditions, the distribution
    of the sum of I independent samples of X
    approaches the normal distribution as I
    approaches infinity, regardless of the
    distribution of X
21
Data Collection and Analysis
  • For input data preparation and output analysis
  • Descriptive statistics
  • Data representation: organizing data in the form
    of probability distributions

22
Data Collection and Analysis
  • Ex: Customer waiting times collected
  • Raw data (in seconds): 15, 65, 31, 3, 125, ...
  • Frequency distribution table: count the frequency
    of occurrence in cells/ranges

  Waiting Time (sec)   Number of Customers   Relative Freq.   Cumulative Freq.
    0 - 20                    21                 0.122            0.122
   20 - 40                    35                 0.203            0.325
   40 - 60 (mode)             42                 0.244            0.569
   60 - 80                    35                 0.203            0.772
   80 - 100                   19                 0.110            0.882
  100 - 120                   10                 0.058            0.940
    > 120                     10                 0.058            0.998 ≈ 1
  Total                      172                   ≈ 1

[Figure: histogram of the number of customers per waiting-time cell (21, 35, 42, 35, 19, 10, 10 for the cells 0-20 through > 120); a discrete distribution, often user-defined]
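The binning procedure behind such a table can be sketched in Python. The data values below are hypothetical stand-ins (the slide's raw data is only partially shown), and the cell edges match the table:

```python
# Hypothetical waiting times (seconds) standing in for the elided raw data
data = [15, 65, 31, 3, 125, 48, 72, 55, 90, 110, 22, 41, 67, 8, 133]

edges = [0, 20, 40, 60, 80, 100, 120]   # cell boundaries from the table
counts = [0] * len(edges)               # last entry is the open "> 120" cell

for x in data:
    for i in range(len(edges) - 1):
        if edges[i] <= x < edges[i + 1]:
            counts[i] += 1
            break
    else:
        counts[-1] += 1                 # x fell past the last boundary

total = len(data)
cumulative = 0.0
for i, c in enumerate(counts):
    label = f"> {edges[-1]}" if i == len(edges) - 1 else f"{edges[i]} - {edges[i+1]}"
    cumulative += c / total
    print(f"{label:>9}  {c:3d}  {c/total:.3f}  {cumulative:.3f}")
```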
23
Data Collection and Analysis
  • Note
  • Class widths (cell widths) should be of equal
    length, except possibly the first (which may
    extend from -∞) and the last (which may extend
    to ∞)
  • No overlapping class intervals - each data point
    has a unique class assignment
  • 5 - 20 classes are normally used (application
    dependent)
  • Carefully choose the first and last classes

[Figure: empirical cdf of the waiting times, rising through 0.122, 0.325, 0.569, 0.772, 0.882, 0.940, 1.000 at the cell boundaries 20, 40, 60, 80, 100, 120, > 120; the distribution within each cell (0-20, 20-40, ...) may be treated as uniform or user-defined]
24
Data Collection and Analysis
  • A stem-and-leaf diagram
  • A special type of frequency diagram/histogram

    91-100   9 | 3 1 0
    81-90    8 | 3 7 9 8 6 5
    71-80    7 | 7 8 3 4 5 7 8 9 6     (the leaf counts
    61-70    6 | 7 9 3 1 0              trace the underlying
    51-60    5 | 3 4 8 1                distribution)
    < 50     4 | 7

  • n = 28, H = 93, L = 47
  • X̄ = 73.64, σn-1 = 13.38 (σn = 13.14)
  • Median = 77

25
Parameter Estimation
[Diagram: a sample is drawn from the population; the population mean μ and variance σ² are estimated by the sample mean X̄ and sample variance S²]
26
Formulas for Sample Mean and Sample Variance
  • Statistics based on observations vs. statistics
    for time-persistent variables
  • Sample mean (observations):
    X̄ = (1/n) Σ(i = 1 to n) Xi
  • Sample mean (time-persistent):
    X̄(T) = (1/T) ∫(from 0 to T) X(t) dt
  • Sample variance (observations):
    S² = (1/(n - 1)) Σ(i = 1 to n) (Xi - X̄)²
  • Sample variance (time-persistent):
    S²(T) = (1/T) ∫(from 0 to T) (X(t) - X̄(T))² dt
  • Another useful statistic: the coefficient of
    variation, Sx / X̄
  • Formally, estimates that specify a single value
    (parameter) of the population are called point
    estimates, while estimates that specify a range
    of values are called interval estimates
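Both kinds of statistics can be computed with a few lines; the data values below are hypothetical (e.g., waiting times in seconds for the observation-based case, queue lengths held over durations for the time-persistent case):

```python
import math

# Hypothetical observations (e.g., waiting times in seconds)
xs = [15, 65, 31, 3, 125, 48, 72]
n = len(xs)

mean = sum(xs) / n                               # observation-based sample mean
s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)  # sample variance (n - 1 divisor)
cv = math.sqrt(s2) / mean                        # coefficient of variation

# Time-persistent variable: each value is held for some duration
values = [0, 2, 1, 3]                 # hypothetical queue lengths
durations = [5.0, 2.0, 4.0, 1.0]      # time spent at each length
time_avg = sum(v * d for v, d in zip(values, durations)) / sum(durations)

print(mean, s2, cv, time_avg)
```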

27
Distribution Estimation
  • Use the collected data to identify (fit) the
    underlying distribution of the population
  • Approach:
  • Assume the data follow a particular statistical
    distribution - hypothesis
  • Apply one or more goodness-of-fit tests to the
    sample data - inference (see how the parameters
    are estimated)
  • Commonly used tests: the Chi-Square test and the
    Kolmogorov-Smirnov test
  • Judge the outcome of the tests - whether the fit
    holds (under a specified level of statistical
    significance)
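As a minimal sketch of the Chi-Square test idea, with hypothetical observed counts for 600 rolls of a supposedly fair die: the statistic Σ(O - E)²/E is compared against a chi-square critical value (about 11.07 for 5 degrees of freedom at the 0.05 level); since 3.0 < 11.07, the fair-die hypothesis would not be rejected here:

```python
# Hypothetical observed counts for 600 rolls of a die
observed = [95, 108, 92, 111, 99, 95]
expected = [sum(observed) / 6] * 6   # 100 each under the fair-die hypothesis

# Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E over the cells
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)   # 3.0; compare against the critical value with 5 degrees of freedom
```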

28
Four Properties of a Good Estimator (1)
  • Unbiasedness
  • An unbiased estimator has an expected value
    equal to the true value of the parameter being
    estimated, i.e.,
  • E[estimator] = population parameter
  • for the mean: E[X̄I] = μ
  • E[Sx²] = σ²
  • but E[Sx] ≠ σ - the expected value of a square
    root is not, in general, the square root of the
    expected value
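A simulation sketch of this point (normal data with σ = 2 is our assumption, not from the slides): across many small samples, the average S² lands on σ² = 4, while the average S falls noticeably below σ = 2:

```python
import random, statistics, math

rng = random.Random(5)

s2_values, s_values = [], []
for _ in range(40_000):
    xs = [rng.gauss(0, 2) for _ in range(5)]   # small samples, n = 5
    s2 = statistics.variance(xs)               # divides by n - 1
    s2_values.append(s2)
    s_values.append(math.sqrt(s2))

print(sum(s2_values) / len(s2_values))   # close to sigma^2 = 4 (unbiased)
print(sum(s_values) / len(s_values))     # noticeably below sigma = 2 (biased)
```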

29
Four Properties of a Good Estimator (2a)
  • Efficiency
  • The most efficient estimator among a group of
    unbiased estimators is the one with the smallest
    variance
  • Ex: Three different estimators' distributions
  • 1 and 2: expected value = population parameter
    (unbiased)
  • 3: positively biased
  • Variance decreases from 1, to 2, to 3 (3 has the
    smallest)
  • Conclusion: 2 is the most efficient (unbiased
    with the smallest variance)

[Figure: sampling distributions of estimators 1, 2, and 3, based on samples of the same size; 1 and 2 are centered on the population parameter, 3 lies to its right; the spread narrows from 1 to 2 to 3]
30
Four Properties of a Good Estimator (2b)
  • Efficiency (continued)
  • Relative efficiency: since it is difficult to
    prove that an estimator is the best among all
    unbiased ones, use the ratio of the variances of
    two estimators
  • Ex: Sample mean vs. sample median (normal
    population)
  • Variance of the sample mean: σ²/n
  • Variance of the sample median: πσ²/2n
  • Var[median] / Var[mean] = (πσ²/2n) / (σ²/n)
    = π/2 ≈ 1.57
  • Therefore, the sample median is 1.57 times less
    efficient than the sample mean
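The π/2 ratio can be approximated by simulation (our own sketch, using normal samples; the ratio approaches π/2 ≈ 1.57 as n grows):

```python
import random, statistics

rng = random.Random(11)
n = 25    # odd sample size so the median is a single observation

means, medians = [], []
for _ in range(20_000):
    xs = [rng.gauss(0, 1) for _ in range(n)]
    means.append(statistics.mean(xs))
    medians.append(statistics.median(xs))

ratio = statistics.variance(medians) / statistics.variance(means)
print(ratio)   # near pi/2 = 1.57 for large n
```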

31
Four Properties of a Good Estimator (3)
  • Sufficiency
  • A necessary condition for efficiency
  • A sufficient estimator uses all the information
    about the population parameter that the sample
    can provide - it takes into account each of the
    sample observations
  • Ex: The sample median is not a sufficient
    estimator, because only the ranking of the
    observations is used; the distances between
    adjacent values are ignored

32
Four Properties of a Good Estimator (4)
  • Consistency
  • A consistent estimator yields estimates that
    converge in probability to the population
    parameter being estimated as n (the sample size)
    becomes larger
  • That is, as n → ∞, the estimator becomes
    unbiased and its variance approaches 0
  • Ex: X/n is an unbiased estimator of the
    population proportion p
  • Variance: Var[X/n] = (1/n²) Var[X] = (1/n²)(npq)
    = pq/n
  • (since X is binomially distributed, with
    q = 1 - p)
  • As n → ∞, pq/n → 0, i.e., X/n is a consistent
    estimator of p
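The shrinking pq/n variance shows up directly in simulation; a sketch with an assumed true proportion p = 0.3 (the estimate X/n tightens around p as n grows):

```python
import random

rng = random.Random(99)
p = 0.3   # assumed true population proportion

def proportion_estimate(n):
    """X/n, where X counts successes in n Bernoulli(p) trials."""
    x = sum(rng.random() < p for _ in range(n))
    return x / n

for n in (10, 1_000, 100_000):
    print(n, proportion_estimate(n))   # converges toward p = 0.3 as n grows
```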