Title: Sampling Distributions
1. Chapter 5
2. Introduction
- Distribution of a Sample Statistic: The probability distribution of a sample statistic obtained from a random sample or a randomized experiment
- What values can a sample mean (or proportion) take on, and how likely are ranges of values?
- Population Distribution: Set of values for a variable for a population of individuals. Conceptually equivalent to a probability distribution, in the sense of selecting an individual at random and observing their value of the variable of interest
3. Sampling Distributions for Counts and Proportions
- Binary outcomes: Each individual or realization can be classified as a Success or Failure (Presence/Absence of the characteristic of interest)
- Random Variable: X is the count of the number of successes in n trials
- Sample proportion: Proportion of successes in the sample
- Population proportion: Proportion of successes in the population
4. Binomial Distribution for Sample Counts
- Binomial Experiment:
  - Consists of n trials or observations
  - Trials/observations are independent of one another
  - Each trial/observation can end in one of two possible outcomes, often labelled Success and Failure
  - The probability of success, p, is constant across trials/observations
  - Random variable, X, is the number of successes observed in the n trials/observations.
- Binomial Distributions: Family of distributions for X, indexed by Success probability (p) and number of trials/observations (n). Notation: $X \sim B(n, p)$
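The binomial probabilities themselves can be computed directly from the standard formula $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$. The sketch below is only an illustration: the helper name `binom_pmf` and the values n = 5, p = 0.3 are assumptions chosen for the example, not taken from the slides.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p): count the ways to place k successes
    among n trials, then multiply by the probability of each such sequence."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical illustration values (not from the slides).
n, p = 5, 0.3
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

for k, prob in enumerate(pmf):
    print(f"P(X = {k}) = {prob:.4f}")

# The n + 1 probabilities should sum to 1 (up to floating-point rounding).
print(f"total = {sum(pmf):.4f}")
```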
5. Binomial Distributions and Sampling
- Problem: when sampling from a finite population, the probability of Success on later draws is altered after observing earlier individuals.
- When the population is much larger than the sample (say at least 20 times as large), the effect is minimal and we say X is approximately binomial
- Obtaining probabilities: Table C gives probabilities for various n and p. Note that for p > 0.5, use 1-p, and the entry for k is then $P(X = n-k)$
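Both points can be checked numerically. The sketch below compares exact sampling without replacement (the hypergeometric distribution, which the slides do not name) against the binomial approximation, and then checks the table trick for p > 0.5. The population sizes and helper names are hypothetical, chosen only for illustration.

```python
from math import comb, isclose

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def hypergeom_pmf(k, n, successes, pop_size):
    """P(X = k) when n individuals are drawn WITHOUT replacement from a
    population of pop_size that contains `successes` Successes."""
    return comb(successes, k) * comb(pop_size - successes, n - k) / comb(pop_size, n)

def max_gap(n, successes, pop_size):
    """Largest difference between the exact and binomial probabilities."""
    p = successes / pop_size
    return max(abs(hypergeom_pmf(k, n, successes, pop_size) - binom_pmf(k, n, p))
               for k in range(n + 1))

# Sample of 10 with 20% Successes (hypothetical populations).
small_pop_gap = max_gap(10, 4, 20)     # population only 2 times the sample
large_pop_gap = max_gap(10, 40, 200)   # population 20 times the sample
print(f"population  20: max gap = {small_pop_gap:.4f}")
print(f"population 200: max gap = {large_pop_gap:.4f}")

# Table trick for p > 0.5: P(X = k) under p equals the entry for n - k under 1 - p.
n, p, k = 10, 0.9, 8
table_entry = binom_pmf(n - k, n, 1 - p)
print(isclose(table_entry, binom_pmf(k, n, p)))
```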
6. Example: Diagnostic Test
- Test claims to have a sensitivity of 90% (among people with the condition, the probability of testing positive is .90)
- 10 people who are known to have the condition are identified; X is the number that correctly test positive
- Compare with Table C, n = 10, p = .10
- Table obtained in EXCEL with function BINOMDIST(k,n,p,FALSE)
- (TRUE option gives the cumulative distribution function $P(X \le k)$)
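For readers without a spreadsheet, the same kind of table can be produced with a few lines of Python. This is a sketch of an equivalent calculation, not the original table: the pmf column plays the role of the FALSE option and the running total plays the role of the TRUE option.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Diagnostic test from the slide: n = 10 people, sensitivity p = 0.90.
n, p = 10, 0.90
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

# Cumulative probabilities P(X <= k) as a running sum of the pmf.
cdf = []
running = 0.0
for prob in pmf:
    running += prob
    cdf.append(running)

print(" k   P(X = k)   P(X <= k)")
for k in range(n + 1):
    print(f"{k:2d}   {pmf[k]:.4f}     {cdf[k]:.4f}")
```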
7. Binomial Mean and Standard Deviation
- Let $S_i = 1$ if the $i$th individual was a success, 0 otherwise
- Then $P(S_i = 1) = p$ and $P(S_i = 0) = 1-p$
- Then $E(S_i) = \mu_S = 1(p) + 0(1-p) = p$
- Note that $X = S_1 + \cdots + S_n$ and that trials are independent
- Then $E(X) = \mu_X = n\mu_S = np$
- $V(S_i) = E(S_i^2) - \mu_S^2 = p - p^2 = p(1-p)$
- Then $V(X) = \sigma_X^2 = np(1-p)$
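For the diagnostic test ($n = 10$, $p = 0.90$): $\mu_X = 10(0.90) = 9$ and $\sigma_X = \sqrt{10(0.90)(0.10)} = \sqrt{0.9} \approx 0.95$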
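As a check on the formulas, the mean and variance can also be computed straight from the definition of expectation, summing over all possible counts. A minimal sketch for the diagnostic test (n = 10, p = 0.90); the helper name `binom_pmf` is an assumption for the example.

```python
from math import comb, sqrt

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.90
support = range(n + 1)

# Mean and variance from the definition: weighted sums over the support.
mean = sum(k * binom_pmf(k, n, p) for k in support)
variance = sum((k - mean) ** 2 * binom_pmf(k, n, p) for k in support)

print(f"mean:     {mean:.4f}   vs np       = {n * p:.4f}")
print(f"variance: {variance:.4f}   vs np(1-p)  = {n * p * (1 - p):.4f}")
print(f"sd:       {sqrt(variance):.4f}")
```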
8. Sample Proportions
- Counts of Successes (X) are rarely reported because they depend on the sample size (n)
- More common is to report the sample proportion of successes
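The mean and standard deviation of the sample proportion follow from the binomial results $E(X) = np$ and $V(X) = np(1-p)$. The notation $\hat{p}$ for the sample proportion is the usual convention and is assumed here; this is a short worked derivation rather than a quotation from the slides.

```latex
\hat{p} = \frac{X}{n}, \qquad
\mu_{\hat{p}} = E\!\left(\frac{X}{n}\right) = \frac{np}{n} = p, \qquad
\sigma_{\hat{p}} = \sqrt{\frac{V(X)}{n^{2}}}
                 = \sqrt{\frac{np(1-p)}{n^{2}}}
                 = \sqrt{\frac{p(1-p)}{n}}
```

For the diagnostic test ($n = 10$, $p = 0.90$) this gives $\mu_{\hat{p}} = 0.90$ and $\sigma_{\hat{p}} = \sqrt{0.09/10} \approx 0.095$.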
9. Sampling Distributions for Counts and Proportions
- For samples of size n, counts (and thus proportions) can take on only n+1 distinct possible values (0, 1, ..., n)
- As the sample size n gets large, so does the number of possible values, and the sampling distribution begins to approximate a normal distribution. Common rule of thumb: $np \ge 10$ and $n(1-p) \ge 10$ to use the normal approximation
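The rule of thumb is easy to apply mechanically. A small sketch, with a hypothetical helper name and the threshold of 10 taken from the rule above:

```python
def normal_approx_ok(n: int, p: float, threshold: float = 10.0) -> bool:
    """Rule of thumb: expected successes AND expected failures both at least 10."""
    return n * p >= threshold and n * (1 - p) >= threshold

# Diagnostic test: only about 1 expected failure, so the rule is not met.
print(normal_approx_ok(10, 0.90))

# n = 1000, p = 0.2: 200 expected successes and 800 expected failures.
print(normal_approx_ok(1000, 0.20))
```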
10. Sampling Distribution for $X \sim B(n = 1000, p = 0.2)$
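One way to see how close this distribution is to a normal curve is to tabulate the exact binomial probabilities next to the normal density with the same mean and standard deviation. The sketch below assumes that comparison is the point of the slide; the binomial probabilities are computed on the log scale (via `lgamma`) so that n = 1000 causes no numerical trouble.

```python
from math import exp, lgamma, log, pi, sqrt

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p), computed on the log scale for large n."""
    log_prob = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                + k * log(p) + (n - k) * log(1 - p))
    return exp(log_prob)

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return exp(-0.5 * z * z) / (sd * sqrt(2 * pi))

n, p = 1000, 0.2
mean = n * p                   # np
sd = sqrt(n * p * (1 - p))     # sqrt(np(1-p))
print(f"mean = {mean:.1f}, sd = {sd:.2f}")

print("  k   binomial   normal")
for k in range(160, 241, 10):
    print(f"{k:3d}   {binom_pmf(k, n, p):.5f}    {normal_pdf(k, mean, sd):.5f}")
```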
11. Using Z-Table for Approximate Probabilities
- To find probabilities of certain ranges of counts or proportions, we can make use of the fact that the sample counts and proportions are approximately normally distributed for large sample sizes.
- Define range of interest
- Obtain mean of the sampling distribution
- Obtain standard deviation of the sampling distribution
- Transform range of interest to range of Z-values
- Obtain (approximate) probabilities from Z-table
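The steps above can be sketched in code, with the standard normal CDF standing in for the Z-table. The range 0.18 to 0.22 and the values n = 1000, p = 0.2 are hypothetical choices for illustration.

```python
from math import erf, sqrt

def standard_normal_cdf(z: float) -> float:
    """P(Z <= z) for a standard normal Z; plays the role of the Z-table."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Step 1: range of interest for the sample proportion (hypothetical).
n, p = 1000, 0.2
lower, upper = 0.18, 0.22

# Steps 2 and 3: mean and standard deviation of the sampling distribution.
mean = p
sd = sqrt(p * (1 - p) / n)

# Step 4: transform the range to Z-values.
z_lower = (lower - mean) / sd
z_upper = (upper - mean) / sd

# Step 5: approximate probability from the normal CDF.
prob = standard_normal_cdf(z_upper) - standard_normal_cdf(z_lower)
print(f"z range: {z_lower:.2f} to {z_upper:.2f}")
print(f"approximate probability = {prob:.4f}")
```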
12. Sampling Distribution of a Sample Mean
- Obtain a sample of n independent measurements of a quantitative variable, $X_1, \ldots, X_n$, from a population with mean $\mu$ and standard deviation $\sigma$
- Averages will be less variable than the individual measurements
- Sampling distributions of averages will become more like a normal distribution as n increases (regardless of the shape of the population of individual measurements)
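A quick simulation illustrates how much less variable averages are. The sketch below assumes a Uniform(0, 1) population (an arbitrary choice, with $\mu = 0.5$ and $\sigma = \sqrt{1/12}$) and compares the observed spread of sample means with the standard result $\sigma/\sqrt{n}$. The seed, sample size, and number of replications are illustrative.

```python
import random
from math import sqrt
from statistics import fmean, stdev

rng = random.Random(2024)        # fixed seed so the run is repeatable

n = 25                           # measurements per sample (hypothetical)
reps = 5000                      # number of simulated samples
sigma = sqrt(1 / 12)             # sd of a Uniform(0, 1) population

# Draw many samples of size n and record each sample mean.
sample_means = [fmean(rng.random() for _ in range(n)) for _ in range(reps)]

print(f"sd of individual measurements: {sigma:.4f}")
print(f"sd of sample means (simulated): {stdev(sample_means):.4f}")
print(f"sigma / sqrt(n):                {sigma / sqrt(n):.4f}")
```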
13. Central Limit Theorem
- When random samples of size n are selected from any population with mean $\mu$ and finite standard deviation $\sigma$, the sampling distribution of the sample mean will be approximately normally distributed for large n, with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$
- Z-table can be used to approximate probabilities of ranges of values for sample means, as well as percentiles of their sampling distribution
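Python's standard library can stand in for the Z-table here. The sketch below uses `statistics.NormalDist` to approximate a tail probability and a percentile for a sample mean; the population values ($\mu = 100$, $\sigma = 15$) and sample size (n = 36) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical population and sample size, for illustration only.
mu, sigma, n = 100.0, 15.0, 36

# CLT: the sample mean is approximately normal with sd sigma / sqrt(n).
sampling_dist = NormalDist(mu, sigma / sqrt(n))

# Probability that the sample mean exceeds 105.
prob_above_105 = 1 - sampling_dist.cdf(105)

# 95th percentile of the sampling distribution of the sample mean.
pct_95 = sampling_dist.inv_cdf(0.95)

print(f"P(sample mean > 105) ~ {prob_above_105:.4f}")
print(f"95th percentile      ~ {pct_95:.2f}")
```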
14. Exponential Distribution
- Often used to model times: survival of components, time to complete tasks, time between customer arrivals at a checkout line, etc. Density is highly skewed
Figure: Sample means of size 10 ($\mu = 1$, $\sigma = 1/10^{0.5} \approx 0.32$)
Figure: Individual Measurements ($\mu = 1$, $\sigma = 1$)
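The two cases can be reproduced by simulation. This sketch draws individual exponential measurements with mean 1 and, separately, means of samples of size 10, then compares their spread with the values quoted above. The seed and number of replications are arbitrary choices.

```python
import random
from math import sqrt
from statistics import fmean, stdev

rng = random.Random(5)   # fixed seed so the run is repeatable
n, reps = 10, 5000

# Individual measurements: exponential with rate 1, so mean 1 and sd 1.
individuals = [rng.expovariate(1.0) for _ in range(reps)]

# Means of samples of size 10 from the same population.
sample_means = [fmean(rng.expovariate(1.0) for _ in range(n))
                for _ in range(reps)]

print(f"individuals:  mean = {fmean(individuals):.3f}, sd = {stdev(individuals):.3f}")
print(f"sample means: mean = {fmean(sample_means):.3f}, sd = {stdev(sample_means):.3f}")
print(f"theory for sample means: sd = 1/sqrt(10) = {1 / sqrt(n):.3f}")
```

A histogram of `individuals` would show the strong right skew, while a histogram of `sample_means` would already look much closer to symmetric.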
15. Miscellaneous Topics
- Normal approximation for sample counts and proportions is an example of the CLT ($X = S_1 + \cdots + S_n$)
- Any linear function of independent normal random variables is normal (use rules on means and variances to get parameters of the distribution)
- Generalizations of the CLT apply to cases where random variables are correlated (to an extent) and have different distributions (within reason)
- Variables made up of many small random influences will tend to be approximately normal
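For the linear-function rule, the parameters come from the usual rules: for independent X and Y, $aX + bY$ has mean $a\mu_X + b\mu_Y$ and variance $a^2\sigma_X^2 + b^2\sigma_Y^2$. A minimal sketch, where the helper `linear_combination` and the two distributions are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

def linear_combination(a: float, x: NormalDist, b: float, y: NormalDist) -> NormalDist:
    """Distribution of a*X + b*Y for INDEPENDENT normal X and Y."""
    mean = a * x.mean + b * y.mean
    sd = sqrt(a**2 * x.variance + b**2 * y.variance)
    return NormalDist(mean, sd)

# Hypothetical example: X ~ N(10, sd 3), Y ~ N(4, sd 4), difference D = X - Y.
x = NormalDist(10, 3)
y = NormalDist(4, 4)
d = linear_combination(1, x, -1, y)

print(f"D = X - Y: mean = {d.mean:.1f}, sd = {d.stdev:.1f}")
print(f"P(D > 0) ~ {1 - d.cdf(0):.4f}")
```

Note that the variances add even though the variables are subtracted, because the coefficient on Y enters squared.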