Title: Sampling Distributions
1. Sampling Distributions and Point Estimation
2. Questions
- What is a sampling distribution?
- What is the standard error?
- What is the principle of maximum likelihood?
- What is bias (in the statistical sense)?
- What is a confidence interval?
- What is the central limit theorem?
- Why is the number 1.96 a big deal?
3. Population
- Population Sample Space
- Population vs. sample
- Population parameter, sample statistic
4. Parameter Estimation
- We use statistics to estimate parameters,
- e.g., the effectiveness of pilot training, the effectiveness of psychotherapy.
5. Sampling Distribution (1)
- A sampling distribution is the distribution of a statistic over all possible samples.
- To get a sampling distribution (a simulation sketch follows this slide):
  1. Take a sample of size N (a given number like 5, 10, or 1000) from a population.
  2. Compute the statistic (e.g., the mean) and record it.
  3. Repeat steps 1 and 2 many times (infinitely, for large populations).
  4. Plot the result: the sampling distribution, a distribution of the statistic over repeated samples.
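A minimal simulation sketch of these four steps in Python (the die population, sample size, and number of repetitions are illustrative choices, not part of the slides):

    import random
    from statistics import mean

    def sampling_distribution_of_mean(population, n, reps=10000):
        # Steps 1-3: draw many samples of size n (with replacement) and record each sample mean.
        means = []
        for _ in range(reps):
            sample = random.choices(population, k=n)   # step 1: sample of size N
            means.append(mean(sample))                 # step 2: compute and record the statistic
        return means                                   # step 3: repeated reps times

    die = [1, 2, 3, 4, 5, 6]
    means = sampling_distribution_of_mean(die, n=2)
    print(mean(means))   # step 4 would plot these; their average should be near 3.5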
6. Suppose
- The population has 6 elements: 1, 2, 3, 4, 5, 6 (like the numbers on dice).
- We want to find the sampling distribution of the mean for N = 2.
- If we sample with replacement, what can happen?
7. Possible Outcomes
All 36 equally likely samples of size N = 2 (M is the sample mean).
1st 2nd  M   |  1st 2nd  M   |  1st 2nd  M
 1   1  1    |   3   1  2    |   5   1  3
 1   2  1.5  |   3   2  2.5  |   5   2  3.5
 1   3  2    |   3   3  3    |   5   3  4
 1   4  2.5  |   3   4  3.5  |   5   4  4.5
 1   5  3    |   3   5  4    |   5   5  5
 1   6  3.5  |   3   6  4.5  |   5   6  5.5
 2   1  1.5  |   4   1  2.5  |   6   1  3.5
 2   2  2    |   4   2  3    |   6   2  4
 2   3  2.5  |   4   3  3.5  |   6   3  4.5
 2   4  3    |   4   4  4    |   6   4  5
 2   5  3.5  |   4   5  4.5  |   6   5  5.5
 2   6  4    |   4   6  5    |   6   6  6
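A short enumeration sketch in Python that reproduces this table and tallies how often each mean occurs (illustrative, not from the slides):

    from collections import Counter

    faces = [1, 2, 3, 4, 5, 6]
    # All 36 ordered pairs possible when sampling N = 2 with replacement.
    means = [(a + b) / 2 for a in faces for b in faces]
    for m, count in sorted(Counter(means).items()):
        # e.g., a mean of 1.0 occurs once (1/36); a mean of 3.5 occurs 6 times (6/36).
        print(m, count, count / 36)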
8. Histogram
Sampling distribution for the mean of 2 dice.
[Histogram of the possible means, which run from 1 to 6; the expected value is 21/6 = 3.5.]
There is only 1 way to get a mean of 1, but 6 ways to get a mean of 3.5.
9. Sampling Distribution (2)
- The sampling distribution shows the relation
between the probability of a statistic and the
statistic's value for all possible samples of
size N drawn from a population.
10. Sampling Distribution: Mean and SD
- The mean of the sampling distribution is defined the same way as for any other distribution (the expected value).
- The SD of the sampling distribution is the standard error. Important and useful (see the symbols after this slide).
- The variance of the sampling distribution is the expected value of the squared deviations from its mean, i.e., a mean square.
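In symbols (standard definitions, stated here for reference; T denotes the statistic whose sampling distribution we have):

    \mu_T = E(T), \qquad
    \sigma_T^2 = E\!\left[(T - \mu_T)^2\right], \qquad
    \sigma_T = \sqrt{\sigma_T^2}\ \text{(the standard error of } T\text{)}.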
11. Review
- What is a sampling distribution?
- What is the standard error of a statistic?
12. Statistics as Estimators
- We use sample data to compute statistics.
- The statistics estimate population values, e.g., the sample mean M estimates the population mean (mu).
- An estimator is a method for producing a best guess about a population value.
- An estimate is a specific value provided by an estimator.
- We want good estimates. What is a good estimator? What properties should it have?
13. Maximum Likelihood (1)
- Likelihood is a conditional probability: the likelihood of theta given the data x is the probability of x given theta.
- L is the probability (say) that x has some value given that the parameter theta has some value. For example, L1 is the probability of observing heights of 68 and 70 inches given theta = adult males; L2 is the probability of observing 68 and 70 inches given theta = adult females.
- Theta could be continuous or discrete.
14. Maximum Likelihood (2)
- Suppose we know the function (e.g., binomial, normal) but not the value of theta.
- The maximum likelihood principle says: take as the estimate of theta the value that makes the likelihood of the data a maximum.
- That is, the MLP says: choose the value of theta that makes this likelihood a maximum.
15. Maximum Likelihood (3)
- Suppose we have 2 hypothesized values for the proportion of male grad students at USF: .50 and .40. We randomly sample 15 students and find that 9 are male.
- Calculate the likelihood for each using the binomial (a sketch of the computation follows this slide).
- The .50 estimate is better because it makes the data more likely.
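A minimal sketch of the binomial likelihood comparison in Python (the counts come from the slide; the function name is just for illustration):

    from math import comb

    def binomial_likelihood(p, k=9, n=15):
        # P(9 males out of 15 | theta = p) under the binomial distribution.
        return comb(n, k) * p**k * (1 - p)**(n - k)

    print(binomial_likelihood(0.50))   # about 0.153
    print(binomial_likelihood(0.40))   # about 0.061, so theta = .50 makes the data more likely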
16. Likelihood Function
The binomial distribution computes the probabilities.
[Plot of the likelihood function: likelihood (y-axis) against theta, the p value (x-axis).]
17. Maximum Likelihood (4)
- In the example, the best (maximum likelihood) estimate would be 9/15 = .60 (see the grid check below).
- There is a general class, called maximum likelihood estimators, that finds the value of theta that maximizes the likelihood of a sample result.
- ML is one principle of goodness of an estimator.
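A quick grid check of that claim (Python; a self-contained restatement of the likelihood sketched above):

    from math import comb

    def lik(p, k=9, n=15):
        # Binomial likelihood of 9 males in 15, as on the previous slides.
        return comb(n, k) * p**k * (1 - p)**(n - k)

    candidates = [i / 100 for i in range(1, 100)]
    print(max(candidates, key=lik))   # 0.6, the sample proportion 9/15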
18. More Goodness (1)
- Bias. If E(statistic) = parameter, the estimator is unbiased. If it is unbiased, the mean of the sampling distribution equals the parameter. The sample mean has this property: E(M) = mu.
- The sample variance (SS divided by N) is biased (a simulation sketch follows this slide).
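A small simulation sketch of that bias (Python; the normal population with variance 4, the sample size, and the replication count are arbitrary illustrative choices):

    import random
    from statistics import mean

    random.seed(1)
    sigma2, N, reps = 4.0, 5, 20000
    biased = []
    for _ in range(reps):
        x = [random.gauss(0, 2) for _ in range(N)]   # population SD = 2, variance = 4
        m = mean(x)
        biased.append(sum((xi - m) ** 2 for xi in x) / N)   # SS / N

    print(mean(biased))   # roughly ((N - 1)/N) * sigma2 = 3.2, not 4.0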
19. More Goodness (2)
- Efficiency refers to the size of the sampling variance.
- Relative efficiency is the ratio of two sampling variances.
- More efficient statistics have smaller sampling variances (smaller standard errors) and are preferred because, if both estimators are unbiased, the more efficient one is closer to the parameter on average.
20. Goodness (3)
- Sometimes we trade off bias and efficiency. A biased estimator is sometimes preferred if it is more efficient, especially if the magnitude of the bias is known.
- Resistance indicates minimal influence of outliers. The median is more resistant than the mean.
21. Sampling Distribution of the Mean
- Unbiased: E(M) = mu.
- Variance of the sampling distribution of means based on N observations: sigma_M^2 = sigma^2 / N.
- Standard error of the mean: sigma_M = sigma / sqrt(N) (checked by simulation below).
- Law of large numbers: large samples produce sample estimates very close to the parameter.
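A brief simulation check of sigma_M = sigma / sqrt(N) (Python; the population mean and SD, N, and the replication count are illustrative choices):

    import random
    from statistics import mean, pstdev

    random.seed(2)
    sigma, N, reps = 10.0, 25, 20000
    sample_means = [mean(random.gauss(50, sigma) for _ in range(N)) for _ in range(reps)]
    print(pstdev(sample_means))   # close to sigma / sqrt(N) = 10 / 5 = 2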
22. Unbiased Estimate of Variance
- It can be shown (the identity is stated after this slide) that E(SS/N) = ((N - 1)/N) * sigma^2, i.e., the sample variance is too small by a factor of (N - 1)/N.
- We fix this by dividing SS by N - 1 instead of N: s^2 = SS/(N - 1).
- Although this variance estimate is unbiased, the SD is still biased, but most inferential work is based on the variance, not the SD.
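The textbook identity behind that correction (stated for reference; SS is the sum of squared deviations from the sample mean M):

    E(SS) = E\!\left[\sum_{i=1}^{N}(X_i - M)^2\right] = (N - 1)\,\sigma^2,
    \quad\text{so}\quad
    E\!\left(\frac{SS}{N}\right) = \frac{N - 1}{N}\,\sigma^2
    \quad\text{and}\quad
    E\!\left(\frac{SS}{N - 1}\right) = \sigma^2.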
23. Review
- What is the principle of maximum likelihood?
- Define
- Bias
- Efficiency
- Resistance
- Is the sample variance (SS divided by N) a biased
estimator?
24. Interval Estimation
- Use the standard error of the mean to create a bracket or confidence interval to show where good estimates of the mean are.
- The sampling distribution of the mean is nice (unimodal and symmetric) when N > 20.
- Suppose M = 100, SD = 14, N = 49. Then SD_M = 14/7 = 2, and a bracket of 3 standard errors, 100 - 6 to 100 + 6, runs from 94 to 106.
- P is the probability of the sample, not of mu.
25. Review
- What is a confidence interval?
- Suppose M = 50, SD = 10, and N = 100. What is the confidence interval?
  SEM = 10/sqrt(100) = 10/10 = 1
  CI (lower) = M - 3*SEM = 50 - 3 = 47
  CI (upper) = M + 3*SEM = 50 + 3 = 53
  CI: 47 to 53 (a small helper for this arithmetic is sketched below)
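A minimal helper that reproduces this arithmetic (Python; the function name and the 3-standard-error bracket follow the slide, not any particular library):

    from math import sqrt

    def bracket(m, sd, n, z=3.0):
        # Return (lower, upper) = m -/+ z standard errors of the mean.
        sem = sd / sqrt(n)
        return m - z * sem, m + z * sem

    print(bracket(50, 10, 100))          # (47.0, 53.0)
    print(bracket(50, 10, 100, z=1.96))  # the usual 95% interval, about (48.04, 51.96)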
26. Central Limit Theorem
- 1. The sampling distribution of means becomes normal as N increases, regardless of the shape of the original distribution.
- 2. The binomial becomes normal as N increases.
- 3. Applies to other statistics as well (e.g., the variance).
27. Properties of the Normal
- If a distribution is normal, the sampling distribution of the mean is normal regardless of N.
- If a distribution is normal, the sampling distributions of the mean and variance are independent.
28. Confidence Intervals for the Mean
- Over samples of size N, the probability is .95 that mu - 1.96*sigma_M <= M <= mu + 1.96*sigma_M.
- Similarly, for sample values of the mean, the probability is .95 that M - 1.96*sigma_M <= mu <= M + 1.96*sigma_M.
- The population mean is likely to be within 2 standard errors of the sample mean.
- We can use the normal to create a confidence interval of any size (85, 99, etc.); a sketch of how to find the multiplier follows this slide.
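One way to get the multiplier for an arbitrary confidence level, using Python's standard library (an illustration, not part of the slides):

    from statistics import NormalDist

    def z_for(confidence):
        # Two-sided normal multiplier, e.g., 1.96 for a 95% confidence interval.
        return NormalDist().inv_cdf(0.5 + confidence / 2)

    print(round(z_for(0.95), 2))   # 1.96
    print(round(z_for(0.99), 2))   # 2.58
    print(round(z_for(0.85), 2))   # 1.44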
29. Size of the Confidence Interval
- The size of the confidence interval depends on the desired certainty (e.g., 95 vs. 99 percent) and on the size of the standard error of the mean (sigma_M).
- The standard error of the mean is controlled by the population SD and the sample size; we can control the sample size.
- Suppose SD = 10. If N = 25, then SEM = 2 and the CI width is about 8. If N = 100, then SEM = 1 and the CI width is about 4. The CI shrinks as N increases, but as N gets large the change in the CI becomes small because of the square root: less bang for the buck as N gets big (see the sketch below).
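A short sketch of that diminishing return (Python; SD = 10 and the 1.96 multiplier as in the slides, sample sizes chosen for illustration):

    from math import sqrt

    sd = 10.0
    for n in (25, 100, 400, 1600):
        sem = sd / sqrt(n)
        width = 2 * 1.96 * sem           # width of the 95% confidence interval
        print(n, sem, round(width, 2))   # widths of roughly 7.84, 3.92, 1.96, 0.98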
30. Review
- What is the central limit theorem?
- Why is the number 1.96 a big deal?
- Assume that scores on a curiosity scale are
normally distributed. If the sample mean is 50
based on 100 people and the population SD is 10,
find an approx 99 pct CI for the population mean.
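For reference, using the numbers in the question and the usual two-sided 99 percent multiplier of about 2.58, the interval works out to roughly:

    \sigma_M = \frac{10}{\sqrt{100}} = 1,
    \qquad
    50 \pm 2.58 \times 1 \;\approx\; 47.4 \text{ to } 52.6.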