Title: Estimation Bias, Standard Error and Sampling Distribution
1 Estimation Bias, Standard Error and Sampling
Distribution
Topic 9
2From sample to population
- Inductive (inferential) statistical methods
Make inference about a population based on
information from a sample derived from that
population
[Diagram: sample -> inductive statistical methods -> population]
3Statistical Concepts of Sampling
- Suppose we want to estimate the mean birthweight
of Malay male live births in Singapore, 1992
- Due to logistical constraints, we decide to take
a random sample of 50 live births from the
records of all Malay male live births for that
year
4Sampling from Target Population
Target population: all Malay male live births in
Singapore, 1992
Sample: random sample of 50 Malay male live births in
Singapore, 1992
Suppose sample mean = 3.55 kg and sample SD (S) =
0.92 kg. What can we say about the population mean?
5Statistical Modeling
- Assume the population values follow a normal or
some other appropriate distribution. This means a
relative frequency histogram of the population
values will look like a normal or other
appropriate distribution.
- Assume we have a random sample, i.e., we sample
n (50 in the example) values independently from the
population
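A minimal sketch of these two modeling assumptions: n = 50 values drawn independently from a normal population. The population mean (3.5 kg) and SD (0.9 kg) are made-up values for illustration only, not figures from the birthweight records.

```python
import random

# Hypothetical population parameters (assumed for illustration only)
POP_MEAN = 3.5   # kg
POP_SD = 0.9     # kg
N = 50           # sample size used in the birthweight example

rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Independent draws from the assumed normal population
sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]

print(len(sample), "values drawn")
```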
6Notation
Sample data: $X_1, X_2, \ldots, X_n$
Assume $X_1, X_2, \ldots, X_n$ are independent and each is
distributed according to, say, a normal distribution
$N(\mu, \sigma^2)$
Population parameters
Population mean $\mu$ = mean of the normal population
Population variance $\sigma^2$ = variance of the normal
population
Population standard deviation $\sigma$
7Statistical Inference
Two general areas:
(a) Statistical Estimation, i.e., estimating
population parameters based on sample statistics
(b) Hypothesis Testing, i.e., testing certain
assumptions about the population.
Also called Test of Statistical Significance
8Statistical Estimation
- There are two ways by which a population
parameter can be estimated from a sample
- (1) Point estimate
- (2) Interval estimate
9Point Estimate
- Estimate the population parameter by a
single value
- Sample mean estimates population mean
- Sample median estimates population median
- Sample variance estimates population variance
- Sample SD estimates population SD
- Sample proportion estimates population proportion
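The point estimates listed above can be computed with Python's standard `statistics` module. This is only a sketch: the ten birthweights are invented for illustration, and the 3.0 kg cutoff used for the proportion is an arbitrary choice.

```python
import statistics

# Hypothetical birthweights in kg (made-up values, not the real data)
weights = [3.1, 3.6, 2.9, 4.2, 3.8, 3.3, 3.5, 4.0, 2.7, 3.9]

mean_hat = statistics.mean(weights)       # estimates the population mean
median_hat = statistics.median(weights)   # estimates the population median
var_hat = statistics.variance(weights)    # sample variance (divides by n - 1)
sd_hat = statistics.stdev(weights)        # sample SD, square root of the above

# Sample proportion of births below an arbitrary 3.0 kg cutoff
prop_hat = sum(w < 3.0 for w in weights) / len(weights)

print(mean_hat, median_hat, var_hat, sd_hat, prop_hat)
```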
10Point Estimate
- If the average birthweight for a random sample of
Malay male births was 3.55 kg and we use it to
estimate $\mu$, the mean birthweight of all Malay
male births in the population, we would be making
a point estimate for $\mu$
- Poor practice to report just the point estimate
because people cannot judge how good the estimate
is
- Should also report the accuracy of the estimate.
- Remember that the quality of an estimator is
judged by its performance over REPEATED SAMPLING
although we have just one sample in hand.
Inference for a population parameter should make
allowance for sampling error
11Accuracy of statistical estimation
- Two types of error
- (a) Sampling error or fluctuation: random error
or fluctuation that is due entirely to chance in
the process of sampling. Minimizing the sampling
error maximizes the precision of a statistical
estimate.
- (b) Systematic error or bias: non-random
error/bias which is either a property of the
estimator itself or due to bias in the sampling
or measurement process. Minimizing the
systematic error maximizes the validity of a
statistical estimate. Systematic errors can be
minimized by making efforts to reduce sampling and
measurement bias (e.g., non-random sampling,
non-response and non-coverage, untruthful answers,
unreliable calibration, errors in data recording
and coding, etc.)
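To see the difference between the two types of error, the simulation sketch below compares sample means from correct readings with sample means from a miscalibrated scale. All numbers (population mean 3.5 kg, SD 0.9 kg, a 0.10 kg scale offset) are hypothetical. The chance fluctuation is the same in both cases, but the offset does not average out over repeated samples.

```python
import random
import statistics

POP_MEAN, POP_SD = 3.5, 0.9   # hypothetical population values (kg)
SCALE_OFFSET = 0.10            # hypothetical miscalibration: readings 0.10 kg too high
N, REPS = 50, 3000             # sample size and number of repeated samples

rng = random.Random(4)
good_means, biased_means = [], []
for _ in range(REPS):
    true_values = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    biased_values = [v + SCALE_OFFSET for v in true_values]
    good_means.append(statistics.fmean(true_values))
    biased_means.append(statistics.fmean(biased_values))

avg_good = statistics.fmean(good_means)       # close to POP_MEAN
avg_biased = statistics.fmean(biased_means)   # off by about SCALE_OFFSET
spread_good = statistics.stdev(good_means)    # sampling fluctuation
spread_biased = statistics.stdev(biased_means)

print(avg_good, avg_biased, spread_good, spread_biased)
```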
12Unbiased estimation of the mean
$E(\bar{X}) = \mu$
i.e., the sample mean $\bar{X}$ equals the population
mean $\mu$ when averaged over repeated samples
13Hypothetical results of repeated sampling
- Unbiasedness means the sample mean equals the
population mean when averaged over repeated
samples
- However, there is fluctuation from sample to
sample
- Variance?
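Repeated sampling can be imitated on a computer. The sketch below draws many samples of size 50 from a hypothetical normal population (mean 3.5 kg, SD 0.9 kg, both assumed for illustration) and looks at how the sample means behave.

```python
import random
import statistics

POP_MEAN, POP_SD = 3.5, 0.9   # hypothetical population values (kg)
N = 50                         # size of each sample
REPS = 5000                    # number of repeated samples

rng = random.Random(2)

sample_means = []
for _ in range(REPS):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    sample_means.append(statistics.fmean(sample))

avg_of_means = statistics.fmean(sample_means)   # should be close to POP_MEAN
sd_of_means = statistics.stdev(sample_means)    # fluctuation from sample to sample

print(avg_of_means, sd_of_means)
```

The average of the sample means sits close to the population mean (unbiasedness), while the individual sample means still vary around it.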
16Standard Error (SE) of an estimator
- The SE of an estimator (e.g., the sample mean) is
just the standard deviation (SD) of the
estimator. It measures the variability of the
estimator under repeated sampling
- SE is just a special case of SD
- The standard deviation of an estimator is called
the standard error because it measures the
magnitude of the estimation error due to sampling
fluctuation
17Standard Deviation vs Standard Error
- The population standard deviation (SD) measures
the amount of variation among the individual
measurements that make up the population and can
be estimated from a sample using the sample
standard deviation.
- The standard error (e.g., of the sample mean), on
the other hand, measures how much the value of
the estimator changes from sample to sample under
repeated sampling.
- As we take only one sample rather than repeated
samples in practice, it seems impossible at first
to estimate the standard error, which is defined
with reference to repeated sampling.
- Fortunately, the standard error of the sample
mean is a function of the population SD. As the
latter is estimable from a single sample, so is
the standard error.
18Estimated standard error of the sample mean
- Let $\sigma$ denote the population SD
- It was shown earlier that
SE = SD(sample mean) = $\sigma/\sqrt{n}$, where n is
the sample size
- Since $\sigma$ can be estimated by the sample standard
deviation S, we can estimate the standard error
by SE = $S/\sqrt{n}$
Note that SE decreases with n at the rate
$1/\sqrt{n}$, i.e., the precision of the sample mean
improves as sample size increases
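A small sketch of the estimated standard error. The helper name `estimated_se` is hypothetical; the figures S = 0.92 kg and n = 50 are taken from the birthweight example.

```python
import math
import statistics

def estimated_se(sample):
    """Estimated standard error of the sample mean: S / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Using the summary figures from the birthweight example
s, n = 0.92, 50
se_50 = s / math.sqrt(n)
se_200 = s / math.sqrt(4 * n)   # same S, four times the sample size

print(round(se_50, 2))    # 0.13
print(round(se_200, 3))   # 0.065
```

Quadrupling the sample size only halves the standard error, which is what the 1/sqrt(n) rate means in practice.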
19Knowing the mean and standard error of an
estimator still doesn't tell us the whole story.
The whole story is told by the sampling
distribution, since it lets us calculate
probabilities
20Sampling distribution of the sample mean
- The distribution of the sample mean under
repeated sampling from the population
- Distribution of the sample mean rather than
individual measurements
- In practice, we take only one sample, not
repeated samples, so the sampling distribution
is unobserved, but fortunately it can often be
derived theoretically
Demo: http://www.ruf.rice.edu/lane/stat_sim/index.html
21Exact result when sampling from a normal
population
- If the population is normal with mean $\mu$ and
variance $\sigma^2$, then the sample mean based on a
random sample of size n is also normal with mean
$\mu$ and variance $\sigma^2/n$
- Note how we can derive theoretically the
distribution of the sample mean under repeated
sampling without actually drawing repeated
samples
- This is important because we usually only have
one sample at our disposal in practice
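A short derivation of the mean and variance of the sample mean stated on this slide, assuming $X_1, \ldots, X_n$ are independent draws from a population with mean $\mu$ and variance $\sigma^2$. Normality of $\bar{X}$ then follows from the standard fact that a linear combination of independent normal variables is itself normal.

```latex
\begin{align*}
E(\bar{X}) &= E\Big(\frac{1}{n}\sum_{i=1}^{n} X_i\Big)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \mu \\
\mathrm{Var}(\bar{X}) &= \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(X_i)
            = \frac{\sigma^2}{n} \quad\text{(by independence)} \\
\mathrm{SE}(\bar{X}) &= \sqrt{\mathrm{Var}(\bar{X})} = \frac{\sigma}{\sqrt{n}}
\end{align*}
```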
22Topic 10 Interval Estimate
- Provides an estimate of the population parameter
by defining an interval or range of plausible
values within which the population parameter
could be found with a given confidence.
- This interval is called a confidence interval.
- The sampling distribution is used in constructing
confidence intervals.
23Confidence interval for the mean of a normal
population
Fact: With probability 0.95, a normally
distributed variable is within 1.96 standard
deviations of its mean.
Now $\bar{X} \sim N(\mu, \sigma^2/n)$
- It follows that the sample mean must be within
1.96 standard errors of the population mean
with probability 0.95.
- Equivalently, with probability 0.95 the population
mean is within 1.96 standard errors of the sample mean.
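The 1.96 figure can be checked with `statistics.NormalDist` from the Python standard library. This is just a numerical check of the fact above, not part of the original slides.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

# Probability that a normal variable lies within 1.96 SDs of its mean
prob_within = z.cdf(1.96) - z.cdf(-1.96)

# Going the other way: the cutoff that leaves 2.5% in the upper tail
cutoff = z.inv_cdf(0.975)

print(round(prob_within, 3))  # 0.95
print(round(cutoff, 2))       # 1.96
```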
24We call
$\bar{X} \pm 1.96\,\sigma/\sqrt{n}$
a 95% confidence interval for the population mean.
If $\sigma$ is unknown, replace it by the sample SD S
and replace 1.96 by $t_{n-1,0.975}$, the upper
2.5-percentile of a t-distribution with n-1 degrees
of freedom, to yield
$\bar{X} \pm t_{n-1,0.975}\,S/\sqrt{n}$
25as a 95% confidence interval for the population
mean
26The t densities
- t densities are symmetric and similar in
appearance to the N(0,1) density but with heavier
tails
- Tables for t distributions are widely available
- As d.f. increases, the t distribution converges to
the standard normal distribution
Demo: http://www.isds.duke.edu/sites/java.html
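The heavier tails and the convergence to the normal can be seen numerically. The sketch below evaluates the standard formula for the t density at x = 3 for several degrees of freedom and prints it next to the N(0,1) density at the same point. The function name `t_pdf` and the chosen d.f. values are illustrative choices.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

normal_at_3 = NormalDist().pdf(3.0)

# Density at x = 3 (out in the tail) for increasing degrees of freedom
for df in (2, 10, 49, 1000):
    print(df, round(t_pdf(3.0, df), 5), round(normal_at_3, 5))
```

The t value in each row is larger than the normal value (heavier tail), and the gap shrinks as the degrees of freedom grow.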
27 95% confidence interval for the population mean
Birthweight data revisited
- n = 50, Sample mean = 3.55 kg, S = 0.92 kg
- SE = 0.92/sqrt(50) = 0.13 kg
- d.f. = 49, upper 2.5-percentile of t = 2.01
- 95% C.I. for the mean Malay male birthweight is
3.55 +/- 2.01 (0.13) = (3.29 kg, 3.81 kg)
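The same calculation as a short script, using n = 50 (the sample size implied by the SE and d.f. lines). The t critical value 2.01 is copied from the slide rather than computed.

```python
import math

n = 50
sample_mean = 3.55   # kg
s = 0.92             # kg, sample SD
t_crit = 2.01        # upper 2.5-percentile of t with n - 1 = 49 d.f. (from the slide)

se = s / math.sqrt(n)
lower = sample_mean - t_crit * se
upper = sample_mean + t_crit * se

print(f"SE = {se:.2f} kg")
print(f"95% CI = ({lower:.2f} kg, {upper:.2f} kg)")
```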
28The meaning of confidence interval
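Under repeated sampling, the interval
$\bar{X} \pm t_{n-1,0.975}\,S/\sqrt{n}$
(with $t_{n-1,0.975}$ the upper 2.5-percentile of the
t-distribution with n-1 d.f.)
will contain the true mean 95% of the time.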
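The repeated-sampling interpretation can be checked by simulation. The sketch below assumes a hypothetical normal population (mean 3.5 kg, SD 0.9 kg), builds a t-based interval from each of many samples of size 50, and counts how often the interval contains the true mean. The critical value 2.01 for 49 d.f. is taken from the birthweight example.

```python
import math
import random
import statistics

POP_MEAN, POP_SD = 3.5, 0.9   # hypothetical "true" population values (kg)
N = 50                         # size of each sample
T_CRIT = 2.01                  # t critical value for 49 d.f., as quoted in the example
REPS = 4000                    # number of repeated samples

rng = random.Random(3)
hits = 0
for _ in range(REPS):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    centre = statistics.fmean(sample)
    half_width = T_CRIT * statistics.stdev(sample) / math.sqrt(N)
    if centre - half_width <= POP_MEAN <= centre + half_width:
        hits += 1

coverage = hits / REPS   # expected to be near 0.95
print(coverage)
```

Any single interval either contains the true mean or it does not. The 95% describes the long-run proportion of intervals that do.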
29Demo: http://www.isds.duke.edu/sites/java.html