Estimation Bias, Standard Error and Sampling Distribution - PowerPoint PPT Presentation

Slides: 30
Provided by: stak9

Transcript and Presenter's Notes



1
Estimation Bias, Standard Error and Sampling Distribution
Topic 9
2
From sample to population
  • Inductive (inferential) statistical methods

Make inferences about a population based on
information from a sample drawn from that
population
[Figure: inductive statistical methods lead from the sample back to the population]
3
Statistical Concepts of Sampling
  • Suppose we want to estimate the mean birthweight
    of Malay male live births in Singapore, 1992
  • Due to logistical constraints, we decide to take
    a random sample of 50 live births from the
    records of all Malay male live births for that
    year

4
Sampling from Target Population

[Figure: target population = all Malay male live births in Singapore, 1992,
from which a random sample of 50 live births is drawn]
Suppose the sample mean is $\bar{x} = 3.55$ kg and the sample SD is
$S = 0.92$ kg. What can we say about the population mean?
5
Statistical Modeling
  • Assume the population values follow a normal or
    some other appropriate distribution. This means a
    relative frequency histogram of the population
    values will look like a normal curve (or like
    that other distribution).
  • Assume we have a random sample, i.e., we sample
    n (50 in the example) values independently from
    the population
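A minimal Python sketch of this model, assuming a normal population with hypothetical parameters (mean 3.5 kg, SD 0.9 kg; these values are illustrative, not from the slides). `draw_sample` is a hypothetical helper, not part of any library:

```python
import random

# Hypothetical population parameters (illustrative only, not from the slides)
MU = 3.5      # population mean birthweight, kg
SIGMA = 0.9   # population SD, kg

def draw_sample(n, mu=MU, sigma=SIGMA, seed=None):
    """Draw n independent values from a normal population N(mu, sigma^2)."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]

sample = draw_sample(50, seed=1)
print(len(sample))  # 50
```

Fixing the seed makes the "random" sample reproducible, which is convenient when checking results.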

6
Notation
Sample data: $X_1, X_2, \ldots, X_n$
Assume $X_1, X_2, \ldots, X_n$ are independent and each is
distributed according to, say, a normal distribution
Population parameters
Population mean $\mu$ = mean of the normal population
Population variance $\sigma^2$ = variance of the normal
population
Population standard deviation $\sigma$
7
Statistical Inference
Two general areas:
(a) Statistical estimation, i.e., estimating
population parameters based on sample statistics
(b) Hypothesis testing, i.e., testing certain
assumptions about the population; also called
tests of statistical significance
8
Statistical Estimation
  • There are two ways by which a population
    parameter can be estimated from a sample
  • (1) Point estimate
  • (2) Interval estimate

9
Point Estimate
  • Estimate the population parameter by a
    single value
  • Sample mean → population mean
  • Sample median → population median
  • Sample variance → population variance
  • Sample SD → population SD
  • Sample proportion → population proportion
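A small sketch computing each of these point estimates with Python's standard `statistics` module. The six birthweights are hypothetical, and using the 2.5 kg cut-off (the WHO definition of low birthweight) for the proportion is an illustrative choice:

```python
import statistics

# Hypothetical birthweights in kg (illustrative data, not the 1992 records)
weights = [3.2, 3.9, 3.5, 2.4, 4.1, 3.6]

point_estimates = {
    "mean": statistics.mean(weights),          # estimates the population mean
    "median": statistics.median(weights),      # estimates the population median
    "variance": statistics.variance(weights),  # sample variance (divisor n - 1)
    "sd": statistics.stdev(weights),           # sample SD (divisor n - 1)
    # proportion of births below 2.5 kg (low birthweight)
    "prop_low": sum(w < 2.5 for w in weights) / len(weights),
}
for name, value in point_estimates.items():
    print(f"{name}: {value:.3f}")
```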

10
Point Estimate
  • If the average birthweight for a random sample of
    Malay male births was 3.55 kg and we use it to
    estimate $\mu$, the mean birthweight of all Malay
    male births in the population, we would be making
    a point estimate for $\mu$
  • Poor practice to report just the point estimate
    because people cannot judge how good the estimate
    is
  • Should also report the accuracy of the estimate.
  • Remember that the quality of an estimator is
    judged by its performance over REPEATED SAMPLING
    although we have just one sample in hand.

Inference for a population parameter should make
allowance for sampling error
11
Accuracy of statistical estimation
  • Two types of error
  • (a) Sampling error or fluctuation: random
    error that is due entirely to chance in the
    process of sampling. Minimizing the sampling
    error maximizes the precision of a statistical
    estimate.

  • (b) Systematic error or bias: non-random
    error that is either a property of the
    estimator itself or due to bias in the sampling
    or measurement process. Minimizing the
    systematic error maximizes the validity of a
    statistical estimate. Systematic errors can be
    minimized by reducing sampling and measurement
    bias (e.g., non-random sampling, non-response and
    non-coverage, untruthful answers, unreliable
    calibration, errors in data recording and
    coding, etc.)
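The difference between the two kinds of error can be sketched by simulation, assuming a hypothetical normal population (mean 3.5 kg, SD 0.9 kg) and a hypothetical scale that reads 0.1 kg too high. Increasing n shrinks the sampling error, while the bias from the miscalibrated scale stays at about 0.1 kg however large the sample:

```python
import random
import statistics

MU, SIGMA = 3.5, 0.9   # hypothetical population mean and SD (kg)
OFFSET = 0.1           # hypothetical miscalibration: scale reads 0.1 kg too high

def estimation_error(n, offset, seed):
    """Sample mean minus the true mean, for n births weighed with a given offset."""
    rng = random.Random(seed)
    measured = [rng.gauss(MU, SIGMA) + offset for _ in range(n)]
    return statistics.fmean(measured) - MU

for n in (50, 5_000, 100_000):
    print(n,
          round(estimation_error(n, 0.0, seed=n), 3),     # sampling error only
          round(estimation_error(n, OFFSET, seed=n), 3))  # sampling error + bias
```

Using the same seed for both columns means the same births are "weighed" twice, so the two errors differ by exactly the offset.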
12
Unbiased estimation of the mean
$E(\bar{X}) = \mu$, i.e., the sample mean equals the population
mean when averaged over repeated samples
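A one-line derivation, using only linearity of expectation and $E(X_i) = \mu$ for each observation:

```latex
E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
           = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
           = \frac{1}{n}\, n\mu = \mu
```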
13
Hypothetical results of repeated sampling
  • Unbiasedness means the sample mean equals the
    population mean when averaged over repeated
    samples
  • However, there is fluctuation from sample to
    sample
  • What about the variance?
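Repeated sampling can be imitated by simulation. This sketch assumes a hypothetical normal population (mean 3.5 kg, SD 0.9 kg) and draws 10,000 samples of size 50; the average of the sample means should land very close to the population mean, while the individual sample means vary:

```python
import random
import statistics

MU, SIGMA, N = 3.5, 0.9, 50  # hypothetical population and sample size

def repeated_sample_means(reps, seed=0):
    """Sample means from `reps` independent random samples of size N."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N))
            for _ in range(reps)]

means = repeated_sample_means(10_000)
print(round(statistics.fmean(means), 3))   # close to MU: unbiasedness
print([round(m, 2) for m in means[:5]])    # but individual samples fluctuate
```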

14
(No Transcript)
15
(No Transcript)
16
Standard Error (SE) of an estimator
  • The SE of an estimator (e.g., the sample mean) is
    just the standard deviation (SD) of the
    estimator. It measures the variability of the
    estimator under repeated sampling
  • SE is just a special case of SD
  • The standard deviation of an estimator is
    called the standard error because it measures
    the magnitude of the estimation error due to
    sampling fluctuation

17
Standard Deviation vs Standard Error
  • The population standard deviation (SD) measures
    the amount of variation among the individual
    measurements that make up the population and can
    be estimated from a sample using the sample
    standard deviation.
  • The standard error (e.g. of the sample mean), on
    the other hand, measures how much the value of
    the estimator changes from sample to sample under
    repeated sampling.
  • As we take only 1 sample rather than repeated
    samples in practice, it may at first seem
    impossible to estimate the standard error, which
    is defined with reference to repeated sampling.
  • Fortunately, the standard error of the sample
    mean is a function of the population SD. As the
    latter is estimable from a single sample, so is
    the standard error.
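The step behind this, assuming the $X_i$ are independent with common variance $\sigma^2$ (independence makes the covariance terms vanish):

```latex
\operatorname{Var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
                            = \frac{1}{n^2}\, n\sigma^2 = \frac{\sigma^2}{n},
\qquad
\mathrm{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}}
```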

18
Estimated standard error of the sample mean
  • Let $\sigma$ denote the population SD
  • It was shown earlier that
    SE = SD(sample mean) = $\sigma/\sqrt{n}$, where n is
    the sample size
  • Since $\sigma$ can be estimated by the sample standard
    deviation S, we can estimate the standard error
    by $\widehat{\mathrm{SE}} = S/\sqrt{n}$

Note that SE decreases with n at the rate $1/\sqrt{n}$,
i.e., the precision of the sample mean improves
as sample size increases
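Applied to the birthweight sample (n = 50, S = 0.92 kg), a minimal sketch of the estimated SE and its $1/\sqrt{n}$ behaviour; `estimated_se` is a hypothetical helper name, and keeping S at 0.92 kg for n = 200 is an illustrative assumption:

```python
import math

def estimated_se(s, n):
    """Estimated standard error of the sample mean: S / sqrt(n)."""
    return s / math.sqrt(n)

# Birthweight example: S = 0.92 kg from a sample of n = 50
print(round(estimated_se(0.92, 50), 2))  # 0.13
# Quadrupling n halves the SE (rate 1/sqrt(n)), holding S fixed
print(round(estimated_se(0.92, 200) / estimated_se(0.92, 50), 3))  # 0.5
```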
19
Knowing the mean and standard error of an
estimator still doesn't tell us the whole story.
The whole story is told by the sampling
distribution, because it lets us calculate
probabilities
20
Sampling distribution of the sample mean
  • The distribution of the sample mean under
    repeated sampling from the population
  • Distribution of the sample mean rather than
    individual measurements
  • In practice, we take only one sample, not
    repeated samples, so the sampling distribution
    is unobserved; fortunately, it can often be
    derived theoretically

Demo http://www.ruf.rice.edu/lane/stat_sim/index.html
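Because the sampling distribution is defined through repeated samples, a simulation can approximate it. This sketch assumes a hypothetical normal population (mean 3.5 kg, SD 0.9 kg), draws 5,000 samples of size 50 and prints a crude text histogram of their means, in bins 0.05 kg wide:

```python
import random
import statistics
from collections import Counter

MU, SIGMA, N, REPS = 3.5, 0.9, 50, 5_000   # hypothetical values

rng = random.Random(42)
means = [statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N)) for _ in range(REPS)]

# Group the sample means into 0.05 kg bins, keyed by bin center
bins = Counter(round(m / 0.05) * 0.05 for m in means)
for center in sorted(bins):
    # one '#' per 25 sample means in the bin
    print(f"{center:.2f} {'#' * (bins[center] // 25)}")
```

The histogram is of sample means, not individual birthweights, so it is much narrower than the population distribution.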
21
Exact result when sampling from a normal
population
  • If the population is normal with mean $\mu$ and
    variance $\sigma^2$, then the sample mean based on a
    random sample of size n is also normal with mean
    $\mu$ and variance $\sigma^2/n$
  • Note how we can derive theoretically the
    distribution of the sample mean under repeated
    sampling without actually drawing repeated
    samples
  • This is important because we usually only have
    one sample at our disposal in practice
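A simulation check of this exact result, under a hypothetical normal population (mean 3.5 kg, SD 0.9 kg) with n = 50: the empirical mean and variance of many sample means should be close to $\mu$ and $\sigma^2/n = 0.0162$:

```python
import random
import statistics

MU, SIGMA, N, REPS = 3.5, 0.9, 50, 10_000  # hypothetical values

rng = random.Random(2024)
means = [statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(N)) for _ in range(REPS)]

# Empirical vs theoretical mean of the sampling distribution
print(round(statistics.fmean(means), 3), MU)
# Empirical vs theoretical variance, sigma^2 / n
print(round(statistics.variance(means), 5), round(SIGMA**2 / N, 5))
```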

22
Topic 10 Interval Estimate
  • Provides an estimate of the population parameter
    by defining an interval or range of plausible
    values for it, constructed with a given level
    of confidence.
  • This interval is called a confidence interval.
  • The sampling distribution is used in constructing
    confidence intervals.

23
Confidence interval for the mean of a normal
population
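Fact: With probability 0.95, a normally
distributed variable is within 1.96 standard
deviations of its mean.
Now $\bar{X} \sim N(\mu, \sigma^2/n)$, so its standard
deviation is the standard error $\sigma/\sqrt{n}$.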
  • It follows that the sample mean is within
    1.96 standard errors of the population mean
    with probability 0.95.
  • Equivalently, the population mean is within 1.96
    standard errors of the sample mean:
    $P(\bar{X} - 1.96\,\sigma/\sqrt{n} \le \mu \le \bar{X} + 1.96\,\sigma/\sqrt{n}) = 0.95$.
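The 1.96 figure and the 0.95 probability can be checked with `statistics.NormalDist` from the Python standard library (3.8+). The second half applies the same statement to the sampling distribution of the mean, with hypothetical values of $\mu$, $\sigma$ and n:

```python
from statistics import NormalDist

std_normal = NormalDist()                    # N(0, 1)
z = std_normal.inv_cdf(0.975)                # upper 2.5-percentile
print(round(z, 2))                           # 1.96
print(round(std_normal.cdf(1.96) - std_normal.cdf(-1.96), 3))  # 0.95

# Same statement for the sample mean, with hypothetical mu, sigma and n
mu, sigma, n = 3.5, 0.9, 50
se = sigma / n ** 0.5
xbar_dist = NormalDist(mu, se)               # sampling distribution N(mu, sigma^2/n)
prob = xbar_dist.cdf(mu + 1.96 * se) - xbar_dist.cdf(mu - 1.96 * se)
print(round(prob, 3))                        # 0.95
```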

24
We call $\bar{X} \pm 1.96\,\sigma/\sqrt{n}$
a 95% confidence interval for the population mean.
If $\sigma$ is unknown, replace it by the sample SD $S$
and replace 1.96 by the upper 2.5-percentile of a
t-distribution with n-1 degrees of freedom,
$t_{n-1,\,0.025}$, to yield
25
$\bar{X} \pm t_{n-1,\,0.025}\,S/\sqrt{n}$
as a 95% confidence interval for the population
mean
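A sketch of the t-based interval as a Python function. `t_interval` is a hypothetical name, and the critical value is passed in because the standard library has no t quantile function; it can be read from a t table or, if SciPy is available, computed with `scipy.stats.t.ppf(0.975, n - 1)`. The five data values are hypothetical:

```python
import math
import statistics

def t_interval(data, t_crit):
    """95% CI for the mean: xbar +/- t_crit * S / sqrt(n).

    t_crit should be the upper 2.5-percentile of t with len(data) - 1 d.f.
    """
    n = len(data)
    xbar = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # estimated SE, S / sqrt(n)
    return xbar - t_crit * se, xbar + t_crit * se

# Hypothetical data (n = 5, so d.f. = 4); upper 2.5-percentile of t_4 is about 2.776
low, high = t_interval([3.2, 3.9, 3.5, 2.8, 4.1], t_crit=2.776)
print(f"({low:.2f}, {high:.2f})")
```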
26
The t densities
  • t densities are symmetric and similar in
    appearance to the N(0,1) density but with heavier
    tails
  • Tables for t distributions are widely available
  • As d.f. increases, the t distribution converges
    to the standard normal distribution

Demo http://www.isds.duke.edu/sites/java.html
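The heavier tails can be seen by evaluating the t density directly from its formula (written with `math.lgamma` to avoid overflow for large d.f.) and comparing it with the N(0,1) density at x = 3. `t_pdf` is a hypothetical helper, not a library function:

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

normal_pdf = NormalDist().pdf
# Tail height at x = 3: larger for t, approaching the N(0,1) value as df grows
for df in (1, 5, 30, 1000):
    print(df, round(t_pdf(3, df), 5), round(normal_pdf(3), 5))
```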
27
95% confidence interval for the population mean
Birthweight data revisited
  • n = 50, sample mean = 3.55 kg, S = 0.92 kg
  • SE = 0.92/sqrt(50) = 0.13 kg
  • d.f. = 49, upper 2.5-percentile of t = 2.01
  • 95% C.I. for the mean Malay male birthweight is
    3.55 ± 2.01 (0.13) = (3.29 kg, 3.81 kg)
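The arithmetic above, reproduced in Python with the slide's rounded t value of 2.01. For comparison only, the sketch also computes the interval with 1.96 in place of 2.01; it comes out slightly narrower, reflecting the t distribution's heavier tails:

```python
import math

n, xbar, s = 50, 3.55, 0.92   # birthweight example from the slides
se = s / math.sqrt(n)          # about 0.130 kg
t_crit = 2.01                  # upper 2.5-percentile of t with 49 d.f. (rounded)
z_crit = 1.96                  # normal value, shown only for comparison

t_ci = (xbar - t_crit * se, xbar + t_crit * se)
z_ci = (xbar - z_crit * se, xbar + z_crit * se)
print("t-based:", tuple(round(x, 3) for x in t_ci))
print("z-based:", tuple(round(x, 3) for x in z_ci))
```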

28
The meaning of confidence interval
Under repeated sampling, the interval $\bar{X} \pm t_{n-1,\,0.025}\,S/\sqrt{n}$
will contain the true mean 95% of the time.
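This interpretation can be checked by simulation, assuming a hypothetical normal population (mean 3.5 kg, SD 0.9 kg) and samples of size 50 with the rounded critical value 2.01. The proportion of intervals that contain the true mean should come out close to 0.95:

```python
import math
import random
import statistics

MU, SIGMA, N = 3.5, 0.9, 50  # hypothetical normal population, sample size 50
T_CRIT = 2.01                # upper 2.5-percentile of t with 49 d.f. (rounded)

def covers(rng):
    """Draw one sample and report whether its 95% t-interval contains MU."""
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    xbar = statistics.fmean(sample)
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (N - 1))  # sample SD S
    half_width = T_CRIT * s / math.sqrt(N)
    return xbar - half_width <= MU <= xbar + half_width

rng = random.Random(7)
reps = 10_000
coverage = sum(covers(rng) for _ in range(reps)) / reps
print(coverage)  # roughly 0.95
```

Each run gives a slightly different proportion; it is the long-run rate that is 95%.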
29
Demo http://www.isds.duke.edu/sites/java.html