Sampling Experiment

Provided by: SteveF6

1
Sampling Experiment
  • Please randomly select 10 slips of paper and pass
    the bucket to the next person
  • Each slip represents a household in a small city;
    the household income is written on the slip
  • Calculate the mean income for your sample and
    save your result
  • If you have a computer (or a calculator with this
    function), also calculate the sample standard
    deviation
  • Return the slips to the bucket

2
Sampling Terminology
  • Population: the set of all members, events, or
    measurements that we wish to characterize
  • Random sample: a sample chosen from the population
    by means of a random mechanism
  • Random sampling is the best way to ensure an
    unbiased, representative sample
  • It is the only way to allow the accuracy of
    inferences to be quantified
  • Judgment/convenience samples are hopelessly
    biased, making statistical analysis worthless

3
Predicting the 1936 Election
  • In 1936, Literary Digest mailed questionnaires to
    10 million people, asking who they would vote for
    in the upcoming presidential election. The list
    was compiled from magazine subscribers, car
    owners and telephone directories. Based on the
    2.3 million responses, they predicted a victory
    for Republican Landon over Roosevelt by a 60 to
    40 margin.
  • Roosevelt won with 61 percent of the vote, to 36
    percent for Landon.
  • George Gallup correctly predicted the election
    (and the results of the Literary Digest poll!)
    to within 1 percent, using random samples.

4
Sources of Bias
  • Selection bias: some members of population more
    likely to be selected for sample than others
  • Non-response bias: members of sample who do not
    respond to survey would have answered differently
    from those who do respond
  • Evasive or untruthful response: respondents give
    socially acceptable answer
  • Recall or reporting bias: some respondents are
    more likely to report an event than others
  • Measurement error: poorly worded questions,
    imprecise answers

5
Non-response: rate of HIV infection
  • It is important for policy and predictive
    purposes to know the proportion of the population
    that is infected with HIV.
  • In a survey conducted by the National Center for
    Health Statistics, the screening response rate
    was 95 percent; of those, 85 percent gave a blood
    sample. Of those giving a sample, about 0.5
    percent were infected with HIV.
  • What is the best estimate of the rate of HIV
    infection in the population?

6
Non-response: rate of HIV infection
  • We don't know the rate of infection among those
    refusing to give a blood sample.
  • What should we do? Ignore this group? Assume the
    same rate of infection as those agreeing to give
    a blood sample?
  • Do you expect the rate of infection in this group
    to be higher or lower than for those agreeing to
    give a sample?
  • How could we estimate the rate of infection in
    the second group?
  • Would stratifying the population help?
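
One way to explore these questions is a small sensitivity calculation. The sketch below is only an illustration: it lumps together everyone without a blood sample (screening non-respondents and those who refused), and the infection rates tried for that group are hypothetical values, not survey results.

```python
def overall_rate(rate_untested, screening=0.95, gave_blood=0.85,
                 rate_tested=0.005):
    """Population rate implied by an ASSUMED rate in the untested group."""
    # Share of the original sample that actually gave a blood sample.
    tested_share = screening * gave_blood
    # Weighted average of the tested and untested groups.
    return tested_share * rate_tested + (1 - tested_share) * rate_untested

# Hypothetical scenarios for the untested group (assumptions, not data).
same_as_tested = overall_rate(0.005)   # untested group infected at 0.5%
four_times = overall_rate(0.02)        # untested group infected at 2%
print(f"{same_as_tested:.4%}  {four_times:.4%}")
```

Because roughly a fifth of the sample has no blood test, the overall estimate moves noticeably with whatever rate is assumed for that group.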

7
Random Sampling Techniques
  • Simple Random Sample
  • Systematic sampling
  • Stratified sampling
  • Cluster sampling
  • Capture-recapture sampling

8
Simple Random Sample (SRS)
  • Identify every member of the population, N
  • Use a random mechanism to select a sample of size
    n, such that every member of the population has
    the same chance of being selected
  • assign random number to each member of population
  • sort random numbers, select members with the n
    smallest random numbers
  • The statistical techniques used in this course
    apply only to an SRS
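
The random-number procedure above can be sketched in Python. This is a minimal illustration, assuming the population is a list of household IDs; the function name, the IDs, and the seed are hypothetical and not part of the slides.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return n members, each member having the same chance of selection."""
    rng = random.Random(seed)
    # Assign a random number to each member of the population.
    keyed = [(rng.random(), member) for member in population]
    # Sort by the random numbers and keep the n smallest.
    keyed.sort(key=lambda pair: pair[0])
    return [member for _, member in keyed[:n]]

households = list(range(1, 2001))        # hypothetical IDs, N = 2000
sample = simple_random_sample(households, 20, seed=1)
print(sorted(sample))
```

Python's built-in `random.sample(households, 20)` draws the same kind of sample directly; the longer version is shown only because it mirrors the steps on the slide.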

9
Simple Random Sample: n = 20, N = 2000
10
Systematic Sampling
  • Select every mth member of the population
  • Select the sampling interval: divide population
    size (N) by sample size (n), m = N/n
  • Use a random mechanism to choose a number k
    between 1 and m
  • Choose members k, (m + k), (2m + k), ...
  • Example: N = 2,000, n = 20, m = 100; select
    members 45, 145, 245, ..., 1945 for sample
  • Better (more representative) than SRS if no
    natural trends or strata
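
A short Python sketch of the systematic procedure, assuming members are numbered 1 to N and that N is an exact multiple of n. The function name and the option to pass k directly are illustrative choices, not from the slides.

```python
import random

def systematic_sample(N, n, k=None, seed=None):
    """Return member numbers k, m + k, 2m + k, ... with interval m = N // n."""
    m = N // n                                   # sampling interval
    if k is None:
        # Random start between 1 and m, inclusive.
        k = random.Random(seed).randint(1, m)
    return [k + i * m for i in range(n)]

# Reproduces the slide's example: N = 2000, n = 20, start k = 45.
picked = systematic_sample(2000, 20, k=45)
print(picked[:3], picked[-1])
```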

11
Systematic sample: n = 20, N = 2000, k = 45
12
Stratified Sampling
  • Suppose we can identify various subpopulations or
    strata within the population.
  • Select a simple random sample from each stratum
    instead of from the entire population. This is
    called stratified sampling.
  • Size of the sample from each stratum can be equal,
    or proportional to the size of the stratum.
  • Better than SRS when there is considerable
    variation between the various strata but
    relatively little variation within a given
    stratum.
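
A possible Python sketch of stratified sampling with proportional allocation. The four strata, their names, and their sizes are made up for illustration; only the idea of drawing a simple random sample within each stratum comes from the slide.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample from each stratum, sized proportionally."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    chosen = {}
    for name, members in strata.items():
        # Proportional allocation; with awkward stratum sizes the rounded
        # sizes may not add up to exactly n.
        size = round(n * len(members) / total)
        chosen[name] = rng.sample(members, size)
    return chosen

# Hypothetical population of 2000 households in four equal strata.
strata = {
    "north": list(range(0, 500)),
    "south": list(range(500, 1000)),
    "east": list(range(1000, 1500)),
    "west": list(range(1500, 2000)),
}
chosen = stratified_sample(strata, 20, seed=1)
print({name: len(members) for name, members in chosen.items()})
```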

13
Stratified sample of 20 from 4 strata
14
Cluster Sampling
  • To sample households, divide city into blocks,
    choose a simple (or stratified) random sample of
    blocks, then sample all the households in the
    chosen blocks.
  • In this case the city blocks are called clusters
    and the sampling is called cluster sampling.
  • The advantage of cluster sampling is convenience
    and lower cost.
  • Real applications are often more complex and use
    multistage sampling schemes.
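
A small Python sketch of single-stage cluster sampling, assuming a hypothetical city of 500 blocks with 4 households each. The block layout and function name are illustrative assumptions.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Choose whole clusters at random and return every member of each."""
    rng = random.Random(seed)
    chosen_ids = rng.sample(sorted(clusters), n_clusters)
    return [member for cid in chosen_ids for member in clusters[cid]]

# Hypothetical city: block b contains households 4b, 4b + 1, 4b + 2, 4b + 3.
blocks = {b: [4 * b + h for h in range(4)] for b in range(500)}
households = cluster_sample(blocks, 5, seed=1)
print(households)
```

Five blocks of four households give a sample of 20, but the households arrive in groups of neighbours, which is why cluster samples are cheaper to collect and usually less precise than a simple random sample of the same size.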

15
Cluster Sample of 20 (cluster size 4)
16
Multi-Stage Sampling Schemes: CPS
  • Current Population Survey: 60,000
    households/month
  • 1. 3,141 U.S. cities and counties are grouped
    into 2,007 Primary Sampling Units (PSUs: groups
    of counties)
  • 2. The PSUs are grouped into 754 strata (428 with
    1 PSU)
  • 3. One PSU is randomly selected from each
    stratum; probability of selection is proportional
    to population
  • 4. PSUs are divided into Census Enumeration Units
    (CEU ≈ 300 households); 5 CEUs are randomly
    selected from each PSU
  • Each CEU is divided into Ultimate Survey Units
    (USU ≈ 4 households); 6 USUs are randomly
    selected from each CEU and interviewed during the
    week of the 12th day
  • Each month, one quarter of the USUs are replaced

17
A Simple Random Sample in Excel
  • Data Analysis ToolPak
  • Tools/Data Analysis/Random Number Generation
  • Tools/Data Analysis/Sampling
  • RAND Function
  • Insert new column in data set
  • Enter =RAND() in each new cell
  • Copy random numbers, Paste Special/Values
  • Sort observations by random number, select first
    n observations

18
Capture-Recapture Sampling
  • Collect and tag a sample; wait; collect a
    second sample and determine the tagged fraction
  • Used to estimate the size of difficult-to-count
    populations
  • trout in a lake, insects in a field
  • homeless, illegal immigrants, drug users

19
Number of Trout in a Lake
  • Suppose I want to estimate the number of trout in
    a remote mountain lake, as part of a program to
    monitor the effects of acid deposition
  • Catch 50 trout; tag and release each trout.
  • One month later, return to the same lake and
    catch 60 trout. Of these, 6 are tagged.
  • How many trout are in the lake?
  • What assumptions did you make?

20
Number of Trout in a Lake
  • The second sample reveals that 10% of the trout
    in the lake (6 of 60) are tagged
  • There are 50 tagged trout in the lake; if 10% are
    tagged, there must be a total of 500 trout
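
The same arithmetic as a small Python function. This ratio estimate is commonly known as the Lincoln-Petersen estimator, a name the slides do not use; the function and argument names are illustrative.

```python
def capture_recapture_estimate(tagged_first, caught_second, tagged_in_second):
    """Estimate population size from the tagged fraction in a second catch."""
    if tagged_in_second == 0:
        raise ValueError("no tagged animals recaptured; estimate is undefined")
    # tagged_first / N is estimated by tagged_in_second / caught_second.
    return tagged_first * caught_second / tagged_in_second

# Trout example: 50 tagged, then 60 caught of which 6 are tagged.
estimate = capture_recapture_estimate(50, 60, 6)
print(estimate)
```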

21
Assumptions
  • Tagged and untagged trout have roughly equal
    probabilities of being caught
  • Number of births between samples small compared
    with population
  • Number of deaths between samples small compared
    with population (or death rate roughly equal for
    tagged and untagged trout)

22
Sources of Estimation Error
  • Two types of errors can occur when we sample
  • Sampling error: no sample is perfectly
    representative of the population; some samples
    will be particularly unlucky
  • sampling errors can be understood and quantified
    using statistics
  • Nonsampling errors: various mechanisms (e.g.,
    selection, nonresponse, recall bias) can
    systematically bias estimates
  • difficult to quantify; not covered here

23
Sampling Error
  • Suppose we are estimating a population mean, $\mu$.
    We draw a sample of size n and calculate the
    sample mean, $\bar{x}$. This is a point estimate
    of $\mu$.
  • The sampling error is the difference between the
    sample mean and the population mean, $\bar{x} - \mu$
  • How big is the sampling error? In other words,
    how accurate is the estimate?
  • We don't know, because we don't know $\mu$. But we
    can answer this question probabilistically.

24
Distribution of the Sample Mean
  • Imagine that we collect many random samples, of
    size n and compute many sample means
  • We make a frequency table of all the sample means
    and construct a histogram
  • If the number of samples is very large, the
    histogram becomes a continuous probability
    distribution: the distribution of the sample mean
  • The sample mean is normally distributed with
  • a mean of $\mu$
  • a standard deviation of $\sigma/\sqrt{n}$
25
Central Limit Theorem
  • Regardless of how x is distributed, the sample
    mean is normally distributed (as long as the
    sample size, n, is reasonably large)
  • The standard deviation of the sample mean is
    called the standard error, $SE = \sigma/\sqrt{n}$
  • if you don't know the population standard
    deviation, $\sigma$, use the sample standard
    deviation, $s$

26
Example
  • An auditor selects a sample of 100 account
    balances from a population of 10,000
  • The sample mean is 279; the sample standard
    deviation is 420 (obvious positive skew)
  • The auditor can be 95% certain that the mean of
    all 10,000 accounts is somewhere in the interval
    279 ± 84 (two standard errors:
    $2 \times 420/\sqrt{100} = 84$), that is, between
    195 and 363.
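
The auditor's interval can be checked with a few lines of Python. This is a sketch that uses the rule of thumb of two standard errors for 95%, with the sample standard deviation standing in for the population value; the function name is illustrative.

```python
import math

def mean_interval(sample_mean, sample_sd, n, z=2.0):
    """Interval of z standard errors around a sample mean."""
    se = sample_sd / math.sqrt(n)      # standard error of the mean
    margin = z * se
    return sample_mean - margin, sample_mean + margin

low, high = mean_interval(279, 420, 100)
print(low, high)
```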

27
Caveats
  • Sample is reasonably large
  • depends on distribution of x: if normal, any n;
    if reasonably symmetrical, n > 30; if highly
    asymmetrical, then n > 100
  • Sample is a small fraction of the population;
    otherwise multiply the standard error by the
    finite population correction,
    $fpc = \sqrt{(N - n)/(N - 1)}$
  • N and n are size of population and sample; if
    N = 10,000, n = 100, $fpc = (9900/9999)^{1/2} = 0.995$.
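
A quick Python check of the finite population correction, using the standard form sqrt((N - n)/(N - 1)) that matches the slide's numbers. The function name is illustrative.

```python
import math

def finite_population_correction(N, n):
    """Factor that shrinks the standard error when n is not small next to N."""
    return math.sqrt((N - n) / (N - 1))

fpc_small_fraction = finite_population_correction(10_000, 100)   # 1% sampled
fpc_large_fraction = finite_population_correction(10_000, 5_000) # 50% sampled
print(round(fpc_small_fraction, 3), round(fpc_large_fraction, 3))
```

When only 1% of the population is sampled the correction is negligible; it matters once the sample is a sizeable share of the population.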

28
Sampling Experiment
  • Population mean $\mu$ = 40,000
  • Population standard deviation $\sigma$ = 15,000
  • Sample size n = 10
  • 68% chance that any particular sample mean will
    be within one SE ($\sigma/\sqrt{n} \approx 4750$)
    of the population mean: 40,000 ± 4750, or
    35,250 < $\bar{x}$ < 44,750
  • 95% of sample means will be within two SE: 40,000
    ± 9500, or 30,500 < $\bar{x}$ < 49,500

29
Sampling Experiment
  • Our experiment is limited by the small sample
    size (n = 10) and the small number of samples
  • Using Excel, we can explore larger samples and
    larger numbers of samples
  • In income sampling.xls, we investigate the actual
    distribution of sample means for 250 random
    samples of size 10, 100, and 1000, and compare
    the results to what we would expect from the
    Central Limit Theorem
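
The same kind of comparison can be sketched in Python instead of a spreadsheet. The population here is an assumption: a synthetic lognormal population of 100,000 incomes, scaled to a mean of about 40,000 and a standard deviation of about 15,000. It is not the data in the workbook.

```python
import math
import random
import statistics

MU, SIGMA = 40_000, 15_000       # target population mean and sd
rng = random.Random(0)           # fixed seed so the run is repeatable

# Assumed shape: lognormal, chosen only because incomes are right-skewed.
s2 = math.log(1 + (SIGMA / MU) ** 2)
population = [rng.lognormvariate(math.log(MU) - s2 / 2, math.sqrt(s2))
              for _ in range(100_000)]
pop_mean = statistics.fmean(population)
pop_sd = math.sqrt(sum((x - pop_mean) ** 2 for x in population)
                   / len(population))

results = {}
for n in (10, 100, 1000):
    # 250 random samples of size n, one sample mean from each.
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(250)]
    results[n] = {
        "mean_of_means": statistics.fmean(means),
        "observed_se": statistics.stdev(means),
        "predicted_se": pop_sd / math.sqrt(n),   # Central Limit Theorem
    }

for n, r in results.items():
    print(n, round(r["mean_of_means"]), round(r["observed_se"]),
          round(r["predicted_se"]))
```

With only 250 samples the observed spread of the sample means will not match the predicted standard error exactly, but it should shrink by roughly a factor of sqrt(10) each time n grows tenfold.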

30
Population Distribution
31
Distribution of Sample Means
32
Distribution of Sample Means
33
Distribution of Sample Means
34
Distribution of Sample Means
35
Determining Sample Size
  • What sample size is necessary to estimate the
    population mean with a given accuracy?
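  • Let B = acceptable sampling error = 2 SE (i.e., a
    95 percent chance that $\bar{x}$ will be in the
    interval $\mu \pm B$)
  • Then $B = 2\sigma/\sqrt{n}$, so $n = (2\sigma/B)^2$
  • In the above example, if we want to estimate mean
    household income with an accuracy of ± 1,000, we
    need $n = (2 \times 15000/1000)^2 = 900$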
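
A minimal Python sketch of the sample-size calculation, assuming B is set to two standard errors and using the population standard deviation of 15,000 from the earlier sampling experiment. The function name is illustrative.

```python
import math

def sample_size_for_mean(sigma, B, z=2.0):
    """Smallest n for which z standard errors is no larger than B."""
    # From B = z * sigma / sqrt(n), solve for n and round up.
    return math.ceil((z * sigma / B) ** 2)

n_needed = sample_size_for_mean(sigma=15_000, B=1_000)
print(n_needed)
```

Halving the acceptable error quadruples the required sample size, because n grows with the square of sigma/B.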

36
Standard Error for Proportions
  • Let p = population proportion
  • Example: percent voting for Bush
  • Draw a random sample of size n
  • Determine the sample proportion, $\hat{p}$
  • The standard error of the sample proportion is
    $SE = \sqrt{p(1 - p)/n}$
  • if $np > 5$ and $n(1 - p) > 5$ (i.e., more than 5
    voting for Bush and more than 5 voting for Gore)

37
Heads in 1, 10, 100, 1000 Tosses
38
Example
  • Exit poll: ask 1000 voters who they just voted
    for
  • 480 say 'Bush'; $\hat{p}$ = 0.48
  • 68% chance that the population proportion is 48%
    ± 1.6%, or between 46.4% and 49.6%
  • 95% chance that the population proportion is 48%
    ± 3.2%, or between 44.8% and 51.2%
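
The exit-poll numbers can be reproduced in Python. This sketch uses the sample proportion in place of the unknown p in the standard error, and two standard errors for 95%; the names are illustrative.

```python
import math

def proportion_se(p_hat, n):
    """Standard error of a sample proportion."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

n = 1000
p_hat = 480 / n
se = proportion_se(p_hat, n)

# Intervals in percent, rounded to one decimal place.
one_se = (round(100 * (p_hat - se), 1), round(100 * (p_hat + se), 1))
two_se = (round(100 * (p_hat - 2 * se), 1), round(100 * (p_hat + 2 * se), 1))
print(round(100 * se, 1), one_se, two_se)
```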

39
Determining Sample Size Proportions
  • Let B = acceptable sampling error = 2 SE (i.e., a
    95 percent chance that $\hat{p}$ will be in the
    interval $p \pm B$)
  • Then $B = 2\sqrt{p(1 - p)/n}$, so
    $n = 4p(1 - p)/B^2$
  • The unemployment rate is about 5%; if we want to
    measure the rate with an accuracy of ± 0.1%, we
    need to survey almost 200,000 workers
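
A short Python sketch of the same calculation, again taking B as two standard errors. The function name is illustrative, and the 5% rate is the rough figure quoted on the slide.

```python
import math

def sample_size_for_proportion(p, B, z=2.0):
    """Smallest n for which z standard errors of a proportion is at most B."""
    # From B = z * sqrt(p * (1 - p) / n), solve for n and round up.
    return math.ceil(z ** 2 * p * (1 - p) / B ** 2)

# Unemployment rate of about 5%, measured to within 0.1 percentage point.
n_workers = sample_size_for_proportion(p=0.05, B=0.001)
print(n_workers)
```

The result is about 190,000, which is the slide's "almost 200,000".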

40
Difference Between Two Means
  • Suppose $X_1$ and $X_2$ are independent random
    variables
  • If $Y = X_1 - X_2$, then $\mu_Y = \mu_1 - \mu_2$
    and $\sigma_Y^2 = \sigma_1^2 + \sigma_2^2$

Sample means are independent random variables.
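
Because variances of independent variables add, the standard error of a difference between two sample means combines the two individual standard errors. A minimal Python sketch, with hypothetical sample figures chosen only to make the arithmetic easy to follow:

```python
import math

def se_of_difference(sd1, n1, sd2, n2):
    """Standard error of (mean 1 - mean 2) for two independent samples."""
    # Each sample mean has variance sd**2 / n; the variances add.
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

# Hypothetical samples: sd 3 with n = 9, and sd 4 with n = 16.
se_diff = se_of_difference(3, 9, 4, 16)
print(se_diff)
```

Each sample mean here has a standard error of 1, yet the difference has a standard error of about 1.41, not 2 and not 0.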
41
Study Design
  • Two types of studies
  • experimental
  • identifies a cohort or group of subjects, imposes
    one or more treatments in order to observe a
    response
  • observational
  • gathers data without influencing response;
    sometimes called a natural experiment

42
Experimental Design
  • The ideal experiment: random assignment of
    subjects into a control group and one or more
    treatment groups
  • treatment is a combination of explanatory
    variables; except for treatment, subjects in all
    groups are handled the same
  • Only systematic reason for differences between
    groups is the treatment
  • One must still account for random effects
  • differences considered too large to be due to
    random effects are statistically significant
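
Random assignment itself is simple to carry out. A possible Python sketch, assuming 20 hypothetical subject IDs and two equally sized groups; the function and group names are illustrative.

```python
import random

def randomly_assign(subjects, groups=("control", "treatment"), seed=None):
    """Shuffle the subjects, then deal them out to the groups in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for i, subject in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(subject)
    return assignment

subjects = [f"S{i:02d}" for i in range(1, 21)]   # hypothetical subject IDs
assignment = randomly_assign(subjects, seed=1)
print({group: len(members) for group, members in assignment.items()})
```

Because chance alone decides who lands in each group, the groups should differ systematically only in the treatment they receive.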

43
Comparative Design
  • Comparative design is necessary to ensure that
    the measured treatment effect is due solely to
    the treatment
  • blind experiment: subject does not know whether
    he is in control or treatment group; placebo used
    for controls
  • double-blind: the person interacting with
    subjects and measuring response does not know
    which group the subject is in

44
Matching
  • Some studies match control and experimental group
    subjects by age, gender, race, etc.
  • This can lead to smaller random effects, but
    random selection is still necessary to control
    for other variables (stratify population and then
    apply random selection to each stratum)
  • Matched pairs is particularly powerful
  • each subject is subjected to various treatments;
    difference in response measured

45
Policy Experiments
  • Welfare: recipients randomly assigned to control
    or treatment group; treatment group required to
    attend classes, look for work, or lose benefits
  • Reemployment bonus: applicants randomly assigned
    to control or various treatment groups; treatment
    group offered a bonus (3 or 6 × WBA) if they find
    work in specified time (6 or 12 weeks)
  • Class size: students and teachers randomly
    assigned to small (13-17) or large (22-26) class
  • Vouchers: students apply for voucher, half are
    randomly selected

46
Experiments Not Always Possible
  • Experiments can be
  • expensive
  • controversial (subjects often do not want to be
    in the control or treatment group)
  • unethical (split twins, stuttering, etc.)
  • impossible (effect of economy on election
    results, greenhouse gas emissions on climate,
    etc.)

47
Observational Studies
  • Natural experiments: differences in explanatory
    variable occur between groups or over time
  • prospective: identify groups that differ in some
    aspect (diet), track and measure outcomes
  • retrospective: examine data collected after the
    response, correlate to explanatory variable
  • Observational studies suffer from
  • confounding variables: variables correlated with
    both the explanatory and response variables
  • selection bias: systematic differences in group
    characteristics

48
Natural Experiments
  • Teen smoking: price of cigarettes, laws vary from
    state to state and over time
  • Vouchers: track performance of students who
    receive vouchers, compare to other students
  • Cancer: search for patterns (geographical or
    occupational clusters of disease) in public
    health records
  • Discrimination: search for differences in salary,
    promotions, mortgage lending, etc. by gender and
    race

49
Case-Control
  • Some conditions are too rare to permit
    prospective studies; for example
  • brain cancer from cell phone use or exposure to
    high-voltage transmission lines
  • leukemia or thyroid cancer from exposure to
    fallout
  • Case group is composed of those with condition;
    control group selected to match other
    characteristics of case group
  • Explanatory variable measured for both groups

50
Breast Implants Linked to Suicide
By Marc Kaufman, Washington Post Staff Writer
Thursday, October 2, 2003, Page A13

A series of studies has found a surprisingly
high suicide rate among women who have had
cosmetic breast implants, renewing the
controversy about the procedure just as the Food
and Drug Administration weighs whether to allow
silicone gel implants back on the market.

The latest study, published yesterday, found that
Finnish women who had cosmetic implants were more
than three times more likely to commit suicide
than the general population -- in line with
findings from a similar study of Swedish women
and one of American women conducted by the
National Cancer Institute.

The three studies also found that the overall
death rate for women with implants was the same
as or lower than for the general population,
suggesting that the implants themselves were not
causing illness, as once feared. But all three
found that the suicide rate was significantly,
and at this point inexplicably, higher than
expected.

The question of why women with implants are so
much more likely to commit suicide has become
controversial, especially with an FDA advisory
panel preparing to consider an application by
Inamed Corp. to allow silicone gel breast
implants back on the market for breast
enhancement. The FDA restricted their use to
mastectomy patients and women in clinical trials
in 1992 after concerns arose about their safety.

The studies that found high suicide rates did not
include women who had implants after
mastectomies. Some researchers say the high
suicide rate reflects the psychological makeup of
women who seek implants -- that as a group they
are more likely to have psychological problems
than the general population. But others say the
high suicide rate is a function of the
difficulties and pain that sometimes occur years
after the surgery.

Although the FDA has restricted the use of
silicone gel implants for cosmetic purposes,
saline-filled implants have gained popularity.
According to the American Society of Plastic
Surgeons, more than 225,000 women had the
operation last year, and some say many more will
opt for it should silicone gel implants become
more available. Many women say silicone looks
more natural and feels better.

The Finnish study, which included women who
received the implants as long as 30 years ago,
reported on 2,166 women. It was conducted by the
private International Epidemiology Institute of
Rockville and funded by Dow Corning Corp., a
former manufacturer of silicone gel breast
implants. Dow Corning also funded the larger
Swedish study, which examined 3,521 women with
implants and also found a suicide rate about
three times above normal.

"The ironic thing is that nobody was looking for
this suicide information," said Joseph K.
McLaughlin, lead investigator on the Finnish
study, published in the Annals of Plastic
Surgery. "There have been lots of studies of
women with breast implants, and the only
consistent finding that's problematic is the
suicide excess."

But McLaughlin said that the data did not prove a
cause-and-effect connection between breast
implants and suicide, and that the high rate may
be related to the nature of women who choose to
have implants. "In fact," he said, "it could be
that because of characteristics of women who get
implants, it may be that women who get them may
reduce their risk of later suicide."

51
For Love and Money
By Richard Morin, Washington Post
September 28, 2003, Page B05

Want to be wealthy? If you're a woman, two
distinct paths seem to increase the odds that
you'll strike it rich: Marry young and don't have
kids, or remain single your entire life. But if
you're a man and dreaming of making a fortune,
you can flip a coin before deciding whether to go
to the altar -- married or single men have about
the same chance of becoming wealthy, claim three
sociologists who have studied earnings over the
course of a person's lifetime.

Forget kids if you're seeking financial rather
than emotional riches. In statistical terms,
children are a lousy short-term financial
investment, assert Thomas A. Hirschl and Joyce
Altobelli of Cornell University.

Past research has repeatedly shown a link between
marriage and affluence. Generally, individuals
who got married were more likely to achieve
wealth than people who didn't -- in fact, married
couples were more than twice as likely as singles
to have experienced at least a year of affluence
during the 25-year study period.

"Marriage enhances the odds of female affluence,
but not male affluence," they report in an
article scheduled to appear in a forthcoming
issue of the Journal of Marriage and Family.
"There is no statistically significant difference
between the life course of marital affluence and
the life course of nonmarried male affluence,
suggesting the decision to marry is not crucial
for men. This decision would, however, appear to
be quite crucial for most women."

But who's the richest of them all? It wasn't
middle-aged or older couples. Instead, it was
younger (under 45) marrieds with no children.
Nearly two-thirds of all childless couples
between the ages of 25 and 45 were rich for at
least a year during the study period, compared
with fewer than one in four couples with
children.

So to get rich, if only for a little while,
Hirschl's advice is "marry young and use
contraceptives or have a vasectomy." But, he was
quick to add, kids are cool, as well as critical
for the perpetuation of the species. "There are
more important things in life than merely
financial success," he said.