Title: Producing Data: Sampling
1Chapter 7
2Population and Sample
- Researchers often want to answer questions about some large group of individuals (this group is called the population)
- Often the researchers cannot measure (or survey) all individuals in the population, so they measure a subset of individuals that is chosen to represent the entire population (this subset is called a sample)
- The researchers then use statistical techniques to make conclusions about the population based on the sample
3How Data are Obtained
- Observational Study
  - Observes individuals and measures variables of interest but does not attempt to influence the responses
  - Describes some group or situation
  - Sample surveys are observational studies
- Experiment
  - Deliberately imposes some treatment on individuals in order to observe their responses
  - Studies whether the treatment causes change in the response
4Experiment versus Observational Study
- Both typically have the goal of detecting a relationship between the explanatory and response variables.
- Experiment
  - create differences in the explanatory variable and examine any resulting changes in the response variable (cause-and-effect conclusion)
- Observational Study
  - observe differences in the explanatory variable and notice any related differences in the response variable (association between variables)
5Why Not Always Use an Experiment?
- Sometimes it is unethical or impossible to assign people to receive a specific treatment.
- Certain explanatory variables, such as handedness or gender, are inherent traits and cannot be randomly assigned.
6Confounding
- The problem
  - in addition to the explanatory variable of interest, there may be other variables (explanatory or lurking) that make the groups being studied different from each other
  - the impact of these variables cannot be separated from the impact of the explanatory variable on the response
7Confounding
- The solution
  - Experiment: randomize experimental units to receive different treatments (possible confounding variables should even out across groups)
  - Observational Study: measure potential confounding variables and determine if they have an impact on the response (may then adjust for these variables in the statistical analysis)
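
The randomization step for an experiment can be sketched in a few lines of Python. This is only an illustration: the subject labels, the group names, and the helper `randomize` are hypothetical and do not come from the slides. The idea is that shuffling the units before dealing them into groups leaves chance, not the researcher, to decide who gets which treatment.

```python
import random

def randomize(units, treatments, seed=None):
    """Randomly assign units to treatment groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    # Deal the shuffled units out to the treatments in turn.
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical experiment with 30 subjects and two groups.
subjects = [f"subject_{i:02d}" for i in range(1, 31)]
groups = randomize(subjects, ["treatment", "control"], seed=1)
for name, members in groups.items():
    print(name, len(members))
```

The seed is fixed here only so the example is repeatable; a real study would record how the assignment was generated.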
8Question
A recent newspaper article concluded that smoking
marijuana at least three times a week resulted in
lower grades in college. How do you think the
researchers came to this conclusion? Do you
believe it? Is there a more reasonable
conclusion?
9Case Study
The Effect of Hypnosis on the Immune System
reported in Science News, Sept. 4, 1993, p. 153
10Case Study
The Effect of Hypnosis on the Immune System
Objective: To determine if hypnosis strengthens
the disease-fighting capacity of immune cells.
11Case Study
- 65 college students.
- 33 easily hypnotized
- 32 not easily hypnotized
- white blood cell counts measured
- all students viewed a brief video about the
immune system.
12Case Study
- Students randomly assigned to one of three conditions
  - subjects hypnotized, given mental exercise
  - subjects relaxed in sensory deprivation tank
  - control group (no treatment)
13Case Study
- white blood cell counts re-measured after one week
- the two white blood cell counts are compared for each group
- results
  - hypnotized group showed larger jump in white blood cells
  - easily hypnotized group showed largest immune enhancement
14Case Study
The Effect of Hypnosis on the Immune System
What is the population? What is the sample?
15Case Study
The Effect of Hypnosis on the Immune System
Is this an experiment or an observational study?
16Case Study
The Effect of Hypnosis on the Immune System
Do hypnosis and mental exercise affect the
immune system?
17Case Study
Weight Gain Spells Heart Risk for Women
W.C. Willett et al., "Weight, weight change, and coronary heart
disease in women," Journal of the American Medical
Association, vol. 273(6), Feb. 8, 1995. (Reported in Science
News, Feb. 4, 1995, p. 108)
18Case Study
Weight Gain Spells Heart Risk for Women
Objective: To recommend a range of body mass
index (a function of weight and height) in terms
of coronary heart disease (CHD) risk in women.
19Case Study
- Study started in 1976 with 115,818 women aged 30 to 55 years and without a history of previous CHD.
- Each woman's weight (body mass) was determined.
- Each woman was asked her weight at age 18.
20Case Study
- The cohort of women was followed for 14 years.
- The number of CHD cases (fatal and nonfatal) was counted (1292 cases).
- Results were adjusted for other variables (smoking, family history, menopausal status, post-menopausal hormone use).
21Case Study
- Results compare those who gained less than 11 pounds (from age 18 to current age) to the others.
  - 11 to 17 lbs: 25% more likely to develop heart disease
  - 17 to 24 lbs: 64% more likely
  - 24 to 44 lbs: 92% more likely
  - more than 44 lbs: 165% more likely
22Case Study
Weight Gain Spells Heart Risk for Women
What is the population? What is the sample?
23Case Study
Weight Gain Spells Heart Risk for Women
Is this an experiment or an observational study?
24Case Study
Weight Gain Spells Heart Risk for Women
Does weight gain in women increase their risk for
CHD?
25Bad Sampling Designs
- Voluntary response sampling
  - allowing individuals to choose to be in the sample
- Convenience sampling
  - selecting individuals that are easiest to reach
- Both of these techniques are biased
  - systematically favor certain outcomes
26Voluntary Response
- To prepare for her book Women and Love, Shere Hite sent questionnaires to 100,000 women asking about love, sex, and relationships.
  - 4.5% responded
  - Hite used those responses to write her book
- Moore (Statistics: Concepts and Controversies, 1997) noted
  - respondents were fed up with men and eager to fight them
  - the anger became the theme of the book
  - but angry women are more likely to respond
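
The arithmetic behind voluntary response bias can be sketched as follows. All numbers here are hypothetical and chosen only for illustration; they are not taken from the Hite survey, and `share_among_respondents` is a made-up helper. The point is that a group that responds at a higher rate makes up a larger share of the respondents than of the population.

```python
def share_among_respondents(pop_share, rate_group, rate_others):
    """Share of respondents who belong to a group, given the group's
    share of the population and the two response rates."""
    responding_group = pop_share * rate_group
    responding_others = (1 - pop_share) * rate_others
    return responding_group / (responding_group + responding_others)

# Hypothetical inputs: 20% of the population holds a strong opinion and
# responds at a 15% rate; everyone else responds at a 2% rate.
share = share_among_respondents(0.20, 0.15, 0.02)
print(f"population share: 20%, share among respondents: {share:.0%}")
```

Under these assumed rates, the strongly opinionated group is heavily overrepresented among those who reply, so conclusions drawn from the replies alone would not describe the population.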
27Convenience Sampling
- Sampling mice from a large cage to study how a drug affects physical activity
  - lab assistant reaches into the cage to select the mice one at a time until 10 are chosen
- Which mice will likely be chosen?
  - could this sample yield biased results?
28Simple Random Sampling
- Each individual in the population has the same chance of being chosen for the sample
- Each group of individuals (in the population) of the required size (n) has the same chance of being the sample actually selected
- Random selection
  - drawing names out of a hat
  - table of random digits
  - computer software
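
As one example of the "computer software" option, Python's standard `random` module can draw a simple random sample. This is a minimal sketch; the population of 200 student labels is hypothetical.

```python
import random

# Hypothetical population of 200 labeled individuals.
population = [f"student_{i:03d}" for i in range(1, 201)]

rng = random.Random(2024)  # seeded only so the example is repeatable
# sample() draws without replacement, so every set of 10 distinct
# individuals is equally likely to be the sample selected.
sample = rng.sample(population, k=10)
print(sample)
```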
29Table of Random Digits
- Table B on pg. 654 of text
  - each entry is equally likely to be any of the 10 digits 0 through 9
  - entries are independent of each other (knowledge of one entry gives no information about any other entries)
  - each pair of entries is equally likely to be any of the 100 pairs 00, 01, ..., 99
  - each triple of entries is equally likely to be any of the 1000 values 000, 001, ..., 999
30Choosing a Simple Random Sample (SRS)
- STEP 1: Label each individual in the population
- STEP 2: Use Table B to select labels at random
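
The two steps can be mimicked in code. The sketch below is an assumption-laden illustration, not the textbook's procedure verbatim: `srs_from_digits` is a hypothetical helper, the digit string stands in for a line of a printed table, and labels start at 0 with equal width. Groups of digits that match no label, or that repeat a label already chosen, are skipped.

```python
import random

def srs_from_digits(names, n, digits):
    """Pick n names by reading fixed-width labels from a string of random digits."""
    width = len(str(len(names) - 1))  # digits per label, e.g. 2 for up to 100 names
    # STEP 1: give every individual a label of the same length (00, 01, ...).
    labels = {f"{i:0{width}d}": name for i, name in enumerate(names)}
    chosen = []  # labels selected so far, in order
    # STEP 2: read the digits in groups of `width`, left to right.
    for start in range(0, len(digits) - width + 1, width):
        if len(chosen) == n:
            break
        label = digits[start:start + width]
        # Skip groups that match no individual or repeat an earlier choice.
        if label in labels and label not in chosen:
            chosen.append(label)
    return [labels[label] for label in chosen]

names = [f"person_{i}" for i in range(30)]  # hypothetical population of 30
rng = random.Random(7)
digits = "".join(str(rng.randrange(10)) for _ in range(200))  # stand-in for a table line
print(srs_from_digits(names, 5, digits))
```

If the digit string runs out before n labels are found, the function returns fewer than n names; with a printed table one would simply continue on the next line.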
31Probability Sample
- a sample chosen by chance
- must know what samples are possible and what chance, or probability, each possible sample has of being selected
- an SRS gives each member of the population an equal chance to be selected
32Stratified Random Sample
- first, divide the population into groups of similar individuals, called strata
- second, choose a separate SRS in each stratum
- third, combine these SRSs to form the full sample
33Stratified Random Sample: Example
- Suppose a university has the following student demographics:

Undergraduate   Graduate   First Professional   Special
55%             20%        5%                   20%

A stratified random sample of 100 students could be chosen as follows: select an SRS of 55 undergraduates, an SRS of 20 graduates, an SRS of 5 first professional students, and an SRS of 20 special students; combine these 100 students.
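
The stratified design in this example can be sketched in Python. The rosters below are hypothetical (their sizes simply follow the 55/20/5/20 split), and `stratified_sample` is an illustrative helper, not a library function.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a separate SRS from each stratum and combine the results."""
    rng = random.Random(seed)
    sample = []
    for name, members in strata.items():
        # sample() draws without replacement within the stratum.
        sample.extend(rng.sample(members, k=sizes[name]))
    return sample

# Hypothetical rosters for the four strata.
strata = {
    "undergraduate": [f"U{i}" for i in range(5500)],
    "graduate": [f"G{i}" for i in range(2000)],
    "first professional": [f"P{i}" for i in range(500)],
    "special": [f"S{i}" for i in range(2000)],
}
sizes = {"undergraduate": 55, "graduate": 20, "first professional": 5, "special": 20}

sample = stratified_sample(strata, sizes, seed=3)
print(len(sample))
```

Unlike a single SRS of 100 students, this design guarantees that each stratum appears in the sample in exactly the chosen proportion.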
34Multistage Sample
- several stages of sampling are carried out
- useful for large-scale sample surveys
- samples at each stage may be SRSs, but are often stratified
- stages may involve other random sampling techniques as well (cluster, systematic, random digit dialing, ...)
35Cautions about Sample Surveys
- Undercoverage
  - some individuals or groups in the population are left out of the process of choosing the sample
- Nonresponse
  - individuals chosen for the sample cannot be contacted or refuse to cooperate/respond
- Response bias
  - behavior of respondent or interviewer may lead to inaccurate answers or measurements
- Wording of questions
  - confusing or leading (biased) questions; words with different meanings
36Nonresponse
- To prepare for her book Women and Love, Shere Hite sent questionnaires to 100,000 women asking about love, sex, and relationships.
  - 4.5% responded
  - Hite used those responses to write her book
  - angry women are more likely to respond
37Response Bias
- A door-to-door survey is being conducted to determine drug use (past or present) of members of the community. Respondents may give socially acceptable answers (maybe not the truth!)
- For this survey on drug use, would it matter if a police officer is conducting the interview? (bias from interviewer)
38Asking the Uninformed
Washington Post National Weekly Edition (April 10-16, 1995, p. 36)
Response Bias
- A 1978 poll done in Cincinnati asked people whether they favored or opposed repealing the 1975 Public Affairs Act.
- There was no such act!
- About one third of those asked expressed an opinion about it.
39Wording of Questions
A newsletter distributed by a politician to his
constituents gave the results of a nationwide
survey on Americans' attitudes about a variety of
educational issues. One of the questions asked
was, "Should your legislature adopt a policy to
assist children in failing schools to opt out of
that school and attend an alternative
school--public, private, or parochial--of the
parents' choosing?" From the wording of this
question, can you speculate on what answer was
desired? Explain.
40Wording: Deliberate Bias
- If you found a wallet with $20 in it, would you return the money?
- If you found a wallet with $20 in it, would you do the right thing and return the money?
41Wording: Unintentional Bias
- I have taught several students over the past few years.
  - How many students do you think I have taught?
  - How many years am I referring to?
- Over the past few days, how many servings of fruit have you eaten?
  - How many days are you considering?
  - What constitutes a serving?
42Wording: Unnecessary Complexity
- Do you sometimes find that you have arguments with your family members and co-workers?
  - Arguments with family members
  - Arguments with co-workers
43Wording: Ordering of Questions
- How often do you normally go out on a date? about ___ times a month.
- How happy are you with life in general?
- Strong association between these questions.
- If the ordering is reversed, then there would be no strong association between these questions
44Inferences about the Population
- Values calculated from samples are used to make conclusions (inferences) about unknown values in the population
- Variability
  - different samples from the same population may yield different results for a particular value of interest
  - estimates from random samples tend to be closer to the true values in the population if the samples are larger
  - how close the estimates will likely be to the true values can be calculated; this is called the margin of error
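
A small simulation can illustrate sampling variability. The sketch below makes several assumptions that are not in the slides: the true population proportion is set to 0.5, the helper `sample_proportions` is hypothetical, and the last column uses the common rule of thumb that a 95% margin of error for a sample proportion is roughly 1/sqrt(n).

```python
import random
import statistics

def sample_proportions(true_p, n, repeats, seed=0):
    """Simulate `repeats` random samples of size n and return each sample's proportion."""
    rng = random.Random(seed)
    return [sum(rng.random() < true_p for _ in range(n)) / n for _ in range(repeats)]

for n in (100, 400, 1600):
    props = sample_proportions(true_p=0.5, n=n, repeats=500)
    spread = statistics.pstdev(props)  # how much the estimates vary from sample to sample
    print(f"n={n:5d}  spread of estimates={spread:.3f}  rough margin of error={1 / n ** 0.5:.3f}")
```

In runs like this the spread shrinks as n grows, and quadrupling the sample size roughly halves it, which is why larger random samples give estimates that tend to sit closer to the true value.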