Title: Collecting Data Sensibly
Chapter 2
- Collecting Data Sensibly
- Note: Correct usage of the vocabulary in this chapter is VERY important!
Consider the following headlines, which appeared on September 25, 2009:
- "Spanking lowers a child's IQ" (Los Angeles Times)
- "Do you spank? Studies indicate it could lower your kid's IQ." (SciGuy, Houston Chronicle)
- "Spanking can lower IQ" (NBC4i, Columbus, Ohio)
- "Smacking hits kids' IQ" (newscientist.com)
These headlines imply that spanking is the CAUSE of the observed difference in IQ. Is this conclusion reasonable?
In this study, two groups of children were followed for 4 years: 806 children ages 2 to 4 and 704 children ages 5 to 9. IQ was measured at the beginning of the study and again four years later. Researchers found that the average IQ of children ages 2 to 4 who were not spanked was 5 points higher than that of those who were spanked, and 2.8 points higher for children ages 5 to 9.
Observation versus Experimentation
- Look at the following two examples:
  - A social scientist studying a rural community wants to determine whether gender and attitudes toward abortion are related. Using a telephone survey, 100 residents are contacted at random, and their gender and attitude toward abortion are recorded.
  - A professor wonders what would happen to final test scores if the required lab time for a chemistry course were increased from 3 hours to 6 hours. Of 100 chemistry students, half were randomly assigned to the 3-hour lab and half to the 6-hour lab. The rest of the course remained the same for the two groups. The difference in their final test scores will be examined.
- How do these two examples differ? Think about:
  - How were the groups determined?
  - Were any variables controlled?
  - What did the researcher do?
Which is the experiment and which is the observational study?
Definitions
- Observational study: a study in which the researcher observes characteristics of a sample selected from one or more populations.
- Experiment: a study in which the researcher observes how a response variable behaves when one or more explanatory variables (factors) are manipulated.
A well-designed experiment can result in data that provide evidence for a cause-effect relationship.
Let's return to the study on spanking and IQ
- In this study, two groups of children were followed for 4 years: 806 children ages 2 to 4 and 704 children ages 5 to 9. IQ was measured at the beginning of the study and again four years later. Researchers found that the average IQ of children ages 2 to 4 who were not spanked was 5 points higher than that of those who were spanked, and 2.8 points higher for children ages 5 to 9.
- Does spanking CAUSE a decrease in IQ? Why or why not?
- Are there other variables connected to the response (decreased IQ) and the groups of children?
These are called confounding variables.
Definition
- Confounding variable: a variable that is related to both group membership and the response variable of interest in a research study.
Because observational studies may contain confounding variables, their results can NOT be used to show cause-effect relationships.
- Observational studies CAN be generalized to the population if the sample is randomly selected from the population of interest, but CANNOT show cause-effect relationships.
- Well-designed experiments CAN show cause-effect relationships, but CANNOT be generalized to the population if the subjects are volunteers or are not randomly selected from the population.
Sampling
Census versus Sample
Obtaining information about the entire population is called a census.
- Why might we prefer to select a sample rather than perform a census?
  - Measurements that require destroying the item
    - Measuring how long batteries last
    - Safety ratings of cars
  - Difficult to find the entire population
    - Length of fish in a lake
  - Limited resources (the most common reason to use a sample)
    - Time and money
Methods of selecting random samples
- Simple Random Sample (SRS)
  - A sample of size n is selected from the population in a way that ensures that every different possible sample of the desired size has the same chance of being selected.
It has to be possible for all 100 students in the sample to be seniors, or any other combination of students!
A simple random sample does NOT guarantee that the sample is representative of the population.
Methods of selecting random samples
- Simple Random Sample (SRS), continued
  - Sampling frame: a list of all the objects or individuals in the population.
Another way to select a simple random sample is to create a list of all the students in the school (called a sampling frame). Number each student with a unique number from 1 to 2000. Use a random digit table or a random number generator (a calculator or computer software) to select the 100 students for the sample.
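The random number generator route can be sketched in a few lines of Python. This is only an illustration: the sampling frame is assumed to be the student numbers 1 to 2000, and the seed is fixed purely so the run is repeatable.

```python
import random

# Hypothetical sampling frame: student ID numbers 1 through 2000.
sampling_frame = list(range(1, 2001))

rng = random.Random(2009)  # fixed seed only so the example is repeatable

# random.sample draws without replacement, and every possible set of
# 100 students is equally likely to be chosen.
srs = rng.sample(sampling_frame, 100)

print(sorted(srs)[:10])  # the ten smallest selected student numbers
```

Because every subset of 100 is equally likely, an unrepresentative sample (for example, all seniors) remains possible, just very unlikely.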
How to use a random digit table
- The following is part of the random digit table found in the back of your textbook:
Row   Digits
6     0 9 3 8 7 6 7 9 9 5 6 2 5 6 5 8 4 2 6 4
7     4 1 0 1 0 2 2 0 4 7 5 1 1 9 4 7 9 7 5 1
8     6 4 7 3 6 3 4 5 1 2 3 1 1 8 0 0 4 8 2 0
9     8 0 2 8 7 9 3 8 4 0 4 2 0 8 9 1 2 3 3 2
Since our students are numbered 1-2000, we will select 4-digit numbers from the table. If the number is not within 1-2000, we will ignore it.
We would continue in this fashion until we had selected 100 numbers. It would be faster to use a random number generator.
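As a rough sketch of the table-reading procedure, the snippet below walks through rows 6 to 9 in 4-digit groups. It assumes we start at the beginning of row 6 and read left to right (the slide does not say where to start), and the helper name `pick_from_digits` is made up for this illustration.

```python
# Rows 6-9 of the random digit table, copied from the slide.
rows = [
    "09387679956256584264",
    "41010220475119479751",
    "64736345123118004820",
    "80287938404208912332",
]

def pick_from_digits(digits, population_size, width):
    """Read consecutive `width`-digit numbers; keep new ones in 1..population_size."""
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        number = int(digits[i:i + width])
        # Ignore numbers outside the frame and numbers already selected.
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
    return chosen

selected = pick_from_digits("".join(rows), population_size=2000, width=4)
print(selected)  # [938, 220, 1947, 1231, 1800, 891]
```

Only 6 of the 20 four-digit groups fall between 1 and 2000, which shows why reaching 100 students this way takes many rows of the table.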
Methods of selecting random samples
- Simple Random Sample (SRS), continued
  - Most often, sampling is done without replacement: once an individual or object is selected, it is not replaced and cannot be selected again.
  - Sampling with replacement allows an object or individual to be selected more than once for a sample.
  - Although sampling with and without replacement are different, they can be treated as the same when the sample size n is small relative to the population size (no more than 10% of the population).
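The difference between the two schemes can be illustrated with Python's standard library. This is a sketch under assumed numbers: a population of 2000 and a sample of 100, which is 5% of the population and so within the 10% guideline.

```python
import random

rng = random.Random(1)  # fixed seed only so the example is repeatable
population = list(range(1, 2001))  # hypothetical population of 2000 individuals

without_replacement = rng.sample(population, 100)  # no individual can repeat
with_replacement = rng.choices(population, k=100)  # repeats are possible

n_over_N = len(without_replacement) / len(population)
print(f"sample is {n_over_N:.0%} of the population")
print("distinct individuals drawn with replacement:", len(set(with_replacement)))
```

When the sample is this small relative to the population, draws with replacement rarely repeat anyone, which is why the two schemes can be treated as the same.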
Methods of selecting random samples
- Stratified Random Sample
  - The population is divided into non-overlapping subgroups called strata.
  - Simple random samples are selected from each stratum.
  - Sometimes easier to implement and more cost-effective than simple random sampling
  - Sometimes allows more accurate inferences about a population than simple random sampling
Strata are groups that are similar (homogeneous) based upon some characteristic of the group members.
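A minimal sketch of stratified random sampling follows. The strata (class years), their sizes, and the choice of 25 students per stratum are all assumptions made for illustration; the slides do not specify them.

```python
import random

rng = random.Random(14)  # fixed seed only so the example is repeatable

# Hypothetical strata: student numbers grouped by class year.
strata = {
    "freshman": list(range(1, 601)),
    "sophomore": list(range(601, 1101)),
    "junior": list(range(1101, 1601)),
    "senior": list(range(1601, 2001)),
}

# Take a separate simple random sample from every stratum.
stratified_sample = {
    name: rng.sample(members, 25) for name, members in strata.items()
}

for name, chosen in stratified_sample.items():
    print(name, sorted(chosen)[:5])
```

Unlike a simple random sample, this design guarantees that every stratum appears in the sample.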
Methods of selecting random samples
- Cluster Sampling
  - The population is divided into non-overlapping subgroups called clusters.
  - Clusters are randomly selected, and then all the individuals in the selected clusters are included in the sample.
  - Cluster sampling is often easier to perform and more cost-effective.
Clusters are often based upon location. It is best if the clusters are heterogeneous subgroups from the population.
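Cluster sampling can be sketched the same way. The clusters here (20 city blocks of 30 residents each) and the choice of 4 blocks are invented for illustration.

```python
import random

rng = random.Random(15)  # fixed seed only so the example is repeatable

# Hypothetical clusters: 20 city blocks with 30 residents each.
clusters = {
    block: [f"block{block}-resident{r}" for r in range(1, 31)]
    for block in range(1, 21)
}

# Randomly choose whole clusters, then include everyone in them.
chosen_blocks = rng.sample(sorted(clusters), 4)
cluster_sample = [person for block in chosen_blocks for person in clusters[block]]

print(chosen_blocks, len(cluster_sample))
```

Note the contrast with stratified sampling: here the randomness is in which groups are chosen, not in who is chosen within a group.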
Methods of selecting random samples
- Systematic Sampling
  - A value k is specified (for example, k = 50 or k = 200).
  - One of the first k individuals is selected at random.
  - Then every kth individual in the sequence is included in the sample.
  - This method works reasonably well as long as there are no repeating patterns in the population list.
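The systematic sampling steps above might be coded as follows. The function name `systematic_sample` and the list of 200 customers are hypothetical.

```python
import random

def systematic_sample(population_list, k, rng):
    """Pick a random start among the first k items, then take every kth item."""
    start = rng.randrange(k)  # index of one of the first k individuals
    return population_list[start::k]

rng = random.Random(16)  # fixed seed only so the example is repeatable
customers = list(range(1, 201))  # hypothetical ordered list of 200 customers
sample = systematic_sample(customers, k=10, rng=rng)

print(sample)
```

Only the starting point is random; once it is chosen, the rest of the sample is fully determined, which is why a repeating pattern in the list can cause trouble.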
Identify the sampling design
- 1) The Educational Testing Service (ETS) needed a sample of colleges. ETS first divided all colleges into groups of similar types (small public, small private, medium public, medium private, large public, and large private). Then they randomly selected 3 colleges from each group.
Answer: Stratified random sample
Identify the sampling design
- 2) A county commissioner wants to survey people in her district to determine their opinions on a particular law up for adoption. She decides to randomly select blocks in her district and then survey all who live on those blocks.
Answer: Cluster sampling
Identify the sampling design
- 3) A local restaurant manager wants to survey customers about the service they receive. Each night the manager randomly chooses a number between 1 and 10. He then gives a survey to that customer, and to every 10th customer after them, to fill out before they leave.
Answer: Systematic sampling
- Consider the following example:
- In 1936, Franklin Delano Roosevelt had been President for one term. The magazine The Literary Digest predicted that Alf Landon would beat FDR in that year's election by 57 to 43 percent. The Digest mailed over 10 million questionnaires to names drawn from lists of automobile and telephone owners, and over 2.3 million people responded: a huge sample.
- At the same time, a young man named George Gallup sampled only 50,000 people and predicted that Roosevelt would win. Gallup's prediction was ridiculed as naive. After all, the Digest had predicted the winner in every election since 1916, and had based its predictions on the largest response to any poll in history. But Roosevelt won with 62% of the vote. The size of the Digest's error is staggering.
This is a classic example of how bias affects the results of a sample!
Bias is the tendency for samples to differ from the corresponding population in some systematic way.
Sources of bias
- Selection bias
  - Occurs when the way the sample is selected systematically excludes some part of the population of interest (called undercoverage)
  - May also occur if only volunteers or self-selected individuals are used in a study
Suppose you take a sample by randomly selecting names from the phone book: some groups will not have the opportunity of being selected!
Sources of bias
- Convenience sampling
  - Using an easily available or convenient group to form a sample
  - The group may not be representative of the population of interest
  - Results should not be generalized to the population
  - Can also occur when samples rely entirely on volunteers to be part of the sample (called voluntary response)
Suppose we decide to survey only the students in our statistics class. Why might that cause bias in a survey?
An example of voluntary response would be the surveys in magazines that ask readers to mail in the survey. Other examples are call-in shows, American Idol, etc. Remember, the respondents select themselves to participate in the survey!
Sources of bias
- Measurement or response bias
  - Occurs when the method of observation tends to produce values that systematically differ from the true value in some way
  - An improperly calibrated scale is used to weigh items
  - Tendency of people not to be completely honest when asked about illegal behavior or unpopular beliefs
  - Appearance or behavior of the person asking the questions
  - Questions on a survey are worded in a way that tends to influence the response
Suppose we wanted to survey high school students on drug abuse and we used a uniformed police officer to interview each student in our sample. Would we get honest answers?
A Gallup survey sponsored by the American Paper Institute (Wall Street Journal, May 17, 1994) included the following question: "It is estimated that disposable diapers account for less than 2% of the trash in today's landfills. In contrast, beverage containers, third-class mail and yard waste are estimated to account for about 21% of trash in landfills. Given this, in your opinion, would it be fair to tax or ban disposable diapers?"
Sources of bias
- Nonresponse
  - Occurs when responses are not obtained from all individuals selected for inclusion in the sample
  - To minimize nonresponse bias, it is critical that a serious effort be made to follow up with individuals who did not respond to the initial request for information
How might this follow-up be done?
The phone rings and you answer. "Hello," the person says, "do you have time for a survey about radio stations?" You hang up!
People are chosen by the researchers, BUT refuse to participate. They are NOT self-selected! This is often confused with voluntary response!
Identify a potential source of bias.
- 1) Before the presidential election of 1936, which pitted FDR against Republican Alf Landon, the magazine Literary Digest predicted that Landon would win the election in a 3-to-2 victory, based on a survey of 2.3 million people. George Gallup surveyed only 50,000 people and predicted that Roosevelt would win. The Digest's survey came from magazine subscribers, car owners, telephone directories, etc.
Answer: Undercoverage. Since the Digest's survey came from car owners, etc., the people selected were mostly from high-income families and thus mostly Republican! (Other answers are possible.)
Identify a potential source of bias.
- 2) Suppose that you want to estimate the total amount of money spent by students on textbooks each semester at a local college. You collect register receipts for students as they leave the bookstore during lunch one day.
Answer: Convenience sampling (an easy way to collect data) or undercoverage (students who buy books from online bookstores are excluded).
Identify a potential source of bias.
- 3) To find the average value of a home in Plano, one averages the price of homes that are listed for sale with a realtor.
Answer: Undercoverage. This leaves out homes that are not for sale or homes that are listed with different realtors. (Other answers are possible.)
Comparative Experiments
- Suppose we are interested in determining the effect of room temperature on performance on a first-semester calculus exam. So we decide to perform an experiment.
- What variable will we measure?
  - The performance on a calculus exam
This is called the response variable. Response variable: a variable that is not controlled by the experimenter and that is measured as part of the experiment.
- What variable will explain the results on the calculus exam?
  - The room temperature
This is called the explanatory variable. Explanatory variables: those variables that have values that are controlled by the experimenter (also called factors).
Room temperature experiment continued . . .
- We decide to use two temperature settings, 65° and 75°.
- How many treatments would our experiment have?
  - The 2 treatments are the 2 temperature settings.
Experimental condition: any particular combination of values of the explanatory variables (also called a treatment).
Room temperature experiment continued . . .
- Suppose we have 10 sections of first-semester calculus that have agreed to participate in our study.
- On whom or what will we impose the treatments?
  - The 10 sections of calculus
These are our subjects or experimental units. Experimental unit: the smallest unit to which a treatment is applied.
- How would we determine which sections would be in rooms with the temperature set at 65° and which sections would be in rooms set at 75°?
  - We need to randomly assign them to the treatments.
Random assignment of subjects to treatments or treatments to trials ensures that the experiment does not systematically favor one treatment over another.
Room temperature experiment continued . . .
To randomly assign the 10 sections of first-semester calculus to the 2 treatment groups, we would first number the classes 1-10. Place the numbers 1-10 on identical slips of paper and put them in a hat. Mix well.
Randomly select 5 numbers from the hat. Those will be the sections that have the room temperature set at 65°.
The remaining sections will have the room temperature set at 75°.
Treatment           Sections assigned
Treatment 1 (65°)   9 7 5 8 3
Treatment 2 (75°)   1 2 4 6 10
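The slips-in-a-hat procedure can be mimicked in software. This is a sketch only: shuffling the list stands in for mixing the hat, and the seed is fixed just so the run is repeatable, so the sections drawn will not match the slide's table.

```python
import random

rng = random.Random(32)  # fixed seed only so the example is repeatable
sections = list(range(1, 11))  # sections numbered 1-10

hat = sections[:]
rng.shuffle(hat)  # "mix well"

treatment_65 = sorted(hat[:5])  # first 5 slips drawn: room set at 65
treatment_75 = sorted(hat[5:])  # remaining 5 sections: room set at 75

print("65:", treatment_65)
print("75:", treatment_75)
```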
Room temperature experiment continued . . .
Treatment           Sections assigned
Treatment 1 (65°)   9 7 5 8 3
Treatment 2 (75°)   1 2 4 6 10
Notice that there are five sections assigned to each treatment. This is called replication.
Why is replication an important trait of a well-designed experiment?
Replication ensures that we have multiple observations for each treatment.
Room temperature experiment continued . . .
- Remember, the explanatory variable is the room temperature setting, 65° and 75°. The response variable is the grade on the calculus exam.
- Are there other variables that could affect the response?
  - Instructor?
  - Textbook?
  - Time of day?
  - Ability level of students?
These other variables are called extraneous variables. An extraneous variable is a variable that is NOT one of the explanatory variables (factors) but is thought to affect the response.
Can the experimenter control these extraneous variables? If so, how?
In an experiment, these extraneous variables need to be controlled. Direct control is holding the extraneous variables constant so that their effects are not confounded with those of the experimental conditions (treatments).
What about the variables that the experimenter can't directly control? What can be done to avoid confounding results?
Remember: two variables are confounded if their effects on the response cannot be distinguished from each other.
Room temperature experiment continued . . .
- Suppose that there were five instructors who taught the first-semester calculus sections. We do not have direct control of this variable; however, we could have each instructor teach 2 sections. Then we could randomly assign one of each instructor's 2 sections a temperature setting of 65° and the other a temperature setting of 75°.
This is an example of blocking. Blocking is the process by which an extraneous variable's effects are filtered out. Similar groups, called blocks, are created. All treatments are tried in each block.
Room temperature experiment continued . . .
- What about extraneous variables that we cannot control directly, that we cannot block for, or that we don't even think about?
- Random assignment should evenly spread all extraneous variables that are not controlled directly or blocked across all treatment groups. We expect these variables to affect all the experimental groups in the same way; therefore, their effects are not confounding.
Room temperature experiment continued . . .
- Would the students in each section of calculus know to which treatment group, 65° or 75°, they were assigned?
- If the students knew about the experiment, they would probably know which treatment group they were in.
- So this experiment is probably NOT blinded.
An experiment in which the subjects do not know which treatment they received is called a single-blind experiment.
A double-blind experiment is one in which neither the subjects nor the individuals who measure the response know which treatment was received.
- In the room temperature experiment, we have only 2 treatment groups, 65° and 75°. We do NOT have a control group.
- A control group is an experimental group that does NOT receive any treatment.
- The use of a control group allows the experimenter to assess how the response variable behaves when the treatment is not used.
- This provides a baseline against which the treatment groups can be compared to determine whether the treatment had an effect.
- Consider Anna, a waitress. She decides to perform an experiment to determine if writing "Thank you" on the receipt increases her tip percentage.
- She plans on having two groups. For one group she will write "Thank you" on the receipt, and for the other group she will not.
Which of these is the control group?
- Suppose we want to test an herbal supplement to determine if it aids in weight loss.
- Why would it not be beneficial to have two groups in the experiment: one that takes the supplement and a control group that takes nothing?
- What could be done to remedy this problem?
  - Give one group the supplement and give the other group a pill that is the same size, color, taste, smell, etc. as the supplement, but contains no active ingredient.
This is called a placebo. A placebo is something that is identical to the treatment received by the treatment group but contains no active ingredient.
Let's recap some ideas:
Random assignment removes the potential for
confounding variables.
Blocking uses extraneous variables to create
groups (blocks) that are similar. All treatments
are then tried in each block.
Direct control holds extraneous variables
constant so their effects are not confounded with
the treatments.
Experimental Designs
- 1. Completely randomized design: experimental units are assigned at random to treatments, or treatments are assigned at random to trials
[Diagram: Random Assignment]
Let's look at two examples of completely randomized experiments.
Example 1: A farm-product manufacturer wants to determine if the yield of a crop is different when the soil is treated with three different types of fertilizers. Fifteen similar plots of land are planted with the same type of seed but are fertilized differently. At the end of the growing season, the mean yield from the sample plots is compared.
- Experimental units? Plots of land
- Factors? Type of fertilizer
- Response variable? Yield of crop
- How many treatments? 3
Fertilizer experiment continued
- Why is the same type of seed used on all 15 plots? It is part of the controls in the experiment.
- What are other potential extraneous variables? Type of soil, amount of water, etc.
- Does this experiment have a placebo? Explain. NO, a placebo is not needed in this experiment.
Example 2: A consumer group wants to test cake pans to see which works best (bakes evenly). It will test aluminum, glass, and plastic pans in both gas and electric ovens. There are 30 boxes of cake mix to use for this experiment.
- Experimental units? Cake mixes
- Factors? Two factors: type of pan (aluminum, glass, and plastic) and type of oven (electric and gas)
- Response variable? How evenly the cake bakes
- Name the treatments. Aluminum pan in electric oven, aluminum pan in gas oven, glass pan in electric oven, glass pan in gas oven, plastic pan in electric oven, and plastic pan in gas oven
Cake experiment continued
Describe how to randomly assign the 30 boxes of cake mix to the 6 treatments so that there is an equal number in each treatment.
Could we roll a die for each box? If we roll a 1, assign the box to the first treatment (aluminum pan in electric oven). If we roll a 2, assign the box to the 2nd treatment, and so on.
Number the boxes of cake mix from 1 to 30. Write the numbers 1 to 30 on identical slips of paper and place them into a hat. Mix well. Randomly select 5 numbers from the hat and assign those boxes to the treatment of aluminum pan in electric oven. Randomly select 5 more numbers and assign those boxes to the treatment of aluminum pan in gas oven. Continue this process, randomly assigning 5 boxes to each of the treatments glass pan in electric oven, glass pan in gas oven, and plastic pan in electric oven. The remaining 5 are assigned to plastic pan in gas oven.
This is just one way that you can perform this randomization.
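One way to sketch this randomization in code is shown below. Splitting 30 boxes evenly over the 6 pan-and-oven combinations gives 5 boxes per treatment; the shuffle stands in for drawing slips from the hat, and the seed is fixed only so the run is repeatable.

```python
import random
from itertools import product

rng = random.Random(46)  # fixed seed only so the example is repeatable

pans = ["aluminum", "glass", "plastic"]
ovens = ["electric", "gas"]
treatments = list(product(pans, ovens))  # the 6 pan/oven combinations

boxes = list(range(1, 31))  # boxes of cake mix numbered 1-30
rng.shuffle(boxes)          # "mix well"

per_treatment = len(boxes) // len(treatments)  # 30 / 6 = 5 boxes each
assignment = {
    treatment: sorted(boxes[i * per_treatment:(i + 1) * per_treatment])
    for i, treatment in enumerate(treatments)
}

for treatment, assigned in assignment.items():
    print(treatment, assigned)
```

Dealing out a shuffled list in fixed-size groups guarantees equal group sizes, which rolling a die for each box would not.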
Experimental Designs Continued . . .
- 2. Randomized block: units are first grouped into homogeneous blocks and then, within each block, randomly assigned to treatments
Units should be blocked on a variable that affects the response!!!
[Diagram: Create blocks, then Random Assignment within each block]
- Fertilizer experiment revisited: A farm-product manufacturer wants to determine if the yield of a crop is different when the soil is treated with two different types of fertilizers. Twenty plots of land (10 plots are along a river and 10 plots are away from the river) are planted with the same type of seed but are fertilized differently. At the end of the growing season, the mean yield from the sample plots is compared.
- Can the experimenter directly control the types of soil in the different plots of land? No, they must use the plots that are available.
- What can be done to account for this variable? They could block by type of land.
- Fertilizer experiment revisited
- Describe how to create the blocks of land and then to randomly assign plots to the 2 types of fertilizer.
  - First create 2 blocks of land. Block 1 would be the 10 plots that are by the river. Block 2 would be the 10 plots away from the river.
  - Number the 10 plots in block 1 from 1 to 10. Write the numbers 1 to 10 on identical slips of paper and place them into a hat. Mix well. Randomly select 5 numbers from the hat and assign those plots to Fertilizer A. The remaining 5 are assigned to Fertilizer B.
  - Number the 10 plots in block 2 from 1 to 10. Write the numbers 1 to 10 on identical slips of paper and place them into a hat. Mix well. Randomly select 5 numbers from the hat and assign those plots to Fertilizer A. The remaining 5 are assigned to Fertilizer B.
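The same block-then-randomize steps can be sketched in code. The plot labels are invented for illustration; the key point is that the shuffle happens separately inside each block.

```python
import random

rng = random.Random(49)  # fixed seed only so the example is repeatable

# Hypothetical plot labels for the two blocks.
blocks = {
    "river": [f"river-{i}" for i in range(1, 11)],
    "away": [f"away-{i}" for i in range(1, 11)],
}

assignment = {}
for block_name, plots in blocks.items():
    shuffled = plots[:]
    rng.shuffle(shuffled)  # randomize within this block only
    assignment[block_name] = {
        "A": sorted(shuffled[:5]),  # 5 plots get Fertilizer A
        "B": sorted(shuffled[5:]),  # remaining 5 get Fertilizer B
    }

for block_name, groups in assignment.items():
    print(block_name, groups)
```

Because each fertilizer is used on 5 plots in each block, any effect of being near the river is balanced across the two fertilizers.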
Experimental Designs Continued . . .
- 3. Matched pairs: a special type of block design in which each block consists of 2 similar experimental units, each randomly assigned to a different treatment
- OR
- each block consists of an individual unit that receives both treatments in random order
Example 3: Two new word-processing programs are to be compared by measuring the speed with which a standard task can be completed. One hundred volunteers will perform the same task on each of the programs in random order, and their speeds will be measured.
- Explain why this is a matched pairs design. Each block consists of an individual who will receive both treatments.
- How could we determine which program the volunteers use first? We could flip a coin for each volunteer: heads, they do program A first; tails, they do program B first.
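The coin flip for each volunteer can be simulated as below. This is an illustrative sketch: the volunteer labels are made up, and a random number below 0.5 is treated as "heads".

```python
import random

rng = random.Random(51)  # fixed seed only so the example is repeatable
volunteers = [f"volunteer-{i}" for i in range(1, 101)]  # hypothetical labels

order = {}
for person in volunteers:
    heads = rng.random() < 0.5  # simulated fair coin flip
    # Heads: program A first; tails: program B first.
    order[person] = ("A", "B") if heads else ("B", "A")

a_first = sum(1 for o in order.values() if o[0] == "A")
print("A first:", a_first, "B first:", len(order) - a_first)
```

Independent coin flips do not force exactly 50 volunteers into each order; every volunteer still uses both programs, which is what makes the design matched pairs.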
The ONLY way to show a cause-effect relationship is with a well-designed, well-controlled experiment!!!