Collecting Data Sensibly - PowerPoint PPT Presentation

Title:

Collecting Data Sensibly

Description:

Title: Chapter 5 Author: Type Your Name Here Last modified by: Plano ISD Created Date: 10/22/2001 12:29:24 PM Document presentation format: On-screen Show – PowerPoint PPT presentation

Slides: 53
Provided by: Typey83

Transcript and Presenter's Notes

Title: Collecting Data Sensibly


1
Chapter 2
  • Collecting Data Sensibly
  • Note: Correct usage of the vocabulary in this
    chapter is VERY important!

2
Consider the following headlines, which appeared
on September 25, 2009.
  • Spanking lowers a child's IQ (Los Angeles
    Times)
  • Do you spank? Studies indicate it could lower
    your kid's IQ. (SciGuy, Houston Chronicle)
  • Spanking can lower IQ (NBC4i, Columbus, Ohio)
  • Smacking hits kids' IQ (newscientist.com)

These headlines imply that spanking is the CAUSE
of the observed difference in IQ. Is this
conclusion reasonable?

In this study, two groups of children were
followed for 4 years: 806 children ages 2 to 4
and 704 children ages 5 to 9. IQ was measured at
the beginning of the study and again four years
later. Researchers found that, among children
ages 2 to 4, the average IQ of those who were not
spanked was 5 points higher than that of those
who were spanked. Among children ages 5 to 9, it
was 2.8 points higher.
3
Observation versus Experimentation
  • How do these two examples differ? Think about:
  • How were the groups determined?
  • Were any variables controlled?
  • What did the researcher do?
  • Look at the following two examples:
  • A social scientist studying a rural community
    wants to determine whether gender and attitudes
    toward abortion are related. Using a telephone
    survey, 100 residents are contacted at random and
    their gender and attitude toward abortion are
    recorded.
  • A professor might wonder what would happen to
    final test scores if the required lab time for a
    chemistry course is increased from 3 hours to
    6 hours. For 100 chemistry students, half were
    randomly assigned to the 3-hour lab and half to
    the 6-hour lab. The rest of the course remained
    the same for the two groups. The difference in
    their final test scores will be examined.

Which is the experiment and which is the
observational study?
4
Definitions
  • Observational study - a study in which the
    researcher observes characteristics of a sample
    selected from one or more populations.
  • Experiment - a study in which the researcher
    observes how a response variable behaves when one
    or more explanatory variables (factors) are
    manipulated.

A well-designed experiment can result in data
that provides evidence for a cause-effect
relationship.
5
Let's return to the study on spanking and IQ
  • In this study, two groups of children were
    followed for 4 years: 806 children ages 2 to 4
    and 704 children ages 5 to 9. IQ was measured at
    the beginning of the study and again four years
    later. Researchers found that, among children
    ages 2 to 4, the average IQ of those who were not
    spanked was 5 points higher than that of those
    who were spanked. Among children ages 5 to 9, it
    was 2.8 points higher.
  • Does spanking CAUSE a decrease in IQ? Why or
    why not?
  • Are there other variables connected to the
    response (decreased IQ) and the groups of
    children?

These are called confounding variables.
6
Definition
  • Confounding variable - a variable that is related
    to both group membership and the response
    variable of interest in a research study

Because observational studies may contain
confounding variables, their results can NOT be
used to show cause-effect relationships.
7
  • Observational studies CAN be generalized to the
    population if the sample is randomly selected
    from the population of interest, but CANNOT show
    cause-effect relationships.
  • Well-designed experiments CAN show cause-effect
    relationships, but CANNOT be generalized to the
    population if the subjects are volunteers or are
    not randomly selected from the population.

8
Sampling
  • Section 2.2

9
Census versus Sample
  • Why might we prefer to select a sample
    rather than perform a census?
  • Measurements that require destroying the item
  • Measuring how long batteries last
  • Safety ratings of cars
  • Difficult to find entire population
  • Length of fish in a lake
  • Limited resources
  • Time and money

Obtaining information about the entire population
is called a census.
Limited resources are the most common reason to use a sample.
10
Methods of selecting random samples
  • Simple Random Sample (SRS)
  • A sample of size n is selected from the
    population in a way that ensures that every
    different possible sample of the desired size has
    the same chance of being selected.

It has to be possible for all 100 students in the
sample to be seniors or any other combination
of students!
A simple random sample does NOT guarantee that
the sample is representative of the population.
11
Methods of selecting random samples
  • Simple Random Sample (SRS) continued
  • A sample of size n is selected from the
    population in a way that ensures that every
    different possible sample of the desired size has
    the same chance of being selected.
  • Sampling frame - a list of all the objects or
    individuals in the population.

Another way to select a simple random sample is
to create a list of all the students in the
school (called a sampling frame). Number each
student with a unique number from 1 to 2000. Use
a random digit table or random number generator
(a calculator or computer software) to select the
100 students for the sample.
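The random number generator route can be sketched in a few lines of Python. This is only an illustration, assuming the 2000 students have already been numbered 1 to 2000. The function name simple_random_sample and the seed value are made up for this sketch.

```python
import random

def simple_random_sample(population_size, n, seed=None):
    """Return n distinct ID numbers drawn from 1..population_size."""
    rng = random.Random(seed)  # the seed only makes the sketch repeatable
    # random.sample draws without replacement, so every possible set of
    # n ID numbers has the same chance of being chosen.
    return rng.sample(range(1, population_size + 1), n)

sample_ids = simple_random_sample(2000, 100, seed=1)
print(sorted(sample_ids)[:10])  # first few selected student numbers
```

The students whose numbers appear in sample_ids would form the sample.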
12
How to use a Random digit table
  • The following is part of the random digit table
    found in the back of your textbook

Row  Digits
6    0 9 3 8 7 6 7 9 9 5 6 2 5 6 5 8 4 2 6 4
7    4 1 0 1 0 2 2 0 4 7 5 1 1 9 4 7 9 7 5 1
8    6 4 7 3 6 3 4 5 1 2 3 1 1 8 0 0 4 8 2 0
9    8 0 2 8 7 9 3 8 4 0 4 2 0 8 9 1 2 3 3 2
Since our students are numbered 1-2000, we will
select 4-digit numbers from the table. If the
number is not within 1-2000, we will ignore it.
We would continue in this fashion until we had
selected 100 numbers. It would be faster to use
a random number generator.
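The table-reading procedure can be traced with a short Python sketch. It assumes the first number on each line of the excerpt is the row label (6 to 9) and the remaining 20 digits are the random digits, read left to right in groups of four. The helper name select_from_digits is made up for this sketch.

```python
# Digits from rows 6-9 of the table excerpt, with the row labels removed.
ROWS = {
    6: "09387679956256584264",
    7: "41010220475119479751",
    8: "64736345123118004820",
    9: "80287938404208912332",
}

def select_from_digits(digits, population_size, width=4):
    """Read width-digit numbers in order, keeping new ones in 1..population_size."""
    selected = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        # Ignore numbers outside the frame and numbers already selected.
        if 1 <= number <= population_size and number not in selected:
            selected.append(number)
    return selected

digits = "".join(ROWS[row] for row in sorted(ROWS))
chosen = select_from_digits(digits, 2000)
print(chosen)  # [938, 220, 1947, 1231, 1800, 891]
```

Only 6 of the 20 four-digit numbers in these rows fall within 1-2000, which shows why reaching 100 students by table is slow.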
13
Methods of selecting random samples
  • Simple Random Sample (SRS) continued
  • A sample of size n is selected from the
    population in a way that ensures that every
    different possible sample of the desired size has
    the same chance of being selected.
  • Although sampling with and without replacement
    are different, they can be treated as the same
    when the sample size n is relatively small
    compared to the population size (no more than 10%
    of the population).

Most often sampling is done without replacement.
That is, once an individual or object is selected,
it is not replaced and cannot be selected
again. Sampling with replacement allows an object
or individual to be selected more than once for a
sample.
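The difference between the two kinds of sampling can be seen in a small Python sketch. It assumes a population of 2000 numbered students. The helper within_ten_percent is a made-up name that just checks the 10% guideline.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable
students = list(range(1, 2001))

# Without replacement: a student can appear at most once.
without_replacement = rng.sample(students, 100)

# With replacement: the same student may be drawn more than once.
with_replacement = rng.choices(students, k=100)

def within_ten_percent(n, population_size):
    """True when the sample is no more than 10% of the population."""
    return 10 * n <= population_size

print(len(set(without_replacement)))  # always 100 distinct students
print(len(set(with_replacement)))     # 100 or fewer distinct students
print(within_ten_percent(100, 2000))  # True
```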
14
Methods of selecting random samples
  • Stratified Random Sample
  • Population is divided into non-overlapping
    subgroups called strata
  • Simple random samples are selected from each
    stratum
  • Sometimes easier to implement and is more cost
    effective than simple random sampling
  • Sometimes allows more accurate inferences about
    a population than simple random sampling

Strata are groups that are similar (homogeneous)
based upon some characteristic of the group
members.
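A stratified random sample can be sketched in Python as follows. The student list and grade levels are hypothetical, used only to have strata to sample from, and stratified_sample is a name made up for this sketch.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Take a simple random sample of n_per_stratum units from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # split into non-overlapping strata
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical school: 2000 students, 500 in each grade level.
grades = ["freshman", "sophomore", "junior", "senior"]
students = [(i, grades[(i - 1) % 4]) for i in range(1, 2001)]

by_grade = stratified_sample(students, lambda s: s[1], 25, seed=7)
for grade, chosen in by_grade.items():
    print(grade, len(chosen))
```

Every grade level is guaranteed 25 students here, which a simple random sample of 100 would not guarantee.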
15
Methods of selecting random samples
  • Cluster Sampling
  • Population is divided into non-overlapping
    subgroups called clusters
  • Randomly select clusters and then all the
    individuals in the clusters are included in the
    sample
  • Cluster sampling is often easier to perform and
    more cost effective.

Clusters are often based upon location. It is
best if the clusters are heterogeneous subgroups
from the population.
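Cluster sampling can be sketched the same way. The homerooms below are hypothetical clusters of 25 students each, and cluster_sample is a name made up for this sketch.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and include every member of each one."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [member for name in chosen for member in clusters[name]]
    return chosen, sample

# Hypothetical school: 80 homerooms of 25 students (student numbers 1-2000).
homerooms = {f"homeroom {r:02d}": list(range(25 * (r - 1) + 1, 25 * r + 1))
             for r in range(1, 81)}

chosen_rooms, sample = cluster_sample(homerooms, 4, seed=8)
print(chosen_rooms, len(sample))
```

Unlike the stratified sketch, the randomness is in which groups are picked, not in who is picked within a group.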
16
Methods of selecting random samples
  • Systematic Sampling
  • A value k is specified (for example k = 50 or
    k = 200).
  • One of the first k individuals is selected at
    random.
  • Then every kth individual in the sequence is
    included in the sample.
  • This method works reasonably well as long as
    there are no repeating patterns in the population
    list.
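The systematic sampling steps can be sketched in Python. The list of 2000 numbered students and the choice k = 50 are assumptions for illustration, and systematic_sample is a made-up name.

```python
import random

def systematic_sample(sequence, k, seed=None):
    """Pick one of the first k individuals at random, then every kth after."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # index 0..k-1, one of the first k individuals
    return sequence[start::k]

students = list(range(1, 2001))
sample = systematic_sample(students, 50, seed=9)
print(sample[:5], len(sample))
```

With 2000 students and k = 50, the sample always contains 40 students spaced 50 apart.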

17
Identify the sampling design
  • 1) The Educational Testing Service (ETS) needed a
    sample of colleges. ETS first divided all
    colleges into groups of similar types (small
    public, small private, medium public, medium
    private, large public, and large private). Then
    they randomly selected 3 colleges from each group.

Stratified random sample
18
Identify the sampling design
  • 2) A county commissioner wants to survey people
    in her district to determine their opinions on a
    particular law up for adoption. She decides to
    randomly select blocks in her district and then
    survey all who live on those blocks.

Cluster sampling
19
Identify the sampling design
  • 3) A local restaurant manager wants to survey
    customers about the service they receive. Each
    night the manager randomly chooses a number
    between 1 and 10. He then gives a survey to that
    customer, and to every 10th customer after them,
    to fill it out before they leave.

Systematic sampling
20
  • Consider the following example
  • In 1936, Franklin Delano Roosevelt had been
    President for one term.  The magazine, The
    Literary Digest, predicted that Alf Landon would
    beat FDR in that year's election by 57 to 43
    percent.  The Digest mailed over 10 million
    questionnaires to names drawn from lists of
    automobile and telephone owners, and over 2.3
    million people responded - a huge sample.
  • At the same time, a young man named George Gallup
    sampled only 50,000 people and predicted that
    Roosevelt would win.  Gallup's prediction was
    ridiculed as naive.  After all, the Digest had
    predicted the winner in every election since
    1916, and had based its predictions on the
    largest response to any poll in history.  But
    Roosevelt won with 62% of the vote.  The size of
    the Digest's error is staggering. 

Bias is the tendency for samples to differ from
the corresponding population in some systematic
way.
This is a classic example of how bias affects the
results of a sample!
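Why a huge biased sample can lose to a small random one can be shown with a toy simulation. The electorate below is entirely hypothetical: the group sizes and support levels are invented for illustration and are not historical data. Only the overall 62% figure matches the example above.

```python
import random

rng = random.Random(1936)  # fixed seed so the sketch is repeatable

# Hypothetical electorate of 200,000 voters: (owns_car_or_phone, supports_roosevelt)
owners = [(True, i < 24_000) for i in range(60_000)]         # 40% support
non_owners = [(False, i < 100_000) for i in range(140_000)]  # about 71% support
population = owners + non_owners                             # 62% support overall

def share_for_roosevelt(voters):
    """Proportion of the given voters who support Roosevelt."""
    return sum(supports for _, supports in voters) / len(voters)

big_biased = rng.sample(owners, 20_000)    # large, but drawn from owners only
small_srs = rng.sample(population, 1_000)  # small simple random sample

print("population:", share_for_roosevelt(population))
print("large biased sample:", share_for_roosevelt(big_biased))
print("small random sample:", share_for_roosevelt(small_srs))
```

In this made-up setting the large sample lands near 40% no matter how big it gets, because its sampling frame leaves out most of the population.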
21
Sources of bias
  • Selection bias
  • Occurs when the way the sample is selected
    systematically excludes some part of the
    population of interest (called undercoverage)
  • May also occur if only volunteers or
    self-selected individuals are used in a study

Suppose you take a sample by randomly selecting
names from the phone book. Some groups will not
have the opportunity to be selected!
22
Sources of bias
  • Convenience sampling
  • Using an easily available or convenient group to
    form a sample.
  • The group may not be representative of the
    population of interest
  • Results should not be generalized to the
    population
  • Can also occur when samples rely entirely on
    volunteers to be part of the sample (called
    voluntary response)

An example of voluntary response would be the
surveys in magazines that ask readers to mail in
the survey. Other examples are call-in shows,
American Idol, etc. Remember, the respondents
select themselves to participate in the survey!
Suppose we decide to survey only the students in
our statistics class. Why might that cause bias
in a survey?
23
Sources of bias
Suppose we wanted to survey high school students
on drug abuse and we used a uniformed police
officer to interview each student in our sample.
Would we get honest answers?
  • Measurement or Response bias
  • Occurs when the method of observation tends to
    produce values that systematically differ from
    the true value in some way
  • Improperly calibrated scale is used to weigh
    items
  • Tendency of people not to be completely honest
    when asked about illegal behavior or unpopular
    beliefs
  • Appearance or behavior of the person asking the
    questions
  • Questions on a survey are worded in a way that
    tends to influence the response

A Gallup survey sponsored by the American Paper
Institute (Wall Street Journal, May 17, 1994)
included the following question: "It is estimated
that disposable diapers account for less than 2%
of the trash in today's landfills. In contrast,
beverage containers, third-class mail and yard
waste are estimated to account for about 21% of
trash in landfills. Given this, in your opinion,
would it be fair to tax or ban disposable
diapers?"
24
Sources of bias
  • Nonresponse
  • Occurs when responses are not obtained from all
    individuals selected for inclusion in the sample
  • To minimize nonresponse bias, it is critical that
    a serious effort be made to follow up with
    individuals who did not respond to the initial
    request for information

The phone rings and you answer. "Hello," the
person says, "do you have time for a survey about
radio stations?" You hang up!
People are chosen by the researchers, BUT refuse
to participate. NOT self-selected! This is
often confused with voluntary response!
How might this follow-up be done?
25
Identify a potential source of bias.
1) Before the presidential election of 1936, in
which FDR ran against Republican Alf Landon, the
magazine Literary Digest predicted that Landon
would win the election by a 3-to-2 margin, based
on a survey of 2.3 million people. George Gallup
surveyed only 50,000 people and predicted that
Roosevelt would win. The Digest's survey came
from magazine subscribers, car owners, telephone
directories, etc.
Undercoverage: since the Digest's survey came
from car owners, etc., the people selected were
mostly from high-income families and thus mostly
Republican! (Other answers are possible.)
26
Identify a potential source of bias.
  • 2) Suppose that you want to estimate the total
    amount of money spent by students on textbooks
    each semester at a local college. You collect
    register receipts for students as they leave the
    bookstore during lunch one day.

Convenience sampling (an easy way to collect
data) or undercoverage (students who buy books
from online bookstores are excluded).
27
Identify a potential source of bias.
  • 3) To find the average value of a home in Plano,
    one averages the price of homes that are listed
    for sale with a realtor.

Undercoverage: this leaves out homes that are not
for sale or homes that are listed with different
realtors. (Other answers are possible.)
28
Comparative Experiments
  • Sections 2.3 and 2.4

29
  • Suppose we are interested in determining the
    effect of room temperature on the performance on
    a first-semester calculus exam. So we decide to
    perform an experiment.
  • What variable will we measure?
  • the performance on a calculus exam
  • What variable will explain the results on the
    calculus exam?
  • the room temperature

This is called the response variable. Response
variable - a variable that is not controlled by
the experimenter and that is measured as part of
the experiment
This is called the explanatory
variable. Explanatory variables - those
variables that have values that are controlled by
the experimenter (also called factors)
30
Room temperature experiment continued . . .
  • We decide to use two temperature settings, 65°
    and 75°.
  • How many treatments would our experiment have?
  • the 2 treatments are the 2 temperature
    settings

Experimental condition - any particular
combination of values for the explanatory
variables (also called a treatment)
31
Room temperature experiment continued . . .
  • Suppose we have 10 sections of first-semester
    calculus that have agreed to participate in our
    study.
  • On whom or what will we impose the treatments?
  • the 10 sections of calculus
  • How would we determine which sections would be in
    rooms with the temperature set at 65° and which
    sections in rooms set at 75°?
  • we need to randomly assign them to
    the treatments

Random assignment of subjects to treatments or
treatments to trials ensures that the experiment
does not systematically favor one treatment over
another.
These are our subjects or experimental
units. Experimental units - the smallest units to
which a treatment is applied.
32
Room temperature experiment continued . . .
To randomly assign the 10 sections of
first-semester calculus to the 2 treatment
groups, we would first number the classes 1-10.
Place the numbers 1-10 on identical slips of
paper and put them in a hat. Mix well.
Randomly select 5 numbers from the hat. Those
will be the sections that have the room
temperature set at 65°.
The remaining sections will have the room
temperature set at 75°.
Sections assigned
Treatment 1 (65°): 9 7 5 8 3
Treatment 2 (75°): 1 2 4 6 10
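Drawing slips from a hat can be imitated in Python by shuffling the section numbers. This is a minimal sketch: the function name assign_two_groups and the seed are made up, and a different seed gives a different assignment.

```python
import random

def assign_two_groups(units, seed=None):
    """Shuffle the units and split them into two equal treatment groups."""
    rng = random.Random(seed)
    shuffled = list(units)   # copy, so the original list is untouched
    rng.shuffle(shuffled)    # the "mix well" step
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])

sections = list(range(1, 11))
treatment_65, treatment_75 = assign_two_groups(sections, seed=5)
print("65 degrees:", treatment_65)
print("75 degrees:", treatment_75)
```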
33
Room temperature experiment continued . . .
Notice that there are five sections assigned to
each treatment. This is called replication.
Why is replication an important trait of a
well-designed experiment?
Sections assigned
Treatment 1 (65°): 9 7 5 8 3
Treatment 2 (75°): 1 2 4 6 10
Replication ensures that we have multiple
observations for each treatment.
34
Room temperature experiment continued . . .
  • Remember the explanatory variable is the room
    temperature setting, 65° and 75°. The response
    variable is the grade on the calculus exam.
  • Are there other variables that could affect the
    response?

Instructor?
Textbook?
Time of day?
Ability level of students?
These other variables are called extraneous
variables. An extraneous variable is a variable
that is NOT one of the explanatory variables
(factors) but is thought to affect the
response.
Can the experimenter control these extraneous
variables? If so, how?
In an experiment, these extraneous variables need
to be controlled. Direct control is holding
the extraneous variables constant so that their
effects are not confounded with those of the
experimental conditions (treatments).
What about the variables that the experimenter
can't directly control? What can be done to
avoid confounding results?
Remember - two variables are confounded if
their effects on the response cannot be
distinguished from each other.
35
Room temperature experiment continued . . .
  • Suppose that there were five instructors who
    taught the first-semester calculus. We do not
    have direct control of this variable; however, we
    could have each instructor teach 2 sections.
    Then we could randomly assign which one of the 2
    sections would have a temperature setting of 65°
    and the other would have a temperature setting of
    75°.

This is an example of blocking. Blocking is the
process by which an extraneous variable's effects
are filtered out. Similar groups, called blocks,
are created. All treatments are tried in each
block.
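Blocking by instructor can be sketched in Python: within each instructor's pair of sections, the two temperatures are handed out in random order. The instructor and section labels are hypothetical, and assign_within_blocks is a made-up name.

```python
import random

def assign_within_blocks(blocks, treatments, seed=None):
    """Within each block, give each unit a different treatment in random order."""
    rng = random.Random(seed)
    assignment = {}
    for block, units in blocks.items():
        order = list(treatments)
        rng.shuffle(order)  # separate randomization inside every block
        for unit, treatment in zip(units, order):
            assignment[unit] = (block, treatment)
    return assignment

# Hypothetical labels: five instructors, each teaching two sections.
instructor_blocks = {f"Instructor {c}": [f"{c}1", f"{c}2"] for c in "ABCDE"}
assignment = assign_within_blocks(instructor_blocks, ["65", "75"], seed=2)
for section, (instructor, temperature) in sorted(assignment.items()):
    print(section, instructor, temperature)
```

Every instructor ends up teaching one section at each temperature, so instructor differences cannot favor either treatment.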
36
Room temperature experiment continued . . .
  • What about extraneous variables that we cannot
    control directly or that we cannot block for or
    that we don't even think about?
  • Random assignment should evenly spread all
    extraneous variables that are not controlled
    directly or blocked into all
    treatment groups. We expect these variables to
    affect all the experimental groups in the same
    way; therefore, their effects are not confounding.

37
Room temperature experiment continued . . .
  • Would the students in each section of calculus
    know to which treatment group, 65° or 75°, they
    were assigned?
  • If the students knew about the experiment, they
    would probably know which treatment group they
    were in.
  • So this experiment is probably NOT blinded.

An experiment in which the subjects do not know
which treatment they received is called a
single-blind experiment.
A double-blind experiment is one in which
neither the subjects nor the individuals who
measure the response know which treatment is
received.
38
  • In the room temperature experiment, we only have
    2 treatment groups, 65° and 75°. We do NOT have
    a control group.
  • A control group is an experimental group that does
    NOT receive any treatment.
  • The use of a control group allows the
    experimenter to assess how the response variable
    behaves when the treatment is not used.
  • This provides a baseline against which the
    treatment groups can be compared to determine
    whether the treatment had an effect.

39
  • Consider Anna, a waitress. She decides to
    perform an experiment to determine if writing
    "Thank you" on the receipt increases her tip
    percentage.
  • She plans on having two groups. On one group she
    will write "Thank you" on the receipt and on the
    other group she will not write "Thank you" on the
    receipt.

Which of these is the control group?
40
  • Suppose we want to test an herbal supplement to
    determine if it aids in weight loss.
  • Why would it not be beneficial to have two groups
    in the experiment: one that takes the supplement
    and a control group that takes nothing?
  • What could be done to remedy this problem?
  • Give one group the supplement and give the other
    group a pill that is the same size, color, taste,
    smell, etc. as the supplement, but contains no
    active ingredient.

This is called a placebo. A placebo is something
that is identical to the treatment but
contains no active ingredient.
41
Let's recap some ideas:
Random assignment reduces the potential for
confounding variables.
Blocking uses extraneous variables to create
groups (blocks) that are similar. All treatments
are then tried in each block.
Direct control holds extraneous variables
constant so their effects are not confounded with
the treatments.
42
Experimental Designs
  • 1. Completely randomized design - experimental units
    are assigned at random to treatments or
    treatments are assigned at random to trials

Let's look at two examples of completely
randomized experiments.
[Diagram: random assignment of experimental units to treatments]
43
Example 1: A farm-product manufacturer wants to
determine if the yield of a crop is different
when the soil is treated with three different
types of fertilizers. Fifteen similar plots of
land are planted with the same type of seed but
are fertilized differently. At the end of the
growing season, the mean yields from the sample
plots are compared.
Experimental units? Plots of land
Factors? Type of fertilizer
Response variable? Yield of crop
How many treatments? 3
44
Fertilizer experiment continued: A farm-product
manufacturer wants to determine if the yield of a
crop is different when the soil is treated with
three different types of fertilizers. Fifteen
similar plots of land are planted with the same
type of seed but are fertilized differently. At
the end of the growing season, the mean yields
from the sample plots are compared.
Why is the same type of seed used on all 15
plots? It is part of the controls in the
experiment.
What are other potential extraneous variables?
Type of soil, amount of water, etc.
Does this experiment have a placebo? Explain.
No, a placebo is not needed in this experiment.
45
Example 2: A consumer group wants to test cake
pans to see which works the best (bakes evenly).
It will test aluminum, glass, and plastic pans in
both gas and electric ovens. There are 30 boxes
of cake mix to use for this experiment.
Experimental units? Cake mixes
Factors? Two factors - type of pan (aluminum,
glass, and plastic) and type of oven (electric
and gas)
Response variable? How evenly the cake bakes
Name the treatments. Aluminum pan in electric
oven, aluminum pan in gas oven, glass pan in
electric oven, glass pan in gas oven, plastic pan
in electric oven, and plastic pan in gas oven
46
Cake experiment continued: A consumer group wants
to test cake pans to see which works the best
(bakes evenly). It will test aluminum, glass,
and plastic pans in both gas and electric ovens.
There are 30 boxes of cake mix to use for this
experiment. Describe how to randomly assign the
cake mixes to the treatments so that there is an
equal number in each treatment.
Could we roll a die for each box? If we roll a
1, assign the box to the first treatment
(aluminum pan in electric oven). If we roll a 2,
assign the box to the 2nd treatment, and so on.
Rolling a die would not guarantee an equal number
of boxes in each treatment.
Number the boxes of cake mix from 1 to 30. Write
the numbers 1 to 30 on identical slips of paper
and place into a hat. Mix well. Randomly select
5 numbers from the hat and assign those boxes to
the treatment of aluminum pan in electric oven.
Randomly select 5 more numbers and assign those
boxes to the treatment aluminum pan in gas oven.
Continue this process, randomly assigning 5 boxes
to each treatment: glass pan in electric oven,
glass pan in gas oven, and plastic pan in
electric oven. The remaining 5 are assigned to
plastic pan in gas oven.
This is just one way that you can perform this
randomization.
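The hat method for the cake experiment can be sketched in Python. With 30 boxes and 3 x 2 = 6 treatments, an even split gives 30 / 6 = 5 boxes per treatment. The function name assign_equally and the seed are made up for this sketch.

```python
import random
from itertools import product

pans = ["aluminum", "glass", "plastic"]
ovens = ["electric", "gas"]
treatments = list(product(pans, ovens))  # every pan/oven combination: 6 treatments

def assign_equally(units, treatments, seed=None):
    """Shuffle the units and deal them out in equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # the "mix well" step
    size = len(shuffled) // len(treatments)
    return {t: sorted(shuffled[i * size:(i + 1) * size])
            for i, t in enumerate(treatments)}

cake_plan = assign_equally(range(1, 31), treatments, seed=3)
for (pan, oven), boxes in cake_plan.items():
    print(f"{pan} pan in {oven} oven: boxes {boxes}")
```

Shuffling once and dealing out groups of 5 guarantees equal group sizes, which rolling a die separately for each box would not.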
47
Experimental Designs Continued . . .
  • 2. Randomized block - units are blocked into
    (homogeneous) groups and then randomly assigned
    to treatments within each block

Units should be blocked on a variable that
affects the response!!!

[Diagram: create blocks, then random assignment within each block]
48
  • Fertilizer experiment revisited: A farm-product
    manufacturer wants to determine if the yield of a
    crop is different when the soil is treated with
    two different types of fertilizers. Twenty plots
    of land (10 plots are along a river and 10 plots
    are away from the river) are planted with the
    same type of seed but are fertilized differently.
    At the end of the growing season, the mean yield
    from the sample plots is compared.
  • Can the experimenter directly control the types
    of soil in the different plots of land?
    No, they must use the plots that are available.
  • What can be done to account for this variable?
    They could block by type of land.
49
  • Fertilizer experiment revisited
  • Describe how to create the blocks of land and
    then to randomly assign plots to the 2 types of
    fertilizer.
  • First create 2 blocks of land. Block 1 would be
    the 10 plots that are by the river. Block 2
    would be the 10 plots away from the river.
  • Number the 10 plots in block 1 from 1 to 10.
    Write the numbers 1 to 10 on identical slips of
    paper and place into a hat. Mix well. Randomly
    select 5 numbers from the hat and assign those
    plots to Fertilizer A. The remaining 5 are
    assigned to Fertilizer B.
  • Number the 10 plots in block 2 from 1 to 10.
    Write the numbers 1 to 10 on identical slips of
    paper and place into a hat. Mix well. Randomly
    select 5 numbers from the hat and assign those
    plots to Fertilizer A. The remaining 5 are
    assigned to Fertilizer B.
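The two-hat procedure for the blocked fertilizer experiment can be sketched in Python. The block names and the function name randomized_block are made up for this sketch. Plots are numbered 1 to 10 within each block, as in the steps above.

```python
import random

def randomized_block(blocks, treatments, seed=None):
    """Randomly split the units of each block equally among the treatments."""
    rng = random.Random(seed)
    plan = {}
    for block, plots in blocks.items():
        shuffled = list(plots)
        rng.shuffle(shuffled)  # a separate "hat" for every block
        size = len(shuffled) // len(treatments)
        plan[block] = {t: sorted(shuffled[i * size:(i + 1) * size])
                       for i, t in enumerate(treatments)}
    return plan

land_blocks = {"by the river": list(range(1, 11)),
               "away from the river": list(range(1, 11))}
fertilizer_plan = randomized_block(land_blocks, ["Fertilizer A", "Fertilizer B"], seed=4)
for block, groups in fertilizer_plan.items():
    print(block, groups)
```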

50
Experimental Designs Continued . . .
  • 3. Matched pairs - a special type of block design
    where the blocks consist of 2 experimental units
    that are similar with each being randomly
    assigned to a treatment
  • OR
  • the blocks consist of individual units that are
    assigned both treatments in random order

51
Example 3: Two new word-processing programs are
to be compared by measuring the speed with which
a standard task can be completed. One hundred
volunteers will perform the same task on each
of the programs in random order, and their speeds
will be measured.
Explain why this is a matched pairs design.
Each block consists of an individual who will do
both treatments.
How could we determine which program the
volunteers use first?
We could flip a coin for each volunteer: heads,
they do program A first; tails, they do program B
first.
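The coin flip for each volunteer can be imitated in Python. This is a minimal sketch: the function name program_order and the seed are made up, and volunteers are simply numbered 1 to 100.

```python
import random

def program_order(volunteers, seed=None):
    """Flip a fair coin per volunteer: heads means A first, tails means B first."""
    rng = random.Random(seed)
    orders = {}
    for volunteer in volunteers:
        heads = rng.random() < 0.5
        orders[volunteer] = ("A", "B") if heads else ("B", "A")
    return orders

orders = program_order(range(1, 101), seed=6)
a_first = sum(1 for order in orders.values() if order[0] == "A")
print("A first:", a_first, "B first:", 100 - a_first)
```

Each volunteer still uses both programs. Only the order is random, so the two group sizes need not be exactly 50 and 50.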
52
The ONLY way to show a cause-effect relationship
is with a well-designed, well-controlled
experiment!!!