Chapter 1: Data Collection

Slides: 47
Provided by: philip52
1
Chapter 1 Data Collection
1.1 Introduction to the Practice of Statistics
1.2 Observational Studies, Experiments, and Simple Random Sampling
1.3 Other Effective Sampling Methods
1.4 Sources of Errors in Sampling
1.5 The Design of Experiments
September 3, 2008
2
Definition of Statistics
  • Given a question, statistics is the art and
    science of designing studies, collecting the
    data, summarizing the data, and then analyzing
    the data to draw conclusions.
  • In particular, statistics is
  • collecting data
  • organizing this data
  • summarizing the organized data
  • analyzing the summarized data
  • drawing conclusions from this analysis

Section 1.1
3
Data
Data is information that is collected about a
generic population (people, animals, machines,
etc.). In the social sciences it is usually about
people: their characteristics (height, weight,
age, etc.) or attitudes (beliefs, political
opinions, religion, etc.).
4
Types of Statistics
  • Descriptive Statistics: This type of statistics
    uses graphs, tables, charts and the calculation
    of various statistical measures (mean, standard
    deviation, etc.) to organize and summarize
    information about a population. This is the
    material in Math 127A.
  • Inferential Statistics: This type of statistics
    consists of techniques (hypothesis testing,
    confidence intervals, etc.) to reach conclusions
    about a population based upon information
    obtained from a subset of the population. This
    is the material in Math 127B.

5
Average Yearly Temperature in Nashville
Question: Is the climate of Nashville
warming? The average temperature of Nashville is
available on the National Weather Service website
for 1872-2007. The average daily temperature is
calculated by summing the highest and lowest
hourly temperatures and then dividing by 2. The
monthly average temperature is obtained by
computing the average of the daily average
temperatures, and the yearly average temperature
is obtained by computing the average of the
monthly temperatures.
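
The three averaging steps described above can be sketched in a few lines of Python. The temperature readings below are made up for illustration and are not National Weather Service data; the function names are also just illustrative.

```python
from statistics import mean

def daily_average(high, low):
    """Daily average as defined on the slide: (highest + lowest) / 2."""
    return (high + low) / 2

def monthly_average(daily_high_low):
    """Mean of the daily averages for one month."""
    return mean(daily_average(high, low) for high, low in daily_high_low)

def yearly_average(months):
    """Mean of the monthly averages for one year."""
    return mean(monthly_average(days) for days in months)

# Hypothetical readings (degrees F): two "months" of three days each.
months = [
    [(50, 30), (52, 32), (54, 34)],   # daily averages 40, 42, 44
    [(80, 60), (82, 62), (84, 64)],   # daily averages 70, 72, 74
]
print(yearly_average(months))
```

Note that averaging the monthly averages gives every month the same weight, regardless of how many days it has.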
6
Mathematica Notebook
7
The Statistical Method (QDDI)
  • Question: What is the problem of interest?
    Identify your research objective.
  • Design: How will the data be collected? From
    whom? About what?
  • Description: Give the characteristics of the
    data. This is where mathematics can play a major
    role. Summarize the data. Give a graphical
    description of the data. (Descriptive Statistics)
  • Inference: What does the data tell us? If you
    started with a hypothesis, does the data confirm
    this hypothesis? (Inferential Statistics)

8
Example
Harvard Medical School studied about 22,000 male
physicians to determine if taking aspirin could
prevent heart attacks. The physicians were split
into two groups of about 11,000 each: one group
would receive an aspirin per day and the other
would receive a placebo. The assignment of
physicians was done randomly. During the course
of the study, 0.9% of the male physicians who
were taking aspirin had a heart attack, while
1.7% of those taking the placebo experienced a
heart attack. The researchers then used the
statistical method to predict that if all male
physicians could have participated in the study,
the percentage having a heart attack would have
been lower for those taking aspirin.
9
QDDI
  • Question: Does taking aspirin each day reduce the
    incidence of heart attacks in male physicians?
  • Design: Take a sample with half taking aspirin
    and half taking a placebo. This is called an
    experiment.
  • Description: Heart attack rate for aspirin (0.9%)
    versus placebo (1.7%).
  • Inference: All male physicians would benefit from
    taking daily aspirin.

10
Terminology of Statistics
  • Population: A population is the complete
    collection of all elements to be studied.
  • Sample: Any subset or group of a population is
    called a sample.
  • Variable: A variable is a characteristic of the
    individuals in the population that will be
    analyzed.
  • Parameter: A parameter is a numerical summary of
    a variable for the population.
  • Statistic: A statistic is a numerical summary of
    a variable obtained from a sample of the
    population.

11
Types of Data
  • Quantitative data is composed of measurements
    (numbers) about the population.
  • Categorical (or qualitative) data is data that
    can be separated into categories and can be
    identified by some non-numeric characteristic.
  • Continuous data is quantitative data that can
    take any value within an interval.
  • Discrete data is quantitative data that is not
    continuous.

12
Example
  • Population: All of the students in Math 127A who
    are in WH 103 today.
  • Sample: The students in Row 10 of the classroom.
  • Variables:
  • Color of eyes
  • Month of birth
  • Home state
  • Age
  • Religion

13
Example (continued)
  • Data (Qualitative/Quantitative):
  • Blue eyes
  • October
  • Georgia
  • 18
  • Lutheran
  • Parameters:
  • The average age.
  • The standard deviation of heights.
  • Statistics:
  • The average age of students in Row 5.
  • The fraction of students with blue eyes in Row 9.
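
As a rough illustration of the difference between a parameter and a statistic, the sketch below computes the average age for a whole (tiny) population and for one row of it. The ages and the row layout are invented for the example.

```python
from statistics import mean

# Hypothetical ages, keyed by classroom row number.
ages_by_row = {
    5: [18, 19, 18, 20],
    9: [19, 21, 18, 22],
    10: [18, 18, 19, 20],
}

# Parameter: computed from every individual in the population.
population_ages = [age for row in ages_by_row.values() for age in row]
parameter = mean(population_ages)

# Statistic: computed from a sample (here, Row 5 only).
statistic = mean(ages_by_row[5])

print(f"population average age (parameter): {parameter:.2f}")
print(f"Row 5 average age (statistic): {statistic:.2f}")
```

The statistic generally differs from the parameter, and it changes if a different row is chosen as the sample.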

14
Data for Statistical Studies
  • Census: A census is a list of all individuals in
    a population along with certain characteristics
    of each individual in the population (e.g., age,
    race, home ownership, etc.).
  • Observational Study: An observational study
    attempts to measure a characteristic of the
    population by examining a sample, but does not
    manipulate the sample. An observational study
    often uses a sample survey to collect data.
  • Experimental Study: An experiment selects a
    sample of the population and manipulates one or
    more variables of the population. The variable
    that is manipulated is called an independent
    variable and the variable that is affected is
    called a dependent variable.

Section 1.2
15
Census Website
http://www.census.gov
16
Observational Study
  • Observational Study: An observational study
    measures the characteristics of a population by
    studying a sample of individuals. It attempts to
    find connections between these characteristics
    without manipulation of the sample. The study is
    passive or ex post facto.

17
Design of Observational Studies
18
Example of Sample Survey
  • Sample Survey: A random sample of 10,000 people
    in which each individual is interviewed to
    determine information about the following
    variables of the population:
  • age
  • race
  • gender
  • number of children
  • income bracket (0-25K, 25K-50K, ...)
  • wealth bracket
  • homeowner
  • Question: Is there a relationship between
    homeownership and number of children?

19
Algorithm for Setting Up a Sample Survey
  • Step 1: Identify the population from which the
    sample is to be drawn.
  • Step 2: Compile a list of subjects in the
    population from which the sample will be taken.
    This is called the sampling frame.
  • Step 3: Specify a method for selecting subjects
    from the sampling frame. This is called the
    sampling design.
  • Step 4: Collect the data.
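
The four steps can be sketched in Python as follows. This is only an outline under stated assumptions: the population of numbered households is hypothetical, the frame is assumed to cover the whole population, the design is taken to be a simple random sample, and `interview` is a placeholder for the real data collection.

```python
import random

# Step 1: the population (hypothetical: 1,000 numbered households).
population = [f"household-{i}" for i in range(1, 1001)]

# Step 2: the sampling frame, the list we can actually draw from.
# Assumed here to cover the whole population.
sampling_frame = list(population)

# Step 3: the sampling design, here a simple random sample
# drawn without replacement.
def draw_sample(frame, n, seed=None):
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Step 4: collect the data from the selected subjects.
# `interview` is a placeholder for the real data-collection step.
def interview(subject):
    return {"subject": subject}

selected = draw_sample(sampling_frame, n=50, seed=1)
responses = [interview(s) for s in selected]
print(len(responses), "responses collected")
```

In practice the frame rarely matches the population exactly, which is one source of the coverage errors discussed in Section 1.4.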

20
Designed Experiments
  • Experimental Study: An experiment is a study in
    which data is collected to determine the effects
    of one or more variables (called explanatory
    variables) on another variable (called the
    response variable). That is, the explanatory
    variable is controlled to see how the response
    variable changes with changes in the explanatory
    variable. The conditions placed on the
    explanatory variable are called treatments. In
    this type of study, the explanatory variable is
    sometimes called a factor of the experiment.

21
Design of Experiments
22
Remark
Observational studies are useful for detecting
connections between two variables in a
population. Experimental studies are useful for
determining the nature of the connection (for
example, whether changing one variable changes
the other).
23
Types of Sampling
  • Random (good)
  • Non-random (bad)

Examples: Suppose that our population is 200
students who are seated in a classroom of 10 rows
with 20 seats per row. If we choose as our sample
the students who sit in the even-numbered rows,
then this would be a non-random sample. Suppose
instead that we place 10 balls, each marked with
a separate number (1-10), in a bag. We would
generate a random sample of 20 by drawing one of
the balls out of the bag and using the number on
the ball as the row for our sample.
Section 1.3
24
Simple Random Sample
  • Simple Random Sampling: Each individual in the
    population has the same chance of being selected
    for the sample as any other individual. A list
    of individuals in the population from which a
    sample is to be drawn is called a frame.

25
Two Sets of Random Numbers
Frequency Chart of Numbers
26
Types of Samples
  • Simple Random Sample: A sample that is obtained
    by randomly choosing individuals in the
    population.
  • Stratified Sample: A stratified sample is a
    sample that is obtained by separating the
    population into non-overlapping groups (called
    strata) and then randomly selecting individuals
    from each stratum.
  • Systematic Sample: A systematic sample is a
    sample that is obtained by selecting individuals
    in the population in a systematic way, e.g.,
    every 5th individual.
  • Cluster Sample: A cluster sample is a sample that
    is obtained by selecting all individuals within a
    randomly selected subset or group of the
    population.
  • Convenience Sample: A convenience sample is a
    type of sample that is drawn because it is easy
    or convenient to collect. Convenience samples
    are likely to underrepresent portions of the
    population. They may not be random and may
    contain bias due to time or location.
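
To make the definitions concrete, here is a small Python sketch that draws each kind of random sample from the classroom population used earlier (200 students, 10 rows of 20 seats). Treating each row as a stratum or a cluster, and stepping through every 10th student rather than every 5th, are choices made here so that every sample has 20 students.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Population: 200 students identified by (row, seat).
students = [(row, seat) for row in range(1, 11) for seat in range(1, 21)]

# Simple random sample: 20 students chosen at random from the whole class.
simple = rng.sample(students, 20)

# Stratified sample: 2 students chosen at random from every row (stratum).
stratified = []
for row in range(1, 11):
    stratum = [s for s in students if s[0] == row]
    stratified.extend(rng.sample(stratum, 2))

# Systematic sample: every 10th student, starting at a random offset.
start = rng.randrange(10)
systematic = students[start::10]

# Cluster sample: every student in one randomly chosen row (cluster).
chosen_row = rng.randint(1, 10)
cluster = [s for s in students if s[0] == chosen_row]
```

The stratified sample is guaranteed to touch every row, while the cluster sample contains exactly one row. That is the trade-off between coverage and convenience.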
27
Three Main Sampling Methods
Random
Cluster
Stratified
28
Advantages of Different Random Sampling Methods
  • Simple Random Sampling: Gives a good picture of
    the whole population.
  • Cluster Random Sampling: Often it is easier and
    cheaper to implement because subjects are close
    together and well-defined once clusters are
    chosen.
  • Stratified Random Sampling: Guarantees that each
    stratum (segment) is sampled.

29
Sources of Errors in Sampling
  • Fact: Erroneous conclusions can be drawn from
    observational or experimental studies due to
    faulty statistical design and sampling.
  • Non-sampling Errors: These errors occur when the
    sampling process (design) is faulty. This
    usually occurs when there is a problem with the
    sampling frame or sampling design. In other
    words, preference is given to selecting some
    individuals over other individuals in the
    population.
  • response errors
  • non-response errors
  • processing errors
  • analysis errors
  • coverage errors
  • Sampling or Estimation Errors: These errors
    occur when the sample gives an incomplete
    picture of the population. This type of error is
    due to the fact that we are using a sample
    instead of the whole population.

Section 1.4
30
Non-sampling Errors
  • Response Errors: Poor questionnaire design,
    interview bias, respondent errors, poor survey
    process. For example, the organization of the
    survey could be confusing, individuals may give
    deceptive responses to questions, the data
    collector may not speak the language of the
    individual to be interviewed, etc.
  • Non-response Errors: Complete or partial
    non-response. For example, individuals may agree
    to be interviewed, but then choose not to answer
    some or all of the questions.
  • Processing Errors: There are computational
    errors in coding, capturing, editing and
    presenting the final data.
  • Analysis Errors: Incorrect statistical tests are
    applied to the data, resulting in erroneous
    conclusions.
  • Coverage Errors: Individuals are duplicated in
    or omitted from the sample.

31
Non-sampling Bias
Example: Suppose we are interested in the approval
rating of Mayor Dean and we will conduct a random
telephone survey on whether citizens of Nashville
approve or disapprove of his job performance
since he took office. Is there bias in this
sample survey?
Answer: Maybe, since it will miss citizens who do
not have a telephone, and this group of people
may have different opinions about the mayor than
those who do have a telephone.
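
A short calculation shows how this kind of coverage problem can shift an estimate. All three input figures below are hypothetical and were chosen only to illustrate the mechanism; they say nothing about the real survey.

```python
# Hypothetical figures, chosen only to illustrate the mechanism.
share_with_phone = 0.90         # fraction of citizens reachable by telephone
approval_with_phone = 0.60      # approval rate among those citizens
approval_without_phone = 0.40   # approval rate among citizens with no telephone

# True approval rating: weighted average over both groups.
true_approval = (share_with_phone * approval_with_phone
                 + (1 - share_with_phone) * approval_without_phone)

# A telephone survey can only estimate the rate among phone owners,
# no matter how large the random sample is.
survey_target = approval_with_phone

coverage_bias = survey_target - true_approval
print(round(true_approval, 3), round(coverage_bias, 3))
```

The bias vanishes only if the two groups hold the same opinions or if everyone has a telephone, which is why the answer on the slide is "maybe".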
32
Design of Experiments
Review from Section 1.3: An experiment is a
study for the collection of data that is used to
determine the effects of one or more variables
(called explanatory variables) on another
variable (called the response variable). The
individuals from which the data is collected are
called subjects or experimental units. The
conditions placed on the explanatory variable are
called treatments. In this type of study, the
explanatory variable is sometimes called a
factor. An experiment is called double-blind if
neither the subjects nor the experimenter knows
which treatment is being administered to each
subject. We say that the experiment is
completely randomized if each experimental unit
is randomly assigned to a treatment. A randomized
experiment comparing medical treatments is called
a clinical trial.
Section 1.5
33
Types of Experiments
  • Completely Randomized Design: Each experimental
    unit is randomly assigned a treatment.
  • Randomized Matched-pairs Design: Experimental
    units are paired, with each experimental unit in
    the pair assigned a different treatment. The
    matched pair can be the same individual, so that
    the individual receives both treatments (e.g.,
    before and after).
  • Randomized Block Design: Experimental units are
    grouped into blocks. Units in each block are
    randomly assigned treatments.
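
A completely randomized design can be sketched in Python by shuffling the experimental units and dealing them out to the treatments. The function name, the unit labels, and the decision to keep the groups the same size are assumptions for this illustration; the definition above only requires that the assignment be random.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle the units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: treatments[i % len(treatments)]
            for i, unit in enumerate(shuffled)}

units = [f"unit-{i}" for i in range(1, 13)]   # 12 hypothetical units
assignment = completely_randomized(units, ["treatment", "placebo"], seed=42)

for unit in units:
    print(unit, "->", assignment[unit])
```

Because the shuffle ignores every characteristic of the units, no unit is more likely than another to end up in either group.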

34
Example
Object of Study: Does aspirin reduce the heart
attack rate?
Population: Male physicians in the U.S.
Sample: 22,071 male physicians between the ages
of 40 and 84.
Study: The sample was split into two groups. One
group took an aspirin per day and the other group
took a placebo. The doctors were randomly
assigned to these two groups. The doctors were
monitored over a 5-year period.
Explanatory Variable: aspirin, yes or no
(categorical)
Response Variable: heart attack, yes or no
(categorical)
Type of Experiment: Completely randomized design.
35
Example (continued)
Heart attack?      Yes        No     Total
Aspirin            104    10,933    11,037
Placebo            189    10,845    11,034
Total              293    21,778    22,071
This is an experiment and the aspirin/placebo are
the treatments. We manipulated the explanatory
variable to see the effect on the response
variable.
36
Example (continued)
Fraction of Heart Attacks for both Treatments
Heart attack?      Yes        No     Total
Aspirin         0.0094    0.9906       1.0
Placebo         0.0171    0.9829       1.0
37
Example (continued)
Conclusion from Study: The heart attack rate per
1000 male physicians is 9.4 for those taking
aspirin and 17.1 for those not taking aspirin.
Hence, we would conclude that taking aspirin
reduces the heart attack rate.
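
The rates quoted in the conclusion follow directly from the counts in the two-way table. A minimal Python check, using only those counts:

```python
# Counts from the two-way table: (heart attack, no heart attack).
counts = {"aspirin": (104, 10_933), "placebo": (189, 10_845)}

rate_per_1000 = {}
for group, (attack, no_attack) in counts.items():
    total = attack + no_attack              # 11,037 and 11,034
    rate_per_1000[group] = 1000 * attack / total

for group, rate in rate_per_1000.items():
    print(f"{group}: {rate:.1f} heart attacks per 1,000 physicians")
```

This reproduces the descriptive step only. Whether the difference is larger than chance alone would produce is a question for the inferential methods mentioned earlier.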
38
Matched-pairs Designs
A matched-pair design experiment is a study
where there are only two treatments and
experimental units are matched in pairs. One
experimental unit in each pair receives one
treatment and the other experimental unit
receives the second treatment. A pair may be the
same individual (before treatment and after
treatment) or two individuals who have similar
characteristics (e.g., gender, age, etc.). The
assignment of the treatments within each pair
should be random.
39
Example of Matched-Pairs
Purpose: Study the effect of taking caffeine one
half hour before swimming.
Sample: 50 randomly chosen swimmers.
Explanatory Variable: A caffeine pill or a
placebo.
Response Variable: Time to swim one mile.
Study Design: Experiment, Matched-pair Design.
The 50 swimmers are selected. Each swimmer is
randomly given the caffeine pill or the placebo
and swims one mile with the time recorded. After
1 week, the same 50 swimmers return and are given
the treatment that they did not receive the
previous week. They swim the mile and the time is
recorded. Each swimmer's times under the two
treatments are compared.
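
The matched-pairs procedure has two parts: randomizing the order of the treatments for each swimmer, and comparing each swimmer's two times. The Python sketch below illustrates both. The swimmer labels and the three pairs of mile times are invented for the example.

```python
import random
from statistics import mean

rng = random.Random(7)
swimmers = [f"swimmer-{i}" for i in range(1, 51)]

# Randomize which treatment each swimmer receives in week 1;
# the other treatment follows in week 2.
order = {}
for s in swimmers:
    first = rng.choice(["caffeine", "placebo"])
    second = "placebo" if first == "caffeine" else "caffeine"
    order[s] = (first, second)

# Hypothetical mile times in minutes for three swimmers: (caffeine, placebo).
times = {
    "swimmer-1": (29.5, 30.0),
    "swimmer-2": (31.0, 31.5),
    "swimmer-3": (28.0, 27.5),
}

# Compare within each pair: caffeine time minus placebo time.
differences = [caffeine - placebo for caffeine, placebo in times.values()]
mean_difference = mean(differences)
print(differences, round(mean_difference, 3))
```

Working with within-swimmer differences removes the variation between fast and slow swimmers, which is the point of pairing.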
40
Blocks and Block Designs
  • A collection of experimental units that have the
    same (or similar) values on a key variable is
    called a block. In the previous example, each
    subject (person) is a block.
  • Experimental units are divided into groups
    (blocks) and each treatment is randomly assigned
    to one or more of the units in each block. In
    other words, a block design identifies blocks
    before the start of the experiment and assigns
    subjects to treatments within those blocks.
  • To reduce bias, the order of treatments within
    each block is randomized, and we call this a
    randomized block design.
  • A matched-pair design is a special type of block
    design. Here each pair of experimental units
    forms a block.
  • In a block design study, an experimental unit
    (subject) may receive only one treatment.

41
Example of Block Design
Purpose: Study the effect of taking caffeine one
half hour before swimming.
Sample: 50 swimmers: 16 males who swim
competitively, 14 males who do not swim
competitively, 8 females who swim competitively,
and 12 females who do not swim competitively.
Explanatory Variable: A caffeine pill or a
placebo.
Response Variable: Time to swim one mile.
Study Design: Experiment, Randomized Block Design.
We create four blocks (16, 14, 8, and 12
subjects). Within each block, individuals are
randomly assigned either the caffeine pill or the
placebo. Each subject's swim time is recorded.
The times of the swimmers within each block, as
well as across the blocks, are compared (caffeine
pill versus placebo).
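
The randomization for this block design can be sketched in Python as below. The block sizes come from the slide. Splitting each block evenly between the two treatments is an assumption made for this illustration, since the slide does not say how many swimmers in each block receive each treatment.

```python
import random

rng = random.Random(3)

# Block sizes from the slide.
block_sizes = {
    "male, competitive": 16,
    "male, non-competitive": 14,
    "female, competitive": 8,
    "female, non-competitive": 12,
}

assignment = {}
for block, size in block_sizes.items():
    members = [f"{block} #{i}" for i in range(1, size + 1)]
    rng.shuffle(members)          # randomize within the block only
    half = size // 2
    # Assumption: half of each block gets caffeine, the other half the placebo.
    assignment[block] = {"caffeine": members[:half], "placebo": members[half:]}

for block, groups in assignment.items():
    print(block, len(groups["caffeine"]), len(groups["placebo"]))
```

Randomizing separately inside each block keeps gender and competitive status balanced across the two treatments.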
42
What type of experiment?
A drug company wanted to test a new arthritis
medication. The researchers found 200 adults
aged 25-35 and randomly assigned them to two
groups. The first group received the new drug,
while the second received a placebo. After one
month of treatment, the percentage of each group
whose arthritis symptoms decreased was recorded
and compared with their original condition. What
type of experimental design is this?
43
What type of experiment?
A medical journal published the results of an
experiment on insomnia. The experiment
investigated the effects of a controversial new
therapy for insomnia. Researchers measured the
insomnia levels of 86 adult women who suffer
moderate conditions of the disorder. After the
therapy, the researchers again measured the
women's insomnia levels. The differences between
the pre- and post-therapy insomnia levels
were reported. What type of experimental design
is this?
44
What type of experiment?
A farmer wishes to test the effects of a new
fertilizer on her tomato yield. She has four
equal-sized plots of land--one with sandy soil,
one with rocky soil, one with clay-rich soil, and
one with average soil. She divides each of the
four plots into three equal-sized portions and
randomly labels them A, B, and C. The four A
portions of land are treated with her old
fertilizer. The four B portions are treated with
the new fertilizer, and the four C's are treated
with no fertilizer. At harvest time, the tomato
yield is recorded for each section of land. What
type of experimental design is this?
45
What type of experiment?
A random sample of 1,000 overweight male adults
is recruited. Each male is weighed and his
weight is recorded. Each individual is given a
diet and is told to follow it for one month.
After one month, each individual is weighed again
and his weight is recorded. The before and after
weights are compared. What type of experimental
design is this?
46
What type of experiment?
A random sample of 30 Vanderbilt students is
selected. We are interested in the reaction times
when using or not using a cell phone during
driving. Each student's reaction time was
measured when he or she was using or not using a
cell phone on a driving course in a Vanderbilt
parking lot. What type of experimental design is
this?