Title: Chapter One Data Collection
Unit 1: Learning to Describe and Understand Statistics
Lesson 1: What Is Statistics, Anyway? Its Process
You hear and see statistics practically every day!
Trident Gum
Definition of Statistics
Statistics is the science of collecting,
organizing, summarizing and analyzing information
in order to draw conclusions.
The Process of Statistics
Step 1: Identify a research objective.
The researcher must determine the question he or she wants answered, and the question must be detailed.
Identify the group to be studied. This group is
called the population.
An individual is a person or object that is a member of the population being studied.
Step 2: Collect the information needed to answer the questions.
In conducting research, we typically look at a
subset of the population, called a sample.
Step 3: Organize and summarize the information.
Descriptive statistics consists of organizing and summarizing the information collected. It uses charts, tables, and numerical summaries.
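The numerical summaries and tables mentioned above can be sketched with Python's standard library. This is a minimal illustration, assuming a small list of hypothetical quiz scores (made-up values, not data from this chapter); `statistics.mean`, `statistics.median`, and `collections.Counter` are standard Python tools, not something the course prescribes.

```python
import statistics
from collections import Counter

# Hypothetical quiz scores for a small class (made-up data for illustration)
scores = [72, 85, 85, 90, 64, 78, 85, 90]

# Numerical summaries of the collected information
summary = {
    "n": len(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "min": min(scores),
    "max": max(scores),
}

# A simple frequency table: each distinct score and how often it occurs
freq_table = sorted(Counter(scores).items())

print(summary)     # mean is 81.125, median is 85.0
print(freq_table)  # [(64, 1), (72, 1), (78, 1), (85, 3), (90, 2)]
```

Nothing here reaches beyond the data in hand; that is what keeps it descriptive rather than inferential.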
Step 4: Draw conclusions from the information.
The information collected from the sample is generalized to the population.
Inferential statistics uses methods that
generalize results obtained from a sample to the
population and measure their reliability.
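To make "generalize and measure reliability" concrete, here is a minimal sketch, assuming a hypothetical population of made-up values. The sample mean estimates the population mean, and the standard error (sample standard deviation divided by the square root of n) measures how reliable that estimate is. The 1.96 multiplier is a large-sample normal approximation chosen for simplicity; it is an assumption of this sketch, not a method taken from the chapter.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population: 10,000 made-up values (not real data)
population = [random.gauss(50, 10) for _ in range(10_000)]

# Draw a simple random sample of size 100 from the population
sample = random.sample(population, 100)

# Point estimate: the sample mean is generalized to the population mean
sample_mean = statistics.mean(sample)

# Reliability: the standard error of the mean, s / sqrt(n)
std_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Rough 95% interval using the normal approximation (z = 1.96)
low, high = sample_mean - 1.96 * std_error, sample_mean + 1.96 * std_error

print(f"sample mean = {sample_mean:.2f}, 95% interval = ({low:.2f}, {high:.2f})")
print(f"population mean = {statistics.mean(population):.2f}")
```

In a real study the second print line would be impossible: the population mean is exactly what we cannot observe, which is why the interval is needed.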
Example 1: Identifying the Population and Sample
(p. 10, problem 23) A farmer wanted to learn about the
weight of his soybean crop. He randomly selected
100 plants and measured the weight of the
soybeans on the plant.
Identify the population and the sample.
The population is every soybean plant in the farm's crop.
The sample is the 100 soybean plants selected and
weighed.
Example 2: Identifying the Population and Sample
(p. 10, problem 25) A study was conducted to determine the genetic and non-genetic factors contributing to structural brain abnormalities in schizophrenia.
Researchers determined brain volumes of 29 twins
with schizophrenia and 29 healthy twins.
Based on MRIs, it was determined that brain
volumes were 2.2 smaller in the schizophrenic
patients.
Researchers concluded that an increased genetic risk of developing schizophrenia is related to reduced brain growth.
Step 1: (a) Identify the research objective.
To determine the genetic and non-genetic factors contributing to structural brain abnormalities in schizophrenia.
Step 2: (b) Identify the sample.
The sample consisted of 58 twins: 29 with schizophrenia and 29 healthy.
Step 3: (c) List the descriptive statistics.
The study calculated the average brain volume in each of the two groups and found that brain volume was 2.2 smaller in the patients with schizophrenia.
Step 4: (d) State the conclusion made in the study.
An increased genetic risk of developing schizophrenia is related to reduced brain growth.
Types of Data or Statistical Variables
Variables are the characteristics of the
individuals within the population.
Key Point: Variables are characteristics that vary from one individual to another.
Consider the variable height of a 5-year-old child starting school.
If all 5-year-olds had the same height, then obtaining the height of one individual would be enough to know the heights of all individuals. Of course, this is not the case.
As researchers, we wish to identify the factors
that influence variability.
Types of Variables
Qualitative, or categorical, variables allow for the classification of individuals based on some attribute or characteristic.
Quantitative variables provide numerical measures
of individuals.
Arithmetic operations such as addition and subtraction can be performed on the values of a quantitative variable and provide meaningful results.
Examples:
Gender: Qualitative
Temperature: Quantitative
Zip Code: Qualitative
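A short sketch of why the zip code example is qualitative even though it is written with digits, assuming a few hypothetical observations (all values made up). Arithmetic on temperatures gives a meaningful average and range, while the sensible operation on zip codes or gender is counting how often each category occurs. Storing zip codes as strings is a design choice of this sketch that keeps them from being averaged by accident.

```python
import statistics
from collections import Counter

# Hypothetical observations (made-up values for illustration)
temperatures_f = [68.0, 71.5, 74.0, 70.5]          # quantitative
zip_codes = ["33756", "33756", "33764", "34698"]   # qualitative: labels, not amounts
genders = ["F", "M", "F", "F"]                      # qualitative

# Arithmetic on a quantitative variable gives a meaningful result
mean_temp = statistics.mean(temperatures_f)         # 71.0
temp_range = max(temperatures_f) - min(temperatures_f)  # 6.0

# For qualitative variables, counting categories is what makes sense;
# an "average zip code" would be a number with no meaning
zip_counts = Counter(zip_codes)
gender_counts = Counter(genders)

print(mean_temp, temp_range)
print(zip_counts.most_common(1), gender_counts)
```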
Example 3: Classifying Variables as Qualitative or Quantitative
(p. 9, problems 3, 7, 9) Classify each variable as qualitative or quantitative.
The weight of a car.
Quantitative
The number of customers served at Wendy's during lunch.
Quantitative
Types of surgical procedures offered at Morton
Plant Hospital.
Qualitative
Discrete and Continuous Variables
A discrete variable is a quantitative variable
that either has a finite number of possible
values or a countable number of possible values.
The term countable means that the values of the
variable can be listed and assigned a counting
number such as 1, 2, 3, and so on.
A continuous variable is a quantitative variable that has an infinite number of possible values that are not countable; it can be measured to any desired level of accuracy.
Example 4: Identifying Discrete and Continuous Variables
(p. 10, problems 12, 13, 14, 16, 18) Determine whether each quantitative variable is discrete or continuous.
Time spent studying for your first statistics exam.
Continuous
Strength of concrete in pounds per square inch.
Continuous
The number of typos in a 500-page novel.
Discrete
The number of people in a poll of 500 who believe Albert Einstein was the greatest scientist of the 20th century.
Discrete
The speed of a car on the highway.
Continuous
Data
The list of observations a variable assumes is
called data.
While gender is a variable, the observations,
male or female, are data.
Qualitative data are observations corresponding
to a qualitative variable.
Quantitative data are observations corresponding
to a quantitative variable.
Discrete data are observations corresponding to a
discrete variable.
Continuous data are observations corresponding to
a continuous variable.
Problem 31, page 11
Identify the individuals, variables, and data for the following survey.
Individuals: Neta, Dave, Kristen, Michael, Junita
Variables: Gender (qualitative), Age (quantitative, continuous), Siblings (quantitative, discrete)
Data: F,M,,19,19,1,1,2
Four Sources of Data
(1) A census
A census is a list of all individuals in a
population along with certain characteristics of
each individual.
(2) Existing sources
There are many different existing data sources
(Current Population Survey, National Health
Survey, etc.), many of them available through the
Internet.
(3) Survey Sampling
An observational study measures the
characteristics of a population by studying
individuals in a sample, but does not attempt to
manipulate or influence the individuals.
Observational studies are sometimes referred to
as ex post facto (after the fact) studies because
the value of the variable of interest has already
been established.
(4) Designed Experiments
A designed experiment applies a treatment to
individuals (referred to as experimental units)
and attempts to isolate the effects of the
treatment on a response variable.
Important Note: Observational studies can be useful for determining whether there is a relation between two variables, but an experiment is required to isolate the cause of the relation.
Example 5: Identifying Observational Studies and Experiments
(p. 19, problems 3 and 5) Determine whether each study is an observational study or an experiment.
Seventh-grade students are randomly divided into
two groups. One group is taught math using
traditional techniques while the other is taught
math using a reform technique.
After one year, each group is given an
achievement test to compare its proficiency with
that of the other group.
Experiment
A survey is conducted asking 400 people, "Do you prefer Coke or Pepsi?"
Observational study
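The first study in Example 5 is an experiment partly because the students are randomly divided into two groups. A minimal sketch of that random assignment follows, assuming a hypothetical roster of 30 students labeled S01 to S30 and an arbitrary seed; `randomly_assign` is a hypothetical helper written for this illustration.

```python
import random

def randomly_assign(units, seed=None):
    """Randomly split experimental units into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(units)   # copy so the original roster is left untouched
    rng.shuffle(shuffled)    # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical roster of 30 seventh-grade students
students = [f"S{i:02d}" for i in range(1, 31)]
traditional, reform = randomly_assign(students, seed=7)

print("Traditional:", traditional)
print("Reform:     ", reform)
```

Because chance alone decides who gets which teaching method, differences in later achievement are easier to attribute to the method itself. The Coke-or-Pepsi survey has no such assignment step, which is why it stays observational.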
Simple Random Samples
A sample of size n from a population of size N is obtained through simple random sampling if every possible sample of size n is equally likely to occur.
The sample is then called a simple random sample.
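"Every possible sample of size n is equally likely" can be checked by counting. The number of possible samples is the binomial coefficient C(N, n), so under simple random sampling each particular sample has probability 1 / C(N, n). The sketch below assumes a tiny population of 5 individuals for the enumeration; `srs_probability` is a hypothetical helper name.

```python
from itertools import combinations
from math import comb
from fractions import Fraction

def srs_probability(N, n):
    """Return the number of possible samples of size n from a population
    of size N, and the probability that any one of them is the sample drawn."""
    possible_samples = comb(N, n)
    return possible_samples, Fraction(1, possible_samples)

# Small case: population of 5 individuals numbered 1-5, samples of size 2
all_samples = list(combinations(range(1, 6), 2))
count, prob = srs_probability(5, 2)
print(len(all_samples), count, prob)  # prints: 10 10 1/10
```

For the 50-state frame with n = 10, `comb(50, 10)` is over ten billion, which is why samples are generated with random numbers rather than by listing them.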
Steps for Obtaining a Simple Random Sample
(1) Obtain a frame (list) of all the individuals
in the population of interest.
(2) Number the individuals in the frame from 1 to
N.
(3) Use a random number table, graphing
calculator, or statistical software to randomly
generate n numbers where n is the desired sample
size.
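The three steps above map directly onto a few lines of Python, used here as the "statistical software" option. This is a sketch under stated assumptions: the frame of 20 households is made up, the seed is arbitrary, and `simple_random_sample` is a hypothetical helper. `random.sample` draws without replacement, so the n numbers are automatically distinct.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Steps 2 and 3: number the frame 1..N, then randomly generate
    n distinct numbers and return the matching individuals."""
    N = len(frame)
    numbered = {i: individual for i, individual in enumerate(frame, start=1)}  # Step 2
    rng = random.Random(seed)
    chosen_numbers = sorted(rng.sample(range(1, N + 1), n))  # Step 3
    return [numbered[k] for k in chosen_numbers]

# Step 1: a hypothetical frame of 20 households (made-up labels)
frame = [f"Household {c}" for c in "ABCDEFGHIJKLMNOPQRST"]
print(simple_random_sample(frame, 5, seed=42))
```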
Time to try the TI-83!
Example 6: Obtaining a Simple Random Sample
(p. 19, problem 15) Use the randInt( function on the TI-83 to obtain two random samples of size 10 from the frame of 50 states, listed in alphabetical order.
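For readers without a TI-83, the same procedure can be imitated in Python. Repeated randInt( calls can return a number that was already chosen, and by hand you simply skip it; the sketch below does the same with the standard library's `random.randint`, which, like randInt(1, N), includes both endpoints. The seed 2024 and the helper name `randint_sample` are arbitrary choices for this illustration.

```python
import random

# The frame: the 50 states in alphabetical order, numbered 1-50 by position
STATES = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California",
    "Colorado", "Connecticut", "Delaware", "Florida", "Georgia",
    "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa",
    "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland",
    "Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri",
    "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey",
    "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio",
    "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", "South Carolina",
    "South Dakota", "Tennessee", "Texas", "Utah", "Vermont",
    "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming",
]

def randint_sample(n, N, rng):
    """Mimic repeated randInt(1, N) calls: keep drawing integers from 1 to N,
    skipping any number already chosen, until n distinct numbers are found."""
    chosen = []
    while len(chosen) < n:
        k = rng.randint(1, N)   # both endpoints included
        if k not in chosen:     # a repeat is discarded, as you would by hand
            chosen.append(k)
    return chosen

rng = random.Random(2024)
for trial in (1, 2):
    numbers = randint_sample(10, len(STATES), rng)
    print(f"Sample {trial}:", [STATES[k - 1] for k in sorted(numbers)])
```

The two samples will usually differ, which is exactly the sample-to-sample variation discussed under sampling error.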
Statistical Error
Non-sampling errors are errors that result from
the survey process.
They are due to the non-response of individuals
selected to be in the survey, inaccurate
response, poorly worded questions, bias in the
selection of individuals to be in the survey, etc.
Sampling error is the error that results from using a sample, rather than the whole population, to estimate information about the population. It tends to be larger when the sample is small.
This type of error occurs because a sample does
not give complete information about the
population.
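Sampling error can be seen directly by simulation. The sketch assumes a hypothetical population of 5,000 made-up household incomes (in thousands of dollars) and an arbitrary seed; it draws many simple random samples of each size and reports the average distance between the sample mean and the population mean. `typical_sampling_error` is a hypothetical helper name.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed for reproducibility

# Hypothetical population of 5,000 made-up household incomes (in $1,000s)
population = [rng.uniform(20, 120) for _ in range(5_000)]
mu = statistics.mean(population)   # the population mean (normally unknown)

def typical_sampling_error(n, trials=500):
    """Average absolute gap between a sample mean and the population mean
    over many simple random samples of size n."""
    gaps = [abs(statistics.mean(rng.sample(population, n)) - mu) for _ in range(trials)]
    return statistics.mean(gaps)

for n in (10, 100, 1000):
    print(f"n = {n:4d}: average |sample mean - population mean| = {typical_sampling_error(n):.2f}")
```

The gap shrinks as n grows but does not disappear unless every individual is measured (a census). Non-sampling errors, by contrast, are not reduced just by taking a bigger sample.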
Sources of Non-Sampling Error
1. Incomplete frame
2. Non-response
3. Poor interviewer
4. Incorrect data entry
5. Poorly worded questions
   a. Open versus closed questions
   b. Ordering of questions
Example 7: Identifying Errors
(p. 34, problems 2 and 6) Identify the source of the non-sampling error.
The Village of Oak Lawn wishes to conduct a study
regarding the income levels of households within
the village. The village manager selects 10
homes in the southwest corner of the village and
sends an interviewer to ascertain the household
income.
Incomplete frame. Homes in the other parts of the village are excluded from the sampling process, so the sampling method itself is flawed; this is a non-sampling error.
Petland is considering opening a new store in
Orland Park. Prior to opening the store, the
company would like to know the proportion of
households in the city that own a pet.
The market researcher obtains a list of
households in Orland Park and randomly selects
100 of them. She mails a questionnaire that asks
questions regarding pets in the house to the 100
households.
Of the 100 questionnaires sent out, she receives
3 in return.
Non-response. Only 3 of the 100 questionnaires were returned (a 3% response rate), so the responses are too few and likely too unrepresentative to support any conclusions.
What is wrong with this picture?
Non-sampling error: incomplete frame, non-response, poor interviewer, poorly worded question.