Title: INTRODUCTION TO STATISTICS
Overview
"There are three kinds of lies: lies, damned
lies, and statistics." (Benjamin Disraeli)
What is Statistics?
Why do we need statistics?
Who uses statistics?
Reasons to Study Statistics
- Being an Informed Information Consumer
- Understanding and Making Decisions
- Evaluating Decisions That Affect Your Life
Statistics: The Exploration and Analysis of Data,
4th ed., Devore/Peck
To make decisions, you must be able to do the
following:
- Decide whether existing information is adequate
or whether additional information is required.
- If necessary, collect more information in a
reasonable and thoughtful way.
- Summarize the available data in a useful and
informative manner.
- Analyze the available data.
- Draw conclusions, make decisions, and assess the
risk of an incorrect decision.
Statistics: The Exploration and Analysis of Data,
4th ed., Devore/Peck
The Official Definition
- Statistics is a collection of methods for
planning studies and experiments, obtaining data,
and then organizing, summarizing, presenting,
analyzing, interpreting, and drawing conclusions
based on the data.
Types of Data
Some Definitions
- Data are observations (such as measurements,
genders, survey responses) that have been
collected.
- A population is the complete collection of all
elements to be studied. The collection is
complete in the sense that it includes all
subjects to be studied.
- A sample is a subcollection of members selected
from a population.
- A census is the collection of data from every
member of the population.
More Definitions
- A parameter is a numerical measurement describing
some characteristic of a population.
- A statistic is a numerical measurement describing
some characteristic of a sample.
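A minimal Python sketch of the parameter/statistic distinction, assuming a small made-up population of exam scores (the values, the seed, and the sample size of 4 are illustrative only). The population mean is a parameter; the mean of a random sample drawn from it is a statistic.

```python
import random
from statistics import mean

# Hypothetical population: exam scores for 10 students (illustrative values only).
population = [72, 85, 90, 66, 78, 88, 95, 70, 81, 75]

# Parameter: a number describing the whole population.
population_mean = mean(population)

# Statistic: the same kind of number, computed from a sample of the population.
rng = random.Random(1)  # fixed seed so the run is repeatable
sample = rng.sample(population, 4)
sample_mean = mean(sample)

print(f"parameter (population mean): {population_mean}")
print(f"statistic (mean of sample {sample}): {sample_mean}")
```

The parameter is fixed (here it is 80), while the statistic depends on which members happen to land in the sample.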
Still More Definitions
- Quantitative data consist of numbers
representing counts or measurements.
- Qualitative (or categorical or attribute) data
can be separated into different categories that
are distinguished by some nonnumeric
characteristic.
Even More Definitions
- Discrete data result when the number of possible
values is either a finite number or a countable
number.
- Continuous (numerical) data result from
infinitely many possible values that correspond
to some continuous scale that covers a range of
values without gaps, interruptions, or jumps.
Levels of Measurement
- The nominal level of measurement is characterized
by data that consist of names, labels, or
categories only. The data cannot be arranged in
an ordering scheme.
- Data are at the ordinal level of measurement if
they can be arranged in some order, but
differences between data values either cannot be
determined or are meaningless.
Levels of Measurement
- The interval level of measurement is like the
ordinal level, with the additional property that
the difference between any two data values is
meaningful. However, data at this level do not
have a natural zero starting point (where none of
the quantity is present).
Levels of Measurement
- The ratio level of measurement is the interval
level with the additional property that there is
also a natural zero starting point (where zero
indicates that none of the quantity is present).
For values at this level, differences and ratios
are both meaningful.
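One way to see why ratios are meaningful only at the ratio level: convert the same two values to a different unit and check whether the ratio survives. The sketch below assumes temperature in °C/°F as an interval-level example (arbitrary zero) and mass in kg/lb as a ratio-level example (true zero); the helper names and the rounded conversion factor are illustrative.

```python
# Interval level: temperature. 0 °C and 0 °F are arbitrary zero points.
def celsius_to_fahrenheit(c: float) -> float:
    return c * 9 / 5 + 32

# Ratio level: mass. 0 kg means "no mass at all", in any unit.
KG_TO_LB = 2.20462  # approximate conversion factor

def kg_to_lb(kg: float) -> float:
    return kg * KG_TO_LB

# "Is 20 degrees twice as hot as 10 degrees?" The answer changes with the unit.
ratio_c = 20 / 10
ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)  # 68 / 50

# "Is 20 kg twice as heavy as 10 kg?" The answer is the same in any unit.
ratio_kg = 20 / 10
ratio_lb = kg_to_lb(20) / kg_to_lb(10)

print("temperature ratio in C vs F:", ratio_c, ratio_f)
print("mass ratio in kg vs lb:", ratio_kg, ratio_lb)
```

The temperature ratio drops from 2 to 1.36 when the unit changes, so "twice as hot" is not meaningful; the mass ratio stays at 2 (up to floating-point rounding).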
Example
- For the study, identify the following:
- What is the population? Sample?
- Is the data quantitative or qualitative?
- Is the data discrete or continuous?
- Identify the level of measurement.
Critical Thinking
"Statistical thinking will one day be as
necessary for efficient citizenship as the
ability to read and write." (H. G. Wells)
Statistical Thinking
- Data beats anecdotes
- Beware the lurking variable
- Where the data come from is important
- Variation is everywhere
- Conclusions are not certain
The Basic Practice of Statistics, 3rd ed., Moore
Sampling
- Sample data must be collected in an appropriate
way, such as through a process of random
selection.
- If sample data are not collected in an
appropriate way, the data may be so completely
useless that no amount of statistical torturing
can salvage them.
Non-response
- How many people are actually called in order to
get a sample of 1000 people? The Pew Research
Center for the People and the Press conducted a
study on non-response and found:
  - 938 households never screened (no answer, busy,
  answering machine, not available, callback)
  - 678 households that refused
  - 221 households with no eligible person
  (language barrier, health problem, no person 18
  or older)
  - 42 households with an eligible person
  (incomplete interview)
  - 1000 households with an eligible person
  (complete interview)
- So Pew had to call 2879 residential phone
numbers to get a sample of 1000 people.
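The arithmetic behind the 2879 figure can be checked directly from the counts above. The "completion share" computed here is just completed interviews divided by numbers called; it is an illustrative ratio, not one of the formal response-rate definitions survey organizations use.

```python
# Counts from the Pew non-response study, as listed on the slide.
outcomes = {
    "never screened (no answer, busy, machine, unavailable, callback)": 938,
    "refused": 678,
    "no eligible person": 221,
    "eligible person, incomplete interview": 42,
    "eligible person, complete interview": 1000,
}

numbers_called = sum(outcomes.values())
completed = outcomes["eligible person, complete interview"]
completion_share = completed / numbers_called  # simple illustrative ratio

print(f"numbers called: {numbers_called}")                               # numbers called: 2879
print(f"share ending in a complete interview: {completion_share:.1%}")   # 34.7%
```

Only about one call in three produced a completed interview, which is why non-response can distort who ends up in the sample.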
Another Definition
- A voluntary response sample (or self-selected
sample) is one in which the respondents
themselves decide whether to be included.
Watch out for...
- Small Samples
- Graphs
- Pictographs
- Percentages
- Loaded Questions
- Order of Questions
- Nonresponse
Watch out for...
- Missing Data
- Correlation and Causality
- Self-Interest Study
- Precise Numbers
- Partial Pictures
- Deliberate Distortions
Evaluating a Research Study
- What were the researchers trying to learn? What
questions motivated their research?
- Was relevant information collected? Were the
right things measured?
- Were the data collected in a sensible way?
- Were the data summarized in an appropriate way?
- Was an appropriate method of analysis selected,
given the type of data and how the data were
collected?
- Are the conclusions drawn by the researchers
supported by the data analysis?
Statistics: The Exploration and Analysis of Data,
4th ed., Devore/Peck
Design of Experiments
Planning and Conducting a Study
- Understand the Nature of the Problem
- Decide What to Measure and How to Measure It
- Data Collection
- Data Summarization and Preliminary Analysis
- Formal Data Analysis
- Interpretation of Results
Statistics: The Exploration and Analysis of Data,
4th ed., Devore/Peck
Types of Studies
- In an observational study, we observe and measure
specific characteristics, but we don't attempt to
modify the subjects being studied.
- In an experiment, we apply some treatment and
then proceed to observe its effects on the
subjects. (Subjects in experiments are called
experimental units.)
Types of Observational Studies
- In a cross-sectional study, data are observed,
measured, and collected at one point in time.
- In a retrospective (or case-control) study, data
are collected from the past by going back in time
(through examination of records, interviews, and
so on).
- In a prospective (or longitudinal or cohort)
study, data are collected in the future from
groups sharing common factors (called cohorts).
One More Definition
- Confounding occurs in an experiment when you are
not able to distinguish among the effects of
different factors.
Example
- Some studies have suggested that drinking wine
rather than beer or other alcohol has added
health benefits.
- Wine drinkers eat less fried food and more
vegetables and fruit.
- They are less likely to smoke.
- As a group, they are better educated and
wealthier than the groups that consume beer or
other alcohol.
- These results may reflect confounding
by dietary habits and other lifestyle factors.
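A small deterministic sketch of how confounding of this kind can arise. The data are invented for illustration, not taken from any study: by construction, a hypothetical health score depends only on whether a person has a healthy lifestyle, yet wine drinkers (mostly healthy-lifestyle in this made-up data) still look healthier than beer drinkers until we compare within lifestyle groups.

```python
from statistics import mean

# Hypothetical rule: health depends ONLY on lifestyle, never on the drink.
def health_score(healthy_lifestyle: bool) -> float:
    return 70.0 if healthy_lifestyle else 50.0

# Invented records (drink, healthy_lifestyle, score): lifestyle is tied to the drink.
records = (
    [("wine", True, health_score(True))] * 80
    + [("wine", False, health_score(False))] * 20
    + [("beer", True, health_score(True))] * 20
    + [("beer", False, health_score(False))] * 80
)

def mean_score(drink, lifestyle=None):
    """Average score for one drink, optionally restricted to one lifestyle group."""
    return mean(
        score for d, life, score in records
        if d == drink and (lifestyle is None or life == lifestyle)
    )

# Naive comparison: wine drinkers appear healthier (66.0 vs 54.0)...
print("overall:", mean_score("wine"), "vs", mean_score("beer"))
# ...but within each lifestyle group the difference vanishes.
for life in (True, False):
    print(f"healthy_lifestyle={life}:", mean_score("wine", life), "vs", mean_score("beer", life))
```

Because lifestyle affects both the choice of drink and the health score, the overall comparison cannot separate the effect of wine from the effect of lifestyle.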
Important Factors to Consider in Designing an Experiment
- Control the effects of variables.
- Use replication.
- Use randomization.
Controlling Effects of Variables
- Blinding is a technique in which the subject
doesn't know whether he or she is receiving a
treatment or a placebo. An experiment is
double-blind if both the subjects and the
administrators of the treatment don't know whether a
subject has received a treatment or a placebo.
- The placebo effect occurs when an untreated
subject reports an improvement in symptoms.
Controlling Effects of Variables
- A block is a group of subjects that are similar,
but blocks differ in ways that might
affect the outcome of the experiment.
- Randomized block design: If you are conducting an
experiment testing one or more different
treatments, and there are different groups of
similar subjects, but the groups differ in
ways that are likely to affect the response to
treatments, use this experimental design:
  - Form blocks (or groups) of subjects with similar
  characteristics.
  - Randomly assign treatments to the subjects within
  each block.
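The two steps of a randomized block design can be sketched in Python as follows. The block names, subject IDs, and the function `randomized_block_assignment` are hypothetical; treatments are dealt out in turn after shuffling each block, which is one simple way (assumed here) to randomize while keeping treatments balanced within every block.

```python
import random

def randomized_block_assignment(blocks, treatments, seed=None):
    """Randomly assign treatments to subjects separately within each block.

    blocks: dict mapping a block name to a list of subject IDs.
    treatments: list of treatment labels.
    Each block is shuffled on its own, then treatments are dealt out in turn,
    so every treatment appears as evenly as the block size allows.
    """
    rng = random.Random(seed)
    assignment = {}
    for block_name, subjects in blocks.items():
        shuffled = list(subjects)
        rng.shuffle(shuffled)  # randomization happens inside the block only
        for i, subject in enumerate(shuffled):
            assignment[subject] = (block_name, treatments[i % len(treatments)])
    return assignment

# Hypothetical blocks of similar subjects.
blocks = {
    "women": ["W1", "W2", "W3", "W4"],
    "men": ["M1", "M2", "M3", "M4"],
}
plan = randomized_block_assignment(blocks, ["drug", "placebo"], seed=42)
for subject, (block, treatment) in sorted(plan.items()):
    print(subject, block, treatment)
```

Since each block gets both treatments, differences between the blocks cannot be mistaken for a treatment effect.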
Controlling Effects of Variables
- With a completely randomized experimental design,
subjects are assigned to different treatment
groups through a process of random selection.
- With a rigorously controlled design, subjects are
very carefully chosen so that those given each
treatment are similar in the ways that are
important to the experiment.
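A completely randomized design, by contrast, assigns subjects to treatment groups at random without forming blocks first. In this sketch (the function name and subject IDs are hypothetical) the subjects are shuffled once and dealt into the groups in turn, so group sizes differ by at most one.

```python
import random

def completely_randomized_assignment(subjects, treatments, seed=None):
    """Assign each subject to a treatment group purely at random (no blocking)."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

# Hypothetical subjects numbered 1 to 10.
groups = completely_randomized_assignment(range(1, 11), ["treatment", "placebo"], seed=7)
print(groups)
```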
Replication and Sample Size
- Repetition of an experiment on sufficiently large
groups of subjects is called replication.
- Use a sample size that is large enough so that we
can see the true nature of any effects, and obtain
the sample using an appropriate method, such as
one based on randomness.
Randomization and Other Sampling Strategies
- In a random sample, members from the population
are selected in such a way that each individual
member has an equal chance of being selected.
- A simple random sample of n subjects is selected
in such a way that every possible sample of the
same size n has the same chance of being chosen.
- A probability sample involves selecting members
from a population in such a way that each member
has a known (but not necessarily the same) chance
of being selected.
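In Python's standard library, `random.sample` draws members without replacement so that every subset of the requested size is equally likely, which matches the definition of a simple random sample. The second part sketches a probability sample with known but unequal chances using independent inclusion probabilities (sometimes called Poisson sampling); the population of 100 numbered members and the 0.2/0.05 probabilities are hypothetical.

```python
import random

population = list(range(1, 101))  # hypothetical population of 100 numbered members
rng = random.Random(2024)         # fixed seed for a repeatable illustration

# Simple random sample: 10 distinct members, every 10-member subset equally likely.
srs = rng.sample(population, 10)
print("simple random sample:", sorted(srs))

# Probability sample: each member has a KNOWN chance of selection, but the
# chances differ (members 1-50 are four times as likely as members 51-100).
inclusion_prob = {m: (0.2 if m <= 50 else 0.05) for m in population}
prob_sample = [m for m in population if rng.random() < inclusion_prob[m]]
print("probability sample:", prob_sample)
```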
Randomization and Other Sampling Strategies
- With convenience sampling, we simply use results
that are very easy to get.
- In systematic sampling, we select some starting
point and then select every kth element in the
population.
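Systematic sampling can be sketched in a few lines, assuming the population is already in some order (here a hypothetical list of 50 numbered members) and that the starting point is chosen at random among the first k elements. The function name `systematic_sample` is illustrative.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k elements, then take every kth element."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random starting index in 0..k-1
    return population[start::k]

population = list(range(1, 51))  # hypothetical ordered list of 50 members
sample = systematic_sample(population, k=10, seed=3)
print(sample)
```

With 50 members and k = 10, every run yields 5 members spaced exactly 10 apart; only the starting point is random.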
Randomization and Other Sampling Strategies
- With stratified sampling, we subdivide the
population into at least two different subgroups
(or strata) so that subjects within the same
subgroup share the same characteristics, then we
draw a sample from each subgroup (or stratum).
- In cluster sampling, we first divide the
population area into sections (or clusters), then
randomly select some of those clusters, and then
choose all the members from those selected
clusters.
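The key difference between the two strategies is where the randomness goes: stratified sampling samples within every stratum, while cluster sampling randomly picks whole clusters and keeps everyone in them. The strata ("first-year"/"senior"), the five sections of 20, and the function names below are hypothetical.

```python
import random

rng = random.Random(11)  # fixed seed for a repeatable illustration

# Hypothetical population of (member_id, stratum) pairs.
population = [(i, "first-year" if i <= 60 else "senior") for i in range(1, 101)]

def stratified_sample(pop, per_stratum):
    """Split into strata, then draw a simple random sample from EACH stratum."""
    strata = {}
    for member, stratum in pop:
        strata.setdefault(stratum, []).append(member)
    return {s: rng.sample(members, per_stratum) for s, members in strata.items()}

# Hypothetical clusters: 5 sections of 20 members each.
clusters = {f"section {c}": list(range(20 * c + 1, 20 * c + 21)) for c in range(5)}

def cluster_sample(clusters, n_clusters):
    """Randomly choose some clusters, then keep ALL members of the chosen ones."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

print(stratified_sample(population, per_stratum=5))
print(cluster_sample(clusters, n_clusters=2))
```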
Randomization and Other Sampling Strategies
- A multistage sample design involves the selection
of a sample in different stages that might use
different methods of sampling.
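A two-stage design is the simplest multistage sample: a random selection of clusters in stage one, then a simple random sample inside each chosen cluster in stage two. The towns, household IDs, and `two_stage_sample` function below are hypothetical; real multistage surveys often use more stages and different methods at each one.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly pick clusters. Stage 2: simple random sample inside each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical clusters: 10 towns with 30 households each.
towns = {f"town {t}": [f"T{t}-H{h}" for h in range(30)] for t in range(10)}
result = two_stage_sample(towns, n_clusters=3, n_per_cluster=4, seed=5)
print(result)
```

Unlike cluster sampling, the second stage does not keep every member of a chosen cluster.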
Sampling Errors
- A sampling error is the difference between a
sample result and the true population result;
such an error results from chance sample
fluctuations.
- A nonsampling error occurs when the sample data
are incorrectly collected, recorded, or analyzed.
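A short simulation makes sampling error concrete. Assume a hypothetical population of 1,000 heights (generated here from a normal distribution with mean 170 and standard deviation 10); its mean is the parameter. Each random sample of 25 gives a slightly different sample mean, and the difference from the population mean is sampling error caused purely by chance, since nothing is collected or recorded incorrectly.

```python
import random
from statistics import mean

rng = random.Random(99)  # fixed seed for a repeatable illustration

# Hypothetical population of 1,000 heights in cm; its mean is the parameter.
population = [rng.gauss(170, 10) for _ in range(1000)]
mu = mean(population)

# Repeated random samples of the same size: each sample mean misses mu by a
# different amount purely because of which members happened to be chosen.
errors = []
for _ in range(5):
    xbar = mean(rng.sample(population, 25))
    errors.append(xbar - mu)
    print(f"sample mean {xbar:6.2f}   sampling error {xbar - mu:+.2f}")
```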
Should Polls Be Banned?
- What is your interest in polls about campaigns
and elections?
  - Very interested
  - Somewhat interested
  - Little interest
  - No interest
- What is your interest in polls that measure how
Americans feel about the major political issues
of the day, including those on which Congress is
debating and voting?
  - Very interested
  - Somewhat interested
  - Little interest
  - No interest
- Do you favor banning the publication of
polling results prior to an election?
  - Yes
  - No
Should Polls Be Banned? Results
Our Interest in Polls
- 76% of Americans were interested in polls about
campaigns and elections.
- 23% said they had little or no interest in polls
about campaigns and elections.
- 77% of Americans were interested in the results
of polls that measure how Americans feel about
the major political issues of the day, including
those on which Congress is debating and voting.
- 22% said they had little or no interest in polls
about political issues of the day.
Compare Results
- How do our results compare?
- What can account for any differences?