Title: Where do data come from, and why we don't (always) trust statisticians
1. Where do data come from, and why we don't
(always) trust statisticians
2. Induction vs. Deduction: the gist of statistics
- Deduction: what is true about the whole must be
true about a part.
- Induction: what is true about the part might be
true about the whole.
3. Population vs. Sample
- The population is the entire group of individuals
about which we want information.
- A sample is the part of the population from which we
actually collect information.
- We use samples to study a population because
populations are often impossible or impractical
to study in full.
4. Real-Life Example of a Bad Sample
- Ann Landers, a famous columnist, collected a
sample of 10,000 people who wrote in to answer
this question: "If you could do it all over
again, would you have children?"
- 70% of the respondents said that they would not
have children.
- When a sample was selected at random, 91% of the
people said that they would have children.
5. Potential problems with sample surveys
- Undercoverage occurs when some groups in the
population are left out of the process of
choosing the sample.
- Nonresponse occurs when an individual chosen for
the sample cannot be contacted or refuses to
respond.
6. Another Real-Life Example of a Bad Sample
- In 1936 the Literary Digest mailed out 10,000,000
ballots asking whom the respondents were going to
vote for: A. Landon or F.D. Roosevelt.
- 2,300,000 ballots were returned, predicting a
strong win (57%) for Landon.
7. Another Real-Life Example of a Bad Sample
- George Gallup surveyed 50,000 people chosen
randomly.
- Comparison of forecasts (Roosevelt's share of the vote):
  - Gallup's prediction for Roosevelt: 56%
  - Gallup's prediction of the Digest's forecast: 44%
  - Digest's prediction for Roosevelt: 43%
  - Actual vote: 62%
- The Literary Digest drew its sample from its subscription list,
phone directories, lists of car owners, and club
membership lists, which left out many poorer voters (undercoverage).
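The arithmetic behind this comparison can be written out as a short sketch. It uses only the figures quoted on slides 6 and 7; the variable names are illustrative and not part of the original slides.

```python
# Forecasts of Roosevelt's vote share, in percent, as quoted on the slides.
forecasts = {
    "Gallup": 56,
    "Literary Digest": 43,
}
actual = 62  # actual Roosevelt share, as quoted on the slides

# Error = forecast minus actual; a negative value means Roosevelt was underestimated.
errors = {name: share - actual for name, share in forecasts.items()}

# Share of the Digest's mailed ballots that came back (a measure of nonresponse).
ballots_mailed = 10_000_000
ballots_returned = 2_300_000
response_rate = ballots_returned / ballots_mailed

for name, err in errors.items():
    print(f"{name}: {err:+d} percentage points")
print(f"Digest response rate: {response_rate:.0%}")
```

The Digest's sample was far larger than Gallup's 50,000, yet its error was roughly three times as large: sample size does not repair a badly chosen sample.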
9. Right and Wrong Ways to Sample
- A simple random sample is a sample where (1) each
unit of the population has an equal chance of being
chosen and (2) all units are chosen
independently.
- A sample is biased if at least one group of
individuals has a greater chance of being selected.
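A minimal sketch of drawing a simple random sample, assuming a complete list of the population is available. The helper name simple_random_sample, the made-up student roster, and the seed are illustrative assumptions. Python's random.sample draws without replacement, so no unit is picked twice and every unit has the same chance of ending up in the sample.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every unit is equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical roster of 200 students (illustrative data only).
students = [f"student_{i}" for i in range(1, 201)]

sample = simple_random_sample(students, 10, seed=1)
print(sample)
```

By contrast, asking friends or posting an ad gives some groups a greater chance of selection, which is exactly what makes a sample biased.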
10. Example of a good sample
- You want to study the effects of computers on GPA.
You don't have the resources to study all
students.
- To select a sample of students for the study, you:
  - Get a list of all students,
  - Select students at random from the list,
  - Collect information from the students selected,
  - Compare those who have a computer with those who
  don't.
11. Example of a bad sample
- You want to study the effects of computers on GPA.
You don't have the resources to study all
students.
- To select a sample of students for the study, you:
  - Use your friends.
  - Hang an ad in the computer lab.
  - Post an online questionnaire on the WKU site.
12. Stratified Random Sample
- When we know the proportion of each group in the
population, a stratified random sample is usually better
than a simple random sample (SRS).
- In a stratified sample, the number of people chosen
from each group is proportional to the size of
that group in the population.
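Proportional allocation can be sketched as follows. The function stratified_sample, the group names, and the group sizes are hypothetical. Each group's share of the sample is rounded to the nearest whole number, so in general the total can differ slightly from the requested size.

```python
import random

def stratified_sample(strata, n, seed=None):
    """strata maps a group name to the list of units in that group.

    From each group, draw a simple random sample whose size is
    proportional to the group's share of the population.
    """
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        k = round(n * len(units) / total)  # proportional allocation
        sample[name] = rng.sample(units, k)
    return sample

# Hypothetical population of 1,000 students split by class year.
strata = {
    "freshmen": [f"fr_{i}" for i in range(400)],
    "sophomores": [f"so_{i}" for i in range(300)],
    "juniors": [f"ju_{i}" for i in range(200)],
    "seniors": [f"se_{i}" for i in range(100)],
}

chosen = stratified_sample(strata, 50, seed=2)
sizes = {name: len(units) for name, units in chosen.items()}
print(sizes)
```

With these group sizes, a sample of 50 takes 20, 15, 10, and 5 students from the four groups, matching their 40/30/20/10 percent shares of the population.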
13. Confounding
- Two explanatory variables are confounded when
their effects on the response variable cannot be
distinguished from each other.
- Confounding is often a problem in studies that
use sample surveys to collect data (even if
the sampling is done right).
14. Observation vs. Experiment
- An observational study observes individuals and
measures variables but does not attempt to
influence responses.
- An experiment imposes a treatment on individuals to
observe their responses.
15. How to design an Experiment
- The purpose of an experiment is to find out how
one variable (the response variable) changes in
response to a change in another variable
(the explanatory variable).
- Experiment:
  - Subject → Treatment → Response
16. Placebo Effect
- Placebo effect: a change in behavior due to
participation in an experiment.
- The placebo effect is a problem when an experiment does
not have a control group (a basis for
comparison).
- To avoid the problem, design a randomized
comparative experiment.
17. How to design a Randomized Comparative Experiment
- Randomly split the subjects into two groups:
  - the control group receives no treatment,
  - the treatment group receives the treatment.
- Compare the results.
- Both groups will be equally affected by the placebo effect,
so the difference between the groups shows
whether the treatment works.
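The random split in the first step can be sketched like this. The function name randomize, the subject labels, and the seed are illustrative assumptions; the point is only that chance, not the experimenter, decides who lands in which group.

```python
import random

def randomize(subjects, seed=None):
    """Shuffle the subjects and cut the list in half: control first, treatment second."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical pool of 20 subjects.
subjects = [f"subject_{i}" for i in range(20)]

control, treatment = randomize(subjects, seed=3)
print("control:  ", control)
print("treatment:", treatment)
```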
18. How to interpret results of an experiment
- Observe outcomes for the treatment and control
groups.
- If the outcomes differ by so much that
a difference this large would rarely occur by
chance, we conclude that the difference is
statistically significant.
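One way to make "would rarely occur by chance" concrete is a permutation test, sketched below. This is one possible approach, not necessarily the method the slides have in mind. The outcome scores are made up for illustration. The idea: if the treatment did nothing, the group labels are arbitrary, so we reshuffle them many times and count how often the reshuffled difference in means is at least as large as the observed one.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(treatment, control, n_shuffles=2000, seed=None):
    """Estimate how often chance alone gives a difference at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(mean(treatment) - mean(control))
    pooled = list(treatment) + list(control)
    n_t = len(treatment)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:n_t]) - mean(pooled[n_t:]))
        if diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_shuffles

# Hypothetical outcomes (illustrative numbers only).
treatment_scores = [78, 85, 90, 88, 92, 81, 87, 94]
control_scores = [70, 75, 72, 80, 68, 77, 74, 71]

p_value = permutation_p_value(treatment_scores, control_scores, seed=6)
print(f"estimated p-value: {p_value:.4f}")
```

A small p-value means a difference this large rarely arises from random relabeling alone, which is the sense in which the difference is called statistically significant.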
19. Population vs. Sample
- The population is the entire group of individuals
about which we want information.
- A sample is the part of the population from which we
actually collect information.
- Based on the sample, we draw conclusions about the
whole population.
20. Parameter vs. Statistic
- A parameter is a number that describes the
population.
- A statistic is a number that describes the
sample.
- We use statistics to estimate parameters.
21. Sampling Distribution
- The result of your study is a statistic, which
can vary from sample to sample.
- The sampling distribution of a statistic is the
distribution of values taken by the statistic in
all possible samples of the same size from the
same population.
- Estimate = True Parameter + Sampling Error
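A sampling distribution can be approximated by simulation, as in the sketch below. Listing all possible samples is impractical, so the code draws 1,000 random samples instead; the population (10,000 people, 60% answering "yes") and the sample size of 100 are made-up assumptions.

```python
import random
from statistics import mean, pstdev

rng = random.Random(4)

# Hypothetical population: 1 = "yes", 0 = "no". The true parameter is 0.6.
population = [1] * 6000 + [0] * 4000
true_parameter = sum(population) / len(population)

def sample_proportion(n):
    """The statistic: proportion of "yes" answers in one simple random sample of size n."""
    return sum(rng.sample(population, n)) / n

# Approximate the sampling distribution with 1,000 samples of size 100.
estimates = [sample_proportion(100) for _ in range(1000)]

# Estimate = True Parameter + Sampling Error, so the error is the difference.
sampling_errors = [est - true_parameter for est in estimates]

print(f"true parameter:      {true_parameter:.3f}")
print(f"mean of estimates:   {mean(estimates):.3f}")
print(f"spread of estimates: {pstdev(estimates):.3f}")
```

Individual estimates miss the true value in both directions, but their average sits close to it, which is what an unbiased statistic looks like.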
22. Bias and variability
- A statistic is biased if the mean of its sampling
distribution is not equal to the true value of
the parameter being estimated.
- The variability of a statistic is the spread of
its sampling distribution.
- Bias does not go away with larger samples.
- Variability shrinks as samples get larger.
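The last two points can be illustrated with a small simulation. Everything here is a made-up assumption: a population with a true "yes" proportion of 0.6, and a biased sampling frame that leaves out half of the "yes" group (undercoverage). The sketch compares sample sizes of 25 and 400 for each frame.

```python
import random
from statistics import mean, pstdev

rng = random.Random(5)

# Hypothetical population: 6,000 "yes" (1) and 4,000 "no" (0); true proportion 0.6.
population = [1] * 6000 + [0] * 4000
# Hypothetical biased frame: half of the "yes" group is left out (undercoverage).
biased_frame = [1] * 3000 + [0] * 4000

def simulate(frame, n, repeats=500):
    """Return the centre and spread of the sample proportion over repeated samples."""
    estimates = [sum(rng.sample(frame, n)) / n for _ in range(repeats)]
    return mean(estimates), pstdev(estimates)

results = {}
for label, frame in [("full population", population), ("biased frame", biased_frame)]:
    for n in (25, 400):
        results[(label, n)] = simulate(frame, n)
        centre, spread = results[(label, n)]
        print(f"{label:15s} n={n:3d}  centre={centre:.3f}  spread={spread:.3f}")
```

In both frames the spread shrinks as n grows, but the estimates from the biased frame stay centred far from 0.6 no matter how large the sample is.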