The eternal tension in statistics...

Slides: 28

Provided by: rash83

Learn more at: https://courses.ischool.berkeley.edu

Transcript and Presenter's Notes

1
The eternal tension in statistics...
2
Between what you really really want (the
population) but can never get to...
3
So you have to make do (with the sample): you
can estimate the population, make educated
guesses,
4
but the bottom line is you can never have the
population
5
An investigator usually wants to generalize about
a class of individuals/things (the population).
For example, in forecasting the results of
elections, population = voters; for the
Furniture.com class group, population = all
potential users.
6

Parameters: Usually there are some numerical
facts about the population which you want to
estimate.
Statistic: You can do that by measuring the same
aspect in the sample (Descriptive Statistics).
Depending on the accuracy of measurement and the
representativeness of your sample, you can make
inferences about the population (Inferential
Statistics).

7
One person's sample is another person's
population
IS 271 students are a sample for the larger
student population of UC Berkeley.
IS 271 students could be the population for some
other study.

8
The 1936 election: the Literary Digest poll

Candidates: Democrat F. D. Roosevelt and Republican
Alfred Landon.
The Literary Digest had called the winner in
every election since 1916.
Its prediction: Roosevelt will get 43%.
It polled 2.4 million people!

9
The election results

Roosevelt's percentage:
The election result: 62%
The Digest's prediction: 43%
Gallup's prediction of the Digest's prediction: 44%
Gallup's prediction of the election result: 56%

10
Why the Digest went wrong: how they picked their
sample

Selection Bias: A systematic tendency on the part
of the sampling procedure to exclude one kind of
person or another from the sample.
Sample Size: When a selection procedure is
biased, making the sample larger does not help;
it just repeats the mistake on a larger scale.
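A minimal simulation sketch of the sample-size point. Only the 62% election result comes from the slides; the reach probabilities, the sample sizes, and the function names are made-up assumptions for illustration (the hypothetical procedure reaches supporters half as often as everyone else).

```python
import random

def biased_poll(n, true_share=0.62, reach_supporter=0.5, reach_other=1.0, seed=0):
    """Share of supporters among n respondents collected by a biased procedure.

    Assumption: supporters are reached with probability reach_supporter,
    everyone else with probability reach_other.
    """
    rng = random.Random(seed)
    supporters = 0
    collected = 0
    while collected < n:
        is_supporter = rng.random() < true_share
        reach = reach_supporter if is_supporter else reach_other
        if rng.random() < reach:  # this person actually ends up in the sample
            collected += 1
            supporters += is_supporter
    return supporters / n

def fair_poll(n, true_share=0.62, seed=0):
    """Share of supporters in a simple random sample of size n."""
    rng = random.Random(seed)
    return sum(rng.random() < true_share for _ in range(n)) / n

for n in (1_000, 10_000, 100_000):
    print(n, round(biased_poll(n), 3), round(fair_poll(n), 3))
```

Under these assumptions the biased estimates settle near 0.31 / 0.69, about 0.45, however large n gets, while the fair estimates close in on 0.62.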

11
How they picked their sample

Non-Response Bias: Non-respondents differ from
respondents: they did not respond, as compared to
respondents who did!
Lower-income and upper-income people tend not to
respond, so the middle class is over-represented.
Non-Response Bias: One can give more weight to
people who were available but hard to get.

12
For Example: Predicting Elections
Non-Voters: Gallup uses a few questions to
predict whether people will vote at all. The
election forecast is based only on those likely to
vote.
Undecided: Ask people who they are leaning
towards as of today.
Non-Response Bias: One can give more weight to
people who were available but hard to get.
Ratio Estimation: Look at the sample obtained and
compare it to the population. If there are too many
educated people, give them less weight.
Interviewer Bias: Build redundancy into the
questionnaire to check for consistency. Also
reinterview a small sample to check for
consistency.
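A small sketch of the ratio-estimation idea: weight each group by its share of the population instead of its share of the sample. The groups, population shares, and counts below are invented for illustration and do not come from the slides.

```python
# Hypothetical population shares (assumption, not from the slides).
population_share = {"college": 0.30, "no_college": 0.70}

# Hypothetical sample: group -> (respondents, respondents supporting the candidate).
# College-educated respondents are over-represented (60% of the sample).
sample = {
    "college": (600, 240),
    "no_college": (400, 260),
}

def raw_estimate(sample):
    """Unweighted share of supporters in the sample as collected."""
    total = sum(n for n, _ in sample.values())
    support = sum(s for _, s in sample.values())
    return support / total

def weighted_estimate(sample, population_share):
    """Weight each group's support rate by its population share,
    so an over-represented group counts for less."""
    return sum(population_share[g] * s / n for g, (n, s) in sample.items())

print(raw_estimate(sample))
print(weighted_estimate(sample, population_share))
```

With these made-up numbers the raw estimate is 0.50, while the weighted estimate is 0.30 x 0.40 + 0.70 x 0.65 = 0.575.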

13
Distribution of brown M&Ms
Yellow 20%
Brown 30%
Orange 10%
Blue 10%
Red 20%
Green 10%
14
The distribution of the population

15
Sample 1
16
Sample 2
17
Sample 3
18
[Figure: distributions of the Population, Sample 1, Sample 2, Sample 3, and 5 Samples]
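A rough sketch of the sampling exercise on these slides: draw a few random samples from the color distribution and compare each with the population. The color percentages are from the slide; the sample size of 50, the seed, and the function name are assumptions.

```python
import random
from collections import Counter

# Color percentages from the slide.
POPULATION = {"Yellow": 20, "Brown": 30, "Orange": 10, "Blue": 10, "Red": 20, "Green": 10}

def draw_sample(size, rng):
    """Draw `size` candies at random and return the percentage of each color."""
    colors = list(POPULATION)
    weights = list(POPULATION.values())
    counts = Counter(rng.choices(colors, weights=weights, k=size))
    return {c: 100 * counts[c] / size for c in colors}

rng = random.Random(271)  # arbitrary seed, so the run is repeatable
samples = [draw_sample(50, rng) for _ in range(3)]  # 50 is an assumed sample size

print("Population:", POPULATION)
for i, s in enumerate(samples, start=1):
    print(f"Sample {i}:", s)
```

Each sample resembles the population but none matches it exactly; the gap is the chance error.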
19
How much is each sample going to deviate from the
population? (How big is the chance error for
each sample likely to be?)
Computation of Standard Error = √(number of
samples) × SD of sample
9, 7, 6, 9, 11, 12
Mean = 9, Standard Deviation ≈ 2.2, Standard Error
≈ √6 × 2.2 ≈ 5.4
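A short sketch that works through the arithmetic for the six numbers on this slide, reading the formula as the square root of the number of samples times the SD. Two details are assumptions: whether the SD divides by n or by n - 1 (the slide's rounded 2.2 sits between the two results), and the standard error of the average, which is included only for comparison.

```python
import math
import statistics

values = [9, 7, 6, 9, 11, 12]  # the numbers on the slide

mean = statistics.mean(values)
sd_population = statistics.pstdev(values)  # divides by n
sd_sample = statistics.stdev(values)       # divides by n - 1

n = len(values)
se_sum = math.sqrt(n) * sd_sample   # sqrt(n) x SD, as on the slide
se_mean = sd_sample / math.sqrt(n)  # SE of the average, for comparison

print("mean:", mean)
print("SD (divide by n):", round(sd_population, 2))
print("SD (divide by n - 1):", round(sd_sample, 2))
print("sqrt(n) x SD:", round(se_sum, 2))
print("SD / sqrt(n):", round(se_mean, 2))
```

The sqrt(n) x SD form is the standard error for the sum of n draws. Dividing it by n gives SD / sqrt(n), the standard error for their average.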
20
Why is knowing the chance error important?

It allows us to estimate the accuracy of our
estimates and whether we are justified in using
inferential statistics.
It allows us to make inferences about the population.

21
If there is a lot of spread in the samples, the
SD is big and it will be hard to predict how
accurate the sample will be. So the standard
error will be big as well.
Standard Deviation (SD) and Standard Error (SE):
SD refers to a list of numbers. How far are most
numbers from the mean?
SE refers to the variability in samples. How
variable is each sample going to be?
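One way to see the difference between SD and SE is to simulate it. The sketch below uses a 0-1 list in which 30% of the entries are 1 (the brown share from slide 13). The sample size of 100, the 2000 repetitions, and the seed are assumptions.

```python
import math
import random
import statistics

rng = random.Random(0)      # arbitrary seed
box = [1] * 30 + [0] * 70   # 1 = brown candy (30%, from slide 13)
draws = 100                 # assumed sample size
repeats = 2000              # assumed number of repeated samples

# SD: the spread of the list of numbers itself.
sd_box = statistics.pstdev(box)

# SE predicted by the formula sqrt(number of draws) x SD.
se_formula = math.sqrt(draws) * sd_box

# SE observed: how much the count of brown candies varies from sample to sample.
counts = [sum(rng.choices(box, k=draws)) for _ in range(repeats)]
se_observed = statistics.pstdev(counts)

print(round(sd_box, 3), round(se_formula, 2), round(se_observed, 2))
```

The SD describes one list; the SE describes how a sample total varies across many samples, and the simulated spread should land close to the formula's value.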
22
Should the sample for Texas be larger than that
for Rhode Island?
23
Surprisingly, no.
Analogy: Suppose you took a drop of liquid for
analysis. If the liquid is well mixed, it
would not matter whether the drop came from a small
or a large bottle, or whether the sample is 1% or
0.1% of the population.
The statistical rationale: The accuracy of
sampling is related to the standard deviation of
the sample rather than to the size of the
population. Example: In the election of 1992,
voters who chose Clinton were 46% of voters in New
Mexico (SD = 0.50) and 37% of voters in Texas
(SD = 0.48). Therefore the accuracy of a sample in
Texas and in New Mexico will be similar.
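The SDs quoted on this slide can be checked with the formula for a 0-1 population, SD = sqrt(p x (1 - p)). The sketch below also computes the standard error of a sample percentage. The sample size of 2500 is an assumption, and the formula assumes each state's population is much larger than the sample, so population size does not enter at all.

```python
import math

def sd_zero_one(p):
    """SD of a 0-1 population in which a fraction p are 1."""
    return math.sqrt(p * (1 - p))

def se_percent(p, n):
    """SE of a sample percentage from n draws.

    Assumes the population is much larger than n, so no
    population-size term appears.
    """
    return 100 * sd_zero_one(p) / math.sqrt(n)

n = 2500  # assumed sample size, the same in both states
for state, p in (("New Mexico", 0.46), ("Texas", 0.37)):
    print(state, round(sd_zero_one(p), 2), round(se_percent(p, n), 2))
```

Both states come out with a standard error of about one percentage point, because the SDs are nearly equal and the sample size is the same.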
24
Types of Samples

The convenience sample: The more convenient
elementary units are chosen from a population.
The judgement sample: Units are chosen according
to a judgement made by someone who is familiar with
the relevant characteristics of the population.
The random sample: Units are chosen randomly with
a known probability.

25
Quota Sampling: Each interviewer is assigned a
fixed quota of subjects fitting certain
demographic characteristics. Within the quota, the
choice of subjects is a judgement sample.
Problems: quotas might not be representative, and
judgement sampling lets the interviewer's choices
bias the sample.

26
Types of Random Sample

Simple random sample: Every unit of the
population has an equal chance of being chosen.
Systematic random sample: One unit is chosen on
a random basis, and additional elementary units are
taken at evenly spaced intervals until the
desired number of units is obtained.
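A minimal sketch of these two sampling schemes, assuming a hypothetical population of 100 numbered units. The function names and the fixed-step rule for the systematic sample are illustrative choices.

```python
import random

def simple_random_sample(units, k, rng):
    """Every unit has the same chance of being chosen (drawn without replacement)."""
    return rng.sample(units, k)

def systematic_sample(units, k, rng):
    """Pick a random starting unit, then take every step-th unit after it.

    Assumes k <= len(units).
    """
    step = len(units) // k
    start = rng.randrange(step)
    return [units[start + i * step] for i in range(k)]

rng = random.Random(1)       # arbitrary seed
units = list(range(1, 101))  # hypothetical population of 100 numbered units

print(simple_random_sample(units, 10, rng))
print(systematic_sample(units, 10, rng))
```

In the systematic sample only the starting point is random; every later unit is fixed by the spacing.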

27
The stratified random sample: Obtained by
independently selecting a separate simple random
sample from each population stratum. A population
can be divided into different groups based on
some characteristic or variable like income or
education.
The cluster sample: Obtained by selecting
clusters from the population on the basis of
simple random sampling. The sample comprises a
census of each random cluster selected. For
example, a cluster may be something like a
village, a school, or a state.
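A minimal sketch of stratified and cluster sampling. The strata (income groups), the clusters (schools), and all sizes are hypothetical.

```python
import random

def stratified_sample(strata, k_per_stratum, rng):
    """Draw a separate simple random sample from each stratum."""
    return {name: rng.sample(members, k_per_stratum) for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Choose whole clusters at random, then take every unit in them (a census)."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(3)  # arbitrary seed

# Hypothetical population of 100 people, grouped two different ways.
strata = {
    "low": list(range(0, 40)),
    "middle": list(range(40, 90)),
    "high": list(range(90, 100)),
}
clusters = {f"school_{i}": list(range(i * 20, (i + 1) * 20)) for i in range(5)}

print(stratified_sample(strata, 5, rng))
print(cluster_sample(clusters, 2, rng))
```

The stratified sample guarantees every stratum is represented. The cluster sample includes everyone from a few clusters and nobody from the rest.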