The eternal tension in statistics... - PowerPoint PPT Presentation

Slides: 28
Provided by: rash83

Transcript and Presenter's Notes

1
The eternal tension in statistics...
2
Between what you really really want (the
population) but can never get to...
3
So you have to make do (with the sample): you
can estimate the population, make educated
guesses,
4
but the bottom line is you can never have the
population
5
An investigator usually wants to generalize about
a class of individuals/things (the population).
For example, in forecasting the results of
elections, population = voters; for the
Furniture.com class group, population = all
potential users.
6
  • Parameters: Usually there are some numerical
    facts about the population which you want to
    estimate.
  • Statistic: You can do that by measuring the same
    aspect in the sample (Descriptive Statistics).
  • Depending on the accuracy of measurement and the
    representativeness of your sample, you can make
    inferences about the population (Inferential
    Statistics).

7
  • One person's sample is another person's
    population.
  • IS 271 students are a sample of the larger
    student population of UC Berkeley.
  • IS 271 students could be the population for some
    other study.

8
The 1936 election: the Literary Digest poll
  • Candidates: Democrat F. D. Roosevelt and
    Republican Alfred Landon
  • The Literary Digest had called the winner in
    every election since 1916
  • Its prediction: Roosevelt will get 43%
  • It polled 2.4 million people!

9
The election results (Roosevelt's percentage)
  • The election result: 62%
  • The Digest's prediction: 43%
  • Gallup's prediction of the Digest's prediction:
    44%
  • Gallup's prediction of the election result: 56%

10
Why the Digest went wrong: how they picked their
sample
  • Selection bias: A systematic tendency on the part
    of the sampling procedure to exclude one kind of
    person or another from the sample.
  • Sample size: When a selection procedure is
    biased, making the sample larger does not help;
    it just repeats the mistake on a larger scale.
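The point about sample size can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the true share of 62% is taken from the election result on the slides, but the "reach" probabilities (how likely a supporter or an opponent is to end up in the sample) are hypothetical and are not the Digest's actual figures.

```python
import random

def simulate_poll(n, true_share, reach_supporter, reach_opponent, seed):
    """Share of supporters among n respondents.

    A person is a supporter with probability true_share and enters the
    sample with probability reach_supporter (supporters) or
    reach_opponent (opponents). Unequal reach means selection bias.
    """
    rng = random.Random(seed)
    supporters = 0
    sampled = 0
    while sampled < n:
        is_supporter = rng.random() < true_share
        reach = reach_supporter if is_supporter else reach_opponent
        if rng.random() < reach:
            sampled += 1
            supporters += int(is_supporter)
    return supporters / n

TRUE_SHARE = 0.62  # election result from the slides

# Small sample, everyone equally likely to be included.
small_fair = simulate_poll(1_000, TRUE_SHARE, 1.0, 1.0, seed=1)
# Large sample, but supporters are much harder to reach (hypothetical 0.4).
big_biased = simulate_poll(50_000, TRUE_SHARE, 0.4, 1.0, seed=1)

print(f"small unbiased sample: {small_fair:.3f}")
print(f"large biased sample:   {big_biased:.3f}")
```

Under these assumptions the large biased sample settles near 39% no matter how many people are added, while the small unbiased sample lands close to 62%.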

11
How they picked their sample
  • Non-response bias: Non-respondents differ from
    respondents.
  • For a start, they did not respond, whereas the
    respondents did!
  • Lower-income and upper-income people tend not to
    respond, so the middle class is over-represented.
  • Correcting for non-response bias: One can give
    more weight to people who were available but
    hard to reach.

12
  • For example: predicting elections
  • Non-voters: Gallup uses a few questions to
    predict whether people will vote at all. The
    election forecast is based only on those likely
    to vote.
  • Undecided: Ask people who they are leaning
    towards as of today.
  • Non-response bias: One can give more weight to
    people who were available but hard to reach.
  • Ratio estimation: Look at the sample obtained
    and compare it to the population. If there are
    too many educated people, give them less weight.
  • Interviewer bias: Build redundancy into the
    questionnaire to check for consistency. Also
    reinterview a small sample to check for
    consistency.
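Ratio estimation can be sketched as reweighting each group to its known share of the population. The function name `weighted_share`, the two education groups, and all the numbers below are hypothetical, chosen only to show the arithmetic; the sketch also assumes every population group appears at least once in the sample.

```python
def weighted_share(sample, population_shares):
    """Estimate overall support after reweighting groups.

    sample: list of (group, supports) pairs, supports being True/False.
    population_shares: dict mapping group -> share of the population.
    """
    counts = {}
    yes = {}
    for group, supports in sample:
        counts[group] = counts.get(group, 0) + 1
        yes[group] = yes.get(group, 0) + (1 if supports else 0)
    # Sum over groups: population share x support rate within the group.
    return sum(population_shares[g] * yes[g] / counts[g] for g in counts)

# Hypothetical sample: 60 college-educated (30 in favour) and
# 40 non-college (30 in favour), so the educated are over-represented.
sample = (
    [("college", True)] * 30 + [("college", False)] * 30
    + [("non_college", True)] * 30 + [("non_college", False)] * 10
)
population_shares = {"college": 0.30, "non_college": 0.70}  # assumed known

raw = sum(1 for _, s in sample if s) / len(sample)
adjusted = weighted_share(sample, population_shares)
print(f"raw: {raw:.3f}, weighted: {adjusted:.3f}")
```

Here the raw sample says 60% in favour, while weighting the educated group down to its assumed 30% population share moves the estimate to 67.5%.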

13
Distribution of brown M&Ms (percent)
Yellow 20%
Brown 30%
Orange 10%
Blue 10%
Red 20%
Green 10%
14
The distribution of the population


15
Sample 1
16
Sample 2
17
Sample 3
18
[Figure: panels labeled Population, Sample 1, Sample 2, Sample 3, and 5 Samples]
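The idea behind these sample slides, that each sample resembles the population without matching it exactly, can be tried out in a few lines. This is a rough sketch: the color percentages are the ones listed on the M&M slide, while the sample size of 50 and the helper names `draw_sample` and `largest_gap` are assumptions made for illustration.

```python
import random
from collections import Counter

# Color percentages from the M&M slide.
POPULATION = {"Yellow": 20, "Brown": 30, "Orange": 10,
              "Blue": 10, "Red": 20, "Green": 10}

def draw_sample(n, rng):
    """Draw n candies at random; return the percentage of each color."""
    colors = list(POPULATION)
    weights = [POPULATION[c] for c in colors]
    picks = rng.choices(colors, weights=weights, k=n)
    counts = Counter(picks)
    return {c: 100 * counts[c] / n for c in colors}

def largest_gap(sample_pct):
    """Biggest difference (in percentage points) from the population."""
    return max(abs(sample_pct[c] - POPULATION[c]) for c in POPULATION)

rng = random.Random(1)
for i in range(1, 4):
    pct = draw_sample(50, rng)  # 50 candies per sample is an assumption
    print(f"Sample {i}: {pct}  largest gap: {largest_gap(pct):.1f}")
```

Each run of `draw_sample` gives slightly different percentages; the gap from the population is the chance error, and it tends to shrink as the sample gets larger.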
19
How much is each sample going to deviate from the
population? (How big is the chance error for
each sample likely to be?)
Computation of Standard Error: SE = √(number of
samples) × SD of sample
Example: 9, 7, 6, 9, 11, 12
Mean = 9, Standard Deviation ≈ 2.1, Standard Error
= √6 × 2.1 ≈ 5.1
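The computation on this slide can be checked with a short script. It reads the formula as SE = √(number of samples) × SD and assumes the SD is computed by dividing by n (the population convention); dividing by n − 1 instead would give a slightly larger SD. With the divide-by-n convention the SD of the six numbers comes to about 2.08 and the SE to about 5.1.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Standard deviation, dividing by n (an assumed convention)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def standard_error(values):
    """SE as read from the slide: sqrt(number of values) x SD."""
    return math.sqrt(len(values)) * sd(values)

data = [9, 7, 6, 9, 11, 12]  # the numbers from the slide
print("mean:", mean(data))
print("SD:  ", round(sd(data), 2))
print("SE:  ", round(standard_error(data), 2))
```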
20
Why is knowing the chance error important?
  • Allows us to estimate the accuracy of our
    estimates and whether we are justified in using
    inferential statistics.
  • Allows us to make inferences about the
    population.

21
If there is a lot of spread in the samples, the
SD is big and it will be hard to predict how
accurate the sample will be. So the standard
error will be big as well.
Standard Deviation (SD) and Standard Error (SE):
SD refers to a list of numbers. How far are most
numbers from the mean?
SE refers to the variability in samples. How
variable is each sample going to be?
22
Should the sample for Texas be larger than that
for Rhode Island?
23
Surprisingly, no.
Analogy: Suppose you took a drop of liquid for
analysis. If the liquid is well mixed, then it
would not matter if the drop came from a small
or a large bottle, or whether the sample is 1% or
0.1% of the population.
The statistical rationale: The accuracy of
sampling is related to the standard deviation of
the sample (and to the sample size), not to the
size of the population. Example: Election of
1992, voters who chose Clinton: 46% of voters in
New Mexico, SD = 0.50; 37% of voters in Texas,
SD = 0.48. Therefore the accuracy of a sample in
Texas and in New Mexico will be similar.
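The SD figures in this example follow from treating each voter as a 1 (chose Clinton) or a 0 (did not), which gives SD = √(p(1 − p)). The sketch below reproduces them and then computes the standard error of a sample percentage as 100 × SD / √n. The sample size of 2,500 is a hypothetical choice, and the formula assumes the sample is small relative to the population, which is why population size does not appear in it.

```python
import math

def sd_zero_one(p):
    """SD of a list of 0s and 1s in which a fraction p are 1s."""
    return math.sqrt(p * (1 - p))

def se_percent(p, n):
    """Standard error, in percentage points, of a sample percentage
    based on n draws. Population size does not enter the formula."""
    return 100 * sd_zero_one(p) / math.sqrt(n)

n = 2500  # hypothetical sample size, the same in both states
for state, p in [("New Mexico", 0.46), ("Texas", 0.37)]:
    print(f"{state}: SD = {sd_zero_one(p):.2f}, "
          f"SE = {se_percent(p, n):.2f} percentage points")
```

With the same sample size, both states come out with a standard error of roughly one percentage point, even though Texas has far more voters.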
24
Types of Samples
  • The convenience sample: The more convenient
    elementary units are chosen from a population.
  • The judgement sample: Units are chosen according
    to a judgement made by someone who is familiar
    with the relevant characteristics of the
    population.
  • The random sample: Units are chosen randomly with
    a known probability.

25
  • Quota sampling: Each interviewer is assigned a
    fixed quota of subjects fitting certain
    demographic characteristics. Within the quota,
    the selection is a judgement sample.
  • Problems: Quotas might not be representative, and
    judgement sampling lets the interviewer's choices
    introduce bias.

26
Types of Random Sample
  • Simple random sample: Every unit of the
    population has an equal chance of being chosen.
  • A systematic random sample: One unit is chosen on
    a random basis; additional elementary units are
    taken at evenly spaced intervals until the
    desired number of units is obtained.
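The two procedures above can be sketched as follows. The function names and the numbered list of 100 units are made up for illustration, and the systematic version assumes the requested sample size is no larger than the population.

```python
import random

def simple_random_sample(units, n, rng):
    """Every unit has an equal chance; drawn without replacement."""
    return rng.sample(units, n)

def systematic_sample(units, n, rng):
    """Pick a random starting unit, then take every step-th unit."""
    step = len(units) // n          # spacing between chosen units
    start = rng.randrange(step)     # random start within the first interval
    return [units[start + i * step] for i in range(n)]

rng = random.Random(3)
units = list(range(100))            # hypothetical population of 100 units
print("simple random:", sorted(simple_random_sample(units, 10, rng)))
print("systematic:   ", systematic_sample(units, 10, rng))
```

The systematic sample is random only in its starting point; once that is fixed, the remaining units are determined by the spacing.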

27
  • The stratified random sample: Obtained by
    independently selecting a separate simple random
    sample from each population stratum. A population
    can be divided into different groups based on
    some characteristic or variable like income or
    education.
  • The cluster sample: Obtained by selecting
    clusters from the population on the basis of
    simple random sampling. The sample comprises a
    census of each random cluster selected. For
    example, a cluster may be something like a
    village, a school, or a state.
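The difference between the two designs can be sketched in code: a stratified sample draws a few units from every stratum, while a cluster sample draws whole clusters and keeps every unit inside them. The function names, strata, and clusters below are hypothetical examples, not data from the presentation.

```python
import random

def stratified_sample(strata, n_per_stratum, rng):
    """Independent simple random sample from each stratum.

    strata: dict mapping stratum name -> list of units.
    """
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters, then take every unit in them.

    clusters: dict mapping cluster name -> list of units.
    """
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(5)
# Hypothetical strata by income and clusters by place.
strata = {"low_income": list(range(0, 50)),
          "high_income": list(range(50, 80))}
clusters = {"village_a": [1, 2, 3], "village_b": [4, 5],
            "school_c": [6, 7, 8, 9]}

print(stratified_sample(strata, 5, rng))
print(cluster_sample(clusters, 2, rng))
```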