Title: 13.1 - 13.3 Populations, Surveys and Random Sampling
1 13.1 - 13.3 Populations, Surveys and Random
Sampling
- Kent: "Mr. Simpson, how do you respond to the
charges that petty vandalism such as graffiti is
down eighty percent, while heavy sack-beatings
are up a shocking 900%?"
- Homer: "Aw, people can come up with statistics
to prove anything, Kent. Forty percent of all
people know that."
- The Simpsons, "Homer the Vigilante"
2 So what is statistics, anyway?
- The gathering, organizing, interpreting and
understanding of data.
3 The Population
- The complete set of individuals or objects
about which we are seeking information is
referred to as the population.
- A silly example: if Mayor Quimby takes a poll to
find how many Springfielders plan to vote for him
in the next election, then the population would be
the voters of Springfield.
4 The N-value
- If it were possible to accurately count every
member of a population, we would get a number, N,
called the N-value of the population.
5 The N-value (A Point or Two)
- This value is often difficult to determine and
can therefore require various adjustments. For
instance, if a scientist were studying the effect
of genetically modified corn on the monarch
butterfly, it would be practically impossible to
accurately calculate the N-value.
6 The N-value (A Point or Two)
- This value can change with time.
- In the case of the monarch butterfly, the
number of actual insects will obviously be
different from year to year.
7 Example: U.S. Census
10 Surveys
- Since collecting information from large
populations is so difficult, researchers instead
gather data from selected subgroups and use that
information to make inferences regarding the
population as a whole.
- This process is what math-and-science-y types
refer to as a survey.
- The selected subgroup is called a sample.
11 Surveys (cont'd)
- There are two major issues when setting up a
survey:
1. You need a sample that is representative of
the population being studied.
2. You need a sample that is large
enough to draw accurate information from, yet
still small enough to be practical.
12 Example: Literary Digest and the 1936
Presidential Election
- Literary Digest was a popular magazine that had
accurately predicted the winner in the five
elections prior to 1936.
- That year, the publication ambitiously decided
to poll 10 million Americans.
- The individuals contacted came from magazine
subscription lists and telephone directory
listings.
13 Example: Literary Digest and the 1936
Presidential Election
- When the results came in, 2.4 million people had
responded, and the survey predicted that the vote
would end Landon 57%, FDR 43%.
- What actually happened?
14
- The actual results were FDR 61%, Landon
36.5%, Other 2.5%. (Landon, in fact, did not even
carry his home state.)
15
- George Gallup, however, made an accurate
prediction with a sample of only 50,000 people.
- Why were his results superior?
16
- There are two main reasons the much larger
Literary Digest poll fared worse:
1. The names were taken from phone directories
and subscription lists, so the people surveyed
were disproportionately wealthy. When a survey
has an inherent tendency to exclude a segment of
the population being studied, it is said to have
selection bias.
2. Out of 10 million people contacted, only 24%
replied. This example of what is called
nonresponse bias only magnified the first
problem.
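The figures quoted for the 1936 poll can be checked with a few lines of arithmetic. The sketch below uses only numbers given on the slides (10 million contacted, 2.4 million responses, Landon predicted at 57% against an actual 36.5%); the variable names are illustrative.

```python
# Figures quoted on the slides for the 1936 Literary Digest poll.
contacted = 10_000_000
responded = 2_400_000

# Share of the people contacted who actually replied.
response_rate = responded / contacted

# Predicted vs. actual vote share for Landon, in percent.
landon_predicted = 57.0
landon_actual = 36.5
error_points = landon_predicted - landon_actual

print(f"Response rate: {response_rate:.0%}")
print(f"Landon overestimated by {error_points:.1f} percentage points")
```

Roughly three out of four people contacted never answered, and the prediction missed Landon's share by about 20 percentage points.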
18
- The actual result of the 1948 presidential
election was. . . Truman 49.9%, Dewey 44.5%,
Others 5%.
- So, what went wrong this time?
19
- There are too many characteristics you could use
for your quota.
- The methods used in 1948 did not take economic
status into account and oversampled Republican
voters.
- Most pollsters stopped gathering data when
Dewey was coming in 13 percentage points ahead of
Truman in some of the surveys.
20 Lessons to take from these occurrences. . .
- A small, well-chosen sample is better than a
poorly chosen large one.
- Selection bias and nonresponse bias need to be
taken into account.
- Don't stop surveying early.
- Quota sampling is flawed.
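The first lesson can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the electorate, the 20% "wealthy" share, and the support rates below are hypothetical numbers chosen for illustration, not data from either election.

```python
import random

def build_population():
    """Hypothetical electorate as (is_wealthy, votes_for_a) pairs."""
    population = []
    # 20,000 wealthy voters, 30% of whom support candidate A.
    population += [(True, True)] * 6_000 + [(True, False)] * 14_000
    # 80,000 other voters, 65% of whom support candidate A.
    population += [(False, True)] * 52_000 + [(False, False)] * 28_000
    return population

def support(voters):
    """Fraction of the given voters who support candidate A."""
    return sum(1 for _, votes_for_a in voters if votes_for_a) / len(voters)

rng = random.Random(1936)  # fixed seed so the run is repeatable
population = build_population()
true_support = support(population)

# Large sample with selection bias: only wealthy voters are reached.
biased_sample = [voter for voter in population if voter[0]]

# Small simple random sample drawn from the whole population.
random_sample = rng.sample(population, 1_000)

biased_error = abs(support(biased_sample) - true_support)
random_error = abs(support(random_sample) - true_support)
print(f"true support: {true_support:.3f}")
print(f"biased sample of {len(biased_sample)}: off by {biased_error:.3f}")
print(f"random sample of {len(random_sample)}: off by {random_error:.3f}")
```

In this toy setup the biased sample is twenty times larger than the random one, yet it lands much further from the true figure, because adding more wealthy voters does nothing to correct for the voters who were never reachable.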
21 Random Sampling
- Random sampling: methods in which an element of
chance is used to choose a sample.
- Simple random sampling: a larger-scale version
of picking names out of a hat.
- The problem with simple random sampling is one
of practicality.
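As a minimal sketch of "picking names out of a hat", Python's standard `random.sample` draws distinct members so that every subset of the requested size is equally likely. The list of names and the helper `simple_random_sample` are hypothetical, introduced here only for illustration.

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Draw sample_size distinct members; every subset is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, sample_size)

# Hypothetical "hat" holding 500 names.
names = [f"voter_{i}" for i in range(500)]
sample = simple_random_sample(names, 25, seed=42)
print(sample[:5])
```

Note that the function needs a complete list of the population before it can draw anything, which is one way to see the practicality problem: for a national electorate, assembling that list is the hard part.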
22 Random Sampling
- The solution, used in modern opinion
polling, is stratified sampling.
- This method breaks the population down into
strata (categories) and then randomly chooses a
sample from the strata.
- The strata are then divided into substrata, and
the process is continued.
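A one-level version of the idea can be sketched as follows. The slides do not say how many members to take from each stratum, so this sketch assumes proportional allocation; the helper `stratified_sample`, the regions, and the population are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, sample_size, seed=None):
    """Randomly sample each stratum in proportion to its size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)

    sample = []
    for members in strata.values():
        # Proportional allocation (an assumption): a stratum holding 30%
        # of the population receives about 30% of the sample.
        k = round(sample_size * len(members) / len(population))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population of (voter_id, region) pairs.
population = (
    [(i, "north") for i in range(600)]
    + [(i, "south") for i in range(600, 900)]
    + [(i, "west") for i in range(900, 1000)]
)
sample = stratified_sample(population, lambda m: m[1], 100, seed=7)

counts = defaultdict(int)
for _, region in sample:
    counts[region] += 1
print(dict(counts))
```

With these sizes the sample contains 60 northern, 30 southern, and 10 western voters. In general the rounding can make the total differ slightly from `sample_size`, and repeating the same split inside each stratum would give the substrata step the slide describes.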