Title: Section 13.1: Sampling Techniques
Section 13.1 Sampling Techniques
- Dr. Fred Butler
- Math 121 Fall 2004
Definition of Statistics
- Statistics is the science of gathering, analyzing, and making predictions from numerical information obtained in an experiment.
- Descriptive statistics is concerned with the collection, organization, and analysis of data.
- Inferential statistics is concerned with making predictions based on data.
More Statistics Definitions
- The entire group you are studying using statistics is called the population.
- The subset selected for statistical study is called the sample.
- For example, in a telephone poll about which candidate people will vote for in an election, the entire electorate is the population, while the people who are called and polled are the sample.
An Example
- Suppose you are given a box containing 100 marbles: 90 blue marbles and 10 red marbles.
- Let us consider different ways to draw conclusions about the population, which in this example is the entire contents of the box (all 100 marbles).
Probability vs. Statistics
- In probability, we would count the marbles and see that there are 100 in total, 10 red and 90 blue, so the probability of picking a red marble is 10/100 = 1/10.
- In statistics, we would select a sample of marbles from the box and use that sample to predict the probability of selecting a red marble.
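A minimal Python sketch of the contrast, assuming the 90 blue / 10 red box above: the exact probability comes from counting the whole box, while the statistical estimate comes from a random sample. The function name `estimate_red` and the seed values are illustrative choices, not part of the lecture.

```python
import random
from fractions import Fraction

# The box from the example: 90 blue and 10 red marbles
population = ["blue"] * 90 + ["red"] * 10

# Probability: count every marble in the box
p_red = Fraction(population.count("red"), len(population))  # reduces to 1/10

def estimate_red(sample_size, seed=None):
    """Statistics: estimate P(red) from a random sample drawn without replacement."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    return sample.count("red") / sample_size

print("exact probability:", p_red)
print("estimate from a sample of 20:", estimate_red(20, seed=1))
```

Different seeds (that is, different samples) give different estimates; only a "sample" of all 100 marbles is guaranteed to return exactly 0.1.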
Probability vs. Statistics (contd.)
- There is always the possibility that predictions based on a sample will be incorrect.
- For example, we could randomly select 5 marbles, get all blue marbles, and incorrectly predict that there are only blue marbles in the box.
- If we selected a larger sample (say 15 marbles), we would likely choose some red marbles, and in this case we could probably make a more accurate prediction based on the sample selected.
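The risk described above can be quantified. Assuming the marbles are drawn at random without replacement, the chance that a sample of n marbles contains no red marble is the number of all-blue samples divided by the number of possible samples, C(90, n) / C(100, n). A short Python sketch (the function name `prob_all_blue` is hypothetical):

```python
from math import comb

def prob_all_blue(sample_size, blue=90, total=100):
    """Probability that a random sample (without replacement) contains only blue marbles."""
    # all-blue samples / all possible samples of this size
    return comb(blue, sample_size) / comb(total, sample_size)

for n in (5, 15):
    print(f"sample of {n}: P(all blue) = {prob_all_blue(n):.3f}")
```

This gives about 0.584 for 5 marbles and about 0.181 for 15, so a sample of 5 is actually more likely than not to miss the red marbles entirely, while a sample of 15 contains at least one red marble roughly 82% of the time.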
Why Sample?
- It is often impossible to obtain data on an entire population.
- Sampling is less expensive and takes less time and effort.
Unbiased Samples
- An unbiased sample is one that is a small replica of the entire population with regard to income, education, gender, race, political affiliation, age, etc.
- Statisticians use sophisticated techniques to obtain an unbiased sample.
- Unbiased samples allow accurate predictions from a relatively small fraction of the entire population.
Random Sampling
- If a sample is drawn in such a way that each item in the population has an equal chance of being selected, the sample is said to be a random sample.
- This technique is used when all items in the population are similar with regard to the specific characteristics we are interested in studying.
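A minimal sketch of simple random sampling in Python. `random.Random.sample` draws without replacement, so every item has the same chance of being selected. The class list and the function name `simple_random_sample` are made up for illustration.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k distinct items; each item has the same chance of being selected."""
    rng = random.Random(seed)  # the seed only makes the draw repeatable
    return rng.sample(population, k)

# Hypothetical class list of 40 students
students = [f"student{i:02d}" for i in range(1, 41)]
print(simple_random_sample(students, 10, seed=42))
```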
Systematic Sampling
- When a sample is obtained by drawing every nth item on a list or production line, the sample is a systematic sample.
- It is important that every item from the population is included on the list used.
- Also beware of a characteristic that recurs at regular intervals in the list (the Robot X example): if it repeats every n items, a systematic sample can pick up only that characteristic or miss it entirely.
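A sketch of systematic sampling, assuming the common practice of choosing a random starting point among the first n items and then taking every nth item after it. The production line below is hypothetical (it is not the Robot X example itself): ten machines take turns, so a sampling interval of 10 lands on the same machine every time, which is the recurring-characteristic problem described above.

```python
import random

def systematic_sample(items, n, seed=None):
    """Pick a random start among the first n items, then take every nth item."""
    start = random.Random(seed).randrange(n)
    return items[start::n]

# Hypothetical line of 100 items built by 10 machines in rotation:
# item i comes from machine i % 10.
line = [(i, f"machine{i % 10}") for i in range(100)]

sample = systematic_sample(line, 10, seed=7)
machines = {machine for _, machine in sample}
print(len(sample), "items sampled from", machines)  # one machine only
```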
Cluster Sampling
- Cluster sampling is a sampling technique in which we divide a geographic area into sections or clusters, and then randomly select sections or clusters.
- Either all members of each selected cluster are included in the sample, or a random sample of the members of each selected cluster is used.
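The two variants on this slide can be sketched in Python as follows. The sections, residents, and the function name `cluster_sample` are hypothetical; `per_cluster=None` keeps every member of a chosen cluster, while a number draws that many members at random from each chosen cluster.

```python
import random

def cluster_sample(clusters, num_clusters, per_cluster=None, seed=None):
    """Randomly choose whole clusters, then keep all or a random subset of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)  # random clusters, not random people
    sample = []
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            sample.extend(members)  # everyone in the chosen cluster
        else:
            sample.extend(rng.sample(members, per_cluster))  # random members of it
    return chosen, sample

# Hypothetical area split into 8 sections with 20 residents each
sections = {f"section{s}": [f"s{s}-resident{r}" for r in range(20)] for s in range(8)}

chosen, everyone = cluster_sample(sections, 3, seed=1)
chosen2, some = cluster_sample(sections, 3, per_cluster=5, seed=1)
print(chosen, len(everyone), len(some))
```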
Stratified Sampling
- Stratified sampling involves dividing the population into strata by characteristics called stratifying factors (such as gender, race, religion, or income), and taking random samples from each stratum or class.
- The use of stratified sampling requires some knowledge about the population.
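A minimal sketch of stratified sampling. It needs a way to tell which stratum each item belongs to (here a hypothetical `stratum_of` function), which is the "knowledge about the population" mentioned above. The student records and majors are made up.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Split the population into strata, then draw a random sample from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}

# Hypothetical records: (name, major), 30 students per major
majors = ["Math", "Biology", "History"]
students = [(f"student{i}", majors[i % 3]) for i in range(90)]

by_major = stratified_sample(students, lambda s: s[1], per_stratum=5, seed=3)
for major, picked in by_major.items():
    print(major, [name for name, _ in picked])
```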
Convenience Sampling
- A convenience sample uses data that are easily or readily obtained.
- One must be very careful with this sampling technique, because convenience sampling can be extremely biased.
Determining Sampling Methods
- On the next several slides we will consider
specific sampling scenarios, and we will try to
determine what sampling method is being used in
each of these scenarios.
Class Question 5.11
- I look at the Math 121 class list and select every tenth student on the list to take a survey about the course.
- 1. random 2. systematic 3. cluster
- 4. stratified 5. convenience
Answer to Class Question 5.11
- This is an example of systematic sampling, because it involves choosing every nth item (in our case, n = 10).
Class Question 5.12
- Students at WVU are classified by their major, and a random sample of 25 students from each major is selected.
- 1. random 2. systematic 3. cluster
- 4. stratified 5. convenience
Answer to Class Question 5.12
- This is an example of stratified sampling.
- Students in this example are divided into strata
based on their majors, and a random sample from
each of the strata is chosen.
Class Question 5.13
- I ask the students sitting in the front row what they think of this course.
- 1. random 2. systematic 3. cluster
- 4. stratified 5. convenience
Answer to Class Question 5.13
- This is an example of a convenience sample, because I am polling the people who are easiest to reach.
- One possible bias built into this survey is that students sitting in the front row might pay closer attention, and might think more highly of the course than other students.
Class Question 5.14
- I write each Math 121 student's name on a separate slip of paper, put the slips into a box, and draw ten names; those students complete a survey about the course.
- 1. random 2. systematic 3. cluster
- 4. stratified 5. convenience
Answer to Class Question 5.14
- This is an example of a random sample.
- Each student has an equal chance of being chosen
from the box of names.
Class Question 5.15
- Students at WVU are divided according to which dorm they live in, a random sample of dorms is chosen, and all students living in the chosen dorms are polled.
- 1. random 2. systematic 3. cluster
- 4. stratified 5. convenience
Answer to Class Question 5.15
- This is an example of a cluster sample.
- The students are divided into geographic clusters according to the dorm they live in, and a random sample of these clusters is chosen.
Section 13.2 The Misuses of Statistics
Questions to Ask When Examining Statistical Statements
- Was the sample used to gather the statistical data unbiased?
- Was the sample used to gather the statistical data of sufficient size?
- Is the statistical statement ambiguous in any way?
An Example
- Consider the statement, "Four out of five dentists recommend sugarless gum for their patients who chew gum."
- Is the sample unbiased? (Maybe they sampled only dentists who own stock in the gum company conducting the survey.)
- How large is the sample? (Did they survey only five dentists in total?)
- Is the statement ambiguous? (Maybe only 1 out of 100 dentists recommends gum at all.)
Ambiguous Words: "Average"
- There are at least four different averages in statistics:
- mean (the traditional average: the sum of the values divided by the number of values)
- median (the middle value when the data are in order)
- mode (the most frequently occurring piece of data)
- midrange (halfway between the highest and lowest values in the data set)
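The four averages can be computed with Python's standard `statistics` module plus one line for the midrange. The small data set and the function name `four_averages` are made up to show that the four measures need not agree.

```python
import statistics

def four_averages(data):
    """Return the mean, median, mode, and midrange of a list of numbers."""
    return {
        "mean": statistics.mean(data),      # sum divided by count
        "median": statistics.median(data),  # middle value (mean of the two middle values if count is even)
        "mode": statistics.mode(data),      # most frequent value
        "midrange": (min(data) + max(data)) / 2,  # halfway between lowest and highest
    }

print(four_averages([2, 3, 3, 5, 7, 10]))
```

For this data set the mean is 5, the median 4, the mode 3, and the midrange 6: four different "averages" from the same six numbers.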
Misusing the Word "Average"
- For example, in union contract negotiations, the company could state that the average salary of its employees is $35,000, while the union states that the average employee salary is $30,000.
- It is possible for both parties to be telling the truth while using different meanings of the word "average" (for example, the company quoting the mean and the union quoting the median).
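One way both figures could be true, sketched with a hypothetical payroll (the salaries are invented, not data from the lecture): a single high salary pulls the mean up, while the median stays at the typical salary.

```python
import statistics

# Hypothetical payroll of five employees, in dollars
salaries = [25_000, 30_000, 30_000, 30_000, 60_000]

company_average = statistics.mean(salaries)   # the mean: 175,000 / 5
union_average = statistics.median(salaries)   # the middle salary
print(f"company: ${company_average:,}  union: ${union_average:,}")
```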
Misusing the Word "Largest"
- Another word that can be used vaguely in statistical statements is "largest."
- If company ABC claims it is the largest department store in the US, this could mean it has the largest:
- 1. profit 2. total sales 3. building
- 4. staff 5. acreage 6. number of outlets
Drawing Irrelevant Conclusions
- Another deceptive technique (commonly used in advertising) is to state a claim from which the public may draw irrelevant conclusions.
- For example, a paper towel company may claim that its paper towels are heavier than the competition's, from which you are expected to conclude that they are more absorbent.
- Heaviness has nothing to do with absorbency: a rock is heavier than a sponge, but a sponge is more absorbent.
Drawing Irrelevant Conclusions (contd.)
- A foreign car company's ad claims that 9 out of 10 cars of one of its popular models sold in the U.S. in the past 10 years are still on the road, from which you conclude that the car is well manufactured and will last a long time.
- The ad could neglect to mention that this model has been sold in the U.S. for only 5 years, so none of the cars counted is more than 5 years old.
- The company could just as easily have claimed that 9 out of 10 of this model sold in the U.S. in the past 100 years are still on the road.
Misleading Graphics
- It is also easy to be misled by graphical representations of statistical data.
- Some examples of this are discussed in the lab.
Class Question 5.16
- According to the graphs below, which stock performed better between January and May?
- 1. Stock A 2. Stock B 3. Same
Answer to Class Question 5.16
Lecture Summary
- Sampling techniques include random sampling, systematic sampling, cluster sampling, stratified sampling, and convenience sampling.
- One must be careful when analyzing claims based on statistical data.
Homework
- Do problems from Sections 13.1 and 13.2 of the textbook.
- I will be giving lab help in the IML computer lab today (December 02) 3:30-4:30 and tomorrow (December 03) 10:30-11:30.
- Lab 5 is due Monday, December 06 by 11:00 PM.