Title: Exercise 19: Sample Size
1Exercise 19 Sample Size
2Part One
- Explore how sample size affects the distribution
of sample proportions.
- This was achieved by first taking 20 random
samples where n = 10 and then taking 20 random
samples where n = 40. Each random sample was
then summarized as a sample statistic (p-hat).
3Tally for Discrete Variable Live
- Live   Count  Percent
- off      223    50.11
- on       222    49.89
- N =      445
- This verifies that the proportions of students
living on campus and off campus are each approximately
50%. This would be the population proportion (p).
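The sampling procedure described in Part One can be sketched in Python. This is only an illustration, not the statistical package that produced the output on these slides. The function name `sample_proportions`, the seed, and the choice of "on" as the category of interest are assumptions made for the example.

```python
import random

def sample_proportions(population, n, num_samples, rng):
    """Draw num_samples simple random samples of size n (without
    replacement) and return the proportion of "on" in each sample."""
    props = []
    for _ in range(num_samples):
        sample = rng.sample(population, n)
        props.append(sample.count("on") / n)
    return props

# Population rebuilt from the tally: 223 off-campus, 222 on-campus.
population = ["off"] * 223 + ["on"] * 222
p = population.count("on") / len(population)

rng = random.Random(19)  # arbitrary seed so the run is repeatable
phat10 = sample_proportions(population, 10, 20, rng)
phat40 = sample_proportions(population, 40, 20, rng)

print(f"population proportion p = {p:.4f}")
print("n = 10:", [f"{x:.3f}" for x in phat10])
print("n = 40:", [f"{x:.3f}" for x in phat40])
```

Each run with a different seed gives a different set of 20 p-hat values, which is the sample-to-sample variation the exercise examines.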
4Mean, Shape and Standard Deviation
- What would you expect if 20 random samples of 10
were taken?
- What would you expect if 20 random samples of 40
were taken?
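One way to answer these questions before looking at the data is the standard result that a sample proportion has mean p and standard deviation sqrt(p(1-p)/n). The sketch below applies it with p = 222/445 from the tally. The helper name `phat_mean_and_sd` is hypothetical, and the small finite-population correction for sampling without replacement is ignored.

```python
import math

def phat_mean_and_sd(p, n):
    """Theoretical mean and standard deviation of a sample proportion
    (ignores the finite-population correction)."""
    return p, math.sqrt(p * (1 - p) / n)

p = 222 / 445  # population proportion from the tally
for n in (10, 40):
    mean, sd = phat_mean_and_sd(p, n)
    print(f"n = {n}: mean = {mean:.4f}, sd = {sd:.4f}")
# n = 10: mean = 0.4989, sd = 0.1581
# n = 40: mean = 0.4989, sd = 0.0791
```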
5Results from 20 samples where n = 10, resulting in
phatlive
0.6000 0.5000 0.5000 0.4000 0.5000 0.5556 0.7000 0.4000 0.6000 0.8000
0.3000 0.4000 0.5000 0.4000 0.5000 0.4000 0.5000 0.3000 0.5000 0.6000
6Descriptive Statistics: phatlive10
- Variable   N  N*    Mean  SE Mean   StDev
- Phatlive  20   0  0.4978   0.0278  0.1242
- Minimum      Q1  Median      Q3  Maximum
-  0.3000  0.4000  0.5000  0.5889   0.8000
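The summary values above can be rechecked from the 20 listed sample proportions. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative, and the "exclusive" quantile method is assumed here because it reproduces the quartiles shown in the output.

```python
import statistics

# The 20 sample proportions listed for n = 10.
phatlive10 = [0.6, 0.5, 0.5, 0.4, 0.5, 0.5556, 0.7, 0.4, 0.6, 0.8,
              0.3, 0.4, 0.5, 0.4, 0.5, 0.4, 0.5, 0.3, 0.5, 0.6]

mean = statistics.mean(phatlive10)
sd = statistics.stdev(phatlive10)        # sample standard deviation
se = sd / len(phatlive10) ** 0.5         # standard error of the mean
q1, median, q3 = statistics.quantiles(phatlive10, n=4, method="exclusive")

print(f"mean = {mean:.4f}, sd = {sd:.4f}, se = {se:.4f}")
print(f"Q1 = {q1:.4f}, median = {median:.4f}, Q3 = {q3:.4f}")
```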
7Let's Look At A Stem Plot
- Stem-and-leaf of phatlive10 (N = 20)
- Leaf Unit = 0.010
-
- 3 00
- 3
- 4 00000
- 4
- 5 0000000
- 5 5
- 6 000
- 6
- 7 0
- 7
- 8 0
8Sample Proportions
- What are the center, spread and shape for this
sample proportion?
- Center: mean of p-hat = 0.4978
- Spread: st. dev. = 0.1242
- Shape: np and n(1-p) are each only about 5, which
is less than 10, so the guidelines for normality
are not met. However, as shown in the stem plot,
the results still appear roughly normal, most
likely because the population proportions are
almost perfectly balanced at about .5 and .5.
9What if the sample size increases?
- Results from 20 samples where n = 40, resulting in
phatlive
0.5750 0.4750 0.4500 0.4250 0.4750 0.3250 0.4250 0.4000 0.4250 0.3500
0.5500 0.5000 0.5385 0.4359 0.4500 0.5000 0.4750 0.4250 0.4500 0.4750
10Descriptive Statistics: phatlive40
- Variable     N  N*    Mean  SE Mean   StDev
- Phatlive40  20   0  0.4562   0.0137  0.0611
- Minimum      Q1  Median      Q3  Maximum
-  0.3250  0.4250  0.4500  0.4938   0.5750
11Stem-plot for phatlive40
- N = 20, Leaf Unit = 0.010
- 3 2
- 3 5
- 3
- 3
- 4 0
- 4 22223
- 4 555
- 4 7777
- 4
- 5 00
- 5 3
- 5 5
- 5 7
12Sample Proportions for phatlive40
- What are the center, spread and shape for this
sample proportion?
- Center: mean = 0.4562
- Spread: st. dev. = 0.0611
- Shape: np and n(1-p) are both greater than 10
(about 20 each), therefore the guideline for
normality is satisfied.
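The np and n(1-p) guideline used on these slides can be written as a small check. The function name `meets_normal_guideline` is hypothetical, the population proportion is rounded to 0.5 for illustration, and the cutoff is treated as "at least 10".

```python
def meets_normal_guideline(n, p, threshold=10):
    """True when both np and n(1 - p) reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

p = 0.5  # population proportion for Live, rounded
for n in (10, 40):
    print(n, n * p, n * (1 - p), meets_normal_guideline(n, p))
# 10 5.0 5.0 False
# 40 20.0 20.0 True
```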
13Let's compare them simultaneously
- Descriptive Statistics: phatlive40, phatlive10
- Variable     N  N*    Mean  SE Mean   StDev  Minimum      Q1  Median      Q3  Maximum
- phatlive40  20   0  0.4562   0.0137  0.0611   0.3250  0.4250  0.4500  0.4938   0.5750
- phatlive10  20   0  0.4978   0.0278  0.1242   0.3000  0.4000  0.5000  0.5889   0.8000
- How do their centers, spreads and shapes
compare?
14Box-plots
15What does this mean?
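- In these particular samples the mean for n = 10
(0.4978) landed closer to the population proportion
(about 0.50) than the mean for n = 40 (0.4562). Both
distributions are centered at p in theory, so this
difference reflects chance variation across only
20 samples.
- The spread is smaller for n = 40.
- The shape is more normal for n = 40.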
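The difference in spread can be compared with what theory suggests. Because the standard deviation of a sample proportion is proportional to 1/sqrt(n), going from n = 10 to n = 40 should roughly halve it. The short sketch below checks this against the two standard deviations reported on the slides; the variable names are illustrative.

```python
import math

sd10 = 0.1242  # observed st. dev. of p-hat for n = 10
sd40 = 0.0611  # observed st. dev. of p-hat for n = 40

observed_ratio = sd10 / sd40
expected_ratio = math.sqrt(40 / 10)  # theory: spread scales with 1/sqrt(n)

print(f"observed ratio = {observed_ratio:.2f}")  # observed ratio = 2.03
print(f"expected ratio = {expected_ratio:.2f}")  # expected ratio = 2.00
```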
16As outlined in Chapter 6
- A random variable X for the count of sampled
individuals in the category of interest is
binomial with parameters n and p if:
- There is a fixed sample size n.
- Each selection is independent of the others.
- Each individual sampled takes just two possible
values.
- The probability of each individual falling in the
category of interest is always p.
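When these four conditions hold, the probability of each possible count follows the binomial formula. As a minimal sketch, the code below lists the distribution of X (and the matching p-hat = X/n) for n = 10 and p = 0.5; the function name `binomial_pmf` and the rounded value of p are assumptions for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5
for k in range(n + 1):
    print(f"P(X = {k:2d}) = {binomial_pmf(k, n, p):.4f}   p-hat = {k / n:.1f}")
```

The probabilities are largest near k = 5, which corresponds to a p-hat of 0.5.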
17However
- The second condition isn't really met when
sampling without replacement. But as long as the
population is at least 10n, approximate
independence can still be concluded.
- Since the population (N = 445) is greater than
10 x 40 = 400, both sample sizes of 10 and 40
follow this rule.
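The rule of thumb above is simple arithmetic and can be checked directly. In this sketch the function name `population_large_enough` is hypothetical and the population size of 445 is taken from the Live tally.

```python
def population_large_enough(population_size, n, factor=10):
    """True when the population is at least factor * n."""
    return population_size >= factor * n

N = 445  # population size from the Live tally
for n in (10, 40):
    print(f"n = {n}: need at least {10 * n}, have {N} ->",
          population_large_enough(N, n))
# n = 10: need at least 100, have 445 -> True
# n = 40: need at least 400, have 445 -> True
```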
18Part 2
- Explores how population shape affects the
distribution of the sample proportion.
- First, 20 random samples of 10 were taken and
then 20 random samples of 40 were taken. The
results were compared.
19Handedness
- Tally for Discrete Variables: Handed
- Handed  Count  Percent
- ambid      13     2.91
- left       40     8.97
- right     393    88.12
- N =       446
- The proportion of ambidextrous students is very
skewed, since only approximately 3% of the
population is ambidextrous vs. 97% who are not.
20For Handedness, n = 10
- Variable        N  N*    Mean  SE Mean
- phathandedn10  20   0  0.0300   0.0164
- StDev   Min.    Q1  Median    Q3    Max.
- 0.0733  0.00  0.00    0.00  0.00  0.3000
21Stem-plot, n = 10
- Stem-and-leaf of phathandedn10
- N = 20, Leaf Unit = 0.010
-
- 0 0000000000000000
- 1 000
- 2
- 3 0
22What does this data show?
- The center or mean is 0.0300
- The spread is 0.0733
- The shape is not normal because the guidelines of
np and n(1-p) being greater than 10 are not met
(np is only about 0.3)
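To see how far a skewed population is from meeting the guideline, the same rule can be turned around to ask how large n would have to be. The sketch below uses p = 13/446 for the ambidextrous category from the handedness tally; the function name `min_sample_size` is hypothetical and the cutoff is treated as "at least 10".

```python
from fractions import Fraction
from math import ceil

def min_sample_size(p, threshold=10):
    """Smallest n for which both np and n(1 - p) reach the threshold."""
    return ceil(threshold / min(p, 1 - p))

p_ambid = Fraction(13, 446)  # ambidextrous count / N from the tally

print(f"np for n = 10: {float(10 * p_ambid):.2f}")      # np for n = 10: 0.29
print("smallest n meeting the guideline:",
      min_sample_size(p_ambid))                          # 344
```

Exact fractions are used so that the rounding up is not affected by floating-point error.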
23Handedness, n = 40
- Descriptive Statistics: phathandedn40
- Variable        N  N*     Mean  SE Mean    StDev
- phathandedn40  20   0  0.04000  0.00612  0.02739
- Minimum       Q1   Median       Q3  Maximum
- 0.00000  0.02500  0.03750  0.05000  0.10000
-
24Stem-plot, n = 40
- Stem-and-leaf of phathandedn40, N = 20
- Leaf Unit = 0.0010
-
- 0 000
- 1
- 2 5555555
- 3
- 4
- 5 000000
- 6
- 7 555
- 8
- 9
- 10 0
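Because the stem plot shows every observation, the 20 sample proportions can be rebuilt from it and the summary statistics rechecked. This is a minimal sketch: the `rows` list is transcribed from the plot above (empty stems omitted), and the variable names are illustrative.

```python
import statistics

# (stem, leaves) pairs copied from the plot; leaf unit is 0.001.
rows = [(0, "000"), (2, "5555555"), (5, "000000"), (7, "555"), (10, "0")]
leaf_unit = 0.001

# Each value is (stem * 10 + leaf) leaf units, e.g. stem 2, leaf 5 -> 0.025.
values = [(stem * 10 + int(leaf)) * leaf_unit
          for stem, leaves in rows
          for leaf in leaves]

print("count  =", len(values))
print(f"mean   = {statistics.mean(values):.4f}")
print(f"median = {statistics.median(values):.4f}")
print(f"stdev  = {statistics.stdev(values):.5f}")
```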
25What does this mean?
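- The center or mean is 0.0400
- The spread is 0.02739
- The shape is still not normal: with p of about
0.03, np = 40(0.03) is only about 1.2, so the
guidelines of np and n(1-p) being greater than 10
are not met. The distribution is, however, less
skewed than it was for n = 10.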
26Let's compare them
- Variable        N  N*    Mean  SE Mean    StDev
- phathandedn40  20   0  0.0400  0.00612  0.02739
- phathandedn10  20   0  0.0300   0.0164   0.0733
- Variable       Minimum       Q1   Median       Q3  Maximum
- phathandedn40  0.00000  0.02500  0.03750  0.05000  0.10000
- phathandedn10   0.0000   0.0000   0.0000   0.0000   0.3000
27Let's compare them
28What does it mean?
- By increasing the sample size, the box plot
became less skewed.
- There was less of a spread and fewer outliers.
- The center remained at approximately .03
- The shape became more normal.
29Overall
- Live seemed to be more normal than handedness.
This was because the population was not skewed for
the live variable as it was for handedness.
- In both situations, n = 40 caused the
distributions to be more normal.