Population A: 10,000

Slides: 39

Provided by: chrishol


1
Sample Size Determination
Population A: 10,000, sample 10%, sample size 1,000
Population B: 5,000, sample 15%, sample size 750
2
Sampling

The process of obtaining information from a
subset (sample) of a larger group (population)
The results for the sample are then used to make
estimates of the larger group
Faster and cheaper than asking the entire
population
Two keys
Selecting the right people
Have to be selected scientifically so that they
are representative of the population
Selecting the right number of the right people
To minimize sampling error, i.e. choosing the
wrong people by chance

3
Selecting the right number of the right people

Three Issues
Financial
Managerial
Statistical

Generally, the larger the sample size the smaller
the statistical error, but the greater the cost,
both financial and in terms of managerial
resources
4
Subgroups
The number of subgroups to be analyzed will have
an impact on the size of the sample needed. As
the number of subgroups increases, each subgroup
contains fewer respondents, so the sampling
error within each subgroup increases and it
becomes harder to tell whether differences
between two groups are real or due to error
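
A minimal sketch of the point above, assuming a fixed total sample of 1,000 split into equal-sized subgroups and a proportion near 50% (both values are illustrative, not from the slides). The function name is hypothetical.

```python
import math

def subgroup_standard_error(total_n, k, p=0.5):
    """Standard error of a proportion inside one of k equal-sized subgroups."""
    n_per_group = total_n / k
    return math.sqrt(p * (1 - p) / n_per_group)

# Same total sample, more subgroups: each cell gets smaller and noisier.
for k in (1, 2, 5, 10):
    se = subgroup_standard_error(1000, k)
    print(f"{k:2d} subgroups: n per group = {1000 // k:4d}, standard error = {se:.3f}")
```

With ten subgroups each cell holds only 100 respondents, so its standard error is roughly three times that of the full sample.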
5
Determining sample size
Balance between financial and statistical issues
1. What can I afford?
2. Rule of thumb: past experience, historical
precedence, gut feeling, some consideration
of sample error
3. Make-up of sub-groups (cells): what statistical
inferences do you hope to make between sub-groups?
(rare to fall below 20 for a sub-group)
4. Statistical methods
A critical factor will be the size of the
expected difference or change to be measured. The
smaller it is, the larger the sample needs to be.
6
Statistical determination

Three Pieces of Information Required
An estimate of the population Standard Deviation
The Acceptable Level of Sampling Error
The Desired Level of Confidence that the Sample
Result will fall within a certain range (result
± sampling error) of true population values

7
Normal Distribution
The height of a normal distribution curve can be
uniquely specified mathematically in terms of two
parameters: the mean (μ) and the standard
deviation (σ).
8
The total area under the curve is equal to 1,
i.e. it takes in all observations.
The area of a region under the normal distribution
between any two values equals the probability of
observing a value in that range when an
observation is randomly selected from the
distribution.
For example, on a single draw there
is a 34% chance of selecting from the
distribution a person with an IQ between 100 and
115
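
The IQ example can be checked with a short sketch. It assumes the usual IQ scale with mean 100 and standard deviation 15, which the slide does not state explicitly.

```python
from statistics import NormalDist

# Assumed IQ scale: mean 100, standard deviation 15 (not stated on the slide).
iq = NormalDist(mu=100, sigma=15)

# Area under the curve between two values = probability of a draw landing there.
p_100_to_115 = iq.cdf(115) - iq.cdf(100)
print(f"P(100 < IQ < 115) = {p_100_to_115:.4f}")

# The total area under the curve is 1; this very wide range covers virtually all of it.
p_all = iq.cdf(1000) - iq.cdf(-1000)
print(f"Total area = {p_all:.4f}")
```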
9
Normal Distributions

Curve is basically bell shaped, extending from −∞ to +∞
Symmetric, with scores more concentrated in the
middle (i.e. around the mean) than in the tails.
Mean, median and mode coincide
Normal distributions differ in how spread out they are.

10
Standard Normal Distribution (z)
Any normal distribution can be converted into a
standard normal distribution by a simple
transformation formula:
$$Z = \frac{X - \mu}{\sigma}$$
where X is the value of the variable, μ is the
mean of the variable and σ is the SD of the variable.
The mean of the standard normal distribution is
always zero and its standard deviation is always
equal to one. The probabilities in the tables are
always based on the standard normal distribution
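
A small sketch of the transformation, using hypothetical raw scores. After standardizing, the values have mean zero and standard deviation one, as the slide says.

```python
from statistics import mean, pstdev

def z_score(value, mean_value, sd):
    """Convert a raw value to standard-normal units: (value - mean) / SD."""
    return (value - mean_value) / sd

# Hypothetical scores on a scale assumed to have mean 100 and SD 15.
z_values = [z_score(x, 100, 15) for x in (85, 100, 115, 130)]
print(z_values)

# Standardizing any data set with its own mean and SD gives mean 0 and SD 1.
raw = [85, 92, 100, 104, 111, 130]   # illustrative data
m, s = mean(raw), pstdev(raw)
standardized = [z_score(x, m, s) for x in raw]
print(round(mean(standardized), 6), round(pstdev(standardized), 6))
```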
11
Area Under Standard Normal Curve for Z values
(Standard deviations) of 1, 2 and 3
Z values (Standard deviations)    Area Under Standard Normal Curve
± 1                               68.26%
± 2                               95.44%
± 3                               99.74%
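
These areas can be recomputed rather than read from a table. The sketch below is one way to do it; its results differ from the slide's figures by about 0.01 percentage points, which is consistent with the slide values having been taken from a table rounded to four decimals.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def area_within(z):
    """Area under the standard normal curve between -z and +z."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

areas = {z: area_within(z) for z in (1, 2, 3)}
for z, area in areas.items():
    print(f"+/- {z}: {area:.2%}")
```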
12
Population Vs. Sample
Population of Interest and Sample
We measure the sample using statistics in order
to draw inferences about the population and its
parameters.
                      Population (parameter)    Sample (statistic)
Mean                  µ                         X̄
Standard Deviation    σ                         S
13
Sampling Distribution of the Mean

Necessary for understanding the basis for
computing sampling error for simple random
samples.
A conceptual and theoretical probability
distribution of the means of all possible samples
of a given size drawn from a given population
i.e. A distribution of sample means.
If you take a sample of 100 from a population of
1000 there is an astronomically large number of
different subsets of the population that can be
drawn, and each sample will have a slightly
different mean. Those means will also have a
distribution.
The Central Limit Theorem says that this
distribution will approximate a normal
distribution more closely the larger the size of
each sample

Suppose you conducted a research study
Took a random sample of n = 100 subjects
They tasted the new "Guacamole Doritos"
They rated the flavor of the chip on the
following scale
1 (Too Mild)  2  3  4 (Perfect Flavor)  5  6  7 (Too Hot)
15

Results show x̄1 = 2.3 and S1 = 1.5
Can you conclude that on average the target
population thought the flavor was mild?
Suppose you take a series of random samples of
n = 100 subjects
x̄2 = 3.7 and S2 = 2
x̄3 = 4.3 and S3 = 0.5
x̄4 = 2.8 and S4 = 0.97
...
x̄50 = 3.7 and S50 = 2

16
The Sampling Distribution
The means of all the samples will have their own
distribution, called the sampling distribution of
the mean.
It is approximately a normal distribution when
the samples are large.
The mean of the sampling distribution of the mean
equals the population mean.
17
Sampling Distribution
The standard deviation of the sampling
distribution is called the standard error of the
mean.
Often the population standard deviation σ is
unknown and has to be estimated from the sample.
For a proportion p, the standard error is
$$\sigma_p = \sqrt{\frac{p(1-p)}{n}}$$
18
[Figure: population distribution of the Doritos flavor rating (X) and sampling distribution of the mean Doritos flavor rating (x̄), both on the 1 to 7 scale]
19

What relationship does the Population
Distribution have to the Sample Distribution?
The Central Limit Theorem
Let x1, x2, ..., xn denote a random sample selected
from a population having mean μ and variance σ².
Let X̄ denote the sample mean. If n is large, then
X̄ has approximately a Normal Distribution with
mean μ and variance σ²/n.

The Central Limit Theorem does not mean that the
sample mean = population mean.
It means that you can attach a probability to
an observed sample mean and decide.
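
The theorem can be illustrated with a small simulation. This is a sketch under stated assumptions: the population of 1 to 7 ratings and its weights are invented for illustration, and samples are drawn with replacement so that the population is effectively very large.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical, deliberately non-normal population of 1-7 flavor ratings.
ratings = [1, 2, 3, 4, 5, 6, 7]
weights = [30, 25, 15, 10, 8, 7, 5]   # assumed: skewed toward "too mild"
total = sum(weights)
mu = sum(r * w for r, w in zip(ratings, weights)) / total
sigma = (sum(w * (r - mu) ** 2 for r, w in zip(ratings, weights)) / total) ** 0.5

n = 100             # size of each sample
num_samples = 5000  # number of repeated samples

# Mean of each repeated sample; together they form the sampling distribution.
sample_means = [
    statistics.fmean(rng.choices(ratings, weights=weights, k=n))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

print(f"population mean {mu:.3f}, mean of sample means {mean_of_means:.3f}")
print(f"sigma/sqrt(n) {sigma / n ** 0.5:.3f}, SD of sample means {sd_of_means:.3f}")
```

Individual sample means still differ from the population mean, but they cluster around it with a spread close to σ/√n.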

The sampling distribution of the mean for simple
random samples of more than 30 observations has
the following characteristics
The distribution is approximately a normal distribution
The distribution has a mean equal to the
population mean
The distribution has a standard deviation (the
standard error of the mean) equal to the
population standard deviation divided by the
square root of the sample size

Note: The statistic is referred to as the
standard error of the mean instead of the
standard deviation to indicate that it applies to
a distribution of sample means rather than the SD
of a sample or of the population
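
A minimal sketch of the standard error calculation, using the first Doritos sample from the slides (S = 1.5, n = 100) with the sample SD standing in for the unknown population SD. The function name is illustrative.

```python
import math

def standard_error_of_mean(sd, n):
    """Standard deviation of the sampling distribution of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

# First Doritos sample from the slides: S = 1.5, n = 100.
se = standard_error_of_mean(1.5, 100)
print(f"standard error of the mean = {se:.2f}")

# Quadrupling the sample size only halves the standard error.
se_400 = standard_error_of_mean(1.5, 400)
print(f"with n = 400: {se_400:.3f}")
```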
21
Sampling Distribution of Proportions

We are often interested in estimating proportions
or percentages rather than means
Is the sample proportion representative of the
population proportion?
The percentage of the population that has used
the product
The percentage of the population that has
purchased over the Internet in the last month
The proportion of men who read a particular
magazine
The sampling distribution of the proportion
approximates a normal distribution
The mean proportion of all possible samples is
equal to the population proportion
The standard error of the sampling distribution
of the proportion can be calculated
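
As a sketch of that calculation, the standard error of a proportion is the square root of p(1 − p)/n. The survey figures below (40% of 600 respondents) are hypothetical.

```python
import math

def standard_error_of_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# Hypothetical survey: 40% of n = 600 respondents have used the product.
se_p = standard_error_of_proportion(0.40, 600)
print(f"standard error = {se_p:.4f}")
```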

In practice we want to make inferences from our
sample about the population it was drawn from
What is the probability that our sample of any
given size will produce an estimate that is
within one standard error (plus or minus) of the
true population value?
The answer is that there is a 68.26% probability
that any one sample from a particular population
will produce an estimate of the population mean
that is within ± one standard error of the true
value.
This is because 68.26% of all sample means from
a given population fall in this range
There is a 95.44% probability that the mean from
any one sample will fall within ± two standard
errors

23
Sampling Distribution of Means
Point Estimates

The sample mean is the best point estimate of a
population mean

The sample mean is most likely to be close to the
population mean, but could be any of the means on
the left including one that is a far distance
from the population mean.
The distance between the sample mean and the
population mean is the sampling error
Only a small percentage of samples will have the
same mean as the population (i.e. a sampling
error of zero)

24
Interval Estimates

Interval estimates are preferred
An interval estimate is a range of values
within which the true population mean is
estimated to fall
Normally state the size of the interval, plus the
probability that the interval will include the
true population mean.
The probability is called the confidence level
(e.g. 95%)
And the interval is called the confidence
interval (e.g. between 72 and 98)
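
A sketch of an interval estimate for the first Doritos sample (mean 2.3, S = 1.5, n = 100). It assumes the normal approximation is adequate for n = 100 and uses the sample SD in place of the population SD; the function name is illustrative.

```python
from statistics import NormalDist

def confidence_interval(sample_mean, sd, n, confidence=0.95):
    """Normal-approximation interval: mean +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sd / n ** 0.5
    return sample_mean - margin, sample_mean + margin

# First Doritos sample from the slides: mean 2.3, S = 1.5, n = 100.
low, high = confidence_interval(2.3, 1.5, 100)
print(f"95% confidence interval: {low:.2f} to {high:.2f}")
```

The whole interval sits below the scale midpoint of 4, which is the kind of evidence needed to answer the earlier question about whether the flavor was judged mild.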

25
Sample Confidence
Probability we can take results as accurate
representation of universe (i.e. that sample
statistics are generalisable to the real
population parameters)
Typically a 95% probability
(i.e. 19 times out of 20 we would expect results
in this range)
26
Example
We can be 95% sure that, say, 65%
of a target market will name Martinis V2
vodka in an unprompted recall test, plus or
minus 4%
27
We can be 95% sure (level of confidence) that,
say, 65% (predicted result) of a target market
(of a given total population) will name
Martinis V2 vodka in an unprompted recall test,
plus or minus 4% (to a known margin of error)
28
95% confidence
If we do the same test 20 times
then it is statistically probable that the
results will fall between 61% and 69% (i.e. 65% ±
4%) about 19 times
If we lower the confidence level then we lower
the sampling error, e.g. at a 90% confidence
level the result might be between about 62% and
68% (a tighter range, but we are less sure the
sample is representative of the real population)
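
The trade-off between confidence level and margin of error can be sketched as follows. The only inputs taken from the slides are the 65% estimate and the ± 4 point margin at 95% confidence; the standard error is inferred from that margin, and a normal approximation is assumed.

```python
from statistics import NormalDist

def z_for(confidence):
    """Two-sided z value for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

# From the slide: 65% recall, plus or minus 4 points at 95% confidence.
estimate, margin_95 = 65.0, 4.0
standard_error = margin_95 / z_for(0.95)   # implied by the stated margin

# Same sample, different confidence levels: only the z multiplier changes.
margins = {level: z_for(level) * standard_error for level in (0.90, 0.95, 0.99)}
for level, margin in margins.items():
    print(f"{level:.0%}: {estimate - margin:.1f}% to {estimate + margin:.1f}%")
```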
29
Implications for sample size
(Given reliability and validity hold)
Above a certain size little extra information is
gathered by increasing the sample size.
Generally, provided the population is much larger
than the sample, there is no relationship
between the size of a population and the size of
sample needed to estimate a particular population
parameter, with a particular error range and
level of confidence.
30

To determine Sample Size we need three pieces of
information
The acceptable level of sampling error
The acceptable level of confidence
The estimate of the population standard deviation

31
Sample Size Determination

3 Statistical Determinants of Sample Size
DEGREE OF CONFIDENCE
Statistical Confidence
95% Confidence or .05 Level of Significance
DEGREE OF PRECISION
Accuracy in Estimating Population Proportion
± 5.00 versus ± 1.00
± 10% versus ± 5%
VARIABILITY IN THE POPULATION
To What Degree do the Sampling Units Differ

We can choose an error range (e.g. 5%)
We can set a confidence level (e.g. 95%)
But
Without knowing the spread of results (i.e. the
standard deviation for the population) we cannot
work out the sample size required
So
How can we estimate the population standard
deviation before selecting the sample?
Pilot tests
Guess
Previous experience
Secondary data

$$n = \frac{Z^2 \sigma^2}{E^2}$$
Z = level of confidence (in standard errors)
σ = population SD
E = acceptable amount of sampling error
33

Example
Number of fast food restaurant visits in past
month
We need our estimate to be within 1/10 (0.1) of a
visit from the population average (E = 0.1)
We need to be 95.44% confident that the true
population mean falls in the interval defined by
the sample mean plus or minus E (i.e. within 2
standard errors), so Z = 2
Standard deviation guess at 1.39 visits, so σ = 1.39

$$n = \frac{Z^2 \sigma^2}{E^2} = \frac{2^2 (1.39)^2}{(0.1)^2} \approx \frac{4(1.93)}{0.01} = \frac{7.72}{0.01} = 772$$
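
The worked example can be reproduced with a short sketch. The function name is illustrative. Without the intermediate rounding of 1.39 squared, the formula gives about 772.8, which rounds up to 773 respondents, one more than the slide's figure.

```python
import math

def sample_size_for_mean(z, sigma, e):
    """n = Z^2 * sigma^2 / E^2, rounded up to a whole respondent."""
    return math.ceil(z ** 2 * sigma ** 2 / e ** 2)

# Values from the fast food example: Z = 2, sigma = 1.39, E = 0.1.
n = sample_size_for_mean(2, 1.39, 0.1)
print(n)
```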
34
Sample Size Determination
Sample size must increase
To be more confident
To be more precise
If the population is more variable
Too big - it's a waste of money
Too small - you cannot make a big decision
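
A sketch of how each of the three factors moves the required sample size, starting from the fast food example (Z = 2, σ = 1.39, E = 0.1). The alternative values are chosen purely for illustration.

```python
import math

def sample_size_for_mean(z, sigma, e):
    """n = Z^2 * sigma^2 / E^2, rounded up to a whole respondent."""
    return math.ceil(z ** 2 * sigma ** 2 / e ** 2)

baseline = sample_size_for_mean(z=2, sigma=1.39, e=0.1)
more_confident = sample_size_for_mean(z=3, sigma=1.39, e=0.1)   # +/- 3 standard errors
more_precise = sample_size_for_mean(z=2, sigma=1.39, e=0.05)    # half the allowed error
more_variable = sample_size_for_mean(z=2, sigma=2.78, e=0.1)    # double the SD

print(baseline, more_confident, more_precise, more_variable)
```

Halving the allowed error or doubling the standard deviation each roughly quadruples the sample, because both terms enter the formula squared.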
35
Significance level
In hypothesis testing, the significance level is
the criterion used for rejecting the null
hypothesis. The significance level is used as
follows. First, the difference between the
results of the experiment and the null hypothesis
is determined. Then, assuming the null
hypothesis is true, the probability of a
difference that large or larger is
computed. Finally, this probability is compared
to the significance level. If the probability is
less than or equal to the significance level,
then the null hypothesis is rejected and the
outcome is said to be statistically significant.
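
The three steps can be sketched as a two-sided test on a mean. The null hypothesis (a mean rating of 4) is hypothetical, the sample values are borrowed from the second Doritos sample (mean 3.7, S = 2, n = 100), and a normal approximation is assumed for simplicity.

```python
from statistics import NormalDist

def two_sided_p_value(sample_mean, null_mean, sd, n):
    """Probability of a difference at least this large if the null hypothesis is true."""
    # Step 1: difference between the result and the null hypothesis, in standard errors.
    z = (sample_mean - null_mean) / (sd / n ** 0.5)
    # Step 2: probability of a difference that large or larger, in either direction.
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05  # significance level
p = two_sided_p_value(3.7, 4.0, 2.0, 100)

# Step 3: compare the probability with the significance level.
reject_null = p <= alpha
print(f"p = {p:.4f}, reject null: {reject_null}")
```

Here the probability is above .05, so the null hypothesis is not rejected at that level.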
36
Traditionally, experimenters have used either the
.05 level (sometimes called the 5% level) or the
.01 level (1% level), although the choice of
levels is largely subjective. The lower the
significance level, the more the data must
diverge from the null hypothesis to be
significant. Therefore, the .01 level is more
conservative than the .05 level. The Greek letter
alpha is sometimes used to indicate the
significance level.
37
Critical value

A critical value is the value that a test
statistic must exceed in order for the null
hypothesis to be rejected.
For example, the critical value of t (with 12
degrees of freedom using the .05 significance
level) is 2.18.
This means that for the probability value to be
less than or equal to .05, the absolute value of
the t statistic must be 2.18 or greater.

[Figure: distribution of the test statistic, marking the critical value and the .05 significance level]
38
The t distribution

The t distribution is used instead of the normal
distribution whenever the standard deviation is
estimated.
The t distribution has relatively more scores in
its tails than does the normal distribution.
The shape of the t distribution depends on the
degrees of freedom (df) that went into the
estimate of the standard deviation.
As the degrees of freedom increases, the t
distribution approaches the normal distribution.
With 100 or more degrees of freedom, the t
distribution is almost indistinguishable from the
normal distribution.
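
A standard-library sketch of how the critical value of t could be computed and how it approaches the normal value as the degrees of freedom grow. Simpson's rule and bisection are illustrative choices made here, not the method behind printed tables, and the function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) via Simpson's rule on [0, |x|], using symmetry about zero."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def t_critical(alpha, df):
    """Two-tailed critical value: the t with P(|T| >= t) = alpha, found by bisection."""
    lo, hi = 0.0, 100.0
    target = 1 - alpha / 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for df in (12, 30, 100):
    print(f"df = {df:3d}: critical t at .05 = {t_critical(0.05, df):.3f}")
```

For 12 degrees of freedom this reproduces the 2.18 quoted on the critical value slide, and by 100 degrees of freedom the value is close to the normal distribution's 1.96.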