Title: CSC 323 Quarter: Spring
1CSC 323 Quarter Spring 02/03
- Daniela Stan Raicu
- School of CTI, DePaul University
2Outline
- Confidence intervals when the population distribution is unknown
- Confidence intervals when the sample size is small
- Tests of significance
3Is x̄ normally distributed?
Is the population normal?
- Yes: Is n ≥ 30?
  - Yes: x̄ is considered to be normal
  - No: x̄ has t-student distribution
- No: Is n ≥ 30?
  - Yes: x̄ is considered to be normal
  - No: x̄ may or may not be considered normal (we need more info)
4Assumptions when applying z-statistic
- 1. The population has a normal distribution with mean µ and standard deviation σ.
- 2. The standard deviation σ is known.
- 3. The size n of the simple random sample (SRS) is large.
5Assumptions when applying z-statistic
- Is the z-statistic appropriate to use when
- 1. The sample size is small?
- 2. The population does not have a normal distribution?
- 3. The population has a normal distribution but the standard deviation σ is unknown?
What is the distribution of $(\bar{x} - \mu) / (s / \sqrt{n})$?
It is not normal!
6Inference on averages for small samples (cont.)
If data arise from a population with a normal
distribution and n is small (n < 30), we can use a
different curve, called the t-distribution or
Student's curve.
The t-distribution was discovered by W. S. Gosset
(born on 13 June 1876 in Canterbury, England),
the chief statistician of the Guinness brewery in
Dublin, Ireland. He discovered the
t-distribution in order to deal with small
samples arising in statistical quality control.
The brewery had a policy against employees
publishing under their own names, thus he
published his results about the t-distribution
under the pen name "Student", and that name has
become attached to the distribution.
7The t-student distributions
Suppose that an SRS of size n is drawn from an
N(µ, σ) population. Then the one-sample t statistic
$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
has the t-distribution with n - 1 degrees of
freedom.
- The degrees of freedom come from the standard deviation s in the denominator of t.
- There are many Student's curves! There is one Student's curve for each number of degrees of freedom.
- For tests on averages: degrees of freedom = number of observations - 1
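As an illustration of the one-sample t statistic, here is a minimal Python sketch. The function name `one_sample_t` and the six measurements are hypothetical, invented for this example; they do not come from the lecture.

```python
import math
import statistics

def one_sample_t(sample, mu):
    """Return (t, degrees_of_freedom) for a sample and a hypothesized mean mu."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    t = (x_bar - mu) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical measurements, invented for illustration only.
sample = [9.8, 10.2, 10.4, 9.9, 10.1, 10.3]
t, df = one_sample_t(sample, mu=10.0)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

With six observations the statistic is compared against the Student's curve with 5 degrees of freedom, not against the normal curve.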
8Comparing the Student's curve and the standard normal curve
[Figure: Student's curve and standard normal curve plotted against t, in three panels: d.f. = 5, d.f. = 15, d.f. = 30]
Student's curve has fatter tails than the standard normal curve. For d.f. around 30, the
Student's curve is very similar to the standard
normal curve.
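The comparison of the two curves can be checked numerically. The sketch below evaluates the standard density formulas for the Student's curve and the normal curve; the helper names `normal_pdf` and `t_pdf` and the evaluation point t = 3 are choices made for this example, not part of the lecture.

```python
import math

def normal_pdf(x):
    """Height of the standard normal curve at x."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def t_pdf(x, df):
    """Height of the Student's curve with df degrees of freedom at x."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

# Compare the tails at t = 3 for the three panels of the slide.
for df in (5, 15, 30):
    ratio = t_pdf(3.0, df) / normal_pdf(3.0)
    print(f"d.f. = {df:2d}: Student's curve is {ratio:.2f} times as high as the normal curve at t = 3")
```

The ratio shrinks toward 1 as the degrees of freedom grow, which is the "fatter tails" effect fading out.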
9When to use the t-test
- When should we use it? Each of the following conditions should hold:
- For computing a statistical test on averages.
- The sample is a simple random sample.
- The number of observations is small: the sample size n is less than 30.
- The distribution of the population is bell-shaped; it is not too different from the normal distribution. (Not easy to check, typically true for measurements!)
10Tests on averages: z-test or t-test?
If the amount of current data is
- Large: use the z-test (the normal curve)
- Small (n < 30): the distribution of the population is
  - Unknown but quite different from the normal curve: do not use the t-test!
  - Unknown but not different from the normal curve: use the t-test (the Student's curve)
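The decision rule on this slide can be written as a small function. This is only a sketch: the name `choose_test`, its parameters, and the returned strings are assumptions made for illustration, and the cut-off of 30 is the one the slides use for a "small" sample.

```python
def choose_test(n, population_close_to_normal):
    """Pick a test on averages from the sample size and the population shape."""
    if n >= 30:
        # Large amount of data: the normal curve applies.
        return "z-test"
    if population_close_to_normal:
        # Small sample from a roughly bell-shaped population: Student's curve.
        return "t-test"
    # Small sample from a population far from normal: neither curve is safe.
    return "do not use the t-test"

print(choose_test(50, False))
print(choose_test(12, True))
print(choose_test(12, False))
```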
11Confidence intervals for proportions
- Assignment
- 1. Draw the flowchart for estimating the population proportions
- 2. Calculate the confidence interval for the population proportion for different situations from the flowchart
- 3. Calculate the confidence intervals for different confidence levels such as C = .96, .98, etc.
- 4. Give examples where the way we calculated the confidence intervals does not work.
12Tests of Significance
Example 1: In the courtroom, juries must make a
decision about the guilt or innocence of a
defendant. Suppose you are on the jury in a
murder trial. It is obviously a mistake if the
jury claims the suspect is guilty when in fact he
or she is innocent.
What is the other type of mistake the jury could
make?
Which is more serious?
13Tests of Significance
Example 2: Suppose exactly half, or 0.50, of a
certain population would answer yes when asked if
they support the death penalty. A random sample
of 400 people results in 220, or 0.55, who answer
yes. The Rule for Sample Proportions tells us
that the potential sample proportions in this
situation are approximately bell-shaped, with
a standard deviation of 0.025. Find the
standardized score for the observed value of
0.55. Then determine how often you would
expect to see a standardized score that
large or larger.
14Tests of Significance
Example 2 (cont.)
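n = 400, mean = 0.50, STD = 0.025
Standardized score: z = (0.55 - 0.50) / 0.025 = 2.0
A standardized score of 2.0 or larger occurs about 2.27% of the time.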
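The numbers in Example 2 can be reproduced with a few lines of Python. This is a sketch using only the standard library; the helper name `upper_tail` is an assumption of this example, and the tail area is computed from the complementary error function instead of a printed table.

```python
import math

def upper_tail(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p0 = 0.50          # proportion assumed for the population
n = 400            # sample size
p_hat = 220 / 400  # observed sample proportion

sd = math.sqrt(p0 * (1 - p0) / n)  # standard deviation of the sample proportion
z = (p_hat - p0) / sd              # standardized score of the observed value
print(f"sd = {sd:.3f}, z = {z:.2f}, P(Z >= z) = {upper_tail(z):.4f}")
```

A standardized score of 2 leaves a little over 2% of the bell-shaped curve to its right, so a sample proportion this large would be unusual if the true proportion were 0.50.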
15The Five Steps of Hypothesis Testing
- 1. Determining the Two Hypotheses: H0, Ha
- 2. Computing the Sampling Distribution
- 3. Collecting and Summarizing the Data (calculating the observed test statistic)
- 4. Determining How Unlikely the Test Statistic is if the Null Hypothesis is True (calculating the P-value)
- 5. Making a Decision/Conclusion (based on the P-value, is the result statistically significant?)
16 1.A. The Null Hypothesis H0
- population parameter equals some value
- no relationship
- no change
- no difference in two groups, etc.
- When performing a hypothesis test, we assume that
the null hypothesis is true until we have
sufficient evidence against it.
1.B. The Alternative Hypothesis Ha
- population parameter differs from some value
- relationship exists
- a change occurred
- two groups are different, etc.
17The Hypotheses for Proportions
- Null: H0: p = p0
- One-sided alternatives
- Ha: p > p0
- Ha: p < p0
- Two-sided alternative
- Ha: p ≠ p0
19Example: Parental Discipline
- Nationwide random telephone survey of 1,250 adults.
- 474 respondents had children under 18 living at home
- results on behavior based on the smaller sample
- reported margin of error
- 3% for the full sample
- 5% for the smaller sample
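The reported margins of error are consistent with the common conservative rule for a 95% margin of error on a proportion, 1 / sqrt(n). The slide does not say how the survey computed its margins, so applying this rule is an assumption of the sketch below, as is the function name `conservative_margin`.

```python
import math

def conservative_margin(n):
    """Approximate 95% margin of error for a sample proportion: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

for label, n in (("full sample", 1250), ("smaller sample", 474)):
    print(f"{label}: n = {n}, margin of error about {conservative_margin(n):.1%}")
```

Rounded to whole percentage points, the two values match the 3 and 5 reported for the full and the smaller sample.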
Results of the study
The 1994 survey marks the first time a majority
of parents reported not having physically
disciplined their children in the previous year.
Figures over the past six years show a steady
decline in physical punishment, from a peak of 64
percent in 1988. The 1994 sample proportion who
did not spank or hit was 51%! Is this evidence
that a majority of the population did not spank
or hit?
20Case Study: The Hypotheses
- Null: The proportion of parents who physically disciplined their children in the previous year is the same as the proportion p of parents who did not physically discipline their children. H0: p = .5
- Alt: A majority of parents did not physically discipline their children in the previous year. Ha: p > .5
2. Sampling Distribution of p̂
If numerous samples or repetitions of size n are
taken, the sampling distribution of the sample
proportions p̂ from the various samples will be
approximately normal with mean equal to p (the
population proportion) and standard deviation
equal to $\sqrt{p(1 - p) / n}$.
Since we assume the null hypothesis is true, we
replace p with p0 to complete the test.
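As a quick check of this step, the sketch below evaluates the standard deviation of the sample proportion under the null hypothesis, sqrt(p0 (1 - p0) / n), for the case study values p0 = 0.5 and n = 474. The function name `sd_under_null` is an assumption of this example.

```python
import math

def sd_under_null(p0, n):
    """Standard deviation of the sample proportion when the true proportion is p0."""
    return math.sqrt(p0 * (1 - p0) / n)

sd = sd_under_null(0.5, 474)
print(f"standard deviation under H0: {sd:.3f}")
```

The result rounds to 0.023, the value used in the case study's standardized score.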
21 3. Test Statistic for Proportions
To determine if the observed proportion is
unlikely to have occurred under the assumption
that H0 is true, we must first convert the
observed value to a standardized score:
$z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}}$
Case study: Based on the sample
- n = 474 (large, so proportions follow a normal distribution)
- no physical discipline: 51%, so p̂ = 0.51
- standard deviation of p̂: $\sqrt{.50 (1 - .50) / 474} = 0.023$ (.50 is p0 from the null hypothesis)
- standardized score (test statistic): z = (0.51 - 0.50) / 0.023 = 0.43
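The test statistic for the case study can be recomputed as follows. This is a minimal sketch; the function name `z_statistic` is an assumption of this example.

```python
import math

def z_statistic(p_hat, p0, n):
    """Standardized score of an observed proportion p_hat under H0: p = p0."""
    sd = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / sd

z = z_statistic(p_hat=0.51, p0=0.50, n=474)
print(f"z = {z:.3f}")
```

Using the unrounded standard deviation gives z of about 0.435; the slide rounds the standard deviation to 0.023 first, which yields 0.43. The difference does not change the conclusion.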
22 4. P-value
- The P-value is the probability of observing data this extreme or more so in a sample of this size, assuming that the null hypothesis is true.
- A small P-value indicates that the observed data (or relationship) is unlikely to have occurred if the null hypothesis were actually true.
- The P-value tends to be small when there is evidence in the data against the null hypothesis.
23Case Study: P-value
P-value = 0.3446
From the normal distribution table (Table B),
z = 0.4 is the 65.54th percentile, so the area to the right of it is 1 - 0.6554 = 0.3446.
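The table lookup can be reproduced with Python's standard library, where `statistics.NormalDist` (Python 3.8 and later) plays the role of Table B. The variable names are assumptions of this sketch.

```python
from statistics import NormalDist

z = 0.4                           # test statistic rounded to one decimal, as in the table lookup
percentile = NormalDist().cdf(z)  # area to the left of z under the standard normal curve
p_value = 1 - percentile          # area to the right, for the one-sided alternative p > p0

print(f"z = {z} is the {100 * percentile:.2f}th percentile")
print(f"P-value = {p_value:.4f}")
```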
24 5. Decision
- If we think the P-value is too low to believe the observed test statistic is obtained by chance only, then we would reject chance (reject the null hypothesis) and conclude that a statistically significant relationship exists (accept the alternative hypothesis).
- Otherwise, we fail to reject chance and do not reject the null hypothesis of no relationship (result not statistically significant).
Typical Cut-off for the P-value
- Commonly, P-values less than 0.05 are considered to be small enough to reject chance.
- Some researchers use 0.10 or 0.01 as the cut-off instead of 0.05.
- This cut-off value is typically referred to as the significance level α of the test.
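The decision step amounts to comparing the P-value with the chosen cut-off. The sketch below assumes a function name `decide` and a default cut-off of 0.05; both are illustrative choices, and the returned wording is not from the lecture.

```python
def decide(p_value, alpha=0.05):
    """Compare a P-value with the significance level alpha."""
    if p_value < alpha:
        return "reject H0 (statistically significant)"
    return "do not reject H0 (not statistically significant)"

# Case study: P-value 0.3446 at the common 0.05 cut-off.
print(decide(0.3446))
# A smaller P-value can pass one cut-off and fail a stricter one.
print(decide(0.03, alpha=0.05))
print(decide(0.03, alpha=0.01))
```

For the parental discipline case study, a P-value of 0.3446 is far above any of the usual cut-offs, so the sample does not give significant evidence that a majority of parents did not spank or hit.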
25P-value for Testing Proportions
- Ha: p > p0
- P-value is the probability of getting a value as large or larger than the observed test statistic (z) value.
- Ha: p < p0
- P-value is the probability of getting a value as small or smaller than the observed test statistic (z) value.
- Ha: p ≠ p0
- P-value is two times the probability of getting a value as large or larger than the absolute value of the observed test statistic (z) value.
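The three cases can be collected in one function. This is a sketch under stated assumptions: the name `p_value` and the labels "greater", "less", and "two-sided" are chosen for this example, and the normal areas come from the error function instead of a table.

```python
import math

def normal_cdf(z):
    """Area to the left of z under the standard normal curve."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def p_value(z, alternative):
    """P-value of an observed z for the alternative p > p0, p < p0, or p != p0."""
    if alternative == "greater":      # Ha: p > p0
        return 1 - normal_cdf(z)
    if alternative == "less":         # Ha: p < p0
        return normal_cdf(z)
    if alternative == "two-sided":    # Ha: p != p0
        return 2 * (1 - normal_cdf(abs(z)))
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

for alt in ("greater", "less", "two-sided"):
    print(f"z = 0.4, alternative {alt}: P-value = {p_value(0.4, alt):.4f}")
```

The two-sided P-value is always twice the smaller of the two one-sided values, which is why a two-sided test needs stronger evidence to reach the same cut-off.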