Title: Statistical Inference and Hypothesis Testing
1. Statistical Inference and Hypothesis Testing
- Political Science 102
- Introduction to Political Inquiry
- Lecture 18
2. Hypothesis Testing Procedures
3. Factors Influencing Statistical Significance
4. Errors in Hypothesis Testing
- In selecting a threshold for judging statistical significance, we should balance the costs of type I errors (rejecting a true H0) and type II errors (failing to reject a false H0)
- Standard scientific practice is to set the threshold for rejecting H0 at 95% confidence (i.e., the .05 significance level)
- This assumes type II errors are preferable to type I errors (a conservative standard)
- Other thresholds may be more appropriate for different problems or applications
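To make the trade-off concrete, here is a small Python simulation. It is a sketch, not part of the lecture: the function name, sample size, number of trials, and seed are arbitrary choices. It draws samples for which H0 is true and counts how often a two-tailed z-test rejects anyway. The rejection rate tracks the chosen significance level, so a stricter threshold buys fewer type I errors; the cost, not simulated here, is more type II errors when H0 is actually false.

```python
import random
from statistics import NormalDist


def type_i_error_rate(alpha=0.05, n=25, trials=10_000, seed=1):
    """Estimate how often a two-tailed z-test rejects H0 when H0 is TRUE.

    Each trial draws n values from N(0, 1), so H0: mu = 0 holds by
    construction and sigma = 1 is known. Any rejection is a type I error.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 when alpha = .05
    se = 1 / n ** 0.5                             # sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(0, 1) for _ in range(n)) / n
        if abs(xbar / se) > z_crit:
            rejections += 1
    return rejections / trials


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: type I error rate = {type_i_error_rate(alpha):.3f}")
```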
5. Is It a Toss-Up?
6. Hypothesis Tests and Sampling Distributions
- Hypothesis tests depend on estimated confidence intervals for the parameter
- The confidence interval depends on the sampling distribution of the statistic
- Distributions of variables may differ substantially and idiosyncratically
- This seems to make the problem intractable
- But we want the sampling distribution of the mean (or some other estimator statistic)
- NOT the distribution of the parent variable
7. Calculating Confidence Intervals
- The Central Limit Theorem tells us:
- The (suitably standardized) sum of a set of independent random variables with finite variance approaches a normal distribution as n approaches ∞
- The normal distribution is the bell curve
- The mean (and other sample statistics) are sums of random variables (the mean is the sum divided by n)
- So we can rely on the normal distribution to test hypotheses about the value of population parameters based on information from sample statistics
8. An Example of the Central Limit Theorem
- Expected value of a fair coin toss = 0.5
- Heads = 1, Tails = 0
- The distribution of a single coin toss is NOT normal
- The result is either 0 or 1
- This is a Bernoulli distribution
- But the mean of many coin tosses is (approximately) normally distributed!
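The coin-toss example can be checked by simulation. This minimal sketch (the function name, 100 tosses per sample, 5,000 samples, and the seed are illustrative assumptions) shows that a single toss only ever yields 0 or 1, while the mean of 100 tosses centers on 0.5 with a spread close to the theoretical standard error $\sqrt{0.5 \times 0.5 / 100} = 0.05$.

```python
import random
from statistics import mean, pstdev


def coin_toss_means(n_tosses=100, n_samples=5_000, seed=42):
    """Mean of n_tosses fair-coin flips (Heads = 1, Tails = 0), repeated n_samples times."""
    rng = random.Random(seed)
    return [sum(rng.random() < 0.5 for _ in range(n_tosses)) / n_tosses
            for _ in range(n_samples)]


single = coin_toss_means(n_tosses=1)   # one toss per sample: Bernoulli outcomes
many = coin_toss_means(n_tosses=100)   # mean of 100 tosses per sample

print(sorted(set(single)))             # [0.0, 1.0]
print(round(mean(many), 3))            # close to the expected value 0.5
print(round(pstdev(many), 3))          # close to sqrt(0.5 * 0.5 / 100) = 0.05
```

Plotting a histogram of `many` (not done here to keep the sketch dependency-free) would show the familiar bell shape.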
9. Statistical Inference
10. Hypothesis Tests
- Our theories give us hypotheses about population parameters, e.g., $\mu_X > \mu_Y$
- Or: the value of X has a positive impact on the value of Y
- We can estimate sample statistics, e.g., the means of X and Y in our sample
- We need a way to assess the validity of statements about population parameters
- We can use our sample estimators and their variances, together with probability theory, to test such statements
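11. Z-Scores and Hypothesis Tests
- We know that $\bar{X} \sim N(\mu_X, \sigma_{\bar{X}})$, where $\sigma_{\bar{X}} = \sigma_X/\sqrt{n}$ is the standard error of the mean (here the second argument of N is a standard deviation)
- Subtracting $\mu_X$ centers the distribution at zero: $\bar{X} - \mu_X \sim N(0, \sigma_{\bar{X}})$
- If we then divide by the standard deviation, we get $(\bar{X} - \mu_X)/\sigma_{\bar{X}} \sim N(0, 1)$
- This variable is a z-score based on the standard normal distribution
- 95% of cases are within 1.96 standard deviations of the mean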
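The 1.96 cutoff can be recovered from the standard normal distribution with Python's standard library. This sketch also checks that standardizing does not change the probability: for a hypothetical sampling distribution $\bar{X} \sim N(10, 2)$ (values chosen only for illustration), the interval $10 \pm 1.96 \times 2$ holds the same 95% of the mass.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Critical value leaving 2.5% in each tail of N(0, 1)
z_crit = std_normal.inv_cdf(0.975)
print(round(z_crit, 2))  # 1.96

# Probability mass within +/- 1.96 standard deviations
coverage = std_normal.cdf(1.96) - std_normal.cdf(-1.96)
print(round(coverage, 3))  # 0.95

# Same probability on the original scale of a hypothetical X-bar ~ N(10, 2)
sampling_dist = NormalDist(mu=10, sigma=2)
lo, hi = 10 - 1.96 * 2, 10 + 1.96 * 2
print(round(sampling_dist.cdf(hi) - sampling_dist.cdf(lo), 3))  # 0.95
```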
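12. Z-Scores and Hypothesis Tests
- We can use this to test the hypothesis that $\mu_X$ equals a particular value, given our sample mean
- The same logic will be used later to test for a relationship between X and Y
- If $|\bar{X} - \mu_0| / \sigma_{\bar{X}} > 1.96$, where $\mu_0$ is the hypothesized value,
- then we reject $H_0\colon \mu_X = \mu_0$ at the .05 level: if $H_0$ were true, a sample mean this far from $\mu_0$ would occur less than 5% of the time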
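The decision rule above can be written as a short function. This is a sketch under the slide's assumption that the standard error is known; the function name `z_test` and the example numbers (sample mean 10.5, hypothesized mean 10, standard error 0.2) are hypothetical. It also returns the two-tailed p-value, which is the probability of a z at least this extreme if H0 were true, not the probability that H0 is true.

```python
from statistics import NormalDist


def z_test(xbar, mu0, se, alpha=0.05):
    """Two-tailed z-test of H0: mu = mu0 when the standard error se is known.

    Returns (z, p_value, reject_h0).
    """
    z = (xbar - mu0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # P(|Z| >= |z|) under H0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = .05
    return z, p_value, abs(z) > z_crit


# Hypothetical numbers: sample mean 10.5, hypothesized mean 10, standard error 0.2
z, p, reject = z_test(10.5, 10, 0.2)
print(round(z, 2), round(p, 4), reject)  # 2.5 0.0124 True
```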
13. Hypothesis Testing
- How do you test hypotheses with statistics?
- Comparing the means of two groups
- Consider an experiment
- Research hypothesis: $\mu_1 \neq \mu_2$
- Null hypothesis: $\mu_1 = \mu_2$
14. Hypothesis Testing
- Hypothesis: College students read fewer political news stories per week than other voting-age citizens.
- $\bar{X}$ (college mean) = 5
- $\mu$ (population mean) = 10
- $\sigma$ (college std. dev.) = 2
- $n$ = 25
- $z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
15. Hypothesis Testing
- Using the values from the previous slide: $z = \dfrac{5 - 10}{2/\sqrt{25}} = \dfrac{-5}{0.4} = -12.5$
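16. Hypothesis Testing
- 95% confidence
- Critical value: $z = \pm 1.96$ (two-tailed; a one-tailed test of "fewer" at the .05 level would use $-1.645$)
- Observed: $z = -12.5$
- Since $|-12.5| > 1.96$, we reject the null hypothesis that the college mean equals the population mean of 10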
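The worked example can be reproduced in a few lines. The data values come from the slides; computing the one-tailed cutoff alongside the two-tailed one is an addition for comparison, since the research hypothesis ("fewer") is directional. Either way, z = −12.5 falls far beyond the cutoff.

```python
import math
from statistics import NormalDist

# Values from the slides
xbar = 5      # college mean
mu = 10       # population mean
sigma = 2     # college std. dev.
n = 25

se = sigma / math.sqrt(n)   # 2 / 5 = 0.4
z = (xbar - mu) / se        # -5 / 0.4 = -12.5

two_tailed_crit = NormalDist().inv_cdf(0.975)  # about 1.96
one_tailed_crit = NormalDist().inv_cdf(0.05)   # about -1.645 (lower tail, for "fewer")

print(round(z, 2))                                    # -12.5
print(abs(z) > two_tailed_crit, z < one_tailed_crit)  # True True
```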
17. Z-Scores and t-Scores
- The problem with using z-scores to test hypotheses is that we generally do not know the true variance of X
- The obvious solution is to substitute the SAMPLE variance of X
- Problem: the sample mean of X (minus $\mu$) divided by its estimated standard error is a ratio of two random variables, so it is not normally distributed
- Fortunately, William Sealy Gosset, an employee of the Guinness Brewery, worked out this distribution and published it in 1908 under the pen name "Student"
- The statistic is called Student's t; the t-distribution looks similar to a normal distribution but has heavier tails
18. The t-Statistic
- More generally, $(\bar{X} - \mu_X)/\hat{\sigma}_{\bar{X}} \sim t(n-k)$
- k is the number of parameters estimated
- Note the addition of a degrees-of-freedom constraint
- Degrees of freedom reflect how much independent information we have
- We lose one piece of independent information each time we estimate a parameter (e.g., computing the sample variance of X first requires estimating its mean, so a one-sample t test has n − 1 degrees of freedom)
- Thus, the more data points we have relative to the number of parameters we are trying to estimate, the more the t distribution looks like the z distribution
- When N > 100 the difference is negligible
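A simulation illustrates both points from the last two slides: plugging the sample standard deviation into the z formula and using the 1.96 cutoff rejects a true H0 too often when n is small (the heavier tails of t), and the excess disappears as n grows. This is a sketch; the function name, the sample sizes 5, 30, and 200, the trial count, and the seed are all illustrative choices.

```python
import math
import random


def false_rejection_rate(n, trials=4_000, crit=1.96, seed=7):
    """Share of trials in which |t| > crit although H0: mu = 0 is TRUE.

    Each trial draws n values from N(0, 1) and standardizes the sample mean
    with the SAMPLE standard deviation (n - 1 in the denominator). If this
    statistic were normal, the rate would be about 5%.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        m = sum(sample) / n
        s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
        t = m / (s / math.sqrt(n))
        if abs(t) > crit:
            rejections += 1
    return rejections / trials


for n in (5, 30, 200):
    print(f"n = {n:3d}: rejection rate with the z cutoff = {false_rejection_rate(n):.3f}")
```

With n = 5 the rate is well above 5%, which is why small samples need the larger t critical values.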
19. A Real-World Example: Feminists vs. Environmentalists
. sum v5072 v5059

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       v5072 |      1043     66.0326    20.19167          0        100
       v5059 |      1028    56.33171    21.68065          0        100
20. A Real-World Example: Feminists vs. Environmentalists
One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
   v5059 |    1028    56.33171    .6762009    21.68065    55.00482    57.65861
------------------------------------------------------------------------------
    mean = mean(v5059)                                            t =   9.3637
Ho: mean = 50                                    degrees of freedom =     1027

   Ha: mean < 50               Ha: mean != 50               Ha: mean > 50
 Pr(T < t) = 1.0000        Pr(|T| > |t|) = 0.0000        Pr(T > t) = 0.0000
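The one-sample t statistic in this output can be reproduced from the summary statistics alone. A minimal sketch (the function name `one_sample_t` is hypothetical): the standard error is the standard deviation divided by $\sqrt{n}$, t is the distance from 50 in standard-error units, and the degrees of freedom are n − 1. The interval below uses the normal critical value 1.96 as an approximation; Stata uses the t critical value for 1027 degrees of freedom, which is slightly larger, so its interval is very slightly wider.

```python
import math
from statistics import NormalDist


def one_sample_t(mean_x, sd_x, n, mu0):
    """One-sample t statistic from summary statistics.

    Returns (standard error, t, degrees of freedom).
    """
    se = sd_x / math.sqrt(n)
    t = (mean_x - mu0) / se
    return se, t, n - 1


# Summary statistics for v5059 from the output above; H0: mean = 50
se, t, df = one_sample_t(56.33171, 21.68065, 1028, 50)
print(f"se = {se:.7f}, t = {t:.3f}, df = {df}")

# Normal-approximation 95% interval (Stata's t-based interval is a bit wider)
z_crit = NormalDist().inv_cdf(0.975)
print(round(56.33171 - z_crit * se, 3), round(56.33171 + z_crit * se, 3))
```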
21. A Real-World Example: Feminists vs. Environmentalists
Paired t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
   v5059 |    1021    56.34574    .6790026    21.69623    55.01334    57.67814
   v5072 |    1021     66.0431    .6326196    20.21415    64.80171    67.28448
---------+--------------------------------------------------------------------
    diff |    1021   -9.697356     .663641    21.20538   -10.99961   -8.395098
------------------------------------------------------------------------------
 mean(diff) = mean(v5059 - v5072)                             t = -14.6124
 Ho: mean(diff) = 0                              degrees of freedom =     1020

 Ha: mean(diff) < 0           Ha: mean(diff) != 0           Ha: mean(diff) > 0
 Pr(T < t) = 0.0000        Pr(|T| > |t|) = 0.0000        Pr(T > t) = 1.0000
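A paired t test is a one-sample t test on each respondent's difference between the two ratings, tested against zero. The sketch below assumes complete pairs (Stata reports 1,021 observations here, presumably the respondents who answered both questions); the function name `paired_t` and the four-respondent toy data are hypothetical, not survey data. The last lines check the formula against the `diff` row of the output.

```python
import math


def paired_t(x, y):
    """Paired t-test of H0: mean(x - y) = 0.

    Computes each respondent's difference and runs a one-sample t-test on
    those differences. Assumes x and y are complete, aligned pairs.
    Returns (t, degrees of freedom).
    """
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    d_bar = sum(diffs) / n
    sd = math.sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))
    return d_bar / (sd / math.sqrt(n)), n - 1


# Hypothetical ratings from four respondents (not survey data)
t_toy, df_toy = paired_t([60, 70, 55, 80], [65, 72, 60, 85])
print(round(t_toy, 3), df_toy)  # -5.667 3

# Check against the diff row above: mean -9.697356, sd 21.20538, n 1021
t_slide = -9.697356 / (21.20538 / math.sqrt(1021))
print(round(t_slide, 2))  # -14.61
```

Pairing matters because the same respondents rate both groups: differencing removes each person's general tendency to give high or low ratings.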