Title: STAT131 W6La Association from Contingency Tables
1 STAT131 W6La Association from Contingency Tables
- by Anne Porter
- alp_at_uow.edu.au
2 Null and Alternative hypotheses: Activity
3 Activity Outcomes
- We draw cards from a pack until there is a protest.
- The cards have been stacked so that all the red cards (or all the black cards) come first.
- The draw is meant to be random, i.e. a mix of red and black.
- At some point students reject the idea of fairness.
- The proportion of reds (or blacks, depending on which was drawn first) is higher than expected by chance.
- Students are in fact rejecting the null hypothesis that the proportion of red cards is 0.5 (or that the proportions of red and black cards are equal).
- They are accepting the alternative hypothesis that the proportion is not equal to 0.5.
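A rough sketch of why the protest comes after only a few cards: under the null hypothesis the proportion of red is 0.5, so a run of k reds has probability 0.5^k. This treats the draws as independent, which is an approximation (a real pack is drawn without replacement), and the function name is purely illustrative.

```python
def prob_run_of_reds(k: int, p_red: float = 0.5) -> float:
    """P(k red cards in a row), treating draws as independent with P(red) = p_red."""
    return p_red ** k


# How quickly does a run of reds become implausible under the null hypothesis?
for k in range(1, 8):
    print(f"{k} reds in a row: {prob_run_of_reds(k):.4f}")
```

Under these assumptions the probability first drops below 0.05 at the fifth red card in a row.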
4 Null and Alternative hypotheses
- Null hypothesis: the proportion of red cards (females) is 0.5 (or the proportions of red and black cards are equal).
- Alternative hypothesis: the proportion of red cards (females) is not equal to 0.5.
5 Null and Alternative hypotheses: formal
Tests of proportions
- H0: p = 0.5 and
- HA: p ≠ 0.5
- The p we refer to is the population proportion.
- We do not hypothesise about a sample proportion.
- We make inference about a population parameter p.
6 Lecture Outline
- Test hypotheses about association between categorical variables
- Testing Hypotheses (5 steps)
- Null and alternative hypotheses
- α level of significance
- Select test and state decision rule
- Perform experiment
- Draw conclusions
- test of association AND
- model fit
- p values
7 Contingency tables
- For this contingency table what is
- P(Male) = 20/70
- P(Support) = 40/70
8 Contingency tables
- If event Male is independent of event Support then
- P(Male and Support) = P(Male) x P(Support) = 20/70 x 40/70 ≈ 0.1633
9 Contingency tables
- Given 70 observed people, if P(Male and Support) ≈ 0.1633
- How many are expected to be male and support given independence?
- 0.1633 x 70 = 11.43 if events Male and Support are independent
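The arithmetic on the last two slides can be checked with a few lines of Python. This is only a sketch using the marginal totals quoted above (20 males and 40 supporters out of 70); the variable names are illustrative.

```python
n = 70               # number of people observed
male_total = 20      # P(Male) = 20/70
support_total = 40   # P(Support) = 40/70

# Under independence the joint probability is the product of the marginals.
p_male_and_support = (male_total / n) * (support_total / n)

# Expected count = joint probability x sample size.
expected_male_and_support = p_male_and_support * n

print(round(p_male_and_support, 4))
print(round(expected_male_and_support, 2))
```

The same expected count comes from the shortcut (row total x column total)/grand total = 20 x 40 / 70.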
10 Contingency tables
- Knowing the expected frequency for (male and support) we have no more degrees of freedom; the remaining values are fixed.
- 11.43
- 20 - 11.43 = 8.57
- 30 - 8.57 = 21.43
- 40 - 11.43 = 28.57
Note: We had 1 degree of freedom
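To see that one expected count pins down the other three, the subtraction steps can be written as a small helper. This is a sketch only: the function name and the cell ordering are assumptions, and the totals 20, 40 and 30 are the marginal totals used in the subtractions above.

```python
def fill_two_by_two(e11, row1_total, col1_total, col2_total):
    """Fill a 2 x 2 table of expected counts from one cell and the margins."""
    e12 = row1_total - e11   # rest of the first row
    e21 = col1_total - e11   # rest of the first column
    e22 = col2_total - e12   # rest of the second column
    return [[e11, e12], [e21, e22]]


table = fill_two_by_two(11.43, row1_total=20, col1_total=40, col2_total=30)
for row in table:
    print([round(cell, 2) for cell in row])
```

Only `e11` is free to vary here, which is what "1 degree of freedom" means for a 2 x 2 table with fixed margins.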
11 Contingency tables
- If we observe a sample of data we may ask whether the variables sex and level of support are associated. To test this we formally test the hypotheses.
- Expected counts (E): 11.43, 8.57, 28.57, 21.43
12 Hypotheses: no association
- H0: under the model of independence, E is distributed as (row total x column total)/grand total
- Ha: E is not distributed as (row total x column total)/grand total
- Expected counts (E): 8.57, 11.43, 28.57, 21.43
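The rule (row total x column total)/grand total can be applied to every cell at once. The sketch below assumes marginal totals of 20 and 50 for one variable and 40 and 30 for the other (50 is simply 70 - 20); the function name is illustrative.

```python
def expected_counts(row_totals, col_totals):
    """Expected counts under independence: (row total x column total) / grand total."""
    grand_total = sum(row_totals)
    if grand_total != sum(col_totals):
        raise ValueError("row and column totals must add to the same grand total")
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_counts(row_totals=[20, 50], col_totals=[40, 30])
for row in expected:
    print([round(e, 2) for e in row])
```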
13 2. Assign α
- α is determined such that we have a desired level of confidence in our procedures (i.e. in our results).
- For the chi-square test for association we will use α = 0.05
- We will examine choosing alpha (α) later
14 Degrees of freedom
- Knowing the expected frequency for (male and support) we have no more degrees of freedom; the remaining values are fixed.
- 11.43
- 20 - 11.43 = 8.57
- 21.43
- 28.57
Note: We had 1 degree of freedom
15 Degrees of freedom
- The degrees of freedom for a rows x columns table may be calculated as (r - 1) x (c - 1) = (2 - 1) x (2 - 1) = 1
- r is the number of rows and c is the number of columns
- Expected counts: 11.43, 8.57, 21.43, 28.57
Note: We had 1 degree of freedom
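As a minimal sketch, the degrees-of-freedom rule is a one-line function (the name is illustrative). It is applied here to the 2 x 2 table above and to a 3 x 2 table of the kind used in the ear infection example later in the lecture.

```python
def degrees_of_freedom(n_rows: int, n_cols: int) -> int:
    """Degrees of freedom for a test of association: (r - 1) x (c - 1)."""
    return (n_rows - 1) * (n_cols - 1)


print(degrees_of_freedom(2, 2))  # 2 x 2 table
print(degrees_of_freedom(3, 2))  # 3 x 2 table
```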
16 Hypotheses: no association
- H0: under the model of independence, E is distributed as (row total x column total)/grand total
- Ha: E is not distributed as (row total x column total)/grand total
- Expected counts (E): 8.57, 11.43, 28.57, 21.43
17 3. Select a test statistic and... determine the rejection region
- To test for association in contingency tables we calculate χ² = Σ (oij - eij)² / eij, summed over all cells of the table
- And determine the region of rejection, i.e. how big chi-square has to be before we conclude that the observed counts are sufficiently different from the expected counts to reject the null hypothesis
- oij = observed count for the ith row and jth column of the table
- eij = expected count for the ith row and jth column of the table
18 3... determine the rejection region
- For our contingency table
- df = 1, α = 0.05
- If the calculated χ² > 3.841 then reject H0: there is evidence that the variables are not independent
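The tabulated value 3.841 can be reproduced without a table. For 1 degree of freedom a chi-square variable is the square of a standard normal variable, so the critical value is the square of the two-sided normal critical value. The sketch below relies on that fact and only covers df = 1; the function name is illustrative.

```python
from statistics import NormalDist


def chi2_critical_df1(alpha: float = 0.05) -> float:
    """Upper-tail critical value of chi-square with 1 df, via chi-square(1) = Z squared."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided normal critical value
    return z * z


print(round(chi2_critical_df1(0.05), 3))
```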
20 4. Calculate
- Expected counts (E): 11.43, 8.57, 28.57, 21.43
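A sketch of the calculation step follows. The observed table is not reproduced in the text, so the counts 13, 7, 27 and 23 are an assumed reconstruction chosen to agree with the margins 20/70 and 40/70 and with the fractions 13/40 and 7/30 quoted on the Decision slide; the row and column layout is a guess. The statistic itself is the usual sum of (observed - expected)² / expected.

```python
def chi_square_statistic(observed):
    """Pearson chi-square: sum over cells of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected under independence
            statistic += (o - e) ** 2 / e
    return statistic


# Assumed reconstruction of the observed counts (see the note above).
observed = [[13, 7],
            [27, 23]]
print(round(chi_square_statistic(observed), 3))
```

With these assumed counts the result rounds to 0.706, the value reported in the SPSS output later in the lecture.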
21 Decision
- As the calculated value of 0.70 < 3.841 (tabulated value), there is insufficient evidence to reject the model that sex and level of support are independent. That is, there is no evidence of an association between sex and level of support. The profile of support by males is similar to the profile of support for females: 13/40 (32.5%) males support, 7/30 (23.3%) females support.
22 SPSS data entry looks like
- Data, weight cases by freq has been selected
- Analyse, Descriptives, Crosstabs and options have
been selected
23 SPSS output contingency table
24 SPSS output Pearson Chi-Square
Value of chi-square
Assumption of expected frequencies > 5 holds
25 SPSS output Pearson Chi-Square
The probability of getting a statistic as high as or higher than 0.706 is 0.401. This is high (> 0.05), therefore retain H0: we can get this chi-square value by chance under independence.
Value of chi-square
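The p value 0.401 can be checked by hand for 1 degree of freedom, because P(chi-square > x) equals P(|Z| > sqrt(x)) for a standard normal Z, which is erfc(sqrt(x / 2)). The sketch below uses that identity and is valid only for df = 1; the function name is illustrative and this is not how SPSS is documented to compute it.

```python
import math


def chi2_p_value_df1(statistic: float) -> float:
    """Upper-tail p value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


p = chi2_p_value_df1(0.706)
print(round(p, 3))
print("retain H0" if p > 0.05 else "reject H0")
```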
26 Example from Utts p. 528: SPSS data
- Yes / No: ear infection
- P: placebo gum
- X: xylitol gum
- L: xylitol lozenge
- Is there an association between ear infection and gum used?
27 Under Independence: Expected frequency
28 Under Independence: Expected
29 Under Independence: Expected
Degrees of freedom = 2
30 Hypotheses
- H0: under the model of independence, E is distributed as (row total x column total)/grand total
- Ha: E is not distributed as (row total x column total)/grand total
- Equivalently, with
- p1 = proportion who get an infection in a population given placebo
- p2 = proportion who get an infection in a population given xylitol gum
- p3 = proportion who get an infection in a population given xylitol lozenges
- H0: p1 = p2 = p3
- Ha: p1, p2 and p3 are not all the same
31 5 step hypothesis test
- H0: under the model of independence, E is distributed as (row total x column total)/grand total
- Ha: E is not distributed in this manner
- α = 0.05
- df = (3 - 1) x (2 - 1) = 2
- Statistic and region of rejection: if the calculated chi-square > 5.991, reject H0: there is evidence that the variables are not independent
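The critical value 5.991 also has a closed form. With 2 degrees of freedom the upper-tail probability of chi-square is exp(-x / 2), so the value cutting off an area α is -2 ln(α). The sketch below is specific to df = 2, and the function name is illustrative.

```python
import math


def chi2_critical_df2(alpha: float = 0.05) -> float:
    """Upper-tail critical value of chi-square with 2 df: solve exp(-x / 2) = alpha."""
    return -2 * math.log(alpha)


df = (3 - 1) * (2 - 1)  # 3 treatments x 2 outcomes
print(df, round(chi2_critical_df2(0.05), 3))
```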
32 Conclusion using decision rule: SPSS
The calculated chi-square is > 5.991, therefore there is evidence that the data do not fit the model of independence
33 P values (sig)
- For the chi-square test (one tailed) the p value is the probability of getting this statistic or greater
34 Conclusion using p value from SPSS
The probability of getting a chi-square as high as this or higher is 0.035. This is a small probability (< 0.05) if H0 were true. There is evidence of an association between infection and gum used.
Assumptions re expected frequency > 5: OK
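The decision rule and the p value approach always agree, and a short sketch shows why for this example. With 2 degrees of freedom the p value is exp(-x / 2), so a statistic above 5.991 gives p < 0.05 and a statistic below it gives p > 0.05. The function name and the trial statistics 5.0 and 7.0 are illustrative only; they are not the values from the SPSS output.

```python
import math


def chi2_p_value_df2(statistic: float) -> float:
    """Upper-tail p value for a chi-square statistic with 2 degrees of freedom."""
    return math.exp(-statistic / 2)


# Illustrative statistics either side of the critical value 5.991.
for statistic in (5.0, 5.991, 7.0):
    p = chi2_p_value_df2(statistic)
    decision = "reject H0" if p < 0.05 else "retain H0"
    print(f"chi-square = {statistic}: p = {p:.3f} -> {decision}")
```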
35 Significance Tests - Formal
- 1. Null and alternative hypotheses
- 2. Assign α
- 3. Select a statistic and determine the rejection region
- 4. Perform the experiment and calculate the observed value of χ² or T or Z or other statistic
- 5. Draw conclusions in context of problem
36 Previous hypothesis testing situations
Model fit
- 1. H0: Expected distributed Binomial(2, 0.5)
- Ha: Expected not distributed Binomial(2, 0.5)
- 2. H0: Expected distributed Poisson(0.4)
- Ha: Expected not distributed Poisson(0.4)
- 3. H0: Expected distributed as per the random stopping model
- Ha: Expected not distributed as per the random stopping model
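For the first two model-fit situations, the expected frequencies come from the model's probabilities multiplied by the sample size. The sketch below assumes a hypothetical sample of 100 observations (not a figure from the lecture) and uses the standard binomial and Poisson probability formulas; the function names are illustrative.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


sample_size = 100  # hypothetical sample size, for illustration only

# Expected frequencies of 0, 1 and 2 successes under Binomial(2, 0.5).
binomial_expected = [sample_size * binomial_pmf(k, 2, 0.5) for k in range(3)]
print(binomial_expected)

# Expected frequencies of 0, 1, 2 and 3 events under Poisson(0.4).
poisson_expected = [sample_size * poisson_pmf(k, 0.4) for k in range(4)]
print([round(e, 2) for e in poisson_expected])
```

These expected frequencies would then be compared with the observed frequencies using the same chi-square statistic as in the test of association.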
37 Future hypothesis testing situations
Tests of proportions
- the null hypothesis may be proportion = 0.5 and
- alternative hypothesis proportion ≠ 0.5
Tests of means
- the null hypothesis may be μ = 0 and
- alternative hypothesis μ ≠ 0.