Lab 7 - PowerPoint PPT Presentation

About This Presentation
Title:

Lab 7

Description:

Chi-Square. Overview. Frequency Data. Two Uses of χ². χ² Test of Goodness of Fit ... Tests of goodness of fit are used to determine if the distribution (shape) of a ...

Slides: 56
Provided by: asz3

Transcript and Presenter's Notes

Title: Lab 7


1
Lab 7
  • Chi-Square

2
Overview
  • Frequency Data
  • Two Uses of χ²
  • χ² Test of Goodness of Fit
  • χ² Test of Association
  • Assumptions of χ²
  • SPSS Crosstabs
  • Post-Hoc Interpretations
  • Assignment 7

3
Frequency Data
4
Frequency Data
  • What are frequency (or count) data?
  • There are two types of data in the world
  • Categorical
  • Continuous

5
Frequency Data
  • In all of the previous labs, we have dealt with
    continuous data (as the DV)
  • Also called measurement data because the data
    are derived from a measurement process (i.e.,
    measuring each participant's score on a specific
    variable)
  • For example, height, IQ, self-esteem,
    neuroticism, etc.
  • These variables are continuous because there are
    no distinct breaks between numbers in the scale;
    in other words, fractional values are meaningful

6
Frequency Data
  • But what if we want to compare different kinds of
    things, not just degrees of difference on the
    same dimension?
  • In this case, we have to use categorical data
  • For example, eye color, sex, university major,
    etc.
  • These variables are categorical because they
    identify discrete categories that have no
    numerical relationship
  • Thus, the only meaningful measure is the
    frequency of observations of each category in a
    given sample, hence the term "frequency data"

7
Two Uses of χ²
8
Two Uses of χ²
  • The general logic of χ² involves the comparison
    between expected and observed frequencies of a
    category

χ² = Σ (O - E)² / E, where O = Observed frequency and E = Expected frequency
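As a rough illustration of that comparison, the statistic sums the squared difference between each observed and expected frequency, divided by the expected frequency. The Python sketch below assumes plain lists of counts; the function name and the numbers are illustrative and are not taken from the lab.

```python
def chi_square(observed, expected):
    """Pearson chi-square: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for two categories (not from the lab data).
observed = [10, 20]
expected = [15, 15]
statistic = chi_square(observed, expected)
print(round(statistic, 2))
```

A larger value means the observed counts sit further from the expected counts.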
9
Two Uses of χ²
  • As with all significance tests, the computed χ²
    is then compared to a critical χ² (e.g., at α =
    .05)
  • The critical value is found in the χ²
    distribution adjusted for the degrees of freedom

10
Two Uses of χ²
  • There are two common uses of the χ² significance
    test
  • Tests of goodness of fit are used to determine if
    the distribution (shape) of a sample matches an
    expected/theoretical distribution
  • For example, is the distribution of class marks
    as expected?
  • Tests of association are used to determine if
    there is a relation between two categorical
    variables
  • For example, is there an association between sex
    (M vs. F) and smoking (Y vs. N) in a given
    sample?

11
Two Uses of χ²
  • Note that both tests are essentially the same, in
    that they assess the match between expected and
    observed frequencies of particular categories, but
  • With goodness of fit tests, you usually want the
    χ² to be non-significant, which indicates a good
    match to the expected distribution (i.e., the
    difference is not significant)
  • With tests of association, you usually want the
    χ² to be significant, which indicates a
    significant relation between two categorical
    variables

12
χ² Test of Goodness of Fit
13
χ² Test of Goodness of Fit
  • In goodness of fit tests, your observed
    frequencies come from one variable
  • For example, marks A, B, C, D, F
  • Expected frequencies come from an
    expected/theoretical distribution
  • For example: 10 A, 30 B, 30 C, 25 D, 5 F

14
χ² Test of Goodness of Fit
  • The same formula is used to compute χ²
  • For goodness of fit tests, df = (number of
    categories - 1)

15
χ² Test of Goodness of Fit
  • χ²(4) = 8.61, ns
  • Therefore, the observed frequencies of marks are
    not significantly different from the expected
    frequencies
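A minimal Python sketch of the goodness of fit steps follows. It reads the expected values 10, 30, 30, 25, 5 as counts out of 100 students, which is an assumption, and the observed counts are invented for illustration because the lab's data are not reproduced in the transcript. The p-value helper uses a closed form that only holds for an even number of degrees of freedom.

```python
import math


def chi_square(observed, expected):
    """Pearson chi-square: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_even_df(chi2, df):
    """Upper-tail probability of a chi-square value when df is even.

    Uses the closed form exp(-x/2) * sum((x/2)^k / k!) for k < df/2.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles positive, even df")
    half = chi2 / 2
    return math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )


# Expected counts from the slide, read as counts out of 100 (assumption).
expected = [10, 30, 30, 25, 5]
# Hypothetical observed counts for marks A, B, C, D, F (not the lab's data).
observed = [14, 33, 27, 18, 8]

df = len(observed) - 1  # number of categories - 1
chi2 = chi_square(observed, expected)
p_hypothetical = p_value_even_df(chi2, df)

# The statistic reported on the slide: chi-square(4) = 8.61.
p_reported = p_value_even_df(8.61, 4)
print(df, round(chi2, 2), round(p_hypothetical, 3), round(p_reported, 3))
```

Under these assumptions the reported χ²(4) = 8.61 has a p-value above .05, which agrees with the "ns" on the slide.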

16
χ² Test of Association
17
χ² Test of Association
  • Let's say you work at a clothing store in the
    mall
  • One day, you start to wonder if people tend to
    buy clothes that match their eye color
  • In other words, is there an association between
    preferred clothing color and eye color?
  • Because both of these variables are categorical,
    we have to test this association using χ²

18
χ² Test of Association
  • As it happens, your store is holding a t-shirt
    sale, and the shirts come in four colors
  • Brown, Blue, Green, and Black
  • For one week, you record the eye color of
    everyone who purchases a t-shirt (N = 200) and
    the color of the shirt

19
χ² Test of Association
Observed Frequencies
20
χ² Test of Association
  • To begin, you need to figure out what the
    expected number of t-shirts purchased would be
    for each eye/shirt combination
  • If there is no association, the proportion of any
    shirt color to any eye color would be equal to
    the proportion of that shirt color to the total
    number of shirts in the sample

21
χ² Test of Association
  • For example, to figure out the expected frequency
    of brown shirts purchased by brown-eyed people,
    multiply the brown-eye total by the brown-shirt
    total and divide by N
  • In this case, E11 = 15.90, which is quite a bit
    lower than the observed frequency of 27
  • Now compute every expected frequency the same way
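The expected frequency for every cell can be sketched in Python as row total times column total divided by N. The observed table from the slides is not reproduced in the transcript, so the 2 x 2 counts below are hypothetical placeholders.

```python
def expected_frequencies(table):
    """Expected count per cell under no association.

    Each cell is (row total * column total) / N.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical 2 x 2 table of observed counts (not the eye/shirt data).
observed = [[30, 10],
            [20, 40]]
expected = expected_frequencies(observed)
print(expected)
```

The expected counts always add up to the same row and column totals as the observed counts.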

22
χ² Test of Association
Expected Frequencies
23
χ² Test of Association
  • To calculate the χ², you plug the observed and
    expected frequencies for each combination into
    the formula and add up all of the values
  • For example, starting with the brown/brown
    combination: (27 - 15.90)² / 15.90 = 7.75

24
χ² Test of Association
  • For tests of association, the degrees of freedom
    = (number of rows - 1) x (number of columns - 1)
  • In our case, df = 3 x 3 = 9
  • Now, compare your calculated χ² against the
    critical value at 9 df
  • There is a significant association between eye
    color and color of shirt purchased, χ²(9) =
    54.31, p < .001
  • NOTE: This is not a causal relationship, just an
    association between two variables; nothing was
    manipulated
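The whole test of association can be sketched in a few lines of Python. The table below is hypothetical because the eye/shirt counts are not in the transcript. The critical value 16.92 (df = 9, α = .05) is taken from a standard χ² table rather than computed.

```python
def association_test(table):
    """Return (chi-square, df) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected frequency
            chi2 += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df


# Hypothetical 2 x 2 counts, not the eye/shirt data.
chi2, df = association_test([[30, 10], [20, 40]])

# Degrees of freedom for a 4 x 4 table such as the eye/shirt example.
df_4x4 = (4 - 1) * (4 - 1)

# Tabled critical value for df = 9 at alpha = .05.
CRITICAL_DF9_05 = 16.92
reported_is_significant = 54.31 > CRITICAL_DF9_05
print(round(chi2, 2), df, df_4x4, reported_is_significant)
```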

25
Assumptions of χ²
26
Assumptions of χ²
  • There are two major assumptions underlying χ²
  • 1) The sampling distributions of the deviations
    of the observed frequencies from the expected
    frequencies (i.e., O - E) are normal
  • This assumption is satisfied if expected values
    are not too small
  • Potential for violation if df = 1
  • In other words, if the table was a 2 x 2, you
    might need to use the Yates correction (a more
    conservative estimate of χ²)
  • See p. 170 for details on the Yates correction
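For reference, the Yates correction is usually written as subtracting 0.5 from each absolute deviation |O - E| before squaring. The Python sketch below assumes that common form; the textbook page cited above may present it differently, and the counts are hypothetical.

```python
def yates_chi_square(table):
    """Chi-square for a 2 x 2 table with the Yates continuity correction."""
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("the Yates correction applies to 2 x 2 tables")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / n
            # Shrink each absolute deviation by 0.5 before squaring.
            chi2 += (abs(table[i][j] - e) - 0.5) ** 2 / e
    return chi2


# Hypothetical 2 x 2 counts (not from the lab).
corrected = yates_chi_square([[30, 10], [20, 40]])
print(round(corrected, 2))
```

For this table the uncorrected χ² is about 16.67, so the corrected value is smaller, which is what "more conservative" means here.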

27
Assumptions of χ²
  • There are two major assumptions underlying χ²
  • 2) Each individual can only contribute once to
    the frequency count (independent observations)

28
SPSS Crosstabs
29
SPSS Crosstabs
  • The easiest way to enter frequency data is to use
    the weighting option in SPSS
  • Create a variable for both of your measurements
    (e.g., eye_color and shirt_color) and define the
    value labels
  • Create a variable for the observed frequency and
    enter your data

30
SPSS Crosstabs
31
SPSS Crosstabs
  • From the Data menu, select Weight Cases and
    tell SPSS to weight the cases by your frequency
    variable
  • This is an easy way to tell SPSS how many
    observations belong in each category
  • Once the data are entered and weighted, select
    Crosstabs from the Analyze > Descriptive
    Statistics menu
  • See pp. 172-173 for details on running the
    analysis
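The weighting idea can be sketched outside SPSS as well. The Python below is only an analogy, not SPSS syntax: each row holds one eye/shirt combination plus its frequency, and the frequencies are summed into a table of counts. The values are hypothetical.

```python
from collections import defaultdict


def crosstab(weighted_rows):
    """Build a table of counts from (row category, column category, frequency)."""
    table = defaultdict(lambda: defaultdict(int))
    for row_cat, col_cat, freq in weighted_rows:
        table[row_cat][col_cat] += freq
    return {r: dict(cols) for r, cols in table.items()}


# Hypothetical frequencies: one row per eye_color / shirt_color combination.
rows = [
    ("brown", "brown", 30),
    ("brown", "blue", 10),
    ("blue", "brown", 20),
    ("blue", "blue", 40),
]
counts = crosstab(rows)
total = sum(freq for _, _, freq in rows)
print(counts, total)
```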

32
SPSS Crosstabs
  • Use this table to double-check your numbers and
    make sure everything looks right

33
SPSS Crosstabs
  • This table summarizes your data and is equivalent
    to the one we calculated by hand

34
SPSS Crosstabs
  • This table provides the calculated χ² (the
    Pearson Chi-Square) with the associated df and
    p value
  • The other tests are not important for us

35
Post-Hoc Interpretations
36
Post-Hoc Interpretations
  • The significant χ² value means that the observed
    frequencies differ from the expected frequencies
    more than would be expected on the basis of
    chance
  • But that doesn't tell us which particular
    categories are driving the effect, so we need to
    interpret our results in more detail

37
Post-Hoc Interpretations
  • There is no clear-cut method for post-hoc
    interpretation of χ² data
  • Three methods will be discussed here
  • 1) Examining the cells
  • 2) Specific contrasts
  • 3) Simple effects

38
Post-Hoc Interpretations
  • 1) Examining the cells
  • Visually inspect the table for large residuals

39
Post-Hoc Interpretations
  • 1) Examining the cells
  • Calculate a partial χ² for just these six cells

40
Post-Hoc Interpretations
  • 1) Examining the cells
  • That means that these six cells accounted for
    (44.66 / 54.31) = 82.23% of the χ² value
  • If you look back at the table, you can see that
    four of these cells were consistent with our
    prediction (matching shirt/eye color)
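Each cell's contribution to χ² is its own (O - E)² / E term, so a partial χ² is the sum over the chosen cells. The Python sketch below uses a hypothetical 2 x 2 table for the per-cell terms and then repeats the slide's arithmetic for the share of the total.

```python
def cell_contributions(table):
    """Each cell's (O - E)^2 / E; these sum to the overall chi-square."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    contributions = []
    for i, row in enumerate(table):
        terms = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            terms.append((o - e) ** 2 / e)
        contributions.append(terms)
    return contributions


# Hypothetical 2 x 2 counts (not the eye/shirt data).
terms = cell_contributions([[30, 10], [20, 40]])
total = sum(sum(row) for row in terms)

# Share of the reported chi-square carried by the six largest cells.
share_percent = round(100 * 44.66 / 54.31, 2)
print(round(total, 2), share_percent)
```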

41
Post-Hoc Interpretations
  • 1) Examining the cells
  • We can also treat the standardized adjusted
    residuals like z-scores to determine which
    observed frequencies are significantly different
    from the expected frequency
  • But with so many contrasts, we have to control
    Type I error using the Bonferroni adjustment
  • Divide α by the number of cells in the table (16
    in this case)
  • α = .05 / 16 = .003
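SPSS reports the adjusted residuals directly, but the calculation can be sketched by hand. The Python below assumes the usual formula, (O - E) / sqrt(E x (1 - row total / N) x (1 - column total / N)), and uses a hypothetical 2 x 2 table. It also shows the Bonferroni-adjusted α for 16 cells.

```python
import math


def adjusted_residuals(table):
    """Adjusted standardized residual for every cell of a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        residuals = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            spread = e * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            residuals.append((o - e) / math.sqrt(spread))
        result.append(residuals)
    return result


# Hypothetical 2 x 2 counts (not the eye/shirt data).
residuals = adjusted_residuals([[30, 10], [20, 40]])

# Bonferroni adjustment for a 4 x 4 table: alpha divided by 16 cells.
alpha = 0.05 / 16
print(round(residuals[0][0], 2), round(alpha, 3))
```

In a 2 x 2 table every adjusted residual has the same absolute size, and its square equals the overall χ².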

42
Post-Hoc Interpretations
  • 1) Examining the cells
  • Find the critical z-value for this α-level
  • http://www.fourmilab.ch/rpkp/experiments/analysis/zCalc.html
  • Critical z = 2.75
  • Now compare all of the adjusted residuals in your
    table to this critical value; any that exceed it
    are significant
  • For example, the adjusted residual for the
    brown/brown combination is 3.90, so this observed
    frequency is significantly different from the
    expected frequency at the .05 level
  • For each significant contrast, explain how it
    fits (or fails to fit) with your research question
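The critical z can also be looked up with Python's standard library instead of the online calculator. In the sketch below, the value 2.75 appears to correspond to a one-tailed cutoff at α = .003; that reading is inferred from the numbers and is not stated on the slides, so the two-tailed cutoff is shown alongside for comparison.

```python
from statistics import NormalDist

alpha = 0.003  # .05 / 16, rounded as on the slide

# Cutoff with all of alpha in one tail.
z_one_tailed = NormalDist().inv_cdf(1 - alpha)
# Cutoff with alpha split across both tails.
z_two_tailed = NormalDist().inv_cdf(1 - alpha / 2)

# Adjusted residual for the brown/brown cell, as reported on the slide.
residual = 3.90
exceeds_both = residual > z_one_tailed and residual > z_two_tailed
print(round(z_one_tailed, 2), round(z_two_tailed, 2), exceeds_both)
```

Under either reading, the brown/brown residual of 3.90 exceeds the cutoff.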

43
Post-Hoc Interpretations
  • 2) Specific contrasts
  • This method involves forming 2 x 2 tables from
    your overall table
  • These tables can be formed by
  • Selecting a specific 2 x 2 subset from your
    overall table, or
  • Collapsing across categories to form a more
    general 2 x 2 table

44
Post-Hoc Interpretations
  • 2) Specific contrasts
  • Once you have your 2 x 2 table, you compute a χ²
    for it
  • Correct α using the Bonferroni adjustment (see
    formula on p. 181)
  • Compare the new χ² to the critical χ² at that
    α-level

45
Post-Hoc Interpretations
  • 2) Specific contrasts
  • For example, let's say you were interested in
    only blue vs. brown eyes and blue vs. brown shirts

46
Post-Hoc Interpretations
47
Post-Hoc Interpretations
  • 2) Specific contrasts
  • The obtained χ² is 9.24
  • Use the formula on p. 181 to calculate the total
    number of possible 2 x 2 tables within a 4 x 4
    table

48
Post-Hoc Interpretations
  • 2) Specific contrasts
  • Perform the Bonferroni adjustment using k
  • α = .05 / 36 = .001
  • Compare your obtained χ² (9.24) to the critical
    χ² at df = 1 (because it's from a 2 x 2 table) at
    this α-level
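The formula on p. 181 is not reproduced in these slides. The Python sketch below assumes it counts every pair of rows combined with every pair of columns, which gives the k = 36 used above for a 4 x 4 table.

```python
from math import comb


def count_2x2_tables(n_rows, n_cols):
    """Number of 2 x 2 subtables: pairs of rows times pairs of columns."""
    return comb(n_rows, 2) * comb(n_cols, 2)


k = count_2x2_tables(4, 4)
alpha = 0.05 / k  # Bonferroni-adjusted alpha for one specific contrast
print(k, round(alpha, 3))
```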

49
Post-Hoc Interpretations
  • 2) Specific contrasts

50
Post-Hoc Interpretations
  • 2) Specific contrasts
  • It turns out that using the Bonferroni
    adjustment, the specific contrast between blue
    vs. brown eyes and blue vs. brown shirts is not
    significant, χ²(1) = 9.24, ns
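The same decision can be checked with a p-value rather than a critical value. For df = 1 the upper-tail probability of χ² has a simple closed form, erfc(sqrt(χ² / 2)), which the Python sketch below uses; the adjusted α of .05 / 36 comes from the previous slides.

```python
import math


def p_value_df1(chi2):
    """Upper-tail probability for a chi-square with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2))


alpha = 0.05 / 36          # Bonferroni-adjusted alpha
p = p_value_df1(9.24)      # obtained chi-square from the 2 x 2 contrast
significant = p < alpha
print(round(p, 4), round(alpha, 4), significant)
```

The contrast would pass an unadjusted .05 test but not the adjusted one, which matches the "ns" reported above.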

51
Post-Hoc Interpretations
  • 2) Specific contrasts
  • Another way of forming a 2 x 2 table is to
    collapse across categories in a meaningful way
  • For example, you might lump blue and green
    together, and lump brown and black together, for
    both eye and shirt color
  • Something like spring colors vs. fall colors
  • If you use this method, you don't need to use the
    Bonferroni adjustment; just set α at .01
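Collapsing across categories amounts to summing the counts inside each group of rows and columns. In the Python sketch below, the 4 x 4 counts and the brown, blue, green, black ordering are hypothetical, and the grouping follows the blue/green vs. brown/black example above.

```python
def collapse(table, row_groups, col_groups):
    """Sum a table's counts within groups of row and column indices."""
    return [
        [sum(table[i][j] for i in rows for j in cols) for cols in col_groups]
        for rows in row_groups
    ]


# Hypothetical 4 x 4 counts: rows are eye colors, columns are shirt colors,
# both ordered brown, blue, green, black (an assumed ordering).
observed = [
    [20, 10, 5, 15],
    [8, 22, 12, 8],
    [6, 14, 18, 12],
    [16, 9, 7, 18],
]
blue_green = [1, 2]
brown_black = [0, 3]
groups = [blue_green, brown_black]

collapsed = collapse(observed, groups, groups)
print(collapsed)
```

The collapsed 2 x 2 table keeps the same total N, so it can go straight into the usual χ² calculation with df = 1.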

52
Post-Hoc Interpretations
  • 3) Simple effects
  • With this method, you compare specific
    proportions by holding one factor constant and
    varying the other
  • Similar to the simple main effects analysis from
    ANOVA
  • For example, we could compare all pairs of
    shirt-colors for blue-eyed customers, then for
    green-eyed customers, etc.
  • This method has low power and is quite
    conservative, so the first two methods are
    recommended instead

53
Assignment 7
54
Assignment 7
  • Question 1
  • Introduction
  • Discuss assumptions
  • Report 2 x 4 χ² analysis
  • Interpret overall χ² results
  • Question 2
  • Use post-hoc method 1 (with z-score comparisons)
  • Not necessarily one right answer; just use common
    sense and provide the information that is most
    important to evaluate the researcher's hypothesis

55
Assignment 7
  • χ² Table
  • http://www.itl.nist.gov/div898/handbook/eda/section3/eda3674.htm
  • Z-Score Table
  • http://www.fourmilab.ch/rpkp/experiments/analysis/zCalc.html